
Linear Algebra II (MAT 3141)

Course notes written by Damien Roy with the assistance of Pierre Bel and financial support from the University of Ottawa for the development of pedagogical material in French (Translated into English by Alistair Savage)

Fall 2012

Department of Mathematics and Statistics
University of Ottawa

Contents

Preface

1 Review of vector spaces
  1.1 Vector spaces
  1.2 Vector subspaces
  1.3 Bases
  1.4 Direct sum

2 Review of linear maps
  2.1 Linear maps
  2.2 The vector space L_K(V,W)
  2.3 Composition
  2.4 Matrices associated to linear maps
  2.5 Change of coordinates
  2.6 Endomorphisms and invariant subspaces

3 Review of diagonalization
  3.1 Determinants and similar matrices
  3.2 Diagonalization of operators
  3.3 Diagonalization of matrices

4 Polynomials, linear operators and matrices
  4.1 The ring of linear operators
  4.2 Polynomial rings
  4.3 Evaluation at a linear operator or matrix

5 Unique factorization in euclidean domains
  5.1 Divisibility in integral domains
  5.2 Divisibility in terms of ideals
  5.3 Euclidean division of polynomials
  5.4 Euclidean domains
  5.5 The Unique Factorization Theorem
  5.6 The Fundamental Theorem of Algebra

6 Modules
  6.1 The notion of a module
  6.2 Submodules
  6.3 Free modules
  6.4 Direct sum
  6.5 Homomorphisms of modules

7 The structure theorem
  7.1 Annihilators
  7.2 Modules over a euclidean domain
  7.3 Primary decomposition
  7.4 Jordan canonical form

8 The proof of the structure theorem
  8.1 Submodules of free modules, Part I
  8.2 The Cayley-Hamilton Theorem
  8.3 Submodules of free modules, Part II
  8.4 The column module of a matrix
  8.5 Smith normal form
  8.6 Algorithms

9 Duality and the tensor product
  9.1 Duality
  9.2 Bilinear maps
  9.3 Tensor product
  9.4 The Kronecker product
  9.5 Multiple tensor products

10 Inner product spaces
  10.1 Review
  10.2 Orthogonal operators
  10.3 The adjoint operator
  10.4 Spectral theorems
  10.5 Polar decomposition

Appendices

A Review: groups, rings and fields
  A.1 Monoids
  A.2 Groups
  A.3 Subgroups
  A.4 Group homomorphisms
  A.5 Rings
  A.6 Subrings
  A.7 Ring homomorphisms
  A.8 Fields

B The determinant
  B.1 Multilinear maps
  B.2 The determinant
  B.3 The adjugate

Preface

These notes are aimed at students in the course Linear Algebra II (MAT 3141) at the University of Ottawa. The first three chapters contain a review of basic notions covered in the prerequisite course Linear Algebra I (MAT 2141): vector spaces, linear maps and diagonalization.

The rest of the course is divided into three parts. Chapters 4 to 8 make up the main part of the course. They culminate in the structure theorem for finite type modules over a euclidean ring, a result which leads to the Jordan canonical form of operators on a finite dimensional complex vector space, as well as to the structure theorem for abelian groups of finite type. Chapter 9 involves duality and the tensor product. Then Chapter 10 treats the spectral theorems for operators on an inner product space (after a review of some notions seen in the prerequisite course MAT 2141). With respect to the initial plan, the notes still lack an eleventh chapter on hermitian spaces, which will be added later.

The notes are completed by two appendices. Appendix A provides a review of the basic algebraic structures used in the course: groups, rings and fields (and the morphisms between these objects). Appendix B covers the determinants of matrices with entries in an arbitrary commutative ring and their properties. Often the determinant is defined only over a field, and the goal of this appendix is to show that this notion generalizes easily to an arbitrary commutative ring.

This course provides students the chance to learn the theory of modules, which is not covered, due to lack of time, in other undergraduate algebra courses. To simplify matters and make the course easier to follow, I have avoided introducing quotient modules. This topic could be added through exercises (for more motivated students). According to comments I have received, students appreciate the richness of the concepts presented in this course and the new horizons they open.

I thank Pierre Bel, who greatly helped with the typing and updating of these notes. I also thank the Rectorat aux Études of the University of Ottawa for its financial support of this project via the funds for the development of French teaching materials. This project would not have been possible without their help. Finally, I thank my wife Laura Dutto for her help in the revision of preliminary versions of these notes, and the students who took this course with me in the Winter and Fall of 2009 for their participation and their comments.

Damien Roy (as translated by Alistair Savage) Ottawa, January 2010.

Chapter 1

Review of vector spaces

This chapter, like the two that follow it, is devoted to a review of fundamental concepts of linear algebra from the prerequisite course MAT 2141. This first chapter concerns the main object of study in linear algebra: vector spaces. We recall here the notions of a vector space, vector subspace, basis, dimension, coordinates, and direct sum. The reader should pay special attention to the notion of direct sum, since it will play a vital role later in the course.

1.1 Vector spaces

Definition 1.1.1 (Vector space). A vector space over a field K is a set V with operations of addition and scalar multiplication:

\[
V \times V \longrightarrow V, \quad (u, v) \longmapsto u + v
\qquad \text{and} \qquad
K \times V \longrightarrow V, \quad (a, v) \longmapsto av
\]
that satisfy the following axioms:

VS1. $u + (v + w) = (u + v) + w$ for all $u, v, w \in V$.
VS2. $u + v = v + u$ for all $u, v \in V$.
VS3. There exists $0 \in V$ such that $v + 0 = v$ for all $v \in V$.
VS4. For all $v \in V$, there exists $-v \in V$ such that $v + (-v) = 0$.
VS5. $1v = v$ for all $v \in V$.
VS6. $a(bv) = (ab)v$ for all $a, b \in K$ and $v \in V$.
VS7. $(a + b)v = av + bv$ for all $a, b \in K$ and $v \in V$.
VS8. $a(u + v) = au + av$ for all $a \in K$ and $u, v \in V$.

These eight axioms can be easily interpreted. The first four imply that (V, +) is an abelian group (cf. Appendix A). In particular, VS1 and VS2 imply that the order in which


we add the elements v1,..., vn of V does not affect their sum, denoted

\[
v_1 + \cdots + v_n \qquad \text{or} \qquad \sum_{i=1}^{n} v_i.
\]

Condition VS3 uniquely determines the element 0 of V. We call it the zero vector of V, the term vector being the generic name used to designate an element of a vector space. For each $v \in V$, there exists one and only one vector $-v \in V$ that satisfies VS4. We call it the additive inverse of v. The existence of the additive inverse allows us to define subtraction in V by $u - v := u + (-v)$.

Axioms VS5 and VS6 involve only scalar multiplication, while the Axioms VS7 and VS8 link the two operations. These last axioms require that multiplication by a scalar be distributive over addition on the right and on the left (i.e. over addition in K and in V ). They imply general distributivity formulas:

\[
\Big( \sum_{i=1}^{n} a_i \Big) v = \sum_{i=1}^{n} a_i v
\qquad \text{and} \qquad
a \sum_{i=1}^{n} v_i = \sum_{i=1}^{n} a v_i
\]

for all a, a1, . . . , an ∈ K and v, v1,..., vn ∈ V . We will return to these axioms later when we study the notion of a module over a commutative ring, which generalizes the idea of a vector space over a field.

Example 1.1.2. Let $n \in \mathbb{N}_{>0}$. The set
\[
K^n = \left\{ \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} ;\; a_1, a_2, \ldots, a_n \in K \right\}
\]
of n-tuples of elements of K is a vector space over K for the operations:
\[
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
+ \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
= \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{pmatrix}
\qquad \text{and} \qquad
c \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} c\,a_1 \\ c\,a_2 \\ \vdots \\ c\,a_n \end{pmatrix}.
\]

In particular, K1 = K is a vector space over K.

Example 1.1.3. More generally, let $m, n \in \mathbb{N}_{>0}$. The set
\[
\mathrm{Mat}_{m \times n}(K) = \left\{ \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} ;\; a_{11}, \ldots, a_{mn} \in K \right\}
\]
of $m \times n$ matrices with entries in K is a vector space over K for the usual operations:
\[
\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
+ \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{pmatrix}
= \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix}
\]
and
\[
c \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}
= \begin{pmatrix} c\,a_{11} & \cdots & c\,a_{1n} \\ \vdots & & \vdots \\ c\,a_{m1} & \cdots & c\,a_{mn} \end{pmatrix}.
\]

Example 1.1.4. Let X be an arbitrary set. The set F(X, K) of functions from X to K is a vector space over K when we define the sum of two functions f : X → K and g : X → K to be the function f + g : X → K given by (f + g)(x) = f(x) + g(x) for all x ∈ X, and the product of f : X → K and a scalar c ∈ K to be the function c f : X → K given by (c f)(x) = c f(x) for all x ∈ X.

Example 1.1.5. Finally, if V1,...,Vn are vector spaces over K, their cartesian product

V1 × · · · × Vn = {(v1,..., vn); v1 ∈ V1,..., vn ∈ Vn} is a vector space over K for the operations

(v1,..., vn) + (w1,..., wn) = (v1 + w1,..., vn + wn),

c(v1,..., vn) = (cv1, . . . , cvn).

1.2 Vector subspaces

Fix a vector space V over a field K.

Definition 1.2.1 (Vector subspace). A vector subspace of V is a subset U of V satisfying the following conditions:

SUB1. $0 \in U$.
SUB2. If $u, v \in U$, then $u + v \in U$.
SUB3. If $u \in U$ and $c \in K$, then $cu \in U$.

Conditions SUB2 and SUB3 imply that U is stable under the addition in V and stable under scalar multiplication by elements of K. We thus obtain the operations
\[
U \times U \longrightarrow U, \quad (u, v) \longmapsto u + v
\qquad \text{and} \qquad
K \times U \longrightarrow U, \quad (a, v) \longmapsto av
\]
on U and we see that, with these operations, U is itself a vector space. Condition SUB1 can be replaced by $U \neq \emptyset$ because, if U contains an element u, then SUB3 implies that $0\,u = 0 \in U$. Nevertheless, it is generally just as easy to verify SUB1. We therefore have the following proposition.

Proposition 1.2.2. A vector subspace U of V is a vector space over K for the addition and scalar multiplication restricted from V to U.

We therefore see that all vector subspaces of V give new examples of vector spaces. If U is a subspace of V, we can also consider subspaces of U. However, these are simply the subspaces of V contained in U, so this does not give new examples. In particular, the notion of being a subspace is transitive.

Proposition 1.2.3. If U is a subspace of V and W is a subspace of U, then W is a subspace of V.

We can also form the sum and intersection of subspaces of V :

Proposition 1.2.4. Let U1,...,Un be subspaces of V . The sets

U1 + ··· + Un = {u1 + ··· + un ; u1 ∈ U1,..., un ∈ Un} and

U1 ∩ · · · ∩ Un = {u ; u ∈ U1,..., u ∈ Un} ,

called, respectively, the sum and intersection of U1,...,Un, are subspaces of V .

Example 1.2.5. Let V1 and V2 be vector spaces over K. Then

U1 = V1 × {0} = {(v1, 0); v1 ∈ V1} and U2 = {0} × V2 = {(0, v2); v2 ∈ V2} are subspaces of V1 × V2. We see that

U1 + U2 = V1 × V2 and U1 ∩ U2 = {(0, 0)} .

1.3 Bases

In this section, we fix a vector space V over a field K and elements $v_1, \ldots, v_n$ of V.

Definition 1.3.1 (Linear combination). We say that an element v of V is a linear combination of $v_1, \ldots, v_n$ if there exist $a_1, \ldots, a_n \in K$ such that

v = a1v1 + ··· + anvn. We denote by

\[
\langle v_1, \ldots, v_n \rangle_K = \{ a_1 v_1 + \cdots + a_n v_n ;\; a_1, \ldots, a_n \in K \}
\]

the set of linear combinations of $v_1, \ldots, v_n$. This is also sometimes denoted $\mathrm{Span}_K\{v_1, \ldots, v_n\}$.

The identities
\[
0 v_1 + \cdots + 0 v_n = 0,
\]
\[
\sum_{i=1}^{n} a_i v_i + \sum_{i=1}^{n} b_i v_i = \sum_{i=1}^{n} (a_i + b_i) v_i,
\]
\[
c \sum_{i=1}^{n} a_i v_i = \sum_{i=1}^{n} (c\,a_i) v_i
\]

show that $\langle v_1, \ldots, v_n \rangle_K$ is a subspace of V. This subspace contains $v_1, \ldots, v_n$ since we have

\[
v_j = \sum_{i=1}^{n} \delta_{i,j}\, v_i
\qquad \text{where } \delta_{i,j} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}
\]

Finally, every subspace of V that contains v1,..., vn also contains all linear combinations of

$v_1, \ldots, v_n$, that is, it contains $\langle v_1, \ldots, v_n \rangle_K$. Therefore:

Proposition 1.3.2. The set $\langle v_1, \ldots, v_n \rangle_K$ of linear combinations of $v_1, \ldots, v_n$ is a subspace of V that contains $v_1, \ldots, v_n$ and that is contained in all subspaces of V containing $v_1, \ldots, v_n$.

We express this property by saying that $\langle v_1, \ldots, v_n \rangle_K$ is the smallest subspace of V that contains $v_1, \ldots, v_n$ (with respect to inclusion). We call it the subspace of V generated by $v_1, \ldots, v_n$.

Definition 1.3.3 (Generators). We say that $v_1, \ldots, v_n$ generate V, or that $\{v_1, \ldots, v_n\}$ is a system of generators (or generating set) of V, if
\[
\langle v_1, \ldots, v_n \rangle_K = V.
\]
We say that V is of finite type (or is finite dimensional) if it admits a finite system of generators.

It is important to distinguish the finite set $\{v_1, \ldots, v_n\}$ consisting of n elements (counting possible repetitions) from the set $\langle v_1, \ldots, v_n \rangle_K$ of their linear combinations, which is generally infinite.

Definition 1.3.4 (Linear dependence and independence). We say that the vectors $v_1, \ldots, v_n$ are linearly independent, or that the set $\{v_1, \ldots, v_n\}$ is linearly independent, if the only choice of $a_1, \ldots, a_n \in K$ such that
\[
a_1 v_1 + \cdots + a_n v_n = 0 \tag{1.1}
\]

is a1 = ··· = an = 0. Otherwise, we say that the vectors v1,..., vn are linearly dependent.

In other words, v1,..., vn are linearly dependent if there exist a1, . . . , an, not all zero, that satisfy Condition (1.1). A relation of the form (1.1) with a1, . . . , an not all zero is called a linear dependence relation among v1,..., vn.

Definition 1.3.5 (Basis). We say that {v1,..., vn} is a basis of V if {v1,..., vn} is a system of linearly independent generators for V .

The importance of the notion of basis comes from the following result:

Proposition 1.3.6. Suppose that {v1,..., vn} is a basis for V . For all v ∈ V , there exists a unique choice of scalars a1, . . . , an ∈ K such that

\[
v = a_1 v_1 + \cdots + a_n v_n. \tag{1.2}
\]

Proof. Suppose $v \in V$. Since the vectors $v_1, \ldots, v_n$ generate V, there exist $a_1, \ldots, a_n \in K$ satisfying (1.2). If $a_1', \ldots, a_n' \in K$ also satisfy $v = a_1' v_1 + \cdots + a_n' v_n$, then we have
\[
0 = (a_1 v_1 + \cdots + a_n v_n) - (a_1' v_1 + \cdots + a_n' v_n)
= (a_1 - a_1') v_1 + \cdots + (a_n - a_n') v_n.
\]
Since the vectors $v_1, \ldots, v_n$ are linearly independent, this implies that $a_1 - a_1' = \cdots = a_n - a_n' = 0$, and so $a_1 = a_1', \ldots, a_n = a_n'$.

If $B = \{v_1, \ldots, v_n\}$ is a basis of V, the scalars $a_1, \ldots, a_n$ that satisfy (1.2) are called the coordinates of v in the basis B. We denote by
\[
[v]_B = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \in K^n
\]
the column vector formed by these coordinates. Note that this supposes that we have fixed an order on the elements of B; in other words, B is an ordered set of elements of V. Strictly speaking, we should write B as an n-tuple $(v_1, \ldots, v_n)$, but we stick with the traditional set notation.

Example 1.3.7. Let $n \in \mathbb{N}_{>0}$. The vectors
\[
e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad
e_n = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}
\]
form a basis $E = \{e_1, \ldots, e_n\}$ of $K^n$ called the standard basis (or canonical basis) of $K^n$. We have
\[
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= a_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}
+ a_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}
+ \cdots
+ a_n \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix},
\]
therefore
\[
\left[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \right]_E
= \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
\quad \text{for all} \quad \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \in K^n.
\]

The following result is fundamental:

Proposition 1.3.8. Suppose V is a vector space of finite type. Then

(i) V has a basis and all bases of V have the same cardinality.

(ii) Every generating set of V contains a basis of V .

(iii) Every linearly independent set of elements of V is contained in a basis of V .

(iv) Every subspace of V has a basis.

The common cardinality of the bases of V is called the dimension of V and is denoted $\dim_K V$. Recall that, by convention, if $V = \{0\}$, then $\emptyset$ is a basis of V and $\dim_K V = 0$. If V is of dimension n, then it follows from statements (ii) and (iii) of Proposition 1.3.8 that every generating set of V contains at least n elements and that every linearly independent set in V contains at most n elements. From this we can also deduce that if $U_1$ and $U_2$ are subspaces of V of the same finite dimension, with $U_1 \subseteq U_2$, then $U_1 = U_2$.

We also recall the following fact.

Proposition 1.3.9. If U1 and U2 are finite dimensional subspaces of V , then

dimK (U1 + U2) + dimK (U1 ∩ U2) = dimK (U1) + dimK (U2).
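For readers who wish to experiment, Proposition 1.3.9 is easy to test on examples with a computer algebra system. The following sketch (in Python, assuming the SymPy library; the subspaces $U_1, U_2$ of $\mathbb{Q}^4$ are arbitrary choices) checks the identity, using the fact that $\dim_K(U_1 \cap U_2)$ equals the nullity of the block matrix $(A_1 \mid A_2)$ minus the nullities of $A_1$ and $A_2$, where $A_1$, $A_2$ are matrices whose columns span $U_1$, $U_2$ (each relation $A_1 x = -A_2 y$ produces a vector of the intersection).

from sympy import Matrix

def dim_span(M):
    # dimension of the column space of M, i.e. its rank
    return M.rank()

# U1 and U2 are given by spanning sets (the columns of A1 and A2)
A1 = Matrix([[1, 0], [2, 1], [3, 1], [4, 1]])
A2 = Matrix([[1, 0], [0, 0], [1, 1], [1, 1]])

d1, d2 = dim_span(A1), dim_span(A2)
d_sum = dim_span(A1.row_join(A2))   # U1 + U2 is spanned by all the columns

d_int = (len(A1.row_join(A2).nullspace())
         - len(A1.nullspace()) - len(A2.nullspace()))

assert d_sum + d_int == d1 + d2     # Proposition 1.3.9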

1.4 Direct sum

In this section, we fix a vector space V over a field K and subspaces V1,...,Vs of V .

Definition 1.4.1 (Direct sum). We say that the sum V1 + ··· + Vs is direct if the only choice of vectors v1 ∈ V1,..., vs ∈ Vs such that

\[
v_1 + \cdots + v_s = 0
\]

is $v_1 = \cdots = v_s = 0$. We express this condition by writing the sum of the subspaces $V_1, \ldots, V_s$ as $V_1 \oplus \cdots \oplus V_s$, with circles around the addition signs.

We say that V is the direct sum of V1,...,Vs if V = V1 ⊕ · · · ⊕ Vs.

Example 1.4.2. Let {v1,..., vn} be a basis of a vector space V over K. Proposition 1.3.6

shows that $V = \langle v_1 \rangle_K \oplus \cdots \oplus \langle v_n \rangle_K$.

Example 1.4.3. We always have V = V ⊕ {0} = V ⊕ {0} ⊕ {0}.

This last example shows that, in a direct sum, we can have one or more summands equal to {0}.

Proposition 1.4.4. The vector space V is the direct sum of the subspaces $V_1, \ldots, V_s$ if and only if, for every $v \in V$, there exists a unique choice of vectors $v_1 \in V_1, \ldots, v_s \in V_s$ such that $v = v_1 + \cdots + v_s$.

The proof is similar to that of Proposition 1.3.6 and is left to the reader. We also have the following criterion (compare to Exercise 1.4.2).

Proposition 1.4.5. The sum $V_1 + \cdots + V_s$ is direct if and only if
\[
V_i \cap \big( V_1 + \cdots + \widehat{V_i} + \cdots + V_s \big) = \{0\}
\quad \text{for } i = 1, \ldots, s. \tag{1.3}
\]

In the above condition, the hat over the Vi indicates that Vi is omitted from the sum.

Proof. First suppose that Condition (1.3) is satisfied. If $v_1 \in V_1, \ldots, v_s \in V_s$ satisfy $v_1 + \cdots + v_s = 0$, then, for $i = 1, \ldots, s$, we have
\[
-v_i = v_1 + \cdots + \widehat{v_i} + \cdots + v_s \in V_i \cap \big( V_1 + \cdots + \widehat{V_i} + \cdots + V_s \big).
\]
Therefore $-v_i = 0$ and so $v_i = 0$. Thus the sum $V_1 + \cdots + V_s$ is direct.

Conversely, suppose that the sum is direct, and fix $i \in \{1, \ldots, s\}$. If
\[
v \in V_i \cap \big( V_1 + \cdots + \widehat{V_i} + \cdots + V_s \big),
\]
then there exist $v_1 \in V_1, \ldots, v_s \in V_s$ such that
\[
v = -v_i \quad \text{and} \quad v = v_1 + \cdots + \widehat{v_i} + \cdots + v_s.
\]
Eliminating v, we see that $-v_i = v_1 + \cdots + \widehat{v_i} + \cdots + v_s$, thus $v_1 + \cdots + v_s = 0$ and so $v_1 = \cdots = v_s = 0$. In particular, we see that $v = 0$. Since the choice of v was arbitrary, this shows that Condition (1.3) is satisfied for all i.

In particular, for s = 2, Proposition 1.4.5 implies that $V = V_1 \oplus V_2$ if and only if $V = V_1 + V_2$ and $V_1 \cap V_2 = \{0\}$. Combining this with Proposition 1.3.9, we see that:

Proposition 1.4.6. Suppose V1 and V2 are subspaces of a vector space V of finite dimension. Then V = V1 ⊕ V2 if and only if

(i) dimK V = dimK V1 + dimK V2, and

(ii) V1 ∩ V2 = {0}.

Example 1.4.7. Let $V = \mathbb{R}^3$,
\[
V_1 = \left\langle \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} \right\rangle_{\mathbb{R}}
= \left\{ \begin{pmatrix} 2a \\ 0 \\ a \end{pmatrix} ;\; a \in \mathbb{R} \right\}
\quad \text{and} \quad
V_2 = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in \mathbb{R}^3 ;\; x + y + z = 0 \right\}.
\]
Proposition 1.3.2 implies that $V_1$ is a subspace of $\mathbb{R}^3$. Since $\left\{ \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} \right\}$ is a basis of $V_1$, we have $\dim_{\mathbb{R}}(V_1) = 1$. On the other hand, since $V_2$ is the set of solutions to a homogeneous system of linear equations of rank 1 (with a single equation), $V_2$ is a subspace of $\mathbb{R}^3$ of dimension $\dim_{\mathbb{R}}(V_2) = 3 - (\text{rank of the system}) = 2$. We also see that
\[
\begin{pmatrix} 2a \\ 0 \\ a \end{pmatrix} \in V_2 \iff 2a + 0 + a = 0 \iff a = 0,
\]
and so $V_1 \cap V_2 = \{0\}$. Since $\dim_{\mathbb{R}} V_1 + \dim_{\mathbb{R}} V_2 = 3 = \dim_{\mathbb{R}} \mathbb{R}^3$, Proposition 1.4.6 implies that $\mathbb{R}^3 = V_1 \oplus V_2$.

Geometric interpretation: The set $V_1$ represents a straight line passing through the origin in $\mathbb{R}^3$, while $V_2$ represents a plane passing through the origin with normal vector $n := (1, 1, 1)^t$. The line $V_1$ is not parallel to the plane $V_2$ because its direction vector $(2, 0, 1)^t$ is not perpendicular to n. In this situation, we see that every vector in the space can be written in a unique way as the sum of a vector of the line and a vector of the plane.

The following result will play a key role later in the course.

Proposition 1.4.8. Suppose that V = V1 ⊕ · · · ⊕ Vs is of finite dimension and that Bi =

{vi,1, vi,2,..., vi,ni } is a basis of Vi for i = 1, . . . , s. Then the ordered set

B = {v1,1,..., v1,n1 , v2,1,..., v2,n2 ,...... , vs,1,..., vs,ns }

obtained by concatenation from B1, B2,..., Bs is a basis of V .

From now on, we will denote this basis by $B_1 \sqcup B_2 \sqcup \cdots \sqcup B_s$. The symbol $\sqcup$ represents "disjoint union".

Proof. 1st We show that B is a generating set for V .

Let v ∈ V . Since V = V1 ⊕ · · · ⊕ Vs, we can write v = v1 + ··· + vs with vi ∈ Vi for i = 1, . . . , s. Since Bi is a basis for Vi, we can also write

vi = ai,1vi,1 + ··· + ai,ni vi,ni

with ai,1, . . . , ai,ni ∈ K. From this, we conclude that

\[
v = a_{1,1} v_{1,1} + \cdots + a_{1,n_1} v_{1,n_1} + \cdots + a_{s,1} v_{s,1} + \cdots + a_{s,n_s} v_{s,n_s}.
\]
So B generates V.

2nd We show that B is linearly independent. Suppose that

\[
a_{1,1} v_{1,1} + \cdots + a_{1,n_1} v_{1,n_1} + \cdots + a_{s,1} v_{s,1} + \cdots + a_{s,n_s} v_{s,n_s} = 0
\]

for some elements a1,1, . . . , as,ns ∈ K. Setting vi = ai,1vi,1 +···+ai,ni vi,ni for i = 1, . . . , s, we see that v1 +···+vs = 0. Since vi ∈ Vi for i = 1, . . . , s and V is the direct sum of V1,...,Vs,

this implies vi = 0 for i = 1, . . . , s. We deduce that ai,1 = ··· = ai,ni = 0, because Bi is a

basis of $V_i$. So we have $a_{1,1} = \cdots = a_{s,n_s} = 0$. This shows that B is linearly independent.

Corollary 1.4.9. If V = V1 ⊕ · · · ⊕ Vs is finite dimensional, then

dimK V = dimK (V1) + ··· + dimK (Vs).

Proof. Using Proposition 1.4.8 and using the same notation, we have

\[
\dim_K V = |B| = \sum_{i=1}^{s} |B_i| = \sum_{i=1}^{s} \dim_K(V_i).
\]

Example 1.4.10. In Example 1.4.7, we have $\mathbb{R}^3 = V_1 \oplus V_2$. We see that
\[
B_1 = \left\{ \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right\}
\]
are bases for $V_1$ and $V_2$ respectively (indeed, $B_2$ is a linearly independent subset of $V_2$ consisting of two elements, and $\dim_{\mathbb{R}}(V_2) = 2$, thus it is a basis of $V_2$). We conclude that
\[
B = B_1 \sqcup B_2 = \left\{ \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \right\}
\]
is a basis of $\mathbb{R}^3$.
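Such verifications are easy to automate: B is a basis of $\mathbb{R}^3$ exactly when the matrix whose columns are the vectors of B is invertible, that is, has nonzero determinant (compare Proposition 2.5.5 below). A small sketch, assuming the Python library SymPy:

from sympy import Matrix

B1 = [Matrix([2, 0, 1])]
B2 = [Matrix([-1, 0, 1]), Matrix([0, -1, 1])]

P = Matrix.hstack(*(B1 + B2))   # columns: the vectors of B1 ⊔ B2
print(P.det())                  # 3, which is nonzero, so B is a basis of R^3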

Exercises.

1.4.1. Prove Proposition 1.4.4.

1.4.2. Let V1,...,Vs be subspaces of a vector space V over a field K. Show that their sum is direct if and only if Vi ∩ (Vi+1 + ··· + Vs) = {0} for i = 1, . . . , s − 1.

1.4.3. Let V1,...,Vs be subspaces of a vector space V over a field K. Suppose that their sum is direct.

(i) Show that, if Ui is a subspace of Vi for i = 1, . . . , s, then the sum U1 +···+Us is direct.

(ii) Show that, if $v_i$ is a nonzero element of $V_i$ for $i = 1, \ldots, s$, then $v_1, \ldots, v_s$ are linearly independent over K.

1.4.4. Consider the subspaces of $\mathbb{R}^4$ given by
\[
V_1 = \left\langle \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \right\rangle_{\mathbb{R}}
\quad \text{and} \quad
V_2 = \left\langle \begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 2 \\ 2 \end{pmatrix} \right\rangle_{\mathbb{R}}.
\]
Show that $\mathbb{R}^4 = V_1 \oplus V_2$.

1.4.5. Let V1 = { f : Z → R ; f(−x) = f(x) for all x ∈ Z } and V2 = { f : Z → R ; f(−x) = −f(x) for all x ∈ Z }.

Show that V1 and V2 are subspaces of F(Z, R) and that F(Z, R) = V1 ⊕ V2.

1.4.6. Let $V_1, V_2, \ldots, V_s$ be vector spaces over the same field K. Show that
\[
V_1 \times V_2 \times \cdots \times V_s
= \big( V_1 \times \{0\} \times \cdots \times \{0\} \big)
\oplus \big( \{0\} \times V_2 \times \cdots \times \{0\} \big)
\oplus \cdots \oplus \big( \{0\} \times \cdots \times \{0\} \times V_s \big).
\]

Chapter 2

Review of linear maps

Fixing a field K of scalars, a linear map from a vector space V to a vector space W is a function that "commutes" with addition and scalar multiplication. This second preliminary chapter reviews the notions of linear maps, kernels and images, and the operations we can perform on linear maps (addition, scalar multiplication and composition). We then review matrices of linear maps relative to a choice of bases, how they behave under the above-mentioned operations and also under a change of basis. We examine the particular case of linear operators on a vector space V, that is, linear maps from V to itself. We pay special attention to the notion of an invariant subspace of a linear operator, which will play a fundamental role later in the course.

2.1 Linear maps

Definition 2.1.1 (Linear map). Let V and W be vector spaces over K. A linear map from V to W is a function T : V → W such that

LM1. T (v + v0) = T (v) + T (v0) for all v, v0 ∈ V,

LM2. T (cv) = c T (v) for all c ∈ K and all v ∈ V.

We recall the following results:

Proposition 2.1.2. Let T : V → W be a linear map.

(i) The set ker(T ) := {v ; T (v) = 0}, called the kernel of T , is a subspace of V .

(ii) The set Im(T ) := {T (v); v ∈ V }, called the image of T , is a subspace of W .

(iii) The linear map T is injective if and only if ker(T ) = {0}. It is surjective if and only if Im(T ) = W .


(iv) If V is finite dimensional, then

dimK V = dimK (ker(T )) + dimK (Im(T )).

Example 2.1.3 (Linear map associated to a matrix). Let M ∈ Matm×n(K). The function

\[
T_M : K^n \longrightarrow K^m, \qquad X \longmapsto MX
\]
is linear (this follows immediately from the properties of matrix multiplication). Its kernel is
\[
\ker(T_M) := \{ X \in K^n ;\; MX = 0 \}
\]
and is thus the solution set of the homogeneous system $MX = 0$. If $C_1, \ldots, C_n$ denote the columns of M, we see that

\[
\begin{aligned}
\operatorname{Im}(T_M) &= \{ MX ;\; X \in K^n \} \\
&= \left\{ (C_1 \ \cdots \ C_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ;\; x_1, \ldots, x_n \in K \right\} \\
&= \{ x_1 C_1 + \cdots + x_n C_n ;\; x_1, \ldots, x_n \in K \} \\
&= \langle C_1, \ldots, C_n \rangle_K.
\end{aligned}
\]

Thus the image of TM is the column space of M. Since the dimension of the latter is the rank of M, formula (iv) of Proposition 2.1.2 recovers the following known result:

\[
\dim \{ X \in K^n ;\; MX = 0 \} = n - \operatorname{rank}(M).
\]
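This computation can be carried out mechanically. The sketch below (in Python, assuming the SymPy library; the matrix M is an arbitrary choice) finds a basis of $\ker(T_M)$ by solving $MX = 0$ and checks the formula above.

from sympy import Matrix

M = Matrix([[1, 2, 3],
            [2, 4, 6]])        # a 2 x 3 matrix of rank 1

kernel = M.nullspace()         # a basis of ker(T_M) = {X : MX = 0}
assert len(kernel) == M.cols - M.rank()   # dim ker(T_M) = n - rank(M)
print(M.rank(), [v.T for v in kernel])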

Example 2.1.4. Let V1 and V2 be vector spaces over K. The function

\[
\pi_1 : V_1 \times V_2 \longrightarrow V_1, \qquad (v_1, v_2) \longmapsto v_1
\]

is a linear map since, if $(v_1, v_2)$ and $(v_1', v_2')$ are elements of $V_1 \times V_2$ and $c \in K$, we have
\[
\pi_1\big( (v_1, v_2) + (v_1', v_2') \big) = \pi_1(v_1 + v_1', v_2 + v_2') = v_1 + v_1' = \pi_1(v_1, v_2) + \pi_1(v_1', v_2')
\]
and
\[
\pi_1\big( c\,(v_1, v_2) \big) = \pi_1(c\,v_1, c\,v_2) = c\,v_1 = c\,\pi_1(v_1, v_2).
\]
We also see that

ker(π1) = {(v1, v2) ∈ V1 × V2 ; v1 = 0} = {(0, v2); v2 ∈ V2} = {0} × V2,

and $\operatorname{Im}(\pi_1) = V_1$ since $v_1 = \pi_1(v_1, 0) \in \operatorname{Im}(\pi_1)$ for all $v_1 \in V_1$. Thus $\pi_1$ is surjective.

The map $\pi_1 : V_1 \times V_2 \to V_1$ is called the projection onto the first factor. Similarly, we see that
\[
\pi_2 : V_1 \times V_2 \longrightarrow V_2, \qquad (v_1, v_2) \longmapsto v_2,
\]

called projection onto the second factor is linear and surjective with kernel V1 × {0}. In a similar manner, we see that the maps

\[
i_1 : V_1 \longrightarrow V_1 \times V_2, \quad v_1 \longmapsto (v_1, 0)
\qquad \text{and} \qquad
i_2 : V_2 \longrightarrow V_1 \times V_2, \quad v_2 \longmapsto (0, v_2)
\]

are linear and injective, with images V1 × {0} and {0} × V2 respectively.

We conclude this section with the following result which will be crucial in what follows. In particular, we will see in Chapter 9 that it can be generalized to “bilinear” maps (see also Appendix B for the even more general case of multilinear maps).

Theorem 2.1.5. Let V and W be vector spaces over K, with dimK (V ) = n. Let {v1,..., vn} be a basis of V , and w1,..., wn a sequence of n arbitrary elements of W (not necessarily distinct). Then there exists a unique linear map T : V → W satisfying T (vi) = wi for all i = 1, . . . , n.

Proof. 1st Uniqueness. If such a linear map T exists, then
\[
T\Big( \sum_{i=1}^{n} a_i v_i \Big) = \sum_{i=1}^{n} a_i\, T(v_i) = \sum_{i=1}^{n} a_i w_i
\]
for all $a_1, \ldots, a_n \in K$. This uniquely determines T.

2nd Existence. Consider the function T : V → W defined by
\[
T\Big( \sum_{i=1}^{n} a_i v_i \Big) = \sum_{i=1}^{n} a_i w_i
\]
(this is well defined since, by Proposition 1.3.6, every element of V can be written in the form $\sum_{i=1}^{n} a_i v_i$ for a unique choice of $a_1, \ldots, a_n \in K$). It satisfies $T(v_i) = w_i$ for all $i = 1, \ldots, n$. It remains to show that this map is linear. We have
\[
T\Big( \sum_{i=1}^{n} a_i v_i + \sum_{i=1}^{n} a_i' v_i \Big)
= T\Big( \sum_{i=1}^{n} (a_i + a_i') v_i \Big)
= \sum_{i=1}^{n} (a_i + a_i') w_i
= \sum_{i=1}^{n} a_i w_i + \sum_{i=1}^{n} a_i' w_i
= T\Big( \sum_{i=1}^{n} a_i v_i \Big) + T\Big( \sum_{i=1}^{n} a_i' v_i \Big)
\]
for all $a_1, \ldots, a_n, a_1', \ldots, a_n' \in K$. We also see that
\[
T\Big( c \sum_{i=1}^{n} a_i v_i \Big) = c\, T\Big( \sum_{i=1}^{n} a_i v_i \Big)
\]
for all $c \in K$. Thus T is indeed a linear map.
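Theorem 2.1.5 also yields a simple recipe for computing with linear maps when $V = K^n$: if P is the matrix with the basis vectors $v_1, \ldots, v_n$ as columns and Q the matrix with the prescribed images $w_1, \ldots, w_n$ as columns, then the unique linear map T has standard matrix $Q P^{-1}$, since $Q P^{-1} v_i = Q e_i = w_i$. A sketch assuming the Python library SymPy (the vectors chosen below are arbitrary):

from sympy import Matrix

P = Matrix([[1, 1],
            [2, 3]])           # columns v1, v2: a basis of Q^2
Q = Matrix([[1, 0],
            [0, 1],
            [1, 1]])           # columns w1, w2: arbitrary vectors of Q^3

A = Q * P.inv()                # standard matrix of the unique T with T(vi) = wi
assert A * P[:, 0] == Q[:, 0] and A * P[:, 1] == Q[:, 1]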

2.2 The vector space LK(V,W )

Let V and W be vector spaces over K. We denote by LK (V,W ) the set of linear maps from V to W . We first recall that this set is naturally endowed with the operations of addition and scalar multiplication.

Proposition 2.2.1. Let T1 : V → W and T2 : V → W be linear maps, and let c ∈ K.

(i) The function T1 + T2 : V → W given by

(T1 + T2)(v) = T1(v) + T2(v) for all v ∈ V is a linear map.

(ii) The function c T1 : V → W given by

(c T1)(v) = c T1(v) for all v ∈ V is also a linear map.

Proof of (i). For all v1, v2 ∈ V and a ∈ K, we have:

(T1 + T2)(v1 + v2) = T1(v1 + v2) + T2(v1 + v2) (definition of T1 + T2)

= (T1(v1) + T1(v2)) + (T2(v1) + T2(v2)) (since T1 and T2 are linear)

= (T1(v1) + T2(v1)) + (T1(v2) + T2(v2))

= (T1 + T2)(v1) + (T1 + T2)(v2), (definition of T1 + T2)

(T1 + T2)(a v1) = T1(a v1) + T2(a v1) (definition of T1 + T2)

= a T1(v1) + a T2(v1) (since T1 and T2 are linear)

= a (T1(v1) + T2(v1))

= a (T1 + T2)(v1). (definition of T1 + T2)

Thus T1 + T2 is linear. The proof that c T1 is linear is analogous.

Example 2.2.2. Let V be a vector space over Q, and let π1 : V × V → V and π2 : V × V → V be the maps defined, as in Example 2.1.4, by

π1(v1, v2) = v1 and π2(v1, v2) = v2

for all (v1, v2) ∈ V × V . Then 2.2. THE VECTOR SPACE LK (V,W ) 17

1) π1 + π2 : V × V → V is the linear map given by

(π1 + π2)(v1, v2) = π1(v1, v2) + π2(v1, v2) = v1 + v2,

2) $3\pi_1 : V \times V \to V$ is the linear map given by

(3π1)(v1, v2) = 3π1(v1, v2) = 3v1.

Combining addition and scalar multiplication, we can, for example, form

\[
3\pi_1 + 5\pi_2 : V \times V \longrightarrow V, \qquad (v_1, v_2) \longmapsto 3v_1 + 5v_2.
\]

Theorem 2.2.3. The set LK (V,W ) is a vector space over K for the addition and scalar multiplication defined in Proposition 2.2.1.

Proof. To show that $L_K(V,W)$ is a vector space over K, it suffices to prove that the operations satisfy the eight required axioms. We first note that the map O : V → W given by

O(v) = 0 for all v ∈ V is a linear map (hence an element of LK (V,W )). We also note that, if T ∈ LK (V,W ), then the function −T : V → W defined by

(−T )(v) = −T (v) for all v ∈ V is also a linear map (it is (−1)T ). Therefore, it suffices to show that, for all a, b ∈ K and T1,T2,T3 ∈ LK (V,W ), we have

1) T1 + (T2 + T3) = (T1 + T2) + T3

2) T1 + T2 = T2 + T1

3) O + T1 = T1

4) T1 + (−T1) = O

5) $1 \cdot T_1 = T_1$

6) a(b T1) = (ab) T1

7) $(a + b)\,T_1 = a\,T_1 + b\,T_1$

8) $a\,(T_1 + T_2) = a\,T_1 + a\,T_2$.

We show, for example, Condition 8). To verify that two elements of LK (V,W ) are equal, we must show that they take the same values at each point of their domain V . In the case of 8), this amounts to checking that (a (T1 + T2))(v) = (a T1 + a T2)(v) for all v ∈ V . Suppose v ∈ V . Then

(a (T1 + T2))(v) = a(T1 + T2)(v)

= a(T1(v) + T2(v))

= aT1(v) + aT2(v)

= (aT1)(v) + (aT2)(v)

= (aT1 + aT2)(v).

Therefore, we have $a\,(T_1 + T_2) = a\,T_1 + a\,T_2$ as desired. The other conditions are left as an exercise.

Example 2.2.4 (Duals and linear forms). We know that K is a vector space over K. Therefore, for any vector space V, the set $L_K(V, K)$ of linear maps from V to K is a vector space over K. We call it the dual of V and denote it $V^*$. An element of $V^*$ is called a linear form on V.

Exercises.

2.2.1. Complete the proof of Proposition 2.2.1 by showing that cT1 is a linear map.

2.2.2. Complete (more of) the proof of Theorem 2.2.3 by showing that conditions 3), 6) and 7) are satisfied.

2.2.3. Let n be a positive integer. Show that, for $i = 1, \ldots, n$, the function
\[
f_i : K^n \longrightarrow K, \qquad \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \longmapsto x_i
\]
is an element of the dual $(K^n)^*$ of $K^n$, and that $\{f_1, \ldots, f_n\}$ is a basis of $(K^n)^*$.

2.3 Composition

Let U, V and W be vector spaces over a field K. Recall that if S : U → V and T : V → W are linear maps, then their composition T ◦ S : U → W , given by

$(T \circ S)(u) = T(S(u))$ for all $u \in U$,

is also a linear map. In what follows, we will use the fact that composition is distributive over addition and that it “commutes” with scalar multiplication as the following proposition indicates.

Proposition 2.3.1. Suppose S,S1,S2 ∈ LK (U, V ) and T,T1,T2 ∈ LK (V,W ), and c ∈ K. We have

(i) (T1 + T2) ◦ S = T1 ◦ S + T2 ◦ S ,

(ii) $T \circ (S_1 + S_2) = T \circ S_1 + T \circ S_2$,

(iii) $(c\,T) \circ S = T \circ (c\,S) = c\,(T \circ S)$.

Proof of (i). For all u ∈ U, we have

((T1 + T2) ◦ S)(u) = (T1 + T2)(S(u))

= T1(S(u)) + T2(S(u))

= (T1 ◦ S)(u) + (T2 ◦ S)(u)

= (T1 ◦ S + T2 ◦ S)(u) . The proofs of (ii) and (iii) are similar.

Let T : V → W be a linear map. Recall that, if T is bijective, then the inverse function T −1 : W → V defined by T −1(w) = u ⇐⇒ T (u) = w is also a linear map. It satisfies

\[
T^{-1} \circ T = I_V \quad \text{and} \quad T \circ T^{-1} = I_W
\]

where $I_V : V \to V$, $v \mapsto v$, and $I_W : W \to W$, $w \mapsto w$, denote the identity maps on V and W respectively.

Definition 2.3.2 (Invertible). We say that a linear map T : V → W is invertible if there exists a linear map S : W → V such that

S ◦ T = IV and T ◦ S = IW . (2.1)

The preceding observations show that if T is bijective, then it is invertible. On the other hand, if T is invertible and S : W → V satisfies (2.1), then T is bijective and S = T −1. An invertible linear map is thus nothing more than a bijective linear map. We recall the following fact: Proposition 2.3.3. Suppose that S : U → V and T : V → W are invertible linear maps. Then U, V, W have the same dimension (finite or infinite). Furthermore,

(i) $T^{-1} : W \to V$ is invertible and $(T^{-1})^{-1} = T$, and

(ii) T ◦ S : U → W is invertible and (T ◦ S)−1 = S−1 ◦ T −1.

An invertible linear map is also called an isomorphism. We say that V is isomorphic to W and we write V ' W , if there exists an isomorphism from V to W . We can easily verify that

(i) V ' V ,

(ii) V ' W ⇐⇒ W ' V ,

(iii) if U ' V and V ' W , then U ' W .

Thus isomorphism is an equivalence relation on the class of vector spaces over K.

The following result shows that, for each integer n ≥ 1, every vector space of dimension n over K is isomorphic to Kn.

Proposition 2.3.4. Let n ∈ N>0, let V be a vector space over K of dimension n, and let B = {v1,..., vn} be a basis of V . The map

\[
\varphi : V \longrightarrow K^n, \qquad v \longmapsto [v]_B
\]
is an isomorphism of vector spaces over K.

Proof. To prove this, we first show that ϕ is linear (exercise). The fact that ϕ is bijective follows directly from Proposition 1.3.6.

Exercises.

2.3.1. Prove Part (iii) of Proposition 2.3.1.

2.3.2. Let V and W be vector spaces over a field K and let T : V → W be a linear map. Show that the function $T^* : W^* \to V^*$ defined by $T^*(f) = f \circ T$ for all $f \in W^*$ is a linear map (see Example 2.2.4 for the definitions of $V^*$ and $W^*$). We say that $T^*$ is the linear map dual to T.

2.4 Matrices associated to linear maps

In this section, U, V and W denote vector spaces over the same field K.

Proposition 2.4.1. Let T : V → W be a linear map. Suppose that $B = \{v_1, \ldots, v_n\}$ is a basis of V and that $D = \{w_1, \ldots, w_m\}$ is a basis of W. Then there exists a unique matrix in $\mathrm{Mat}_{m \times n}(K)$, denoted $[T]^B_D$, such that
\[
[T(v)]_D = [T]^B_D\,[v]_B \quad \text{for all } v \in V.
\]
This matrix is given by
\[
[T]^B_D = \big( [T(v_1)]_D \ \cdots \ [T(v_n)]_D \big),
\]
that is, the j-th column of $[T]^B_D$ is $[T(v_j)]_D$.

Try to prove this without looking at the argument below. It’s a good exercise!

Proof. 1st Existence: Let $v \in V$. Write
\[
[v]_B = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.
\]
By definition, this means that $v = a_1 v_1 + \cdots + a_n v_n$, thus
\[
T(v) = a_1 T(v_1) + \cdots + a_n T(v_n).
\]
Since the map $W \to K^m$, $w \mapsto [w]_D$, is linear (see Proposition 2.3.4), we see that
\[
[T(v)]_D = a_1 [T(v_1)]_D + \cdots + a_n [T(v_n)]_D
= \big( [T(v_1)]_D \ \cdots \ [T(v_n)]_D \big) \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
= [T]^B_D\,[v]_B.
\]

2nd Uniqueness: If a matrix $M \in \mathrm{Mat}_{m \times n}(K)$ satisfies $[T(v)]_D = M [v]_B$ for all $v \in V$, then, for $j = 1, \ldots, n$, we have
\[
[T(v_j)]_D = M [v_j]_B = M e_j = j\text{-th column of } M,
\]
where $e_j$ denotes the j-th standard basis vector of $K^n$ (the column with 1 in the j-th row and 0 elsewhere). Therefore $M = [T]^B_D$.
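Proposition 2.4.1 translates directly into an algorithm: compute $T(v_j)$ for each vector $v_j$ of B and express the result in coordinates relative to D. A sketch assuming the Python library SymPy (the operator and the two bases below are arbitrary choices):

from sympy import Matrix

M = Matrix([[1, 2], [3, 1]])            # T(X) = M*X on Q^2
B = [Matrix([1, 1]), Matrix([1, -1])]   # basis of the source
D = [Matrix([1, 0]), Matrix([1, 1])]    # basis of the target

PD = Matrix.hstack(*D)                  # [w]_D = PD**(-1) * w
T_BD = Matrix.hstack(*[PD.inv() * (M * v) for v in B])

# sanity check on v = 2*v1 + v2: [T(v)]_D == [T]_D^B * [v]_B
v = 2*B[0] + B[1]
assert PD.inv() * (M * v) == T_BD * Matrix([2, 1])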

Example 2.4.2. Let M ∈ Matm×n(K) and let

\[
T_M : K^n \longrightarrow K^m, \qquad X \longmapsto MX
\]
be the linear map associated to the matrix M (see Example 2.1.3). Let $E = \{e_1, \ldots, e_n\}$ be the standard basis of $K^n$ and $F = \{f_1, \ldots, f_m\}$ the standard basis of $K^m$. For all $X \in K^n$ and $Y \in K^m$, we have $[X]_E = X$ and $[Y]_F = Y$ (see Example 1.3.7), thus
\[
[T_M(X)]_F = [MX]_F = MX = M[X]_E.
\]
From this we conclude that
\[
[T_M]^E_F = M,
\]
that is, M is the matrix of $T_M$ relative to the standard bases of $K^n$ and $K^m$.

Theorem 2.4.3. In the notation of Proposition 2.4.1, the function
\[
L_K(V, W) \longrightarrow \mathrm{Mat}_{m \times n}(K), \qquad T \longmapsto [T]^B_D
\]
is an isomorphism of vector spaces over K.

Proof. 1st We show that this map is linear. For this, we choose arbitrary $T_1, T_2 \in L_K(V,W)$ and $c \in K$. We need to check that

1) $[T_1 + T_2]^B_D = [T_1]^B_D + [T_2]^B_D$,

2) $[c\,T_1]^B_D = c\,[T_1]^B_D$.

To show 1), we note that, for all v ∈ V , we have

\[
\begin{aligned}
[(T_1 + T_2)(v)]_D &= [T_1(v) + T_2(v)]_D \\
&= [T_1(v)]_D + [T_2(v)]_D \\
&= [T_1]^B_D\,[v]_B + [T_2]^B_D\,[v]_B \\
&= \big( [T_1]^B_D + [T_2]^B_D \big) [v]_B.
\end{aligned}
\]

Then 1) follows from the uniqueness of the matrix associated to $T_1 + T_2$. The proof of 2) is similar.

2nd It remains to show that the function is bijective. Proposition 2.4.1 implies that it is injective. To show that it is surjective, we choose an arbitrary matrix $M \in \mathrm{Mat}_{m \times n}(K)$. We need to show that there exists $T \in L_K(V,W)$ such that $[T]^B_D = M$. To do this, we consider the linear map
\[
T_M : K^n \longrightarrow K^m, \qquad X \longmapsto MX
\]

associated to M. Proposition 2.3.4 implies that the maps

\[
\varphi : V \longrightarrow K^n, \quad v \longmapsto [v]_B
\qquad \text{and} \qquad
\psi : W \longrightarrow K^m, \quad w \longmapsto [w]_D
\]
are isomorphisms. Thus the composition
\[
T = \psi^{-1} \circ T_M \circ \varphi : V \longrightarrow W
\]
is a linear map, that is, an element of $L_K(V,W)$. For all $v \in V$, we have
\[
[T(v)]_D = \psi(T(v)) = (\psi \circ T)(v) = (T_M \circ \varphi)(v) = T_M(\varphi(v)) = T_M([v]_B) = M[v]_B,
\]
and so $M = [T]^B_D$, as desired.

Corollary 2.4.4. If dimK (V ) = n and dimK (W ) = m, then dimK LK (V,W ) = mn.

Proof. Since LK (V,W ) ' Matm×n(K), we have

dimK LK (V,W ) = dimK Matm×n(K) = mn.

Example 2.4.5. If V is a vector space of dimension n, it follows from Corollary 2.4.4 that its dual $V^* = L_K(V, K)$ has dimension $n \cdot 1 = n$ (see Example 2.2.4 for the notion of the dual of V). Also, the vector space $L_K(V, V)$ of maps from V to itself has dimension $n \cdot n = n^2$.

Proposition 2.4.6. Let S : U → V and T : V → W be linear maps. Suppose that U, V, W are finite dimensional, and that A, B and D are bases of U, V and W respectively. Then we have
\[
[T \circ S]^A_D = [T]^B_D\,[S]^A_B.
\]

Proof. For all u ∈ U, we have

\[
[(T \circ S)(u)]_D = [T(S(u))]_D = [T]^B_D\,[S(u)]_B = [T]^B_D \big( [S]^A_B\,[u]_A \big) = \big( [T]^B_D\,[S]^A_B \big) [u]_A.
\]

Corollary 2.4.7. Let T : V → W be a linear map. Suppose that V and W are of finite dimension n and that B and D are bases of V and W respectively. Then T is invertible if and only if the matrix $[T]^B_D \in \mathrm{Mat}_{n \times n}(K)$ is invertible, in which case we have
\[
[T^{-1}]^D_B = \big( [T]^B_D \big)^{-1}.
\]

Proof. If T is invertible, we have
\[
T \circ T^{-1} = I_W \quad \text{and} \quad T^{-1} \circ T = I_V,
\]
and so
\[
I_n = [I_W]^D_D = [T \circ T^{-1}]^D_D = [T]^B_D\,[T^{-1}]^D_B
\quad \text{and} \quad
I_n = [I_V]^B_B = [T^{-1} \circ T]^B_B = [T^{-1}]^D_B\,[T]^B_D.
\]
Thus $[T]^B_D$ is invertible and its inverse is $[T^{-1}]^D_B$.

Conversely, if $[T]^B_D$ is invertible, there exists a linear map S : W → V such that $[S]^D_B = \big( [T]^B_D \big)^{-1}$. We have
\[
[T \circ S]^D_D = [T]^B_D\,[S]^D_B = I_n = [I_W]^D_D
\quad \text{and} \quad
[S \circ T]^B_B = [S]^D_B\,[T]^B_D = I_n = [I_V]^B_B,
\]
thus $T \circ S = I_W$ and $S \circ T = I_V$. Therefore T is invertible and its inverse is $T^{-1} = S$.

2.5 Change of coordinates

We first recall the notion of a change of coordinates matrix.

Proposition 2.5.1. Let $B = \{v_1, \ldots, v_n\}$ and $B' = \{v_1', \ldots, v_n'\}$ be two bases of the same vector space V of dimension n over K. There exists a unique invertible matrix $P \in \mathrm{Mat}_{n \times n}(K)$ such that
\[
[v]_{B'} = P\,[v]_B \quad \text{for all } v \in V.
\]
This matrix is given by
\[
P = [I_V]^B_{B'} = \big( [v_1]_{B'} \ \cdots \ [v_n]_{B'} \big),
\]
that is, the j-th column of P is $[v_j]_{B'}$ for $j = 1, \ldots, n$. Its inverse is
\[
P^{-1} = [I_V]^{B'}_B = \big( [v_1']_B \ \cdots \ [v_n']_B \big).
\]

Proof. Since the identity map $I_V$ on V is linear, Proposition 2.4.1 implies that there exists a unique matrix $P \in \mathrm{Mat}_{n \times n}(K)$ such that
\[
[v]_{B'} = [I_V(v)]_{B'} = P\,[v]_B
\]
for all $v \in V$. This matrix is
\[
P = [I_V]^B_{B'} = \big( [I_V(v_1)]_{B'} \ \cdots \ [I_V(v_n)]_{B'} \big) = \big( [v_1]_{B'} \ \cdots \ [v_n]_{B'} \big).
\]
Also, since $I_V$ is invertible and equal to its own inverse, Corollary 2.4.7 gives
\[
P^{-1} = [I_V^{-1}]^{B'}_B = [I_V]^{B'}_B.
\]

The matrix P is called the change of coordinates matrix from the basis B to the basis B′.

Example 2.5.2. The set $B = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \end{pmatrix} \right\}$ is a basis of $\mathbb{R}^2$ since its cardinality is $|B| = 2 = \dim_{\mathbb{R}} \mathbb{R}^2$ and B consists of linearly independent vectors. For the same reason, $B' = \left\{ \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\}$ is also a basis of $\mathbb{R}^2$. We see that
\[
\begin{pmatrix} 1 \\ 2 \end{pmatrix} = -2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} + 3 \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 1 \\ 3 \end{pmatrix} = -3 \begin{pmatrix} 1 \\ -1 \end{pmatrix} + 4 \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\]
thus
\[
[I_{\mathbb{R}^2}]^B_{B'} = \left( \left[ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right]_{B'} \ \left[ \begin{pmatrix} 1 \\ 3 \end{pmatrix} \right]_{B'} \right) = \begin{pmatrix} -2 & -3 \\ 3 & 4 \end{pmatrix}
\]
and so
\[
[v]_{B'} = \begin{pmatrix} -2 & -3 \\ 3 & 4 \end{pmatrix} [v]_B
\]
for all $v \in \mathbb{R}^2$.

We can also do this calculation using the standard basis $E = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$ of $\mathbb{R}^2$ as an intermediary. Since $I_{\mathbb{R}^2} = I_{\mathbb{R}^2} \circ I_{\mathbb{R}^2}$, we see that
\[
[I_{\mathbb{R}^2}]^B_{B'}
= [I_{\mathbb{R}^2}]^E_{B'}\,[I_{\mathbb{R}^2}]^B_E
= \big( [I_{\mathbb{R}^2}]^{B'}_E \big)^{-1} [I_{\mathbb{R}^2}]^B_E
= \begin{pmatrix} 1 & 1 \\ -1 & 0 \end{pmatrix}^{-1} \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix}
= \begin{pmatrix} -2 & -3 \\ 3 & 4 \end{pmatrix}.
\]
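The second computation in Example 2.5.2 is exactly the one a computer algebra system performs. A sketch assuming the Python library SymPy:

from sympy import Matrix

PB  = Matrix([[1, 1], [2, 3]])    # columns: the vectors of B, written in E
PBp = Matrix([[1, 1], [-1, 0]])   # columns: the vectors of B', written in E

P = PBp.inv() * PB                # the change of coordinates matrix from B to B'
print(P)                          # Matrix([[-2, -3], [3, 4]]), as computed above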

The second method used in the above example can be generalized as follows:

Proposition 2.5.3. If B, B′ and B″ are three bases of the same vector space V, we have
\[
[I_V]^B_{B''} = [I_V]^{B'}_{B''}\,[I_V]^B_{B'}.
\]

Change of basis matrices also allow us to relate the matrices of a linear map T : V → W with respect to different choices of bases for V and W.

Proposition 2.5.4. Let T : V → W be a linear map. Suppose that B and B′ are bases of V and that D and D′ are bases of W. Then we have
\[
[T]^{B'}_{D'} = [I_W]^D_{D'}\,[T]^B_D\,[I_V]^{B'}_B.
\]

Proof. To see this, it suffices to write T = IW ◦ T ◦ IV and apply Proposition 2.4.6.

We conclude by recalling the following result, whose proof is left as an exercise and which provides a useful complement to Proposition 2.5.1.

Proposition 2.5.5. Let V be a vector space over K of finite dimension n, let $B = \{v_1, \ldots, v_n\}$ be a basis of V, and let $B' = \{v_1', \ldots, v_n'\}$ be a set of n elements of V. Then B′ is a basis of V if and only if the matrix
\[
P = \big( [v_1']_B \ \cdots \ [v_n']_B \big)
\]
is invertible.

2.6 Endomorphisms and invariant subspaces

A linear map T : V → V from a vector space V to itself is called an endomorphism of V , or a linear operator on V . We denote by

EndK (V ) := LK (V,V ) the set of endomorphisms of V . By Theorem 2.2.3, this is a vector space over K.

Suppose that V has finite dimension n and that $B = \{v_1, \ldots, v_n\}$ is a basis of V. For every linear operator T : V → V, we simply write $[T]_B$ for the matrix $[T]^B_B$. With this convention, we have
\[
[T(v)]_B = [T]_B\,[v]_B \quad \text{for all } v \in V.
\]
Theorem 2.4.3 tells us that we have an isomorphism of vector spaces
\[
\mathrm{End}_K(V) \longrightarrow \mathrm{Mat}_{n \times n}(K), \qquad T \longmapsto [T]_B.
\]
This implies:
\[
\dim_K(\mathrm{End}_K(V)) = \dim_K(\mathrm{Mat}_{n \times n}(K)) = n^2.
\]

Furthermore, if T1 and T2 are two linear operators on V and if c ∈ K, then we have:

[T1 + T2]B = [T1]B + [T2]B and [c T1]B = c [T1]B.

The composite T1 ◦ T2 : V → V is also a linear operator on V and Proposition 2.4.6 gives

[T1 ◦ T2]B = [T1]B [T2]B.

Finally, Propositions 2.5.1 and 2.5.4 tell us that, if B′ is another basis of V, then, for all $T \in \mathrm{End}_K(V)$, we have
\[
[T]_{B'} = P^{-1}\,[T]_B\,P \qquad \text{where } P = [I_V]^{B'}_B.
\]
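This change of basis formula is easy to check numerically; note that the trace and the determinant do not change, as must be the case for similar matrices (see Proposition 3.1.8 below). A sketch assuming the Python library SymPy (the operator and the new basis are arbitrary choices):

from sympy import Matrix

T_B = Matrix([[1, 2], [3, 1]])    # [T]_B, with B the standard basis of Q^2
P = Matrix([[1, 1], [1, -1]])     # columns: the vectors of B', written in B

T_Bp = P.inv() * T_B * P          # [T]_{B'} = P^(-1) [T]_B P
assert T_B.trace() == T_Bp.trace() and T_B.det() == T_Bp.det()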

As explained in the introduction, one of the principal goals of this course is to give the simplest possible description of an endomorphism of a vector space. This description involves the following notion:

Definition 2.6.1 (Invariant subspace). Let T ∈ EndK (V ). We say that a subspace U of V is T -invariant if T (u) ∈ U for all u ∈ U.

The proof of the following result is left as an exercise:

Proposition 2.6.2. Let T ∈ EndK (V ) and let U1,...,Us be T -invariant subspaces of V . Then U1 + ··· + Us and U1 ∩ · · · ∩ Us are also T -invariant.

We will now show the following:

Proposition 2.6.3. Let $T \in \mathrm{End}_K(V)$ and let U be a T-invariant subspace of V. The function
\[
T|_U : U \longrightarrow U, \qquad u \longmapsto T(u)
\]
is a linear map.

We say that $T|_U$ is the restriction of T to U. Proposition 2.6.3 tells us that this is a linear operator on U.

Proof. For all $u_1, u_2 \in U$ and $c \in K$, we have
\[
T|_U(u_1 + u_2) = T(u_1 + u_2) = T(u_1) + T(u_2) = T|_U(u_1) + T|_U(u_2),
\]
\[
T|_U(c\,u_1) = T(c\,u_1) = c\,T(u_1) = c\,T|_U(u_1).
\]
Thus $T|_U : U \to U$ is indeed linear.

Finally, the following criterion is useful for determining if a subspace U of V is invariant under an endomorphism of V .

Proposition 2.6.4. Let T ∈ EndK (V ) and let {u1,..., um} be a generating set of a subspace U of V . Then U is T -invariant if and only if T (ui) ∈ U for all i = 1, . . . , m.

Proof. By definition, if U is T-invariant, we have $T(u_1), \ldots, T(u_m) \in U$. Conversely, suppose that $T(u_i) \in U$ for all $i = 1, \ldots, m$ and choose an arbitrary element u of U. Since $U = \langle u_1, \ldots, u_m \rangle_K$, there exist $a_1, \ldots, a_m \in K$ such that $u = a_1 u_1 + \cdots + a_m u_m$. Therefore

T (u) = a1 T (u1) + ··· + amT (um) ∈ U.

Thus U is T-invariant.

Example 2.6.5. Let $C^\infty(\mathbb{R})$ be the set of infinitely differentiable functions $f : \mathbb{R} \to \mathbb{R}$. The derivative
\[
D : C^\infty(\mathbb{R}) \longrightarrow C^\infty(\mathbb{R}), \qquad f \longmapsto f'
\]
is an $\mathbb{R}$-linear operator on $C^\infty(\mathbb{R})$. Let U be the subspace of $C^\infty(\mathbb{R})$ generated by the functions $f_1$, $f_2$ and $f_3$ given by
\[
f_1(x) = e^{2x}, \qquad f_2(x) = x\,e^{2x}, \qquad f_3(x) = x^2 e^{2x}.
\]
For all $x \in \mathbb{R}$, we have
\[
\begin{aligned}
f_1'(x) &= 2e^{2x} = 2 f_1(x), \\
f_2'(x) &= e^{2x} + 2x\,e^{2x} = f_1(x) + 2 f_2(x), \\
f_3'(x) &= 2x\,e^{2x} + 2x^2 e^{2x} = 2 f_2(x) + 2 f_3(x).
\end{aligned}
\]
Thus
\[
D(f_1) = 2 f_1, \qquad D(f_2) = f_1 + 2 f_2, \qquad D(f_3) = 2 f_2 + 2 f_3 \tag{2.2}
\]
belong to U and so U is a D-invariant subspace of $C^\infty(\mathbb{R})$.

The functions $f_1$, $f_2$ and $f_3$ are linearly independent, since if $a_1, a_2, a_3 \in \mathbb{R}$ satisfy $a_1 f_1 + a_2 f_2 + a_3 f_3 = 0$, then we have
\[
a_1 e^{2x} + a_2 x\,e^{2x} + a_3 x^2 e^{2x} = 0 \quad \text{for all } x \in \mathbb{R}.
\]
Since $e^{2x} \neq 0$ for any $x \in \mathbb{R}$, we conclude (by dividing by $e^{2x}$) that
\[
a_1 + a_2 x + a_3 x^2 = 0 \quad \text{for all } x \in \mathbb{R},
\]
and so $a_1 = a_2 = a_3 = 0$ (a nonzero polynomial of degree at most 2 has at most 2 roots). Therefore $B = \{f_1, f_2, f_3\}$ is a basis of U and the formulas (2.2) yield
\[
[D|_U]_B = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{pmatrix}.
\]
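The relations (2.2) can be verified symbolically. A sketch assuming the Python library SymPy:

from sympy import symbols, exp, diff, simplify

x = symbols('x')
f1, f2, f3 = exp(2*x), x*exp(2*x), x**2*exp(2*x)

assert simplify(diff(f1, x) - 2*f1) == 0           # D(f1) = 2 f1
assert simplify(diff(f2, x) - (f1 + 2*f2)) == 0    # D(f2) = f1 + 2 f2
assert simplify(diff(f3, x) - (2*f2 + 2*f3)) == 0  # D(f3) = 2 f2 + 2 f3
# reading the coefficients off columnwise gives [D|_U]_B as above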

Combining the above ideas with those of Section 1.4, we obtain: Theorem 2.6.6. Let T : V −→ V be a linear operator on a finite dimensional vector space V .

Suppose that V is the direct sum of T -invariant subspaces V1,...,Vs. Let Bi = {vi,1,..., vi,ni } be a basis of Vi for i = 1, . . . , s. Then

\[
B = B_1 \sqcup \cdots \sqcup B_s = \{ v_{1,1}, \ldots, v_{1,n_1}, \ \ldots\ , v_{s,1}, \ldots, v_{s,n_s} \}
\]
is a basis of V and we have
\[
[T]_B = \begin{pmatrix}
[T|_{V_1}]_{B_1} & 0 & \cdots & 0 \\
0 & [T|_{V_2}]_{B_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & [T|_{V_s}]_{B_s}
\end{pmatrix}, \tag{2.3}
\]
that is, the matrix of T relative to the basis B is block diagonal, the i-th block on the diagonal being the matrix $[T|_{V_i}]_{B_i}$ of the restriction of T to $V_i$ relative to the basis $B_i$.

The fact that B is a basis of V follows from Proposition 1.4.8. We prove Equation (2.3) by induction on s. For s = 1, there is nothing to prove. For s = 2, it is easier to change notation. It suffices to prove: Lemma 2.6.7. Let T : V → V be a linear operator on a finite dimensional vector space V . Suppose that V = U ⊕ W where U and W are T -invariant subspaces. Let A = {u1,..., um} be a basis of U and D = {w1,..., wn} a basis of W . Then

\[
B = \{ u_1, \ldots, u_m, w_1, \ldots, w_n \}
\]
is a basis of V and
\[
[T]_B = \begin{pmatrix} [T|_U]_A & 0 \\ 0 & [T|_W]_D \end{pmatrix}.
\]

Proof. Write $[T|_U]_A = (a_{ij})$ and $[T|_W]_D = (b_{ij})$. Then, for $j = 1, \ldots, m$, we have
\[
T(u_j) = T|_U(u_j) = a_{1j} u_1 + \cdots + a_{mj} u_m
\implies [T(u_j)]_B = (a_{1j}, \ldots, a_{mj}, 0, \ldots, 0)^t.
\]
Similarly, for $j = 1, \ldots, n$, we have
\[
T(w_j) = T|_W(w_j) = b_{1j} w_1 + \cdots + b_{nj} w_n
\implies [T(w_j)]_B = (0, \ldots, 0, b_{1j}, \ldots, b_{nj})^t.
\]
Thus,
\[
[T]_B = \big( [T(u_1)]_B \ \cdots \ [T(u_m)]_B \ [T(w_1)]_B \ \cdots \ [T(w_n)]_B \big)
= \begin{pmatrix}
a_{11} & \cdots & a_{1m} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{m1} & \cdots & a_{mm} & 0 & \cdots & 0 \\
0 & \cdots & 0 & b_{11} & \cdots & b_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & b_{n1} & \cdots & b_{nn}
\end{pmatrix}
= \begin{pmatrix} [T|_U]_A & 0 \\ 0 & [T|_W]_D \end{pmatrix}.
\]

Proof of Theorem 2.6.6. As explained above, it suffices to prove (2.3). The case where s = 1 is clear. Now suppose that $s \geq 2$ and that the result holds for all smaller values of s. The hypotheses of Lemma 2.6.7 are fulfilled for the choices
\[
U := V_1, \quad A := B_1, \quad W := V_2 \oplus \cdots \oplus V_s \quad \text{and} \quad D := B_2 \sqcup \cdots \sqcup B_s.
\]
Thus we have
\[
[T]_B = \begin{pmatrix} [T|_{V_1}]_{B_1} & 0 \\ 0 & [T|_W]_D \end{pmatrix}.
\]
Since $W = V_2 \oplus \cdots \oplus V_s$ is a direct sum of $s - 1$ $T|_W$-invariant subspaces, the induction hypothesis gives
\[
[T|_W]_D = \begin{pmatrix}
[T|_{V_2}]_{B_2} & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & [T|_{V_s}]_{B_s}
\end{pmatrix},
\]
since $(T|_W)|_{V_i} = T|_{V_i}$ for $i = 2, \ldots, s$. The result follows.

Exercises. 2.6.1. Prove Proposition 2.6.2.

2.6.2. Let V be a vector space over a field K and let S and T be commuting linear operators on V (i.e. such that S ◦ T = T ◦ S). Show that ker(S) and Im(S) are T -invariant subspaces of V .

2.6.3. Let D be the linear operator of differentiation on the space $C^\infty(\mathbb{R})$ of infinitely differentiable functions $f : \mathbb{R} \to \mathbb{R}$. Consider the vector subspace $U = \langle 1, x, \ldots, x^m \rangle_{\mathbb{R}}$ of $C^\infty(\mathbb{R})$ for an integer $m \geq 0$.

(i) Show that U is D-invariant.

(ii) Show that B = {1, x, . . . , xm} is a basis of U.

(iii) Calculate $[D|_U]_B$.

2.6.4. Let D be as in Exercise 2.6.3, and let $U = \langle \cos(3x), \sin(3x), x\cos(3x), x\sin(3x) \rangle_{\mathbb{R}}$.

(i) Show that U is D-invariant.

(ii) Show that B = {cos(3x), sin(3x), x cos(3x), x sin(3x)} is a basis of U.

(iii) Calculate $[D|_U]_B$.

2.6.5. Let V be a vector space of dimension 4 over Q, let A = {v1, v2, v3, v4} be a basis of V , and let T : V → V be the linear map determined by the conditions

T (v1) = v1 + v2 + 2v3,T (v2) = v1 + v2 + 2v4,

T (v3) = v1 − 3v2 + v3 − v4,T (v4) = −3v1 + v2 − v3 + v4.

(i) Show that $B_1 = \{v_1 + v_2, v_3 + v_4\}$ and $B_2 = \{v_1 - v_2, v_3 - v_4\}$ are bases (respectively) of T-invariant subspaces $V_1$ and $V_2$ of V such that $V = V_1 \oplus V_2$.

(ii) Calculate $[T|_{V_1}]_{B_1}$ and $[T|_{V_2}]_{B_2}$. Then find $[T]_B$, where $B = B_1 \sqcup B_2$.

2.6.6. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear map whose matrix relative to the standard basis E of $\mathbb{R}^3$ is
\[
[T]_E = \begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}.
\]

(i) Show that $B_1 = \{(1, 1, 1)^t\}$ and $B_2 = \{(1, -1, 0)^t, (0, 1, -1)^t\}$ are bases (respectively) of T-invariant subspaces $V_1$ and $V_2$ of $\mathbb{R}^3$ such that $\mathbb{R}^3 = V_1 \oplus V_2$.

(ii) Calculate $[T|_{V_1}]_{B_1}$ and $[T|_{V_2}]_{B_2}$, and deduce $[T]_B$, where $B = B_1 \sqcup B_2$.

2.6.7. Let V be a finite dimensional vector space over a field K, let T : V → V be an endomorphism of V, and let $n = \dim_K(V)$. Suppose that there exists an integer $m \geq 0$ and a vector $v \in V$ such that $V = \langle v, T(v), T^2(v), \ldots, T^m(v) \rangle_K$. Show that $B = \{v, T(v), \ldots, T^{n-1}(v)\}$ is a basis of V and that there exist $a_0, \ldots, a_{n-1} \in K$ such that
\[
[T]_B = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_0 \\
1 & 0 & \cdots & 0 & a_1 \\
0 & 1 & \cdots & 0 & a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & a_{n-1}
\end{pmatrix}.
\]

Note. We say that a vector space V with the above property is T-cyclic, or simply cyclic.

The last three exercises give examples of the general situation discussed in Exercise 2.6.7.

2.6.8. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear map whose matrix in the standard basis E is
\[
[T]_E = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}.
\]

(i) Let v = (1, 1)t. Show that B = {v,T (v)} is a basis of R2.

(ii) Determine $[T^2(v)]_B$.

(iii) Calculate [ T ]B.

2.6.9. Let V be a vector space of dimension 3 over Q, let A = {v1, v2, v3} be a basis of V , and let T : V → V be the linear map determined by the conditions

T (v1) = v1 + v3,T (v2) = v1 − v2 and T (v3) = v2.

(i) Find [ T ]A.

(ii) Show that $B = \{v_1, T(v_1), T^2(v_1)\}$ is a basis of V.

(iii) Calculate [ T ]B.

2.6.10. Let $V = \langle 1, e^x, e^{2x} \rangle_{\mathbb{R}} \subset C^\infty(\mathbb{R})$ and let D be the restriction to V of the operator of differentiation on $C^\infty(\mathbb{R})$. Then $A = \{1, e^x, e^{2x}\}$ is a basis of V (you do not need to show this).

(i) Let $f(x) = 1 + e^x + e^{2x}$. Show that $B = \{f, D(f), D^2(f)\}$ is a basis of V.

(ii) Calculate $[D]_B$.

Chapter 3

Review of diagonalization

In this chapter, we recall how to determine if a linear operator or a square matrix is diago- nalizable and, if so, how to diagonalize it.

3.1 Determinants and similar matrices

Let K be a field. We define the determinant of a square matrix $A = (a_{ij}) \in \mathrm{Mat}_{n \times n}(K)$ by
\[
\det(A) = \sum_{(j_1, \ldots, j_n) \in S_n} \epsilon(j_1, \ldots, j_n)\, a_{1 j_1} \cdots a_{n j_n},
\]
where $S_n$ denotes the set of permutations of $(1, 2, \ldots, n)$ and where, for a permutation $(j_1, \ldots, j_n) \in S_n$, the expression $\epsilon(j_1, \ldots, j_n)$ represents the signature of $(j_1, \ldots, j_n)$, given by
\[
\epsilon(j_1, \ldots, j_n) = (-1)^{N(j_1, \ldots, j_n)},
\]
where $N(j_1, \ldots, j_n)$ denotes the number of pairs of indices $(k, l)$ with $1 \leq k < l \leq n$ and $j_k > j_l$ (called the number of inversions of $(j_1, \ldots, j_n)$).
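The definition can be transcribed literally into a program, which is convenient for checking small examples (although the sum has n! terms, so this is not a practical way of computing determinants). A sketch in Python, using exact rational arithmetic:

from itertools import permutations
from fractions import Fraction

def sign(p):
    # (-1)^N(p), where N(p) is the number of inversions of p
    n = len(p)
    N = sum(1 for k in range(n) for l in range(k + 1, n) if p[k] > p[l])
    return -1 if N % 2 else 1

def det(A):
    # sum over all permutations of sign(p) * a_{1,j1} * ... * a_{n,jn}
    n = len(A)
    total = Fraction(0)
    for p in permutations(range(n)):
        term = Fraction(sign(p))
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

A = [[Fraction(2), Fraction(-1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(-1)],
     [Fraction(1), Fraction(1), Fraction(1)]]
print(det(A))   # 3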

Recall that the determinant satisfies the following properties:

Theorem 3.1.1. Let n ∈ N>0 and let A, B ∈ Matn×n(K). Then we have:

(i) det(I) = 1,

(ii) det(At) = det(A),

(iii) det(AB) = det(A) det(B), where I denotes the n × n identity matrix and At denotes the transpose of A. Moreover, the matrix A is invertible if and only if det(A) 6= 0, in which case


(iv) det(A−1) = det(A)−1.

In fact, the above formula for the determinant applies to any square matrix with coeffi- cients in a commutative ring. A proof of Theorem 3.1.1 in the general case, with K replaced by an arbitrary commutative ring, can be found in Appendix B.

The “multiplicative” properties (iii) and (iv) of the determinant imply the following result.

Proposition 3.1.2. Let T be an endomorphism of a finite dimensional vector space V , and let B, B0 be two bases of V . Then we have

\[
\det\,[T]_B = \det\,[T]_{B'}.
\]

Proof. We have $[T]_{B'} = P^{-1}\,[T]_B\,P$ where $P = [I_V]^{B'}_B$, and so
\[
\det\,[T]_{B'}
= \det(P^{-1}\,[T]_B\,P)
= \det(P^{-1}) \det\,[T]_B \det(P)
= \det(P)^{-1} \det\,[T]_B \det(P)
= \det\,[T]_B.
\]

This allows us to formulate the following: Definition 3.1.3 (Determinant). Let T : V → V be a linear operator on a finite dimensional vector space V . We define the determinant of T by

det(T ) = det[ T ]B where B denotes an arbitrary basis of V .

Theorem 3.1.1 allows us to study the behavior of the determinant under the composition of linear operators:

Theorem 3.1.4. Let S and T be linear operators on a finite dimensional vector space V . We have:

(i) det(IV ) = 1, (ii) det(S ◦ T ) = det(S) det(T ).

Moreover, T is invertible if and only if det(T ) 6= 0, in which case we have

(iii) $\det(T^{-1}) = \det(T)^{-1}$.

Proof. Let B be a basis of V. Since $[I_V]_B = I$ is the $n \times n$ identity matrix, we have $\det(I_V) = \det(I) = 1$. This proves (i). For (iii), we note, by Corollary 2.4.7, that T is invertible if and only if the matrix $[T]_B$ is invertible and that, in this case,
\[
[T^{-1}]_B = [T]_B^{-1}.
\]
This proves (iii). Relation (ii) is left as an exercise.

Theorem 3.1.4 illustrates the phenomenon mentioned in the introduction that, most of the time in this course, a result about matrices implies a corresponding result about linear operators and vice-versa. We conclude with the following notion:

Definition 3.1.5 (Similarity). Let $n \in \mathbb{N}_{>0}$ and let $A, B \in \mathrm{Mat}_{n \times n}(K)$ be square matrices of the same size $n \times n$. We say that A is similar to B (or conjugate to B) if there exists an invertible matrix $P \in \mathrm{Mat}_{n \times n}(K)$ such that $B = P^{-1} A P$. We then write $A \sim B$.

Example 3.1.6. If T : V → V is a linear operator on a finite dimensional vector space V and if B and B′ are two bases of V, then we have
\[
[T]_{B'} = P^{-1}\,[T]_B\,P
\]
where $P = [I_V]^{B'}_B$, and so $[T]_{B'}$ and $[T]_B$ are similar.

More precisely, we can show:

Proposition 3.1.7. Let T : V → V be a linear operator on a finite dimensional vector space V, and let B be a basis of V. A matrix M is similar to $[T]_B$ if and only if there exists a basis B′ of V such that $M = [T]_{B'}$.

Finally, Theorem 3.1.1 provides a necessary condition for two matrices to be similar:

Proposition 3.1.8. If A and B are two similar matrices, then det(A) = det(B).

The proof is left as an exercise. Proposition 3.1.2 can be viewed as a corollary to this result, together with Example 3.1.6.

Exercises.

3.1.1. Show that similarity ∼ is an equivalence relation on Matn×n(K).

3.1.2. Prove Proposition 3.1.7.

3.1.3. Prove Proposition 3.1.8.

3.2 Diagonalization of operators

In this section, we fix a vector space V of finite dimension n over a field K and a linear operator T : V → V . We say that:

• T is diagonalizable if there exists a basis B of V such that [ T ]B is a diagonal matrix,

• an element λ of K is an eigenvalue of T if there exists a nonzero vector v of V such that T (v) = λv ,

• a nonzero vector v that satisfies the above relation for an element λ of K is called an eigenvector of T or, more precisely, an eigenvector of T for the eigenvalue λ.

We also recall that if λ ∈ K is an eigenvalue of T then the set

Eλ := {v ∈ V ; T (v) = λv}

(also denoted Eλ(T )) is a subspace of V since, for v ∈ V , we have

T (v) = λv ⇐⇒ (T − λI)(v) = 0 and so Eλ = ker(T − λI).

We say that $E_\lambda$ is the eigenspace of T for the eigenvalue λ. The eigenvectors of T for this eigenvalue are the nonzero elements of $E_\lambda$.

For each eigenvalue $\lambda \in K$, the eigenspace $E_\lambda$ is a T-invariant subspace of V since, if $v \in E_\lambda$, we have $T(v) = \lambda v \in E_\lambda$. Also note that an element v of V is an eigenvector of T if and only if $\langle v \rangle_K$ is a T-invariant subspace of V of dimension 1 (exercise).

Proposition 3.2.1. The operator T is diagonalizable if and only if V admits a basis consisting of eigenvectors of T.

Proof. Let B = {v_1, . . . , v_n} be a basis of V , and let λ_1, . . . , λ_n ∈ K. We have

[ T ]_B = diag(λ_1, . . . , λ_n) ⇐⇒ T(v_1) = λ_1 v_1 , . . . , T(v_n) = λ_n v_n.

The conclusion follows.

The crucial point is the following result:

Proposition 3.2.2. Let λ1, . . . , λs ∈ K be distinct eigenvalues of T . The sum Eλ1 +···+Eλs is direct.

Proof. In light of Definition 1.4.1, we need to show that the only possible choice of vectors

v1 ∈ Eλ1 ,..., vs ∈ Eλs such that

v1 + ··· + vs = 0 (3.1)

is v1 = ··· = vs = 0. We show this by induction on s, noting first that, for s = 1, the result is immediate.

Suppose that s ≥ 2 and that the sum Eλ1 + ··· + Eλs−1 is direct. Choose vectors

v1 ∈ Eλ1 ,..., vs ∈ Eλs satisfying (3.1) and apply T to both sides of this equality. Since T (vi) = λivi for i = 1, . . . , s and T (0) = 0, we get

λ1v1 + ··· + λsvs = 0. (3.2)

Subtracting from (3.2) the equality (3.1) multiplied by λs, we have

(λ1 − λs)v1 + ··· + (λs−1 − λs)vs−1 = 0.

Since (λi − λs)vi ∈ Eλi for i = 1, . . . , s − 1 and, by hypothesis, the sum Eλ1 + ··· + Eλs−1 is direct, we see that (λ1 − λs)v1 = ··· = (λs−1 − λs)vs−1 = 0.

Finally, since λs 6= λi for i = 1, . . . , s − 1, we conclude that v1 = ··· = vs−1 = 0, hence,

by (3.1), that vs = 0. Therefore the sum Eλ1 + ··· + Eλs is direct.

From this we conclude the following:

Theorem 3.2.3. The operator T has at most n = dim_K(V) eigenvalues. Let λ_1, . . . , λ_s be the distinct eigenvalues of T in K. Then T is diagonalizable if and only if

dim_K(V) = dim_K(E_{λ_1}) + ··· + dim_K(E_{λ_s}).

If this is the case, we obtain a basis B of V consisting of eigenvectors of T by concatenating bases of E_{λ_1}, . . . , E_{λ_s}, and

[ T ]_B = diag(λ_1, . . . , λ_1, . . . , λ_s, . . . , λ_s),

with each λ_i repeated a number of times equal to dim_K(E_{λ_i}).

Proof. First suppose that λ1, . . . , λs are distinct eigenvalues of T (not necessarily all of them).

Then, since the sum Eλ1 + ··· + Eλs is direct, we find that

dimK (Eλ1 + ··· + Eλs ) = dimK (Eλ1 ) + ··· + dimK (Eλs ).

Since Eλ1 + ··· + Eλs ⊆ V and dimK (Eλi ) ≥ 1 for i = 1, . . . , s, we deduce that

n = dimK (V ) ≥ dimK (Eλ1 ) + ··· + dimK (Eλs ) ≥ s.

This proves the first statement of the theorem: T has at most n eigenvalues.

Now suppose that λ_1, . . . , λ_s are all the distinct eigenvalues of T . By Propositions 3.2.1 and 3.2.2 and Corollary 1.4.9, we see that

T is diagonalizable ⇐⇒ V = Eλ1 ⊕ · · · ⊕ Eλs

⇐⇒ n = dimK (Eλ1 ) + ··· + dimK (Eλs ).

This proves the second assertion of the theorem. The last two follow from Theorem 2.6.6.

To be able to diagonalize T in practice, that is, to find a basis B of V such that [ T ]_B is diagonal, if such a basis exists, it suffices to be able to determine the eigenvalues of T and the corresponding eigenspaces. Since E_λ = ker(T − λI) for each eigenvalue λ, the problem reduces to the determination of the eigenvalues of T . To do this, we rely on the following result:

Proposition 3.2.4. Let B be a basis of V . The polynomial

charT (x) := det(x I − [ T ]B) ∈ K[x]

does not depend on the choice of B. The eigenvalues of T are the roots of this polynomial in K.

Before proving this result, we first note that the expression det(xI − [ T ]_B) really is an element of K[x], that is, a polynomial in x with coefficients in K. To see this, write [ T ]_B = (a_{ij}). Then we have

det(xI − [ T ]_B) = det \begin{pmatrix} x − a_{11} & −a_{12} & ⋯ & −a_{1n} \\ −a_{21} & x − a_{22} & ⋯ & −a_{2n} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ −a_{n1} & −a_{n2} & ⋯ & x − a_{nn} \end{pmatrix}
    = (x − a_{11})(x − a_{22}) ⋯ (x − a_{nn}) + (terms of degree ≤ n − 2)          (3.3)
    = x^n − (a_{11} + ⋯ + a_{nn}) x^{n−1} + ⋯ ,

as can be seen from the formula for the expansion of the determinant reviewed in Section 3.1. This polynomial is called the characteristic polynomial of T and is denoted char_T(x).

Proof of Proposition 3.2.4. Let B′ be another basis of V . We know that

[ T ]_{B′} = P^{-1} [ T ]_B P

where P = [ I ]_B^{B′}. We thus have

xI − [ T ]_{B′} = xI − P^{-1} [ T ]_B P = P^{-1} (xI − [ T ]_B) P,

and so

det(xI − [ T ]_{B′}) = det(P^{-1} (xI − [ T ]_B) P)
    = det(P)^{-1} det(xI − [ T ]_B) det(P)          (3.4)
    = det(xI − [ T ]_B).

This shows that the polynomial det(xI − [ T ]_B) is independent of the choice of B. Note that the calculations (3.4) involve matrices with coefficients in the polynomial ring K[x], and the multiplicativity of the determinant for a product of such matrices. This is justified in Appendix B. If the field K is infinite, we can avoid this difficulty by identifying the polynomials in K[x] with polynomial functions from K to K (see Section 4.2). Finally, let λ ∈ K. By definition, λ is an eigenvalue of T if and only if

ker(T − λI) ≠ {0} ⇐⇒ there exists v ∈ V \ {0} such that (T − λI)v = 0
    ⇐⇒ there exists X ∈ K^n \ {0} such that [ T − λI ]_B X = 0
    ⇐⇒ det([ T − λI ]_B) = 0.

Since [ T − λI ]_B = [ T ]_B − λ[ I ]_B = [ T ]_B − λI = −(λI − [ T ]_B), we also have

det([ T − λI ]_B) = (−1)^n char_T(λ).

Thus the eigenvalues of T are the roots of char_T(x).

Corollary 3.2.5. Let B be a basis of V . Write [ T ]_B = (a_{ij}). Then the number

trace(T) := a_{11} + a_{22} + ··· + a_{nn},

called the trace of T , does not depend on the choice of B. We have

char_T(x) = x^n − trace(T) x^{n−1} + ··· + (−1)^n det(T).

Proof. The calculations (3.3) show that the coefficient of x^{n−1} in det(xI − [ T ]_B) is equal to −(a_{11} + ··· + a_{nn}). Since char_T(x) = det(xI − [ T ]_B) does not depend on the choice of B, the sum a_{11} + ··· + a_{nn} does not depend on it either. This proves the first assertion of the corollary. For the second, it suffices to show that the constant term of char_T(x) is (−1)^n det(T). But this coefficient is

char_T(0) = det(−[ T ]_B) = (−1)^n det([ T ]_B) = (−1)^n det(T).
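As a quick numerical sanity check of Corollary 3.2.5, one can compare the coefficients of the characteristic polynomial of a matrix with its trace and determinant. The following minimal Python sketch (an aside, not part of the original notes) uses numpy.poly, which returns the coefficients of det(xI − A) with the leading coefficient first:

    import numpy as np

    A = np.random.rand(4, 4)        # a random real 4 x 4 matrix
    n = A.shape[0]
    coeffs = np.poly(A)             # coefficients of char_A(x) = det(xI - A)

    print(np.isclose(coeffs[1], -np.trace(A)))                   # coefficient of x^(n-1) is -trace(A)
    print(np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A)))  # constant term is (-1)^n det(A)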

Example 3.2.6. With the notation of Example 2.6.5, we set T = D|_U where U is the subspace of C^∞(R) generated by the functions f_1(x) = e^{2x}, f_2(x) = x e^{2x} and f_3(x) = x^2 e^{2x}, and where D is the derivative operator on C^∞(R). We have seen that B = {f_1, f_2, f_3} is a basis for U and that

[ T ]_B = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{pmatrix}.

The characteristic polynomial of T is therefore

char_T(x) = det(xI − [ T ]_B) = det \begin{pmatrix} x − 2 & −1 & 0 \\ 0 & x − 2 & −2 \\ 0 & 0 & x − 2 \end{pmatrix} = (x − 2)^3.

Since x = 2 is the only root of (x − 2)^3 in R, we see that 2 is the only eigenvalue of T . The eigenspace of T for this eigenvalue is

E_2 = ker(T − 2I).

Let f ∈ U. We have

f ∈ E_2 ⇐⇒ (T − 2I)(f) = 0 ⇐⇒ [ T − 2I ]_B [f]_B = 0 ⇐⇒ ([ T ]_B − 2I)[f]_B = 0
    ⇐⇒ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} [f]_B = 0 ⇐⇒ [f]_B = \begin{pmatrix} t \\ 0 \\ 0 \end{pmatrix} with t ∈ R
    ⇐⇒ f = t f_1 with t ∈ R.

Thus E_2 = ⟨f_1⟩_R has dimension 1. Since

dim_R(E_2) = 1 ≠ 3 = dim_R(U),

Theorem 3.2.3 tells us that T is not diagonalizable.
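For readers who like to double-check such computations by machine, here is a minimal sketch (an aside, not part of the original notes) using sympy, whose is_diagonalizable method tests exactly the criterion of Theorem 3.2.3 for the matrix [ T ]_B:

    from sympy import Matrix

    M = Matrix([[2, 1, 0],
                [0, 2, 2],
                [0, 0, 2]])        # M = [T]_B from Example 3.2.6

    print(M.charpoly().as_expr())  # lambda**3 - 6*lambda**2 + 12*lambda - 8, i.e. (x - 2)^3
    print(M.eigenvects())          # a single eigenvalue 2 with a 1-dimensional eigenspace
    print(M.is_diagonalizable())   # False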

Exercises.

3.2.1. Let V be a vector space over a field K, let T : V → V be a linear operator on V , and let v1,..., vs be eigenvectors of T for the distinct eigenvalues λ1, . . . , λs ∈ K. Show that v1,..., vs are linearly independent over K. Hint. You can, for example, combine Proposition 3.2.2 with Exercise 1.4.3(ii).

3.2.2. Let V = ⟨ f_0, f_1, . . . , f_n ⟩_R be the subspace of C^∞(R) generated by the functions f_i(x) = e^{ix} for i = 0, . . . , n.

(i) Show that V is invariant under the operator D of derivation on C^∞(R) and that, for i = 0, . . . , n, the function f_i is an eigenvector of D with eigenvalue i.

(ii) Using Exercise 3.2.1, deduce that B = {f_0, f_1, . . . , f_n} is a basis of V .

(iii) Calculate [ D|_V ]_B.

3.2.3. Consider the subspace P_2(R) of C^∞(R) given by P_2(R) = ⟨ 1, x, x^2 ⟩_R. In each case, determine whether or not the given linear map is diagonalizable and, if it is, find a basis with respect to which its matrix is diagonal.

(i) T : P_2(R) → P_2(R) given by T(p(x)) = p'(x).

(ii) T : P_2(R) → P_2(R) given by T(p(x)) = p(2x).

3.2.4. Let T : Mat_{2×2}(R) → Mat_{2×2}(R) be the linear map given by T(A) = A^t.

(i) Find a basis of the eigenspaces of T for the eigenvalues 1 and −1.

(ii) Find a basis B of Mat_{2×2}(R) such that [ T ]_B is diagonal, and write [ T ]_B.

3.2.5. Let S : R^3 → R^3 be the reflection in the plane with equation x + y + z = 0.

(i) Describe geometrically the eigenspaces of S and give a basis of each.

(ii) Find a basis B of R^3 relative to which the matrix of S is diagonal and give [ S ]_B.

(iii) Using the above, find the matrix of S in the standard basis.

3.3 Diagonalization of matrices

As we have already mentioned, results on linear operators translate into results on matrices and vice versa. The problem of diagonalization is a good example of this.

In this section, we fix an integer n ≥ 1.

• We say that a matrix A ∈ Mat_{n×n}(K) is diagonalizable if there exists an invertible matrix P ∈ Mat_{n×n}(K) such that P^{-1} A P is a diagonal matrix.

• We say that an element λ of K is an eigenvalue of A if there exists a nonzero column vector X ∈ K^n such that AX = λX.

• A nonzero column vector X ∈ K^n that satisfies the relation AX = λX for a scalar λ ∈ K is called an eigenvector of A for the eigenvalue λ.

• If λ is an eigenvalue of A, then the set

E_λ(A) := {X ∈ K^n ; AX = λX} = {X ∈ K^n ; (A − λI)X = 0}

is a subspace of K^n called the eigenspace of A for the eigenvalue λ.

• The polynomial

char_A(x) := det(xI − A) ∈ K[x]

is called the characteristic polynomial of A.

Recall that to every A ∈ Mat_{n×n}(K), we associate a linear operator T_A : K^n → K^n given by T_A(X) = AX for all X ∈ K^n. Then we see that the above definitions coincide with the corresponding notions for the operator T_A. More precisely, we note that:

Proposition 3.3.1. Let A ∈ Matn×n(K).

(i) The eigenvalues of TA coincide with those of A.

(ii) If λ is an eigenvalue of T_A, then the eigenvectors of T_A for this eigenvalue are the same as those of A and we have E_λ(T_A) = E_λ(A).

(iii) char_{T_A}(x) = char_A(x).

Proof. Assertions (i) and (ii) follow directly from the definitions, while (iii) follows from the fact that [ T_A ]_E = A, where E is the standard basis of K^n.

We also note that:

Proposition 3.3.2. Let A ∈ Mat_{n×n}(K). The following conditions are equivalent:

(i) T_A is diagonalizable,

(ii) there exists a basis of K^n consisting of eigenvectors of A,

(iii) A is diagonalizable.

Furthermore, if these conditions are satisfied and if B = {X_1, . . . , X_n} is a basis of K^n consisting of eigenvectors of A, then the matrix P = (X_1 ··· X_n) is invertible and

P^{-1} A P = diag(λ_1, . . . , λ_n),

where λ_i is the eigenvalue of A corresponding to X_i for i = 1, . . . , n.

Proof. If T_A is diagonalizable, Proposition 3.2.1 shows that there exists a basis B = {X_1, . . . , X_n} of K^n consisting of eigenvectors of T_A or, equivalently, of A. For such a basis, the change of basis matrix P = [ I ]_E^B = (X_1 ··· X_n) from the basis B to the standard basis E is invertible and we find that

P^{-1} A P = P^{-1} [ T_A ]_E P = [ T_A ]_B = diag(λ_1, . . . , λ_n),

where λ_i is the eigenvalue of A corresponding to X_i for i = 1, . . . , n. This proves the implications (i)⇒(ii)⇒(iii) as well as the last assertion of the proposition. It remains to prove the implication (iii)⇒(i).

Suppose therefore that A is diagonalizable and that P ∈ Mat_{n×n}(K) is an invertible matrix such that P^{-1} A P is diagonal. Since P is invertible, its columns form a basis B of K^n and, with this choice of basis, we have P = [ I ]_E^B. Then [ T_A ]_B = P^{-1} [ T_A ]_E P = P^{-1} A P is diagonal, which proves that T_A is diagonalizable.

Combining Theorem 3.2.3 with the two preceding propositions, we have the following:

Theorem 3.3.3. Let A ∈ Mat_{n×n}(K). Then A has at most n eigenvalues in K. Let λ_1, . . . , λ_s be the distinct eigenvalues of A in K. Then A is diagonalizable if and only if

n = dim_K(E_{λ_1}(A)) + ··· + dim_K(E_{λ_s}(A)).

If this is the case, we obtain a basis B = {X_1, . . . , X_n} of K^n consisting of eigenvectors of A by concatenating bases of E_{λ_1}(A), . . . , E_{λ_s}(A). Then the matrix P = (X_1 ··· X_n) with columns X_1, . . . , X_n is invertible and

P^{-1} A P = diag(λ_1, . . . , λ_1, . . . , λ_s, . . . , λ_s),

with each λ_i repeated a number of times equal to dim_K(E_{λ_i}(A)).

 1¯ 2¯ 1¯  ¯ ¯ ¯ ¯ ¯ ¯ Example 3.3.4. Let A =  0 0 0  ∈ Mat3×3(F3), were F3 = {0, 1, 2} is the finite field 1¯ 2¯ 1¯ with 3 elements. Do the following:

◦ 1 Find the eigenvalues of A in F3. 2◦ For each eigenvalue, find a basis of the corresponding eigenspace. 3◦ Determine whether or not A is diagonalizable. If it is, find an invertible matrix P such that P −1AP is diagonal and give this diagonal matrix.

Solution: 1st We find ¯ ¯ ¯ x − 1 −2 −1 ¯ ¯ ¯ ¯ x − 1 −1 (expansion along the second charA(x) = 0 x 0 = x −1¯ x − 1¯ row) −1¯ −2¯ x − 1¯ = x(x2 + x) = x2(x + 1)¯ ,

and so the eigenvalues of A are 0¯ and −1¯ = 2.¯

2nd We have

E_{0̄}(A) = {X ∈ F_3^3 ; (A − 0̄I)X = 0} = {X ∈ F_3^3 ; AX = 0}.

In other words, E_{0̄}(A) is the solution set in F_3^3 of the homogeneous system of linear equations AX = 0. The latter does not change if we apply to the matrix A of the system a series of elementary row operations:

A = \begin{pmatrix} 1̄ & 2̄ & 1̄ \\ 0̄ & 0̄ & 0̄ \\ 1̄ & 2̄ & 1̄ \end{pmatrix} ∼ \begin{pmatrix} 1̄ & 2̄ & 1̄ \\ 0̄ & 0̄ & 0̄ \\ 0̄ & 0̄ & 0̄ \end{pmatrix}   (L_3 − L_1)

Therefore, E_{0̄}(A) is the solution set of the equation

x + 2̄y + z = 0̄.

Setting y = s and z = t, we see that x = −2̄s − t = s − t, and so

E_{0̄}(A) = { (s − t, s, t)^t ; s, t ∈ F_3 } = ⟨ (1̄, 1̄, 0̄)^t, (2̄, 0̄, 1̄)^t ⟩_{F_3}.

Thus B_1 = { (1̄, 1̄, 0̄)^t, (2̄, 0̄, 1̄)^t } is a basis of E_{0̄}(A) (it is clear that the two column vectors are linearly independent).

Similarly, we have:

E_{2̄}(A) = {X ∈ F_3^3 ; (A − 2̄I)X = 0} = {X ∈ F_3^3 ; (A + I)X = 0}.

We find

A + I = \begin{pmatrix} 2̄ & 2̄ & 1̄ \\ 0̄ & 1̄ & 0̄ \\ 1̄ & 2̄ & 2̄ \end{pmatrix} ∼ \begin{pmatrix} 1̄ & 1̄ & 2̄ \\ 0̄ & 1̄ & 0̄ \\ 1̄ & 2̄ & 2̄ \end{pmatrix}  (2̄L_1) ∼ \begin{pmatrix} 1̄ & 1̄ & 2̄ \\ 0̄ & 1̄ & 0̄ \\ 0̄ & 1̄ & 0̄ \end{pmatrix}  (L_3 − L_1) ∼ \begin{pmatrix} 1̄ & 0̄ & 2̄ \\ 0̄ & 1̄ & 0̄ \\ 0̄ & 0̄ & 0̄ \end{pmatrix}  (L_1 − L_2, L_3 − L_2)

Therefore, E_{2̄}(A) is the solution set of the system

x + 2̄z = 0̄,  y = 0̄.

Here, z is the only independent variable. Let z = t. Then x = −2̄t = t and y = 0̄, and so

E_{2̄}(A) = { (t, 0̄, t)^t ; t ∈ F_3 } = ⟨ (1̄, 0̄, 1̄)^t ⟩_{F_3},

and it follows that B_2 = { (1̄, 0̄, 1̄)^t } is a basis of E_{2̄}(A).

3rd Since A is a 3 × 3 matrix and

dim_{F_3} E_{0̄}(A) + dim_{F_3} E_{2̄}(A) = 3,

we conclude that A is diagonalizable, and that

B = B_1 ⊔ B_2 = { (1̄, 1̄, 0̄)^t, (2̄, 0̄, 1̄)^t, (1̄, 0̄, 1̄)^t }

is a basis of F_3^3 consisting of eigenvectors of A for the respective eigenvalues 0̄, 0̄, 2̄. Then the matrix

P = \begin{pmatrix} 1̄ & 2̄ & 1̄ \\ 1̄ & 0̄ & 0̄ \\ 0̄ & 1̄ & 1̄ \end{pmatrix}

is invertible and the product P^{-1}AP is a diagonal matrix with diagonal (0̄, 0̄, 2̄):

P^{-1}AP = \begin{pmatrix} 0̄ & 0̄ & 0̄ \\ 0̄ & 0̄ & 0̄ \\ 0̄ & 0̄ & 2̄ \end{pmatrix}.

Exercises.

3.3.1. In each case, find the real eigenvalues of the given matrix A ∈ Mat_{3×3}(R) and, for each of the eigenvalues, give a basis of the corresponding eigenspace. If A is diagonalizable, find an invertible matrix P ∈ Mat_{3×3}(R) such that P^{-1}AP is diagonal, and give P^{-1}AP.

(i) A = \begin{pmatrix} −2 & 0 & 5 \\ 0 & 5 & 0 \\ −4 & 0 & 7 \end{pmatrix},   (ii) A = \begin{pmatrix} 1 & −2 & 3 \\ 2 & 6 & −6 \\ 1 & 2 & −1 \end{pmatrix}.

3.3.2. The matrix A = \begin{pmatrix} 1 & −2 \\ 4 & −3 \end{pmatrix} is not diagonalizable over the real numbers. However, it is over the complex numbers. Find an invertible matrix P with complex entries such that P^{-1}AP is diagonal, and give P^{-1}AP.

3.3.3. Let F_2 = {0̄, 1̄} be the field with 2 elements and let

A = \begin{pmatrix} 1̄ & 1̄ & 1̄ \\ 1̄ & 0̄ & 0̄ \\ 0̄ & 0̄ & 1̄ \end{pmatrix} ∈ Mat_{3×3}(F_2).

Find the eigenvalues of A in F_2 and, for each one, give a basis of the corresponding eigenspace. If A is diagonalizable, find an invertible matrix P ∈ Mat_{3×3}(F_2) such that P^{-1}AP is diagonal, and give P^{-1}AP.

Chapter 4

Polynomials, linear operators and matrices

Let V be a vector space of finite dimension n over a field K. In this chapter, we show that the set EndK (V ) of linear operators on V is a ring under addition and composition, isomorphic to the ring Matn×n(K) of n × n matrices with coefficients in K. We then construct the ring K[x] of polynomials in one indeterminate x over the field K. For each linear operator T ∈ EndK (V ) and each matrix A ∈ Matn×n(K), we define ring homomorphisms

K[x] −→ End_K(V)                K[x] −→ Mat_{n×n}(K)
p(x) ↦ p(T)          and        p(x) ↦ p(A)

that send x to T and x to A respectively. The link between them is that, if B is a basis of V , then [p(T)]_B = p([ T ]_B).

4.1 The ring of linear operators

We refer the reader to Appendix A for a review of the basic notions related to rings. We first show:

Theorem 4.1.1. Let V be a vector space of dimension n over a field K and let B be a basis of V . The set EndK (V ) of linear operators on V is a ring under addition and composition. Furthermore, the function

ϕ : End_K(V) −→ Mat_{n×n}(K), T ↦ [ T ]_B

is a ring isomorphism.

Proof. We already know, by Theorem 2.2.3, that the set EndK (V ) = LK (V,V ) is a vector space over K. In particular, it is an abelian group under addition. We also know that the

composite of two linear operators S : V → V and T : V → V is a linear operator S ◦ T : V → V , hence an element of End_K(V). Since the composition is associative and the identity map I_V ∈ End_K(V) is an identity element for this operation, we conclude that End_K(V) is a monoid under composition. Finally, Proposition 2.3.1 gives that if R, S, T ∈ End_K(V), then (R + S) ◦ T = R ◦ T + S ◦ T and R ◦ (S + T) = R ◦ S + R ◦ T.

Therefore composition is distributive over addition in EndK (V ) and so EndK (V ) is a ring. In addition, Theorem 2.4.3 shows that the function ϕ is an isomorphism of vector spaces over K. In particular, it is an isomorphism of abelian groups under addition. Finally, Proposition 2.4.6 implies that if S,T ∈ EndK (V ), then

ϕ(S ◦ T ) = [ S ◦ T ]B = [ S ]B[ T ]B = ϕ(S)ϕ(T ) .

Since ϕ(IV ) = I, we conclude that ϕ is also an isomorphism of rings.

Exercises.

4.1.1. Let V be a vector space over a field K and let U be a subspace of V . Show that the set of linear maps T : V → V such that T(u) ∈ U for all u ∈ U is a subring of End_K(V).

4.1.2. Let A = {a + b√2 ; a, b ∈ Q}.

(i) Show that A is a subring of R and that B = {1, √2} is a basis of A viewed as a vector space over Q.

(ii) Let α = a + b√2 ∈ A. Show that the function m_α : A → A defined by m_α(β) = αβ for all β ∈ A is a Q-linear map and calculate [m_α]_B.

(iii) Show that the function ϕ: A → Mat2×2(Q) given by ϕ(α) = [mα]B is an injective ring homomorphism, and describe its image.

4.2 Polynomial rings

Proposition 4.2.1. Let A be a commutative ring and let x be a symbol not in A. There exists a commutative ring, denoted A[x], containing A as a subring and x as an element such that

(i) every element of A[x] can be written as a_0 + a_1 x + ··· + a_n x^n for some integer n ≥ 0 and some elements a_0, a_1, . . . , a_n of A;

(ii) if a_0 + a_1 x + ··· + a_n x^n = 0 for an integer n ≥ 0 and elements a_0, a_1, . . . , a_n of A, then a_0 = a_1 = ··· = a_n = 0.

This ring A[x] is called the polynomial ring in the indeterminate x over A. The proof of this result is a bit technical. The student may skip it on a first reading. However, it is important to know how to manipulate the elements of this ring, called polynomials in x with coefficients in A.

Suppose for the moment that this ring A[x] exists, and let

p = a_0 + a_1 x + ··· + a_m x^m and q = b_0 + b_1 x + ··· + b_n x^n

be two of its elements. We first note that condition (ii) implies

p = q ⇐⇒ a_i = b_i for i = 0, 1, . . . , min(m, n), a_i = 0 for i > n, and b_i = 0 for i > m.

Defining ai = 0 for each i > m and bi = 0 for each i > n, we also have

p + q = \sum_{i=0}^{\max(m,n)} a_i x^i + \sum_{i=0}^{\max(m,n)} b_i x^i = \sum_{i=0}^{\max(m,n)} (a_i + b_i) x^i,

p · q = \left( \sum_{i=0}^{m} a_i x^i \right) · \left( \sum_{j=0}^{n} b_j x^j \right) = \sum_{i=0}^{m} \sum_{j=0}^{n} a_i b_j x^{i+j} = \sum_{k=0}^{m+n} \left( \sum_{i=0}^{k} a_i b_{k−i} \right) x^k.

Example 4.2.2. Over the field F_3 = {0̄, 1̄, 2̄} of 3 elements, we have

(1̄ + x^2 + 2̄x^3) + (x + 2̄x^2) = 1̄ + x + 2̄x^3,
(1̄ + x^2 + 2̄x^3)(x + 2̄x^2) = x + 2̄x^2 + x^3 + x^4 + x^5.
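This coefficientwise description of sums and products translates directly into code. The following minimal Python sketch (an illustration, not part of the original notes; the helper poly_mul is ours) multiplies polynomials over F_3 represented as lists of coefficients a_0, a_1, . . . and reproduces the product of Example 4.2.2:

    def poly_mul(a, b, p=3):
        # convolution of the coefficient lists, reduced mod p
        t = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                t[i + j] = (t[i + j] + ai * bj) % p
        return t

    # (1 + x^2 + 2x^3)(x + 2x^2) over F_3
    print(poly_mul([1, 0, 1, 2], [0, 1, 2]))   # [0, 1, 2, 1, 1, 1] = x + 2x^2 + x^3 + x^4 + x^5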

Proof of Proposition 4.2.1. Let S be the set of sequences (a0, a1, a2,... ) of elements of A whose elements are all zero after a certain point. We define an “addition” on S by setting

(a0, a1, a2,... ) + (b0, b1, b2,... ) = (a0 + b0, a1 + b1, a2 + b2,... ).

The result is indeed an element of S, since if ai = bi = 0 for all i > n, then ai + bi = 0 for all i > n. It is easily verified that this operation is associative and commutative, that the element (0, 0, 0,... ) is an identity element, and that an arbitrary element (a0, a1, a2,... ) of S has inverse (−a0, −a1, −a2,... ). Thus S is an abelian group under this operation. We also define a “multiplication” on S by defining

(a_0, a_1, a_2, . . . )(b_0, b_1, b_2, . . . ) = (t_0, t_1, t_2, . . . ) where t_k = \sum_{i=0}^{k} a_i b_{k−i}.

The result is an element of S, since if we suppose again that ai = bi = 0 for all i > n, then, for every integer k > 2n, each of the products aibk−i with i = 0, . . . , k is zero (one of the indices i or k − i being necessarily > n) and so tk = 0. It is easy to see that the sequence (1, 0, 0,... ) is an identity element for this operation. To conclude that S is a ring under these operations, it remains to show that the mul- tiplication is associative and distributive over the addition. In other words, we must show that if p = (a0, a1, a2,... ), q = (b0, b1, b2,... ) and r = (c0, c1, c2,... ) are three elements of S, then

1) (pq)r = p(qr),

2) (p + q)r = pr + qr,

3) p(q + r) = p q + p r.

These three equalities are proved in the same manner: it suffices to show that the sequences on either side of the equality have the same element of index k for each k ≥ 0. Equality 1) is the most delicate to verify since each term involves two products. We see that the element of index k of (pq)r is

((pq)r)_k = \sum_{j=0}^{k} (pq)_j c_{k−j} = \sum_{j=0}^{k} \left( \sum_{i=0}^{j} a_i b_{j−i} \right) c_{k−j} = \sum_{j=0}^{k} \sum_{i=0}^{j} a_i b_{j−i} c_{k−j}

and that of p(qr) is

(p(qr))_k = \sum_{i=0}^{k} a_i (qr)_{k−i} = \sum_{i=0}^{k} a_i \left( \sum_{j=0}^{k−i} b_j c_{k−i−j} \right) = \sum_{i=0}^{k} \sum_{j=0}^{k−i} a_i b_j c_{k−i−j}.

In both cases, we see that the result is the sum of all products a_i b_j c_l with i, j, l ≥ 0 and i + j + l = k, hence

((pq)r)_k = \sum_{i+j+l=k} a_i b_j c_l = (p(qr))_k.

This proves 1). The proofs of 2) and 3) are simpler and left as exercises.

For all a, b ∈ A, we see that

(a, 0, 0,... ) + (b, 0, 0,... ) = (a + b, 0, 0,... ), (a, 0, 0,... )(b, 0, 0,... ) = (ab, 0, 0,... ).

Thus the map ι: A → S given by

ι(a) = (a, 0, 0, . . . )

is an injective ring homomorphism. From now on, we identify A with its image under ι. Then A becomes a subring of S and we have

a(b0, b1,... ) = (a b0, a b1, a b2,... ) for all a ∈ A and every sequence (b0, b1, b2,... ) ∈ S.

Let X = (0, 1, 0, 0, . . . ). An easy induction shows that, for every integer i ≥ 1, the i-th power X^i of X is the sequence whose elements are all zero except for the element of index i, which is equal to 1:

X^i = (0, . . . , 0, 1, 0, . . . )   (with the 1 in position i).

From this we deduce:

1st For every sequence (a_0, a_1, a_2, . . . ) of S, by choosing an integer n ≥ 0 such that a_i = 0 for all i > n, we have:

(a_0, a_1, a_2, . . . ) = \sum_{i=0}^{n} (0, . . . , 0, a_i, 0, . . . ) = \sum_{i=0}^{n} a_i (0, . . . , 0, 1, 0, . . . ) = \sum_{i=0}^{n} a_i X^i,

where, in the i-th term, the entry a_i (respectively 1) sits in position i.

2nd In addition, if n is an integer ≥ 0 and a_0, a_1, . . . , a_n ∈ A satisfy

a_0 + a_1 X + ··· + a_n X^n = 0,

then

0 = \sum_{i=0}^{n} a_i (0, . . . , 0, 1, 0, . . . ) = \sum_{i=0}^{n} (0, . . . , 0, a_i, 0, . . . ) = (a_0, a_1, . . . , a_n, 0, 0, . . . ),

and so a_0 = a_1 = ··· = a_n = 0.

These last two observations show that the ring S has the two properties required by the proposition, except that instead of the element x, we have the sequence X = (0, 1, 0, . . . ). This is not really a problem: it suffices to replace X by x in S and in the addition and multiplication tables of S.

Every polynomial p ∈ A[x] induces a function from A to A. Before giving this construction, we first note the following fact, whose proof is left as an exercise:

Proposition 4.2.3. Let X be a set and let A be a commutative ring. The set F(X, A) of functions from X to A is a commutative ring under the addition and multiplication defined in the following manner. If f, g ∈ F(X, A), then f + g and f g are the functions from X to A given by

(f + g)(x) = f(x) + g(x) and (f g)(x) = f(x) g(x) for all x ∈ X.

We also show:

Proposition 4.2.4. Let A be a subring of a commutative ring C and let c ∈ C. The function

A[x] −→ C
\sum_{i=0}^{n} a_i x^i ↦ \sum_{i=0}^{n} a_i c^i          (4.1)

is a ring homomorphism.

This homomorphism is called evaluation at c and the image of a polynomial p under this homomorphism is denoted p(c).

Proof of Proposition 4.2.4. Let

p = \sum_{i=0}^{n} a_i x^i and q = \sum_{i=0}^{m} b_i x^i

be elements of A[x]. Define a_i = 0 for all i > n and b_i = 0 for all i > m. Then we have:

p + q = \sum_{i=0}^{\max(n,m)} (a_i + b_i) x^i and pq = \sum_{k=0}^{m+n} \left( \sum_{i=0}^{k} a_i b_{k−i} \right) x^k.

We see that

p(c) + q(c) = \sum_{i=0}^{n} a_i c^i + \sum_{i=0}^{m} b_i c^i = \sum_{i=0}^{\max(n,m)} (a_i + b_i) c^i = (p + q)(c)

and

p(c)q(c) = \left( \sum_{i=0}^{n} a_i c^i \right) \left( \sum_{j=0}^{m} b_j c^j \right) = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i b_j c^{i+j} = \sum_{k=0}^{m+n} \left( \sum_{i=0}^{k} a_i b_{k−i} \right) c^k = (pq)(c).

These last equalities, together with the fact that the image of 1 ∈ A[x] is 1 show that the function (4.1) is indeed a ring homomorphism.

In particular, if we take C = A[x] and c = x, the map (4.1) becomes the identity map:

A[x] −→ A[x]
p = \sum_{i=0}^{n} a_i x^i ↦ p(x) = \sum_{i=0}^{n} a_i x^i.

For this reason, we generally write p(x) to denote a polynomial in A[x]. This is what we will do from now on.

Proposition 4.2.5. Let A be a commutative ring. For each p ∈ A[x], denote by Φ(p): A → A the function given by Φ(p)(a) = p(a) for all a ∈ A. The map Φ: A[x] → F(A, A) thus defined is a ring homomorphism with kernel

ker(Φ) = {p ∈ A[x]; p(a) = 0 for all a ∈ A}.

We say that Φ(p) is the polynomial function from A to A induced by p. The image of Φ in F(A, A) is called the ring of polynomial functions from A to A. More concisely, we can define Φ in the following manner:

Φ : A[x] −→ F(A, A)
p ↦ Φ(p) : A −→ A, a ↦ p(a).

Proof. Let p, q ∈ A[x]. For all a ∈ A, we have

Φ(p + q)(a) = (p + q)(a) = p(a) + q(a) = Φ(p)(a) + Φ(q)(a), Φ(p q)(a) = (p q)(a) = p(a) q(a) = Φ(p)(a) · Φ(q)(a),

hence Φ(p + q) = Φ(p) + Φ(q) and Φ(p q) = Φ(p)Φ(q). Since Φ(1) is the constant function equal to 1 (the identity element of F(A, A) for the product), we conclude that Φ is a ring homomorphism. The last assertion concerning its kernel follows from the definitions.

Example 4.2.6. If A = R, the function Φ : R[x] → F(R, R) defined by Proposition 4.2.5 is injective, since we know that a nonzero polynomial in R[x] has a finite number of real roots, and thus ker Φ = {0}. In this case, Φ induces an isomorphism between the ring R[x] of polynomials with coefficients in R and its image in F(R, R), the ring of polynomial functions from R to R.

Example 4.2.7. Let A = F_2 = {0̄, 1̄}, the field with two elements. The ring F(F_2, F_2) of functions from F_2 to itself consists of 4 elements, whereas F_2[x] contains an infinite number of polynomials. In this case, the map Φ is not injective. For example, the polynomial p(x) = x^2 − x satisfies p(0̄) = 0̄ and p(1̄) = 0̄, hence p(x) ∈ ker Φ.
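To make Example 4.2.7 concrete, the short Python sketch below (an aside, not part of the original notes) checks that x^2 − x induces the zero function on F_2, while there are only 2^2 = 4 functions from F_2 to itself:

    p = lambda a: (a**2 - a) % 2

    print([p(a) for a in (0, 1)])   # [0, 0]: a nonzero polynomial inducing the zero function

    # a function F_2 -> F_2 is determined by the pair of values (f(0), f(1))
    print(len({(f0, f1) for f0 in (0, 1) for f1 in (0, 1)}))   # 4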

Exercises.

4.2.1. Prove Proposition 4.2.3.

4.2.2. Show that the map Φ : A[x] → F(A, A) of Proposition 4.2.5 is not injective if A is a finite ring.

4.3 Evaluation at a linear operator or matrix

Let V be a vector space over a field K and let T : V → V be a linear operator on V . For every polynomial

p(x) = a_0 + a_1 x + ··· + a_n x^n ∈ K[x],

we set

p(T) = a_0 I + a_1 T + ··· + a_n T^n ∈ End_K(V),

where I denotes the identity map on V .

Proposition 4.3.1. With the above notation, the map

K[x] −→ End_K(V)
p(x) ↦ p(T)

is a ring homomorphism. Its image, denoted K[T], is the commutative subring of End_K(V) given by

K[T] = {a_0 I + a_1 T + ··· + a_n T^n ; n ∈ N_{>0}, a_0, . . . , a_n ∈ K}.

This homomorphism is called evaluation at T .

Proof. By definition, the image of the constant polynomial 1 is the identity element I of End_K(V). To conclude that evaluation at T is a ring homomorphism, it remains to show that if p, q ∈ K[x] are arbitrary polynomials, then (p + q)(T) = p(T) + q(T) and (pq)(T) = p(T) ◦ q(T). The proof for the sum is left as an exercise. The proof for the product uses the fact that, for all a ∈ K and all S ∈ End_K(V), we have (aI) ◦ S = S ◦ (aI) = aS. Write

p = \sum_{i=0}^{n} a_i x^i and q = \sum_{j=0}^{m} b_j x^j.

We see that

p(T) ◦ q(T) = \left( a_0 I + \sum_{i=1}^{n} a_i T^i \right) ◦ \left( b_0 I + \sum_{j=1}^{m} b_j T^j \right)
    = (a_0 I) ◦ (b_0 I) + (a_0 I) ◦ \sum_{j=1}^{m} b_j T^j + \left( \sum_{i=1}^{n} a_i T^i \right) ◦ (b_0 I) + \left( \sum_{i=1}^{n} a_i T^i \right) ◦ \left( \sum_{j=1}^{m} b_j T^j \right)
    = a_0 b_0 I + \sum_{j=1}^{m} a_0 b_j T^j + \sum_{i=1}^{n} a_i b_0 T^i + \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j T^{i+j}
    = (a_0 b_0) I + \sum_{k=1}^{n+m} \left( \sum_{i+j=k} a_i b_j \right) T^k
    = (pq)(T).

Thus evaluation at T is indeed a ring homomorphism. The fact that its image is a commu- tative subring of EndK (V ) follows from the fact that K[x] is commutative.

Example 4.3.2. Let D be the operator of derivation on the vector space C^∞(R) of infinitely differentiable functions f : R → R. Let

p(x) = x^2 − 1 ∈ R[x] and q(x) = 2x + 1 ∈ R[x].

For every function f ∈ C^∞(R), we have

p(D)(f) = (D^2 − I)(f) = f'' − f,   q(D)(f) = (2D + I)(f) = 2f' + f.

By the above, we have p(D) ◦ q(D) = (pq)(D) for all polynomials p, q ∈ R[x]. We verify this for the above choices of p and q. This reduces to showing that (p(D) ◦ q(D))(f) = ((pq)(D))(f) for all f ∈ C^∞(R). We see that

(p(D) ◦ q(D))(f) = p(D)(q(D)(f)) = p(D)(2f' + f) = (2f' + f)'' − (2f' + f) = 2f''' + f'' − 2f' − f.

Since pq = (x^2 − 1)(2x + 1) = 2x^3 + x^2 − 2x − 1, we also have

((pq)(D))(f) = (2D^3 + D^2 − 2D − I)(f) = 2f''' + f'' − 2f' − f

as claimed.

Example 4.3.3. Let F(Z, R) be the vector space of functions f : Z → R. We define a linear operator T on this vector space in the following manner. If f : Z → R is an arbitrary function, then T(f) : Z → R is the function given by

(T(f))(i) = f(i + 1) for all i ∈ Z.

We see that

T^2(f)(i) = T(T(f))(i) = (T(f))(i + 1) = f(i + 2),
T^3(f)(i) = T(T^2(f))(i) = (T^2(f))(i + 1) = f(i + 3),

and so on. In general, T^k(f)(i) = f(i + k) for k = 1, 2, 3, . . .

From this, we deduce that if p(x) = a_0 + a_1 x + ··· + a_n x^n ∈ R[x], then p(T)(f) is the function from Z to R given, for all i ∈ Z, by

(p(T)(f))(i) = ((a_0 I + a_1 T + ··· + a_n T^n)(f))(i)
    = (a_0 f + a_1 T(f) + ··· + a_n T^n(f))(i)
    = a_0 f(i) + a_1 T(f)(i) + ··· + a_n T^n(f)(i)
    = a_0 f(i) + a_1 f(i + 1) + ··· + a_n f(i + n).
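The shift operator of Example 4.3.3 is easy to experiment with in code. In the following minimal sketch (an illustration, not part of the original notes; apply_pT is our own helper), a function f : Z → R is modelled as a Python function and p(T) is applied for p given by its coefficient list:

    def apply_pT(coeffs, f):
        # returns p(T)(f) : i -> a_0 f(i) + a_1 f(i+1) + ... + a_n f(i+n)
        return lambda i: sum(a * f(i + k) for k, a in enumerate(coeffs))

    f = lambda i: i**2               # f(i) = i^2
    g = apply_pT([1, -2, 1], f)      # p(x) = 1 - 2x + x^2, so g(i) = f(i) - 2f(i+1) + f(i+2)
    print([g(i) for i in range(4)])  # [2, 2, 2, 2]: the second difference of i^2 is constant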

Let A ∈ Mat_{n×n}(K) be an n × n square matrix with coefficients in a field K. For every polynomial

p(x) = a_0 + a_1 x + ··· + a_n x^n ∈ K[x],

we set

p(A) = a_0 I + a_1 A + ··· + a_n A^n ∈ Mat_{n×n}(K),

where I denotes the n × n identity matrix.

Proposition 4.3.4. With the above notation, the map

K[x] −→ Mat_{n×n}(K)
p(x) ↦ p(A)

is a ring homomorphism. Its image, denoted K[A], is the commutative subring of Mat_{n×n}(K) given by

K[A] = {a_0 I + a_1 A + ··· + a_n A^n ; n ∈ N, a_0, . . . , a_n ∈ K}.

This homomorphism is called evaluation at A. The proof of this proposition is similar to that of Proposition 4.3.1 and is left as an exercise.

Example 4.3.5. Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} ∈ Mat_{2×2}(Q) and let p(x) = x^3 + 2x − 1 ∈ Q[x]. We have

p(A) = A^3 + 2A − I = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix} + 2 \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} − \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 5 \\ 0 & 2 \end{pmatrix}.
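This evaluation is a one-liner in numpy; the sketch below (an aside, not part of the original notes) reproduces the computation of Example 4.3.5:

    import numpy as np

    A = np.array([[1, 1], [0, 1]])
    I = np.eye(2, dtype=int)

    # p(A) = A^3 + 2A - I for p(x) = x^3 + 2x - 1
    pA = np.linalg.matrix_power(A, 3) + 2 * A - I
    print(pA)   # [[2 5]
                #  [0 2]]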

We conclude with the following result:

Theorem 4.3.6. Let V be a vector space of finite dimension n, let B be a basis of V , and let T : V → V be a linear operator on V . For every polynomial p(x) ∈ K[x], we have

[p(T )]B = p([ T ]B).

Proof. Write p(x) = a_0 + a_1 x + ··· + a_n x^n. We have

[p(T)]_B = [a_0 I + a_1 T + ··· + a_n T^n]_B
    = a_0 I + a_1 [ T ]_B + ··· + a_n [ T^n ]_B
    = a_0 I + a_1 [ T ]_B + ··· + a_n ([ T ]_B)^n
    = p([ T ]_B).

Example 4.3.7. Let V = ⟨cos(2t), sin(2t)⟩_R ⊆ C^∞(R) and let D be the operator of derivation on C^∞(R). The subspace V is stable under D since

D(cos(2t)) = −2 sin(2t) ∈ V and D(sin(2t)) = 2 cos(2t) ∈ V.          (4.2)

For simplicity, denote by D the restriction D|_V of D to V . We note that B = {cos(2t), sin(2t)} is a basis of V (exercise). Then the formulas (4.2) give

[ D ]_B = \begin{pmatrix} 0 & 2 \\ −2 & 0 \end{pmatrix}.

Thus we have

[p(D)]_B = p \left( \begin{pmatrix} 0 & 2 \\ −2 & 0 \end{pmatrix} \right)

for all p(x) ∈ R[x]. In particular, for p(x) = x^2 + 4, we have p([ D ]_B) = 0, hence p(D) = 0.
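The last claim is easy to confirm numerically; this small sketch (an aside, not part of the original notes) checks that M^2 + 4I vanishes for M = [ D ]_B:

    import numpy as np

    M = np.array([[0, 2], [-2, 0]])
    print(M @ M + 4 * np.eye(2, dtype=int))   # [[0 0]
                                              #  [0 0]]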

Exercises.

4.3.1. Prove the equality (p + q)(T ) = p(T ) + q(T ) left as an exercise in the proof of Propo- sition 4.3.1.

4.3.2. Let R_{π/4} : R^2 → R^2 be the rotation through angle π/4 about the origin in R^2, and let p(x) = x^2 − 1 ∈ R[x]. Compute p(R_{π/4})(1, 2).

4.3.3. Let R_{2π/3} : R^2 → R^2 be the rotation through angle 2π/3 about the origin in R^2, and let p(x) = x^2 + x + 1 ∈ R[x]. Show that p(R_{2π/3}) = 0.

4.3.4. Let D be the restriction to V = ⟨ e^x, xe^x, x^2 e^x ⟩_R of the operator of differentiation on C^∞(R).

(i) Show that B = {e^x, xe^x, x^2 e^x} is a basis of V and determine [ D ]_B.

(ii) Let p(x) = x^2 − 1 and q(x) = x − 1. Compute [p(D)]_B and [q(D)]_B. Do these matrices commute? Why or why not?

4.3.5. Let V be a vector space over C of dimension 3, let B = {v1, v2, v3} be a basis of V , and let T : V → V be the linear map determined by the conditions

T (v1) = iv2 + v3,T (v2) = iv3 + v1 and T (v3) = iv1 + v2.

(i) Calculate [p(T)]_B where p(x) = x^2 + (1 + i)x + i.

(ii) Calculate p(T)(v_1).

Chapter 5

Unique factorization in euclidean domains

The goal of this chapter is to show that every nonconstant polynomial with coefficients in a field K can be written as a product of irreducible polynomials in K[x] and that this factorization is unique up to the ordering of the factors and multiplying each factor by a nonzero element of K. We will show that this property extends to a relatively large class of rings called euclidean domains. This class includes the polynomial rings K[x] over a field K, the ring Z of ordinary integers and the ring of Gaussian integers, to mention just a few. The key property of euclidean rings is that every ideal of such a ring is principal, that is, generated by a single element.

5.1 Divisibility in integral domains

Definition 5.1.1 (Integral domain). An integral domain (or integral ring) is a commutative ring A with 0 ≠ 1 satisfying the property that ab ≠ 0 for all a, b ∈ A with a ≠ 0 and b ≠ 0 (in other words, ab = 0 =⇒ a = 0 or b = 0).

Example 5.1.2. The ring Z is integral. Every subring of a field K is integral.

We will see later that the ring K[x] is integral for all fields K.

Definition 5.1.3 (Division). Let A be an integral domain and let a, b ∈ A with a ≠ 0. We say that a divides b (in A), or that b is divisible by a, if there exists c ∈ A such that b = ca. We denote this condition by a | b. A divisor of b (in A) is a nonzero element of A that divides b.

Example 5.1.4. In Z, we have −2 | 6, since 6 = (−2)(−3). On the other hand, 5 does not divide 6 in Z, but 5 divides 6 in Q.

The following properties are left as exercises:


Proposition 5.1.5. Let A be an integral domain and let a, b, c ∈ A with a ≠ 0.

(i) 1 | b.

(ii) If a | b, then a | bc.

(iii) If a | b and a | c, then a | (b + c).

(iv) If a | b, b ≠ 0 and b | c, then a | c.

(v) If a ≠ 0 and ab | ac, then b | c.

Definition 5.1.6 (Associate). Let A be an integral domain and let a, b ∈ A \ {0}. We say that a and b are associates, or that a is associated to b, if a | b and b | a. We denote this condition by a ∼ b.

Example 5.1.7. In Z, two nonzero integers a and b are associates if and only if a = ±b. In particular, every nonzero integer a is associated to a unique positive integer, namely |a|.

Lemma 5.1.8. Let A be an integral domain and let a ∈ A \ {0}. The following conditions are equivalent:

(i) a ∼ 1,

(ii) a | 1,

(iii) a is invertible.

Proof. Since 1 | a, conditions (i) and (ii) are equivalent. Since A is commutative, conditions (ii) and (iii) are also equivalent.

Definition 5.1.9 (Unit). The nonzero elements of an integral domain A that satisfy the equivalent conditions of Lemma 5.1.8 are called units of A. The set of all units of A is denoted A×.

Proposition 5.1.10. The set A× of units of an integral domain A is stable under the product in A and, for this operation, it is an abelian group.

For the proof, see Appendix A. Example 5.1.11. The group of units of Z is Z× = {1, −1}. The group of units of a field K is K× = K \{0}. We also call it the multiplicative group of K.

Proposition 5.1.12. Let A be an integral domain and let a, b ∈ A \ {0}. Then a ∼ b if and only if a = ub for some unit u of A.

Proof. First suppose that a ∼ b. Then there exist u, v ∈ A such that a = ub and b = va. We thus have a = uva, and so (1 − uv)a = 0. Since A is integral and a ≠ 0, this implies 1 − uv = 0, hence uv = 1 and so u ∈ A^×. Conversely, suppose that a = ub with u ∈ A^×. Since u ∈ A^×, there exists v ∈ A such that vu = 1. Therefore b = vub = va. By the equalities a = ub and b = va, we have b | a and a | b, hence a ∼ b.

Definition 5.1.13 (Irreducible). We say that an element p of an integral domain A is irreducible (in A) if p ≠ 0, p ∉ A^× and the only divisors of p (in A) are the elements of A associated to 1 or to p.

Example 5.1.14. The irreducible elements of Z are of the form ±p where p is a prime number.

Definition 5.1.15 (Greatest common divisor). Let a, b be elements of an integral domain A, with a ≠ 0 or b ≠ 0. We say that a nonzero element d of A is a greatest common divisor (gcd) of a and b if

(i) d | a and d | b,

(ii) for all c ∈ A \{0} such that c | a and c | b, we have c | d.

We then write d = gcd(a, b).

Note that the gcd doesn't always exist and, when it exists, it is not unique in general. More precisely, if d is a gcd of a and b, then the gcds of a and b are exactly the elements of A associated to d (see Exercise 5.1.3). For this reason, we often write d ∼ gcd(a, b) to indicate that d is a gcd of a and b.

Example 5.1.16. We will soon see that Z[x] is an integral domain. Thus, so is the subring Z[x^2, x^3]. However, the elements x^5 and x^6 do not have a gcd in Z[x^2, x^3] since x^2 and x^3 are both common divisors, but neither one divides the other.

Exercises. 5.1.1. Prove Proposition 5.1.5. Note. Part (v) uses in a crucial way the fact that A is an integral domain.

5.1.2. Let A be an integral domain. Show that the relation ∼ of Definition 5.1.6 is an equiv- alence relation on A \{0}.

5.1.3. Let A be an integral domain and let a, b ∈ A \ {0}. Suppose that d and d′ are gcds of a and b. Show that d ∼ d′.

5.1.4. Let A be an integral domain and let p and q be nonzero elements of A with p ∼ q. Show that p is irreducible if and only if q is irreducible.

5.1.5. Let A be an integral domain, and let p and q be irreducible elements of A with p|q. Show that p ∼ q.

5.2 Divisibility in terms of ideals

Let A be a commutative ring. Recall that an ideal of A is a subset I of A such that

(i) 0 ∈ I,

(ii) if r, s ∈ I, then r + s ∈ I,

(iii) if a ∈ A and r ∈ I, then ar ∈ I.

Example 5.2.1. The subsets {0} and A are ideals of A.

The following result is left as an exercise.

Lemma 5.2.2. Let r1, . . . , rm be elements of a commutative ring A. The set

{a1 r1 + ··· + am rm ; a1, . . . , am ∈ A}

is the smallest ideal of A containing r1, . . . , rm.

Definition 5.2.3 (Ideal generated by elements, principal ideal). The ideal constructed in Lemma 5.2.2 is called the ideal of A generated by r1, . . . , rm and is denoted (r1, . . . , rm). We say that an ideal of A is principal if it is generated by a single element. The principal ideal of A generated by r is (r) = {a r ; a ∈ A}.

Example 5.2.4. The ideal of Z generated by 2 is the set (2) = {2 a ; a ∈ Z} of even integers. Example 5.2.5. In Z, we have 1 ∈ (5, 18) since 1 = −7 · 5 + 2 · 18, hence Z = (1) ⊆ (5, 18) and thus (5, 18) = (1) = Z.

Let A be an integral domain and let a ∈ A \ {0}. By the definitions, for every element b of A, we have

a | b ⇐⇒ b ∈ (a) ⇐⇒ (b) ⊆ (a).

Therefore

a ∼ b ⇐⇒ (a) = (b) and a ∈ A^× ⇐⇒ (a) = A.

Exercises.

5.2.1. Prove Lemma 5.2.2.

5.2.2. Let I and J be ideals of a commutative ring A.

(i) Show that I + J := {r + s ; r ∈ I, s ∈ J}

is the smallest ideal of A containing I and J.

(ii) If I = (a) and J = (b), show that I + J = (a, b).

5.2.3. Let a1, . . . , as be nonzero elements of an integral domain A. Suppose that there exists d ∈ A such that

(d) = (a1, . . . , as). Show that

(i) d is nonzero and divides a1, . . . , as;

(ii) if a nonzero element c of A divides a1, . . . , as, then c|d.

Note. An element d of A satisfying conditions (i) and (ii) is called a greatest common divisor of a1, . . . , as (we abbreviate gcd).

5.2.4. Let a1, . . . , as be nonzero elements of an integral domain A. Suppose that there exists m ∈ A such that

(m) = (a1) ∩ · · · ∩ (as). Prove that

(i) m is nonzero and a1, . . . , as divide m;

(ii) if a nonzero element c of A is divisible by a1, . . . , as, then m|c.

Note. An element m of A satisfying conditions (i) and (ii) is called a least common multiple of a_1, . . . , a_s (we abbreviate lcm).

5.3 Euclidean division of polynomials

Let K be a field. Every nonzero polynomial p(x) ∈ K[x] can be written in the form

p(x) = a_0 + a_1 x + ··· + a_n x^n

for an integer n ≥ 0 and elements a_0, a_1, . . . , a_n of K with a_n ≠ 0. This integer n is called the degree of p(x) and is denoted deg(p(x)). The coefficient a_n is called the leading coefficient of p(x). We say that p(x) is monic if its leading coefficient is 1.

The degree of the polynomial 0 is not defined (certain authors set deg(0) = −∞, but we will not adopt that convention in this course).

Proposition 5.3.1. Let p(x), q(x) ∈ K[x] \ {0}. Then we have p(x)q(x) ≠ 0 and

deg(p(x)q(x)) = deg(p(x)) + deg(q(x)).

Proof. Write p(x) = a_0 + ··· + a_n x^n and q(x) = b_0 + ··· + b_m x^m with a_n ≠ 0 and b_m ≠ 0. Then the product

p(x)q(x) = a_0 b_0 + ··· + a_n b_m x^{n+m}

has leading coefficient a_n b_m ≠ 0, and so

deg(p(x)q(x)) = n + m = deg(p(x)) + deg(q(x)).

Corollary 5.3.2.

(i) K[x] is an integral domain.

(ii) K[x]× = K× = K \{0}.

(iii) Every polynomial p(x) ∈ K[x]\{0} is associated to a unique monic polynomial in K[x].

Proof. Statement (i) follows directly from the preceding proposition. To prove (ii), we first note that if p(x) ∈ K[x]×, then there exists q(x) ∈ K[x], such that p(x)q(x) = 1. Applying the proposition, we obtain

deg(p(x)) + deg(q(x)) = deg(1) = 0,

hence deg(p(x)) = 0 and thus p(x) ∈ K \{0} = K×. This proves that K[x]× ⊆ K×. Since the inclusion K× ⊆ K[x]× is immediate, we obtain K[x]× = K×. Statement (iii) is left as an exercise.

The proof of the following result is also left as an exercise:

Proposition 5.3.3. Let p_1(x), p_2(x) ∈ K[x] with p_2(x) ≠ 0. We can write

p1(x) = q(x)p2(x) + r(x) for a unique choice of polynomials q(x) ∈ K[x] and r(x) ∈ K[x] satisfying

r(x) = 0 or deg(r(x)) < deg(p2(x)).

This fact is called euclidean division. We prove it, for fixed p_2(x), by induction on the degree of p_1(x) (supposing p_1(x) ≠ 0). The polynomials q(x) and r(x) are called respectively the quotient and the remainder of the division of p_1(x) by p_2(x).

Example 5.3.4. For K = Q, p_1(x) = x^3 − 5x^2 + 8x − 4 and p_2(x) = x^2 − 2x + 1, we see that

p_1(x) = (x − 3) p_2(x) + (x − 1) with deg(x − 1) = 1 < deg(p_2(x)) = 2.
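Euclidean division of polynomials is exactly the long-division algorithm. Here is a minimal sketch (an illustration, not part of the original notes; poly_divmod is our own helper) over Q, with polynomials given by coefficient lists in increasing degree; it reproduces Example 5.3.4:

    from fractions import Fraction

    def poly_divmod(p1, p2):
        # returns (q, r) with p1 = q*p2 + r and r = 0 or deg(r) < deg(p2)
        q = [Fraction(0)] * max(len(p1) - len(p2) + 1, 1)
        r = [Fraction(c) for c in p1]
        while len(r) >= len(p2) and any(r):
            c = r[-1] / p2[-1]           # leading coefficient of the next quotient term
            d = len(r) - len(p2)         # its degree
            q[d] = c
            for i, b in enumerate(p2):   # subtract c * x^d * p2
                r[i + d] -= c * b
            while r and r[-1] == 0:
                r.pop()
        return q, r

    # x^3 - 5x^2 + 8x - 4 divided by x^2 - 2x + 1
    q, r = poly_divmod([-4, 8, -5, 1], [1, -2, 1])
    print(q, r)   # [-3, 1] [-1, 1] (as Fractions): q = x - 3, r = x - 1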

Exercises.

5.3.1. Prove part (iii) of Corollary 5.3.2.

5.3.2. Let F_3 = {0̄, 1̄, 2̄} be the field with 3 elements and let

p_1(x) = x^5 − x^2 + 2̄x + 1̄,   p_2(x) = x^3 − 2̄x + 1̄ ∈ F_3[x].

Determine the quotient and the remainder of the division of p1(x) by p2(x).

5.4 Euclidean domains

Definition 5.4.1 (Euclidean domain). A euclidean domain (or euclidean ring) is an integral domain A that admits a function

ϕ : A \{0} → N satisfying the following properties:

(i) for all a, b ∈ A \{0} with a | b, we have ϕ(a) ≤ ϕ(b),

(ii) for all a, b ∈ A with b 6= 0, we can write

a = q b + r

with q, r ∈ A satisfying r = 0 or ϕ(r) < ϕ(b).

A function ϕ possessing these properties is called a euclidean function. Condition (i) is often omitted. (In fact, if A has a function satisfying Condition (ii) then it also has a function satisfying both conditions.) We have included it since it permits us to simplify certain arguments and because it is satisfied in the cases that are most important to us. Example 5.4.2. The ring Z is euclidean for the absolute value function

ϕ : Z −→ N, a ↦ |a|.

For any field K, the ring K[x] is euclidean for the degree

deg : K[x] \ {0} −→ N, p(x) ↦ deg(p(x)).

For the remainder of this section, we fix a euclidean domain A and a function ϕ : A \ {0} → N as in Definition 5.4.1.

Theorem 5.4.3. Every ideal of A is principal.

Proof. If I = {0}, then I = (0) is generated by 0. Now suppose that I ≠ {0}. Then the set I \ {0} is not empty and thus it contains an element a for which ϕ(a) is minimal. We will show that I = (a). We first note that (a) ⊆ I since a ∈ I. To show the reverse inclusion, choose b ∈ I. Since A is euclidean, we can write b = qa + r with q, r ∈ A satisfying r = 0 or ϕ(r) < ϕ(a). Since

r = b − qa = b + (−q)a ∈ I

(because a, b ∈ I), the choice of a implies ϕ(r) ≥ ϕ(a) if r ≠ 0. So we must have r = 0 and so b = qa ∈ (a). Since the choice of b is arbitrary, this shows that I ⊆ (a). Thus I = (a), as claimed.

For an ideal generated by two elements, we can find a single generator by using the following algorithm:

Euclidean Algorithm 5.4.4. Let a_1, a_2 ∈ A with a_2 ≠ 0. By repeated euclidean division, we write

a_1 = q_1 a_2 + a_3   with a_3 ≠ 0 and ϕ(a_3) < ϕ(a_2),
a_2 = q_2 a_3 + a_4   with a_4 ≠ 0 and ϕ(a_4) < ϕ(a_3),
. . . . . .
a_{k−2} = q_{k−2} a_{k−1} + a_k   with a_k ≠ 0 and ϕ(a_k) < ϕ(a_{k−1}),
a_{k−1} = q_{k−1} a_k.

Then we have (a_1, a_2) = (a_k).

An outline of a proof of this result is given in Exercise 5.4.1.

Corollary 5.4.5. Let a, b ∈ A with a ≠ 0 or b ≠ 0. Then a and b admit a gcd in A. More precisely, an element d of A is a gcd of a and b if and only if (d) = (a, b).

Proof. Let d be a generator of (a, b). Since a ≠ 0 or b ≠ 0, we have (a, b) ≠ (0) and so d ≠ 0. Since a, b ∈ (a, b) = (d), we first note that d divides a and b. Also, if c ∈ A \ {0} divides a and b, then a, b ∈ (c), hence d ∈ (a, b) ⊆ (c) and so c divides d. This shows that d is a gcd of a and b. If d′ is another gcd of a and b, we have d ∼ d′ (see Exercise 5.1.3), and so (d′) = (d) = (a, b).

Corollary 5.4.5 implies in particular that every gcd d of elements a and b of A with a ≠ 0 or b ≠ 0 can be written in the form

d = ra + sb with r, s ∈ A,

since such an element belongs to the ideal (a, b).

Example 5.4.6. Find the gcd of 15 and 39 and write it in the form 15r + 39s with r, s ∈ Z.

Solution: We apply the euclidean algorithm 5.4.4 to determine the generator of (15, 39). By repeated euclidean division, we find

39 = 2 · 15 + 9          (5.1)
15 = 9 + 6               (5.2)
9 = 6 + 3                (5.3)
6 = 2 · 3                (5.4)

hence gcd(15, 39) = 3. To find r and s, we follow the chain of calculations (5.1) to (5.4) starting at the end. Doing this, we have

3 = 9 − 6                                    by (5.3),
  = 9 − (15 − 9) = 2 · 9 − 15                by (5.2),
  = 2(39 − 2 · 15) − 15 = 2 · 39 − 5 · 15    by (5.1).

Therefore gcd(15, 39) = 3 = (−5) · 15 + 2 · 39.
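The back-substitution above is exactly what the extended euclidean algorithm automates. A minimal Python sketch for the ring Z (an aside, not part of the original notes; extended_gcd is our own helper):

    def extended_gcd(a, b):
        # returns (d, r, s) with d = gcd(a, b) = r*a + s*b
        if b == 0:
            return a, 1, 0
        d, r, s = extended_gcd(b, a % b)
        # d = r*b + s*(a - (a//b)*b) = s*a + (r - (a//b)*s)*b
        return d, s, r - (a // b) * s

    print(extended_gcd(15, 39))   # (3, -5, 2): gcd(15, 39) = 3 = (-5)*15 + 2*39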

Example 5.4.7. Find the gcd of

p_1(x) = x^4 + 3x^2 + x − 1 and p_2(x) = x^2 − 1

in Q[x] and write it in the form a_1(x) p_1(x) + a_2(x) p_2(x) with a_1(x), a_2(x) ∈ Q[x].

Solution: By repeated euclidean division, we have

p_1(x) = (x^2 + 4) p_2(x) + (x + 3)          (5.5)
p_2(x) = (x − 3)(x + 3) + 8                  (5.6)
x + 3 = (1/8)(x + 3) · 8                     (5.7)

hence gcd(p_1(x), p_2(x)) ∼ 8 ∼ 1. Retracing the calculation starting at the end, we find

8 = p_2(x) − (x − 3)(x + 3)                                  by (5.6),
  = p_2(x) − (x − 3)(p_1(x) − (x^2 + 4) p_2(x))              by (5.5),
  = (3 − x) p_1(x) + (x^3 − 3x^2 + 4x − 11) p_2(x).
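sympy's gcdex carries out the same computation in Q[x], returning cofactors s, t and a monic gcd h with s·p_1 + t·p_2 = h; a minimal sketch (an aside, not part of the original notes):

    from sympy import symbols, gcdex

    x = symbols('x')
    p1 = x**4 + 3*x**2 + x - 1
    p2 = x**2 - 1

    s, t, h = gcdex(p1, p2, x)
    print(h)               # 1: the two polynomials are coprime
    print(s, t, sep='\n')  # the cofactors above divided by 8, since sympy normalizes the gcd to be monic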

Another consequence of Theorem 5.4.3 is the following result which will play a key role later.

Theorem 5.4.8. Let p be an irreducible element of A and let a, b ∈ A. Suppose that p | ab. Then we have p | a or p | b.

Proof. Suppose that p ∤ a. We have to show that p | b. To do this, we choose d such that (p, a) = (d). Since d | p and p is irreducible, we have d ∼ 1 or d ∼ p. Since we also have d | a and p ∤ a, the possibility d ∼ p is excluded. Thus d is a unit of A and so 1 ∈ (d) = (p, a).

This implies that we can write 1 = r p+s a with r, s ∈ A. Then b = (r p+s a)b = r b p+s a b is divisible by p, since p | a b.

By induction, this result can be generalized to an arbitrary product of elements of A:

Corollary 5.4.9. Let p be an irreducible element of A and let a1, . . . , as ∈ A. Suppose that p | a1 ··· as. Then p divides at least one of the elements a1, . . . , as.

Exercises.

5.4.1. Let a_1 and a_2 be elements of a euclidean domain A, with a_2 ≠ 0.

(i) Show that Algorithm 5.4.4 terminates after a finite number of steps.

(ii) In the notation of this algorithm, show that (a_i, a_{i+1}) = (a_{i+1}, a_{i+2}) for all i = 1, . . . , k − 2.

(iii) In the same notation, show that (a_{k−1}, a_k) = (a_k).

(iv) Conclude that (a1, a2) = (ak).

5.4.2. Find the gcd of f(x) = x^3 + x − 2 and g(x) = x^5 − x^4 + 2x^2 − x − 1 in Q[x] and express it as a linear combination of f(x) and g(x).

5.4.3. Find the gcd of f(x) = x^3 + 1̄ and g(x) = x^5 + x^4 + x + 1̄ in F_2[x] and express it as a linear combination of f(x) and g(x).

5.4.4. Find the gcd of 114 and 45 in Z and express it as a linear combination of 114 and 45.

5.4.5. Let a1, . . . , as be elements, not all zero, of a euclidean domain A.

(i) Show that a1, . . . , as admit a gcd in A in the sense of Exercise 5.2.3.

(ii) If s ≥ 3, show that gcd(a1, . . . , as) = gcd(a1, gcd(a2, . . . , as)).

5.4.6. Let a1, . . . , as be nonzero elements of a euclidean domain A.

(i) Show that a1, . . . , as admit an lcm in A in the sense of Exercise 5.2.4.

(ii) If s ≥ 3, show that lcm(a1, . . . , as) = lcm(a1, lcm(a2, . . . , as)).

5.4.7. Find the gcd of 42, 63 and 140 in Z, and express it as a linear combination of the three integers (see Exercise 5.4.5).

5.4.8. Find the gcd of x^4 − 1, x^6 − 1 and x^3 + x^2 + x + 1 in Q[x], and express it as a linear combination of the three polynomials (see Exercise 5.4.5).

5.5 The Unique Factorization Theorem

We again fix a euclidean domain A and a function ϕ : A \ {0} → N as in Definition 5.4.1. The results of the preceding section only rely on the second property of ϕ required by Definition 5.4.1. The following result uses the first property.

Proposition 5.5.1. Let a, b ∈ A \ {0} with a | b. Then we have ϕ(a) ≤ ϕ(b), with equality if and only if a ∼ b.

Proof. Since a | b, we have ϕ(a) ≤ ϕ(b) as required in Definition 5.4.1. If a ∼ b, we also have b | a, hence ϕ(b) ≤ ϕ(a) and so ϕ(a) = ϕ(b). Conversely, suppose that ϕ(a) = ϕ(b). We can write a = qb + r with q, r ∈ A satisfying r = 0 or ϕ(r) < ϕ(b). Since a | b, we see that a divides r = a − qb, hence r = 0 or ϕ(a) ≤ ϕ(r). Since ϕ(a) = ϕ(b), the only possibility is that r = 0. This implies that a = qb, hence b | a and thus a ∼ b.

Applying this result with a = 1, we deduce the following.

Corollary 5.5.2. Let b ∈ A \ {0}. We have ϕ(b) ≥ ϕ(1), with equality if and only if b ∈ A^×.

We are now ready to prove the main result of this chapter.

Theorem 5.5.3 (Unique Factorization Theorem). Let a ∈ A \ {0}. There exists a unit u ∈ A^×, an integer r ≥ 0 and irreducible elements p_1, . . . , p_r of A such that

a = u p_1 ··· p_r.          (5.8)

For every other factorization of a as a product

a = u′ q_1 ··· q_s          (5.9)

of a unit u′ ∈ A^× and irreducible elements q_1, . . . , q_s of A, we have s = r and there exists a permutation (i_1, i_2, . . . , i_r) of (1, 2, . . . , r) such that

q_1 ∼ p_{i_1}, q_2 ∼ p_{i_2}, . . . , q_r ∼ p_{i_r}.

Proof. 1st Existence: We first show the existence of a factorization of the form (5.8) by induction on ϕ(a). If ϕ(a) ≤ ϕ(1), Corollary 5.5.2 tells us that ϕ(a) = ϕ(1) and that a ∈ A^×. In this case, Equality (5.8) is satisfied for the choices u = a and r = 0. Now suppose that ϕ(a) > ϕ(1). Among the divisors p of a with ϕ(p) > ϕ(1), we choose one for which ϕ(p) is minimal. If q is a divisor of this element p, then q | a and so we have ϕ(q) = ϕ(1) or ϕ(q) ≥ ϕ(p). By Proposition 5.5.1, this implies that q ∼ 1 or q ∼ p. Since this is true for every divisor q of p, we conclude that p is irreducible. Write a = bp with b ∈ A. Since b | a and b ≁ a, Proposition 5.5.1 gives ϕ(b) < ϕ(a). By induction, we can assume that b admits a factorization of the type (5.8). Then a = bp also admits such a factorization (with the additional factor p).

2nd Uniqueness: Suppose that a also admits the factorization (5.9). Then we have

u p_1 ··· p_r = u′ q_1 ··· q_s          (5.10)

where u, u′ are units and p_1, . . . , p_r, q_1, . . . , q_s are irreducible elements of A. If r = 0, the left side of this equality is simply u and, since no irreducible element of A can divide a unit, this implies that s = 0, as required. Now suppose that r ≥ 1. Since p_1 divides u p_1 ··· p_r, it also divides u′ q_1 ··· q_s. Since p_1 is irreducible and p_1 ∤ u′, this implies, by Corollary 5.4.9, that p_1 divides at least one of the factors q_1, . . . , q_s. Permuting q_1, . . . , q_s if necessary, we can assume that p_1 | q_1. Then we have q_1 = u_1 p_1 for a unit u_1 ∈ A^×, and Equality (5.10) gives

u p_2 ··· p_r = u″ q_2 ··· q_s

where u″ = u′ u_1 ∈ A^×. Since the left side of this new equality contains one less irreducible factor, we can assume, by induction, that this implies r − 1 = s − 1 and q_2 ∼ p_{i_2}, . . . , q_r ∼ p_{i_r} for some permutation (i_2, . . . , i_r) of (2, . . . , r). Then r = s and q_1 ∼ p_1, q_2 ∼ p_{i_2}, . . . , q_r ∼ p_{i_r}, as desired.

Because the group of units of Z is Z^× = {1, −1} and every irreducible element of Z is associated to a unique (positive) prime p, we conclude:

Corollary 5.5.4. Every nonzero integer a can be written in the form

a = ±p_1 ··· p_r

for a choice of sign ±, an integer r ≥ 0 and prime numbers p_1, . . . , p_r. This expression is unique up to a permutation of p_1, . . . , p_r.

If r = 0, we interpret the product ±p_1 ··· p_r as being ±1. Since we can order the prime numbers by their size, we deduce:

Corollary 5.5.5. For every nonzero integer a, there exists a unique choice of sign ±, an integer s ≥ 0, prime numbers p_1 < p_2 < ··· < p_s and positive integers e_1, e_2, . . . , e_s such that

a = ±p_1^{e_1} p_2^{e_2} ··· p_s^{e_s}.
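For the ring Z, this factorization can be computed by simple trial division; a minimal Python sketch (an aside, not part of the original notes; factor is our own helper):

    def factor(a):
        # returns (sign, [(p1, e1), ..., (ps, es)]) with a = sign * p1^e1 * ... * ps^es and p1 < ... < ps
        assert a != 0
        sign, n = (1 if a > 0 else -1), abs(a)
        factors = []
        p = 2
        while p * p <= n:
            e = 0
            while n % p == 0:
                n //= p
                e += 1
            if e:
                factors.append((p, e))
            p += 1
        if n > 1:                    # whatever remains is prime
            factors.append((n, 1))
        return sign, factors

    print(factor(-360))   # (-1, [(2, 3), (3, 2), (5, 1)]): -360 = -(2^3 * 3^2 * 5)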

Similarly, if K is a field, we know that K[x]^× = K^× and that every irreducible polynomial of K[x] is associated to a unique irreducible monic polynomial. However, there is no natural order on K[x]. Therefore we have:

Corollary 5.5.6. Let K be a field and let a(x) ∈ K[x] \ {0}. There exists a constant c ∈ K^×, an integer s ≥ 0, distinct irreducible monic polynomials p_1(x), . . . , p_s(x) of K[x] and positive integers e_1, . . . , e_s such that

a(x) = c p_1(x)^{e_1} ··· p_s(x)^{e_s}.

This factorization is unique up to permutation of the factors p_1(x)^{e_1}, . . . , p_s(x)^{e_s}.

Exercises.

5.5.1. Let a be a nonzero element of a euclidean domain A. Write a = u p_1^{e_1} ··· p_s^{e_s} where u is a unit of A, where p_1, . . . , p_s are irreducible elements of A, pairwise non-associated, and where e_1, . . . , e_s are integers ≥ 0 (we set p_i^0 = 1). Show that the divisors of a are the products v p_1^{d_1} ··· p_s^{d_s} where v is a unit of A and where d_1, . . . , d_s are integers with 0 ≤ d_i ≤ e_i for i = 1, . . . , s.

5.5.2. Let a and b be nonzero elements of a euclidean domain A and let p_1, . . . , p_s be a maximal system of irreducible divisors, pairwise non-associated, of ab.

(i) Show that we can write

a = u p_1^{e_1} ··· p_s^{e_s} and b = v p_1^{f_1} ··· p_s^{f_s}

with u, v ∈ A^× and e_1, . . . , e_s, f_1, . . . , f_s ∈ N.

(ii) Show that

gcd(a, b) ∼ p_1^{d_1} ··· p_s^{d_s}

where d_i = min(e_i, f_i) for i = 1, . . . , s.

(iii) Show that

lcm(a, b) ∼ p_1^{h_1} ··· p_s^{h_s}

where h_i = max(e_i, f_i) for i = 1, . . . , s (see Exercise 5.2.4 for the notion of lcm).

Hint. For (ii), use the result of Exercise 5.5.1.

5.6 The Fundamental Theorem of Algebra

Fix a field K. We first note:

Proposition 5.6.1. Let a(x) ∈ K[x] and r ∈ K. The polynomial x − r divides a(x) if and only if a(r) = 0.

Proof. Dividing a(x) by x − r, we see that

a(x) = q(x)(x − r) + b

where q(x) ∈ K[x] and b ∈ K, since the remainder in the division is zero or of degree < deg(x − r) = 1. Evaluating the two sides of this equality at x = r, we obtain a(r) = b. The conclusion follows.

Definition 5.6.2 (Root). Let a(x) ∈ K[x]. We say that an element r of K is a root of a(x) if a(r) = 0.

The Alembert-Gauss Theorem, also called the Fundamental Theorem of Algebra is in fact a theorem in analysis which can be stated as follows:

Theorem 5.6.3 (Alembert-Gauss Theorem). Every polynomial a(x) ∈ C[x] of degree ≥ 1 has at least one root in C.

We will accept this result without proof. In light of Proposition 5.6.1, it implies that the irreducible monic polynomials of C[x] are of the form x − r with r ∈ C. By Corollary 5.5.6, we conclude:

Theorem 5.6.4. Let a(x) ∈ C[x] \ {0}, let r1, . . . , rs be the distinct roots of a(x) in C, and let c be the leading coefficient of a(x). There exists a unique choice of integers e1, . . . , es ≥ 1 such that

a(x) = c (x − r1)^{e1} ··· (x − rs)^{es}.

The integer ei is called the multiplicity of the root ri of a(x).
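As an illustration of Theorem 5.6.4, the distinct roots and their multiplicities can be computed symbolically. The sketch below is ours (using sympy); the polynomial is a made-up example:

from sympy import symbols, roots, expand

x = symbols('x')
# a(x) = (x - 1)^2 (x + 2) has the root 1 with multiplicity 2 and the root -2 with multiplicity 1.
a = expand((x - 1)**2 * (x + 2))
print(roots(a, x))  # {1: 2, -2: 1}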

Theorem 5.6.3 also sheds light on the irreducible polynomials of R[x].

Theorem 5.6.5. The monic irreducible polynomials of R[x] are of the form x − r with r ∈ R or of the form x^2 + bx + c with b, c ∈ R and b^2 − 4c < 0.

Proof. Let p(x) be a monic irreducible polynomial in R[x]. By Theorem 5.6.3, it has at least one root r in C. If r ∈ R, then x − r divides p(x) in R[x] and so p(x) = x − r. If r ∉ R, then the complex conjugate r̄ of r is different from r and, since p(x) has real coefficients, p(r̄) is the complex conjugate of p(r) = 0, so r̄ is also a root of p(x). Therefore, p(x) is divisible by (x − r)(x − r̄) in C[x]. Since this product can be written as x^2 + bx + c with b = −(r + r̄) ∈ R and c = r r̄ ∈ R, the quotient of p(x) by x^2 + bx + c is a polynomial in R[x] and so p(x) = x^2 + bx + c. Since the roots of p(x) in C are not real, we must have b^2 − 4c < 0.

Exercises.

5.6.1. Let K be a field and let p(x) be a nonzero polynomial of K[x].

(i) Suppose that deg(p(x)) = 1. Show that p(x) is irreducible.

(ii) Suppose that 2 ≤ deg(p(x)) ≤ 3. Show that p(x) is irreducible if and only if it does not have a root in K.

(iii) In general, show that p(x) is irreducible if and only if it does not have any irreducible factors of degree ≤ deg(p(x))/2 in K[x].

5.6.2. Let F3 = {0̄, 1̄, 2̄} be the finite field with 3 elements.

(i) Find all the irreducible monic polynomials of F3[x] of degree at most 2.

(ii) Show that x^4 + x^3 + 2̄ is an irreducible polynomial of F3[x].

(iii) Factor x^5 + 2̄x^4 + 2̄x^3 + x^2 + x + 2̄ as a product of irreducible polynomials of F3[x].
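Answers to this kind of exercise can be checked by machine: sympy factors polynomials over F_p via its modulus option. The sketch below is ours and is only a check on the hand computation the exercise asks for:

from sympy import symbols, factor

x = symbols('x')
# Factorization over F_3 = Z/3Z.
print(factor(x**4 + x**3 + 2, modulus=3))                        # remains irreducible, as stated in (ii)
print(factor(x**5 + 2*x**4 + 2*x**3 + x**2 + x + 2, modulus=3))  # the factorization asked for in (iii)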

5.6.3. Find all the irreducible monic polynomials of F2[x] of degree at most 4, where F2 = {0̄, 1̄} is the finite field with 2 elements.

5.6.4. Factor x^4 − 2 as a product of irreducible polynomials (i) in R[x], (ii) in C[x].

5.6.5. Let V be a vector space of finite dimension n ≥ 1 over C and let T : V → V be a linear operator. Show that T has at least one eigenvalue.

5.6.6. Let V be a vector space of finite dimension n ≥ 1 over a field K, let T : V → V be a linear operator on V, and let λ1, . . . , λs be the distinct eigenvalues of T in K. Show that T is diagonalizable if and only if

char_T(x) = (x − λ1)^{e1} ··· (x − λs)^{es}

where ei = dim_K E_{λi}(T) for i = 1, . . . , s. Hint. Use Theorem 3.2.3.

5.6.7. Let A ∈ Mat_{n×n}(K) where n is a positive integer and K is a field. State and prove a necessary and sufficient condition, analogous to that of Exercise 5.6.6, for A to be diagonalizable over K.

Chapter 6

Modules

The notion of a module is a natural extension of that of a vector space. The difference is that the scalars no longer lie in a field but rather in a ring. This permits one to encompass more objects. For example, we will see that an abelian group is a module over the ring Z of integers. We will also see that if V is a vector space over a field K equipped with a linear operator, then V is naturally a module over the ring K[x] of polynomials in one variable over K. In this chapter, we introduce in a general manner the theory of modules over a commutative ring and the notions of submodule, free module, direct sum, homomorphism and isomorphism of modules. We interpret these constructions in several contexts, the most important for us being the case of a vector space over a field K equipped with a linear operator.

6.1 The notion of a module

Definition 6.1.1 (Module). A module over a commutative ring A is a set M equipped with an addition and a multiplication by the elements of A:

M × M −→ M, (u, v) ↦ u + v   and   A × M −→ M, (a, u) ↦ a u,

that satisfy the following axioms:

Mod1. u + (v + w) = (u + v) + w for all u, v, w ∈ M.
Mod2. u + v = v + u for all u, v ∈ M.
Mod3. There exists 0 ∈ M such that u + 0 = u for all u ∈ M.
Mod4. For all u ∈ M, there exists −u ∈ M such that u + (−u) = 0.
Mod5. 1u = u for all u ∈ M.
Mod6. a(bu) = (ab)u for all a, b ∈ A and u ∈ M.
Mod7. (a + b)u = a u + b u for all a, b ∈ A and u ∈ M.
Mod8. a(u + v) = a u + a v for all a ∈ A and u, v ∈ M.


In what follows, we will speak of a module over A or of an A-module to designate such an object. If we compare the definition of a module over a commutative ring A to that of a vector space over a field K (see Definition 1.1.1), we recognize the same axioms. Since a field is a particular type of commutative ring, we deduce: Example 6.1.2. A module over a field K is simply a vector space over K.

Axioms Mod1 to Mod4 imply that a module M is an abelian group under addition (see Appendix A). From this we deduce that the order in which we add elements u1,..., un of M does not affect their sum, denoted

u1 + ··· + un or ∑_{i=1}^{n} u_i.

The element 0 of M characterized by axiom Mod3 is called the zero of M (it is the identity element for the addition). Finally, for each u ∈ M, the element −u of M characterized by axiom Mod4 is called the additive inverse of u. We define subtraction in M by

u − v := u + (−v).

We also verify, by induction on n, that the axioms of distributivity Mod7 and Mod8 imply in general

(∑_{i=1}^{n} a_i) u = ∑_{i=1}^{n} a_i u   and   a ∑_{i=1}^{n} u_i = ∑_{i=1}^{n} a u_i   (6.1)

for all a, a1, . . . , an ∈ A and u, u1, . . . , un ∈ M. We also show:

Lemma 6.1.3. Let M be a module over a commutative ring A. For all a ∈ A and all u ∈ M, we have

(i) 0u = 0 and a0 = 0;

(ii) (−a)u = a(−u) = −(a u).

Proof. In A, we have 0 + 0 = 0. Then axiom Mod7 gives

0u = (0 + 0)u = 0u + 0u, hence 0u = 0. More generally, since a + (−a) = 0, the same axiom gives

0u = (a + (−a))u = a u + (−a)u.

Since 0u = 0, this implies that (−a)u = −(a u). The relations a0 = 0 and a(−u) = −(a u) are proven in a similar manner by applying instead axiom Mod8.

Suppose that M is a Z-module, that is, a module over Z. Then M is an abelian group under addition and, for every integer n ≥ 1 and for all u ∈ M, the general formulas for distributivity (6.1) give

n u = (1 + ··· + 1)u = 1u + ··· + 1u = u + ··· + u   (n terms in each sum).

Thanks to Lemma 6.1.3, we also find that

(−n)u = n(−u) = (−u) + ··· + (−u)   (n terms)   and 0u = 0.

In other words, multiplication by the integers in M is entirely determined by the addition law in M and corresponds to the natural notion of product by an integer in an abelian group. Conversely, if M is an abelian group, the rule of “exponents” (Proposition A.2.11 of Appendix A) implies that the product by an integer in M satisfies axioms Mod6 to Mod8. Since it clearly satisfies axiom Mod5, we conclude:

Proposition 6.1.4. A Z-module is simply an abelian group equipped with the natural mul- tiplication by the integers.

The following proposition gives a third example of modules.

Proposition 6.1.5. Let V be a vector space over a field K and let T ∈ EndK (V ) be a linear operator on V . Then V is a K[x]-module for the addition of V and the multiplication by polynomials given by p(x)v := p(T )(v) for all p(x) ∈ K[x] and all v ∈ V .

Proof. Since V is an abelian group under addition, it suffices to verify axioms Mod5 to Mod8. Let p(x), q(x) ∈ K[x] and u, v ∈ V . We must show:

1) 1v = v,

2) p(x)(q(x)v) = (p(x)q(x))v,

3) (p(x) + q(x))v = p(x)v + q(x)v,

4) p(x)(u + v) = p(x)u + p(x)v.

By the definition of the product by a polynomial, this reduces to showing

1′) I_V(v) = v,

2′) p(T)(q(T)v) = (p(T) ∘ q(T))v,

3′) (p(T) + q(T))v = p(T)v + q(T)v,

4′) p(T)(u + v) = p(T)u + p(T)v,

because the map K[x] −→ End_K(V), r(x) ↦ r(T), of evaluation at T, being a ring homomorphism, maps 1 to I_V, p(x) + q(x) to p(T) + q(T) and p(x)q(x) to p(T) ∘ q(T). Equalities 1′), 2′) and 3′) follow from the definitions of I_V, p(T) ∘ q(T) and p(T) + q(T) respectively. Finally, 4′) follows from the fact that p(T) is a linear map.

In the context of Proposition 6.1.5, we also note that, for a constant polynomial p(x) = a with a ∈ K, we have p(x)v = (a I)(v) = a v for all v ∈ V . Thus the notation a v has the same meaning if we consider a as a polynomial of K[x] or as a scalar of the field K. We can then state the following complement to Proposition 6.1.5.

Proposition 6.1.6. In the notation of Proposition 6.1.5, the external multiplication by the elements of K[x] on V extends the scalar multiplication by the elements of K.

Example 6.1.7. Let V be a vector space of dimension 2 over Q equipped with a basis B = {v1, v2} and let T : V → V be the linear operator determined by the conditions

T (v1) = v1 − v2,T (v2) = v1.

We equip V with the structure of a Q[x]-module associated to the endomorphism T and calculate (2x − 1)v1 + (x^2 + 3x)v2. By definition, we have

(2T − I)(v1) + (T^2 + 3T)(v2) = 2T(v1) − v1 + T^2(v2) + 3T(v2)
  = 3T(v1) − v1 + 3T(v2)   (since T^2(v2) = T(T(v2)) = T(v1))
  = 3(v1 − v2) − v1 + 3v1
  = 5v1 − 3v2.
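Since the K[x]-module action is just matrix arithmetic in coordinates, this computation can be checked mechanically. The following sympy sketch (ours) redoes it in the basis B, where the columns of [ T ]_B are [T(v1)]_B = (1, −1)^T and [T(v2)]_B = (1, 0)^T:

from sympy import Matrix, eye

# Matrix of T in the basis B = {v1, v2}: T(v1) = v1 - v2, T(v2) = v1.
T = Matrix([[1, 1],
            [-1, 0]])
v1 = Matrix([1, 0])  # coordinates of v1 in B
v2 = Matrix([0, 1])  # coordinates of v2 in B

# (2x - 1)v1 + (x^2 + 3x)v2 becomes (2T - I)(v1) + (T^2 + 3T)(v2).
result = (2*T - eye(2))*v1 + (T**2 + 3*T)*v2
print(result.T)  # Matrix([[5, -3]]), i.e. 5 v1 - 3 v2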

Fix an arbitrary commutative ring A. We conclude with three general examples of A-modules. The details of the proofs are left as exercises.

Example 6.1.8. The ring A is an A-module for its addition law and the external multiplication

A × A −→ A (a, u) 7−→ a u

given by the product in A.

Example 6.1.9. Let n ∈ N_{>0}. The set

A^n = { (a1, a2, . . . , an)^T ; a1, a2, . . . , an ∈ A }

of n-tuples of elements of A, written as columns, is a module over A for the componentwise operations

(a1, . . . , an)^T + (b1, . . . , bn)^T = (a1 + b1, . . . , an + bn)^T   and   c (a1, . . . , an)^T = (c a1, . . . , c an)^T.

For n = 1, we recover the natural structure of A-module on A1 = A described by Example 6.1.8.

Example 6.1.10. Let M1,...,Mn be A-modules. Their cartesian product

M1 × · · · × Mn = {(u1,..., un); u1 ∈ M1,..., un ∈ Mn} is an A-module for the operations

(u1,..., un) + (v1,..., vn) = (u1 + v1,..., un + vn),

c (u1,..., un) = (c u1, . . . , c un).

Remark. If we apply the construction of Example 6.1.10 with M1 = ··· = Mn = A for the natural A-module structure on A given by Example 6.1.8, we obtain the structure of an A-module on A × · · · × A = An that is essentially that of Example 6.1.9 except that the elements of An are presented in rows. For other applications, it is nevertheless more convenient to write the elements of An in columns.

Exercises.

6.1.1. Complete the proof of Lemma 6.1.3 by showing that a 0 = 0 and a(−u) = −(a u).

6.1.2. Let V be a vector space of dimension 3 over F5 equipped with a basis B = {v1, v2, v3} and let T : V → V be a linear operator determined by the conditions

T(v1) = v1 + 2̄v2 − v3,  T(v2) = v2 + 4̄v3,  T(v3) = v3.

For the corresponding structure of an F5[x]-module on V, calculate

(i) x v1,

(ii) x^2 v1 − (x + 1̄)v2,

(iii) (2̄x^2 − 3̄)v1 + (3̄x)v2 + (x^2 + 2̄)v3.

6.1.3. Let T : R^2 → R^2 be a linear operator whose matrix in the standard basis E = {e1, e2} is

[ T ]_E =
[ 2 −1 ]
[ 3  0 ].

For the associated R[x]-module structure on R^2, calculate

(i) x e1,

(ii) x^2 e1 + e2,

(iii) (x^2 + x + 1)e1 + (2x − 1)e2.

6.1.4. Let n ∈ N_{>0} and let A be a commutative ring. Show that A^n is an A-module for the operations defined in Example 6.1.9.

6.1.5. Let M1, . . . , Mn be modules over a commutative ring A. Show that their cartesian product M1 × ··· × Mn is an A-module for the operations defined in Example 6.1.10.

6.1.6. Let M be a module over the commutative ring A and let X be a set. For all a ∈ A and every pair of functions f : X → M and g : X → M, we define f + g : X → M and af : X → M by setting (f + g)(x) = f(x) + g(x) and (af)(x) = af(x) for all x ∈ X. Show that the set F(X, M) of functions from X to M is an A-module for these operations.

6.2 Submodules

Fix an arbitrary module M over an arbitrary commutative ring A.

Definition 6.2.1 (Submodule). An A-submodule of M is a subset N of M satisfying the following conditions:

SM1. 0 ∈ N.
SM2. If u, v ∈ N, then u + v ∈ N.
SM3. If a ∈ A and u ∈ N, then a u ∈ N.

Note that condition SM3 implies that −u = (−1)u ∈ N for all u ∈ N which, together with conditions SM1 and SM2, implies that an A-submodule of M is, in particular, a subgroup of M. Conditions SM2 and SM3 require that an A-submodule N of M be stable under the addition in M and the multiplication by elements of A in M. By restriction, these operations therefore define an addition on N and a multiplication by elements of A on N. We leave it as an exercise to verify that these operations satisfy the 8 axioms required for N to be an A-module.

Proposition 6.2.2. An A-submodule N of M is itself an A-module for the addition and multiplication by the elements of A restricted from M to N.

Example 6.2.3. Let V be a vector space over a field K. A K-submodule of V is simply a vector subspace of V.

Example 6.2.4. Let M be an abelian group. A Z-submodule of M is simply a subgroup of M.

Example 6.2.5. Let A be a commutative ring considered as a module over itself (see Example 6.1.8). An A-submodule of A is simply an ideal of A.

In the case of a vector space V equipped with an endomorphism, the concept of submodule translates as follows:

Proposition 6.2.6. Let V be a vector space over a field K and let T ∈ EndK (V ). For the corresponding structure of a K[x]-module on V , a K[x]-submodule of V is simply a T - invariant vector subspace of V , that is a vector subspace U of V such that T (u) ∈ U for all u ∈ U.

Proof. Since the multiplication by elements of K[x] on V extends the scalar multiplication by the elements of K (see Proposition 6.1.6), a K[x]-submodule U of V is necessarily a vector subspace of V. Since it also satisfies xu ∈ U for all u ∈ U, and xu = T(u) (by definition), we see that U must be T-invariant. Conversely, if U is a T-invariant vector subspace of V, we have T(u) ∈ U for all u ∈ U. By induction on n, we deduce that T^n(u) ∈ U for every integer n ≥ 1 and all u ∈ U. Therefore, for every polynomial p(x) = a0 + a1 x + ··· + an x^n ∈ K[x] and u ∈ U, we have

p(x)u = (a0 I + a1 T + ··· + an T^n)(u) = a0 u + a1 T(u) + ··· + an T^n(u) ∈ U.

Thus U is a K[x]-submodule of V: we have just showed that it satisfies condition SM3, and the other conditions SM1 and SM2 are satisfied by virtue of the fact that U is a subspace of V.

We conclude this section with several valuable general constructions for an arbitrary A-module.

Proposition 6.2.7. Let u1,..., us be elements of an A-module M. The set

{a1 u1 + ··· + as us ; a1, . . . , as ∈ A}

is an A-submodule of M that contains u1,..., us and it is the smallest A-submodule of M with this property.

This A-submodule is called the A-submodule of M generated by u1,..., us. It is denoted

⟨u1, . . . , us⟩_A or A u1 + ··· + A us.

The proof of this result is left as an exercise.

Definition 6.2.8 (Cyclic module and finite type module). We say that an A-module M (or an A-submodule of M) is cyclic if it is generated by a single element. We say that it is of finite type (or is finitely generated) if it is generated by a finite number of elements of M.

Example 6.2.9. If u1, . . . , us are elements of an abelian group M, then ⟨u1, . . . , us⟩_Z is simply the subgroup of M generated by u1, . . . , us. In particular, M is cyclic as an abelian group if and only if it is cyclic as a Z-module.

Example 6.2.10. Let V be a finite-dimensional vector space over a field K, let {v1,..., vs} be a set of generators of V , and let T ∈ EndK (V ). Since the structure of a K[x]-module on V associated to T extends that of a K-module, we have

V = ⟨v1, . . . , vs⟩_K ⊆ ⟨v1, . . . , vs⟩_{K[x]}, hence V = ⟨v1, . . . , vs⟩_{K[x]} is also generated by v1, . . . , vs as a K[x]-module. In particular, V is a K[x]-module of finite type.

Proposition 6.2.11. Let N1,...,Ns be submodules of an A-module M. Their intersection N1 ∩ · · · ∩ Ns and their sum

N1 + ··· + Ns = {u1 + ··· + us ; u1 ∈ N1,..., us ∈ Ns} are also A-submodules of M.

Proof for the sum. We first note that, since 0 ∈ Ni for i = 1, . . . , s, we have

0 = 0 + ··· + 0 ∈ N1 + ··· + Ns.

Let u, u′ be arbitrary elements of N1 + ··· + Ns, and let a ∈ A. We can write

u = u1 + ··· + us and u′ = u1′ + ··· + us′

with u1, u1′ ∈ N1, . . . , us, us′ ∈ Ns. Thus we have

u + u′ = (u1 + u1′) + ··· + (us + us′) ∈ N1 + ··· + Ns,

a u = (a u1) + ··· + (a us) ∈ N1 + ··· + Ns.

This proves that N1 + ··· + Ns is an A-submodule of M.

In closing this section, we note that the notation A u1 + ··· + A us for the A-submodule ⟨u1, . . . , us⟩_A generated by the elements u1, . . . , us of M is consistent with the notion of a sum of submodules since, for every element u of M, we have

A u = ⟨u⟩_A = {a u ; a ∈ A}.

When A = K is a field, we recover the usual notions of the intersection and sum of subspaces of a vector space over K. When A = Z, we instead recover the notions of the intersection and sum of subgroups of an abelian group. 6.3. FREE MODULES 83

Exercises.

6.2.1. Prove Proposition 6.2.2.

6.2.2. Let M be an abelian group. Explain why the Z-submodules of M are simply the subgroups of M. Hint. Show that every Z-submodule of M is a subgroup of M and, conversely, that every subgroup of M is a Z-submodule of M.

6.2.3. Prove Proposition 6.2.7.

6.2.4. Let M be a module over a commutative ring A, and let u1, . . . , us, v1, . . . , vt ∈ M. Set L = ⟨u1, . . . , us⟩_A and N = ⟨v1, . . . , vt⟩_A. Show that L + N = ⟨u1, . . . , us, v1, . . . , vt⟩_A.

6.2.5. Let M be a module over a commutative ring A. We say that an element u of M is torsion if there exists a nonzero element a of A such that a u = 0. We say that M is a torsion A-module if all its elements are torsion. Suppose that A is an integral domain.

(i) Show that the set M_{tor} of torsion elements of M is an A-submodule of M.

(ii) Show that, if M is of finite type and if all its elements are torsion, then there exists a nonzero element a of A such that a u = 0 for all u ∈ M.

6.3 Free modules

Definition 6.3.1 (Linear independence, basis, free module). Let u1,..., un be elements of an A-module M.

(i) We say that u1,..., un are linearly independent over A if the only choice of elements a1, . . . , an of A such that a1u1 + ··· + anun = 0

is a1 = ··· = an = 0.

(ii) We say that {u1,..., un} is a basis of M if u1,..., un are linearly independent over A and generate M as an A-module.

(iii) We say that M is a free A-module if it admits a basis.

By convention, the module {0} consisting of a single element is a free A-module that admits the empty set as a basis.

In contrast to finite type vector spaces over a field K, which always admit a basis, A-modules of finite type do not always admit a basis (see the exercises).

Example 6.3.2. Let n ∈ N_{>0}. The A-module A^n introduced in Example 6.1.9 admits the basis

e1 = (1, 0, . . . , 0)^T,  e2 = (0, 1, . . . , 0)^T,  . . . ,  en = (0, . . . , 0, 1)^T.

Indeed, we have

(a1, a2, . . . , an)^T = a1 e1 + a2 e2 + ··· + an en   (6.2)

for all a1, . . . , an ∈ A. Thus e1, . . . , en generate A^n as an A-module. Furthermore, if a1, . . . , an ∈ A satisfy a1 e1 + ··· + an en = 0, then equality (6.2) gives (a1, . . . , an)^T = (0, . . . , 0)^T

and so a1 = a2 = ··· = an = 0. Hence e1,..., en are also linearly independent over A.

Exercises.

6.3.1. Show that a finite abelian group M admits a basis as a Z-module if and only if M = {0} (in which case the empty set is, by convention, a basis of M).

6.3.2. In general, show that, if M is a finite module over an infinite commutative ring A, then M is a free A-module if and only if M = {0}.

6.3.3. Let V be a finite-dimensional vector space over a field K and let T ∈ End_K(V). Show that, for the corresponding structure of a K[x]-module, V admits a basis if and only if V = {0}.

6.3.4. Let m and n be positive integers and let A be a commutative ring. Show that Mat_{m×n}(A) is a free A-module for the usual operations of addition and multiplication by elements of A.

6.4 Direct sum

Definition 6.4.1 (Direct sum). Let M1,...,Ms be submodules of an A-module M. We say that their sum is direct if the only choice of u1 ∈ M1,..., us ∈ Ms such that

u1 + ··· + us = 0 6.4. DIRECT SUM 85

is u1 = ··· = us = 0. When this condition is satisfied, we write the sum M1 + ··· + Ms as M1 ⊕ · · · ⊕ Ms.

We say that M is the direct sum of M1,...,Ms if M = M1 ⊕ · · · ⊕ Ms.

The following propositions, which generalize the analogous results of Section 1.4, are left as exercises. The first justifies the importance of the notion of direct sum.

Proposition 6.4.2. Let M1,...,Ms be submodules of an A-module M. Then we have M = M1 ⊕· · ·⊕Ms if and only if, for all u ∈ M, there exists a unique choice of u1 ∈ M1,..., us ∈ Ms such that u = u1 + ··· + us.

Proposition 6.4.3. Let M1,...,Ms be submodules of an A-module M. The following con- ditions are equivalent:

1) the sum of M1,...,Ms is direct,

2) Mi ∩ (M1 + ··· + M̂i + ··· + Ms) = {0} for i = 1, . . . , s,

3) Mi ∩ (Mi+1 + ··· + Ms) = {0} for i = 1, . . . , s − 1.

(As usual, the notation M̂i means that we omit the term Mi from the sum.)

Example 6.4.4. Let V be a vector space over a field K and let T ∈ End_K(V). For the corresponding structure of a K[x]-module on V, we know that the submodules of V are simply the T-invariant subspaces of V. Since the notion of direct sum only involves addition, to say that V, as a K[x]-module, is a direct sum of K[x]-submodules V1, . . . , Vs is equivalent to saying that V, as a vector space over K, is a direct sum of T-invariant subspaces V1, . . . , Vs.

Exercises.

6.4.1. Prove Proposition 6.4.3.

6.4.2. Let M be a module over a commutative ring A and let u1, . . . , un ∈ M. Show that {u1, . . . , un} is a basis of M if and only if M = Au1 ⊕ ··· ⊕ Aun and none of the elements u1, . . . , un is torsion (see Exercise 6.2.5 for the definition of this term).

6.5 Homomorphisms of modules

Fix a commutative ring A.

Definition 6.5.1. Let M and N be A-modules. A homomorphism of A-modules from M to N is a function ϕ: M → N satisfying the following conditions:

HM1. ϕ(u1 + u2) = ϕ(u1) + ϕ(u2) for all u1, u2 ∈ M,
HM2. ϕ(a u) = a ϕ(u) for all a ∈ A and all u ∈ M.

Condition HM1 implies that a homomorphism of A-modules is, in particular, a homomorphism of groups under addition.

Example 6.5.2. If A = K is a field, then K-modules M and N are simply vector spaces over K and a homomorphism of K-modules ϕ: M → N is nothing but a linear map from M to N.

Example 6.5.3. We recall that if A = Z, then Z-modules M and N are simply abelian groups equipped with the natural multiplication by the integers. If ϕ: M → N is a homomorphism of Z-modules, condition HM1 shows that it is a homomorphism of groups. Conversely, if ϕ: M → N is a homomorphism of abelian groups, we have

ϕ(n u) = n ϕ(u)

for all n ∈ Z and all u ∈ M (exercise), hence ϕ satisfies conditions HM1 and HM2 of Definition 6.5.1. Thus, a homomorphism of Z-modules is simply a homomorphism of abelian groups.

Example 6.5.4. Let V be a vector space over a field K and let T ∈ EndK (V ). For the corresponding structure of a K[x]-module on V , a homomorphism of K[x]-modules S : V → V is simply a linear map that commutes with T , i.e. such that S ◦ T = T ◦ S (exercise). 0 n Example 6.5.5. Let m, n ∈ N>0, and let P ∈ Matm×n(A). For all u, u ∈ A and a ∈ A, we have: P (u + u0) = P u + P u0 and P (a u) = a P u. Thus the function An −→ Am u 7−→ P u is a homomorphism of A-modules.

The following result shows that we obtain in this way all the homomorphisms of A-modules from A^n to A^m.

Proposition 6.5.6. Let m, n ∈ N_{>0} and let ϕ: A^n → A^m be a homomorphism of A-modules. There exists a unique matrix P ∈ Mat_{m×n}(A) such that

ϕ(u) = P u for all u ∈ A^n. (6.3)

Proof. Let {e1, . . . , en} be the standard basis of A^n defined in Example 6.3.2 and let P be the m × n matrix whose columns are C1 := ϕ(e1), . . . , Cn := ϕ(en). For every n-tuple u = (a1, . . . , an)^T ∈ A^n, we have

ϕ(u) = ϕ(a1 e1 + ··· + an en) = a1 C1 + ··· + an Cn = P u.

Thus P satisfies Condition (6.3). It is the only matrix that satisfies this condition since, if a matrix P′ ∈ Mat_{m×n}(A) also satisfies ϕ(u) = P′u for all u ∈ A^n, then its j-th column is P′ej = ϕ(ej) = Cj for all j = 1, . . . , n.
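The proof is constructive: the columns of P are the images of the standard basis vectors. Here is a short Python sketch of the recipe (ours; the homomorphism phi over A = Z is a made-up example):

from sympy import Matrix

# A hypothetical homomorphism phi: Z^3 -> Z^2, given as a function.
def phi(u):
    a1, a2, a3 = u
    return Matrix([a1 + 2*a3, 5*a2 - a3])

e = [Matrix([1, 0, 0]), Matrix([0, 1, 0]), Matrix([0, 0, 1])]
P = Matrix.hstack(*[phi(ej) for ej in e])  # columns phi(e1), ..., phi(en)
print(P)                    # Matrix([[1, 0, 2], [0, 5, -1]])

u = Matrix([1, 2, 3])
print(phi(u) == P*u)        # True: phi(u) = P u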

We continue the general theory with the following result.

Proposition 6.5.7. Let ϕ: M → N and ψ : N → P be homomorphisms of A-modules.

(i) The composite ψ ◦ ϕ: M → P is also a homomorphism of A-modules.

(ii) If ϕ: M → N is bijective, its inverse ϕ−1 : N → M is a homomorphism of A-modules.

Part (i) is left as an exercise.

Proof of (ii). Suppose that ϕ is bijective. Let v, v′ ∈ N and a ∈ A. We set

u := ϕ^{-1}(v) and u′ := ϕ^{-1}(v′).

Since u, u′ ∈ M and ϕ is a homomorphism of A-modules, we have

ϕ(u + u′) = ϕ(u) + ϕ(u′) = v + v′ and ϕ(a u) = a ϕ(u) = a v.

Therefore,

ϕ^{-1}(v + v′) = u + u′ = ϕ^{-1}(v) + ϕ^{-1}(v′) and ϕ^{-1}(a v) = a u = a ϕ^{-1}(v).

This proves that ϕ−1 : N → M is indeed a homomorphism of A-modules. Definition 6.5.8 (Isomorphism). A bijective homomorphism of A-modules is called an iso- morphism of A-modules. We say that an A-module M is isomorphic to an A-module N if there exists an isomorphism of A-modules ϕ: M → N. We then write M ' N.

In light of these definitions, Proposition 6.5.7 has the following immediate consequence:

Corollary 6.5.9. Let ϕ: M → N and ψ : N → P be isomorphisms of A-modules. Then ψ ◦ ϕ: M → P and ϕ−1 : N → M are also isomorphisms of A-modules. 88 CHAPTER 6. MODULES

We deduce from this that isomorphism is an equivalence relation on the class of A-modules. Furthermore, the isomorphisms of A-modules from an A-module M to itself, called automorphisms of M, form a subgroup of the group of bijections from M to itself. We call it the group of automorphisms of M.

Definition 6.5.10. Let ϕ: M → N be a homomorphism of A-modules. The kernel of ϕ is

ker(ϕ) := {u ∈ M ; ϕ(u) = 0}

and the image of ϕ is Im(ϕ) := {ϕ(u); u ∈ M}.

The kernel and image of a homomorphism of A-modules ϕ: M → N are therefore simply the kernel and image of ϕ considered as a homomorphism of abelian groups. The proof of the following result is left as an exercise.

Proposition 6.5.11. Let ϕ: M → N be a homomorphism of A-modules.

(i) ker(ϕ) is an A-submodule of M.

(ii) Im(ϕ) is an A-submodule of N.

(iii) The homomorphism ϕ is injective if and only if ker(ϕ) = {0}. It is surjective if and only if Im(ϕ) = N.

Example 6.5.12. Let M be an A-module and let a ∈ A. The function

ϕ: M −→ M, u ↦ a u

is a homomorphism of A-modules (exercise). Its kernel is

M_a := {u ∈ M ; a u = 0} and its image is aM := {a u ; u ∈ M}.

Thus, by Proposition 6.5.11, M_a and aM are both A-submodules of M.

We conclude this chapter with the following result.

Proposition 6.5.13. Let ϕ: M → N be a homomorphism of A-modules. Suppose that M is free with basis {u1,..., un}. Then ϕ is an isomorphism if and only if {ϕ(u1), . . . , ϕ(un)} is a basis of N.

Proof. We will show more precisely that

i) ϕ is injective if and only if ϕ(u1), . . . , ϕ(un) are linearly independent over A, and 6.5. HOMOMORPHISMS OF MODULES 89

ii) ϕ is surjective if and only if ϕ(u1), . . . , ϕ(un) generate N.

Therefore, ϕ is bijective if and only if {ϕ(u1), . . . , ϕ(un)} is a basis of N. To prove (i), we use the fact that ϕ is injective if and only if ker(ϕ) = {0} (Proposition 6.5.11(iii)). We note that, for all a1, . . . , an ∈ A, we have

a1ϕ(u1) + ··· + anϕ(un) = 0 ⇐⇒ ϕ(a1u1 + ··· + anun) = 0 (6.4)

⇐⇒ a1u1 + ··· + anun ∈ ker(ϕ).

If ker(ϕ) = {0}, the last condition becomes a1u1 + ··· + anun = 0, which implies a1 = ··· = an = 0 since u1, . . . , un are linearly independent over A. In this case, we conclude that ϕ(u1), . . . , ϕ(un) are linearly independent over A. Conversely, if ϕ(u1), . . . , ϕ(un) are linearly independent, the equivalences (6.4) tell us that the only choice of a1, . . . , an ∈ A such that a1u1 + ··· + anun ∈ ker(ϕ) is a1 = ··· = an = 0. Since u1, . . . , un generate M, this implies that ker(ϕ) = {0}. Thus ker(ϕ) = {0} if and only if ϕ(u1), . . . , ϕ(un) are linearly independent. To prove (ii), we use the fact that ϕ is surjective if and only if Im(ϕ) = N. Since M = ⟨u1, . . . , un⟩_A, we have

Im(ϕ) = {ϕ(a1u1 + ··· + anun); a1, . . . , an ∈ A}

= {a1ϕ(u1) + ··· + anϕ(un); a1, . . . , an ∈ A}

= ⟨ϕ(u1), . . . , ϕ(un)⟩_A.

Thus Im(ϕ) = N if and only if ϕ(u1), . . . , ϕ(un) generate N.

Exercises.

6.5.1. Prove Proposition 6.5.7(i).

6.5.2. Prove Proposition 6.5.11.

6.5.3. Prove that the map ϕ of Example 6.5.12 is a homomorphism of A-modules.

6.5.4. Let ϕ: M → M′ be a homomorphism of A-modules and let N be an A-submodule of M. We define the image of N under ϕ by

ϕ(N) := {ϕ(u) ; u ∈ N}.

Show that

(i) ϕ(N) is an A-submodule of M′;

(ii) if N = ⟨u1, . . . , um⟩_A, then ϕ(N) = ⟨ϕ(u1), . . . , ϕ(um)⟩_A;

(iii) if N admits a basis {u1,..., um} and if ϕ is injective, then {ϕ(u1), . . . , ϕ(um)} is a basis of ϕ(N).

6.5.5. Let ϕ: M → M′ be a homomorphism of A-modules and let N′ be an A-submodule of M′. We define the inverse image of N′ under ϕ by

ϕ^{-1}(N′) := {u ∈ M ; ϕ(u) ∈ N′}

(this notation does not imply that ϕ^{-1} exists). Show that ϕ^{-1}(N′) is an A-submodule of M.

6.5.6. Let M be a free A-module with basis {u1,..., un}, let N be an arbitrary A-module, and let v1,..., vn be arbitrary elements of N. Show that there exists a unique homomorphism of A-modules ϕ: M → N such that ϕ(ui) = vi for i = 1, . . . , n.

6.5.7. Let N be an A-module of finite type generated by n elements. Show that there exists a surjective homomorphism of A-modules ϕ: A^n → N. Hint. Use the result of Exercise 6.5.6.

Chapter 7

Structure of modules of finite type over a euclidean domain

In this chapter, we define the notions of the annihilator of a module and the annihilator of an element of a module. We study in detail the meaning of these notions for a finite abelian group considered as a Z-module and then for a finite-dimensional vector space over a field K equipped with an endomorphism. We also cite without proof the structure theorem for modules of finite type over a euclidean domain and we analyze its consequences in the same situations. Finally, we prove a refinement of this result and we conclude with the Jordan canonical form for a linear operator on a finite-dimensional complex vector space.

7.1 Annihilators

Definition 7.1.1 (Annihilator). Let M be a module over a commutative ring A. The annihi- lator of an element u of M is

Ann(u) := {a ∈ A ; au = 0}.

The annihilator of M is

Ann(M) := {a ∈ A ; au = 0 for all u ∈ M}.

The following result is left as an exercise:

Proposition 7.1.2. Let A and M be as in the above definition. The annihilator of M and the annihilator of any element of M are ideals of A. Furthermore, if M = ⟨u1, . . . , us⟩_A, then Ann(M) = Ann(u1) ∩ ··· ∩ Ann(us).

Recall that an A-module is called cyclic if it is generated by a single element (see Section 6.2). For such a module, the last assertion of Proposition 7.1.2 gives:


Corollary 7.1.3. Let M be a cyclic module over a commutative ring A. We have Ann(M) = Ann(u) for every generator u of M.

In the case of an abelian group G written additively and considered as a Z-module, these notions become the following:

• The annihilator of an element g of G is

Ann(g) = {m ∈ Z ; mg = 0}.

It is an ideal of Z. By definition, if Ann(g) = {0}, we say that g is of infinite order. Otherwise, we say that g is of finite order and its order, denoted o(g), is the smallest positive integer in Ann(g), hence it is also a generator of Ann(g).

• The annihilator of G is

Ann(G) = {m ∈ Z ; mg = 0 for all g ∈ G}.

It is also an ideal of Z. If Ann(G) 6= {0}, then the smallest positive integer in Ann(G), that is, its positive generator, is called the exponent of G.

If G is a finite group, then the order |G| of the group (its cardinality) is an element of Ann(G). This follows from Lagrange's Theorem, which says that, in a finite group (commutative or not), the order of every element divides the order of the group. In this case, we have Ann(G) ≠ {0} and the exponent of G is a divisor of |G|. More precisely, we prove:

Proposition 7.1.4. Let G be a finite abelian group considered as a Z-module. The exponent of G is the lcm of the orders of the elements of G and is a divisor of the order |G| of G. If G is cyclic, generated by an element g, the exponent of G, the order of G and the order of g are equal.

Recall that a least common multiple (lcm) of a finite family of nonzero integers a1, . . . , as is any nonzero integer m divisible by each of a1, . . . , as and that divides every other integer with this property (see Exercises 5.2.4 and 5.4.6 for the existence of the lcm).

Proof. Let m be the lcm of the orders of the elements of G. For all g ∈ G, we have mg = 0 since o(g) | m, hence m ∈ Ann(G). Conversely, if n ∈ Ann(G), then ng = 0 for all g ∈ G, hence o(g) | n for all g ∈ G and thus m | n. This shows that Ann(G) = mZ. Therefore the exponent of G is m. The remark preceding the proposition shows that it is a divisor of |G|.

If G = Zg is cyclic generated by g, we know that |G| = o(g), and Corollary 7.1.3 gives Ann(G) = Ann(g). Thus the exponent of G is o(g) = |G|. 7.1. ANNIHILATORS 93

Example 7.1.5. Let C1 = Zg1 and C2 = Zg2 be cyclic groups generated respectively by an element g1 of order 6 and an element g2 of order 8. We form their cartesian product

C1 × C2 = {(k g1, ℓ g2) ; k, ℓ ∈ Z} = Z(g1, 0) + Z(0, g2).

1st. Find the order of (2g1, 3g2).

For an integer m ∈ Z, we have

m(2g1, 3g2) = (0, 0) ⇐⇒ 2m g1 = 0 and 3m g2 = 0
⇐⇒ 6 | 2m and 8 | 3m
⇐⇒ 3 | m and 8 | m
⇐⇒ 24 | m.

Thus the order of (2g1, 3g2) is 24.

2nd. Find the exponent of C1 × C2.

For an integer m ∈ Z, we have

m ∈ Ann(C1 × C2) ⇐⇒ m(k g1, ℓ g2) = (0, 0) for all k, ℓ ∈ Z
⇐⇒ mk g1 = 0 and mℓ g2 = 0 for all k, ℓ ∈ Z
⇐⇒ m g1 = 0 and m g2 = 0
⇐⇒ 6 | m and 8 | m
⇐⇒ 24 | m.

Thus the exponent of C1 × C2 is 24.

Since |C1 × C2| = |C1| |C2| = 6 · 8 = 48 is a strict multiple of the exponent of C1 × C2, the group C1 × C2 is not cyclic (see the last part of Proposition 7.1.4).
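Both computations are easy to mimic by brute force. The Python sketch below (ours) models C1 × C2 as Z/6 × Z/8 and recomputes the order of (2g1, 3g2) and the exponent of the group:

from math import lcm

n1, n2 = 6, 8  # orders of g1 and g2

def order(k, l):
    # Order of (k g1, l g2): the least m >= 1 with m*k = 0 mod n1 and m*l = 0 mod n2.
    m = 1
    while (m*k) % n1 != 0 or (m*l) % n2 != 0:
        m += 1
    return m

print(order(2, 3))   # 24, the order of (2g1, 3g2)
print(lcm(n1, n2))   # 24, the exponent of C1 x C2
print(n1 * n2)       # 48 = |C1 x C2|, a strict multiple of the exponent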

In the case of a vector space equipped with an endomorphism, the notions of annihilators become the following:

Proposition 7.1.6. Let V be a vector space over a field K and let T ∈ EndK (V ). For the corresponding structure of a K[x]-module on V , the annihilator of an element v of V is

Ann(v) = {p(x) ∈ K[x]; p(T )(v) = 0}, while the annihilator of V is

Ann(V ) = {p(x) ∈ K[x]; p(T ) = 0}.

These are ideals of K[x].

Proof. Multiplication by a polynomial p(x) ∈ K[x] is defined by p(x)v = p(T )v for all v ∈ V . Thus, for fixed v ∈ V we have:

p(x) ∈ Ann(v) ⇐⇒ p(T )(v) = 0 and so

p(x) ∈ Ann(V ) ⇐⇒ p(T )(v) = 0 ∀v ∈ V ⇐⇒ p(T ) = 0.

We know that every nonzero ideal of K[x] has a unique monic generator: the monic polynomial of smallest degree that belongs to this ideal. In the context of Proposition 7.1.6, the following result provides a means of calculating the monic generator of Ann(v) when V is finite-dimensional.

Proposition 7.1.7. With the notation of Proposition 7.1.6, suppose that V is finite-dimensional. Let v ∈ V with v ≠ 0. Then:

(i) there exists a greatest positive integer m such that v, T(v), . . . , T^{m−1}(v) are linearly independent over K;

(ii) for this integer m, there exists a unique choice of elements a0, a1, . . . , a_{m−1} of K such that

T^m(v) = a0 v + a1 T(v) + ··· + a_{m−1} T^{m−1}(v);

(iii) the polynomial p(x) = x^m − a_{m−1} x^{m−1} − ··· − a1 x − a0 is the monic generator of Ann(v);

(iv) the K[x]-submodule U of V generated by v is a T-invariant vector subspace of V that admits C = {v, T(v), . . . , T^{m−1}(v)} as a basis;

(v) we have

[ T|_U ]_C =
[ 0 0 ··· 0 a0      ]
[ 1 0 ··· 0 a1      ]
[ 0 1 ··· 0 a2      ]
[ ⋮ ⋮ ⋱ ⋮ ⋮        ]
[ 0 0 ··· 1 a_{m−1} ].

Before presenting the proof of this result, we give a name to the matrix appearing in the last assertion of the proposition.

Definition 7.1.8 (Companion matrix). Let m ∈ N_{>0} and let a0, . . . , a_{m−1} be elements of a field K. The matrix

[ 0 0 ··· 0 a0      ]
[ 1 0 ··· 0 a1      ]
[ 0 1 ··· 0 a2      ]
[ ⋮ ⋮ ⋱ ⋮ ⋮        ]
[ 0 0 ··· 1 a_{m−1} ]

is called the companion matrix of the monic polynomial x^m − a_{m−1} x^{m−1} − ··· − a1 x − a0 ∈ K[x].

Proof of Proposition 7.1.7. Let n = dim_K(V). We first note that, for every integer m ≥ 1, the vectors v, T(v), . . . , T^m(v) are linearly dependent if and only if the annihilator Ann(v) of v contains a nonzero polynomial of degree at most m. Since dim_K(V) = n, we know that v, T(v), . . . , T^n(v) are linearly dependent and thus Ann(v) contains a nonzero polynomial of degree ≤ n. In particular, Ann(v) is a nonzero ideal of K[x], and as such it possesses a unique monic generator p(x), namely its monic element of least degree. Let m be the degree of p(x). Since v ≠ 0, we have m ≥ 1. Write

p(x) = x^m − a_{m−1} x^{m−1} − ··· − a1 x − a0.

Then the integer m is the largest integer such that v, T(v), . . . , T^{m−1}(v) are linearly independent, and the elements a0, a1, . . . , a_{m−1} of K are the only ones for which

T^m(v) − a_{m−1} T^{m−1}(v) − ··· − a1 T(v) − a0 v = 0,

that is, such that

T^m(v) = a0 v + a1 T(v) + ··· + a_{m−1} T^{m−1}(v). (7.1)

This proves (i), (ii) and (iii) all at the same time. On the other hand, we clearly have

T(T^i(v)) ∈ ⟨v, T(v), . . . , T^{m−1}(v)⟩_K for i = 0, . . . , m − 2,

and equation (7.1) shows that this is again true for i = m − 1. Then, by Proposition 2.6.4, the subspace ⟨v, T(v), . . . , T^{m−1}(v)⟩_K is T-invariant. Therefore, it is a K[x]-submodule of V (see Proposition 6.2.6). Since it contains v and it is contained in U := K[x]v, we deduce that this subspace is U. Finally, since v, T(v), . . . , T^{m−1}(v) are linearly independent, we conclude that these m vectors form a basis of U. This proves (iv). The final assertion (v) follows directly from (iv) by using the relation (7.1).

Example 7.1.9. Let V be a vector space of dimension 3 over Q, let B = {v1, v2, v3} be a basis of V, and let T ∈ End_Q(V). Suppose that

[ T ]_B =
[ 2 −2 −1 ]
[ 1 −1 −1 ]
[ 1  1  0 ],

and find the monic generator of Ann(v1). We have

[T(v1)]_B = [ T ]_B [v1]_B = (2, 1, 1)^T,
[T^2(v1)]_B = [ T ]_B [T(v1)]_B = (1, 0, 3)^T,
[T^3(v1)]_B = [ T ]_B [T^2(v1)]_B = (−1, −2, 1)^T,

and so

T(v1) = 2v1 + v2 + v3,  T^2(v1) = v1 + 3v3,  T^3(v1) = −v1 − 2v2 + v3.

Since

det( [v1]_B  [T(v1)]_B  [T^2(v1)]_B ) = det
[ 1 2 1 ]
[ 0 1 0 ]
[ 0 1 3 ] = 3 ≠ 0,

the vectors v1, T(v1), T^2(v1) are linearly independent. Since dim_Q(V) = 3, the greatest integer m such that v1, T(v1), . . . , T^{m−1}(v1) are linearly independent is necessarily m = 3. By Proposition 7.1.7, this implies that the monic generator of Ann(v1) is of degree 3. We see that

T^3(v1) = a0 v1 + a1 T(v1) + a2 T^2(v1)
⇐⇒ [T^3(v1)]_B = a0 [v1]_B + a1 [T(v1)]_B + a2 [T^2(v1)]_B
⇐⇒ (−1, −2, 1)^T = a0 (1, 0, 0)^T + a1 (2, 1, 1)^T + a2 (1, 0, 3)^T
⇐⇒ a0 + 2a1 + a2 = −1, a1 = −2 and a1 + 3a2 = 1

⇐⇒ a0 = 2, a1 = −2 and a2 = 1.

Hence we have

T^3(v1) = 2v1 − 2T(v1) + T^2(v1).

By Proposition 7.1.7, we conclude that the polynomial

p(x) = x^3 − x^2 + 2x − 2

is the monic generator of Ann(v1).

Since dim_Q(V) = 3 and v1, T(v1), T^2(v1) are linearly independent over Q, the ordered set C = {v1, T(v1), T^2(v1)} forms a basis of V. Therefore,

V = ⟨v1, T(v1), T^2(v1)⟩_Q = ⟨v1⟩_{Q[x]}

is a cyclic Q[x]-module generated by v1 and the last assertion of Proposition 7.1.7 gives

[ T ]_C =
[ 0 0  2 ]
[ 1 0 −2 ]
[ 0 1  1 ].
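The whole example can be reproduced with sympy. The sketch below (ours) iterates T on v1, solves for a0, a1, a2, and checks the change of basis to C:

from sympy import Matrix

T = Matrix([[2, -2, -1],
            [1, -1, -1],
            [1,  1,  0]])
v1 = Matrix([1, 0, 0])

# Columns v1, T(v1), T^2(v1); this matrix is invertible (determinant 3).
P = Matrix.hstack(v1, T*v1, T**2*v1)

# Solve T^3(v1) = a0 v1 + a1 T(v1) + a2 T^2(v1).
a0, a1, a2 = P.LUsolve(T**3 * v1)
print(a0, a1, a2)        # 2, -2, 1, so p(x) = x^3 - x^2 + 2x - 2

# In the basis C = {v1, T(v1), T^2(v1)}, T is the companion matrix of p(x).
print(P.inv() * T * P)   # Matrix([[0, 0, 2], [1, 0, -2], [0, 1, 1]])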

From Proposition 7.1.7, we deduce: Proposition 7.1.10. Let V be a finite-dimensional vector space over a field K and let T ∈ EndK (V ). For the corresponding structure of a K[x]-module on V , the ideal Ann(V ) is nonzero. 7.1. ANNIHILATORS 97

Proof. Let {v1,..., vn} be a basis of V . We have

V = ⟨v1, . . . , vn⟩_K = ⟨v1, . . . , vn⟩_{K[x]}.

Then Proposition 7.1.2 gives

Ann(V ) = Ann(v1) ∩ · · · ∩ Ann(vn).

For each i = 1, . . . , n, Proposition 7.1.7 shows that Ann(vi) is generated by a monic polynomial pi(x) ∈ K[x] of degree ≤ n. The product p1(x) ··· pn(x) annihilates each vi, hence it annihilates all of V. This proves that Ann(V) contains a nonzero polynomial of degree ≤ n^2.

We conclude this section with the following definition.

Definition 7.1.11 (Minimal polynomial). Let V be a vector space over a field K and let T ∈ End_K(V). Suppose that Ann(V) ≠ {0} for the corresponding structure of a K[x]-module on V. Then the monic generator of Ann(V) is called the minimal polynomial of T and is denoted min_T(x). By Proposition 7.1.6, it is the monic polynomial p(x) ∈ K[x] of smallest degree such that p(T) = 0.

Example 7.1.12. In Example 7.1.9, V is a cyclic Q[x]-module generated by v1. By Corollary 7.1.3, we therefore have

Ann(V) = Ann(v1),

and thus the minimal polynomial of T is the monic generator of Ann(v1), that is, min_T(x) = x^3 − x^2 + 2x − 2.

Exercises.

7.1.1. Prove Proposition 7.1.2 and Corollary 7.1.3.

7.1.2. Let C1 = Zg1, C2 = Zg2 and C3 = Zg3 be cyclic abelian groups of orders 6, 10 and 15 respectively.

(i) Find the order of (3g1, 5g2) ∈ C1 × C2.

(ii) Find the order of (g1, g2, g3) ∈ C1 × C2 × C3.

(iii) Find the exponent of C1 × C2.

(iv) Find the exponent of C1 × C2 × C3. 98 CHAPTER 7. THE STRUCTURE THEOREM

7.1.3. Let V be a vector space of dimension 3 over R, let B = {v1, v2, v3} be a basis of V, and let T ∈ End_R(V). Suppose that

[ T ]_B =
[  1  0 −1 ]
[ −1  1  2 ]
[  1 −1 −2 ].

For the corresponding structure of an R[x]-module on V:

(i) find the monic generator of Ann(v1);

(ii) find the monic generator of Ann(v2) and, from this, deduce the monic generator of Ann(V ).

7.1.4. Let T : Q^3 → Q^3 be the linear map whose matrix relative to the standard basis E = {e1, e2, e3} of Q^3 is

[ T ]_E =
[  0 1 2 ]
[ −1 2 2 ]
[ −2 2 3 ].

For the corresponding structure of a Q[x]-module on Q^3:

(i) find the monic generator of Ann(e1 + e3);

(ii) show that Q^3 is cyclic and find min_T(x).

7.1.5. Let V be a vector space of dimension n over a field K and let T ∈ EndK (V ).

(i) Show that I, T, . . . , T^{n^2} are linearly dependent over K.

(ii) Deduce that there exists a nonzero polynomial p(x) ∈ K[x] of degree at most n^2 such that p(T) = 0.

(iii) State and prove the corresponding result for a matrix A ∈ Matn×n(K).

7.1.6. Let V be a vector space of dimension n over a field K and let T ∈ End_K(V). Suppose that V is T-cyclic (i.e. cyclic as a K[x]-module for the corresponding K[x]-module structure). Show that there exists a nonzero polynomial p(x) ∈ K[x] of degree at most n such that p(T) = 0.

7.2 Modules over a euclidean domain

We first state the structure theorem for a module of finite type over a euclidean domain, and then we examine its consequences for abelian groups and vector spaces equipped with an endomorphism. The proof of the theorem will be given in Chapter 8.

Theorem 7.2.1 (The structure theorem). Let M be a module of finite type over a euclidean domain A. Then M is a direct sum of cyclic submodules

M = Au1 ⊕ Au2 ⊕ · · · ⊕ Aus

with

A ⊋ Ann(u1) ⊇ Ann(u2) ⊇ ··· ⊇ Ann(us).

Since A is euclidean, each of the ideals Ann(ui) is generated by an element di of A. The condition Ann(ui) ⊇ Ann(u_{i+1}) implies in this case that di | d_{i+1} if di ≠ 0 and that d_{i+1} = 0 if di = 0. Therefore there exists an integer t such that d1, . . . , dt are nonzero with d1 | d2 | ··· | dt and d_{t+1} = ··· = ds = 0. Moreover, the condition A ≠ Ann(u1) implies that d1 is not a unit. This is easy to fulfill since, if Ann(u1) = A, then u1 = 0 and so we can omit the factor Au1 in the direct sum. While the decomposition of M given by Theorem 7.2.1 is not unique in general, one can show that the integer s and the ideals Ann(u1), . . . , Ann(us) only depend on M and not on the choice of the ui. For example, we have the following characterization of Ann(us), whose proof is left as an exercise:

Corollary 7.2.2. With the notation of Theorem 7.2.1, we have Ann(M) = Ann(us).

In the case where A = Z, an A-module is simply an abelian group. For a finite abelian group, Theorem 7.2.1 and its corollary give: Theorem 7.2.3. Let G be a finite abelian group. Then G is a direct sum of cyclic subgroups

G = Zg1 ⊕ Zg2 ⊕ ··· ⊕ Zgs

with o(g1) ≠ 1 and o(g1) | o(g2) | ··· | o(gs). Furthermore, the exponent of G is o(gs).

For the situation in which we are particularly interested in this course, we obtain: Theorem 7.2.4. Let V be a finite-dimensional vector space over a field K and let T ∈ EndK (V ). For the corresponding structure of a K[x]-module on V , we can write

V = K[x]v1 ⊕ K[x]v2 ⊕ ··· ⊕ K[x]vs (7.2)

such that, if we denote by di(x) the monic generator of Ann(vi) for i = 1, . . . , s, we have deg(d1(x)) ≠ 0 and d1(x) | d2(x) | ··· | ds(x).

The minimal polynomial of T is ds(x). 100 CHAPTER 7. THE STRUCTURE THEOREM

Combining this result with Proposition 7.1.7, we get: Theorem 7.2.5. With the notation of Theorem 7.2.4, set

Vi = K[x]vi and ni = deg(di(x)) for i = 1, . . . , s. Then the ordered set

C = {v1, T(v1), . . . , T^{n1−1}(v1), . . . , vs, T(vs), . . . , T^{ns−1}(vs)}

is a basis of V relative to which the matrix of T is block diagonal with s blocks on the diagonal:

[ T ]_C =
[ D1          ]
[    D2       ]
[       ⋱    ]
[          Ds ],   (7.3)

the i-th block Di of the diagonal being the companion matrix of the polynomial di(x).

Proof. By Proposition 7.1.7, each Vi is a T -cyclic subspace of V that admits a basis

Ci = {vi, T(vi), . . . , T^{ni−1}(vi)}

and we have

[ T|_{Vi} ]_{Ci} = Di.

Since V = V1 ⊕ ··· ⊕ Vs, Theorem 2.6.6 implies that C = C1 ⊔ ··· ⊔ Cs is a basis of V such that

[ T ]_C = diag( [T|_{V1}]_{C1}, . . . , [T|_{Vs}]_{Cs} ) = diag( D1, . . . , Ds ).

The polynomials d1(x), . . . , ds(x) appearing in Theorem 7.2.4 are called the invariant factors of T and the corresponding matrix (7.3) given by Theorem 7.2.5 is called the rational canonical form of T. It follows from Theorem 7.2.5 that the sum of the degrees of the invariant factors of T is equal to the dimension of V.

Example 7.2.6. Let V be a vector space over R of dimension 5. Suppose that the invariant factors of T are

d1(x) = x − 1,  d2(x) = d3(x) = x^2 − 2x + 1.

Then there exists a basis C of V such that

[ T ]_C =
[ 1             ]
[    0 −1       ]
[    1  2       ]
[          0 −1 ]
[          1  2 ],

since the companion matrix of x − 1 is (1) and that of x^2 − 2x + 1 is

[ 0 −1 ]
[ 1  2 ].
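Passing from the invariant factors to the matrix (7.3) is mechanical. The sketch below (ours) builds the companion matrix of Definition 7.1.8 from a monic polynomial and assembles the block diagonal form of Example 7.2.6:

from sympy import symbols, zeros, diag, Poly

x = symbols('x')

def companion(poly):
    # Companion matrix as in Definition 7.1.8: 1's on the subdiagonal and
    # last column a0, ..., a_{m-1}, where poly = x^m - a_{m-1} x^{m-1} - ... - a0.
    coeffs = Poly(poly, x).all_coeffs()   # [1, -a_{m-1}, ..., -a0]
    m = len(coeffs) - 1
    C = zeros(m, m)
    for i in range(1, m):
        C[i, i - 1] = 1
    for i in range(m):
        C[i, m - 1] = -coeffs[m - i]
    return C

blocks = [companion(x - 1), companion(x**2 - 2*x + 1), companion(x**2 - 2*x + 1)]
print(diag(*blocks))   # the 5 x 5 matrix [ T ]_C of Example 7.2.6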

The proof of Theorem 7.2.5 does not use the divisibility relations relating the polynomials d1(x), . . . , ds(x). In fact, it applies to any decomposition of the form (7.2).

Example 7.2.7. Let V be a vector space over C of dimension 5 and let T : V → V be a linear operator on V. Suppose that, for the corresponding structure of a C[x]-module on V, we have

V = C[x]v1 ⊕ C[x]v2 ⊕ C[x]v3

and that the monic generators of the annihilators of v1, v2 and v3 are respectively

x^2 − ix + 1 − i,  x − 1 + 2i,  x^2 + (2 + i)x − 3.

Then C = {v1, T(v1), v2, v3, T(v3)} is a basis of V and

[ T ]_C =
[ 0 −1+i                ]
[ 1   i                 ]
[         1−2i          ]
[                0   3  ]
[                1 −2−i ].

Exercises.

7.2.1. Prove Corollary 7.2.2 using Theorem 7.2.1.

7.2.2. Prove Theorem 7.2.3 using Theorem 7.2.1 and Corollary 7.2.2.

7.2.3. Prove Theorem 7.2.4 using Theorem 7.2.1 and Corollary 7.2.2.

7.2.4. Let V be a finite-dimensional real vector space equipped with an endomorphism T ∈ End_R(V). Suppose that, for the corresponding structure of an R[x]-module on V, we have

V = R[x]v1 ⊕ R[x]v2 ⊕ R[x]v3,

where v1, v2 and v3 are elements of V whose respective annihilators are generated by x^2 − x + 1, x^3 and x − 2. Determine the dimension of V, a basis C of V associated to this decomposition and, for this basis, determine [ T ]_C.

7.2.5. Let A ∈ Mat_{n×n}(K) where K is a field. Show that there exist nonconstant monic polynomials d1(x), . . . , ds(x) ∈ K[x] with d1(x) | ··· | ds(x) and an invertible matrix P ∈ Mat_{n×n}(K) such that P^{-1}AP is block diagonal, with the companion matrices of d1(x), . . . , ds(x) on the diagonal.

7.3 Primary decomposition

Theorem 7.2.1 tells us that every module of finite type over a euclidean ring is a direct sum of cyclic submodules. To obtain a finer decomposition, it suffices to decompose cyclic modules. This is the subject of primary decomposition. Since it applies to every module with nonzero annihilator, we consider the more general case, which is no more complicated. Before stating the principal result, recall that, if M is a module over a commutative ring A, then, for each a ∈ A, we define A-submodules of M by setting

M_a := {u ∈ M ; a u = 0} and aM = {a u ; u ∈ M}

(see Example 6.5.12).

Theorem 7.3.1. Let M be a module over a euclidean ring A. Suppose that M 6= {0} and that Ann(M) contains a nonzero element a. Write

a = ε q1 ··· qs

where ε is a unit in A, s is an integer ≥ 0, and q1, . . . , qs are nonzero elements of A which are pairwise relatively prime. Then the integer s is ≥ 1 and we have:

1) M = M_{q1} ⊕ ··· ⊕ M_{qs},

2) M_{qi} = q1 ··· q̂i ··· qs M for i = 1, . . . , s.

Furthermore, if Ann(M) = (a), then Ann(M_{qi}) = (qi) for i = 1, . . . , s.

In formula 2), the hat on qi indicates that qi is omitted from the product. If s = 1, this product is empty and we adopt the convention that q1 ··· q̂i ··· qs = 1.

Proof. Since Ann(M) is an ideal of A, it contains

ε^{-1} a = q1 ··· qs.

Furthermore, since M ≠ {0}, it does not contain 1. Therefore, we must have s ≥ 1. If s = 1, we find that M = M_{q1} = q̂1 M (by the above convention for the empty product), and formulas 1) and 2) are verified. Suppose now that s ≥ 2 and define

ai := q1 ··· q̂i ··· qs for i = 1, . . . , s.

Since q1, . . . , qs are pairwise relatively prime, the products a1, . . . , as are relatively prime (i.e. their gcd is one), and so there exist r1, . . . , rs ∈ A such that

r1a1 + ··· + rsas = 1 7.3. PRIMARY DECOMPOSITION 103

(see Exercise 7.3.1). Then, for all u ∈ M, we have:

u = (r1a1 + ··· + rsas)u = r1a1u + ··· + rsasu. (7.4)

Since riaiu = ai(riu) ∈ aiM for i = 1, . . . , s, this shows that

M = a1M + ··· + asM. (7.5)

If u ∈ M_{qi}, we have aj u = 0 for every index j different from i, since qi | aj. Thus formula (7.4) gives:

u = ri ai u for all u ∈ M_{qi}. (7.6)

This shows that M_{qi} ⊆ ai M. Since

qi ai M = ε^{-1} a M = {0},

we also have ai M ⊆ M_{qi}, hence ai M = M_{qi}. Then equality (7.5) becomes

M = M_{q1} + ··· + M_{qs}. (7.7)

To show that this sum is direct, choose u1 ∈ M_{q1}, . . . , us ∈ M_{qs} such that

u1 + ··· + us = 0.

By multiplying the two sides of this equality by ri ai and noting that ri ai uj = 0 for j ≠ i (since qj | ai if j ≠ i), we have ri ai ui = 0.

Since (7.6) gives ui = riaiui, we conclude that ui = 0 for i = 1, . . . , s. By Definition 6.4.1, this shows that the sum (7.7) is direct.

Finally, suppose that Ann(M) = (a). Since M_{qi} = ai M, we see that

Ann(M_{qi}) = Ann(ai M)
 = {r ∈ A ; r ai M = {0}}
 = {r ∈ A ; r ai ∈ Ann(M)}
 = {r ∈ A ; a | r ai}
 = (qi).

For a cyclic module, Theorem 7.3.1 implies the following:

Corollary 7.3.2. Let A be a euclidean ring and let M = A u be a cyclic A-module. Suppose that u ≠ 0 and that Ann(u) ≠ {0}. Choose a generator a of Ann(u) and factor it in the form

a = ε p1^{e1} ··· ps^{es}

where ε is a unit of A, s is an integer ≥ 1, p1, . . . , ps are pairwise non-associated irreducible elements of A, and e1, . . . , es are positive integers. Then, for i = 1, . . . , s, the product

ui = p1^{e1} ··· p̂i^{ei} ··· ps^{es} u ∈ M

satisfies Ann(ui) = (pi^{ei}), and we have:

M = Au1 ⊕ · · · ⊕ Aus. (7.8)

Proof. Since Ann(M) = Ann(u) = (a), Theorem 7.3.1 applies with q1 = p1^{e1}, . . . , qs = ps^{es}. Since q1 ··· q̂i ··· qs M = A ui (1 ≤ i ≤ s), it gives the decomposition (7.8) and

Ann(ui) = Ann(A ui) = (qi) = (pi^{ei}) (1 ≤ i ≤ s).

Combining this result with Theorem 7.2.1, we obtain:

Theorem 7.3.3. Let M be a module of finite type over a euclidean ring. Suppose that M 6= {0} and that Ann(M) 6= {0}. Then M is a direct sum of cyclic submodules whose annihilator is generated by a power of an irreducible element of A.

Proof. Corollary 7.3.2 shows that the result is true for a cyclic module. The general case follows since, according to Theorem 7.2.1, the module M is a direct sum of cyclic submodules, and the annihilator of each of these submodules is nonzero since it contains Ann(M).

Example 7.3.4. We use the notation of Example 7.1.9. The vector space V is a cyclic Q[x]- module generated by the vector v1 and the ideal Ann(v1) is generated by

p(x) = x^3 − x^2 + 2x − 2 = (x − 1)(x^2 + 2).

Since x^2 + 2 has no real root, it has no root in Q, hence it is an irreducible polynomial in Q[x]. The polynomial x − 1 is also irreducible, since its degree is 1. Then Corollary 7.3.2 gives

V = Q[x]u1 ⊕ Q[x]u2

where

u1 = (x^2 + 2)v1 = T^2(v1) + 2v1 = 3v1 + 3v3,

u2 = (x − 1)v1 = T(v1) − v1 = v1 + v2 + v3

satisfy

Ann(u1) = (x − 1) and Ann(u2) = (x^2 + 2).

Reasoning as in the proof of Theorem 7.2.5, we deduce that

C = {u1, u2, T(u2)} = {3v1 + 3v3, v1 + v2 + v3, −v1 − v2 + 2v3}

is a basis of V such that

[ T ]_C =
[ 1 0  0 ]
[ 0 0 −2 ]
[ 0 1  0 ],

which is block diagonal with the companion matrices of x − 1 and x^2 + 2 on the diagonal.
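Continuing in sympy (sketch ours), one can verify both the annihilators of u1 and u2 and the block diagonal form in the basis C:

from sympy import Matrix, eye

T = Matrix([[2, -2, -1],
            [1, -1, -1],
            [1,  1,  0]])
v1 = Matrix([1, 0, 0])

u1 = (T**2 + 2*eye(3)) * v1   # (x^2 + 2)v1 = 3 v1 + 3 v3
u2 = (T - eye(3)) * v1        # (x - 1)v1 = v1 + v2 + v3

print((T - eye(3)) * u1)          # zero vector: (x - 1) annihilates u1
print((T**2 + 2*eye(3)) * u2)     # zero vector: (x^2 + 2) annihilates u2

P = Matrix.hstack(u1, u2, T*u2)   # change of basis to C = {u1, u2, T(u2)}
print(P.inv() * T * P)            # diag(1, companion matrix of x^2 + 2)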

We leave it to the reader to analyze the meaning of Theorem 7.3.3 for abelian groups. For the situation that interests us the most, we have:

Theorem 7.3.5. Let V be a finite-dimensional vector space over a field K and let T : V → V be a linear operator on V . There exists a basis of V relative to which the matrix of T is block diagonal with the companion matrices of powers of monic irreducible polynomials of K[x] on the diagonal.

This matrix is called a primary rational canonical form of T. We can show that it is unique up to a permutation of the blocks on the diagonal. The powers of monic irreducible polynomials whose companion matrices appear on its diagonal are called the elementary divisors of T. We leave the proof of this result as an exercise and we give an example.

Example 7.3.6. Let V be a vector space of dimension 6 over R and let T : V → V be a linear operator. Suppose that, for the corresponding structure of an R[x]-module on V, we have

V = R[x]v1 ⊕ R[x]v2

with Ann(v1) = (x^3 − x^2) and Ann(v2) = (x^3 − 1).

1st. We have x^3 − x^2 = x^2(x − 1), where x and x − 1 are irreducible. Thus,

R[x]v1 = R[x]v1′ ⊕ R[x]v1′′

where v1′ = (x − 1)v1 = T(v1) − v1 and v1′′ = x^2 v1 = T^2(v1) satisfy

Ann(v1′) = (x^2) and Ann(v1′′) = (x − 1).

2nd. Similarly, x^3 − 1 = (x − 1)(x^2 + x + 1) is the factorization of x^3 − 1 as a product of powers of irreducible polynomials of R[x]. Therefore,

R[x]v2 = R[x]v2′ ⊕ R[x]v2′′

where v2′ = (x^2 + x + 1)v2 = T^2(v2) + T(v2) + v2 and v2′′ = (x − 1)v2 = T(v2) − v2 satisfy

Ann(v2′) = (x − 1) and Ann(v2′′) = (x^2 + x + 1).

3rd. We conclude that

V = R[x]v1′ ⊕ R[x]v1′′ ⊕ R[x]v2′ ⊕ R[x]v2′′

and that

C = {v1′, T(v1′), v1′′, v2′, v2′′, T(v2′′)}

is a basis of V for which

[ T ]_C =
[ 0 0                ]
[ 1 0                ]
[      1             ]
[         1          ]
[            0 −1    ]
[            1 −1    ],

with the companion matrices of x^2, x − 1, x − 1 and x^2 + x + 1 on the diagonal.

Exercises.

7.3.1. Let A be a euclidean domain, let p1, . . . , ps be pairwise non-associated irreducible elements of A and let e1, . . . , es be positive integers. Set qi = pi^{ei} and ai = q1 ··· q̂i ··· qs for i = 1, . . . , s. Show that there exist r1, . . . , rs ∈ A such that 1 = r1 a1 + ··· + rs as.

7.3.2. Let M be a module over a euclidean ring A. Suppose that Ann(M) = (a) with a ≠ 0, and that c ∈ A with c ≠ 0. Show that M_c = M_d where d = gcd(a, c).

7.3.3. Let V be a vector space over a field K and let T ∈ EndK (V ). We consider V as a K[x]-module in the usual way. Show that, in this context, the notions of the submodules Ma and aM introduced in Example 6.5.12 and recalled before Theorem 7.3.1 become:

V_{p(x)} = ker(p(T)) and p(x)V = Im(p(T))

for all p(x) ∈ K[x]. Deduce that ker(p(T)) and Im(p(T)) are T-invariant subspaces of V.

7.3.4. Let V be a finite-dimensional vector space over a field K and let T ∈ End_K(V). Suppose that p(x) ∈ K[x] is a monic polynomial such that p(T) = 0, and write p(x) = p1(x)^{e1} ··· ps(x)^{es} where p1(x), . . . , ps(x) are distinct monic irreducible polynomials of K[x] and e1, . . . , es are positive integers. Combining Exercise 7.3.3 and Theorem 7.3.1, show that

(i) V = ker(p1(T)^{e1}) ⊕ ··· ⊕ ker(ps(T)^{es}),

(ii) ker(pi(T)^{ei}) = Im(∏_{j≠i} pj(T)^{ej}) for i = 1, . . . , s (the factor pi(T)^{ei} is omitted from the product).

7.3.5. Let V be a vector space of dimension n over a field K, and let T ∈ End_K(V). Show that T is diagonalizable if and only if its minimal polynomial min_T(x) admits a factorization of the form

min_T(x) = (x − α_1) ⋯ (x − α_s), where α_1, ..., α_s are distinct elements of K.

7.3.6. Let T : R^3 → R^3 be a linear map whose matrix relative to the standard basis C of R^3 is

\[
[\,T\,]_C = \begin{pmatrix} 2 & 2 & -1 \\ 1 & 3 & -1 \\ 2 & 4 & -1 \end{pmatrix}.
\]

(i) Show that p(T ) = 0 where p(x) = (x − 1)(x − 2) ∈ R[x].

(ii) Why can we assert that R3 = ker(T − I) ⊕ ker(T − 2I)?

(iii) Find a basis B1 for ker(T − I) and a basis B2 for ker(T − 2I).

(iv) Give the matrix of T with respect to the basis B = B_1 ⊔ B_2 of R^3.

7.3.7. Let V be a vector space over Q with basis A = {v1, v2, v3}, and let T : V → V be the linear map whose matrix relative to the basis A is

 5 1 5  [ T ]A = −3 −2 −2 . −3 −1 −3

(i) Show that p(T) = 0 where p(x) = (x + 1)^2(x − 2) ∈ Q[x].

(ii) Why can we assert that V = ker(T + I)^2 ⊕ ker(T − 2I)?

(iii) Find a basis B_1 for ker(T + I)^2 and a basis B_2 for ker(T − 2I).

(iv) Give the matrix of T with respect to the basis B = B_1 ⊔ B_2 of V.

7.3.8. Let V be a finite-dimensional vector space over C and let T : V → V be a C-linear map. Suppose that the polynomial p(x) = x^4 − 1 satisfies p(T) = 0. Factor p(x) as a product of powers of irreducible polynomials in C[x] and give a corresponding decomposition of V as a direct sum of T-invariant subspaces.

7.3.9. Let V be a finite-dimensional vector space over R and let T : V → V be an R-linear map. Suppose that the polynomial p(x) = x^3 + 2x^2 + 2x + 4 satisfies p(T) = 0. Factor p(x) as a product of powers of irreducible polynomials in R[x] and give a corresponding decomposition of V as a direct sum of T-invariant subspaces.

7.3.10 (Application to differential equations). Let D be the operator of derivation on the vector space C^∞(R) of infinitely differentiable functions from R to R. Rolle's Theorem tells us that if a function f : R → R is differentiable at each point x ∈ R with f′(x) = 0, then f is constant.

(i) Let λ ∈ R. Suppose that a function f satisfies f′(x) = λf(x) for all x ∈ R. Applying Rolle's Theorem to the function f(x)e^{−λx}, show that there exists a constant C ∈ R such that f(x) = Ce^{λx} for all x ∈ R. Deduce that ker(D − λI) = ⟨e^{λx}⟩_R.

(ii) Let λ_1, ..., λ_s be distinct real numbers, and let p(x) = (x − λ_1) ⋯ (x − λ_s) ∈ R[x]. Combining the result of (i) with Theorem 7.3.1, show that ker p(D) is a subspace of C^∞(R) of dimension s that admits the basis {e^{λ_1 x}, ..., e^{λ_s x}}.

(iii) Determine the general form of functions f ∈ C^∞(R) such that f′′′ − 2f″ − f′ + 2f = 0.

7.4 Jordan canonical form

In this section, we fix a finite-dimensional vector space V over C and a linear operator T : V → V . We also equip V with the corresponding structure of a C[x]-module. Since the irreducible monic polynomials of C[x] are of the form x − λ with λ ∈ C, Theorem 7.3.3 gives V = C[x]v1 ⊕ · · · ⊕ C[x]vs

for the elements v1,..., vs of V whose annihilators are of the form

Ann(v_1) = ((x − λ_1)^{e_1}), ..., Ann(v_s) = ((x − λ_s)^{e_s})

with λ1, . . . , λs ∈ C and e1, . . . , es ∈ N>0.

Set U_i = C[x]v_i for i = 1, ..., s. For each i, Proposition 7.1.7 shows that C_i := {v_i, T(v_i), ..., T^{e_i−1}(v_i)} is a basis of U_i and that [T|_{U_i}]_{C_i} is the companion matrix of (x − λ_i)^{e_i}. Then Theorem 2.6.6 tells us that C = C_1 ⊔ ⋯ ⊔ C_s is a basis of V relative to which the matrix [T]_C of T is block diagonal with the companion matrices of the polynomials (x − λ_1)^{e_1}, ..., (x − λ_s)^{e_s} on the diagonal. This is the rational canonical form of T. The following lemma proposes a different choice of basis for each of the subspaces U_i.

Lemma 7.4.1. Let v ∈ V and let U := C[x]v. Suppose that Ann(v) = ((x − λ)^e) with λ ∈ C and e ∈ N_{>0}. Then B = {(T − λI)^{e−1}(v), ..., (T − λI)(v), v} is a basis of U for which

\[
[\,T|_U\,]_B = \begin{pmatrix}
\lambda & 1 & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix}. \tag{7.9}
\]

Proof. Since Ann(v) = ((x − λ)^e), Proposition 7.1.7 shows that

C := {v, T(v), ..., T^{e−1}(v)}

is a basis of U and therefore dim_C(U) = e. Set

u_i = (T − λI)^{e−i}(v) for i = 1, ..., e,

so that B can be written as B = {u_1, u_2, ..., u_e} (we adopt the convention that u_e = (T − λI)^0(v) = v). We see that

T(u_1) = (T − λI)(u_1) + λu_1 = (T − λI)^e(v) + λu_1 = λu_1   (7.10)

since (T − λI)^e(v) = (x − λ)^e v = 0. For i = 2, ..., e, we also have

T(u_i) = (T − λI)(u_i) + λu_i = u_{i−1} + λu_i.   (7.11)

Relations (7.10) and (7.11) show in particular that

T(u_i) ∈ ⟨u_1, ..., u_e⟩_C for i = 1, ..., e.

Thus ⟨u_1, ..., u_e⟩_C is a T-invariant subspace of U. Since it contains v = u_e, it contains all elements of C and so

⟨u_1, ..., u_e⟩_C = U.

Since dim_C(U) = e = |B|, we conclude that B is a basis of U. From (7.10) and (7.11), we deduce that [T(u_1)]_B = [λu_1]_B = (λ, 0, ..., 0)^t and that, for i = 2, ..., e, [T(u_i)]_B = [u_{i−1} + λu_i]_B is the column vector with 1 in position i − 1, λ in position i and 0 elsewhere, and hence the form (7.9) for the matrix [T|_U]_B.

Definition 7.4.2 (Jordan block). For each λ ∈ C and each e ∈ N_{>0}, we denote by J_λ(e) the square matrix of size e with entries on the diagonal equal to λ, entries just above the diagonal equal to 1, and all other entries equal to 0:

\[
J_\lambda(e) = \begin{pmatrix}
\lambda & 1 & & 0 \\
 & \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda
\end{pmatrix} \in \mathrm{Mat}_{e\times e}(\mathbb{C}).
\]

We call it the Jordan block of size e × e for λ.

Thanks to Lemma 7.4.1 and the discussion preceding it, we conclude:

Theorem 7.4.3. There exists a basis B of V such that [ T ]B is block diagonal with Jordan blocks on the diagonal.

Proof. Lemma 7.4.1 shows that, for i = 1, . . . , s, the subspace Ui admits the basis

B_i = {(T − λ_iI)^{e_i−1}(v_i), ..., (T − λ_iI)(v_i), v_i} and that

[T|_{U_i}]_{B_i} = J_{λ_i}(e_i).

Then, according to Theorem 2.6.6, the ordered set B := B_1 ⊔ ⋯ ⊔ B_s is a basis of V for which

\[
[\,T\,]_B = \begin{pmatrix}
J_{\lambda_1}(e_1) & & 0 \\
 & \ddots & \\
0 & & J_{\lambda_s}(e_s)
\end{pmatrix}. \tag{7.12}
\]

We say that (7.12) is a Jordan canonical form (or Jordan normal form) of T. The following result shows that this form is essentially unique and gives a method for calculating it:

Theorem 7.4.4. The Jordan canonical forms of T are obtained from one another by a permutation of the Jordan blocks on the diagonal. These blocks are of the form J_λ(e) where λ is an eigenvalue of T and where e is an integer ≥ 1. For each λ ∈ C and each integer k ≥ 1, we have

\[
\dim_{\mathbb{C}} \ker(T - \lambda I)^k = \sum_{e \ge 1} \min(k, e)\, n_\lambda(e) \tag{7.13}
\]

where n_λ(e) denotes the number of Jordan blocks equal to J_λ(e) on the diagonal.

Before proceeding to the proof of this theorem, we first explain how formula (7.13) allows us to calculate the integers n_λ(e) and hence to determine a Jordan canonical form of T. For this, we fix λ ∈ C. We note that there exists an integer r ≥ 1 such that n_λ(e) = 0 for all e > r and we set m_k = dim_C ker(T − λI)^k for k = 1, ..., r. Then formula (7.13) gives

m_1 = n_λ(1) + n_λ(2) + n_λ(3) + ⋯ + n_λ(r),
m_2 = n_λ(1) + 2n_λ(2) + 2n_λ(3) + ⋯ + 2n_λ(r),
m_3 = n_λ(1) + 2n_λ(2) + 3n_λ(3) + ⋯ + 3n_λ(r),   (7.14)
⋮
m_r = n_λ(1) + 2n_λ(2) + 3n_λ(3) + ⋯ + rn_λ(r).

From this we deduce

m_1 = n_λ(1) + n_λ(2) + n_λ(3) + ⋯ + n_λ(r),
m_2 − m_1 = n_λ(2) + n_λ(3) + ⋯ + n_λ(r),
m_3 − m_2 = n_λ(3) + ⋯ + n_λ(r),
⋮
m_r − m_{r−1} = n_λ(r).

Therefore, we can recover n_λ(1), ..., n_λ(r) from the data m_1, ..., m_r. Explicitly, this gives

\[
n_\lambda(e) = \begin{cases}
2m_1 - m_2 & \text{if } e = 1,\\
2m_e - m_{e-1} - m_{e+1} & \text{if } 1 < e < r,\\
m_r - m_{r-1} & \text{if } e = r.
\end{cases} \tag{7.15}
\]
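The recipe (7.15) is easy to mechanize. The following Python sketch (our own illustration, not part of the original text) takes the list [m_1, ..., m_r] and returns the counts n_λ(e); with the conventions m_0 = 0 and m_{r+1} = m_r, the three cases of (7.15) collapse into the single formula n_λ(e) = 2m_e − m_{e−1} − m_{e+1}.

\begin{verbatim}
def jordan_block_counts(m):
    """Given m = [m_1, ..., m_r] with m_k = dim ker (T - lambda*I)^k and
    m_r = m_{r+1} = ..., return {e: n_lambda(e)} using formula (7.15)."""
    r = len(m)
    counts = {}
    for e in range(1, r + 1):
        prev = m[e - 2] if e >= 2 else 0   # m_{e-1}, with the convention m_0 = 0
        nxt = m[e] if e < r else m[r - 1]  # m_{e+1}, with the convention m_{r+1} = m_r
        counts[e] = 2 * m[e - 1] - prev - nxt
    return counts
\end{verbatim}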

Proof of Theorem 7.4.4. Suppose that [T]_B is in the Jordan canonical form (7.12) for a basis B of V. Exercise 7.4.1 at the end of this section shows that, for every permutation J_1′, ..., J_s′ of the blocks J_{λ_1}(e_1), ..., J_{λ_s}(e_s), there exists a basis B′ of V such that

\[
[\,T\,]_{B'} = \begin{pmatrix} J_1' & & 0 \\ & \ddots & \\ 0 & & J_s' \end{pmatrix}.
\]

To prove the first statement of the theorem, it therefore suffices to verify formula (7.13) because, as the remark following the theorem shows, this formula allows us to calculate all the integers n_λ(e) and thus determine a Jordan canonical form of T up to a permutation of the blocks on the diagonal.

Fix λ ∈ C and k ∈ N_{>0}, and set n = dim_C(V). We have

\[
\begin{aligned}
\dim_{\mathbb{C}} \ker(T - \lambda I)^k
&= n - \operatorname{rank}[(T - \lambda I)^k]_B
 = n - \operatorname{rank}\bigl(([\,T\,]_B - \lambda I)^k\bigr) \\
&= n - \operatorname{rank}\begin{pmatrix} J_{\lambda_1}(e_1) - \lambda I & & 0 \\ & \ddots & \\ 0 & & J_{\lambda_s}(e_s) - \lambda I \end{pmatrix}^{\!k} \\
&= n - \operatorname{rank}\begin{pmatrix} (J_{\lambda_1}(e_1) - \lambda I)^k & & 0 \\ & \ddots & \\ 0 & & (J_{\lambda_s}(e_s) - \lambda I)^k \end{pmatrix} \\
&= n - \bigl(\operatorname{rank}(J_{\lambda_1}(e_1) - \lambda I)^k + \cdots + \operatorname{rank}(J_{\lambda_s}(e_s) - \lambda I)^k\bigr).
\end{aligned}
\]

Since we also have n = e_1 + ⋯ + e_s, we conclude that

\[
\dim_{\mathbb{C}} \ker(T - \lambda I)^k = \sum_{i=1}^{s} \bigl(e_i - \operatorname{rank}(J_{\lambda_i}(e_i) - \lambda I)^k\bigr).
\]

For i = 1, ..., s, we have

\[
J_{\lambda_i}(e_i) - \lambda I = \begin{pmatrix}
\lambda_i - \lambda & 1 & & 0 \\
 & \lambda_i - \lambda & \ddots & \\
 & & \ddots & 1 \\
0 & & & \lambda_i - \lambda
\end{pmatrix} = J_{\lambda_i - \lambda}(e_i),
\]

and so the formula becomes

\[
\dim_{\mathbb{C}} \ker(T - \lambda I)^k = \sum_{i=1}^{s} \bigl(e_i - \operatorname{rank}(J_{\lambda_i-\lambda}(e_i)^k)\bigr). \tag{7.16}
\]

If λ_i ≠ λ, we have det(J_{λ_i−λ}(e_i)) = (λ_i − λ)^{e_i} ≠ 0, hence J_{λ_i−λ}(e_i) is invertible. Then J_{λ_i−λ}(e_i)^k is also invertible for each integer k ≥ 1. Therefore, we have

rank(J_{λ_i−λ}(e_i)^k) = e_i if λ_i ≠ λ.   (7.17)

If λ_i = λ, we have

\[
J_{\lambda_i-\lambda}(e_i) = J_0(e_i) = \begin{pmatrix}
0 & 1 & & 0 \\
 & 0 & \ddots & \\
 & & \ddots & 1 \\
0 & & & 0
\end{pmatrix}
\]

and we easily check by induction on k that

\[
J_0(e_i)^k = \begin{pmatrix} 0 & I_{e_i-k} \\ 0_k & 0 \end{pmatrix} \quad \text{for } k = 1, \ldots, e_i - 1 \tag{7.18}
\]

and that J_0(e_i)^k = 0 if k ≥ e_i (in formula (7.18), the symbol I_{e_i−k} represents the identity matrix of size (e_i − k) × (e_i − k), the symbol 0_k represents the zero matrix of size k × k, and the symbols 0 on the diagonal are zero matrices of the appropriate sizes). From this we deduce that

\[
\operatorname{rank}(J_0(e_i)^k) = \begin{cases} e_i - k & \text{if } 1 \le k < e_i,\\ 0 & \text{if } k \ge e_i. \end{cases} \tag{7.19}
\]

By (7.17) and (7.19), formula (7.16) becomes

\[
\dim_{\mathbb{C}} \ker(T - \lambda I)^k = \sum_{\{i \,;\, \lambda_i = \lambda\}} \min(k, e_i) = \sum_{e \ge 1} \min(k, e)\, n_\lambda(e),
\]

where, for each integer e ≥ 1, the integer n_λ(e) represents the number of indices i ∈ {1, ..., s} such that (λ_i, e_i) = (λ, e), hence the number of blocks equal to J_λ(e) on the diagonal of (7.12). This proves formula (7.13) and, at the same time, completes the proof of the first assertion of the theorem. The exercises at the end of this section propose an alternative proof of (7.13).

Finally, we find that

\[
\operatorname{char}_T(x) = \det(xI - [\,T\,]_B)
= \begin{vmatrix} xI - J_{\lambda_1}(e_1) & & 0 \\ & \ddots & \\ 0 & & xI - J_{\lambda_s}(e_s) \end{vmatrix}
= \det(xI - J_{\lambda_1}(e_1)) \cdots \det(xI - J_{\lambda_s}(e_s))
= (x - \lambda_1)^{e_1} \cdots (x - \lambda_s)^{e_s}.
\]

Thus, the numbers λ for which the Jordan canonical form (7.12) of T contains a block Jλ(e) with e ≥ 1 are the eigenvalues of T . This proves the second assertion of the theorem and completes the proof.

To determine the integers n_λ(e) using formulas (7.14), we must determine, for each eigenvalue λ of T, the largest integer r such that n_λ(r) ≥ 1. The following corollary provides a method for calculating this integer.

Corollary 7.4.5. Let λ be an eigenvalue of T. Let m_k = dim_C ker(T − λI)^k for all k ≥ 1. Then we have

m_1 < m_2 < ⋯ < m_r = m_{r+1} = m_{r+2} = ⋯, where r denotes the largest integer for which n_λ(r) ≥ 1.

Proof. Let r be the largest integer for which n_λ(r) ≥ 1. The equalities (7.14) show that m_k − m_{k−1} ≥ n_λ(r) ≥ 1 for k = 2, ..., r and so m_1 < m_2 < ⋯ < m_r. For k ≥ r, formula (7.13) gives m_k = m_r.

Example 7.4.6. Let T : C^{10} → C^{10} be a linear map. Suppose that its eigenvalues are 1 + i and 2, and that the integers

m_k = dim_C ker(T − (1 + i)I)^k and m_k′ = dim_C ker(T − 2I)^k

satisfy

m_1 = 3, m_2 = 5, m_3 = 6, m_4 = 6, m_1′ = 2, m_2′ = 4 and m_3′ = 4.

Since m_1 < m_2 < m_3 = m_4, Corollary 7.4.5 shows that the largest Jordan block of T for the eigenvalue λ = 1 + i is of size 3 × 3. Similarly, since m_1′ < m_2′ = m_3′, the largest Jordan block of T for the eigenvalue λ = 2 is of size 2 × 2. Formulas (7.14) for λ = 1 + i give

m_1 = 3 = n_{1+i}(1) + n_{1+i}(2) + n_{1+i}(3),
m_2 = 5 = n_{1+i}(1) + 2n_{1+i}(2) + 2n_{1+i}(3),
m_3 = 6 = n_{1+i}(1) + 2n_{1+i}(2) + 3n_{1+i}(3),

hence n_{1+i}(1) = n_{1+i}(2) = n_{1+i}(3) = 1. For λ = 2, they are written as

m_1′ = 2 = n_2(1) + n_2(2),
m_2′ = 4 = n_2(1) + 2n_2(2),

hence n_2(1) = 0 and n_2(2) = 2. We conclude that the Jordan canonical form of T is the block diagonal matrix with the blocks

\[
(1+i), \quad \begin{pmatrix} 1+i & 1 \\ 0 & 1+i \end{pmatrix}, \quad \begin{pmatrix} 1+i & 1 & 0 \\ 0 & 1+i & 1 \\ 0 & 0 & 1+i \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
\]

on the diagonal.
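With the jordan_block_counts sketch given after formula (7.15), the data of this example yield the same counts (each dictionary records n_λ(e) for e = 1, ..., r):

\begin{verbatim}
>>> jordan_block_counts([3, 5, 6, 6])   # lambda = 1 + i
{1: 1, 2: 1, 3: 1, 4: 0}
>>> jordan_block_counts([2, 4, 4])      # lambda = 2
{1: 0, 2: 2, 3: 0}
\end{verbatim}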

Example 7.4.7. Let T : C^4 → C^4 be the linear map whose matrix relative to the standard basis E of C^4 is

\[
[\,T\,]_E = \begin{pmatrix} 1 & 0 & -1 & 4 \\ 0 & 1 & -1 & 5 \\ 1 & 0 & -1 & 8 \\ 1 & -1 & 0 & 3 \end{pmatrix}.
\]

We wish to determine a Jordan canonical form of T.

Solution: We calculate

char_T(x) = det(xI − [T]_E) = x^2(x − 2)^2. Thus the eigenvalues of T are 0 and 2. We set

m_k = dim_C ker(T − 0I)^k = 4 − rank[T^k]_E = 4 − rank(([T]_E)^k),
m_k′ = dim_C ker(T − 2I)^k = 4 − rank[(T − 2I)^k]_E = 4 − rank(([T]_E − 2I)^k)

for every integer k ≥ 1. We find, after a series of elementary row operations, the following reduced echelon forms:

\[
[\,T\,]_E \sim \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]

 4 −4 0 8   1 −1 0 0  2  4 −4 0 12   0 0 0 1  [ T ] =   ∼   E  8 −8 0 20   0 0 0 0  4 −4 0 8 0 0 0 0

 12 −12 0 20   1 −1 0 0  3  16 −16 0 32   0 0 0 1  [ T ] =   ∼   E  28 −28 0 52   0 0 0 0  12 −12 0 20 0 0 0 0 7.4. JORDAN CANONICAL FORM 115

 −1 0 −1 4   1 0 0 −1   0 −1 −1 5   0 1 0 −2  [ T ]E − 2I =   ∼    1 0 −3 8   0 0 1 −3  1 −1 0 1 0 0 0 0

 4 −4 4 −8   1 0 0 −1  2  4 −4 4 −8   0 1 −1 1  ([ T ]E − 2I) =   ∼    4 −8 8 −12   0 0 0 0  0 0 0 0 0 0 0 0

 −8 −12 −12 20   1 0 0 −1  3  −8 −12 −12 20   0 1 −1 1  ([ T ]E − 2I) =   =    −8 −20 −20 28   0 0 0 0  0 0 0 0 0 0 0 0

hence m_1 = 1, m_2 = m_3 = 2, m_1′ = 1 and m_2′ = m_3′ = 2. So the largest Jordan block of T for the eigenvalue 0 is of size 2 × 2 and the largest block for the eigenvalue 2 is also of size 2 × 2. Since the sum of the sizes of the blocks must be equal to 4, we conclude that the Jordan canonical form of T is

\[
\begin{pmatrix} J_0(2) & 0 \\ 0 & J_2(2) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.
\]
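For matrices with exact entries, a computer algebra system can confirm such a computation. Here is a check with sympy (assumed available; jordan_form returns a pair (P, J) with A = PJP^{−1}):

\begin{verbatim}
from sympy import Matrix

A = Matrix([[1, 0, -1, 4],
            [0, 1, -1, 5],
            [1, 0, -1, 8],
            [1, -1, 0, 3]])
P, J = A.jordan_form()        # A = P * J * P**(-1)
print(J)                      # blocks J_0(2) and J_2(2), up to their order
\end{verbatim}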

However, this does not tell us how to find a basis B of C^4 such that [T]_B is of this form. We will see in the following chapter how to do this in principle. The matrix version of Theorem 7.4.3 is the following:

Theorem 7.4.8. Let A ∈ Mat_{n×n}(C). There exists an invertible matrix P ∈ GL_n(C) such that P^{−1}AP is block diagonal with Jordan blocks on the diagonal.

Such a product

\[
P^{-1}AP = \begin{pmatrix} J_{\lambda_1}(e_1) & & 0 \\ & \ddots & \\ 0 & & J_{\lambda_s}(e_s) \end{pmatrix}
\]

is called a Jordan canonical form of A (or Jordan normal form of A). The matrix version of Theorem 7.4.4 is stated as follows:

Theorem 7.4.9. Let A ∈ Mat_{n×n}(C). The Jordan canonical forms of A are obtained from one another by a permutation of the Jordan blocks on the diagonal. The blocks in question are of the form J_λ(e) where λ is an eigenvalue of A and where e is an integer ≥ 1. For each λ ∈ C and each integer k ≥ 1, we have

\[
n - \operatorname{rank}\bigl((A - \lambda I)^k\bigr) = \sum_{e \ge 1} \min(k, e)\, n_\lambda(e),
\]

where n_λ(e) denotes the number of blocks on the diagonal equal to J_λ(e).

This result provides the means to determine if two matrices with complex entries are similar or not:

Corollary 7.4.10. Let A, B ∈ Matn×n(C). Then A and B are similar if and only if they admit the same Jordan canonical form.

The proof of this statement is left as an exercise.
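As a hedged illustration of how Corollary 7.4.10 can be used in practice, the following Python sketch compares the multisets of Jordan blocks of two exact matrices with sympy; the helper jordan_blocks and its block-splitting convention are ours, not part of the original text.

\begin{verbatim}
from sympy import Matrix

def jordan_blocks(M):
    """(eigenvalue, size) pairs of the Jordan blocks of M, read off the
    superdiagonal of sympy's Jordan form; sorting removes the dependence
    on the order of the blocks."""
    _, J = M.jordan_form()
    blocks, start = [], 0
    for i in range(J.rows):
        if i == J.rows - 1 or J[i, i + 1] == 0:   # a block ends at row i
            blocks.append((J[start, start], i - start + 1))
            start = i + 1
    return sorted(blocks, key=str)

def are_similar(A, B):
    """Corollary 7.4.10: A and B are similar iff they admit the same
    Jordan canonical form, i.e. the same blocks up to order."""
    return A.shape == B.shape and jordan_blocks(A) == jordan_blocks(B)

# Example: J_2(2) and its transpose are similar.
print(are_similar(Matrix([[2, 1], [0, 2]]), Matrix([[2, 0], [1, 2]])))  # True
\end{verbatim}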

Exercises.

7.4.1. Let V be a finite-dimensional vector space over C and let T : V → V be a linear operator. Suppose that there exists a basis B of V such that

\[
[\,T\,]_B = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_s \end{pmatrix}
\]

where A_i ∈ Mat_{e_i×e_i}(C) for i = 1, ..., s. Write

B = B_1 ⊔ ⋯ ⊔ B_s, where B_i has e_i elements for i = 1, ..., s. Finally, for each i, denote by V_i the subspace of V generated by B_i.

(a) Show that each subspace V_i is T-invariant, that it admits B_i as a basis and that

[T|_{V_i}]_{B_i} = A_i.

(b) Show that if (i_1, ..., i_s) is a permutation of (1, ..., s), then B′ = B_{i_1} ⊔ ⋯ ⊔ B_{i_s} is a basis of V such that

\[
[\,T\,]_{B'} = \begin{pmatrix} A_{i_1} & & 0 \\ & \ddots & \\ 0 & & A_{i_s} \end{pmatrix}.
\]

7.4.2. Let V be a vector space over C and let T : V → V be a linear operator. Suppose that

V = V1 ⊕ · · · ⊕ Vs where Vi is a T -invariant subspace of V for i = 1, . . . , s.

(a) Show that ker(T) = ker(T|_{V_1}) ⊕ ⋯ ⊕ ker(T|_{V_s}).

(b) Deduce that, for all λ ∈ C and all k ∈ N>0, we have

ker(T − λI_V)^k = ker(T|_{V_1} − λI_{V_1})^k ⊕ ⋯ ⊕ ker(T|_{V_s} − λI_{V_s})^k.

7.4.3. Let T : V → V be a linear operator on a vector space V of dimension n over C. Suppose that there exists a basis B = {v_1, ..., v_n} of V and λ ∈ C such that

[ T ]B = Jλ(n).

(a) Show that, if λ ≠ 0, then dim_C(ker T^k) = 0 for every integer k ≥ 1.

(b) Suppose that λ = 0. Show that

ker(T^k) = ⟨v_1, ..., v_k⟩_C

for k = 1, ..., n. Deduce that dim_C ker(T^k) = min(k, n) for every integer k ≥ 1.

7.4.4. Give an alternative proof of formula (7.13) of Theorem 7.4.4 by proceeding in the following manner. First suppose that B is a basis of V such that

\[
[\,T\,]_B = \begin{pmatrix} J_{\lambda_1}(e_1) & & 0 \\ & \ddots & \\ 0 & & J_{\lambda_s}(e_s) \end{pmatrix}.
\]

By Exercise 7.4.1, we can write B = B_1 ⊔ ⋯ ⊔ B_s where, for each i, B_i is a basis of a T-invariant subspace V_i of V with

[T|_{V_i}]_{B_i} = J_{λ_i}(e_i)

and we have V = V1 ⊕ · · · ⊕ Vs.

(a) Applying Exercise 7.4.2, show that

\[
\dim_{\mathbb{C}} \ker(T - \lambda I)^k = \sum_{i=1}^{s} \dim_{\mathbb{C}} \ker(T|_{V_i} - \lambda I_{V_i})^k
\]

for each λ ∈ C and each integer k ≥ 1.

(b) Show that [T|_{V_i} − λI_{V_i}]_{B_i} = J_{λ_i−λ}(e_i) and, using Exercise 7.4.3, deduce that

\[
\dim_{\mathbb{C}} \ker(T|_{V_i} - \lambda I_{V_i})^k = \begin{cases} 0 & \text{if } \lambda \ne \lambda_i,\\ \min(e_i, k) & \text{if } \lambda = \lambda_i. \end{cases}
\]

(c) Deduce formula (7.13).

7.4.5. Deduce Theorem 7.4.8 from Theorem 7.4.3 by applying the latter to the linear operator T_A : C^n → C^n associated to A. Similarly, deduce Theorem 7.4.9 from Theorem 7.4.4.

7.4.6. Prove Corollary 7.4.10.

7.4.7. Let T : C^7 → C^7 be a linear map. Suppose that its eigenvalues are 0, 1 and −i and that the integers

m_k = dim_C ker(T^k), m_k′ = dim_C ker(T − I)^k, m_k″ = dim_C ker(T + iI)^k

satisfy m_1 = m_2 = 2, m_1′ = 1, m_2′ = 2, m_3′ = m_4′ = 3, m_1″ = 1 and m_2″ = m_3″ = 2. Determine a Jordan canonical form of T.

7.4.8. Determine a Jordan canonical form of the matrix

\[
A = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
\]

Chapter 8

The proof of the structure theorem

This chapter concludes the first part of the course. We first state a theorem concerning submodules of free modules over a euclidean domain and then we show that it implies the structure theorem for modules of finite type over a euclidean domain. In Section 8.2, we prove the Cayley-Hamilton Theorem which says that, for every linear operator T : V → V on a finite-dimensional vector space V, we have char_T(T) = 0. The rest of the chapter is dedicated to the proof of our submodule theorem. In Section 8.3, we show that, if A is a euclidean domain, then every submodule N of A^n has a basis. In Section 8.4, we give an algorithm for finding such a basis from a system of generators of N. In Section 8.5, we present a more complex algorithm that allows us to transform every matrix with coefficients in A into its "Smith normal form" and we deduce from this the theorem on submodules of free modules. Finally, given a linear operator T : V → V as above, Section 8.6 provides a method for determining a basis B of V such that [T]_B is in rational canonical form.

Throughout this chapter, A denotes a euclidean domain and n is a positive integer.

8.1 Submodules of free modules, Part I

We first state:

Theorem 8.1.1. Let N be an A-submodule of A^n. Then there exists a basis {C_1, ..., C_n} of A^n, an integer r with 0 ≤ r ≤ n and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | ⋯ | d_r such that {d_1C_1, ..., d_rC_r} is a basis of N.

The proof of this result is given in Section 8.5. We will see that the elements d1, . . . , dr are completely determined by N up to multiplication by units of A. We explain here how we can deduce the structure theorem for A-modules of finite type from Theorem 8.1.1.

Proof of Theorem 7.2.1. Let M = ⟨v_1, ..., v_n⟩_A be an A-module of finite type. The function


\[
\psi : A^n \longrightarrow M, \qquad \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \longmapsto a_1v_1 + \cdots + a_nv_n \tag{8.1}
\]

is a surjective homomorphism of A-modules. Let

N = ker ψ.

Then N is an A-submodule of A^n. By Theorem 8.1.1, there exists a basis {C_1, ..., C_n} of A^n, an integer r with 0 ≤ r ≤ n and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | ⋯ | d_r such that {d_1C_1, ..., d_rC_r} is a basis of N. Then let

u_i = ψ(C_i)   (i = 1, ..., n).

Since A^n = ⟨C_1, ..., C_n⟩_A, we have

M = Im(ψ) = ⟨u_1, ..., u_n⟩_A = Au_1 + ⋯ + Au_n.   (8.2)

On the other hand, for a1, . . . , an ∈ A, we find

a1u1 + ··· + anun = 0 ⇐⇒ a1ψ(C1) + ··· + anψ(Cn) = 0

⇐⇒ ψ(a1C1 + ··· + anCn) = 0

⇐⇒ a1C1 + ··· + anCn ∈ N.

Recall that {d1C1, . . . , drCr} is a basis of N. Thus the condition a1C1 + ··· + anCn ∈ N is equivalent to the existence of b1, . . . , br ∈ A such that

a1C1 + ··· + anCn = b1(d1C1) + ··· + br(drCr).

Since C1,...,Cn are linearly independent over A, this is not possible unless

a1 = b1d1, . . . , ar = brdr and ar+1 = ··· = an = 0.

We conclude that

a1u1 + ··· + anun = 0 ⇐⇒ a1C1 + ··· + anCn ∈ N

⇐⇒ a_1 ∈ (d_1), ..., a_r ∈ (d_r) and a_{r+1} = ⋯ = a_n = 0.   (8.3)

In particular, this shows that

\[
a_iu_i = 0 \iff \begin{cases} a_i \in (d_i) & \text{if } 1 \le i \le r,\\ a_i = 0 & \text{if } r < i \le n. \end{cases} \tag{8.4}
\]

Thus Ann(ui) = (di) for i = 1, . . . , r and Ann(ui) = (0) for i = r + 1, . . . , n. Since d1 | d2 | · · · | dr, we have

Ann(u_1) ⊇ Ann(u_2) ⊇ ⋯ ⊇ Ann(u_n).   (8.5)

Formulas (8.3) and (8.4) also imply that

a1u1 + ··· + anun = 0 ⇐⇒ a1u1 = ··· = anun = 0. By (8.2), this property implies that

M = Au_1 ⊕ ⋯ ⊕ Au_n   (8.6)

(see Definition 6.4.1). This completes the proof of Theorem 7.2.1 except for the condition that Ann(u_1) ≠ A. Starting with a decomposition (8.6) satisfying (8.5), it is easy to obtain another with Ann(u_1) ≠ A. Indeed, for all u ∈ M, we have Ann(u) = A ⇐⇒ u = 0 ⇐⇒ Au = {0}.

Thus if M ≠ {0} and k denotes the smallest integer such that u_k ≠ 0, we have

A ≠ Ann(u_k) ⊇ ⋯ ⊇ Ann(u_n) and M = {0} ⊕ ⋯ ⊕ {0} ⊕ Au_k ⊕ ⋯ ⊕ Au_n = Au_k ⊕ ⋯ ⊕ Au_n, as required. If M = {0}, we obtain an empty direct sum.

Section 8.6 provides an algorithm for determining {C_1, ..., C_n} and d_1, ..., d_r as in the statement of Theorem 8.1.1, provided that we know a system of generators of N. In the application of Theorem 7.2.1, this means knowing a system of generators of ker(ψ) where ψ is given by (8.1). The following result allows us to find such a system of generators in the situation most interesting to us.

Proposition 8.1.2. Let V be a vector space of finite dimension n > 0 over a field K, let B = {v_1, ..., v_n} be a basis of V and let T : V → V be a linear operator. For the corresponding structure of a K[x]-module on V, the function

\[
\psi : K[x]^n \longrightarrow V, \qquad \begin{pmatrix} p_1(x) \\ \vdots \\ p_n(x) \end{pmatrix} \longmapsto p_1(x)v_1 + \cdots + p_n(x)v_n
\]

is a surjective homomorphism of K[x]-modules, and its kernel is the K[x]-submodule of K[x]^n generated by the columns of the matrix xI − [T]_B.

Proof. Write [T]_B = (a_{ij}). Let N be the K[x]-submodule of K[x]^n generated by the columns C_1, ..., C_n of xI − [T]_B. For j = 1, ..., n, we have

\[
C_j = \begin{pmatrix} -a_{1j} \\ \vdots \\ x - a_{jj} \\ \vdots \\ -a_{nj} \end{pmatrix},
\]

with the entry x − a_{jj} in position j,

hence

ψ(Cj) = (−a1j)v1 + ··· + (x − ajj)vj + ··· + (−anj)vn

= T (vj) − (a1jv1 + ··· + anjvn) = 0

and so Cj ∈ ker(ψ). This proves that N ⊆ ker(ψ).

To prove the reverse inclusion, we first show that every n-tuple (p_1(x), ..., p_n(x))^t of K[x]^n can be written as a sum of an element of N and an element of K^n. We prove this assertion by induction on the maximum m of the degrees of the polynomials p_1(x), ..., p_n(x) where, for the purpose of this argument, we say that the zero polynomial has degree 0. If m = 0, then (p_1(x), ..., p_n(x))^t is an element of K^n and we are done. Suppose that m ≥ 1 and that the result is true for n-tuples of polynomials of degrees ≤ m − 1. Denoting by b_i the coefficient of x^m in p_i(x), we have

\[
\begin{pmatrix} p_1(x) \\ \vdots \\ p_n(x) \end{pmatrix}
= \begin{pmatrix} b_1x^m + (\text{terms of degree} \le m-1) \\ \vdots \\ b_nx^m + (\text{terms of degree} \le m-1) \end{pmatrix}
= \underbrace{b_1x^{m-1}C_1 + \cdots + b_nx^{m-1}C_n}_{\in\, N} + \begin{pmatrix} p_1^*(x) \\ \vdots \\ p_n^*(x) \end{pmatrix}
\]

for polynomials p_1^*(x), ..., p_n^*(x) in K[x] of degree ≤ m − 1. Applying the induction hypothesis to the n-tuple (p_1^*(x), ..., p_n^*(x))^t, we conclude that (p_1(x), ..., p_n(x))^t is the sum of an element of N and an element of K^n.

Let (p_1(x), ..., p_n(x))^t be an arbitrary element of ker(ψ). By the above observation, we can write

(p_1(x), ..., p_n(x))^t = (q_1(x), ..., q_n(x))^t + (c_1, ..., c_n)^t   (8.7)

where (q_1(x), ..., q_n(x))^t ∈ N and (c_1, ..., c_n)^t ∈ K^n. Since N ⊆ ker(ψ), we see that

(c_1, ..., c_n)^t = (p_1(x), ..., p_n(x))^t − (q_1(x), ..., q_n(x))^t ∈ ker(ψ),

and so, applying ψ to the vector (c_1, ..., c_n)^t, we get

c1v1 + ··· + cnvn = 0.

Since v_1, ..., v_n are linearly independent over K, this implies that c_1 = ⋯ = c_n = 0 and equality (8.7) gives

(p_1(x), ..., p_n(x))^t = (q_1(x), ..., q_n(x))^t ∈ N.

Since the choice of (p_1(x), ..., p_n(x))^t ∈ ker(ψ) is arbitrary, this shows that ker(ψ) ⊆ N, and so ker(ψ) = N.

Example 8.1.3. Let V be a vector space of dimension 2 over R, let A = {v_1, v_2} be a basis of V, and let T ∈ End_R(V) with

\[
[\,T\,]_A = \begin{pmatrix} -1 & 2 \\ 4 & 1 \end{pmatrix}.
\]

For the corresponding structure of an R[x]-module on V, we consider the homomorphism of R[x]-modules ψ : R[x]^2 → V given by

\[
\psi\begin{pmatrix} p_1(x) \\ p_2(x) \end{pmatrix} = p_1(x)v_1 + p_2(x)v_2.
\]

(i) By Proposition 8.1.2, ker(ψ) is the submodule of R[x]^2 generated by the columns of

\[
xI - [\,T\,]_A = \begin{pmatrix} x+1 & -2 \\ -4 & x-1 \end{pmatrix},
\]

that is,

\[
\ker(\psi) = \left\langle \begin{pmatrix} x+1 \\ -4 \end{pmatrix}, \begin{pmatrix} -2 \\ x-1 \end{pmatrix} \right\rangle_{\mathbb{R}[x]}.
\]

(ii) We see that (x^2 − 9)v_1 = 0 since

(x^2 − 9)v_1 = T^2(v_1) − 9v_1 = T(−v_1 + 4v_2) − 9v_1 = 0.

So we have (x^2 − 9, 0)^t ∈ ker(ψ). To write this element of ker(ψ) as a linear combination of the generators of ker(ψ) found in (i), we proceed as in the proof of Proposition 8.1.2. We obtain

\[
\begin{pmatrix} x^2 - 9 \\ 0 \end{pmatrix} = x\begin{pmatrix} x+1 \\ -4 \end{pmatrix} + 0\begin{pmatrix} -2 \\ x-1 \end{pmatrix} + \begin{pmatrix} -x-9 \\ 4x \end{pmatrix}
\]

and

\[
\begin{pmatrix} -x-9 \\ 4x \end{pmatrix} = (-1)\begin{pmatrix} x+1 \\ -4 \end{pmatrix} + 4\begin{pmatrix} -2 \\ x-1 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]

thus

\[
\begin{pmatrix} x^2 - 9 \\ 0 \end{pmatrix} = (x-1)\begin{pmatrix} x+1 \\ -4 \end{pmatrix} + 4\begin{pmatrix} -2 \\ x-1 \end{pmatrix}.
\]

Exercises.

8.1.1. Let V be a vector space over Q equipped with a basis A = {v_1, v_2}, and let T ∈ End_Q(V) with

\[
[\,T\,]_A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}.
\]

We consider V as a Q[x]-module and we denote by N the kernel of the homomorphism of Q[x]-modules ψ : Q[x]^2 → V given by

\[
\psi\begin{pmatrix} p_1(x) \\ p_2(x) \end{pmatrix} = p_1(x) \cdot v_1 + p_2(x) \cdot v_2.
\]

(i) Find a system of generators of N as a Q[x]-module.

(ii) Show that (x^2 − 3, −2x − 5)^t ∈ N and express this column vector as a linear combination of the generators of N found in (i).

8.1.2. Let V be a vector space over R equipped with a basis A = {v_1, v_2, v_3} and let T ∈ End_R(V) with

\[
[\,T\,]_A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}.
\]

We consider V as an R[x]-module and we denote by N the kernel of the homomorphism of R[x]-modules ψ : R[x]^3 → V given by

\[
\psi\begin{pmatrix} p_1(x) \\ p_2(x) \\ p_3(x) \end{pmatrix} = p_1(x) \cdot v_1 + p_2(x) \cdot v_2 + p_3(x) \cdot v_3.
\]

(i) Find a system of generators of N as an R[x]-module.

(ii) Show that (x^2 − x − 6, x − 1, x^2 − 2x + 1)^t ∈ N and express this column vector as a linear combination of the generators of N found in (i).

8.2 The Cayley-Hamilton Theorem

We begin with the following observation:

Proposition 8.2.1. Let N be the A-submodule of A^n generated by the columns of a matrix U ∈ Mat_{n×n}(A). Then N contains det(U)A^n.

Proof. We know that the adjugate Adj(U) of U is a matrix of Matn×n(A) satisfying

UAdj(U) = det(U)I (8.8)

(see Theorem B.3.4). Denote by C_1, ..., C_n the columns of U and by e_1, ..., e_n those of I (these are the elements of the standard basis of A^n). Also write Adj(U) = (a_{ij}). Comparing the j-th columns of the matrices on each side of (8.8), we see that

a1jC1 + ··· + anjCn = det(U)ej (1 ≤ j ≤ n).

Since N = ⟨C_1, ..., C_n⟩_A, we deduce that

det(U)A^n = ⟨det(U)e_1, ..., det(U)e_n⟩_A ⊆ N.

We can now prove the following fundamental result:

Theorem 8.2.2 (Cayley-Hamilton Theorem). Let V be a finite-dimensional vector space over a field K and let T : V → V be a linear operator on V . Then we have

charT (T ) = 0.

Proof. Let B = {v1,..., vn} be a basis of V over K. By Proposition 8.1.2, the kernel of the homomorphism of K[x]-modules

\[
\psi : K[x]^n \longrightarrow V, \qquad \begin{pmatrix} p_1(x) \\ \vdots \\ p_n(x) \end{pmatrix} \longmapsto p_1(x)v_1 + \cdots + p_n(x)v_n
\]

is the submodule of K[x]^n generated by the columns of xI − [T]_B ∈ Mat_{n×n}(K[x]). By the preceding proposition (applied to A = K[x]), this implies that

ker(ψ) ⊇ det(xI − [T]_B) · K[x]^n = char_T(x) · K[x]^n.

In particular, for all u ∈ K[x]^n, we have char_T(x)u ∈ ker(ψ) and so

0 = ψ(charT (x) u) = charT (x) ψ(u).

Since ψ is a surjective homomorphism, this implies that charT (x) ∈ Ann(V ), that is, charT (T ) = 0.

The following consequence is left as an exercise.

Corollary 8.2.3. For every matrix M ∈ Mat_{n×n}(K), we have char_M(M) = 0.

Example 8.2.4. For n = 2, we can give a direct proof. Write

\[
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]

We see that

\[
\operatorname{char}_M(x) = \begin{vmatrix} x-a & -b \\ -c & x-d \end{vmatrix} = x^2 - (a+d)x + (ad - bc),
\]

thus

\[
\operatorname{char}_M(M) = M^2 - (a+d)M + (ad-bc)I
= \begin{pmatrix} a^2+bc & ab+bd \\ ac+cd & bc+d^2 \end{pmatrix} - (a+d)\begin{pmatrix} a & b \\ c & d \end{pmatrix} + (ad-bc)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]
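The theorem is also easy to test numerically on any concrete matrix. Here is a small sympy sketch (the sample matrix is arbitrary): we extract the coefficients of char_M(x) and evaluate the polynomial at M itself, replacing each constant c by cI.

\begin{verbatim}
from sympy import Matrix, symbols

x = symbols('x')
M = Matrix([[1, 2], [3, 4]])                  # any square matrix over a field
coeffs = M.charpoly(x).all_coeffs()           # highest degree first
result = Matrix.zeros(M.rows, M.rows)
for k, c in enumerate(reversed(coeffs)):      # c is the coefficient of x**k
    result += c * M**k
print(result)                                 # the zero matrix, as predicted
\end{verbatim}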

Exercises.

8.2.1. Prove Corollary 8.2.3.

8.2.2. Let V be a finite-dimensional vector space over a field K and let T : V → V be a linear operator on V .

(i) Show that minT (x) divides charT (x).

(ii) If K = Q and char_T(x) = x^3 − 2x^2 + x, what are the possibilities for min_T(x)?

8.2.3 (Alternate proof of the Cayley-Hamilton Theorem). Let V be a vector space of finite dimension n over a field K and let T : V → V be a linear operator on V . The goal of this exercise is to give another proof of the Cayley-Hamilton Theorem. To do this, we equip V with the corresponding structure of a K[x]-module, and we choose an arbitrary element v of V . Let p(x) be the monic generator of Ann(v), and let m be its degree.

(i) Show that there exists a basis B of V such that

\[
[\,T\,]_B = \begin{pmatrix} P & Q \\ 0 & R \end{pmatrix},
\]

where P denotes the companion matrix of p(x) and where R is a square matrix of size (n − m) × (n − m).

(ii) Show that char_T(x) = p(x) char_R(x) ∈ Ann(v).

(iii) Why can we conclude that char_T(T) = 0?

8.3 Submodules of free modules, Part II

The goal of this section is to prove the following weaker version of Theorem 8.1.1 as a first step towards the proof of this theorem.

Theorem 8.3.1. Every A-submodule of A^n is free and possesses a basis with at most n elements.

This result is false in general for an arbitrary commutative ring A (see Exercise 8.3.1).

Proof. We proceed by induction on n. For n = 1, an A-submodule N of A^1 = A is simply an ideal of A. If N = {0}, then by convention N is free and admits the basis ∅. Otherwise, we know that N = (b) = Ab for a nonzero element b of A since every ideal of A is principal (Theorem 5.4.3). We note that if ab = 0 with a ∈ A then a = 0 since A is an integral domain. Therefore, in this case, {b} is a basis of N.

Now suppose that n ≥ 2 and that every A-submodule of A^{n−1} admits a basis with at most n − 1 elements. Let N be an A-submodule of A^n. We define

I = { a ∈ A ; (a, a_2, ..., a_n)^t ∈ N for some a_2, ..., a_n ∈ A },

N′ = { (a_1, a_2, ..., a_n)^t ∈ N ; a_1 = 0 } and N″ = { (a_2, ..., a_n)^t ∈ A^{n−1} ; (0, a_2, ..., a_n)^t ∈ N }.

It is straightforward (exercise) to verify that I is an ideal of A, that N′ is a submodule of N and that N″ is a submodule of A^{n−1}. Moreover, the function

\[
\varphi : N'' \longrightarrow N', \qquad \begin{pmatrix} a_2 \\ \vdots \\ a_n \end{pmatrix} \longmapsto \begin{pmatrix} 0 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
\]

is an isomorphism of A-modules. By the induction hypothesis, N″ admits a basis with at most n − 1 elements. The image of this basis under ϕ is therefore a basis {u_1, ..., u_m} of N′ with 0 ≤ m ≤ n − 1 (Proposition 6.5.13).

If I = {0}, then we have N = N′. In this case, {u_1, ..., u_m} is a basis of N and, since 0 ≤ m ≤ n, we are finished.

Otherwise, Theorem 5.4.3 implies I = (b_1) = Ab_1 for a nonzero element b_1 of A. Since b_1 ∈ I, there exist b_2, ..., b_n ∈ A such that

u_{m+1} := (b_1, b_2, ..., b_n)^t ∈ N.

We will show that N = N′ ⊕ Au_{m+1}. Assuming this result, we have that {u_1, ..., u_{m+1}} is a basis of N and, since m + 1 ≤ n, the induction step is complete.

Let u = (c_1, ..., c_n)^t ∈ N. By the definition of I, we have c_1 ∈ I. Thus there exists a ∈ A such that c_1 = ab_1. We see that

u − au_{m+1} = (0, c_2 − ab_2, ..., c_n − ab_n)^t ∈ N′,

and so

u = (u − au_{m+1}) + au_{m+1} ∈ N′ + Au_{m+1}.

Since the choice of u is arbitrary, this shows that

N = N′ + Au_{m+1}. We also note that, for a ∈ A,

au_{m+1} ∈ N′ ⇐⇒ ab_1 = 0 ⇐⇒ a = 0.

This shows that N′ ∩ Au_{m+1} = {0}, hence N = N′ ⊕ Au_{m+1} as claimed.

Example 8.3.2. Let N = {(x, y, z)^t ∈ Z^3 ; 2x + 3y + 6z = 0}.

(i) Show that N is a Z-submodule of Z^3.

(ii) Find a basis of N as a Z-module.

Solution: For (i), it suffices to show that N is a subgroup of Z3. We leave this as an exercise. For (ii), we set

I = {x ∈ Z ; 2x + 3y + 6z = 0 for some y, z ∈ Z},

N′ = {(x, y, z)^t ∈ N ; x = 0} and N″ = {(y, z)^t ∈ Z^2 ; 3y + 6z = 0},

as in the proof of Theorem 8.3.1. We see that

3y + 6z = 0 ⇐⇒ y = −2z,

hence

N″ = {(−2z, z)^t ; z ∈ Z} = Z(−2, 1)^t

admits the basis {(−2, 1)^t}, and so u_1 := (0, −2, 1)^t is a basis of N′. We also see that, for x ∈ Z, we have

x ∈ I ⇐⇒ 2x = −3y − 6z for some y, z ∈ Z ⇐⇒ 2x ∈ (−3, −6) = (3) ⇐⇒ 3 | 2x ⇐⇒ 3 | x.

Thus I = (3). Furthermore, 3 is the first component of

 3  u2 = −2 ∈ N. 0

Repeating the arguments of the proof of Theorem 8.3.1, we see that

N = N′ ⊕ Zu_2 = Zu_1 ⊕ Zu_2.

Thus {u_1 = (0, −2, 1)^t, u_2 = (3, −2, 0)^t} is a basis of N.
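One can spot-check such a computation in a few lines of Python with sympy (the sample vectors are arbitrary elements of N, chosen by us): both claimed basis vectors satisfy the defining equation, and the coordinates of sample elements of N in the basis {u_1, u_2} come out integral.

\begin{verbatim}
from sympy import Matrix

u1, u2 = Matrix([0, -2, 1]), Matrix([3, -2, 0])
row = Matrix([[2, 3, 6]])
assert row * u1 == Matrix([0]) and row * u2 == Matrix([0])

B = u1.row_join(u2)                   # the 3 x 2 matrix (u1 u2)
S = Matrix([[0, 3], [1, 0]])          # rows 1 and 3 of B, invertible over Q
for v in (Matrix([3, 4, -3]), Matrix([-6, -2, 3])):   # sample elements of N
    c = S.solve(Matrix([v[0], v[2]]))  # coefficients forced by rows 1 and 3
    assert B * c == v and all(e.is_Integer for e in c)
print("both sample vectors lie in Z*u1 + Z*u2")
\end{verbatim}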

Exercises.

8.3.1. Let A be an arbitrary integral domain and let I be an ideal of A considered as an A-submodule of A^1 = A. Show that I is a free A-module if and only if I is principal.

8.3.2. Let M be a free module over a euclidean domain A. Show that every A-submodule of M is free.

8.3.3. Let N = {(x, y, z)^t ∈ Z^3 ; 3x + 10y − 16z = 0}.

(i) Show that N is a submodule of Z^3.

(ii) Find a basis of N as a Z-module.

8.3.4. Let N = {(p_1(x), p_2(x), p_3(x))^t ∈ Q[x]^3 ; (x^2 − x)p_1(x) + x^2p_2(x) − (x^2 + x)p_3(x) = 0}.

(i) Show that N is a Q[x]-submodule of Q[x]^3.

(ii) Find a basis of N as a Q[x]-module.

8.4 The column module of a matrix

Definition 8.4.1 (Column module). We define the column module of a matrix U ∈ Mat_{n×m}(A) to be the A-submodule of A^n generated by the columns of U. We denote it by Col_A(U).

Theorem 8.3.1 shows that every submodule of A^n has a basis with at most n elements. Thus the column module of a matrix U ∈ Mat_{n×m}(A) has a basis. The goal of this section is to give an algorithm for finding such a basis. More generally, this algorithm permits us to find a basis of an arbitrary submodule N of A^n provided that we know a system of generators C_1, ..., C_m of N, since N is then the column module of the matrix (C_1 ⋯ C_m) ∈ Mat_{n×m}(A) whose columns are C_1, ..., C_m.

We begin with the following observation:

Lemma 8.4.2. Let U ∈ Mat_{n×m}(A) and let U′ be a matrix obtained from U by applying one of the following operations:

(I) interchange two columns,

(II) add to one column the product of another column and an element of A,

(III) multiply a column by an element of A×.

Then U and U′ have the same column module.

Proof. By construction, the columns of U′ belong to Col_A(U) since they are linear combinations of the columns of U. Thus we have Col_A(U′) ⊆ Col_A(U). On the other hand, each of the operations of type (I), (II) or (III) is invertible: we can recover U from U′ by applying an operation of the same type. Thus we also have Col_A(U) ⊆ Col_A(U′), and so Col_A(U) = Col_A(U′).

The operations (I), (II), (III) of Lemma 8.4.2 are called elementary column operations (on A). From this lemma we deduce:

Lemma 8.4.3. Let U ∈ Mat_{n×m}(A) and let U′ be a matrix obtained from U by applying a series of elementary column operations. Then Col_A(U) = Col_A(U′).

Definition 8.4.4 (Echelon form). We say that a matrix U ∈ Mat_{n×m}(A) is in echelon form if it is of the form

\[
U = \begin{pmatrix}
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
u_{i_1,1} & \vdots & & \vdots & \vdots & & \vdots \\
* & 0 & & \vdots & \vdots & & \vdots \\
\vdots & u_{i_2,2} & \ddots & 0 & \vdots & & \vdots \\
\vdots & * & & u_{i_r,r} & \vdots & & \vdots \\
\vdots & \vdots & & * & \vdots & & \vdots \\
* & * & \cdots & * & 0 & \cdots & 0
\end{pmatrix} \tag{8.9}
\]

with 1 ≤ i_1 < i_2 < ⋯ < i_r ≤ n and u_{i_1,1} ≠ 0, ..., u_{i_r,r} ≠ 0; that is, for j = 1, ..., r, the j-th column has its first nonzero entry u_{i_j,j} in row i_j, the pivot rows i_1, ..., i_r are strictly increasing, and the last m − r columns are zero.

Lemma 8.4.5. Suppose that U ∈ Matn×m(A) is in echelon form. Then the nonzero columns of U form a basis of ColA(U).

Proof. Suppose that U is given by (8.9). Then the first r columns C1,...,Cr of U generate ColA(U). Suppose that we have

a1C1 + ··· + arCr = 0

for elements a_1, ..., a_r of A. If a_1, ..., a_r are not all zero, there exists a smallest integer k such that a_k ≠ 0. Then we have

a_kC_k + ⋯ + a_rC_r = 0.

Taking the i_k-th component of each side of this equality, we see that

a_k u_{i_k,k} = 0.

Since a_k ≠ 0 and u_{i_k,k} ≠ 0, this is impossible (since A is an integral domain). This contradiction shows that we must have a_1 = ⋯ = a_r = 0. Thus the columns C_1, ..., C_r are linearly independent and so they form a basis of Col_A(U).

The three preceding lemmas only use the fact that A is an integral domain. The following result uses in a crucial way the hypothesis that A is a euclidean domain.

Theorem 8.4.6. Let U ∈ Mat_{n×m}(A). It is possible to bring U to echelon form by a series of elementary column operations.

Proof. Let ϕ : A \ {0} → N be a euclidean function for A (see Definition 5.4.1). We consider the following algorithm:

Algorithm 8.4.7.

1. If U = 0, the matrix U is already in echelon form and we are finished.

2. Suppose that U ≠ 0. We locate the first nonzero row of U.

2.1. In this row we choose an element a ≠ 0 for which ϕ(a) is minimal and we permute the columns to bring this element to the first column, so that the row reads (a, a_2, ..., a_m).

2.2. We divide each of the other elements a_2, ..., a_m of this row by a, writing a_j = q_ja + r_j with q_j, r_j ∈ A satisfying r_j = 0 or ϕ(r_j) < ϕ(a); then, for j = 2, ..., m, we subtract from the j-th column the product of the first column and q_j (the operations C_j − q_jC_1). The row now reads (a, r_2, ..., r_m).

2.3. If r_2, ..., r_m are not all zero, we return to step 2.1. If they are all zero, the first nonzero row of the new matrix is of the form (a, 0, ..., 0) and we return to step 1, restricting ourselves to the submatrix formed by the last m − 1 columns, that is, ignoring the first column.

This algorithm terminates after a finite number of steps since each iteration of steps 2.1 to 2.3 on a row replaces the first element a ≠ 0 of this row by an element a′ ≠ 0 of A with ϕ(a′) < ϕ(a), and this cannot be repeated indefinitely since ϕ takes values in N = {0, 1, 2, ...}. When completed, this algorithm produces a matrix in echelon form as required.
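For A = Z with the euclidean function ϕ(a) = |a|, Algorithm 8.4.7 can be transcribed almost line by line. The following Python sketch (an illustration under that choice of ϕ, not a reference implementation) works on a list of rows and performs only the elementary column operations of the algorithm:

\begin{verbatim}
def column_echelon_Z(U):
    """Bring an integer matrix (a list of rows) to column echelon form by
    elementary column operations, following Algorithm 8.4.7 over A = Z."""
    U = [row[:] for row in U]                # work on a copy
    n, m = len(U), len(U[0])
    col = 0                                  # first column of the active block
    for i in range(n):                       # scan the rows from the top
        while col < m:
            js = [j for j in range(col, m) if U[i][j] != 0]
            if not js:
                break                        # row i is zero in the active block
            # step 2.1: bring a nonzero entry of smallest |.| to column `col`
            j0 = min(js, key=lambda j: abs(U[i][j]))
            for r in range(n):
                U[r][col], U[r][j0] = U[r][j0], U[r][col]
            # step 2.2: divide the other entries of row i by the pivot
            a = U[i][col]
            for j in range(col + 1, m):
                q = U[i][j] // a             # leaves a remainder r_j, |r_j| < |a|
                for r in range(n):
                    U[r][j] -= q * U[r][col]
            # step 2.3: stop once the row reads (a, 0, ..., 0)
            if all(U[i][j] == 0 for j in range(col + 1, m)):
                col += 1
                break
    return U
\end{verbatim}

Python's floor division leaves a remainder of absolute value smaller than |a|, which is all that the termination argument above requires.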

Combining this theorem with the preceding lemmas, we conclude:

Corollary 8.4.8. Let U ∈ Mat_{n×m}(A). By a series of elementary column operations, we can transform U to an echelon matrix U′. Then the nonzero columns of U′ form a basis of Col_A(U).

Proof. The first assertion follows from Theorem 8.4.6. By Lemma 8.4.3, we have Col_A(U) = Col_A(U′). Finally, Lemma 8.4.5 shows that the nonzero columns of U′ form a basis of Col_A(U′), hence these also form a basis of Col_A(U).

 3 7 4 

Example 8.4.9. Find a basis of ColZ(U) where U =  −4 −2 2  ∈ Mat3×3(Z). 1 −5 −6

Solution: We have

\[
U \;\underset{\substack{C_2 - 2C_1 \\ C_3 - C_1}}{\sim}\; \begin{pmatrix} 3 & 1 & 1 \\ -4 & 6 & 6 \\ 1 & -7 & -7 \end{pmatrix}
\;\underset{C_1 \leftrightarrow C_2}{\sim}\; \begin{pmatrix} 1 & 3 & 1 \\ 6 & -4 & 6 \\ -7 & 1 & -7 \end{pmatrix}
\;\underset{\substack{C_2 - 3C_1 \\ C_3 - C_1}}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 6 & -22 & 0 \\ -7 & 22 & 0 \end{pmatrix}
\;\underset{-C_2}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 6 & 22 & 0 \\ -7 & -22 & 0 \end{pmatrix}.
\]

     1 0  Thus  6  ,  22  is a basis of ColZ(U).  −7 −22  Example 8.4.10. Find a basis of x2 − 1 x2 − x x3 − 7 N := , , ⊆ [x]2. x + 1 x x − 1 Q Q[x]

Solution: We have N = ColQ[x](U) where

C2 − C1 C3 − xC1 x2 − 1 x2 − x x3 − 7  x2 − 1 −x + 1 x − 7  U = x + 1 x x − 1 ∼ x + 1 −1 −x2 − 1

C2 C1 −C1 −x + 1 x2 − 1 x − 7  x − 1 x2 − 1 x − 7  ∼ −1 x + 1 −x2 − 1 ∼ 1 x + 1 −x2 − 1

C2 − (x + 1)C1 C3 − C1 −C3 C1 C2 x − 1 0 −6   6 x − 1 0  ∼ 1 0 −x2 − 2 ∼ x2 + 2 1 0

6C2   6 6(x − 1) 0 × × ∼ x2 + 2 6 0 since 6 ∈ Q[x] = Q ,

C2 − (x − 1)C1  6 0 0  ∼ x2 + 2 −x3 + x2 − 2x + 8 0 134 CHAPTER 8. THE PROOF OF THE STRUCTURE THEOREM

 6   0  Thus , is a basis of N over [x]. x2 + 2 −x3 + x2 − 2x + 8 Q
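Running the column_echelon_Z sketch from the proof of Theorem 8.4.6 on the matrix of Example 8.4.9 reproduces that computation; the result differs from the basis found there only by the unit −1 in the second column, which spans the same submodule of Z^3:

\begin{verbatim}
>>> U = [[3, 7, 4], [-4, -2, 2], [1, -5, -6]]
>>> for row in column_echelon_Z(U):
...     print(row)
[1, 0, 0]
[6, -22, 0]
[-7, 22, 0]
\end{verbatim}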

Exercises. 2 2 8.4.1. Find a basis of the subgroup of Z3 generated by 4 and 0. 6 3

8.4.2. Find a basis of the R[x]-submodule of R[x]^2 generated by the pairs (x^2 − 1, x)^t, (x^2 − 2x + 1, x − 2)^t and (x^3 − 1, x^2 − 1)^t.

8.4.3. Let A be a euclidean domain, and let U ∈ Matn×m(A). Show that the ideal I of A generated by the elements of the first row of U is not affected by the elementary column operations. Deduce that if a series of elementary column operations transforms U into a matrix whose first row is (a, 0,..., 0), then a is a generator of I.

8.5 Smith normal form

To define the notion of elementary row operation on a matrix, we replace everywhere the word "column" by "row" in statements (I) to (III) of Lemma 8.4.2:

Definition 8.5.1 (Elementary row operation). An elementary row operation on a matrix of Mat_{n×m}(A) is any operation of one of the following types:

(I) interchange two rows,

(II) add to one row the product of another row and an element of A,

(III) multiply a row by an element of A×.

To study the effect of an elementary row operation on the column module of a matrix, we will use the following lemma, whose proof is left as an exercise (see Exercise 6.5.4).

Lemma 8.5.2. Let ϕ : M → M′ be a homomorphism of A-modules and let N be an A-submodule of M. Then ϕ(N) := {ϕ(u) ; u ∈ N} is an A-submodule of M′. Furthermore:

(i) if N = ⟨u_1, ..., u_m⟩_A, then ϕ(N) = ⟨ϕ(u_1), ..., ϕ(u_m)⟩_A;

(ii) if N admits a basis {u1,..., um} and ϕ is injective, then {ϕ(u1), . . . , ϕ(um)} is a basis of ϕ(N).

We can now prove:

Lemma 8.5.3. Let U ∈ Mat_{n×m}(A) and let U′ be a matrix obtained from U by applying an elementary row operation. Then there exists an automorphism ϕ of A^n such that Col_A(U′) = ϕ(Col_A(U)).

Recall that an automorphism of an A-module is an isomorphism from this module to itself.

Proof. Let C_1, ..., C_m be the columns of U and let C_1′, ..., C_m′ be those of U′. We split the proof into three cases.

If U′ is obtained from U by applying an operation of type (I), we will simply assume that we have interchanged rows 1 and 2. The general case is similar but more complicated to write. In this case, the function ϕ : A^n → A^n given by

ϕ((u_1, u_2, u_3, ..., u_n)^t) = (u_2, u_1, u_3, ..., u_n)^t

is an automorphism of A^n such that ϕ(C_j) = C_j′ for j = 1, ..., m. Since Col_A(U) = ⟨C_1, ..., C_m⟩_A and Col_A(U′) = ⟨C_1′, ..., C_m′⟩_A, Lemma 8.5.2 then gives Col_A(U′) = ϕ(Col_A(U)).

If U′ is obtained by an operation of type (II), we will suppose that we have added to row 1 the product of row 2 and an element a of A. Then we have C_j′ = ϕ(C_j) for j = 1, ..., m, where ϕ : A^n → A^n is the function given by

ϕ((u_1, u_2, ..., u_n)^t) = (u_1 + au_2, u_2, ..., u_n)^t.

We verify that ϕ is an automorphism of A^n and the conclusion then follows from Lemma 8.5.2.

Finally, if U′ is obtained by an operation of type (III), we will suppose that we have multiplied row 1 of U by a unit a ∈ A^×. Then C_j′ = ϕ(C_j) for j = 1, ..., m, where ϕ : A^n → A^n is the automorphism of A^n given by

ϕ((u_1, u_2, ..., u_n)^t) = (au_1, u_2, ..., u_n)^t,

and Lemma 8.5.2 gives Col_A(U′) = ϕ(Col_A(U)).

Combining Lemmas 8.4.2 and 8.5.3, we get:

Theorem 8.5.4. Let U ∈ Mat_{n×m}(A) and let U′ be a matrix obtained from U by applying a series of elementary row and column operations. Then there exists an automorphism ϕ : A^n → A^n such that Col_A(U′) = ϕ(Col_A(U)).

Proof. By assumption, there exists a series of matrices

U_1 = U, U_2, U_3, ..., U_k = U′

such that, for i = 1, ..., k − 1, the matrix U_{i+1} is obtained by applying to U_i an elementary row or column operation. In the case of a row operation, Lemma 8.5.3 gives

Col_A(U_{i+1}) = ϕ_i(Col_A(U_i))

for an automorphism ϕ_i of A^n. In the case of a column operation, Lemma 8.4.2 gives

Col_A(U_{i+1}) = Col_A(U_i) = ϕ_i(Col_A(U_i))

where ϕ_i = I_{A^n} is the identity function on A^n. We deduce that

Col_A(U′) = Col_A(U_k) = ϕ_{k−1}(... ϕ_2(ϕ_1(Col_A(U_1))) ...) = ϕ(Col_A(U))

where ϕ = ϕ_{k−1} ∘ ⋯ ∘ ϕ_2 ∘ ϕ_1 is an automorphism of A^n.

Theorem 8.5.4 is valid over an arbitrary commutative ring A. Using the assumption that A is euclidean, we show:

Theorem 8.5.5. Let U ∈ Mat_{n×m}(A). Applying to U a series of elementary row and column operations, we can bring it to the form

\[
\begin{pmatrix} d_1 & & & 0 \\ & \ddots & & \\ & & d_r & \\ 0 & & & 0 \end{pmatrix} \tag{8.10}
\]

for an integer r ≥ 0 and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | ⋯ | d_r.

We say that (8.10) is the Smith normal form of U. We can show that the elements d_1, ..., d_r are uniquely determined by U up to multiplication by a unit of A (see Exercise 8.5.4). We call them the invariant factors of U. The connection to the invariant factors mentioned after Theorem 7.2.5 is that the invariant factors there are the invariant factors (in the sense of Smith normal form) of the matrix xI − A, where A is the matrix associated to T in some basis.

Proof. If U = 0, the matrix U is already in Smith normal form with r = 0. We therefore suppose that U ≠ 0. Then it suffices to show that U can be transformed into a matrix of type

\[
\begin{pmatrix} d & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & U^* & \\ 0 & & & \end{pmatrix} \tag{8.11}
\]

where d is a nonzero element of A that divides all the coefficients of U^* (we then conclude by repeating the same procedure on the submatrix U^*). Since U ≠ 0, there exists a nonzero element a of U for which ϕ(a) is minimal. By permuting the rows and columns of U, we can bring this element to position (1, 1).

Step 1. Let a2, . . . , am be the other elements of the first row. For j = 2, . . . , m, we write aj = qja + rj with qj, rj ∈ A satisfying rj = 0 or ϕ(rj) < ϕ(a), and then we subtract from column j the product of column 1 and qj. This gives

\[
\begin{pmatrix} a & r_2 & \cdots & r_m \\ * & & & \\ \vdots & & * & \\ * & & & \end{pmatrix}
\qquad (\text{the operations } C_2 - q_2C_1, \ldots, C_m - q_mC_1).
\]

If r_2 = ⋯ = r_m = 0 (this happens when a divides all the elements of the first row), this matrix is of the form

\[
\begin{pmatrix} a & 0 & \cdots & 0 \\ * & & & \\ \vdots & & * & \\ * & & & \end{pmatrix}.
\]

Otherwise, we choose an index j with r_j ≠ 0 for which ϕ(r_j) is minimal and we interchange columns 1 and j to bring this element r_j to position (1, 1). This gives a matrix of the form

\[
\begin{pmatrix} a' & * & \cdots & * \\ * & & & \\ \vdots & & * & \\ * & & & \end{pmatrix}
\]

with a′ ≠ 0 and ϕ(a′) < ϕ(a). We then repeat the procedure. Since every series of nonzero elements a, a′, a″, ... of A with ϕ(a) > ϕ(a′) > ϕ(a″) > ⋯ cannot continue indefinitely, we

obtain, after a finite number of iterations, a matrix of the form

\[
\begin{pmatrix} b & 0 & \cdots & 0 \\ * & & & \\ \vdots & & * & \\ * & & & \end{pmatrix} \tag{8.12}
\]

with b ≠ 0 and ϕ(b) ≤ ϕ(a). Furthermore, we have ϕ(b) < ϕ(a) if at the start a does not divide all the elements of row 1.

Step 2. Let b_2, ..., b_n be the other elements of the first column of the matrix (8.12). For i = 2, ..., n, we write b_i = q_i′b + r_i′ with q_i′, r_i′ ∈ A satisfying r_i′ = 0 or ϕ(r_i′) < ϕ(b), and then we subtract from row i row 1 multiplied by q_i′ (the operations R_i − q_i′R_1). This gives

\[
\begin{pmatrix} b & 0 & \cdots & 0 \\ r_2' & & & \\ \vdots & & * & \\ r_n' & & & \end{pmatrix}.
\]

If r_2′ = ⋯ = r_n′ = 0 (this happens when b divides all the elements of the first column), this matrix is of the form

\[
\begin{pmatrix} b & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & * & \\ 0 & & & \end{pmatrix}.
\]

Otherwise we choose an index i with r_i′ ≠ 0 for which ϕ(r_i′) is minimal and we interchange rows 1 and i to bring this element r_i′ to position (1, 1). This gives a matrix of the form

\[
\begin{pmatrix} b' & * & \cdots & * \\ * & & & \\ \vdots & & * & \\ * & & & \end{pmatrix}
\]

with b′ ≠ 0 and ϕ(b′) < ϕ(b). We then return to step 1. Since every series of nonzero elements b, b′, b″, ... of A with ϕ(b) > ϕ(b′) > ϕ(b″) > ⋯ is necessarily finite, we obtain, after a finite number of steps, a matrix of the form

\[
\begin{pmatrix} c & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & U' & \\ 0 & & & \end{pmatrix} \tag{8.13}
\]

with c ≠ 0 and ϕ(c) ≤ ϕ(a). Moreover, we have ϕ(c) < ϕ(a) if initially a does not divide all the elements of the first row and first column.

Step 3. If c divides all the entries of the submatrix U′ of (8.13), we have finished. Otherwise, we add to row 1 a row R_i that contains an element of U′ not divisible by c (the operation R_1 + R_i). We then return to step 1 with the resulting matrix

\[
\begin{pmatrix} c & c_2 & \cdots & c_m \\ 0 & & & \\ \vdots & & U' & \\ 0 & & & \end{pmatrix}.
\]

Since c does not divide all the entries of the first row of this matrix, steps 1 and 2 produce a new matrix

\[
\begin{pmatrix} c' & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & U'' & \\ 0 & & & \end{pmatrix}
\]

with c′ ≠ 0 and ϕ(c′) < ϕ(c). Since every series of nonzero elements c, c′, c″, ... of A with ϕ(c) > ϕ(c′) > ϕ(c″) > ⋯ is necessarily finite, we obtain, after a finite number of iterations, a matrix of the required form (8.11) where d ≠ 0 divides all the entries of U^*.

Example 8.5.6. Find the Smith normal form of

\[
U = \begin{pmatrix} 4 & 8 & 4 \\ 4 & 10 & 8 \end{pmatrix} \in \mathrm{Mat}_{2\times 3}(\mathbb{Z}).
\]

Solution: The calculation gives:

\[
U \;\underset{\substack{C_2 - 2C_1 \\ C_3 - C_1}}{\sim}\; \begin{pmatrix} 4 & 0 & 0 \\ 4 & 2 & 4 \end{pmatrix}
\;\underset{R_2 - R_1}{\sim}\; \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 4 \end{pmatrix}
\;\underset{R_1 + R_2}{\sim}\; \begin{pmatrix} 4 & 2 & 4 \\ 0 & 2 & 4 \end{pmatrix} \quad (\text{2 is not divisible by 4})
\]
\[
\;\underset{C_1 \leftrightarrow C_2}{\sim}\; \begin{pmatrix} 2 & 4 & 4 \\ 2 & 0 & 4 \end{pmatrix}
\;\underset{\substack{C_2 - 2C_1 \\ C_3 - 2C_1}}{\sim}\; \begin{pmatrix} 2 & 0 & 0 \\ 2 & -4 & 0 \end{pmatrix}
\;\underset{R_2 - R_1}{\sim}\; \begin{pmatrix} 2 & 0 & 0 \\ 0 & -4 & 0 \end{pmatrix}
\;\underset{-R_2}{\sim}\; \begin{pmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \end{pmatrix}.
\]
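A reasonably recent version of sympy can confirm such a computation; we assume here that its smith_normal_form helper is available and accepts rectangular matrices over Z, which is the case in current releases:

\begin{verbatim}
from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

U = Matrix([[4, 8, 4], [4, 10, 8]])
print(smith_normal_form(U, domain=ZZ))   # Matrix([[2, 0, 0], [0, 4, 0]])
\end{verbatim}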

Example 8.5.7. Find the invariant factors of

\[
U = \begin{pmatrix} x & 1 & 1 \\ 3 & x-2 & -3 \\ -4 & 4 & x+5 \end{pmatrix} \in \mathrm{Mat}_{3\times 3}(\mathbb{Q}[x]).
\]

Solution: We have

\[
U \;\underset{C_1 \leftrightarrow C_2}{\sim}\; \begin{pmatrix} 1 & x & 1 \\ x-2 & 3 & -3 \\ 4 & -4 & x+5 \end{pmatrix}
\;\underset{\substack{C_2 - xC_1 \\ C_3 - C_1}}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ x-2 & -x^2+2x+3 & -x-1 \\ 4 & -4x-4 & x+1 \end{pmatrix}
\]
\[
\;\underset{\substack{R_2 - (x-2)R_1 \\ R_3 - 4R_1}}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & -x^2+2x+3 & -x-1 \\ 0 & -4x-4 & x+1 \end{pmatrix}
\;\underset{C_2 \leftrightarrow C_3}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & -x-1 & -x^2+2x+3 \\ 0 & x+1 & -4x-4 \end{pmatrix}
\]
\[
\;\underset{\substack{R_3 + R_2 \\ -R_2}}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & x+1 & x^2-2x-3 \\ 0 & 0 & -x^2-2x-1 \end{pmatrix}
\;\underset{C_3 - (x-3)C_2}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & x+1 & 0 \\ 0 & 0 & -(x+1)^2 \end{pmatrix}
\;\underset{-R_3}{\sim}\; \begin{pmatrix} 1 & 0 & 0 \\ 0 & x+1 & 0 \\ 0 & 0 & (x+1)^2 \end{pmatrix}.
\]

Thus the invariant factors of U are 1, x + 1 and (x + 1)^2.

We conclude this section with the proof of Theorem 8.1.1:

Proof of Theorem 8.1.1. Let N be a submodule of A^n. By Theorem 8.3.1, N has a basis, hence in particular it has a finite system of generators {u_1, ..., u_m} and thus we can write N = Col_A(U) where U = (u_1 ⋯ u_m) ∈ Mat_{n×m}(A). By Theorem 8.5.5, we can, by a series of elementary row and column operations, bring the matrix U to the form

\[
U' = \begin{pmatrix} d_1 & & & 0 \\ & \ddots & & \\ & & d_r & \\ 0 & & & 0 \end{pmatrix}
\]

for an integer r ≥ 0 and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | ⋯ | d_r. By Theorem 8.5.4, there exists an automorphism ϕ of A^n such that

Col_A(U′) = ϕ(Col_A(U)) = ϕ(N).

Denote by {e_1, ..., e_n} the standard basis of A^n, and set C_j = ϕ^{−1}(e_j) for each j = 1, ..., n. Since ϕ^{−1} is an automorphism of A^n, Proposition 6.5.13 shows that {C_1, ..., C_n} is a basis of A^n. Also, we know, thanks to Lemma 8.4.5, that {d_1e_1, ..., d_re_r} is a basis of Col_A(U′). Then Lemma 8.5.2 applied to ϕ^{−1} tells us that

{ϕ^{−1}(d_1e_1), ..., ϕ^{−1}(d_re_r)} = {d_1C_1, ..., d_rC_r}

is a basis of ϕ^{−1}(Col_A(U′)) = N.

Exercises.

8.5.1. Prove Lemma 8.5.2.

8.5.2. Find the invariant factors of each of the following matrices:

\[
\text{(i) } \begin{pmatrix} 6 & 4 & 0 \\ 2 & 0 & 3 \\ 2 & 2 & 8 \end{pmatrix} \in \mathrm{Mat}_{3\times 3}(\mathbb{Z}), \qquad
\text{(ii) } \begin{pmatrix} 9 & 6 & 12 \\ 3 & 0 & 4 \end{pmatrix} \in \mathrm{Mat}_{2\times 3}(\mathbb{Z}),
\]
\[
\text{(iii) } \begin{pmatrix} x^3 & 0 & x^2 \\ x^2 & x & 0 \\ 0 & x & x^3 \end{pmatrix} \in \mathrm{Mat}_{3\times 3}(\mathbb{Q}[x]), \qquad
\text{(iv) } \begin{pmatrix} x^3-1 & 0 & x^4-1 \\ x^2-1 & x+1 & 0 \end{pmatrix} \in \mathrm{Mat}_{2\times 3}(\mathbb{Q}[x]).
\]

8.5.3. Let A be a euclidean domain and let U ∈ Mat_{n×m}(A). Show that the ideal I of A generated by all the entries of U is not affected by elementary row or column operations. Deduce that the first invariant factor of U is a generator of I and that therefore this divisor is uniquely determined by U up to multiplication by a unit.

8.5.4. Let A and U be as in Exercise 8.5.3, and let k be an integer with 1 ≤ k ≤ min(n, m). For all integers i_1, ..., i_k and j_1, ..., j_k with

1 ≤ i_1 < i_2 < ⋯ < i_k ≤ n and 1 ≤ j_1 < j_2 < ⋯ < j_k ≤ m,

we denote by U^{j_1,...,j_k}_{i_1,...,i_k} the submatrix of U of size k × k formed by the entries of U appearing in the rows with indices i_1, ..., i_k and the columns with indices j_1, ..., j_k. Let I_k be the ideal of A generated by the determinants of all these submatrices (called the k × k minors of U).

(i) Show that this ideal is not affected by elementary row or column operations.

(ii) Let d_1, ..., d_r be the invariant factors of U. Deduce from (i) that I_k = (d_1 ⋯ d_k) if k ≤ r and that I_k = (0) if k > r.

(iii) Conclude that the integer r is the greatest integer for which U contains a submatrix of size r × r with nonzero determinant, and that the invariant factors d_1, ..., d_r of U are uniquely determined by U up to multiplication by units in A^×.

Note. This construction generalizes that of Exercise 8.5.3 since I1 is simply the ideal of A generated by all the elements of U.

8.6 Algorithms

The proof of Theorem 8.1.1 given in the preceding section gives, in principle, a method of finding the basis {C_1, ..., C_n} of A^n appearing in the statement of the theorem. The goal of this section is to give a practical algorithm to achieve this goal, then to deduce an algorithm for calculating the rational canonical form of an endomorphism of a finite-dimensional vector space. We begin with the following observation:

Lemma 8.6.1. Let U ∈ Mat_{n×m}(A), let U′ be a matrix obtained by applying to U an elementary column operation, and let E be the matrix obtained by applying the same operation to I_m (the m × m identity matrix). Then we have:

U′ = UE.

Proof. Denote the columns of U by C1,...,Cm. We split the proof into three cases:

(I) The matrix U′ is obtained by interchanging columns i and j of U, where i < j. We then have

\[
U' = (C_1 \cdots C_j \cdots C_i \cdots C_m) = (C_1 \cdots C_i \cdots C_j \cdots C_m)\,E = UE,
\]

where E is the matrix obtained from I_m by interchanging columns i and j: it has entries 1 in positions (i, j) and (j, i), entries 0 in positions (i, i) and (j, j), and agrees with I_m everywhere else.

(II) The matrix U′ is obtained by adding to column i of U the product of column j of U and an element a of A, where j ≠ i. In this case

\[
U' = (C_1 \cdots C_i + aC_j \cdots C_j \cdots C_m) = UE,
\]

where E is the matrix obtained from I_m by the same operation: it agrees with I_m except for the entry a in position (j, i).

(This presentation supposes i < j. The case i > j is similar.)

(III) The matrix U′ is obtained by multiplying the i-th column of U by a unit a ∈ A^×. Then

\[
U' = (C_1 \cdots aC_i \cdots C_m) = UE,
\]

where E is the matrix obtained from I_m by replacing the entry 1 in position (i, i) by a.

A similar result applies to elementary row operations. The proof of the following lemma is left as an exercise:

Lemma 8.6.2. Let U ∈ Mat_{n×m}(A), let U′ be a matrix obtained by applying an elementary row operation to U, and let E be the matrix obtained by applying the same operation to I_n. Then we have: U′ = EU.

We also note:

Lemma 8.6.3. Let k be a positive integer. Every matrix obtained by applying an elementary column operation to I_k can also be obtained by applying an elementary row operation to I_k, and vice versa.

Proof. (I) Interchanging columns i and j of I_k produces the same result as interchanging rows i and j.

(II) Adding to column i of I_k its j-th column multiplied by an element a of A, for distinct i and j, is equivalent to adding to row j of I_k its i-th row multiplied by a: both operations produce the matrix that agrees with I_k except for the entry a in position (j, i).

(III) Multiplying column i of Ik by a unit a produces the same result as multiplying row i by a.

These results suggest the following definition.

Definition 8.6.4 (Elementary matrix). A k × k elementary matrix is any matrix of Mat_{k×k}(A) obtained by applying an elementary row or column operation to I_k.

We immediately note:

Proposition 8.6.5. An elementary matrix is invertible and its inverse is again an elementary matrix.

Proof. Let E ∈ Mat_{k×k}(A) be an elementary matrix. It is obtained by applying to I_k an elementary column operation. Let F be the matrix obtained by applying to I_k the inverse elementary operation. Then Lemma 8.6.1 gives I_k = EF and I_k = FE. Thus E is invertible and E^{−1} = F.

Note. The fact that an elementary matrix is invertible can also be seen by observing that its determinant is a unit of A (see Appendix B).

Using the notion of elementary matrices, we can reformulate Theorem 8.5.5 in the fol- lowing manner:

Theorem 8.6.6. Let U ∈ Mat_{n×m}(A). There exist elementary matrices E_1, ..., E_s of size n × n and elementary matrices F_1, ..., F_t of size m × m such that

\[
E_s \cdots E_1\, U\, F_1 \cdots F_t = \begin{pmatrix} d_1 & & & 0 \\ & \ddots & & \\ & & d_r & \\ 0 & & & 0 \end{pmatrix} \tag{8.14}
\]

for some integer r ≥ 0 and nonzero elements d1, . . . , dr of A with d1 | d2 | · · · | dr.

Proof. Theorem 8.5.5 tells us that we can transform U to a matrix of the form on the right side of (8.14) by a series of elementary row and column operations. Equality (8.14) follows by denoting by Ei the elementary matrix that encodes the i-th elementary row operation and by Fj the elementary matrix that encodes the j-th elementary column operation.

Since every elementary matrix is invertible and a product of invertible matrices is invertible, we deduce:

Corollary 8.6.7. Let U ∈ Mat_{n×m}(A). There exist invertible matrices P ∈ Mat_{n×n}(A) and Q ∈ Mat_{m×m}(A) such that PUQ is in Smith normal form.

To calculate P and Q, we can proceed as follows:

Algorithm 8.6.8. Let U ∈ Matn×m(A).

" U I • We construct the “corner” table n . Im 8.6. ALGORITHMS 145

• Applying a series of elementary operations to its first n rows and its first m columns, we bring it to the form
\[
\left(\begin{array}{c|c} S & P \\ \hline Q & \end{array}\right),
\]
where S denotes the Smith normal form of U.

• Then P and Q are invertible matrices such that PUQ = S.

Proof. Let s be the number of elementary operations needed and, for i = 0, 1, ..., s, let
\[
\left(\begin{array}{c|c} U_i & P_i \\ \hline Q_i & \end{array}\right)
\]
be the table obtained after i elementary operations, so that the series of intermediate tables is
\[
\left(\begin{array}{c|c} U & I_n \\ \hline I_m & \end{array}\right)
= \left(\begin{array}{c|c} U_0 & P_0 \\ \hline Q_0 & \end{array}\right)
\sim \left(\begin{array}{c|c} U_1 & P_1 \\ \hline Q_1 & \end{array}\right)
\sim \cdots \sim
\left(\begin{array}{c|c} U_s & P_s \\ \hline Q_s & \end{array}\right)
= \left(\begin{array}{c|c} S & P \\ \hline Q & \end{array}\right).
\]

We will show by induction that, for i = 0, 1, . . . , s, the matrices Pi and Qi are invertible and that PiUQi = Ui. For i = s, this will show that P and Q are invertible with PUQ = S as claimed.

For i = 0, the statement to prove is clear since In and Im are invertible and InUIm = U. Now suppose that we have 1 ≤ i < s, that Pi−1 and Qi−1 are invertible and that Pi−1UQi−1 = Ui−1. We split into two cases.

Case 1: The i-th operation is an elementary row operation (on the first n rows). In this case, if E is the corresponding elementary matrix, we have

Ui = EUi−1,Pi = EPi−1 and Qi = Qi−1.

Since E, Pi−1 and Qi−1 are invertible, we deduce that Pi and Qi are invertible and that

PiUQi = EPi−1UQi−1 = EUi−1 = Ui.

Case 2: The i-th operation is an elementary column operation (on the first m columns). In this case, if F is the corresponding elementary matrix, we have

Ui = Ui−1F,Pi = Pi−1 and Qi = Qi−1F.

Since F , Pi−1 and Qi−1 are invertible, so are Pi and Qi and we obtain

P_iUQ_i = P_{i−1}UQ_{i−1}F = U_{i−1}F = U_i.
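The proof above is constructive, and it is straightforward to turn it into a program. The following Python sketch (ours, not part of the original notes; function name and implementation details are assumptions) implements Algorithm 8.6.8 for A = Z, following the strategy of Theorem 8.5.5: repeatedly reduce by euclidean division, then enforce the divisibility condition d_1 | d_2 | ···.

```python
# A hedged sketch of Algorithm 8.6.8 over A = Z: returns (P, S, Q) with
# P, Q invertible over Z and S = P U Q in Smith normal form.
import numpy as np

def smith_normal_form(U):
    S = np.array(U, dtype=object)
    n, m = S.shape
    P = np.eye(n, dtype=object)
    Q = np.eye(m, dtype=object)
    # elementary operations, applied simultaneously to S and to P or Q
    def add_row(i, j, c): S[i, :] += c * S[j, :]; P[i, :] += c * P[j, :]
    def add_col(i, j, c): S[:, i] += c * S[:, j]; Q[:, i] += c * Q[:, j]
    def swap_rows(i, j): S[[i, j], :] = S[[j, i], :]; P[[i, j], :] = P[[j, i], :]
    def swap_cols(i, j): S[:, [i, j]] = S[:, [j, i]]; Q[:, [i, j]] = Q[:, [j, i]]

    t = 0
    while t < min(n, m):
        nonzero = [(i, j) for i in range(t, n) for j in range(t, m) if S[i, j] != 0]
        if not nonzero:
            break
        # bring an entry of minimal absolute value (the "size" for Z) to (t, t)
        i0, j0 = min(nonzero, key=lambda p: abs(S[p]))
        swap_rows(t, i0); swap_cols(t, j0)
        while True:
            for i in range(t + 1, n):            # clear column t by division
                if S[i, t]: add_row(i, t, -(S[i, t] // S[t, t]))
            for j in range(t + 1, m):            # clear row t by division
                if S[t, j]: add_col(j, t, -(S[t, j] // S[t, t]))
            rem = [(i, t) for i in range(t + 1, n) if S[i, t]] \
                + [(t, j) for j in range(t + 1, m) if S[t, j]]
            if not rem:
                break                            # row t and column t are clear
            i0, j0 = min(rem, key=lambda p: abs(S[p]))  # a smaller pivot appeared
            swap_rows(t, i0) if j0 == t else swap_cols(t, j0)
        # enforce d_t | (every entry of the remaining block)
        bad = [i for i in range(t + 1, n) for j in range(t + 1, m)
               if S[i, j] % S[t, t] != 0]
        if bad:
            add_row(t, bad[0], 1)                # mix the offending row in, redo
            continue
        if S[t, t] < 0:
            S[t, :] *= -1; P[t, :] *= -1         # normalize the sign of d_t
        t += 1
    return P, S, Q

P, S, Q = smith_normal_form([[4, 8, 4], [4, 10, 8]])
print(S)   # diag(2, 4) padded with a zero column
print(np.dot(np.dot(P, np.array([[4, 8, 4], [4, 10, 8]], dtype=object)), Q))
```

For the matrix U of the example below, this sketch happens to reproduce the P, Q and S found by hand; in general P and Q are only unique up to invertible changes, while the diagonal entries d_i are unique up to units.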

Example 8.6.9. Let $U = \begin{pmatrix} 4 & 8 & 4 \\ 4 & 10 & 8 \end{pmatrix} \in \mathrm{Mat}_{2\times 3}(\mathbb{Z})$. We are looking for invertible matrices P and Q such that PUQ is in Smith normal form.

Solution: We retrace the calculations of Example 8.5.6, applying them to the table $\left(\begin{array}{c|c} U & I_2 \\ \hline I_3 & \end{array}\right)$. In this way, we find
\[
\left(\begin{array}{ccc|cc}
4 & 8 & 4 & 1 & 0\\
4 & 10 & 8 & 0 & 1\\ \hline
1 & 0 & 0 & & \\
0 & 1 & 0 & & \\
0 & 0 & 1 & &
\end{array}\right)
\underset{\substack{C_2-2C_1\\ C_3-C_1}}{\sim}
\left(\begin{array}{ccc|cc}
4 & 0 & 0 & 1 & 0\\
4 & 2 & 4 & 0 & 1\\ \hline
1 & -2 & -1 & & \\
0 & 1 & 0 & & \\
0 & 0 & 1 & &
\end{array}\right)
\underset{R_2-R_1}{\sim}
\left(\begin{array}{ccc|cc}
4 & 0 & 0 & 1 & 0\\
0 & 2 & 4 & -1 & 1\\ \hline
1 & -2 & -1 & & \\
0 & 1 & 0 & & \\
0 & 0 & 1 & &
\end{array}\right)
\]
\[
\underset{R_1+R_2}{\sim}
\left(\begin{array}{ccc|cc}
4 & 2 & 4 & 0 & 1\\
0 & 2 & 4 & -1 & 1\\ \hline
1 & -2 & -1 & & \\
0 & 1 & 0 & & \\
0 & 0 & 1 & &
\end{array}\right)
\underset{C_1\leftrightarrow C_2}{\sim}
\left(\begin{array}{ccc|cc}
2 & 4 & 4 & 0 & 1\\
2 & 0 & 4 & -1 & 1\\ \hline
-2 & 1 & -1 & & \\
1 & 0 & 0 & & \\
0 & 0 & 1 & &
\end{array}\right)
\underset{\substack{C_2-2C_1\\ C_3-2C_1}}{\sim}
\left(\begin{array}{ccc|cc}
2 & 0 & 0 & 0 & 1\\
2 & -4 & 0 & -1 & 1\\ \hline
-2 & 5 & 3 & & \\
1 & -2 & -2 & & \\
0 & 0 & 1 & &
\end{array}\right)
\]
\[
\underset{R_2-R_1}{\sim}
\left(\begin{array}{ccc|cc}
2 & 0 & 0 & 0 & 1\\
0 & -4 & 0 & -1 & 0\\ \hline
-2 & 5 & 3 & & \\
1 & -2 & -2 & & \\
0 & 0 & 1 & &
\end{array}\right)
\underset{-R_2}{\sim}
\left(\begin{array}{ccc|cc}
2 & 0 & 0 & 0 & 1\\
0 & 4 & 0 & 1 & 0\\ \hline
-2 & 5 & 3 & & \\
1 & -2 & -2 & & \\
0 & 0 & 1 & &
\end{array}\right).
\]
Thus $P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \in \mathrm{Mat}_{2\times 2}(\mathbb{Z})$ and $Q = \begin{pmatrix} -2 & 5 & 3 \\ 1 & -2 & -2 \\ 0 & 0 & 1 \end{pmatrix} \in \mathrm{Mat}_{3\times 3}(\mathbb{Z})$ are invertible matrices (over $\mathbb{Z}$) such that
\[
PUQ = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \end{pmatrix}
\]
is the Smith normal form of U.

The following result shows in particular that every invertible matrix is a product of elementary matrices. Therefore, Corollary 8.6.7 is equivalent to Theorem 8.6.6 (each of the two statements implies the other).

Theorem 8.6.10. Let P ∈ Matn×n(A). The following conditions are equivalent:

(i) P is invertible,

(ii) P can be transformed to In by a series of elementary row operations, (iii) P can be written as a product of elementary matrices.

If these equivalent conditions are satisfied, we can, by a series of elementary row operations, transform the matrix (P | In) ∈ Matn×2n(A) to a matrix of the form (In | Q) and then we have P −1 = Q.

In Exercises 8.6.2 and 8.6.3, we propose a sketch of a proof of this result as well as a method to determine if P is invertible and, if so, to transform P to I_n by elementary row operations.

Example 8.6.11. Let $P = \begin{pmatrix} x+1 & x^2+x+1 \\ x & x^2+1 \end{pmatrix} \in \mathrm{Mat}_{2\times 2}(\mathbb{Q}[x])$. Show that P is invertible and calculate P^{-1}.

Solution: We have
\[
(P \mid I) =
\left(\begin{array}{cc|cc}
x+1 & x^2+x+1 & 1 & 0 \\
x & x^2+1 & 0 & 1
\end{array}\right)
\underset{R_1 - R_2}{\sim}
\left(\begin{array}{cc|cc}
1 & x & 1 & -1 \\
x & x^2+1 & 0 & 1
\end{array}\right)
\underset{R_2 - xR_1}{\sim}
\left(\begin{array}{cc|cc}
1 & x & 1 & -1 \\
0 & 1 & -x & x+1
\end{array}\right)
\underset{R_1 - xR_2}{\sim}
\left(\begin{array}{cc|cc}
1 & 0 & x^2+1 & -x^2-x-1 \\
0 & 1 & -x & x+1
\end{array}\right).
\]

Thus P is invertible (we can transform it to I_2 by a series of elementary row operations) and
\[
P^{-1} = \begin{pmatrix} x^2+1 & -x^2-x-1 \\ -x & x+1 \end{pmatrix}.
\]
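As a sanity check (ours, using the sympy library for exact arithmetic in Q[x]), this computation can be verified mechanically:

```python
# Hedged verification of Example 8.6.11 with sympy's exact polynomial arithmetic.
import sympy as sp

x = sp.symbols('x')
P = sp.Matrix([[x + 1, x**2 + x + 1],
               [x,     x**2 + 1]])
print(sp.expand(P.det()))    # 1: a unit of Q[x], so P is invertible
print(sp.simplify(P.inv()))  # Matrix([[x**2 + 1, -x**2 - x - 1], [-x, x + 1]])
```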

Theorem 8.1.1 provides a basis of An adapted to each submodule N of An. The following algorithm gives a practical means of calculating such a basis when we know a system of generators of the module N.

Algorithm 8.6.12. Let U ∈ Matn×m(A) and let N = ColA(U).

• Using Algorithm 8.6.8, we construct invertible matrices P and Q such that
\[
PUQ =
\begin{pmatrix}
d_1 & & & \\
& \ddots & & 0 \\
& & d_r & \\
0 & & & 0
\end{pmatrix}
\tag{8.15}
\]
for an integer r ≥ 0 and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | · · · | d_r.

• Then the columns of P^{-1} form a basis {C_1, ..., C_n} of A^n such that {d_1C_1, ..., d_rC_r} is a basis of N.

Proof. Let {e_1, ..., e_n} be the standard basis of A^n and let C_1, ..., C_n be the columns of P^{-1}. The function
\[
\varphi : A^n \longrightarrow A^n, \qquad X \longmapsto P^{-1}X,
\]
is an isomorphism of A-modules (exercise) and so Proposition 6.5.13 tells us that
\[
\{\varphi(e_1), \ldots, \varphi(e_n)\} = \{P^{-1}e_1, \ldots, P^{-1}e_n\} = \{C_1, \ldots, C_n\}
\]
is a basis of A^n. Then equality (8.15) can be rewritten
\[
UQ = P^{-1}
\begin{pmatrix}
d_1 & & & \\
& \ddots & & 0 \\
& & d_r & \\
0 & & & 0
\end{pmatrix}
= \begin{pmatrix} d_1C_1 & \cdots & d_rC_r & 0 & \cdots & 0 \end{pmatrix}.
\]

Since Q is invertible, it is a product of elementary matrices (Theorem 8.6.10) and so UQ is obtained from U by a series of elementary column operations. By Lemma 8.4.3, we deduce that N = ColA(U) = ColA(UQ) = hd1C1, . . . , drCriA.

Thus {d1C1, . . . , drCr} is a system of generators of N. If a1, . . . , ar ∈ A satisfy

a_1(d_1C_1) + ··· + a_r(d_rC_r) = 0, then (a_1d_1)C_1 + ··· + (a_rd_r)C_r = 0, hence a_1d_1 = ··· = a_rd_r = 0 since C_1, ..., C_n are linearly independent over A, and thus a_1 = ··· = a_r = 0 because the d_i are nonzero and A is an integral domain. This proves that d_1C_1, ..., d_rC_r are linearly independent and so {d_1C_1, ..., d_rC_r} is a basis of N.
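Continuing the hedged Python sketch given after Algorithm 8.6.8, the adapted basis can be extracted mechanically; here sympy is used to invert the unimodular matrix P exactly (again, the code is ours, not from the notes):

```python
# Hedged sketch of Algorithm 8.6.12 over Z, reusing smith_normal_form above.
import sympy as sp

U = [[4, 8, 4], [4, 10, 8]]
P, S, Q = smith_normal_form(U)
Pinv = sp.Matrix(P.tolist()).inv()   # columns C_1, ..., C_n: a basis of Z^2
d = [S[i, i] for i in range(min(S.shape)) if S[i, i] != 0]
basis_of_N = [d[i] * Pinv[:, i] for i in range(len(d))]  # {d_1 C_1, ..., d_r C_r}
print(Pinv, basis_of_N)
```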

In fact, since the second part of Algorithm 8.6.12 does not use the matrix Q, we can restrict ourselves to calculating the matrix P: it suffices to apply Algorithm 8.6.8 to the table (U | I_n), keeping track of the row operations only.

Example 8.6.13. Let
\[
N = \left\langle \begin{pmatrix} x \\ x^2 \end{pmatrix}, \begin{pmatrix} x^2 \\ x \end{pmatrix}, \begin{pmatrix} x \\ x^3 \end{pmatrix} \right\rangle_{\mathbb{Q}[x]} \subseteq \mathbb{Q}[x]^2.
\]
Then N = Col_{Q[x]}(U) where $U = \begin{pmatrix} x & x^2 & x \\ x^2 & x & x^3 \end{pmatrix}$. We see
\[
(U \mid I) =
\left(\begin{array}{ccc|cc} x & x^2 & x & 1 & 0 \\ x^2 & x & x^3 & 0 & 1 \end{array}\right)
\underset{\substack{C_2 - xC_1\\ C_3 - C_1}}{\sim}
\left(\begin{array}{ccc|cc} x & 0 & 0 & 1 & 0 \\ x^2 & x - x^3 & x^3 - x^2 & 0 & 1 \end{array}\right)
\underset{R_2 - xR_1}{\sim}
\left(\begin{array}{ccc|cc} x & 0 & 0 & 1 & 0 \\ 0 & x - x^3 & x^3 - x^2 & -x & 1 \end{array}\right)
\]
\[
\underset{C_2 + C_3}{\sim}
\left(\begin{array}{ccc|cc} x & 0 & 0 & 1 & 0 \\ 0 & x - x^2 & x^3 - x^2 & -x & 1 \end{array}\right)
\underset{C_3 + xC_2}{\sim}
\left(\begin{array}{ccc|cc} x & 0 & 0 & 1 & 0 \\ 0 & x - x^2 & 0 & -x & 1 \end{array}\right)
\underset{-R_2}{\sim}
\left(\begin{array}{ccc|cc} x & 0 & 0 & 1 & 0 \\ 0 & x^2 - x & 0 & x & -1 \end{array}\right).
\]
Therefore the invariant factors of N are x and x^2 − x = x(x − 1). Setting $P = \begin{pmatrix} 1 & 0 \\ x & -1 \end{pmatrix}$, we see that
\[
(P \mid I) =
\left(\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ x & -1 & 0 & 1 \end{array}\right)
\underset{R_2 - xR_1}{\sim}
\left(\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & -1 & -x & 1 \end{array}\right)
\underset{-R_2}{\sim}
\left(\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & x & -1 \end{array}\right),
\]
hence $P^{-1} = \begin{pmatrix} 1 & 0 \\ x & -1 \end{pmatrix}$. Applying Algorithm 8.6.12, we conclude that
\[
C_1 = \begin{pmatrix} 1 \\ x \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}
\]

is a basis of Q[x]^2 such that
\[
\{xC_1, (x^2-x)C_2\} = \left\{ \begin{pmatrix} x \\ x^2 \end{pmatrix}, \begin{pmatrix} 0 \\ -x^2+x \end{pmatrix} \right\}
\]
is a basis of N.

We conclude this chapter with the following result that refines Theorem 7.2.4 and that, combined with Theorem 7.2.5, allows us to construct a basis relative to which the matrix of a linear operator is in rational canonical form.

Theorem 8.6.14. Let V be a vector space of finite dimension n ≥ 1 over a field K, let B = {v_1, ..., v_n} be a basis of V, and let T : V → V be a linear operator. Then there exist invertible matrices P, Q ∈ Mat_{n×n}(K[x]) and monic polynomials d_1(x), ..., d_n(x) ∈ K[x] with d_1(x) | d_2(x) | · · · | d_n(x) such that
\[
P\,(xI - [\,T\,]_B)\,Q =
\begin{pmatrix}
d_1(x) & & 0 \\
& \ddots & \\
0 & & d_n(x)
\end{pmatrix}.
\tag{8.16}
\]
Write
\[
P^{-1} = \begin{pmatrix} p_{11}(x) & \cdots & p_{1n}(x) \\ \vdots & & \vdots \\ p_{n1}(x) & \cdots & p_{nn}(x) \end{pmatrix},
\]
and, for the K[x]-module structure on V associated to T, set

uj = p1j(x)v1 + ··· + pnj(x)vn for j = 1, . . . , n. Then we have

V = K[x]uk ⊕ · · · ⊕ K[x]un

where k denotes the smallest integer such that dk(x) 6= 1. Furthermore, the annihilator of uj is generated by dj(x) for j = k, . . . , n.

Proof. Proposition 8.1.2 tells us that the function
\[
\psi : K[x]^n \longrightarrow V, \qquad
\begin{pmatrix} p_1(x) \\ \vdots \\ p_n(x) \end{pmatrix} \longmapsto p_1(x)v_1 + \cdots + p_n(x)v_n,
\]
is a surjective homomorphism of K[x]-modules, whose kernel is
\[
\ker(\psi) = \mathrm{Col}_{K[x]}(xI - [\,T\,]_B).
\]
Algorithm 8.6.8 allows us to construct invertible matrices P, Q ∈ Mat_{n×n}(K[x]) such that
\[
P\,(xI - [\,T\,]_B)\,Q =
\begin{pmatrix}
d_1(x) & & & \\
& \ddots & & 0 \\
& & d_r(x) & \\
0 & & & 0
\end{pmatrix}
\]
for an integer r ≥ 0 and nonzero polynomials d_1(x), ..., d_r(x) ∈ K[x] with d_1(x) | · · · | d_r(x). Since the determinant of the left side of this equality is nonzero, we must have r = n. By changing P or Q, we can make d_1(x), ..., d_n(x) monic. This proves the first assertion of the theorem.

Algorithm 8.6.12 tells us that the columns of P^{-1} form a basis {C_1, ..., C_n} of K[x]^n such that {d_1(x)C_1, ..., d_n(x)C_n} is a basis of ker(ψ). By construction, we also have

uj = ψ(Cj) for j = 1, . . . , n. The conclusion now follows by retracing the proof of Theorem 7.2.1 given in Section 8.1 (with A = K[x]).

This result provides a new proof of the Cayley-Hamilton Theorem. Indeed, we deduce:

Corollary 8.6.15. In the notation of Theorem 8.6.14, we have

charT (x) = d1(x)d2(x) ··· dn(x)

and dn(x) is the minimal polynomial of T . Furthermore, we have charT (T ) = 0.

Proof. Taking the determinant of both sides of (8.16), we have

det(P ) charT (x) det(Q) = d1(x) ··· dn(x).

Since P and Q are invertible elements of Matn×n(K[x]), their determinants are elements of K[x]× = K×, hence the above equality can be rewritten

charT (x) = a d1(x) ··· dn(x)

for some constant a ∈ K^×. Since the polynomial char_T(x) and the product d_1(x) ··· d_n(x) are both monic, we must have a = 1. Now, Theorem 7.2.4 tells us that d_n(x) is the minimal polynomial of T. We deduce that char_T(T) = d_1(T) ∘ ··· ∘ d_n(T) = 0.

Applying Theorem 7.2.5, we also obtain:

Corollary 8.6.16. With the notation of Theorem 8.6.14, let m_i = deg(d_i(x)) for i = k, ..., n. Then
\[
C = \{\, u_k,\, T(u_k),\, \ldots,\, T^{m_k-1}(u_k),\ \ldots\ldots,\ u_n,\, T(u_n),\, \ldots,\, T^{m_n-1}(u_n) \,\}
\]
is a basis of V such that
\[
[\,T\,]_C = \begin{pmatrix} D_k & & 0 \\ & \ddots & \\ 0 & & D_n \end{pmatrix},
\]
where D_i denotes the companion matrix of d_i(x) for i = k, ..., n.

Example 8.6.17. Let V be a vector space over Q of dimension 3, let B = {v1, v2, v3} be a basis of V , and let T : V → V be a linear operator for which

 0 −1 −1  [ T ]B =  −3 2 3  . 4 −4 −5

Find a basis C of V such that [ T ]C is in rational canonical form.

Solution: We first apply Algorithm 8.6.8 to the matrix

\[
U = xI - [\,T\,]_B = \begin{pmatrix} x & 1 & 1 \\ 3 & x-2 & -3 \\ -4 & 4 & x+5 \end{pmatrix}.
\]

Since we only need the matrix P , we work with the table (U | I). Retracing the calculations of Example 8.5.7, we find

\[
(U \mid I)
\underset{C_1 \leftrightarrow C_2}{\sim}
\left(\begin{array}{ccc|ccc}
1 & x & 1 & 1 & 0 & 0 \\
x-2 & 3 & -3 & 0 & 1 & 0 \\
4 & -4 & x+5 & 0 & 0 & 1
\end{array}\right)
\underset{\substack{C_2 - xC_1 \\ C_3 - C_1}}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
x-2 & -x^2+2x+3 & -x-1 & 0 & 1 & 0 \\
4 & -4x-4 & x+1 & 0 & 0 & 1
\end{array}\right)
\]
\[
\underset{\substack{R_2 - (x-2)R_1 \\ R_3 - 4R_1}}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
0 & -x^2+2x+3 & -x-1 & -x+2 & 1 & 0 \\
0 & -4x-4 & x+1 & -4 & 0 & 1
\end{array}\right)
\underset{C_2 \leftrightarrow C_3}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
0 & -x-1 & -x^2+2x+3 & -x+2 & 1 & 0 \\
0 & x+1 & -4x-4 & -4 & 0 & 1
\end{array}\right)
\]
\[
\underset{\substack{R_3 + R_2 \\ \text{then } -R_2}}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
0 & x+1 & x^2-2x-3 & x-2 & -1 & 0 \\
0 & 0 & -x^2-2x-1 & -x-2 & 1 & 1
\end{array}\right)
\underset{C_3 - (x-3)C_2}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
0 & x+1 & 0 & x-2 & -1 & 0 \\
0 & 0 & -(x+1)^2 & -x-2 & 1 & 1
\end{array}\right)
\]
\[
\underset{-R_3}{\sim}
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & 1 & 0 & 0 \\
0 & x+1 & 0 & x-2 & -1 & 0 \\
0 & 0 & (x+1)^2 & x+2 & -1 & -1
\end{array}\right).
\]

Thus the invariant factors are 1, x + 1 and (x + 1)^2. We see that

 1 0 0 −1  1 0 0  −1 P =  x − 2 −1 0  =  x − 2 −1 0  x + 2 −1 −1 4 1 −1

(the verification of this calculation is left as an exercise). We deduce that

V = Q[x]u2 ⊕ Q[x]u3 where u2 = −v2 + v3 and u3 = −v3.

Furthermore, the annihilator of u_2 is generated by x + 1 and that of u_3 is generated by (x + 1)^2 = x^2 + 2x + 1. We conclude that

C = {u2, u3,T (u3)} = {−v2 + v3, −v3, v1 − 3v2 + 5v3}

is a basis of V such that
\[
[\,T\,]_C = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & -2 \end{pmatrix},
\]
where, on the diagonal, we recognize the companion matrices of x + 1 and x^2 + 2x + 1.

Note. The first column of P^{-1} gives

u_1 = 1v_1 + (x − 2)v_2 + 4v_3 = v_1 + T(v_2) − 2v_2 + 4v_3 = 0

as it should, since the annihilator of u_1 is generated by d_1(x) = 1.
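One can also check this example numerically (a hedged verification of ours, not part of the notes): the matrix M whose columns are the B-coordinates of the vectors of C conjugates [T]_B into [T]_C.

```python
# Hedged numerical check of Example 8.6.17.
import numpy as np

A = np.array([[0, -1, -1],
              [-3, 2, 3],
              [4, -4, -5]])       # [T]_B
M = np.array([[0, 0, 1],
              [-1, 0, -3],
              [1, -1, 5]])        # columns: u2, u3, T(u3) in B-coordinates
print(np.linalg.inv(M) @ A @ M)   # the rational canonical form [T]_C above
```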

Exercises.

In all the exercises below, A denotes a euclidean domain. 8.6.1. Prove Lemma 8.6.2.

8.6.2. Let P ∈ Matn×n(A).

(i) Give an algorithm for transforming P into an upper triangular matrix T by a series of elementary row operations.

(ii) Show that P is invertible if and only if the diagonal elements of T are units of A.

(iii) If the diagonal elements of T are units, give an algorithm for transforming T into In by a series of elementary row operations.

(iv) Conclude that, if P is invertible, then P can be transformed into the matrix I_n by a series of elementary row operations.

Hint. For part (ii), show that P is invertible if and only if T is invertible and use the fact that a matrix in Mat_{n×n}(A) is invertible if and only if its determinant is a unit of A (see Appendix B).

8.6.3. Problem 8.6.2 shows that, in the statement of Theorem 8.6.10, the condition (i) implies condition (ii). Complete the proof of this theorem by proving the implications (ii) ⇒ (iii) and (iii) ⇒ (i), and then proving the last assertion of the theorem.

8.6.4. For each matrix U of Exercise 8.5.2, find invertible matrices P and Q such that PUQ is in Smith normal form.

8.6.5. In each case below, find a basis {C_1, ..., C_n} of A^n and nonzero elements d_1, ..., d_r of A with d_1 | d_2 | · · · | d_r such that {d_1C_1, ..., d_rC_r} is a basis of the given submodule N of A^n.

(i) A = Q[x], $N = \left\langle \begin{pmatrix} x \\ x+1 \\ x \end{pmatrix}, \begin{pmatrix} 0 \\ x \\ x^2 \end{pmatrix} \right\rangle_{\mathbb{Q}[x]}$

(ii) A = R[x], $N = \left\langle \begin{pmatrix} x+1 \\ -1 \\ -2 \end{pmatrix}, \begin{pmatrix} -4 \\ x+1 \\ 4 \end{pmatrix}, \begin{pmatrix} 4 \\ -2 \\ x-5 \end{pmatrix} \right\rangle_{\mathbb{R}[x]}$

(iii) A = Z, $N = \left\langle \begin{pmatrix} 4 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 10 \end{pmatrix}, \begin{pmatrix} 2 \\ 8 \end{pmatrix} \right\rangle_{\mathbb{Z}}$

* 4  0  4 + (iv) A = Z, N = 12 , 8 ,  8  0 4 −2 Z

8.6.6. Let V be a finite-dimensional vector space over a field K, and let T : V → V be a linear map. Show that minT (x) and charT (x) have the same irreducible factors in K[x]. Hint. Use Corollary 8.6.15.

3 3 8.6.7. Let T : Q → Q be a linear map. Suppose that charT (x) admits an irreducible factor of degree ≥ 2 in Q[x]. Show that Q3 is a cyclic Q[x]-module.

8.6.8. Let V be a vector space of dimension 2 over Q with basis B = {v_1, v_2}, and let T : V → V be a linear map with $[\,T\,]_B = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix}$.

(i) Find charT (x) and minT (x).

(ii) Find a basis C of V such that [ T ]C is block diagonal with companion matrices of polynomials on the diagonal.

 4 2 2  8.6.9. Let A =  1 3 1  ∈ Mat3×3(Q). −3 −3 −1

(i) Find char_A(x) and min_A(x).

(ii) Find a matrix D that is similar to A and that is block diagonal with companion matrices of polynomials on the diagonal.

Chapter 9

Duality and the tensor product

In this chapter, we present some linear algebraic constructions that play an important role in numerous areas of mathematics, as much in analysis as in algebra.

9.1 Duality

Definition 9.1.1. Let V be a vector space over a field K. The dual of V is the vector space L_K(V, K) of linear maps from V to K. We denote it by V^*. An element of V^* is called a linear form on V.

Example 9.1.2. Let X be a set and let F(X, K) be the vector space of functions g : X → K. For all x ∈ X, the function

Ex : F(X,K) −→ K g 7−→ g(x)

called evaluation at the point x, is linear; thus E_x ∈ F(X, K)^*.

Proposition 9.1.3. Suppose that V is finite dimensional over K and that B = {v_1, ..., v_n} is a basis of V. For each i = 1, ..., n, we denote by f_i : V → K the linear map such that
\[
f_i(v_j) = \delta_{ij} := \begin{cases} 0 & \text{if } j \neq i, \\ 1 & \text{if } j = i. \end{cases}
\]

Then B^* = {f_1, ..., f_n} is a basis of V^*.

We say that B∗ is the basis of V ∗ dual to B.

Proof. We know that E = {1} is a basis of K viewed as a vector space over itself. Then Theorem 2.4.3 gives us an isomorphism
\[
L_K(V, K) \xrightarrow{\ \sim\ } \mathrm{Mat}_{1\times n}(K), \qquad f \longmapsto [\,f\,]_E^B.
\]


For i = 1, . . . , n, we see that

\[
[\,f_i\,]_E^B = \big( [f_i(v_1)]_E \ \cdots\ [f_i(v_n)]_E \big) = (0, \ldots, 1, \ldots, 0) = e_i^t,
\]
with the 1 in position i.

Since {e_1^t, ..., e_n^t} is a basis of Mat_{1×n}(K), we deduce that {f_1, ..., f_n} is a basis of V^*.

Proposition 9.1.4. Let V and W be vector spaces over K and let T : V → W be a linear map. For each linear form g : W → K, the composition g ∘ T : V → K is a linear form on V. Furthermore, the function

\[
{}^tT : W^* \longrightarrow V^*, \qquad g \longmapsto g \circ T,
\]
is linear.

We say that tT is the transpose of T .

Proof. The first assertion follows from the fact that the composition of two linear maps is linear. The fact that tT is linear follows from Proposition 2.3.1.

Example 9.1.5. Let X and Y be two sets, and let h: X → Y be a function. One can verify that the map T : F(Y,K) −→ F(X,K) g 7−→ g ◦ h

is linear (exercise). In this context, we determine tT(E_x), where E_x ∈ F(X, K)^* denotes the evaluation at a point x of X (see Example 9.1.2).

For all g ∈ F(Y,K), we have

\[
({}^tT(E_x))(g) = (E_x \circ T)(g) = E_x(T(g)) = E_x(g \circ h) = (g \circ h)(x) = g(h(x)) = E_{h(x)}(g),
\]

hence tT(E_x) = E_{h(x)}.

Proposition 9.1.6. Let U, V and W be vector spaces over K.

(i) If S,T ∈ LK (V,W ) and c ∈ K, then

t(S + T ) = tS + tT and t(cT ) = c tT.

(ii) If S ∈ LK (U, V ) and T ∈ LK (V,W ), then

t(T ◦ S) = tS ◦ tT.

The proof of this result is left as an exercise. Part (i) implies:

Corollary 9.1.7. Let V and W be vector spaces over K. The function

\[
L_K(V, W) \longrightarrow L_K(W^*, V^*), \qquad T \longmapsto {}^tT,
\]
is linear.

Theorem 9.1.8. Let V and W be finite-dimensional vector spaces over K, let

B = {v1,..., vn} and D = {w1,..., wm} be bases, respectively, of V and W , and let

B^* = {f_1, ..., f_n} and D^* = {g_1, ..., g_m} be the bases of V^* and W^* dual, respectively, to B and D. For all T ∈ L_K(V, W), we have

∗ ∗ be the bases of V and W dual, respectively, to B and D. For all T ∈ LK (V,W ), we have

\[
[\,{}^tT\,]_{B^*}^{D^*} = \big([\,T\,]_D^B\big)^t.
\]

In other words, the matrix of the transpose tT of T relative to the dual bases D∗ and B∗ is the transpose of the matrix of T relative to the bases B and D.

Proof. Let T ∈ L_K(V, W). Write [\,T\,]_D^B = (a_{ij}). By definition, we have

T (vj) = a1jw1 + ··· + amjwm (j = 1, . . . , n).

We know that tT ∈ L_K(W^*, V^*) and we want to find

\[
[\,{}^tT\,]_{B^*}^{D^*} = \big( [\,{}^tT(g_1)\,]_{B^*} \ \cdots\ [\,{}^tT(g_m)\,]_{B^*} \big).
\]

Fix an index j ∈ {1, . . . , m}. For all i = 1, . . . , n, we have

\[
({}^tT(g_j))(v_i) = (g_j \circ T)(v_i) = g_j(T(v_i)) = g_j\Big(\sum_{k=1}^m a_{ki}w_k\Big) = a_{ji}.
\]
We deduce that
\[
{}^tT(g_j) = \sum_{i=1}^n a_{ji} f_i
\]
(see Exercise 9.1.1). Thus, [{}^tT(g_j)]_{B^*} is the transpose of row j of [\,T\,]_D^B. This gives
\[
[\,{}^tT\,]_{B^*}^{D^*} = \begin{pmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{mn} \end{pmatrix} = \big([\,T\,]_D^B\big)^t.
\]
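Concretely (a hedged illustration of ours, for V = R^3 and W = R^2 with their standard bases and dual bases): if T has matrix A, then tT acts on the coordinate vector of a linear form by the transpose of A.

```python
# Hedged illustration of Theorem 9.1.8 with standard bases of R^3 and R^2.
import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])   # [T]^B_D for some T: R^3 -> R^2
g = np.array([5., -2.])         # coordinates of a linear form g in D*
# (g o T)(v) = g . (A v) = (A^t g) . v, so the coordinates of tT(g) in B* are:
print(A.T @ g)
```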

Definition 9.1.9 (Double dual). The double dual of a vector space V is the dual of V ∗. We denote it V ∗∗.

We thus have V^{**} = L_K(V^*, K). The following result is particularly important.

Theorem 9.1.10. Let V be a vector space over K.

(i) For all v ∈ V , the function

\[
E_v : V^* \longrightarrow K, \qquad f \longmapsto f(v),
\]

is linear and so E_v ∈ V^{**}.

(ii) The function
\[
\varphi : V \longrightarrow V^{**}, \qquad v \longmapsto E_v,
\]
is also linear.

(iii) If V is finite-dimensional, then ϕ is an isomorphism.

Proof. (i) Let v ∈ V . For all f, g ∈ V ∗ and c ∈ K, we have

Ev(f + g) = (f + g)(v) = f(v) + g(v) = Ev(f) + Ev(g) ,

Ev(c f) = (c f)(v) = c f(v) = c Ev(f) .

(ii) Let u, v ∈ V and c ∈ K. For all f ∈ V ∗, we have

Eu+v(f) = f(u + v) = f(u) + f(v) = Eu(f) + Ev(f) = (Eu + Ev)(f) ,

Ec v(f) = f(c v) = c f(v) = c Ev(f) = (c Ev)(f) ,

hence Eu+v = Eu + Ev and Ec v = c Ev. This shows that ϕ is linear.

(iii) Suppose that V is finite-dimensional. Let B = {v_1, ..., v_n} be a basis of V and let B^* = {f_1, ..., f_n} be the basis of V^* dual to B. We have

Evj (fi) = fi(vj) = δij (1 ≤ i, j ≤ n).

Thus {E_{v_1}, ..., E_{v_n}} is the basis of (V^*)^* = V^{**} dual to B^*. Since ϕ maps B to this basis, the linear map ϕ is an isomorphism.

It can be shown that the map ϕ defined in Theorem 9.1.10 is always injective (even if V is infinite-dimensional). It is an isomorphism if and only if V is finite-dimensional.

Exercises.

9.1.1. Let V be a finite-dimensional vector space over a field K, let B = {v_1, ..., v_n} be a basis of V, and let B^* = {f_1, ..., f_n} be the basis of V^* dual to B. Show that, for all f ∈ V^*, we have f = f(v_1)f_1 + ··· + f(v_n)f_n.

9.1.2. Prove Proposition 9.1.6.

t t t 3 9.1.3. Consider the vectors v1 = (1, 0, 1) , v2 = (0, 1, −2) and v3 = (−1, −1, 0) of R .

(i) If f is a linear form on R3 such that

f(v1) = 1, f(v2) = −1, f(v3) = 3,

and if v = (a, b, c)t, determine f(v).

3 (ii) Give an explicit linear form f on R such that f(v1) = f(v2) = 0 but f(v3) 6= 0.

3 (iii) Let f be a linear form on R such that f(v1) = f(v2) = 0 and f(v3) 6= 0. Show that, for v = (2, 3, −1)t, we have f(v) 6= 0.

t t t 3 9.1.4. Consider the vectors v1 = (1, 0, −i) , v2 = (1, 1, 1) and v3 = (i, i, 0) of C . Show 3 ∗ 3 ∗ that B = {v1, v2, v3} is a basis of C and determine the basis B of (C ) dual to B.

9.1.5. Let V = ⟨1, x, x^2⟩_R be the vector space of polynomial functions p : R → R of degree at most 2. We define three linear forms f_1, f_2, f_3 on V by setting

\[
f_1(p) = \int_0^1 p(x)\,dx, \qquad f_2(p) = \int_0^2 p(x)\,dx, \qquad f_3(p) = \int_{-1}^0 p(x)\,dx.
\]

Show that B = {f_1, f_2, f_3} is a basis of V^* and find its dual basis in V.

9.1.6. Let m and n be positive integers, let K be a field, and let f1, . . . , fm be linear forms on Kn.

(i) Verify that the function T : K^n → K^m given by
\[
T(v) = \begin{pmatrix} f_1(v) \\ \vdots \\ f_m(v) \end{pmatrix} \qquad \text{for all } v \in K^n,
\]

is a linear map.

(ii) Show that every linear map from K^n to K^m can be written in this way for some choice of f_1, ..., f_m ∈ (K^n)^*.

9.1.7. Let n be a positive integer and let K be a field. Set V = Mat_{n×n}(K). Recall that the trace of a matrix A = (a_{i,j}) ∈ V is the sum of the elements of its diagonal: trace(A) = \sum_{i=1}^n a_{i,i}.

(i) Verify that the function trace: V → K is a linear form on V .

(ii) Let B ∈ V . Show that the map ϕB : V → K given by ϕB(A) = trace(AB) for all A ∈ V is linear.

(iii) Show that the function Φ : V → V^* given by Φ(B) = ϕ_B for all B ∈ V is linear.

(iv) Show that ker Φ = {0} and conclude that Φ is an isomorphism.

9.2 Bilinear maps

Let U, V and W be vector spaces over a field K.

Definition 9.2.1 (Bilinear map, bilinear form). A bilinear map from U × V to W is a function B : U × V → W that satisfies

(i) B(u, v1 + v2) = B(u, v1) + B(u, v2),

(ii) B(u1 + u2, v) = B(u1, v) + B(u2, v), (iii) B(c u, v) = B(u, c v) = c B(u, v)

for every choice of u, u1, u2 ∈ U, of v, v1, v2 ∈ V and of c ∈ K. A bilinear form on U × V is a bilinear map from U × V to K.

Therefore, a bilinear map from U × V to W is a function B : U × V → W such that:

1) for all u ∈ U, the function V → W, v ↦ B(u, v), is linear;

2) for all v ∈ V, the function U → W, u ↦ B(u, v), is linear.

We express condition 1) by saying that B is right linear and condition 2) by saying that B is left linear.

Example 9.2.2. The function
\[
B : V \times V^* \longrightarrow K, \qquad (v, f) \longmapsto f(v),
\]

is a bilinear form on V × V^* since, for every choice of v, v_1, v_2 ∈ V, of f, f_1, f_2 ∈ V^* and of c ∈ K, we have

(i)( f1 + f2)(v) = f1(v) + f2(v),

(ii) f(v1 + v2) = f(v1) + f(v2),

(iii) f(c v) = c f(v) = (c f)(v).

Example 9.2.3. Proposition 2.3.1 shows that the function

\[
L_K(U, V) \times L_K(V, W) \longrightarrow L_K(U, W), \qquad (S, T) \longmapsto T \circ S,
\]
is a bilinear map.

Definition 9.2.4. We denote by L_K^2(U, V; W) the set of bilinear maps from U × V to W.

The following result provides an analogue of Theorem 2.2.3 for bilinear maps.

Theorem 9.2.5. For all B_1, B_2 ∈ L_K^2(U, V; W) and c ∈ K, the functions

\[
B_1 + B_2 : U \times V \longrightarrow W, \quad (u, v) \longmapsto B_1(u, v) + B_2(u, v),
\qquad\text{and}\qquad
cB_1 : U \times V \longrightarrow W, \quad (u, v) \longmapsto c\,B_1(u, v),
\]

are bilinear. The set L_K^2(U, V; W) equipped with the addition and scalar multiplication defined above is a vector space over K.

The proof of the first assertion is similar to that of Proposition 2.2.1 while the second assertion can be proved by following the model of the proof of Theorem 2.2.3. These proofs are left as exercises.

In Exercise 9.2.3, an outline is given of a proof that we have a natural isomorphism:

\[
L_K^2(U, V; W) \xrightarrow{\ \sim\ } L_K\big(U, L_K(V, W)\big), \qquad B \longmapsto \big(u \mapsto (v \mapsto B(u, v))\big).
\]

We conclude this section with the following results:

Theorem 9.2.6. Suppose that U and V admit bases

A = {u1,..., um} and B = {v1,..., vn} respectively. Then, for all wij ∈ W (1 ≤ i ≤ m, 1 ≤ j ≤ n), there exists a unique bilinear map B : U × V → W satisfying

B(ui, vj) = wij (1 ≤ i ≤ m, 1 ≤ j ≤ n). (9.1)

Proof. Let w_{ij} ∈ W (1 ≤ i ≤ m, 1 ≤ j ≤ n).

1st: Uniqueness. If there is a bilinear map B : U × V → W satisfying (9.1), then
\[
B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_jv_j\Big)
= \sum_{i=1}^m a_i\,B\Big(u_i,\ \sum_{j=1}^n b_jv_j\Big)
= \sum_{i=1}^m a_i \sum_{j=1}^n b_j\,B(u_i, v_j)
= \sum_{i=1}^m\sum_{j=1}^n a_i b_j\,w_{ij}
\]

for all a1, . . . , am, b1, . . . , bn ∈ K. This proves that there exists at most one such bilinear map.

2nd: Existence. Consider the function B : U × V → W defined by
\[
B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_jv_j\Big) = \sum_{i=1}^m\sum_{j=1}^n a_ib_j\,w_{ij}.
\]
It satisfies conditions (9.1). It remains to prove that it is bilinear. We have
\[
\begin{aligned}
B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_jv_j + \sum_{j=1}^n b_j'v_j\Big)
&= B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n (b_j+b_j')v_j\Big)
= \sum_{i=1}^m\sum_{j=1}^n a_i(b_j+b_j')w_{ij}\\
&= \sum_{i=1}^m\sum_{j=1}^n a_ib_jw_{ij} + \sum_{i=1}^m\sum_{j=1}^n a_ib_j'w_{ij}
= B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_jv_j\Big) + B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_j'v_j\Big),
\end{aligned}
\]

for all a_1, ..., a_m, b_1, ..., b_n, b_1′, ..., b_n′ ∈ K. We also verify that

\[
B\Big(\sum_{i=1}^m a_iu_i,\ c\sum_{j=1}^n b_jv_j\Big) = c\,B\Big(\sum_{i=1}^m a_iu_i,\ \sum_{j=1}^n b_jv_j\Big)
\]
for all c ∈ K. Thus B is right linear. We check similarly that B is left linear. Therefore, B is bilinear.

Theorem 9.2.7. Suppose that U and V admit bases

A = {u1,..., um} and B = {v1,..., vn}

respectively. Let B : U × V → K be a bilinear form on U × V . Then there exists a unique matrix M ∈ Matm×n(K) such that

\[
B(u, v) = [u]_A^t\, M\, [v]_B \tag{9.2}
\]

for all u ∈ U and all v ∈ V . The element in row i and column j of M is B(ui, vj).

The matrix M is called the matrix of the bilinear form B relative to the bases A of U and B of V .

Proof. 1st: Existence. Let u ∈ U and v ∈ V. They can be written in the form
\[
u = \sum_{i=1}^m a_iu_i \qquad\text{and}\qquad v = \sum_{j=1}^n b_jv_j
\]
with a_1, ..., a_m, b_1, ..., b_n ∈ K. We have
\[
B(u, v) = \sum_{i=1}^m\sum_{j=1}^n a_ib_j\,B(u_i, v_j)
= (a_1, \ldots, a_m)
\begin{pmatrix}
B(u_1, v_1) & \cdots & B(u_1, v_n) \\
\vdots & & \vdots \\
B(u_m, v_1) & \cdots & B(u_m, v_n)
\end{pmatrix}
\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}
= [u]_A^t\, \big(B(u_i, v_j)\big)\, [v]_B.
\]

2nd: Uniqueness. Suppose that M ∈ Mat_{m×n}(K) satisfies condition (9.2) for all u ∈ U and all v ∈ V. For i = 1, ..., m and j = 1, ..., n, we have
\[
B(u_i, v_j) = [u_i]_A^t\, M\, [v_j]_B = e_i^t\, M\, e_j = \text{the element in position } (i, j) \text{ of } M.
\]

Example 9.2.8. Suppose that B = {v_1, ..., v_n} is a basis of V. Let B^* = {f_1, ..., f_n} be the dual basis of V^*. We calculate the matrix of the bilinear form
\[
B : V \times V^* \longrightarrow K, \qquad (v, f) \longmapsto f(v),
\]
relative to these bases.

Solution: We have B(vi, fj) = fj(vi) = δij for 1 ≤ i, j ≤ n, hence the matrix of B is (δij) = In, the n × n identity matrix.
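Theorem 9.2.7 is easy to see in coordinates; here is a hedged numerical illustration of ours with U = V = R^2 and the standard bases (the matrix M is arbitrary):

```python
# Hedged illustration of Theorem 9.2.7: B(u, v) = [u]^t M [v].
import numpy as np

M = np.array([[1., 2.],
              [0., -3.]])   # M_{ij} = B(e_i, e_j) for some bilinear form B
def B(u, v):
    return u @ M @ v        # the bilinear form represented by M

u = np.array([1., 4.])
v = np.array([2., -1.])
print(B(u, v))              # equals sum over i, j of u_i v_j B(e_i, e_j)
```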

Exercises.

9.2.1. Let V and W be vector spaces over a field K and let v ∈ V . Show that the map

B : V ∗ × W −→ W (f, w) 7−→ f(v)w

is bilinear.

9.2.2. Let T , U, V and W be vector spaces over a field K, let B : U × V → T be a bilinear map and let L: T → W be a linear map. Show that the composite

L ◦ B : U × V −→ W

is a bilinear map.

2 9.2.3. Let U, V, W be vector spaces over a field K. For all B ∈ LK (U, V ; W ) and all u ∈ U, we denote by Bu : V → W the linear map given by

Bu(v) = B(u, v)(v ∈ V ).

2 (i) Let B ∈ LK (U, V ; W ). Show that the map

ϕB : U −→ LK (V ; W ) u 7−→ Bu

is linear.

(ii) Show that the map

2 ϕ : LK (U, V ; W ) −→ LK (U, LK (V ; W )) B 7−→ ϕB

is an isomorphism of vector spaces over K.

9.2.4. Let {e_1, e_2} be the standard basis of R^2 and {e_1′, e_2′, e_3′} be the standard basis of R^3. Give a formula for f((x_1, x_2, x_3), (y_1, y_2)) where f : R^3 × R^2 → R denotes the bilinear map determined by the conditions
\[
f(e_1', e_1) = 1, \quad f(e_1', e_2) = 0, \quad f(e_2', e_1) = 3, \quad
f(e_2', e_2) = -1, \quad f(e_3', e_1) = 4, \quad f(e_3', e_2) = -2.
\]
Deduce the value of f(u, v) if u = (1, 2, 1)^t and v = (1, 0)^t.

9.2.5. Let U, V and W be vector spaces over a field K, let Bi : U × V → W (i = 1, 2) be bilinear maps, and let c ∈ K. Show that the maps

B1 + B2 : U × V −→ W and cB1 : U × V −→ W (u, v) 7−→ B1(u, v) + B2(u, v)(u, v) 7−→ cB1(u, v)

2 are bilinear. Then show that LK (U, V ; W ) is a vector space for these operations (since this verification is a bit long, you can simply show the existence of an additive identity, that every element has an additive inverse and that one of the two axioms of distributivity holds).

9.2.6. Let V be a vector space of finite dimension n over a field K, and let B = {v1,..., vn} be a basis of V . For every pair of integers i, j ∈ {1, . . . , n}, we denote by gi,j : V × V → K the bilinear form determined by the conditions ( 1 if (k, `) = (i, j), gi,j(vk, v`) = 0 otherwise.

2 Show that B = { gi,j ; 1 ≤ i, j ≤ n } is a basis of the vector space Bil(V,K) := LK (V,V ; K) of all bilinear maps from V × V to K.

9.2.7. Let V be a vector space over a field K of characteristic different from 2. We say that a bilinear form f : V × V → K is symmetric if f(u, v) = f(v, u) for all u ∈ V and v ∈ V . We say that it is antisymmetric if f(u, v) = −f(v, u) for all u ∈ V and v ∈ V .

(i) Show that the set S of symmetric bilinear forms on V ×V is a subspace of Bil(V,K) := 2 LK (V,V ; K). (ii) Show that the set A of antisymmetric bilinear forms on V × V is also a subspace of Bil(V,K). (iii) Show that Bil(V,K) = S ⊕ A.

9.2.8. Find a basis of the space of symmetric bilinear forms from R2 × R2 to R and a basis of the space of antisymmetric bilinear forms from R2 × R2 to R.

9.2.9. Let C(R) be the vector space of continuous functions from R to R, and let V be the subspace of C(R) generated (over R) by the functions 1, sin(x) and cos(x). Show that the function
\[
B : V \times V \longrightarrow \mathbb{R}, \qquad (f, g) \longmapsto \int_0^\pi f(t)\,g(\pi - t)\,dt,
\]
is a bilinear form and find its matrix with respect to the basis A = {1, sin(x), cos(x)} of V.

9.2.10. Let V = ⟨1, x, ..., x^n⟩_R be the vector space of polynomial functions p : R → R of degree at most n. Find the matrix of the bilinear map
\[
B : V \times V \longrightarrow \mathbb{R}, \qquad (f, g) \longmapsto \int_0^1 f(t)\,g(t)\,dt,
\]
with respect to the basis A = {1, x, ..., x^n} of V.

9.3 Tensor product

Throughout this section, we fix two vector spaces U and V of finite dimension over a field K.

If T is a vector space, we can consider a bilinear map from U × V to T as a product

\[
U \times V \longrightarrow T, \qquad (u, v) \longmapsto u \otimes v, \tag{9.3}
\]
with values in T. To say that this map is bilinear means that the product in question satisfies:

(u1 + u2) ⊗ v = u1 ⊗ v + u2 ⊗ v,

u ⊗ (v1 + v2) = u ⊗ v1 + u ⊗ v2, (c u) ⊗ v = u ⊗ (c v) = c u ⊗ v, for all u, u1, u2 ∈ U, all v, v1, v2 ∈ V and all c ∈ K.

Given a bilinear map as above in (9.3), we easily check that, for every linear map L: T → W , the composite B : U × V → W given by

B(u, v) = L(u ⊗ v) for all (u, v) ∈ U × V (9.4) is a bilinear map (see Exercise 9.2.2).

Our goal is to construct a bilinear map (9.3) that is universal in the sense that every bilinear map B : U × V → W can be written in the form (9.4) for a unique choice of linear map L : T → W. A bilinear map (9.3) with this property is called a tensor product of U and V. The following definition summarizes this discussion.

Definition 9.3.1 (Tensor product). A tensor product of U and V is the data of a vector space T over K and a bilinear map
\[
U \times V \longrightarrow T, \qquad (u, v) \longmapsto u \otimes v,
\]
possessing the following property: for every vector space W and every bilinear map B : U × V → W, there exists a unique linear map L : T → W such that B(u, v) = L(u ⊗ v) for all (u, v) ∈ U × V.

This condition can be represented by the following diagram:

\[
\begin{array}{ccc}
U \times V & \overset{B}{\longrightarrow} & W \\
{\scriptstyle\otimes}\,\big\downarrow & \nearrow_{\,L} & \\
T & &
\end{array}
\tag{9.5}
\]

In this diagram, the vertical arrow represents the tensor product, the horizontal arrow repre- sents an arbitrary bilinear map, and the dotted diagonal arrow represents the unique linear map L that, by composition with the tensor product, recovers B. For such a choice of L, the horizontal arrow B is the composition of the two other arrows and we say that the diagram (9.5) is commutative. The following proposition gives the existence of a tensor product of U and V .

Proposition 9.3.2. Let {u1,..., um} be a basis of U, let {v1,..., vn} be a basis of V , and let {tij ; 1 ≤ i ≤ m, 1 ≤ j ≤ n} be a basis of a vector space T of dimension mn over K. The bilinear map (9.3) determined by the conditions

ui ⊗ vj = tij (1 ≤ i ≤ m, 1 ≤ j ≤ n) (9.6) is a tensor product of U and V .

Proof. Theorem 9.2.6 shows that there exists a bilinear map (u, v) ↦ u ⊗ v from U × V to T satisfying (9.6). It remains to show that it satisfies the universal property required by Definition 9.3.1.

To do this, we fix an arbitrary bilinear map B : U × V → W . Since the set {tij ; 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of T , there exists a unique linear map L: T → W such that

L(t_{ij}) = B(u_i, v_j) (1 ≤ i ≤ m, 1 ≤ j ≤ n). (9.7)

Then the map B′ : U × V → W given by B′(u, v) = L(u ⊗ v) for all (u, v) ∈ U × V is bilinear and satisfies

B′(u_i, v_j) = L(u_i ⊗ v_j) = L(t_{ij}) = B(u_i, v_j) (1 ≤ i ≤ m, 1 ≤ j ≤ n).

By Theorem 9.2.6, this implies that B′ = B. Thus the map L satisfies the condition B(u, v) = L(u ⊗ v) for all (u, v) ∈ U × V. Finally, L is the only linear map from T to W satisfying this condition, since the latter implies (9.7), which determines L uniquely.

The proof given above uses in a crucial way the hypothesis that U and V are finite-dimensional over K. It is possible to prove the existence of the tensor product in greater generality, without this hypothesis, but we will not do so here. We next observe that the tensor product is unique up to isomorphism.

Proposition 9.3.3. Suppose that
\[
U \times V \longrightarrow T, \quad (u, v) \longmapsto u \otimes v
\qquad\text{and}\qquad
U \times V \longrightarrow T', \quad (u, v) \longmapsto u \otimes' v
\]
are two tensor products of U and V. Then there exists an isomorphism L : T → T′ such that L(u ⊗ v) = u ⊗′ v for all (u, v) ∈ U × V.

Proof. Since ⊗′ is bilinear and ⊗ is a tensor product, there exists a linear map L : T → T′ such that
\[
u \otimes' v = L(u \otimes v) \tag{9.8}
\]
for all (u, v) ∈ U × V. It remains to prove that L is an isomorphism. For this, we first note that, symmetrically, since ⊗ is bilinear and ⊗′ is a tensor product, there exists a linear map L′ : T′ → T such that
\[
u \otimes v = L'(u \otimes' v) \tag{9.9}
\]
for all (u, v) ∈ U × V. Combining (9.8) and (9.9), we obtain

(L′ ∘ L)(u ⊗ v) = u ⊗ v and (L ∘ L′)(u ⊗′ v) = u ⊗′ v

for all (u, v) ∈ U × V. Now, the identity maps I : T → T and I′ : T′ → T′ also satisfy

I(u ⊗ v) = u ⊗ v and I′(u ⊗′ v) = u ⊗′ v

for all (u, v) ∈ U × V. Since ⊗ and ⊗′ are tensor products, this implies that L′ ∘ L = I and L ∘ L′ = I′. Thus L is an invertible linear map with inverse L′, and so it is an isomorphism.

Combining Propositions 9.3.2 and 9.3.3, we obtain the following.

Proposition 9.3.4. Let {u1,..., um} be a basis of U, let {v1,..., vn} be a basis of V , and let ⊗: U × V → T be a tensor product of U and V . Then

{ui ⊗ vj ; 1 ≤ i ≤ m, 1 ≤ j ≤ n} (9.10)

is a basis of T .

Proof. Proposition 9.3.2 shows that there exists a tensor product ⊗ : U × V → T such that (9.10) is a basis of T. If ⊗′ : U × V → T′ is another tensor product, then, by Proposition 9.3.3, there exists an isomorphism L : T → T′ such that u ⊗′ v = L(u ⊗ v) for all (u, v) ∈ U × V. Since (9.10) is a basis of T, we deduce that the products

u_i ⊗′ v_j = L(u_i ⊗ v_j) (1 ≤ i ≤ m, 1 ≤ j ≤ n)

form a basis of T′.

The following theorem summarizes the essential results obtained up to now:

Theorem 9.3.5. There exists a tensor product ⊗: U × V → T of U and V . If {u1,..., um} is a basis of U and {v1,..., vn} is a basis of V , then {ui ⊗ vj ; 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis of T .

We often say that T itself is the tensor product of U and V and we denote it by U ⊗V or, more precisely, by U ⊗K V when we want to emphasize the fact that U and V are considered as vector spaces over K. Theorem 9.3.5 gives

dimK (U ⊗K V ) = dimK (U) dimK (V ).

The fact that the tensor product is not unique, but is unique up to isomorphism, does not cause problems in practice, in the sense that the linear dependence relations between the products u ⊗ v with u ∈ U and v ∈ V do not depend on the tensor product ⊗. In fact, if for a given tensor product ⊗ : U × V → T we have
\[
c_1 u_1 \otimes v_1 + c_2 u_2 \otimes v_2 + \cdots + c_s u_s \otimes v_s = 0 \tag{9.11}
\]
with c_1, ..., c_s ∈ K, u_1, ..., u_s ∈ U and v_1, ..., v_s ∈ V, and if ⊗′ : U × V → T′ is another tensor product, then, denoting by L : T → T′ the isomorphism such that L(u ⊗ v) = u ⊗′ v for all (u, v) ∈ U × V and applying L to both sides of (9.11), we get
\[
c_1 u_1 \otimes' v_1 + c_2 u_2 \otimes' v_2 + \cdots + c_s u_s \otimes' v_s = 0.
\]

Example 9.3.6. Let {e_1, e_2} be the standard basis of K^2. A basis of K^2 ⊗_K K^2 is

{e1 ⊗ e1, e1 ⊗ e2, e2 ⊗ e1, e2 ⊗ e2}.

In particular, we have e1 ⊗ e2 − e2 ⊗ e1 6= 0, hence e1 ⊗ e2 6= e2 ⊗ e1. We also find

(e1 + e2) ⊗ (e1 − e2) = e1 ⊗ (e1 − e2) + e2 ⊗ (e1 − e2)

= e1 ⊗ e1 − e1 ⊗ e2 + e2 ⊗ e1 − e2 ⊗ e2.

More generally, we have

\[
(a_1e_1 + a_2e_2) \otimes (b_1e_1 + b_2e_2)
= \Big(\sum_{i=1}^2 a_ie_i\Big) \otimes \Big(\sum_{j=1}^2 b_je_j\Big)
= \sum_{i=1}^2\sum_{j=1}^2 a_ib_j\, e_i \otimes e_j
\]

= a1b1e1 ⊗ e1 + a1b2e1 ⊗ e2 + a2b1e2 ⊗ e1 + a2b2e2 ⊗ e2.

Example 9.3.7. We know that the scalar product on R^n is a bilinear form
\[
\mathbb{R}^n \times \mathbb{R}^n \longrightarrow \mathbb{R}, \qquad (u, v) \longmapsto \langle u, v \rangle.
\]

Thus there exists a unique linear map L : R^n ⊗_R R^n → R such that

hu, vi = L(u ⊗ v)

for all u, v ∈ R^n. Let {e_1, ..., e_n} be the standard basis of R^n. Then the set {e_i ⊗ e_j ; 1 ≤ i ≤ n, 1 ≤ j ≤ n} is a basis of R^n ⊗_R R^n and so L is determined by the conditions

L(ei ⊗ ej) = hei, eji = δij (1 ≤ i ≤ n, 1 ≤ j ≤ n).
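In coordinates, the tensor product of K^m and K^n can be realized concretely: ordering the basis {e_i ⊗ e_j} as in Example 9.3.6, the coordinate vector of u ⊗ v is the Kronecker product of the coordinate vectors of u and v. A hedged numpy illustration (ours):

```python
# u (x) v in the coordinates of the ordered basis {e1(x)e1, e1(x)e2, e2(x)e1, e2(x)e2}.
import numpy as np

u = np.array([1., 1.])    # e1 + e2
v = np.array([1., -1.])   # e1 - e2
print(np.kron(u, v))      # [ 1. -1.  1. -1.], matching Example 9.3.6
```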

Exercises.

2 9.3.1. Let {e1, e2} be the standard basis of R . Show that e1 ⊗e2 +e2 ⊗e1 cannot be written in the form u ⊗ v with u, v ∈ R2.

2 2 2 2 9.3.2. Let {e1, e2} be the standard basis of R , let B : R × R → R be the bilinear map determined by the conditions

B(e1, e1) = (1, 0),B(e1, e2) = (2, 1),B(e2, e1) = (−1, 3) and B(e2, e2) = (0, −1),

2 2 2 2 and let L: R ⊗R R → R be the linear map such that B(u, v) = L(u ⊗ v) for all u, v ∈ R . Calculate

(i) L(e1 ⊗ e2 − e2 ⊗ e1),

(ii) L((e1 + 2e2) ⊗ (−e1 + e2)),

(iii) L(e1 ⊗ e1 + (e1 + e2) ⊗ e2).

9.3.3. Let V be a finite-dimensional vector space over a field K. Show that there exists a unique linear form L : V^* ⊗_K V → K satisfying the condition L(f ⊗ v) = f(v) for all f ∈ V^* and v ∈ V.

9.3.4. Let U and V be finite-dimensional vector spaces over a field K and let {u1,..., um} be a basis of U. Show that every element t of U ⊗K V can be written in the form

t = u1 ⊗ v1 + u2 ⊗ v2 + ··· + um ⊗ vm for a unique choice of vectors v1, v2,..., vm ∈ V .

9.3.5. Let U, V be finite-dimensional vector spaces over a field K.

(i) Let U1 and V1 be subspaces of U and V respectively and let T1 be the subspace of U ⊗K V generated by the products u ⊗ v with u ∈ U1 and v ∈ V1. Show that the map

U1 × V1 −→ T1 (u, v) 7−→ u ⊗ v

is a tensor product of U1 and V1. We can then identify T1 with U1 ⊗K V1.

(ii) Suppose that U = U1 ⊕ U2 and V = V1 ⊕ V2. Show that

U ⊗K V = (U1 ⊗K V1) ⊕ (U2 ⊗K V1) ⊕ (U1 ⊗K V2) ⊕ (U2 ⊗K V2).

9.3.6. Let V be a finite-dimensional vector space over R. Considering C as a vector space over R, we form the tensor product C ⊗R V .

(i) Show that, for all α ∈ C, there exists a unique linear map Lα : C ⊗R V → C ⊗R V satisfying Lα(β ⊗ v) = (αβ) ⊗ v for all β ∈ C and v ∈ V .

(ii) Show that, for all α, α′ ∈ C, we have L_α ∘ L_{α′} = L_{αα′} and L_α + L_{α′} = L_{α+α′}. Deduce that C ⊗_R V is a vector space over C for the external multiplication given by αt = L_α(t) for all α ∈ C and t ∈ C ⊗_R V.

(iii) Show that if {v1,..., vn} is a basis of V as a vector space over R, then {1 ⊗ v1,..., 1 ⊗ vn} is a basis of C ⊗R V as a vector space over C.

We say that C ⊗R V is the complexification of V .

9.3.7. With the notation of the preceding problem, show that every element t of C ⊗R V can be written in the form t = 1 ⊗ u + i ⊗ v for unique u, v ∈ V , and also that every complex number α ∈ C can be written in the form α = a + ib for unique a, b ∈ R. Given α and u as above, determine the decomposition of the product αt.

9.3.8. Let V and W be finite-dimensional vector spaces over a field K.

(i) Show that, for all f ∈ V ∗ and w ∈ W , the map

Lf,w : V −→ W v 7−→ f(v)w

is linear.

(ii) Show also that the map ∗ V × W −→ LK (V,W ) (f, w) 7−→ Lf,w is bilinear.

(iii) Deduce that there exists a unique isomorphism L : V^* ⊗_K W → L_K(V, W) satisfying L(f ⊗ w) = L_{f,w} for all f ∈ V^* and w ∈ W.

9.3.9. With the notation of the preceding problem, let B = {v_1, ..., v_n} be a basis of V, let {f_1, ..., f_n} be the basis of V^* dual to B, and let D = {w_1, ..., w_m} be a basis of W. Fix T ∈ L_K(V, W) and set [\,T\,]_D^B = (a_{ij}). Show that, under the isomorphism L of part (iii) of the preceding problem, we have
\[
T = L\Big( \sum_{i=1}^m \sum_{j=1}^n a_{ij}\, f_j \otimes w_i \Big).
\]

9.3.10. Let K be a field of characteristic not equal to 2, and let V be a vector space over K of dimension n with basis A = {v1,..., vn}. For every pair of elements u, v of V , we define

\[
u \wedge v := \tfrac{1}{2}(u \otimes v - v \otimes u)
\qquad\text{and}\qquad
u \cdot v := \tfrac{1}{2}(u \otimes v + v \otimes u).
\]

Let ⋀²V be the subspace of V ⊗_K V generated by the vectors u ∧ v, and let Sym²(V) be the subspace generated by the vectors u · v.

(i) Show that the set B_1 = {v_i ∧ v_j ; 1 ≤ i < j ≤ n} is a basis of ⋀²V while B_2 = {v_i · v_j ; 1 ≤ i ≤ j ≤ n} is a basis of Sym²(V). Deduce that V ⊗_K V = ⋀²V ⊕ Sym²(V).

(ii) Show that the map φ_1 : V × V → ⋀²V given by φ_1(u, v) = u ∧ v is antisymmetric bilinear and that the map φ_2 : V × V → Sym²(V) given by φ_2(u, v) = u · v is symmetric bilinear (see Exercise 9.2.7 for the definitions).

(iii) Let B : V × V → W be an arbitrary antisymmetric bilinear map. Show that there exists a unique linear map L_1 : ⋀²V → W such that B = L_1 ∘ φ_1.

(iv) State and prove the analogue of (iii) for a symmetric bilinear map.

Hint. For (iii), let L : V ⊗ V → W be the linear map satisfying B(u, v) = L(u ⊗ v) for all u, v ∈ V. Show that Sym²(V) ⊆ ker(L) and that B(u, v) = L(u ∧ v) for all u, v ∈ V.

9.4 The Kronecker product

In this section, we fix finite-dimensional vector spaces U and V over a field K and we fix a tensor product
\[
U \times V \longrightarrow U \otimes_K V, \qquad (u, v) \longmapsto u \otimes v.
\]
We first note:

Proposition 9.4.1. Let S ∈ EndK (U) and T ∈ EndK (V ). Then there exists a unique element of EndK (U ⊗K V ), denoted S ⊗ T , such that

(S ⊗ T )(u ⊗ v) = S(u) ⊗ T (v)

for all (u, v) ∈ U × V .

Proof. The map
\[
B : U \times V \longrightarrow U \otimes_K V, \qquad (u, v) \longmapsto S(u) \otimes T(v),
\]
is left linear since

\[
B(u_1 + u_2, v) = S(u_1 + u_2) \otimes T(v) = (S(u_1) + S(u_2)) \otimes T(v)
= S(u_1) \otimes T(v) + S(u_2) \otimes T(v) = B(u_1, v) + B(u_2, v)
\]
and
\[
B(c\,u, v) = S(c\,u) \otimes T(v) = (c\,S(u)) \otimes T(v) = c\,(S(u) \otimes T(v)) = c\,B(u, v)
\]
for all u, u_1, u_2 ∈ U, v ∈ V and c ∈ K. Similarly, we verify that B is right linear. Thus it is a bilinear map. By the properties of the tensor product, we conclude that there exists a unique linear map L : U ⊗_K V → U ⊗_K V such that L(u ⊗ v) = S(u) ⊗ T(v) for all u ∈ U and v ∈ V.

The tensor product of the operators defined by Proposition 9.4.1 possesses a number of properties (see the exercises). We show here the following:

Proposition 9.4.2. Let S1,S2 ∈ EndK (U) and T1,T2 ∈ EndK (V ). Then we have

(S1 ⊗ T1) ◦ (S2 ⊗ T2) = (S1 ◦ S2) ⊗ (T1 ◦ T2).

Proof. For all (u, v) ∈ U × V , we have

((S1 ⊗ T1) ◦ (S2 ⊗ T2))(u ⊗ v) = (S1 ⊗ T1)((S2 ⊗ T2)(u ⊗ v))

= (S1 ⊗ T1)(S2(u) ⊗ T2(v))

= S1(S2(u)) ⊗ T1(T2(v))

= (S1 ◦ S2)(u) ⊗ (T1 ◦ T2)(v).

By Proposition 9.4.1, this implies that (S1 ⊗ T1) ◦ (S2 ⊗ T2) = (S1 ◦ S2) ⊗ (T1 ◦ T2).

Definition 9.4.3 (Kronecker product). The Kronecker product of a matrix A = (a_{ij}) ∈ Mat_{m×m}(K) by a matrix B = (b_{ij}) ∈ Mat_{n×n}(K) is the matrix
\[
A \,\dot{\times}\, B =
\begin{pmatrix}
a_{11}B & a_{12}B & \cdots & a_{1m}B \\
a_{21}B & a_{22}B & \cdots & a_{2m}B \\
\vdots & \vdots & & \vdots \\
a_{m1}B & a_{m2}B & \cdots & a_{mm}B
\end{pmatrix} \in \mathrm{Mat}_{mn \times mn}(K).
\]

Our interest in this notion comes from the following result:

Proposition 9.4.4. Let S ∈ EndK (U) and T ∈ EndK (V ), and let

A = {u1,..., um} and B = {v1,..., vn} be bases of U and V respectively. Then

C = {u1 ⊗ v1,..., u1 ⊗ vn; u2 ⊗ v1,..., u2 ⊗ vn; ...... ; um ⊗ v1,..., um ⊗ vn} is a basis of U ⊗K V and we have

[ S ⊗ T ]C = [ S ]A ×˙ [ T ]B.

Proof. The matrix [ S ⊗ T ]_C decomposes naturally into m × m blocks of size n × n:
\[
[\,S \otimes T\,]_C =
\begin{pmatrix} C_{11} & \cdots & C_{1m} \\ \vdots & & \vdots \\ C_{m1} & \cdots & C_{mm} \end{pmatrix},
\]
where, for fixed k, l ∈ {1, ..., m}, the block C_{kl} occupies the rows indexed by u_k ⊗ v_1, ..., u_k ⊗ v_n and the columns indexed by u_l ⊗ v_1, ..., u_l ⊗ v_n.

The (i, j) element of C_{kl} is thus equal to the coefficient of u_k ⊗ v_i in the expression of (S ⊗ T)(u_l ⊗ v_j) as a linear combination of elements of the basis C. To calculate it, write
\[
A = [\,S\,]_A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mm} \end{pmatrix}
\qquad\text{and}\qquad
B = [\,T\,]_B = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{nn} \end{pmatrix}.
\]

We see that

\[
(S \otimes T)(u_l \otimes v_j) = S(u_l) \otimes T(v_j)
= (a_{1l}u_1 + \cdots + a_{ml}u_m) \otimes (b_{1j}v_1 + \cdots + b_{nj}v_n)
= \cdots + (a_{kl}b_{1j}\,u_k \otimes v_1 + \cdots + a_{kl}b_{nj}\,u_k \otimes v_n) + \cdots,
\]
hence the (i, j) entry of C_{kl} is a_{kl}b_{ij}, equal to that of a_{kl}B. This proves that C_{kl} = a_{kl}B and so [ S ⊗ T ]_C = A ×˙ B = [ S ]_A ×˙ [ T ]_B.

Exercises.

9.4.1. Denote by E = {e_1, e_2} the standard basis of R^2 and by E′ = {e_1′, e_2′, e_3′} that of R^3. Let S : R^2 → R^2 and T : R^3 → R^3 be the linear operators for which
\[
[\,S\,]_E = \begin{pmatrix} 1 & 2 \\ -1 & 0 \end{pmatrix}
\qquad\text{and}\qquad
[\,T\,]_{E'} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}.
\]

Calculate

(i) (S ⊗ T)(3e_1 ⊗ e_1′),

(ii) (S ⊗ T)(2e_1 ⊗ e_1′ − e_2 ⊗ e_2′),

(iii) (S ⊗ T)(2(e_1 + e_2) ⊗ e_3′ − e_2 ⊗ e_3′).

9.4.2. Let U and V be finite-dimensional vector spaces over a field K, and let S : U → U and T : V → V be invertible linear maps. Show that the linear map S ⊗ T : U ⊗K V → U ⊗K V is also invertible, and give an equation relating S−1, T −1 and (S ⊗ T )−1.

9.4.3. Let S ∈ End_K(U) and T ∈ End_K(V), where U and V denote finite-dimensional vector spaces over a field K. Let U_1 be an S-invariant subspace of U and let V_1 be a T-invariant subspace of V. Consider U_1 ⊗_K V_1 as a subspace of U ⊗_K V as in Exercise 9.3.5. Under these hypotheses, show that U_1 ⊗_K V_1 is an S ⊗ T-invariant subspace of U ⊗_K V and that (S ⊗ T)|_{U_1 ⊗_K V_1} = S|_{U_1} ⊗ T|_{V_1}.

9.4.4. Let S ∈ End_C(U) and T ∈ End_C(V), where U and V denote vector spaces of dimensions m and n respectively over C. We choose a basis A of U and a basis B of V such that the matrices [ S ]_A and [ T ]_B are in Jordan canonical form. Show that, for the corresponding basis C of U ⊗_C V given by Proposition 9.4.4, the matrix [ S ⊗ T ]_C is upper triangular. By calculating its determinant, show that det(S ⊗ T) = det(S)^n det(T)^m.

9.4.5. Let U and V be finite-dimensional vector spaces over a field K.

(i) Show that the map

B : EndK (U) × EndK (V ) −→ EndK (U ⊗K V ) (S,T ) 7−→ S ⊗ T

is bilinear.

(ii) Show that the image of B generates EndK (U ⊗K V ) as a vector space over K.

(iii) Conclude that the linear map from EndK (U)⊗K EndK (V ) to EndK (U ⊗K V ) associated to B is an isomorphism.

9.4.6. Let U and V be finite-dimensional vector spaces over a field K.

∗ ∗ (i) Let f ∈ U and g ∈ V . Show that there exists a unique linear form f ⊗g : U ⊗K V → K such that (f ⊗ g)(u ⊗ v) = f(u)g(v) for all (u, v) ∈ U × V .

(ii) Show that the map B : U^* × V^* → (U ⊗_K V)^* given by B(f, g) = f ⊗ g for all (f, g) ∈ U^* × V^* is bilinear and that its image generates (U ⊗_K V)^* as a vector space over K. Conclude that the linear map from U^* ⊗_K V^* to (U ⊗_K V)^* associated to B is an isomorphism.

(iii) Let S ∈ End_K(U) and T ∈ End_K(V). Show that, if we identify U^* ⊗_K V^* with (U ⊗_K V)^* using the isomorphism established in (ii), then we have {}^tS ⊗ {}^tT = {}^t(S ⊗ T).

9.5 Multiple tensor products

Theorem 9.5.1. Let U, V, W be finite-dimensional vector spaces over a field K. There exists a unique isomorphism L:(U ⊗K V ) ⊗K W → U ⊗K (V ⊗K W ) such that

L((u ⊗ v) ⊗ w) = u ⊗ (v ⊗ w) for all (u, v, w) ∈ U × V × W .

Proof. Let {u1,..., ul}, {v1,..., vm} and {w1,..., wn} be bases of U, V and W respectively. Then

{(ui ⊗ vj) ⊗ wk ; 1 ≤ i ≤ l, 1 ≤ j ≤ m, 1 ≤ k ≤ n},

{ui ⊗ (vj ⊗ wk) ; 1 ≤ i ≤ l, 1 ≤ j ≤ m, 1 ≤ k ≤ n} are bases, respectively, of (U ⊗K V ) ⊗K W and U ⊗K (V ⊗K W ). Thus there exists a unique linear map L from (U ⊗K V ) ⊗K W to U ⊗K (V ⊗K W ) such that

L((ui ⊗ vj) ⊗ wk) = ui ⊗ (vj ⊗ wk) for i = 1, . . . , l, j = 1, . . . , m and k = 1, . . . , n. By construction, L is an isomorphism. Finally, for arbitrary vectors

\[
u = \sum_{i=1}^l a_iu_i \in U, \qquad
v = \sum_{j=1}^m b_jv_j \in V \qquad\text{and}\qquad
w = \sum_{k=1}^n c_kw_k \in W,
\]

we have

\[
\begin{aligned}
L((u \otimes v) \otimes w)
&= L\Big(\Big(\Big(\sum_{i=1}^l a_iu_i\Big) \otimes \Big(\sum_{j=1}^m b_jv_j\Big)\Big) \otimes \Big(\sum_{k=1}^n c_kw_k\Big)\Big) \\
&= L\Big(\sum_{i=1}^l \sum_{j=1}^m \sum_{k=1}^n a_ib_jc_k\, (u_i \otimes v_j) \otimes w_k\Big)
= \sum_{i=1}^l \sum_{j=1}^m \sum_{k=1}^n a_ib_jc_k\, u_i \otimes (v_j \otimes w_k) \\
&= \Big(\sum_{i=1}^l a_iu_i\Big) \otimes \Big(\Big(\sum_{j=1}^m b_jv_j\Big) \otimes \Big(\sum_{k=1}^n c_kw_k\Big)\Big)
= u \otimes (v \otimes w),
\end{aligned}
\]
as required.
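In coordinates, the identification of Theorem 9.5.1 is invisible: Kronecker products of coordinate vectors associate on the nose. A hedged one-line check (ours):

```python
# (u (x) v) (x) w and u (x) (v (x) w) have the same coordinate vector.
import numpy as np

u, v, w = np.array([1., 2.]), np.array([0., 1., 3.]), np.array([4., -1.])
assert np.allclose(np.kron(np.kron(u, v), w), np.kron(u, np.kron(v, w)))
```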

The isomorphism (U ⊗K V ) ⊗K W ' U ⊗K (V ⊗K W ) given by Theorem 9.5.1 allows us to identify these two spaces, and to denote them by U ⊗K V ⊗K W . We then write u⊗v ⊗w instead of (u ⊗ v) ⊗ w or u ⊗ (v ⊗ w).

In general, Theorem 9.5.1 allows us to omit the parentheses in the tensor product of an arbitrary number of finite-dimensional vector spaces V1,...,Vr over K. Thus, we write

V1 ⊗K V2 ⊗K · · · ⊗K Vr instead of V1 ⊗K (V2 ⊗K (· · · ⊗K Vr))

and v_1 ⊗ v_2 ⊗ · · · ⊗ v_r instead of v_1 ⊗ (v_2 ⊗ (· · · ⊗ v_r)). We then have

dimK (V1 ⊗K V2 ⊗K · · · ⊗K Vr) = dimK (V1) dimK (V2) ··· dimK (Vr).

Exercises.

9.5.1. Let V and W be finite-dimensional vector spaces over a field K. Show that there exists a unique isomorphism L : V ⊗_K W → W ⊗_K V such that L(v ⊗ w) = w ⊗ v for all v ∈ V and w ∈ W.

9.5.2. Let r be a positive integer and V1,...,Vr,W be vector spaces over K. We say that a function ϕ : V1 × · · · × Vr −→ W is multilinear or r-linear if it is linear in each of its arguments, that is if it satisfies

ML1. φ(v_1, ..., v_i + v_i′, ..., v_r) = φ(v_1, ..., v_i, ..., v_r) + φ(v_1, ..., v_i′, ..., v_r)

ML2. φ(v_1, ..., c v_i, ..., v_r) = c φ(v_1, ..., v_i, ..., v_r)

for all i ∈ {1, ..., r}, v_1 ∈ V_1, ..., v_i, v_i′ ∈ V_i, ..., v_r ∈ V_r and c ∈ K. Show that the function

ψ : V1 × V2 × · · · × Vr −→ V1 ⊗K V2 ⊗K · · · ⊗K Vr (v1, v2,..., vr) 7−→ v1 ⊗ v2 ⊗ · · · ⊗ vr is multilinear.

9.5.3. Let V_1, ..., V_r and W be as in the preceding exercise. The goal of this new exercise is to show that, for any multilinear map φ : V_1 × · · · × V_r → W, there exists a unique linear map L : V_1 ⊗_K · · · ⊗_K V_r → W such that

ϕ(v1,..., vr) = L(v1 ⊗ v2 ⊗ · · · ⊗ vr) for all (v1,..., vr) ∈ V1 × · · · × Vr. We proceed by induction on r. For r = 2, this follows from the definition of the tensor product. Suppose r ≥ 3 and the result is true for multilinear maps from V2 × · · · × Vr to W .

(i) Show that, for each choice of v1 ∈ V1, there exists a unique linear map Lv1 : V2 ⊗K · · · ⊗K Vr → W such that

ϕ(v1, v2,..., vr) = Lv1 (v2 ⊗ · · · ⊗ vr)

for all (v2,..., vr) ∈ V2 × · · · × Vr. (ii) Show that the map

B : V1 × (V2 ⊗K · · · ⊗K Vr) −→ W

(v1, α) 7−→ Lv1 (α) is bilinear.

(iii) Deduce that there exists a unique linear map L: V1 ⊗K (V2 ⊗K · · · ⊗K Vr) → W such that ϕ(v1,..., vr) = L(v1 ⊗ (v2 ⊗ · · · ⊗ vr)) for all (v1,..., vr) ∈ V1 × · · · × Vr.

9.5.4. Let V be a finite-dimensional vector space over a field K. Using the result of the ∗ preceding exercise, show that there exists a unique linear map L: V ⊗K V ⊗K V → V such ∗ that L(f ⊗ v1 ⊗ v2) = f(v1)v2 for all (f, v1, v2) ∈ V × V × V .

9.5.5. Let V_1, V_2, V_3 and W be finite-dimensional vector spaces over a field K, and let φ : V_1 × V_2 × V_3 → W be a trilinear map. Show that there exists a unique bilinear map B : V_1 × (V_2 ⊗_K V_3) → W such that B(v_1, v_2 ⊗ v_3) = φ(v_1, v_2, v_3) for all v_j ∈ V_j (j = 1, 2, 3).

Chapter 10

Inner product spaces

After a general review of inner product spaces, we define the notions of orthogonal operators and symmetric operators on these spaces and prove the structure theorems for these operators (the spectral theorems). We conclude by showing that every linear operator on a finite-dimensional inner product space is the composition of a positive definite symmetric operator followed by an orthogonal operator. This decomposition is called the polar decomposition of an operator. For example, in R^2, a positive definite symmetric operator is a product of dilations along orthogonal axes, while an orthogonal operator is a rotation about the origin or a reflection in a line passing through the origin. Therefore every linear operator on R^2 is the composition of dilations along orthogonal axes followed by a rotation or reflection.

10.1 Review

Definition 10.1.1 (Inner product). Let V be a real vector space. An inner product on V is a bilinear form
\[
V \times V \longrightarrow \mathbb{R}, \qquad (u, v) \longmapsto \langle u, v \rangle,
\]
that is symmetric: ⟨u, v⟩ = ⟨v, u⟩ for all u, v ∈ V, and positive definite:

hv, vi ≥ 0 for all v ∈ V , with hv, vi = 0 ⇐⇒ v = 0 .

Definition 10.1.2 (Inner product space). An inner product space is a real vector space V equipped with an inner product.

Example 10.1.3. Let n be a positive integer. We define an inner product on R^n by setting
\[
\left\langle \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \right\rangle = x_1y_1 + \cdots + x_ny_n.
\]


Equipped with this inner product, R^n becomes an inner product space, called euclidean space. When we speak of R^n as an inner product space without specifying the inner product, we are implicitly using the inner product defined above.

Example 10.1.4. Let V be the vector space of continuous functions f : R → R that are 2π-periodic, that is, which satisfy f(x + 2π) = f(x) for all x ∈ R. Then V is an inner product space for the inner product
\[
\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx.
\]

Example 10.1.5. Let n be a positive integer. Then Matn×n(R) is an inner product space for the inner product hA, Bi = trace(ABt).

For the rest of this section, we fix an inner product space V with inner product h , i. We first note:

Proposition 10.1.6. Every subspace W of V is an inner product space for the inner product of V restricted to W .

Proof. Indeed, the inner product of V induces, by restriction, a bilinear map

W × W −→ R (w1, w2) 7−→ hw1, w2i

and it is easily verified that this is an inner product in the sense of Definition 10.1.1.

When we refer to a subspace of V as an inner product space, it is this inner product we are referring to. Definition 10.1.7 (Norm, unit vector). The norm of a vector v ∈ V is

kvk = hv, vi1/2.

We say that a vector v ∈ V is a unit vector if kvk = 1.

Recall that the norm possesses the following properties, the first two of which follow immediately from the definition:

Proposition 10.1.8. For all u, v ∈ V and c ∈ R, we have

(i) kvk ≥ 0 with equality if and only if v = 0,

(ii) kc vk = |c| kvk,

(iii) |hu, vi| ≤ kukkvk (Cauchy-Schwarz inequality),

(iv) ku + vk ≤ kuk + kvk (triangle inequality).

Relations (i), (ii) and (iv) show that k·k is indeed a norm in the proper sense of the term. The norm allows us to measure the “length” of vectors in V . It also allows us to define the “distance” d(u, v) between two vectors u and v of V by setting

d(u, v) = ku − vk.

Thanks to the triangle inequality, it is easily verified that this defines a metric on V .

Inequality (iii), called the Cauchy-Schwarz inequality, allows us to define the angle θ ∈ [0, π] between two nonzero vectors u and v of V by setting
\[
\theta = \arccos\left( \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \right). \tag{10.1}
\]
In particular, this angle θ is equal to π/2 if and only if ⟨u, v⟩ = 0. For this reason, we say that the vectors u, v of V are perpendicular or orthogonal if ⟨u, v⟩ = 0.

Finally, if v ∈ V is nonzero, it follows from (ii) that the product

kvk−1v is a unit vector parallel to v (the angle between it and v is 0 radians by (10.1)).

Definition 10.1.9. Let v1,..., vn ∈ V.

1) We say that {v1,..., vn} is an orthogonal subset of V if vi 6= 0 for i = 1, . . . , n and if hvi, vji = 0 for every pair of integers (i, j) with 1 ≤ i, j ≤ n and i 6= j.

2) We say that {v_1, ..., v_n} is an orthogonal basis of V if it is both a basis of V and an orthogonal subset of V.

3) We say that {v1,..., vn} is an orthonormal basis of V if it is an orthogonal basis of V 1/2 consisting of unit vectors, that is, such that kvik = hvi, vii = 1 for i = 1, . . . , n.

The following proposition summarizes the importance of these ideas. Proposition 10.1.10.

(i) Every orthogonal subset of V is linearly independent.

(ii) Suppose that B = {v1,..., vn} is an orthogonal basis of V . Then, for all v ∈ V , we have hv, v1i hv, vni v = v1 + ··· + vn. hv1, v1i hvn, vni

(iii) If $B = \{v_1, \dots, v_n\}$ is an orthonormal basis of V, then, for all $v \in V$, we have
$$[v]_B = \begin{pmatrix} \langle v, v_1\rangle \\ \vdots \\ \langle v, v_n\rangle \end{pmatrix}.$$

Proof. Suppose first that $\{v_1, \dots, v_n\}$ is an orthogonal subset of V, and that $v \in \langle v_1, \dots, v_n\rangle_{\mathbb{R}}$. We can write
$$v = a_1 v_1 + \cdots + a_n v_n$$
with $a_1, \dots, a_n \in \mathbb{R}$. Taking the inner product of both sides of this equality with $v_i$, we have
$$\langle v, v_i\rangle = \Big\langle \sum_{j=1}^n a_j v_j,\; v_i \Big\rangle = \sum_{j=1}^n a_j \langle v_j, v_i\rangle = a_i \langle v_i, v_i\rangle$$
and so
$$a_i = \frac{\langle v, v_i\rangle}{\langle v_i, v_i\rangle} \qquad (i = 1, \dots, n).$$
If we choose $v = 0$, this formula gives $a_1 = \cdots = a_n = 0$. Thus the set $\{v_1, \dots, v_n\}$ is linearly independent and thus it is a basis of $\langle v_1, \dots, v_n\rangle_{\mathbb{R}}$. This proves both (i) and (ii). Statement (iii) follows immediately from (ii).

The following result shows that, if V is finite-dimensional, it possesses an orthogonal basis. In particular, this holds for every finite-dimensional subspace of V.

Theorem 10.1.11 (Gram-Schmidt orthogonalization procedure). Suppose that $\{w_1, \dots, w_n\}$ is a basis of V. We recursively define:

$$\begin{aligned}
v_1 &= w_1,\\
v_2 &= w_2 - \frac{\langle w_2, v_1\rangle}{\langle v_1, v_1\rangle} v_1,\\
v_3 &= w_3 - \frac{\langle w_3, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \frac{\langle w_3, v_2\rangle}{\langle v_2, v_2\rangle} v_2,\\
&\;\;\vdots\\
v_n &= w_n - \frac{\langle w_n, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \cdots - \frac{\langle w_n, v_{n-1}\rangle}{\langle v_{n-1}, v_{n-1}\rangle} v_{n-1}.
\end{aligned}$$

Then, for all $i = 1, \dots, n$, the set $\{v_1, \dots, v_i\}$ is an orthogonal basis of the subspace $\langle w_1, \dots, w_i\rangle_{\mathbb{R}}$. In particular, $\{v_1, \dots, v_n\}$ is an orthogonal basis of V.

Proof. It suffices to show the first assertion. For this, we proceed by induction on i. For $i = 1$, it is clear that $\{v_1\}$ is an orthogonal basis of $\langle w_1\rangle_{\mathbb{R}}$ since $v_1 = w_1 \neq 0$. Suppose that $\{v_1, \dots, v_{i-1}\}$ is an orthogonal basis of $\langle w_1, \dots, w_{i-1}\rangle_{\mathbb{R}}$ for an integer i with $2 \le i \le n$. By definition, we have
$$v_i = w_i - \frac{\langle w_i, v_1\rangle}{\langle v_1, v_1\rangle} v_1 - \cdots - \frac{\langle w_i, v_{i-1}\rangle}{\langle v_{i-1}, v_{i-1}\rangle} v_{i-1}, \tag{10.2}$$
and we note that the denominators $\langle v_1, v_1\rangle, \dots, \langle v_{i-1}, v_{i-1}\rangle$ of the coefficients of $v_1, \dots, v_{i-1}$ are nonzero since each of the vectors $v_1, \dots, v_{i-1}$ is nonzero. We deduce from this formula that
$$\langle v_1, \dots, v_{i-1}, v_i\rangle_{\mathbb{R}} = \langle v_1, \dots, v_{i-1}, w_i\rangle_{\mathbb{R}} = \langle w_1, \dots, w_{i-1}, w_i\rangle_{\mathbb{R}}.$$
Then, to conclude that $\{v_1, \dots, v_i\}$ is an orthogonal basis of $\langle w_1, \dots, w_i\rangle_{\mathbb{R}}$, it remains to show that this is an orthogonal set (by Proposition 10.1.10). For this, it suffices to show that $\langle v_i, v_j\rangle = 0$ for $j = 1, \dots, i-1$. By (10.2), we see that
$$\langle v_i, v_j\rangle = \langle w_i, v_j\rangle - \sum_{k=1}^{i-1} \frac{\langle w_i, v_k\rangle}{\langle v_k, v_k\rangle} \langle v_k, v_j\rangle = 0$$
for $j = 1, \dots, i-1$. This completes the proof.
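The recursion of Theorem 10.1.11 translates directly into code. The following is a minimal sketch (in Python with NumPy; the function name gram_schmidt is ours, not part of the notes), assuming the input vectors are linearly independent:

    import numpy as np

    def gram_schmidt(ws):
        # ws: linearly independent vectors w_1, ..., w_n of R^m.
        # Returns v_1, ..., v_n as in Theorem 10.1.11, so that {v_1, ..., v_i}
        # is an orthogonal basis of span(w_1, ..., w_i) for every i.
        vs = []
        for w in ws:
            v = w.astype(float)
            for u in vs:
                v = v - (np.dot(w, u) / np.dot(u, u)) * u
            vs.append(v)
        return vs

    v1, v2 = gram_schmidt([np.array([1, 1, 0, 0]), np.array([1, 1, 1, 1])])
    print(v1, v2, np.dot(v1, v2))  # [1. 1. 0. 0.] [0. 0. 1. 1.] 0.0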

We end this section with a review of the notion of orthogonal complement and two related results.

Definition 10.1.12 (Orthogonal complement). The orthogonal complement of a subspace W of V is
$$W^\perp := \{v \in V \;;\; \langle v, w\rangle = 0 \text{ for all } w \in W\}.$$

Proposition 10.1.13. Let W be a subspace of V. Then $W^\perp$ is a subspace of V that satisfies $W \cap W^\perp = \{0\}$. Furthermore, if $W = \langle w_1, \dots, w_m\rangle_{\mathbb{R}}$, then
$$W^\perp = \{v \in V \;;\; \langle v, w_i\rangle = 0 \text{ for } i = 1, \dots, m\}.$$

Proof. We leave it as an exercise to show that $W^\perp$ is a subspace of V. We note that, if $w \in W \cap W^\perp$, then we have $\langle w, w\rangle = 0$ and so $w = 0$. This shows that $W \cap W^\perp = \{0\}$. Finally, if $W = \langle w_1, \dots, w_m\rangle_{\mathbb{R}}$, we see that
$$\begin{aligned}
v \in W^\perp &\iff \langle v, a_1 w_1 + \cdots + a_m w_m\rangle = 0 \;\;\forall\, a_1, \dots, a_m \in \mathbb{R}\\
&\iff a_1\langle v, w_1\rangle + \cdots + a_m\langle v, w_m\rangle = 0 \;\;\forall\, a_1, \dots, a_m \in \mathbb{R}\\
&\iff \langle v, w_1\rangle = \cdots = \langle v, w_m\rangle = 0,
\end{aligned}$$

and thus the final assertion of the proposition follows.

It follows from Proposition 10.1.13 that $V^\perp = \{0\}$. In other words, for $v \in V$, we have
$$v = 0 \iff \langle u, v\rangle = 0 \text{ for all } u \in V.$$

Theorem 10.1.14. Let W be a finite-dimensional subspace of V. Then we have
$$V = W \oplus W^\perp.$$

Proof. Since, by Proposition 10.1.13, we have $W \cap W^\perp = \{0\}$, it remains to show that $V = W + W^\perp$. Let $\{w_1, \dots, w_m\}$ be an orthogonal basis of W (such a basis exists since W is finite-dimensional). Given $v \in V$, we set
$$w = \frac{\langle v, w_1\rangle}{\langle w_1, w_1\rangle} w_1 + \cdots + \frac{\langle v, w_m\rangle}{\langle w_m, w_m\rangle} w_m. \tag{10.3}$$

For i = 1, . . . , m, we see that

$$\langle v - w, w_i\rangle = \langle v, w_i\rangle - \langle w, w_i\rangle = 0,$$
thus $v - w \in W^\perp$, and so $v = w + (v - w) \in W + W^\perp$. Since the choice of $v \in V$ is arbitrary, this shows that $V = W + W^\perp$, as claimed.

If W is a finite-dimensional subspace of V , Theorem 10.1.14 tells us that, for all v ∈ V , there exists a unique vector w ∈ W such that v − w ∈ W ⊥.

[Figure: a vector v, its orthogonal projection w on the subspace W, and the perpendicular v − w.]

We say that this vector w is the orthogonal projection of v on W. If $\{w_1, \dots, w_m\}$ is an orthogonal basis of W, then this projection w is given by (10.3).

Example 10.1.15. Calculate the orthogonal projection of $v = (0, 1, 1, 0)^t$ on the subspace W of $\mathbb{R}^4$ generated by $w_1 = (1, 1, 0, 0)^t$ and $w_2 = (1, 1, 1, 1)^t$.

Solution: We first apply the Gram-Schmidt algorithm to the basis {w1, w2} of W . We obtain

$$v_1 = w_1 = (1, 1, 0, 0)^t,$$
$$v_2 = w_2 - \frac{\langle w_2, v_1\rangle}{\langle v_1, v_1\rangle} v_1 = (1, 1, 1, 1)^t - \frac{2}{2}(1, 1, 0, 0)^t = (0, 0, 1, 1)^t.$$
Thus $\{v_1, v_2\}$ is an orthogonal basis of W. The orthogonal projection of v on W is therefore
$$\frac{\langle v, v_1\rangle}{\langle v_1, v_1\rangle} v_1 + \frac{\langle v, v_2\rangle}{\langle v_2, v_2\rangle} v_2 = \frac{1}{2} v_1 + \frac{1}{2} v_2 = \frac{1}{2}(1, 1, 1, 1)^t.$$
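This computation is easy to reproduce numerically via formula (10.3); a minimal sketch (Python with NumPy; the helper name project is ours):

    import numpy as np

    def project(v, orthogonal_basis):
        # Orthogonal projection of v on W, computed by formula (10.3);
        # the given basis of W must already be orthogonal.
        return sum((np.dot(v, w) / np.dot(w, w)) * w for w in orthogonal_basis)

    v  = np.array([0.0, 1.0, 1.0, 0.0])
    v1 = np.array([1.0, 1.0, 0.0, 0.0])
    v2 = np.array([0.0, 0.0, 1.0, 1.0])
    w = project(v, [v1, v2])
    print(w)                                     # [0.5 0.5 0.5 0.5]
    print(np.dot(v - w, v1), np.dot(v - w, v2))  # 0.0 0.0, so v - w lies in W-perp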

Exercises.

10.1.1. Show that $B = \{(1, -1, 3)^t, (-2, 1, 1)^t, (4, 7, 1)^t\}$ is an orthogonal basis of $\mathbb{R}^3$ and calculate $[(a, b, c)^t]_B$.

10.1.2. Find $a, b, c \in \mathbb{R}$ such that the set $\{(1, 2, 1, 0)^t, (1, -1, 1, 3)^t, (2, -1, 0, -1)^t, (a, b, c, 1)^t\}$ is an orthogonal basis of $\mathbb{R}^4$.

10.1.3. Let U and V be inner product spaces equipped with inner products h , iU and h , iV respectively. Show that the formula

$$\langle (u_1, v_1), (u_2, v_2)\rangle = \langle u_1, u_2\rangle_U + \langle v_1, v_2\rangle_V$$
defines an inner product on $U \times V$ for which $(U \times \{0\})^\perp = \{0\} \times V$.

10.1.4. Let V be an inner product space of dimension n equipped with an inner product h , i.

(i) Verify that, for each $u \in V$, the function $f_u : V \to \mathbb{R}$ given by
$$f_u(v) = \langle u, v\rangle \quad \text{for all } v \in V$$
belongs to $V^*$. Then verify that the map $\varphi : V \to V^*$ given by $\varphi(u) = f_u$ for all $u \in V$ is an isomorphism of vector spaces over $\mathbb{R}$.

(ii) Let $B = \{u_1, \dots, u_n\}$ be an orthonormal basis of V. Show that the basis of $V^*$ dual to B is $B^* = \{f_1, \dots, f_n\}$, where $f_i := \varphi(u_i) = f_{u_i}$ for $i = 1, \dots, n$.

10.1.5. Let V be the vector space of continuous functions f : R → R that are 2π-periodic.

(i) Show that the formula
$$\langle f, g\rangle = \int_0^{2\pi} f(x) g(x)\, dx$$
defines an inner product on V.

(ii) Show that {1, sin(x), cos(x)} is an orthogonal subset of V for this inner product.

(iii) Let $f : \mathbb{R} \to \mathbb{R}$ be the periodic "sawtooth" function given on $[0, 2\pi]$ by
$$f(x) = \begin{cases} x & \text{if } 0 \le x \le \pi,\\ 2\pi - x & \text{if } \pi \le x \le 2\pi, \end{cases}$$

and extended by periodicity to all of R. Calculate the projection of f on the subspace W of V generated by {1, sin(x), cos(x)}.

10.1.6. Let V = Matn×n(R).

(i) Show that the formula $\langle A, B\rangle = \mathrm{trace}(AB^t)$ defines an inner product on V.

(ii) For $n = 3$, find an orthonormal basis of the subspace $S = \{A \in V \;;\; A^t = A\}$ of symmetric matrices.

10.1.7. Consider the vector space $C[-1, 1]$ of continuous functions from $[-1, 1]$ to $\mathbb{R}$, equipped with the inner product
$$\langle f, g\rangle = \int_{-1}^{1} f(x) g(x)\, dx.$$
Find an orthogonal basis of the subspace $V = \langle 1, x, x^2\rangle_{\mathbb{R}}$.

10.1.8. Let W be the subspace of $\mathbb{R}^4$ generated by $(1, -1, 0, 1)^t$ and $(1, 1, 0, 0)^t$. Find a basis of $W^\perp$.

10.1.9. Let $W_1$ and $W_2$ be subspaces of a finite-dimensional inner product space V. Show that $W_1 \subseteq W_2$ if and only if $W_2^\perp \subseteq W_1^\perp$.

10.2 Orthogonal operators

In this section, we fix an inner product space V of finite dimension n, and an orthonormal basis $B = \{u_1, \dots, u_n\}$ of V. We first note:

Lemma 10.2.1. For all $u, v \in V$, we have
$$\langle u, v\rangle = [u]_B^t\, [v]_B.$$

Proof. Let u, v ∈ V . Write

$$u = \sum_{i=1}^n a_i u_i \quad\text{and}\quad v = \sum_{i=1}^n b_i u_i$$
with $a_1, \dots, a_n, b_1, \dots, b_n \in \mathbb{R}$. We have
$$\langle u, v\rangle = \Big\langle \sum_{i=1}^n a_i u_i,\; \sum_{j=1}^n b_j u_j \Big\rangle = \sum_{i=1}^n \sum_{j=1}^n a_i b_j \langle u_i, u_j\rangle = \sum_{i=1}^n a_i b_i$$
since $\langle u_i, u_j\rangle = \delta_{ij}$. In matrix form, this equality can be rewritten
$$\langle u, v\rangle = (a_1, \dots, a_n) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = [u]_B^t\, [v]_B.$$

Using this lemma, we can now show:

Proposition 10.2.2. Let $T : V \to V$ be a linear operator. The following conditions are equivalent:

(i) $\|T(u)\| = 1$ for all $u \in V$ with $\|u\| = 1$,

(ii) $\|T(v)\| = \|v\|$ for all $v \in V$,

(iii) $\langle T(u), T(v)\rangle = \langle u, v\rangle$ for all $u, v \in V$,

(iv) $\{T(u_1), \dots, T(u_n)\}$ is an orthonormal basis of V,

(v) $[\,T\,]_B^t\, [\,T\,]_B = I_n$,

(vi) $[\,T\,]_B$ is invertible and $[\,T\,]_B^{-1} = [\,T\,]_B^t$.

Proof. (i)⇒(ii): Suppose first that $T(u)$ is a unit vector for every unit vector u in V. Let v be an arbitrary vector in V. If $v \neq 0$, then $u := \|v\|^{-1} v$ is a unit vector, hence $T(u) = \|v\|^{-1} T(v)$ is a unit vector and so $\|T(v)\| = \|v\|$. If $v = 0$, we have $T(v) = 0$ and then the equality $\|T(v)\| = \|v\|$ is again verified.

(ii)⇒(iii): Suppose that $\|T(v)\| = \|v\|$ for all $v \in V$. Let u, v be arbitrary vectors in V. We have
$$\|u + v\|^2 = \langle u + v, u + v\rangle = \langle u, u\rangle + 2\langle u, v\rangle + \langle v, v\rangle = \|u\|^2 + 2\langle u, v\rangle + \|v\|^2. \tag{10.4}$$
Applying this formula to $T(u) + T(v)$, we obtain
$$\|T(u) + T(v)\|^2 = \|T(u)\|^2 + 2\langle T(u), T(v)\rangle + \|T(v)\|^2. \tag{10.5}$$
Since $\|T(u) + T(v)\| = \|T(u + v)\| = \|u + v\|$, $\|T(u)\| = \|u\|$ and $\|T(v)\| = \|v\|$, comparing (10.4) and (10.5) gives $\langle T(u), T(v)\rangle = \langle u, v\rangle$.

(iii)⇒(iv): Suppose that $\langle T(u), T(v)\rangle = \langle u, v\rangle$ for all $u, v \in V$. Since $\{u_1, \dots, u_n\}$ is an orthonormal basis of V, we deduce that
$$\langle T(u_i), T(u_j)\rangle = \langle u_i, u_j\rangle = \delta_{ij} \qquad (1 \le i, j \le n).$$
Thus $\{T(u_1), \dots, T(u_n)\}$ is an orthogonal subset of V consisting of unit vectors. By Proposition 10.1.10(i), this set is linearly independent. Since it consists of n vectors and V is of dimension n, it is a basis of V, hence an orthonormal basis of V.

(iv)⇒(v): Suppose that $\{T(u_1), \dots, T(u_n)\}$ is an orthonormal basis of V. Let $i, j \in \{1, \dots, n\}$. By Lemma 10.2.1, we have
$$\delta_{ij} = \langle T(u_i), T(u_j)\rangle = [T(u_i)]_B^t\, [T(u_j)]_B.$$
Now, $[T(u_i)]_B^t$ is the i-th row of the matrix $[\,T\,]_B^t$, and $[T(u_j)]_B$ is the j-th column of $[\,T\,]_B$. Thus $[T(u_i)]_B^t\, [T(u_j)]_B = \delta_{ij}$ is the entry in row i and column j of $[\,T\,]_B^t\, [\,T\,]_B$. Since the choice of $i, j \in \{1, \dots, n\}$ was arbitrary, this implies that $[\,T\,]_B^t\, [\,T\,]_B = I$.

(v)⇒(vi): Suppose that $[\,T\,]_B^t\, [\,T\,]_B = I$. Then $[\,T\,]_B$ is invertible with inverse $[\,T\,]_B^t$.

(vi)⇒(i): Finally, suppose that $[\,T\,]_B$ is invertible and that $[\,T\,]_B^{-1} = [\,T\,]_B^t$. If u is an arbitrary unit vector in V, we see, by Lemma 10.2.1, that
$$\|T(u)\|^2 = \langle T(u), T(u)\rangle = [T(u)]_B^t\, [T(u)]_B = ([\,T\,]_B [u]_B)^t ([\,T\,]_B [u]_B) = [u]_B^t\, [\,T\,]_B^t\, [\,T\,]_B [u]_B = [u]_B^t\, [u]_B = \langle u, u\rangle = \|u\|^2 = 1.$$

Thus T (u) is also a unit vector.

Definition 10.2.3 (Orthogonal operator, orthogonal matrix).

• A linear operator $T : V \to V$ satisfying the equivalent conditions of Proposition 10.2.2 is called an orthogonal operator or a real unitary operator.

• An invertible matrix $A \in \mathrm{Mat}_{n\times n}(\mathbb{R})$ such that $A^{-1} = A^t$ is called an orthogonal matrix.

Condition (i) of Proposition 10.2.2 requires that T sends the set of unit vectors in V to itself. This explains the terminology of a unitary operator.

Condition (iii) of Proposition 10.2.2 means that T preserves the inner product. In par- ticular, this requires that T maps each pair of orthogonal vectors u, v in V to another pair of orthogonal vectors. For this reason, we say that such an operator is orthogonal.

Finally, condition (vi) means:

Corollary 10.2.4. A linear operator T : V → V is orthogonal if and only if its matrix relative to an orthonormal basis of V is an orthogonal matrix.

Example 10.2.5. The following linear operators are orthogonal:

• a rotation about the origin in R2,

• a reflection in a line passing through the origin in R2,

• a rotation about a line passing through the origin in R3,

• a reflection in a plane passing through the origin in R3.

Definition 10.2.6 (O(V ), On(R)). We denote by O(V ) the set of orthogonal operators on V and by On(R) the set of orthogonal n × n matrices.

We leave it as an exercise to show that $O(V)$ is a subgroup of $GL(V) = \mathrm{End}_K(V)^\times$, that $O_n(\mathbb{R})$ is a subgroup of $GL_n(\mathbb{R}) = \mathrm{Mat}_{n\times n}(\mathbb{R})^\times$, and that these two groups are isomorphic.

Example 10.2.7. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear operator whose matrix relative to the standard basis E of $\mathbb{R}^3$ is
$$[\,T\,]_E = \frac{1}{3}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}.$$
Show that T is an orthogonal operator.

Solution: Since E is an orthonormal basis of $\mathbb{R}^3$, it suffices to show that $[\,T\,]_E$ is an orthogonal matrix. We have
$$[\,T\,]_E^t\, [\,T\,]_E = \frac{1}{9}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix}\begin{pmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{pmatrix} = I,$$
hence $[\,T\,]_E^{-1} = [\,T\,]_E^t$, as required.
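This check is also easy to carry out numerically; a small sketch (Python with NumPy) verifying conditions (v) and (ii) of Proposition 10.2.2 for this matrix:

    import numpy as np

    A = np.array([[-1.0, 2.0, 2.0],
                  [ 2.0,-1.0, 2.0],
                  [ 2.0, 2.0,-1.0]]) / 3.0

    print(np.allclose(A.T @ A, np.eye(3)))  # True: condition (v), A^t A = I

    v = np.array([1.0, -2.0, 5.0])
    print(np.linalg.norm(A @ v), np.linalg.norm(v))  # equal: condition (ii)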

Exercises.

10.2.1. Let T : V → V be an orthogonal operator. Show that T preserves angles, that is, if u and v are nonzero vectors in V , the angle between T (u) and T (v) is equal to the angle between u and v.

10.2.2. Let T : V → V be an invertible linear operator. Show that the following conditions are equivalent:

(i) hT (u),T (v)i = 0 for all u, v ∈ V satisfying hu, vi = 0.

(ii) T = c S for a (real) unitary operator S : V → V and a real number c 6= 0.

Note. The above shows that we cannot define an orthogonal operator to be a linear operator that preserves orthogonality. In this sense, the terminology of a unitary operator is more appropriate.

10.2.3. Let E be an orthonormal basis of V and let B be an arbitrary basis of V. Show that the change of basis matrix $P = [\,I\,]_E^B$ is orthogonal if and only if B is also an orthonormal basis.

10.2.4. Let T : V → V be an orthogonal operator. Suppose that W is a T -invariant subspace of V . Show that W ⊥ is also T -invariant.

10.2.5. Let $T : V \to V$ be an orthogonal operator. Show that $\det(T) = \pm 1$.

10.2.6. Let V be an inner product space of dimension n. Show that $O(V)$ is a subgroup of $GL(V) = \mathrm{End}_K(V)^\times$, that $O_n(\mathbb{R})$ is a subgroup of $GL_n(\mathbb{R}) = \mathrm{Mat}_{n\times n}(\mathbb{R})^\times$, and define an isomorphism $\varphi : O(V) \xrightarrow{\;\sim\;} O_n(\mathbb{R})$.

10.2.7. Let A ∈ Matn×n(R) be an orthogonal matrix. Show that the columns of A form an orthonormal basis of Rn.

10.2.8. Let V and W be arbitrary inner product spaces. Suppose that there exists a function T : V → W such that T (0) = 0 and kT (v1) − T (v2)k = kv1 − v2k for all v1, v2 ∈ V .

(i) Show that hT (v1),T (v2)i = hv1, v2i for all v1, v2 ∈ V .

(ii) Deduce that kc1T (v1) + c2T (v2)k = kc1v1 + c2v2k for all v1, v2 ∈ V and c1, c2 ∈ R.

(iii) Conclude that, for all v ∈ V and c ∈ R, we have kT (c v) − c T (v)k = 0, hence T (c v) = c T (v).

(iv) Using similar reasoning, show that, for all vectors v1, v2 ∈ V , we have kT (v1 + v2) − T (v1) − T (v2)k=0, hence T (v1 + v2) = T (v1) + T (v2).

(v) Conclude that T is a linear map. In particular, show that if V = W is finite- dimensional, then T is an orthogonal operator in the sense of Definition 10.2.3.

A function T : V → W satisfying the conditions given at the beginning of this exercise is called an isometry.

Hint. For (i), use formula (10.4) with u = T (v1) and v = −T (v2), then with u = v1 and v = −v2. For (ii), use the same formula with u = c1T (v1) and v = c2T (v2), then with u = c1v1 and v = c2v2.

10.3 The adjoint operator

Using the same notation as in the preceding section, we first show:

Theorem 10.3.1. Let B : V × V → R be a bilinear form on V . There exists a unique pair of linear operators S and T on V such that

$$B(u, v) = \langle T(u), v\rangle = \langle u, S(v)\rangle \tag{10.6}$$

for all u, v ∈ V . Furthermore, let E be an orthonormal basis of V and let M be the matrix of B relative to the basis E. Then we have

$$[\,S\,]_E = M \quad\text{and}\quad [\,T\,]_E = M^t. \tag{10.7}$$

Proof. 1st Existence. Let S and T be linear operators on V determined by conditions (10.7), and let $u, v \in V$. Since M is the matrix of B in the basis E, we have
$$B(u, v) = [u]_E^t\, M [v]_E.$$
Since $[T(u)]_E = M^t [u]_E$ and $[S(v)]_E = M [v]_E$, Lemma 10.2.1 gives
$$\langle T(u), v\rangle = (M^t [u]_E)^t\, [v]_E = [u]_E^t\, M [v]_E = B(u, v)$$
and
$$\langle u, S(v)\rangle = [u]_E^t\, (M [v]_E) = B(u, v),$$
as required.

2nd Uniqueness. Suppose that $S', T'$ are arbitrary linear operators on V such that
$$B(u, v) = \langle T'(u), v\rangle = \langle u, S'(v)\rangle$$
for all $u, v \in V$. By subtracting these equalities from (10.6), we get
$$0 = \langle T'(u) - T(u), v\rangle = \langle u, S'(v) - S(v)\rangle$$
for all $u, v \in V$. We deduce that, for all $u \in V$, we have
$$T'(u) - T(u) \in V^\perp = \{0\},$$
hence $T'(u) = T(u)$ and so $T' = T$. Similarly, we see that $S' = S$.

Corollary 10.3.2. For any linear operator T on V, there exists a unique linear operator on V, denoted $T^*$, such that
$$\langle T(u), v\rangle = \langle u, T^*(v)\rangle$$
for all $u, v \in V$. If E is an orthonormal basis of V, we have $[T^*]_E = [\,T\,]_E^t$.

Proof. The function $B : V \times V \to \mathbb{R}$ given by $B(u, v) = \langle T(u), v\rangle$ for all $(u, v) \in V \times V$ is bilinear (exercise). To conclude, it suffices to apply Theorem 10.3.1 to this bilinear form.

Definition 10.3.3 (Adjoint, symmetric, self-adjoint). Let T be a linear operator on V. The operator $T^* : V \to V$ defined by Corollary 10.3.2 is called the adjoint of T. We say that T is symmetric or self-adjoint if $T^* = T$.

By Corollary 10.3.2, we immediately obtain:

Proposition 10.3.4. Let T : V → V be a linear operator. The following conditions are equivalent:

(i) T is symmetric,

(ii) hT (u), vi = hu,T (v)i for all u, v ∈ V ,

(iii) there exists an orthonormal basis E of V such that $[\,T\,]_E$ is a symmetric matrix,

(iv) [ T ]E is a symmetric matrix for every orthonormal basis E of V .

The proof of the following properties is left as an exercise.

Proposition 10.3.5. Let S and T be linear operators on V, and let $c \in \mathbb{R}$. We have:

(i) $(c\,T)^* = c\,T^*$,

(ii) $(S + T)^* = S^* + T^*$,

(iii) $(S \circ T)^* = T^* \circ S^*$,

(iv) $(T^*)^* = T$.

(v) Furthermore, if T is invertible, then $T^*$ is also invertible and $(T^*)^{-1} = (T^{-1})^*$.

Properties (i) and (ii) imply that the map
$$\mathrm{End}_{\mathbb{R}}(V) \longrightarrow \mathrm{End}_{\mathbb{R}}(V), \qquad T \longmapsto T^*$$
is linear. Properties (ii) and (iii) can be summarized by saying that $T \mapsto T^*$ is a ring anti-homomorphism of $\mathrm{End}_{\mathbb{R}}(V)$. More precisely, because of (iv), we say that it is a ring anti-involution.

Example 10.3.6. Every symmetric matrix $A \in \mathrm{Mat}_{n\times n}(\mathbb{R})$ defines a symmetric operator on $\mathbb{R}^n$:
$$T_A : \mathbb{R}^n \longrightarrow \mathbb{R}^n, \qquad X \longmapsto AX.$$
We can verify this directly from the definition by noting that, for all $X, Y \in \mathbb{R}^n$, we have
$$\langle T_A(X), Y\rangle = (AX)^t Y = X^t A^t Y = X^t A Y = \langle X, T_A(Y)\rangle.$$
This also follows from Proposition 10.3.4. Indeed, since the standard basis E of $\mathbb{R}^n$ is orthonormal and $[\,T_A\,]_E = A$ is a symmetric matrix, the linear operator $T_A$ is symmetric.

Example 10.3.7. Let W be a subspace of V, and let $P : V \to V$ be the orthogonal projection on W (that is, the linear map that sends $v \in V$ to its orthogonal projection on W). Then P is a symmetric operator. Theorem 10.1.14 shows that $V = W \oplus W^\perp$. The function P is simply the projection on the first factor of this sum followed by the inclusion of W into V. Let u, v be arbitrary vectors in V. We have
$$\langle P(u), v\rangle = \langle P(u), (v - P(v)) + P(v)\rangle = \langle P(u), v - P(v)\rangle + \langle P(u), P(v)\rangle.$$
Since $P(u) \in W$ and $v - P(v) \in W^\perp$, we have $\langle P(u), v - P(v)\rangle = 0$ and the last equality can be rewritten $\langle P(u), v\rangle = \langle P(u), P(v)\rangle$. A similar calculation gives $\langle u, P(v)\rangle = \langle P(u), P(v)\rangle$. Thus, we have $\langle u, P(v)\rangle = \langle P(u), v\rangle$, which shows that P is symmetric.
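Corollary 10.3.2 says that, in an orthonormal basis, passing to the adjoint is just transposing the matrix. A short numerical sketch (Python with NumPy; the random matrix and vectors are arbitrary choices of ours):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))  # matrix of T in the standard basis E
    u = rng.standard_normal(3)
    v = rng.standard_normal(3)

    # <T(u), v> = <u, T*(v)> with [T*]_E = [T]_E^t:
    print(np.isclose(np.dot(A @ u, v), np.dot(u, A.T @ v)))  # True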

Exercises.

10.3.1. Prove Proposition 10.3.5.

10.3.2. Let T : V → V be a linear operator. Show that ker(T ∗) = (Im(T ))⊥ and that Im(T ∗) = (ker(T ))⊥.

10.3.3. Let T : V → V be a linear operator. Show that T is an orthogonal operator if and only if T ∗ ◦ T = I.

10.3.4. We know that $V = \mathrm{Mat}_{n\times n}(\mathbb{R})$ is an inner product space for the inner product $\langle A, B\rangle = \mathrm{trace}(AB^t)$ (see Exercise 10.1.6). Fix $C \in V$ and consider the linear operator $T : V \to V$ given by $T(A) = CA$ for all $A \in V$. Show that the adjoint $T^*$ of T is given by $T^*(A) = C^t A$ for all $A \in V$. Hint. Use the fact that $\mathrm{trace}(AB) = \mathrm{trace}(BA)$ for all $A, B \in V$.

10.4 Spectral theorems

We again fix an inner product space V of finite dimension $n \ge 1$. We first show:

Lemma 10.4.1. Let $T : V \to V$ be a linear operator on V. There exists a T-invariant subspace of V of dimension 1 or 2.

Proof. We consider V as an R[x]-module for the product p(x)v = p(T )(v). Let v be a nonzero element of V . By Proposition 7.1.7, the annihilator of v is generated by a monic polynomial p(x) ∈ R[x] of degree ≥ 1. Write p(x) = q(x)a(x) where q(x) is a monic irreducible factor of p(x), and set w = a(x)v. Then the annihilator of w is the ideal of R[x] generated by q(x). By Proposition 7.1.7, the submodule W := R[x]w is a T -invariant subspace of V of dimension equal to the degree of q(x). Now, Theorem 5.6.5 shows that all the irreducible polynomials of R[x] are of degree 1 or 2. Thus W is of dimension 1 or 2.

This lemma suggests the following definition: Definition 10.4.2 (Minimal invariant subspace, irreducible invariant subspace). Let T : V → V be a linear operator on V . We say that a T -invariant subspace W of V is minimal or irreducible if W 6= {0} and if the only T -invariant subspaces of V contained in W are {0} and W .

It follows from this definition and Lemma 10.4.1 that every nonzero T -invariant subspace of V contains a minimal T -invariant subspace W , and that such a subspace W is of dimension 1 or 2.

For an orthogonal or symmetric operator, we have more:

Lemma 10.4.3. Let T : V → V be an orthogonal or symmetric linear operator, and let W be a T -invariant subspace of V . Then W ⊥ is also T -invariant.

Proof. Suppose first that T is orthogonal. Then $T|_W : W \to W$ is an orthogonal operator on W (it preserves the norm of the elements of W). Then $T|_W$ is invertible and, in particular, surjective. Thus $T(W) = W$. Let $v \in W^\perp$. For all $w \in W$, we see that
$$\langle T(v), T(w)\rangle = \langle v, w\rangle = 0,$$
thus $T(v) \in (T(W))^\perp = W^\perp$. This shows that $W^\perp$ is T-invariant.

Now suppose that T is symmetric. Let $v \in W^\perp$. For all $w \in W$, we have
$$\langle T(v), w\rangle = \langle v, T(w)\rangle = 0,$$
since $T(w) \in W$. So we have $T(v) \in W^\perp$. Since the choice of $v \in W^\perp$ was arbitrary, this shows that $W^\perp$ is T-invariant.

Lemma 10.4.3 immediately implies:

Proposition 10.4.4. Let T : V → V be an orthogonal or symmetric linear operator. Then V is a direct sum of minimal T -invariant subspaces

V = W1 ⊕ · · · ⊕ Ws

which are pairwise orthogonal, that is, such that $W_i \subseteq W_j^\perp$ for every pair of distinct integers i and j with $1 \le i, j \le s$.

To complete our analysis of symmetric or orthogonal linear operators T : V → V , it thus suffices to study the restriction of T to a minimal T -invariant subspace. By Lemma 10.4.1, we know that such a subspace is of dimension 1 or 2. We begin with the case of an orthogonal operator.

Proposition 10.4.5. Let $T : V \to V$ be an orthogonal operator, let W be a minimal T-invariant subspace of V, and let B be an orthonormal basis of W. Then either
$$\dim_{\mathbb{R}}(W) = 1 \quad\text{and}\quad [\,T|_W\,]_B = (\pm 1)$$
for a choice of sign $\pm$, or
$$\dim_{\mathbb{R}}(W) = 2 \quad\text{and}\quad [\,T|_W\,]_B = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
for some real number $\theta$.

Proof. By Lemma 10.4.1, the subspace W is of dimension 1 or 2. Furthermore, the restriction $T|_W : W \to W$ of T to W is an orthogonal operator since T preserves the norm (see Definition 10.2.3). Thus $[\,T|_W\,]_B$ is an orthogonal matrix, by Corollary 10.2.4.

1st: If $\dim_{\mathbb{R}}(W) = 1$, then $[\,T|_W\,]_B = (a)$ with $a \in \mathbb{R}$. Since $[\,T|_W\,]_B$ is an orthogonal matrix, we must have $(a)^t(a) = 1$, hence $a^2 = 1$ and so $a = \pm 1$.

2nd: If $\dim_{\mathbb{R}}(W) = 2$, then the columns of $[\,T|_W\,]_B$ form an orthonormal basis of $\mathbb{R}^2$ (see Exercise 10.2.7), and so this matrix can be written
$$[\,T|_W\,]_B = \begin{pmatrix} \cos\theta & -\epsilon\sin\theta \\ \sin\theta & \epsilon\cos\theta \end{pmatrix}$$
with $\theta \in \mathbb{R}$ and $\epsilon = \pm 1$. We deduce that
$$\mathrm{char}_{T|_W}(x) = \begin{vmatrix} x - \cos\theta & \epsilon\sin\theta \\ -\sin\theta & x - \epsilon\cos\theta \end{vmatrix} = x^2 - (1 + \epsilon)\cos(\theta)\,x + \epsilon.$$
If $\epsilon = -1$, the polynomial becomes $\mathrm{char}_{T|_W}(x) = x^2 - 1 = (x - 1)(x + 1)$. Then 1 and $-1$ are the eigenvalues of $T|_W$, and for each of these eigenvalues, the operator $T|_W : W \to W$ has an eigenvector. But if $w \in W$ is an eigenvector of $T|_W$ then $\mathbb{R}w$ is a T-invariant subspace of V of dimension 1 contained in W, contradicting the hypothesis that W is a minimal T-invariant subspace. Therefore, we must have $\epsilon = 1$ and the proof is complete.

In the case of a symmetric operator, the situation is even simpler. Proposition 10.4.6. Let T : V → V be a symmetric operator, and let W be a minimal T -invariant subspace of V . Then W is of dimension 1.

Proof. We know that W is of dimension 1 or 2. Suppose that it is of dimension 2, and let B be an orthonormal basis of W. The restriction $T|_W : W \to W$ is a symmetric operator on W since for all $w_1, w_2 \in W$, we have
$$\langle T|_W(w_1), w_2\rangle = \langle T(w_1), w_2\rangle = \langle w_1, T(w_2)\rangle = \langle w_1, T|_W(w_2)\rangle.$$
Thus, by Proposition 10.3.4, the matrix $[\,T|_W\,]_B$ is symmetric. It can therefore be written in the form
$$[\,T|_W\,]_B = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$$
with $a, b, c \in \mathbb{R}$. We then have
$$\mathrm{char}_{T|_W}(x) = \begin{vmatrix} x - a & -b \\ -b & x - c \end{vmatrix} = x^2 - (a + c)x + (ac - b^2).$$
The discriminant of this polynomial is
$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \ge 0.$$
Thus $\mathrm{char}_{T|_W}(x)$ has at least one real root. This root is an eigenvalue of $T|_W$ and so it corresponds to an eigenvector w. Then $\mathbb{R}w$ is a T-invariant subspace of V contained in W, contradicting the minimality hypothesis on W. Thus W is necessarily of dimension 1.

Combining the last three propositions, we obtain a description of orthogonal and symmetric operators on V.

Theorem 10.4.7. Let T : V → V be a linear operator.

(i) If T is orthogonal, there exists an orthonormal basis B of V such that $[\,T\,]_B$ is block diagonal with blocks of the form
$$(1), \quad (-1) \quad\text{or}\quad \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \tag{10.8}$$
(with $\theta \in \mathbb{R}$) on the diagonal.

(ii) If T is symmetric, there exists an orthonormal basis B of V such that [ T ]B is a diagonal matrix.

Result (ii) can be interpreted as saying that a symmetric operator is diagonalizable in an orthonormal basis.

Proof. By Proposition 10.4.4, the space V can be decomposed as a direct sum
$$V = W_1 \oplus \cdots \oplus W_s$$
of minimal T-invariant subspaces $W_1, \dots, W_s$ which are pairwise orthogonal. Choose an orthonormal basis $B_i$ of $W_i$ for each $i = 1, \dots, s$. Then
$$B = B_1 \sqcup \cdots \sqcup B_s$$
is an orthonormal basis of V such that
$$[\,T\,]_B = \begin{pmatrix} [\,T|_{W_1}\,]_{B_1} & & 0 \\ & \ddots & \\ 0 & & [\,T|_{W_s}\,]_{B_s} \end{pmatrix}.$$
If T is an orthogonal operator, Proposition 10.4.5 shows that each of the blocks $[\,T|_{W_i}\,]_{B_i}$ on the diagonal is of the form (10.8). On the other hand, if T is symmetric, Proposition 10.4.6 shows that each $W_i$ is of dimension 1. Thus, in this case, $[\,T\,]_B$ is diagonal.

Corollary 10.4.8. Let $A \in \mathrm{Mat}_{n\times n}(\mathbb{R})$ be a symmetric matrix. There exists an orthogonal matrix $P \in O_n(\mathbb{R})$ such that $P^{-1} A P = P^t A P$ is diagonal.

In particular, this corollary tells us that every real symmetric matrix is diagonalizable over $\mathbb{R}$.

Proof. Let E be the standard basis of $\mathbb{R}^n$ and let $T_A : \mathbb{R}^n \to \mathbb{R}^n$ be the linear operator associated to A, with matrix $[\,T_A\,]_E = A$. Since A is symmetric and E is an orthonormal basis of $\mathbb{R}^n$, the operator $T_A$ is symmetric (Proposition 10.3.4). Thus, by Theorem 10.4.7, there exists an orthonormal basis B of $\mathbb{R}^n$ such that $[\,T_A\,]_B$ is diagonal. Setting $P = [\,I\,]_E^B$, we have
$$[\,T_A\,]_B = P^{-1} [\,T_A\,]_E P = P^{-1} A P$$
(see Section 2.6). Finally, since B and E are two orthonormal bases of $\mathbb{R}^n$, Exercise 10.2.3 tells us that P is an orthogonal matrix, that is, invertible with $P^{-1} = P^t$.
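Numerically, this diagonalization is exactly what numpy.linalg.eigh computes for a real symmetric matrix: it returns the eigenvalues together with an orthogonal matrix of eigenvectors. A minimal sketch (the matrix is an arbitrary choice of ours):

    import numpy as np

    A = np.array([[5.0, 3.0],
                  [3.0, 5.0]])  # a symmetric matrix

    eigenvalues, P = np.linalg.eigh(A)  # columns of P: orthonormal eigenvectors
    print(eigenvalues)                                     # [2. 8.]
    print(np.allclose(P.T @ P, np.eye(2)))                 # True: P is orthogonal
    print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))  # True: P^t A P is diagonal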

Example 10.4.9. Let E be the standard basis of $\mathbb{R}^3$ and let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear operator whose matrix in the basis E is
$$[\,T\,]_E = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$
It is easily verified that $[\,T\,]_E^t\, [\,T\,]_E = I$. Since E is an orthonormal basis of $\mathbb{R}^3$, this implies that T is an orthogonal operator. We see that
$$\mathrm{char}_T(x) = \begin{vmatrix} x & 0 & -1 \\ -1 & x & 0 \\ 0 & -1 & x \end{vmatrix} = x^3 - 1 = (x - 1)(x^2 + x + 1),$$
and so 1 is the only real eigenvalue of T. Since
$$[\,T\,]_E - I = \begin{pmatrix} -1 & 0 & 1 \\ 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}$$
is of rank 2, we see that the eigenspace of T for the eigenvalue 1 is of dimension 1 and is generated by the unit vector
$$u_1 = \frac{1}{\sqrt{3}}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
Then $W_1 = \langle u_1\rangle_{\mathbb{R}}$ is a T-invariant subspace of $\mathbb{R}^3$ and so we have a decomposition
$$\mathbb{R}^3 = W_1 \oplus W_1^\perp$$
where
$$W_1^\perp = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} ;\; x + y + z = 0 \right\}$$
is also a T-invariant subspace of $\mathbb{R}^3$. We see that
$$u_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad u_3 = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}$$
is an orthonormal basis of $W_1^\perp$. Direct computation gives
$$T(u_2) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} = -\frac{1}{2} u_2 + \frac{\sqrt{3}}{2} u_3, \qquad T(u_3) = \frac{1}{\sqrt{6}}\begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix} = -\frac{\sqrt{3}}{2} u_2 - \frac{1}{2} u_3.$$
We also have $T(u_1) = u_1$ since $u_1$ is an eigenvector of T for the eigenvalue 1. We conclude that $B = \{u_1, u_2, u_3\}$ is an orthonormal basis of $\mathbb{R}^3$ such that
$$[\,T\,]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1/2 & -\sqrt{3}/2 \\ 0 & \sqrt{3}/2 & -1/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(2\pi/3) & -\sin(2\pi/3) \\ 0 & \sin(2\pi/3) & \cos(2\pi/3) \end{pmatrix}.$$
We see from this matrix that T fixes each vector of the line $W_1$, while it performs a rotation through the angle $2\pi/3$ in the plane $W_1^\perp$. Thus T is a rotation through angle $2\pi/3$ about the line $W_1$.

Note. This latter description of T does not completely determine it, since $T^{-1}$ is also a rotation through an angle $2\pi/3$ about $W_1$. To distinguish T from $T^{-1}$, one must specify the direction of the rotation.
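One can verify the change of basis in this example numerically; a sketch (Python with NumPy), using the orthonormal basis $B = \{u_1, u_2, u_3\}$ found above:

    import numpy as np

    T = np.array([[0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])

    # Columns are u1, u2, u3, so P = [I]^B_E is orthogonal.
    P = np.column_stack([np.array([1.0, 1.0, 1.0]) / np.sqrt(3),
                         np.array([1.0, -1.0, 0.0]) / np.sqrt(2),
                         np.array([1.0, 1.0, -2.0]) / np.sqrt(6)])

    print(np.round(P.T @ T @ P, 6))
    # [[ 1.        0.        0.      ]
    #  [ 0.       -0.5      -0.866025]
    #  [ 0.        0.866025 -0.5     ]]   i.e. [T]_B as computed above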

Exercises.

10.4.1. Let $R : \mathbb{R}^4 \to \mathbb{R}^4$ be the linear map whose matrix relative to the standard basis E of $\mathbb{R}^4$ is
$$[\,R\,]_E = \frac{1}{2}\begin{pmatrix} -1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \end{pmatrix}.$$

(i) Show that R is an orthogonal operator.

(ii) Find an orthonormal basis B of $\mathbb{R}^4$ in which the matrix of R is block diagonal with blocks of size 1 or 2 on the diagonal, and give $[\,R\,]_B$.

−1 2 2  1 10.4.2. Let A = 2 −1 2 . 3   2 2 −1

(i) Show that A is an orthogonal matrix.

(ii) Find an orthogonal matrix U for which the product $U^t A U$ is block diagonal with blocks of size 1 or 2 on the diagonal, and give this product (i.e. give this block diagonal matrix).

10.4.3. For each of the following matrices A, find an orthogonal matrix U such that $U^t A U$ is a diagonal matrix and give this diagonal matrix.

 0 2 −1 1 2 1 1 (i) A = , (ii) A = , (iii) A = 2 3 −2 . 2 1 1 1   −1 −2 0

10.4.4. Let T be a symmetric operator on an inner product space V of dimension n. Suppose that T has n distinct eigenvalues λ1, . . . , λn and that v1,..., vn are respective eigenvectors for these eigenvalues. Show that {v1,..., vn} is an orthogonal basis of V .

10.4.5. Let T be a symmetric operator on a finite-dimensional inner product space V . Show that the following conditions are equivalent:

(i) All the eigenvalues of T are positive.

(ii) We have hT (v), vi > 0 for all v ∈ V with v 6= 0.

10.4.6. Let V be a finite-dimensional inner product space, and let S and T be symmetric operators on V that commute (i.e. S ◦ T = T ◦ S). Show that each eigenspace of T is S-invariant. Deduce from this observation that there exists an orthogonal basis B of V such that [ S ]B and [ T ]B are both diagonal. (We say that S and T are simultaneously diagonalizable.)

10.5 Polar decomposition

Let V be a finite-dimensional inner product space.

Definition 10.5.1 (Positive definite). We say that a symmetric operator $S : V \to V$ is positive definite if the symmetric bilinear form

$$V \times V \longrightarrow \mathbb{R}, \qquad (u, v) \longmapsto \langle S(u), v\rangle = \langle u, S(v)\rangle \tag{10.9}$$

associated to it is positive definite, that is, if it satisfies

$$\langle S(v), v\rangle > 0 \quad\text{for all } v \in V \setminus \{0\}.$$

By Definition 10.1.1, this amounts to requiring that (10.9) is an inner product on V.

Example 10.5.2. Let T : V → V be an arbitrary invertible operator. Then T ∗ ◦ T : V → V is a positive definite symmetric linear operator. Indeed, the properties of the adjoint given by Proposition 10.3.5 show that

(T ∗ ◦ T )∗ = T ∗ ◦ (T ∗)∗ = T ∗ ◦ T, and so T ∗ ◦ T is a symmetric operator. Also, for all v ∈ V \{0}, we have T (v) 6= 0, and so

h(T ∗ ◦ T )(v), vi = hT ∗(T (v)), vi = hT (v),T (v)i > 0.

This implies that T ∗ ◦ T is also positive definite.

Proposition 10.5.3. Let S : V → V be a linear operator. The following conditions are equivalent:

(i) S is a positive definite symmetric operator;

(ii) There exists an orthonormal basis B of V such that [ S ]B is a diagonal matrix with positive numbers on the diagonal.

Proof. (i)⇒(ii): First suppose that S is positive definite symmetric. Since S is symmetric, Theorem 10.4.7 tells us that there exists an orthonormal basis $B = \{u_1, \dots, u_n\}$ of V such that $[\,S\,]_B$ can be written in the form
$$[\,S\,]_B = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_n \end{pmatrix} \tag{10.10}$$
with $\sigma_1, \dots, \sigma_n \in \mathbb{R}$. Then, for $j = 1, \dots, n$, we have $S(u_j) = \sigma_j u_j$ and so
$$\langle S(u_j), u_j\rangle = \sigma_j \langle u_j, u_j\rangle = \sigma_j \|u_j\|^2 = \sigma_j.$$
Since S is positive definite, we conclude that the elements $\sigma_1, \dots, \sigma_n$ on the diagonal of $[\,S\,]_B$ are positive.

(ii)⇒(i): Conversely, suppose that $[\,S\,]_B$ can be written in the form (10.10) for an orthonormal basis $B = \{u_1, \dots, u_n\}$ and positive real numbers $\sigma_1, \dots, \sigma_n$. Since $[\,S\,]_B$ is symmetric, Proposition 10.3.4 shows that S is a symmetric operator. Furthermore, if v is a nonzero vector of V, it can be written in the form $v = a_1 u_1 + \cdots + a_n u_n$ with $a_1, \dots, a_n \in \mathbb{R}$ not all zero, and we have
$$\langle S(v), v\rangle = \Big\langle \sum_{j=1}^n a_j S(u_j),\; \sum_{j=1}^n a_j u_j \Big\rangle = \Big\langle \sum_{j=1}^n a_j \sigma_j u_j,\; \sum_{j=1}^n a_j u_j \Big\rangle = \sum_{j=1}^n a_j^2 \sigma_j > 0,$$
hence S is positive definite.

Geometric interpretation in R2

Let $S : \mathbb{R}^2 \to \mathbb{R}^2$ be a symmetric operator on $\mathbb{R}^2$, and let $B = \{u_1, u_2\}$ be an orthonormal basis of $\mathbb{R}^2$ such that
$$[\,S\,]_B = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix} = \begin{pmatrix} \sigma_1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & \sigma_2 \end{pmatrix}$$
with $\sigma_1 > 0$ and $\sigma_2 > 0$. Then S can be viewed as the composition of two "dilations" along orthogonal axes, one by a factor of $\sigma_1$ in the direction $u_1$ and the other by a factor of $\sigma_2$ in the direction $u_2$. The overall effect is that S sends the circle of radius 1 centred at the origin to the ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$.

[Figure: S maps the unit circle, with marked orthonormal directions u1 and u2, to the ellipse with semi-axes σ1u1 and σ2u2.]

Proposition 10.5.4. Let $S : V \to V$ be a positive definite symmetric operator. There exists a unique positive definite symmetric operator $P : V \to V$ such that $S = P^2$.

In other words, a positive definite symmetric operator admits a unique square root of the same type.

Proof. By Proposition 10.5.3, there exists an orthonormal basis $B = \{u_1, \dots, u_n\}$ of V such that
$$[\,S\,]_B = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_n \end{pmatrix}$$
with $\sigma_1, \dots, \sigma_n > 0$. Let $P : V \to V$ be the linear operator for which
$$[\,P\,]_B = \begin{pmatrix} \sqrt{\sigma_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{\sigma_n} \end{pmatrix}.$$
Since $[\,S\,]_B = ([\,P\,]_B)^2 = [\,P^2\,]_B$, we have $S = P^2$. Since $[\,P\,]_B$ is also diagonal with positive numbers on its diagonal, Proposition 10.5.3 tells us that P is a positive definite symmetric operator. For the uniqueness of P, see Exercise 10.5.1.
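Following this proof, the square root can be computed from a spectral decomposition; a sketch (Python with NumPy; sqrt_spd is our name, not a library function):

    import numpy as np

    def sqrt_spd(S):
        # Square root of a positive definite symmetric matrix, following the
        # proof of Proposition 10.5.4: diagonalize in an orthonormal basis
        # and take square roots on the diagonal.
        sigma, B = np.linalg.eigh(S)  # S = B diag(sigma) B^t with sigma > 0
        return B @ np.diag(np.sqrt(sigma)) @ B.T

    S = np.array([[5.0, 3.0], [3.0, 5.0]])
    P = sqrt_spd(S)
    print(np.allclose(P @ P, S))  # True: P^2 = S
    print(np.allclose(P, P.T))    # True: P is symmetric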

We conclude this chapter with the following result, which gives a decomposition of in- vertible linear operators T : V → V similar to the polar decomposition of a complex number z 6= 0 in the form z = r u where r is a positive real number and u is a complex number of norm 1 (see Exercise 10.5.3).

Theorem 10.5.5. Let T : V → V be an invertible linear operator. There exists a unique positive definite symmetric operator P : V → V and a unique orthogonal operator U : V → V such that T = U ◦ P .

Proof. By Example 10.5.2, we have that T ∗ ◦ T : V → V is a positive definite symmetric operator. By Proposition 10.5.4, there exists a positive definite symmetric operator P : V → V such that T ∗ ◦ T = P 2. Since P is positive definite, it is invertible. Let

$$U = T \circ P^{-1}.$$

Then we have
$$U^* = (P^{-1})^* \circ T^* = (P^*)^{-1} \circ T^* = P^{-1} \circ T^*$$
and so
$$U^* \circ U = P^{-1} \circ (T^* \circ T) \circ P^{-1} = P^{-1} \circ P^2 \circ P^{-1} = I.$$
This shows that U is an orthogonal operator (see Exercise 10.3.3). So we have $T = U \circ P$ as required. For the uniqueness of P and U, see Exercise 10.5.2.

Example 10.5.6. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear operator given by
$$T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x - y \\ 2x + 2y \end{pmatrix}.$$
Its matrix relative to the standard basis E of $\mathbb{R}^2$ is
$$[\,T\,]_E = \begin{pmatrix} 1 & -1 \\ 2 & 2 \end{pmatrix}.$$
Direct computation gives
$$[\,T^* \circ T\,]_E = \begin{pmatrix} 1 & 2 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 2 & 2 \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix},$$
and so the characteristic polynomial of $T^* \circ T$ is $x^2 - 10x + 16 = (x - 2)(x - 8)$. Thus the eigenvalues of this operator are 2 and 8. We find that $(1, -1)^t$ and $(1, 1)^t$ are eigenvectors of $T^* \circ T$ for these eigenvalues. By Exercise 10.4.4, the latter form an orthogonal basis of $\mathbb{R}^2$. Then
$$B = \left\{ u_1 := \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix},\; u_2 := \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}$$
is an orthonormal basis of $\mathbb{R}^2$ such that
$$[\,T^* \circ T\,]_B = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix}.$$
So we have $T^* \circ T = P^2$ where $P : \mathbb{R}^2 \to \mathbb{R}^2$ is the positive definite symmetric operator for which
$$[\,P\,]_B = \begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{8} \end{pmatrix}.$$
We have that
$$[\,I\,]_E^B = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \quad\text{and}\quad [\,I\,]_B^E = ([\,I\,]_E^B)^{-1} = ([\,I\,]_E^B)^t = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$
where the latter calculation uses the fact that, since E and B are orthonormal bases of $\mathbb{R}^2$, the matrix $[\,I\,]_E^B$ is orthogonal (see Exercise 10.2.3). The matrix of P in the standard basis is therefore
$$[\,P\,]_E = [\,I\,]_E^B\, [\,P\,]_B\, [\,I\,]_B^E = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} \sqrt{2} & 0 \\ 0 & 2\sqrt{2} \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}.$$
We note in passing that this matrix is indeed symmetric, as it should be. Finally, following the steps of the proof of Theorem 10.5.5, we set $U = T \circ P^{-1}$. We find that
$$[\,U\,]_E = \frac{1}{4\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}.$$
In this form, we recognize that U is the rotation through an angle $\pi/4$ about the origin in $\mathbb{R}^2$. In particular, U is indeed an orthogonal operator and we have obtained the required decomposition $T = U \circ P$, with P the symmetric operator that dilates vectors by a factor of $\sqrt{2}$ in the direction $u_1$ and by a factor of $\sqrt{8}$ in the direction $u_2$.

[Figure: the polar decomposition T = U ∘ P: the dilation P sends u1, u2 to P(u1), P(u2), and the rotation U then sends these to T(u1), T(u2).]
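The construction in the proof of Theorem 10.5.5, as carried out in the example above, is easy to reproduce numerically; a sketch (Python with NumPy), reusing the square-root idea of Proposition 10.5.4:

    import numpy as np

    T = np.array([[1.0, -1.0],
                  [2.0,  2.0]])

    # P = sqrt(T^t T) is positive definite symmetric.
    sigma, B = np.linalg.eigh(T.T @ T)
    P = B @ np.diag(np.sqrt(sigma)) @ B.T

    U = T @ np.linalg.inv(P)  # U = T P^{-1}
    print(np.allclose(U.T @ U, np.eye(2)))  # True: U is orthogonal
    print(np.allclose(U @ P, T))            # True: T = U P
    print(np.round(U, 6))                   # [[ 0.707107 -0.707107]
                                            #  [ 0.707107  0.707107]], rotation by pi/4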

Exercises.

10.5.1. Let $P : V \to V$ and $S : V \to V$ be positive definite symmetric operators such that $P^2 = S$. Show that, for every eigenvalue $\lambda$ of P, the eigenspace of P for the eigenvalue $\lambda$ coincides with that of S for the eigenvalue $\lambda^2$. Conclude that P is the unique positive definite symmetric operator on V such that $P^2 = S$.

10.5.2. Let P : V → V be a positive definite symmetric operator, let U : V → V be an orthogonal operator, and let T = U ◦ P . Show that T ∗ ◦ T = P 2. Using the preceding exercise, deduce that P is the unique positive definite symmetric operator and U is the unique orthogonal operator such that T = U ◦ P .

10.5.3. Consider $\mathbb{C}$ as a real vector space and equip it with the inner product $\langle a + ib, c + id\rangle = ac + bd$.

Fix a complex number z 6= 0 and consider the R-linear map T : C → C given by T (w) = zw for all w ∈ C.

(i) Verify that $\langle w_1, w_2\rangle = \mathrm{Re}(w_1 \overline{w_2})$ for all $w_1, w_2 \in \mathbb{C}$.

(ii) Show that the adjoint $T^*$ of T is given by $T^*(w) = \overline{z}\,w$ for all $w \in \mathbb{C}$.

(iii) Let $r = |z|$ and let $u = r^{-1} z$. Show that the operator $P : \mathbb{C} \to \mathbb{C}$ of multiplication by r is symmetric positive definite, that the operator $U : \mathbb{C} \to \mathbb{C}$ of multiplication by u is orthogonal and that $T = U \circ P$ is the polar decomposition of T.

Appendices


Appendix A

Review: groups, rings and fields

Throughout this course, we will call on notions of algebra seen in MAT 2143. In particular, we will use the idea of a ring, which is only touched on in MAT 2143. This appendix presents a review of the notions of groups, rings and fields, together with their simplest properties. We omit most proofs already covered in MAT 2143.

A.1 Monoids

A binary operation on a set E is a function ∗: E × E → E which, to each pair of elements (a, b) of E, associates an element of E denoted a ∗ b. The symbol used to designate the operation can vary. Often, when it will not cause confusion, we simply write ab to denote the result of a binary operation applied to the pair (a, b) ∈ E × E. Definition A.1.1 (Associative, commutative). We say that a binary operation ∗ on a set E is:

• associative if (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ E

• commutative if a ∗ b = b ∗ a for all a, b ∈ E.

Examples A.1.2. Addition and multiplication in Z, Q, R and C are binary operations that are both associative and commutative. On the other hand, for an integer n ≥ 2, multiplication in the set Matn×n(R) of square matrices of size n with coefficients in R is associative but not commutative. Similarly, composition in the set End(S) of maps from a set S to itself is associative but not commutative if |S| ≥ 3. The vector product (cross product) on R3 is a binary operation that is neither commutative nor associative.

One important fact is that if an operation ∗ on a set E is associative, then the way in which we group terms in a product of a finite number of elements a1, . . . , an of E does not

affect the result, as long as the order is preserved. For example, there are five ways to form the product of four elements $a_1, \dots, a_4$ of E if we maintain their order. They are:

((a1 ∗ a2) ∗ a3) ∗ a4, (a1 ∗ (a2 ∗ a3)) ∗ a4, (a1 ∗ a2) ∗ (a3 ∗ a4),

a1 ∗ (a2 ∗ (a3 ∗ a4)), a1 ∗ ((a2 ∗ a3) ∗ a4).

We can check that if ∗ is associative, then all of these products are equal. For this reason, if ∗ is an associative operation, we simply write

$$a_1 * a_2 * \cdots * a_n$$
to denote the product of the elements $a_1, \dots, a_n$ of E, in that order (without using parentheses).

If the operation ∗ is both associative and commutative, then the order in which we multiply the elements of E does not affect their product. We have, for example,

a1 ∗ a2 ∗ a3 = a1 ∗ a3 ∗ a2 = a3 ∗ a1 ∗ a2 = ··· .

Here, the verb “to multiply” and the term “product” are taken in the generic sense: “mul- tiply” means “apply the operation” and the word “product” means “the result of the oper- ation”. When we wish to be more precise, we will use the terms that apply to the specific situation.

Special case: The symbol + is reserved for commutative associative operations. We then say that the operation is an addition and we speak of adding elements. The result of an addition is called a sum. Definition A.1.3 (Identity element). Let ∗ be a binary operation on a set E. We say that an element e of E is an identity element for ∗ if

a ∗ e = e ∗ a = a for all a ∈ E.

One can show that if E has an identity element for an operation ∗, then it only has one such element. Generally, we denote this element by 1, but if the operation is written as +, we denote it by 0. Examples A.1.4. Let n be a positive integer. The identity element for the addition in Matn×n(R) is the zero matrix On of size n × n whose entries are all zero. The identity element for the multiplication in Matn×n(R) is the identity matrix In of size n × n whose element on the diagonal are all equal to 1 and the others are all equal to 0.

Definition A.1.5 (Monoid). A monoid is a set E equipped with an associative binary operation for which E has an identity element.

A monoid is therefore a pair (E, ∗) consisting of a set E and an operation ∗ on E with the above properties. When we speak of a monoid E, without specifying the operation, it is because the operation is implied. In this case, we generally write ab to indicate the product of two elements a and b in E.

Example A.1.6. The pairs (Z, +), (Z, ·), (N, +) and (N, ·) are monoids (recall that N = {0, 1, 2,... } denotes the set of nonnegative integers). For each integer n ≥ 1, the pairs (Matn×n(R), +) and (Matn×n(R), ·) are also monoids. Finally, if S is a set, then (End(S), ◦) is a monoid.

Definition A.1.7. Let (E, ∗) be a monoid. For an integer n ≥ 1 and a ∈ E, we denote by

$$a^n = \underbrace{a * a * \cdots * a}_{n \text{ times}}$$
the product of n copies of a. If the operation is denoted + (hence is commutative), we instead write
$$n\,a = \underbrace{a + a + \cdots + a}_{n \text{ times}}$$
to denote the sum of n copies of a.

In this context, we have the following rule:

Exponent rule. If (E, ∗) is a monoid, we have

$$a^m * a^n = a^{m+n} \quad\text{and}\quad (a^m)^n = a^{mn}$$

for all a ∈ E and all pairs of integers m, n ≥ 1. Furthermore, if (E, ∗) is a commutative monoid, that is if ∗ is commutative (in addition to being associative), we also have

$$a^m * b^m = (a * b)^m$$

for all integers $m \ge 1$ and $a, b \in E$.

Finally, if (E, +) is a commutative monoid written additively, these formulas become

ma + na = (m + n)a, n(ma) = (nm)a and ma + mb = m(a + b)

for all a, b ∈ E and integers m, n ≥ 1.

A.2 Groups

The notion of a group is central in algebra. In this course, the groups we will encounter will mostly be abelian groups, but since we will also see nonabelian groups, it is appropriate to recall the more general notion. We first recall:

Definition A.2.1 (Invertible). Let (E, ∗) be a monoid. We say that an element a of E is invertible (for the operation ∗) if there exists b ∈ E such that

a ∗ b = b ∗ a = 1 (A.1)

One can show that if a is an invertible element of a monoid (E, ∗), then there exists a unique element b of E that satisfies Condition (A.1). We call it the inverse of a and denote it $a^{-1}$. We therefore have
$$a * a^{-1} = a^{-1} * a = 1 \tag{A.2}$$
for every invertible element a of (E, ∗). We can also show:

Lemma A.2.2. Let a and b be invertible elements of a monoid (E, ∗). Then

(i) $a^{-1}$ is invertible and $(a^{-1})^{-1} = a$,

(ii) $ab$ is invertible and $(ab)^{-1} = b^{-1} a^{-1}$.

Important special case: If the operation of a commutative monoid is written +, we write −a to denote the inverse of an invertible element a and we have:

a + (−a) = 0.

If a, b are two invertible elements of (E, +), the assertions (i) and (ii) of Lemma A.2.2 become:

(i’) −a is invertible and −(−a) = a.

(ii’) a + b is invertible and −(a + b) = (−a) + (−b).

Examples A.2.3. 1. All elements of (Z, +) are invertible.

2. The set of invertible elements of (Z, ·) is {1, −1}. 3. If S is a set, the invertible elements of (End(S), ◦) are the bijective maps f : S → S.

Definition A.2.4 (Group). A group is a monoid (G, ∗) whose elements are all invertible.

If we make this definition more explicit by including the preceding definitions, a group is a set E equipped with a binary operation ∗ possessing the following properties:

G1. (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G ;

G2. there exists 1 ∈ G such that 1 ∗ a = a ∗ 1 = a for all a ∈ G ;

G3. for all $a \in G$, there exists $b \in G$ such that $a * b = b * a = 1$.

Definition A.2.5 (Abelian group). We say that a group (G, ∗) is abelian or commutative if the operation is commutative.

Examples A.2.6. 1. The pairs (Z, +), (Q, +), (R, +) are abelian groups. 2. If S is a set, the set Aut(S) of bijections from S to itself is a group under composition.

The second example above is a special case of the following result:

Proposition A.2.7. The set E∗ of invertible elements of a monoid E is a group for the operation of E restricted to E∗.

Proof. Lemma A.2.2 shows that if $a, b \in E^*$, then their product ab remains in $E^*$. Therefore, by restriction, the operation of E induces a binary operation on $E^*$. It is associative and, since $1 \in E^*$, it has an identity element in $E^*$. Finally, for all $a \in E^*$, Lemma A.2.2 shows that the inverse $a^{-1}$ (in E) belongs to $E^*$. Since it satisfies $a\,a^{-1} = a^{-1}\,a = 1$, it is also the inverse of a in $E^*$.

Example A.2.8. If S is a set, the invertible elements of (End(S), ◦) are the bijective maps from S to itself. Therefore the set Aut(S) of bijections $f : S \to S$ is a group under composition.

Example A.2.9. Fix an integer $n \ge 1$. We know that the set $\mathrm{Mat}_{n\times n}(\mathbb{R})$ is a monoid under multiplication. The invertible elements of this monoid are the $n \times n$ matrices with nonzero determinant. These therefore form a group under multiplication. We denote this group $GL_n(\mathbb{R})$:
$$GL_n(\mathbb{R}) = \{A \in \mathrm{Mat}_{n\times n}(\mathbb{R}) \;;\; \det(A) \neq 0\}.$$
We call it the general linear group of order n on $\mathbb{R}$.

Definition A.2.10. Let a be an element of a group G. For every integer $n \in \mathbb{Z}$, we define
$$a^n := \begin{cases} \underbrace{a\, a \cdots a}_{n \text{ times}} & \text{if } n \ge 1,\\[1ex] 1 & \text{if } n = 0,\\[1ex] \underbrace{a^{-1} \cdots a^{-1}}_{-n \text{ times}} & \text{if } n \le -1. \end{cases}$$

Then the Exponent Rule becomes:

Proposition A.2.11. Let a be an element of a group G. For all integers m, n ∈ Z, we have:

(i) $a^m a^n = a^{m+n}$,

(ii) $(a^m)^n = a^{mn}$.

If G is abelian and if $a, b \in G$, we also have

(iii) $(ab)^m = a^m b^m$ for all $m \in \mathbb{Z}$.

The Exponent Rule implies as a special case $(a^{-1})^{-1} = a^{(-1)(-1)} = a^1 = a$.

In the case of an abelian group (G, +) denoted additively, we define instead, for all $n \in \mathbb{Z}$,
$$n\,a = \begin{cases} \underbrace{a + a + \cdots + a}_{n \text{ times}} & \text{if } n \ge 1,\\[1ex] 0 & \text{if } n = 0,\\[1ex] \underbrace{(-a) + \cdots + (-a)}_{-n \text{ times}} & \text{if } n \le -1, \end{cases}$$

and the rule becomes

(i) (ma) + (na) = (m + n)a,

(ii) m(na) = (mn)a,

(iii) m(a + b) = ma + mb, for all integers m, n ∈ Z and elements a, b ∈ G.

A.3 Subgroups

Definition A.3.1 (Subgroup). A subgroup of a group G is a subset H of G such that

SG1. 1 ∈ H,

SG2. ab ∈ H for all a, b ∈ H,

SG3. a−1 ∈ H for all a ∈ H.

It is easy to see that a subgroup H of a group G is itself a group under the operation of G restricted to H (thanks to SG2, the product in G induces a product on H by restriction).

Examples A.3.2. 1. The set 2Z of even integers is a subgroup of (Z, +).

2. The set $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$ of nonzero real numbers is a group under multiplication and the set $\mathbb{R}^*_+ = \{x \in \mathbb{R} \;;\; x > 0\}$ of positive real numbers is a subgroup of $(\mathbb{R}^*, \cdot)$.

We also recall that the set of subgroups of a group G is stable under intersection.

Proposition A.3.3. Let $(G_i)_{i \in I}$ be a collection of subgroups of a group G. Then their intersection $\bigcap_{i \in I} G_i$ is also a subgroup of G.

For example, if X is a subset of a group G, the intersection of all the subgroups of G containing X is a subgroup of G and, since it contains X, it is the smallest subgroup of G that contains X. We call it the subgroup of G generated by X and we denote it hXi. In the case that X is a singleton (i.e. contains a single element) we have the following: Proposition A.3.4. Let a be an element of a group G. The subgroup of G generated by a is n hai = {a ; n ∈ Z}.

Proof. Let $H = \{a^n \;;\; n \in \mathbb{Z}\}$. We have $1 = a^0 \in H$. Furthermore, for all $m, n \in \mathbb{Z}$, the Exponent Rule (Proposition A.2.11) gives
$$a^m a^n = a^{m+n} \in H \quad\text{and}\quad (a^m)^{-1} = a^{-m} \in H.$$
Therefore, H is a subgroup of G. It contains $a^1 = a$. Since every subgroup of G that contains a must also contain all of H, H is the smallest subgroup of G containing a, and so $H = \langle a\rangle$.

Whether or not G is abelian, the subgroup $\langle a\rangle$ generated by a single element is always abelian. If G is abelian with its operation written additively, then the subgroup generated by an element a of G is instead written
$$\langle a\rangle = \{n\,a \;;\; n \in \mathbb{Z}\}.$$

Definition A.3.5 (Cyclic group). We say that a group G is cyclic if G is generated by a single element, that is, if there exists $a \in G$ such that $G = \langle a\rangle$.

Example A.3.6. The group (Z, +) is cyclic, generated by 1.

Example A.3.7. Let n be a positive integer. The set $\mathbb{C}^* = \mathbb{C} \setminus \{0\}$ of nonzero complex numbers is a group under multiplication and the set
$$C_n = \{z \in \mathbb{C}^* \;;\; z^n = 1\}$$
of n-th roots of unity is the subgroup of $\mathbb{C}^*$ generated by $e^{2\pi i/n} = \cos(2\pi/n) + i\sin(2\pi/n)$.
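A quick numerical illustration (Python; a sketch of ours, not part of the notes): the powers of $e^{2\pi i/n}$ exhaust $C_n$, which is why $C_n$ is cyclic.

    import cmath

    n = 6
    zeta = cmath.exp(2j * cmath.pi / n)  # the generator e^{2*pi*i/n}
    powers = [zeta ** k for k in range(n)]

    print(all(abs(z ** n - 1) < 1e-12 for z in powers))  # True: each power lies in C_n
    distinct = {(round(z.real, 9), round(z.imag, 9)) for z in powers}
    print(len(distinct))  # 6: the n powers are pairwise distinct, so C_n = <zeta>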

A.4 Group homomorphisms

Definition A.4.1 (Group homomorphism). Let G and H be two groups. A (group) homo- morphism from G to H is a function ϕ : G → H such that

ϕ(a b) = ϕ(a) ϕ(b)

for all pairs of elements a, b of G.

We recall that if ϕ : G → H is a group homomorphism, then:

$$\varphi(1_G) = 1_H \quad\text{and}\quad \varphi(a^{-1}) = \varphi(a)^{-1} \text{ for all } a \in G,$$

where $1_G$ and $1_H$ denote, respectively, the identity elements of G and H. More generally, we have $\varphi(a^n) = \varphi(a)^n$ for all $a \in G$ and $n \in \mathbb{Z}$. In the case where G and H are abelian groups written additively, we must interpret Definition A.4.1 in the following manner: a homomorphism from G to H is a function $\varphi : G \to H$ such that
$$\varphi(a + b) = \varphi(a) + \varphi(b)$$

for every pair of elements a, b of G. Such a function satisfies ϕ(0G) = 0H and ϕ(−a) = −ϕ(a) for all a ∈ G, and more generally ϕ(na) = nϕ(a) for all n ∈ Z and a ∈ G. We also recall the following results: Proposition A.4.2. Let ϕ : G → H and ψ : H → K be group homomorphisms.

(i) The composition $\psi \circ \varphi : G \to K$ is a homomorphism.

(ii) If ϕ is bijective, the inverse function ϕ−1 : H → G is also a bijective homomorphism.

A bijective homomorphism is called an isomorphism. We say that a group G is isomorphic to a group H if there exists an isomorphism from G to H. This defines an equivalence relation on the class of groups.

Example A.4.3. The map
$$\exp : \mathbb{R} \to \mathbb{R}^*_+, \qquad x \mapsto e^x$$
is a group homomorphism since $e^{x+y} = e^x e^y$ for all $x, y \in \mathbb{R}$. Since it is bijective, it is an isomorphism. Therefore the groups $(\mathbb{R}, +)$ and $(\mathbb{R}^*_+, \cdot)$ are isomorphic.

Proposition A.4.4. Let $\varphi : G \to H$ be a group homomorphism.

(i) The set ker(ϕ) := {a ∈ G ; ϕ(a) = 1H } is a subgroup of G, called the kernel of ϕ.

(ii) The set Im(ϕ) := {ϕ(a); a ∈ G} is a subgroup of H called the image of ϕ.

(iii) The map $\varphi$ is injective if and only if $\ker(\varphi) = \{1_G\}$. It is surjective if and only if $\mathrm{Im}(\varphi) = H$.

Example A.4.5. The map
$$\varphi : \mathbb{R}^* \longrightarrow \mathbb{R}^*, \qquad x \longmapsto x^2$$
is a group homomorphism with kernel $\ker(\varphi) = \{-1, 1\}$ and image $\mathrm{Im}(\varphi) = \mathbb{R}^*_+$.

Example A.4.6. Let $n \in \mathbb{N}^*$. The map
$$\varphi : GL_n(\mathbb{R}) \longrightarrow \mathbb{R}^*, \qquad A \longmapsto \det(A)$$

is a group homomorphism since det(AB) = det(A) det(B) for every pair of matrices A, B ∈ GLn(R) (see Example A.2.9 for the definition of GLn(R)). Its kernel is

SLn(R) := {A ∈ GLn(R) ; det(A) = 1}.

We call it the special linear group of order n on $\mathbb{R}$. It is a subgroup of $GL_n(\mathbb{R})$. In addition, we can easily verify that the image of $\varphi$ is $\mathbb{R}^*$, that is, $\varphi$ is a surjective homomorphism.

Example A.4.7. Let G and H be two groups. Then the cartesian product

G × H = {(g, h); g ∈ G, h ∈ H}

is a group for the componentwise product:

$$(g, h) \cdot (g', h') = (gg', hh').$$

For this group structure, the projection

$$\pi : G \times H \longrightarrow G, \qquad (g, h) \longmapsto g$$

is a surjective homomorphism with kernel

ker(π) = {(1G, h); h ∈ H} = {1G} × H.

A.5 Rings

Definition A.5.1 (Ring). A ring is a set A equipped with two binary operations

$$A \times A \longrightarrow A, \quad (a, b) \longmapsto a + b \qquad\text{and}\qquad A \times A \longrightarrow A, \quad (a, b) \longmapsto ab,$$
called, respectively, addition and multiplication, that satisfy the following conditions:

(i) A is an abelian group under the addition,

(ii) A is a monoid under the multiplication,

(iii) the multiplication is distributive over the addition: we have

a(b + c) = ab + ac and (a + b)c = ac + bc

for all a, b, c ∈ A.

Explicitly, the three conditions of the definition are equivalent to the following eight axioms:

A1. a + (b + c) = (a + b) + c ∀a, b, c ∈ A.

A2. a + b = b + a ∀a, b ∈ A.

A3. There exists an element 0 of A such that a + 0 = a for all a ∈ A.

A4. For all a ∈ A, there exists −a ∈ A such that a + (−a) = 0.

A5. a(bc) = (ab)c ∀a, b, c ∈ A.

A6. There exists an element 1 of A such that a 1 = 1 a = a.

A7. a(b + c) = ab + ac ∀a, b, c ∈ A.

A8. (a + b)c = ac + bc ∀a, b, c ∈ A.

We say that a ring is commutative if the multiplication is commutative, in which case Conditions A7 and A8 are equivalent.

Since a ring is an abelian group under addition, the sum of n elements a1, ··· , an of a ring A does not depend on either the grouping of terms or the order in which we add them. The sum of these elements is simply denoted

$$a_1 + \cdots + a_n \qquad\text{or}\qquad \sum_{i=1}^n a_i.$$
The identity element for the addition, characterized by condition A3, is called the zero of the ring A and we denote it 0. Each element a of A has a unique additive inverse, denoted $-a$ and characterized by Condition A4. For all $n \in \mathbb{Z}$, we define
$$n\,a = \begin{cases} \underbrace{a + a + \cdots + a}_{n \text{ times}} & \text{if } n \ge 1,\\[1ex] 0 & \text{if } n = 0,\\[1ex] \underbrace{(-a) + \cdots + (-a)}_{-n \text{ times}} & \text{if } n \le -1, \end{cases}$$
as in the case of abelian groups. Then we have

$$(m + n)a = ma + na, \quad m(na) = (mn)a \quad\text{and}\quad m(a + b) = ma + mb$$
for all $m, n \in \mathbb{Z}$ and $a, b \in A$. Also, since a ring is a monoid under the multiplication, the product of n elements $a_1, a_2, \dots, a_n$ of a ring A does not depend on the way we group the terms in the product as long as we maintain the order. The product of these elements is simply denoted

$$a_1 a_2 \cdots a_n.$$
If the ring A is commutative, the product of elements of A also does not depend on the order in which we multiply them. Finally, the identity element of A for the multiplication is denoted 1 and is characterized by Condition A6; it is called the unit. For every $n \ge 1$ and $a \in A$, we define, as usual,
$$a^n = \underbrace{a\, a \cdots a}_{n \text{ times}}.$$
We also have the Exponent Rule:

$$a^m a^n = a^{m+n}, \qquad (a^m)^n = a^{mn} \tag{A.3}$$
for all integers $m, n \ge 1$ and $a \in A$.

We say that two elements a, b of A commute if ab = ba. In this case, we also have

$$(ab)^m = a^m b^m$$
for all $m \in \mathbb{N}^*$. In particular, this formula is satisfied for all $a, b \in A$ if A is a commutative ring.

We say that an element a of a ring A is invertible if it has a multiplicative inverse, that is, if there exists $b \in A$ such that $ab = ba = 1$. As seen in Section A.2, this element b is then characterized by this condition. We denote it $a^{-1}$ and call it the inverse of a. For an invertible element a of A, we can extend the definition of $a^m$ to any integer $m \in \mathbb{Z}$ and formulas (A.3) remain valid for all $m, n \in \mathbb{Z}$ (see Section A.2). In a ring A, the laws of addition and multiplication are related by the conditions of distributivity in two terms A7 and A8. By induction on n, we can deduce the properties of distributivity in n terms:

$$a(b_1 + \cdots + b_n) = ab_1 + \cdots + ab_n, \qquad (a_1 + \cdots + a_n)b = a_1 b + \cdots + a_n b,$$
for all $a, a_1, \dots, a_n, b, b_1, \dots, b_n \in A$. Still more generally, combining these two properties, we find
$$\Big( \sum_{i=1}^m a_i \Big)\Big( \sum_{j=1}^n b_j \Big) = \sum_{i=1}^m a_i \Big( \sum_{j=1}^n b_j \Big) = \sum_{i=1}^m \sum_{j=1}^n a_i b_j$$
for all $a_1, \dots, a_m, b_1, \dots, b_n \in A$. Moreover, if $a, b \in A$ commute, we obtain the "binomial formula"
$$(a + b)^n = \sum_{i=0}^n \binom{n}{i} a^i b^{n-i}.$$

Finally, the axioms of distributivity imply

(ma)b = a(mb) = m(ab)

for all a, b ∈ A and m ∈ Z. In particular, taking m = 0 and m = −1, we recover the well-known properties
$$0\,a = a\,0 = 0 \qquad\text{and}\qquad (-a)b = a(-b) = -(ab),$$
where, in the first series of equalities, the symbol 0 represents the zero of the ring.

We also recall that, since a ring A is a monoid under the multiplication, the set of invertible elements of A forms a group under the multiplication. We denote this group A∗.

Example A.5.2. For every integer n ≥ 1, the set $\mathrm{Mat}_{n\times n}(\mathbb{R})$ is a ring under the addition and multiplication of matrices. The group $(\mathrm{Mat}_{n\times n}(\mathbb{R}))^*$ of invertible elements of this ring is denoted $\mathrm{GL}_n(\mathbb{R})$ (see Example A.2.9).

Example A.5.3. Let n ≥ 1 be an integer and let A be an arbitrary ring. The set
$$\mathrm{Mat}_{n\times n}(A) = \left\{ \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix} ;\; a_{11}, \ldots, a_{nn} \in A \right\}$$
of n × n matrices with entries in A is a ring for the operations
$$(a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij}), \qquad (a_{ij}) \cdot (b_{ij}) = (c_{ij}) \quad\text{where}\quad c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}.$$
This does not require that the ring A be commutative.
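The product formula uses only ring arithmetic, so it can be transcribed directly. The following Python sketch (ours, not part of the original notes; the helper name `mat_mul` is our choice) computes the product in $\mathrm{Mat}_{2\times 2}(\mathbb{Z})$ and exhibits the non-commutativity of this ring:

```python
def mat_mul(X, Y):
    """Product of two n x n matrices with entries in a ring.

    Only the ring operations + and * of the entries are used, so the
    same code works over Z, Z/nZ, polynomial rings, etc., as long as
    the entries support + and *.
    """
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Non-commutativity of Mat_2x2(Z): XY != YX in general.
X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
print(mat_mul(X, Y))  # [[1, 0], [0, 0]]
print(mat_mul(Y, X))  # [[0, 0], [0, 1]]
```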

Example A.5.4. Let X be a set and let A be a ring. We denote by F(X,A) the set of functions from X to A. Given f, g ∈ F(X,A), we define f + g and fg to be the functions from X to A given by

(f + g)(x) = f(x) + g(x) and (fg)(x) = f(x)g(x) for all x ∈ X. Then F(X,A) is a ring for these operations.

Example A.5.5. The triples (Z, +, ·), (Q, +, ·), (R, +, ·) are commutative rings.

Example A.5.6. Let n be a positive integer. We define an equivalence relation on Z by setting

a ≡ b mod n ⇐⇒ n divides a − b.

Then Z is partitioned into exactly n equivalence classes represented by the integers 0, 1, . . . , n − 1. We denote by $\bar{a}$ the equivalence class of a ∈ Z and we call it more precisely the congruence class of a modulo n. The set of congruence classes modulo n is denoted Z/nZ. We therefore have
$$\mathbb{Z}/n\mathbb{Z} = \{\bar{0}, \bar{1}, \ldots, \overline{n-1}\}.$$
We define the sum and product of two congruence classes modulo n by
$$\bar{a} + \bar{b} = \overline{a + b} \qquad\text{and}\qquad \bar{a}\,\bar{b} = \overline{ab},$$
the resulting classes $\overline{a+b}$ and $\overline{ab}$ being independent of the choices of representatives a of $\bar{a}$ and b of $\bar{b}$. For these operations, Z/nZ is a commutative ring with n elements. We call it the ring of integers modulo n.
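As an illustration of our own (not from the notes), Z/nZ can be modeled in Python by a small class storing a canonical representative; the class name `Mod` is hypothetical:

```python
class Mod:
    """An element a-bar of Z/nZ, stored by a canonical representative."""
    def __init__(self, a, n):
        self.n = n
        self.a = a % n  # canonical representative in {0, ..., n-1}
    def __add__(self, other):
        return Mod(self.a + other.a, self.n)
    def __mul__(self, other):
        return Mod(self.a * other.a, self.n)
    def __eq__(self, other):
        return self.n == other.n and self.a == other.a
    def __repr__(self):
        return f"{self.a} (mod {self.n})"

# Independence of representatives: 2-bar = 14-bar in Z/12Z,
# so their products with 5-bar agree.
assert Mod(2, 12) == Mod(14, 12)
assert Mod(2, 12) * Mod(5, 12) == Mod(14, 12) * Mod(5, 12)  # both 10 (mod 12)
```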

A.6 Subrings

Definition A.6.1 (Subring). A subring of a ring A is a subset B of A such that

SR1. 0, 1 ∈ B.

SR2. a + b ∈ B for all a, b ∈ B.

SR3. −a ∈ B for all a ∈ B.

SR4. ab ∈ B for all a, b ∈ B.

Such a subset B is itself a ring for the operations of A restricted to B. The notion of subring is transitive: if B is a subring of a ring A and C is a subring of B, then C is a subring of A.

Example A.6.2. In the chain of inclusions Z ⊆ Q ⊆ R ⊆ C, each ring is a subring of its successor.

Example A.6.3. Let A be an arbitrary ring. The set

$$P := \{m\,1_A \;;\; m \in \mathbb{Z}\}$$

of integer multiples of the unit $1_A$ of A is a subring of A. Indeed, we have

$0_A = 0 \cdot 1_A \in P$, $1_A = 1 \cdot 1_A \in P$, and, for all m, n ∈ Z,

$$(m\,1_A) + (n\,1_A) = (m + n)\,1_A \in P,$$
$$-(m\,1_A) = (-m)\,1_A \in P,$$
$$(m\,1_A)(n\,1_A) = (mn)\,1_A \in P.$$
This subring P is the smallest subring of A.

Example A.6.4. Let A be a subring of a commutative ring C and let c ∈ C. The set
$$A[c] := \{a_0 + a_1 c + \cdots + a_n c^n \;;\; n \in \mathbb{N}^*,\; a_0, \ldots, a_n \in A\}$$
is the smallest subring of C containing A and c (exercise).
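For instance, taking A = Z as a subring of C = R and c = √2, the subring Z[√2] consists of the numbers a + b√2 with a, b ∈ Z. A minimal Python sketch of ours (the class name `ZSqrt2` is our choice) models these pairs and shows closure under multiplication:

```python
class ZSqrt2:
    """Element a + b*sqrt(2) of the subring Z[sqrt(2)] of R."""
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __add__(self, other):
        return ZSqrt2(self.a + other.a, self.b + other.b)
    def __mul__(self, other):
        # (a + b*r)(c + d*r) = (ac + 2bd) + (ad + bc)*r, since r^2 = 2
        return ZSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"

x = ZSqrt2(1, 1)   # 1 + sqrt(2)
print(x * x)       # 3 + 2*sqrt(2): the set is closed under products
```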

A.7 Ring homomorphisms

Definition A.7.1. Let A and B be two rings. A ring homomorphism from A to B is a map $\varphi\colon A \to B$ such that

1) $\varphi(1_A) = 1_B$,

2) $\varphi(a + a') = \varphi(a) + \varphi(a')$ for all $a, a' \in A$,

3) $\varphi(a\,a') = \varphi(a)\varphi(a')$ for all $a, a' \in A$.

Condition 2) means that ϕ: A → B is a homomorphism of abelian groups under addition. In particular, it implies that $\varphi(0_A) = 0_B$. On the other hand, Condition 3) does not necessarily imply that $\varphi(1_A) = 1_B$. That is why we add Condition 1). If ϕ: A → B is a ring homomorphism and a ∈ A, we have

$$\varphi(na) = n\varphi(a) \text{ for all } n \in \mathbb{Z}, \qquad \varphi(a^n) = \varphi(a)^n \text{ for all } n \in \mathbb{N}^*.$$

Example A.7.2. Let A be a ring. It is easily verified that the set
$$U = \left\{ \begin{pmatrix} a & b\\ 0 & c \end{pmatrix} ;\; a, b, c \in A \right\}$$
is a subring of $\mathrm{Mat}_{2\times 2}(A)$ and that the map $\varphi\colon U \to A$ given by
$$\varphi\begin{pmatrix} a & b\\ 0 & c \end{pmatrix} = a$$
is a ring homomorphism.

Proposition A.7.3. Let ϕ: A → B and ψ: B → C be ring homomorphisms.

(i) The composition ψ ◦ ϕ: A → C is a ring homomorphism.

(ii) If ϕ is bijective, then the inverse map $\varphi^{-1}\colon B \to A$ is also a bijective ring homomorphism.
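As a concrete instance of Definition A.7.1 (a sketch of ours, not from the notes), the canonical map Z → Z/nZ can be spot-checked numerically; here a class $\bar{a}$ is represented by its representative in {0, ..., n−1} and the function name `phi` is our choice:

```python
def phi(a, n=7):
    """The canonical ring homomorphism Z -> Z/7Z, classes as 0..6."""
    return a % n

for a in range(-5, 6):
    for b in range(-5, 6):
        assert phi(a + b) == (phi(a) + phi(b)) % 7  # phi(a + a') = phi(a) + phi(a')
        assert phi(a * b) == (phi(a) * phi(b)) % 7  # phi(a a') = phi(a) phi(a')
assert phi(1) == 1                                   # phi(1_A) = 1_B
```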

As in the case of groups, a bijective ring homomorphism is called a ring isomorphism. We say that a ring A is isomorphic to a ring B if there exists a ring isomorphism from A to B. This defines an equivalence relation on the class of rings.

Definition A.7.4 (Kernel and image). Let ϕ: A → B be a ring homomorphism. We define the kernel of ϕ by
$$\ker(\varphi) := \{a \in A \;;\; \varphi(a) = 0\}$$
and the image of ϕ by
$$\mathrm{Im}(\varphi) := \{\varphi(a) \;;\; a \in A\}.$$

These are, respectively, the kernel and image of ϕ viewed as a homomorphism of abelian groups. It follows that ker(ϕ) is a subgroup of A under addition, and Im(ϕ) is a subgroup of B under addition. In view of part (iii) of Proposition A.4.4, we can conclude that a ring homomorphism ϕ: A → B is

• injective ⇐⇒ ker(ϕ) = {0},

• surjective ⇐⇒ Im(ϕ) = B.

We can also verify more precisely that Im(ϕ) is a subring of B. However, ker(ϕ) is not generally a subring of A. Indeed, we see that

$$1_A \in \ker(\varphi) \iff 1_B = 0_B \iff B = \{0_B\}.$$

Thus ker(ϕ) is a subring of A if and only if B is the trivial ring consisting of a single element 0 = 1. We note however that if a ∈ A and x ∈ ker(ϕ), then we have

$$\varphi(ax) = \varphi(a)\varphi(x) = \varphi(a)\,0 = 0,$$
$$\varphi(xa) = \varphi(x)\varphi(a) = 0\,\varphi(a) = 0,$$
and so ax and xa belong to ker(ϕ). This motivates the following definition:

Definition A.7.5 (Ideal). An ideal of a ring A is a subset I of A such that

(i) 0 ∈ I,

(ii) x + y ∈ I for all x, y ∈ I,

(iii) ax, xa ∈ I for all a ∈ A and x ∈ I.

Condition (iii) implies that −x = (−1)x ∈ I for all x ∈ I. Thus an ideal of A is a subgroup of A under addition that satisfies Condition (iii). When A is commutative, this last condition becomes simply

ax ∈ I for all a ∈ A and x ∈ I.

In light of the observations preceding Definition A.7.5, we can conclude:

Proposition A.7.6. The kernel of a ring homomorphism ϕ : A → B is an ideal of A.

Example A.7.7. For the homomorphism ϕ : U → A of Example A.7.2, we have

$$\ker(\varphi) = \left\{ \begin{pmatrix} 0 & b\\ 0 & c \end{pmatrix} ;\; b, c \in A \right\}.$$

The latter is an ideal of U.

A.8 Fields

Definition A.8.1 (Field). A field is a commutative ring K with $0 \neq 1$, all of whose nonzero elements are invertible.

If K is a field, each nonzero element a of K has a multiplicative inverse $a^{-1}$. Using the addition and multiplication, we can therefore define an operation of subtraction in K by
$$a - b := a + (-b)$$
and an operation of division by
$$\frac{a}{b} := a\,b^{-1} \quad\text{if } b \neq 0.$$

Example A.8.2. The sets Q, R and C are fields for the usual operations.

Finally, we recall the following result:

Proposition A.8.3. Let n ∈ N∗. The ring Z/nZ is a field if and only if n is a prime number.

For each prime number p, we denote by $\mathbb{F}_p$ the field with p elements.

For p = 2, the addition and multiplication tables for $\mathbb{F}_2 = \{\bar{0}, \bar{1}\}$ are given by
$$\begin{array}{c|cc} + & \bar{0} & \bar{1}\\\hline \bar{0} & \bar{0} & \bar{1}\\ \bar{1} & \bar{1} & \bar{0} \end{array} \qquad\text{and}\qquad \begin{array}{c|cc} \cdot & \bar{0} & \bar{1}\\\hline \bar{0} & \bar{0} & \bar{0}\\ \bar{1} & \bar{0} & \bar{1} \end{array}$$

In $\mathbb{F}_3 = \{\bar{0}, \bar{1}, \bar{2}\}$, they are given by
$$\begin{array}{c|ccc} + & \bar{0} & \bar{1} & \bar{2}\\\hline \bar{0} & \bar{0} & \bar{1} & \bar{2}\\ \bar{1} & \bar{1} & \bar{2} & \bar{0}\\ \bar{2} & \bar{2} & \bar{0} & \bar{1} \end{array} \qquad\text{and}\qquad \begin{array}{c|ccc} \cdot & \bar{0} & \bar{1} & \bar{2}\\\hline \bar{0} & \bar{0} & \bar{0} & \bar{0}\\ \bar{1} & \bar{0} & \bar{1} & \bar{2}\\ \bar{2} & \bar{0} & \bar{2} & \bar{1} \end{array}$$
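Proposition A.8.3 can be illustrated by a brute-force search for inverses (a sketch of ours; the helper name `is_field` is hypothetical):

```python
def is_field(n):
    """Test whether Z/nZ is a field: every nonzero class must be invertible."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

print([n for n in range(2, 20) if is_field(n)])
# [2, 3, 5, 7, 11, 13, 17, 19] -- exactly the primes, as the proposition predicts
```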

Appendix B

The determinant

Throughout this appendix, we fix an arbitrary commutative ring A, with $0 \neq 1$. We extend the notion of determinants to square matrices with entries in A and we show that most of the usual properties of the determinant (those we know for matrices with entries in a field) apply to this general situation with minor modifications. To do this, we use the notion of A-module presented in Chapter 6.

B.1 Multilinear maps

Definition B.1.1 (Multilinear map). Let M1,...,Ms and N be A-modules. We say that a map

ϕ: M1 × · · · × Ms → N is multilinear if it satisfies

ML1. $\varphi(u_1, \ldots, u_j' + u_j'', \ldots, u_s) = \varphi(u_1, \ldots, u_j', \ldots, u_s) + \varphi(u_1, \ldots, u_j'', \ldots, u_s)$,

ML2. $\varphi(u_1, \ldots, a u_j, \ldots, u_s) = a\,\varphi(u_1, \ldots, u_j, \ldots, u_s)$,

for every integer $j = 1, \ldots, s$, all $u_1 \in M_1, \ldots, u_j, u_j', u_j'' \in M_j, \ldots, u_s \in M_s$, and all $a \in A$.

In other words, a map ϕ: M1 ×· · ·×Ms → N is multilinear if, for each integer j = 1, . . . , s and all u1 ∈ M1 ,..., us ∈ Ms, the function

$$\begin{aligned} M_j &\longrightarrow N\\ u &\longmapsto \varphi(u_1, \ldots, u_{j-1}, u, u_{j+1}, \ldots, u_s) \end{aligned}$$

is a homomorphism of A-modules.


Proposition B.1.2. Let $M_1, \ldots, M_s$ be free A-modules, and let $\{v_1^{(j)}, \ldots, v_{n_j}^{(j)}\}$ be a basis of $M_j$ for $j = 1, \ldots, s$. For every A-module N and all

$$w_{i_1 \ldots i_s} \in N \qquad (1 \le i_1 \le n_1, \ldots, 1 \le i_s \le n_s),$$

there exists a unique multilinear map ϕ: M1 × · · · × Ms → N such that

$$\varphi(v_{i_1}^{(1)}, \ldots, v_{i_s}^{(s)}) = w_{i_1 \ldots i_s} \qquad (1 \le i_1 \le n_1, \ldots, 1 \le i_s \le n_s). \tag{B.1}$$

Proof. Suppose that a multilinear map ϕ: M1 × · · · × Ms → N satisfies Conditions (B.1). For all u1 ∈ M1,..., us ∈ Ms, we can write

$$u_1 = \sum_{i_1=1}^{n_1} a_{i_1,1} v_{i_1}^{(1)}, \quad u_2 = \sum_{i_2=1}^{n_2} a_{i_2,2} v_{i_2}^{(2)}, \quad \ldots, \quad u_s = \sum_{i_s=1}^{n_s} a_{i_s,s} v_{i_s}^{(s)}, \tag{B.2}$$

for some elements ai,j of A. Then, by the multilinearity of ϕ, we have

$$\begin{aligned} \varphi(u_1, u_2, \ldots, u_s) &= \sum_{i_1=1}^{n_1} a_{i_1,1}\, \varphi(v_{i_1}^{(1)}, u_2, \ldots, u_s)\\ &= \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} a_{i_1,1} a_{i_2,2}\, \varphi(v_{i_1}^{(1)}, v_{i_2}^{(2)}, u_3, \ldots, u_s)\\ &\;\;\vdots\\ &= \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_s=1}^{n_s} a_{i_1,1} a_{i_2,2} \cdots a_{i_s,s}\, \varphi(v_{i_1}^{(1)}, v_{i_2}^{(2)}, \ldots, v_{i_s}^{(s)})\\ &= \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \cdots \sum_{i_s=1}^{n_s} a_{i_1,1} a_{i_2,2} \cdots a_{i_s,s}\, w_{i_1 \ldots i_s}. \end{aligned} \tag{B.3}$$

This proves that there is at most one multilinear map ϕ satisfying (B.1). On the other hand, for given u1 ∈ M1,..., us ∈ Ms, there is only one choice of aij ∈ A satisfying (B.2) and so formula (B.3) does indeed define a function ϕ: M1 × · · · × Ms → N. We leave it as an exercise to verify that this function is multilinear and satisfies (B.1).
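As a small illustration of ours (not from the notes) of Proposition B.1.2 in the bilinear case s = 2, take $M_1 = M_2 = \mathbb{Z}^2$ with the standard basis and prescribe values $w_{ij} \in \mathbb{Z}$; Formula (B.3) then becomes the explicit double sum computed below (the name `phi` is our choice):

```python
# Prescribed values w[i][j] = phi(e_i, e_j) on the standard basis of Z^2.
w = [[1, 2],
     [3, 4]]

def phi(u, v):
    """The unique bilinear map Z^2 x Z^2 -> Z with phi(e_i, e_j) = w[i][j],
    computed by formula (B.3): phi(u, v) = sum_i sum_j u_i v_j w_ij."""
    return sum(u[i] * v[j] * w[i][j] for i in range(2) for j in range(2))

# Bilinearity in the first argument: phi(u + u', v) = phi(u, v) + phi(u', v).
u, up, v = (1, 2), (5, -1), (3, 4)
usum = (u[0] + up[0], u[1] + up[1])
assert phi(usum, v) == phi(u, v) + phi(up, v)
```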

In this appendix, we are interested in a particular type of multilinear map:

Definition B.1.3 (Alternating multilinear map). Let M and N be A-modules, and let n be a positive integer. We say that a multilinear map $\varphi\colon M^n \to N$ is alternating if it satisfies $\varphi(u_1, \ldots, u_n) = 0$ for all $u_1, \ldots, u_n \in M$ not all distinct.

The qualifier "alternating" is justified by the following proposition:

Proposition B.1.4. Let $\varphi\colon M^n \to N$ be an alternating multilinear map, and let i and j be integers with $1 \le i < j \le n$. For all $u_1, \ldots, u_n \in M$, we have
$$\varphi(u_1, \ldots, \overset{i}{u_j}, \ldots, \overset{j}{u_i}, \ldots, u_n) = -\varphi(u_1, \ldots, \overset{i}{u_i}, \ldots, \overset{j}{u_j}, \ldots, u_n),$$
where the index above an argument indicates its position.

In other words, the sign of ϕ changes when we permute two of its arguments.

Proof. Fix $u_1, \ldots, u_n \in M$. For all $u \in M$, we have
$$\varphi(u_1, \ldots, \overset{i}{u}, \ldots, \overset{j}{u}, \ldots, u_n) = 0.$$
Choosing $u = u_i + u_j$ and using the multilinearity of $\varphi$, we obtain
$$\begin{aligned} 0 &= \varphi(u_1, \ldots, \overset{i}{u_i + u_j}, \ldots, \overset{j}{u_i + u_j}, \ldots, u_n)\\ &= \varphi(u_1, \ldots, u_i, \ldots, u_i, \ldots, u_n) + \varphi(u_1, \ldots, u_i, \ldots, u_j, \ldots, u_n)\\ &\quad + \varphi(u_1, \ldots, u_j, \ldots, u_i, \ldots, u_n) + \varphi(u_1, \ldots, u_j, \ldots, u_j, \ldots, u_n)\\ &= \varphi(u_1, \ldots, u_i, \ldots, u_j, \ldots, u_n) + \varphi(u_1, \ldots, u_j, \ldots, u_i, \ldots, u_n), \end{aligned}$$
the first and last terms vanishing because $\varphi$ is alternating.

The conclusion follows.

Let $S_n$ be the set of bijections from the set $\{1, 2, \ldots, n\}$ to itself. This is a group under composition, called the group of permutations of $\{1, 2, \ldots, n\}$. We say that an element $\tau$ of $S_n$ is a transposition if there exist integers i and j with $1 \le i < j \le n$ such that $\tau(i) = j$, $\tau(j) = i$ and $\tau(k) = k$ for all $k \neq i, j$. In the course MAT 2143 on group theory, you saw that every $\sigma \in S_n$ can be written as a product of transpositions

$$\sigma = \tau_1 \circ \tau_2 \circ \cdots \circ \tau_\ell.$$

This expression is not unique, but the parity of the integer $\ell$ is the same for each one. We define the signature of $\sigma$ by
$$\epsilon(\sigma) = (-1)^\ell,$$

and so every transposition has signature −1. You also saw that the resulting map $\epsilon\colon S_n \to \{1, -1\}$ is a group homomorphism. Using these notions, we can generalize Proposition B.1.4 as follows.

Proposition B.1.5. Let $\varphi\colon M^n \to N$ be an alternating multilinear map, and let $\sigma \in S_n$. For all $u_1, \ldots, u_n \in M$, we have

$$\varphi(u_{\sigma(1)}, \ldots, u_{\sigma(n)}) = \epsilon(\sigma)\,\varphi(u_1, \ldots, u_n).$$

Proof. Proposition B.1.4 shows that, for every transposition τ ∈ Sn, we have

$$\varphi(u_{\tau(1)}, \ldots, u_{\tau(n)}) = -\varphi(u_1, \ldots, u_n)$$

for any choice of u1,..., un ∈ M. From this we conclude that

$$\varphi(u_{\nu(\tau(1))}, \ldots, u_{\nu(\tau(n))}) = -\varphi(u_{\nu(1)}, \ldots, u_{\nu(n)}) \tag{B.4}$$
for all $\nu \in S_n$.

Write $\sigma = \tau_1 \circ \tau_2 \circ \cdots \circ \tau_\ell$ where $\tau_1, \tau_2, \ldots, \tau_\ell$ are transpositions. Let $\sigma_0 = 1$ and $\sigma_i = \tau_1 \circ \cdots \circ \tau_i$ for $i = 1, \ldots, \ell$, so that $\sigma_i = \sigma_{i-1} \circ \tau_i$ for $i = 1, \ldots, \ell$. For each integer i, Relation (B.4) gives
$$\varphi(u_{\sigma_i(1)}, \ldots, u_{\sigma_i(n)}) = -\varphi(u_{\sigma_{i-1}(1)}, \ldots, u_{\sigma_{i-1}(n)}).$$
Since $\sigma = \sigma_\ell$, these $\ell$ relations imply
$$\varphi(u_{\sigma(1)}, \ldots, u_{\sigma(n)}) = (-1)^\ell\, \varphi(u_1, \ldots, u_n).$$
The conclusion follows, since $\epsilon(\sigma) = (-1)^\ell$.

Proposition B.1.5 implies the following:

Theorem B.1.6. Let M be a free A-module with basis $\{e_1, \ldots, e_n\}$, and let $c \in A$. There exists a unique alternating multilinear map $\varphi\colon M^n \to A$ such that $\varphi(e_1, \ldots, e_n) = c$. It is given by
$$\varphi\Big(\sum_{i=1}^n a_{i,1} e_i, \ldots, \sum_{i=1}^n a_{i,n} e_i\Big) = c \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n} \tag{B.5}$$
for all $a_{i,j} \in A$.

Proof. First suppose that there exists an alternating multilinear map $\varphi\colon M^n \to A$ such that $\varphi(e_1, \ldots, e_n) = c$. Let

$$u_1 = \sum_{i=1}^n a_{i,1} e_i, \quad u_2 = \sum_{i=1}^n a_{i,2} e_i, \quad \ldots, \quad u_n = \sum_{i=1}^n a_{i,n} e_i \tag{B.6}$$

be elements of M written as linear combinations of $e_1, \ldots, e_n$. Since $\varphi$ is multilinear, we have
$$\varphi(u_1, u_2, \ldots, u_n) = \sum_{i_1=1}^n \sum_{i_2=1}^n \cdots \sum_{i_n=1}^n a_{i_1,1} a_{i_2,2} \cdots a_{i_n,n}\, \varphi(e_{i_1}, e_{i_2}, \ldots, e_{i_n}) \tag{B.7}$$

(this is a particular case of Formula (B.3)). For integers $i_1, \ldots, i_n \in \{1, 2, \ldots, n\}$ not all distinct, we have $\varphi(e_{i_1}, \ldots, e_{i_n}) = 0$ by Definition B.1.3. On the other hand, if $i_1, \ldots, i_n \in \{1, 2, \ldots, n\}$ are distinct, they determine a bijection $\sigma$ of $\{1, 2, \ldots, n\}$ given by $\sigma(1) = i_1, \sigma(2) = i_2, \ldots, \sigma(n) = i_n$ and, by Proposition B.1.5, we get

$$\varphi(e_{i_1}, \ldots, e_{i_n}) = \varphi(e_{\sigma(1)}, \ldots, e_{\sigma(n)}) = \epsilon(\sigma)\, \varphi(e_1, \ldots, e_n) = c\, \epsilon(\sigma).$$

Thus Formula (B.7) can be rewritten as
$$\varphi(u_1, u_2, \ldots, u_n) = c \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n}.$$

This shows that ϕ is necessarily given by (B.5).

To conclude, we need to show that the function $\varphi\colon M^n \to A$ given by (B.5) is alternating multilinear. We first note that it is multilinear (exercise). Thus it remains to check that, if $u_1, u_2, \ldots, u_n \in M$ are not all distinct, then $\varphi(u_1, u_2, \ldots, u_n) = 0$.

Suppose therefore that there exist indices i and j with $1 \le i < j \le n$ such that $u_i = u_j$. We denote by E the set of permutations $\sigma$ of $S_n$ such that $\sigma(i) < \sigma(j)$, by $\tau$ the transposition of $S_n$ that interchanges i and j, and by $E^c$ the complement of E in $S_n$, that is, the set of permutations $\sigma$ of $S_n$ such that $\sigma(i) > \sigma(j)$. Then an element of $S_n$ belongs to $E^c$ if and only if it is of the form $\sigma \circ \tau$ with $\sigma \in E$. From this we deduce that
$$\varphi(u_1, \ldots, u_n) = c \sum_{\sigma \in E} \big( \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n} + \epsilon(\sigma \circ \tau)\, a_{\sigma(\tau(1)),1} \cdots a_{\sigma(\tau(n)),n} \big).$$

Then, for all $\sigma \in S_n$, we have both $\epsilon(\sigma \circ \tau) = \epsilon(\sigma)\epsilon(\tau) = -\epsilon(\sigma)$ and

$$a_{\sigma(\tau(1)),1} \cdots a_{\sigma(\tau(n)),n} = a_{\sigma(1),\tau(1)} \cdots a_{\sigma(n),\tau(n)} = a_{\sigma(1),1} \cdots a_{\sigma(n),n},$$
the last equality following from the fact that $u_i = u_j$. We conclude that $\varphi(u_1, \ldots, u_n) = 0$, as claimed.

Corollary B.1.7. Let M be a free A-module. Then all bases of M contain the same number of elements.

Proof. Suppose on the contrary that $\{e_1, \ldots, e_n\}$ and $\{f_1, \ldots, f_m\}$ are two bases of M of different cardinalities. Without loss of generality, suppose that $m < n$ and denote by $\varphi\colon M^n \to A$ the alternating multilinear form such that $\varphi(e_1, \ldots, e_n) = 1$. Since $\{f_1, \ldots, f_m\}$ is a basis of M, we can write

$$e_1 = \sum_{i=1}^m a_{i,1} f_i, \quad \ldots, \quad e_n = \sum_{i=1}^m a_{i,n} f_i$$

for some elements ai,j of A. Since ϕ is multilinear, we conclude that

$$\varphi(e_1, \ldots, e_n) = \sum_{i_1=1}^m \cdots \sum_{i_n=1}^m a_{i_1,1} \cdots a_{i_n,n}\, \varphi(f_{i_1}, \ldots, f_{i_n}). \tag{B.8}$$

Now, since $n > m$, for each choice of $i_1, \ldots, i_n \in \{1, \ldots, m\}$, at least two of the integers $i_1, \ldots, i_n$ are equal and therefore, since $\varphi$ is alternating, we get $\varphi(f_{i_1}, \ldots, f_{i_n}) = 0$. Thus, Equality (B.8) implies $\varphi(e_1, \ldots, e_n) = 0$. Since we chose $\varphi$ such that $\varphi(e_1, \ldots, e_n) = 1$, this is a contradiction.

B.2 The determinant

Let n be a positive integer. We know that
$$A^n = \left\{ \begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix} ;\; a_1, \ldots, a_n \in A \right\}$$
is a free A-module with basis
$$e_1 = \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0\\ 1\\ \vdots\\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0\\ \vdots\\ 0\\ 1 \end{pmatrix}.$$

Theorem B.1.6 applied to $M = A^n$ shows that, for this choice of basis, there exists a unique alternating multilinear map $\varphi\colon (A^n)^n \to A$ such that $\varphi(e_1, \ldots, e_n) = 1$, and that it is given by
$$\varphi\left( \begin{pmatrix} a_{1,1}\\ \vdots\\ a_{n,1} \end{pmatrix}, \begin{pmatrix} a_{1,2}\\ \vdots\\ a_{n,2} \end{pmatrix}, \ldots, \begin{pmatrix} a_{1,n}\\ \vdots\\ a_{n,n} \end{pmatrix} \right) = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n}.$$
Now, the cartesian product $(A^n)^n = A^n \times \cdots \times A^n$ of n copies of $A^n$ can be naturally identified with the set $\mathrm{Mat}_{n\times n}(A)$ of $n \times n$ matrices with entries in A. One associates to each element $(C_1, \ldots, C_n)$ of $(A^n)^n$ the matrix $(C_1, \ldots, C_n) \in \mathrm{Mat}_{n\times n}(A)$ whose columns are $C_1, \ldots, C_n$. In this way, the n-tuple $(e_1, \ldots, e_n) \in (A^n)^n$ corresponds to the identity matrix $I_n$ of size $n \times n$. We thus obtain:

Theorem B.2.1. There exists a unique function $\det\colon \mathrm{Mat}_{n\times n}(A) \to A$ that satisfies $\det(I_n) = 1$ and that, viewed as a function of the n columns of a matrix, is alternating multilinear. It is given by
$$\det \begin{pmatrix} a_{1,1} & \ldots & a_{1,n}\\ \vdots & & \vdots\\ a_{n,1} & \ldots & a_{n,n} \end{pmatrix} = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{\sigma(1),1} \cdots a_{\sigma(n),n}. \tag{B.9}$$
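Formula (B.9) can be transcribed directly into Python, at the cost of summing n! terms (a sketch of ours, not part of the notes; it works over any ring whose elements support + and *, and the signature is computed from the inversion count recalled in Section B.3 below):

```python
from itertools import permutations

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based values,
    computed as (-1)^(number of inversions)."""
    n = len(perm)
    inversions = sum(1 for k in range(n) for l in range(k + 1, n)
                     if perm[k] > perm[l])
    return -1 if inversions % 2 else 1

def det(U):
    """Determinant by the Leibniz formula (B.9): the sum over all sigma in S_n
    of sign(sigma) * a_{sigma(1),1} * ... * a_{sigma(n),n}."""
    n = len(U)
    total = 0
    for sigma in permutations(range(n)):
        term = sign(sigma)
        for j in range(n):
            term = term * U[sigma[j]][j]
        total += term
    return total

U = [[1, 2], [3, 4]]
print(det(U))  # -2
```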

This function is called the determinant. We can deduce the effect of an elementary column operation on the determinant:

Proposition B.2.2. Let $U \in \mathrm{Mat}_{n\times n}(A)$ and let $U'$ be a matrix obtained by applying to U an elementary column operation. There are three cases:

(I) If $U'$ is obtained by interchanging two columns of U, then $\det(U') = -\det(U)$;

(II) If $U'$ is obtained by adding to column i of U the product of column j of U and an element a of A, for distinct integers i and j, then $\det(U') = \det(U)$;

(III) If $U'$ is obtained by multiplying column i of U by a unit $a \in A^*$, then $\det(U') = a\,\det(U)$.

Proof. Denote the columns of U by $C_1, \ldots, C_n$. The equality $\det(U') = -\det(U)$ of case (I) follows from Proposition B.1.4. In case (II), we see that
$$\begin{aligned} \det(U') &= \det(C_1, \ldots, \overset{i}{C_i + a\,C_j}, \ldots, \overset{j}{C_j}, \ldots, C_n)\\ &= \det(C_1, \ldots, \overset{i}{C_i}, \ldots, \overset{j}{C_j}, \ldots, C_n) + a\,\det(C_1, \ldots, \overset{i}{C_j}, \ldots, \overset{j}{C_j}, \ldots, C_n)\\ &= \det(U), \end{aligned}$$
since the determinant of a matrix with two equal columns is zero. Finally, in case (III), we have
$$\det(U') = \det(C_1, \ldots, \overset{i}{a\,C_i}, \ldots, C_n) = a\,\det(C_1, \ldots, \overset{i}{C_i}, \ldots, C_n) = a\,\det(U),$$
as claimed.

When A is a euclidean domain, we can bring every matrix U ∈ Matn×n(A) into lower triangular form by a series of elementary column operations (see Theorem 8.4.6). By the preceding proposition, we can follow the effect of these operations on the determinant. To deduce the value of det(U) from this, we then use:

Proposition B.2.3. If U ∈ Matn×n(A) is a lower triangular matrix, its determinant is the product of the elements on its diagonal.

Proof. Write $U = (a_{ij})$. Since U is lower triangular, we have $a_{ij} = 0$ if $i < j$. For every permutation $\sigma \in S_n$ with $\sigma \neq 1$, there exists an integer $i \in \{1, 2, \ldots, n\}$ such that $\sigma(i) < i$, and for this i we obtain $a_{\sigma(1),1} \cdots a_{\sigma(i),i} \cdots a_{\sigma(n),n} = 0$.

Then Formula (B.9) gives $\det(U) = a_{1,1} a_{2,2} \cdots a_{n,n}$.

We also note that the determinant of every square matrix is equal to the determinant of its transpose:

Proposition B.2.4. For every matrix $U \in \mathrm{Mat}_{n\times n}(A)$, we have $\det(U^t) = \det(U)$.

Proof. Write $U = (a_{ij})$. For all $\sigma \in S_n$, we have
$$a_{\sigma(1),1} \cdots a_{\sigma(n),n} = a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)}.$$
Since we also have $\epsilon(\sigma^{-1}) = \epsilon(\sigma)^{-1} = \epsilon(\sigma)$, Formula (B.9) gives

$$\det(U) = \sum_{\sigma \in S_n} \epsilon(\sigma^{-1})\, a_{1,\sigma^{-1}(1)} \cdots a_{n,\sigma^{-1}(n)} = \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{1,\sigma(1)} \cdots a_{n,\sigma(n)} = \det(U^t).$$

In light of this result, we deduce that the function $\det\colon \mathrm{Mat}_{n\times n}(A) \to A$, viewed as a function of the n rows of a matrix, is alternating multilinear. We also get the following analogue of Proposition B.2.2 for elementary row operations, replacing everywhere the word "column" by "row".

Proposition B.2.5. Let $U \in \mathrm{Mat}_{n\times n}(A)$ and let $U'$ be a matrix obtained by applying to U an elementary row operation. There are three cases:

(I) If $U'$ is obtained by interchanging two rows of U, then $\det(U') = -\det(U)$;

(II) If $U'$ is obtained by adding to row i of U the product of row j of U and an element a of A, for distinct integers i and j, then $\det(U') = \det(U)$;

(III) If $U'$ is obtained by multiplying row i of U by a unit $a \in A^*$, then $\det(U') = a\,\det(U)$.

Proof. For example, in case (II), the matrix $(U')^t$ is obtained from $U^t$ by adding to column i of $U^t$ the product of column j of $U^t$ and a. Proposition B.2.2 gives in this case $\det((U')^t) = \det(U^t)$. By Proposition B.2.4, we conclude that $\det(U') = \det(U)$.

Similarly, we have:

Proposition B.2.6. The determinant of a triangular matrix U ∈ Matn×n(A) is equal to the product of the elements of its diagonal.

Proof. If U is lower triangular, this follows from Proposition B.2.3. If U is upper triangular, its transpose $U^t$ is lower triangular, and so $\det(U) = \det(U^t)$ is the product of the elements on the diagonal of $U^t$. Since these elements are the same as those on the diagonal of U, the result follows.

We finish this section with the following result.

Theorem B.2.7. For every pair of matrices U, V ∈ Matn×n(A), we have det(UV ) = det(U) det(V ).

Proof. Fix a matrix $U \in \mathrm{Mat}_{n\times n}(A)$, and consider the function $\varphi\colon (A^n)^n \to A$ given by

$$\varphi(C_1, \ldots, C_n) = \det(U C_1, \ldots, U C_n)$$
for every choice of n columns $C_1, \ldots, C_n \in A^n$. We claim that $\varphi$ is alternating multilinear. Furthermore, if $C_1, \ldots, C_n$ are the columns of a matrix V, then $U C_1, \ldots, U C_n$ are the columns of UV. Therefore, we also have

$$\varphi(e_1, \ldots, e_n) = \det(U I_n) = \det(U).$$
Since Theorem B.1.6 shows that there exists a unique alternating multilinear function from $(A^n)^n$ to A such that $\varphi(e_1, \ldots, e_n) = \det(U)$, and the function $\psi\colon (A^n)^n \to A$ given by

$$\psi(C_1, \ldots, C_n) = \det(U)\,\det(C_1, \ldots, C_n)$$

is another such function, we have $\det(U C_1, \ldots, U C_n) = \det(U)\,\det(C_1, \ldots, C_n)$ for all $(C_1, \ldots, C_n) \in (A^n)^n$, and so $\det(UV) = \det(U)\,\det(V)$ for all $V \in \mathrm{Mat}_{n\times n}(A)$.

B.3 The adjugate

The goal of this last section is to show that the usual formulas for expanding a determinant along a row or column remain valid for all matrices with entries in an arbitrary commutative ring A, and that the notion of the adjugate and its properties also extend to this situation. To do this, we start with an alternate presentation of the determinant which is more useful for describing the determinant of a submatrix.

We first recall that the number of inversions of an n-tuple of distinct integers $(i_1, \ldots, i_n)$, denoted $N(i_1, \ldots, i_n)$, is defined to be the number of pairs of indices $(k, \ell)$ with $1 \le k < \ell \le n$ such that $i_k > i_\ell$. For example, we have $N(1, 5, 7, 3) = 2$ and $N(4, 3, 1, 2) = 5$. Every permutation $\sigma$ of $S_n$ can be identified with the n-tuple $(\sigma(1), \ldots, \sigma(n))$ of its values, and one shows in algebra that $\epsilon(\sigma) = (-1)^{N(\sigma(1),\ldots,\sigma(n))}$.
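For example (our sketch, not from the notes), N can be computed straight from the definition:

```python
def inversions(t):
    """Number of pairs (k, l) with k < l and t[k] > t[l]."""
    return sum(1 for k in range(len(t)) for l in range(k + 1, len(t))
               if t[k] > t[l])

print(inversions((1, 5, 7, 3)))             # 2
print(inversions((4, 3, 1, 2)))             # 5
print((-1) ** inversions((4, 3, 1, 2)))     # -1: the signature of this permutation
```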

Then Formula (B.9) applied to a matrix $U = (a_{i,j}) \in \mathrm{Mat}_{n\times n}(A)$ becomes
$$\det(U) = \sum_{(i_1,\ldots,i_n) \in S_n} (-1)^{N(i_1,\ldots,i_n)}\, a_{i_1,1} \cdots a_{i_n,n}.$$

Since the determinant of U is equal to that of its transpose, we also deduce that
$$\det(U) = \sum_{(j_1,\ldots,j_n) \in S_n} (-1)^{N(j_1,\ldots,j_n)}\, a_{1,j_1} \cdots a_{n,j_n}.$$

Thanks to this formula, if $1 \le f_1 < f_2 < \cdots < f_k \le n$ and $1 \le g_1 < g_2 < \cdots < g_k \le n$ are two increasing sequences of k elements chosen from the integers $1, 2, \ldots, n$, then the submatrix $U' = (a_{f_i,g_j})$ of U of size $k \times k$, formed from the elements of U belonging to the rows with indices $f_1, \ldots, f_k$ and the columns with indices $g_1, \ldots, g_k$, has determinant
$$\det(U') = \sum (-1)^{N(j_1,\ldots,j_k)}\, a_{f_1,j_1} \cdots a_{f_k,j_k}, \tag{B.10}$$
where the sum is over all permutations $(j_1, \ldots, j_k)$ of $(g_1, \ldots, g_k)$.

Definition B.3.1 (Minor, cofactor matrix, adjugate). Let $U = (a_{i,j})$ be an $n \times n$ matrix with entries in A. For $i, j = 1, \ldots, n$, the (i, j)-minor of U, denoted $M_{i,j}(U)$, is the determinant of the $(n-1) \times (n-1)$ submatrix of U obtained by removing from U its i-th row and j-th column. The (i, j)-cofactor of U is
$$C_{i,j}(U) := (-1)^{i+j} M_{i,j}(U).$$
The cofactor matrix of U is the $n \times n$ matrix whose (i, j) entry is the (i, j)-cofactor of U. Finally, the adjugate of U, denoted Adj(U), is the transpose of the cofactor matrix of U.

We first give the formula for expansion along the first row:

Lemma B.3.2. Let $U = (a_{i,j}) \in \mathrm{Mat}_{n\times n}(A)$. We have $\det(U) = \sum_{j=1}^n a_{1,j}\, C_{1,j}(U)$.

Proof. Formula (B.10) applied to the calculation of $M_{1,j}(U)$ gives
$$M_{1,j}(U) = \sum_{(j_1,\ldots,j_{n-1}) \in E_j} (-1)^{N(j_1,\ldots,j_{n-1})}\, a_{2,j_1} \cdots a_{n,j_{n-1}},$$
the sum being over the set $E_j$ of permutations $(j_1, \ldots, j_{n-1})$ of $(1, \ldots, \widehat{j}, \ldots, n)$, that is, the set of $(n-1)$-tuples of integers $(j_1, \ldots, j_{n-1})$ such that $(j, j_1, \ldots, j_{n-1}) \in S_n$. Since $N(j, j_1, \ldots, j_{n-1}) = N(j_1, \ldots, j_{n-1}) + j - 1$ for each of these $(n-1)$-tuples, we see that
$$\begin{aligned} \det(U) &= \sum_{(j_1,\ldots,j_n) \in S_n} (-1)^{N(j_1,\ldots,j_n)}\, a_{1,j_1} \cdots a_{n,j_n}\\ &= \sum_{j=1}^n a_{1,j} \sum_{(j_1,\ldots,j_{n-1}) \in E_j} (-1)^{N(j,j_1,\ldots,j_{n-1})}\, a_{2,j_1} \cdots a_{n,j_{n-1}}\\ &= \sum_{j=1}^n a_{1,j} (-1)^{j-1} M_{1,j}(U), \end{aligned}$$

and the assertion follows, since $(-1)^{j-1} M_{1,j}(U) = (-1)^{1+j} M_{1,j}(U) = C_{1,j}(U)$.
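Lemma B.3.2 translates into a recursive determinant computation, expanding along the first row (a sketch of ours; the name `det_expand` is our choice):

```python
def det_expand(U):
    """Determinant by expansion along the first row (Lemma B.3.2):
    det(U) = sum_j a_{1,j} * (-1)^(1+j) * M_{1,j}(U)."""
    n = len(U)
    if n == 1:
        return U[0][0]
    total = 0
    for j in range(n):
        # Minor M_{1,j+1}: delete the first row and column j (0-based).
        minor = [row[:j] + row[j + 1:] for row in U[1:]]
        total += (-1) ** j * U[0][j] * det_expand(minor)
    return total

U = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
print(det_expand(U))  # 2*(12 - 0) - 0 + 1*(1 - 0) = 25
```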

More generally, we have the following formulas:

Proposition B.3.3. Let $U = (a_{i,j}) \in \mathrm{Mat}_{n\times n}(A)$ and let $k, \ell \in \{1, 2, \ldots, n\}$. We have
$$\sum_{j=1}^n a_{k,j}\, C_{\ell,j}(U) = \sum_{j=1}^n a_{j,k}\, C_{j,\ell}(U) = \begin{cases} \det(U) & \text{if } k = \ell,\\ 0 & \text{otherwise.} \end{cases}$$

Proof. First suppose that $k = \ell$ and denote by $U'$ the matrix obtained from U by moving its k-th row from position k to position 1 (shifting rows 1 through k − 1 down by one), in such a way that, for all $j = 1, \ldots, n$, we have $M_{1,j}(U') = M_{k,j}(U)$. Since this move amounts to $k - 1$ interchanges of adjacent rows, applying Lemma B.3.2 to the matrix $U'$, we have

$$\det(U) = (-1)^{k-1} \det(U') = (-1)^{k-1} \sum_{j=1}^n a_{k,j}\, C_{1,j}(U') = \sum_{j=1}^n a_{k,j}\, C_{k,j}(U).$$

If $k \neq \ell$, we deduce from the above formula that $\sum_{j=1}^n a_{k,j}\, C_{\ell,j}(U)$ is equal to the determinant of the matrix $U''$ obtained from U by replacing its $\ell$-th row by its k-th row. Since this matrix has two identical rows (the k-th and $\ell$-th rows), its determinant is zero. Thus we have $\sum_{j=1}^n a_{k,j}\, C_{\ell,j}(U) = \det(U)\,\delta_{k,\ell}$ for all k and $\ell$. Applying this formula to the matrix $U^t$ and noting that $C_{j,\ell}(U) = C_{\ell,j}(U^t)$ for all j and $\ell$, we have

$$\det(U)\,\delta_{k,\ell} = \det(U^t)\,\delta_{k,\ell} = \sum_{j=1}^n a_{j,k}\, C_{\ell,j}(U^t) = \sum_{j=1}^n a_{j,k}\, C_{j,\ell}(U).$$

Theorem B.3.4. For every matrix U ∈ Matn×n(A), we have

$$U\,\mathrm{Adj}(U) = \mathrm{Adj}(U)\,U = \det(U)\,I.$$

Proof. Indeed, the element in position $(\ell, j)$ of Adj(U) is $C_{j,\ell}(U)$, and so the formulas of Proposition B.3.3 imply that the $(k, \ell)$ entry of the product $U\,\mathrm{Adj}(U)$ and the $(\ell, k)$ entry of the product $\mathrm{Adj}(U)\,U$ are both equal to $\det(U)\,\delta_{k,\ell}$ for all $k, \ell \in \{1, \ldots, n\}$.
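Theorem B.3.4 is easy to check numerically. The following self-contained sketch of ours recomputes the determinant by Formula (B.9), builds Adj(U), and verifies U Adj(U) = det(U) I on an integer matrix (all helper names are our choices):

```python
from itertools import permutations

def det(U):
    """Leibniz formula (B.9); the sign comes from the inversion count."""
    n = len(U)
    total = 0
    for s in permutations(range(n)):
        inv = sum(1 for k in range(n) for l in range(k + 1, n) if s[k] > s[l])
        term = (-1) ** inv
        for j in range(n):
            term *= U[s[j]][j]
        total += term
    return total

def adjugate(U):
    """Adj(U): transpose of the cofactor matrix, entry (j, i) = (-1)^(i+j) M_ij."""
    n = len(U)
    def minor(i, j):
        return det([row[:j] + row[j + 1:] for r, row in enumerate(U) if r != i])
    return [[(-1) ** (i + j) * minor(i, j) for i in range(n)] for j in range(n)]

U = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
A, d = adjugate(U), det(U)
for k in range(3):
    for l in range(3):
        # (U Adj(U))_{k,l} must equal det(U) if k = l, and 0 otherwise.
        assert sum(U[k][m] * A[m][l] for m in range(3)) == (d if k == l else 0)
```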

We conclude with the following criterion:

Corollary B.3.5. The invertible elements of the ring $\mathrm{Mat}_{n\times n}(A)$ are those whose determinant is an invertible element of A.

Proof. Let $U \in \mathrm{Mat}_{n\times n}(A)$. If U is invertible, there exists a matrix $V \in \mathrm{Mat}_{n\times n}(A)$ such that $UV = I$. Taking the determinant of both sides of this equality, we have, thanks to Theorem B.2.7, $\det(U)\det(V) = \det(I) = 1$. Since $\det(V) \in A$, we deduce that $\det(U) \in A^*$. Conversely, if $\det(U) \in A^*$, there exists $c \in A$ such that $c\,\det(U) = 1$, and so the above theorem shows that the matrix $V = c\,\mathrm{Adj}(U)$ of $\mathrm{Mat}_{n\times n}(A)$ satisfies $UV = VU = c\,\det(U)\,I = I$; thus U is invertible.
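For A = Z, the units are ±1, so Corollary B.3.5 says that an integer matrix is invertible over Z exactly when det(U) = ±1, and the proof even gives the inverse as $\det(U)^{-1}\,\mathrm{Adj}(U)$. A minimal 2 × 2 sketch of ours:

```python
def inverse_over_Z_2x2(U):
    """Inverse of a 2x2 integer matrix over Z, when det(U) is a unit (+-1).
    For [[a, b], [c, d]], Adj(U) = [[d, -b], [-c, a]] and V = det(U)^(-1) Adj(U);
    here det(U)^(-1) = det(U) since det(U) = +-1."""
    (a, b), (c, d) = U
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("not invertible over Z: det = %d is not a unit" % det)
    return [[det * d, det * -b], [det * -c, det * a]]

print(inverse_over_Z_2x2([[2, 1], [1, 1]]))  # [[1, -1], [-1, 2]]
```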
