<<

MATH2201 Lecture Notes

Andrei Yafaev (based on notes by Richard Hill, John Talbot and Minhyong Kim)

December 8, 2009

If you find a mistake then please email me ([email protected]).

Contents

1 Number Theory 2 1.1 Euclid’salgorithm ...... 2 1.2 Factorizationintoprimes ...... 8 1.3 Congruences...... 12

2 Polynomial Rings 16 2.1 IrreducibleelementsinRings ...... 16 2.2 Euclid’s algorithm in k[X]...... 19

3 Jordan Canonical Form 23 3.1 Revisionoflinearalgebra ...... 23 3.2 representation of linear maps ...... 27 3.3 Minimalpolynomials...... 33 3.4 GeneralizedEigenspaces ...... 36 3.5 Jordan Bases in the one eigenvalue case ...... 40 3.6 Jordan Canonical (or Normal) Form in the one eigenvalue case ...... 44 3.7 Jordancanonicalformingeneral...... 49

4 Bilinear and Quadratic Forms 52 4.1 MatrixRepresentation ...... 52 4.2 Symmetric bilinear forms and quadratic forms ...... 56 4.3 and diagonalization ...... 58 4.4 ExamplesofDiagonalising...... 60 4.5 Canonical forms over C ...... 63 4.6 Canonical forms over R ...... 63

5 Inner Product Spaces 66 5.1 GeometryofInnerProductSpaces ...... 66 5.2 Gram–Schmidt ...... 70 5.3 Adjoints...... 73 5.4 Isometries...... 74 5.5 OrthogonalDiagonalization ...... 78 1 Lecture 1 Sketch of the course:

Number theory (prime numbers, factorization, congruences); • Polynomials (factorization); • Jordan canonical form (generalization of diagonalizing); • Quadratic and bilinear forms; • Euclidean and Hermitian spaces. • Prerequisites (you should know all this from first year algebra courses)

Fields and vector spaces (bases, , span, subspaces). • Linear maps (, nullity, and image, matrix representation). • Matrix algebra (row reduction, , eigenvalues and eigenvectors, diagonaliza- • tion).

1 Number Theory

For this chapter and the next (Polynomials), I recommend the book ‘A concrete introduction to higher algebra’, by Lindsay Childs. Number theory is the theory of Z = 0, 1, 2,... . Recall also the notation N = 1, 2, 3, 4,... . { ± ± } { } 1.1 Euclid’s algorithm We say that a Z divides b Z iff there exists c Z such that b = ac. We write a b. • ∈ ∈ ∈ | A common divisor of a and b is d that divides both a and b. • The greatest common divisor or highest common factor of a and b is a common factor • d of a and b such that any other common factor is smaller than d. This is written d = gcd(a, b) = hcf(a, b) = (a, b).

Note that every a Z is a factor of 0, since 0 = 0 a. Therefore every number is a common ∈ × factor of 0 and 0, so there is no such thing as hcf(0, 0). However if a, b Z are not both zero, ∈ then they have a highest common factor. In particular if a > 0 then hcf(a, 0) = a. Euclid’s algorithm is a method for calculating highest common factors. Note that if a divides b, then also a divides b or a divides b or a divides b. To remove − − − − this ambiguity, we will usually work with positive integers. The following obvious remark : any divisor of a 0 is smaller or equal to a, is often used in ≥ the proofs.

2 1.1.1 Euclidean divison Let a b 0 be two integers. There exists a UNIQUE pair of ≥ ≥ integers (q,r) satisfying a = qb + r and 0 r < b. ≤ Proof. Two things need to be proved : the existence of (q,r) and its uniqueness. Let us prove the existence. Consider the set S = x,x integer 0: a xb 0 { ≥ − ≥ } The set S is not empty : 1 belongs to S. The set S is bounded : any element x of S satisfies x a . THerefore, S being a bounded set of positive integers, S is finite and hence contains a ≤ b maximal element. Let q be this maximal element and let r := a qb. − We need to prove that 0 r < b. By definition r 0. To prove that r < b, let us argue by ≤ ≥ contradiction. Suppose that r b. Then, replacing r by a qb, we get ≥ − a (q + 1)b 0 − ≥ This means that q + 1 S but q + 1 > q. This contradicts the maximality of q. Therefore r < b ∈ and the existence is proved. Let us now prove the uniqueness. Again we argue by contradiction. Suppose that there exists a pair (q′,r′) satisfying a = q′b+r′ with 0 r < b and such that q = q. By subtracting the inequality 0 r < b to this inequality, ≤ ′ ′ 6 ≤ we get b

r r′ = q q′ b | − | | − | By assumption q = q , hence q q 1 and we get the inequality 6 ′ | ′ − |≥ r r′ b | − |≥ The two inequalities satisfied by r r contradict each other, hence q = q . And now the equality − ′ ′ r r = q q b gives r = r . The uniqueness is proved. 2 | − ′| | − ′| ′ We now prove the following property which lies at the heart of Euclid’s algorithm.

1.1.2 Let a b 0 be two integers and (q,r) such that ≥ ≥ a = bq + r, 0 r < b ≤ Then hcf(a, b) = hcf(b, r).

Proof. Let A := hcf(a, b) and B := hcf(b, r). As r = a bq and A divides a and b, A divides − r. Therefore A is a common factor of b and r. As B is the highest common factor of b and r, A B. ≤ 3 In exactly the same way, one proves (left to the reader), that B A and therefore A = B. ≤ 2

This proposition leads to the following algorithm (Euclids algorithm). Let a b 0 be two integers. We wish to calculate hcf(a, b). ≥ ≥ The method is this: Set r1 = b and r2 = a. If ri 1 = 0 then define for i 3: − 6 ≥

ri 2 = qiri 1 + ri, 0 ri

The fact that ri < ri 1 (strict inequality !!!) implies that there will be an integer n such that − rn = 0. Then by the theorem we have

hcf(a, b) = hcf(r2,r3)= ... = hcf(rn 1, 0) = rn 1. − −

1.1.3 Remark When performing Euclid’s algorithm, be very careful not to divide qi by ri. This is a mistake very easy to make.

1.1.4 Example Take a = 27 and b = 7. We have

27 = 3 7 + 6 × 7 = 1 6 + 1 × 6 = 6 1 + 0. × Therefore hcf(27, 7) = hcf(7, 6) = hcf(6, 1) = hcf(1, 0) = 1.

Euclid’s algorithm is very easy to implement on a computer. Suppose that you have some standard computer language (Basic, Pascal, Fortran,...) and that it has an instruction r := a mod b which returns the remainder of the Euclidean division of a by b. The implementation of the algorithm would be something like this:

Procedure hcf(a, b) If a < b then Swap(a, b) While b = 0 6 Begin r := a mod b a := b b := r End Return a End

The following lemma is very important for what will follow. It is essentially ‘the Euclid’s algorithm’ run backwards. 4 1.1.5 Bezout’s Lemma As usual, let a b 0 be integers. Let d = hcf(a, b). Then there ≥ ≥ are integers h, k Z such that ∈ d = ha + kb.

Note that in this lemma, the integers h and k are not positive, in fact exactly one of them is negative or zero. Prove it ! Proof. Consider the sequence given by Euclid’s algorithm:

a = r1, b = r2,r3,r4,...,rn = d.

In fact we’ll show that each of the integers ri may be expressed in the form ha+kb. We prove this by induction on i: it is certainly the case for i = 1, 2 since r = 1 a+0 b and r = 0 a+1 b. 1 × × 2 × × For the inductive step, assume it is the case for ri 1 and ri 2, i.e. − −

ri 1 = ha + bk, ri 2 = h′a + k′b. − − We have ri 2 = qri 1 + ri. − − Therefore r = h′a + k′b q(ha + kb) = (h′ qh)a + (k′ qk)b. i − − − 2

1.1.6 Example Again we take a = 27 and b = 7.

27 = 3 7 + 6 × 7 = 1 6 + 1 × 6 = 6 1 + 0. × Therefore

1 = 7 1 6 − × = 7 1 (27 3 7) − × − × = 4 7 1 27. × − × So we take h = 1 and k = 4. − We now apply the Euclid’s algorithm and B´ezout’s lemma to the solution of linear diophan- tine equations. Let a,b,c be three positive integers. A linear diophantine equation (in two variables) is the equation ax + by = c A solution is a pair (x, y) of integers (not necessarily positive) that satisfy this relation. Such an equation may or may not have solutions. For example, consider 2x + 4y = 5. Quite clearly, if there was a solution, then 2 will divide the right hand side, which is 5. This is not the case, therefore, this equation does not have a solution. 5 On the other hand, the equation 2x + 4y = 6 has many solutions : (1, 1), (5, 1),.... This − suggests that the existence of solutions depends on whether or not c is divisible by the hcf(a, b) and that if such is the case, there are many solutions to the equation. This indeed is the case, as shown in the following theorem. Before we prove the theorem, let us prove a couple of preliminary lemmas.

1.1.7 Lemma Let a and b be two positive integers. Let d := hcf(a, b). Then hcf(a/d, b/d) = 1.

Proof. Use B´ezout’s lemma : there exist h, k such that ah + kb = d. Divide by d to get a b a b 2 d h + d k = 1. Any common divisor of d and d divides one, hence it’s one. Two integers a and b such that hcf(a, b) = 1 are called coprime or relatively prime.

1.1.8 Lemma Suppose a divides bc and hcf(a, b) = 1, then a divides c.

Proof. Write bc = ga. As usual 1 = ha + kb. Multiply by c to get

c = h(ac)+ k(bc)= h(ac)+ k(ga)= a(hc + kg) Hence a divides c. 2

1.1.9 Solution to linear diophantine equations Let a,b,c be three positive integers, let d := hcf(a, b) and consider the equation

ax + by = c

1. This equation has a solution if and only if d divides c

2. Suppose that d c and let (x , y ) be a solution. The set of all solutions is (x +n b , y n a ) | 0 0 0 d 0 − d where n runs through the set of all integers (positive and negative).

Proof. For the ‘if’ part : Suppose there is a solution (x, y). Then d divides ax + by. But, as ax + by = c, d divides c. For the ‘only if’ part : Suppose that d divides c and write c = dm for some integer m. By B´ezout’s lemma there exist integers h, k such that

d = ha + kb

Multiply this relation by m and get

c = dm = (mh)a + (mk)b

This shows that (x0 = mk, y0 = mh) is a solution to the equation. That finishes the ‘only if’ part.

6 Let us now suppose that the equation has a solution (in particular d divides c) (x0, y0). Let (x, y) be any other solution. Subtract ax + by = c from ax0 + by0 = c to get

a(x x)+ b(y y) = 0 0 − 0 − Divide by d to get a b (x x)= (y y) d 0 − −d 0 − This relation shows that a divides b (y y) but the integers a and b are coprime (lemma 1.1.7) d d 0 − d d hence a divides y y (lemma 1.1.8). d 0 − Therefore, there exists an integer n such that a y = y n 0 − d Now plug this into the equality a (x x)= b (y y) to get that d 0 − − d 0 − b x = x + n 0 d 2

The proof of this theorem gives a procedure for finding solutions, it is as follows:

1. Calculate d = hcf(a, b). If d does not divide c, then there are no solutions and you’re done. If d divides c, c = md then there are solutions.

2. Run Euclid’s algorithm backwards to get h, k such that d = ha + kb. Then (x0 = mh, y0 = mk) is a solution.

3. All solutions are b a (x + n , y n ) 0 d 0 − d where n runs through all integers.

1.1.10 Example Take a = 27, b = 7,c = 5. We have found that hcf(a, b) = 1 (in particular there will be solutions with any c) and that 1 = 4 7 1 27 hence h = 1 and k = 4. × − × − Our procedure gives a particular solution : ( 5, 20) and the general one ( 5 + 7n, 20 27n). − − −

Take a = 666, b = 153,c = 43. We have found that hcf(a, b) = 9, it does not divide 43, hence no solutions. Take c = 45 = 5 9. There will be solutions. We had 9 = 3 666 13 153. A particular × × − × solution is (15, 65) and the general one is (15 + 17n, 65 74n). − − − (in particular there will be solutions with any c) and that 1 = 4 7 1 27 hence h = 1 × − × − and k = 4. Our procedure gives a particular solution : ( 5, 20) and the general one is ( 5+7n, 20 27n). − − −

7 Lecture 2

1.2 Factorization into primes 1.2.11 Definition An integer p Z, not equal to 1 is prime iff the only divisors of p are 1 ∈ ± ± and p. ± As usual, we will work with positive integers, in which case the defintion becomes p is prime if and only if its only divisors are 1 and p.

1.2.12 Euclid’s Theorem If p is a prime and p ab then p a or p b. | | | Proof. Suppose that p a then hcf(a,p) = 1. By lemma 1.1.8, p divides b. 2 6|

1.2.13 Example Prove that if a and b divide n and a and bare coprime, then ab divide n.

1.2.14 Corollary If p a a then there exists 1 i n such that p a . | 1 ··· n ≤ ≤ | i Proof. True for i = 1, 2 so suppose true for n 1 and suppose that p a a . Let A = − | 1 ··· n a1 an 1 and B = an then p AB = p A or p B. In the latter case we are done and in the ··· − | ⇒ | | former case the inductive hypothesis implies that p a for some 1 i n 1. 2 | i ≤ ≤ −

1.2.15 Unique Factorisation Theorem If a 2 is an integer then there are primes p > 0 ≥ i such that a = p p p . 1 2 ··· s Moreover this factorisation is unique in the sense that if

a = q q q 1 2 ··· t for primes qj > 0 then s = t and p ,...,p = q ,...,q { 1 s} { 1 s} (equality of sets) In other words, the pis and the qis are the same prime numbers up to reodering.

Proof. For existence suppose the result does not hold. Then there an integer which can not be written as a product of primes. Among all those integers, there is a smallest one (the integers are under consideration are greater than two !). Let a be this smallest integer which is not a product of primes. Certainly a is not prime so a = bc with 1

b = p p 1 ··· k

8 and c = p p k+1 ··· l hence a = p p , 1 ··· l a contradiction hence the factorisation exists. For uniqueness suppose that we have an example where there are two distinct factorisations. Again we can choose a smallest integer with two diffrent factorisations a = p p = q q . 1 ··· s 1 ··· t Then p q q so by Corollary 1.4 we have p q for some 1 j t then since p and q are 1| 1 ··· t 1| j ≤ ≤ 1 j primes we have p1 = qj. But then dividing a by p1 we have a smaller integer with two distinct factorisations, a contradiction. 2

1.2.16 Remark Of course, the primes in the factorisation a = p p need not be distinct. 1 ··· s 2 3 For example : 4 = 2 , here p1 = p2 = 2. Similarly 8 = 2 ,p1 = p2 = p3 = 3. Also 12 = 3 22,p = 3,p = p = 2 × 1 2 3 In fact we have that for any integer a 2, there exist s distinct primes p ,...,p and t ≥ 1 s integers ei 1 such that ≥ 1 t a = pe pe 1 ··· s

1.2.17 Example 1000 = 23 53 × 144 = 24 32 × 975 = 23 53 ×

Factoring a given integer is hard as there is no procedure like Euclidean algorithm. One unsually does it by trial and error. The following trivial lemma helps.

1.2.18 Lemma Square root test Let n be a composite (not prime) integer. Then n has a prime divisor < √n. Proof. Write n = ab with a,b > 1. Suppose that a > √n, then n = ab > √nb hence b < √n and therefore any prime divisor of b is < √n. 2 For example, suppose you were to factor 3372. Clearly it’s divisible by 2 : 3372 = 2 1686. × Now, 1686 is again divisible by two : 1686 = 2 843 and 3372 = 22 843. Now we notice that × × 3 divides 843 = 3 281. Now the primes < √281 are 2, 3, 5, 7, 11, 13 and 281 is not divisible by × any of these. Hence 281 is prime and we get a factorisation: 3372 = 22 3 281 · · How many primes there are ? Here is the answer. 9 1.2.19 Euclid’s Theorem There exist infinitely many primes.

Proof. Suppose not, let p ,...p be all the primes there are. Consider Q = p p p + 1. 1 n 1 2 ··· n Since Q has a prime factorisation then there is a prime P that divides Q, but this cannot be in our list of primes since any prime pi leaves a remainder of 1 when we divide Q by pi. 2 The idea we used here is this : suppose the set of all primes is finite, we construct an integer that is not divisible by any of the primes from this set. This is a contradiction. Can we use the same idea to prove that there are infinitely many primes of a certain form ? Quite clearly Euclid’s theorem shows that there are infinitely many odd primes since the only even prime is 2. Put in another way, it shows that there are infinitely many promes of the form 2k + 1. Let’s look at primes of the form 4k + 3. Are there infinitely many of them ? Suppose there are finitely many and list them p1,...,pr. Note that p1 = 3. Consider Q = 4p p + 3 (note that we started at p !!!). 2 ··· r 2 The integer Q is clearly not divisible by 3 (becaise p = 3 for all i> 1). i 6 None of the pi, i> 2 divides Q. Indeed suppose some pi, i> 2 divides Q. Then

4p p +3= p k 2 ··· r i which shows that pi divides 3 which is not the case. To get a contradiction, we need to prove that Q is divisible by a prime of the form 4k + 3, for it will have necessarily be one of the pis. Thisis precisely what we are proving.

1.2.20 Lemma Every integer of the form 4k + 3 has a prime factor of the form 4k + 3. Proof. Let N = 4k + 3. If N is prime, then take for the factor N itself. Let us proceed by induction : suppose that the result is true for all integers strictly less than N. We can and do assume that N is composite. As N is odd, it factors as a product of two odd numbers. Any odd number is of the form 4k + 1 or 4k + 3. We have the following possibilities.

1. N = (4a + 1)(4b + 1) = 4(4ab + a + b) + 1.

2. N = (4a + 1)(4b + 3) = 4(4ab + 3a + b) + 3.

3. N = (4a + 3)(4b + 3) = 4(4ab + 3a + 3b + 2) + 1.

Notice that only the case two occur - cases one and three are not of the form 4k + 3 and case two has a factor of the form 4k + 3. One concludes by induction. 2 Note that the proof does not work if you try to prove that there are infinitely many primes of the form 4k + 1. This is where it fails. The first prime of this form is 5 = 4 1 + 1 but when × you try to construct your Q, you get Q = 4 5+1=21=3 7. The divisors of Q are of the × × form 4k + 3, not 4k + 1.... In other words, the method fails because the divisors of Q can have no divisor of the form 4k + 1. It is however true that there are infinitely many primes of the form 4k + 1, in fact, there is the following spectacular theorem : 10 1.2.21 Dirichlet’s theorem on primes in arithmetic progressions Let a and d be two coprime integers. There exist infinitely many primes of the form a + kd. The proof of this theorem is well beyond the scope of this course. To conclude, here are some questions to think about :

1. Is any positive integer a sum of two prime numbers ? For example : 8 = 3 + 5, 80 = 37 + 43, 800 = 379 + 421,...

2. Are there infinitely many primes p such that p +2 is prime ? Ex. (3, 5), (17, 19), (881, 883), (1997, 1999),...

11 Lecture 3

1.3 Congruences We define a b mod m iff m (a b). ≡ | − We say a is congruent to b modulo m. The congruency class of a is the set of numbers congruent to a modulo m. This is written [a]. Every integer is congruent to one of the numbers 0, 1,...,m 1, so the set of all congruency − classes is [0],..., [m 1] . This is written Z/mZ. { − } Ex. Take m =3, them [8]=[5]=[2]=[ 1]=[ 4] = ... − − For an integer k, 4k + 1 1 mod 4, 4k + 3 3 mod 4 and 4k 0 mod 4. ≡ ≡ ≡ An integer is even if and only if it is zero mod 2. An integer is odd if and only if it is one mod 2. Let a b be two positive integers and let (q,r) be such that a = bq + r. Then a r mod b. ≥ ≡ It may help to think of congruences as the remainders of the Euclidean division.

1.3.22 Proposition If a b mod m then b a mod m. ≡ ≡ If a a mod m and b b mod m then a + b a + b mod m and ab a b mod m. ≡ ′ ≡ ′ ≡ ′ ′ ≡ ′ ′ Proof. easy 2

We can rewrite this proposition by simply saying:

[a]+[b]=[a + b] and [a][b]=[ab]

The proposition says that these operations + and are well defined operations on Z/mZ. × Ex. Write down addition and multiplication tables in Z/3Z, Z/4Z and Z/6Z. By an inverse of a modulo m we mean a number c such that ac 1 mod m. This is written ≡ c a 1 mod m. ≡ − An element may or may not have an inverse mod m. Take m = 6. [5] has an inverse in Z/6Z :

[5] [5] = [25] = [1] × While [3] does not have an inverse : in Z/6Z we have [3][2] = [6] = [0]. So if [3] had an inverse, say [a], we would have [3][a] = [1], and by miltiplying by [2] we would get [0] = [2] which is not the case. This suggests that the existence of the inverse of a mod m has something to do with common factors of a and m. This is indeed the case as shown in the following lemma.

1.3.23 Lemma An integer a has an inverse modulo m if and only if a and m are coprime (hcf(a, m) = 1).

Proof. The integer a has an inverse mod m if and only if the equation

ax + my = 1 has a solution. This equation has a solution if and only if hcf(a, m) divides 1 which is only possible if hcf(a, m)=1. 2 12 As usual, the proof of the lemma gives a procedure for finding inverses. Use Euclidean algorithm to calculate hcf(a, m). If it’s not one, there is inverse. If it is one run the algorithm backwards to find h and k such that ah + mk = 1 and

1 [a]− =[h]

1 1.3.24 Example Find 43− mod 7. Euclid’s algorithm :

43 = 6 7 + 1 ×

1 They are coprime and 1 = 6 7 + 1 43. Hence 43− = 1 mod 7 1 − × × Same with 32− mod 7. 32 = 4 7 + 4 ∗ 7 = 1 4 + 3 ∗ 4 = 1 3 + 1 ∗

And

1 = (1 4) + ( 1 3) = ( 1 7) + (2 4) = (2 32) + ( 9 7) = ( 9 7) + (2 32) ∗ − ∗ − ∗ ∗ ∗ − ∗ − ∗ ∗ 1 Hence 32− = 2 mod 7. 1 Same with 49− mod 15. 49 = 3 15 + 4 ∗ 15 = 3 4 + 3 ∗ 4 = 1 3 + 1 ∗

And get

1 = (1 4) + ( 1 3) = ( 1 15) + (4 4) = (4 49) + ( 13 15) = ( 13 15) + (4 49) ∗ − ∗ − ∗ ∗ ∗ − ∗ − ∗ ∗ 1 Hence 49− mod 7 = 4. More generally, suppose we want to solve an equation

ax = c mod b

This is equivalent to the existence of an integer y such that

ax + by = c

And we know how to solve this ! This equation has a solution if and only if hcf(a, b) divides c and we know how to find all the solutions. 13 1.3.25 Example Give examples here.

1.3.26 Corollary Z/p = F is a field. (Recall that F = 0, 1,...,p 1 with addition and p p { − } multiplication defined modulo p.)

Proof. This was proved last year. The only axiom, which is not trivial to check, is the one which states that every every non-zero element has an inverse. 2

1.3.27 Corollary F = 1, 2,...,p 1 is a group with the operation of multiplication. p× { − } Proof. A group is a set with a binary operation (in this case multiplication), such that (i) the operation is associative; (ii) there is an identity element; (iii) every element has an inverse. Clearly [1] is the identity element, and the Lemma says that every element has an inverse. 2

1.3.28 Fermat’s Little Theorem If p is prime and a Z then ∈ ap a mod p. ≡

Hence if p a then ap 1 1 mod p. 6| − ≡ Proof. If p a then a 0 mod p and ap 0 mod p so suppose p a, and so a F . Recall | ≡ ≡ 6| ∈ p× that by a corollary to Lagrange’s Theorem, the order of an element of a group divides the order of the group. Let n be the order of a, so an 1. But by the corollary to Lagrange’s theorem, ≡ n p 1. 2 | − Example What is 3322 mod 23? 23 is prime so 3322 1 mod 23. 101 102 ≡ 101 1 How about 3 mod 103? Well 103 is prime so 3 1 mod 103 So 3 3− mod 103. 1 ≡ ≡ To find 3− mod 103 use Euclid’s algorithm.

103 = 3 34 + 1. × So 3 1 34 mod 103. Hence 3101 34 mod 103. − ≡ ≡ Another example : 326 mod 7. We know that 327 mod 32 mod 7. It follows that 327 = 1 1 32− mod 7. It suffices to calculate 32− mod 7. We get

32 = 4 7 + 4 ∗ 7 = 1 4 + 3 ∗ 4 = 1 3 + 1 ∗ and

1 = (1 4) + ( 1 3) = ( 1 7) + (2 4) = (2 32) + ( 9 7) = ( 9 7) + (2 32) ∗ − ∗ − ∗ ∗ ∗ − ∗ − ∗ ∗ Hence 32 1 2 mod 7 and 326 2 mod 7. − ≡ ≡ Yet another example : 4535 mod 13. 14 We have 13 2 = 26 and 4513 45 mod 13. Hence 4535 = 452 459 = 4511 mod 13. As 12 × 11 ≡1 × 45 ∼= 1 mod 13, we have 45 = 45− mod 13 1 We need to calculate 45− mod 13. Eucledian algorithm : We get

45 = 3 13 + 6 ∗ 13 = 2 6 + 1 ∗ and 1 = (1 13) + ( 2 6) = ( 2 45) + (7 13) = (7 13) + ( 2 45) ∗ − ∗ − ∗ ∗ ∗ − ∗ Hence 4535 2 mod 13. ≡− Let’s do 4342 mod 13. We have 4339 433 mod 13. Hence 4342 436 mod 13.Now 43 mod ≡ ≡ 3 13 4 mod 13. Hence 4342 46 mod 13. Now 42 = 16 = 3 mod 13 Hence 46 = 42 = 33 = ≡ ≡ 27 mod 13 = 1 mod 13 Hence 4342 mod 1 mod 13. And now we get to yet another application of the B´ezout’s lemma.

1.3.29 Chinese Remainder Theorem Suppose m and n are coprime; let x and y be two integers. Then there is a unique [z] Z/nm such that z x mod m and z y mod n. ∈ ≡ ≡ Proof. (existence) By Bezout’s Lemma, we can find h, k Z such that ∈ hn + km = 1.

Given x, y we choose z by z = hnx + kmy. Clearly z hnx x mod m (hn 1 mod m) and z y mod n. ≡ ≡ ≡ ≡ (uniqueness) For uniqueness, suppose z is another solution. Then z z mod n and z z ′ ≡ ′ ≡ ′ mod m. Hence there exist integers r, s such that

z z′ = nr = ms. − Since hn + km = 1 we have

z z′ = (z z′)hn + (z z′)km = mshn + nrkm = nm(sh + rk). − − − Hence z z (nm). 2 ≡ ′ As usual the proof gives you a procedure to find z. To find z, find h and k as in the B´ezouts lemma (run Euclidean algorithm backwards). Then z is hnx + kmy.

1.3.30 Example Find the unique solution of x 3 mod 7 and x 9 mod 11 satisfying ≡ ≡ 0 x 76. ≤ ≤ Solution find h, k such that 7h + 11k = 1 using Euclid: 11=7+4 7=4+3 4=3+1 So 1=4-3=4-(7-4)=2.4-7=2.(11-7)-7=2.11-3.7. Hence let h = 3 and k = 2 so take x = 3.7.9 + 2.11.3= 189 + 66 = 123 31 mod 77. − − − − ≡ 15 Lecture 4

2 Polynomial Rings

2.1 Irreducible elements in Rings 2.1.31 Definition A is (R, +, ), R is a set and +, are binary operations. (R, +) is an · · Abelian group and (R, ) is a monoid and multiplication is distributive over addition. In detail: · a,b,c R (a + b)+ c = a + (b + c), • ∀ ∈ 0 R a R a +0= a =0+ a, • ∃ ∈ ∀ ∈ a R a R a + ( a)=0=( a)+ a, • ∀ ∈ ∃− ∈ − − a, b R a + b = b + a, • ∀ ∈ a,b,c R (ab)c = a(bc), • ∀ ∈ 1 R a R 1 a = a = a 1, • ∃ ∈ ∀ ∈ · · a,b,c R a(b + c)= ab + ac, • ∀ ∈ a,b,c R (b + c)a = ba + ca. • ∀ ∈

2.1.32 Example There are lots of examples of rings:

Z is a ring; • Z/n is a ring; • Q and R and C are rings; • More generally every field is a ring. Conversely if R is a ring in which 0 = 1; xy is always • 6 the same as yx, and every non-zero element has a multiplicative inverse then R is a field.

The set M (R) of real n n matrices is a ring; • n × More generally, given any ring R, the set M (R) is a ring. • n The set R[x] is all polynomials in x with coefficients in R is a ring. Note that a polynomial • is an expression of the form

a + a x + ... + a xn, a ,...a R. 0 1 n 0 n ∈ More generally, for any ring R, the set R[x] of polynomials with coefficients in R is a • n ring. Addition and multiplication are defined as one expects: if f(X) = anX and n g(X)= bnX then we define P P n (f + g)(X)= (an + bn)X , 16 X n (fg)(X)= cnX , where X n cn = aibn i. − Xi=0

n We’ll actually study polynomial rings k[X] over a field k. If f = anX is a non-zero polynomial in k[X], then the degree of f is the largest n such that an = 0. We also define P6 deg(0) = . The point of this definition is so that we always have: −∞ deg(f g) = deg(f)+ deg(g). × (we are using the convention that + n = infty). If f = a Xn = 0 has degree d, the the −∞ n 6 coefficient a is called the leading coefficient of f. If f has leading coefficient 1 then f is called d P monic.

2.1.33 Example f(X)= X3 + X + 2 has degree 3, and is monic.

2.1.34 Definition Let R be any ring. There are three kinds of element of R:

An element a R is a if there exists a 1 R such that aa 1 = a 1a = 1. The set of • ∈ − ∈ − − units of R is denoted by R×. An element a R is reducible if it factorizes as a = bc with neither b nor c a unit. • ∈ If a is neither a unit nor reducible then a is called irreducible. •

2.1.35 Example If R = Z then Z = 1, 1 . The irreducible elements are p with p prime. × {− } ±

2.1.36 Example If k is a field then k = k 0 . The element 0 is reducible since 0 = 0 0.. × \ { } ×

2.1.37 Proposition The units in k[X] are precisely the polynomials of degree 0, i.e. the non-zero constant polynomials.

Proof. Clearly if a is a non-zero constant polynomial then it is a unit in k[X]. Conversely, suppose ab = 1. Then we have deg(a) + deg(b) = 0. Hence deg(a) = deg(b)=0. 2 The question of which polynomials are irreducible is much harder, and depends on the field. For example X2 2 factorizes in R[X] as (X + √2)(X √2), but is irreducible in Q[X] (since − − √2 is irrational). The only general statement about irreducible polynomials is the following:

17 2.1.38 Proposition If deg(f) = 1 then f is irreducible.

Proof. Suppose f = gh. Then deg(g) + deg(h) = 1. Therefore the degrees of g and h are 0 and 1, so one of them is a unit. 2 Note that the converse to the above is false as we have already seen with X2 2 in Q[X]. − Note also that even in R[X], the polynomial X2 + 1 is irreducible, although it factorizes in C[X] as (X + i)(X i). One might ask whether there are similar phenomena for C and bigger fields, − but in fact we have:

2.1.39 Fundamental Theorem of Algebra Let f C[X] be a non-zero polynomial. Then ∈ f factorizes as a product of linear factors:

f(X)= c(X λ ) (X λ ), − 1 ··· − d where c is the leading coefficient of f.

Proof. This is proved in a complex analysis course. Here is a sketch of the proof. Let f be a non-constant polynomial. Suppose f has no roots, define 1 g(z)= f(z) As f has no root, g is a holomorphic function. The function g is bounded because f(z) as z (this is because we assumed that | | → ∞ | | → ∞ f is non-constant). A bounded holomorphic function C C is constant, hence f is constant which is a contra- → diction. Hence f has a root. 2 In the notation of this course, the theorem means the in C[X] the irreducible polynomials are exactly the polynomials of degree 1, with no exceptions. In R[X] the description of the irreducible polynomials is a little more complicated. In Q[X] things are much more complicated and it can take some time to determine whether a polynomial is irreducible or not.

18 Lecture 5

2.2 Euclid’s algorithm in k[X] The rings Z and k[X] are very similar. This is because in both rings we a able to divide with remainder in such a way that the remainder is smaller than the element we divided by. In Z if we divide a by b we find: a = qb + r, 0 r

2.2.40 Division Algorithm Given a, b k[X] with b = 0 there exist unique q,r k[x] such ∈ 6 ∈ that a = qb + r and deg(r) < deg(b).

This allows us to prove the same theorems for k[X] as we proved for Z. We have the following corollary of the fundamental theorem of algebra and euclidean division.

2.2.41 Corollary No polynomial f(x) in R[X] of degree > 2 is irreducible in R[X]. Proof.

Let f R[X] be a polynomial of degree > 2. By fundamental theorem f has a root in C, call ∈ it α. Then α (complex conjugate) is another root (because f R[X]). Let ∈ p(x) = (x α)(x α) − − Write α = a + bi, expand to get

p(x)= x2 2ax + (a2 + b2) − The polynomial p is in R[X] and is irreducible (if it was reducible it would have a real root). Divide f by p. f(x)= p(x)q(x)+ r(x) with deg(r) 1. We can write r = sx+r with s, r R. But f(α)= p(α)q(α)+r(α)=0= r(α). ≤ ∈ As α not real we must have r = s = 0. This implies that p divides f but deg(p) = 2 < deg(f). It follows that f is not irreducible. 2

2.2.42 Example In Q[X] divide f = X4 + 2X3 + X2 + 2X +1 by g = X2 + X + 1. We find f X2g = X3 + 2X + 1 − The degree is still deg(g), hence we do it again ≥ X3 + 2X + 1 Xg = X2 + X + 1 − − and one more time :

X2 + X +1+ g = 2(X + 1) − 19 We found something of degree strictly smaller than deg(g). We get f = (x2 + x 1)g + 2x + 1 − Hence q = x2 + x + 1 and r = 2x + 1. Another example: f = x3 + x + 2 and g = x2 1. − We get f xg = 2x + 2 − This is of degree strictly smaller than g, hence q = x and r = x + 1.

1 1 2.2.43 Example In F5[X] divide ...... etc. Note that in F5 we have 2− = 3, 3− = 2 and 1 4− = 4. .... etc.

2.2.44 The Remainder Theorem If f k[X] and a k then ∈ ∈ f(a) = 0 (X a) f. ⇐⇒ − |

Proof. If (x a) f then there exists g k[x] such that f(x) = (x a)g(x). Then f(a) = − | ∈ − (a a)g(a) = 0g(a)=0. − Conversely if by the Division Algorithm we have q,r F[x] with deg(r) < deg(X a) = 1 ∈ − such that f(X)= q(X)(X a)+ r(X). So r(x) k. Then − ∈ r(a)= f(a) q(x)(a a)=0+0=0. − − Hence (X a) f. 2 − | We can also use the division algorithm to calculate highest common factors as before:

2.2.45 Definition Let f, g k[X], not both zero. A highest common factor of f and g is a ∈ monic polynomial h such that:

h f and h g. • | | if a f and a g then deg(a) deg(h). • | | ≤

2.2.47 Example

2.2.47 Proposition Let f = qg + r. Then h is a hcf if f and g iff h is a hcf of g and r.

Proof. Exactly the same as with the integers. 2 1 Note that hcf(f, 0) = c f where c is the leading coefficient of f.

20 Lecture 6

2.2.48 Bezout’s Lemma Let f, g k[X] not both zero. Then there exist a, b k[X] such ∈ ∈ that hcf(f, g)= af + bg.

Again the proof is the same as in the case of integers. Let’s do an example : Calculate hcf(f, g) and find a, b such that hcf(f, g) = af + bg with f = x4 + 1 and g = x2 + x. We write: f x2g = x3 + 1, then f x2g + xg = x2 + 1 and f x2g + xg g = 1 x and − − − − − − we are finished. We find: f = (x2 x + 1)g + 1 x − − And then x2 + x = ( x + 1)( x 2) + 2 − − − As 2 is invertible, we find that the hcf is one ! Now, we do it backwards:

1 = (1/2)( x+1)(x+2)+(1/2)(x2+x) = (1/2)((x4+1) (x2+x)(x2 x+1))(x+2)+(1/2)(x2+x) = (1/2)(x4+1)( − − − hence a = (1/2)(x + 2) and b = (1/2)( x3 x2 + x 1). − − − 2.2.49 Lemma Let p k[X] be irreducible. If p ab then p a or p b. ∈ | | | Proof. Exactly identical to the integers. 2

2.2.50 Unique Factorisation Theorem Let f k[x] be monic. Then there exist p ,p ,...,p ∈ 1 2 n ∈ k[x] monic irreducibles such that f = p p p . 1 2 ··· n If q1,...,qs are monic and irreducible and f = q1 ...qs then r = s and (after reordering) p1 = q2, ... , pr = qr.

Proof. (Existence): We prove the existence by induction on deg(f). If f is linear then it is irreducible and the result holds. So suppose the result holds for polynomials of smaller degree. Either f is irreducible and so the result holds or f = gh for g, h non-constant polynomials of smaller degree. By our inductive hypothesis g and h can be factorized into irreducibles and hence so can f. (Uniqueness): Factorization is obviously unique for linear polynomials (or even irreducible polynomials). For the inductive step, assume all polynomials of smaller degree than f have unique factorization. Let f = g g = h h , 1 ··· s 1 ··· t with gi, hj monic irreducible. 21 Now g is irreducible and g h h . By the Lemma, there is 1 j t such g h . This 1 1| 1 ··· t ≤ ≤ 1| j implies g1 = hj since they are both monic irreducibles. After reordering, we can assume j = 1, so g g = h h , 2 ··· s 2 ··· t is a polynomial of smaller degree than f. By the inductive hypothesis, this has unique factor- ization. I.e. we can reorder things so that s = t and

g2 = h2,...,gs = ht. 2

The fundamental theorem of algebra tells you exactly that any monic polynomial in C[x] is a product of irreducibles (recall that polynomials of degree one are irreducible). A consequence of factorisation theorem and fundamental theorem of algebra is the following: any polynomial of odd degree has a root in R. Indeed, in the decomposition we can have polynomials of degree one and two. Because the degree is odd, we have a factor of degree one, hence a root. Another example : x2 + 2x +1=(x + 1)2 in k[X]. Look at x2 + 1. This is irreducible in R[x] but in C[x] it is reducible and decomposes as (x+i)(x i) and in F [x] it is also reducible : x2 +1=(x+1)(x 1) = (x+1)2 in F [x]. In F [x] − 2 − 2 5 we have 22 =4= 1 hence x2 +1=(x + 2)(x 2) (check : (x 2)(x +2) = x2 4= x2 + 5). − − − − In fact one can show that x2 +1 is reducible in F [x] is and only if p 1 mod 4. p ≡ In F [x], the polynomial xp x decomposes as product of polynomials of degree one. p − Suppose you want to decompose x4 +1 in R[x]. It is not irreducible puisque degree est > 2. Also, x4 + 1 does not have a root in R[x] but it does in C[x]. The idea is to decompose into factors of the form (x a) in C[x] and then group the conjugate factors. − This is in general how you decompose a polynomial into irreducibles in R[x] ! So here, the roots are

iπ/4 3iπ/4 5iπ/4 7iπ/4 a1 = e ,a2 = e ,a3 = e ,a4 = e .

Now note that a = a and the polynomial (x a )(x a ) is irreducible over R. The middle 4 1 − 1 − 4 coefficient is (a + a )= 2 cos(π/4) = √2. Hence we find : (x a )(x a )= x2 √2x + 1. − 1 2 − − − 1 − 4 − Similarly a = a and (x a )(x a )= x2 + √2x + 1. 2 3 − 2 − 3 We get the decomposition into irreducibles over R :

x4 +1=(x2 √2x + 1)(x2 + √2x + 1) − In Q[x] one can show that x4 + 1 is irreducible. 4 In F2[x] we can also decompose x + 1 into irreducibles. Indeed :

x4 +1= x4 1 = (x2 1)2 = (x 1)4 − − −

22 Lecture 7

3 Jordan Canonical Form

3.1 Revision of Fields. A field is a commutative ring with 1 such that every non-zero element has an • inverse. Examples: Q, R, C, Fp. If k is any field then k(X) (the field of rational functions) is a field.

Vector spaces, subspaces, direct sums. A over a field k is a set V with two • operations: addition and multiplication by elements of k. Elements of V are called vectors, and elements of k are called scalars. The axioms are:

– (V, +) is an abelian (commutative) group. – (xy)v = x(yv) for x, y k, v V . ∈ ∈ – (x + y)(v + w)= xv + xw + yv + yw for x, y k, v,w V . ∈ ∈ – 1v = v.

A typical example of a vector space is the space kn of n-tuples of elements in k. In particular k itself is a vector space over itself. Another example is k[X]. The set of polynomials with coefficients in k is a vector space. Fix n 0 and let k[X] be te set of polynomials of degree less or equal to n. This is a ≥ n vector space (although it is not a ring !). If n = 0, then this vecor space is just k. Take k = R and let C be the set of all continuous functions from [0, 1] to R. Then C is an R-vector space. Similarly, take k = C and let H be the set of all holomorphic functions from the unit ball to C. This is a C vector space. Of course it also an R-vector space. Another example. Let a, b R and consider the set of all twice differentiable functions f ∈ such that d2f df + a + bf = 0 dx dx This is an R vector space (exercise).

A linear combination of v ,...,v is a vector of the form x v + ... + x v . • { 1 n} 1 1 n n For example, consider the vector space k[x]n as before. This vector space is in fact the set of all linear combinations of the elements 1,x,...,xn.

The span of a set of vectors is the set of linear combinations of those vectors. • As above, k[x] is the span of the set 1,x,...,xn . We say that the vectors 1,x,...,xn n { } { } span or generate this vector space. Consider k2 and the vecotrs 1 e = 1 0   23 and

0 e = 2 1   Then the set of vectors e , e spans R2. { 1 2} Let V be a k-vector space. A subset A of V is said to generate V is V is the span of A. • In the examples above 1,x,...,xn generates k[x] but 1, 1,x , 1,x,x2 , 1,x,x2,...,xn 1 { } n { } { } { − } do not generate V . The set e , e certainly generates R2 while e or e do not. { 1 2} { 1} { 2} Let V be a k-vector space. A subset W V is called a subspace if any linear combination • ⊂ of elements in W is in W . In other words, a subspace is a subset which is a vector space with the same addition and scalar multiplication as V . Let V be a k-vector space and take v V . The set kv of all multiples of v by elements of ∈ k is a subspace. More generally, take any set A V , then the set of linear combinations ⊂ of elements of A is a vector subspace. As an (easy) exercise, prove that given any collection W , i I (I is some set, finite or i ∈ infinite) of subspaces of V , the intersection i I Vi is a subspace. The union is not ! For ∩ ∈ example, let V = k2 and W = ke and e = ke . It is quite clear that W W not a 1 1 2 2 1 ∪ 2 subspace, for example e1 + e2 is not in it. Let C be the set of continuous functions [0, 1] R. We have seen that this is an R-vector −→ space. Let W = f C : f(0) = 0 . This is a subspace (easy exercise). { ∈ } We have seen that k[x] is a vector space. The space k[x]n is a vector subspace. Vectors v ,...,v V are called linearly independent if whenever n λ v = 0 (for some • 1 n ∈ i=1 i i λi R), then λi = 0 for all i. ∈ P 2 For example, in k , vetors e1, e2 are linearly independent. Clearly e1 and 2e1 are not linearly independent. In k[x], the vectors 1,x,x2,..., are linearly independent. { A set v ,...,v of vectors is a for V if it is linearly independent and it’s span is • { 1 n} V . If this is the case then every vector has a unique expression as a linear combination of v ,...,v . { 1 n} For example e , e is a basis of R2. The set 1,x,x2,...,xn is basis of k[x] . { 1 2} { } n The set 1,x,x2,..., is a basis of k[x]. { The of a vector space is the number of vectors in a basis. This does not depend • on the basis: any two bases have the same number of elements. n For example k has dimension n, k[x]n has dimension n+1 while k[x] is infinite dimensional and so is C. R viewed as vactor space over itself has dimension 1 but viewed as vector space over Q is infinite dimensional. 24 C viewed as a vector space over itself has dimension 1 but as a vector space over R it has dimension 2 : a basis is 1,i . { } In this course we will be mainly concerned with finite dimensional vector spaces.

Let V,W be vector spaces. A function T : V W is a if → T (v + w)= T (v)+ T (w), • T (xv)= xT (v). • or equivalently, T (v + xw)= T (v)+ xT (w). A bijective linear map is called an isomorphism of vector spaces. For example the map T : C C that sends z to z is not a linear map of C-vector spaces : −→ T (λz)= λT (z) ! But it is a map of real vector spaces : if λ R, then λ = λ. ∈ If T is linear, we define its kernel and image:

ker(T )= v V : T (v) = 0 , { ∈ } Im(T )= T (v): v V . { ∈ } The rank of T is the dimension of the image of T , and the nullity of T is the dimension of the kernel of T . This implies the following:

3.1.51 Rank-Nullity Theorem Let T : V W be a linear map. Then → rank(T ) + null(T ) = dim V.

Proof. Let v ,...,v be a basis of ker(T ) and w ,...,w be a basis of Im(T ). By definition { 1 r} { 1 s} of the image, there exist u ,...,u vectors of V such that { 1 s}

T (ui)= wi

We claim that u ,...,u ,v ,...,v form a basis of V which will conclude the proof. { 1 s 1 r} First we show linear independence. Suppose that

a v + + a v + b u + + b u = 0 1 1 ··· s r 1 1 ··· r s Apply T , we get

0= T (0) = b T (u )+ + b T (u )= b w + + b w 1 1 ··· r s 1 1 ··· s s (note that a v + + a v = 0 because the v s are in the kernel of T ). Now, as w ,...,w is 1 1 ··· s r i { 1 s} a basis of Im(T ) (in particular it is linearly independent), we get that bi = 0 for all i. So we have a v + + a v = 0 and, because v s for a basis of ker(T ) (and in particular are 1 1 ··· s r i linearly independent), we get that ai = 0 for all i. We have shown that a s and b)is are all zero hence u ,...,u ,v ,...,v is linearly inde- i { 1 s 1 r} pendent. It remains to show that u ,...,u ,v ,...,v spans V . { 1 s 1 r} 25 Let x V . By the choice of w ,...,w as a basis of the image of T , we have ∈ { 1 s} s s s T (x)= aiwi = aiT (ui)= T ( T (aiui)) Xi=1 Xi=1 Xi=1 Therefore s T (x a u ) = 0 − i i Xi=1 and hence s x a u ker(T ) − i i ∈ Xi=1 and now, by the choice of v as basis of ker(T ), there exist b s such that { i} i s r x = aiui + bivi Xi=1 Xj=1 which shows that u ,...,u ,v ,...,v generates V . { 1 s 1 r} This finishes the proof. 2 Here are some consequences of this theorem.

3.1.52 Definition A linear map T : V W is called isomorphism if there exists −→ 1. T : W V such that T T = I (identity of W ) 1 −→ 1 V 2. T : W V such that T T = I (identity of V ) 2 −→ 2 W In particular, a linear map T 1 : V V is an isomorphism if there exists T 1 such that T 1T − −→ − − is the identity. It is easy (and left as exercise) to see that T : V W is an isomorphism if and only if T −→ is both surjective and injective. (for the converse you will need to constrict T1 and T2 as maps and then show that they are linear.)

3.1.53 Corollary Let T : V W be a linear map with dim V = dim W . If T is injective, −→ then T is an isomorphism. If T is surjective, then T is an isomorphism. Proof. If T is injective, then ker(T ) = 0 . By the above theorem, dim(Im(T )) = dim(V ) = dim(W ) and { } hence Im(T )= W and T is surjective. Injective + Surjective = Isomorphism. Similarly, if T is surjective, then dim(Im(T )) = dim(W ) = dim(V ) and hence dim(ker(T )) = 0. It follows that T is injective. Injective + Surjective = Isomorphism. 2

3.1.54 Corollary Let V and W be two vector spaces of same dimension. Then V is isomorphic to W . Proof. Let r = dim(V ) = dim(W ) and let v ,...,v be a basis of V and w ,...,w { 1 r} { 1 r} be a basis of W . Define T by T (vi)= wi. By construction, T is surjective and by the theorem it’s also injective hence an isomorphism. 2

26 Lecture 8

3.2 Matrix representation of linear maps Let V and W be two finite dimensional vector space over a field k. Suppose that V is of dimension r and W is of dimension t. Let B = b ,...,b be a basis for V and B = b ,...,b be a basis for W . { 1 r} ′ { 1′ s′ } For any vector v V we shall write [v] (in the future, we will by abuse of notation simply ∈ B call this column vector v when it is obvious which basis we are referring to) for the column vector of coefficients of v with respect to the basis B, i.e.

x1 . [v] = . , v = x1b1 + ... + xrbr. B  .  x  r   Given a linear map T : V W we have → r r T (v)= T ( xibi)= xiT (bi) Xi=1 Xi=1 Now we have s T (bi)= ajiwj Xj=1 We get : T (v)= xiajiwj 1 i r,1 j s ≤ ≤X≤ ≤ In other words it is the s r matrix product of the matrix, usually denoted by A , with entries × T

(AT )i,j = aji

One also writes [T ] , ′ for this matrix AT . B B In practice, to write a matrix of T with respect to given bases, decompose T (bi) in the basis of W and write column vectors, this gives the matrix AT The matrix AT or [T ] , ′ is called the matrix of T with respect to bases and ′. B B B B A LINEAR TRANSFORMATION IS THE MATRIX WITH RESPECT TO SPEC- IFIED BASES OF THE SOURCE AND THE TARGET SPACES. Example. V = W = R3 with canonical bases e , e , e . { 1 2 3} x x T y = y     z 0     (notice that this is the projection onto the plane z = 0). One finds 1 0 0 A = 0 1 0 T   0 0 0 27  One can find the kernel and the image. In this case, clearly the image is the span of e1 and e2 hence dim(ImT ) = 2. By rank-nullity theorem, dim ker(T ) = 1 and quite clearly it is generated by e3. Let us look at T : R3 R3 given by → x 2x y + 3z − T = y = 4x 2y + 6z    −  z 6x + 3y 9z − −     One finds : 2 1 3 − A = 4 2 6 T  −  6 3 9 − − − Quite clearly, the first column vector in this matrix is 2 times the second and the third − 2 is the first minus the second, therefore Im(T ) is one dimensional and spanned by 4 The   6 − rank-nullity theorem implies that the dimension of ker(T ) is 2. To find ker(T ) one needs to solve AT v = 0. By elimination, one easily shows that the kernel has equation 2x y +3 = 0, hence 1 0 − can be spanned by the vectors 2 and 3     0 1 Another example : k[x]n k[x]n 1 sending  f to its derivative f ′. Quite clearly it’s a linear −→ − map. Find its matrix, image and kernel. Same question with k[x] k[x] sending f to f + f . n −→ n ′ Let us consider the transformation T : R2 R2 −→ x x + y T = = y x y    −  1 3 and B1 = B2 = v1 = ,v1 = − . { 1 2 } −    0 3 One calculates T (v1)= v1 = = 6v1 2v2 and T (v2)= v1 = − = 17v1 + 6v2. 2 − − 2 The matrix of T with respect to these bases is  

6 17 − 2 6 −  In the canonical bases, of course the matrix is:

1 1 1 1  −  One one changes the bases, the matrix gets multiplied on the left and on the right by appropriate ‘base change’ matrices. More precisely, let T : V W be a linear map. Let B = v ,...,v be a basis for V and −→ 1 { 1 r} let B = v ,...,v be another basis for V . 1′ { 1′ r′ } 28 Similarly let B = w ,...,w be a basis for W and let B = w ,...,w be another basis 2 { 1 s} 2′ { 1′ s′ } for W .

Let AT =[T ]B1,B2 be the matrix of T in the bases B1 and B2. Let

x1 . v =  .  , v = x1v1′ + ... + xrvr′ . x  r   be a vector of V written in the basis B1′ . Now write

r

vi′ = bijvi Xj=1 be the expression of the vector v from B in the basis B . Thus we obtain a r r matrix i′ 1′ 1 × B = (bij) which has the property that vi′ = Bvi for all is. This matrix B is called the transition (or base change) matrix from basis B1 to B1′ . Then A Bv = x w + + x w , vector in W in the basis B . Now write T 1 1 ··· s s 2 s

wi = aikwk′ Xk=1 Then the s s matrix A = (a ) is such that AA Bv is a column vecor that expresses the × ik T coordinates of T (v) in the basis B2′ . We summarise : ′ ′ [T ]B1,B2 = A[T ]B1,B2 B where A is the s s matrix whose columns are coordinates of w in the basis B and B is r r × i 2′ × matrix whose columns are coordinates of vi′ in the basis B1. In the particular case where r = s and V = W and B1 = B2 and B1′ = B2′ we get that 1 A = B− and ′ ′ 1 [T ]B1,B2 = B− [T ]B1,B2 B Example.

x 2y x + y T = − = 2x + y x y    −  1 3 and B = B = v = ,v = . 1 2 { 1 2 1 2 } −    5 One calculates T (v )= v = = 5 (v + v ) 1 1 0 4 1 2   1 13 3 and T (v2)= v1 = − = v1 + v2. 8 − 4 4   29 The matrix of T with respect to these bases is 5/4 13/4 [T ] 1 2 = − B ,B 5/4 3/4   In the canonical basis B the matrix is 1 2 [T ]B,B = − 2 1   The transition matrix from B to B1 is 1 3 A = 2 2 −  And the transition matrix from B1 to B is

1 1 2 3 A− = − 8 2 1   1 One easily checks that [T ]B1,B2 = A− [T ]B,BA. To summarise what we have seen : Let T : V W be a linear map, we let r be the dimension of V and s be the dimension of −→ W . Let B = v ,...,v be a basis of V and B a basis of W . { 1 r} ′ The matrix AT of T in the bases B and B′ is the matrix (T (v1),...,T (vr)). where column vectors are coordinates of T (v ) in the basis B . This is an s r matrix. i ′ × Let v be a vector in V and write it as a column vector (r 1 vector) of its coordinates in × the basis B. Then A v ( !) is the column vector (s 1 matrix) which T × represents the coordinates of T (v) W in the basis B . ∈ ′ Suppose B1 is another basis of V and B1′ is another basis of W . Then T is represented in the bases B1 and B1′ by the matrix AT multiplied on the left and on the right by appropiate base change matrices. It can be shown that the only matrices which do not depend on the choice of a basis are diagonals with all coefficients on the diagonal equal (for example identity and zero). We also have the following :

3.2.55 Proposition Let T : V W and T : W U be two linear maps and suppose we 1 −→ 2 −→ are given bases B, B , B of the vector spaces V , W , U. Then for the composed map T T 1 2 2 ◦ 1 (usually simply denoted by T2T1) the matrix is

[T2T1]B,B2 =[T2]B1,B2 [T1]B,B1 In particular if T : V V (such a map is called endomorphism) and B is a basis for V , −→ then n n [T ]B =[T ]B and, if we suppose that T is an isomorphism 1 1 [T − ]B =[T ]B−

In particular, if B and B′ are two bases of V , then the transition matrix of B to B′ is the inverse of the transition matrix of B′ to B. 30 Lecture 9 In what follows V is a vector space of dimension n and B is a basis. Let A be the matrix representing T in the basis B. Because A is an n n matrix, we can k 0 × k form powers A for any k with the convention that A = In. Note that A represents the transformation T composed ktimes as seen above. Notice for example that when the matrix is diagonal with coefficients λi on the diagonal, n n then A is diagonal with coefficients λi . Notice also that such a matrix is invertible if and only 1 1 if all λis are non-zero, then A− is the diagonal matrix with coefficients λi− on the diagonal.

3.2.56 Definition Let f(X)= a Xi k[X]. We define i ∈ P i f(T )= aiT . X where we define T 0 = id. This is a linear transformation. If A M (k) is a matrix then we define ∈ n i f(A)= aiA , X This matrix f(A) represents f(T ) in the basis B. What is means is that we can ‘evaluate’ a polynomial at a n n matrix and get another n n × × matrix. We write [f(T )]B for this matrix in the basis B, obviously it is the same as f([T ]B). Let’s look at an example. 1 3 Take A = − and f(x)= x2 5x + 3. Then 4 7 −   21 3 f(A)= A2 5A +3= − 4 29   3.2.57 Definition Recall that the polynomial of an n n matrix A is defined × by ch (X) = det(X I A). A · n − This is a monic polynomial of degree n over k. Now suppose T : V V is a linear map. We → can define chT to be ch[T ]B but we need to check that this does not depend on the basis B. If C is another basis with transition matrix M then we have:

1 ch (X) = det(X I M − [T ] M) [T ]C · n − B 1 = det(M − (X In [T ]B)M) 1 · − = det(M)− det(X I [T ] ) det(M) · n − B = det(X I [T ] ) · n − B = ch[T ]B (X)

In other words, the characteristic polynomial does not depend on the choice of the basis in which we write our matrix. The following (important !) theorem was proved in the first year courses. 31 3.2.58 Cayley–Hamilton Theorem For any A be an n n matrix. We have ch (A) = 0. × A

We therefore have:

3.2.59 Cayley–Hamilton Theorem For any T : V V linear map, we have ch (T ) = 0. −→ T

λ1 0 3.2.60 Example Take A = Then chA(x) = (x λ1)(x λ2) and clearly chA(A)=0. 0 λ2 − −   1 2 Take A = . Calculate ch (x) and check that ch (A)=0. 3 4 A A   There are plenty of polynomials f such that f(A) = 0, all the multiples of f for example. What can also happen is that some divisor g of chA is already such that g(A) = 0. Take the identity I for example. Its characteristic polynomial is (1 x)n but in fact g = 1 x is already n − − such that g(In) = 0. This leads us to the notion of minimal polynomial.

32 Lecture 10 In the last lecture, we showed that given a linear transformation T there is a polynomial f (namely the characteristic polynomial) such that f(T ) = 0. Among all polynomial with this property, there is one of minimal degree.

3.3 Minimal polynomials 3.3.61 Definition Let V be a finite dimensional vector space over a field k and T : V V a → linear map. A minimal polynomial of T is a monic polynomial m k[X] such that ∈ m(T )=0; • if f(T ) = 0 and f = 0 then deg f deg m. • 6 ≥

3.3.62 Theorem Every linear map T : V V has a unique minimal polynomial m . → T Furthermore f(T ) = 0 iff m f. T | Proof. Firstly the Cayley-Hamilton theorem implies that there exists a polynomial f sat- isfying f(T ) = 0, namely f = chT . Suppose that mT is not unique then there exists a monic polynomial n(x) k[X] satisfying deg(m) = deg(n) and n(T )=0. ∈ If f(x)= m(x) n(x) then − f(T )= m(T ) n(T ) = 0, − also deg(f) < deg(m) = deg(n), a contradiction. Suppose f k[X] and f(T ) = 0. By the Division Algorithm for polynomials there exist ∈ unique q,r k[X] with deg(r) < deg(m) and ∈ f = qm + r.

Then r(T )= f(T ) q(T )m(T ) = 0 q(T ) 0 = 0. − − · So r is the zero polynomial (by the minimality of deg(m).) Hence f = qm and so m f. | Conversely if f k[X] and m f then f = qm for some q k[X] and so f(T )= q(T )m(T )= ∈ | ∈ q(T ) 0=0. 2 ·

3.3.63 Corollary If T : V V is a linear map then m ch . → T | T

Proof. By the Cayley-Hamilton Theorem chT (T )=0. 2 Using the corollary we can calculate the minimal polynomial as follows:

Calculate ch and factorize it into irreducibles. • T Make a list of all the factors. • Find the monic factor m of smallest degree such that m(T )=0. • 33 2 1 3.3.64 Example Suppose T is represented by the matrix 2 . The characteristic   2 polynomial is   ch (X) = (X 2)3. T − The factors of this are: 1, (X 2), (X 2)2, (X 2)3. − − − The minimal polynomial is (X 2)2. − In fact this method can be speeded up: there are certain factors of the characteristic poly- nomial which cannot arise. To explain this we recall the definition of an eigenvalue

3.3.65 Definition Recall that a number λ k is called an eigenvalue of T if there is a ∈ non-zero vector v satisfying T (v)= λ v. · The non-zero vector v is called an eigenvector

3.3.66 Remark It is important that an eigenvector be non-zero. If you allow zero to be an eigenvector, then any λ would be an eigenvalue.

3.3.67 Proposition Let v be an eigenvector of T with eigenvalue λ k. Then for any ∈ polynomial f k[X], ∈ (f(T ))(v)= f(λ) v. ·

Proof. Just use that T (v)= λv. 2

3.3.68 Theorem If T : V V is linear and λ k then the following are equivalent: → ∈ (i) λ is an eigenvalue of T .

(ii) mT (λ) = 0.

(iii) chT (λ) = 0.

Proof. (i) (ii): Assume T (v)= λv with v = 0. Then by the proposition, ⇒ 6 (m (T ))(v)= m (λ) v. T T · But m (T ) = 0 so we have m (λ) v = 0. Since v = 0 this implies m (λ)=0. T T · 6 T (ii) (iii): This is trivial since we have already shown that m is a factor of ch . ⇒ T T (iii) (i): Suppose ch (λ) = 0. Therefore det(λ id T ) = 0. It follows that (λ id T ) is ⇒ T · − · − not invertible so there is a non-zero solution to (λ id T )(v) = 0. But then T (v)= λ v. 2 · − ·

34 Now suppose the characteristic polynomial of T factorizes into irreducibles as

r ch (X)= (X λ )a1 . T − i Yi=1 By fundamental theorem of algebra, if k = C, you can always factorise it like this. Then the minimal polynomial has the form

r m (X)= (X λ )b1 , 1 b a . T − i ≤ i ≤ i Yi=1 This makes it much quicker to calculate the minimal polynomial. Indeed, in practice, the number of factors and the ais are quite small.

3.3.69 Example Suppose T is represented by the matrix diag(2, 2, 3). The characteristic polynomial is ch (X) = (X 2)2(X 3). T − − The possibilities for the minimal polynomial are:

(X 2)(X 3), (X 3)2(X 3). − − − − The minimal polynomial is (X 2)(X 3). − −

35 Lecture 11

3.4 Generalized Eigenspaces 3.4.70 Definition Let V be a finite dimensional vector space over a field k, and let λ k be ∈ an eigenvalue of a linear map T : V V . We define for t N the t-th generalized eigenspace → ∈ by: V (λ) = ker((λ id T )t). t · − Note that V1(λ) is the usual eigenspace (i.e. the set of eigenvectors together with zero).

3.4.71 Remark We obviously have

V_1(λ) ⊆ V_2(λ) ⊆ ...

and by definition, dim V_t(λ) = null((λ·id − T)^t).

3.4.72 Example Let

A = [ 2 2 2 ]
    [ 0 2 2 ]
    [ 0 0 2 ]

We have ch_A(X) = (X − 2)^3 so 2 is the only eigenvalue. We'll now calculate the generalized eigenspaces V_t(2):

V_1(2) = ker [ 0 2 2 ]
             [ 0 0 2 ]
             [ 0 0 0 ]

We calculate the kernel by row-reducing the matrix:

V_1(2) = ker [ 0 1 0 ]  = span{ (1, 0, 0)^t }.
             [ 0 0 1 ]
             [ 0 0 0 ]

Similarly

V_2(2) = ker [ 0 0 1 ]  = span{ (1, 0, 0)^t, (0, 1, 0)^t },
             [ 0 0 0 ]
             [ 0 0 0 ]

V_3(2) = ker [ 0 0 0 ]  = span{ (1, 0, 0)^t, (0, 1, 0)^t, (0, 0, 1)^t }.
             [ 0 0 0 ]
             [ 0 0 0 ]
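These kernels can be checked mechanically. A minimal sketch (my addition, assuming SymPy is available) computing V_t(2) = ker((2·id − A)^t) for the matrix above:

import sympy as sp

A = sp.Matrix([[2, 2, 2],
               [0, 2, 2],
               [0, 0, 2]])
lam = 2
for t in range(1, 4):
    N = (lam * sp.eye(3) - A) ** t
    basis = N.nullspace()               # a basis of V_t(lam)
    print(t, [list(v) for v in basis])  # dimensions 1, 2, 3, as computed above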

3.4.73 Example Let

A = [ 1 1 −2 ]
    [ 1 1 −2 ]
    [ 1 1 −2 ]

Let V be a vector space and U and W be two subspaces. Then (exercise on Sheet 4), U + W is a subspace of V. If furthermore U ∩ W = {0}, then we call this subspace the direct sum of U and W and denote it by U ⊕ W. In this case

dim(U ⊕ W) = dim U + dim W.

3.4.74 Primary Decomposition Theorem If V is a finite dimensional vector space over C and T : V → V is linear, with distinct eigenvalues λ_1, ..., λ_r ∈ C and minimal polynomial

m_T(X) = ∏_{i=1}^{r} (X − λ_i)^{b_i},

then

V = V_{b_1}(λ_1) ⊕ ··· ⊕ V_{b_r}(λ_r).

3.4.75 Lemma Let k be a field. If f, g ∈ k[X] satisfy hcf(f, g) = 1 and T is as above, then

ker((fg)(T)) = ker(f(T)) ⊕ ker(g(T)).

Proof of the Theorem. By definition of m_T we have m_T(T) = 0, so ker(m_T(T)) = V. We have a factorization of m_T into pairwise coprime factors of the form (X − λ_i)^{b_i} (this is where the fact that the ground field is C is used), so the lemma, applied repeatedly, implies that

V = ker(m_T(T)) = ker( ∏_{i=1}^{r} (T − λ_i·id)^{b_i} ) = ker((T − λ_1·id)^{b_1}) ⊕ ··· ⊕ ker((T − λ_r·id)^{b_r})
  = V_{b_1}(λ_1) ⊕ ··· ⊕ V_{b_r}(λ_r).  □

Proof of the Lemma. Let f, g ∈ k[X] satisfy hcf(f, g) = 1.
Firstly, if v ∈ ker f(T) + ker g(T), say v = w_1 + w_2 with w_1 ∈ ker f(T) and w_2 ∈ ker g(T), then

(fg)(T)v = (fg)(T)(w_1 + w_2) = f(T)(g(T)w_1) + f(T)(g(T)w_2) = f(T)(g(T)w_1),

since g(T)w_2 = 0. Now, f and g are polynomials in k[X], hence fg = gf, therefore

f(T)(g(T)w_1) = g(T)(f(T)w_1) = 0

because w_1 ∈ ker(f(T)). Therefore

ker(f(T)) + ker(g(T)) ⊆ ker((fg)(T)).

We need to prove the equality; here we will use that hcf(f, g) = 1. Since hcf(f, g) = 1 there exist a, b ∈ k[X] such that

af + bg = 1.

So a(T)f(T) + b(T)g(T) = id (the identity map). Let v ∈ ker((fg)(T)). If

v_1 = a(T)f(T)v,   v_2 = b(T)g(T)v,

then v = v_1 + v_2 and

g(T)v_1 = (gaf)(T)v = a(T)((fg)(T)v) = a(T)·0 = 0.

So v_1 ∈ ker(g(T)). Similarly v_2 ∈ ker(f(T)), since

f(T)v_2 = (fbg)(T)v = b(T)((fg)(T)v) = b(T)·0 = 0.

Hence ker((fg)(T)) = ker(f(T)) + ker(g(T)). Moreover, if v ∈ ker f(T) ∩ ker g(T) then v_1 = 0 = v_2, so v = 0. Hence

ker((fg)(T)) = ker(f(T)) ⊕ ker(g(T)).  □

38 Lecture 12

3.4.76 Definition Recall that a linear map T : V V is diagonalizable if there is a basis → B of V such that [T ] is a diagonal matrix. This is equivalent to saying that the basis vectors in B are all eigenvectors. B

3.4.77 Theorem Let V be a finite dimensional vector space over a field k and let T : V → V be a linear map with distinct eigenvalues λ_1, ..., λ_r ∈ k. Then T is diagonalizable iff we have (in k[X]):

m_T(X) = (X − λ_1) ··· (X − λ_r).

Proof. First suppose that T is diagonalizable and let B be a basis of eigenvectors. For simplicity let f(X) = (X − λ_1) ··· (X − λ_r). We already know that f | m_T (each λ_i is a root of m_T), so to prove that f = m_T we just have to check that f(T) = 0. To show this, it is sufficient to check that f(T)(v) = 0 for each basis vector v ∈ B. Suppose v ∈ B, so v is an eigenvector with some eigenvalue λ_i. Then we have

f(T)(v) = f(λ_i)·v = 0·v = 0.

Therefore m_T = f.
Conversely, if m_T = f then by the primary decomposition theorem we have

V = V_1(λ_1) ⊕ ··· ⊕ V_1(λ_r).

Let B_i be a basis for V_1(λ_i). Then obviously the elements of B_i are eigenvectors, and B = B_1 ∪ ··· ∪ B_r is a basis of V. Therefore T is diagonalizable. □

This gives a practical criterion to check whether a given matrix is diagonalisable or not: calculate the minimal polynomial and factor it over C. If it does not have multiple roots then the matrix is diagonalisable; if it does, then it is not.

3.4.78 Example Let k = C and let

A = [ 4 2 ]
    [ 3 3 ]

The characteristic polynomial is (x − 1)(x − 6). The minimal polynomial is the same. The matrix is diagonalisable. One finds that a basis of eigenvectors is

(2, −3)^t,  (1, 1)^t.

In fact this matrix is diagonalisable over R or even Q.

39 3.4.79 Example Let k = R and let

0 1 A = − . 1 0   The characteristic polynomial is x2 +1. It is irreducible over R. The minimal polynomial is the same. The matrix is not diagonalisable over R, however over C x2 +1=(x i)(x + i) and the − matrix is diagonalisable.

3.4.80 Example Let k = C and let

A = [ 1 −1 ]
    [ 1 −1 ]

The characteristic polynomial is X^2. Since A ≠ 0 the minimal polynomial is also X^2. Since this is not a product of distinct linear factors, A is not diagonalizable over C.

3.4.81 Example Let k = C and let

1 0 A = . 1 1   The minimal polynomial is (x 1)2. Not diagonalisable. −
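As a sketch of this criterion in code (my addition to the notes, assuming SymPy and that the characteristic polynomial splits over a field SymPy can handle; names are illustrative): T is diagonalisable iff the product of (A − λ_i·I) over the distinct eigenvalues λ_i is the zero matrix, since that product is the candidate minimal polynomial (X − λ_1)···(X − λ_r) evaluated at A.

import sympy as sp

def is_diagonalisable(A):
    """True iff (A - lam_1 I)...(A - lam_r I) = 0 over the distinct eigenvalues,
    i.e. iff m_A(X) = (X - lam_1)...(X - lam_r)."""
    n = A.shape[0]
    X = sp.Symbol('X')
    P = sp.eye(n)
    for lam in sp.roots(A.charpoly(X).as_expr(), X):
        P = P * (A - lam * sp.eye(n))
    return P == sp.zeros(n, n)

print(is_diagonalisable(sp.Matrix([[4, 2], [3, 3]])))    # True
print(is_diagonalisable(sp.Matrix([[1, 0], [1, 1]])))    # False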

3.5 Jordan Bases in the one eigenvalue case

Let T : V → V be a linear map. Fix a basis for V and let A be the matrix of T in this basis. As we have seen above, it is not always the case that T can be diagonalised; i.e. there is not always a basis of V consisting of eigenvectors of T. In the case that there is no basis of eigenvectors, the best kind of basis is a Jordan basis. We shall define a Jordan basis in several steps.
Suppose λ ∈ k is the only eigenvalue of a linear map T : V → V. We have defined generalized eigenspaces:

V_1(λ) ⊆ V_2(λ) ⊆ ... ⊆ V_b(λ),

where b is the power of X − λ in the minimal polynomial m_T. We can choose a basis B_1 for V_1(λ). Then we can choose B_2 so that B_1 ∪ B_2 is a basis for V_2(λ), etc. Eventually we end up with a basis B_1 ∪ ... ∪ B_b for V_b(λ). We'll call such a basis a pre-Jordan basis.

3.5.82 Example Consider 3 2 A = − . 8 5  −  One calculates the characteristic polynomial and finds (x + 1)2 hence λ = 1 is the only eigen- − 1 value. The unique eigenvector is v = . Hence V ( 1) = Span(v). Of course we have 2 1 −  

40 0 (A λI )2 = 0 hence V ( 1) = C2 and we complete v to a basis of C2 = V ( 1), by v = − 2 2 − 2 − 2 1   for example. We have Av = 2v v and hence in the basis v ,v the matrix of A 2 − 1 − 2 { 1 2} 1 2 − − 0 1  −  The basis v ,v is a pre-Jordan basis for A and in this basis the matrix of A is upper triangular. { 1 2} This is a general fact : AV (λ) V (λ) k ⊂ k Indeed, if v V (λ), we have ∈ k (A λI )kAv = A(A λI )kv = 0 − n − n hence Av V (λ) and therefore in the pre-Jordan basis the matrix of A is upper triangular. ∈ k 3.5.83 Example 2 1 2 − A = 1 2 2  −  1 1 1 −   We have ch (X) = (X 1)3 and m (X) = (X 1)2. There is only one eigenvalue λ = 1, and A − A − we have generalized eigenspaces

V (1) = ker 1 1 2 , V (1) = ker(0) = C3. 1 − 2 So we can choose a pre-Jordan basis as follows:

1 0 1 = 1 , 2 , = 0 . B1 −    B2    0 1   0        This in fact also works over R.     Now note the following:

3.5.84 Lemma If v V (λ) with t> 1 then ∈ t

(T λ id)(v) Vt 1(λ). − · ∈ −

Proof. Clear from the definition of the generalised eigenspaces. 2 Now suppose we have a pre-Jordan basis ... . We call this a Jordan basis if in B1 ∪ ∪ Bb addition we have the condition:

(T λ id) t t 1, t = 2, 3,...,b. − · B ⊂ B − If we have a pre-Jordan basis ... , then to find a Jordan basis, we do the following: B1 ∪ ∪ Bb 41 For each basis vector v b, replace one of the vectors in b 1 by (T λ id)(v). When • ∈ B B − − · choosing which vector to replace, we just need to take care that we still have a basis at the end.

For each basis vector v b 1, replace one of the vectors in b 2 by (T λ id)(v). When • ∈ B − B − − · choosing which vector to replace, we just need to take care that we still have a basis at the end.

etc. • For each basis vector v , replace one of the vectors in by (T λ id)(v). When • ∈ B2 B1 − · choosing which vector to replace, we just need to take care that we still have a basis at the end.

We’ll prove later that this method always works.

3.5.85 Example Let’s look again at

3 2 A = − . 8 5  −  We have seen that v ,v is a pre-Jordan basis, here v is the second vector in the standard { 1 2} 2 basis. 2 Replace v by the vector (A + I )v = . Then v ,v still forms a basis of C2. This is 1 2 2 4 { 1 2} the Jordan basis for A.   We have Av = v and Av = v v (you do not need to calculate, just use (A+I )v = v ). 1 − 1 2 1− 2 2 2 1 Hence in the new basis the matrix is 1 1 − 0 1  − 

3.5.86 Example In the example above, we replace one of the vectors in by B1 1 1 (A I ) 0 = 1 . − 3     0 1     So we can choose a Jordan basis as follows: 1 1 1 = 1 , 1 , = 0 . B1 −    B2    0 1   0           

42 Lecture 13

3.5.87 Example Take k = R.

1 1 A = − . 1 1  −  Here, we have seen that the characteristic and minimal polynomials are x2. Therefore, 0 is the only eigenvalue. 1 Clearly v = generates the eigenspace and V (0) = R2. We complete the basis by taking 1 1 2   0 v = . We get a pre-Jordan basis. 2 1   1 Let’s construct a Jordan basis. Replace v1 by Av2 = − . This is a Jordan basis. The 1 matrix of A in the new basis is −  0 1 0 0  

3.5.88 Example

A = [ 2 2 2 ]
    [ 0 2 2 ]
    [ 0 0 2 ]

Clearly, the characteristic polynomial is (x − 2)^3 and it is equal to the minimal polynomial; 2 is the only eigenvalue.
V_1(2) has equations y = z = 0, hence it is spanned by v_1 = (1, 0, 0)^t.
V_2(2) is z = 0, and V_3(2) is R^3. Therefore the standard basis is a pre-Jordan basis. We have

(A − 2I_3)^2 = [ 0 0 4 ]
               [ 0 0 0 ]
               [ 0 0 0 ]

Now,

A − 2I_3 = [ 0 2 2 ]
           [ 0 0 2 ]
           [ 0 0 0 ]

We have

(A − 2I_3)v_3 = (2, 2, 0)^t

and we replace v_2 by this vector.
Now (A − 2I_3)v_2 = (4, 0, 0)^t, and we replace v_1 by this vector. We get:

v_1 = (4, 0, 0)^t,  v_2 = (2, 2, 0)^t,  v_3 = (0, 0, 1)^t.

We have

Av_1 = 2v_1,  Av_2 = v_1 + 2v_2,  Av_3 = v_2 + 2v_3.

In this basis the matrix of A is:

[ 2 1 0 ]
[ 0 2 1 ]
[ 0 0 2 ]

This is a Jordan basis.
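For checking such hand computations, SymPy has a built-in routine. This snippet (my addition, assuming SymPy) recovers a Jordan basis and the Jordan matrix for the example above; note that the Jordan basis SymPy picks need not be the one found by hand, but the Jordan matrix is the same.

import sympy as sp

A = sp.Matrix([[2, 2, 2],
               [0, 2, 2],
               [0, 0, 2]])
P, J = A.jordan_form()      # columns of P form a Jordan basis, J = P^-1 A P
print(J)                    # a single 3x3 Jordan block with eigenvalue 2
print(sp.simplify(P.inv() * A * P - J) == sp.zeros(3, 3))   # True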

3.6 Jordan Canonical (or Normal) Form in the one eigenvalue case

The Jordan canonical form of a linear map T : V → V is essentially the matrix of T with respect to a Jordan basis. We just need to order the vectors appropriately. Everything is over a field k; often k will be C.
Suppose for the moment that T has only one eigenvalue λ and choose a Jordan basis:

B = B_1 ∪ ... ∪ B_b,

where m_T(x) = (x − λ)^b. Of course, as (A − λ·id)^b = 0, we have V_b(λ) = V.
We have a chain of subspaces

V_1(λ) ⊂ V_2(λ) ⊂ ··· ⊂ V_b(λ) = V

and the pre-Jordan basis was constructed by starting with a basis of V_1(λ) and completing it successively to get a basis of V_b(λ) = V. We then altered this basis so that

(T − λ·id)B_i ⊂ B_{i−1}.

Notice that we can arrange the basis elements in chains: starting with a vector v ∈ B_b we get a chain

v, (T − λ·id)v, (T − λ·id)^2 v, ..., (T − λ·id)^{b−1} v.

This last vector w = (T − λ·id)^{b−1} v is in V_1(λ). Indeed

(T − λ·id)^b v = 0,

hence (T − λ·id)w = 0, therefore Tw = λw, therefore w ∈ V_1(λ).
We have the following:

3.6.89 Lemma For any v ∈ B_b (in particular v ∉ B_i for i < b!), the vectors

v, (T − λ·id)v, (T − λ·id)^2 v, ..., (T − λ·id)^{b−1} v

are linearly independent.

Proof. Suppose that

Σ_i µ_i (T − λ·id)^i v = 0.

Then

µ_0 v + (T − λ·id)w = 0,

where w is a linear combination of the vectors (T − λ·id)^k v. Multiplying by (T − λ·id)^{b−1}, we get

µ_0 (T − λ·id)^{b−1} v = 0,

but, as v ∉ V_{b−1}(λ), we see that

(T − λ·id)^{b−1} v ≠ 0,

hence µ_0 = 0. Repeating the process inductively, we get that µ_i = 0 for all i and the vectors we consider are linearly independent. □

Let us number the vectors in this chain as v_1 = (T − λ·id)^{b−1} v, ..., v_b = v. In other words

v_i = (T − λ·id)^{b−i} v.

Then

(T − λ·id)v_i = v_{i−1},

i.e.

Tv_i = λv_i + v_{i−1}.

In other words, in the basis formed by this chain, the column representing T(v_i) has a 1 in position i − 1, a λ in position i, and zeros elsewhere. This gives a Jordan block, i.e. the b × b matrix

J_b(λ) = [ λ 1         ]
         [   λ 1       ]
         [     ⋱ ⋱     ]
         [       ⋱ 1   ]
         [         λ   ]

In this way, we arrange our Jordan basis in chains starting with B_i (for i = b, b − 1, ..., 1) and terminating at V_1(λ). By putting the chains together, we get that in the Jordan basis, the matrix is of the following form:

[ λ 1                     ]
[   λ 1                   ]
[     λ 1                 ]
[       λ                 ]
[         λ 1             ]
[           λ 1           ]
[             λ 1         ]
[               λ         ]
[                 λ 1     ]
[                   λ     ]
[                     λ   ]
[                       λ ]

(blank entries are zero; in this illustration the blocks have sizes 4, 4, 2, 1 and 1). We can write it as

[T ] = diag(Jh1 (λ),...,Jhw (λ)). B where the Jhi s are blocks corresponding to a chain of length hi. We can actually say more; in fact the following results determines the number of blocks:

3.6.90 Lemma The number of blocks is the dimension of the eigenspace V1(λ). Proof. Let

(v1,...,vk) be the Jordan basis of the subspace U corresponding to one block. It is a chain, we have Tv1 = λv1 and Tvi = λvi + vi 1 − for 2 i k. ≤ ≤ Let v U be an eigenvector : Tv = λv. Write v = k c v . Then ∈ i=1 i i P Tv = c1λv1 + ci(λvi + vi 1)= λv + civi − i 2 i 2 X≥ X≥

It follows that Tv = λv if and only if i 2 civi = 0 which implies that c2 = = cn = 0 ≥ ··· and hence v is in the subspace generated by v1. Therefore, each block determines exactly one P eigenvector for eigenvalue λ. As eigenvectors from different blocks are linearly independent : they are members of a basis, the number of blocks is exactly the dimension of the eigenspace V1(λ). 2 SUMMARY : To summarise what we have seen so far. Suppose T has one eigenvalue λ, let mT (x) = (x λid)b be its minimal polynomial. We construct a pre-Jordan basis by choosing a basis − B1 for the eigenspace V (λ) and then complete by (a certain number of vectors in V (λ)) and 1 B2 2 46 then to ,..., . Note that V (λ) = V . We get a pre-Jordan basis = . Ina B3 Bb b B B1 ∪···∪Bb pre-Jordan basis the matrix of T is upper tirangular. Now we alter the pre-Jordan basis by doing the following. Start with a vector v , b ∈ B replace one of the vectors in b 1 by vb 1 = (T λid)vb making sure that this vb 1 is linearly B − − − − independent of the other vectors in b 1. Then replace a vector in b 2 by vb 2 = (T λid)vb 1 B − B − − − − (again choose a vector to replace by choosing one such that vb 2 is linearly independent of the others)... continue until you get to V (λ). The last vector will− be v V (λ) i.e. v is an 1 1 ∈ 1 1 eigenvector. We obtain a chain of vectors

v1 = (T λid)v2,v2 = (T λid)v3,...,vb 1 = (T λid)vb,vb − − − − Hence in particular Tvk = vk 1 + λvk − The subspace U spanned by this chain is T -stable (because Tvk = vk 1 + λvk) and this chain is linearly independent hence the chain forms a basis of U. In restriction− to U and with respect to this basis the matrix of T is λ 1 λ 1   λ 1    λ     λ 1     λ 1  J(b)(λ)=   .  λ 1     λ     λ 1     λ     λ     λ      One constructs such chains with all elements of b. Once done, one looks for elements in b 1 B B − which are not in the previously constructed chains starting at and constructs chains Bb with them. Then with b 2, etc... B − In the end, the union of chains will be a Jordan basis and in it the matrix of T is of the form :

diag(Jh1 (λ),...,Jhw (λ)). Notice the following two observations : 1. There is always a block of size b b. Hence by knowing the degree of the × minimal polynomial, in some cases it is possible to determine the shape of Jordan normal form. 2. The number of blocks is the dimension of the eigenspace V1(λ) Here are some examples: Suppose you have a matrix such that

ch_A(x) = (x − λ)^5

and

m_A(x) = (x − λ)^4.

There is always a block of size 4 × 4, hence the Jordan normal form has one 4 × 4 block and one 1 × 1 block.
Suppose ch_A is the same but m_A(x) = (x − λ)^3. Here you need to know more. There is one 3 × 3 block and then either two 1 × 1 blocks or one 2 × 2 block. This is determined by the dimension of V_1(λ). If it's three then the first possibility, if it's two then the second.
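These case distinctions can also be read off mechanically. It is a standard fact (not proved in these notes) that the number of Jordan blocks of size exactly t for the eigenvalue λ is rank(N^{t−1}) − 2·rank(N^t) + rank(N^{t+1}), where N = A − λI. A small sketch of this (my addition, assuming SymPy; names are illustrative), which can be used to check the examples that follow:

import sympy as sp

def jordan_block_sizes(A, lam):
    """{block size: number of blocks} for eigenvalue lam, from ranks of (A - lam*I)^t."""
    n = A.shape[0]
    N = A - lam * sp.eye(n)
    rank = [n] + [(N ** t).rank() for t in range(1, n + 2)]
    sizes = {}
    for t in range(1, n + 1):
        count = rank[t - 1] - 2 * rank[t] + rank[t + 1]
        if count > 0:
            sizes[t] = count
    return sizes

# the 3x3 example from Lecture 13: one block of size 3
A = sp.Matrix([[2, 2, 2], [0, 2, 2], [0, 0, 2]])
print(jordan_block_sizes(A, 2))     # {3: 1}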

0 101 0 100 A =   1111 −  1102 −  One calculates that ch (x) = (x 1)4. We have  A − 1101 − 0 000 A I =   − 1101 −  1101 −  Clearly the rank of A I is 1, hence dim V (λ)=3.  − 1 This means that the Jordan normal form will have three blocks. Therefore there will be two blocks of size 1 1 and one of size 2 2. The Jordan normal form is × × 1000 0100   0011 0001   Another example:  

2 0 1 1 − − 0 2 1 0 A =  −  0 0 2 0 −  0 0 0 2  −  4  One calculates that chA(x) = (x + 2) . We have 0 0 1 1 − 00 1 0 A + 2I =   00 1 0 00 0 0   2  2 We see that (A + 2I) = 0 and therefore mA(x) = (x + 2) . As there is always a block of size two, there are two possibilities : either two 2 2 blocks or one 2 2 and two 1 1. × × × To decide which one it is, we see that the rank of A + 2I is 2 hence the dimension of the kernel is 2. There are therefore 2 blocks and the Jordan normal form is 21 0 0 − 0 2 0 0  −  0 0 2 1 −  0 0 0 2  −    48 Lecture 14

3.7 Jordan canonical form in general

Once we know how to determine the Jordan canonical form in the one eigenvalue case, the general case is easy. Let T be a linear transformation and λ_1, ..., λ_r its distinct eigenvalues. Suppose that the minimal polynomial decomposes as

m_T(x) = ∏_{i=1}^{r} (x − λ_i)^{b_i}

(recall again that this is always true over C). Then we have seen that

V = V_{b_1}(λ_1) ⊕ ··· ⊕ V_{b_r}(λ_r)

and each V_{b_i}(λ_i) is stable by T. Therefore in restriction to each V_{b_i}(λ_i), T is a linear transformation with one eigenvalue, namely λ_i, and the minimal polynomial of T restricted to V_{b_i}(λ_i) is (x − λ_i)^{b_i}.
One gets a Jordan basis by taking the union of Jordan bases for each V_{b_i}(λ_i), which are constructed as previously. Let's look at an example.

1 1 1 − A = 2 2 1 −  1 1 1 −   One calculates that ch (x)= x(x 1)2. Then 0 and 1 are the only eigenvalues and A − V = V (0) V (1) 1 ⊕ 2

One finds that V1(0) is generated by 1 v = 1 0   0   That will be the first vector of the basis. For λ = 1. We have 2 1 1 − A I = 2 1 1 − −  1 1 0 −   We find that V1(λ) is spanned by 1 v = 1 1   1 Then   1 0 1 − (A I)2 = 1 0 1 −  −  0 0 0 49  and V2(λ) is spanned by 1 v = 1 1   1 and   0 v = 1 2   0   Notice that (A I)v = v and therefore this is already a Jordan basis. The matrix in this basis − 2 1 is 0 0 0 0 1 1   0 0 1 Another example :  

5 4 2 A = 4 5 2   2 2 2   One calculates that ch (x) = (x 1)2(x 10). Then 1 and 10 are the only eigenvalues. A − − One finds 1 1 − − V (1) = Span(u = 0 ,u 1 ) 1 1   2   2 0 The dimension is two, therefore there will be two blocks  of size 1 1 corresponding to the × eigenvalue 1. For V1(10), one finds

2 V (10) = Span(u = 2 ) 1 3   1   In the basis (u1,u2,u3), the matrix is

1 0 0 0 1 0   0 0 10   It is diagonal, the matrix is diagonalisable, in fact m = (x 1)(x 10). A − − And a last example : find Jordan basis and normal form of :

4 010 2 230 A =   1020 −  4 012     One finds that the characteristic polynomial is ch (x) = (x 2)2(x 3)2. A − − Hence 2 and 3 are eigenvalues and we have 50 2 010 2 030 A 2I =   − 1000 −  4 010     Clearly the dimension of the kernel is 2 and spanned by e2 and e4. The eigenspace has dimension two. So we will have two blocks of size 1 1 corresponding to eigenvalue 2. × For the eigenvalue 3:

1 0 1 0 2 1 3 0 A 3I =  −  − 1 0 1 0 − −  4 0 1 1  −    We see that rows one and three are identical, others are linearly independent. It follows that 1 1 the eigenspace is one-dimensional and spanned by u = − ) 1 −  3    We will have one block.   Let us calculate:

0 0 0 0 3 1 4 0 (A 3I)2 = − −  − 0 0 0 0  10 2 1 −    0 4 We take the vector v =  ) to complete the basis of ker(A 3I)2. 1 −  2 −  Now, we have (A 3I)v= uhence we already have a Jordan basis. − The basis (e2, e4,u,v) is a Jordan basis and in this basis the matrix is

2000 0200   0031 0003    

51 Lecture 15

4 Bilinear and Quadratic Forms

4.1 Matrix Representation

4.1.91 Definition Let V be a vector space over k. A bilinear form on V is a function f : V × V → k such that

• f(u + λv, w) = f(u, w) + λf(v, w);
• f(u, v + λw) = f(u, v) + λf(u, w).

I.e. f(v, w) is linear in both v and w.

4.1.92 Example An obvious example is the following: take V = R and f : R × R → R defined by f(x, y) = xy. Notice here the difference between linear and bilinear: f(x, y) = x + y is linear, f(x, y) = xy is bilinear. More generally f(x, y) = λxy is bilinear for any λ ∈ R.
More generally still, given a matrix A ∈ M_n(k), the following is a bilinear form on k^n:

f(v, w) = v^t A w = Σ_{i,j} v_i a_{i,j} w_j,   v = (v_1, ..., v_n)^t,  w = (w_1, ..., w_n)^t.

We'll see that in fact all bilinear forms are of this form.

1 2 4.1.93 Example If A = then the corresponding bilinear form is 3 4   x x 1 2 x f 1 , 2 = x y 2 = x x + 2x y + 3y x + 4y y . y y 1 1 3 4 y 1 2 1 2 1 2 1 2  1   2     2  

Recall that if = b1,...,bn is a basis for V and v = xibi then we write [v] for the B { } B column vector x1 P [v] = . . B  .  x  n    4.1.94 Definition If f is a bilinear form on V and = b ,...,b is a basis for V then we B { 1 n} define the matrix of f with respect to by B f(b1, b1) ... f(b1, bn) [f] = . . B  . .  f(b , b ) ... f(b , b )  n 1 n n    52 4.1.95 Proposition Let be a basis for a finite dimensional vector space V over k, dim(V )= B n. Any bilinear form f on V is determined by the matrix [f] . Moreover for v,w V , B ∈ f(v,w) = [v]t [f] [w] . B B B

Proof. Let v = x b + x b + + x b , 1 1 2 2 ··· n n so x1 [u] = . . B  .  x  n  Similarly suppose   y1 [w] = . . B  .  y  n  Then  

n f(v,w) = f xibi,w i=1 ! n X = xif (bi,w) Xi=1 n n = x f b , y b i  i j j Xi=1 Xj=1 n n  = xi yjf (bi, bj) Xi=1 Xj=1 n n = xiyjf (bi, bj) Xi=1 Xj=1 n n = ai,jxiyj Xi=1 Xj=1 = [v]t [f] [w] . B B B 2

Let us give some examples. Suppose that f : R^2 × R^2 → R is given by

f((x_1, y_1)^t, (x_2, y_2)^t) = 2x_1x_2 + 3x_1y_2 + x_2y_1.

Let us write the matrix of f in the standard basis. We have

f(e_1, e_1) = 2,  f(e_1, e_2) = 3,  f(e_2, e_1) = 1,  f(e_2, e_2) = 0,

hence the matrix in the standard basis is

[ 2 3 ]
[ 1 0 ]

Now suppose B = {b_1, ..., b_n} and C = {c_1, ..., c_n} are two bases for V. We may write one basis in terms of the other:

c_i = Σ_{j=1}^{n} λ_{j,i} b_j.

The matrix

M = [ λ_{1,1} ... λ_{1,n} ]
    [   ...         ...   ]
    [ λ_{n,1} ... λ_{n,n} ]

is called the transition matrix from B to C. It is always an invertible matrix: its inverse is the transition matrix from C to B. Recall that for any vector v ∈ V we have

[v]_B = M [v]_C,

and for any linear map T : V → V we have

[T]_C = M^{-1} [T]_B M.

We'll now describe how bilinear forms behave under change of basis.

4.1.96 Change of Basis Formula Let f be a bilinear form on a finite dimensional vector space V over k. Let B and C be two bases for V and let M be the transition matrix from B to C. Then

[f]_C = M^t [f]_B M.

Proof. Let u,v V with [u] = x, [v] = y, [u] = s and [v] = t. ∈ B B C C Let A = (a ) be the matrix representing f with respect to . i,j B By Theorem 5.1 we have f(u,v)= xtAy. Now x = Ms and y = Mt so

f(u, v) = (Ms)^t A (Mt) = (s^t M^t) A (Mt) = s^t (M^t A M) t.

We have f(c_i, c_j) = (M^t A M)_{i,j}. Hence

[f]_C = M^t A M = M^t [f]_B M.

For example, let f be the form from the previous example. It is given by

[ 2 3 ]
[ 1 0 ]

in the standard basis. We want to write this matrix in the basis

{ (1, 1)^t, (1, 0)^t }.

The transition matrix is

M = [ 1 1 ]
    [ 1 0 ]

(note that M is symmetric, so M^t = M). The matrix of f in the new basis is

[ 6 3 ]
[ 5 2 ]
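A quick machine check of this computation (my addition, assuming SymPy):

import sympy as sp

A = sp.Matrix([[2, 3],
               [1, 0]])       # [f]_B in the standard basis
M = sp.Matrix([[1, 1],
               [1, 0]])       # transition matrix: columns are the new basis vectors
print(M.T * A * M)            # Matrix([[6, 3], [5, 2]])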

55 Lecture 16

4.2 Symmetric bilinear forms and quadratic forms As before let V be a finite dimensional vector space over a field k.

4.2.97 Definition A bilinear form f on V is called symmetric if it satisfies f(v,w)= f(w,v) for all v,w V . ∈

4.2.98 Definition Given a symmetric bilinear form f on V, the associated quadratic form is the function q(v) = f(v, v). Notice that q has the property that q(λv) = λ^2 q(v).
For example, take the bilinear form f defined by

6 0 0 5   The corresponding quadratic form is

x q( ) = 6x2 + 5y2 y   4.2.99 Proposition Let f be a bilinear form on V and let be a basis for V . Then f is a B symmetric bilinear form if and only if [f] is a (that means ai,j = aj,i.). B

Proof. This is because f(ei, ej)= f(ej, ei). 2

4.2.100 Polarization Theorem If 1+1 = 0 in k then for any quadratic form q the underlying 6 symmetric bilinear form is unique.

Proof. If u,v V then ∈ q(u + v) = f(u + v,u + v) = f(u,u) + 2f(u,v)+ f(v,v) = q(u)+ q(v) + 2f(u,v).

So f(u, v) = ½ (q(u + v) − q(u) − q(v)). □

Let's look at an example. Consider

A = [ 2 1 ]
    [ 1 0 ]

It is a symmetric matrix. Let f be the corresponding bilinear form. We have

f((x1, y1), (x2, y2)) = 2x1x2 + x1y2 + x2y1 56 and q(x, y) = 2x2 + 2xy = f((x, y), (x, y))

Let u = (x1, y1),v = (x2, y2) and let us calculate 1 1 (q(u + v) q(u) q(v)) = (2(x + x )2 + 2(x + x )(y + y ) x2 2x y 2x2 2x y ) 2 − − 2 1 2 1 2 1 2 − 1 − 1 1 − 2 − 1 2 1 = (4x x + 2(x y + x y )) = f((x , y ), (x , y )) 2 1 2 1 2 2 1 1 1 2 2

If A = (ai,j) is a symmetric matrix, then the corresponding form is

f(x, y)= ai,ixiyi + ai,j(xiyj + xjyi) Xi Xi

n 2 q(x)= ai,ixi + 2 ai,jxixj Xi=1 Xi

4.2.101 Example Let q(x_1, x_2, x_3) = x_1^2 + 3x_2^2 + 5x_3^2 + 4x_1x_2 + 6x_1x_3 + 8x_2x_3. The matrix of this quadratic form is

A = [ 1 2 3 ]
    [ 2 3 4 ]
    [ 3 4 5 ]

The underlying bilinear form is represented by the same matrix. Let us write down the matrix of this form in the basis

{ (1, 0, 0)^t, (−2, 1, 0)^t, (1, −2, 1)^t }.

Consider the matrix

M = [ 1 −2  1 ]
    [ 0  1 −2 ]
    [ 0  0  1 ]

whose columns are the new basis vectors. Then the matrix we are looking for is

M^t A M = [ 1  0 0 ]
          [ 0 −1 0 ]
          [ 0  0 0 ]

Notice that in this new basis the quadratic form is

q((x_1, x_2, x_3)) = x_1^2 − x_2^2.
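Again this is easy to verify mechanically (my addition, assuming SymPy):

import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [2, 3, 4],
               [3, 4, 5]])
M = sp.Matrix([[1, -2,  1],
               [0,  1, -2],
               [0,  0,  1]])
print(M.T * A * M)            # diag(1, -1, 0)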

57 4.3 Orthogonality and diagonalization 4.3.102 Definition Let V be a vector space over k with a symmetric bilinear form f. We call two vectors v,w V orthogonal if f(v,w) = 0. It is a good idea to imagine this means that v ∈ and w are at right angles to each other. This is written v w. If S V then the orthogonal ⊥ ⊂ complement of S is defined to be

S⊥ = v V : w S,w v . { ∈ ∀ ∈ ⊥ }

4.3.103 Proposition S⊥ is a subspace of V .

Proof. Let v,w V and λ k. Then for any u S we have ∈ ∈ ∈ f(v + λw,u)= f(v,u)+ λf(w,u) = 0.

Therefore v + λw S . 2 ∈ ⊥

4.3.104 Definition A basis is called an if any two distinct basis vectors B are orthogonal. Thus is an orthogonal basis if and only if [f] is diagonal. B B

4.3.105 Diagonalisation Theorem Let f be a symmetric bilinear form on a finite dimensional vector space V over a field k in which 1 + 1 ≠ 0. Then there is an orthogonal basis B for V; i.e. a basis such that [f]_B is a diagonal matrix.
Notice that the existence of an orthogonal basis is indeed equivalent to the matrix being diagonal. Let B = {v_1, ..., v_n} be an orthogonal basis. By definition f(v_i, v_j) = 0 if i ≠ j, hence the only possible non-zero values are the f(v_i, v_i), i.e. on the diagonal. And of course the converse holds: if the matrix is diagonal, then f(v_i, v_j) = 0 if i ≠ j.
The quadratic form associated to such a bilinear form is

q(x_1, ..., x_n) = Σ_i λ_i x_i^2,

where the λ_i are the elements on the diagonal.

58 Lecture 17

4.3.106 Recall Let U, W be two subspaces of V . The sum of U and W is the subspace

U + W = u + w : u U, w W . { ∈ ∈ } We call this a direct sum U W if U W = 0 . This is the same as saying that ever element ⊕ ∩ { } of U + W can be written uniquely as u + w with u U and w W . ∈ ∈

4.3.107 Key Lemma Let v V and assume that q(v) = 0. Then ∈ 6

V = span v v ⊥. { } ⊕ { }

Proof. For w V , let ∈ f(v,w) f(v,w) w = v, w = w v. 1 f(v,v) 2 − f(v,v)

Clearly w = w + w and w span v . Note also that 1 2 1 ∈ { } f(v,w) f(v,w) f(w ,v)= f w v,v = f(w,v) f(v,v) = 0. 2 − f(v,v) − f(v,v)   Therefore w v . It follows that span v + v = V . To prove that the sum is direct, 2 ∈ { }⊥ { } { }⊥ suppose that w span v v . Then w = λv for some λ k and we have f(w,v) = 0. Hence ∈ { }∩{ }⊥ ∈ λf(v,v) = 0. Since q(v)= f(v,v) = 0 it follows that λ = 0 so w = 0. 2 6 Proof of the theorem. We use induction on dim(V )= n. If n = 1 then the theorem is true, since any 1 1 matrix is diagonal. So suppose the result holds for vector spaces of dimension × less than n = dim(V ). If f(v,v) = 0 for every v V then using Theorem 5.3 for any basis we have [f] = [0], ∈ B B which is diagonal. [This is true since 1 f(e , e )= (f(e + e , e + e ) f(e , e ) f(e , e )) = 0.] i j 2 i j i j − i i − j j So we can suppose there exists v V such that f(v,v) = 0. By the Key Lemma we have ∈ 6

V = span{v} ⊕ {v}^⊥.

Since span{v} is 1-dimensional, it follows that {v}^⊥ is (n − 1)-dimensional. Hence by the inductive hypothesis there is an orthogonal basis b_1, ..., b_{n−1} of {v}^⊥.
Now let B = {b_1, ..., b_{n−1}, v}. This is a basis for V. Any two of the vectors b_i are orthogonal by definition. Furthermore b_i ∈ {v}^⊥, so b_i ⊥ v. Hence B is an orthogonal basis. □

59 4.4 Examples of Diagonalising 4.4.108 Definition Two matrices A, B M (k) are congruent if there is an invertible matrix ∈ n P such that B = P tAP. We have shown that if and are two bases then for a bilinear form f, the matrices [f] and B C B [f] are congruent. C

4.4.109 Theorem Let A M (k) be symmetric, where k is a field in which 1 + 1 = 0, then ∈ n 6 A is congruent to a diagonal matrix.

Proof. This is just the matrix version of the previous theorem. 2 We shall next find out how to calculate the diagonal matrix congruent to a given symmetric matrix.

4.4.110 Recall There are three kinds of row operation:

swap rows i and j; • multiply row(i) by λ = 0; • 6 add λ row(i) to row(j). • × To each row operation there is a corresponding elementary matrix E; the matrix E is the result of doing the row operation to In. The row operation transforms a matrix A into EA. We may also define three corresponding column operations:

• swap columns i and j;
• multiply column(i) by λ ≠ 0;
• add λ × column(i) to column(j).

Doing a column operation to A is the same as doing the corresponding row operation to A^t. We therefore obtain (EA^t)^t = AE^t.

4.4.111 Definition By a double operation we shall mean a row operation followed by the corresponding column operation. If E is the corresponding elementary matrix then the double operation transforms a matrix A into EAE^t.

4.4.112 Lemma If we do a double operation to A then we obtain a matrix congruent to A.

Proof. EAEt is congruent to A. 2

60 Lecture 18 Recall that a symmetric bilinear forms are represented by symmetric matrices. If we change the basis then we will obtain a congruent matrix. We’ve seen that if we do a double operation to matrix A then we obtain a congruent matrix. This corresponds to the same quadratic form with respect to a different basis. We can always do a sequence of double operations to transform any symmetric matrix into a diagonal matrix.

4.4.113 Example Consider the quadratic form q((x, y)^t) = x^2 + 4xy + 3y^2.

1 2 1 2 1 0 A = . 2 3 → 0 1 → 0 1    −   −  This shows that there is a basis = b , b such that B { 1 2} q(xb + yb )= x2 y2. 1 2 − Notice that when we have done the first operation, we have multiplied A on the left by 1 0 E ( 2) = and when we have done the second, we have multiplied on the right 2,1 − 2 1 −  1 2 by E ( 2)t = 2,1 − 0 1 We find that −  1 0 E ( 2)AE ( 2)t = 2,1 − 2,1 − 0 1  −  Hence in the basis 1 2 , − 0 1     1 0 the matrix of the corresponding quadratic form is 0 1  −  4.4.114 Example Consider the quadratic form q(x, y)t = 4xy + y2
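In code, a double operation is literally E·A·E^t for the elementary matrix E. Here is the example just done, as a sketch (my addition, assuming SymPy):

import sympy as sp

A = sp.Matrix([[1, 2],
               [2, 3]])            # q(x, y) = x^2 + 4xy + 3y^2
E = sp.Matrix([[ 1, 0],
               [-2, 1]])           # row operation R2 -> R2 - 2*R1
print(E * A * E.T)                 # diag(1, -1): the double operation
# the new basis vectors are the columns of E^t, i.e. (1, 0)^t and (-2, 1)^t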

0 2 2 1 1 2 2 1 → 0 2 → 2 0       1 2 1 0 → 0 4 → 0 4  −   −  1 0 1 0 → 0 2 → 0 1  −   − 

This shows that there is a basis b , b such that { 1 2} q(xb + yb )= x2 y2. 1 2 −

The last step in the previous example transformed the -4 into a -1. In general, once we have a diagonal matrix we are free to multiply or divide the diagonal entries by squares: 61 4.4.115 Lemma For µ ,...,µ k = k 0 and λ ,...,λ k 1 n ∈ × \ { } 1 n ∈ 2 2 D(λ1,...,λn) is congruent to D(µ1λ1,...µnλn).

Proof. Since µ ,...,µ k 0 then µ µ = 0. So 1 n ∈ \ { } 1 ··· n 6

P = D(µ1,...,µn) is invertible. Then

t P D(λ1,...,λn)P = D(µ1,...,µn)D(λ1,...,λn)D(µ1,...,µn) 2 2 = D(µ1λ1,...,µnλn). 2

4.4.116 Definition Two bilinear forms f, f ′ are equivalent if they are the same up to a change of basis.

4.4.117 Definition The rank of a bilinear form f is the rank [f] for any basis . B B Clearly if f and f ′ have different rank then they are not equivalent.

62 Lecture 19

4.5 Canonical forms over C 4.5.118 Definition Let q be a quadratic form on vector space V over C, and suppose there is a basis of V such that B I [q] = r . B 0   I We call the matrix r a canonical form of q (over C). 0  

4.5.119 Canonical forms over C Let V be a finite dimensional vector space over C and let q be a quadratic form on V . Then q has exactly one canonical form.

Proof. (Existence) We first choose an orthogonal basis = b ,...,b . After reordering the B { 1 n} basis we may assume that q(b ),...,q(b ) = 0 and q(b ),...,q(b ) = 0. Since every complex 1 r 6 r+1 n number has a square root in C, we may divide b by q(b ) if i r. i i ≤ (Uniqueness) row and column operations do not change the rank of a matrix. Hence con- p gruent matrices have the same rank. 2

4.5.120 Corollary Two quadratic forms over C are equivalent iff they have the same canon- ical form.

4.6 Canonical forms over R 4.6.121 Definition Let q be a quadratic form on vector space V over R, and suppose there is a basis of V such that B Ir [q] = Is . B  −  0   Ir We call the matrix I a canonical form of q (over R).  − s  0   4.6.122 Sylvester’s Law of Inertia Let V be a finite dimensional vector space over R and let q be a quadratic form on V . Then q has exactly one (real) canonical form.

Proof. (existence) Let = b ,...,b be an orthogonal basis. We can reorder the basis so B { 1 n} that q(b1),...,q(br) > 0, q(br+1),...,q(br+s) < 0, q(br+s+1),...,q(bn) = 0. Then define a new basis by 1 bi i r + s, √ q(bi,bi) ci = | | ≤ b i>r + s.  i 63  The matrix of q with respect to is a canonical form. C (uniqueness) Suppose we have two bases and with B C

Ir Ir′ [q] = Is , [q] = Is′ . B  −  C  −  0 0     By comparing the ranks we know that r + s = r′ + s′. It’s therefore sufficient to prove that r = r′. Define two subspaces of V by

U = span b ,...,b , W = span c ′ ,...,c . { 1 r} { r +1 n}

If u is a non-zero vector of U then we have u = x1b1 + ... + xrbr. Hence

2 2 q(u)= x1 + ... + xr > 0.

Similarly if w W then w = y ′ c ′ + ... + y c , and ∈ r +1 r +1 n n 2 2 q(w)= y ′ ... y ′ ′ 0. − r +1 − − r +s ≤ It follows that U W = 0 . Therefore ∩ { } U + W = U W V. ⊕ ⊂ From this we have dim U + dim W dim V. ≤ Hence r + (n r′) n. − ≤ This implies r r . A similar argument shows that r r, so we have r = r . 2 ≤ ′ ′ ≤ ′

4.6.123 Definition The rank of a quadratic form is the rank of the corresponding matrix. Clearly, in the complex case it is the integer r that appears in the canonical form. In the real case, it is r + s. For a real quadratic form, the signature is the pair (r, s).
A real form q is positive definite if its signature is (n, 0), where n = dim V. In this case q(v) > 0 for all non-zero vectors v. It is negative definite if its signature is (0, n). In this case q(v) < 0 for all non-zero vectors v. There exists a non-zero vector v such that q(v) = 0 if and only if the signature (r, s) satisfies r + s < n, or r > 0 and s > 0.
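Since a real symmetric matrix is congruent to the diagonal matrix of its eigenvalues (this uses the orthogonal diagonalisation of Chapter 5), the signature can be read off from the signs of the eigenvalues. A sketch of this, added here and assuming SymPy; the function name is illustrative:

import sympy as sp

def signature(A):
    """(r, s) = (number of positive, number of negative) diagonal entries in a
    congruent diagonal form of the real symmetric matrix A (Sylvester's law)."""
    eigs = A.eigenvals()               # {eigenvalue: multiplicity}
    r = sum(m for lam, m in eigs.items() if lam > 0)
    s = sum(m for lam, m in eigs.items() if lam < 0)
    return r, s

print(signature(sp.Matrix([[1, 2], [2, 3]])))   # (1, 1): rank 2, indefinite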

64 Lecture 20 Examples of canonical forms over R and C.

65 Lecture 21

5 Inner Product Spaces

5.1 Geometry of Inner Product Spaces 5.1.124 Definition Let V be a vector space over R and let , be a symmetric bilinear h− −i form on V . We shall call the form positive definite if for all non-zero vectors v V we have ∈ v,v > 0. h i

5.1.125 Remark A symmetric bilinear form is positive definite if and only if its canonical form (over R) is In.

Proof. Clearly x2 + ... + x2 is positive definite on Rn. Conversely, suppose is a basis such 1 n B that the matrix with respect to is the canonical form. For any basis vector b , the diagonal B i entry satisfies b , b > 0 and hence b , b = 1. 2 h i ii h i ii

5.1.126 Definition Let V be a vector space over C. A Hermitian form on V is a function ⟨−, −⟩ : V × V → C such that:

• For all u, v, w ∈ V and all λ ∈ C,
⟨u + λv, w⟩ = ⟨u, w⟩ + λ⟨v, w⟩;

• For all u, v ∈ V,
⟨u, v⟩ = \overline{⟨v, u⟩}.

5.1.127 Example The simplest example is the following : take V = C, then = zw¯ is a hermitian form on C. A matrix A M (C) is called a Hermitian matrix if At = A¯. Here A¯ is the matrix obtained ∈ n from A by applying complex conjugation to the entries. If A is a Hermitian matrix then the following is a Hermitian form on Cn:

v,w = vtAw.¯ h i In fact every Hermitian form on Cn is one of these. To see why, suppose we are given a Hermitian form <,>. Choose a basis B = (b1,...,bn). Let v = i λibi and w = j µjbj. We calculate

⟨v, w⟩ = ⟨ Σ_i λ_i b_i, Σ_j µ_j b_j ⟩ = Σ_{i,j} λ_i \overline{µ_j} ⟨b_i, b_j⟩ = v^t A \overline{w},

where A = (⟨b_i, b_j⟩). Of course A^t = \overline{A}, because ⟨b_i, b_j⟩ = \overline{⟨b_j, b_i⟩}. Note that a Hermitian form is conjugate-linear in the second variable, i.e.

u,v + λw = u,v + λ¯ u,w . h i h i h i Note also that by the second axiom u,u R. h i ∈ 5.1.128 Definition A Hermitian form is positive definite if for all non-zero vectors v we have

v,v > 0. h i

Clearly, the fom zw¯ is positive definite.

5.1.129 Definition By an we shall mean one of the following: either A finite dimensional vector space V over R with a positive definite symmetric bilinear form;

or A finite dimensional vector space V over C with a positive definite Hermitian form.

We shall often write K to mean the field R or C, depending on which is relevant.

5.1.130 Example Consider the vector space V of all continuous functions [0, 1] → C. Then we can define

⟨f, g⟩ = ∫_0^1 f(x) \overline{g(x)} dx.

This defines an inner product on V (easy exercise).
Another example. Let V = M_n(R), the vector space of n × n matrices with real entries. Then

⟨A, B⟩ = tr(AB^t)

is an inner product on V.
Similarly, if V = M_n(C), then ⟨A, B⟩ = tr(A \overline{B}^t) is an inner product.

5.1.131 Definition Let V be an inner product space. We define the of a vector v V ∈ by v = v,v . || || h i p

5.1.132 Lemma For λ K we have λλ¯ = λ 2 for for v V we have λv = λ v . ∈ | | ∈ || || | | || || Proof. Easy. 2

67 5.1.133 Cauchy-Schwarz inequality If V is an inner product space then

u,v V u,v u v . ∀ ∈ |h i| ≤ || ||·|| ||

Proof. If v = 0 then the result holds so suppose v = 0. We have for all λ K, 6 ∈ u λv,u λv 0. h − − i≥ Expanding this out we have:

u 2 λ v,u λ¯ u,v + λ 2 v 2 0. || || − h i− h i | | || || ≥ u,v Setting λ = h v 2i we have: || || u,v v,u u,v 2 u 2 h i v,u h i u,v + h i v 2 0. || || − v 2 h i− v 2 h i v 2 || || ≥

|| || || || || || Multiplying by v 2 we get || || u 2 v 2 2 u,v 2 + u,v 2 0. || || || || − |h i| |h i| ≥ Hence u 2 v 2 u,v 2 . || || || || ≥ |h i| Taking the square root of both sides we get the result. 2

5.1.134 Triangle inequality If V is an inner product space with norm then ||·|| u,v V u + v u + v . ∀ ∈ || || ≤ || || || ||

Proof. We have

||u + v||^2 = ⟨u + v, u + v⟩ = ||u||^2 + 2ℜ⟨u, v⟩ + ||v||^2.

So the Cauchy–Schwarz inequality (together with ℜ⟨u, v⟩ ≤ |⟨u, v⟩|) implies that

||u + v||^2 ≤ ||u||^2 + 2||u|| ||v|| + ||v||^2 = (||u|| + ||v||)^2.

Hence ||u + v|| ≤ ||u|| + ||v||. □

5.1.135 Definition Two vectors v,w in an inner product space are called orthogonal if v,w = 0. h i 68 5.1.136 Pythagoras’ Theorem Let (V,<,>) be an inner product space. If v,w V are ∈ orthogonal, then v 2 + w 2 = v + w 2 || || || || || || Proof. Since

v + w 2 = v + w,v + w = v 2 + 2 v,w + w 2, || || h i || || ℜh i || || so we have v 2 + w 2 = v + w 2 || || || || || || if v,w = 0. 2 h i

69 Lecture 22

5.2 Gram–Schmidt Orthogonalization 5.2.137 Definition Let V be an inner product space. We shall call a basis of V an or- B thonormal basis if b , b = δ . h i ji i,j

5.2.138 Proposition If is an orthonormal basis then for v,w V we have: B ∈ v,w =[v]t [w] . h i B B

Proof. If the basis = (b ,...,b ) is orthonormal, then the matrix of <,> in this basis is B 1 n the identity In. The proposition follows. 2

5.2.139 Gram–Schmidt Orthogonalization Let be any basis. Then the basis defined B C by

c_1 = b_1
c_2 = b_2 − (⟨b_2, c_1⟩ / ⟨c_1, c_1⟩) c_1
c_3 = b_3 − (⟨b_3, c_1⟩ / ⟨c_1, c_1⟩) c_1 − (⟨b_3, c_2⟩ / ⟨c_2, c_2⟩) c_2
...
c_n = b_n − Σ_{r=1}^{n−1} (⟨b_n, c_r⟩ / ⟨c_r, c_r⟩) c_r

is orthogonal. Furthermore the basis D defined by

d_r = c_r / ||c_r||

is orthonormal.

Proof. Clearly each b_i is a linear combination of C, so C spans V. Hence C is a basis. It follows also that D is a basis. We'll prove by induction that {c_1, ..., c_r} is orthogonal. Clearly any one vector is orthogonal. Suppose c_1, ..., c_{r−1} are orthogonal. Then for s < r we have

⟨c_r, c_s⟩ = ⟨b_r, c_s⟩ − Σ_{k=1}^{r−1} (⟨b_r, c_k⟩ / ⟨c_k, c_k⟩) ⟨c_k, c_s⟩ = ⟨b_r, c_s⟩ − ⟨b_r, c_s⟩ = 0,

since ⟨c_k, c_s⟩ = 0 for k ≠ s. Hence c_1, ..., c_r are orthogonal. □

5.2.141 Proposition If V is an inner product space with an orthonormal basis = b ,...,b , B { 1 n} then any v V can be written as v = n v, e e . ∈ i=1h ii i Proof. We have v = n λ e andPv, e = n λ e , e = λ . 2 i=1 i i h ji i=1 ih i ji j P P
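Here is the Gram–Schmidt procedure as a short program (my addition to the notes, assuming SymPy and the standard real inner product ⟨u, v⟩ = u^t v; for a complex space one would conjugate inside the inner product):

import sympy as sp

def gram_schmidt(vectors):
    """Turn a list of linearly independent column vectors into an orthonormal list,
    following 5.2.139: subtract projections, then divide by the norm."""
    ortho = []
    for b in vectors:
        c = b.copy()
        for ck in ortho:
            c -= (b.dot(ck) / ck.dot(ck)) * ck     # remove the component along ck
        ortho.append(c)
    return [c / sp.sqrt(c.dot(c)) for c in ortho]

basis = [sp.Matrix([1, 1, 0]), sp.Matrix([1, 0, 1]), sp.Matrix([0, 1, 1])]
for d in gram_schmidt(basis):
    print(d.T)     # the d_r: pairwise orthogonal, each of norm 1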

71 Lecture 23

5.2.142 Definition Let S be a subspace of an inner product space V . The orthogonal com- plement of S is defined to be

S⊥ = v V : w S v,w = 0 . { ∈ ∀ ∈ h i }

5.2.143 Theorem If V is a and W is a subspace of V then

V = W W ⊥, ⊕ and hence any v V can be written as ∈

v = w + w⊥, for unique w W and w W . ∈ ⊥ ∈ ⊥

Proof. We show first that V = W + W ⊥. Let = e ,...,e be an orthonormal basis for V , such that e ,...,e is a basis for W . E { 1 n} { 1 r} This can be constructed by Gram-Schmidt orthogonalisation. (choose a basis b ,...,b for W { 1 r} and complete to a basis b ,...,b of V . { 1 n} Then apply Gram-Schmidt process. Notice that in Gram-Schmidt process, when constructing orthonormal basis, the vectors c1,...,ck lie in the space generated by c1,...,ck 1, bk. It follows − that the process will give an orthonormal basis e1,...,en such that e1,...,er is an orthonormal basis of W .) If v V then ∈ r n v = λiei + λiei. Xi=1 i=Xr+1 Now r λ e W. i i ∈ Xi=1 If w W then there exist µ R such that ∈ i ∈ r w = µiei. Xi=1 So n r n w, λiej = µiλj ei, ej = 0. * + h i j=Xr+1 Xi=1 j=Xr+1 Hence n λ e W ⊥. i i ∈ i=Xr+1 72 Therefore V = W + W ⊥. Next suppose v W W . So v,v = 0 and so v = 0. ∈ ∩ ⊥ h i Hence V = W W and so any vector v V can be expressed uniquely as ⊕ ⊥ ∈

v = w + w⊥, where w W and w W . 2 ∈ ⊥ ∈ ⊥

5.3 Adjoints 5.3.144 Definition An adjoint of a linear map T : V V is a linear map T such that → ∗ T (u),v = u, T (v) for all u,v V . h i h ∗ i ∈

5.3.145 existence and uniqueness Every T : V V has a unique adjoint. If T is → t represented by A (w.r.t. an orthonormal basis) then T ∗ is represented by A¯ .

t Proof. (Existence) Let T ∗ be the linear map represented by A¯ . We’ll prove that it is an adjoint of A. t t t t Tv,w =[v] A [w]=[v] A¯ [w]. = v, T ∗w . h i h i Notice that here we have used that the basis is orthonormal : we said that the matrix of <,> was the identity. (Uniqueness) Let T ∗, T ′ be two adjoints. Then we have

u, (T ∗ T ′)v = 0. h − i for all u,v V . In particular, let u = (T T )v, then (T T )v = 0 hence T (v)= T (v) ∈ ∗ − ′ || ∗ − ′ || ∗ for all v V . Therefore T = T . 2 ∈ ∗ ′

5.3.146 Example Consider V = C2 with the standard orthonormal basis and let T be repre- sented by 1 i A = i 1 −  Then T ∗ = T (such a linear map is called autoadjoint). Notice that T being autoadjoint is equivalent to the matrix representing it being hermitian

2i 1+ i A = 1+ i i −  Then T = T ∗ − t We also see that T ∗∗ = T (using that T ∗ is represented by A ).

73 5.4 Isometries. 5.4.147 Theorem If T : V V be a linear map of a Euclidean space V then the following → are equivalent.

1 (i) T T ∗ = id (i.e. T ∗ = T − ). (ii) u,v V Tu,Tv = u,v . (i.e. T preserves the inner product.) ∀ ∈ h i h i (iii) v V Tv = v . (i.e. T preserves the norm.) ∀ ∈ || || || ||

5.4.148 Definition If T satisfies any of the above (and so all of them) then T is called an isometry. t We also see that T ∗∗ = T (using that T ∗ is represented by A ). Proof. (i) = (ii) ⇒ Let u,v V then ∈ Tu,Tv = u, T ∗Tv = u,v , h i h i h i 1 since T ∗ = T − . (ii) = (iii) ⇒ If v V then ∈ Tv 2 = Tv,Tv || || h i so by (ii) Tv 2 = v,v = v 2. || || h i || || Hence Tv = v , so (iii) holds. || || || || (iii) = (ii) We just show that the form can be recovered from the norm. We have ⇒ 2 u,v = u + v 2 u 2 v 2, v,w = v,iw . ℜh i || || − || || − || || ℑh i ℜh i For the second equality, notice that,

2 = + = i +i = ℜ 1 − i( )= ( ) = 2 − − i − ℑ Now suppose Tv = v for all v. Take u,v V . We have T (u + v) = u + v , || || || || ∈ || || || || T (u) = u , and T (u) = v . It follows that || || || || || || || || 2 < T (u), T (v) >= T (u)+T (v) 2 T (u) 2 T (v) 2 = u+v 2 u 2 v 2 = 2 ℜ || || −|| || −|| || || || −|| || −|| || ℜ and the second inequality shows that

< T (u), T (v) >= R < T (u), iT (v) >= R(< T (u), T (iv) >)= R(< u,iv >)= ℑ ℑ . Hence < T (u), T (v) >= . 74 (ii) implies (i):

T ∗Tu,v = Tu,Tv h i h i = u,v . h i Therefore < (T T I)u,v >= 0 for all v. In particular, take v = (T T I)u, then (T T ∗ − ∗ − h ∗ − I)u, (T T 1)u = 0. Therefore T T = I. 2 ∗ − i ∗ Notice that in an orthonormal basis, an isometry is represented by a matrix such t 1 that A = A− .

75 Lecture 24 We let O (R) be the set of n n real matrices satisfying AAt = I (in other words At = A 1). n × n − 5.4.149 Proposition (a) If A O (R) then det A = 1. ∈ n ± (b) On(R) is a subgroup of GLn(R).

Proof. If A O (R) then At = A 1 so ∈ n − t 1 1 det A = det A = det(A− ) = det A− .

Therefore det(A)2 = 1 and det A = 1. ± Clearly On(R) is a subset of GLn(R) so to show that it is a subgroup it is enough to show that if A, B O (R) then AB 1 O (R). ∈ n − ∈ n Let A, B O (R). Then ∈ n 1 1 1 t t t 1 t (AB− )− = BA− = BA = (AB ) = (AB− ) .

Hence AB 1 O (R) and so O (R) is a subgroup of GL (R). 2 − ∈ n n n

5.4.150 Theorem If A GL (R) then the following are equivalent. ∈ n (i) A O (R). ∈ n (ii) The columns of A form an orthonormal basis for Rn (for the standard inner product on Rn).

(iii) The rows of A form an orthonormal basis for Rn.

Proof. We prove (i) (ii) (the proof of (i) (iii) is identical). t ⇐⇒ ⇐⇒ Consider A A. If A = [C1,...,Cn], so the jth column of A is Cj, then the (i, j)th entry of t t A A is Ci Cj. So AtA = I CtC = δ C , C = δ C ,...,C is an orthonormal n ⇐⇒ i j i,j ⇐⇒ h i ji i,j ⇐⇒ { 1 n} basis for Rn. 2 For example take the matrix:

1/√2 1/√2 − 1/√2 1/√2   This matrix is in O2(R). In fact it is the matrix of rotation by angle π/4. −

76 5.4.151 Theorem Let V be a Euclidean space with orthonormal basis = e ,...,e . If E { 1 n} = f ,..., f is a basis for V and P is the transition matrix from to , then F { 1 n} E F P O (R) is an orthonormal basis for V. ∈ n ⇐⇒ F

Proof. The jth column of P is [fj] so E n fj = pk,jek. Xk=1 Hence

n n n n n t fi, fj = pk,iek, pl,jel = pk,ipl,j ek, el = pk,ipk,j = (P P )i,j. h i * + h i Xk=1 Xl=1 Xk=1 Xl=1 Xk=1 So is an orthonormal basis for Rn f , f = δ iff P tP = I P O (R). 2 F ⇐⇒ h i ji i,j n ⇐⇒ ∈ n Notice that it is NOT true that matrices in On(R) are diagonalisable. Indeed, take cos(θ) sin(θ) sin(θ) cos(θ) −  where θ is not a multiple of π. The characteristic polynomial is x2 2 cos(θ)x + 1. Then, as cos(θ)2 1 < 0, there are no − − real eigenvalues and the matrix is not diagonalisable. Notice that for a given matrix A, it is easy to check that columns are orthogonal. If that is 1 t the case, then A is in On(R) and it is easy to calculate inverse : A− = A .

77 Lecture 25

5.5 Orthogonal Diagonalization 5.5.152 Definition Let V be an inner space. A linear map T : V V is self-adjoint if −→ T ∗ = T

t Notice that in an othonormal basis, T is represented by a matrix A such that A = A. In particular if V is real, then A is symmetric.

5.5.153 Theorem If A ∈ M_n(C) is Hermitian then all the eigenvalues of A are real.

Proof. Recall that Hermitian means that A^t = \overline{A}, and that this implies that ⟨Au, v⟩ = ⟨u, Av⟩ for all u, v. Let λ be an eigenvalue of A and let v ≠ 0 be a corresponding eigenvector. Then

Av = λv.

It follows that

λ⟨v, v⟩ = ⟨Av, v⟩ = ⟨v, Av⟩ = ⟨v, λv⟩ = \overline{λ}⟨v, v⟩.

As v ≠ 0, we have ⟨v, v⟩ = ||v||^2 ≠ 0, hence we can divide by it. It follows that λ = \overline{λ}, i.e. λ is real. □

In particular a real symmetric matrix always has a real eigenvalue: take a complex eigenvalue (one always exists!), then by the above theorem it will be real.

5.5.154 Theorem Spectral theorem Let T : V V be a self-adjoint linear map of an inner → product space V . Then V has an orthonormal basis of eigenvectors. Proof. This is rather similar to Theorem 5.4. We use induction on dim(V )= n. True for n = 1 so suppose the result holds n 1 and let − dim(V )= n. Since T is self-adjoint, if is an orthonormal basis for V and A is the matrix representing E T in then E t A = A . So A is Hermitian. Hence by Theorem 6.18 A has a real eigenvalue λ. So there is a vector e V 0 such that T e = λe . Normalizing (dividing by e ) we 1 ∈ \ { } 1 1 || 1|| can assume that e = 1. || 1|| Let W = Span e then by Theorem 6.9 we have V = W W . Now { 1} ⊕ ⊥ n = dim(V ) = dim(W ) + dim(W ⊥) = 1 + dim(W ⊥), so dim(W )= n 1. ⊥ − We claim that T : W W , i.e. T (W ) α(W ). Let w = µe W , µ R and ⊥ → ⊥ ⊥ ⊆ ⊥ 1 ∈ ∈ v W . Then ∈ ⊥ w,Tv = T ∗w,v = Tw,v = T (µe ),v = µT e ,v = µλe ,v = 0, h i h i h i h 1 i h 1 i h 1 i 78 since µλe W . Hence T : W W . 1 ∈ ⊥ → ⊥ By induction there exists an orthonormal basis of eigenvectors e ,...,e for W . But { 2 n} ⊥ V = W W so = e ,...,e is a basis for V and e , e = 0 for 2 i n and e = 1. ⊕ ⊥ E { 1 n} h 1 ii ≤ ≤ || 1|| Hence is an orthonormal basis of eigenvectors for V . 2 E

5.5.155 Theorem Let T : V V be a self-adjoint linear map of a Euclidean space V . If → λ, µ are distinct eigenvalues of T then

u V v V u,v = 0. ∀ ∈ λ ∀ ∈ µ h i

Proof. If u V and v V then ∈ λ ∈ µ

λ u,v = λu,v = Tu,v = u, T ∗v = u,Tv = u,µv = µ u,v . h i h i h i h i h i h i h i So (λ µ) u,v = 0, with λ = µ. Hence u,v = 0. 2 − h i 6 h i

5.5.156 Example Let

A = [ 1  i ]
    [ −i 1 ]

This matrix is self-adjoint.
One calculates the characteristic polynomial and finds t(t − 2) (in particular the minimal polynomial is the same, hence you know that the matrix is diagonalisable for other reasons than being self-adjoint).
For eigenvalue zero, one finds eigenvector

(−i, 1)^t.

For eigenvalue 2, one finds (i, 1)^t. Then we normalise the vectors: v_1 = (1/√2)(−i, 1)^t and v_2 = (1/√2)(i, 1)^t. We let

P = (1/√2) [ −i  i ]
           [  1  1 ]

and

P^{-1} A P = [ 0 0 ]
             [ 0 2 ]

In general the procedure for orthogonal orthonormalisationis as follows. Let A be an n n self-adjoint matrix. × Find eigenvalues λi and eigenspaces V1(λi). Because it is diagonalisable, you will have:

V = V (λ ) V (λ ) 1 1 ⊕···⊕ r r

Choose a basis for V as union of bases of V1(λi). Apply Gram-Schmidt to it to get an orthonormal basis. 79 For example :

A = [  1 −2  2 ]
    [ −2  4 −4 ]
    [  2 −4  4 ]

This matrix is symmetric hence self-adjoint.
One calculates the characteristic polynomial and finds λ^2(λ − 9).
For V_1(9), one finds v_1 = (1, −2, 2)^t. To make this orthonormal, divide by the norm, i.e. replace v_1 by (1/3)v_1.
For V_1(0), one finds V_1(0) = Span(v_2, v_3) with

v_2 = (2, 1, 0)^t  and  v_3 = (−2, 0, 1)^t.

By the Gram–Schmidt process we replace v_2 by

(1/√5)(2, 1, 0)^t

and v_3 by

(1/(3√5))(−2, 4, 5)^t.

Let

P = [  1/3   2/√5  −2/(3√5) ]
    [ −2/3   1/√5   4/(3√5) ]
    [  2/3   0      5/(3√5) ]

We have

P^{-1} A P = [ 9 0 0 ]
             [ 0 0 0 ]
             [ 0 0 0 ]
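The whole procedure for this example can be checked with a few lines (my addition, assuming SymPy): eigenvectors are found, orthonormalised as in Gram–Schmidt, and assembled into P.

import sympy as sp

A = sp.Matrix([[ 1, -2,  2],
               [-2,  4, -4],
               [ 2, -4,  4]])
cols = []
for lam, mult, vecs in A.eigenvects():
    for v in vecs:
        c = v.copy()
        for u in cols:                    # Gram-Schmidt against what we already have
            c -= v.dot(u) * u             # each u already has norm 1
        cols.append(c / sp.sqrt(c.dot(c)))
P = sp.Matrix.hstack(*cols)
print(sp.simplify(P.T * P))               # identity: P is orthogonal
print(sp.simplify(P.T * A * P))           # diagonal, with eigenvalues 0, 0, 9 (in some order)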
