<<

176

8. Linear Maps

In this chapter, we study the notion of a linear of abstract vector spaces. This is the abstraction of the notion of a linear transformation on Rn.

8.1. Basic definitions

Definition 8.1. Let U, V be vector spaces. A L : U V (reads L from U → to V ) is a rule which assigns to each element U an element in V , such that L preserves and . That is,

L(u1 + u2)=L(u1)+L(u2),L(cu)=cL(u).

Notation: we sometimes write L : u v,ifL(u)=v,andwecallv the of u under ￿→ L.

The two defining properties of a linear map are called .Wheredoesa linear map L : U V send the O in U? By linearity, we have → L(O)=L(0O)=0L(O).

But 0L(O) is the zero element, which we also denote by O,inV .Thus

L(O)=O.

Example. For any U, we have a linear map Id : U U given by U → Id : u u. U ￿→ 177

This is called the identity map. For any other vector space V , we also have a linear map O : U V given by → O : u O. ￿→ This is called the zero map.

Example. Let A be an m n , and define × L : Rn Rm,L(X)=AX. A → A We have seen, in chapter 3, that this map is linear. We have also seen that every linear map L : Rn Rm is represented by a matrix A,ie.ifL = L for some m n matrix A. → A × Example. Let A an m n matrix, and define × L : M(n, k) M(m, k),L(B)=AB. A → A This map is linear because

A(B1 + B2)=AB1 + AB2,A(cB)=c(AB) for any B ,B ,B M(n, k) and any number c (chapter 3). 1 2 ∈ Exercise. Is L : Rn R defined by L(X)=X X, linear? → · Exercise. Fix a vector A in Rn.IsL : Rn R defined by L(X)=A X, linear? → ·

Example. (With ) Let C∞ be the vector space of smooth functions, and let

D(f)=f ￿.ThenD : C∞ C∞ is linear. differentiation → Example. (With calculus) Continuing with the preceding example, let V = Span 1,t,..,tn n { } k k 1 in C∞.SinceD(t )=kt − for k =0, 1, 2,.., it follows that D(f) lands in Vn 1 for any − f in Vn. Thus we can regard D as a linear map

D : Vn Vn 1. → −

Theorem 8.2. Let U, V be vector spaces, S be a of U. Let f : S V be any map. → Then there is a unique linear map L : U V such that L(u)=f(u) for every element u → 178

in S.

Proof: If u1,..,uk are distinct elements in S,thenwedefine k k

L( aiui)= aif(ui) i=1 i=1 ￿ ￿ for any numbers ai.Sinceeveryelementw in U can be expressed uniquely in the form k a u , this defines a map L : U V . Moreover, L(u)=l(u) for every u in S.Soit i=1 i i → ￿remains to verify that L is linear.

Let w1,w2,w be elements in U, and c be a . We can write k

w1 = ai￿ ui i=1 ￿ k

w2 = ai￿￿ui i=1 ￿ k

w = aiui i=1 ￿ where u1,..,uk are distinct elements in S, and the a’s are numbers. Then

L(w1 + w2)=L[ (ai￿ + ai￿￿)ui] ￿ = (ai￿ + ai￿￿)f(ui) ￿ = ai￿ f(ui)+ ai￿￿f(ui)

= L￿(w1)+L(w2￿)

L(cw)=L[c aiui]

= L[ ￿(cai)ui]

= ￿(cai)f(ui)

= c￿ aif(ui) = cL￿(w). Thus L is linear.

For the rest of this section, L : U V will be a linear map. →

Definition 8.3. The Ker L = u U L(u)=O { ∈ | } 179 is called the of L. The set

Im L = L(u) u U { | ∈ } is called the image of L. More generally, for any subspace W U,wewrite ⊂ L(W )= L(w) w W . { | ∈ }

Lemma 8.4. Ker L is of U. Im L is a linear subspace of V .

Proof: If u ,u Ker L,then 1 2 ∈

L(u1 + u2)=L(u1)+L(u2)=O + O = O.

Thus u + u Ker L.Ifc is any number and u Ker L,then 1 2 ∈ ∈ L(cu)=cL(u)=cO = O.

Thus cu Ker L.ThusKer L is closed under addition and scaling in U, hence is a linear ∈ subspace of U.

The proof for Im L is similar and is left as an exercise.

Example. Let A an m n matrix, and L : Rn Rm with L (X)=AX.Then × A → A t Ker LA = Null(A), and Im LA = Row(A ). This shows that the notions of kernel and image are abstractions of the notions of null space and row space.

Example. (With calculus) Consider D : C∞ C∞ with D(f)=f ￿. Ker D consists → of just constant functions. Any smooth function is the of a smooth function

(fundamental theorem of calculus). Thus Im D = C∞.

n Example. (With calculus) As before, let V = Span 1,t,..,t in C∞, and consider n { } k k 1 D : Vn Vn 1. As before, Ker D consists of the constant functions. Since D(t )=kt − → − n 1 for k =0, 1, 2,.., it follows 1,t,..,t − are all in Im D. Thus any of their is also in Im D.SowehaveIm D = Vn 1. −

Example. Fix a nonzero vector A in Rn, and define

L : Rn R,L: X A X. → ￿→ · 180

Then Ker L = V ⊥ where V is the spanned by A, and Im L = R.

Example. Let L : U V be a linear map, and W be a linear subspace of U.Wedefine → a new map L : W V as follows: |W → L (w)=L(w). |W This map is linear. L is called the restriction of L to W . |W

8.2. A relation

Throughout this section, L : U V will be a linear map of finite dimensional → vector spaces.

Lemma 8.5. Suppose that Ker L = O and that u ,..,u is a linearly independent { } { 1 k} set in U. Then L(u1),..,L(uk) form a linearly independent set.

Proof: Let k

xiL(ui)=O. i=1 ￿ By linearity of L, this reads k

L( xiui)=O. i=1 ￿ Since Ker L = O , this implies that { } k

xiui = O. i=1 ￿ Since u ,..,u is linearly independent, it follows that the x’s are all zero. Thus { 1 k} L(u1),..,L(uk) form a linearly independent set.

Theorem 8.6. (-nullity relation) dim U = dim (Ker L)+dim (Im L).

Proof:

If Im L is the zero space, then Ker L = U, and the theorem holds trivially. Suppose Im L is not the zero space, and let v ,..,v be a basis of Im L. Let L(u )=v . { 1 s} i i 181

Let w ,..,w be a basis of Ker L U. We will show that the elements u ,..,u ,w ,..,w { 1 r} ⊂ 1 s 1 r form a linearly independent set that spans U.

Linear independence: Let • x u + + x u + y w + + y w = O. 1 1 ··· s s 1 1 ··· r r Applying L to this, we get x v + + x v = O. 1 1 ··· s s Since v ,..,v is linearly independent, the x’s are all zero, and so { 1 s} y w + + y w = O. 1 1 ··· r r Since w ,..,w is linearly independent, the y’s are all zero. This shows that { 1 r} u1,..,us,w1,..,wr form a linearly independent set.

Spanning property: Let u U.Since v ,..,v spans Im L,wehave • ∈ { 1 s} L(u)=a v + + a v 1 1 ··· s s for some a . This gives L(u)=a L(u )+ + a L(u ). By linearity of L,thisyields i 1 1 ··· s s L(u a u a u )=0, − 1 1 −···− s s which means that u a u a u Ker L.So − 1 1 −···− s s ∈ u a u a u = b w + + b w − 1 1 −···− s s 1 1 ··· r r because w ,..,w spans Ker L. This shows that { 1 r} u = a u + + a u + b w + + b w . 1 1 ··· s s 1 1 ··· r r This completes the proof.

Example. Let A be an m n matrix, and consider L : Rn Rm, L (X)=AX.Since × A → A t t Ker LA = Null(A),ImLA = Row(A ),rankA= dim Row(A ),

The preceding theorem yields

n = rank A + dim Null(A).

We have proven this in chapter 4 by using an orthogonal basis. Note that the preceding theorem does not involve using any inner product. 182

n Example. (With calculus) As before, let V = Span 1,t,..,t in C∞, and consider n { } D : Vn Vn 1. Note that dim Vn = n + 1. We have seen that Ker D consists of the → − constant functions, and that Im D = Vn 1.Sodim(Ker D) = 1 and dim(Im D)=n. − Thus, in this case,

dim Vn = dim(Ker D)+dim(Im D)

as expected.

Corollary 8.7. Let U, W be finite dimensional linear subspaces of V . Then

dim(U)+dim(W )=dim(U + W )+dim(U W ). ∩

Proof: Define a linear map L : U W V ,(u, w) u w.Then ⊕ → ￿→ − Ker L = (u, u) u U, u W ,ImL= U + W. { | ∈ ∈ } We have seen that dim(U W )=dim(U)+dim(W ). By the dimension relation, it suffices ⊕ to show that dim(Ker L)=dim(U W ). For this, let u ,..,u be a basis of U W ,so ∩ { 1 s} ∩ that dim(U W )=s. We will show that the elements (u ,u ),..,(u ,u ) form a basis of ∩ 1 1 s s Ker L, so that dim(Ker L)=s.

Linear independence: consider •

xi(ui,ui)=(O, O). ￿ This means that

xiui = O.

Since u ,..,u is linearly independent,￿ the x’s are all zero. So (u ,u ),..,(u ,u ) form a { 1 s} 1 1 s s linearly independent set.

Spanning property: recall that an element of Ker L is of the form (u, u)withu U W . • ∈ ∩ Since u ,..,u spans U W ,wehave { 1 s} ∩

u = aiui ￿ 183

for some scalars ai.Thus

(u, u)=( aiui, aiui)= ai(ui,ui).

So (u ,u ),..,(u ,u ) spans Ker￿ L. ￿ ￿ { 1 1 s s }

Definition 8.8. Let L : U V be a linear map. We call L injective if Ker L =0.We → call L surjective if Im L = V .WecallL bijective if L is both injective and surjective.

Corollary 8.9. Let L : U V be a linear map of finite dimensional vector spaces. → (a) If L is injective then dim(U) dim(V ). ≤ (b) If L is surjective then dim(U) dim(V ). ≥ (c) If L is bijective then dim(U)=dim(V ).

Proof: All three follows from the dimension relation immediately.

Corollary 8.10. Let L : U V be a linear map of finite dimensional vector spaces with → dim(U)=dim(V ).

(a) If L is injective then L is bijective.

(b) If L is surjective then L is bijective.

Proof: (a) For L injective ie. Ker L = 0, the dimension relation says that dim(U)= dim(Im L). So our assumption gives dim(Im L)=dim(V ), ie. Im L is a subspace of V of the same dimension as V . A corollary of the Dimension Theorem says that V = Im L.

(b) For L surjective ie. Im L = V , our assumption says that dim(U)=dim(Im L). So the dimension relation gives dim(Ker L) = 0. Hence L is injective.

Example. Warning. The preceding corollary fails when U, V are infinite dimensional.

Consider for example D : C∞ C∞ as before. We know that C∞ is infinite dimensional. → Since every smooth function is the derivative of a smooth function, D is surjective. But we know that D(1) = 0, and so D is not injective. 184

Exercise. Let M : C∞ C∞ be the map M : f tf, ie. multiplication by the function → ￿→ t. Show that M is injective, but not surjective.

Theorem 8.11. Let L : U V be an injective linear map. If u ,...,u form a linearly → 1 k independent set in U, then L(u1),...,L(uk) form a linearly independent set in V .

Proof: Consider the

xiL(ui)=O. i ￿ By linearity, we can rewrite this as

L xiui = O. ￿ i ￿ ￿ By injectivity, it follows that i xiui = O.Sinceu1,...,uk form a linearly independent set, it follows that x1,...,xk are￿ all zero.

Corollary 8.12. Let L : U V be an injective linear map. If S is a basis of U, then the → set L(S)= L(u) u S is a basis of Im L. { | ∈ }

Proof: By the preceding theorem the set L(S) is linearly independent. Since L(S) also spans Im L, it follows that L(S) is a basis of Im L.

Corollary 8.13. Let L : U V be a bijective linear map of finite dimensional vector → spaces. If u ,...,u is a basis of U, then L(u ),...,L(u ) is a basis of V . { 1 n} { 1 n }

Proof: Since L is injective, L(u ),...,L(u ) is a basis of Im L by the preceding lemma. { 1 n } Since L is surjective, Im L = V .

Exercise. Let u ,...,u , v ,...,v be respective bases of two vector spaces U, V .Show { 1 n} { 1 n} that there is a unique bijective linear map L : U V such that →

L(ui)=vi,i=1,...,n. 185

Definition 8.14. Given a linear map L : U V , we define its rank to be dim(Im L). → Thus rank(L)=dim(U) dim(Ker L). −

Let u ,..,u and v ,..,v be bases of U and V respectively. For any u U, { 1 r} { 1 s} ∈ L(u) is a linear combination of v ,..,v in a unique way. In particular, { 1 r} L(u )=a v + + a v = a v i i1 1 ··· is s ij j j ￿ where (a ,..,a ) are the coordinates of L(u ) relative to v ,..,v . We call the r s i1 is i { 1 s} × matrix A =(a )thematrix of L relative to the bases u ,..,u and v ,..,v . L ij { 1 r} { 1 s}

Theorem 8.15. rank(AL)=rank(L).

We will not detail the proof here. The proof is by setting up a bijective linear map

Ker L Null(At ). → L t This gives dim(Ker L)=dim(Null AL). Then the theorem follows from the dimension relation. We will return to the study of the correspondence L A later. ￿→ L

Warning. Let A be an m n matrix, and consider the linear map L : Rn Rm, • × A → X AX. Let B =(b ) be the matrix of L relative to the standard bases of Rn, Rm, ￿→ ij A which we denote by e ,...,e , E ,...,E respectively. (Note that B is an n m { 1 n} { 1 m} × matrix.) Then

Aei = LA(ei)= bijEj. j ￿ The right hand side is the ith row of B,whileAei is the ith column of A. Therefore

B = At.

t Thus the matrix of LA relative to the standard bases is A ,andnotA. 186

8.3. Composition

We define composition by mimicking . Composition is an operations which takes two linear maps (with appropriate domains and ranges) as input and yields another linear map as output.

Definition 8.16. Let L : U V and M : V W be linear maps. The composition of → → L and M is the map M L : U W defined by (M L)(u)=M[L(u)]. ◦ → ◦

Exercise. Show that the map M L in the preceding definition is linear. ◦

Exercise. Prove that if L : U V , L￿ : U V , M : V W , M ￿ : V W are linear → → → → maps and c is any scalar, then

(L + L￿) M = L M + L￿ M ◦ ◦ ◦ (cL) M = c(L M) ◦ ◦ L (M + M ￿)=L M + L M ￿ ◦ ◦ ◦ L (cM)=c(L M). ◦ ◦ Thus we say that composition is bilinear.

Exercise. Prove that if N : U V , M : V W , L : W Z are linear maps, then → → → (L M) N = L (M N). ◦ ◦ ◦ ◦ Thus we say that composition is associative.

We now define inverse by mimicking the inverse of a matrix.

Definition 8.17. We say that a linear map L : U V is invertible if there is a linear → map M : V U such that L M = id and M L = id . In this case, → ◦ V ◦ U

we call M an inverse of L; •

we also call an invertible linear map, a linear ; • 187

we say that U and V are isomorphic. •

Lemma 8.18. Let L : U V be invertible linear map. Then L is bijective. →

Proof: Let M be an inverse of L.

Injectivity: Let L(u) = 0. Applying M to this, we get u = O.ThusKer L = O , and • { } hence L is injective.

Surjectivity: Let v V .ThenL[M(v)] = v.ThusIm L = V , and hence L is surjective. • ∈

Lemma 8.19. If L is invertible, then L has a unique inverse.

Proof: Let M,N be inverses of L. Applying L to M(v) N(v), we get O.Byinjectivity − of L, M(v) N(v)=O.ThusM = N. −

Lemma 8.20. Let L : U V be bijective linear map. Then L is invertible. →

Proof: Given v V there is a u U such that L(u)=v by surjectivity. If u, u￿ U have ∈ ∈ ∈ the same image, ie. L(u)=L(u￿), then L(u u￿) = 0, ie. u u￿ =0byinjectivity.So − − given v V there is exactly one u U with L(u)=v. ∈ ∈ Define a map M : V U by M(v)=u where L(u)=v. By definition M L(u)= → ◦ u for all u,ie.M L = id . Also L M(v)=v for all V ,ie.L M = id . It remains to ◦ U ◦ ◦ V show that M is linear.

Addition: Applying L to M(v + v ) M(v ) M(v ), we get O.SinceL is injective, • 1 2 − 1 − 2 it follows that M(v + v ) M(v ) M(v )=O. 1 2 − 1 − 2

Scaling: Applying L to M(cv) cM(v), we also get O. • − 188

Lemma 8.21. If U, V are finite dimensional vector spaces with dim U = dim V , then there is a bijective linear map L : U V . →

Proof: Let n = dim U = dim V and u ,..,u , v ,..,v be bases of U, V respectively. { 1 n} { 1 n} Define a map f : u ,..,u V , f(u )=v for all i. By Theorem 8.2, there is a linear { 1 n}→ i i map L : U V such that L(u )=v . It remains to show that L is bijective. Since both → i i U, V are finite dimensional, it suffices to show that L is surjective.

Let v be an element of V .Thenv can be expressed as v = aivi. By linearity of L,wehave ￿ L( aiui)= aiL(ui)= aivi = v. Thus v lies in Im L. We conclude￿ that￿Im L = V . ￿

n Example. Let U = Span 1,t,..,t in C∞. Let E ,..,E be the standard vectors { } 1 n+1 in Rn+1. We have a bijective linear map L : U Rn+1 such that L(ti)=E for → i+1 i =0, 1,..,n.

In summary, we have shown the following. For finite dimensional vector spaces U, V , the following statements are equivalent:

(a) There is an invertible linear map L : U V . → (b) There is a bijective linear map L : U V . → (c) dim(U)=dim(V ).

Lemma 8.22. If U, V are finite dimensional vector spaces of the same dimension and L : U V , M : V U are linear maps with L M = id , then L is invertible with → → ◦ V inverse M.

Proof: For any element v in V , L[M(v)] = v.

Thus L is surjective. By the dimension relation,

dim(Ker L)+dim(V )=dim(U). 189

That U and V having the same dimension implies that Ker L = O .ThusL is injective. { } This means that L is invertible. By the associativity property of composition, we have

1 1 L− (L M)=(L− L) M = id M = M. ◦ ◦ ◦ ◦ U ◦ 1 1 1 But since L M = id , we also have L− (L M)=L− . It follows that M = L− . ◦ V ◦ ◦ Exercise. Give an example to show that the statement without the assumption that dim(U)=dim(V ) in the lemma is false.

Exercise. Prove that if U, V are finite dimensional vector spaces of the same dimension and L : U V , M : V U are linear maps with M L = id ,thenL is invertible with → → ◦ U inverse M.

8.4. Linear equations

Terminology: Let L : U V be a linear map. An equation of the form →

L(X)=v0 is called a linear equation in the variable X. When the given v0 is ￿0, we call that a homogeneous equation. We call a vector u U a solution if L(u)=v . Note that we do ∈ 0 n n not require that U, V be finite dimensional. For U = R , V = R , and L = LA, v0 = B, the linear equation LA(X)=B reads AX = B. This is what we used to call a .

Theorem 8.23. Suppose L(u0)=v0. Then the solution set of the equation L(X)=v0 is

w + u w Ker L . { 0| ∈ }

Proof: Call this set S. A vector in S is of the form w + u with w Ker L.So 0 ∈ L(w + u0)=L(w)+L(u0)=v0,hencew + u0 is a solution. Conversely if u is a solution, then L(u)=v .ThusL(u u )=O, and hence w = u u Ker L. It follows that 0 − 0 − 0 ∈ u = w + u0. 190

Example. (With calculus) Differential equations. Consider the linear map D+cI : C∞ → C∞ where D(f)=f ￿ as before, I is the identity map and c is a number. We want to solve the equation (D + cI)f = O.

ct ct This is an example of a differential equation. Since D(e− )= ce− , it follows that − ct ct ct e− is a solution. Let f be any solution, and let g = fe .Thenf = ge− , and so ct (D + cI)(ge− )=O. Explicitly, this reads

ct g￿e− = O

which shows that g￿ = O.Thusg is a , and hence f is a scalar multiple ct ct of e− . It follows that the solution space Ker(D + cI) is spanned by the function e− .

Let’s solve the inhomogeneous equation

(D + cI)f =1.

If c = 0, then (D + cI) 1 = 1. By the preceding theorem, the solution set of our equation ￿ c in this case consists of functions of the form

ct 1 const. e− + . c If c = 0, then (D + cI)t = 1. The solution set in this case consists of functions of the form

ct const. e− + t.

8.5. The Hom space

Let U, V be vector spaces. Let L, M be two linear maps from U to V , and c be a scalar. We define the sum L + M and the scalar multiple cL as the following maps from U to V : L + M : U V, (L + M)(u)=L(u)+M(u) → cL : U V, (cL)(u)=cL(u). →

L + M and cL are linear. • 191

Proof: Let u1,u2,u be elements of U, and b be scalars. We chase through a chain of definitions as follows:

(L + M)(u1 + u2)=L(u1 + u2)+M(u1 + u2)

=(L(u1)+L(u2)) + (M(u1)+M(u2))

=(L(u1)+M(u1)) + (L(u2)+M(u2))

=(L + M)(u1)+(L + M)(u2) (L + M)(bu)=L(bu)+M(bu) = bL(u)+bM(u) = b[L(u)+M(u)] = b[(L + M)(u)]. This proves that L + M is a linear map. Checking the linearity of cL is similar.

Let Hom(U, V ) denote the set of all linear maps from U to V . This set contains the zero map O. We have just defined addition and scaling on this set.

Hom(U, V ) is a vector space with addition and scaling defined above. • Proof: Verifying each of the axioms V1-V8 needs no more than chasing through definitions in a straightforward manner. We illustrate this for V1 and V5, and leave the rest as exercise.

Let L, M, N be three linear maps from U to V . To prove V1, we must show that the linear map (L + M)+N is equal to the linear map L +(M + N). Two linear maps being equal means that they both do the same thing to any given element u. Thus we are to show that [(L + M)+N](u)=[L +(M + N)](u) for any element u in U. Let’s start from the left hand side and derive the right hand side: [(L + M)+N](u)=(L + M)(u)+N(u) =(L(u)+M(u)) + N(u) = L(u)+(M(u)+N(u)) = L(u)+(M + N)(u) =[L +(M + N)](u). 192

This proves V1.

Let a be a scalar and L, M be two linear maps from U to V . We want to show that [a(L + M)](u)=(aL + aM)(u) for any element u in U. Again, let’s start from the left: [a(L + M)](u)=a[(L + M)(u)] = a[L(u)+M(u)] = aL(u)+aM(u) =(aL)(u)+(aM)(u) =(aL + aM)(u). This proves V5.

Example. Consider the special case V = R. The vector space Hom(U, R) is often denoted

by U ∗, and is called the linear dual of U. An element of U ∗ = Hom(U, R) is called a or a 1-form on U.

Example. Let V be a 1 dimensional vector space, and L : V V be a linear map. Then → V consists of all scalar multiples cv0. of some fixed element v0.SinceL(v0) is an element of V ,wehave

L(v0)=c0v0

for some number c0. It follows that

L(cv0)=cL(v0)=cc0v0 =(c0IdV )(cv0)

for any number c. We conclude that L = c0IdV . This shows that any linear map on a one

dimensional vector space V is a scalar multiple of the identity map IdV . In particular,

dim Hom(V,V )=1.

Let U, V be finite dimensional vector spaces. Fix bases u ,..,u and v ,..,v { 1 r} { 1 s} of U and V respectively. For every element L in Hom(U, V ), let AL =(aij) be the matrix of L relative to those bases. Thus s

L(ui)= aijvj. j=1 ￿ 193

Define a map F : Hom(U, V ) M(r, s),F: L A . → ￿→ L Note that this map depends on the choice of our bases. We call this the matrix represen- tation map relative to the bases.

Theorem 8.24. The matrix representation map F relative to any given basis is a bijective linear map.

Proof:

Linearity: Let L ,L ,L be elements of Hom(U, V ), and c be a scalar. Let A =(a￿ ), • 1 2 L1 ij AL2 =(aij￿￿ ), AL =(aij).

(L1 + L2)(ui)=L1(ui)+L2(ui) s s

= aij￿ vj + aij￿￿ vj j=1 j=1 ￿ ￿ s

= (aij￿ + aij￿￿ )vj. j=1 ￿

This says that aij￿ + aij￿￿ is the (ij) entry of the matrix of AL1+L2 . On the other hand,

aij￿ + aij￿￿ is also the (ij) entry of the matrix AL1 + AL2 . So we conclude that

AL1+L2 = AL1 + AL2 , or

F (L1 + L2)=F (L1)+F (L2). Similarly (cL)(ui)=cL(ui) s = c aijvj j=1 ￿ s

= (caij)vj. j=1 ￿ This says that caij is the (ij) entry of the matrix AcL. On the other hand, caij is also the

(ij) entry of the matrix cAL. So we conclude that

AcL = cAL 194

or F (cL)=cF (L).

Injectivity: Suppose that F (L)=A is the . Then • L

L(ui)=O

for all i. By linearity of L, it maps any linear combination of u ,..,u , and hence any { 1 r} element u in U,toO.ThusL is the zero map from U to V . Hence F is injective.

Surjectivity: Let B =(b ) be any r s matrix. Define the assignment f : u ,..,u • ij × { 1 r}→ V , f : u s b v . By Theorem 8.2, there is a linear map L : U V such that i ￿→ j=1 ij j → ￿ s L(ui)=f(ui)= bijvj. j=1 ￿ This shows that AL = B,ie.F (L)=B.ThusF is surjective.

This completes our proof.

Corollary 8.25. If U and V are finite dimensional vector spaces, then dim Hom(U, V )= (dim U)(dim V ).

Proof: By the preceding theorem,

dim Hom(U, V )=dim M(r, s)=rs,

where r = dim U and s = dim V .

Corollary 8.26. If U is finite dimensional, then dim U ∗ = dim U.

Proof: Since U ∗ = Hom(U, R), it follows from the preceding corollary that

dim U ∗ =(dim U)(dim R)=dim U.

The following theorem says that the matrix representation map relative to any given bases has another important property: that it relates composition of linear maps with multiplication of their matrices. 195

Theorem 8.27. Let U, V, W be finite dimensional vector spaces with dimension r, s, t respectively. Fix three respective bases, and let

F : Hom(U, V ) M(r, s),G: Hom(V,W) M(s, t),H: Hom(U, W ) M(r, t) → → → be the respective matrix representation maps relative to those bases. Then for any linear maps L : U V , M : V W , → → H(M L)=F (L)G(M). ◦

Proof: Let u ,..,u , v ,..,v , w ,..,w be bases we have chosen for the vector spaces { 1 r} { 1 s} { 1 t} U, V, W respectively. Let N = M L. Let A =(a ), B =(b ), C =(c )betherespective ◦ ij ij ij matrices of L, M, N relative to the given bases. We must show that

C = AB.

By definition of composition,

N(ui)=M[L(ui)] = aijM(vj)= aijbjkwk. j j ￿ ￿ ￿k On the other hand, N(ui)= k cikwk. Comparing the coefficient of each wk, we conclude that ￿ cik = aijbjk j ￿ for all i, k. This gives the asserted matrix identity C = AB.

Corollary 8.28. Let U be an n dimensional vector space. Fix a basis and let

F : Hom(U, U) M(n, n),L A → ￿→ L be the matrix representation map relative to that basis. Then for any linear maps L : U → U, M : U U, → AM L = ALAM . ◦

Proof: This follows from specializing the preceding theorem to the case U = V = W . 196

Corollary 8.29. Assume the same as in the preceding corollary. If L : U U is invertible → 1 linear map, then AL is an with inverse AL− .

1 Proof: This follows from specializing the preceding corollary to the case M = L− , and the fact that A is the n n . Id ×

Exercise. Assume the same as in the preceding two corollaries. Prove that if L, M, N are any linear maps U U,then →

AN M L = ALAM AN . ◦ ◦

8.6. Induced maps

Let U, V, W be vector spaces. Let f : V W be a given linear map. We define a → map H : Hom(U, V ) Hom(U, W ),H: L f L. f → f ￿→ ◦ This is called the map induced by f. Here is a diagram which is a good mnemonic device for this definition: f L ◦ −−−−−→f U L V W. → →

H is a linear map. • f

Proof: Let L1,L2,L be elements of Hom(U, V ), and c be a scalar. We want to show that

Hf (L1 + L2)=Hf (L1)+Hf (L2)

Hf (cL)=cHf (L).

Now Hf (L1 + L2) and Hf (L1)+Hf (L2) are linear maps from U to W . So to prove that they are equal means to show that

[Hf (L1 + L2)](u)=[Hf (L1)+Hf (L2)](u) 197 for any element u in U. Starting from the left, we have [H (L + L )](u)=[f (L + L )](u) f 1 2 ◦ 1 2 = f[(L1 + L2)(u)]

= f[L1(u)+L2(u)]

= f[L1(u)] + f[L2(u)] =(f L )(u)+(f L )(u) ◦ 1 ◦ 2 =[Hf (L1)](u)+[Hf (L2)](u)

=[Hf (L1)+Hf (L2)](u).

This proves that Hf (L1 + L2) and Hf (L1)+Hf (L2) are equal. Similarly, [H (cL)](u)=[f (cL)](u) f ◦ = f[(cL)(u)] = f[cL(u)] = cf[L(u)] = c (f L)(u) ◦ =[c(f L)](u) ◦ =[cHf (L)](u).

This proves that Hf (cL)=cHf (L), hence completes our proof that Hf is a linear map.

Let U, V, W be vector spaces. Let f : U V be a given linear map. We define a → map Hf : Hom(V,W) Hom(U, W ),Hf : L L f. → ￿→ ◦ The diagram for this definition is:

L f ◦ −−−−−→f U V L W. → →

Exercise. Prove that Hf is a linear map.

Exercise. Prove that if f : V W is invertible, then H : Hom(U, V ) Hom(U, W )is → f → 1 also invertible with inverse Hf − . 198

Exercise. Prove that if f : U V is invertible, then Hf : Hom(V,W) Hom(U, W )is →1 → also invertible with inverse Hf − .

8.7. product spaces

Let S be any nonempty set. Let F (S) be the set consisting of all functions

f : S R → such that f(s) = 0 for all but finitely many s S. It is a vector space under the usual ∈ rules of function addition and scaling. It is called the the free vector space on S. The set f s S ,wheref (r) = 1 for r = s and f (s) = 1, is a basis of F (S). It is called { s| ∈ } s ￿ s the canonical basis of F (S). For any function f : S R can be uniquely written as a → linear combination s f(s)fs. It is convenient to denote a vector in F (S) as a formal sum a s, where it is understood that this represents the function S R, s a . s s ￿ → ￿→ s ￿ Exercise. Let S = 1, 2,..,n . Show that there is a canonical linear isomorphism F (S) { } → Rn.

Now let V,W be vector spaces, and consider F (V W ). Let R(V,W) be the linear × subspace of F (V W ) generated by the vectors × (v + cv ,w) (v ,w) c(v ,w), (v, w + cw ) (v, w ) c(v, w ) 1 2 − 1 − 2 1 2 − 1 − 2 (v, v ,v V, w,w ,w W, c R.) 1 2 ∈ 1 2 ∈ ∈

Definition 8.30. The space V W is the quotient space F (V ⊗ × W )/R(V,W).

In particular, V W is spanned by cosets of the form v w := (v, w)+R(V,W). ⊗ ⊗ Exercise. The main point. Verify that the expression v w is linear in v and w. ⊗ Exercise. Let U be a vector space and B : V W U is a . Prove that there × → is a unique linear map B˜ : V W U such that ˜(v w)=B(v, w) for all (v, w) V W . ⊗ → ⊗ ∈ × (Hint: First define a map F (V W ) U using the canonical basis of F (V W ), and × → × then show that R(V,W)liesinthekernel.) 199

8.8. Homework

2 2 2 1. Consider the linear map D = D D : C∞ C∞.DescribeKer D and Im D . ◦ → Do the same for Dk = D D D (k times). ◦ ◦···◦

2. Define a map T : M(m, n) M(n, m), T (A)=At. Show that T is linear. →

3. Let L : U R be a linear map from a 4 dimensional vector space U to R, what → are the possible of Ker L? Give an example in each case.

4. Define the map 1 L : M(n, n) M(n, n),L(A)= (A + At). → 2

(a) Show that L is linear.

(b) Show that Ker L consists of all skew symmetric matrices.

(c) Show that Im L consists of all symmetric matrices.

5. (a) Let A be a given n n matrix. Let S be the vector space of n n symmetric × × matrices. Define a map L : S S,by → L(X)=AtXA.

Show that L is a linear map.

(b) Show that if A is an invertible matrix, then L is invertible.

6. Let L : U V be a linear map of finite dimensional vector spaces. → (a) Show that dim U dim (Im L). ≥ (b) Prove that if dim U > dim V ,thenKer L = 0. ￿ 200

7. Let V be a vector space with an inner product , . For every element v in V ,we ￿ ￿ define a map f : V R by f (u)= v, u . v → v ￿ ￿

(a) Show that fv is linear, hence it is an element of V ∗.

(b) Define a map R : V V ∗ by R(v)=f . Show that R is linear. → v (c) Show that R is injective.

(d) Show that if V is finite dimensional, then R is a linear isomorphism.

8. Let V be a finite dimensional vector space, and u ,..,u be a basis of V . For { 1 n} each i, we define a linear form ui∗ on V by

u∗(u )=1,u∗(u )=0 j = i. i i i j ￿

Prove that u∗,..,u∗ form a basis of V ∗. It is called the basis dual to u ,..,u . 1 n { 1 n}

9. Let V = Rn equipped with the . Then we have a linear isomorphism n R : R V ∗ as defined above. Let E∗,..,E∗ be the basis of V ∗ dual to the → { 1 n} n 1 standard basis of R . Find the images of the Ei∗ under the linear map R− . Now do the same for R2 with the basis (1, 1), (2, 1) . { − }

10. Let U, V be a vector space, and L : U V be a linear map. Define a second map → L∗ : V ∗ U ∗ as follows. Given an element f in V ∗ and u in U,let →

[L∗(f)](u)=f[L(u)].

Show that L∗ is linear. It is called the dual map of L.

11. Continuing with the preceding exercise, suppose in addition that U, V are finite

dimensional. Let AL be the matrix of L relative to some given bases of U and V

respectively. Show that the matrix of L∗ : V ∗ U ∗ relative to the dual bases is → t the matrix AL. 201

12. (With calculus) More on differential equations. Consider the linear map L = 2 D 3D +2I : C∞ C∞. We want to solve the equation − → L(f)=O.

(a) Show that et and e2t are solutions to the equation.

t (b) Let f be any solution, and let g = fe− . Show that g￿￿ g￿ = O. Hence − t conclude that g￿ = const.e .

(c) Show that f is a linear combination of et and e2t. Hence conclude that Ker L is spanned by these two functions.

(d) Show that L(1) = 2.

(e) Describe all solutions to the equation

L(f)=1.

13. Let V be any finite dimensional vector space and U1,U2 be two linear subspaces of V of the same dimension. Show that there is an invertible map L : V V such → that L(U1)=U2. (Hint: Start with bases of U1,U2.)

14. ∗ Consider the of two linear maps

f g U V W. → → We say that this sequence is exact if f is injective, g is surjective, and Im f = Ker g. Show that if U, V, W are finite dimensional and the sequence is exact, then there is a linear isomorphism V = U W. ∼ ⊕

15. Let A be an n n matrix having entries below and along the diagonal all zero. × Regard A : Rn Rn as a linear map as usual. (Cf. this problem with problem → 6, Chapter 3.) 202

(a) For 1 k n,letV be the linear subspace spanned by the standard unit ≤ ≤ k n vectors E1,..,Ek in R , and let V0 be the zero subspace. Show that

AEk Vk 1. ∈ −

i (b) Show that AVk Vk 1. Hence conclude that A Vk Vk i, for 1 i k n. ⊂ − ⊂ − ≤ ≤ ≤ Thus each power Ai has the same shape as A except that the band of zeros increases its width by one (or more) each time the power increases by one.

16. ∗ Let V be a finite dimensional vector space and A, B : V V be linear maps. → Prove that dim Ker(A B) dim Ker(A)+dim Ker(B). ◦ ≤ (Hint: Write V = Ker(B)+U where U is a complementary subspace of Ker(B) in V .)

17. ∗ Let V be a finite dimensional and W a subspace of V .

(a) Let T : V V be the linear map such that →

T (w)=w, w W ; T (u)=O, u W ⊥. ∈ ∈ This is called the orthogonal of V onto W . Prove that T has the properties

(i) T 2 = T

(ii) Im T = W

(iii) T (v) v . ￿ ￿≤￿ ￿ Now suppose that T : V V is a given linear map with properties (i)-(iii). → (b) Show that V = Im(T )+Im(I T ) − and that the right side is a . Also show that

Im(T )=Ker(I T ),Im(I T )=Ker(T ). − − 203

(c) Show that Ker(T )⊥ = Ker(I T ). (Hint: Assume that w Ker(I T ), u − ∈ − ∈ Ker(T ) are not orthogonal. Show that this violates (iii) for v = w + cu, for an appropriate number c.)

(d) Conclude that T is the orthogonal projection of V onto W .

18. Let V be a finite dimensional inner product space. Suppose U1 and U2 are sub-

spaces of V with dim U1

such that u, w = 0 for all vector w in U . (Hint: Consider the map U U ∗, ￿ ￿ 1 2 → 1 u u, .) ￿→ ￿ −￿

19. ∗ Let V and W be possibly infinite dimensional vector spaces. Show that there

is a linear map α : V ∗ W Hom(V,W), such that λ w λ( )w, for all ⊗ → ⊗ ￿→ − λ V ∗ = Hom(V,R) and w W . Prove that α is injective, and that its image is ∈ ∈ the set consisting of ρ : V W with dim ρ(V ) < . Conclude that α is surjective → ∞ iffeither V or W is finite dimensional.

20. Use the preceding problem to prove that if V,W are finite dimensional, then dim V W = dim V dim W . ⊗ ·

21. Let U W and V be vector spaces and U, W finite dimensional. Suppose that ⊂ ω W V Hom(W ∗,V) defines an injective map τ : W ∗ V , λ λ(ω). ∈ ⊗ ≡ ω → ￿→ Show that

ω¯ = ω + U V W/U V W V/U V Hom((W/U)∗,V) ⊗ ⊂ ⊗ ≡ ⊗ ⊗ ≡

also defines an injective map τ :(W/U)∗ V . Conclude that if w ,..,w W ω¯ → 1 n ∈ form a basis of W , w￿ + U, .., w￿ + U W/U form a basis of W/U, v ,..,v V 1 m ∈ 1 n ∈ are independent, and if n m w v w￿ v￿ mod U V i ⊗ i ≡ j ⊗ j ⊗ i=1 j=1 ￿ ￿ then v￿ ,..,v￿ V are independent. (Hint: τ = τ π∗, π∗ :(W/U)∗ ￿ W ∗.) 1 m ∈ ω¯ ω ◦ → 204

9. For Linear Maps

Throughout this chapter, V will be a finite dimensional vector space. We denote dim V by n.

9.1. p-forms

We denote by V V (p times), the set of all p- (v ,..,v ) of elements ×···× 1 p vi in V .

Definition 9.1. Amapf : V V R is called a p-form on V if it is linear in each ×···× → slot, ie. for any elements v1,..,vp,v in V and any scalar c,

f(v1,..,vi + v, .., vp)=f(v1,..,vi,..,vp)+f(v1,..,v,..,vp)

f(v1,..,cvi,..,vp)=cf(v1,..,vi,..,vp)

for all i. By convention, a 0-form is a number. We denote by p(V ) the set of p-forms T on V .Wesaythatf is symmetric if

f(v1￿ ,..,vp￿ )=f(v1,..,vp)

p (v￿ ,..,v￿ ) is obtained from (v ,..,v ) by swapping two of the v’s. We denote by (V ) the 1 p 1 p S set of all symmetric p-forms on V .Wesaythatf is skew-symmetric if

f(v1,..,vp)=0 205 whenever two of the v’s are equal. We denote by p(V ) the set of all skew-symmetric E p-forms on V .

If f,g are p-forms and c anumber,wedefinenewp-forms f + g, cf by

(f + g)(v1,..,vp)=f(v1,..,vp)+g(v1,..,vp)

(cf)(v1,..,vp)=cf(v1,..,vp).

Exercise. Verify that the set p(V ), equipped with the two operations defined above, is T a vector space. Also verify that p(V ), p(V ) are linear subspaces of p(V ). S E T Question. What are their dimensions? 206

By definition, 0(V )= 0(V )= 0(V )=R. S E T So they are all one dimensional.

A 1-form is a linear map f : V R, v f(v). So, by definition, → ￿→ 0 0 0 (V )= (V )= (V )=V ∗. S E T So they are all n = dim V dimensional.

Example. Consider the map f : R2 R2 R defined by × → f(X, Y )=X Y. · This is a 2-form because the dot product has properties D1-D3 (chapter 2). It is also symmetric, by property D1. But f is not skew-symmetric because f(X, X) > 0when X = O. More generally, an inner product , on a a vector space V defines a symmetric ￿ ￿ ￿ 2-form.

Example. Consider the map f : R2 R2 R defined by × → f(X, Y )=det[X, Y ].

This is a 2-form and it is skew-symmetric by the four properties of of a matrix (chapter 5).

Example. Let f be a general 2-form. Expanding f(X, Y ) using the standard basis E ,E of R2, we get { 1 2} f(X, Y )=f(x1E1 + x2E2,y1E1 + y2E2)

= x1y1f(E1,E1)+x1y2f(E1,E2)+x2y1f(E2,E1)+x2y2f(E2,E2). If f is symmetric, then this becomes

f(X, Y )=x1y1f(E1,E1)+(x1y2 + x2y1)f(E1,E2)+x2y2f(E2,E2).

Example. Continuing with the preceding discussion, we consider the case when f is skew-symmetric. Then

( ) f(X, Y )=x y f(E ,E )+x y f(E ,E ). ∗ 1 2 1 2 2 1 2 1 207

For Y = X, this becomes

f(X, X)=x1x2(f(E1,E2)+f(E2,E1)).

But since f(X, X) = 0 for arbitrary X, it follows that f(E ,E )= f(E ,E ). Hence (*) 1 2 − 2 1 now reads f(X, Y )=(x y x y )f(E ,E )=f(E ,E )det[X, Y ] 1 2 − 2 1 1 2 1 2

for any X, Y .Thus,f is completely determined by the value of f(E1,E2).

Exercise. Expand a skew-symmetric 2-form f on R3 and write f(X, Y ) in terms of

f(E1,E2), f(E1,E3), f(E2,E3).

This chapter focuses primarily on the study of skew-symmetric p-forms. Symmet- ric 2-forms arise also in eigenvalue problems, as will be seen in the next chapter.

We now make four important observations about general p-forms. Let u ,..,u { 1 n} be a basis of V .First,letf be a p-form. Consider f(v1,..,vp) for given elements v1,..,vp

in V . Each vi is a linear combination, say n vi = aijuj. j=1 ￿ By linearity, we can expand f(v1,..,vp) and express it as a sum of the form

f(v ,..,v )= a a f(u ,..,u ) 1 p 1j1 ··· pjp j1 jp ￿ where the sum is ranging over all j1,..,jp taking values in the list 1,..,n. This shows that

f is determined by the values f(uj1 ,..,ujp ). In other words,

1. if f,g are two p-forms such that

f(uj1 ,..,ujp )=g(uj1 ,..,ujp )

for all j1,..,jp, then f = g.

In particular, if f is a p-form with f(uj1 ,..,ujp ) = 0 for all j1,..,jp,thenf is the

zero p-form. (ie. f maps every p- (v1,..,vp) to 0).

2. p-form f is skew-symmetric iff

f(x￿)= f(x) − 208

whenever the p-tuple x￿ =(v1￿ ,..,vp￿ ) is obtained from x =(v1,..,vp) by swapping two of the v’s.

Exercise. Prove this. (Hint: Imitate the proof of skew-symmetry for the determinant in chapter 5.)

Third, let f be a skew-symmetric p-form. Then f(uj1 ,..,ujp )=0whenevertwo

of the j’s are equal. Thus to determine f it is enough to know the values f(uj1 ,..,ujp ) for

all j1,..,jp distinct. In particular when p>n, since there cannot be p distinct integers

between 1 and n, it follows that f(uj1 ,..,ujp ) = 0 for all j1,..,jp.Thuswhen p>n, the only skew-symmetric p-form is the zero p-form. Now suppose that p n and that ≤ j1,..,jp are distinct. By the second observation above, if we perform a of m swaps – swapping two elements at a time – on x =(u ,..,u )toresultin,sayx =(u ,..,u ), j1 jp ￿ j1￿ jp￿ then m f(x￿)=( 1) f(x). − By the Sign Theorem (chapter 5), we can write this as

f(x￿)=sign(L)f(x)

where L is any permutation of p letters which transforms (j1,..,jp)to(j1￿ ,..,jp￿ ). In par- ticular, given any x =(u ,..,u ) such that j ,..,j are distinct, we can transform this ￿ j1￿ jp￿ 1￿ p￿ by a permutation L so that the result x =(u ,..,u ) is such that 1 j <

f(x￿)=sign(L)f(x). Thus we conclude that

3. a skew-symmetric p-form f is determined by the values

f(u ,..,u ), 1 j <

Fourth, given a p-tuple J =(j ,..,j ) of integers such that 1 j <

J ￿ =(j1￿ ,..,jp￿ ) of integers (not necessarily distinct) in the list 1,..,n,wewrite u =(u ,..,u ). J ￿ j1￿ jp￿ We also introduce the symbol

∆(J, J ￿)=sign(L) 209

if J ￿ transforms to J by the permutation L. Again, by the Sign Theorem (chapter 5), this

is independent of the choice of permutation L which transforms J ￿ to J.IfJ ￿ does not

transform to J,thenweset∆(J, J ￿) = 0. Let v1,..,vp be elements in V .Then,inaunique

way, each vi is a linear combination, say n

vi = aijuj. j=1 ￿ We define a map f : V V R by J ×···× →

fJ (v1,..,vp)= a1j apj ∆(J, J ￿) 1￿ ··· p￿ ￿ where the sum is ranging over all p-tuples J ￿ =(j1￿ ,..,jp￿ ) of integers in the list 1,..,n. (Compare this with our second observation above).

4. Then fJ has the following properties:

f is a p-form • J

f is skew-symmetric • J

f (u )=∆(J, J ￿) for all J ￿. • J J ￿ All three properties can be verified in a straightforward way by imitating the proof of the alternating, linearity (additive and scaling), and unity properties of the determinant function det on n n matrices (chapter 5). ×

n Lemma 9.2. For 0 p n, there are exactly k-tuples (j ,..,j ) of integers such ≤ ≤ p 1 p that 1 j <

Proof: We prove this by induction on the integer n + p. For n + p = 0, ie. n = p = 0, there is just one 0-tuple, namely the empty tuple, by definition. Thus our claim holds trivially in this case. Assuming that the claim is true for n + p

A. All p-tuples of the form (j1,..,jp 1,n), ie. jp = n.Sincejp 1

B. All p-tuples (j1,..,jp) such that jp

Exercise. Prove the last identity of binomial coefficients. Use the identity:

n 1 n (1 + t) − (1 + t)=(1+t) . (What are the coefficients of tp on both sides?)

n Theorem 9.3. For 0 p n, dim p(V )= . ≤ ≤ E p ￿ ￿ Proof: Let u ,..,u be a basis of V . We will construct a basis of p(V ). By the preceding { 1 n} E lemma, the number of p-tuples J =(j ,..,j ) of integers with 1 j <

fJ (uJ ￿ )=∆(J, J ￿).

To prove our assertion, it suffices to show that these p-forms fJ together form a basis of p(V ). E

Linear independence: Consider a linear relation • bJ fJ = O ￿J 211

where the sum is ranging over all p-tuples J =(j ,..,j ) of integers with 1 j < < 1 p ≤ 1 ··· j n.FixaJ ￿ =(j￿ ,..,j￿ )with1 j￿ <

bJ ∆(J, J ￿)=0. ￿J Suppose ∆(J, J ￿) = 0 for some J in the sum above. Then by definition, ￿

∆(J, J ￿)=sign(L)

where J ￿ transforms to J by some permutation L.Butsincewehave

1 j <

J ￿ can transform to J only if J ￿ = J. Thus we can choose L to be the identity permutation, and hence

∆(J, J ￿)=1.

Moreover, the linear relation now reads

bJ ￿ =0.

Since J ￿ is arbitrary, it follows that bJ = 0 for all J. This shows that the fJ forms a linearly independent set in p(V ). E

Spanning property: Let f be any skew-symmetric p-form. We will show that f is a linear • combination of the fJ ’s. Define a skew-symmetric p-form by

g = f(uJ )fJ ￿J where the sum is ranging over all p-tuples J =(j ,..,j ) of integers with 1 j < < 1 p ≤ 1 ··· j n. Again fix a J ￿ =(j￿ ,..,j￿ )with1 j￿ <

g(uJ ￿ )= f(uJ )fJ (uJ ￿ )= f(uJ )∆(J, J ￿). ￿J ￿J Again, by the very same argument as above, we conclude that ∆(J, J ￿)=0unlessJ = J ￿,

and that ∆(J ￿,J￿) = 1. Thus we get

g(uJ ￿ )=f(uJ ￿ ). 212

Since J ￿ is arbitrary, by our third observation above, we conclude that g = f. Hence

f = f(uJ )fJ . ￿J

This completes our proof.

Given a linear map L : V V , we define a map → L : p(V ) p(V ) p E →E for each p =0, 1,..,n, as follows. Put L = 1 (note that 0(V )=R, by definition). For 0 E p 1, given a p-form f,wedefineL (f)by ≥ p

[Lp(f)](v1,..,vp)=f[L(v1),..,L(vp)].

We will show that L is linear. Let f ,f ,f be elements of p(V ), and c be a scalar. Let p 1 2 E v1,..,vp be elements of V .Then

[Lp(f1 + f2)](v1,..,vp)=(f1 + f2)[L(v1),..,L(vp)]

= f1[L(v1),..,L(vp)] + f2[L(v1),..,L(vp)]

=[Lp(f1)](v1,..,vp)+[Lp(f2)](v1,..,vp)

=[Lp(f1)+Lp(f2)](v1,..,vp). It follows that

Lp(f1 + f2)=Lp(f1)+Lp(f2).

Similarly, we have

Lp(cf)=cLp(f).

Corollary 9.4. For a given linear map L : V V ,thelinearmapL : n(V ) n(V ) → n E →E is a scalar multiple of the identity map.

Proof: Recall that (an exercise in chapter 8) if U is a 1 dimensional vector space U,then any linear map from U to U is a scalar multiple of the identity map on U.Bythepreceding n theorem, the vector space n(V ) has dimension = 1. It follows that the linear map E n ￿ ￿ L is a scalar multiple of the identity map on n(V ). n E 213

9.2. The determinant

Definition 9.5. For a given linear map L : V V , we define the determinant, denoted → by det(L), to be the number such that Ln = det(L)Id.

Theorem 9.6. (Multiplicative property ) If L : V V and M : V V are two linear → → maps, then det(L M)=det(L)det(M). ◦

Proof: Let f be an element in n(V ), and v ,..,v be elements in V .Then E 1 p [(L M) f](v ,..,v )=f[(L M)(v ),..,(L M)(v )] ◦ n 1 n ◦ 1 ◦ n = f[L(M(v1)),..,L(M(vn))]

=[Ln(f)][M(v1),..,M(vn)]

=[Mn(Ln(f))](v1,..,vn) =[(M L )(f)](v ,..,v ). n ◦ n 1 n This shows that (L M) = M L . ◦ n n ◦ n Writing this in terms of det, we get

det(L M)Id = det(M)det(L)Id. ◦ This proves our assertion.

Corollary 9.7. If L : V V is an invertible linear map, then det(L) =0. In this case, → ￿ 1 det(L− )=1/det(L).

1 1 Proof: det(L L− )=det(L)det(L− ) = 1. ◦

Corollary 9.8. If L : V V is a noninvertible linear map, then det(L)=0. →

Proof: Since L is not invertible, it fails to be injective (chapter 8). Let u1 be a nonzero element such that L(u1)=O. By suitably adjoining elements from V , we can form a basis 214

u ,..,u of V . Then relative to this basis, we have a skew-symmetric n-form f such { 1 n} I that

fI (u1,..,un)=1.

Since L(u1)=O, we get

[Ln(fI )](u1,..,un)=fI [L(u1),..,L(un)] = 0.

On the other hand, L(fI )=det(L)fI . It follows that det(L) = 0.

Example. Let V = Rn. Then a linear map L : Rn Rn is represented by an n n → × matrix AL =(aij), ie.

L(Ei)= aijEj j ￿ where E ,..,E is the standard basis of Rn. Consider the special n-tuple of integers { 1 n} I =(1,..,n). Relative to the standard basis, we have a skew-symmetric n-form fI such that

fI (E1,..,En)=1. By definition, we have

Ln(fI )=det(L)fI .

Evaluating both sides on (E1,..,En), we get

[Ln(fI )](E1,..,En)=det(L).

Unwinding the left hand side, we get

det(L)=fI [L(E1),..,L(En)] = a a f (E ,..,E ) 1j1 ··· njn I j1 jn ￿J where the sum is ranging over all n-tuple J =(j1,..,jn) of integers in the list 1,..,n.Since

fI is skew-symmetric, the summand is zero unless j1,..,jn are distinct. When they are

distinct, J =(j1,..,jn) is a rearrangement of I =(1,..,n), ie. J is a permutation of n letters, and that

fI (Ej1 ,..,Ejn )=∆(I,J)=sign(J). It follows that ￿ det(L)= sign(J) a a 1j1 ··· njn ￿J 215 where the sum now is ranging over all permutations J of n letters. This number, by our previous definition in chapter 5, is the determinant of the n n matrix A =(a ). Thus × L ij we have proven

Theorem 9.9. For a given linear map L : Rn Rn, →

det(L)=det(AL).

Question. How do we compute det(L) for a given linear map L : V V ? → Let u ,..,u be a basis of V , and let A =(a ) be the matrix of L relative to { 1 n} L ij this basis. Now relative to the same basis, we have a skew-symmetric n-form fI such that

fI (u1,..,un)=1.

n By proceeding the same way as in the case of R above (with u1,..,un playing the roles of E1,..,En), we find that

det(L)=det(AL). Thus we have the following generalization of the preceding theorem:

Theorem 9.10. Given a linear map L : V V and a basis u ,..,u of V , let A be → { 1 n} L the matrix of L relative to this basis. Then

det(L)=det(AL).

9.3. Appendix

The approach to determinants presented above is based on the notion of p-forms. The main theorem which made our definition for det(L) possible was the fact that n(V ) E is one dimensional, which took quite a bit of work to establish. We should emphasize that p-forms are of fundamental importance in many areas of mathematics – not just in algebra. Thus introducing it is worth every bit of work. On the other hand, Theorem 9.10 suggests 216

a more pragmatic way to define determinants. We now discuss this approach, which is less theoretical (perhaps even easier) and sidesteps the use of p-forms. However, this approach seems less elegant because the definition of determinant in this approach involves a choice of basis, and one has to prove that it is independent of the choice at the end.

Let L : V V be a linear map. Let u ,...,u be a basis of V and let A =(a ) → { 1 n} ij (we drop the subscript for convenience) be the matrix of L relative to this basis. We define

det(L)=det(A).

We must prove that this definition is independent of the choice basis u ,...,u , ie. that { 1 n} if A￿ is the matrix of L relative to another basis u￿ ,..,u￿ of V ,then { 1 n}

det(A￿)=det(A).

Let R : V Rn be the linear map defined by R(u )=E (Theorem 8.2). Then → i i R is a linear isomorphism (why?). We also have a new linear map

1 n n L = R L R− : R R . R ◦ ◦ → From

LR(Ei)=R[L(ui)] = R[ aijuj]= aijEj, j j ￿ ￿ it follows that for any column vector X in Rn,

t LR(X)=A X

n Consider the linear isomorphism S : V R defined by S(u￿ )=E , and the linear map → i i 1 n n L = S L S− : R R . S ◦ ◦ → Let 1 n n T = S R− : R R . ◦ → Then for any column vector X in Rn,

T (X)=BX

where B is some invertible matrix B.Wehave

1 1 1 1 1 T L T − = S R− R L R− R S− = S L S− = L . ◦ R ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ S 217

Applying this to any vector X in Rn, we get

t 1 t BA B− X =(A￿) X.

Since X is arbitrary, it follows that

t 1 t BA B− =(A￿) .

Thus t t 1 det(A￿) = det(BA B− ). By the multiplicative property of matrix determinant,wehave

t t 1 t det(A￿) = det(B)det(A )det(B)− = det(A ).

Since the determinant of a matrix is the same as that of its transpose, This completes the proof.

9.4. Homework

1. Use Theorem 9.10 to give another proof of the multiplicative property of deter- minant. Recall that if L : V V , M : V V are two linear maps of a finite → → dimensional vector space V , then relative to a given basis, we have (Theorem 8.27)

AM L = ALAM . ◦

2. Let V ,V be finite dimensional vector spaces and L : V V , L : V V be 1 2 1 1 → 1 2 2 → 2 linear maps. Let V = V V . Define a third map L : V V by 1 ⊕ 2 →

L(v1,v2)=(L1v1,L2v2).

(a) Show that L is linear.

(b) Show that det(L)=det(L1)det(L2).

3. Let A be an n n matrix. Consider the linear map × L : M(n, m) M(n, m),L(B)=AB. A → A 218

m Prove that det(LA)=det(A) .

4. Let L : V V be a linear map of a finite dimensional vector space V , and A be → the matrix of L relative to a given basis of V . Show that the matrix of the dual t map L∗ : V ∗ V ∗ relative to the is A . Conclude that → det(L∗)=det(L).

5. ∗ Volume. Let V be a finite dimensional vector space equipped with an inner

product , . Let u ,...,u and u￿ ,...,u￿ be two orthonormal bases. Let ￿ ￿ { 1 n} { 1 n} L : V V be the linear map defined by →

L(ui)=ui￿ ,i=1,...,n.

(a) Show that for any vectors v, w in V , n v, u￿ u￿ ,w = v, w . ￿ j￿￿ j ￿ ￿ ￿ j=1 ￿

(b) Show that the matrix of L relative to u ,...,u is orthogonal. { 1 n}

(c) Let v ,...,v be vectors in V . Let M,M￿ : V V be the linear maps defined 1 n → by

M(ui)=vi = M ￿(ui￿ ),i=1,...,n. Prove that

det(M)= det(M ￿). ±

(d) Define the volume of the parallelopiped P generated by v1,...,vn in V to be Vol(P )= det(M) . | | Conclude that this is well-defined, ie. it is independent of the choice of of V .

6. ∗∗ Another way to define determinant, assuming the notion of a . Let V be a finite dimensional vector space of dimension n.Put

p TV = p 0T V ⊕ ≥ 219 where T pV is the p-fold tensor product space V V (p times.) Note that ⊗···⊗ T 0V = R by definition.

(a) Show that TV is naturally a ring with a unique bilinear product TV TV × → TV such that

((v v ), (u u )) v v u u . 1 ⊗···⊗ p 1 ⊗···⊗ q ￿→ 1 ⊗···⊗ p ⊗ 1 ⊗···⊗ q

(b) Let f : V V be a linear map. Show that there is a unique linear map → Tf : TV TV such that → Tf(v v )=fv fv . 1 ⊗···⊗ p 1 ⊗···⊗ p

(c) Let AV be the two-sided ideal of the ring TV generated by the set v v v V . { ⊗ | ∈ } Note that AV is a linear subspace of TV (why?) Show that Tf preserves AV ,i.e.

Tf(AV ) AV. ⊂

(d) Put V = T V/AV ∧ and let pV be the subspace of V spanned by all cosets of the form ∧ ∧ v v := v v + AV. 1 ∧···∧ p 1 ⊗···⊗ p Show that nV is one dimensional. In fact, if v ,..,v form a basis of V ,then ∧ 1 n v v forms a basis of nV . 1 ∧···∧ n ∧ (e) Use (b) and (c) to show that there is a unique linear map

f : V V ∧ ∧ →∧ such that f( v v )=fv fv . ∧ ∧ 1 ∧···∧ p 1 ∧···∧ p

(f) Use (d) to show that f on nV is a scalar, and show that the scalar is precisely ∧ ∧ det(f). 220

10. Eigenvalue Problems For Linear Maps

Throughout this chapter, unless stated otherwise, L : V V will be a linear map → of a finite dimensional vector spaces V . We denote dim V by n.

10.1. Eigenvalues

Definition 10.1. A number λ is called an eigenvalue of L if the matrix L λId is − V singular, ie. not invertible.

The statement that λ is an eigenvalue of L can be restated in any one of the following equivalent ways:

(a) the linear map L λId is singular. − V (b) the function det(L xId ) vanishes at x = λ. − V (c) Ker(L λId ) is not the zero space. This subspace is called the eigenspace for λ. − V (d) There is a nonzero vector v such that L(v)=λv.Suchanonzero vector is called an eigenvector for λ.

Let AL =(aij) be the matrix of L relative to a given basis. Then for any number λ, the matrix of the linear map L λId relative to this basis is A λI,whereI is the − V L − n n identity matrix. By Theorem 9.10, × det(L λId )=det(A λI). − V L − 221

This shows the linear map L has the same eigenvalues as its matrix AL.

Definition 10.2. The polynomial function det(L xId )=det(A xI) is called the − V L − characteristic polynomial of L.

Thus the characteristic polynomial is a polynomial function whose leading term is ( 1)nxn (chapter 6). −

Exercise. Compute the characteristic polynomial of the linear map

L : M(2, 2) M(2, 2),A BA → ￿→ 11 where B = . 10 ￿ ￿ Exercise. Do the same for the linear map

D : U U, D(f)=f ￿ →

where U = Span 1,t . { }

Exercise. Do the same for the linear map

D : U U, D(f)=f ￿ →

where U = Span cos t, sin t . { }

Theorem 10.3. Let u1,..,uk be eigenvectors of L. If their corresponding eigenvalues are distinct, then the eigenvectors are linearly independent.

Corollary 10.4. There are no more than n distinct eigenvalues for L.

The proofs of these two statements are similar to the case of matrices (chapter 6). 222

10.2. Diagonalizable linear maps

Definition 10.5. AlinearmapL : U U is said to be diagonalizable if U has a basis → consisting of eigenvectors of L. Such a basis is called an eigenbasis of L.

Theorem 10.6. L is diagonalizable iffits matrix AL relative to some given basis is diagonalizable.

Proof: Let A = A =(a ) be the matrix of L relative to a basis u ,...,u of U. Let L ij { 1 n} f : U Rn be the linear isomorphism defined by →

f(ui)=Ei.

Then (At f)(u )=row i of A = a E = a f(u ) ◦ i ij j ij j j j ￿ ￿ = f[ a u ]=f[L(u )] = (f L)(u ). ij j i ◦ i j ￿ So the two linear maps At f,f L : U Rn agree on a basis, so they are equal: ◦ ◦ → ( ) At f = f L. ∗ ◦ ◦ Likewise, we have 1 t 1 ( ) f − A = L f − . ∗∗ ◦ ◦

If L is diagonalizable with eigenbasis v ,...,v and corresponding eigenvalues { 1 n} λ1,...,λn, then applying (*) to vi,wefind

t A f(vi)=f[L(vi)] = λif(vi).

t This shows that f(v1),..,f(vn) form an eigenbasis of A .

Conversely, if At is diagonalizable with eigenbasis B ,..,B and corresponding { 1 n} 1 1 eigenvalues λ1,..,λn, then applying (**) shows that f − (B1),..,f− (Bn) form an eigenbasis of L.

Thus we have proved that L is diagonalizable iff At is. Now in chapter 6, we showed that At is diagonalizable iff A is. 223

10.3. Symmetric linear maps

In this section, , will denotes an inner product on U. ￿ ￿

Definition 10.7. AlinearmapL : U U is called a symmetric linear map if Lv, w = → ￿ ￿ v, Lw for all v, w V . ￿ ￿ ∈

Example. Recall that given an n n matrix A, we have the linear map L : Rn Rn, × A → L (X)=AX. Let , be the dot product. For any vectors X, Y ,wehave A ￿ ￿ L (X) Y = AX Y = Y AX. A · · · Since X L (Y )=XtAY = Y tAtX = Y AtX · A · t it follows that the linear map LA is symmetric iff A = A ,ie.iffA is a .

Example. Let A be a symmetric positive definite n n matrix (see homework section of × chapter 7). Then we have an inner product on Rn defined by

X, Y = X AY. ￿ ￿ · Let B be a symmetric n n matrix, and consider the linear map L defined by L (X)= × B B n t BX. Note that the matrix of LB relative to the standard basis of R is B . Now

L X, Y = BX AY = XtBtAY = X BAY. ￿ B ￿ · · We also have X, L Y = X ABY. ￿ B ￿ · It follows that the linear map L is symmetric (with respect to the inner product , ) B ￿ ￿ iff BA = AB.

Exercise. Given an example to show that a linear map L need not be symmetric even if its matrix relative to some given basis is symmetric.

Given a linear map L, we define a map g : U U R by L × → g (v ,v )= L(v ),v . L 1 2 ￿ 1 2￿ 224

Exercise. Verify that gL is a 2-form on U. Moreover, gL is a symmetric 2-form iff L is

a symmetric linear map. A symmetric 2-form is also called a . gL is called the quadratic form associated with L.

Theorem 10.8. Let g be a 2-form on U. Then there is a unique linear map L : U U → such that g = gL. Moreover, g is symmetric iff L is symmetric.

Proof: The second assertion follows immediately from the first assertion and the preceding

exercise. We will construct a linear map L with g = gL, and prove that such an L is unique.

• Construction. Define a map $S : U \to U^*$, $u \mapsto h_u$, by
$$h_u(v) = g(u, v).$$
Note that since $g$ is linear in the second slot, $h_u : U \to \mathbb{R}$ is linear. The map $S$ is linear (exercise). Recall that there is a linear isomorphism
$$R : U \to U^* = \mathrm{Hom}(U, \mathbb{R}), \qquad u \mapsto f_u,$$
where $f_u$ is defined by $f_u(v) = \langle u, v\rangle$ (homework section, chapter 8). Consider the linear map
$$L = R^{-1} \circ S : U \to U.$$
For any element $u$ in $U$, we have
$$R[L(u)] = S(u), \quad \text{i.e.} \quad f_{L(u)} = h_u.$$
Since both sides are elements of $U^*$, this is equivalent to saying that, for any element $v$ in $U$,
$$f_{L(u)}(v) = h_u(v), \quad \text{i.e.} \quad g_L(u, v) = \langle L(u), v\rangle = g(u, v).$$
Since this holds for arbitrary $u, v$, we conclude that $g_L = g$.

• Uniqueness. Suppose $L, L'$ are two linear maps with $g_L = g_{L'}$. Then for any $u, v$, we have
$$\langle L(u), v\rangle = \langle L'(u), v\rangle, \quad \text{i.e.} \quad \langle L(u) - L'(u), v\rangle = 0.$$
Since $v$ is arbitrary, we may take $v = L(u) - L'(u)$; positive definiteness then gives $L(u) - L'(u) = O$. But $u$ is also arbitrary. Thus $L = L'$. This completes the proof.

Theorem 10.9. $L$ is symmetric iff its matrix $A_L$ relative to any orthonormal basis is symmetric.

Proof: Let $A_L = (a_{ij})$ be the matrix of $L$ relative to an orthonormal basis $\{u_1, \ldots, u_n\}$ of $U$ (orthonormal relative to the given inner product $\langle\,,\,\rangle$). Then
$$\langle u_i, L(u_j)\rangle = \Big\langle u_i, \sum_k a_{jk} u_k\Big\rangle = a_{ji},$$
since $\langle u_i, u_k\rangle = 0$ unless $i = k$, and $\langle u_i, u_i\rangle = 1$. Similarly, we have
$$\langle L(u_i), u_j\rangle = a_{ij}.$$

If $L$ is symmetric, then
$$(*)\qquad \langle u_i, L(u_j)\rangle = \langle L(u_i), u_j\rangle$$
for all $i, j$, and hence $A_L$ is symmetric. Conversely, suppose that $A_L$ is symmetric. Then (*) holds for all $i, j$. Multiplying both sides by arbitrary numbers $x_i$ and $y_j$, and then summing over $i, j$, we get from (*)
$$\langle x, L(y)\rangle = \langle L(x), y\rangle,$$
where $x = \sum_i x_i u_i$ and $y = \sum_j y_j u_j$. This holds for arbitrary elements $x, y$ of $U$. It follows that $L$ is symmetric.

Theorem 10.10. Every symmetric map is diagonalizable.

Proof: Let $L$ be a symmetric map. By the preceding theorem, its matrix $A_L$ relative to some orthonormal basis is symmetric, hence is also diagonalizable (chapter 6). By Theorem 10.6, it follows that $L$ is diagonalizable.
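At the matrix level, this is exactly what numpy's eigh routine computes for a symmetric matrix: the eigenvalues together with an orthonormal eigenbasis. An illustrative sketch (the matrix is an arbitrary symmetric example, not from the text):

```python
import numpy as np

S = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric

w, Q = np.linalg.eigh(S)   # eigh assumes symmetric/hermitian input

print(np.allclose(Q.T @ Q, np.eye(3)))      # columns are orthonormal
print(np.allclose(S @ Q, Q @ np.diag(w)))   # each column is an eigenvector
```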

10.4. Homework

1. Find the characteristic polynomial, eigenvalues, and eigenvectors of the linear map $D^2 + 2D : U \to U$, where $U = \mathrm{Span}\{e^t, te^t\}$ and $D(f) = f'$.

2. Let $S(2)$ be the space of $2 \times 2$ symmetric matrices, and let $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$. Define the linear map
$$L : S(2) \to S(2), \qquad L(B) = A^t B A.$$
Find the characteristic polynomial, eigenvalues, and eigenvectors of $L$.

3. Repeat the previous problem, but with $S(2)$ replaced by the space of all $2 \times 2$ matrices.

4. Let $L, M$ be two linear maps on a finite dimensional vector space $U$. Show that $L \circ M$ and $M \circ L$ have the same eigenvalues.

5. Let $L : U \to U$ be a linear map and $\{u_1, \ldots, u_n\}$ be an eigenbasis of $L$ with distinct eigenvalues. Show that if $u$ is an eigenvector of $L$, then $u$ is a scalar multiple of a $u_i$.

6. Let $U$ be a finite dimensional vector space. Two linear maps $L, M$ on $U$ are said to be similar if
$$L = P^{-1} \circ M \circ P$$
for some invertible linear map $P$ on $U$. Show that similar linear maps have the same eigenvalues.

11. Jordan Canonical Form

Up to now, we’ve been working with notions involving only real numbers R.They have been the only “scalars” we use. Vectors in Rn are, by definition, lists of real numbers. We’ve scaled vectors by real numbers only. Our matrices have only real numbers as their entries. While working with real numbers alone afford us a certain degree of simplicity, it also places certain restrictions on what we can do. For example, a 2 2 real matrix 01 × need not have any real eigenvalue. The matrix A = is one such example. So 10 diagonalizing such a matrix using real numbers alone￿ is− out of￿ the question. The source of this restriction is simply that certain polynomial functions, such as t2 + 1, have no real roots. Thus for such matrices or linear maps, we lack the useful numerical tool, provided by eigenvalues and eigenspaces, for studying linear maps. It is a lot harder to compare these maps without those numerical tools.

To circumvent this fundamental difficulty, we therefore need another system of scalars in which we are guaranteed to find a root for every polynomial function such as $t^2 + 1$. Such systems exist, and the simplest of them is the system of complex numbers.

11.1. Complex numbers

Recall that $\mathbb{C}$ is simply the set $\mathbb{R}^2$, endowed with the rules of addition and multiplication given by
$$(x, y) + (x', y') = (x + x',\, y + y'), \qquad (x, y)(x', y') = (xx' - yy',\, xy' + yx').$$

If we put $z = (x, y)$, then we say that $x$ is the real part of $z$ and $y$ is the imaginary part of $z$, and we write
$$x = \mathrm{Re}(z), \qquad y = \mathrm{Im}(z).$$
We also write $\bar z = (x, -y)$ and call this the complex conjugate of $z$. It is not obvious, but quite easy to verify directly, that the rules of addition and multiplication of $\mathbb{C}$ have all the usual properties that one might expect of scalars. For example, if $a, b, c$ are three complex numbers, then we have the relations
$$ab = ba, \qquad (a + b) + c = a + (b + c), \qquad (ab)c = a(bc).$$
(Prove this!) It is customary to write $z = x(1, 0) + y(0, 1)$ simply as $z = x + yi$, where $i = (0, 1)$. Note that the element $(1, 0)$ does indeed behave like a "one", because $(x, y)(1, 0) = (x, y)$ according to our rule of multiplication. So there is no confusion in writing $(1, 0) = 1$. We can, of course, think of a real number $x$ as a complex number with $\mathrm{Im}(x) = 0$. We also have $i^2 = ii = -(1, 0) = -1$. Note that $i$ is, therefore, precisely a root of the polynomial function $t^2 + 1$, because $i^2 + 1 = 0$.
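The pair arithmetic is easy to machine-check. Here is an illustrative sketch in Python (not part of the text) implementing multiplication directly from the defining rule and comparing against Python's built-in complex type:

```python
def mul(z, w):
    # (x, y)(x', y') = (xx' - yy', xy' + yx'), the defining rule
    x, y = z
    xp, yp = w
    return (x * xp - y * yp, x * yp + y * xp)

i = (0.0, 1.0)
print(mul(i, i))                     # (-1.0, 0.0): i^2 = -1
print(mul((3.0, 2.0), (1.0, 0.0)))   # (3.0, 2.0): (1, 0) behaves like "one"

# Cross-check against the built-in complex numbers.
z, w = (3.0, 2.0), (1.0, -4.0)
print(mul(z, w), complex(*z) * complex(*w))   # (11.0, -10.0) and (11-10j)
```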

More generally, we have

Theorem 11.1. (The fundamental theorem of algebra) Every non-constant polynomial function $a_n t^n + \cdots + a_1 t + a_0$, with coefficients $a_i \in \mathbb{C}$, has at least one root in $\mathbb{C}$.

This is a statement about a certain algebraic completeness of $\mathbb{C}$ which the reals $\mathbb{R}$ lack.

Corollary 11.2. Every non-constant polynomial function $f(t)$ with coefficients in $\mathbb{C}$ can be factorized into a product of linear functions. Specifically, if $f(t)$ has degree $n$, i.e. the leading term of $f(t)$ is $at^n$ with $a \neq 0$, then $f(t)$ is the product of $n$ linear factors and a scalar, i.e. $f(t) = a(t - \lambda_n) \cdots (t - \lambda_1)$, where $\lambda_1, \ldots, \lambda_n$ are the roots of $f(t)$.

Proof: By the fundamental theorem of algebra, $f(t)$ has at least one root, say $\lambda_n \in \mathbb{C}$. By long division, we can divide $f(t)$ by $t - \lambda_n$, and the remainder must be a scalar, say $b$. So we have
$$f(t) = g_1(t)(t - \lambda_n) + b,$$
where $g_1(t)$ is a polynomial function with leading term $at^{n-1}$. Since the left side vanishes at $t = \lambda_n$ (as $\lambda_n$ is a root), it follows that $b = 0$. If $g_1(t)$ is constant, i.e. if $n = 1$, we are done. If not, we can apply the fundamental theorem to $g_1(t)$ and get a factorization $g_1(t) = g_2(t)(t - \lambda_{n-1})$, where $g_2(t)$ has leading term $at^{n-2}$. So we get
$$f(t) = g_2(t)(t - \lambda_n)(t - \lambda_{n-1}).$$
This procedure terminates after $n$ steps, and we get the desired factorization of $f(t)$.

Note that the roots $\lambda_1, \ldots, \lambda_n$ of $f(t)$ above need not be distinct. It can be shown that the factorization $f(t) = a(t - \lambda_n) \cdots (t - \lambda_1)$ is unique up to rearranging the roots. That is, if $f(t) = b(t - \lambda'_n) \cdots (t - \lambda'_1)$, then $b = a$ and we can rearrange $\lambda'_1, \ldots, \lambda'_n$ so that $\lambda_i = \lambda'_i$ for all $i$.
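Numerical libraries return exactly this factorization data. As an illustrative sketch (using numpy's roots and poly, with the usual floating-point caveats), the real polynomial $t^3 - t^2 + t - 1 = (t-1)(t^2+1)$ acquires its full set of three roots once complex numbers are allowed:

```python
import numpy as np

coeffs = [1.0, -1.0, 1.0, -1.0]   # t^3 - t^2 + t - 1, highest degree first

roots = np.roots(coeffs)
print(roots)                      # approximately 1, i, -i

# np.poly rebuilds the monic polynomial (t - l_1)...(t - l_n) from its roots.
print(np.allclose(np.poly(roots), coeffs))   # True
```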

11.2. Linear algebra over C

Almost all of the linear algebra notions and results we've developed in the previous chapters, where $\mathbb{R}$ is the underlying scalar system, can be developed in a parallel fashion when $\mathbb{R}$ is replaced by $\mathbb{C}$ as the underlying scalars. We mention some of them here and point out a few things that need to be handled with care when working over $\mathbb{C}$. We will leave some of the details to the reader in the form of exercises at the end of this chapter.

• Row operations on complex matrices, i.e. matrices with entries in $\mathbb{C}$, can be performed just as over $\mathbb{R}$.

• The notion of inner product of two vectors $A = (a_1, \ldots, a_n)$, $B = (b_1, \ldots, b_n)$ in $\mathbb{C}^n$ now becomes $\langle A, B\rangle = a_1 \bar b_1 + \cdots + a_n \bar b_n$; length becomes $\|A\| = \sqrt{\langle A, A\rangle}$. Note that $\langle A, A\rangle \geq 0$ by definition. It is important to note that while the dot product $X \cdot Y$ in $\mathbb{R}^n$ is linear with respect to scaling of $X, Y$ by real numbers, $\langle A, B\rangle$ for complex vectors $A, B$ is only linear with respect to scaling $A$, and is anti-linear with respect to scaling $B$. That is, we have
$$\langle zA, B\rangle = z\langle A, B\rangle, \qquad \langle A, zB\rangle = \bar z\langle A, B\rangle.$$
There is another important difference between the real and the complex case. While $X \cdot Y = Y \cdot X$ for real vectors, we have
$$\langle A, B\rangle = \overline{\langle B, A\rangle}$$
for complex vectors.
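These identities can be spot-checked numerically. An illustrative sketch with numpy (the vectors are arbitrary examples; the helper implements the book's convention $\langle A, B\rangle = \sum_i a_i \bar b_i$, which conjugates the second slot):

```python
import numpy as np

A = np.array([1 + 2j, 3 - 1j])
B = np.array([2 - 1j, 0 + 1j])

def ip(A, B):
    # <A, B> = a_1 conj(b_1) + ... + a_n conj(b_n)
    return np.sum(A * np.conj(B))

z = 2 - 3j
print(np.isclose(ip(z * A, B), z * ip(A, B)))           # linear in A
print(np.isclose(ip(A, z * B), np.conj(z) * ip(A, B)))  # anti-linear in B
print(np.isclose(ip(A, B), np.conj(ip(B, A))))          # conjugate symmetry
print(ip(A, A).real >= 0 and np.isclose(ip(A, A).imag, 0.0))  # <A,A> >= 0
```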

Schwarz’s inequality becomes • A, B A B . |￿ ￿|≤ ￿ ￿￿ ￿

• Rules of scaling, adding, and multiplying complex matrices are the same as for real matrices. Likewise for the rules of inverting and transposing matrices, as well as the correspondence between matrices and linear transformations of $\mathbb{C}^n$.

• The notions of linear subspaces, linear relations, linear dependence, bases, and dimension all carry over from $\mathbb{R}^n$ to $\mathbb{C}^n$ verbatim. Orthonormal bases and Gram-Schmidt also carry over, but with $\langle A, B\rangle$ playing the role of the dot product.

• The notion of the determinant of an $n \times n$ matrix and its basic properties also carry over verbatim.

• But now, unlike real matrices, every $n \times n$ complex matrix has at least one eigenvalue, by the fundamental theorem of algebra.

• Unlike symmetric real matrices, symmetric complex matrices need not be diagonalizable. E.g. $\begin{pmatrix} 1+i & 1 \\ 1 & 1-i \end{pmatrix}$.

• There is a more subtle complex analogue of the transpose of a matrix, called the conjugate transpose. If $a_{ij}$ is the $(ij)$ entry of $A$, then the conjugate transpose $A^\dagger$ is the matrix whose $(ij)$ entry is $\bar a_{ji}$. A matrix $A$ is called hermitian if $A^\dagger = A$. This is the complex analogue of a symmetric matrix. It is a theorem that every hermitian matrix is diagonalizable. This is the spectral theorem for hermitian matrices. The proof is essentially parallel to the proof of the spectral theorem for real symmetric matrices.
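The last two bullets can be illustrated side by side (a numpy sketch; the hermitian matrix is an ad hoc example). The symmetric complex matrix above has 1 as its only eigenvalue but only a one-dimensional eigenspace, so it admits no eigenbasis, while a hermitian matrix always diagonalizes, e.g. via eigh:

```python
import numpy as np

# Symmetric (M^t = M) but not hermitian: no eigenbasis exists.
M = np.array([[1 + 1j, 1],
              [1, 1 - 1j]])
print(np.linalg.eigvals(M))                  # both eigenvalues (close to) 1
print(np.linalg.matrix_rank(M - np.eye(2)))  # 1, so the eigenspace is 1-dim

# Hermitian (H^dagger = H): real eigenvalues, unitary eigenvector matrix.
H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])
w, U = np.linalg.eigh(H)
print(np.allclose(U @ np.diag(w) @ U.conj().T, H))   # H = U diag(w) U^dagger
```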

• The notions of abstract vector spaces over $\mathbb{C}$ are similar to the real case. We have built a theory of abstract (real) vector spaces by using the theory for $\mathbb{R}^n$ as a guide. One can now go one step further by replacing the role of $\mathbb{R}$ everywhere by $\mathbb{C}$. It turns out that virtually every formal notion in the real case, studied in Chapters 8-10 (see the table of contents there), has a straightforward analogue in the complex case. We encourage the reader to write down these formal analogues themselves, or to consult another textbook for details. We recommend S. Lang, Linear Algebra.

11.3. Similarity

From now on, all matrices are assumed to be complex matrices.

Definition 11.3. We say that two $n \times n$ matrices $A, B$ are similar if there is an invertible matrix $C$ such that $B = CAC^{-1}$. In this case, we also say that $A, B$ are conjugates of each other.

Exercise. Show that if P, Q are similar and if Q, R are similar, then P, R are also similar.

Exercise. Show that if $B = CAC^{-1}$, then $B^k = CA^kC^{-1}$ for all positive integers $k$.

For positive integers $k, l$, we denote by $O_{k \times l}$ the $k \times l$ zero matrix.

Definition 11.4. An $n \times n$ matrix $B$ is called a Jordan matrix if it has the shape
$$B = \begin{pmatrix} B_1 & O_{k_1 \times k_2} & \cdots & O_{k_1 \times k_p} \\ O_{k_2 \times k_1} & B_2 & \cdots & O_{k_2 \times k_p} \\ \vdots & \vdots & \ddots & \vdots \\ O_{k_p \times k_1} & O_{k_p \times k_2} & \cdots & B_p \end{pmatrix},$$
where the block $B_i$ is a $k_i \times k_i$ matrix whose diagonal entries are all the same, say $\lambda_i$, whose entries just above the diagonal are all 1, and whose entries are 0 everywhere else. Note that $n = k_1 + \cdots + k_p$. The $B_i$ are called the Jordan blocks of $B$.

If a given Jordan block $B_i$ above is $1 \times 1$, then $B_i = [\lambda_i]$. If $B_i$ is $2 \times 2$, then $B_i = \begin{pmatrix} \lambda_i & 1 \\ 0 & \lambda_i \end{pmatrix}$. If $B_i$ is $3 \times 3$, then $B_i = \begin{pmatrix} \lambda_i & 1 & 0 \\ 0 & \lambda_i & 1 \\ 0 & 0 & \lambda_i \end{pmatrix}$, and so on. In particular, any diagonal matrix is a Jordan matrix whose Jordan blocks all have size $1 \times 1$. Note that the $\lambda_i$ need not be distinct.
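For later experiments it is convenient to be able to build these matrices. A small helper sketch in Python with numpy (the function names are our own, not from the text):

```python
import numpy as np

def jordan_block(lam, size):
    # size x size block: lam on the diagonal, 1 just above it, 0 elsewhere
    return lam * np.eye(size, dtype=complex) + np.eye(size, k=1, dtype=complex)

def jordan_matrix(blocks):
    # place the given blocks along the diagonal, zeros everywhere else
    n = sum(b.shape[0] for b in blocks)
    B = np.zeros((n, n), dtype=complex)
    i = 0
    for b in blocks:
        k = b.shape[0]
        B[i:i + k, i:i + k] = b
        i += k
    return B

# Blocks of sizes 2 and 1; the eigenvalues are allowed to repeat.
B = jordan_matrix([jordan_block(3.0, 2), jordan_block(3.0, 1)])
print(B.real)
```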

Theorem 11.5. (Jordan canonical form) Every given $n \times n$ matrix $A$ is similar to a Jordan matrix $B$. Moreover, $B$ is uniquely determined by $A$ up to rearranging the blocks $B_i$ along its diagonal. $B$ is called a Jordan canonical form of $A$.

We will sketch a proof of this in the next section and in Appendices A and B of this chapter, assuming some basic facts about the polynomial algebra $\mathbb{C}[x]$. The preceding theorem can be stated more abstractly as follows.

Theorem 11.6. For any given linear map $A : V \to V$ of a finite dimensional complex vector space $V$, there is a basis $\beta$ of $V$ such that the matrix $B = [A]_\beta$ of $A$ in the basis $\beta$ is a Jordan matrix. Moreover, $B$ is unique up to rearranging the Jordan blocks. The basis $\beta$ above is called a Jordan basis associated to $A$.

We now sketch the essential steps of the proof. This will also lead to a simple procedure for computing a Jordan matrix of A.

11.4. Invariant subspaces and cycles

Throughout this section, we let $A : V \to V$ be a linear map of a finite dimensional complex vector space $V$ of dimension $n$, and denote by $f(t)$ the characteristic polynomial $\det(A - tI)$ of $A$.

By the fundamental theorem of algebra, $(-1)^n f(t)$ factorizes into a product of linear functions of the form $(t - \lambda)^{m_\lambda}$, and there is one such factor for every distinct eigenvalue $\lambda$ of $A$. The positive integer $m_\lambda$ is called the multiplicity of $\lambda$.

Definition 11.7. A linear subspace $K$ of $V$ is said to be $A$-invariant if $AK \subset K$, i.e. $Av \in K$ for all $v \in K$.

Put
$$K_\lambda = \{v \in V \mid (A - \lambda I)^p v = 0 \text{ for some } p > 0\}.$$
This linear subspace of $V$ is called the generalized eigenspace of $A$ corresponding to $\lambda$.

Lemma 11.8. $K_\lambda$ is $A$-invariant.

Proof: The maps $A$ and $A - \lambda I$ commute with each other. So if $v \in K_\lambda$, say $(A - \lambda I)^p v = 0$, then
$$(A - \lambda I)^p Av = A(A - \lambda I)^p v = A0 = 0,$$
and so $Av \in K_\lambda$. This shows that $AK_\lambda \subset K_\lambda$.

Theorem 11.9. (Generalized eigenspace decomposition) We have $\dim K_\lambda = m_\lambda$ for each eigenvalue $\lambda$. Moreover, $V$ is equal to the direct sum of all the $K_\lambda$. In other words, if $\lambda_1, \ldots, \lambda_s$ are the distinct eigenvalues of $A$, then we have a direct sum decomposition
$$V = K_{\lambda_1} + \cdots + K_{\lambda_s}.$$

A proof is given in Appendix A of this chapter.

Since $AK_\lambda \subset K_\lambda$, we get a map $K_\lambda \to K_\lambda$, $v \mapsto Av$. This map is denoted by $A|K_\lambda$ and it is called the restriction of $A$ to $K_\lambda$. This theorem tells us that we should break up the problem of finding a Jordan matrix of $A$ into the problems of finding a Jordan matrix of each restriction $A|K_\lambda$, and then we can "stack" these smaller Jordan matrices together to give a Jordan matrix of $A$. In fact, more generally we have

Lemma 11.10. If $V_1, V_2$ are $A$-invariant subspaces of $V$ and if $V = V_1 + V_2$ is a direct sum, then a Jordan matrix of $A$ is given by $B = \begin{pmatrix} B_1 & O \\ O & B_2 \end{pmatrix}$, where $B_1, B_2$ are Jordan matrices of the restrictions $A|V_1$, $A|V_2$ respectively.

Proof: Exercise.

Thus from now on, we will focus on a fixed eigenvalue $\lambda$, and put
$$m = m_\lambda, \qquad K = K_\lambda \neq (0),$$
which we assume are given to us. Note that the restriction $A|K$ has characteristic polynomial $(-1)^m(t - \lambda)^m$. Since we are focusing on one eigenvalue at a time, we may as well forget $V$ and just consider the linear map
$$A|K_\lambda : K_\lambda \to K_\lambda.$$

For convenience, we will temporarily drop the $\lambda$ from the notation and simply write $A : K \to K$. The remaining task is to find a Jordan matrix of the linear map $A : K \to K$, assumed to have characteristic polynomial $(-1)^m(t - \lambda)^m$, and to prove its uniqueness. The idea is to further decompose $K$ into smaller $A$-invariant subspaces, until the pieces can't be decomposed any more.

Throughout the rest of this section, $\lambda$ is assumed fixed and we put
$$T = A - \lambda I.$$
Note that a subspace $L$ of $K$ is $A$-invariant iff it is $T$-invariant, and that for $v \in K$ we have
$$Av = \lambda v \iff Tv = 0 \iff v \in \mathrm{Ker}(T).$$

Definition 11.11. A $T$-cycle of size $p \geq 1$ is a list of vectors in $K$ of the shape
$$J = \{v, Tv, \ldots, T^{p-1}v\} \qquad (1)$$
where $T^{p-1}v \neq 0$ and $T^p v = 0$. A subspace $L$ of $K$ is called $T$-cyclic if it is spanned by a $T$-cycle. It is clear that a $T$-cyclic subspace is also $T$-invariant.
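Computationally, a $T$-cycle is produced by applying $T$ repeatedly until the vector dies. An illustrative sketch in numpy (the matrix is an ad hoc nilpotent example with $\lambda = 0$, so $T = A$; the loop assumes $T$ is nilpotent on $v$, so it terminates):

```python
import numpy as np

def t_cycle(T, v, tol=1e-12):
    # build {v, Tv, ..., T^{p-1} v}, stopping when the next vector vanishes
    cycle = []
    while np.linalg.norm(v) > tol:
        cycle.append(v)
        v = T @ v
    return cycle

# T sends e1 -> 0, e2 -> e1, e3 -> e2: a single nilpotent Jordan block.
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
for w in t_cycle(T, np.array([0.0, 0.0, 1.0])):   # start from v = e3
    print(w)   # e3, then e2, then e1: a T-cycle of size 3
```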

Theorem 11.12. Any $T$-cycle is linearly independent.

Proof: We give two proofs. The first proof is by induction. A $T$-cycle of size one, $\{v\}$ with $v \neq 0$, is clearly linearly independent. Suppose our assertion holds for any $T$-cycle of size up to $p - 1$. We now show that for $p > 1$, (1) is linearly independent. Consider a linear relation
$$a_0 v + a_1 Tv + \cdots + a_{p-1} T^{p-1} v = 0.$$
Applying $T$ to this, we get $a_0 Tv + \cdots + a_{p-2} T^{p-1} v = 0$, since $T^p v = 0$. But $\{Tv, \ldots, T^{p-1}v\}$ is clearly a $T$-cycle of size $p - 1$. By our inductive hypothesis, it is linearly independent, so $a_0 = \cdots = a_{p-2} = 0$. This shows that $a_{p-1} T^{p-1} v = 0$, implying that $a_{p-1} = 0$ as well. So, (1) is linearly independent.

The second proof assumes some basic facts about the polynomial algebra $\mathbb{C}[x]$. Let $L$ be the $T$-cyclic subspace spanned by (1). Consider the map $\varphi : \mathbb{C}[x] \to L$ defined by $g(x) \mapsto g(T)v$. It is easy to see that $\mathrm{Ker}(\varphi)$ is an ideal in $\mathbb{C}[x]$, hence the ideal must be principal, i.e. it is generated by a unique monic $f(x) \in \mathbb{C}[x]$. Note that $x^p \in \mathrm{Ker}(\varphi)$, hence $f(x) \mid x^p$, so $f(x) = x^d$ with $d \leq p$; and since $T^{p-1}v \neq 0$, i.e. $x^{p-1} \notin \mathrm{Ker}(\varphi)$, we must have $d \geq p$. So $\deg f(x) = p$. But since $\varphi$ is onto, it induces an isomorphism $\bar\varphi : \mathbb{C}[x]/(f(x)) \to L$. So we have
$$\dim L = \dim \mathbb{C}[x]/(f(x)) = \deg f(x) = p.$$
This shows that $\dim L = p$, so $J$ must be a basis of $L$; in particular, it is linearly independent.

Theorem 11.13. There is a direct sum decomposition
$$K = L_1 + \cdots + L_r,$$
where each $L_i$ is a $T$-cyclic subspace of $K$.

A proof is given in Appendix B.

It can be shown that a $T$-cyclic subspace $L$ is indecomposable in the sense that if $L$ is written as a direct sum of two $T$-invariant subspaces, then one of them has to be zero. This result won't be proved or needed here. Note also that there may be many ways to decompose $K$ into $T$-cyclic subspaces. But as shown below, while the $L_i$ appearing in the decomposition need not be unique, the list of their dimensions $\dim L_i$ is uniquely determined by $A$.

By Theorem 11.12, a $T$-cyclic subspace $L$ spanned by a $T$-cycle $J$ of size $p$ must have $\dim L = p$. In this case, the matrix of the restriction $A|L : L \to L$ in the basis $J$ is precisely the $p \times p$ matrix
$$[A|L]_J = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} \qquad (2)$$
which has the shape of a Jordan block. Since $\lambda$ is assumed given, this block is completely specified by its size, namely $p = \dim L$. By Lemma 11.10, Theorem 11.13 implies the existence of a Jordan matrix of $A : K \to K$ with $r$ Jordan blocks $[A|L_i]_{J_i}$, where $J_i$ is a $T$-cycle spanning $L_i$. Combining this with Theorem 11.9, we get the existence assertion of Theorem 11.6. We shall see below that, assuming existence, the uniqueness assertion can be proved by showing that the block sizes, if ordered decreasingly,
$$(*)\qquad \dim L_1 \geq \cdots \geq \dim L_r,$$
are uniquely determined by $A$. In fact, the numbers (*) will be expressed in terms of the numbers $m_i = \dim \mathrm{Ker}(T^i)$, as shown below.

By Theorem 11.13, the list (*) is subject to the conditions
$$m = \dim K = \dim L_1 + \cdots + \dim L_r, \qquad \dim L_i > 0. \qquad (3)$$
Let's examine a few simple cases.

If $m = 1$, then there is just one possibility for the list (*), namely, 1. A Jordan matrix of $A$ must be $1 \times 1$, so $[\lambda]$ is a Jordan matrix of $A$.

If $m = 2$, then there are just two distinct possibilities for the list (*), namely, 2, or 1,1. A Jordan matrix of $A$ in these cases is respectively
$$\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, \qquad \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}.$$

If $m = 3$, then there are just three possibilities for the list (*): 3, or 2,1, or 1,1,1. A Jordan matrix of $A$ in these cases is respectively
$$\begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}, \qquad \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}, \qquad \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{pmatrix}.$$
As $m$ gets larger, the number of possibilities grows, and they correspond to the ways in which we can express $m$ as a sum of positive integers as in (3). Thus to find the correct list (*) corresponding to a given $A$, it is clearly not enough just to know $m$ and $\lambda$. We need to extract more out of the preceding theorem. First, we need a more definitive description of $K$.

Lemma 11.14. There exists a smallest positive integer $q$ such that
$$\mathrm{Ker}(T^q) = \mathrm{Ker}(T^{q+1}).$$
Moreover, we have $K = \mathrm{Ker}(T^q)$.

Proof: Clearly we have $\mathrm{Ker}(T) \subset \mathrm{Ker}(T^2) \subset \cdots \subset K$, and by definition, $K$ is the union of all the $\mathrm{Ker}(T^p)$. Since $K$ is finite dimensional, these subspaces cannot grow indefinitely in dimension, which means that $\dim \mathrm{Ker}(T^p) = \dim K = m$ for all large enough $p$. In particular, $\mathrm{Ker}(T^q) = \mathrm{Ker}(T^{q+1})$ for some $q$. Fix the smallest such $q$, and our first assertion follows.

To prove our second assertion, it suffices to show that $K \subset \mathrm{Ker}(T^q)$. So let $v \in K$, $v \neq 0$; we will show that $T^q v = 0$. By definition of $K$, we have $T^p v = 0$ for some $p > 0$. If $p \leq q$, then
$$T^q v = T^{q-p}\, T^p v = 0,$$
hence $v \in \mathrm{Ker}(T^q)$. Now assume $p > q$. Then
$$0 = T^p v = T^{q+1}\, T^{p-q-1} v \;\implies\; T^{p-q-1} v \in \mathrm{Ker}(T^{q+1}) = \mathrm{Ker}(T^q) \;\implies\; 0 = T^q\, T^{p-q-1} v = T^{p-1} v.$$
This shows that for $p > q$, we can still get $T^p v = 0$ after reducing $p$ by 1. We can continue to do so until $p = q$, proving that $T^q v = 0$.

We now begin computing the ordered list (*). What is the largest possible value for $\dim L_1$? By Theorem 11.13, $L_1$ is spanned by a $T$-cycle $J$. Now comes a key observation: this $J$ cannot have size larger than $q$, where $q$ is given by the preceding lemma. For otherwise, $J$ would contain the vector $T^{p-1}v$ for some $p > q$; but this vector in $J$ would be zero, by the preceding lemma, contradicting Theorem 11.12. So, the largest possible value of $\dim L_1$ is $q$. In particular, the dimensions in the list (*) all lie between 1 and $q$. For $1 \leq j \leq q$, let's denote by $l_j$ the number of times the number $j$ occurs in the list (*). So the list (*), in this notation, has the shape
$$\{q, \ldots, q\},\ \{q-1, \ldots, q-1\},\ \ldots,\ \{1, \ldots, 1\}.$$
We have grouped the list into $q$ smaller sublists, where the $(q - j + 1)$st sublist reads $\{j, \ldots, j\}$ and has size $l_j$. Note that $l_j$ can be zero. Now, our remaining task is to find the numbers $l_q, \ldots, l_1$.

Exercise. Show that if $L$ is a $d$ dimensional $T$-cyclic subspace of $K$, then
$$\dim \mathrm{Ker}\big((T|L)^j\big) = \begin{cases} j & \text{if } 1 \leq j \leq d \\ d & \text{if } j > d. \end{cases}$$

Exercise. Show that $\mathrm{Ker}(T^j) = \mathrm{Ker}\big((T|L_1)^j\big) + \cdots + \mathrm{Ker}\big((T|L_r)^j\big)$, and that the right side is a direct sum.

Lemma 11.15. Let $m_j = \dim \mathrm{Ker}(T^j)$, $j = 1, \ldots, q$. Then we have
$$\begin{aligned}
m_q &= q\,l_q + (q-1)\,l_{q-1} + \cdots + l_1\\
m_{q-1} &= (q-1)\,l_q + (q-1)\,l_{q-1} + (q-2)\,l_{q-2} + \cdots + l_1\\
m_{q-2} &= (q-2)\,l_q + (q-2)\,l_{q-1} + (q-2)\,l_{q-2} + (q-3)\,l_{q-3} + \cdots + l_1\\
&\;\;\vdots\\
m_1 &= l_q + l_{q-1} + \cdots + l_1.
\end{aligned}$$
(In general, $m_j = \sum_{k=1}^{q} \min(j, k)\, l_k$.)

Proof: This is an immediate consequence of the preceding two exercises.

Example 11.16.

Suppose $A : K \to K$ has $q = 2$, $m = m_2 = \dim K = 6$, $m_1 = 4$. Then the preceding lemma yields
$$m_2 = 2l_2 + l_1, \qquad m_1 = l_2 + l_1.$$
Solving these gives $l_2 = 2$, $l_1 = 2$. So a Jordan matrix of $A$ in this case has two Jordan blocks of size 2 and two Jordan blocks of size 1:
$$\begin{pmatrix}
\lambda & 1 & & & & \\
0 & \lambda & & & & \\
 & & \lambda & 1 & & \\
 & & 0 & \lambda & & \\
 & & & & \lambda & \\
 & & & & & \lambda
\end{pmatrix}$$
where all the unfilled entries are 0.

Exercise. Show that
$$\begin{aligned}
l_q &= m_q - m_{q-1}\\
l_q + l_{q-1} &= m_{q-1} - m_{q-2}\\
l_q + l_{q-1} + l_{q-2} &= m_{q-2} - m_{q-3}\\
&\;\;\vdots\\
l_q + l_{q-1} + \cdots + l_1 &= m_1 - m_0,
\end{aligned}$$
where $m_0 = 0$. Hence conclude that the $m_j$ determine the $l_j$.

Thus we have proved

Theorem 11.17. If $K = L_1 + \cdots + L_r$ is a direct sum, where the $L_i$ are nonzero $T$-cyclic subspaces of $K$ such that
$$(*)\qquad \dim L_1 \geq \cdots \geq \dim L_r,$$
then the numbers (*) are uniquely determined by $T$ (hence by $A = T + \lambda I$).

Thus to complete our task of finding a Jordan matrix of $A$, it is enough to compute the number $q$ and the numbers $m_1, \ldots, m_q$, then solve for the numbers $l_1, \ldots, l_q$ by the preceding lemma, and we are done. Recall that if we know a matrix of $T : K \to K$ (in any given basis of $K$), then computing $m_j = \dim \mathrm{Ker}(T^j)$ can easily be done by row reducing the matrix of $T^j$. Note also that the number $q$ of Lemma 11.14 can be computed at the same time. Namely, $q$ is the first positive integer such that $T^q$ and $T^{q+1}$ have the same rank ($\dim K = \dim \mathrm{Ker} + \mathrm{rank}$), which can also be easily computed by row reduction.

Recapitulating what we have done, we summarize the steps for finding a Jordan matrix for a given linear map $A : V \to V$ schematically as follows:
$$V \;\to\; K_\lambda \;\to\; L_i \;\to\; \dim L_i \;\to\; \dim \mathrm{Ker}(A - \lambda I)^j \;\to\; \text{row reduce } T^j = (A - \lambda I)^j.$$
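This procedure translates directly into a short program. Below is an illustrative sketch in Python with numpy (the helper is our own, and it computes ranks in floating point, so it assumes a numerically well-conditioned example): it finds $q$ and the numbers $m_j$ from ranks of powers of $T$, then recovers how many blocks of each size occur, using the differences from the exercise above.

```python
import numpy as np

def jordan_block_sizes(A, lam):
    # Block sizes of A at eigenvalue lam, from m_j = dim Ker(T^j), T = A - lam I.
    n = A.shape[0]
    T = A - lam * np.eye(n)

    m = [0]                      # m[j] = dim Ker(T^j), with m[0] = 0
    P = np.eye(n)
    while True:
        P = P @ T
        m.append(n - np.linalg.matrix_rank(P))
        if m[-1] == m[-2]:
            break                # kernels have stabilized
    q = len(m) - 2

    # Blocks of size >= j number m[j] - m[j-1]; blocks of size exactly j
    # number (m[j] - m[j-1]) - (m[j+1] - m[j]).
    sizes = []
    for j in range(1, q + 1):
        m_next = m[j + 1] if j + 1 < len(m) else m[j]
        sizes += [j] * ((m[j] - m[j - 1]) - (m_next - m[j]))
    return sorted(sizes, reverse=True)

# Example 11.16: q = 2, m_1 = 4, m_2 = 6 gives blocks of sizes 2, 2, 1, 1.
lam = 5.0
B = np.diag([lam] * 6)
B[0, 1] = B[2, 3] = 1.0
print(jordan_block_sizes(B, lam))   # [2, 2, 1, 1]
```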

Exercise. Suppose $A : K_\lambda \to K_\lambda$ has $q = 2$, $m = m_2 = 5$, $m_1 = 2$. Find a Jordan matrix of $A$.

Exercise. Suppose that a linear map $A : V \to V$ has characteristic polynomial $-(t - \lambda_1)^2(t - \lambda_2)^3$ where $\lambda_1 \neq \lambda_2$, and that its generalized eigenspaces are
$$K_{\lambda_1} = \mathrm{Ker}(A - \lambda_1 I)^2 \supsetneq \mathrm{Ker}(A - \lambda_1 I), \qquad K_{\lambda_2} = \mathrm{Ker}(A - \lambda_2 I)^2 \supsetneq \mathrm{Ker}(A - \lambda_2 I).$$
Find a Jordan matrix of $A$.

11.5. Appendix A

We now prove Theorem 11.9. Throughout this section, $A : V \to V$ will be a linear map of a finite dimensional vector space $V$ of dimension $n$. Our proof involves the use of the notion of the quotient of $V$ by a linear subspace $K$. This notion is used in Appendix B as well. Let $m$ be the dimension of $K$.

Define an equivalence relation on the set $V$ as follows. We say that $u, v \in V$ are equivalent if $u - v \in K$. (Check that this is an equivalence relation!) Let $V/K$ denote the set of equivalence classes. The equivalence class containing $v \in V$ is denoted by $\bar v = v + K$. If $x \in \mathbb{C}$ and $v \in V$, we define vector scaling by
$$x\bar v = \overline{xv}.$$
This is well-defined; for if $u, v$ represent the same equivalence class, then $xu - xv = x(u - v) \in K$, because $K$ is a linear subspace, and so $\overline{xu} = \overline{xv}$. If $u, v \in V$, we define vector addition by
$$\bar u + \bar v = \overline{u + v}.$$

Again, it is easy to check that this is well-defined. With the two operations we have just defined on the set $V/K$, it is straightforward, but tedious, to check that $V/K$ becomes a complex vector space with the zero vector $\bar 0$. Moreover, the map
$$V \to V/K, \qquad v \mapsto \bar v,$$
is linear and surjective. It is called the projection map.

Suppose now that $K$ is $A$-invariant. Define
$$\bar A : V/K \to V/K, \qquad \bar A\bar v = \overline{Av}.$$
Again, this is well-defined. It is also linear, i.e.
$$\bar A(\bar u + x\bar v) = \bar A\bar u + x\bar A\bar v.$$
The linear map $\bar A$ is called the map induced by $A$ on the quotient, or simply the induced map.

Lemma 11.18. Let $T : V \to V$ be a linear map, and $K$ be a $T$-invariant subspace of $V$. Then we have
$$\det(T) = \det(T|K)\,\det(\bar T).$$

Proof: Pick a basis of $K$, say $\{v_1, \ldots, v_m\}$. Extend this to a basis of $V$, say $\beta = \{v_1, \ldots, v_m, \ldots, v_n\}$. Then $Tv_i$ takes the form
$$Tv_i = \begin{cases} \sum_{j=1}^m p_{ij} v_j & 1 \leq i \leq m \\ \sum_{j=1}^m q_{ij} v_j + \sum_{j=m+1}^n r_{ij} v_j & m+1 \leq i \leq n, \end{cases}$$
where $P = (p_{ij})$, $Q = (q_{ij})$, $R = (r_{ij})$ are matrices of respective sizes $m \times m$, $(n-m) \times m$, $(n-m) \times (n-m)$. In other words, the matrix $[T]_\beta$ has the block-triangular shape $\begin{pmatrix} P & O \\ Q & R \end{pmatrix}$, where $O$ is a zero matrix of size $m \times (n-m)$. So we have
$$\det(T) = \det[T]_\beta = \det(P)\,\det(R). \qquad (4)$$
On the other hand, observe that $P$ is the matrix of the restriction map $T|K$ in the basis $\{v_1, \ldots, v_m\}$. So $\det(P) = \det(T|K)$. Observe also that $\bar v_{m+1}, \ldots, \bar v_n$ form a basis of $V/K$, and that
$$\bar T \bar v_i = \sum_{j=m+1}^n r_{ij} \bar v_j, \qquad m+1 \leq i \leq n.$$
So $R$ is the matrix of the induced map $\bar T : V/K \to V/K$ in this basis. So $\det(R) = \det(\bar T)$. Thus (4) yields the desired result.

We now consider the generalized eigenspaces $K_\lambda$ of $A : V \to V$.

Lemma 11.19. If $\lambda_1, \lambda_2$ are distinct eigenvalues of $A$, then $K_{\lambda_1} \cap K_{\lambda_2} = (0)$.

Proof: Suppose the contrary, i.e. there is a nonzero vector $v$ such that
$$(A - \lambda_1 I)^a v = 0 = (A - \lambda_2 I)^b v \qquad (5)$$
for some $a, b > 0$. Let $a$ be the smallest integer such that $(A - \lambda_1 I)^a v = 0$. Then the vector $v_1 = (A - \lambda_1 I)^{a-1} v$ is a nonzero eigenvector of $A$ with eigenvalue $\lambda_1$ by (5). Since the subspace $K_{\lambda_1} \cap K_{\lambda_2}$ is $A$-invariant, and $v$ lies in this subspace, it follows that $v_1$ must also lie in this subspace. Let $b$ be the smallest integer such that $(A - \lambda_2 I)^b v_1 = 0$. Then the vector $v_2 = (A - \lambda_2 I)^{b-1} v_1$ is a nonzero eigenvector of $A$ with eigenvalue $\lambda_2$. But since $A$ commutes with $A - \lambda_2 I$, we have
$$Av_2 = A(A - \lambda_2 I)^{b-1} v_1 = (A - \lambda_2 I)^{b-1} Av_1 = \lambda_1 (A - \lambda_2 I)^{b-1} v_1 = \lambda_1 v_2.$$
This shows that $v_2$ is a nonzero eigenvector of $A$ with two distinct eigenvalues $\lambda_1, \lambda_2$. This is a contradiction.

Lemma 11.20. Let $\lambda_1, \ldots, \lambda_s$ be the distinct eigenvalues of $A : V \to V$. Then $K_{\lambda_1} + \cdots + K_{\lambda_s}$ is a direct sum. That is, if $v_i \in K_{\lambda_i}$ for $i = 1, \ldots, s$, and if $v_1 + \cdots + v_s = 0$, then $v_i = 0$ for all $i$.

Proof: Suppose the contrary, say $v_1 \neq 0$. Since $v_1 \in K_{\lambda_1}$, we have $(A - \lambda_2 I)^p v_1 \neq 0$ for all $p > 0$. For otherwise, we have $0 \neq v_1 \in K_{\lambda_1} \cap K_{\lambda_2}$, contradicting the preceding lemma. Note also that $(A - \lambda_2 I)^p v_1 \in K_{\lambda_1}$ since $K_{\lambda_1}$ is $A$-invariant. Repeating this argument, we find that $(A - \lambda_2 I)^p \cdots (A - \lambda_s I)^p v_1$ is a nonzero vector in $K_{\lambda_1}$ for all $p > 0$.

On the other hand, for any given $i > 1$, we have $(A - \lambda_i I)^p v_i = 0$ for some $p > 0$ because $v_i \in K_{\lambda_i}$. So we can choose a large $p$ so that
$$(A - \lambda_2 I)^p \cdots (A - \lambda_s I)^p (v_2 + \cdots + v_s) = 0.$$
But since $v_1 = -(v_2 + \cdots + v_s)$, the left side is $-(A - \lambda_2 I)^p \cdots (A - \lambda_s I)^p v_1 \neq 0$, a contradiction.

Proposition 11.21. Let $m_\lambda$ be the multiplicity of the eigenvalue $\lambda$ of $A$, i.e. $(t - \lambda)^{m_\lambda}$ is the largest power of the linear function $t - \lambda$ appearing in the factorization of the characteristic polynomial of $A$. Then $\dim K_\lambda = m_\lambda$.

Proof: For $T = A - tI : V \to V$ ($t \in \mathbb{C}$), the characteristic polynomial of $A$ is $\det(T)$. By assumption,
$$\det(T) = (t - \lambda)^{m_\lambda} g(t)$$
for some polynomial function $g(t)$ whose factorization contains no factor $t - \lambda$.

By the second lemma above, the only eigenvalue of the restriction map $A|K_\lambda$ is $\lambda$. It follows that the characteristic polynomial of this restriction map must be $(-1)^p(t - \lambda)^p$, where $p = \dim K_\lambda$. By definition this is also equal to $\det(T|K_\lambda)$. So by the first lemma above, we have
$$(t - \lambda)^{m_\lambda} g(t) = (-1)^p (t - \lambda)^p \det(\bar T).$$
Since $t - \lambda$ does not divide $g(t)$, we must have $p \leq m_\lambda$. It remains to show that $p \geq m_\lambda$.

Suppose on the contrary that $p < m_\lambda$. Then $t - \lambda$ divides $\det(\bar T)$, so $\lambda$ is an eigenvalue of the induced map $\bar A$ on $V/K_\lambda$; that is, there is a nonzero $\bar v$ in $V/K_\lambda$ with $\bar A\bar v = \lambda\bar v$, i.e. $\overline{(A - \lambda I)v} = \bar 0$. This says that $(A - \lambda I)v \in K_\lambda$. By definition of $K_\lambda$, this shows that
$$(A - \lambda I)^r(A - \lambda I)v = 0$$
for some $r > 0$. In turn, this implies that $v \in K_\lambda$, hence $\bar v$ is zero in $V/K_\lambda$, a contradiction. This completes our proof.

Proof of Theorem 11.9: The preceding proposition is the first assertion of the theorem. The third lemma above shows that the subspace $K_{\lambda_1} + \cdots + K_{\lambda_s} \subset V$ is a direct sum. Hence its dimension is $m_{\lambda_1} + \cdots + m_{\lambda_s}$. But the characteristic polynomial of $A$ is
$$(-1)^n (t - \lambda_1)^{m_{\lambda_1}} \cdots (t - \lambda_s)^{m_{\lambda_s}},$$
which must have degree $n = m_{\lambda_1} + \cdots + m_{\lambda_s}$. This shows that the subspace $K_{\lambda_1} + \cdots + K_{\lambda_s} \subset V$ has dimension $n$, hence must be all of $V$. So, we have a direct sum decomposition $V = K_{\lambda_1} + \cdots + K_{\lambda_s}$.

11.6. Appendix B

We shall prove Theorem 11.13 in this section. Throughout this section, we put $T = A - \lambda I$, where $A : K \to K$ is a fixed linear map of a complex vector space $K \neq (0)$ with a single given eigenvalue $\lambda$.

Recall that a $T$-cycle of size $p \geq 1$ in $K$ is a list of vectors
$$J = \{v, Tv, \ldots, T^{p-1}v\} \qquad (6)$$
such that $T^{p-1}v \neq 0$ and $T^p v = 0$.

Lemma 11.22. Let $L$ be a $T$-cyclic subspace of $K$. Then $L \cap \mathrm{Ker}(T)$ is one dimensional.

Proof: By definition, $L$ is spanned by a $T$-cycle (6), which contains the nonzero vector $T^{p-1}v \in \mathrm{Ker}(T)$. So, $L \cap \mathrm{Ker}(T)$ is at least one dimensional. Let $u \in L \cap \mathrm{Ker}(T)$. Then $u$ is a linear combination of $J$, say
$$u = a_0 v + a_1 Tv + \cdots + a_{p-1} T^{p-1} v.$$
Applying $T$ to this, we get
$$0 = Tu = a_0 Tv + \cdots + a_{p-2} T^{p-1} v.$$
By Theorem 11.12, $J$ is linearly independent, and so $a_0 = \cdots = a_{p-2} = 0$. Thus $u = a_{p-1} T^{p-1} v$. This shows that $\{T^{p-1}v\}$ is a basis of $L \cap \mathrm{Ker}(T)$.

Lemma 11.23. Let $L_1, \ldots, L_r$ be $T$-cyclic subspaces in $K$. Pick a nonzero $u_i \in L_i \cap \mathrm{Ker}(T)$ for each $i$. Then $u_1, \ldots, u_r$ are linearly independent iff $L_1 + \cdots + L_r$ is a direct sum.

Proof: We show the if part first. Assume the sum is direct. Consider a linear relation $a_1 u_1 + \cdots + a_r u_r = 0$. Since $a_i u_i \in L_i$ for all $i$, we have $a_i u_i = 0$, because the sum $L_1 + \cdots + L_r$ is direct. Now $a_i = 0$ because $u_i \neq 0$. This shows that $u_1, \ldots, u_r$ are linearly independent.

For the only-if part, we do induction on $\dim K$. If $\dim K = 1$, there is nothing to prove. So, suppose our assertion holds true for up to $\dim K = m - 1$. We shall prove the case $\dim K = m$. Thus, suppose the vectors $u_i \in L_i \cap \mathrm{Ker}(T)$, $i = 1, \ldots, r$, are linearly independent. We shall prove that $L_1 + \cdots + L_r$ is a direct sum. First, if $\dim L_i = 1$ for all $i$, then $L_i = \mathbb{C}u_i$, and $L_1 + \cdots + L_r$ is clearly a direct sum in this case. So, let's assume that $\dim L_1 > 1$.

Consider the projection map $K \to K/\mathbb{C}u_1$, $v \mapsto \bar v$. Since $\mathbb{C}u_1$ is $T$-invariant, $T$ induces a map $\bar T : \bar K \to \bar K$, $\bar v \mapsto \overline{Tv}$. Now, suppose $L$ is a nonzero $T$-cyclic subspace of $K$. We claim that the image $\bar L$ under the projection map is a $\bar T$-cyclic subspace of $\bar K$. First, $L$ is spanned by a $T$-cycle, say (6). Then $\bar L$ is spanned by
$$\bar J = \{\bar v, \bar T\bar v, \ldots, \bar T^{p-1}\bar v\}.$$

If $\bar v = 0$, then $\bar L = (0)$, which is trivially $\bar T$-cyclic. If $\bar v \neq 0$, then there is a smallest positive integer $d \leq p$ such that $\bar T^d \bar v = 0$. In this case, $\bar L$ is spanned by $\{\bar v, \bar T\bar v, \ldots, \bar T^{d-1}\bar v\} \subset \bar J$, which is a $\bar T$-cycle in $\bar K$. In any case, $\bar L$ is $\bar T$-cyclic. So, the image $\bar L_i$ of each $L_i$ is a $\bar T$-cyclic subspace of $\bar K$. Note that $\bar u_2 \in \bar L_2$ is a nonzero vector lying in $\mathrm{Ker}(\bar T)$, because $u_2 \notin \mathbb{C}u_1$ by linear independence and because $\bar T\bar u_2 = \overline{Tu_2} = 0$. Likewise for $i = 2, \ldots, r$, we have
$$0 \neq \bar u_i \in \bar L_i \cap \mathrm{Ker}(\bar T).$$
Also $\bar L_1 \neq (0)$ since $L_1 \not\subset \mathbb{C}u_1$, because $\dim L_1 > 1$. By the preceding lemma, $\bar L_1$ contains a nonzero vector $\bar w \in \mathrm{Ker}(\bar T)$.

We claim that $\bar w, \bar u_2, \ldots, \bar u_r$ are linearly independent. Consider a linear relation $a\bar w + a_2\bar u_2 + \cdots + a_r\bar u_r = 0$ in $\bar K$. So, in $K$, we have
$$aw + a_2 u_2 + \cdots + a_r u_r = a_1 u_1 \qquad (7)$$
for some $a_1 \in \mathbb{C}$. Applying $T$ to this, we get $aTw = 0$, because $u_i \in \mathrm{Ker}(T)$. So either $Tw = 0$ or $a = 0$. Since $\bar w \neq 0$, it follows that $w \notin \mathbb{C}u_1$. But $L_1 \cap \mathrm{Ker}(T) = \mathbb{C}u_1$ by the preceding lemma. So, $w \notin \mathrm{Ker}(T)$, i.e. $Tw \neq 0$. Hence $a = 0$. Now (7) implies that $a_i = 0$ for all $i$, because the $u_i$ are linearly independent.

So, we have shown that $\bar w, \bar u_2, \ldots, \bar u_r \in \mathrm{Ker}(\bar T)$, lying respectively in the $\bar T$-cyclic subspaces $\bar L_1, \ldots, \bar L_r$, are linearly independent vectors in $\bar K$. Since $\dim \bar K = m - 1$, our inductive hypothesis implies that
$$\bar L_1 + \cdots + \bar L_r \qquad (8)$$
is a direct sum in $\bar K$. We now show that $L_1 + \cdots + L_r$ is a direct sum in $K$. Let $v_i \in L_i$ such that $v_1 + \cdots + v_r = 0$. Applying the projection map to this, we get $\bar v_1 + \cdots + \bar v_r = 0$. Since (8) is a direct sum and $\bar v_i \in \bar L_i$, we have $\bar v_i = 0$ for all $i$. In particular for $i = 2$,
$$v_2 = au_1$$
for some $a \in \mathbb{C}$. Since the right side lies in $\mathrm{Ker}(T)$ and the left side lies in $L_2$, both sides lie in $L_2 \cap \mathrm{Ker}(T) = \mathbb{C}u_2$. Thus, we have $v_2 = au_1 \in \mathbb{C}u_2$, hence $a = 0$ and so $v_2 = 0$, because $u_1, u_2$ are linearly independent. Likewise, $v_i = 0$ for $i = 2, \ldots, r$. Finally, $v_1 + \cdots + v_r = 0$ implies that $v_1 = 0$ as well. This proves that $L_1 + \cdots + L_r$ is a direct sum when $\dim K = m$.

To prove Theorem 11.13, the preceding lemma suggests that, starting from linearly independent eigenvectors $u_1, \ldots, u_r \in \mathrm{Ker}(T)$, we should find $T$-cyclic subspaces $L_i \ni u_i$, and then hope to get $K = L_1 + \cdots + L_r$. Unfortunately, done this way, the $T$-cyclic subspaces may fall short of saturating $K$, i.e. we may end up with just $K \supsetneq L_1 + \cdots + L_r$. Here is an example.

Example 11.24.

Let $T : \mathbb{C}^4 \to \mathbb{C}^4$ be the linear map defined by
$$e_1 \mapsto 0, \qquad e_2 \mapsto e_1, \qquad e_3 \mapsto e_2, \qquad e_4 \mapsto 0,$$
where the $e_i$ form the standard basis of $\mathbb{C}^4$. Note that in this basis, the matrix of $T$ is already a Jordan matrix with a single eigenvalue 0 and two Jordan blocks of sizes 3, 1. (Write it down!) The eigenspace is
$$\mathrm{Ker}(T) = \mathbb{C}e_1 + \mathbb{C}e_4.$$
Note that $J_1 = \{e_2 + e_4, e_1\}$ and $J_2 = \{e_4\}$ are $T$-cycles. The $T$-cyclic subspaces $L_1, L_2$ they each span contain the eigenvectors $e_1, e_4$ respectively. There are no more independent eigenvectors we can use to make more $T$-cycles. Yet, the $T$-cyclic subspaces we get from $J_1, J_2$ add up to just 3 dimensions, not enough to give $K = \mathbb{C}^4$. Note that we cannot "extend" either $J_1$ or $J_2$, as it stands, by adding more vectors to the left, because $e_2 + e_4$ and $e_4$ are not in $\mathrm{Im}(T)$. Roughly, the problem is that $J_1$ is not a $T$-cycle that is as large as possible while containing the eigenvector $e_1$. In fact, here is one that is as big as it can be: $\{e_3, e_2, e_1\}$. Note that this new $T$-cycle together with $\{e_4\}$ are enough to give $T$-cyclic subspaces that add up to 4 dimensions. Note also that the two eigenvectors $e_1 \pm e_4$ cannot be extended at all to a larger $T$-cycle.
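The rank bookkeeping of the previous section sidesteps this difficulty entirely. Running it on this example (an illustrative numpy sketch) recovers the block sizes 3 and 1 without having to hunt for well-chosen cycles:

```python
import numpy as np

# The map of Example 11.24: e1 -> 0, e2 -> e1, e3 -> e2, e4 -> 0.
T = np.zeros((4, 4))
T[0, 1] = 1.0   # T e2 = e1
T[1, 2] = 1.0   # T e3 = e2

m = [4 - np.linalg.matrix_rank(np.linalg.matrix_power(T, j)) for j in (1, 2, 3)]
print(m)   # [2, 3, 4]: m_j = dim Ker(T^j) grows until T^3 = 0

# Blocks of size >= j number m_j - m_{j-1}: here 2, 1, 1, i.e. one block
# of size 3 and one of size 1, matching the Jordan matrix described above.
```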

Proof of Theorem 11.13: We shall do induction on $\dim K$. Again, for $\dim K = 1$ there is nothing to prove. Suppose the theorem holds for up to $\dim K = m - 1$. We shall prove the case $\dim K = m$. Since $\mathrm{Ker}(T) \neq 0$, $\dim TK < \dim K$ by the dimension relation. Since $TK = \mathrm{Im}(T)$ is $T$-invariant, we can apply our inductive hypothesis to the restriction map $T|TK$, which we still call $T$. Thus there exist $T$-cyclic subspaces $M_1, \ldots, M_s$ of $TK$, and a direct sum decomposition
$$TK = M_1 + \cdots + M_s. \qquad (9)$$
Consider one of the summands, say $M_1$. Since $M_1$ is $T$-cyclic, there exists $w \in TK$, say $w = Tv$, such that $M_1$ is spanned by a $T$-cycle $\{Tv, \ldots, T^{d-1}v\}$ in $TK$. Adjoining $v$ to this $T$-cycle, we get a new $T$-cycle of size $d$ in $K$. Let $L_1$ be the $T$-cyclic subspace of $K$ spanned by the new $T$-cycle. By Theorem 11.12, $\dim L_1 = \dim M_1 + 1$. (Note that the space $L_1$, though not $\dim L_1$, depends on the choice of $v$.) Repeat this construction for each $M_i$ and get $L_i$ with
$$\dim L_i = \dim M_i + 1.$$
Let $u_1, \ldots, u_s$ be nonzero vectors in the respective one dimensional spaces $L_i \cap \mathrm{Ker}(T)$ (cf. Lemma 11.22). Since (9) is a direct sum, these vectors are linearly independent. So, we can extend $\{u_1, \ldots, u_s\} \subset \mathrm{Ker}(T)$ to a basis $\{u_1, \ldots, u_s, \ldots, u_r\}$ of $\mathrm{Ker}(T)$. We can regard the $\{u_i\}$, for $i = s+1, \ldots, r$, as $T$-cycles in $K$, and let $L_i$ be the one dimensional $T$-cyclic subspace $\mathbb{C}u_i$, for $i = s+1, \ldots, r$. Thus we now have $r$ $T$-cyclic subspaces $L_i \ni u_i$ in $K$ and linearly independent eigenvectors $u_1, \ldots, u_r$ of $T$. By Lemma 11.23,
$$L_1 + \cdots + L_r \subset K$$
is a direct sum. It remains to show that this direct sum has dimension $\dim K$. Its dimension is
$$\sum_{i=1}^s \dim L_i + \sum_{i=s+1}^r \dim L_i = \sum_{i=1}^s (\dim M_i + 1) + (r - s) = \dim TK + s + (r - s) = \dim TK + \dim \mathrm{Ker}(T).$$
But the last expression is equal to $\dim K$ by the dimension relation.

11.7. Homework

7. Prove Schwarz’s inequality for Cn by imitating the case of Rn.

8. An $n \times n$ matrix $A$ is said to be unitary if $AA^\dagger = I$. This is the complex analogue of the notion of an orthogonal matrix. Show that the product of two unitary matrices is unitary and the inverse of a unitary matrix is also unitary. Prove that $n \times n$ unitary matrices are in 1-1 correspondence with (ordered) orthonormal bases of $\mathbb{C}^n$.

9.* By imitating the proof of the spectral theorem for real symmetric matrices, prove that if $A$ is a hermitian matrix, there is a unitary matrix $B$ such that $BAB^\dagger$ is diagonal.

10. Suppose that a linear map $A : V \to V$ has characteristic polynomial $(t - \lambda_1)^3(t - \lambda_2)^3$ where $\lambda_1 \neq \lambda_2$, and that its generalized eigenspaces are
$$K_{\lambda_1} = \mathrm{Ker}(A - \lambda_1 I)^3 \supsetneq \mathrm{Ker}(A - \lambda_1 I)^2, \qquad K_{\lambda_2} = \mathrm{Ker}(A - \lambda_2 I)^2 \supsetneq \mathrm{Ker}(A - \lambda_2 I).$$
Find a Jordan matrix of $A$.

11. Let $f(t) = a_m t^m + \cdots + a_1 t + a_0$ be a polynomial function. If $A$ is an $n \times n$ matrix, we define $f(A)$ to be the matrix
$$f(A) = a_m A^m + \cdots + a_1 A + a_0 I.$$
Show that if $C$ is any invertible matrix, then $f(CAC^{-1}) = Cf(A)C^{-1}$. In particular, if $f(A) = 0$ then $f(CAC^{-1}) = 0$.

12. Show that if $A, B$ are similar matrices, then they have the same characteristic polynomial. (Hint: $\det(PQ) = \det(P)\det(Q)$.)

13. Let $B$ be a Jordan matrix with the $k_i \times k_i$ blocks $B_i$ along the diagonal as given in Definition 11.4. Let $e_1, \ldots, e_n$ be the standard unit (column) vectors in $\mathbb{C}^n$. Show that for each $i = 1, \ldots, p$, we have
$$(B - \lambda_i I)^{k_i} e_j = 0, \qquad k_{i-1} + 1 \leq j \leq k_i, \quad (k_0 = 0).$$
Use this to show that
$$(B - \lambda_1 I)^{k_1} \cdots (B - \lambda_p I)^{k_p} = 0.$$
Now argue that if $B$ is a Jordan canonical form of $A$, then
$$(A - \lambda_1 I)^{k_1} \cdots (A - \lambda_p I)^{k_p} = 0.$$

14. (continuing the preceding problem) Let $f(t)$ be the characteristic polynomial $\det(A - tI)$ of $A$. Prove that
$$f(t) = (-1)^n (t - \lambda_1)^{k_1} \cdots (t - \lambda_p)^{k_p}.$$
(Hint: Find the characteristic polynomial of $B$ first.) Finally, conclude that $f(A) = 0$. This result is known as the Cayley-Hamilton theorem.

15. Minimal polynomial. Let $A$ be a given nonzero $n \times n$ matrix.
a. Prove that there is a lowest degree nonconstant polynomial function $g(t)$ such that $g(A) = 0$. (Hint: What is the dimension of the space of $n \times n$ matrices $M(n, n)$?)
b. Prove that any two such must be a scalar multiple of one another. Thus there is a unique such polynomial $g(t)$ if we further assume that its leading term has the form $t^m$. This unique polynomial, determined by $A$, is called the minimal polynomial of $A$.

Note: It can be shown that if $f(t)$ is any polynomial function such that $f(A) = 0$, then $f(t)$ is divisible by the minimal polynomial $g(t)$, i.e. $f(t) = g(t)h(t)$ for some polynomial function $h(t)$. In particular, if $f(t)$ is the characteristic polynomial of $A$, the Cayley-Hamilton theorem implies that $f(t)$ is divisible by $g(t)$.

In the problems below, let $T : K \to K$ be a linear map of a finite dimensional vector space $K$, and assume that there is a smallest positive integer $q$ such that $K = \mathrm{Ker}(T^q)$.

16. Show that
$$K \supsetneq TK \supsetneq T^2K \supsetneq \cdots \supsetneq T^qK = (0).$$
A nonzero vector $u \in K$ is said to have depth $d$ if $d$ is the largest positive integer such that $u \in T^{d-1}K$. If $u$ is a nonzero vector in $\mathrm{Ker}(T)$ and has depth $d$, show that there is a $T$-cyclic subspace $L$ of $K$ of dimension $d$ such that $u \in L$.

17. (cf. Appendix B.) Show that if $L$ is a $T$-cyclic subspace of $K$, then $L \cap \mathrm{Ker}(T)$ is one dimensional.

18. A $T$-cyclic subspace $L$ of $K$ is called maximal if $\dim L = \mathrm{depth}(u)$ for a nonzero vector $u$ in $L \cap \mathrm{Ker}(T)$. Show that every nonzero vector $u$ in $\mathrm{Ker}(T)$ is contained in a maximal $T$-cyclic subspace of $K$.

19. Let $L_1, \ldots, L_r$ be $T$-cyclic subspaces of $K$ such that $L_1 + \cdots + L_r$ is a direct sum. If $K = L_1 + \cdots + L_r$, show that each $L_i$ is maximal. (Hint: Assume $L_1$ is not maximal. Use the preceding problem to construct another direct sum bigger than $K$.)

20. (cf. Appendix A.) Let $L$ be a $T$-invariant subspace of $K$. Consider the quotient space $\bar K = K/L$ and the projection map $K \to \bar K$, $v \mapsto \bar v$. Show that the map $\bar T : \bar K \to \bar K$, $\bar v \mapsto \overline{Tv}$ is well-defined, and that $\bar T^q$ coincides with the zero map.

21. Let $u$ be a nonzero vector with $\mathrm{depth}(u) = q$. Show that $u \in \mathrm{Ker}(T)$, hence $L = \mathbb{C}u$ is $T$-invariant. Consider the quotient space $\bar K = K/L$ and the map $\bar T : \bar K \to \bar K$. Show that if $w$ is a nonzero vector in $\mathrm{Ker}(T)$ and $\bar w$ is nonzero, then $\mathrm{depth}(\bar w) = \mathrm{depth}(w)$.

22.* Let $T$ be the set of all diagonal invertible $n \times n$ complex matrices, and $\mathbb{C}^\times$ be the set of nonzero complex numbers. Let $f : T \to \mathbb{C}^\times$ be a continuous map such that $f(AB) = f(A)f(B)$ and $f(A^{-1}) = f(A)^{-1}$ for all $A, B \in T$. For continuity, $f$ is thought of as a function of $n$ variables (the diagonal entries). Prove that $f(A) = \det(A)^k$ for some integer $k$. (Hint: First consider the case $n = 1$, so that $T = \mathbb{C}^\times$. Restrict the map $f$ to the unit circle $S \subset T$, consisting of complex numbers $a$ with $|a| = 1$. Show that $f : S \to \mathbb{C}^\times$ must be of the form $f(a) = a^k$.)

23.* Let $G$ be the set of all invertible $n \times n$ complex matrices and $f : G \to \mathbb{C}^\times$ be a continuous map such that $f(AB) = f(A)f(B)$ and $f(A^{-1}) = f(A)^{-1}$ for all $A, B \in G$. Prove that $f(A) = \det(A)^k$ for some integer $k$. (Hint: Do the preceding problem first. Then restrict $f$ to $T \subset G$. Observe that diagonalizable elements of $G$ form a dense subset of $G$ and that $f(A) = \det(A)^k$ holds for diagonalizable $A$.)