
Lecture Notes on Linear and Multilinear Algebra 2301-610

Wicharn Lewkeeratiyutkul, Department of Mathematics, Faculty of Science, Chulalongkorn University, August 2014

Contents

Preface

1 Vector Spaces
1.1 Vector Spaces and Subspaces
1.2 Basis and Dimension
1.3 Linear Maps
1.4 Matrix Representation
1.5 Change of Bases
1.6 Sums and Direct Sums
1.7 Quotient Spaces
1.8 Dual Spaces

2 Multilinear Algebra
2.1 Free Vector Spaces
2.2 Multilinear Maps and Tensor Products
2.3 …
2.4 Exterior Products

3 Canonical Forms
3.1 Polynomials
3.2 Diagonalization
3.3 Minimal Polynomial
3.4 Jordan Canonical Forms

4 Inner Product Spaces
4.1 Bilinear and Sesquilinear Forms
4.2 Inner Product Spaces
4.3 Operators on Inner Product Spaces
4.4 Spectral Theorem

Bibliography

Preface

This book grew out of the lecture notes for the course 2301-610 Linear and Multilinear Algebra given at the Department of Mathematics, Faculty of Science, Chulalongkorn University, which I have taught for the past 5 years. Linear algebra is one of the most important subjects in Mathematics, with numerous applications in pure and applied sciences. A more theoretical linear algebra course will emphasize linear maps between vector spaces, while an applications-oriented course will mainly work with matrices. Matrices have the advantage of being easier to compute with, while it is easier to establish the results by working with linear maps. This book tries to establish a close connection between the two aspects of the subject. I would like to thank my students who took this course with me and proofread the early drafts. Special thanks go to Chao Kusollerschariya, who provided technical help with LaTeX and suggested several easier proofs, and Detchat Samart, who supplied the proofs on polynomials. Please do not distribute.

Wicharn Lewkeeratiyutkul


Chapter 1

Vector Spaces

In this chapter, we will study an abstract theory of vector spaces and linear maps between vector spaces. A vector space is a generalization of the space of vectors in the 2- or 3-dimensional Euclidean space. We can add two vectors and multiply a vector by a scalar. In a general framework, we still can add vectors, but the scalars don't have to be numbers; they are only required to satisfy some algebraic properties which constitute a field. A vector space is defined to be a non-empty set that satisfies certain axioms that generalize the addition and scalar multiplication of vectors in R^2 and R^3. This will allow our theory to be applicable to a wide range of situations. Once we have some vector spaces, we can construct new vector spaces from existing ones by taking subspaces, direct sums and quotient spaces. We then introduce a basis for a vector space, which can be regarded as choosing a coordinate system. Once we fix a basis for the vector space, every other element can be written uniquely as a linear combination of elements in the basis. We also study a linear map between vector spaces. It is a function that preserves the vector space operations. If we fix bases for vector spaces V and W , a linear map from V into W can be represented by a matrix. This will allow the computational aspect of the theory. The set of all linear maps between two vector spaces is a vector space itself. The case when the target space is the scalar field is of particular importance; it is called the dual space of the vector space.


1.1 Vector Spaces and Subspaces

Definition 1.1.1. A field is a set F with two binary operations, + and ·, and two distinct elements 0 and 1, satisfying the following properties:

(i) ∀x, y, z ∈ F ,(x + y) + z = x + (y + z);

(ii) ∀x ∈ F , x + 0 = 0 + x = x;

(iii) ∀x ∈ F ∃y ∈ F , x + y = y + x = 0;

(iv) ∀x, y ∈ F , x + y = y + x;

(v) ∀x, y, z ∈ F ,(x · y) · z = x · (y · z);

(vi) ∀x ∈ F , x · 1 = 1 · x = x;

(vii) ∀x ∈ F − {0} ∃y ∈ F , x · y = y · x = 1;

(viii) ∀x, y ∈ F , x · y = y · x;

(ix) ∀x, y, z ∈ F , x · (y + z) = x · y + x · z.

Properties (i)-(iv) say that (F, +) is an abelian group. Properties (v)-(viii) say that (F − {0}, ·) is an abelian group. Property (ix) is the distributive law for the multiplication over addition.

Example 1.1.2. Q, R, C, Zp, where p is a prime number, are fields. Definition 1.1.3. A vector space over a field F is a non-empty set V , together with an addition + : V × V → V and a scalar multiplication · : F × V → V , satisfying the following properties:

(i) ∀u, v, w ∈ V ,(u + v) + w = u + (v + w);

(ii) ∃0¯ ∈ V ∀v ∈ V , v + 0¯ = 0¯ + v = v;

(iii) ∀v ∈ V ∃ − v ∈ V , v + (−v) = (−v) + v = 0;¯

(iv) ∀u, v ∈ V , u + v = v + u;

(v) ∀m, n ∈ F ∀v ∈ V , (m + n) · v = m · v + n · v;

(vi) ∀m ∈ F ∀u, v ∈ V , m · (u + v) = m · u + m · v;

(vii) ∀m, n ∈ F ∀v ∈ V ,(m · n) · v = m · (n · v);

(viii) ∀v ∈ V , 1 · v = v.

Proposition 1.1.4. Let V be a vector space over a field F . Then

(i) ∀v ∈ V , 0 · v = 0¯;

(ii) ∀k ∈ F , k · 0¯ = 0¯;

(iii) ∀v ∈ V ∀k ∈ F , k · v = 0¯ ⇔ k = 0 or v = 0¯;

(iv) ∀v ∈ V , (−1) · v = −v.

Proof. Let v ∈ V and k ∈ F . Then

(i) 0 · v + v = 0 · v + 1 · v = (0 + 1) · v = 1 · v = v. Hence 0 · v = 0¯.

(ii) k · 0¯ = k · (0 · 0¯) = (k · 0) · 0¯ = 0 · 0¯ = 0¯.

(iii) If k · v = 0¯ and k ≠ 0, then

v = 1 · v = ((1/k) · k) · v = (1/k)(k · v) = (1/k) · 0¯ = 0¯.

(iv) (−1) · v + v = (−1) · v + 1 · v = (−1 + 1) · v = 0 · v = 0¯. Hence (−1) · v = −v.

Remark. When there is no confusion, we will denote the additive identity 0¯ simply by 0.

Example 1.1.5. The following sets with the given addition and scalar multipli- cation are vector spaces.

(i) The set of n-tuples whose entries are in F :

F^n = {(x1, x2, . . . , xn) | xi ∈ F, i = 1, 2, . . . , n},

with the addition and scalar multiplication given by

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

k(x1, . . . , xn) = (kx1, . . . , kxn).

(ii) The set of m × n matrices whose entries are in F :

Mm×n(F ) = {[aij]m×n | aij ∈ F, i = 1, 2, . . . , m; j = 1, 2, . . . , n} ,

with the usual matrix addition and scalar multiplication. Note that if

m = n, we write Mn(F ) for Mn×n(F ).

(iii) The set of polynomials over F :

F [x] = {a0 + a1x + ··· + anx^n | n ∈ N ∪ {0}, ai ∈ F, i = 0, 1, . . . , n},

with the usual polynomial addition and scalar multiplication.

(iv) The set of sequences over F :

S = {(xn) | xn ∈ F for all n ∈ N},

with the addition and scalar multiplication given by

(xn) + (yn) = (xn + yn),

k(xn) = (kxn).

Here we are not concerned with convergence of the sequences.

(v) Let X be a non-empty set. The set of F -valued functions on X

F(X) = {f : X → F }

with the following addition and scalar multiplication:

(f + g)(x) = f(x) + g(x)(x ∈ X), (kf)(x) = kf(x)(x ∈ X).

Once we have some vector spaces to begin with, there are several methods to construct new vector spaces from the old ones. We first consider a subset which is also a vector space under the same operations.

Definition 1.1.6. Let V be a vector space over a field F . A subspace of V is a subset of V which is also a vector space over F under the same operations. We write W ≤ V to denote that W is a subspace of V .

Proposition 1.1.7. Let W be a non-empty subset of a vector space V over a field F . Then the following statements are equivalent:

(i) W is a subspace of V ;

(ii) ∀v ∈ W ∀w ∈ W , v + w ∈ W and ∀v ∈ W ∀k ∈ F, kv ∈ W ;

(iii) ∀v ∈ W ∀w ∈ W ∀α ∈ F ∀β ∈ F , αv + βw ∈ W .

Proof. We will establish (i) ⇔ (ii) and (ii) ⇔ (iii). (i) ⇒ (ii). Assume W is a subspace of V . Then W is a vector space over a field F under the restriction of the addition and the scalar multiplication to W . Hence v + w ∈ W and kv ∈ W for any v, w ∈ W and any k ∈ F . (ii) ⇒ (i). Assume (ii) holds. Since the axioms of a vector space hold for all elements in V , they also hold for elements in W as well. Since W is non-empty, we can choose an element v ∈ W . Then 0¯ = 0 · v ∈ W . Moreover, for any v ∈ W , −v = (−1) · v ∈ W . This shows that W contains the additive identity and the additive inverse of each element. (ii) ⇒ (iii). Let v, w ∈ W and α, β ∈ F . Then αv ∈ W and βw ∈ W , which implies αv + βw ∈ W . (iii) ⇒ (ii). Assume (iii) holds. Then for any v, w ∈ W , v +w = 1·v +1·w ∈ W and for any v ∈ W and any k ∈ F , kv = k · v + 0 · v ∈ W .

Example 1.1.8.

(i) {0} and V are subspaces of a vector space V .

(ii) [0, 1] and [0, ∞) are not subspaces of R.

(iii) Let F be a field and define

(a) W1 = the set of upper triangular matrices in Mn(F ),

(b) W2 = the set of lower triangular matrices in Mn(F ), and

(c) W3 = the set of nonsingular matrices in Mn(F ).

Then W1 and W2 are subspaces of Mn(F ), but W3 is not a subspace of

Mn(F ).

(iv) Let Pn = {p ∈ F [x] : deg p ≤ n} ∪ {0}. Then Pn is a subspace of F [x], but the set {p ∈ F [x] : deg p = n} ∪ {0} is not a subspace of F [x].

(v) By Example 1.1.5 (v), the set of real-valued functions F([a, b]) is a vector space over R. Now let

C([a, b]) = {f :[a, b] → R | f is continuous}.

Then C([a, b]) is a subspace of F([a, b]). This follows from the standard facts that a sum of two continuous functions is still continuous and that a scalar multiple of a continuous function is also continuous.

(vi) Let S be the sequence space in Example 1.1.5 (iv). The following subsets are subspaces of S:

ℓ^1 = {(xn) ∈ S : ∑_{n=1}^∞ |xn| < ∞},

ℓ^∞ = {(xn) ∈ S : sup_{n∈N} |xn| < ∞},

c = {(xn) ∈ S : lim_{n→∞} xn exists}.

These subspaces play an important role and will be studied in greater detail in Functional Analysis.

Proposition 1.1.9. Let V be a vector space and suppose Wα is a subspace of V for each α ∈ Λ. Then ⋂_{α∈Λ} Wα is a subspace of V .

Proof. Since 0 ∈ Wα for each α ∈ Λ, we have 0 ∈ ⋂_{α∈Λ} Wα. Thus ⋂_{α∈Λ} Wα ≠ ∅. Next, let w1, w2 ∈ ⋂_{α∈Λ} Wα and k1, k2 ∈ F . Then w1, w2 ∈ Wα for each α ∈ Λ. Hence k1w1 + k2w2 ∈ Wα for each α ∈ Λ, i.e. k1w1 + k2w2 ∈ ⋂_{α∈Λ} Wα.

Proposition 1.1.10. Let S be a subset of a vector space V . Then there is the smallest subspace of V containing S.

Proof. Define T = {W ≤ V | S ⊆ W }. Then T ≠ ∅ because V ∈ T . Let U∗ = ⋂_{W∈T} W . Then U∗ is a subspace of V containing S. If W is a subspace of V containing S, then W ∈ T , which implies U∗ = ⋂_{W′∈T} W′ ⊆ W . This shows that U∗ is the smallest subspace of V containing S.

Definition 1.1.11. Let S be a subset of a vector space V . Then we call the smallest subspace containing S the subspace of V generated by S or the subspace of V spanned by S, or simply the span of S, denoted by span S or hSi. If hSi = V , we say that V is spanned by S or S spans V .

Proposition 1.1.12. Let S and T be subsets of a vector space V . Then

(i) h∅i = {0};

(ii) S ⊆ hSi;

(iii) S ⊆ T ⇒ hSi ⊆ hT i;

(iv) hhSii = hSi.

Proof. (i) Clearly, {0} is the smallest subspace of V containing ∅. (ii) This follows from the definition of hSi.

(iii) Assume S ⊆ T . Since T ⊆ hT i, we have S ⊆ hT i. Then hT i is a subspace of V containing S. But hSi is the smallest subspace of V containing S. Hence hSi ⊆ hT i.

(iv) Since S ⊆ hSi, by (iii), hSi ⊆ hhSii. On the other hand, hhSii is the smallest subspace of V containing hSi. But then hSi is a subspace of V containing hSi itself. It implies that hhSii ⊆ hSi. Hence hhSii = hSi.

Definition 1.1.13. Let v1, . . . , vn ∈ V and k1, . . . , kn ∈ F . Then the element

k1v1 + ··· + knvn is called a linear combination of v1, . . . , vn with coefficients

k1, . . . , kn, respectively.

Theorem 1.1.14. If S ⊆ V , then hSi = the set of linear combinations of ele- ments in S.

Proof. Let W be the set of linear combinations of elements in S. For any s ∈ S, s is a linear combination of an element in S, namely s = 1 · s. Hence S ⊆ W . Let

v, w ∈ W and k, ` ∈ F . Then there exist v1, . . . , vn, w1, . . . , wm in S and scalars α1, . . . , αn, β1, . . . , βm, for some m, n ∈ N, such that

v = α1v1 + ··· + αnvn and w = β1w1 + ··· + βmwm.

It follows that

kv + `w = (kα1)v1 + ··· + (kαn)vn + (`β1)w1 + ··· + (`βm)wm.

Thus kv + `w is a linear combination of elements in S. This shows that W is a subspace of V containing S. Hence hSi ⊆ W . On the other hand, let v ∈ W . Then there exist v1, . . . , vn ∈ S and α1, . . . , αn ∈ F , for some n ∈ N, such that v = α1v1 + ··· + αnvn. Since each vi ∈ S ⊆ hSi and hSi is a subspace of V , we have v = ∑_{i=1}^n αi vi ∈ hSi. Hence W ⊆ hSi. We now conclude that hSi = W .
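Theorem 1.1.14 reduces the question of whether a given vector lies in hSi to solving a linear system for the coefficients. The following sketch (in Python with sympy; the vectors are an illustrative choice of ours, not from the text) checks span membership in this way.

```python
# Illustrative only: decide whether w lies in span{v1, v2} over Q by solving
# a*v1 + b*v2 = w for the coefficients a, b with sympy.
from sympy import Matrix, linsolve, symbols

v1 = Matrix([1, 2, 0])
v2 = Matrix([0, 1, 1])
w = Matrix([2, 5, 1])

a, b = symbols('a b')
solutions = linsolve((Matrix.hstack(v1, v2), w), [a, b])
print(solutions)   # {(2, 1)}: nonempty, so w = 2*v1 + 1*v2 is in the span
```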

Example 1.1.15. (i) Let V = F n, where F is a field. Let

e1 = (1, 0,..., 0), e2 = (0, 1,..., 0), . . . , en = (0, 0,..., 1).

n n Then span {e1, e2, . . . , en} = F . Indeed, any element (x1, . . . , xn) ∈ F

can be written as a linear combination x1e1 + ··· + xnen.

(ii) Let V = Mm×n(F ). For i = 1, . . . , m and j = 1, . . . , n, let Eij be the m × n

matrix whose (i, j)-entry is 1 and 0 otherwise. Then Mm×n(F ) is spanned

by {Eij | i = 1, . . . , m, j = 1, . . . , n}. For, if A = [aij] ∈ Mm×n(F ), then

A = ∑_{i=1}^m ∑_{j=1}^n aij Eij.

(iii) The set of monomials {1, x, x2, x3,... } spans the vector space F [x] because any polynomial in F [x] can be written as a linear combination of monomials.

(iv) Let S = {(xn) | xn ∈ F for all n ∈ N} be the vector space of sequences in F . For each k ∈ N, let

ek = (0,..., 0, 1, 0, 0,... )

where ek has 1 in the k-th coordinate and 0's elsewhere. Then {ek}_{k=1}^∞ does not span S. For example, the sequence (1, 1, 1, . . . ) cannot be written

as a linear combination of ek’s. In fact,

span {ek}_{k=1}^∞ = {(xn) ∈ S | xn = 0 for all but finitely many n's}.

We leave this as an exercise.

Exercises

1.1.1. Determine which of the following subsets of R^4 are subspaces of R^4.

(i) U = {(a, b, c, d) | a + 2b = c + 2d}.

(ii) V = {(a, b, c, d) | a + 2b = c + 2d + 1}.

(iii) W = {(a, b, c, d) | ab = cd}. 1.1.2. Let A be an m × n matrix.

(i) Show that {b ∈ R^m | Ax = b for some x ∈ R^n} is a subspace of R^m.

(ii) Show that {x ∈ R^n | Ax = 0} is a subspace of R^n.

(iii) Let b ≠ 0 be an element in R^m. Verify whether {x ∈ R^n | Ax = b} is a subspace of R^n.

1.1.3. Verify whether the following sets are subspaces of Mn(R).

(i) {A ∈ Mn(R) | det A = 0};

(ii) {A ∈ Mn(R) | A^t = A};

(iii) {A ∈ Mn(R) | A^t = −A}.

1.1.4. Let W1 and W2 be subspaces of a vector space V . Prove that W1 ∪ W2 is

a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1. 1.1.5. Let V be a vector space over an infinite field F . Prove that V cannot be written as a finite union of its proper subspaces.

1.1.6. Let S = {(xn) | xn ∈ F for all n ∈ N}. Define

f = {(xn) ∈ S | xn = 0 for all but finitely many n’s}.

Prove that f = {(xn) ∈ S | ∃N ∈ N ∀n ≥ N, xn = 0} and that f is a subspace of S spanned by {ek}_{k=1}^∞, where ek is defined in Example 1.1.15 (iv).

1.1.7. An abelian group hV, +i is called divisible if for any non-zero integer n, nV = V , i.e. if for every u ∈ V and for any non-zero integer n, there exists v ∈ V such that u = nv. Prove that an abelian group hV, +i is a vector space over Q if and only if V is divisible and all of its non-zero elements are of infinite order.

1.2 Basis and Dimension

Definition 1.2.1. Let V be a vector space over a field F and S a subset of V .

We say that S is linearly dependent if there exist distinct elements v1, . . . , vn ∈ S and scalars k1, . . . , kn ∈ F , not all zero, such that k1v1 + ··· + knvn = 0. We say that S is linearly independent if S is not linearly dependent. In other words, S is linearly independent if and only if for any distinct elements v1, . . . , vn ∈ S and any k1, . . . , kn ∈ F ,

k1v1 + ··· + knvn = 0 ⇒ k1 = ··· = kn = 0.

Remark.

(i) ∅ is linearly independent. (ii) If 0 ∈ S, then S is linearly dependent.

(iii) If S ⊆ T and T is linearly independent, then S is linearly independent.

Example 1.2.2.

n (i) Let V = F , where F is a field. Let e1, . . . , en be the coordinate vectors

defined in Example 1.1.15 (i). Then {e1, . . . , en} is linearly independent.

To see this, let α1, . . . , αn ∈ F be such that α1e1 + ··· + αnen = 0. But then

(0,..., 0) = α1e1 + ··· + αnen = (α1, . . . , αn).

Hence α1 = ··· = αn = 0.

(ii) Let V = Mm×n(F ). For i = 1, . . . , m and j = 1, . . . , n, let Eij be defined as

in Example 1.1.15 (ii). Then {Eij | i = 1, . . . , m, j = 1, . . . , n} is linearly independent.

(iii) Let V = F [x]. Then the set {1, x, x^2, x^3, . . . } is linearly independent. This follows from the fact that a polynomial a0 + a1x + ··· + anx^n = 0 if and

only if ai = 0 for i = 0, 1, . . . , n.

(iv) Let S be the vector space of sequences in F defined in Example 1.1.15 (iv). For each k ∈ N, let ek be the k-th coordinate sequence. Then {ek}_{k=1}^∞ is linearly independent in S. We leave it to the reader to verify this fact.

(v) Let V = C([0, 1]), the space of continuous real-valued functions defined on [0, 1]. Let f(x) = 2^x and g(x) = 3^x for any x ∈ [0, 1]. Then {f, g} is linearly independent in C([0, 1]). Indeed, let α, β ∈ R be such that αf + βg = 0. Then α 2^x + β 3^x = 0 for any x ∈ [0, 1]. If x = 0, α + β = 0. If x = 1, 2α + 3β = 0. Solving these equations, we obtain α = β = 0.

(vi) If V is a vector space over fields F1 and F2 and S ⊆ V , then it is possible that a subset S of V is linearly independent over F1, but linearly dependent over F2. For example, let V = R, F1 = Q, F2 = R and S = {1, √2}. Then S is linearly dependent over R: (−√2) · 1 + 1 · √2 = 0. On the other hand, suppose α, β ∈ Q are such that α · 1 + β · √2 = 0. If β ≠ 0, then α/β = −√2, which is a contradiction. Hence β = 0, which implies α = 0. This shows that S is linearly independent over Q.

Theorem 1.2.3. Let S be a linearly independent subset of a vector space V . Then ∀v ∈ V − S, v ∉ hSi ⇔ S ∪ {v} is linearly independent.

Proof. Let v ∈ V − S be such that v∈ / hSi. To show that S ∪ {v} is linearly independent, let v1, . . . , vn ∈ S and k1, . . . , kn, k ∈ F be such that

k1v1 + k2v2 + ··· + knvn + kv = 0.

Then kv = −k1v1 − k2v2 − · · · − knvn. If k ≠ 0, we have

v = −(k1/k)v1 − (k2/k)v2 − · · · − (kn/k)vn ∈ hSi,

which is a contradiction. It follows that k = 0 and that k1v1 + ··· + knvn = 0.

By linear independence of S, we also have k1 = ··· = kn = 0. Hence S ∪ {v} is linearly independent. On the other hand, let v ∈ V be such that v ∈ hSi and v ∉ S. Then there exist v1, . . . , vn ∈ S and k1, . . . , kn ∈ F such that v = k1v1 + ··· + knvn. Hence k1v1 + ··· + knvn + (−1)v = 0. Thus S ∪ {v} is linearly dependent.

Definition 1.2.4. A subset S of a vector space V is called a basis for V if

(i) S spans V, and

(ii) S is linearly independent.

Example 1.2.5.

(i) The following set of coordinate vectors is a basis for F n

{(1, 0,..., 0), (0, 1,..., 0),..., (0, 0,..., 1)}.

It is called the standard basis for F n.

(ii) For i = 1, . . . , m and j = 1, . . . , n, let Eij be an m × n matrix whose (i, j)-

entry is 1 and 0 otherwise. Then {Eij | i = 1, . . . , m, j = 1, . . . , n} is a basis for Mm×n(F ).

(iii) The set of monomials {1, x, x^2, x^3, . . . } is a basis for F [x].

Theorem 1.2.6. Let B be a basis for a vector space V . Then any element in V can be written uniquely as a linear combination of elements in B.

Proof. Since B spans V , any element in V can be written as a linear combination of elements in B. We have to show that it can be written in a unique way. Let v ∈ V . Assume that

v = ∑_{i=1}^n αi vi = ∑_{j=1}^m βj wj,

where αi, βj ∈ F and vi, wj ∈ B for i = 1, . . . , n and j = 1, . . . , m, for some m, n ∈ N. Without loss of generality, assume that vi = wi for i = 1, . . . , k and {vk+1, . . . , vn} ∩ {wk+1, . . . , wm} = ∅. Then we have

∑_{i=1}^k (αi − βi)vi + ∑_{i=k+1}^n αi vi + ∑_{j=k+1}^m (−βj)wj = 0.

By linear independence of {v1, . . . , vk, vk+1, . . . , vn, wk+1, . . . , wm} ⊆ B, we have

αi − βi = 0 for i = 1, . . . , k;

αi = 0 for i = k + 1, . . . , n;

−βj = 0 for j = k + 1, . . . , m.

Hence m = n = k and v is written uniquely as the linear combination ∑_{i=1}^n αi vi.

Next, we give an alternative definition of a basis for a vector space.

Theorem 1.2.7. Let B be a subset of a vector space V . Then B is a basis for V if and only if B is a maximal linearly independent subset of V .

Proof. Let B be a basis for V and let C be a linearly independent subset of V such that B ⊆ C. Assume that B ≠ C. Then there is an element v ∈ C such that v ∉ B. Hence B ∪ {v} is still linearly independent, being a subset of C. By Theorem 1.2.3, v ∉ hBi, which is a contradiction since hBi = V . Hence B = C. Conversely, let B be a maximal linearly independent subset of V . Suppose that hBi ≠ V . Then there exists v ∈ V such that v ∉ hBi. By Theorem 1.2.3 again, B ∪ {v} is linearly independent, contradicting the assumption. Hence B spans V . It follows that B is a basis for V .

Next we show that every vector space has a basis. The proof of this requires the Axiom of Choice in an equivalent form of Zorn’s Lemma which we recall first.

Theorem 1.2.8 (Zorn’s Lemma). If S is a partially ordered set in which every totally ordered subset has an upper bound, then S has a maximal element.

Theorem 1.2.9. In a vector space, every linearly independent set can be extended to a basis. In particular, every vector space has a basis.

Proof. The second statement follows from the first one by noting that the empty set is a linearly independent set and thus can be extended to a basis. To prove the first statement, let So be a linearly independent set in a vector space V . Set

E = { S ⊆ V | S is linearly independent and So ⊆ S }.

Then E is partially ordered by inclusion. Let E′ = {Sα}α∈Λ be a totally ordered subset of E. Let T = ⋃_{α∈Λ} Sα. Clearly, So ⊆ T ⊆ V . To establish linear independence of T , let v1, . . . , vn ∈ T and k1, . . . , kn ∈ F be such that ∑_{i=1}^n ki vi = 0. Since each vi belongs to some Sαi and E′ is a totally ordered set, there must be Sβ in E′ such that vi ∈ Sβ for i = 1, . . . , n. Since Sβ is linearly independent, we have ki = 0 for i = 1, . . . , n. This shows that T ∈ E and T is an upper bound of E′.

By Zorn's Lemma, E has a maximal element S∗. Thus S∗ is a linearly independent set containing So. If there is v ∉ hS∗i, by Theorem 1.2.3, S∗ ∪ {v} is a linearly independent set containing So. This contradicts the maximality of S∗. Hence hS∗i = V , which implies that S∗ is a basis for V .

The proof of the existence of a basis for a vector space above relies on Zorn's Lemma, which is equivalent to the Axiom of Choice. Any proof that requires the Axiom of Choice is nonconstructive: it gives the existence without explaining how to find such an object. In this situation, it implies that every vector space contains a basis, but it does not tell us how to construct one. If the vector space is finitely generated, i.e. spanned by a finite set, then we can construct a basis from the spanning set. In general, we know that a vector space has a basis but we may not be able to give one such example. For example, consider the vector space S = {(xn) | xn ∈ F for all n ∈ N}. We have seen that the set of coordinate sequences {ek}_{k=1}^∞ is a linearly independent subset of S and hence can be extended to a basis for S, but we do not have an explicit description of such a basis.

A basis for a vector space is not unique. However, any two bases for the same vector space have the same cardinality. We begin by proving the following theorem.

Theorem 1.2.10. For any vector space V , if V has a spanning set with n ele- ments, then any subset of V with more than n elements is linearly dependent.

Proof. We will prove the statement by induction on n. Case n = 1: Assume that V = span {v}. Let R be a subset of V with at least two elements. Choose x, y ∈ R with x ≠ y. Then there exist α, β ∈ F such that x = αv and y = βv. Hence βx − αy = 0. Since x ≠ y, α and β cannot both be zero. This shows that R is linearly dependent. Assume that the statement holds for n − 1. Let V be a vector space with a spanning set S = {v1, . . . , vn}. Let R = {x1, . . . , xm} be a subset of V where m > n. Each xi ∈ R can be written as a linear combination of v1, . . . , vn:

xi = ∑_{j=1}^n aij vj,  i = 1, 2, . . . , m. (1.1)

We examine the scalars ai1 that multiply v1 and split the proof into two cases.

Case 1: ai1 = 0 for i = 1, . . . , m. In this case, the sums in (1.1) do not involve v1. Let W = span{v2, . . . , vn}. Then W is spanned by a set with n − 1 elements, R ⊆ W and |R| = m > n > n − 1. It follows that R is linearly dependent. 1.2. BASIS AND DIMENSION 15

Case 2: ai1 ≠ 0 for some i. By renumbering if necessary, we assume that a11 ≠ 0. Consider i = 1 in (1.1):

x1 = ∑_{j=1}^n a1j vj.

Multiplying both sides by ci = ai1/a11, for i = 2, . . . , m, we have

ci x1 = ai1 v1 + ∑_{j=2}^n ci a1j vj,  i = 2, . . . , m. (1.2)

Subtract (1.1) from (1.2):

ci x1 − xi = ∑_{j=2}^n (ci a1j − aij) vj,  i = 2, . . . , m.

Let W = span{v2, . . . , vn} and R′ = {ci x1 − xi : i = 2, . . . , m}. We see that R′ ⊆ W and |R′| = m − 1 > n − 1. By the induction hypothesis, R′ is linearly

dependent. Hence there exist α2, . . . , αm ∈ F , not all zero, such that

(∑_{i=2}^m αi ci) x1 − ∑_{i=2}^m αi xi = ∑_{i=2}^m αi (ci x1 − xi) = 0.

Since not all of the coefficients −α2, . . . , −αm are zero, this implies that R = {x1, . . . , xm} is linearly dependent.

Corollary 1.2.11. If V has finite bases B and C, then |B| = |C|.

Proof. From the above theorem, if B spans V and C is linearly independent, then |C| ≤ |B|. By reversing the roles of B and C, we have |B| ≤ |C|. Hence |B| = |C|.

In fact, the above Corollary is true if V has infinite bases too, but the proof requires arguments involving infinite cardinal numbers, which is beyond the scope of this book, so we state it as a fact below and omit the proof.

Theorem 1.2.12. All bases for a vector space have the same cardinality.

Definition 1.2.13. Let V be a vector space over a field F . The cardinality of a basis for V is called the dimension of V , denoted by dim V . If dim V < ∞ (i.e. V has a finite basis), we say that V is finite-dimensional. Otherwise, we say that V is infinite-dimensional.

Example 1.2.14.

(i) dim({0}) = 0.

(ii) dim F n = n.

(iii) dim(Mm×n(F )) = mn.

(iv) dim(F [x]) = ∞.

Proposition 1.2.15. Let V be a vector space. If W is a subspace of V , then dim W ≤ dim V .

Proof. Let B be a basis for W . Then B is a linearly independent subset of V , and hence can be extended to a basis C for V . Thus dim W = |B| ≤ |C| = dim V .

Corollary 1.2.16. Let V be a finite-dimensional vector space. If B is a linearly independent subset of V such that |B| = dim V , then B is a basis for V .

Proof. Let B be a linearly independent subset of V such that |B| = dim V . Suppose B is not a basis for V . Then B can be extended to a basis C for V with B ⊊ C. Thus |C| = dim V = |B|, which contradicts B ⊊ C.

Corollary 1.2.17. Let V be a finite-dimensional vector space and W a subspace of V . If dim W = dim V , then W = V .

Proof. Let B be a basis for W . Then |B| = dim W = dim V . By Corollary 1.2.16, B is a basis for V . Hence W = hBi = V .

Exercises

1.2.1. Prove that {1, √2, √3} is linearly independent over Q, but linearly dependent over R.

1.2.2. Prove that {sin x, cos x} is a linearly independent subset of C([0, π]).

1.2.3. Prove that R is an infinite-dimensional vector space over Q.

1.2.4. If {u, v, w} is a basis for a vector space V , show that {u + v, v + w, w + u} is a basis for V .

1.2.5. Let A and B be linearly independent subsets of a vector space V such that A∩B = ∅. Show that A∪B is linearly independent if and only if hAi∩hBi = {0}.

1.2.6. Prove the converse of Theorem 1.2.6: if B is a subset of a vector space V over a field F such that every element in V can be written uniquely as a linear combination of elements in B, then B is a basis for V .

1.2.7. Let V be a vector space over a field F and S ⊆ V with |S| ≥ 2. Show that S is linearly dependent if and only if some element of S can be written as a linear combination of the other elements in S.

1.2.8. Let S be a subset of a vector space V . Show that S is a basis for V if and only if S is a minimal spanning subset of V .

1.2.9. Let S be a spanning subset of a vector space V . Show that there is a subset B of S which is a basis for V .

1.2.10. Let V be a finite-dimensional vector space with dimension n. Let S ⊆ V . Prove that

(i) if |S| < n, then S does not span V ;

(ii) if |S| = n and V is spanned by S, then S is a basis for V .

1.3 Linear Maps

In this section, we study a function between vector spaces that preserves the vector space operations.

Definition 1.3.1. Let V and W be vector spaces over a field F . A function T : V → W is called a linear map or a linear transformation if

(i) T (u + v) = T (u) + T (v) for any u, v ∈ V ;

(ii) T (kv) = k T (v) for any v ∈ V and k ∈ F .

We can combine conditions (i) and (ii) together into a single condition as follows:

Proposition 1.3.2. Let V and W be vector spaces over a field F . A function T : V → W is linear if and only if

T (αu + βv) = αT (u) + βT (v) for any u, v ∈ V and α, β ∈ F.

Proof. Assume that T : V → W is linear. Then for any u, v ∈ V and α, β ∈ F ,

T (αu + βv) = T (αu) + T (βv) = αT (u) + βT (v).

Conversely, for any u, v ∈ V ,

T (u + v) = T (1 · u + 1 · v) = 1 · T (u) + 1 · T (v) = T (u) + T (v) and for any v ∈ V and any k ∈ F ,

T (kv) = T (k · v + 0 · v) = kT (v) + 0T (v) = kT (v).

Hence T is linear.

The above proposition says that a linear map preserves a linear combination of two elements. We can apply mathematical induction to show that it preserves any linear combination of elements in a vector space.

Corollary 1.3.3. Let T : V → W be a linear map. Then

T (α1v1 + ··· + αnvn) = α1T (v1) + ··· + αnT (vn), for any n ∈ N, any v1, . . . , vn ∈ V and any α1, . . . , αn ∈ F .

Proposition 1.3.4. If T : V → W is a linear map, then T (0¯) = 0¯.

Proof. T (0¯) = T (0 · 0¯) = 0 · T (0¯) = 0¯.

Example 1.3.5. The following functions are examples of linear maps.

(i) The zero map T : V → W defined by T (v) = 0¯ for all v ∈ V . The zero map will be denoted by 0.

(ii) The identity map IV : V → V defined by IV (v) = v for all v ∈ V .

(iii) Let A ∈ Mm×n(F ). Define LA : F^n → F^m by

LA(x) = Ax for any x ∈ F^n,

where x is represented as an n × 1 matrix.

(iv) Define D : F [x] → F [x] by

D(a0 + a1x + ··· + anx^n) = a1 + 2a2x + ··· + nanx^{n−1}.

The map D is the "formal" differentiation of polynomials. We may denote D(f) by f′. The linearity of D can be written as (f + g)′ = f′ + g′ and (kf)′ = k f′ for any f, g ∈ F [x] and k ∈ F .

(v) Define T : C([a, b]) → R by

T (f) = ∫_a^b f(x) dx for any f ∈ C([a, b]).

The linearity of T follows from properties of the Riemann integral.

(vi) Let S denote the set of all sequences in F . Define R, L: S → S by

R((x1, x2, x3,... )) = (0, x1, x2, x3,... ), and

L((x1, x2, x3,... )) = (x2, x3, x4,... ).

The map R is called the right-shift operator and the map L is called the left-shift operator.
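As a quick numerical illustration of Example 1.3.5 (iii), the sketch below (our own check, not part of the notes) verifies the defining property of Proposition 1.3.2 for the map LA on one randomly chosen matrix and pair of vectors.

```python
# Check that L_A(alpha*x + beta*y) = alpha*L_A(x) + beta*L_A(y) numerically.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))           # A is 3x4, so L_A : R^4 -> R^3
x, y = rng.standard_normal(4), rng.standard_normal(4)
alpha, beta = 2.0, -5.0

lhs = A @ (alpha * x + beta * y)          # L_A(alpha*x + beta*y)
rhs = alpha * (A @ x) + beta * (A @ y)    # alpha*L_A(x) + beta*L_A(y)
print(np.allclose(lhs, rhs))              # True
```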

Definition 1.3.6. Let T : V → W be a linear map. Define the kernel and the image of T to be the following sets:

ker T = {v ∈ V | T (v) = 0}, im T = {w ∈ W | ∃v ∈ V, w = T (v)}.

Proposition 1.3.7. Let T : V → W be a linear map. Then

(i) ker T is a subspace of V ;

(ii) im T is a subspace of W ;

(iii) T is onto if and only if im T = W ;

(iv) T is 1-1 if and only if ker T = {0}.

Proof. (i) Since T (0) = 0, 0 ∈ ker T. Let u, v ∈ ker T and α, β ∈ F . Then

T (αu + βv) = αT (u) + βT (v) = 0.

Hence αu + βv ∈ ker T . It shows that ker T is a subspace of V .

(ii) Since T (0) = 0, 0 ∈ im T . Let u, v ∈ im T and α, β ∈ F . Then there exist x, y ∈ V such that T (x) = u and T (y) = v. It follows that

αu + βv = αT (x) + βT (y) = T (αx + βy) ∈ im T.

Thus im T is a subspace of W .

(iii) This is a restatement of T being onto.

(iv) Suppose that T is 1-1. It is clear that {0} ⊆ ker T . Let u ∈ ker T . Then T (u) = 0 = T (0). Since T is 1-1, u = 0. Hence ker T = {0}. Conversely, assume that ker T = {0}. Let u, v ∈ V be such that T (u) = T (v). Then T (u − v) = T (u) − T (v) = 0. Thus u − v = 0, i.e. u = v. This shows that T is 1-1.

The next theorem states the relation between the dimensions of the kernel and the image of a linear map.

Theorem 1.3.8. Let T : V → W be a linear map between finite-dimensional vector spaces. Then

dim V = dim(ker T ) + dim(im T ).

Proof. Let A = {v1, . . . , vk} be a basis for ker T . Then it is a linearly independent set in V and thus can be extended to a basis B = {v1, . . . , vk, vk+1, . . . , vn} for

V . We will show that C = {T (vk+1),...,T (vn)} is a basis for im T . To see that it spans im T , let w = T (v), where v ∈ V . Then v can be written uniquely as

v = α1v1 + ··· + αnvn, for some α1,..., αn ∈ F . Hence

T (v) = T (∑_{i=1}^n αi vi) = ∑_{i=1}^n αi T (vi) = ∑_{i=k+1}^n αi T (vi),

because T (v1) = ··· = T (vk) = 0. Hence w = T (v) is in the span of C. Now let

αk+1,..., αn ∈ F be such that

αk+1T (vk+1) + ··· + αnT (vn) = 0.

Then

T (∑_{i=k+1}^n αi vi) = ∑_{i=k+1}^n αi T (vi) = 0.

Hence ∑_{i=k+1}^n αi vi ∈ ker T . Since A is a basis for ker T , there exist α1, . . . , αk ∈ F such that

∑_{i=k+1}^n αi vi = ∑_{i=1}^k αi vi.

It follows that

∑_{i=1}^k αi vi + ∑_{i=k+1}^n (−αi) vi = 0.

Since B is a basis for V , αi = 0 for i = 1, . . . , n. In particular, it means that C is linearly independent. We conclude that C is a basis for im T . Now,

dim(im T ) = n − k = dim V − dim(ker T ).

This establishes the theorem.

Definition 1.3.9. Let V and W be vector spaces and T : V → W a linear map. We call dim(ker T ) and dim(im T ) the nullity and the rank of T , respectively. Denote the rank of T by rank T . (We do not introduce notation for the nullity because it has less use.)

Example 1.3.10. Let T : R^3 → R^3 be defined by

T (x, y, z) = (2x − y, x + 2y − z, z − 5x).

Find ker T , im T , rank T and the nullity of T .

Solution. If T (x, y, z) = (0, 0, 0), then

2x − y = 0, x + 2y − z = 0, z − 5x = 0.

Solving this system of equations, we see that y = 2x, z = 5x, where x is a free variable. Hence ker T = {(x, 2x, 5x) | x ∈ R} = h(1, 2, 5)i. Moreover,

T (x, y, z) = x(2, 1, −5) + y(−1, 2, 0) + z(0, −1, 1).

Since (2, 1, −5) = −2(−1, 2, 0) − 5(0, −1, 1), im T = h(−1, 2, 0), (0, −1, 1)i. Hence rank T = 2 and the nullity of T is 1.
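The computation in Example 1.3.10, together with the identity of Theorem 1.3.8, can be checked with a computer algebra system. A possible sketch with sympy (using the standard matrix of T read off from its formula):

```python
# ker T, im T and rank T for T(x, y, z) = (2x - y, x + 2y - z, z - 5x).
from sympy import Matrix

A = Matrix([[ 2, -1,  0],
            [ 1,  2, -1],
            [-5,  0,  1]])

print(A.nullspace())                    # [Matrix([1, 2, 5])], so ker T = <(1, 2, 5)>
print(A.columnspace())                  # two independent columns span im T
print(A.rank(), A.shape[1] - A.rank())  # rank 2, nullity 1; 2 + 1 = dim R^3
```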

The next theorem states that a function defined on a basis of a vector space can be uniquely extended to a linear map on the entire vector space. Hence a linear map on a vector space is uniquely determined on its basis.

Theorem 1.3.11. Let B be a basis for a vector space V . Then for any vector space W and a function t: B → W , there is a unique linear map T : V → W which extends t.

Proof. Existence: Let v ∈ V . Then v can be written uniquely in the form

v = ∑_{i=1}^n αi vi

for some n ∈ N, v1, . . . , vn ∈ B and α1, . . . , αn ∈ F . Define

T (v) = ∑_{i=1}^n αi t(vi).

Clearly, this map is well-defined and T extends t. To show that T is linear, let u, v ∈ V and r, s ∈ F . Then

u = ∑_{i=1}^m αi ui and v = ∑_{j=1}^n βj vj

for some m, n ∈ N, ui, vj ∈ B and αi, βj ∈ F , i = 1, . . . , m, j = 1, . . . , n. By renumbering if necessary, we may assume that ui = vi for i = 1, . . . , k and {uk+1, . . . , um} ∩ {vk+1, . . . , vn} = ∅. Then

ru + sv = ∑_{i=1}^k (rαi + sβi)ui + ∑_{i=k+1}^m rαi ui + ∑_{j=k+1}^n sβj vj.

Hence

T (ru + sv) = ∑_{i=1}^k (rαi + sβi)t(ui) + ∑_{i=k+1}^m rαi t(ui) + ∑_{j=k+1}^n sβj t(vj)
            = r ∑_{i=1}^m αi t(ui) + s ∑_{j=1}^n βj t(vj)
            = r T (u) + s T (v).

Uniqueness: Assume that S and T are linear maps from V into W that are extensions of t. Let v ∈ V . Then v can be written uniquely as v = ∑_{i=1}^n ki vi for some n ∈ N, v1, . . . , vn ∈ B and k1, . . . , kn ∈ F . Since S is linear,

S(v) = ∑_{i=1}^n ki S(vi) = ∑_{i=1}^n ki t(vi).

Do the same for T . We can see that S(v) = T (v) for any v ∈ V .

We can state the above theorem in terms of the universal mapping property, which will be useful later.

Let iB : B ,→ V denote the inclusion map defined by iB(x) = x for any x ∈ B. Then the above theorem can be restated as:

For any vector space W and a function t : B → W , there exists a unique linear map T : V → W such that T ◦ iB = t:

    B ---iB---> V
      \         |
       \ t      | T
        \       |
         v      v
             W

Definition 1.3.12. A function T : V → W is called a linear isomorphism if it is linear and bijective. If there is a linear isomorphism from V onto W , we say that V is isomorphic to W , denoted by V =∼ W .

Proposition 1.3.13. Let T : V → W be a linear map. Then T is a linear isomorphism if and only if T has a linear inverse, i.e., a linear map S : W → V such that ST = IV and TS = IW .

Proof. (⇒) Assume that T is a linear isomorphism. Since T is bijective, T has −1 −1 −1 an inverse function T : W → V such that T T = IV and TT = IW . It remains to show that T −1 is linear. Let u, v ∈ W and α, β ∈ F . Then

T (αT −1(u) + βT −1(v)) = α T (T −1(u)) + β T (T −1(v)) = αu + βv.

Thus T −1(αu + βv) = αT −1(u) + βT −1(v).

(⇐) Suppose there is a linear map S : W → V such that ST = IV and TS = IW .

It is easy to verify that ST = IV implies injectivity of T and TS = IW implies surjectivity of T . Hence T is linear and bijective, i.e. a linear isomorphism.

By the above proposition, a linear isomorphism is also called an invertible linear map. Frequently, it is easy to prove that two vector spaces are isomorphic by finding linear maps from one vector space to the other which are inverses of each other.

Example 1.3.14.

(i) Let V be a finite-dimensional vector space of dimension n over a field F . Then V ≅ F^n. To see this, fix a basis {v1, . . . , vn} for V . Then any element in V can be written uniquely as a1v1 + ··· + anvn, where a1, . . . , an ∈ F . A linear isomorphism between V and F^n is given by

a1v1 + ··· + anvn ←→ (a1, . . . , an).

(ii) Mm×n(F ) ≅ Mn×m(F ). The linear maps Φ: Mm×n(F ) → Mn×m(F ) and

Ψ: Mn×m(F ) → Mm×n(F ) defined by

Φ(A) = A^t and Ψ(B) = B^t,

for any A ∈ Mm×n(F ) and B ∈ Mn×m(F ), are inverses of each other.

Theorem 1.3.15. Let V and W be vector spaces. Then V ≅ W if and only if dim V = dim W.

Proof. Assume that T : V → W is a linear isomorphism. Let B be a basis for V . Then it is easy to show that T [B] is a basis for W . Since T is a bijection, |B| = |T [B]|, which implies that dim V = dim W . Conversely, assume that dim V = dim W and let B and C be bases for V and W , respectively. Suppose

B = {vα}α∈Λ and C = {wα}α∈Λ. Define T : B → W by T (vα) = wα for each α ∈ Λ and extend it to a linear map with the same name from V into W .

Similarly, define S : C → V by S(wα) = vα for each α ∈ Λ and extend it to a

linear map S from W into V . It is easy to see that ST = IV and TS = IW . Hence S and T are linear isomorphisms, which implies that V =∼ W .

Theorem 1.3.16. Let T : V → W be a linear map between finite-dimensional vector spaces with dim V = dim W . Then the following statements are equivalent:

(i) T is 1-1;

(ii) T is onto;

(iii) T is a linear isomorphism.

Proof. It suffices to show that T is 1-1 if and only if T is onto. Suppose T is 1-1. Then ker T = {0}. By Theorem 1.3.8, dim W = dim V = dim(im T ). It follows from Corollary 1.2.17 that W = im T . On the other hand, suppose T is onto. Then im T = W . Hence dim(im T ) = dim W = dim V . By Theorem 1.3.8, dim(ker T ) = 0, i.e. ker T = {0}, which implies that T is 1-1.

Corollary 1.3.17. Let T and S be linear maps on a finite-dimensional vector space V . Then ST = IV implies TS = IV . In other words, if a linear map on a finite-dimensional vector space is either left-invertible or right-invertible, then it is invertible.

Proof. The condition ST = IV implies that T is 1-1 and S is onto. The conclusion now follows from Theorem 1.3.16.

Remark. Theorem 1.3.16 and Corollary 1.3.17 may not hold if V is infinite dimensional. See Problem 1.3.13.

Proposition 1.3.18. Let V and W be vector spaces over F . If S, T : V → W are linear maps and k ∈ F , define S + T and kT by

(S + T )(v) = S(v) + T (v), (kT )(v) = k T (v).

Then S + T and kT are linear maps from V into W . Proof. For any u, v ∈ V and α, β ∈ F ,

(S + T )(αu + βv) = S(αu + βv) + T (αu + βv) = αS(u) + βS(v) + αT (u) + βT (v) = α{S(u) + T (u)} + β{S(v) + T (v)} = α(S + T )(u) + β(S + T )(v).

Hence S + T is linear. Similarly, we can show that kT is linear.

Definition 1.3.19. Let V and W be vector spaces over a field F . Denote by L(V,W ) or Hom(V,W ) the set of linear maps from V into W :

L(V,W ) = Hom(V,W ) = {T : V → W | T is linear}.

If V = W , we simply write L(V ) or Hom(V ). Proposition 1.3.20. Let V and W be vector spaces over F . Then L(V,W ) is a vector space over F under the operations defined above. Proof. We leave this as a routine exercise.

Proposition 1.3.21. Let V and W be finite-dimensional vector spaces over F . Then dim(L(V,W )) = (dim V )(dim W ).

Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm} be bases for V and W , respectively. For i = 1, . . . , m and j = 1, . . . , n, define Tij : B → W by

Tij(vj) = wi and Tij(vk) = 0 if k 6= j.

Extend each of them to a linear map from V into W . We leave it as an exercise to show that {Tij} is a basis for L(V,W ). Since {Tij} has mn elements, we see that dim(L(V,W )) = nm = (dim V )(dim W ).

The next proposition shows that a composition of linear maps is still linear.

Proposition 1.3.22. Let U, V and W be vector spaces over F . If S : U → V and T : V → W are linear maps, let

TS(u) = T ◦ S(u) = T (S(u)) for any u ∈ U .

Then TS is a linear map from U into W .

Proof. For any u, v ∈ U and α, β ∈ F ,

(TS)(αu + βv) = T (αS(u) + βS(v)) = α T S(u) + β T S(v).

This shows that TS is linear.

Sometimes a vector space has an extra operation which can be regarded as a multiplication.

Definition 1.3.23. Let V be a vector space over a field F . A product is a function V × V → V , (x, y) ↦ x · y, satisfying the following properties: for any x, y, z ∈ V and α, β ∈ F ,

(i) x · (αy + βz) = α(x · y) + β(x · z); “left-distributive law”

(ii) (αx + βy) · z = α(x · z) + β(y · z). “right-distributive law”

The product is said to be associative if it satisfies

(x · y) · z = x · (y · z) for any x, y, z ∈ V.

An algebra is a vector space equipped with an associative product. It is said to be commutative if the product is commutative. If it has a multiplicative identity, i.e. ∃ 1 ∈ V , 1 · v = v · 1 = v for all v ∈ V , we call it a unital algebra.

Note that an algebra has 3 operations: addition, multiplication and scalar multiplication. It has a ring structure under the addition and multiplication.

Definition 1.3.24. Let V and W be over a field F . A map φ: V → W is called an algebra homomorphism if it is a linear map such that

φ(x · y) = φ(x) · φ(y) for any x, y ∈ V .

It is called an algebra isomorphism if it is a bijective algebra homomorphism.

Proposition 1.3.25. Let V be a vector space over a field F . Define the product on L(V ) by ST = S ◦ T for any S, T ∈ L(V ). Then L(V ) is a unital algebra. If dim V > 1, it is a non-commutative algebra.

Proof. By Proposition 1.3.20, L(V ) is a vector space over F . By linearity of S, for any S, T1, T2 ∈ L(V ) and α, β ∈ F ,

S(αT1 + βT2) = α ST1 + β ST2.

On the other hand, by the definition of addition, for any S, T1, T2 ∈ L(V ) and α, β ∈ F ,

(αS1 + βS2)T = α S1T + β S2T . The associativity of the product follows from the associativity of the composition of functions. Moreover, IV T = TIV = T . Hence L(V ) is a unital algebra. If dim V > 1, choose a linearly independent subset {x, y} of V and extend it to a basis for V . Define S(x) = y, S(y) = y, T (x) = x and T (y) = x and extend them to linear maps on V . It is easy to see that ST (x) ≠ TS(x).
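In coordinates with respect to the basis vectors x and y, the maps S and T above become 2 × 2 matrices whose columns are the images of the basis vectors. The following sketch (the concrete matrices are our reading of the proof, restricted to the plane spanned by x and y) exhibits ST ≠ TS numerically.

```python
# Matrices of S (S(x) = S(y) = y) and T (T(x) = T(y) = x) on span{x, y}.
import numpy as np

S = np.array([[0, 0],
              [1, 1]])   # both columns are y = (0, 1)
T = np.array([[1, 1],
              [0, 0]])   # both columns are x = (1, 0)

print(S @ T)                           # matrix of ST; sends x to y
print(T @ S)                           # matrix of TS; sends x to x
print(np.array_equal(S @ T, T @ S))    # False
```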

We have seen that L(V ) is a unital algebra. Other examples of an algebra can be found below. Example 1.3.26.

(i) For n ≥ 2, the set Mn(F ) of n × n matrices over F is a unital non- commutative algebra, where the product is the usual .

The identity matrix In is the multiplicative identity.

(ii) The set F [x] of polynomials over F is a unital commutative algebra under the usual polynomial operations. The polynomial 1 is the multiplicative identity.

(iii) Let X be a non-empty set and F a field. The set of all F -valued functions F(X) = {f : X → F } is a unital commutative algebra under the point- wise operations. The constant function 1(x) = 1 for any x ∈ X, is the multiplicative identity.

(iv) The space C([a, b]) of continuous functions on [a, b] is a unital commutative algebra under the pointwise operations.

Exercises

1.3.1. Fix a matrix Q ∈ Mn(F ) and let W = {A ∈ Mn(F ) | AQ = QA}.

(a) Prove that W is a subspace of Mn(F ).

(b) Define T : Mn(F ) → Mn(F ) by T (A) = AQ − QA for any A ∈ Mn(F ). Prove that T is a linear map and find ker T .

1.3.2. Let T : U → V be a linear map. If W is a subspace of V , prove that

T −1[W ] = {u ∈ U | T (u) ∈ W }

is a subspace of U.

1.3.3. Let V and W be vector spaces and T : V → W a linear transformation. Prove that T is one-to-one if and only if T maps any linearly independent subsets of V to a linearly independent subset of W .

1.3.4. Let V and W be vector spaces and T : V → W a linear transformation. Let B be a basis for ker T and C a basis for V such that B ⊆ C. Let B′ = C − B. Show that

(i) for any v1 and v2 in B′, if v1 ≠ v2 then T (v1) ≠ T (v2);

(ii) T [B′] = {T (v) | v ∈ B′} is a basis for im T .

Remark. We do not assume that V and W are finite-dimensional.

1.3.5. Let V be a vector space over a field F with dim V = 1. Show that if T : V → V is a linear map, then there exists a unique scalar k such that T (v) = kv for any v ∈ V .

1.3.6. Let T be a linear map on a finite-dimensional vector space V such that rank T = rank T 2. Show that im T ∩ ker T = {0}.

1.3.7. Let T be a linear map on a finite-dimensional vector space V such that T^2 = 0. Show that 2 rank T ≤ dim V .

1.3.8. Let V and W be finite-dimensional vector spaces and T : V → W a linear map. Show that rank T ≤ min{dim V, dim W }.

1.3.9. Let U, V and W be finite-dimensional vector spaces and S : U → V , T : V → W linear maps. Show that

rank(TS) ≤ min{rank S, rank T }.

Moreover, if S or T is a linear isomorphism, then

rank(TS) = min{rank S, rank T }.

1.3.10. Let Vi be vector spaces over a field F and fi : Vi → Vi+1 linear maps. Consider a sequence

· · · −→ Vi−1 −−fi−1−→ Vi −−fi−→ Vi+1 −→ · · ·

It is called exact at Vi if im fi−1 = ker fi. It is exact if it is exact at each Vi.

(i) Prove that 0 −→ V −−T−→ W is exact if and only if T is 1-1.

(ii) Prove that V −−T−→ W −→ 0 is exact if and only if T is onto.

(iii) Let V1,..., Vn be finite-dimensional vector spaces. Assume that we have an exact sequence

0 −→ V1 −→ V2 −→ · · · −→ Vn −→ 0.

Prove that

∑_{i=1}^n (−1)^i dim Vi = 0.

1.3.11. Prove Proposition 1.3.20.

1.3.12. Recall that f = {(xn) | ∃N ∈ N ∀n ≥ N, xn = 0} is a subspace of S. Prove that f is isomorphic to F [x].

1.3.13. Give an example to show that Theorem 1.3.16 and Corollary 1.3.17 may not hold if V is infinite dimensional.

1.3.14. Prove that the set {Tij} in Proposition 1.3.21 is a basis for L(V,W ).

1.3.15. Let V and W be vector spaces and let U : V → W be a linear isomor- phism. Show that the map T 7→ UTU −1 is a linear isomorphism from L(V,V ) onto L(W, W ).

1.3.16. Suppose V is a finite-dimensional vector space and T : V → V a linear map such that T 6= 0 and T is not a linear isomorphism. Show that there is a linear map S : V → V such that ST = 0 but TS 6= 0.

1.3.17. Let V be a finite-dimensional vector space and suppose that U and W are subspaces of V such that dim U + dim W = dim V . Prove that there exists a linear transformation T : V → V such that ker T = U and im T = W .

1.4 Matrix Representation

In this section, we give a computational aspect of linear maps. The main theorem is that there is a 1-1 correspondence between the set of linear maps and the set of matrices. By assigning coordinates with respect to bases for vector spaces, we turn a linear mapping into a matrix multiplication. On the other hand, to prove results about matrices, it is often easily done by considering the linear map obtained from matrix multiplication.

Definition 1.4.1. Let V be a vector space of dimension n. An ordered n-tuple

(v1, . . . , vn) of n elements in V is called an ordered basis if {v1, . . . , vn} is a basis for V .

In other words, an ordered basis for a vector space is a basis in which the order of its elements is taken into account. We still use the usual notation {v1, . . . , vn} for the ordered basis (v1, . . . , vn).

Definition 1.4.2. Let B = {v1, . . . , vn} be an ordered basis for a vector space

V . If v = k1v1 + ··· + knvn, where ki ∈ F for i = 1, . . . , n, then (k1, . . . , kn) is called the coordinate vector of v with respect to B, denoted by [v]B.

Remark. For a computational purpose, we write a vector (α1, . . . , αn) in F^n as a column matrix, i.e. an n × 1 matrix. We will also write it horizontally as [α1 . . . αn]^t.

Proposition 1.4.3. Let V be a vector space over a field F with dim V = n. Fix an ordered basis B for V . Then the map v ↦ [v]B is a linear isomorphism from V onto F^n.

Proof. Let B = {v1, . . . , vn} be an ordered basis for V . Any v ∈ V can be written uniquely as v = α1v1 + ··· + αnvn, where αi ∈ F for i = 1, . . . , n. A linear isomorphism between V and F^n is given by

v = α1v1 + ··· + αnvn ←→ [α1 . . . αn]^t.

It is easy to see that the maps in the two directions are linear and are inverses of each other.
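Concretely, finding [v]B amounts to solving the linear system whose coefficient matrix has the basis vectors of B as columns. A small sketch (the basis and the vector are our own illustrative choice, not from the text):

```python
# Compute the coordinate vector [v]_B for a non-standard basis B of R^3.
import numpy as np

B = np.column_stack(([1, 0, 1], [1, 1, 0], [0, 1, 1])).astype(float)
v = np.array([2.0, 3.0, 1.0])

coords = np.linalg.solve(B, v)      # solve B @ [v]_B = v
print(coords)                       # [0. 2. 1.], i.e. v = 0*v1 + 2*v2 + 1*v3
print(np.allclose(B @ coords, v))   # True
```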

Theorem 1.4.4. Let V and W be vector spaces over a field F with dim V = n and dim W = m. Fix ordered bases B for V and C for W , respectively. If T : V → W is a linear map, then there is a unique m × n matrix A such that

[T (v)]C = A[v]B for any v ∈ V. (1.3)

Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm} be ordered bases for V and W , respectively. First assume that there exists an m × n matrix A such that

(1.3) holds. For each vj in B,[T (vj)]C = A[vj]B. But then the column matrix

[vj]B has 1 in the j-th position and 0 in the other places. Thus A[vj]B is the j-th column of A. This shows that the matrix A can be formed by obtaining the j-th column of A from [T (vj)]C: h i A = [T (v1)]C [T (v2)]C ... [T (vn)]C .

For each j ∈ {1, . . . , n}, there exist a1j, . . . , amj ∈ F such that

T (vj) = ∑_{i=1}^m aij wi. (1.4)

Now we obtain all the entries aij of A. Hence if A satisfies (1.3), then A must be in this form. Now we show that the matrix A defined this way satisfies (1.3). Let v ∈ V and write v = k1v1 + ··· + knvn, where k1, . . . , kn ∈ F . Then

T (v) = T (∑_{j=1}^n kj vj) = ∑_{j=1}^n kj T (vj)
     = ∑_{j=1}^n kj (∑_{i=1}^m aij wi)
     = ∑_{i=1}^m (∑_{j=1}^n aij kj) wi.

Hence [T (v)]C is an m × 1 matrix whose i-th row is ∑_{j=1}^n aij kj. On the other hand, A[v]B is an m × 1 matrix whose i-th row is obtained by multiplying the i-th row of A by the only column of [v]B. Hence the i-th row of A[v]B is ∑_{j=1}^n aij kj. This shows that (1.3) holds. We now finish the proof.

Remark. We can give an alternative proof of the existence part of the above theorem as follows. For each vj in B, we write

T (vj) = ∑_{i=1}^m aij wi.

Form an m × n matrix A with the (i, j)-entry aij given by the above equation. Hence

[T (vj)]C = [a1j . . . amj]^t.

On the other hand, A[vj]B is the j-th column of A. Hence [T (vj)]C = A[vj]B.

Now we can view [T ( · )]C and A[ · ]B = LA([ · ]B) as composite functions of linear maps and hence both of them are linear. We have established the equality of these two linear maps on the ordered basis B = {v1, . . . , vn} and thus they must be equal on all elements v ∈ V .

Definition 1.4.5. The unique matrix A in Theorem 1.4.4 is called the matrix representation of T with respect to the ordered bases B and C, respectively, and is denoted by [T ]B,C. Hence

[T (v)]C = [T ]B,C [v]B for all v ∈ V.

        T
   V ---------> W
   |            |
  [·]B         [·]C
   |            |
   v   [T ]B,C  v
  F^n --------> F^m

If V = W and B = C, we simply write [T ]B.

Given an m × n matrix A over F , we can define a linear map LA : F^n → F^m by a matrix multiplication LA(x) = Ax for any x ∈ F^n. Now given a linear map, we can construct a matrix so that the linear map, in coordinates, is just the matrix multiplication.

Example 1.4.6. Let T : R^3 → R^2 be defined by

T (x, y, z) = (2x − 3y − z, −x + y + 2z).

Let B = {(1, 1, 0), (0, 1, 1), (1, 0, 1)} and C = {(1, 1), (−1, 1)} be bases for R^3 and R^2, respectively. Find [T ]B,C.

Solution. Note that

T (1, 1, 0) = (−1, 0) = −(1/2)(1, 1) + (1/2)(−1, 1)
T (0, 1, 1) = (−4, 3) = −(1/2)(1, 1) + (7/2)(−1, 1)
T (1, 0, 1) = (1, 1) = 1(1, 1) + 0(−1, 1).

This shows that

[T ]B,C = [ −1/2  −1/2  1 ]
          [  1/2   7/2  0 ].
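The same computation can be organized as in Theorem 1.4.4: each column of [T ]B,C is obtained by expressing T (vj) in the basis C, i.e. by solving a small linear system. A sketch with sympy (reproducing the matrix above):

```python
# Build [T]_{B,C} column by column for Example 1.4.6.
from sympy import Matrix

def T(x, y, z):
    return Matrix([2*x - 3*y - z, -x + y + 2*z])

B = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
C = Matrix([[1, -1],
            [1,  1]])                 # columns are the basis vectors of C

cols = [C.solve(T(*v)) for v in B]    # [T(v_j)]_C for each v_j in B
print(Matrix.hstack(*cols))           # Matrix([[-1/2, -1/2, 1], [1/2, 7/2, 0]])
```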

Example 1.4.7. Let Tθ : R^2 → R^2 be the counterclockwise rotation by the angle θ around the origin. Then it is a linear map on R^2. Let B = {(1, 0), (0, 1)} be the standard basis for R^2. Find [Tθ]B and write down an explicit formula for Tθ.

Solution. If we rotate the points (1, 0) and (0, 1) on the plane counterclockwise by the angle θ, using elementary geometry, we see that they get moved to the points (cos θ, sin θ) and (− sin θ, cos θ), respectively. Hence

Tθ(1, 0) = (cos θ, sin θ) = cos θ(1, 0) + sin θ(0, 1),

Tθ(0, 1) = (− sin θ, cos θ) = − sin θ(1, 0) + cos θ(0, 1).

Thus

[Tθ]B = [ cos θ  − sin θ ]
        [ sin θ    cos θ ].

If (x, y) ∈ R^2, then

[Tθ(x, y)]B = [ cos θ  − sin θ ] [ x ]  =  [ x cos θ − y sin θ ]
              [ sin θ    cos θ ] [ y ]     [ x sin θ + y cos θ ].

Hence Tθ(x, y) = (x cos θ − y sin θ, x sin θ + y cos θ).
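As a quick sanity check of the formula (our own numerical example, not part of the notes), rotating the point (1, 0) by 90 degrees should give approximately (0, 1):

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(R @ np.array([1.0, 0.0]))   # approximately [0. 1.]
```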

Proposition 1.4.8. Let V and W be finite-dimensional vector spaces with or- dered bases B and C, respectively. Let S, T : V → W be linear maps and α, β ∈ F . Then

[αS + βT ]B,C = α[S]B,C + β[T ]B,C. Proof. Note that for any v ∈ V ,

[S(v)]C = [S]B,C[v]B and [T (v)]C = [T ]B,C[v]B, which implies

[(αS + βT )(v)]C = α[S(v)]C + β[T (v)]C

= α[S]B,C[v]B + β[T ]B,C[v]B

= (α[S]B,C + β[T ]B,C)[v]B.

But then [αS + βT ]B,C is a unique matrix such that

[(αS + βT )(v)]C = [αS + βT ]B,C[v]B for any v ∈ V.

We conclude that [αS + βT ]B,C = α[S]B,C + β[T ]B,C.

Proposition 1.4.9. Let U, V and W be finite-dimensional vector spaces with ordered bases A, B and C, respectively. Let S : U → V and T : V → W be linear maps. Then

[TS]A,C = [T ]B,C[S]A,B. Proof. Note that

[S(u)]B = [S]A,B [u]A for any u ∈ U, and (1.5)

[T (v)]C = [T ]B,C [v]B for any v ∈ V . (1.6)

Replacing v = S(u) in (1.6) and applying (1.5), we have

[T (S(u))]C = [T ]B,C[S(u)]B = [T ]B,C[S]A,B[u]A for any u ∈ U.

On the other hand, [TS]A,C is the unique matrix such that

[TS(u)]C = [TS]A,C [u]A for any u ∈ U.

We now conclude that [TS]A,C = [T ]B,C[S]A,B.

Theorem 1.4.10. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m. Let B and C be ordered bases for V and W , respectively. Then the map T ↦ [T ]B,C is a linear isomorphism from L(V,W ) onto Mm×n(F ). Hence

L(V,W ) ≅ Mm×n(F ).

Proof. Define Φ: L(V,W ) → Mm×n(F ) by Φ(T ) = [T ]B,C for any T ∈ L(V,W ). By Proposition 1.4.8, Φ is a linear map. To see that it is 1-1, let T ∈ L(V,W ) be such that [T ]B,C = 0. Then for any v ∈ V ,

[T (v)]C = [T ]B,C[v]B = 0 [v]B = 0.

Hence for any v ∈ V , the coordinate vector of T (v), with respect to C, is a zero vector. This shows that T (v) = 0 for any v ∈ V , i.e., T ≡ 0. To show that Φ is onto, let A = [aij] be an m × n matrix. Write B = {v1, . . . , vn} and

C = {w1, . . . , wm}. Define t: B → W by

t(vj) = ∑_{i=1}^m aij wi for j = 1, . . . , n.

Extend t uniquely to a linear map T : V → W . It is easy to see that [T ]B,C = A. Hence Φ is a linear isomorphism.

If V = W , we know from Proposition 1.3.25 and Example 1.3.26 that L(V ) and Mn(F ) are algebras. In this case, they are also isomorphic as algebras.

Corollary 1.4.11. Let V be a finite-dimensional vector space over a field F with dim V = n. Let B be an ordered basis for V . Then the map Φ: T 7→ [T ]B is an algebra isomorphism from L(V ) onto Mn(F ).

Proof. By Theorem 1.4.10, Φ is a linear isomorphism. That Φ(TS) = Φ(T )Φ(S) for all S, T ∈ L(V ) follows from Proposition 1.4.9.

By Theorem 1.4.10 and Corollary 1.4.11, we see that linear maps and matrices are two aspects of the same thing. We can prove theorems about matrices by working with linear maps instead. See, e.g., Exercises 1.4.1-1.4.3. On the other hand, matrices have an advantage of being easier to calculate with.

Proposition 1.4.12. Let V and W be finite-dimensional vector spaces with the same dimension. Let B and C be ordered bases for V and W , respectively. Then a linear map T : V → W is invertible if and only if $[T]_{B,C}$ is an invertible matrix.

Proof. Let n = dim V = dim W . Assume that T is invertible. Then there is a linear map $T^{-1} : W \to V$ such that $T^{-1}T = I_V$ and $TT^{-1} = I_W$. Then
$$[T^{-1}]_{C,B}\,[T]_{B,C} = [T^{-1}T]_B = [I_V]_B = I_n \quad\text{and}\quad [T]_{B,C}\,[T^{-1}]_{C,B} = [TT^{-1}]_C = [I_W]_C = I_n.$$

Hence $[T]_{B,C}$ is invertible and $[T]_{B,C}^{-1} = [T^{-1}]_{C,B}$. Conversely, write $A = [T]_{B,C}$ and assume that A is an invertible matrix. Then there is an n × n matrix B such that $AB = BA = I_n$. By Theorem 1.4.10, there is a linear map S : W → V such that $[S]_{C,B} = B$. Hence

[ST ]B = [S]C,B[T ]B,C = BA = In = [IV ]B.

Hence ST = IV . Similarly, TS = IW . This shows that T and S are invertible and T −1 = S.

In the remaining part of this section, we discuss the rank of a matrix. The rank of a linear map is the dimension of its image. We will define the rank of a matrix to be the dimension of its column space, which turns out to be the same as the dimension of its row space. We will establish the relation between the rank of a matrix and the rank of the corresponding linear map.

Definition 1.4.13. Let A be an m × n matrix over a field F . The row space of A is the subspace of F n spanned by the row vectors of A. Similarly, the column space of A is the subspace of F m spanned by the column vectors of A. The row rank of A is defined to be the dimension of the row space of A. The column rank of A is defined to be the dimension of the column space of A.

If A is an m × n matrix over F , the row space of A is a subspace of F n, while the column space is a subspace of F m. However, it is remarkable that their dimensions are equal.

Theorem 1.4.14. Let A be an m × n matrix over a field F . Then the row rank and the column rank of A are equal.

Proof. Let $A = [a_{ij}]$. Let $r_1, \dots, r_m$ be the row vectors of A, and let $c_1, \dots, c_n$ be the column vectors of A. Let d be the row rank of A and let $\{v_1, \dots, v_d\}$ be a basis for the row space of A. Write each $v_k \in F^n$ as
$$v_k = (\beta_{k1}, \dots, \beta_{kn}) \in F^n.$$
For i = 1, ..., m, write
$$r_i = \sum_{k=1}^{d} \alpha_{ik}\, v_k = \sum_{k=1}^{d} \alpha_{ik}\,(\beta_{k1}, \dots, \beta_{kn}),$$
where each $\alpha_{ik} \in F$. Hence
$$r_i = (a_{i1}, \dots, a_{in}) = \Big( \sum_{k=1}^{d} \alpha_{ik}\beta_{k1}, \dots, \sum_{k=1}^{d} \alpha_{ik}\beta_{kn} \Big).$$
From this, it follows that, for i = 1, ..., m and j = 1, ..., n,
$$a_{ij} = \sum_{k=1}^{d} \alpha_{ik}\beta_{kj}.$$
Hence for j = 1, ..., n,
$$c_j = (a_{1j}, \dots, a_{mj}) = \Big( \sum_{k=1}^{d} \alpha_{1k}\beta_{kj}, \dots, \sum_{k=1}^{d} \alpha_{mk}\beta_{kj} \Big) = \sum_{k=1}^{d} \beta_{kj}\,(\alpha_{1k}, \dots, \alpha_{mk}) = \sum_{k=1}^{d} \beta_{kj}\, x_k,$$
where $x_k = (\alpha_{1k}, \dots, \alpha_{mk}) \in F^m$ for k = 1, ..., d. This shows that
$$\langle c_1, \dots, c_n \rangle \subseteq \langle x_1, \dots, x_d \rangle.$$

Hence the column rank of A ≤ d = the row rank of A. But this is true for any matrix, so applying it to $A^t$ gives: the column rank of $A^t$ ≤ the row rank of $A^t$. Since the row space and the column space of $A^t$ are the column space and the row space of A, respectively, this says that the row rank of A ≤ the column rank of A. Hence the column rank of A equals the row rank of A.

Definition 1.4.15. Let A be an m × n matrix over a field F . The rank of A is defined to be the column rank of A, which equals the row rank of A, and is denoted by rank A.

Remark. The elementary row operations preserve the row space of a matrix. Hence the rank of a matrix is preserved under elementary row operations. We can apply these operations to the matrix until it is in a reduced echelon form. Then the rank of the matrix is the number of non-zero row vectors in the reduced echelon form. However, the elementary row operations do not preserve the column space (although they do preserve the column rank, which equals the row rank).
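The following small Python/NumPy sketch (ours, purely illustrative) probes this remark numerically: an elementary row operation leaves the rank unchanged, although the column space generally changes.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])

# An elementary row operation: R2 <- R2 - 2*R1 (which zeroes out the second row here).
B = A.copy()
B[1] -= 2 * B[0]

# The rank is unchanged by the row operation ...
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B) == 2

# ... but the column spaces differ: every column of B has second entry 0,
# while the column (1, 2, 1)^t of A does not.
print(A[:, 0], B[:, 0])
```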

Proposition 1.4.16. Let A be an m × n matrix over a field F . Let $L_A : F^n \to F^m$ be the linear map defined by $L_A(x) = Ax$ for any $x \in F^n$. Then

(i) im LA = the column space of A;

(ii) rank LA = rank A.

Proof. Let $c_1, \dots, c_n$ be the column vectors of A. If $x = (x_1, \dots, x_n) \in F^n$, it is easy to see that

Ax = x1c1 + ··· + xncn.

It follows that im LA = the column space of A. Thus (ii) follows from (i).
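A quick numerical illustration (ours) of the identity used in this proof: $Ax$ is exactly the linear combination $x_1 c_1 + \cdots + x_n c_n$ of the columns of A, so the image of $L_A$ is the column space.

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [3., 1., 4.]])          # a 2x3 matrix with columns c1, c2, c3
x = np.array([5., -1., 2.])

combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))  # x1*c1 + x2*c2 + x3*c3
assert np.allclose(A @ x, combo)
```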

On the other hand, if T : V → W is a linear map, the rank of T coincides with the rank of the matrix representation of T .

Proposition 1.4.17. Let V and W be finite-dimensional vector spaces with or- dered bases B and C, respectively. For any linear map T : V → W ,

rank T = rank([T ]B,C).

Proof. See Exercise 1.4.5.

Exercises

n m 1.4.1. Recall that if A ∈ Mm×n(F ), then the linear map LA : F → F is given n by LA(x) = Ax, for all x ∈ F . Prove the following statements:

(i) if B and C are standard ordered bases for F n and F m, respectively, then

[LA]B,C = A;

(ii) if A and B are m × n matrices, then A = B if and only if LA = LB;

(iii) if A is an n × n matrix, then A is invertible if and only if LA is invertible.

1.4.2. If A and B are n × n matrices such that AB = In, prove that BA = In. Hence A and B are invertible and A−1 = B.

1.4.3. Let A be an m × n matrix and B an n × m matrix such that $AB = I_m$ and $BA = I_n$. Prove that m = n, A and B are invertible and $A^{-1} = B$.

1.4.4. Let A be an n × n matrix. Show that A is invertible if and only if rank A = n.

1.4.5. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m. Let B and C be ordered bases for V and W , respectively. Let T : V → W be a linear map, and write A = [T ]B,C. Let n m n LA : F → F be defined by LA(x) = Ax for all x ∈ F . Prove the following statements: ∼ (i) ker T = ker LA; ∼ (ii) im T = im LA;

(iii) rank T = rank A.

1.4.6. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear map such that $T^2 = T$. Show that T = 0 or T = I or there is an ordered basis B for $\mathbb{R}^2$ such that
$$[T]_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

Hint: Consider dim(ker T ).

1.5 Change of Bases

Given two ordered bases B and B0 for a vector space V , the coordinate vectors of an element v ∈ V with respect to B and B0, respectively, are usually different. We can transform one to the other by a matrix multiplication.

Theorem 1.5.1. Let B and B0 be ordered bases for a vector space V . Then there exists a unique P such that

[v]B0 = P [v]B for any v ∈ V. (1.7)

(Commutative diagram: the coordinate maps $[\,\cdot\,]_B$ and $[\,\cdot\,]_{B'}$ from V to $F^n$, and multiplication by P from $F^n$ to $F^n$, with $[\,\cdot\,]_{B'} = P\,[\,\cdot\,]_B$.)

Proof. The proof of this theorem is similar to the proof of Theorem 1.4.4. Let $B = \{v_1, \dots, v_n\}$ and $B' = \{v_1', \dots, v_n'\}$ be ordered bases for V. First, assume that there is a matrix P such that (1.7) holds. For each $v_j \in B$, $[v_j]_B$ is the n × 1 column matrix with 1 in the j-th row and 0 in the other positions. Thus $P[v_j]_B$ is the j-th column of P. Hence for (1.7) to hold, the j-th column of P must be $[v_j]_{B'}$. It follows that P will be of the form
$$P = \big[\; [v_1]_{B'} \;\; [v_2]_{B'} \;\; \cdots \;\; [v_n]_{B'} \;\big].$$

It remains to show that the matrix P defined above satisfies (1.7). The proof is the same as that of Theorem 1.4.4 and we leave it as an exercise.

Definition 1.5.2. The matrix P with the property above is called the transition matrix from B to B′. Notice that this is the same as $[I_V]_{B,B'}$.

The proof of Theorem 1.5.1 gives a method of how to find a transition matrix. Let $B = \{v_1, \dots, v_n\}$ and $B' = \{v_1', \dots, v_n'\}$ be ordered bases for V. The j-th column of the transition matrix from B to B′ is the coordinate vector of $v_j$ with respect to B′. More precisely, for j = 1, ..., n, write
$$v_j = \sum_{i=1}^{n} p_{ij}\, v_i'.$$
The matrix $P = [p_{ij}]$ is the transition matrix from B to B′.

Example 1.5.3. Let $B = \{(1, 0), (0, 1)\}$ and $B' = \{(1, 1), (-1, 1)\}$ be ordered bases for $\mathbb{R}^2$. Find the transition matrix from B to B′ and the transition matrix from B′ to B.

Solution. Note that
$$(1, 0) = \tfrac{1}{2}(1, 1) - \tfrac{1}{2}(-1, 1), \qquad (0, 1) = \tfrac{1}{2}(1, 1) + \tfrac{1}{2}(-1, 1).$$
Hence the transition matrix from B to B′ is $\begin{pmatrix} \tfrac12 & \tfrac12 \\ -\tfrac12 & \tfrac12 \end{pmatrix}$. Similarly,
$$(1, 1) = 1\,(1, 0) + 1\,(0, 1), \qquad (-1, 1) = -1\,(1, 0) + 1\,(0, 1).$$
Hence the transition matrix from B′ to B is $\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$.

In the above example, notice that
$$\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} \tfrac12 & \tfrac12 \\ -\tfrac12 & \tfrac12 \end{pmatrix}.$$
This is true in general, as stated in the next proposition.

Proposition 1.5.4. The transition matrix is invertible. In fact, the inverse of the transition matrix from B to B′ is the transition matrix from B′ to B.

Proof. Let P be the transition matrix from B to B′ and Q the transition matrix from B′ to B. Then

[v]B0 = P [v]B and [v]B = Q[v]B0 for any v ∈ V. Hence

[v]B = Q[v]B0 = QP [v]B for any v ∈ V , and

[v]B0 = P [v]B = PQ[v]B0 for any v ∈ V .

But then the identity matrix I is the unique matrix such that $[v]_B = I[v]_B$ for any v ∈ V. Thus QP = I. For the same reason, PQ = I. This shows that P and Q are inverses of each other.
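A short numerical check of Example 1.5.3 and Proposition 1.5.4 (a sketch of ours, not part of the notes): the B′-coordinates of a standard basis vector are obtained by solving against the matrix whose columns are the vectors of B′, and the two transition matrices are inverse to each other.

```python
import numpy as np

# Columns of E are the vectors of B' written in the standard basis.
E = np.array([[1., -1.],
              [1.,  1.]])

# The B'-coordinates of e_j solve E c = e_j, so the transition matrix
# from the standard basis B to B' is E^{-1}.
P = np.linalg.inv(E)                      # transition matrix from B to B'
Q = E                                     # transition matrix from B' to B

assert np.allclose(P, [[0.5, 0.5], [-0.5, 0.5]])
assert np.allclose(P @ Q, np.eye(2))      # P and Q are inverses (Prop. 1.5.4)
```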

The next theorem shows the relation between the matrix representations of the same linear map with respect to different ordered bases.

Theorem 1.5.5. Let V and W be finite-dimensional vector spaces, B, B′ ordered bases for V, and C, C′ ordered bases for W. Let P be the transition matrix from B to B′ and Q the transition matrix from C to C′. Then for any linear map T : V → W,
$$[T]_{B',C'} = Q\,[T]_{B,C}\,P^{-1}.$$

(Commutative diagram: the coordinate maps for B, B′, C, C′, the transition matrices P and Q, and the matrices $[T]_{B,C}$ and $[T]_{B',C'}$, forming a “tent” over the square relating $F^n$ and $F^m$.)

Proof. We can rephrase the statement of this theorem in terms of a commutative diagram in the following way: if the diagram is commutative on all four sides of the tent, it must be commutative at the base of the tent as well. Now we prove the theorem. Write down the properties of the relevant matrices:

[T (v)]C = [T ]B,C[v]B for any v ∈ V ; (1.8)

[v]B0 = P [v]B for any v ∈ V ; (1.9)

[w]C0 = Q[w]C for any w ∈ W . (1.10)

Replacing w = T (v) in (1.10) and applying the other identities above, we have

$$[T(v)]_{C'} = Q\,[T(v)]_C = Q\,[T]_{B,C}[v]_B = Q\,[T]_{B,C}\,P^{-1}[v]_{B'} \quad\text{for any } v \in V.$$
But then $[T]_{B',C'}$ is the unique matrix such that

[T (v)]C0 = [T ]B0,C0 [v]B0 for any v ∈ V.

We now conclude that $[T]_{B',C'} = Q\,[T]_{B,C}\,P^{-1}$.

Corollary 1.5.6. Let V be a finite-dimensional vector space with ordered bases B and B′. Let P be the transition matrix from B to B′. Then for any linear map T : V → V,
$$[T]_{B'} = P\,[T]_B\,P^{-1}.$$

Proof. Let C = B, C0 = B0 and Q = P in Theorem 1.5.5.
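The corollary is easy to probe numerically. In the sketch below (ours, illustrative only), B is the standard basis of $\mathbb{R}^2$, B′ is the basis $\{(1,1), (-1,1)\}$ of Example 1.5.3, and T is given in the standard basis by an arbitrary matrix.

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 3.]])               # [T]_B in the standard basis B
E = np.array([[1., -1.],
              [1.,  1.]])              # columns: the vectors of B'
P = np.linalg.inv(E)                   # transition matrix from B to B'

A_prime = P @ A @ np.linalg.inv(P)     # [T]_{B'} = P [T]_B P^{-1}

# Consistency check on one vector v: compute [T(v)]_{B'} two ways.
v = np.array([4., -2.])                # = [v]_B
assert np.allclose(A_prime @ (P @ v), P @ (A @ v))
```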

Definition 1.5.7. Let A and B be square matrices. We say that A is similar to B if there is an invertible matrix P such that B = P AP −1. We use notation A ∼ B to denote A being similar to B.

Proposition 1.5.8. Similarity is an equivalence relation on Mn(F ).

Proof. This is easy and is left as an exercise.

If A is similar to B, then B is similar to A. Hence we can say that A and B are similar.

Proposition 1.5.9. If T : V → V is a linear map on a finite-dimensional vector 0 space V and if B and B are ordered bases for V , then [T ]B ∼ [T ]B0 .

Proof. It follows from Corollary 1.5.6.

Exercises

1.5.1. If A = [aij] is a square matrix in Mn(F ), define the trace of A to be the sum of all entries in the main diagonal:

$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$

For any A, B ∈ Mn(F ), prove that

(i) tr(AB) = tr(BA);

(ii) tr(ABA−1) = tr(B) if A is invertible.

1.5.2. Let T : V → V be a linear map on a finite-dimensional vector space V . Define the trace of T to be

tr(T ) = tr([T ]B) where B is an ordered basis for V . Prove the following statements:

(i) this definition is well-defined, i.e., independent of a basis;

(ii) tr(TS) = tr(ST ) for any linear maps S and T on V .

1.5.3. Let V be a vector space over a field F .

(i) If V is finite dimensional, prove that it is impossible to find two linear maps

S and T on V such that ST − TS = IV .

(ii) Show that the statement in (i) is not true if V is infinite dimensional. (Take V = F [x], S(f)(x) = f 0(x) and T (f)(x) = xf(x) for any f ∈ F [x].)

1.5.4. Let V be a finite-dimensional vector space with dim V = n. Let B be an ordered basis for V and T : V → V a linear map on V . Prove that if A is an n × n matrix similar to [T ]B, then there is an ordered basis C for V such that

[T ]C = A.

1.5.5. Let V be a finite-dimensional vector space with dim V = n. Show that two n × n matrices A and B are similar if and only if they are matrix representations of the same linear map on V with respect to (possibly) different ordered bases. 1.5. CHANGE OF BASES 47

1.5.6. Let S, T : V → V be linear maps on a finite-dimensional vector space V . 0 Show that there exist ordered bases B and B for V such that [S]B = [T ]B0 if and only if there is a linear isomorphism U : V → V such that T = USU −1. 0 Hint: If [S]B = [T ]B0 , let U be a linear map that carries B onto B . Conversely, if T = USU −1, let B be any basis for V and B0 = U[B].

1.5.7. Show that if A and B are similar matrices, then rank A = rank B.

1.6 Sums and Direct Sums

In this section, we construct a new vector space from existing ones. Given vector spaces V and W over the same field, we can define a vector space structure on the Cartesian product V × W . The new vector space obtained this way is called an external direct sum. On the other hand, we can define the sum of subspaces of a vector space. If the subspaces have only zero vector in common, the sum will be called the (internal) direct sum. We will investigate the relation between external and internal direct sums and generalize the idea into the case where we have an arbitrary number of vector spaces.

Definition 1.6.1. Let V and W be vector spaces over the same field F . Define

V × W = {(v, w) | v ∈ V, w ∈ W }, together with the following operations

(v, w) + (v0, w0) = (v + v0, w + w0) k(v, w) = (kv, kw) for any (v, w), (v0, w0) ∈ V × W and k ∈ F . It is easy to check that V × W is a vector space over F , called the direct product or the external direct sum of V and W .

Proposition 1.6.2. Let V and W be finite-dimensional vector spaces. Then

dim(V × W ) = dim V + dim W.

Proof. Let {v1, . . . , vn} and {w1, . . . , wm} be bases for V and W , respectively.

Then B = {(v1, 0),..., (vn, 0)} ∪ {(0, w1),..., (0, wm)} is a basis for V × W . To see that B spans V × W , let (v, w) ∈ V × W . Then v and w can be written uniquely as v = α1v1 + ··· + αnvn and w = β1w1 + ··· + βmwm, where αi, βj ∈ F for all i and j. Then

$$(v, w) = (v, 0) + (0, w) = \sum_{i=1}^{n} \alpha_i (v_i, 0) + \sum_{j=1}^{m} \beta_j (0, w_j).$$

Now, let $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_m$ be elements in F such that
$$\sum_{i=1}^{n} \alpha_i (v_i, 0) + \sum_{j=1}^{m} \beta_j (0, w_j) = (0, 0).$$
Then
$$\Big( \sum_{i=1}^{n} \alpha_i v_i,\; \sum_{j=1}^{m} \beta_j w_j \Big) = (0, 0).$$
Hence
$$\sum_{i=1}^{n} \alpha_i v_i = 0 \quad\text{and}\quad \sum_{j=1}^{m} \beta_j w_j = 0.$$

It follows that αi = 0 and βj = 0 for all i, j.

Proposition 1.6.2 shows that the dimension of V ×W is the sum of the dimen- sions of V and W if they are finite-dimensional. It suggests that the Cartesian product V × W is really a “sum” and not a product of vector spaces. That is why we call it the external direct sum. The adjective external is to emphasize that we construct a new vector space from the existing ones. Next, we turn to constructing a new subspace from existing ones. We know that an intersection of subspaces is still a subspace, but a union of subspaces may not be a subspace. The sum of subspaces will play a role of union as we will see below.

Definition 1.6.3. Let W1 and W2 be subspaces of a vector space V . Define the

sum of W1 and W2 to be

W1 + W2 = {w1 + w2 | w1 ∈ W1, w2 ∈ W2}.

Proposition 1.6.4. Let W1 and W2 be subspaces of a vector space V over a field

F . Then W1 + W2 is a subspace of V generated by W1 ∪ W2, i.e.

W1 + W2 = hW1 ∪ W2i .

Proof. Clearly $W_1 + W_2 \ne \emptyset$. Let $x, y \in W_1 + W_2$ and $k \in F$. Then there exist $w_1, w_1' \in W_1$ and $w_2, w_2' \in W_2$ such that $x = w_1 + w_2$ and $y = w_1' + w_2'$. Hence
$$x + y = (w_1 + w_2) + (w_1' + w_2') = (w_1 + w_1') + (w_2 + w_2') \in W_1 + W_2;$$
$$kx = k(w_1 + w_2) = (kw_1) + (kw_2) \in W_1 + W_2.$$

Thus W1 + W2 is a subspace of V . Next, note that W1 and W2 are subsets of

W1 + W2, which implies W1 ∪ W2 ⊆ W1 + W2. Hence hW1 ∪ W2i ⊆ W1 + W2.

Now let x ∈ W1 + W2. Then x = w1 + w2, where w1 ∈ W1 and w2 ∈ W2. Thus w1 and w2 belong to W1 ∪ W2. This implies x = w1 + w2 ∈ hW1 ∪ W2i . It follows that W1 + W2 ⊆ hW1 ∪ W2i.

Example 1.6.5.

(i) We can write $\mathbb{R}^2$ as a sum of two subspaces in several ways:
$$\mathbb{R}^2 = \langle(1,0)\rangle + \langle(0,1)\rangle = \langle(1,0)\rangle + \langle(1,1)\rangle = \langle(1,-1)\rangle + \langle(1,1)\rangle.$$

3 (ii) R can be written as a sum of the xy-plane and the yz-plane:

3 R = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}. (1.11)

3 Also, R can be written as a sum of the xy-plane and the z-axis:

3 R = {(x, y, 0) | x, y ∈ R} + {(0, 0, z) | z ∈ R}. (1.12)

3 Note that in (1.11), R is a sum of subspaces of dimension 2, while in (1.12), 3 R is a sum of a subspace of dimension 2 and a subspace of dimension 1.

If V = W1 + W2, every element v in V can be written as v = w1 + w2, where w1 ∈ W1 and w2 ∈ W2, but this representation need not be unique. It will be unique when W1 ∩ W2 = {0}, which is called the (internal) direct sum.

Definition 1.6.6. Let W1 and W2 be subspaces of a vector space V . We say that V is the (internal) direct sum of W1 and W2, written as V = W1 ⊕ W2, if

V = W1 + W2 and W1 ∩ W2 = {0}.

Proposition 1.6.7. Let W1 and W2 be subspaces of a vector space V . Then

V = W1 ⊕ W2 if and only if every v ∈ V can be written uniquely as v = w1 + w2 for some w1 ∈ W1 and w2 ∈ W2.

Proof. Assume that $V = W_1 \oplus W_2$. By the definition of $W_1 + W_2$, every v ∈ V can be written as $v = w_1 + w_2$ for some $w_1 \in W_1$ and $w_2 \in W_2$. Assume that $v = w_1 + w_2 = w_1' + w_2'$, where $w_1, w_1' \in W_1$ and $w_2, w_2' \in W_2$. Then $w_1 - w_1' = w_2' - w_2 \in W_1 \cap W_2 = \{0\}$. This shows that $w_1 = w_1'$ and $w_2 = w_2'$.
Conversely, it is easy to see that $V = W_1 + W_2$. Let $v \in W_1 \cap W_2$. Then we can write $v = v + 0 \in W_1 + W_2$ and $v = 0 + v \in W_1 + W_2$. By the uniqueness part of the assumption, we have v = 0. Hence $W_1 \cap W_2 = \{0\}$ and thus $V = W_1 \oplus W_2$.

Example 1.6.8.

3 (i) In Example 1.6.5, we have the following sum of R :

3 R = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}.

However, this is not a direct sum because

{(x, y, 0) | x, y ∈ R} ∩ {(0, y, z) | y, z ∈ R} = {(0, y, 0) | y ∈ R}.

On the other hand, we have the following direct sum

3 R = {(x, y, 0) | x, y ∈ R} ⊕ {(0, 0, z) | z ∈ R}.

(ii) Let V = Mn(R) and let W1 and W2 be the subspaces of symmetric matrices and of skew-symmetric matrices, respectively:

$$W_1 = \{A \in M_n(\mathbb{R}) \mid A^t = A\} \quad\text{and}\quad W_2 = \{A \in M_n(\mathbb{R}) \mid A^t = -A\}.$$

We leave it as an exercise to show that V = W1 ⊕ W2.
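The direct sum in (ii) comes from the explicit decomposition $A = \tfrac12(A + A^t) + \tfrac12(A - A^t)$. The following sketch (ours, illustrative only) checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

S = (A + A.T) / 2        # symmetric part, lies in W1
K = (A - A.T) / 2        # skew-symmetric part, lies in W2

assert np.allclose(S, S.T)
assert np.allclose(K, -K.T)
assert np.allclose(A, S + K)   # V = W1 + W2; W1 ∩ W2 = {0} gives uniqueness
```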

Theorem 1.6.9. Let W1 and W2 be subspaces of a finite-dimensional vector space. Then

dim(W1 + W2) = dim W1 + dim W2 − dim(W1 ∩ W2).

Proof. Let B = {v1, . . . , vk} be a basis for W1 ∩ W2. Then B is a linearly inde- pendent subset of W1 and W2. Extend it to a basis B1 = {v1, . . . , vk, w1, . . . , wn} 0 0 for W1 and also extend it to a basis B2 = {v1, . . . , vk, w1, . . . , wm} for W2. Note 0 that wi 6= wj for any i, j; for otherwise they are in W1 ∩ W2. We will show that 0 0 the set B = B1 ∪B2 = {v1, . . . , vk, w1, . . . , wn, w1, . . . , wm} is a basis for W1 +W2. Once we establish this fact, it then follows that

$$\dim(W_1 + W_2) = k + n + m = \dim W_1 + \dim W_2 - \dim(W_1 \cap W_2),$$
since $\dim W_1 = k + n$, $\dim W_2 = k + m$ and $\dim(W_1 \cap W_2) = k$.

To show that B spans W1 + W2, let v = u1 + u2, where u1 ∈ W1 and u2 ∈ W2.

Since B1 and B2 are bases for W1 and W2, respectively, we can write

$$u_1 = \alpha_1 v_1 + \cdots + \alpha_k v_k + \alpha_{k+1} w_1 + \cdots + \alpha_{k+n} w_n$$
and
$$u_2 = \beta_1 v_1 + \cdots + \beta_k v_k + \beta_{k+1} w_1' + \cdots + \beta_{k+m} w_m',$$
where the $\alpha_i$'s and $\beta_j$'s are in F. Then
$$u_1 + u_2 = \sum_{i=1}^{k} (\alpha_i + \beta_i) v_i + \sum_{i=1}^{n} \alpha_{k+i} w_i + \sum_{i=1}^{m} \beta_{k+i} w_i'.$$
Hence B spans $W_1 + W_2$. To establish linear independence of B, let $\alpha_1, \dots, \alpha_k$, $\beta_1, \dots, \beta_n$ and $\beta_1', \dots, \beta_m'$ be elements in F such that
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{i=1}^{n} \beta_i w_i + \sum_{i=1}^{m} \beta_i' w_i' = 0. \tag{1.13}$$
Then
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{i=1}^{n} \beta_i w_i = -\sum_{i=1}^{m} \beta_i' w_i' \in W_1 \cap W_2.$$
Hence
$$-\sum_{i=1}^{m} \beta_i' w_i' = \sum_{i=1}^{k} \gamma_i v_i$$
for some $\gamma_1, \dots, \gamma_k$ in F, which implies
$$\sum_{i=1}^{k} \gamma_i v_i + \sum_{i=1}^{m} \beta_i' w_i' = 0.$$
By linear independence of $B_2$, the $\gamma_i$ and the $\beta_i'$ are all zero. Now, (1.13) reduces to
$$\sum_{i=1}^{k} \alpha_i v_i + \sum_{i=1}^{n} \beta_i w_i = 0.$$

By linear independence of B1, we see that αi and βi are all zero. Hence B is a basis for W1 + W2.
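For subspaces of $F^n$ given by spanning vectors, $\dim(W_1 + W_2)$ is the rank of the stacked spanning vectors, so the theorem can be checked on concrete examples. The sketch below (ours) uses the xy-plane and the yz-plane in $\mathbb{R}^3$, whose intersection is the y-axis.

```python
import numpy as np

W1 = np.array([[1., 0., 0.], [0., 1., 0.]])   # spans the xy-plane
W2 = np.array([[0., 1., 0.], [0., 0., 1.]])   # spans the yz-plane

dim_W1 = np.linalg.matrix_rank(W1)            # 2
dim_W2 = np.linalg.matrix_rank(W2)            # 2
dim_sum = np.linalg.matrix_rank(np.vstack([W1, W2]))   # dim(W1 + W2) = 3

# The intersection is the y-axis, of dimension 1, in agreement with
# dim(W1 + W2) = dim W1 + dim W2 - dim(W1 ∩ W2).
assert dim_sum == dim_W1 + dim_W2 - 1
```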

Corollary 1.6.10. If W1 and W2 are subspaces of a finite-dimensional vector space V and V = W1 ⊕ W2, then dim V = dim W1 + dim W2.

Proof. If $V = W_1 \oplus W_2$, then $W_1 \cap W_2 = \{0\}$, and thus $\dim(W_1 \cap W_2) = 0$. The result now follows from Theorem 1.6.9.

The next proposition shows the relation between internal and external direct sums.

Proposition 1.6.11. Let $W_1$ and $W_2$ be subspaces of a vector space V. Suppose that $V = W_1 \oplus W_2$. Then $V \cong W_1 \times W_2$.
On the other hand, let V and W be vector spaces over a field F. Let $X = V \times W$ be the external direct sum of V and W. Let $X_1 = \{(v, 0) \mid v \in V\}$ and $X_2 = \{(0, w) \mid w \in W\}$. Then $X_1$ and $X_2$ are subspaces of X, $X_1 \cong V$, $X_2 \cong W$

and X = X1 ⊕ X2.

Proof. Exercise.

In the future, we will talk about a direct sum without stating whether it is internal or external. We also write V ⊕ W to denote the (external) direct sum of V and W . It should be clear from the context whether it is internal or external. Moreover, by Proposition 1.6.11, we can regard it as an internal direct sum or an external direct sum without confusion. We sometimes omit the adjective “internal” or “external” and simply talk about the direct sum of vector spaces.

Proposition 1.6.12. Let W be a subspace of a vector space V . Then there exists a subspace U of V such that V = U ⊕ W .

Proof. Let B be a basis for W . Then B is linearly independent in V and hence can be extended to a basis C for V . Let B0 = C − B and U = hB0i. It is easy to check that U is a subspace of V such that V = W ⊕ U.

Proposition 1.6.13. Let V1 and V2 be subspaces of a vector space V such that

V = V1 ⊕ V2. Then given any vector space W and linear maps T1 : V1 → W and

T2 : V2 → W , there is a unique linear map T : V1 ⊕ V2 → W such that T |V1 = T1

and T |V2 = T2.

Proof. Assume that there is a linear map T : V1 ⊕ V2 → W such that T |V1 = T1 and T |V2 = T2. By linearity, for any v1 ∈ V1 and v2 ∈ V2,

T (v1 + v2) = T (v1) + T (v2) = T1(v1) + T2(v2).

Hence we define the map T : V1 ⊕ V2 → W by

T (v1 + v2) = T1(v1) + T2(v2) for any v1 ∈ V1, v2 ∈ V2.

It is easy to show that T is linear and satisfies T |V1 = T1 and T |V2 = T2. This finishes the uniqueness and existence of the map T .

Proposition 1.6.13 is the universal mapping property for the direct sum. If we let ι1 : V1 → V1 ⊕ V2 and ι2 : V2 → V1 ⊕ V2 be the inclusion maps of V1 and V2 into V1 ⊕ V2, respectively, then it can be summarized by the following diagram:

(Commutative diagram: the inclusions $\iota_1 : V_1 \to V_1 \oplus V_2$ and $\iota_2 : V_2 \to V_1 \oplus V_2$, with $T \circ \iota_1 = T_1$ and $T \circ \iota_2 = T_2$.)

This proposition can also be interpreted for the external direct sum if we define

ι1 : V1 → V1 ⊕ V2 and ι2 : V2 → V1 ⊕ V2 by ι1(v1) = (v1, 0) and ι2(v2) = (0, v2) for any v1 ∈ V1 and v2 ∈ V2. There is also another universal mapping property of the direct sum in terms of the projection maps.

Proposition 1.6.14. Let V1 and V2 be vector spaces over the same field. For i = 1, 2, define πi : V1 ⊕ V2 → Vi by πi(v1, v2) = vi for any v1 ∈ V1 and v2 ∈ V2.

Then given any vector space W and linear maps T1 : W → V1 and T2 : W → V2, there is a unique linear map T : W → V1⊕V2 such that π1◦T = T1 and π2◦T = T2.

(Commutative diagram: the projections $\pi_1, \pi_2$ from $V_1 \oplus V_2$, with $\pi_1 \circ T = T_1$ and $\pi_2 \circ T = T_2$.)

Proof. Exercise.

Next, we will define a sum and a direct sum for a finite number of subspaces.

Definition 1.6.15. Let W1,...,Wn be subspaces of a vector space V . Define

W1 + ··· + Wn = {w1 + ··· + wn | w1 ∈ W1, . . . , wn ∈ Wn}.

Proposition 1.6.16. If W1,...,Wn are subspaces of a vector space V , then

W1 + ··· + Wn is a subspace of V generated by W1 ∪ · · · ∪ Wn:

W1 + ··· + Wn = hW1 ∪ · · · ∪ Wni .

Proof. The proof is the same as that of Proposition 1.6.4.

Definition 1.6.17. Let W1,...,Wn be subspaces of a vector space V . We say that V is the (internal) direct sum of W1,..., Wn if

(i) V = W1 + ··· + Wn, and

(ii) Wi ∩ (W1 + ··· + Wi−1 + Wi+1 + ··· + Wn) = {0} for i = 1, . . . , n.

Denote it by V = W1 ⊕ · · · ⊕ Wn.

The second condition in the above definition can be replaced by one of the following equivalent statements below.

Proposition 1.6.18. Let W1,...,Wn be subspaces of a vector space V and let

V = W1 + ··· + Wn. Then TFAE:

(i) Wi ∩ (W1 + ··· + Wi−1 + Wi+1 + ··· + Wn) = {0} for i = 1, . . . , n;

(ii) ∀w1 ∈ W1 ... ∀wn ∈ Wn, w1 + ··· + wn = 0 ⇒ w1 = ··· = wn = 0;

(iii) every v ∈ V can be written uniquely as v = w1 + ··· + wn, with wi ∈ Wi.

Proof. (i) ⇒ (ii). Assume (i) holds. Let w1 ∈ W1,..., wn ∈ Wn be such that w1 + ··· + wn = 0. For each i ∈ {1, . . . , n}, we see that

−wi = w1 + ··· + wi−1 + wi+1 + ··· + wn

∈ Wi ∩ (W1 + ··· + Wi−1 + Wi+1 + ··· + Wn) = {0}.

Hence wi = 0 for each i ∈ {1, . . . , n}. (ii) ⇒ (iii). Assume (ii) holds. Suppose an element v ∈ V can be written as

0 0 v = w1 + ··· + wn = w1 + ··· + wn, 56 CHAPTER 1. VECTOR SPACES

0 where wi, wi ∈ Wi for i = 1, . . . , n. Then

0 0 (w1 − w1) + ··· + (wn − wn) = 0.

0 By the assumption, wi = wi for i = 1, . . . , n. (iii) ⇒ (i). Assume (iii) holds. Let v ∈ Wi ∩(W1 +···+Wi−1 +Wi+1 +···+Wn).

Then v ∈ Wi and

$$v = w_1 + \cdots + w_{i-1} + 0 + w_{i+1} + \cdots + w_n,$$
where $w_j \in W_j$ for $j \ne i$. On the other hand, v can also be written with v in the i-th position and 0 in all other positions. By the uniqueness in (iii), we see that v = 0. This means that

Wi ∩ (W1 + ··· + Wi−1 + Wi+1 + ··· + Wn) = {0} for i = 1, 2, . . . , n.

The concept of a direct product or an external direct sum of an arbitrary number of vector spaces can be defined similarly. We first start with the case when there are finitely many vector spaces.

Definition 1.6.19. Let W1,...,Wn be vector spaces over a field F . Define

W1 × · · · × Wn = {(w1, . . . , wn) | w1 ∈ W1, . . . , wn ∈ Wn}.

Define the vector space operations componentwise. Then W1 × · · · × Wn is a vector space over F , called the external direct sum of W1,..., Wn.

We list important results for a finite direct sum of vector spaces whose proofs are left as exercises.

Proposition 1.6.20. Let $W_1, \dots, W_n$ be subspaces of a vector space V. Suppose that $V = W_1 \oplus \cdots \oplus W_n$. Then $V \cong W_1 \times \cdots \times W_n$.

On the other hand, let V1,...,Vn be vector spaces. Let X = V1 × · · · × Vn be the external direct sum of V1,...,Vn. For i = 1, . . . , n, let

Xi = {(v1, . . . , vn) | vi ∈ Vi and vj = 0 for j 6= i}.

Then each $X_i$ is a subspace of X, $X_i \cong V_i$ and $X = X_1 \oplus \cdots \oplus X_n$.

Proof. Exercise.

We will also denote the external direct sum of V1,...,Vn by V1 ⊕ · · · ⊕ Vn.

Proposition 1.6.21. Assume that V = V1 ⊕ · · · ⊕ Vn, where V1,...,Vn are subspaces of a vector space V . For i = 1, . . . , n, let Bi be a linearly independent subset of Vi. Then B1 ∪ · · · ∪ Bn is a linearly independent subset of V . In particular, if Bi is a basis for each Vi, then B1 ∪ · · · ∪ Bn is a basis for V . Proof. Exercise.

Corollary 1.6.22. Let V1,...,Vn be subspaces of a finite-dimensional vector space V such that V = V1 ⊕ · · · ⊕ Vn. Then

dim V = dim V1 + ··· + dim Vn. Proof. Exercise.

In the above definition, we see that a direct product and an external direct sum are the same when we have a finite number of vector spaces. Next, we consider the general case when we have an arbitrary number of vector spaces. In this case, the definitions of a direct product and an external direct sum will be different. But there is a close relation between the internal direct sum and the external direct sum.

Definition 1.6.23. Let $\{V_\alpha\}_{\alpha \in \Lambda}$ be a family of vector spaces. Define the Cartesian product
$$\prod_{\alpha \in \Lambda} V_\alpha = \Big\{\, v : \Lambda \to \bigcup_{\alpha \in \Lambda} V_\alpha \;:\; v(\alpha) \in V_\alpha \text{ for all } \alpha \in \Lambda \,\Big\}.$$
Define the following operations:
$$(v + w)(\alpha) = v(\alpha) + w(\alpha), \qquad (kv)(\alpha) = k\,v(\alpha),$$
for any $v, w \in \prod_{\alpha \in \Lambda} V_\alpha$ and $\alpha \in \Lambda$. It is easy to check that $\prod_{\alpha \in \Lambda} V_\alpha$ is a vector space under the operations defined above. It is called the direct product of $\{V_\alpha\}_{\alpha \in \Lambda}$. Next, we define
$$\bigoplus_{\alpha \in \Lambda} V_\alpha = \Big\{\, v \in \prod_{\alpha \in \Lambda} V_\alpha \;:\; v(\alpha) = 0 \text{ for all but finitely many } \alpha \,\Big\}.$$
It is easy to see that $\bigoplus_{\alpha \in \Lambda} V_\alpha$ is a subspace of $\prod_{\alpha \in \Lambda} V_\alpha$. We call $\bigoplus_{\alpha \in \Lambda} V_\alpha$ the (external) direct sum of $\{V_\alpha\}_{\alpha \in \Lambda}$. Note that the direct product and the external direct sum of $\{V_\alpha\}_{\alpha \in \Lambda}$ are the same when the index set Λ is finite.

Now we define an arbitrary internal direct sum of a vector space.

Definition 1.6.24. Let V be a vector space over a field F. Let $\{V_\alpha\}_{\alpha \in \Lambda}$ be a family of subspaces of V such that
(i) $V = \big\langle \bigcup_{\alpha \in \Lambda} V_\alpha \big\rangle$;
(ii) for each $\beta \in \Lambda$, $V_\beta \cap \big\langle \bigcup_{\alpha \in \Lambda - \{\beta\}} V_\alpha \big\rangle = \{0\}$.

Then we say that V is the (internal) direct sum of $\{V_\alpha\}_{\alpha \in \Lambda}$ and denote it by $V = \bigoplus_{\alpha \in \Lambda} V_\alpha$. An element in $\bigoplus_{\alpha \in \Lambda} V_\alpha$ can be written as a finite sum $\sum_{\alpha \in \Lambda} v_\alpha$, where $v_\alpha \in V_\alpha$ for each $\alpha \in \Lambda$ and $v_\alpha = 0$ for all but finitely many α's. Moreover, this representation is unique.

Theorem 1.6.25. Let $\{V_\alpha\}_{\alpha \in \Lambda}$ be a family of vector spaces. Form the external direct sum $V = \bigoplus_{\alpha \in \Lambda} V_\alpha$. For each $\alpha \in \Lambda$, let $W_\alpha$ be the subspace of V defined by
$$W_\alpha = \{v \in V \mid v(\beta) = 0 \text{ for all } \beta \in \Lambda - \{\alpha\}\}.$$
Then $W_\alpha \cong V_\alpha$ for each $\alpha \in \Lambda$ and $V = \bigoplus_{\alpha \in \Lambda} W_\alpha$ as an internal direct sum.
On the other hand, let V be a vector space over a field F and $\{W_\alpha\}_{\alpha \in \Lambda}$ a family of subspaces of V such that $V = \bigoplus_{\alpha \in \Lambda} W_\alpha$ as an internal direct sum. Form the external direct sum $W = \bigoplus_{\alpha \in \Lambda} W_\alpha$. Then $V \cong W$.

Proof. Exercise.

Exercises

1.6.1. Let V = Mn(R) be a vector space over R. Define

$$W_1 = \{A \in M_{n\times n}(\mathbb{R}) \mid A^t = A\} \quad\text{and}\quad W_2 = \{A \in M_{n\times n}(\mathbb{R}) \mid A^t = -A\}.$$

(a) Prove that W1 and W2 are subspaces of V .

(b) Prove that V = W1 ⊕ W2.

1.6.2. Let A1,...,An be subsets of a vector space V . Show that

hA1 ∪ · · · ∪ Ani = hA1i + ··· + hAni.

1.6.3. Assume V = U ⊕ W , where U and W are subspaces of V . For any v ∈ V , there exists a unique pair (u, w) where u ∈ U and w ∈ W such that v = u + w. Define P (v) = u and Q(v) = w. Prove that

(i) P and Q are linear maps on V ;

(ii) P 2 = P and Q2 = Q;

(iii) P (V ) = U and Q(V ) = W .

1.6.4. Let P : V → V be a linear map on a vector space V such that P 2 = P . Prove that V = im P ⊕ ker P.

1.6.5. Prove Theorem 1.6.11.

1.6.6. Prove Theorem 1.6.14.

1.6.7. Let U, V and W be vector spaces. Prove that

(i) $L(U \oplus V, W) \cong L(U, W) \oplus L(V, W)$;

(ii) $L(U, V \oplus W) \cong L(U, V) \oplus L(U, W)$.

Now generalize these statements to finite direct sums:

(iii) $L\big(\bigoplus_{i=1}^{n} V_i, W\big) \cong \bigoplus_{i=1}^{n} L(V_i, W)$;

(iv) $L\big(U, \bigoplus_{i=1}^{n} V_i\big) \cong \bigoplus_{i=1}^{n} L(U, V_i)$.

1.6.8. Let W1,...,Wn be subspaces of a vector space V . Show that V = W1 ⊕

· · · ⊕ Wn if and only if there exist linear maps P1,...,Pn on V such that

(i) IV = P1 + ··· + Pn;

(ii) PiPj = 0 for any i 6= j;

(iii) Pi(V ) = Wi for each i. 2 Moreover, show that if Pi satisfy (i) and (ii) for each i, then Pi = Pi for each i. 1.6.9. Prove Proposition 1.6.20.

1.6.10. Let V1,...,Vn be subspaces of a vector space V such that V = V1 ⊕

· · · ⊕ Vn. Prove that given any vector space W and linear maps Ti : Vi → W for i = 1, . . . , n, there is a unique linear map T : V → W such that T |Vi = Ti for i = 1, . . . , n. 1.6.11. Prove Proposition 1.6.21 and Corollary 1.6.22.

1.6.12. Let W1,...,Wn be subspaces of a finite-dimensional vector space V such that V = W1 + ··· + Wn. Prove that V = W1 ⊕ · · · ⊕ Wn if and only if dim V = dim W1 + ··· + dim Wn. 1.6.13. Prove Theorem 1.6.25.

1.6.14. Let {Vα}α∈Λ be a family of vector spaces. For each α ∈ Λ, define the Q Q α-th projection πα : α∈Λ Vα → Vα by πα(v) = v(α) for each v ∈ α∈Λ Vα, and Q Q the α-th inclusion ια : Vα → α∈Λ Vα by ια(x) = v ∈ α∈Λ Vα, where v(α) = x and v(β) = 0 for any β 6= α. Q L Let V = α∈Λ Vα and let U = α∈Λ Vα. Prove the following statements:

(i) παια = IVα , the identity map on Vα, for each α ∈ Λ;

(ii) παιβ = 0 for all α 6= β in Λ;

(iii) πα is surjective and ια is injective for all α ∈ Λ;

(iv) given a vector space W and a family Tα : W → Vα for each α ∈ Λ, there is

a unique linear map T : W → V such that πα ◦ T = Tα for each α ∈ Λ;

(v) given a vector space W and a family Sα : Vα → W for each α ∈ Λ, there is

a unique linear map S : U → W such that S ◦ ια = Sα for each α ∈ Λ.

1.7 Quotient Spaces

Definition 1.7.1. Let W be a subspace of a vector space V . For each v ∈ V , define the affine space of v to be

v + W = {v + w | w ∈ W }.

Note that u ∈ v + W if and only if u = v + w for some w ∈ W . In general, an affine space v + W is not a subspace of V because 0 may not be in v + W .

Proposition 1.7.2. Let W be a subspace of a vector space V . Then for any u, v ∈ V ,

(i) u + W = v + W ⇔ u − v ∈ W ;

(ii) u + W 6= v + W ⇒ (u + W ) ∩ (v + W ) = ∅.

Proof. (i) Assume that u + W = v + W . Since u = u + 0 ∈ u + W , we have u = v + w for some w ∈ W . Hence u − v = w ∈ W . Conversely, assume that u − v ∈ W . Then u + w = v + (u − v + w) ∈ v + W for any w ∈ W . Thus u + W ⊆ v + W . Since W is a subspace of V , u − v ∈ W implies v − u ∈ W and hence v + w = u + (v − u + w) ∈ u + W for any w ∈ W . It follows that v + W ⊆ u + W . Thus u + W = v + W . (ii) Suppose there exists x ∈ (u + W ) ∩ (v + W ). Then x = u + w = v + w0 for some w, w0 ∈ W . Hence u − v = w0 − w ∈ W . By part (i), it follows that u + W = v + W .

Definition 1.7.3. Let V be vector space over a field F and W a subspace of V . Define V/W = {v + W | v ∈ V }.

Define the vector space operations on V/W by

$$(u + W) + (v + W) = (u + v) + W, \qquad k(v + W) = kv + W,$$
for any $u + W, v + W \in V/W$ and $k \in F$. It is easy to check that these operations are well-defined and that V/W is a vector space over F under the operations defined above. The space V/W is called the quotient space of V modulo W.

Proposition 1.7.4. Let W be a subspace of a vector space V . Define the canon- ical map π : V → V/W by

π(v) = v + W for any v ∈ V.

Then π is a surjective linear map with ker π = W .

Proof. Exercise.

Theorem 1.7.5 (Universal Mapping Property). Let V be a vector space and W ≤ V. Given any vector space U and a linear map $t : V \to U$ such that t(w) = 0 for all w ∈ W, i.e. $\ker t \supseteq W$, there exists a unique linear map $T : V/W \to U$ such that $T \circ \pi = t$. (Commutative diagram: $\pi : V \to V/W$, $t : V \to U$, and $T : V/W \to U$ with $T \circ \pi = t$.)

Proof. If there exists a linear map $T : V/W \to U$ such that $T \circ \pi = t$, then we must have

T (v + W ) = T (π(v)) = T ◦ π(v) = t(v) for each v ∈ V.

Hence if it exists, it must be defined by the formula T (v + W ) = t(v) for any v ∈ V . Now, we show that this is a well-defined linear map. Assume that v1 + W = v2 + W . Then v1 − v2 ∈ W ⊆ ker t. Hence t(v1) = t(v2). This shows that T is well-defined. It is then easy to check that T is linear and T ◦ π = t. The uniqueness of T follows from the discussion above.

Theorem 1.7.6 (First Isomorphism Theorem). Let $t : V \to U$ be a surjective linear map. Then $V/\ker t \cong U$.

Proof. By the Universal Mapping Property (Theorem 1.7.5), there is a unique linear map T : V/ ker t → U such that T ◦ π = t. Since t = T ◦ π is surjective,

T is surjective. To show that T is 1-1, we will prove that ker T = {ker t} (recall that ker t is the zero in V/ ker t). Let v + ker t ∈ ker T . Then

0 = T (v + ker t) = T (π(v)) = t(v).

Hence v ∈ ker t. That is, v + ker t = ker t. On the other hand,

T (ker t) = T (π(0)) = t(0) = 0.

Hence ker t ∈ ker T .

Theorem 1.7.7 (Second Isomorphism Theorem). Let U and W be subspaces of a vector space V. Then $(U + W)/W \cong U/(U \cap W)$.

Proof. Exercise.

Corollary 1.7.8. Let $V = U \oplus W$. Then $(U \oplus W)/W \cong U$.

Theorem 1.7.9 (Third Isomorphism Theorem). Let W ≤ U and U ≤ V. Then $(V/W)/(U/W) \cong V/U$.

Proof. Exercise.

Theorem 1.7.10. Let V be a finite-dimensional vector space and W a subspace of V . Then dim(V/W ) = dim V − dim W.

Proof. Note that the canonical map π : V → V/W is a surjective linear map whose kernel is W . Now the result follows from Theorem 1.3.8.

Exercises

1.7.1. Let W be a subspace of a vector space V . Define a relation ∼ on V by

u ∼ v if u − v ∈ W.

Prove that ∼ is an equivalence relation on V and the equivalence classes are the affine spaces of V .

1.7.2. Prove that an affine space v + W is a subspace of V if and only if v ∈ W , in which case v + W = W .

1.7.3. Prove Proposition 1.7.4.

1.7.4. Let F be a field and let V = F [x]. Define W by

n W = {a0 + a1x + ··· + anx ∈ F [x]: n ∈ N ∪ {0}a0 + a1 + ··· + an = 0}.

Prove that W is a subspace of V and find dim(V/W ).

1.7.5. Let T : V → W be a linear map between vector spaces V and W . Let A and B be subspaces of V and W , respectively such that T [A] ⊆ B. Denote by p: V → V/A and q : W → W/B their respective canonical maps. Prove that there is a unique linear map T˜ : V/A → W/B such that T˜ ◦ p = q ◦ T .

(Commutative diagram: $\tilde{T} \circ p = q \circ T$.)

Furthermore, prove that

(i) $\tilde{T}$ is 1-1 if and only if $A = T^{-1}[B]$;

(ii) $\tilde{T}$ is onto if and only if $B + \operatorname{im} T = W$.

1.7.6. Prove the Second and Third Isomorphism Theorems.

1.8 Dual Spaces

We know that if V and W are vector spaces over a field F , the set L(V,W ) of linear maps from V into W is also a vector space over F . In this section, we consider the special case where W = F . It plays an important role in various subjects such as differential geometry, functional analysis, quantum mechanics.

Definition 1.8.1. Let V be a vector space over a field F . A linear map T : V → F is called a linear functional on V . The set of linear functionals on V is called the dual space or the conjugate space of V , denoted by V ∗:

V ∗ = L(V,F ) = Hom(V,F ).

By Proposition 1.3.20, V ∗ is a vector space over a field F . Hence if V is a finite-dimensional vector space, then so is V ∗ and dim V = dim V ∗.

Example 1.8.2.

n (i) For i = 1, . . . , n, let pi : F → F be defined by pi(a1, . . . , an) = ai for any n (a1, . . . , an) ∈ F . It is easy to see that each pi is a linear functional on F n, called the i-th coordinate function.

n n (ii) Let a = (a1, . . . , an) ∈ F . The map Ta : F → F defined by

Ta(x1, . . . , xn) = a · x = a1x1 + ··· + anxn,

n n for any x = (x1, . . . , xn) ∈ F , is a linear functional on F .

(iii) For each a ∈ F , define $E_a : F[x] \to F$ by $E_a(p) = p(a)$ for each $p \in F[x]$.

Then $E_a$ is a linear functional on F[x].
(iv) Define $T : C([a, b]) \to \mathbb{R}$ by $T(f) = \int_a^b f(x)\,dx$ for each $f \in C([a, b])$. Then T is a linear functional on C([a, b]).

(v) For any square matrix, its trace is the sum of all elements in the main

diagonal of the matrix. Define $\operatorname{tr} : M_n(F) \to F$ as follows:
$$\operatorname{tr}([a_{ij}]) = \sum_{i=1}^{n} a_{ii}.$$

The map tr is a linear functional on Mn(F ), called the trace function.

Proposition 1.8.3. Let $B = \{v_1, v_2, \dots, v_n\}$ be a basis for a finite-dimensional vector space V. For i = 1, 2, ..., n, define $v_i^* \in V^*$ on the basis B by
$$v_i^*(v_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j; \\ 0 & \text{if } i \ne j. \end{cases}$$

Then $B^* = \{v_1^*, v_2^*, \dots, v_n^*\}$ is a basis for $V^*$.

Proof. Let $f \in V^*$. We will show that $f = \sum_{i=1}^{n} f(v_i)\, v_i^*$. To see this, let g be the linear functional $\sum_{i=1}^{n} f(v_i)\, v_i^*$. Then for j = 1, 2, ..., n,
$$g(v_j) = \sum_{i=1}^{n} f(v_i)\, v_i^*(v_j) = \sum_{i=1}^{n} f(v_i)\, \delta_{ij} = f(v_j).$$
Hence f = g on the basis B and thus f = g on V. This implies that $B^*$ spans $V^*$. Next, let $\alpha_1, \dots, \alpha_n \in F$ be such that $\sum_{i=1}^{n} \alpha_i v_i^* = 0$. Applying both sides to $v_j$, for each j, we have
$$0 = \sum_{i=1}^{n} \alpha_i v_i^*(v_j) = \sum_{i=1}^{n} \alpha_i \delta_{ij} = \alpha_j.$$
Hence $\alpha_j = 0$ for j = 1, 2, ..., n. This shows that $B^*$ is linearly independent and thus a basis for $V^*$.
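For $V = F^n$ the dual basis can be computed concretely: if the vectors of B form the columns of an invertible matrix M, then $v_i^*$ is given by the i-th row of $M^{-1}$, since those rows pick out the B-coordinates. A small sketch (ours, with an arbitrary example basis):

```python
import numpy as np

M = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])        # columns v1, v2, v3: a basis B of R^3
D = np.linalg.inv(M)                # row i represents the functional v_i^*

# v_i^*(v_j) = delta_ij:
assert np.allclose(D @ M, np.eye(3))

# Any functional f (represented by a row vector) satisfies
# f = sum_i f(v_i) v_i^*, as in the proof of Proposition 1.8.3.
f = np.array([2., -1., 5.])                          # f(x) = 2x1 - x2 + 5x3
coeffs = np.array([f @ M[:, i] for i in range(3)])   # the values f(v_i)
assert np.allclose(coeffs @ D, f)
```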

Remark. Proposition 1.8.3 is not true if V is infinite-dimensional. For example, let $V = F[x]$ and $B = \{1, x, x^2, \dots\}$. Then B is a basis for V. Let $B^* = \{f_0, f_1, f_2, \dots\}$, where $f_k(x^n) = \delta_{kn}$. It is easy to check that $B^*$ is linearly independent, but $B^*$ does not span $V^*$. To see this, let $g \in V^*$ be defined on the basis B by $g(x^n) = 1$ for every $n \in \mathbb{N} \cup \{0\}$. Suppose $g \in \langle B^* \rangle$. Then
$$g = k_0 f_0 + k_1 f_1 + \cdots + k_m f_m,$$
for some $m \in \mathbb{N}$ and $k_0, \dots, k_m \in F$. Apply the above equation to $x^{m+1}$. Then $g(x^{m+1}) = 1$, but $f_k(x^{m+1}) = 0$ for $k = 0, 1, \dots, m$, which is a contradiction. Hence $g \notin \langle B^* \rangle$.

Definition 1.8.4. Let V be a vector space. For any subset S of V , the annihilator of S, denoted by S◦, is defined by

S◦ = {f ∈ V ∗ | f(x) = 0 for all x ∈ S}.

Proposition 1.8.5. Let S be a subset of a vector space V . Then

◦ ∗ ◦ (i) {0V } = V and V = {0V ∗ };

(ii) S◦ is a subspace of V ∗;

◦ ◦ (iii) For any subsets S1 and S2 of V , S1 ⊆ S2 implies S2 ⊆ S1 . Proof. (i) This follows from the definition of an annihilator. (ii) The proof is routine and we leave it to the reader. ∗ (iii) Assume that S1 ⊆ S2 ⊆ V . For any f ∈ V , if f(x) = 0 for all x ∈ S2, then ◦ ◦ f(x) = 0 for all x ∈ S1. Hence S2 ⊆ S1 .

Proposition 1.8.6. If W is a subspace of a finite-dimensional vector space V , then dim V = dim W + dim W ◦.

Proof. Let W be a subspace of V. Let $\{v_1, \dots, v_k\}$ be a basis for W and extend it to a basis $B = \{v_1, \dots, v_k, v_{k+1}, \dots, v_n\}$ for V. Let $B^* = \{v_1^*, \dots, v_n^*\}$ be the dual basis of B and let $C^* = \{v_{k+1}^*, \dots, v_n^*\}$. We will show that $C^*$ is a basis for $W^\circ$. Since $C^* \subseteq B^*$, it follows that $C^*$ is linearly independent. To see that $C^*$ spans $W^\circ$, let $f \in W^\circ$. We will show that $f = \sum_{i=k+1}^{n} f(v_i)\, v_i^*$. By the proof of Proposition 1.8.3, $f = \sum_{i=1}^{n} f(v_i)\, v_i^*$. Since $f \in W^\circ$ and $v_i \in W$ for i = 1, ..., k, we have $f(v_i) = 0$ for i = 1, ..., k. Hence $f = \sum_{i=k+1}^{n} f(v_i)\, v_i^* \in \operatorname{span} C^*$. Now, $\dim W^\circ = |C^*| = n - k = \dim V - \dim W$.

Next, we define the dual of a linear map. Given a linear map T : V → W , we can use T to turn a linear functional f on W into a linear functional on V just by the composition f ◦ T . Hence there is a map from W ∗ into V ∗ associated to T .

Definition 1.8.7. Let T : V → W be a linear map. Define T t : W ∗ → V ∗ by

T t(f) = f ◦ T for any f ∈ W ∗.

The map $T^t$ is called the transpose or the dual of T.

Proposition 1.8.8. Let V and W be vector spaces. Then

(i) If T ∈ L(V,W ), then T t ∈ L(W ∗,V ∗).

(ii) $(I_V)^t = I_{V^*}$.

(iii) (αS + βT )t = αSt + βT t for any S, T ∈ L(V,W ) and α, β ∈ F .

(iv) (TS)t = StT t for any S ∈ L(U, V ) and T ∈ L(V,W ).

(v) If T ∈ L(V,W ) is invertible, then T t is invertible and (T t)−1 = (T −1)t.

Proof. (i) Let f, g ∈ W ∗ and α, β ∈ F . Then

T t(αf + βg) = (αf + βg) ◦ T = α(f ◦ T ) + β(g ◦ T ) = α T t(f) + β T t(g).

Hence T t : W ∗ → V ∗ is linear. (ii) Note that

$$(I_V)^t(f) = f \circ I_V = f = I_{V^*}(f) \quad\text{for any } f \in V^*.$$

Hence $(I_V)^t = I_{V^*}$.
(iii) Let $S, T \in L(V, W)$ and $\alpha, \beta \in F$. Then for any $f \in W^*$,

(αS + βT )t(f) = f ◦ (αS + βT ) = α(f ◦ S) + β(f ◦ T ) = α St(f) + β T t(f).

Hence (αS + βT )t = αSt + βT t. (iv) Let S ∈ L(U, V ) and T ∈ L(V,W ). Then for any f ∈ W ∗,

(TS)t(f) = f ◦ (T ◦ S) = St(f ◦ T ) = St(T t(f)).

Hence (TS)t = StT t. (v) Assume that T ∈ L(V,W ) is invertible. Then there is S ∈ L(W, V ) such that

ST = IV and TS = IW . Then

$$T^t S^t = (ST)^t = (I_V)^t = I_{V^*} \quad\text{and}\quad S^t T^t = (TS)^t = (I_W)^t = I_{W^*}.$$

This shows that $T^t$ is invertible and $(T^t)^{-1} = S^t = (T^{-1})^t$.

Proposition 1.8.9. Let V and W be finite-dimensional vector spaces over F and T : V → W a linear map. Let B and C be ordered bases for V and W , respectively. Also, let B∗ and C∗ be the dual (ordered) bases of B and C, respectively. Then

$$[T^t]_{C^*,B^*} = [T]_{B,C}^{\,t}.$$

Proof. Let $B = \{v_1, \dots, v_n\}$ and $C = \{w_1, \dots, w_m\}$ be ordered bases for V and W, respectively. Let $B^* = \{v_1^*, \dots, v_n^*\}$ and $C^* = \{w_1^*, \dots, w_m^*\}$ be the dual bases of B and C, respectively. Let $A = [a_{ij}] = [T]_{B,C}$ and $B = [b_{ij}] = [T^t]_{C^*,B^*}$. Then for j = 1, ..., n,
$$T(v_j) = \sum_{k=1}^{m} a_{kj}\, w_k,$$
and for i = 1, ..., m,
$$T^t(w_i^*) = \sum_{k=1}^{n} b_{ki}\, v_k^*.$$
These two equalities imply that
$$a_{ij} = w_i^*(T(v_j)) = (T^t(w_i^*))(v_j) = b_{ji}$$
for i = 1, ..., m and j = 1, ..., n. This shows that $B = A^t$.
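Concretely, for $T = L_A : F^n \to F^m$ and standard dual bases, a functional $f \in (F^m)^*$ can be represented by the row vector c of its values on the standard basis; then $T^t(f) = f \circ T$ corresponds to $A^t c$, because $c \cdot (Av) = (A^t c) \cdot v$. A small sketch (ours, illustrative only):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])        # T = L_A : R^3 -> R^2
c = np.array([5., -2.])              # row vector representing f in (R^2)^*
v = np.array([1., 1., 2.])

# (T^t f)(v) = f(T(v)) = c . (A v) = (A^t c) . v
assert np.isclose(c @ (A @ v), (A.T @ c) @ v)
```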

If V is a vector space, its dual space V ∗ is also a vector space and hence we can again define the dual space (V ∗)∗ of V ∗. In the sense that we will describe below, the second dual V ∗∗ = (V ∗)∗ is closely related to the original space V , especially if V is finite-dimensional, V and V ∗∗ are isomorphic via a canonical (basis-free) linear isomorphism.

Definition 1.8.10. If V is a vector space, the dual space of V ∗, denoted by V ∗∗, is called the double dual or the second dual of V .

To establish the main result about the double dual space, the following propo- sition will be useful.

Proposition 1.8.11. Let V be a vector space and v ∈ V . If f(v) = 0 for any f ∈ V ∗, then v = 0. Equivalently, if v 6= 0, then there exists f ∈ V ∗ such that f(v) 6= 0. 70 CHAPTER 1. VECTOR SPACES

Proof. Assume that v 6= 0. Then {v} is linearly independent and thus can be extended to a basis B for V . Let t: B → F be defined by t(v) = 1 and t(x) = 0 for any x ∈ B − {v}. Extend t to a linear functional f on V . Hence f ∈ V ∗ and f(v) 6= 0.

Theorem 1.8.12. Let V be a vector space over a field F . For each v ∈ V , define vˆ: V ∗ → F by vˆ(f) = f(v) for any f ∈ V ∗. Then

(i) vˆ is a linear functional on V ∗, i.e. vˆ ∈ V ∗∗ for each v ∈ V .

(ii) the map Φ: V → V ∗∗, v 7→ vˆ, is an injective linear map.

(iii) If V is finite-dimensional, then Φ is a linear isomorphism.

Hence V =∼ V ∗∗, via the canonical map Φ if V is finite-dimensional.

Proof. (i) For any f, g ∈ V ∗ and α, β ∈ F ,

vˆ(αf + βg) = (αf + βg)(v) = αf(v) + βg(v) = αvˆ(f) + βvˆ(g).

This shows thatv ˆ is linear on V ∗ for each v ∈ V .

(ii) Let v, w ∈ V and α ∈ F . Then, for any f ∈ V ∗,

$$\widehat{v + w}(f) = f(v + w) = f(v) + f(w) = \hat{v}(f) + \hat{w}(f) = (\hat{v} + \hat{w})(f).$$
Similarly, for any $f \in V^*$,
$$\widehat{\alpha v}(f) = f(\alpha v) = \alpha f(v) = \alpha\, \hat{v}(f).$$
Hence $\widehat{v + w} = \hat{v} + \hat{w}$ and $\widehat{\alpha v} = \alpha\, \hat{v}$. This shows that Φ is linear. To see that it is 1-1, let v ∈ V be such that v ≠ 0. By Proposition 1.8.11, there exists $f \in V^*$ such that $f(v) \ne 0$. Thus $\hat{v}(f) \ne 0$, i.e., $\hat{v} \ne 0$. Hence Φ is 1-1.

(iii) If V is finite-dimensional, then $\dim V^{**} = \dim V^* = \dim V$. Since Φ is 1-1, by Theorem 1.3.16, it is a linear isomorphism.

Exercises

1.8.1. Consider C as a vector space over R. Prove that the dual basis for {1, i} is {Re, Im}, where Re and Im are the real part and the imaginary part, respectively, of a complex number.

1.8.2. Let V be a finite-dimensional vector space and U, W subspaces of V . Prove that

(i) (U + W )◦ = U ◦ ∩ W ◦.

(ii) (U ∩ W )◦ = U ◦ + W ◦.

(iii) If V = U ⊕ W , then V ∗ = U ◦ ⊕ W ◦.

Also try to prove these statements without assuming that V is finite-dimensional.

1.8.3. Let V be a finite-dimensional vector space and W a subspace of V . Prove that W =∼ (W ◦)◦, under the canonical isomorphism between V and V ∗∗.

1.8.4. Let V be a vector space. For any M ⊆ V ∗, define the annihilator ◦M of M by ◦M = {x ∈ V | f(x) = 0 ∀f ∈ M}.

Prove that

◦ ◦ ∗ (i) {0V ∗ } = V and (V ) = {0V }.

(ii) ◦M is a subspace of V .

∗ ◦ ◦ (iii) For any M1, M2 ⊆ V , if M1 ⊆ M2, then M2 ⊆ M1.

(iv) if V is finite dimensional, then

dim V = dim V ∗ = dim W + dim(◦W ).

1.8.5. Let V and W be vector spaces. Prove that $(V \oplus W)^* \cong V^* \oplus W^*$, where the direct sums are external.

1.8.6. Let {Vα}α∈Λ be a family of vector spaces. Prove that

$$\Big( \bigoplus_{\alpha \in \Lambda} V_\alpha \Big)^* \cong \prod_{\alpha \in \Lambda} V_\alpha^*.$$
1.8.7. Let V be a vector space. Prove the following statements:

(i) If U is a proper subspace of V and x ∈ V − U, then there exists f ∈ V ∗ such that f(x) = 1, but f(U) = {0};

◦ ◦ (ii) for any subspaces W1 and W2 of V , W1 = W2 if and only if W1 = W2 .

1.8.8. Let V be a finite-dimensional vector space. If C is a basis for V ∗, prove that there exists a basis B for V such that C = B∗.

1.8.9. Let f, g ∈ V ∗ be such that ker f ⊆ ker g. Prove that g = αf for some α ∈ F .

1.8.10. Let T : V → W be a linear map. Prove the following statements:

(i) ker T t = (im T )◦.

(ii) im T t = (ker T )◦.

(iii) T is 1-1 if and only if T t is onto.

(iv) T is onto if and only if T t is 1-1.

(v) rank T = rank T t if V is finite-dimensional.

Hint for (ii): Let U be a subspace of W such that W = im T ⊕U. If g ∈ (ker T )◦, define f(T (x) + u) = g(x) for any x ∈ V and u ∈ U.

1.8.11. Let $T : V \to W$ be a linear map. Let $\Phi_V : V \to V^{**}$ and $\Phi_W : W \to W^{**}$ be the canonical maps, defined in Theorem 1.8.12, for V and W, respectively. Let $T^{tt} : V^{**} \to W^{**}$ denote the double transpose of T. Prove that $T^{tt} \circ \Phi_V = \Phi_W \circ T$. Draw a commutative diagram.

Chapter 2

Multilinear Algebra

In this chapter, we study various aspects of multilinear maps. A is a function defined on a product of vector spaces which is linear in each factor. To study a multilinear map, we turn them into a linear map on a new vector space, called a of the vector spaces. A tensor product is characterized by its universal mapping property. Then we look at the of a matrix, which can be regarded as a multilinear map on the row vectors. The determinant function is an example of an alternating multilinear map. This leads to a study of an exterior product of vector spaces, which is also defined by its universal mapping property for alternating multilinear maps. To acquaint a reader with the concept of a universal mapping property, we first start with the concept of free vector spaces, which will be used in the construction of a tensor product of vector spaces.

2.1 Free Vector Spaces

Throughout this chapter, unless otherwise stated, F will be an arbitrary field.

Given any nonempty set X, we will construct a vector space FX over F which contains X as a basis. Then FX is called a free vector space on X. Recall that if V is a vector space with a basis B, we have the following universal mapping property:


Given a vector space W and a function $t : B \to W$, there exists a unique linear map $T : V \to W$ such that $T \circ i_B = t$. (Commutative diagram: $i_B : B \hookrightarrow V$, $t : B \to W$, $T : V \to W$.)

We now define a free vector space on a non-empty set by the universal mapping property.

Definition 2.1.1. Let X be a non-empty set. A free vector space on X is a pair (V, i) consisting of a vector space V and a function i: X → V satisfying the following universal mapping property:

Given a vector space W and a function $t : X \to W$, there exists a unique linear map $T : V \to W$ such that $T \circ i = t$. (Commutative diagram: $i : X \to V$, $t : X \to W$, $T : V \to W$.)

Hence if V is a vector space over F with a basis B, then (V, iB) is a free vector space on B, where iB : B ,→ V is the inclusion map.

Proposition 2.1.2. If (V, i) is a free vector space on a non-empty set X, then i is injective.

Proof. Let x, y ∈ X be such that x 6= y. Take W = F in the universal mapping property and choose a function t: X → F so that t(x) 6= t(y) (e.g., t(x) = 0, t(y) = 1). Then there is a unique linear map T : V → F such that T ◦ i = t. It follows that T (i(x)) 6= T (i(y)), which implies i(x) 6= i(y). Thus i is injective.

If (V, i) is a free vector space on a non-empty set X, we will soon see that i(X) forms a basis for V . Since i is injective, we can identify X with a subset i(X) of V and simply say that V is a vector space containing X as a basis. The term “free” means there is no relationship between the elements of X. The point of view here is that, starting from an arbitrary set, we can construct a vector space for which the given set is a basis.

Proposition 2.1.3. Let F be a field and X a non-empty set. Then there exists a free vector space over F on X.

Proof. Define

$$F_X = \{ f : X \to F \mid f(x) \ne 0 \text{ for only finitely many } x \}.$$
For each $x \in X$, define $\delta_x : X \to F$ by $\delta_x(y) = \begin{cases} 1 & \text{if } y = x; \\ 0 & \text{if } y \ne x. \end{cases}$ It is now routine to verify that

(i) FX is a subspace of F(X);

(ii) for any $f \in F_X$, $f = \sum_{x} f(x)\,\delta_x$, where the sum is a finite sum;

(iii) {δx}x∈X is linearly independent.

It follows that FX is a vector space over F containing {δx}x∈X as a basis. Let

iX : X → FX be defined by iX (x) = δx for each x ∈ X. It is readily checked that

the universal mapping property is satisfied. Hence (FX , iX ) is a free vector space over F on X.

With a slight abuse of notation, we will identify the function δx with the element x ∈ X itself. Then we can view FX as a vector space containing X as a Pn basis. A typical element in FX can be written as i=1 αixi, where n ∈ N, αi ∈ F and xi ∈ X, for i = 1, . . . , n. The vector space operations are done by combining like terms using the rules

αxi + βxi = (α + β)xi,

α(βxi) = (αβ)xi.
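Formal linear combinations over an arbitrary set can be modelled as finitely supported coefficient functions, exactly as in the construction of $F_X$ above. The sketch below (ours, with made-up helper names, taking $F = \mathbb{R}$) represents an element of $F_X$ as a dictionary mapping elements of X to their coefficients and combines like terms using the rules just stated.

```python
from collections import defaultdict

def add(f, g):
    """Sum of two formal linear combinations (finitely supported dicts)."""
    h = defaultdict(float, f)
    for x, coeff in g.items():
        h[x] += coeff
    return {x: c for x, c in h.items() if c != 0}

def scale(alpha, f):
    """Scalar multiple of a formal linear combination."""
    return {x: alpha * c for x, c in f.items() if alpha * c != 0}

# delta_x corresponds to the dict {x: 1}; combining like terms:
u = {'a': 2.0, 'b': 3.0}     # 2a + 3b
v = {'a': -1.0}              # -a
print(add(u, v))             # {'a': 1.0, 'b': 3.0}
print(scale(2.0, u))         # {'a': 4.0, 'b': 6.0}
```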

In general, there are several ways to construct a free vector space on a non- empty set. However, the universal mapping property will show that different constructions of a free vector space on the same set are all isomorphic. Hence, a free vector space is uniquely determined up to isomorphism.

Proposition 2.1.4. A free vector space on a non-empty set X is unique up to isomorphism. More precisely, if (V1, i1) and (V2, i2) are free vector spaces on X, then there is a linear isomorphism T : V1 → V2 such that T ◦ i1 = i2.

Proof. Let (V1, i1) and (V2, i2) be free vector spaces on a non-empty set X. By the universal mapping property, we have the following commutative diagrams:

(Two copies of the universal mapping property diagram, one for $(V_1, i_1)$ and one for $(V_2, i_2)$.)

Now, taking W = V2 and t = i2 in the first diagram, we have a linear map

T1 : V1 → V2 such that T1 ◦ i1 = i2. Similarly, taking W = V1 and t = i1 in the second diagram, we have a linear map T2 : V2 → V1 such that T2 ◦ i2 = i1.


Hence (T1 ◦ T2) ◦ i2 = i2 and (T2 ◦ T1) ◦ i1 = i1. However, the identity map

IV1 on V1 is a unique linear map such that IV1 ◦ i1 = i1. Similarly, IV2 is a unique linear map such that IV2 ◦ i2 = i2.


Hence $T_2 \circ T_1 = I_{V_1}$ and $T_1 \circ T_2 = I_{V_2}$. This shows that $T_1$ and $T_2$ are inverses of each other. Hence $T = T_1 : V_1 \to V_2$ is a linear isomorphism such that $T \circ i_1 = i_2$.

Exercises

2.1.1. Let (V, i) be a free vector space on a non-empty set X. Given a vector space U and a function j : X → U, show that (U, j) is a free vector space on X if and only if there is a unique linear map f : U → V such that f ◦ j = i.

2.1.2. Let (V, i) be a free vector space on a non-empty set X. Prove directly from the universal mapping property that i(X) spans V .

Hint: Let W be the span of i(X) and iW : W → V the inclusion map. Apply the universal mapping property to the following commutative diagram to show

that $i_W$ is surjective. (Commutative diagram: the map $i : X \to W$, the induced linear map $V \to W$, and the inclusion $i_W : W \to V$.)

2.1.3. Let (V, i) be a free vector space on a non-empty set X. Prove that i(X) is a basis for V.

2.2 Multilinear Maps and Tensor Products

Definition 2.2.1. Let V1,..., Vn and W be vector spaces over F . A function f : V1 × · · · × Vn → W is said to be multilinear if for each i ∈ {1, 2, . . . , n},

f(x1, . . . , αxi + βyi, . . . , xn) = αf(x1, . . . , xi, . . . , xn) + βf(x1, . . . , yi, . . . , xn) for any xi, yi ∈ Vi and α, β ∈ F . In other words, a multilinear map is a function on a Cartesian product of vector spaces which is linear in each variable. If W = F , we call it a .

Denote by Mul(V1,...,Vn; W ) the set of multilinear maps from V1 × · · · × Vn into W .

Remark. If n = 1, a multilinear map is simply a linear map. If n = 2, we call it a . In general, we may call a multilinear map on a product of n vector spaces an n-linear map.

Examples. (1) Let V be a vector space. Then the dual pairing $\omega : V \times V^* \to F$ defined by $\omega(v, f) = f(v)$ for any $v \in V$ and $f \in V^*$ is a bilinear form.
(2) If V is an algebra, the multiplication $\cdot : V \times V \to V$ is a bilinear map.
(3) Let A be an n × n matrix over F. The map $L : F^n \times F^n \to F$ defined by
$$L(x, y) = y^t A x \quad\text{for any } x, y \in F^n,$$
is a bilinear form on $F^n$. Here we identify a vector in $F^n$ with an n × 1 column matrix.

(4) We can view the determinant function on Mn(F ) as a multilinear map as follows. Let A be an n × n matrix and r1,..., rn the rows of A. Then the determinant can be viewed as a function det: F n × · · · × F n → F defined by

det(r1, . . . , rn) = det A.

That det is a multilinear map follows from the following properties:

$$\det(r_1, \dots, r_i + r_i', \dots, r_n) = \det(r_1, \dots, r_i, \dots, r_n) + \det(r_1, \dots, r_i', \dots, r_n),$$
$$\det(r_1, \dots, \alpha r_i, \dots, r_n) = \alpha \det(r_1, \dots, r_i, \dots, r_n).$$
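Both Example (3) and Example (4) are easy to probe numerically. The sketch below (ours, illustrative only) checks the linearity of $(x, y) \mapsto y^t A x$ in its first argument and the linearity of the determinant in a single row.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
x1, x2, y = rng.standard_normal((3, 3))   # three random vectors in R^3
a, b = 2.0, -1.5

# Bilinear form L(x, y) = y^t A x: linear in x for fixed y.
L = lambda x, y: y @ A @ x
assert np.isclose(L(a*x1 + b*x2, y), a*L(x1, y) + b*L(x2, y))

# Determinant: linear in the first row when the other rows are fixed.
M = rng.standard_normal((3, 3))
M1, M2, Msum = M.copy(), M.copy(), M.copy()
M1[0], M2[0], Msum[0] = x1, x2, a*x1 + b*x2
assert np.isclose(np.linalg.det(Msum),
                  a*np.linalg.det(M1) + b*np.linalg.det(M2))
```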

Proposition 2.2.2. Let V1,..., Vn and W be vector spaces over F . Then the set

of multilinear maps Mul(V1,...,Vn; W ) is a vector space over F under addition and scalar multiplication defined by

(f + g)(v1, . . . , vn) = f(v1, . . . , vn) + g(v1, . . . , vn)

(kf)(v1, . . . , vn) = kf(v1, . . . , vn). Proof. The proof is routine and we leave it to the reader.

Proposition 2.2.3. Let V1,..., Vn and W be vector spaces over F . Then for any n ≥ 2,

Mul(V1,...,Vn; W ) ∩ L(V1 × · · · × Vn,W ) = {0}.

Proof. Let T ∈ Mul(V1,...,Vn; W ) ∩ L(V1 × · · · × Vn,W ). Then

T (v1, v2, . . . , vn) = T (v1, 0,..., 0) + T (0, v2, . . . , vn)

= 0 · T (v1, v2, 0,..., 0) + 0 · T (v1, v2, . . . , vn) = 0

for any (v1, v2, . . . , vn) ∈ V1 × · · · × Vn. The first equality above follows from the linearity of T and the second one follows from the linearity in the second and first variables, respectively.

From this proposition, we see that theory of linear maps cannot be applied to multilinear maps directly. However, we can transform a multilinear map to a linear map on a certain vector space and apply theory of linear algebra to this induced linear map and then transfer information back to the original multilinear map. In the process of doing so, we will construct a new vector space which is very important in its own. It is called a tensor product of vector spaces. We begin by considering a tensor product of two vector spaces. Let U and V be vector spaces over F . We would like to define a new vector space U ⊗ V which is the “product”of U and V . (Note that the direct product U × V is really the “sum”of U and V .) The space U ⊗ V will consist of formal elements of the form

α1(u1 ⊗ v1) + ··· + αn(un ⊗ vn), (2.1)

where n ∈ N, αi ∈ F , ui ∈ U and vi ∈ V , for i = 1, 2, . . . , n.

Moreover, it satisfies the distributive laws:

(αu1 + βu2) ⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1, u2 ∈ U ∀v ∈ V ∀α, β ∈ F ;

u ⊗ (αv1 + βv2) = α(u ⊗ v1) + β(u ⊗ v2) ∀u ∈ U ∀v1, v2 ∈ V ∀α, β ∈ F .

But we do not assume that the tensor product is commutative: u ⊗ v ≠ v ⊗ u. In fact, u ⊗ v and v ⊗ u live in different spaces. By the distributive laws, we can rewrite the formal sum (2.1) as

(α1u1) ⊗ v1 + ··· + (αnun) ⊗ vn.

By renaming αiui as ui, any element in U ⊗ V can be written as

u1 ⊗ v1 + ··· + un ⊗ vn. (2.2)

However, this representation is not unique. We can have different formal sums (2.2) that represent the same element in U ⊗ V . This will be a problem when we define a function on the tensor product U ⊗ V . To get around this problem, we will introduce the universal mapping property of a tensor product. In fact, we will define a tensor product U ⊗V to be the universal object that turns a bilinear map on U ×V into a linear map on U ⊗V . Any linear map on the tensor product U ⊗ V will be defined through the universal mapping property.

Definition 2.2.4. Let U and V be vector spaces over F . A tensor product of U and V , is a vector space X over F , together with a bilinear map b: U × V → X with the following universal mapping property:

Given any vector space W and a bilinear map ϕ : U × V → W , there exists a unique linear map φ: X → W such that φ ◦ b = ϕ.

There are several ways to define a tensor product of vector spaces. If the vector spaces are finite-dimensional, we can give an elementary construction. On the other hand, one can construct a tensor product of modules, in which case the construction of a tensor product of vector spaces is a special case. Here, we will adopt a middle ground in which we construct a tensor product of two vector spaces, not necessarily finite-dimensional.

Theorem 2.2.5. Let U and V be vector spaces. Then a tensor product of U and V exists.

Proof. Let U and V be vector spaces over a field F . Let (FU×V , i) denote the free vector space on U × V . Here U × V is the Cartesian product of U and V with no algebraic structure. Then

FU×V = { Σ_{finite} αj(uj, vj) | (uj, vj) ∈ U × V and αj ∈ F }.

Let T be the subspace of FU×V generated by all vectors of the form:

α(u, v) + β(u′, v) − (αu + βu′, v) and α(u, v) + β(u, v′) − (u, αv + βv′)

for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V . Let b: U × V → FU×V /T be a map defined by b(u, v) = (u, v) + T . Note that the map b is just the composition of the canonical map i: U × V → FU×V and the projection map π : FU×V → FU×V /T . Since α(u, v) + β(u′, v) − (αu + βu′, v) ∈ T and α(u, v) + β(u, v′) − (u, αv + βv′) ∈ T for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V , it follows that

(αu + βu′, v) + T = α(u, v) + β(u′, v) + T and (u, αv + βv′) + T = α(u, v) + β(u, v′) + T

for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V . From this, it is easy to see that b is

bilinear. Next, we prove that the quotient space FU×V /T satisfies the universal mapping property in Definition 2.2.4. Consider the following diagram:

[Diagram: U × V —i→ FU×V —π→ FU×V /T , together with ϕ: U × V → W , ϕ̄: FU×V → W and φ: FU×V /T → W , where b = π ◦ i.]

Let W be a vector space over F and ϕ: U × V → W a bilinear map. By the universal mapping property of the free vector space FU×V , there exists a unique linear map ϕ̄: FU×V → W such that ϕ̄ ◦ i = ϕ. Since ϕ is bilinear, ϕ̄ sends each of the vectors which generate T to zero, so T ⊆ ker ϕ̄. Hence by the universal mapping property of the quotient space, there exists a unique linear map φ: FU×V /T → W such that φ ◦ π = ϕ̄. Hence

φ ◦ b = φ ◦ π ◦ i = ϕ̄ ◦ i = ϕ.

It remains to show that φ is unique. Suppose that φ′: FU×V /T → W is a linear map such that φ′ ◦ b = ϕ. Then φ′ ◦ π : FU×V → W is a linear map for which (φ′ ◦ π) ◦ i = φ′ ◦ b = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ◦ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ.

We have given a construction of a tensor product of two vector spaces. In fact, there are different ways in constructing a tensor product. For example, if U and V are finite-dimensional vector spaces, then the space Bil(U ∗,V ∗; F ) consisting of all bilinear maps from U ∗ × V ∗ into F satisfies the universal mapping property for the tensor product. (Exercise!) However, any construction of a tensor product will give an isomorphic vector space as stated in the next Proposition.

Proposition 2.2.6. A tensor product of U and V is unique up to isomorphism.

More precisely, if (X1, b1) and (X2, b2) are tensor products of U and V , then there is a linear isomorphism F : X1 → X2 such that F ◦ b1 = b2.

Proof. The proof here is the same as the proof of uniqueness of a free vector space on a non-empty set (Theorem 2.1.4). We repeat it here for the sake of completeness. Let (X1, b1) and (X2, b2) be tensor products of U and V . Note that b1 and b2 are bilinear maps from U × V into X1 and X2, respectively. By the universal mapping property of (X1, b1), there exists a unique linear map

F1 : X1 → X2 such that F1 ◦ b1 = b2. Similarly, there exists a unique linear map

F2 : X2 → X1 such that F2 ◦ b2 = b1.

[Diagram: b1 : U × V → X1 and b2 : U × V → X2 , together with F1 : X1 → X2 and F2 : X2 → X1 .]

Hence F2 ◦ F1 ◦ b1 = b1. But I_{X1} is the unique linear map from X1 into X1 such that I_{X1} ◦ b1 = b1 (by the universal mapping property of (X1, b1) applied to the bilinear map b1 itself). Thus F2 ◦ F1 = I_{X1}. Similarly, F1 ◦ F2 = I_{X2}.


Thus F1 = F2^{-1}. Hence F1 is a linear isomorphism such that F1 ◦ b1 = b2.

Remark. Since the tensor product of U and V is unique up to isomorphism, we denote it by U ⊗ V or U ⊗F V if the base field is emphasized. We summarize the universal mapping property as follows:

Given any vector space W and a bilinear map ϕ : U × V → W , there exists a unique linear map φ: U ⊗ V → W such that φ ◦ b = ϕ.

We also write u ⊗ v = b(u, v).

Proposition 2.2.7. Let U and V be vector spaces over a field F . Then

(i) (αu1 + βu2) ⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1, u2 ∈ U ∀v ∈ V ∀α, β ∈ F ;

(ii) u ⊗ (αv1 + βv2) = α(u ⊗ v1) + β(u ⊗ v2) ∀u ∈ U ∀v1, v2 ∈ V ∀α, β ∈ F .

Proof. (i) Let u1, u2 ∈ U, v ∈ V and α, β ∈ F . Then by bilinearity of b,

(αu1 + βu2) ⊗ v = b(αu1 + βu2, v)

= αb(u1, v) + βb(u2, v)

= α(u1 ⊗ v) + β(u2 ⊗ v)

(ii) The proof of (ii) is very similar and is omitted here.

Theorem 2.2.8. Let U and V be vector spaces. If B and C are bases for U and V , respectively, then {u ⊗ v | u ∈ B, v ∈ C} is a basis for U ⊗ V .

Proof. Let D = {u ⊗ v | u ∈ B, v ∈ C}. To see that D is linearly independent, let ui ∈ B, vj ∈ C and aij ∈ F , for i = 1, . . . , n, j = 1, . . . , m, be such that

Σ_{i=1}^{n} Σ_{j=1}^{m} aij(ui ⊗ vj) = 0. (2.3)

For k = 1, . . . , n, define ϕk : B → F by ϕk(u) = 1 if u = uk, and ϕk(u) = 0 otherwise.

Similarly, for ℓ = 1, . . . , m, define ψℓ : C → F by ψℓ(v) = 1 if v = vℓ, and ψℓ(v) = 0 otherwise.

Extend ϕk and ψℓ to linear functionals on U and V , respectively. Moreover, for k = 1, . . . , n, ℓ = 1, . . . , m, define fkℓ : U × V → F by

fkℓ(u, v) = ϕk(u)ψℓ(v) for any (u, v) ∈ U × V .

It is easy to see that fkℓ is a bilinear map for k = 1, . . . , n, ℓ = 1, . . . , m. Hence there is a unique linear map Fkℓ : U ⊗ V → F such that Fkℓ ◦ b = fkℓ. In particular,

Fkℓ(ui ⊗ vj) = Fkℓ ◦ b(ui, vj) = fkℓ(ui, vj) = ϕk(ui)ψℓ(vj).

Now, for k = 1, . . . , n, ℓ = 1, . . . , m, apply Fkℓ to (2.3):

0 = Fkℓ( Σ_{i=1}^{n} Σ_{j=1}^{m} aij(ui ⊗ vj) ) = Σ_{i=1}^{n} Σ_{j=1}^{m} aij Fkℓ(ui ⊗ vj) = Σ_{i=1}^{n} Σ_{j=1}^{m} aij ϕk(ui)ψℓ(vj) = akℓ.

This shows that the coefficients aij = 0 for any i = 1, . . . , n, j = 1, . . . , m. Thus D is linearly independent.

Next, let Y = span D. Then Y is a subspace of the vector space U ⊗ V . Thus there is a subspace Z of U ⊗V such that U ⊗V = Y ⊕Z. Let Φ be the projection map from U ⊗ V (= Y ⊕ Z) onto Y , namely,

Φ(y + z) = y (y ∈ Y , z ∈ Z).

To show that Φ ◦ b = b, let u = Σ_{i=1}^{n} αiui ∈ U and v = Σ_{j=1}^{m} βjvj ∈ V , where ui ∈ B, vj ∈ C and αi, βj ∈ F for i = 1, . . . , n and j = 1, . . . , m. Then by bilinearity of b, we have

b(u, v) = b( Σ_{i=1}^{n} αiui , Σ_{j=1}^{m} βjvj ) = Σ_{i,j} αiβj b(ui, vj) = Σ_{i,j} αiβj(ui ⊗ vj). Hence b(u, v) ∈ span D = Y . It follows that

Φ ◦ b(u, v) = Φ(b(u, v)) = b(u, v).

Thus Φ ◦ b = b, as desired. Now Φ: U ⊗ V → U ⊗ V is a linear map such that Φ ◦ b = b, while I_{U⊗V} : U ⊗ V → U ⊗ V is the unique linear map such that I_{U⊗V} ◦ b = b. By the uniqueness, we have Φ = I_{U⊗V}. It follows that Z = {0} and that span D = Y = U ⊗ V . We can now conclude that D is a basis for U ⊗ V .

Corollary 2.2.9. Let U and V be finite-dimensional vector spaces. Then

dim(U ⊗ V ) = (dim U)(dim V ).

Proof. Let B, C and D be the bases of U, V and U ⊗ V , respectively, as in the proof of Theorem 2.2.8. Then |D| = |B| · |C|.
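For example, if U = F^2 has basis {e1, e2} and V = F^3 has basis {f1, f2, f3}, then dim(U ⊗ V ) = 2 · 3 = 6, and the six simple tensors ei ⊗ fj (1 ≤ i ≤ 2, 1 ≤ j ≤ 3) form a basis for U ⊗ V .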

Corollary 2.2.10. Let U and V be vector spaces. Then any element in U ⊗ V can be written as

Σ_{i=1}^{n} ui ⊗ vi,

where n ∈ N, ui ∈ U and vi ∈ V , for i = 1, . . . , n.

Proof. Let B and C be bases for U and V , respectively. Let x ∈ U ⊗ V . Then x can be written as

x = Σ_{i=1}^{n} Σ_{j=1}^{m} aij(ui ⊗ vj),

where m, n ∈ N, ui ∈ B, vj ∈ C and aij ∈ F for i = 1, . . . , n and j = 1, . . . , m. Thus

x = Σ_{i=1}^{n} Σ_{j=1}^{m} aij(ui ⊗ vj) = Σ_{i=1}^{n} ( ui ⊗ Σ_{j=1}^{m} aijvj ) = Σ_{i=1}^{n} ui ⊗ vi′,

where each vi′ = Σ_{j=1}^{m} aijvj ∈ V .

Remark. A typical element in U ⊗ V is not u ⊗ v, but a sum Σ_{i=1}^{n} ui ⊗ vi of such elements, where n ∈ N, ui ∈ U, vi ∈ V , i = 1, . . . , n. A linear combination of products may not be written as a single product of two elements.

U ⊗ V ≠ {u ⊗ v | u ∈ U, v ∈ V }.

But

U ⊗ V = span{u ⊗ v | u ∈ U, v ∈ V } = { Σ_{i=1}^{n} ui ⊗ vi | n ∈ N, ui ∈ U, vi ∈ V }. However, a linear combination that represents an element in U ⊗ V is not unique. For example,

2(u ⊗ v) = (2u) ⊗ v = u ⊗ (2v) = u ⊗ v + u ⊗ v = (2u) ⊗ (2v) − 2(u ⊗ v).

This is an important point because a function on the tensor product U ⊗V defined by specifying the action on its elements may not be well-defined. In general, we will use the universal mapping property to define a linear map on the tensor product. We will see more about this later.

Next we investigate several properties of tensor products.

Theorem 2.2.11. Let V be a vector space over a field F . Then

F ⊗ V ≅ V ≅ V ⊗ F.

Proof. Let b: F × V → F ⊗ V be a bilinear map defined by

b(k, v) = k ⊗ v for any (k, v) ∈ F × V .

Define ϕ: F × V → V by

ϕ(k, v) = kv for any (k, v) ∈ F × V .

It is easy to see that ϕ is a bilinear map. Then there is a unique linear map Φ: F ⊗ V → V such that Φ ◦ b = ϕ. In particular, Φ(k ⊗ v) = kv for any k ∈ F and v ∈ V . Now define Ψ: V → F ⊗ V by Ψ(v) = 1 ⊗ v for any v ∈ V . By

Proposition 2.2.7, Ψ is linear. Moreover, Φ ◦ Ψ = IV . To see that Ψ ◦ Φ = IF ⊗V , consider, for any k ∈ F and v ∈ V ,

Ψ ◦ Φ(k ⊗ v) = Ψ(kv) = 1 ⊗ kv = k(1 ⊗ v) = k ⊗ v. (2.4)

Hence Ψ ◦ Φ = IF ⊗V on the set {k ⊗ v | k ∈ F, v ∈ V }, which spans F ⊗ V . It

follows that Ψ ◦ Φ = I_{F⊗V}. (Alternatively, (2.4) shows that Ψ ◦ Φ ◦ b = b. By the uniqueness, Ψ ◦ Φ = I_{F⊗V}.) It means that Φ and Ψ are inverses of each other, and that Φ: F ⊗ V → V is a linear isomorphism. This establishes F ⊗ V ≅ V . Similarly, we can show that V ⊗ F ≅ V .

Theorem 2.2.12. Let U and V be vector spaces over a field F . Then

U ⊗ V ≅ V ⊗ U.

Proof. Let b1 : U × V → U ⊗ V and b2 : V × U → V ⊗ U be bilinear maps defined by

b1(u, v) = u ⊗ v and b2(v, u) = v ⊗ u. Define ϕ: U × V → V ⊗ U by

ϕ(u, v) = b2(v, u) = v ⊗ u. Similarly, define ψ : V × U → U ⊗ V by

ψ(v, u) = b1(u, v) = u ⊗ v. Then ϕ and ψ are bilinear maps and hence there exists a unique pair of linear

maps Φ: U ⊗ V → V ⊗ U and Ψ: V ⊗ U → U ⊗ V such that Φ ◦ b1 = ϕ and

Ψ ◦ b2 = ψ. Note that

Ψ ◦ Φ ◦ b1(u, v) = Ψ ◦ ϕ(u, v) = Ψ ◦ b2(v, u) = ψ(v, u) = b1(u, v)

for any (u, v) ∈ U × V . Hence Ψ ◦ Φ ◦ b1 = b1. Similarly, Φ ◦ Ψ ◦ b2 = b2. Thus Ψ ◦ Φ = I_{U⊗V} and Φ ◦ Ψ = I_{V⊗U}. It follows that U ⊗ V ≅ V ⊗ U.

Theorem 2.2.13. Let U, V , W be vector spaces over a field F . Then

(U ⊗ V ) ⊗ W ≅ U ⊗ (V ⊗ W ).

Proof. Fix w ∈ W . Define ϕw : U × V → U ⊗ (V ⊗ W ) by

ϕw(u, v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V .

By Proposition 2.2.7, we see that ϕw is bilinear. Then there exists a unique linear map φw : U ⊗ V → U ⊗ (V ⊗ W ) such that

φw(u ⊗ v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V .

It is easy to see that φ_{w+w′} = φw + φ_{w′} and φ_{kw} = kφw for any w, w′ ∈ W and k ∈ F . Define a bilinear map φ: (U ⊗ V ) × W → U ⊗ (V ⊗ W ) by

φ(x, w) = φw(x) for any x ∈ U ⊗ V and w ∈ W .

Then there is a unique linear map Φ: (U ⊗ V ) ⊗ W → U ⊗ (V ⊗ W ) such that

Φ(x ⊗ w) = φ(x, w) = φw(x) for any x ∈ U ⊗ V and w ∈ W .

In particular, for any u ∈ U, v ∈ V and w ∈ W ,

Φ((u ⊗ v) ⊗ w) = u ⊗ (v ⊗ w). (2.5)

Similarly, there is a linear map Ψ: U ⊗ (V ⊗ W ) → (U ⊗ V ) ⊗ W such that

Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w (2.6) for any u ∈ U, v ∈ V and w ∈ W . By (2.5) and (2.6), we see that

Ψ ◦ Φ((u ⊗ v) ⊗ w) = Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w for any u ∈ U, v ∈ V and w ∈ W . If x ∈ U ⊗ V , then x is a linear combination of elements in the form u ⊗ v. This implies that

Ψ ◦ Φ(x ⊗ w) = x ⊗ w for any x ∈ U ⊗ V and w ∈ W .

Hence Ψ ◦ Φ = I(U⊗V )⊗W on the set {x ⊗ w | x ∈ U ⊗ V, w ∈ W }, which spans

(U ⊗ V ) ⊗ W . It follows that Ψ ◦ Φ = I_{(U⊗V)⊗W}. Similarly, Φ ◦ Ψ = I_{U⊗(V⊗W)}. This shows that (U ⊗ V ) ⊗ W ≅ U ⊗ (V ⊗ W ).

Theorem 2.2.14. Let U, V , W be vector spaces over a field F . Then

U ⊗ (V ⊕ W ) ≅ (U ⊗ V ) ⊕ (U ⊗ W ).

Proof. Define ϕ: U × (V ⊕ W ) → (U ⊗ V ) ⊕ (U ⊗ W ) by

ϕ(u, (v, w)) = (u ⊗ v, u ⊗ w).

It is easy to check that ϕ is bilinear. Hence there is a unique linear map Φ: U ⊗ (V ⊕ W ) → (U ⊗ V ) ⊕ (U ⊗ W ) such that

Φ(u ⊗ (v, w)) = (u ⊗ v, u ⊗ w).

Let f1 : U × V → U ⊗ (V ⊕ W ) and f2 : U × W → U ⊗ (V ⊕ W ) be defined by

f1(u, v) = u ⊗ (v, 0) and f2(u, w) = u ⊗ (0, w).

Then f1 and f2 are bilinear maps and hence there exist linear maps ψ1 : U ⊗V →

U ⊗ (V ⊕ W ) and ψ2 : U ⊗ W → U ⊗ (V ⊕ W ) such that

ψ1(u ⊗ v) = u ⊗ (v, 0) and ψ2(u ⊗ w) = u ⊗ (0, w).

Now, define Ψ: (U ⊗ V ) ⊕ (U ⊗ W ) → U ⊗ (V ⊕ W ) by

Ψ(x, y) = ψ1(x) + ψ2(y) for any x ∈ U ⊗ V and y ∈ U ⊗ W .

In particular, for any u1, u2 ∈ U, v ∈ V and w ∈ W ,

Ψ(u1 ⊗ v, u2 ⊗ w) = u1 ⊗ (v, 0) + u2 ⊗ (0, w).

It is routine to verify that

Ψ ◦ Φ(u ⊗ (v, w)) = u ⊗ (v, w).

Now {u ⊗ (v, w) | u ∈ U, v ∈ V, w ∈ W } spans U ⊗ (V ⊕ W ). Hence

Ψ ◦ Φ = IU⊗(V ⊕W ). On the other hand, if u ∈ U and v ∈ V , then

Φ ◦ ψ1(u ⊗ v) = Φ(u ⊗ (v, 0)) = (u ⊗ v, u ⊗ 0) = (u ⊗ v, 0).

Hence Φ ◦ ψ1(x) = (x, 0) for any x ∈ U ⊗ V . Similarly, Φ ◦ ψ2(y) = (0, y) for any y ∈ U ⊗ W . Hence for any x ∈ U ⊗ V and y ∈ U ⊗ W ,

Φ ◦ Ψ(x, y) = Φ(ψ1(x) + ψ2(y))

= Φ ◦ ψ1(x) + Φ ◦ ψ2(y) = (x, 0) + (0, y) = (x, y).

Thus Φ ◦ Ψ = I(U⊗V )⊕(U⊗W ). Hence Φ and Ψ are linear isomorphisms.

We can generalize the definition of a tensor product of two vector spaces to a tensor product of n vector spaces by the universal mapping property.

Definition 2.2.15. Let V1,..., Vn be vector spaces over the same field F . A tensor product of V1,...,Vn is a vector space V1 ⊗ · · · ⊗ Vn, together with an n-linear map t: V1 × · · · × Vn → V1 ⊗ · · · ⊗ Vn satisfying the following universal mapping property: given a vector space W and an n-linear map ϕ: V1 × · · · × Vn → W , there exists a unique linear map φ: V1 ⊗ · · · ⊗ Vn → W such that φ ◦ t = ϕ.


We can show that a tensor product V1 ⊗ · · · ⊗ Vn exists and is unique up to isomorphism. The proof is similar to the case n = 2 and will only be sketched here. The uniqueness part is routine. For the existence, consider the free vector space FV1×···×Vn on V1 ×· · ·×Vn modulo the subspace T generated by the elements of the form

(v1, . . . , αvi + βvi′, . . . , vn) − α(v1, . . . , vi, . . . , vn) − β(v1, . . . , vi′, . . . , vn).

Let t(v1, . . . , vn) = (v1, . . . , vn) + T . It is routine to verify that t is an n-linear map and that FV1×···×Vn /T satisfies the universal mapping property above. An element t(v1, . . . , vn) in V1 ⊗ · · · ⊗ Vn will be denoted by v1 ⊗ · · · ⊗ vn.

If Bi is a basis for Vi for i = 1, . . . , n, then the following set is a basis for

V1 ⊗ · · · ⊗ Vn:

{v1 ⊗ · · · ⊗ vn | v1 ∈ B1, . . . , vn ∈ Bn}.

Theorem 2.2.13 can be generalized to the following theorem:

Theorem 2.2.16. Let V1,...,Vn be vector spaces over the same field F . For k = 1, . . . , n, there is a unique linear isomorphism

Φk : ( V1 ⊗ · · · ⊗ Vk ) ⊗ ( Vk+1 ⊗ · · · ⊗ Vn ) → V1 ⊗ · · · ⊗ Vn such that for any v1 ∈ V1,..., vn ∈ Vn,

Φk((v1 ⊗ · · · ⊗ vk) ⊗ (vk+1 ⊗ · · · ⊗ vn)) = v1 ⊗ · · · ⊗ vk ⊗ vk+1 ⊗ · · · ⊗ vn.

Proof. Exercise.

Exercises

2.2.1. Let U, V and W be vector spaces over F . Show that

Bil(U, V ; W ) ≅ L(U ⊗ V,W ), where Bil(U, V ; W ) is the space of bilinear maps from U × V into W .

2.2.2. Show that there is a unique linear map Φ: F^m ⊗ F^n → Mm×n(F ) such that

Φ(x ⊗ y) = x y^t, where x = (x1, . . . , xm) ∈ F^m and y = (y1, . . . , yn) ∈ F^n are regarded as column matrices; that is, Φ(x ⊗ y) is the m × n matrix whose (i, j)-entry is xiyj.

Then prove that Φ is a linear isomorphism. Hence F^m ⊗ F^n ≅ Mm×n(F ). Now, let m = n = 2 and let {e1 = (1, 0), e2 = (0, 1)} be the standard basis for F^2.

Notice that the 2 × 2 identity matrix I2 corresponds to the element e1 ⊗ e1 + e2 ⊗ e2 in F^2 ⊗ F^2. Show that we cannot find u, v ∈ F^2 such that e1 ⊗ e1 + e2 ⊗ e2 = u ⊗ v. This shows that an element in a tensor product may not be a simple tensor.

2.2.3. Let V and W be vector spaces.

(i) Prove that there is a unique linear map Φ: V ∗ ⊗ W ∗ → (V ⊗ W )∗ such that

Φ(f ⊗ g)(v ⊗ w) = f(v)g(w)

for any f ∈ V ∗, g ∈ W ∗, v ∈ V and w ∈ W .

(ii) Show that if V and W are finite-dimensional, then Φ is a linear isomorphism. Hence (V ⊗ W )∗ ≅ V ∗ ⊗ W ∗.

2.2.4. Let V and W be vector spaces. Give a canonical linear map V ∗ ⊗ W → Hom(V,W ) and prove that it is a linear isomorphism when V and W are finite-dimensional. Hence V ∗ ⊗ W ≅ Hom(V,W ).

2.2.5. Let U and V be finite-dimensional vector spaces over a field F . Denote by Bil(U ∗,V ∗; F ) the set of bilinear maps from U ∗ × V ∗ into F . Let b: U × V → Bil(U ∗,V ∗; F ) be defined by

b(u, v)(f, g) = f(u)g(v) for any u ∈ U, v ∈ V and f ∈ U ∗, g ∈ V ∗. Prove that the pair (Bil(U ∗,V ∗; F ), b) satisfies the universal mapping property for a tensor product: given any vector space W over F and a bilinear map ϕ: U × V → W , there exists a unique linear map Φ: Bil(U ∗,V ∗; F ) → W such that Φ ◦ b = ϕ. (This gives another construction of a tensor product U ⊗ V when U and V are finite-dimensional.)

2.2.6. Let U and V be finite-dimensional vector spaces, u1, . . . , un ∈ U and v1, . . . , vn ∈ V . Prove that if Σ_{i=1}^{n} ui ⊗ vi = 0 and {v1, . . . , vn} is linearly independent, then ui = 0 for i = 1, . . . , n.

2.2.7. Let V , V ′, W and W ′ be vector spaces. Let S : V → V ′ and T : W → W ′ be linear maps. Show that there exists a unique linear map Ψ: V ⊗ W → V ′ ⊗ W ′ such that

Ψ(v ⊗ w) = S(v) ⊗ T (w) for any v ∈ V and w ∈ W .

The unique linear map Ψ is called the tensor product of S and T , denoted by S ⊗ T . Hence

(S ⊗ T )(v ⊗ w) = S(v) ⊗ T (w) for any v ∈ V and w ∈ W .

2.2.8. Let V , V ′, W and W ′ be vector spaces over F . Let S, S′ : V → V ′ and T , T ′ : W → W ′ be linear maps and k ∈ F . Show that

(i) S ⊗ (T + T ′) = S ⊗ T + S ⊗ T ′;

(ii) (S + S′) ⊗ T = S ⊗ T + S′ ⊗ T ;

(iii) (kS) ⊗ T = S ⊗ (kT ) = k(S ⊗ T ).

2.2.9. Let V and W be vector spaces. Let S1,S2 : V → V and T1,T2 : W → W be linear maps. Show that

(S1 ⊗ T1)(S2 ⊗ T2) = (S1S2) ⊗ (T1T2).

2.2.10. Prove Theorem 2.2.16.

2.3 Determinants

In this section, we will define the determinant function. Here, we do not need the fact that F is a field. It suffices to assume that F is a commutative ring with identity. However, we will develop the theory on vector spaces over a field as before, but keep in mind that what we are doing here works in a more general situation where vector spaces are replaced by modules over a commutative ring with identity.

Definition 2.3.1. Let V and W be vector spaces and f : V n → W a multilinear function. Then f is said to be symmetric if

f(vσ(1), . . . , vσ(n)) = f(v1, . . . , vn) for any σ ∈ Sn, (2.7) and skew-symmetric if

f(vσ(1), . . . , vσ(n)) = (sgn σ) f(v1, . . . , vn) for any σ ∈ Sn. (2.8) Moreover, f is said to be alternating if

f(v1, . . . , vn) = 0 whenever vi = vj for some i ≠ j.

Recall that Sn is the set of permutations on {1, 2, . . . , n} and that |Sn| = n!. Moreover, sgn σ = 1 if σ is an even permutation and sgn σ = −1 if σ is an odd permutation. Since any permutation can be written as a product of transpositions, (2.8) is equivalent to

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) (2.9)

for any v1,..., vn ∈ V .

Proposition 2.3.2. Let V and W be vector spaces over F and f : V n → W a multilinear function. If f is alternating then it is skew-symmetric. The converse holds if 1 + 1 ≠ 0 in F .

Proof. Assume that f is alternating. First, let us consider the case n = 2. Note that for any u, v ∈ V ,

0 = f(u + v, u + v) = f(u, u) + f(u, v) + f(v, u) + f(v, v) = 0 + f(u, v) + f(v, u) + 0.

Hence f(u, v) = −f(v, u) for any u, v ∈ V . This argument can be generalized for general n: for any v1,..., vn ∈ V ,

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn).

This shows that (2.8) holds for a transposition σ = (i j) ∈ Sn and hence holds for any σ ∈ Sn. On the other hand, assume that f is skew-symmetric. Let (v1, . . . , vn) ∈ V^n with vi = vj for some i ≠ j. Let σ be the transposition (i j). Then sgn σ = −1 and thus

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn)

= −f(v1, . . . , vi, . . . , vj, . . . , vn), because vi = vj. Since 1 + 1 ≠ 0, we have f(v1, . . . , vi, . . . , vj, . . . , vn) = 0.
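The hypothesis 1 + 1 ≠ 0 cannot be dropped. For example, over the field F = Z/2Z the bilinear form f(x, y) = xy on F × F is skew-symmetric, because −f(y, x) = f(y, x) = f(x, y) in characteristic 2, but it is not alternating, since f(1, 1) = 1 ≠ 0.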

Next, we will consider a multilinear map on the vector space F^n over the field F . We can view an element in (F^n)^n as an n × n matrix whose i-th row is the i-th component in (F^n)^n.

Theorem 2.3.3. Let r ∈ F . Then there is a unique alternating multilinear map f : (F^n)^n → F such that f(e1, . . . , en) = r, where {e1, . . . , en} is the standard basis for F^n.

Proof. (Uniqueness) Suppose f : (F^n)^n → F is an alternating multilinear map such that f(e1, . . . , en) = r. Let X1,..., Xn ∈ F^n and write each of them as

Xi = (ai1, . . . , ain) = Σ_{j=1}^{n} aijej.

By multilinearity,

f(X1, . . . , Xn) = f( Σ_{j1=1}^{n} a1j1 ej1 , . . . , Σ_{jn=1}^{n} anjn ejn ) = Σ_{j1=1}^{n} · · · Σ_{jn=1}^{n} a1j1 · · · anjn f(ej1 , . . . , ejn ).

Since f is alternating, f(ej1 , . . . , ejn ) = 0 unless ej1 ,..., ejn are all distinct; that is, the set {j1, . . . , jn} = {1, . . . , n} in some order. Hence the sum above reduces to a sum of n! terms over all the permutations in Sn:

f(X1, . . . , Xn) = Σ_{σ∈Sn} a1σ(1) . . . anσ(n) f(eσ(1), . . . , eσ(n)) = Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n) f(e1, . . . , en) = r Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n).

(Existence) We define the function f : (F^n)^n → F by

f(X1, . . . , Xn) = r Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n), (2.10)

where each Xi = (ai1, . . . , ain), and verify that it satisfies the desired properties. To see that f is multilinear, we will show that f is linear in the first coordinate. For the other coordinates, the proof is similar. Assume that X1′ = (b11, . . . , b1n). Then

f(αX1 + βX1′, X2, . . . , Xn) = r Σ_{σ∈Sn} (sgn σ)[αa1σ(1) + βb1σ(1)] a2σ(2) . . . anσ(n) = α r Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n) + β r Σ_{σ∈Sn} (sgn σ) b1σ(1) a2σ(2) . . . anσ(n) = αf(X1, . . . , Xn) + βf(X1′, X2, . . . , Xn).

To show that f(e1, . . . , en) = r, note that each ei = (δi1, . . . , δin), where

δij = 1 if i = j and zero otherwise. Hence the product δ1σ(1) . . . δnσ(n) = 0 unless σ(1) = 1, . . . , σ(n) = n, i.e., σ is the identity permutation. Thus f(e1, . . . , en) = r Σ_{σ∈Sn} (sgn σ) δ1σ(1) . . . δnσ(n) = r.

To show that f is alternating, suppose that Xj = Xk for some j ≠ k. Then the map σ ↦ σ(j k) is a 1-1 correspondence between the set of even permutations and the set of odd permutations. Recall that the set of even permutations is denoted by An. Hence the sum on the right-hand side of (2.10) can be separated into the sum over even permutations and the sum over odd permutations:

f(X1, . . . , Xn) = r Σ_{σ∈An} a1σ(1) . . . anσ(n) − r Σ_{τ∈An(j k)} a1τ(1) . . . anτ(n). (2.11)

Let σ ∈ An and τ = σ(j k). If i ∉ {j, k}, then τ(i) = σ(j k)(i) = σ(i) and thus aiτ(i) = aiσ(i). Moreover, τ(j) = σ(j k)(j) = σ(k) implies ajτ(j) = ajσ(k) = akσ(k) since Xj = Xk. Similarly, akτ(k) = akσ(j) = ajσ(j). This shows that for any σ ∈ An and τ = σ(j k),

a1σ(1) . . . anσ(n) = a1τ(1) . . . anτ(n).

Thus each term in the first sum in (2.11) will cancel out with the corresponding term in the second sum so that the total sum is zero. Hence f is alternating.

Definition 2.3.4. The unique alternating multilinear function d: Mn(F ) → F such that d(In) = 1 is called the determinant function on Mn(F ), denoted by det. The determinant of a matrix A ∈ Mn(F ) is the element det(A) in F . Hence if A = [aij], then

det(A) = Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n).

Remark. By Theorem 2.3.3 and Definition 2.3.4, it follows that any alternating multilinear function f : Mn(F ) → F is a scalar multiple of the determinant function.
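For example, when n = 2 we have S2 = {id, (1 2)}, so the formula gives det(A) = a11a22 − a12a21. When n = 3, the six permutations of S3 give the familiar expansion

det(A) = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a11a23a32 − a12a21a33.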

Theorem 2.3.5. For any A, B ∈ Mn(F ), det(AB) = det(A) det(B).

Proof. First, we note the following fact: if A is an m×n matrix and B is an n×p matrix, then the i-th row of the product AB is the product of the row matrix Ai with B.

To prove the theorem, let A1,..., An be the rows of A, respectively. Let d denote the determinant function, regarded as a multilinear function of the rows of a matrix. Hence det(AB) = d(A1B,...,AnB). Now, keep B fixed and let

f(A1,...,An) = det(AB) = d(A1B,...,AnB).

It is easy to verify that f is an alternating multilinear function of the rows A1, . . . , An. It follows that f(A1, . . . , An) = c det(A), where c is the value of f at the rows of In, namely c = d(B1, . . . , Bn) = det(B), with B1, . . . , Bn the rows of B. Hence det(AB) = det(B) det(A) = det(A) det(B).

Corollary 2.3.6. If A ∈ Mn(F ) is invertible, then det(A^{-1}) = 1/ det(A).

Proof. Since AA^{-1} = In, det A · det(A^{-1}) = det(AA^{-1}) = det In = 1.

Theorem 2.3.7. For any A ∈ Mn(F ), det(A) = det(A^t).

Proof. Let A = [aij] and A^t = [bij], where bij = aji. Note that if σ ∈ Sn is such that σ(i) = j, then i = σ^{-1}(j) and thus aσ(i)i = ajσ^{-1}(j). Moreover, sgn σ^{-1} = sgn σ for any σ ∈ Sn. Hence

det(A^t) = Σ_{σ∈Sn} (sgn σ) b1σ(1) . . . bnσ(n) = Σ_{σ∈Sn} (sgn σ) aσ(1)1 . . . aσ(n)n = Σ_{σ∈Sn} (sgn σ^{-1}) a1σ^{-1}(1) . . . anσ^{-1}(n).

Since the last sum is taken over all permutations in Sn, it must equal det A.

Now, we define the determinant of a linear operator on a finite-dimensional vector space.

Definition 2.3.8. Let V be a finite-dimensional vector space and T : V → V a linear map. Define the determinant of T , denoted det T , by

det T = det([T ]B), where [T ]B is the matrix representation of T with respect to an ordered basis B. Note that this definition is independent of the ordered basis. For if B and B′ are ordered bases for V and P is the transition matrix from B to B′, then

[T ]B′ = P^{-1}[T ]B P.

Hence det([T ]B′) = det(P^{-1}[T ]B P ) = det([T ]B).

Proposition 2.3.9. Let S and T be linear maps on a finite-dimensional vector space. Then det(ST ) = det(S) det(T ).

Proof. Let [S] and [T ] be the matrix representations of S and T (with respect to a certain ordered basis), respectively. Then [ST ] = [S][T ]. Hence

det(ST ) = det([ST ]) = det([S][T ]) = det([S]) det([T ]) = det S det T.

2.4 Exterior Products

In this section, we will construct a vector space that satisfies the universal mapping property for alternating multilinear maps.

Theorem 2.4.1. Let V be a vector space over a field F and k a positive integer. Then there exists a vector space X over F , together with a k-linear alternating map a: V k → X satisfying the universal mapping property: given a vector space W and a k-linear alternating map ϕ: V k → W , there exists a unique linear map φ: X → W such that φ ◦ a = ϕ.

Moreover, the pair (X, a) satisfying the universal mapping property above is unique up to isomorphism.

Proof. Let T be the subspace of V ⊗k = V ⊗ · · · ⊗ V (k-times) spanned by

{v1 ⊗ · · · ⊗ vk | vi = vj for some i ≠ j}.

Let X = V ⊗k/T and let a: V k → X be defined by

a(v1, . . . , vk) = v1 ⊗ · · · ⊗ vk + T.

It is easy to see that a is k-linear. If v1,..., vk ∈ V are such that vi = vj for some i ≠ j, then v1 ⊗ · · · ⊗ vk ∈ T and hence a(v1, . . . , vk) = v1 ⊗ · · · ⊗ vk + T = T. This shows that a is alternating. Now we show that it satisfies the universal mapping property. Let f : V^k → V^⊗k be the canonical k-linear map sending (v1, . . . , vk) to v1 ⊗ · · · ⊗ vk and let π : V^⊗k → V^⊗k/T be the canonical projection map. Then a = π ◦ f.

[Diagram: V^k —f→ V^⊗k —π→ V^⊗k/T , together with ϕ: V^k → W , ϕ̄: V^⊗k → W and φ: V^⊗k/T → W , where a = π ◦ f.]

Let W be a vector space and ϕ: V^k → W a k-linear alternating map. By the universal mapping property of the tensor product, there is a unique linear map ϕ̄: V^⊗k → W such that ϕ̄ ◦ f = ϕ. If v1 ⊗ · · · ⊗ vk is one of the elements spanning T (so that vi = vj for some i ≠ j), then

ϕ̄(v1 ⊗ · · · ⊗ vk) = ϕ̄(f(v1, . . . , vk)) = ϕ(v1, . . . , vk) = 0 because ϕ is alternating. This shows that ϕ̄ sends the elements that generate T to zero. Hence T ⊆ ker ϕ̄. Then by the universal mapping property of the quotient space, there is a unique linear map φ: V^⊗k/T → W such that φ ◦ π = ϕ̄. Hence φ ◦ a = φ ◦ π ◦ f = ϕ̄ ◦ f = ϕ.

To show that φ is unique, let φ′: V^⊗k/T → W be such that φ′ ◦ a = ϕ. Then φ′ ◦ π : V^⊗k → W is a linear map for which (φ′ ◦ π) ◦ f = φ′ ◦ a = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ◦ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ. Finally, the uniqueness of the pair (X, a) up to isomorphism follows from the standard argument via the universal mapping property.

Definition 2.4.2. The vector space X in Theorem 2.4.1 is called the k-th exterior power of V and is denoted by ∧^k V . Hence ∧^k V is a vector space together with a k-linear alternating map a: V^k → ∧^k V satisfying the universal mapping property:

Given any vector space W and a k-linear alternating map ϕ: V^k → W , there is a unique linear map φ: ∧^k V → W such that φ ◦ a = ϕ.

An element a(v1, . . . , vk) in ∧^k V will be denoted by v1 ∧ · · · ∧ vk. It is called an exterior product or a wedge product of v1,..., vk.

Proposition 2.4.3. The wedge product satisfies the following properties.

(i) v1 ∧ · · · ∧ (α vi) ∧ · · · ∧ vk = α(v1 ∧ · · · ∧ vi ∧ · · · ∧ vk) for any α ∈ F ;

(ii) v1 ∧ · · · ∧ (vi + vi′) ∧ · · · ∧ vk = v1 ∧ · · · ∧ vi ∧ · · · ∧ vk + v1 ∧ · · · ∧ vi′ ∧ · · · ∧ vk;

(iii) v1 ∧ · · · ∧ vk = 0 if vi = vj for some i ≠ j.

(iv) vσ(1) ∧ · · · ∧ vσ(k) = sgn(σ) v1 ∧ · · · ∧ vk for any σ ∈ Sk. Proof. The first two properties follow from the multilinearity of a. The last two properties follow from the fact that a is alternating and skew-symmetric.
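For example, for any u, v ∈ V and α, β, γ, δ ∈ F , the properties above give

(αu + βv) ∧ (γu + δv) = αγ(u ∧ u) + αδ(u ∧ v) + βγ(v ∧ u) + βδ(v ∧ v) = (αδ − βγ)(u ∧ v),

so the wedge product of two vectors in the span of u and v records the 2 × 2 determinant of their coefficients; in particular it vanishes whenever the two vectors are linearly dependent.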

Theorem 2.4.4. Let V be a finite-dimensional vector space with a basis B = {v1, . . . , vn}. Then the following set is a basis for ∧^k V :

{vi1 ∧ · · · ∧ vik | 1 ≤ i1 < ··· < ik ≤ n}. (2.12)

Proof. Let C be the set in (2.12). To show that C spans ∧^k V , let us recall that the following set is a basis for V^⊗k:

{vi1 ⊗ · · · ⊗ vik | 1 ≤ i1, . . . , ik ≤ n}.

By the universal mapping property for the tensor product, there is a unique linear map π : V^⊗k → ∧^k V such that

π(x1 ⊗ · · · ⊗ xk) = x1 ∧ · · · ∧ xk

for any x1,..., xk ∈ V . It follows that {vi1 ∧ · · · ∧ vik | 1 ≤ i1, . . . , ik ≤ n} spans ∧^k V . If two indices are the same, then vi1 ∧ · · · ∧ vik = 0. If the indices are all different, then we can rearrange them in increasing order by using Proposition 2.4.3 (iv) in order to see that C spans ∧^k V . Next, we show that C is linearly independent. Let

I = {(i1, . . . , ik) | 1 ≤ i1 < ··· < ik ≤ n}.

If α = (i1, . . . , ik) ∈ I, write vα = vi1 ∧ · · · ∧ vik . Now suppose

Σ_{α∈I} aα vα = 0, (2.13)

where each aα ∈ F . We will construct linear maps Fβ : ∧^k V → F such that Fβ(vα) = δαβ for all α, β ∈ I. Let B∗ = {f1, . . . , fn} be the dual basis of B for V ∗. Then fj(vi) = δij for i, j = 1, . . . , n. For α = (i1, . . . , ik) ∈ I, define fα : V^k → F by

fα(x1, . . . , xk) = Σ_{σ∈Sk} (sgn σ) fiσ(1) (x1) . . . fiσ(k) (xk).

Then fα is a k-linear alternating map. The proof of this fact is similar to the existence part of the proof of Theorem 2.3.3 and is omitted here. By the universal mapping property, there is a unique linear map Fα : ∧^k V → F such that Fα ◦ a = fα. Then for any α, β ∈ I,

Fβ(vα) = fβ(vi1 , . . . , vik ) = δαβ, where α = (i1, . . . , ik).

If we apply each Fβ to both sides of (2.13), we see that aβ = 0. It means that C is linearly independent.

Corollary 2.4.5. Let V be a finite-dimensional vector space with dim V = n. Then ∧^k V is a finite-dimensional vector space of dimension C(n, k) = n!/(k!(n − k)!).

Proof. This follows from a standard combinatorial argument.
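For example, if V = F^4 with basis {v1, v2, v3, v4}, then ∧^2 V has dimension C(4, 2) = 6, with basis {v1 ∧ v2, v1 ∧ v3, v1 ∧ v4, v2 ∧ v3, v2 ∧ v4, v3 ∧ v4}.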

From this Corollary, it is natural to define ∧^0 V = F . Moreover, if k > n, then ∧^k V = {0}. If k = n, we see that dim(∧^n V ) = 1. Hence if {v1, . . . , vn} is a basis for V , then the singleton set {v1 ∧ · · · ∧ vn} is a basis for ∧^n V . Let T : V → V be a linear map on a vector space V over a field F . Then T induces a unique linear map T^∧k : ∧^k V → ∧^k V such that

T^∧k(x1 ∧ · · · ∧ xk) = T (x1) ∧ · · · ∧ T (xk) for any x1,..., xk ∈ V . The case k = dim V will be of special interest.

Theorem 2.4.6. Let V be a finite-dimensional vector space with dim V = n and T : V → V a linear map. Then det T is the unique scalar such that

T (v1) ∧ · · · ∧ T (vn) = (det T )(v1 ∧ · · · ∧ vn) for any v1,..., vn ∈ V .

Proof. By the discussion above, T^∧n is a linear map on the 1-dimensional vector space ∧^n V . Hence there exists a unique scalar k such that T^∧n(w) = kw for any w ∈ ∧^n V . Next, we will show that k = det T . Let {v1, . . . , vn} be a basis for V . Then {v1 ∧ · · · ∧ vn} is a basis for ∧^n V . For i = 1, . . . , n, write

T (vi) = Σ_{j=1}^{n} aijvj.

Note that the matrix A = [aij] obtained in this way is the transpose of the matrix representation of T . But the determinant of a matrix is equal to the determinant of its transpose, and thus det T = det A. Now, let us consider

T (v1) ∧ · · · ∧ T (vn) = ( Σ_{j1=1}^{n} a1j1 vj1 ) ∧ · · · ∧ ( Σ_{jn=1}^{n} anjn vjn ) = Σ_{j1=1}^{n} · · · Σ_{jn=1}^{n} a1j1 . . . anjn (vj1 ∧ · · · ∧ vjn ) = Σ_{σ∈Sn} a1σ(1) . . . anσ(n) (vσ(1) ∧ · · · ∧ vσ(n)) = Σ_{σ∈Sn} (sgn σ) a1σ(1) . . . anσ(n) (v1 ∧ · · · ∧ vn) = (det T )(v1 ∧ · · · ∧ vn).

Hence T^∧n(v1 ∧ · · · ∧ vn) = (det T )(v1 ∧ · · · ∧ vn). Thus k = det T .
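As a quick check when n = 2: if T (v1) = av1 + cv2 and T (v2) = bv1 + dv2, then

T (v1) ∧ T (v2) = (av1 + cv2) ∧ (bv1 + dv2) = (ad − bc)(v1 ∧ v2),

and ad − bc is exactly the determinant of the matrix representation of T with respect to the basis {v1, v2}.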

Exercises

2.4.1. Let V be a vector space and v1,..., vk ∈ V . If {v1, . . . , vk} is linearly dependent, show that v1 ∧ · · · ∧ vk = 0. In particular, if dim V = n and k > n, then v1 ∧ · · · ∧ vk = 0.

2.4.2. Let V be a finite-dimensional vector space. Prove that for any k ∈ N, (∧^k V )∗ ≅ ∧^k(V ∗).

2.4.3. Let V be a finite-dimensional vector space and T : V → V a linear map. Show that if dim V = n and f : V n → F is an n-linear alternating form, then

f(T (v1),...,T (vn)) = (det T )f(v1, . . . , vn) for any v1, . . . , vn ∈ V . Moreover, det T is the only scalar satisfying the above equality for any v1, . . . , vn ∈ V .

2.4.4. Let f : V n → W be a multilinear map. Define fˆ: V n → W by

f̂(v1, . . . , vn) = Σ_{σ∈Sn} sgn(σ) f(vσ(1), . . . , vσ(n)) for any v1,..., vn ∈ V . Prove that f̂ is a multilinear alternating map.

2.4.5. Let V be a vector space over a field F and k a positive integer. Show that there exists a vector space X over F , together with a symmetric k-linear map s: V k → X satisfying the universal mapping property: given a vector space W and a symmetric k-linear map ϕ: V k → W , there exists a unique linear map φ: X → W such that φ ◦ s = ϕ. Moreover, show that the pair (X, s) satisfying the universal mapping property above is unique up to isomorphism. The pair (X, s) satisfying the above universal mapping property for symmetric k-linear maps is called the k-th symmetric product of V , denoted by Sk(V ).

2.4.6. Let V be a finite-dimensional vector space with dimension n. What is the dimension of S^k(V )? Justify your answer.

Chapter 3

Canonical Forms

The basic question in this chapter is as follows. Given a finite-dimensional vector space V and a linear operator T : V → V , does there exist an ordered basis B for V such that [T ]B has a “simple” form? First we investigate when T can be represented as a diagonal matrix. Then we will find a Jordan canonical form of a linear operator. But first, we review some results about polynomials that will be used in this chapter.

3.1 Polynomials

Definition 3.1.1. A polynomial f(x) ∈ F [x] is said to be monic if the coefficient of the highest degree term of f(x) is 1. A polynomial f(x) ∈ F [x] is said to be constant if f(x) = c for some c ∈ F . Equivalently, f(x) is constant if f(x) = 0 or deg f(x) = 0.

Definition 3.1.2. Let f(x), g(x) ∈ F [x], with g(x) ≠ 0. We say that g(x) divides f(x), denoted by g(x) | f(x), if there is a polynomial q(x) ∈ F [x] such that f(x) = q(x)g(x).

Theorem 3.1.3 (Division Algorithm). Let f(x), g(x) ∈ F [x], with g(x) ≠ 0. Then there exist unique polynomials q(x) and r(x) in F [x] such that

f(x) = q(x)g(x) + r(x)

and deg r(x) < deg g(x) or r(x) = 0.


Proof. First, we will show the existence part. If f(x) = 0, take q(x) = 0 and r(x) = 0. If f(x) ≠ 0 and deg f(x) < deg g(x), take q(x) = 0 and r(x) = f(x). Assume that deg f(x) ≥ deg g(x). We will prove the theorem by induction on deg f(x). If deg f(x) = 0, then deg g(x) = 0, i.e., f(x) = a and g(x) = b for some a, b ∈ F − {0}. Then f(x) = ab^{-1}g(x) + 0, with q(x) = ab^{-1} and r(x) = 0. Next, let f(x), g(x) ∈ F [x] with deg f(x) = n > 0 and deg g(x) = m ≤ n. Assume that the statement holds for any polynomial of degree < n. Write

f(x) = anx^n + · · · + a1x + a0 and g(x) = bmx^m + · · · + b1x + b0, where n ≥ m, ai, bj ∈ F for all i, j and bm ≠ 0. Let

h(x) = f(x) − an bm^{-1} x^{n−m} g(x). (1)

Then either h(x) = 0 or deg h(x) < n. If h(x) = 0, take q(x) = an bm^{-1} x^{n−m} and r(x) = 0. If deg h(x) < n, by the induction hypothesis, there exist q′(x) and r′(x) in F [x] such that

h(x) = q′(x)g(x) + r′(x) (2) where either r′(x) = 0 or deg r′(x) < deg g(x). Combining (1) and (2) together, we have that f(x) = (an bm^{-1} x^{n−m} + q′(x))g(x) + r′(x), as desired. To prove uniqueness, assume that

f(x) = q1(x)g(x) + r1(x) = q2(x)g(x) + r2(x) where qi(x), ri(x) ∈ F [x] and ri(x) = 0 or deg ri(x) < deg g(x), for i = 1, 2. Then

(q1(x) − q2(x))g(x) = r2(x) − r1(x). If r2(x) − r1(x) ≠ 0, then q1(x) − q2(x) ≠ 0, which implies

deg g(x) ≤ deg((q1(x) − q2(x))g(x)) = deg(r2(x) − r1(x)) < deg g(x), a contradiction. Thus r2(x) − r1(x) = 0, which implies q1(x) − q2(x) = 0.
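As a worked instance of the Division Algorithm over F = Q, divide f(x) = x^3 + 2x + 1 by g(x) = x^2 + 1. Here a3 b2^{-1} x^{3−2} = x and h(x) = f(x) − x g(x) = x + 1, which already has degree less than deg g(x). Hence q(x) = x and r(x) = x + 1, that is,

x^3 + 2x + 1 = x(x^2 + 1) + (x + 1).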

Corollary 3.1.4. Let p(x) ∈ F [x] and α ∈ F . Then p(α) = 0 if and only if p(x) = (x − α)q(x) for some q(x) ∈ F [x].

Proof. Assume that p(α) = 0. By the Division Algorithm, there exist q(x), r(x) in F [x] such that p(x) = (x − α)q(x) + r(x) where deg r(x) < 1 or r(x) = 0, i.e. r(x) is constant. By the assumption, we have r(α) = p(α) = 0, which implies r(x) = 0. So p(x) = (x − α)q(x). The converse is obvious.

Definition 3.1.5. Let p(x) ∈ F [x] and α ∈ F . We say that α is a root or a zero of p(x) if p(α) = 0. Hence the above corollary says that α is a root of p(x) if and only if x − α is a factor of p(x).

Corollary 3.1.6. Any polynomial of degree n ≥ 1 has at most n distinct zeros.

Proof. We will prove this by induction on n. The case n = 1 is clear. Assume that the statement holds for a positive integer n. Let f(x) be a polynomial of degree n + 1. If f has no zero, we are done. Suppose that f(α) = 0 for some α ∈ F . By Corollary 3.1.4, f(x) = (x − α)g(x) for some g(x) ∈ F [x]. Hence deg g(x) = n. By the induction hypothesis, g(x) has at most n zeros, which implies that f(x) has at most n + 1 zeros.

Definition 3.1.7. Let f1(x), . . . , fn(x) ∈ F [x]. A monic polynomial g(x) ∈ F [x] is said to be the greatest common divisor of f1(x), . . . , fn(x) if it satisfies these two properties:

(i) g(x) | fi(x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if h(x) | fi(x) for i = 1, . . . , n, then h(x) | g(x).

We denote the greatest common divisor of f1,..., fn by gcd(f1, . . . , fn).

Definition 3.1.8. Let f1(x), . . . , fn(x) ∈ F [x]. A monic polynomial g(x) ∈ F [x]

is said to be the least common multiple of f1(x), . . . , fn(x) if it satisfies these two properties:

(i) fi(x) | g(x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if fi(x) | h(x) for i = 1, . . . , n, then g(x) | h(x).

We denote the least common multiple of f1,..., fn by lcm(f1, . . . , fn).

Proposition 3.1.9. Let f1(x), . . . , fn(x) be nonzero polynomials in F [x] and let

g(x) = gcd(f1(x), . . . , fn(x)).

Then there exist q1(x), . . . , qn(x) ∈ F [x] such that

g(x) = q1(x)f1(x) + ··· + qn(x)fn(x).

Proof. Let

P = { Σ_{i=1}^{n} pi(x)fi(x) | pi(x) ∈ F [x], i = 1, . . . , n }.

Since each fi(x) ∈ P is nonzero, P contains a nonzero element. By the Well-ordering Principle, P contains a polynomial d(x) ∈ F [x] with the smallest degree. Thus d(x) = Σ_{i=1}^{n} pi(x)fi(x), for some pi(x) ∈ F [x], i = 1, . . . , n. By the Division Algorithm, there exist a(x), r(x) ∈ F [x] such that f1(x) = a(x)d(x) + r(x), where r(x) = 0 or deg r(x) < deg d(x). It follows that r(x) ∈ P. Indeed,

r(x) = f1(x) − a(x)d(x) = f1(x) − a(x) Σ_{i=1}^{n} pi(x)fi(x) = (1 − a(x)p1(x))f1(x) − Σ_{i=2}^{n} a(x)pi(x)fi(x) ∈ P.

But then d(x) is an element in P with the smallest degree. Hence r(x) = 0, which implies d(x) | f1(x). Similarly, we have d(x) | fi(x) for i = 1, . . . , n. It follows that d(x) | g(x). Thus there exists σ(x) ∈ F [x] such that

g(x) = σ(x)d(x) = Σ_{i=1}^{n} σ(x)pi(x)fi(x).

Letting qi(x) = σ(x)pi(x), for i = 1, . . . , n, finishes the proof.
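For instance, over F = Q, let f1(x) = x^2 − 1 and f2(x) = x^2 − 2x + 1 = (x − 1)^2, whose greatest common divisor is g(x) = x − 1. Then

(1/2) f1(x) + (−1/2) f2(x) = (1/2)(2x − 2) = x − 1,

so one may take q1(x) = 1/2 and q2(x) = −1/2 in Proposition 3.1.9.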

Definition 3.1.10. Two polynomials f(x) and g(x) in F [x] are said to be rela- tively prime if gcd(f(x), g(x)) = 1.

Corollary 3.1.11. Two polynomials f(x) and g(x) in F [x] are relatively prime if and only if there exist p(x), q(x) ∈ F [x] such that p(x)f(x) + q(x)g(x) = 1.

Proof. If gcd(f(x), g(x)) = 1, the implication follows from Proposition 3.1.9. Conversely, suppose there exist p(x), q(x) ∈ F [x] such that p(x)f(x) + q(x)g(x) = 1. Let d(x) = gcd(f(x), g(x)). Then d(x) | f(x) and d(x) | g(x). It follows easily that d(x) | p(x)f(x) + q(x)g(x). Hence d(x) | 1, which implies d(x) = 1.

Definition 3.1.12. Let p(x), q(x), r(x) ∈ F [x] with r(x) 6= 0. We say that p(x) is congruent to q(x) modulo r(x) if r(x) | (p(x) − q(x)), denoted by

p(x) ≡ q(x) mod r(x).

Theorem 3.1.13 (Chinese Remainder Theorem). Let σ1(x), . . . , σn(x) be polyno-

mials in F [x] such that σi(x) and σj(x) are relatively prime for i ≠ j. Then given

any polynomials r1(x), . . . , rn(x) ∈ F [x], there exists a polynomial p(x) ∈ F [x] such that

p(x) ≡ ri(x) mod σi(x) for i = 1, . . . , n.

Proof. For j = 1, . . . , n, let

ϕj(x) = ∏_{1≤i≤n, i≠j} σi(x).

Then σj(x) and ϕj(x) are relatively prime for each j. By Proposition 3.1.9, for

each i, there exist pi(x), qi(x) ∈ F [x] such that 1 = pi(x)ϕi(x) + qi(x)σi(x). Let

p(x) = r1(x)p1(x)ϕ1(x) + ··· + rn(x)pn(x)ϕn(x).

Note that for each i,

ri(x)pi(x)ϕi(x) = ri(x) − ri(x)qi(x)σi(x) ≡ ri(x) mod σi(x)

and

rj(x)pj(x)ϕj(x) ≡ 0 mod σi(x) for j 6= i.

Hence p(x) ≡ ri(x) mod σi(x) for i = 1, . . . , n.
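For a small illustration over F = Q, take σ1(x) = x and σ2(x) = x − 1, which are relatively prime, and r1(x) = 1, r2(x) = 2. Following the proof, ϕ1(x) = x − 1 and ϕ2(x) = x, and from 1 = (−1)(x − 1) + 1 · x we may take p1(x) = −1 and p2(x) = 1. Then

p(x) = r1(x)p1(x)ϕ1(x) + r2(x)p2(x)ϕ2(x) = (−1)(x − 1) + 2x = x + 1,

and indeed p(0) = 1 and p(1) = 2, so p(x) ≡ 1 mod x and p(x) ≡ 2 mod (x − 1).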

Definition 3.1.14. We say that f(x) ∈ F [x] is irreducible over F if f is not constant, and whenever f(x) = g(x)h(x) with g(x), h(x) ∈ F [x], then g(x) or h(x) is constant.

Note that the notion of irreducibility depends on the field F . For example, f(x) = x^2 + 1 is irreducible over R, but not irreducible over C because

x^2 + 1 = (x + i)(x − i) over C.

Also, note that a linear polynomial ax + b, where a ≠ 0, is always irreducible.

Lemma 3.1.15. Let f(x), g(x) and h(x) be polynomials in F [x] where f(x) is irreducible. If f(x) | g(x)h(x), then either f(x) | g(x) or f(x) | h(x).

Proof. Assume that f(x) is irreducible, f(x) | g(x)h(x), but f(x) ∤ g(x). We will show that gcd(f(x), g(x)) = 1. Let d(x) = gcd(f(x), g(x)). Since d(x) | f(x), we can write f(x) = d(x)k(x) for some k(x) ∈ F [x]. By irreducibility of f(x), d(x) or k(x) is a constant. If k(x) = k is a constant, then d(x) = k^{-1}f(x), which implies that f(x) | g(x), a contradiction. Hence, d(x) is a constant, i.e. d(x) = 1. By Proposition 3.1.9, we have 1 = p(x)f(x) + q(x)g(x) for some p(x), q(x) ∈ F [x]. Thus h(x) = p(x)f(x)h(x) + q(x)g(x)h(x). Since f(x) divides g(x)h(x), it divides both terms on the right-hand side of this equation, and hence f(x) | h(x).

Theorem 3.1.16 (Unique factorization of polynomials). Every nonconstant poly- nomial in F [x] can be written as a product of irreducible polynomials, and the factorization is unique up to associates; namely, if f(x) ∈ F [x] and

f(x) = g1(x) . . . gn(x) = h1(x) . . . hm(x), then n = m and we can renumber the indices so that gi(x) = αihi(x) for some

αi ∈ F for i = 1, . . . , n.

Proof. We will prove the theorem by induction on deg f(x). Obviously, any polynomial of degree 1 is irreducible. Let n > 1 be an integer and assume that any polynomial of degree less than n can be written as a product of irreducible polynomials. Let f(x) ∈ F [x] with deg f(x) = n. If f(x) is irreducible, we are done. Otherwise, we can write f(x) = g(x)h(x) for some g(x), h(x) ∈ F [x], where deg g(x) < n and deg h(x) < n. By the induction hypothesis, both g(x) and h(x) can be written as products of irreducible polynomials, and hence so can f(x). Next, we will prove uniqueness of the factorization again by the induction on deg f(x). This is clear in case deg f(x) = 1. Let n > 1 and assume that a factorization of any polynomial of degree less than n is unique up to associates. Let f(x) ∈ F [x] with deg f(x) = n. Suppose that

f(x) = g1(x) . . . gn(x) = h1(x) . . . hm(x),

where gi(x) and hj(x) are all irreducible. Hence g1(x) | h1(x) . . . hm(x). It

follows easily by a generalization of Lemma 3.1.15 that g1(x) | hi(x) for some i = 1, . . . , m. By renumbering the irreducible factors in the second factorization

if necessary, we may assume that i = 1. Since g1(x) and h1(x) are irreducible,

g1(x) = α1h1(x) for some α1 ∈ F . Thus

α1g2(x) . . . gn(x) = h2(x) . . . hm(x).

Note that the polynomial above has degree less than n. Hence, by the induction

hypothesis, m = n and for each j = 2, . . . , n, gj(x) = αjhj(x) for some αj ∈ F . This finishes the induction and the proof of the theorem.

Remark. Theorem 3.1.16 says that F [x] is a Unique Factorization Domain (UFD) whenever F is a field.

Definition 3.1.17. A nonconstant polynomial p(x) ∈ F [x] is said to split over F if p(x) can be written as a product of linear factors in F [x].

Definition 3.1.18. A field F is said to be algebraically closed if every noncon- stant polynomial over F has a root in F .

Examples. C is algebraically closed by the Fundamental Theorem of Algebra, but Q and R are not algebraically closed.

Proposition 3.1.19. The following statements on a field F are equivalent:

(i) F is algebraically closed;

(ii) every nonconstant polynomial p(x) ∈ F [x] splits over F ;

(iii) every irreducible polynomial in F [x] has degree one.

Proof. (i) ⇒ (ii). Let p(x) ∈ F [x] − F and n = deg p(x). We will prove by induction on n. If n = 1, then we are done. Assume that n > 1 and every nonconstant polynomial of degree n−1 in F [x] splits over F . Since F is algebraically closed, p(x) has a root α ∈ F . By Corollary 3.1.4, p(x) = (x − α)q(x) for some q(x) ∈ F [x]. Then deg q(x) = n−1, and hence q(x) splits over F by the induction hypothesis. Thus p(x) also splits over F . (ii) ⇒ (iii). Let q(x) be an irreducible polynomial in F [x]. Then deg q(x) ≥ 1 and hence q(x) splits over F by the assumption. If deg q(x) > 1, then any linear factor of q(x) is its nonconstant proper factor, contradicting irreducibility of q(x). Thus deg q(x) = 1. (iii) ⇒ (i). Let f(x) be a nonconstant polynomial over F . By Theorem 3.1.16, f(x) can be written as a product of linear factors. Hence there exists α ∈ F such that (x − α) | f(x), i.e. α is a root of f(x), by Corollary 3.1.4. This shows that F is algebraically closed.

3.2 Diagonalization

Throughout this chapter, V will be a finite-dimensional vector space over a field F and T : V → V a linear operator on V .

Definition 3.2.1. A linear operator T : V → V is said to be diagonalizable if there exists an ordered basis B for V such that [T ]B is a diagonal matrix. An n × n matrix A is said to be diagonalizable if there is an invertible matrix P such that P −1AP is a diagonal matrix, i.e. A is similar to a diagonal matrix.

Lemma 3.2.2. Let V be a finite-dimensional vector space with dim V = n. Let B be an ordered basis for V and T : V → V a linear operator on V . If A is an n × n matrix similar to [T ]B, then there is an ordered basis C for V such that

[T ]C = A.

Proof. Let B = {v1, . . . , vn} and A an n × n matrix similar to [T ]B. Then there is an invertible matrix P such that A = P^{-1}[T ]BP . Since L(V ) ≅ Mn(F ), there is a linear map

U : V → V such that [U]B = P . Then U is a linear isomorphism because P is invertible. Hence

[U^{-1}TU]B = [U^{-1}]B[T ]B[U]B = P^{-1}[T ]BP = A.

It follows that, for j = 1, . . . , n,

U^{-1}TU(vj) = Σ_{i=1}^{n} aijvi, and that TU(vj) = Σ_{i=1}^{n} aijU(vi). (3.1)

Let C = {U(v1),...,U(vn)}. Since B is an ordered basis for V and U is a linear isomorphism, we see that C is a basis for V and [T ]C = A by (3.1).

Proposition 3.2.3. Let T : V → V be a linear operator on V and B an ordered

basis for V . Then T is diagonalizable if and only if [T ]B is diagonalizable.

Proof. Assume that T is diagonalizable. Then there is an ordered basis C such that [T ]C is a diagonal matrix. Thus there is an invertible matrix P such that P^{-1}[T ]BP = [T ]C. This shows that [T ]B is diagonalizable.

Conversely, assume that [T ]B is diagonalizable. Then [T ]B is similar to a diagonal matrix D. By Lemma 3.2.2, there is an ordered basis C for V such that

[T ]C = D. Hence T is diagonalizable.

Corollary 3.2.4. Let A ∈ Mn(F ) and LA : F^n → F^n a linear map on F^n defined by LA(x) = Ax for any x ∈ F^n, considered as a column matrix. Then A is diagonalizable if and only if LA is diagonalizable.

Proof. Exercise.

Proposition 3.2.5. Let V be a vector space over a field F with dim V = n and T : V → V a linear operator. Then T is diagonalizable if and only if there is a basis B = {v1, . . . , vn} for V and scalars λ1, . . . , λn ∈ F , not necessarily distinct, such that

T vj = λjvj for j = 1, . . . , n.

Proof. Assume there is a basis B = {v1, . . . , vn} for V and scalars λ1, . . . , λn in F , not necessarily distinct, such that

T vj = λjvj for j = 1, . . . , n.

By the definition of matrix representation, we have [T ]B = diag(λ1, λ2, . . . , λn), the n × n diagonal matrix with diagonal entries λ1, . . . , λn. (3.2)

Conversely, assume that T is diagonalizable. Then there is an ordered basis

B = {v1, . . . , vn} such that [T ]B is a diagonal matrix in the form (3.2). Again, by the definition of matrix representation, we have T vj = λjvj for j = 1, . . . , n.

Corollary 3.2.6. Let A ∈ Mn(F ). Then A is diagonalizable if and only if there n is a basis B = {v1, . . . , vn} for F and scalars λ1,..., λn ∈ F , not necessarily distinct, such that

Avj = λjvj for j = 1, . . . , n.

Proof. This follows immediately from Proposition 3.2.5 and Corollary 3.2.4.

Definition 3.2.7. Let T : V → V be a linear operator on V . A scalar λ ∈ F is called an eigenvalue for T if there is a non-zero v ∈ V such that T (v) = λv. A non-zero vector v such that T (v) = λv is called an eigenvector corresponding to the eigenvalue λ. For each λ ∈ F , define

Vλ = {v ∈ V | T (v) = λv} = ker(T − λIV ).

Then Vλ is a subspace of V . If λ is not an eigenvalue of T , then Vλ = {0}. If λ

is an eigenvalue of T , we call Vλ the eigenspace corresponding to the eigenvalue

λ. Any non-zero vector in Vλ is an eigenvector corresponding to λ.

Similarly, we define an eigenvalue, an eigenvector and an eigenspace of a matrix in an analogous way.

Definition 3.2.8. Let A be an n × n matrix with entries in a field F . A scalar λ ∈ F is called an eigenvalue for A if there is a non-zero v ∈ F^n such that Av = λv. A non-zero vector v such that Av = λv is called an eigenvector corresponding to the eigenvalue λ. For each λ ∈ F , define Vλ = {v ∈ F^n | Av = λv}.

If λ is not an eigenvalue of A, then Vλ = {0}. If λ is an eigenvalue of A, Vλ is called the eigenspace corresponding to the eigenvalue λ. Any non-zero vector in

Vλ is an eigenvector corresponding to λ. In fact, an eigenvalue (eigenvector, eigenspace) for a matrix A is an eigenvalue (eigenvector, eigenspace) for the linear operator LA : F^n → F^n, x ↦ Ax. Hence any result about eigenvalues and eigenvectors of a linear operator will be transferred to the analogous result for a matrix as well.

Using the language of eigenvalues and eigenvectors, we can rephrase Propo- sition 3.2.5 as follows:

Corollary 3.2.9. A linear operator T is diagonalizable if and only if there is a basis for V consisting of eigenvectors of T .

From this Corollary, to verify whether a linear operator is diagonalizable, we will find its eigenvectors and see whether they form a basis for the vector space.

Proposition 3.2.10. Let T : V → V be a linear operator. If v1, . . . , vk are eigen- vectors of T corresponding to distinct eigenvalues, then {v1, . . . , vk} is linearly independent.

Proof. We will proceed by induction on k. If k = 1, then the result follows immediately because a non-zero vector forms a linearly independent set. Assume the statement holds for k − 1 eigenvectors. Let v1, . . . , vk be eigenvectors of T corresponding to distinct eigenvalues λ1, . . . , λk, respectively. Let α1, . . . , αk ∈ F be such that

α1v1 + α2v2 + ··· + αkvk = 0. (3.3)

Applying T to both sides of (3.3), we have

α1λ1v1 + α2λ2v2 + ··· + αkλkvk = 0. (3.4)

Multiplying equation (3.3) by λk, we also have

α1λkv1 + α2λkv2 + ··· + αkλkvk = 0. (3.5)

We now subtract (3.5) from (3.4).

α1(λ1 − λk)v1 + α2(λ2 − λk)v2 + ··· + αk−1(λk−1 − λk)vk−1 = 0.

By the induction hypothesis, αi(λi −λk) = 0 for i = 1, . . . , k−1. Hence αi = 0 for i = 1, . . . , k − 1 because λi’s are all distinct. Substitute αi = 0 for i = 1, . . . , k − 1 in (3.3). It follows that αk = 0. Thus {v1, . . . , vk} is linearly independent.

Corollary 3.2.11. Let V be a finite-dimensional vector space with dim V = n and T : V → V a linear operator. Then T has at most n distinct eigenvalues. Furthermore, if T has n distinct eigenvalues, then T is diagonalizable.

Proof. Let λ1, . . . , λk be the distinct eigenvalues of T with corresponding eigenvectors v1, . . . , vk, respectively. By Proposition 3.2.10, {v1, . . . , vk} is linearly independent. Since dim V = n, it follows that k ≤ n. If k = n, {v1, . . . , vn} is a basis for V consisting of eigenvectors of T . Hence T is diagonalizable by Corollary 3.2.9.

Proposition 3.2.12. Let T : V → V be a linear operator with distinct eigenvalues

λ1,..., λk. Let W = Vλ1 + ··· + Vλk , where each Vλi is the corresponding eigenspace of λi. Then W = Vλ1 ⊕· · ·⊕Vλk . In other words, the sum of eigenspaces is indeed a direct sum.

Proof. Let v1 ∈ Vλ1 ,..., vk ∈ Vλk be such that v1 + ··· + vk = 0. Suppose vi ≠ 0 for some i. By renumbering if necessary, assume that vi ≠ 0 for 1 ≤ i ≤ j and vi = 0 for i = j + 1, . . . , k. Then v1 + ··· + vj = 0. This shows that {v1, . . . , vj} is linearly dependent. But this contradicts Proposition 3.2.10. Hence vi = 0 for i = 1, . . . , k.

Theorem 3.2.13. Let T : V → V be a linear operator with distinct eigenvalues

λ1,..., λk. Then TFAE: (i) T is diagonalizable;

(ii) V = Vλ1 ⊕ · · · ⊕ Vλk ;

(iii) dim V = dim Vλ1 + ··· + dim Vλk .

Proof. Let W = Vλ1 + ··· + Vλk . By Proposition 3.2.12, W = Vλ1 ⊕ · · · ⊕ Vλk . (i) ⇒ (ii). Assume that T is diagonalizable. Let B be a basis for V consisting of eigenvectors of T . For i = 1, . . . , k, let Bi = B ∩ Vλi . Then B = ∪_{i=1}^{k} Bi and Bi ∩ Bj = ∅ for any i ≠ j. Note that each Bi is a linearly independent subset of

Vλi . Hence k k X X dim V = |B| = |Bi| ≤ dim Vλi = dim W. i=1 i=1

This implies V = W = Vλ1 ⊕ · · · ⊕ Vλk . (ii) ⇒ (iii). This follows from Corollary 1.6.22. Pk (iii) ⇒ (i). Suppose dim V = i=1 dim Vλi . For i = 1, . . . , k, choose a basis Bi k for each Vλi . Then Bi ∩ Bj = ∅ for any i 6= j. Let B = ∪i=1Bi. By Proposition 1.6.21, B is a basis for W and hence is linearly independent in V . It follows that

k k X X dim V = dim Vλi = |Bi| = |B|. i=1 i=1 Thus B is a basis for V consisting of eigenvectors of T . This shows that T is diagonalizable. 120 CHAPTER 3. CANONICAL FORMS

The next proposition gives a method for computing an eigenvalue of a linear map by solving a certain polynomial equation called the characteristic equation.

Proposition 3.2.14. Let T : V → V be a linear operator and λ ∈ F . Then λ is an eigenvalue of T if and only if det(T − λIV ) = 0.

Proof. For any λ ∈ F , we have the following equivalent statements:

λ is an eigenvalue of T ⇔ ∃v ∈ V − {0},T (v) = λv

⇔ ∃v ∈ V − {0}, (T − λIV )(v) = 0

⇔ T − λIV is not 1-1

⇔ T − λIV is not invertible

⇔ det(T − λIV ) = 0.

Notice that we use the assumption V being finite-dimensional in the fourth equiv- alence.

Corollary 3.2.15. Let A ∈ Mn(F ) and λ ∈ F . Then λ is an eigenvalue of A if and only if det(A − λIn) = 0.

Proposition 3.2.16. Let T : V → V be a linear operator and λ ∈ F . If B is an ordered basis for V , then

det(T − λIV ) = det([T ]B − λIn).

Hence λ is an eigenvalue of T if and only if λ is an eigenvalue of [T ]B.

Proof. The first statement follows from the fact that

det(T − λIV ) = det([T − λIV ]B) = det([T ]B − λIn).

The second statement immediately follows.

Definition 3.2.17. Let A ∈ Mn(F ). We define the characteristic polynomial of A to be

χA(x) = det(xIn − A). 3.2. DIAGONALIZATION 121

Similarly, if T : V → V is a linear operator on V , we define the characteristic polynomial of T to be

χT (x) = det(xIn − [T ]B),

where B is an ordered basis for V .

Notice that χT (and χA) is a monic polynomial of degree n = dim V . More- over, Proposition 3.2.14 shows that the eigenvalues of T (or A) are the roots of its characteristic polynomial.

Remark. Note that the matrix xIn −A is in Mn(F [x]) with each entry in xIn −A being a polynomial in F [x]. In this case, F [x] is a ring but not a field. We can extend the definition of the determinant of a matrix over a field to that of a matrix over a ring. However, we cannot define the characteristic polynomial of

a linear operator T to be det(xIV − T ) because xIV − T is not a linear operator on a vector space V . We define its characteristic polynomial using its matrix representation instead.

Example. Define T : R^2 → R^2 by

T(x, y) = (x + 4y, 3x + 2y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R^2 consisting of eigenvectors of T.

Solution. Let B = {(1, 0), (0, 1)} be the standard ordered basis for R^2. Let

    A = [T]_B = [ 1  4 ]
                [ 3  2 ].

Then T can be viewed as T(v) = Av = L_A(v) for any v ∈ R^2, written as a 2 × 1 column matrix. Hence

    χT(x) = χA(x) = det [ x − 1    −4  ]
                        [  −3    x − 2 ] = (x − 1)(x − 2) − 12 = (x − 5)(x + 2).

Thus the eigenvalues of T are −2 and 5. Since T has 2 distinct eigenvalues, it is diagonalizable. To find a basis consisting of eigenvectors of T, we will find the eigenspaces corresponding to −2 and 5, respectively.
λ = −2 : Let v = (x, y) ∈ ker(T + 2I). Then (A + 2I)v = 0, i.e.,

    [ 3  4 ] [ x ]   [ 0 ]
    [ 3  4 ] [ y ] = [ 0 ].

Thus 3x + 4y = 0. Hence the eigenspace corresponding to λ = −2 is ⟨(4, −3)⟩.
λ = 5 : Let v = (x, y) ∈ ker(T − 5I). Then (A − 5I)v = 0, i.e.,

    [ −4   4 ] [ x ]   [ 0 ]
    [  3  −3 ] [ y ] = [ 0 ].

Thus x − y = 0. Hence the eigenspace corresponding to λ = 5 is ⟨(1, 1)⟩.
Let C = {(4, −3), (1, 1)}. Then C is a linearly independent set with 2 elements, and hence it is a basis for R^2 consisting of eigenvectors of T. In fact, to obtain a basis for R^2 consisting of eigenvectors of T, we simply choose one vector from each eigenspace.
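As a quick numerical sanity check (an illustrative sketch added here, not part of the original computation), the eigenvalues and eigenvectors found above can be verified with NumPy:

```python
# Verify the example: A = [[1,4],[3,2]] has eigenvalues -2 and 5,
# with eigenvectors (4,-3) and (1,1) respectively.
import numpy as np

A = np.array([[1.0, 4.0],
              [3.0, 2.0]])

print(np.sort(np.linalg.eigvals(A)))    # eigenvalues are -2 and 5

v1 = np.array([4.0, -3.0])              # claimed eigenvector for lambda = -2
v2 = np.array([1.0, 1.0])               # claimed eigenvector for lambda =  5
print(np.allclose(A @ v1, -2 * v1))     # True
print(np.allclose(A @ v2,  5 * v2))     # True
```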

Example. Define T : R^2 → R^2 by

T(x, y) = (x + y, y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R^2 consisting of eigenvectors of T.

Solution. Let

    A = [T]_B = [ 1  1 ]
                [ 0  1 ],

where B is the standard ordered basis for R^2. Hence

    χT(x) = χA(x) = det [ x − 1   −1  ]
                        [   0   x − 1 ] = (x − 1)^2.

Hence the eigenvalue of T is 1 with multiplicity 2. Now we find the eigenspace corresponding to 1. Let v = (x, y) ∈ ker(T − I). Then (A − I)v = 0, i.e.,

    [ 0  1 ] [ x ]   [ 0 ]
    [ 0  0 ] [ y ] = [ 0 ].

Thus y = 0. Hence the eigenspace corresponding to λ = 1 is ⟨(1, 0)⟩. We see that there is no basis for R^2 consisting of eigenvectors of T. Thus T is not diagonalizable.

Example. Define T : R^2 → R^2 by

T(x, y) = (−y, x).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R^2 consisting of eigenvectors of T.

Solution. Let

    A = [T]_B = [ 0  −1 ]
                [ 1   0 ],

where B is the standard ordered basis for R^2. Hence

    χT(x) = χA(x) = det [  x   1 ]
                        [ −1   x ] = x^2 + 1.

Since x^2 + 1 has no root in R, we see that T is not diagonalizable. Note that if T is regarded as a linear map on C^2, then T has two eigenvalues i and −i and hence it is diagonalizable over C.

Remark. A linear map on a complex vector space always has an eigenvalue by the Fundamental Theorem of Algebra.

If A is an n × n diagonalizable matrix, then there is an invertible matrix P such that P^{-1}AP = D is a diagonal matrix. Assume that D = diag(λ1, . . . , λn). Then AP = PD.

If we write P = [p1 . . . pn], where each pj is the j-th column of P , then

AP = [Ap1 . . . Apn] and PD = [λ1p1 . . . λnpn].

Hence

[Ap1 . . . Apn] = [λ1p1 . . . λnpn].

It follows that Apj = λjpj for j = 1, . . . , n. Thus each pj is an eigenvector of A corresponding to the eigenvalue λj. Hence the j-th column of P is an eigenvector of A corresponding to the j-th diagonal entry of D.

Example. Given the following 3 × 3 matrix A:

 2 0 −2    A =  0 1 0  , −2 0 5 determine whether A is diagonalizable. If it is, find an invertible matrix P such that P −1AP is a diagonal matrix.

Solution.

 x − 2 0 2    2 χA(x) = det  0 x − 1 0  = (x − 1) (x − 6). 2 0 x − 5

Hence the eigenvalues of A are 1, 1, 6. By routine calculation, we see that the eigenspace corresponding to λ = 1 is h(0, 1, 0), (2, 0, 1)i and the eigenspace 3 corresponding to λ = 6 is h(1, 0, −2)i. Hence we have a basis for R

{(0, 1, 0), (2, 0, 1), (1, 0, −2)} consisting of eigenvectors of A. This shows that A is diagonalizable. If we let

 0 2 1   1 0 0      P =  1 0 0  and D =  0 1 0  , 0 1 −2 0 0 6 then we have P −1AP = D.

The following proposition about a determinant of a will be useful.

Proposition 3.2.18. Suppose A is an (m + n) × (m + n) matrix which can be written in the block form

    A = [ B  C ]
        [ O  D ],

where B ∈ Mm×m(F), C ∈ Mm×n(F), D ∈ Mn×n(F) and O is the zero matrix of size n × m. Then

det A = det B · det D.

Proof. We outline the calculations and leave the details to the reader. Note that

    [ B  C ]   [ Im  C ] [ B  O  ]
    [ O  D ] = [ O   D ] [ O  In ],

where the zero matrices are of suitable sizes. It is easy to verify that

    det [ Im  C ] = det D   and   det [ B  O  ] = det B.
        [ O   D ]                     [ O  In ]

The desired result now follows.
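For readers who want to experiment, here is a small NumPy sketch of Proposition 3.2.18 with an arbitrarily chosen block matrix (illustrative only):

```python
# det of a block upper-triangular matrix equals det(B) * det(D).
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # det B = -2
C = np.array([[5.0, 6.0, 7.0],
              [8.0, 9.0, 0.0]])
D = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])       # det D = -9
O = np.zeros((3, 2))

A = np.block([[B, C],
              [O, D]])
print(np.isclose(np.linalg.det(A), np.linalg.det(B) * np.linalg.det(D)))  # True
```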

Definition 3.2.19. Let T : V → V be a linear operator. Assume that

χT(x) = (x − λ1)^{n1} ··· (x − λk)^{nk},

where λ1,..., λk are distinct eigenvalues of T . We call ni the algebraic multiplicity of λi and dim Vλi the geometric multiplicity of λi. In other words, the algebraic multiplicity of λi is the number of the repeated factors x − λi in the characteristic polynomial, and its geometric multiplicity is the number of linearly independent eigenvectors corresponding to λi.

Proposition 3.2.20. Let T : V → V be a linear operator. Assume that the characteristic polynomial χT (x) splits over F with distinct roots λ1, . . . , λk. For

each eigenvalue λi of T , its geometric multiplicity is no greater than its algebraic multiplicity. They are equal for all eigenvalues if and only if T is diagonalizable.

Proof. Let λ be an eigenvalue of T with geometric multiplicity d. Then Vλ con-

tains d linearly independent eigenvectors, say {v1, . . . , vd}. Extend it to a basis

B = {v1, . . . , vn} for V . Since

T (vi) = λvi for i = 1, . . . , d,

the matrix representation [T]_B has the block form

    [T]_B = [ λId  B ]
            [  O   C ],

where B is a d × (n − d) matrix, C is an (n − d) × (n − d) matrix, and O is the zero matrix of size (n − d) × d. By Proposition 3.2.18,

det(xIn − [T]_B) = det((x − λ)Id) · det(xIn−d − C) = (x − λ)^d g(x),

where g(x) = det(xIn−d − C) is a polynomial in F[x]. Hence (x − λ)^d | χT(x). This shows that d ≤ the algebraic multiplicity of λ.

Let n = dim V , ni = algebraic multiplicity of λi and di = geometric multi- plicity of λi for i = 1, . . . , k. Suppose T is diagonalizable. Let B be a basis for V consisting of eigenvectors of T . For i = 1, . . . , k, let Bi = B∩Vλi , the set of vectors in B that are eigenvectors of T corresponding to λi, and let mi = |Bi|. Then

mi ≤ di ≤ ni for i = 1, . . . , k.

Hence

n = m1 + ··· + mk ≤ d1 + ··· + dk ≤ n1 + ··· + nk = n.

This implies that mi = di = ni for i = 1, . . . , k. Conversely, if di = ni for i = 1, . . . , k, then

dim V = n = n1 + ··· + nk = d1 + ··· + dk = dim Vλ1 + ··· + dim Vλk.

Hence T is diagonalizable by Theorem 3.2.13.

Exercises

In these exercises, let V be a finite-dimensional vector space over a field F and T : V → V a linear operator.

3.2.1. Given 3 × 3 matrices A below, determine whether A is diagonalizable. If it is, find an invertible matrix P such that P −1AP is a diagonal matrix.

 3 1 −1   5 −6 −6      (a)  2 2 −1  (b)  −1 4 2  . 2 2 0 3 −6 −4

3.2.2. Prove the following statements:

(i) 0 is an eigenvalue of T if and only if T is non-invertible;

(ii) If T is invertible and λ is an eigenvalue of T, then λ^{-1} is an eigenvalue of T^{-1}.

3.2.3. Let S and T be linear operators on V . Show that ST and TS have the same set of eigenvalues. Hint: Separate the cases whether 0 is an eigenvalue.

3.2.4. If A and B are similar square matrices, show that χA = χB. Hence similar matrices have the same set of eigenvalues.

3.2.5. If T^2 has an eigenvalue λ^2, for some λ ∈ F, show that λ or −λ is an eigenvalue of T.
Remark. Try to use the definition and not the characteristic equation.

3.2.6. Prove that if λ is an eigenvalue of T and p(x) ∈ F [x], then p(λ) is an eigenvalue of p(T ).

3.2.7. Let λ ∈ F and suppose there is a non-zero v ∈ V such that T(v) = λv. Prove that there is a non-zero linear functional f ∈ V* such that T^t(f) = λf. In other words, if λ is an eigenvalue of T, then it is an eigenvalue of T^t.

3.3 Minimal Polynomial

Definition 3.3.1. Let T : V → V be a linear operator and p(x) ∈ F[x]. If p(x) = a0 + a1x + ··· + anx^n, define

p(T) = a0I + a1T + ··· + anT^n.

Then p(T ) is a linear operator on V . Also, if p(x), q(x) ∈ F [x] and k ∈ F ,

(p + q)(T ) = p(T ) + q(T ) (kp)(T ) = kp(T ) (pq)(T ) = p(T )q(T ).

In other words, the map p(x) 7→ p(T ) is an algebra homomorphism from the polynomial algebra F [x] into the algebra of linear operators L(V ). Note that any two polynomials in T commute:

p(T )q(T ) = (pq)(T ) = (qp)(T ) = q(T )p(T ).

Similarly, if A is an n × n matrix over F and p(x) ∈ F [x] is as above, define

p(A) = a0In + a1A + ··· + anA^n.

Then p(A) is an n × n matrix over F and the map p(x) 7→ p(A) is an algebra ho- momorphism from the polynomial algebra F [x] into the algebra of n×n matrices

Mn(F ).

Lemma 3.3.2. Let T : V → V be a linear operator. Then there is a non-zero polynomial p(x) ∈ F [x] such that p(T ) = 0.

Proof. Let n = dim V. Consider the set of n^2 + 1 elements {I, T, T^2, . . . , T^{n^2}} in L(V). Since dim L(V) = n^2, it is linearly dependent. Hence there exist scalars a0, a1, . . . , a_{n^2}, not all zero, such that

a0I + a1T + ··· + a_{n^2}T^{n^2} = 0.

Now, let p(x) = a0 + a1x + ··· + a_{n^2}x^{n^2}. Then p(x) ∈ F[x] and p(T) = 0.

Theorem 3.3.3. Let T : V → V be a linear operator. Then there is a unique monic polynomial of smallest degree mT (x) ∈ F [x] such that mT (T ) = 0. More-

over, if f(x) ∈ F [x] is such that f(T ) = 0, then mT (x) divides f(x).

Similarly, for any matrix A ∈ Mn(F ), there is a unique monic polynomial of

smallest degree mA(x) ∈ F [x] such that mA(A) = 0. Moreover, if f(x) ∈ F [x] is

such that f(A) = 0, then mA(x) divides f(x).

Proof. We will prove only the first part of the theorem. By Lemma 3.3.2, there is a polynomial p(x) such that p(T ) = 0. By the Well-Ordering Principle, let m(x) be a polynomial over F of smallest degree such that m(T ) = 0. By dividing all the coefficients by the leading coefficient, we can choose m(x) to be monic. Now let f(x) ∈ F [x] be a polynomial such that f(T ) = 0. By the Division Algorithm for polynomials (Theorem 3.1.3), there exist q(x), r(x) ∈ F [x] such that

f(x) = q(x)m(x) + r(x),

where deg r(x) < deg m(x) or r(x) = 0. Hence

f(T ) = q(T )m(T ) + r(T ).

Since f(T) = m(T) = 0, it follows that r(T) = 0. But m(x) is a polynomial of smallest degree such that m(T) = 0, while deg r(x) < deg m(x) or r(x) = 0. This shows that r(x) = 0 and that f(x) = q(x)m(x). Thus m(x) | f(x). Now, let m(x) and m′(x) be monic polynomials of smallest degree such that m(T) = m′(T) = 0. By the argument above, m(x) | m′(x) and m′(x) | m(x). This implies that m′(x) = c m(x) for some c ∈ F. Since m(x) and m′(x) are monic, we see that c = 1 and that m(x) = m′(x).

Definition 3.3.4. Let T : V → V be a linear operator. The unique monic polynomial mT (x) ∈ F [x] of smallest degree such that mT (T ) = 0 is called the minimal polynomial of T .

If A ∈ Mn(F ), then the unique monic polynomial mA(x) ∈ F [x] of smallest

degree such that mA(A) = 0 is called the minimal polynomial of A. 130 CHAPTER 3. CANONICAL FORMS

Theorem 3.3.5. Let T : V → V be a linear operator. Then mT (λ) = 0 if and only if λ is an eigenvalue of T . In other words, χT (x) and mT (x) have the same set of roots, possibly except for multiplicities.

Proof. (⇒) Assume that mT (λ) = 0. By Corollary 3.1.4, mT (x) = (x − λ)q(x) for some q(x) ∈ F [x]. Since deg q(x) < deg mT (x), we see that q(T ) 6= 0. Hence there is a nonzero v ∈ V such that q(T )(v) 6= 0. Let w = q(T )(v). It follows that

(T − λI)(w) = (T − λI)q(T )(v) = mT (T )(v) = 0.

Thus λ is an eigenvalue of T with a corresponding eigenvector w. (⇐) Assume that λ is an eigenvalue of T . Then there is a nonzero v ∈ V such that T (v) = λv. By the Division Algorithm for polynomials (Theorem 3.1.3), there exist q(x), r(x) ∈ F [x] such that

mT (x) = q(x)(x − λ) + r(x), where deg r(x) < deg(x − λ) or r(x) = 0, i.e., r(x) = r is a constant. Thus

0 = mT (T ) = q(T )(T − λI) + rI.

Applying the above equality to the eigenvector v, we obtain

0 = q(T )(T − λI)(v) + rv = rv.

Then r = 0. Hence mT (x) = q(x)(x − λ), which implies mT (λ) = 0.

Theorem 3.3.6 (Cayley-Hamilton). If A ∈ Mn(F ), then χA(A) = 0.

Proof. Let A ∈ Mn(F ). Write C = xIn − A. Then

χA(x) = det(xIn − A) = knx^n + kn−1x^{n−1} + ··· + k1x + k0,

where kn, kn−1, . . . , k0 ∈ F. We will show that

χA(A) = knA^n + kn−1A^{n−1} + ··· + k1A + k0In = 0.

Recall that for any square matrix P, adj P = (Cof P)^t is a matrix satisfying

P adj P = (det P )In. Thus adj C is an n×n matrix whose entries are polynomials of degree ≤ n − 1. Hence we can write adj C as

adj C = Mn−1x^{n−1} + Mn−2x^{n−2} + ··· + M1x + M0,

where Mi, i = 0, 1, . . . , n − 1, are n × n matrices with scalar entries. Thus

C adj C = (xIn − A)(Mn−1x^{n−1} + Mn−2x^{n−2} + ··· + M1x + M0)
        = Mn−1x^n + (Mn−2 − AMn−1)x^{n−1} + ··· + (M0 − AM1)x − AM0.

On the other hand,

(det C)In = χA(x)In = (knIn)x^n + (kn−1In)x^{n−1} + ··· + (k1In)x + k0In.

By comparing the matrix coefficients, we see that

knIn = Mn−1

kn−1In = Mn−2 − AMn−1 . . . .

k1In = M0 − AM1

k0In = −AM0.

Multiply the first equation on the left by A^n, the second equation by A^{n−1}, and so on. We then have

knA^n = A^n Mn−1
kn−1A^{n−1} = A^{n−1}Mn−2 − A^n Mn−1
. . .
k1A = AM0 − A^2 M1

k0In = −AM0.

Adding up these equations, we obtain

knA^n + kn−1A^{n−1} + ··· + k1A + k0In = 0.

Hence χA(A) = 0.

Corollary 3.3.7. Let T : V → V be a linear operator. Then χT (T ) = 0. 132 CHAPTER 3. CANONICAL FORMS

Proof. Let B be an ordered basis for V . Write A = [T ]B. Note that

χT (x) = det(xIn − [T ]B) = χA(x).

We leave it as an exercise to show that

[p(T )]B = p([T ]B) for any p(x) ∈ F [x].

Hence [χT (T )]B = χT ([T ]B) = χA(A) = 0. This shows that χT (T ) = 0.
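As an illustration, the Cayley-Hamilton theorem can be checked numerically for a small matrix; the sketch below (not part of the original notes) uses the 2 × 2 matrix A = [[1, 4], [3, 2]] from an earlier example, whose characteristic polynomial is x^2 − 3x − 10:

```python
# chi_A(x) = x^2 - (tr A) x + det A for a 2x2 matrix; substituting A gives 0.
import numpy as np

A = np.array([[1.0, 4.0],
              [3.0, 2.0]])

chi_of_A = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
print(np.allclose(chi_of_A, np.zeros((2, 2))))    # True
```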

Corollary 3.3.8. If T : V → V is a linear operator, then mT divides χT . Simi- larly, if A ∈ Mn(F ), then mA divides χA.

Proof. This follows immediately from Theorem 3.3.3, Theorem 3.3.6 and Corol- lary 3.3.7.

Example. Let T : R^2 → R^2 be defined by

T (x, y) = (3x − 2y, 2x − y).

Find χT and mT .

Solution. Let B = {(1, 0), (0, 1)} be the standard basis for R^2. Let

    A = [T]_B = [ 3  −2 ]
                [ 2  −1 ].

Then

    χT(x) = χA(x) = det [ x − 3    2   ]
                        [  −2    x + 1 ] = (x − 3)(x + 1) + 4 = (x − 1)^2.

Since mA divides χA and they have the same roots, we see that mA(x) = x − 1 or mA(x) = (x − 1)^2. If p(x) = x − 1, then p(A) = A − I ≠ 0. Hence mT(x) = mA(x) = (x − 1)^2.
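The computation can be confirmed numerically; the following sketch (illustrative only) checks that x − 1 does not annihilate A while (x − 1)^2 does, so the minimal polynomial is (x − 1)^2:

```python
import numpy as np

A = np.array([[3.0, -2.0],
              [2.0, -1.0]])
I = np.eye(2)

print(np.allclose(A - I, 0))              # False:  x - 1 does not annihilate A
print(np.allclose((A - I) @ (A - I), 0))  # True:  (x - 1)^2 annihilates A
```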

Example. Let T : R^3 → R^3 be defined by

T (x, y, z) = (3x − 2y, −2x + 3y, 5z).

Find χT and mT.

Solution. Let B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} be the standard basis for R^3. Let

    A = [T]_B = [  3  −2  0 ]
                [ −2   3  0 ]
                [  0   0  5 ].

Then

    χT(x) = χA(x) = det [ x − 3    2     0   ]
                        [   2    x − 3   0   ]
                        [   0      0   x − 5 ] = (x − 5)^2 (x − 1).

Thus mA(x) = (x − 5)(x − 1) or mA(x) = (x − 5)^2 (x − 1). But then

    (A − 5I)(A − I) = [ −2  −2  0 ] [  2  −2  0 ]
                      [ −2  −2  0 ] [ −2   2  0 ] = 0.
                      [  0   0  0 ] [  0   0  4 ]

Hence mT (x) = mA(x) = (x − 1)(x − 5).

Theorem 3.3.9. Let T : V → V be a linear operator. Then T is diagonalizable if and only if mT (x) is a product of distinct linear factors over F .

Proof. (⇒) Assume that there is a basis B = {v1, . . . , vn} for V consisting of eigenvectors of T . For i = 1, . . . , n, let λi be the eigenvalue corresponding to vi, i.e. T (vi) = λivi. Let α1, . . . , αk be distinct elements of the set {λ1, . . . , λn}. Let p(x) = (x − α1) ... (x − αk). We will show that p(T ) = 0. Since B is a basis for

V , it suffices to show that p(T )(vi) = 0 for i = 1, . . . , n. Fix i ∈ {1, . . . , n} and assume that λi = αj. Then T (vi) = αjvi. Since

p(T ) = (T − α1I) ... (T − αjI) ... (T − αkI), we can switch the order of the terms in the parentheses so that T −αjI is the last one on the right-hand side. Then (T − αjI)(vi) = 0, which implies p(T )(vi) = 0.

Hence p(T ) = 0. It follows that mT (x) | p(x). But then p(x) is a product of distinct linear factors, and so is mT (x). 134 CHAPTER 3. CANONICAL FORMS

(⇐) Suppose mT (x) = (x − λ1) ... (x − λk), where λ1, . . . , λk are all distinct. Let

Vλi = ker(T − λiI) be the eigenspace corresponding to λi for i = 1, . . . , k. First we show that V = Vλ1 + ··· + Vλk. For i = 1, . . . , k, let

σi(x) = x − λi   and   τi(x) = ∏_{j≠i} (x − λj).

Then σi(x)τi(x) = mT (x) for i = 1, . . . , k, and hence σi(T )τi(T ) = mT (T ) = 0.

Note that τ1(x), . . . , τk(x) have no common factors in F [x], and thus

gcd(τ1(x), . . . , τk(x)) = 1.

By Proposition 3.1.9, there exist q1(x), . . . , qk(x) ∈ F [x] such that

q1(x)τ1(x) + ··· + qk(x)τk(x) = 1.

It follows that

q1(T )τ1(T ) + ··· + qk(T )τk(T ) = I.

Let v ∈ V and vi = qi(T )τi(T )(v) for i = 1, . . . , k. Then v = v1 + ··· + vk and

(T − λiI)(vi) = σi(T )qi(T )τi(T )(v) = qi(T )σi(T )τi(T )(v) = 0.

Thus vi ∈ Vλi for i = 1, . . . , k. Hence V = Vλ1 + ··· + Vλk . By Proposition

3.2.12 and Theorem 3.2.13, we conclude that V = Vλ1 ⊕ · · · ⊕ Vλk and that T is diagonalizable.

Definition 3.3.10. Let T : V → V be a linear operator. A subspace W of V is said to be invariant under T or T -invariant if T (W ) ⊆ W .

Example. {0}, V , ker T and im T are all T -invariant. An eigenspace of T is also T -invariant.

Example. Let T : F[x] → F[x] be the differentiation operator f ↦ f′. Then each subspace Pn of F[x], consisting of the polynomials of degree ≤ n together with the zero polynomial, is T-invariant.

Remark. If T : V → V is a linear operator on V and W is a T -invariant subspace of V , then the restriction T |W of T to W is a linear operator on W . We will denote it by TW . Hence TW : W → W is a linear operator such that TW (x) = T (x) for any x ∈ W . 3.3. MINIMAL POLYNOMIAL 135

Let us discuss a relation between the matrix representation of a linear operator and its restriction on an invariant subspace. Let W be a T -invariant subspace of

V. Let C = {v1, . . . , vk} be a basis for W and extend it to a basis B = {v1, . . . , vn} for V. Write

T(vj) = Σ_{i=1}^n αij vi,   j = 1, . . . , n.

Since W is T -invariant, we have T (vj) ∈ W for j = 1, . . . , k. Hence αij = 0 for

j = 1, . . . , k and i = k + 1, . . . , n. We see that [T]_B has the block form

    [T]_B = [ B  C ]
            [ O  D ],

where B = [TW]_C is a k × k matrix, C is a k × (n − k) matrix, D is an (n − k) × (n − k) matrix, and O is the zero matrix of size (n − k) × k.

Suppose V = V1 ⊕ · · · ⊕ Vk, where Vi’s are T -invariant subspaces of V . Let

Bi be a basis for Vi for each i. Then, by Proposition 1.6.21, B = B1 ∪ · · · ∪ Bk is

an ordered basis for V and [T]_B has the block form

    [T]_B = [ A1  O  ...  O  ]
            [ O   A2 ...  O  ]        (3.6)
            [ .   .   ⋱   .  ]
            [ O   O  ...  Ak ]

where Ai = [T |Vi ]Bi for i = 1, . . . , k. Next we investigate relations between characteristic polynomials and minimal polynomials of a linear operator and its restriction on invariant subspaces.

Proposition 3.3.11. Let T : V → V be a linear operator, and let W be a T - invariant subspace of V . Then

(a) the characteristic polynomial of TW divides the characteristic polynomial of T ;

(b) the minimal polynomial of TW divides the minimal polynomial of T .

Proof. (a) Let C be a basis for W and extend it to a basis B for V . Let A = [T ]B

and B = [TW]_C. Then χA(x) and χB(x) are the characteristic polynomials of T and TW, respectively. Note that A has the block form

    A = [ B  C ]
        [ O  D ].

By Proposition 3.2.18,

    χA(x) = det [ xIk − B     −C      ]
                [    O     xIn−k − D  ] = χB(x) det(xIn−k − D).

Since det(xIn−k − D) is a polynomial in x, we see that χB(x) | χA(x).

(b) Denote by mT (x) and mTW (x) the minimal polynomials of T and TW , re- spectively. Since mT (T ) = 0, we see that mT (TW ) = mT (T )|W = 0. It follows that mTW (x) | mT (x).

Corollary 3.3.12. Let T : V → V be a linear operator, and W a T -invariant subspace of V . If T is diagonalizable, then TW is diagonalizable. Proof. If T is diagonalizable, then its minimal polynomial is a product of distinct factors. But then the minimal polynomial of TW divides the minimal polynomial of T and thus it must be a product of distinct factors as well. Hence TW is diagonalizable.

Proposition 3.3.13. Let T : V → V be a linear operator and suppose that

V = V1 ⊕ · · · ⊕ Vk, where each Vi is a T-invariant subspace of V. Let Ti = T|Vi, regarded as a linear operator on Vi, for i = 1, . . . , k. Then

(a) χT(x) = χT1(x) ··· χTk(x);

(b) mT (x) = lcm{mT1 (x), . . . , mTk (x)}.

Proof. (a) Let Bi be a basis for Vi for each i. Then B = B1 ∪ · · · ∪ Bk is an ordered basis for V , by Proposition 1.6.21. Then the matrix [T ]B has the block form (3.6). By a generalization of Proposition 3.2.18, we see that

χT(x) = det(xI − [T]_B) = ∏_{i=1}^k det(xIi − [Ti]_Bi) = ∏_{i=1}^k χTi(x),

where Ii denotes the identity matrix of size dim Vi for each i. This finishes the proof of part (a). To prove part (b), we will show that

(i) mTi | mT for i = 1, . . . , k;

(ii) for any p(x) ∈ F [x], if mTi (x) | p(x) for i = 1, . . . , k, then mT (x) | p(x).

The first statement follows from Proposition 3.3.11 (b). To show the second

statement, let p(x) ∈ F [x] be such that mTi (x) | p(x) for i = 1, . . . , k. Then

p(x) = qi(x)mTi (x) for some qi(x) ∈ F [x]. In particular, if vi ∈ Vi, then

p(Ti)(vi) = qi(Ti)mTi (Ti)(vi) = 0.

Now let v ∈ V and write v = v1 + ··· + vk, where vi ∈ Vi for i = 1, . . . , k.

Since each Vi is invariant under T , it is also invariant under p(T ). Note that

p(T )|Vi = p(T |Vi ) = p(Ti). Hence

p(T)(v) = Σ_{i=1}^k p(T)(vi) = Σ_{i=1}^k p(Ti)(vi) = 0.

This shows that p(T ) = 0, which implies mT (x) | p(x). This finishes the proof of the second statement and of part (b).

We finish this section by proving that commuting diagonalizable linear op- erators can be simultaneously diagonalized, i.e., they have a common basis of eigenvectors. This is an important result which is useful in some applications of linear algebra. Since the proof of this theorem is quite involved, we start with an easier version.

Proposition 3.3.14. Let V be a vector space over an algebraically closed field F and let S, T : V → V be linear operators such that ST = TS. Then S and T have a common eigenvector.

Proof. First, consider the linear operator T : V → V . Since F is an algebraically closed field, the characteristic equation of T has a root in F and hence T has an eigenvalue, say, λ. Then the eigenspace W = Vλ corresponding to eigenvalue

λ is nonzero. Consider the restriction S|W : W → V of S to W . We will show that S(W ) ⊆ W . To see this, let w ∈ W . Then T (w) = λw. It follows that

TS(w) = ST (w) = S(λw) = λS(w). Hence S(w) ∈ W . Now we can regard S|W as a linear operator on W . 138 CHAPTER 3. CANONICAL FORMS

By the same reason, S|W : W → W has an eigenvalue, say, α with a cor- responding eigenvector v. Then S(v) = S|W (v) = αv. Moreover, v ∈ W ⊆ V , which implies that v is an eigenvector of T corresponding to λ: T (v) = λv. Hence v is a common eigenvector for both S and T .

Theorem 3.3.15. Let F be a commuting family of diagonalizable linear operators on V , i.e. ST = TS for all S, T ∈ F. Then there is an ordered basis B for V such that [T ]B is a diagonal matrix for each T ∈ F. In other words, all linear operators in F are simultaneously diagonalizable.

Proof. We will prove this theorem by induction on the dimension of V . If dim V = 1, this is obvious. Next, let n be a positive integer such that the statement of the theorem holds for all vector spaces of dimension less than n. Let V be a vector space of dimension n. Choose T ∈ F which is not a scalar multiple of

I. Let λ1, . . . , λk be the distinct eigenvalues of T and let Vi be the eigenspace corresponding to λi for each i. Then each Vi is a proper subspace of V and

V = V1 ⊕ · · · ⊕ Vk. Note that each Vi is S-invariant for any S ∈ F. To see this, let S ∈ F. If v ∈ Vi, then T (v) = λiv, which implies TS(v) = ST (v) = S(λiv) =

λiS(v). Hence S(v) ∈ Vi.

Fix i ∈ {1, . . . , k} and let Fi denote the set of linear operators S|Vi : Vi → Vi where S ∈ F. Each operator S|Vi in Fi is diagonalizable by Corollary 3.3.12. Hence Fi is a commuting family of diagonalizable linear operators on Vi. Since dim Vi < n, by the induction hypothesis, the operators in Fi can be simultaneously diagonalized, i.e., there exists a basis Bi for Vi consisting of common eigenvectors of every operator in Fi. Thus B = B1 ∪ · · · ∪ Bk is a basis for V consisting of simultaneous eigenvectors of every operator in F. This completes the induction and the proof of the theorem.

Exercises

In these exercises, unless otherwise stated, V is a finite-dimensional vector space over a field F.

3.3.1. Find the characteristic polynomials and the minimal polynomials of the following matrices, and determine whether they are diagonalizable.

 3 1 −1   5 −6 −6      (a)  2 2 −1  (b)  −1 4 2  . 2 2 0 3 −6 −4

3.3.2. Let P : R^2 → R^2 be defined by P(x, y) = (x, 0). Find the minimal polynomial of P.

3.3.3. Let V be a finite-dimensional vector space over C and T : V → V a linear operator. If T^n = I for some n ∈ N, show that T is diagonalizable.

3.3.4. Show that T is invertible if and only if the constant term in the minimal polynomial of T is non-zero. Moreover, if T is invertible, then T^{-1} = p(T) for some p(x) ∈ F[x].

3.3.5. Let T : V → V be a linear operator and let B be an ordered basis for V. If p(x) ∈ F[x], prove that [p(T)]_B = p([T]_B). Then prove that m_T = m_{[T]_B}.

3.3.6. If A and B are similar matrices, show that mA = mB.

3.3.7. If A is an invertible matrix such that A^k is diagonalizable for some k ≥ 2, show that A is diagonalizable.

3.3.8. Let T : V → V be a linear operator. If f(x) ∈ F [x] is any polynomial,

show that ker f(T ) = ker g(T ), where g = gcd(f, mT ).

3.3.9. Let T : V → V be a linear operator. If every subspace of V is T -invariant, show that T is a scalar multiple of the identity.

3.3.10. Let T : R^2 → R^2 be the linear operator defined by T(x, y) = (2x + y, 2y). Let W1 = ⟨(1, 0)⟩. Prove that W1 is T-invariant and that there is no T-invariant subspace W2 of R^2 such that R^2 = W1 ⊕ W2.

3.3.11. Let A and B be nonsingular complex square matrices such that ABA = B. Prove that

(i) if v is an eigenvector of A, then so is Bv;

(ii) A and B^2 have a common eigenvector.

3.4 Jordan Canonical Forms

In this section, we show that in case a linear operator (or a matrix) is not diago- nalizable, there is still a matrix representation which has a nice form and is called the Jordan canonical form. If it is diagonalizable, then its Jordan canonical form is a diagonal matrix.

Definition 3.4.1. Let V be a finite-dimensional vector space over a field F and Ω an algebraically closed field containing F . Let T : V → V be a linear operator. We say that

- T is semisimple if mT (x) is a product of distinct linear factors over Ω[x];

- T is nilpotent if T^n = 0 for some n ∈ N.

Remark. If F is algebraically closed, then T is semisimple if and only if T is diagonalizable.

Proposition 3.4.2. Let T : V → V be a linear operator. If T is semisimple and nilpotent, then T = 0.

Proof. Since T is nilpotent, T^n = 0 for some n ∈ N. Let p(x) = x^n ∈ F[x]. Then p(T) = 0. Hence mT | p, which implies mT(x) = x^k for some k ≤ n. Since T is semisimple, k = 1. Thus mT(x) = x, so that T = mT(T) = 0.

Proposition 3.4.3. Let S, T : V → V be linear operators such that ST = TS and α, β ∈ F .

(i) If S and T are semisimple, then so is αS + βT .

(ii) If S and T are nilpotent, then so is αS + βT .

Proof. (i) We will prove this statement under the assumption that F is algebraically closed. In this case, S and T are diagonalizable. (In the general case, we can extend V to a new vector space over an algebraically closed field Ω containing F; then S and T will be diagonalizable over Ω.) Since S and T commute, they are simultaneously diagonalizable by Theorem 3.3.15. Thus there is an

ordered basis B for V such that [S]_B and [T]_B are diagonal matrices. Hence

[αS + βT]_B = α[S]_B + β[T]_B is also a diagonal matrix. This shows that αS + βT is diagonalizable (semisimple).
(ii) Assume that S^m = 0 and T^n = 0 for some m, n ∈ N. Since ST = TS,

(αS + βT)^{m+n} = Σ_{k=0}^{m+n} C(m+n, k) α^{m+n−k} β^k S^{m+n−k} T^k,

where C(m+n, k) denotes the binomial coefficient. If 0 ≤ k ≤ n, then m + n − k ≥ m and thus S^{m+n−k} = 0. If n < k ≤ m + n, then T^k = 0. It follows that (αS + βT)^{m+n} = 0 and that αS + βT is nilpotent.

Theorem 3.4.4 (Primary Decomposition). Let T : V → V be a linear operator.

Assume that the minimal polynomial mT (x) can be written as

mT(x) = (x − λ1)^{m1} ··· (x − λk)^{mk},

where λ1, . . . , λk are distinct elements in F. Define

Vi = ker(T − λiI)^{mi},   i = 1, . . . , k.

Then

(i) each Vi is a non-zero, T -invariant subspace of V ;

(ii) V = V1 ⊕ · · · ⊕ Vk.

Proof. (i) Let i ∈ {1, . . . , k}. Since λi is a root of mT(x), it is an eigenvalue of T. Hence ker(T − λiI) ≠ {0}. But then ker(T − λiI) ⊆ ker(T − λiI)^{mi} = Vi. It follows that Vi is a non-zero subspace of V. To see that Vi is T-invariant, note that T commutes with any polynomial in T. Thus T(T − λiI)^{mi} = (T − λiI)^{mi}T, which implies that, for any v ∈ Vi,

(T − λiI)^{mi} T(v) = T (T − λiI)^{mi}(v) = T(0) = 0,

and hence T(v) ∈ ker(T − λiI)^{mi} = Vi. This shows that Vi is T-invariant.

(ii) For i = 1, . . . , k, let

σi(x) = (x − λi)^{mi}   and   τi(x) = ∏_{j≠i} σj(x) = ∏_{j≠i} (x − λj)^{mj}. (3.7)

Then for each i = 1, . . . , k, we have σi(x)τi(x) = mT (x) and hence

σi(T )τi(T ) = mT (T ) = 0.

Note that τ1(x), . . . , τk(x) have no common factors in F [x], and thus

gcd(τ1(x), . . . , τk(x)) = 1.

By Proposition 3.1.9, there exist q1(x), . . . , qk(x) ∈ F [x] such that

q1(x)τ1(x) + ··· + qk(x)τk(x) = 1.

Hence

q1(T )τ1(T ) + ··· + qk(T )τk(T ) = I.

Let v ∈ V and vi = qi(T )τi(T )(v) for i = 1, . . . , k. Then v = v1 + ··· + vk and

(T − λiI)^{mi}(vi) = σi(T)qi(T)τi(T)(v) = qi(T)σi(T)τi(T)(v) = 0.

Thus vi ∈ Vi for i = 1, . . . , k. It remains to show that

Vi ∩ (Σ_{j≠i} Vj) = {0}   for i = 1, . . . , k. (3.8)

Fix i ∈ {1, . . . , k}. Let v ∈ Vi ∩ (Σ_{j≠i} Vj). Since v ∈ Vi,

σi(T)(v) = 0. (3.9)

Write v = Σ_{j≠i} vj, where vj ∈ Vj for j = 1, . . . , k and j ≠ i. Then τi(T)(vj) = 0 for all j ≠ i. Hence

τi(T)(v) = Σ_{j≠i} τi(T)(vj) = 0. (3.10)

Note that gcd(σi, τi) = 1. By Proposition 3.1.9, there exist p(x), q(x) ∈ F [x] such that

p(x)σi(x) + q(x)τi(x) = 1. Thus

p(T )σi(T ) + q(T )τi(T ) = I. By (3.9) and (3.10), it follows that

v = p(T )σi(T )(v) + q(T )τi(T )(v) = 0.

Hence (3.8) holds. This establishes (ii). 144 CHAPTER 3. CANONICAL FORMS

Proposition 3.4.5. Let T : V → V be a linear operator with the characteristic polynomial χT (x) and the minimal polynomial mT (x) given by

χT(x) = (x − λ1)^{n1} ··· (x − λk)^{nk}   and

mT(x) = (x − λ1)^{m1} ··· (x − λk)^{mk}.

For i = 1, . . . , k, if Vi = ker(T − λiI)^{mi}, then

(i) the characteristic polynomial of T|Vi is (x − λi)^{ni},

(ii) the minimal polynomial of T|Vi is (x − λi)^{mi}, and

(iii) dim Vi = ni.

Proof. (i) Let Ti = T|Vi. Then (Ti − λiI)^{mi} = 0 on Vi. Hence the minimal polynomial mTi(x) of Ti divides σi(x) = (x − λi)^{mi}. Thus mTi(x) = (x − λi)^{pi} for some integer pi. It follows that χTi(x) = (x − λi)^{qi} for some integer qi. By Proposition 3.3.13(a), we have

(x − λ1)^{n1} ··· (x − λk)^{nk} = χT(x) = (x − λ1)^{q1} ··· (x − λk)^{qk}.

We conclude that qi = ni, by Theorem 3.1.16, and that χTi(x) = (x − λi)^{ni}.
(ii) Note that (x − λi)^{pi} and (x − λj)^{pj} are relatively prime if i ≠ j. Hence their least common multiple is just the product of all of them. By Proposition 3.3.13(b),

(x − λ1)^{m1} ··· (x − λk)^{mk} = mT(x) = (x − λ1)^{p1} ··· (x − λk)^{pk}.

Again, we have pi = mi and hence mTi(x) = (x − λi)^{mi} for i = 1, . . . , k.
(iii) Note that the degree of the characteristic polynomial is the dimension of the vector space. Hence dim Vi = ni.

On each Vi = ker(T − λiI)^{mi}, write Ti = T|Vi so that

Ti = λiIVi + (Ti − λiIVi ).

Hence if Bi is any ordered basis for Vi, then

[Ti]Bi = [λiIVi ]Bi + [Ti − λiIVi ]Bi

= diag(λi, . . . , λi) + [Ti − λiIVi]_Bi.

We will choose an ordered basis Bi for Vi so that [Ti − λiIVi]_Bi has a nice form. Note that (Ti − λiIVi)^{mi} = 0 on Vi. In this case, we say that Ti − λiIVi is a nilpotent operator. We will investigate nilpotent operators more carefully.

Definition 3.4.6. Let T : V → V be a linear operator on a finite-dimensional vector space V. We say that T is nilpotent if T^k = 0 for some k ∈ N. The smallest positive integer k such that T^k = 0 is called the index of nilpotency, or simply the index, of T, denoted by Ind T. A nilpotent matrix is defined similarly.

Example. For k ∈ N, define Nk ∈ Mk(F) by

    Nk = [ 0  1  0  ...  0 ]
         [ 0  0  1  ...  0 ]
         [ .  .  .   ⋱   . ]        (3.11)
         [ 0  0  0  ...  1 ]
         [ 0  0  0  ...  0 ]

Then Nk is a nilpotent matrix of index k.
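A quick numerical check of the nilpotency of N_3 (an illustrative sketch, not part of the original notes):

```python
# N_3 from (3.11): the square is nonzero but the cube vanishes, so Ind N_3 = 3.
import numpy as np

N3 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]])

print(np.allclose(np.linalg.matrix_power(N3, 2), 0))   # False
print(np.allclose(np.linalg.matrix_power(N3, 3), 0))   # True
```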

Proposition 3.4.7. Let T : V → V be a nilpotent operator. Then

(i) Ind T = k if and only if mT(x) = x^k;

(ii) Ind T ≤ dim V;

(iii) if n = dim V, then T^n = 0.

Proof. Exercise.

Definition 3.4.8. Let T : V → V be a linear operator. We say that V is T-cyclic if there is a vector v ∈ V such that V is spanned by the set {v, T(v), T^2(v), . . . }. In this case, v is called a cyclic vector of V. The T-cyclic subspace of V generated by v is the span of the set {v, T(v), T^2(v), . . . }.

Remark. It is obvious that a T -cyclic subspace is T -invariant.

Example 3.4.9. Let T : R[x] → R[x] be the differentiation operator T(f) = f′. Then the T-cyclic subspace of R[x] generated by x^2 is ⟨{x^2, 2x, 2}⟩ = P2(R).

Proposition 3.4.10. Let T : V → V be a linear operator and n = dim V. If V is T-cyclic, generated by v ∈ V, then {v, T(v), T^2(v), . . . , T^{n−1}(v)} is a basis for V.

Proof. Let j be the smallest integer for which {v, T(v), . . . , T^j(v)} is linearly dependent. The existence of j follows from the assumption that V is finite-dimensional. It follows that {v, T(v), . . . , T^{j−1}(v)} is linearly independent. We will show that

T^k(v) ∈ ⟨{v, T(v), . . . , T^{j−1}(v)}⟩   for any k ∈ N ∪ {0}. (3.12)

This is clear for 0 ≤ k ≤ j − 1. Suppose T^s(v) ∈ ⟨{v, T(v), . . . , T^{j−1}(v)}⟩. Write

T^s(v) = b0v + b1T(v) + ··· + bj−1T^{j−1}(v),

where b0, b1, . . . , bj−1 ∈ F. Apply T to both sides:

T^{s+1}(v) = b0T(v) + b1T^2(v) + ··· + bj−1T^j(v).

Since the set {v, T(v), . . . , T^j(v)} is linearly dependent, T^j(v) can be written as a linear combination of v, T(v), . . . , T^{j−1}(v). Hence

T^{s+1}(v) ∈ ⟨{v, T(v), . . . , T^{j−1}(v)}⟩.

By induction, we have established (3.12). It follows that

V = ⟨{v, T(v), . . . }⟩ ⊆ ⟨{v, T(v), . . . , T^{j−1}(v)}⟩ ⊆ V.

Hence V = ⟨{v, T(v), . . . , T^{j−1}(v)}⟩, and {v, T(v), . . . , T^{j−1}(v)} is a basis for V. Since dim V = n, we see that j = n.

Proposition 3.4.11. Let V be a vector space with dim V = k and T : V → V a linear operator. If V is T-cyclic, generated by v ∈ V, with ordered basis B = {T^{k−1}(v), T^{k−2}(v), . . . , T(v), v}, then

[T ]B = Nk, where Nk is the k × k matrix defined by (3.11). 3.4. JORDAN CANONICAL FORMS 147

Proof. Let vi = T^{k−i}(v) for i = 1, . . . , k. Then T(v1) = 0 and T(vi) = vi−1 for i = 2, . . . , k. It follows that [T]_B = Nk, where Nk is defined by (3.11).

Lemma 3.4.12. Let T : V → V be a nilpotent linear operator of index k. Then there exist subspaces W and W′ of V such that

(i) W and W′ are T-invariant;

(ii) W is T-cyclic with dimension k;

(iii) V = W ⊕ W′.

Proof. Let v ∈ V be such that T^{k−1}(v) ≠ 0. Let W be the subspace of V generated by B = {v, T(v), . . . , T^{k−1}(v)}. To show that B is linearly independent, let α0, . . . , αk−1 be scalars such that

α0v + α1T(v) + ··· + αk−1T^{k−1}(v) = 0. (3.13)

Applying T^{k−1} to (3.13), we obtain α0T^{k−1}(v) = 0, which implies α0 = 0. Hence (3.13) reduces to

α1T(v) + ··· + αk−1T^{k−1}(v) = 0. (3.14)

Again, applying T^{k−2} to (3.14), we have α1 = 0. Repeat this process until we have α0 = α1 = ··· = αk−1 = 0. Thus B is linearly independent. Hence W is a T-cyclic subspace of dimension k. Next define

𝒯 = {Z ≤ V | Z is a T-invariant subspace of V and Z ∩ W = {0}}.

Then 𝒯 ≠ ∅ since {0} ∈ 𝒯. Choose W′ ∈ 𝒯 with the maximum dimension. Then W′ is a T-invariant subspace of V and W′ ∩ W = {0}. It remains to show that V = W + W′. Suppose V ≠ W + W′. We will produce an element y ∈ V such that

y ∉ W + W′, but T(y) = w′ ∈ W′. (3.15)

Once we have this element y, let Z = ⟨W′ ∪ {y}⟩ = W′ + ⟨y⟩ be the subspace of V generated by W′ and y. Then Z is T-invariant and Z ∩ W = {0}. Indeed, if t = s + αy ∈ Z, where s ∈ W′ and α ∈ F, we have

T(t) = T(s) + αT(y) ∈ W′ ⊆ Z.

Hence T(Z) ⊆ Z. Next, let t = s + αy ∈ Z ∩ W, where s ∈ W′ and α ∈ F. Then αy = t − s ∈ W + W′. If α ≠ 0, we have y ∈ W + W′, a contradiction. Thus α = 0, which implies t = s ∈ W ∩ W′ = {0}. It follows that t = 0 and that Z ∩ W = {0}. This contradicts the choice of W′ since dim W′ < dim Z. We can conclude that V = W + W′ and that V = W ⊕ W′.
Now we find an element y satisfying (3.15). Since W + W′ ≠ V, there exists an x ∈ V such that x ∉ W + W′. Note that T^0(x) = x and T^k(x) = 0 ∈ W + W′. Hence there is an i ∈ N ∪ {0} such that T^i(x) ∉ W + W′ but T^{i+1}(x) ∈ W + W′. Set u = T^i(x). Then u ∉ W + W′ but T(u) ∈ W + W′. Write T(u) = w + w′, where w ∈ W and w′ ∈ W′. We claim that w = T(z) for some z ∈ W.

To see this, note that

T^{k−1}(w) + T^{k−1}(w′) = T^{k−1}(w + w′) = T^k(u) = 0.

Since W and W′ are both T-invariant,

T^{k−1}(w) = −T^{k−1}(w′) ∈ W ∩ W′ = {0}.

Hence T^{k−1}(w) = 0. Since w ∈ W = ⟨{v, T(v), . . . , T^{k−1}(v)}⟩,

w = α0v + α1T(v) + ··· + αk−1T^{k−1}(v)

for some scalars α0, . . . , αk−1 ∈ F. Applying T^{k−1} to w in the above equation, we see that 0 = T^{k−1}(w) = α0T^{k−1}(v), which implies α0 = 0. As a result,

w = α1T(v) + ··· + αk−1T^{k−1}(v) = T(z),

where z = α1v + ··· + αk−1T^{k−2}(v) ∈ W.

Thus T(u) = w + w′ = T(z) + w′. Hence w′ = T(u − z). Let y = u − z. If y ∈ W + W′, then u = z + y ∈ W + W′, a contradiction. Hence y ∉ W + W′, but T(y) = w′ ∈ W′. This finishes the proof.

Theorem 3.4.13. Let T : V → V be a nilpotent linear operator. Then there exist

T -cyclic subspaces W1,...,Wr of V such that 3.4. JORDAN CANONICAL FORMS 149

(i) V = W1 ⊕ · · · ⊕ Wr;

(ii) dim Wi = Ind(T |Wi ) for i = 1, . . . , r;

(iii) Ind T = dim W1 ≥ · · · ≥ dim Wr.

Proof. We induct on n = dim V . If n = 1, then Ind T = dim V = 1. Hence T = 0 and we are done with r = 1 and W1 = V . Assume that the statement of the theorem holds whenever dim V < n. Let V be a vector space with dimension n and T : V → V a nilpotent operator. If

Ind T = n = dim V, then we are done with r = 1 and W1 = V by Lemma 3.4.12.
Suppose now that Ind T < dim V. By Lemma 3.4.12 again, there exist T-invariant subspaces W1 and W′ such that W1 is T-cyclic with dim W1 = Ind T and V = W1 ⊕ W′. Since dim W1 ≥ 1, dim W′ < n and T|W′ is a nilpotent operator on W′. By the induction hypothesis, there exist T-cyclic subspaces W2, . . . , Wr of W′ such that W′ = W2 ⊕ · · · ⊕ Wr, dim Wi = Ind(T|Wi) for i = 2, . . . , r and Ind(T|W′) = dim W2 ≥ · · · ≥ dim Wr. Thus

V = W1 ⊕ W′ = W1 ⊕ · · · ⊕ Wr.

Since Ind T ≥ Ind(T|W′), we have dim W1 ≥ dim W2 ≥ · · · ≥ dim Wr.

While the cyclic subspaces that constitute the cyclic decomposition in Theo- rem 3.4.13 are not unique, the number of cyclic subspaces in the direct sum and their respective dimensions are uniquely determined by the information of the operator T alone.

Proposition 3.4.14. Let T : V → V be a nilpotent linear operator. Suppose

V = W1 ⊕ · · · ⊕ Wr, where Wi’s are T -cyclic subspaces such that Ind T = dim W1 ≥ · · · ≥ dim Wr and dim Wi = Ind(T |Wi ) for i = 1, . . . , r. Then (i) r = dim(ker T );

(ii) For any q ∈ N, the number of subspaces Wi with dim Wi = q is

2 dim(ker T^q) − dim(ker T^{q−1}) − dim(ker T^{q+1}).

Proof. Suppose V = W1 ⊕ · · · ⊕ Wr. For any q = 1, 2,... , we first show that

ker T^q = (W1 ∩ ker T^q) ⊕ · · · ⊕ (Wr ∩ ker T^q). (3.16)

Let u ∈ ker T^q and write u = u1 + ··· + ur, where ui ∈ Wi for each i. Then

0 = T^q(u) = T^q(u1) + ··· + T^q(ur).

Since each Wi is T-invariant, T^q(ui) ∈ Wi for each i. By a property of direct sums, we conclude that T^q(ui) = 0 for each i. Hence each ui ∈ Wi ∩ ker T^q. This establishes (3.16).
Suppose Wi is a T-cyclic subspace spanned by Bi = {v, T(v), . . . , T^{ki−1}(v)}, for some v ∈ Wi, where ki = dim Wi = Ind(T|Wi). Then T^{ki}(v) = 0. Note that Wi ⊆ ker T^q if q ≥ ki. Hence

dim(Wi ∩ ker T^q) = ki   if ki < q. (3.17)

Next, we show that

dim(Wi ∩ ker T^q) = q   if ki ≥ q. (3.18)

Clearly, T^{ki−q}(v), . . . , T^{ki−1}(v) ∈ Wi ∩ ker T^q. If x ∈ Wi ∩ ker T^q, then

x = α0v + α1T(v) + ··· + α_{ki−1}T^{ki−1}(v),

where α0, . . . , α_{ki−1} ∈ F. Then

0 = T^q(x) = α0T^q(v) + α1T^{q+1}(v) + ··· + α_{ki−q−1}T^{ki−1}(v).

Since Bi is linearly independent, it follows that α0 = ··· = α_{ki−q−1} = 0. Hence

x = α_{ki−q}T^{ki−q}(v) + ··· + α_{ki−1}T^{ki−1}(v).

This shows that Wi ∩ ker T^q is spanned by {T^{ki−q}(v), . . . , T^{ki−1}(v)} and thus has dimension q. Now, applying (3.16) and (3.18) to q = 1, we see that r = dim(ker T). In general,

dim(ker T^q) = Σ_{i=1}^r dim(Wi ∩ ker T^q) = Σ_{ki≤q−1} ki + Σ_{ki≥q} q.

Hence

dim(ker T^{q−1}) = Σ_{ki≤q−2} ki + Σ_{ki≥q−1} (q − 1) = Σ_{ki≤q−1} ki + Σ_{ki≥q} (q − 1).

It follows that

(# of Wi with dim Wi ≥ q) = dim(ker T^q) − dim(ker T^{q−1}).

This shows that

(# of Wi with dim Wi = q) = 2 dim(ker T^q) − dim(ker T^{q−1}) − dim(ker T^{q+1}).
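The formulas above are easy to apply in practice. The following NumPy sketch (illustrative only; the nilpotent matrix is an arbitrarily chosen direct sum N_3 ⊕ N_2) recovers the sizes of the cyclic subspaces from dim ker T^q = n − rank(T^q):

```python
import numpy as np

N3 = np.diag([1.0, 1.0], k=1)            # 3x3 nilpotent block
N2 = np.diag([1.0], k=1)                 # 2x2 nilpotent block
T = np.block([[N3, np.zeros((3, 2))],
              [np.zeros((2, 3)), N2]])
n = T.shape[0]

def dim_ker_power(q):
    return n - np.linalg.matrix_rank(np.linalg.matrix_power(T, q))

for q in range(1, n + 1):
    count = 2 * dim_ker_power(q) - dim_ker_power(q - 1) - dim_ker_power(q + 1)
    print(q, count)      # one cyclic subspace of dimension 2 and one of dimension 3
```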

Corollary 3.4.15. Let T : V → V be a nilpotent linear operator of index k ≥ 2. Then there exists an ordered basis B for V such that

    [T]_B = [ Nk1         0   ]
            [      ⋱          ]
            [ 0          Nkr  ]

where

(i) k = k1 ≥ k2 ≥ · · · ≥ kr;

(ii) k1 + ··· + kr = n = dim V .

Moreover, the numbers r and k1, . . . , kr are uniquely determined by T .

Proof. It follows from Theorem 3.4.13 and Proposition 3.4.11. The uniqueness part follows from Proposition 3.4.14.

Theorem 3.4.16 (Jordan canonical form). Let T : V → V be a linear operator

such that mT (x) splits over F . Then there exists an ordered basis B for V such

that [T]_B is in the following Jordan canonical form:

    [T]_B = [ A1              ]
            [     A2          ]        (3.19)
            [         ⋱       ]
            [             Ak  ]

where each Ai is of the form

    Ai = [ Ji1              ]
         [      Ji2         ]
         [           ⋱      ]
         [             Jiri ]

and where each Jij, called a Jordan block, is of the form

    Jij = [ λi  1           ]
          [     λi  ⋱       ]        (3.20)
          [          ⋱   1  ]
          [              λi ]

Proof. Let mT(x) = (x − λ1)^{m1} ··· (x − λk)^{mk}, where λ1, . . . , λk are distinct elements in F. Let Vi = ker(T − λiI)^{mi} for i = 1, . . . , k. By Theorem 3.4.4, each Vi is a non-zero T-invariant subspace and

V = V1 ⊕ · · · ⊕ Vk.

Fix i ∈ {1, . . . , k}. Consider Ti = T |Vi , regarded as a linear operator on Vi. Write

Ti = λiIVi + (Ti − λiIVi ). (3.21)

By Proposition 3.4.5 (ii), the minimal polynomial of Ti is (x − λi)^{mi}. Hence the minimal polynomial of Ti − λiIVi is x^{mi}. This shows that the linear operator

TN := Ti − λiIVi is nilpotent of index mi. By Theorem 3.4.13, there are non-zero subspaces Wi1,...,Wiri of Vi such that each Wij is TN -cyclic, Ind(TN |Wij ) = dim Wij, mi = Ind(TN ) = dim Wi1 ≥ · · · ≥ dim Wiri and

Vi = Wi1 ⊕ · · · ⊕ Wiri .

By Proposition 3.4.11, there is an ordered basis Bij for Wij such that

[TN |Wij ]Bij = Nkj for some kj ∈ N. 3.4. JORDAN CANONICAL FORMS 153

It follows from (3.21) that

[Ti|Wij]_Bij = [λiIWij]_Bij + [TN|Wij]_Bij

             = [ λi          ]   [ 0  1        ]
               [    λi       ] + [    0  ⋱     ]
               [       ⋱     ]   [       ⋱  1  ]
               [          λi ]   [          0  ]

             = [ λi  1           ]
               [     λi  ⋱       ]
               [          ⋱   1  ]
               [              λi ].

Finally, we have

V = ⊕_{i=1}^k Vi = ⊕_{i=1}^k ⊕_{j=1}^{ri} Wij.

Let B = ∪_{i=1}^k ∪_{j=1}^{ri} Bij be the ordered basis for V obtained from the Bij’s. Then [T]_B is of the form (3.19) as desired.

The following theorem gives a procedure on how to find a Jordan canonical form of a linear operator whose minimal polynomial splits.

Theorem 3.4.17. Let T : V → V be a linear operator. Assume that

χT(x) = (x − λ1)^{n1} ··· (x − λk)^{nk}   and

mT(x) = (x − λ1)^{m1} ··· (x − λk)^{mk}.

Let J be the Jordan canonical form of T . Then

(i) For each i, each entry on the main diagonal of Jij is λi, and the number

of λi’s on the main diagonal of J is equal to ni. Hence the sum (over j) of

the orders of the Jij’s is ni.

(ii) For each i, the largest Jordan block Jij is of size mi × mi.

(iii) For each i, the number of blocks Jij equals the dimension of the eigenspace

ker(T − λiI). 154 CHAPTER 3. CANONICAL FORMS

(iv) For each i the number of blocks Jij with size q × q equals

2 dim(ker(T − λiI)^q) − dim(ker(T − λiI)^{q+1}) − dim(ker(T − λiI)^{q−1}).

(v) The Jordan canonical form is unique up to the order of the Jordan blocks.

Proof. Let Vi = ker(T − λiI)^{mi} for i = 1, . . . , k. By Proposition 3.4.5, dim Vi = ni for each i. This, together with the proof of Theorem 3.4.16, implies (i).

Recall that the linear operator T − λiI, restricted to Vi, is nilpotent of index mi and that we have a cyclic decomposition

Vi = Wi1 ⊕ · · · ⊕ Wiri , with mi = dim Wi1 ≥ · · · ≥ dim Wiri .

Each Jordan block Jij corresponds to the subspace Wij in the cyclic decomposi- tion above. Hence the largest Jordan block Jij is of size mi × mi. Parts (iii) and (iv) follow from Proposition 3.4.14. The knowledge of (i)-(iv) shows that the Jordan canonical form is unique up to the order of the Jordan blocks.
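The counting recipe in (iii) and (iv) can be carried out numerically. The sketch below (illustrative only) applies it to a matrix that is already in Jordan form, so the answer is known in advance: for λ = 2 there is one 2 × 2 block and one 1 × 1 block.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])
n = A.shape[0]

def dim_ker(lam, q):
    M = np.linalg.matrix_power(A - lam * np.eye(n), q)
    return n - np.linalg.matrix_rank(M)

lam = 2.0
for q in (1, 2):
    blocks = 2 * dim_ker(lam, q) - dim_ker(lam, q + 1) - dim_ker(lam, q - 1)
    print(q, blocks)     # one block of size 1 and one block of size 2 for lambda = 2
```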

Corollary 3.4.18. Let A be a square matrix such that mA(x) splits over F . Then A is similar to a matrix in the Jordan canonical form (3.19). Moreover, two matrices are similar if and only if they have the same Jordan canonical form, except possibly for a permutation of the blocks.

Example. Let T : V → V be a linear operator. Assume that

χT(x) = (x − 2)^4 (x − 3)^3   and   mT(x) = (x − 2)^2 (x − 3)^2.

Find all possible Jordan canonical forms of T .

Solution. We can extract the following information about the Jordan canonical form J of T :

• J has size 7 × 7.

• λ = 2 appears 4 times and λ = 3 appears 3 times. 3.4. JORDAN CANONICAL FORMS 155

• There is at least one Jordan block corresponding to λ = 2 of order 2.

• There is at least one Jordan block corresponding to λ = 3 of order 2.

With these 3 properties above, the Jordan canonical form of T is one of the following matrices:

2 1  2 1      0 2  0 2       2 1   2           0 2  or  2  .      3 1   3 1           0 3   0 3  3 3

The first matrix occurs when dim ker(T − 2I) = 2 and the second one occurs when dim ker(T − 2I) = 3. 156 CHAPTER 3. CANONICAL FORMS

Example. Suppose that T : V → V is a linear operator such that

(i) χT(x) = (x − 5)^7,

(ii) mT(x) = (x − 5)^3,

(iii) dim ker(T − 5I) = 3, and

(iv) dim ker(T − 5I)^2 = 6.

Find a possible Jordan canonical form of T .

Solution. Note that since mT(x) = (x − 5)^3, we see that ker(T − 5I)^3 = V. From the given information, we know that

• the Jordan canonical form of T has size 7 × 7 and λ = 5 appears 7 times.

• The largest Jordan block has size 3 × 3.

• The number of Jordan blocks = dim ker(T − 5I) = 3.

With these 3 pieces of information above, the possible Jordan canonical form of T can be one of the following matrices:

5 1 0  5 1 0      0 5 1  0 5 1      0 0 5  0 0 5           5 1  or  5 1 0  .      0 5   0 5 1           5 1  0 0 5  0 5 5

But then the number of Jordan blocks with size 2 × 2 equals

2 dim ker(T − 5I)^2 − dim ker(T − 5I)^3 − dim ker(T − 5I) = 12 − 7 − 3 = 2.

Hence the only possible Jordan canonical form is the first matrix above. 3.4. JORDAN CANONICAL FORMS 157

Example. Classify 3 × 3 matrices A such that A2 = 0 up to similarity.

Solution. Two matrices are similar if and only if they have the same Jordan canonical form. Hence we will find 3 × 3 matrices in Jordan canonical form such that A^2 = 0.
Let p(x) = x^2. Then p(A) = 0, which implies that mA(x) | p(x). Hence mA(x) = x or mA(x) = x^2. If mA(x) = x, then A = 0. If mA(x) = x^2, then A has 2 Jordan blocks, of sizes 2 × 2 and 1 × 1, with 0 on the diagonal:

    [ 0  1  0 ]
    [ 0  0  0 ]
    [ 0  0  0 ].

Hence, up to similarity, there are two 3 × 3 matrices A such that A2 = 0.

As an application of Jordan canonical form, we prove the following result:

Theorem 3.4.19. Let A be an n × n matrix over F . Assume that χA(x) splits over F . Then

(i) the sum of all eigenvalues of A = tr A;

(ii) the product of all eigenvalues of A = det A.

Proof. Let J be the Jordan canonical form of A. Then A = PJP^{-1}, where P is an invertible matrix. Thus

det A = det(PJP^{-1}) = det J   and   tr A = tr(PJP^{-1}) = tr J.

Since J is an upper triangular matrix, det J and tr J are, respectively, the product and the sum of its diagonal entries. But the diagonal entries of J are exactly the eigenvalues of A (with multiplicity). The result now follows.
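A quick numerical illustration of Theorem 3.4.19 (a sketch with an arbitrarily chosen matrix whose characteristic polynomial splits over R):

```python
import numpy as np

A = np.array([[1.0, 4.0, 0.0],
              [3.0, 2.0, 1.0],
              [0.0, 0.0, 7.0]])
eigenvalues = np.linalg.eigvals(A)

print(np.isclose(eigenvalues.sum(),  np.trace(A)))          # True
print(np.isclose(eigenvalues.prod(), np.linalg.det(A)))     # True
```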

Exercises

3.4.1. Find the characteristic polynomial and the minimal polynomial of matrix

 1 2 3    A =  0 4 5  0 0 4 and determine its Jordan canonical form.

3.4.2. Let T : V → V be a linear operator. Show that T is nilpotent of index k if and only if [T]_B is nilpotent of index k for any ordered basis B for V.

3.4.3. Let T : V → V be a linear operator. Show that the T-cyclic subspace generated by v is {p(T)(v) | p(x) ∈ F[x]}.

3.4.4. Let V be a finite-dimensional vector space over a field F and T : V → V a linear operator. For λ ∈ F , define the generalized λ-eigenspace V[λ] to be

V[λ] = {v ∈ V | (T − λI)^k(v) = 0 for some k ∈ N}.

(i) Prove that V[λ] is a subspace of V and that if λ, β1,..., βn are distinct

elements in F , then V[λ] ∩ (V[β1] + ··· + V[βn]) = {0}.

(ii) If mT(x) = (x − λ1)^{m1} ··· (x − λk)^{mk}, where λ1, . . . , λk ∈ F, prove that

V[λi] = ker(T − λiI)^{mi},   i = 1, . . . , k.

3.4.5. If A ∈ M5(C) with χA(x) = (x − 2)^3 (x + 7)^2 and mA(x) = (x − 2)^2 (x + 7), what is the Jordan canonical form for A?

3.4.6. How many possible Jordan canonical forms are there for a 6 × 6 complex matrix A with χA(x) = (x + 2)^4 (x − 1)^2?

3.4.7. Classify up to similarity all 3 × 3 complex matrices A such that A^3 = I.

3.4.8. Let T : R^2 → R^2 be a linear operator such that T^2 = 0. Show that T = 0 or there is an ordered basis B for R^2 such that

    [T]_B = [ 0  1 ]
            [ 0  0 ].

3.4.9. List up to similarity all real 4 × 4 matrices A such that A^3 = A^2 ≠ 0, and exhibit the Jordan canonical form of each.

3.4.10. Let A and B be square matrices such that A^2 = A and B^2 = B. Show that A and B are similar if and only if they have the same rank.

3.4.11. Let A be an n × n complex matrix such that A^2 = cA for some c ∈ C.

(i) Describe all the possibilities for the Jordan canonical form of A.

(ii) Suppose B is an n × n complex matrix such that B^2 = cB (same c), and assume that rank A = rank B. Prove that A and B are similar over C.

Remark. Consider the cases c = 0 and c 6= 0.

3.4.12. Let A ∈ Mn(C) with rank A = 1.

(i) Find all the possibilities for the Jordan canonical form of A.

(ii) Prove that det(In + A) = 1 + tr(A).

Chapter 4

Inner Product Spaces

4.1 Bilinear and Sesquilinear Forms

Definition 4.1.1. Let V be a vector space over a field F. A bilinear form on V is a function f : V × V → F which is linear in both variables, i.e., for any x, y, z ∈ V and α, β ∈ F,

(i) f(αx + βy, z) = αf(x, z) + βf(y, z),

(ii) f(z, αx + βy) = αf(z, x) + βf(z, y).

A bilinear form f on V is said to be symmetric if

f(v, w) = f(w, v) for any v, w ∈ V .

Similarly, it is said to be skew-symmetric if

f(v, w) = −f(w, v) for any v, w ∈ V .

If the underlying field is the field of complex numbers, we can define a sesquilinear form on V .

Definition 4.1.2. Let V be a vector space over C. A sesquilinear form on V is a function f : V × V → C which is linear in the first variable and conjugate-linear (or anti-linear) in the second variable, i.e., for any x, y, z ∈ V and α, β ∈ C,

(i) f(αx + βy, z) = αf(x, z) + βf(y, z),

(ii) f(z, αx + βy) = ᾱf(z, x) + β̄f(z, y).

A sesquilinear form f is hermitian if f(x, y) = \overline{f(y, x)} for any x, y ∈ V.

Definition 4.1.3. If f : V × V → F is a bilinear form or a sesquilinear form on V , then the map q : V → F defined by

q(v) = f(v, v) for any v ∈ V is called a quadratic form associated with f.

The following proposition gives a formula that shows how to recover a sesqui- linear form from its quadratic form.

Proposition 4.1.4 (Polarization identity). If f : V × V → C is a sesquilinear form on V and q(v) = f(v, v) is its associated quadratic form, then for any x, y ∈ V ,

f(x, y) = (1/4) Σ_{k=0}^{3} i^k q(x + i^k y)
        = (1/4)[q(x + y) − q(x − y)] + (i/4)[q(x + iy) − q(x − iy)]. (4.1)

Proof. The proof is a straightforward calculation and is left as an exercise.

We also have a Polarization identity for a symmetric bilinear form, which will be given as an exercise.
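As an illustration, the identity (4.1) can be tested numerically for a concrete sesquilinear form; in the sketch below (not part of the original notes) we take f(x, y) = x^T M ȳ on C^3 for an arbitrarily chosen matrix M:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

def f(x, y):
    return x @ M @ np.conj(y)          # linear in x, conjugate-linear in y

def q(v):
    return f(v, v)                     # the associated quadratic form

x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

rhs = 0.25 * sum((1j) ** k * q(x + (1j) ** k * y) for k in range(4))
print(np.isclose(f(x, y), rhs))        # True
```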

Definition 4.1.5. Let f be a bilinear form or a sesquilinear form on a vector space V . Then f is said to be

- nondegenerate if

f(x, y) = 0 ∀y ∈ V ⇒ x = 0, and f(y, x) = 0 ∀y ∈ V ⇒ x = 0.

- positive semi-definite if

f(x, x) ≥ 0 for any x ∈ V . 4.1. BILINEAR AND SESQUILINEAR FORMS 163

- positive definite if

∀x ∈ V, x 6= 0 ⇒ f(x, x) > 0.

Remark. A positive definite (sesquilinear or bilinear) form is positive semi- definite. A positive semi-definite form f is positive definite if and only if f(v, v) = 0 implies v = 0.

Proposition 4.1.6. A sesquilinear form f on a complex vector space V is her- mitian if and only if f(x, x) ∈ R for any x ∈ V .

Proof. If the sesquilinear form is hermitian, then

f(x, x) = \overline{f(x, x)} for any x ∈ V, which implies f(x, x) ∈ R for any x ∈ V. Conversely, assume that f(x, x) ∈ R for any x ∈ V. Then the associated quadratic form q(x) = f(x, x) ∈ R for any x ∈ V. Since q(αx) = |α|^2 q(x) for any x ∈ V and α ∈ C, we have

q(y + ix) = q(i(x − iy)) = q(x − iy),   q(y − ix) = q(−i(x + iy)) = q(x + iy).

These identities, together with the Polarization identity, imply that f is hermi- tian.

Corollary 4.1.7. A positive semi-definite sesquilinear form is hermitian.

Proof. It follows immediately from Definition 4.1.5 and Proposition 4.1.6.

Proposition 4.1.8. A positive definite (sesquilinear or bilinear) form is nonde- generate.

Proof. Let f be a positive definite sesquilinear (bilinear) form on V . Let u ∈ V be such that f(u, v) = 0 for any v ∈ V . In particular, f(u, u) = 0, which implies u = 0. Similarly, if f(v, u) = 0 for any v ∈ V , then u = 0. Hence it is nondegenerate.

Theorem 4.1.9 (Cauchy-Schwarz inequality). Let f be a positive semi-definite (bilinear or sesquilinear) form on V . Then for any x, y ∈ V ,

|f(x, y)| ≤ √f(x, x) √f(y, y).

Proof. We will prove this for a sesquilinear form over a complex vector space. Let A = f(x, x), B = |f(x, y)| and C = f(y, y). If B = 0, the result follows trivially. Suppose B ≠ 0. Let α = B/f(y, x). Then |α| = 1 and αf(y, x) = B. By Corollary 4.1.7, we also have ᾱf(x, y) = B. For any r ∈ R,

f(x − rαy, x − rαy) = f(x, x) − rᾱf(x, y) − rαf(y, x) + r²f(y, y) = A − 2rB + r²C.

Hence A − 2rB + r²C ≥ 0 for any r ∈ R. If C = 0, then 2rB ≤ A for any r ∈ R, which implies B = 0. If C > 0, take r = B/C so that A − B²/C ≥ 0, which implies B² ≤ AC.

Exercises

4.1.1. If f : V × V → F is a symmetric bilinear form on V and q(v) = f(v, v) is its associated quadratic form, show that for any x, y ∈ V ,
f(x, y) = (1/4)[q(x + y) − q(x − y)] = (1/2)[q(x + y) − q(x) − q(y)].

4.1.2. For any z = (z1, . . . , zn) and w = (w1, . . . , wn) in C^n, define

(z, w) = z1w1 + ··· + znwn.

Prove that this is a nondegenerate symmetric bilinear form on C^n, which is not positive definite. However, the formula

hz, wi = z1w¯1 + ··· + znw¯n

defines a positive definite hermitian sesquilinear form on C^n.

4.1.3. Verify whether each of the following bilinear forms on F^n, where F = R, C, is symmetric, skew-symmetric or nondegenerate:

(i) f(x, y) = x1y1 + ··· + xpyp − xp+1yp+1 − · · · − xnyn;

(ii) g(x, y) = (x1y2 − x2y1) + ··· + (x2m−1y2m − x2my2m−1);

(iii) h(x, y) = (x1ym+1 − xm+1y1) + ··· + (xmy2m − x2mym).

Remark. In (i), p is an integer in the set {1, . . . , n}. In (ii) and (iii), we assume that n = 2m is even.

4.1.4. Let V be a finite-dimensional vector space of dimension n. Let B be an ordered basis for V and let A and B be n × n matrices such that

[v]B^t A [w]B = [v]B^t B [w]B for any v, w ∈ V.

Prove that A = B.

4.1.5. Let V be a finite-dimensional vector space with a basis B = {v1, . . . , vn} and let f be a bilinear form on V . The matrix representation of f, denoted by

[f]B, is the matrix whose ij-entry is f(vi, vj).

Prove the following statements:

(i) f(v, w) = [v]B^t [f]B [w]B for any v, w ∈ V ;

(ii) [f]B is symmetric (skew-symmetric) if and only if f is symmetric (skew- symmetric);

(iii) [f]B is invertible if and only if f is nondegenerate.

4.1.6. Compute the matrix representations of the bilinear forms in Problem 4.1.3.

4.1.7. Let V be a finite-dimensional vector space and f a bilinear form on V . Let B and B0 be ordered bases for V and P the transition matrix from B to B0. Show that [f]B = P^t [f]B0 P.

4.1.8. Let f be a bilinear form on a vector space V . A linear operator T : V → V is said to preserve the bilinear form f if

f(T v, T w) = f(v, w) for any v, w ∈ V.

Show that

(i) if f is nondegenerate and T preserves f, then T is invertible;

(ii) if f is nondegenerate, then the set of linear operators preserving f is a group under composition.

(iii) if V is finite-dimensional, then T preserves f if and only if

[T ]B^t [f]B [T ]B = [f]B

for any ordered basis B for V .

4.2 Inner Product Spaces

Definition 4.2.1. Let V be a vector space over a field F (where F = R or F = C). An inner product on V is a map h· , ·i: V × V → F satisfying

(1) hx + y, zi = hx, zi + hy, zi for each x, y, z ∈ V ;

(2) hαx, yi = αhx, yi for each x, y ∈ V and α ∈ F;

(3) hx, yi = \overline{hy, xi} for each x, y ∈ V ;

(4) ∀x ∈ V , x 6= 0 ⇒ hx, xi > 0.

A real (or complex) vector space equipped with an inner product is called a real (or complex) .

Proposition 4.2.2. Let V be an inner product space. Then

(i) hx, αyi = ᾱhx, yi for any x, y ∈ V and α ∈ F;

(ii) hx, y + zi = hx, yi + hx, zi for any x, y, z ∈ V ;

(iii) hx, 0i = h0, xi = 0 for any x ∈ V .

Proof. Easy.

From Definition 4.2.1 and Proposition 4.2.2, we see that if F = R, then the inner product is linear in both variables, and if F = C, then the inner product is linear in the first variable and conjugate-linear in the second variable. Hence the real inner product is a positive definite, symmetric bilinear form and the complex inner product is a positive definite hermitian sesquilinear form.

The next proposition is useful in proving results about an inner product.

Proposition 4.2.3. If y and z are elements in an inner product space V such that hx, yi = hx, zi for each x ∈ V , then y = z.

Proof. Since hx, y − zi = 0 for any x ∈ V , by choosing x = y − z, we have hy − z, y − zi = 0. Hence y = z.

Let V be an inner product space. For each x ∈ V , write

kxk = √hx, xi. (4.2)

In other words, kxk is the square-root of the associated quadratic form of x. The Cauchy-Schwarz inequality (Theorem 4.1.9) can be written as

|hx, yi| ≤ kxk kyk for any x, y ∈ V.

Definition 4.2.4. Let V be a vector space. A function k · k: V → [0, ∞) is said to be a norm on V if

(i) kxk = 0 if and only if x = 0,

(ii) kcxk = |c| kxk for any x ∈ V and c ∈ F,

(iii) kx + yk ≤ kxk + kyk for any x, y ∈ V .

A vector space equipped with a norm is called a normed linear space, or simply a normed space. Property (iii) is referred to as the triangle inequality.

Proposition 4.2.5. Let V be an inner product space. Then the function k · k defined in (4.2) is a norm on V .

Proof. It is easy to see that kxk ≥ 0 and kxk = 0 if and only if x = 0. For any x ∈ V and α ∈ F,

kαxk² = hαx, αxi = αᾱhx, xi = |α|²kxk².

Hence kαxk = |α|kxk. To prove the triangle inequality, let x, y ∈ V .

kx + yk2 = hx + y, x + yi = hx, xi + hx, yi + hy, xi + hy, yi = kxk2 + 2 Rehx, yi + kyk2 ≤ kxk2 + 2|hx, yi| + kyk2 ≤ kxk2 + 2 kxk kyk + kyk2 = (kxk + kyk)2.

Hence kx + yk ≤ kxk + kyk.

Proposition 4.2.6 (Parallelogram law). Let V be an inner product space. Then for any x, y ∈ V ,

kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2.

Proof. For any x, y ∈ V , we have

kx + yk2 = hx, xi + hx, yi + hy, xi + hy, yi = kxk2 + 2 Rehx, yi + kyk2, and kx − yk2 = hx, xi − hx, yi − hy, xi + hy, yi = kxk2 − 2 Rehx, yi + kyk2.

We immediately see that kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2.

Proposition 4.2.7 (Polarization identity). Let V be an inner product space.

(1) If F = R, then
hx, yi = (1/4)(kx + yk² − kx − yk²).

(2) If F = C, then
hx, yi = (1/4)(kx + yk² − kx − yk² + ikx + iyk² − ikx − iyk²).

Proof. The complex case is Proposition 4.1.4. The real case is easy and is left as an exercise.

Examples.

1. F^n is an inner product space with respect to the following inner product
hx, yi = ∑_{i=1}^{n} xiy¯i = x1y¯1 + x2y¯2 + ··· + xny¯n,
where x = (x1, . . . , xn) and y = (y1, . . . , yn) ∈ F^n. Note that when F = R, the inner product is simply hx, yi = ∑_{i=1}^{n} xiyi.

2. `² = { (xn) | ∑_{n=1}^{∞} |xn|² < ∞ }. If x = (xn) and y = (yn) ∈ `², then
hx, yi = ∑_{i=1}^{∞} xiy¯i
is an inner product on `². The series above is convergent by the Cauchy-Schwarz inequality. Note that √hx, xi = kxk₂ on `².

3. Mn(F), regarded as F^{n²}, is an inner product space with respect to the following inner product
hA, Bi = ∑_{i=1}^{n} ∑_{j=1}^{n} aij b̄ij = tr(AB∗),
where B∗ = (B̄)^t is the conjugate-transpose of B. In case of a real matrix, B∗ is simply B^t.

4. The vector space C[0, 1] of all continuous functions on [0, 1] is an inner product space with respect to the inner product
hf, gi = ∫₀¹ f(x)g(x) dx (f, g ∈ C[0, 1]).

Definition 4.2.8. Let V be an inner product space.

(1) We say that u, v ∈ V are orthogonal if hu, vi = 0 and write u ⊥ v.

(2) If x ∈ V is orthogonal to every element of a subset W of V , then we say that x is orthogonal or perpendicular to W and write x ⊥ W .

(3) If U, W are subsets of V and u ⊥ w for all u ∈ U and all w ∈ W , then we say that U and W are orthogonal and write U ⊥ W .

(4) The set of all x ∈ V orthogonal to a set W is denoted by W ⊥ and called the orthogonal complement of W :

W ⊥ = { x ∈ V | x ⊥ W }.

Proposition 4.2.9. Let V be an inner product space.

(1) {0}⊥ = V and V ⊥ = {0}.

(2) If A is a subset of V , then A⊥ is a subspace of V .

(3) If A is a subset of V , then A ∩ A⊥ = ∅ or A ∩ A⊥ = {0}; if A is a subspace of V , then A ∩ A⊥ = {0}.

(4) For any subsets A, B of V , if A ⊆ B then B⊥ ⊆ A⊥.

(5) For any subset A of V , A ⊆ A⊥⊥.

Proof. (1) is trivial.

(2) Clearly, 0 ∈ A⊥. If x1, x2 ∈ A⊥ and α, β ∈ F, then

hαx1 + βx2, yi = αhx1, yi + βhx2, yi = 0 for all y ∈ A.

Hence αx1 + βx2 ∈ A⊥.

(3) Assume that A ∩ A⊥ ≠ ∅. Let x ∈ A ∩ A⊥. Since x ∈ A⊥, we have hx, yi = 0 for each y ∈ A. In particular, hx, xi = 0. Hence x = 0. This shows that A ∩ A⊥ ⊆ {0}. Now, assume that A is a subspace of V . Since both A and A⊥ are subspaces of V , 0 ∈ A ∩ A⊥. Hence A ∩ A⊥ = {0}.

(4) Assume that A ⊆ B. Let x ∈ B⊥. If y ∈ A, then y ∈ B and hence hx, yi = 0. This shows that x ∈ A⊥. Thus, B⊥ ⊆ A⊥.

(5) Let x ∈ A. Then hx, yi = 0 for any y ∈ A⊥. Hence x ∈ A⊥⊥.

Definition 4.2.10. A nonempty collection O = {uα | α ∈ Λ} of elements in an inner product space is said to be an orthogonal set if uα ⊥ uβ for all α 6= β

in Λ. If, in addition, each uα has norm one, then we say that the set O is an

orthonormal set. That is, the set O is orthonormal if and only if huα, uβi = δαβ

for each α, β ∈ Λ, where δαβ is the Kronecker’s delta function.

Note that we can always construct an orthonormal set from an orthogonal set of nonzero vectors by dividing each vector by its norm.

Examples.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal set in R³.

(2) {(1, −1, 0), (1, 1, 0), (0, 0, 1)} is an orthogonal set in R³, but not an orthonormal set. By dividing each element by its norm, we obtain an orthonormal set {(1/√2, −1/√2, 0), (1/√2, 1/√2, 0), (0, 0, 1)}.

(3) {e^{2nπix}}_{n=−∞}^{∞} is an orthonormal set in C[0, 1] because
∫₀¹ e^{2nπix} · e^{−2mπix} dx = ∫₀¹ e^{2(n−m)πix} dx = δnm.

Proposition 4.2.11. Any orthogonal set of nonzero vectors is linearly independent.

Proof. Assume that O is an orthogonal set consisting of non-zero vectors. If ui ∈ O and ci ∈ F, i = 1, 2, . . . , n, are such that ∑_{i=1}^{n} ciui = 0, then, for j = 1, 2, . . . , n,
0 = h∑_{i=1}^{n} ciui, uji = ∑_{i=1}^{n} ci hui, uji = cjkujk²,
which implies that cj = 0 for each j. Hence O is linearly independent.

The next proposition is a generalization of the Pythagorean theorem for a right-angled triangle.

Proposition 4.2.12 (Pythagorean formula). If {x1, x2, . . . , xn} is an orthogonal subset of an inner product space, then
k∑_{i=1}^{n} xik² = ∑_{i=1}^{n} kxik².

Proof. Using the fact that hxi, xji = 0 if i ≠ j, we have
k∑_{i=1}^{n} xik² = h∑_{i=1}^{n} xi, ∑_{j=1}^{n} xji = ∑_{i=1}^{n} ∑_{j=1}^{n} hxi, xji = ∑_{i=1}^{n} kxik².

If S = {u1, u2, . . . , un} is a linearly independent subset of a vector space V , then any element x ∈ span(S) can be written uniquely as a linear combination x = ∑_{i=1}^{n} αiui. However, if S is an orthonormal set in an inner product space, it is linearly independent by Proposition 4.2.11. In this case, if x = ∑_{i=1}^{n} αiui is in span(S), we can determine the formula for the coefficients αi.

Proposition 4.2.13. Let {u1, . . . , un} be an orthonormal set in an inner product space and x = ∑_{i=1}^{n} αiui, where each αi ∈ F. Then αi = hx, uii for i = 1, . . . , n and
kxk² = ∑_{i=1}^{n} |αi|² = ∑_{i=1}^{n} |hx, uii|².

Proof. If x = ∑_{i=1}^{n} αiui, then
hx, uji = h∑_{i=1}^{n} αiui, uji = ∑_{i=1}^{n} αi hui, uji = αj.
Moreover, by Proposition 4.2.12,
kxk² = k∑_{i=1}^{n} αiuik² = ∑_{i=1}^{n} kαiuik² = ∑_{i=1}^{n} |αi|² = ∑_{i=1}^{n} |hx, uii|².

Proposition 4.2.14. Let {u1, u2, . . . , un} be an orthonormal subset of an inner product space V . Let N = span{u1, u2, . . . , un}. For any x ∈ V , define the orthogonal projection of x on N by

PN (x) = ∑_{i=1}^{n} hx, uii ui.

Then PN (x) ∈ N and x − PN (x) ∈ N⊥. In particular, x − PN (x) ⊥ PN (x).

Proof. It is obvious that PN (x) ∈ N. First, we show that x − PN (x) ⊥ uj for

j = 1, . . . , n. Using the fact that hui, uji = δij, we have

hx − PN (x), uji = hx − ∑_{i=1}^{n} hx, uii ui, uji = hx, uji − ∑_{i=1}^{n} hx, uii hui, uji = hx, uji − hx, uji = 0.

This implies that x − PN (x) ⊥ uj for j = 1, . . . , n. If y = ∑_{j=1}^{n} cjuj ∈ N, then

hx − PN (x), yi = hx − PN (x), ∑_{j=1}^{n} cjuji = ∑_{j=1}^{n} c̄j hx − PN (x), uji = 0.

Hence x − PN (x) ⊥ N, which implies x − PN (x) ⊥ PN (x).

We will apply this proposition to show that we can construct an orthonormal set from a linearly independent set with the same spanning set. This is known as the Gram-Schmidt orthogonalization process.

Theorem 4.2.15 (Gram-Schmidt orthogonalization process). Let V be an inner product space and let {x1, x2,... } be a linearly independent set in V . Then there is an orthonormal set {u1, u2,... } such that, for each n ∈ N,

span{x1, . . . , xn} = span{u1, . . . , un}.

Proof. First, set u1 = x1/kx1k. Next, let z2 = x2 − hx2, u1i u1. Clearly, z2 is orthogonal to u1. Then, we set u2 = z2/kz2k. In general, assume that we have chosen an orthonormal set {u1, u2, . . . , un−1} such that

span{x1, . . . , xn−1} = span{u1, . . . , un−1}. (4.3)

Define

zn = xn − ∑_{i=1}^{n−1} hxn, uii ui.

By Proposition 4.2.14, ∑_{i=1}^{n−1} hxn, uii ui is the orthogonal projection of xn onto span{u1, . . . , un−1}. Hence zn is orthogonal to each uj. Note that zn ≠ 0 because xn ∉ span{x1, . . . , xn−1} = span{u1, . . . , un−1}. Now, let un = zn/kznk.

It follows that {u1, . . . , un} is an orthonormal set. Moreover, it is easy to see that un ∈ span{u1, . . . , un−1, xn}. This, together with (4.3), implies

span{u1, . . . , un} ⊆ span{x1, . . . , xn}.

On the other hand, xn ∈ span{u1, . . . , un−1, zn} = span{u1, . . . , un−1, un}. Thus

span{x1, . . . , xn} ⊆ span{u1, . . . , un}.

This finishes the proof.

Definition 4.2.16. Let V be a finite-dimensional inner product space. An or- thonormal basis for V is a basis for V which is an orthonormal set.

Example.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal basis for R³.

(2) {(1/√2, −1/√2, 0), (1/√2, 1/√2, 0), (0, 0, 1)} is an orthonormal basis for R³.

Corollary 4.2.17. Let V be a finite-dimensional inner product space. Then V has an orthonormal basis.

Proof. Let {x1, . . . , xn} be a basis for V . By Gram-Schmidt orthogonalization process, there is an orthonormal set {u1, . . . , un} such that

span{x1, . . . , xn} = span{u1, . . . , un}.

This shows that {u1, . . . , un} is an orthonormal basis for V .

Example. Apply the Gram-Schmidt process to produce an orthonormal basis for R³ from the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)}.

Solution. Let x1 = (1, 1, 0), x2 = (0, 1, 1) and x3 = (1, 0, 1). First, set
u1 = x1/kx1k = (1/√2, 1/√2, 0).
Next, let
z2 = x2 − hx2, u1i u1 = (0, 1, 1) − (1/√2)(1/√2, 1/√2, 0) = (−1/2, 1/2, 1).
Then set
u2 = z2/kz2k = (√2/√3)(−1/2, 1/2, 1) = (−1/√6, 1/√6, 2/√6).
Now let
z3 = x3 − hx3, u1i u1 − hx3, u2i u2 = (1, 0, 1) − (1/√2)(1/√2, 1/√2, 0) − (1/√6)(−1/√6, 1/√6, 2/√6) = (2/3, −2/3, 2/3).
Finally, set
u3 = z3/kz3k = (√3/2)(2/3, −2/3, 2/3) = (1/√3, −1/√3, 1/√3).
We have an orthonormal basis {(1/√2, 1/√2, 0), (−1/√6, 1/√6, 2/√6), (1/√3, −1/√3, 1/√3)}.
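The computation above is easy to automate. The following Python/NumPy sketch (an illustration, not part of the original notes; the function name gram_schmidt and the assumption of linear independence are ours) implements the process of Theorem 4.2.15 for real vectors and reproduces the orthonormal basis just obtained.

    # Gram-Schmidt orthogonalization (Theorem 4.2.15), illustrative sketch.
    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal list spanning the same subspace."""
        basis = []
        for x in vectors:
            # subtract the orthogonal projection onto span(basis)
            z = x - sum(np.dot(x, u) * u for u in basis)
            basis.append(z / np.linalg.norm(z))   # assumes linear independence
        return basis

    xs = [np.array(v, float) for v in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]]
    u1, u2, u3 = gram_schmidt(xs)
    print(u1)   # approx [ 0.7071  0.7071  0.    ]  = (1/sqrt(2), 1/sqrt(2), 0)
    print(u2)   # approx [-0.4082  0.4082  0.8165]  = (-1/sqrt(6), 1/sqrt(6), 2/sqrt(6))
    print(u3)   # approx [ 0.5774 -0.5774  0.5774]  = (1/sqrt(3), -1/sqrt(3), 1/sqrt(3))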

Theorem 4.2.18 (Projection Theorem). Let V be a finite-dimensional inner product space and W a subspace of V . Then

V = W ⊕ W ⊥.

Proof. Let {u1, . . . , un} be an orthonormal basis for W . Let x ∈ V . Consider the orthogonal projection of x on W :

PW (x) = ∑_{i=1}^{n} hx, uii ui.

By Proposition 4.2.14, PW (x) ∈ W and x − PW (x) ∈ W ⊥. Thus

x = PW (x) + (x − PW (x)) ∈ W + W ⊥.

This shows that V = W + W ⊥. We already know that W ∩ W ⊥ = {0}. Hence V = W ⊕ W ⊥.

Corollary 4.2.19. Let W be a subspace of a finite-dimensional inner product space V . Then W ⊥⊥ = W .

Proof. If x ∈ W , then x ⊥ W ⊥, which implies x ∈ W ⊥⊥. Hence W ⊆ W ⊥⊥. On the other hand, let x ∈ W ⊥⊥. By Theorem 4.2.18, we can write x = y + z, where y ∈ W and z ∈ W ⊥. Since y ∈ W , it is also in W ⊥⊥. Since W ⊥⊥ is a subspace of V , we have x − y ∈ W ⊥⊥. But then x − y = z ∈ W ⊥. Hence x−y ∈ W ⊥ ∩(W ⊥)⊥ = {0}. Thus x = y ∈ W , which shows that W = W ⊥⊥.

Next we show that, given a subspace W and a point v in V , the orthogonal projection PW (v) is the point on W that minimizes the distance from v to W .

Proposition 4.2.20. Let W be a subspace of a finite-dimensional inner product space V and v ∈ V . Then

kv − PW (v)k ≤ kv − wk for any w ∈ W .

Moreover, the equality holds if and only if w = PW (v).

Proof. For any w ∈ W ,

kv − wk² = k(v − PW (v)) + (PW (v) − w)k² = kv − PW (v)k² + kPW (v) − wk² ≥ kv − PW (v)k²,

where the second equality holds by the Pythagorean formula, because v − PW (v) ∈ W ⊥ and PW (v) − w ∈ W .

Also, the equality holds if and only if kPW (v) − wk = 0, i.e., PW (v) = w.
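Proposition 4.2.20 can be illustrated numerically. In the Python/NumPy sketch below (an illustration with an arbitrarily chosen subspace, not part of the original notes), PW (v) is computed from an orthonormal basis of W and compared with other points of W.

    # Orthogonal projection onto W = span{w1, w2} in R^4 and the best
    # approximation property of Proposition 4.2.20 (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.standard_normal((4, 2))          # columns span a 2-dimensional W
    Q, _ = np.linalg.qr(W)                   # orthonormal basis u1, u2 of W
    v = rng.standard_normal(4)

    P_v = Q @ (Q.T @ v)                      # P_W(v) = sum_i <v, u_i> u_i
    assert np.allclose(Q.T @ (v - P_v), 0)   # v - P_W(v) is orthogonal to W

    # every other point of W is at least as far from v
    for _ in range(100):
        w = W @ rng.standard_normal(2)       # a point of W
        assert np.linalg.norm(v - P_v) <= np.linalg.norm(v - w) + 1e-12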

Now we consider a linear functional on an inner product space. It is easily seen that for a fixed w ∈ V , the map v 7→ hv, wi is a linear functional on V . The next theorem shows that these are the only linear functionals on V . It is also true in a more general case with a variety of interesting applications.

Theorem 4.2.21 (Riesz’s Theorem). Let f be a linear functional on a finite- dimensional inner product space V . Then there is a unique w ∈ V such that

f(v) = hv, wi for any v ∈ V .

Proof. Let {u1, . . . , un} be an orthonormal basis for V . Let

w = ∑_{i=1}^{n} \overline{f(ui)} ui.
Let v ∈ V and write v = ∑_{i=1}^{n} hv, uii ui. Then
f(v) = f(∑_{i=1}^{n} hv, uii ui) = ∑_{i=1}^{n} hv, uii f(ui) = hv, ∑_{i=1}^{n} \overline{f(ui)} uii = hv, wi.

To show uniqueness, let w0 ∈ V be such that f(v) = hv, wi = hv, w0i for any v ∈ V . By Proposition 4.2.3, w = w0.
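Riesz's theorem is constructive, so it is easy to test numerically. The following Python/NumPy sketch (an illustration, not part of the original notes; the functional f(v) = a^t v is an arbitrary choice) builds the representing vector w = ∑ \overline{f(ui)} ui from the standard orthonormal basis and checks f(v) = hv, wi.

    # Riesz representation on C^3 (illustrative sketch).  The inner product is
    # <x, y> = sum_i x_i * conj(y_i), as in Section 4.2.
    import numpy as np

    rng = np.random.default_rng(2)
    a = rng.standard_normal(3) + 1j * rng.standard_normal(3)

    def f(v):                       # an arbitrary linear functional f(v) = a^t v
        return a @ v

    def ip(x, y):                   # the inner product (conjugate-linear in y)
        return x @ np.conj(y)

    U = np.eye(3, dtype=complex)    # the standard orthonormal basis e1, e2, e3
    w = sum(np.conj(f(U[:, i])) * U[:, i] for i in range(3))

    v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    assert np.isclose(f(v), ip(v, w))    # f(v) = <v, w>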

Exercises

4.2.1. Let n ≥ 3. Prove that if x, y are elements of a complex inner product space, then
hx, yi = (1/n) ∑_{k=1}^{n} ω^k kx + ω^k yk²,
where ω is a primitive n-th root of unity, i.e., ωⁿ = 1 and ω^k ≠ 1 for 1 ≤ k < n.

4.2.2. Show that in a complex inner product space, x ⊥ y if and only if

kx + αyk = kx − αyk for all scalars α.

4.2.3. Let ϕ be a nonzero linear functional on a finite-dimensional inner product space V . Prove that (ker ϕ)⊥ is a subspace of dimension 1.

4.2.4. Let W1, W2 be subspaces of an inner product space V . Prove that

(W1 + W2)⊥ = W1⊥ ∩ W2⊥.

4.2.5. Let A be a subset of a finite-dimensional inner product space V . Prove that A⊥ = (span A)⊥.

4.2.6. In each of the following parts, apply Gram-Schmidt process to the given 3 basis for R to produce an orthonormal basis, and write the given element x as a linear combination of the elements in the orthonormal basis thus obtained.

(a) {(1, 0, −1), (0, 1, 1), (1, 2, 3)}, and x = (2, 1, −2);

(b) {(1, 1, 1), (0, 1, 1), (0, 0, 3)}, and x = (3, 3, 1).

4.2.7. Let {v1, . . . , vk} be an orthonormal subset of an inner product space V . Show that for any x ∈ V ,
∑_{i=1}^{k} |hx, vii|² ≤ kxk².

Prove also that the equality holds if and only if {v1, . . . , vk} is an orthonormal basis for V .

4.2.8. Let V be a complex inner product space. Let {v1, . . . , vn} be an orthonormal basis for V . Prove that
(i) x = ∑_{i=1}^{n} hx, vii vi for any x ∈ V ;
(ii) hx, yi = ∑_{i=1}^{n} hx, vii \overline{hy, vii} for any x, y ∈ V .

4.3 Operators on Inner Product Spaces

Throughout this section, unless otherwise stated, V is a finite-dimensional inner product space. For simplicity, in this section we will write T x for T (x) when there is no confusion.

Proposition 4.3.1. Let V be an inner product space and T a linear operator on V .

(i) If hT x, yi = 0 for any x, y ∈ V , then T = 0.

(ii) If V is a complex inner product space and hT x, xi = 0 for all x ∈ V , then T = 0.

Proof. (i) For each x ∈ V , hT x, yi = 0 for any y ∈ V . Hence T x = 0 for each x ∈ V , which implies that T = 0.

(ii) Let x, y ∈ V and r ∈ C. Then

0 = hT (rx + y), rx + yi = |r|²hT x, xi + hT y, yi + rhT x, yi + r̄hT y, xi = rhT x, yi + r̄hT y, xi.

Setting r = 1, we have

hT x, yi + hT y, xi = 0.

Setting r = i, we have

hT x, yi − hT y, xi = 0.

Hence hT x, yi = 0 for any x, y ∈ V . It follows from part (i) that T = 0.

Remark. Part (ii) may not hold for a real inner product space. For example, let V = R² and let T be the 90°-rotation, i.e. T (x, y) = (−y, x) for any (x, y) ∈ R². Then hT v, vi = 0 for each v ∈ V , but T ≠ 0.

Theorem 4.3.2. Let T be a linear operator on V . Then there is a unique linear operator T ∗ on V satisfying

hT x, yi = hx, T ∗yi for all x, y ∈ V.

Proof. Let T be a linear operator on V . For any y ∈ V , the map x 7→ hT x, yi is a linear functional on V . By Riesz’s Theorem (Theorem 4.2.21), there exists a unique z ∈ V such that

hT x, yi = hx, zi for all x ∈ V.

Define T ∗y = z. To show that the map T ∗ is linear, let y1, y2 ∈ V and α, β ∈ F. Then for any x ∈ V ,

hx, T ∗(αy1 + βy2)i = hT x, αy1 + βy2i = ᾱhT x, y1i + β̄hT x, y2i = ᾱhx, T ∗y1i + β̄hx, T ∗y2i = hx, αT ∗y1 + βT ∗y2i.

Hence T ∗(αy1 + βy2) = αT ∗y1 + βT ∗y2. For uniqueness, assume that S is a linear operator on V such that

hT x, yi = hx, Syi for all x, y ∈ V.

Then hx, Syi = hx, T ∗yi for all x, y ∈ V.

Thus S = T ∗ by Proposition 4.3.1.

Definition 4.3.3. Let T be a linear operator on V . Then the linear operator T ∗ defined in Theorem 4.3.2 is called the adjoint of T .

We summarize important properties of the adjoint of an operator in the fol- lowing theorem:

Theorem 4.3.4. Let T , S be linear operators on V . Then

1. T ∗∗ = T ;

2. (αT + βS)∗ = ᾱT ∗ + β̄S∗ for all α, β ∈ F ;

3. (TS)∗ = S∗T ∗ ;

4. If T is invertible, then T ∗ is also invertible and (T ∗)−1 = (T −1)∗.

Proof. Let T and S be linear operators on V .

(1) For any x, y ∈ V ,

hx, T ∗∗yi = hT ∗x, yi = hx, T yi.

Hence T ∗∗ = T .

(2) We leave this as a (straightforward) exercise.

(3) For any x, y ∈ V ,

hx, (TS)∗yi = hT Sx, yi = hSx, T ∗yi = hx, S∗T ∗yi.

Hence (TS)∗ = S∗T ∗.

(4) Assume that T −1 exists. Then TT −1 = T −1T = I. Taking adjoint and applying (3), we see that

(T −1)∗T ∗ = T ∗(T −1)∗ = I∗ = I.

Hence T ∗ is invertible and (T ∗)−1 = (T −1)∗.

Remark. If T : V → W is a linear operator between finite-dimensional inner product spaces, we can define T ∗ : W → V to be a unique linear operator from W into V satisfying

hT x, yiW = hx, T ∗yiV for any x ∈ V and y ∈ W.

It has all the properties listed in the previous theorem. Since we are mainly interested in the case where V = W , we will restrict ourselves to this setting.

Examples. If we write elements in C^n as column vectors (n × 1 matrices), then we can write the inner product on C^n as
hx, yi = ∑_{i=1}^{n} xiy¯i = x^t ȳ.
Recall that any n × n matrix A defines a linear operator LA on C^n by left multiplication LA(x) = Ax, where x ∈ C^n is written as an n × 1 matrix. Then

(LA)∗ = LA∗ ,

where A∗ = (Ā)^t. To see this, let x, y ∈ C^n. Then

hLA(x), yi = hAx, yi = (Ax)^t ȳ = x^t A^t ȳ = hx, Ā^t yi = hx, LA∗ (y)i.

On the other hand, if T is a linear operator on V and B is an ordered orthonormal basis for V , then [T ∗]B = ([T ]B)∗. We leave this as an exercise.
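The identity (LA)∗ = LA∗ can also be checked numerically. The Python/NumPy sketch below (an illustration, not part of the original notes) verifies hAx, yi = hx, A∗yi for a randomly chosen complex matrix, using the inner product hx, yi = ∑ xiy¯i.

    # The adjoint of left multiplication by A is left multiplication by the
    # conjugate transpose A* (illustrative numerical check).
    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A_star = A.conj().T                       # A* = conjugate transpose

    def ip(x, y):                             # <x, y> = sum_i x_i * conj(y_i)
        return x @ np.conj(y)

    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    assert np.isclose(ip(A @ x, y), ip(x, A_star @ y))   # <Ax, y> = <x, A*y>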

Definition 4.3.5. Let T be a linear operator on V . Then

- T is said to be normal if TT ∗ = T ∗T ;

- T is said to be self-adjoint or hermitian if T ∗ = T ;

- T is said to be unitary if T is invertible and T ∗ = T −1.

If V is a real inner product space and T is unitary, then we may say that T is orthogonal. It is clear that if T is self-adjoint or unitary, then it is normal.

Definition 4.3.6. Let A ∈ Mn(F). Then

- A is said to be normal if AA∗ = A∗A ;

- A is said to be self-adjoint or hermitian if A∗ = A ;

- A is said to be unitary if A is invertible and A∗ = A−1.

If F = R, then

- A is said to be symmetric if At = A ;

- A is said to be orthogonal if A is invertible and At = A−1.

In other words, a symmetric matrix is a real self-adjoint matrix and an orthogonal matrix is a real unitary matrix.

Examples. Let V = F^n and let A be an n × n matrix over F. Consider the linear operator LA given by left multiplication by the matrix A. It is easy to verify that

- LA is normal if and only if A is normal;

- LA is hermitian if and only if A is hermitian;

- LA is unitary if and only if A is unitary.

Theorem 4.3.7. Let T be a linear operator on V .

(i) T is self-adjoint if and only if hT x, yi = hx, T yi for any x, y ∈ V .

(ii) If T is self-adjoint, then hT x, xi ∈ R for each x ∈ V .

(iii) If V is a complex inner product space, then T is self-adjoint if and only if hT x, xi ∈ R for each x ∈ V .

Proof. (i) Assume that T = T ∗. Then

hT x, yi = hx, T ∗yi = hx, T yi for any x, y ∈ V.

Conversely, if hT x, yi = hx, T yi for any x, y ∈ V , then

hT x, yi = hx, T yi = hT ∗x, yi for any x, y ∈ V.

Hence T = T ∗ by Proposition 4.3.1 (i). (ii) Assume that T is self-adjoint. Then for any x ∈ V ,

hT x, xi = hx, T xi = \overline{hT x, xi},
which implies hT x, xi ∈ R for any x ∈ V .
(iii) Let V be a complex inner product space. Assume that hT x, xi ∈ R for any x ∈ V . Then
hT x, xi = \overline{hT x, xi} = hx, T xi = hT ∗x, xi for any x ∈ V.

By Proposition 4.3.1 (ii), we conclude that T = T ∗.

Proposition 4.3.8. Let T be a self-adjoint operator on V . If hT x, xi = 0 for all x ∈ V , then T = 0.

Proof. Assume that hT x, xi = 0 for all x ∈ V . If V is a complex inner product space, then T = 0 (without assuming that T is self-adjoint) by Proposition 4.3.1 (ii). Thus we will establish this for a real inner product space. For any x, y ∈ V ,

0 = hT (x + y), x + yi = hT x, xi + hT x, yi + hT y, xi + hT y, yi, which implies hT x, yi + hT y, xi = 0. But then

hT x, yi = hy, T xi = hT y, xi.

The first equality follows from the fact that the inner product is real and the second one follows because T is self-adjoint. It follows that hT x, yi = 0 for any x, y ∈ V , and hence T = 0 by Proposition 4.3.1 (i).

Theorem 4.3.9. Let T be a linear operator on V . Then T is normal if and only if kT xk = kT ∗xk for each x ∈ V .

Proof. Let T be a linear operator on V . Note that T ∗T − TT ∗ is self-adjoint. Then by Proposition 4.3.8,

T ∗T − TT ∗ = 0 ⇐⇒ h(T ∗T − TT ∗)(x), xi = 0 for any x ∈ V ⇐⇒ hT ∗T x, xi = hTT ∗x, xi for any x ∈ V ⇐⇒ kT xk2 = kT ∗xk2 for any x ∈ V.

Hence T is normal if and only if kT xk = kT ∗xk for any x ∈ V .
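Theorem 4.3.9 is also easy to test numerically. The Python/NumPy sketch below (an illustration, not part of the original notes; the two matrices are arbitrary choices) compares kAxk and kA∗xk for a normal matrix and for a non-normal one.

    # ||Ax|| = ||A*x|| characterizes normal matrices (Theorem 4.3.9); sketch.
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)

    N = np.array([[0.0, -1.0], [1.0, 0.0]])    # a rotation: N N* = N* N, so normal
    B = np.array([[1.0, 1.0], [0.0, 2.0]])     # not normal

    for A in (N, B):
        same = np.isclose(np.linalg.norm(A @ x), np.linalg.norm(A.conj().T @ x))
        print(same)     # True for N, (generically) False for B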

Theorem 4.3.10. Let T be a linear operator on V . Then TFAE:

(i) T is unitary;

(ii) kT xk = kxk for all x ∈ V ;

(iii) hT x, T yi = hx, yi for all x, y ∈ V ;

(iv) T ∗T = I.

Proof. (i) ⇒ (ii). For all x ∈ V ,

kT xk2 = hT x, T xi = hT ∗T x, xi = hT −1T x, xi = hx, xi = kxk2.

(ii) ⇒ (iii). We use the Polarization identity (Proposition 4.2.7). We will prove it when F = C. The real case can be done the same way. For all x, y ∈ V ,
hx, yi = (1/4)(kx + yk² − kx − yk²) + (i/4)(kx + iyk² − kx − iyk²)
= (1/4)(kT x + T yk² − kT x − T yk²) + (i/4)(kT x + iT yk² − kT x − iT yk²)
= hT x, T yi.

(iii) ⇒ (iv). Since hT ∗T x, yi = hT x, T yi = hx, yi for all x, y ∈ V , we have T ∗T = I.

(iv) ⇒ (i). If T ∗T = I, then T is 1-1. Since V is finite-dimensional, T is invertible and thus T ∗ = T −1.

Remark. This proposition is not true for an infinite-dimensional inner product space. For example, let R: `2 → `2 be the right-shift operator, i.e.

R(x1, x2,... ) = (0, x1, x2,... ).

Then kRxk = kxk for all x ∈ `2, but R is not surjective and thus not invertible.

Theorem 4.3.11. Let A ∈ Mn(C). Then TFAE:

(i) A is unitary;

(ii) kAxk = kxk for all x ∈ C^n ;

(iii) hAx, Ayi = hx, yi for all x, y ∈ C^n ;

(iv) A∗A = In;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.

Proof. The proof that (i), (ii), (iii) and (iv) are equivalent is similar to the proof of Theorem 4.3.10. We now show that (iv) and (v) are equivalent. Let A = [aij] and A∗ = [bij], where bij = āji. Then A∗A = [cij], where
cij = (A∗A)ij = ∑_{k=1}^{n} bik akj = ∑_{k=1}^{n} āki akj. (4.4)
The fact that A∗A = In is equivalent to (A∗A)ij = δij for i, j ∈ {1, . . . , n}. The i-th column vector of A is Ci = (a1i, . . . , ani), for i = 1, . . . , n. Hence
hCi, Cji = ∑_{k=1}^{n} aki ākj.
It follows that
hCj, Cii = \overline{hCi, Cji} = ∑_{k=1}^{n} āki akj. (4.5)
From (4.4) and (4.5), we see that (iv) and (v) are equivalent. That (vi) is equivalent to the other statements follows from the fact that A is unitary if and only if A^t is unitary and that the row vectors of A are the column vectors of A^t.
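As a numerical illustration of Theorem 4.3.11 (a Python/NumPy sketch, not part of the original notes), a unitary matrix obtained from a QR factorization has orthonormal columns, satisfies A∗A = I, and preserves norms.

    # Properties (ii), (iv), (v) of Theorem 4.3.11 for a unitary matrix; sketch.
    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A, _ = np.linalg.qr(M)                    # the Q factor is unitary

    assert np.allclose(A.conj().T @ A, np.eye(3))    # A*A = I, orthonormal columns
    x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
    assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))   # ||Ax|| = ||x||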

We also have the following version for an orthogonal real matrix.

Theorem 4.3.12. Let A ∈ Mn(R). Then TFAE:

(i) A is orthogonal;

(ii) kAxk = kxk for all x ∈ R^n ;

(iii) hAx, Ayi = hx, yi for all x, y ∈ R^n ;

(iv) A^tA = In;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.

Exercises

4.3.1. Let V be a (finite-dimensional) inner product space. If P is an orthogonal projection onto a subspace of V , prove that P 2 = P and P ∗ = P . Conversely, if P is a linear operator on V such that P 2 = P and P ∗ = P , show that P is an orthogonal projection onto a subspace of V .

4.3.2. Prove that a linear operator on a finite-dimensional inner product space is unitary if and only if it maps an orthonormal basis onto an orthonormal basis.

4.3.3. Let V be a finite-dimensional inner product space with dim V = n. Let

B = {v1, . . . , vn} be an ordered orthonormal basis for V . If T is a linear operator on V , prove that

(i) the ij-entry of [T ]B is hT (vj), vii for any i, j ∈ {1, 2, . . . , n};

(ii) [T ∗]B = ([T ]B)∗;

(iii) T is normal if and only if [T ]B is normal;

(iv) T is self-adjoint if and only if [T ]B is self-adjoint;

(v) T is unitary if and only if [T ]B is unitary.

4.3.4. If T : V → V is a linear operator on a finite-dimensional inner product space, show that

ker T ∗ = (im T )⊥ and im T ∗ = (ker T )⊥.

4.3.5. Let f be a sesquilinear form on a finite-dimensional complex inner product space V . Prove that there is a linear operator T : V → V such that

f(x, y) = hT x, yi for any x, y ∈ V .

Moreover, show that

(i) T is self-adjoint if and only if f is hermitian;

(ii) T is invertible if and only if f is nondegenerate. 4.3. OPERATORS ON INNER PRODUCT SPACES 189

4.3.6. Let T be a unitary linear operator on a complex inner product space V . Prove that for any subspace W of V ,

T (W ⊥) = T (W )⊥.

4.3.7. Show that every linear operator T on a complex inner product space V can be written uniquely in the form

T = T1 + iT2, where T1 and T2 are self-adjoint linear operators on V . The operators T1 and

T2 are called the real part and the imaginary part of T , respectively. Moreover, show that T is normal if and only if its real part and imaginary part commute.

4.3.8. Let P be a linear operator on a finite-dimensional complex inner product space V such that P 2 = P . Show that the following statements are equivalent:

(a) P is self-adjoint;

(b) P is normal;

(c) ker P = (im P )⊥;

(d) hP x, xi = kP xk2 for all x ∈ V .

4.4 Spectral Theorem

Theorem 4.4.1. Let T be a self-adjoint operator on V .

(i) Any eigenvalue of T is real.

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let λ be an eigenvalue for T . Then there is a nonzero vector v ∈ V such that T v = λv. Hence

λhv, vi = hT v, vi = hv, T vi = λ¯hv, vi.

Since v 6= 0, we have λ = λ¯. This implies that λ is real.

(ii) Let λ and µ be distinct eigenvalues of T . Let u and v be elements in V such that T u = λu and T v = µv. Then

λhu, vi = hλu, vi = hT u, vi = hu, T vi = hu, µvi = µhu, vi.

In the last equality, we use the fact that an eigenvalue of a self-adjoint operator is real. Since λ 6= µ, we have hu, vi = 0. Hence the eigenspaces associated with λ and µ are orthogonal.

Theorem 4.4.2. Let T be a normal operator on V .

(i) If v is an eigenvector of T corresponding to an eigenvalue λ, then v is an eigenvector of T ∗ corresponding to an eigenvalue λ¯. Moreover,

ker(T − λI) = ker(T ∗ − λ̄I).

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let v be an eigenvector of T corresponding to an eigenvalue λ. Since T is normal, it is easy to check that T − λI is normal. By Theorem 4.3.9,

0 = k(T − λI)vk = k(T − λI)∗vk = k(T ∗ − λ̄I)vk, which implies T ∗v = λ̄v. Hence v is an eigenvector of T ∗ corresponding to the eigenvalue λ̄.

(ii) Let λ and µ be distinct eigenvalues of T . Let u and v be nonzero elements in V such that T u = λu and T v = µv. By (i), T ∗v = µ̄v. Hence

λhu, vi = hλu, vi = hT u, vi = hu, T ∗vi = hu, µ̄vi = µhu, vi.

Since λ 6= µ, we have hu, vi = 0. Hence the eigenspaces associated with λ and µ are orthogonal.

Proposition 4.4.3. Let T be a linear operator on V . If W is a T -invariant subspace of V , then W ⊥ is T ∗-invariant.

Proof. Suppose T (W ) ⊆ W , i.e. T w ∈ W for any w ∈ W . If v ∈ W ⊥, then

hT ∗v, wi = hv, T wi = 0 for any w ∈ W .

Hence T ∗v ∈ W ⊥, which implies T ∗(W ⊥) ⊆ W ⊥.

Definition 4.4.4. A linear operator T on a finite-dimensional inner product space V is said to be orthogonally diagonalizable if there is an orthonormal basis for V consisting of eigenvectors of T . A matrix A ∈ Mn(F) is said to be orthogonally diagonalizable if there is an orthonormal basis for F^n consisting of eigenvectors of A.

Theorem 4.4.5 (Spectral theorem - complex version). A linear operator on a finite-dimensional complex inner product space is orthogonally diagonalizable if and only if it is normal.

Proof. If V has an orthonormal basis B consisting of eigenvectors of T , then [T ]B

is a diagonal matrix, say, [T ]B = diag(λ1, . . . , λn) and thus

[T ∗]B = ([T ]B)∗ = diag(λ̄1, . . . , λ̄n).

From this, it is easy to check that [T ]B[T ∗]B = [T ∗]B[T ]B. It follows that [T ]B is normal, and thus T is normal. Now assume that T is normal. We will prove the result by induction on the dimension of V . If dim V = 1, then the result is trivial. Now suppose n = dim V > 1. Assume the result holds for any complex inner product space of dimension less than n. Since V is a complex vector space, T has an eigenvalue

λ because the characteristic polynomial always has a root in C. Let W be the eigenspace corresponding to λ. If W = V , then T = λI and the result follows trivially. Assume that W is a proper subspace of V . By Theorem 4.4.2, W = ker(T − λI) = ker(T ∗ − λI¯ ). Hence W is invariant under both T and T ∗. By Proposition 4.4.3, W ⊥ is invariant under T ∗∗ = T . This shows that both W ⊥ and W are T -invariant. Thus T |W and T |W ⊥ are normal operators on W and W ⊥, respectively (see Theorem 4.3.9). Since V = W ⊕ W ⊥ and 0 < dim W < n, we have 0 < dim W ⊥ < n. By the induction hypothesis, there exist an orthonormal basis {u1, . . . , uk} for W consisting of eigenvectors of T |W and an ⊥ orthonormal basis {uk+1, . . . , un} for W consisting of eigenvectors of T |W ⊥ . Thus {u1, u2, . . . , un} is an orthonormal basis of V consisting of eigenvectors of T .
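For a normal matrix with distinct eigenvalues, the normalized eigenvectors produced by a numerical eigensolver are automatically orthonormal, as the theorem predicts. The Python/NumPy sketch below (an illustration, not part of the original notes) checks this for the rotation operator T(x, y) = (−y, x), which is normal but not self-adjoint.

    # Unitary diagonalization of a normal (non-self-adjoint) matrix; sketch.
    # The rotation matrix below is normal; its eigenvalues i and -i are distinct,
    # so the normalized eigenvectors form an orthonormal basis of C^2.
    import numpy as np

    A = np.array([[0.0, -1.0], [1.0, 0.0]])
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)        # A is normal

    eigvals, P = np.linalg.eig(A)                             # columns of P are eigenvectors
    assert np.allclose(P.conj().T @ P, np.eye(2))             # P is unitary
    assert np.allclose(P @ np.diag(eigvals) @ P.conj().T, A)  # A = P D P*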

The complex spectral theorem says that a linear operator on a complex inner product space can be diagonalized by an orthonormal basis precisely when it is normal. However, if an inner product space is real, a linear operator is diagonalized by an orthonormal basis precisely when it is self-adjoint. To prove this, we need an important lemma: a linear operator on a real inner product space may not have an eigenvalue, but a self-adjoint operator always has one.

Lemma 4.4.6. Let T be a self-adjoint operator on a finite-dimensional real inner product space. Then T has an eigenvalue (and an eigenvector).

Proof. Let V be a finite-dimensional real inner product space and T : V → V a self-adjoint operator. Fix an orthonormal basis B for V and let A be the matrix representation of T with respect to B. View A as a matrix in Mn(C). Since T is self-adjoint, A is self-adjoint. As a complex matrix, A must have an eigenvalue λ, which will be real by Theorem 4.4.1 (i). Thus there is an eigenvector v ∈ C^n such that Av = λv. In fact, we can choose v to be in R^n. Since A is a real matrix and λ is real, A − λIn is also a real matrix. Hence the system (A − λIn)v = 0 has a nontrivial solution in R^n because A − λIn is singular. It follows that there is a nonzero vector v ∈ V such that T v = λv. Hence λ is an eigenvalue of T corresponding to an eigenvector v.

Theorem 4.4.7 (Spectral theorem - real version). A linear operator on a finite-dimensional real inner product space is orthogonally diagonalizable if and only if it is self-adjoint.

Proof. Let V be a real inner product space. Assume first that V has an orthonor- mal basis B consisting of eigenvectors of T . Then [T ]B is a diagonal matrix, say,

[T ]B = diag(λ1, . . . , λn), where λi’s are all real. Thus

[T ∗]B = diag(λ̄1, . . . , λ̄n) = diag(λ1, . . . , λn) = [T ]B.

Hence [T ]B is self-adjoint and thus T is self-adjoint. Now assume that T is self-adjoint. We will prove the result by induction on the dimension of V . If dim V = 1, then the result is trivial. Now suppose dim V > 1. Then T has an eigenvalue λ by Lemma 4.4.6. Let W be the eigenspace corresponding to λ. We may assume that W is a proper subspace of V (otherwise T = λI and the result is trivial). Then W is invariant under T . By Proposition 4.4.3, W ⊥ is invariant under T ∗ = T . Hence both W and W ⊥ are T -invariant. Thus T |W and T |W ⊥ are self-adjoint operators on W and W ⊥, respectively. The rest of the proof is the same as the proof of Theorem 4.4.5.

If A is a diagonalizable matrix, then there is an invertible matrix P such that P −1AP is a diagonal matrix. In fact, the columns of P are formed by eigenvectors of A. If A is a complex normal matrix or a real symmetric matrix, then A is diagonalizable. In this case, the matrix P can be chosen to be unitary in the complex case and orthogonal in the real case. This is the matrix version of the spectral theorem.

Theorem 4.4.8. A complex matrix A is orthogonally diagonalizable if and only if there is a unitary matrix P such that P −1AP is a diagonal matrix. A real matrix A is orthogonally diagonalizable if and only if there is an orthogonal matrix P such that P −1AP is a diagonal matrix.

Proof. We will give a proof for the complex case. Let A be a complex matrix. First, note that if P is an invertible matrix, then D = P −1AP is equivalent to

PD = AP . Moreover, if D is a diagonal matrix and P = [u1 . . . un], where each ui is the i-th column of P , then

AP = [Au1 . . . Aun] and PD = [λ1u1 . . . λnun]. (4.6)

Assume that A is orthogonally diagonalizable. Then there is an orthonormal basis B = {u1, . . . , un} for C^n and λ1, . . . , λn ∈ C such that Aui = λiui for i = 1, . . . , n. Let D = diag(λ1, . . . , λn) and let P be the matrix whose i-th column is ui for each i. Then P is unitary by Theorem 4.3.11. From (4.6), we see that AP = PD, which implies P −1AP is a diagonal matrix. Conversely, assume that there is a unitary matrix P such that D = P −1AP is a diagonal matrix. Then AP = PD. Assume that D = diag(λ1, . . . , λn) and

P = [u1 . . . un], where each ui is the i-th column of P . By (4.6), it follows that

Aui = λiui for i = 1, . . . , n. Hence each ui is an eigenvector of A corresponding to the eigenvalue λi. Since P is unitary, {u1, . . . , un} is an orthonormal basis for C^n by Theorem 4.3.11. This completes the proof.

Theorem 4.4.9 (Spectral theorem - matrix version). If A is a normal matrix in Mn(C), then there is a unitary matrix P such that P −1AP = D is a diagonal matrix. Hence A = PDP −1 = PDP ∗.

If A is a real symmetric matrix, then there is an orthogonal matrix P such that P −1AP = D is a diagonal matrix. Hence

A = PDP −1 = PDP t.

Proof. Let A be a complex normal matrix. Then LA is a normal operator on C^n. By the Spectral theorem (Theorem 4.4.5), there is an orthonormal basis for C^n consisting of eigenvectors of LA (which are eigenvectors of A). By Theorem 4.4.8, there is a unitary matrix P such that P −1AP = D is a diagonal matrix. The proof for a real symmetric matrix is the same and will be omitted.

Example. Define
A =
[ 1   2 ]
[ 2  −2 ].
Find an orthogonal matrix P and a diagonal matrix D such that A = PDP −1.

Proof.
χA(x) = det
[ x − 1    −2   ]
[ −2     x + 2  ]
= x² + x − 6.

The eigenvalues of A are −3 and 2. Moreover,

V−3 = h(1, −2)i and V2 = h(2, 1)i.

Thus B = {(1/√5, −2/√5), (2/√5, 1/√5)} is an orthonormal basis for R² consisting of eigenvectors of A. Let
P =
[ 1/√5    2/√5 ]
[ −2/√5   1/√5 ]
and
D =
[ −3   0 ]
[ 0    2 ].

Then P is an orthogonal matrix such that A = PDP −1.

Example. Define
A =
[ 5  4  2 ]
[ 4  5  2 ]
[ 2  2  2 ].

Find an orthogonal matrix P and a diagonal matrix D such that A = PDP −1.

Proof. Solving the equation det(xI3 − A) = 0, we have x = 1, 1, 10, which are the eigenvalues of A. Hence

V1 = h(1, 0, −2), (0, 1, −2)i and V10 = h(2, 2, 1)i.

Note that V1 ⊥ V10, so we have to choose 2 orthonormal vectors from V1 by

applying the Gram-Schmidt process to V1. Let x1 = (1, 0, −2) and x2 = (0, 1, −2). Let u1 = x1/kx1k = (1/√5, 0, −2/√5). Write z2 = x2 − hx2, u1iu1 = (−4/5, 1, −2/5), and u2 = z2/kz2k = (−4/√45, 5/√45, −2/√45). Moreover, let

u3 = (2, 2, 1)/k(2, 2, 1)k = (2/3, 2/3, 1/3). Let
P =
[ 1/√5    −4/√45   2/3 ]
[ 0        5/√45   2/3 ]
[ −2/√5   −2/√45   1/3 ]
and
D =
[ 1  0  0  ]
[ 0  1  0  ]
[ 0  0  10 ].
Then P is an orthogonal matrix and A = PDP −1.
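In floating-point arithmetic, the orthogonal diagonalization of a real symmetric matrix is exactly what the routine numpy.linalg.eigh computes. The Python sketch below (an illustration, not part of the original notes) diagonalizes the matrix from the example above; the columns of P it returns may differ from those chosen in the text by order and sign, but A = PDP^t still holds.

    # Orthogonal diagonalization of the symmetric matrix from the example above;
    # illustrative sketch using numpy.linalg.eigh.
    import numpy as np

    A = np.array([[5.0, 4.0, 2.0],
                  [4.0, 5.0, 2.0],
                  [2.0, 2.0, 2.0]])

    eigvals, P = np.linalg.eigh(A)        # eigenvalues in ascending order: 1, 1, 10
    print(eigvals)                        # approx [ 1.  1. 10.]
    assert np.allclose(P.T @ P, np.eye(3))              # P is orthogonal
    assert np.allclose(P @ np.diag(eigvals) @ P.T, A)   # A = P D P^t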

Remark. 1. A linear operator or a square matrix can be diagonalizable but not orthogonally diagonalizable. For example, let
A =
[ 1  1 ]
[ 0  2 ].

It is easy to show that A has eigenvalues 1 and 2 and that V1 = span{(1, 0)} and V2 = span{(1, 1)}. Hence {(1, 0), (1, 1)} is a basis for R² consisting of eigenvectors of A. However, we cannot choose vectors in V1 and V2 that are orthogonal. Hence there is no orthonormal basis for R² consisting of eigenvectors of A.

2. A real matrix can be orthogonally diagonalizable over C, but not over R. For example, consider the following real orthogonal matrix:
A =
[ cos θ   −sin θ ]
[ sin θ    cos θ ]
where θ is a real number. Note that A, regarded as a complex matrix, is unitary and hence normal. Thus A is orthogonally diagonalizable over C. Its eigenvalues are e^{iθ} and e^{−iθ}, and their corresponding eigenspaces are h(1, −i)i and h(1, i)i, respectively. Then

[ 1   1 ]^{−1} [ cos θ   −sin θ ] [ 1   1 ]   [ e^{iθ}       0      ]
[ −i  i ]      [ sin θ    cos θ ] [ −i  i ] = [ 0         e^{−iθ}   ].

However, the only real matrices which are orthogonally diagonalizable (over R) are symmetric matrices. Hence A is not orthogonally diagonalizable over R.

Definition 4.4.10. Let T be a linear operator on an inner product space V . We say that T is positive or positive semi-definite if T is self-adjoint and

hT x, xi ≥ 0 for any x ∈ V.

Moreover, we say that T is positive definite if T is self-adjoint and

hT x, xi > 0 for any x 6= 0.

A positive (semi-definite) matrix and a positive definite matrix can be defined analogously.

Note that if V is a complex inner product space, then by Theorem 4.3.7 we can drop the assumption that T is self-adjoint.

Example. The following matrix is positive definite, as can be readily checked:
A =
[ 2   −1 ]
[ −1   2 ].

Theorem 4.4.11. Let T be a linear operator on a finite-dimensional inner prod- uct space V . Then TFAE:

(i) T is positive;

(ii) T is self-adjoint and all eigenvalues of T are nonnegative;

(iii) T = P 2 for some self-adjoint operator P ;

(iv) T = S∗S for some linear operator S.

Proof. (i) ⇒ (ii). Assume that T is positive. Clearly, T is self-adjoint. Let λ be an eigenvalue of T . Then T x = λx for some nonzero x ∈ V . Thus

λkxk2 = hλx, xi = hT x, xi ≥ 0,

which implies λ ≥ 0. (ii) ⇒ (iii). Assume T is self-adjoint and all eigenvalues of T are nonnegative.

By the Spectral theorem, there is an orthonormal basis B = {u1, . . . , un} for V

consisting of eigenvectors of T . Assume that T uj = λjuj for j = 1, . . . , n. Then λj ≥ 0 for all j. Define P uj = √λj uj for j = 1, . . . , n and extend it to a linear operator on V . Clearly,

P²uj = P (√λj uj) = λjuj = T uj for j = 1, . . . , n.

Hence P² = T on a basis B for V . It follows that P² = T on V . Note that [P ]B = diag(√λ1, . . . , √λn). Thus [P ]B is a self-adjoint matrix, which implies P is a self-adjoint operator. (iii) ⇒ (iv). If T = P² where P is a self-adjoint operator, then

P ∗P = PP = P² = T.

(iv) ⇒ (i). If T = S∗S, then T ∗ = (S∗S)∗ = S∗S = T, and

hT x, xi = hS∗S x, xi = hSx, Sxi = kSxk2 ≥ 0 for any x ∈ V . Hence T is positive.

Similarly, we can establish the following Corollary for positive definiteness:

Corollary 4.4.12. Let T be a linear operator on a finite-dimensional inner product space V . Then TFAE:

(i) T is positive definite;

(ii) T is self-adjoint and all eigenvalues of T are positive;

(iii) T = P 2 for some self-adjoint invertible operator P ;

(iv) T = S∗S for some invertible linear operator S;

(v) T is positive and invertible.

Proof. Exercise.

Remark. The operator P constructed in the proof of Theorem 4.4.11 ((ii) ⇒ (iii)) is indeed a positive operator, as can be seen there. It is called a positive square root of T . Although a positive operator can have many square roots, it has a unique positive square root.

Proposition 4.4.13. A positive operator has a unique positive square-root.

Proof. Let T be a positive operator on V . Let λ1, . . . , λk be the distinct eigenvalues of T . Since T is positive, λi ≥ 0 for i = 1, . . . , k. Since T is self-adjoint, it is diagonalizable. Hence

V = ker(T − λ1I) ⊕ · · · ⊕ ker(T − λkI).

Let P be a positive operator such that P² = T . Let α be an eigenvalue of P with a corresponding eigenvector v. Then P v = αv, which implies T v = P²v = α²v. Hence α² = λj for some j ∈ {1, . . . , k}. It follows that α = √λj for some j and that
ker(P − √λj I) ⊆ ker(T − λjI).

This shows that the only eigenvalues of P are √λ1, . . . , √λk. Since P is self-adjoint, it is diagonalizable and thus
V = ker(P − √λ1 I) ⊕ · · · ⊕ ker(P − √λk I).

It follows that

ker(P − √λj I) = ker(T − λjI) for j = 1, . . . , k.
Hence on each subspace ker(T − λjI) of V , P = √λj I. Thus the positive square root P of T is uniquely determined.
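The proof of Theorem 4.4.11 ((ii) ⇒ (iii)) gives a recipe for the positive square root, which is easy to carry out numerically. The Python/NumPy sketch below (an illustration, not part of the original notes) applies it to the positive definite matrix from the earlier 2 × 2 example.

    # Positive square root of a positive (semi-)definite matrix via the
    # spectral theorem: P = U diag(sqrt(lambda_i)) U*.  Illustrative sketch.
    import numpy as np

    def positive_sqrt(T):
        eigvals, U = np.linalg.eigh(T)            # T self-adjoint, eigenvalues >= 0
        return U @ np.diag(np.sqrt(eigvals)) @ U.conj().T

    T = np.array([[2.0, -1.0], [-1.0, 2.0]])      # positive definite (eigenvalues 1, 3)
    P = positive_sqrt(T)
    assert np.allclose(P, P.conj().T)             # P is self-adjoint
    assert np.all(np.linalg.eigvalsh(P) >= 0)     # and positive
    assert np.allclose(P @ P, T)                  # P^2 = T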

Theorem 4.4.14 (Polar Decomposition). Let T be an invertible operator on V . Then T = UP , where U is a unitary operator and P is a positive definite operator. Moreover, the operators U and P are uniquely determined.

Proof. Since T ∗T is positive, there is a (unique) positive square-root P so that P 2 = T ∗T . Since T is invertible, so is P . By Corollary 4.4.12, P is positive definite. Let U = TP −1. Then

U ∗U = (TP −1)∗(TP −1) = (P −1)∗T ∗TP −1 = P −1P 2P −1 = I.

Hence U is unitary.

Suppose T = U1P1 = U2P2, where U1, U2 are unitary and P1, P2 are positive definite. Then
T ∗T = (P1∗U1∗)(U1P1) = P1∗IP1 = P1².

Similarly, T ∗T = P2². But the positive square root of T ∗T is unique. Hence P1 = P2. It follows that U1P1 = U2P2 = U2P1. Since P1 is invertible, we have

U1 = U2.

Corollary 4.4.15. Let A be an invertible matrix in Mn(F). Then A = UP , where U is a unitary (orthogonal if F = R) matrix and P is a positive definite matrix. Moreover, the matrices U and P are uniquely determined. 200 CHAPTER 4. INNER PRODUCT SPACES
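The proof of Theorem 4.4.14 is constructive, and the corresponding computation is easy to perform numerically. The Python/NumPy sketch below (an illustration, not part of the original notes) forms P as the positive square root of A∗A and U = AP⁻¹, and checks that U is unitary.

    # Polar decomposition A = U P of an invertible matrix (Theorem 4.4.14);
    # illustrative sketch following the proof: P = (A*A)^{1/2}, U = A P^{-1}.
    import numpy as np

    def polar(A):
        eigvals, V = np.linalg.eigh(A.conj().T @ A)      # A*A is positive definite
        P = V @ np.diag(np.sqrt(eigvals)) @ V.conj().T   # its positive square root
        U = A @ np.linalg.inv(P)
        return U, P

    rng = np.random.default_rng(6)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))  # invertible (generically)
    U, P = polar(A)
    assert np.allclose(U.conj().T @ U, np.eye(3))   # U is unitary
    assert np.allclose(U @ P, A)                    # A = U P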

Exercises

4.4.1. Given
A =
[ 0    2   −1 ]
[ 2    3   −2 ]
[ −1  −2    0 ],
find an orthogonal matrix P that diagonalizes A.

4.4.2. Let T be a normal operator on a complex finite-dimensional inner product space and let σ(T ) denote the set of eigenvalues of T . Prove that

(a) T is self-adjoint if and only if σ(T ) ⊆ R;

(b) T is unitary if and only if σ(T ) ⊆ {z ∈ C : |z| = 1}.

4.4.3. Let T be a self-adjoint operator on a finite-dimensional inner product space such that tr(T 2) = 0. Prove that T = 0.

4.4.4. Show that if T is a self-adjoint, nilpotent operator on a finite-dimensional inner product space, then T = 0.

4.4.5. Let T be a normal (symmetric) operator on a complex (real) finite-dimensional inner product space V . Show that there exist orthogonal projections

E1,...,En on V and scalars λ1, . . . , λn such that

(i) EiEj = 0 for i 6= j;

(ii) I = E1 + ··· + En;

(iii) T = λ1E1 + ··· + λnEn.

Conversely, if there exist orthogonal projections E1,...,En satisfying (i)-(iii) above, show that T is normal.

4.4.6. Let a1, . . . , an, b1, . . . , bn ∈ F for some n ∈ N, where a1, . . . , an are distinct. Show that there is a polynomial p(x) ∈ F[x] such that p(ai) = bi for i = 1, . . . , n. This is called the Lagrange Interpolation Theorem.

4.4.7. Let T be a linear operator on a finite-dimensional complex inner product space. Show that T is normal if and only if there is a polynomial p ∈ C[x] such that T ∗ = p(T ).

4.4.8. Let S and T be linear operators on a finite-dimensional inner product space. Prove that

(i) if S and T are positive, then so is S + T ;

(ii) if T is positive and c ≥ 0, then so is cT ;

(iii) any orthogonal projection is positive;

(iv) if T is positive, then S∗TS is positive;

(v) if T is a linear operator, then TT ∗ and T ∗T are positive;

(vi) if T is positive and invertible, then so is T −1.

4.4.9. Prove Corollary 4.4.12.

4.4.10. Let T be a self-adjoint operator on a finite-dimensional inner product space. Prove that T is positive definite if and only if T is invertible and T −1 is positive definite.

4.4.11. Let A be an n × n real symmetric matrix. Prove there is an α ∈ R such that A + αIn is positive definite.

4.4.12. Let A be a 2 × 2 matrix
[ a  b ]
[ c  d ].
Prove that A is positive definite if and only if a > 0 and det A > 0.

4.4.13. Find a square root of the matrix
A =
[ 1  3  −3 ]
[ 0  4   5 ]
[ 0  0   9 ].

4.4.14. Let T be a positive operator on a finite-dimensional inner product space V . Prove that

|hT x, yi|2 ≤ hT x, xihT y, yi for all x, y ∈ V.

Bibliography

[1] Sheldon Axler, Linear Algebra Done Right, Second Edition, Springer, New York 1997.

[2] Thomas S. Blyth, Module Theory: An Approach to Linear Algebra, Second Edition, Oxford University Press, Oxford 1990.

[3] William C. Brown, A Second Course in Linear Algebra, John Wiley & Sons, New York, 1988.

[4] Stephen H. Friedberg, Arnold J. Insel, Lawrence E. Spence, Linear Algebra, Second Edition, Prentice Hall, New Jersey, 1989.

[5] Kenneth Hoffman and Ray Kunze, Linear Algebra, Second Edition, Prentice Hall, New Jersey, 1971.

[6] Thomas W. Hungerford, Algebra, Springer-Verlag, New York 1974.

[7] Seymour Lipschutz, Linear Algebra SI (Metric) Edition, McGraw Hill, Sin- gapore, 1987.

[8] Aigli Papantonopoulou, Algebra : Pure and Applied, First Edition, Prentice Hall, New Jersey, 2002.

[9] Steven Roman, Advanced Linear Algebra, Third Edition, Springer, New York 2008.

[10] Surjeet Singh and Qazi Zameeruddin, Modern Algebra, Vikas Publishing House PVT Ltd., New Delhi, 1988.
