<<

Linear Algebra II

Andrew Kobin

Fall 2014

Contents

0 Introduction

1 The Algebra of Vector Spaces
  1.1 Scalar Fields
  1.2 Vector Spaces and Linear Transformations
  1.3 Constructing Vector Spaces
  1.4 Linear Independence, Span and Bases
  1.5 More on Linear Transformations
  1.6 Similar and Conjugate Matrices
  1.7 Kernel and Image
  1.8 The Isomorphism Theorems

2 Duality
  2.1 Functionals
  2.2 Self Duality
  2.3 Perpendicular Subspaces

3 Multilinearity
  3.1 Multilinear Functions
  3.2 Tensor Products
  3.3 Tensoring Transformations
  3.4 Alternating Functions and Exterior Products
  3.5 The Determinant

4 Diagonalizability
  4.1 Eigenvalues and Eigenvectors
  4.2 Minimal Polynomials
  4.3 Cyclic Linear Systems
  4.4 Jordan Canonical Form

5 Inner Product Spaces
  5.1 Inner Products and Norms
  5.2 Orthonormal Bases
  5.3 The Adjoint
  5.4 Normal Operators
  5.5 Spectral Decomposition
  5.6 Unitary Transformations


0 Introduction

These notes were taken from a second-semester course taught by Dr. Frank Moore at Wake Forest University in the fall of 2014. The main reference for the course is A (Terse) Introduction to Linear Algebra by Katznelson. The goal of the course is to expand upon the concepts introduced in a first-semester linear algebra course by generalizing real vector spaces to vector spaces over an arbitrary field, and describing linear operators, canonical forms, inner products and linear groups on such spaces. The main topics and theorems that are covered here are:

• A rigorous treatment of the algebraic theory of vector spaces and their transformations

• Duality, bases, the adjoint transformation

• Bilinear maps and the tensor product

• Alternating functions, the exterior product and the determinant

• Eigenvalues, eigenvectors, the Cayley-Hamilton Theorem

• Jordan canonical form and the theory of diagonalizability

• Inner product spaces

• Orthonormal bases and the Gram-Schmidt process

• Spectral theory


1 The Algebra of Vector Spaces

1.1 Scalar Fields

The first question we must answer is: what should the scalars be in a general treatment of linear algebra? The answer is fields:

Definition. A field is a set F with two binary operations

"+" : F × F −→ F        "·" : F × F −→ F
(a, b) 7−→ a + b        (a, b) 7−→ ab

satisfying the following axioms:

(1) There is an element 0 ∈ F such that a + 0 = 0 + a = a for all a ∈ F.

(2) For all a, b ∈ F, a + b = b + a.

(3) For each a ∈ F, there is an element −a ∈ F such that a + (−a) = 0.

(4) For all a, b, c ∈ F, (a + b) + c = a + (b + c).

(5) For all a, b ∈ F, ab = ba.

(6) There is an element 1 ∈ F such that a · 1 = 1 · a = a for all a ∈ F.

(7) For all nonzero a ∈ F (the set of such elements is denoted F×) there exists a^{-1} ∈ F such that aa^{-1} = 1.

(8) For any a, b, c ∈ F, a(bc) = (ab)c.

(9) For any a, b, c ∈ F, a(b + c) = ab + ac.

Note that axioms (1) – (4) say that (F, +) is an abelian group, (5) – (8) say that (F×, ·) is an abelian group, and axiom (9) serves to describe the interaction between the two operations. We sometimes say that + and · "play nicely".

Examples.

1 The real numbers R, the rational numbers Q and the complex numbers C are all common examples of fields.

2 Z2 = {0, 1} is the smallest possible field, sometimes called the finite field of characteristic 2. This example is important in computer science, particularly coding theory.

3 For any prime number p, Zp = {0̄, 1̄, 2̄, . . . , p − 1} is called the finite field of characteristic p. In contrast, Q, R and C are called infinite fields.

4 The integers Z are not a field. Z has + and · but no multiplicative inverses (i.e. it fails axiom (7)).

5 Let n be a composite number. Then Zn is not a field. For example, consider Z4 = {0̄, 1̄, 2̄, 3̄}. In this set, 2̄ does not have a multiplicative inverse.
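As a quick sanity check of these examples (my own illustration, not part of the notes), the following Python sketch lists which residues mod n have multiplicative inverses; Zn is a field exactly when every nonzero residue appears, i.e. when n is prime.

```python
from math import gcd

def units_mod(n):
    """Return the residues in {1, ..., n-1} that have a multiplicative inverse mod n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

for n in (2, 3, 4, 5, 6, 7):
    invertible = units_mod(n)
    is_field = len(invertible) == n - 1   # every nonzero class must be invertible
    print(n, invertible, "field" if is_field else "not a field")
# For n = 4 the class of 2 is missing from the list, matching the example above.
```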


1.2 Vector Spaces and Linear Transformations

Definition. A vector space over a field F is a set V with two operations

"+" : V × V −→ V        "·" : F × V −→ V
(u, v) 7−→ u + v        (a, v) 7−→ av

called vector addition and scaling, respectively, such that

(1) (V, +) satisfies axioms (1) – (4), i.e. V is an additive abelian group.

(2) For all a, b ∈ F and v ∈ V, a(bv) = (ab)v.

(3) For all v ∈ V, 1v = v.

(4) For all a, b ∈ F and v ∈ V, (a + b)v = av + bv.

(5) For all a ∈ F and u, v ∈ V, a(u + v) = au + av.

Examples.

1 F^n = {(a1, . . . , an) | ai ∈ F} is the canonical n-dimensional vector space over F. The operations are the usual componentwise addition and scaling. In particular, a field is a (one-dimensional) vector space over itself.

2 {0} is the trivial vector space over any field F.

3 Mat(n, m, F) denotes the set of all n × m matrices with coefficients in F. It is an F-vector space under componentwise addition and scaling, and is isomorphic (a notion which will be described later) to F^{nm}.

4 F[x] = {a0 + a1x + a2x^2 + . . . + anx^n | ai ∈ F} is called the polynomial algebra over F. It is a vector space via coefficient addition and polynomial multiplication (extended FOIL). This is an example of an infinite dimensional vector space.

5 Let X be a set. Then F^X = {f : X → F | f is a function} is a vector space over F. For f, g ∈ F^X and for all x ∈ X, addition and scaling are defined by (f + g)(x) = f(x) + g(x) and (cf)(x) = cf(x).

The zero function 0 : X → F sends every element x 7→ 0. It is easy to verify that FX satisfies the properties of a vector space.

6 Let V = {f : R → C | f satisfies the ODE 3f′′′(x) − sin(x) f′′(x) + 2f(x) = 0}. Then V is a vector space under pointwise addition and scaling. For example, if f, g ∈ V and x ∈ R then

3(f + g)′′′(x) − sin(x)(f + g)′′(x) + 2(f + g)(x) = (3f′′′(x) − sin(x) f′′(x) + 2f(x)) + (3g′′′(x) − sin(x) g′′(x) + 2g(x)) = 0 + 0 = 0.

Thus f + g ∈ V. The proof for scaling is similar.


Next we seek a way of comparing two vector spaces.

Definition. A linear transformation is a function T : V → W, where V and W are F-vector spaces, satisfying

(1) T(u + v) = T(u) + T(v)
(2) T(cv) = cT(v)

where u, v ∈ V and c ∈ F.

Recall the definitions of one-to-one, onto and bijective.

Definition. A function f : X → Y is

(a) One-to-one if f(x1) = f(x2) implies x1 = x2.

(b) Onto if for all y ∈ Y there is an x ∈ X such that f(x) = y.

(c) Bijective if it is both one-to-one and onto.

Remark. Let f : X → Y be a function.

(1) f is 1-1 ⇐⇒ there is a function g : im(f) → X such that gf = idX .

(2) f is onto ⇐⇒ there is a function h : Y → X such that fh = idY.

Definition. A bijective linear transformation is called an isomorphism.

Examples.

1 Recall the vector space Mat(n, m, F). Define a map T : Mat(n, m, F) −→ F^{nm} which sends a matrix A = (aij) to the column vector (a11, a21, . . . , an1, a12, . . . , anm), i.e. T stacks the columns of A on top of one another. There is a clear candidate for an inverse function to T, namely the map S : F^{nm} −→ Mat(n, m, F) which reassembles such a column vector into an n × m matrix column by column. Checking that T is linear is trivial, so T is an isomorphism. This confirms our earlier comment that Mat(n, m, F) ≅ F^{nm}.


2 Let X be a finite set, say X = {x1, . . . , xn}. Define

T : F^n −→ F^X
(a1, . . . , an) 7−→ the function f : X → F with f(xi) = ai.

One can verify that this gives an isomorphism.

3 The function

Int : C[0, 1] −→ R
f 7−→ ∫_0^1 f(x) dx

is a linear transformation since we know from calculus that integrals are linear. However Int is definitely not an isomorphism since there are many continuous functions on [0, 1] that have integral 0.

4 Let A ∈ Mat(n, m, F) be given by A = (aij), with rows indexed by i = 1, . . . , n and columns indexed by j = 1, . . . , m.

Then we can define a linear transformation

TA : F^m −→ F^n
v 7−→ Av

where Av denotes the usual matrix-vector multiplication.
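Returning to Example 1 above, the identification Mat(n, m, F) ≅ F^{nm} is easy to see in code. Here is a small numpy sketch (my own, not from the notes) using column-major ("F") ordering to realize the column-stacking map T and its inverse S.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # a 2 x 3 matrix, so n = 2, m = 3

# T : Mat(n, m, F) -> F^{nm}, stacking the columns (column-major order)
v = A.flatten(order="F")           # [1, 4, 2, 5, 3, 6]

# S : F^{nm} -> Mat(n, m, F), rebuilding the matrix column by column
B = v.reshape((2, 3), order="F")

print(v)
print(np.array_equal(A, B))        # True: S(T(A)) = A
```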

In many branches of algebra there is a notion of subobjects of an algebraic object which have the same properties as the larger object. In linear algebra this takes the form of subspaces.

Definition. Let V be a vector space. A nonempty W ⊆ V is a subspace if

(1) For all w, z ∈ W , w + z ∈ W .

(2) For all c ∈ F and w ∈ W, cw ∈ W.

Note that (2) implies that 0 ∈ W for any subspace W. The key is that W is nonempty, i.e. you have to start with some element in the subspace to begin with. In practice it is usually easiest to show this condition for W by showing that 0 ∈ W anyway. Given a vector space V, there are always two trivial subspaces: 0 and all of V.


Example 1.2.1. Some geometric examples of subspaces

In R^2, examples of subspaces are:

• The trivial subspace 0 (as always)

• All of R^2

• Lines through the origin.

In fact, every subspace of R^2 is one of these. Moving up a dimension, all subspaces of R^3 are one of:

• 0 and R^3

• Lines through the origin

• Planes through the origin.

Example 1.2.2. Let V = F^n and take a system of homogeneous equations

(∗)   a11x1 + . . . + a1nxn = 0
      ⋮
      am1x1 + . . . + amnxn = 0.

Then the set

W = {(x1, . . . , xn) ∈ V : (∗) holds}

is a subspace of V. This is equivalent to the set of vectors x̄ such that Ax̄ = 0̄, where A is the matrix with coefficients (aij) taken from (∗). There is also an associated linear transformation TA : F^n → F^m, x̄ 7→ Ax̄. Then W is a special kind of subspace, called the kernel of TA, which we describe below.

Definition. If T is a linear transformation V → W, the kernel of T is

ker T = {v ∈ V | T (v) = 0}.

Then in the above example, W is the kernel of TA. Another important subspace arising from linear transformations is

Definition. If T is a linear transformation V → W, the image of T is the subset of W given by

im T = {w ∈ W | there exists v ∈ V such that T(v) = w}.

Proposition 1.2.3. For any linear transformation T : V → W, ker T is a subspace of V and im T is a subspace of W.

Proof omitted.
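For a concrete feel for kernels and images, here is a short SymPy sketch (my own illustration; the matrix A is an arbitrary choice) computing a basis of ker TA and im TA for the map TA : x̄ 7→ Ax̄ of Example 1.2.2.

```python
from sympy import Matrix

# T_A : F^3 -> F^2 for a concrete (assumed) choice of A
A = Matrix([[1, 2, 3],
            [2, 4, 6]])

print(A.nullspace())     # basis of ker T_A = {x : A x = 0}
print(A.columnspace())   # basis of im T_A (the column space)
# Here dim ker T_A = 2 and dim im T_A = 1.
```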


Another important subspace arises in the following way. Example 1.2.4. Let V be a vector space and let S ⊆ V be any subset. Then we can form a subspace from S by defining

Span(S) = {a1s1 + . . . + ansn : ai ∈ F, si ∈ S, n ≥ 0}.

This is equivalently the smallest subspace containing S.

Remark. Every subspace may be realized as the span of some set and the kernel of some linear transformation.

Example 1.2.5. Let V = Matn(F), the set of n × n matrices with coefficients in F. The following are subspaces of V:

• Diagn(F) = {(aij) | aij ∈ F, aij = 0 if i ≠ j}.

• Symn(F) = {(aij) | aij ∈ F, aij = aji for all i, j}.

• Upper and lower triangular matrices over F.

Example 1.2.6. For V = Matn(F), the transpose map

(·)^T : Matn(F) −→ Matn(F)
(aij) 7−→ (aji)

is a linear transformation.

Example 1.2.7. If {Wi}i∈I is a collection of subspaces of V, the following are subspaces of V:

• the intersection ∩_{i∈I} Wi

• the sum Σ_{i∈I} Wi, i.e. the smallest subspace containing all the Wi. Equivalently, this is the set of all finite sums of elements in the Wi.

1.3 Constructing Vector Spaces

A prevalent technique in mathematics is to define an equivalence relation on a set X and identify objects as being “the same” under this relation. Recall that an equivalence relation ∼ on X is

(i) Reflexive: x ∼ x for all x ∈ X.

(ii) Symmetric: x ∼ y =⇒ y ∼ x for all x, y ∈ X.

(iii) Transitive: x ∼ y, y ∼ z =⇒ x ∼ z for all x, y, z ∈ X.


In linear algebra, we seek to define an equivalence relation on a vector space via some subspace.

Definition. Let V be a vector space and W ⊆ V a subspace. The coset of W through a vector v ∈ V is the set v + W = {v + w | w ∈ W }.

Example 1.3.1. If V = R^2 and W is a line through the origin, the coset v + W is a translation of W.

(Figure: the line W through the origin and its parallel translate v + W passing through the point v.)

We define an equivalence relation on V by u ∼ v ⇐⇒ u + W = v + W , i.e. coset equivalence. Here u and v are sometimes called coset representatives, so ∼ partitions V into distinct classes of equivalent coset representatives. A useful property to note is that u + W = v + W ⇐⇒ v − u ∈ W . We form a new vector space in the following way.

Definition. The quotient space V/W (pronounced V mod W ) is the set

V/W = {v + W | v ∈ V }.

It is a vector space under coset addition and scaling:

(v1 + W ) + (v2 + W ) = (v1 + v2) + W c(v + W ) = cv + W.

Proposition 1.3.2. Coset addition and scaling are well-defined operations on the cosets of a subspace W ⊆ V .

Proof omitted. This is a common proof in a first-semester course on modern algebra. For details, see Artin or Hungerford. The moral of Proposition 1.3.2 is that when defining functions or operations on a quotient space, we have to address well-definedness in some way. One theorem that helps with this is


Theorem 1.3.3. Given a linear transformation T : V → V′, there is a linear transformation T̄ : V/W → V′ satisfying T̄(v + W) = T(v) for all v ∈ V if and only if W ⊆ ker T.

Note that the zero element of the vector space V/W is just 0 + W = W. Taking the quotient is a natural way of creating a new space “centered” about W.

Definition. Let {Vi}i∈I be a collection of vector spaces. Then we can form two new vector spaces called the direct sum and direct product:

⊕_{i∈I} Vi = {(vi)i∈I | vi ∈ Vi and only finitely many are nonzero}
∏_{i∈I} Vi = {(vi)i∈I | vi ∈ Vi}.

These are both vector spaces under componentwise addition and scaling.

Remark. In general, ⊕_{i∈I} Vi is a subspace of ∏_{i∈I} Vi, and if I is a finite set, the direct sum and direct product are equal. In practice, you can take any two vector spaces and ‘glom’ them together into a direct sum (note: glom is not an industry term).

Examples.

1 Let V1 = R^2 and V2 = C[0, 1]. Then ⊕_{i=1,2} Vi = R^2 ⊕ C[0, 1].

2 Let Vi = R for all i ∈ N, a countably infinite set. Then the vector (1, 1, 1, . . .) is an element of ∏_{i∈N} Vi but not ⊕_{i∈N} Vi. Thus the direct sum is a proper subspace of the direct product in this example.

A natural question is: given a vector space, when can it be built out of smaller vector spaces? Given subspaces W1, . . . , Wk of a vector space V, we define a function

Φ: W1 ⊕ ... ⊕ Wk −→ W1 + ... + Wk

(w1, . . . , wk) 7−→ w1 + ... + wk. It’s easy to see that Φ is a surjective linear transformation.

Definition. The subspaces W1,...,Wk are said to be independent if Φ is one-to-one.

In other words, if the Wi are independent, then for every v ∈ W1 + . . . + Wk there are unique wi ∈ Wi such that v = w1 + . . . + wk.

Definition. If V is a vector space with subspaces W1, . . . , Wk that are independent, and V = W1 + . . . + Wk, then Φ induces an isomorphism V ≅ ⊕_{i=1}^k Wi and we say V is the internal direct sum of the Wi.


Example.

3 Let V = {f : R → R} and define the subspaces of even and odd functions, respectively:

Ve = {f : R → R | f(−x) = f(x) for all x}
Vo = {f : R → R | f(−x) = −f(x) for all x}.

These subspaces are independent, and it turns out that V = Ve + Vo, so V ≅ Ve ⊕ Vo. Also note that Ve ∩ Vo = {0}. In general, we have

Proposition 1.3.4. Two subspaces W1 and W2 of V are independent ⇐⇒ W1 ∩ W2 = {0}.

Proof. By definition, W1 and W2 are independent if and only if Φ is one-to-one. Consider

ker Φ = {(w1, w2) ∈ W1 ⊕ W2 | w1 + w2 = 0}

= {(w1, −w1) ∈ W1 ⊕ W2 | w1 ∈ W1 ∩ W2}.

Hence ker Φ = 0 ⇐⇒ W1 ∩ W2 = {0}.
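To make the decomposition V = Ve ⊕ Vo from the example above concrete, here is a small Python check (my own, not part of the notes): every f splits as f = fe + fo with fe(x) = (f(x) + f(−x))/2 even and fo(x) = (f(x) − f(−x))/2 odd.

```python
import numpy as np

def even_part(f):
    """Return the even part f_e(x) = (f(x) + f(-x)) / 2."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """Return the odd part f_o(x) = (f(x) - f(-x)) / 2."""
    return lambda x: (f(x) - f(-x)) / 2

f = lambda x: np.exp(x)              # an arbitrary function R -> R
fe, fo = even_part(f), odd_part(f)   # cosh and sinh in this case

x = np.linspace(-2, 2, 5)
print(np.allclose(fe(x) + fo(x), f(x)))                          # True: f = f_e + f_o
print(np.allclose(fe(-x), fe(x)), np.allclose(fo(-x), -fo(x)))   # True True
```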

1.4 Linear Independence, Span and Bases

Definition. A set A ⊆ V is linearly independent if for every choice of a finite set of vectors a1, . . . , an ∈ A,

c1a1 + . . . + cnan = 0 =⇒ c1 = . . . = cn = 0

where ci ∈ F. Otherwise, the set is linearly dependent.

If a counterexample exists, i.e. a set of vectors a1, . . . , an ∈ A and weights ci ∈ F, not all zero, such that c1a1 + . . . + cnan = 0, then this sum is called a dependence relation for the ai.

Example 1.4.1. In V = R^3, the (column) vectors v1 = (2, 5, 2), v2 = (1, 3, 1) and v3 = (0, 1, 0) are linearly dependent since v1 − 2v2 + v3 = 0̄.
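A quick numerical confirmation of this dependence relation (my own check, not part of the notes): the stated combination really is zero, and the matrix with columns v1, v2, v3 has rank 2 < 3.

```python
import numpy as np

v1 = np.array([2, 5, 2])
v2 = np.array([1, 3, 1])
v3 = np.array([0, 1, 0])

print(v1 - 2 * v2 + v3)                                       # [0 0 0]
print(np.linalg.matrix_rank(np.column_stack([v1, v2, v3])))   # 2, so the set is dependent
```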

Remark. A set A is linearly independent ⇐⇒ the subspaces {Span(a)}a∈A are indepen- dent.

Example 1.4.2. The trivial vector space {0} is linearly independent.

Example 1.4.3. Subsets of linearly independent sets are linearly independent.


Definition. A subset A ⊆ V is called a spanning set for V , or spans V , if Span(A) = V , i.e. if for every v ∈ V there are a1, . . . , an ∈ A and c1, . . . , cn ∈ F such that

v = c1a1 + . . . + cnan.

Example 1.4.4. Spanning sets exist: for any vector space V, Span(V) = V.

Example 1.4.5. Supersets of spanning sets are spanning sets.

Given these properties of linearly independent and spanning sets, we are interested in finding maximal linearly independent sets and minimal spanning sets. The relation between the two is described in the next lemma.

Lemma 1.4.6. For any vector space V ,

(1) Any maximal linearly independent set is a spanning set.

(2) Any minimal spanning set is linearly independent.

Proof. (1) We prove that if a linearly independent set doesn't span, it cannot be maximal. Let A be such a set, i.e. Span(A) ⊊ V. Choose v ∈ V ∖ Span(A) and consider the set A ∪ {v}. Suppose cv + c1a1 + . . . + cnan = 0 for some c, c1, . . . , cn ∈ F and a1, . . . , an ∈ A. If c ≠ 0 then we can solve for v:

v = −c^{-1}(c1a1 + . . . + cnan)

but the term on the right lies in Span(A). This contradicts our choice of v, so c = 0, but then the original equation becomes

c1a1 + . . . + cnan = 0.

Since A is linearly independent, the ci must all be 0. Hence A ∪ {v} is linearly independent, and therefore A is not maximal among such sets. (2) Similar to the first part, we will show that a spanning set A that is linearly dependent properly contains a spanning set. Since A is assumed to be linearly dependent, there exist elements ai ∈ A and ci ∈ F, not all zero, such that

c1a1 + . . . + cnan = 0.

We may reindex so that c1 6= 0. Consider the set A r {a1}. Let v ∈ V . Then because A spans V , there exist an+1, . . . , am+n ∈ A and cn+1, . . . , cn+m ∈ F such that

v = Σ_{j=1}^m c_{n+j} a_{n+j}.


If a_{n+j} ≠ a1 for any j then we have shown A ∖ {a1} spans V. Otherwise, rearrange so that a_{n+1} = a1. Then

v = c_{n+1}a_{n+1} + Σ_{j=2}^m c_{n+j}a_{n+j}
  = c_{n+1}(−c1^{-1} Σ_{i=2}^n ci ai) + Σ_{j=2}^m c_{n+j}a_{n+j}

(by solving for a1 in the dependence relation). This shows that v ∈ Span(A r {a1}) in either case. Hence A is not minimal among spanning sets.

Definition. A linearly independent spanning set is called a basis for V.

Alternatively, this definition can be expressed in the following way:

B is a basis for V ⇐⇒ given any v ∈ V, there is a unique choice of ai ∈ B and ci ∈ F such that v = c1a1 + . . . + cnan.

The existence portion corresponds to spanning, and the uniqueness follows from linear independence.

Let V be a finite dimensional vector space and B be a basis for V, with |B| = n. Then there is a linear transformation, called the coordinate mapping:

CB : V −→ F^n
v 7−→ (c1, . . . , cn)

where v = c1b1 + . . . + cnbn for the distinct basis elements bi ∈ B.

Proposition 1.4.7. For a finite dimensional vector space V with basis B,

(1) CB is a linear transformation.

(2) CB is bijective, and hence an isomorphism. Proof. (1) is easy. (2) follows from spanning (onto) and linear independence (one-to-one).

What this really says is that every isomorphism V → Fn corresponds to a choice of basis in V .
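Concretely (a sketch of my own, with an assumed basis of R^2), computing CB(v) amounts to solving a linear system: if the basis vectors form the columns of a matrix P, then CB(v) = P^{-1}v.

```python
import numpy as np

# Basis B of R^2 (an assumed example), written as the columns of P
P = np.column_stack([[1, 1], [1, -1]])

v = np.array([3, 1])
coords = np.linalg.solve(P, v)   # C_B(v): coordinates of v with respect to B
print(coords)                    # [2. 1.]  since v = 2*(1,1) + 1*(1,-1)
print(P @ coords)                # recovers v, so C_B is invertible
```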


Definition. The cardinality of a basis B of V is called the dimension of V, denoted dim V.

In order to use this definition in good faith, we must verify that every basis of a fixed vector space has the same cardinality. In ring theory this is known as the invariant basis number property (IBN). We prove it in the finite dimensional case below.

Theorem 1.4.8 (Invariance of Basis). If B and C are bases of a finite dimensional vector space V, then B and C have the same cardinality.

Proof. In the finite dimensional case, we can use the coordinate mapping to get a short proof of IBN. Suppose |B| = n and |C| = m. The map

T = CC ◦ CB^{-1} : F^n −→ V −→ F^m

is a linear transformation. Since CB^{-1} and CC are invertible, so is T. Thus the m × n matrix of T is invertible, and hence must be square. Therefore m = n.

Another route to proving IBN for finite dimensional vector spaces is via the next lemma. Note that this does not require that we know anything about invertibility of linear transformations, something we haven't developed yet.

Lemma 1.4.9 (Steinitz's Exchange Lemma). Suppose V = Span(v1, . . . , vn) and {u1, . . . , um} is a linearly independent set in V. Then we may reorder the vj such that for all k = 1, . . . , m, {u1, . . . , uk, vk+1, . . . , vn} spans V. In particular, m ≤ n.

Proof. Write u1 = Σ_{j=1}^n aj vj for aj ∈ F. Reorder the vj so that a1 ≠ 0. Then

v1 = (1/a1)(u1 − Σ_{j=2}^n aj vj)

which shows that Span(u1, v2, . . . , vn) ⊇ {v1, . . . , vn}. Hence Span(u1, v2, . . . , vn) = V; this proves the base case for k = 1.

Now if m = 1, we're done. Otherwise we induct on m. Assume the property holds for k = 1, . . . , m′ < m. We know that {u1, . . . , uk, vk+1, . . . , vn} spans V. We will reorder the vj again so that V = Span(u1, . . . , uk+1, vk+2, . . . , vn). Since the set of ui is linearly independent, we may write

u_{k+1} = Σ_{j=1}^k aj uj + Σ_{j=k+1}^n bj vj

for aj, bj ∈ F. We know that some bj ≠ 0, since otherwise u_{k+1} would be a linear combination of the remaining ui, contradicting our choice of the ui. Reorder v_{k+1}, . . . , vn so that b_{k+1} ≠ 0. Now

v_{k+1} = (1/b_{k+1})(u_{k+1} − Σ_{j=1}^k aj uj − Σ_{j=k+2}^n bj vj).

This shows that vk+1 ∈ Span(u1, . . . , uk+1, vk+2, . . . , vn) and so V = Span(u1, . . . , uk+1, vk+2, . . . , vn) as desired.


Since m ≤ n and we can switch the order to get n ≤ m, we see that every (finite) basis has the same cardinality. The proof is only slightly more complicated in the infinite case: instead of showing m ≤ n we can construct an inclusion B ↪ C where B and C are bases for the same vector space, and in this way construct a bijection on sets. However, this requires the following fact from set theory:

Proposition 1.4.10. If X and Y are sets (possibly countably or uncountably infinite) then |X| ≤ |Y| and |Y| ≤ |X| imply |X| = |Y|.

In any case, now we can talk about the dimension of a vector space without fear of the term not being well-defined.

Examples.

1 Let V = F^n, in which we view elements of V as column vectors. The set {e1, . . . , en}, where ei is the column vector with 1 in the ith row and 0 in every other row, is a basis for V. This is sometimes called the standard basis of V.

2 Let V = F^2 and take any two nonzero vectors (a, b), (c, d) ∈ V (written as column vectors). Then

B = {(a, b), (c, d)}

is a basis for V if and only if (c, d) is not a scalar multiple of (a, b). This really shows that there are tons of bases even in a ‘small’ vector space like F^2.

3 For V = FN[x] = {a0 + a1x + . . . + aN x^N | ai ∈ F}, a basis for V is {x^i}_{i=0}^N.

4 In the polynomial algebra F[x] = {a0 + a1x + . . . + anx^n | ai ∈ F, n ∈ N}, things are similar: a basis is {x^i}_{i=0}^∞.

A useful result for producing bases for a vector space is

Proposition 1.4.11. If V is a vector space of dimension n < ∞ then

(1) Every linearly independent set of size n spans V.

(2) Every spanning set of size n is linearly independent.


Proof. (1) Let B be a basis of V and let C be a linearly independent set of the same size as B. By Lemma 1.4.6, we may complete C to a basis of V. If we added anything to C in doing so, we would have |C| > |B|, contradicting the Exchange Lemma. Hence C must already be a basis for V. (2) is proven similarly.

An even more important result is

Theorem 1.4.12. Let V and W be F-vector spaces. Then V ≅ W ⇐⇒ dim V = dim W.

Proof. ( ⇐= ) Let B be a basis of V and let C be a basis of W such that |B| = |C| = n. Recall that the coordinate mappings

CB : V −→ F^n and CC : W −→ F^n

are isomorphisms, and hence CC^{-1} is also an isomorphism. Then CC^{-1} ◦ CB : V → W is a linear bijection because it is the composition of linear bijections. Therefore V ≅ W.

( =⇒ ) Suppose ϕ : V → W is an isomorphism. Choose a basis B of V. Since ϕ is one-to-one, ϕ(B) is linearly independent. Likewise since ϕ is onto, ϕ(B) spans W. But ϕ didn't change the size of B. Therefore ϕ(B) is a basis for W, and |ϕ(B)| = |B|.

There is an important connection between linear independence and injectivity, and likewise between spanning and surjectivity. Let B = {v1, . . . , vn} ⊆ V. Define

Φ : F^n −→ V
(c1, . . . , cn) 7−→ c1v1 + . . . + cnvn.

Then B is linearly independent exactly when Φ is one-to-one, and B is a spanning set of V exactly when Φ is onto.

The following is an example of what is known as a universal mapping property, a common theme in many areas of algebra.

Proposition 1.4.13 (Universal Property of Vector Spaces). Let V and W be F-vector spaces and let B = {v1, . . . , vn} be a basis of V. For any choice of function f : B → W there is a unique linear transformation f̃ : V → W satisfying f̃|B = f.

Remark. The map f : B → W can be any assignment at all; it does not have to be a linear transformation. Another way to think of the universal mapping property is that such an f induces a linear map f̃ making the following diagram commute:

(Diagram: the inclusion B ⊆ V, the map f : B → W along the top, and the induced map f̃ : V → W making the triangle commute.)


Proof. Since f̃|B has to be f, f̃ must satisfy f̃(vi) = f(vi) for all vi ∈ B. If such an f̃ exists, linearity would also imply that

f̃(Σ_{i=1}^n aivi) = Σ_{i=1}^n ai f̃(vi) = Σ_{i=1}^n ai f(vi).

But B is a basis, so for every vector in V there is a unique way to write it as a linear combination of the vi. Therefore we define

f̃ : V −→ W
Σ_{i=1}^n aivi 7−→ Σ_{i=1}^n ai f(vi).

Since the ai are uniquely determined for any choice of vector in V , this map is well-defined. Note that if we take two vectors u, v ∈ V and write them

u = Σ_{i=1}^n aivi and v = Σ_{i=1}^n bivi.

Then we have

f̃(u + v) = f̃(Σ_{i=1}^n aivi + Σ_{i=1}^n bivi)
         = f̃(Σ_{i=1}^n (ai + bi)vi)
         = Σ_{i=1}^n (ai + bi)f(vi)
         = Σ_{i=1}^n aif(vi) + Σ_{i=1}^n bif(vi)
         = f̃(u) + f̃(v).

Scaling is proven similarly, and after this is established we have proven f̃ is linear. Moreover, the initial comments in this proof show that this is the unique way to extend f to a linear map on V.

Corollary 1.4.14. If two linear transformations f, g : V → W agree on some basis B of V, then f = g.

Example 1.4.15. Let V = R^2 and W = Span((1, 0)), the x-axis, where (1, 0) is viewed as a column vector. A common question is: can we find a subspace W′ of V so that V ≅ W ⊕ W′? Recall that

V ≅ W ⊕ W′ ⇐⇒ W + W′ = R^2 and W ∩ W′ = {0}.

There are lots of choices for W′, namely any line through the origin other than the x-axis.


The choice of W′ is not always this obvious. However we can state this problem in a different way, which will generalize nicely. Given the projection map π : V −→ V/W, we want to find a right inverse of π, i.e. a map f : V/W → V such that f is linear and πf = id_{V/W}. In reality, this is always possible when π is surjective, as it is in this case. The reason this is relevant to the above problem is that if we take W′ = im f, then W + W′ = V and W ∩ W′ = {0}.

To see the independence part, take v ∈ W ∩ W′. Then v = f(v′ + W) for some v′ ∈ V. Projecting, we have π(v) = π(f(v′ + W)). On one hand,

π(v) = v + W = W

since v ∈ W. On the other, π(f(v′ + W)) = v′ + W, which together with the first part shows that v′ ∈ W. Hence v = 0. To show that W + W′ = V, take v ∈ V and consider

π(v − fπ(v)) = π(v) − π(fπ(v)) = π(v) − id(π(v)) = π(v) − π(v) = 0.

Then v − fπ(v) ∈ ker π = W. In particular this means v = (v − fπ(v)) + fπ(v) ∈ W + W′. Hence if such an f is shown to exist, im f satisfies the desired properties of W′, so that V is the direct sum of W and im f.

Pick a basis {v1 + W, . . . , vn + W} for V/W. Define f : V/W → V by sending vi + W 7→ vi and extending by linearity. Then by the universal mapping property (Proposition 1.4.13), it suffices to show that πf is the identity on any basis of V/W. But the definition of f gives us

πf(vi + W) = π(vi) = vi + W

and so πf = id_{V/W}.

1.5 More on Linear Transformations

Definition. For a pair of F-vector spaces V and W , the set

HomF(V,W ) = {f : V → W | f is a linear transformation} is a vector space under pointwise addition and scaling.

HomF(V,W ) is a subset of the vector space of all functions V → W , so it will suffice in the theorem below to prove that HomF(V,W ) is a subspace.

Theorem 1.5.1. HomF(V,W ) is an F-vector space.


Proof. Given f, g ∈ Hom(V, W) and r ∈ F, we must show that f + g and rf are linear. Note that for all v, w ∈ V and a, b ∈ F,

(f + g)(av + bw) = f(av + bw) + g(av + bw)
                 = af(v) + bf(w) + ag(v) + bg(w)
                 = a(f(v) + g(v)) + b(f(w) + g(w))
                 = a(f + g)(v) + b(f + g)(w).

Similarly,

(rf)(av + bw) = r[f(av + bw)]
              = r[af(v) + bf(w)]
              = raf(v) + rbf(w)
              = arf(v) + brf(w)        by commutativity of F
              = a(rf)(v) + b(rf)(w).

Note that we need F to be commutative for the above proof to work. If R is just a ring, or at least a domain, we would only be able to turn HomR(A, B) into a left R-, and even then only if A and B are R-bimodules.

Examples. 1 The set of linear maps V → V is called the endomorphism ring of V , denoted

HomF(V,V ) = EndF(V ).

2 If we view F as a vector space over itself, we will write

HomF(V, F) = V∗. V∗ is called the dual of V, and elements of V∗ are called functionals.

Proposition 1.5.2. If V and W are finite dimensional F-vector spaces, then Hom(V,W ) is also finite dimensional and in particular

dim Hom(V,W ) = (dim V )(dim W ).

Proof. Let {v1, . . . , vn} be a basis of V and {w1, . . . , wm} be a basis of W . Define

ϕij : V −→ W

vi 7−→ wj

vk 7−→ 0 if k 6= i

where i = 1, . . . , n and j = 1, . . . , m. The result follows from showing that {ϕij} is a basis for Hom(V,W ).


Corollary 1.5.3. If V is finite dimensional,

• dim End(V) = (dim V)^2

• dim V∗ = dim V.

In particular, V∗ ≅ V.

Remark. There is a composition map

Hom(W, U) × Hom(V,W ) −→ Hom(V,U) (S,T ) 7−→ S ◦ T = ST.

This is an example of a bilinear pairing, since it satisfies

(S + S′)T = ST + S′T
S(T + T′) = ST + ST′
(rS)T = r(ST) = S(rT).

Taking W = U = V , we get a multiplication on End(V ):

End(V ) × End(V ) −→ End(V ) (S,T ) 7−→ ST

which makes End(V ) into an algebra over F, i.e. a ring that is also a vector space. Definition. The general linear group associated to V is

GL(V ) = {f : V → V | f is an isomorphism}.

Note that GL(V) is a subset of End(V), but not a subspace (in particular, the zero map is not invertible). In fact, GL(V) is precisely the group of all invertible elements of the ring End(V), i.e. GL(V) = End(V)×.

Definition. Two elements S,T ∈ End(V ) are conjugate if T = RSR−1 for some R ∈ GL(V ).

Equivalently, S and T are conjugate if TR = RS, i.e. the diagram below commutes.

(Diagram: a commutative square with S : V → V across the top, T : V → V across the bottom, and R : V → V down both sides, so that TR = RS.)


As we know from any introductory linear algebra course, once we pick a basis for V, every linear transformation can be realized as multiplication by a matrix A, where A is viewed as a ‘row vector of column vectors’ of V. Given a row vector ~r = (a1 a2 · · · an) and a column vector ~c with entries b1, . . . , bn, we define their product by

~r~c = a1b1 + . . . + anbn.

For a matrix A, let ri(A) denote the ith row of A and ci(A) the ith column of A. If A ∈ Mat(l, m) and B ∈ Mat(m, n) then matrix multiplication lands AB in Mat(l, n) via

(AB)ij = Σ_{k=1}^m aik bkj = ri(A)cj(B).

Proposition 1.5.4. Let T : F^n → F^m be a linear transformation. Then there is an m × n matrix A, called the standard matrix of T, satisfying T(~x) = A~x for all ~x ∈ F^n.

Proof. If ~x has entries a1, . . . , an then ~x = a1e1 + . . . + anen where ei is the ith vector of the standard basis for F^n:

0 . . .   0   ei = 1 row i   0 . . 0

By linearity, T(~x) = a1T(e1) + . . . + anT(en), so we should set A to be the matrix whose columns are the images of the standard basis vectors:

A = [T(e1) | · · · | T(en)].

Then we have

A~x = [T(e1) | · · · | T(en)](a1, . . . , an) = a1T(e1) + . . . + anT(en) = T(~x).

Note that the T (ei) are uniquely determined, so A really is the standard matrix for T .
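Here is a small numpy sketch of this recipe (my own illustration; the map T below is an assumed example): build A column by column from the images of the standard basis vectors.

```python
import numpy as np

def standard_matrix(T, n):
    """Return the m x n standard matrix of a linear map T : F^n -> F^m,
    built column by column from the images T(e_i) of the standard basis."""
    columns = [T(e) for e in np.eye(n)]     # e runs over e_1, ..., e_n
    return np.column_stack(columns)

# Hypothetical example: T(x, y, z) = (x + 2y, 3z), a linear map R^3 -> R^2
T = lambda v: np.array([v[0] + 2 * v[1], 3 * v[2]])

A = standard_matrix(T, 3)
x = np.array([1.0, 1.0, 1.0])
print(A)
print(np.allclose(A @ x, T(x)))   # True: A x = T(x)
```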


Using the coordinate mapping along with a choice of basis in each of V and W , we can construct the standard matrix for any linear transformation. Let T : V → W be linear and choose bases B = {v1, . . . , vn} and C = {w1, . . . , wm} for V and W , respectively. This data determines isomorphisms

(Diagram: a commutative square with T : V → W on top, CB : V → F^n and CC : W → F^m down the sides, and the induced map F^n → F^m along the bottom.)

The composition CC ◦ T ◦ CB^{-1} : F^n → F^m is linear, so by Proposition 1.5.4 it has a standard matrix which we will denote by [T]B,C. The ith column of this matrix is

(CC ◦ T ◦ CB^{-1})(ei) = CC(T(vi))

so determining the standard matrix for T comes down to computing the coordinates in C of the images of the vi under T .

Example 1.5.5. Let V = W = P2[x] = {a + bx + cx^2 | a, b, c ∈ R}. We will choose several bases and see what standard matrices they produce for various linear transformations. First choose the obvious basis B = C = {1, x, x^2} for V and W. Let T1 : V → W be the linear transformation mapping f 7→ x (d/dx)f. To construct [T1]B,C, we compute

T1(1) = x · (d/dx)(1) = 0 = 0 · 1 + 0 · x + 0 · x^2
T1(x) = x · (d/dx)(x) = x · 1 = 0 · 1 + 1 · x + 0 · x^2
T1(x^2) = x · (d/dx)(x^2) = x · 2x = 0 · 1 + 0 · x + 2 · x^2.

So the standard matrix is

[T1]B,C =
[0 0 0]
[0 1 0]
[0 0 2]

On the other hand, define the linear transformation

T2 : V −→ W
f 7−→ (d/dx)(xf).

In the same way, we compute the standard matrix to be

[T2]B,C =
[1 0 0]
[0 2 0]
[0 0 3]


Let's take a different basis, say D = {1, 1 + x, 1 + x + x^2}. Then we have a different (and less nice-looking) standard matrix for T1:

[T1]B,D =
[0 −1  0]
[0  1 −2]
[0  0  2]

If we choose a different initial basis, we get a different matrix for T2 as well:

[T2]D,C =
[1 1 1]
[0 2 2]
[0 0 3]
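These matrices are easy to double-check by machine. The following SymPy sketch (my own, not from the notes) rebuilds [T1]B,C and [T2]B,C by applying each operator to the basis {1, x, x^2} and reading off coefficients.

```python
import sympy as sp

x = sp.symbols('x')

T1 = lambda f: x * sp.diff(f, x)      # f  |->  x f'
T2 = lambda f: sp.diff(x * f, x)      # f  |->  (x f)'

def matrix_of(T):
    """Standard matrix w.r.t. the basis {1, x, x^2}: column i holds the coefficients of T(x^i)."""
    cols = []
    for i in range(3):
        image = sp.expand(T(x**i))
        cols.append([image.coeff(x, j) for j in range(3)])
    return sp.Matrix(cols).T

print(matrix_of(T1))   # diag(0, 1, 2), matching [T1]_{B,C} above
print(matrix_of(T2))   # diag(1, 2, 3), matching [T2]_{B,C} above
```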

Since CB and CC are invertible and linear, ker T = CB^{-1}(ker[T]B,C). Similarly, im T = CC^{-1}(im[T]B,C). The kernel and image of [T]B,C are sometimes called the null space and column space of the matrix, respectively.

Proposition 1.5.6. Suppose U, V, W are vector spaces and we have linear transformations S : U → V and T : V → W. If A, B and C are bases for U, V and W, respectively, then the matrix for T ◦ S satisfies

[T ◦ S]A,C = [T ]B,C[S]A,B.

Proof. Let A = [S]A,B, B = [T]B,C and C = [T ◦ S]A,C. On one hand, the ijth coefficient of BA is

(BA)ij = Σ_k b_{ik}a_{kj}.

On the other hand, we compute the jth column of C by applying T ◦ S to the jth basis vector aj ∈ A:

(T ◦ S)(aj) = T(Σ_{k=1}^ℓ a_{kj} bk)
            = Σ_{k=1}^ℓ a_{kj} T(bk)
            = Σ_{k=1}^ℓ a_{kj} Σ_{i=1}^m b_{ik} ci
            = Σ_{i=1}^m (Σ_{k=1}^ℓ b_{ik}a_{kj}) ci

and the coefficient of ci is precisely Σ_k b_{ik}a_{kj} as needed.

Corollary 1.5.7. Matrix multiplication is associative.

Once a (finite) basis for V has been chosen, the general linear group of V becomes

GLn(F) = {A ∈ Matn(F) | A is invertible}. Note that A is invertible ⇐⇒ rank(A) = n ⇐⇒ null(A) = 0.


1.6 Similar and Conjugate Matrices

Definition. Two matrices A and B are conjugate if there is some C ∈ GLn(F) such that B = CAC−1.

When a single basis B is chosen for V , we denote [T ]B,B simply by [T ]B. Definition. Two matrices A and B are similar if there is a linear transformation T : V → V and bases A and B of V so that A = [T ]A and B = [T ]B. We will prove

Theorem 1.6.1. A and B are similar ⇐⇒ A and B are conjugate.

Before we do so, let’s take a moment to discuss [T ]B. In fact, let’s just look at the identity map

id : V −→ V v 7−→ v.

For bases B and C of V , what is [id]B,C? We know it makes the following diagram commute:

(Diagram: a commutative square with id : V → V on top, CB and CC down the sides, and [id]B,C : F^n → F^n along the bottom.)

Definition. The change-of-basis matrix from B to C is CB,C = [id]B,C.

Note that by the above, CB,C = CC ◦ CB^{-1}. Explicitly, this matrix looks like

 | |  CB,C = CC(b1) ··· CC(bn) | |

Example 1.6.2. Let V = P2[x] with bases B = {1, x, x^2} and C = {1, 1 + x, 1 + x + x^2}. Then we have the following change-of-basis matrices:

1 −1 0  1 1 1 CB,C = 0 1 −1 and CC,B = 0 1 1 0 0 1 0 0 1

To prove Theorem 1.6.1, notice in the above example that CC,B = CB,C^{-1}. This holds in general. The following diagram describes the relation between [T]B, [T]C and CB,C.


(Diagram: a stack of commutative squares relating T : V → V, the coordinate maps CB and CC, the matrices [T]B and [T]C acting on F^n, and the change-of-basis matrix CB,C.)

All of the outer squares commute, and we want to show that the middle square commutes as well. Consider

[T]C CB,C = (CC T CC^{-1})(CC CB^{-1})
          = CC T CB^{-1}
          = CC CB^{-1} CB T CB^{-1}
          = (CC CB^{-1})(CB T CB^{-1})
          = CB,C [T]B.

Therefore the middle square commutes. What's more, this shows that [T]C and [T]B are conjugate. Hence we have proven the ( =⇒ ) direction of the theorem. The ( ⇐= ) direction is a little more complicated; starting with the middle commutative square, one must build the outer squares of the above diagram in order to show similarity. We end the discussion of Theorem 1.6.1 here.

1.7 Kernel and Image

In this section we show examples of how one may compute a basis for the kernel and image of a linear transformation.

Example 1.7.1. Let V = P3[x] be the vector space of polynomials of degree at most 3 with coefficients in F. Define a linear transformation

T : V −→ V
f 7−→ (1/x) ∫_0^x f(t) dt + f(1).


Pick a basis B = {1, x, x^2, x^3} for V and compute the images of the basis vectors under T:

T(1) = 2
T(x) = x/2 + 1
T(x^2) = x^2/3 + 1
T(x^3) = x^3/4 + 1.

The standard matrix for T with respect to this basis is then computed as

[T]B =
[2  1    1    1  ]
[0  1/2  0    0  ]
[0  0    1/3  0  ]
[0  0    0    1/4]

It's easy to see that this matrix is invertible, so T is one-to-one and onto. Therefore the kernel is trivial, so the only choice of basis for ker T is the empty basis {}. This also tells us that a basis for im T is any basis of V, so we may take B but let's find a more interesting basis. In general one can find a basis for im T by taking the images of basis vectors corresponding to the columns of [T]B containing pivots. In the above case, this produces the basis

{2, 1 + x/2, 1 + x^2/3, 1 + x^3/4}.

Example 1.7.2. Let

    [1 2]
A = [3 4]
    [5 6]

and define a transformation

T : Mat(2, 3, F) −→ Mat2(F) B 7−→ BA.

T is linear because matrix multiplication is linear. By the rank-nullity theorem, 6 = dim im T + dim ker T and since Mat2(F) only has dimension 4, the kernel is at least dimension 2. Pick bases B and C for Mat(2, 3, F) and Mat2(F) to be

B = {E11,E21,E12,E22,E13,E23} and C = {E11,E21,E12,E22}

where Eij has 1 in the ijth component and 0's elsewhere. We compute the 4 × 6 standard matrix to be

[T]B,C =
[1 0 3 0 5 0]
[0 1 0 3 0 5]
[2 0 4 0 6 0]
[0 2 0 4 0 6]

In fact this can be written as a tensor product A^T ⊗ I2, where I2 is the 2 × 2 identity matrix. That construction is sometimes called the Kronecker product.


We row reduce [T ]B,C to obtain

1 0 0 0 −1 0  0 1 0 0 0 −1   0 0 1 0 2 0  0 0 0 1 0 2

From this we see immediately that T is onto, so a basis of im T may be computed by taking the matrices corresponding to the first four columns of [T ]B,C:

[1 2]   [0 0]   [3 4]   [0 0]
[0 0] , [1 2] , [0 0] , [3 4] .

(A valid basis for im T is also C.) To compute the kernel of T, note that

ker[T]B,C = {(x1, x2, x3, x4, x5, x6) : x1 − x5 = 0, x2 − x6 = 0, x3 + 2x5 = 0, x4 + 2x6 = 0}
          = {(x5, x6, −2x5, −2x6, x5, x6) : x5, x6 ∈ F}
          = {x5(1, 0, −2, 0, 1, 0) + x6(0, 1, 0, −2, 0, 1) : x5, x6 ∈ F}

(all vectors written as columns),

which shows that a basis for ker[T ]B,C is

 1   0     0   1      −2  0    ,    0  −2      1   0   0 1 

Going backward through the coordinate mapping, we compute a basis for ker T to be

[1 −2 1]   [0  0 0]
[0  0 0] , [1 −2 1] .
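This whole computation is easy to verify numerically (my own sketch, not part of the notes): build the 4 × 6 matrix of T as the Kronecker product A^T ⊗ I2, check that its rank is 4 and its null space is 2-dimensional, and confirm that the two matrices above multiply A to zero.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# Matrix of T(B) = BA with respect to the bases chosen above: A^T Kronecker I_2
M = np.kron(A.T, np.eye(2, dtype=int))

rank = np.linalg.matrix_rank(M)
print(rank, 6 - rank)                  # 4 and 2: T is onto and dim ker T = 2

K1 = np.array([[1, -2, 1], [0, 0, 0]])
K2 = np.array([[0, 0, 0], [1, -2, 1]])
print(K1 @ A, K2 @ A)                  # both are the 2 x 2 zero matrix
```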


1.8 The Isomorphism Theorems

Recall that an isomorphism (of vector spaces) is a linear transformation that is one-to- one and onto. Anytime we are able to define isomorphisms in algebra, there are powerful theorems characterizing certain ‘canonical’ isomorphisms; these are called the Isomorphism Theorems.

Theorem 1.8.1 (First Isomorphism Theorem). If T : V → W is a linear transformation then there is an isomorphism T̄ : V/ker T → im T.

Proof. Define T̄ by mapping v + ker T 7→ T(v). Let π : V → V/ker T be the natural projection. To show T̄ is well-defined, suppose v + ker T = v′ + ker T for some v, v′ ∈ V. Then v′ = v + w for some w ∈ ker T and we have

T̄(v′ + ker T) = T(v′) = T(v + w) = T(v) + T(w) = T(v) + 0 = T(v) = T̄(v + ker T)

so T̄ is well-defined. It is easy to see that T̄ inherits linearity from T, so it remains to check that it is bijective. If w ∈ im T, there is some v ∈ V such that T(v) = w. But T(v) = T̄(v + ker T) so we see that T̄ is onto as well. Finally, we see that

v + ker T ∈ ker T̄ ⇐⇒ T(v) = 0 ⇐⇒ v ∈ ker T ⇐⇒ v + ker T = ker T

so T̄ is one-to-one.

Theorem 1.8.2 (Second Isomorphism Theorem). Given V a vector space and U, W subspaces of V, (U + W)/U ≅ W/(U ∩ W).

Proof. Define T : W → (U + W)/U by T(v) = v + U. This is clearly linear, and since cosets partition W, T is surjective. By the first isomorphism theorem (1.8.1), (U + W)/U ≅ W/ker T so it suffices to show that ker T = U ∩ W. But

ker T = {w ∈ W | w + U = U} = {w ∈ W | w ∈ U} = U ∩ W and it follows that T is an isomorphism.

Theorem 1.8.3 (Third Isomorphism Theorem). Given a chain of vector spaces U ⊂ W ⊂ V , W/U is a subspace of V/U and

(V/U)/(W/U) ∼= V/W.

Proof. Following the last two proofs, one may hope to define a linear transformation

T :(V/U)/(W/U) −→ V/W (v + U) + W/U 7−→ v + W and show that it is an isomorphism. However, this approach requires that we prove W/U is a subspace, T is well-defined, linear and bijective. We can certainly be more efficient.


Recall that the projection map π : V → V/W, v 7→ v + W is linear. Since U ⊂ W , π(U) = 0 so this induces a map

π̄ : V/U −→ V/W
v + U 7−→ v + W

which is well-defined and inherits surjectivity from π. We will be done if we can show that ker π̄ = W/U. But notice that

ker π̄ = {v + U | v + W = W} = {v + U | v ∈ W} = W/U.

This simultaneously shows that W/U is a subspace of V/U (every kernel is), and that (V/U)/(W/U) ∼= V/W by the first isomorphism theorem (1.8.1).


2 Duality

2.1 Functionals

Definition. A functional on an F-vector space V is a linear transformation f : V → F, i.e. f ∈ V∗ = HomF(V, F).

One way in which functionals arise is via the coordinate mapping CB for B a basis. In this section we will restrict our attention to the finite dimensional case, so assume B = {v1, . . . , vn}. The coordinate mapping, along with the ith projection map for each i, gives us a composition

V −(CB)→ F^n −(πi)→ F
v 7−→ (c1, . . . , cn) 7−→ ci

where v = c1v1 + . . . + cnvn.

We denote the ith composite πi ◦ CB by vi∗. Then we see that vi∗ is a functional, i.e. vi∗ ∈ V∗. In other words, vi∗(v) outputs the coefficient of vi when writing v as a linear combination of {v1, . . . , vn}.

Remark. vi∗ is not the same functional for every choice of basis containing vi; it depends on the entire basis.

Example 2.1.1. Let V = P2[x], the polynomials of degree at most 2 over R. Choose a basis

B = {v1, v2, v3} = {1 + x + 2x^2, 1 − x, x^2}.

Then 1 + x + x^2 = 1(1 + x + 2x^2) + 0(1 − x) − 1(x^2), so the functionals look like

v1∗(1 + x + x^2) = 1
v2∗(1 + x + x^2) = 0
v3∗(1 + x + x^2) = −1.
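In coordinates this is just a linear solve (a small sketch of my own): writing the basis vectors and the target polynomial in the monomial basis {1, x, x^2}, the dual-basis values are the entries of the solution of Pc = v.

```python
import numpy as np

# Columns: v1 = 1 + x + 2x^2, v2 = 1 - x, v3 = x^2, in the monomial basis {1, x, x^2}
P = np.array([[1,  1, 0],
              [1, -1, 0],
              [2,  0, 1]])

v = np.array([1, 1, 1])        # 1 + x + x^2

c = np.linalg.solve(P, v)      # c[i] = v_{i+1}^*(1 + x + x^2)
print(c)                       # [ 1.  0. -1.]  (up to rounding)
```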

Definition. The set {v1∗, . . . , vn∗} is called the dual basis of V.

In a moment we will prove that the dual basis is indeed a basis of V∗. Some alternate notation: we will write (v, vi∗) to denote vi∗(v) when it is convenient to do so. In particular, this notation will be easier to wield when discussing inner product spaces.

Given a basis B = {v1, . . . , vn} of V and its dual basis {v1∗, . . . , vn∗}, any v ∈ V can be written v = Σ_{i=1}^n vi∗(v)vi. Therefore knowing the coordinates of a vector v with respect to a basis is equivalent to knowing the value of v at each functional in the dual basis.

Theorem 2.1.2. Suppose {v1, . . . , vn} is a basis of a vector space V. Then {v1∗, . . . , vn∗} is a basis of V∗.


Proof. First let f ∈ V∗; we will show f can be written as a linear combination of the vi∗ in the set {v1∗, . . . , vn∗}. To do this, we evaluate f on any vk in the basis {v1, . . . , vn}:

(vk, f) = ((vk, vk∗)vk, f)
        = (Σ_{i=1}^n (vk, vi∗)vi, f)
        = Σ_{i=1}^n (vk, vi∗)(vi, f)        by linearity of f
        = (vk, Σ_{i=1}^n (vi, f)vi∗)        by linearity of the vi∗.

Therefore f = Σ_{i=1}^n (vi, f)vi∗ so the set {vi∗} spans V∗.

Now suppose Σ_{j=1}^n cjvj∗ = 0. Evaluate both sides at vk to see that

0 = (vk, 0) = (vk, Σ_{j=1}^n cjvj∗) = Σ_{j=1}^n cj(vk, vj∗) = ck.

Hence ck = 0 for k = 1, . . . , n so the dual basis is linearly independent. This shows {vi∗} is actually a basis of V∗.

The above proof can be generalized quite a bit to describe HomF(V,W ) for any two F-vector spaces V and W .

Theorem 2.1.3. If {v1, . . . , vn} is a basis of V and {w1, . . . , wm} is a basis of W , then the set {ϕij} is a basis of HomF(V,W ), where

ϕij : V −→ W

vk 7−→ δikwj and δik is the Kronecker delta, defined by δik = 1 if i = k and δik = 0 otherwise.

Applying this to a basis {v1, . . . , vn} of V and the basis {1} of F shows that

ϕi1(vk) = δik = vi∗(vk).

So vi∗ and ϕi1 agree on a basis and therefore they are the same map. Thus Theorem 2.1.3 implies an important relation:

Corollary 2.1.4. If V is finite dimensional, V ≅ V∗.


2.2 Self Duality

Since V∗ is a vector space, what happens when we take the dual of V∗? We will prove that the double dual V∗∗ of a vector space is really just V itself.

Given V∗ = HomF(V, F), for each v ∈ V define a map out of the dual space by

θv : V∗ −→ F
f 7−→ (v, f).

Notice that θv(f + g) = (v, f + g) = (v, f) + (v, g) = θv(f) + θv(g), so θv is linear and hence θv ∈ V∗∗. By linearity of each f ∈ V∗ we can show that there is a natural assignment

θ : V −→ V ∗∗

v 7−→ θv.

Proposition 2.2.1. For all vector spaces V , θ is one-to-one. Proof omitted.

If V is finite dimensional, we know V, V∗ and V∗∗ all have the same dimension, so in the finite dimensional case θ is onto and therefore an isomorphism. Henceforth we will identify elements of V with functionals in V∗∗ via v ↔ θv.

To think of this another way, let V = F^n_c, where the subscript c indicates that we are considering vectors in V as column vectors in F^n. Likewise consider V∗ = F^n_r, so that functionals are thought of as row vectors. Then there is a bilinear pairing

V × V∗ −→ F
(v, w) 7−→ wv.

2.3 Perpendicular Subspaces

An important subspace of the dual space V ∗ that arises naturally from subsets of V is Definition. Let A ⊆ V be a subset. The annihilator of A, also called “A perp”, is

A⊥ = {f ∈ V ∗ | (v, f) = 0 for all v ∈ A}.

Proposition 2.3.1. For any subset A ⊆ V , (i) A⊥ is a subspace of V ∗.

(ii) A⊥ = (Span A)⊥.

(iii) (A⊥)⊥ = Span A. Proof. (i) will result from the proof of Theorem 2.3.2 below. For (ii), one containment is easy and the other is given by the linearity of the functionals in A⊥ ⊂ V ∗. Finally, (iii) follows from the identification V ↔ V ∗∗ described in the last section.


There is a nice formula for “how big” A⊥ can possibly be:

Theorem 2.3.2. Let W be a subspace of a finite dimensional vector space V . Then dim W + dim W ⊥ = dim V .

First we need

Lemma 2.3.3. Given a linear transformation T : V → W , there is a linear transformation

T ∗ : W ∗ −→ V ∗ f 7−→ f ◦ T

called the adjoint of T .

Proof. Easy. A nice application of this lemma is to the inclusion map i : W,→ V , where W ⊆ V is a subspace. Since i is linear, there is an adjoint map i∗ : V ∗ → W ∗ which is onto when V is finite dimensional. This allows us to prove the theorem:

Proof of Theorem 2.3.2. By the above comments, the inclusion i : W,→ V induces an adjoint map i∗ : V ∗ → W ∗ which is onto. By the rank-nullity theorem, dim ker i∗ + dim im i∗ = dim V ∗, but i∗ is onto, so dim im i∗ = dim W ∗. Moreover, a vector space and its dual have the same dimension, so dim ker i∗ + dim W = dim V . Thus it suffices to show ker i∗ = W ⊥. By definition,

ker i∗ = {f ∈ V ∗ | i∗(f) = 0} = {f ∈ V ∗ | f ◦ i = 0} = {f ∈ V ∗ | f ◦ i(w) = 0 for all w ∈ W } = {f ∈ V ∗ | f(w) = 0 for all w ∈ W } = W ⊥.

Hence dim W ⊥ + dim W = dim V . Note that this proves (i) of Proposition 2.3.1 above, since we have realized W ⊥ as the kernel of a linear transformation out of V ∗.

Corollary 2.3.4. Let W ⊆ V be a subspace and v ∈ V . Then v ∈ W ⇐⇒ (v, f) = 0 for all f ∈ W ⊥.

Proof. ( =⇒ ) is obvious, and ( ⇐= ) is proven by the identification V ↔ V∗∗ which tells us that if (v, f) = 0 for all f ∈ W⊥, then v ∈ (W⊥)⊥ = W.

Say we have a linear transformation T : V → W and a functional g ∈ W∗. Consider

(T (v), g) = g(T (v)) = (g ◦ T )(v) = T ∗(g)(v) = (v, T ∗(g)).

This tells us how to move linear maps from one side of the pairing (·, ·) on W × W ∗ to the other. This is the main property that characterizes the adjoint.


Proposition 2.3.5. Let T : V → W be linear. Then (im T )⊥ = ker T ∗.

Proof. By the comments above,

f ∈ (im T )⊥ ⇐⇒ for all w ∈ im T, (w, f) = 0 ⇐⇒ for all v ∈ V, (T (v), f) = 0 ⇐⇒ for all v ∈ V, (v, T ∗(f)) = 0 ⇐⇒ T ∗(f) = 0 ⇐⇒ f ∈ ker T ∗.

This proposition is the analog of the fact that (col A)⊥ = ker A^T for a matrix A whose transpose is A^T. Note that Proposition 2.3.5 also implies (im T∗)⊥ = ker T, which is the analog of (row A)⊥ = ker A.

Example 2.3.6. Let T : F^n_c → F^m_c have m × n standard matrix A, i.e. T(v) = Av for all v ∈ V. We identify (F^n_c)∗ = F^n_r so that the adjoint pairing is

F^n_c × F^n_r −→ F
(v, w) 7−→ wv.

The adjoint of T is T∗ : F^m_r → F^n_r. For w ∈ F^m_r, T∗(w) should be another row vector, and we see that it is precisely wA:

T ∗(w)v = (w ◦ T )(v) = wT (v) = w(Av) = (wA)v.

Thus T∗ : F^m_r → F^n_r takes w 7→ wA. If we use the standard bases for F^m_c and F^n_c and take the corresponding dual bases B of F^m_r and C of F^n_r, we get the commutative diagram

(Diagram: T∗ : F^m_r → F^n_r across the top, the transpose maps (·)^T down the sides identifying F^m_r with F^m_c and F^n_r with F^n_c, and [T∗]B,C : F^m_c → F^n_c along the bottom.)

Then we know the structure of [T∗]B,C, namely [T∗]B,C : v 7→ A^T v = (v^T A)^T. Thus we have identified the adjoint T∗ with the transpose matrix A^T.
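A quick numerical sanity check (my own illustration) of the defining identity (T(v), g) = (v, T∗(g)) in this matrix picture: w(Av) = (wA)v for a row vector w and a column vector v, and wA agrees with A^T applied to w.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(3, 2))   # an arbitrary 3 x 2 matrix, so T : F^2 -> F^3
v = rng.integers(-5, 5, size=(2,))     # a column vector in F^2
w = rng.integers(-5, 5, size=(3,))     # a row vector, i.e. a functional on F^3

print(w @ (A @ v) == (w @ A) @ v)      # True: (T(v), w) = (v, T*(w))
print(np.array_equal(w @ A, A.T @ w))  # True: the adjoint acts by the transpose matrix
```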


3 Multilinearity

3.1 Multilinear Functions

Definition. A function Ψ: V1 × V2 × · · · × Vk → W is multilinear if Ψ is linear in each component when all other components are fixed.

Examples.

1 A bilinear function Ψ : V1 × V2 → W satisfies

Ψ(v1 + v1′, v2) = Ψ(v1, v2) + Ψ(v1′, v2)
Ψ(cv1, v2) = cΨ(v1, v2)
Ψ(v1, v2 + v2′) = Ψ(v1, v2) + Ψ(v1, v2′)
Ψ(v1, cv2) = cΨ(v1, v2).

Note that Ψ is not linear, since Ψ(v1 + v1′, v2 + v2′) ≠ Ψ(v1, v2) + Ψ(v1′, v2′).

2 Matrix multiplication (·) : Mat(m, n) × Mat(n, l) → Mat(m, l) is bilinear.

3 Let det : F^2 × F^2 → F be the determinant function, i.e. the map sending a pair of column vectors with entries (a1, a2) and (b1, b2) to the determinant

| a1  b1 |
| a2  b2 | .

The determinant is an important example of a multilinear function (bilinear in this case). We will define det rigorously in Section 3.5.

4 Suppose f1, . . . , fk ∈ V∗. Define Ψ : V × V × · · · × V → F (k factors) by

(v1, . . . , vk) 7−→ ∏_{i=1}^k (vi, fi).

Then Ψ is multilinear since each of the functionals is linear. This particular map is usually denoted Ψ_{f1,...,fk}.

The set of all multilinear functions V1 × · · · × Vk → W is denoted MulF(V1 × · · · × Vk, W). This is a vector space over F, and it can be viewed as a multilinear analog to HomF(V, W). In fact, if k = 1, it is just the set of linear maps, that is, Mul(V, W) = Hom(V, W). We seek a way to turn MulF into a HomF set. It turns out that the solution is the tensor product:

MulF(V1 × · · · × Vk,W ) = HomF(V1 ⊗ · · · ⊗ Vk,W ).


3.2 Tensor Products

We will construct a solution to the multilinear problem (i.e. turning MulF into a HomF set) for bilinear functions and remark that the generalization to k components is similar, although notationally cumbersome. Let V1 and V2 be vector spaces over a field F and let F be the vector space with basis {(v1, v2) | v1 ∈ V1, v2 ∈ V2}. Take a moment to think about how huge this is: elements in F look like Σ_{i=1}^n ci(v1i, v2i). To make it “smaller”, we define a subspace of F and take the quotient of F by this subspace. In particular, let X be the subspace of F spanned by elements of the form

(cv1, v2) − c(v1, v2),      (v1 + v1′, v2) − (v1, v2) − (v1′, v2),
(v1, cv2) − c(v1, v2),      (v1, v2 + v2′) − (v1, v2) − (v1, v2′).

Then in F/X we have the following relations:

(v1 + v1′, v2) + X = ((v1, v2) + X) + ((v1′, v2) + X)
(cv1, v2) + X = c(v1, v2) + X
(v1, v2 + v2′) + X = ((v1, v2) + X) + ((v1, v2′) + X)
(v1, cv2) + X = c(v1, v2) + X.

From this we define

Definition. The tensor product of V1 and V2 is F/X, denoted V1 ⊗F V2.

Why should we choose to mod out by those particular vectors of F? Well suppose Ψ : V1 × V2 → W is bilinear. Recall that any map defined on a basis extends to a linear map on the span of this basis – this is called ‘extending by linearity’. So if F is the vector space with basis V1 × V2, Ψ defines a linear map

Ψ̃ : F −→ W
Σ_{i=1}^n ci(v1i, v2i) 7−→ Σ_{i=1}^n ciΨ(v1i, v2i).

Consider

Ψ̃((v1, v2 + v2′) − (v1, v2) − (v1, v2′)) = Ψ̃(v1, v2 + v2′) − Ψ̃(v1, v2) − Ψ̃(v1, v2′)
                                         = Ψ(v1, v2 + v2′) − Ψ(v1, v2) − Ψ(v1, v2′).

But this last term equals 0 since Ψ is bilinear. Similar statements can be made for the other generators of X above, so we see that X ⊆ ker Ψ̃. This motivates our choice of X above.

Some notation: rather than writing Σ ci(v1i, v2i) + X for elements (cosets) in F/X = V1 ⊗ V2, we simply write elements in the tensor product as Σ ci(v1i ⊗ v2i). A spanning set of V1 ⊗ V2 consists of elements of the form v1 ⊗ v2 for v1 ∈ V1 and v2 ∈ V2. These are called

elementary tensors, or rank one tensors (the latter term relates to the use of tensors in physics). Note that the elementary tensors span V1 ⊗ V2 but they may not be linearly independent. Thus we want to find a basis for V1 ⊗ V2 in order to count its dimension. In fact, once we know dim V1 ⊗ V2, we will see in a moment the following application:

dim Mul(V1 × V2,W ) = dim Hom(V1 ⊗ V2,W ) = dim(V1 ⊗ V2) dim(W ).

Observe that the function

ϕ : V1 × V2 −→ V1 ⊗ V2

(v1, v2) 7−→ v1 ⊗ v2 is bilinear. To see this, take v1 ∈ V1, v2 ∈ V2 and consider

ϕ(v1, v2 + v2′) = v1 ⊗ (v2 + v2′)
               = (v1, v2 + v2′) + X
               = (v1, v2) + (v1, v2′) + X        since (v1, v2 + v2′) − (v1, v2) − (v1, v2′) ∈ X
               = (v1 ⊗ v2) + (v1 ⊗ v2′)
               = ϕ(v1, v2) + ϕ(v1, v2′).

The other properties are shown in a similar fashion, after which we conclude that ϕ is bilinear. This function is special in that every bilinear function defined on V1 × V2 factors through ϕ. This is made rigorous in the next theorem.

Theorem 3.2.1. Given any bilinear map Ψ : V1 × V2 → W, there exists a unique linear map ψ : V1 ⊗ V2 → W making the diagram commute (i.e. Ψ = ψϕ).

(Diagram: ϕ : V1 × V2 → V1 ⊗ V2, Ψ : V1 × V2 → W, and the induced map ψ : V1 ⊗ V2 → W with Ψ = ψϕ.)

Proof. First we prove that ψ is well-defined. We saw earlier that Ψ : V1 × V2 → W induced a unique Ψ̃ : F → W such that X ⊆ ker Ψ̃. Then there is a unique map ψ : F/X → W that is linear and well-defined. Consider the diagram

(Diagram: V1 × V2 −(i)→ F −(π)→ F/X = V1 ⊗ V2 along the bottom, with Ψ, Ψ̃ and ψ all mapping to W, and ϕ = π ◦ i.)


The outer triangle is precisely the triangle we want to make commute, i.e. ϕ = π ◦ i, and we have shown the existence of such a ψ, so it remains to prove ψ is unique. Suppose ψ′ is another solution to the theorem. Then

ψ′(v1 ⊗ v2) = (ψ′ϕ)(v1, v2) = Ψ(v1, v2) = (ψϕ)(v1, v2) = ψ(v1 ⊗ v2).

Since ψ′ and ψ agree on elementary tensors (a spanning set), they agree everywhere. Hence ψ is unique.

How does one use this theorem? Suppose we want to define a map out of the tensor product:

ψ : V1 ⊗ V2 −→ W

v1 ⊗ v2 7−→ f(v1, v2).

Well-definedness is a major obstacle since V1 ⊗V2 is defined in terms of a quotient, so instead we construct a map out of the product

f : V1 × V2 −→ W

(v1, v2) 7−→ f(v1, v2)

and show that it is bilinear. Then Theorem 3.2.1 gives us our unique ψ as desired.

We won't prove the next result, but it is important to note that the tensor product V ⊗ W as a solution to the bilinear mapping problem generalizes to trilinear and higher-order multilinear mapping statements. This generalization is achieved via associativity:

Proposition 3.2.2. The tensor product is associative, that is V1 ⊗ (V2 ⊗ V3) ≅ (V1 ⊗ V2) ⊗ V3 ≅ V1 ⊗ V2 ⊗ V3.

Recall that our goal is to compute a basis for the tensor product V ⊗ W (we switch to V and W for the remainder of the section to avoid any more nasty subscripts). Let B = {v1, . . . , vn} be a basis for V and C = {w1, . . . , wm} a basis for W. Then for each 1 ≤ i ≤ n, 1 ≤ j ≤ m we have the elementary tensor vi ⊗ wj. We claim that the set of such elements is a basis for V ⊗ W.

Proposition 3.2.3. The set {vi ⊗ wj | vi ∈ B, wj ∈ C} spans the tensor product V ⊗ W . Proof. We saw earlier that the set of all elementary tensors is a spanning set of V ⊗ W , so it’s enough to show that each v ⊗ w is in Span{vi ⊗ wj}i,j. Let v ∈ V and w ∈ W and P P consider v ⊗ w ∈ V ⊗ W . Write v = civi and w = djwj. Then we have ! ! X X X X v ⊗ w = civi ⊗ djwj = cidj(vi ⊗ wj). i j i j

This tells us that dim V ⊗ W ≤ mn. We sketch the proof of the reverse inequality below.

37 3.3 Tensoring Transformations 3 Multilinearity

Proposition 3.2.4. dim(V ⊗ W ) ≥ mn. Proof sketch. Define a bilinear map V × W −→ Mat(n, m) T (v, w) 7−→ CB(v)CC(w) which induces a linear map V ⊗W → Mat(n, m). Show that this linear map is onto to prove dim(V ⊗ W ) is at least mn.

Therefore we have dim(V ⊗ W ) = (dim V )(dim W ). This further implies that {vi ⊗ wj} is a basis of V ⊗ W since its cardinality is mn.

3.3 Tensoring Transformations

Let S : V → W and T : X → Y be linear transformations. Then Ψ: V × X −→ W ⊗ Y (v, x) 7−→ S(v) ⊗ T (x) is bilinear and therefore induces a linear map on the tensor, S ⊗ T : V ⊗ X −→ W ⊗ Y v ⊗ x 7−→ S(v) ⊗ T (x). The following example will help us understand what the standard matrix of S ⊗ T . A B Example 3.3.1. Let S : F2 −→ F3 and T : F3 −→ F2 be linear transformations with standard 1 2 1 2 3 matrices A = 3 4 and B = . Let B, C, D and E denote the standard bases of   4 5 6 5 6 F2, F3, F3 and F2, respectively. We showed in Section 3.2 that

B ⊗ D = {e1 ⊗ e1, e1 ⊗ e2, e1 ⊗ e3, e2 ⊗ e1, e2 ⊗ e2, e2 ⊗ e3}

and C ⊗ E = {e1 ⊗ e1, e1 ⊗ e2, e2 ⊗ e1, e2 ⊗ e2, e3 ⊗ e1, e3 ⊗ e2} 2 3 3 2 are bases of F ⊗ F and F ⊗ F , respectively. Let’s compute the matrix [S ⊗ T ]B⊗D,C⊗E of the tensored transformation S ⊗ T : F2 ⊗ F3 → F3 ⊗ F2. First we have

(S ⊗ T )(e1 ⊗ e1) = S(e1) ⊗ T (e1)

= (e1 + 3e2 + 5e3) ⊗ (e1 + 4e2)

= e1 ⊗ e1 + 4(e1 ⊗ e2) + 3(e2 ⊗ e1) + 12(e2 ⊗ e2) + 5(e3 ⊗ e1) + 20(e3 ⊗ e2). These coefficients give the first column. The rest of the computations give us  1 2 3 4 5 6   4 5 6 8 10 12     3 6 9 4 8 12  [S ⊗ T ]B⊗D,C⊗E =    12 15 18 16 20 24     5 10 15 6 12 18  20 25 30 24 30 36

38 3.4 Alternating Functions and Exterior Products 3 Multilinearity

This is called the Kronecker product of A and B, denoted A ⊗ B. Notice that it can also be obtained by   a11B a12B a21B a22B a31B a32B In general, we have Theorem 3.3.2. Let S : V → W and T : X → Y be linear and suppose B, C, D and E are bases of V, W, X and Y , respectively. Then

[S ⊗ T ]B⊗D,C⊗E = [S]B,C ⊗ [T ]D,E where the ⊗ on the left refers to the tensor products V ⊗ X and W ⊗ Y (and their respective bases) and on the right it refers to the Kronecker product.

A word of warning: if B = {v1, . . . , vn} and C = {w1, . . . , wm} then to obtain the Kro- necker product correctly, one must write the basis B ⊗ C in the following order:

B ⊗ C = {v1 ⊗ w1, . . . , v1 ⊗ wm, . . . , vn ⊗ w1, . . . , vn ⊗ wm}.

3.4 Alternating Functions and Exterior Products

Definition. Let V be a vector space and k a positive integer. Then a multilinear function Ψ: V × · · · × V → W is an alternating function if Ψ(v1, . . . , vk) = 0 whenever vi = vj | {z } k times for some i 6= j.

Example 3.4.1. Let V = Fn and consider the map

Ψ: V × · · · × V −→ F | {z } n

| |

(v1, . . . , vn) 7−→ CB(v1) ··· CB(vn)

| | where the vertical bars on the right denote the regular determinant of an n × n matrix. It’s easy to see that Ψ is multilinear (this is Cramer’s Rule) and alternating by the definition of the determinant.

Why use the term ’alternating’? Say v1, . . . , vk ∈ V and Ψ is an alternating function on V ×k – going forward this will be our notation for V × V × · · · V and likewise for similar iterative objects, e.g. V ⊗ V ⊗ · · · = V ⊗k. Since Ψ is an alternating function, we should have Ψ(v1, . . . , vi−1, vi +vj, vi+1, . . . , vj−1, vj +vi, vj+1, . . . , vk) equal to 0. Expanding this out looks like

0 = Ψ(. . . , vi, . . . , vj,...) + Ψ(. . . , vj, . . . , vj,...) + Ψ(. . . , vj, . . . , vi,...) + Ψ(. . . , vi, . . . , vi,...)

= Ψ(. . . , vi, . . . , vj,...) + 0 + Ψ(. . . , vj, . . . , vi,...) + 0

39 3.4 Alternating Functions and Exterior Products 3 Multilinearity

so Ψ(. . . , vi, . . . , vj,...) = −Ψ(. . . , vj, . . . , vi,...), i.e. flipping two slots changes the sign of Ψ by −1. If char F 6= 2, we can define alternating functions in this way but the definition given at the start of the section works for any field. t(σ) Note that alternating implies Ψ(v1, . . . , vk) = (−1) Ψ(vσ(1), . . . , vσ(k)) for any σ in the permutation group Sk with sgn(σ) = t(σ), i.e. t(σ) is the number of transpositions (2-cycles) into which σ decomposes. ×k ×k Let AltF(V ,W ) be the set of alternating functions V → W . Our goal is to find a space such that AltF can be realized as as a HomF set, just as we did with tensor. The construction is nearly the same. First let Ψ : V ×k → W be an alternating function. Since alternating maps are multilinear, there is a map ψ : V ⊗k −→ W

v1 ⊗ · · · ⊗ vk 7−→ Ψ(v1, . . . , vk) which is linear. Note that ψ(v1 ⊗ · · · ⊗ vk) = 0 if vi = vj for some i 6= j. Therefore we define a subspace Q = Span{v1 ⊗ · · · ⊗ vk | vi = vj for some i 6= j}. Since ψ(Q) = 0, i.e. Q is in the kernel of ψ, we get a linear map ω : V ⊗k/Q −→ W

v1 ⊗ · · · ⊗ vk + Q 7−→ Ψ(v1, . . . , vk) which is thus well-defined. Definition. Let V be a finite dimension vector space and let k be a nonnegative integer. Then the kth exterior power of V is Vk V := V ⊗k/Q. By convention we set V0 V equal to the base field F. Theorem 3.4.2. Let V be a finite dimensional vector space and k a nonnegative integer. For any alternating function Ψ: V ×k → W there is a unique linear map ω : Vk V → W making the diagram below commute.

V ×k Ψ

Vk V ω W

This says that exterior powers are a solution to the universal mapping problem described ×k ∼ Vk above; that is, AltF(V ,W ) = HomF( V,W ). Instead of writing v1 ⊗ · · · ⊗ vk + Q every time, we adopt the notation v1 ∧· · ·∧vk. One could call these ‘elementary wedges’, following the name for elementary tensors in Section 3.2. In general, V and ∧ are pronounced ‘wedge’. Notice that Vk V has all of the multilinear relations (it is after all a quotient of the tensor product) plus an additional relation:

v1 ∧ · · · ∧ vk = 0 if vi = vj for some i 6= j.

t(σ) This in turn implies the formula v1 ∧ · · · ∧ vk = (−1) vσ(1) ∧ · · · ∧ vσ(k).

40 3.4 Alternating Functions and Exterior Products 3 Multilinearity

Proposition 3.4.3. Let V be a finite dimensional vector space and k a positive integer. Let v1, . . . , vk ∈ V . Then v1 ∧ · · · ∧ vk = 0 ⇐⇒ {v1, . . . , vk} is linearly dependent. X Proof. ( ⇒ = ) Reorder {v1, . . . , vk} so that v1 = cjvj. This only changes the sign of the j≥2 wedge of the vi by at most −1 but we don’t care since we show that it equals 0: ! X v1 ∧ · · · ∧ vk = cjvj ∧ · · · ∧ vk j≥2 X = cj(vj ∧ · · · ∧ vk) j≥2 X = cj · 0 = 0. j≥2

( =⇒ ) is a little harder to show, but we will gain some traction on the problem once we find a basis for Vk V below.

In Section 3.2, we learned that if {v1, . . . , vn} is a basis for V then {vi1 ⊗ · · · ⊗ vik | ij ∈ {1, . . . , n}} is a basis for V ⊗k. Because Vk V is a quotient of V ⊗k and projections are onto, Vk the image of {vi1 ⊗ · · · ⊗ vik } is a spanning set. That is, {vi1 ∧ · · · ∧ vik } spans V .

⊗2 Example 3.4.4. Suppose V has a basis {v1, v2, v3, v4} so that a basis for V consists of

v1 ⊗ v1 v1 ⊗ v2 v1 ⊗ v3 v1 ⊗ v4 v2 ⊗ v1 v2 ⊗ v2 v2 ⊗ v3 v2 ⊗ v4 v3 ⊗ v1 v3 ⊗ v2 v3 ⊗ v3 v3 ⊗ v4 v4 ⊗ v1 v4 ⊗ v2 v4 ⊗ v3 v4 ⊗ v4

Then a spanning set for V2 V is just the images of these:

v1 ∧ v1 v1 ∧ v2 v1 ∧ v3 v1 ∧ v4 v2 ∧ v1 v2 ∧ v2 v2 ∧ v3 v2 ∧ v4 v3 ∧ v1 v3 ∧ v2 v3 ∧ v3 v3 ∧ v4 v4 ∧ v1 v4 ∧ v2 v4 ∧ v3 v4 ∧ v4

V2 However vj ∧ vj = 0 for j = 1, 2, 3, 4 and by the relations on V , vi ∧ vj = −(vj ∧ vi) for all i 6= j. Therefore a basis for V2 V can be taken from the ‘matrix’ above and crossing off the diagonal and the lower triangle. Therefore we are left with the following basis elements:

v1 ∧ v2 v1 ∧ v3 v1 ∧ v4 v2 ∧ v3 v2 ∧ v4 v3 ∧ v4

In general, we have

41 3.4 Alternating Functions and Exterior Products 3 Multilinearity

Theorem 3.4.5. Let V be a vector space with basis {v1, . . . , vn} and let k be a positive integer. A basis of Vk V is

{vi1 ∧ · · · ∧ vik | 1 ≤ i1 < i2 < ··· < ik ≤ n}. Proof sketch. Consider the map ×k (n) Ψ: V −→ F k X (v1, . . . , vk) 7−→ det AI eI I where the sum is over all k-element subsets I ⊆ {1, . . . , n} and  | |  A = CB(v1) ··· CB(vk) | |

In particular, we can see that if dim V = n, (n ^ if 1 ≤ k ≤ n dim k V = k 0 if k > n. As a special case, if k = dim V then dim Vk V = 1. As with tensor and transformations, we can consider the ‘wedge’ of a transformation. Let T : V → W be a linear transformation and k ≥ 0 an integer. Then ^ Ψ: V ×k −→ k W

(v1, . . . , vk) 7−→ T (v1) ∧ · · · ∧ T (vk) is alternating so the universal property of exterior products gives us a unique linear map ^ ^ ^ k T : k V −→ k W

v1 ∧ · · · ∧ vk 7−→ T (v1) ∧ · · · ∧ T (vk).

Proposition 3.4.6. For linear transformations S and T whose composite ST is defined and     k a nonnegative integer, Vk(ST ) = Vk S Vk T . Proof. By definition,

^k  (ST ) (v1 ∧ · · · ∧ vk) = (ST )(v1) ∧ · · · ∧ (ST )(vk)

^k  = S (T (v1) ∧ · · · ∧ T (vk))

^k  ^k  = S T (v1 ∧ · · · ∧ vk).

This proposition shows that exterior powers ‘play nicely’ with composites. However, the same cannot be said for sums, for in general ^ ^ ^ k(S + T ) 6= kS + kT.

42 3.5 The Determinant 3 Multilinearity

3.5 The Determinant

In this section we realize the classic determinant det A of a matrix A as an invariant of the Vk n exterior product. Recall that by Theorem 3.4.5, if dim V = n, dim V = k if 1 ≤ k ≤ n and 0 otherwise. This means dim Vn V = 1, so given T : V → V linear, we get a map ^ ^ ^ nT : nV −→ nV where the domain and target are both one-dimensional. In other words there is some constant c such that ^n T (v1 ∧ · · · ∧ vn) = c(v1 ∧ · · · ∧ vn) for all choices of v1, . . . , vn ∈ V . Definition. Let T : V → V be linear and let n = dim V . The determinant of T is the element det T ∈ F such that

^n T (v1 ∧ · · · ∧ vn) = (det T )(v1 ∧ · · · ∧ vn) for all v1, . . . , vn ∈ V . Proposition 3.5.1. Determinants are multiplicative, that is, if S,T : V → V are linear, then det(ST ) = (det S)(det T ).

Proof. This follows from Proposition 3.4.6.

Corollary 3.5.2. If T : V → V is an invertible linear transformation, then det T −1 = (det T )−1.

Proof. Follows from the fact that the determinant of the identity on V must equal 1. In fact, a stronger result is true.

Theorem 3.5.3. T is invertible ⇐⇒ det T 6= 0.

Proof. ( =⇒ ) was Corollary 3.5.2. We prove ( ⇒ = ) by contrapositive. Suppose T is not invertible. Then in particular T is not onto so dim im T < n. Thus any set {T (v1),...,T (vn)} of n elements in im T must be linearly dependent. By Proposition 3.4.3, this is equivalent to T (v1) ∧ · · · T (vn) = 0. Hence det T = 0.

Example 3.5.4. Let T : F3 → F3 be given by T (x) = Ax where 1 3 1 A = 2 2 3 3 1 2

43 3.5 The Determinant 3 Multilinearity

By definition the determinant satisfies T (v1) ∧ T (v2) ∧ T (v3) = (det T )(v1 ∧ v2 ∧ v3) for any 3 3 vectors v1, v2, v3 ∈ F . In particular if {e1, e2, e3} is the standard basis for F , then using the exterior product relations, we compute

T (e1 ∧ e2 ∧ e3) = (e1 + 2e2 + 3e3) ∧ (3e1 + 2e2 + e3) ∧ (e1 + 3e2 + 2e3)

= (e1 ∧ 2e2 ∧ 2e3) + (e1 ∧ e3 ∧ 3e2) + (2e2 ∧ 3e1 ∧ 2e3)

+ (2e2 ∧ e3 ∧ e1) + (3e3 ∧ 3e1 ∧ 3e2) + (3e3 ∧ 2e2 ∧ e1)

= 4(e1 ∧ e2 ∧ e3) − 3(e1 ∧ e2 ∧ e3) − 12(e1 ∧ e2 ∧ e3)

+ 2(e1 ∧ e2 ∧ e3) + 27(e1 ∧ e2 ∧ e3) − 6(e1 ∧ e2 ∧ e3)

= 12(e1 ∧ e2 ∧ e3). So det T = 12. It can be verified, by cofactor expansion for example, that det A = 12.

n A n In general, if T : F −→ F for an n × n matrix A = (aij), then X  X  X  T (e1) ∧ · · · ∧ T (en) = ai1ei ∧ ai2ei ∧ · · · ∧ ainei ! X t(σ) = (−1) a1σ(1) ··· anσ(n) (e1 ∧ · · · ∧ en).

σ∈Sn Proposition 3.5.5. If T : V → V is a linear transformation and B is a basis of V , then det T defined in this section coincides with the classic determinant det A, where A = [T ]B. Example 3.5.6. Assume T : V → V is linear such that for W a subspace of V , T (W ) ⊆ W . Such a subspace is called a T -invariant subspace of V . We know that this induces a well-defined linear map

T : V/W −→ V/W v + W 7−→ T (v) + W.

We also get a restricted map

T |W : W −→ W w 7−→ T (w).

The standard matrix for T with respect to some basis B of V can be written in block form   [T |W ]A ∗ [T ]B =     0 T C where A is a basis of W and C is a basis of V/W . In fact, if A and C are given by

A = {w1, . . . , wm} and C = {vm+1 + W, . . . , vn + W } then the choice of basis that works in the above formula for [T ]B is

B = {w1, . . . , wm, vm+1, . . . , vn}.

44 3.5 The Determinant 3 Multilinearity

Furthermore, by Proposition 3.5.5, det T can be computed from this matrix: det T = det[T ]B. We know from introductory linear algebra that the determinant of a block upper triangular matrix is the product of the determinants along the diagonal, so      det T = det[T ]B = det [T |W ]A det T C = (det T |W ) det T . This will be useful in future computations.

Wouldn’t it be nice if we could find a subspace W so that the matrix above were not only block upper triangular but block diagonal? This question motivates the study of Jordan canonical form in Chapter 4.

45 4 Diagonalizability

4 Diagonalizability

Given a square matrix A (or equivalently a transformation T : V → V ), our goal in this −1 chapter is to find an invertible matrix P (or a basis of V ) so that P AP (or [T ]B) can be written in block form  J  λ1,n1 0  J   λ2,n2   ..   0 .  Jλk,nk

where each block Jλi,ni is the ni × ni matrix   λi 1 0  λ 1   i   .. ..   . .     0 λi 1  λi

with 1’s along the superdiagonal and λi an eigenvalue of T . This is called the Jordan canonical form of a matrix and each Jλi,ni is called a Jordan block. We will see that, up to reordering of the blocks, there is only one matrix in the similarity class of A of this form. Example 4.0.7. The following matrix is in Jordan form. 2 1 0 0 0 0 2 0 0 0   0 0 3 0 0   0 0 0 3 1 0 0 0 0 3

To fully describe this canonical form, we will

ˆ Find the λi’s that define the blocks. These will be eigenvalues of T .

ˆ Determine the size of each block (the ni) and the total number of blocks (k). ˆ Produce a basis (equivalently, a matrix P ) that makes all of this work.

4.1 Eigenvalues and Eigenvectors

Definition. Let T : V → V be a linear transformation. An element λ ∈ F is called an eigenvalue of T if there is a nonzero vector v ∈ V such that T (v) = λv. If such a vector v exists, it is called an eigenvector of T . As we know from introductory linear algebra, this definition can be reformulated as T (v) − λv = 0 ⇐⇒ (T − λ)v = 0, where T − λ is shorthand for T − λ · idV . If v is a nonzero vector satisfying this equation, then T − λ must not be an invertible transformation. Hence det(T − λ) = 0.

46 4.1 Eigenvalues and Eigenvectors 4 Diagonalizability

Definition. Let λ be an indeterminate () in the field F. Then det(T − λ) is a polynomial in λ, called the characteristic polynomial of T . It has degree equal to dim V and is denoted χT (λ). The equation χT (λ) = 0 is called the characteristic equation of T . Example 4.1.1. Consider the transformation

T : P2[x] −→ P2[x] df f 7−→ + f(2). dx

To compute χT (λ) we use exterior products: ^  3(T − λ) (1 ∧ x ∧ x2) = (T − λ)(1) ∧ (T − λ)(x) ∧ (T − λ)(x2) = (1 − λ) ∧ (3 − λx) ∧ (2x + 4 − λx2) = (1 − λ)(−λ)2(1 ∧ x ∧ x2).

2 So χT (λ) = (1 − λ)λ . For this T , the eigenvalues are λ = 0 and 1.

This is how we obtain the λi’s in the Jordan blocks for Jordan canonical form. The set of eigenvalues of a transformation has a special name. Definition. The set of eigenvalues of a linear transformation T is called the spectrum of T , denoted σ(T ). We will sometimes denote a linear transformation T : V → V by (V,T ), called a linear system.

Definition. Given a linear system (V,T ) and a scalar λ0 ∈ F, define

Eλ0 (T ) = {v ∈ V : T (v) = λ0v}

to be the set of all eigenvectors for the eigenvalue λ0 of T . Eλ0 (T ) is called the eigenspace of T for the eigenvalue λ0.

Remark. Note that Eλ0 (T ) = ker(T − λ0) so eigenspaces are subspaces of V .

Definition. The generalized eigenspace of T for an eigenvalue λ0 is the set

k Kλ0 (T ) = {v ∈ V | (T − λ0) (v) = 0 for some k}.

These Kλ0 are not immediately recognizable as the kernel of a transformation as easily as the regular eigenspaces. However, if dim V = n, there will exist a number k between 1 k and n such that Kλ0 (T ) = ker(T − λ0) . So if a vector v doesn’t show up in the kernel of k any (T − λ0) for 1 ≤ k ≤ n, then v 6∈ Kλ0 (T ).

A Example 4.1.2. Consider T : F3 −→ F3 where 0 1 0 A = 0 0 1 0 0 0

47 4.1 Eigenvalues and Eigenvectors 4 Diagonalizability

3 Then χT (λ) = −λ by our usual calculation of the characteristic polynomial, so we see that λ = 0 is the only eigenvalue of T . In particular, E0(T ) = ker A = Span{e1}. We also have 0 0 1 0 0 0 2 3 A = 0 0 0 and A = 0 0 0 0 0 0 0 0 0

2 3 3 so ker A = Span{e1, e2} and ker A = Span{e1, e2, e3} = F . This tells us that the general- 3 ized eigenspace K0(T ) is all of F . A Example 4.1.3. Let’s do the same thing with T : F3 −→ F3 for 2 0 0 A = 0 3 1 0 0 3

2 The characteristic polynomial is χT (λ) = (2 − λ)(3 − λ) so 2 and 3 are the eigenvalues of T . 0 0 0 −1 0 0 A − 2 = 0 1 1 and A − 3 =  0 0 1 0 0 1 0 0 0

Notice that ker(A − 2) = Span{e1} and since the lower right block of A − 2 is invertible, this kernel will never get bigger with larger powers of A − 2. Thus K2 = Span{e1}. On the 2 other hand, ker(A − 3) = Span{e2} and ker(A − 3) = Span{e2, e3} which is stable. Hence K3 = Span{e2, e3}. Notice that K2 ∩ K3 = {0}. This will hold in general for the set of generalized eigenspaces of a linear system.

Definition. Let λ0 be an eigenvalue of the linear system (V,T ). The geometric multiplic-

ity of λ0 is dim Eλ0 (T ), i.e. the maximal size of a linearly independent set of λ0-eigenvectors.

Definition. The algebraic multiplicity of λ0 is the order of λ0 as a root of χT (λ), that k is, if χT (λ) = (λ − λ0) p(λ) where p(λ0) 6= 0, then the algebraic multiplicity of λ0 is k. 2 0 0 2 Example 4.1.4. Using the same matrix A = 0 3 1 with χT (λ) = (2−λ)(3−λ) , we see 0 0 3

that gm2(T ) = 1 = am2(T ). However for the eigenvalue 3, these are not equal: gm3(T ) = 1

and am3(T ) = 2. It turns out that amλ0 (T ) = dim Kλ0 (T ).

Theorem 4.1.5. For any linear system (V,T ) with eigenvalue λ0 ∈ F, gmλ0 (T ) ≤ amλ0 (T ).

Proof. If we can prove that T (Eλ0 ) ⊆ Eλ0 (i.e. Eλ0 is T -invariant) then we will get a linear map

T : V/Eλ0 −→ V/Eλ0 .

To see that Eλ0 is T -invariant, suppose v ∈ Eλ0 . Then T (v) = λ0v ∈ Eλ0 since eigenspaces are subspaces of V . With T in hand, we know that "h i # T |E ∗ [T ] = λ0 A B   0 T C

48 4.1 Eigenvalues and Eigenvectors 4 Diagonalizability

where A is a basis of Eλ0 , C is a basis of V/Eλ0 and B is the basis of V built out of A and C. It is clear that   λ0 0 h i  λ0  T | =   Eλ0  ..  A  0 .  λ0

Now we compute the characteristic equation of the block upper triangular matrix [T ]B, since it is equal to χT (λ). Since the matrix is block triangular, the determinants multiply so we have gmλ χT (λ) = (λ0 − λ) 0 · χT (λ). Therefore even without knowing what T  looks like, we can see that gm ≤ am . C λ0 λ0

The most interesting case of course is when gmλ0 = amλ0 . We will see that the definition below is equivalent to this equality.

Definition. A linear system (V,T ) is diagonalizable if there is a basis of V consisting of eigenvectors of T .

The reason this is called diagonalizability is that if B is such a basis, [T ]B will be a diagonal matrix. Of course not every matrix is diagonal, and therefore not every linear transformation is diagonalizable. The Jordan canonical form is in a rough sense the ‘most diagonal’ we can make a matrix. If the linear system corresponding to the matrix is diagonalizable then the Jordan canonical form, which corresponds to [T ]B, will be diagonal, i.e. every Jordan block is 1 × 1.

Theorem 4.1.6. (V,T ) is diagonalizable if and only if χT (λ) splits completely over F into irreducible linear factors and gmλ0 (T ) = amλ0 (T ) for every eigenvalue λ0 of T .

Proof. ( =⇒ ) If (V,T ) is diagonalizable then choose a basis B = {v1, . . . , vn} where the vi are eigenvectors of T . Let λi be the eigenvalue for vi, i.e. T (vi) = λivi. Then   λ1 0  λ   2  [T ]B =  ..   0 .  λn so we see that the characteristic polynomial of T is χT (λ) = (λ − λ1)(λ − λ2) ··· (λ − λn) which shows that χT (λ) splits completely. Furthermore, by definition

amλi (T ) = #{λj | λj = λi}

= dim ker([T ]B − λiI)

= dim Eλi (T ) = gmλi (T ). This proves the second statement.

49 4.2 Minimal Polynomials 4 Diagonalizability

( ⇒ = ) We will prove shortly that eigenvectors from different eigenspaces are linearly independent. Then χT (λ) splitting completely implies that X amλ0 (T ) = dim V = n.

λ0∈σ(T ) P By assumption gmλ0 (T ) = n as well so the first comment shows that we can pick a basis for V out of eigenvectors in σ(T ). Hence T is diagonalizable.

4.2 Minimal Polynomials

We recall some standard facts about polynomials in F[x]. First, F[x] is an F-algebra since it has a natural multiplication in addition to its vector space structure. F[x] is an example of a principal domain (PID) meaning every ideal I ⊂ F[x] may be generated by a single polynomial f, written I = (f). F[x] is also a unique factorization domain (UFD) which means that for every f ∈ F[x] there exist irreducible polynomials p1, . . . , pk ∈ F[x] and positive integers n1, . . . , nk such that n1 nk f = p1 ··· pk and this factorization is unique up to reordering of the pi and multiplication by a unit. In general, a PID is also a UFD but the converse is false. F[x] further has the division property.

Proposition 4.2.1 (The Division Algorithm). Given f, g ∈ F[x], there exist unique polyno- mials q, r ∈ F[x] with deg r < deg g such that f = qg + r.

This makes F[x] into a Euclidean domain. In general, a Euclidean domain is also a PID which further implies it is a UFD. Euclidean domains, and in particular F[x], have greatest common divisors.

Definition. Given f, g ∈ F[x], their greatest common divisor is a polynomial d ∈ F[x] such that if k | f and k | g then k | d. This is denoted gcd(f, g) = d.

Example 4.2.2. The gcd of (x − 3)3(x − 2)(x − 1) and (x − 4)(x − 5)2(x − 3)(x − 2)2 is the largest factor shared by both polynomials, which is (x − 3)(x − 2).

Theorem 4.2.3. Given f, g ∈ F[x], there exist p, q ∈ F[x] such that pf + qg = gcd(f, g). In particular this shows that if d = gcd(f, g) then the ideals generated by these elements satisfy (f, g) = (f) + (g) = (d).

Definition. Two polynomials f and g are relatively prime if gcd(f, g) = 1.

By Theorem 4.2.3, f and g are relatively prime ⇐⇒ there exist p, q ∈ F[x] such that pf + qg = 1 ⇐⇒ (f) + (g) = F[x]. One last fact we will need classifies the irreducibles over R[x] and C[x].

50 4.2 Minimal Polynomials 4 Diagonalizability

Theorem 4.2.4. The irreducible elements R[x] are (up to a unit):

ˆ x − α for α ∈ R.

ˆ x2 + bx + c where b, c ∈ R and b2 − 4c < 0.

The irreducible elements of C[x] (up to a unit once again) are just the linear polynomials x − α, α ∈ C. One of the most important linear transformations in this course is the evaluation ho- momorphism

evT : F[x] −→ End(V ) p(x) 7−→ p(T ).

2 k Explicitly, if p(x) = a0 + a1x + a2x + ... + akx then the image of p(x) under “evaluation 2 k i at T ” is a0idV + a1T + a2T + ... + akT , where T = T ◦ T ◦ · · · ◦ T . Note that evT is | {z } i times an F-algebra homomorphism since if p and q are polynomials, evT (pq) = evT (p) ◦ evT (q). Therefore evT (F[x]) is a commutative subalgebra of the (sometimes very) noncommutative End(V ).

Definition. The image im evT is written F[T ]. This is the smallest subalgebra of End(V ) containing T .

On the other hand, the kernel of evT is an ideal of F[x] and since F[x] is a PID, there is some polynomial p ∈ F[x] such that ker evT = (p). Since F is a field, we may scale p to be monic; that is, the coefficient of the highest power of x in p(x) is 1.

Definition. The minimal polynomial of T , denoted pT (x), is the unique monic polynomial such that ker evT = (pT ).

Equivalently, pT is the monic polynomial of least degree such that pT (T )(v) = 0 for all v ∈ V .

A Example 4.2.5. Let T : F2 −→ F2 where 2 0 A = . 0 3

A polynomial in ker evT is p(x) = (x − 2)(x − 3) since

0 0 −1 0 0 0 p(A) = (A − 2I)(A − 3I) = = . 0 1 0 0 0 0

The only monic polynomials dividing p(x) are x − 2 and x − 3, but the above expression shows that neither of these are in the kernel of the evaluation map at T . Therefore p(x) = (x − 2)(x − 3) is the minimal polynomial of T .

51 4.2 Minimal Polynomials 4 Diagonalizability

Theorem 4.2.6 (Cayley-Hamilton). If χT (λ) is the characteristic polynomial of a linear system (V,T ), then χT (x) lies in the kernel of the evaluation homomorphism at T , i.e. χT (T ) = 0. We will next focus on developing the required results to prove this theorem (this will be done in Section 4.3). In general, minimal polynomials are not as easy to compute as in the example above (e.g. try any non-triangular matrix). We will define another type of minimal polynomial that is easier to compute. For a linear system (V,T ) and a vector v ∈ V , define the transformation

evT,v : F[x] −→ V p(x) 7−→ p(T )(v).

This is no longer an algebra homomorphism since in particular V is not an algebra. However the kernel of evT,v is still an ideal of F[x] and hence principal. We will denote the monic generator of ker evT,v by pT,v(x). As with pT (x), this polynomial is unique. 2 0 Example 4.2.7. Take A = as in the previous example and let V = 2 have the 0 3 F standard basis {e1, e2}. Let’s calculate pA,e1 (x). Before we saw that

0 0 A − 2I = 0 1

so (A − 2I)e1 = 0. Because this is monic (and irreducible since it’s linear), pA,e1 (x) = x − 2.

Likewise, we see that pA,e2 (x) = x − 3. Now let v = e1 + e2. The following technique will work with larger (finite) bases to produce minimal polynomials for linear combinations of the basis elements. Let’s build a linearly independent set out of

1 2 4 {e + e ,T (e + e ),T 2(e + e )} = , , . 1 2 1 2 1 2 1 3 9

2 Since we’re in dimension 2, this set must be linearly dependent so T (e1 + e2) can be written as a linear combination of the other two vectors:

T 2(v) = −6v + 5T (v) =⇒ T 2(v) − 5T (v) + 6v = 0 2 =⇒ (T − 5T + 6idV )(v) = 0.

2 2 Hence pT,v(x) = x − 5x + 6. Notice that x − 5x + 6 = (x − 2)(x − 3) so something else is going on here.

52 4.3 Cyclic Linear Systems 4 Diagonalizability

4.3 Cyclic Linear Systems

We now discuss a type of linear system for which the minimal polynomial pT (x) is equal to the minimal polynomial pT,v(x) for some v ∈ V . Definition. Let (V,T ) be a linear system with dim V = n. Define the span of v ∈ V with respect to T to be

k Span(T,V ) := {a0v + a1T (v) + ... + akT (v) | k ≥ 0, ai ∈ F} = Span{v, T (v),T 2(v),...,T n−1(v)}. Notice that Span(T, v) is a subspace of V – it’s defined as a span – and moreover it is a T - invariant subspace. In fact, Span(T, v) is the smallest T -invariant subspace of V containing the vector v. Definition. A linear system (V,T ) is cyclic if there is a v ∈ V such that Span(T, v) = V . If such a vector exists, it is called a generator of the system. In the language of modules (remember that vector space is merely the name for a module over a field), Span(T, v) is the cyclic submodule of V generated by v. The first example below shows that there are non-cyclic linear systems.

Example 4.3.1. The identity transformation T = idV with standard matrix the identity matrix   1 0  1    In =  ..   0 .  1 does not form a cyclic system on V for n ≥ 2. This is because for any v ∈ V , Span(idV , v) = Span(v) which is one-dimensional. Example 4.3.2. Consider the linear system (F3,A) where 3 1 0 0 3 1 0 0 3

3 (Notice that this is in Jordan normal form.) Let e3 be the 3rd standard basis vector for F . Then we see that

2 Span(A, e3) = Span{e3, Ae3,A e3} = Span{e3, 3e3 + e2, 9e3 + 6e2 + e1}.

3 3 The latter equals F so this shows that (F ,A) is cyclic and a generator is e3. Example 4.3.3. As in Example 4.2.7, consider (F2,A), where 2 0 A = 0 3

2 Then Span(A, e1 + e2) = Span{e1 + e2, 2e1 + 3e2}. This is a basis for F , so e1 + e2 cyclically generates (F2,A).

53 4.3 Cyclic Linear Systems 4 Diagonalizability

This last example generalizes: if A is diagonal with n distinct eigenvalues, let {v1, . . . , vn} be a set of linearly independent eigenvectors of (Fn,A) (e.g. one for each eigenvalue – we will see shortly that this must be the case). Using the Vandermond determinant, one can n show that the sum of the vi is a cyclic generator of (F ,A). Theorem 4.3.4. Let (V,T ) be a linear system and for some v ∈ V , let k be the small- est positive integer such that T k(v) ∈ Span{v, T (v),T 2(v),...,T k−1(v)}; write T k(v) = k−1 X j − bjT (v). Then the minimal polynomial pT,v is given by j=0

k−1 k X j pT,v(x) = x + bjx . j=0

Proof. Nearly identical to the work in Example 4.2.7. As an immediate consequence, we have

Corollary 4.3.5. pT,v is a polynomial of degree dim V ⇐⇒ v is a cyclic generator of (V,T ).

Notice that by definition, for any v ∈ V the space Span(T, v) is a cyclic subspace of V generated by v. It is a common technique, as the next proposition shows, to establish properties for cyclic subobjects (subgroups, submodules, subspaces, etc.) and then relate that information to the parent object to prove a desired theorem.

Proposition 4.3.6. Let (V,T ) be a linear system and define W = Span(T, v) for a vector v ∈ V . Then pTW (x) = pT,v(x). In particular, if (V,T ) is cyclic then pT (x) = pT,v(x) for any generator v of V .

Proof. It suffices to show that pTW (x) and pT,v(x) divide each other. One direction is easy: by definition pTW (x) is the monic polynomial of least degree such that pTW (T ) is the zero operator on W , but v ∈ W so pTW (T )(v) = 0. This means pTW (x) is in the ideal ker evT,v, and so pT,v divides the generator of this ideal, pTW . Going the other direction, we need to show that pT,v(T ) is the zero operator on W . Recall k−1 i that W = Span{v, T (v),...,T (v)}. Then it’s enough to show that pT,v(T )(T (v)) = 0 for some i since then pT,v(T ) will be 0 on all of W . By associativity,

i  i  pT,v(T )(T (v)) = (x pT,v(x)(T ) (v)

and the right side is 0 since ker evT,v is an ideal of F[x]. Hence pTW (x) divides pT,v(x) and we conclude that they are equal.

Lemma 4.3.7. Let p(x) ∈ F[x] be a polynomial and (V,T ) a linear system. Then p(T ) = 0 ⇐⇒ pT,v(x) divides p(x) for all v ∈ V .

54 4.3 Cyclic Linear Systems 4 Diagonalizability

Proof. ( =⇒ ) If p(T ) = 0 then p(T )(v) = 0 so p(x) ∈ ker evT,v. This is an ideal generated by pT,v so pT,v divides p. ( ⇒ = ) If pT,v(x) divides p(x), write p(x) = g(x)pT,v(x) where g(x) is not divisible by pT,v(x). Then for any v,

p(T )(v) = [g(T ) ◦ pT,v(T )](v) = g(T )(pT,v(T )(v)) = g(T )(0) = 0.

Since v was arbitrary, p(T ) must be the zero operator.

Definition. Let p(x) be a monic polynomial in F[x]:

m−1 m X j p(x) = x + bjx . j=0 The companion matrix for p is the m × m matrix   0 0 · · · −b0 1 0 · · · −b   1  0 1 · · · −b  C(p) =  2  . . .. .  . . . .  0 0 · · · −bm−1

Note that if V is an m-dimensional vector space with basis B = {e0, . . . , em−1} then the linear transformation

S : V −→ V  ei+1 for 0 ≤ i < m − 1 m−1 ei 7−→ P − bjej for i = m − 1  j=0

m−1 m X j has standard matrix [S]B = C(p), where p(x) = x + bjx . Furthermore, if (V,T ) is j=0 m−1 m X j cyclic with generator v and pT (x) = pT,v(x) = x + bjx , then j=0

[T ]B = C(pT ) = C(pT,v) where B = {v, T (v),...,T m−1(v)}. The main benefit to using companion matrices is that it’s very easy to compute eigenvalues:

−λ 0 · · · −b0

1 −λ · · · −b 1 0 1 · · · −b det(C(p) − λ) = 2 ......

0 0 · · · −bm−1 − λ

55 4.3 Cyclic Linear Systems 4 Diagonalizability

Notice that the upper right block is −λ times another (smaller) companion matrix. This allows for easy cofactor expansion. If we write p(x) = xq(x) + b0, this process produces the following formula: m χC(p)(λ) = −λχC(q)(λ) + (−1) b0.

C(p) Theorem 4.3.8. For the linear transformation V −−→ V , its characteristic polynomial is m−1 m m X j given by χC(p) = (−1) p where p(x) = x + bjx . In particular, if (V,T ) is cyclic then j=0

dim V pT (x) = (−1) χT (x).

Proof. We induct on deg p. If deg p = 1 then p(x) = x − b and the companion matrix is just 1 the 1 × 1 matrix [b]. Then χC(p)(x) = (b − x) = (−1) p(x) so the base case holds. Now write p(x) = xq(x) + b0. Since q(x) is monic of strictly smaller degree then p(x), the inductive hypothesis gives us

m χC(p)(x) = −xχC(q)(x) + (−1) b0 m−1 m = (−x)(−1) q(x) + (−1) b0 m = (−1) (xq(x) + b0) = (−1)mp(x).

The second statement follows easily from this fact. The converse of the second statement is true as well:

dim V Theorem 4.3.9. χT (x) = (−1) pT (x) ⇐⇒ (V,T ) is cyclic. Proof omitted. We have thus proven the Cayley-Hamilton Theorem (4.2.6) for cyclic linear systems. We now use a clever trick to prove the result for any linear system. First we restate the theorem.

Theorem (4.2.6). For any linear system (V,T ), χT (T ) = 0.

Proof. We may assume (V,T ) is not cyclic. By Lemma 4.3.7, it suffices to show that pT,v(x) divides χT (x) for any v ∈ V . Let v ∈ V and set W = Span(T, v). Since V is not cyclic, W is a proper T -invariant subspace of V . As before, this gives us two linear transformations

T : W −→ W T : V/W −→ V/W W and w 7−→ T (w) v + W 7−→ T (v) + W.

By Section 4.1, χT (x) = χTW (x)χT (x). Moreover, since W is cyclic, χTW (x) = ±pT,v(x) for its generator v. Hence χT (x) is a multiple of pT,v(x) for each v.

We can prove a more precise statement about χT (x).

Proposition 4.3.10. Every prime factor of χT (x) divides the minimal polynomial pT,v(x) for some v ∈ V .

56 4.3 Cyclic Linear Systems 4 Diagonalizability

Proof. We prove this by induction again. If (V,T ) is cyclic, there’s nothing new to prove since Theorem 4.3.9 shows that χT (x) = ±pT,v(x) where v is a generator of (V,T ). Otherwise,

let v ∈ V and set W = Span(T, v) ( V . As above, χT (x) = χTW (x)χT (x). Take a

prime factor f(x) of χT (x). If f(x) divides χTW (x), we’re done since W is cyclic, meaning

χTW (x) = ±pT,v(x). Otherwise f(x) must divide χT (x). Since V/W has dimension strictly less than dim V , the inductive hypothesis says there exists a vectorv ¯ ∈ V/W such that f(x)

divides pT,v¯(x). It’s easy to prove that pT,v¯(x) divides pT,v(x) for any vector v, which finishes the proof.

Example 4.3.11. If χT (x) = f1(x) ··· fr(x) where the fi(x) are distinct and prime, then χT (x) = ±pT (x). In particular, Theorem 4.3.9 tells us that (V,T ) is cyclic in this case.

Lemma 4.3.12. Suppose S and T are linear transformations on V so that S + T = idV . Then (1) ST = TS.

(2) ker S ∩ ker T = 0. If in addition ST = 0, then (3) S2 = S and T 2 = T (they are idempotent).

(4) V = ker S ⊕ ker T . Finally, if V is finite dimensional, (5) im S = ker T and im T = ker S.

2 Proof. First assume S + T = idV . Then S(S + T ) = S + ST = S idV = S. Similarly, multiplying on the right by S gives S2 + TS = S. Subtracting S2 from both shows that ST = TS, proving (1). To show (2), let v ∈ ker S ∩ ker T . Then

v = idV (v) = (S + T )(v) = S(v) + T (v) = 0 + 0 = 0.

Hence ker S ∩ ker T = 0. Now assuming ST = 0, (3) follows immediately from the proof of (1). To show (4), let v ∈ V . Then v = S(v)+T (v) by the above, but since ST = TS = 0, we see that S(v) ∈ ker T and T (v) ∈ ker S. Hence V = ker S ⊕ ker T . For the final statement (5), note that im S ⊆ ker T by ST = 0. By the rank-nullity theorem, dim im S + dim ker S = dim V and by (4), dim ker S = dim V − dim ker T . This implies dim im S = dim ker T and so they must be equal. The proof of the other equality is identical. Note that (4) and (5) of Lemma 4.3.12 together say that S and T are ‘projections’ of V onto im S and im T , respectively. This has a key connection to the study of minimal polynomials. Lemma 4.3.13. Let (V,T ) be a linear system and suppose f(x) and g(x) are relatively prime. Then ker f(T ) ∩ ker g(T ) = 0, and if f(T )g(T ) = 0 then V = ker f(T ) ⊕ ker g(T ).

57 4.3 Cyclic Linear Systems 4 Diagonalizability

Proof. Since f(x) and g(x) are relatively prime, there exist polynomials p(x), q(x) ∈ F[x] so that we can write pf + qg = 1. Plugging in T , we see that

p(T )f(T ) + q(T )g(T ) = idV

so Lemma 4.3.12 says that ker(p(T )f(T )) ∩ ker(q(T )g(T )) = 0. This implies ker f(T ) ∩ ker g(T ) = 0 since ker f(T ) ⊆ ker(p(T )f(T )) and ker g(T ) ⊆ ker(q(T )g(T )). Moreover, (4) and (5) of Lemma 4.3.12 implies that when f(T )g(T ) = 0,

V = ker(p(T )f(T )) ⊕ ker(q(T )g(T )).

Note that for any v ∈ V , v = (p(T )f(T ))(v) + (q(T )g(T ))(v) and since everything commutes by hypothesis, (p(T )f(T ))(v) ∈ ker g(T ) and (q(T )g(T ))(v) ∈ ker f(T ). This shows that V = ker f(T ) + ker g(T ) and since we showed that ker f(T ) ∩ ker g(T ) = 0, the sum is direct.

Theorem 4.3.14. For a linear system (V,T ), write its minimal polynomial pT (x) as a product of distinct, monic, irreducible factors:

Y ni pT (x) = fi(x) for ni ≥ 1.

L ni Then V = ker fi(T ) . Proof. Distinct, monic, irreducible polynomials are relatively prime so here we can apply Lemma 4.3.13 and induct on the number of fi. This type of factorization of polynomials, called invariant factor decomposition, corre- sponds to a canonical matrix form called Rational Canonical Form. Recall that for a monic m−1 m X j polynomial p(x) = x + bjx the companion matrix for p is j=0   0 0 · · · −b0 1 0 · · · −b   1  0 1 · · · −b  C(p) =  2  . . .. .  . . . .  0 0 · · · −bm−1

Theorem 4.3.15 (Rational Canonical Form). For a linear system (V,T ) (where V is finite dimensional), there exists a basis for V such that the standard matrix of T has the form   C1 0  C   2   ..   0 .  Cm where Ci is the companion matrix for fi(x) in the invariant factor decomposition of the minimal polynomial pT (x) of (V,T ).

58 4.4 Jordan Canonical Form 4 Diagonalizability

4.4 Jordan Canonical Form

The most interesting case of the invariant factor decomposition for pT (x) is when each fi is linear, and can be written fi(x) = x − λi for an eigenvalue λi. Our goal is to find a basis for ni each ker fi so that the standard matrix for T restricted to each kernel has the form J  λi,m1 0 h i  Jλi,m2  T | ni =   ker fi  ..   0 .  Jλi,mk

where each Jλi,mj is an mj × mj Jordan block for λi. It turns out that ni = max{mj}j=1,...,k, that is, ni is the largest size of a Jordan block for λi. We will also see that k, the number

of Jordan blocks for λi, is equal to dim Eλi (the dimension of the eigenspace for λi) and k X ni mj = dim Kλi , i.e. Kλi = ker fi(T ) . j=1 ni The following algorithm produces a basis for ker(T − λi) .

(1) For each eigenvalue λ0, compute bases of the following spaces

im(T − λ0) Eλ0 2 im(T − λ0) im(T − λ0) ∩ Eλ0 . . . . p p−1 im(T − λ0) im(T − λ0) ∩ Eλ0

until the dimensions are equal across a row. Then p will be the power of λ0 −x appearing in the minimal polynomial pT (x) for (V,T ), and as discussed above, the size of the largest Jordan block for λ0.

p−1 (2) Let S1 = {x1, . . . , xk1 } be a basis of im(T −λ0) ∩Eλ0 . The value k1 will be the number p−2 of p × p Jordan blocks for λ0. Complete S1 to a basis of im(T − λ0) ∩ Eλ0 :

{x1, . . . , xk1 , xk1+1, . . . , xk2 }.

Then k2 is the number of Jordan blocks for λ0 of size ≥ p − 1. For each x ∈ S1, find 0 p−2 0 x ∈ im(T − λ0) such that (T − λ0)x = x. Putting this together, denote

0 0 S2 = {x1, . . . , xk1 , xk1+1, . . . , xk2 }.

p−3 (3) Next complete {x1, . . . , xk2 } to a basis of im(T − λ0) ∩ Eλ0 :

{x1, . . . , xk2 , xk2+1, . . . , xk3 }

0 p−3 0 and lift each x ∈ S2 to some x ∈ im(T − λ0) such that (T − λ0)x = x, giving

00 00 0 0 S3 = {x1, . . . , xk1 , xk1+1, . . . , xk2 , xk2+1, . . . , xk3 }. Repeat this process of completing a basis and lifting all the previous elements to produce Sp. The size of Sp will correspond to the total number of Jordan blocks for λ0

59 4.4 Jordan Canonical Form 4 Diagonalizability

(4) Finally, concatenate the Si to obtain a basis for Kλ0 ; after reordering, this should look like 0 00 (p−1) 0 00 (p−1) {x1, x1, x1, . . . , x1 , x2, x2, x2, . . . , x2 ,...}. Example 4.4.1. Consider the matrix 3 2 1 2 0 2 3 2 A =   0 0 3 4 0 0 0 2 Since A is upper triangular, we see that it has two eigenvalues, λ = 2 and 3. The character- istic polynomial is computed to be

2 2 χT (λ) = (2 − λ) (3 − λ) . This actually gives us all the information we need to write the Jordan canonical form for A: 2 1 0 0 0 2 0 0 J =   0 0 3 1 0 0 0 3

However, to make full use of the equation A = PJP −1 we need to compute P , which can be obtained by the algorithm above. For λ0 = 2, we compute a basis of 1 2 1 2 0 0 3 2 A − 2I =   0 0 1 4 0 0 0 0

 1   1   2   −2  0 3 2 1 to be 0 , 1 , 4 and a basis for E2 to be 0 . Likewise, a basis for 0 0 0 0 1 2 8 10 2 0 0 3 12 (A − 2I) =   0 0 1 4  0 0 0 0

 1   8  0 3 is 0 , 1 . Notice that the dimension changed, so p ≥ 2 but we know p ≤ 2 since 2 0 0 is the exponent of 2 − x in the characteristic polynomial, of which pT (x) is a divisor. This tells us that there will be the size of the largest Jordan block for λ0 = 2. Finally we compute a basis of im(A − 2I) ∩ E2 by solving 1 1 2 0 a 0 3 2 0 (A − 2I)   b =   0 1 4   0   c   0 0 0 0

60 4.4 Jordan Canonical Form 4 Diagonalizability

Notice that we only take the column vectors in the second matrix corresponding to the basis  20  −10 of im(A − 2I) found above. This yields the basis {x1} = 0 of im(A − 2I) ∩ E2 0 which we called S1 in the algorithm. This is already a basis of E2, so we just need to lift x1 0 0 to some x1 such that (A − 2I)x1 = x1. This is obtained from solving the system above for  22  0 0 (a, b, c), which yields x1 = −4 . Thus our basis of K2 is 1  20   22    0 −10  0  {x1, x1} =   ,   .  0  −4  0 1  Now note that

(A − 2I)x1 = 0 ⇐⇒ Ax1 = 2x1 0 0 0 and (A − 2I)x1 = x1 ⇐⇒ Ax1 = 2x1 + x1.

So the standard matrix for T restricted to the generalized eigenspace for λ0 = 2 is 2 1 [T |K2 ]{x ,x0 } = , 1 1 0 2 the first Jordan block. We now repeat the same procedure for λ0 = 3. Since 0 2 1 2  0 −1 3 2  A − 3I =   0 0 0 4  0 0 0 −1 we see that a basis for im(A − 3I) is  2  1  2    −1 3  2    ,   ,    0  0  4   0 0 −1 

 1  0 2 and a basis for E3 = ker(A − 3I) is 0 . Note that a basis for im(A − 3I) must be 0 rank 2 since E3 is one-dimensional and therefore the Jordan block for λ0 = 3 must again be  7  0 2 × 2. One can compute a basis for im(A − 3I) ∩ E3 to be S1 = {y1} = 0 by solving 0 a linear system as before. Since this is already a basis of E3, we just need to lift y1 to some 0 0 y1 satisfying (A − 3I)y1 = y1. We do this by solving 0 2 1 2  7 0 −1 3 2  0 0   y =   0 0 0 4  1 0 0 0 0 −1 0

61 4.4 Jordan Canonical Form 4 Diagonalizability

 0  0 3 This yields y1 = 1 so our basis for K3 is 0

7 0   0 3 S2 =   ,   0 1  0 0 

Putting this all together, we have

 20 22 7 0 2 1 0   20 22 7 0−1 −10 0 0 3 0 2 0 0 −10 0 0 3 A =        0 −4 0 1 0 0 3 1  0 −4 0 1 0 1 0 0 0 0 0 3 0 1 0 0

This expression, A = PJP −1, is the Jordan decomposition of A. It is more useful than just knowing J since P contains the change-of-basis information necessary for switching between A and J. Moreover, since dim K2 and dim K3 were both 2, this shows that χT (x) really is the minimal polynomial for (F4,T ). Example 4.4.2. Let’s see what happens when the multiplicities don’t work out quite so nicely. Let T : F4 → F4 be a linear transformation with standard matrix  9 0 0 1  −1 8 0 −1 A =    0 1 7 0  0 1 −1 8

4 3 One may compute χT (λ) = (8 − λ) and pT (x) = (x − 8) , so we can already see that the Jordan form of A will be 8 1 0 0 0 8 1 0 J =   0 0 8 0 0 0 0 8 Consider  1 0 0 1  −1 0 0 −1 A − 8I =    0 1 −1 0  0 1 −1 0 Then dim im(A − 8I) = 2 and by rank-nullity, dim ker(A − 8I) = 2 which is the number of

62 4.4 Jordan Canonical Form 4 Diagonalizability

Jordan blocks. We compute some bases: A basis for im(A − 8I) is easily seen to be

0 −1  1  0     1  0  −1 0 E8 = ker(A − 8I) = Span   ,   im(A − 8I) =   ,   1  0   0  1  0 1   0 1  −1 −1     2  1   1  im(A − 8I) = Span   im(A − 8I) ∩ E8 = Span    1   1   1   1  −1   3 2  1  im(A − 8I) = 0 im(A − 8I) ∩ E8 = Span    1   1 

Since im(A−8I)3 = 0, we know p = 3; in fact this shows that the matrix A−8I is nilpotent.  −1  1 Set S1 = {x1} = 1 . Since {x1} is already a basis of im(A − 8I) ∩ E8, there’s 1 0 0 nothing new to add but we still lift x by solving (A − 8I)x1 = x1 for some x1 ∈ im(A − 8I). 0 Note that simply solving the system (A−8I)x1 = x1 as in the last example doesn’t guarantee 0  0  that x1 will lie in im(A − 8I), so instead we write (A − 8I) (A − 8I)y1 = x1 and solve for 0 y1. This looks like  1 1 −1 1 −1 −1 −1 1 1 0  1    y =   −1 −1 1 0 1  1  −1 −1 1 0 1 0 Clearly y1 = −e1 will work, which gives us −1 0  1  x = (A − 8I)(−e1) =   1  0  0

 0   −1  0 00 1 0 The algorithm gives us S2 = {x1} and we obtain S3 = {x2, x1} = 1 , 0 af- 0 0 ter completing a lifting the previous basis. Putting this all together gives us the Jordan decomposition A = PJP −1, where

8 1 0 0 −1 −1 −1 0 0 8 1 0  1 1 0 1 J =   and P =   0 0 8 0  1 0 0 1 0 0 0 8 1 0 0 0

63 4.4 Jordan Canonical Form 4 Diagonalizability

Remark. Our algorithm shows that knowing

rank(A − λ0) 2 rank(A − λ0) 3 rank(A − λ0) . . p rank(A − λ0) for each distinct eigenvalue λ0 completely determines the Jordan canonical form of A. In other words, the list of these ranks is a list of all invariants of the linear transformation T : V −→A V .

64 5 Inner Product Spaces

5 Inner Product Spaces

5.1 Inner Products and Norms

For this chapter F will be either R or C. The proofs of results on inner product spaces will apply to both choices of F, and the notation will be sufficiently general to represent both. In general, the idea of this chapter is to make sense of length and angle for vectors in V , a vector space over F. Definition. Let V be a vector space over R. An inner product on V is a symmetric, positive definite, bilinear pairing h·, ·i : V × V → R; that is, for every u, v ∈ V and c ∈ R, (i) (Bilinear) hu, v + v0i = hu, vi + hu, v0i, hu, cvi = chu, vi, hu + u0, vi = hu, vi + hu0, vi and hcu, vi = chu, vi. (ii) (Symmetric) hu, vi = hv, ui. (iii) (Positive definite) hu, ui ≥ 0 and hu, ui = 0 ⇐⇒ u = 0. Examples.

1 Dot products on Rn (where n is assumed to be finite) are inner products: hu, vi = u · v.

2 Consider the infinite dimensional vector space V = C([0, 1], R), which consists of con- tinuous functions f : [0, 1] → R. An inner product on V is Z 1 hf, gi := f(x)g(x) dx. 0

Inner products on C-vector spaces have a slightly different definition. Definition. Let V be a C-vector space. An inner product on V is a Hermitian, positive definite, sesquilinear pairing h·, ·, i : V × V → C satisfying the following properties for all u, v ∈ V and c ∈ C: (i) (Sesquilinear) hu + u0, vi = hu, vi + hu0, vi and hcu, vi = chu, vi (the product is linear in the first component); hu, v + v0i = hu, vi + hu, v0i and hu, cvi =c ¯hu, vi where c¯ is the complex conjugate of c.

(ii) (Hermitian) hu, vi = hv, ui. In particular hu, ui ∈ R. (iii) (Positive definite) hu, ui ≥ 0 and hu, ui = 0 ⇐⇒ u = 0. To distinguish the two types of inner products we have defined, we will often refer to real inner products defined on R-vector spaces and complex inner products defined on C-vector spaces. Notice that every complex inner product restricts to a real inner product on R when R is viewed as a subspace of C.

65 5.1 Inner Products and Norms 5 Inner Product Spaces

Examples.

3 Notice that the regular dot product on V = Cn is not an inner product. For example, i · i = −1 < 0. To rectify this, we define an inner product by hu, vi := u · v¯, i.e. if u, v ∈ n then C     * u1 v1 + n  .   .  X  .  ,  .  = uiv¯i. i=1 un vn

4 In a similar fashion, we extend the dot product on C([0, 1], R) to V = C([0, 1], C) by Z 1 hf, gi = f(x)g(x) dx. 0

2 ∞ 5 Define ` (C) to be the space of sequences of complex numbers (zn)n=1 such that ∞ X 2 2 |zi| < ∞. Then an inner product on ` (C) is defined to be i=1

∞ X h(yn), (zn)i := yiz¯i. i=1

Definition. Let H be a vector space over F, where F = R or C. If there exists an inner product h·, ·, i on H then H is called an , denoted (H, h·, ·i).

Definition. Let H be an inner product space. For u ∈ H, the norm of u is defined to p be ||u|| := hu, ui. This defines a metric || · || : H → F and hence a function d(u, v) := ||u − v|| on H.

Remark. This shows that every inner product space is a normed vector space. The reverse holds, i.e. a normed linear space is also an inner product space, but only when || · || satisfies the parallelogram law.

Theorem 5.1.1 (Cauchy-Schwarz Inequality). Let H be an inner product space and take u, v ∈ H. Then |hu, vi| ≤ ||u|| ||v||.

Proof. If v = cu for a c, |hu, vi| = |c|hu, ui = |c| ||u||2 = ||cu|| ||u|| (this is actually true more generally when c ∈ C). On the other hand, if u and v are not real multiples of each other, then for all c ∈ R, hu + cv, u + cvi > 0. Let’s expand this using the definition of the norm:

0 < hu + cv, u + cvi = hu, ui + hu, cvi + hcv, ui + hcv, cvi = ||u||2 + chu, vi + chv, ui + c2||v||2   = ||u||2 + c hu, vi + hu, vi + c2||v||2 = ||u||2 + 2cRe(hu, vi) + c2||v||2.

66 5.1 Inner Products and Norms 5 Inner Product Spaces

This is a quadratic polynomial in c, and since it has no real roots, the discriminant must be negative: 4(Re(hu, vi))2 − 4||u||2||v||2 < 0. This implies |Re(hu, vi)| < ||u|| ||v|| which proves the inequality when h·, ·i is a real inner product. If h·, ·i is complex, choose ξ ∈ C such that |ξ| = 1 and |hu, vi| = hξu, vi. In other words the choice of ξ should be such that ξhu, vi rotates hu, vi down to the real axis of the complex plane:

hu, vi

ξ = e−iθ θ hξu, vi

Then |hu, vi| = hξu, vi = |Re(hξu, vi)| ≤ ||ξu|| ||v|| = ||u|| ||v|| since |ξ| = 1. This finishes the proof.

Proposition 5.1.2. Let H be an inner product space and u, v ∈ H.

(1) ||v|| ∈ R. (2) ||v|| ≥ 0 and ||v|| = 0 ⇐⇒ v = 0.

(3) ||cv|| = |c| ||v|| for all c ∈ C. (4) (Triangle inequality) ||u + v|| ≤ ||u|| + ||v||.

(5) (Parallelogram law) ||u + v||2 + ||u − v||2 = 2(||u||2 + ||v||2).

Proof. (1) – (3) follow easily from the definitions. To prove (4), note that

||u + v||2 = hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi = ||u||2 + hu, vi + hu, vi + ||v||2 by Hermitian property = ||u||2 + ||v||2 + 2Re(hu, vi) ≤ ||u||2 + ||v||2 + 2|hu, vi| ≤ ||u||2 + ||v||2 + 2||u|| ||v|| by Cauchy-Schwarz = (||u|| + ||v||)2.

Since taking the square root preserves ≤, we have shown the triangle inequality. (5) is proven using a similar expansion of the norm in terms of h·, ·i.

67 5.2 Orthogonality 5 Inner Product Spaces

5.2 Orthogonality

Definition. Let H be an inner product space. We say u and v in H are orthogonal vectors if hu, vi = 0. Similarly, u is orthogonal to a subset A ⊆ H if hu, vi = 0 for every v ∈ A. The set of vectors orthogonal to A is a subspace of H, denoted

A⊥ = {u ∈ H | hu, vi = 0 for all v ∈ A}.

For a moment this conflicts with our earlier definition of “A perp” from Section 2.3. However, we will fix this apparent conflict in notation shortly.

Definition. A set of vectors {u1, . . . , um} is orthogonal if hui, uji = 0 for every i 6= j. Furthermore, the set is orthonormal if, in addition, ( 0 i 6= j hui, uji = δij = 1 i = j for every i, j = 1, . . . , m. In other words, an orthonormal set of vectors is orthogonal and consists only of unit vectors (with respect to || · ||). The following theorem characterizes orthonormal sets in finite dimensional inner product spaces.

Theorem 5.2.1. Let H be an n-dimensional inner product space and take {u1, . . . , um} to be an orthonormal set of vectors in H. Set W = Span{u1, . . . , um}.

(1) {u1, . . . , um} is linearly independent and therefore m ≤ n.

m X ⊥ (2) For all v ∈ H, v − hv, ujiuj ∈ W . This difference is called the orthogonal j=1 projection of v onto W , and is sometimes denoted projW (v). n X (3) If n = m, {u1, . . . , um} is called an orthonormal basis of H, and v = hv, ujiuj j=1 for all v ∈ H.

n X (4) (Parseval’s Identity) If n = m, then for any v, w ∈ H, hv, wi = hv, ujihw, uji. j=1

m X 2 2 (5) (Bessel’s Inequality) For all v ∈ H, |hv, uji| ≤ ||v|| . j=1

(6) If T ∈ End(H) and B = {u1, . . . , un} is an orthonormal basis of H, then the standard matrix of T with respect to this basis has the form

([T ]B)ij = hT (ui), uji.

68 5.2 Orthogonality 5 Inner Product Spaces

Proof. (1) Suppose a1u1 + ... + amum = 0. Apply h·, uii to both sides:

a1hu1, uii + ... + aihui, uii + ... + amhum, uii = 0.

Since the set is orthonormal, huj, uii = 0 for all j 6= i and hui, uii = 1, so ai = 0. Since i was arbitrary, we have shown that {u1, . . . , um} is linearly independent. * m + X (2) It suffices to check that v − hv, uiiui, uj = 0 for j = 1, . . . , m. By linearity, i=1

* m + m X X v − hv, uiiui, uj = hv, uji − hv, uiihui, uji i=1 i=1

= hv, uji − hv, uji by orthonormality = 0.

⊥ (3) follows from (2) since if Span{u1, . . . , um} = V then W = 0 (the only vector orthog- onal to everything in V is 0 by the positive definite condition). Finally, (4) – (6) follow from (1) – (3) and the definitions of inner product and norm. One of the main theoretical tools in linear algebra is the Gram-Schmidt Theorem; it produces the QR factorization of a matrix which has important consequences for real-world computations of linear systems.

Theorem 5.2.2 (Gram-Schmidt). Let H be an inner product space and {v1, . . . , vm} a linearly independent subset. Then there exists an orthonormal set of vectors {u1, . . . , um} such that Span{u1, . . . , ui} = Span{v1, . . . , vi} for all i = 1, . . . , m.

Proof sketch. When i = 1 there’s only one choice for u1: it is a unit vector that’s a scalar 1 multiple of v1 so we set x1 = v1 and u1 = x1. Next, we know from (2) of Theorem 5.2.1 ||x1|| 1 that v2 − hv2, u1iu1 is orthogonal to u1, so we set x2 = v2 − hv2, u1iu1 and u2 = x2. ||x2|| 1 Continue this algorithm to produce x3 = v3 − hv3, u1iu1 − hv3, u2iu2 and u3 = x3 and so ||x||3 forth. In general, for each i = 2, . . . , m we will have

i X 1 x = v − hv , u iu and u = x . i i i j j i ||x || i j=1 i

Finishing the proof of Gram-Schmidt comes down to checking that the {u1, . . . , um} produced by this algorithm satisfy the desired properties. Producing orthonormal bases is important in many areas, notably in . In loose terms, the goal is to approximate a (symmetric, even) periodic function f(x) as the sum of weighted trigonometric functions. For example, f(x) is approximated with 10 sinusoids by 10 X f(x) ≈ an cos(nx). n=0

69 5.2 Orthogonality 5 Inner Product Spaces

It turns out that the cos(nx) are part of an orthonormal basis for H = C1, the of continuously differentiable, real-valued functions, with respect to `2. Gram-Schmidt (5.2.2) allows us to prove the following important property of subspaces of inner product spaces. Corollary 5.2.3. If W ⊆ H is a subspace of an inner product space H, then H = W ⊕ W ⊥.

Proof. Start with a basis {v1, . . . , vm} of W and complete it to {v1, . . . , vm, vm+1, . . . , vn} a basis of H. The algorithm in the proof of the Gram-Schmidt Theorem (5.2.2) enables us to product u1, . . . , un such that ⊥ W = Span{u1, . . . , um} and W = Span{um+1, . . . , un}. By dimension arguments, we must have W + W ⊥ = H. Furthermore, W ∩ W ⊥ = 0 follows from the positive definite condition (the only vector that is orthogonal to everything is 0). Hence the sum is direct, so H = W ⊕ W ⊥. Next we connect W ⊥ to the dual space defined in Section 2.3. Theorem 5.2.4. Let H be a finite dimensional inner product space and take f ∈ H∗. Then there exists a unique v ∈ H such that f(u) = hu, vi; that is, the function ϕ : H −→ H∗ ( h−, vi : H → v 7−→ C u 7→ hu, vi is a well-defined isomorphism. ∗ Proof. It suffices to show that ϕ is onto since dim H = dim H . Let {u1, . . . , un} be an orthonormal basis of H. Then for any u ∈ H, n ! n X X f(u) = f hu, uiiui = hu, uiif(ui) i=1 i=1 * n + X = u, f(ui)ui by sesquilinearity. i=1 n X This tells us that if we set v = f(ui)ui then ϕ(v)(u) = hu, vi = f(u). i=1 Theorem 5.2.4 says that every finite dimensional inner product space is a Hilbert space. Using this, we identify H with its dual H∗. Unlike in Chapter 2, when the isomorphism required a choice of basis, the isomorphism prescribed above does not depend on a basis of H; it is in some sense canonical. This identification gives rise to the following definition.

Definition. Given a basis {v_1, . . . , v_n} of H, Theorem 5.2.4 implies there is a unique basis {v_1', . . . , v_n'} of H such that ⟨v_i, v_j'⟩ = δ_{ij} for all i, j = 1, . . . , n. This basis is called the dual basis of {v_1, . . . , v_n}.

Moreover, we now see that the two definitions of W^⊥ coincide on inner product spaces.

Corollary 5.2.5. A basis is orthonormal if and only if it is equal to its own dual basis.
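As a quick numerical illustration of the formula v = ∑ \overline{f(u_i)} u_i from the proof of Theorem 5.2.4, here is a small NumPy sketch (the data and names are made up; C^n carries its standard inner product ⟨x, y⟩ = ∑ x_k \overline{y_k}, which is conjugate-linear in the second slot, and the standard basis is orthonormal for it).

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4

    # A linear functional f on C^n, recorded by its values on the standard basis e_1, ..., e_n.
    coeffs = rng.normal(size=n) + 1j * rng.normal(size=n)   # coeffs[i] = f(e_i)
    f = lambda u: coeffs @ u                                 # f(u) = sum_i f(e_i) u_i

    # Riesz vector from the proof of Theorem 5.2.4: v = sum_i conj(f(e_i)) e_i
    v = np.conj(coeffs)

    # Check f(u) = <u, v>, where <u, v> = sum_i u_i conj(v_i)
    u = rng.normal(size=n) + 1j * rng.normal(size=n)
    print(np.allclose(f(u), np.sum(u * np.conj(v))))         # True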


5.3 The Adjoint

In Section 2.3, we had a transformation T : V → W to which we associated the adjoint T^* : W^* → V^*. In an inner product space, a linear transformation T : H_1 → H_2 corresponds to an adjoint

    T^* : H_2 −→ H_1,   ⟨−, v⟩ ↦ ⟨T(−), v⟩ = ⟨−, T^*(v)⟩.

Notice that here the adjoint is a transformation of the vector spaces themselves, so we don't have to pass to their duals. In particular, for each v ∈ H_2, T^*(v) is the unique vector in H_1 such that ⟨T(u), v⟩ = ⟨u, T^*(v)⟩ for all u ∈ H_1. The question we want to answer in this section is:

Question. Given a linear transformation T : H_1 → H_2 and orthonormal bases B_1 and B_2 of H_1 and H_2, respectively, what is the matrix of T^* with respect to B_2 and B_1?

Note that B_1^* = B_1 and B_2^* = B_2 by Corollary 5.2.5, so this question makes sense. Recall that the matrix of T with respect to these bases is given by

    ([T]_{B_1,B_2})_{ij} = ⟨T(u_i), v_j⟩

if B1 = {ui} and B2 = {vj}. This tells us that

    ([T^*]_{B_2,B_1})_{ij} = ⟨T^*(v_i), u_j⟩
                           = \overline{⟨u_j, T^*(v_i)⟩}     by the Hermitian property
                           = \overline{⟨T(u_j), v_i⟩}       by definition of the adjoint
                           = \overline{([T]_{B_1,B_2})_{ji}}.

From this identity we define

Definition. Let A ∈ Mat(n, m, C). The adjoint matrix of A is A^* := \overline{A^T} = (\overline{A})^T, where \overline{A} is the conjugate matrix of A, meaning each entry is conjugated.

By the work above, we have proven

Theorem 5.3.1. If T : H_1 → H_2 is a linear transformation of inner product spaces and B_1, B_2 are orthonormal bases of H_1 and H_2, respectively, then [T^*]_{B_2,B_1} = ([T]_{B_1,B_2})^*.

Definition. A linear operator T : H → H on an inner product space H is called self-adjoint if T = T^*.

Example 5.3.2. Given any transformation T : H1 → H2, the operators

    TT^* : H_2 → H_2   and   T^*T : H_1 → H_1

are self-adjoint, since (TT^*)^* = T^{**}T^* = TT^* and (T^*T)^* = T^*T^{**} = T^*T.
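In coordinates, the adjoint is just the conjugate transpose, and the defining property ⟨T(u), v⟩ = ⟨u, T^*(v)⟩ can be checked numerically. A small NumPy sketch (illustration only, with made-up data):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # a map T : C^2 -> C^3
    A_star = A.conj().T                                          # adjoint matrix A* = conjugate transpose

    u = rng.normal(size=2) + 1j * rng.normal(size=2)
    v = rng.normal(size=3) + 1j * rng.normal(size=3)

    inner = lambda x, y: np.sum(x * np.conj(y))    # <x, y>, conjugate-linear in y

    # Defining property of the adjoint: <Au, v> = <u, A*v>
    print(np.allclose(inner(A @ u, v), inner(u, A_star @ v)))    # True

    # As in Example 5.3.2, A A* and A* A are self-adjoint (Hermitian)
    print(np.allclose(A @ A_star, (A @ A_star).conj().T))        # True
    print(np.allclose(A_star @ A, (A_star @ A).conj().T))        # True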


Theorem 5.3.3. Let T : H → H be a linear operator on an inner product space H and suppose T is self-adjoint. Then

(1) σ(T) ⊂ R.

(2) If a subspace W ⊂ H is T-invariant, then so is W^⊥.

(3) If W is T-invariant, then T|_W : W → W is self-adjoint.

Proof. (1) Assume T is a complex operator on H; the proof is similar in the case that T is a real operator, but it requires one to consider H as a vector space over C as well as R. Let v ≠ 0 in H be such that T(v) = λ_0 v for some λ_0 ∈ C. Then

    λ_0⟨v, v⟩ = ⟨λ_0 v, v⟩ = ⟨Tv, v⟩ = ⟨v, Tv⟩   by self-adjointness

              = ⟨v, λ_0 v⟩ = \overline{λ_0}⟨v, v⟩   by the Hermitian property.

Hence, since ⟨v, v⟩ ≠ 0, we get λ_0 = \overline{λ_0}, which occurs if and only if λ_0 ∈ R.

(2) Let v ∈ W^⊥. We must show that T(v) is orthogonal to each u ∈ W. But ⟨u, Tv⟩ = ⟨Tu, v⟩ by self-adjointness, and since Tu ∈ W by T-invariance and v ∈ W^⊥, we get ⟨Tu, v⟩ = 0. Hence T(v) is orthogonal to W.

(3) Note that every subspace W ⊆ H inherits an inner product ⟨·, ·⟩_W from the parent inner product space, whose inner product we will temporarily denote ⟨·, ·⟩_H. For all u, v ∈ W, one has

    ⟨T|_W(u), v⟩_W = ⟨Tu, v⟩_W = ⟨Tu, v⟩_H = ⟨u, Tv⟩_H = ⟨u, Tv⟩_W = ⟨u, T|_W(v)⟩_W.

Hence T|_W is self-adjoint.

This theorem is the main ingredient in the Spectral Theorem, which is of great importance in the study of operators on Hilbert spaces (e.g. in quantum mechanics).

Theorem 5.3.4 (Spectral Theorem). Let T : H → H be a self-adjoint linear operator on an inner product space H. Then there is an orthonormal basis of H consisting of eigenvectors of T with real eigenvalues.

Proof. We induct on n = dim H. If n = 1 we are done, since any single nonzero vector forms a basis, its eigenvalue is real by Theorem 5.3.3, and we may scale the vector to make it orthonormal.

For the inductive step, take λ_0 ∈ σ(T) ⊂ R (by Theorem 5.3.3) and set W = E_{λ_0} ≠ 0. Then T|_W and T|_{W^⊥} are each self-adjoint by Theorem 5.3.3. If W = H then T = λ_0 id_H and any orthonormal basis of H consists of eigenvectors, so we are done. Otherwise, since W ≠ 0, dim W and dim W^⊥ are both strictly less than dim H, so by induction we have bases B_1 and B_2 of W and W^⊥, respectively, such that B_1 is an orthonormal basis of eigenvectors of T|_W and likewise for B_2 and T|_{W^⊥}. If we set B = B_1 ∪ B_2, we see that B is an orthonormal basis of H by virtue of the fact that H = W ⊕ W^⊥ (and W ⊥ W^⊥), so this finishes the proof.

In practice, there’s no reason to expect an arbitrary basis of Eλ0 to be orthonormal, but one can use the algorithm from the proof of Gram-Schmidt (5.2.2) to arrange this and then take a union of all the bases.


5.4 Normal Operators

In the next few sections we will explore some consequences of the Spectral Theorem (5.3.4). Let S and T be self-adjoint linear transformations on H such that ST = TS. By the Spectral Theorem (5.3.4), H can be decomposed into the eigenspaces of T :

    H = E_{λ_1}(T) ⊕ E_{λ_2}(T) ⊕ · · · ⊕ E_{λ_c}(T)

where λi are the eigenvalues of T . We will temporarily denote these by Hλi (T ) = Eλi (T ).

Since S and T commute, each Hλi (T ) is not just T -invariant but also S-invariant. By

Theorem 5.3.3, S restricts to a self-adjoint operator on each Hλi (T ). The Spectral Theorem (5.3.4) again implies that for each λi,

    H_{λ_i}(T) = H_{λ_i,μ_1}(T, S) ⊕ · · · ⊕ H_{λ_i,μ_d}(T, S),

where μ_j is an eigenvalue of S restricted to H_{λ_i}. Putting everything together, we have

    H = ⊕_{(λ,μ)} H_{λ,μ}(T, S)

where λ ∈ σ(T) and μ ∈ σ(S). Notice that H_{λ,μ}(T, S) = ker(T − λI) ∩ ker(S − μI). Moreover, there is an orthonormal basis {u_1, . . . , u_n} of H such that each u_k is an eigenvector for both S and T. One can continue this process for any (finite) number of commuting, self-adjoint linear transformations.

Definition. A linear transformation T on an inner product space H is normal if T commutes with its adjoint: TT^* = T^*T.

For any (complex) linear operator T, we can define its real and imaginary parts by

    Re(T) = (1/2)(T + T^*)   and   Im(T) = (1/2i)(T − T^*),

both of which are self-adjoint. Just as with complex numbers, this allows us to write any operator in terms of its real and imaginary parts: T = Re(T) + i Im(T). These are integral to the following characterization of normal operators.

Proposition 5.4.1. T is normal if and only if Re(T) and Im(T) commute.

Proof sketch. Use the fact that for any T, Re(T^*) = Re(T) and Im(T^*) = −Im(T).

This is used to prove:

Theorem 5.4.2 (Spectral Theorem for Normal Operators). If T : H → H is normal, then there exists an orthonormal basis {u_1, . . . , u_n} of H such that each u_j is an eigenvector for T. Moreover,

σ(Re(T )) = Re(σ(T )) and σ(Im(T )) = Im(σ(T )).


Proof. The operators Re(T) and Im(T) are self-adjoint, and by Proposition 5.4.1 they commute. Hence, by the Spectral Theorem (5.3.4) together with Lemma 4.3.12, there is an orthonormal basis {u_1, . . . , u_n} of H such that each u_j is an eigenvector for both Re(T) and Im(T). Then for any of the basis vectors,

    T(u_j) = (Re(T) + i Im(T))(u_j)
           = Re(T)(u_j) + i Im(T)(u_j)
           = λ_j u_j + i μ_j u_j
           = (λ_j + i μ_j)u_j,

where Re(T)u_j = λ_j u_j and Im(T)u_j = μ_j u_j. This implies that σ(Re(T)) = Re(σ(T)) and σ(Im(T)) = Im(σ(T)).
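As a concrete illustration (my own, not from the notes), a rotation matrix is unitary, hence normal but not self-adjoint; its real and imaginary parts are self-adjoint, they commute, and their spectra combine as in Theorem 5.4.2:

    import numpy as np

    theta = 0.7
    T = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]], dtype=complex)   # rotation: T T* = T* T = I

    ReT = (T + T.conj().T) / 2        # Re(T) = (T + T*)/2
    ImT = (T - T.conj().T) / 2j       # Im(T) = (T - T*)/(2i)

    print(np.allclose(T @ T.conj().T, T.conj().T @ T))    # T is normal
    print(np.allclose(ReT @ ImT, ImT @ ReT))              # Re(T) and Im(T) commute (Prop. 5.4.1)

    # sigma(Re(T)) = Re(sigma(T)) and sigma(Im(T)) = Im(sigma(T))  (Theorem 5.4.2)
    evals = np.linalg.eigvals(T)
    print(np.allclose(sorted(evals.real), np.linalg.eigvalsh(ReT)))
    print(np.allclose(sorted(evals.imag), np.linalg.eigvalsh(ImT)))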

5.5 Spectral Decomposition

Let A be an n × n Hermitian matrix over C, that is, A^* = A; equivalently, T_A is self-adjoint. The Spectral Theorem (5.3.4) says that there is an orthonormal basis {u_1, . . . , u_n} of C^n such that each u_j is an eigenvector for A, say Au_j = λ_j u_j. Then C^n decomposes into the eigenspaces for A:

    C^n = E_{λ_1}(A) ⊕ · · · ⊕ E_{λ_c}(A),

which suggests that A can be written as a sum of simpler matrices corresponding to the λ_j. In fact, T_A can be written

    T_A = λ_1 π_{λ_1} + λ_2 π_{λ_2} + · · · + λ_c π_{λ_c},

where π_{λ_j} = proj_{E_{λ_j}(A)} is the orthogonal projection map onto the λ_j-eigenspace. If we have an orthonormal basis {v_1, . . . , v_k} of λ_j-eigenvectors for E_{λ_j}(A), every v ∈ C^n is projected onto E_{λ_j}(A) via inner products:

    π_{λ_j}(v) = ⟨v, v_1⟩v_1 + · · · + ⟨v, v_k⟩v_k.

To move forward, we need to address the following question.

Question. For a unit vector u, what is the matrix of proj_u in the standard basis on C^n?

Because the standard basis {e_1, . . . , e_n} is orthonormal, the (i, j)th entry of the matrix of proj_u is

    ⟨proj_u(e_i), e_j⟩ = ⟨⟨e_i, u⟩u, e_j⟩ = ⟨e_i, u⟩⟨u, e_j⟩,

which equals (after swapping the order) the (i, j)th entry of uu^*. The n × n matrix uu^* is sometimes referred to as the outer product of u (with itself; this generalizes to an outer product uv^* for any two vectors u and v). We have therefore proven:

Theorem 5.5.1 (Spectral Decomposition). If A is Hermitian, there exists an orthonormal basis {u_1, . . . , u_n} of C^n consisting of eigenvectors of A such that

    A = λ_1 u_1 u_1^* + λ_2 u_2 u_2^* + · · · + λ_n u_n u_n^*.
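Numerically, the decomposition is easy to produce with numpy.linalg.eigh; the sketch below (illustration only, with made-up data) rebuilds a Hermitian matrix from the rank-one outer products u_j u_j^* and checks the description of proj_u for a unit vector u.

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    A = B + B.conj().T                              # a Hermitian matrix

    lams, U = np.linalg.eigh(A)                     # orthonormal eigenvectors in the columns of U

    # A = sum_j lambda_j u_j u_j^*  (Theorem 5.5.1)
    A_rebuilt = sum(lam * np.outer(u, u.conj()) for lam, u in zip(lams, U.T))
    print(np.allclose(A, A_rebuilt))                # True

    # The matrix of proj_u for a unit vector u is the outer product u u^*
    u = U[:, 0]
    P = np.outer(u, u.conj())
    v = rng.normal(size=3) + 1j * rng.normal(size=3)
    print(np.allclose(P @ v, np.vdot(u, v) * u))    # proj_u(v) = <v, u> u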


Example 5.5.2. Consider the Hermitian matrix

    A = [ 1        3 + 4i ]
        [ 3 − 4i   1      ],

which has a basis of eigenvectors

    u_1 = (1/√50) [ 3 + 4i ]    and    u_2 = (1/√50) [ 3 + 4i ]
                  [   5    ]                         [  −5    ]

corresponding to eigenvalues 6 and −4, respectively. Notice that

    X = u_1 u_1^* = (1/50) [ 3 + 4i ] [ 3 − 4i   5 ] = (1/50) [   25        15 + 20i ]
                           [   5    ]                         [ 15 − 20i      25     ]

is Hermitian. Likewise,

    Y = u_2 u_2^* = (1/50) [ 3 + 4i ] [ 3 − 4i   −5 ] = (1/50) [    25         −15 − 20i ]
                           [  −5    ]                          [ −15 + 20i        25     ]

is also Hermitian. Then the spectral decomposition of A is

    A = 6X − 4Y = (1/50) [    50         150 + 200i ] = [ 1        3 + 4i ]
                         [ 150 − 200i       50      ]   [ 3 − 4i   1      ].

In another sense, the Spectral Decomposition theorem says that any Hermitian matrix can be written as a linear combination of rank 1 matrices. This is because in general, uu^* (an n × 1 matrix times a 1 × n matrix) has rank 1 whenever u ≠ 0.
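For what it's worth, Example 5.5.2 and the rank-1 claim can be checked directly in NumPy (illustration only):

    import numpy as np

    A = np.array([[1, 3 + 4j], [3 - 4j, 1]])
    u1 = np.array([3 + 4j, 5]) / np.sqrt(50)
    u2 = np.array([3 + 4j, -5]) / np.sqrt(50)

    X = np.outer(u1, u1.conj())     # rank-1 Hermitian matrix for the 6-eigenspace
    Y = np.outer(u2, u2.conj())     # rank-1 Hermitian matrix for the (-4)-eigenspace

    print(np.allclose(A, 6 * X - 4 * Y))                        # the spectral decomposition of A
    print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y))   # 1 1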

5.6 Unitary Transformations

Definition. A linear transformation U : H → H on an inner product space H is unitary if for all x ∈ H, ||Ux|| = ||x||.

The definition of unitary is equivalent to ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ H. In particular, if U is unitary then

    ⟨x, y⟩ = ⟨Ux, Uy⟩ = ⟨x, U^*Uy⟩

by definition of the adjoint. Duality implies that U^*U = id_H, and so we see that U^* = U^{−1}. The converse also holds when H is finite dimensional. We summarize this here.

Proposition 5.6.1. U is a unitary transformation on H if and only if U ∗ = U −1.
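A quick numerical check of Proposition 5.6.1 (illustration only; the Q factor of a QR factorization, which is what Gram-Schmidt produces, is a convenient source of unitary matrices):

    import numpy as np

    rng = np.random.default_rng(4)
    M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    U, _ = np.linalg.qr(M)             # the Q factor of a QR factorization is unitary

    x = rng.normal(size=3) + 1j * rng.normal(size=3)
    print(np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # ||Ux|| = ||x||
    print(np.allclose(U.conj().T @ U, np.eye(3)))                  # U* U = I, i.e. U* = U^{-1}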

The next theorem is a useful characterization of unitary matrices in terms of orthonormal bases. The real strength of the theorem is the equivalence of (2) and (3).

Theorem 5.6.2. The following are equivalent for a linear transformation U : H → H:

(1) U is unitary.

(2) U takes every orthonormal basis of H to an orthonormal basis of H.

(3) U takes some orthonormal basis of H to an orthonormal basis of H.


Proof. (1) =⇒ (2): Let {u_1, . . . , u_n} be an orthonormal basis for H. Then ⟨Uu_j, Uu_k⟩ = ⟨u_j, u_k⟩ = δ_{jk} for all 1 ≤ j, k ≤ n, so {Uu_1, . . . , Uu_n} is an orthonormal basis for H.

(2) =⇒ (3) is trivial.

(3) =⇒ (1): Suppose {u_1, . . . , u_n} is an orthonormal basis with {Uu_1, . . . , Uu_n} also an orthonormal basis. Let v ∈ H and write v = ∑_{j=1}^{n} a_j u_j. Then Uv = ∑_{j=1}^{n} a_j Uu_j by linearity, and Bessel's Formula (Section 5.2) gives us

    ||Uv||^2 = ∑_{j=1}^{n} |a_j|^2 = ||v||^2.

Hence U is unitary.

As a side note, one can define a norm on operators U : H → H by

    ||U|| := sup{ ||Ux|| : x ∈ H, ||x|| = 1 }.

In particular, every unitary transformation has norm 1 (though the converse fails: for example, an orthogonal projection onto a nonzero proper subspace also has norm 1 without being unitary).

Definition. Two matrices A, B ∈ Mat_n(C) are said to be unitarily equivalent if A = U^*BU for some unitary matrix U.

Definition. Two matrices A, B ∈ Mat_n(C) are said to be orthogonally equivalent if A = O^T BO for some orthogonal matrix O.

These definitions give us yet another way of viewing the Spectral Theorem (5.3.4) for inner product spaces H.

Theorem 5.6.3. Every Hermitian (resp. real symmetric) matrix is unitarily (resp. orthogonally) equivalent to a diagonal matrix with real numbers on the diagonal.

What's more, any matrix unitarily (orthogonally) equivalent to such a diagonal matrix is Hermitian (symmetric), so unitary and orthogonal equivalence give full characterizations of these types of matrices.

Corollary 5.6.4. Every A ∈ Matn(C) is unitarily equivalent to an upper triangular matrix.

Note that this fails over R, since we need the existence of eigenvalues provided by C.
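Corollary 5.6.4 is (a statement of) the complex Schur factorization, which SciPy exposes as scipy.linalg.schur: it returns an upper triangular T and a unitary U with A = UTU^*, so that A is unitarily equivalent to T. A small sketch (illustration only, with made-up data):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(5)
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))   # an arbitrary complex matrix

    T, U = schur(A, output='complex')    # A = U T U* with U unitary and T upper triangular

    print(np.allclose(A, U @ T @ U.conj().T))      # the factorization holds
    print(np.allclose(U.conj().T @ U, np.eye(4)))  # U is unitary
    print(np.allclose(T, np.triu(T)))              # T is upper triangular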
