<<

VECTOR SPACES AND FOURIER THEORY

NEIL STRICKLAND

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike license.

1. Introduction

This course involves many of the same themes as SOM201 (Linear Mathematics for Applications), but takes a more abstract point of view. A central aim of the course is to help you become familiar and comfortable with mathematical abstraction and generalisation, which play an important role in pure mathematics. This has many benefits. For example, we will be able to prove a single theorem that simultaneously tells us useful things about vectors, matrices, polynomials, differential equations, and sequences satisfying a recurrence relation. Without the axiomatic approach, we would have to give five different (but very similar) proofs, which would be much less efficient. We will also be led to make some non-obvious but useful analogies between different situations. For example, we will be able to define the distance or angle between two functions (by analogy with the distance or angle between two vectors in R^3), and this will help us to understand the theory of Fourier series. We will prove a number of things that were merely stated in SOM201. Similarly, we will give abstract proofs of some things that were previously proved using matrix manipulation. These new proofs will require a better understanding of the underlying concepts, but once you have that understanding, they will often be considerably simpler.

2. Vector spaces

Predefinition 2.1. [predef-vector-space] A vector space (over R) is a nonempty set V of things such that
(a) If u and v are elements of V, then u + v is also an element of V.
(b) If u is an element of V and t is a real number, then tu is an element of V.
This definition is not strictly meaningful or rigorous; we will pick holes in it later (see Example 2.12). But it will do for the moment.

Example 2.2. [eg-R-three] The set R^3 of all three-dimensional vectors is a vector space, because the sum of two vectors is a vector (eg [1, 2, 3]^T + [3, 2, 1]^T = [4, 4, 4]^T) and the product of a real number and a vector is a vector (eg 3[1, 2, 3]^T = [3, 6, 9]^T). In the same way, the set R^2 of two-dimensional vectors is also a vector space.

Remark 2.3. [rem-row-column] For various reasons it will be convenient to work mostly with column vectors, as in the previous example. However, this can be typographically awkward, so we use the following notational device: if u is a row vector, then u^T denotes the corresponding column vector, so for example [ 1 2 3 4 ]^T is the column vector with entries 1, 2, 3, 4.

Example 2.4. [eg-Rn] For any natural number n the set R^n of vectors of length n is a vector space. For example, the vectors u = [ 1 2 4 8 16 ]^T and v = [ 1 −2 4 −8 16 ]^T are elements of R^5, with u + v = [ 2 0 8 0 32 ]^T. We can even consider the set R^∞ of all infinite sequences of real numbers, which is again a vector space.

Example 2.5. [eg-easy] The set {0} is a trivial example of a vector space (but it is important in the same way that the number zero is important). This space can also be thought of as R^0. Another trivial example is that R itself is a vector space (which can be thought of as R^1).

Example 2.6. [eg-physical] The set U of physical vectors is a vector space. We can define some elements of U by
• a is the vector from Sheffield to London
• b is the vector from London to Cardiff
• c is the vector from Sheffield to Cardiff
• d is the vector from the centre of the earth to the north pole
• e is the vector from the south pole to the north pole.
We then have a + b = c and 2d = e.

Once we have agreed on where our axes should point, and what units of length we should use, we can identify U with R3. However, it is conceptually important (especially in the theory of relativity) that U exists in its own right without any such choice of conventions.
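The closure properties in Examples 2.2 and 2.4 are easy to check numerically. The following Python sketch (using numpy purely as a calculator; it is not part of the course, and the variable names are just for illustration) reproduces the sums and scalar multiples above.

import numpy as np

# The vectors from Example 2.2: adding two elements of R^3 gives another element of R^3,
# and so does multiplying by a real number.
u = np.array([1.0, 2.0, 3.0])
v = np.array([3.0, 2.0, 1.0])
print(u + v)    # [4. 4. 4.]
print(3 * u)    # [3. 6. 9.]

# The vectors from Example 2.4, this time in R^5.
u5 = np.array([1, 2, 4, 8, 16])
v5 = np.array([1, -2, 4, -8, 16])
print(u5 + v5)  # [ 2  0  8  0 32]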

Example 2.7. [eg-FR] The set F(R) of all functions from R to R is a vector space, because we can add any two functions to get a new function, and we can multiply a function by a number to get a new function. For example, we can define functions f, g, h: R −→ R by
f(x) = e^x
g(x) = e^{-x}
h(x) = cosh(x) = (e^x + e^{-x})/2,
so f, g and h are elements of F(R). Then f + g and 2h are again functions, in other words f + g ∈ F(R) and 2h ∈ F(R). Of course we actually have f + g = 2h in this example. For this to work properly, we must insist that f(x) is defined for all x, and is a real number for all x; it cannot be infinite or imaginary. Thus the rules p(x) = 1/x and q(x) = √x do not define elements p, q ∈ F(R).

Remark 2.8. In order to understand the above example, you need to think of a function f: R → R as a single object in its own right, and then think about the set F(R) of all possible functions as a single object; later you will need to think about various different subsets of F(R). All this may seem quite difficult to deal with. However, it is a central aim of this course for you to get to grips with this level of abstraction. So you should persevere, ask questions, study the notes and work through examples until it becomes clear to you.

Example 2.9. [eg-CR] In practice, we do not generally want to consider the set F(R) of all functions. Instead we consider the set C(R) of continuous functions, or the set C^∞(R) of “smooth” functions (those which can be differentiated arbitrarily often), or the set R[x] of polynomial functions (eg p(x) = 1 + x + x^2 + x^3 defines an element p ∈ R[x]). If f and g are continuous then f + g and tf are continuous, so C(R) is a vector space. If f and g are smooth then f + g and tf are smooth, so C^∞(R) is a vector space. If f and g are polynomials then f + g and tf are polynomials, so R[x] is a vector space.

Example 2.10. [eg-Cab] We also let [a, b] denote the interval {x ∈ R | a ≤ x ≤ b}, and we write C[a, b] for the set of continuous functions f: [a, b] −→ R. For example, the rule f(x) = 1/x defines a continuous function on the interval [1, 2]. (The only potential problem is at the point x = 0, but 0 ∉ [1, 2], so we do not need to worry about it.) We thus have an element f ∈ C[1, 2].

Example 2.11. [eg-matrices] The set M2R of 2 × 2 matrices (with real entries) is a vector space. Indeed, if we add two such matrices, we get another 2 × 2 matrix, for example
[ 1 0 ; 0 4 ] + [ 0 2 ; 3 0 ] = [ 1 2 ; 3 4 ].
Similarly, if we multiply a 2 × 2 matrix by a real number, we get another 2 × 2 matrix, for example

7 [ 1 2 ; 3 4 ] = [ 7 14 ; 21 28 ].
We can identify M2R with R^4, by the rule [ a b ; c d ] ↔ [a, b, c, d]^T.

More generally, for any n the set MnR of n × n square matrices is a vector space, which can be identified with R^{n^2}. More generally still, for any n and m, the set Mn,mR of n × m matrices is a vector space, which can be identified with R^{nm}.

Example 2.12. [eg-lists] Let L be the set of all finite lists of real numbers. For example, the lists a = (10, 20, 30, 40) and b = (5, 6, 7) and c = (1, π, π^2) define three elements a, b, c ∈ L. Is L a vector space? In trying to answer this question, we will reveal some inadequacies of Predefinition 2.1. It seems clear that L is closed under scalar multiplication: for example 100b = (500, 600, 700), which is another element of L. The real issue is closure under addition. For example, is a + b an element of L? We cannot answer this unless we know what a + b means. There are at least three possible meanings:
(1) a + b could mean (10, 20, 30, 40, 5, 6, 7) (the list a followed by the list b).
(2) a + b could mean (15, 26, 37) (chop off a to make the lists the same length, then add them together).
(3) a + b could mean (15, 26, 37, 40) (add zeros to the end of b to make the lists the same length, then add them together).
The point is that the expression a + b does not have a meaning until we decide to give it one. (Strictly speaking, the same is true of the expression 100b, but in that case there is only one reasonable possibility for what it should mean.) To avoid this kind of ambiguity, we should say that a vector space is a set together with a definition of addition etc. Suppose we agree to use the third definition of addition, so that a + b = (15, 26, 37, 40). The ordinary rules of algebra would tell us that (a + (−1).a) + b = b. However, in fact we have
(a + (−1).a) + b = ((10, 20, 30, 40) + (−10, −20, −30, −40)) + (5, 6, 7) = (0, 0, 0, 0) + (5, 6, 7) = (5, 6, 7, 0) ≠ (5, 6, 7) = b.
Thus, the ordinary rules of algebra do not hold. We do not want to deal with this kind of thing; we only want to consider sets where addition and scalar multiplication work in the usual way. We must therefore give a more careful definition of a vector space, which will allow us to say that L is not a vector space, so we need not think about it. (If we used either of the other definitions of addition then things would still go wrong; details are left as an exercise.)

Example 2.13. [eg-PR] Now consider the collection W of all subsets of R, so for example the sets A = {1, 2} and B = {30, 40, 50} are elements of W. We shall try to decide whether W is a vector space, and in doing so, we will understand why our original definition is not satisfactory. If W is to be a vector space, then 2A and A + B should be elements of W. But what do 2A and A + B actually mean? It seems clear that 2A should mean {2, 4}, but A + B is less obvious. One possible answer is that A + B should just mean the union of A and B, so A + B = A ∪ B = {1, 2, 30, 40, 50}. Another possible answer is that A + B should mean the set of all sums a + b with a ∈ A and b ∈ B; this gives A + B = {1 + 30, 1 + 40, 1 + 50, 2 + 30, 2 + 40, 2 + 50} = {31, 32, 41, 42, 51, 52}. To avoid this kind of ambiguity, we should say that a vector space is a set together with a definition of addition etc. Suppose we agree on the second meaning of A + B. By our original definition, we see that W is a vector space: the sum of any two subsets of R is another subset of R, and the product of a real number and a subset of R is again a subset of R. However, some funny things happen.
For example, we have A + 10A = {1, 2} + {10, 20} = {1 + 10, 1 + 20, 2 + 10, 2 + 20} = {11, 12, 21, 22}, whereas 11A = {11, 22}, so A + 10A 6= 11A. We certainly don’t want to allow this, so we should change the definition so that W does not count as a vector space. (If we used the first definition of addition instead of the second one, then things would still go wrong.) Our next attempt at a definition is as follows: Predefinition 2.14. [predef-vector-space-ii] A vector space over R is a nonempty set V , together with a definition of what it means to add elements of V or multiply them by real numbers, such that (a) If u and v are elements of V , then u + v is an also an element of V . (b) If u is an element of V and t is a real number, then tu is an element of V . (c) All the usual algebraic rules for addition and multiplication hold. In the course we will be content with an informal understanding of the phrase “all the usual algebraic rules”, but for completeness, we give an explicit list of axioms: Definition 2.15. [defn-real-vector-space] A vector space over R is a set V , together with an element 0 ∈ V and a definition of what it means to add elements of V or multiply them by real numbers, such that (a) If u and v are elements of V , then u + v is an also an element of V . (b) If v is an element of V and t is a real number, then tv is an element of V . (c) For any elements u, v, w ∈ V and any real numbers s, t, the following equations hold: (1)0+ v = v (2) u + v = v + u (3) u + (v + w) = (u + v) + w (4)0 u = 0 (5)1 u = u (6)( st)u = s(tu) (7)( s + t)u = su + tu (8) s(u + v) = su + sv. Note that there are many rules that do not appear explicitly in the above list, such as the fact that t(u + v − w/t) = tu + tv − w, but it turns out that all such rules can be deduced from the ones listed. We will not discuss any such deductions. In example 2.12, the only element 0 ∈ L with the property that 0 + v = v for all v is the empty list 0 = (). If u is a nonempty list of length n, then 0u is a list of n zeros, which is not the same as the empty list, so the axiom 0u = 0 is not satisfied, so L is not a vector space. In all our other examples, it is obvious that the axioms hold, and we will not discuss them further. In example 2.13, the fact that A + 10A 6= 11A contradicts the axiom (s + t)u = su + tu, so W is not a vector space. (Some of the other axioms hold, and some of them do not; we leave the details as an exercise.) In all our other examples, it is obvious that the axioms hold, and we will not discuss them further. 4 Remark 2.16. [rem-zero-notation] We will usually use the symbol 0 for the zero element of whatever vector space we are considering. Thus 0 h 0 i could mean the vector 0 (if we are working with 3) or the zero matrix [ 0 0 0 ] (if we are working with 0 R 0 0 0 M2,3R) or whatever. Occasionally it will be important to distinguish between the zero elements in different 0 2 vector spaces. In that case, we write 0V for the zero element of V . For example, we have 0R = [ 0 ] and 0 0 0M2R = [ 0 0 ]. One can also consider vector spaces over fields other than R; the most important case for us will be the field C of complex numbers. We record the definitions for completeness. Definition 2.17. [defn-] A field is a set K together with elements 0, 1 ∈ K and a definition of what it means to add or multiply two elements of K, such that: (a) The usual rules of algebra are valid. 
More explicitly, for all a, b, c ∈ K the following equations hold:
• 0 + a = a
• a + (b + c) = (a + b) + c
• a + b = b + a
• 1.a = a
• a(bc) = (ab)c
• ab = ba
• a(b + c) = ab + ac
(b) For every a ∈ K there is an element −a with a + (−a) = 0.
(c) For every a ∈ K with a ≠ 0 there is an element a^{-1} ∈ K with aa^{-1} = 1.
(d) 1 ≠ 0.

Example 2.18. [eg-fields] Recall that
Z = { integers } = {. . . , −2, −1, 0, 1, 2, 3, 4, . . . }
Q = { rational numbers } = {a/b | a, b ∈ Z, b ≠ 0}
R = { real numbers }
C = { complex numbers } = {x + iy | x, y ∈ R},
so Z ⊂ Q ⊂ R ⊂ C. Then R, C and Q are fields. The ring Z is not a field, because axiom (c) is not satisfied: there is no element 2^{-1} in the set Z for which 2.2^{-1} = 1. One can show that the ring Z/nZ is a field if and only if n is a prime number.

Definition 2.19. [defn-vector-space] A vector space over a field K is a set V, together with an element 0 ∈ V and a definition of what it means to add elements of V or multiply them by elements of K, such that
(a) If u and v are elements of V, then u + v is also an element of V.
(b) If v is an element of V and t is an element of K, then tv is an element of V.
(c) For any elements u, v, w ∈ V and any elements s, t ∈ K, the following equations hold:
(1) 0 + v = v
(2) u + v = v + u
(3) u + (v + w) = (u + v) + w
(4) 0u = 0
(5) 1u = u
(6) (st)u = s(tu)
(7) (s + t)u = su + tu
(8) s(u + v) = su + sv.

Example 2.20. [eg-other-fields] Almost all our examples of real vector spaces work over any field K. For example, the set M4Q (of 4 × 4 matrices whose entries are rational numbers) is a vector space over Q. The set C[x] (of polynomials with complex coefficients) is a vector space over C.

Exercises

Exercise 2.1. [ex-typical-elements] For each of the following vector spaces V, write down two typical elements of V (say u and v), calculate u + v and 10v, and observe that these are again elements of V. (For example, in the case V = R^2 we could take u = [1, 2]^T and v = [3, 4]^T, then we would have u + v = [4, 6]^T and 10v = [30, 40]^T, both of which are again elements of R^2.)
(a) V = R^4
(b) V = M2,3(R)
(c) V = R[x]
(d) V is the set of physical vectors, as in Example 2.6 in the notes.

Exercise 2.2. [ex-not-vector-spaces] Explain why none of the following is a vector space (with the obvious definition of addition and scalar multiplication).
(a) V0 = {[ a b ; c d ] ∈ M2R | a ≤ b ≤ c ≤ d}
(b) V1 = {[a, b]^T ∈ R^2 | a + b is an odd integer}
(c) V2 = {[x, y]^T ∈ R^2 | x^2 = y^2}
(d) V3 = {p ∈ R[x] | p(0)p(1) = 0}
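To see concretely how Example 2.12 goes wrong, one can implement the third (“pad with zeros”) rule for adding lists and watch the usual algebra fail. This is only an illustrative sketch in Python; the helper names pad_add and scalar_mul are invented for this example.

def pad_add(a, b):
    # Rule (3) of Example 2.12: pad the shorter list with zeros, then add entrywise.
    n = max(len(a), len(b))
    a = a + (0,) * (n - len(a))
    b = b + (0,) * (n - len(b))
    return tuple(x + y for x, y in zip(a, b))

def scalar_mul(t, a):
    return tuple(t * x for x in a)

a = (10, 20, 30, 40)
b = (5, 6, 7)

lhs = pad_add(pad_add(a, scalar_mul(-1, a)), b)
print(lhs)        # (5, 6, 7, 0): the length has changed
print(lhs == b)   # False, although ordinary algebra predicts (a + (-1)a) + b = b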

3. Linear maps

Definition 3.1. [defn-linear] Let V and W be vector spaces, and let φ: V −→ W be a function (so for each element v ∈ V we have an element φ(v) ∈ W). We say that φ is linear if
(a) For any v and v' in V, we have φ(v + v') = φ(v) + φ(v') in W.
(b) For any t ∈ R and v ∈ V we have φ(tv) = tφ(v) in W.
By taking t = v = 0 in (b), we see that φ must satisfy φ(0) = 0. Further simple arguments also show that φ(v − v') = φ(v) − φ(v').

Remark 3.2. [rem-linear-condensed] The definition can be reformulated slightly as follows. A map φ: V → W is linear iff
(c) For any t, t' ∈ R and any v, v' ∈ V we have φ(tv + t'v') = tφ(v) + t'φ(v').
To show that this reformulation is valid, we must show that if (c) holds, then so do (a) and (b); and conversely, if (a) and (b) hold, then so does (c). Condition (a) is the special case of (c) where t = t' = 1, and condition (b) is the special case where t' = 0 and v' = 0. Thus, if (c) holds then so do (a) and (b). Conversely, suppose that (a) and (b) hold, and that we have t, t' ∈ R and v, v' ∈ V. Condition (a) tells us that φ(tv + t'v') = φ(tv) + φ(t'v'), and condition (b) tells us that φ(tv) = tφ(v) and φ(t'v') = t'φ(v'). Putting these together, we get φ(tv + t'v') = tφ(v) + t'φ(v'), so condition (c) holds, as required.

Example 3.3. [eg-square-nonlinear] Consider the functions f, g: R −→ R given by f(x) = 2x and g(x) = x^2. Then
g(x + x') = x^2 + x'^2 + 2xx' ≠ x^2 + x'^2 = g(x) + g(x'),
so g is not linear. Similarly, for general x and x' we have sin(x + x') ≠ sin(x) + sin(x') and exp(x + x') ≠ exp(x) + exp(x'), so the functions sin and exp are not linear. On the other hand, we have
f(x + x') = 2(x + x') = 2x + 2x' = f(x) + f(x')
f(tx) = 2tx = tf(x)
so f is linear.

Example 3.4. [eg-LRR] The obvious generalisation of the previous example is as follows. For any number m ∈ R, we can define µm: R −→ R by µm(x) = mx (so f in the previous example is µ2). We have
µm(x + x') = m(x + x') = mx + mx' = µm(x) + µm(x')

µm(tx) = mtx = tmx = tµm(x),

so µm is linear (and in fact, these are all the linear maps from R to R). Note also that the graph of µm is a straight line of slope m through the origin; this is essentially the reason for the word “linear”.

Example 3.5. [eg-rot-linear] For any v ∈ R^2, we let ρ(v) be the vector obtained by rotating v through 90 degrees anticlockwise around the origin. It is well-known that the formula for this is ρ([x, y]^T) = [−y, x]^T.

(Figure: a vector [x, y]^T and its image [−y, x]^T under the rotation ρ.)

We thus have
ρ([x, y]^T + [x', y']^T) = ρ([x + x', y + y']^T) = [−y − y', x + x']^T = [−y, x]^T + [−y', x']^T = ρ([x, y]^T) + ρ([x', y']^T)
ρ(t[x, y]^T) = ρ([tx, ty]^T) = [−ty, tx]^T = tρ([x, y]^T),
so ρ is linear. (Can you explain this geometrically, without using the formula?)

Example 3.6. [eg-ref-linear] For any v ∈ R^2, we let τ(v) be the vector obtained by reflecting v across the line y = 0. It is clear that the formula is τ([x, y]^T) = [x, −y]^T, and using this we see that τ is linear.
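A quick numerical check of condition (c) from Remark 3.2 can be reassuring: the rotation ρ of Example 3.5 passes it for randomly chosen inputs, while the squaring map of Example 3.3 fails immediately. This is a sanity check only, not a proof; it is written as a Python sketch with numpy, which is not part of the course.

import numpy as np

def rho(v):
    # Rotation through 90 degrees anticlockwise, (x, y) -> (-y, x), as in Example 3.5.
    return np.array([-v[1], v[0]])

def g(x):
    # The squaring map from Example 3.3.
    return x ** 2

rng = np.random.default_rng(0)
v, w = rng.normal(size=2), rng.normal(size=2)
t, s = 2.0, -3.0

# rho satisfies rho(t*v + s*w) = t*rho(v) + s*rho(w):
print(np.allclose(rho(t * v + s * w), t * rho(v) + s * rho(w)))   # True

# g does not: g(1 + 1) = 4 but g(1) + g(1) = 2.
print(g(1 + 1) == g(1) + g(1))                                    # False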

(Figure: a vector [x, y]^T and its image [x, −y]^T under the reflection τ.)

Example 3.7. [eg-norm-nonlinear] 2 x p 2 2 Define θ : R −→ R by θ(v) = kvk, so θ [ y ] = x + y . This is not linear, because θ(u + v) 6= θ(u) + θ(v) 1  −1  in general. Indeed, if u = [ 0 ] and v = 0 then θ(u + v) = 0 but θ(u) + θ(v) = 1 + 1 = 2. Example 3.8. [eg-shift-nonlinear] 2 2 x  x+1  0 0 Define σ : R −→ R by σ [ y ] = y−1 . Then σ is not linear, because σ [ 0 ] 6= [ 0 ]. 7 Example 3.9. [eg-homogeneous-nonlinear] Define α: R2 → R2 by h 3 2 2 i α [ x ] = y /(x +y ) . y x3/(x2+y2) (This does not really make sense when x = y = 0, but for that case we make the separate definition that 0 0 α [ 0 ] = [ 0 ].) This map satisfies α(tv) = tα(v), but it does not satisfy α(u + v) = α(u) + α(v), so it is not 1 0 linear. For example, if u = [ 0 ] and v = [ 1 ] then α(u) = v and α(v) = u but α(u + v) = (u + v)/2 6= α(u) + α(v). Example 3.10. [eg-vector-algebra] u v h 1 i h 1 i 3 Given vectors u = u2 and v = v2 in , recall that the inner product and cross product are defined by u3 v3 R

⟨u, v⟩ = u.v = u1v1 + u2v2 + u3v3

h u2v3−u3v2 i u × v = u3v1−u1v3 . u1v2−u2v1 Fix a vector a ∈ R3. Define α: R3 −→ R by α(v) = ha, vi and β : R3 −→ R3 by β(v) = a × v. Then both α and β are linear. To prove this we must show that α(tv) = tα(v) and α(v + w) = α(v) + α(w) and β(tv) = tβ(v) and β(v + w) = β(v) + β(w). We will write out only the last of these; the others are similar but easier.   h v1+w1 i a2(v3+w3)−a3(v2+w2) h a2v3−a3v2 i h a2w3−a3w2 i β(v + w) = β v2+w2 = a3(v1+w1)−a1(v3+w3) = a3v1−a1v3 + a3w1−a1w3 = β(v) + β(w). v3+w3 a1(v2+w2)−a2(v1+w1) a1v2−a2v1 a1w2−a2w1 Example 3.11. [eg-matrix-linear] Let A be a fixed m × n matrix. Given a vector v of length n (so v ∈ Rn), we can multiply A by v in the n m usual way to get a vector Av of length m. We can thus define φA : R −→ R by φA(v) = Av. It is clear 0 0 that A(v + v ) = Av + Av and Atv = tAv, so φA is a linear map. We will see later that every linear map from Rn to Rm has this form. In particular, if we put  0 −1   1 0  R = 1 0 T = 0 −1 we find that x −y x x x x R [ y ] = [ x ] = ρ ([ y ]) T [ y ] = [ −y ] = τ ([ y ])

(where ρ and τ are as in Examples 3.5 and 3.6). This means that ρ = φR and τ = φT . Example 3.12. [eg-int-linear] For any continuous function f : R −→ R, we write Z 1 I(f) = f(x)dx ∈ R. 0 This defines a map I : C(R) −→ R. If we put p(x) = x2 q(x) = 2x − 1 r(x) = ex we have I(p) = 1/3 and I(q) = 0 and I(r) = e − 1. Using the obvious equations Z 1 Z 1 Z 1 f(x) + g(x)dx = f(x)dx + g(x)dx 0 0 0 Z 1 Z 1 tf(x)dx = t f(x)dx 0 0 we see that I is a linear map. 8 Definition 3.13. [eg-diff-linear] For any smooth function f : R −→ R we write D(f) = f 0 and L(f) = f 00 + f. These are again smooth functions, so we have maps D : C∞(R) −→ C∞(R) and L: C∞(R) −→ C∞(R). If we put p(x) = sin(x) q(x) = cos(x) r(x) = ex then D(p) = q and D(q) = −p and D(r) = r. It follows that L(p) = L(q) = 0 and that L(r) = 2r. Using the obvious equations (f + g)0 = f 0 + g0 (tf)0 = t f 0 we see that D is linear. Similarly, we have L(f + g) = (f + g)00 + (f + g) = f 00 + g00 + f + g = (f 00 + f) + (g00 + g) = L(f) + L(g) L(tf) = (tf)00 + tf = t f 00 + tf = tL(f). This shows that L is also linear. Example 3.14. [eg-trace-det]  a b  For any 2 × 2 matrix A = c d , the trace and determinant are defined by trace(A) = a + d ∈ R and det(A) = ad−bc ∈ R. We thus have two functions trace, det: M2R −→ R. It is easy to see that trace(A+B) = trace(A) + trace(B) and trace(tA) = t trace(A), so trace: M2R −→ R is a linear map. On the other hand, we 2 have det(tA) = t det(A) and det(A + B) 6= det(A) + det(B) in general, so det: M2R −→ R is not a linear map. For a specific counterexample, consider

A = [ 1 0 ; 0 0 ] and B = [ 0 0 ; 0 1 ].
Then det(A) = det(B) = 0 but det(A + B) = 1, so det(A + B) ≠ det(A) + det(B). None of this is really restricted to 2 × 2 matrices. For any n we have a map trace: MnR −→ R given by trace(A) = Σ_{i=1}^{n} Aii, which is again linear. We also have a determinant map det: MnR −→ R which satisfies det(tI) = t^n; this shows that det is not linear, except in the silly case where n = 1.

Example 3.15. [eg-inv-nonlinear] “Define” φ: M2R −→ M2R by φ(A) = A^{-1}, so

φ([ a b ; c d ]) = [ d/(ad−bc) −b/(ad−bc) ; −c/(ad−bc) a/(ad−bc) ].
This is not a linear map, simply because it is not a well-defined map at all: the “definition” does not make sense when ad − bc = 0. Even if it were well-defined, it would not be linear, because φ(I + I) = (2I)^{-1} = I/2, whereas φ(I) + φ(I) = 2I, so φ(I + I) ≠ φ(I) + φ(I).
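The contrast in Examples 3.14 and 3.15 between trace and det is easy to see numerically. Here is a minimal Python sketch (the matrices are the ones from the counterexample above; numpy is used only as a calculator).

import numpy as np

A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [0.0, 1.0]])

# trace behaves linearly: trace(A + B) = trace(A) + trace(B) and trace(tA) = t*trace(A).
print(np.trace(A + B), np.trace(A) + np.trace(B))    # 2.0 2.0
print(np.trace(5 * A), 5 * np.trace(A))              # 5.0 5.0

# det does not: here det(A) = det(B) = 0 but det(A + B) = 1, as noted above.
print(np.linalg.det(A), np.linalg.det(B), np.linalg.det(A + B))   # 0.0 0.0 1.0

# det(tI) = t^n for n x n matrices, so det cannot be linear when n > 1.
print(np.linalg.det(3 * np.eye(2)))                  # 9.0 (= 3^2)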

Example 3.16. [eg-rref-nonlinear] Define φ: M3R −→ M3R by φ(A) = the row reduced echelon form of A. For example, we have the following sequence of reductions:
[ 1 2 3 ; 4 8 6 ; 7 14 9 ] → [ 1 2 3 ; 0 0 −6 ; 0 0 −12 ] → [ 1 2 3 ; 0 0 1 ; 0 0 −12 ] → [ 1 2 0 ; 0 0 1 ; 0 0 0 ],
which shows that
φ([ 1 2 3 ; 4 8 6 ; 7 14 9 ]) = [ 1 2 0 ; 0 0 1 ; 0 0 0 ].
The map is not linear, because φ(I) = I and also φ(2I) = I, so φ(2I) ≠ 2φ(I).

Example 3.17. [eg-transpose-linear] We can define a map trans: MnR −→ MnR by trans(A) = A^T. Here as usual, A^T is the transpose of A, which is obtained by flipping A across the main diagonal. For example:

[ 1 2 3 ; 0 4 5 ; 0 0 6 ]^T = [ 1 0 0 ; 2 4 0 ; 3 5 6 ].
In general, we have (A^T)ij = Aji. It is clear that (A + B)^T = A^T + B^T and (tA)^T = tA^T, so trans: MnR −→ MnR is a linear map.

Definition 3.18. [defn-iso] We say that a linear map φ: V → W is an isomorphism if it is a bijection, so there is an inverse map φ^{-1}: W −→ V with φ(φ^{-1}(w)) = w for all w ∈ W, and φ^{-1}(φ(v)) = v for all v ∈ V. (It turns out that φ^{-1} is automatically a linear map; we leave this as an exercise.) We say that V and W are isomorphic if there exists an isomorphism from V to W.

Example 3.19. [eg-iso-matrices] We can now rephrase part of Example 2.11 as follows: there is an isomorphism φ: M2R −→ R^4 given by φ([ a b ; c d ]) = [a, b, c, d]^T, so M2R is isomorphic to R^4. Similarly, the space Mp,qR is isomorphic to R^{pq}.

Example 3.20. [eg-iso-physical] Let U be the space of physical vectors, as in Example 2.6. A choice of axes and length units gives rise to an isomorphism from R^3 to U. More explicitly, choose a point P on the surface of the earth (for example, the base of the Eiffel Tower) and put
u = the vector of length 1 km pointing east from P
v = the vector of length 1 km pointing north from P
w = the vector of length 1 km pointing vertically upwards from P.

Define φ: R^3 → U by φ(x, y, z) = xu + yv + zw. Then φ is an isomorphism. We will be able to give more interesting examples of isomorphisms after we have learnt about subspaces.

Exercises

Exercise 3.1. [ex-check-linear] Which of the following rules defines a linear map?
(a) φ0: R^2 → R^2 given by φ0([x, y]^T) = [x + y, x − y]^T
(b) φ1: R^3 → R given by φ1([x, y, z]^T) = xyz
(c) φ2: M2R → R given by φ2([ a b ; c d ]) = max(|a|, |b|, |c|, |d|)
(d) φ3: R[x] → R given by φ3(f) = f(0) + f'(1) + f''(2)
(e) φ4: R[x] → R given by φ4(f) = f(0)f(1).

Exercise 3.2. [ex-find-eg-linear] In each of the cases below, give an example of a nonzero linear map φ: V → W . (Here “nonzero” means that there is at least one v ∈ V such that φ(v) 6= 0.) (a) V = R4 and W = R2 2 (b) V = M3R and W = R (c) V = M3R and W = R[x] (d) V = R[x] and W = M2R

Exercise 3.3. [ex-char-nonlinear] Define χ: MnR −→ R[t]≤n by χ(A) = det(tI − A) = the characteristic polynomial of A. Is this a linear map? 10 Exercise 3.4. [ex-spectral-radius] Given a matrix A ∈ MnC, we write ρ(A) for the spectral radius of A, which is the largest absolute value of any eigenvalue of A. In symbols, we have ρ(A) = max{|λ| | det(λI − A) = 0}.

Is ρ: MnC → C a linear map?
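When working on Exercises 3.1 to 3.4 it can be helpful to test a candidate map numerically before attempting a proof: a single random input with φ(tv + t'v') ≠ tφ(v) + t'φ(v') shows the map is not linear, while a passing test proves nothing but suggests it is worth looking for a proof. The following Python sketch is one way to set up such a test (the function name looks_linear is ad hoc); it is shown on the transpose map of Example 3.17 and on det, not on the exercise maps themselves.

import numpy as np

def looks_linear(phi, shape, trials=100, seed=0):
    # Test phi(t*v + s*w) == t*phi(v) + s*phi(w) on random inputs of the given shape.
    # Returning False gives a genuine counterexample; returning True is only evidence.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        v, w = rng.normal(size=shape), rng.normal(size=shape)
        t, s = rng.normal(size=2)
        if not np.allclose(phi(t * v + s * w), t * phi(v) + s * phi(w)):
            return False
    return True

print(looks_linear(lambda A: A.T, (3, 3)))               # True: transpose is linear (Example 3.17)
print(looks_linear(lambda A: np.linalg.det(A), (3, 3)))  # False: det is not linear (Example 3.14)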

4. Subspaces Definition 4.1. [defn-subspace] Let V be a vector space. A vector subspace (or just subspace) of V is a subset W ⊆ V such that (a)0 ∈ W (b) Whenever u and v lie in W , the element u + v also lies in W . (In other words, W is closed under addition.) (c) Whenever u lies in W and t lies in R, the element tu also lies in W . (In other words, W is closed under scalar multiplication.) These conditions mean that W is itself a vector space. Remark 4.2. [rem-restricted-ops] Strictly speaking, a vector space is a set together with a definition of addition and scalar multiplication such that certain identities hold. We should therefore specify that addition in W is to be defined using the same rule as for V , and similarly for scalar multiplication. Remark 4.3. [rem-subspace-condensed] The definition can be reformulated slightly as follows: a set W ⊆ V is a subspace iff (a)0 ∈ W (d) Whenever u, v ∈ W and t, s ∈ R we have tu + sv ∈ W . To show that this reformulation is valid, we must check that if condition (d) holds then so do (b) and (c); and conversely, that if (b) and (c) hold then so does (d). In fact, conditions (b) is the special cases of (d) where t = s = 1, and condition (c) is the special case of (d) where v = 0; so if (d) holds then so do (b) and (c). Conversely, suppose that (b) and (c) hold, and that u, v ∈ W and t, s ∈ R. Then condition (c) tells us that tu ∈ W , and similarly that sv ∈ W . Given these, condition (b) tells us that tu + sv ∈ W ; we conclude that condition (d) holds, as required. Example 4.4. [eg-silly-subspaces] There are two silly examples: {0} is always a subspace of V , and V itself is always a subspace of V . Example 4.5. [eg-subspaces-R-two] Any straight line through the origin is a subspace of R2. These are the only subspaces of R2 (except for the two silly examples). Example 4.6. [eg-subspaces-R-three] In R3, any straight line through the origin is a subspace, and any plane through the origin is also a subspace. These are the only subspaces of R3 (except for the two silly examples). Example 4.7. [eg-trace-free] The set W = {A ∈ M2R | trace(A) = 0} is a subspace of M2R. To check this, we first note that 0 ∈ W . Suppose that A, A0 ∈ W and t, t0 ∈ R. We then have trace(A) = trace(A0) = 0 (because A, A0 ∈ W ) and so trace(tA + t0A0) = t trace(A) + t0 trace(A0) = t.0 + t0.0 = 0, so tA + t0A0 ∈ W . Thus, conditions (a) and (d) in Remark 4.3 is satisfied, showing that W is a subspace as claimed. Example 4.8. [eg-Rx-subspace] Recall that R[x] denotes the set of all polynomial functions in one variable (so the functions p(x) = x + 1 and q(x) = (x + 1)5 − (x − 1)5 and r(x) = 1 + 4x4 + 8x8 define elements p, q, r ∈ R[x]). It is clear that 11 the sum of two polynomials is another polynomial, and any polynomial multiplied by a constant is also a polynomial, so R[x] is a subspace of the vector space F (R) of all functions on R. We write R[x]≤d for the set of polynomials of degree at most d, so a general element f ∈ R[x]≤d has the form d d X i f(x) = a0 + a1x + ... + adx = aix i=0 for some a0, . . . , ad ∈ R. It is easy to see that this is a subspace of R[x]. T d+1 If we let f correspond to the vector [ a0 ··· ad ] ∈ R , we get a one-to-one correspondence between d+1 d+1 R[x]≤d and R . More precisely, there is an isomorphism φ: R → R[x]≤d given by d T X i φ([ a0 ··· ad ] ) = aix . i=0 Remark 4.9. [rem-off-by-one] d d+1 It is a common mistake to think that R[x]≤d is isomorphic to R (rather than R ), but this is not correct. 
Note that the list 0, 1, 2, 3 has four entries (not three), and similarly, the list 0, 1, 2, . . . , d has d + 1 entries (not d). Example 4.10. [eg-even-odd] A function f : R −→ R is said to be even if f(−x) = f(x) for all x, and odd if f(−x) = −f(x) for all x. For example, cos(−x) = cos(x) and sin(−x) = − sin(x), so cos is even and sin is odd. (Of course, most functions are neither even nor odd.) We write EF for the set of even functions, so EF is a subset of the set F (R) of all functions from R to R, and cos ∈ EF . If f and g are even, it is clear that f + g is also even. If f is even and t is a constant, then it is clear that tf is also even; and the zero function is certainly even as well. This shows that EF is actually a subspace of F (R). Similarly, the set OF of odd functions is a subspace of F (R). Example 4.11. [eg-diffeq] Let V be the vector space of smooth functions u(x, t) in two variables x and t (to be thought of as position and time). • We say that u is a solution of the Wave Equation if ∂2u ∂2u + = 0. ∂x2 ∂t2 This equation governs the propagation of small waves in deep water, or of electromagnetic waves in empty space. • We say that u is a solution of the Heat Equation if ∂u ∂2u − = 0. ∂t ∂x2 This governs the flow of heat along an iron bar. • We say that u is a solution of the Korteweg-de Vries (KdV) equation if ∂u ∂3u ∂u + − 6u = 0. ∂t ∂x3 ∂x This equation governs the propagation of large waves in shallow water. The set of solutions of the Wave Equation is a vector subspace of V , as is the set of solutions to the Heat Equation. However, the sum of two solutions to the KdV equation does not satisfy the KdV equation, so the set of solutions is not a subspace of V . In other words, the Wave Equation and the Heat Equation are linear, but the KdV equation is not. The distinction between linear and nonlinear differential equations is of fundamental importance in physics. Linear equations can generally be solved analytically, or by efficient computer algorithms, but nonlinear equations require far more computing power. The equations of electromagnetism are linear, which explains why hundreds of different radio, TV and mobile phone channels can coexist, together with visible light (which is also a form of electromagnetic radiation), with little or no interference. The motion of fluids and gasses is governed by the Navier-Stokes equation, which is nonlinear; because of this, massive supercomputers are needed for weather forecasting, climate modelling, and aircraft design. 12 Example 4.12. [eg-matrix-subspaces] Consider the following sets of 3 × 3 matrices: T U0 = {symmetric matrices} = {A ∈ M3R | A = A} T U1 = {antisymmetric matrices} = {A ∈ M3R | A = −A} U2 = {trace-free matrices} = {A ∈ M3R | trace(A) = 0} U3 = { diagonal matrices } = {A ∈ M3R | Aij = 0 whenever i 6= j} U4 = { strictly upper-triangular matrices } = {A ∈ M3R | Aij = 0 whenever i ≥ j} U5 = { invertible matrices } = {A ∈ M3R | det(A) 6= 0} U6 = { noninvertible matrices } = {A ∈ M3R | det(A) = 0} Then U0,...,U4 are all subspaces of M3R. We will prove this for U0 and U4; the other cases are similar. T T Firstly, it is clear that the zero matrix has 0 = 0, so 0 ∈ U0. Suppose that A, B ∈ U0 (so A = A and BT = B) and s, t ∈ R. Then (sA + tB)T = sAT + tBT = sA + tB, so sA + tB ∈ U0. Using Remark 4.3, we conclude that U0 is a subspace. Now consider U4. The elements of U are the matrices of the form 4   0 a12 a13 A = 0 0 a23 0 0 0

In particular, the zero matrix is an element of U4 (with a12 = a13 = a23 = 0). Now suppose that A, B ∈ U4 and s, t ∈ R. We have       0 a12 a13 0 b12 b13 0 sa12 + tb12 sa13 + tb13 sA + tB = s 0 0 a23 + t 0 0 b23 = 0 0 sa23 + tb23 , 0 0 0 0 0 0 0 0 0 which shows that sA + tB is again strictly upper triangular, and so is an element of U4. Using Remark 4.3 again, we conclude that U4 is also a subspace. On the other hand, U5 is not a subspace, because it does not contain the zero matrix. Similarly, U6 is not a subspace: if we put h 1 0 0 i h 0 0 0 i A = 0 0 0 B = 0 1 0 0 0 0 0 0 1 then A, B ∈ U6 but A + B = I 6∈ U6. Definition 4.13. [defn-sum] Let U be a vector space, and let V and W be subspaces of U. We put V + W = {u ∈ U | u = v + w for some v ∈ V and w ∈ W }. Example 4.14. [eg-sum-easy] If U = R3 and h x i h 0 i V = { 0 | x ∈ } W = { 0 | z ∈ } 0 R z R then h x i V + W = { 0 | x, z ∈ } z R Example 4.15. [eg-sum-matrix] If U = M2R and a b  0 b  V = {[ 0 0 ] | a, b ∈ R} W = { 0 d | b, d ∈ R} then  a b  V + W = { 0 d | a, b, d ∈ R}.  a b  a b 0 0 a b 0 0 Indeed, any matrix of the form A = 0 d can be written as A = [ 0 0 ] + [ 0 d ] with [ 0 0 ] ∈ V and [ 0 d ] ∈ W , so A ∈ V + W . Conversely, any A ∈ V + W can be written as A = B + C with B ∈ V and C ∈ W . This  a b1   0 b2  means that B has the form B = 0 0 for some a, b1 ∈ R and C has the form C = 0 d for some b2, d ∈ R,  a b1+b2  so A = 0 d , which lies in V + W . 13 Proposition 4.16. [prop-meet-sum] Let U be a vector space, and let V and W be subspaces of U. Then both V ∩ W and V + W are subspaces of U.

Proof. We first consider V ∩ W . As V is a subspace we have 0 ∈ V , and as W is a subspace we have 0 ∈ W , so 0 ∈ V ∩ W . Next, suppose we have x, x0 ∈ V ∩ W . Then x, x0 ∈ V and V is a subspace, so x + x0 ∈ V . Similarly, we have x, x0 ∈ W and W is a subspace so x + x0 ∈ W . This shows that x + x0 ∈ V ∩ W , so V ∩ W is closed under addition. Finally consider x ∈ V ∩ W and t ∈ R. Then x ∈ V and V is a subspace so tx ∈ V . Similarly x ∈ W and W is a subspace so tx ∈ W . This shows that tx ∈ V ∩ W , so V ∩ W is closed under scalar multiplication, so V ∩ W is a subspace. Now consider the space V + W . We can write 0 as 0 + 0 with 0 ∈ V and 0 ∈ W , so 0 ∈ V + W . Now suppose we have x, x0 ∈ V + W . As x ∈ V + W we can find v ∈ V and w ∈ W such that x = v + w. As x0 ∈ V + W we can also find v0 ∈ V and w0 ∈ W such that x0 = v0 + w0. We then have v + v0 ∈ V (because V is closed under addition) and w + w0 ∈ W (because W is closed under addition). We also have x + x0 = (v + v0) + (w + w0) with v + v0 ∈ V and w + w0 ∈ W , so x + x0 ∈ V + W . This shows that V + W is closed under addition. Now suppose we have t ∈ R. Then tv ∈ V (because V is closed under scalar multiplication) and tw ∈ W (because W is closed under scalar multiplication). We thus have tx = tv + tw with tv ∈ V and tw ∈ W , so tx ∈ V + W . This shows that V + W is also closed under scalar multiplication, so it is a subspace. 
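Proposition 4.16 is used constantly in calculations like the one in the next example, where V ∩ W is found by solving the two defining equations simultaneously. For larger systems one can let a computer do the elimination; the sketch below uses sympy (not part of the course) to recover the line that Example 4.17 finds by hand.

import sympy as sp

# V = {x + 2y + 3z = 0} and W = {3x + 2y + z = 0}: a vector lies in V ∩ W exactly
# when it is in the null space of the matrix formed from the two equations.
M = sp.Matrix([[1, 2, 3],
               [3, 2, 1]])
print(M.nullspace())   # one basis vector, proportional to [1, -2, 1]^T, so V ∩ W = {[x, -2x, x]^T | x in R}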

Example 4.17. [eg-intersect-planes] Take U = R3 and V = {[x, y, z]T | x + 2y + 3z = 0} W = {[x, y, z]T | 3x + 2y + z = 0}.

We claim that
V ∩ W = {[x, −2x, x]^T | x ∈ R}
and V + W = R^3. Indeed, a vector [x, y, z]^T lies in V ∩ W iff we have x + 2y + 3z = 0 and also 3x + 2y + z = 0. If we subtract these two equations and divide by two, we find that z = x. If we feed this back into the first equation, we see that y = −2x. Conversely, if y = −2x and z = x we see directly that both equations are satisfied. It follows that V ∩ W = {[x, −2x, x]^T | x ∈ R} as claimed. Next, consider an arbitrary vector u = [x, y, z]^T ∈ R^3. Put
v = (1/12)[12x + 8y + 4z, 3x + 2y + z, −6x − 4y − 2z]^T
w = (1/12)[−8y − 4z, −3x + 10y − z, 6x + 4y + 14z]^T,

so v + w = u. One can check that

(12x + 8y + 4z) + 2(3x + 2y + z) + 3(−6x − 4y − 2z) = 0 3(−8y − 4z) + 2(−3x + 10y − z) + (6x + 4y + 14z) = 0

so v ∈ V and w ∈ W . This shows that u ∈ V + W , so V + W = R3. Example 4.18. [eg-intersect-poly] Take

U = R[x]≤4 V = {f ∈ U | f(0) = f 0(0) = 0} W = {f ∈ U | f(−x) = f(x) for all x}.

To understand these, it is best to write the defining conditions more explicitly in terms of the coefficients of 2 3 4 f. Any element f ∈ U can be written as f(x) = a0 + a1x + a2x + a3x + a4x for some a0, . . . , a4. We then 14 have

f(0) = a0
f'(x) = a1 + 2a2x + 3a3x^2 + 4a4x^3
f'(0) = a1
f(−x) = a0 − a1x + a2x^2 − a3x^3 + a4x^4
f(x) − f(−x) = 2a1x + 2a3x^3

Thus f ∈ V iff a0 = a1 = 0, and f ∈ W iff f(x) = f(−x) iff a1 = a3 = 0. This means that
U = {a0 + a1x + a2x^2 + a3x^3 + a4x^4 | a0, . . . , a4 ∈ R}
V = {a2x^2 + a3x^3 + a4x^4 | a2, a3, a4 ∈ R}
W = {a0 + a2x^2 + a4x^4 | a0, a2, a4 ∈ R}

From this we see that f ∈ V ∩ W iff a0 = a1 = a3 = 0, so
V ∩ W = {a2x^2 + a4x^4 | a2, a4 ∈ R}.
We next claim that f ∈ V + W iff a1 = 0, so f has no term in x^1. Indeed, from the formulae above we see that any polynomial in V or in W has no term in x^1, so if we add together a polynomial in V and a polynomial in W we will still have no term in x^1, so for f ∈ V + W we have a1 = 0 as claimed. Conversely, if f has no term in x^1 then we can write
f(x) = a0 + a2x^2 + a3x^3 + a4x^4 = a3x^3 + (a0 + a2x^2 + a4x^4),
with a3x^3 ∈ V and a0 + a2x^2 + a4x^4 ∈ W, so f ∈ V + W. In particular, the polynomial f(x) = x does not lie in V + W, so V + W ≠ U.

Definition 4.19. [defn-ker-img] Let U and V be vector spaces, and let φ: U −→ V be a linear map. Then the kernel and image of φ are defined by
ker(φ) = {u ∈ U | φ(u) = 0}
image(φ) = {v ∈ V | v = φ(u) for some u ∈ U}.

Example 4.20. [eg-ker-img-i] Define π: R^3 → R^3 by π([x, y, z]^T) = [0, y, z]^T. Then
ker(π) = {[x, 0, 0]^T | x ∈ R}
image(π) = {[0, y, z]^T | y, z ∈ R}

Proposition 4.21. [prop-ker-img] Let U and V be vector spaces, and let φ: U −→ V be a linear map. Then ker(φ) is a subspace of U, and image(φ) is a subspace of V.

Proof. Firstly, we have φ(0U ) = 0V , which shows both that 0U ∈ ker(φ) and that 0V ∈ image(φ). Next, suppose that u, u0 ∈ ker(φ), which means that φ(u) = φ(u0) = 0. As φ is linear this implies that φ(u + u0) = φ(u) + φ(u0) = 0 + 0 = 0, so u + u0 ∈ ker(φ). This shows that ker(φ) is closed under addition. Now suppose we have t ∈ R. Using the linearity of φ again, we have φ(tu) = tφ(u) = t.0 = 0, so tu ∈ ker(φ). This means that ker(φ) is also closed under scalar multiplication, so it is a subspace of U. Now suppose we have v, v0 ∈ image(φ). This means that we can find x, x0 ∈ U with φ(x) = v and φ(x0) = v0. We thus have x + x0, tx ∈ U and as φ is linear we have φ(x + x0) = φ(x) + φ(x0) = v + v0 and φ(tx) = tφ(x) = tv. This shows that v + v0 and tv lie in image(φ), so image(φ) is closed under addition and scalar multiplication, so it is a subspace.  15 Example 4.22. [eg-ker-img-ii] Define φ: R3 −→ R3 by φ([x, y, z]T ) = [x − y, y − z, z − x]T . Then T 3 T ker(φ) = {[x, y, z] ∈ R | x = y = z} = {[t, t, t] | t ∈ R} T 3 T 2 image(φ) = {[x, y, z] ∈ R | x + y + z = 0} = {[x, y, −x − y] | x, y ∈ R }. Indeed, for the kernel we have [x, y, z]T ∈ ker(φ) iff φ([x, y, z]T ) = [0, 0, 0]T iff x − y = y − z = z − x = 0 iff x = y = z, which means that [x, y, z]T = [t, t, t]T for some t. For the image, note that if x + y + z = 0 then φ([0, −x, −x − y]T ) = [0 − (−x), (−x) − (−x − y), −x − y − 0]T = [x, y, z]T , so [x, y, z]T ∈ image(φ). Conversely, if [x, y, z]T ∈ image(φ) then [x, y, z]T = φ([u, v, w]T ) for some u, v, w ∈ R, which means that x = u − v and y = v − w and z = w − u, so x + y + z = (u − v) + (v − w) + (w − u) = 0. Thus ker(φ) is a line through the origin (and thus a one-dimensional subspace) and image(φ) is a plane through the origin (and thus a two-dimensional subspace). Example 4.23. [eg-anti-symm] T T Define φ: MnR −→ MnR by φ(A) = A − A (which is linear). Then clearly φ(A) = 0 iff A = A iff A is a symmetric matrix. Thus ker(φ) = {n × n symmetric matrices }. We claim that also image(φ) = {n × n antisymmetric matrices }. For brevity, we write W for the set of antisymmetric matrices, so we must show that image(φ) = W . For any A we have φ(A)T = (A − AT )T = AT − ATT , but ATT = A, so φ(A)T = AT − A = −φ(A). This shows that φ(A) is always antisymmetric, so image(φ) ⊆ W . Next, if B is antisymmetric then BT = −B so φ(B/2) = B/2 − BT /2 = B/2 + B/2 = B. Thus B is φ(something), so B ∈ image(φ). This shows that W ⊆ image(φ), so W = image(φ) as claimed. Example 4.24. [eg-ker-img-iv] 3 T Define φ: R[x]≤1 → R by φ(f) = [f(0), f(1), f(2)] . More explicitly, we have φ(ax + b) = [b, a + b, 2a + b]T = a[0, 1, 2]T + b[1, 1, 1]T . If ax + b ∈ ker(φ) then we must have φ(ax + b) = 0, or in other words b = a + b = 2a + b = 0, which implies that a = b = 0 and so ax + b = 0. This means that ker(φ) = {0}. Next, we claim that image(φ) = {[u, v, w]T | u − 2v + w = 0}. Indeed, if [u, v, w]T ∈ image(φ) then we must have [u, v, w] = φ(ax + b) = [b, a + b, 2a + b] for some a, b ∈ R. This means that u − 2v + w = b − 2(a + b) + 2a + b = 0, as required. Conversely, suppose that we have a vector [u, v, w]T ∈ R3 with u − 2v + w = 0. We then have w = 2v − u and so  u  h u i h u i φ((v − u)x + u) = (v−u)+u = v = v , 2(v−u)+u 2v−u w so [u, v, w]T is in the image of φ. Remark 4.25. [rem-ker-img-iv] How did we arrive at our description of the image, and our proof that that description is correct? We need to consider a vector [u, v, w]T and ask whether it can be written as φ(f) for some polynomial f(x) = ax + b. 
In other words, we want to have [u, v, w]T = [b, a + b, 2a + b]T , which means that u = b and v = a + b and w = 2a + b. The first two equations tell us that the only possible solution is to take a = v − u and b = u, so f(x) = (v − u)x + u. This potential solution is only a real solution if the third equation w = 2a + b is also satisfied, which means that w = 2(v − u) + u = 2v − u, which means that w − 2v + u = 0. Recall that a map φ: U −→ V is surjective if every element v ∈ V has the form φ(u) for some u ∈ U. Moreover, φ is said to be injective if whenever φ(u) = φ(u0) we have u = u0. 16 Example 4.26. [eg-ker-img-v] 2 0 T Define φ: R[x]≤2 → R by φ(f) = [f(1), f (1)] . More explicitly, we have φ(ax2 + bx + c) = [a + b + c, 2a + b]T . It follows that ax2 + bx + c lies in ker(φ) iff a + b + c = 0 = 2a + b, which gives b = −2a and c = −a − b = −a + 2a = a, so ax2 + bx + c = ax2 − 2ax + a = a(x2 − 2x + 1) = a(x − 1)2. It follows that ker(φ) = {a(x − 1)2 | a ∈ R}. In particular, ker(φ) is nonzero, so φ is not injective. Explicitly, we have x2 + 1 6= 2x but φ(x2 + 1) = [2, 2]T = φ(2x). On the other hand, we claim that φ is surjective. Indeed, for any vector a = [u, v]T ∈ R2 we check that φ(vx + u − v) = [v + u − v, v]T = [u, v]T = a, so a is φ(something) as required. Remark 4.27. [rem-ker-img-v] How did we arrive at the proof of surjectivity? We need to find a polynomial f(x) = ax2 + bx + c such that φ(f) = [u, v]T , or equivalently [a + b + c, 2a + b] = [u, v], which means that a + b + c = u and 2a + b = v. These equations can be solved to give b = v − 2a and c = u − v + a, with a arbitrary. We can choose to take a = 0, giving b = v and c = u − v, so f(x) = vx + u − v. Proposition 4.28. [prop-epi-mono] Let U and V be vector spaces, and let φ: U −→ V be a linear map. Then φ is injective iff ker(φ) = {0}, and φ is surjective iff image(φ) = V . Proof. • Suppose that φ is injective, so whenever φ(u) = φ(u0) we have u = u0. Suppose that u ∈ ker(φ). Then φ(u) = 0 = φ(0). As φ is injective and φ(u) = φ(0), we must have u = 0. Thus ker(φ) = {0}, as claimed. • Conversely, suppose that ker(φ) = {0}. Suppose that φ(u) = φ(u0). Then φ(u−u0) = φ(u)−φ(u0) = 0, so u − u0 ∈ ker(φ) = {0}, so u − u0 = 0, so u = u0. This means that φ is injective. • Recall that image(φ) is the set of those v ∈ V such that v = φ(u) for some u ∈ U. Thus image(φ) = V iff every element v ∈ V has the form φ(u) for some u ∈ U, which is precisely what it means for φ to be surjective.  Corollary 4.29. [cor-iso] φ: U −→ V is an isomorphism iff ker(φ) = 0 and image(φ) = V .  Example 4.30. [eg-ker-img-vi] Consider the map φ: R3 → R3 given by φ([x, y, z]T ) = [x − y, y − z, z − x]T as in Example 4.22. Then ker(φ) = {[t, t, t]T | t ∈ R}, which is not zero, so φ is not injective. Explicitly, we have [1, 2, 3]T 6= [4, 5, 6]T but φ([1, 2, 3]T ) = [1 − 2, 2 − 3, 3 − 1]T = [−1, −1, 2]T = φ([4, 5, 6]T ), so [1, 2, 3]T and [4, 5, 6]T are distinct points with the same image under φ, so φ is not injective. Moreover, we have seen that image(φ) = {[u, v, w]T | u + v + w = 0}, which is not all of R3. In particular, the vector [1, 1, 1]T does not lie in image(φ) (because 1 + 1 + 1 6= 0), so it cannot be written as φ([x, y, z]T ) for any [x, y, z]. This means that φ is not surjective. Example 4.31. [eg-ker-img-vii] 3 T Consider the map φ: R[x]≤1 → R given by φ(f) = [f(0), f(1), f(2)] as in Example 4.24. We saw there that ker(φ) = {0}, so φ is injective. However, we have image(φ) = {[u, v, w]T ∈ R3 | u − 2v + w = 0}, which is not the whole of R3. 
In particular, the vector a = [1, 0, 0]T does not lie in image(φ) (because 1 − 2.0 + 0 6= 0), so φ is not surjective. 17 2 0 T Example 4.32. Consider the map φ: R[x]≤2 → R given by φ(f) = [f(1), f (1)] , as in Example 4.26. We saw there that the polynomial f(x) = (x − 1)2 is a nonzero element of ker(φ), so φ is not injective. We also saw that image(φ) = R2, so φ is surjective. Definition 4.33. [defn-oplus] Let V and W be vector spaces. We define V ⊕ W to be the set of pairs (v, w) with v ∈ V and w ∈ W . Addition and scalar multiplication are defined in the obvious way: (v, w) + (v0, w0) = (v + v0, w + w0) t.(v, w) = (tv, tw). This makes V ⊕ W into a vector space, called the direct sum of V and W . We may sometimes use the notation V × W instead of V ⊕ W . Example 4.34. [eg-Rp-oplus-Rq] An element of Rp ⊕ Rq is a pair (x, y), where x is a list of p real numbers, and y is a list of q real numbers. Such a pair is essentially the same thing as a list of p + q real numbers, so Rp ⊕ Rq = Rp+q. Remark 4.35. [rem-gluing-vectors] Strictly speaking, Rp ⊕Rq is only isomorphic to Rp+q, not equal to it. This is a pedantic distinction if you are doing things by hand, but it becomes more significant if you are using a computer. Maple would represent an element of R2 ⊕ R3 as something like a=[<10,20>,<7,8,9>], and an element of R5 as something like b=<10,20,7,8,9>. To convert from the second form to the first, you can use syntax like this: a := [b[1..2],b[3..5]]; Conversion the other way is a little more tricky. It is easiest to define an auxiliary function called strip() as follows: strip := (u) -> op(convert(u,list)); This converts a vector (like <7,8,9>) to an unbracketed sequence (like 7,8,9). You can then do b := < strip(a[1]), strip(a[2]) >; Now suppose that V and W are subspaces of a third space U. We then have a space V ⊕ W as above, and also a subspace V + W ≤ U as in Definition 4.13. We need to understand the relationship between these. Proposition 4.36. [prop-direct-internal] The rule σ(v, w) = v + w defines a linear map σ : V ⊕ W → U, whose image is V + W , and whose kernel is the space X = {(x, −x) ∈ V ⊕ W | x ∈ V ∩ W }. Thus, if V ∩ W = 0 then ker(σ) = 0 and σ gives an isomorphism V ⊕ W → V + W . Proof. We leave it as an exercise to check that σ is a linear map. The image is the set of things of the form v + w for some v ∈ V and w ∈ W , which is precisely the definition of V + W . The kernel is the set of pairs (x, y) ∈ V ⊕ W for which x + y = 0. This means that x ∈ V and y ∈ W and y = −x. Note then that x = −y and y ∈ W so x ∈ W . We also have x ∈ V , so x ∈ V ∩ W . This shows that ker(σ) = {(x, −x) | x ∈ V ∩ W }, as claimed. If V ∩ W = 0 then we get ker(σ) = 0, so σ is injective (by Proposition 4.28). If we regard it as a map to V + W (rather than to U) then it is also surjective, so it is an isomorphism V ⊕ W → V + W , as claimed.  Remark 4.37. [rem-direct-internal] If V ∩ W = 0 and V + W = U then σ gives an isomorphism V ⊕ W → U. In this situation it is common to say that U = V ⊕ W . This is not strictly true (because U is only isomorphic to V ⊕ W , not equal to it), but it is a harmless abuse of language. Sometimes people call V ⊕ W the external direct sum of V and W , and they say that U is the internal direct sum of V and W if U = V + W and V ∩ W = 0. Example 4.38. [eg-odd-plus-even] Consider the space F of all functions from R to R, and the subspaces EF and OF of even functions and odd functions. We claim that F = EF ⊕OF . To prove this, we must check that EF ∩OF = 0 and EF +OF = F . 
Suppose that f ∈ EF ∩ OF . Then for any x we have f(x) = f(−x) (because f ∈ EF ), but f(−x) = −f(x) 18 (because f ∈ OF ), so f(x) = −f(x), so f(x) = 0. Thus EF ∩ OF = 0, as required. Next, consider an arbitrary function g ∈ F . Put

g+(x) = (g(x) + g(−x))/2

g−(x) = (g(x) − g(−x))/2. Then

g+(−x) = (g(−x) + g(x))/2 = g+(x)

g−(−x) = (g(−x) − g(x))/2 = −g−(x),

so g+ ∈ EF and g− ∈ OF . It is also clear from the formulae that g = g+ + g−, so g ∈ EF + OF . This shows that EF + OF = F , so F = EF ⊕ OF as claimed. Example 4.39. [eg-id-plus-trace-free] Put

U = M2R  a b  V = {A ∈ M2R | trace(A) = 0} = { c −a | a, b, c ∈ R} t 0 W = {tI | t ∈ R} = {[ 0 t ] | t ∈ R}. We claim that U = V ⊕ W . To check this, first suppose that A ∈ V ∩ W . As A ∈ W we have A = tI for some t ∈ R, but trace(A) = 0 (because A ∈ V ) whereas trace(tI) = 2t, so we must have t = 0, which means p q that A = 0. This shows that V ∩ W = 0. Next, consider an arbitrary matrix B = [ r s ] ∈ U. We can write this as B = C + D, where h (p−s)/2 q i C = r (s−p)/2 ∈ V h i p + s D = (p+s)/2 0 = I ∈ W. 0 (p+s)/2 2 This shows that U = V + W . p q Remark 4.40. How did we find these formulae? We have a matrix B = [ r s ] ∈ U, and we want to write it  a b  t 0 as B = C + D with C ∈ U and D ∈ W . We must then have C = c −a for some a, b, c and D = [ 0 t ] for some t, and we want to have p q  a b  t 0  t+a b  [ r s ] = c −a + [ 0 t ] = c t−a , so b = q and c = r and t + a = p and t − a = s, which gives a = (p − s)/2 and t = (p + s)/2 as before. Exercises Exercise 4.1. [ex-subspaces-R-three] Consider the following subspaces of R4: U = {[w, x, y, z]T | w − x + y − z = 0} V = {[w, x, y, z]T | w + x + y = 0 = x + y + z} T W = {[u, u + v, u + 2v, u + 3v] | u, v ∈ R}. Find U ∩ V , U ∩ W and V ∩ W .

Exercise 4.2. [ex-degenerate-planes] Consider the planes P, Q and R in R^3 given by
P = {[x, y, z]^T | x + 2y + 3z = 0}
Q = {[x, y, z]^T | 3x + 2y + z = 0}
R = {[x, y, z]^T | x + y + z = 0}.
This system of planes has an unusual feature, not shared by most other systems of three planes through the origin. What is it?

Exercise 4.3. Let a, b and c be nonzero real numbers. Show that the matrix
A = [ 0 a b ; −a 0 c ; −b −c 0 ]
has one real eigenvalue, and two purely imaginary eigenvalues.

Exercise 4.4. Find the rank of the following matrix:
A = [ 1 1 1 1 ; 1 1 −1 −1 ; 1 −1 0 0 ; 0 0 1 −1 ]

Exercise 4.5. [ex-check-subspace] Which of the following subsets of R^4 is a subspace?
U0 = {[w, x, y, z]^T | w + x = 0}
U1 = {[w, x, y, z]^T | w + x = 1}
U2 = {[w, x, y, z]^T | w + 2x + 3y + 4z = 0}
U3 = {[w, x, y, z]^T | w + x^2 + y^3 + z^4 = 0}
U4 = {[w, x, y, z]^T | w^2 + x^2 = 0}

Exercise 4.6. [ex-check-subspace-FR] Which of the following subsets of F (R) are subspaces?

U0 = {f | f(0) = 0} U1 = {f | f(1) = 1}

U2 = {f | f(0) ≥ 0} U3 = {f | f(0) = f(1)}

U4 = {f | f(0)f(1) = f(2)f(3)}.

Exercise 4.7. [ex-find-eg-subspace] For each of the following vector spaces V , give an example of a subspace W ≤ V such that W 6= 0 and W 6= V .

(a) V = R[x]≤3 (b) V = M2,3R (c) V = {[x, y, z]T ∈ R3 | x + y + z = 0}

Exercise 4.8. [ex-find-eg-trivial-intersect] For each of the following vector spaces U, give an example of subspaces V,W ≤ U such that V 6= 0 and W 6= 0 but V ∩ W = 0. (a) U = R4 (b) U = M2R (c) U = {[x, y, z]T ∈ R3 | x + y + z = 0}

Exercise 4.9. [ex-quad-intersection] Put U = R[x]≤2 and V = {f ∈ U | f(0) = 0} and W = {f ∈ U | f(1) + f(−1) = 0}. Show that V ∩ W is the set of all polynomials of the form f(x) = bx, and that V + W = U. Please write your argument carefully, using complete sentences and correct notation.

20 Exercise 4.10. [ex-inj-misc-i] 2 u  u −u  Define α: R → M2R by α [ v ] = −v v . Show that α is injective, and that 1 image(α) = {A ∈ M2R | A [ 1 ] = 0}.

Exercise 4.11. [ex-inj-misc-ii] 1 Define φ: M2R → M2R by φ(A) = A − 2 trace(A)I.  a b  (a) Find φ c d . (b) Show that ker(φ) = {aI | a ∈ R}. (c) Show that image(φ) = {A ∈ M2R | trace(A) = 0}.

Exercise 4.12. [ex-inj-misc-iii] 3 Define φ: R[x]≤2 → R by Z 0 Z 1 Z 1 T φ(f) = f(x) dx, f(x) dx, f(x) dx . −1 −1 0 (a) If f(x) = ax2 + bx + c, find φ(f). (b) Show that ker(φ) = {c(1 − 3x2) | c ∈ R}. T (c) Find a function g+(x) = px + q such that φ(g+) = [1, 1, 0] . T (d) Put g−(x) = g+(−x), and show that φ(g−) = [0, 1, 1] . (e) Deduce that image(φ) = {[u, v, w]T ∈ R3 | v = u + w}.

Exercise 4.13. [ex-inj-misc-iv] For each of the following linear maps, decide whether the map is injective, whether it is surjective, and whether it is an isomorphism. Please write your arguments carefully, using complete sentences and correct notation. Where counterexamples are required, make them as simple and specific as possible. 2 3 x h x i (a) φ: → given by φ [ y ] = y R R x h x i x−y (b) φ: 3 → 2 given by φ y =   R R z y−z 3 0 00 T (c) φ: R[x]≤2 → R given by φ(f) = [f(0), f (0), f (0)] 2 x  0 x+y  (d) φ: R → M2R given by φ [ y ] = x+y 0 R 1 (e) φ: R[x] → R given by φ(f) = −1 f(x) dx

Exercise 4.14. [ex-inj-misc-v] Put s 2t L = {[ 2s ] | s ∈ R} M = {[ t ] | t ∈ R} Show that L ∩ M = 0 and L + M = R2 (or in other words, R2 = L ⊕ M).

Exercise 4.15. [ex-trace-free-symmetric] T T Put U = M3R and V = {A ∈ U | A = A} and W = {A ∈ U | A = −A}. Show that V ∩ W = 0 and V + W = U (or in other words, U = V ⊕ W ).
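Returning to Example 4.38, the decomposition F = EF ⊕ OF is completely explicit, so it can be checked numerically for any particular function. The Python sketch below splits g(x) = e^x into its even part cosh(x) and its odd part sinh(x), using the formulae for g+ and g− from that example; the helper names are invented for this illustration.

import numpy as np

def even_part(g):
    return lambda x: (g(x) + g(-x)) / 2

def odd_part(g):
    return lambda x: (g(x) - g(-x)) / 2

g = np.exp
g_plus, g_minus = even_part(g), odd_part(g)

x = np.linspace(-2, 2, 9)
print(np.allclose(g_plus(x), np.cosh(x)))          # True: the even part of e^x is cosh(x)
print(np.allclose(g_minus(x), np.sinh(x)))         # True: the odd part of e^x is sinh(x)
print(np.allclose(g_plus(x) + g_minus(x), g(x)))   # True: g = g_+ + g_-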

5. Independence and spanning sets Two randomly-chosen vectors in R2 will generally not be parallel; it is an important special case if they happen to point in the same direction. Similarly, given three vectors u, v and w in R3, there will usually not be any plane that contains all three vectors. This means that we can get from the origin to any point by travelling a certain (possibly negative) distance in the direction of u, then a certain distance in the direction of v, then a certain distance in the 21 direction of w. The case where u, v and w all lie in a common plane will have special geometric significance in any purely mathematical problem, and will often have special physical significance in applied problems. Our task in this section is to generalise these ideas, and study the corresponding special cases in an arbitrary vector space V . The abstract picture will be illuminating even in the case of R2 and R3. Definition 5.1. [defn-dependent] Let V be a vector space, and let V = v1, . . . , vn be a list of elements of V .A linear relation between the vi’s T n T is a vector [λ1, . . . , λn] ∈ R such that λ1v1 + ... + λnvn = 0. The vector [0,..., 0] is obviously a linear relation, called the trivial relation. If there is a nontrivial linear relation, we say that the list V is linearly dependent. Otherwise, if the only relation is the trivial one, we say that the list V is linearly independent.

Example 5.2. [eg-vector-dependence] Consider the following vectors in R^3:
v1 = [1, 2, 3]^T    v2 = [4, 5, 6]^T    v3 = [7, 8, 9]^T

T Then one finds that v1 − 2v2 + v3 = 0, so [1, −2, 1] is a nontrivial linear relation, so the list v1, v2, v3 is linearly dependent.

Example 5.3. [eg-vector-independence] Consider the following vectors:

v1 = [1, 1, 1]^T    v2 = [0, 1, 1]^T    v3 = [0, 0, 1]^T.

A linear relation between these is a vector [λ1, λ2, λ3]^T such that λ1v1 + λ2v2 + λ3v3 = 0, or equivalently
[λ1, λ1 + λ2, λ1 + λ2 + λ3]^T = [0, 0, 0]^T.

From this we see that λ1 = 0, then from the equation λ1 + λ2 = 0 we see that λ2 = 0, then from the equation λ1 + λ2 + λ3 = 0 we see that λ3 = 0. Thus, the only linear relation is the trivial one where [λ1, λ2, λ3]^T = [0, 0, 0]^T, so our vectors v1, v2, v3 are linearly independent.
Example 5.4. [eg-poly-dependence] Consider the polynomials pn(x) = (x + n)^2, so

p0(x) = x^2
p1(x) = x^2 + 2x + 1
p2(x) = x^2 + 4x + 4
p3(x) = x^2 + 6x + 9.

I claim that the list p0, p1, p2 is linearly independent. Indeed, a linear relation between them is a vector [λ0, λ1, λ2]^T such that λ0p0 + λ1p1 + λ2p2 = 0, or equivalently

(λ0 + λ1 + λ2)x^2 + (2λ1 + 4λ2)x + (λ1 + 4λ2) = 0 for all x, or equivalently

λ0 + λ1 + λ2 = 0, 2λ1 + 4λ2 = 0, λ1 + 4λ2 = 0.

Subtracting the last two equations gives λ1 = 0, putting this in the last equation gives λ2 = 0, and now the first equation gives λ0 = 0. Thus, the only linear relation is [λ0, λ1, λ2]^T = [0, 0, 0]^T, so the list p0, p1, p2 is independent.
I next claim, however, that the list p0, p1, p2, p3 is linearly dependent. Indeed, you can check that p0 − 3p1 + 3p2 − p3 = 0, so [1, −3, 3, −1]^T is a nontrivial linear relation. (The entries in this list are the coefficients in the expansion of (T − 1)^3 = T^3 − 3T^2 + 3T − 1; this is not a coincidence, but the explanation would take us too far afield.)
Example 5.5. [eg-function-dependence] Consider the functions
f1(x) = e^x
f2(x) = e^{−x}

f3(x) = sinh(x)

f4(x) = cosh(x).
These are linearly dependent, because sinh(x) is by definition just (e^x − e^{−x})/2, so
f1 − f2 − 2f3 = e^x − e^{−x} − (e^x − e^{−x}) = 0,
so [1, −1, −2, 0]^T is a nontrivial linear relation. Similarly, we have cosh(x) = (e^x + e^{−x})/2, so [1, 1, 0, −2]^T is another linear relation.
Example 5.6. [eg-matrix-independence] Consider the matrices
E1 = [ 1 0 ; 0 0 ]    E2 = [ 0 1 ; 0 0 ]    E3 = [ 0 0 ; 1 0 ]    E4 = [ 0 0 ; 0 1 ].
A linear relation between these is a vector [λ1, λ2, λ3, λ4]^T such that λ1E1 + λ2E2 + λ3E3 + λ4E4 is the zero matrix. But
λ1E1 + λ2E2 + λ3E3 + λ4E4 = [ λ1 λ2 ; λ3 λ4 ],
and this is only the zero matrix if λ1 = λ2 = λ3 = λ4 = 0. Thus, the only linear relation is the trivial one, showing that E1, . . . , E4 are linearly independent.
Remark 5.7. [rem-independent-mu] Let V be a vector space, and let V = v1, . . . , vn be a list of elements of V. We have a linear map µV : R^n → V, given by
µV([λ1, . . . , λn]^T) = λ1v1 + ... + λnvn.
By definition, a linear relation between the vi's is just a vector λ = [λ1, . . . , λn]^T ∈ R^n such that µV(λ) = 0, or in other words, an element of the kernel of µV. Thus, V is linearly independent iff ker(µV) = {0} iff µV is injective (by Proposition 4.28).
Definition 5.8. [defn-wronskian] Let C^∞(R) be the vector space of smooth functions f : R → R. Given f1, . . . , fn ∈ C^∞(R), their Wronskian matrix is the matrix WM(f1, . . . , fn) whose entries are the derivatives fi^(j) for i = 1, . . . , n and j = 0, . . . , n − 1. For example, in the case n = 4, we have
WM(f1, f2, f3, f4) = [ f1     f2     f3     f4   ]
                     [ f1'    f2'    f3'    f4'  ]
                     [ f1''   f2''   f3''   f4'' ]
                     [ f1'''  f2'''  f3'''  f4''' ].

The Wronskian of f1, . . . , fn is the determinant of the Wronskian matrix; it is written W(f1, . . . , fn). Note that the entries in the Wronskian matrix are all functions, so the determinant is again a function.
Example 5.9. [eg-wronskian] Consider the functions exp, sin and cos, so exp' = exp and sin' = cos and cos' = −sin and sin^2 + cos^2 = 1. We have
W(exp, sin, cos) = det [ exp    sin    cos  ]
                       [ exp'   sin'   cos' ]
                       [ exp''  sin''  cos'' ]
                 = det [ exp    sin    cos  ]
                       [ exp    cos   −sin  ]
                       [ exp   −sin   −cos  ]
                 = exp·(−cos^2 − sin^2) − exp·(−sin·cos + sin·cos) + exp·(−sin^2 − cos^2)
                 = −2 exp.
Proposition 5.10. [prop-wronskian] If f1, . . . , fn are linearly dependent, then W(f1, . . . , fn) = 0. (More precisely, the function w = W(f1, . . . , fn) is the zero function, i.e. w(x) = 0 for all x.)
Proof. We will prove the case n = 3. The general case is essentially the same, but it just needs more complicated notation. If f1, f2, f3 are linearly dependent, then there are numbers λ1, λ2, λ3 (not all zero) such that λ1f1 + λ2f2 + λ3f3 is the zero function, which means that

λ1f1(x) + λ2f2(x) + λ3f3(x) = 0
for all x. We can differentiate this identity to see that λ1f1'(x) + λ2f2'(x) + λ3f3'(x) = 0 for all x, and then differentiate again to see that λ1f1''(x) + λ2f2''(x) + λ3f3''(x) = 0 for all x. Thus, if we let ui be the vector [fi(x), fi'(x), fi''(x)]^T (which is the i'th column of the Wronskian matrix), we see that λ1u1 + λ2u2 + λ3u3 = 0. This means that the columns of the Wronskian matrix are linearly dependent, which means that the determinant is zero, as claimed. □
Corollary 5.11. [cor-wronskian] If W(f1, . . . , fn) ≠ 0, then f1, . . . , fn are linearly independent. □
Remark 5.12. [rem-wronskian-reverse] Consider a pair of smooth functions like this:

[Figure: graphs of two smooth functions f1(x) and f2(x); f1(x) is nonzero only for x < 0, and f2(x) is nonzero only for x > 0.]

Suppose that f1(x) is zero (not just small) for x ≥ 0, and that f2(x) is zero for x ≤ 0. (It is not easy to write down formulae for such functions, but it can be done; we will not discuss this further here.) For x ≤ 0, the matrix WM(f1, f2)(x) has the form
[ f1(x)   0 ]
[ f1'(x)  0 ],
so the determinant is zero. For x ≥ 0, the matrix WM(f1, f2)(x) has the form
[ 0  f2(x)  ]
[ 0  f2'(x) ],
so the determinant is again zero. Thus W(f1, f2)(x) = 0 for all x, but f1 and f2 are not linearly dependent. This shows that the test in Proposition 5.10 is not reversible: if the functions are dependent then the Wronskian vanishes, but if the Wronskian vanishes then the functions need not be dependent. In practice it is rare to find such counterexamples, however.
Definition 5.13. [defn-span] Given a list V = v1, . . . , vn of elements of a vector space V, we write span(V) for the set of all vectors w ∈ V that can be written in the form w = λ1v1 + ... + λnvn for some λ1, . . . , λn ∈ R. Equivalently, span(V) is the image of the map µV : R^n → V (which shows that span(V) is a subspace of V). We say that V spans V if span(V) = V, or equivalently, if µV is surjective.
Remark 5.14. [rem-span] Often V will be a subspace of some larger space U. If you are asked whether certain vectors v1, . . . , vn span V, the first thing that you have to check is that they are actually elements of V.
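The determinant calculation in Example 5.9 (and Wronskians in general) can be reproduced with a computer algebra system. A small sketch using sympy (an illustration only; the matrix is laid out as in Definition 5.8, with row j holding the j-th derivatives):

    from sympy import Matrix, symbols, exp, sin, cos, diff, simplify

    x = symbols('x')
    fs = [exp(x), sin(x), cos(x)]

    # Wronskian matrix: entry (j, i) is the j-th derivative of f_i.
    WM = Matrix(3, 3, lambda j, i: diff(fs[i], x, j))
    print(simplify(WM.det()))    # -2*exp(x), as in Example 5.9

Since the answer is not the zero function, Corollary 5.11 confirms that exp, sin and cos are linearly independent.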

There is an obvious spanning list for R^n.
Definition 5.15. [defn-standard-basis] Let ei be the vector in R^n whose i'th entry is 1, with all other entries being zero. For example, in R^3 we have
e1 = [1, 0, 0]^T    e2 = [0, 1, 0]^T    e3 = [0, 0, 1]^T.
Example 5.16. [eg-standard-spanning-list] The list e1, . . . , en spans R^n. Indeed, any vector x ∈ R^n can be written as x1e1 + ... + xnen, which is a linear combination of e1, . . . , en, as required. For example, in R^3 we have

[x1, x2, x3]^T = x1[1, 0, 0]^T + x2[0, 1, 0]^T + x3[0, 0, 1]^T = x1e1 + x2e2 + x3e3.
Example 5.17. [eg-monomials-span] The list 1, x, . . . , x^n spans R[x]≤n. Indeed, any element of R[x]≤n is a polynomial of the form f(x) = a0 + a1x + ··· + anx^n, and so is visibly a linear combination of 1, x, . . . , x^n.
Example 5.18. [eg-span-i] Consider the vectors
u1 = [1, 1, 1, 1]^T    u2 = [1, 1, 1, 0]^T    u3 = [0, 1, 1, 1]^T    u4 = [0, 1, 0, 0]^T.
We claim that these span R^4. Indeed, consider an arbitrary vector v = [a, b, c, d]^T ∈ R^4. We have
(a − c + d)u1 + (c − d)u2 + (c − a)u3 + (b − c)u4
    = [a−c+d, a−c+d, a−c+d, a−c+d]^T + [c−d, c−d, c−d, 0]^T + [0, c−a, c−a, c−a]^T + [0, b−c, 0, 0]^T
    = [a, b, c, d]^T = v,
which shows that v is a linear combination of u1, . . . , u4, as required.
This is a perfectly valid argument, but it does rely on a formula that we pulled out of a hat. Here is an explanation of how the formula was constructed. We want to find p, q, r, s such that v = pu1 + qu2 + ru3 + su4, or equivalently
[a, b, c, d]^T = p[1, 1, 1, 1]^T + q[1, 1, 1, 0]^T + r[0, 1, 1, 1]^T + s[0, 1, 0, 0]^T = [p + q, p + q + r + s, p + q + r, p + r]^T,
or
p + q = a                 (1)
p + q + r + s = b         (2)
p + q + r = c             (3)
p + r = d                 (4)
Subtracting (3) and (4) gives q = c − d; subtracting (1) and (3) gives r = c − a; subtracting (2) and (3) gives s = b − c; putting q = c − d in (1) gives p = a − c + d. With these values we have

 a−c+d  " c−d # " 0 # " 0 # " a # (a − c + d)u + (c − d)u + (c − a)u + (b − c)u = a−c+d + c−d + c−a + b−c = b = v 1 2 3 3  a−c+d  c−d c−a 0 c a−c+d 0 c−a 0 d as required. Example 5.19. [eg-span-ii] 2 Consider the polynomials pi(x) = (x + i) . We claim that the list p−2, p−1, p0, p1, p2 spans R[x]≤2. Indeed, we have 2 p0(x) = x 2 2 p1(x) − p−1(x) = (x + 1) − (x − 1) = 4x 2 2 2 p2(x) + p−2(x) − 2p0(x) = (x + 2) + (x − 2) − 2x = 8. Thus for an arbitrary quadratic polynomial f(x) = ax2 + bx + c, we have 1 1 f(x) = ap0(x) + 4 b(p1(x) − p−1(x)) + 8 c(p2(x) + p−2(x) − 2p0(x)) c b c b c = 8 p−2(x) − 4 p−1(x) + (a − 4 )p0(x) + 4 p1(x) + 8 p2(x). Example 5.20. [eg-shm] Put V = {f ∈ C∞(R) | f 00 + f = 0}. We claim that the functions sin and cos span V . In other words, we claim that if f is a solution to the equation f 00(x) = −f(x) for all x, then there are constants a and b such that f(x) = a sin(x) + b cos(x) for all x. You have probably heard in a differential equations course that this is true, but you may not have seen a proof, so we will give one. Firstly, we have sin0 = cos and cos0 = − sin, so sin00 = − sin and cos00 = − cos, so sin and cos are indeed elements of V . Consider an arbitrary element f ∈ V . Put a = f 0(0) and b = f(0), and put g(x) = f(x) − a sin(x) − b cos(x). We claim that g = 0. First, we have g(0) = f(0) − a sin(0) − b cos(0) = b − a.0 − b.1 = 0 g0(0) = f 0(0) − a sin0(0) − b cos0(0) = a − a cos(0) + b sin(0) = a − a.1 − b.0 = 0. Now put h(x) = g(x)2 + g0(x)2; the above shows that h(0) = 0. Next, we have g ∈ V , so g00 = −g, so h0(x) = 2g(x)g0(x) + 2g0(x)g00(x) = 2g0(x)(g(x) + g00(x)) = 0. This means that h is constant, but h(0) = 0, so h(x) = 0 for all x. However, h(x) = g(x)2 + g0(x)2, which is the sum of two nonnegative quantities; the only way we can have h(x) = 0 is if g(x) = 0 = g0(x). This means that g = 0, so f(x) − a sin(x) − b cos(x) = 0, so f(x) = a sin(x) + b cos(x), as required. 25 This argument has an interesting physical interpretation. You should think of g(x) as representing some kind of vibration. The term g(x)2 gives the elastic energy and g0(x)2 gives the kinetic energy, so the equation h0(x) = 0 is just conservation of total energy. Definition 5.21. [defn-fin-dim] A vector space V is finite-dimensional if there is a (finite) list V = v1, . . . , vn of elements of V that spans V . Example 5.22. [eg-fin-dim] n Using our earlier examples of spanning sets, we see that the spaces R , Mn,mR and R[x]≤n are finite- dimensional. Example 5.23. [eg-Rx-not-fin-dim] The space R[x] is not finite-dimensional. To see this, consider a list P = p1, . . . , pn of polynomials. Let d be the maximum of the degrees of p1, . . . , pn. Then pi lies in R[x]≤d for all i, so the span of P is contained in d+1 R[x]≤d. In particular, the polynomial x does not lie in span(P), so P does not span all of R[x]. Definition 5.24. [defn-basis] A basis for a vector space V is a list V of elements of V that is linearly independent and also spans V . n Equivalently, a list V = v1, . . . , vn is a basis iff the map µV : R −→ V is a bijection. Example 5.25. [eg-antisym-basis] We will find a basis for the space V of antisymmetric 3 × 3 matrices. Such a matrix has the form h 0 a b i X = −a 0 c . −b −c 0 In other words, if we put h 0 1 0 i h 0 0 1 i h 0 0 0 i A = −1 0 0 B = 0 0 0 C = 0 0 1 , 0 0 0 −1 0 0 0 −1 0 then any antisymmetric matrix X can be written in the form X = aA + bB + cC. This means that the matrices A, B and C span V , and they are clearly independent, so they form a basis. 
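Looking back at Example 5.18, the formula that was "pulled out of a hat" is nothing more than the solution of the 4 × 4 system (1) to (4), so for any particular target vector a computer can find the coefficients directly. A sketch with numpy (the target vector below is an arbitrary choice made for this illustration):

    import numpy as np

    # Columns are u1, u2, u3, u4 from Example 5.18.
    U = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 1],
                  [1, 1, 1, 0],
                  [1, 0, 1, 0]], dtype=float)

    v = np.array([2.0, -1.0, 5.0, 3.0])    # [a, b, c, d]
    pqrs = np.linalg.solve(U, v)           # p, q, r, s with p*u1 + q*u2 + r*u3 + s*u4 = v

    print(pqrs)                            # matches [a-c+d, c-d, c-a, b-c] = [0, 2, 3, -6]
    print(np.allclose(U @ pqrs, v))        # True

The fact that the solve succeeds for every right-hand side is exactly the statement that u1, . . . , u4 span R^4.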
Example 5.26. [eg-symfree-basis] Put V = {A ∈ M3R | A^T = A and trace(A) = 0}. Any matrix X ∈ V has the form
X = [ a  b  c    ]
    [ b  d  e    ]
    [ c  e  −a−d ]
for some a, b, c, d, e ∈ R. In other words, if we put
A = [ 1 0 0  ]   B = [ 0 1 0 ]   C = [ 0 0 1 ]   D = [ 0 0 0  ]   E = [ 0 0 0 ]
    [ 0 0 0  ]       [ 1 0 0 ]       [ 0 0 0 ]       [ 0 1 0  ]       [ 0 0 1 ]
    [ 0 0 −1 ]       [ 0 0 0 ]       [ 1 0 0 ]       [ 0 0 −1 ]       [ 0 1 0 ]
then any matrix X ∈ V can be written in the form X = aA + bB + cC + dD + eE. This means that the matrices A, . . . , E span V, and they are also linearly independent, so they form a basis for V.
Example 5.27. [eg-quadratic-bases] There are several interesting bases for the space Q = R[x]≤2 of polynomials of degree at most two. A typical element f ∈ Q has f(x) = ax^2 + bx + c for some a, b, c ∈ R.
• The list p0, p1, p2, where pi(x) = x^i. This is the most obvious basis. For f as above we have

f = c p0 + b p1 + a p2 = f(0) p0 + f'(0) p1 + (1/2) f''(0) p2.
• The list q0, q1, q2, where qi(x) = (x + 1)^i, is another basis. For f as above, one checks that
ax^2 + bx + c = a(x + 1)^2 + (b − 2a)(x + 1) + (a − b + c),

so f = (a − b + c)q0 + (b − 2a)q1 + a q2 = f(−1)q0 + f'(−1)q1 + (1/2) f''(−1)q2.
• The list r0, r1, r2, where ri(x) = (x + i)^2, is another basis. Indeed, we have
p0(x) = 1   = (1/2)((x + 2)^2 − 2(x + 1)^2 + x^2)  = (1/2)(r2(x) − 2r1(x) + r0(x))
p1(x) = x   = −(1/4)((x + 2)^2 − 4(x + 1)^2 + 3x^2) = −(1/4)(r2(x) − 4r1(x) + 3r0(x))
p2(x) = x^2 = r0(x).

This implies that p0, p1, p2 ∈ span(r0, r1, r2) and thus that span(r0, r1, r2) = Q.
• The list
s0(x) = (x^2 − 3x + 2)/2
s1(x) = −x^2 + 2x
s2(x) = (x^2 − x)/2.
These functions have the property that

s0(0) = 1 s0(1) = 0 s0(2) = 0 s1(0) = 0 s1(1) = 1 s1(2) = 0 s2(0) = 0 s2(1) = 0 s2(2) = 1

Given f ∈ Q we claim that f = f(0).s0 + f(1).s1 + f(2).s2. Indeed, if we put g(x) = f(x) − f(0)s0(x) − f(1)s1(x) − f(2).s2(x), we find that g ∈ Q and g(0) = g(1) = g(2) = 0. A quadratic polynomial with three different roots must be zero, so g = 0, so f = f(0).s0 + f(1).s1 + f(2).s2. • The list

t0(x) = 1
t1(x) = √3 (2x − 1)
t2(x) = √5 (6x^2 − 6x + 1).
These functions have the property that
∫_0^1 ti(x) tj(x) dx = 1 if i = j, and 0 if i ≠ j.
Using this, we find that f = λ0t0 + λ1t1 + λ2t2, where λi = ∫_0^1 f(x) ti(x) dx.
Example 5.28. [eg-poly-basis] Put V = {f ∈ R[x]≤4 | f(1) = f(−1) = 0 and f'(1) = f'(−1)}. Consider a polynomial f ∈ R[x]≤4, so f(x) = a + bx + cx^2 + dx^3 + ex^4 for some constants a, . . . , e. We then have
f(1) = a + b + c + d + e
f(−1) = a − b + c − d + e
f'(1) − f'(−1) = (b + 2c + 3d + 4e) − (b − 2c + 3d − 4e) = 4c + 8e.
It follows that f ∈ V iff a + b + c + d + e = a − b + c − d + e = 4c + 8e = 0. This simplifies to c = −2e and a = e and b = −d, so
f(x) = e − dx − 2ex^2 + dx^3 + ex^4 = d(x^3 − x) + e(x^4 − 2x^2 + 1).
Thus, if we put p(x) = x^3 − x and q(x) = x^4 − 2x^2 + 1 = (x^2 − 1)^2, then p, q is a basis for V.
Example 5.29. [eg-magic] A magic square is a 3 × 3 matrix in which the sum of every row is the same, and the sum of every column is the same. More explicitly, a matrix
X = [ a b c ]
    [ d e f ]
    [ g h i ]

Let V be the set of magic squares, which is easily seen to be a subspace of M3R; we will find a basis for V . First, we write R(X) = a + b + c = d + e + f = g + h + i C(X) = a + d + g = b + e + h = c + f + i T (X) = a + b + c + d + e + f + g + h + i. on the one hand, we have T (X) = a + b + c + d + e + f + g + h + i = (a + b + c) + (d + e + f) + (g + h + i) = 3R(X). We also have T (X) = a + d + g + b + e + h + c + f + i = (a + d + g) + (b + e + h) + (c + f + i) = 3C(X). It follows that R(X) = C(X) = T (X)/3. It is now convenient to consider the subspace W = {X ∈ V | T (X) = 0}, consisting of squares as above for which a + b + c = d + e + f = g + h + i = 0 a + d + g = b + e + h = c + f + i = 0. For such a square, we certainly have c = −a − b f = −d − e g = −a − d h = −b − e. Substituting this back into the equation g+h+i = 0 (or into the equation c+f +i = 0) gives i = a+b+d+e. It follows that any element of W can be written in the form  a b −a−b  X = d e −d−e . −a−d −b−e a+b+d+e Equivalently, if we put h 1 0 −1 i h 0 1 −1 i A = 0 0 0 B = 0 0 0 −1 0 1 0 −1 1 h 0 0 0 i h 0 0 0 i D = 1 0 −1 E = 0 1 −1 , −1 0 1 0 −1 1 then any element of W can be written in the form X = aA + bB + dD + eE for some list a, b, d, e of real numbers. This means that A, B, D, E spans W , and these matrices are clearly linearly independent, so they form a basis for W . Next, observe that the matrix h 1 1 1 i Q = 1 1 1 1 1 1 lies in V but not in W (because T (Q) = 9). We claim that Q, A, B, D, E is a basis for V . Indeed, given X ∈ V we can put t = T (X)/9 and Y = X − tQ. We then have Y ∈ V and T (Y ) = T (X) − tT (Q) = 0, so Y ∈ W . As A, B, D, E is a basis for W , we see that Y = aA + bB + dD + eE for some a, b, d, e ∈ R. It follows that X = tQ + Y = tQ + aA + bB + dD + eE. This means that Q, A, B, D, E spans V . Suppose we have a linear relation qQ + aA + bB + dD + eE = 0 for some q, a, b, d, e ∈ R. Applying T to this equation gives 9q = 0 (because T (A) = T (B) = T (D) = T (E) = 0 and T (Q) = 9), and so q = 0. This leaves aA + bB + dD + eE = 0, and we have noted that A, B, D and 28 E are linearly independent, so a = b = d = e = 0 as well. This means that Q, A, B, D and E are linearly independent as well as spanning V , so they form a basis for V . Thus dim(V ) = 5. Exercises Exercise 5.1. [ex-check-dependence] For each of the following lists of vectors, say (with justification) whether they are linearly independent, whether they span R3, and whether they form a basis of R3. (If you understand the concepts involved, you should be able to do this by eye, without much calculation.) h 1 i h 3 i h 5 i h 7 i (a) u1 = 0 , u2 = 0 , u3 = 0 , u4 = 0 . 2 4 6 8 h 1 i h 1 i h 0 i h 1 i (b) v1 = 1 , v2 = 0 , v3 = 1 , v4 = 1 . 0 1 1 1 h 1 i h 4 i (c) w1 = 2 , w2 = 5 . 3 6 h 1 i h 0 i h 0 i (d) x1 = 1 , x2 = 2 , x3 = 0 . 1 2 4

Exercise 5.2. [ex-check-independence] Which of the following lists of vectors are linearly independent?

(a) u1 = [1, 0, 0, 0, 1]^T, u2 = [0, 2, 0, 2, 0]^T, u3 = [0, 0, 3, 0, 0]^T
(b) v1 = [1, 1, 1, 1]^T, v2 = [2, 0, 0, 2]^T, v3 = [0, 4, 4, 0]^T
(c) w1 = [1, 1, 2]^T, w2 = [4, 5, 7]^T, w3 = [1, 1, 1]^T

Exercise 5.3. [ex-three-evals] Suppose we have real numbers a, b, c ∈ R and functions f, g, h ∈ C(R) such that f(a) = 1 g(a) = 0 h(a) = 0 f(b) = 0 g(b) = 1 h(b) = 0 f(c) = 0 g(c) = 0 h(c) = 1 Prove that f, g and h are linearly independent.

Exercise 5.4. [ex-exp-wronskian] Put fk(x) = e^{kx}. Calculate W(f1, f2, f3). Are f1, f2 and f3 linearly independent?

Exercise 5.5. [ex-pow-wronskian] Find and simplify the Wronskian of the functions g0(x) = x^n, g1(x) = x^{n+1} and g2(x) = x^{n+2}. You may want to use Maple for this. If you do, you will need to use simplify(...) to get the answer in its simplest form.

Exercise 5.6. [ex-surj-misc-i] Define a linear map φ: M3R → R[x]≤4 by φ(A) = [1, x, x^2] A [1, x, x^2]^T. Show that φ is surjective, and find a basis for its kernel.

Exercise 5.7. [ex-surj-misc-ii]

(a) Define a map φ: R[x]≤3 → R^2 by φ(f) = [f(0), f(1)]^T. Show that this is surjective, and that the kernel is spanned by x^2 − x and x^3 − x^2.
(b) Define a map ψ : R[x]≤2 → R^4 by ψ(f) = [f(0), f(1), f(2), f(3)]^T. Show that this is injective, and that the image is the space
V = {[u0, u1, u2, u3]^T ∈ R^4 | u0 − 3u1 + 3u2 − u3 = 0}.

Exercise 5.8. [ex-trace-rep] Given vectors [p, q]^T, [r, s]^T ∈ R^2, we can define a linear map φ: M2R → R by
φ(A) = [p, q] A [r, s]^T.

Show that p, q, r and s cannot be chosen so that φ(A) = trace(A) for all A ∈ M2R.

Exercise 5.9. [ex-complex-eval] Put J = [ 0 1 ; −1 0 ] ∈ M2R. Define φ: R[x]≤2 → M2R by φ(f) = f(J), or in other words
φ(ax^2 + bx + c) = aJ^2 + bJ + cI.
Find bases for ker(φ) and image(φ).

Exercise 5.10. [ex-independence-proof] Let V and W be vector spaces, and let φ: V → W be a linear map. Let V = v1, . . . , vn be a list of elements of V.

(a) Show that if v1, . . . , vn are linearly dependent, then so are φ(v1), . . . , φ(vn). (b) Give an example where v1, . . . , vn are linearly independent, but φ(v1), . . . , φ(vn) are linearly depen- dent. (c) Show that if φ(v1), . . . , φ(vn) are linearly independent, then v1, . . . , vn are linearly independent.
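Before moving on to linear maps out of R^n, here is one more computational illustration of Remark 5.7, using the polynomials of Example 5.4. If we record each p_n by its coefficients with respect to 1, x, x^2 (an encoding chosen purely for this sketch), then µ_P becomes multiplication by a 3 × 4 matrix, and its kernel is exactly the set of linear relations. A sympy sketch:

    from sympy import Matrix

    # Column n holds the coefficients of p_n(x) = (x + n)^2 = n^2 + 2n*x + x^2.
    P = Matrix([[0, 1, 4, 9],    # constant terms n^2
                [0, 2, 4, 6],    # x-coefficients 2n
                [1, 1, 1, 1]])   # x^2-coefficients

    print(P[:, :3].rank())    # 3: p0, p1, p2 are independent (trivial kernel)
    print(P.nullspace())      # spanned by [-1, 3, -3, 1]^T, i.e. -p0 + 3p1 - 3p2 + p3 = 0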

6. Linear maps out of R^n
We next discuss linear maps R^n → V (for any vector space V). We will do the case n = 2 first; the general case is essentially the same, but with more complicated notation.
Definition 6.1. [defn-mu-v-w] Let V be a vector space, and let v and w be elements of V. We then define µv,w : R^2 → V by
µv,w([x, y]^T) = xv + yw.
This makes sense because:
• x is a number and v ∈ V and V is a vector space, so xv ∈ V.
• y is a number and w ∈ V and V is a vector space, so yw ∈ V.
• xv and yw lie in the vector space V, so xv + yw ∈ V.

It is clear that µv,w is a linear map.
Proposition 6.2. [prop-mu-v-w] Any linear map φ: R^2 → V has the form φ = µv,w for some v, w ∈ V.
Proof. The vector e1 = [1, 0]^T is an element of R^2, and φ is a map from R^2 to V, so we have an element v = φ(e1) ∈ V. Similarly, the vector e2 = [0, 1]^T is an element of R^2, and φ is a map from R^2 to V, so we have an element w = φ(e2) ∈ V. We claim that φ = µv,w. Indeed, as φ is linear, we have

φ(xe1 + ye2) = xφ(e1) + yφ(e2) = xv + yw

= µv,w([x, y]^T).

On the other hand, it is clear that
xe1 + ye2 = x[1, 0]^T + y[0, 1]^T = [x, y]^T,
so the previous equation reads
φ([x, y]^T) = µv,w([x, y]^T).
This holds for all x and y, so φ = µv,w as claimed. □
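As a quick numerical illustration of this proof (the specific map φ below is an arbitrary example invented for the sketch, not one from the notes): knowing φ(e1) and φ(e2) determines φ everywhere.

    import numpy as np

    def phi(x, y):
        # An arbitrary linear map R^2 -> R^3, for illustration only.
        return np.array([x + y, x - y, 2.0*x])

    v = phi(1.0, 0.0)    # phi(e1)
    w = phi(0.0, 1.0)    # phi(e2)

    rng = np.random.default_rng(1)
    for _ in range(5):
        x, y = rng.standard_normal(2)
        assert np.allclose(x*v + y*w, phi(x, y))    # mu_{v,w} agrees with phi
    print("phi agrees with mu_{v,w} at all test points")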

The story for general n is as follows. Recall that for any list V = v1, . . . , vn of elements of V , we can n define a linear map µV : R −→ V by T X µV ([x1, . . . , xn] ) = xivi = x1v1 + ... + xnvn. i Proposition 6.3. [prop-Rn-maps] n Any linear map φ: R −→ V has the form φ = µV for some list V = v1, . . . , vn of elements of V (which are uniquely determined by the formula vi = φ(ei), where ei is as in Definition 5.15). Thus, a linear map Rn −→ V is essentially the same thing as a list of n elements of V . n Proof. Put vi = φ(ei) ∈ V . For any x ∈ R we have X x = x1e1 + ... + xnen = xiei, i so X X φ(x) = xiφ(ei) = xivi = µv1,...,vn (x), i i so φ = µv1,...,vn . (The first equality holds because φ is linear, the second by the definition of vi, and the third by the definition of µV .  Example 6.4. [eg-matrix-matrix] 3 Consider the map φ: R → M3R given by h a i  a a+b a  φ b = a+b a+b+c a+b c a a+b a

Put A = A1,A2,A3, where h 1 1 1 i h 0 1 0 i h 0 0 0 i A1 = φ(e1) = 1 1 1 A2 = φ(e2) = 1 1 1 A3 = φ(e3) = 0 1 0 1 1 1 0 1 0 0 0 0 Then h a i h 1 1 1 i h 0 1 0 i h 0 0 0 i  a a+b a  h a i µA b = a 1 1 1 + b 1 1 1 + c 0 1 0 = a+b a+b+c a+b = φ b c 1 1 1 0 1 0 0 0 0 a a+b a c so φ = µA. Example 6.5. [eg-poly-matrix] Consider the map φ: R3 → R[x] given by h a i φ b = (a + b + c)x2 + (a + b)(x + 1)2 + a(x + 2)2. c

Put P = p1, p2, p3, where 2 2 2 2 p1(x) = φ(e1)= x + (x + 1) + (x + 2) = 3x + 6x + 5 2 2 2 p2(x) = φ(e2)= x + (x + 1) = 2x + 2x + 1 2 p3(x) = φ(e3)= x . Then h a i 2 2 2 h a i µP b = a(3x + 6x + 5) + b(2x + 2x + 1) + cx = φ b . c c Corollary 6.6. [cor-matrix-linear] n m Every linear map α: R −→ R has the form φA (as in Example 3.11) for some m × n matrix A (which is uniquely determined). Thus, a linear map α: Rn −→ Rm is essentially the same thing as an m × n matrix. 31 n m m Proof. A linear map α: R −→ R is essentially the same thing as a list v1,..., vn of elements of R . If we write each vi as a column vector, then the list can be visualised in an obvious way as an m × n matrix. For example, the list 1 3 5 7 [ 2 ] , [ 4 ] , [ 6 ] , [ 8 ] corresponds to the matrix 1 3 5 7 [ 2 4 6 8 ] . Thus, a linear map α: Rn −→ Rm is essentially the same thing as an m × n matrix. There are some things to check to see that this is compatible with Example 3.11, but we shall not go through the details.  Example 6.7. [eg-rotation-matrix] Consider the linear map ρ: R3 → R3 defined by h x i h y i ρ y = z z x (so ρ(v) is obtained by rotating v through 2π/3 around the line x = y = z). Then h 0 i h 1 i h 0 i ρ(e1) = 0 ρ(e2) = 0 ρ(e3) = 1 1 0 0

This means that ρ = φR, where 0 1 0 R = 0 0 1 1 0 0 Example 6.8. [eg-vecprod-matrix] Consider a vector a = [a, b, c]T ∈ R3, and define β : R3 −→ R3 by β(v) = a × v. This is linear, so it must have the form β = φB for some 3 × 3 matrix B. To find B, we note that h x i  bz−cy  β y = cx−az , z ay−bx so h 0 i h −c i h b i β(e1) = c β(e2) = 0 β(e3) = −a . −b a 0 These three vectors are the columns of B, so h 0 −c b i B = c 0 a . −b a 0 (Note incidentally that the matrices arising in this way are precisely the 3 × 3 antisymmetric matrices.) Example 6.9. [eg-projector-matrix] Consider a unit vector a = [a, b, c]T ∈ R3 (so a2 + b2 + c2 = 1) and let P be the plane perpendicular to a. For any v ∈ R3, we let π(v) be the projection of v onto P . The formula for this is π(v) = v − hv, aia (where hv, ai denotes the inner product, also written as v.a.) You can just take this as given if you are not familiar with it. From the formula one can check that π is a linear map, so it must have the form π(v) = Av for some 3 × 3 matrix A. To find A, we observe that 2 h x i h x i h a i  x−a x−aby−acz  π y = y − (ax + by + cz) b = y−abx−b2y−bcz . z z c z−acx−bcy−c2z It follows that  1−a2   −ab   −ac  2 π(e1) = −ab π(e2) = 1−b π(e3) = −bc . −ac −bc 1−c2 These three vectors are the columns of A, so  1−a2 −ab −ac  A = −ab 1−b2 −bc . −ac −bc 1−c2 It is an exercise to check that A2 = AT = A and det(A) = 0. Exercises 32 Exercise 6.1. [ex-maps-from-quad] Let V be a vector space, and let φ: R[x]≤2 → V be a linear map. Show that there exist elements u, v ∈ V such that φ(ax + b) = au + bv for all a, b ∈ R.
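Returning to Examples 6.8 and 6.9, the matrices found there are easy to check numerically. A sketch with numpy (the unit vector a below is an arbitrary choice; B is the cross-product matrix and A the projection matrix from those examples):

    import numpy as np

    a = np.array([2.0, 3.0, 6.0]) / 7.0       # a unit vector, since 2^2 + 3^2 + 6^2 = 49
    a1, a2, a3 = a

    # Example 6.8: B @ v should equal a x v for every v.
    B = np.array([[0.0, -a3,  a2],
                  [ a3, 0.0, -a1],
                  [-a2,  a1, 0.0]])

    # Example 6.9: projection onto the plane perpendicular to a.
    A = np.eye(3) - np.outer(a, a)

    v = np.array([1.0, -2.0, 0.5])
    print(np.allclose(B @ v, np.cross(a, v)))              # True
    print(np.allclose(A @ A, A), np.allclose(A, A.T))      # True True  (A^2 = A^T = A)
    print(np.isclose(np.linalg.det(A), 0.0))               # True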

Exercise 6.2. [ex-various-polys] Consider the list V = 1, x, (1 + x)^2, 1 + x^2 of elements of R[x]≤2.
(a) Simplify µV([0, 1, 1, −1]^T).
(b) Find λ ∈ R^4 such that µV(λ) = x^2.
(c) Find λ ∈ R^4 such that λ ≠ 0 but µV(λ) = 0 (showing that V is linearly dependent).

Exercise 6.3. [ex-check-span-matrices] Which of the following lists of matrices spans M2R?
(a) A = [ 0 1 ; 2 3 ], [ 0 2 ; 1 3 ], [ 0 1 ; 3 2 ], [ 0 3 ; 2 1 ]
(b) B = [ 1 1 ; 1 0 ], [ 0 1 ; 1 1 ], [ 1 0 ; 0 1 ], [ 1 0 ; 0 −1 ]
(c) C = [ 1 0 ; 0 0 ], [ 1 1 ; 0 0 ], [ 1 1 ; 1 0 ], [ 1 1 ; 1 1 ]
(d) D = [ 463 859 ; 265 −463 ], [ 937 724 ; 195 −937 ], [ 431 736 ; 428 −431 ], [ 777 152 ; 522 −777 ]

Exercise 6.4. [ex-prove-rk-spans] Put rk(x) = (x + k)^2. Prove that the list R = r0, r1, r2 spans R[x]≤2.

7. Matrices for linear maps

Let V and W be finite-dimensional vector spaces, with bases V = v1, . . . , vn and W = w1, . . . , wm say. Let α: V → W be a linear map. Then α(vj) is an element of W , so it can be expressed (uniquely) in terms of the basis W, say

α(vj) = a1jw1 + ··· + amjwm.

These numbers aij form an m × n matrix A, which we call the matrix of α with respect to V and W. Remark 7.1. [rem-endo-matrix] Often we consider the case where W = V and so we have a map α: V → V , and V and W are bases for the same space. It is often natural to take W = V, but everything still makes sense even if W 6= V. Example 7.2. [eg-vecprod-adapted-basis] Let a be a unit vector in R3, and define β, π : R3 −→ R3 by β(x) = a × x π(x) = x − ha, xia.

We have already calculated the matrices of these maps with respect to the standard basis of R3. However, it is sometimes useful to find a different basis that is specially suited to these particular maps, and find matrices with respect to that basis instead. To do this, choose any unit vector b orthogonal to a, and then put c = a × b, so c is another unit vector that is orthogonal to both a and b. By standard properties of the cross product, we have β(a) = a × a = 0, and β(b) = a × b = c by definition of c, and β(c) = a × (a × b) = ha, bia − ha, aib = −b. 33 In summary, we have β(a) = 0 = 0a + 0b + 0c β(b) = c = 0a + 0b + 1c β(c) = −b = 0a − 1b + 0c. The columns in the matrix we want are the lists of coefficients in the three equations above: the first equation gives the first column, the second equation gives the second column, and the third equation gives the third column. Thus, the the matrix of β with respect to the basis a, b, c is h 0 0 0 i 0 0 −1 . 0 1 0 Similarly, we have π(a) = 0 and π(b) = b and π(c) = c, so the matrix of π with respect to the basis a, b, c is h 0 0 0 i 0 1 0 . 0 0 1 Example 7.3. [eg-poly-shift] k k Define φ: R[x]<4 −→ R[x]<4 by φ(x ) = (x + 1) . We then have φ(1) = 1 φ(x) = 1 + x φ(x2) = 1 + 2x + x2 φ(x3) = 1 + 3x + 3x2 + x3, or in other words φ(x0) = 1.x0 + 0.x1 + 0.x2 + 0.x3 φ(x1) = 1.x0 + 1.x1 + 0.x2 + 0.x3 φ(x2) = 1.x0 + 2.x1 + 1.x2 + 0.x3 φ(x3) = 1.x0 + 3.x1 + 3.x2 + 1.x3. Thus, the matrix of φ with respect to the usual basis is  1 1 1 1  0 1 2 3 0 0 1 3 . 0 0 0 1 Example 7.4. [eg-vandermonde] 4 Define φ: R[x]<5 −→ R by φ(f) = [f(1), f(2), f(3), f(4)]T . Then  1   1   1   1   1  1 2 2 4 3 8 4 16 φ(1) = 1 φ(x) = 3 φ(x ) = 9 φ(x ) = 27 φ(x ) = 81 1 4 16 64 256 so the matrix of φ with respect to the usual bases is  1 1 1 1 1  1 2 4 8 16 1 3 9 27 81 . 1 4 16 64 128 Example 7.5. [eg-longest-word] Define φ: R4 −→ R4 by  x1   x4  x2 x3 φ x3 = x2 . x4 x1 The associated matrix is  0 0 0 1  0 0 1 0 0 1 0 0 . 1 0 0 0 34 Example 7.6. [eg-shm-shift] Let V be the space of solutions of the differential equation f 00 + f = 0, and define φ: V −→ V by φ(f)(x) = f(x + π/4). As √ sin(x + π/4) = sin(x) cos(π/4) + cos(x) sin(π/4) = (sin(x) + cos(x))/ 2, √ we have φ(sin) = (sin + cos)/ 2. As √ cos(x + π/4) = cos(x) cos(π/4) − sin(x) sin(π/4) = (cos(x) − sin(x))/ 2, √ we have φ(cos) = (− sin + cos)/ 2. It follows that the matrix of φ with respect to the basis {sin, cos} is 1 1 −1 √ . 2 1 1 Example 7.7. [eg-kill-trace] T Define φ, ψ : M2R −→ M2R by φ(A) = A and ψ(A) = A − trace(A)I/2. In terms of the usual basis 1 0 0 1 0 0 0 0 E1 = [ 0 0 ] E2 = [ 0 0 ] E3 = [ 1 0 ] E4 = [ 0 1 ] we have φ(E1) = E1, φ(E2) = E3, φ(E3) = E2, and φ(E4) = E4. The matrix of φ is thus  1 0 0 0  0 0 1 0 0 1 0 0 . 0 0 0 1 We also have  1  2 0 1 1 ψ(E1) = E1 − I/2 = 1 = 2 E1 − 2 E4 0 − 2 ψ(E2) = E2

ψ(E3) = E3
ψ(E4) = E4 − I/2 = [ −1/2 0 ; 0 1/2 ] = −(1/2) E1 + (1/2) E4.
The matrix is thus
[  1/2  0  0  −1/2 ]
[   0   1  0    0  ]
[   0   0  1    0  ]
[ −1/2  0  0   1/2 ].
The following result gives another important way to think about the matrix of a linear map.
Proposition 7.8. [prop-map-matrix] For any x ∈ R^n, we have µW(φA(x)) = α(µV(x)), so the two routes around the square below are the same:

        φA
  R^n --------> R^m
   |             |
   | µV          | µW
   v             v
   V ----------> W
        α

(This is often expressed by saying that the square commutes.)
Proof. We will do the case where n = 2 and m = 3; the general case is essentially the same, but with more complicated notation. In our case, v1, v2 is a basis for V, and w1, w2, w3 is a basis for W. From the definitions of aij and A, we have

α(v1) = a11w1 + a21w2 + a31w3

α(v2) = a12w1 + a22w2 + a32w3
A = [ a11 a12 ]
    [ a21 a22 ]
    [ a31 a32 ]

35 x1 2 Now consider a vector x = [ x2 ] ∈ R . We have µV (x) = x1v1 + x2v2 (by the definition of µV ). It follows that

α(µV (x)) = α(x1v1 + x2v2) = x1α(v1) + x2α(v2)

= x1(a11w1 + a21w2 + a31w3) + x2(a12w1 + a22w2 + a32w3)

= (a11x1 + a12x2)w1 + (a21x1 + a22x2)w2 + (a31x1 + a32x2)w3

On the other hand, we have
φA(x) = Ax = [ a11 a12 ] [ x1 ]  =  [ a11x1 + a12x2 ]
             [ a21 a22 ] [ x2 ]     [ a21x1 + a22x2 ]
             [ a31 a32 ]            [ a31x1 + a32x2 ],
so
µW(φA(x)) = µW([a11x1 + a12x2, a21x1 + a22x2, a31x1 + a32x2]^T)

= (a11x1 + a12x2)w1 + (a21x1 + a22x2)w2 + (a31x1 + a32x2)w3

= α(µV (x)).  Proposition 7.9. [prop-compose-matrix] β Suppose we have linear maps U −→ V −→α W (which can therefore be composed to give a linear map αβ : U −→ W ). Suppose that we have bases U, V and W for U, V and W . Let A be the matrix of α with respect to V and W, and let B be the matrix of β with respect to U and V. Then the matrix of αβ with respect to U and W is AB. P Proof. By the definition of matrix multiplication, the matrix C = AB has entries cik = j aijbjk. By the definitions of A and B, we have X α(vj) = aijwi i X β(uk) = bjkvj j so   X X αβ(uk) = α  bjkvj = bjkα(vj) j j   X X X X = bjk aijwi =  aijbjk wi j i i j X = cikwi. i This means precisely that C is the matrix of αβ with respect to U and W.  Definition 7.10. [defn-basis-change] 0 0 0 Let V be a finite-dimensional vector space, with two different bases V = v1, . . . , vn and V = v1, . . . , vn. We then have 0 vj = p1jv1 + ··· + pnjvn for some scalars pij. Let P be the n × n matrix with entries pij. This is called the change-of-basis matrix from V to V0. One can check that it is invertible, and that P −1 is the change of basis matrix from V0 to V. 36 Example 7.11. [eg-change-poly] Consider the following bases of R[x]≤3: 3 2 v1 = x v2 = x v3 = x v4 = 1 0 3 2 0 3 2 0 3 2 0 3 v1 = x + x + x + 1 v2 = x + x + x v3 = x + x v4 = x Then 0 v1 = 1.v1 + 1.v2 + 1.v3 + 1.v4 0 v2= 1.v1 + 1.v2 + 1.v3 + 0.v4 0 v3= 1.v1 + 1.v2 + 0.v3 + 0.v4 0 v4= 1.v1 + 0.v2 + 0.v3 + 0.v4 so the change of basis matrix is  1 1 1 1  1 1 1 0 P = 1 1 0 0 1 0 0 0 Example 7.12. [eg-change-matrices] Consider the following bases of M2R: 1 0 1 1 1 1 1 1 A1 = [ 0 0 ] A2 = [ 0 0 ] A3 = [ 1 0 ] A4 = [ 1 1 ] 0 1 −1 0  1 1  0  1 1  0 1 1 A1 = [ 1 1 ] A2 = 1 −1 A3 = −1 −1 A4 = [ 1 1 ] Then 0 A1 = 2.A1 + (−2).A2 + 0.A3 + 1.A4 0 A2= 0.A1 + 0.A2 + 2.A3 + (−1).A4 0 A3= 0.A1 + 2.A2 + 0.A3 + (−1).A4 0 A4= 0.A1 + 0.A2 + 0.A3 + 1.A4 so the change of basis matrix is  2 0 0 0  −2 0 2 0 P = 0 2 0 0 1 −1 −1 1 Lemma 7.13. [lem-base-change] n In the situation above, for any x ∈ R we have µV (φP (x)) = µV (P x) = µV0 (x), so the following diagram commutes:

        φP
  R^n --------> R^n
     \         /
  µV' \       / µV
       v     v
          V

Proof. We have Px = y, where yi = Σj pij xj. Thus
µV(Px) = Σi yi vi = Σ_{i,j} pij xj vi = Σj xj (Σi pij vi) = Σj xj vj'

= µV'(x). □
Proposition 7.14. [prop-basis-change] Let α: V → W be a linear map between finite-dimensional vector spaces. Suppose we have two bases for V (say V and V', with change-of-basis matrix P), and two bases for W (say W and W', with change-of-basis matrix Q). Let A be the matrix of α with respect to V and W, and let A' be the matrix with respect to V' and W'. Then A' = Q^{−1}AP.
Proof. We actually prove that QA' = AP, which comes to the same thing. For any x ∈ R^n, we have
µW(QA'x) = µW'(A'x)    (Lemma 7.13)

= α(µV'(x))    (Proposition 7.8)

= α(µV (P x)) (Lemma 7.13)

= µW (AP x) (Proposition 7.8).

This shows that µW((QA' − AP)x) = 0. Moreover, W is linearly independent, so µW is injective and has trivial kernel, so (QA' − AP)x = 0. This applies for any vector x, so the matrix QA' − AP must be zero, as claimed. The upshot is that all parts of the following diagram commute:

        φA'
  R^n ----------> R^m
   |               |
   | φP            | φQ
   v      φA       v
  R^n ----------> R^m
   |               |
   | µV            | µW
   v               v
   V ------------> W
          α

Here the composite µV ∘ φP down the left-hand side is µV', and the composite µW ∘ φQ down the right-hand side is µW' (by Lemma 7.13). □
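Proposition 7.14 can be checked numerically in the situation of Example 7.2, where V = W = R^3, the new basis is a, b, c in both source and target (so Q = P), and P has columns a, b, c. A numpy sketch (the particular vectors a and b are arbitrary choices satisfying the requirements of that example):

    import numpy as np

    a = np.array([2.0, 3.0, 6.0]) / 7.0               # unit vector
    b = np.array([3.0, -2.0, 0.0]) / np.sqrt(13.0)    # unit vector orthogonal to a
    c = np.cross(a, b)

    a1, a2, a3 = a
    B_std = np.array([[0.0, -a3,  a2],                # matrix of beta(x) = a x x in the standard basis
                      [ a3, 0.0, -a1],
                      [-a2,  a1, 0.0]])

    P = np.column_stack([a, b, c])                    # change-of-basis matrix from the standard basis to a, b, c
    B_new = np.linalg.inv(P) @ B_std @ P              # A' = Q^{-1} A P with Q = P

    print(np.round(B_new, 10))
    # Expected, as in Example 7.2:  [[0, 0, 0], [0, 0, -1], [0, 1, 0]]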



Remark 7.15. [rem-trace-general] Suppose we have a finite-dimensional vector space V and a linear map α from V to itself. We can now define the trace, determinant and characteristic polynomial of α. We pick any basis V, let A be the matrix of α with respect to V and V, and put trace(α) = trace(A) det(α) = det(A) char(α)(t) = char(A)(t) = det(tI − A).

This is not obviously well-defined: what if we used a different basis, say V0, giving a different matrix, say A0? The proposition tells us that P −1AP = A0, and it follows that P −1(tI − A)P = tI − A0. Using the rules trace(MN) = trace(NM) and det(MN) = det(M) det(N) we see that trace(A0) = trace(P −1(AP )) = trace((AP )P −1) = trace(A(PP −1)) = trace(A) det(A0) = det(P )−1 det(A) det(P ) = det(A) char(A0)(t) = det(P )−1 det(tI − A) det(P ) = char(A)(t). This shows that definitions are in fact basis-independent.

Example 7.16. [eg-vecprod-trace] Let a ∈ R3 be a unit vector, and define β : R3 −→ R3 by β(x) = a × x. The matrix B of β with respect to the standard basis is found as follows:

h 0 i h −a3 i h a2 i h 0 −a3 a2 i β(e1) = a3 β(e2) = 0 β(e3) = −a1 B = a3 0 −a1 −a2 a1 0 −a2 a1 0 We have trace(B) = 0 and

h 0 −a1 i h a3 −a1 i h a3 0 i det(B) = 0. det − (−a3). det + a2. det = 0 − (−a3)(a3.0 − (−a2)(−a1)) + a2(a3a1 − 0.(−a2)) = 0 a1 0 −a2 0 −a2 a1 We can insead choose a unit vector b orthogonal to a and then put c = a × b. With respect to the basis h 0 0 0 i a, b, c, the map β has matrix B0 = 0 0 −1 . It is easy to see that trace(B0) = 0 = det(B0). Either way 0 1 0 we have trace(β) = 0 = det(β). We also find that char(β)(t) = char(B0)(t) = t3 + t. This is much more complicated using B. 38 Example 7.17. [eg-proj-trace] Let a ∈ R3 be a unit vector, and define π : R3 −→ R3 by π(x) = x − hx, aia. The matrix P of π with respect to the standard basis is found as follows:   1−a2 −a a −a a " 1−a2 # " −a2a1 # " −a3a1 # 1 1 2 1 3 1 2 −a a 2 π(e1) = −a a π(e2) = 1−a π(e3) = 3 2 P =  −a a 1−a −a a  1 2 2 2  1 2 2 2 3  −a1a3 −a2a3 1−a3 2 −a1a3 −a2a3 1−a3 2 2 2 2 2 2 We have trace(P ) = 1 − a1 + 1 − a2 + 1 − a3 = 3 − (a1 + a2 + a3) = 2. We can instead choose a unit vector b orthogonal to a and then put c = a × b. With respect to the basis a, b, c, the map π has matrix h 0 0 0 i P 0 = 0 1 0 . It is easy to see that trace(P 0) = 2. Either way we have trace(π) = 2. We also find that 0 0 1 det(π) = det(P 0) = 0 and char(π)(t) = char(P 0)(t) = t(t − 1)2. This is much more complicated using P . Remark 7.18. [rem-det-check] Suppose again that we have a finite-dimensional vector space V and a linear map α from V to itself. One can show that the following are equivalent: (a) α is injective (b) α is surjective (c) α is an isomorphism (d) det(α) 6= 0. (It is important here that α goes from V to itself, not to some other space.) We shall not give proofs, however. Exercises Exercise 7.1. [ex-matrix-i] Define φ: R3 → R3 by h x i h y+z i φ y = z+x . z x+y Find the matrix of φ with respect to the usual basis of R3. Then find the matrix with respect to the basis h 1 i h 1 i h 0 i u1 = 1 u2 = −1 u3 = 1 1 0 −1

Exercise 7.2. [ex-matrix-ii] 3 0 00 T Define a linear map φ: R[x]≤2 → R by φ(f) = [f(0), f (1), f (2)] . What is the matrix of φ with respect 3 to the usual bases of R[x]≤2 and R ?

Exercise 7.3. [ex-matrix-iii] Fix a real number λ, and let V be the set of functions of the form f(x) = (ax2 + bx + c)eλx.

λx In other words, we have V = R[x]≤2e . (a) Write down a basis for V . (b) Show that if f ∈ V then f 0 ∈ V , so we can define a linear map D : V → V by D(f) = f 0. (c) What is the matrix of D with respect to your chosen basis? (d) Show that (D − λ)3(f) = 0 for all f ∈ V .

Exercise 7.4. [ex-difference-op] Define a map ∆: R[x]≤4 → R[x]≤3 by ∆(f(x)) = f(x + 1) − f(x). What is the matrix of this map with respect to the the usual bases of R[x]≤4 and R[x]≤3? What are the kernel and image of ∆?

39 Exercise 7.5. [ex-check-commutative] Consider the vectors h 3 i h 1 i h 16 i h 15 i h 12 i a = 4 b = 2 u1 = −12 u2 = 20 u3 = −9 0 3 15 0 −20 3 3 Define α: R → R by α(x) = a × x. Let A be the matrix of α with respect to the basis U = u1, u2, u3. (a) Calculate α(ui) for i = 1, 2, 3, and observe that the answer is always a multiple of uj for some j. (b) Hence write down the matrix A. (c) Calculate µU (b), α(µU (b)), φA(b) and µU (φA(b)). Check that α(µU (b)) = µU (φA(b)).

Exercise 7.6. [ex-check-composite] Define maps α, β : M2R → M2R by T 1 1 α(X) = X − X β(X) = [ 1 1 ] X.

Put E = E1,E2,E3,E4, where 1 0 0 1 0 0 0 0 E1 = [ 0 0 ] E2 = [ 0 0 ] E3 = [ 1 0 ] E4 = [ 0 1 ] . Let A be the matrix of α with respect to the basis E, let B be the matrix of β with respect to E, and let C be the matrix of αβ with respect to E.

(a) Find α(Ei) for each i, and hence find A. (b) Find β(Ei) for each i, and hence find B. (c) Find α(β(Ei)) for each i, and hence find C. (d) Check that C = AB.

Exercise 7.7. [ex-check-basis-change] 0 0 0 0 0 Let α, E and A be as in the previous exercise. Now consider the alternative basis E = E1,E2,E3,E4, where 0 1 0 0  1 0  0 0 1 0  0 1  E1 = [ 0 1 ] E2 = 0 −1 E3 = [ 1 0 ] E4 = −1 0 . Let P be the change of basis matrix from E to E0, and let A0 be the matrix of α with respect to E0. 0 (a) Express each matrix Ei as a linear combination of E1,...,E4, and hence write down the matrix P . 0 0 0 (b) Express each matrix α(Ei) as a linear combination of E1,...,E4, and hence write down the matrix A0. (c) Check that PA0 = AP .

Exercise 7.8. [ex-char-poly] 0 00 Define φ: R[x]≤2 → R[x]≤2 by φ(f) = f + f + f . Find the matrix of φ with respect to a suitable basis, and hence calculate trace(φ), det(φ) and char(φ)(t).

Exercise 7.9. [ex-recurrence] Let V be the set of all sequences (a0, a1, a2,... ) for which ai+2 = 3ai+1 − 2ai for all i. (a) Define π : V → R2 by T π(a0, a1, a2, a3,... ) = [a0, a1] . Show that ker(π) = 0, so π is injective. i (b) Define sequences u, v by ui = 1 for all i, and vi = 2 . Show that u, v ∈ V . (c) Find constants p, q, r, s such that the elements b = pu + qv and c = ru + sv satisfy π(b) = [1, 0]T and π(c) = [0, 1]T . (d) Show that b and c give a basis for V , and deduce that u and v give a basis for V . (e) Define λ: V −→ V by λ(a0, a1, a2, a3,... ) = (a1, a2, a3, a4,... ). What is the matrix of λ with respect to the basis u, v?

40 8. Theorems about bases

For the next two results, we let V be a vector space, and let V = v1, . . . , vn be a list of elements in V . We put Vi = span(v1, . . . , vi) (with the convention that V0 = 0). There may or may not be any nontrivial linear relations for V. If there is a nontrivial relation λ, so that λ1v1 + ··· + λnvn = 0 and λk 6= 0 for some k, then we define the height of λ to be the largest i such that T λi 6= 0. For example, if n = 6 and 5v1 − 2v2 − 2v3 + 3v4 = 0 then [5, −2, −2, 3, 0, 0] is a nontrivial linear relation of height 4. Proposition 8.1. [prop-no-jump] The following are equivalent (so if any one of them is true, then so are the other two): (a) The list V has a nontrivial linear relation of height i (b) vi ∈ Vi−1 (c) Vi = Vi−1. Example 8.2. [eg-jump] Consider the following vectors in R3: 1 2 3 4 v1 = 2 v2 = 3 v3 = 4 v4 = 5 3 4 5 6

T Then v1 − 2v2 + v3 = 0, so [1, −2, 1, 0] is a linear relation of height 3. The equation can be rearranged as v3 = −v1 + 2v2, showing that v3 ∈ span(v1, v2) = V2. One can check that T V2 = V3 = {[x, y, z] | x + z = 2y}. Thus, in this example, with i = 2, we see that (a), (b) and (c) all hold.

T Proof. (a)⇒(b): Let λ = [λ1, . . . , λn] be a nontrivial linear relation of height i, so λ1v1 + ... + λnvn = 0. The fact that the height is i means that λi 6= 0 but λi+1 = λi+2 = ··· = 0. We can thus rearrange the linear relation as

λivi = − λ1v1 − · · · − λi−1vi−1 − λi+1vi+1 − · · · − λnvn

= − λ1v1 − · · · − λi−1vi−1 − 0.vi+1 − · · · − 0.vn

= − λ1v1 − · · · − λi−1vi−1 so

−1 −1 vi = − λ1λi v1 − · · · − λi−1λi vi−1 ∈ Vi−1.

(b)⇒(a) Suppose that vi ∈ Vi−1 = span(v1, . . . , vi−1), so vi = µ1v1 + ··· + µi−1vi−1 for some scalars µ1, . . . , µi−1. We can rewrite this as a nontrivial linear relation

µ1v1 + ··· + µi−1vi−1 + (−1).vi + 0.vi+1 + ··· + 0.vn = 0, which clearly has height i.

(b)⇒(c): Suppose again that vi ∈ Vi−1 = span(v1, . . . , vi−1), so vi = µ1v1 + ··· + µi−1vi−1 for some scalars µ1, . . . , µi−1. We need to show that Vi = Vi−1, but it is clear that Vi−1 ≤ Vi, so it will be enough to show that Vi ≤ Vi−1. Consider an element w ∈ Vi; we must show that w ∈ Vi−1. As w ∈ Vi we have w = λ1v1 + ··· + λivi for some scalars λ1, . . . , λi. This can be rewritten as

w =λ1v1 + ··· + λi−1vi−1 + λi(µ1v1 + ··· + µi−1vi−1)

=(λ1 + λiµ1)v1 + (λ2 + λiµ2)v1 + ··· + (λi−1 + λiµi−1)vi−1.

This is a linear combination of v1, . . . , vi−1, showing that w ∈ Vi−1, as claimed. 41 (c)⇒(b): Suppose that Vi = Vi−1. It is clear that the element vi lies in span(v1, . . . , vi) = Vi, but Vi = Vi−1, so vi ∈ Vi−1.  Corollary 8.3. [cor-ind-test] If for all i we have vi 6∈ Vi−1, then there cannot be a linear relation of any height, so V must be linearly independent.  Corollary 8.4. [cor-jump] The following are equivalent: (a) The list V has no nontrivial linear relation of height i (b) vi 6∈ Vi−1 (c) Vi 6= Vi−1. If these three things are true, we say that i is a jump. Lemma 8.5. [lem-restrict] Let V be a vector space, and let V = (v1, . . . , vn) be a finite list of elements of V that spans V . Then some sublist V0 ⊆ V is a basis for V .

0 0 0 Proof. Let I be the set of those integers i ≤ n for which vi 6∈ Vi−1, and put V = {vi | i ∈ I }. We first claim that V0 is linearly independent. If not, then there is a nontrivial relation. If we write only the

nontrivial terms, and keep them in the obvious order, then the relation takes the form λi1 vi1 +···+λir vir = 0 0 with ik ∈ I for all k, and λik 6= 0 for all k, and i1 < ··· < ir. This can be regarded as a nontrivial linear 0 relation for V, of height ir. Proposition 8.1 therefore tells us that vir ∈ Vir −1, which is impossible, as ir ∈ I . This contradiction shows that V0 must be linearly independent, after all. 0 0 0 Now put V = span(V ). We will show by induction that Vi ≤ V for all i ≤ n. For the initial step, we 0 0 note that V0 = 0 so certainly V0 ≤ V . Suppose that Vi−1 ≤ V . There are two cases to consider: 0 0 0 0 (a) Suppose that i ∈ I . Then (by the definition of V ) we have vi ∈ V and so vi ∈ V . As Vi = Vi−1 +Rvi 0 0 0 and Vi−1 ≤ V and Rvi ≤ V , we conclude that Vi ≤ V . 0 0 (b) Suppose that i 6∈ I , so (by the definition of I ) we have vi ∈ Vi−1 and so Vi = Vi−1. By the induction 0 0 hypothesis we have Vi−1 ≤ V , so Vi ≤ V . 0 0 Either way we have Vi ≤ V , which proves the induction step. We therefore have Vi ≤ V for all i ≤ n. In 0 particular, we have Vn ≤ V . However, Vn is just span(V), and we assumed that V spans V , so Vn = V . This proves that V ≤ V 0, and it is clear that V 0 ≤ V , so V = V 0. This means that V0 is a spanning list as well as being linearly independent, so it is a basis for V .  Corollary 8.6. [cor-basis-exists] Every finite-dimensional vector space has a basis. Proof. By Definition 5.21, we can find a finite list V that spans V . By Lemma 8.5, some sublist V0 ⊆ V is a basis.  Lemma 8.7. [lem-steinitz] Let V be a vector space, and let V = (v1, . . . , vn) and W = (w1, . . . , wm) be finite lists of elements of V such that V spans V and W is linearly independent. Then n ≥ m. (In other words, any spanning list is at least as long as any linearly independent list.)

Proof. As before, we put Vi = span(v1, . . . , vi), so Vn = span(V) = V . We will show by induction that any linearly independent list in Vi has length at most i. In particular, this will show that any linearly independent list in V = Vn has length at most n, as claimed. For the initial step, note that V0 = 0. This means that the only linearly independent list in V0 is the empy list, which has length 0, as required. Now suppose (for the induction step) that every linearly independent list in Vi−1 has length at most i − 1. Suppose we have a linearly independent list (x1, . . . , xp) in Vi; we must show that p ≤ i. The elements xj lie in Vi = span(v1, . . . , vi). We can thus find scalars ajk such that

xj = aj1v1 + aj2v2 + ··· + aj,i−1vi−1 + ajivi. We need to consider two cases: 42 (a) Suppose that for each j the last coefficient aji is zero. This means that

xj = aj1v1 + aj2v2 + ··· + aj,i−1vi−1,

so xj ∈ span(v1, . . . , vi−1) = Vi−1. This means that x1, . . . , xp is a linearly independent list in Vi−1, so the induction hypothesis tells us that p ≤ i − 1, so certainly p ≤ i. (b) Otherwise, for some xj we have aji 6= 0. It is harmless to reorder the x’s, so for notational convenience −1 we move this xj to the end of the list, which means that api 6= 0. Now put αk = akiapi and

yk = xk − αkxp.

We will show that y1, . . . , yp−1 is a linearly independent list in Vi−1. Assuming this, the induction hypothesis gives p − 1 ≤ i − 1, and so p ≤ i as required. First, we have −1 yk =xk − akiapi xp =ak1v1 + ··· + akivi −1 − akiapi (ap1v1 + ··· + apivi) −1 −1 −1 =(ak1 − akiapi ap1)v1 + (ak2 − akiapi ap2)v2 + ··· + (aki − akiapi api)vi. −1 In the last term, the coefficient aki − akiapi api is zero, so yk is actually a linear combination of v1, . . . , vi−1, so yk ∈ Vi−1. Next, suppose we have a linear relation λ1y1 + ··· + λp−1yp−1 = 0. Put

λp = −λ1α1 − λ2α2 − · · · − λp−1αp−1.

By putting yk = xk − αkxp in the relation λ1y1 + ··· + λp−1yp−1 = 0 and expanding it out, we get λ1x1 + ... + λp−1xp−1 + λpxp = 0. As x1, . . . , xp is linearly independent, this means that we must have λ1 = ··· = λp−1 = λp = 0. It follows that our original relation among the y’s was trivial. We conclude that the list y1, . . . , yp−1 is a linearly independent list in Vi−1. As explained before, the induction hypothesis now tells us that p − 1 ≤ i − 1, so p ≤ i.  Corollary 8.8. [cor-dim-good] Let V be a finite-dimensional vector space. Then V has a finite basis, and any two bases have the same number of elements, say n. This number is called the dimension of V . Moreover, any spanning list for V has at least n elements, and any linearly independent list has at most n elements.

Proof. We already saw in Corollary 8.6 that V has a basis, say V = (v1, . . . , vn). Let X be a linearly independent listin V . As V is a spanning list and X is linearly independent, Lemma 8.7 tells us that V is at least as long as X , so X has at most n elements. Now let Y be a spanning list for V . As Y spans and V is linearly independent, Lemma 8.7 tells us that Y is at least as long as V, so Y has at least n elements. Now let V0 be another basis for V . Then V0 has at least n elements (because it spans) and at most n elements (because it is independent) so it must have exactly n elements.  Corollary 8.9. [cor-iso-Kn] If V is a finite-dimensional vector space over K with dimension n, then we can choose a basis V of length n n n, and the map µV : K → V is an isomorphism, so K is isomorphic to K . Proposition 8.10. [prop-subspace-dim] Let V be a finite-dimensional vector space, and let W be a subspace of V . Then W is also finite-dimensional, and dim(W ) ≤ dim(V ).

Proof. Put n = dim(V ). We define a list W = (w1, w2,... ) as follows. If W = 0 then we take W to be the empty list. Otherwise, we let w1 be any nonzero vector in W . If w1 spans W we take W = (w1). Otherwise, we can choose an element w2 ∈ W that is not in span(w1). If span(w1, w2) = W then we stop and take W = (w1, w2). Otherwise, we can choose an element w3 ∈ W that is not in span(w1, w2). We continue in this way, so we always have wi 6∈ span(w1, . . . , wi−1), so the w’s are linearly independent (by Corollary 8.3). However, V has a spanning set of length n, so Lemma 8.7 tells us that we cannot have a linearly independent list of length greater than n, so our list of w’s must stop before we get to wn+1. This means that for some p ≤ n we have W = span(w1, . . . , wp), so W is finite-dimensional, with dim(W ) = p ≤ n.  43 Proposition 8.11. [prop-extend] Let V be an n-dimensional vector space, and let V = (v1, . . . , vp) be a linearly independent list of elements 0 0 of V . Then p ≤ n, and V can be extended to a list V = (v1, . . . , vn) such that V is a basis of V . 0 Proof. Corollary 8.8 tells us that p ≤ n. If span(v1, . . . , vp) = V then we take V = (v1, . . . , vp). Otherwise, we 0 choose some vp+1 6∈ span(v1, . . . , vp). If span(v1, . . . , vp+1) = V then we stop and take V = (v1, . . . , vp+1). Otherwise, we choose some vp+2 6∈ span(v1, . . . , vp+1) and continue in the same way. We always have vi 6∈ span(v1, . . . , vi−1), so the v’s are linearly independent (by Corollary 8.3). Any linearly independent list has length at most n (by Corollary 8.8) so our process must stop before we get to vn+1. This means that 0 0 0 V = (v1, . . . , vm) with m ≤ n, and as the process has stopped, we must have span(V ) = V . As V is also linearly independent, we see that it is a basis, and so m = n (by Corollary 8.8 again).  Proposition 8.12. [prop-basis-count] Let V be an n-dimensional vector space. (a) Any spanning list for V with exactly n elements is linearly independent, and so is a basis. (b) Any linearly independent list in V with exactly n elements is a spanning list, and so is a basis.

0 Proof. (a) Let V = (v1, . . . , vn) be a spanning list. Lemma 8.5 tells us that some sublist V ⊆ V is a basis for V . As dim(V ) = n, we see that V0 has length n, but V also has length n, so V0 must be all of V. Thus, V itself must be a basis. (b) Let W = (w1, . . . , wn) be a linearly independent list. Proposition 8.11 tells us that W can be extended to a list W0 ⊇ W such that W0 is a basis. In particular, W0 must have length n, so it must just be the same as W, so W itself is a basis.  Corollary 8.13. [cor-basis-count] Let V be an finite-dimensional vector space, and let W be a subspace with dim(W ) = dim(V ); then W = V .

Proof. Put n = dim(V ) = dim(V ), and let W = (w1, . . . , wn) be a basis for W . Then W is a linearly independent list in V with n elements, so part (b) of the Proposition tells us that W spans V . Thus V = span(W) = W .  Proposition 8.14. [prop-two-subspaces] Let U be a finite-dimensional vector space, and let V and W be subspaces of U. Then one can find lists (u1, . . . , up), (v1, . . . , vq) and (w1, . . . , wr) (for some p, q, r ≥ 0) such that

• (u1, . . . , up) is a basis for V ∩ W • (u1, . . . , up, v1, . . . , vq) is a basis for V • (u1, . . . , up, w1, . . . , wr) is a basis for W • (u1, . . . , up, v1, . . . , vq, w1, . . . , wr) is a basis for V + W . In particular, we have dim(V ∩ W ) = p dim(V ) = p + q dim(W ) = p + r dim(V + W ) = p + q + r, so dim(V ) + dim(W ) = 2p + q + r = dim(V ∩ W ) + dim(V + W ).

Proof. Choose a basis U = (u1, . . . , up) for V ∩ W . Then U is a linearly independent list in V , so it can be extended to a basis for V , say (u1, . . . , up, v1, . . . , vq). Similarly U is a linearly independent list in W , so it can be extended to a basis for W , say (u1, . . . , up, w1, . . . , wr). All that is left is to prove that the list

X = (u1, . . . , up, v1, . . . , vq, w1, . . . , wr) is a basis for V + W . Consider an element

x = α1u1 + ··· + αpup + β1v1 + ··· + βqvq + γ1w1 + ··· + γrwr ∈ span(X ). 44 P P P Put y = i αiui + j βjvj and z = k γkwk, so x = y + z. We have ui, vj ∈ V and wk ∈ W so y ∈ V and z ∈ W so x = y + z ∈ V + W . Thus span(X ) ≤ V + W . Now suppose we start with an element x ∈ V + W . We can then find y ∈ V and z ∈ W such that x = y + z. As (u1, . . . , up, v1, . . . , vq) is a basis for V , we have

y = λ1u1 + ··· + λpup + β1v1 + ··· + βqvq for some scalars λi, βj. Similarly, we have

z = µ1u1 + ··· + µpup + γ1w1 + ··· + γrwr for some scalars µi, γk. If we put αi = λi + µi we get

x = y + z = α1u1 + ··· + αpup + β1v1 + ··· + βqvq + γ1w1 + ··· + γrwr ∈ span(X ). It follows that span(X ) = V + W . Finally, suppose we have a linear relation

α1u1 + ··· + αpup + β1v1 + ··· + βqvq + γ1w1 + ··· + γrwr = 0. P P P We again put y = i αiui + j βjvj and z = k γkwk, so y + z = 0, so z = −y. Now y ∈ V , so z also lies in V , because z = −y. On the other hand, it is clear from our definition of z that it lies in W , so z ∈ V ∩ W . We know that U is a basis for V ∩ W , so z = λ1u1 + ··· + λpup for some λ1, . . . , λp. This means that

λ1u1 + ··· + λpup − γ1w1 − · · · − γrwr = 0.

We also know that (u1, . . . , up, w1, . . . , wr) is a basis for W , so the above equation implies that λ1 = ··· = λp = γ1 = ··· = γr = 0. Feeding this back into our original, relation, we get

α1u1 + ··· + αpup + β1v1 + ··· + βqvq = 0.

However, we also know that (u1, . . . , up, v1, . . . , vq) is a basis for V , so the above equation implies that α1 = ··· = αp = β1 = ··· = βq = 0. As all α’s, β’s and γ’s are zero, we see that our original linear relation was trivial. This shows that the list X is linearly independent, so it gives a basis for V + W as claimed.  Example 8.15. [eg-two-subspaces-i] Put U = M3R and h 1 i h 0 i V = {A ∈ U | all rows sum to 0 } = {A ∈ U | A 1 = 0 1 0 W = {A ∈ U | all columns sum to 0 } = {A ∈ U | [1, 1, 1]A = [0, 0, 0]} Then V ∩ W is the set of all matrices of the form

 a b −a−b   1 0 −1   0 1 −1   0 0 0   0 0 0  A = c d −c−d = a 0 0 0 + b 0 0 0 + c 1 0 −1 + d 0 1 −1 −a−c −b−d a+b+c+d −1 0 1 0 −1 1 −1 0 1 0 −1 1 It follows that the list  1 0 −1   0 1 −1   0 0 0   0 0 0  u1 = 0 0 0 , u2 = 0 0 0 , u3 = 1 0 −1 , u4 = 0 1 −1 −1 0 1 0 −1 1 −1 0 1 0 −1 1 is a basis for V ∩ W . Now put

 0 0 0   0 0 0   0 0 1   0 0 0  v1 = 0 0 0 , v2 = 0 0 0 , w1 = 0 0 0 , w2 = 0 0 1 1 0 −1 0 1 −1 0 0 −1 0 0 −1

so vi ∈ V and wi ∈ W . A typical element of V has the form

 a b −a−b   0 0 0  A = c d −c−d = au1 + bu2 + cu3 + du4 + 0 0 0 e f −e−f e−a−c f−b−d a+b+c+d−e−f

= au1 + bu2 + cu3 + du4 + (e − a − c)v1 + (f − b − d)v2.

Using this, we see that u1, . . . , u4, v1, v2 is a basis for V . Similarly, u1, . . . , u4, w1, w2 is a basis for W . It follows that

u1, u2, u3, u4, v1, v2, w1, w2 is a basis for V + W .

Example 8.16. [eg-two-subspaces-ii] Put U = R[x]≤3 and
V = {f ∈ U | f(1) = 0} = {(x − 1)g(x) | g(x) ∈ R[x]≤2}
W = {f ∈ U | f(−1) = 0} = {(x + 1)h(x) | h(x) ∈ R[x]≤2}
so V ∩ W = {f ∈ U | f is divisible by (x + 1)(x − 1) = x^2 − 1}.

Any f(x) ∈ V ∩ W has the form (ax + b)(x^2 − 1) = a(x^3 − x) + b(x^2 − 1). It follows that the list u1 = x^3 − x, u2 = x^2 − 1 is a basis for V ∩ W . Now put v1 = x − 1 ∈ V and w1 = x + 1 ∈ W . We claim that u1, u2, v1 is a basis for V . Indeed, any element of V has the form
f(x) = (ax^2 + bx + c)(x − 1) = (ax(x + 1) + (b − a)(x + 1) + (a − b + c))(x − 1) = au1 + (b − a)u2 + (a − b + c)v1,
so the list spans V . If we have a linear relation au1 + bu2 + cv1 = 0 then a(x^3 − x) + b(x^2 − 1) + c(x − 1) = 0 for all x, so ax^3 + bx^2 + (c − a)x − (b + c) = 0 for all x, which implies that a = b = c = 0. Our list is thus independent as well as spanning V , so it is a basis. Similarly u1, u2, w1 is a basis for W . It follows that u1, u2, v1, w1 is a basis for V + W .

Theorem 8.17. [thm-adapted-basis] Let α: U → V be a linear map between finite-dimensional vector spaces. Then one can choose a basis U = u1, . . . , um for U, and a basis V = v1, . . . , vn for V , and an integer r ≤ min(m, n) such that

(a) α(ui) = vi for 1 ≤ i ≤ r
(b) α(ui) = 0 for r < i ≤ m
(c) ur+1, . . . , um is a basis for ker(α) ≤ U
(d) v1, . . . , vr is a basis for image(α) ≤ V .

Proof. Let v1, . . . , vr be any basis for image(α) (so (d) is satisfied). By Proposition 8.11, this can be extended to a list V = v1, . . . , vn which is a basis for all of V . Next, for j ≤ r we have vj ∈ image(α), so we can choose uj ∈ U with α(uj) = vj (so (a) is satisfied). This gives us a list u1, . . . , ur of elements of U; to these, we add vectors ur+1, . . . , um forming a basis for ker(α) (so that (b) and (c) are satisfied). Now everything is as claimed except that we have not shown that the list U = u1, . . . , um is a basis for U. Consider an element x ∈ U. We then have α(x) ∈ image(α), and v1, . . . , vr is a basis for image(α), so there exist numbers x1, . . . , xr such that α(x) = x1v1 + ··· + xrvr. Now put x′ = x1u1 + ··· + xrur and x″ = x − x′. We have
α(x′) = x1α(u1) + ··· + xrα(ur) = x1v1 + ··· + xrvr = α(x),
so α(x″) = α(x) − α(x′) = 0, so x″ ∈ ker(α). We also know that ur+1, . . . , um is a basis for ker(α), so there exist numbers xr+1, . . . , xm with x″ = xr+1ur+1 + ··· + xmum. Putting this together, we get
x = x′ + x″ = (x1u1 + ··· + xrur) + (xr+1ur+1 + ··· + xmum),
which is a linear combination of u1, . . . , um. It follows that the list U spans U. Now suppose we have a linear relation λ1u1 + ··· + λmum = 0. We apply α to both sides of this equation to get

0 = λ1α(u1) + ··· + λrα(ur) + λr+1α(ur+1) + ··· + λmα(um)

= λ1v1 + ··· + λrvr + λr+1.0 + ··· + λm.0

= λ1v1 + ··· + λrvr.

This is a linear relation between the vectors v1, . . . , vr, but these form a basis for image(α), so this must be the trivial relation, so λ1 = ··· = λr = 0. This means that our original relation has the form

λr+1ur+1 + ··· + λmum = 0

As ur+1, . . . , um is a basis for ker(α), these vectors are linearly independent, so the above relation must be trivial, so λr+1 = ··· = λm = 0. This shows that all the λ's are zero, so the original relation was trivial. Thus, the vectors u1, . . . , um are linearly independent, as claimed. □

Remark 8.18. [rem-adapted-matrix] If we use bases as in the theorem, then the matrix of α with respect to those bases has the block form
A = [ I_r  0_{r,m−r} ; 0_{n−r,r}  0_{n−r,m−r} ].

Corollary 8.19. [cor-rank-nullity] If α: U −→ V is a linear map then dim(ker(α)) + dim(image(α)) = dim(U).

Proof. Choose bases as in the theorem. Then dim(U) = m and dim(image(α)) = r and

dim(ker(α)) = |{ur+1, . . . , um}| = m − r. The claim follows.  Example 8.20. [eg-adapted-bases] 1 1 0 1 Consider the map φ: M2R → M2R given by φ(A) = [ 1 1 ] A [ 1 0 ], or equivalently  a b  1 1  a b  0 1 1 1  b a   b+d a+c  φ c d = [ 1 1 ] c d [ 1 0 ] = [ 1 1 ] d c = b+d a+c 1 0 0 1 = (b + d)[ 1 0 ] + (a + c)[ 0 1 ] . It follows that if we put 0 1 1 0 1 0 0 1 u1 = [ 0 0 ] u2 = [ 0 0 ] v1 = [ 1 0 ] v2 = [ 0 1 ] then φ(u1) = v1, φ(u2) = v2, and v1, v2 is a basis for image(φ). It can be extended to a basis for all of M2R 0 0 0 0 by adding v3 = [ 1 0 ] and v4 = [ 0 1 ]. Moreover, we have φ(A) = 0 iff a + c = b + d = 0 iff c = −a and d = −b, in which case  a b   1 0   0 1  A = −a −b = a −1 0 + b 0 −1 .  1 0   0 1  This means that the matrices u3 = −1 0 and u4 = 0 −1 form a basis for ker(φ). Putting this together, we see that u1, . . . , u4 and v1, . . . , v4 are bases for M2R such that φ(ui) = vi for i ≤ 2, and φ(ui) = 0 for i > 2. Exercises Exercise 8.1. [ex-find-jumps] Consider the following elements of R6:  1   1   0   1   1   0   0   1  1 1 0 1 1 0 0 1 v =  1  v =  1  v =  0  v =  1  v =  0  v =  1  v =  1  v =  0  1  1  2  −1  3  1  4  0  5  0  6  1  7  −1  8  0  1  −1  1 0 1 0 0  −1  1 −1 1 0 1 0 0 −1

Put V = v1, . . . , v8 and V0 = 0 and Vj = span(v1, . . . , vj) for j > 0. Recall that i is a jump for the sequence V if vi ∉ Vi−1. Find all the jumps.
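If you want to check a computation like this by machine, the jumps can be read off from ranks: i is a jump exactly when appending vi increases dim span(v1, . . . , vi). The following Python sketch illustrates the idea; the short list vs is a made-up placeholder, not the eight vectors of the exercise.

import numpy as np
# Placeholder vectors, chosen only to illustrate the method.
vs = [np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
      np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
      np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])]
jumps = []
prev_rank = 0
for i in range(1, len(vs) + 1):
    rank = np.linalg.matrix_rank(np.vstack(vs[:i]))  # dim of span(v1, ..., vi)
    if rank > prev_rank:                             # vi is not in V_{i-1}
        jumps.append(i)
    prev_rank = rank
print(jumps)   # [1, 2] for these placeholder vectors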

Exercise 8.2. [ex-two-subspaces] Let U be a finite-dimensional vector space, and let V and W be subspaces of U. In lectures we proved that there exist elements

u1, . . . , up, v1, . . . , vq, w1, . . . , wr such that

• u1, . . . , up is a basis for V ∩ W
• u1, . . . , up, v1, . . . , vq is a basis for V
• u1, . . . , up, w1, . . . , wr is a basis for W
• u1, . . . , up, v1, . . . , vq, w1, . . . , wr is a basis for V + W .
Find elements as above for the case U = M2R and V = {A ∈ U | A^T = A} and W = {A ∈ U | trace(A) = 0}.
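The result quoted in this exercise also gives the dimension formula dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ) (count the basis elements: (p + q) + (p + r) − p = p + q + r). As an informal illustration, the formula can be checked numerically when V and W are presented as column spans; the two matrices in the Python sketch below are hypothetical examples, not taken from the exercise.

import numpy as np
# Hypothetical subspaces of R^3, given by matrices whose columns span them.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])          # span{e1, e2}
W = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])          # span{e2, e3}
dim_V = np.linalg.matrix_rank(V)
dim_W = np.linalg.matrix_rank(W)
dim_sum = np.linalg.matrix_rank(np.hstack([V, W]))   # dim(V + W)
dim_int = dim_V + dim_W - dim_sum                    # dim(V ∩ W), by the formula above
print(dim_V, dim_W, dim_sum, dim_int)                # 2 2 3 1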

Exercise 8.3. [ex-two-subspaces-numbers] Let Z be a finite-dimensional vector space, and let U, V and W be subspaces of Z. Suppose that
dim(U) = 2, dim(V ) = 3, dim(W ) = 4, dim(U ∩ V ) = 1, dim(V ∩ W ) = 2, dim((U + V ) ∩ W ) = 3.
Find the dimensions of U + V , V + W and U + V + W . Hence show that U + V + W = V + W and thus that U ≤ V + W .

Exercise 8.4. [ex-adapted-bases] Let φ: U → V be a linear map between finite-dimensional vector spaces. Recall that there exists a number r and bases u1, . . . , un (for U) and v1, . . . , vm (for V ) such that φ(ui) = vi if i ≤ r, and φ(ui) = 0 if i > r.

(The method is to find a basis v1, . . . , vr for image(φ), choose elements u1, . . . , ur with φ(ui) = vi, then choose any basis ur+1, . . . , un for ker(φ), then choose any elements vr+1, . . . , vm for V such that v1, . . . , vm is a basis for V .) Find such adapted bases for the following maps: 2 1 (a) φ: M2R → R[x]≤3, φ(A) = [x, x ]A [ x ] 3 0 T (b) ψ : R[x]≤2 → R , ψ(f) = [f(1), f(−1), f (0)]  A 0  (c) χ: M → M , χ(A) = 2R 3R 0 − trace(A) 4 2 2 2 2 (d) θ = µP : R → R[x]≤2, where P = x , (x + 1) , (x − 1) , x + 1.

9. Eigenvalues and eigenvectors Definition 9.1. [defn-eigen] Let V be a finite-dimensional vector space over C, and let α: V → V be a C-linear map. Let λ be a complex number. An eigenvector for α, with eigenvalue λ is a nonzero element v ∈ V such that α(v) = λv. If such a v exists, we say that λ is an eigenvalue of α.
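As a quick numerical aside (in Python, with a small matrix chosen purely for illustration), eigenvalues and eigenvectors can be computed as follows.

import numpy as np
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
evals, evecs = np.linalg.eig(A)           # columns of evecs are eigenvectors
print(evals)                              # approximately 3 and 1, in some order
v = evecs[:, 0]
print(np.allclose(A @ v, evals[0] * v))   # True, i.e. A v = lambda v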

Remark 9.2. [rem-eigen] Suppose we choose a basis V for V , and let A be the matrix of α with respect to V and V. Then the eigenvalues of α are the same as the eigenvalues of the matrix A, which are the roots of the characteristic polynomial det(tI − A). Example 9.3. [eg-delta-eigen] Put V = C[x]≤4, and define φ: V −→ V by φ(f)(x) = f(x + 1). We claim that 1 is the only eigenvalue. Indeed, the corresponding matrix P (with respect to the basis 1, x, . . . , x4) is

" 1 1 1 1 1 # 0 1 2 3 4 P = 0 0 1 3 6 0 0 0 1 4 0 0 0 0 1 The characteristic polynomial is thus " t−1 −1 −1 −1 −1 # 0 t−1 −2 −3 −4 5 det(tI − P ) = det 0 0 t−1 −3 −6 = (t − 1) 0 0 0 t−1 −4 0 0 0 0 t−1 so 1 is the only root of the characteristic polynomial. The eigenvectors are just the polynomials f with φ(f) = 1.f or equivalently f(x + 1) = f(x) for all x. These are just the constant polynomials. 48 Example 9.4. [eg-flip-eigen] k k k Put V = C[x]≤4, and define φ: V −→ V by φ(f)(x) = f(ix), so φ(x ) = i x . The corresponding matrix P (with respect to 1, x, x2, x3, x4) is " 1 0 0 0 0 # 0 i 0 0 0 P = 0 0 −1 0 0 0 0 0 −i 0 0 0 0 0 1 The characteristic polynomial is thus det(tI − P ) = (t − 1)(t − i)(t + 1)(t + i)(t − 1) = (t − 1)(t2 + 1)(t2 − 1) = t5 − t4 − t + 1 so the eigenvalues are 1, i, −1 and −i. • The eigenvectors of eigenvalue 1 are functions f ∈ V with f(ix) = f(x). These are the functions of the form f(x) = a + ex4. • The eigenvectors of eigenvalue i are functions f ∈ V with f(ix) = if(x). These are the functions of the form f(x) = bx. • The eigenvectors of eigenvalue −1 are functions f ∈ V with f(ix) = −f(x). These are the functions of the form f(x) = cx2. • The eigenvectors of eigenvalue −i are functions f ∈ V with f(ix) = −if(x). These are the functions of the form f(x) = dx3. Example 9.5. [eg-vecprod-eigen] Let u be a unit vector in R3 and define α: C3 −→ C3 by α(v) = u × v. Choose a unit vector b orthogonal to a and put c = a × b. We saw previously that the matrix of α with respect to a, b, c is h 0 0 0 i A = 0 0 −1 0 1 0 The characteristic polynomial is thus h t 0 0 i det(tI − A) = det 0 t 1 = t3 + t = t(t + i)(t − i). 0 −1 t The eigenvalues are thus 0, i and −i. • The eigenvectors of eigenvalue 0 are the multiples of a. • The eigenvectors of eigenvalue i are the multiples of b − ic. • The eigenvectors of eigenvalue −i are the multiples of b + ic. Example 9.6. [eg-proj-eigen] Let u and v be non-orthogonal vectors in R3, and define φ: C3 −→ C3 by φ(x) = hu, xiv. We claim that the characteristic polynomial of φ is t2(t − u.v). Indeed, the matrix P with respect to the standard basis is calculated as follows:  u1v1   u2v1   u3v1  φ(e1) = u1v = u1v2 φ(e2) = u2v = u2v2 φ(e3) = u3v = u3v2 u1v3 u2v3 u3v3

 u1v1 u2v1 u3v1  P = u1v2 u2v2 u3v2 u1v3 u2v3 u3v3 The characteristic polynomial is det(tI − P ) = − det(P − tI), which is found as follows: det(P − tI)

" u1v1−t u2v1 u3v1 # = det u1v2 u2v2−t u3v2 u1v3 u2v3 u3v3−t u v u v h u2v2−t u3v2 i h 1 2 3 2 i h u1v2 u2v2−t i =(u1v1 − t) det − u2v1 det + u3v1 det u2v3 u3v3−t u1v3 u3v3−t u1v3 u2v3

h u2v2−t u3v2 i 2 det = (u2v2 − t)(u3v3 − t) − u2v3u3v2 = t − (u2v2 + u3v3)t u2v3 u3v3−t h u v u v i det 1 2 3 2 = u v (u v − t) − u v u v = −u v t u1v3 u3v3−t 1 2 3 3 1 3 3 2 1 2 h i det u1v2 u2v2−t = u v u v − u v (u v − t) = u v t u1v3 u2v3 1 2 2 3 1 3 2 2 1 3 2 det(P − tI) = (u1v1 − t)(t − (u2v2 + u3v3)t) − u2v1(−u1v2t) + u3v1u1v3t 2 3 = (u1v1 + u2v2 + u3v3)t − t 3 2 2 det(tI − P ) = t − (u1v1 + u2v2 + u3v3)t = t (t − hu, vi) The eigenvalues are thus 0 and hu, vi. The eigenvectors of eigenvalue 0 are the vectors orthogonal to u. The eigenvectors of eigenvalue hu, vi are the multiples of v. 49 If we had noticed this in advance then the whole argument would have been much easier. We could have chosen a basis of the form a, b, v with a and b orthogonal to u. With respect to that basis, φ would have h 0 0 0 i matrix 0 0 0 which immediately gives the characteristic polynomial. 0 0 hu,vi
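The conclusion of Example 9.6 is easy to check numerically for any particular u and v: the matrix of φ is the outer product v u^T, whose eigenvalues are hu, vi together with 0 (twice). The vectors in the Python sketch below are arbitrary choices made for illustration.

import numpy as np
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 1.0])
P = np.outer(v, u)                  # the matrix of x |-> <u, x> v  (entries v_i u_j)
print(np.linalg.eigvals(P))         # approximately 5, 0, 0 in some order
print(np.dot(u, v))                 # 5.0 = <u, v>, the nonzero eigenvalue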

10. Inner products Definition 10.1. [defn-inner-product] Let V be a vector space over R. An inner product on V is a rule that gives a number hu, vi ∈ R for each u, v ∈ V , with the following properties: (a) hu + v, wi = hu, wi + hv, wi for all u, v, w ∈ V . (b) htu, vi = thu, vi for all u, v ∈ V and t ∈ R. (c) hu, vi = hv, ui for all u, v ∈ V . (d) We have hu, ui ≥ 0 for all u ∈ V , and hu, ui = 0 iff u = 0. Given an inner product, we will write kuk = phu, ui, and call this the norm of u. We say that u is a unit vector if kuk = 1. Remark 10.2. [rem-ip-real] Unlike most of the other things we have done, this does not immediately generalise to fields K other than R. The reason is that axiom (d) involves the condition hu, ui ≥ 0, and in an arbitrary field K (such as Z/5, for example) we do not have a good notion of positivity. Moreover, all our examples will rely heavily on the fact that x2 ≥ 0 for all x ∈ R, and of course this ceases to be true if we work over C. We will see in Section 13 how to fix things up in the complex case. Example 10.3. [eg-ip-Rn] We can define an inner product on Rn by n X h(x1, . . . , xn), (y1, . . . , yn)i = xiyi = x1y1 + x2y2 + ··· + xnyn. i=1 T n Properties (a) to (c) are obvious. For property (d), note that if u = [u1, . . . , un] ∈ R then 2 2 2 hu, ui = u1 + u2 + ··· + un. All the terms in this sum are at least zero, so the sum must be at least zero. Moreover, there can be no cancellation, so the only way that hu, ui can be zero is if all the individual terms are zero, which means u1 = u2 = ··· = un = 0, so u = 0 as a vector. Remark 10.4. [rem-ip-mat-mult] If x, y ∈ Rn then we can regard x and y as n × 1 matrices, so xT is a 1 × n matrix, so xT y is a 1 × 1 matrix, or in other words a number. This number is just hx, yi. This is most easily explained by example: in the case n = 4 we have

 x1 T  y1   y1  x2 y2 y2 x1 x2 x3 x4 x3 y3 = [ ] y3 = x1y1 + x2y2 + x3y3 + x4y4 = hx, yi. x4 y4 y4 Example 10.5. [eg-ip-funny] Although we usually use the standard inner product on Rn, there are many other inner products. For example, we can define a different inner product on R3 by h(u, v, w), (x, y, z)i0 = ux + (u + v)(x + y) + (u + v + w)(x + y + z). In particular, this gives h(u, v, w), (u, v, w)i0 = u2 + (u + v)2 + (u + v + w)2 = 3u2 + 2v2 + w2 + 4uv + 2vw + 2uw. The corresponding norm is thus p k(u, v, w)k0 = ph(u, v, w), (u, v, w)i0 = 3u2 + 2v2 + w2 + 4uv + 2vw + 2uw. 50 Example 10.6. [eg-ip-physical] Let U be the set of physical vectors, as in Example 2.6. Given u, v ∈ U we can define hu, vi = (length of u in miles) × (length of v in miles) × cos( angle between u and v ). This turns out to give an inner product on U. Of course we could use a different unit of length instead of miles, and that would just change the inner product by a constant factor. Example 10.7. [eg-ip-Coi] We can define an inner product on C[0, 1] by Z 1 hf, gi = f(x)g(x) dx. x=0 Properties (a) to (c) are obvious. For property (d), note that if f ∈ C[0, 1] then Z 1 hf, fi = f(x)2 dx 0 As f(x)2 ≥ 0 for all x, we have hf, fi ≥ 0. If hf, fi = 0 then the area between the x-axis and the graph of f(x)2 is zero, so f(x)2 must be zero for all x, so f = 0 as required. Here is a slightly more careful version of the argument. Suppose that f is nonzero. We can then find some number a with 0 ≤ a ≤ 1 and f(a) > 0. Put  = f(a)/2. As f is continuous, there exists δ > 0 such that whenever x ∈ [0, 1] and |x − a| < δ we have |f(x) − f(a)| < . For such x we have f(a) −  < f(x) < f(a) + , but f(a) = 2 so f(x) > . Usually we will be able to say that f is greater than  on the interval (a−δ, a+δ) which has length 2δ > 0, so Z 1 Z a+δ f(x)2 dx ≥ 2 dx = 2δ2 > 0. 0 a−δ This will not be quite right, however, if a − δ < 0 or a + δ > 1, because then the interval (a − δ, a + δ) is not contained in the domain where f is defined. However, we can still put a− = max(a − δ, 0) and a+ = min(a + δ, 1) and we find that a− < a+ and

Z 1 Z a+ 2 2 2 hf, fi = f(x) dx ≥  dx = (a+ − a−) > 0, 0 a− as required. Example 10.8. [eg-ip-matrix] We can define an inner product on the space MnR by hA, Bi = trace(ABT ) Consider for example the case n = 3, so   h a11 a12 a13 i b11 b12 b13 A = a21 a22 a23 B = b21 b22 b23 a a a 31 32 33 b31 b32 b33 so  a a a  " b b b # " a b +a b +a b a b +a b +a b a b +a b +a b # T 11 12 13 11 21 31 11 11 12 12 13 13 11 21 12 22 13 23 11 31 12 32 13 33 AB = a21 a22 a23 b12 b22 b32 = a21b11+a22b12+a23b13 a21b21+a22b22+a23b23 a21b31+a22b32+a23b33 a a a 31 32 33 b13 b23 b33 a31b11+a32b12+a33b13 a31b21+a32b22+a33b23 a31b31+a32b32+a33b33 so T hA, Bi = trace(AB ) =a11b11 + a12b12 + a13b13+

a21b21 + a22b22 + a23b23+

a31b31 + a32b32 + a33b33 3 3 X X = aijbij. i=1 j=1 In other words hA, Bi is the sum of the entries of A multiplied by the corresponding entries in B. Thus, 9 if we identify M3R with R in the usual way, then our inner product on M3R corresponds to the standard 51 9 n2 inner product on R . Similarly, if we identify MnR with R in the usual way, then our inner product on n2 MnR corresponds to the standard inner product on R . Example 10.9. [eg-ip-quadratic] ˙ For any a < b we can define an inner product h˙,i[a,b] on R[x]≤2 by Z b hu, vi[a,b] = u(x)v(x) dx. a In particular, we have Z b  i+j+1 b i+j+1 i+j+1 i j i+j x b − a hx , x i[a,b] = x dx= = a i + j + 1 a i + j + 1 This gives an infinite family of different inner products on R[x]≤2. For example: 13 − (−1)3 h1, x2i = = 2/3 [−1,1] 3 14 − (−1)4 hx, x2i = = 0 [−1,1] 4 r 15 − (−1)5 kx2k = = p2/5 [−1,1] 5 r 55 − 05 kx2k = = 25 [0,5] 5 Example 10.10. [eg-gauss-weight] 2 Let V be the set of functions of the form p(x)e−x /2, where p(x) is a polynomial. For example, the function 2 f(x) = (x3 − x)e−x /2, shown in the graph below, is an element of V :

We can define an inner product on V by Z ∞ hf, gi = f(x)g(x) dx −∞ Note that this only works because of the special form of the functions in V . For most functions f and g R ∞ that you might think of, the integral −∞ f(x)g(x) dx will give an infinite or undefined answer. However, 2 the function e−x decays very rapidly to zero as |x| tends to infinity, and one can check that this is enough to make the integral well-defined and finite when f and g are in V . In fact, we have the formula ∞ 2 2 Z 2 hxne−x /2, xme−x /2i = xn+me−x dx −∞ ( √ π (n+m)! if n + m is even = 2n+m ((n+m)/2)! 0 if n + m is odd Exercises Exercise 10.1. [ex-innerprod-matrices] T Use the usual inner product hA, Bi = trace(AB ) on M3R. 52 (a) Calculate all the inner products hCi,Cji, where h 1 1 1 i h 1 2 3 i h 0 1 2 i C1 = 0 1 1 C2 = 2 1 2 C3 = −1 0 3 0 0 1 3 2 1 −2 −3 0 (b) Show that if AT = A and BT = −B then A and B are orthogonal. (c) Put u = [1, 1, 1]T , and let V be the set of all matrices B such that the all three columns of B are the same. Show that if A is orthogonal to V then Au = 0.

R 1 Exercise 10.2. Use the inner product hf, gi = −1 f(x)g(x) dx on R[x]≤2. (a) Find hx + 1, x2 + xi (b) Show that if 0 ≤ i, j ≤ 2 and i + j is odd then hxi, xji = 0. (c) Consider a polynomial u(x) = px2 + q, and another polynomial f(x) = ax2 + bx + c. Give a formula for 4f(−1) − 8f(0) + 4f(1) and another formula for hf, ui. Hence find p and q such that hf, ui = 4f(−1) − 8f(0) + 4f(1) for all quadratic polynomials f.

Exercise 10.3. [ex-innerprod-exotic] T Consider the space U = M2R and the subspace V = {A ∈ U | A = A}. Given matrices A, B ∈ U, put hA, Bi = det(A − B) − det(A + B) + 2 trace(A) trace(B). Expand out hA, Bi when A = [ a1 a2 ] and B =  b1 b2 . Show that a3 a4 b3 b4 (a) hA + B,Ci = hA, Ci + hB,Ci for all A, B, C ∈ U. (b) htA, Bi = thA, Bi for all A, B ∈ U and t ∈ R. (c) hA, Bi = hB,Ai for all A, B ∈ U. (d) There exists A ∈ U such that hA, Ai < 0. (e) However, if A ∈ V then hA, Ai ≥ 0, with equality iff A = 0.

Exercise 10.4. [ex-innerprod-shm] Put ∞ 00 V = {f ∈ C (R) | f + f = 0}. For f, g ∈ V put hf, gi(t) = f(t)g(t) + f 0(t)g0(t), so hf, gi ∈ C∞(R). (a) Prove that hf, gi is actually a constant. (b) Prove that if f ∈ V then f 0 ∈ V , so that differentiation gives a linear map D : V → V . (c) The functions sin and cos give a basis for V . Using this, show that h, i is an inner product on V . (d) What is the matrix of D with respect to the basis {sin, cos}?

11. The Cauchy-Schwartz inequality If v and w are vectors in R2 or R3, you should be familiar with the fact that hv, wi = kvk kwk cos(θ), where θ is the angle between v and w. In particular, as the cosine lies between −1 and 1, we see that |hv, wi| ≤ kvk kwk. We would like to extend all this to arbitrary inner-product spaces. Theorem 11.1 (The Cauchy-Schwartz inequality). [thm-cauchy-schwartz] Let V be an inner product space over R, and let v and w be elements of V . Then |hv, wi| ≤ kvk kwk, with equality iff v and w are linearly dependent. 53 Proof. If w = 0 then |hv, wi| = 0 = kvk kwk and v and w are linearly dependent, so the theorem holds. For the rest of the proof, we can thus restrict attention to the other case, where w 6= 0. For any real numbers s and t, we have 0 ≤ ksv + twk2 = hsv + tw, sv + twi = s2hv, vi + sthv, wi + sthw, vi + t2hw, wi = s2kvk2 + 2sthv, wi + t2kwk2. Now take s = hw, wi = kwk2 and t = −hv, wi. The above inequality gives 0 ≤ kwk4kvk2 − 2kwk2hv, wi2 + hv, wi2kwk2 = kwk2(kwk2kvk2 − hv, wi2). We have assumed that w 6= 0, so kwk2 > 0. We can thus divide by kwk2 and rearrange to see that hv, wi2 ≤ kvk2kwk2. It follows that |hv, wi| ≤ kvkkwk as claimed. If we have equality (i.e. |hv, wi| = kvkkwk) then our calculation shows that ksv +twk2 = 0, so sv +tw = 0. Here s = kwk2 > 0, so we have a nontrivial linear relation between v and w, so they are linearly dependent. Conversely, suppose we start by assuming that v and w are linearly dependent. As w 6= 0, this means that v = λw for some λ ∈ R. It follows that hv, wi = λkwk2, so |hv, wik = |λ|kwk2. On the other hand, we 2 have kvk = |λ|kwk, so kvk kwk = |λ|kwk , which is the same.  Example 11.2. [eg-cauchy-schwartz] We claim that for any vector x ∈ Rn, we have √ q 2 2 |x1 + ··· + xn| ≤ n x1 + ··· + xn. To see this, use the standard inner product on Rn, and consider the vector e = [1, 1,..., 1]T . We have q 2 2 kxk = x1 + ··· + xn √ kek = n

hx, ei = x1 + ··· + xn. The Cauchy-Schwartz inequality therefore tells us that

|x1 + ··· + xn| = |hx, ei| q √ 2 2 ≤ kxkkek = x1 + ··· + xn n, as claimed. Example 11.3. [eg-cauchy-schwartz-functions] We claim that for any continuous function f : [0, 1] → R we have Z 1 r q 2 8 R 1 2 (1 − x )f(x) dx ≤ 0 f(x) dx. 0 15 Indeed, we can define an inner product on C[0, 1] by hu, vi = R 1 u(x)v(x) dx. We then have kfk = q 0 R 1 2 0 f(x) dx and Z 1 k1 − x2k2 = h1 − x2, 1 − x2i = 1 − 2x2 + x4 dx 0  2 3 1 51 2 1 = x − 3 x + 5 x 0 = 1 − 3 + 5 = 8/15 k1 − x2k = p8/15 q q The Cauchy-Schwartz inequality tells us that |hu, fi| ≤ kuk kfk, so R 1(1 − x2)f(x) dx ≤ 8 R 1 f(x)2 dx. 0 15 0 as claimed. Example 11.4. [eg-cauchy-schwartz-matrices] Let A be a nonzero n × n matrix over R. We claim that (a) trace(A)2 ≤ n trace(AAT ), with equality iff A is a multiple of the identity. (b) trace(A2) ≤ trace(AAT ), with equality iff A is either symmetric or antisymmetric. 54 T In both cases we use the inner product hA, Bi = trace(AB ) on MnR and the Cauchy-Schwartz inequality. (a) Apply the inequality to A and I, giving |hA, Ii| ≤ kAkkIk, or equivalently hA, Ii2 ≤ kAk2kIk2 = trace(AAT ) trace(IIT ) Here hA, Ii = trace(A) and trace(IIT ) = trace(I) = n, so we get trace(A)2 ≤ n trace(AAT ) as claimed. This is an equality iff A and I are linearly dependent, which means that A is a multiple of I. (b) Now instead apply the inequality to A and AT , noting that kAk = kAT k = ptrace(AAT ) and hA, AT i = trace(AATT ) = trace(A2). The conclusion is that | trace(A2)| ≤ ptrace(AAT )ptrace(AAT ), which gives trace(A2) ≤ trace(AAT ). This is an equality iff AT is a multiple of A, say AT = λA for some λ. This means that A = ATT = λAT = λ2A, and A 6= 0, so this means that λ2 = 1, or equivalently λ = ±1. If λ = 1 then AT = A and A is symmetric; if λ = −1 then AT = −A and A is antisymmetric. It is now natural to ask whether we also have hv, wi = kvk kwk cos(θ) (where θ is the angle between v and w), just as we did in R3. However, the question is meaningless as it stands, because we do not yet have a definition of angles between elements of an arbitrary inner-product space. We will use the following definition, which makes the above equation true by tautology. Definition 11.5. [defn-angle] Let V be an inner product space over R, and let v and w be nonzero elements of V , so kvk kwk > 0. Put c = hv, wi/(kvk kwk). The Cauchy-Schwartz inequality tells us that −1 ≤ c ≤ 1, so there is a unique angle θ ∈ [0, π] such that cos(θ) = c. We call this the angle between v and w.

Example 11.6. [eg-angle] √ Take V = C[0, 1] (with the usual inner product),√ and v(t) = 1, and w(t) = t. Then kvk = 1 and kwk = 1/ 3 and hv, wi = 1/2, so hv, wi/(kvk kwk) = 3/2 = cos(π/6), so the angle between v and w is π/6. Example 11.7. [eg-angle-matrices] Take V = M3R (with the usual inner product) and h 0 1 0 i h 1 0 0 i A = 1 2 1 B = 1 1 1 0 1 0 0 0 0 We then have p √ √ kAk = 02 + 12 + 02 + 12 + 22 + 12 + 02 + 12 + 02 = 8 = 2 2 p √ kBk = 12 + 02 + 02 + 12 + 12 + 12 + 02 + 02 + 02 = 4 = 2 hA, Bi = 0.1 + 1.0 + 0.0 + 1.1 + 2.1 + 1.1 + 0.0 + 1.0 + 0.0 = 4 √ √ so hA, Bi/(kAkkBk) = 4/(4 2) = 1/ 2 = cos(π/4). The angle between A and B is thus π/4. Exercises Exercise 11.1. [ex-cauchy-i] Show that for any f ∈ C[−1, 1] we have Z 1 Z 1 1/2 p 2 2 1 − x2 f(x) dx ≤ √ f(x) dx −1 3 −1 Find a nonzero function f ∈ C[−1, 1] for which the above inequality is actually an equality.

Exercise 11.2. [ex-cauchy-ii] Show that for any f ∈ C[0, 1] we have Z 1 2 Z 1  Z 1  f(x)3 dx ≤ f(x)2 dx f(x)4 dx 0 0 0 For which functions f is this actually an equality?
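It can also be instructive to test the Cauchy-Schwartz inequality on random data. The following Python sketch does this for the standard inner product on R^6; the dimension and the number of trials are arbitrary choices.

import numpy as np
rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.normal(size=6)
    w = rng.normal(size=6)
    # |<v, w>| <= ||v|| ||w||, with a small tolerance for rounding error
    assert abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w) + 1e-12
print("|<v, w>| <= ||v|| ||w|| held in all five random trials")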

55 12. Projections and the Gram-Schmidt procedure Definition 12.1. [defn-complement] Let V be a vector space with inner product, and let W be a subspace. We then put W ⊥ = {v ∈ V | hv, wi = 0 for all w ∈ W }. This is called the orthogonal complement (or annihilator) of W . We say that W is complemented if W +W ⊥ = V . Lemma 12.2. [lem-complement] We always have W ∩ W ⊥ = 0. (Thus, if W is complemented, we have V = W ⊕ W ⊥.) Proof. Suppose that v ∈ W ∩ W ⊥. As v ∈ W ⊥, we have hv, wi = 0 for all w ∈ W . As v ∈ W , we can take 2 w = v, which gives kvk = hv, vi = 0. This implies that v = 0, as required.  Definition 12.3. [defn-orth-seq] Let V be a vector space with inner product. We say that a sequence V = v1, . . . , vn of elements of V is orthogonal if we have hvi, vji = 0 for all i 6= j. We say that the sequence is strictly orthogonal if it is orthogonal, and all the elements vi are nonzero. We say that the sequence is orthonormal if it is orthogonal, and also hvi, vii = 1 for all i. Remark 12.4. [rem-normalise] If V is a strictly orthogonal sequence then we can define an orthonormal sequencev ˆ1,..., vˆn byv ˆi = vi/kvik. Example 12.5. [eg-orthonormal] n The standard basis e1,..., en for R is an orthonormal sequence. Example 12.6. [eg-orth-physical] Let a, b and c be the vectors joining the centre of the earth to the North Pole, the mouth of the river Amazon, and the city of Mogadishu. These are elements of the inner product space U discussed in Examples 2.6 and 10.6. Then a, b, c is an orthogonal sequence, and a/4000, b/4000, c/4000 is an orthonormal sequence. (Of course, these statements are only approximations. You can take it as an internet exercise to work out the size of the errors involved.) Lemma 12.7. [lem-pythag] Let v1, . . . , vn be an orthogonal sequence, and put v = v1 + ··· + vn. Then p 2 2 kvk = kv1k + ··· + kvnk . Proof. We have 2 X X X kvk = h vi, vji = hvi, vji. i j i,j Because the sequence is orthogonal, all terms in the sum are zero except those for which i = j. We thus have 2 X X 2 kvk = hvi, vii = kvik . i i We can now take square roots to get the equation in the lemma.  Lemma 12.8. [lem-strict-ind] Any strictly orthogonal sequence is linearly independent.

Proof. Let V = v1, . . . , vn be a strictly orthogonal sequence, and suppose we have a linear relation λ1v1 + ··· + λnvn = 0. For each i it follows that

hvi, λ1v1 + ··· + λnvni = hvi, 0i = 0. The left hand side here is just

λ1hvi, v1i + λ2hvi, v2i + ··· + λnhvi, vni.

Moreover, the sequence V is orthogonal, so the inner products hvi, vji are zero unless j = i, so the only nonzero term on the left hand side is λihvi, vii, so we conclude that λihvi, vii = 0. Moreover, the sequence is strictly orthogonal, so vi ≠ 0, so hvi, vii > 0. It follows that we must have λi = 0, so our original linear relation was the trivial one. We conclude that V is linearly independent, as claimed. □

Proposition 12.9. [prop-seq-proj] Let V be a vector space with inner product, and let W be a subspace. Suppose that we have a strictly orthogonal sequence W = w1, . . . , wp that spans W , and we define

hv, w1i hv, wpi π(v) = w1 + ··· + wp hw1, w1i hwp, wpi (for all v ∈ V ). Then π(v) ∈ W and v − π(v) ∈ W ⊥, so v = π(v) + (v − π(v)) ∈ W + W ⊥. In particular, we have W + W ⊥ = V , so W is complemented. Remark 12.10. [rem-proj-normal] If the sequence W is orthonormal, then of course we have hwk, wki = 1 and the formula reduces to

π(v) = hv, w1iw1 + ... + hv, wpiwp.

Proof. First note that the coefficients λi = hv, wii/hwi, wii are just numbers, so the element π(v) = λ1w1 + ... + λpwp lies in the span of w1, . . . , wp, which is W . Next, we have

hwi, π(v)i = λ1hwi, w1i + ... + λihwi, wii + ... + λphwi, wpi.

As the sequence W is orthogonal, we have hwi, wji = 0 for j 6= i, so only the i’th term in the above sum is nonzero. This means that hv, wii hwi, π(v)i = λihwi, wii = hwi, wii = hv, wii = hwi, vi, hwi, wii so hwi, v − π(v)i = hwi, vi − hwi, π(v)i = 0. As this holds for all i, and the elements wi span W , we see that ⊥ hw, v − π(v)i = 0 for all w ∈ W , or in other words, that v − π(v) ∈ W , as claimed.  Corollary 12.11 (Parseval’s inequality). [cor-parseval] Let V be a vector space with inner product, and let W = w1, . . . , wp be an orthonormal sequence in V . Then for any v ∈ V we have p 2 X 2 kvk ≥ hv, wii . i=1 Moreover, this inequality is actually an equality iff v ∈ span(W). Pp Proof. Put W = span(W), and put π(v) = i=1hv, wiiwi as in Proposition 12.9. Put (v) = v −π(v), which lies in W ⊥. The sequence hv, w1iw1,..., hv, wpiwp, (v) is orthogonal, and the sum of the sequence is π(v) + (v) = v. Lemma 12.7 therefore tells us that 2 2 2 2 2 X 2 kvk = khv, w1iw1k + ··· + khv, wpiwpk + k(v)k = k(v)k + hv, wii . i 2 P 2 2 All terms here are nonnegative, so kvk ≥ ihv, wii , with equality iff k(v)k = 0. Moreover, we have 2 k(v)k = 0 iff (v) = 0 iff v = π(v) iff v ∈ W .  Proposition 12.12. [prop-closest] Let W and π be as in Proposition 12.9. Then π(v) is the point in W that is closest to v. Proof. Put x = v − π(v), so x ∈ W ⊥. The distance from v to π(v) is just kv − π(v)k = kxk. Now consider another point w ∈ W , with w 6= π(v). The distance from v to w is just kv − wk; we must show that this is larger than kxk. Put y = π(v) − w, and note that v − w = π(v) + x − w = x + y. Note also that y ∈ W (because π(v) ∈ W and w ∈ W ) and x ∈ W ⊥, so hx, yi = 0 = hy, xi. Finally, note that y 6= 0 and so kyk > 0. It follows that kv − wk2 = kx + yk2 = hx + y, x + yi = hx, xi + hx, yi + hy, xi + hy, yi = kxk2 + 0 + 0 + kyk2 > kxk2.

This shows that kv − wk > kxk = kv − π(v)k, so w is further from v than π(v) is.

[Figure: the points 0, v, π(v) and w, with x = v − π(v) orthogonal to W and y = π(v) − w lying in W , so that v − w = x + y.]
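The formulas of Propositions 12.9 and 12.12 are easy to experiment with numerically. In the Python sketch below, w1 and w2 are an arbitrary strictly orthogonal pair in R^3 (so W is the xy-plane) and v is an arbitrary vector; none of these are taken from the text.

import numpy as np
w1 = np.array([1.0, 1.0, 0.0])
w2 = np.array([1.0, -1.0, 0.0])
v = np.array([2.0, 3.0, 4.0])
# pi(v) = <v,w1>/<w1,w1> w1 + <v,w2>/<w2,w2> w2, as in Proposition 12.9
proj = (v @ w1) / (w1 @ w1) * w1 + (v @ w2) / (w2 @ w2) * w2
print(proj)                      # [2. 3. 0.]
print(v - proj)                  # [0. 0. 4.], orthogonal to both w1 and w2
w_other = 1.3 * w1 - 0.7 * w2    # some other point of W
print(np.linalg.norm(v - proj) < np.linalg.norm(v - w_other))   # True: pi(v) is the closest point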

 Theorem 12.13. [thm-gram-schmidt] Let V be a vector space with inner product, and let U = u1, . . . , un be a linearly independent list of elements of V . Then there is a strictly orthogonal sequence V = v1, . . . , vn such that span(v1, . . . , vi) = span(u1, . . . , ui) for all i.

Proof. The sequence V is generated by the Gram-Schmidt procedure, which we now describe. Put Ui = span(u1, . . . , ui). We will construct the elements vi by induction. For the initial step, we take v1 = u1, so (v1) is an orthogonal basis for U1. Suppose we have constructed an orthogonal basis v1, . . . , vi−1 for Ui−1. ⊥ Proposition 12.9 then tells us that Ui−1 is complemented, so V = Ui−1 + Ui−1. In particular, we can write ⊥ ui = vi + wi with vi ∈ Ui−1 and wi ∈ Ui−1. Explicitly, the formulae are

i−1 X hui, vji w = v i hv , v i j j=1 j j

vi = ui − wi.

⊥ As vi ∈ Ui−1 and v1, . . . , vi−1 ∈ Ui−1, we have hvi, vji = 0 for j < i, so (v1, . . . , vi) is an orthogonal sequence. Next, note that Ui = Ui−1 + Rui. As ui = vi + wi with wi ∈ Ui−1, we see that this is the same as Ui−1 + Rvi. By our induction hypothesis, we have Ui−1 = span(v1, . . . , vi−1), and it follows that Ui = Ui−1 + Rvi = span(v1, . . . , vi). This means that v1, . . . , vi is a spanning set of the i-dimensional space Ui, so it must be a basis.  Corollary 12.14. [cor-gram-schmidt] If V and U are as above, then there is an orthonormal sequence vˆ1,..., vˆn with span(ˆv1,..., vˆi) = span(u1, . . . , ui) for all i.

Proof. Just find a strictly orthogonal sequence v1, . . . , vn as in the Proposition, and putv ˆi = vi/kvik as in Remark 12.4.  Example 12.15. [eg-vector-gram] Consider the following elements of R5: " 1 # " 0 # " 0 # " 0 # 1 1 0 0 u1 = 0 u2 = 1 u3 = 1 u4 = 0 . 0 0 1 1 0 0 0 1

We apply the Gram-Schmidt procedure to get an orthogonal basis for the space U = span(u1, u2, u3, u4). T We have v1 = u1 = [ 1 1 0 0 0 ] , so hv1, v1i = 2 and hu2, v1i = 1. Next, we have

" 0 # " 1 #  −1/2  hu2, v1i 1 1 1 1/2 v2 = u2 − v1 = 1 − 0 =  1  . hv1, v1i 0 2 0 0 0 0 0 58 It follows that hv2, v2i = 3/2 and hu3, v2i = 1, whereas hu3, v1i = 0. It follows that

" 0 #  −1/2   1/3  hu3, v1i hu3, v2i 0 1 1/2 −1/3 v3 = u3 − v1 − v2 = 1 −  1  =  1/3  . hv , v i hv , v i 1 3/2 0 1 1 2 2 0 1 0 0

It now follows that hv3, v3i = 4/3 and hu4, v3i = 1, whereas hu4, v1i = hu4, v2i = 0. It follows that −1/4 " 0 #  1/3    hu4, v1i hu4, v2i hu4, v3i 0 1 −1/3 1/4 v4 = u4 − v1 − v2 − v3 = 0 −  1/3  =  −1/4  . hv , v i hv , v i hv , v i 1 4/3 1 1 2 2 3 3 1 1 1/4 0 1 In conclusion, we have −1/4 " 1 #  −1/2   1/3    1 1/2 −1/3 1/4 v1 = 0 v2 =  1  v3 =  1/3  v4 =  −1/4  . 0 0 0 1 1/4 0 0 1 Example 12.16. [eg-poly-gram] R 1 Consider the space V = R[x]≤2 with the inner product hp, qi = −1 p(x)q(x) dx. We will apply the Gram- 2 Schmidt procedure to the usual basis 1, x, x to get an orthonormal basis for V . We start with v1 = u1 = 1, R 1 R 1  2 1 and note that hv1, v1i = −1 1 dx = 2. We also have hx, v1i = −1 x dx = x /2 −1 = 0, so x is already orthogonal to v1. It follows that hx, v1i v2 = x − v1 = x, hv1, v1i R 1 2  3 1 and thus that hv2, v2i = −1 x dx = x /3 −1 = 2/3. We also have Z 1 2 2 hx , v1i = x dx = 2/3 −1 Z 1 2 3 hx , v2i = x dx = 0 −1 so 2 2 2 hx , v1i hx , v2i 2 2/3 2 v3 = x − v1 − v2 = x − 1 = x − 1/3. hv1, v1i hv2, v2i 2 We find that Z 1 Z 1 2 2 4 2 2 1  1 5 2 3 1 1 hv3, v3i = (x − 1/3) dx = x − 3 x + 9 dx = 5 x − 9 x + 9 x −1 = 8/45. −1 −1 The required orthonormal basis is thus given by √ vˆ1 = v1/kv1k = 1/ 2 p vˆ2 = v2/kv2k = 3/2x p 2 vˆ3 = v3/kv3k = 45/8(x − 1/3). Example 12.17. [eg-matrix-gram] h 1 1 1 i Consider the matrix P = 0 0 0 , and let V be the space of 3 × 3 symmetric matrices of trace zero. We will 0 0 0 find the matrix Q ∈ V that is closest to P . The general form of a matrix in V is h a b c i A = b d e . c e −a−d Thus, if we put h 1 0 0 i h 0 1 0 i h 0 0 1 i h 0 0 0 i h 0 0 0 i A1 = 0 0 0 A2 = 1 0 0 A3 = 0 0 0 A4 = 0 1 0 A5 = 0 0 1 , 0 0 −1 0 0 0 1 0 0 0 0 −1 0 1 0 we see that an arbitrary element A ∈ V can be written uniquely as aA1 + bA2 + cA3 + dA4 + eA5, so A1,...,A5 is a basis for V . It is not too far from being an orthonormal basis: we have hAi,Aii = 2 for all i, 59 and when i 6= j we have hAi,Aji = 0 except for the case hA1,A4i = 1. Thus, the Gram-Schmidt procedure works out as follows:

B1 = A1

hA2,B1i B2 = A2 − B1 = A2 hB1,B1i

hA3,B1i hA3,B2i B3 = A3 − B1 − B2 = A3 hB1,B1i hB2,B2i

hA4,B1i hA4,B2i hA4,B3i 1 B4 = A4 − B1 − B2 − B3 = A4 − B1 hB1,B1i hB2,B2i hB3,B3i 2 h 0 0 0 i 1 h 1 0 0 i  −1/2 0 0  = 0 1 0 − 0 0 0 = 0 1 0 0 0 −1 2 0 0 −1 0 0 −1/2

hA5,B1i hA5,B2i hA5,B3i hA5,B4i B5 = A5 − B1 − B2 − B3 − B4 = A5. hB1,B1i hB2,B2i hB3,B3i hB4,B4i p √ p √ We have kB4k = 3/2 and kBik = 2 for all other i. After noting that (1/2)/ 3/2 = 1/ 6, it follows that the following matrices give an orthonormal basis for V : 1 h 1 0 0 i 1 h 0 1 0 i 1 h 0 0 1 i 1 h −1 0 0 i 1 h 0 0 0 i Bb1 = √ 0 0 0 Bb2 = √ 1 0 0 Bb3 = √ 0 0 0 Bb4 = √ 0 2 0 Bb5 = √ 0 0 1 . 2 0 0 −1 2 0 0 0 2 1 0 0 6 0 0 −1 2 0 1 0 P5 −1 According to Proposition 12.12, the matrix Q is given by Q = i=1hP,BiihBi,Bii Bi. The relevant inner products are hP,B1i = hP,B2i = hP,B3i = 1 and hP,B4i = −1/2 and hP,B5i = 0. We also have hB1,B1i = hB2,B2i = hB3,B3i = 2 and hB4,B4i = 3/2, so 1 −1 2  2/3 1/2 1/2  Q = (B1 + B2 + B3) + B4 = 1/2 −1/3 0 2 2 3 1/2 0 −1/3 Exercises Exercise 12.1. [ex-project-to-symmetric] T Put V = {B ∈ M2R | B = B}, and let π : M2R → V be the orthogonal projection. Find an orthogonal  a b  basis for V , and use it to calculate π(A) for an arbitrary matrix A = c d . Use this to show that π(A) = (A + AT )/2.
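The Gram-Schmidt procedure of Theorem 12.13 translates directly into a few lines of code for the standard inner product on R^n. The Python sketch below re-runs it on the first three vectors of Example 12.15 and reproduces the v1, v2, v3 found there.

import numpy as np
def gram_schmidt(us):
    vs = []
    for u in us:
        # w_i = projection of u_i onto span(v_1, ..., v_{i-1}); v_i = u_i - w_i
        w = sum(((u @ v) / (v @ v)) * v for v in vs)
        vs.append(u - w)
    return vs

us = [np.array([1.0, 1.0, 0.0, 0.0, 0.0]),     # u1, u2, u3 of Example 12.15
      np.array([0.0, 1.0, 1.0, 0.0, 0.0]),
      np.array([0.0, 0.0, 1.0, 1.0, 0.0])]
for v in gram_schmidt(us):
    print(v)   # [1 1 0 0 0], [-1/2 1/2 1 0 0], [1/3 -1/3 1/3 1 0]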

Exercise 12.2. [ex-parseval-unnormalised] Let W = w1, . . . , wp be a strictly orthogonal sequence in an inner product space V , and let v be an element of V . Show that p 2 X hv, wii kvk2 ≥ . hw , w i i=1 i i (Note that Parseval’s inequality covers the case where the sequence is orthonormal, so kwik = 1. You can prove the above statement either by modifying the proof of Parseval’s inequality, or by applying Parseval’s inequality to a different sequence.)

Exercise 12.3. [ex-parseval-i] Use the result in Exercise 12.2 to show that for any continuous function f ∈ C[−1, 1] we have 2 2 R 1 2 R 1  R 1  2 −1 f(x) dx ≥ −1 f(x) dx + 3 −1 x f(x) dx .

Exercise 12.4. [ex-orthonormal-matrices] T Consider the space V = M4R with the usual inner product hA, Bi = trace(AB ). Consider the following sequence in V : " 0 0 0 0 # " 0 0 0 0 # " 0 1 1 0 # " 1 1 1 1 # 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 1 A1 = 0 0 1 0 A2 = 0 1 1 0 A3 = 1 1 1 1 A4 = 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 60 Find an orthonormal sequence C1,...,C4 in V such that span{A1,...,Ai} = span{C1,...,Ci} for all i. (You can use the Gram-Schmidt procedure for this but it is easier to find an answer by inspection.)

Exercise 12.5. [ex-orthonormal-vectors] Consider the following vectors in R5: " 1 # " 1 # " 1 # " 1 # " 1 # 1 1 1 1 0 u1 = 1 u2 = 1 u3 = 1 u4 = 0 u5 = 0 1 1 0 0 0 1 0 0 0 0 Find an orthonormal sequence vb1,..., vb5 such that span{vb1,..., vbi} = span{u1, . . . , ui} for all i.

Exercise 12.6. [ex-orthonormal-quad] Define an inner product on R[t]≤2 by ∞ Z 2 √ hf, gi = f(t)g(t)e−t dt/ π. −∞ 2 Apply the Gram-Schmidt procedure to the basis {1, t, t } to get a basis for R[t]≤2 that is orthonormal with respect to this inner product. You may assume that ∞ 1 Z 2 htn, tmi = √ tn+me−t dt π −∞ ( 1 (n+m)! if n + m is even = 2n+m ((n+m)/2)! 0 if n + m is odd (and you should remember that 0! = 1).

Exercise 12.7. For x ∈ R4, put (x − x )2 (x − x )2 (x + x + x + x )2 α(x) = x2 + x2 + x2 + x2 − 1 2 − 3 4 − 1 2 3 4 1 2 3 4 2 2 4 4 (a) By finding a suitable orthonormal sequence v1, v2, v3, show that α(x) ≥ 0 for all x ∈ R . (b) Find a fourth vector v4 such that v1, v2, v3, v4 is orthonormal. (c) Expand out and simplify α(x). How is the answer related to (b)?

13. Hermitian forms We now briefly discuss the analogue of inner products for complex vector spaces. Given a complex number z = x + iy, we write z for the complex conjugate, which is x − iy. Definition 13.1. [defn-hermitian] Let V be a vector space over C.A Hermitian form on V is a rule that gives a number hu, vi ∈ C for each u, v ∈ V , with the following properties: (a) hu + v, wi = hu, wi + hv, wi for all u, v, w ∈ V . (b) htu, vi = thu, vi for all u, v ∈ V and t ∈ C. (c) hu, vi = hv, ui for all u, v ∈ V . In particular, by taking v = u we see that hu, ui = hu, ui, so hu, ui is real. (d) For all u ∈ V we have hu, ui ≥ 0 (which is meaningful because hu, ui ∈ R), and hu, ui = 0 iff u = 0. Note that (b) and (c) together imply that hu, tvi = thu, vi. Given an inner product, we will write kuk = phu, ui, and call this the norm of u. We say that u is a unit vector if kuk = 1. 61 Example 13.2. [eg-hermitian-Cn] We can define a Hermitian form on Cn by

hu, vi = u1\overline{v1} + ··· + un\overline{vn}.

This gives
kuk^2 = hu, ui = |u1|^2 + ··· + |un|^2.

Definition 13.3. [defn-matrix-dagger] For any n × m matrix A over C, we let A† be the complex conjugate of the transpose of A, so for example

[ 1+i  2+i  3+i ; 4+i  5+i  6+i ]† = [ 1−i  4−i ; 2−i  5−i ; 3−i  6−i ].

The above Hermitian form on C^n can then be rewritten as hu, vi = v†u = \overline{u†v}.

Example 13.4. [eg-hermitian-Cx] We can define a Hermitian form on C[t] by
hf, gi = ∫_0^1 f(t)\overline{g(t)} dt.
This gives
kfk^2 = hf, fi = ∫_0^1 |f(t)|^2 dt.

Example 13.5. [eg-hermitian-matrix] We can define a Hermitian form on MnC by hA, Bi = trace(B†A). If we identify MnC with C^{n^2} in the usual way, then this is just the same as the Hermitian form in Example 13.2.

Our earlier results about inner products are mostly also true for Hermitian forms, but they need to be adjusted slightly by putting complex conjugates or absolute value signs in appropriate places. We will not go through the proofs, but we will at least record some of the statements.

Theorem 13.6 (The Cauchy-Schwartz inequality). [thm-cauchy-schwartz-hermitian] Let V be a vector space over C with a Hermitian form, and let v and w be elements of V . Then |hv, wi| ≤ kvk kwk,

with equality iff v and w are linearly dependent over C. □

Lemma 13.7. [lem-pythag-hermitian] Let V be a vector space over C with a Hermitian form, let v1, . . . , vn be an orthogonal sequence in V , and put v = v1 + ··· + vn. Then
kvk = √(kv1k^2 + ··· + kvnk^2). □

Proposition 13.8. [prop-parseval-complex] Let V be a vector space over C with a Hermitian form, and let W = w1, . . . , wp be an orthonormal sequence in V . Then for any v ∈ V we have
kvk^2 ≥ Σ_{i=1}^{p} |hv, wii|^2.

Moreover, this inequality is actually an equality iff v ∈ span(W).  62 14. Adjoints of linear maps Definition 14.1. [defn-adjoint] Let V and W be real vector spaces with inner products (or complex vector spaces with Hermitian forms). Let φ: V → W and ψ : W → V be linear maps (over R or C as appropriate). We say that φ is adjoint to ψ if we have hφ(v), wi = hv, ψ(w)i for all v ∈ V and w ∈ W . This is essentially a basis-free formulation of the operation of transposing a matrix, as we see from the following example. Example 14.2. [eg-adjoint-matrix] m n Let A be an n × m matrix over R, giving a linear map φA : R → R by φA(v) = Av. The transpose of A n m is then an m × n matrix, giving a linear map φAT : R → R . We claim that φAT is adjoint to φA. This is easy to see using the formula hx, yi = xT y as in Remark 10.4. Indeed, we have T T T T hφA(u), vi = hAu, vi = (Au) v = u A v = hu,A vi = hu, φAT (v)i, as required. Example 14.3. [eg-adjoint-hermitian] m n † Let A be an n×m matrix over C, giving a linear map φA : C → C by φA(v) = Av. Let A be the complex T conjugate of A . Then φA† is adjoint to φA. Example 14.4. [eg-diff-antihermitian] 2 Let V be the set of functions of the form p(x)e−x /2, where p(x) is a polynomial. We use the inner product Z ∞ hf, gi = f(x)g(x) dx, −∞ 2 as in Example 10.10. If we have a function f(x) = p(x)e−x /2 in V , we note that

f'(x) = p'(x)e^{−x^2/2} + p(x)·(−x)·e^{−x^2/2} = (p'(x) − x p(x))e^{−x^2/2},
and p'(x) − x p(x) is again a polynomial, so f'(x) ∈ V . We can thus define a linear map D : V → V by D(f) = f'. We claim that D is adjoint to −D. This is equivalent to the statement that for all f and g in V , we have hD(f), gi + hf, D(g)i = 0. This is true because
hf', gi + hf, g'i = ∫_{−∞}^{∞} (f'(x)g(x) + f(x)g'(x)) dx = ∫_{−∞}^{∞} (d/dx)(f(x)g(x)) dx = [f(x)g(x)]_{−∞}^{∞} = lim_{x→+∞} f(x)g(x) − lim_{x→−∞} f(x)g(x).

2 Both limits here are zero, because the very rapid decrease of e−x wipes out the much slower increase of the polynomial terms. Example 14.5. [eg-adjoint-misc] R 1 2 Consider the vector spaces R[x]≤2 (with inner product hf, gi = 0 f(x)g(x) dx) and R (with the usual inner 2 2 product). Define maps φ: R[x]≤2 → R and ψ : R → R[x]≤2 by h f(0) i φ(f) = f(1) p 2 ψ [ q ] = (30p + 30q)x − (36p + 24q)x + (9p + 3q). 2 We claim that φ is adjoint to ψ. To check this, consider a quadratic polynomial f(x) = ax +bx+c ∈ R[x]≤2 p 2 c and a vector v = [ q ] ∈ R . Note that f(0) = c and f(1) = a + b + c, so φ(f) = [ a+b+c ]. We must show that hf, ψ(v)i = hφ(f), vi, or in other words that Z 1 (ax2 + bx + c)((30p + 30q)x2 − (36p + 24q)x + (9p + 3q)) dx = p f(0) + q f(1) = pc + q(a + b + c). 0 63 This is a straightforward calculation, which can be done by hand or using Maple: entering expand( int( (a*x^2+b*x+c)* ((30*p+30*q)*x^2 - (36*p+24*q)*x + (9*p+3*q)), x=0..1 ) );

gives cp + aq + bq + cq, as required. Proposition 14.6. [prop-adjoint-exists] Let V and W be finite-dimensional real vector spaces with inner products (or complex vector spaces with Hermitian forms). Let φ: V → W be a linear maps (over R or C as appropriate). Then there is a unique map ψ : W → V that is adjoint to φ. (We write ψ = φ∗ in the real case, or ψ = φ† in the complex case.) Proof. We will prove the complex case; the real case is similar but slightly easier. We first show that there is at most one adjoint. Suppose that ψ and ψ0 are both adjoint to φ, so hv, ψ(w)i = hφ(v), wi = hv, ψ0(w)i for all v ∈ V and w ∈ W . This means that hv, ψ(w) − ψ0(w)i = 0 for all v and w. In particular, we can take v = ψ(w) − ψ0(w), and we find that kψ(w) − ψ0(w)k2 = hψ(w) − ψ0(w), ψ(w) − ψ0(w)i = 0, so ψ(w) = ψ0(w) for all w, so ψ = ψ0. To show that there exists an adjoint, choose an orthonormal basis V = v1, . . . , vn for V , and define a linear map ψ : W → V by n X ψ(w) = hw, φ(vj)ivj. j=1 Recall that hx, λyi = λhx, yi, and that hx, yi = hy, xi. Using these rules we find that X hvi, ψ(w)i = hvi, hw, φ(vj)ivji j X = hw, φ(vj)ihvi, vji j X = hφ(vj), wihvi, vji j

= hφ(vi), wi.

(For the last equality, recall that V is orthonormal, so hvi, vji = 0 except when j = i, so only the i’th term 2 in the sum is nonzero. The i’th term simplifies to hφ(vi), wi, because hvi, vii = kvik = 1.) P More generally, any element v ∈ V can be written as i xivi for some x1, . . . , xn ∈ C, and then we have X hv, ψ(w)i = xihvi, ψ(w)i i X = xihφ(vi), wi i ! X = hφ xivi , wi i = hφ(v), wi. This shows that ψ is adjoint to φ, as required.  Exercises 64 Exercise 14.1. [ex-adjoint-i] x 3 h i x y  a b  Consider the map φ: → M2 given by φ y = [ y z ]. Given a matrix A = , find a vector R R z c d T 3 ∗ 3 w = [p, q, r] such that hφ(v),Ai = hv, wi for all vectors v ∈ R . (The adjoint map φ : M2R → R is then given by φ∗(A) = w.)

Exercise 14.2. [ex-adjoint-ii] Consider the map φ: M3R → M3R given by

φ([ a1  a2  a3 ; a4  a5  a6 ; a7  a8  a9 ]) = [ 0  a4  a7 ; 0  0  a8 ; 0  0  0 ].
Give a formula for the adjoint map φ∗: M3R → M3R.

Exercise 14.3. [ex-adjoint-iii] Consider the map φ: M2R → M2R given by φ(A) = QAQ, where Q = [ 1 1 ; 1 1 ]. Show that φ∗ = φ.

Exercise 14.4. [ex-adjoint-iv] Define φ: MnR → MnR by φ(A) = A − (1/n) trace(A)I. Show that φ∗ = φ.

Exercise 14.5. [ex-adjoint-v] R 1/2 In this exercise we give the space R[x]≤2 the inner product hf, gi = −1/2 f(x)g(x) dx. Define χ: R[x]≤2 → R 00 2 by χ(f) = f (0). If f(x) = ax + bx + c, what is χ(f)? Find an element u ∈ R[x]≤2 such that χ(f) = hf, ui for all f, and thus give a formula for χ∗.

Exercise 14.6. Let T2 be the usual space of trigonometric polynomials. We can define ∆: T2 → T2 by ∆(f) = f 00. P2 (a) Find ∆(f), where f = n=−2 anen. (b) Show that ∆ is self-adjoint. (This can be deduced from part (a), or you can prove it more directly.) (c) Find the eigenvalues of ∆ (there are three of them). (d) What are the dimensions of the corresponding eigenspaces?

Exercise 14.7. [ex-isometric-embedding] Let U and V be vector spaces with inner products, and let φ: U → V be a linear map with the property that ∗ φ (φ(u)) = u for all u ∈ U. Let U = u1, . . . , un be an orthonormal sequence in U. Show that φ(u1), . . . , φ(un) is an orthonormal sequence in V .

Exercise 14.8. [ex-contraction] Let U and V be vector spaces with inner products, and let φ: U → V be a linear map with the property ∗ ∗ that φ(φ (v)) = v for all v ∈ V . Let u be an element of U, and put u1 = φ φ(u) and u2 = u − u1.

(a) Show that φ(u1) = φ(u) and φ(u2) = 0. (b) Show that hu1, u2i = 0. 2 2 (c) Deduce that kuk ≥ ku1k 2 2 (d) Show that ku1k = kφ(u)k . (e) Deduce that kφ(u)k ≤ kuk.

15. Fourier theory
You will already have studied Fourier series in the Advanced Calculus course. Here we revisit these ideas from a more abstract point of view, in terms of angles and distances in a Hermitian space of periodic functions.

Definition 15.1. [defn-P] We say that a function f : R −→ C is periodic if f(t + 2π) = f(t) for all t ∈ R. We let P be the set of all continuous periodic functions from R to C, which is a vector space over C. We define an inner product on P by
hf, gi = (1/(2π)) ∫_0^{2π} f(t)\overline{g(t)} dt.

Some important elements of P are the functions en, sn and cn defined as follows:

en(t) = exp(int) (for n ∈ Z)

sn(t) = sin(nt) (for n > 0)

cn(t) = cos(nt) (for n ≥ 0) . De Moivre’s theorem tells us that

en = cn + i sn

sn = (en − e−n)/(2i)

cn = (en + e−n)/2. Definition 15.2. [defn-trig-poly] We put Tn = span({ek | − n ≤ k ≤ n}) ≤ P,

and note that Tn ≤ Tn+1 for all n. We also let T denote the span of all the ek’s, or equivalently, the union of all the sets Tn. The elements of T are the functions f : R −→ C that can be written in the form n n X X f(t) = akek(t) = ak exp(ikt) k=−n k=−n

for some n > 0 and some coefficients a−n, . . . , an ∈ C. Functions of this form are called trigonometric polynomials or finite Fourier series. Proposition 15.3. [prop-Tn-basis] The sequence e−n, e−n+1, . . . , en−1, en is an orthonormal basis for Tn (so dim(Tn) = 2n + 1). Proof. For m 6= k we have Z 2π Z 2π −1 −1 hek, emi = (2π) ek(t)em(t) dt = (2π) exp(ikt) exp(−imt) dt 0 0 Z 2π exp(i(k − m)t)2π = (2π)−1 exp(i(k − m)t) dt = (2π)−1 0 i(k − m) 0 1   = e2(k−m)πi − 1 . 2(k − m)πi 2(k−m)πi As k − m is an integer, we have e = 1 and so hek, emi = 0. This shows that the sequence of ek’s is orthogonal. We also have Z 2π Z 2π −1 −1 hek, eki = (2π) ek(t)ek(t) dt = (2π) exp(2kπit) exp(−2kπit) dt 0 0 Z 2π = (2π)−1 1 dt = 1. 0

Our sequence is therefore orthonormal, and so linearly independent. It also spans Tn (by the definition of Tn), so it is a basis. □

Definition 15.4. [defn-pi-n] For any f ∈ P , let πn(f) be the orthogonal projection of f in Tn, so
πn(f) = Σ_{m=−n}^{n} hf, emiem.
We also put εn(f) = f − πn(f), so f = πn(f) + εn(f), with πn(f) ∈ Tn and εn(f) ∈ Tn^⊥ (by Proposition 12.9).

Proposition 15.5. [prop-trig-basis] The sequence Cn = c0, c1, . . . , cn, s1, . . . , sn is another orthogonal basis for Tn. It is not orthonormal, but instead satisfies kskk^2 = 1/2 = kckk^2 for k > 0, and kc0k^2 = 1.

Proof. We use the identities

sm = (em − e−m)/(2i)

cm = (em + e−m)/2.

If k 6= m (with k, m ≥ 0) we see that ek and e−k are orthogonal to em and e−m. It follows that

hsm, ski = hsm, cki = hcm, ski = hcm, cki = 0.

Now suppose that m > 0, so cm and sm are both in the claimed basis. We have hem, e−mi = 0, and so 1 1 1 hsm, cmi = 4i hem−e−m, em+e−mi = 4i (hem, emi+hem, e−mi+−he−m, emi−he−m, e−mi) = 4i (1+0−0−1) = 0.

This shows that Cn is an orthogonal sequence. For k > 0 we have hs , s i = 1 1 he − e , e − e i k k 2i 2i k −k k −k 1 = 4 (1 − 0 − 0 + 1) = 1/2.

Similarly, we have hck, cki = 1/2. In the special case k = 0 we instead have c0(t) = 1 for all t, so hc0, c0i = (2π)^{−1} ∫_0^{2π} 1 dt = 1. □

Corollary 15.6. [cor-trig-formula] Using Proposition 12.9, we deduce that
πn(f) = hf, c0ic0 + 2 Σ_{k=1}^{n} hf, ckick + 2 Σ_{k=1}^{n} hf, skisk.

Theorem 15.7. [thm-convergence] For any f ∈ P we have kεn(f)k → 0 as n → ∞.

Proof. See Appendix A in the online version of the notes. (The proof is not examinable and will not be covered in lectures.) □

Remark 15.8. [rem-convergence] Recall that πn(f) is the closest point to f lying in Tn, so the number kεn(f)k = kf − πn(f)k can be regarded as the distance from f to Tn. The theorem says that by taking n to be sufficiently large, we can make this distance as small as we like. In other words, f can be very well approximated by a trigonometric polynomial of sufficiently high degree.

Corollary 15.9. [cor-convergence] For any f ∈ P we have
kfk^2 = Σ_{k=−∞}^{∞} |hf, eki|^2 = |hf, c0i|^2 + 2 Σ_{k=1}^{∞} |hf, cki|^2 + 2 Σ_{k=1}^{∞} |hf, ski|^2.

Proof. As e−n, . . . , en is an orthonormal basis for Tn, we have
kfk^2 − kεn(f)k^2 = kπn(f)k^2 = kΣ_{k=−n}^{n} hf, ekiekk^2 = Σ_{k=−n}^{n} |hf, eki|^2.
By taking limits as n tends to infinity, we see that kfk^2 = Σ_{k=−∞}^{∞} |hf, eki|^2. Similarly, using Corollary 15.6 and Proposition 15.5, we see that
kπn(f)k^2 = |hf, c0i|^2 kc0k^2 + Σ_{k=1}^{n} 4|hf, cki|^2 kckk^2 + Σ_{k=1}^{n} 4|hf, ski|^2 kskk^2
= |hf, c0i|^2 + 2 Σ_{k=1}^{n} |hf, cki|^2 + 2 Σ_{k=1}^{n} |hf, ski|^2.
We can again let n tend to infinity to see that
kfk^2 = |hf, c0i|^2 + 2 Σ_{k=1}^{∞} |hf, cki|^2 + 2 Σ_{k=1}^{∞} |hf, ski|^2. □
Exercises

Exercise 15.1. Suppose that f ∈ T2 satisfies f(0) = f(π) and f(−π/2) = f(π/2). Show that f(t+π) = f(t) for all t.

0 Exercise 15.2. Put U = {f ∈ T2 | f(0) = f (0) = 0}. The aim of this exercise is to find an orthonormal basis for U ⊥. P2 0 (a) If f = n=−2 anen, find f(0) and f (0) in terms of the numbers an, and so find the general form of an element of U. (b) Using this, find a basis for U. ⊥ ⊥ (c) Using this and the fact that T2 = U ⊕ U , find the dimensions of U and U . 0 (d) Using part (a) again, find elements v0, v1 ∈ T2 such that hf, v0i = f(0) and hf, v1i = f (0) for all f ∈ T2. ⊥ (e) Show that v0, v1 is an orthogonal basis for U , and thus find an orthonormal basis.
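The coefficients hf, eki and the content of Corollary 15.9 can be explored numerically. The Python sketch below approximates the integrals by Riemann sums for the sample function f(t) = |sin t|, an arbitrary choice made for illustration.

import numpy as np
t = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
dt = t[1] - t[0]
f = np.abs(np.sin(t))
def coeff(k):
    # <f, e_k> = (1/2π) ∫ f(t) exp(-ikt) dt, approximated by a Riemann sum
    return np.sum(f * np.exp(-1j * k * t)) * dt / (2.0 * np.pi)
norm_sq = np.sum(f ** 2) * dt / (2.0 * np.pi)              # ||f||^2
partial = sum(abs(coeff(k)) ** 2 for k in range(-10, 11))  # sum of |<f, e_k>|^2 for |k| <= 10
print(norm_sq, partial)   # both close to 1/2; the small gap is the tail of the series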

16. Diagonalisation of self-adjoint operators [dec-diagonal]

Definition 16.1. [defn-self-adjoint] Let V be a finite-dimensional vector space over C.A self-adjoint operator on V is a linear map α: V → V such that α† = α. Theorem 16.2. [thm-real-spectrum] If α: V → V is a self-adjoint operator, then every eigenvalue of α is real. Proof. First suppose that λ is an eigenvalue of α, so there exists a nonzero vector v ∈ V with α(v) = λv. We then have λhv, vi = hλv, vi = hα(v), vi = hv, α†(v)i = hv, α(v)i = hv, λvi = λhv, vi. As v 6= 0 we have hv, vi > 0, so we can divide by this to see that λ = λ, which means that λ is real.  Theorem 16.3. [thm-self-adjoint] If α: V → V is a self-adjoint operator, then one can choose an orthonormal basis V = v1, . . . , vn for V such that each vi is an eigenvector of α. The following lemma will be useful in the proof. Lemma 16.4. [lem-self-adjoint] Let α: V → V be a self-adjoint operator, and let W ≤ V be a subspace such that α(W ) ≤ W (ie α(w) ∈ W for all w ∈ W ). Then α(W ⊥) ≤ W ⊥. 68 Proof. Suppose that v ∈ W ⊥; we must show that α(v) is also in W ⊥. To see this, consider w ∈ W , and note that hα(v), wi = hv, α†(w)i = hv, α(w)i (by the definition of adjoints and the fact that α† = α). As α(W ) ≤ W we see that α(w) ∈ W , so hv, α(w)i = 0 (because v ∈ W ⊥). We conclude that hα(v), wi = 0 for ⊥ all w ∈ W , so α(v) ∈ W as claimed.  Proof of Theorem 16.3. Put n = dim(V ); the proof is by induction on n. If n = 1 then we choose any unit vector v1 ∈ V and note that V = Cv1. This means that α(v1) = λ1v1 for some λ1 ∈ C, so v1 is an eigenvector, and this proves the theorem in the case n = 1. Now suppose that n > 1. The characteristic polynomial of α is then a polynomial of degree n over C, so it must have at least one root (by the fundamental theorem of algebra), say λ1. We know that the roots of the characteristic polynomial are precisely the eigenvalues, so λ1 is an eigenvalue, so we can find a nonzero vector u1 ∈ V with α(u1) = λ1u1. We then put v1 = u1/ku1k, so kv1k = 1 and v1 is still an eigenvector of 0 ⊥ 0 0 eigenvalue λ1, which implies that α(Cv1) ≤ Cv1. Now put V = (Cv1) . The lemma tells us that α(V ) ≤ V , so we can regard α as a self-adjoint operator on V 0. Moreover, dim(V 0) = n − 1, so our induction hypothesis 0 applies. This means that there is an orthonormal basis for V (say v2, v3, . . . , vn) consisting of eigenvectors for α. It follows that v1, v2, . . . , vn is an orthonormal basis for V consisting of eigenvectors for α.  Exercises

Exercise 16.1. Let T2 be the usual space of trigonometric polynomials. We can define ∆: T2 → T2 by ∆(f) = f 00. P2 (a) Find ∆(f), where f = n=−2 anen. (b) Show that ∆ is self-adjoint. (This can be deduced from part (a), or you can prove it more directly.) (c) Find the eigenvalues of ∆ (there are three of them). (d) What are the dimensions of the corresponding eigenspaces?

h x i  x+2y+z  Exercise 16.2. Define α: C3 → C3 by α y = 2x+9y+2z . z x+2y+z (a) What is the matrix of α with respect to the standard basis of C3? (b) Show that α is self-adjoint. (c) Find an orthonormal basis of C3 consisting of eigenvectors of α.
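For real symmetric matrices (which are self-adjoint for the standard inner product), an orthonormal basis of eigenvectors as promised by Theorems 16.2 and 16.3 can be computed numerically; in Python this is numpy.linalg.eigh. The matrix in the sketch below is an arbitrary choice, unrelated to the exercise above.

import numpy as np
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
evals, Q = np.linalg.eigh(A)                    # columns of Q are the eigenvectors
print(evals)                                    # all real, as in Theorem 16.2
print(np.allclose(Q.T @ Q, np.eye(3)))          # True: the eigenvector basis is orthonormal
print(np.allclose(A @ Q, Q @ np.diag(evals)))   # True: each column of Q is an eigenvector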

Exercise 16.3. Define α: C5 → C5 by T T α([z0, z1, z2, z3, z4] ) = [z1, z2, z3, z4, z0] . (a) Find α†, and show that α† = α−1. (Of course this is a special property of this particular map. For most linear maps, the adjoint is unrelated to the inverse.) (b) Find the eigenvalues of α. (Hint: it is easier to think directly about when α(z) = λ z, rather than trying to calculate the characteristic polynomial. You will need to consider separately the cases where λ5 = 1 and where λ5 6= 1.)

2 00 Exercise 16.4. Define a map α: R[x]≤2 → R[x]≤2 by α(f) = (3x − 1)f . You may assume that if we use R 1 the inner product hf, gi = −1 f(x)g(x) dx on R[x]≤2, then α is self-adjoint. (a) Show that α(α(f)) = 6α(f) for all f. (b) Deduce that if f is a nonzero eigenvector of α with eigenvalue λ, then α(α(f)) = λ2f and λ2 = 6λ. (c) Find an orthogonal basis for R[x]≤2 consisting of eigenvectors for α.

2 Exercise 16.5. Let V be the space of functions f : R → C of the form f(x) = p(x)e−x /2 for some p ∈ C[x]. R ∞ Give this the Hermitian form hf, gi = −∞ f(x)g(x) dx. Define φ: V → V by φ(f)(x) = x f(x). (a) Show that φ is self-adjoint. 69 (b) Show that if φ(f) = λf for some λ ∈ C, then f = 0. (Thus, φ has no eigenvalues. This is only possible because V is infinite-dimensional.)

0 1 Exercise 16.6. Let T be the matrix , and define γ : M → M by γ(A) = TA − AT . 1 0 2R 2R

(a) Give a basis for M2R, and find the matrix of γ with respect to that basis. (b) Find bases for the kernel and the image of γ. Show that the image is the orthogonal complement of T the kernel with respect to the usual inner product hX,Y i = trace(XY ) on M2R. (c) Show that γ4 = 4γ2. (d) Find a basis of M2R consisting of eigenvectors for γ. (Note here that an eigenvector for γ is a matrix A such that γ(A) = TA − AT = λA for some λ.)

Appendix A. Fejér's Theorem [apx-fejer]

In this appendix, we will outline a proof of Theorem 15.7: for any f ∈ P , we have kεn(f)k → 0 as n → ∞. For f ∈ P and n > 0, we put

θn(f) = (π0(f) + ··· + πn−1(f))/n.

δn(f) = f − θn(f).

Theorem A.1 (Fejér's Theorem). [thm-fejer] For any f ∈ P , we have

max{|δn(f)(x)| | x ∈ R} → 0 as n → ∞.

We will sketch the proof of this shortly. First, however, we explain why Theorem A.1 implies Theorem 15.7. Suppose we have g ∈ P, and put m = max{|g(x)| | x ∈ R}, so for all x we have 0 ≤ |g(x)|² ≤ m². Then
‖g‖² = (2π)^{−1} ∫_0^{2π} |g(x)|² dx ≤ (2π)^{−1} ∫_0^{2π} m² dx = m²,
so ‖g‖ ≤ m. Taking g = δn(f), we see that

0 ≤ ‖δn(f)‖ ≤ max{|δn(f)(x)| | x ∈ R}.

Using Fejér's Theorem and the Sandwich Lemma, we deduce that ‖δn(f)‖ → 0 as n → ∞. We now need to relate δn to εn. Note that for k ≤ n we have πk(f) ∈ Tk ≤ Tn. It follows that θn(f) ∈ Tn. We also know (from Proposition 12.12) that πn(f) is the closest point to f in Tn. In other words, for any g ∈ Tn, we have ‖f − πn(f)‖ ≤ ‖f − g‖. In particular, we can take g = θn(f) to see that ‖f − πn(f)‖ ≤ ‖f − θn(f)‖, or in other words ‖εn(f)‖ ≤ ‖δn(f)‖. As ‖δn(f)‖ → 0, the Sandwich Lemma again tells us that ‖εn(f)‖ → 0, proving Theorem 15.7. The proof of Fejér's Theorem depends on the properties of certain functions dn(t) and kn(t) (called the Dirichlet kernel and the Fejér kernel) which are defined as follows:

dn(t) = Σ_{j=−n}^{n} ej(t)

kn(t) = (Σ_{m=0}^{n−1} dm(t))/n.

Proposition A.2. [prop-kernel] If g = πn(f) and h = θn(f), then
g(t) = (2π)^{−1} ∫_{s=−π}^{π} f(t + s) dn(−s) ds
h(t) = (2π)^{−1} ∫_{s=−π}^{π} f(t + s) kn(−s) ds.

Lemma A.3. [lem-periodic-mean] If u is a periodic function and a ∈ R, then ∫_a^{a+2π} u(t) dt = ∫_0^{2π} u(t) dt.

Proof. Let 2mπ be the largest multiple of 2π that is less than or equal to a. Assuming that u is real and positive, we have a picture like this:

[Figure: the graph of the periodic function u, showing the regions A and B above the interval [0, 2π] and the matching regions B′ and A′ above the interval [a, a + 2π], with the points a − 2mπ, 0, 2π, 2mπ, a, 2(m+1)π and a + 2π marked on the axis.]

We have ∫_0^{2π} u(t) dt = area(A) + area(B) and ∫_a^{a+2π} u(t) dt = area(B′) + area(A′), but clearly area(A) = area(A′) and area(B) = area(B′), and the claim follows. Much the same argument works even if u is not real and positive, but one needs equations rather than pictures. □

Proof of Proposition A.2. Consider the integral I = (2π)^{−1} ∫_{−π}^{π} f(t + s) dn(−s) ds. We put x = t + s, so s = x − t and ds = dx and the endpoints s = ±π become x = t ± π, so I = (2π)^{−1} ∫_{t−π}^{t+π} f(x) dn(t − x) dx. Next, we can use the lemma to convert this to I = (2π)^{−1} ∫_0^{2π} f(x) dn(t − x) dx. We then note that

ek(t − x) = exp(ik(t − x)) = exp(ikt) \overline{exp(ikx)} = ek(t) \overline{ek(x)},
so
dn(t − x) = Σ_{k=−n}^{n} ek(t − x) = Σ_{k=−n}^{n} ek(t) \overline{ek(x)},
so
I = Σ_{k=−n}^{n} (2π)^{−1} ∫_0^{2π} f(x) \overline{ek(x)} ek(t) dx = Σ_{k=−n}^{n} ⟨f, ek⟩ ek(t)

= πn(f)(t) = g(t),
as claimed. Next, we recall that kn(t) = (Σ_{m=0}^{n−1} dm(t))/n, so
(2π)^{−1} ∫_{−π}^{π} f(t + s) kn(−s) ds = (1/n) Σ_{j=0}^{n−1} (2π)^{−1} ∫_{−π}^{π} f(t + s) dj(−s) ds = (1/n) Σ_{j=0}^{n−1} πj(f)(t) = θn(f)(t) = h(t),

as required. □

Corollary A.4. [cor-fejer-normal] ∫_{−π}^{π} kn(s) ds = 2π.

Proof. We take f = e0 in Proposition A.2. We have πk(e0) = e0 for all k, and so h = θn(e0) = (e0 + ··· + e0)/n = e0. Thus, the proposition tells us that
e0(t) = (2π)^{−1} ∫_{s=−π}^{π} e0(t + s) kn(−s) ds.

However, e0(x) = 1 for all x, so this simplifies to
1 = (2π)^{−1} ∫_{−π}^{π} kn(−s) ds.

Moreover, we have ej(−s) = e−j(s), and it follows from this that dj(−s) = dj(s) and kn(−s) = kn(s). It therefore follows that
∫_{−π}^{π} kn(s) ds = 2π,
as claimed. □

Proposition A.5. [prop-fejer-kernel]

kn(t) = (1/n) (sin(nt/2)/sin(t/2))².

Proof. We will focus on the case n = 6; the general case is the same, but needs more complicated notation. Put z = e^{it}, so ej(t) = z^j. Put

pn = (1 + z + z² + ··· + z^{n−1})(1 + z^{−1} + z^{−2} + ··· + z^{1−n}).
We can expand this out and write the terms in an n × n square array, which looks like this in the case n = 6:

1 z z2 z3 z4 z5

z−1 1 z z2 z3 z4

z−2 z−1 1 z z2 z3

z−3 z−2 z−1 1 z z2

z−4 z−3 z−2 z−1 1 z

z−5 z−4 z−3 z−2 z−1 1

We have divided the square into L-shaped blocks. The sum of the terms in the third block (for example) is

z^{−2} + z^{−1} + 1 + z + z² = e−2(t) + e−1(t) + e0(t) + e1(t) + e2(t) = d2(t).

More generally, the sums of the terms in the six different L-shaped blocks are d0(t), d1(t), . . . , d5(t). Adding these together, we see that

pn(t) = d0(t) + d1(t) + ··· + dn−1(t) = n kn(t).
Now put w = e^{it/2}, so z = w² and sin(t/2) = (w − w^{−1})/(2i) and sin(nt/2) = (w^n − w^{−n})/(2i). On the other hand, we have the geometric progression formula
1 + z + ··· + z^{n−1} = (z^n − 1)/(z − 1) = (w^{2n} − 1)/(w² − 1) = (w^n/w) · ((w^n − w^{−n})/(2i))/((w − w^{−1})/(2i)) = w^{n−1} sin(nt/2)/sin(t/2).
Similarly, we have
1 + z^{−1} + ··· + z^{1−n} = w^{1−n} sin(nt/2)/sin(t/2).
If we multiply these two equations together, we get
pn(t) = (sin(nt/2)/sin(t/2))².
Dividing by n gives
kn(t) = (1/n) (sin(nt/2)/sin(t/2))²,
as claimed. □
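As a quick numerical sanity check (not part of the course notes; a sketch assuming Python with numpy is available, and assuming the convention ej(t) = e^{ijt} used above), the following compares the two descriptions of kn — the average of the Dirichlet kernels and the closed form of Proposition A.5 — and also checks the normalisation from Corollary A.4.

import numpy as np

def fejer_average(n, t):
    # k_n(t) as the average of the Dirichlet kernels d_0, ..., d_{n-1},
    # where d_m(t) = sum_{j=-m}^{m} exp(i*j*t).
    d = [sum(np.exp(1j * j * t) for j in range(-m, m + 1)) for m in range(n)]
    return (sum(d) / n).real

def fejer_closed_form(n, t):
    # k_n(t) = (1/n) * (sin(n*t/2) / sin(t/2))**2, as in Proposition A.5.
    return (np.sin(n * t / 2) / np.sin(t / 2)) ** 2 / n

n = 10
t = np.linspace(-np.pi, np.pi, 2000)   # this grid happens not to contain t = 0,
                                       # where the closed form needs the limiting value n

print(np.max(np.abs(fejer_average(n, t) - fejer_closed_form(n, t))))   # round-off sized
dt = t[1] - t[0]
print(fejer_closed_form(n, t).sum() * dt)   # approximately 2*pi, as in Corollary A.4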

It is now easy to plot kn(s). For n = 10, the picture is as follows:

[Figure: the graph of k10(s) for −π ≤ s ≤ π.]

There is a narrow spike (of width approximately 4π/n) near s = 0, and kn(s) is small on the remainder of the interval [−π, π]. Now think what happens when we evaluate the integral
h(t) = (2π)^{−1} ∫_{−π}^{π} f(t + s) kn(−s) ds.

When s is very small, f(t + s) is close to f(t). If s is not very small, then kn(−s) is tiny and we do not get much of a contribution to the integral anyway. Thus, it will not make much difference if we replace f(t + s) by f(t). This gives
h(t) ≈ (2π)^{−1} ∫_{−π}^{π} f(t) kn(−s) ds = f(t) · (2π)^{−1} ∫_{−π}^{π} kn(−s) ds = f(t)
(where we have used Corollary A.4). All this can be made more precise to give an explicit upper bound for the quantity max{|δn(f)(t)| | t ∈ R} = max{|f(t) − h(t)| | t ∈ R}, which can be used to prove Theorem A.1.

Appendix B. Solutions [apx-solutions]

Exercise 2.1: Of course there are many different correct answers to this question. The following will do:  1   1   2   10  1 −1 0 −10 (a) u = 1 , v = 1 , u + v = 2 , 10v = 10 . 1 −1 0 −10 1 2 3 6 5 4 7 7 7 60 50 40 (b) u = [ 4 5 6 ], v = [ 3 2 1 ], u + v = [ 7 7 7 ], 10v = [ 30 20 10 ]. (c) u = 1 + x, v = x + x2, u + v = 1 + 2x + x2, 10v = 10x + 10x2. (d) u is the vector pointing 10 miles east, v is the vector pointing 20 miles west, u + v points ten miles west, 10v points 200 miles west.

Exercise 2.2: 1 2 1 2  −1 −2  (a) This is not a vector space because [ 3 4 ] ∈ V0 but (−1). [ 3 4 ] = −3 −4 6∈ V0, which contradicts axiom (b) of Predefinition 2.1. (b) This is not a vector space because the zero matrix does not lie in V1. 1  1  1  1  2 (c) This is not a vector space because [ 1 ] ∈ V2 and −1 ∈ V2 but [ 1 ] + −1 = [ 0 ] 6∈ V2, which contradicts axiom (a). (d) To see that this is not a vector space, consider the polynomials p(x) = x and q(x) = 1 − x. Then p(0)p(1) = 0 = q(0)q(1), so p ∈ V3 and q ∈ V3. However, p(x) + q(x) = 1 for all x, so p + q 6∈ V3. This contradicts axiom (a).

Exercise 3.1: The maps φ0 and φ3 are linear. Indeed, we have

 x h x0 i h x+x0 i h (x+x0)+(y+y0) i φ0 [ y ] + y0 = φ0 y+y0 = (x+x0)−(y+y0) h x+y+x0+y0 i  x+y  h x0+y0 i = x−y+x0−y0 = x−y + x0−y0

x h x0 i = φ0 [ y ] + φ0 y0 x  tx   tx+ty   x+y  x φ0 (t [ y ]) = φ0 ty = tx−ty = t x−y = tφ0 [ y ] 0 00 0 0 00 00 φ3(f + g) = (f + g)(0) + (f + g) (1) + (f + g) (2) = f(0) + g(0) + f (1) + g (1) + f (2) + g (2) 0 00 0 00 = (f(0) + f (1) + f (2)) + (g(0) + g (1) + g (2)) = φ3(f) + φ3(g) 0 00 0 00 φ3(tf) = (tf)(0) + (tf) (1) + (tf) (2) = t(f(0) + f (1) + f (2)) = tφ3(f). For the others: (b) Consider the vectors h 1 i h 0 i h 0 i e1 = 0 e2 = 1 e3 = 0 . 0 0 1

Then φ1(e1) = φ1(e2) = φ1(e3) = 0. However, we have h 1 i φ1(e1 + e2 + e3) = φ1 1 = 1, 1 so φ1(e1 + e2 + e3) 6= φ1(e1) + φ(e2) + φ(e3),

so φ1 is not linear. (c) We have φ2(I) = 1 and φ2((−1).I) = 1, so φ2((−1).I) 6= (−1).φ2(I), so φ2 is not linear. (e) Consider the polynomials p(x) = x and q(x) = 1 − x. Then φ4(p) = 0 × 1 = 0 and φ4(q) = 1 × 0 = 0, but φ4(p + q) = 1 × 1 = 1, so φ4(p + q) 6= φ4(p) + φ4(q), so φ4 is not linear.

Exercise 3.2: Of course there are many different correct answers for this question. The following will do: 74  w  x  w+x  (a) φ y = y+z z  a b c  d e f a (b) φ = [ i ] g h i  a b c  (c) φ d e f = ax2 + bx + c g h i h f(0) 0 i (d) φ(f) = 0 f(1) .

Exercise 3.3: No, because χ(0) = tn 6= 0, for example.

Exercise 3.4: No. The eigenvalues of λI are all equal to λ, so ρ(λI) = |λ|, whereas if ρ were linear we would have to have ρ(λI) = λρ(I) = λ. Alternatively, we have ρ(I) = ρ(−I) = 1 but ρ(0) = 0, so ρ(I + (−I)) 6= ρ(I) + ρ(−I).

Exercise 4.1: (a) U ∩ V is the set of vectors [w, x, y, z]T satisfying the three equations w − x + y − z = 0 w + x + y = 0 x + y + z = 0. Subtracting the last two equations gives w = z. Putting this back into the first equation gives x = y. The middle equation now gives w = −2x, so [w, x, y, z] = [−2x, x, x, −2x]. Thus T T U ∩ V = {[−2x, x, x, −2x] | x ∈ R} = span([−2, 1, 1, −2] ). (b) U ∩W is the set of vectors of the form [u, u+v, u+2v, u+3v]T for which u−(u+v)+(u+2v)−(u+3v) = 0, which reduces to −2v = 0, or equivalently v = 0. Thus T T U ∩ W = {[u, u, u, u] | u ∈ R} = span([1, 1, 1, 1] ) (c) V ∩ W is the set of vectors of the form [u, u + v, u + 2v, u + 3v]T for which u + (u + v) + (u + 2v) = 0 = (u + v) + (u + 2v) + (u + 3v), or in other words 3u + 3v = 0 = 3u + 6v. These equations easily imply that u = v = 0, and this means that V ∩ W = 0.

Exercise 4.2: If you choose two planes at random, their intersection will be a line (unless the two planes happened to be the same, which is unlikely). If you intersect this with a third randomly chosen plane, then you will just get the origin (barring unlikely coincidences). The special feature of P , Q and R is that P ∩ Q ∩ R is not just the origin, but a line. Specifically, we have nh t i o P ∩ Q ∩ R = −2t | t ∈ . t R

Exercise 4.3: The characteristic polynomial is

h t −a −b i a −c a t −c t −c   a t det(tI − A) = det = t det [ c t ] + a det b t − b det [ b c ] b c t = t(t2 + c2) + a(at + bc) − b(ac − bt) = t3 + c2t + a2t + abc − abc + b2t = t3 + (a2 + b2 + c2)t. √ The eigenvalues of A are the roots of this polynomial. These are 0 (which is real) and ±i a2 + b2 + c2 (which are purely imaginary). 75 Exercise 4.4: The matrix can be row-reduced as follows: " 1 1 1 1 # " 1 1 1 1 # " 1 1 0 2 # " 1 1 0 2 # " 1 1 0 0 # " 1 0 0 0 # " 1 0 0 0 # 1 1 −1 −1 1 0 0 −2 −2 2 0 0 0 −4 3 0 0 0 1 4 0 0 0 1 5 0 0 0 1 6 0 1 0 0 1 −1 0 0 −→ 0 −2 −1 1 −→ 0 −2 0 0 −→ 0 1 0 0 −→ 0 1 0 0 −→ 0 1 0 0 −→ 0 0 1 0 . 0 0 1 −1 0 0 1 −1 0 0 1 −1 0 0 1 −1 0 0 1 0 0 0 1 0 0 0 0 1 (In step 1 we subtract row 1 from rows 2 and 3; in step 2 we subtract suitable multiples of row 4 from rows 1, 2, and 3; in step 3 we divide rows 2 and 3 by −4 and −2 respectively; in steps 4 and 5 we clear the 4th and 2nd columns; in step 6 we reorder the rows.) As the final matrix is the identity, we see that A is invertible and has rank 4.

Exercise 4.5: T 0 0 0 0 T (0) The set U0 is a subspace. Indeed, it certainly contains the zero vector. If [w, x, y, z] and [w , x , y , z ] 0 0 0 0 lie in U0, then w + x = 0 and w + x = 0, so (w + w ) + (x + x ) = 0, so the vector [w, x, y, z]T + [w0, x0, y0, z0]T = [w + w0, x + x0, y + y0, z + z0]T

also lies in U0, so U0 is closed under addition. If we also have t ∈ R then tw + tx = t(w + x) = 0, so T [tw, tx, ty, tz] ∈ U0, so U0 is closed under scalar multiplication, and so is a subspace. (1) The set U1 is not a subspace, because it does not contain the zero vector. T 0 0 0 0 T (2) The set U2 is a subspace. Indeed, it certainly contains the zero vector. If [w, x, y, z] and [w , x , y , z ] 0 0 0 0 lie in U2, then w + 2x + 3y + 4z = 0 and w + 2x + 3y + 4z = 0, so (w + w0) + 2(x + x0) + 3(y + y0) + 4(z + z0) = (w + 2x + 3y + 4z) + (w0 + 2x0 + 3y0 + 4z0) = 0 + 0 = 0, so the vector [w, x, y, z]T + [w0, x0, y0, z0]T = [w + w0, x + x0, y + y0, z + z0]T

also lies in U2, so U2 is closed under addition. If we also have t ∈ R then tw + 2tx + 3ty + 4tz = T t(w + 2x + 3y + 4z) = 0, so [tw, tx, ty, tz] ∈ U2, so U2 is closed under scalar multiplication, and so is a subspace. T 2 3 4 T (3) The vector [1, 0, −1, 0] lies in U3, because 1+0 +(−1) +0 = 0. However, the vector 2.[1, 0, −1, 0] = T 2 3 4 [2, 0, −2, 0] does not lie in U3, because 2 + 0 + (−2) + 0 = −6 6= 0. This shows that U3 is not closed under scalar multiplication, so it is not a subspace. (4) As w and x are real numbers, we have w2, x2 ≥ 0, so the only way we can have w2 + x2 = 0 is if w = x = 0. Thus T 4 T U4 = {[w, x, y, z] ∈ R | w = x = 0} = {[0, 0, y, z] | y, z ∈ R}. This is clearly a subspace of R4.

Exercise 4.6: The set U1 is not a subspace, because the zero function is not in U1. The set U2 is not a subspace either. Indeed, the constant function f(t) = 1 is an element of U2, but (−1).f is not an element of U2, so U2 is not closed under scalar multiplication. The set U4 is also not a subspace. To see this, consider the functions f(x) = x(x − 2) and g(x) = (x − 1)(x − 2) and h(x) = f(x) + g(x) = (2x − 1)(x − 2). Then f(0)f(1) = 0 = f(2)f(3) g(0)g(1) = 0 = g(2)g(3) h(0)h(1) = −2 6= h(2)h(3) = 0.

Thus f, g ∈ U4 but f + g 6∈ U4, so U4 is not a subspace. However, U0 and U3 are subspaces of F .

Exercise 4.7: Of course there are many different correct answers for this question. The following will do: 2 (a) W = {p ∈ R[x]≤2 | p(0) = 0} = {ax + bx | a, b ∈ R}. h 1 i  0 a b  (b) W = {A ∈ M2,3 | A 0 = 0} = { | a, b, c, d ∈ } R 0 0 c d R (c) W = {[x, y, z]T ∈ V | z = 0} = {[x, −x, 0]T | x ∈ R} 76 Exercise 4.8: Of course there are many different correct answers for this question. The following will do: (a) V = {[w, x, 0, 0]T | w, x ∈ R} and W = {[0, 0, y, z]T | w, x ∈ R}. w x  0 0  (b) V = {[ 0 0 ] | w, x ∈ R} and W = { y z | y, z ∈ R}. (c) V = {[s, −s, 0]T | s ∈ R} and W = {[0, t, −t]T | t ∈ R}.

Exercise 4.9: Consider a polynomial f ∈ U, say f(x) = ax2 + bx + c. We have f(0) = c, so f ∈ V iff c = 0 iff f(x) = ax2 + bx for some a, b ∈ R. On the other hand, we have f(1) + f(−1) = (a + b + c) + (a − b + c) = 2(a + c), so f ∈ W iff c = −a, so f(x) = a(x2 − 1) + bx for some a, b ∈ R. Thus f ∈ V ∩ W iff c = 0 and also c = −a, which means that a = c = 0, so f(x) = bx for some b. This shows that V ∩ W is as claimed. On the other hand, given an arbitrary quadratic polynomial f(x) = ax2+bx+c we can put g(x) = (a+c)x2 and h(x) = bx−c(x2 −1). We then have g(0) = 0 and h(1)+h(−1) = 0, so g ∈ V and h ∈ W , and f = g +h. This shows that V + W = U. Aside: How did we find this g and h? We need g to be an element of V , so g must have the form g(x) = px2 + qx for some p, q. We also need h to be an element of W , so h must have the form h(x) = r(x2 − 1) + sx for some r, s. Finally, we need f = g + h, which means that ax2 + bx + c = (px2 + qx) + (r(x2 − 1) + sx) = (p + r)x2 + (q + s)x − r. By comparing coefficients we see that a = p + r and b = q + s and c = −r, so r = −c and p = a − r = a + c and s = b − q with q arbitrary. We can choose to take q = 0, giving s = b and so g(x) = px2 + qx = (a + c)x2 and h(x) = r(x2 − 1) + sx = −c(x2 − 1) + bx as before.

u Exercise 4.10: Firstly, it is clear that α [ v ] can only be zero if u = v = 0, so ker(α) = 0, so α is injective.  u −u  1  u −u  1 0 Next, if A ∈ image(α) then A = v −v for some u and v, so A [ 1 ] = v −v [ 1 ] = [ 0 ]. Conversely, suppose 1 0 u s we have a matrix A with A [ 1 ] = [ 0 ]. We can write A = [ t v ] for some u, s, t and v, and we must have 0 u s 1  u+s  [ 0 ] = [ t v ][ 1 ] = t+v .  u −u  u This means that s = −u and t = −v, so A = −v v = α [ v ], so A ∈ image(α). This proves that 1 image(α) = {A | A [ 1 ] = 0}, as claimed.

Exercise 4.11: a + d h i (a) φ  a b  =  a b  − [ 1 0 ] = (a−d)/2 b c d c d 2 0 1 c (d−a)/2 1 (b) We have trace(I) = 2, so φ(I) = I − 2 .2I = 0, so aI ∈ ker(φ) for all a. On the other hand, if 1 1 A ∈ ker(φ) then A − 2 trace(A)I = 0, so A = 2 trace(A)I, which is a multiple of I. Alternatively, we  a b  a 0 can write A as c d , and then if φ(A) = 0 then part (a) gives a − d = b = c = 0, so A = [ 0 a ] = aI. Either way, this completes the proof that ker(φ) = {aI | a ∈ R}.  a b  (c) For any matrix A = c d , we have h i a − d d − a trace(φ(A)) = trace (a−d)/2 b = + = 0. c (d−a)/2 2 2 Thus, image(φ) ⊆ {B ∈ M2R | trace(B) = 0}. Conversely, suppose we have a matrix B with trace(B) = 0; we must show that B ∈ image(φ). We 1 thus need to find a matrix A such that φ(A) = B. In fact we have φ(B) = B− 2 trace(B)I = B−0.I = B, so we can just take A = B. This completes the proof that image(φ) = {B | trace(B) = 0}. Aside: If we had not noticed that φ(B) = B, what would we have done? We would have  p q   a b  B = r −p for some p, q, r, and we would need to find a matrix A = c d with φ(A) = B. Using part (a) we see that this reduces to the equations (a − d)/2 = p and b = q and c = r and 77 (d − a)/2 = −p. These can be solved to give a = 2p + d and b = q and c = r with d arbitrary. We  2p q   p q  could take d = 0, giving A = r 0 , or we could take d = −p giving A = r −p = B as before.

Exercise 4.12:  3 2 0    [ax /3 + bx /2 + cx]−1 a/3 − b/2 + c 3 2 1 (a) φ(f) = [ax /3 + bx /2 + cx]−1 =  2a/3 + 2c  3 2 1 [ax /3 + bx /2 + cx]0 a/3 + b/2 + c (b) We have f ∈ ker(φ) iff φ(f) = 0 iff a/3 − b/2 + c = 0 and 2a/3 + 2c = 0 and a/3 + b/2 + c = 0. By subtracting the first and third equations we see that this implies b = 0, and the second equation gives a = −3c, so f(x) = ax2+bx+c = −3cx2+c = c(1−3x2). It follows that ker(φ) = {c(1−3x2) | c ∈ R}. (c) We need h 1 i  −p/2+q  1 = φ(px + q) = 2q , 0 p/2+q so −p/2 + q = 1 2q = 1 p/2 + q = 0.

1 These equations have the unique solution p = −1 and q = 1/2, so g+(x) = 2 − x. 1 h 0 i (d) We now have g−(x) = + x, and using the formula in (a) we see that φ(g−) = 1 as required. 2 1 (e) Now put T 3 T W = {[u, v, w] ∈ R | v = u + w} = {[u, u + w, w] | u, w ∈ R}. The claim is that W = image(φ). Firstly, for any f we certainly have Z 1 Z 0 Z 1 f(x) dx = f(x) dx + f(x) dx, −1 −1 0 so φ(f) ∈ W , which proves that image(φ) ⊆ W . On the other hand, given a vector x = [u, u + w, w]T ∈ W , we note that h 1 i h 0 i h u i φ(ug+ + wg−) = uφ(g+) + wφ(g−) = u 1 + w 1 = u+w = x, 0 1 w so x ∈ image(φ). This shows that W ⊆ image(φ), so W = image(φ).

Exercise 4.13: h x i (a) This map is injective but not surjective, and so is not an isomorphism. Indeed, if y = 0, then x x clearly [ y ] = 0. In other words, if φ(u) = 0, then u = 0, which means that ker(φ) = 0, which means h 1 i h x i that φ is injective. On the other hand, the vector e1 = 0 is clearly not of the form y , so it does 0 x not lie in the image of φ, so φ is not surjective. h 1 i (b) This map is surjective but not injective, and so is not an isomorphism. Indeed, the vector u = 1 1 has φ(u) = 0, so u ∈ ker(φ), so ker(φ) 6= 0, so φ is not injective. On the other hand, for any vector p 2 v = [ q ] ∈ R we see that h p+q i h (p+q)−q i p φ q = = [ q ] = v, 0 q−0 so v ∈ image(φ). As this works for any v ∈ R2, we deduce that image(φ) = R2, so φ is surjective. h x i p Aside: How did we find this? We need to find a vector u = y with φ(u) = v = [ q ]. This z reduces to the equations x − y = p and y − z = q, giving x = p + q + z and y = q + z with z arbitrary. We could take z = 0, giving u = [p + q, q, 0]T as before. Alternatively, we could take z = −q giving u = [p, 0, −q]T . 78 (c) If f(x) = ax2 + bx + c then f(x) = ax2 + bx + c f(0) = c f 0(x) = 2ax + b f 0(0) = b f 00(x) = 2a f 00(0) = 2a, so φ(ax2 + bx + c) = [c, b, 2a]T . 3 T 1 2 Now define ψ : R → R[x]≤2 by ψ([p, q, r] ) = 2 rx + qx + p. We find that ψ(φ(f)) = f (for all T T f ∈ R[x]≤2), and φ(ψ([p, q, r] )) = [p, q, r] . This means that ψ is an inverse for φ, so φ is an isomorphism (and so is both injective and surjective). (d) This is neither injective nor surjective, and so is not an isomorphism. It is not injective because  1  1 0 φ −1 = 0, which means that φ has nonzero kernel. Moreover, the matrix I = [ 0 1 ] does not have  0 x+y  the form x+y 0 for any x and y, so I 6∈ image(φ), so φ is not surjective. (e) This is surjective but not injective, and so is not an isomorphism. It is not injective, because if we put g(x) = x then Z 1 Z 1  1 21 1 2 2 g(x) dx = x dx = 2 x −1 = 2 (1 − (−1) ) = 0. −1 −1 This shows that g is a nonzero element of ker(φ), so φ cannot be injective. On the other hand, given any t ∈ R we can let h(x) be the constant function with value t/2, and we find that φ(h) = R 1 −1 h(x) dx = 2.(t/2) = t, showing that t ∈ image(φ). This works for any t, so image(φ) = R, so φ is surjective.

s Exercise 4.14: Consider a vector u in L ∩ M. We must have u = [ 2s ] for some s (because u ∈ L) and 2t u = [ t ] for some t (because u ∈ M). This means that s = 2t and t = 2s. If we substitute the second 0 of these equations in the first we get s = 4s, so 3s = 0, so s = 0, so t = 0 and u = [ 0 ]. This shows that L ∩ M = 0. x 2 Now consider an arbitrary vector u = [ y ] ∈ R . We want to show that this lies in L + M, so we must find vectors v ∈ L and w ∈ M such that u = v + w. In other words, we must find numbers s and t such x s 2t that [ y ] = [ 2s ] + [ t ], so x = s + 2t y = 2s + t. These equations have the (unique) solution t = (2x − y)/3 s = (2y − x)/3 so we can put s h (2y−x)/3 i 2t h (4x−2y)/3 i v = [ 2s ] = (4y−2x)/3 w = [ t ] = (2x−y)/3 . We then have v ∈ L and w ∈ M and u = v + w, so u ∈ L + M. This works for any vector u ∈ R2, so R2 = L + M as claimed.

Exercise 4.15: If A ∈ V ∩ W then A = AT (because A ∈ V ) and also AT = −A (because A ∈ W ) so A = −A. We now add A to both sides to get 2A = 0, and divide by 2 to get A = 0. This shows that T T V ∩ W = 0. Now consider an arbitrary matrix A ∈ U. Put A+ = (A + A )/2 and A− = (A − A )/2. Then A+ + A− = A. Moreover, we have T T TT T T A+ = (A + A )/2 = (A + A)/2 = (A + A )/2 = A+ T T TT T T A− = (A − A )/2 = (A − A)/2 = −(A − A )/2 = −A− 79 which shows that A+ ∈ V and A− ∈ W . As A = A+ + A− with A+ ∈ V and A− ∈ W , we have A ∈ V + W . This works for any A ∈ U, so U = V + W as claimed.

Exercise 5.1: (a) These are linearly dependent, and they do not span. Indeed, any list of four vectors in R3 is always dependent. Explicitly, we have u1 − u2 − u3 + u4 = 0, which gives a direct proof of dependence. Also, all the vectors ui have zero as the second entry, so the same will be true for any vector in the T span of the vectors ui. In particular, the vector [0, 1, 0] does not lie in that span, so the ui’s do not span all of R3. This means that they do not form a basis. (b) Any list of four vectors in R3 is automatically linearly dependent (and so cannot form a basis). More specifically, the relation v1 +v2 +v3 −2v4 = 0 shows that the vi’s are dependent. These vectors span 3 h x i 3 all of , because any vector a = y ∈ can be expressed as a = −zv1 −yv2 −xv3 +(x+y +z)v4. R z R (c) A list of two vectors can only be linearly dependent if one is a multiple of the other, which is clearly not the case here, so w1 and w2 are linearly independent. Moreover, a list of two vectors can never 3 h 0 i span all of . More explicitly, we claim that the vector e1 = 1 cannot be expressed as a linear R 0 combination of w1 and w2. Indeed, if we have λ1w1 + λ2w2 = e1 then   h 0 i h 1 i h 4 i λ1+4λ2 1 = λ1 2 + λ2 5 = 2λ1+5λ2 , 0 3 6 3λ1+6λ2 so

λ1 + 4λ2 = 0 2λ1 + 5λ2 = 1 3λ1 + 6λ2 = 0.

The first and third of these easily give λ1 = λ2 = 0, which is incompatible with the second equation, 3 so there is no solution. This shows that w1 and w2 do not form a basis of R . 3 (d) The vectors x1, x2 and x3 are linearly independent and span R , so they form a basis. One way h 1 0 0 i to see this is to write down the matrix A = 1 2 0 whose columns are x1, x2 and x3, and observe 1 2 4 that it row-reduces almost instantly to the identity. Alternatively, we must show that for any vector h x i a = y ∈ 3, there are unique real numbers λ, µ, ν such that z R h x i h 1 i h 0 i h 0 i y = λ 1 + µ 2 + ν 0 . z 1 2 4 This equation is equivalent to λ = x and λ + 2µ = y and λ + 2µ + 4ν = z. It is easy to see that there is indeed a unique solution, namely λ = x and µ = (y − x)/2 and ν = (z − y)/4.

Exercise 5.2: (a) These vectors are linearly independent. Indeed, we have T λ1u1 + λ2u2 + λ3u3 = [λ1, 2λ2, 3λ3, 2λ2, λ1] ,

and this can only be zero if λ1 = λ2 = λ3 = 0. Thus, the only linear relation between u1, u2 and u3 is the trivial one, as required. (b) These are linearly dependent, because of the nontrivial relation 4v1 − 2v2 − v3 = 0. (c) These are linearly independent. Indeed, suppose we have a relation λ1w1 + λ2w2 + λ3w3 = 0. This means that h 1 i h 4 i h 1 i h 0 i λ1 1 + λ2 5 + λ3 1 = 0 , 2 7 1 0 so

λ1 + 4λ2 + λ3 = 0

λ1 + 5λ2 + λ3 = 0

2λ1 + 7λ2 + λ3 = 0.

Subtracting the first two equations gives λ2 = 0. Given this, we can subtract the last two equations to get λ1 = 0. Feeding this back into the first equation gives λ3 = 0. Thus, the only linear relation between w1, w2 and w3 is the trivial one, as required. This can also be done by matrix methods. Let A be the matrix whose columns are w1, w2 and w3, so
A =
[ 1 4 1 ]
[ 1 5 1 ]
[ 2 7 1 ]
Then det(A) = −1 ≠ 0, and if we row-reduce either A or A^T then we get the identity. Any of these facts implies that w1, w2 and w3 are linearly independent, as you should remember from SOM201.
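The same conclusion can also be checked numerically (an illustrative sketch only, not part of the solution, assuming Python with numpy is available):

import numpy as np

A = np.array([[1, 4, 1],
              [1, 5, 1],
              [2, 7, 1]])
print(np.linalg.det(A))           # approximately -1, in particular nonzero
print(np.linalg.matrix_rank(A))   # 3, so the columns w1, w2, w3 are independent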

Exercise 5.3: Suppose we have a linear relation λf + µg + νh = 0. Note that the symbol 0 on the right hand side means the zero function, which takes the value 0 for all x. In particular, it takes the value 0 at x = a, so we have λf(a) + µg(a) + νh(a) = 0. As f(a) = 1 and g(a) = h(a) = 0, this simplifies to λ = 0. Similarly, we have λf(b) + µg(b) + νh(b) = 0 λf(c) + µg(c) + νh(c) = 0, and these simplify to give µ = ν = 0. Thus, the only linear relation between f, g and h is the trivial one, so they are linearly independent.

Exercise 5.4: We have fk′(x) = k e^{kx} and fk″(x) = k² e^{kx}, so
W(f1, f2, f3)(x) = det
[ e^x   e^{2x}    e^{3x}  ]
[ e^x   2e^{2x}   3e^{3x} ]
[ e^x   4e^{2x}   9e^{3x} ]
= e^x e^{2x} e^{3x} det
[ 1 1 1 ]
[ 1 2 3 ]
[ 1 4 9 ]
= e^{6x} ( det[ 2 3 ; 4 9 ] − det[ 1 3 ; 1 9 ] + det[ 1 2 ; 1 4 ] ) = e^{6x}(6 − 6 + 2) = 2e^{6x}.

This is not the zero function, so f1, f2 and f3 are linearly independent.
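If you prefer to check the Wronskian by computer algebra, the following short sketch (an illustration only, assuming Python with sympy installed) reproduces the value 2e^{6x}:

import sympy as sp

x = sp.symbols('x')
fs = [sp.exp(x), sp.exp(2 * x), sp.exp(3 * x)]

# Wronskian matrix: successive rows hold the functions, their first derivatives,
# and their second derivatives.
M = sp.Matrix([fs,
               [f.diff(x) for f in fs],
               [f.diff(x, 2) for f in fs]])
print(sp.simplify(M.det()))   # 2*exp(6*x)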

Exercise 5.5: The Wronskian matrix is  xn xn+1 xn+2  WM =  nxn−1 (n + 1)xn (n + 2)xn+1  n(n − 1)xn−2 n(n + 1)xn−1 (n + 1)(n + 2)xn We can extract a factor of xn from the first row, xn−1 from the second row, and xn−2 from the third row, to get  1 x x2  3n−3 W = det(WM) = x det  n (n + 1)x (n + 2)x2  . n(n − 1) n(n + 1)x (n + 1)(n + 2)x2 We then extract x from the second column, and x2 from the third column, to get W = x3n det(V ), where  1 1 1  V =  n n + 1 n + 2  . n(n − 1) n(n + 1) (n + 1)(n + 2) We will expand det(V ) along the top row, using the cofactors

 (n + 1) (n + 2)  det = (n + 1)2(n + 2) − n(n + 1)(n + 2) = (n + 1)(n + 2) = n2 + 3n + 2 n(n + 1) (n + 1)(n + 2)  n (n + 2)  det = n(n + 1)(n + 2) − n(n − 1)(n + 2) = 2n(n + 2) = 2n2 + 4n n(n − 1) (n + 1)(n + 2)  n (n + 1)  det = n2(n + 1) − n(n − 1)(n + 1) = n(n + 1) = (n2 + n). n(n − 1) n(n + 1)

81 Thus det(V ) = 1.(n2 + 3n + 2) − 1.(2n2 + 4n) + 1.(n2 + n) = n2 + 3n + 2 − 2n2 − 4n + n2 + n = 2, so W = 2x3n. Alternatively, you could ask Maple: with(LinearAlgebra):

WM := simplify( << x^n , x^(n+1) , x^(n+2) >| < diff(x^n,x) , diff(x^(n+1),x) , diff(x^(n+2),x) >| < diff(x^n,x,x), diff(x^(n+1),x,x), diff(x^(n+2),x,x) >> );

W := simplify(Determinant(WM));

Exercise 5.6: Consider a matrix   a1 a2 a3 A = a4 a5 a6 a7 a8 a9 Explicitly, we have     a1 a2 a3 1  2 φ(A) = 1 x x a4 a5 a6  x  2 a7 a8 a9 x  2 a1 + a2x + a3x  2 2 = 1 x x a4 + a5x + a6x  2 a7 + a8x + a9x 2 3 4 = a1 + (a2 + a4)x + (a3 + a5 + a7)x + (a6 + a8)x + a9x

2 3 4 In particular, given any element f(x) = b0 + b1x + b2x + b3x + b4x in R[x]≤4, we can take

a1 = b0, a2 = b1, a3 = b2, a6 = b3, a9 = b4, a4 = a5 = a7 = a8 = 0 and we then have φ(A) = f. More explicitly:  1   b b b   b b b  0 1 2  2 0 1 2 2 3 4 φ 0 0 b3 = 1 x x 0 0 b3  x  = b0 + b1x + b2x + b3x + b4x . 0 0 b4 0 0 b4 x2 This means that φ is surjective. The kernel is the set of matrices A for which

a1 = a2 + a4 = a3 + a5 + a7 = a6 + a8 = a9 = 0, or in other words, the set of matrices of the form

        " 0 a2 a3 # 0 1 0 0 0 1 0 0 0 0 0 0 A = −a2 a5 a6 = a2 −1 0 0 + a3  0 0 0 + a5  0 1 0 + a6 0 0 1 −a3−a5 −a6 0 0 0 0 −1 0 0 −1 0 0 0 −1 0 It follows that the following matrices form a basis for ker(φ):

 0 1 0  0 0 1  0 0 0 0 0 0 −1 0 0  0 0 0  0 1 0 0 0 1 0 0 0 −1 0 0 −1 0 0 0 −1 0

Exercise 5.7: 82 a 2 (a) Given any vector v = [ b ] ∈ R , we can define f(x) = a + (b − a)x; then f ∈ R[x]≤3 and f(0) = a and f(1) = b, so φ(f) = v. This shows that φ is surjective. Now consider a polynomial f(x) = ax3 + bx2 + cx + d. We have φ(f) = 0 iff f(1) = 0 = f(0) iff a + b + c + d = 0 = d iff c = −a − b and d = 0. If this holds then f(x) = ax3 + bx2 + (−a − b)x = a(x3 − x) + b(x2 − x) = a(x3 − x2) + (a + b)(x2 − x). In other words, if we put p(x) = x3 − x2 and q(x) = x2 − x, then φ(p) = φ(q) = 0 and f = a p + (a + b) q ∈ span(p, q). This shows that p and q span ker(φ), and they are clearly linearly independent, so they give a basis for ker(φ). (b) If ψ(f) = 0 then we have f(0) = f(1) = f(2) = f(3) = 0, so f(x) has at least four different roots. As f is a polynomial of degree at most three, this is impossible, unless f = 0. To be more explicit, suppose that f(x) = ax3 + bx2 + cx + d and f(0) = f(1) = f(2) = f(3) = 0. This means that d = 0 a + b + c + d = 0 8a + 4b + 2c + d = 0 27a + 9b + 3c + d = 0, and these equations can be solved in the standard way to show that a = b = c = d = 0.

Exercise 5.8: First consider the matrices 1 0 0 1 0 0 0 0 E = E = E = E = . 1 0 0 2 0 0 3 1 0 4 0 1 We have r φ(E ) = p q = pr 1 0 s φ(E ) = p q = ps 2 0 0 φ(E ) = p q = qr 3 r 0 φ(E ) = p q = qs. 4 s

On the other hand, we have trace(E1) = trace(E4) = 1 and trace(E2) = trace(E3) = 0. Suppose for a contradiction that we have φ(A) = trace(A). By taking A = Ei for i = 1, 2, 3, 4 we get pr = 1 and ps = 0 and qr = 0 and qs = 1. As pr = qs = 1 we see that all of p, q, r and s must be nonzero. This conflicts with the equations ps = 0 = qr, so we have the required contradiction.

c − a b  Exercise 5.9: We have J 2 = −I, so if f(x) = ax2 + bx + c we have φ(f) = (c − a)I + bJ = . −b c − a In particular, we have φ(f) = 0 iff b = 0 and c = a, which means that f(x) = a(x2 + 1). We also see that image(φ) is spanned by I and J, which are linearly independent. Thus {x2 + 1} is a basis for ker(φ) and {I,J} is a basis for image(φ).

Exercise 5.10:

(a) If v1, . . . , vn are linearly dependent, then there must exist a linear relation λ1v1 + ··· + λnvn = 0 in which one of the λi’s is nonzero. We then have

λ1φ(v1) + ··· + λnφ(vn) = φ(λ1v1 + ··· + λnvn) = φ(0) = 0,

which gives a nontrivial linear relation between the elements φ(v1), . . . , φ(vn), showing that they too are linearly dependent. 83 2 x 1  0  (b) Consider φ: R → R given by φ [ y ] = x + y, and the vectors v1 = [ 0 ] and v2 = −1 . Then v1 and v2 are linearly independent, but φ(v1) + φ(v2) = 0, which shows that φ(v1) and φ(v2) are linearly dependent. Of course there are many other examples. The minimal example is to let φ be the map R −→ 0 given by φ(t) = 0 for all t, and n = 1, and v1 = 1 ∈ R. But this is perhaps so simple as to be confusing. (c) This is logically equivalent to (a). If φ(v1), . . . , φ(vn) are linearly independent, then v1, . . . , vn cannot be dependent (as that would contradict (a)), so they must be linearly independent.

Exercise 6.1: Put u = φ(x) ∈ V and v = φ(1) ∈ V . Then φ(ax + b) = φ(a.x + b.1) = aφ(x) + bφ(1) = au + bv, as required.

Exercise 6.2: (a)

T 2 2 µV ([0, 1, 1, −1] ) = 0.1 + 1.x + 1.(1 + x) − 1.(1 + x ) = x + 1 + 2x + x2 − 1 − x2 = 3x. (b) Here it is simplest to just observe that

2 2 2 2 T x = −1 + (1 + x ) = (−1).1 + 0.x + 0.(1 + x) + 1.(1 + x ) = µV ([−1, 0, 0, 1] ). For a more laborious but systematic approach, we have

2 2 µV (λ) = λ1.1 + λ2.x + λ3.(1 + 2x + x ) + λ4.(1 + x ) 2 = (λ1 + λ3 + λ4) + (λ2 + 2λ3)x + (λ3 + λ4)x . We want this to equal x2, so we must have

λ1 + λ3 + λ4 = 0

λ2 + 2λ3 = 0

λ3 + λ4 = 1.

These equations can be solved to give λ1 = −1 and λ2 = −2λ3 and λ4 = 1 − λ3 (where λ3 can be T anything). It is simplest to take λ3 = 0, so λ1 = −1 and λ2 = 0 and λ4 = 1, so λ = ([−1, 0, 0, 1] ). (c) Again it is easiest to just observe that (1+x)2 = (1+x2)+2x, so 0.1+2.x−1.(1+x)2 +1.(1+x2) = 0, T so µV ([0, 2, −1, 1] ) = 0. For a more laborious but systematic approach, recall that 2 µV (λ) = (λ1 + λ3 + λ4) + (λ2 + 2λ3)x + (λ3 + λ4)x . We want this to equal 0, so we must have

λ1 + λ3 + λ4 = 0

λ2 + 2λ3 = 0

λ3 + λ4 = 0.

These equations can be solved to give λ1 = 0 and λ3 = −λ4 and λ2 = −2λ3 = 2λ4, so λ = T T λ4.[0, 2, −1, 1] . Here λ4 can be anything, but it is simplest to take λ4 = 1 to get λ = [0, 2, −1, 1] .

Exercise 6.3: 84 (a) As the matrices in A all have 0 in the top left corner, the same will be true of any matrix in span(A). (The formula is µ (λ) = λ [ 0 1 ] + λ [ 0 2 ] + λ [ 0 1 ] + λ [ 0 3 ] =  0 λ1+2λ2+λ3+3λ4  , A 1 2 3 2 1 3 3 3 2 4 2 1 2λ1+λ2+3λ3+2λ4 3λ1+3λ2+2λ3+λ4 but you should be able to follow the argument without needing the formula.) In particular, the identity matrix cannot lie in span(A), because it does not have 0 in the top left corner. Thus span(A) 6= M2R. (b) As the matrices in B are symmetric, the same will be true of any matrix in span(B). (The formula is µ (λ) = λ [ 1 1 ] + λ [ 0 1 ] + λ [ 1 0 ] + λ  1 0  =  λ1+λ3+λ4 λ1+λ2  , B 1 1 0 2 1 1 3 0 1 4 0 −1 λ1+λ2 λ2+λ3−λ4 but you should be able to follow the argument without needing the formula.) In particular, the 0 1 matrix [ 0 0 ] cannot lie in span(B), because it is not symmetric. Thus span(B) 6= M2R.  a b  (c) The list C spans M2R. To see this, consider a matrix A = c d . We have µ (λ) = λ [ 1 0 ] + λ [ 1 1 ] + λ [ 1 1 ] + λ [ 1 1 ] =  λ1+λ2+λ3+λ4 λ2+λ3+λ4  . C 1 0 0 2 0 0 3 1 0 4 1 1 λ3+λ4 λ4  a b  We want this to equal c d , so we must have

λ1 + λ2 + λ3 + λ4 = a

λ2 + λ3 + λ4 = b

λ3 + λ4 = c

λ4 = d

These equations have the (unique) solution λ1 = a − b, λ2 = b − c, λ3 = c − d and λ4 = d. In conclusion, we have T µC([a − b, b − c, c − d, d] ) = A,

showing that A ∈ span(C). This works for any matrix A, so M2R = span(C). (d) As all the matrices in D have trace zero, the same will be true of any matrix in span(D). In particular, the identity matrix cannot lie in span(D), because it does not have trace zero. Thus, span(D) 6= M2R.

T 3 Exercise 6.4: Given λ = [λ0, λ1, λ2] ∈ R , we have 2 2 2 2 2 2 µR(λ)(x) = λ0x + λ1(x + 1) + λ2(x + 2) = λ0x + λ1(x + 2x + 1) + λ2(x + 4x + 4) 2 = (λ0 + λ1 + λ2)x + (2λ1 + 4λ2)x + (λ1 + 4λ2) 2 Suppose we have a quadratic polynomial q(x) = ax + bx + c, and we want to have µR(λ) = q. We must then have

λ0 + λ1 + λ2 = a

2λ1 + 4λ2 = b

λ1 + 4λ2 = c.

Subtracting the last two equations gives λ1 = b − c, and we can put this into the last equation to give λ2 = c/2 − b/4. We then put these two values back into the first equation to give λ0 = a − 3b/4 + c/2. The conclusion is that  a−3b/4+c/2  µ b−c = q, c/2−b/4

showing that q ∈ span(R). As this works for any quadratic polynomial q, we have span(R) = R[x]≤2.

Exercise 7.1: We have h 0 i h 1 i h 1 i φ(e1) = 1 φ(e2) = 0 φ(e3) = 1 1 1 0 85 so the matrix with respect to the standard basis is

h 0 1 1 i 1 0 1 . 1 1 0 On the other hand, we have

h 2 i φ(u1) = 2 = 2.u1 + 0.u2 + 0.u3 2 h −1 i φ(u2) = 1 = 0u1 − 1.u2 + 0.u3 0 h 0 i φ(u3) = −1 = 0u1 + 0.u2 − 1.u3 1

so the matrix with respect to u1, u2 and u3 is

h 2 0 0 i 0 −1 0 . 0 0 −1

2 2 0 00 Exercise 7.2: The usual basis of R[x]≤2 is {1, x, x }. If f(x) = x then f (x) = 2x and f (x) = 2, so f(0) = 0 and f 0(1) = 2 and f 00(2) = 2, so φ(f) = [0, 2, 2]T . In the same way, we get

h 1 i h 0 i 2 h 0 i φ(1) = 0 φ(x) = 1 φ(x ) = 2 . 0 0 2 1 0 0 The matrix of φ has these three vectors as its columns, so the matrix is 0 1 2. 0 0 2

Exercise 7.3: λx λx 2 λx (a) The functions f0(x) = e , f1(x) = xe and f2(x) = x e form a basis for V . (b) If f ∈ V then f(x) = (ax2 + bx + c)eλx, so

f 0(x) = (2ax + b)eλx + (ax2 + bx + c)λeλx = (aλx2 + (2a + bλ)x + (b + cλ))eλx.

Here a, b, c and λ are all just constants, so we see that f 0(x) is again a quadratic polynomial times eλx, so f 0 ∈ V as required. (c) We have

0 f0 = λf0 + 0.f1 + 0.f2 0 f1 = 1.f0 + λf1 + 0.f2 0 f2 = 0.f0 + 2.f1 + λf2

λ 1 0 so the matrix is 0 λ 2. 0 0 λ 0 (d) We have (D − λ)f0 = f0 − λf0 = 0 and similarly (D − λ)f1 = f0 and (D − λ)f2 = 2f1. It follows that 2 3 2 3 (D − λ) f2 = (D − λ)f1 = 0 and (D − λ) f2 = 2(D − λ) f1 = 0, so (D − λ) fi = 0 for i = 0, 1, 2, 0 1 0 so (D − λ)3 = 0. Alternatively, we can note that D − λ has matrix 0 0 2, and it is easy to see 0 0 0 that the cube of this matrix is zero.

86 Exercise 7.4: We have ∆(xk) = (x + 1)k − xk, so ∆(x0) = 0 = 0.x0 + 0.x1 + 0.x2 + 0.x3 ∆(x1) = 1 = 1.x0 + 0.x1 + 0.x2 + 0.x3 ∆(x2) = 2x + 1 = 1.x0 + 2.x1 + 0.x2 + 0.x3 ∆(x3) = 3x2 + 3x + 1 = 1.x0 + 3.x1 + 3.x2 + 0.x3 ∆(x4) = 4x3 + 6x2 + 4x + 1 = 1.x0 + 4.x1 + 6.x2 + 4.x3 The matrix is therefore  0 1 1 1 1  0 0 2 3 4 0 0 0 3 6 0 0 0 0 4 It is clear that the nonzero columns form a basis for R4. It follows that the map is surjective (so the image is all of R[x]≤3), and the kernel is just the set of constant polynomials. More explicitly, consider a polynomial p = ax4 + bx3 + cx2 + dx + e. We have ∆(p) = a(4x3 + 6x2 + 4x + 1) + b(3x2 + 3x + 1) + c(2x + 1) + d = 4ax3 + (6a + 3b)x2 + (4a + 3b + 2c)x + (a + b + c + d). We can only have ∆(p) = 0 if 4a = 6a + 3b = 4a + 3b + 2c = a + b + c + d = 0, which is easily solved to give a = b = c = d = 0 (with e arbitrary). In other words, ∆(p) can only be zero if p is constant. We also find that ∆(x) = 1 ∆((x2 − x)/2) = x ∆((2x3 − 3x2 + x)/6) = x2 ∆((x4 − 2x3 + x2)/4) = x3,

2 3 so the image of ∆ is a subspace containing the elements 1, x, x and x , but these elements span all of R[x]≤3, so ∆ is surjective.

Exercise 7.5: (a) The general formula is h x i h 3 i h x i h 4z i α y = 4 × y = −3z , z 0 z 3y−4x so h 60 i α(u1) = −45 = 5u3 = 0.u1 + 0.u2 + 5.u3 −100 h 0 i α(u2) = 0 = 0.u1 + 0.u2 + 0.u3 0 h −80 i α(u3) = 60 = −5u1 = −5u1 + 0.u2 + 0.u3 −75 h 0 0 −5 i (b) The lists of coefficients here form the columns of the matrix A, so A = 0 0 0 . 5 0 0 (c) h 16 i h 30 i h 36 i h 82 i µU (b) = 1.u1 + 2.u2 + 3.u3 = −12 + 40 + −27 = 1 15 0 −60 −45 h 82 i  4×(−45)  h −180 i α(µU (b)) = α 1 = (−3)×(−45) = 135 −45 3×1−4×82 −325 h 0 0 −5 i h 1 i h −15 i φA(b) = Ab = 0 0 0 2 = 0 5 0 0 3 5 h −15 i  (−15)×16+5×12  h −180 i µU (φA(b)) = µU 0 = −15.u1 + 0.u2 + 5.u3 = (−15)×(−12)+5×(−9) = 135 5 (−15)×15+5×(−20) −325 87 Exercise 7.6: T T T T (a) We have E1 = E1 and E2 = E3 and E3 = E2 and E4 = E4. It follows that α(E1) = α(E4) = 0, whereas α(E2) = E2 − E3 and α(E3) = E3 − E2. From this it follows that " 0 0 0 0 # 0 1 −1 0 A = 0 −1 1 0 . 0 0 0 0 (b) 1 1 1 0 1 0 β(E1) = [ 1 1 ][ 0 0 ] = [ 1 0 ] = 1.E1 + 0.E2 + 1.E3 + 0.E4 1 1 0 1 0 1 β(E2) = [ 1 1 ][ 0 0 ] = [ 0 1 ] = 0.E1 + 1.E2 + 0.E3 + 1.E4 1 1 0 0 1 0 β(E3) = [ 1 1 ][ 1 0 ] = [ 1 0 ] = 1.E1 + 0.E2 + 1.E3 + 0.E4 1 1 0 0 0 1 β(E4) = [ 1 1 ][ 0 1 ] = [ 0 1 ] = 0.E1 + 1.E2 + 0.E3 + 1.E4 The lists of coefficents here give the columns of B, so

" 1 0 1 0 # 0 1 0 1 B = 1 0 1 0 . 0 1 0 1 (c) Using part (b) we get 1 0 1 0 1 1  0 −1  αβ(E1) = α [ 1 0 ] = [ 1 0 ] − [ 0 0 ] = 1 0 = 0.E1 − E2 + E3 + 0.E4 0 1 0 1 0 0  0 1  αβ(E2) = α [ 0 1 ] = [ 0 1 ] − [ 1 1 ] = −1 0 = 0.E1 + E2 − E3 + 0.E4 1 0 1 0 1 1  0 −1  αβ(E3) = α [ 1 0 ] = [ 1 0 ] − [ 0 0 ] = 1 0 = 0.E1 − E2 + E3 + 0.E4 0 1 0 1 0 0  0 1  αβ(E4) = α [ 0 1 ] = [ 0 1 ] − [ 1 1 ] = −1 0 = 0.E1 + E2 − E3 + 0.E4 The lists of coefficents here give the columns of C, so " 0 0 0 0 # −1 1 −1 1 C = 1 −1 1 −1 . 0 0 0 0 (d) One checks directly that " 0 0 0 0 #" 1 0 1 0 # " 0 0 0 0 # 0 1 −1 0 0 1 0 1 −1 1 −1 1 0 −1 1 0 1 0 1 0 = 1 −1 1 −1 , 0 0 0 0 0 1 0 1 0 0 0 0 so AB = C.

Exercise 7.7: (a) The columns of P are the lists of coefficents in the following equations: 0 E1 = E1 + 0.E2 + 0.E3 + E4 0 E2 = E1 + 0.E2 + 0.E3 − E4 0 E3 = 0.E1 + E2 + E3 + 0.E4 0 E4 = 0.E1 + E2 − E3 + 0.E4 Thus, " 1 1 0 0 # 0 0 1 1 P = 0 0 1 −1 1 −1 0 0 0 0 0 (b) For i ≤ 3, the matrix Ei is symmetric, so α(Ei) = 0. This means that the first three columns of A are zero. We also have 0  0 1   0 −1   0 2  0 0 0 0 0 α(E4) = −1 0 − 1 0 = −2 0 = 2E4 = 0.E1 + 0.E2 + 0.E3 + 2.E4, so the last column of A0 is [0, 0, 0, 2]T , so

" 0 0 0 0 # 0 0 0 0 0 A = 0 0 0 0 . 0 0 0 2 (c) Just by multiplying out we see that

" 1 1 0 0 #" 0 0 0 0 # " 0 0 0 0 # " 0 0 0 0 #" 1 1 0 0 # 0 0 0 1 1 0 0 0 0 0 0 0 2 0 1 −1 0 0 0 1 1 PA = 0 0 1 −1 0 0 0 0 = 0 0 0 −2 = 0 −1 1 0 0 0 1 −1 = AP 1 −1 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1 −1 0 0 88 Exercise 7.8: The obvious basis to use is x2, x, 1. We have φ(x2) = x2 + 2x + 2 φ(x) = 0.x2 + x + 1 φ(1) = 0.x2 + 0.x + 1, so the matrix of φ with respect to x2, x, 1 is h 1 0 0 i P = 2 1 0 2 1 1 We thus have trace(φ) = trace(P ) = 1 + 1 + 1 = 3 1 0 2 0 2 1 det(φ) = det(P ) = 1. det [ 1 1 ] − 0. det [ 2 1 ] + 0. det [ 2 1 ] = 1 h t−1 0 0 i char(φ)(t) = char(P )(t) = det −2 t−1 0 = (t − 1)3. −2 −1 t−1

Exercise 7.9:

(a) If a ∈ ker(π) then π(a) = 0 so a0 = a1 = 0. Using the relation a2 = 3a1 −2a0 we see that a2 = 0. We can now use the relation a3 = 3a2 − 2a1 to see that a3 = 0. More generally, if a0 = ··· = ai−1 = 0 (for some i ≥ 2) then the relation ai = 3ai−1 − 2ai−2 tells us that ai = 0 as well. It follows by induction that ai = 0 for all i, so a = 0. Thus ker(π) = 0 as claimed. (b) We have ui+2 − 3ui+1 + 2ui = 1 − 3 + 2 = 0, so u ∈ V . We also have i+2 i+1 i i vi+2 − 3vi+1 + 2vi = 2 − 3.2 + 2.2 = 2 (4 − 3.2 + 2) = 0, so v ∈ V . (c) We have π(u) = [1, 1]T and π(v) = [1, 2]T . By inspection we have π(2u − v) = [1, 0]T and π(v − u) = [0, 1]T , so we can take b = 2u − v and c = v − u. (d) Suppose we have an element a ∈ V . We then have T T T T π(a − a0b − a1c) = π(a) − a0π(b) − a1π(c) = [a0, a1] − a0[1, 0] − a1[0, 1] = [0, 0] .

As π is injective, this means that a = a0b + a1c. It is clear that this expression for a in terms of b and c is unique, so b and c give a basis. Next, for a as above we have

a = a0b + a1c = a0(2u − v) + a1(v − u) = (2a0 − a1)u + (a1 − a0)v, which is a linear combination of u and v. This shows that u and v span V , and they are clearly independent, so they also form a basis. (a) We have λ(u) = λ(1, 1, 1, 1, 1,... ) = (1, 1, 1, 1,... ) = u λ(v) = λ(1, 2, 4, 8, 16,... ) = (2, 4, 8, 16,... ) = 2v 1 0 so the matrix of λ with respect to u, v is just . 0 2

Exercise 8.1:

(1) Clearly v1 6∈ 0 = V0, so 1 is a jump. (2) Clearly v2 is not a multiple of v1, so v2 6∈ V1, so 2 is a jump. (3) We have v3 = (v1 − v2)/2 ∈ V2, so 3 is not a jump. (4) We have v4 = v1 − v3 ∈ V3, so 4 is not a jump. (5) The vectors v1,..., v4 all have the property that the second and third coordinates are the same. Any vector in V4 = span(v1,..., v4) will therefore have the same property. However, the second and third coordinates in v5 are different, so v5 6∈ V4, so 5 is a jump. 89 (6) We have v6 = v1 − v5 ∈ V5, so 6 is not a jump. 6 (7) The vector v7 does not lie in V6. The cleanest way prove this is to consider the linear map φ: R → R give by T φ([x1, x2, x3, x4, x5, x6] ) = x2 − x3 + x4 − x5.

We then find that φ(v1) = φ(v2) = ··· = φ(v6) = 0, so φ(u) = 0 for any u ∈ span(v1,..., v6) = V6, but φ(v7) = −2, so v7 6∈ V6. Thus 7 is a jump. (8) We have v8 = v2 − v7 ∈ V7, so 8 is not a jump. The set of jumps is thus {1, 2, 5, 7}.

 a b   1 0  Exercise 8.2: Note that V ∩ W is the set of matrices of the form b −a , so if we put u1 = 0 −1 and 0 1 1 0  0 1  u2 = [ 1 0 ] then u1, u2 is a basis for V ∩ W . We now put v1 = [ 0 1 ] and w1 = −1 0 . We find that u1, u2, v1 is a basis for V , and u1, u2, w1 is a basis for W .

Exercise 8.3: dim(U + V ) = dim(U) + dim(V ) − dim(U ∩ V ) = 2 + 3 − 1 = 4 dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ) = 3 + 4 − 2 = 5 dim(U + V + W ) = dim(U + V ) + dim(W ) − dim((U + V ) ∩ W ) = 4 + 4 − 3 = 5. Now it is clear that V + W ≤ U + V + W and dim(V + W ) = dim(U + V + W ); this can only happen if V + W = U + V + W . It is also clear that U ≤ U + V + W but U + V + W = V + W so U ≤ V + W as claimed.

Exercise 8.4: (a) Here we have  a b  2  a+bx  2 3 φ c d = [x, x ] c+dx = ax + (b + c)x + dx 2 3 It follows that the list v1 = x, v2 = x , v3 = x is a basis for image(φ), and the matrices u1 = 1 0 0 1 0 0  a b  [ 0 0 ] , u2 = [ 0 0 ] , u3 = [ 0 1 ] have φ(ui) = vi. We also see that φ c d = 0 iff a = d = 0 and c = −b,  0 1  so u4 = −1 0 gives a basis for ker(φ). Finally, we can take v4 = 1 to extend our list v1, . . . , v3 to a basis for all of R[x]≤3. Our final answer is thus: 1 0 0 1 0 0  0 1  u1 = [ 0 0 ] u2 = [ 0 0 ] u3 = [ 0 1 ] u4 = −1 0 2 3 v1 = x v2 = x v3 = x v4 = 1 (b) Here we have 2 h a+b+c i h 1 i h 1 i ψ(ax + bx + c) = a−b+c = (a + c) 1 + b −1 b 0 1 T T From this it is clear that the list v1 = [1, 1, 0] , v2 = [1, −1, 1] is a basis for image(ψ), and that if 2 we put u1 = 1 and u2 = x then φ(ui) = vi for i = 1, 2. Moreover, we have φ(ax + bx + c = 0 iff 2 b = 0 and c = −a, so u3 = x − 1 gives a basis for ker(φ). Finally, almost any choice of v3 will ensure 3 T that v1, v2, v3 is a basis of R , but the simplest is to take v3 = [1, 0, 0] . Our final answer is 2 u1 = 1 u2 = x u3 = x − 1 h 1 i h 1 i h 1 i v1 = 1 v2 = −1 v3 = 0 0 1 0 (c) Here we have h i  a b 0   1 0 0   0 1 0   0 0 0   0 0 0  χ a b = c d 0 = a 0 0 0 + b 0 0 0 + c 1 0 0 + d 0 1 0 c d 0 0 −a−d 0 0 −1 0 0 0 0 0 0 0 0 −1 It follows that we can take 1 0 0 1 0 0 0 0 u1 = [ 0 0 ] u2 = [ 0 0 ] u3 = [ 1 0 ] u4 = [ 0 1 ] h 1 0 0 i h 0 1 0 i h 0 0 0 i h 0 0 0 i v1 = 0 0 0 v2 = 0 0 0 v3 = 1 0 0 v4 = 0 1 0 0 0 −1 0 0 0 0 0 0 0 0 −1 90 and then χ(ui) = vi and v1, . . . , v4 is a basis for image(χ). Moreover, it is clear that χ(A) can only be zero if A is zero, so ker(χ) = 0, so no more u’s need to be added. On the other hand, we need five more v’s to make up a basis for M3R. The obvious choices are as follows:  0 0 1   0 0 0   0 0 0   0 0 0   0 0 0  v5 = 0 0 0 v6 = 0 0 1 v7 = 0 0 0 v8 = 0 0 0 v9 = 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 (d) Here we have θ([a, b, c, d]T ) = ax2 + b(x + 1)2 + c(x − 1)2 + d(x2 + 1) = (a + b + c + d)x2 + 2(b − c)x + (b + c + d) From this we find that θ([1, 0, 0, 0]T ) = x2 θ([0, 1/4, −1/4, 0]T ) = x θ([−1, 0, 0, 1]T ) = 1 θ([0, 1, 1, −2]T ) = 0. We can therefore take

 1  " 0 #  −1   0  0 1/4 0 1 u1 = 0 u2 = −1/4 u3 = 0 u4 = 1 0 0 1 −2 2 v1 = x v2 = x v3 = 1

Exercise 10.1: (a) 2 2 2 2 2 2 2 2 2 hC1,C1i = 1 + 1 + 1 + 0 + 1 + 1 + 0 + 0 + 1 = 6

hC1,C2i = 1.1 + 1.2 + 1.3 + 0.2 + 1.1 + 1.2 + 0.3 + 0.2 + 1.1 = 10

hC1,C3i = 1.0 + 1.1 + 1.2 + 0.(−1) + 1.0 + 1.3 + 0.(−2) + 0.(−3) + 1.0 = 6 2 2 2 2 2 2 2 2 2 hC2,C2i = 1 + 2 + 3 + 2 + 1 + 2 + 3 + 2 + 1 = 37

hC2,C3i = 1.0 + 2.1 + 3.2 + 2.(−1) + 1.0 + 2.3 + 3.(−2) + 2.(−3) + 1.0 = 0 2 2 2 2 2 2 2 2 2 hC3,C3i = 0 + 1 + 2 + (−1) + 0 + 3 + (−2) + (−3) + 0 = 28 (b) Suppose that AT = A and BT = −B. We have hA, Bi = trace(ABT ) = − trace(AB). Using the rules trace(X) = trace(XT ) and trace(YZ) = trace(ZY ) we see that trace(AB) = trace((AB)T ) = trace(BT AT ) = trace((−B)A) = − trace(BA) = − trace(AB) This means that trace(AB) = 0, so hA, Bi = 0. More directly, we have " #  a1 a2 a3  0 b1 b2 A = a2 a4 a5 B = −b1 0 b3 a a a 3 5 6 −b2 −b3 0

for some a1, . . . , a6, b1, b2, b3. It follows that

" a b +a b a b −a b −a b −a b # T 2 1 3 2 3 3 1 1 1 2 2 3 AB = a4b1+a5b2 a5b3−a2b1 −a2b2−a4b3 , a5b1+a6b2 a6b3−a3b1 −a3b2−a5b3 and the trace of this matrix is zero as required. (c) V is the set of matrices of the form

h a a a i  1 1 1   0 0 0   0 0 0  B = b b b = a 0 0 0 + b 1 1 1 + c 0 0 0 c c c 0 0 0 0 0 0 1 1 1 Thus, if we put  1 1 1   0 0 0   0 0 0  B1 = 0 0 0 B2 = 1 1 1 B3 = 0 0 0 0 0 0 0 0 0 1 1 1

then V = span(B1,B2,B3). Now consider a matrix

h a1 a2 a3 i A = a4 a5 a6 , a7 a8 a9 91 ⊥ and suppose that A ∈ V . We then have 0 = hA, B1i = a1 + a2 + a3 and 0 = hA, B2i = a4 + a5 + a6 and 0 = hA, B3i = a7 + a8 + a9. It follows that

h 1 i h a1+a2+a3 i h 0 i A 1 = a4+a5+a6 = 0 , 1 a7+a8+a9 0 as claimed.

Exercise 10.2: (a) We have (x + 1)(x2 + x) = x3 + 2x2 + x, so Z 1 2 3 2  1 4 2 3 1 21 1 2 1 1 2 1 hx + 1, x + xi = x + 2x + x dx = 4 x + 3 x + 2 x −1 = ( 4 + 3 + 2 ) − ( 4 − 3 + 2 ) = 4/3. −1 (b) In general, we have Z 1  xi+j+1 1 1 (−1)i+j+1 hxi, xji = xi+j dx = = − . −1 i + j + 1 −1 i + j + 1 i + j + 1 If i + j is odd then i + j + 1 is even and so (−1)i+j+1 = 1 and hxi, xji = 0. (c) Consider a polynomial f(x) = ax2 + bx + c. We then have 4f(−1) − 8f(0) + 4f(1) = 4(a − b + c) − 8c + 4(a + b + c) = 8a. On the other hand, we have Z 1 hf, ui = (ax2 + bx + c)(px2 + q) dx −1 Z 1 = apx4 + bpx3 + (aq + cp)x2 + bqx + cq dx −1 1 h ap 5 bp 4 aq+cp 3 bq 2 i = 5 x + 4 x + 3 x + 2 x + cqx −1 ap aq+cp = 2 5 + 2 3 + 2cq 2 2 2 = ( 5 p + 3 q)a + ( 3 p + 2q)c. 2 2 2 For this to agree with 4f(−1) − 8f(0) + 4f(1) = 8a, we must have 5 p + 3 q = 8 and 3 p + 2q = 0. 6 2 The second of these gives p = −3q, which we substitute in the first to get − 5 q + 3 q = 8 and thus q = −15. The equation p = −3q now gives p = 45, so u(x) = 45x2 − 15.
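Part (c) is easy to sanity-check numerically (an illustrative sketch, not part of the solution, assuming Python with numpy): for any test quadratic f, the integral ⟨f, u⟩ with u(x) = 45x² − 15 should agree with 4f(−1) − 8f(0) + 4f(1).

import numpy as np

u = lambda x: 45 * x**2 - 15
f = lambda x: 2 * x**2 - 3 * x + 5            # an arbitrary test quadratic (a = 2, so 8a = 16)

x = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]
y = f(x) * u(x)
lhs = (y[0] / 2 + y[1:-1].sum() + y[-1] / 2) * dx   # trapezium rule for <f, u> on [-1, 1]
rhs = 4 * f(-1) - 8 * f(0) + 4 * f(1)
print(lhs, rhs)                                      # both approximately 16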

Exercise 10.3: If A = [ a1 a2 ] and B =  b1 b2  then a3 a4 b3 b4 det(A − B) = det  a1−b1 a2−b2  = (a − b )(a − b ) − (a − b )(a − b ) a3−b3 a4−b4 1 1 4 4 2 2 3 3 = a1a4 − a1b4 − a4b1 + b1b4 − a2a3 + a2b3 + a3b2 − b2b3 det(A + B) = det  a1+b1 a2+b2  = (a + b )(a + b ) − (a + b )(a + b ) a3+b3 a4+b4 1 1 4 4 2 2 3 3 = a1a4 + a1b4 + a4b1 + b1b4 − a2a3 − a2b3 − a3b2 − b2b3

2 trace(A) trace(B) = 2(a1 + a4)(b1 + b4) = 2a1b1 + 2a1b4 + 2a4b1 + 2a4b4

hA, Bi = −2a1b4 − 2a4b1 + 2a2b3 + 2a3b2 + 2a1b1 + 2a1b4 + 2a4b1 + 2a4b4

= 2(a1b1 + a2b3 + a3b2 + a4b4). (a) We now see that

hA + B,Ci = 2((a1 + b1)c1 + (a2 + b2)c3 + (a3 + b3)c2 + (a4 + b4)c4)

= 2(a1c1 + a2c3 + a3c2 + a4c4) + 2(b1c1 + b2c3 + b3c2 + b4c4) = hA, Ci + hB,Ci 92 (b) Similarly

htA, Bi = 2(ta1b1 + ta2b3 + ta3b2 + ta4b4)

= t.2(a1b1 + a2b3 + a3b2 + a4b4) = thA, Bi

(c) It is clear from the formula hA, Bi = 2(a1b1 + a2b3 + a3b2 + a4b4) that hA, Bi = hB,Ai. 2 2  0 1  (d) In general we have hA, Ai = 2a1 + 4a2a3 + 2a4. If we take A = −1 0 (so a1 = a4 = 0 and a2 = 1 and a3 = −1) then hA, Ai = −4 < 0. 2 2 2 (e) However, if A ∈ V then a3 = a2 so hA, Ai = 2a1 + 4a2 + 2a4. This is always nonnegative, and can only be zero if a1 = a2 = a4 = 0, which means that A = 0 (because a3 = a2).

Exercise 10.4: (a) We have hf, gi0 = (fg + f 0g0)0 = f 0g + fg0 + f 00g0 + f 0g00 = (g + g00)f 0 + (f + f 00)g0, and this is zero because f + f 00 = 0 = g + g00. Thus hf, gi0 = 0. (b) If f ∈ V then f + f 00 = 0. Differentiating this gives f 0 + f 000 = 0, which shows that f 0 ∈ V . (c) It is clear that hf, gi = hg, fi and hf + g, hi = hf, hi + hg, hi and htf, gi = thf, gi. All that is left is to show that hf, fi ≥ 0, with equality only when f = 0. For this, we note that hsin, sini = sin2 + cos2 = 1 hcos, cosi = cos2 +(− sin)2 = 1 hsin, cosi = sin cos + cos .(− sin) = 0.

Any element f ∈ V can be written as f = a. sin +b. cos for some a, b ∈ R, and we deduce that hf, fi = a2hsin, sini + 2abhsin, cosi + b2hcos, cosi = a2 + b2. From this it is clear that hf, fi ≥ 0, with equality iff a = b = 0, or equivalently f = 0. (d) We have D(sin) = cos = 0. sin +1. cos D(cos) = − sin = −1. sin +0. cos

It follows that the matrix of D is
[ 0 −1 ]
[ 1 0 ]
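A tiny symbolic check of parts (a) and (d) (an illustration only, not part of the solution, assuming Python with sympy): the derivative of fg + f′g′ vanishes for every f, g in the span of sin and cos, and differentiation sends a sin + b cos to −b sin + a cos, which is exactly the action of the matrix above.

import sympy as sp

x, a, b, c, d = sp.symbols('x a b c d')
f = a * sp.sin(x) + b * sp.cos(x)
g = c * sp.sin(x) + d * sp.cos(x)

# Part (a): <f, g> = f*g + f'*g' is constant, i.e. its derivative is zero.
print(sp.simplify(sp.diff(f * g + f.diff(x) * g.diff(x), x)))          # 0

# Part (d): D(a*sin + b*cos) = -b*sin + a*cos, matching the matrix [[0, -1], [1, 0]].
print(sp.simplify(f.diff(x) - (-b * sp.sin(x) + a * sp.cos(x))))       # 0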

Exercise 11.1: We use the standard inner product on C[−1, 1], given by ⟨f, g⟩ = ∫_{−1}^{1} f(x)g(x) dx. Take g(x) = √(1 − x²), so
‖g‖² = ∫_{−1}^{1} g(x)² dx = ∫_{−1}^{1} (1 − x²) dx = [x − x³/3]_{−1}^{1} = 4/3,
so ‖g‖ = 2/√3. The Cauchy-Schwarz inequality now tells us that |⟨f, g⟩| ≤ (2/√3)‖f‖, or in other words

∫_{−1}^{1} √(1 − x²) f(x) dx ≤ (2/√3) ( ∫_{−1}^{1} f(x)² dx )^{1/2}
as claimed. This is an equality iff f is a constant multiple of g. In particular, it is an equality when f(x) = g(x) = √(1 − x²).
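The inequality is easy to test numerically for any particular f (an illustrative sketch, not part of the solution, assuming Python with numpy):

import numpy as np

f = lambda x: np.exp(x) * np.cos(3 * x)     # an arbitrary continuous test function
g = lambda x: np.sqrt(1 - x**2)

x = np.linspace(-1, 1, 200001)
dx = x[1] - x[0]
trap = lambda y: (y[0] / 2 + y[1:-1].sum() + y[-1] / 2) * dx   # trapezium rule on [-1, 1]

lhs = abs(trap(g(x) * f(x)))
rhs = (2 / np.sqrt(3)) * np.sqrt(trap(f(x) ** 2))
print(lhs <= rhs, lhs, rhs)    # True; the inequality is strict since f is not a multiple of g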

93 R 1 Exercise 11.2: We use the standard inner product on C[0, 1], given by hf, gi = 0 f(x)g(x) dx. The Cauchy-Schwartz inequality says that for any f and g we have hf, gi2 ≤ kfk2kgk2 = hf, fihg, gi. Now take g(x) = f(x)2, so Z 1 hf, gi = f(x)3 dx 0 Z 1 hf, fi = f(x)2 dx 0 Z 1 hg, gi = f(x)4 dx. 0 The inequality therefore says Z 1 2 Z 1  Z 1  f(x)3 dx ≤ f(x)2 dx f(x)4 dx 0 0 0 as claimed. This is an equality iff g is a constant multiple of f, so there is a constant c such that f 2 = cf, so (f(x) − c)f(x) = 0. If f(x) is nonzero for all x we can divide by f(x) to see that f(x) = c for all x, so f is constant. The same holds by a slightly more complicated argument even if we do not assume that f is everywhere nonzero.

Exercise 12.1: The obvious basis for V consists of the matrices 1 0 0 0 0 1 P1 = [ 0 0 ] P2 = [ 0 1 ] P3 = [ 1 0 ] Using the fact that  a b  p q h c d , [ r s ]i = ap + bq + cr + ds, we see that the sequence P1,P2,P3 is orthogonal, with hP1,P1i = hP2,P2i = 1 and hP3,P3i = 2. This means that 1 π(A) = hA, P1iP1 + hA, P2iP2 + 2 hA, P3iP3.  a b  Now take A = c d . We find that hA, P1i = a and hA, P2i = d and hA, P3i = b + c, so we get 1 h a (b+c)/2 i π(A) = aP1 + dP2 + 2 (b + c)P3 = (b+c)/2 d . On the other hand, we have T  a b  a c  2a b+c  A + A = c d + [ b d ] = b+c 2d , so π(A) = (A + AT )/2.
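The formula π(A) = (A + A^T)/2 can be double-checked numerically (an illustrative sketch, assuming Python with numpy): projecting a sample matrix onto span(P1, P2, P3) with the orthogonal-projection formula gives the same answer as symmetrising.

import numpy as np

inner = lambda X, Y: np.sum(X * Y)        # <X, Y> = sum of products of matching entries

P1 = np.array([[1., 0.], [0., 0.]])
P2 = np.array([[0., 0.], [0., 1.]])
P3 = np.array([[0., 1.], [1., 0.]])

A = np.array([[1., 2.], [3., 4.]])
proj = sum(inner(A, P) / inner(P, P) * P for P in (P1, P2, P3))
print(proj)                                # [[1.  2.5] [2.5 4. ]]
print(np.allclose(proj, (A + A.T) / 2))    # True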

Exercise 12.2: Put W = span(W) and

X hv, wii π(v) = w hw , w i i i i i and (v) = v − π(v). We saw in lectures that v = π(v) + (v), with π(v) ∈ W and (v) ∈ W ⊥, so 2 2 2 2 kvk = kπ(v)k + k(v)k ≥ kπ(v)k . On the other hand, the vectors hv, wiiwi/hwi, wii are orthogonal to each other, so Pythagoras tells us that 2 2 2 2 2 X hv, wii X hv, wii 2 X hv, wii X hv, wii kvk = wi = kwik = hwi, wii = . hw , w i hw , w i2 hw , w i2 hw , w i i i i i i i i i i i i i Alternatively, we can introduce the orthonormal sequence wb i = wi/kwik. Parseval’s inequality tells us that  2 2 2 X X wi X hv, wii X hv, wii kvk2 ≥ hv, w i2 = v, = = b i kw k kw k2 hw , w i i i i i i i i i

94 Exercise 12.3: Observe that the integrals involved in the claimed inequality can be interpreted as follows: Z 1 f(x)2 dx = hf, fi −1 Z 1 f(x) dx = hf, 1i −1 Z 1 xf(x) dx = hf, xi. −1 R 1 Note also that h1, xi = −1 x dx = 0, so the sequence W = 1, x is strictly orthogonal. We can therefore apply Exercise 12.2: it tells us that hf, 1i2 hf, xi2 hf, fi ≥ + . h1, 1i hx, xi R 1 R 1 2 Here h1, 1i = −1 1 dx = 2 and hx, xi = −1 x dx = 2/3, so we get Z 1 Z 1 2 Z 1 2 2 1 3 f(x) dx ≥ 2 f(x) dx + 2 x f(x) dx . −1 −1 −1 We can now multiply by two to get the inequality in the question.

Exercise 12.4: Consider the matrices " 0 0 0 0 # " 0 0 0 0 # " 0 1 1 0 # " 1 0 0 1 # 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 B1 = A1 = 0 0 1 0 B2 = A2 − A1 = 0 1 0 0 B3 = A3 − A2 = 1 0 0 1 B4 = A4 − A3 = 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1

If i 6= j we see that Bi and Bj do not overlap, or in other words, in any place where Bi has a one, Bj has a zero. To calculate hBi,Bji we multiply all the entries in Bi by the corresponding entries in Bj, and add these terms together. All the terms are zero because the matrices do not overlap, so hBi,Bji = 0, so we have an orthogonal sequence. From the formulae

B1 = A1 B2 = A2 − A1 B3 = A3 − A2 B4 = A4 − A3 we deduce that

A1 = B1 A2 = B1 + B2 A3 = B1 + B2 + B3 A4 = B1 + B2 + B3 + B4.

From these two sets of equations together we see that span{B1,...,Bi} = span{A1,...,Ai} for all i. We were asked for an orthonormal sequence, so we now put Ci = Bi/kBik. Note that if a matrix X contains only zeros and ones then kXk2 is just the number of ones. Using this we see that √ √ √ kB1k = 2 kB2k = 2 kB3k = 8 kB4k = 2, so " 0 0 0 0 # " 0 0 0 0 # " 0 1 1 0 # " 1 0 0 1 # 1 0 1 0 0 1 0 0 1 0 1 1 0 0 1 1 0 0 0 0 C1 = √ 0 0 1 0 C2 = √ 0 1 0 0 C3 = √ 1 0 0 1 C4 = 0 0 0 0 2 0 0 0 0 2 0 0 0 0 8 0 1 1 0 2 1 0 0 1

Exercise 12.5: The answer is " 1 # " 1 # " 1 # " 1 # " 1 # 1 1 1 1 1 1 1 1 1 −1 v1 = √ 1 v2 = √ 1 v3 = √ 1 v4 = √ −2 v5 = √ 0 b 5 1 b 20 1 b 12 −3 b 6 0 b 2 0 1 −4 0 0 0 The steps are as follows. We will first find a strictly orthogonal sequence v1,..., v5 and then put vbi = vi/kvik. We start with v1 = u1, which gives hv1, v1i = 5 and hu2, v1i = 4. We then have " 1 # " 1 # " 1 # hu2, v1i 1 4 1 1 1 v2 = u2 − v1 = 1 − 1 = 1 . hv , v i 1 5 1 5 1 1 1 0 1 −4 This gives 2 2 2 2 2 hv2, v2i = (1 + 1 + 1 + 1 + (−4) )/25 = 20/25 = 4/5, 95 and hu3, v1i = 3 and hu3, v2i = 3/5 so " 1 # " 1 # " 1 # " 5 # " 1 # hu3, v1i hu3, v2i 1 3 1 3/5 1 1 1 5 1 1 v3 = u3 − v1 − v2 = 1 − 1 − 1 = 5 = 1 . hv1, v1i hv2, v2i 0 5 1 4/5 5 1 20 −15 4 −3 0 1 −4 0 0 2 2 2 2 This gives hv3, v3i = (1 + 1 + 1 + (−3) )/16 = 3/4 and hu4, v1i = 2 and hu4, v2i = 2/5 and hu4, v3i = 2/4 = 1/2, so

hu4, v1i hu4, v2i hu4, v3i v4 = u4 − v1 − v2 − v3 hv1, v1i hv2, v2i hv3, v3i " 1 # " 1 # " 1 # " 1 # " 10 # " 1 # 1 2 1 2/5 1 1 2/4 1 1 1 10 1 1 = 0 − 1 − 1 − 1 = −20 = −2 . 0 5 1 4/5 5 1 3/4 4 −3 30 0 3 0 0 1 −4 0 0 0

This in turn gives hv4, v4i = 2/3 and hu5, v1i = 1 and hu5, v2i = 1/5 and hu5, v3i = 1/4 and hu5, v4i = 1/3 so

hu5, v1i hu5, v2i hu5, v3i hu5, v4i v5 = u5 − v1 − v2 − v3 − v4 hv1, v1i hv2, v2i hv3, v3i hv4, v4i " 1 # " 1 # " 1 # " 1 # " 1 # " 30 # " 1 # 0 1 1 1/5 1 1 1/4 1 1 1/3 1 1 1 −30 1 −1 = 0 − 1 − 1 − 1 − −2 = 0 = 0 . 0 5 1 4/5 5 1 3/4 4 −3 2/3 3 0 60 0 2 0 0 1 −4 0 0 0 0 In summary, the vectors " 1 # " 1 # " 1 # " 1 # " 1 # 1 1 1 1 1 1 1 1 −1 v1 = 1 v2 = 1 v3 = 1 v4 = −2 v5 = 0 , 1 5 1 4 −3 3 0 2 0 1 −4 0 0 0 form an orthogonal (but not yet orthonormal) sequence with span{v1,..., vi} = span{u1,..., ui} for i = 1,..., 5. To get an orthonormal sequence we put vbi = vi/kvik. We have √ p p p p kv1k = 5 kv2k = 4/5 kv3k = 3/4 kv4k = 2/3 kv5k = 1/2. and so " 1 # " 1 # " 1 # " 1 # " 1 # 1 1 1 1 1 1 1 1 1 −1 v1 = √ 1 v2 = √ 1 v3 = √ 1 v4 = √ −2 v5 = √ 0 b 5 1 b 20 1 b 12 −3 b 6 0 b 2 0 1 −4 0 0 0

Exercise 12.6: The resulting orthonormal basis is $\{1,\ \sqrt{2}\,t,\ \sqrt{2}(t^2 - 1/2)\}$. The calculation is as follows. We first note that if $i + j$ is an odd number, then $t^{i+j}e^{-t^2}$ is an odd function, so its integral from $-\infty$ to $\infty$ is zero, so $\langle t^i, t^j\rangle = 0$. In particular, $t$ is orthogonal to $1$ and $t^2$. Next, the hint tells us that
\[ \langle 1,1\rangle = 1 \qquad \langle t,t\rangle = 1/2 \qquad \langle 1,t^2\rangle = 1/2 \qquad \langle t^2,t^2\rangle = 3/4. \]
It follows that $\|t\| = 1/\sqrt{2}$, so $1, \sqrt{2}\,t$ is an orthonormal sequence. The projection of $t^2$ orthogonal to these is
\[ t^2 - \langle t^2, 1\rangle\,1 - \langle t^2, \sqrt{2}\,t\rangle\,\sqrt{2}\,t = t^2 - 1/2. \]
To normalise this, we note that
\[ \langle t^2 - 1/2,\ t^2 - 1/2\rangle = \langle t^2,t^2\rangle - 2\langle t^2, 1/2\rangle + \langle 1/2, 1/2\rangle = 3/4 - \langle t^2,1\rangle + \langle 1,1\rangle/4 = \tfrac{3}{4} - \tfrac{1}{2} + \tfrac{1}{4} = 1/2. \]
It follows that $\|\sqrt{2}(t^2 - 1/2)\| = 1$, so our orthonormal basis is $\{1,\ \sqrt{2}\,t,\ \sqrt{2}(t^2 - 1/2)\}$.
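
The following sympy sketch checks this conclusion. The inner product is not restated here, so the code assumes $\langle f,g\rangle = \pi^{-1/2}\int_{-\infty}^{\infty} f(t)g(t)e^{-t^2}\,dt$, which matches the values $\langle 1,1\rangle = 1$, $\langle t,t\rangle = 1/2$, $\langle t^2,t^2\rangle = 3/4$ quoted from the hint.

import sympy as sp

t = sp.symbols('t', real=True)

def inner(f, g):
    # Assumed inner product: <f, g> = pi^(-1/2) * integral of f*g*exp(-t^2) over R,
    # chosen to match the values quoted from the hint above.
    return sp.integrate(f * g * sp.exp(-t**2), (t, -sp.oo, sp.oo)) / sp.sqrt(sp.pi)

basis = [sp.Integer(1), sp.sqrt(2)*t, sp.sqrt(2)*(t**2 - sp.Rational(1, 2))]
gram = sp.Matrix(3, 3, lambda i, j: sp.simplify(inner(basis[i], basis[j])))
print(gram)  # expect the 3x3 identity matrix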

Exercise 12.7:
(a) Take
\[ v_1 = \tfrac{1}{\sqrt{2}}\begin{bmatrix}1\\-1\\0\\0\end{bmatrix} \qquad
v_2 = \tfrac{1}{\sqrt{2}}\begin{bmatrix}0\\0\\1\\-1\end{bmatrix} \qquad
v_3 = \tfrac{1}{2}\begin{bmatrix}1\\1\\1\\1\end{bmatrix}, \]
so $(x_1 - x_2)^2/2 = \langle x, v_1\rangle^2$ and $(x_3 - x_4)^2/2 = \langle x, v_2\rangle^2$ and $(x_1 + x_2 + x_3 + x_4)^2/4 = \langle x, v_3\rangle^2$, so
\[ \alpha(x) = \|x\|^2 - \sum_{i=1}^{3} \langle x, v_i\rangle^2. \]

It is easy to check that $\langle v_1, v_2\rangle = \langle v_1, v_3\rangle = \langle v_2, v_3\rangle = 0$ and $\langle v_1, v_1\rangle = \langle v_2, v_2\rangle = \langle v_3, v_3\rangle = 1$, so the sequence is orthonormal. Parseval's inequality is thus applicable, and it tells us that $\sum_{i=1}^{3}\langle x, v_i\rangle^2 \leq \|x\|^2$, or in other words $\alpha(x) \geq 0$.
(b) The vector $v_4 = [1, 1, -1, -1]^T/2$ is a unit vector orthogonal to $v_1$, $v_2$ and $v_3$, so the sequence $v_1, v_2, v_3, v_4$ is orthonormal. (How did we find this? If $v_4 = [a, b, c, d]^T$ we must have $a = b$ for orthogonality with $v_1$, and $c = d$ for orthogonality with $v_2$, and $a + b + c + d = 0$ for orthogonality with $v_3$, and $a^2 + b^2 + c^2 + d^2 = 1$ to make $v_4$ a unit vector. The only two solutions are $[1, 1, -1, -1]^T/2$ and $[-1, -1, 1, 1]^T/2$, and either of these will do.)
(c) By direct calculation, we have
\begin{align*}
\alpha(x) &= x_1^2 + x_2^2 + x_3^2 + x_4^2 - \tfrac14 x_1^2 - \tfrac14 x_2^2 - \tfrac14 x_3^2 - \tfrac14 x_4^2 \\
&\quad - \tfrac12 x_1x_2 - \tfrac12 x_1x_3 - \tfrac12 x_1x_4 - \tfrac12 x_2x_3 - \tfrac12 x_2x_4 - \tfrac12 x_3x_4 \\
&\quad - \tfrac12 x_1^2 - \tfrac12 x_2^2 + x_1x_2 - \tfrac12 x_3^2 - \tfrac12 x_4^2 + x_3x_4 \\
&= \tfrac14 x_1^2 + \tfrac14 x_2^2 + \tfrac14 x_3^2 + \tfrac14 x_4^2 + \tfrac12 x_1x_2 - \tfrac12 x_1x_3 - \tfrac12 x_1x_4 - \tfrac12 x_2x_3 - \tfrac12 x_2x_4 + \tfrac12 x_3x_4 \\
&= (x_1 + x_2 - x_3 - x_4)^2/4 = \langle x, v_4\rangle^2.
\end{align*}

We could have done this without calculation as follows. The sequence $v_1, \dots, v_4$ is orthonormal (hence linearly independent) of length 4, so it is a basis for $\mathbb{R}^4$, so $x$ automatically lies in the span. Thus Parseval's inequality for this extended sequence is actually an equality, which means that
\[ \|x\|^2 = \langle x, v_1\rangle^2 + \langle x, v_2\rangle^2 + \langle x, v_3\rangle^2 + \langle x, v_4\rangle^2. \]

Rearranging this gives $\alpha(x) = \langle x, v_4\rangle^2$ as before.
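
The identity in (c) is easy to confirm numerically; the following numpy sketch checks $\alpha(x) = \langle x, v_4\rangle^2$ for a randomly chosen $x$.

import numpy as np

v1 = np.array([1, -1, 0, 0]) / np.sqrt(2)
v2 = np.array([0, 0, 1, -1]) / np.sqrt(2)
v3 = np.array([1, 1, 1, 1]) / 2
v4 = np.array([1, 1, -1, -1]) / 2

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

alpha = x @ x - sum((x @ v)**2 for v in (v1, v2, v3))
print(np.isclose(alpha, (x @ v4)**2))  # True: alpha(x) = <x, v4>^2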

Exercise 14.1: If $v = [x, y, z]^T$ we have
\[ \langle \phi(v), A\rangle = \left\langle \begin{bmatrix} x & y\\ y & z\end{bmatrix},\ \begin{bmatrix} a & b\\ c & d\end{bmatrix} \right\rangle = ax + by + cy + dz = ax + (b + c)y + dz = \left\langle \begin{bmatrix} x\\ y\\ z\end{bmatrix},\ \begin{bmatrix} a\\ b + c\\ d\end{bmatrix} \right\rangle, \]
so
\[ \phi^*(A) = w = \begin{bmatrix} a\\ b + c\\ d\end{bmatrix}. \]
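
A quick numerical check of the adjoint relation, assuming $\phi(v)$ is the symmetric matrix $\begin{bmatrix} x & y\\ y & z\end{bmatrix}$ as used in the display above:

import numpy as np

def phi(v):
    # phi sends [x, y, z] to the symmetric matrix [[x, y], [y, z]] (as in the display above)
    x, y, z = v
    return np.array([[x, y], [y, z]], float)

def phi_star(A):
    # the claimed adjoint: [[a, b], [c, d]] -> [a, b + c, d]
    return np.array([A[0, 0], A[0, 1] + A[1, 0], A[1, 1]])

rng = np.random.default_rng(1)
v = rng.standard_normal(3)
A = rng.standard_normal((2, 2))
print(np.isclose(np.sum(phi(v) * A), v @ phi_star(A)))  # True: <phi(v), A> = <v, phi*(A)>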

Exercise 14.2: We have
\[ \left\langle \begin{bmatrix} 0 & a_4 & a_7\\ 0 & 0 & a_8\\ 0 & 0 & 0\end{bmatrix},\ \begin{bmatrix} b_1 & b_2 & b_3\\ b_4 & b_5 & b_6\\ b_7 & b_8 & b_9\end{bmatrix} \right\rangle
= a_4 b_2 + a_7 b_3 + a_8 b_6
= \left\langle \begin{bmatrix} a_1 & a_2 & a_3\\ a_4 & a_5 & a_6\\ a_7 & a_8 & a_9\end{bmatrix},\ \begin{bmatrix} 0 & 0 & 0\\ b_2 & 0 & 0\\ b_3 & b_6 & 0\end{bmatrix} \right\rangle. \]
In other words, if we define
\[ \psi\begin{bmatrix} b_1 & b_2 & b_3\\ b_4 & b_5 & b_6\\ b_7 & b_8 & b_9\end{bmatrix} = \begin{bmatrix} 0 & 0 & 0\\ b_2 & 0 & 0\\ b_3 & b_6 & 0\end{bmatrix}, \]
we have $\langle \phi(A), B\rangle = \langle A, \psi(B)\rangle$. Thus $\phi^* = \psi$.
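
A numerical sanity check, assuming that $\phi(A)$ is the upper-triangular matrix shown on the left of the first display (its nonzero entries are $a_4, a_7, a_8$ from the lower triangle of $A$):

import numpy as np

def phi(A):
    # phi(A) as it appears above: zero except for the entries a4, a7, a8
    # (the strictly lower triangle of A) moved into the strictly upper triangle.
    return np.array([[0, A[1, 0], A[2, 0]],
                     [0, 0,       A[2, 1]],
                     [0, 0,       0      ]], float)

def psi(B):
    # the claimed adjoint: move the strictly upper triangle of B into the strictly lower triangle
    return np.array([[0,       0,       0],
                     [B[0, 1], 0,       0],
                     [B[0, 2], B[1, 2], 0]], float)

rng = np.random.default_rng(2)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(np.isclose(np.sum(phi(A) * B), np.sum(A * psi(B))))  # True: <phi(A), B> = <A, psi(B)>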

Exercise 14.3: We must show that $\phi^* = \phi$, or equivalently that $\langle\phi(A), B\rangle = \langle A, \phi(B)\rangle$ for all $A, B \in M_2\mathbb{R}$. If $A = \begin{bmatrix} a & b\\ c & d\end{bmatrix}$ and $B = \begin{bmatrix} p & q\\ r & s\end{bmatrix}$ then
\begin{align*}
\phi(A) &= \begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}\begin{bmatrix} a & b\\ c & d\end{bmatrix}\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix} = \begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix}\begin{bmatrix} a+b & a+b\\ c+d & c+d\end{bmatrix} = (a + b + c + d)\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix} = (a + b + c + d)\,Q \\
\langle Q, B\rangle &= \left\langle \begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix},\ \begin{bmatrix} p & q\\ r & s\end{bmatrix} \right\rangle = p + q + r + s \\
\langle\phi(A), B\rangle &= (a + b + c + d)\langle Q, B\rangle = (a + b + c + d)(p + q + r + s) \\
\phi(B) &= (p + q + r + s)\,Q \\
\langle A, \phi(B)\rangle &= (p + q + r + s)\langle A, Q\rangle = (p + q + r + s)(a + b + c + d) = \langle\phi(A), B\rangle.
\end{align*}
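
A short numpy check of this self-adjointness, with $\phi(A) = QAQ$ where $Q$ is the all-ones $2\times 2$ matrix, exactly as in the calculation above:

import numpy as np

Q = np.ones((2, 2))
phi = lambda M: Q @ M @ Q   # phi(M) = QMQ, as in the calculation above

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
print(np.isclose(np.sum(phi(A) * B), np.sum(A * phi(B))))  # True: phi is self-adjoint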

Exercise 14.4: We have
\begin{align*}
\langle\phi(A), B\rangle &= \operatorname{trace}(\phi(A)B^T) = \operatorname{trace}(AB^T - \tfrac{1}{n}\operatorname{trace}(A)B^T) = \operatorname{trace}(AB^T) - \tfrac{1}{n}\operatorname{trace}(A)\operatorname{trace}(B^T) \\
&= \langle A, B\rangle - \tfrac{1}{n}\operatorname{trace}(A)\operatorname{trace}(B) \\
\langle A, \phi(B)\rangle &= \operatorname{trace}(A\,\phi(B)^T) = \operatorname{trace}(AB^T - \tfrac{1}{n}\operatorname{trace}(B)A) = \operatorname{trace}(AB^T) - \tfrac{1}{n}\operatorname{trace}(A)\operatorname{trace}(B),
\end{align*}
so $\langle\phi(A), B\rangle = \langle A, \phi(B)\rangle$. (For a slightly more efficient approach, we can note that our expression for $\langle\phi(A), B\rangle$ is symmetric: it does not change if we swap $A$ and $B$. Thus $\langle\phi(A), B\rangle = \langle\phi(B), A\rangle$, and this is the same as $\langle A, \phi(B)\rangle$ by the axiom $\langle X, Y\rangle = \langle Y, X\rangle$.)
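
A numerical check, assuming $\phi(A) = A - \tfrac{1}{n}\operatorname{trace}(A)\,I$, which is the form the trace computation above uses:

import numpy as np

n = 3
phi = lambda M: M - np.trace(M) / n * np.eye(n)   # phi(A) = A - trace(A)/n * I (assumed form)
inner = lambda X, Y: np.trace(X @ Y.T)            # <X, Y> = trace(X Y^T)

rng = np.random.default_rng(4)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
print(np.isclose(inner(phi(A), B), inner(A, phi(B))))  # True: phi is self-adjoint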

Exercise 14.5: First, if $f(x) = ax^2 + bx + c$ then $f'(x) = 2ax + b$ and $f''(x) = 2a$ for all $x$, so in particular $\chi(f) = f''(0) = 2a$. This means that $\chi(1) = 0$ and $\chi(x) = 0$ and $\chi(x^2) = 2$. Next, the element $u$ must have the form $px^2 + qx + r$ for some constants $p, q, r \in \mathbb{R}$. It must satisfy
\[ \langle 1, u\rangle = \chi(1) = 0 \qquad \langle x, u\rangle = \chi(x) = 0 \qquad \langle x^2, u\rangle = \chi(x^2) = 2. \]
On the other hand, we have
\begin{align*}
\langle 1, u\rangle &= \int_{-1/2}^{1/2} px^2 + qx + r \,dx = \left[ px^3/3 + qx^2/2 + rx \right]_{-1/2}^{1/2} = p/12 + r \\
\langle x, u\rangle &= \int_{-1/2}^{1/2} px^3 + qx^2 + rx \,dx = \left[ px^4/4 + qx^3/3 + rx^2/2 \right]_{-1/2}^{1/2} = q/12 \\
\langle x^2, u\rangle &= \int_{-1/2}^{1/2} px^4 + qx^3 + rx^2 \,dx = \left[ px^5/5 + qx^4/4 + rx^3/3 \right]_{-1/2}^{1/2} = p/80 + r/12,
\end{align*}
so we must have $p/12 + r = 0$ and $q/12 = 0$ and $p/80 + r/12 = 2$. These give $p = 360$ and $q = 0$ and $r = -30$, so $u = 360x^2 - 30$.
Now define $\psi\colon \mathbb{R} \to \mathbb{R}[x]_{\leq 2}$ by $\psi(t) = tu = 360tx^2 - 30t$. We claim that $\psi$ is adjoint to $\phi$. Indeed, the standard inner product on $\mathbb{R}$ is just $\langle s, t\rangle = st$, so
\[ \langle\phi(f), t\rangle = t\,\phi(f) = t\,\langle f, u\rangle = \langle f, tu\rangle = \langle f, \psi(t)\rangle, \]
as required.
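
A quick sympy check that $u = 360x^2 - 30$ really satisfies the three conditions, using the inner product $\int_{-1/2}^{1/2} f(x)g(x)\,dx$ from the calculation above:

import sympy as sp

x = sp.symbols('x')
inner = lambda f, g: sp.integrate(f * g, (x, -sp.Rational(1, 2), sp.Rational(1, 2)))

u = 360*x**2 - 30
print(inner(1, u), inner(x, u), inner(x**2, u))  # expect 0, 0, 2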

Exercise 14.6:
(a) First note that $e_n(t) = e^{int}$, so $e_n'(t) = ine^{int}$, so $e_n''(t) = (in)^2 e^{int} = -n^2 e_n(t)$. This means that $\Delta(e_n) = -n^2 e_n$, and thus that $\Delta(f) = \sum_n a_n\,(-n^2 e_n) = -\sum_n n^2 a_n e_n$.
(b) Consider elements $f, g \in T_2$, say $f = \sum_{n=-2}^{2} a_n e_n$ and $g = \sum_{m=-2}^{2} b_m e_m$. We then have
\begin{align*}
\Delta(f) &= -\sum_n n^2 a_n e_n \\
\langle\Delta(f), g\rangle &= \left\langle -\sum_n n^2 a_n e_n,\ \sum_m b_m e_m \right\rangle = -\sum_{n,m} n^2 a_n \overline{b_m}\,\langle e_n, e_m\rangle = -\sum_{n=-2}^{2} n^2 a_n \overline{b_n} \\
\Delta(g) &= -\sum_m m^2 b_m e_m \\
\langle f, \Delta(g)\rangle &= \left\langle \sum_n a_n e_n,\ -\sum_m m^2 b_m e_m \right\rangle = -\sum_{n,m} m^2 a_n \overline{b_m}\,\langle e_n, e_m\rangle = -\sum_{m=-2}^{2} m^2 a_m \overline{b_m}.
\end{align*}
This shows that $\langle\Delta(f), g\rangle = \langle f, \Delta(g)\rangle$, so $\Delta$ is self-adjoint.
(c) Part (a) tells us that the matrix of $\Delta$ with respect to the standard basis $E = e_{-2}, e_{-1}, e_0, e_1, e_2$ is
\[ D = \begin{bmatrix} -4 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 0 & -4 \end{bmatrix}. \]
From this it is clear that the eigenvalues are $0$, $-1$ and $-4$.
(d) The eigenspace for the eigenvalue $0$ is spanned by $e_0$ and so has dimension one. The eigenspace for the eigenvalue $-1$ is spanned by $e_{-1}$ and $e_1$, and so has dimension two. Similarly, the eigenspace for the eigenvalue $-4$ has dimension two, with basis $e_{-2}, e_2$.

Exercise 14.7: For any a, b ∈ U we have hφ(a), φ(b)i = ha, φ∗φ(b)i = ha, bi. (The first step is the definition of φ∗, and the second is the fact that φ∗φ(b) = b.) In particular, we have hφ(ui), φ(uj)i = hui, uji. As the sequence U is orthonormal, we have hui, uji = 0 when i 6= j, and hui, uii = 1. We therefore see that hφ(ui), φ(uj)i = 0 when i 6= j, and hφ(ui), φ(ui)i = 1, which means that the sequence φ(u1), . . . , φ(un) is also orthonormal.
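
The phenomenon is easy to illustrate numerically: any matrix $Q$ with orthonormal columns satisfies $Q^TQ = I$ (the matrix analogue of $\phi^*\phi = \mathrm{id}$), and therefore sends orthonormal sequences to orthonormal sequences. A minimal sketch:

import numpy as np

# Q has orthonormal columns, so Q^T Q = I plays the role of phi* phi = id.
Q = np.linalg.qr(np.random.default_rng(6).standard_normal((4, 2)))[0]
print(np.allclose(Q.T @ Q, np.eye(2)))          # True: phi* phi = identity on U

u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # an orthonormal sequence in U
images = [Q @ u1, Q @ u2]
gram = np.array([[x @ y for y in images] for x in images])
print(np.allclose(gram, np.eye(2)))             # True: the images are again orthonormal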

Exercise 14.8:
(a) We are given that $\phi\phi^*(v) = v$ for all $v \in V$. In particular, we can take $v = \phi(u)$ to get $\phi\phi^*\phi(u) = \phi(u)$, or in other words $\phi(u_1) = \phi(u)$. It follows that $\phi(u_2) = \phi(u - u_1) = \phi(u) - \phi(u_1) = 0$.
(b) We have $\langle\phi^*(a), b\rangle = \langle a, \phi(b)\rangle$ for all $a \in V$ and $b \in U$. In particular we can take $a = \phi(u)$ and $b = u_2$ to get
\[ \langle u_1, u_2\rangle = \langle\phi^*\phi(u), u_2\rangle = \langle\phi(u), \phi(u_2)\rangle = \langle\phi(u), 0\rangle = 0. \]
(The penultimate step uses part (a).)
(c) As $u_1$ and $u_2$ are orthogonal we have
\[ \|u\|^2 = \|u_1 + u_2\|^2 = \|u_1\|^2 + \|u_2\|^2 \geq \|u_1\|^2. \]
(d) We have
\[ \|u_1\|^2 = \langle\phi^*\phi(u), \phi^*\phi(u)\rangle = \langle\phi(u), \phi\phi^*\phi(u)\rangle = \langle\phi(u), \phi(u)\rangle = \|\phi(u)\|^2. \]
(Here we have used the equation $\phi\phi^*\phi(u) = \phi(u)$ from part (a).)
(e) If we combine (c) and (d) and take square roots we get $\|\phi(u)\| \leq \|u\|$.

Exercise 15.1: We can write $f = \sum_{n=-2}^{2} a_n e_n$ for some sequence of coefficients $a_n$. Note that

\begin{align*}
e_n(0) &= 1 \\
e_n(\pi) &= e^{in\pi} = (-1)^n \\
e_n(-\pi/2) &= e^{-in\pi/2} = (e^{-i\pi/2})^n = (-i)^n \\
e_n(\pi/2) &= e^{in\pi/2} = (e^{i\pi/2})^n = i^n.
\end{align*}
It follows that
\begin{align*}
f(0) &= a_{-2} + a_{-1} + a_0 + a_1 + a_2 \\
f(\pi) &= a_{-2} - a_{-1} + a_0 - a_1 + a_2 \\
f(-\pi/2) &= -a_{-2} + ia_{-1} + a_0 - ia_1 - a_2 \\
f(\pi/2) &= -a_{-2} - ia_{-1} + a_0 + ia_1 - a_2 \\
f(0) - f(\pi) &= 2(a_{-1} + a_1) \\
f(-\pi/2) - f(\pi/2) &= 2i(a_{-1} - a_1).
\end{align*}
As $f(0) = f(\pi)$ and $f(-\pi/2) = f(\pi/2)$, we must have $a_{-1} + a_1 = 0$ and also $a_{-1} - a_1 = 0$, which gives $a_{-1} = a_1 = 0$. We therefore have $f = a_{-2}e_{-2} + a_0 + a_2 e_2$, or in other words
\[ f(t) = a_{-2}e^{-2it} + a_0 + a_2 e^{2it}. \]
This gives
\[ f(t + \pi) = a_{-2}e^{-2it}e^{-2\pi i} + a_0 + a_2 e^{2it}e^{2\pi i}, \]
which is the same as $f(t)$ because $e^{2\pi i} = 1$.
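
A small numerical illustration: any $f$ of the final form is $\pi$-periodic, and in particular satisfies the two conditions used above. The coefficients below are random.

import numpy as np

rng = np.random.default_rng(5)
a_m2, a_0, a_2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)

f = lambda t: a_m2 * np.exp(-2j * t) + a_0 + a_2 * np.exp(2j * t)

t = rng.standard_normal(10)
print(np.allclose(f(t + np.pi), f(t)))                                   # True: f has period pi
print(np.isclose(f(0), f(np.pi)), np.isclose(f(-np.pi/2), f(np.pi/2)))   # True True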

Exercise 15.2:
(a) As $f = a_{-2}e_{-2} + a_{-1}e_{-1} + a_0 e_0 + a_1 e_1 + a_2 e_2$ and $e_n(0) = 1$ and $e_n'(0) = in$, we have
\begin{align*}
f(0) &= a_{-2} + a_{-1} + a_0 + a_1 + a_2 \\
f'(0) &= i\,(-2a_{-2} - a_{-1} + a_1 + 2a_2).
\end{align*}
This means that $f \in U$ iff we have
\begin{align*}
a_{-2} + a_{-1} + a_0 + a_1 + a_2 &= 0 \\
-2a_{-2} - a_{-1} + a_1 + 2a_2 &= 0.
\end{align*}
These equations can be solved in a standard way to give
\begin{align*}
a_{-2} &= a_0 + 2a_1 + 3a_2 \\
a_{-1} &= -2a_0 - 3a_1 - 4a_2,
\end{align*}
and so
\begin{align*}
f &= (a_0 + 2a_1 + 3a_2)e_{-2} - (2a_0 + 3a_1 + 4a_2)e_{-1} + a_0 e_0 + a_1 e_1 + a_2 e_2 \\
  &= a_0(e_{-2} - 2e_{-1} + e_0) + a_1(2e_{-2} - 3e_{-1} + e_1) + a_2(3e_{-2} - 4e_{-1} + e_2).
\end{align*}
(b) From the last expression above, we observe that the functions
\begin{align*}
u_0 &= e_{-2} - 2e_{-1} + e_0 \\
u_1 &= 2e_{-2} - 3e_{-1} + e_1 \\
u_2 &= 3e_{-2} - 4e_{-1} + e_2
\end{align*}
give a basis for $U$.
(c) Part (b) gives a basis for $U$ of length 3, so $\dim(U) = 3$. We also have a basis $e_{-2}, e_{-1}, e_0, e_1, e_2$ of length 5 for $T_2$, so $\dim(T_2) = 5$. As $T_2 = U \oplus U^\perp$ we have $\dim(U) + \dim(U^\perp) = \dim(T_2) = 5$, so $\dim(U^\perp) = 5 - 3 = 2$.
(d) Put
\begin{align*}
v_0 &= e_{-2} + e_{-1} + e_0 + e_1 + e_2 \\
v_1 &= -2e_{-2} - e_{-1} + e_1 + 2e_2.
\end{align*}
As the $e_n$'s are orthonormal, we have
\begin{align*}
\langle f, v_0\rangle &= \langle a_{-2}e_{-2} + a_{-1}e_{-1} + a_0 e_0 + a_1 e_1 + a_2 e_2,\ e_{-2} + e_{-1} + e_0 + e_1 + e_2\rangle \\
&= a_{-2} + a_{-1} + a_0 + a_1 + a_2 = f(0) \\
\langle f, v_1\rangle &= \langle a_{-2}e_{-2} + a_{-1}e_{-1} + a_0 e_0 + a_1 e_1 + a_2 e_2,\ -2e_{-2} - e_{-1} + e_1 + 2e_2\rangle \\
&= -2a_{-2} - a_{-1} + a_1 + 2a_2 = f'(0)/i.
\end{align*}
(e) If $f \in U$ then $\langle f, v_0\rangle = f(0) = 0$ and $\langle f, v_1\rangle = f'(0)/i = 0$. This shows that $v_0$ and $v_1$ lie in $U^\perp$. They are clearly linearly independent, as neither one is a multiple of the other. There are two of them, and $\dim(U^\perp) = 2$ by part (c), so they give a basis for $U^\perp$. Moreover, we have
\[ \langle v_0, v_1\rangle = 1.(-2) + 1.(-1) + 1.0 + 1.1 + 1.2 = 0, \]
so they are orthogonal. We can therefore construct an orthonormal basis of $U^\perp$ by dividing $v_0$ and $v_1$ by their norms. Explicitly, we have
\begin{align*}
\|v_0\|^2 &= 1^2 + 1^2 + 1^2 + 1^2 + 1^2 = 5 \\
\|v_1\|^2 &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10,
\end{align*}
so our orthonormal basis consists of the functions
\begin{align*}
\hat{v}_0 &= (e_{-2} + e_{-1} + e_0 + e_1 + e_2)/\sqrt{5} \\
\hat{v}_1 &= (-2e_{-2} - e_{-1} + e_1 + 2e_2)/\sqrt{10}.
\end{align*}
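
The conclusions are easy to confirm in coefficient space. The sketch below represents an element of $T_2$ by its coefficient vector $(a_{-2},\dots,a_2)$ and checks that $u_0, u_1, u_2$ satisfy $f(0) = f'(0) = 0$, and that $\hat{v}_0, \hat{v}_1$ are orthonormal and orthogonal to $u_0, u_1, u_2$.

import numpy as np

n = np.array([-2, -1, 0, 1, 2])           # indices of the basis e_{-2}, ..., e_2
# coefficient vectors with respect to e_{-2}, ..., e_2
u0 = np.array([1, -2, 1, 0, 0])
u1 = np.array([2, -3, 0, 1, 0])
u2 = np.array([3, -4, 0, 0, 1])
v0 = np.array([1, 1, 1, 1, 1]) / np.sqrt(5)
v1 = np.array([-2, -1, 0, 1, 2]) / np.sqrt(10)

value_at_0 = lambda a: a.sum()             # f(0)  = sum of the coefficients
deriv_at_0 = lambda a: (1j * n * a).sum()  # f'(0) = sum of i*n*a_n

for u in (u0, u1, u2):
    print(np.isclose(value_at_0(u), 0), np.isclose(deriv_at_0(u), 0))       # True True: u lies in U
G = np.array([[v0 @ v0, v0 @ v1], [v1 @ v0, v1 @ v1]])
print(np.allclose(G, np.eye(2)))                                            # True: v-hats are orthonormal
print(all(np.isclose(u @ v, 0) for u in (u0, u1, u2) for v in (v0, v1)))    # True: they lie in U-perp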

Exercise 16.1:
(a) First note that $e_n(t) = e^{int}$, so $e_n'(t) = ine^{int}$, so $e_n''(t) = (in)^2 e^{int} = -n^2 e_n(t)$. This means that $\Delta(e_n) = -n^2 e_n$, and thus that $\Delta(f) = \sum_n a_n\,(-n^2 e_n) = -\sum_n n^2 a_n e_n$.
(b) Consider elements $f, g \in T_2$, say $f = \sum_{n=-2}^{2} a_n e_n$ and $g = \sum_{m=-2}^{2} b_m e_m$. We then have
\begin{align*}
\Delta(f) &= -\sum_n n^2 a_n e_n \\
\langle\Delta(f), g\rangle &= \left\langle -\sum_n n^2 a_n e_n,\ \sum_m b_m e_m \right\rangle = -\sum_{n,m} n^2 a_n \overline{b_m}\,\langle e_n, e_m\rangle = -\sum_{n=-2}^{2} n^2 a_n \overline{b_n} \\
\Delta(g) &= -\sum_m m^2 b_m e_m \\
\langle f, \Delta(g)\rangle &= \left\langle \sum_n a_n e_n,\ -\sum_m m^2 b_m e_m \right\rangle = -\sum_{n,m} m^2 a_n \overline{b_m}\,\langle e_n, e_m\rangle = -\sum_{m=-2}^{2} m^2 a_m \overline{b_m}.
\end{align*}
This shows that $\langle\Delta(f), g\rangle = \langle f, \Delta(g)\rangle$, so $\Delta$ is self-adjoint.
(c) Part (a) tells us that the matrix of $\Delta$ with respect to the standard basis $E = e_{-2}, e_{-1}, e_0, e_1, e_2$ is
\[ D = \begin{bmatrix} -4 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 0 & -4 \end{bmatrix}. \]
From this it is clear that the eigenvalues are $0$, $-1$ and $-4$.
(d) The eigenspace for the eigenvalue $0$ is spanned by $e_0$ and so has dimension one. The eigenspace for the eigenvalue $-1$ is spanned by $e_{-1}$ and $e_1$, and so has dimension two. Similarly, the eigenspace for the eigenvalue $-4$ has dimension two, with basis $e_{-2}, e_2$.

Exercise 16.2:
(a) We have $\alpha(x) = Ax$, where $A = \begin{bmatrix} 1 & 2 & 1\\ 2 & 9 & 2\\ 1 & 2 & 1 \end{bmatrix}$, so $A$ is the matrix that we need.
(b) We have $\alpha = \phi_A$ and so $\alpha^\dagger = \phi_{A^\dagger}$, but clearly $A^\dagger = A$, so $\alpha$ is self-adjoint.
(c) Here we just need to find the eigenvalues and eigenvectors of the matrix $A$. We have
\begin{align*}
\det(A - tI) &= \det\begin{bmatrix} 1-t & 2 & 1\\ 2 & 9-t & 2\\ 1 & 2 & 1-t \end{bmatrix}
= (1 - t)\det\begin{bmatrix} 9-t & 2\\ 2 & 1-t \end{bmatrix} - 2\det\begin{bmatrix} 2 & 2\\ 1 & 1-t \end{bmatrix} + \det\begin{bmatrix} 2 & 9-t\\ 1 & 2 \end{bmatrix} \\
&= (1 - t)(t^2 - 10t + 5) - 2(-2t) + (t - 5) = -t^3 + 11t^2 - 10t = -t(t^2 - 11t + 10) = -t(t - 1)(t - 10),
\end{align*}
so the characteristic polynomial is $\det(tI - A) = t(t - 1)(t - 10)$, so the eigenvalues are $0$, $1$ and $10$.
For an eigenvector of eigenvalue $0$ we must have
\begin{align*}
x + 2y + z &= 0 \\
2x + 9y + 2z &= 0 \\
x + 2y + z &= 0.
\end{align*}

These equations give $y = 0$ and $z = -x$, so any eigenvector of eigenvalue $0$ is a multiple of $[1, 0, -1]^T$. For a unit vector, we can take $u_1 = [1, 0, -1]^T/\sqrt{2}$. For an eigenvector of eigenvalue $1$ we must have
\begin{align*}
x + 2y + z &= x \\
2x + 9y + 2z &= y \\
x + 2y + z &= z.
\end{align*}
These equations give $x = z = -2y$, so any eigenvector of eigenvalue $1$ is a multiple of $[-2, 1, -2]^T$. For a unit vector, we can take $u_2 = [-2, 1, -2]^T/3$. For an eigenvector of eigenvalue $10$ we must have
\begin{align*}
x + 2y + z &= 10x \\
2x + 9y + 2z &= 10y \\
x + 2y + z &= 10z.
\end{align*}

These equations give $z = x$ and $y = 4x$, so any eigenvector of eigenvalue $10$ is a multiple of $[1, 4, 1]^T$. For a unit vector, we can take $u_3 = [1, 4, 1]^T/(3\sqrt{2})$. Now $u_1$, $u_2$ and $u_3$ are eigenvectors of a self-adjoint map with different eigenvalues, so they are automatically orthogonal. (It is easy to check this directly, of course.) They are unit vectors, so they give an orthonormal (and so linearly independent) sequence. As this is an independent sequence of length three in a three-dimensional space, it must be a basis.
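
A quick numpy verification that $A$ has eigenvalues $0, 1, 10$ and that the stated unit eigenvectors form an orthonormal basis:

import numpy as np

A = np.array([[1, 2, 1],
              [2, 9, 2],
              [1, 2, 1]], float)

u1 = np.array([1, 0, -1]) / np.sqrt(2)
u2 = np.array([-2, 1, -2]) / 3
u3 = np.array([1, 4, 1]) / (3 * np.sqrt(2))

print(np.allclose(sorted(np.linalg.eigvalsh(A)), [0, 1, 10]))   # True: eigenvalues 0, 1, 10
for u, lam in ((u1, 0), (u2, 1), (u3, 10)):
    print(np.allclose(A @ u, lam * u))                           # True for each eigenpair
U = np.column_stack([u1, u2, u3])
print(np.allclose(U.T @ U, np.eye(3)))                           # True: orthonormal basis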

Exercise 16.3:
(a) If $z = [z_0, \dots, z_4]^T$ and $w = [w_0, \dots, w_4]^T$ we have
\begin{align*}
\langle\alpha(z), w\rangle &= \langle [z_1, z_2, z_3, z_4, z_0]^T,\ [w_0, w_1, w_2, w_3, w_4]^T\rangle \\
&= z_1\overline{w_0} + z_2\overline{w_1} + z_3\overline{w_2} + z_4\overline{w_3} + z_0\overline{w_4} \\
&= z_0\overline{w_4} + z_1\overline{w_0} + z_2\overline{w_1} + z_3\overline{w_2} + z_4\overline{w_3} \\
&= \langle [z_0, z_1, z_2, z_3, z_4]^T,\ [w_4, w_0, w_1, w_2, w_3]^T\rangle.
\end{align*}

Thus, if we define $\beta\colon \mathbb{C}^5 \to \mathbb{C}^5$ by
\[ \beta([w_0, w_1, w_2, w_3, w_4]^T) = [w_4, w_0, w_1, w_2, w_3]^T, \]
we have $\langle\alpha(z), w\rangle = \langle z, \beta(w)\rangle$. This means that $\beta = \alpha^\dagger$. We also have
\begin{align*}
\beta\alpha(z) &= \beta([z_1, z_2, z_3, z_4, z_0]^T) = z \\
\alpha\beta(w) &= \alpha([w_4, w_0, w_1, w_2, w_3]^T) = w,
\end{align*}
so $\beta$ is inverse to $\alpha$. Thus $\alpha^{-1} = \beta = \alpha^\dagger$.
(b) Suppose that $\alpha(z) = \lambda z$, or in other words
\[ [z_1, z_2, z_3, z_4, z_0] = [\lambda z_0, \lambda z_1, \lambda z_2, \lambda z_3, \lambda z_4]. \]
This means that
\begin{align*}
z_1 &= \lambda z_0 \\
z_2 &= \lambda z_1 = \lambda^2 z_0 \\
z_3 &= \lambda z_2 = \lambda^3 z_0 \\
z_4 &= \lambda z_3 = \lambda^4 z_0 \\
z_0 &= \lambda z_4 = \lambda^5 z_0.
\end{align*}
The last equation gives $(\lambda^5 - 1)z_0 = 0$. If $\lambda^5 \neq 1$ then we see that $z_0 = 0$, and by substituting this into our other equations we see that $z_1 = z_2 = z_3 = z_4 = 0$ as well, so $z = 0$. Thus, for such $\lambda$ there are no nonzero eigenvectors, so $\lambda$ is not an eigenvalue. Suppose instead that $\lambda^5 = 1$, which means that $\lambda = e^{2\pi i k/5}$ for some $k \in \{0, 1, 2, 3, 4\}$. We then see that the vector
\[ u_k = [1, \lambda, \lambda^2, \lambda^3, \lambda^4]^T \]
is an eigenvector of eigenvalue $\lambda$. Thus, the eigenvalues are precisely the fifth roots of unity.
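
A numpy check that the $5\times 5$ cyclic-shift matrix representing $\alpha$ is unitary and that its eigenvalues are exactly the fifth roots of unity:

import numpy as np

# Matrix of alpha: alpha([z0, z1, z2, z3, z4]) = [z1, z2, z3, z4, z0]
P = np.roll(np.eye(5), -1, axis=0)

print(np.allclose(P.conj().T @ P, np.eye(5)))   # True: alpha^dagger is inverse to alpha

eig = np.linalg.eigvals(P)
print(np.allclose(eig**5, 1))                    # True: every eigenvalue is a 5th root of unity

lam = np.exp(2j * np.pi / 5)                     # check the eigenvector u_k for k = 1
u = lam ** np.arange(5)
print(np.allclose(P @ u, lam * u))               # True: u is an eigenvector with eigenvalue lam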

Exercise 16.4:
(a) If $f = ax^2 + bx + c$ then $f'' = 2a$, so $\alpha(f) = 6ax^2 - 2a$, so $\alpha(f)'' = 12a$, so $\alpha(\alpha(f)) = (3x^2 - 1)\alpha(f)'' = 36ax^2 - 12a = 6\alpha(f)$.
(b) If $\alpha(f) = \lambda f$ then $\alpha(\alpha(f)) = \alpha(\lambda f) = \lambda\alpha(f) = \lambda.\lambda f = \lambda^2 f$. On the other hand, we also know from (a) that $\alpha(\alpha(f)) = 6\alpha(f) = 6\lambda f$, so $\lambda^2 f = 6\lambda f$, so $(\lambda^2 - 6\lambda)f = 0$. As $f$ was assumed to be nonzero, this means that $\lambda^2 - 6\lambda = 0$, or $\lambda(\lambda - 6) = 0$, so $\lambda = 0$ or $\lambda = 6$.
(c) By part (b) the only possible eigenvalues are $0$ and $6$. Clearly $\alpha(f) = 0$ iff $f'' = 0$ iff $a = 0$, so the eigenvectors of eigenvalue $0$ are just the polynomials of the form $bx + c$. In particular, if we put $u_1 = 1$ and $u_2 = x$ then $u_1$ and $u_2$ are eigenvectors, and we have $\langle u_1, u_2\rangle = \int_{-1}^{1} x\,dx = 0$, so they are orthogonal. Next take $u_3 = 3x^2 - 1$. By definition we have $\alpha(f) = u_3.f''$ for all $f$, so in particular $\alpha(u_3) = u_3.u_3'' = 6u_3$, so $u_3$ is an eigenvector of eigenvalue $6$. We are given that $\alpha$ is self-adjoint, so eigenvectors of different eigenvalues are automatically orthogonal, so $u_1, u_2, u_3$ is an orthogonal sequence.
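
A short sympy check of parts (a) and (c), with $\alpha(f) = (3x^2 - 1)f''$ as used above:

import sympy as sp

x, a, b, c = sp.symbols('x a b c')
alpha = lambda f: (3*x**2 - 1) * sp.diff(f, x, 2)   # alpha(f) = (3x^2 - 1) f''

f = a*x**2 + b*x + c
print(sp.simplify(alpha(alpha(f)) - 6*alpha(f)))    # 0: alpha(alpha(f)) = 6*alpha(f) for quadratics
u3 = 3*x**2 - 1
print(sp.simplify(alpha(u3) - 6*u3))                # 0: u3 is an eigenvector with eigenvalue 6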

Exercise 16.5:
(a) We have $\langle\phi(f), g\rangle = \int_{-\infty}^{\infty} x f(x)\overline{g(x)}\,dx$ and $\langle f, \phi(g)\rangle = \int_{-\infty}^{\infty} f(x)\,\overline{x g(x)}\,dx = \int_{-\infty}^{\infty} \overline{x}\,f(x)\overline{g(x)}\,dx$. Here $x$ is real, so $\overline{x} = x$, so $\langle\phi(f), g\rangle = \langle f, \phi(g)\rangle$, which means that $\phi$ is self-adjoint.
(b) If $\phi(f) = \lambda f$ then $x f(x) = \lambda f(x)$ for all $x \in \mathbb{R}$, so $(x - \lambda)f(x) = 0$ for all $x \in \mathbb{R}$. If $x \neq \lambda$ then we can divide by $x - \lambda$ to see that $f(x) = 0$. If $\lambda \notin \mathbb{R}$ then this finishes the argument: $x$ can never be equal to $\lambda$, so $f(x) = 0$ for all $x \in \mathbb{R}$, so $f = 0$. If $\lambda$ is real then we need one extra step: we have seen that $f(x) = 0$ for all $x \neq \lambda$, but $f$ is continuous so it cannot jump away from zero at $x = \lambda$, so $f(\lambda) = 0$ as well, so $f = 0$.

Exercise 16.6: We first note that
\[ \gamma\begin{bmatrix} a & b\\ c & d\end{bmatrix}
= \begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}\begin{bmatrix} a & b\\ c & d\end{bmatrix} - \begin{bmatrix} a & b\\ c & d\end{bmatrix}\begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}
= \begin{bmatrix} c & d\\ a & b\end{bmatrix} - \begin{bmatrix} b & a\\ d & c\end{bmatrix}
= \begin{bmatrix} c - b & d - a\\ a - d & b - c\end{bmatrix}. \]

(a) The standard basis for $M_2\mathbb{R}$ consists of the matrices
\[ E_1 = \begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix} \qquad E_2 = \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix} \qquad E_3 = \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix} \qquad E_4 = \begin{bmatrix} 0 & 0\\ 0 & 1\end{bmatrix}. \]
We have
\begin{align*}
\gamma(E_1) &= \begin{bmatrix} 0 & -1\\ 1 & 0\end{bmatrix} = -E_2 + E_3 \\
\gamma(E_2) &= \begin{bmatrix} -1 & 0\\ 0 & 1\end{bmatrix} = -E_1 + E_4 \\
\gamma(E_3) &= \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix} = E_1 - E_4 \\
\gamma(E_4) &= \begin{bmatrix} 0 & 1\\ -1 & 0\end{bmatrix} = E_2 - E_3,
\end{align*}

so the matrix of $\gamma$ with respect to $E_1, E_2, E_3, E_4$ is
\[ \begin{bmatrix} 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 1 & 0 & 0 & -1\\ 0 & 1 & -1 & 0 \end{bmatrix}. \]
(b) From our formulae for $\gamma(E_i)$, we see that the matrices $U = E_1 - E_4 = \begin{bmatrix} 1 & 0\\ 0 & -1\end{bmatrix}$ and $V = E_2 - E_3 = \begin{bmatrix} 0 & 1\\ -1 & 0\end{bmatrix}$ form a basis for $\operatorname{image}(\gamma)$, and the matrices $I = E_1 + E_4$ and $T = E_2 + E_3$ form a basis for $\ker(\gamma)$. The matrices $E_1, \dots, E_4$ are orthonormal with respect to the usual inner product, so we have
\begin{align*}
\langle U, I\rangle &= \langle E_1 - E_4, E_1 + E_4\rangle = 1.1 + (-1).1 = 0 \\
\langle U, T\rangle &= \langle E_1 - E_4, E_2 + E_3\rangle = 0 \\
\langle V, I\rangle &= \langle E_2 - E_3, E_1 + E_4\rangle = 0 \\
\langle V, T\rangle &= \langle E_2 - E_3, E_2 + E_3\rangle = 1.1 + (-1).1 = 0.
\end{align*}
This shows that $\ker(\gamma) = \operatorname{span}\{I, T\}$ is orthogonal to $\operatorname{image}(\gamma) = \operatorname{span}\{U, V\}$. As the dimensions of these two subspaces add up to the dimension of the whole space, the subspaces must be orthogonal complements of each other.
(c) One checks that $\gamma(U) = -2V$ and $\gamma(V) = -2U$. It follows that $\gamma^2(U) = 4U$, and so $\gamma^4(U) = \gamma^2(4U) = 16U = 4\gamma^2(U)$. Similarly, we have $\gamma^4(V) = 16V = 4\gamma^2(V)$. As $\gamma(I) = \gamma(T) = 0$, we also have $\gamma^4(I) = 0 = 4\gamma^2(I)$ and $\gamma^4(T) = 0 = 4\gamma^2(T)$. Thus $\gamma^4$ and $4\gamma^2$ have the same effect on $U$, $V$, $I$ and $T$, which span $M_2\mathbb{R}$, so $\gamma^4 = 4\gamma^2$. Alternatively, we can just square the matrix in part (a) twice to see that
\[ \gamma^2 \sim \begin{bmatrix} 2 & 0 & 0 & -2\\ 0 & 2 & -2 & 0\\ 0 & -2 & 2 & 0\\ -2 & 0 & 0 & 2 \end{bmatrix} \qquad
\gamma^4 \sim \begin{bmatrix} 8 & 0 & 0 & -8\\ 0 & 8 & -8 & 0\\ 0 & -8 & 8 & 0\\ -8 & 0 & 0 & 8 \end{bmatrix}, \]
which again shows that $\gamma^4 = 4\gamma^2$.
(d) As $T$ and $I$ lie in the kernel of $\gamma$, they are eigenvectors with eigenvalue $0$. As $\gamma(U) = -2V$ and $\gamma(V) = -2U$ we see that $\gamma(U + V) = -2(U + V)$ and $\gamma(U - V) = 2(U - V)$. It follows that $\{I, T, U - V, U + V\}$ is a basis consisting of eigenvectors.
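
A numpy check of parts (a), (c) and (d), using $\gamma(A) = JA - AJ$ with $J = \begin{bmatrix} 0 & 1\\ 1 & 0\end{bmatrix}$ as computed at the start of the exercise:

import numpy as np

J = np.array([[0, 1], [1, 0]], float)
gamma = lambda A: J @ A - A @ J        # gamma(A) = JA - AJ, as computed above

# Matrix of gamma with respect to the basis E1, E2, E3, E4 (columns = images of basis matrices)
E = [np.array(M, float) for M in ([[1,0],[0,0]], [[0,1],[0,0]], [[0,0],[1,0]], [[0,0],[0,1]])]
G = np.column_stack([gamma(Ei).reshape(4) for Ei in E])
print(G.astype(int))                    # the 4x4 matrix from part (a)

print(np.allclose(np.linalg.matrix_power(G, 4), 4 * np.linalg.matrix_power(G, 2)))  # True: gamma^4 = 4*gamma^2

U, V = E[0] - E[3], E[1] - E[2]
print(np.allclose(gamma(U + V), -2*(U + V)), np.allclose(gamma(U - V), 2*(U - V)))  # True True: eigenvectors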