
MATH 247: Honours Applied Linear Algebra
Anna Brandenberger, Daniela Breitman
May 11, 2018

This is a transcript of the lectures given by Prof. Axel Hundemer during the winter semester of the 2017-2018 academic year (01-04 2018) for the Honours Applied Linear Algebra class (math 247). Subjects covered are: Matrix algebra, determinants, systems of linear equations; Abstract vector spaces, inner product spaces, Fourier series; Linear transformations and their matrix representations; Eigenvalues and eigenvectors, diagonalizable and defective matrices, positive definite and semidefinite matrices; Quadratic and Hermitian forms, generalized eigenvalue problems, simultaneous reduction of quadratic forms; and Applications.

1 Abstract Vector Spaces

1.1 Group Theory (lecture 01/09)

Definition 1.1. Group: Let G be a set with a binary operation + : G × G → G, (a, b) ↦ a + b, such that:

• associativity a + (b + c) = (a + b) + c, ∀a, b, c ∈ G.

• there exists a neutral element ∃0 ∈ G s.t. a + 0 = 0 + a = a, ∀a ∈ G.

• there exists an (additive) inverse element ∃ − a ∈ G s.t. a + (−a) = (−a) + a = 0, ∀a ∈ G.

Definition 1.2. A group (G, +) (a group G with operation +) is called Abelian if commutativity is satisfied:

• ∀a, b ∈ G, a + b = b + a.

Examples 1.1. of Abelian groups:

• (R, +), (C, +), (Q, +), (Z, +) standard addition.

• (R\{0}, ·), (C\{0}, ·) with standard multiplication. (However, N = {1, 2, 3, ...} with standard addition is not a group, because no n ∈ N has an inverse element in N.)

• Rⁿ, Cⁿ with component-wise addition: (x1, x2, ..., xn) + (y1, y2, ..., yn) = (x1 + y1, ..., xn + yn)

• Mat(m × n, R), Mat(m × n, C): m × n matrices with coefficients in R or C, with component-wise addition.

• the set F(R) of all real-valued functions with domain R, with:

– addition: (f + g)(x) ≡ f(x) + g(x)
– neutral element: the zero function 0(x) ≡ 0
– inverse element: (−f)(x) ≡ −f(x), ∀x ∈ R

Example 1.2. of a non-Abelian group:

• GL(n, R), n ≥ 2: the "General Linear Group", i.e. the set of all invertible n × n matrices with coefficients in R, with matrix multiplication.
– neutral element: the identity matrix In (1s on the diagonal, 0s elsewhere), s.t. In · A = A · In = A, ∀A ∈ GL(n, R).
– the inverse of A is A⁻¹, since A · A⁻¹ = A⁻¹ · A = In.
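As a quick numerical illustration of the non-commutativity (a NumPy sketch added for illustration; the two invertible matrices are arbitrary choices):

import numpy as np

# Two invertible 2x2 matrices (nonzero determinant), chosen arbitrarily.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# Matrix multiplication has I_2 as neutral element and every element of
# GL(2, R) has an inverse, but the operation is not commutative:
print(A @ B)                                          # [[2. 1.] [1. 1.]]
print(B @ A)                                          # [[1. 1.] [1. 2.]]
print(np.allclose(A @ B, B @ A))                      # False
print(np.allclose(np.linalg.inv(A) @ A, np.eye(2)))   # True: A^{-1} A = I_2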

Theorem 1.1. Cancellation Law: Let (G, +) be a group and a, b, c ∈ G:

(a) a + b = a + c =⇒ b = c

(b) b + a = c + a =⇒ b = c

Proof of (a):

a + b = a + c
=⇒ −a + (a + b) = −a + (a + c)   (existence of an inverse)
=⇒ (−a + a) + b = (−a + a) + c   (associativity)
=⇒ 0 + b = 0 + c   (definition of the neutral element)
=⇒ b = c   (neutral element)

Exercise 1.3. Prove (b).

1.2 Vector Spaces (lecture 01/11)

Definition 1.3. Let V be a set and let K be either R or C, together with two operations:
+ : V × V → V   (vector addition)
· : K × V → V   (scalar multiplication)
such that (V, +) is an abelian group. If the following axioms for scalar multiplication hold ∀k, l ∈ K and u, v ∈ V:

(kl)v = k(lv)   (associativity)
(k + l)v = kv + lv   (distributivity of scalars over vectors)
k(u + v) = ku + kv   (distributivity of vectors over scalars)
1 · v = v   (neutral element of scalar multiplication)

(The neutral element of (V, +) is called the zero vector.)

then (V, +, ·) is called a vector space over K. (K stands for Körper ≈ field in German.)

Examples 1.4. of vector spaces:

• (Rⁿ, +, ·) where + and · are standard addition of tuples and standard scalar multiplication.

• Mat(n × m, K) with + the standard addition of matrices and · the standard multiplication by scalars. (Checking the axioms: (Mat(n × m, K), +) is an abelian group: obvious; the axioms for scalar multiplication are also satisfied =⇒ vector space.)

• Let s be the set of all real sequences, i.e. (a1, a2, ...) with ai ∈ R, together with component-wise addition and scalar multiplication. The axioms of a VS follow immediately from the fact that the operations are component-wise.

• Let c be the set of all convergent real sequences together with component-wise + and ·. The only thing one must check is that c is closed under + and · (i.e. that the sum of two convergent sequences is convergent and a scalar multiple of a convergent sequence is convergent), which is proven in Analysis 1.

• Let c0 be the set of all real sequences converging to 0, together with component-wise + and ·. This set is closed under + and · since the sum of two sequences with limit 0 has limit 0, and similarly for scalar multiplication.

• Let c00 be the set of all real sequences that are eventually 0 (i.e. all but finitely many terms of these sequences are equal to 0), with component-wise + and ·.

• Let I ⊆ R be an interval. Let F(I) be the set of all real-valued functions on the interval I, together with + and · defined by:

( f + g)(x) ≡ f (x) + g(x) (k f )(x) ≡ k f (x)

• The set of all solutions to y′′ + y = 0 or, more generally, the set of solutions to any linear differential equation an y⁽ⁿ⁾ + an−1 y⁽ⁿ⁻¹⁾ + ··· + a1 y′ + a0 y = 0, ai ∈ R. We just need to verify that the solutions are closed under + and ·. Let y1, y2 be two solutions of y′′ + y = 0, i.e. y1′′ + y1 = 0 and y2′′ + y2 = 0, and k an arbitrary constant. Then

(y1 + y2)′′ + (y1 + y2) = y1′′ + y2′′ + y1 + y2 = (y1′′ + y1) + (y2′′ + y2) = 0
(ky1)′′ + (ky1) = k(y1′′ + y1) = 0

=⇒ y1 + y2 and ky1 are solutions of y′′ + y = 0.

Some consequences of the axioms of a vector space:

Theorem 1.2. Let (V, +, ·) be a VS over K. Then
(a) k · 0 = 0 ∀k ∈ K
(b) 0 · v = 0 ∀v ∈ V
(c) k · v = 0 ⇐⇒ k = 0 ∨ v = 0

Proof. of (a)

k · 0 = k · (0 + 0)   (neutral element of +)
= k · 0 + k · 0   (distributivity)
=⇒ 0 + k · 0 = k · 0 + k · 0   (neutral element)
=⇒ 0 = k · 0   (cancellation law)

Proof. of (b)

0 · v = (0 + 0) · v   (neutral element)
= 0 · v + 0 · v   (distributivity of scalars)
=⇒ 0 + 0 · v = 0 · v + 0 · v   (neutral element)
=⇒ 0 = 0 · v   (cancellation law)

Proof. of (c): k · v = 0.
"⇐" follows from (a) and (b).
"⇒": if k = 0, we are done. So let k ≠ 0: we must show that v = 0.
kv = 0 =⇒ (1/k)(kv) = (1/k) · 0 = 0
But (1/k)(kv) = ((1/k)k)v = 1 · v = v   (associativity)
=⇒ v = 0

Theorem 1.3. Let (V, +, ·) be a VS over K. Then:

(a)n · v = v + ··· + v (n times), ∀n ∈ N

(b) 0 · v = 0

(c) (−1) · v = −v, ∀v ∈ V

Proof. of (a): by Induction. Base case n = 1: 1 · v = v is an axiom. Inductive step n → n + 1: assume nv = v + ··· + v for some n ∈ N.

(n + 1)v = nv + 1 · v = (v + ··· + v) + v   (n times)   = v + ··· + v   (n + 1 times)

Proof. of (b): already done (Theorem 1.2(b)).

Proof. of (c)

v + (−1)v = 1 · v + (−1) · v = (1 + (−1))v   (distributivity) = 0 · v = 0   (previously proven)

=⇒ (−1)v is the additive inverse of v, i.e. (−1)v = −v.

1.3 Subspaces (lecture 01/16)

Definition 1.4. Let (V, +, ·) be a VS over K and let W ⊆ V. W is called a subspace of V, in symbols W ≤ V, if (W, +, ·) is a VS over K.

Theorem 1.4. Let (V, +, ·) be a VS over K and let W ⊆ V. Then W ≤ V iff W is closed under + and · and 0 ∈ W.

Examples:
1. The subspaces of R² are {0}, all lines through the origin, and R² itself.
2. V ≡ Mat(n × n, K): U ≤ V, where U is the set of all upper triangular n × n matrices with coefficients in K, U ≡ {A = (aij) ∈ Mat(n × n, K) : aij = 0 ∀i > j, 1 ≤ i, j ≤ n}.
3. Sequences: c00 ≤ c0 ≤ c ≤ s.
4. Let Pn(K) be the set of all polynomials of degree at most n, and let P(K) be the set of all polynomials, with standard + and ·. Then P0(K) ≤ P1(K) ≤ ··· ≤ Pn(K) ≤ ··· ≤ P(K).

Proof.
("⇒") Since (W, +, ·) is a VS over K, W is closed under + and ·. Furthermore, 0 ∈ W since (W, +) is an (abelian) group and therefore contains a neutral element.

("⇐") Must check the axioms:
• addition: associativity and commutativity hold on W because they hold on V.
• neutral element: 0 ∈ W by assumption.
• additive inverse: let u ∈ W. Since W is closed under ·, we have (−1)u = −u ∈ W.

All axioms of addition hold. All axioms of scalar multiplication hold on W since they hold on V. =⇒ W is a VS w.r.t. + and ·, i.e. W ≤ V.

Theorem 1.5.

(a) Let U, W ≤ V, V a VS over K. Then U ∩ W ≤ V.
(b) Let U1, ..., Un ≤ V. Then U1 ∩ ··· ∩ Un = ⋂_{i=1}^{n} Ui ≤ V.
(c) Let I be an arbitrary index set and Ui ≤ V ∀i ∈ I. Then ⋂_{i∈I} Ui ≤ V.

Remark: unions are in general not subspaces! For example, take V = R² and let U and W be the x- and y-axes. Their union is not closed under +: adding vectors from U and W can yield any vector in R². In fact, U ∪ W ≤ V ⇐⇒ U ⊆ W ∨ W ⊆ U.

Proof. of (a) [(b) and (c) left as exercises]
• 0 ∈ U, 0 ∈ W =⇒ 0 ∈ U ∩ W.
• Let v1, v2 ∈ U ∩ W; in particular v1, v2 ∈ U =⇒ v1 + v2 ∈ U. Similarly, v1 + v2 ∈ W. So v1 + v2 ∈ U ∩ W =⇒ U ∩ W is closed under +.
• v1 ∈ U =⇒ kv1 ∈ U ∀k ∈ K, and v1 ∈ W =⇒ kv1 ∈ W ∀k ∈ K =⇒ kv1 ∈ U ∩ W. So U ∩ W is closed under ·.

=⇒ U ∩ W ≤ V

Definition 1.5. Sum of subspaces. Let V be a VS over K, U, W ≤ V; then U + W ≡ {u + w : u ∈ U, w ∈ W}.

Theorem 1.6. Let V be a VS over K, U, W ≤ V, then

(a) U + W ≤ V

(b) U + W is the smallest subspace of V containing U ∪ W, i.e. if V˜ is any subspace of V with U ∪ W ⊆ V˜, then U + W ⊆ V˜.

Proof. of a)

• 0 ∈ U, 0 ∈ W =⇒ 0 ∈ U + W, since 0 + 0 = 0.

• Let v1, v2 ∈ U + W. Then ∃u1, u2 ∈ U, w1, w2 ∈ W such that v1 = u1 + w1 and v2 = u2 + w2, so v1 + v2 = (u1 + u2) + (w1 + w2) ∈ U + W, since u1 + u2 ∈ U and w1 + w2 ∈ W.

• kv1 = k(u1 + w1) = ku1 + kw1 ∈ U + W, since ku1 ∈ U and kw1 ∈ W. =⇒ U + W ≤ V

Proof. of (b): Let V˜ be any subspace of V with U ∪ W ⊆ V˜. Let u ∈ U, w ∈ W be arbitrary; then u, w ∈ V˜, and since V˜ is closed under +, u + w ∈ V˜. Hence every element of U + W lies in V˜, i.e. U + W ⊆ V˜. So U + W is the smallest subspace of V that contains U ∪ W.

In a similar way we define U1 + ··· + Un = {u1 + ··· + un : uk ∈ Uk ∀1 ≤ k ≤ n}, where U1, ···, Un ≤ V; then U1 + ··· + Un ≤ V.

1.4 Span and Linear Independence

Definition 1.6. Linear Combination. Let V be a VS over K, v1,..., vn ∈ V, k1,..., kn ∈ K. Then an expression k1v1 + ··· + knvn is a linear combination of v1,..., vn.

This notion can be extended to infinitely many vectors: k1v1 + k2v2 + ··· where all but finitely many of the ki are 0, i.e. their sum is actually finite and hence defined (infinitely many additions are not defined).

Definition 1.7. Span. Let V be a VS over K, v1, ..., vn ∈ V. Then span{v1, ..., vn} = {k1v1 + ··· + knvn : ki ∈ K, 1 ≤ i ≤ n}.

Note that:
span{v1, ..., vn} = span{v1} + ··· + span{vn}
= {k1v1 : k1 ∈ K} + ··· + {knvn : kn ∈ K}
= {k1v1 + ··· + knvn : ki ∈ K} ≤ V

For infinitely many vectors v1, v2, ... ∈ V, span{v1, v2, ...} = {k1v1 + k2v2 + ··· : ki ∈ K, all but finitely many are 0}.

Note that span{v1, ..., vn} is the smallest subspace of V that contains all the vectors v1, ..., vn.

Exercise 1.5. Prove directly that span{v1, ..., vn} ≤ V.

Examples 1.6.

1. V = Rⁿ, then Rⁿ = span{e1, ..., en}, where
e1 = (1, 0, 0, ..., 0)
e2 = (0, 1, 0, ..., 0)
···
en = (0, 0, 0, ..., 1)

2. Consider the VS Pn(K). Then span{1, x, x², ..., xⁿ} = {k0 · 1 + k1x + ··· + knxⁿ : ki ∈ K, 0 ≤ i ≤ n} = Pn(K).

3. Consider the VS s of all sequences with coefficients in R. Define
s1 = (1, 0, 0, ...)
s2 = (0, 1, 0, ...)
s3 = (0, 0, 1, ...)
···
Then span{s1, s2, ...} = c00. Note that {s1, s2, ...} does not span s: every LC of the si results in a sequence that eventually constantly equals 0.

Definition 1.8. Linear Independence and Dependence. Let V be a VS over K, v1, ..., vn ∈ V. We say that v1, ..., vn are linearly independent if k1v1 + ··· + knvn = 0 ⇔ k1 = k2 = ··· = kn = 0. We say that v1, ..., vn are linearly dependent if they are not linearly independent.
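For vectors in Rⁿ this can be checked numerically: v1, ..., vk are LI iff the matrix with the vi as columns has rank k. A small NumPy sketch (added for illustration; the example vectors are arbitrary choices):

import numpy as np

def linearly_independent(vectors):
    # Vectors in R^n are LI iff the matrix having them as columns
    # has full column rank.
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

print(linearly_independent([np.array([1., 0., 0.]),
                            np.array([0., 1., 0.])]))   # True
print(linearly_independent([np.array([1., 2., 3.]),
                            np.array([2., 4., 6.])]))   # False: second = 2 * first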

(lecture 01/18)

Theorem 1.7. Let V be a VS over K.
(a) Any (finite) subset of V that contains 0 is LD (linearly dependent).

(b) If v1,..., vn are LD, there exists at least one j, 1 ≤ j ≤ n such that vj is a LC of v1,..., vj−1, vj+1,..., vn. In that case, span{v1,..., vn} = span{v1,..., vj−1, vj+1,..., vn} = span{v1,..., vˆj,..., vn}.

Proof. (a) Consider 0, v1, ..., vn. Then 1 · 0 + 0 · v1 + ··· + 0 · vn = 0 is a non-trivial LC of 0, v1, ..., vn, since the coefficient of the zero vector is non-zero. (Remark: even {0} is LD, since 1 · 0 = 0 is a non-trivial LC of the zero vector resulting in the zero vector.)

(b) Let v1, ..., vn be LD. Thus ∃k1, ..., kn ∈ K, not all of them 0, such that k1v1 + ··· + kjvj + ··· + knvn = 0. Let kj ≠ 0.

=⇒ kjvj = −k1v1 − ··· − kj−1vj−1 − kj+1vj+1 − ··· − knvn   (1)
=⇒ vj = −(k1/kj)v1 − ··· − (kn/kj)vn, with the term for vj omitted   (2)

Thus vj is a LC of {v1, ..., vˆj, ..., vn}. Regarding spans: obviously span{v1, ..., vˆj, ..., vn} ⊆ span{v1, ..., vn}. We still need to prove that span{v1, ..., vn} ⊆ span{v1, ..., vˆj, ..., vn}. Let v = a1v1 + ··· + ajvj + ··· + anvn ∈ span{v1, ..., vn}.

Then by (2):

v = a1v1 + ··· + aj(−(k1/kj)v1 − ··· − (kn/kj)vn) + ··· + anvn   (term for vj omitted inside the parentheses)

=⇒ v = (a1 − aj k1/kj)v1 + ··· + (aj−1 − aj kj−1/kj)vj−1 + (aj+1 − aj kj+1/kj)vj+1 + ··· + (an − aj kn/kj)vn

which is a LC of v1,..., vˆj,... vn.

Theorem 1.8. Let v1,..., vn ∈ V, linearly independent. Let vn+1 ∈/ span{v1,..., vn}. Then v1,..., vn, vn+1 are LI.

Proof. Let k1v1 + ··· + knvn + kn+1vn+1 = 0. We need to show that k1 = ··· = kn = kn+1 = 0. Assume that kn+1 ≠ 0. Then,

kn+1vn+1 = −k1v1 − · · · − knvn

=⇒ vn+1 = −(k1/kn+1)v1 − ··· − (kn/kn+1)vn

=⇒ vn+1 ∈ span{v1, ..., vn}, a contradiction. Thus the assumption was wrong, =⇒ kn+1 = 0.
=⇒ k1 = ··· = kn = kn+1 = 0, since v1, ..., vn are LI.
=⇒ v1, ..., vn, vn+1 are LI.

Remark: Let V be a VS over K, W ≤ V. Assume that V is finite dimensional. We currently cannot conclude that W is finite dimensional. This will be proven later.

1.5 Bases and Dimension

Definition 1.9. Let V be a VS over K. V is called finite dimensional if V has a finite spanning set, i.e. there are finitely many v1, ..., vn ∈ V s.t. V = span{v1, ..., vn}. V is called infinite dimensional if it is not finite dimensional.

Caution: it is not enough to construct an infinite spanning set for V to prove that it is infinite dimensional! Rather, we need to show that no finite spanning set can possibly exist.

Examples 1.7. of finite and infinite dimensional VSes:

1. Rⁿ is finite dim.: Rⁿ = span{e1, ..., en}.

2. c00 is inf. dim.

Proof. Assume that c00 is finite dim.; let s1, ..., sn ∈ c00 s.t. c00 = span{s1, ..., sn}. Each sj, 1 ≤ j ≤ n, is in c00, thus ∃Nj s.t. sjk = 0 ∀k ≥ Nj. Let N ≡ max{N1, ..., Nn} and write sj = (sj1, sj2, ...). Then:

sjk = 0 ∀1 ≤ j ≤ n, ∀k ≥ N. The same holds for every s ∈ span{s1, ..., sn}: writing s = (s1, s2, ...), we have sk = 0 ∀k ≥ N. Thus e.g. s̃ = (0, 0, ..., 0, 1, 0, ...) ∈ c00 (1 at the N-th position) is not in the span of s1, ..., sn; contradiction.

Theorem 1.9. Steinitz Exchange Lemma.

Let V be a finite dimensional VS over K. Let u1, ..., um ∈ V be LI and let v1, ..., vn be spanning, i.e. V = span{v1, ..., vn}. Then m ≤ n and there exist indices k1, ..., kn−m s.t. u1, ..., um, vk1, ..., vk(n−m) is spanning. (I.e. we exchange m of the vectors v1, ..., vn by u1, ..., um without changing the span.)

Proof. This is proven iteratively, in m steps, replacing one of the vj by one of the u-s at each step.

1. step: Since span{v1,..., vn} = V, ∃a1,..., an ∈ K s.t. u1 = a1v1 + ··· + anvn. Note that u1 6= 0 since u1,..., um are LI. Thus at least one of the coefficients a1,..., an 6= 0. By potentially reordering v1,..., vn, we may w.l.o.g. assume that a1 6= 0.

=⇒ a1v1 = u1 − a2v2 − ··· − anvn
v1 = (1/a1)u1 − (a2/a1)v2 − ··· − (an/a1)vn

=⇒ v1 is a LC of u1, v2,..., vn. =⇒ span{v1, v2,..., vn} = V = span{u1, v2,..., vn}.

2. step: Since u1, v2,..., vn is spanning, ∃b1, a2,..., an ∈ K s.t. u2 = b1u1 + a2v2 + ··· + anvn. We want to show that at least one of the aj is non-zero. Assume a2 = ··· = an = 0. Then, u2 = b1u1 =⇒ b1u1 − 1 · u2 = 0 which is a non-trivial combination of u1, u2 resulting in the zero vector. But u1, u2 are LI, thus at least one aj 6= 0. w.l.o.g assume that a2 6= 0.

a2v2 = u2 − b1u1 − a3v3 − · · · − anvn

v2 = (1/a2)u2 − (b1/a2)u1 − (a3/a2)v3 − ··· − (an/a2)vn

=⇒ v2 is a LC of u1, u2, v3, ..., vn. Thus V = span{u1, v2, v3, ..., vn} = span{u1, u2, v3, ..., vn}.

···

After the m-th step, we obtain a spanning set {u1, ..., um, vm+1, ..., vn} of V. In particular, m ≤ n.

(lecture 01/23)

Theoretical Applications of the Steinitz Exchange Lemma

Theorem 1.10. Every subspace of a finite dimensional VS over K is finite dimensional.

Proof. Let V be a finite dimensional VS, U ≤ V. V is finite dimensional, thus it has a finite spanning set {v1, ···, vn}. We need to show that U has a finite spanning set. We proceed recursively. Assume this is not the case. Then U ≠ {0}. Let u1 ≠ 0, u1 ∈ U. Then {u1} is linearly independent but not spanning. Thus ∃u2 ∈ U with u2 ∉ span{u1}. By the previous theorem, {u1, u2} is linearly independent; however, it is not spanning, thus ∃u3 ∉ span{u1, u2}, etc. After n + 1 iterations, we've obtained a linearly independent set {u1, ···, un+1} ⊆ U ⊆ V. But V has a spanning set {v1, ···, vn} containing n elements! =⇒ contradiction to Steinitz! Our assumption was wrong. Thus U is finite dimensional.

Examples 1.8.
1. The VS s of all sequences with coefficients in K is infinite dimensional.
2. Indeed, we already know that c00 is infinite dimensional and c00 ≤ s. If s were finite dimensional then so would be c00 by the previous theorem, but c00 is infinite dimensional =⇒ s is infinite dimensional.
3. Similarly, c0 is also infinite dimensional since c00 ≤ c0.

Exercise 1.9. Prove that any superspace of an infinite dimensional VS over K is infinite dimensional.

Definition 1.10. Basis. Let V be a VS over K. A subset B of V is called a basis of V if:

(i) B is linearly independent.

(ii) B is spanning.

Examples 1.10.
1. V = Rⁿ, B = {e1, ···, en}.
2. V = Cⁿ, B = {e1, ···, en}.
3. V = Pn, polynomials of degree ≤ n, B = {1, x, x², ···, xⁿ}. (Showing that B is a basis for V: B is spanning. It is also LI: let a0, a1, ···, an ∈ K such that a0 + a1x + a2x² + ··· + anxⁿ = 0 (the zero polynomial) = 0 + 0 · x + 0 · x² + ··· + 0 · xⁿ ∀x ∈ K; two polynomials are identically equal iff they have the same coefficients, so a0 = ··· = an = 0.)
4. V = Mat(m × n, K). Let Mij = (mkl), where mkl = 1 if (k, l) = (i, j) and mkl = 0 if (k, l) ≠ (i, j); then {Mij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for V (see Assignment 3).

5. m = n = 2: the four matrices M11, M12, M21, M22 from Example 4 form a basis for Mat(2 × 2, K).

Q: Does every VS have a basis? A: Yes, but we'll prove this only in the finite dimensional case.

Theorem 1.11. Every finite dimensional VS has a basis.

Proof. (Casting Out Algorithm, via Steinitz.) Let V be finite dimensional; by definition, V has a finite spanning set {v1, ···, vn}. If {v1, ···, vn} is linearly independent, it is a basis and we are done. If {v1, ···, vn} is linearly dependent, ∃j : 1 ≤ j ≤ n such that vj is a linear combination of v1, ···, vˆj, ···, vn. W.l.o.g., assume that j = n. As proven in the previous theorem, we then have span{v1, ···, vn} = span{v1, ···, vn−1} =⇒ {v1, ···, vn−1} spans V. After repeating this argument finitely many times, we obtain v1, ···, vk, k ≤ n, such that {v1, ···, vk} is spanning and linearly independent, and thus a basis.

Q: What is a basis for {0}? A: B = {}, since span{} = {0}.

Another version of the proof, the Bottom Up Algorithm: Let V be a finite dimensional VS over K. Say V has a spanning set of n elements. If V = {0}, B = {} and we're done. If not, pick any v1 ∈ V, v1 ≠ 0; then {v1} is linearly independent. If it is also spanning, we're done. If not, let v2 ∈ V be any vector with v2 ∉ span{v1}; then by the previous theorem, {v1, v2} is linearly independent. If {v1, v2} is spanning, we're done. If not, ∃v3 ∈ V : v3 ∉ span{v1, v2}, etc., until spanning. By Steinitz, this algorithm has to terminate after at most n steps. We obtain {v1, ···, vk}, k ≤ n, linearly independent and spanning, i.e. a basis.
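Both algorithms can be sketched numerically for subspaces of Rⁿ: keep a vector only if it is not a LC of the vectors kept so far, i.e. only if appending it increases the rank. A minimal NumPy sketch (added for illustration, with an arbitrary spanning set):

import numpy as np

def extract_basis(spanning_vectors):
    # Casting Out / Bottom Up idea: keep a vector only if it increases the rank,
    # i.e. only if it is not a linear combination of the vectors kept so far.
    basis = []
    for v in spanning_vectors:
        candidate = basis + [v]
        if np.linalg.matrix_rank(np.column_stack(candidate)) == len(candidate):
            basis.append(v)
    return basis

spanning = [np.array([1., 0., 0.]),
            np.array([2., 0., 0.]),   # LC of the first -> cast out
            np.array([0., 1., 0.]),
            np.array([1., 1., 0.])]   # LC of the kept vectors -> cast out
print(len(extract_basis(spanning)))   # 2 = dim of the spanned subspace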

Example 1.11. Polynomials of degree ≤ 2, coefficients ∈ K. We know that {1, x, x2} is a basis and thus spanning. Finding another basis.

Proof. Let v0 ≡ 2; then span{v0} ≠ P2. Let v1 ≡ 3 + 5x; then v1 ∉ span{v0} =⇒ {v0, v1} is linearly independent, but not spanning (no polynomial of degree 2 is in the span). Let v2 ≡ 7 − 5x + 11x²; then v2 ∉ span{v0, v1} =⇒ {v0, v1, v2} is linearly independent. It must be spanning as well, since otherwise ∃v3 ∉ span{v0, v1, v2} =⇒ {v0, v1, v2, v3} linearly independent =⇒ contradiction to Steinitz. =⇒ {v0, v1, v2} is linearly independent and spanning, =⇒ {v0, v1, v2} is a basis for P2.
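Numerically, this can be checked by writing the coordinate vectors of v0, v1, v2 w.r.t. (1, x, x²) as the columns of a matrix and verifying that it is invertible (a NumPy sketch, added for illustration):

import numpy as np

# Columns: coordinates of v0 = 2, v1 = 3 + 5x, v2 = 7 - 5x + 11x^2 w.r.t. (1, x, x^2).
M = np.array([[2., 3.,  7.],
              [0., 5., -5.],
              [0., 0., 11.]])

# The three polynomials form a basis of P_2 iff M has full rank (det != 0).
print(np.linalg.matrix_rank(M))   # 3
print(np.linalg.det(M))           # ~110, nonzero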

Theorem 1.12. Let V be a finite dimensional VS over K. Let {u1, ..., um} and {v1, ..., vn} be bases for V. Then m = n, i.e. any two bases for a finite dimensional vector space contain the same number of elements.

Proof. In particular, {u1, ···, um} is linearly independent and {v1, ···, vn} is spanning. By Steinitz =⇒ m ≤ n. But we also have that {v1, ···, vn} is linearly independent and {u1, ···, um} is spanning. By Steinitz =⇒ n ≤ m. ∴ n = m.

Remark: the notion of dimension is well-defined by the previous theorem.

Definition 1.11. Dimension. Let V be a finite dimensional VS over K. Let {v1, ···, vn} be any basis for V. Then n is called the dimension of V.

(lecture 01/25)

Examples 1.12. of dimension:

n 1. R as VS over R has dimension n since {e1,..., en} is a basis.

n 2. C as VS over C has dimension n since {e1,..., en} is a basis.

3. Pn(K) as a VS over K has dimension n + 1 since {1, x, x², ..., xⁿ} is a basis with n + 1 elements.

4. Mat(m × n, K) as a VS over K has dimension m · n since {Eij = (mkl), where mkl = 1 if (k, l) = (i, j) and 0 if (k, l) ≠ (i, j)} is a basis with m · n elements.

Some Theorems on Bases and Dimension

Theorem 1.13. Let V be a finite dimensional VS over K, B = {v1, ..., vn} a basis for V. Let v ∈ V. Then there exist uniquely determined

k1,..., kn ∈ K s.t. v = k1v1 + ··· + knvn.

Proof. B is spanning, thus v can be written as a LC of v1, ..., vn. Let v = k1v1 + ··· + knvn and v = l1v1 + ··· + lnvn.

=⇒ 0 = (k1v1 + ··· + knvn) − (l1v1 + ··· + lnvn)

= (k1v1 − l1v1) + ··· + (knvn − lnvn)

= (k1 − l1)v1 + ··· + (kn − ln)vn

Since v1,..., vn are LI we have that k1 − l1 = 0, . . . , kn − ln = 0. =⇒ k1 = l1,..., kn = ln. =⇒ v can be expressed in exactly one way as a LC of v1,..., vn.

Theorem 1.14. Let V be a finite dimensional VS over K, let U ≤ V. Then dimU ≤ dimV.

Remark: note that U is finite dim. since V is finite dim.; thus dim U is defined.

Proof. Let {u1, ..., um} be a basis for U and {v1, ..., vn} a basis for V. Note that {u1, ..., um} is LI and {v1, ..., vn} is spanning. By Steinitz (Theorem 1.9), m ≤ n =⇒ dim U ≤ dim V.

Theorem 1.15. Let V be a finite dimensional VS over K with dimV = n.

Let B = {v1,..., vn} be LI. Then B is a basis for V. math 247: honours applied linear algebra 13

Proof. B is LI. We need to show it is spanning. Assume it is not. Then ∃vn+1 ∈ V, vn+1 ∉ span{v1, ..., vn} =⇒ {v1, ..., vn, vn+1} is LI. But since dim V = n, V has a spanning set of length n; contradiction with Steinitz. Thus B is also spanning and ∴ a basis.

Theorem 1.16. Let V be a finite dimensional VS over K with dimV = n.

Let B = {v1,..., vn} be spanning. Then B is a basis for V.

Proof. B is spanning. We need to show that B is LI. Assume that B is LD. Then ∃j : 1 ≤ j ≤ n s.t. vj is a LC of v1, ···, vˆj, ···, vn and V = span{v1, ···, vn} = span{v1, ···, vˆj, ···, vn}, i.e. V has a spanning set of n − 1 vectors. But since dim V = n, it has a LI set of length n; contradiction with Steinitz. =⇒ B is also LI and thus a basis.

Example 1.13. V = P2(K), dim V = 3, B = {3, 5 − 9x, 7 + 3x + 13x²}. B is LI for degree reasons, thus B is also spanning.

Theorem 1.17. Let V be a finite dimensional VS over K, U ≤ V. Let

Bu = {u1,..., um} be a basis for U. Then Bu can be extended to a basis Bv for V, i.e. ∃ um+1,..., un ∈ V s.t. Bv ≡ {u1,..., um, um+1,..., un} is a basis for V.

Proof. This is the same argument we used for proving the existence of a basis via the Bottom Up Algorithm: we start with Bu in the initial step. (Exercise: work out the details!)

Theorem 1.18. Let V be a finite dimensional VS over K, U ≤ V,W ≤ V. Then dim(U + W) = dimU + dimW − dim(U ∩ W).

Proof. Let {v1, ··· , vk} be a basis for U ∩ W. U ∩ W ≤ U. ∴ by the previous theorem, we can extend this to a basis {v1, ··· , vk, uk+1, ··· , um} for U. Similarly, we can find wk+1, ··· , wn s.t. {v1, ··· , vk, wk+1, ··· , wn} is a basis for W.

Now consider the set S ≡ {v1, ··· , vk, uk+1, ··· , um, wk+1, ··· , wn}. We will show that S is a basis for U + W.

1. step: S spans U + W, i.e. span{S} = U + W. First show "⊆". Let v ∈ spanS be arbitrary. Then:

v = a1v1 + ··· + akvk ∈ U ∩ W

+ bk+1uk+1 + ··· + bmum ∈ U

+ ck+1wk+1 + ··· + cnwn ∈ W

where (ai, bj, cl) are scalars.

=⇒ v ∈ U + W ∴ span{S} ⊆ U + W Next show "⊇": Let v ∈ U + W be arbitrary.

Then ∃u ∈ U, w ∈ W s.t. v = u + w, where u ∈ span{v1, ···, vk, uk+1, ···, um} and w ∈ span{v1, ···, vk, wk+1, ···, wn}, so v ∈ span S. =⇒ U + W ⊆ span S =⇒ U + W = span{S}, i.e. S spans U + W.

2. step: S is LI. Let the as, bs and cs ∈ K be such that:

0 = a1v1 + ··· + akvk

+ bk+1uk+1 + ··· + bmum + ck+1wk+1 + ··· + cnwn   (*)

=⇒ a1v1 + ··· + akvk + bk+1uk+1 + ··· + bmum = −ck+1wk+1 − ··· − cnwn

The left-hand side lies in U and the right-hand side lies in W; since they are equal, both lie in U ∩ W.

Since {v1, ..., vk} is a basis for U ∩ W, ∃a′1, ···, a′k ∈ K s.t. −ck+1wk+1 − ··· − cnwn = a′1v1 + ··· + a′kvk

=⇒ 0 · v1 + ··· + 0 · vk − ck+1wk+1 − ··· − cnwn = a′1v1 + ··· + a′kvk + 0 · wk+1 + ··· + 0 · wn

Note that {v1, ···, vk, wk+1, ···, wn} is a basis for W and thus in particular LI. Thus, the coefficients in the previous equation are uniquely determined. =⇒ a′1 = ··· = a′k = 0 and ck+1 = ··· = cn = 0. Plugging this into (*):

0 = a1v1 + ··· + akvk + bk+1uk+1 + ··· + bmum

But {v1, ···, vk, uk+1, ···, um} is a basis for U. Thus a1 = ··· = ak = 0 and bk+1 = ··· = bm = 0. This proves that S is LI. Hence S is a basis for U + W, and

dim(U + W) = |S| (size of S) = k + (m − k) + (n − k) = m + n − k = dim U + dim W − dim(U ∩ W)

Example 1.14. Any two planes through the origin in R³ have non-trivial intersection, i.e. they intersect in at least a line through the origin.

Proof. Planes through the origin are the 2-dim subspaces of R³. Let U, W be two such planes. Then:
dim(U ∩ W) = dim U + dim W − dim(U + W) = 2 + 2 − dim(U + W) = 4 − dim(U + W) ≥ 4 − 3 = 1   (since U + W ≤ R³)
=⇒ the intersection of the planes is a subspace of R³ of dim ≥ 1 and is thus at least a line.
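A quick numerical check of this formula (a NumPy sketch added for illustration; the two planes are arbitrary choices): dim(U + W) is the rank of the combined spanning vectors, and dim(U ∩ W) then follows from the formula.

import numpy as np

U = [np.array([1., 0., 0.]), np.array([0., 1., 1.])]   # a plane through the origin
W = [np.array([1., 1., 0.]), np.array([0., 0., 1.])]   # another plane through the origin

dim_U = np.linalg.matrix_rank(np.column_stack(U))
dim_W = np.linalg.matrix_rank(np.column_stack(W))
dim_sum = np.linalg.matrix_rank(np.column_stack(U + W))   # U + W is spanned by both lists together

dim_intersection = dim_U + dim_W - dim_sum                # dim(U ∩ W)
print(dim_U, dim_W, dim_sum, dim_intersection)            # 2 2 3 1 -> the planes meet in a line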

1.6 Direct Sums (lecture 01/30)

Recall: V a VS over K, U ≤ V, W ≤ V; then U + W ≡ {u + w : u ∈ U, w ∈ W} and U + W ≤ V.

Definition 1.12. A sum U + W of subspaces U, W of V is called direct (in symbols, U ⊕ W) if ∀v ∈ U + W, the representation v = u + w, with u ∈ U, w ∈ W, is uniquely determined. Similarly, a sum U1 + ··· + Un of subspaces of V is called direct if ∀v ∈ U1 + ··· + Un, the representation v = u1 + ··· + un, uj ∈ Uj ∀1 ≤ j ≤ n, is uniquely determined.

Theorem 1.19. A sum U + W of subspaces of V is direct iff U ∩ W = {0}.

Proof. "⇒" Let U + W be direct. Let v ∈ U ∩ W. Then 0 = 0 + 0 = v + (−v), where in the first decomposition 0 ∈ U and 0 ∈ W, and in the second v ∈ U and −v ∈ W. Since the representation of 0 ∈ U + W is unique, v = 0 =⇒ U ∩ W = {0}.
"⇐" Let v ∈ U + W be arbitrary. Let v = u1 + w1 and v = u2 + w2, for u1, u2 ∈ U, w1, w2 ∈ W.

v = u1 + w1

v = u2 + w2

0 = (u1 − u2) + (w1 − w2)

=⇒ u1 − u2 = w2 − w1, where u1 − u2 ∈ U and w2 − w1 ∈ W; hence both sides lie in U ∩ W

=⇒ u1 − u2 = 0 = w2 − w1

=⇒ u1 = u2 ∧ w1 = w2

=⇒ the representation of v is uniquely determined =⇒ U + W is direct.

Exercise 1.15. Let v1, ···, vn ∈ V be LI; then span{v1} + ··· + span{vn} is direct.

2 Linear Maps (lecture 01/30)

2.1 Mappings and Associated Subspaces

Definition 2.1. Let V and W be VS over K. A function L : V → W is called linear if: (Same K for both V and W.)

1. L(v1 + v2) = L(v1) + L(v2) ∀v1, v2 ∈ V   (note L(v1), L(v2) ∈ W)
2. L(kv1) = kL(v1) ∀k ∈ K, v1 ∈ V

Theorem 2.1. Let L : V → W be linear, then L(0 ∈ V) = 0 ∈ W.

Proof.

– version 1: L(0) = L(0 + 0) = L(0) + L(0)   (definition (1))
=⇒ 0 + L(0) = L(0) + L(0)   (neutral element)
=⇒ L(0) = 0   (cancellation law)

– version 2: L(0) = L(0 · 0) = 0 · L(0)   (definition (2)) = 0   (by Theorem 1.2(b))

Warning 1. In calculus, the so-called "linear" maps f : R → R, x ↦ ax + b, are only linear if b = 0!

Definition 2.2. Subspaces associated with linear maps. Let L : V → W be linear. Define:

(a) The kernel, ker L is defined as ker L ≡ {v ∈ V : L(v) = 0} ⊆ V.

(b) The image im L is defined as im L ≡ {L(v) : v ∈ V} ⊆ W

Remark 2. Axler calls ker L the nullspace of L and im L the range of L.

Theorem 2.2. Let L : V → W be linear. Then:

1. kerL ≤ V

2. imL ≤ W

Proof.

1. 0 ∈ ker L since L(0) = 0. Let v1, v2 ∈ ker L; then L(v1 + v2) = L(v1) + L(v2) = 0 + 0 = 0 =⇒ v1 + v2 ∈ ker L, and L(kv1) = kL(v1) = k · 0 = 0 =⇒ kv1 ∈ ker L. =⇒ ker L ≤ V

2. Since L(0 ∈ V) = 0 ∈ W, =⇒ 0 ∈ im L. Let w1, w2 ∈ im L. Thus ∃v1, v2 ∈ V with L(v1) = w1 and L(v2) = w2; then L(v1 + v2) = L(v1) + L(v2) = w1 + w2 =⇒ w1 + w2 ∈ im L, and L(kv1) = kL(v1) = kw1 =⇒ kw1 ∈ im L. =⇒ im L ≤ W

Examples 2.1.

1. L : Kⁿ → Kᵐ, x ↦ A · x, where A is an m × n matrix, x an n × 1 column vector and Ax an m × 1 column vector, is linear:
A(x + x̃) = Ax + Ax̃ =⇒ L(x + x̃) = L(x) + L(x̃)
A(kx) = k(Ax) =⇒ L(kx) = kL(x)
=⇒ L is linear. Here ker L = nullspace(A) and im L = colspace(A) (c.f. Assignment 2).

2. D : P(R) → P(R), p ↦ p′ (the derivative), is linear:
D(p + q) = (p + q)′ = p′ + q′ = D(p) + D(q)
D(kp) = (kp)′ = kp′ = kD(p)
=⇒ D is linear. Here ker D = P0(R) and im D = P(R).

3. (a) D : Pn(R) → Pn−1(R), p ↦ p′, is linear as shown above, with ker D = P0(R) and im D = Pn−1(R). D is surjective.

(b) The same map viewed as D : Pn(R) → Pn(R) is not surjective: ker D = P0(R) and im D = Pn−1(R) ⊊ Pn(R).

4. Integration: V = P(R), a, b ∈ R.
(a) L : P(R) → R, p ↦ ∫_a^b p dx, is linear:

L(p + q) = ∫_a^b (p + q) dx = ∫_a^b p dx + ∫_a^b q dx = L(p) + L(q)
L(kp) = ∫_a^b kp dx = k ∫_a^b p dx = kL(p)
=⇒ L is linear.

(b) a ∈ R: L : P(R) → P(R), p ↦ ∫_a^x p(t) dt. L is linear as shown above; ker L = {0} and im L = the set of all polynomials with a root at x = a.

(c) L : Pn(R) → Pn+1(R), r ↦ ∫_a^x r dt, is linear.

Assignment 4: V = s, s = {(x1, x2, x3, ···) : xi ∈ R ∀i}.
Left-shift: (x1, x2, x3, ···) ↦ (x2, x3, x4, ···).
Right-shift: (x1, x2, x3, ···) ↦ (0, x1, x2, ···).

Definition 2.3. Let I ⊆ R be an interval.
c⁰(I) ≡ set of all continuous functions on I.
c¹(I) ≡ set of all continuously differentiable functions on I.
···
cⁿ(I) ≡ set of all n-times continuously differentiable functions on I.
···
c∞(I) ≡ set of all functions on I with derivatives of all orders.

Exercise 2.2. c0(I), cn(I), c∞(I) are VS over R. Show c∞(I) ≤ · · · ≤ cn(I) ≤ · · · ≤ c1(I) ≤ c0(I).

D : cⁿ(I) → cⁿ⁻¹(I) and D : c∞(I) → c∞(I) are all linear.

(lecture 02/01)

Theorem 2.3. Let V, W be VS over K, L : V → W linear. Then L is injective iff ker L = {0}.

Proof. "⇒" Let L be injective (injective = one-to-one) and let v ∈ ker L. Then L(v) = 0 = L(0) =⇒ v = 0 =⇒ ker L = {0}.
"⇐" Let ker L = {0}. Let v, ṽ ∈ V s.t. L(v) = L(ṽ). =⇒ L(v) − L(ṽ) = 0 = L(v − ṽ) =⇒ v − ṽ ∈ ker L =⇒ v − ṽ = 0 =⇒ v = ṽ =⇒ L is injective.

Theorem 2.4. Let V be a finite dimensional VS over K, W a VS over K

and {v1, ··· , vn} a basis for V. Let w1, ··· , wn be arbitrary vectors in W. Then there exists a uniquely determined linear map L : V → W s.t. L(v1) = w1, ··· , L(vn) = wn.

Proof. Definition of L: Let v ∈ V. ∃k1, ··· , kn ∈ K s.t. v = k1v1 + ··· + knvn. Define L(v) ≡ k1w1 + ··· + knwn. Note that L is a function since k1, ··· , kn are uniquely determined.

Linearity of L: Let v, v˜ ∈ V.

v = k1v1 + ··· + knvn

v˜ = l1v1 + ··· + lnvn

v + v˜ = (k1 + l1)v1 + ··· + (kn + ln)vn

=⇒ L(v + v˜) = (k1 + l1)w1 + ··· + (kn + ln)wn

= (k1w1 + ··· + knwn) + (l1w1 + ··· + lnwn) = L(v) + L(ṽ)

and

kv = (kk1)v1 + ··· + (kkn)vn

=⇒ L(kv) = (kk1)w1 + ··· + (kkn)wn

= k[k1w1 + ··· + knwn] = kL(v)

=⇒ L is linear.

Uniqueness: Let L˜ : V → W be linear with L˜ (v1) = w1, ··· , L˜ (vn) = wn.

=⇒ L˜ (k1v1) = k1w1, ··· , L˜ (knvn) = knwn

=⇒ L˜ (k1v1 + ··· + knvn)

= L˜ (k1v1) + ··· + L˜ (knvn)

= k1w1 + ··· + knwn

= L(k1v1 + ··· + knvn) = L(v) =⇒ L̃(v) = L(v) ∀v ∈ V =⇒ L̃ = L

=⇒ L is uniquely determined.

Theorem 2.5. Dimension Formula. Let V be a finite dimensional VS over K, W VS over K. Let L : V → W be linear. Then dim ker L + dim im L = dimV.

[Figure 1: Representation of Thm 2.5]

Proof. Let {u1, ···, uk} be a basis for ker L and let {w1, ···, wn} be a basis for im L. Since w1, ···, wn ∈ im L, ∃v1, ···, vn ∈ V with

L(v1) = w1, ··· , L(vn) = wn. Note that the vi are not necessarily uniquely determined! Let B ≡ {u1, ··· , uk, v1, ··· , vn}. claim: B is a basis for V.

1.Show B is LI:

Let a1 u1 + ··· + ak uk + b1 v1 + ··· + bn vn = 0 (?). Applying L to both sides:

=⇒ L(a1u1 + ··· akuk + b1v1 + ··· + bnvn) = L(0) = 0

=⇒ a1L(u1) + ··· + akL(uk) + b1L(v1) + ··· + bnL(vn) = 0, where L(u1) = ··· = L(uk) = 0 and L(v1) = w1, ..., L(vn) = wn

=⇒ b1w1 + ··· + bnwn = 0

Since w1, ···, wn are LI, it follows that b1 = ··· = bn = 0. Then (?) becomes a1u1 + ··· + akuk = 0. Since u1, ···, uk are LI, =⇒ a1 = ··· = ak = 0, =⇒ B is LI.

2.Show B is spanning: 20 anna brandenberger, daniela breitman

Let v ∈ V be arbitrary. Let w ≡ L(v). Then w ∈ im L. Thus ∃b1, ··· , bn ∈ K s.t. w = b1 w1 + ··· + bn wn. Consider v˜ ≡ b1 v1 + ··· + bn vn. Then:

L(v − ṽ) = L(v) − L(ṽ) = w − (b1w1 + ··· + bnwn) = w − w = 0

=⇒ v − ṽ ∈ ker L =⇒ ∃a1, ···, ak ∈ K s.t.

v − v˜ = a1u1 + ··· + akuk

=⇒ v = a1u1 + ··· + akuk + b1v1 + ··· + bnvn =⇒ B is spanning

=⇒ B is a basis for V. =⇒ We have: dimV = k + n = dim ker L + dim im L

Examples 2.3.

1. D : Pn(K) → Pn(K) differentiation. ker D consists of all constant polynomials, i.e. ker D = P0(K). dim ker D = 1. im D consists of all polynomials of degree at most n − 1. i.e. im D = Pn−1(K) =⇒ dim im D = n.

Check: dim Pn(K) = dim ker D + dim im D, i.e. n + 1 = 1 + n. ✓

2. Let A ∈ Mat(n × m, R) and consider Ax = 0. (From Assignment 2: for L : x ↦ Ax, L : Kᵐ → Kⁿ, we have nullspace(A) = ker L and colspace(A) = im L.) Since dim ker L + dim im L = dim V, we get dim nullspace(A) + rank(A) = m, where dim nullspace(A) is called the nullity of A. We learned in previous linear algebra courses that dim nullspace(A) + dim colspace(A) = m (and dim colspace(A) = dim rowspace(A) = rank(A)).
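A quick numerical check of this identity (a NumPy sketch added for illustration; the matrix is a random choice): nullity(A) + rank(A) = m, the number of columns.

import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5
A = rng.integers(-3, 4, size=(n, m)).astype(float)   # a random 3 x 5 matrix

rank = np.linalg.matrix_rank(A)                # dim colspace(A) = dim im L

# A basis of nullspace(A) = ker L from the full SVD: the rows of Vh whose
# singular value is (numerically) zero span the nullspace.
_, s, Vh = np.linalg.svd(A)
s = np.concatenate([s, np.zeros(m - len(s))])  # pad: a 3x5 matrix has only 3 singular values
null_basis = Vh[s < 1e-10]
print(np.allclose(A @ null_basis.T, 0))        # True: these vectors really lie in ker L
print(len(null_basis) + rank == m)             # True: dim ker L + dim im L = m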

3. tr : Mat(n × n, K) → K, the "trace": A = (aij) ↦ a11 + ··· + ann (the sum of the diagonal entries). tr is linear:

tr(A + B) = (a11 + b11) + ··· + (ann + bnn) = (a11 + ··· + ann) + (b11 + ··· + bnn) = tr A + tr B

tr(kA) = ka11 + ··· + kann = k(a11 + ··· + ann) = k · tr A

ker tr is the set of all matrices for which the sum of the elements along the main diagonal is 0. im tr = K, so dim im tr = 1; dim Mat(n × n, K) = n² =⇒ dim ker tr = n² − 1.

4. Let A ∈ Mat(n × m, K) with (number of rows) = n < m = (number of columns). Then the linear homogeneous system Ax = 0 has non-trivial solutions. Let L : x ↦ Ax, L : Kᵐ → Kⁿ. Then Ax = 0 has non-trivial solutions iff L has a non-trivial kernel, i.e. iff dim ker L > 0. By the Dimension Formula, dim ker L + dim im L = dim Kᵐ

=⇒ dim ker L = dim Kᵐ − dim im L = m − dim colspace(A) = m − dim rowspace(A) ≥ m − n > 0   (since dim rowspace(A) ≤ n)
=⇒ ker L is non-trivial =⇒ Ax = 0 has non-trivial solutions.

(lecture 02/06)

2.2 Endomorphisms and Isomorphisms

Definition 2.4. An endomorphism is a linear map L : V → V, V is VS over K.

[Figure 2: Injective (non-surjective) map.]

Theorem 2.6. Let V be a finite dimensional VS over K and L : V → V an endomorphism. Then L is bijective iff L is surjective iff L is injective.

Lemma 2.7. Let V be a finite dimensional VS over K, and let U ≤ V s.t. dim U = dim V. Then U = V.

Proof. Let {v1, ···, vn} be a basis for U =⇒ {v1, ···, vn} is LI, where n = dim U = dim V =⇒ {v1, ···, vn} is a basis for V; in particular, it spans V =⇒ U = V.

[Figure 3: Surjective (non-injective) map.]

Proof. of Theorem 2.6. There are 2 non-trivial implications:

1. injective =⇒ surjective: By the Dimension Formula, dim ker L + dim im L = dim V. L injective =⇒ ker L = {0} =⇒ dim ker L = 0 =⇒ dim im L = dim V, where im L ≤ V. By Lemma 2.7, im L = V =⇒ L is surjective.

[Figure 4: Bijective map.]

2. surjective =⇒ injective: Let L be surjective =⇒ im L = V =⇒ dim im L = dim V =⇒ dim ker L = 0 =⇒ ker L = {0} =⇒ L is injective.

Remark 3. Comparable results are wrong in almost any other context, e.g. f : R → R, x ↦ eˣ, is injective but not surjective, while f : R → R, x ↦ x³ − x, is surjective but not injective.

Remark 4. The result is also wrong in general for endomorphisms of infinite dimensional VS (see Assignment 5).

Coordinate Vectors

All bases in this section are considered as ordered: B = (v1, v2, ···, vn), i.e. bases have a first, second, etc. element. Vectors in Kⁿ will always be considered as column vectors in this section.

Definition 2.5. Let V be a finite dimensional VS over K and B = (v1, v2, ···, vn) a basis of V. Let v ∈ V; then ∃ uniquely determined a1, ···, an ∈ K s.t. v = a1v1 + ··· + anvn. The column vector (a1, ···, an)ᵗ ∈ Kⁿ uniquely represents v. It is called the coordinate vector of v with respect to B, written [v]B.

(Many textbooks use {v1, ···, vn} even if they mean ordered bases.)

Theorem 2.8. Let V be finite dimensional, B an ordered basis for V. Then the coordinate map []B : V → Kⁿ, v ↦ [v]B, is linear and bijective (i.e. an isomorphism).

Proof. 1. Linearity of []B: Let B = (v1, v2, ···, vn). Since v1 = 1 · v1 + 0 · v2 + ··· + 0 · vn, we have [v1]B = (1, 0, ···, 0)ᵗ = e1, and similarly [v2]B = e2, ..., [vn]B = en. By the Existence Theorem for linear maps (Theorem 2.4), ∃ a uniquely determined linear map L : V → Kⁿ with L(v1) = e1, ···, L(vn) = en, and L(a1v1 + ··· + anvn) = a1e1 + ··· + anen = (a1, a2, ···, an)ᵗ. =⇒ L = []B =⇒ []B is linear. (Exercise: prove this directly, i.e. without the use of the Existence Theorem.)

2. Bijectivity of []B:
Injectivity: [v]B = (0, ···, 0)ᵗ ⇐⇒ v = 0 · v1 + ··· + 0 · vn = 0, i.e. ker []B = {0} =⇒ []B is injective.
Surjectivity: Let (a1, ···, an)ᵗ ∈ Kⁿ be arbitrary. Let v ≡ a1v1 + ··· + anvn; then [v]B = (a1, ···, an)ᵗ =⇒ []B is surjective.
=⇒ []B is bijective =⇒ []B is an isomorphism.

Theorem 2.9. on isomorphisms: Let L : V → W be an isomorphism. Since L is bijective, it has an inverse map L−1 : W → V, then L−1 is linear.

Proof. Let w1, w2 ∈ W, v1, v2 ∈ V with L(v1) = w1 and L(v2) = w2, then

L⁻¹(w1 + w2) = L⁻¹(L(v1) + L(v2)) = L⁻¹(L(v1 + v2)) = v1 + v2 = L⁻¹(w1) + L⁻¹(w2)
=⇒ L⁻¹(w1 + w2) = L⁻¹(w1) + L⁻¹(w2)

Let k ∈ K, then

L⁻¹(kw1) = L⁻¹(kL(v1)) = L⁻¹(L(kv1)) = kv1 = kL⁻¹(w1)

=⇒ L⁻¹ is linear.

Theorem 2.10. Let V be VS over K of dimension n, then V is isomorphic to Kn.

Proof. Let B be an ordered basis for V; then []B : V → Kⁿ is an isomorphism by Theorem 2.8.

2.3 Matrix Representation of Linear Maps

Idea: Let V, W be finite dimensional VS over K, L : V → W linear, B an ordered basis for V, B′ an ordered basis for W, dim V = m, dim W = n. For v ∈ V, w ∈ W: [v]B ∈ Kᵐ, [w]B′ ∈ Kⁿ.

Q: Does there exist an n × m matrix A with coefficients in K that mimics the action of L, i.e. [L(v)]B′ = A · [v]B ∀v ∈ V (with A of size n × m, [v]B of size m × 1, and the product of size n × 1)?
A: Such a matrix, if it exists, would rightfully be called the matrix representation of L w.r.t. B and B′.

Before we construct A in the general case, let's revisit the classical case V = Kᵐ, W = Kⁿ, B the standard basis of Kᵐ and B′ the standard basis of Kⁿ.

The linear maps L : Kᵐ → Kⁿ are of the form L(x) = Ax, where A = (aij) is an n × m matrix. What can we say about the coefficients of A?

Ae1 = L(e1) = (a11, ···, an1)ᵗ, i.e. the first column of A,
···
Aem = L(em) = (a1m, ···, anm)ᵗ, i.e. the m-th column of A,

i.e. A = (L(e1) | L(e2) | ··· | L(em)).

(lecture 02/08)

Definition 2.6. Let L : V → W be linear, V, W VS over K with dim V = m, dim W = n, B = {v1, ···, vm} a basis for V, B′ = {w1, ···, wn} a basis for W. Define [L]B′,B, the matrix representation of L with respect to B and B′, by:

[L]B′,B = ( [L(v1)]B′ | ··· | [L(vm)]B′ ), an n × m matrix whose m columns are the (n × 1) coordinate vectors of the images of the basis vectors of V.

Theorem 2.11. Let L : V → W be linear, dim V = m, dim W = n, B = {v1, ···, vm} a basis for V, B′ = {w1, ···, wn} a basis for W. Then:

[L(v)]B′ = [L]B′,B [v]B ∀v ∈ V   (sizes: (n × 1) = (n × m)(m × 1))

Proof. Let v ∈ V be arbitrary, v = a1v1 + ··· + amvm =⇒ [v]B = (a1, ···, am)ᵗ. Then:

[L]B′,B [v]B = ( [L(v1)]B′ | ··· | [L(vm)]B′ ) · (a1, ···, am)ᵗ
= a1[L(v1)]B′ + ··· + am[L(vm)]B′
= [a1L(v1)]B′ + ··· + [amL(vm)]B′
= [a1L(v1) + ··· + amL(vm)]B′   (note that []B′ : W → Kⁿ and L are linear)
= [L(a1v1 + ··· + amvm)]B′ = [L(v)]B′

Theorem 2.12. Let V, W be finite dimensional, L : V → W linear, B a basis for V and B′ a basis for W, A ≡ [L]B′,B. Then ker L is isomorphic to nullspace(A) and im L is isomorphic to colspace(A). The isomorphisms are given by:

ker L → nullspace(A), v ↦ [v]B
im L → colspace(A), w ↦ [w]B′

Proof. v ∈ ker L ⇔ L(v) = 0 ⇔ [L(v)]B′ = 0 = [L]B′,B [v]B ⇔ [v]B ∈ nullspace(A).
w ∈ im L ⇔ ∃v ∈ V : L(v) = w ⇔ [L(v)]B′ = [w]B′ = [L]B′,B [v]B ⇔ [w]B′ ∈ colspace(A).

Example 2.4. D : P2(R) → P2(R), D(p) ≡ p′, an endomorphism. B = B′ = (1, x, x²). [D]B,B = [D]B = ?

D(1) = 0 = 0 · 1 + 0 · x + 0 · x²
D(x) = 1 = 1 · 1 + 0 · x + 0 · x²
D(x²) = 2x = 0 · 1 + 2 · x + 0 · x²

=⇒ [D]B,B =
( 0 1 0 )
( 0 0 2 )
( 0 0 0 )

We can use [D]B,B to differentiate a polynomial via matrix multiplication: let p(x) = 3 − 2x + 5x², so p′(x) = −2 + 10x and [p]B = (3, −2, 5)ᵗ. Then [D]B,B [p]B = (−2, 10, 0)ᵗ, which is the coordinate vector of the polynomial −2 · 1 + 10 · x + 0 · x² = −2 + 10x = p′(x).

ker D is the set of all constant polynomials, i.e. P0(R); im D = P1(R).

nullspace([D]B,B): row-reducing
( 0 1 0 )     ( 0 1 0 )
( 0 0 2 )  ~  ( 0 0 1 )
( 0 0 0 )     ( 0 0 0 )
with variables a0, a1, a2 gives a1 = a2 = 0, a0 a free parameter.
=⇒ nullspace(A) = {(a0, 0, 0)ᵗ : a0 ∈ R} = span{(1, 0, 0)ᵗ}, which is exactly the set of coordinate vectors of constant polynomials.

colspace([D]B,B) = span{(1, 0, 0)ᵗ, (0, 2, 0)ᵗ} = span{(1, 0, 0)ᵗ, (0, 1, 0)ᵗ} = {(a0, a1, 0)ᵗ : a0, a1 ∈ R}, which is precisely the set of all coordinate vectors of linear polynomials.
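The same computations can be reproduced numerically (a NumPy sketch added for illustration):

import numpy as np

# [D]_{B,B} for D : P_2 -> P_2 w.r.t. B = (1, x, x^2):
# columns are the coordinate vectors of D(1) = 0, D(x) = 1, D(x^2) = 2x.
D = np.array([[0., 1., 0.],
              [0., 0., 2.],
              [0., 0., 0.]])

p = np.array([3., -2., 5.])          # [p]_B for p(x) = 3 - 2x + 5x^2
print(D @ p)                         # [-2. 10.  0.]  ->  p'(x) = -2 + 10x

# rank = dim im D = 2 and nullity = dim ker D = 3 - 2 = 1, as found above.
print(np.linalg.matrix_rank(D))      # 2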

Example 2.5. L : Mat(2 × 2, R) → Mat(2 × 2, R), A ↦ A − Aᵗ.

L is linear since:
L(A + B) = (A + B) − (A + B)ᵗ = A + B − Aᵗ − Bᵗ = (A − Aᵗ) + (B − Bᵗ) = L(A) + L(B)
L(kA) = kA − (kA)ᵗ = kA − kAᵗ = k(A − Aᵗ) = kL(A)

Now let B = B′ = (M11, M12, M21, M22), where M11 = ( 1 0 ; 0 0 ), M12 = ( 0 1 ; 0 0 ), M21 = ( 0 0 ; 1 0 ), M22 = ( 0 0 ; 0 1 ). Then:

L(M11) = ( 0 0 ; 0 0 ) = 0 · M11 + 0 · M12 + 0 · M21 + 0 · M22
L(M12) = ( 0 1 ; −1 0 ) = 0 · M11 + 1 · M12 + (−1) · M21 + 0 · M22
L(M21) = ( 0 −1 ; 1 0 ) = 0 · M11 + (−1) · M12 + 1 · M21 + 0 · M22
L(M22) = ( 0 0 ; 0 0 ) = 0 · M11 + 0 · M12 + 0 · M21 + 0 · M22

=⇒ [L]B,B =
( 0  0  0  0 )
( 0  1 −1  0 )
( 0 −1  1  0 )
( 0  0  0  0 )

Let A ≡ ( 2 1 ; 3 −1 ). Then L(A) = ( 2 1 ; 3 −1 ) − ( 2 3 ; 1 −1 ) = ( 0 −2 ; 2 0 ).

[L(A)]B = (0, −2, 2, 0)ᵗ,  [A]B = (2, 1, 3, −1)ᵗ

Check: [L]B,B [A]B = (0, −2, 2, 0)ᵗ = [L(A)]B. ✓

ker L: compute nullspace([L]B,B):

( 0  0  0  0 )      ( 0  1 −1  0 )
( 0  1 −1  0 )  ~  ( 0  0  0  0 )
( 0 −1  1  0 )      ( 0  0  0  0 )
( 0  0  0  0 )      ( 0  0  0  0 )

with variables a11, a12, a21, a22: a12 − a21 = 0 ⇔ a21 = a12, and a11, a12, a22 are free parameters.

=⇒ nullspace([L]B,B) = span{(1, 0, 0, 0)ᵗ, (0, 1, 1, 0)ᵗ, (0, 0, 0, 1)ᵗ} = {(a11, a12, a12, a22)ᵗ : a11, a12, a22 ∈ R}, which are the coordinate vectors of the matrices ( a11 a12 ; a12 a22 ) with a11, a12, a22 ∈ R, i.e. exactly the set of all 2 × 2 symmetric matrices.
Check: L(A) = 0 ⇔ A − Aᵗ = 0 ⇔ A = Aᵗ ⇔ A is 2 × 2 symmetric. ✓

im L: colspace([L]B,B) = span{(0, 1, −1, 0)ᵗ, (0, −1, 1, 0)ᵗ} = span{(0, 1, −1, 0)ᵗ} = {(0, a12, −a12, 0)ᵗ : a12 ∈ R}. These are the coordinate vectors of the matrices ( 0 a12 ; −a12 0 ), a12 ∈ R, i.e. the matrices with A = −Aᵗ. This is the set of all 2 × 2 skew-symmetric matrices.

(lecture 02/13)

We continue the above example: L : Mat(2 × 2, R) → Mat(2 × 2, R), A ↦ A − Aᵗ.

Recall B = (M11, M12, M21, M22) with [L]B,B as above. Now take
B′ = ( ( 1 0 ; 0 0 ), ( 0 1 ; 1 0 ), ( 0 0 ; 0 1 ), ( 0 1 ; −1 0 ) ).
The first three basis matrices are symmetric, so each maps to the zero matrix; the last one maps to
( 0 1 ; −1 0 ) − ( 0 −1 ; 1 0 ) = ( 0 2 ; −2 0 ) = 2 · ( 0 1 ; −1 0 ).

=⇒ [L]B′,B′ =
( 0 0 0 0 )
( 0 0 0 0 )
( 0 0 0 0 )
( 0 0 0 2 )

This matrix is diagonal, and simpler than the previous representation (see the last example). This shows the advantage of choosing a suitable basis.

Compositions of Linear maps

Let U, V, W be finite dimensional VS over K with bases B, B′, B″, and let L1 : U → V, L2 : V → W be linear, so that L2 ∘ L1 : U → W. How are the matrix representations of these three linear maps related?

Theorem 2.13. Let U, V, W be finite dimensional VS over K with bases B, B′, B″, and let L1 : U → V, L2 : V → W be linear, so that L2 ∘ L1 : U → W. If dim U = m, dim V = k, dim W = n, then:

[L2 ∘ L1]B″,B = [L2]B″,B′ · [L1]B′,B   (sizes: (n × m) = (n × k)(k × m))

Proof. Let u ∈ U be an arbitrary vector. Then

[L2 ∘ L1]B″,B [u]B = [(L2 ∘ L1)(u)]B″
= [L2(L1(u))]B″
= [L2]B″,B′ [L1(u)]B′
= [L2]B″,B′ [L1]B′,B [u]B   ∀u ∈ U

=⇒ [L2 ∘ L1]B″,B = [L2]B″,B′ · [L1]B′,B

Matrix Representation of Isomorphisms

Theorem 2.14. Let V, W be finite dimensional VS over K, L : V → W linear, B, B′ bases for V, W respectively. Then L is an isomorphism ⇐⇒ [L]B′,B is invertible.

Remark 5. Consider id : V → V, the identity map, dim V = n, B a basis for V; then [id]B,B = In (the n × n identity matrix). Indeed, if B = (v1, ···, vn), then [id]B,B = ( [id(v1)]B | ··· | [id(vn)]B ) = ( [v1]B | ··· | [vn]B ) = In, since vj = 0 · v1 + ··· + 1 · vj + ··· + 0 · vn.

Proof.

("⇒") Let L : V → W be an isomorphism =⇒ L is bijective =⇒ L has an inverse L⁻¹ : W → V, which is linear. We have L⁻¹ ∘ L : V → V, L⁻¹ ∘ L = idV; then:

[L⁻¹ ∘ L]B,B = I = [L⁻¹]B,B′ [L]B′,B =⇒ [L]B′,B is invertible.
(Alternatively, [L ∘ L⁻¹]B′,B′ = [L]B′,B · [L⁻¹]B,B′ = I =⇒ [L]B′,B is invertible.)

("⇐") Let [L]B′,B be an invertible n × n matrix. We first prove that L is injective. Let v ∈ ker L:

=⇒ L(v) = 0 =⇒ [L(v)]B′ = 0. But [L]B′,B [v]B = [L(v)]B′ = 0 and [L]B′,B is invertible =⇒ [v]B = 0 =⇒ v = 0 =⇒ ker L = {0} =⇒ L is injective.

Next, we prove L is surjective.

dim V = dim W = n and dim ker L + dim im L = dim V = n =⇒ dim im L = n = dim W =⇒ im L = W =⇒ L is surjective.

=⇒ L is bijective =⇒ L is an isomorphism.

2.4 Change of basis

Q1: V a finite dimensional VS over K, B, B′ bases for V, v ∈ V. Given [v]B, how can we compute [v]B′? Consider id : V → V, with basis B on the domain and B′ on the codomain. Then [id]B′,B [v]B = [id(v)]B′ = [v]B′.

Definition 2.7. So we have [v]B′ = [id]B′,B [v]B. The matrix [id]B′,B is called the transition matrix or change-of-basis matrix from B to B′. (Caution: Schaum's mistakenly calls this the transition matrix from B′ to B.)

Q2: L : V → V linear, B, B′ bases for V. Given [L]B,B, how can we compute [L]B′,B′? Consider V_B′ →(id) V_B →(L) V_B →(id) V_B′. Then [id ∘ L ∘ id]B′,B′ = [id]B′,B [L]B,B [id]B,B′ = [L]B′,B′. Let P ≡ [id]B,B′.

Then [id]B′,B = ([id]B,B′)⁻¹ = P⁻¹ =⇒ [L]B′,B′ = P⁻¹ [L]B,B P

where P is the change-of-basis matrix from B′ to B.

Definition 2.8. Two n × n matrices A, B with coefficients in K are similar if there exists an invertible n × n matrix P with B = P⁻¹AP. Thus, any two matrix representations of the same linear map L : V → V with respect to bases B, B′ are similar. Note that the converse also holds.

Special case: V = Kⁿ with two bases, the standard basis S and an arbitrary basis B. If we consider the vectors in Kⁿ as row vectors, then [v]S = vᵗ, a column vector. How can we compute [v]B? We have [v]B = [id]B,S [v]S, where [id]B,S is the transition matrix from S to B. If B = (v1, ···, vn), then [id]S,B = (v1ᵗ | ··· | vnᵗ) ≡ P, the change-of-basis matrix from B to S, and [id]B,S = ([id]S,B)⁻¹ =⇒ [v]B = P⁻¹ [v]S.

(E.g. in R² with B = (v⃗1, v⃗2): writing a⃗ = c1v⃗1 + c2v⃗2 means a⃗ᵗ = (v⃗1ᵗ | v⃗2ᵗ)(c1, c2)ᵗ, i.e. [a⃗]S = P [a⃗]B; to compute [a⃗]B we invert the matrix P.)

(lecture 02/15)

Examples 2.6.

1. V = R2, S standard basis, B = ((1, 1), (3, 2))

[id]S,B = ( 1 3 ; 1 2 )
[id]B,S = ([id]S,B)⁻¹ = ( 1 3 ; 1 2 )⁻¹ = (1/(2 − 3)) ( 2 −3 ; −1 1 ) = ( −2 3 ; 1 −1 )

[v]B = [id]B,S[v]S

Check: v = (1, 0): [v]S = (1, 0)ᵗ, [v]B = (−2, 1)ᵗ, and indeed −2(1, 1) + 1(3, 2) = (1, 0). ✓
v = (2, 3): [v]S = (2, 3)ᵗ, [v]B = ( −2 3 ; 1 −1 )(2, 3)ᵗ = (5, −1)ᵗ.

2. Consider L : R² → R², the projection onto the x-axis, i.e. L((1, 0)) = (1, 0), L((0, 1)) = (0, 0) ⇒ [L]S,S = ( 1 0 ; 0 0 ).

[L]B,B = [id]B,S [L]S,S [id]S,B = ([id]S,B)⁻¹ [L]S,S [id]S,B
= ( −2 3 ; 1 −1 )( 1 0 ; 0 0 )( 1 3 ; 1 2 ) = ( −2 3 ; 1 −1 )( 1 3 ; 0 0 ) = ( −2 −6 ; 1 3 )

Check: v = (2, 3) ↦ w = L(v) = (2, 0). [v]B = (5, −1)ᵗ, [w]B = ( −2 3 ; 1 −1 )(2, 0)ᵗ = (−4, 2)ᵗ, and indeed ( −2 −6 ; 1 3 )(5, −1)ᵗ = (−4, 2)ᵗ. ✓
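The whole computation can be verified numerically (a NumPy sketch added for illustration):

import numpy as np

P = np.array([[1., 3.],
              [1., 2.]])            # [id]_{S,B}: columns are (1,1) and (3,2)
L_SS = np.array([[1., 0.],
                 [0., 0.]])         # projection onto the x-axis in the standard basis

P_inv = np.linalg.inv(P)            # [id]_{B,S}
L_BB = P_inv @ L_SS @ P
print(L_BB)                         # [[-2. -6.] [ 1.  3.]]

v_S = np.array([2., 3.])
w_S = L_SS @ v_S                    # (2, 0)
print(P_inv @ v_S)                  # [ 5. -1.] = [v]_B
print(L_BB @ (P_inv @ v_S))         # [-4.  2.] = [w]_B
print(P_inv @ w_S)                  # [-4.  2.]  (agrees)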

Example 2.7. V = Mat(2 × 2, R), L : A ↦ A − Aᵗ (see last class).

B = ( M11, M12, M21, M22 ),  B′ = ( ( 1 0 ; 0 0 ), ( 0 1 ; 1 0 ), ( 0 0 ; 0 1 ), ( 0 1 ; −1 0 ) )

[L]B,B =
( 0  0  0  0 )
( 0  1 −1  0 )
( 0 −1  1  0 )
( 0  0  0  0 )

[L]B′,B′ =
( 0 0 0 0 )
( 0 0 0 0 )
( 0 0 0 0 )
( 0 0 0 2 )

[id]B,B′ =
( 1 0 0  0 )
( 0 1 0  1 )
( 0 1 0 −1 )
( 0 0 1  0 )

[id]B′,B = ([id]B,B′)⁻¹, computed by Gauss–Jordan elimination of the augmented matrix ( [id]B,B′ | I4 ):

[id]B′,B = (1/2) ·
( 2 0  0 0 )
( 0 1  1 0 )
( 0 0  0 2 )
( 0 1 −1 0 )

Check: A = ( 3 3 ; 1 −1 ), [A]B = (3, 3, 1, −1)ᵗ.

[A]B′ = [id]B′,B [A]B = (1/2) ( 2 0 0 0 ; 0 1 1 0 ; 0 0 0 2 ; 0 1 −1 0 )(3, 3, 1, −1)ᵗ = (3, 2, −1, 1)ᵗ.
Check:

3 ( 1 0 ; 0 0 ) + 2 ( 0 1 ; 1 0 ) − ( 0 0 ; 0 1 ) + ( 0 1 ; −1 0 ) = ( 3 3 ; 1 −1 ) = A ✓

The Normal Form Problem

Let A ∈ Mat(n × n, K) and LA : Kⁿ → Kⁿ, v ↦ Av. Let S be the standard basis of Kⁿ and B = {v1, ···, vn} an arbitrary basis. Then [LA]S,S = A and
[LA]B,B = [id]B,S [LA]S,S [id]S,B = P⁻¹AP, where P ≡ [id]S,B = (v1ᵗ | ··· | vnᵗ).

The converse works as well: given an arbitrary P ∈ GL(n, K), where GL(n, K) is the "general linear" set of all invertible n × n matrices with coefficients in K, we can interpret P⁻¹AP as a matrix representation of LA w.r.t. some basis B of Kⁿ. Write P = (v1ᵗ | ··· | vnᵗ). Since P is invertible, its columns are LI =⇒ {v1, ···, vn} is LI. Since dim Kⁿ = n, {v1, ···, vn} also spans Kⁿ, i.e. {v1, ···, vn} ≡ B is a basis for Kⁿ, and [LA]B,B = P⁻¹AP.

There is thus a one-to-one correspondence between matrix representations of LA and matrices similar to A. In particular, similar matrices can be interpreted as representing the same linear map (w.r.t. different bases). We are thus interested in determining the set of all matrices in Mat(n × n, K) similar to A (the "Similarity Problem").

2.5 Similarity

Definition 2.9. Let A, B ∈ Mat(n × n, K). B is said to be similar to A, in symbols B ∼ A if ∃P ∈ GL(n, K) s.t. B = P−1 AP.

Definition 2.10. Equivalence Relation: Let S be a set. A relation on S, i.e. a subset of the cartesian product S × S, denoted by ∼, is called an equivalence relation if it has the following properties:

1. ∀x ∈ S : x ∼ x reflexive

2. ∀x, y ∈ S : x ∼ y ⇐⇒ y ∼ x symmetric

3. ∀x, y, z ∈ S : x ∼ y ∧ y ∼ z =⇒ x ∼ z transitive

Example 2.8. S = Z, n ∼ k : ⇐⇒ n − k is even. claim: ∼ is an equivalence relation: math 247: honours applied linear algebra 31

• n ∼ n since n − n = 0 is even.

• n ∼ k ⇐⇒ n − k is even ⇐⇒ k − n is even ⇐⇒ k ∼ n

• n ∼ k ∧ k ∼ m =⇒ n − m = (n − k) + (k − m) is even (the sum of two even numbers) =⇒ n ∼ m

{n ∈ Z : n ∼ 0} = {···, −2, 0, 2, 4, ···}, the set of all even numbers.
{n ∈ Z : n ∼ 1} = {···, −3, −1, 1, 3, ···}, the set of all odd numbers.

Definition 2.11. Let S be a set, s ∈ S, ∼ an equivalence relation on S. The equivalence class [s] = [s]∼ ≡ {t ∈ S : t ∼ s}.

Note that s ∈ [s]∼, since s ∼ s. Example 2.9. S = Z, ∼: n − k is even. [0] is the set of all even numbers. [1] is the set of all odd numbers. Note that these equivalence classes partition S. This holds in general:

Theorem 2.15. Let S be a set, ∼ an equivalence relation. Then the equivalence classes of ∼ partition S, i.e. every s ∈ S is contained in an equivalence class and any two equivalence classes are either identical or disjoint.

Exercise 2.10. Show that similarity is an equivalence relation.

Remark 6. [In] = {P⁻¹ In P : P ∈ GL(n, K)} = {In}, since P⁻¹ In P = P⁻¹ P = In.

(lecture 02/20)

Proof. Since s ∈ [s], the equivalence classes cover S. It remains to show that any two equivalence classes are either identical or disjoint. Let s, t ∈ S such that [s] ∩ [t] ≠ ∅. Let u ∈ [s] ∩ [t].
=⇒ u ∼ s ∧ u ∼ t =⇒ by symmetry, s ∼ u ∧ u ∼ t =⇒ by transitivity, s ∼ t ∧ t ∼ s

Let v ∈ [s] be arbitrary =⇒ v ∼ s and s ∼ t =⇒ by transitivity, v ∼ t =⇒ v ∈ [t] =⇒ [s] ⊆ [t]. By the same argument with s and t exchanged, [t] ⊆ [s], hence [s] = [t].

Theorem 2.16. Similarity, i.e. B ∼ A ⇐⇒ ∃P ∈ GL(n, K) : B = P−1 AP is an equivalence relation.

Proof.

• Reflexivity: A = In⁻¹ A In =⇒ A ∼ A.

• Symmetry: Let B ∼ A, B = P−1 AP =⇒ PB = AP =⇒ A = PBP−1 = (P−1)−1BP−1 =⇒ A ∼ B.

• Transitivity: Let C ∼ B ∧ B ∼ A. C = Q⁻¹BQ, B = P⁻¹AP =⇒ C = Q⁻¹(P⁻¹AP)Q = (Q⁻¹P⁻¹)A(PQ) = (PQ)⁻¹A(PQ) =⇒ C ∼ A

Thus, similarity is an equivalence relation, and the similarity classes [A]∼ ≡ {B ∈ Mat(n × n, K) : ∃P ∈ GL(n, K) : B = P⁻¹AP} partition Mat(n × n, K). All matrices in a given similarity class can be considered to represent the same linear map w.r.t. different bases.

Remark 7. Useful trick for showing that two given matrices are not similar:

1. Similar matrices have the same determinant: B = P−1 AP

Proof. det B = det(P⁻¹AP) = det(P⁻¹) det A det P = (1/det P) det A det P = det A   (det P ≠ 0)
=⇒ if det A ≠ det B then A, B can't be similar.

2. Similar matrices have the same trace.
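A quick numerical check (a NumPy sketch added for illustration; A and P are random choices): conjugating A by an invertible P leaves the determinant, the trace, and in fact the whole characteristic polynomial unchanged.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
P = rng.normal(size=(3, 3))          # invertible with probability 1

B = np.linalg.inv(P) @ A @ P         # B is similar to A

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))   # True
print(np.isclose(np.trace(A), np.trace(B)))             # True
print(np.allclose(np.poly(A), np.poly(B)))              # True: same characteristic polynomial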

2.6 Diagonalization: Eigenvalues and Eigenvectors

Given a linear map L : V → V, V a finite dimensional VS over K, we can represent L by its matrix representation [L]B, B a basis for V. The set of all such representations forms a similarity class of matrices in Mat(n × n, K). We want to find the simplest representative within this similarity class.

Definition 2.12. Invariant subspace: Let L : V → V be linear, V a VS over K. A subspace U ≤ V is called invariant under L, or L-invariant, if L(U) = {L(u) : u ∈ U} ⊆ U.

Assume V finite dimensional VS over K, L : V → V, linear, and U, W

L-invariant subspaces of V with V = U ⊕ W. Let {u1, ···, uk} be a basis for U and {w1, ···, wm} a basis for W, so that B ≡ {u1, ···, uk, w1, ···, wm} is a basis for V. Then [L]B is block diagonal,
( ? 0 )
( 0 ? )
with a k × k block and an m × m block, since L(uj) is an LC of u1, ···, uk ∀1 ≤ j ≤ k and L(wi) is an LC of w1, ···, wm ∀1 ≤ i ≤ m.

Similarly, if V = U1 ⊕ U2 ⊕ ··· ⊕ Uk, where all Uj are L-invariant with dim Uj = mj, then choosing a basis for each Uj and concatenating them into a basis B for V, [L]B is block diagonal with blocks of sizes m1, ···, mk along the diagonal, where m1 + ··· + mk = n = dim V.

Note that the representation is simplest if all of the L-invariant subspaces have dimension 1. (Is it possible to achieve this? In general no, but sometimes yes.) We are thus interested in one-dimensional L-invariant subspaces of V. Let v ≠ 0 be such that span{v} is L-invariant; then L(v) ∈ span{v} =⇒ ∃λ ∈ K : L(v) = λv.

Definition 2.13. Let V be a finite dimensional VS over K, L : V → V linear. If v ∈ V, v ≠ 0, is such that L(v) = λv for some λ ∈ K, then v is called an eigenvector of L to the eigenvalue λ.

Assume that V has a basis of eigenvectors v1, ···, vn, to eigenvalues λ1, ···, λn. Let B ≡ {v1, ···, vn}; then L(vj) = λjvj = 0 · v1 + ··· + λjvj + ··· + 0 · vn, so [L]B = diag(λ1, λ2, ···, λn), the diagonal matrix with λ1, ···, λn on the diagonal. In this case, the normal form problem is solved. We can also define eigenvalues and eigenvectors for matrices in Mat(n × n, K):

Definition 2.14. Let A ∈ Mat(n × n, K), v ∈ Kn, v 6= 0 such that Av = λv for some λ ∈ K, then v is called an eigenvector of A to the eigenvalue λ.

Let L : V → V, L(v) = λv, B a basis for V. Then [L]B[v]B = [L(v)]B = [λv]B = λ[v]B =⇒ λ is an eigenvalue of [L]B and [v]B a corresponding eigenvector. As the basis varies, [L]B runs through a similarity class of matrices; all similar matrices thus have the same eigenvalues. This can also be shown directly: let A ∈ Mat(n × n, K), v ≠ 0, λ ∈ K s.t. Av = λv.

Let B = P⁻¹AP. Then B(P⁻¹v) = (P⁻¹AP)(P⁻¹v) = P⁻¹A(PP⁻¹v) = P⁻¹Av = P⁻¹λv = λ(P⁻¹v), where P⁻¹v ≠ 0. =⇒ λ is an eigenvalue of B.

Some Properties of Eigenvectors and Eigenvalues

Theorem 2.17. Let L : V → V, λ an eigenvalue of L. Define Eλ(L) ≡ {v ∈ V : L(v) = λv}; then Eλ(L) ≤ V, which is called the eigenspace of the eigenvalue λ. For matrices: Eλ(A) ≡ {v ∈ Kⁿ : Av = λv}.

(lecture 02/22)

Remark 8. Eλ consists of all eigenvectors of L w.r.t. λ plus the zero vector.

Proof.

0 ∈ Eλ(L) since L(0) = 0 = λ · 0. ✓

Closedness under +: let v1, v2 ∈ Eλ(L). Then L(v1) = λv1, L(v2) = λv2 =⇒ L(v1 + v2) = L(v1) + L(v2) = λv1 + λv2 = λ(v1 + v2) =⇒ v1 + v2 ∈ Eλ(L). ✓

Closedness under scalar ·: let k ∈ K. Then L(kv1) = kL(v1) = k(λv1) = λ(kv1) =⇒ kv1 ∈ Eλ(L). ✓

Thus Eλ(L) ≤ V.

Theorem 2.18. Eλ(L) is L-invariant.

Proof. Let v ∈ Eλ(L). Then L(v) = λv ∈ Eλ(L) =⇒ Eλ(L) is L-invariant.

Theorem 2.19. Eigenvectors to distinct eigenvalues are linearly indepen- dent, i.e. if L : V → V, v1, ··· , vk eigenvectors of L to the eigenvalues λ1, ··· , λk which are pairwise distinct, then {v1, ··· , vk} are LI.

Proof. By induction.
k = 1: Let v_1 be an eigenvector of L to the eigenvalue λ_1. Then v_1 ≠ 0 by definition of eigenvectors, so {v_1} is LI.
k ⇒ k + 1: We assume that the statement holds for some value of k

and need to prove it for k + 1: Let v1, ··· , vk, vk+1 be eigenvectors of L to pairwise distinct eigenvalues λ1, ··· , λk+1. Let:

a1v1 + ··· + akvk + ak+1vk+1 = 0

L(a1v1 + ··· + akvk + ak+1vk+1) = L(0) = 0

=⇒ a1L(v1) + ··· + ak L(vk) + ak+1L(vk+1) = 0

=⇒ a1λ1v1 + ··· + akλkvk + ak+1λk+1vk+1 = 0

Also λk+1∗line 1: a1λk+1v1 + ··· + akλk+1vk + ak+1λk+1vk+1 = 0

subtracting =⇒ a1(λ1 − λk+1)v1 + ··· + ak(λk − λk+1)vk = 0

Since v1, ··· , vk is LI by inductive hypothesis, it follows that:

a1(λ1 − λk+1) = 0, ··· ak(λk − λk+1) = 0 | {z } | {z } 6=0 6=0

=⇒ a1 = 0, ··· , ak = 0

Since a1v1 + ··· akvk +ak+1vk+1 = 0 =⇒ ak+1 vk+1 = 0 | {z } |{z} =0 6=0

=⇒ ak+1 = 0 =⇒ a1 = ··· = ak = ak+1 = 0

=⇒ {v1, ··· , vk+1} is LI.

Theorem 2.20. Let V be a finite dimensional VS over K and L : V → V linear. Then L is diagonalizable, i.e. there exists a basis B for V s.t. [L]_B is a diagonal matrix, iff V has a basis of eigenvectors.

Proof. ("⇒") Let L be diagonalizable and let B = (v_1, ··· , v_n) be a basis for V such that [L]_B is a diagonal matrix D = diag(λ_1, ··· , λ_n). Then

[L(v_1)]_B = D·e_1 = λ_1e_1 = [λ_1v_1]_B ⇒ L(v_1) = λ_1v_1, ··· , [L(v_n)]_B = D·e_n = λ_ne_n = [λ_nv_n]_B ⇒ L(v_n) = λ_nv_n

⇒ v_1, ··· , v_n are eigenvectors of L.
("⇐") Let {v_1, ··· , v_n} be a basis of eigenvectors of L, say to eigenvalues λ_1, ··· , λ_n, and let B = (v_1, ··· , v_n). Then

L(v_1) = λ_1v_1 + 0·v_2 + ··· + 0·v_n, ··· , L(v_n) = 0·v_1 + ··· + 0·v_{n−1} + λ_nv_n
⇒ [L]_B = diag(λ_1, ··· , λ_n), so [L]_B is a diagonal matrix.

Theorem 2.21. V VS over K, dim V = n, L : V → V linear. Assume that L has n pairwise distinct eigenvalues. Then L is diagonalizable.

Proof. We need to prove (by Theorem 2.20) that V has a basis of eigenvectors. Let v1, ··· , vn be eigenvectors to the n pairwise distinct eigenvalues λ1, ··· , λn. By Theorem 2.19, {v1, ··· , vn} is LI and thus a basis for eigenvectors for V, =⇒ L is diagonalizable.

Not every linear map L : V → V (V finite dimensional) is diagonal- izable.

Example 2.11. The differentiation map D : Pn(R) → Pn(R) is not diagonalizable ∀n ≥ 2.

Proof. We have to find all eigenvalues and eigenvectors of D. Let n p(x) = a0 + a1x + ··· + anx be arbitrary 6= 0.

2 n−1 D(p(x)) = a1 + 2a2x + 3a3x + ··· + nanx 36 anna brandenberger, daniela breitman

p(x) is an eigenvector of D iff ∃λ ∈ R s.t. D(p(x)) = λp(x), i.e.

a_1 = λa_0, 2a_2 = λa_1, 3a_3 = λa_2, ··· , na_n = λa_{n−1}, 0 = λa_n.

Case 1: λ ≠ 0. Then a_n = 0, hence a_{n−1} = 0, ··· , a_0 = 0, so p = 0 — a contradiction. Thus no λ ≠ 0 is an eigenvalue of D.

Case 2: λ = 0. Then a_n = 0, ··· , a_1 = 0, but a_0 can be arbitrary! Thus λ = 0 is an eigenvalue of D with

E_0(D) = {constant polynomials} = span{1}

dim E_0(D) = 1 ≠ dim P_n(R) = n + 1

Thus we cannot find a basis of eigenvectors for Pn(R) if n > 1 =⇒ D is not diagonalizable for any n > 1.

Remark 9. While 0 ∈ V is not considered an eigenvector, 0 ∈ K can very well be an eigenvalue.

Theorem 2.22. Let L : V → V. Then 0 is an eigenvalue of L iff L is not injective. In that case, E0(L) = ker L.

Proof. ("⇒") Let 0 be an eval of L. Then ∃v 6= 0 with L(v) = 0 · v = 0 =⇒ L has non-trivial kernel, i.e. is non-injective. ("⇐") Let L be non-injective =⇒ L has non-trivial kernel =⇒ ∃v 6= 0 with L(v) = 0 = 0 · v =⇒ v evec to eval 0 =⇒ 0 is eval. And: E0(L) = {v ∈ V : L(v) = 0 · v = 0} = ker L. lecture 02/27

2.7 The Characteristic Polynomial

Goal: Let A ∈ Mat(n × n, K): find all eigenvalues (and eigenvectors) of A. λ is an eigenvalue of A ⇔ ∃x ∈ Kn, x 6= 0 : Ax = λx ⇔ λx − Ax = 0 ⇔ λ · Inx − Ax = 0 ⇔ ∃x 6= 0 : (λI − A) x = 0 ⇔ the linear system | {z } n×n matrix (λI − A)x = 0 has a non-trivial solution, ⇔ λI − A is non-invertible ↔ det(λI − A) = 0

Definition 2.15. det(λI − A) = det [λ−a_11  −a_12  ···  −a_1n; −a_21  λ−a_22  ···  −a_2n; ⋮ ⋮ ⋱ ⋮; −a_n1  ···  ···  λ−a_nn] is a polynomial in λ of degree n with leading term λ^n; it is called the characteristic polynomial of A, denoted χ_A(λ).

Some coefficients of the characteristic polynomial:

• Constant coefficient: let χ_A(λ) = λ^n + c_{n−1}λ^{n−1} + ··· + c_1λ + c_0. Then χ_A(0) = c_0 = det(0·I − A) = det(−A) = (−1)^n det A, so c_0 = (−1)^n det A. (Recall det A = Σ_{σ perm of {1,···,n}} sgn(σ)·a_{1σ(1)}·...·a_{nσ(n)}, each sgn(σ) = ±1.)

• Coefficient of λ^{n−1}: the only term in the determinant that can contribute a λ^{n−1} is the product of the elements on the main diagonal.

χA(λ) = (λ − a11) · ... · (λ − ann) + terms of order at most n − 2 in λ n n−1 = λ + λ (−a11 − a22 − · · · − ann) + ··· = λn − trA · λn−1 + lower order terms

i.e. cn−1 = −trA.

Theorem 2.23. Similar matrices have the same characteristic polynomial.

Proof. Let B ∼ A, A, B ∈ Mat(n × n, K), B = P^{-1}AP. Then
χ_B(λ) = det(λI − P^{-1}AP) = det(λP^{-1}P − P^{-1}AP) = det(P^{-1}(λI)P − P^{-1}AP) = det(P^{-1}(λI − A)P) = det(P^{-1})·det(λI − A)·det P = (1/det P)·χ_A(λ)·det P = χ_A(λ).

Remark 10. Since (−1)^n det A and −tr A are coefficients of χ_A(λ), similar matrices have the same determinant (which we already knew) and the same trace. In particular, if A, B ∈ Mat(n × n, K) with tr A ≠ tr B, then A, B are not similar.

Number of eigenvalues of A ∈ Mat(n × n, K): λ_0 is an eigenvalue of A ⇔ χ_A(λ_0) = 0. But χ_A(λ) is a polynomial of degree n and thus has at most n roots, so A has at most n distinct eigenvalues. (Alternative proof without χ_A(λ): let λ_1, ··· , λ_k be the distinct eigenvalues of A and v_1, ··· , v_k corresponding eigenvectors. Eigenvectors to distinct eigenvalues are LI, so {v_1, ··· , v_k} is LI, hence k ≤ n.)

Definition 2.16. Let L : V → V, V a finite dimensional vector space, L linear. Let B be any basis for V. The characteristic polynomial of L is χ_L(λ) ≡ χ_A(λ) where A = [L]_B.

Recall that λ eigenvalue of A iff χA(λ) = 0, i.e. the eigenvalues of Remark 11. Since matrix representations A (or L) are just the roots of χA(λ). (We can use numerical methods to different bases are similar, it follows to find roots, but that is actually not efficient, there exist much better from previous theorem (Theorem 2.23) χ (λ) not algorithms to approximate eigenvalues and eigenvectors - math 327 that L is well-defined, i.e. does depend on B. matrix numerical analysis.)

Example 2.12. Let A = [1 2 4; 1 0 2; −1 −1 −3]. Find all eigenvalues and eigenvectors, and find P ∈ GL(3, R) s.t. P^{-1}AP is diagonal.

χ_A(λ) = det(λI − A) = det[λ−1 −2 −4; −1 λ −2; 1 1 λ+3]
(col 1 ← col 1 − col 2) = det[λ+1 −2 −4; −λ−1 λ −2; 0 1 λ+3] = (λ+1)·det[1 −2 −4; −1 λ −2; 0 1 λ+3]
(row 2 ← row 2 + row 1) = (λ+1)·det[1 −2 −4; 0 λ−2 −6; 0 1 λ+3] = (λ+1)·det[λ−2 −6; 1 λ+3]
= (λ+1)(λ² + λ − 6 + 6) = (λ+1)(λ² + λ) = λ(λ+1)²

⇒ the eigenvalues are λ = 0 and λ = −1. (Any elementary row or column operation may be used in determinant computations, as long as its effect on the determinant is tracked; the operations used here leave it unchanged.)
Eigenspaces: x is an eigenvector to λ iff x ≠ 0 and Ax = λx ⇐⇒ (λI − A)x = 0, so the eigenvectors are all nontrivial solutions of this system (0 cannot be an eigenvector).
Case 1: λ = −1:
E_{−1}: [−2 −2 −4; −1 −1 −2; 1 1 2] ∼ [1 1 2; 0 0 0; 0 0 0] ⇒ x_1 + x_2 + 2x_3 = 0, with x_2, x_3 free parameters.

x_2 = 0, x_3 = 1: x_1 = −2, giving (−2, 0, 1); x_2 = 1, x_3 = 0: x_1 = −1, giving (−1, 1, 0)
⇒ E_{−1}(A) = span{(−2, 0, 1), (−1, 1, 0)}
Case 2: λ = 0 (solve Ax = 0):
E_0: [1 2 4; 1 0 2; −1 −1 −3] ∼ [1 2 4; 0 −2 −2; 0 1 1] ∼ [1 2 4; 0 1 1; 0 0 0] ⇒ x_1 + 2x_2 + 4x_3 = 0, x_2 + x_3 = 0, x_3 a free parameter.

x_3 = 1: x_2 = −1, x_1 − 2 + 4 = 0, x_1 = −2 ⇒ E_0(A) = span{(−2, −1, 1)} [= nullspace(A)]
So with P ≡ [−2 −2 −1; −1 0 1; 1 1 0] we get P^{-1}AP = [0 0 0; 0 −1 0; 0 0 −1], where the first column of P is the eigenvector to λ = 0 and the last two columns are the eigenvectors to λ = −1.
Q: Does A ∈ Mat(n × n, K) always have at least one eigenvalue?

1. no if K = R

2. yes if K = C.

1. If K = R, there exist linear maps L : V → V, V a VS over R, without eigenvalues (this occurs when the characteristic polynomial has no real roots). Example: A = [0 −1; 1 0], L : R² → R², L(x) = Ax. Since A is a rotation by π/2, a rotation of a vector v is never a multiple of v unless v = 0, which is not an eigenvector. We compute the characteristic polynomial:

χ_A(λ) = det(λI − A) = det[λ 1; −1 λ] = λ² + 1

This has no real roots, and hence L : R² → R² has no eigenvalues over the reals. If instead we take L as a map from C² to C², x ↦ Ax, then χ_A(λ) = λ² + 1 has roots λ = ±i, giving two eigenvalues ⇒ A is diagonalizable over C:

A = [0 −1; 1 0] ∼ [i 0; 0 −i]. To find the eigenvectors, we set λ = i:

(λI − A) = [i 1; −1 i] ∼ [1 −i; 0 0] ⇒ eigenvector (i, 1). Setting λ = −i: (λI − A) = [−i 1; −1 −i] ∼ [1 i; 0 0] ⇒ eigenvector (−i, 1).
⇒ P ≡ [i −i; 1 1], P^{-1}AP = [i 0; 0 −i].
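A small numerical check of this example (not from the lecture; it assumes NumPy): over R the rotation matrix has no real eigenvalues, while NumPy, working over C, finds ±i and diagonalizes it.

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation by pi/2

evals, evecs = np.linalg.eig(A)      # numpy computes over C automatically
print(evals)                         # approximately [i, -i]: no real eigenvalues

P = evecs                            # columns are eigenvectors
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))               # diagonal with entries i and -i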

Proof. 2. λ0 eigenval of A ⇔ χA(λ0) = 0, i.e. A has at least one eigenval iff χA(λ) has at least one root. χA(λ) is a polynomial of degree n. By the Fundamental Theorem of Algebra, every polynomial of degree n factors over C into a product of n linear factors, i.e. every polynomial of degree n has n roots over C (counted with multiplicity) and thus has at least one root. lecture 03/01 We can find eigenvalues and eigenvectors without the use of the characteristic polynomial.

Definition 2.17. Matrix polynomials: let p(x) ≡ a_0 + a_1x + a_2x² + ··· + a_nx^n and A ∈ Mat(n × n, K). Then A^k ∈ Mat(n × n, K) for each k, and we define p(A) ≡ a_0I + a_1A + ··· + a_nA^n ∈ Mat(n × n, K).

Algorithm for Finding Eigenvalue of A ∈ Mat(n × n, C):

n - Let x0 ∈ C be arbitrary 6= 0.

2 3 - Consider x0, A x0, A x0, ··· at most n of those can be LI. Let k be k the smallest index s.t {x0, Ax0, ··· , A x0 } is LD.

- So if we let a0, ··· , ak ∈ C, not all 0 s.t a0 x0 + a1 Ax0 + ··· + k k ak A x0 = 0 =⇒ (a0 I + a1 A + ··· + ak A )x0 = 0. This is a homogeneous linear system with nontrivial solution since k x0 6= 0 =⇒ B ≡ a0 I + a1 A + ··· + ak A is not invertible.

k - Let p(x) ≡ a0 + a1 x + ··· + ak x =⇒ B = p(A). By the Fundamental Theorem of Algebra, we know that p factors fully

over C : p(x) = ak (x − c1 ) ····· (x − ck ). 40 anna brandenberger, daniela breitman

- Since a product of invertible matrices is invertible, and p(A) = a_k(A − c_1I) ··· (A − c_kI) = B is not invertible, at least one of the factors A − c_jI is not invertible. Then ∃x ∈ C^n, x ≠ 0, s.t. (A − c_jI)x = 0 ⇐⇒ Ax − c_jx = 0 ⇐⇒ Ax = c_jx, i.e. c_j is an eigenvalue of A. (Caution: not all of the c_i are necessarily eigenvalues, but at least one is.)

Example 2.13. Let A = [1 2 4; 1 0 2; −1 −1 −3] (the matrix of Example 2.12). We choose x_0 = (0, 1, 0), a column of the identity, so that Ax_0 is simply the second column of A (x_0 can be any nonzero vector, but we want the computation as simple as possible). We then have:

x_0 = (0, 1, 0), Ax_0 = (2, 0, −1), A²x_0 = A(Ax_0) = (−2, 0, 1)
⇒ {x_0, Ax_0, A²x_0} is LD: A²x_0 + Ax_0 = 0, i.e. (A² + A)x_0 = 0
⇒ p(λ) = λ + λ² = λ(λ + 1) ⇒ λ = 0, λ = −1

 1  We obtain the exact same result if we take x0 = 0 0
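A sketch of this procedure in Python (not from the lecture; it assumes NumPy, and the variable names are illustrative). It builds the vectors x_0, Ax_0, A²x_0, ··· until they become dependent, extracts the polynomial, and then keeps only those roots for which A − cI is actually singular, since at least one root is guaranteed to be an eigenvalue but not necessarily all of them.

import numpy as np

A = np.array([[ 1.,  2.,  4.],
              [ 1.,  0.,  2.],
              [-1., -1., -3.]])
x0 = np.array([0., 1., 0.])

# Krylov vectors x0, A x0, A^2 x0, ...; stop at the first one that is an LC
# of its predecessors.
vecs = [x0]
while True:
    nxt = A @ vecs[-1]
    if np.linalg.matrix_rank(np.column_stack(vecs + [nxt])) == len(vecs):
        break
    vecs.append(nxt)

# Solve nxt = c_0 x0 + ... + c_{k-1} A^{k-1} x0, so that
# p(t) = t^k - c_{k-1} t^{k-1} - ... - c_0 satisfies p(A) x0 = 0.
c, *_ = np.linalg.lstsq(np.column_stack(vecs), nxt, rcond=None)
p = np.concatenate([[1.0], -c[::-1]])      # coefficients, highest degree first
candidates = np.roots(p)
print(candidates)                          # here: -1 and 0

# keep only the roots z with det(A - zI) = 0: these are actual eigenvalues
eigs = [z for z in candidates if abs(np.linalg.det(A - z * np.eye(3))) < 1e-8]
print(eigs)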

2.8 Normal Forms for Non-diagonalizable Matrices

λ1 ··· ? ! ∈ ( × ) . Definition 2.18. A matrix A Mat n n, K of the form .. 0 ··· λn is upper triangular. Similarly, a transpose of an upper triangular matrix is lower triangular.

Theorem 2.24. For upper triangular A ∈ Mat(n × n, C), the diagonal ele- ments λ1, ··· , λn are the eigenvalues of A.

λ−λ1 ? ! ( ) = ( − ) = . Proof. χA λ det λI A det .. . Since the 0 λ−λn determinant of a triangular matrix is the product of its diagonal entries, we have that χA(λ = (λ − λ1) ····· (λ − λn) =⇒ λ1, ··· , λn are the eigenvalues of A. The same holds for lower triangular matrices.

Theorem 2.25. Every matrix A ∈ Mat(n × n, C), is similar to an upper triangular matrix.

Proof. We proceed by induction on n.
n = 1: Every 1 × 1 matrix is upper triangular.

n − 1 → n (inductive step): Assume the statement holds for n − 1. Let A ∈ Mat(n × n, C). A has at least one eigenvalue λ_1 ∈ C; let v_1 be a corresponding eigenvector. Extend to a basis B ≡ (v_1, v_2, ··· , v_n) of C^n and set P ≡ (v_1 | v_2 | ··· | v_n). Then
B = P^{-1}AP = [λ_1 ? ··· ?; 0 C]
for some C ∈ Mat((n−1) × (n−1), C). Since A ∼ B, it suffices to show B ∼ T for some upper triangular T (then A ∼ T by transitivity). By the inductive hypothesis, C is similar to an upper triangular matrix, i.e. ∃Q̃ ∈ GL(n − 1, C) with Q̃^{-1}CQ̃ upper triangular with diagonal λ_2, ··· , λ_n. Let
Q ≡ [1 0 ··· 0; 0 Q̃] ∈ Mat(n × n, C).
Then Q is invertible with Q^{-1} = [1 0 ··· 0; 0 Q̃^{-1}] (block multiplication gives QQ^{-1} = I), and
Q^{-1}BQ = [1 0; 0 Q̃^{-1}][λ_1 ?; 0 C][1 0; 0 Q̃] = [λ_1 ?; 0 Q̃^{-1}CQ̃],

which is an upper triangular matrix.

2.9 Triangularization lecture 03/13 Given A ∈ Mat(n × n, C), we want to find an upper triangular matrix T ∼ A. Summary of algorithm from last class:

1. Find one eigenvalue λ1 and corresponding eigenvector v1 of A n and extend v1 to a basis v1, ··· , vn of C :  t  λ1 u  0  −1   P ≡ (v |v | · · · |vn), then P AP =   where 1 2  .   . B  0 u ∈ Cn−1, B ∈ Mat(n − 1 × n − 1, C).

2. Apply step 1. to B. . . n − 1th step: we have an upper triangular matrix T ∼ A.

Example 2.14. Let A = [3 −3 −3; 1 0 −2; −1 2 4]. We arbitrarily choose x_0 and apply A until we obtain LD vectors:

x_0 = (1, 0, 0), Ax_0 = (3, 1, −1), A²x_0 = (9, 5, −5) = 5·(3, 1, −1) − 6·(1, 0, 0).
The last vector is obviously an LC of the two previous ones:
⇒ A²x_0 = 5Ax_0 − 6x_0 ⇒ (A² − 5A + 6I)x_0 = 0 ⇒ (λ − 2)(λ − 3) = 0.
(Caution: with this method at least one of the values found is an eigenvalue — not necessarily both, as with the χ_A method — and there is no guarantee that these are the only eigenvalues. To check which λ is an eigenvalue, row-reduce λI − A: non-invertible means λ is an eigenvalue.)
λ = 3: 3I − A = [0 3 3; −1 3 2; 1 −2 −1] ∼ [1 0 1; 0 1 1; 0 0 0]. Indeed not invertible, since det(3I − A) = 0, so λ = 3 is an eigenvalue. Solving the system gives E_3 = span{(1, 1, −1)}.
We now extend v_1 to a basis of C³: let v_1 = (1, 1, −1), v_2 = (0, 1, 0), v_3 = (0, 0, 1), and P_1 ≡ [1 0 0; 1 1 0; −1 0 1], so that P_1^{-1}AP_1 is the matrix representation wrt the basis (v_1, v_2, v_3):
A v_1 = 3·v_1 + 0·v_2 + 0·v_3, A v_2 = −3·v_1 + 3·v_2 − 1·v_3, A v_3 = −3·v_1 + 1·v_2 + 1·v_3

⇒ B = P_1^{-1}AP_1 = [3 −3 −3; 0 3 1; 0 −1 1], C ≡ [3 1; −1 1]. We repeat the same process for C: its characteristic polynomial gives λ = 2 as the only eigenvalue, so E_2 = span{w_1}, w_1 = (1, −1). We extend to a basis of C² with w_2 = (0, 1):
P̃_2 ≡ [1 0; −1 1], C̃ ≡ P̃_2^{-1}CP̃_2 = [2 1; 0 2], since C w_1 = 2·w_1 + 0·w_2 and C w_2 = 1·w_1 + 2·w_2.
With P_2 = [1 0 0; 0 1 0; 0 −1 1] (the block matrix with blocks 1 and P̃_2), P_2^{-1}BP_2 will be upper triangular.

Next, we look at two methods for computing T = P_2^{-1}BP_2.
1. Interpret P_2^{-1}BP_2 as the matrix of B with respect to the basis given by the columns p_1, p_2, p_3 of P_2:
B p_1 = B(1, 0, 0) = 3·p_1 + 0·p_2 + 0·p_3
B p_2 = B(0, 1, −1) = 0·p_1 + 2·p_2 + 0·p_3
B p_3 = B(0, 0, 1) = −3·p_1 + 1·p_2 + 2·p_3
⇒ T = [3 0 −3; 0 2 1; 0 0 2].
Also, P_2^{-1}BP_2 = P_2^{-1}P_1^{-1}AP_1P_2 = (P_1P_2)^{-1}A(P_1P_2) = P^{-1}AP with P = P_1P_2 = [1 0 0; 1 1 0; −1 −1 1].
2. Block computation: P_2^{-1}BP_2 = [1 0; 0 P̃_2^{-1}][λ_1 u^t; 0 C][1 0; 0 P̃_2] = [λ_1 u^t·P̃_2; 0 P̃_2^{-1}CP̃_2] = [λ_1 u^t·P̃_2; 0 C̃], giving again T = [3 0 −3; 0 2 1; 0 0 2].

3. Another way is to find all eigenvalues first. In this example λ = 2 and λ = 3 are the only eigenvalues (each with a 1-dimensional eigenspace): E_3 = span{(1, 1, −1)}, E_2 = span{(0, 1, −1)}. We extend to a basis by taking v_3 = (0, 0, 1). Then
A v_1 = 3·v_1 + 0·v_2 + 0·v_3, A v_2 = 0·v_1 + 2·v_2 + 0·v_3, A v_3 = −3·v_1 + 1·v_2 + 2·v_3,
so with P ≡ [1 0 0; 1 1 0; −1 −1 1] we get P^{-1}AP = [3 0 −3; 0 2 1; 0 0 2].
(To see that the A of these examples is not diagonalizable: it has only the two eigenvalues 2 and 3, and both eigenspaces are 1-dimensional, so the eigenvectors span at most a 2-dimensional subspace of C³ and there is no basis of eigenvectors.)
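As a numerical check of Example 2.14 (not from the lecture; it assumes NumPy and SciPy), one can confirm that P^{-1}AP equals the upper triangular T found above; for comparison, scipy.linalg.schur also produces an upper triangular form, via an orthogonal/unitary change of basis.

import numpy as np
from scipy.linalg import schur

A = np.array([[ 3., -3., -3.],
              [ 1.,  0., -2.],
              [-1.,  2.,  4.]])
P = np.array([[ 1., 0., 0.],
              [ 1., 1., 0.],
              [-1.,-1., 1.]])
T = np.linalg.inv(P) @ A @ P
print(np.round(T, 10))            # upper triangular with diagonal 3, 2, 2

# library routine for comparison: Schur form, Q.T @ A @ Q = T2 with Q orthogonal
T2, Q = schur(A)
print(np.round(np.diag(T2), 10))  # same eigenvalues 3, 2, 2 (possibly reordered)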

3 Applications

3.1 Systems of 1st Order Linear ODEs with Constant Coefficients lecture 03/15 Intro to Matrix Exponentiation

Reminder: the Maclaurin series of e^x is e^x = 1 + x + x²/2 + ··· = Σ_{k=0}^∞ x^k/k!. Now let X ∈ Mat(n × n, K). Then X^k = X···X (k times) ∈ Mat(n × n, K) and (1/k!)X^k ∈ Mat(n × n, K), hence also Σ_{k=0}^N (1/k!)X^k ∈ Mat(n × n, K).
Fact: lim_{N→∞} Σ_{k=0}^N (1/k!)X^k exists, and we define the following:

Definition 3.1. exp(X) ≡ Σ_{k=0}^∞ (1/k!)X^k, the matrix exponential of X.

Special case: X is a diagonal matrix, X = diag(λ_1, ··· , λ_n). Then X^k = diag(λ_1^k, ··· , λ_n^k), so
exp(X) = Σ_{k=0}^∞ (1/k!)·diag(λ_1^k, ··· , λ_n^k) = diag(Σ_{k=0}^∞ λ_1^k/k!, ··· , Σ_{k=0}^∞ λ_n^k/k!) = diag(e^{λ_1}, ··· , e^{λ_n}).
(Caution: this formula only holds in this special case. In general exp(X) is NOT the matrix of entrywise exponentials (e^{x_ij}).)
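A small sketch (not from the lecture; it assumes NumPy and SciPy) comparing a truncated version of the series definition with scipy.linalg.expm and with the diagonal special case above.

import numpy as np
from scipy.linalg import expm

def exp_series(X, terms=30):
    """Truncated series sum_{k=0}^{terms} X^k / k!."""
    result = np.zeros_like(X, dtype=float)
    term = np.eye(X.shape[0])
    for k in range(terms + 1):
        result += term
        term = term @ X / (k + 1)
    return result

D = np.diag([1.0, 2.0, -0.5])
print(np.allclose(exp_series(D), np.diag(np.exp([1.0, 2.0, -0.5]))))  # diagonal case
print(np.allclose(exp_series(D), expm(D)))

X = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
print(np.allclose(exp_series(X), expm(X)))   # the series also works for non-diagonal X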

Matrix Exponentiation and Similarity
Let X ∈ Mat(n × n, K), P ∈ GL(n, K). Then
P^{-1} exp(X) P = P^{-1}(Σ_{k=0}^∞ (1/k!)X^k)P = Σ_{k=0}^∞ (1/k!)P^{-1}X^kP = Σ_{k=0}^∞ (1/k!)(P^{-1}XP)^k = exp(P^{-1}XP).
(Aside: P^{-1}X²P = P^{-1}X(PP^{-1})XP = (P^{-1}XP)(P^{-1}XP) = (P^{-1}XP)²; applying this idea recursively gives P^{-1}X^kP = (P^{-1}XP)^k.)

Systems of Linear ODEs:

One-dimensional case:
Let x = x(t). Find the solutions of ẋ = ax, a ∈ R (where ẋ = x′ = (d/dt)x(t)).

One solution is x(t) = e^{at}. (Check: ẋ = a·e^{at} = ax ✓.) General solution: let x = x(t) be any solution of ẋ = ax and consider xe^{−at}. Then
(d/dt)(xe^{−at}) = ẋe^{−at} + x·e^{−at}·(−a) = axe^{−at} − axe^{−at} = 0.

Thus xe^{−at} is a constant, say xe^{−at} = C, so x = C·e^{at}, C ∈ R. In particular x(0) = Ce^0 = C, i.e. C is the initial condition of x at t = 0. Hence the general solution of ẋ = ax, x(0) = x_0 ∈ R, is x(t) = x_0·e^{at}.
Multivariable case:
Example 3.1. n = 2: ẋ_1 = x_1 − x_2, ẋ_2 = 6x_1 − 4x_2, with initial conditions x_1(0) = 1, x_2(0) = 1, i.e.
(ẋ_1; ẋ_2) = [1 −1; 6 −4](x_1; x_2);  Ẋ = AX;  IC X(0) = X_0 ∈ K^n.

Problem: the equations are "entangled" and thus difficult to solve. We'll use linear algebra to "disentangle" them. Consider a differential equation of the form Ẋ = AX, where A ∈ Mat(n × n, K) and X = X(t) = (x_1(t), ··· , x_n(t))^t is differentiable, with initial condition X(0) = (x_1(0), ··· , x_n(0))^t = X_0 ∈ K^n.
Assume for now that A is diagonalizable, i.e. D = P^{-1}AP for some P ∈ GL(n, K), D = diag(λ_1, ··· , λ_n).
⇒ A = PDP^{-1} ⇒ Ẋ = AX = PDP^{-1}X ⇒ P^{-1}Ẋ = DP^{-1}X ⇒ (d/dt)(P^{-1}X) = D(P^{-1}X)
(since P^{-1} is constant, P^{-1}Ẋ = (d/dt)(P^{-1}X)).

Let Y ≡ P−1X, then we have Y˙ = DY. −1 Let Y0 ≡ P X0 This is a change of coordinates from standard basis to (u1,..., un) where  1  y˙1 ! λ1 ! y1 ! Y0 P = (v1 | · · · | vn) . = . . , Y(0) = Y = . . .. . 0  .  y˙ yn n n λn Y0

˙ 1 1 λ1t ⇔ Y1 = λ1Y1, Y1(0) = Y0 =⇒ Y1(t) = Y0 · e . . ˙ n n λnt Yn = λnYn, Yn(0) = Y0 =⇒ Yn(t) = Y0 · e 46 anna brandenberger, daniela breitman

⇒ Y(t) = diag(e^{λ_1t}, ··· , e^{λ_nt})·Y_0 = exp(tD)·Y_0.
(Compare the 1-dimensional case x = x_0e^{at} = exp(at)x_0.) Now Y = P^{-1}X, X = PY, Y_0 = P^{-1}X_0, so
X = PY = P exp(tD)P^{-1}X_0 = exp(P(tD)P^{-1})X_0 = exp(t·PDP^{-1})X_0
= exp(tA)X_0
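A quick numerical illustration of this formula for the system of Example 3.1 (not from the lecture; it assumes NumPy and SciPy): the eigendecomposition route P exp(tD)P^{-1}X_0 agrees with exp(tA)X_0 computed by scipy.linalg.expm.

import numpy as np
from scipy.linalg import expm

A = np.array([[1., -1.],
              [6., -4.]])
x0 = np.array([1., 1.])

evals, P = np.linalg.eig(A)                       # eigenvalues -2 and -1

def x_eig(t):
    # X(t) = P exp(tD) P^{-1} x0
    return P @ np.diag(np.exp(t * evals)) @ np.linalg.inv(P) @ x0

def x_exp(t):
    # X(t) = exp(tA) x0
    return expm(t * A) @ x0

t = 0.7
print(np.allclose(np.real(x_eig(t)), x_exp(t)))   # the two routes agree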

So for the n-dimensional case we have X = exp(tA)X_0, and for the 1-dimensional case x = x_0e^{at} = exp(ta)x_0 — the same formula!

Example 3.2. (back to the previous example):
(ẋ_1; ẋ_2) = [1 −1; 6 −4](x_1; x_2), x_0 = (1; 1).
A = [1 −1; 6 −4] has eigenvalues −2 and −1: E_{−2} = span{(1; 3)}, E_{−1} = span{(1; 2)}
⇒ P = [1 1; 3 2], P^{-1} = −[2 −1; −3 1] = [−2 1; 3 −1], D = [−2 0; 0 −1], Y = P^{-1}x, Y_0 = P^{-1}x_0 = (−1; 2)
Ẏ = DY: Ẏ_1 = −2Y_1, Y_1(0) = −1 and Ẏ_2 = −Y_2, Y_2(0) = 2 ⇒ Y_1(t) = −e^{−2t}, Y_2(t) = 2e^{−t}
exp(tD) = [e^{−2t} 0; 0 e^{−t}], X(t) = P exp(tD)P^{-1}X_0
⇒ X(t) = (x_1(t); x_2(t)) = (−e^{−2t} + 2e^{−t}; −3e^{−2t} + 4e^{−t})
(Check: x_1(0) = −1 + 2 = 1 ✓, x_2(0) = −3 + 4 = 1 ✓; and ẋ_1 = 2e^{−2t} − 2e^{−t} = x_1 − x_2 ✓.)

lecture 03/20 Now let us consider the case when A is not diagonalizable. (Recall: in the diagonalizable case X = P·diag(e^{λ_1t}, ··· , e^{λ_nt})·P^{-1}X_0 = exp(tA)X_0, so all solutions are linear combinations of e^{λ_1t}, ··· , e^{λ_nt}, where λ_1, ··· , λ_n are the eigenvalues of A.) If A is not diagonalizable, we can at least triangularize it: A ∼ T for T upper triangular, i.e. A = PTP^{-1} ⇐⇒ T = P^{-1}AP.
Ẋ = PTP^{-1}X, X(0) = X_0 ⇒ P^{-1}Ẋ = TP^{-1}X. With Y ≡ P^{-1}X: Ẏ = TY, Y(0) = Y_0 = P^{-1}X_0, i.e.
ẏ_1 = a_11y_1 + a_12y_2 + ··· + a_1ny_n
ẏ_2 = a_22y_2 + ··· + a_2ny_n
⋮
ẏ_{n−1} = a_{(n−1)(n−1)}y_{n−1} + a_{(n−1)n}y_n
ẏ_n = a_{nn}y_n

Thus, we can solve for y_n directly and then work from the bottom to the top.
Example 3.3. Ẏ = [2 1; 0 2]Y, Y(0) = (1; 1):
ẏ_2 = 2y_2, y_2(0) = 1 ⇒ y_2 = e^{2t}
ẏ_1 = 2y_1 + y_2 = 2y_1 + e^{2t}, y_1(0) = 1.
We solve this nonhomogeneous linear DE: the homogeneous solution of ẏ_1 − 2y_1 = 0 is Ce^{2t}, and the solution of the full equation has the form y_1 = (At + B)e^{2t}. Solving: ẏ_1 − 2y_1 = e^{2t} ⇒ A = 1; y_1(0) = 1 ⇒ B = 1; so y_1 = (t + 1)e^{2t}.
We notice that this solution is not a linear combination of exponentials as in the diagonalizable case, so the structure of the solutions is very different. We note that we can also solve this using matrix exponentials:

−1 Y˙ = TY, Y(0) = Y0 = P X0

=⇒ Y = exp(tT)Y0 (?) −1 −1 P X = exp(tT)P X0 −1 −1 −1 X = P exp(tT)P X0 = exp(P(tT)P )X0 = exp(tPTP )X0

X(t) = exp(tA)X0

Proof of (?) in the special case T = [a b; 0 a], a, b ∈ R. We show that exp(tT)Y_0 solves Ẏ = TY, Y(0) = Y_0, where exp(X) = Σ_{n=0}^∞ (1/n!)X^n.
We first derive a formula for T^n: T^n = [a^n na^{n−1}b; 0 a^n]. (Proof by induction: for n = 1 this is just T; assuming it for n, T^{n+1} = T^nT = [a^n na^{n−1}b; 0 a^n][a b; 0 a] = [a^{n+1} (n+1)a^nb; 0 a^{n+1}].)
With this we can now compute:
exp(tT) = Σ_{n=0}^∞ (t^n/n!)T^n = [Σ_{n=0}^∞ (ta)^n/n!  tb·Σ_{n=1}^∞ (ta)^{n−1}/(n−1)!; 0  Σ_{n=0}^∞ (ta)^n/n!] = [e^{ta} tbe^{ta}; 0 e^{ta}]
Differentiating entrywise gives (d/dt)exp(tT) = T·exp(tT), so Y(t) = exp(tT)·Y_0 = [e^{ta} tbe^{ta}; 0 e^{ta}](y_{01}; y_{02}) indeed solves the system.
Example 3.4. Ẏ = [2 1; 0 2]Y, Y(0) = Y_0 = (1; 1), so a = 2, b = 1: Y(t) = ((1 + t)e^{2t}; e^{2t}).
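A short check of this closed form against the library matrix exponential (not from the lecture; it assumes NumPy and SciPy, with the values of Example 3.4).

import numpy as np
from scipy.linalg import expm

a, b, t = 2.0, 1.0, 0.3
T = np.array([[a, b],
              [0., a]])
closed_form = np.exp(t * a) * np.array([[1.0, t * b],
                                        [0.0, 1.0]])
print(np.allclose(expm(t * T), closed_form))      # exp(tT) = e^{ta} [[1, tb], [0, 1]]

# solution of Y' = T Y, Y(0) = (1, 1): Y(t) = ((1 + t) e^{2t}, e^{2t})
y0 = np.array([1.0, 1.0])
print(np.allclose(expm(t * T) @ y0, [(1 + t) * np.exp(2 * t), np.exp(2 * t)]))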

3.2 Difference Equations with Constant Coefficient Difference equations are a discrete First order: Let an be a sequence recursively defined for given a0 version of ODEs. by an+1 = c · an, ∀n ∈ N. Finding a general formula for an is straightforward:

2 n a1 = c · a0 a2 = c · a1 = c · a0 ··· an = c a0 48 anna brandenberger, daniela breitman

Second order: an+2 = c · an+1 + d · an, ∀n ∈ N, for a0, a1 known initial conditions. To solve for a second order difference equation,

we create a dummy variable and diagonalize: Example 3.5. Fibonacci sequence: an+2 = an+1 + an, a0 = 0, a1 = 1. an+2 = c · an+1 + d · an

an+1 = an+1 an+2  = c d  an+1  =⇒ = = a1  an+1 1 0 an Xn+1 AXn, X0 a0

X1 = Ax0 2 X2 = AX1 = A X0 ··· n Xn = A X0

We assume A is diagonalizable: D = P−1 AP, A = PDP−1 An = PDnP−1 (from matrix exponentials) n −1 =⇒ Xn = PD P X0 lecture 03/22

Example 3.6. an+2 = 4an+1 − 3an, a0 = 0, a1 = 1

X_n = A^nX_0, A = [4 −3; 1 0]
χ_A(λ) = det[λ−4 3; −1 λ] = λ² − 4λ + 3 = (λ − 3)(λ − 1)
E_3: [−1 3; −1 3] ∼ [−1 3; 0 0] ⇒ E_3 = span{(3; 1)};  E_1: [−3 3; −1 1] ∼ [1 −1; 0 0] ⇒ E_1 = span{(1; 1)}
⇒ P = [3 1; 1 1], P^{-1} = (1/2)[1 −1; −1 3]
⇒ X_n = [3 1; 1 1][3^n 0; 0 1]·(1/2)[1 −1; −1 3]·X_0 = (1/2)[3^{n+1}−1  −3^{n+1}+3; 3^n−1  −3^n+3](a_1; a_0), with X_0 = (a_1; a_0)
⇒ a_n = (1/2)[(3^n − 1)a_1 + (−3^n + 3)a_0] = 3^n·(a_1 − a_0)/2 + (−a_1 + 3a_0)/2
1. If a_0 = 0, a_1 = 1: a_n = (3^n − 1)/2.
2. If a_0 = 1, a_1 = 3: a_n = 3^n.
3. If a_0 = 1, a_1 = 1: a_n = 1 ∀n ∈ N — a constant sequence.
So the solutions "escape to infinity" iff a_1 ≠ a_0; if a_1 = a_0, the sequence is constant. In general, if A is diagonalizable with eigenvalues λ_1, λ_2, the solutions are LCs of λ_1^n and λ_2^n, i.e. LCs of exponential (geometric) sequences. Now what if A is not diagonalizable?

Example 3.7. an+2 = 2an+1 − an , a0 = 0, a1 = 1

X_{n+1} = AX_n where A = [2 −1; 1 0]
χ_A(λ) = det[λ−2 1; −1 λ] = λ² − 2λ + 1 = (λ − 1)² ⇒ λ = 1 is the only eigenvalue.
E_1: [−1 1; −1 1] ∼ [1 −1; 0 0] ⇒ E_1 = span{(1; 1)}

⇒ A is not diagonalizable, so we triangularize A. Extend (1; 1) by e.g. (0; 1) to a basis for R². Recall [a b; 0 a]^n = [a^n na^{n−1}b; 0 a^n].

P = [1 0; 1 1], P^{-1} = [1 0; −1 1], T = P^{-1}AP = [1 −1; 0 1] ⇒ T^n = [1 −n; 0 1]
⇒ X_n = PT^nP^{-1}X_0 = [1 0; 1 1][1 −n; 0 1][1 0; −1 1]X_0 = [1+n −n; n 1−n](a_1; a_0)

⇒ a_n = n·a_1 + (1 − n)a_0; with a_0 = 0, a_1 = 1 this gives a_n = n.
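A small numerical check of Examples 3.6 and 3.7 (not from the lecture; it assumes NumPy): the matrix-power recursion reproduces the closed forms a_n = (3^n − 1)/2 and a_n = n found above.

import numpy as np

def solve_recurrence(c, d, a0, a1, n):
    """a_{k+2} = c a_{k+1} + d a_k via X_{k+1} = A X_k, X_k = (a_{k+1}, a_k)."""
    A = np.array([[c, d],
                  [1., 0.]])
    Xn = np.linalg.matrix_power(A, n) @ np.array([a1, a0])
    return Xn[1]                      # second entry of X_n is a_n

# Example 3.6: a_{n+2} = 4 a_{n+1} - 3 a_n, a0 = 0, a1 = 1  =>  a_n = (3^n - 1)/2
print([solve_recurrence(4., -3., 0., 1., n) for n in range(6)])
print([(3**n - 1) / 2 for n in range(6)])

# Example 3.7: a_{n+2} = 2 a_{n+1} - a_n, a0 = 0, a1 = 1  =>  a_n = n
print([solve_recurrence(2., -1., 0., 1., n) for n in range(6)])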

4 Orthogonality and Inner Product Spaces

4.1 Definitions

Definition 4.1. Standard Inner Product on Rn (also called dot prod-  u1   v1  uct or standard scalar product). Given u = ··· and v = ··· , un vn v  1  t hu, vi ≡ u1v1 + ··· + unvn ≡ (u1 ··· un) ··· = u · v vn

Theorem 4.1. Properties: h, i is bilinear, i.e. linear in each component.

• hu + w, vi = hu, vi + hw, vi, hk, uvi = khu, vi

• hu, v + wi = hu, vi + hu, wi, hu, kvi = khu, vi

 u1  hu, ui = u2 + ··· + u2 = kuk2 = (u ··· u ) ··· = ut · u 1 n 1 n u q n 2 2 =⇒ kuk = u1 + ··· + un = hu,vi The angle α between vectors u and v is cosα kukkvk π α = 2 ⇔ cosα = 0 ⇔ hu, vi = 0

Definition 4.2. u, v ∈ R^n are orthogonal iff ⟨u, v⟩ = 0. Note that 0 is orthogonal to every v ∈ R^n.
Now to define the standard inner product on C^n: for z = x + iy ∈ C (x = Re(z), y = Im(z)) we have |z| = √(x² + y²), and with z̄ ≡ x − iy, z̄z = (x − iy)(x + iy) = x² + y² = |z|².
Let z = (z_1, ··· , z_n) ∈ C^n with z_j = x_j + iy_j. We obtain ‖z‖ by identifying C^n with R^{2n}, i.e. identifying z with (x_1, y_1, ··· , x_n, y_n) ∈ R^{2n}:
‖z‖ = √(x_1² + y_1² + ··· + x_n² + y_n²) = √(z̄_1z_1 + ··· + z̄_nz_n)
⇒ ‖z‖² = z̄_1z_1 + ··· + z̄_nz_n = (z̄_1 ··· z̄_n)(z_1; ··· ; z_n) = z̄^t z = z^*z

Definition 4.3. Standard inner product on C^n. Given z = (z_1, ··· , z_n) and w = (w_1, ··· , w_n), we define
⟨z, w⟩ = z^*w = (z̄_1 ··· z̄_n)(w_1; ··· ; w_n) = z̄_1w_1 + ··· + z̄_nw_n

Properties: ‖z‖² = ⟨z, z⟩. Linearity:

2nd argument: ⟨z, w + v⟩ = z^*(w + v) = z̄_1(w_1 + v_1) + ··· + z̄_n(w_n + v_n) = (z̄_1w_1 + ··· + z̄_nw_n) + (z̄_1v_1 + ··· + z̄_nv_n) = ⟨z, w⟩ + ⟨z, v⟩
⟨z, kw⟩ = z^*(kw) = z̄_1(kw_1) + ··· + z̄_n(kw_n) = k(z̄_1w_1 + ··· + z̄_nw_n) = k⟨z, w⟩
⇒ ⟨,⟩ is linear in the 2nd argument.

1st argument: ⟨z + v, w⟩ = (z̄_1 + v̄_1)w_1 + ··· + (z̄_n + v̄_n)w_n = (z̄_1w_1 + ··· + z̄_nw_n) + (v̄_1w_1 + ··· + v̄_nw_n) = ⟨z, w⟩ + ⟨v, w⟩
⟨kz, w⟩ = (k̄z̄_1)w_1 + ··· + (k̄z̄_n)w_n = k̄(z̄_1w_1 + ··· + z̄_nw_n) = k̄⟨z, w⟩

So in summary:
⟨z, v + w⟩ = ⟨z, v⟩ + ⟨z, w⟩, ⟨z, kw⟩ = k⟨z, w⟩
⟨z + v, w⟩ = ⟨z, w⟩ + ⟨v, w⟩, ⟨kz, w⟩ = k̄⟨z, w⟩

Definition 4.4. Two vectors z, w ∈ Cn are orthogonal iff hz, wi = 0.

Why is orthogonality important in linear algebra? For example: V a VS over R, B = {v_1, ··· , v_n} a basis, v ∈ V. How do we find [v]_B? We need a_1, ··· , a_n ∈ K s.t. v = a_1v_1 + ··· + a_nv_n. With P = (v_1 | ··· | v_n), P(a_1; ··· ; a_n) = a_1v_1 + ··· + a_nv_n = v, so (a_1; ··· ; a_n) = P^{-1}v, which is computationally costly.

lecture 03/27 Let v_1, ··· , v_n ∈ K^n be pairwise orthogonal and LI, and let v ∈ span{v_1, ··· , v_n}. Then ∃a_1, ··· , a_n ∈ K s.t. v = a_1v_1 + ··· + a_nv_n.
(Recap from last lecture — standard inner product: ⟨u, v⟩ ≡ u^tv = u_1v_1 + ··· + u_nv_n on R^n; ⟨u, v⟩ ≡ u^*v = ū_1v_1 + ··· + ū_nv_n on C^n; norm: ‖v‖ = √⟨v, v⟩. Caution: the standard inner product on C^n is not bilinear, only sesquilinear: ⟨ku, v⟩ = k̄⟨u, v⟩ while ⟨u, kv⟩ = k⟨u, v⟩.)
In the general case, finding the a_j is computationally non-trivial. Example 4.1: with P = (v_1 | ··· | v_n), P(a_1; ··· ; a_n) = a_1v_1 + ··· + a_nv_n = v ⇒ (a_1; ··· ; a_n) = P^{-1}v, i.e. we either have to solve the inhomogeneous linear system P(a_1; ··· ; a_n) = v or compute P^{-1}.

However, if v_1, ··· , v_n are pairwise orthogonal, there is a much faster way. Write
v = a_1v_1 + ··· + a_nv_n.

Now taking hvj, ·i,

hvj, vi = hvj, a1v1 + ··· + ajvj + ··· + anvni

= hvj, a1v1i + ··· + hvj, ajvji + ··· + hvj, anvni

= a1 hvj, v1i + ··· + ajhvj, vji + ··· + an hvj, vni | {z } | {z } =0 =0

= ajhvj, vji

hvj, vi =⇒ aj = hvj, vji i.e. we have the following:

Theorem 4.2. If v = a_1v_1 + ··· + a_nv_n where v_1, ··· , v_n are pairwise orthogonal and nonzero, then a_j = ⟨v_j, v⟩/⟨v_j, v_j⟩ for all 1 ≤ j ≤ n.
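A tiny numerical illustration of Theorem 4.2 (not from the lecture; it assumes NumPy, and the orthogonal set is an arbitrary illustrative choice): the coefficients are recovered without solving a linear system.

import numpy as np

# pairwise orthogonal (not normalized) vectors in R^3
v1 = np.array([1., 1., 0.])
v2 = np.array([1., -1., 0.])
v3 = np.array([0., 0., 2.])
basis = [v1, v2, v3]

v = 3.0 * v1 - 2.0 * v2 + 0.5 * v3            # some vector in their span

coeffs = [np.dot(vj, v) / np.dot(vj, vj) for vj in basis]
print(coeffs)                                  # [3.0, -2.0, 0.5]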

Definition 4.5. Unit vector. A vector v ∈ Kn is called a unit vector if p kvk = hv, vi = 1

Theorem 4.3. We can turn a given non-zero vector v into a unit vector by normalizing: (1/‖v‖)v is a unit vector.

Proof. In general, ⟨kv, kv⟩ = k̄⟨v, kv⟩ = k̄k⟨v, v⟩ = |k|²⟨v, v⟩; this covers both R and C, since k̄ = k for all k ∈ R.
Now consider (1/‖v‖)v:
⟨(1/‖v‖)v, (1/‖v‖)v⟩ = (1/‖v‖²)⟨v, v⟩ = ‖v‖²/‖v‖² = 1
⇒ ‖(1/‖v‖)v‖ = 1, i.e. (1/‖v‖)v is a unit vector.

n Definition 4.6. {v1, ··· , vk} ∈ K are called

• orthogonal if v1, ··· , vk are pairwise orthogonal

• orthonormal if v1, ··· , vk are pairwise orthogonal and all vj are

unit vectors.
Example 4.2. S ≡ {e_1, ··· , e_n} is an orthonormal basis for K^n.
Now let {v_1, ··· , v_k} be orthonormal and v ∈ span{v_1, ··· , v_k}, v = a_1v_1 + ··· + a_kv_k. Then a_j = ⟨v_j, v⟩/⟨v_j, v_j⟩ = ⟨v_j, v⟩/‖v_j‖² = ⟨v_j, v⟩, i.e. we have the following:

Theorem 4.4. If {v_1, ··· , v_k} is orthonormal and v = a_1v_1 + ··· + a_kv_k, then a_j = ⟨v_j, v⟩.

Definition 4.7. A matrix A ∈ Mat(n × n, R) is called orthogonal if the columns of A are orthonormal.

Definition 4.8. A matrix A ∈ Mat(n × n, C) is called unitary if the columns of A are orthonormal with respect to the standard inner product on Cn.

4.2 Inverses of orthogonal and unitary matrices

Let A ∈ Mat(n × n, R) be orthogonal.

A = (v1 | · · · | vn)  t   t  v1 v1 At = . =⇒ At · A = . (v | · · · | v ) = (b )  .   .  1 n ij t t vn vn  t 0 if i 6= j =⇒ bij = vi · vj = hvi, vji = 1 if i = j =⇒ At · A = I =⇒ A−1 = At

Now let A ∈ Mat(n × n, C) be unitary. If A = (aij, then A ≡ (aij) ? t A = (v1 | · · · | vn), A ≡ A  ?   ?  v1 v1 ? . ? . −1 ? A =  .  =⇒ A · A =  .  (v1 | · · · | vn) = I =⇒ A = A .? .? vn vn Thus we have the following:

Theorem 4.5. Inverse of orthogonal and unitary matrices:

• If A ∈ Mat(n × n, R), A orthogonal, then A−1 = At.

• If A ∈ Mat(n × n, C), A unitary, then A−1 = A?.

n Let U ≤ K , B = {v1, ··· , vk} a basis for U. We want to find an or- thogonal or orthonormal basis B0 for U:Gram-Schmidt-Process. This algorithm requires orthogonal projections: (Imagine a dia- gram of v projected onto u, u 6= 0, with vk = proju(v) the projection onto u and v⊥ the perpendicular part.) vk = t · u where t is to be determined

v = vk + v⊥ = tu + v⊥ hu, vi =⇒ hu, vi = hu, tu + v⊥i =⇒ proju(v) = v = u k hu, ui = h i + h i u, tu u, v⊥ hu, vi | {z } =⇒ v⊥ = v − v = v − u =0 k hu, ui = t · hu, ui 54 anna brandenberger, daniela breitman

Now we can introduce the Gram-Schmidt process:

n Let U ≤ K , B = {v1, ··· , vk} a basis for U.

Define: u1 = v1

hu1, v2i u2 = v2 − proju1 (v2) = v2 − u1 hu1, u1i hu1, v3i hu2, v3i u3 = v3 − proju1 (v3) − proju2 (v3) = v3 − u1 − u2 hu1, u1i hu2, u2i ···

hu1, vki huk−1, vki uk = vk − u1 − ... − uk−1 hu1, u1i huk−1, uk−1i

Then: span{v1, ··· , vj} = span{u1, ··· , uj} ∀1 ≤ j ≤ k

uj 6= 0 ∀1 ≤ j ≤ k

{u1, ··· , uj} is orthogonal ∀1 ≤ j ≤ k Especially, we get that span{u1, ··· , uj} = span{v1, ··· , vj} = U and {u1, ··· , uj} is orthogonal.

Proof. by "finite" induction. Remark 12. We can find an orthonormal basis by "normalizing" u1, ··· , uj: 00 1 1 B = { u1, ··· , uk} Base: j = 1 u1 = v1 6= 0 ku1k kuk k

Inductive: Now assume the result holds for a j with 1 ≤ j < k. Under that assumption, we will prove the result for j + 1.

hu1, vj+1i huj, vj+1i uj+1 = vj+1 − u1 − ... − uj hu1, u1i huj, uji | {z } ∈ span{u1,··· ,uj}=span{v1,··· ,vj} vj+16∈span{v1,··· ,vj} since {v1,··· ,vj} is LI

=⇒ uj+1 6= 0

To show orthogonality, we need to show that uj+1 is orthogonal to u1, ··· , uj: uj+1 ⊥ ui 1 ≤ i ≤ j.

hu1, vj+1i hui, uj+1i = hui, vj + 1i − hui, u1i − ... hu1, u1i | {z } =0 hui, vj+1i huj, vj+1i − hu , u i − ... − hu , u i h i i i h i i j ui, ui uj, uj | {z } =0 hu , v i i j+1  = hui, vj+1 −  hui, uii = 0 hui, uii

=⇒ {u1, ··· , uj+1} is orthogonal and all ui 6= 0 =⇒ (assignment question) {u1, ··· , uj+1} is LI. math 247: honours applied linear algebra 55

We still need to show that span{u1, ··· , uj+1} = {v1, ··· , vj+1}.

Knowing that span{u1, ··· , uj} = span{v1, ··· , vj}

=⇒ span{u1, ··· , uj+1} = span{v1, ··· , vj, uj+1}

= span{v1, ··· , vj, uj+1 , vj+1} |{z} LC of v1,··· ,vj+1

= span{v_1, ··· , v_j, v_{j+1}}

lecture 03/29 Example 4.3. Let B = {(1, 1, 0), (3, 1, −1), (1, 1, 3)} be a basis for R³. We want to find a basis B′ that is (i) orthogonal, (ii) orthonormal. We use the Gram-Schmidt method:
u_1 = (1, 1, 0)
u_2 = v_2 − (⟨u_1, v_2⟩/⟨u_1, u_1⟩)u_1 = (3, 1, −1) − (4/2)(1, 1, 0) = (1, −1, −1)
u_3 = v_3 − (⟨u_1, v_3⟩/⟨u_1, u_1⟩)u_1 − (⟨u_2, v_3⟩/⟨u_2, u_2⟩)u_2 = (1, 1, 3) − (2/2)(1, 1, 0) − (−3/3)(1, −1, −1) = (1, −1, 2)
(i) B′ = {u_1, u_2, u_3} is an orthogonal basis.
(ii) Dividing each u_i by its norm, {(1/√2)u_1, (1/√3)u_2, (1/√6)u_3} is an orthonormal basis.
Moreover, the matrix P = [1/√2 1/√3 1/√6; 1/√2 −1/√3 −1/√6; 0 −1/√3 2/√6] is orthogonal, since its columns are pairwise orthogonal unit vectors, and P^{-1} = P^t. We also notice that the same applies to the rows:

hR1, R2i = 0 hR1, R3i = 0 hR2, R3i = 0

and ‖R_1‖ = ‖R_2‖ = ‖R_3‖ = 1
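A short sketch of the Gram-Schmidt process in Python (not from the lecture; it assumes NumPy, and the function name is illustrative), checked on the basis of Example 4.3 above.

import numpy as np

def gram_schmidt(vectors):
    """Return pairwise orthogonal vectors spanning the same subspace."""
    ortho = []
    for v in vectors:
        u = v.astype(float)
        for w in ortho:
            u = u - (np.dot(w, v) / np.dot(w, w)) * w   # subtract projection onto w
        ortho.append(u)
    return ortho

B = [np.array([1., 1., 0.]), np.array([3., 1., -1.]), np.array([1., 1., 3.])]
U = gram_schmidt(B)
print(U)                                   # (1,1,0), (1,-1,-1), (1,-1,2), as in Example 4.3

Q = np.column_stack([u / np.linalg.norm(u) for u in U])   # orthonormal columns
print(np.allclose(Q.T @ Q, np.eye(3)))     # Q is orthogonal: Q^t Q = I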

Theorem 4.6. Let P be orthogonal, then the row vectors of P are pairwise orthogonal unit vectors.

Proof.P orthogonal =⇒ P−1 = Pt

 ut  −1 t 1 =⇒ I = PP = PP = ··· ( u1 |···| un ) t un t 2 =⇒ ui · ui = 1 = hui, uii = ||ui|| =⇒ ||ui|| = 1 ∀1 ≤ i ≤ n t =⇒ ui · uj = 0, i 6= j ∀1 ≤ i, j ≤ n

=⇒ hui, uji = 0 =⇒ ui, uj orthogonal. 56 anna brandenberger, daniela breitman

Theorem 4.7. (a) Let P, Q be orthogonal; then P^{-1} and P·Q are orthogonal. Thus O(n) ≡ {P ∈ Mat(n × n, R) : P orthogonal} is a group.

(b) Let P, Q be unitary; then P^{-1} and P·Q are unitary. Thus U(n) ≡ {P ∈ Mat(n × n, C) : P unitary} is a group.
(O(n) is a subgroup of GL(n, R); U(n) is a subgroup of GL(n, C).)

4.3 Orthogonal/unitary matrices and diagonalizability (upper-triangularizability)

We wonder what matrices ∈ Mat(n × n, R) can be diagonalized or upper-triangularized via orthogonal, i.e. for matrices A ∃P ∈ O(n) : P−1 AP = Pt AP = D or Pt AP = T. Same for unitary matrices A ∈ Mat(n × n, C), we want P−1 AP = P? AP = D or P? AP = T. We begin with upper-triangularizability of matrices ∈ Mat(n × n, C) via unitary matrices.

Theorem 4.8. Schur. Every matrix A ∈ Mat(n × n, C) can be upper- triangularized via a unitary matrix.

Proof. Same as proof of Theorem 2.25, by induction. We first need to find one eigenvalue and its corresponding eigenvector. Next we extend to an arbitrary basis and obtain an orthogonal basis by Gram- Schmidt.

n = 1: Any 1 × 1 matrix is already upper-triangular.

n − 1 → n: Outline of proof: let A ∈ Mat(n × n, C), let λ_1 be an eigenvalue of A and v_1 a unit eigenvector to λ_1. We then extend {v_1} to a basis {v_1, u_2, ··· , u_n} of C^n and apply Gram-Schmidt to obtain an orthonormal basis {v_1, ··· , v_n} of C^n. Then P = (v_1 | ··· | v_n) is unitary, and ∃C ∈ Mat((n−1) × (n−1), C) with
B = P^{-1}AP = P^*AP = [λ_1 ?; 0 C]

By the inductive hypothesis, C can also be upper-triangularized via a unitary matrix Q̃ ∈ U(n−1), i.e. Q̃^{-1}CQ̃ = Q̃^*CQ̃ = T̃ upper triangular. Setting Q ≡ [1 0; 0 Q̃], which is unitary, the same block computation as in the proof of Theorem 2.25 gives Q^*BQ upper triangular.
In summary: P^*AP = B and Q^*BQ = T, so Q^*P^*APQ = T = (PQ)^*A(PQ). The product of two unitary matrices is unitary (recall (AB)^* = B^*A^*), thus PQ is unitary and A is upper-triangularized via a unitary matrix.

Corollary 4.9. If A ∈ Mat(n × n, R) has only real eigenvalues, then A can be upper-triangularized via an orthogonal matrix. math 247: honours applied linear algebra 57

Proof. If all eigenvalues are real, v_1 can be chosen real, and v_2, ··· , v_n can be chosen real ⇒ all coefficients in the previous proof are real ⇒ PQ is orthogonal ⇒ A is upper-triangularized via an orthogonal matrix.
Diagonalizability, real case: Under what conditions can A ∈ Mat(n × n, R) be diagonalized via an orthogonal matrix? Assume A can be diagonalized via P ∈ O(n), i.e. P^{-1}AP = P^tAP = D ⇒ A = PDP^{-1} = PDP^t ⇒ A^t = (PDP^t)^t = (P^t)^tD^tP^t = PDP^t = A, so A must be symmetric. Thus, for A to be diagonalizable via an orthogonal matrix, A must be symmetric (a necessary condition, which we will see is also sufficient), and such an A automatically has real eigenvalues:

Theorem 4.10. ∀A ∈ Symm(n × n, R) :

(a) All eigenvalues λ of A ∈ R.

(b) Eigenvectors to distinct eigenvalues of A are orthogonal.

Proof. (a) Let λ be an eigenvalue of A and X a corresponding eigenvector. Consider X^*AX; since A is symmetric and real (A^* = A^t = A):
X^*(AX) = X^*(λX) = λX^*X = λ‖X‖²
(X^*A)X = (X^*A^*)X = (AX)^*X = (λX)^*X = λ̄X^*X = λ̄‖X‖²
⇒ λ‖X‖² = λ̄‖X‖² ⇒ (λ − λ̄)‖X‖² = 0. Since X ≠ 0 (it is an eigenvector), ‖X‖² ≠ 0 ⇒ λ − λ̄ = 0 ⇒ λ = λ̄ ⇒ λ is real.

(b) Let λ_1 ≠ λ_2 be eigenvalues with corresponding eigenvectors x_1, x_2. Then
x_1^*Ax_2 = x_1^*(Ax_2) = x_1^*(λ_2x_2) = λ_2x_1^*x_2 = λ_2⟨x_1, x_2⟩
(x_1^*A)x_2 = (Ax_1)^*x_2 = (λ_1x_1)^*x_2 = λ̄_1x_1^*x_2 = λ_1⟨x_1, x_2⟩ (by (a), λ_1 is real)

=⇒ λ2hx1, x2i = λ1hx1, x2i

=⇒ (λ2 − λ1)hx1, x2i = 0

=⇒ hx1, x2i = 0 =⇒ x1, x2 orthogonal Remark 13. Let A ∈ Symm(n × n, R), P ∈ GL(n, R). In general, P−1 AP is not symmetric, i.e. the property of symmetry is, in gen- eral, not preserved under linear coordinate transformations. 58 anna brandenberger, daniela breitman

lecture 04/03 Recap from last class: = 1 0  = 1 2  Example 4.4. A 0 −1 , P 0 1 1. If A ∈ Mat(n × n, R) is orthogonally −1 1 4  Then P AP = 0 −1 = B, A is symmetric while B is not. diagonalizable, then A is symmetric. 2. If A is real symmetric, then all Theorem 4.11. Let A ∈ Symm(n × n, R), P orthogonal i.e. P ∈ O(n). eigenvalues of A are real. Then B = P−1 AP = Pt AP is symmetric.

Proof.B = Pt AP =⇒ Bt = (Pt AP)t = Pt AtP = Pt AP = B =⇒ B is symmetric.

Theorem 4.12. Spectral Theorem (real version). A matrix A ∈ Mat(n × n, R) is diagonalizable via an orthogonal matrix iff A is symmetric.

Proof. (⇒) already done. (⇐) Let A be real symmetric; then all eigenvalues of A are real ⇒ by Corollary 4.9 (Schur, real version) ∃P ∈ O(n) s.t. P^{-1}AP = P^tAP = T where T is upper triangular. Since P ∈ O(n) and A is symmetric, T is symmetric (Theorem 4.11). But a matrix that is both symmetric and upper triangular is diagonal ⇒ T is diagonal ⇒ A is diagonalized via an orthogonal matrix.

Definition 4.9. A matrix A ∈ Mat(n × n, C) is hermitian if A^* = A. A hermitian matrix is the complex analogue of a real symmetric matrix. One might conjecture that A is diagonalizable via a unitary matrix iff A is hermitian; however, while it is true that all hermitian matrices are unitarily diagonalizable, the converse does not hold.

Not all matrices are normal, e.g. 1 4  Definition 4.10. A ∈ Mat(n × n, C) is normal if AA? = A? A. 0 −1 is not normal.

Theorem 4.13. The following classes of matrices are normal:

1. Real symmetric matrices

2. Real skew symmetric matrices

3. Real orthogonal matrices

4. Hermitian matrices

5. Skew hermitian matrices (i.e. A^* = −A)

6. Unitary matrices

Proof. 1. AA? = AAt = A · A = At A = A? A

2. AA? = AAt = A(−A) = −A2 = (−A)A = At A = A? A math 247: honours applied linear algebra 59

3. AA^* = AA^t = AA^{-1} = I = A^{-1}A = A^tA = A^*A

4. AA^* = A·A = A^*A

5. Left to the reader.

6. AA^* = AA^{-1} = I = A^{-1}A = A^*A

Remark 14. The class of normal matrices is strictly larger than the union of these 6 classes.
Remark 15. If A is normal and P ∈ GL(n, C), then P^{-1}AP is in general not normal (Example 4.5: the same matrices as Example 4.4, A = [1 0; 0 −1], P = [1 2; 0 1]).

Theorem 4.14. Let A ∈ Mat(n × n, C) normal, P ∈ U(n) (unitary group). Then P−1 AP = P? AP is normal.

Proof. Let B ≡ P? AP. Then:

BB? = P? AP(P? AP)? = P? AP · P? A?P = P? AA?P | {z } =I B?B = (P? AP)?P? AP = P? A? P · P? AP = P? A? A P | {z } |{z} =I =AA? =⇒ BB? = B?B =⇒ B is normal.

Theorem 4.15. Let A ∈ Mat(n × n, C) be both normal and upper triangu- lar. Then A is diagonal.

Proof. By induction on n.
n = 1: Nothing to show, since every 1 × 1 matrix is diagonal.
n − 1 → n: Write A = [a_11 a_12 ··· a_1n; 0 a_22 ··· a_2n; ⋮ ⋱ ⋮; 0 ··· a_nn] (upper triangular), so A^* has first column (ā_11, ā_12, ··· , ā_1n)^* transposed into its first row conjugated, i.e. A^* is lower triangular with (A^*)_{11} = ā_11.
Let B ≡ AA^* = (b_ij) and C ≡ A^*A = (c_ij). Then b_11 = a_11ā_11 + a_12ā_12 + ··· + a_1nā_1n and c_11 = ā_11a_11. Since A is normal, B = C, so b_11 = c_11:
|a_11|² = |a_11|² + |a_12|² + ··· + |a_1n|², with each |a_1j|² ≥ 0

⇒ a_12 = ··· = a_1n = 0
⇒ A = [a_11 0; 0 Ã] with Ã upper triangular, and comparing the lower-right blocks in AA^* = A^*A gives ÃÃ^* = Ã^*Ã

⇒ Ã is a normal, upper triangular (n−1) × (n−1) matrix ⇒ by the inductive hypothesis Ã is diagonal ⇒ A is diagonal.

Theorem 4.16. Spectral Theorem (complex version). A matrix A ∈ Mat(n × n, C) is diagonalizable via a unitary matrix iff A is normal.

Proof.

(⇒) Let P ∈ U(n) s.t. P−1 AP = P? AP = D =⇒ A = PDP?.

AA? = PDP?(PDP?)? = PDP?P D?P? = PDDP? |{z} =I A? A = (PDP?)?PDP? = PD? P?P DP? = PDDP? |{z} =I Note that any two diagonal matrices commute =⇒ DD = DD =⇒ AA? = A? A

(⇐) Let A be normal. By Theorem 4.8 (Schur), ∃P ∈ U(n) s.t. P^{-1}AP = P^*AP = T upper triangular. Since P ∈ U(n) and A is normal, T is normal (Theorem 4.14) and upper triangular ⇒ T is diagonal (Theorem 4.15) ⇒ A is diagonalized via a unitary matrix.
Remark 16 (on the real spectral theorem). We now know that all real symmetric matrices are diagonalizable and have only real eigenvalues.

1. positive definite if all eigenvalues of A are positive.

2. positive semidefinite if all eigenvalues of A are ≥ 0.

3. negative definite if all eigenvalues of A are < 0.

4. negative semidefinite if all eigenvalues of A are ≤ 0.

lecture 04/05 Example 4.6. on the Real Spectral Theorem

 −3 1 −2  A = 1 −3 −2 , real symmetric. −2 −2 0

We want to find P ∈ O(n) such that Pt AP = D.

1. Find the eigenvalues and eigenvectors: λ = −4 and λ = 2, with
E_{−4} = span{(2, 0, 1), (−1, 1, 0)}, E_2 = span{(−1, −1, 2)}.
Note that eigenvectors to distinct eigenvalues are orthogonal (and LI by Theorem 2.19). For repeated eigenvalues, we use Gram-Schmidt to find orthogonal bases for the corresponding eigenspaces.

2. Apply Gram-Schmidt on E_{−4}: let u_1 = (2, 0, 1); then
u_2 = (−1, 1, 0) − (⟨u_1, (−1, 1, 0)⟩/⟨u_1, u_1⟩)u_1 = (−1, 1, 0) + (2/5)(2, 0, 1) = (1/5)(−1, 5, 2).
Since any scalar multiple may be taken (orthogonality is preserved), we take u_2 = (−1, 5, 2). One can check that ⟨u_2, (−1, −1, 2)⟩ = 0.

3. Obtain the orthogonal matrix: recall that an orthogonal matrix has orthonormal columns, so we divide each vector by its norm:
E_{−4} = span{(1/√5)(2, 0, 1), (1/√30)(−1, 5, 2)}, E_2 = span{(1/√6)(−1, −1, 2)}

P = [2/√5 −1/√30 −1/√6; 0 5/√30 −1/√6; 1/√5 2/√30 2/√6] ∈ O(3), P^tAP = diag(−4, −4, 2)

4.4 Real Quadratic Forms Application of the Real Spectral Theorem.

Definition 4.12. A (real) quadratic form is a purely quadratic poly- nomial in n variables with real coefficients. 2 2 For example, x1 + 4x1x2 + x2 and There are two interesting questions that we would like to answer 2 2 x1 + 2x1x2 + 2x2 are quadratic forms. concerning these polynomials: 1. What is the range of the quadratic form? 2. Can we express it as a sum or difference of pure squares?

Definition 4.13. A quadratic form Q(x) = Q(x1, x2, ··· , xn) is posi- tive definite if Q(x) ≥ 0 ∀x ∈ Rn and Q(x) = 0 ⇔ x = 0. Similar for negative definite.

2 2 Example 4.7. x1 + 4x1x2 + x is not positive definite: Let x2 = −x1 2 2 2 2 Q(x1, −x1) = x1 − 4x1 + x1 = −2x1 < 0 ∀x1 6= 0. We want to use a more systematic approach:

2 2 Q(x) = a11x1 + ··· + annxn + b12x1x2 + ··· + b(n−1),nxn−1xn n n 2 = ∑ aiixi + ∑ bijxixj i=1 i,j=1,i

Let A be an n × n matrix:

a11 ··· a1n ! x1 ! xt Ax = (x ··· x ) . . . . 1 n . .. . . an1 ··· ann xn a11x1+···+a1n xn ! = (x ··· x ) . 1 n . an1x1+···+ann xn 2 = a11x1 + a12x1x2 + ··· + a1nx1xn + ··· 2 + an1xnx1 + an2xnx2 + ··· + annxn since i < j: 2 2 = a11x1 + ··· + annxn + (a12 + a21)x1x2 + (a13 + a31)x1x3 + ···

+ (an−1,n + an,n−1)xn−1xn 62 anna brandenberger, daniela breitman

This is a quadratic form! Conversely, every quadratic form Q(x) can t be written in matrix from, i.e. Q(x) = x Ax, bij = aij + aji. 1 If we split evenly, i.e. aij = aji = 2 bij, the resulting matrix is real symmetric, and therefore diagonalizable by an orthogonal matrix.

Example 4.8. x_1² + 4x_1x_2 + x_2² = Q(x_1, x_2) ⇒ A = [1 2; 2 1], and Q(x_1, x_2) = (x_1 x_2)[1 2; 2 1](x_1; x_2).
We now answer our two questions. Let Q(x) = x^tAx, A ∈ Symm(n × n, R), and let P ∈ O(n) with P^tAP = D = diag(λ_1, ··· , λ_n), i.e. A = PDP^t. Then Q(x) = x^tPDP^tx; setting y = P^tx (so x^tP = y^t):
Q(x) = y^tDy = λ_1y_1² + ··· + λ_ny_n².
We have expressed the quadratic form as a sum of pure squares, which also answers the first question:
1. Q is positive definite ⇔ all λ_i > 0 ⇔ A is positive definite.
Example 4.9. Q(x_1, x_2) = x_1² + 4x_1x_2 + x_2², A = [1 2; 2 1], orthogonally diagonalizable with eigenvalues λ = −1, 3: E_{−1} = span{(−1, 1)}, E_3 = span{(1, 1)}. An orthonormal basis of eigenvectors is {(1/√2)(1, 1), (1/√2)(−1, 1)}, so P ≡ (1/√2)[1 −1; 1 1], P^{-1} = P^t = (1/√2)[1 1; −1 1], A = PDP^t with D = diag(3, −1). With y = P^tx we get y_1 = (x_1 + x_2)/√2, y_2 = (x_2 − x_1)/√2, and
Q = λ_1y_1² + λ_2y_2² = (3/2)(x_1 + x_2)² − (1/2)(x_2 − x_1)².
(As a check, we can expand this result.) Note that Q is not positive definite because of the negative eigenvalue λ = −1.
Remark 17. We actually saw quadratic forms in Calculus 3 as well, when classifying critical points with the second derivative test:
n = 1, single variable case: f twice continuously differentiable, f′(x_0) = 0. If f″(x_0) > 0, x_0 is a local minimum; if f″(x_0) < 0, x_0 is a local maximum.
n-dimensional, multivariable case: f twice continuously differentiable, x_0 a critical point, i.e. ∇f(x_0) = 0. We take the Taylor expansion about x_0:
f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + (1/2)(x − x_0)^tH(x_0)(x − x_0) + ···

 2  where H = ∂ f is the Hessian matrix, symmetric in this case ∂xi∂xj since f is twice continuously differentiable. 1 This is a quadratic form, with a 2 coefficient due to even split. Let P ∈ O(n) such that:

λ1 ! t ( ) = = . P H x0 P D .. λn

t =⇒ H(x0) = PDP t t 2 2 =⇒ (x − x0) H(x0)(x − x0) = y Dy = λ1y1 + ··· + λnyn So x0 is a local minimum if all λi > 0, so we have:

– H(x0) positive definite =⇒ x0 is a local minimum,

– H(x0) negative definite =⇒ x0 is local maximum. math 247: honours applied linear algebra 63
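A short numeric companion to Example 4.9 and to this classification (not from the lecture; it assumes NumPy): the eigenvalues of the symmetric matrix of a quadratic form decide definiteness, and Q(x) equals the sum of λ_i y_i² with y = P^t x.

import numpy as np

A = np.array([[1., 2.],
              [2., 1.]])               # Q(x) = x1^2 + 4 x1 x2 + x2^2 = x^t A x

evals, P = np.linalg.eigh(A)           # eigenvalues -1 and 3
print(evals)                           # a negative eigenvalue => Q is not positive definite

# check Q(x) = sum_i lambda_i y_i^2 with y = P^t x at a sample point
x = np.array([0.7, -1.3])
y = P.T @ x
print(np.isclose(x @ A @ x, np.sum(evals * y**2)))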

4.5 Real Inner Product Spaces

Goal: We want to introduce length and angle to abstract vector spaces. These are important to define convergence (Analysis). We must distinguish the cases K = R and K = C. K = R: important properties of standard inner product on Rn: hx, yi = xty

- Bilinear

- Positive definite: hx, xi ≥ 0 (hx, xi = 0 ⇔ x = 0)

- symmetric: hx, yi = hy, xi

We use these axioms to define inner product in abstract vector space: lecture 04/10

Definition 4.14. Let V be a vector space over R. A map ⟨,⟩ : V × V → R is a (real) inner product on V iff ⟨,⟩ is bilinear (i.e. linear in each argument), symmetric (∀x, y ∈ V: ⟨x, y⟩ = ⟨y, x⟩) and positive definite (∀x ∈ V: ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 iff x = 0). (V, ⟨,⟩) is then called a real inner product space.

Example 4.10.

1. The standard inner product hx, yi ≡ xty on Rn is an inner product.

2. Let V = Rn, A positive definite (and symmetric) matrix. Then: hX, Yi = Xt AY is an inner product.

Proof. Bilinearity: hX + U, Yi = (X + U)t AY = Xt AY + Ut AY = hX, Yi + hU, Yi and: hkX, Yi = (kX)t AY = kXt AY

= khx, yi Similarly we can show linearity in Symmetry: hY, Xi = Yt AX second argument. =⇒ (Yt AX)t = Xt At Y = Xt AY | {z } |{z} ∈R =A =⇒ Yt AX = Xt AY  ≥ 0 ∀ x ∈ Rb Positive definite: hX, Xi = Xt A X = |{z} = x = pos. de f  0 iff 0 =⇒ h, iis positive definite.

3. Let V = C([a, b]) the VS of all continuous functions on [a, b]. R b h f , gi ≡ a f (x)g(x)dx is an inner product. 64 anna brandenberger, daniela breitman

Proof. Bilinearity: ⟨f_1 + f_2, g⟩ = ∫_a^b [f_1(x) + f_2(x)]g(x)dx = ∫_a^b f_1(x)g(x)dx + ∫_a^b f_2(x)g(x)dx = ⟨f_1, g⟩ + ⟨f_2, g⟩, and ⟨kf, g⟩ = ∫_a^b kf(x)g(x)dx = k∫_a^b f(x)g(x)dx = k⟨f, g⟩; similarly we can show linearity in the second argument.
Symmetry: ⟨g, f⟩ = ∫_a^b g(x)f(x)dx = ∫_a^b f(x)g(x)dx = ⟨f, g⟩.
Positive definiteness: ⟨f, f⟩ = ∫_a^b f²(x)dx ≥ 0 for all f ∈ C([a, b]), since f² ≥ 0. Now suppose ⟨f, f⟩ = 0 but f is not identically 0 on [a, b]. Then ∃c ∈ [a, b] with f(c) ≠ 0, i.e. d ≡ f²(c) > 0. By continuity ∃ε > 0 s.t. f²(x) > d/2 on an interval of length at least ε around c inside [a, b], so ∫_a^b f²(x)dx ≥ (d/2)·ε > 0, a contradiction. Hence ⟨f, f⟩ = 0 iff f ≡ 0 ⇒ ⟨,⟩ is positive definite.

Remark 18.

• h, i on C([a, b]) induces an inner product e.g. on Pn and P (VS of all polynomials). • h, i is not an inner product of the set F([a, b]) of all real valued functions on [a, b]: while h, i is still bilinear and symmetric, it fails to be positive definite.  1 i f x = 0 For example, f (x) ≡ 0 i f x 6= 0 R b 2 Then f 6= 0 but h f , f i = a f (x)dx = 0 =⇒ h, i is not positive def.

4.6 Complex Inner Product Spaces

Reminder: properties of the standard inner product on C^n: ⟨z, w⟩ ≡ z^*w. It is linear in the 2nd argument and only sesquilinear in the 1st: ⟨z_1 + z_2, w⟩ = ⟨z_1, w⟩ + ⟨z_2, w⟩ but ⟨kz, w⟩ = k̄⟨z, w⟩.
⟨w, z⟩ = w^*z = w̄_1z_1 + ··· + w̄_nz_n and ⟨z, w⟩ = z^*w = z̄_1w_1 + ··· + z̄_nw_n. Since the complex conjugate of z̄_1w_1 + ··· + z̄_nw_n is z_1w̄_1 + ··· + z_nw̄_n, we get that ⟨w, z⟩ is the complex conjugate of ⟨z, w⟩: "conjugate symmetry".

Positive definiteness: ⟨z, z⟩ = z^*z = z̄_1z_1 + ··· + z̄_nz_n = |z_1|² + ··· + |z_n|² ≥ 0 (and real), and ⟨z, z⟩ = 0 iff |z_1|² + ··· + |z_n|² = 0 iff z_1 = ··· = z_n = 0 ⇒ ⟨,⟩ is positive definite.

Definition 4.15. Let V be a VS over C. h, i : V × V → C. (V, h, i) is a complex inner product space if it satisfies the following:

1. h, i is sesquilinear.

2. Conjugate symmetry: ⟨z, w⟩ equals the complex conjugate of ⟨w, z⟩, ∀z, w ∈ V.

3. Positive definite: ⟨z, z⟩ ≥ 0 ∀z ∈ V ("the left side is real and non-negative") and ⟨z, z⟩ = 0 ⇔ z = 0.
(Caution: some books, including Wikipedia, define sesquilinearity with linearity in the 1st argument.)

Example 4.11. V = C^n, A a hermitian, positive definite matrix (the eigenvalues of a hermitian matrix are real; positive definite means all eigenvalues are positive). Then ⟨z, w⟩ ≡ z^*Aw is an inner product on C^n.

4.7 Length in Inner Product Spaces over K p x ∈ V. Define kxk ≡ hx, xi ≥ 0. Motivation: V = Rn with standard inner product:  x1  q p By Pythagoras, k ... k = x2 + ··· + x2 = hx, xi xn 1 n

+ Definition 4.16. Let V be a VS over K. The norm k·k : V → R0 = [0, ∞) is s.t.

1. kxk ≥ 0 ∀x ∈ V and kxk = 0 ⇔ x = 0.

2. kkxk = |k| · kxk ∀k ∈ K ∀x ∈ V

3. Triangle inequality: kx + yk ≤ kxk + kyk ∀x, y ∈ V

p Theorem 4.17. Let V inner product space over K, kxk ≡ hx, xi ∀x ∈ V. Then k·k is a norm on V.

Proof.
1. ‖x‖ = √⟨x, x⟩ ≥ 0, and ‖x‖ = 0 ⇔ ⟨x, x⟩ = 0 ⇔ x = 0.
2. ‖kx‖ = √⟨kx, kx⟩ = √(k̄k⟨x, x⟩) = √(|k|²⟨x, x⟩) = |k|·√⟨x, x⟩ = |k|·‖x‖.

3. Difficult. We will prove the Cauchy-Schwarz Inequality first. 66 anna brandenberger, daniela breitman

Theorem 4.18. Cauchy-Schwarz Inequality. Let V be an inner product space over K. Then |hx, yi| ≤ kxk · kyk ∀x, y ∈ V.

Proof. Lemma:

For a, b ∈ V and λ ∈ K,
⟨a + λb, a + λb⟩ = ⟨a + λb, a⟩ + λ⟨a + λb, b⟩ = ⟨a, a⟩ + λ̄⟨b, a⟩ + λ⟨a, b⟩ + λ̄λ⟨b, b⟩ = ⟨a, a⟩ + λ⟨a, b⟩ + (the conjugate of λ⟨a, b⟩) + |λ|²⟨b, b⟩.

For X ≠ 0, let Y_⊥ be the perpendicular part of the projection of Y onto X: Y_⊥ = Y − (⟨X, Y⟩/⟨X, X⟩)X. Using the Lemma with a = Y, b = X, λ = −⟨X, Y⟩/⟨X, X⟩:
0 ≤ ‖Y_⊥‖² = ⟨Y − (⟨X, Y⟩/⟨X, X⟩)X, Y − (⟨X, Y⟩/⟨X, X⟩)X⟩ = ⟨Y, Y⟩ − |⟨X, Y⟩|²/⟨X, X⟩ = ‖Y‖² − |⟨X, Y⟩|²/‖X‖²
⇒ 0 ≤ ‖X‖²‖Y‖² − |⟨X, Y⟩|² ⇒ |⟨X, Y⟩|² ≤ ‖X‖²·‖Y‖², i.e. |⟨X, Y⟩| ≤ ‖X‖·‖Y‖. (For X = 0 both sides are 0, so the inequality holds trivially.)

lecture 04/12 We are interested in knowing under what conditions equality occurs in Cauchy-Schwarz.

Lemma 4.19. Equality holds in Cauchy-Schwarz whenever x, y are linearly dependent (i.e. x k y).

Proof. For x = 0 both sides are 0. For x ≠ 0, as in the Cauchy-Schwarz proof,
0 ≤ ‖y_⊥‖² = ‖y‖² − |⟨x, y⟩|²/‖x‖², so equality in Cauchy-Schwarz ⇔ ‖y_⊥‖² = 0 ⇔ y_⊥ = 0.
If x, y are linearly dependent, say y = λx, then y_⊥ = λx − (⟨x, λx⟩/⟨x, x⟩)x = λx − λx = 0, so equality holds. Conversely, if x, y are linearly independent then y_⊥ ≠ 0 and the inequality in Cauchy-Schwarz is strict. So equality in Cauchy-Schwarz ⇔ x, y linearly dependent.

We can use Cauchy-Schwarz to prove the triangle inequality, but first, we present a lemma:

Lemma 4.20. Let z ∈ C, then:

(a)z + z¯ = 2Re(z).

(b) |Re(z)| ≤ |z|.

Proof. Let z = x + iy and z¯ = x − iy, then (a) z + z¯ = x + iy + x − iy = 2x = 2Re(z). √ 2 p (b) |Re(z)| = |x| = x ≤ x2 + y2 = |z|.

Theorem 4.21. Triangle Inequality. Let V be an inner product space over K. kx + yk ≤ kxk + kyk, ∀x, y ∈ V.

Proof. ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + ⟨x, y⟩ + (the conjugate of ⟨x, y⟩) + ‖y‖² = ‖x‖² + 2Re(⟨x, y⟩) + ‖y‖² (by Lemma 4.20 (a)) ≤ ‖x‖² + 2|Re(⟨x, y⟩)| + ‖y‖² ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖² (by Lemma 4.20 (b)) ≤ ‖x‖² + 2‖x‖‖y‖ + ‖y‖² (by Cauchy-Schwarz) = (‖x‖ + ‖y‖)², hence ‖x + y‖ ≤ ‖x‖ + ‖y‖. Thus ‖·‖ is a norm on any inner product space.

4.8 Angle in Inner Product Spaces over K

To define angle for any inner product space, we use the definition commonly used in Rn as motivation: x, y ∈ Rn, hx, yi = xty is standard inner product on Rn, then = hx,yi ≤ ≤ cos θ kxkkyk , where 0 θ π is the angle between x and y.

Definition 4.17. Let V be a real inner product space with inner prod- = hx,yi We note that in cos θ kxkkyk , the uct h, i, let x, y 6= 0 in V. Let θ be uniquely determined angle ∈ [0, π], fraction must ∈ [−1, 1] by Cauchy- such that: Schwarz. Moreover, we note that the hx, yi angle between x, y is equal to the angle cos θ = kxkkyk between y, x since h, i is symmetric.

h i This definition of orthogonality can Definition 4.18. Let V be a real inner product space. x, y are or- carry over to complex inner product π thogonal ⇔ hx, yi = 0. For x, y 6= 0 =⇒ θ(x, y) = 2 . spaces.

Theorem 4.22. Pythagoras. Let V be inner product space, x, y ∈ V or- thogonal, then kx + yk2 = kxk2 + kyk2. 68 anna brandenberger, daniela breitman

Proof. kx + yk2 = hx + y, x + yi = hx + y, x + yi = hx, xi + hx, yi + hy, xi + hy, yi We note that Pythagoras applies to real 2 2 as well as complex inner product spaces = kxk + kyk since (x ⊥ y) since we only used the definition of orthogonality.

4.9 Fourier Series (speed topic)

Consider V vector space of all continuous functions on R with inner R π product −π f (x)g(x)dx.

Theorem 4.23. Consider S ≡ {1, sin x, cos x, sin(2x), cos(2x), ···} (notice that 1 = cos(0·x)). The functions in S are pairwise orthogonal. (See Assignment 3, Exercise 3.)

Proof. Using trigonometric identities and integration by parts, one can show that
∫_{−π}^{π} sin(nx)cos(mx)dx = 0 ∀n ∈ N, m ∈ N_0
∫_{−π}^{π} sin(nx)sin(mx)dx = 0 if m ≠ n, n, m ∈ N
∫_{−π}^{π} cos(nx)cos(mx)dx = 0 if m ≠ n, n, m ∈ N_0
We note that the elements of S are only orthogonal, not orthonormal:
∫_{−π}^{π} sin²(nx)dx = π ∀n ∈ N,  ∫_{−π}^{π} cos²(nx)dx = π ∀n ∈ N (= 2π for n = 0),  ∫_{−π}^{π} 1 dx = 2π
Recall that in R^n with the standard inner product, if v_1, ··· , v_n are pairwise orthogonal and non-zero and v = a_1v_1 + ··· + a_nv_n, then a_j = ⟨v_j, v⟩/⟨v_j, v_j⟩. In the proof of this fact we only used linearity in the second argument and positive definiteness, so it carries over to any inner product space. Let f(x) be a linear combination of the members of S, i.e.

f (x) = c0 + [a1 cos x + ··· + an cos(nx)] + [b1 sin x + ··· + bn sin(nx)]

With f (x) known, we want to recover the coefficients, then:

hcos(nx), f (x)i 1 Z π an = = f (x) cos(nx)dx hcos(nx), cos(nx)i π −π hsin(nx), f (x)i 1 Z π bn = = f (x) sin(nx)dx hsin(nx), sin(nx)i π −π 1 Z π c0 = f (x)dx 2π −π math 247: honours applied linear algebra 69

We use the formulas above as motivation for the definition of the (infinite) Fourier series.

Definition 4.19. Let f be periodic of period 2π and continuous. For an, bn, c0 defined as above, we consider the infinite series, the Fourier series of f : ∞   f (x) = c0 + ∑ an cos(nx) + bn sin(nx) n=1

Theorem 4.24. Dirichlet. Let f be periodic of period 2π, piecewise contin- uous, and piecewise continuously differentiable on [−π, π], then the Fourier series of f converges at all points where f is continuous, and it represents f .

Example 4.12. Let f (x) = x2, [−π, π] extended periodically. It sat- isfies the Dirichlet conditions and thus its Fourier series represents it:

x² = π²/3 + 4 Σ_{n=1}^∞ ((−1)^n/n²)·cos(nx), ∀x ∈ R

The b_n are all 0 since f is even. In particular, evaluating at x = π:

π² = π²/3 + 4 Σ_{n=1}^∞ ((−1)^n/n²)·(−1)^n = π²/3 + 4 Σ_{n=1}^∞ 1/n²
⇒ Σ_{n=1}^∞ 1/n² = π²/6
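A numerical check of these Fourier coefficients and of the resulting value π²/6 (not from the lecture; it assumes NumPy and SciPy's quad integrator).

import numpy as np
from scipy.integrate import quad

f = lambda x: x**2

c0 = quad(f, -np.pi, np.pi)[0] / (2 * np.pi)
a = [quad(lambda x, n=n: f(x) * np.cos(n * x), -np.pi, np.pi)[0] / np.pi
     for n in range(1, 8)]
print(c0)                                     # pi^2 / 3
print(a)                                      # matches 4 (-1)^n / n^2:
print([4 * (-1)**n / n**2 for n in range(1, 8)])

# partial sums of sum 1/n^2 approach pi^2 / 6
print(sum(1 / n**2 for n in range(1, 10001)), np.pi**2 / 6)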