Discover Linear Algebra
Incomplete Preliminary Draft

Date: November 28, 2017

László Babai, in collaboration with Noah Halford

© 2016 László Babai. All rights reserved. Approved for instructional use only. Commercial distribution prohibited.

Last updated: November 10, 2016

Preface

TO BE WRITTEN.

Contents

Notation

I Matrix Theory

Introduction to Part I

1 (F, R) Column Vectors
  1.1 (F) Column vector basics
    1.1.1 The domain of scalars
  1.2 (F) Subspaces and span
  1.3 (F) Linear independence and the First Miracle of Linear Algebra
  1.4 (F) Dot product
  1.5 (R) Dot product over R
  1.6 (F) Additional exercises

2 (F) Matrices
  2.1 Matrix basics
  2.2 Matrix multiplication
  2.3 Arithmetic of diagonal and triangular matrices
  2.4 Permutation Matrices
  2.5 Additional exercises

3 (F) Matrix Rank
  3.1 Column and row rank
  3.2 Elementary operations and Gaussian elimination
  3.3 Invariance of column and row rank, the Second Miracle of Linear Algebra
  3.4 Matrix rank and invertibility
  3.5 (optional)
  3.6 Additional exercises

4 (F) Theory of Systems of Linear Equations I: Qualitative Theory
  4.1 Homogeneous systems of linear equations
  4.2 General systems of linear equations

5 (F, R) Affine and Convex Combinations (optional)
  5.1 (F) Affine combinations
  5.2 (F) Hyperplanes
  5.3 (R) Convex combinations
  5.4 (R) Helly's Theorem
  5.5 (F, R) Additional exercises

6 (F) The Determinant
  6.1 Permutations
  6.2 Defining the determinant
  6.3 Cofactor expansion
  6.4 Matrix inverses and the determinant
  6.5 Additional exercises

7 (F) Theory of Systems of Linear Equations II: Cramer's Rule
  7.1 Cramer's Rule

8 (F) Eigenvectors and Eigenvalues
  8.1 Eigenvector and eigenvalue basics
  8.2 Similar matrices and diagonalizability
  8.3 Polynomial basics
  8.4 The characteristic polynomial
  8.5 The Cayley-Hamilton Theorem
  8.6 Additional exercises

9 (R) Orthogonal Matrices
  9.1 Orthogonal matrices
  9.2 Orthogonal similarity
  9.3 Additional exercises

10 (R) The Spectral Theorem
  10.1 Statement of the Spectral Theorem
  10.2 Applications of the Spectral Theorem

11 (F, R) Bilinear and Quadratic Forms
  11.1 (F) Linear and bilinear forms
  11.2 (F) Multivariate polynomials
  11.3 (R) Quadratic forms
  11.4 (F) (optional)

12 (C) Complex Matrices
  12.1 Complex numbers
  12.2 Hermitian dot product in C^n
  12.3 Hermitian and unitary matrices
  12.4 Normal matrices and unitary similarity
  12.5 Additional exercises

13 (C, R) Matrix Norms
  13.1 (R) Operator norm
  13.2 (R) Frobenius norm
  13.3 (C) Complex matrices

II Linear Algebra of Vector Spaces

Introduction to Part II

14 (F) Preliminaries
  14.1 Modular arithmetic
  14.2 Abelian groups
  14.3 Fields
  14.4 Polynomials

15 (F) Vector Spaces: Basic Concepts
  15.1 Vector spaces
  15.2 Subspaces and span
  15.3 Linear independence and bases
  15.4 The First Miracle of Linear Algebra
  15.5 Direct sums

16 (F) Linear Maps
  16.1 Linear map basics
  16.2 Isomorphisms
  16.3 The Rank-Nullity Theorem
  16.4 Linear transformations
    16.4.1 Eigenvectors, eigenvalues, eigenspaces
    16.4.2 Invariant subspaces
  16.5 Coordinatization
  16.6 Change of basis

17 (F) Block Matrices (optional)
  17.1 Block matrix basics
  17.2 Arithmetic of block-diagonal and block-triangular matrices

18 (F) Minimal Polynomials of Matrices and Linear Transformations (optional)
  18.1 The minimal polynomial
  18.2 Minimal polynomials of linear transformations

19 (R) Euclidean Spaces
  19.1 Inner products
  19.2 Gram-Schmidt
  19.3 Isometries and orthogonal transformations
  19.4 First proof of the Spectral Theorem

20 (C) Hermitian Spaces
  20.1 Hermitian spaces
  20.2 Hermitian transformations
  20.3 Unitary transformations
  20.4 Adjoint transformations in Hermitian spaces
  20.5 Normal transformations
  20.6 The Complex Spectral Theorem for normal transformations

21 (R, C) The Singular Value Decomposition
  21.1 The Singular Value Decomposition
  21.2 Low-rank approximation

22 (R) Finite Markov Chains
  22.1 Stochastic matrices
  22.2 Finite Markov Chains
  22.3 Digraphs
  22.4 Digraphs and Markov Chains
  22.5 Finite Markov Chains and undirected graphs
  22.6 Additional exercises

23 More Chapters

24 Hints
  24.1 Column Vectors
  24.2 Matrices
  24.3 Matrix Rank
  24.4 Theory of Systems of Linear Equations I: Qualitative Theory
  24.5 Affine and Convex Combinations
  24.6 The Determinant
  24.7 Theory of Systems of Linear Equations II: Cramer's Rule
  24.8 Eigenvectors and Eigenvalues
  24.9 Orthogonal Matrices
  24.10 The Spectral Theorem
  24.11 Bilinear and Quadratic Forms
  24.12 Complex Matrices
  24.13 Matrix Norms
  24.14 Preliminaries
  24.15 Vector Spaces
  24.16 Linear Maps
  24.17 Block Matrices
  24.18 Minimal Polynomials of Matrices and Linear Transformations
  24.19 Euclidean Spaces
  24.20 Hermitian Spaces
  24.21 The Singular Value Decomposition
  24.22 Finite Markov Chains

25 Solutions
  25.1 Column Vectors
  25.2 Matrices
  25.3 Matrix Rank
  25.4 Theory of Systems of Linear Equations I: Qualitative Theory
  25.5 Affine and Convex Combinations
  25.6 The Determinant
  25.7 Theory of Systems of Linear Equations II: Cramer's Rule
  25.8 Eigenvectors and Eigenvalues
  25.9 Orthogonal Matrices
  25.10 The Spectral Theorem
  25.11 Bilinear and Quadratic Forms
  25.12 Complex Matrices
  25.13 Matrix Norms
  25.14 Preliminaries
  25.15 Vector Spaces
  25.16 Linear Maps
  25.17 Block Matrices
  25.18 Minimal Polynomials of Matrices and Linear Transformations
  25.19 Euclidean Spaces
  25.20 Hermitian Spaces
  25.21 The Singular Value Decomposition
  25.22 Finite Markov Chains

Notation

Symbol   Meaning
C        The field of complex numbers
diag     Diagonal matrix
im(ϕ)    Image of ϕ
N        The set {0, 1, 2, ...} of natural numbers
[n]      The set {1, ..., n}
Q        The field of rational numbers
R        The field of real numbers
R×       The set of real numbers excluding 0
Z        The set {..., −2, −1, 0, 1, 2, ...} of integers
Z+       The set {1, 2, 3, ...} of positive integers
⊕        Direct sum
≤        Subspace
∅        The empty set
⊥        Orthogonal

Part I

Matrix Theory

Introduction to Part I

TO BE WRITTEN. (R Contents)

Chapter 1

(F, R) Column Vectors

(R Contents)

1.1 (F) Column vector basics

We begin with a discussion of column vectors.

Definition 1.1.1 (Column vector). A column vector of height k is a list of k numbers arranged in a column, written as
\[
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}.
\]
The k numbers in the column are referred to as the entries of the column vector; we will normally use lowercase Greek letters such as α, β, and ζ to denote these numbers. We denote column vectors by bold letters such as u, v, w, x, y, b, e, f, etc., so we may write
\[
v = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix}.
\]


(R Contents)

1.1.1 The domain of scalars

In general, the entries of column vectors will be taken from a “field,” denoted by F. We shall refer to the elements of the field F as “scalars,” and we will normally denote scalars by lowercase Greek letters. We will discuss fields in detail in Section 14.3. Informally, a field is a set endowed with the operations of addition and multiplication which obey the rules familiar from the arithmetic of real numbers (commutativity, associativity, inverses, etc.). Examples of fields are Q, R, C, and F_p, where Q is the set of rational numbers, R is the set of real numbers, C is the set of complex numbers, and F_p is the “integers modulo p” for a prime p, so F_p is a finite field of “order” p (order = number of elements). The reader who is not comfortable with finite fields or with the abstract concept of fields may ignore the exercises related to them and always take F to be Q, R, or C. In fact, taking F to be R suffices for all sections except those specifically related to C. We shall also consider integral vectors, i. e., vectors whose entries are integers. Z denotes the set of integers and Z^k the set of integral vectors of height k, so Z^k ⊆ Q^k ⊆ R^k ⊆ C^k. Note that Z is not a field (division does not work within Z). Our notation F for the domain of scalars always refers to a field and therefore does not include Z.

Notation 1.1.2. Let α_1, ..., α_k ∈ F. The expression \(\sum_{i=1}^{k} \alpha_i\) denotes the sum α_1 + ··· + α_k. More generally, for an index set I = {i_1, i_2, ..., i_ℓ}, the expression \(\sum_{i \in I} \alpha_i\) denotes the sum α_{i_1} + ··· + α_{i_ℓ}.

Convention 1.1.3 (Empty sum). The empty sum, denoted \(\sum_{i=1}^{0} \alpha_i\) or \(\sum_{i \in \emptyset} \alpha_i\), evaluates to zero.

∗ ∗ ∗

Definition 1.1.4 (The space F^k). For a domain F of scalars, we define the space F^k of column vectors of height k over F by
\[
F^k := \left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \;\middle|\; \alpha_i \in F \right\} \tag{1.1}
\]
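Readers who would like to experiment with the finite field F_p can do so in a few lines of Python; the following minimal sketch (illustrative only, with ad hoc function names) performs arithmetic modulo a prime p and computes multiplicative inverses via Fermat's little theorem, showing that division works in F_p even though it does not in Z.

# Arithmetic in the finite field F_p (p prime), as discussed in Section 1.1.1.
p = 7  # any prime

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def inv(a):
    # multiplicative inverse of a nonzero element, using a^(p-2) = a^(-1) mod p
    assert a % p != 0
    return pow(a, p - 2, p)

# every nonzero element has an inverse, so division works within F_p:
for a in range(1, p):
    assert mul(a, inv(a)) == 1
print([inv(a) for a in range(1, p)])  # inverses of 1, ..., 6 modulo 7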

Definition 1.1.5 (Zero vector). The zero vector in F^k is the vector
\[
0_k := \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \tag{1.2}
\]

We often write 0 instead of 0_k when the height of the vector is clear from context.

Definition 1.1.6 (All-ones vector). The all-ones vector in F^k is the vector
\[
1_k := \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}. \tag{1.3}
\]

We sometimes write 1 instead of 1_k.

Definition 1.1.7 (Addition of column vectors). Addition of column vectors of the same height is defined elementwise, i. e.,
\[
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} + \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} = \begin{pmatrix} \alpha_1+\beta_1 \\ \alpha_2+\beta_2 \\ \vdots \\ \alpha_k+\beta_k \end{pmatrix}. \tag{1.4}
\]

 2   0  2 Example 1.1.8. Let v =  6  and let w = −3. Then v + w = 3. −1 2 1

Exercise 1.1.9. Verify that vector addition is commutative, i. e., for v, w ∈ Fk, we have v + w = w + v . (1.5)

Exercise 1.1.10. Verify that vector addition is associative, i. e., for u, v, w ∈ F^k, we have u + (v + w) = (u + v) + w . (1.6)

Column vectors also carry with them the notion of “scaling” by an element of F.

Definition 1.1.11 (Multiplication of a column vector by a scalar). Let v ∈ F^k and let λ ∈ F. Then the vector λv is the vector v after each entry has been scaled (multiplied) by a factor of λ, i. e.,
\[
\lambda \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} = \begin{pmatrix} \lambda\alpha_1 \\ \lambda\alpha_2 \\ \vdots \\ \lambda\alpha_k \end{pmatrix}. \tag{1.7}
\]

 2   6  Example 1.1.12. Let v = −1. Then 3v = −3. 1 3

Definition 1.1.13 (Linear combination). Let v_1, ..., v_m ∈ F^n. Then a linear combination of the vectors v_1, ..., v_m is a sum of the form \(\sum_{i=1}^{m} \alpha_i v_i\) where α_1, ..., α_m ∈ F.

Example 1.1.14. The following is a linear combination.
\[
\begin{pmatrix} 2 \\ -3 \\ 6 \\ 1 \end{pmatrix} + 4\begin{pmatrix} -1 \\ 2 \\ 0 \\ 1 \end{pmatrix} - \begin{pmatrix} 6 \\ 1 \\ 2 \\ -3 \end{pmatrix} = \begin{pmatrix} -8 \\ 4 \\ 4 \\ 8 \end{pmatrix}.
\]

Numerical exercise 1.1.15. The following linear combinations are of the form αa + βb + γc. Evaluate in two ways, as (αa + βb) + γc and as αa + (βb + γc). Self-check: you must get the same answer. (A short machine check of part (a) follows part (c) below.)

(a) \(\begin{pmatrix} 2 \\ 4 \end{pmatrix} + 3\begin{pmatrix} 3 \\ -1 \end{pmatrix} - 2\begin{pmatrix} -7 \\ 2 \end{pmatrix}\)

(b) \(-2\begin{pmatrix} 1 \\ 6 \\ 3 \end{pmatrix} + 2\begin{pmatrix} -4 \\ -2 \\ 6 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 4 \end{pmatrix}\)

(c) \(-\begin{pmatrix} 3 \\ 7 \\ -1 \end{pmatrix} - 4\begin{pmatrix} -2 \\ 3 \\ 0 \end{pmatrix} + \begin{pmatrix} 7 \\ -4 \\ 3 \end{pmatrix}\)

Exercise 1.1.16. Express the vector \(\begin{pmatrix} -5 \\ -1 \\ 11 \end{pmatrix}\) as a linear combination of the vectors \(\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}\) and \(\begin{pmatrix} 2 \\ 6 \\ 6 \end{pmatrix}\).

Exercise 1.1.17.

(a) Express \(\begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}\) as a linear combination of the vectors
\[
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -3 \\ 3 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.
\]
First describe the nature of the problem you need to solve.

(b) Give an “aha” proof that \(\begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}\) cannot be expressed as a linear combination of
\[
\begin{pmatrix} -3 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ -2 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ -8 \\ 5 \end{pmatrix}.
\]
An “aha” proof may not be easy to find but it has to be immediately convincing. This problem will be generalized in Ex. 1.2.7.

Exercise 1.1.18. To what does a linear combination of the empty list of vectors evaluate? (R Convention 1.1.3)

Let us now consider the system of linear equations

\[
\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1n}x_n &= \beta_1 \\
\alpha_{21}x_1 + \alpha_{22}x_2 + \cdots + \alpha_{2n}x_n &= \beta_2 \\
&\;\;\vdots \\
\alpha_{k1}x_1 + \alpha_{k2}x_2 + \cdots + \alpha_{kn}x_n &= \beta_k
\end{aligned} \tag{1.8}
\]

Given the α_ij and the β_i, we need to find x_1, ..., x_n that satisfy these equations. This is arguably one of the most fundamental problems of applied mathematics. We can rephrase this problem in terms of the vectors a_1, ..., a_n, b, where
\[
a_j := \begin{pmatrix} \alpha_{1j} \\ \alpha_{2j} \\ \vdots \\ \alpha_{kj} \end{pmatrix}
\]

is the column of coefficients of x_j and the vector
\[
b := \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}
\]
represents the right-hand side. With this notation, our system of linear equations takes the more concise form
\[
x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b. \tag{1.9}
\]
The problem of solving the system of equations (1.8) therefore is equivalent to expressing the vector b as a linear combination of the a_i.
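To make the rephrasing concrete: checking a proposed solution of a linear system is exactly the computation (1.9). The following Python sketch (illustrative only, with made-up data) forms the left-hand side of (1.9) for a candidate solution and compares it with b.

# Check a candidate solution by forming x1*a1 + ... + xn*an, as in (1.9).
# The 3x2 system below is made up for illustration.

def add(v, w):
    return [p + q for p, q in zip(v, w)]

def scale(c, v):
    return [c * p for p in v]

a1, a2 = [1, 0, 2], [2, 1, 0]      # columns of coefficients
b = [5, 1.5, 4]                    # right-hand side
x = [2, 1.5]                       # candidate solution (x1, x2)

lhs = [0, 0, 0]
for xi, ai in zip(x, [a1, a2]):
    lhs = add(lhs, scale(xi, ai))
print(lhs == b)   # True exactly when x solves the system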

(R Contents)

1.2 (F) Subspaces and span

Definition 1.2.1 (Subspace). A set W ⊆ Fn is a subspace of Fn (denoted W ≤ Fn) if it is closed under linear combinations.

Exercise 1.2.2. Let W ≤ Fk. Show that 0 ∈ W . (Why is the empty set not a subspace?)

Proposition 1.2.3. W ≤ F^n if and only if

(a) 0 ∈ W

(b) If u, v ∈ W , then u + v ∈ W

(c) If v ∈ W and α ∈ F, then αv ∈ W . Exercise 1.2.4. Show that, if W is a nonempty subset of Fn, then (c) implies (a). Exercise 1.2.5. Show

(a) {0} ≤ Fk ;

(b) F^k ≤ F^k.

We refer to these as the trivial subspaces of F^k.

Exercise 1.2.6.

(a) The set \(\left\{ \begin{pmatrix} \alpha_1 \\ 0 \\ \alpha_3 \end{pmatrix} \;\middle|\; \alpha_1 = 2\alpha_3 \right\}\) is a subspace of R^3.

(b) The set \(\left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} \;\middle|\; \alpha_2 = \alpha_1 + 7 \right\}\) is not a subspace of R^2.

Exercise 1.2.7. Let
\[
W_k = \left\{ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_k \end{pmatrix} \in F^k \;\middle|\; \sum_{i=1}^{k} \alpha_i = 0 \right\}.
\]
Show that W_k ≤ F^k. This is the 0-weight subspace of F^k.

Exercise 1.2.8. Prove that, for n ≥ 2, the space R^n has infinitely many subspaces.

Proposition 1.2.9. Let W_1, W_2 ≤ F^n. Then

(a) W_1 ∩ W_2 ≤ F^n

(b) The intersection of any (finite or infinite) collection of subspaces of Fn is also a subspace of Fn.

(c) W_1 ∪ W_2 ≤ F^n if and only if W_1 ⊆ W_2 or W_2 ⊆ W_1

Definition 1.2.10 (Span). Let S ⊆ F^n. Then the span of S, denoted span(S), is the smallest subspace of F^n containing S, i. e.,

(a) span S ⊇ S;

(b) span S is a subspace of F^n;

(c) for every subspace W ≤ F^n, if S ⊆ W then span S ≤ W.

Fact 1.2.11. span(∅) = {0}. (Why?)

Theorem 1.2.12. Let S ⊆ F^n. Then

(a) span S exists and is unique;

(b) \(\mathrm{span}(S) = \bigcap_{S \subseteq W \le F^n} W\). This is the intersection of all subspaces containing S. ♦

This theorem tells us that the span exists. The next theorem constructs all the elements of the span.

Theorem 1.2.13. For S ⊆ F^n, span S is the set of all linear combinations of the finite subsets of S. (Note that this is true even when S is empty. Why?) ♦

Proposition 1.2.14. Let S ⊆ F^n. Then S ≤ F^n if and only if S = span(S).

Proposition 1.2.15. Let S ⊆ F^n. Prove that span(span(S)) = span(S). Prove this

(a) based on the definition;

(b) (linear combinations of linear combinations) based on Theorem 1.2.13.

Proposition 1.2.16 (Transitivity of span). Suppose R ⊆ span(T) and T ⊆ span(S). Then R ⊆ span(S).

Definition 1.2.17 (Sum of sets). Let A, B ⊆ F^n. Then A + B is the set
\[
A + B = \{ a + b \mid a \in A,\, b \in B \}. \tag{1.10}
\]

Proposition 1.2.18. Let U_1, U_2 ≤ F^n. Then U_1 + U_2 = span(U_1 ∪ U_2).

(R Contents)

1.3 (F) Linear independence and the First Miracle of Linear Algebra

In this section, v_1, v_2, ... will denote column vectors of height k, i. e., v_i ∈ F^k.

Definition 1.3.1 (List). A list of objects a_i is a function whose domain is an “index set” I; we write the list as (a_i | i ∈ I). Most often our index set will be I = {1, ..., n}. In this case, we write the list as (a_1, ..., a_n). The size of the list is |I|.

Notation 1.3.2 (Concatenation of lists). Let L = (v_1, v_2, ..., v_ℓ) and M = (w_1, w_2, ..., w_n) be lists. We denote by (L, M) the list obtained by concatenating L and M, i. e., the list (v_1, v_2, ..., v_ℓ, w_1, w_2, ..., w_n). If the list M has only one element, we omit the parentheses around the list, that is, we write (L, w) rather than (L, (w)).

Definition 1.3.3 (Trivial linear combination). The trivial linear combination of the vectors v_1, ..., v_m is the linear combination (R Def. 1.1.13) 0v_1 + ··· + 0v_m (all coefficients are 0).

Fact 1.3.4. The trivial linear combination evaluates to zero.

Definition 1.3.5 (Linear independence). The list (v_1, ..., v_n) is said to be linearly independent if the only linear combination that evaluates to zero is the trivial linear combination. The list (v_1, ..., v_n) is linearly dependent if it is not linearly independent, i. e., if there exist scalars α_1, ..., α_n, not all zero, such that \(\sum_{i=1}^{n} \alpha_i v_i = 0\).

Definition 1.3.6. If a list (v_1, ..., v_n) of vectors is linearly independent (dependent), we say that the vectors v_1, ..., v_n are linearly independent (dependent).

Definition 1.3.7. We say that a set of vectors is linearly independent if a list formed by its elements (in any order and without repetitions) is linearly independent.

Example 1.3.8. Let
\[
v_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} -4 \\ 1 \\ 2 \end{pmatrix}.
\]

Then v1, v2, v3 are linearly independent vectors. It follows that the set {v1, v2, v3, v2} is linearly independent while the list (v1, v2, v3, v2) is linearly dependent.

Note that the list (v_1, v_2, v_3, v_2) has four elements, but the set {v_1, v_2, v_3, v_2} has three elements.

 1  4 −2 Exercise 1.3.9. Show that the vectors −3 , 2 , −8 are linearly dependent. 2 1 3

Definition 1.3.10. We say that the vector w depends on the list (v1,..., vk) of vectors if w ∈ span(v1,..., vk), i. e., if w can be expressed as a linear combination of the vi.

Proposition 1.3.11. The vectors v1,..., vk are linearly dependent if and only if there is some i such that vi depends on the other vectors in the list. Exercise 1.3.12. Show that the list (v, w, v + w) is linearly dependent. Exercise 1.3.13. Is the empty list linearly independent? Exercise 1.3.14. Show that if a list is linearly independent, then any permutation of it is linearly independent.

Proposition 1.3.15. The list (v, v), consisting of the vector v ∈ Fn listed twice, is linearly depen- dent. Exercise 1.3.16. Is there a vector v such that the list (v), consisting of a single item, is linearly dependent? Exercise 1.3.17. Which vectors depend on the empty list? Definition 1.3.18 (Sublist). A sublist of a list L is a list M consisting of some of the elements of L, in the same order in which they appear in L. Examples 1.3.19. Let L = (a, b, c, b, d, e). (a) The empty list is a sublist of L.

(b) L is a sublist of itself.

(c) The list L1 = (a, b, d) is a sublist of L.

(d) The list L2 = (b, b) is a sublist of L.

(e) The list L3 = (b, b, b) is not a sublist of L.

(f) The list L_4 = (a, d, c, e) is not a sublist of L.

Fact 1.3.20. Every sublist of a linearly independent list of vectors is linearly independent.

The following lemma is central to the proof of the First Miracle of Linear Algebra (R Theorem 1.3.40) as well as to our characterization of bases as maximal linearly independent sets (R Prop. 1.3.37).

Lemma 1.3.21. Suppose (v1,..., vk) is a linearly independent list of vectors and the list (v1,..., vk+1) is linearly dependent. Then vk+1 ∈ span(v1,..., vk). ♦

Proposition 1.3.22. The vectors v1,..., vk are linearly dependent if and only if there is some j such that vj ∈ span(v1,..., vj−1, vj+1,..., vk) . Exercise 1.3.23. Prove that no list of vectors containing 0 is linearly independent.

Exercise 1.3.24. Prove that a list of vectors with repeated elements (the same vector occurs more than once) is linearly dependent. (This follows from combining which two previous exercises?)

Definition 1.3.25 (Parallel vectors). Let u, v ∈ Fn. We say that u and v are parallel if there exists a scalar α such that u = αv or v = αu. Note that 0 is parallel to all vectors, and the relation of being parallel is an equivalence relation (R Def. 14.1.19) on the set of nonzero vectors.

Exercise 1.3.26. Let u, v ∈ Fn. Show that the list (u, v) is linearly dependent if and only if u and v are parallel.

Exercise 1.3.27. Find n+1 vectors in Fn such that every n are linearly independent. Over infinite fields, a much stronger statement holds (R Ex. 15.3.15).

Definition 1.3.28 (Rank). The rank of a set S ⊆ Fn, denoted rk S, is the size of the largest linearly independent subset of S. The rank of a list is the rank of the set formed by its elements.

Proposition 1.3.29. Let S and T be lists. Show that

rk(S,T ) ≤ rk S + rk T (1.11)

where rk(S,T ) is the rank of the list obtained by concatenating the lists S and T .

Definition 1.3.30 (Dimension). Let W ≤ F^n be a subspace. The dimension of W, denoted dim W, is its rank, that is, dim W = rk W.

Definition 1.3.31 (List of generators). Let W ≤ F^n. The list L = (v_1, ..., v_k) ⊆ W is said to be a list of generators of W if span(v_1, ..., v_k) = W. In this case, we say that v_1, ..., v_k generate W.

Definition 1.3.32 (Basis). A list b = (b_1, ..., b_n) is a basis of the subspace W ≤ F^k if b is a linearly independent list of generators of W.

Fact 1.3.33. If W ≤ Fn and b is a list of vectors of W then b is a basis of W if and only if it is linearly independent and span(b) = W .

Definition 1.3.34 (Standard basis of F^k). The standard basis of F^k is the basis (e_1, ..., e_k), where e_i is the column vector which has its i-th component equal to 1 and all other components equal to 0. The vectors e_1, ..., e_k are sometimes called the standard unit vectors. For example, the standard basis of F^3 is
\[
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
\]

Exercise 1.3.35. The standard basis of Fk is a basis.

Note that subspaces of Fk do not have a “standard basis.” n Definition 1.3.36 (Maximal linearly independent list). Let W ≤ F . A list L = (v1,..., vk) of vectors in W is a maximal linearly independent list in W if L is linearly independent but for all w ∈ W , the list (L, w) is linearly dependent.

k Proposition 1.3.37. Let W ≤ F and let b = (b1,..., bk) be a list of vectors in W . Then b is a basis of W if and only if it is a maximal linearly independent list.

Exercise 1.3.38. Prove: every subspace W ≤ Fk has a basis. You may assume the fact, to be proved shortly (R Cor. 1.3.49), that every linearly independent list of vectors in Fk has size at most k.

Proposition 1.3.39. Let L be a finite list of generators of W ≤ Fk. Then L contains a basis of W .

Next we state a central result to which the entire field of linear algebra arguably owes its existence.

Theorem 1.3.40 (First Miracle of Linear Algebra). Let v1,..., vk be linearly independent with vi ∈ span(w1,..., wm) for all i. Then k ≤ m. The proof of this theorem requires the following lemma.

Lemma 1.3.41 (Steinitz exchange lemma). Let (v_1, ..., v_k) be a linearly independent list of vectors such that v_i ∈ span(w_1, ..., w_m) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (w_j, v_2, ..., v_k) is linearly independent. ♦ Exercise 1.3.42. Use the Steinitz exchange lemma to prove the First Miracle of Linear Algebra. Corollary 1.3.43. Let W ≤ F^k. Every basis of W has the same size (same number of vectors). Exercise 1.3.44. Prove: Cor. 1.3.43 is equivalent to the First Miracle, i. e., infer the First Miracle from Cor. 1.3.43. Corollary 1.3.45. Every basis of F^k has size k. The following result is essentially a restatement of the First Miracle of Linear Algebra.

Corollary 1.3.46. rk(v1,..., vk) = dim (span(v1,..., vk)). Exercise 1.3.47. Prove Cor. 1.3.46 is equivalent to the First Miracle, i. e., infer the First Miracle from Cor. 1.3.46. Corollary 1.3.48. dim Fk = k. Corollary 1.3.49. Every linearly independent list of vectors in Fk has size at most k. Corollary 1.3.50. Let W ≤ Fk and let L be a linearly independent list of vectors in W . Then L can be extended to a basis of W . n Exercise 1.3.51. Let U1,U2 ≤ F with U1 ∩ U2 = {0}. Let v1,..., vk ∈ U1 and w1,..., w` ∈ U2. If the lists (v1,..., vk) and (w1,..., w`) are linearly independent, then so is the concatenated list (v1,..., vk, w1,..., w`). n Proposition 1.3.52. Let U1,U2 ≤ F with U1 ∩ U2 = {0}. Then

dim U1 + dim U2 ≤ n . (1.12) k Proposition 1.3.53 (Modular equation). Let U1,U2 ≤ F . Then

dim(U1 + U2) + dim(U1 ∩ U2) = dim U1 + dim U2 . (1.13)

(R Contents)

1.4 (F) Dot product

We now define the dot product, an operation which takes two vectors as input and outputs a scalar.

Definition 1.4.1 (Dot product). Let x, y ∈ F^n with x = (α_1, α_2, ..., α_n)^T and y = (β_1, β_2, ..., β_n)^T. (Note that these are column vectors.) Then the dot product of x and y, denoted x · y, is

\[
x \cdot y := \sum_{i=1}^{n} \alpha_i \beta_i \in F. \tag{1.14}
\]
Proposition 1.4.2. The dot product is symmetric, that is,

x · y = y · x .

Exercise 1.4.3. Show that the dot product is distributive, that is,

x · (y + z) = x · y + x · z .

Exercise 1.4.4. Show that the dot product is bilinear (R Def. 6.2.11), i. e., for x, y, z ∈ Fn and α ∈ F, we have (x + z) · y = x · y + z · y (1.15) (αx) · y = α(x · y) (1.16) x · (y + z) = x · y + x · z (1.17) x · (αy) = α(x · y) (1.18)

Exercise 1.4.5. Show that the dot product preserves linear combinations, i. e., for x1,..., xk, y ∈ n F and α1, . . . , αk ∈ F, we have

\[
\left( \sum_{i=1}^{k} \alpha_i x_i \right) \cdot y = \sum_{i=1}^{k} \alpha_i (x_i \cdot y) \tag{1.19}
\]

and for x, y_1, ..., y_ℓ ∈ F^n and β_1, ..., β_ℓ ∈ F,

\[
x \cdot \left( \sum_{i=1}^{\ell} \beta_i y_i \right) = \sum_{i=1}^{\ell} \beta_i (x \cdot y_i). \tag{1.20}
\]

Numerical exercise 1.4.6. Let
\[
x = \begin{pmatrix} 3 \\ 4 \\ -2 \end{pmatrix}, \quad y = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad z = \begin{pmatrix} 3 \\ 2 \\ -4 \end{pmatrix}.
\]

Compute x · y, x · z, and x · (y + z). Self-check: verify that x · (y + z) = (x · y) + (x · z). Exercise 1.4.7. Compute the following dot products.

(a) 1_k · 1_k for k ≥ 1.

(b) x · x where x = (α_1, ..., α_k)^T ∈ F^k.
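The dot product is a one-line computation; the following Python sketch (illustrative only) evaluates the products requested in Numerical exercise 1.4.6 and performs the suggested self-check of distributivity.

# Dot product as in Definition 1.4.1, applied to the data of Numerical exercise 1.4.6.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

x, y, z = [3, 4, -2], [1, 0, 2], [3, 2, -4]
y_plus_z = [a + b for a, b in zip(y, z)]
print(dot(x, y), dot(x, z), dot(x, y_plus_z))
assert dot(x, y_plus_z) == dot(x, y) + dot(x, z)   # distributivity self-check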

Definition 1.4.8 (). Let x, y ∈ Fk. Then x and y are orthogonal (notation: x ⊥ y) if x · y = 0. Exercise 1.4.9. What vectors are orthogonal to every vector?

Exercise 1.4.10. Which vectors are orthogonal to 1k?

Definition 1.4.11 (Isotropic vector). The vector v ∈ Fn is isotropic if v 6= 0 and v · v = 0. Exercise 1.4.12.

(a) Show that there are no isotropic vectors in Rn.

(b) Find an isotropic vector in C2.

2 (c) Find an isotropic vector in F2.

Theorem 1.4.13. If v1,..., vk are pairwise orthogonal and non-isotropic vectors, then they are linearly independent. ♦

(R Contents)

1.5 (R) Dot product over R

We now specialize our discussion of the dot product to the space Rn.

Exercise 1.5.1. Let x ∈ Rn. Show x · x ≥ 0, with equality holding if and only if x = 0.

This “positive definiteness” of the dot product allows us to define the “norm” of a vector, a generalization of length.

Definition 1.5.2 (Norm). The norm of a vector x ∈ R^n, denoted ‖x‖, is defined as
\[
\|x\| := \sqrt{x \cdot x}. \tag{1.21}
\]

This norm is also referred to as the Euclidean norm or the ℓ_2 norm.
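A minimal Python sketch of this definition (illustrative only): compute the norm of a vector and rescale it to a unit vector, as will be needed for orthonormal systems below.

# Euclidean norm of a vector in R^n, as in Definition 1.5.2.
import math

def norm(x):
    return math.sqrt(sum(a * a for a in x))

x = [3.0, 4.0]
print(norm(x))                 # 5.0
u = [a / norm(x) for a in x]
print(norm(u))                 # 1.0, a unit vector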

Definition 1.5.3 (Orthogonal system). An orthogonal system in Rn is a list of (pairwise) orthogonal nonzero vectors in Rn.

Exercise 1.5.4. Let S ⊆ Rk be an orthogonal system in Rk. Prove that S is linearly independent.

Definition 1.5.5 (Orthonormal system). An orthonormal system in R^n is a list of (pairwise) orthogonal vectors in R^n, all of which have unit norm.

Definition 1.5.6 (Orthonormal basis). An orthonormal basis of W ≤ Rn is an orthonormal system that is a basis of W .

Exercise 1.5.7.

(a) Find an orthonormal basis of Rn.

(b) Find all orthonormal bases of R2.

Orthogonality is studied in more detail in Chapter 19.

(R Contents)

1.6 (F) Additional exercises

Exercise 1.6.1. Let v1,..., vn be vectors such that kvik > 1 for all i, and vi · vj = 1 whenever i 6= j. Show that v1,..., vn are linearly independent.

n Definition 1.6.2 (Incidence vector). Let A ⊆ {1, . . . , n}. Then the incidence vector vA ∈ R is the vector whose i-th coordinate is 1 if i ∈ A and 0 otherwise.

Exercise 1.6.3. Let A, B ⊆ {1, . . . , n}. Express the dot product vA · vB in terms of the sets A and B.

♥ Exercise 1.6.4 (Generalized Fisher Inequality). Let λ ≥ 1 and let A1,...,Am ⊆ {1, . . . , n} such that for all i 6= j we have |Ai ∩ Aj| = λ. Then m ≤ n.

(R Contents)

Chapter 2

(F) Matrices

(R Contents)

2.1 Matrix basics

Definition 2.1.1 (Matrix). A k × n matrix is a table of numbers arranged in k rows and n columns, written as
\[
\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \vdots & \vdots & & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}
\]

We may write M = (α_{i,j})_{k×n} to indicate a matrix whose entry in position (i, j) (i-th row, j-th column) is α_{i,j}. For typographical convenience we usually omit the comma separating the row index and the column index and simply write α_ij instead of α_{i,j}; we use the comma if its omission would lead to ambiguity. So we write M = (α_ij)_{k×n}, or simply M = (α_ij) if the values k and n are clear from context. We also write (M)_ij to indicate the (i, j) entry of the matrix M, i. e., α_ij = (M)_ij.

Example 2.1.2. This is a 3 × 5 matrix.
\[
\begin{pmatrix} 1 & 3 & -2 & 0 & -1 \\ 6 & 5 & 4 & -3 & -6 \\ 2 & 7 & 1 & 1 & 5 \end{pmatrix}
\]


In this example, we have α22 = 5 and α15 = −1.

Definition 2.1.3 (The space F^{k×n}). The set of k × n matrices with entries from the domain F of scalars is denoted by F^{k×n}. Recall that F always denotes a field. Square matrices (k = n) have special significance, so we write M_n(F) := F^{n×n}. We identify M_1(F) with F and omit the matrix notation, i. e., we write α rather than (α). An integral matrix is a matrix with integer entries. Naturally, Z^{k×n} will denote the set of k × n integral matrices, and M_n(Z) = Z^{n×n}. Recall that Z is not a field.

 2 6 9   0 −1 4 7 Example 2.1.4. For example, ∈ 2×4 is a 2 × 4 matrix and 3 −4 −2 ∈ −3 5 6 8 R   −5 1 4 M3(R) is a 3 × 3 matrix.

Observe that the column vectors of height k introduced in Chapter 1 are k × 1 matrices, so Fk = Fk×1. Moreover, every statement about column vectors in Chapter 1 applies analogously to 1 × n matrices (“row vectors”). Notation 2.1.5. When writing row vectors, we use commas to avoid ambiguity, so we write, for example, (3, 5, −1) instead of 3 5 −1. Notation 2.1.6 (). The k × n matrix with all of its entries equal to 0 is called the zero matrix and is denoted by 0k×n, or simply by 0 if k and n are clear from context. Notation 2.1.7 (All-ones matrix). The k × n matrix with all of its entries equal to 1 is denoted by Jk×n or J. We write Jn for Jn×n.

Definition 2.1.8 (Diagonal matrix). A matrix A = (αij) ∈ Mn(F) is diagonal if αij = 0 whenever i 6= j. The n × n diagonal matrix with entries λ1, . . . λn is denoted by diag(λ1, . . . , λn).

Example 2.1.9. 5 0 0 0 0 0 3 0 0 0   diag(5, 3, 0, −1, 5) = 0 0 0 0 0   0 0 0 −1 0 0 0 0 0 5 22 CHAPTER 2. (F) MATRICES

Notation 2.1.10. To avoid filling most of a matrix with the number “0”, we often write matrices like the one above as
\[
\mathrm{diag}(5, 3, 0, -1, 5) = \begin{pmatrix} 5 & & & & \mbox{\Large 0} \\ & 3 & & & \\ & & 0 & & \\ & & & -1 & \\ \mbox{\Large 0} & & & & 5 \end{pmatrix}
\]
where the big 0 symbol means that every entry in the triangles above or below the diagonal is 0.

Definition 2.1.11 (Upper and lower triangular matrices). A matrix A = (α_ij) ∈ M_n(F) is upper triangular if α_ij = 0 whenever i > j. A is said to be strictly upper triangular if α_ij = 0 whenever i ≥ j. Lower triangular and strictly lower triangular matrices are defined analogously.

Examples 2.1.12.

(a) \(\begin{pmatrix} 5 & 2 & 0 & 7 & 2 \\ 0 & 3 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & -1 & -3 \\ 0 & 0 & 0 & 0 & 5 \end{pmatrix}\) is upper triangular.

(b) \(\begin{pmatrix} 0 & 2 & 0 & 7 & 2 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}\) is strictly upper triangular.

(c) \(\begin{pmatrix} 5 & 0 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & -1 & 0 \\ 2 & 0 & 0 & -3 & 5 \end{pmatrix}\) is lower triangular.

(d) \(\begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 7 & -4 & 6 & 0 & 0 \\ 2 & 0 & 0 & -3 & 0 \end{pmatrix}\) is strictly lower triangular.

Fact 2.1.13. The diagonal matrices are the matrices which are simultaneously upper and lower triangular.

Definition 2.1.14 (Matrix ). The transpose of a k × ` matrix M = (αij) is the ` × k matrix (βij) defined by βij = αji (2.1) and is denoted M T . (We flip it across its main diagonal, so the rows of A become the columns of AT and vice versa.) Examples 2.1.15.   T 3 1 3 1 4 (a) = 1 5 1 5 9   4 9

T (b) 1k = (1, 1,..., 1) | {z } k times (c) In Examples 2.1.12, the matrix (c) is the transpose of (a), and (d) is the transpose of (b). Fact 2.1.16. Let A be a matrix. Then AT T = A. Definition 2.1.17 (). A matrix M is symmetric if M = M T . Note that if a matrix M ∈ Fk×` is symmetric then k = ` (M is square). 1 3 0  Example 2.1.18. The matrix 3 5 −2 is symmetric. 0 −2 4

Definition 2.1.19 (Matrix addition). Let A = (αij) and B = (βij) be k × n matrices. Then the sum A + B is the k × n matrix with entries

\[
(A + B)_{ij} = \alpha_{ij} + \beta_{ij} \tag{2.2}
\]
That is, addition is defined elementwise.

Example 2.1.20.
\[
\begin{pmatrix} 2 & 1 \\ 4 & -2 \\ 0 & 5 \end{pmatrix} + \begin{pmatrix} 1 & -6 \\ 2 & 3 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 3 & -5 \\ 6 & 1 \\ 1 & 9 \end{pmatrix}
\]

Fact 2.1.21 (Adding zero). For any matrix A ∈ Fk×n, A + 0 = 0 + A = A. Proposition 2.1.22 (Commutativity). Matrix addition obeys the commutative law: if A, B ∈ Fk×n, then A + B = B + A. Proposition 2.1.23 (Associativity). Matrix addition obeys the associative law: if A, B, C ∈ Fk×n, then (A + B) + C = A + (B + C).

Definition 2.1.24 (The negative of a matrix). Let A ∈ Fk×n be a matrix. Then −A is the k × n matrix defined by (−A)ij = −(A)ij.

Proposition 2.1.25. Let A ∈ F^{k×n}. Then A + (−A) = 0.

Definition 2.1.26 (Multiplication of a matrix by a scalar). Let A = (α_ij) ∈ F^{k×n}, and let ζ ∈ F. Then ζA is the k × n matrix whose (i, j) entry is ζ · α_ij.

Example 2.1.27.
\[
3 \begin{pmatrix} 1 & 2 \\ -3 & 4 \\ 0 & 6 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -9 & 12 \\ 0 & 18 \end{pmatrix}
\]

Fact 2.1.28. −A = (−1)A.

(R Contents)

2.2 Matrix multiplication

Definition 2.2.1 (Matrix multiplication). Let A = (αij) be an r × s matrix and B = (βjk) be an s × t matrix. Then the matrix product C = AB is the r × t matrix C = (γik) defined by

\[
\gamma_{ik} = \sum_{j=1}^{s} \alpha_{ij} \beta_{jk} \tag{2.3}
\]
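Definition 2.2.1 translates directly into code. The following Python sketch (illustrative only) multiplies two matrices, stored as lists of rows, exactly by formula (2.3); the sample data happen to be A and B of Numerical exercise 2.2.6(a) below.

# Matrix multiplication from Definition 2.2.1: C[i][k] = sum over j of A[i][j] * B[j][k].

def matmul(A, B):
    r, s, t = len(A), len(B), len(B[0])
    assert all(len(row) == s for row in A), "inner dimensions must agree"
    return [[sum(A[i][j] * B[j][k] for j in range(s)) for k in range(t)] for i in range(r)]

A = [[1, 2],
     [2, 1]]
B = [[3, 1, 0],
     [-4, 2, 5]]
print(matmul(A, B))   # [[-5, 5, 10], [2, 4, 5]]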

Exercise 2.2.2. Let A ∈ F^{k×n} and let B ∈ F^{n×m}. Show that
\[
(AB)^T = B^T A^T. \tag{2.4}
\]

Proposition 2.2.3 (Distributivity). Matrix multiplication obeys the right distributive law: if A ∈ F^{k×n} and B, C ∈ F^{n×ℓ}, then A(B + C) = AB + AC. Analogously, it obeys the left distributive law: if A, B ∈ F^{k×n} and C ∈ F^{n×ℓ}, then (A + B)C = AC + BC.

Proposition 2.2.4 (Associativity). Matrix multiplication obeys the associative law: if A, B, and C are matrices with compatible dimensions, then (AB)C = A(BC).

Proposition 2.2.5 (Matrix multiplication vs. scaling). Let A ∈ F^{k×n}, B ∈ F^{n×ℓ}, and α ∈ F. Then
\[
A(\alpha B) = \alpha(AB) = (\alpha A)B. \tag{2.5}
\]

Numerical exercise 2.2.6. For each of the following triples of matrices, compute the products AB, AC, and A(B + C). Self-check: verify that A(B + C) = AB + AC.

(a) \(A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}\), \(B = \begin{pmatrix} 3 & 1 & 0 \\ -4 & 2 & 5 \end{pmatrix}\), \(C = \begin{pmatrix} 1 & -7 & -4 \\ 5 & 3 & -6 \end{pmatrix}\)

(b) \(A = \begin{pmatrix} 2 & 5 \\ 1 & 1 \\ 3 & -3 \end{pmatrix}\), \(B = \begin{pmatrix} 4 & 6 & 3 & 1 \\ 3 & 3 & -5 & 4 \end{pmatrix}\), \(C = \begin{pmatrix} 1 & -4 & -1 & 5 \\ 2 & 4 & 10 & -7 \end{pmatrix}\)

(c) \(A = \begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix}\), \(B = \begin{pmatrix} -1 & 4 \\ 3 & 2 \\ -5 & -2 \end{pmatrix}\), \(C = \begin{pmatrix} 2 & -3 \\ -6 & -2 \\ 0 & 1 \end{pmatrix}\)

Exercise 2.2.7. Let A ∈ F^{k×n}, and let E_{ij}^{(ℓ×m)} be the ℓ × m matrix with a 1 in the (i, j) position and 0 everywhere else.

(a) What is E_{ij}^{(k×k)} A?

(b) What is A E_{ij}^{(n×n)}?

Definition 2.2.8 (). The rotation matrix Rθ is the matrix defined by cos θ − sin θ R = . (2.6) θ sin θ cos θ As we shall see, this matrix is intimately related to the rotation of the Euclidean by θ (R Ex. 16.5.2).

Exercise 2.2.9. Prove Rα+β = RαRβ. Your proof may use the addition theorems for the trigono- metric functions. Later when we learn about the connection between matrices and linear transfor- mations, we shall give a direct proof of this fact which will imply the addition theorems (R Ex. 16.5.12).

Definition 2.2.10 (). The n × n identity matrix, denoted In or I, is the diagonal matrix whose diagonal entries are all 1, i. e., 1   1 0   I = diag(1, 1,..., 1) =  ..  (2.7)  .  0 1

This is also written as I = (δij), where the Kronecker delta symbol δij is defined by  1 i = j δ = (2.8) ij 0 i 6= j Fact 2.2.11. The columns of I are the standard unit vectors (R Def. 1.3.34).

Proposition 2.2.12. For all A ∈ Fk×n,

\[
I_k A = A I_n = A. \tag{2.9}
\]

Definition 2.2.13 (Scalar matrix). The matrix A ∈ Mn(F) is a scalar matrix if A = αI for some α ∈ F.

k×n `×k Exercise 2.2.14. Let A ∈ F and let B ∈ F . Let D = diag(λ1, . . . , λk). Show

(a) DA is the matrix obtained by scaling the i-th row of A by λi for each i;

(b) BD is the matrix obtained by scaling the j-th column of B by λj for each j.

Definition 2.2.15 (ℓ-th power of a matrix). Let A ∈ M_n(F) and ℓ ≥ 0. We define A^ℓ, the ℓ-th power of A, inductively:

(i) A^0 = I ;

(ii) A^{ℓ+1} = A · A^ℓ .

So
\[
A^\ell = \underbrace{A \cdots A}_{\ell \text{ times}}. \tag{2.10}
\]

Exercise 2.2.16. Let A ∈ M_n(F). Show that for all ℓ, m ≥ 0, we have

(a) A^{ℓ+m} = A^ℓ A^m ;

(b) (A^ℓ)^m = A^{ℓm} .

Exercise 2.2.17. For k ≥ 0, compute

(a) A^k for \(A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\);

♥ (b) B^k for \(B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}\);

(c) C^k for \(C = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\);

(d) Interpret and verify the following statements:

(d1) A^k grows linearly

(d2) B^k grows exponentially

(d3) C^k stays of constant “size.” (Define “size” in this statement.) (A short machine experiment appears below.)
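The following Python sketch (illustrative only) computes the powers in parts (a) and (b) and prints the largest entry of each power, making the linear versus exponential growth of (d1) and (d2) visible; the entries of B^k are in fact Fibonacci numbers.

# Powers of the matrices A and B of Exercise 2.2.17 and the growth of their entries.

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]

def power(M, k):
    result = [[1, 0], [0, 1]]        # 2x2 identity
    for _ in range(k):
        result = matmul(result, M)
    return result

A = [[1, 1], [0, 1]]
B = [[0, 1], [1, 1]]
for k in (1, 5, 10, 20):
    max_A = max(abs(x) for row in power(A, k) for x in row)
    max_B = max(abs(x) for row in power(B, k) for x in row)
    print(k, max_A, max_B)   # max_A grows like k, max_B grows exponentially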

Definition 2.2.18 (). The matrix N ∈ Mn(F) is nilpotent if there exists an integer k such that N k = 0.

Exercise 2.2.19. Show that if A ∈ M_n(F) is strictly upper triangular, then A^n = 0. So every strictly upper triangular matrix is nilpotent. Later we shall see that a matrix is nilpotent if and only if it is “similar” (R Def. 8.2.1) to a strictly upper triangular matrix (R Ex. 8.2.4).

Notation 2.2.20. We denote by N_n the n × n matrix defined by
\[
N_n = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}. \tag{2.11}
\]

That is, N_n = (α_ij) where
\[
\alpha_{ij} = \begin{cases} 1 & \text{if } j = i + 1 \\ 0 & \text{otherwise.} \end{cases}
\]

k Exercise 2.2.21. Find Nn for k ≥ 0. Fact 2.2.22. In Section 1.4, we defined the dot product of two vectors (R Def. 1.4.1). The dot product may also be defined in terms of matrix multiplication. Let x, y ∈ Fk. Then x · y = xT y . (2.12)

Exercise 2.2.23. If v ⊥ 1 then Jv = 0.

Notation 2.2.24. For a matrix A ∈ Fk×n, we sometimes write

A = [a1 | · · · | an]

where ai is the i-th column of A. 2.2. MATRIX MULTIPLICATION 29

Exercise 2.2.25 (Extracting columns and elements of a matrix via multiplication). Let ei be the i-th column of I (i. e., the i-th standard ), and let A = (αij) = [a1 | · · · | an]. Then

(a) Aej = aj ;

T (b) ei Aej = αij .

k×n Exercise 2.2.26 (Linear combination as a matrix product). Let A = [a1 | · · · | an] ∈ F and let T n x = (α1, . . . , αn) ∈ F . Show that

Ax = α1a1 + ··· + αnan . (2.13)

k×n Exercise 2.2.27 (Left multiplication acts column by column). Let A ∈ F and let B = [b1 | n×` · · · | b`] ∈ F . (The bi are the columns of B.) Then

AB = [Ab1 | · · · | Ab`] . (2.14)

Proposition 2.2.28 (No cancellation). Let k ≥ 2. Then for all n and for all x ∈ Fn, there exist k × n matrices A and B (A 6= B) such that Ax = Bx but A 6= B.

Proposition 2.2.29. Let A ∈ Fk×n. If Ax = 0 for all x ∈ Fn, then A = 0.

Corollary 2.2.30 (Cancellation). If A, B ∈ Fk×n are matrices such that Ax = Bx for all x ∈ Fn, then A = B. Note: compare with Prop. 2.2.28.

Proposition 2.2.31 (No double cancellation). Let k ≥ 2. Then for all n and for all x ∈ Fk and y ∈ Fn, there exist k × n matrices A and B such that xT Ay = xT By but A 6= B.

Proposition 2.2.32. Let A ∈ Fk×n. If xT Ay = 0 for all x ∈ Fk and y ∈ Fn, then A = 0.

Corollary 2.2.33 (Double cancellation). If A, B ∈ Fk×n are matrices such that xT Ay = xT By for all x ∈ Fk and y ∈ Fn, then A = B.

Definition 2.2.34 (Trace). The trace of a square matrix A = (α_ij) ∈ M_n(F) is the sum of its diagonal entries, that is,
\[
\mathrm{Tr}(A) = \sum_{i=1}^{n} \alpha_{ii} \tag{2.15}
\]
Examples 2.2.35.

(a) Tr(I_n) = n

(b) \(\mathrm{Tr}\begin{pmatrix} 3 & 1 & 2 \\ 4 & -2 & 4 \\ 1 & -3 & -2 \end{pmatrix} = -1\)

Fact 2.2.36 (Linearity of the trace).

(a) Tr(A + B) = Tr(A) + Tr(B);

(b) for λ ∈ F, Tr(λA) = λ Tr(A);

(c) for α_i ∈ F, \(\mathrm{Tr}\left(\sum \alpha_i A_i\right) = \sum \alpha_i \mathrm{Tr}(A_i)\).

Exercise 2.2.37. Let v, w ∈ F^n. Then
\[
\mathrm{Tr}(v w^T) = v^T w. \tag{2.16}
\]

Note that v w^T ∈ M_n(F) and v^T w ∈ F.

Proposition 2.2.38. Let A ∈ F^{k×n} and let B ∈ F^{n×k}. Then
\[
\mathrm{Tr}(AB) = \mathrm{Tr}(BA). \tag{2.17}
\]
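A quick numerical sanity check of Proposition 2.2.38 in Python (illustrative only; the matrices are made up):

# Check Tr(AB) = Tr(BA) for a pair of rectangular matrices.

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 0],
     [3, -1, 4]]          # 2x3
B = [[2, 1],
     [0, -2],
     [5, 3]]              # 3x2
print(trace(matmul(A, B)), trace(matmul(B, A)))   # equal, even though AB is 2x2 and BA is 3x3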

Exercise 2.2.39. Show that the trace of a product is invariant under a cyclic permutation of the terms, i. e., if A1,...,Ak are matrices such that the product A1 ··· Ak is defined and is a square matrix, then Tr(A1 ··· Ak) = Tr(AkA1 ··· Ak−1) . (2.18) Exercise 2.2.40. Show that the trace of a product is not invariant under all permutations of the terms. In particular, find 2 × 2 matrices A, B, and C such that

Tr(ABC) 6= Tr(BAC) .

Exercise 2.2.41 (Trace cancellation). Let B,C ∈ Mn(F). Show that if Tr(AB) = Tr(AC) for all A ∈ Mn(F), then B = C.

(R Contents)

2.3 Arithmetic of diagonal and triangular matrices

In this section we turn our attention to special properties of diagonal and triangular matrices.

Proposition 2.3.1. Let A = diag(α1, . . . , αn) and B = diag(β1, . . . , βn) be n×n diagonal matrices and let λ ∈ F. Then

A + B = diag(α1 + β1, . . . , αn + βn) (2.19)

λA = diag(λα1, . . . , λαn) (2.20)

AB = diag(α1β1, . . . , αnβn) . (2.21)

Proposition 2.3.2. Let A = diag(α_1, ..., α_n) be a diagonal matrix. Then A^k = diag(α_1^k, ..., α_n^k) for all k.

Definition 2.3.3 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial (R Def. 8.3.1) defined by f = α_0 + α_1 t + ··· + α_d t^d. Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f(ζ) ∈ F, we may also “plug in” the matrix A ∈ M_n(F) to obtain f(A) ∈ M_n(F). The only thing we have to be careful about is what we do with the scalar term α_0; we replace it with α_0 times the identity matrix, so

\[
f(A) := \alpha_0 I + \alpha_1 A + \cdots + \alpha_d A^d. \tag{2.22}
\]
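The following Python sketch (illustrative only) evaluates f(A) for a polynomial given by its coefficient list, exactly as in (2.22), with the constant term replaced by a multiple of the identity.

# Substitute a square matrix A into f = a0 + a1*t + ... + ad*t^d (Definition 2.3.3).

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B))) for k in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scal(c, A):
    return [[c * a for a in row] for row in A]

def poly_of_matrix(coeffs, A):
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]   # A^0 = I
    for c in coeffs:                       # coeffs = [a0, a1, ..., ad]
        result = mat_add(result, scal(c, power))
        power = matmul(power, A)
    return result

A = [[2, 1], [0, 3]]
print(poly_of_matrix([1, -5, 6], A))   # f(t) = 6t^2 - 5t + 1 evaluated at A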

Proposition 2.3.4. Let f ∈ F[t] be a polynomial and let A = diag(α1, . . . , αn) be a diagonal matrix. Then f(A) = diag(f(α1), . . . , f(αn)) . (2.23)

In our discussion of the arithmetic of triangular matrices, we focus on the diagonal entries.

Notation 2.3.5. For the remainder of this section, the symbol ∗ in a matrix will represent an arbitrary value with which we will not concern ourselves. We write
\[
\begin{pmatrix} \alpha_1 & & & \ast \\ & \alpha_2 & & \\ & & \ddots & \\ 0 & & & \alpha_n \end{pmatrix}
\quad\text{for}\quad
\begin{pmatrix} \alpha_1 & \ast & \cdots & \ast \\ & \alpha_2 & \cdots & \ast \\ & & \ddots & \vdots \\ 0 & & & \alpha_n \end{pmatrix}.
\]

Proposition 2.3.6. Let
\[
A = \begin{pmatrix} \alpha_1 & & \ast \\ & \ddots & \\ 0 & & \alpha_n \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} \beta_1 & & \ast \\ & \ddots & \\ 0 & & \beta_n \end{pmatrix}
\]
be upper triangular matrices and let λ ∈ F. Then
\[
A + B = \begin{pmatrix} \alpha_1+\beta_1 & & \ast \\ & \ddots & \\ 0 & & \alpha_n+\beta_n \end{pmatrix} \tag{2.24}
\]
\[
\lambda A = \begin{pmatrix} \lambda\alpha_1 & & \ast \\ & \ddots & \\ 0 & & \lambda\alpha_n \end{pmatrix} \tag{2.25}
\]
\[
AB = \begin{pmatrix} \alpha_1\beta_1 & & \ast \\ & \ddots & \\ 0 & & \alpha_n\beta_n \end{pmatrix}. \tag{2.26}
\]

Proposition 2.3.7. Let A be as in Prop. 2.3.6. Then

\[
A^k = \begin{pmatrix} \alpha_1^k & & \ast \\ & \ddots & \\ 0 & & \alpha_n^k \end{pmatrix} \tag{2.27}
\]
for all k.

Proposition 2.3.8. Let f ∈ F[t] be a polynomial and let A be as in Prop. 2.3.6. Then
\[
f(A) = \begin{pmatrix} f(\alpha_1) & & \ast \\ & \ddots & \\ 0 & & f(\alpha_n) \end{pmatrix}. \tag{2.28}
\]

(R Contents)

2.4 Permutation Matrices

Definition 2.4.1 (Rook arrangement). A rook arrangement is an arrangement of n rooks on an n×n chessboard such that no pair of rooks attack each other. In other words, there is exactly one rook in each row and column.

FIGURE HERE Fact 2.4.2. The number of rook arrangements on an n × n chessboard is n!. Definition 2.4.3 (). A permutation matrix is a square matrix with the following properties. (a) Every nonzero entry is equal to 1 .

(b) Each row and column has exactly one nonzero entry. 34 CHAPTER 2. (F) MATRICES

Observe that rook arrangements correspond to permutation matrices where each rook is placed on a 1. Permutation matrices will be revisited in Chapter 6 where we discuss the determinant. Recall that we denote by [n] the set of integers {1, . . . , n}. Definition 2.4.4 (Permutation). A permutation is a bijection σ :[n] → [n]. We write σ : i 7→ iσ. Example 2.4.5. A permutation can be represented by a 2 × n table like this.

i     1 2 3 4 5 6
i^σ   3 6 4 1 5 2

This permutation takes 1 to 3, 2 to 6, 3 to 4, etc. Note that any table obtained by rearranging columns represents the same permutation. For example, we may also represent the permutation σ by the table below.

i     4 6 1 5 2 3
i^σ   1 2 3 5 6 4

Moreover, the permutation can be represented by a diagram, where the arrow i → iσ means that i maps to iσ.

FIGURE HERE Definition 2.4.6 (Composition of permutations). Let σ, τ :[n] → [n] be permutations. The compo- sition of τ with σ, denoted στ, is the permutation which maps i to iστ defined by

iστ := (iσ)τ . (2.29)

This may be represented as i −→σ iσ −→τ iστ . Example 2.4.7. Let σ be the permutation of Example 2.4.5 and let τ be the permutation given in the table i 1 2 3 4 5 6 iτ 4 1 6 5 2 3

Then we can find the table representing the permutation στ by rearranging the table for τ so that its first row is in the same order as the second row of the table for σ, i. e., 2.4. PERMUTATION MATRICES 35

i 3 6 4 1 5 2 iτ 6 3 5 4 2 1

and then combining the two tables:

i      1 2 3 4 5 6
i^σ    3 6 4 1 5 2
i^{στ} 6 3 5 4 2 1

That is, the table corresponding to the permutation στ is

i      1 2 3 4 5 6
i^{στ} 6 3 5 4 2 1
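The composition rule is easy to check by machine; this Python sketch (illustrative only) encodes σ and τ from Examples 2.4.5 and 2.4.7 as lists and reproduces the table for στ.

# Compose the permutations of Examples 2.4.5 and 2.4.7.
# A permutation of [n] is stored as a list p with p[i-1] = image of i.

sigma = [3, 6, 4, 1, 5, 2]   # i^sigma
tau   = [4, 1, 6, 5, 2, 3]   # i^tau

def compose(s, t):
    # sigma tau maps i to (i^sigma)^tau, matching (2.29)
    return [t[s[i] - 1] for i in range(len(s))]

print(compose(sigma, tau))   # [6, 3, 5, 4, 2, 1], the table for sigma tau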

Definition 2.4.8 (Identity permutation). The identity permutation id : [n] → [n] is the permutation defined by iid = i for all i ∈ [n]. Definition 2.4.9 (Inverse of a permutation). let σ :[n] → [n] be a permutation. Then the inverse of σ is the permutation σ−1 such that σσ−1 = id. So σ−1 takes iσ to i.

Example 2.4.10. To find the table corresponding to the inverse of a permutation σ, we first switch the rows of the table corresponding to σ and then rearrange the columns so that they are in natural order. For example, when we switch the rows of the table in Example 2.4.5, we have

i         3 6 4 1 5 2
i^{σ^{-1}} 1 2 3 4 5 6

Rearranging the columns so that they are in natural order gives us the table

i         1 2 3 4 5 6
i^{σ^{-1}} 4 6 1 3 5 2

Definition 2.4.11. Let σ :[n] → [n] be a permutation. The permutation matrix corresponding to σ, denoted Pσ, is the n × n matrix whose (i, j) entry is 1 if σ(i) = j and 0 otherwise.

Exercise 2.4.12. Let σ, τ :[n] → [n] be permutations, let A ∈ Fk×n, and let B ∈ Fn×`.

(a) What is APσ? 36 CHAPTER 2. (F) MATRICES

(b) What is PσB?

(c) What is Pσ−1 ?

(d) Show that PσPτ = Pτσ (note the conflict in conventions for composition of permutations and matrix multiplication).

(R Contents)

2.5 Additional exercises

Exercise 2.5.1. Let A be a matrix. Show, without calculation, that AT A is symmetric.

Definition 2.5.2 (Commutator). Let A ∈ F^{k×n} and B ∈ F^{n×k}. Then the commutator of A and B is AB − BA.

Definition 2.5.3. Two matrices A, B ∈ Mn(F) commute if AB = BA. Exercise 2.5.4. (a) Find an example of two 2 × 2 matrices that do not commute.

(b) (Project) Interpret and prove the following statement:

Almost all pairs of 2 × 2 integral matrices (matrices in M2(Z)) do not commute.

Exercise 2.5.5. Let D ∈ Mn(F) be a diagonal matrix such that all diagonal entries are distinct. Show that if A ∈ Mn(F) commutes with D then A is a diagonal matrix. Exercise 2.5.6. Show that only the scalar matrices (R Def. 2.2.13) commute with all matrices in Mn(F). (A scalar matrix is a matrix of the form λI.) Exercise 2.5.7.

(a) Show that the commutator of two matrices over R is never the identity matrix.

(b) Find A, B ∈ Mp(Fp) such that their commutator is the identity. 2.5. ADDITIONAL EXERCISES 37

Exercise 2.5.8 (Submatrix sum). Let I_1 ⊆ [k] and I_2 ⊆ [n], and let B be the submatrix (R Def. 3.3.11) of A ∈ F^{k×n} with entries α_ij for i ∈ I_1, j ∈ I_2. Find vectors a and b such that a^T A b equals the sum of the entries of B.

Definition 2.5.9 (). The Vandermonde matrix generated by α1, . . . , αn is the n × n matrix  1 1 ··· 1   α α ··· α   1 2 n   α2 α2 ··· α2  V (α1, . . . , αn) =  1 2 n  . (2.30)  . . .. .   . . . .  n−1 n−1 n−1 α1 α2 ··· αn

Exercise 2.5.10. Let A be a Vandermonde matrix generated by distinct αi. Show that the rows of A are linearly independent. Do not use . Exercise 2.5.11. Prove that polynomials of a matrix commute: let A be a square matrix and let f, g ∈ F[t]. Then f(A) and g(A) commute. In particular, A commutes with f(A).

Definition 2.5.12 (). The circulant matrix generated by the sequence (α0, α1, . . . αn−1) of scalars is the n × n matrix   α0 α1 ··· αn−1 α α ··· α   n−1 0 n−2 C(α0, α1, . . . αn−1) =  . . .. .  (2.31)  . . . .  α1 α2 ··· α0 Exercise 2.5.13. Prove that all circulant matrices commute. Prove this (a) directly, (b) in a more elegant way, by showing that all circulant matrices are polynomials of a particular circulant matrix.

Definition 2.5.14 (Jordan block). For λ ∈ C and n ≥ 1, the Jordan block J(n, λ) is the matrix

\[
J(n, \lambda) := \lambda I + N_n \tag{2.32}
\]
where N_n is the matrix defined in Notation 2.2.20.

Exercise 2.5.15. Let f ∈ C[t]. Prove that f(J(n, λ)) is the matrix

 0 1 (2) 1 (n−1)  f(λ) f (λ) 2! f (λ) ··· (n−1)! f (λ)  0 1 (n−2)   f(λ) f (λ) ··· f (λ)  (n−2)!   .. .. .   . . .  . (2.33)    f(λ) f 0(λ)   0  f(λ)

Exercise 2.5.16. The converse of the second statement in Ex. 2.5.11 would be: (∗) The only matrices that commute with A are the polynomials of A. (a) Find a matrix A for which (∗) is false. (b) For which diagonal matrices is (∗) true? (c) Prove: (∗) is true for Jordan blocks.

(d) Characterize the matrices over C for which (∗) is true, in terms of their Jordan blocks (R ??).

Project 2.5.17. For A ∈ M_n(R), let f(A, k) denote the largest absolute value of all entries of A^k, and define
\[
M_n^{(\lambda)}(R) := \{ A \in M_n(R) \mid f(A, 1) \le \lambda \} \tag{2.34}
\]
(the matrices where all entries have absolute value ≤ λ). Define
\[
f_1(n, k) = \max_{A \in M_n^{(1)}(R)} f(A, k),
\]
\[
f_2(n, k) = \max_{\substack{A \in M_n^{(1)}(R) \\ A \text{ nilpotent}}} f(A, k),
\]
\[
f_3(n, k) = \max_{\substack{A \in M_n^{(1)}(R) \\ A \text{ strictly upper triangular}}} f(A, k).
\]
Find the rate of growth of these functions in terms of k and n.

(R Contents)

Chapter 3

(F) Matrix Rank

(R Contents)

3.1 Column and row rank

In Section 1.3, we defined the rank of a list of column or row vectors. We now view this concept in terms of the columns and rows of a matrix. Definition 3.1.1 (Column- and row-rank). The column-rank of a matrix is the rank of the list of its column vectors. The row-rank of a matrix is the rank of the list of its row vectors. A matrix A ∈ Fk×n is of full row-rank if its row-rank is equal to k, and of full column-rank if its column-rank is equal to n. We denote the column-rank of A by rkcol A and the row-rank of A by rkrow A. Definition 3.1.2 (Column and row space). The column space of a matrix A, denoted col-space A, is the span of its columns. The row space of a matrix A, denoted row-space A, is the span of its rows. In particular, for a k × n matrix A, we have col-space A ≤ Fk and row-space A ≤ Fn. Proposition 3.1.3. The column-rank of a matrix is equal to the dimension of its column space, and the row-rank of a matrix is equal to the dimension of its row space.

Definition 3.1.4 (Full column- and row-rank). A k × n matrix A is of full column-rank if its column-rank is n. A is of full row-rank if its row-rank is k.


Exercise 3.1.5. Let A and B be k × n matrices. Then

| rkcol A − rkcol B| ≤ rkcol(A + B) ≤ rkcol A + rkcol B. (3.1)

Exercise 3.1.6. Let A be a k × n matrix and let B be an n × ` matrix. Then col-space(AB) ≤ col-space A.

Exercise 3.1.7. Let A be a k × n matrix and let B be an n × ` matrix. Then rkcol(AB) ≤ rkcol A.

(R Contents)

3.2 Elementary operations and Gaussian elimination

There are three kinds of “elementary operations” on the columns of a matrix and three corresponding elementary operations on the rows of a matrix. The significance of these operations is that while they can simplify the matrix by turning most of its elements into zero, they do not change the rank of the matrix, and they change the value of the determinant in an easy-to-follow manner. We call the three column operations column shearing, column scaling, and column swapping, and use the corresponding terms for rows as well. The most powerful of these is “column shearing.” While the literature sometimes calls this operation “column addition,” I find this term misleading: we don’t “add columns” – that is a term more adequately applied in the context of the multilinearity of the determinant. I propose the term “column shearing” from physical analogy which I will explain below. Similarly misleading is the term “column multiplication” often used to describe what I propose to call “column scaling.” The term “column multiplication” seems to suggest that we multiply columns, as we do in the definition of an . Our “column swapping” is often described as “column switching.” Although “switching” seems to be an adequate term, I believe “swapping” better describes what is actually happening. Let A = [a_1 | · · · | a_n] be a matrix; the a_i are the columns of this matrix. The definitions below refer to this matrix.

Definition 3.2.1 (Column shearing). The column shearing operation, denoted by shearc(i, j, λ), takes three parameters: column indices i 6= j and a scalar λ. The operation replaces ai by ai + λaj.

Note that the only column that changes as a result of the shearc(i, j, λ) operation is column i. Remember the condition i 6= j. On the other hand, λ can be any scalar, including 0. 3.2. ELEMENTARY OPERATIONS AND GAUSSIAN ELIMINATION 41

Definition 3.2.2 (Column scaling). The column scaling operation, denoted by scalec(i, λ), takes two parameters: a column index i and a scalar λ 6= 0. The operation replaces ai by λai.

Note that the only column that changes as a result of the scalec(i, λ) operation is column i. Remember the condition λ 6= 0.

Definition 3.2.3 (Column swapping). The column swapping operation, denoted by swap_c(i, j), takes two parameters: column indices i ≠ j. The operation swaps the columns a_i and a_j. So what happens can be described with reference to a temporary vector z as the following sequence of operations: z ← a_i, a_i ← a_j, a_j ← z. Note that the only columns that change as a result of the swap_c(i, j) operation are columns i and j. Remember the condition i ≠ j.

Definition 3.2.4 (Elementary row operations). The elementary row operations shear_r(i, j, λ), scale_r(i, λ), and swap_r(i, j) are defined analogously.

Exercise 3.2.5. Each of the elementary operations is invertible and the inverse of each elementary operation is an elementary operation of the same type. Specifically, the inverse of shear_c(i, j, λ) is shear_c(i, j, −λ); the inverse of scale_c(i, λ) is scale_c(i, λ^{−1}), and swap_c(i, j) is its own inverse. The analogous statements hold for the elementary row operations.
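A direct way to internalize these operations is to implement them. The following Python sketch (illustrative only, with 1-based column indices as in the text) performs the three column operations on a matrix stored as a list of rows and checks the inverse relations of Exercise 3.2.5.

# Elementary column operations shear_c(i, j, lam), scale_c(i, lam), swap_c(i, j).

def shear_c(A, i, j, lam):          # replace column i by (column i) + lam * (column j), i != j
    for row in A:
        row[i - 1] += lam * row[j - 1]

def scale_c(A, i, lam):             # replace column i by lam * (column i), lam != 0
    for row in A:
        row[i - 1] *= lam

def swap_c(A, i, j):                # swap columns i and j, i != j
    for row in A:
        row[i - 1], row[j - 1] = row[j - 1], row[i - 1]

A = [[1.0, 2.0, 0.0],
     [3.0, -1.0, 4.0]]
original = [row[:] for row in A]

shear_c(A, 1, 3, 2.5); shear_c(A, 1, 3, -2.5)   # inverse pair
scale_c(A, 2, 4.0);    scale_c(A, 2, 0.25)      # inverse pair
swap_c(A, 1, 2);       swap_c(A, 1, 2)          # its own inverse
print(A == original)                            # True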

Definition 3.2.6 (). An elementary matrix is a square matrix that differs from the the identity matrix in an elementary operation. Accordingly, we shall have three kinds of elementary matrices: shearing, scaling, and swapping matrices.

Definition 3.2.7 (Shearing matrix). Let i 6= j, 1 ≤ i, j ≤ n. We denote by Eij the n × n matrix which has 1 in the (i, j) position and 0 in every other position. A shearing matrix B is an n × n matrix of the form B = I + λEij for λ ∈ F.

Exercise 3.2.8. Let A ∈ F^{k×n} be a k × n matrix, λ ∈ F, and B the n × n shearing matrix B = I + λE_ij. Show that AB is the matrix obtained from A by performing the operation shear_c(j, i, λ). Infer (do not repeat the same argument) that BA is obtained from A by performing shear_r(i, j, λ).

Remark 3.2.9. In physics, “shearing forces” are “unaligned forces pushing one part of a body in one specific direction, and another part of the body in the opposite direction.” (Wikipedia) A pair of scissors exerts shearing forces on the material it tries to cut. A nice example is a deck of cards on a horizontal table being pushed horizontally on the top while being held in place at the bottom, causing the cards to slide. Thus the deck, which initially has a rectangle as its vertical cross-section, ends up with a parallelogram of the same height as its vertical cross-section. The distance of each card from the base remains constant, and the distance each card slides is proportional to its distance from the base. This is precisely what an application of a shearing matrix to the deck achieves in 2D or 3D.

Definition 3.2.10 (Scaling matrix). Let 1 ≤ i ≤ n and λ ∈ F, λ ≠ 0. We denote by D(i, λ) the n × n diagonal matrix which has λ in the (i, i) position and 1 in all the other diagonal positions.

Exercise 3.2.11. Let A ∈ Fk×n be a k × n matrix, λ ∈ F, λ 6= 0, and 1 ≤ i ≤ n. Show that AD(i, λ) is the matrix obtained from A by performing the column operation scalec(i, λ). Infer (do not repeat the same argument) that D(i, λ)A is obtained from A by performing the row operation scaler(i, λ).

Definition 3.2.12 (Swapping matrix). Let 1 ≤ i, j ≤ n where i 6= j. We denote by Ti,j the n × n matrix obtained from the identity matrix by swapping the column i and j. Note that by swapping rows i and j we obtain the same matrix.

k×n Exercise 3.2.13. Let A ∈ F be a k × n matrix, 1 ≤ i, j ≤ n, i 6= j. Show that A · Ti,j is the matrix obtained from A by performing the column operation swapc(i, j). Infer (do not repeat the same argument) that Ti,jA is obtained from A by performing the row operation swapr(i, j). Exercise 3.2.14. Let A be a matrix with linearly independent rows. Then by performing a series of row shearing, row swapping, and column swapping operations we can bring the matrix to the form [D | B] where D is a diagonal matrix no zeros in the diagonal. Exercise 3.2.15. Let A be a k × n matrix. Then by performing a series of row shearing, row swapping, and column swapping operations we can bring the A to the 2 × 2 block-matrix form

DB 0 0 where the top left block, D, is an r × r diagonal matrix with no zeros in the diagonal (for some value r which will turn out to be the rank of A) and the k − r bottom rows are zero. Exercise 3.2.16. Any matrix can be transformed into an upper triangular matrix by performing a series of row shearing, row swapping, and column swapping operations. 3.3. THE SECOND MIRACLE OF LINEAR ALGEBRA 43

The process of transforming a matrix into a matrix of the form described in Ex. 3.2.15 by performing a series of row shearing, row swapping, and column swapping operations is called Gaussian elimination.

Exercise 3.2.17. Let A be a k × n matrix. Then by performing a series of row shearing, row swapping, column shearing, and column swapping operations we can bring A to the 2 × 2 block-matrix form

    | D  0 |
    | 0  0 |

where the top left block, D, is an r × r diagonal matrix with no zeros in the diagonal (for some value r which will turn out to be the rank of A); the rest of the matrix is zero.

The process of transforming a matrix into a matrix of the form described in Ex. 3.2.17 by performing a series of row shearing, row swapping, column shearing, and column swapping operations is called Gauss-Jordan elimination.
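(A computational sketch, ours and not part of the text, assuming NumPy and 0-based indices: it carries out the elimination of Ex. 3.2.15 using only row shearing, row swapping, and column swapping, and reads off r as the number of nonzero pivots. The value agrees with a library rank routine.)

    import numpy as np

    def eliminate(A, tol=1e-12):
        B = A.astype(float).copy()
        k, n = B.shape
        r = 0
        while r < min(k, n):
            sub = np.abs(B[r:, r:])               # remaining lower-right block
            if sub.max() <= tol:
                break
            i, j = np.unravel_index(sub.argmax(), sub.shape)
            B[[r, r + i], :] = B[[r + i, r], :]   # row swap
            B[:, [r, r + j]] = B[:, [r + j, r]]   # column swap
            for s in range(k):
                if s != r:
                    B[s, :] -= (B[s, r] / B[r, r]) * B[r, :]   # row shearing
            r += 1
        return B, r

    A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])
    B, r = eliminate(A)
    print(r, np.linalg.matrix_rank(A))            # the two rank computations agree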

(R Contents)

3.3 Invariance of column and row rank, the Second Miracle of Linear Algebra

The main result of this section is the following theorem.

Theorem 3.3.1 (Second Miracle of Linear Algebra). The row-rank of a matrix is equal to its column-rank.

In order to prove this theorem, we will first need to prove the following two lemmas.

Lemma 3.3.2. Elementary column operations do not change the column-rank of a matrix. ♦ The proof of the next lemma is somewhat more involved; we will break its proof into exercises.

Lemma 3.3.3. Elementary row operations do not change the column-rank of a matrix.

Exercise 3.3.4. Use these two lemmas together with Ex. 3.2.17 to prove Theorem 3.3.1.

In order to complete our proof of the Second Miracle, we now only need to prove Lemma 3.3.3. The following exercise demonstrates why the proof of Lemma 3.3.3 is not as straightforward as the proof of Lemma 3.3.2.

Exercise 3.3.5. An elementary row operation can change the column space of a matrix.

Proposition 3.3.6. Let A = [a1 | · · · | an] ∈ F^{k×n} be a matrix with columns a1, . . . , an. Let

∑_{i=1}^{n} αi ai = 0

be a linear relation among the columns. Let A′ = [a′1 | · · · | a′n] be the result of applying an elementary row operation to A. Then the columns of A′ obey the same linear relation, that is,

∑_{i=1}^{n} αi a′i = 0 .

Corollary 3.3.7. If the columns a_{i1}, . . . , a_{iℓ} are linearly independent, then this remains true after an elementary row operation.

Exercise 3.3.8. Complete the proof of Lemma 3.3.3.

∗ ∗ ∗

Because the Second Miracle of Linear Algebra establishes that the row-rank and column-rank of a matrix A are equal, it is no longer necessary to differentiate between them; this quantity is simply referred to as the rank of A, denoted rk A.
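(A quick numerical illustration, ours and not part of the text, assuming NumPy: on random examples the ranks of A and of its transpose agree, so “row rank” and “column rank” name the same number. This is a sanity check, not a proof.)

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(5):
        A = rng.integers(-3, 4, size=(4, 6)) @ rng.integers(-3, 4, size=(6, 7))
        assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)
    print("column rank equals row rank on these examples")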

Definition 3.3.9 (Full rank). Let A ∈ Mn(F) be a square matrix. Then A is of full rank if rk A = n. Observe that full rank is the same as full column- or row-rank (R Def. 3.1.4) only in the case of square matrices.

Definition 3.3.10 (Nonsingular matrix). An n × n matrix A is nonsingular if it is of full rank, i. e., rk A = n.

Definition 3.3.11 (Submatrix). Let A ∈ F^{k×n} be a matrix. Then the matrix B is a submatrix of A if it can be obtained by deleting rows and columns from A.

In other words, a submatrix of a matrix A is a matrix obtained by taking the intersection of a set of the rows of A with a set of the columns of A.

Theorem 3.3.12 (Rank vs. nonsingular submatrices). Let A ∈ Fk×n be a matrix. Then rk A is the largest value of r such that A has a nonsingular r × r submatrix. ♦ Exercise 3.3.13. Show that for all k, the intersection of k linearly independent rows with k linearly independent columns can be singular. In fact, it can be the zero matrix.

Exercise 3.3.14. Let A be a matrix of rank r. Show that the intersection of any r linearly independent rows with any r linearly independent columns is a nonsingular r × r submatrix of A. (Note: this exercise is more difficult than Theorem 3.3.12 and is not needed for the proof of Theorem 3.3.12).

Exercise 3.3.15. Let A be a matrix. Show that if the intersection of k linearly independent columns with ` linearly independent rows of A has rank s, then rk(A) ≥ k + ` − s.

(R Contents)

3.4 Matrix rank and invertibility

Definition 3.4.1 (Left and right inverse). Let A ∈ F^{k×n}. The matrix B ∈ F^{n×k} is a left inverse of A if BA = In. Likewise, the matrix C ∈ F^{n×k} is a right inverse of A if AC = Ik.

Proposition 3.4.2. Let A ∈ Fk×n. (a) Show that A has a right inverse if and only if A has full row-rank, i. e., rk A = k.

(b) Show that A has a left inverse if and only if A has full column-rank, i. e., rk A = n.

Note in particular that if A has a right inverse, then k ≤ n, and if A has a left inverse, then k ≥ n.

Corollary 3.4.3. Let A be a nonsingular square matrix. Then A has both a right and a left inverse.

Exercise 3.4.4. For all k < n, find a k × n matrix that has infinitely many right inverses.
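(A sketch, ours and not part of the text, assuming NumPy: for a matrix A of full row rank over R, the matrix C = A^T (A A^T)^{−1} is one concrete right inverse, and adding any matrix whose columns lie in null(A) gives further right inverses; this also illustrates Exercises 3.4.4 and 3.4.13.)

    import numpy as np

    A = np.array([[1., 0., 2.],
                  [0., 1., 1.]])            # k = 2, n = 3, full row rank
    C = A.T @ np.linalg.inv(A @ A.T)         # one right inverse: A @ C = I_2
    print(np.allclose(A @ C, np.eye(2)))     # True

    z = np.array([[2.], [1.], [-1.]])        # A @ z = 0, so z spans null(A)
    for t in (1.0, -3.0, 0.5):               # a one-parameter family of right inverses
        print(np.allclose(A @ (C + t * z @ np.array([[1., 1.]])), np.eye(2)))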

Definition 3.4.5 (Two-sided inverse). Let A ∈ Mn(F). Then the matrix B ∈ Mn(F) is a (two-sided) inverse of A if AB = BA = In. The inverse of A is denoted A^{−1}. If A has an inverse, then A is said to be invertible.

Proposition 3.4.6. Let A be a matrix. If A has a left inverse as well as a right inverse, then A has a unique two-sided inverse and it has no left or right inverse other than the two-sided inverse. The proof of this lengthy statement is just one line, based solely on the associativity of matrix multiplication. The essence of the proof is in the next lemma.

Lemma 3.4.7. Let A ∈ Fk×n be a matrix with a right inverse B and a left inverse C. Then B = C is a two-sided inverse of A and k = n. ♦ Corollary 3.4.8. Under the conditions of Lemma 3.4.7, k = n and B = C is a two-sided inverse. Moreover, if C1 is also a left inverse, then C1 = C; analogously, if B1 is also a right inverse, then B1 = B. Corollary 3.4.9. Let A be a matrix with a left inverse. Then A has at most one right inverse. Corollary 3.4.10. A matrix A has an inverse if and only if A is a nonsingular square matrix. Corollary 3.4.11. If A has both a right and a left inverse then k = n.

Theorem 3.4.12. For A ∈ Mn(F), the following are equivalent. (a) rk A = n

(b) A has a right inverse

(c) A has a left inverse

(d) A has an inverse

♦ Exercise 3.4.13. Assume F is infinite and let A ∈ Fk×n where n > k. If A has a right inverse, then A has infinitely many right inverses.

(R Contents)

3.5 Codimension (optional)

The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general discussion of this material.

Definition 3.5.1 (Codimension). Let W ≤ F^n. The codimension of W is the minimum number of vectors that together with W generate F^n and is denoted by codim W. (Note that this is the dimension of the quotient space F^n/W ; see Def. ??)

Proposition 3.5.2. If W ≤ Fn then codim W = n − dim W .

Proposition 3.5.3 (Dual modular equation). Let U, W ≤ Fn. Then

codim(U + W ) + codim(U ∩ W ) = codim U + codim W. (3.2)

Corollary 3.5.4. Let U, W ≤ Fn. Then

codim(U ∩ W ) ≤ codim U + codim W. (3.3)

Corollary 3.5.5. Let U, W ≤ Fn with dim U = r and codim W = t. Then

dim(U ∩ W ) ≥ max{r − t, 0} . (3.4)

In Section 1.3, we defined the rank of a set of column vectors (R Def. 1.3.28) and then showed this to be equal to the dimension of their span (R Cor. 1.3.46). We now define the corank of a set of vectors.

Definition 3.5.6 (Corank). The corank of a set S ⊆ F^n, denoted corank S, is defined by

corank S := codim(span S) . (3.5)

Definition 3.5.7 (Null space). The null space or kernel of a matrix A ∈ F^{k×n}, denoted null(A), is the set

null(A) = {v ∈ F^n | Av = 0} .        (3.6)

Exercise 3.5.8. Let A ∈ Fk×n. Show that rk A = codim(null(A)) . (3.7)

Proposition 3.5.9. Let U ≤ F^n and let W ≤ F^k such that dim W = codim U = ℓ. Then there is a matrix A ∈ F^{k×n} such that null(A) = U and col-space A = W.

Definition 3.5.10 (Corank of a matrix). Let A ∈ F^{k×n}. We define the corank of A as the corank of its column space, i. e.,

corank A := k − rk A.        (3.8)

Exercise 3.5.11. When is corank A = corank A^T ?

Exercise 3.5.12. Let A ∈ Fk×n and let B ∈ Fn×`. Show that corank(AB) ≤ corank A + corank B. (3.9)

(R Contents)

3.6 Additional exercises

Exercise 3.6.1. Let A = [v1 | · · · | vn] be a matrix. True or false: if the columns v1, . . . , vℓ are linearly independent, this remains true after performing elementary column operations on A.

Proposition 3.6.2. Let A be a matrix with at most one nonzero entry in each row and column. Then rk A is the number of nonzero entries in A.

Numerical exercise 3.6.3. Perform a series of elementary row operations to determine the rank of the matrix

    |  1  3  2 |
    | −5 −2  3 |
    | −3  4  7 |

Self-check: use a different sequence of elementary operations and verify that you obtain the same answer.

Exercise 3.6.4. Determine the ranks of the n × n matrices

(a) A = (αij) where αij = i + j

(b) B = (βij) where βij = ij

(c) C = (γij) where γij = i² + j²

Proposition 3.6.5. Let A ∈ Fk×` and B ∈ F`×m. Then rk(AB) ≤ min{rk(A), rk(B)} .

Proposition 3.6.6. Let A, B ∈ F^{k×ℓ}. Then rk(A + B) ≤ rk(A) + rk(B).

Exercise 3.6.7. Find an n × n matrix A of rank n − 1 such that A^n = 0.

Exercise 3.6.8. Let A ∈ Fk×n. Then rk A is the smallest r such that there exist matrices B ∈ Fk×r and C ∈ Fr×n with A = BC. Exercise 3.6.9. Show that the matrix A is of rank r if and only if A can be expressed as the sum of r matrices of rank 1.

Exercise 3.6.10 (Characterization of matrices of rank 1). Let A ∈ F^{k×n}. Show that rk A = 1 if and only if there exist column vectors a ∈ F^k and b ∈ F^n such that A = ab^T.

Let A ∈ F^{k×n} and let F be a subfield of the field G. Then we can also view A as a matrix over G. However, this changes the notion of linear combinations and therefore in principle it could affect the rank. We shall see that this is not the case, but in order to be able to reason about this question, we temporarily use the notation rkF(A) and rkG(A) to denote the rank of A with respect to the corresponding fields. We will also write rkp to mean rk over Fp.

Lemma 3.6.11. Being nonsingular is insensitive to field extensions. ♦

Corollary 3.6.12. Let F be a subfield of G, and let A ∈ F^{k×n}. Then

rkF(A) = rkG(A) .

Exercise 3.6.13. Let A ∈ R^{k×n} be a matrix with integer coefficients. Then the entries of A can also be considered as residue classes modulo a prime p, that is, we can instead consider A as an element of Fp^{k×n}. Observe that Fp is not a subfield of R, because the operations of addition and multiplication are not the same.

(a) Show that rkp(A) ≤ rk(A). (b) For every p, find a (0, 1) matrix A (that is, a matrix whose entries are only 0 and 1) where this inequality is strict, i. e., rkp(A) < rk(A). Proposition 3.6.14.

(a) Let A be a matrix with real entries. Then rk(A^T A) = rk(A).

(b) This is false over C and over Fp.

Exercise 3.6.15.

(a) Show that rk(Jn − In) = n (for n ≥ 2).

(b) Show that rk₂(Jn − In) = n if n is even, and rk₂(Jn − In) = n − 1 if n is odd.
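(For readers who like to experiment: the following mod-p rank routine is ours and not part of the text; it is plain Python (3.8+ for the modular inverse) and performs Gaussian elimination over Fp. It can be used to check the claim of Exercise 3.6.15(b) on small values of n.)

    def rank_mod_p(rows, p):
        rows = [[x % p for x in row] for row in rows]
        rank, col, n = 0, 0, len(rows[0])
        while rank < len(rows) and col < n:
            pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
            if pivot is None:
                col += 1
                continue
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            inv = pow(rows[rank][col], -1, p)                  # modular inverse
            rows[rank] = [(x * inv) % p for x in rows[rank]]
            for r in range(len(rows)):
                if r != rank and rows[r][col]:
                    f = rows[r][col]
                    rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
            rank, col = rank + 1, col + 1
        return rank

    for n in range(2, 8):
        J_minus_I = [[0 if i == j else 1 for j in range(n)] for i in range(n)]
        print(n, rank_mod_p(J_minus_I, 2))     # n for even n, n - 1 for odd n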

♥ Exercise 3.6.16. Let A = (αij) be a matrix of rank r.

(a) Let A′ = (αij²) be the matrix obtained by squaring every element of A. Show that rk(A′) ≤ \binom{r+1}{2}.

(b) Now let A′ = (αij^d). Show that rk(A′) ≤ \binom{r+d−1}{d}.

(c) Let f be a polynomial of degree d, and let Af = (f(αij)). Prove that rk(Af) ≤ \binom{r+d}{d}.

(d) Show that each of these bounds is tight for all r, i. e., for every r and d there exists a matrix A of rank r such that rk(A′) = \binom{r+d−1}{d}, and for every r and f there exists a matrix B of rank r such that rk(Bf) = \binom{r+d}{d}.

(R Contents)

Chapter 4

(F) Theory of Systems of Linear Equations I: Qualitative Theory

(R Contents)

4.1 Homogeneous systems of linear equations

Matrices allow us to concisely express systems of linear equations. In particular, consider the general system of k linear equations in n unknowns,

α11x1 + α12x2 + ··· + α1nxn = β1

α21x1 + α22x2 + ··· + α2nxn = β2 . .

αk1x1 + αk2x2 + ··· + αknxn = βk

Here, the αij and βi are scalars, while the xj are unknowns. In Section 1.1, we represented this system as a linear combination of column vectors. Matrices allow us to write this system even more


concisely as Ax = b, where

    A = | α11 α12 ··· α1n |      x = | x1 |      b = | β1 |
        | α21 α22 ··· α2n |          | x2 |          | β2 |
        |  ⋮   ⋮   ⋱   ⋮  |          |  ⋮ |          |  ⋮ |
        | αk1 αk2 ··· αkn |          | xn |          | βk |        (4.1)

We know that the simplest linear equation is of the form ax = b (one equation in one unknown); remarkably, the notation does not become any more complex when we have a general system of linear equations. The first question we ask about any system of equations is its solvability.

Definition 4.1.1 (Solvable system of linear equations). Given a matrix A ∈ F^{k×n} and a vector b ∈ F^k, we say that the system Ax = b of linear equations is solvable if there exists a vector x ∈ F^n that satisfies Ax = b.

Definition 4.1.2 (Homogeneous system of linear equations). The system Ax = 0 is called a homogeneous system of linear equations. Every homogeneous system of linear equations is solvable.

Definition 4.1.3 (Trivial solution to a homogeneous system of linear equations). The trivial solution to the homogeneous system of linear equations Ax = 0 is the solution x = 0.

So when presented with a homogeneous system of linear equations, the question we ask is not, “Is this system solvable?” but rather, “Does this system have a nontrivial solution?”

Definition 4.1.4 (Solution space). Let A ∈ F^{k×n}. The set of solutions to the homogeneous system of linear equations Ax = 0 is the set U = {x ∈ F^n | Ax = 0} and is called the solution space of Ax = 0. Ex. 4.1.7 explains the terminology.

Definition 4.1.5 (Null space). The null space or kernel of a matrix A ∈ F^{k×n}, denoted null(A), is the set

null(A) = {v ∈ F^n | Av = 0} .        (4.2)

Definition 4.1.6. The nullity of a matrix A is the dimension of its null space.

For the following three exercises, let A ∈ Fk×n and let U be the solution space of the system Ax = 0.

Exercise 4.1.7. Prove that U ≤ Fn. Exercise 4.1.8. Show that null(A) = U.

Proposition 4.1.9. Let A ∈ F^{k×n} and consider the homogeneous system Ax = 0.

(a) Let rk A = r and let n = r + d. Then it is possible to relabel the unknowns x1, . . . , xn as x′1, . . . , x′n so that for j = d + 1, . . . , n, the unknown x′j can be represented as a linear combination of x′1, . . . , x′d, say

x′j = ∑_{i=1}^{d} λij x′i .

(b) The vectors ei + ∑_{j=d+1}^{n} λij ej (for i = 1, . . . , d) form a basis of U′, the solution space of the relabeled system of equations.

(c) dim U = dim U′

(d) dim U = n − rk A

An immediate consequence of (d) is the Rank-Nullity Theorem, which will be crucial in our study of linear maps in Chapter 16.¹

Corollary 4.1.10 (Rank-Nullity Theorem). Let A ∈ Fk×n be a matrix. Then rk(A) + nullity(A) = n . (4.3)
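(A numerical check, ours and not part of the text, assuming the SymPy library: the dimension of null(A) is the number of basis vectors returned by nullspace(), and it indeed equals n − rk(A) on a random integer matrix.)

    from sympy import randMatrix

    A = randMatrix(4, 7, min=-3, max=3)      # a random 4 x 7 integer matrix
    rank = A.rank()
    nullity = len(A.nullspace())             # dimension of the null space
    print(rank, nullity, rank + nullity == A.cols)   # e.g. 4 3 True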

An explanation of (d) is that dim U measures the number of coordinates of x that we can choose independently. This quantity is referred to by physicists as the “degrees of freedom” left in our choice of x after imposing the constraints Ax = 0. If there are no constraints, the degree of freedom of the system is equal to n. It is plausible that each constraint reduces the degree of freedom by 1, which would suggest dim U = n − k, but effectively there are only rk A constraints because every equation that is a linear combination of previous equations can be thrown out. This makes it plausible that the degree of freedom is n − rk A. This argument is not a proof, however.

¹In Chapter 16, we will formulate the Rank-Nullity Theorem in terms of linear maps rather than matrices, but the two formulations are equivalent.

Proposition 4.1.11. Let A ∈ Fk×n and consider the homogeneous system Ax = 0 of linear equations. Let U be the solution space of Ax = 0. Prove that the following are equivalent.

(a) Ax = 0 has no nontrivial solution

(b) The columns of A are linearly independent

(c) A has full column rank, i. e., rk A = n

(d) dim U = 0

(e) The rows of A span F^n

(f) A has a left inverse

Proposition 4.1.12. Let A be a square matrix. Then Ax = 0 has no nontrivial solution if and only if A is nonsingular.

(R Contents)

4.2 General systems of linear equations

Proposition 4.2.1. The system of linear equations Ax = b is solvable if and only if b ∈ col-space A.

Definition 4.2.2 (Augmented matrix). When speaking of the system Ax = b, we call A the matrix of the system and [A | b] (the matrix A with the column b appended) the augmented matrix of the system.

Proposition 4.2.3. The system Ax = b of linear equations is solvable if and only if the matrix of the system and the augmented matrix have the same rank, i. e., rk A = rk[A | b].
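(A sketch, ours and not part of the text, assuming NumPy: it tests the solvability criterion of Prop. 4.2.3 by comparing the rank of A with the rank of the augmented matrix for one solvable and one unsolvable right-hand side.)

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.]])
    b_good = np.array([1., 2.])      # equals A @ (1, 0, 0), so the system is solvable
    b_bad  = np.array([1., 3.])      # not in the column space of A

    for b in (b_good, b_bad):
        solvable = (np.linalg.matrix_rank(np.column_stack([A, b]))
                    == np.linalg.matrix_rank(A))
        print(solvable)              # True, then False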

Definition 4.2.4 (Translation). Let S ⊆ Fn, and let v ∈ Fn. The set

S + v = {s + v | s ∈ S}        (4.4)

is called the translate of S by v. When W ≤ F^n is a subspace, its translates W + v are called affine subspaces of F^n (R Def. 5.1.3).

Proposition 4.2.5. Let S = {x ∈ F^n | Ax = b} be the set of solutions to a system of linear equations. Then S is either empty or a translate of a subspace of F^n, namely, of the solution space of the homogeneous system Ax = 0.

Proposition 4.2.6 (Equivalent characterizations of nonsingular matrices). Let A ∈ Mn(F). The following are equivalent.

(a) A is nonsingular, i. e., rk(A) = n

(b) The rows of A are linearly independent

(c) The columns of A are linearly independent

(d) The rows of A span Fn

(e) The columns of A span Fn

(f) The rows of A form a basis of Fn

(g) The columns of A form a basis of F^n

(h) A has a left inverse

(i) A has a right inverse

(j) A has a two-sided inverse

(k) Ax = 0 has no nontrivial solution

(l) For all b ∈ Fn, Ax = b is solvable

(m) For all b ∈ F^n, Ax = b has a unique solution

Our discussion of the determinant in the next chapter will allow us to add a particularly important additional property:

(n) det A ≠ 0

Chapter 5

(F, R) Affine and Convex Combinations (optional)

The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general discussion of this material.

5.1 (F) Affine combinations

In Section 1.1, we defined linear combinations of column vectors (R Def. 1.1.13). We now consider affine combinations.

Definition 5.1.1 (Affine combination). An affine combination of the vectors v1, . . . , vk ∈ F^n is a linear combination ∑_{i=1}^{k} αi vi where ∑_{i=1}^{k} αi = 1.

Example 5.1.2.

2v1 − (1/2)v2 − (1/3)v4 − (1/6)v5

is an affine combination of the column vectors v1,..., v5.

Definition 5.1.3 (Affine-closed set). The set S ⊆ Fn is affine-closed if it is closed under affine combinations. Fact 5.1.4. The empty set is affine-closed (why?).


Definition 5.1.5 (Affine subspace). A nonempty, affine-closed subset U of Fn is called an affine n n subspace of F (notation: U ≤aff F ).

Proposition 5.1.6. The intersection of a (finite or infinite) family of affine-closed subsets of F^n is affine-closed. In other words, an intersection of affine subspaces is either empty or an affine subspace.

Throughout this book, the term “subspace” refers to subsets that are closed under linear com- binations (R Def. 1.2.1). Subspaces are also referred to as “linear subspaces.” This (redundant) longer term is especially useful in contexts where affine subspaces are discussed in order to distin- guish linear subspaces from affine subspaces. Definition 5.1.7 (Affine hull). The affine hull of a subset S ⊆ Fn, denoted aff S, is the smallest affine-closed set containing S, i. e.,

(a) aff S ⊇ S;

(b) aff S is affine-closed;

(c) for every affine set T ⊆ Fn, if T ⊇ S then T ⊇ aff S. Fact 5.1.8. aff ∅ = ∅.

Contrast this with the fact that span ∅ = {0}.

n Theorem 5.1.9. Let S ⊆ F . Then aff S exists and is unique. ♦

Theorem 5.1.10. For S ⊆ Fn, aff S is the set of all affine combinations of the finite subsets of S. ♦

Proposition 5.1.11. Let S ⊆ Fn. Then aff(aff S) = aff S.

Proposition 5.1.12. Let S ⊆ Fn. Then S is affine-closed if and only if S = aff S.

Proposition 5.1.13. Let S ⊆ Fn be affine-closed. Then S ≤ Fn if and only if 0 ∈ S. Proposition 5.1.14.

(a) Let W ≤ Fn. All translates W + v of W (R Def. 4.2.4) are affine subspaces.

n n (b) Every affine subspace S ≤aff F is the translate of a (unique) subspace of F . 58 CHAPTER 5. (F, R) AFFINE AND CONVEX COMBINATIONS (OPTIONAL)

Proposition 5.1.15. The intersection of a (finite or infinite) family of affine subspaces is either empty or equal to a translate of the intersection of their corresponding linear subspaces.

∗ ∗ ∗

Next we connect these concepts with the theory of systems of linear equations.

Exercise 5.1.16. Let A ∈ Fk×n and let b ∈ Fk. Then the set of solutions to the system Ax = b of linear equations is an affine-closed subset of Fn. The next exercise shows the converse.

Exercise 5.1.17. Every affine-closed subset of Fn is the set of solutions to the system Ax = b of linear equations for some A ∈ Fk×n and b ∈ Fk.

Proposition 5.1.18 (General vs. homogeneous systems of linear equations). Let A ∈ F^{k×n} and b ∈ F^k. Let S = {x ∈ F^n | Ax = b} be the set of solutions of the system Ax = b and let U = {x ∈ F^n | Ax = 0} be the set of solutions of the corresponding system of homogeneous linear equations. Then either S is empty or S is a translate of U.

∗ ∗ ∗

We now study geometric features of Fn, viewed as an affine space.

Proposition 5.1.19. The span of the set S ⊆ F^n is the affine hull of S ∪ {0}.

Proposition 5.1.20. Let S ⊆ F^n be nonempty. Then for any u ∈ S, we have

aff S = u + span(S − u) (5.1) where S − u is the translate of S by −u.

Definition 5.1.21 (Dimension of an affine subspace). The (affine) dimension of an affine subspace U ≤aff F^n, denoted dimaff U, is the dimension of its corresponding linear subspace (of which it is a translate). In order to assign a dimension to all affine-closed sets, we adopt the convention that dim ∅ = −1.

Exercise 5.1.22. Let U ≤ F^k. Then dimaff U = dim U.

Exercise 5.1.23. What are the 0-dimensional affine subspaces?

Definition 5.1.24 (Affine independence). The vectors v1, . . . , vk ∈ F^n are affine-independent if for every α1, . . . , αk ∈ F, the two conditions ∑_{i=1}^{k} αi vi = 0 and ∑_{i=1}^{k} αi = 0 imply α1 = ··· = αk = 0.

n Proposition 5.1.25. The vectors v1,..., vk ∈ F are affine-independent if and only if none of them belongs to the affine hull of the others.

Fact 5.1.26. Any single vector is affine-independent and affine-closed at the same time.

n Proposition 5.1.27 (Translation invariance). Let v1,..., vk ∈ F be affine-independent and let n w ∈ F be any vector. Then v1 + w,..., vk + w are also affine-independent.

n Proposition 5.1.28 (Affine vs. linear independence). Let v1,..., vk ∈ F .

(a) For k ≥ 0, the vectors v1,... vk are linearly independent if and only if the vectors 0, v1,..., vk are affine-independent.

(b) For k ≥ 1, the vectors v1,..., vk are affine-independent if and only if the vectors v2 − v1,..., vk − v1 are linearly independent.

n Definition 5.1.29 (Affine basis). An affine basis of an affine subspace W ≤aff F is an affine- independent set S such that aff S = W .

Proposition 5.1.30. Let W be an affine subspace of Fn. Every affine basis of W has 1 + dim W elements.

Corollary 5.1.31. If W1, . . . , Wk are affine subspaces of F^n, then

dimaff(aff(W1 ∪ · · · ∪ Wk)) ≤ (k − 1) + ∑_{i=1}^{k} dimaff Wi .        (5.2)

5.2 (F) Hyperplanes

Definition 5.2.1 (Linear hyperplane). A linear hyperplane of Fn is a subspace of Fn of codimension 1 (R Def. 3.5.1). Definition 5.2.2 (Codimension of an affine subspace). The (affine) codimension of an affine subspace n U ≤aff F , denoted codimaff U, is the codimension of its corresponding linear subspace (of which it is a translate). Definition 5.2.3 (Hyperplane). A hyperplane is an affine subspace of codimension 1.

Proposition 5.2.4. Let S ⊆ Fn be a hyperplane. Then there exist a nonzero vector a ∈ Fn and β ∈ F such that aT v = β if and only if v ∈ S. The vector a whose existence is guaranteed by the preceding proposition is called the normal vector of the hyperplane S.

Proposition 5.2.5. Let W ≤ Fn. Then (a) W is the intersection of linear hyperplanes;

(b) if dim W = k, then W is the intersection of n − k linear hyperplanes.

n Proposition 5.2.6. Let W ≤aff F be an affine subspace. Then (a) W is the intersection of hyperplanes;

(b) if dimaff W = k, then W is the intersection of n − k hyperplanes.

5.3 (R) Convex combinations

In the preceding section, we studied affine combinations over an arbitrary field F. We now restrict ourselves to the case where F = R.

Definition 5.3.1 (Convex combination). A convex combination is an affine combination with nonnegative coefficients. So the expression ∑_{i=1}^{k} αi vi is a convex combination if ∑_{i=1}^{k} αi = 1 and αi ≥ 0 for all i.

Example 5.3.2.

(1/2)v1 + (1/4)v2 + (1/6)v4 + (1/12)v5

is a convex combination of the vectors v1, . . . , v5. Note that the affine combination in Example 5.1.2 is not convex.

Fact 5.3.3. Every convex combination is an affine combination.

Definition 5.3.4 (). A convex set is a subset S ⊆ Rn that is closed under convex combi- nations.

Proposition 5.3.5. The intersection of a (finite or infinite) family of convex sets is convex.

Definition 5.3.6. The of a subset S ⊆ Rn, denoted conv S, is the smallest convex set containing S, i. e.,

(a) conv S ⊇ S;

(b) conv S is convex;

(c) for every convex set T ⊆ Rn, if T ⊇ S then T ⊇ conv S.

n Theorem 5.3.7. Let S ⊆ R . Then conv S exists and is unique. ♦

Theorem 5.3.8. For S ⊆ R^n, conv S is the set of all convex combinations of the finite subsets of S. ♦

Proposition 5.3.9. Let S ⊆ Rn. Then S is convex if and only if S = conv S. Fact 5.3.10. conv S ⊆ aff S.

Definition 5.3.11 (Straight-line segment). Let u, v ∈ Rn. The straight-line segment connecting u and v is the convex hull of {u, v}, i. e., the set

conv(u, v) = {λu + (1 − λ)v | 0 ≤ λ ≤ 1} .

Proposition 5.3.12. The set S ⊆ Rn is convex if and only if it contains the straight-line segment connecting u and v for every u, v ∈ S.

n Definition 5.3.13 (Dimension). The dimension of a convex set C ⊆ R , denoted dimconv C, is the dimension of its affine hull. A convex subset C of Rn is full-dimensional if aff C = Rn. Definition 5.3.14 (Half-space). A closed half-space is a region of Rn defined as {v ∈ Rn | aT v ≥ β} for some nonzero a ∈ Rn and β ∈ R and is denoted H(a, β). An open half-space is a region of Rn defined as {v ∈ Rn | aT v > β} for some nonzero a ∈ Rn and β ∈ R and is denoted Ho(a, β). Exercise 5.3.15. Let a ∈ Rn be a nonzero vector and let β ∈ R. Prove: the set {v ∈ Rn | aT v ≤ β} is also a (closed) half-space. Fact 5.3.16. Let S ⊆ Rn be a hyperplane defined by aT v = β. Then S divides Rn into two half-spaces, defined by aT v ≥ β and aT v ≤ β. The intersection of these two half-spaces is S. Proposition 5.3.17. (a) Every closed (R Def. 19.4.11) convex set is the intersection of the closed half-spaces containing it.

(b) If S ⊆ Rn is finite then conv(S) is the intersection of a finite number of closed half-spaces. Exercise 5.3.18. Find a convex set which is not the intersection of any number of open or closed half-spaces. (Such a set already exists in 2 dimensions.) Proposition∗ 5.3.19. Every open (R Def. 19.4.10) convex set is the intersection of the open half-spaces containing it.

5.4 (R) Helly’s Theorem

The main result of this section is the following theorem.

Theorem 5.4.1 (Helly’s Theorem). If C1, . . . , Ck ⊆ R^n are convex sets such that any n + 1 of them intersect, then all of them intersect.

Lemma 5.4.2 (Radon). Let S ⊆ R^n be a set of k ≥ n + 2 vectors in R^n. Then S has two disjoint subsets S1 and S2 whose convex hulls intersect. ♦

Exercise 5.4.3. Show that the inequality k ≥ n + 2 is tight, i. e., that there exists a set S of n + 1 vectors in R^n such that for any two disjoint subsets S1, S2 ⊆ S, we have conv S1 ∩ conv S2 = ∅.

Exercise 5.4.4. Use Radon’s Lemma to prove Helly’s Theorem.

Exercise 5.4.5. Show the bound in Helly’s Theorem is tight, i. e., there exist n + 1 convex subsets of R^n such that every n of them intersect but the intersection of all of them is empty.

5.5 (F, R) Additional exercises

Observe that a half-space can be defined by n + 1 real parameters,¹ i. e., by defining a and β. The following project asks you to define a similar object which can be defined by 2n − 1 real parameters.

Project 5.5.1. Define a “partially open half-space” which can be defined by 2n − 1 parameters2 so that every convex set is the intersection of the partially open half-spaces and the closed half-spaces containing it.

(R Contents)

¹In fact, in a well-defined sense, half-spaces can be defined by n real parameters.
²In fact, this object can be defined by 2n − 3 real parameters.

Chapter 6

(F) The Determinant

While many systems of linear equations are too large to reasonably solve by hand using methods like elimination, the general system of two equations in two unknowns

α11x1 + α12x2 = β1 (6.1)

α21x1 + α22x2 = β2 (6.2)

is not too cumbersome to work out and find

x1 = (α22β1 − α12β2) / (α22α11 − α12α21)        (6.3)

x2 = (α11β2 − α21β1) / (α22α11 − α12α21)        (6.4)

For the case of 3 equations in 3 unknowns, we obtain

x1 = N31 / D3        (6.5)

x2 = N32 / D3        (6.6)

x3 = N33 / D3        (6.7)


where

D3 = α11α22α33 + α12α23α31 + α13α21α32 (6.8)

− α11α23α32 − α12α21α33 − α13α22α31 (6.9)

N31 = α22α33β1 + α12α23β3 + α13α32β2 (6.10)

− α23α32β1 − α12α33β2 − α13α22β3 (6.11)

N32 = α11α33β2 + α23α31β1 + α13α21β3 (6.12)

− α11α23β3 − α21α33β1 − α13α31β2 (6.13)

N33 = α11α22β3 + α12α31β2 + α21α32β1 (6.14)

− α11α32β2 − α12α21β3 − α22α31β1 (6.15)

In particular, the numerators and denominators of these expressions each have six terms. For a system of 4 equations in 4 unknowns, the numerators and denominators each have 24 terms, and, in general, we have n! terms in the numerators and denominators of the solutions to n equations in n unknowns. As complicated as these expressions may seem for large n, it is clear that they represent a fundamental quantity associated with the matrix. The denominator of these expressions is known as the determinant of the matrix A (where our system of linear equations is written as Ax = b). The determinant is a function from the space of n × n matrices to numbers, that is, det : Mn(F) → F. Before defining this function, we need to study permutations. (R Contents)

6.1 Permutations

Definition 6.1.1 (Permutation). A permutation of a set Ω is a bijection f :Ω → Ω. The set Ω is called the permutation domain.

Definition 6.1.2 (Symmetric group). The symmetric group of degree n, denoted Sn, is the set of all permutations of the set {1, . . . , n}.

Definition 6.1.3 (Inversion). Let σ ∈ Sn be a permutation of the set {1, . . . , n}, and let 1 ≤ i, j ≤ n with i ≠ j. We say that the pair {i, j} is inverted by σ if i < j and σ(i) > σ(j) or i > j and

σ(i) < σ(j). We denote by Inv(σ) the number of inversions of σ, that is, the number of pairs {i, j} that are inverted by σ.

Exercise 6.1.4. What is the maximum possible value of Inv(σ) for σ ∈ Sn? Definition 6.1.5 (Even and odd permutations). If Inv(σ) is even, then we say that σ is an even permutation, and if Inv(σ) is odd, then σ is an odd permutation. Proposition 6.1.6. For n ≥ 2, exactly half of the n! permutations of {1, . . . , n} are even, the other half are odd.

Definition 6.1.7 (Sign of a permutation). Let σ ∈ Sn be a permutation. The sign of σ, denoted sgn σ, is defined as sgn(σ) := (−1)Inv(σ) (6.16)

In particular, sgn(σ) = 1 if σ is an even permutation and sgn(σ) = −1 if σ is an odd permutation. The sign of a permutation (equivalently, whether it is even or odd) is also referred to as the parity of the permutation. Definition 6.1.8 (Composition of permutations). Let Ω be a set, and let σ and τ be permutations of Ω. Then the composition of σ with τ, denoted στ, is defined by

(στ)(a) := σ(τ(a))        (6.17)

for all a ∈ Ω. We also refer to the composition of σ with τ as the product of σ and τ.

Definition 6.1.9 (Transposition). Let Ω be a set. The transposition of the elements a ≠ b ∈ Ω is the permutation that swaps a and b and fixes every other element. Formally, it is the permutation τ defined by

    τ(x) := b if x = a,  a if x = b,  x otherwise.        (6.18)

This permutation is denoted τ = (a, b).

Proposition 6.1.10. Transpositions generate Sn, i. e., every permutation σ ∈ Sn can be written as a composition of transpositions.

Theorem 6.1.11. A permutation σ of {1, . . . , n} is even if and only if σ is a product of an even number of transpositions. ♦ In the light of this theorem, we could use its conclusion as the definition of even permutations. The advantage of this definition is that it can be applied to any set Ω, not just the ordered set {1, . . . , n}.

Corollary 6.1.12. While the number of inversions of a permutation depends on the ordering of the permutation domain, its parity does not.

Definition 6.1.13 (Neighbor transposition). A neighbor transposition of the set {1, . . . , n} is a trans- position of the form τ = (i, i + 1).

Exercise 6.1.14. Let σ ∈ Sn and let τ be a neighbor transposition. Show

| Inv(σ) − Inv(στ)| = 1 .

Corollary 6.1.15. Let σ ∈ Sn and let τ be a neighbor transposition. Then sgn(στ) = − sgn(σ). Proposition 6.1.16. Every transposition is the composition of an odd number of neighbor trans- positions.

Proposition 6.1.17. Neighbor transpositions generate the symmetric group. That is, every ele- ment of Sn can be expressed as the composition of neighbor transpositions. Proposition 6.1.18. Composition with a transposition changes the parity of a permutation.

Corollary 6.1.19. Let σ ∈ Sn be a permutation. Then σ is even if and only if σ is the product of an even number of transpositions.

Theorem 6.1.20. Let σ, τ ∈ Sn. Then

sgn(στ) = sgn(σ) sgn(τ) . (6.19)
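(A brute-force check, ours and not part of the text, in plain Python: it counts inversions, computes the sign, and spot-checks Theorem 6.1.20 over all pairs of permutations of a 4-element set.)

    from itertools import permutations

    def inv(sigma):                 # sigma is a tuple; sigma[i] is the image of i
        return sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
                   if sigma[i] > sigma[j])

    def sgn(sigma):
        return -1 if inv(sigma) % 2 else 1

    def compose(sigma, tau):        # (sigma tau)(i) = sigma(tau(i)), as in Def. 6.1.8
        return tuple(sigma[tau[i]] for i in range(len(tau)))

    perms = list(permutations(range(4)))
    assert all(sgn(compose(s, t)) == sgn(s) * sgn(t) for s in perms for t in perms)
    print(sum(1 for s in perms if sgn(s) == 1), "even permutations out of", len(perms))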

Definition 6.1.21 (k-cycle). A k-cycle is a permutation that cyclically permutes k elements {a1, . . . , ak} and fixes all others (FIGURE). That is, σ is a k-cycle if σ(ai) = ai+1 for i = 1, . . . , k (where ak+1 = a1) and σ(x) = x if x ∉ {a1, . . . , ak}.

In particular, transpositions are 2-cycles.

Definition 6.1.22 (Disjoint cycles). Let σ and τ be cycles with permutation domain Ω. Then σ and τ are disjoint if no element of Ω is permuted by both σ and τ.

Proposition 6.1.23. Every permutation uniquely decomposes into disjoint cycles.

Exercise 6.1.24. Let σ be a k-cycle. Show that σ is an even permutation if and only if k is odd.

Corollary 6.1.25. Let σ be a permutation. Then σ is even if and only if its cycle decomposition includes an even number of even cycles.

Proposition 6.1.26. Let σ be a permutation. Then Inv(σ^{−1}) = Inv(σ).

(R Contents)

6.2 Defining the determinant

Definition 6.2.1 (Determinant). The determinant of an n × n matrix A = (αij) is

det A := ∑_{σ∈Sn} sgn(σ) ∏_{i=1}^{n} αi,σ(i) .        (6.20)

Notation 6.2.2. The determinant of a matrix A = (αij) may also be represented by writing the entries of A between vertical bars, that is,

    det A = |A| = | α11 ··· α1n |
                  |  ⋮   ⋱   ⋮  |        (6.21–6.22)
                  | αn1 ··· αnn |
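(As an aside, ours and not part of the text: the defining formula (6.20) can be evaluated literally in a few lines of Python. This takes on the order of n!·n operations, so it is only feasible for tiny matrices, but it is a useful cross-check against a library routine such as NumPy's determinant.)

    from itertools import permutations
    import numpy as np

    def det_by_definition(A):
        n = len(A)
        total = 0
        for sigma in permutations(range(n)):
            invs = sum(1 for i in range(n) for j in range(i + 1, n)
                       if sigma[i] > sigma[j])
            term = (-1) ** invs                    # sgn(sigma)
            for i in range(n):
                term *= A[i][sigma[i]]             # alpha_{i, sigma(i)}
            total += term
        return total

    A = [[2, 0, 1], [1, 3, 2], [0, 1, 4]]
    print(det_by_definition(A), round(np.linalg.det(np.array(A, dtype=float))))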

Proposition 6.2.3. Show that

(a)  det | α β | = αδ − βγ
         | γ δ |

(b)  det | α11 α12 α13 |
         | α21 α22 α23 | = α11α22α33 + α12α23α31 + α13α21α32 − α11α23α32 − α12α21α33 − α13α22α31
         | α31 α32 α33 |

For the following exercises, let A be an n × n matrix.

Proposition 6.2.4. If a column of A is 0, then det A = 0.

Proposition 6.2.5. Let A0 be the matrix obtained by swapping two columns of A. Then det A0 = − det A.

Proposition 6.2.6. Show that det(A^T) = det A. This fact follows from what property of inversions?

Proposition 6.2.7. If two columns of A are equal, then det A = 0.

Proposition 6.2.8. The determinant of a diagonal matrix is the product of its diagonal entries.

Proposition 6.2.9. The determinant of an upper triangular matrix is the product of its diagonal entries.

Example 6.2.10.

    det | 5 1 7 |
        | 0 2 6 | = 30        (6.23)
        | 0 0 3 |

and this value does not depend on the three entries in the upper-right corner.

Definition 6.2.11 (k-linearity). A function f : V × · · · × V → W (k factors) is linear in the i-th component if, whenever we fix x1, . . . , x_{i−1}, x_{i+1}, . . . , xk, the function

g(y) := f(x1, . . . , x_{i−1}, y, x_{i+1}, . . . , xk)

is linear,¹ i. e.,

g(y1 + y2) = g(y1) + g(y2)        (6.24)
g(αy) = αg(y)                      (6.25)

The function f is k-linear if it is linear in all k components. A function which is 2-linear is said to be bilinear.

Proposition 6.2.12 (Multilinearity). The determinant is multilinear in the columns of A. That is,

det[a1 | · · · | ai−1 | ai + b | ai+1 | · · · | an]

= det[a1 | · · · | an]

+ det[a1 | · · · | ai−1 | b | ai+1 | · · · | an] (6.26) and

det[a1 | · · · | ai−1 | αai | ai+1 | · · · | an]

= α det[a1 | · · · | an] . (6.27)

Next we study the effect of elementary column and row operations (Sec. 3.2) on the determinant.

Proposition 6.2.13. Let A′ be the matrix obtained from A ∈ Mn(F) by applying an elementary column or row operation. We distinguish cases according to the operation applied.

(a) Shearing does not change the value of the determinant: det(A′) = det(A).

(b) Scaling a row or column by a scalar λ (the scalec(i, λ) or scaler(i, λ) operation) scales the value of the determinant by the same factor: det(A′) = λ det(A).

(c) Swapping two rows or columns changes the sign of the determinant: det(A′) = − det(A).

(c) Swapping two rows or columns changes the sign of the determinant: det(A0) = − det(A).

Proposition 6.2.14. Let A, B ∈ Mn(F). Then det(AB) = det(A) det(B).
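(A numerical spot check, ours and not part of the text, assuming NumPy: shearing preserves the determinant, scaling multiplies it by λ, swapping flips its sign, and det(AB) = det(A) det(B).)

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.integers(-3, 4, size=(4, 4)).astype(float)
    B = rng.integers(-3, 4, size=(4, 4)).astype(float)
    d = np.linalg.det(A)

    S = A.copy(); S[:, 0] += 2.5 * S[:, 1]            # column shearing
    C = A.copy(); C[:, 2] *= -3.0                     # column scaling by -3
    W = A.copy(); W[:, [0, 3]] = W[:, [3, 0]]         # column swapping

    print(np.allclose(np.linalg.det(S), d))           # True
    print(np.allclose(np.linalg.det(C), -3.0 * d))    # True
    print(np.allclose(np.linalg.det(W), -d))          # True
    print(np.allclose(np.linalg.det(A @ B), d * np.linalg.det(B)))   # True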

¹Linear maps will be discussed in more detail in Chapter 16.

Numerical exercise 6.2.15. Let

    A = | 2  1 −3 |
        | 4 −1  0 |
        | 2  5 −1 |

(a) Use the formula derived in part (b) of Prop. 6.2.3 to compute det A.

(b) Compute the matrix A′ obtained by performing the column operation shearc(1, 2, −4) on A.

(c) Self-check: Use the same formula to compute det A′, and verify that det A′ = det A.

Theorem 6.2.16 (Determinant vs. linear independence). Let A be a square matrix. Then det A = 0 if and only if the columns of A are linearly dependent.

The proof of this theorem requires the following two lemmas which also provide a practical method to compute the determinant by Gaussian or Gauss-Jordan elimination.

Lemma 6.2.17. Let A be an n × n matrix whose columns are linearly dependent. Then through a series of column shearing operations, it is possible to bring A into a form in which one column is all zeros. In particular, in this case the determinant is zero. ♦ Lemma 6.2.18. Let A be an n × n matrix whose columns are linearly independent. Then it is possible to perform a series of column shearing operations that bring A into a form in which each row and column contains exactly one nonzero entry, i. e., the determinant has exactly one nonzero expansion term. ♦ Theorem 6.2.19 (Determinant vs. system of linear equations). The homogeneous system Ax = 0 of n linear equations in n unknowns has a nontrivial solution if and only if det A = 0. ♦

Definition 6.2.20 (Skew-symmetric matrix). The matrix A ∈ Mn(F) is skew-symmetric if A^T = −A.

Exercise 6.2.21. Let F be a field in which 1 + 1 ≠ 0.

(a) Show that if A ∈ Mn(F) is skew-symmetric and n is odd then det A = 0.

(b) For all even n, find a nonsingular skew-symmetric matrix A ∈ Mn(F).

Exercise 6.2.22. Show that part (a) of the preceding exercise is false over F2 (and every field of characteristic 2). 72 CHAPTER 6. (F) THE DETERMINANT

2 Exercise 6.2.23. Let F be a field of characteristic 2 and let n be odd. If A ∈ Mn(F) is symmetric and all of its diagonal entries are 0, then A is singular.

Proposition 6.2.24 (Hadamard’s Inequality). Let A = [a1 | · · · | an] ∈ Mn(R). Prove

|det A| ≤ ∏_{j=1}^{n} ‖aj‖ ,        (6.28)

where ‖aj‖ is the norm of the vector aj (R Def. 1.5.2).

Exercise 6.2.25. When does equality hold in Hadamard’s Inequality?

2 Definition 6.2.26 (Parallelepiped). Let v1, v2 ∈ R . The parallelogram spanned by v1 and v2 is the set (PICTURE)

parallelogram(v1, v2)

:= {αv1 + βv2 | 0 ≤ α, β ≤ 1} . (6.29)

More generally, the parallelepiped spanned by the vectors v1, . . . , vn ∈ R^n is the set

parallelepiped(v1, . . . , vn) := { ∑_{i=1}^{n} αi vi | 0 ≤ αi ≤ 1 } .        (6.30)

Exercise 6.2.27. Let a, b ∈ R2. Show that the area of the parallelogram spanned by these vectors (PICTURE) is | det A|, where A = [a | b] ∈ M2(R). The following theorem is a generalization of Ex. 6.2.27.

n Theorem 6.2.28. If v1,..., vn ∈ R , then the volume of parallelepiped(v1,..., vn) is | det A|, where A is the n × n matrix whose i-th column is vi. ♦

(R Contents)

²Observe that in a field of characteristic 2, “skew-symmetric” means the same thing as “symmetric.”

6.3 Cofactor expansion

Definition 6.3.1 (Cofactor). Let A ∈ Mn(F) be a matrix. The (i, j) cofactor of A is (−1)^{i+j} det(îAĵ), where îAĵ denotes the (n − 1) × (n − 1) matrix obtained by removing the i-th row and j-th column of A.

Theorem 6.3.2 (Cofactor expansion). Let A = (αij) be an n × n matrix, and let Cij be the (i, j) cofactor of A. Then for all i,

det A = ∑_{j=1}^{n} αij Cij .        (6.31)

This is the cofactor expansion of A along the i-th row. Similarly, for all j,

det A = ∑_{i=1}^{n} αij Cij .        (6.32)

This is the cofactor expansion of A along the j-th column. ♦

Numerical exercise 6.3.3. Compute the determinants of the following matrices by cofactor expansion (a) along the first row and (b) along the second column. Self-check: your answers should be the same.

(a)  | 2  3  1 |
     | 0 −4 −1 |
     | 1 −3  4 |

(b)  | 3 −3  2 |
     | 4  7 −1 |
     | 6 −4  2 |

(c)  | 1  3  2 |
     | 3 −1  0 |
     | 0  6  5 |

(d)  |  6  2 −1 |
     |  0  4  1 |
     | −3  1  1 |

Exercise 6.3.4. Compute the determinants of the following matrices.

(a)  | α β β ··· β β |
     | β α β ··· β β |
     | β β α ··· β β |
     | ⋮ ⋮ ⋮  ⋱  ⋮ ⋮ |
     | β β β ··· α β |
     | β β β ··· β α |

(b)  | 1 1 0 0 ··· 0 0 0 |
     | 1 1 1 0 ··· 0 0 0 |
     | 0 1 1 1 ··· 0 0 0 |
     | ⋮ ⋮ ⋮ ⋮  ⋱  ⋮ ⋮ ⋮ |
     | 0 0 0 0 ··· 1 1 1 |
     | 0 0 0 0 ··· 0 1 1 |

(c)  |  1  1  0  0 ···  0  0  0 |
     | −1  1  1  0 ···  0  0  0 |
     |  0 −1  1  1 ···  0  0  0 |
     |  ⋮  ⋮  ⋮  ⋮  ⋱   ⋮  ⋮  ⋮ |
     |  0  0  0  0 ··· −1  1  1 |
     |  0  0  0  0 ···  0 −1  1 |

Exercise 6.3.5. Compute the determinant of the Vandermonde matrix (R Def. 2.5.9) generated by α1, . . . , αn.

Definition 6.3.6 (Fixed ). Let Ω be a set, and let f :Ω → Ω be a permutation of Ω. We say that x ∈ Ω is a fixed point of f if f(x) = x. We say that f is fixed-point-free, or a derangement, if f has no fixed points.

♥ Exercise 6.3.7. Let Fn denote the set of fixed-point free permutations of the set {1, . . . , n}. Decide, for each n, whether the majority of Fn is odd or even. (The answer will depend on n.)

(R Contents)

6.4 Matrix inverses and the determinant

We now derive an explicit form for the inverse of an n × n matrix. Our tools for this will be the cofactor expansion together with the skew cofactor expansion introduced below. In Theorem 6.3.2, we took the dot product of the i-th column of a matrix with the cofactors corresponding to the i-th column. We now instead take the dot product of the i-th column with the cofactors corresponding to the j-th column.

Proposition 6.4.1 (Skew cofactor expansion). Let A = (αij) be an n × n matrix, and fix i and j (1 ≤ i, j ≤ n) such that i ≠ j. Let Ckj be the (k, j) cofactor of A. Then

∑_{k=1}^{n} αki Ckj = 0 .        (6.33)

Definition 6.4.2 (Adjugate of a matrix). Let A ∈ Mn(F). Then the adjugate of A, denoted adj(A), is the matrix whose (i, j) entry is the (j, i) cofactor of A.

Theorem 6.4.3 (Explicit form of the matrix inverse). Let A be a nonsingular n × n matrix. Then

A^{−1} = (1/det A) adj(A) .        (6.34)        ♦

Numerical exercise 6.4.4. Compute the inverses of the following matrices. Self-check: multiply your answer by the original matrix to get the identity.

(a)  |  1 −3 |
     | −2  4 |

(b)  | −4  2 |
     | −1 −1 |

(c)  |  3  4 −7 |
     | −2  1 −4 |
     |  0 −2  5 |

(d)  | −1  4  2 |
     | −3  2 −3 |
     |  1  0  2 |

Proposition 6.4.5. If A ∈ Mn(F) is invertible, then det(A^{−1}) = 1 / det A.

Exercise 6.4.6. When is A ∈ Mn(Z) invertible over Z? Give a simple condition in terms of det A.

Exercise 6.4.7. Let n be odd and let A ∈ Mn(Z) be an invertible symmetric matrix whose diagonal entries are all 0. Show that A^{−1} ∉ Mn(Z).

(R Contents)

6.5 Additional exercises

Notation 6.5.1. Let A ∈ F^{k×n} and B ∈ F^{n×k} be matrices, and let I ⊆ [n]. The matrix A_I is the matrix whose columns are the columns of A which correspond to the elements of I. The matrix _I B is ((B^T)_I)^T, i. e., the matrix whose rows are the rows of B which correspond to the elements of I.

Example 6.5.2. Let A be the 2 × 3 matrix with rows (3, 1, 7) and (2, 3, 2), let B be the 3 × 2 matrix with rows (−4, 1), (6, 2), (−1, 0), and let I = {1, 3}. Then A_I has rows (3, 7) and (2, 2), and _I B has rows (−4, 1) and (−1, 0).

Exercise 6.5.3 (Cauchy-Binet formula). Let A ∈ F^{k×n} and let B ∈ F^{n×k}. Show that

det(AB) = ∑_{I⊆[n], |I|=k} det(A_I) det(_I B) .        (6.35)

(R Contents)

Chapter 7

(F) Theory of Systems of Linear Equations II: Cramer’s Rule

(R Contents)

7.1 Cramer’s Rule

With the determinant in hand, we are now able to return to the problem of solving systems of linear equations. In particular, we have the following result.

Theorem 7.1.1. Let A be a nonsingular square matrix. Then the system Ax = b of n linear equations in n unknowns is solvable. ♦

In particular, if A is a square matrix, then Ax = b is solvable (and in fact has a unique solution) precisely when A is invertible. It is easy to see that in this case, the solution to Ax = b is given by x = A^{−1}b, but we now demonstrate that it is possible to determine the solution to this system of equations without ever computing a matrix inverse.

Theorem 7.1.2 (Cramer’s Rule). Let A be an invertible n × n matrix over F, and let b ∈ F^n. Let a = A^{−1}b = (α1, . . . , αn)^T. Then


(a) Aa = b

(b) For each i ≤ n,

αi = det Ai / det A        (7.1)

where Ai is the matrix obtained by replacing the i-th column of A by b.

♦ Numerical exercise 7.1.3. Use Cramer’s Rule to solve the following systems of linear equations. Self-check: plug your answers back into the original equations.

(a)

x1 + 2x2 = 3

x1 − x2 = 6

(b)

−x1 + x2 − x3 = 4

2x1 + 3x2 + 4x3 = −2

−x1 − x2 − 3x3 = 3
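(A direct computation of Cramer's Rule, ours and not part of the text, assuming NumPy; it is compared against a library solver on a throwaway 2 × 2 system rather than on the exercise above. Note that it requires n + 1 determinants, so it is only sensible for small n.)

    import numpy as np

    def cramer(A, b):
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                       # replace the i-th column of A by b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2., 1.], [1., 3.]])
    b = np.array([3., 5.])
    print(cramer(A, b), np.linalg.solve(A, b))   # both give [0.8 1.4]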

(R Contents)

Chapter 8

(F) Eigenvectors and Eigenvalues

(R Contents)

8.1 Eigenvector and eigenvalue basics

Definition 8.1.1 (Eigenvector). Let A ∈ Mn(F) and let v ∈ F^n. Then v is an eigenvector of A if v ≠ 0 and there exists λ ∈ F such that Av = λv.

Definition 8.1.2 (Eigenvalue). Let A ∈ Mn(F). Then λ ∈ F is an eigenvalue of A if there exists a nonzero vector v ∈ Fn such that Av = λv. n Definition 8.1.3. Let A ∈ Mn(F). Let λ ∈ F be a scalar and v ∈ F a nonzero vector. If Av = λv then we say that λ and v are a corresponding eigenvalue–eigenvector pair.

Exercise 8.1.4. Consider the 2 × 2 real matrix B with rows (1, 1) and (−2, 4).

(a) Prove that the numbers λ1 = 2 and λ2 = 3 are eigenvalues of this matrix. Find the corre- sponding eigenvectors.

(b) Prove that B has no other eigenvalues.
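(For numerical experimentation only, ours and not part of the text, assuming NumPy; it is run on a throwaway symmetric example and is no substitute for the proofs requested in the exercises. NumPy's eig returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors.)

    import numpy as np

    A = np.array([[2., 1.],
                  [1., 2.]])
    eigenvalues, eigenvectors = np.linalg.eig(A)
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(lam, np.allclose(A @ v, lam * v))   # each eigenvalue prints with True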

Exercise 8.1.5. What are the eigenvalues and eigenvectors of the n × n identity matrix?


Exercise 8.1.6. Let D = diag(λ1, . . . , λn) be the n × n diagonal matrix whose diagonal entries are λ1, . . . , λn. What are the eigenvalues and eigenvectors of D?

Exercise 8.1.7. Determine the eigenvalues of the 2 × 2 matrix with rows (1, 1) and (1, 0).

Exercise 8.1.8. Find examples of real 2 × 2 matrices with

(a) no eigenvectors in R2

(b) exactly one eigenvector in R2, up to scaling

(c) exactly two eigenvectors in R2, up to scaling

(d) infinitely many eigenvectors in R2, none of which is a scalar multiple of another

Exercise 8.1.9. Let A ∈ Mn(F) be a matrix and let v1 and v2 be eigenvectors to distinct eigen- values. Then v1 + v2 is not an eigenvector.

Exercise 8.1.10. Let A ∈ Mn(F) be a matrix to which every nonzero vector is an eigenvector. Then A is a scalar matrix (R Def. 2.2.13).

♥ Exercise 8.1.11. Show that eigenvectors of a matrix corresponding to distinct eigenvalues are linearly independent.

n Definition 8.1.12 (Eigenbasis). Let A ∈ Mn(F) be a matrix. An eigenbasis of A is a basis of F made up of eigenvectors of A.

Exercise 8.1.13. Find all eigenbases of I.

Exercise 8.1.14. Let λ1, . . . , λn be scalars. Find an eigenbasis of diag(λ1, . . . , λn).

Exercise 8.1.15. Prove that the matrix with rows (1, 1) and (0, 1) has no eigenbasis.

Proposition 8.1.16. Let A ∈ Mn(F). Then A is nonsingular if and only if 0 is not an eigenvalue of A.

1×n Definition 8.1.17 (Left eigenvector). Let A ∈ Mn(F). Then x ∈ F is a left eigenvector if x 6= 0 and there exists λ ∈ F such that xA = λx. 8.1. EIGENVECTOR AND EIGENVALUE BASICS 81

Definition 8.1.18 (Left eigenvalue). Let A ∈ Mn(F). Then λ ∈ F is a left eigenvalue of A if there exists a nonzero row vector v ∈ F1×n such that vA = λv. Convention 8.1.19. When we use the word “eigenvector” without a modifier, we refer to a right eigenvector; the term “right eigenvector” is occasionally used for clarity.

Exercise 8.1.20. Let A ∈ Mn(F). Show that if x is a right eigenvector to eigenvalue λ and y^T is a left eigenvector to eigenvalue µ ≠ λ, then y^T x = 0, i. e., y ⊥ x.

Definition 8.1.21 (Eigenspace). Let A ∈ Mn(F). We denote by Uλ the set

Uλ := {v ∈ F^n | Av = λv} .        (8.1)

This set is called the eigenspace corresponding to the eigenvalue λ. The following exercise explains this terminology.

Exercise 8.1.22. Let A ∈ Mn(F) be a square matrix with eigenvalue λ. Show that Uλ is a subspace of Fn.

Definition 8.1.23 (Geometric multiplicity). Let A ∈ Mn(F) be a square matrix and let λ be an eigenvalue of A. Then the geometric multiplicity of λ is dim Uλ. The next exercise provides a method of calculating this quantity. Exercise 8.1.24. Let λ be an eigenvalue of the n × n matrix A. Give a simple expression for the geometric multiplicity in terms of A and λ. Exercise 8.1.25. If N is a nilpotent matrix (R Def. 2.2.18) and λ is an eigenvalue of N, then λ = 0. Exercise 8.1.26. Let N be a nilpotent matrix. (a) Show that I + N is invertible. (b) Find a nonsingular matrix A and a nilpotent matrix N such that A + N is singular. (c) Prove that there exist a nonsingular diagonal matrix D and a nilpotent matrix N such that D + N is singular.

(R Contents)

8.2 Similar matrices and diagonalizability

Definition 8.2.1 (Similar matrices). Let A, B ∈ Mn(F). Then A and B are similar (notation: A ∼ B) if there exists a nonsingular matrix S such that B = S−1AS.

Exercise 8.2.2. Let A and B be similar matrices. Show that, for k ≥ 0, we have Ak ∼ Bk.

Proposition 8.2.3. Let A ∼ B (R Def. 8.2.1). Then det A = det B.

Exercise 8.2.4. Let N ∈ Mn(F) be a matrix. Then N is nilpotent if and only if it is similar to a strictly upper triangular matrix.

Proposition 8.2.5. Every matrix is similar to a block diagonal matrix where each block has the form αI + N where α ∈ F and N is nilpotent.

Blocks of this form are “proto-Jordan blocks.” We will discuss Jordan blocks and the Jordan canonical form of matrices in Section ??

Definition 8.2.6 (Diagonalizability). Let A ∈ Mn(F). Then A is diagonalizable if A is similar to a diagonal matrix.

Exercise 8.2.7.

(a) Prove that the matrix with rows (1, 1) and (0, 1) is not diagonalizable.

(b) Prove that the matrix with rows (1, 2) and (0, 3) is diagonalizable.

(c) When is the 2 × 2 upper triangular matrix with rows (α11, α12) and (0, α22) diagonalizable?

Proposition 8.2.8. Let A ∈ Mn(F). Then A is diagonalizable if and only if A has an eigenbasis.
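(A numerical illustration of Proposition 8.2.8, ours and not part of the text, assuming NumPy: if the columns of S form an eigenbasis, then S^{−1} A S is diagonal. The example matrix is a throwaway one with distinct eigenvalues.)

    import numpy as np

    A = np.array([[4., 1.],
                  [2., 3.]])
    w, S = np.linalg.eig(A)                  # columns of S are eigenvectors of A
    D = np.linalg.inv(S) @ A @ S
    print(np.allclose(D, np.diag(w)))        # True: A is similar to a diagonal matrix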

(R Contents)

8.3 Polynomial basics

We now digress from the theory of eigenvectors and eigenvalues in order to build a foundation in polynomials. We will discuss polynomials in more depth in Section 14.4, but now we establish the basics necessary for studying the characteristic polynomial in Section 8.4. Definition 8.3.1 (Polynomial). A polynomial over the field F is a formal expression1 f of the form

f = α0 + α1t + α2t² + ··· + αn t^n        (8.2)

We may omit any terms with zero coefficient, e. g.,

6 + 0t − 3t² + 0t³ = 6 − 3t²        (8.3)

The set of polynomials over the field F is denoted F[t].

Definition 8.3.2 (Zero polynomial). The polynomial which has all coefficients equal to zero is called the zero polynomial and is denoted by 0.

Definition 8.3.3. The leading term of a polynomial f = α0 + α1t + ··· + αn t^n is the term corresponding to the highest power of t with a nonzero coefficient, that is, the term αk t^k where αk ≠ 0 and αj = 0 for all j > k. The zero polynomial does not have a leading term. The leading coefficient of a polynomial f is the coefficient of the leading term of f. The degree of a polynomial f, denoted deg f, is the exponent of its leading term. A polynomial is monic if its leading coefficient is 1.

Convention 8.3.4. The zero polynomial has degree −∞.

Example 8.3.5. Let f = 6 − 3t² + 4t⁵. Then the leading term of f is 4t⁵, the leading coefficient of f is 4, and deg f = 5.

Exercise 8.3.6. Which polynomials have degree 0?

Notation 8.3.7. We denote the set of polynomials of degree at most n over F by Pn(F).

Definition 8.3.8 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f, or f is divisible by g, written g | f, if there exists a polynomial h ∈ F[t] such that f = gh. In this case we say that g is a divisor of f and f is a multiple of g.

Definition 8.3.9 (Root of a polynomial). Let f ∈ F[t] be a polynomial. Then ζ ∈ F is a root of f if f(ζ) = 0.

¹In Chapter 14, we will refine this definition and study polynomials more formally.

Proposition 8.3.10. Let f ∈ F[t] be a polynomial and let ζ ∈ F. Then ζ is a root of f if and only if t − ζ | f.

Definition 8.3.11 (Multiplicity of a root). Let f be a polynomial and let ζ be a root of f. The multiplicity of the root ζ is the largest k for which (t − ζ)k | f.

Proposition 8.3.12. Let f be a polynomial of degree n. Then f has at most n roots (counting multiplicity).

Theorem 8.3.13 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f = k ≥ 1, then f can be written as

f = αk ∏_{i=1}^{k} (t − ζi)        (8.4)

where αk is the leading coefficient of f and the ζi are complex numbers. ♦

(R Contents)

8.4 The characteristic polynomial

We now establish a method for finding the eigenvalues of general n × n matrices. Definition 8.4.1 (Characteristic polynomial). Let A be an n × n matrix. Then the characteristic polynomial of A is defined by fA(t) := det(tI − A) . (8.5) Exercise 8.4.2. (a) What is the characteristic polynomial of a 1 × 1 matrix?

α α  (b) Calculate the characteristic polynomial for the general 2 × 2 matrix A = 11 12 . α21 α22 Exercise 8.4.3. Show that the characteristic polynomial of an n × n matrix is a monic polynomial of degree n.

Theorem 8.4.4. The eigenvalues of a square matrix A are precisely the roots of its characteristic polynomial. ♦
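A minimal numerical sketch (assuming numpy; not part of the text) of Theorem 8.4.4: the eigenvalues reported by a numerical library coincide with the roots of fA(t) = det(tI − A).

```python
import numpy as np

# Compare the eigenvalues of a sample matrix with the roots of its
# characteristic polynomial det(tI - A).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

char_poly = np.poly(A)                  # coefficients of f_A, leading term first
print(np.sort(np.roots(char_poly)))     # ≈ [2. 3. 5.]
print(np.sort(np.linalg.eigvals(A)))    # ≈ [2. 3. 5.]
```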

Proposition 8.4.5. Every matrix A ∈ M3(R) has an eigenvector.

Exercise 8.4.6. Find a matrix A ∈ M2(R) which does not have an eigenvector. Exercise 8.4.7. Let N be an n × n nilpotent matrix. What is its characteristic polynomial?

In Section 8.1, we defined the geometric multiplicity of an eigenvalue (R Def. 8.1.23). We now define the algebraic multiplicity of an eigenvalue. Definition 8.4.8 (Algebraic multiplicity). Let A be a square matrix with eigenvalue λ. The algebraic multiplicity of λ is its multiplicity as a root of the characteristic polynomial of A.

Proposition 8.4.9. Let A be an n × n matrix with distinct complex eigenvalues λ1, . . . , λℓ, and let ki be the algebraic multiplicity of λi for each i. Then
$$\sum_{i=1}^{\ell} k_i = n \qquad (8.6)$$
and
$$f_A = \prod_{i=1}^{\ell} (t - \lambda_i)^{k_i} . \qquad (8.7)$$

Proposition 8.4.10. Let A be as in the preceding proposition. Show that the coefficient of t^{n−ℓ} in fA is
$$(-1)^{\ell} \sum_{i_1, \ldots, i_\ell \ \text{distinct}} \lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_\ell} .$$

Theorem 8.4.11. Let A be an n × n matrix with eigenvalues λ1, . . . , λn (with their algebraic multiplicities). Then
$$\operatorname{Tr} A = \sum_{i=1}^{n} \lambda_i \qquad (8.8)$$
$$\det A = \prod_{i=1}^{n} \lambda_i \qquad (8.9)$$
♦

Proposition 8.4.12. Let A be a square matrix and let λ be an eigenvalue of A. Then the geometric multiplicity of λ is less than or equal to the algebraic multiplicity of λ.

Exercise 8.4.13. Determine the eigenvalues and their (algebraic and geometric) multiplicities of the all ones matrix Jn over R.

Proposition 8.4.14. Let A be a square matrix. Then A is diagonalizable over C if and only if for every eigenvalue λ, the geometric and algebraic multiplicities of λ are equal.

Proposition 8.4.15. Let A ∈ Mn(F). Then the left eigenvalues of A are the same as the right eigenvalues, including their geometric and algebraic multiplicities.

Proposition 8.4.16. Suppose n ≥ k. Let A ∈ F^{k×n} and B ∈ F^{n×k}. Then f_{BA} = f_{AB} · t^{n−k}.

Definition 8.4.17 (Companion matrix). Let f ∈ Pn[F] be a monic polynomial, say

f = α0 + α1t + ··· + αn−1t^{n−1} + αnt^n . (8.10)

Then the companion matrix of f is the matrix C(f) ∈ Mn(F) defined by
$$C(f) := \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & -\alpha_0 \\
1 & 0 & 0 & \cdots & 0 & -\alpha_1 \\
0 & 1 & 0 & \cdots & 0 & -\alpha_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & -\alpha_{n-2} \\
0 & 0 & 0 & \cdots & 1 & -\alpha_{n-1}
\end{pmatrix} . \qquad (8.11)$$

Proposition 8.4.18. Let f be a monic polynomial and let A = C(f) be its companion matrix. Then the characteristic polynomial of A is equal to f. That is,

f = fA = det(tI − C(f)) . (8.12)

Corollary 8.4.19. Every monic polynomial f ∈ Q[t] is the characteristic polynomial of a rational matrix.
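A small sketch (assuming numpy; not part of the text): build C(f) for a concrete monic cubic and check Proposition 8.4.18 numerically.

```python
import numpy as np

# Companion matrix of f = t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3).
alphas = [-6.0, 11.0, -6.0]       # α_0, α_1, α_2 (f is monic of degree 3)
n = len(alphas)

C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)        # subdiagonal of 1's
C[:, -1] = [-a for a in alphas]   # last column: -α_0, ..., -α_{n-1}

print(np.poly(C))                       # ≈ [ 1. -6. 11. -6.], i.e. f itself
print(np.sort(np.linalg.eigvals(C)))    # ≈ [1. 2. 3.], the roots of f
```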

(R Contents)

8.5 The Cayley-Hamilton Theorem

In Section 2.3, we defined how to substitute a square matrix into a polynomial (R Def. 2.3.3). We repeat that definition here.

Definition 8.5.1 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial (R Def. 8.3.1) defined by f = α0 + α1t + ··· + αdt^d. Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f(ζ) ∈ F, we may also “plug in” the matrix A ∈ Mn(F) to obtain f(A) ∈ Mn(F). The only thing we have to be careful about is what we do with the scalar term α0; we replace it with α0 times the identity matrix, so

f(A) := α0I + α1A + ··· + αdA^d . (8.13)

Exercise 8.5.2. Let A and B be similar matrices, and let f be a polynomial. Show that f(A) ∼ f(B).

The main result of this section is the following theorem.

Theorem 8.5.3 (Cayley-Hamilton Theorem). Let A be an n × n matrix. Then fA(A) = 0. Exercise 8.5.4. What is wrong with the following “proof” of the Cayley-Hamilton Theorem?

fA(A) = det(AI − A) = det 0 = 0 . (8.14) Exercise 8.5.5. Prove the Cayley-Hamilton Theorem by brute force (a) for 1 × 1 matrices,

(b) for 2 × 2 matrices. We will first prove the Cayley-Hamilton Theorem for diagonal matrices, and then more generally for diagonalizable matrices (R Def. 8.2.6).
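Before turning to the proof, here is a quick numerical check (assuming numpy; a check, not a proof): substituting a random matrix into its own characteristic polynomial, as in Definition 8.5.1, gives the zero matrix up to rounding error.

```python
import numpy as np

# Verify f_A(A) ≈ 0 for a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)               # f_A, leading coefficient first

# Horner evaluation of f_A(A); the constant term becomes α_0 * I.
fA_of_A = np.zeros_like(A)
for c in coeffs:
    fA_of_A = fA_of_A @ A + c * np.eye(4)

print(np.allclose(fA_of_A, 0))    # True
```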

Proposition 8.5.6. Let D be a diagonal matrix, and let fD be its characteristic polynomial. Then fD(D) = 0.

Proposition 8.5.7. Let A ∼ B, so B = S^{−1}AS. Then

tI − B = S^{−1} (tI − A) S . (8.15)

Proposition 8.5.8. If A ∼ B, then the characteristic polynomials of A and B are equal, i. e., fA = fB.

Proposition 8.5.9. Let A be an n × n matrix with characteristic polynomial $f_A = \prod_{i=1}^{n} (t - \lambda_i)$. Then A is diagonalizable if and only if
$$A \sim \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix} . \qquad (8.16)$$

Lemma 8.5.10. Let g ∈ F[t], and let A, S ∈ Mn(F) with S nonsingular. Then g(S^{−1}AS) = S^{−1}g(A)S. ♦

Corollary 8.5.11. Let g ∈ F[t] and let A, B ∈ Mn(F) with A ∼ B. Then g(A) ∼ g(B). Proposition 8.5.12. The Cayley-Hamilton Theorem holds for diagonalizable matrices.

Proposition 8.5.13. Show that the diagonalizable matrices are dense in Mn(C), i. e., every n × n complex matrix is the (elementwise) limit of a sequence of diagonalizable matrices.

Exercise 8.5.14. Combine the preceding two exercises to complete the proof of the Cayley– Hamilton theorem over C.

Proposition 8.5.15 (Identity principle). Let f(x1, . . . , xN ) be a multivariate polynomial with integer coefficients. If f is identically zero over Z, i. e., for all α1, . . . , αN ∈ Z we have f(α1, . . . , αN ) = 0, then f is formally zero (all coefficients of f are zero). In particular, f is the zero polynomial over any field.

Exercise 8.5.16. Use the Identity Principle to complete the proof of the Cayley–Hamilton Theorem over any field.

(R Contents)

8.6 Additional exercises

Exercise 8.6.1. Let N be an n × n nilpotent matrix over any field. Show that N n = 0.

The following two exercises are easier for diagonalizable matrices.

Exercise 8.6.2. Let $f(t) = \sum_{i=0}^{\infty} \alpha_i t^i$ be a function which is convergent for all t such that |t| < r for some r ∈ R^+ ∪ {∞}. Define, for A ∈ Mn(R),
$$f(A) = \sum_{i=0}^{\infty} \alpha_i A^i . \qquad (8.17)$$
Prove that f(A) converges if |λi| < r for all eigenvalues λi of A. In particular, e^A always converges.

Exercise 8.6.3.

(a) Find square matrices A and B such that e^{A+B} ≠ e^A e^B.

(b) Give a natural sufficient condition under which e^{A+B} = e^A e^B.

Chapter 9

(R) Orthogonal Matrices

9.1 Orthogonal matrices

In Section 1.4, we defined the standard dot product (R Def. 1.4.1) and the notions of orthogonal vectors (R Def. 1.4.8). In this chapter we study orthogonal matrices, matrices whose columns form an orthonormal basis (R Def. 1.5.6) of Rn. T Definition 9.1.1 (Orthogonal matrix). The matrix A ∈ Mn(R) is orthogonal if A A = I. The set of orthogonal n × n matrices is denoted by O(n).

n Fact 9.1.2. A ∈ Mn(R) is orthogonal if and only if its columns form an orthonormal basis of R . Proposition 9.1.3. O(n) is a group (R Def. 14.2.1) under matrix multiplication (it is called the orthogonal group). Exercise 9.1.4. Which diagonal matrices are orthogonal?

Theorem 9.1.5 (Third Miracle of Linear Algebra). Let A ∈ Mn(R). Then the columns of A are orthonormal if and only if the rows of A are orthonormal. ♦ Proposition 9.1.6. Let A ∈ O(n). Then all eigenvalues of A have absolute value 1.

Exercise 9.1.7. The matrix A ∈ Mn(R) is orthogonal if and only if A preserves the dot product, i. e., for all v, w ∈ Rn, we have (Av)T (Aw) = vT w.

Exercise 9.1.8. The matrix A ∈ Mn(R) is orthogonal if and only if A preserves the norm, i. e., for all v ∈ Rn, we have kAvk = kvk.

9.2 Orthogonal similarity

Definition 9.2.1 (Orthogonal similarity). Let A, B ∈ Mn(R). We say that A is orthogonally similar −1 to B, denoted A ∼o B, if there exists an orthogonal matrix O such that A = O BO. Note that O−1BO = OT BO because O is orthogonal.

Proposition 9.2.2. Let A ∼o B. (a) If A is symmetric then so is B. (b) If A is orthogonal then so is B.

Proposition 9.2.3. Let A ∼o diag(λ1, . . . , λn). Then (a) If all eigenvalues of A are real then A is symmetric. (b) If all eigenvalues of A have unit absolute value then A is orthogonal.

Proposition 9.2.4. A ∈ Mn(R) has an orthonormal eigenbasis if and only if A is orthogonally similar to a diagonal matrix.

Proposition 9.2.5. Let A ∈ Mn(R). Then A ∈ O(n) if and only if it is orthogonally similar to a matrix which is the diagonal sum of some of the following: an identity matrix, a negative identity matrix, 2 × 2 rotation matrices (compare with Prop. 16.4.45).

Examples 9.2.6. The following are examples of the matrices described in the preceding proposition.

(a) $\begin{pmatrix} -1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$

(b) $\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{\sqrt{3}}{2} & -\frac{1}{2} \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{\sqrt{3}}{2}
\end{pmatrix}$

9.3 Additional exercises

Definition 9.3.1 (Hadamard matrix). The matrix A = (αij) ∈ Mn(R) is an Hadamard matrix if αij = ±1 for all i, j, and the columns of A are orthogonal. We denote by H the set

H := {n | an n × n Hadamard matrix exists} . (9.1)

Example 9.3.2. The matrix $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is an Hadamard matrix.

Exercise 9.3.3. Let A be an n × n Hadamard matrix. Create a 2n × 2n Hadamard matrix.
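A small numerical check (assuming numpy; not part of the text) of Example 9.3.2, together with one block construction of the kind Exercise 9.3.3 asks for:

```python
import numpy as np

# The 2x2 example has ±1 entries and orthogonal columns, and scaling by
# 1/sqrt(n) gives an orthogonal matrix (compare Proposition 9.3.7 below).
H2 = np.array([[1.0,  1.0],
               [1.0, -1.0]])
Q = H2 / np.sqrt(2)
print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q is orthogonal

# A block doubling keeps ±1 entries and orthogonal columns.
H4 = np.block([[H2, H2], [H2, -H2]])
print(np.allclose(H4.T @ H4, 4 * np.eye(4)))  # True
```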

Proposition 9.3.4. If n ∈ H and n > 2, then 4 | n.

Proposition 9.3.5. If p is prime and p ≡ −1 (mod 4), then p + 1 ∈ H.

Proposition 9.3.6. If k, ` ∈ H, then k` ∈ H.

Proposition 9.3.7. Let A be an n × n Hadamard matrix. Then (1/√n) A is an orthogonal matrix.

Chapter 10

(R) The Spectral Theorem

10.1 Statement of the Spectral Theorem

The Spectral Theorem is one of the most significant results of linear algebra, as well as one of the most frequently applied mathematical results in pure math, applied math, and science. We will see a number of different versions of this theorem, and we will not prove it until Section 19.4. However, we now have developed the tools necessary to understand the statement of the theorem and some of its applications.

Theorem 10.1.1 (The Spectral Theorem for real symmetric matrices). Let A ∈ Mn(R) be a real symmetric matrix. Then A has an orthonormal eigenbasis. ♦

The Spectral Theorem can be restated in terms of orthogonal similarity (R Def. 9.2.1).

Theorem 10.1.2 (The Spectral Theorem for real symmetric matrices, restated). Let A ∈ Mn(R) be a real symmetric matrix. Then A is orthogonally similar to a diagonal matrix. ♦

Exercise 10.1.3. Verify that these two formulations of the Spectral Theorem are equivalent.

Corollary 10.1.4. Let A be a real symmetric matrix. Then A is diagonalizable.

Corollary 10.1.5. Let A be a real symmetric matrix. Then all of the eigenvalues of A are real.

10.2 Applications of the Spectral Theorem

Although we have not yet proved the Spectral Theorem, we can already begin to study some of its many applications.

Proposition 10.2.1. If two symmetric matrices are similar then they are orthogonally similar.

Exercise 10.2.2. Let A be a symmetric real n × n matrix, and let v ∈ R^n. Let b = (b1, . . . , bn) be an orthonormal eigenbasis of A. Express v^T Av in terms of the eigenvalues and the coordinates of v with respect to b.

Definition 10.2.3 (Positive definite matrix). An n × n real matrix A ∈ Mn(R) is positive definite if for all x ∈ Rn (x 6= 0), we have xT Ax > 0.

Proposition 10.2.4. Let A ∈ Mn(R) be a real symmetric n×n matrix. Then A is positive definite if and only if all eigenvalues of A are positive.

Another consequence of the Spectral Theorem is Rayleigh’s Principle.

Definition 10.2.5 (Rayleigh quotient). Let A ∈ Mn(R). The Rayleigh quotient of A is a function RA : R^n \ {0} → R defined by
$$R_A(v) = \frac{v^T A v}{\|v\|^2} . \qquad (10.1)$$
Recall that ‖v‖^2 = v^T v (R Def. 1.5.2).

Proposition 10.2.6 (Rayleigh’s Principle). Let A be an n × n real symmetric matrix with eigenvalues λ1 ≥ · · · ≥ λn. Then

(a) $\max_{v \in \mathbb{R}^n \setminus \{0\}} R_A(v) = \lambda_1$

(b) $\min_{v \in \mathbb{R}^n \setminus \{0\}} R_A(v) = \lambda_n$
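A quick numerical illustration (assuming numpy; not part of the text): for a random symmetric matrix, the Rayleigh quotient of random nonzero vectors always lies between λn and λ1.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                              # a symmetric real matrix

eigs = np.sort(np.linalg.eigvalsh(A))[::-1]    # λ_1 >= ... >= λ_n

def rayleigh(A, v):
    return (v @ A @ v) / (v @ v)

samples = [rayleigh(A, rng.standard_normal(5)) for _ in range(10000)]
print(max(samples) <= eigs[0] + 1e-12)   # True: no sample exceeds λ_1
print(min(samples) >= eigs[-1] - 1e-12)  # True: no sample drops below λ_n
```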

Theorem 10.2.7 (Courant-Fischer). Let A be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥ λn. Then
$$\lambda_i = \max_{\substack{U \le \mathbb{R}^n \\ \dim U = i}} \ \min_{v \in U \setminus \{0\}} R_A(v) . \qquad (10.2)$$
♦

Theorem 10.2.8 (Interlacing). Let A ∈ Mn(R) be an n × n symmetric real matrix. Let B be the (n − 1) × (n − 1) matrix obtained by deleting the i-th column and the i-th row of A (so B is also symmetric). Prove that the eigenvalues of A and B interlace, i. e., if the eigenvalues of A are λ1 ≥ · · · ≥ λn and the eigenvalues of B are µ1 ≥ · · · ≥ µn−1, then

λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn−1 ≥ µn−1 ≥ λn .

♦ Chapter 11

(F, R) Bilinear and Quadratic Forms

11.1 (F) Linear and bilinear forms

Definition 11.1.1 (Linear form). A linear form is a function f : F^n → F with the following properties.

(a) f(x + y) = f(x) + f(y) for all x, y ∈ Fn;

(b) f(λx) = λf(x) for all x ∈ Fn and λ ∈ F. Exercise 11.1.2. Let f be a linear form. Show that f(0) = 0.

Definition 11.1.3 (Dual space). The set of linear forms f : F^n → F is called the dual space of F^n and is denoted (F^n)^*.

Example 11.1.4. The function f(x) = x1 + ··· + xn (where x = (x1, . . . , xn)^T) is a linear form. More generally, for any a = (α1, . . . , αn)^T ∈ F^n, the function
$$f(x) = a^T x = \sum_{i=1}^{n} \alpha_i x_i \qquad (11.1)$$

is a linear form.

Theorem 11.1.5 (Representation Theorem for Linear Forms). Every linear form f : Fn → F has n the form (11.1) for some column vector a ∈ F . ♦


Definition 11.1.6 (Bilinear form). A bilinear form is a function f : Fn × Fn → F with the following properties.

(a) f(x1 + x2, y) = f(x1, y) + f(x2, y) (b) f(λx, y) = λf(x, y)

(c) f(x, y1 + y2) = f(x, y1) + f(x, y2) (d) f(x, λy) = λf(x, y)

Exercise 11.1.7. Let f : Fn × Fn → F be a bilinear form. Show that for all x, y ∈ F, we have f(x, 0) = f(0, y) = 0 . (11.2)

The next result explains the term “bilinear.”

Proposition 11.1.8. The function f : F^n × F^n → F is a bilinear form exactly if

(a) for all a ∈ F^n the function f_a^{(1)} : F^n → F defined by f_a^{(1)}(x) = f(a, x) is a linear form and

(b) for all b ∈ F^n the function f_b^{(2)} : F^n → F defined by f_b^{(2)}(x) = f(x, b) is a linear form.

Examples 11.1.9. The standard dot product

$$f(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i \qquad (11.3)$$
is a bilinear form. More generally, for any matrix A ∈ Mn(F), the expression
$$f(x, y) = x^T A y = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} x_i y_j \qquad (11.4)$$
is a bilinear form.

Theorem 11.1.10 (Representation Theorem for bilinear forms). Every bilinear form f has the form (11.4) for some matrix A. ♦

Definition 11.1.11 (Nonsingular bilinear form). We say that the bilinear form f(x, y) = x^T A y is nonsingular if the matrix A is nonsingular.

Exercise 11.1.12. Is the standard dot product nonsingular?

Exercise 11.1.13.

(a) Assume F does not have characteristic 2, i. e., 1+1 6= 0 in F (R Section 14.3 for more about the characteristic of a field). Prove: if n is odd and the bilinear form f : Fn × Fn → F satisfies f(x, x) = 0 for all x ∈ Fn, then f is singular.

(b)∗ Prove this without the assumption that 1 + 1 6= 0.

(c) Over every field F, find a nonsingular bilinear form f over F^2 such that f(x, x) = 0 for all x ∈ F^2.

(d) Extend this to all even dimensions.

11.2 (F) Multivariate polynomials

Definition 11.2.1 (Multivariate monomial). A monomial in the variables x1, . . . , xn is an expression of the form
$$f = \alpha \prod_{i=1}^{n} x_i^{k_i}$$
for some nonzero scalar α and exponents ki. The degree of this monomial is
$$\deg f := \sum_{i=1}^{n} k_i . \qquad (11.5)$$

Examples 11.2.2. The following expressions are multivariate monomials of degree 4 in the variables x1, . . . , x6: x1x2^2x3, 5x4x5^3, −3x6^4.

Definition 11.2.3 (Multivariate polynomial). A multivariate polynomial is an expression which is the sum of multivariate monomials. Observe that the preceding definition includes the possibility of the empty sum, corresponding to the 0 polynomial. 11.2. (F) MULTIVARIATE POLYNOMIALS 99

Definition 11.2.4 (Monic monomial). We call the monomials of the form $\prod_{i=1}^{n} x_i^{k_i}$ monic. We define the monic part of the monomial $f = \alpha \prod_{i=1}^{n} x_i^{k_i}$ to be the monomial $\prod_{i=1}^{n} x_i^{k_i}$.

Definition 11.2.5 (Standard form of a multivariate polynomial). A multivariate polynomial is in standard form if it is expressed as a (possibly empty) sum of monomials with distinct monic parts. The empty sum of monomials is the zero polynomial, denoted by 0.

Examples 11.2.6. The following are multivariate polynomials in the variables x1, . . . , x7 (in stan- dard form).

3 (a) 3x1x3x5 + 2x2x6 − x4

3 2 (b) 4x1 + 2x5x7 + 3x1x5x7 (c) 0

Definition 11.2.7 (Degree of a multivariate polynomial). The degree of a multivariate polynomial f is the highest degree of a monomial in the standard form expression for f. The degree of the 0 polynomial1 is defined to be −∞.

Exercise 11.2.8. Let f and g be multivariate polynomials. Then

(a) deg(f + g) ≤ max{deg f, deg g} ;

(b) deg(fg) = deg f + deg g .

Note that by our convention, these rules remain valid if f or g is the zero polynomial. Definition 11.2.9 (Homogeneous multivariate polynomial). The multivariate polynomial f is a ho- mogeneous polynomial of degree k if every monomial in the standard form expression of f has degree k.

Fact 11.2.10. The 0 polynomial is a homogeneous polynomial of degree k for all k.

1This is in accordance with the natural convention that the maximum of the empty list is −∞.

Fact 11.2.11.

(a) If f and g are homogeneous polynomials of degree k, then f + g is a homogeneous polynomial of degree k.

(b) If f is a homogeneous polynomial of degree k and g is a homogeneous polynomial of degree `, then fg is a homogeneous polynomial of degree k + `.

Exercise 11.2.12. For all n and k, count the monic monomials of degree k in the variables x1, . . . , xn. Note that this is the dimension (R Def. 15.3.6) of the space of homogeneous polyno- mials of degree k.

Exercise 11.2.13. What are the homogeneous polynomials of degree 0?

Fact 11.2.14. Linear forms are exactly the homogeneous polynomials of degree 1.

In the next section we explore quadratic forms, which are the homogeneous polynomials of degree 2 (R (11.6)).

11.3 (R) Quadratic forms

In this section we restrict our attention to the field of real numbers.

Definition 11.3.1 (Quadratic form). A quadratic form is a function Q : R^n → R where Q(x) = f(x, x) for some bilinear form f.

Definition 11.3.2. Let A ∈ Mn(R) be an n × n matrix. The quadratic form associated with A, denoted QA, is defined by
$$Q_A(x) = x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} x_i x_j . \qquad (11.6)$$

Note that for all matrices A ∈ Mn(R), we have QA(0) = 0. Note further that for a symmetric matrix B, the quadratic form QB is the numerator of the Rayleigh quotient RB (R Def. 10.2.5).

Proposition 11.3.3. For all A ∈ Mn(R), there is a unique B ∈ Mn(R) such that B is a symmetric matrix and QA = QB, i. e., for all x ∈ R^n, we have

x^T A x = x^T B x .
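A minimal numerical sketch (assuming numpy; not part of the text): the symmetric matrix B = (A + A^T)/2 defines the same quadratic form as A, and is therefore the matrix whose uniqueness Proposition 11.3.3 asserts.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = (A + A.T) / 2                 # the symmetric matrix with Q_A = Q_B

x = rng.standard_normal(3)
print(np.isclose(x @ A @ x, x @ B @ x))   # True for every x
```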

Exercise 11.3.4. Find the symmetric matrix B such that

QB(x) = 3x1^2 − 7x1x2 + 2x2^2

where x = (x1, x2)^T.

Definition 11.3.5. Let Q be a quadratic form in n variables.

(a) Q is positive definite if Q(x) > 0 for all x 6= 0.

(b) Q is positive semidefinite if Q(x) ≥ 0 for all x.

(c) Q is negative definite if Q(x) < 0 for all x ≠ 0.

(d) Q is negative semidefinite if Q(x) ≤ 0 for all x.

(e) Q is indefinite if it is neither positive semidefinite nor negative semidefinite, i. e., there exist x, y ∈ Rn such that Q(x) > 0 and Q(y) < 0.

Definition 11.3.6. We say that a matrix A ∈ Mn(R) is positive definite if it is symmetric and its associated quadratic form QA is positive definite. Positive semidefinite, negative definite, negative semidefinite, and indefinite symmetric matrices are defined analogously. Notice that we shall not call a non-symmetric matrix A positive definite, etc., even if the quadratic form QA is positive definite, etc. Exercise 11.3.7. Categorize diagonal matrices according to their definiteness, i. e., tell, which diagonal matrices are positive definite, etc.

Exercise 11.3.8. Let A ∈ Mn(R). Assume QA is positive definite and λ is a real eigenvalue of A. Prove λ > 0. Note that A is not necessarily symmetric.

Corollary 11.3.9. If A is a positive definite (and therefore symmetric by definition) matrix and λ is an eigenvalue of A then λ > 0.

Proposition 11.3.10. Let A ∈ Mn(R) be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥ λn.

(a) QA is positive definite if and only if λi > 0 for all i.

(b) QA is positive semidefinite if and only if λi ≥ 0 for all i.

(c) QA is negative definite if and only if λi < 0 for all i.

(d) QA is negative semidefinite if and only if λi ≤ 0 for all i.

(e) QA is indefinite if and only if there exist i and j such that λi > 0 and λj < 0.

Proposition 11.3.11. If A ∈ Mn(R) is positive definite, then its determinant is positive. Exercise 11.3.12. Show that if A and B are symmetric n × n matrices and A ∼ B and A is positive definite, then so is B.

Definition 11.3.13 (Corner matrix). Let A = (αij)_{i,j=1}^{n} ∈ Mn(R) and define, for k = 1, . . . , n, the corner matrix A_k := (αij)_{i,j=1}^{k} to be the k × k submatrix of A obtained by taking the intersection of the first k rows of A with the first k columns of A. In particular, An = A. The k-th corner determinant of A is det Ak.

Exercise 11.3.14. Let A ∈ Mn(R) be a symmetric matrix. (a) If A is positive definite then all of its corner matrices are positive definite.

(b) If A is positive definite then all of its corner determinants are positive.

The next theorem says that (b) is actually a necessary and sufficient condition for positive definiteness.

Theorem 11.3.15. Let A ∈ Mn(R) be a symmetric matrix. A is positive definite if and only if all of its corner determinants are positive. ♦ Exercise 11.3.16. Show that the following statement is false for every n ≥ 2: The n×n symmetric matrix A is positive semidefinite if and only if all corner determinants are nonnegative.

Exercise 11.3.17. Show that the statement of Ex. 11.3.16 remains false for n ≥ 3 if in addition we require all diagonal elements to be positive.
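Here is a small numerical check (assuming numpy; not part of the text) of the positive definite case of Theorem 11.3.15 on a sample symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, -1.0,  0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0,  2.0]])

# All corner (leading principal) determinants are positive ...
corner_dets = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(corner_dets)                          # ≈ [2.0, 3.0, 4.0]

# ... and, consistently, all eigenvalues are positive.
print(np.all(np.linalg.eigvalsh(A) > 0))    # True: A is positive definite
```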

∗ ∗ ∗

In the next sequence of exercises we study the definiteness of quadratic forms QA where A is not necessarily symmetric.

Exercise 11.3.18. Let A ∈ Mn(R) be an n×n matrix such that QA is positive definite. Show that A need not have all of its corner determinants positive. Give a 2 × 2 counterexample. Contrast this with part (b) of Ex. 11.3.14.

Proposition 11.3.19. Let A ∈ Mn(R). If QA is positive definite, then so is QAT ; analogous re- sults hold for positive semidefinite, negative definite, negative semidefinite, and indefinite quadratic forms.

Exercise 11.3.20. Let A, B, C ∈ Mn(R). Assume B = C^T AC. Prove: if QA is positive semidefinite then so is QB.

Exercise 11.3.21. Let A, B ∈ Mn(R). Show that if A ∼o B (A and B are orthogonally similar, R Def. 9.2.1) and QA is positive definite then QB is positive definite.

Exercise 11.3.22. Let $A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. Prove that QA is positive definite if and only if α, δ > 0 and (β + γ)^2 < 4αδ.

Exercise 11.3.23. Let A, B ∈ Mn(R) be matrices that are not necessarily symmetric.

(a) Prove that if A ∼ B and QA is positive definite, then QB cannot be negative definite.

(b) Find 2×2 matrices A and B such that A ∼ B and QA is positive definite but QB is indefinite. (Contrast this with Ex. 11.3.12.)

11.4 (F) Geometric algebra (optional)

Let us fix a bilinear form f : Fn × Fn → F. Definition 11.4.1 (Orthogonality). Let x, y ∈ Fn. We say that x and y are orthogonal with respect to f (notation: x ⊥ y) if f(x, y) = 0. Definition 11.4.2. Let S,T ⊆ Fn. For v ∈ Fn, we say that v is orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is orthogonal to T (notation: S ⊥ T ) if s ⊥ t for all s ∈ S and t ∈ T . Definition 11.4.3. Let S ⊆ Fn. Then S⊥ (“S perp”) is the set of vectors orthogonal to S, i. e.,

S^⊥ := {v ∈ F^n | v ⊥ S} . (11.7)

Proposition 11.4.4. For all subsets S ⊆ Fn, we have S⊥ ≤ Fn.

Proposition 11.4.5. Let S ⊆ F^n. Then S ⊆ S^⊥⊥.

Exercise 11.4.6. Prove: (F^n)^⊥ = {0} if and only if f is nonsingular.

Exercise 11.4.7. Verify {0}⊥ = Fn. Exercise 11.4.8. What is ∅⊥ ?

∗ ∗ ∗

For the rest of this section we assume that f is nonsingular.

Theorem 11.4.9 (Dimensional complementarity). Let U ≤ Fn. Then dim U + dim U ⊥ = n . (11.8)

Corollary 11.4.10. Let S ⊆ F^n. Then

(S^⊥)^⊥ = span(S) . (11.9)

In particular, if U ≤ F^n then

(U^⊥)^⊥ = U . (11.10)

Definition 11.4.11 (Isotropic vector). The vector v ∈ Fn is isotropic if v 6= 0 and v ⊥ v. Exercise 11.4.12. Let A = I, i. e., f(x, y) = xT y (the standard dot product).

(a) Prove: over R, there are no isotropic vectors.

(b) Find isotropic vectors in C^2, F_5^2, and F_2^2.

(c) For what primes p is there an isotropic vector in F_p^2?

Definition 11.4.13 (Totally isotropic subspace). The subspace U ≤ Fn is totally isotropic if U ⊥ U, i. e., U ≤ U ⊥. 11.4. (F) GEOMETRIC ALGEBRA (OPTIONAL) 105

Exercise 11.4.14. If S ⊆ Fn and S ⊥ S then span(S) is a totally isotropic subspace.

Corollary 11.4.15 (to Theorem 11.4.9). If U ≤ F^n is a totally isotropic subspace then dim U ≤ ⌊n/2⌋.

Exercise 11.4.16. For even n, find an (n/2)-dimensional totally isotropic subspace in C^n, F_5^n, and F_2^n.

Exercise 11.4.17. Let F be a field and let k ≥ 2. Consider the following statement.

Stm(F, k): If U ≤ Fn is a k-dimensional subspace then U contains an isotropic vector. Prove:

(a) if k ≤ ` and Stm(F, k) is true, then Stm(F, `) is also true;

(b) Stm(F, 2) is true

(b1) for F = F2 and (b2) for F = C;

(c) Stm(F, 2) holds for all finite fields of characteristic 2;

(d) Stm(F, 2) is false for all finite fields of odd characteristic;

(e)∗ Stm(F, 3) holds for all finite fields.

Exercise 11.4.18. Prove: if Stm(F, 2) is true then every maximal totally isotropic subspace of F^n has dimension ⌊n/2⌋. In particular, this conclusion holds for F = F2 and for F = C.

Exercise 11.4.19. Prove: for all finite fields F, every maximal totally isotropic subspace of F^n has dimension ≥ n/2 − 1.

Chapter 12

(C) Complex Matrices

12.1 Complex numbers

Before beginning our discussion of matrices with entries taken from the field C, we provide a refresher of complex numbers and their properties.

Definition 12.1.1 (Complex number). A complex number is a number z ∈ C of the form z = a + bi, where a, b ∈ R and i = √−1.

Notation 12.1.2. Let z be a complex number. Then Re z denotes the real part of z and Im z denotes the imaginary part of z. In particular, if z = a + bi, then Re z = a and Im z = b.

Definition 12.1.3 (Complex conjugate). Let z = a + bi ∈ C. Then the complex conjugate of z, denoted z, is z = a − bi . (12.1)

Exercise 12.1.4. Let z1, z2 ∈ C. Show that

z1 · z2 = z1 · z2 . (12.2)

Fact 12.1.5. Let z = a + bi. Then zz = a2 + b2. In particular, zz ∈ R and zz ≥ 0.

Definition 12.1.6 (Magnitude of a complex number). Let z ∈ C. Then the magnitude, norm, or absolute value of z is |z| = √(zz̄). If |z| = 1, then z is said to have unit norm.


Proposition 12.1.7. Let z ∈ C have unit norm. Then z can be expressed in the form

z = cos θ + i sin θ (12.3)

for some θ ∈ [0, 2π).

Proposition 12.1.8. The complex number z has unit norm if and only if z̄ = z^{−1}.

Until now, we have dealt with matrices over the reals or over a general field F. We now turn our attention specifically to matrices with complex entries.

12.2 Hermitian dot product in Cn

Definition 12.2.1 (Conjugate-transpose). Let A = (αij) ∈ C^{k×ℓ}. The conjugate-transpose of A is the ℓ × k matrix A^* whose (i, j) entry is $\overline{\alpha_{ji}}$, the complex conjugate of αji. The conjugate-transpose is also called the (Hermitian) adjoint.

Example 12.2.2. The conjugate-transpose of the matrix
$$\begin{pmatrix} 2 & 3 - 4i & -2i \\ 1 + i & -5 & 6i \end{pmatrix}$$
is the matrix
$$\begin{pmatrix} 2 & 1 - i \\ 3 + 4i & -5 \\ 2i & -6i \end{pmatrix} .$$
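A short numpy illustration (not part of the text): `.conj().T` realizes the conjugate-transpose A^*, reproducing Example 12.2.2, and the identity (AB)^* = B^*A^* of Exercise 12.2.4 can be checked on random matrices.

```python
import numpy as np

A = np.array([[2, 3 - 4j, -2j],
              [1 + 1j, -5, 6j]])

print(A.conj().T)        # the 3x2 matrix of Example 12.2.2

rng = np.random.default_rng(3)
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))   # True
```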

Fact 12.2.3. Let A, B ∈ Ck×n. Then

(A + B)∗ = A∗ + B∗ . (12.4)

Exercise 12.2.4. Let A ∈ Ck×n and let B ∈ Cn×`. Show that (AB)∗ = B∗A∗.

Fact 12.2.5. Let λ ∈ C and A ∈ C^{k×n}. Then (λA)^* = λ̄A^*.

In Section 1.4, we defined the standard dot product (R Def. 1.4.1). We now define the standard Hermitian dot product for vectors in C^n.

Definition 12.2.6 (Standard Hermitian dot product). Let v, w ∈ C^n. Then the Hermitian dot product of v with w is
$$v \cdot w := v^* w = \sum_{i=1}^{n} \overline{\alpha_i} \beta_i \qquad (12.5)$$
where v = (α1, . . . , αn)^T and w = (β1, . . . , βn)^T. In particular, observe that v^*v is real and positive for all v ≠ 0. The following pair of exercises show some of the things that would go wrong if we did not conjugate.

Exercise 12.2.7. Find a nonzero vector v ∈ C^n such that v^T v = 0.

Exercise 12.2.8. Let A ∈ C^{k×n}.

(a) Show that rk(A^*A) = rk A.

(b) Find A ∈ M2(C) such that rk(A^T A) < rk(A).

The standard dot product in R^n (R Def. 1.4.1) carries with it the notions of norm and orthogonality; likewise, the standard Hermitian dot product carries these notions with it.

Definition 12.2.9 (Norm). Let v ∈ C^n. The norm of v, denoted ‖v‖, is
$$\|v\| := \sqrt{v \cdot v} . \qquad (12.6)$$
This norm is also referred to as the (complex) Euclidean norm or the ℓ2 norm.

Fact 12.2.10. If v = (α1, . . . , αn)^T, then
$$\|v\| = \sqrt{\sum_{i=1}^{n} |\alpha_i|^2} . \qquad (12.7)$$

Definition 12.2.11 (Orthogonality). The vectors v, w ∈ C^n are orthogonal (notation: v ⊥ w) if v · w = 0.

Definition 12.2.12 (Orthogonal system). An orthogonal system in C^n is a list of (pairwise) orthogonal nonzero vectors in C^n.

Exercise 12.2.13. Let S ⊆ C^n be an orthogonal system in C^n. Prove that S is linearly independent.

Definition 12.2.14 (Orthonormal system). An orthonormal system in C^n is a list of (pairwise) orthogonal vectors in C^n, all of which have unit norm. So (v1, v2, . . . ) is an orthonormal system if vi · vj = δij for all i, j.

Definition 12.2.15 (Orthonormal basis). An orthonormal basis of C^n is an orthonormal system that is a basis of C^n.

12.3 Hermitian and unitary matrices

Definition 12.3.1 (Self-adjoint matrix). The matrix A ∈ Mn(C) is said to be self-adjoint or Hermitian if A^* = A.

Example 12.3.2. The matrix $\begin{pmatrix} 8 & 2 + 5i \\ 2 - 5i & -3 \end{pmatrix}$ is self-adjoint.

Exercise 12.3.3. Which diagonal matrices are Hermitian?

Theorem 12.3.4. Let A be a Hermitian matrix. Then all eigenvalues of A are real. ♦

Exercise 12.3.5 (Alternative proof of the real Spectral Theorem). The key part of the proof of the Spectral Theorem (R Theorem 19.4.4) given in Section 19.4 is the following lemma (R Lemma 19.4.7): Let A be a symmetric real matrix. Then A has an eigenvector. Derive this lemma from Theorem 12.3.4.

Recall that a square matrix A is orthogonal (R Def. 9.1.1) if A^T A = I. Unitary matrices are the complex generalization of orthogonal matrices.

Definition 12.3.6 (Unitary matrix). The matrix A ∈ Mn(C) is unitary if A^* A = I. The set of unitary n × n matrices is denoted by U(n).

n Fact 12.3.7. A ∈ Mn(C) is unitary if and only if its columns form an orthonormal basis of C . Proposition 12.3.8. U(n) is a group under matrix multiplication (it is called the unitary group). Exercise 12.3.9. Which diagonal matrices are unitary?

Theorem 12.3.10 (Third Miracle of Linear Algebra). Let A ∈ Mn(C). Then the columns of A are orthonormal if and only if the rows of A are orthonormal. ♦ Proposition 12.3.11. Let A ∈ U(n). Then all eigenvalues of A have absolute value 1.

Exercise 12.3.12. The matrix A ∈ Mn(C) is unitary if and only if A preserves the Hermitian dot product, i. e., for all v, w ∈ Cn, we have (Av)∗(Aw) = v∗w.

Exercise 12.3.13.∗ The matrix A ∈ Mn(C) is unitary if and only if A preserves the norm, i. e., for all v ∈ Cn, we have ‖Av‖ = ‖v‖. Warning. The proof of this is trickier than in the real case (R Ex. 9.1.8).

12.4 Normal matrices and unitary similarity

Definition 12.4.1 (). A matrix A ∈ Mn(C) is normal if it commutes with its conjugate-transpose, i. e., AA∗ = A∗A.

Exercise 12.4.2. Which diagonal matrices are normal?

Definition 12.4.3 (Unitary similarity). Let A, B ∈ Mn(C). We say that A is unitarily similar to B, −1 denoted A ∼u B, if there exists a unitary matrix U such that A = U BU. Note that U −1BU = U ∗BU because U is unitary.

Proposition 12.4.4. Let A ∼u B. (a) If A is Hermitian then so is B.

(b) If A is unitary then so is B.

(c) If A is normal then so is B.

Proposition 12.4.5. Let A ∼u diag(λ1, . . . , λn). Then (a) A is normal;

(b) if all eigenvalues of A are real then A is Hermitian;

(c) if all eigenvalues of A have unit absolute value then A is unitary.

We note that all of these implications are actually “if and only if,” as we shall demonstrate (R Theorem 12.4.14).

Proposition 12.4.6. A ∈ Mn(C) has an orthonormal eigenbasis if and only if A is unitarily similar to a diagonal matrix.

We now show (Cor. 12.4.8) that this condition is equivalent to A being normal.

Lemma 12.4.7. Let A, B ∈ Mn(C) with A ∼u B. Then A is normal if and only if B is normal. ♦ Corollary 12.4.8. If A is unitarily similar to a diagonal matrix, then A is normal.

Unitary similarity is a powerful tool to study matrices, owing largely to the following theorem.

Theorem 12.4.9 (Schur). Every matrix A ∈ Mn(C) is unitarily similar to a triangular matrix. ♦

Definition 12.4.10 (Dense subset). A subset S ⊆ Fk×n is dense in Fk×n if for every A ∈ S and every ε > 0, there exists B ∈ S such that every entry of the matrix A − B has absolute value less than ε.

Proposition 12.4.11. Diagonalizable matrices are dense in Mn(C).

Exercise 12.4.12. Complete the proof of the Cayley-Hamilton Theorem over C (R Theorem 8.5.3) using Prop. 12.4.11.

We now generalize the Spectral Theorem to normal matrices.

Theorem 12.4.13 (Complex Spectral Theorem). Let A ∈ Mn(C). Then A has an orthonormal eigenbasis if and only if A is normal. ♦

Before giving the proof, let us restate the theorem.

Theorem 12.4.14 (Complex Spectral Theorem, restated). Let A ∈ Mn(C). Then A is unitarily similar to a diagonal matrix if and only if A is normal. ♦

Exercise 12.4.15. Prove that Theorems 12.4.13 and 12.4.14 are equivalent.

Exercise 12.4.16. Infer Theorem 12.4.14 from Schur’s Theorem (Theorem 12.4.9) via the following lemma.

Lemma 12.4.17. If a triangular matrix is normal, then it is diagonal. ♦

Theorem 12.4.18 (Real version of Schur’s Theorem). If A ∈ Mn(R) and all eigenvalues of A are real, then A is orthogonally similar to an upper triangular matrix. ♦

Exercise 12.4.19 (Third proof of the real Spectral Theorem). Infer the Spectral Theorem for real symmetric matrices (R Theorem 10.1.1) from the complex Spectral Theorem and the fact, separately proved, that all eigenvalues of a symmetric matrix are real (R Theorem 12.3.4).

12.5 Additional exercises

Exercise 12.5.1. Find an n × n unitary circulant matrix (R Def. 2.5.12) with no zero entries.

Definition 12.5.2 (Discrete Fourier Transform matrix). The Discrete Fourier Transform (DFT) matrix is the n × n Vandermonde matrix (R Def. 2.5.9) F generated by 1, ω, ω^2, . . . , ω^{n−1}, where ω is a primitive n-th root of unity.
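A numerical sketch (assuming numpy; not part of the text): building the DFT matrix for n = 5 and checking that (1/√n)F is unitary, the fact stated as Exercise 12.5.3 below.

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)     # a primitive n-th root of unity
F = np.array([[omega ** (j * k) for k in range(n)] for j in range(n)])

U = F / np.sqrt(n)
print(np.allclose(U.conj().T @ U, np.eye(n)))   # True: U is unitary
```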

Exercise 12.5.3. Let F be the n × n DFT matrix. Prove that (1/√n) F is unitary.

Exercise 12.5.4. Which circulant matrices are unitary?

Chapter 13

(C, R) Matrix Norms

13.1 (R) Operator norm

Definition 13.1.1 (Operator norm). The operator norm of a matrix A ∈ R^{k×n} is defined to be
$$\|A\| = \max_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} \qquad (13.1)$$
where the ‖ · ‖ notation on the right-hand side represents the Euclidean norm.
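A hedged numerical sketch (assuming numpy; not part of the text): the maximum in (13.1) can be approached by sampling, and it agrees with the largest singular value of A, a characterization that Ex. 13.1.12 below leads up to.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))

op_norm = np.linalg.norm(A, 2)          # largest singular value of A

ratios = []
for _ in range(10000):
    x = rng.standard_normal(4)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))

print(all(r <= op_norm + 1e-12 for r in ratios))   # True: no ratio exceeds ||A||
print(op_norm, max(ratios))                        # the best sampled ratio comes close to ||A||
```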

Proposition 13.1.2. Let A ∈ Rk×n. Then kAk exists.

Proposition 13.1.3. Let A ∈ Rk×n and λ ∈ R. Then kλAk = |λ|kAk.

Proposition 13.1.4 (Triangle inequality). Let A, B ∈ Rk×n. Then kA + Bk ≤ kAk + kBk.

Proposition 13.1.5 (Submultiplicativity). Let A ∈ Rk×n and B ∈ Rn×`. Then kABk ≤ kAk·kBk.

k×n Exercise 13.1.6. Let A = [a1 | · · · | an] ∈ R . (The ai are the columns of A.) Show kAk ≥ kaik for every i.

Exercise 13.1.7. Let A = (αij) ∈ R^{k×n}. Show that ‖A‖ ≥ |αij| for every i and j.

Proposition 13.1.8. If A is an orthogonal matrix then ‖A‖ = 1.


Proposition 13.1.9. Let A ∈ Rk×n. Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then kSAk = kAk = kAT k . (13.2)

Proposition 13.1.10. Let A be a symmetric real matrix with eigenvalues λ1 ≥ · · · ≥ λn. Then kAk = max |λi|.

Exercise 13.1.11. Let A ∈ R^{k×n}. Show that A^T A is positive semidefinite (R Def. 11.3.6).

Exercise 13.1.12. Let A ∈ R^{k×n} and let A^T A have eigenvalues λ1 ≥ · · · ≥ λn. Show that ‖A‖ = √λ1.

Proposition 13.1.13. For all A ∈ R^{k×n}, we have ‖A^T‖ = ‖A‖.

Exercise 13.1.14.

(a) Find a stochastic matrix (R Def. 22.1.2) of norm greater than 1.

(b) Find an n × n stochastic matrix of norm √n.

(c) Show that an n × n stochastic matrix cannot have norm greater than √n.

Numerical exercise 13.1.15.

(a) Let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$. Calculate ‖A‖.

(b) Let $B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$. Calculate ‖B‖.

13.2 (R) Frobenius norm

Definition 13.2.1 (Frobenius norm). Let A = (αij) ∈ R^{k×n} be a matrix. The Frobenius norm of A, denoted ‖A‖F, is defined as
$$\|A\|_F = \sqrt{\sum_{i,j} |\alpha_{ij}|^2} . \qquad (13.3)$$

Proposition 13.2.2. Let A ∈ R^{k×n}. Then ‖A‖F = √(Tr(A^T A)).
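A small numerical check (assuming numpy; not part of the text) of Proposition 13.2.2 and of the comparison (13.5) between the two norms proved below:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 6))

frob_direct = np.sqrt(np.sum(np.abs(A) ** 2))        # Definition 13.2.1
frob_trace = np.sqrt(np.trace(A.T @ A))              # Proposition 13.2.2
op_norm = np.linalg.norm(A, 2)                       # operator norm

print(np.isclose(frob_direct, frob_trace))               # True
print(op_norm <= frob_direct <= np.sqrt(6) * op_norm)    # True, as in (13.5)
```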

k×n Proposition 13.2.3. Let A ∈ R and λ ∈ R. Then kλAkF = |λ|kAkF.

Proposition 13.2.4 (Triangle inequality). Let A, B ∈ R^{k×n}. Then ‖A + B‖F ≤ ‖A‖F + ‖B‖F.

Proposition 13.2.5 (Submultiplicativity). Let A ∈ R^{k×n} and B ∈ R^{n×ℓ}. Then ‖AB‖F ≤ ‖A‖F · ‖B‖F.

Proposition 13.2.6. If A ∈ O(k) then ‖A‖F = √k.

Proposition 13.2.7. Let A ∈ Rk×n. Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then

kSAkF = kAkF = kAT kF . (13.4)

k×n T Proposition 13.2.8. For all A ∈ R , we have kA kF = kAkF.

Proposition 13.2.9. Let A ∈ R^{k×n}. Then

‖A‖ ≤ ‖A‖F ≤ √n ‖A‖ . (13.5)

Exercise 13.2.10. Prove ‖A‖ = ‖A‖F if and only if rk A = 1. Use the Singular Value Decomposition (R Theorem 21.1.2).

Exercise 13.2.11. Let A be a symmetric real matrix. Show that ‖A‖F = √n ‖A‖ if and only if A = λR for some reflection matrix (R ??) R.

13.3 (C) Complex Matrices

Exercise 13.3.1. Generalize the definition of the operator norm and statements 13.1.2-13.1.12 to C.

Exercise 13.3.2. Generalize the definition of the Frobenius norm and statements 13.2.2-13.2.10 to C.

Exercise 13.3.3. Let A be a normal matrix with eigenvalues λ1, . . . , λn. Show that ‖A‖F = √n ‖A‖ if and only if |λ1| = ··· = |λn|.

Question 13.3.4. Is normality necessary?

Part II

Linear Algebra of Vector Spaces

Introduction to Part II

TO BE WRITTEN.

(R Contents)

Chapter 14

(F) Preliminaries

(R Contents)

14.1 Modular arithmetic

Notation 14.1.1. The integers {..., −2, −1, 0, 1, 2,... } are denoted by Z. The set {1, 2, 3,... } of positive integers is denoted by Z+. Definition 14.1.2 (Sum of sets). Let A, B ⊆ Z. Then A + B is the set A + B = {a + b | a ∈ A, b ∈ B} . (14.1)

Definition 14.1.3 (Cartesian product). Let A and B be sets. Then the Cartesian product of A and B is the set A × B defined by

A × B = {(a, b) | a ∈ A, b ∈ B} . (14.2)

Notice that (a, b) in the above definition is an ordered pair. In particular, if a1, a2 ∈ A, then (a1, a2) and (a2, a1) are distinct elements of A × A. Notation 14.1.4 (Cardinality). For a set A we denote the cardinality of A (the number of elements of A) by |A|. For instance, |{4, 5, 6, 4}| = 3.


Exercise 14.1.5. Let A, B ⊆ Z with |A| = n and |B| = m. Show that

(a) m + n − 1 ≤ |A + B| ≤ mn

(b) $|A + A| \le \binom{n+1}{2}$

(c) $|A + A + A| \le \binom{n+2}{3}$

(d) |A × B| = mn

Definition 14.1.6 (Divisibility). Let a, b ∈ Z. Then a divides b, or a is a divisor of b (notation: a | b) if there exists some c ∈ Z such that b = ac.

Proposition 14.1.7. For all a ∈ Z, 1 | a. Proposition 14.1.8. Show that 0 | 0.

Notice that this does not violate any rules that we know about division by 0 because our definition of divisibility does not involve division.

Exercise 14.1.9. For what a do we have a | b for all b ∈ Z?

Exercise 14.1.10. For what b do we have a | b for all a ∈ Z?

Proposition 14.1.11. Let a, b, c ∈ Z. If c | a and c | b then c | a + b.

Proposition 14.1.12 (Transitivity of divisibility). Let a, b, c ∈ Z. If a | b and b | c, then a | c.

Definition 14.1.13 (Congruence modulo m). Let a, b, m ∈ Z. Then a is congruent to b modulo m (written a ≡ b (mod m)) if m | a − b.

Exercise 14.1.14. Let a, b ∈ Z. Show (a) a ≡ b (mod 1)

(b) a ≡ b (mod 2) if and only if a and b are either both even or both odd

(c) a ≡ b (mod 0) if and only if a = b

Exercise 14.1.15. Let a, b, c, d, m ∈ Z. Show that if a ≡ c (mod m) and b ≡ d (mod m), then

(a) a + b ≡ c + d (mod m)

(b) a − b ≡ c − d (mod m)

(c) ab ≡ cd (mod m)

Exercise 14.1.16. Let k ≥ 0. Show that if a ≡ b (mod m), then ak ≡ bk (mod m).

Example 14.1.17. Modular arithmetic (arithmetic modulo m) models periodic phenomena. When we teach modular arithmetic to kids (Fermat’s little Theorem is a perfectly reasonable topic for math-minded eighth-graders), we may refer to the subject as “calendar arithmetic,” the calendar being a source of periodicity. For example, if August 3 is a Wednesday, then August 24 is also a Wednesday, because 3 ≡ 24 (mod 7).

Definition 14.1.18 (Binary relation). A binary relation on the set A is a subset R ⊆ A × A. The relation R holds for the elements a, b ∈ A if (a, b) ∈ R. In this case, we write aRb or R(a, b). Definition 14.1.19 (Equivalence relation). Let ∼ be a binary relation on a set A. The relation ∼ is said to be an equivalence relation if it has the following properties. For all a, b, c ∈ A,

(a) a ∼ a (reflexivity)

(b) a ∼ b if and only if b ∼ a (symmetry)

(c) If a ∼ b and b ∼ c, then a ∼ c (transitivity)

Proposition 14.1.20. For any fixed m ∈ Z, “congruence modulo m” is an equivalence relation on Z. Definition 14.1.21 (Equivalence class). Let A be a set with an equivalence relation ∼, and let a ∈ A. The equivalence class of a with respect to ∼, denoted [a], is the set

[a] = {b ∈ A | a ∼ b} (14.3)

Definition 14.1.22 (Residue classes). The equivalence classes of the equivalence relation “congruence modulo m” in Z are called “modulo m residue classes”.

Exercise 14.1.23. (a) What are the modulo 2 residue classes? (b) Prove: If m ≥ 1 then there are m residue classes modulo m, and every integer belongs to exactly one of them.

Definition 14.1.24 (Partition). Let B1,...,Bm be non-empty subsets of the set A. We say that the set Π = {B1,...,Bm} is a partition of A if every element of A belongs to exactly one of the Bi. We call the Bi the blocks of the partition. Exercise 14.1.25 (Partition defines equivalence relation). Given a partition Π of the set A, let us define the relation ∼Π on A by letting a ∼Π b if and only if a and b belong to the same block of Π. Show that ∼Π is an equivalence relation and the blocks of Π are the equivalence classes of ∼Π. The next theorem shows that equivalence relations define partititions. Theorem 14.1.26 (Fundamental Theorem of Equivalence Relations). Let R be an equivalence relation on the set A. Show that there exists a unique partition of Π of A such that R =∼Π. ♦ The proof is contained in the following exercise. Exercise 14.1.27. Let ∼ be an equivalence relation on the set A and let a, b ∈ A. Prove: (a) a ∈ [a] ; (b) if [a] ∩ [b] 6= ∅ then [a] = [b]. Indicate how each axiom of equivalence relations is used in your proof. Example 14.1.28. Theorem 14.1.26 might as well be called the “fundamental theorem of intelli- gence.” It underlies all concept formation. Humanity discovered that there was something common in a set of three trees, three bears, three mountains, hence came the abstract concept of the number “three.” More generally, what was discovered was the equivalence relation of equicardinality (hav- ing the same number of elements: same number of trees, goats, rocks) and the concept of “number” emerged as our ancestors gave names to the equivalence classes. The notion of “having the same color” is a (somewhat fuzzy) equivalence relation; the equivalence classes are the colors, hence the notion of “color” was born. A vast number of mathematical concepts gives names to equivalence classes of important equiv- alence relations. Of course in real life (as opposed to mathematics) we always encounter fuzzy boundaries, the relations we discover are “similarity relations” as opposed to exact equivalence relations, but the basic idea is the same: we want to form clusters of “similar” objects, and the clusters themselves will represent the new concept. (Think of anything from “flavor” or “texture” to “honesty” or “creditworthiness” or the concept of a “concept.”) Much of and Ma- chine Learning tries to discover clusters and thus imitiate the concept-forming ability of human intelligence. Definition 14.1.29 (Sum of residue classes). Let [a] and [b] be residue classes modulo m. Then their sum is defined by the equation [a] + [b] = [a + b] (14.4) 122 CHAPTER 14. (F) PRELIMINARIES

Definition 14.1.30 (Product of residue classes). Let [a] and [b] be residue classes of Z modulo m. Then their product is defined by the equation [a] · [b] = [a · b] (14.5) These are two examples of “definitions by representatives.” The question arises, are such defi- nitions sound? Let us take two residue classes mod m, call them A and B. If a ∈ A and b ∈ B, we define A + B to be the residue class [a + b]. But what if we pick another pair of representatives, a0 ∈ A and b0 ∈ B. Do we get the same answer? Is it the case that [a + b] = [a0 + b0]? The positive answer is stated in the next exercise. Exercise 14.1.31. Show that the sum and product of residue classes are well defined, that is, that they do not depend on our choice of representative for each residue class.
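A small experiment in plain Python (not part of the text) illustrating Exercise 14.1.31: changing the representatives within their residue classes does not change the class of the sum or of the product.

```python
# m = 7; replace a and b by other members of their residue classes.
m = 7
a, b = 10, 24
a_prime, b_prime = a + 3 * m, b - 5 * m

print((a + b) % m == (a_prime + b_prime) % m)   # True
print((a * b) % m == (a_prime * b_prime) % m)   # True
```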

Proposition 14.1.32. Let Zm denote the set of residue classes modulo m with the operations of addition and multiplication as defined above. Verify the following properties of these operations.

(a) For all a, b ∈ Zm, there exist unique a + b ∈ Zm and a · b ∈ Zm

(b) For all a, b, c ∈ Zm we have (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c) (associativity)

(c) There exists a “additively neutral” element in Zm, which we call “zero” and denote by 0, such that for all a ∈ Zm we have 0 + a = a + 0 = a (zero element)

(d) There exists a “multiplicatively neutral” element in Zm, which we call “(multiplicative) iden- itity” and denote by 1, such that for all a ∈ Zm we have 1 · a = a · 1 = a (identity element)

(e) For each a ∈ Zm, there exists an element −a ∈ Zm such that a + (−a) = 0 (additive inverse)

(f) For all a, b ∈ Zm, we have a + b = b + a and a · b = b · a (commutativity)

(g) For all a, b, c ∈ Zm, a · (b + c) = a · b + a · c (distributivity) Further, if p is a prime number, then

(h) For all a ∈ Zp, if a ≠ 0 then there exists an element a^{−1} ∈ Zp such that a · a^{−1} = 1 (multiplicative inverse)

(R Contents)

14.2 Abelian groups

A group is a set with a binary operation satisfying certain axioms. The set Z with the operation of addition is an example of group. A class of further examples are the sets Zm of residue classes modulo m, with addition being the operation. We now generalize this notion to include a much larger collection of algebraic structures. Definition 14.2.1 (Group). A group is a set G along with a binary operation ◦ that satisfies the following axioms. (a) For all a, b ∈ G, there exists a unique element a ◦ b ∈ G

(b) For all a, b, c ∈ G,(a ◦ b) ◦ c = a ◦ (b ◦ c) (associativity)

(c) There exists a neutral element e ∈ G such that for all a ∈ G, e ◦ a = a ◦ e = a (identity element in multiplicative notation, zero in additive notation))

(d) For each a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e (inverses)

Proposition 14.2.2. The neutral (identity or zero) element of G is unique. For every a ∈ G, the inverse of a is unique.

The inverse of an element a is often written as a^{−1}, and we often write ab instead of a ◦ b. The set G along with the binary operation ◦ is the group (G, ◦). We often omit ◦ and we refer to G as the group when the binary operation is clear from context. Groups satisfying the additional axiom

(e) a ◦ b = b ◦ a for all a, b ∈ G (commutativity)

are called abelian groups. When discussing abelian groups, we often use additive notation (that is, we write the binary operation ◦ as +), and we write the neutral element as 0 and the additive inverse of an element a as −a.

Definition 14.2.3 (Order of a group). Let G be a group. The order of G is its cardinality, |G| (the number of its elements).

Observe that Prop. 14.1.32 asserted that (Zm, +) is an abelian group for every m and that (Zp^×, ·) is an abelian group when p is prime; here Zp^× = Zp \ {0}. We now present more examples of groups.

Examples 14.2.4. Show that the following are abelian groups.

(a) The integers with addition, (Z, +)

(b) The real numbers with addition, (R, +)

(c) The real numbers except for 0 with multiplication, (R×, ×)

(d) For any set Ω, the set RΩ of real functions f :Ω → R with pointwise addition, (RΩ, +), where pointwise addition is defined by the rules (f + g)(ω) = f(ω) + g(ω) for all ω ∈ Ω.

(e) The complex n-th roots of unity, that is, the set

{ω ∈ C | ω^n = 1} with multiplication.

Definition 14.2.5. In an additive abelian group, we define the sum $\sum_{i=1}^{k} a_i$ by induction. For the base case of k = 1, $\sum_{i=1}^{1} a_i = a_1$, and for k > 1,
$$\sum_{i=1}^{k} a_i = a_k + \sum_{i=1}^{k-1} a_i . \qquad (14.6)$$

Convention 14.2.6 (Empty sum). The empty sum is equal to 0, that is,
$$\sum_{i \in \emptyset} a_i = 0 . \qquad (14.7)$$
The notation $\sum_{i=1}^{0} a_i$ refers to the empty sum, so this sum is zero.

Convention 14.2.7 (Empty product). By convention, the empty product $\prod_{i \in \emptyset} a_i$ is equal to 1. In particular, 0^0 = 1 and 0! = 1. The notation $\prod_{i=1}^{0} a_i$ refers to the empty product, so this product is 1.

Exercise 14.2.8. Let G be an (additive) abelian group and let A, B ⊆ G with |A| = n and |B| = m. Show that

(a) |A + B| ≤ mn

(b) $|A + A| \le \binom{n+1}{2}$

(c) $|A + A + A| \le \binom{n+2}{3}$

Definition 14.2.9 (Subgroup). Let G be a group. If H ⊆ G is nonempty and is a group under the same operation as G, we say that H is a subgroup of G, denoted H ≤ G.

Proposition 14.2.10. Let G be a group and H ⊆ G be a subset. H is a subgroup of G if and only if

(a) The identity element of G is a member of H. (In additive notation, 0 ∈ H.)

(b) H is closed under the operation: if a, b ∈ H then ab ∈ H. (In additive notation, a + b ∈ H.)

(c) H is closed under inversion: if a ∈ H then a−1 ∈ H. (In additive notation, −a ∈ H.)

Exercise 14.2.11. The “subgroup” relation is transitive: a subgroup of a subgroup is a subgroup.

Proposition 14.2.12. The intersection of any collection of subgroups of a group G is itself a subgroup of G.

Exercise 14.2.13. Let G be a group and let H, K ≤ G. Then H ∪ K ≤ G if and only if H ⊆ K or K ⊆ H.

Proposition 14.2.14. Let G be a group and let H ≤ G. Then

(a) The identity of H is the same as the identity of G.

(b) Let a ∈ H. The inverse of a in H is the same as the inverse of a in G.

Notation 14.2.15. Let G be an abelian group, written additively. Let H,K ⊆ G. We write −H for the set −H = {−h | h ∈ H}, and H − K for the set

H − K = H + (−K) = {h − k | h ∈ H, k ∈ K} (14.8)

Proposition 14.2.16. Let G be an abelian group, written additively. Let H ⊆ G. Then H ≤ G if and only if

(a) 0 ∈ H

(b) −H ⊆ H (closed under additive inverses)

(c) H + H ⊆ H (closed under addition)

Proposition 14.2.17. Let G be an abelian group, written additively. Let H ⊆ G. Then H ≤ G if and only if H 6= ∅ and H − H ⊆ H (that is, H is closed under subtraction).

(R Contents)

14.3 Fields

Definition 14.3.1 (Field). A field F is a set with two operations, addition and multiplication, satis- fying the following axioms.

(a) (F, +) is an abelian group.

(b) (F×, ·) is an abelian group, where F× = F \{0}. (c) Distributivity holds: a(b + c) = ab + ac.

Proposition 14.3.2. For all a ∈ F we have 0 · a = a · 0 = 0 and 1 · a = a · 1 = a. Exercise 14.3.3. A field has at least 2 elements: 0 and 1; they are not equal. Why?

Exercise 14.3.4. (a) Q (rational numbers), R (real numbers), C (complex numbers) are exam- ples of fields.

(b) Z is not a field.

(c) For p a prime number, we define Fp = Zp the set of residue classes modulo p with addition and multiplication as the operations. Fp is a field.

(d) Zm is a field if and only if m is a prime number.

The field Fp has p elements; there are finite fields other than Fp as well. The number of elements of a field is referred to as the order of the field. Fields were invented by Évariste Galois (1811-1832) (along with groups and the notion of abstract algebra). Galois showed that (a) the order of every finite field is a prime power and (b) for every prime power q there exists a unique field Fq of order q. Note that Fq ≠ Zq unless q is a prime.

Exercise 14.3.5. (a) Prove that the set Q[√2] = {a + b√2 | a, b ∈ Q} is a field.

(b) Prove that the set Q[∛2] = {a + b∛2 + c∛4 | a, b, c ∈ Q} is a field.

Exercise 14.3.6. Let p be a prime. Consider the set Fp[i] of formal expressions of the form a + bi (a, b ∈ Fp) where multiplication is performed by observing the rule i^2 = −1. Determine for what primes p is Fp[i] a field. (Experiment, find a pattern, prove.) This exercise will give you an infinite number of fields of order p^2.
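In the spirit of the “experiment” step, here is a brute-force sketch in plain Python (not part of the text) that tests, for small primes p, whether every nonzero element of Fp[i] has a multiplicative inverse; finding and proving the pattern is left to the exercise.

```python
# (a + bi)(c + di) = (ac - bd) + (ad + bc)i, computed mod p with i^2 = -1.
def is_field(p):
    for a in range(p):
        for b in range(p):
            if (a, b) == (0, 0):
                continue
            has_inverse = any(
                ((a * c - b * d) % p, (a * d + b * c) % p) == (1, 0)
                for c in range(p) for d in range(p)
            )
            if not has_inverse:
                return False
    return True

for p in [2, 3, 5, 7, 11, 13, 17, 19]:
    print(p, is_field(p))
```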

(R Contents)

14.4 Polynomials

In Section 8.3, we described some basic theory of polynomials. We now develop a more formal theory of polynomials. We view a polynomial f as a formal expression whose coefficients are taken from a field F of scalars.

Definition 14.4.1 (Polynomial). A polynomial over the field F is an expression1 of the form

f = α0 + α1t + α2t^2 + ··· + αnt^n (14.9)

where the coefficients αi are scalars (elements of F), and t is a symbol. The set of all polynomials over F is denoted F[t]. Two expressions, (14.9) and

g = β0 + β1t + β2t^2 + ··· + βmt^m (14.10)

define the same polynomial if they only differ in leading zero coefficients, i. e., there is some k for which α0 = β0, . . . , αk = βk, and all coefficients αj, βj are zero for j > k. We may omit any terms with zero coefficient, e. g.,

3 + 0t + 2t^2 + 0t^3 = 3 + 2t^2 (14.11)

1Strictly speaking, a polynomial is an equivalence class of such expressions, as explained in the next sentence.

Definition 14.4.2 (Zero polynomial). The polynomial which has all coefficients equal to zero is called the zero polynomial and is denoted by 0. n Definition 14.4.3 (Leading term). The leading term of a polynomial f = α0 + α1t + ··· + αnt is k the term corresponding to the highest power of t with a nonzero coefficient, that is, the term αkt where αk 6= 0 and αj = 0 for all j > k. The zero polynomial does not have a leading term. Definition 14.4.4 (Leading coefficient). The leading coefficient of a polynomial f is the coefficient of the leading term of f. For example, the leading term of the polynomial 3 + 2t2 + 5t7 is 5t7 and the leading coefficient is 5. Definition 14.4.5 (Monic polynomial). A polynomial is monic if its leading coefficient is 1. n Definition 14.4.6 (Degree of a polynomial). The degree of a polynomial f = α0 + α1t + ··· + αnt , denoted deg f, is the exponent of its leading term. For example, deg (3 + 2t2 + 5t7) = 7. This definition does not assign a degree to the zero polynomial. Convention 14.4.7. The zero polynomial has degree −∞. In fact, this is the only sensible assignment of degree to the zero polynomial (see Ex. 14.4.16). Exercise 14.4.8. Which polynomials have degree 0?

Notation 14.4.9. We denote the set of polynomials of degree at most n over F by Pn[F].

Definition 14.4.10 (Sum and difference of polynomials). Let f = α0 + α1t + ··· + αnt^n and g = β0 + β1t + ··· + βnt^n be polynomials. Then the sum of f and g is defined as

f + g = (α0 + β0) + (α1 + β1)t + ··· + (αn + βn)t^n (14.12)

and the difference f − g is defined as

f − g = (α0 − β0) + (α1 − β1)t + ··· + (αn − βn)t^n (14.13)

Note that f and g need not be of the same degree; we can add on leading zeros if necessary.

Numerical exercise 14.4.11. Let

f = 2t + t^2
g = 3 + t + 2t^2 + 3t^3
h = 5 + t^3 + t^4

Compute the polynomials

(a) e1 = f − g

(b) e2 = g − h

(c) e3 = h − f

Self-check: verify that e1 + e2 + e3 = 0.
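These computations can be checked mechanically. The sketch below is our own illustration (not part of the text): a polynomial is stored as a Python list of coefficients, constant term first, and padding with zeros implements the remark about adding leading zeros.

    def pad(p, q):
        """Pad two coefficient lists (constant term first) to equal length."""
        n = max(len(p), len(q))
        return p + [0] * (n - len(p)), q + [0] * (n - len(q))

    def poly_add(p, q):
        p, q = pad(p, q)
        return [a + b for a, b in zip(p, q)]

    def poly_sub(p, q):
        p, q = pad(p, q)
        return [a - b for a, b in zip(p, q)]

    # f = 2t + t^2, g = 3 + t + 2t^2 + 3t^3, h = 5 + t^3 + t^4
    f = [0, 2, 1]
    g = [3, 1, 2, 3]
    h = [5, 0, 0, 1, 1]

    e1, e2, e3 = poly_sub(f, g), poly_sub(g, h), poly_sub(h, f)
    print(poly_add(poly_add(e1, e2), e3))   # all zeros, as the self-check predicts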

Proposition 14.4.12. Addition of polynomials is (a) commutative and (b) associative, that is, if f, g, h ∈ F[t] then (a) f + g = g + f

(b) f + (g + h) = (f + g) + h

Definition 14.4.13 (Multiplication of polynomials). Let f = α0 + α1t + ··· + αntⁿ and g = β0 + β1t + ··· + βmtᵐ be polynomials. Then the product of f and g is defined as

f · g = Σ_{i=0}^{n+m} ( Σ_{j+k=i} αj βk ) tⁱ .    (14.14)

Numerical exercise 14.4.14. Let f, g, and h be as in Numerical exercise 14.4.11. Compute

(a) e4 = f · g

(b) e5 = f · h

(c) e6 = f · (g + h)

Self-check: verify that e4 + e5 = e6.
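The product can be checked the same way; the double loop below implements formula (14.14) as a convolution of coefficient lists (again our own sketch, reusing poly_add, poly_sub, f, g, h from above).

    def poly_mul(p, q):
        """Product of two polynomials given as coefficient lists (formula (14.14))."""
        out = [0] * (len(p) + len(q) - 1)
        for j, a in enumerate(p):
            for k, b in enumerate(q):
                out[j + k] += a * b
        return out

    e4 = poly_mul(f, g)
    e5 = poly_mul(f, h)
    e6 = poly_mul(f, poly_add(g, h))
    print(poly_sub(poly_add(e4, e5), e6))   # all zeros: f·g + f·h = f·(g + h)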

Proposition 14.4.15. Let f, g ∈ F[t]. Then (a) deg(f + g) ≤ max{deg f, deg g} ,

(b) deg(fg) = deg f + deg g .

Note that both of these statements hold even if one of the polynomials is the zero polynomial.

Exercise 14.4.16. Show that Convention 14.4.7 is the only sensible assignment of “degree” to the zero polynomial in the following sense: no other value from Z ∪ {±∞} will preserve the validity of Prop. 14.4.15.

Notation 14.4.17 (Set of functions). Let A and B be sets. The set of functions f : B → A is denoted by A^B. Here B is the domain of the functions f in question and A is the target set or codomain (into which f maps B). The following exercise explains the reason for this notation.

Exercise 14.4.18. Let A and B be finite sets. Show that |A^B| = |A|^|B|.

Definition 14.4.19 (Substitution). For ζ ∈ F and f = α0 + α1t + ··· + αntⁿ ∈ F[t], we set

f(ζ) = α0 + α1ζ + ··· + αnζⁿ ∈ F    (14.15)

The substitution t ↦ ζ defines a mapping F[t] → F which assigns the value f(ζ) to f. We denote the F → F function ζ ↦ f(ζ) by f̃ and call f̃ a polynomial function. So f̃ is a function while f is a formal expression.

Definition 14.4.20. If A is a k × k matrix, then we define

f(A) = α0 Ik + α1A + ··· + αnAⁿ ∈ Mk(F) .    (14.16)

Definition 14.4.21 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f, or f is divisible by g, written g | f, if there exists a polynomial h ∈ F[t] such that f = gh. In this case we say that g is a divisor of f and f is a multiple of g. Notation 14.4.22 (Divisors of a polynomial). Let f ∈ F[t]. We denote by Div(f) the set of all divisors of f. We denote by Div(f, g) the set of polynomials which divide both f and g, that is, Div(f, g) = Div(f) ∩ Div(g).

Theorem 14.4.23 (Division Theorem). Let f, g ∈ F[t] where g is not the zero polynomial. Then there exist unique polynomials q and r such that

f = qg + r (14.17)

and deg r < deg g. ♦

Exercise 14.4.24. Let f ∈ F[t] and ζ ∈ F.

(a) Show that there exist q ∈ F[t] and ξ ∈ F such that f = (t − ζ)q + ξ

(b) Prove that ξ = f(ζ)
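The quotient and remainder guaranteed by the Division Theorem can be computed by the usual long-division process. The following sketch is our own illustration (not part of the text); the name poly_divmod is ours, the coefficients are taken in Q via Python's Fraction, and lists store coefficients with the constant term first.

    from fractions import Fraction

    def poly_divmod(f, g):
        """Return (q, r) with f = q*g + r and deg r < deg g.
        f, g are coefficient lists, constant term first; g must be nonzero."""
        f = [Fraction(c) for c in f]
        g = [Fraction(c) for c in g]
        while g and g[-1] == 0:          # strip leading zero coefficients of g
            g.pop()
        q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
        r = f[:]
        while len(r) >= len(g) and any(r):
            shift = len(r) - len(g)
            coeff = r[-1] / g[-1]        # divide the leading coefficients
            q[shift] = coeff
            for i, c in enumerate(g):    # subtract coeff * t^shift * g from r
                r[shift + i] -= coeff * c
            while r and r[-1] == 0:
                r.pop()
        return q, r

    # f = t^2 - 1 divided by g = t - 2: quotient t + 2, remainder 3 = f(2),
    # illustrating Exercise 14.4.24 (b).
    print(poly_divmod([-1, 0, 1], [-2, 1]))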

Exercise 14.4.25. Let f ∈ F[t].

(a) Show that if F is an infinite field, then f̃ = 0 if and only if f = 0.

(b) Let F be a finite field of order q.
(b1) Prove that there exists f ≠ 0 such that f̃ = 0.
(b2) Show that if deg f < q, then f̃ = 0 if and only if f = 0. Do not use (b3) to solve this; your solution should be just one line, based on a very simple formula defining f.
(b3) (∗) (Fermat's Little Theorem, generalized) Show that if |F| = q and f = t^q − t, then f̃ = 0.

Definition 14.4.26 (Ideal). Let I ⊆ F[t]. Then I is an ideal if the following three conditions hold. (a) 0 ∈ I

(b) I is closed under addition

(c) If f ∈ I and g ∈ F[t] then fg ∈ I

Notation 14.4.27. Let f ∈ F[t]. We denote by (f) the set of all multiples of f, i. e.,

(f) = {fg | g ∈ F[t]} .

Proposition 14.4.28. Let f, g ∈ F[t]. Then (f) ⊆ (g) if and only if g | f.

Definition 14.4.29 (Principal ideal). Let f ∈ F[t]. The set (f) is called the principal ideal generated by f, and f is said to be a generator of this ideal.

Exercise 14.4.30. For f ∈ F[t], verify that (f) is indeed an ideal of F[t].

Theorem 14.4.31 (Every ideal is principal). Every ideal of F[t] is principal. ♦

Proposition 14.4.32. Let f, g ∈ F[t]. Then (f) = (g) if and only if there exists a nonzero scalar α ∈ F such that g = αf.

Theorem 14.4.31 will be our tool to prove the existence of greatest common divisors of polynomials.

Definition 14.4.33 (Greatest common divisor). Let f1, . . . , fk, g ∈ F[t]. We say that g is a greatest common divisor (gcd) of f1, . . . , fk if

(a) g is a common divisor of the fi, i. e., g | fi for all i,

(b) g is a common multiple of all common divisors of the fi, i. e., for all e ∈ F[t], if e | fi for all i, then e | g.

Proposition 14.4.34. Let f1, . . . , fk, d, d1, d2 ∈ F[t].

(a) Let ζ be a nonzero scalar. If d is a gcd of f1, . . . , fk, then ζd is also a gcd of f1, . . . , fk.

(b) If d1 and d2 are both gcds of f1, . . . , fk, then there exists ζ ∈ F^× = F \ {0} such that d2 = ζd1.

Exercise 14.4.35. Let f1, . . . , fk ∈ F[t]. Show that gcd(f1, . . . , fk) = 0 if and only if f1 = ··· = fk = 0.

Proposition 14.4.36. Let f1, . . . , fk ∈ F[t] and suppose not all of the fi are 0. Then among all of the greatest common divisors of f1, . . . , fk, there is a unique monic polynomial.

For the sake of uniqueness of the gcd notation, we write d = gcd(f1, . . . , fk) if, in addition to (a) and (b), (c) d is monic or d = 0.

Theorem 14.4.37 (Existence of gcd). Let f1, . . . , fk ∈ F[t]. Then gcd(f1, . . . , fk) exists and, moreover, there exist polynomials g1, . . . , gk such that

gcd(f1, . . . , fk) = Σ_{i=1}^{k} fi gi    (14.18)

♦

Lemma 14.4.38 (Euclid's Lemma). Let f, g, h ∈ F[t]. Then Div(f, g) = Div(f − gh, g). ♦

Theorem 14.4.39. Let f1, . . . , fk, g ∈ F[t]. Then g is a gcd of f1, . . . , fk if and only if Div(f1, . . . , fk) = Div(g). ♦ Exercise 14.4.40. Let f ∈ F[t]. Show that Div(f, 0) = Div(f). By applying the above theorem and Euclid’s Lemma, we arrive at Euclid’s Algorithm for deter- mining the gcd of polynomials. Theorem 14.4.41 (Euclid’s Algorithm). The following algorithm can be used to determine the greatest common divisor of two polynomials f0 and g0. In each iteration of the while loop, r = f−g·q, where r and q are the polynomials guaranteed by the Division Theorem. In particular, deg r < deg g.

    f ← f0
    g ← g0
    while g ≠ 0 do
        Find q and r by the Division Theorem (f = qg + r, deg r < deg g)
        f ← g
        g ← r
    end while
    return f
♦

The following example demonstrates Euclid's algorithm.

Example 14.4.42. Let f = t⁵ + 2t⁴ − 3t³ + t² − 5t + 4 and let g = t² − 1 over the real numbers. Then

Div(f, g) = Div(f − (t³ + 2t² − 2t + 3)·g, g)
          = Div(t² − 1, −7t + 7)
          = Div(t² − 1 + (t/7)(−7t + 7), −7t + 7)
          = Div(t − 1, −7t + 7)
          = Div(t − 1 + (1/7)(−7t + 7), −7t + 7)
          = Div(0, −7t + 7)
          = Div(−7t + 7)

Thus −7t + 7 is a gcd of f and g, and we can multiply by −1/7 to get a monic polynomial. In particular, t − 1 is the gcd of f and g, and we may write gcd(f, g) = t − 1.
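Euclid's Algorithm can be run mechanically with the poly_divmod sketch given after Exercise 14.4.24 (again our own illustration, over Q via Python's Fraction):

    def poly_gcd(f, g):
        """Euclid's Algorithm on coefficient lists; returns a (not necessarily monic) gcd."""
        while any(g):
            _, r = poly_divmod(f, g)
            f, g = g, r
        return f

    f = [4, -5, 1, -3, 2, 1]        # t^5 + 2t^4 - 3t^3 + t^2 - 5t + 4
    g = [-1, 0, 1]                  # t^2 - 1
    d = poly_gcd(f, g)
    print(d)                        # a scalar multiple of t - 1
    print([c / d[-1] for c in d])   # normalized to the monic gcd: [-1, 1], i.e. t - 1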

Exercise 14.4.43. Let

f1 = t² + t − 2    (14.19)
f2 = t² + 3t + 2    (14.20)
f3 = t³ − 1    (14.21)
f4 = t⁴ − t² + t − 1    (14.22)

over the real numbers. Determine the following greatest common divisors.

(a) gcd(f1, f2)

(b) gcd(f1, f3)

(c) gcd(f1, f3, f4)

(d) gcd(f1, f2, f3, f4)

Proposition 14.4.44. Let f, g, h ∈ F[t] and d = gcd(g, h). Then gcd(fg, fh) = fd. Proposition 14.4.45. If f | gh and gcd(f, g) = 1, then f | h.

Exercise 14.4.46. Determine gcd(f, f′) where f = tⁿ + t + 1 (over R).

Exercise 14.4.47. Determine gcd(tⁿ − 1, t² + t + 1) over the real numbers.

Let f ∈ F[t] and let F be a subfield of the field G. Then we can also view f as a polynomial over G. However, this changes the set Div(f) and therefore in principle it could affect the gcd of two polynomials. We shall see that up to scalar factors nothing changes, but in order to be able to reason about this question, we temporarily use the notation gcdF(f) and gcdG(f) to denote the gcd of f with respect to the corresponding fields.

Exercise 14.4.48 (Insensitivity of gcd to field extensions). Let F be a subfield of G, and let f, g ∈ F[t]. Then gcd_F(f, g) = gcd_G(f, g).

Definition 14.4.49 (Irreducible polynomial). A polynomial f ∈ F[t] is irreducible over F if deg f ≥ 1 and for all g, h ∈ F[t], if f = gh then either deg g = 0 or deg h = 0. We shall give examples of irreducible polynomials over various fields in Examples ??

Proposition 14.4.50. If f ∈ F[t] is irreducible and ζ is a nonzero scalar, then ζf is irreducible. Proposition 14.4.51. Let f ∈ F[t] with deg f = 1. Then f is irreducible. Proposition 14.4.52. Let f be an irreducible polynomial, and let f | gh. Then either f | g or f | h. Proposition 14.4.53. Every nonzero polynomial is a product of irreducible polynomials.

Theorem 14.4.54 (Unique Factorization). Every polynomial in F[t] can be uniquely written as the product of irreducible polynomials over F. ♦

Uniqueness holds up to the order of the factors and scalar multiples, i. e., if f1 ··· fk = g1 ··· gℓ where the fi and gj are irreducible, then k = ℓ and there exists a permutation (bijection) σ : {1, . . . , k} → {1, . . . , k} and nonzero scalars αi ∈ F such that fi = αi gσ(i).

Definition 14.4.55 (Root of a polynomial). Let f ∈ F[t]. We say that ζ ∈ F is a root of f if f(ζ) = 0.

Proposition 14.4.56. Let ζ ∈ F and let f ∈ F[t]. Then t − ζ | f − f(ζ).

Corollary 14.4.57. Let ζ ∈ F and f ∈ F[t]. Then ζ is a root of f if and only if (t − ζ) | f.

Definition 14.4.58 (Multiplicity). The multiplicity of a root ζ of a polynomial f ∈ F[t] is the largest k for which (t − ζ)ᵏ | f.

Exercise 14.4.59. Let f ∈ R[t]. Show that f(√−1) = 0 if and only if t² + 1 | f.

Proposition 14.4.60. Let f be a polynomial of degree n. Then f has at most n roots (counting multiplicities).

Theorem 14.4.61 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f ≥ 1, then f has a complex root, i. e., there exists ζ ∈ C such that f(ζ) = 0. ♦

Proposition 14.4.62. If f ∈ C[t], then f is irreducible over C if and only if deg f = 1.

Proposition 14.4.63. If f ∈ C[t] and deg f = k ≥ 0, then f can be written as

f = αk · Π_{i=1}^{k} (t − ζi)    (14.23)

where αk is the leading coefficient of f and the ζi are complex numbers.

Proposition 14.4.64. Let f ∈ P2[R] be given by f = at² + bt + c with a ≠ 0. Then f is irreducible over R if and only if b² − 4ac < 0.

Exercise 14.4.65. Let f ∈ F[t].

(a) If f has a root in F and deg f ≥ 2, then f is reducible.

(b) Find a reducible polynomial over R that has no real root.

Proposition 14.4.66. Let f ∈ R[t] be of odd degree. Then f has a real root.

Proposition 14.4.67. Let f ∈ R[t] and ζ ∈ C. Then f(ζ̄) = (f(ζ))‾, i. e., f applied to the conjugate of ζ is the conjugate of f(ζ). Conclude that if ζ is a complex root of f, then so is ζ̄.

Exercise 14.4.68. Let f ∈ R[t]. Show that if f is irreducible over R, then deg f ≤ 2.

Exercise 14.4.69. Let f ∈ R[t] and f ≠ 0. Show that f can be written as f = Π gi where each gi has degree 1 or 2.

Definition 14.4.70 (Formal derivative of a polynomial). Let f = Σ_{i=0}^{n} αi tⁱ. Then the formal derivative of f is defined to be

f′ = Σ_{k=1}^{n} k αk t^{k−1}    (14.24)

That is,

f′ = α1 + 2α2t + ··· + nαn t^{n−1} .    (14.25)

Note that this definition works even over finite fields (there, k αk means the k-fold sum αk + ··· + αk). We write f⁽ᵏ⁾ to mean the k-th derivative of f, defined inductively as f⁽⁰⁾ = f and f⁽ᵏ⁺¹⁾ = (f⁽ᵏ⁾)′.

Proposition 14.4.71 (Linearity of differentiation). Let f and g be polynomials and let ζ be a scalar. Then
(a) (f + g)′ = f′ + g′ ,

(b) (ζf)′ = ζf′ .

Proposition 14.4.72 (Product rule). Let f and g be polynomials. Then

(fg)′ = f′g + fg′    (14.26)

Definition 14.4.73 (Composition of polynomials). Let f and g be polynomials. Then the composition of f with g, denoted f ◦ g is the polynomial obtained by replacing all occurrences of the symbol t in the expression for f with g, i. e., f ◦ g = f(g) (we “substitute g” into f). Proposition 14.4.74. Let f and g be polynomials and let ζ be a scalar. Then

(f ◦ g)(ζ) = f(g(ζ)) . (14.27)

Proposition 14.4.75 (Chain Rule). Let f, g ∈ F[t] and let h = f ◦ g. Then

h′ = (f′ ◦ g) · g′ .    (14.28)

Proposition 14.4.76.

(a) Let² F be Q, R, or C. Let f ∈ F[t] and let ζ ∈ F. Then (t − ζ)ᵏ | f if and only if f(ζ) = f′(ζ) = ··· = f⁽ᵏ⁻¹⁾(ζ) = 0

(b) This is false if F = Fp.

Proposition 14.4.77. Let f ∈ C[t]. Then f has no multiple roots if and only if gcd(f, f′) = 1.

Exercise 14.4.78. Let n ≥ 1. Prove that the polynomial f = tⁿ + t + 1 has no multiple roots in C.
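Proposition 14.4.77 suggests a mechanical check for Exercise 14.4.78. The following sketch is our own illustration using the sympy library; it verifies gcd(f, f′) = 1 for small n (the gcd is computed over Q, which suffices by Exercise 14.4.48).

    import sympy as sp

    t = sp.symbols('t')
    for n in range(1, 10):
        f = t**n + t + 1
        g = sp.gcd(f, sp.diff(f, t))   # gcd(f, f')
        print(n, g)                    # prints 1 for every n, so f has no multiple roots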

(R Contents)

² This exercise holds for all subfields of C and more generally for all fields of characteristic 0.

Chapter 15

(F) Vector Spaces: Basic Concepts

(R Contents)

15.1 Vector spaces

Many sets with which we are familiar have the desirable property that they are closed under "linear combinations" of their elements. Consider, for example, the set R^R of real functions f : R → R. If f, g ∈ R^R and α, β ∈ R, then αf + βg ∈ R^R. The same is true for the set C[0, 1] of continuous functions f : [0, 1] → R as well as many other sets, like the set R[t] of polynomials with real coefficients (where t denotes the variable), the set R^N of sequences of real numbers, and the 2- and 3-dimensional geometric spaces G2 and G3. We formalize the common properties of these sets, endowed with the notion of linear combination, with the concept of a vector space. Let F be a (finite or infinite) field. The reader not familiar with fields may think of F as being R or C.

Definition 15.1.1 (Vector space). A vector space V over a field F is a set V with certain operations. We refer to the elements of V as "vectors" and the elements of F as "scalars." We assume a binary operation of addition of vectors, and an operation of multiplication of a vector by a scalar, satisfying the following axioms.

(a) (V, +) is an abelian group (see Sec. 14.2). In other words,


(a1) For all u, v ∈ V there exists a unique element u + v ∈ V
(a2) Addition is commutative: for all u, v ∈ V we have u + v = v + u
(a3) Addition is associative: for all u, v, w ∈ V we have u + (v + w) = (u + v) + w
(a4) There is a zero element, denoted 0, such that for all v ∈ V we have 0 + v = v
(a5) Every element v ∈ V has an additive inverse, denoted −v, such that v + (−v) = 0.

(b) V comes with a scaling function V × F → V such that
(b1) For all α ∈ F and v ∈ V, there exists a unique vector αv ∈ V
(b2) For all α, β ∈ F and v ∈ V, (αβ)v = α(βv)
(b3) For all α, β ∈ F and v ∈ V, (α + β)v = αv + βv
(b4) For every α ∈ F and u, v ∈ V, α(u + v) = αu + αv
(b5) For every v ∈ V, 1 · v = v (normalization)

The zero vector in a vector space V is written as 0_V, but we often just write 0 when the context is clear. Property (b2) is referred to as "pseudo-associativity," because it is a form of associativity in which we are dealing with different operations (multiplication in F and scaling of vectors). Similarly, Properties (b3) and (b4) are both types of "pseudo-distributivity."

Proposition 15.1.2. Let V be a vector space over the field F. For all v ∈ V, α ∈ F we have
(a) 0v = 0.
(b) α0 = 0.
(c) αv = 0 if and only if either α = 0 or v = 0.

Exercise 15.1.3. Let V be a vector space and let x ∈ V. Show that x + x = x if and only if x = 0.

Example 15.1.4 (Elementary geometry). The most natural examples of vector spaces are the geometric spaces G2 and G3. We write G2 for the plane and G3 for the "space" familiar from elementary geometry. We think of G2 and G3 as having a special point called the origin. We view the points of G2 and G3 as "vectors" (line segments from the origin to the point). Addition is defined by the parallelogram rule and multiplication by scalars (over R) by scaling. Observe that G2 and G3 are vector spaces over R. These classical geometries form the foundation of our intuition about vector spaces.

Note that G2 is not the same as R². Vectors in G2 are directed segments (geometric objects), while the vectors in R² are pairs of numbers. The connection between these two is one of the great discoveries of the mathematics of the modern era (Descartes).

Examples 15.1.5. Show that the following are vector spaces over R.

(a) Rn

(b) Mn(R)

(c) Rk×n

(d) C[0, 1], the space of continuous real-valued functions f : [0, 1] → R

(e) The space RN of infinite sequences of real numbers

(f) The space RR of real functions f : R → R

(g) The space R[t] of polynomials in one variable with real coefficients.

(h) For all k ≥ 0, the space Pk[R] of polynomials of degree at most k with coefficients in R

(i) The space RΩ of functions f :Ω → R where Ω is an arbitrary set.

In Section 11.1, we defined the notion of a linear form over Fn (R Def. 11.1.1). This generalizes immediately to vector spaces over F.

Definition 15.1.6 (Linear form). Let V be a vector space over F. A linear form is a function f : V → F with the following properties.

(a) f(x + y) = f(x) + f(y) for all x, y ∈ V;

(b) f(λx) = λf(x) for all x ∈ V and λ ∈ F.

Definition 15.1.7 (Dual space). Let V be a vector space over F. The set of linear forms f : V → F is called the dual space of V and is denoted V ∗.

Exercise 15.1.8. Let V be a vector space over F. Show that V ∗ is also a vector space over F. In Section 1.1, we defined linear combinations of column vectors (R Def. 1.1.13). This is easily generalized to linear combinations of vectors in any vector space.

Definition 15.1.9 (Linear combination). Let V be a vector space over F, and let v1, . . . , vk ∈ V, α1, . . . , αk ∈ F. Then Σ_{i=1}^{k} αi vi is called a linear combination of the vectors v1, . . . , vk. The linear combination for which all coefficients are zero is the trivial linear combination.

Exercise 15.1.10 (Empty linear combination). What is the linear combination of the empty set? (R Convention 14.2.6 explains our convention for the empty sum.)

Exercise 15.1.11. (a) Express the polynomial t − 1 as a linear combination of the polynomials t² − 1, (t − 1)², t² − 3t + 2.

(b) Give an elegant proof that the polynomial t² + 1 cannot be expressed as a linear combination of the polynomials t² − 1, (t − 1)², t² − 3t + 2.

Exercise 15.1.12. For α ∈ R, express cos(t + α) as a linear combination of cos t and sin t.

(R Contents)

15.2 Subspaces and span

In Section 1.2, we studied subspaces and span in the context of Fk. We now generalize this to arbitrary vector spaces. For this section, we will take V to be a vector space over a field F and we will let W, S ⊆ V . Definition 15.2.1 (Subspace). W ⊆ V is a subspace (notation: W ≤ V ) if W is closed under linear combinations. Proposition 15.2.2. Let W ≤ V . Then W is also a vector space.

Exercise 15.2.3. Verify that Propositions 1.2.3 and 1.2.9 hold when Fn is replaced by V .

Exercise 15.2.4. Describe all subspaces of the geometric spaces G2 and G3.

Exercise 15.2.5. Determine which of the following are subspaces of R[t], the space of polynomials in one variable over R.

(a) {f ∈ R[t] | deg(f) = 5}

(b) {f ∈ R[t] | deg(f) ≤ 5}

(c) {f ∈ R[t] | f(1) = 1}

(d) {f ∈ R[t] | f(1) = 0}

(e) {f ∈ R[t] | f(√2) = 0}

(f) {f ∈ R[t] | f(√−1) = 0}

(g) {f ∈ R[t] | f(1) = f(2)}

(h) {f ∈ R[t] | f(1) = (f(2))²}

(i) {f ∈ R[t] | f(1)f(2) = 0}

(j) {f ∈ R[t] | f(1) = 3f(2) + 4f(3)}

(k) {f ∈ R[t] | f(1) = 3f(2) + 4f(3) + 1}

(l) {f ∈ R[t] | f(1) ≤ f(2)}

Definition 15.2.6 (Span). Let V be a vector space and let v1,..., vm ∈ V . Then the span of S = {v1,..., vm}, denoted span(v1,..., vm), is the smallest subspace of V containing S, i. e.,

(a) span S ⊇ S;

(b) span S is closed under linear combinations

(c) for every subspace W ≤ V , if S ⊆ W then span S ≤ W .

Exercise 15.2.7. Repeat the exercises of Section 1.2, replacing Fk by V .

(R Contents)

15.3 Linear independence and bases

Let V be a vector space. In Section 1.3, we defined the notion of linear independence of matrices (R Def. 1.3.5). We now generalize this to linear independence of a list (R Def. 1.3.1) of vectors in a general vector space.

Definition 15.3.1 (Linear independence). The list (v1,..., vk) of vectors in V is said to be linearly independent over F if the only linear combination equal to 0 is the trivial linear combination. The list (v1,..., vk) is linearly dependent if it is not linearly independent.

Definition 15.3.2. If a list (v1,..., vk) of vectors is linearly independent (dependent), we say that the vectors v1,..., vk are linearly independent (dependent). Definition 15.3.3. We say that a set of vectors is linearly independent if a list formed by its elements (in any order and without repetitions) is linearly independent.

Exercise 15.3.4. Show that the following sets are linearly independent over Q.
(a) {1, √2, √3}
(b) {√x | x is square-free} (an integer n is square-free if there is no perfect square k ≠ 1 such that k | n).

Exercise 15.3.5. Let U1, U2 ≤ V with U1 ∩ U2 = {0}. Let v1, . . . , vk ∈ U1 and w1, . . . , wℓ ∈ U2. If the lists (v1, . . . , vk) and (w1, . . . , wℓ) are linearly independent, then so is the list (v1, . . . , vk, w1, . . . , wℓ).

Definition 15.3.6 (Rank and dimension). The rank of a set of vectors is the maximum number of linearly independent vectors among them. For a vector space V , the dimension of V is its rank, that is, dim V = rk V .

Exercise 15.3.7.

(a) When are two vectors in G2 linearly dependent?

(b) When are two vectors in G3 linearly dependent?

(c) When are three vectors in G3 linearly dependent?

Phrase your answers in geometric terms.

Definition 15.3.8. We say that the vector w depends on the list (v1,..., vk) of vectors if w ∈ span(v1,..., vk), i. e., if w can be expressed as a linear combination of the vi.

Definition 15.3.9 (Linear independence of an infinite list). We say that an infinite list (vi | i ∈ I) (where I is an index set) is linearly independent if every finite sublist (vi | i ∈ J) (where J ⊆ I and |J| < ∞) is linearly independent.

Exercise 15.3.10. Verify that Exercises 1.3.11-1.3.26 hold in general vector spaces (replace Fn by V where necessary).

Example 15.3.11. For k = 0, 1, 2,... , let fk be a polynomial of degree k. Show that the infinite list (f0, f1, f2,... ) is linearly independent.

Exercise 15.3.12. Find three nonzero vectors in G3 that are linearly dependent but no two are parallel.

Exercise 15.3.13. Prove that for all α, β ∈ R, the functions sin(t), sin(t + α), sin(t + β) are linearly dependent (as members of the space R^R).

Exercise 15.3.14. Let α1 < α2 < ··· < αn ∈ R. Consider the vectors

v_i = (α1^i, α2^i, . . . , αn^i)ᵀ    (15.1)

for i ≥ 0 (recall the convention that α⁰ = 1 even if α = 0). Show that (v0, . . . , v_{n−1}) is linearly independent.
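One way to experiment with this exercise (our own sketch, using numpy; not part of the text) is to form the matrix whose columns are v0, . . . , v_{n−1} — a Vandermonde matrix — and check that its determinant is nonzero for distinct αi:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    alphas = np.sort(rng.uniform(-3, 3, size=n))        # distinct with probability 1
    V = np.column_stack([alphas**i for i in range(n)])  # columns v_0, ..., v_{n-1}
    print(np.linalg.det(V))   # nonzero, so the columns are linearly independent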

Exercise 15.3.15 (Moment curve). Find a continuous curve in Rn, i. e., a continuous injective function f : R → Rn, such that every set of n points on the curve is linearly independent. The simplest example is called the “moment curve,” and we bet you will find it.

Exercise 15.3.16. Let α1 < α2 < ··· < αn ∈ R, and define the degree-n polynomial

f = Π_{j=1}^{n} (t − αj)

For each 1 ≤ i ≤ n, define the polynomial gi of degree n − 1 by

g_i = f / (t − α_i) = Π_{j=1, j≠i}^{n} (t − α_j) .

Prove that the polynomials g1, . . . , gn are linearly independent. Definition 15.3.17. We say that a set S ⊆ V generates V if span(S) = V . If S generates V , then S is said to be a set of generators of V . Definition 15.3.18 (Finite-dimensional vector space). We say that the vector space V is finite di- mensional if V has a finite set of generators. A vector space which is not finite dimensional is infinite dimensional.

Exercise 15.3.19. Show that F[t] is infinite dimensional.

Definition 15.3.20 (Basis). A list e = (e1,..., ek) of vectors is a basis of V if e is linearly independent and generates V . In Section 1.3, we defined the standard basis of the space Fn (R Def. 1.3.34). However, general vector spaces do not have the notion of a standard basis. Note that a list of vectors is not the same as a set of vectors, but a list of vectors which is linearly independent necessarily has no repeated elements. Note further that lists carry with them an inherent ordering; that is, bases are ordered.

Examples 15.3.21.

(a) Show that the polynomials 1, t, t², . . . , tᵏ form a basis for Pk[F].

(b) Show that the polynomials t² + t + 1, t² − 2t + 2, t² − t − 1 form a basis of P2[F].
(c) Express the polynomial f = 1 as a linear combination of these basis vectors.

Examples 15.3.22. For each of the following sets S, describe the vectors in span(S) and give a basis for span(S).

(a) S = {t, t2} ⊆ F[t]

(b) S = {sin(t), cos(t), cos(2t), e^{it}} ⊆ C^R

(c) S = { (1, 1, 1)ᵀ, (0, 1, 0)ᵀ, (4, −7, 4)ᵀ } ⊆ F³

(d) S = { [1 0; 0 1], [3 0; 0 −5], [0 2; 0 0], [2 3; 0 −7] } ⊆ M2(F) (matrices written row by row)

(e) S = { diag(1, −3, 0), diag(0, 6, 2), diag(1, 0, 1), diag(2, 3, 3) } ⊆ M3(F)

Example 15.3.23. Show that the polynomials t², (t + 1)², and (t + 2)² form a basis for P2[F]. Express the polynomial t in terms of this basis, and write its coordinate vector.

Exercise 15.3.24. For α ∈ R, write the coordinate vector of cos(t + α) in the basis (cos t, sin t). Exercise 15.3.25. Find a basis of the 0-weight subspace of Fk (the 0-weight subspace is defined in R Ex. 1.2.7). Exercise 15.3.26.

(a) Find a basis of Mn(F).

(b) Find a basis of Mn(F) consisting of nonsingular matrices. Warning. Part (b) is easier for fields of characteristic 0 than for fields of finite characteristic.

Proposition 15.3.27. Let b = (b1, . . . , bn) be a list of vectors in V. Then b is a basis of V if and only if every vector can be uniquely expressed as a linear combination of the bi, i. e., for every v ∈ V, there exists a unique list of coefficients (α1, . . . , αn) such that v = Σ_{i=1}^{n} αi bi.

Definition 15.3.28 (Maximal linearly independent set). A linearly independent set S ⊆ V is maximal if, for all v ∈ V \ S, S ∪ {v} is not linearly independent.

Proposition 15.3.29. Let e be a list of vectors in a vector space V. Then e is a basis of V if and only if it is a maximal linearly independent list.

Proposition 15.3.30. Let V be a vector space. Then V has a basis (Zorn's lemma is needed for the infinite-dimensional case).

Proposition 15.3.31. Let e be a linearly independent list of vectors in V. Then it is possible to extend e to a basis of V, that is, there exists a basis of V which has e as a sublist.

Proposition 15.3.32. Let V be a vector space and let S ⊆ V be a set of generators of V. Then there exists a list e of vectors in S such that e is a basis of V.

Definition 15.3.33 (Coordinates). The coefficients α1, . . . , αn of Prop. 15.3.27 are called the coordinates of v with respect to the basis b.

Definition 15.3.34 (Coordinate vector). Let b = (b1, . . . , bk) be a basis of the vector space V, and let v ∈ V. Then the column vector representation of v with respect to the basis b, or the coordinatization of v with respect to b, denoted by [v]_b, is obtained by arranging the coordinates of v with respect to b in a column, i. e.,

[v]_b = (α1, α2, . . . , αk)ᵀ    (15.2)

where v = Σ_{i=1}^{k} αi bi.

(R Contents)

15.4 The First Miracle of Linear Algebra

In Section 1.3, we proved the First Miracle of Linear Algebra for Fn (R Theorem 1.3.40). This generalizes immediately to abstract vector spaces.

Theorem 15.4.1 (First Miracle of Linear Algebra). Let v1, . . . , vk be linearly independent with vi ∈ span(w1, . . . , wm) for all i. Then k ≤ m.

The proof of this theorem requires the following lemma.

Lemma 15.4.2 (Steinitz exchange lemma). Let (v1,..., vk) be a linearly independent list such that vi ∈ span(w1,..., wm) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (wj, v2,..., vk) is linearly independent. ♦ Corollary 15.4.3. Let V be a vector space. All bases of V have the same cardinality.

This is an immediate corollary to the First Miracle. The following theorem is essentially a restatement of the First Miracle of Linear Algebra.

Theorem 15.4.4.

(a) Use the First Miracle to derive the fact that rk(v1,..., vk) = dim (span(v1,..., vk)).

(b) Derive the First Miracle from the statement that rk(v1,..., vk) = dim (span(v1,..., vk)).

Exercise 15.4.5. Let V be a vector space of dimension n, and let v1,..., vn ∈ V . The following are equivalent:

(a) (v1,..., vn) is a basis of V

(b) v1,..., vn are linearly independent

(c) V = span(v1,..., vn)

Exercise 15.4.6. Show that dim (Fn) = n.

Exercise 15.4.7. Show that dim(F^{k×n}) = kn.

Exercise 15.4.8. Show that dim(Pk) = k + 1, where Pk is the space of polynomials of degree at most k.

Exercise 15.4.9. What is the dimension of the subspace of R[t] consisting of polynomials f of degree at most n such that f(√−1) = 0?

Exercise 15.4.10. Show that, if f is a polynomial of degree n, then (f(t), f(t + 1), . . . , f(t + n)) is a basis of Pn[F].

Exercise 15.4.11. Show that any list of polynomials, one of each degree 0, . . . , n, forms a basis of Pn[F].

Proposition 15.4.12. Let V be an n-dimensional vector space with subspaces U1,U2 such that U1 ∩ U2 = {0}. Then dim U1 + dim U2 ≤ n . (15.3)

Proposition 15.4.13 (Modular equation). Let V be a vector space, and let U1,U2 ≤ V . Then

dim(U1 + U2) + dim(U1 ∩ U2) = dim U1 + dim U2 . (15.4)

Exercise 15.4.14. Let A = (αij) ∈ R^{n×n}, and assume the columns of A are linearly independent. Prove that it is always possible to change the value of an entry in the first row so that the columns of A become linearly dependent.

Exercise 15.4.15. Call a sequence (a0, a1, a2, . . . ) "Fibonacci-like" if for all n, a_{n+2} = a_{n+1} + a_n.
(a) Prove that Fibonacci-like sequences form a 2-dimensional vector space.

(b) Find a basis for the space of Fibonacci-like sequences.

(c) Express the Fibonacci sequence (0, 1, 1, 2, 3, 5, 8, . . . ) as a linear combination of these basis vectors. (A computational sketch for parts (b) and (c) follows Proposition 15.4.18 below.)

♥ Exercise 15.4.16. Let f be a polynomial. Prove that f has a multiple g = f · h ≠ 0 in which every exponent is prime, i. e.,

g = Σ_{p prime} αp tᵖ    (15.5)

for some coefficients αp.

Definition 15.4.17 (Elementary operations). The three types of elementary operations defined as "elementary column operations" in Sec. 3.2 can be applied to any list of vectors.

Proposition 15.4.18. Performing elementary operations does not change the rank of a set of vectors.
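The following sketch (our own illustration, not part of the text) relates to Exercise 15.4.15: the two geometric sequences with ratios φ = (1 + √5)/2 and ψ = (1 − √5)/2 are Fibonacci-like, and the Fibonacci sequence is a linear combination of them. This exhibits one possible basis; the exercise asks you to find and justify your own.

    import numpy as np

    phi = (1 + np.sqrt(5)) / 2
    psi = (1 - np.sqrt(5)) / 2

    def fib_like(a0, a1, n):
        """First n terms of the Fibonacci-like sequence with given initial terms."""
        seq = [a0, a1]
        while len(seq) < n:
            seq.append(seq[-1] + seq[-2])
        return np.array(seq)

    n = 10
    b1 = phi ** np.arange(n)      # (1, phi, phi^2, ...) is Fibonacci-like since phi^2 = phi + 1
    b2 = psi ** np.arange(n)      # likewise for psi
    fib = fib_like(0, 1, n)

    # Solve for the coefficients of fib in the pair (b1, b2) using the first two terms.
    c = np.linalg.solve(np.array([[1, 1], [phi, psi]]), np.array([0, 1]))
    print(np.allclose(c[0] * b1 + c[1] * b2, fib))   # True: fib = (b1 - b2)/sqrt(5)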

(R Contents)

15.5 Direct sums

Definition 15.5.1 (Disjoint subspaces). Two subspaces U1,U2 ≤ V are disjoint if U1 ∩ U2 = {0}.

Definition 15.5.2 (Direct sum). Let V be a vector space and let U1, U2 ≤ V. We say that V is the direct sum of U1 and U2 (notation: V = U1 ⊕ U2) if V = U1 + U2 and U1 ∩ U2 = {0}. More generally, we say that V is the direct sum of subspaces U1, . . . , Uk ≤ V (notation: V = U1 ⊕ · · · ⊕ Uk) if V = U1 + · · · + Uk and for all i,

Ui ∩ ( Σ_{j≠i} Uj ) = {0} .    (15.6)

Proposition 15.5.3. Let V be a vector space and let U1,...,Uk ≤ V . Then

dim(U1 ⊕ · · · ⊕ Uk) = Σ_{i=1}^{k} dim Ui .    (15.7)

The following exercise shows that this could actually be used as the definition of the direct sum in finite-dimensional spaces, but that statement in fact holds in infinite-dimensional spaces as well.

Proposition 15.5.4. Let V be a finite-dimensional vector space, and let U1,...,Uk ≤ V , with

dim( Σ_{i=1}^{k} Ui ) = Σ_{i=1}^{k} dim Ui .

Then the sum is direct, i. e., Σ_{i=1}^{k} Ui = U1 ⊕ · · · ⊕ Uk .

Proposition 15.5.5. Let V be a vector space and let U1, . . . , Uk ≤ V. Then W = Σ_{i=1}^{k} Ui is a direct sum if and only if for every choice of k vectors ui (i = 1, . . . , k) where ui ∈ Ui \ {0}, the vectors u1, . . . , uk are linearly independent.

We note that the notion of direct sum extends the notion of linear independence to subspaces.

Proposition 15.5.6. The vectors v1,..., vk are linearly independent if and only if

Σ_{i=1}^{k} span(vi) = ⊕_{i=1}^{k} span(vi) .    (15.8)

(R Contents) Chapter 16

(F) Linear Maps

(R Contents)

16.1 Linear map basics

Definition 16.1.1 (Linear map). Let V and W be vector spaces over the same field F. A function ϕ : V → W is called a linear map or homomorphism if for all v, w ∈ V and α ∈ F (a) ϕ(αv) = αϕ(v)

(b) ϕ(v + w) = ϕ(v) + ϕ(w)

Exercise 16.1.2. Show that if ϕ : V → W is a linear map, then ϕ(0V ) = 0W . Proposition 16.1.3. Linear maps preserve linear combinations, i. e.,

ϕ( Σ_{i=1}^{k} αi vi ) = Σ_{i=1}^{k} αi ϕ(vi)    (16.1)

Exercise 16.1.4. True or false?

(a) If the vectors v1, . . . , vk are linearly independent, then ϕ(v1), . . . , ϕ(vk) are linearly independent.


(b) If the vectors v1, . . . , vk are linearly dependent, then ϕ(v1), . . . , ϕ(vk) are linearly dependent.

Example 16.1.5. Our prime examples of linear maps are those defined by matrix multiplication. Every matrix A ∈ F^{k×n} defines a linear map ϕ_A : Fⁿ → Fᵏ by x ↦ Ax.

Example 16.1.6. The projection of R³ onto R² defined by

(α1, α2, α3)ᵀ ↦ (α1, α2)ᵀ    (16.2)

is a linear map (verify!).

Example 16.1.7. The map f ↦ ∫₀¹ f(t) dt is a linear map from C[0, 1] to R (verify!).

Example 16.1.8. Differentiation d/dt is a linear map d/dt : Pn[F] → Pn−1[F] from the space of polynomials of degree ≤ n to the space of polynomials of degree ≤ n − 1 (verify!).

Example 16.1.9. The map ϕ : Pn[R] → Rᵏ defined by

f ↦ (f(α1), f(α2), . . . , f(αk))ᵀ    (16.3)

where α1 < α2 < ··· < αk are fixed real numbers, is a linear map (verify!).

Example 16.1.10. Let us fix n + 1 distinct real numbers α0 < α1 < ··· < αn. Let f ∈ R^R be a real function. Interpolation is the map f ↦ L(f) ∈ Pn(R) where L(f) = p is the unique polynomial of degree ≤ n with the property that p(αi) = f(αi) (i = 0, . . . , n). This is a linear map from R^R to Pn(R) (verify!).

Example 16.1.11. The map ϕ : Pn(R) → Tn = span{1, cos w, cos 2w, . . . , cos nw} defined by

ϕ(α0 + α1t + α2t² + ··· + αntⁿ) = α0 + α1 cos w + α2 cos 2w + ··· + αn cos nw    (16.4)

is a linear map from polynomials to trigonometric polynomials (verify!).

Notation 16.1.12. The set of linear maps ϕ : V → W is denoted by Hom(V, W).

Fact 16.1.13. Hom(V, W) is a subspace of the function space W^V.

Definition 16.1.14 (Composition of linear maps). Let U, V, and W be vector spaces, and let ϕ : U → V and ψ : V → W be linear maps. Then the composition of ψ with ϕ, denoted by ψ ◦ ϕ or ψϕ, is the map η : U → W defined by

η(v) := ψ(ϕ(v)) (16.5)

FIGURE: U --ϕ--> V --ψ--> W

Proposition 16.1.15. Let U, V, and W be vector spaces and let ϕ : U → V and ψ : V → W be linear maps. Then ψ ◦ ϕ is a linear map.

Linear maps are uniquely determined by their action on a basis, and we are free to choose this action arbitrarily. This is more formally expressed in the following theorem.

Theorem 16.1.16 (Degree of freedom of linear maps). Let V and W be vector spaces with e = (e1, . . . , ek) a basis of V, and w1, . . . , wk arbitrary vectors in W. Then there exists a unique linear map ϕ : V → W such that ϕ(ei) = wi for 1 ≤ i ≤ k. ♦

Exercise 16.1.17. Show that dim(Hom(V, W)) = dim(V) · dim(W).

In Section 15.3 we represented vectors by the column vectors of their coordinates with respect to a given basis. As the next step in translating geometric objects to tables of numbers, we assign matrices to linear maps relative to given bases in the domain and the target space. Our key tool for this endeavor is Theorem 16.1.16.

(R Contents)

16.2 Isomorphisms

Let V and W be vector spaces over the same field.

Definition 16.2.1 (Isomorphism). A linear map ϕ ∈ Hom(V, W) is said to be an isomorphism if it is a bijection. If there exists an isomorphism between V and W, then V and W are said to be isomorphic. The circumstance that "V is isomorphic to W" is denoted V ∼= W.

Fact 16.2.2. The inverse of an isomorphism is an isomorphism. Fact 16.2.3. Isomorphisms preserve linear independence and, moreover, map bases to bases.

Fact 16.2.4. Let V and W be vector spaces, let ϕ : V → W be an isomorphism, and let v1,..., vk ∈ V . Then rk(v1,..., vk) = rk(ϕ(v1), . . . , ϕ(vk)) Exercise 16.2.5. Show that ∼= is an equivalence relation, that is, for vector spaces U, V , and W : (a) V ∼= V (reflexive)

(b) If V ∼= W then W ∼= V (symmetric)

(c) If U ∼= V and V ∼= W then U ∼= W (transitive)

Exercise 16.2.6. Let V be an n-dimensional vector space over F. Show that V ∼= Fⁿ.

Proposition 16.2.7. Two vector spaces over the same field are isomorphic if and only if they have the same dimension.

(R Contents)

16.3 The Rank-Nullity Theorem

Definition 16.3.1 (Image). The image of a linear map ϕ, denoted im(ϕ), is

im(ϕ) := {ϕ(v) | v ∈ V } (16.6)

Definition 16.3.2 (Kernel). The kernel of a linear map ϕ, denoted ker(ϕ), is

ker(ϕ) := {v ∈ V | ϕ(v) = 0_W} = ϕ⁻¹(0_W)    (16.7)

Proposition 16.3.3. Let ϕ : V → W be a linear map. Then im(ϕ) ≤ W and ker(ϕ) ≤ V.

Definition 16.3.4 (Rank of a linear map). The rank of a linear map ϕ is defined as

rk(ϕ) := dim(im(ϕ)) .

Definition 16.3.5 (Nullity). The nullity of a linear map ϕ is defined as

nullity(ϕ) := dim(ker(ϕ)) .

Exercise 16.3.6. Let ϕ : V → W be a linear map. (a) Show that dim(im ϕ) ≤ dim V .

(b) Use this to reprove rk(AB) ≤ min{rk(A), rk(B)}. Theorem 16.3.7 (Rank-Nullity Theorem). Let ϕ : V → W be a linear map. Then

rk(ϕ) + nullity(ϕ) = dim(V ) . (16.8)

♦

Exercise 16.3.8. Let n = k + ℓ. Find a linear map ϕ : Rⁿ → Rⁿ which has rank k, and therefore has nullity ℓ.

Examples 16.3.9. Find the rank and nullity of each of the linear maps in Examples 16.1.6-16.1.11.

(R Contents)

16.4 Linear transformations

Let V be a vector space over the field F. In Section 16.1, we defined the notion of a linear map. We now restrict ourselves to those maps which map a vector space into itself. Definition 16.4.1 (Linear transformation). If ϕ : V → V is a linear map, then ϕ is said to be a linear transformation or a linear operator. Definition 16.4.2 (Identity transformation). The identity transformation of V is the transformation id : V → V defined by id(v) = v for all v ∈ V . Definition 16.4.3 (Scalar transformation). The scalar transformations of V are the transformations of the form v 7→ αv for a fixed α ∈ F. In other words, a scalar transformation can be written as α · id. 16.4. LINEAR TRANSFORMATIONS 157

Examples 16.4.4. The following are linear transformations of the 2-dimensional geometric space G2 (verify!). (a) Rotation about the origin by an angle θ

(b) Reflection about any line passing through the origin

(c) Shearing parallel to a line (tilting the deck)

(d) Scaling by α in one direction

FIGURES

Examples 16.4.5. The following are linear transformations of the 3-dimensional geometric space G3 (verify!). (a) Rotation by θ about any line

(b) Projection into any plane

(c) Reflection through any plane passing through the origin

(d) Central reflection

Example 16.4.6. Let Sα : R^R → R^R be the "left shift by α" operator, defined by Sα(f) = g where

g(t) := f(t + α) .    (16.9)

Sα is a linear transformation (verify!).

Examples 16.4.7. The following are linear transformations of Pn[F] (verify!).

(a) Differentiation d/dt

(b) Left shift by α, f ↦ Sα(f)
(c) Multiplication by t, followed by differentiation, i. e., the map f ↦ (tf)′

Examples 16.4.8. The following are linear transformations of the space R^N of infinite sequences of real numbers (verify!).

(a) Left shift, (α0, α1, α2,... ) 7→ (α1, α2, α3,... )

(b) Difference, (α0, α1, α2,... ) 7→ (α1 − α0, α2 − α1, α3 − α2,... ) Example 16.4.9. Let V = span(sin t, cos t). Differentiation is a linear transformation of V (verify!).

Exercise 16.4.10. The difference operator ∆ : Pk[F] → Pk[F], defined by

∆f := Sα(f) − f is linear, and deg(∆f) = deg(f) − 1 if deg(f) ≥ 1.

16.4.1 Eigenvectors, eigenvalues, eigenspaces In Chapter 8, we discussed the notion of eigenvectors and eigenvalues of square matrices. This is easily generalized to eigenvectors and eigenvalues of linear transformations. Definition 16.4.11 (Eigenvector). Let ϕ : V → V be a linear transformation. Then v ∈ V is an eigenvector of ϕ if v 6= 0 and there exists λ ∈ F such that ϕ(v) = λv. In this case we say that v is an eigenvector to eigenvalue λ. Definition 16.4.12 (Eigenvalue). Let ϕ : V → V be a linear transformation. Then λ ∈ F is an eigenvalue of ϕ if there exists a nonzero vector v ∈ V such that ϕ(v) = λv. Compare these definitions with those in Chapter 8. In particular, if A is a square matrix, verify that its eigenvectors and eigenvalues are the same as those of the linear transformation ϕA associated with A (R Example 16.1.5), i. e., the map x 7→ Ax.

Exercise 16.4.13. Let ϕ : V → V be a linear transformation and let v1 and v2 be eigenvectors to distinct eigenvalues. Then v1 + v2 is not an eigenvector. Exercise 16.4.14. Let ϕ : V → V be a linear transformation such that every nonzero vector is an eigenvector. Then ϕ is a scalar transformation.

Exercise 16.4.15. Let ϕ : V → V be a linear transformation and let v1, . . . , vk be eigenvectors to distinct eigenvalues. Then the vi are linearly independent.

Definition 16.4.16 (Eigenspace). Let V be a vector space and let ϕ : V → V be a linear transfor- mation. We denote by Uλ the set

Uλ := {v ∈ V | ϕ(v) = λv} . (16.10)

This set is called the eigenspace corresponding to the eigenvalue λ.

Exercise 16.4.17. Let ϕ : V → V be a linear transformation. Show that, for all λ ∈ F, Uλ is a subspace of V .

Proposition 16.4.18. Let ϕ : V → V be a linear transformation. Then

Σ_λ Uλ = ⊕_λ Uλ ,    (16.11)

where ⊕ represents the direct sum (R Def. 15.5.2) and the summation is over all eigenvalues.

Examples 16.4.19. Determine the rank, nullity, eigenvalues (and their geometric multiplicities), and eigenvectors of each of the transformations in Examples 16.4.4-16.4.10.

Definition 16.4.20 (Eigenbasis). Let ϕ : V → V be a linear transformation. An eigenbasis of ϕ is a basis of V consisting of eigenvectors of ϕ.

Exercise 16.4.21.

16.4.2 Invariant subspaces Definition 16.4.22 (Invariant subspace). Let ϕ : V → V and let W ≤ V . Then W is a ϕ-invariant subspace of V if, for all w ∈ W , we have ϕ(w) ∈ W . For any linear transformation ϕ, the trivial invariant subspace is the subspace {0}.

Exercise 16.4.23. Let ϕ : G3 → G3 be a rotation about the vertical axis through the origin. What are the ϕ-invariant subspaces?

Exercise 16.4.24. Let π : G3 → G3 be the projection onto the horizontal plane. What are the π-invariant subspaces?

Exercise 16.4.25.

(a) What are the invariant subspaces of id? (b) What are the invariant subspaces of the 0 transformation? Exercise 16.4.26. Let ϕ : V → V be a linear transformation and let λ be an eigenvalue of ϕ. What are the invariant subspaces of ϕ + λ id ?

Exercise 16.4.27. Over every field F, find an infinite-dimensional vector space V and linear trans- formation ϕ : V → V that has no finite-dimensional invariant subspaces other than {0}. Proposition 16.4.28. Let ϕ : V → V be a linear transformation. Then ker ϕ and im ϕ are invariant subspaces.

Proposition 16.4.29. Let ϕ : V → V be a linear transformations and let W1,W2 ≤ V be ϕ- invariant subspaces. Then

(a) W1 + W2 is a ϕ-invariant subspace;

(b) W1 ∩ W2 is a ϕ-invariant subspace. Definition 16.4.30 (Restriction to a subspace). Let ϕ : V → V be a linear map and let W ≤ V be a ϕ-invariant subspace of V . The restriction of ϕ to W , denoted ϕW , is the linear map ϕW : W → W defined by ϕW (w) = ϕ(w) for all w ∈ W . Proposition 16.4.31. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant subspace. Let f ∈ F[t]. Then f(ϕW ) = f(ϕ)W . (16.12) Proposition 16.4.32. Let ϕ : V → V be a linear transformation. The following are equivalent. (a) ϕ is a scalar transformation; (b) all subspaces of V are ϕ-invariant; (c) all 1-dimensional subspaces of V are ϕ-invariant; (d) all hyperplanes (R Def. 5.2.1) are ϕ-invariant.

Exercise 16.4.33. Let S be the shift operator (R Example 16.4.8 (a)) on the space RN of sequences of real numbers, defined by

S(α0, α1, α2, . . . ) := (α1, α2, α3, . . . ) .    (16.13)

(a) In Ex. 15.4.15, we defined the space of Fibonacci-like sequences. Show that this is an S-invariant subspace of R^N.
(b) Find an eigenbasis of S in this subspace.
(c) Use the result of part (b) to find an explicit formula for the n-th Fibonacci number.

Definition 16.4.34 (Minimal invariant subspace). Let ϕ : V → V be a linear transformation. Then U ≤ V is a minimal invariant subspace of ϕ if the only invariant subspace properly contained in U is {0}.

Exercise 16.4.35. Let ρ be the n × n matrix

ρ = [ 0 0 ··· 0 1 ]
    [ 1 0 ··· 0 0 ]
    [ 0 1 ··· 0 0 ]
    [ :  :  ⋱ :  : ]
    [ 0 0 ··· 1 0 ]

(a) Count the invariant subspaces of ρ over C.
(b) Find a 2-dimensional minimal invariant subspace of ρ over R.
(c) Count the invariant subspaces of ρ over Q if n is prime.

Definition 16.4.36. Let ϕ : V → V be a linear transformation. Then the n-th power of ϕ is defined as

ϕⁿ := ϕ ◦ · · · ◦ ϕ  (n times) .    (16.14)

Definition 16.4.37. Let ϕ : V → V be a linear transformation and let f ∈ F[t], say

f = α0 + α1t + ··· + αntⁿ .

Then we define f(ϕ) by

f(ϕ) := α0 · id + α1ϕ + ··· + αnϕⁿ .    (16.15)

In particular,

f(ϕ)(v) = α0v + α1ϕ(v) + ··· + αnϕⁿ(v) .    (16.16)

Exercise 16.4.38. Let ϕ : V → V be a linear transformation and let f ∈ F[t]. Verify that f(ϕ) is a linear transformation of V.

Definition 16.4.39 (Chain of invariant subspaces). Let ϕ : V → V be a linear transformation, and let U1,...,Uk ≤ V be ϕ-invariant subspaces. We say that the Ui form a chain if whenever i 6= j, either Ui ≤ Uj or Uj ≤ Ui.

Definition 16.4.40 (Maximal chain). Let ϕ : V → V be a linear transformation and let U1, . . . , Uk ≤ V be a chain of ϕ-invariant subspaces. We say that this chain is maximal if, for all W ≤ V (W ≠ Ui for all i), the Ui together with W do not form a chain.

Exercise 16.4.41. Let d/dt : Pn(R) → Pn(R) be the derivative linear transformation (R Def. 14.4.70) of the space of real polynomials of degree at most n.

(a) Prove that the number of d/dt-invariant subspaces is n + 2;
(b) give a very simple description of each;

(c) prove that they form a maximal chain.

♥ Exercise 16.4.42. Let V = Fp^{Fp} (the space of functions Fp → Fp), and let ϕ be the shift-by-1 operator, i. e., ϕ(f)(t) = f(t + 1). What are the invariant subspaces of ϕ? Prove that they form a maximal chain.

Proposition 16.4.43. Let ϕ : V → V be a linear transformation, and let f ∈ F[t]. Then ker f(ϕ) and im f(ϕ) are invariant subspaces.

The next exercise shows that invariant subspaces generalize the notion of eigenvectors.

Exercise 16.4.44. Let v ∈ V be a nonzero vector and let ϕ : V → V be a linear transformation. Then v is an eigenvector of ϕ if and only if span(v) is ϕ-invariant.

Proposition 16.4.45. Let V be a vector space over R and let ϕ : V → V be a linear transformation. Then ϕ has an invariant subspace of dimension at most 2.

Proposition 16.4.46. Let V be an n-dimensional vector space and let ϕ : V → V be a linear transformation. The following are equivalent.

(a) There exists a maximal chain of subspaces, all of which are invariant.

(b) There is a basis b of V such that [ϕ]b is triangular.

Proposition 16.4.47. Let V be a finite-dimensional vector space and let ϕ : V → V be a linear transformation.

(a) Let b be a basis of V. Then [ϕ]b is triangular if and only if every initial segment of b spans a ϕ-invariant subspace of V.

(b) If such a basis exists, then such an orthonormal basis exists.

Proposition 16.4.48. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V be a linear transformation. Then [ϕ]b is diagonal if and only if b is an eigenbasis of ϕ. Exercise 16.4.49. Infer Schur’s Theorem (R Theorem 12.4.9) and the real version of Schur’s Theorem (R Theorem 12.4.18) from the preceding exercise.

(R Contents)

16.5 Coordinatization

In Section 15.3, we defined coordinatization of a vector with respect to some basis (R Def. 15.3.34). We now extend this to the notion of coordinatization of linear maps.

Definition 16.5.1 (Coordinatization). Let V be an n-dimensional vector space with basis e = (e1, . . . , en), let W be an m-dimensional vector space with basis f = (f1, . . . , fm), and let ϕ : V → W be a linear map. Let αij (1 ≤ i ≤ m, 1 ≤ j ≤ n) be coefficients such that ϕ(ej) = Σ_{i=1}^{m} αij fi. Then the matrix representation or coordinatization of ϕ with respect to the bases e and f is the m × n matrix

[ϕ]e,f := [ α11 ··· α1n ]
          [  :   ⋱   :  ]
          [ αm1 ··· αmn ]    (16.17)

If ϕ : V → V is a linear transformation, then we write [ϕ]e instead of [ϕ]e,e.

So the j-th column of [ϕ]e,f is [ϕ(ej)]f , the coordinate vector of the image of the j-th basis vector of V in the basis f of W .

Exercise 16.5.2. Write the matrix representation of each of the linear transformations in Ex. 16.4.4 in a basis consisting of perpendicular unit vectors.

Exercise 16.5.3. Compute the matrix of the “rotation by θ” transformation with respect to the basis of two unit vectors at an angle θ.

Exercise 16.5.4. In the preceding two exercises, we computed two matrices representing the "rotation by θ" transformation, corresponding to the two different bases considered. Compare (a) the traces and (b) the determinants (R Prop. 6.2.3 (a)) of these two matrices.
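As a numerical illustration of Exercises 16.5.3-16.5.4 (our own sketch, not part of the text), the code below expresses the rotation by θ both in an orthonormal basis and in the basis of two unit vectors at angle θ; the latter matrix is obtained by a similarity transformation, anticipating Proposition 16.6.7. The traces and determinants agree.

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation in an orthonormal basis

    # Basis of Exercise 16.5.3: two unit vectors at angle theta (columns, in orthonormal coordinates).
    B = np.array([[1.0, np.cos(theta)],
                  [0.0, np.sin(theta)]])

    R_skew = np.linalg.inv(B) @ R @ B                 # the same rotation, expressed in the new basis
    print(np.round(R_skew, 3))                        # [[0, -1], [1, 2cos(theta)]]
    print(np.trace(R), np.trace(R_skew))              # equal traces
    print(np.linalg.det(R), np.linalg.det(R_skew))    # equal determinants (= 1)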

Example 16.5.5. Write the matrix representation of each of the linear transformations in Ex. 16.4.5 in the basis consisting of three mutually perpendicular unit vectors.

Example 16.5.6. Write the matrix representation of each of the linear transformations in Ex. 16.4.7 in the basis (1, t, t², . . . , tⁿ) of the polynomial space Pn(F). The next exercise demonstrates that under our rules of coordinatization, the action of a linear map corresponds to multiplying a column vector by a matrix.

Proposition 16.5.7. For any v ∈ V , if ϕ : V → W is a linear map, then

[ϕ(v)]f = [ϕ]e,f [v]e

Moreover, coordinatization treats eigenvectors the way we would expect.

Proposition 16.5.8. Let V be a vector space with basis b and let ϕ : V → V be a linear transfor- mation. Write A = [ϕ]b. Then (a) A and ϕ have the same eigenvalues

(b) v ∈ V is an eigenvector of ϕ with eigenvalue λ if and only if [v]b is an eigenvector of A with eigenvalue λ

The next exercise shows that under our coordinatization, composition of linear maps (R Def. 16.1.14) corresponds to matrix multiplication. This gives a natural explanation of why we multiply matrices the way we do, and why this operation is associative.

Proposition 16.5.9. Let U, V , and W be vector spaces with bases e, f, and g, respectively, and let ϕ : U → V and ψ : V → W be linear maps. Then

[ψϕ]e,g = [ψ]f,g [ϕ]e,f    (16.18)

Exercise 16.5.10. Explain the comment before the preceding exercise. Exercise 16.5.11. Let V and W be vector spaces, with dim V = n and dim W = k. Infer from Prop. 16.5.9 that coordinatization is an isomorphism between Hom(V,W ) and Fk×n.

Exercise 16.5.12. Let ρθ be the “rotation by θ” linear map in G2.

(a) Show that, for α, β ∈ R, ρα+β = ρα ◦ ρβ. (b) Use matrix multiplication to derive the addition formulas for sin and cos.

Example 16.5.13. Let V be an n-dimensional vector space with basis b, and let A ∈ F^{k×n}. Define ϕ : V → Fᵏ by x ↦ A[x]b.
(a) Show that ϕ is a linear map.

(b) Show that rk ϕ = rk A.

Definition 16.5.14. Let V be a finite-dimensional vector space over R and let b be a basis of V. Let ϕ : V → V be a nonsingular linear transformation. We say that ϕ is sense-preserving if det[ϕ]b > 0 and ϕ is sense-reversing if det[ϕ]b < 0.

Proposition 16.5.15.
(a) The sense-preserving linear transformations of the plane are rotations.
(b) The sense-reversing transformations of the plane are reflections about a line through the origin.

Exercise 16.5.16. What linear transformations of G2 fix the origin?

Definition 16.5.17 (Rotational reflection). A rotational reflection of G3 is a rotation followed by a central reflection. Proposition 16.5.18. (a) The sense-preserving linear transformations of 3-dimensional space are rotations about an axis. (b) The sense-reversing linear transformations of 3-dimensional space are rotational reflections.

(R Contents)

16.6 Change of basis

We now have the equipment necessary to discuss change of basis transformations and the result of change of basis on the matrix representation of linear maps.

Definition 16.6.1 (Change of basis transformation). Let V be a vector space with bases e = (e1, . . . , en) and e′ = (e′1, . . . , e′n). Then the change of basis transformation (from e to e′) is the linear transformation σ : V → V given by σ(ei) = e′i, for 1 ≤ i ≤ n.

Proposition 16.6.2. Let V be a vector space with bases e and e′, and let σ : V → V be the change of basis transformation from e to e′. Then [σ]e = [σ]e′. For this reason, we often denote the matrix representation of the change of basis transformation σ by [σ] rather than by, e. g., [σ]e.

Fact 16.6.3. Let σ : V → V be a change of basis transformation. Then [σ] is invertible.

Notation 16.6.4. Let V be a vector space with bases e and e′. When changing basis from e to e′, we sometimes refer to e as the "old" basis and e′ as the "new" basis. So if v ∈ V is a vector, we often write [v]old in place of [v]e and [v]new in place of [v]e′. Likewise, if W is a vector space with bases f and f′ and we change bases from f to f′, we consider f the "old" basis and f′ the "new" basis. So

if ϕ : V → W is a linear map, we write [ϕ]old in place of [ϕ]e,f and [ϕ]new in place of [ϕ]e′,f′.

Proposition 16.6.5. Let v ∈ V and let e and e′ be bases of V. Let σ be the change of basis transformation from e to e′. Then

[v]new = [σ]⁻¹ [v]old .    (16.19)

Numerical exercise 16.6.6. For each of the following vector spaces V, compute the change of basis matrix from e to e′. Self-check: pick some v ∈ V, determine [v]e and [v]e′, and verify that Equation (16.19) holds.

(a) V = G2, e = (e1, e2) is a pair of perpendicular unit vectors, and e′ = (e′1, e′2), where e′i is ei rotated by θ

(b) V = P2[F], e = (1, t, t²), e′ = (t², (t + 1)², (t + 2)²)

(c) V = F³, e = ( (1, 1, 1)ᵀ, (2, 0, −1)ᵀ, (0, 1, 2)ᵀ ), e′ = ( (1, 0, 1)ᵀ, (−1, 1, 1)ᵀ, (0, 1, 1)ᵀ )
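The self-check can be automated for part (c). The sketch below is our own illustration (not part of the text; it uses the vectors listed in part (c) above as we read them). The change of basis matrix [σ] has as its j-th column the coordinates of e′_j in the basis e.

    import numpy as np

    E = np.array([[1, 2, 0],
                  [1, 0, 1],
                  [1, -1, 2]], dtype=float)       # columns: the old basis e (standard coordinates)
    E_new = np.array([[1, -1, 0],
                      [0, 1, 1],
                      [1, 1, 1]], dtype=float)    # columns: the new basis e'

    sigma = np.linalg.solve(E, E_new)             # [sigma]: j-th column is [e'_j]_e

    v = np.array([3.0, -1.0, 2.0])                # an arbitrary test vector (standard coordinates)
    v_old = np.linalg.solve(E, v)                 # [v]_e
    v_new = np.linalg.solve(E_new, v)             # [v]_{e'}

    print(np.allclose(np.linalg.solve(sigma, v_old), v_new))   # True: [v]_new = [sigma]^{-1} [v]_old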

Just as coordinates change with respect to different bases, so do the matrix representations of linear maps.

Proposition 16.6.7. Let V and W be finite dimensional vector spaces, let e and e′ be bases of V, let f and f′ be bases of W, and let ϕ : V → W be a linear map. Then

[ϕ]new = T⁻¹ [ϕ]old S    (16.20)

where S is the change of basis matrix from e to e′ and T is the change of basis matrix from f to f′.

Proposition 16.6.8. Let N be a nilpotent matrix. Then the linear transformation defined by x 7→ Nx has a chain of invariant subspaces.

Proposition 16.6.9. Every matrix A ∈ Mn(C) is similar to a triangular matrix. Chapter 17

(F) Block Matrices (optional)

17.1 Block matrix basics

Definition 17.1.1 (Block matrix). We sometimes write matrices as block matrices, where each entry actually represents a matrix.

Example 17.1.2. The matrix

A = [ 1  2 −3  4 ]
    [ 2  1  6 −1 ]
    [ 0 −3 −1  2 ]

may be symbolically written as the block matrix

A = [ A11 A12 ]
    [ A21 A22 ]

where A11 = [1 2; 2 1], A12 = [−3 4; 6 −1], A21 = (0, −3), and A22 = (−1, 2).

Definition 17.1.3 (Block-diagonal matrix). A square matrix A is said to be a block-diagonal matrix if it can be written in the form A = diag(A1, . . . , Ak), with the square blocks A1, . . . , Ak on the diagonal and zeros everywhere else. In this case, we say that A is the diagonal sum of the matrices A1, . . . , Ak.


Example 17.1.4. The matrix

A = [  1  2  0  0  0  0 ]
    [ −3  7  0  0  0  0 ]
    [  0  0  6  0  0  0 ]
    [  0  0  0  4  6  0 ]
    [  0  0  0  2 −1  3 ]
    [  0  0  0  3  2  5 ]

is a block-diagonal matrix. We may write A = diag(A1, A2, A3) where

A1 = [ 1 2; −3 7 ],  A2 = (6),  and  A3 = [ 4 6 0; 2 −1 3; 3 2 5 ].

Note that the blocks need not be of the same size.

Definition 17.1.5 (Block-triangular matrix). A square matrix A is said to be a block-triangular matrix if it can be written in the form

A = [ A11 A12 ··· A1k ]
    [  0  A22 ··· A2k ]
    [  :   :  ⋱   :  ]
    [  0   0 ··· Akk ]

for blocks Aij, where the diagonal blocks Aii are square matrices.

Exercise 17.1.6. Show that block matrices multiply in the same way as matrices. More precisely, suppose that A = (Aij) and B = (Bjk) are block matrices where Aij ∈ F^{ri×sj} and Bjk ∈ F^{sj×tk}. Let C = AB (why is this product defined?). Show that C = (Cik) where Cik ∈ F^{ri×tk} and

Cik = Σ_j Aij Bjk .    (17.1)

17.2 Arithmetic of block-diagonal and block-triangular matrices

Proposition 17.2.1. Let A = diag(A1,...,An) and B = diag(B1,...,Bn) be block-diagonal ma- trices with blocks of the same size and let λ ∈ F. Then

A + B = diag(A1 + B1,...,An + Bn) (17.2)

λA = diag(λA1, . . . , λAn) (17.3)

AB = diag(A1B1,...,AnBn) . (17.4)

Proposition 17.2.2. Let A = diag(A1, . . . , An) be a block-diagonal matrix. Then A^k = diag(A1^k, . . . , An^k) for all k.

Proposition 17.2.3. Let f ∈ F[t] be a polynomial and let A = diag(A1,...,An) be a block-diagonal matrix. Then

f(A) = diag(f(A1), . . . , f(An)) . (17.5)
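Propositions 17.2.2 and 17.2.3 can be illustrated numerically. The Python/numpy sketch below (not part of the text) uses the blocks A1 and A2 of Example 17.1.4 and an arbitrarily chosen polynomial f(t) = t² − 3t + 2; the helper name diag_sum is illustrative.

```python
import numpy as np

def diag_sum(*blocks):
    """Diagonal sum (block-diagonal matrix) of the given square blocks."""
    n = sum(b.shape[0] for b in blocks)
    A = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        A[i:i + k, i:i + k] = b
        i += k
    return A

def f(M):
    """f(t) = t^2 - 3t + 2 evaluated at the square matrix M."""
    return M @ M - 3 * M + 2 * np.eye(M.shape[0])

A1 = np.array([[1.0, 2.0], [-3.0, 7.0]])
A2 = np.array([[6.0]])
A = diag_sum(A1, A2)

# Proposition 17.2.2: A^3 = diag(A1^3, A2^3)
assert np.allclose(np.linalg.matrix_power(A, 3),
                   diag_sum(np.linalg.matrix_power(A1, 3),
                            np.linalg.matrix_power(A2, 3)))

# Proposition 17.2.3, Equation (17.5): f(A) = diag(f(A1), f(A2))
assert np.allclose(f(A), diag_sum(f(A1), f(A2)))
```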

In our discussion of the arithmetic of block-triangular matrices, we are interested only in the diagonal blocks.

Proposition 17.2.4. Let A and B be block-upper triangular matrices with diagonal blocks A1, . . . , An and B1, . . . , Bn, respectively, with blocks of the same size, and let λ ∈ F. Then A + B, λA, and AB are again block-upper triangular, with diagonal blocks

A1 + B1, . . . , An + Bn ,    (17.6)

λA1, . . . , λAn ,    (17.7)

A1B1, . . . , AnBn ,    (17.8)

respectively (the blocks above the diagonal are unspecified).

Proposition 17.2.5. Let A be as in Prop. 17.2.4. Then

A^k is block-upper triangular with diagonal blocks A1^k, . . . , An^k    (17.9)

for all k.

Proposition 17.2.6. Let f ∈ F[t] be a polynomial and let A be as in Prop. 17.2.4. Then f(A) is block-upper triangular with diagonal blocks f(A1), . . . , f(An) .    (17.10)

Chapter 18

(F) Minimal Polynomials of Matrices and Linear Transformations (optional)

18.1 The minimal polynomial

All matrices in this section are square. In Section 8.5, we defined what it means to plug a matrix into a polynomial (R Def. 2.3.3).

Definition 18.1.1 (Annihilation). Let f ∈ F[t]. We say that f annihilates the matrix A ∈ Mn(F) if f(A) = 0.

Exercise 18.1.2. Prove, without using the Cayley-Hamilton Theorem, that for every matrix there is a nonzero polynomial that annihilates it, i. e., for every matrix A there is a nonzero polynomial f such that f(A) = 0. Show that such a polynomial of degree at most n² exists.
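The idea behind Exercise 18.1.2 is that the n² + 1 matrices I, A, A², . . . , A^{n²} live in the n²-dimensional space of n × n matrices, so they must be linearly dependent; any dependence yields an annihilating polynomial. The Python/numpy sketch below (purely numerical, not part of the text; the function name is illustrative) finds such a dependence from the null space of the matrix whose columns are the vectorized powers of A.

```python
import numpy as np

def annihilating_poly(A):
    """Coefficients c_0, ..., c_d (lowest degree first, d <= n^2) of a nonzero
    polynomial with c_0 I + c_1 A + ... + c_d A^d = 0."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for _ in range(n * n):
        powers.append(powers[-1] @ A)
    M = np.column_stack([P.reshape(-1) for P in powers])  # n^2 x (n^2 + 1)
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]          # a right singular vector spanning the null space of M

A = np.array([[0.0, 1.0], [-2.0, 3.0]])
c = annihilating_poly(A)
value = sum(ci * np.linalg.matrix_power(A, i) for i, ci in enumerate(c))
assert np.allclose(value, 0, atol=1e-8)      # the polynomial annihilates A
```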

Definition 18.1.3 (Minimal polynomial). Let A ∈ Mn(F). The polynomial f ∈ F[t] is a minimal polynomial of A if f is a nonzero polynomial of lowest degree that annihilates A.

Example 18.1.4. The polynomial t − 1 is a minimal polynomial of the n × n identity matrix.

Exercise 18.1.5. Let A be the diagonal matrix diag(3, 3, 7, 7, 7), i. e., the 5 × 5 matrix with diagonal entries 3, 3, 7, 7, 7 and zeros elsewhere.


Find a minimal polynomial for A. Compare your answer to the characteristic polynomial fA. Recall (R Prop. 2.3.4) that for all f ∈ F[t], we have

f(diag(λ1, . . . , λn)) = diag(f(λ1), . . . , f(λn)) .
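For Exercise 18.1.5, the displayed identity makes candidates easy to test: a polynomial annihilates A = diag(3, 3, 7, 7, 7) exactly when it vanishes at 3 and at 7, and no degree-1 polynomial can do this. A small Python/numpy check (an illustration, not a proof):

```python
import numpy as np

A = np.diag([3.0, 3.0, 7.0, 7.0, 7.0])
I = np.eye(5)

g_of_A = (A - 3 * I) @ (A - 7 * I)      # g(A) for g(t) = (t - 3)(t - 7)
assert np.allclose(g_of_A, 0)           # g annihilates A

# For comparison, the characteristic polynomial f_A = (t - 3)^2 (t - 7)^3
# has degree 5; np.poly returns its coefficients (highest degree first).
print(np.poly(A))
```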

Proposition 18.1.6. Let A ∈ Mn(F) and let m be a minimal polynomial of A. Then for all g ∈ F[t], we have g(A) = 0 if and only if m | g.

Corollary 18.1.7. Let A ∈ Mn(F). Then the minimal polynomial of A is unique up to nonzero scalar factors. Convention 18.1.8. When discussing “the” minimal polynomial of a matrix, we refer to the unique monic minimal polynomial, denoted mA. Corollary 18.1.9 (Cayley-Hamilton restated). The minimal polynomial of a matrix divides its characteristic polynomial.

Corollary 18.1.10. Let A ∈ Mn(F). Then deg mA ≤ n.

Example 18.1.11. mI = t − 1.

Exercise 18.1.12. Let A = [ 1 1 ; 0 1 ]. Prove that mA = (t − 1)².

Exercise 18.1.13. Find two 2 × 2 matrices with the same characteristic polynomial but different minimal polynomials.

Exercise 18.1.14. Let

A = diag(λ1, . . . , λ1, λ2, . . . , λ2, . . . , λk, . . . , λk) where the λi are distinct. Prove that

mA = ∏_{i=1}^{k} (t − λi) .    (18.1)

Exercise 18.1.15. Let A be a block-diagonal matrix, say A = diag(A1, . . . , Ak). Give a simple expression of mA in terms of the mAi. (Answer: mA = lcm_i mAi.)

Proposition 18.1.16. Let A ∈ Mn(F). Then λ is an eigenvalue of A if and only if mA(λ) = 0. Exercise 18.1.17. Prove: similar matrices have the same minimal polynomial, i. e., if A ∼ B then mA = mB.

Proposition 18.1.18. Let A ∈ Mn(C). If mA does not have multiple roots then A is diagonalizable. In fact, this condition is necessary and sufficient (R Theorem 18.2.20).

18.2 Minimal polynomials of linear transformations

In this section, V is an n-dimensional vector space over F. In Section 16.4.2 we defined what it means to plug a linear transformation into a polynomial (R Def. 16.4.37).

Exercise 18.2.1. Define what it means for the polynomial f ∈ F[t] to annihilate the linear trans- formation ϕ : V → V .

Exercise 18.2.2. Let ϕ : V → V be a linear transformation. Prove that there is a nonzero polynomial that annihilates ϕ.

Exercise 18.2.3. Define a minimal polynomial of a linear transformation ϕ : V → V .

Example 18.2.4. The polynomial t − 1 is a minimal polynomial of the identity transformation.

Proposition 18.2.5. Let ϕ : V → V be a linear transformation and let m be a minimal polynomial of ϕ. Then for all g ∈ F[t], we have g(ϕ) = 0 if and only if m | g. Corollary 18.2.6. Let ϕ : V → V be a linear transformation. Then the minimal polynomial of ϕ is unique up to nonzero scalar factors.

Convention 18.2.7. When discussing “the” minimal polynomial of a linear transformation, we shall refer to the unique monic minimal polynomial, denoted mϕ.

Proposition 18.2.8. Let ϕ : V → V be a linear transformation. Then deg mϕ ≤ n.

Example 18.2.9. mid = t − 1.

Exercise 18.2.10. Let ϕ : V → V be a linear transformation with an eigenbasis. Prove: all roots of mϕ belong to F and mϕ has no multiple roots. Let b1,..., bn be an eigenbasis, and let ϕ(bi) = λibi for each i. Find mϕ in terms of the λi. Proposition 18.2.11. Let ϕ : V → V be a linear transformation. Then λ is an eigenvalue of ϕ if and only if mϕ(λ) = 0. Exercise 18.2.12 (Consistency of translation).

(a) Let b be a basis of V and let ϕ : V → V be a linear transformation. Let A = [ϕ]b. Show that mϕ = mA. (b) Use this to give a second proof of Ex. 18.1.17 (similar matrices have the same minimal polynomial).

Recall the definition of an invariant subspace of the linear transformation ϕ (R Def. 16.4.22). Definition 18.2.13 (Minimal invariant subspace). Let ϕ : V → V and let W ≤ V . We say that W is a minimal ϕ-invariant subspace if W is a ϕ-invariant subspace, W 6= {0}, and the only ϕ-invariant subspaces of W are W and {0}.

Lemma 18.2.14. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant subspace. Let ϕW denote the restriction of ϕ to W . Then mϕW | mϕ. ♦

Proposition 18.2.15. Let V = W1 ⊕ · · · ⊕ Wk where the Wi are ϕ-invariant subspaces and ⊕ denotes their direct sum (R Def. 15.5.2). If mi denotes the minimal polynomial of ϕWi then

mϕ = lcm_i mi .    (18.2)

(a) Prove this without using Ex. 18.1.15, i. e., without translating the problem to matrices.

(b) Prove this using Ex. 18.1.15.

Theorem 18.2.16. Let V be a finite-dimensional vector space and let ϕ : V → V be a linear transformation. If mϕ has an irreducible factor f of degree d, then there is a d-dimensional minimal invariant subspace W such that

mϕW = fϕW = f .    (18.3) ♦

Corollary 18.2.17. Let V be a finite-dimensional vector space over R and let ϕ : V → V be a linear transformation. Then V has a ϕ-invariant subspace of dimension 1 or 2.

Proposition 18.2.18. Let ϕ : V → V be a linear transformation. Let f | mϕ and let W = ker f(ϕ). Then

(a) mϕW | f;

(b) if gcd(f, mϕ/f) = 1, then mϕW = f.

(a) f(ϕW) = f(ϕ)|W = 0. (b) mϕ = lcm(mϕW, mϕW′), where W′ = ker (mϕ/f)(ϕ), i. e., V = W ⊕ W′.

Proposition 18.2.19. Let ϕ : V → V be a linear transformation and let f, g ∈ F[t] with mϕ = f · g and gcd(f, g) = 1. Then

V = (ker f(ϕ)) ⊕ (ker g(ϕ))    (18.4)

Theorem 18.2.20. Let A ∈ Mn(C). Then A is diagonalizable if and only if mA does not have multiple roots over C. ♦ Theorem 18.2.21. Let F = C and let ϕ : V → V be a linear transformation. Then ϕ has an eigenbasis if and only if mϕ does not have multiple roots. ♦ The next theorem represents a step toward canonical forms of matrices.

Theorem 18.2.22. Let A ∈ Mn(F). Then there is a matrix B ∈ Mn(F) such that A ∼ B and B is the diagonal sum of matrices whose minimal polynomials are powers of irreducible polynomials. ♦

Chapter 19

(R) Euclidean Spaces

19.1 Inner products

Let V be a vector space over R. In Section 1.4, we introduced the standard dot product of Rn (R Def. 1.4.1). We now generalize this to the notion of an inner product over a real vector space V .

Definition 19.1.1 (Euclidean space). A Euclidean space V is a vector space over R endowed with an inner product h·, ·i : V × V → R that is bilinear, symmetric, and positive definite. We explain these terms.

(a) To all u, v ∈ V , a unique real number denoted by hu, vi is assigned. The number hu, vi is called the inner product of u and v.

(b) For all u, v, w ∈ V we have hu + v, wi = hu, wi + hv, wi

(c) For all u, w ∈ V and all α ∈ R we have hαu, wi = αhu, wi. (Items (b) and (c) express that the inner product is a linear function of its first argument, for any fixed value of the second argument.)

(d) For all u, v, w ∈ V we have hu, v + wi = hu, vi + hu, wi.

(e) For all u, w ∈ V and all α ∈ R we have hu, αwi = αhu, wi. (Items (d) and (e) express that the inner product is a linear function of its second argument, for any fixed value of the first argument. So items (b)-(e) justify the term bilinear.)


(f) For all v, w ∈ V we have hv, wi = hw, vi. (The inner product is symmetric. Given this assumption, (d) and (e) can be dropped.)

(g) For all v ∈ V , if v ≠ 0 then hv, vi > 0. (The inner product is positive definite.)

Observe that the standard dot product of Rn that was introduced in Section 1.4 has all of these properties and, in particular, the vector space Rn endowed with the standard dot product as the inner product is a Euclidean space.

Exercise 19.1.2. Let V be a Euclidean space with inner product h·, ·i. Show that for all v ∈ V , we have hv, 0i = h0, vi = 0 . (19.1) Note in particular that hv, vi ≥ 0 for all v ∈ V .

Examples 19.1.3. The following vector spaces together with the specified inner products are Euclidean spaces (verify this).

(a) The vectors in our 2-dimensional and 3-dimensional geometries with the scalar product defined as v · w = |v| · |w| · cos(θ) where θ is the angle between the vectors v and w.

(b) V = Rn, with hx, yi = xT y (the standard dot product).

(c) V = C[0, 1] (the space of continuous functions f : [0, 1] → R) with the inner product hf, gi = ∫₀¹ f(t)g(t) dt .

(d) V = R[t] (the space of polynomials with real coefficients) with the inner product hf, gi = ∫_{−∞}^{∞} f(t)g(t)e^{−t²/2} dt . More generally, let ρ(t) be any nonnegative continuous function that is not identically 0 and has the property that ∫_{−∞}^{∞} ρ(t)t^{2n} dt < ∞ for all nonnegative integers n (such a function ρ is called a weight function). Then R[t] is a Euclidean space with the inner product hf, gi = ∫_{−∞}^{∞} f(t)g(t)ρ(t) dt .

(e) V = R^{k×n}, hA, Bi = Tr(AB^T)

(f) V = R^n, and hx, yi = x^T Ay where A ∈ Mn(R) is a symmetric positive definite (R Def. 10.2.3) n × n real matrix

(g) Let (Ω, P) be a probability space and let V be the space of random variables with zero expected value and finite variance. We consider two random variables equal if they are equal with probability 1. The covariance of the random variables X, Y is defined as Cov(X, Y) = E(XY) − E(X)E(Y). With covariance as the inner product, V is a Euclidean space.

Notice that the same vector space can be endowed with different inner products (for example, different weight functions for the inner product on R[t]).

Because they have inner products, Euclidean spaces carry with them the notion of distance (“norm”) and the notion of two vectors being perpendicular (“orthogonality”). Just as inner products generalize the standard dot product in Rn, these concepts generalize the definitions of norm and orthogonality presented for Rn (with respect to the standard dot product) in Section 1.4.

Definition 19.1.4 (Norm). Let V be a Euclidean space, and let v ∈ V . Then the norm of v, denoted kvk, is

kvk := √hv, vi .    (19.2)

The notion of a norm allows us to easily define the distance between two vectors.

Definition 19.1.5. Let V be a Euclidean space, and let v, w ∈ V . Then the distance between the vectors v and w, denoted d(v, w), is

d(v, w) := kv − wk . (19.3)

The following two theorems show that distance in Euclidean spaces behaves the way we are used to it behaving in Rn. Theorem 19.1.6 (Cauchy-Schwarz inequality). Let V be a Euclidean space, and let v, w ∈ V . Then |hv, wi| ≤ kvk · kwk . (19.4)

♦ Theorem 19.1.7 (Triangle inequality). Let V be a Euclidean space, and let v, w ∈ V . Then

kv + wk ≤ kvk + kwk . (19.5)

♦ Exercise 19.1.8. Show that the triangle inequality is equivalent to the Cauchy-Schwarz inequality.

Definition 19.1.9 (Angle between vectors). Let V be a Euclidean space, and let v, w ∈ V . The angle θ between v and w is defined by

θ := arccos ( hv, wi / (kvk · kwk) ) .    (19.6)

Observe that from the Cauchy-Schwarz inequality, we have (for v, w ≠ 0) that

−1 ≤ hv, wi / (kvk · kwk) ≤ 1    (19.7)

so this definition is valid, as this term is always in the domain of arccos.

Definition 19.1.10 (Orthogonality). Let V be a Euclidean space, and let v, w ∈ V . Then v and w are orthogonal (denoted v ⊥ w) if hv, wi = 0. A set of vectors S = {v1,..., vn} is said to be orthogonal if vi ⊥ vj whenever i ≠ j.

Observe that two vectors are orthogonal when the angle between them is π/2. This agrees with the notion of orthogonality being a generalization of perpendicularity.

Exercise 19.1.11. Let V be a Euclidean space. What vectors are orthogonal to every vector?

Exercise 19.1.12. Let V = C[0, 2π] be the space of continuous functions f : [0, 2π] → R, endowed with the inner product

hf, gi = ∫₀^{2π} f(t)g(t) dt .    (19.8)

Show that the set {1, cos t, sin t, cos(2t), sin(2t), cos(3t), . . . } is an orthogonal set in this Euclidean space.
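Before proving Exercise 19.1.12, one can probe it numerically. The Python/numpy sketch below (not part of the text; the helper name ip is illustrative) approximates the inner product (19.8) by the trapezoid rule and checks that a few distinct members of the set are orthogonal up to numerical error.

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 20001)

def ip(f, g):
    """Trapezoid-rule approximation of <f, g> = integral of f(t) g(t) over [0, 2 pi]."""
    y = f(t) * g(t)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

funcs = [lambda x: np.ones_like(x), np.cos, np.sin,
         lambda x: np.cos(2 * x), lambda x: np.sin(2 * x)]

for i in range(len(funcs)):
    for j in range(i + 1, len(funcs)):
        assert abs(ip(funcs[i], funcs[j])) < 1e-6   # pairwise orthogonal
```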

Definition 19.1.13 (Orthogonal system). An orthogonal system in a Euclidean space V is a list of (pairwise) orthogonal nonzero vectors in V .

Proposition 19.1.14. Every orthogonal system in a Euclidean space is linearly independent.

Definition 19.1.15 (). Let V be a Euclidean space, and let v1,..., vk ∈ V . The Gram matrix of v1,..., vk is the k × k matrix whose (i, j) entry is hvi, vji, that is,

G = G(v1, . . . , vk) := (hvi, vji)_{i,j=1}^{k} .    (19.9)

Exercise 19.1.16. Let V be a Euclidean space. Show that the vectors v1,..., vk ∈ V are linearly independent if and only if det G(v1,..., vk) ≠ 0.

Exercise 19.1.17. Let V be a Euclidean space and let v1,..., vk ∈ V . Show

rk(v1,..., vk) = rk(G(v1,..., vk)) . (19.10)

Definition 19.1.18 (Orthonormal system). An orthonormal system in a Euclidean space V is a list of (pairwise) orthogonal vectors in V , all of which have unit norm. So (v1, v2,... ) is an orthonormal system if hvi, vji = δij for all i, j. In the case of finite-dimensional Euclidean spaces, we are particularly interested in orthonormal bases.

Definition 19.1.19 (Orthonormal basis). Let V be a Euclidean space. A list (v1,..., vn) is an orthonormal basis (ONB) of V if it is a basis and {v1,..., vn} is an orthonormal set. In the next section, not only will we show that every finite-dimensional Euclidean space has an orthonormal basis, but we will, moreover, demonstrate an algorithm for transforming any basis into an orthonormal basis.

Proposition 19.1.20. Let V be a Euclidean space with orthonormal basis b. Then for all v, w ∈ V ,

hv, wi = [v]b^T [w]b .    (19.11)

Proposition 19.1.21. Let V be a Euclidean space. Every linear form f : V → R (R Def. 15.1.6) can be written as f(x) = ha, xi (19.12) for a unique a ∈ V .

19.2 Gram-Schmidt orthogonalization

The main result of this section is the following theorem.

Theorem 19.2.1. Every finite-dimensional Euclidean space has an orthonormal basis. In fact, every orthonormal system extends to an orthonormal basis. ♦

Before we prove this theorem, we will first develop Gram-Schmidt orthogonalization, an online procedure that takes a list of vectors as input and produces a list of orthogonal vectors satisfying certain conditions. We formalize this below.

Theorem 19.2.2 (Gram-Schmidt orthogonalization). Let V be a Euclidean space and let v1,..., vk ∈ V . For i = 1, . . . , k, let Ui = span(v1,..., vi). Then there exist vectors e1,..., ek such that

(a) For all j ≥ 1, we have vj − ej ∈ Uj−1

(b) The ej are pairwise orthogonal

Moreover, the ej are uniquely determined. ♦

Note that U0 = span(∅) = {0}. Let GS(k) denote the statement of Theorem 19.2.2 for a particular value of k. We prove Theorem 19.2.2 by induction on k. The inductive step is based on the following lemma.

Lemma 19.2.3. Assume GS(k) holds. Then, for all j ≤ k, we have span(e1,..., ej) = Uj. ♦ Exercise 19.2.4. Prove GS(1).

Exercise 19.2.5. Let k ≥ 2 and assume GS(k − 1) is true. Look for ek in the form

ek = vk − Σ_{i=1}^{k−1} αi ei    (19.13)

Prove that the only possible vector ek satisfying the conditions of GS(k) is the ek for which

αi = hvk, eii / keik²    (19.14)

except in the case where ei = 0 (in that case, αi can be chosen arbitrarily).

Exercise 19.2.6. Prove that ek as constructed in the previous exercise satisfies GS(k). This completes the proof of Theorem 19.2.2.

Proposition 19.2.7. ei = 0 if and only if vi ∈ span(v1,..., vi−1).

Proposition 19.2.8. The Gram-Schmidt procedure preserves linear independence.

Proposition 19.2.9. If (v1,..., vk) is a basis of V , then so is (e1,..., ek).

Exercise 19.2.10. Let e = (e1, . . . , en) be an orthogonal basis of V . From e, construct an orthonormal basis e′ = (e′1, . . . , e′n) of V .

Exercise 19.2.11. Conclude that every finite-dimensional Euclidean space has an orthonormal basis.

Numerical exercise 19.2.12. Apply the Gram-Schmidt procedure to find an orthonormal basis for each of the following Euclidean spaces V from the basis b. Self-check: once you have applied Gram-Schmidt, verify that you have obtained an orthonormal set of vectors (a sketch of such a check follows part (d) below).

(a) V = R³ with the standard dot product, b = ((1, 0, 1)^T, (1, 1, 0)^T, (−1, 1, 2)^T)

(b) V = P2[R], with hf, gi = ∫_{−∞}^{∞} e^{−t²/2} f(t)g(t) dt, and b = (1, t, t²) (Hermite polynomials)

(c) V = P2[R], with hf, gi = ∫_{−1}^{1} (1/√(1 − t²)) f(t)g(t) dt, and b = (1, t, t²) (Chebyshev polynomials of the first kind)

(d) V = P2[R], with hf, gi = ∫_{−1}^{1} √(1 − t²) f(t)g(t) dt, and b = (1, t, t²) (Chebyshev polynomials of the second kind)
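Below is a minimal Python/numpy implementation of the orthogonalization step (19.13)-(19.14), applied to part (a) with the basis vectors as reconstructed above, followed by the normalization of Exercise 19.2.10 and the requested self-check. It is a sketch, not part of the text, and the function name is illustrative.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonal vectors e_1, ..., e_k of Theorem 19.2.2 for the standard
    dot product on R^n, following Equations (19.13)-(19.14)."""
    es = []
    for v in vectors:
        e = v.astype(float).copy()
        for prev in es:
            denom = prev @ prev
            if denom > 1e-12:                      # skip previous e_i = 0
                e = e - ((v @ prev) / denom) * prev
        es.append(e)
    return es

b = [np.array([1.0, 0.0, 1.0]),
     np.array([1.0, 1.0, 0.0]),
     np.array([-1.0, 1.0, 2.0])]

es = gram_schmidt(b)
onb = [e / np.linalg.norm(e) for e in es]          # normalize (Exercise 19.2.10)

# Self-check: the normalized vectors form an orthonormal system.
Q = np.column_stack(onb)
assert np.allclose(Q.T @ Q, np.eye(3))
```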

19.3 Isometries and orthogonal transformations

Definition 19.3.1 (Isometry). Let V and W be Euclidean spaces. Then an isometry f : V → W is a bijection that preserves inner product, i. e., for all v1, v2 ∈ V , we have

hv1, v2iV = hf(v1), f(v2)iW (19.15)

The Euclidean spaces V and W are isometric if there is an isometry between them.

Theorem 19.3.2. If V and W are finite dimensional Euclidean spaces, then they are isometric if and only if dim V = dim W . ♦

Proposition 19.3.3. Let V and W be Euclidean spaces. Then ϕ : V → W is an isometry if and only if it maps an orthonormal basis of V to an orthonormal basis of W .

Proposition 19.3.4. Let ϕ : V → W be an isomorphism that preserves orthogonality (so v ⊥ w if and only if ϕ(v) ⊥ ϕ(w)). Show that there is an isometry ψ and a nonzero scalar λ such that ϕ = λψ.

The geometric notion of congruence is captured by the concept of orthogonal transformations. Definition 19.3.5 (Orthogonal transformation). Let V be a Euclidean space. A linear transformation ϕ : V → V is called an orthogonal transformation if it is an isometry. The set of orthogonal transformations of V is denoted by O(V ), and is called the orthogonal group of V .

Proposition 19.3.6. The set O(V ) is a group (R Def. 14.2.1) under composition.

Exercise 19.3.7. The linear transformation ϕ : V → V is orthogonal if and only if ϕ preserves the norm, i. e., for all v ∈ V , we have kϕvk = kvk.

Theorem 19.3.8. Let ϕ ∈ O(V ). Then all eigenvalues of ϕ are ±1. ♦

Proposition 19.3.9. Let V be a Euclidean space and let e = (e1,..., en) be an orthonormal basis of V . Then ϕ : V → V is an orthogonal transformation if and only if (ϕ(e1), . . . , ϕ(en)) is an orthonormal basis.

Proposition 19.3.10 (Consistency of translation). Let V be a Euclidean space with orthonormal basis b, and let ϕ : V → V be a linear transformation. Then ϕ is orthogonal if and only if [ϕ]b is an orthogonal matrix (R Def. 9.1.1).

Definition 19.3.11. Let V be a Euclidean space and let S,T ⊆ V . For v ∈ V , we say that v is orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is orthogonal to T (notation: S ⊥ T ) if s ⊥ t for all s ∈ S and t ∈ T . Definition 19.3.12. Let V be a Euclidean space and let S ⊆ V . Then S⊥ (“S perp”) is the set of vectors orthogonal to S, i. e., S⊥ := {v ∈ V | v ⊥ S}. (19.16)

Proposition 19.3.13. For all subsets S ⊆ V , we have S⊥ ≤ V .

Proposition 19.3.14. Let S ⊆ V . Then S ⊆ S⊥⊥.

Exercise 19.3.15. Verify

(a) {0}⊥ = V

(b) ∅⊥ = V

(c) V ⊥ = {0} The next theorem says that the direct sum (R Def. 15.5.2) of a subspace and its perp is the entire space.

Theorem 19.3.16. If dim V < ∞ and W ≤ V , then V = W ⊕ W⊥. ♦

The proof of this theorem requires the following lemma.

Lemma 19.3.17. Let V be a vector space with dim V = n, and let W ≤ V . Then

dim W ⊥ = n − dim W. (19.17)

♦ Proposition 19.3.18. Let V be a finite-dimensional Euclidean space and let S ⊆ V . Then

S⊥⊥ = span(S) .    (19.18)

Proposition 19.3.19. Let U1, . . . , Uk be pairwise orthogonal subspaces (i. e., Ui ⊥ Uj whenever i ≠ j). Then

Σ_{i=1}^{k} Ui = ⊕_{i=1}^{k} Ui .

We now study the linear map analogue of the transpose of a matrix, known as the adjoint of the linear map.

Theorem 19.3.20. Let V and W be Euclidean spaces, and let ϕ : V → W be a linear map. Then there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ψwi . (19.19)

♦ Note that the inner product above refers to inner products in two different spaces. To be more specific, we should have written hϕv, wiW = hv, ψwiV . (19.20)

Definition 19.3.21. The linear map ψ whose existence is guaranteed by Theorem 19.3.20 is called the adjoint of ϕ and is denoted ϕ∗. So for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ϕ∗wi . (19.21)

The next exercise shows the relationship between the coordinatization of ϕ and of ϕ∗. The reason we denote the adjoint of the linear map ϕ by ϕ∗ rather than by ϕT will become clear in Section 20.4.

Proposition 19.3.22. Let V , W , and ϕ be as in the statement of Theorem 19.3.20. Let b1 be an orthonormal basis of V and let b2 be an orthonormal basis of W . Then

[ϕ∗]b2,b1 = ([ϕ]b1,b2)^T .    (19.22)

19.4 First proof of the Spectral Theorem

We first stated the Spectral Theorem for real symmetric matrices in Chapter 10. We now restate the theorem in the context of Euclidean spaces and break its proof into a series of exercises. Let V be a Euclidean space. Definition 19.4.1 (Symmetric linear transformation). Let ϕ : V → V be a linear transformation. Then ϕ is symmetric if, for all v, w ∈ V , we have

hv, ϕ(w)i = hϕ(v), wi . (19.23)

Proposition 19.4.2. Let ϕ : V → V be a symmetric transformation, and let v1,..., vk be eigenvectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.

Proposition 19.4.3. Let b be an orthonormal basis of the finite-dimensional Euclidean space V , and let ϕ : V → V be a linear transformation. Then ϕ is symmetric if and only if the matrix [ϕ]b is symmetric.

In particular, this proposition establishes the equivalence of Theorem 10.1.1 and Theorem 19.4.4. The main theorem of this section is the Spectral Theorem.

Theorem 19.4.4 (Spectral Theorem). Let V be a finite-dimensional Euclidean space and let ϕ : V → V be a symmetric linear transformation. Then ϕ has an orthonormal eigenbasis. ♦ 19.4. FIRST PROOF OF THE SPECTRAL THEOREM 187

Proposition 19.4.5. Let ϕ be a symmetric linear transformation of the Euclidean space V , and let W ≤ V . If W is ϕ-invariant (R Def. 16.4.22) then W ⊥ is ϕ-invariant.

Lemma 19.4.6. Let ϕ : V → V be a symmetric linear transformation and let W be a ϕ-invariant subspace of V . Then the restriction of ϕ to W is also symmetric. ♦ The heart of the proof of the Spectral Theorem is the following lemma.

Main Lemma 19.4.7. Let ϕ be a symmetric linear transformation of a Euclidean space of dimension ≥ 1. Then ϕ has an eigenvector.

Exercise 19.4.8. Assuming Lemma 19.4.7, prove the Spectral Theorem by induction on dim V .

Before proving the Main Lemma, we digress briefly into analysis.

Definition 19.4.9 (Convergence). Let V be a Euclidean space and let v = (v1, v2, v3, . . . ) be a sequence of vectors in V . We say that the sequence v converges to the vector v if lim_{i→∞} kvi − vk = 0, that is, if for all ε > 0 there exists N ∈ Z such that kvi − vk < ε for all i > N.

Definition 19.4.10 (Open set). Let S ⊆ Rn. We say that S is open if, for every v ∈ S, there exists ε > 0 such that the set {w ∈ Rn | kv − wk < ε} is a subset of S.

Definition 19.4.11 (Closed set). A set S ⊆ V is closed if, whenever there is a sequence of vectors v1, v2, v3, · · · ∈ S that converges to a vector v ∈ V , we have v ∈ S.

Definition 19.4.12 (Bounded set). A set S ⊆ V is bounded if there is some v ∈ V and r ∈ R such that for all w ∈ S, we have d(v, w) < r (R Def. 19.1.5: d(v, w) = kv − wk).

Definition 19.4.13 (Compactness). Let V be a finite-dimensional Euclidean space, and let S ⊆ V . Then S is compact if it is closed and bounded.¹

Definition 19.4.14. Let S be a set and let f : S → R be a function. We say that f attains its maximum if there is some s0 ∈ S such that for all s ∈ S, we have f(s0) ≥ f(s). The notion of a function attaining its minimum is defined analogously.

Theorem 19.4.15. Let V be a Euclidean space and let S ⊆ V be a compact and nonempty subset. If f : S → R is continuous, then f attains its maximum and its minimum. ♦

¹The reader familiar with topology may recognize this not as the standard definition of compactness in a general metric space, but rather as an equivalent definition for finite-dimensional Euclidean spaces that arises as a consequence of the Heine–Borel Theorem.

Definition 19.4.16 (Rayleigh quotient for linear transformations). Let ϕ : V → V be a linear transformation. Then the Rayleigh quotient of ϕ is the function Rϕ : V \ {0} → R defined by

Rϕ(v) := hv, ϕ(v)i / kvk² .    (19.24)

While this function is defined for all linear transformations, its significance is in its applications to symmetric linear transformations.

Exercise 19.4.17. Let ϕ : V → V be a linear transformation, and let v ∈ V and λ ∈ R×. Verify that Rϕ(λv) = Rϕ(v).

Proposition 19.4.18. Let ϕ : V → V be a linear transformation. Then the Rayleigh quotient Rϕ attains its maximum and its minimum.

Exercise 19.4.19. Consider the real function

f(t) = (α + βt + γt²) / (δ + εt²)

where α, β, γ, δ, ε ∈ R and δ² + ε² ≠ 0. Suppose f(0) ≥ f(t) for all t ∈ R. Show that β = 0.

Definition 19.4.20 (arg max). Let S be a set and let f : S → R be a function which attains its maximum. Then arg max f := {s0 ∈ S | f(s0) ≥ f(s) for all s ∈ S}. Convention 19.4.21. Let S be a set and let f : S → R be a function which attains its maximum. We often write arg max f to refer to any (arbitrarily chosen) element of the set arg max f, rather than the set itself.

Proposition 19.4.22. Let ϕ be a symmetric linear transformation of the Euclidean space V , and let v0 = arg max Rϕ(v). Then v0 is an eigenvector of ϕ. Prop. 19.4.22 completes the proof of the Main Lemma and thereby the proof of the Spectral Theorem.
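The role of the Rayleigh quotient can be seen numerically. The Python/numpy sketch below (an illustration, not part of the proof) takes a random symmetric matrix and checks that the Rayleigh quotient never exceeds the largest eigenvalue and attains it at the corresponding eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2.0                     # a random symmetric matrix

def rayleigh(v):
    return (v @ A @ v) / (v @ v)

eigvals, eigvecs = np.linalg.eigh(A)    # eigenvalues in ascending order
v_max = eigvecs[:, -1]                  # eigenvector for the largest eigenvalue

assert np.isclose(rayleigh(v_max), eigvals[-1])
samples = rng.standard_normal((10000, 4))
assert all(rayleigh(v) <= eigvals[-1] + 1e-9 for v in samples)
```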

Proposition 19.4.23. If two symmetric matrices are similar then they are orthogonally similar.

Chapter 20

(C) Hermitian Spaces

In Chapter 19, we discussed Euclidean spaces, whose underlying vector spaces were real. We now generalize this to the notion of Hermitian spaces, whose underlying vector spaces are complex.

20.1 Hermitian spaces

Definition 20.1.1 (Sesquilinear forms). Let V be a vector space over C. The function f : V ×V → C is a sesquilinear form if the following four conditions are met.

(a) f(v, w1 + w2) = f(v, w1) + f(v, w2) for all v, w1, w2 ∈ V ,

(b) f(v1 + v2, w) = f(v1, w) + f(v2, w) for all v1, v2, w ∈ V ,

(c) f(v, λw) = λf(v, w) for all v, w ∈ V and λ ∈ C,

(d) f(λv, w) = \overline{λ} f(v, w) for all v, w ∈ V and λ ∈ C, where \overline{λ} is the complex conjugate of λ.

Examples 20.1.2. The function f(v, w) = v∗w is a sesquilinear form over Cn. More generally, for any A ∈ Mn(C),

f(v, w) = v∗Aw    (20.1)

is sesquilinear.

Exercise 20.1.3. Let V be a vector space over C and let f : V × V → C be a sesquilinear form. Show that for all v ∈ V , we have

f(v, 0) = f(0, v) = 0 .    (20.2)


Definition 20.1.4 (Hermitian form). Let V be a complex vector space. The function f : V × V → C is Hermitian if for all v, w ∈ V ,

f(v, w) = \overline{f(w, v)} .    (20.3)

A sesquilinear form that is Hermitian is called a Hermitian form.

Exercise 20.1.5. For what matrices A ∈ Mn(C) is the sesquilinear form f(v, w) = v∗Aw Hermitian?

Exercise 20.1.6. Show that for Hermitian forms, (b) follows from (a) and (d) follows from (c) in Def. 20.1.1.

Fact 20.1.7. Let V be a vector space over C and let f : V × V → C be a Hermitian form. Then f(v, v) ∈ R for all v ∈ V .

Definition 20.1.8. Let V be a vector space over C, and let f : V × V → C be a Hermitian form.

(a) If f(v, v) > 0 for all v ≠ 0, then f is positive definite.

(b) If f(v, v) ≥ 0 for all v ∈ V , then f is positive semidefinite.

(c) If f(v, v) < 0 for all v ≠ 0, then f is negative definite.

(d) If f(v, v) ≤ 0 for all v ∈ V , then f is negative semidefinite.

(e) If there exist v, w ∈ V such that f(v, v) > 0 and f(w, w) < 0, then f is indefinite.

Exercise 20.1.9. For what A ∈ Mn(C) is v∗Aw positive definite, positive semidefinite, etc.?

Exercise 20.1.10. Let V be the space of continuous functions f : [0, 1] → C, and let ρ : [0, 1] → C be a continuous “weight function.” Define

F(f, g) := ∫₀¹ \overline{f(t)} g(t) ρ(t) dt .    (20.4)

(a) Show that F is a sesquilinear form.

(b) Under what conditions on ρ is F Hermitian?

(c) Under what conditions on ρ is F Hermitian and positive definite?

Exercise 20.1.11. Let r = (ρ0, ρ1, . . . ) be an infinite sequence of complex numbers. Consider the space V of infinite sequences (α0, α1, . . . ) of complex numbers such that

Σ_{i=0}^{∞} |αi|² |ρi| < ∞.    (20.5)

For a = (α0, α1, . . . ) and b = (β0, β1, . . . ), define

F(a, b) := Σ_{i=0}^{∞} \overline{αi} βi ρi .    (20.6)

(a) Show that F is a sesquilinear form.

(b) Under what conditions on r is F Hermitian?

(c) Under what conditions on r is F Hermitian and positive definite?

Definition 20.1.12 (Hermitian space). A Hermitian space is a vector space V over C endowed with a positive definite, Hermitian, sesquilinear inner product h·, ·i : V × V → C. Example 20.1.13. The standard example of a (complex) Hermitian space is Cn endowed with the standard Hermitian dot product (R Def. 12.2.6), that is, hv, wi := v∗w. Note that the standard Hermitian dot product is sesquilinear and positive definite.

Example 20.1.14. Let V be the space of continuous functions f : [0, 1] → C, and define the inner product

hf, gi = ∫₀¹ \overline{f(t)} g(t) dt .

Then V endowed with this inner product is a Hermitian space.

Example 20.1.15. The space ℓ²(C) of sequences (α0, α1, . . . ) of complex numbers such that

Σ_{i=0}^{∞} |αi|² < ∞    (20.7)

endowed with the inner product

ha, bi = Σ_{i=0}^{∞} \overline{αi} βi    (20.8)

where a = (α0, α1, . . . ) and b = (β0, β1, . . . ) is a Hermitian space. This is one of the standard representations of the complex “separable Hilbert space.”

Euclidean spaces generalize the geometric concepts of distance and perpendicularity via the notions of norm and orthogonality, respectively; these are easily extended to complex Hermitian spaces.

Definition 20.1.16 (Norm). Let V be a Hermitian space, and let v ∈ V . Then the norm of v, denoted kvk, is defined to be

kvk := √hv, vi .    (20.9)

Just as in Euclidean spaces, the notion of a norm allows us to define the distance between two vectors in a Hermitian space.

Definition 20.1.17. Let V be a Hermitian space, and let v, w ∈ V . Then the distance between the vectors v and w, denoted d(v, w), is

d(v, w) := kv − wk . (20.10)

Distance in Hermitian spaces obeys the same properties that we are used to in Euclidean spaces. Theorem 20.1.18 (Cauchy-Schwarz inequality). Let V be a Hermitian space, and let v, w ∈ V . Then |hv, wi| ≤ kvk · kwk . (20.11) ♦ Theorem 20.1.19 (Triangle inequality). Let V be a Hermitian space, and let v, w ∈ V . Then

kv + wk ≤ kvk + kwk . (20.12)

♦ Again, like in Euclidean spaces, norms carry with them the notion of angle; however, because hv, wi is not necessarily real, our definition of angle is not identical to the definition of angle presented in Section 19.1. Definition 20.1.20 (Orthogonality). Let V be a Hermitian space. Then we say that v, w ∈ V are orthogonal (notation: v ⊥ w) if hv, wi = 0. Exercise 20.1.21. Let V be a Hermitian space. What vectors are orthogonal to every vector? 20.1. HERMITIAN SPACES 193

Definition 20.1.22 (Orthogonal system). An orthogonal system in a Hermitian space V is a list of (pairwise) orthogonal nonzero vectors in V . Proposition 20.1.23. Every orthogonal system in a Hermitian space is linearly independent.

Definition 20.1.24 (Gram matrix). Let V be a Hermitian space, and let v1,..., vk ∈ V . The Gram matrix of v1,..., vk is the k × k matrix whose (i, j) entry is hvi, vji, that is,

G = G(v1, . . . , vk) := (hvi, vji)_{i,j=1}^{k} .    (20.13)

Exercise 20.1.25. Let V be a Hermitian space. Show that the vectors v1,..., vk ∈ V are linearly independent if and only if det G(v1,..., vk) ≠ 0.

Exercise 20.1.26. Let V be a Hermitian space and let v1,..., vk ∈ V . Show

rk(v1,..., vk) = rk(G(v1,..., vk)) .    (20.14)

Definition 20.1.27 (Orthonormal system). An orthonormal system in a Hermitian space V is a list of (pairwise) orthogonal vectors in V , all of which have unit norm. So (v1, v2, . . . ) is an orthonormal system if hvi, vji = δij for all i, j.

In the case of finite-dimensional Hermitian spaces, we are particularly interested in orthonormal bases, just as we were interested in orthonormal bases for finite-dimensional Euclidean spaces.

Definition 20.1.28 (Orthonormal basis). Let V be a Hermitian space. An orthonormal basis of V is an orthonormal system that is a basis of V .

Exercise 20.1.29. Generalize the Gram-Schmidt orthogonalization procedure (R Section 19.2) to complex Hermitian spaces. Theorem 19.2.2 will hold verbatim, replacing the word “Euclidean” by “Hermitian.”

Proposition 20.1.30. Every finite-dimensional Hermitian space has an orthonormal basis. In fact, every orthonormal list of vectors can be extended to an orthonormal basis.

Proposition 20.1.31. Let V be a Hermitian space with orthonormal basis b. Then for all v, w ∈ V ,

hv, wi = [v]b^∗ [w]b .    (20.15)

Proposition 20.1.32. Let V be a Hermitian space. Every linear form f : V → C (R Def. 15.1.6) can be written as

f(x) = ha, xi    (20.16)

for a unique a ∈ V .

20.2 Hermitian transformations

Definition 20.2.1 (Hermitian linear transformation). Let V be a Hermitian space. Then the linear transformation ϕ : V → V is Hermitian if for all v, w ∈ V , we have hϕv, wi = hv, ϕwi . (20.17)

Theorem 20.2.2. All eigenvalues of a Hermitian linear transformation are real. ♦

Proposition 20.2.3. Let ϕ : V → V be a Hermitian linear transformation, and let v1,..., vk be eigenvectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.

Proposition 20.2.4. Let b be an orthonormal basis of the Hermitian space V , and let ϕ : V → V be a linear transformation. Then the transformation ϕ is Hermitian if and only if the matrix [ϕ]b is Hermitian.

Observe that the definition of a Hermitian linear transformation is analogous to that of a symmetric linear transformation of a Euclidean space (R Def. 19.4.1), but they are not fully analogous. In particular, while a linear transformation of a Euclidean space that has an orthonormal eigenbasis is necessarily symmetric, the analogous statement in complex spaces involves “normal transformations” (R Def. 20.5.1), as opposed to Hermitian transformations.

Exercise 20.2.5. Find a linear transformation of Cn that has an orthonormal eigenbasis but is not Hermitian. However, the Spectral Theorem does extend to Hermitian linear transformations. Theorem 20.2.6 (Spectral Theorem for Hermitian transformations). Let V be a Hermitian space and let ϕ : V → V be a Hermitian linear transformation. Then (a) ϕ has an orthonormal eigenbasis; (b) all eigenvalues of ϕ are real.

Exercise 20.2.7. Prove the converse of the Spectral Theorem for Hermitian transformations: If ϕ : V → V satisfies (a) and (b), then ϕ is Hermitian.

In Section 20.6, we shall see a more general form of the Spectral Theorem which extends part (a) to normal transformations (R Theorem 20.6.1).

20.3 Unitary transformations

In Section 19.3, we introduced orthogonal transformations (R Def. 19.3.5), which captured the geometric notion of congruence in Euclidean spaces. The complex analogues of real orthogonal transformations are called unitary transformations. Definition 20.3.1 (Unitary transformation). Let V be a Hermitian space. Then the transformation ϕ : V → V is unitary if it preserves the inner product, i. e.,

hϕv, ϕwi = hv, wi (20.18) for all v, w ∈ V . The set of unitary transformations ϕ : V → V is denoted by U(V ).

Proposition 20.3.2. The set U(V ) is a group (R Def. 14.2.1) under composition.

Exercise 20.3.3. The linear transformation ϕ : V → V is unitary if and only if ϕ preserves the norm, i. e., for all v ∈ V , we have kϕvk = kvk.

Warning. The proof of this is trickier than in the real case (R Ex. 19.3.7).

Theorem 20.3.4. Let ϕ ∈ U(V ). Then all eigenvalues of ϕ have absolute value 1. ♦ Proposition 20.3.5 (Consistency of translation). Let b be an orthonormal basis of the Hermitian space V , and let ϕ : V → V be a linear transformation. Then the transformation ϕ is unitary if and only if the matrix [ϕ]b is unitary.

Theorem 20.3.6 (Spectral Theorem for unitary transformations). Let V be a Hermitian space and let ϕ : V → V be a unitary transformation. Then

(a) ϕ has an orthonormal eigenbasis;

(b) all eigenvalues of ϕ have unit absolute value.

Exercise 20.3.7. Prove the converse of the Spectral Theorem for unitary transformations: If ϕ : V → V satisfies (a) and (b), then ϕ is unitary.

20.4 Adjoint transformations in Hermitian spaces

We now study the linear map analogue of the conjugate-transpose of a matrix, known as the Hermitian adjoint.

Theorem 20.4.1. Let V and W be Hermitian spaces, and let ϕ : V → W be a linear map. Then there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ψwi . (20.19)

Note that the inner product above refers to inner products in two different spaces. To be more specific, we should have written

hϕv, wiW = hv, ψwiV . (20.20)

Definition 20.4.2. The linear map ψ whose existence is guaranteed by Theorem 20.4.1 is called the adjoint of ϕ and is denoted ϕ∗. So for all v ∈ V and w ∈ W , we have

hϕv, wi = hv, ϕ∗wi . (20.21)

Proposition 20.4.3 (Consistency of translation). Let V , W , and ϕ be as in the statement of

Theorem 20.4.1. Let b1 be an orthonormal basis of V and let b2 be an orthonormal basis of W . Then

[ϕ∗]b2,b1 = ([ϕ]b1,b2)∗ .    (20.22)

TO BE WRITTEN.

20.5 Normal transformations

Definition 20.5.1 (Normal transformation). Let V be a Hermitian space. The transformation ϕ : V → V is normal if it commutes with its adjoint, i. e., ϕ∗ϕ = ϕϕ∗.

TO BE WRITTEN.

20.6 The Complex Spectral Theorem for normal transformations

The main result of this section is the generalization of the Spectral Theorem to normal transformations.

Theorem 20.6.1 (Complex Spectral Theorem). Let ϕ : V → V be a linear transformation. Then ϕ has an orthonormal eigenbasis if and only if ϕ is normal. ♦

Chapter 21

(R, C) The Singular Value Decomposition

In this chapter, we discuss matrices over C, but every statement of this chapter holds over R as well, if we replace matrix adjoints by transposes and the word “unitary” by “orthogonal.”

21.1 The Singular Value Decomposition

In this chapter we study the “Singular Value Decomposition,” an important tool in many areas of math and computer science.

Notation 21.1.1. For r ≤ min{k, n}, the matrix diagk×n(σ1, . . . , σr) is the k × n matrix with σi (1 ≤ i ≤ r) in the (i, i) entry and 0 everywhere else, i. e., the matrix

  σ1 0  σ2   .   ..       σr     0   .   ..  0 0

Note that such a matrix is a “diagonal” matrix which is not necessarily square.


Theorem 21.1.2 (Singular Value Decomposition). Let A ∈ Ck×n. Then there exist unitary matrices S ∈ U(k) and T ∈ U(n) such that

S∗AT = diag_{k×n}(σ1, . . . , σr)    (21.1)

where each “singular value” σi is real, σ1 ≥ σ2 ≥ · · · ≥ σr > 0, and r = rk A. If A ∈ R^{k×n}, then we can let S and T be real, i. e., S ∈ O(k) and T ∈ O(n).

Theorem 21.1.3. The singular values of A are the square roots of the nonzero eigenvalues of A∗A. ♦

The remainder of this section is dedicated to proving the following restatement of Theorem 21.1.2.

Theorem 21.1.4 (Singular Value Decomposition, restated). Let V and W be Hermitian spaces with dim V = n and dim W = k. Let ϕ : V → W be a linear map of rank r. Then there exist orthonormal bases e and f of V and W , respectively, and real numbers σ1 ≥ · · · ≥ σr > 0 such that ϕei = σif i and ϕ∗f i = σiei for i = 1, . . . , r, and ϕej = 0 = ϕ∗f j for j > r.

Exercise 21.1.5. Show that Theorem 21.1.4 is equivalent to Theorem 21.1.2.

Exercise 21.1.6. Let V and W be Hermitian spaces and let ϕ : V → W be a linear map of rank r. Let e = (e1,..., er,..., en) be an orthonormal eigenbasis of ϕ∗ϕ (why does such a basis exist?) with corresponding eigenvalues λ1 ≥ · · · ≥ λr > 0 = λr+1 = · · · = λn. Let σi = √λi for i = 1, . . . , r, and let f i = (1/σi) ϕ(ei) for i = 1, . . . , r. Then for 1 ≤ i ≤ r,

(a) ϕei = σif i;

(b) ϕ∗f i = σiei .

Exercise 21.1.7. The f i of the preceding exercise are orthonormal. Exercise 21.1.8. Complete the proof of Theorem 21.1.4, and hence of Theorem 21.1.2.
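Theorems 21.1.2 and 21.1.3 are easy to check numerically with numpy's built-in SVD; the sketch below (not part of the text) uses numpy's convention A = U Σ V∗, so S = U and T = V in the notation of Theorem 21.1.2.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))

U, sigma, Vh = np.linalg.svd(A)          # sigma is real, nonnegative, decreasing
S, T = U, Vh.conj().T

# Theorem 21.1.2:  S* A T = diag_{k x n}(sigma_1, ..., sigma_r)
D = np.zeros((4, 6), dtype=complex)
D[:4, :4] = np.diag(sigma)
assert np.allclose(S.conj().T @ A @ T, D)

# Theorem 21.1.3: the singular values are the square roots of the
# nonzero eigenvalues of A* A.
eig = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
assert np.allclose(np.sqrt(np.abs(eig[:4])), sigma)
```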

21.2 Low-rank approximation

In this section, we use the Singular Value Decomposition to find low-rank approximations to a matrix, that is, the matrix of a given rank which is “closest” to a specified matrix under the operator norm (R Def. 13.1.1). 200 CHAPTER 21. (R, C) THE SINGULAR VALUE DECOMPOSITION

Definition 21.2.1 (Truncated diagonal matrix). Let D = diag_{k×n}(σ1, . . . , σr) be a rank-r matrix (so σ1, . . . , σr ≠ 0). The rank-` truncation (` ≤ r) of D, denoted D`, is the k × n matrix D` = diag_{k×n}(σ1, . . . , σ`).

The next theorem explains how the Singular Value Decomposition helps us find low-rank approximations.

Theorem 21.2.2 (Nearest low-rank matrix). Let A ∈ C^{k×n} be a matrix of rank r with singular values σ1 ≥ · · · ≥ σr > 0. Define S, T , and D as guaranteed by the Singular Value Decomposition Theorem so that

S∗AT = diag_{k×n}(σ1, . . . , σr) = D.    (21.2)

Given ` ≤ r, let

D` = diag_{k×n}(σ1, . . . , σ`)    (21.3)

and define B` = S D` T∗. Then B` is the matrix of rank at most ` which is nearest to A under the operator norm, i. e., rk B` = ` and for all B ∈ C^{k×n}, if rk B ≤ `, then kA − B`k ≤ kA − Bk.

Exercise 21.2.3. Let A, B`, D, and D` be as in the statement of Theorem 21.2.2. Show

(a) kA − B`k = kD − D`k ;

(b) kD − D`k = σ`+1 .
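The Python/numpy sketch below (not part of the text) carries out the construction of Theorem 21.2.2 on a random matrix and verifies the computation of Exercise 21.2.3: the operator-norm error of the rank-` truncation is σ`+1.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 7))
ell = 2

U, sigma, Vh = np.linalg.svd(A)              # A = U diag(sigma) Vh
D_ell = np.zeros_like(A)
D_ell[:ell, :ell] = np.diag(sigma[:ell])
B_ell = U @ D_ell @ Vh                       # B_ell = S D_ell T*

assert np.linalg.matrix_rank(B_ell) == ell
# ||A - B_ell|| (operator norm, ord=2) equals sigma_{ell+1} (0-indexed: sigma[ell]).
assert np.isclose(np.linalg.norm(A - B_ell, 2), sigma[ell])
```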

As with the proof of the Singular Value Decomposition Theorem, Theorem 21.2.2 is easier to prove in terms of linear maps. We restate the theorem as follows.

Theorem 21.2.4. Let V and W be Hermitian spaces with orthonormal bases e and f, respectively, and let ϕ : V → W be a linear map such that ϕei = σif i for i = 1, . . . , r. Define the truncated map ϕ` : V → W by

ϕ`(ei) = σif i for 1 ≤ i ≤ `, and ϕ`(ei) = 0 otherwise.    (21.4)

Then whenever ψ : V → W is a linear map of rank ≤ `, we have kϕ − ϕ`k ≤ kϕ − ψk.

Exercise 21.2.5. Show that Theorem 21.2.4 is equivalent to Theorem 21.2.2.

Exercise 21.2.6. Let ϕ and ϕ` be as in the statement of the preceding theorem. Show kϕ − ϕ`k = σ`+1.

It follows that in order to prove Theorem 21.2.4, it suffices to show that for all linear maps ψ : V → W of rank ≤ `, we have kϕ − ψk ≥ σ`+1.

Exercise 21.2.7. Let ψ : V → W be a linear map of rank ≤ `. Show that there exists a nonzero v ∈ ker ψ such that

k(ϕ − ψ)vk / kvk ≥ σ`+1 .    (21.5)

Exercise 21.2.8. Complete the proof of Theorem 21.2.4, hence of Theorem 21.2.2.

Exercise 21.2.9. Show that the matrix B` whose existence is guaranteed by Theorem 21.2.2 is unique, i. e., if B′ ∈ C^{k×n} is a matrix of rank ` such that kA − B′k ≤ kA − Bk for all rank-` matrices B ∈ C^{k×n}, then B′ = B`.

In fact, the rank-` matrix guaranteed by Theorem 21.2.2 to be nearest to A under the operator norm is also the rank-` matrix nearest to A under the Frobenius norm (R Def. 13.2.1).

Theorem 21.2.10. The statement of Theorem 21.2.2 holds for the same matrix B` when the operator norm is replaced by the Frobenius norm. That is, we also have kA − B`kF ≤ kA − BkF for all rank-` matrices B ∈ C^{k×n}. ♦

Chapter 22

(R) Finite Markov Chains

22.1 Stochastic matrices

Definition 22.1.1 (). We say that A is a nonnegative matrix if all entries of A are nonnegative.

Definition 22.1.2 (Stochastic matrix). A square matrix A = (αij) ∈ Mn(R) is stochastic if it is nonnegative and the rows of A sum to 1, i. e., Σ_{j=1}^{n} αij = 1 for all i.

Examples 22.1.3. The following matrices are stochastic.

(a) (1/n) Jn

(b) Permutation matrices

(c) [ 0.8 0.2 ; 0.3 0.7 ]

1  1    (d) . 0  .  1


Exercise 22.1.4. Let A ∈ R^{k×n} and B ∈ R^{n×m}. Prove that if A and B are stochastic matrices then AB is a stochastic matrix.

Exercise 22.1.5.

(a) Let A be a stochastic matrix. Prove that A1 = 1.

(b) Show that the converse is false.

(c) Show that A is stochastic if and only if A is nonnegative and A1 = 1.

Definition 22.1.6 (Probability distribution). A probability distribution is a list of nonnegative numbers which add to 1.

Fact 22.1.7. Every row of a stochastic matrix is a probability distribution.

22.2 Finite Markov Chains

A finite Markov Chain is a stochastic process defined by a finite number of states and constant transition probabilities. We consider the trajectory of a particle that can move between states at discrete time steps with the given transition probabilities. In this section, we denote by Ω = [n] the finite set of states.

FIGURE HERE

This is described more formally below.

Definition 22.2.1 (Finite Markov Chain). Let Ω = [n] be a set of states. We denote by Xt ∈ Ω the position of the particle at time t. The transition probability pij is the probability that the particle moves to state j at time t + 1, given that it is in state i at time t, i. e.,

pij = P (Xt+1 = j | Xt = i) . (22.1)

In particular, each pij ≥ 0 and Σ_{j=1}^{n} pij = 1 for every i. The infinite sequence (X0, X1, X2, . . . ) is a stochastic process.

Definition 22.2.2 (Transition matrix). The transition matrix corresponding to our finite Markov Chain is the matrix T = (pij).

Fact 22.2.3. The transition matrix T of a finite Markov Chain is a stochastic matrix, and every stochastic matrix is the transition matrix of a finite Markov Chain.

Notation 22.2.4 (r-step transition probability). The r-step transition probability from state i to state j, denoted pij^(r), is defined by

pij^(r) = P (Xt+r = j | Xt = i) .    (22.2)

Exercise 22.2.5 (Evolution of Markov Chains, I). Let T = (pij) be the transition matrix corresponding to a finite Markov Chain. Show that T^r = (pij^(r)), where pij^(r) is the r-step transition probability (R Notation 22.2.4).

Definition 22.2.6. Let qt,i be the probability that the particle is in state i at time t. We define qt = (qt,1, . . . , qt,n) to be the distribution of the particle at time t.

Fact 22.2.7. For every t ≥ 0, Σ_{i=1}^{n} qt,i = 1.

Exercise 22.2.8 (Evolution of Markov Chains, II). Let T be the transition matrix of a finite Markov Chain. Show that qt+1 = qtT and conclude that qt = q0T^t.

Definition 22.2.9 (Stationary distribution). The probability distribution q is a stationary distribution if q = qT , i. e., if q is a left eigenvector (R Def. 8.1.17) with eigenvalue 1.
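A Python/numpy sketch of the evolution and stationarity notions just defined (not part of the text), using the 2 × 2 stochastic matrix from Examples 22.1.3(c):

```python
import numpy as np

T = np.array([[0.8, 0.2],
              [0.3, 0.7]])              # transition matrix (rows sum to 1)
q0 = np.array([1.0, 0.0])               # start in state 1 with probability 1

# Evolution (Exercise 22.2.8): q_{t+1} = q_t T, hence q_t = q_0 T^t.
q = q0.copy()
for _ in range(50):
    q = q @ T

# Stationary distribution (Definition 22.2.9): left eigenvector with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T.T)    # right eigenvectors of T^T = left eigenvectors of T
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
stationary = v / v.sum()

assert np.allclose(stationary @ T, stationary)
assert np.allclose(q, stationary, atol=1e-6)   # this chain is ergodic, so q_t converges
```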

Proposition 22.2.10. Let A ∈ Mn(R) be a stochastic matrix. Show that A has a left eigenvector with eigenvalue 1.

The preceding proposition in conjunction with the next theorem shows that every stochastic matrix has a stationary distribution. Recall (R Def. 22.1.1) that a nonnegative matrix is a matrix whose entries are all nonnegative; a nonnegative vector is defined similarly. Do not prove the following theorem.

Theorem 22.2.11 (Perron-Frobenius). Every nonnegative square matrix has a nonnegative eigenvector.

Corollary 22.2.12. Every finite Markov Chain has a stationary distribution.

Proposition 22.2.13. If T is the transition matrix of a finite Markov Chain and lim_{r→∞} T^r = L exists, then every row of L is a stationary distribution.

In order to determine which Markov Chains have transition matrices that converge, we study the directed graphs associated with finite Markov Chains.

22.3 Digraphs

TO BE WRITTEN.

22.4 Digraphs and Markov Chains

Every finite Markov Chain has an associated digraph, where i → j if and only if pij ≠ 0.

FIGURE HERE

Definition 22.4.1 (Irreducible Markov Chain). A finite Markov Chain is irreducible if its associated digraph is strongly connected.

Proposition 22.4.2. If T is the transition matrix of an irreducible finite Markov Chain and lim_{r→∞} T^r = L exists, then all rows of L are the same, so rk L = 1.

Definition 22.4.3 (Ergodic Markov Chain). A finite Markov Chain is ergodic if its associated digraph is strongly connected and aperiodic. The following theorem establishes a sufficient condition under which the probability distribution of a finite Markov Chain converges to the stationary distribution. For irreducible Markov Chains, this is necessary and sufficient.

Theorem 22.4.4. If T is the transition matrix of an ergodic finite Markov Chain, then lim_{r→∞} T^r exists.

Proposition 22.4.5. For irreducible Markov Chains, the stationary distribution is unique.

22.5 Finite Markov Chains and undirected graphs

TO BE WRITTEN.

22.6 Additional exercises

Definition 22.6.1 (Doubly stochastic matrix). A matrix A = (αij) is doubly stochastic if both A and A^T are stochastic, i. e., 0 ≤ αij ≤ 1 for all i, j and every row and column sum is equal to 1.

Examples 22.6.2. The following matrices are doubly stochastic.

(a) (1/n) Jn

(b) Permutation matrices

(c) [ 0.1 0.3 0.6 ; 0.2 0.5 0.3 ; 0.7 0.2 0.1 ]

Theorem∗ 22.6.3 (Birkhoff’s Theorem). The set of doubly stochastic matrices is the convex hull (R Def. 5.3.6) of the set of permutation matrices (R Def. 2.4.3). In other words, every doubly stochastic matrix can be expressed as a convex combination (R Def. 5.3.1) of permutation matrices. ♦

Chapter 23

More Chapters

TO BE WRITTEN.

Chapter 24

Hints

24.1 Column Vectors

1.1.17 R Exercise R Solution (b) Consider the sums of the entries in each column vector.

1.2.7 R Exercise R Solution Show that the sum of two vectors of zero weight has zero weight and that scaling a zero-weight vector produces a zero-weight vector.

1.2.8 R Exercise R Solution What are the subspaces of Rn which are spanned by one vector?

1.2.9 R Exercise R Solution

(c) The “if” direction is trivial. For the “only if” direction, begin with vectors w1 ∈ W1 \ W2 and w2 ∈ W2 \ W1. Where is w1 + w2?

1.2.12 R Exercise R Solution Prove (a) and (b) together. Prove that the subspace defined by (b) satisfies the definition.

1.2.15 R Exercise R Solution


R Preceding exercise.

1.2.16 R Exercise R Solution Show that span(T ) ≤ span(S).

1.2.18 R Exercise R Solution

It is clear that U1 + U2 ⊆ span(U1 ∪ U2). For the reverse inclusion you need to show that a linear combination from U1 ∪ U2 can always be written as a sum u1 + u2 for some u1 ∈ U1 and u2 ∈ U2. 1.3.9 R Exercise R Solution

You need to find α1, α2, α3 such that α1 (1, −3, 2)^T + α2 (4, 2, 1)^T + α3 (−2, −8, 3)^T = (0, 0, 0)^T. This gives you a system of linear equations. Find a nonzero solution.

1.3.11 R Exercise R Solution
Assume Σ αivi = 0. What condition on αj allows you to express vj in terms of the other vectors?
1.3.13 R Exercise R Solution
What does the empty sum evaluate to?

1.3.15 R Exercise R Solution Find a nontrivial linear combination of v and v that evaluates to 0.

1.3.16 R Exercise R Solution The only linear combinations of the list (v) are of the form αv for α ∈ F. 1.3.17 R Exercise R Solution R Ex. 1.3.13.

1.3.21 R Exercise R Solution
If Σ αivi = 0 and not all the αi are 0, then it must be the case that αk+1 ≠ 0 (why?).
1.3.22 R Exercise R Solution

R Prop. 1.3.11

1.3.23 R Exercise R Solution What is the simplest nontrivial linear combination of such a list which evaluates to zero?

1.3.24 R Exercise R Solution Combine Prop. 1.3.15 and Fact 1.3.20.

1.3.27 R Exercise R Solution Begin with a basis. How can you add one more vector that will satisfy the condition?

1.3.41 R Exercise R Solution

Suppose not. Use Lemma 1.3.21 to show that span(w1,..., wm) ≤ span(v2,..., vk) and arrive at a contradiction.

1.3.42 R Exercise R Solution

Use the Steinitz exchange lemma to replace v1 by some wi1 , then v2 by some wi2 , etc. So in the end, wi1 ,..., wik are linearly independent.

1.3.45 R Exercise R Solution Use the standard basis of Fk and the preceding exercise.

1.3.46 R Exercise R Solution

It is clear that rk(v1,..., vk) ≤ dim(span(v1,..., vk)) (why?). For the other direction, first assume that the vi are linearly independent, and show that span(v1,..., vk) cannot contain a list of more than k linearly independent vectors.

1.3.51 R Exercise R Solution

Show that if this list were not linearly independent, then U1 ∩ U2 would contain a nonzero vector.

1.3.52 R Exercise R Solution R Ex. 1.3.51.

1.3.53 R Exercise R Solution 24.2. MATRICES 211

Start with a basis of U1 ∩ U2.

1.4.9 R Exercise R Solution Only the zero vector. Why?

1.4.10 R Exercise R Solution Express 1 · x in terms of the entries of x.

1.5.4 R Exercise R Solution Begin with a linear combination W of the vectors in S which evaluates to zero. Conclude that all coefficients must be 0 by taking the dot product of W with each member of S.

1.6.4 R Exercise R Solution Incidence vectors.

24.2 Matrices

2.2.3 R Exercise R Solution Prove only one of these, then infer the other using the transpose (R Ex. 2.2.2).

2.2.17 R Exercise R Solution (d3) The size is the sum of the squares of the entries. This is the Frobenius norm of the matrix.

2.2.19 R Exercise R Solution Show, by induction on k, that the (i, j) entry of Ak is zero whenever j ≤ i + k − 1.

2.2.27 R Exercise R Solution

Write B as a sum of matrices of the form [0 | · · · | 0 | bi | 0 | · · · | 0] and then use distributivity.

2.2.28 R Exercise R Solution The i-th entry of Ax is the dot product of the i-th row of A with x.

2.2.29 R Exercise R Solution

Apply the preceding exercise to B = I.

2.2.30 R Exercise R Solution Consider (A − B)x.

2.2.31 R Exercise R Solution R Prop. 2.2.28.

2.2.32 R Exercise R Solution R Prop. 2.2.29

2.2.33 R Exercise R Solution R Prop. 2.2.29

2.2.38 R Exercise R Solution What is the (i, i) entry of AB?

2.2.39 R Exercise R Solution R Preceding exercise.

2.2.41 R Exercise R Solution If Tr(AB) = 0 for all A, then B = 0.

2.3.2 R Exercise R Solution Induction on k. Use the preceding exercise.

2.3.4 R Exercise R Solution Induction on the degree of f.

2.3.6 R Exercise R Solution Observe that the off-diagonal entries do not affect the diagonal entries.

2.3.7 R Exercise R Solution Induction on k. Use the preceding exercise.

2.5.1 R Exercise R Solution

R Ex. 2.2.2

2.5.5 R Exercise R Solution R Ex. 2.2.14.

2.5.6 R Exercise R Solution Multiply by the matrix with a 1 in the (i, j) position and 0 everywhere else.

2.5.7 R Exercise R Solution

(a) Trace.

2.5.8 R Exercise R Solution Incidence vectors (R Def. 1.6.2).

2.5.13 R Exercise R Solution (b) That particular circulant matrix is the one corresponding to the cyclic permutation of all elements, i. e., the matrix C = C(0, 1, 0, 0,..., 0).

24.3 Matrix Rank

3.1.5 R Exercise R Solution Consider the matrix [A | B] (the k × 2n matrix obtained by concatenating the columns of B onto A), and compare its column rank to the quantities above.
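A sketch of the comparison the hint calls for, under the assumption that the exercise asks for the subadditivity of rank (the exercise statement is not reproduced here):
\[
\operatorname{rk}(A+B) \;\le\; \operatorname{rk}\,[A \mid B] \;\le\; \operatorname{rk}(A) + \operatorname{rk}(B),
\]
since every column of A + B lies in the column space of [A | B], and that column space is spanned by the columns of A together with the columns of B.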

3.3.2 R Exercise R Solution Elementary column operations do not change the column space of a matrix.

3.3.12 R Exercise R Solution Consider the k × r submatrix formed by taking r linearly independent columns.

3.5.9 R Exercise R Solution

Extend a basis of U to a basis of Fn, and consider the action of the matrix A on this basis.

3.6.7 R Exercise R Solution R Ex. 2.2.19.

24.4 Theory of Systems of Linear Equations I: Qualitative Theory

24.5 Affine and Convex Combinations

5.1.14 R Exercise R Solution

(b) Let v0 ∈ S and let U = {s − v0 | s ∈ S}. Show that

(i) U ≤ Fn;

(ii) S is a translate of U.

For uniqueness, you need to show that if U + u = V + v = S, then U = V .

24.6 The Determinant

6.2.22 R Exercise R Solution Identity matrix.

6.2.23 R Exercise R Solution Consider the n!-term expansion of the determinant (Eq. (6.20)). What happens with the contributions of σ and σ⁻¹? What if σ = σ⁻¹? What if σ has a fixed point?
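For reference, the n!-term expansion referred to (presumably Eq. (6.20)) is the standard one:
\[
\det A \;=\; \sum_{\sigma\in S_n} \operatorname{sgn}(\sigma)\,\prod_{i=1}^{n} a_{i,\sigma(i)} ;
\]
note that sgn(σ⁻¹) = sgn(σ), so σ and σ⁻¹ contribute with the same sign.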

6.3.7 R Exercise R Solution

Let fen denote the number of fixed-point-free even permutations of the set {1, . . . , n}, and let fon denote the number of fixed-point-free odd permutations of the set [n]. Write fon − fen as the determinant of a familiar matrix.
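One candidate for the "familiar matrix" (an assumption; the hint leaves it unnamed) is J − I, where J is the all-ones matrix. In the n!-term expansion, a permutation σ contributes sgn(σ) if σ is fixed-point-free and 0 otherwise, so
\[
\det(J - I) \;=\; \textit{fen} - \textit{fon},
\qquad\text{hence}\qquad
\textit{fon} - \textit{fen} \;=\; -\det(J - I).
\]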

6.4.6 R Exercise R Solution det A = ±1.

6.4.7 R Exercise R Solution R Ex. 6.2.23

24.7 Theory of Systems of Linear Equations II: Cramer's Rule

24.8 Eigenvectors and Eigenvalues

(b) R Exercise R Solution Use Ex. 8.1.9.

8.1.24 R Exercise R Solution Use Rank-Nullity.

8.5.6 R Exercise R Solution R Prop. 2.3.4

24.9 Orthogonal Matrices

24.10 The Spectral Theorem

24.11 Bilinear and Quadratic Forms

11.1.5 R Exercise R Solution

Let e1,..., en be the standard basis of Fn (R Def. 1.3.34). Let αi = f(ei). Show that a = (α1, . . . , αn)T works.
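The computation behind the hint, written out (with x = ∑ xiei and f a linear form on Fn; whether the book states the conclusion as f(x) = a · x or f(x) = aTx, the same calculation applies):
\[
f(\mathbf{x}) \;=\; f\Bigl(\sum_{i=1}^{n} x_i\mathbf{e}_i\Bigr) \;=\; \sum_{i=1}^{n} x_i\,f(\mathbf{e}_i) \;=\; \sum_{i=1}^{n} \alpha_i x_i \;=\; \mathbf{a}\cdot\mathbf{x} .
\]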

11.1.10 R Exercise R Solution Generalize the hint to Theorem 11.1.5.

11.1.13 R Exercise R Solution (c) Skew symmetric matrices. R Ex. 6.2.21

11.3.3 R Exercise R Solution Give a very simple expression for B in terms of A.

11.3.9 R Exercise R Solution Use the fact that λ is real.

11.3.15 R Exercise R Solution Interlacing.

11.3.23 R Exercise R Solution (b) Triangular matrices.

24.12 Complex Matrices

12.2.8 R Exercise R Solution

(a)

(b) Prove that your example must have rank 1.

12.4.9 R Exercise R Solution Induction on n.

12.4.15 R Exercise R Solution

R Prop. 12.4.6.

12.4.19 R Exercise R Solution R Theorem 12.4.18

12.5.4 R Exercise R Solution Pick λ0, . . . , λn−1 to be complex numbers with unit absolute value. Let w = (λ0, . . . , λn−1)T. Prove that the entries of (1/√n) F w generate a unitary circulant matrix, and that all unitary circulant matrices are of this form. The λi are the eigenvalues of the resulting circulant matrix.

24.13 Matrix Norms

13.1.2 R Exercise R Solution Show that max{ ‖Ax‖/‖x‖ : x ∈ Rn, ‖x‖ = 1 } exists. Why does this suffice?
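A brief sketch of why the maximum exists and why this suffices (assuming the operator norm is defined as the supremum of ‖Ax‖/‖x‖ over nonzero x, as in Section 13.1): the set {x ∈ Rn : ‖x‖ = 1} is closed and bounded and x ↦ ‖Ax‖ is continuous, so the maximum is attained; and for any x ≠ 0,
\[
\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|} \;=\; \Bigl\|\,A\,\frac{\mathbf{x}}{\|\mathbf{x}\|}\,\Bigr\| \;\le\; \max_{\|\mathbf{u}\|=1}\|A\mathbf{u}\| ,
\]
so the supremum over all nonzero x equals this maximum.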

13.1.13 R Exercise R Solution Use the fact that fAB = fBA · t^{n−k} (R Prop. 8.4.16).

24.14 Preliminaries

24.15 Vector Spaces

15.3.14 R Exercise R Solution Use the fact that a polynomial of degree n has at most n roots.

15.4.9 R Exercise R Solution R Ex. 14.4.59.

15.4.10 R Exercise R Solution

R Ex. 16.4.10.

15.4.13 R Exercise R Solution

Start with a basis of U1 ∩ U2.

15.4.14 R Exercise R Solution It is an immediate consequence of Cor. 15.4.3 that every list of k + 1 vectors in Rk is linearly dependent.

15.4.18 R Exercise R Solution Their span is not changed.

24.16 Linear Maps

16.2.6 R Exercise R Solution Coordinate vector.

16.3.7 R Exercise R Solution

Let e1,..., ek be a basis of ker(ϕ). Extend this to a basis e1,..., en of V , and show that ϕ(ek+1), . . . , ϕ(en) is a basis for im(ϕ).

16.4.32 R Exercise R Solution

(a)

(b)

(c) R Prop. 16.4.14.

(d)

16.6.7 R Exercise R Solution
Write A = [ϕ]old and A′ = [ϕ]new. Show that for all x ∈ Fn (n = dim V), we have A′x = T⁻¹ASx.

24.17 Block Matrices

24.18 Minimal Polynomials of Matrices and Linear Transformations

24.19 Euclidean Spaces

19.1.6 R Exercise R Solution Consider the quadratic polynomial f(t) = ‖tv + w‖^2. Note that f(t) ≥ 0 for all t. What does this say about the discriminant of f?
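Written out (assuming 19.1.6 is the Cauchy–Schwarz inequality, with v · w denoting the inner product):
\[
0 \;\le\; f(t) \;=\; \|t\mathbf{v}+\mathbf{w}\|^2 \;=\; t^2\|\mathbf{v}\|^2 + 2t\,(\mathbf{v}\cdot\mathbf{w}) + \|\mathbf{w}\|^2
\qquad\text{for all } t\in\mathbb{R},
\]
so the discriminant satisfies 4(v · w)^2 − 4‖v‖^2‖w‖^2 ≤ 0, i.e. |v · w| ≤ ‖v‖ ‖w‖ (the case v = 0 is handled separately).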

19.1.7 R Exercise R Solution Derive this from the Cauchy-Schwarz inequality.

19.1.21 R Exercise R Solution R Theorem 11.1.5.

19.3.20 R Exercise R Solution R Prop. 19.1.21.

19.4.19 R Exercise R Solution f′(0) = 0.

19.4.22 R Exercise R Solution Define U = {v0}⊥, and for each u ∈ U \ {0}, consider the function fu : R → R defined by fu(t) = Rϕ(v0 + tu). Apply the preceding exercise to this function.

24.20 Hermitian Spaces

20.1.19 R Exercise R Solution

Derive this from the Cauchy-Schwarz inequality.

20.1.32 R Exercise R Solution R Theorem 11.1.5.

20.2.2 R Exercise R Solution For an eigenvalue λ, you need to show that λ = λ̄ (that is, λ is real). The proof is just one line. Consider the expression x∗Ax.
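The one-line computation the hint points to: if Ax = λx with x ≠ 0 and A is Hermitian, then x∗Ax = x∗(λx) = λ‖x‖^2, while
\[
\overline{\mathbf{x}^{*}A\mathbf{x}} \;=\; (\mathbf{x}^{*}A\mathbf{x})^{*} \;=\; \mathbf{x}^{*}A^{*}\mathbf{x} \;=\; \mathbf{x}^{*}A\mathbf{x},
\]
so x∗Ax is real; since ‖x‖^2 > 0, λ itself must be real.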

20.2.5 R Exercise R Solution Which diagonal matrices are Hermitian?

20.4.1 R Exercise R Solution R Prop. 20.1.32.

24.21 The Singular Value Decomposition

21.2.7 R Exercise R Solution

It suffices to show that there exists v ∈ ker ψ such that ‖ϕv‖ ≥ σℓ+1‖v‖ (why?). Pick v ∈ ker ψ ∩ span(e1, . . . , eℓ+1) and show that this works.

24.22 Finite Markov Chains

22.1.5 R Exercise R Solution

For a general matrix B ∈ Mn(R), what is the i-th entry of B1?

22.2.10 R Exercise R Solution R Prop. 8.4.15.

22.6.3 R Exercise R Solution Marriage Theorem.

Chapter 25

Solutions

25.1 Column Vectors

1.1.16 R Exercise R Hint

−5 −2 2 1 −1 = 2 1 − 6     2   11 7 6

Babai: Discover Linear Algebra. This chapter last updated August 9, 2016. © 2016 László Babai.

25.2 Matrices

25.3 Matrix Rank

25.4 Theory of Systems of Linear Equations I: Qualitative Theory

25.5 Affine and Convex Combinations

25.6 The Determinant

25.7 Theory of Systems of Linear Equations II: Cramer’s Rule

25.8 Eigenvectors and Eigenvalues

8.1.5 R Exercise R Hint Every vector in Fn is an eigenvector of In with eigenvalue 1.

8.1.24 R Exercise R Hint dim Uλ = n − rk(λI − A).

25.9 Orthogonal Matrices

25.10 The Spectral Theorem

10.2.2 R Exercise R Hint

\begin{align*}
\mathbf{v}^{T} A \mathbf{v}
&= \Bigl(\sum_{i=1}^{n} \alpha_i \mathbf{b}_i\Bigr)^{T} A \Bigl(\sum_{i=1}^{n} \alpha_i \mathbf{b}_i\Bigr) \\
&= \Bigl(\sum_{i=1}^{n} \alpha_i \mathbf{b}_i\Bigr)^{T} \Bigl(\sum_{i=1}^{n} \alpha_i A\mathbf{b}_i\Bigr) \\
&= \Bigl(\sum_{i=1}^{n} \alpha_i \mathbf{b}_i\Bigr)^{T} \Bigl(\sum_{i=1}^{n} \alpha_i \lambda_i \mathbf{b}_i\Bigr) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, \mathbf{b}_j^{T}\mathbf{b}_i \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, (\mathbf{b}_j\cdot\mathbf{b}_i) \\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j\, \delta_{ij} \\
&= \sum_{i=1}^{n} \lambda_i \alpha_i^{2}
\end{align*}

10.2.4 R Exercise R Hint
First suppose A is positive definite. Let v be an eigenvector of A with eigenvalue λ. Then, because v ≠ 0, we have
\[
0 < \mathbf{v}^{T}A\mathbf{v} = \mathbf{v}^{T}(\lambda\mathbf{v}) = \lambda\,\mathbf{v}^{T}\mathbf{v} = \lambda\|\mathbf{v}\|^{2} .
\]
Hence λ > 0.

Now suppose A is symmetric and all of its eigenvalues are positive. By the Spectral Theorem, A has an orthonormal eigenbasis, say b = (b1, . . . , bn), with Abi = λibi for all i. Let v ∈ Fn be a nonzero vector. Then we can write v = ∑ αibi for some scalars αi. Then
\[
\mathbf{v}^{T}A\mathbf{v} = \sum_{i=1}^{n} \lambda_i\alpha_i^{2} > 0,
\]
so A is positive definite.

25.11 Bilinear and Quadratic Forms

11.3.3 R Exercise R Hint B = (A + AT)/2. Verify that this and only this matrix works.
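A quick check of the "this matrix works" part, assuming (as seems implied) that the exercise asks for a symmetric B with xTBx = xTAx for all x:
\[
\mathbf{x}^{T}B\mathbf{x} \;=\; \tfrac{1}{2}\bigl(\mathbf{x}^{T}A\mathbf{x} + \mathbf{x}^{T}A^{T}\mathbf{x}\bigr) \;=\; \mathbf{x}^{T}A\mathbf{x},
\]
since xTATx = (xTAx)T = xTAx for the 1 × 1 matrix xTAx.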

25.12 Complex Matrices

25.13 Matrix Norms

25.14 Preliminaries

25.15 Vector Spaces

25.16 Linear Maps

25.17 Block Matrices

25.18 Minimal Polynomials of Matrices and Linear Trans- formations

18.2.10 R Exercise R Hint
mA = ∏′_{i=1}^{n} (t − λi), where ∏′ means that we take each factor only once (so eigenvalues λi with multiplicity greater than 1 only contribute one factor of t − λi).

25.19 Euclidean Spaces

25.20 Hermitian Spaces

25.21 The Singular Value Decomposition

25.22 Finite Markov Chains

Index

0, 5, 13, 17, 18 Cauchy-Binet formula, 76 1, 5, 17, 28 Cayley-Hamilton Theorem, 87, 87–88, 111 for diagonal matrices, 87 Affine basis, 59, 59 for diagonalizable matrices, 88 Affine combination, 56, 56, 61 Characteristic polynomial, 84, 84–86 Affine hull, 57, 57–58, 61 of nilpotent matrices, 85 Affine independence, 59, 59 Affine subspace, 54 of similar matrices, 88 k Codimension, 47 of F , 57, 57–60 codimension of, 60 of an affine subspace, 60 dimension of, 58, 58–59 Cofactor, 73 intersection of, 57, 58 Cofactor expansion, 73, 73–74 All-ones matrix, see J matrix skew, 75 All-ones vector, see 1 Column space, 39, 44 Column vecotr Basis orthogonality of, 108 k of a subspace of F , 14, 14–15 Column vector, 3, 3–19 of a subspace of F k, 15 addition of, 5, 5 of k, 15, 55 F dot product of, 16, 16–18, 28, 90, 97, 98, Bilinear form 104 nonsingular, 104 isotropic, 104, 104–105 over n, 97, 97–98 F linear combination of, 6, 6–8, 16, 29 nonsingular, 97, 98 linear dependence of, 11, 11–71 representation theorem for, 97 Linear independence of, 15 Cn linear independence of, 11, 11–13, 18, 19, orthonormal basis of, 108, 109 37, 44, 48, 55, 71, 80


multiplication by a scalar, 6, 6 Dimension, 13, 15 norm of, 108 of a convex set, 62 orthogonality, 17, 17–18 of an affine subspace, 58, 58–59 orthogonality of, 28 Discrete Fourier Transform, 112, 112 with respect to a bilinear form, 103 Dot product, 16, 16–18, 28, 90, 97, 98, 104 parallel, 13, 13 Dual space Column-rank, 39, 39–40, 43 of Fn, 96 full, 39, 45 Column-space, 40 Eigenbasis Companion matrix, 86, 86 of Fn, 80, 80 Complex conjugate, 106 orthonormal, 91, 93, 94, 110 Complex number, 106, 106–107 Eigenspace, 81, 81 absolute value of, 106 Eigenvalue imaginary part of, 106 interlacing of, 95 magnitude of, 106 of a matrix, 79, 79–81, 84, 85, 101, 114, norm of, 106 115 real part of, 106 algebraic multiplicity of, 85, 85–86 unit norm, 106, 106–107 geometric multiplicity of, 81, 81, 86 Convex combination, 60–61 left, 81, 81, 86 Convex hull, 61, 61–62 of a real symmetric matrix, 93 Convex set, 61, 61–62 Eigenvector dimension of, 62 of a matrix intersection of, 61 left, 81 Coordinates, 94 of a matrix, 79, 79–81 Corank left, 80 of a matrix, 48, 48 Elementary column operation, 40, 40–43, 48, of a set, 47 71 Courant-Fischer Theorem, 94 and the determinant, 70 Cramer’s Rule, 77, 78 Elementary matrices, 41, 42 Elementary matrix, 41 Determinant, 55, 65, 68, 68–76, 85, 102 Elementary row operation, 41, 41–44, 48 corner, 102, 102–103 Empty list, 7 multilinearity of, 70 Empty sum, 4 of a matrix product, 70 Euclidean norm, 18, 72, 90, 109 , 82, 82 complex, 108 INDEX 229

Fk, 4, 21 intersection of, 60 basis of, 15, 55 linear, see Linear hyperplane dimension of, 15 I, see Identity matrix standard basis of, 14, 26, 29 Identity matrix, 26, 26, 45, 79 subspace of, 8, 8–10, 60 Incidence vector, 19, 19 codimension of, 47, 47 dimension of, 13, 13–15 J matrix, 21, 28 disjoint, 15 Jordan block, 37 intersection of, 9 totally isotropic, 104, 104–105 k-cycle, 67, 68 trivial, 9 Kronecker delta, 26 union of, 9 `2 norm, 108 k×n F , 21 `2 norm, 18 F[t], 83 Linear dependence Field, 4 of column vectors, 71 of characteristic 2, 71, 72 Linear combination First Miracle of Linear Algebra, 15, 15 affine, see Affine combination Frobenius norm, 114, 114–115 of column vectors, 6, 6–8, 10, 16, 29 submultiplicativity of, 115 trivial, 11 Fundamental Theorem of Algebra, 84 Linear dependence of column vectors, 11, 11–71 Gauss-Jordan elimination, 43 Linear form Gaussian elimination, 43 over Fn, 96, 96, 100 Generalized Fisher Inequality, 19 Linear hyperplane, 60 Group intersection of, 60 orthogonal, see Orthogonal group Linear independence symmetric, see Symmetric group maximal, 14, 14 unitary, see Unitary group of column vectors, 11, 11–13, 15, 18, 19, 37, 44, 48, 55, 71, 80 Hadamard matrix, 92, 92 List, 11 Hadamard’s Inequality, 72 concatenation of, 11 Half-space, 62, 62 empty, 12 Helly’s Theorem, 62, 62 List of generators, 14, 14 Hermitian dot product, 108, 108–109 Hyperplane, 60, 60 Mn(F), 21 230 INDEX

Matrix, 20, 20–37 multiplication by a scalar, 24, 24 addition of, 23, 23 negative definite, 101 adjugate of, 75 negative semidefinite, 101 augmented, 54 nilpotent, 28, 28, 81, 82 block, 168, 168 characteristic polynomial of, 85 multiplication of, 169 nonnegative, 202 block-diagonal, 168, 169–170 nonsingular, 44, 44–45, 49, 54, 55, 75, 77, block-triangular, 169, 170, 171 80, 97 characteristic polynomial of, 84, 84–86 normal, 110, 110–111, 115 circulant, 37, 37, 112 null space of, 47, 52, 53 commutator of, 36, 36 nullity of, 52 conjugate-transpose of, 107, 107 orthogonal, 90, 90–91, 113–115 corank of, 48, 48 orthogonally similar, 91, 91, 93, 94 corner, 102, 102 permutation, see Permutation matrix definiteness of, 101, 101–102 positive definite, 94, 94, 101, 114 diagonal, 21, 21, 27, 31, 36, 80, 101, 109 positive semidefinite, 101 determinant of, 69 rank, 44, 44–50, 108 diagonalizable, 82, 82, 111 full, 44 doubly stochastic, 206, 206 right inverse of, 45, 45–46, 55 exponentiation, 27, 27, 31, 33, 170, 171 rotation, see Rotation matrix Hermitian, 109, 109–110 scalar, 27, 36, 80 Hermitian adjoint of, 107, 107 self-adjoint, 109, 109–110 identity, see Identity matrix similarity of, 82, 82, 87 indefinite, 101 skew-symmetric, 71 integral, 21, 36 square, 21, 23, 44 inverse, 46, 46, 75–76 stochastic, 114, 202, 202–203 inverse of, 55 strictly lower triangular, 22, 22 invertibility of, 46, 46, 55, 76, 77 strictly upper triangular, 22, 22, 28 kernel of, 47, 52 symmetric, 23, 23, 36, 91, 93, 94, 114, 115 left inverse of, 45, 45–46, 55 trace of, 29, 29–30 lower triangular, 22, 22 transpose, 23, 23, 24, 36 multiplication, 24, 24–30 determinant of, 69 and the determinant, 70 unitary, 109, 109–112 associativity of, 25 unitary similarity of, 110, 110–111 distributivity of, 25 upper triangular, 22, 22, 32–33, 42 INDEX 231

determinant of, 69 of Rn, 90, 91 Vandermonde, 112 Orthonormal system, 18, 90, 108, 109 zero, see Zero matrix Modular equation, 15 Pn(F), 83 for codimension, 47 Parallelepiped, 72, 72 Monomial Parallelogram, 72, 72 multivariate, 98, 98 Permutation, 34, 34–36, 65, 65–68 degree of, 98 composition of, 34, 34, 66, 67 monic, 99 cycle, 67, 68 monic part of, 99 decomposition, 68 Multilinearity, 69 disjoint, 68, 68 of the determinant, 70 even, 66, 66 fixed point of, 74, 74 Norm identity, 35 of column vectors, 18, 19, 72, 90, 108, 109 inverse of, 35, 35 Null space inversion of, 65, 65–68 of a matrix, 47, 52, 53 odd, 66 Nullity parity, 66, 67, 68 of a matrix, 52 product of, 66, 67 sign, 66, 67 Operator norm Permutation matrix, 33 of a matrix, 113, 113–115 Polynomial, 31, 83, 83–84, 87 submultiplicativity of, 113 degree of, 83, 83 Orthogonal group, 90 divisibility of, 83 Orthogonal matrix leading coefficient of, 83, 83 eigenvalues of, 90 leading term of, 83, 83 Orthogonal system, 18, 108 monic, 83 Orthogonality multivariate, 98, 98–100 of column vectors, 17, 17–18, 28, 81, 108 degree of, 99 with respect to a bilinear form, 103 homogeneous, 99, 99–100 Orthogonally similar matrices, 91, 91, 93, 94, standard form, 99 103 of a matrix, 37, 38 Orthonormal basis of matrices, 31, 31, 33, 87, 87, 88, 170, of a subspace of Rn, 18 171 of Cn, 108, 109 root of, 83, 83–84 232 INDEX

multiplicity, 84 real version, 111 zero, 83, 83 Second Miracle of Linear Algebra, 43, 43 Set Quadratic form, 100, 100–103 affine closed, 59 definiteness of, 101, 101–103 affine-closed, 56, 56 indefinite, 101 intersection of, 57 negative definite, 101 closed, 62 negative semidefinite, 101 convex, 61, 61–62 positive definite, 101 dimension of, 62 positive semidefinite, 101 intersection of, 61 corank of, 47 Rn orthonormal basis of, 90, 91 open, 62, 187 subspace of, 9 sum of, 10, 10, 15 Radon’s Lemma, 62 translate of, 54, 55, 57 Rank Shearing, 40, 40 of a matrix, 44, 44–50, 108 Shearing matrix, 41 full, 44 Similar matrices, 82, 82, 87, 102, 103 of column vectors, 13, 13–15 Skew cofactor expansion, 75 Rank-Nullity Theorem Span for matrices, 53 of column vectors, 10, 10, 13, 55 Rayleigh quotient, 94, 100 transitivity of, 10 Rayleigh’s Principle, 94 Spectral Theorem Reflection matrix, 115 for normal matrices, 111 Rook arrangement, 33, 33 for real symmetric matrices, 93, 93–95, Rotation matrix, 26, 26 109, 111 Row space, 39 applications of, 94–95 k Row vector, 21, 21 Standard basis of F , 14, 26, 29 Row-rank, 39 Standard unit vector, 14, 26 full, 39, 45 Steinitz exchange lemma, 15 Straight-line segment, 61, 61 Sn, see Symmetric group Sublist, 12, 12 Scalar, 4 Submatrix, 37, 44, 45 Scaling, 41, 41 Subset Scaling matrix, 42 dense, 111, 111 Schur’s Theorem, 111 Subspace INDEX 233

affine, see Affine subspace Third Miracle of Linear Algebra, 90, 109 of Fk, 8, 8–10, 60 Trace basis of, 14, 14–15 of a matrix, 29, 29–30, 85 codimension of, 47, 47 Transposition, 66, 66–67 dimension of, 13, 13–15 neighbor, 67, 67 disjoint, 15 Triangle inequality, 113, 115 intersection of, 9 totally isotropic, 104, 104–105 Unitarily similar matrices, 110, 110–111 union of, 9 Unitary group, 109 zero-weight, 9 Unitary matrix Swapping, 41, 41 eigenvalues of, 109 Swapping matrix, 42 Symmetric group, 65, 66, 67 Vandermonde determinant, 74 System of linear equations, 7, 51–55, 64 Vandermonde matrix, 37, 37, 74 augmented, 54 Vector homogeneous, 52, 58, 71 isotropic, 17, 17 solution space of, 52, 53–54 trivial solution to, 52 Zero matrix, 21, 29, 45 set of solutions to, 55, 58 Zero polynomial, 83, 83 solvable, 52, 54–55, 77 Zero vector, see 0