
Discover Linear Algebra

Incomplete Preliminary Draft

Date: November 28, 2017

László Babai, in collaboration with Noah Halford

All rights reserved. Approved for instructional use only. Commercial distribution prohibited. © 2016 László Babai.

Last updated: November 10, 2016

Preface

TO BE WRITTEN.

Contents

Notation

I Matrix Theory

Introduction to Part I

1 (F, R) Column Vectors
   1.1 (F) Column vector basics
        1.1.1 The domain of scalars
   1.2 (F) Subspaces and span
   1.3 (F) Linear independence and the First Miracle of Linear Algebra
   1.4 (F) Dot product
   1.5 (R) Dot product over R
   1.6 (F) Additional exercises

2 (F) Matrices
   2.1 Matrix basics
   2.2 Matrix multiplication
   2.3 Arithmetic of diagonal and triangular matrices
   2.4 Permutation Matrices
   2.5 Additional exercises

3 (F) Matrix Rank
   3.1 Column and row rank
   3.2 Elementary operations and Gaussian elimination
   3.3 Invariance of column and row rank, the Second Miracle of Linear Algebra
   3.4 Matrix rank and invertibility
   3.5 (optional)
   3.6 Additional exercises

4 (F) Theory of Systems of Linear Equations I: Qualitative Theory
   4.1 Homogeneous systems of linear equations
   4.2 General systems of linear equations

5 (F, R) Affine and Convex Combinations (optional)
   5.1 (F) Affine combinations
   5.2 (F) Hyperplanes
   5.3 (R) Convex combinations
   5.4 (R) Helly's Theorem
   5.5 (F, R) Additional exercises

6 (F) The Determinant
   6.1 Permutations
   6.2 Defining the determinant
   6.3 Cofactor expansion
   6.4 Matrix inverses and the determinant
   6.5 Additional exercises

7 (F) Theory of Systems of Linear Equations II: Cramer's Rule
   7.1 Cramer's Rule

8 (F) Eigenvectors and Eigenvalues
   8.1 Eigenvector and eigenvalue basics
   8.2 Similar matrices and diagonalizability
   8.3 Polynomial basics
   8.4 The characteristic polynomial
   8.5 The Cayley-Hamilton Theorem
   8.6 Additional exercises

9 (R) Orthogonal Matrices
   9.1 Orthogonal matrices
   9.2 Orthogonal similarity
   9.3 Additional exercises

10 (R) The Spectral Theorem
   10.1 Statement of the Spectral Theorem
   10.2 Applications of the Spectral Theorem

11 (F, R) Bilinear and Quadratic Forms
   11.1 (F) Linear and bilinear forms
   11.2 (F) Multivariate polynomials
   11.3 (R) Quadratic forms
   11.4 (F) (optional)

12 (C) Complex Matrices
   12.1 Complex numbers
   12.2 Hermitian dot product in C^n
   12.3 Hermitian and unitary matrices
   12.4 Normal matrices and unitary similarity
   12.5 Additional exercises

13 (C, R) Matrix Norms
   13.1 (R) Operator norm
   13.2 (R) Frobenius norm
   13.3 (C) Complex Matrices

II Linear Algebra of Vector Spaces

Introduction to Part II

14 (F) Preliminaries
   14.1 Modular arithmetic
   14.2 Abelian groups
   14.3 Fields
   14.4 Polynomials

15 (F) Vector Spaces: Basic Concepts
   15.1 Vector spaces
   15.2 Subspaces and span
   15.3 Linear independence and bases
   15.4 The First Miracle of Linear Algebra
   15.5 Direct sums

16 (F) Linear Maps
   16.1 Linear map basics
   16.2 Isomorphisms
   16.3 The Rank-Nullity Theorem
   16.4 Linear transformations
        16.4.1 Eigenvectors, eigenvalues, eigenspaces
        16.4.2 Invariant subspaces
   16.5 Coordinatization
   16.6 Change of basis

17 (F) Block Matrices (optional)
   17.1 Block matrix basics
   17.2 Arithmetic of block-diagonal and block-triangular matrices

18 (F) Minimal Polynomials of Matrices and Linear Transformations (optional)
   18.1 The minimal polynomial
   18.2 Minimal polynomials of linear transformations

19 (R) Euclidean Spaces
   19.1 Inner products
   19.2 Gram-Schmidt orthogonalization
   19.3 Isometries and orthogonal transformations
   19.4 First proof of the Spectral Theorem

20 (C) Hermitian Spaces
   20.1 Hermitian spaces
   20.2 Hermitian transformations
   20.3 Unitary transformations
   20.4 Adjoint transformations in Hermitian spaces
   20.5 Normal transformations
   20.6 The Complex Spectral Theorem for normal transformations

21 (R, C) The Singular Value Decomposition
   21.1 The Singular Value Decomposition
   21.2 Low-rank approximation

22 (R) Finite Markov Chains
   22.1 Stochastic matrices
   22.2 Finite Markov Chains
   22.3 Digraphs
   22.4 Digraphs and Markov Chains
   22.5 Finite Markov Chains and undirected graphs
   22.6 Additional exercises

23 More Chapters

24 Hints
   24.1 Column Vectors
   24.2 Matrices
   24.3 Matrix Rank
   24.4 Theory of Systems of Linear Equations I: Qualitative Theory
   24.5 Affine and Convex Combinations
   24.6 The Determinant
   24.7 Theory of Systems of Linear Equations II: Cramer's Rule
   24.8 Eigenvectors and Eigenvalues
   24.9 Orthogonal Matrices
   24.10 The Spectral Theorem
   24.11 Bilinear and Quadratic Forms
   24.12 Complex Matrices
   24.13 Matrix Norms
   24.14 Preliminaries
   24.15 Vector Spaces
   24.16 Linear Maps
   24.17 Block Matrices
   24.18 Minimal Polynomials of Matrices and Linear Transformations
   24.19 Euclidean Spaces
   24.20 Hermitian Spaces
   24.21 The Singular Value Decomposition
   24.22 Finite Markov Chains

25 Solutions
   25.1 Column Vectors
   25.2 Matrices
   25.3 Matrix Rank
   25.4 Theory of Systems of Linear Equations I: Qualitative Theory
   25.5 Affine and Convex Combinations
   25.6 The Determinant
   25.7 Theory of Systems of Linear Equations II: Cramer's Rule
   25.8 Eigenvectors and Eigenvalues
   25.9 Orthogonal Matrices
   25.10 The Spectral Theorem
   25.11 Bilinear and Quadratic Forms
   25.12 Complex Matrices
   25.13 Matrix Norms
   25.14 Preliminaries
   25.15 Vector Spaces
   25.16 Linear Maps
   25.17 Block Matrices
   25.18 Minimal Polynomials of Matrices and Linear Transformations
   25.19 Euclidean Spaces
   25.20 Hermitian Spaces
   25.21 The Singular Value Decomposition
   25.22 Finite Markov Chains

Notation

Symbol      Meaning
C           The field of complex numbers
diag        Diagonal matrix
im(ϕ)       Image of ϕ
N           The set {0, 1, 2, . . . } of natural numbers
[n]         The set {1, . . . , n}
Q           The field of rational numbers
R           The field of real numbers
R×          The set of real numbers excluding 0
Z           The set {. . . , −2, −1, 0, 1, 2, . . . } of integers
Z+          The set {1, 2, 3, . . . } of positive integers
⊕           Direct sum
≤           Subspace
∅           The empty set
⊥           Orthogonal

Part I

Matrix Theory

Introduction to Part I

TO BE WRITTEN. (R Contents)

Chapter 1

(F, R) Column Vectors

(R Contents)

1.1 (F) Column vector basics

We begin with a discussion of column vectors.

Definition 1.1.1 (Column vector). A column vector of height k is a list of k numbers arranged in a column, written as

    ( α1 )
    ( α2 )
    (  ⋮ )
    ( αk )

The k numbers in the column are referred to as the entries of the column vector; we will normally use lower-case Greek letters such as α, β, and ζ to denote these numbers. We denote column vectors by bold letters such as u, v, w, x, y, b, e, f, etc., so we may write v = (α1, α2, . . . , αk)^T.

1.1.1 The domain of scalars

In general, the entries of column vectors will be taken from a "field," denoted by F. We shall refer to the elements of the field as "scalars," and we will normally denote scalars by lower-case Greek letters. We will discuss fields in detail in Section 14.3. Informally, a field is a set endowed with the operations of addition and multiplication which obey the rules familiar from the arithmetic of real numbers (commutativity, associativity, inverses, etc.). Examples of fields are Q, R, C, and Fp, where Q is the set of rational numbers, R is the set of real numbers, C is the set of complex numbers, and Fp is the set of "integers modulo p" for a prime p, so Fp is a finite field of "order" p (order = number of elements).

The reader who is not comfortable with finite fields or with the abstract concept of fields may ignore the exercises related to them and always take F to be Q, R, or C. In fact, taking F to be R suffices for all sections except those specifically related to C.


We shall also consider integral vectors, i. e., vectors whose entries are integers. Z denotes the set of integers and Z^k the set of integral vectors of height k, so Z^k ⊆ Q^k ⊆ R^k ⊆ C^k. Note that Z is not a field (division does not work within Z). Our notation F for the domain of scalars always refers to a field and therefore does not include Z.

Notation 1.1.2. Let α1, . . . , αk ∈ F. The expression Σ_{i=1}^{k} αi denotes the sum α1 + · · · + αk. More generally, for an index set I = {i1, i2, . . . , iℓ}, the expression Σ_{i∈I} αi denotes the sum α_{i1} + · · · + α_{iℓ}.

Convention 1.1.3 (Empty sum). The empty sum, denoted Σ_{i=1}^{0} αi or Σ_{i∈∅} αi, evaluates to zero.

∗ ∗ ∗

Definition 1.1.4 (The space F^k). For a domain F of scalars, we define the space F^k of column vectors of height k over F by

    F^k = { (α1, α2, . . . , αk)^T : αi ∈ F } .    (1.1)

Definition 1.1.5 (Zero vector). The zero vector in F^k is the vector

    0_k := (0, . . . , 0)^T .    (1.2)

We often write 0 instead of 0_k when the height of the vector is clear from context.

Definition 1.1.6 (All-ones vector). The all-ones vector in F^k is the vector

    1_k := (1, . . . , 1)^T .    (1.3)

We sometimes write 1 instead of 1_k.

Definition 1.1.7 (Addition of column vectors). Addition of column vectors of the same height is defined elementwise, i. e.,

    (α1, . . . , αk)^T + (β1, . . . , βk)^T = (α1 + β1, . . . , αk + βk)^T .    (1.4)

Example 1.1.8. Let v = (2, 6, −1)^T and let w = (0, −3, 2)^T. Then v + w = (2, 3, 1)^T.

Exercise 1.1.9. Verify that vector addition is commutative, i. e., for v, w ∈ F^k, we have

    v + w = w + v .    (1.5)
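These elementwise operations are easy to experiment with numerically. The following minimal sketch assumes Python with NumPy (my choice of tooling, not the book's), with real arrays standing in for F^k; the data is that of Example 1.1.8.

```python
import numpy as np

# Column vectors of height 3, as in Example 1.1.8.
v = np.array([2, 6, -1])
w = np.array([0, -3, 2])

zero = np.zeros(3)   # the zero vector 0_3 of Definition 1.1.5
ones = np.ones(3)    # the all-ones vector 1_3 of Definition 1.1.6

# Addition is elementwise (Definition 1.1.7).
print(v + w)         # [2 3 1], matching Example 1.1.8

# Self-checks for Exercises 1.1.9 and 1.1.10: commutativity, associativity.
u = np.array([1, 1, 1])
assert np.array_equal(v + w, w + v)
assert np.array_equal(u + (v + w), (u + v) + w)
```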

Exercise 1.1.10. Verify that vector addition is associative, i. e., for u, v, w ∈ F^k, we have

    u + (v + w) = (u + v) + w .    (1.6)

Column vectors also carry with them the notion of "scaling" by an element of F.

Definition 1.1.11 (Multiplication of a column vector by a scalar). Let v ∈ F^k and let λ ∈ F. Then the vector λv is the vector v after each entry has been scaled (multiplied) by a factor of λ, i. e.,

    λ (α1, α2, . . . , αk)^T = (λα1, λα2, . . . , λαk)^T .    (1.7)

Example 1.1.12. Let v = (2, −1, 1)^T. Then 3v = (6, −3, 3)^T.

Definition 1.1.13 (Linear combination). Let v1, . . . , vm ∈ F^n. Then a linear combination of the vectors v1, . . . , vm is a sum of the form Σ_{i=1}^{m} αi vi where α1, . . . , αm ∈ F.

Example 1.1.14. The following is a linear combination.

    (  2 )     ( −1 )         (  6 )   ( −5 )
    ( −3 ) + 4 (  2 ) − (1/2) (  2 ) = (  4 )
    (  6 )     (  0 )         ( 10 )   (  1 )
    (  1 )     ( −3 )         ( −8 )   ( −7 )

Numerical exercise 1.1.15. The following linear combinations are of the form αa + βb + γc. Evaluate in two ways, as (αa + βb) + γc and as αa + (βb + γc). Self-check: you must get the same answer.

(a) (2, 4)^T + 3 (3, −1)^T − 2 (−7, 2)^T

(b) −2 (1, 6, 3)^T + (1/2) (−4, −2, 6)^T + (1, 1, 4)^T

(c) − (3, 7, −1)^T − 4 (−2, 3, 0)^T + (7, −4, 3)^T

Exercise 1.1.16. Express the vector (−5, 11, −2)^T as a linear combination of the vectors (1, 7, 2)^T and (6, 6, 6)^T.

Exercise 1.1.17.

(a) Express the vector (3, 1, −2)^T as a linear combination of the vectors (1, 1, 1)^T, (−3, 3, 0)^T, (1, 2, 3)^T. First describe the nature of the problem you need to solve.

(b) Give an "aha" proof that (3, 1, −2)^T cannot be expressed as a linear combination of (−3, 1, 2)^T, (2, −2, 0)^T, (3, −8, 5)^T. An "aha" proof may not be easy to find but it has to be immediately convincing. This problem will be generalized in Ex. 1.2.7.

Exercise 1.1.18. To what does a linear combination of the empty list of vectors evaluate? (R Convention 1.1.3)

Let us now consider the system of linear equations

    α11 x1 + α12 x2 + · · · + α1n xn = β1
    α21 x1 + α22 x2 + · · · + α2n xn = β2
                      ⋮                        (1.8)
    αk1 x1 + αk2 x2 + · · · + αkn xn = βk

Given the αij and the βi, we need to find x1, . . . , xn that satisfy these equations. This is arguably one of the most fundamental problems of applied mathematics. We can rephrase this problem in terms of the vectors a1, . . . , an, b, where

    aj := (α1j, α2j, . . . , αkj)^T

is the column of coefficients of xj and the vector

    b = (β1, β2, . . . , βk)^T

represents the right-hand side. With this notation, our system of linear equations takes the more concise form

    x1 a1 + x2 a2 + · · · + xn an = b .    (1.9)

The problem of solving the system of equations (1.8) therefore is equivalent to expressing the vector b as a linear combination of the ai.

(R Contents)
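The equivalence of (1.8) and (1.9) can be seen concretely with a solver. Here is a sketch, again assuming NumPy, applied to the data of Exercise 1.1.17(a); it works in this instance because the three given vectors happen to form an invertible matrix.

```python
import numpy as np

# Exercise 1.1.17(a) in the form (1.9): find x1, x2, x3 with
# x1*a1 + x2*a2 + x3*a3 = b, i.e., solve Ax = b where A = [a1 | a2 | a3].
A = np.column_stack(([1, 1, 1], [-3, 3, 0], [1, 2, 3]))
b = np.array([3, 1, -2])

x = np.linalg.solve(A, b)      # valid here: A is square and invertible
print(x)
assert np.allclose(A @ x, b)   # b really is the linear combination sum x_j a_j
```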

1.2 (F) Subspaces and span

Definition 1.2.1 (Subspace). A set W ⊆ F^n is a subspace of F^n (denoted W ≤ F^n) if it is closed under linear combinations.

Exercise 1.2.2. Let W ≤ F^k. Show that 0 ∈ W. (Why is the empty set not a subspace?)

Proposition 1.2.3. W ≤ F^n if and only if

(a) 0 ∈ W

(b) If u, v ∈ W, then u + v ∈ W

(c) If v ∈ W and α ∈ F, then αv ∈ W.

Exercise 1.2.4. Show that, if W is a nonempty subset of F^n, then (c) implies (a).

Exercise 1.2.5. Show

(a) {0} ≤ F^k ;

(b) F^k ≤ F^k .

We refer to these as the trivial subspaces of F^k.

Exercise 1.2.6.

(a) The set { (α1, 0, α3)^T : α1 = 2α3 } is a subspace of R^3.

(b) The set { (α1, α2)^T : α2 = α1 + 7 } is not a subspace of R^2.

Exercise 1.2.7. Let

    W_k = { (α1, . . . , αk)^T ∈ F^k : Σ_{i=1}^{k} αi = 0 } .

Show that W_k ≤ F^k. This is the 0-weight subspace of F^k.

Exercise 1.2.8. Prove that, for n ≥ 2, the space R^n has infinitely many subspaces.

Proposition 1.2.9. Let W1, W2 ≤ F^n. Then

(a) W1 ∩ W2 ≤ F^n

(b) The intersection of any (finite or infinite) collection of subspaces of F^n is also a subspace of F^n.

(c) W1 ∪ W2 ≤ F^n if and only if W1 ⊆ W2 or W2 ⊆ W1

Definition 1.2.10 (Span). Let S ⊆ F^n. Then the span of S, denoted span(S), is the smallest subspace of F^n containing S, i. e.,

(a) span S ⊇ S;

(b) span S is a subspace of F^n;

(c) for every subspace W ≤ F^n, if S ⊆ W then span S ≤ W.

Fact 1.2.11. span(∅) = {0}. (Why?)

Theorem 1.2.12. Let S ⊆ F^n. Then

(a) span S exists and is unique;

(b) span(S) = ⋂_{S ⊆ W ≤ F^n} W. This is the intersection of all subspaces containing S. ♦
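Span membership can be tested numerically: v lies in span(S) exactly when appending v to S does not enlarge the space. The sketch below uses the rank of the matrix formed by the vectors as a proxy (a standard trick over R with floating point, so exact-arithmetic caveats apply); the helper name in_span is mine, not the book's.

```python
import numpy as np

def in_span(v, S):
    """Test v ∈ span(S): appending v must not increase the rank."""
    r_before = np.linalg.matrix_rank(np.column_stack(S))
    r_after = np.linalg.matrix_rank(np.column_stack(S + [v]))
    return r_after == r_before

S = [np.array([1, 1, 1]), np.array([-3, 3, 0])]
print(in_span(np.array([-2, 4, 1]), S))   # True: (-2,4,1)^T = a1 + a2
print(in_span(np.array([0, 0, 1]), S))    # False: not in the plane span(S)
```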

This theorem tells us that the span exists. The next theorem constructs all the elements of the span.

Theorem 1.2.13. For S ⊆ F^n, span S is the set of all linear combinations of the finite subsets of S. (Note that this is true even when S is empty. Why?) ♦

Proposition 1.2.14. Let S ⊆ F^n. Then S ≤ F^n if and only if S = span(S).

Proposition 1.2.15. Let S ⊆ F^n. Prove that span(span(S)) = span(S). Prove this

(a) based on the definition;

(b) (linear combinations of linear combinations) based on Theorem 1.2.13.

Proposition 1.2.16 (Transitivity of span). Suppose R ⊆ span(T) and T ⊆ span(S). Then R ⊆ span(S).

Definition 1.2.17 (Sum of sets). Let A, B ⊆ F^n. Then A + B is the set

    A + B = {a + b | a ∈ A, b ∈ B} .    (1.10)

Proposition 1.2.18. Let U1, U2 ≤ F^n. Then U1 + U2 = span(U1 ∪ U2).

(R Contents)

1.3 (F) Linear independence and the First Miracle of Linear Algebra

In this section, v1, v2, . . . will denote column vectors of height k, i. e., vi ∈ F^k.

Definition 1.3.1 (List). A list of objects ai is a function whose domain is an "index set" I; we write the list as (ai | i ∈ I). Most often our index set will be I = {1, . . . , n}. In this case, we write the list as (a1, . . . , an). The size of the list is |I|.

Notation 1.3.2 (Concatenation of lists). Let L = (v1, v2, . . . , vℓ) and M = (w1, w2, . . . , wn) be lists. We denote by (L, M) the list obtained by concatenating L and M, i. e., the list (v1, v2, . . . , vℓ, w1, w2, . . . , wn). If the list M has only one element, we omit the parentheses around the list, that is, we write (L, w) rather than (L, (w)).

Definition 1.3.3 (Trivial linear combination). The trivial linear combination of the vectors v1, . . . , vm is the linear combination 0v1 + · · · + 0vm (all coefficients are 0).

Fact 1.3.4. The trivial linear combination evaluates to zero.

Definition 1.3.5 (Linear independence). The list (v1, . . . , vn) is said to be linearly independent if the only linear combination that evaluates to zero is the trivial linear combination.

The list (v1, . . . , vn) is linearly dependent if it is not linearly independent, i. e., if there exist scalars α1, . . . , αn, not all zero, such that Σ_{i=1}^{n} αi vi = 0.

Definition 1.3.6. If a list (v1, . . . , vn) of vectors is linearly independent (dependent), we say that the vectors v1, . . . , vn are linearly independent (dependent).

Definition 1.3.7. We say that a set of vectors is linearly independent if a list formed by its elements (in any order and without repetitions) is linearly independent.

Example 1.3.8. Let

    v1 = (1, −1, 2)^T , v2 = (3, 1, 0)^T , v3 = (−4, 1, 2)^T .

Then v1, v2, v3 are linearly independent vectors. It follows that the set {v1, v2, v3, v2} is linearly independent while the list (v1, v2, v3, v2) is linearly dependent. Note that the list (v1, v2, v3, v2) has four elements, but the set {v1, v2, v3, v2} has three elements.

Exercise 1.3.9. Show that the vectors (1, −3, 2)^T, (4, 2, 1)^T, (−2, −8, 3)^T are linearly dependent.

Definition 1.3.10. We say that the vector w depends on the list (v1, . . . , vk) of vectors if w ∈ span(v1, . . . , vk), i. e., if w can be expressed as a linear combination of the vi.

Proposition 1.3.11. The vectors v1, . . . , vk are linearly dependent if and only if there is some i such that vi depends on the other vectors in the list.

Exercise 1.3.12. Show that the list (v, w, v + w) is linearly dependent.

Exercise 1.3.13. Is the empty list linearly independent?

Exercise 1.3.14. Show that if a list is linearly independent, then any permutation of it is linearly independent.

Proposition 1.3.15. The list (v, v), consisting of the vector v ∈ F^n listed twice, is linearly dependent.

Exercise 1.3.16. Is there a vector v such that the list (v), consisting of a single item, is linearly dependent?

Exercise 1.3.17. Which vectors depend on the empty list?

Definition 1.3.18 (Sublist). A sublist of a list L is a list M consisting of some of the elements of L, in the same order in which they appear in L.

Examples 1.3.19. Let L = (a, b, c, b, d, e).

(a) The empty list is a sublist of L.

(b) L is a sublist of itself.

(c) The list L1 = (a, b, d) is a sublist of L.

(d) The list L2 = (b, b) is a sublist of L.

(e) The list L3 = (b, b, b) is not a sublist of L.

(f) The list L4 = (a, d, c, e) is not a sublist of L.

Fact 1.3.20. Every sublist of a linearly independent list of vectors is linearly independent.

The following lemma is central to the proof of the First Miracle of Linear Algebra (R Theorem 1.3.40) as well as to our characterization of bases as maximal linearly independent sets (R Prop. 1.3.37).

Lemma 1.3.21. Suppose (v1, . . . , vk) is a linearly independent list of vectors and the list (v1, . . . , vk+1) is linearly dependent. Then vk+1 ∈ span(v1, . . . , vk). ♦

Proposition 1.3.22. The vectors v1, . . . , vk are linearly dependent if and only if there is some j such that

    vj ∈ span(v1, . . . , vj−1, vj+1, . . . , vk) .

Exercise 1.3.23. Prove that no list of vectors containing 0 is linearly independent.

Exercise 1.3.24. Prove that a list of vectors with repeated elements (the same vector occurs more than once) is linearly dependent. (This follows from combining which two previous exercises?)

Definition 1.3.25 (Parallel vectors). Let u, v ∈ F^n. We say that u and v are parallel if there exists a scalar α such that u = αv or v = αu. Note that 0 is parallel to all vectors, and the relation of being parallel is an equivalence relation (R Def. 14.1.19) on the set of nonzero vectors.

Exercise 1.3.26. Let u, v ∈ F^n. Show that the list (u, v) is linearly dependent if and only if u and v are parallel.

Exercise 1.3.27. Find n + 1 vectors in F^n such that every n are linearly independent. Over infinite fields, a much stronger statement holds (R Ex. 15.3.15).

Definition 1.3.28 (Rank). The rank of a set S ⊆ F^n, denoted rk S, is the size of the largest linearly independent subset of S. The rank of a list is the rank of the set formed by its elements.

Proposition 1.3.29. Let S and T be lists. Show that

    rk(S, T) ≤ rk S + rk T    (1.11)

where rk(S, T) is the rank of the list obtained by concatenating the lists S and T.

Definition 1.3.30 (Dimension). Let W ≤ F^n be a subspace. The dimension of W, denoted dim W, is its rank, that is, dim W = rk W.

Definition 1.3.31 (List of generators). Let W ≤ F^n. The list L = (v1, . . . , vk) of vectors in W is said to be a list of generators of W if span(v1, . . . , vk) = W. In this case, we say that v1, . . . , vk generate W.
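Numerically, the rank of Definition 1.3.28 can be probed with NumPy's matrix_rank (an assumption of this sketch; it uses floating-point SVD internally, so it is a heuristic, not a proof). A list is linearly independent exactly when its rank equals its size.

```python
import numpy as np

def rank_of_list(vectors):
    """Rank (Definition 1.3.28) of a list of column vectors."""
    return np.linalg.matrix_rank(np.column_stack(vectors))

v1, v2, v3 = np.array([1, -1, 2]), np.array([3, 1, 0]), np.array([-4, 1, 2])
print(rank_of_list([v1, v2, v3]))       # 3: the list is independent (Example 1.3.8)
print(rank_of_list([v1, v2, v3, v2]))   # still 3: the repeated list is dependent
```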

Definition 1.3.32 (Basis). A list b = (b1, . . . , bn) is a basis of the subspace W ≤ F^k if b is a linearly independent list of generators of W.

Fact 1.3.33. If W ≤ F^n and b is a list of vectors of W then b is a basis of W if and only if it is linearly independent and span(b) = W.

Definition 1.3.34 (Standard basis of F^k). The standard basis of F^k is the basis (e1, . . . , ek), where ei is the column vector which has its i-th component equal to 1 and all other components equal to 0. The vectors e1, . . . , ek are sometimes called the standard unit vectors. For example, the standard basis of F^3 is

    (1, 0, 0)^T , (0, 1, 0)^T , (0, 0, 1)^T .

Exercise 1.3.35. The standard basis of F^k is a basis.

Note that subspaces of F^k do not have a "standard basis."

Definition 1.3.36 (Maximal linearly independent list). Let W ≤ F^n. A list L = (v1, . . . , vk) of vectors in W is a maximal linearly independent list in W if L is linearly independent but for all w ∈ W, the list (L, w) is linearly dependent.

Proposition 1.3.37. Let W ≤ F^k and let b = (b1, . . . , bk) be a list of vectors in W. Then b is a basis of W if and only if it is a maximal linearly independent list.

Exercise 1.3.38. Prove: every subspace W ≤ F^k has a basis. You may assume the fact, to be proved shortly (R Cor. 1.3.49), that every linearly independent list of vectors in F^k has size at most k.

Proposition 1.3.39. Let L be a finite list of generators of W ≤ F^k. Then L contains a basis of W.

Next we state a central result to which the entire field of linear algebra arguably owes its existence.

Theorem 1.3.40 (First Miracle of Linear Algebra). Let v1, . . . , vk be linearly independent with vi ∈ span(w1, . . . , wm) for all i. Then k ≤ m.

The proof of this theorem requires the following lemma.

Lemma 1.3.41 (Steinitz exchange lemma). Let (v1, . . . , vk) be a linearly independent list of vectors such that vi ∈ span(w1, . . . , wm) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (wj, v2, . . . , vk) is linearly independent. ♦

Exercise 1.3.42. Use the Steinitz exchange lemma to prove the First Miracle of Linear Algebra.

Corollary 1.3.43. Let W ≤ F^k. Every basis of W has the same size (same number of vectors).
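Proposition 1.3.39 has a natural algorithmic reading: scan the generators and keep only those that add something new. The sketch below (helper name extract_basis is mine; rank is again the floating-point proxy) produces a basis contained in the list of generators.

```python
import numpy as np

def extract_basis(generators):
    """Prop. 1.3.39 made algorithmic: keep a vector only if it is
    not in the span of the vectors kept so far."""
    basis = []
    for v in generators:
        if np.linalg.matrix_rank(np.column_stack(basis + [v])) > len(basis):
            basis.append(v)
    return basis

gens = [np.array([1, 0, 1]), np.array([2, 0, 2]),
        np.array([0, 1, 0]), np.array([1, 1, 1])]
B = extract_basis(gens)
print(len(B))   # 2: the four generators span a 2-dimensional subspace
```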

Exercise 1.3.44. Prove: Cor. 1.3.43 is equivalent to the First Miracle, i. e., infer the First Miracle from Cor. 1.3.43.

Corollary 1.3.45. Every basis of F^k has size k.

The following result is essentially a restatement of the First Miracle of Linear Algebra.

Corollary 1.3.46. rk(v1, . . . , vk) = dim(span(v1, . . . , vk)).

Exercise 1.3.47. Prove Cor. 1.3.46 is equivalent to the First Miracle, i. e., infer the First Miracle from Cor. 1.3.46.

Corollary 1.3.48. dim F^k = k.

Corollary 1.3.49. Every linearly independent list of vectors in F^k has size at most k.

Corollary 1.3.50. Let W ≤ F^k and let L be a linearly independent list of vectors in W. Then L can be extended to a basis of W.

Exercise 1.3.51. Let U1, U2 ≤ F^n with U1 ∩ U2 = {0}. Let v1, . . . , vk ∈ U1 and w1, . . . , wℓ ∈ U2. If the lists (v1, . . . , vk) and (w1, . . . , wℓ) are linearly independent, then so is the concatenated list (v1, . . . , vk, w1, . . . , wℓ).

Proposition 1.3.52. Let U1, U2 ≤ F^n with U1 ∩ U2 = {0}. Then

    dim U1 + dim U2 ≤ n .    (1.12)

Proposition 1.3.53 (Modular equation). Let U1, U2 ≤ F^k. Then

    dim(U1 + U2) + dim(U1 ∩ U2) = dim U1 + dim U2 .    (1.13)

(R Contents)

1.4 (F) Dot product

We now define the dot product, an operation which takes two vectors as input and outputs a scalar.

Definition 1.4.1 (Dot product). Let x, y ∈ F^n with x = (α1, α2, . . . , αn)^T and y = (β1, β2, . . . , βn)^T. (Note that these are column vectors.) Then the dot product of x and y, denoted x · y, is

    x · y := Σ_{i=1}^{n} αi βi ∈ F .    (1.14)

Proposition 1.4.2. The dot product is symmetric, that is,

    x · y = y · x .

Exercise 1.4.3. Show that the dot product is distributive, that is,

    x · (y + z) = x · y + x · z .

Exercise 1.4.4. Show that the dot product is bilinear (R Def. 6.2.11), i. e., for x, y, z ∈ F^n and α ∈ F, we have

    (x + z) · y = x · y + z · y    (1.15)
    (αx) · y = α(x · y)    (1.16)
    x · (y + z) = x · y + x · z    (1.17)
    x · (αy) = α(x · y)    (1.18)

Exercise 1.4.5. Show that the dot product preserves linear combinations, i. e., for x1, . . . , xk, y ∈ F^n and α1, . . . , αk ∈ F, we have

    ( Σ_{i=1}^{k} αi xi ) · y = Σ_{i=1}^{k} αi (xi · y)    (1.19)

and for x, y1, . . . , yℓ ∈ F^n and β1, . . . , βℓ ∈ F,

    x · ( Σ_{i=1}^{ℓ} βi yi ) = Σ_{i=1}^{ℓ} βi (x · yi) .    (1.20)

Numerical exercise 1.4.6. Let

    x = (3, 4, −2)^T , y = (1, 0, 2)^T , z = (3, 2, −4)^T .

Compute x · y, x · z, and x · (y + z). Self-check: verify that x · (y + z) = (x · y) + (x · z).

Exercise 1.4.7. Compute the following dot products.

(a) 1_k · 1_k for k ≥ 1.

(b) x · x where x = (α1, . . . , αk)^T ∈ F^k.

Definition 1.4.8 (Orthogonality). Let x, y ∈ F^k. Then x and y are orthogonal (notation: x ⊥ y) if x · y = 0.

Exercise 1.4.9. What vectors are orthogonal to every vector?

Exercise 1.4.10. Which vectors are orthogonal to 1_k ?

Definition 1.4.11 (Isotropic vector). The vector v ∈ F^n is isotropic if v ≠ 0 and v · v = 0.

Exercise 1.4.12.

(a) Show that there are no isotropic vectors in R^n.

(b) Find an isotropic vector in C^2.

(c) Find an isotropic vector in F_2^2.

Theorem 1.4.13. If v1, . . . , vk are pairwise orthogonal and non-isotropic vectors, then they are linearly independent. ♦

(R Contents)
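The contrast between R and a finite field in Exercise 1.4.12 can be seen directly in code. The sketch below (NumPy assumed; the helper dot_mod2 is my name for the F_2 dot product) exhibits an isotropic vector in F_2^2.

```python
import numpy as np

def dot_mod2(x, y):
    """Dot product over F_2: ordinary integer dot product, reduced mod 2."""
    return int(np.dot(x, y)) % 2

v = np.array([1, 1])
print(dot_mod2(v, v))   # 0: v is nonzero yet v·v = 1+1 = 2 = 0 in F_2,
                        # so v is isotropic (cf. Exercise 1.4.12(c))

# Over R, x·x is a sum of squares, hence positive for x != 0.
x = np.array([3.0, 4.0, -2.0])
print(np.dot(x, x))     # 29.0 > 0
```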

1.5 (R) Dot product over R

We now specialize our discussion of the dot product to the space R^n.

Exercise 1.5.1. Let x ∈ R^n. Show x · x ≥ 0, with equality holding if and only if x = 0.

This "positive definiteness" of the dot product allows us to define the "norm" of a vector, a generalization of length.

Definition 1.5.2 (Norm). The norm of a vector x ∈ R^n, denoted ‖x‖, is defined as

    ‖x‖ := √(x · x) .    (1.21)

This norm is also referred to as the Euclidean norm or the ℓ2 norm.

Definition 1.5.3 (Orthogonal system). An orthogonal system in R^n is a list of (pairwise) orthogonal nonzero vectors in R^n.

Exercise 1.5.4. Let S ⊆ R^k be an orthogonal system in R^k. Prove that S is linearly independent.

Definition 1.5.5 (Orthonormal system). An orthonormal system in R^n is a list of (pairwise) orthogonal vectors in R^n, all of which have unit norm.

Definition 1.5.6 (Orthonormal basis). An orthonormal basis of W ≤ R^n is an orthonormal system that is a basis of W.

Exercise 1.5.7.

(a) Find an orthonormal basis of R^n.

(b) Find all orthonormal bases of R^2.

Orthogonality is studied in more detail in Chapter 19.

(R Contents)

1.6 (F) Additional exercises

Exercise 1.6.1. Let v1, . . . , vn be vectors such that ‖vi‖ > 1 for all i, and vi · vj = 1 whenever i ≠ j. Show that v1, . . . , vn are linearly independent.

Definition 1.6.2 (Incidence vector). Let A ⊆ {1, . . . , n}. Then the incidence vector vA ∈ R^n is the vector whose i-th coordinate is 1 if i ∈ A and 0 otherwise.

Exercise 1.6.3. Let A, B ⊆ {1, . . . , n}. Express the dot product vA · vB in terms of the sets A and B.

♥ Exercise 1.6.4 (Generalized Fisher Inequality). Let λ ≥ 1 and let A1, . . . , Am ⊆ {1, . . . , n} such that for all i ≠ j we have |Ai ∩ Aj| = λ. Then m ≤ n.

(R Contents)
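A short numerical experiment with the norm and with the incidence vectors of Exercise 1.6.3 (NumPy assumed; the helper incidence is my name, not the book's). Comparing the printed dot product with the two sets should suggest the general formula the exercise asks for.

```python
import numpy as np

def incidence(A, n):
    """Incidence vector v_A in R^n of a subset A of {1, ..., n}."""
    v = np.zeros(n)
    v[[i - 1 for i in A]] = 1   # shift to 0-based indexing
    return v

x = np.array([3.0, 4.0])
print(np.sqrt(np.dot(x, x)))    # 5.0, the Euclidean norm ||x|| of (1.21)

vA = incidence({1, 2, 5}, 6)
vB = incidence({2, 5, 6}, 6)
print(int(np.dot(vA, vB)))      # compare this value with the sets A and B
```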

Chapter 2

(F) Matrices

(R Contents)

2.1 Matrix basics

Definition 2.1.1 (Matrix). A k × n matrix is a table of numbers arranged in k rows and n columns, written as

    ( α11  α12  · · ·  α1n )
    (  ⋮    ⋮            ⋮ )
    ( αk1  αk2  · · ·  αkn )

We may write M = (αi,j)k×n to indicate a matrix whose entry in position (i, j) (i-th row, j-th column) is αi,j. For typographical convenience we usually omit the comma separating the row index and the column index and simply write αij instead of αi,j; we use the comma if its omission would lead to ambiguity. So we write M = (αij)k×n, or simply M = (αij) if the values k and n are clear from context. We also write (M)ij to indicate the (i, j) entry of the matrix M, i. e., αij = (M)ij.

Example 2.1.2. This is a 3 × 5 matrix.

    ( 1  3  −2   0  −1 )
    ( 6  5   4  −3  −6 )
    ( 2  7   1   1   5 )

In this example, we have α22 = 5 and α15 = −1.

Definition 2.1.3 (The space F^{k×n}). The set of k × n matrices with entries from the domain F of scalars is denoted by F^{k×n}. Recall that F always denotes a field. Square matrices (k = n) have special significance, so we write Mn(F) := F^{n×n}. We identify M1(F) with F and omit the matrix notation, i. e., we write α rather than (α). An integral matrix is a matrix with integer entries. Naturally, Z^{k×n} will denote the set of k × n integral matrices, and Mn(Z) = Z^{n×n}. Recall that Z is not a field.

Example 2.1.4. For example,

    (  0  −1  4  7 )
    ( −3   5  6  8 )

∈ R^{2×4} is a 2 × 4 matrix and

    (  2   6   9 )
    (  3  −4  −2 )
    ( −5   1   4 )

∈ M3(R) is a 3 × 3 matrix.

Observe that the column vectors of height k introduced in Chapter 1 are k × 1 matrices, so F^k = F^{k×1}. Moreover, every statement about column vectors in Chapter 1 applies analogously to 1 × n matrices ("row vectors").


Notation 2.1.5. When writing row vectors, we use commas to avoid ambiguity, so we write, for example, (3, 5, −1) instead of (3 5 −1).

Notation 2.1.6 (Zero matrix). The k × n matrix with all of its entries equal to 0 is called the zero matrix and is denoted by 0_{k×n}, or simply by 0 if k and n are clear from context.

Notation 2.1.7 (All-ones matrix). The k × n matrix with all of its entries equal to 1 is denoted by J_{k×n} or J. We write J_n for J_{n×n}.

Definition 2.1.8 (Diagonal matrix). A matrix A = (αij) ∈ Mn(F) is diagonal if αij = 0 whenever i ≠ j. The n × n diagonal matrix with entries λ1, . . . , λn is denoted by diag(λ1, . . . , λn).

Example 2.1.9.

    diag(5, 3, 0, −1, 5) = ( 5  0  0   0  0 )
                           ( 0  3  0   0  0 )
                           ( 0  0  0   0  0 )
                           ( 0  0  0  −1  0 )
                           ( 0  0  0   0  5 )

Notation 2.1.10. To avoid filling most of a matrix with the number "0", we often write matrices like the one above as

    diag(5, 3, 0, −1, 5) = ( 5            0 )
                           (    3           )
                           (       0        )
                           (         −1     )
                           ( 0            5 )

where the big 0 symbol means that every entry in the triangles above or below the diagonal is 0.

Definition 2.1.11 (Upper and lower triangular matrices). A matrix A = (αij) ∈ Mn(F) is upper triangular if αij = 0 whenever i > j. A is said to be strictly upper triangular if αij = 0 whenever i ≥ j. Lower triangular and strictly lower triangular matrices are defined analogously.

Examples 2.1.12.

(a) The matrix

    ( 5  2  0   7 )
    ( 0  3  0  −4 )
    ( 0  0  0   6 )
    ( 0  0  0  −1 )

is upper triangular.

(b) The matrix

    ( 0  2   0   7 )
    ( 0  0  −3  −4 )
    ( 0  0   0   6 )
    ( 0  0   0   0 )

is strictly upper triangular.

(c) The matrix

    ( 5   0  0   0 )
    ( 2   3  0   0 )
    ( 0   0  0   0 )
    ( 7  −4  6  −1 )

is lower triangular.

(d) The matrix

    ( 0   0  0  0 )
    ( 2   0  0  0 )
    ( 0  −3  0  0 )
    ( 7  −4  6  0 )

is strictly lower triangular.

Fact 2.1.13. The diagonal matrices are the matrices which are simultaneously upper and lower triangular.

Definition 2.1.14 (Matrix transpose). The transpose of a k × ℓ matrix M = (αij) is the ℓ × k matrix (βij) defined by

    βij = αji    (2.1)

and is denoted M^T. (We flip it across its main diagonal, so the rows of A become the columns of A^T and vice versa.)

Examples 2.1.15.

(a)

    ( 3  1  4 )^T   ( 3  1 )
    ( 1  5  9 )   = ( 1  5 )
                    ( 4  9 )

(b) 1_k^T = (1, 1, . . . , 1)  (k times)

(c) In Examples 2.1.12, the matrix (c) is the transpose of (a), and (d) is the transpose of (b).

Fact 2.1.16. Let A be a matrix. Then (A^T)^T = A.

Definition 2.1.17 (Symmetric matrix). A matrix M is symmetric if M = M^T.

Note that if a matrix M ∈ F^{k×ℓ} is symmetric then k = ℓ (M is square).

Example 2.1.18. The matrix

    ( 1   3   0 )
    ( 3   5  −2 )
    ( 0  −2   4 )

is symmetric.

Definition 2.1.19 (Matrix addition). Let A = (αij) and B = (βij) be k × n matrices. Then the sum A + B is the k × n matrix with entries

    (A + B)ij = αij + βij    (2.2)

That is, addition is defined elementwise.

Example 2.1.20.

    ( 2   1 )   ( 1  −6 )   ( 3  −5 )
    ( 4  −2 ) + ( 2   3 ) = ( 6   1 )
    ( 0   5 )   ( 1   4 )   ( 1   9 )

Fact 2.1.21 (Adding zero). For any matrix A ∈ F^{k×n}, A + 0 = 0 + A = A.

Proposition 2.1.22 (Commutativity). Matrix addition obeys the commutative law: if A, B ∈ F^{k×n}, then A + B = B + A.

Proposition 2.1.23 (Associativity). Matrix addition obeys the associative law: if A, B, C ∈ F^{k×n}, then (A + B) + C = A + (B + C).

Definition 2.1.24 (The negative of a matrix). Let A ∈ F^{k×n} be a matrix. Then −A is the k × n matrix defined by (−A)ij = −(A)ij.

Proposition 2.1.25. Let A ∈ F^{k×n}. Then A + (−A) = 0.

Definition 2.1.26 (Multiplication of a matrix by a scalar). Let A = (αij) ∈ F^{k×n}, and let ζ ∈ F. Then ζA is the k × n matrix whose (i, j) entry is ζ · αij.
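The transpose, symmetry, and elementwise addition above are direct to check numerically; here is a minimal NumPy sketch using the data of Examples 2.1.15(a), 2.1.18, and 2.1.20.

```python
import numpy as np

M = np.array([[3, 1, 4],
              [1, 5, 9]])
print(M.T)                          # the transpose of Examples 2.1.15(a)
assert np.array_equal((M.T).T, M)   # Fact 2.1.16

S = np.array([[1, 3, 0],
              [3, 5, -2],
              [0, -2, 4]])
assert np.array_equal(S, S.T)       # S is symmetric (Example 2.1.18)

A = np.array([[2, 1], [4, -2], [0, 5]])
B = np.array([[1, -6], [2, 3], [1, 4]])
print(A + B)                        # Example 2.1.20: [[3,-5],[6,1],[1,9]]
```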

Example 2.1.27.

      (  1  2 )   (  3   6 )
    3 ( −3  4 ) = ( −9  12 )
      (  0  6 )   (  0  18 )

Fact 2.1.28. −A = (−1)A.

(R Contents)

2.2 Matrix multiplication

Definition 2.2.1 (Matrix multiplication). Let A = (αij) be an r × s matrix and B = (βjk) be an s × t matrix. Then the matrix product C = AB is the r × t matrix C = (γik) defined by

    γik = Σ_{j=1}^{s} αij βjk    (2.3)

Exercise 2.2.2. Let A ∈ F^{k×n} and let B ∈ F^{n×m}. Show that

    (AB)^T = B^T A^T .    (2.4)

Proposition 2.2.3 (Distributivity). Matrix multiplication obeys the right distributive law: if A ∈ F^{k×n} and B, C ∈ F^{n×ℓ}, then A(B + C) = AB + AC. Analogously, it obeys the left distributive law: if A, B ∈ F^{k×n} and C ∈ F^{n×ℓ}, then (A + B)C = AC + BC.

Proposition 2.2.4 (Associativity). Matrix multiplication obeys the associative law: if A, B, and C are matrices with compatible dimensions, then (AB)C = A(BC).

Proposition 2.2.5 (Matrix multiplication vs. scaling). Let A ∈ F^{k×n}, B ∈ F^{n×ℓ}, and α ∈ F. Then

    A(αB) = α(AB) = (αA)B .    (2.5)

Numerical exercise 2.2.6. For each of the following triples of matrices, compute the products AB, AC, and A(B + C). Self-check: verify that A(B + C) = AB + AC.

(a)
    A = ( 1  2 )    B = (  3  1  0 )    C = ( 1  −7  −4 )
        ( 2  1 )        ( −4  2  5 )        ( 5   3  −6 )

(b)
    A = ( 2   5 )    B = ( 4  6   3  1 )    C = ( 1  −4  −1   5 )
        ( 1   1 )        ( 3  3  −5  4 )        ( 2   4  10  −7 )
        ( 3  −3 )

(c)
    A = ( 3   1   2 )    B = ( −1   4 )    C = (  2  −3 )
        ( 4  −2   4 )        (  3   2 )        ( −6  −2 )
        ( 1  −3  −2 )        ( −5  −2 )        (  0   1 )
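Formula (2.3) translates line by line into code. The sketch below (NumPy assumed; the helper name matmul_def is mine) implements the definition with explicit loops and checks it against NumPy's built-in product on the pair from Numerical exercise 2.2.6(a).

```python
import numpy as np

def matmul_def(A, B):
    """Matrix product straight from (2.3): gamma_ik = sum_j alpha_ij * beta_jk."""
    r, s = A.shape
    s2, t = B.shape
    assert s == s2, "inner dimensions must agree"
    C = np.zeros((r, t))
    for i in range(r):
        for k in range(t):
            C[i, k] = sum(A[i, j] * B[j, k] for j in range(s))
    return C

A = np.array([[1, 2], [2, 1]])
B = np.array([[3, 1, 0], [-4, 2, 5]])
assert np.array_equal(matmul_def(A, B), A @ B)
```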

Exercise 2.2.7. Let A ∈ F^{k×n}, and let E_{ij}^{(ℓ×m)} be the ℓ × m matrix with a 1 in the (i, j) position and 0 everywhere else.

(a) What is E_{ij}^{(k×k)} A ?

(b) What is A E_{ij}^{(n×n)} ?

Definition 2.2.8 (Rotation matrix). The rotation matrix Rθ is the matrix defined by

    Rθ = ( cos θ  −sin θ )
         ( sin θ   cos θ )    (2.6)

As we shall see, this matrix is intimately related to the rotation of the Euclidean plane by θ (R Ex. 16.5.2).

Exercise 2.2.9. Prove R_{α+β} = R_α R_β. Your proof may use the addition theorems for the trigonometric functions. Later, when we learn about the connection between matrices and linear transformations, we shall give a direct proof of this fact which will imply the addition theorems (R Ex. 16.5.12).

Definition 2.2.10 (Identity matrix). The n × n identity matrix, denoted In or I, is the diagonal matrix whose diagonal entries are all 1, i. e.,

    I = diag(1, 1, . . . , 1) .    (2.7)

This is also written as I = (δij), where the Kronecker delta symbol δij is defined by

    δij = 1 if i = j, and δij = 0 if i ≠ j .    (2.8)

Fact 2.2.11. The columns of I are the standard unit vectors (R Def. 1.3.34).

Proposition 2.2.12. For all A ∈ F^{k×n},

    I_k A = A I_n = A .    (2.9)

Definition 2.2.13 (Scalar matrix). The matrix A ∈ Mn(F) is a scalar matrix if A = αI for some α ∈ F.

Exercise 2.2.14. Let A ∈ F^{k×n} and let B ∈ F^{ℓ×k}. Let D = diag(λ1, . . . , λk). Show

(a) DA is the matrix obtained by scaling the i-th row of A by λi for each i;

(b) BD is the matrix obtained by scaling the j-th column of B by λj for each j.

Definition 2.2.15 (ℓ-th power of a matrix). Let A ∈ Mn(F) and ℓ ≥ 0. We define A^ℓ, the ℓ-th power of A, inductively:

(i) A^0 = I ;

(ii) A^{ℓ+1} = A · A^ℓ .

So

    A^ℓ = A · · · A  (ℓ times).    (2.10)

Exercise 2.2.16. Let A ∈ Mn(F). Show that for all ℓ, m ≥ 0, we have

(a) A^{ℓ+m} = A^ℓ A^m ;

(b) A^{ℓm} = (A^ℓ)^m .

Exercise 2.2.17. For k ≥ 0, compute

(a) A^k for A = ( 1  1 ; 0  1 ) ;

♥ (b) B^k for B = ( 0  1 ; 1  1 ) ;

(c) C^k for C = ( cos θ  −sin θ ; sin θ  cos θ ) ;

(d) Interpret and verify the following statements:

(d1) A^k grows linearly

(d2) B^k grows exponentially

(d3) C^k stays of constant "size." (Define "size" in this statement.)

Definition 2.2.18 (Nilpotent matrix). The matrix N ∈ Mn(F) is nilpotent if there exists an integer k such that N^k = 0.

Exercise 2.2.19. Show that if A ∈ Mn(F) is strictly upper triangular, then A^n = 0.

So every strictly upper triangular matrix is nilpotent. Later we shall see that a matrix is nilpotent if and only if it is "similar" (R Def. 8.2.1) to a strictly upper triangular matrix (R Ex. 8.2.4).

Notation 2.2.20. We denote by Nn the n × n matrix defined by

    Nn = ( 0  1           )
         (    0  1        )
         (       ⋱  ⋱     )
         (          0  1  )
         ( 0           0  )    (2.11)

That is, Nn = (αij) where αij = 1 if j = i + 1 and αij = 0 otherwise.

Exercise 2.2.21. Find Nn^k for k ≥ 0.

Fact 2.2.22. In Section 1.4, we defined the dot product of two vectors (R Def. 1.4.1). The dot product may also be defined in terms of matrix multiplication. Let x, y ∈ F^k. Then

    x · y = x^T y .    (2.12)

Exercise 2.2.23. If v ⊥ 1 then Jv = 0.

Notation 2.2.24. For a matrix A ∈ F^{k×n}, we sometimes write

    A = [a1 | · · · | an]

where ai is the i-th column of A.

Exercise 2.2.25 (Extracting columns and elements of a matrix via multiplication). Let ei be the i-th column of I (i. e., the i-th standard basis vector), and let A = (αij) = [a1 | · · · | an]. Then

(a) A ej = aj ;

(b) ei^T A ej = αij .
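The matrix Nn of Notation 2.2.20 gives a concrete feel for nilpotency: each multiplication pushes the line of ones one step further toward the corner. A NumPy sketch (the helper N is my name) illustrating Exercises 2.2.19 and 2.2.21:

```python
import numpy as np

def N(n):
    """The matrix N_n of Notation 2.2.20: ones on the superdiagonal."""
    return np.diag(np.ones(n - 1), k=1)

A = N(4)
for k in range(1, 5):
    # N_4^k has 4-k ones, pushed k steps above the diagonal; N_4^4 = 0.
    nz = int(np.count_nonzero(np.linalg.matrix_power(A, k)))
    print(f"N_4^{k} has {nz} nonzero entries")
```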

Exercise 2.2.26 (Linear combination as a matrix product). Let A = [a1 | · · · | an] ∈ F^{k×n} and let x = (α1, . . . , αn)^T ∈ F^n. Show that

    Ax = α1 a1 + · · · + αn an .    (2.13)

Exercise 2.2.27 (Left multiplication acts column by column). Let A ∈ F^{k×n} and let B = [b1 | · · · | bℓ] ∈ F^{n×ℓ}. (The bi are the columns of B.) Then

    AB = [Ab1 | · · · | Abℓ] .    (2.14)

Proposition 2.2.28 (No cancellation). Let k ≥ 2. Then for all n and for all x ∈ F^n, there exist k × n matrices A and B with A ≠ B such that Ax = Bx.

Proposition 2.2.29. Let A ∈ F^{k×n}. If Ax = 0 for all x ∈ F^n, then A = 0.

Corollary 2.2.30 (Cancellation). If A, B ∈ F^{k×n} are matrices such that Ax = Bx for all x ∈ F^n, then A = B. Note: compare with Prop. 2.2.28.

Proposition 2.2.31 (No double cancellation). Let k ≥ 2. Then for all n and for all x ∈ F^k and y ∈ F^n, there exist k × n matrices A and B such that x^T A y = x^T B y but A ≠ B.

Proposition 2.2.32. Let A ∈ F^{k×n}. If x^T A y = 0 for all x ∈ F^k and y ∈ F^n, then A = 0.

Corollary 2.2.33 (Double cancellation). If A, B ∈ F^{k×n} are matrices such that x^T A y = x^T B y for all x ∈ F^k and y ∈ F^n, then A = B.

Definition 2.2.34 (Trace). The trace of a square matrix A = (αij) ∈ Mn(F) is the sum of its diagonal entries, that is,

    Tr(A) = Σ_{i=1}^{n} αii    (2.15)

Examples 2.2.35.

(a) Tr(In) = n

(b) Tr ( 3  1  2 ; 4  −2  4 ; 1  −3  −2 ) = −1

Fact 2.2.36 (Linearity of the trace).

(a) Tr(A + B) = Tr(A) + Tr(B);

(b) for λ ∈ F, Tr(λA) = λ Tr(A);

(c) for αi ∈ F, Tr(Σ αi Ai) = Σ αi Tr Ai .

Exercise 2.2.37. Let v, w ∈ F^n. Then

    Tr(v w^T) = v^T w .    (2.16)

Note that v w^T ∈ Mn(F) and v^T w ∈ F.

Proposition 2.2.38. Let A ∈ F^{k×n} and let B ∈ F^{n×k}. Then

    Tr(AB) = Tr(BA) .    (2.17)

Exercise 2.2.39. Show that the trace of a product is invariant under a cyclic permutation of the terms, i. e., if A1, . . . , Ak are matrices such that the product A1 · · · Ak is defined and is a square matrix, then

    Tr(A1 · · · Ak) = Tr(Ak A1 · · · A_{k−1}) .    (2.18)
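The cyclic invariance of Proposition 2.2.38 and Exercise 2.2.39 is easy to observe on random matrices; the sketch below (NumPy assumed, fixed seed for reproducibility) also hints at the content of Exercise 2.2.40, since non-cyclic rearrangements generally change the trace.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.integers(-5, 5, (3, 3)) for _ in range(3))

print(np.trace(A @ B), np.trace(B @ A))          # equal (Prop. 2.2.38)
print(np.trace(A @ B @ C), np.trace(C @ A @ B))  # equal: a cyclic shift
print(np.trace(A @ B @ C), np.trace(B @ A @ C))  # generally different
```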

Exercise 2.2.40. Show that the trace of a product is not invariant under all permutations of the terms. In particular, find 2 × 2 matrices A, B, and C such that

    Tr(ABC) ≠ Tr(BAC) .

Exercise 2.2.41 (Trace cancellation). Let B, C ∈ Mn(F). Show that if Tr(AB) = Tr(AC) for all A ∈ Mn(F), then B = C.

(R Contents)

2.3 Arithmetic of diagonal and triangular matrices

In this section we turn our attention to special properties of diagonal and triangular matrices.

Proposition 2.3.1. Let A = diag(α1, . . . , αn) and B = diag(β1, . . . , βn) be n × n diagonal matrices and let λ ∈ F. Then

    A + B = diag(α1 + β1, . . . , αn + βn)    (2.19)
    λA = diag(λα1, . . . , λαn)    (2.20)
    AB = diag(α1β1, . . . , αnβn) .    (2.21)

Proposition 2.3.2. Let A = diag(α1, . . . , αn) be a diagonal matrix. Then A^k = diag(α1^k, . . . , αn^k) for all k.

Definition 2.3.3 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial (R Def. 8.3.1) defined by

    f = α0 + α1 t + · · · + αd t^d .

Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f(ζ) ∈ F, we may also "plug in" the matrix A ∈ Mn(F) to obtain f(A) ∈ Mn(F). The only thing we have to be careful about is what we do with the scalar term α0; we replace it with α0 times the identity matrix, so

    f(A) := α0 I + α1 A + · · · + αd A^d .    (2.22)

Proposition 2.3.4. Let f ∈ F[t] be a polynomial and let A = diag(α1, . . . , αn) be a diagonal matrix. Then

    f(A) = diag(f(α1), . . . , f(αn)) .    (2.23)

In our discussion of the arithmetic of triangular matrices, we focus on the diagonal entries.

Notation 2.3.5. For the remainder of this section, the symbol ∗ in a matrix will represent an arbitrary value with which we will not concern ourselves. We write

    ( α1         ∗ )
    (    α2        )
    (       ⋱      )
    ( 0         αn )

for an upper triangular matrix with diagonal entries α1, . . . , αn.

Proposition 2.3.6. Let

    A = ( α1      ∗ )        B = ( β1      ∗ )
        (    ⋱      )   and      (    ⋱      )
        ( 0      αn )            ( 0      βn )

be upper triangular matrices and let λ ∈ F. Then

    A + B = ( α1+β1         ∗ )
            (        ⋱        )
            ( 0        αn+βn )    (2.24)

    λA = ( λα1        ∗ )
         (      ⋱       )
         ( 0        λαn )    (2.25)

    AB = ( α1β1         ∗ )
         (        ⋱       )
         ( 0        αnβn )    (2.26)

Proposition 2.3.7. Let A be as in Prop. 2.3.6. Then

    A^k = ( α1^k        ∗ )
          (       ⋱       )
          ( 0        αn^k )    (2.27)

for all k.

Proposition 2.3.8. Let f ∈ F[t] be a polynomial and let A be as in Prop. 2.3.6. Then

    f(A) = ( f(α1)          ∗ )
           (        ⋱         )
           ( 0          f(αn) )    (2.28)

(R Contents)
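Substitution of a matrix into a polynomial, formula (2.22), can be coded in a few lines (Horner's rule is my implementation choice here, not the book's). The sketch checks Proposition 2.3.4 on a diagonal matrix.

```python
import numpy as np

def poly_of_matrix(coeffs, A):
    """Evaluate f(A) as in (2.22), where f = coeffs[0] + coeffs[1] t + ...,
    using Horner's rule; the constant term becomes coeffs[0] * I."""
    result = np.zeros_like(A)
    for c in reversed(coeffs):
        result = result @ A + c * np.eye(A.shape[0])
    return result

# Proposition 2.3.4 checked with f(t) = 2 - 3t + t^2 on diag(5, 3, -1).
D = np.diag([5.0, 3.0, -1.0])
f = lambda t: 2 - 3 * t + t ** 2
assert np.allclose(poly_of_matrix([2, -3, 1], D),
                   np.diag([f(5.0), f(3.0), f(-1.0)]))
```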

2.4 Permutation Matrices

Definition 2.4.1 (Rook arrangement). A rook arrangement is an arrangement of n rooks on an n × n chessboard such that no pair of rooks attack each other. In other words, there is exactly one rook in each row and column.

FIGURE HERE

Fact 2.4.2. The number of rook arrangements on an n × n chessboard is n!.

Definition 2.4.3 (Permutation matrix). A permutation matrix is a square matrix with the following properties.

(a) Every nonzero entry is equal to 1.

(b) Each row and column has exactly one nonzero entry.

Observe that rook arrangements correspond to permutation matrices where each rook is placed on a 1. Permutation matrices will be revisited in Chapter 6 where we discuss the determinant.

Recall that we denote by [n] the set of integers {1, . . . , n}.

Definition 2.4.4 (Permutation). A permutation is a bijection σ : [n] → [n]. We write σ : i ↦ iσ.

Example 2.4.5. A permutation can be represented by a 2 × n table like this.

    i    1  2  3  4  5  6
    iσ   3  6  4  1  5  2

This permutation takes 1 to 3, 2 to 6, 3 to 4, etc.

Note that any table obtained by rearranging columns represents the same permutation. For example, we may also represent the permutation σ by the table below.

    i    4  6  1  5  2  3
    iσ   1  2  3  5  6  4

Moreover, the permutation can be represented by a diagram, where the arrow i → iσ means that i maps to iσ.

FIGURE HERE

Definition 2.4.6 (Composition of permutations). Let σ, τ : [n] → [n] be permutations. The composition of τ with σ, denoted στ, is the permutation which maps i to iστ, defined by

    iστ := (iσ)τ .    (2.29)

This may be represented as

    i −σ→ iσ −τ→ iστ .

Example 2.4.7. Let σ be the permutation of Example 2.4.5 and let τ be the permutation given in the table

    i    1  2  3  4  5  6
    iτ   4  1  6  5  2  3

Then we can find the table representing the permutation στ by rearranging the table for τ so that its first row is in the same order as the second row of the table for σ, i. e.,

    i    3  6  4  1  5  2
    iτ   6  3  5  4  2  1

and then combining the two tables:

    i     1  2  3  4  5  6
    iσ    3  6  4  1  5  2
    iστ   6  3  5  4  2  1

That is, the table corresponding to the permutation στ is

    i     1  2  3  4  5  6
    iστ   6  3  5  4  2  1

Definition 2.4.8 (Identity permutation). The identity permutation id : [n] → [n] is the permutation defined by i^id = i for all i ∈ [n].

Definition 2.4.9 (Inverse of a permutation). Let σ : [n] → [n] be a permutation. Then the inverse of σ is the permutation σ⁻¹ such that σσ⁻¹ = id. So σ⁻¹ takes iσ to i.

Example 2.4.10. To find the table corresponding to the inverse of a permutation σ, we first switch the rows of the table corresponding to σ and then rearrange the columns so that they are in natural order. For example, when we switch the rows of the table in Example 2.4.5, we have

    i      3  6  4  1  5  2
    iσ⁻¹   1  2  3  4  5  6

Rearranging the columns so that they are in natural order gives us the table

    i      1  2  3  4  5  6
    iσ⁻¹   4  6  1  3  5  2

Definition 2.4.11. Let σ : [n] → [n] be a permutation. The permutation matrix corresponding to σ, denoted Pσ, is the n × n matrix whose (i, j) entry is 1 if σ(i) = j and 0 otherwise.

Exercise 2.4.12. Let σ, τ : [n] → [n] be permutations, let A ∈ F^{k×n}, and let B ∈ F^{n×ℓ}.

(a) What is A Pσ ?

(b) What is Pσ B ?

(c) What is Pσ⁻¹ ?

(d) Show that Pσ Pτ = P_{τσ} (note the conflict in conventions for composition of permutations and matrix multiplication).

(R Contents)
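Permutation matrices are quick to build and compose in code. The sketch below (NumPy assumed; perm_matrix is my helper name, and permutations are stored 0-indexed) uses the permutations of Examples 2.4.5 and 2.4.7 and confirms that the matrix product realizes "σ first, then τ", the composition written στ in the book's notation; Exercise 2.4.12(d) discusses the clash with function-composition notation.

```python
import numpy as np

def perm_matrix(sigma):
    """P_sigma as in Definition 2.4.11; sigma[i] is the (0-based) image of i."""
    n = len(sigma)
    P = np.zeros((n, n), dtype=int)
    for i, j in enumerate(sigma):
        P[i, j] = 1
    return P

sigma = [2, 5, 3, 0, 4, 1]            # Example 2.4.5: 1->3, 2->6, 3->4, ...
tau   = [3, 0, 5, 4, 1, 2]            # Example 2.4.7: 1->4, 2->1, 3->6, ...
sigma_then_tau = [tau[s] for s in sigma]   # i -> (i^sigma)^tau, i.e. the book's στ

assert np.array_equal(perm_matrix(sigma) @ perm_matrix(tau),
                      perm_matrix(sigma_then_tau))
```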

2.5 Additional exercises

Exercise 2.5.1. Let A be a matrix. Show, without calculation, that A^T A is symmetric.

Definition 2.5.2 (Commutator). Let A ∈ F^{k×n} and B ∈ F^{n×k}. Then the commutator of A and B is AB − BA.

Definition 2.5.3. Two matrices A, B ∈ Mn(F) commute if AB = BA.

Exercise 2.5.4.

(a) Find an example of two 2 × 2 matrices that do not commute.

(b) (Project) Interpret and prove the following statement: Almost all pairs of 2 × 2 integral matrices (matrices in M2(Z)) do not commute.

Exercise 2.5.5. Let D ∈ Mn(F) be a diagonal matrix such that all diagonal entries are distinct. Show that if A ∈ Mn(F) commutes with D then A is a diagonal matrix.

Exercise 2.5.6. Show that only the scalar matrices (R Def. 2.2.13) commute with all matrices in Mn(F). (A scalar matrix is a matrix of the form λI.)

Exercise 2.5.7.

(a) Show that the commutator of two matrices over R is never the identity matrix.

(b) Find A, B ∈ Mp(Fp) such that their commutator is the identity.

Exercise 2.5.8 (Submatrix sum). Let I1 ⊆ [k] and I2 ⊆ [n], and let B be the submatrix (R Def. 3.3.11) of A ∈ F^{k×n} with entries αij for i ∈ I1, j ∈ I2. Find vectors a and b such that a^T A b equals the sum of the entries of B.

Definition 2.5.9 (Vandermonde matrix). The Vandermonde matrix generated by α1, . . . , αn is the n × n matrix

    V(α1, . . . , αn) = ( 1        1        · · ·  1        )
                        ( α1       α2       · · ·  αn       )
                        ( α1^2     α2^2     · · ·  αn^2     )
                        (  ⋮        ⋮               ⋮       )
                        ( α1^{n−1} α2^{n−1} · · ·  αn^{n−1} )    (2.30)

Exercise 2.5.10. Let A be a Vandermonde matrix generated by distinct αi. Show that the rows of A are linearly independent. Do not use determinants.

Exercise 2.5.11. Prove that polynomials of a matrix commute: let A be a square matrix and let f, g ∈ F[t]. Then f(A) and g(A) commute. In particular, A commutes with f(A).

Definition 2.5.12 (Circulant matrix). The circulant matrix generated by the sequence (α0, α1, . . . , αn−1) of scalars is the n × n matrix

    C(α0, α1, . . . , αn−1) = ( α0    α1    · · ·  αn−1 )
                              ( αn−1  α0    · · ·  αn−2 )
                              (  ⋮     ⋮            ⋮   )
                              ( α1    α2    · · ·  α0   )    (2.31)
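Circulant matrices can be explored numerically; the sketch below (helper name circulant is mine) also previews the "elegant" route of the upcoming Exercise 2.5.13(b): every circulant is a polynomial of the basic cyclic-shift circulant, which is why any two circulants commute.

```python
import numpy as np

def circulant(a):
    """The circulant matrix (2.31): row i, column j holds a[(j - i) mod n]."""
    n = len(a)
    return np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])

C1 = circulant([1, 2, 3, 4])
C2 = circulant([0, -1, 5, 2])
assert np.array_equal(C1 @ C2, C2 @ C1)      # circulants commute

# C1 as a polynomial of the cyclic shift S = circulant([0, 1, 0, 0]).
S = circulant([0, 1, 0, 0])
P = 1 * np.eye(4) + 2 * S + 3 * (S @ S) + 4 * np.linalg.matrix_power(S, 3)
assert np.array_equal(P, C1)
```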

Exercise 2.5.13. Prove that all circulant matrices commute. Prove this

(a) directly,

(b) in a more elegant way, by showing that all circulant matrices are polynomials of a particular circulant matrix.

Definition 2.5.14 (Jordan block). For λ ∈ C and n ≥ 1, the Jordan block J(n, λ) is the matrix

    J(n, λ) := λI + Nn    (2.32)

where Nn is the matrix defined in Notation 2.2.20.

Exercise 2.5.15. Let f ∈ C[t]. Prove that f(J(n, λ)) is the matrix

    ( f(λ)  f′(λ)  (1/2!) f″(λ)  · · ·  (1/(n−1)!) f^{(n−1)}(λ) )
    (       f(λ)   f′(λ)         · · ·  (1/(n−2)!) f^{(n−2)}(λ) )
    (              ⋱       ⋱                    ⋮                )
    (                      f(λ)         f′(λ)                   )
    ( 0                                 f(λ)                    )    (2.33)

Exercise 2.5.16. The converse of the second statement in Ex. 2.5.11 would be:

(∗) The only matrices that commute with A are the polynomials of A.

(a) Find a matrix A for which (∗) is false.

(b) For which diagonal matrices is (∗) true?

(c) Prove: (∗) is true for Jordan blocks.

(d) Characterize the matrices over C for which (∗) is true, in terms of their Jordan blocks (R ??).

Project 2.5.17. For A ∈ Mn(R), let f(A, k) denote the largest absolute value of all entries of A^k, and define

    Mn^{(λ)}(R) := {A ∈ Mn(R) | f(A, 1) ≤ λ}    (2.34)

(the matrices where all entries have absolute value ≤ λ). Define

    f1(n, k) = max { f(A, k) : A ∈ Mn^{(1)}(R) } ,
    f2(n, k) = max { f(A, k) : A ∈ Mn^{(1)}(R), A nilpotent } ,
    f3(n, k) = max { f(A, k) : A ∈ Mn^{(1)}(R), A strictly upper triangular } .

Find the rate of growth of these functions in terms of k and n.

(R Contents)

Chapter 3

(F) Matrix Rank

(R Contents)

3.1 Column and row rank

In Section 1.3, we defined the rank of a list of column or row vectors. We now view this concept in terms of the columns and rows of a matrix.

Definition 3.1.1 (Column- and row-rank). The column-rank of a matrix is the rank of the list of its column vectors. The row-rank of a matrix is the rank of the list of its row vectors. We denote the column-rank of A by rkcol A and the row-rank of A by rkrow A.

Definition 3.1.2 (Column and row space). The column space of a matrix A, denoted col-space A, is the span of its columns. The row space of a matrix A, denoted row-space A, is the span of its rows. In particular, for a k × n matrix A, we have col-space A ≤ F^k and row-space A ≤ F^n.

Proposition 3.1.3. The column-rank of a matrix is equal to the dimension of its column space, and the row-rank of a matrix is equal to the dimension of its row space.
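Column-rank and row-rank can be computed separately in code, the latter by transposing first. The sketch below (NumPy assumed) is a useful sanity check to keep in mind while reading this chapter: the two numbers always agree, which is the Second Miracle proved in Section 3.3.

```python
import numpy as np

A = np.array([[1, 2, 0, 3],
              [2, 4, 1, 1],
              [3, 6, 1, 4]])   # third row = first row + second row

rk_col = np.linalg.matrix_rank(A)     # rank of the list of columns
rk_row = np.linalg.matrix_rank(A.T)   # rank of the list of rows
print(rk_col, rk_row)                 # 2 2
```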

Definition 3.1.4 (Full column- and row-rank). A k × n matrix A ∈ F^{k×n} is of full row-rank if its row-rank is equal to k, and of full column-rank if its column-rank is equal to n.

Exercise 3.1.5. Let A and B be k × n matrices. Then

    | rkcol A − rkcol B | ≤ rkcol(A + B) ≤ rkcol A + rkcol B .    (3.1)

Exercise 3.1.6. Let A be a k × n matrix and let B be an n × ℓ matrix. Then col-space(AB) ≤ col-space A.

Exercise 3.1.7. Let A be a k × n matrix and let B be an n × ℓ matrix. Then rkcol(AB) ≤ rkcol A.

(R Contents)


3.2 Elementary operations and Gaussian elimination

There are three kinds of "elementary operations" on the columns of a matrix and three corresponding elementary operations on the rows of a matrix. The significance of these operations is that while they can simplify the matrix by turning most of its elements into zero, they do not change the rank of the matrix, and they change the value of the determinant in an easy-to-follow manner.

We call the three column operations column shearing, column scaling, and column swapping, and use the corresponding terms for rows as well. The most powerful of these is "column shearing." While the literature sometimes calls this operation "column addition," I find this term misleading: we don't "add columns" – that is a term more adequately applied in the context of the multilinearity of the determinant. I propose the term "column shearing" from physical analogy which I will explain below. Similarly misleading is the term "column multiplication" often used to describe what I propose to call "column scaling." The term "column multiplication" seems to suggest that we multiply columns, as we do in the definition of an inner product space. Our "column swapping" is often described as "column switching." Although "switching" seems to be an adequate term, I believe "swapping" better describes what is actually happening.

Let A = [a1 | · · · | an] be a matrix; the ai are the columns of this matrix. The definitions below refer to this matrix.

Definition 3.2.1 (Column shearing). The column shearing operation, denoted by shearc(i, j, λ), takes three parameters: column indices i ≠ j and a scalar λ. The operation replaces ai by ai + λaj.

Note that the only column that changes as a result of the shearc(i, j, λ) operation is column i. Remember the condition i ≠ j. On the other hand, λ can be any scalar, including 0.

Definition 3.2.2 (Column scaling). The column scaling operation, denoted by scalec(i, λ), takes two parameters: a column index i and a scalar λ ≠ 0. The operation replaces ai by λai.

Note that the only column that changes as a result of the scalec(i, λ) operation is column i. Remember the condition λ ≠ 0.

Definition 3.2.3 (Column swapping). The column swapping operation, denoted by swapc(i, j), takes two parameters: column indices i ≠ j. The operation swaps the columns ai and aj. So what happens can be described with reference to a temporary vector z as the following sequence of operations: z ← ai, ai ← aj, aj ← z.

Note that the only columns that change as a result of the swapc(i, j) operation are columns i and j. Remember the condition i ≠ j.

Definition 3.2.4 (Elementary row operations). The elementary row operations shearr(i, j, λ), scaler(i, λ), and swapr(i, j) are defined analogously.

Exercise 3.2.5. Each of the elementary operations is invertible and the inverse of each elementary operation is an elementary operation of the same type. Specifically, the inverse of shearc(i, j, λ) is shearc(i, j, −λ); the inverse of scalec(i, λ) is scalec(i, λ⁻¹); and swapc(i, j) is its own inverse.
The analogous statements hold for the elementary row operations.

Definition 3.2.6 (Elementary matrix). An elementary matrix is a square matrix that differs from the identity matrix in an elementary operation.

Accordingly, we shall have three kinds of elementary matrices: shearing, scaling, and swapping matrices.

Definition 3.2.7 (Shearing matrix). Let i ≠ j, 1 ≤ i, j ≤ n. We denote by Eij the n × n matrix which has 1 in the (i, j) position and 0 in every other position. A shearing matrix B is an n × n matrix of the form B = I + λEij for λ ∈ F.

Exercise 3.2.8. Let A ∈ F^{k×n} be a k × n matrix, λ ∈ F, and B the n × n shearing matrix B = I + λEij. Show that AB is the matrix obtained from A by performing the operation shearc(j, i, λ). Infer (do not repeat the same argument) that BA is obtained from A by performing shearr(i, j, λ).

Remark 3.2.9. In physics, "shearing forces" are "unaligned forces pushing one part of a body in one specific direction, and another part of the body in the opposite direction." (Wikipedia) A pair of scissors exerts shearing forces on the material it tries to cut. A nice example is a deck of cards on a horizontal table being pushed horizontally on the top while being held in place at the bottom, causing the cards to slide. Thus the deck, which initially has a rectangle as its vertical cross-section, ends up with a parallelogram of the same height as its vertical cross-section. The distance of each card from the base remains constant, and the distance each card slides is proportional to its distance from the base. This is precisely what an application of a shearing matrix to the deck achieves in 2D or 3D.

Definition 3.2.10 (Scaling matrix). Let 1 ≤ i ≤ n and λ ∈ F, λ ≠ 0. We denote by D(i, λ) the n × n diagonal matrix which has λ in the (i, i) position and 1 in all the other diagonal positions.

Exercise 3.2.11. Let A ∈ F^{k×n} be a k × n matrix, λ ∈ F, λ ≠ 0, and 1 ≤ i ≤ n. Show that AD(i, λ) is the matrix obtained from A by performing the column operation scalec(i, λ). Infer (do not repeat the same argument) that D(i, λ)A is obtained from A by performing the row operation scaler(i, λ).

Definition 3.2.12 (Swapping matrix). Let 1 ≤ i, j ≤ n where i ≠ j. We denote by Ti,j the n × n matrix obtained from the identity matrix by swapping columns i and j.

Note that by swapping rows i and j we obtain the same matrix.
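Elementary matrices are cheap to build and test numerically. A sketch (NumPy assumed; 0-based indices in the code, so column "0" is the book's column 1) verifying the column versions of Exercises 3.2.8 and 3.2.13:

```python
import numpy as np

n = 3
A = np.arange(1, 13).reshape(4, n)        # a 4 x 3 test matrix

# Shearing matrix B = I + lambda*E_{ij} (Definition 3.2.7):
# A @ B performs the column operation shear_c(j, i, lambda), cf. Ex. 3.2.8.
lam, i, j = 5, 0, 2
B = np.eye(n)
B[i, j] = lam
manual = A.astype(float)
manual[:, j] += lam * A[:, i]             # column j gets a_j + lambda*a_i
assert np.allclose(A @ B, manual)

# Swapping matrix T_{i,j} (Definition 3.2.12): A @ T swaps columns i and j.
T = np.eye(n)[:, [2, 1, 0]]               # T_{1,3} for n = 3
assert np.array_equal((A @ T)[:, 0], A[:, 2])
```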

Exercise 3.2.13. Let A ∈ F^{k×n} be a k × n matrix, 1 ≤ i, j ≤ n, i ≠ j. Show that A·T_{i,j} is the matrix obtained from A by performing the column operation swap_c(i, j). Infer (do not repeat the same argument) that T_{i,j}A is obtained from A by performing the row operation swap_r(i, j).

Exercise 3.2.14. Let A be a matrix with linearly independent rows. Then by performing a series of row shearing, row swapping, and column swapping operations we can bring the matrix to the form [D | B] where D is a diagonal matrix with no zeros in the diagonal.

Exercise 3.2.15. Let A be a k × n matrix. Then by performing a series of row shearing, row swapping, and column swapping operations we can bring A to the 2 × 2 block-matrix form

$$\begin{pmatrix} D & B \\ 0 & 0 \end{pmatrix}$$

where the top left block, D, is an r × r diagonal matrix with no zeros in the diagonal (for some value r which will turn out to be the rank of A) and the k − r bottom rows are zero.

The process of transforming a matrix into a matrix of the form described in Ex. 3.2.15 by performing a series of row shearing, row swapping, and column swapping operations is called Gaussian elimination.

Exercise 3.2.16. Any matrix can be transformed into an upper triangular matrix by performing a series of row shearing, row swapping, and column swapping operations.

Exercise 3.2.17. Let A be a k × n matrix. Then by performing a series of row shearing, row swapping, column shearing, and column swapping operations we can bring A to the 2 × 2 block-matrix form

$$\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}$$

where the top left block, D, is an r × r diagonal matrix with no zeros in the diagonal (for some value r which will turn out to be the rank of A); the rest of the matrix is zero.

The process of transforming a matrix into a matrix of the form described in Ex. 3.2.17 by performing a series of row shearing, row swapping, column shearing, and column swapping operations is called Gauss-Jordan elimination.

(R Contents)

3.3 Invariance of column and row rank, the Second Miracle of Linear Algebra

The main result of this section is the following theorem.

Theorem 3.3.1 (Second Miracle of Linear Algebra). The row-rank of a matrix is equal to its column-rank.

In order to prove this theorem, we will first need to prove the following two lemmas.
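As a companion to this description, here is a short Python sketch (mine, not the text's algorithm; it uses exact rational arithmetic via the standard fractions module) that carries out elimination by row shearing and row swapping and reports the number r of pivots, which is the rank.

```python
from fractions import Fraction

def rank_by_elimination(rows):
    """Row-reduce a matrix (list of lists of numbers); return the rank r."""
    A = [[Fraction(x) for x in row] for row in rows]
    k = len(A)
    n = len(A[0]) if k else 0
    r = 0
    for col in range(n):
        # find a pivot at or below row r in this column (row swapping)
        pivot = next((i for i in range(r, k) if A[i][col] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        # row shearing: eliminate all other entries of the pivot column
        for i in range(k):
            if i != r and A[i][col] != 0:
                c = A[i][col] / A[r][col]
                A[i] = [a - c * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == k:
            break
    return r

print(rank_by_elimination([[1, 2, 3], [2, 4, 6], [1, 1, 1]]))  # 2
```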

Lemma 3.3.2. Elementary column operations do not change the column-rank of a matrix. ♦

The proof of the next lemma is somewhat more involved; we will break its proof into exercises.

Lemma 3.3.3. Elementary row operations do not change the column-rank of a matrix.

Exercise 3.3.4. Use these two lemmas together with Ex. 3.2.17 to prove Theorem 3.3.1.

In order to complete our proof of the Second Miracle, we now only need to prove Lemma 3.3.3. The following exercise demonstrates why the proof of Lemma 3.3.3 is not as straightforward as the proof of Lemma 3.3.2.

Exercise 3.3.5. An elementary row operation can change the column space of a matrix.

Proposition 3.3.6. Let A = [a_1 | · · · | a_n] ∈ F^{k×n} be a matrix with columns a_1, ..., a_n. Let

$$\sum_{i=1}^{n} \alpha_i a_i = 0$$

be a linear relation among the columns. Let A′ = [a′_1 | · · · | a′_n] be the result of applying an elementary row operation to A. Then the columns of A′ obey the same linear relation, that is,

$$\sum_{i=1}^{n} \alpha_i a'_i = 0.$$

Corollary 3.3.7. If the columns v_{i_1}, ..., v_{i_ℓ} are linearly independent, then this remains true after an elementary row operation.

Exercise 3.3.8. Complete the proof of Lemma 3.3.3.

∗ ∗ ∗

Because the Second Miracle of Linear Algebra establishes that the row-rank and column-rank of a matrix A are equal, it is no longer necessary to differentiate between them; this quantity is simply referred to as the rank of A, denoted rk A.

Definition 3.3.9 (Full rank). Let A ∈ M_n(F) be a square matrix. Then A is of full rank if rk A = n.

Observe that full rank is the same as full column- or row-rank (R Def. 3.1.4) only in the case of square matrices.

Definition 3.3.10 (Nonsingular matrix). An n × n matrix A is nonsingular if it is of full rank, i.e., rk A = n.

Definition 3.3.11 (Submatrix). Let A ∈ F^{k×n} be a matrix. Then the matrix B is a submatrix of A if it can be obtained by deleting rows and columns from A. In other words, a submatrix of a matrix A is a matrix obtained by taking the intersection of a set of the rows of A with a set of the columns of A.

Theorem 3.3.12 (Rank vs. nonsingular submatrices). Let A ∈ F^{k×n} be a matrix. Then rk A is the largest value of r such that A has a nonsingular r × r submatrix. ♦

Exercise 3.3.13. Show that for all k, the intersection of k linearly independent rows with k linearly independent columns can be singular. In fact, it can be the zero matrix.

Exercise 3.3.14. Let A be a matrix of rank r. Show that the intersection of any r linearly independent rows with any r linearly independent columns is a nonsingular r × r submatrix of A. (Note: this exercise is more difficult than Theorem 3.3.12 and is not needed for the proof of Theorem 3.3.12.)

Exercise 3.3.15. Let A be a matrix. Show that if the intersection of k linearly independent columns with ℓ linearly independent rows of A has rank s, then rk(A) ≥ k + ℓ − s.

(R Contents)

3.4 Matrix rank and invertibility

Definition 3.4.1 (Left and right inverse). Let A ∈ F^{k×n}. The matrix B ∈ F^{n×k} is a left inverse of A if BA = I_n. Likewise, the matrix C ∈ F^{n×k} is a right inverse of A if AC = I_k.

Proposition 3.4.2. Let A ∈ F^{k×n}.

(a) Show that A has a right inverse if and only if A has full row-rank, i.e., rk A = k.

(b) Show that A has a left inverse if and only if A has full column-rank, i.e., rk A = n.

Note in particular that if A has a right inverse, then k ≤ n, and if A has a left inverse, then k ≥ n.

Corollary 3.4.3. Let A be a nonsingular square matrix. Then A has both a right and a left inverse.

Exercise 3.4.4. For all k < n, find a k × n matrix that has infinitely many right inverses.

Definition 3.4.5 (Two-sided inverse). Let A ∈ M_n(F). Then the matrix B ∈ M_n(F) is a (two-sided) inverse of A if AB = BA = I_n. The inverse of A is denoted A^{−1}. If A has an inverse, then A is said to be invertible.

Proposition 3.4.6. Let A be a matrix. If A has a left inverse as well as a right inverse, then A has a unique two-sided inverse and it has no left or right inverse other than the two-sided inverse.

The proof of this lengthy statement is just one line, based solely on the associativity of matrix multiplication. The essence of the proof is in the next lemma.

Lemma 3.4.7. Let A ∈ F^{k×n} be a matrix with a right inverse B and a left inverse C. Then B = C is a two-sided inverse of A and k = n. ♦
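Over R, part (a) of Proposition 3.4.2 can be made concrete: if rk A = k then AA^T is invertible, and A^T(AA^T)^{−1} is a right inverse. Here is a quick numpy check (my sketch, assuming real scalars; it is one construction, not the only one).

```python
import numpy as np

# If rk A = k over R, then A A^T is invertible and
# C = A^T (A A^T)^{-1} is a right inverse of A.
A = np.array([[1., 0., 2.],
              [0., 1., -1.]])          # a 2 x 3 matrix of full row-rank
C = A.T @ np.linalg.inv(A @ A.T)       # a 3 x 2 candidate right inverse
print(np.allclose(A @ C, np.eye(2)))   # True: AC = I_2
```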

Corollary 3.4.8. Under the conditions of Lemma 3.4.7, k = n and B = C is a two-sided inverse. Moreover, if C_1 is also a left inverse, then C_1 = C; analogously, if B_1 is also a right inverse, then B_1 = B.

Corollary 3.4.9. Let A be a matrix with a left inverse. Then A has at most one right inverse.

Corollary 3.4.10. A matrix A has an inverse if and only if A is a nonsingular square matrix.

Corollary 3.4.11. If A has both a right and a left inverse then k = n.

Theorem 3.4.12. For A ∈ M_n(F), the following are equivalent.

(a) rk A = n

(b) A has a right inverse

(c) A has a left inverse

(d) A has an inverse

♦

Exercise 3.4.13. Assume F is infinite and let A ∈ F^{k×n} where n > k. If A has a right inverse, then A has infinitely many right inverses.

(R Contents)

3.5 Codimension (optional)

The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general discussion of this material.

Definition 3.5.1 (Codimension). Let W ≤ F^n. The codimension of W is the minimum number of vectors that together with W generate F^n and is denoted by codim W. (Note that this is the dimension of the quotient space F^n/W; see Def. ??)

Proposition 3.5.2. If W ≤ F^n then codim W = n − dim W.

Proposition 3.5.3 (Dual modular equation). Let U, W ≤ F^n. Then

codim(U + W) + codim(U ∩ W) = codim U + codim W. (3.2)

Corollary 3.5.4. Let U, W ≤ F^n. Then

codim(U ∩ W) ≤ codim U + codim W. (3.3)

Corollary 3.5.5. Let U, W ≤ F^n with dim U = r and codim W = t. Then

dim(U ∩ W) ≥ max{r − t, 0}. (3.4)

In Section 1.3, we defined the rank of a set of column vectors (R Def. 1.3.28) and then showed this to be equal to the dimension of their span (R Cor. 1.3.46). We now define the corank of a set of vectors.

Definition 3.5.6 (Corank). The corank of a set S ⊆ F^n, denoted corank S, is defined by

corank S := codim(span S). (3.5)

Definition 3.5.7 (Null space). The null space or kernel of a matrix A ∈ F^{k×n}, denoted null(A), is the set

null(A) = {v ∈ F^n | Av = 0}. (3.6)

Exercise 3.5.8. Let A ∈ F^{k×n}. Show that

rk A = codim(null(A)). (3.7)

Proposition 3.5.9. Let U ≤ F^n and let W ≤ F^k such that dim W = codim U = ℓ. Then there is a matrix A ∈ F^{k×n} such that null(A) = U and col-space A = W.

Definition 3.5.10 (Corank of a matrix). Let A ∈ F^{k×n}. We define the corank of A as the corank of its column space, i.e.,

corank A := k − rk A. (3.8)

Exercise 3.5.11. When is corank A = corank A^T?

Exercise 3.5.12. Let A ∈ F^{k×n} and let B ∈ F^{n×ℓ}. Show that

corank(AB) ≤ corank A + corank B. (3.9)

(R Contents)

3.6 Additional exercises

Exercise 3.6.1. Let A = [v_1 | · · · | v_n] be a matrix. True or false: if the columns v_1, ..., v_ℓ are linearly independent, this remains true after performing elementary column operations on A.

Proposition 3.6.2. Let A be a matrix with at most one nonzero entry in each row and column. Then rk A is the number of nonzero entries in A.

Numerical exercise 3.6.3. Perform a series of elementary row operations to determine the rank of the matrix

$$\begin{pmatrix} 1 & 3 & 2 \\ -5 & -2 & 3 \\ -3 & 4 & 7 \end{pmatrix}.$$

Self-check: use a different sequence of elementary operations and verify that you obtain the same answer.

Exercise 3.6.4. Determine the ranks of the n × n matrices

(a) A = (α_ij) where α_ij = i + j

(b) B = (β_ij) where β_ij = ij

(c) C = (γ_ij) where γ_ij = i² + j²

Proposition 3.6.5. Let A ∈ F^{k×ℓ} and B ∈ F^{ℓ×m}. Then

rk(AB) ≤ min{rk(A), rk(B)}.
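For self-checking numerical exercises like the ones above, a numerics package is handy. The snippet below (my aside, not part of the text; numpy's matrix_rank estimates the rank from the singular value decomposition) checks Ex. 3.6.3 and lets you experiment with small cases of Ex. 3.6.4 before proving the general pattern.

```python
import numpy as np

# Ex. 3.6.3: rank of the given 3 x 3 matrix.
M = np.array([[1, 3, 2], [-5, -2, 3], [-3, 4, 7]], dtype=float)
print(np.linalg.matrix_rank(M))

# Ex. 3.6.4: experiment with small n; the pattern suggests the general answer.
for n in (2, 3, 4, 5):
    i = np.arange(1, n + 1).reshape(-1, 1)   # column of row indices
    j = np.arange(1, n + 1).reshape(1, -1)   # row of column indices
    print(n,
          np.linalg.matrix_rank((i + j).astype(float)),
          np.linalg.matrix_rank((i * j).astype(float)),
          np.linalg.matrix_rank((i**2 + j**2).astype(float)))
```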

Proposition 3.6.6. Let A, B ∈ F^{k×ℓ}. Then rk(A + B) ≤ rk(A) + rk(B).

Exercise 3.6.7. Find an n × n matrix A of rank n − 1 such that A^n = 0.

Exercise 3.6.8. Let A ∈ F^{k×n}. Then rk A is the smallest r such that there exist matrices B ∈ F^{k×r} and C ∈ F^{r×n} with A = BC.

Exercise 3.6.9. Show that the matrix A is of rank r if and only if A can be expressed as the sum of r matrices of rank 1.

Exercise 3.6.10 (Characterization of matrices of rank 1). Let A ∈ F^{k×n}. Show that rk A = 1 if and only if there exist nonzero column vectors a ∈ F^k and b ∈ F^n such that A = ab^T.

Let A ∈ F^{k×n} and let F be a subfield of the field G. Then we can also view A as a matrix over G. However, this changes the notion of linear combinations and therefore in principle it could affect the rank. We shall see that this is not the case, but in order to be able to reason about this question, we temporarily use the notation rk_F(A) and rk_G(A) to denote the rank of A with respect to the corresponding fields. We will also write rk_p to mean rk_{F_p}.

Lemma 3.6.11. Being nonsingular is insensitive to field extensions. ♦

Corollary 3.6.12. Let F be a subfield of G, and let A ∈ F^{k×n}. Then

rk_F(A) = rk_G(A).

Exercise 3.6.13. Let A ∈ R^{k×n} be a matrix with integer entries. Then the entries of A can also be considered residue classes modulo a prime p, that is, we can instead consider A as an element of F_p^{k×n}. Observe that F_p is not a subfield of R, because the operations of addition and multiplication are not the same.

(a) Show that rk_p(A) ≤ rk(A).

(b) For every p, find a (0, 1) matrix A (that is, a matrix whose entries are only 0 and 1) where this inequality is strict, i.e., rk_p(A) < rk(A).

Proposition 3.6.14.

(a) Let A be a matrix with real entries. Then rk(A^T A) = rk(A).

(b) This is false over C and over F_p.

Exercise 3.6.15.

(a) Show that rk(J_n − I_n) = n.

(b) Show that

$$\operatorname{rk}_2(J_n - I_n) = \begin{cases} n & n \text{ even} \\ n-1 & n \text{ odd} \end{cases}$$

♥ Exercise 3.6.16. Let A = (α_ij) be a matrix of rank r.

(a) Let A′ = (α_ij²) be the matrix obtained by squaring every element of A. Show that rk(A′) ≤ $\binom{r+1}{2}$.

(b) Now let A′ = (α_ij^d). Show that rk(A′) ≤ $\binom{r+d-1}{d}$.

(c) Let f be a polynomial of degree d, and let A_f = (f(α_ij)). Prove that rk(A_f) ≤ $\binom{r+d}{d}$.

(d) Show that each of these bounds is tight for all r, i.e., for every r and d there exists a matrix A such that rk A′ = $\binom{r+d-1}{d}$, and for every r and f there exists a matrix B such that rk B_f = $\binom{r+d}{d}$.
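The quantity rk_p can also be explored by machine. Below is a small Python sketch (mine, under the assumption that p is prime) that row-reduces an integer matrix over F_p; the sample matrix has determinant 2, so for p = 2 it exhibits the strict inequality asked for in Exercise 3.6.13(b).

```python
# Rank of an integer matrix over F_p, by elimination mod p (p prime).
def rank_mod_p(rows, p):
    A = [[x % p for x in row] for row in rows]
    k, n, r = len(A), len(rows[0]), 0
    for col in range(n):
        piv = next((i for i in range(r, k) if A[i][col]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        inv = pow(A[r][col], p - 2, p)          # inverse mod p by Fermat
        A[r] = [(x * inv) % p for x in A[r]]
        for i in range(k):
            if i != r and A[i][col]:
                c = A[i][col]
                A[i] = [(a - c * b) % p for a, b in zip(A[i], A[r])]
        r += 1
    return r

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # det A = 2, so rk(A) = 3 over R
print(rank_mod_p(A, 2))                  # 2: rk_2(A) < rk(A)
```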

(R Contents)

Chapter 4

(F) Theory of Systems of Linear Equations I: Qualitative Theory

(R Contents)

4.1 Homogeneous systems of linear equations

Matrices allow us to concisely express systems of linear equations. In particular, consider the general system of k linear equations in n unknowns,

$$\begin{aligned} \alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1n}x_n &= \beta_1 \\ \alpha_{21}x_1 + \alpha_{22}x_2 + \cdots + \alpha_{2n}x_n &= \beta_2 \\ &\;\;\vdots \\ \alpha_{k1}x_1 + \alpha_{k2}x_2 + \cdots + \alpha_{kn}x_n &= \beta_k \end{aligned}$$

Here, the α_ij and β_i are scalars, while the x_j are unknowns. In Section 1.1, we represented this system as a linear combination of column vectors. Matrices allow us to write this system even more concisely as Ax = b, where

$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{k1} & \alpha_{k2} & \cdots & \alpha_{kn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}. \tag{4.1}$$

We know that the simplest system of linear equations is of the form ax = b (one equation in one unknown); remarkably, the notation does not become any more complex when we have a general system of linear equations. The first question we ask about any system of equations is its solvability.

Definition 4.1.1 (Solvable system of linear equations). Given a matrix A ∈ F^{k×n} and a vector b ∈ F^k, we say that the system Ax = b of linear equations is solvable if there exists a vector x ∈ F^n that satisfies Ax = b.


Definition 4.1.2 (Homogeneous system of linear equations). The system Ax = 0 is called a homogeneous system of linear equations.

Every system of homogeneous linear equations is solvable.

Definition 4.1.3 (Trivial solution to a homogeneous system of linear equations). The trivial solution to the homogeneous system of linear equations Ax = 0 is the solution x = 0.

So when presented with a homogeneous system of linear equations, the question we ask is not, "Is this system solvable?" but rather, "Does this system have a nontrivial solution?"

Definition 4.1.4 (Solution space). Let A ∈ F^{k×n}. The set of solutions to the homogeneous system of linear equations Ax = 0 is the set U = {x ∈ F^n | Ax = 0} and is called the solution space of Ax = 0. Ex. 4.1.7 explains the terminology.

Definition 4.1.5 (Null space). The null space or kernel of a matrix A ∈ F^{k×n}, denoted null(A), is the set

null(A) = {v ∈ F^n | Av = 0}. (4.2)

Definition 4.1.6. The nullity of a matrix A is the dimension of its null space.

For the following three exercises, let A ∈ F^{k×n} and let U be the solution space of the system Ax = 0.

Exercise 4.1.7. Prove that U ≤ F^n.

Exercise 4.1.8. Show that null(A) = U.

Proposition 4.1.9. Let A ∈ F^{k×n} and consider the homogeneous system Ax = 0.

(a) Let rk A = r and let n = r + d. Then it is possible to relabel the unknowns x_1, ..., x_n as x′_1, ..., x′_n, so that x′_j can be represented as a linear combination of x′_{d+1}, ..., x′_n for j = 1, ..., d, say

$$x'_i = \sum_{j=d+1}^{n} \lambda_{ij} x'_j.$$

(b) The vectors e_i + $\sum_{j=d+1}^{n} \lambda_{ij} x'_j$ form a basis of U′, the solution space of the relabeled system of equations.

(c) dim U = dim U′

(d) dim U = n − rk A

An immediate consequence of (d) is the Rank-Nullity Theorem, which will be crucial in our study of linear maps in Chapter 16.¹

Corollary 4.1.10 (Rank-Nullity Theorem). Let A ∈ F^{k×n} be a matrix. Then

rk(A) + nullity(A) = n. (4.3)

¹In Chapter 16, we will formulate the Rank-Nullity Theorem in terms of linear maps rather than matrices, but the two formulations are equivalent.
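A numerical illustration (my aside, over R): with floating-point matrices, the nullity can be read off as the number of vanishing singular values, so rk + nullity = n can be checked directly.

```python
import numpy as np

# Rank-Nullity over R: nullity = number of (numerically) zero singular values.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 1.],
              [1., 3., 1., 2.]])        # 3 x 4; row 3 = row 1 + row 2
r = np.linalg.matrix_rank(A)
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - np.sum(s > 1e-10)
print(r, nullity, r + nullity == A.shape[1])   # 2 2 True
```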

An explanation of (d) is that dim U measures the number of coordinates of x that we can choose independently. This quantity is referred to by physicists as the "degree of freedom" left in our choice of x after the set Ax = 0 of constraints. If there are no constraints, the degree of freedom of the system is equal to n. It is plausible that each constraint reduces the degree of freedom by 1, which would suggest dim U = n − k, but effectively there are only rk A constraints because every equation that is a linear combination of previous equations can be thrown out. This makes it plausible that the degree of freedom is n − rk A. This argument is not a proof, however.

Proposition 4.1.11. Let A ∈ F^{k×n} and consider the homogeneous system Ax = 0 of linear equations. Let U be the solution space of Ax = 0. Prove that the following are equivalent.

(a) Ax = 0 has no nontrivial solution

(b) The columns of A are linearly independent

(c) A has full column rank, i.e., rk A = n

(d) dim U = 0

(e) The rows of A span F^n

(f) A has a left inverse

Proposition 4.1.12. Let A be a square matrix. Then Ax = 0 has no nontrivial solution if and only if A is nonsingular.

(R Contents)

4.2 General systems of linear equations

Proposition 4.2.1. The system of linear equations Ax = b is solvable if and only if b ∈ col-space A.

Definition 4.2.2 (Augmented system). When speaking of the system Ax = b, we call A the matrix of the system and [A | b] (the column b added to A) the augmented matrix of the system.

Proposition 4.2.3. The system Ax = b of linear equations is solvable if and only if the matrix of the system and the augmented matrix have the same rank, i.e., rk A = rk[A | b].

Definition 4.2.4 (Translation). Let S ⊆ F^n, and let v ∈ F^n. The set

S + v = {s + v | s ∈ S} (4.4)

is called the translate of S by v.

Proposition 4.2.5. Let S = {x ∈ F^n | Ax = b} be the set of solutions to a system of linear equations. Then S is either empty or a translate of a subspace of F^n, namely, of the solution space of the homogeneous system Ax = 0. Such an object is called an affine subspace of F^n (R Def. 5.1.3).
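Proposition 4.2.3 gives an effective solvability test. Here is a two-line numpy check (my sketch, over R):

```python
import numpy as np

# Ax = b is solvable iff rk A = rk [A | b].
A = np.array([[1., 2.], [2., 4.]])
for b in (np.array([3., 6.]), np.array([3., 7.])):
    solvable = (np.linalg.matrix_rank(A)
                == np.linalg.matrix_rank(np.column_stack([A, b])))
    print(b, solvable)    # (3,6): True (b in col-space A); (3,7): False
```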

Proposition 4.2.6 (Equivalent characterizations of nonsingular matrices). Let A ∈ M_n(F). The following are equivalent.

(a) A is nonsingular, i. e., rk(A) = n

(b) The rows of A are linearly independent

(c) The columns of A are linearly independent

(d) The rows of A span Fn

(e) The columns of A span Fn

(f) The rows of A form a basis of Fn

(g) The columns of A form a basis of F^n

(h) A has a left inverse

(i) A has a right inverse

(j) A has a two-sided inverse

(k) Ax = 0 has no nontrivial solution

(l) For all b ∈ Fn, Ax = b is solvable

(m) For all b ∈ Fn, Ax = b has a unique solution

Our discussion of the determinant in the next chapter will allow us to add a particularly important additional property:

(n) det A ≠ 0

Chapter 5

(F, R) Affine and Convex Combinations (optional)

The reader comfortable with abstract vector spaces can skip to Chapter ?? for a more general discussion of this material.

5.1 (F) Affine combinations

In Section 1.1, we defined linear combinations of column vectors (R Def. 1.1.13). We now consider affine combinations.

Definition 5.1.1 (Affine combination). An affine combination of the vectors v_1, ..., v_k ∈ F^n is a linear combination $\sum_{i=1}^{k} \alpha_i v_i$ where $\sum_{i=1}^{k} \alpha_i = 1$.

Example 5.1.2.

$$2v_1 - \frac{1}{2}v_2 - \frac{1}{3}v_4 - \frac{1}{6}v_5$$

is an affine combination of the column vectors v_1, ..., v_5.

Definition 5.1.3 (Affine-closed set). The set S ⊆ F^n is affine-closed if it is closed under affine combinations.

Fact 5.1.4. The empty set is affine-closed (why?).

Definition 5.1.5 (Affine subspace). A nonempty, affine-closed subset U of F^n is called an affine subspace of F^n (notation: U ≤_aff F^n).

Proposition 5.1.6. The intersection of a (finite or infinite) family of affine-closed subsets of F^n is affine-closed. In other words, an intersection of affine subspaces is either empty or an affine subspace.

Throughout this book, the term "subspace" refers to subsets that are closed under linear combinations (R Def. 1.2.1). Subspaces are also referred to as "linear subspaces." This (redundant) longer term is especially useful in contexts where affine subspaces are discussed in order to distinguish linear subspaces from affine subspaces.

Definition 5.1.7 (Affine hull). The affine hull of a subset S ⊆ F^n, denoted aff S, is the smallest affine-closed set containing S, i.e.,

(a) aff S ⊇ S;

(b) aff S is affine-closed;

(c) for every affine-closed set T ⊆ F^n, if T ⊇ S then T ⊇ aff S.

Fact 5.1.8. aff ∅ = ∅. Contrast this with the fact that span ∅ = {0}.

Theorem 5.1.9. Let S ⊆ F^n. Then aff S exists and is unique. ♦

Theorem 5.1.10. For S ⊆ F^n, aff S is the set of all affine combinations of the finite subsets of S. ♦

Proposition 5.1.11. Let S ⊆ F^n. Then aff(aff S) = aff S.

Proposition 5.1.12. Let S ⊆ F^n. Then S is affine-closed if and only if S = aff S.

Proposition 5.1.13. Let S ⊆ F^n be affine-closed. Then S ≤ F^n if and only if 0 ∈ S.

Proposition 5.1.14.

(a) Let W ≤ F^n. All translates W + v of W (R Def. 4.2.4) are affine subspaces.

(b) Every affine subspace S ≤_aff F^n is the translate of a (unique) subspace of F^n.

Proposition 5.1.15. The intersection of a (finite or infinite) family of affine subspaces is either empty or equal to a translate of the intersection of their corresponding linear subspaces.

∗ ∗ ∗

Next we connect these concepts with the theory of systems of linear equations.

Exercise 5.1.16. Let A ∈ F^{k×n} and let b ∈ F^k. Then the set of solutions to the system Ax = b of linear equations is an affine-closed subset of F^n.

The next exercise shows the converse.

Exercise 5.1.17. Every affine-closed subset of F^n is the set of solutions to the system Ax = b of linear equations for some A ∈ F^{k×n} and b ∈ F^k.

Proposition 5.1.18 (General vs. homogeneous systems of linear equations). Let A ∈ F^{k×n} and b ∈ F^k. Let S = {x ∈ F^n | Ax = b} be the set of solutions of the system Ax = b and let U = {x ∈ F^n | Ax = 0} be the set of solutions of the corresponding system of homogeneous linear equations. Then either S is empty or S is a translate of U.

∗ ∗ ∗

We now study geometric features of F^n, viewed as an affine space.

Proposition 5.1.19. The span of the set S ⊆ F^n is the affine hull of S ∪ {0}.

Proposition 5.1.20. Let S ⊆ F^n be nonempty. Then for any u ∈ S, we have

aff S = u + span(S − u) (5.1)

where S − u is the translate of S by −u.

Definition 5.1.21 (Dimension of an affine subspace). The (affine) dimension of an affine subspace U ≤_aff F^n, denoted dim_aff U, is the dimension of its corresponding linear subspace (of which it is a translate). In order to assign a dimension to all affine-closed sets, we adopt the convention that dim ∅ = −1.

Exercise 5.1.22. Let U ≤ F^k. Then dim_aff U = dim U.

Exercise 5.1.23. What are the 0-dimensional affine subspaces?

Definition 5.1.24 (Affine independence). The vectors v_1, ..., v_k ∈ F^n are affine-independent if for every α_1, ..., α_k ∈ F, the two conditions $\sum_{i=1}^{k} \alpha_i v_i = 0$ and $\sum_{i=1}^{k} \alpha_i = 0$ imply α_1 = · · · = α_k = 0.

Proposition 5.1.25. The vectors v_1, ..., v_k ∈ F^n are affine-independent if and only if none of them belongs to the affine hull of the others.

Fact 5.1.26. Any single vector is affine-independent and affine-closed at the same time.

Proposition 5.1.27 (Translation invariance). Let v_1, ..., v_k ∈ F^n be affine-independent and let w ∈ F^n be any vector. Then v_1 + w, ..., v_k + w are also affine-independent.

Proposition 5.1.28 (Affine vs. linear independence). Let v_1, ..., v_k ∈ F^n.

(a) For k ≥ 0, the vectors v_1, ..., v_k are linearly independent if and only if the vectors 0, v_1, ..., v_k are affine-independent.

(b) For k ≥ 1, the vectors v_1, ..., v_k are affine-independent if and only if the vectors v_2 − v_1, ..., v_k − v_1 are linearly independent.

Definition 5.1.29 (Affine basis). An affine basis of an affine subspace W ≤_aff F^n is an affine-independent set S such that aff S = W.

Proposition 5.1.30. Let W be an affine subspace of F^n. Every affine basis of W has 1 + dim W elements.

Corollary 5.1.31. If W_1, ..., W_k are affine subspaces of F^n, then

$$\dim_{\mathrm{aff}}(\operatorname{aff}\{W_1, \dots, W_k\}) \le (k-1) + \sum_{i=1}^{k} \dim_{\mathrm{aff}} W_i. \tag{5.2}$$

5.2 (F) Hyperplanes

Definition 5.2.1 (Linear hyperplane). A linear hyperplane of F^n is a subspace of F^n of codimension 1 (R Def. 3.5.1).

Definition 5.2.2 (Codimension of an affine subspace). The (affine) codimension of an affine subspace U ≤_aff F^n, denoted codim_aff U, is the codimension of its corresponding linear subspace (of which it is a translate).

Definition 5.2.3 (Hyperplane). A hyperplane is an affine subspace of codimension 1.

Proposition 5.2.4. Let S ⊆ F^n be a hyperplane. Then there exist a nonzero vector a ∈ F^n and β ∈ F such that a^T v = β if and only if v ∈ S.

The vector a whose existence is guaranteed by the preceding proposition is called the normal vector of the hyperplane S.

Proposition 5.2.5. Let W ≤ F^n. Then

(a) W is the intersection of linear hyperplanes;

(b) if dim W = k, then W is the intersection of n − k linear hyperplanes.

Proposition 5.2.6. Let W ≤_aff F^n be an affine subspace. Then

(a) W is the intersection of hyperplanes;

(b) if dim_aff W = k, then W is the intersection of n − k hyperplanes.

5.3 (R) Convex combinations

In the preceding section, we studied affine combinations over an arbitrary field F. We now restrict ourselves to the case where F = R.

Definition 5.3.1 (Convex combination). A convex combination is an affine combination with nonnegative coefficients. So the expression $\sum_{i=1}^{k} \alpha_i v_i$ is a convex combination if $\sum_{i=1}^{k} \alpha_i = 1$ and α_i ≥ 0 for all i.

Example 5.3.2.

$$\frac{1}{2}v_1 + \frac{1}{4}v_2 + \frac{1}{6}v_4 + \frac{1}{12}v_5$$

is a convex combination of the vectors v_1, ..., v_5. Note that the affine combination in Example 5.1.2 is not convex.

Fact 5.3.3. Every convex combination is an affine combination.

Definition 5.3.4 (Convex set). A convex set is a subset S ⊆ R^n that is closed under convex combinations.

Proposition 5.3.5. The intersection of a (finite or infinite) family of convex sets is convex.

Definition 5.3.6. The convex hull of a subset S ⊆ R^n, denoted conv S, is the smallest convex set containing S, i.e.,

(a) conv S ⊇ S;

(b) conv S is convex;

(c) for every convex set T ⊆ R^n, if T ⊇ S then T ⊇ conv S.

Theorem 5.3.7. Let S ⊆ R^n. Then conv S exists and is unique. ♦

Theorem 5.3.8. For S ⊆ R^n, conv S is the set of all convex combinations of the finite subsets of S. ♦

Proposition 5.3.9. Let S ⊆ R^n. Then S is convex if and only if S = conv S.

Fact 5.3.10. conv S ⊆ aff S.

Definition 5.3.11 (Straight-line segment). Let u, v ∈ R^n. The straight-line segment connecting u and v is the convex hull of {u, v}, i.e., the set

conv(u, v) = {λu + (1 − λ)v | 0 ≤ λ ≤ 1}.

Proposition 5.3.12. The set S ⊆ R^n is convex if and only if it contains the straight-line segment connecting u and v for every u, v ∈ S.

Definition 5.3.13 (Dimension). The dimension of a convex set C ⊆ R^n, denoted dim_conv C, is the dimension of its affine hull. A convex subset C of R^n is full-dimensional if aff C = R^n.

Definition 5.3.14 (Half-space). A closed half-space is a region of R^n defined as {v ∈ R^n | a^T v ≥ β} for some nonzero a ∈ R^n and β ∈ R and is denoted H(a, β). An open half-space is a region of R^n defined as {v ∈ R^n | a^T v > β} for some nonzero a ∈ R^n and β ∈ R and is denoted H_o(a, β).

Exercise 5.3.15. Let a ∈ R^n be a nonzero vector and let β ∈ R. Prove: the set {v ∈ R^n | a^T v ≤ β} is also a (closed) half-space.

Fact 5.3.16. Let S ⊆ R^n be a hyperplane defined by a^T v = β. Then S divides R^n into two half-spaces, defined by a^T v ≥ β and a^T v ≤ β. The intersection of these two half-spaces is S.

Proposition 5.3.17.

(a) Every closed (R Def. 19.4.11) convex set is the intersection of the closed half-spaces containing it.

(b) If S ⊆ R^n is finite then conv(S) is the intersection of a finite number of closed half-spaces.

Exercise 5.3.18. Find a convex set which is not the intersection of any number of open or closed half-spaces. (Such a set already exists in 2 dimensions.)

Proposition 5.3.19. Every open (R Def. 19.4.10) convex set is the intersection of the open half-spaces containing it.

5.4 (R) Helly's Theorem

The main result of this section is the following theorem.

Theorem 5.4.1 (Helly's Theorem). If C_1, ..., C_k ⊆ R^n are convex sets such that any n + 1 of them intersect then all of them intersect.

Lemma 5.4.2 (Radon). Let S ⊆ R^n be a set of k ≥ n + 2 vectors in R^n. Then S has two disjoint subsets S_1 and S_2 whose convex hulls intersect. ♦

Exercise 5.4.3. Show that the inequality k ≥ n + 2 is tight, i.e., that there exists a set S of n + 1 vectors in R^n such that for any two disjoint subsets S_1, S_2 ⊆ S, we have conv S_1 ∩ conv S_2 = ∅.

Exercise 5.4.4. Use Radon's Lemma to prove Helly's Theorem.

Exercise 5.4.5. Show the bound in Helly's Theorem is tight, i.e., there exist n + 1 convex subsets of R^n such that every n of them intersect but the intersection of all of them is empty.

5.5 (F, R) Additional exercises

Observe that a half-space can be defined by n + 1 real parameters,¹ i.e., by defining a and β. The following project asks you to define a similar object which can be defined by 2n − 1 real parameters.

Project 5.5.1. Define a "partially open half-space" which can be defined by 2n − 1 parameters² so that every convex set is the intersection of the partially open half-spaces and the closed half-spaces containing it.

(R Contents)

¹In fact, in a well-defined sense, half-spaces can be defined by n real parameters.
²In fact, this object can be defined by 2n − 3 real parameters.

Chapter 6

(F) The Determinant

While many systems of linear equations are too large to reasonably solve by hand using methods like elimination, the general system of two equations in two unknowns

$$\alpha_{11}x_1 + \alpha_{12}x_2 = \beta_1 \tag{6.1}$$
$$\alpha_{21}x_1 + \alpha_{22}x_2 = \beta_2 \tag{6.2}$$

is not too cumbersome to work out and find

$$x_1 = \frac{\alpha_{22}\beta_1 - \alpha_{12}\beta_2}{\alpha_{22}\alpha_{11} - \alpha_{12}\alpha_{21}} \tag{6.3}$$
$$x_2 = \frac{\alpha_{11}\beta_2 - \alpha_{21}\beta_1}{\alpha_{22}\alpha_{11} - \alpha_{12}\alpha_{21}} \tag{6.4}$$

For the case of 3 equations in 3 unknowns, we obtain

$$x_1 = \frac{N_{31}}{D_3} \tag{6.5} \qquad x_2 = \frac{N_{32}}{D_3} \tag{6.6} \qquad x_3 = \frac{N_{33}}{D_3} \tag{6.7}$$

where

$$D_3 = \alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{13}\alpha_{22}\alpha_{31} \tag{6.8-6.9}$$

$$N_{31} = \alpha_{22}\alpha_{33}\beta_1 + \alpha_{12}\alpha_{23}\beta_3 + \alpha_{13}\alpha_{32}\beta_2 - \alpha_{23}\alpha_{32}\beta_1 - \alpha_{12}\alpha_{33}\beta_2 - \alpha_{13}\alpha_{22}\beta_3 \tag{6.10-6.11}$$

$$N_{32} = \alpha_{11}\alpha_{33}\beta_2 + \alpha_{23}\alpha_{31}\beta_1 + \alpha_{13}\alpha_{32}\beta_3 - \alpha_{11}\alpha_{23}\beta_3 - \alpha_{21}\alpha_{33}\beta_1 - \alpha_{13}\alpha_{31}\beta_2 \tag{6.12-6.13}$$

$$N_{33} = \alpha_{11}\alpha_{22}\beta_3 + \alpha_{12}\alpha_{31}\beta_2 + \alpha_{21}\alpha_{32}\beta_1 - \alpha_{11}\alpha_{32}\beta_2 - \alpha_{12}\alpha_{21}\beta_3 - \alpha_{22}\alpha_{31}\beta_1 \tag{6.14-6.15}$$

In particular, the numerators and denominators of these expressions each have six terms. For a system of 4 equations in 4 unknowns, the numerators and denominators each have 24 terms, and, in general, we have n! terms in the numerators and denominators of the solutions to n equations in n unknowns.
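These six-term formulas are easy to test by machine. The sketch below is mine, not the text's; it uses the matrix from Numerical exercise 6.2.15 later in this chapter, for which D_3 = −60 ≠ 0, and exploits the observation (which one can check term by term) that N_{3i} is the determinant of A with its i-th column replaced by b -- a foreshadowing of Cramer's Rule in Chapter 7.

```python
import numpy as np

A = np.array([[2., 1., -3.],
              [4., -1., 0.],
              [2., 5., -1.]])
b = np.array([1., 2., 3.])

def det3(m):
    """The six-term formula (6.8) for a 3 x 3 determinant."""
    return (m[0,0]*m[1,1]*m[2,2] + m[0,1]*m[1,2]*m[2,0] + m[0,2]*m[1,0]*m[2,1]
          - m[0,0]*m[1,2]*m[2,1] - m[0,1]*m[1,0]*m[2,2] - m[0,2]*m[1,1]*m[2,0])

D3 = det3(A)          # -60 for this matrix, so the formulas apply
x = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b      # N_3i = det of A with column i replaced by b
    x.append(det3(Ai) / D3)
print(np.allclose(x, np.linalg.solve(A, b)))   # True
```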


As complicated as these expressions may seem for large n, it is clear that they represent a fundamental quantity associated with the matrix. The denominator of these expressions is known as the determinant of the matrix A (where our system of linear equations is written as Ax = b). The determinant is a function from the space of n × n matrices to numbers, that is, det : M_n(F) → F. Before defining this function, we need to study permutations.

(R Contents)

6.1 Permutations

Definition 6.1.1 (Permutation). A permutation of a set Ω is a bijection f : Ω → Ω. The set Ω is called the permutation domain.

Definition 6.1.2 (Symmetric group). The symmetric group of degree n, denoted S_n, is the set of all permutations of the set {1, ..., n}.

Definition 6.1.3 (Inversion). Let σ ∈ S_n be a permutation of the set {1, ..., n}, and let 1 ≤ i, j ≤ n with i ≠ j. We say that the pair {i, j} is inverted by σ if i < j and σ(i) > σ(j) or i > j and σ(i) < σ(j). We denote by Inv(σ) the number of inversions of σ, that is, the number of pairs {i, j} that are inverted by σ.

Exercise 6.1.4. What is the maximum possible value of Inv(σ) for σ ∈ S_n?

Definition 6.1.5 (Even and odd permutations). If Inv(σ) is even, then we say that σ is an even permutation, and if Inv(σ) is odd, then σ is an odd permutation.

Proposition 6.1.6. For n ≥ 2, exactly half of the n! permutations of {1, ..., n} are even, the other half are odd.

Definition 6.1.7 (Sign of a permutation). Let σ ∈ S_n be a permutation. The sign of σ, denoted sgn σ, is defined as

sgn(σ) := (−1)^{Inv(σ)}. (6.16)

In particular, sgn(σ) = 1 if σ is an even permutation and sgn(σ) = −1 if σ is an odd permutation. The sign of a permutation (equivalently, whether it is even or odd) is also referred to as the parity of the permutation.

Definition 6.1.8 (Composition of permutations). Let Ω be a set, and let σ and τ be permutations of Ω. Then the composition of σ with τ, denoted στ, is defined by

(στ)(a) := σ(τ(a)) (6.17)

for all a ∈ Ω. We also refer to the composition of σ with τ as the product of σ and τ.

Definition 6.1.9 (Transposition). Let Ω be a set. The transposition of the elements a ≠ b ∈ Ω is the permutation that swaps a and b and fixes every other element. Formally, it is the permutation τ defined by

$$\tau(x) := \begin{cases} b & x = a \\ a & x = b \\ x & \text{otherwise} \end{cases} \tag{6.18}$$

This permutation is denoted τ = (a, b).

Proposition 6.1.10. Transpositions generate S_n, i.e., every permutation σ ∈ S_n can be written as a composition of transpositions.

Theorem 6.1.11. A permutation σ of {1, ..., n} is even if and only if σ is a product of an even number of transpositions. ♦

In the light of this theorem, we could use its conclusion as the definition of even permutations. The advantage of this definition is that it can be applied to any set Ω, not just the ordered set {1, ..., n}.

Corollary 6.1.12. While the number of inversions of a permutation depends on the ordering of the permutation domain, its parity does not.

Definition 6.1.13 (Neighbor transposition). A neighbor transposition of the set {1, ..., n} is a transposition of the form τ = (i, i + 1).

Exercise 6.1.14. Let σ ∈ S_n and let τ be a neighbor transposition. Show

|Inv(σ) − Inv(στ)| = 1.

Corollary 6.1.15. Let σ ∈ S_n and let τ be a neighbor transposition. Then sgn(στ) = −sgn(σ).

Proposition 6.1.16. Every transposition is the composition of an odd number of neighbor transpositions.

Proposition 6.1.17. Neighbor transpositions generate the symmetric group. That is, every element of S_n can be expressed as the composition of neighbor transpositions.

Proposition 6.1.18. Composition with a transposition changes the parity of a permutation.

Corollary 6.1.19. Let σ ∈ S_n be a permutation. Then σ is even if and only if σ is the product of an even number of transpositions.

Theorem 6.1.20. Let σ, τ ∈ S_n. Then

sgn(στ) = sgn(σ) sgn(τ). (6.19)

♦

Definition 6.1.21 (k-cycle). A k-cycle is a permutation that cyclically permutes k elements {a_1, ..., a_k} and fixes all others (FIGURE). That is, σ is a k-cycle if σ(a_i) = a_{i+1} for some elements a_1, ..., a_k (where a_{k+1} = a_1) and σ(x) = x if x ∉ {a_1, ..., a_k}. In particular, transpositions are 2-cycles.

Definition 6.1.22 (Disjoint cycles). Let σ and τ be cycles with permutation domain Ω. Then σ and τ are disjoint if no element of Ω is permuted by both σ and τ.

Proposition 6.1.23. Every permutation uniquely decomposes into disjoint cycles.
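The definitions above translate directly into code. Here is a short Python sketch (my addition) that counts inversions to get the sign, and spot-checks Proposition 6.1.6 and Theorem 6.1.20 for n = 4.

```python
from itertools import combinations, permutations

def inv(sigma):
    """Number of pairs {i, j} inverted by sigma (a tuple on {0, ..., n-1})."""
    return sum(1 for i, j in combinations(range(len(sigma)), 2)
               if sigma[i] > sigma[j])

def sgn(sigma):
    return -1 if inv(sigma) % 2 else 1

def compose(sigma, tau):
    """(sigma tau)(a) = sigma(tau(a)), as in (6.17)."""
    return tuple(sigma[tau[a]] for a in range(len(tau)))

perms = list(permutations(range(4)))
print(sum(sgn(s) == 1 for s in perms))   # 12: half of 4! = 24 are even
print(all(sgn(compose(s, t)) == sgn(s) * sgn(t)
          for s in perms for t in perms))   # True: sgn is multiplicative
```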

Exercise 6.1.24. Let σ be a k-cycle. Show that σ is an even permutation if and only if k is odd.

Corollary 6.1.25. Let σ be a permutation. Then σ is even if and only if its cycle decomposition includes an even number of even cycles.

Proposition 6.1.26. Let σ be a permutation. Then Inv(σ^{−1}) = Inv(σ).

(R Contents)

6.2 Defining the determinant

Definition 6.2.1 (Determinant). The determinant of an n × n matrix A = (α_ij) is

$$\det A := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} \alpha_{i, \sigma(i)}. \tag{6.20}$$

Notation 6.2.2. The determinant of a matrix $A = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & \ddots & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}$ may also be represented with vertical bars, that is,

$$\det A = |A| = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & \ddots & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix}. \tag{6.21-6.22}$$

Proposition 6.2.3. Show that

(a) $\det \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \alpha\delta - \beta\gamma$

(b) $\det \begin{pmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \\ \alpha_{31} & \alpha_{32} & \alpha_{33} \end{pmatrix} = \alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{13}\alpha_{22}\alpha_{31}$

For the following exercises, let A be an n × n matrix.

Proposition 6.2.4. If a column of A is 0, then det A = 0.

Proposition 6.2.5. Let A′ be the matrix obtained by swapping two columns of A. Then det A′ = −det A.

Proposition 6.2.6. Show that det A^T = det A. This fact follows from what property of inversions?

Proposition 6.2.7. If two columns of A are equal, then det A = 0.

Proposition 6.2.8. The determinant of a diagonal matrix is the product of its diagonal entries.

Proposition 6.2.9. The determinant of an upper triangular matrix is the product of its diagonal entries.
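Formula (6.20) can be implemented verbatim. The following Python sketch (mine) is exponential in n, so it is only a testing tool for small matrices, but it is a faithful transcription of the definition.

```python
from itertools import combinations, permutations
from math import prod

def sgn(sigma):
    """Sign via the inversion count, as in Definition 6.1.7."""
    return -1 if sum(sigma[i] > sigma[j]
                     for i, j in combinations(range(len(sigma)), 2)) % 2 else 1

def det(A):
    """The defining formula (6.20): sum over all n! permutations."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det([[1, 1], [1, 0]]))                    # -1
print(det([[5, 1, 7], [0, 2, 6], [0, 0, 3]]))   # 30, as in Example 6.2.10
```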

Example 6.2.10.

$$\det \begin{pmatrix} 5 & 1 & 7 \\ 0 & 2 & 6 \\ 0 & 0 & 3 \end{pmatrix} = 30 \tag{6.23}$$

and this value does not depend on the three entries in the upper-right corner.

Definition 6.2.11 (k-linearity). A function f : V × · · · × V → W (k terms) is linear in the i-th component if, whenever we fix x_1, ..., x_{i−1}, x_{i+1}, ..., x_k, the function

g(y) := f(x_1, ..., x_{i−1}, y, x_{i+1}, ..., x_k)

is linear,¹ i.e.,

g(y_1 + y_2) = g(y_1) + g(y_2) (6.24)
g(αy) = αg(y) (6.25)

The function f is k-linear if it is linear in all k components. A function which is 2-linear is said to be bilinear.

¹Linear maps will be discussed in more detail in Chapter 16.

Proposition 6.2.12 (Multilinearity). The determinant is multilinear in the columns of A. That is,

det[a_1 | · · · | a_{i−1} | a_i + b | a_{i+1} | · · · | a_n] = det[a_1 | · · · | a_n] + det[a_1 | · · · | a_{i−1} | b | a_{i+1} | · · · | a_n] (6.26)

and

det[a_1 | · · · | a_{i−1} | αa_i | a_{i+1} | · · · | a_n] = α det[a_1 | · · · | a_n]. (6.27)

Next we study the effect of elementary column and row operations (Sec. 3.2) on the determinant.

Proposition 6.2.13. Let A′ be the matrix obtained from A ∈ M_n(F) by applying an elementary column or row operation. We distinguish cases according to the operation applied.

(a) Shearing does not change the value of the determinant: det(A′) = det(A).

(b) Scaling of a row or column by a scalar λ (the scale_c(i, λ) or scale_r(i, λ) operation) scales the value of the determinant by the same factor: det(A′) = λ det(A).

(c) Swapping two rows or columns changes the sign of the determinant: det(A′) = −det(A).

Proposition 6.2.14. Let A, B ∈ M_n(F). Then det(AB) = det(A) det(B).

Numerical exercise 6.2.15. Let A = $\begin{pmatrix} 2 & 1 & -3 \\ 4 & -1 & 0 \\ 2 & 5 & -1 \end{pmatrix}$.

(a) Use the formula derived in part (b) of Prop. 6.2.3 to compute det A.

(b) Compute the matrix A′ obtained by performing the column operation shear_c(1, 2, −4) on A.

(c) Self-check: Use the same formula to compute det A′, and verify that det A′ = det A.

Theorem 6.2.16 (Determinant vs. linear independence). Let A be a square matrix. Then det A = 0 if and only if the columns of A are linearly dependent.

The proof of this theorem requires the following two lemmas which also provide a practical method to compute the determinant by Gaussian or Gauss-Jordan elimination.

Lemma 6.2.17. Let A be an n × n matrix whose columns are linearly dependent. Then through a series of column shearing operations, it is possible to bring A into a form in which one column is all zeros. In particular, in this case the determinant is zero. ♦

Lemma 6.2.18. Let A be an n × n matrix whose columns are linearly independent. Then it is possible to perform a series of column shearing operations that bring A into a form in which each row and column contains exactly one nonzero entry, i.e., the determinant has exactly one nonzero expansion term. ♦

Theorem 6.2.19 (Determinant vs. system of linear equations). The homogeneous system Ax = 0 of n linear equations in n unknowns has a nontrivial solution if and only if det A = 0. ♦

Definition 6.2.20 (Skew-symmetric matrix). The matrix A ∈ M_n(F) is skew-symmetric if A^T = −A.

Exercise 6.2.21. Let F be a field in which 1 + 1 ≠ 0.

(a) Show that if A ∈ M_n(F) is skew-symmetric and n is odd then det A = 0.

(b) For all even n, find a nonsingular skew-symmetric matrix A ∈ M_n(F).

Exercise 6.2.22. Show that part (a) of the preceding exercise is false over F_2 (and every field of characteristic 2).

Exercise 6.2.23. Let F be a field of characteristic 2 and let n be odd. If A ∈ M_n(F) is symmetric² and all of its diagonal entries are 0, then A is singular.

²Observe that in a field of characteristic 2, "skew-symmetric" means the same thing as "symmetric."

Proposition 6.2.24 (Hadamard's Inequality). Let A = [a_1 | · · · | a_n] ∈ M_n(R). Prove

$$|\det A| \le \prod_{j=1}^{n} \|a_j\| \tag{6.28}$$

where ‖a_j‖ is the norm of the vector a_j (R Def. 1.5.2).

Exercise 6.2.25. When does equality hold in Hadamard's Inequality?

Definition 6.2.26 (Parallelepiped). Let v_1, v_2 ∈ R². The parallelogram spanned by v_1 and v_2 is the set (PICTURE)

parallelogram(v_1, v_2) := {αv_1 + βv_2 | 0 ≤ α, β ≤ 1}. (6.29)

More generally, the parallelepiped spanned by the vectors v_1, ..., v_n ∈ R^n is the set

$$\text{parallelepiped}(v_1, \dots, v_n) := \left\{ \sum_{i=1}^{n} \alpha_i v_i \;\middle|\; 0 \le \alpha_i \le 1 \right\}. \tag{6.30}$$

Exercise 6.2.27. Let a, b ∈ R². Show that the area of the parallelogram spanned by these vectors (PICTURE) is |det A|, where A = [a | b] ∈ M_2(R).

The following theorem is a generalization of Ex. 6.2.27.

Theorem 6.2.28. If v_1, ..., v_n ∈ R^n, then the volume of parallelepiped(v_1, ..., v_n) is |det A|, where A is the n × n matrix whose i-th column is v_i. ♦

(R Contents)

6.3 Cofactor expansion

Definition 6.3.1 (Cofactor). Let A ∈ M_n(F) be a matrix. The (i, j) cofactor of A is (−1)^{i+j} det(ᵢAⱼ), where ᵢAⱼ is the (n − 1) × (n − 1) matrix obtained by removing the i-th row and j-th column of A.

Theorem 6.3.2 (Cofactor expansion). Let A = (α_ij) be an n × n matrix, and let C_ij be the (i, j) cofactor of A. Then for all i,

$$\det A = \sum_{j=1}^{n} \alpha_{ij} C_{ij}. \tag{6.31}$$

This is the cofactor expansion of A along the i-th row. Similarly, for all j,

$$\det A = \sum_{i=1}^{n} \alpha_{ij} C_{ij}. \tag{6.32}$$

This is the cofactor expansion of A along the j-th column. ♦

Numerical exercise 6.3.3. Compute the determinants of the following matrices by cofactor expansion (a) along the first row and (b) along the second column. Self-check: your answers should be the same.

(a) $\begin{pmatrix} 2 & 3 & 1 \\ 0 & -4 & -1 \\ 1 & -3 & 4 \end{pmatrix}$

(b) $\begin{pmatrix} 3 & -3 & 2 \\ 4 & 7 & -1 \\ 6 & -4 & 2 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 3 & 2 \\ 3 & -1 & 0 \\ 0 & 6 & 5 \end{pmatrix}$

(d) $\begin{pmatrix} 6 & 2 & -1 \\ 0 & 4 & 1 \\ -3 & 1 & 1 \end{pmatrix}$
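Theorem 6.3.2 yields the familiar recursive algorithm for the determinant. Here is a compact Python sketch (mine), expanding along the first row; like the permutation formula, it is exponential in n and meant only for small examples.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (6.31)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 0, col j
        total += (-1) ** j * A[0][j] * det(minor)        # (0, j) cofactor
    return total

print(det([[2, 3, 1], [0, -4, -1], [1, -3, 4]]))   # -37, matrix (a) above
```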

Exercise 6.3.4. Compute the determinants of the following matrices.

(a) $\begin{pmatrix} \alpha & \beta & \beta & \cdots & \beta & \beta \\ \beta & \alpha & \beta & \cdots & \beta & \beta \\ \beta & \beta & \alpha & \cdots & \beta & \beta \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \beta & \beta & \beta & \cdots & \alpha & \beta \\ \beta & \beta & \beta & \cdots & \beta & \alpha \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 1 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & -1 & 1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & -1 & 1 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & -1 & 1 \end{pmatrix}$

Exercise 6.3.5. Compute the determinant of the Vandermonde matrix (R Def. 2.5.9) generated by α_1, ..., α_n.

Definition 6.3.6 (Fixed point). Let Ω be a set, and let f : Ω → Ω be a permutation of Ω. We say that x ∈ Ω is a fixed point of f if f(x) = x. We say that f is fixed-point-free, or a derangement, if f has no fixed points.

♥ Exercise 6.3.7. Let F_n denote the set of fixed-point-free permutations of the set {1, ..., n}. Decide, for each n, whether the majority of F_n is odd or even. (The answer will depend on n.)

(R Contents)

6.4 Matrix inverses and the determinant

We now derive an explicit form for the inverse of an n × n matrix. Our tool for this will be the cofactor expansion in addition to the skew cofactor expansion. In Theorem 6.3.2, we took the dot product of the i-th column of a matrix with the cofactors corresponding to the i-th column. We now instead take the dot product of the i-th column with the cofactors corresponding to the j-th column.

Proposition 6.4.1 (Skew cofactor expansion). Let A = (α_ij) be an n × n matrix, and fix i and j (1 ≤ i, j ≤ n) such that i ≠ j. Let C_kj be the (k, j) cofactor of A. Then

$$\sum_{k=1}^{n} \alpha_{ki} C_{kj} = 0. \tag{6.33}$$

Definition 6.4.2 (Adjugate of a matrix). Let A ∈ M_n(F). Then the adjugate of A, denoted adj(A), is the matrix whose (i, j) entry is the (j, i) cofactor of A.
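The adjugate combines the cofactor expansion (6.31) with the skew cofactor expansion (6.33): the (i, i) entries of A · adj(A) are all det A, while the off-diagonal entries vanish, so A · adj(A) = (det A)I. A numpy sketch of this check (my addition; numpy has no built-in adjugate, so the helper functions are mine):

```python
import numpy as np

def cofactor(A, i, j):
    """The (i, j) cofactor: (-1)^(i+j) times the (i, j) minor determinant."""
    minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def adjugate(A):
    """(i, j) entry of adj(A) is the (j, i) cofactor of A."""
    n = A.shape[0]
    return np.array([[cofactor(A, j, i) for j in range(n)]
                     for i in range(n)])

A = np.array([[2., 1., -3.], [4., -1., 0.], [2., 5., -1.]])
print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3)))  # True
```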

Theorem 6.4.3 (Explicit form of the matrix inverse). Let A be a nonsingular n × n matrix. Then

$$A^{-1} = \frac{1}{\det A}\,\operatorname{adj}(A). \tag{6.34}$$

♦

Numerical exercise 6.4.4. Compute the inverses of the following matrices. Self-check: multiply your answer by the original matrix to get the identity.

(a) $\begin{pmatrix} -4 & 2 \\ 6 & 2 \end{pmatrix}$

(b) $\begin{pmatrix} -1 & -1 \\ -1 & 0 \end{pmatrix}$

(c) $\begin{pmatrix} 3 & 4 & -7 \\ -2 & 1 & -4 \\ 0 & -2 & 5 \end{pmatrix}$

(d) $\begin{pmatrix} -1 & 4 & 2 \\ -3 & 2 & -3 \\ 1 & 0 & 2 \end{pmatrix}$

Proposition 6.4.5. If A ∈ M_n(F) is invertible, then det A^{−1} = 1/det A.

Exercise 6.4.6. When is A ∈ M_n(Z) invertible over Z? Give a simple condition in terms of det A.

Exercise 6.4.7. Let n be odd and let A ∈ M_n(Z) be an invertible symmetric matrix whose diagonal entries are all 0. Show that A^{−1} ∉ M_n(Z).

(R Contents)

6.5 Additional exercises

Notation 6.5.1. Let A ∈ F^{k×n} and B ∈ F^{n×k} be matrices, and let I ⊆ [n]. The matrix A_I is the matrix whose columns are the columns of A which correspond to the elements of I. The matrix _I B is ((B^T)_I)^T, i.e., the matrix whose rows are the rows of B which correspond to the elements of I.

Example 6.5.2. Let

$$A = \begin{pmatrix} 3 & 1 & 7 \\ 2 & 3 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -3 \\ -2 & 4 \\ -4 & 1 \end{pmatrix},$$

and let I = {1, 3}. Then

$$A_I = \begin{pmatrix} 3 & 7 \\ 2 & 2 \end{pmatrix} \quad\text{and}\quad {}_I B = \begin{pmatrix} 1 & -3 \\ -4 & 1 \end{pmatrix}.$$

Exercise 6.5.3 (Cauchy-Binet formula). Let A ∈ F^{k×n} and let B ∈ F^{n×k}. Show that

$$\det(AB) = \sum_{\substack{I \subseteq [n] \\ |I| = k}} \det(A_I)\det({}_I B). \tag{6.35}$$

(R Contents)

Chapter 7

(F) Theory of Systems of Linear Equations II: Cramer's Rule

(R Contents)

7.1 Cramer's Rule

With the determinant in hand, we are now able to return to the problem of solving systems of linear equations. In particular, we have the following result.

Theorem 7.1.1. Let A be a nonsingular square matrix. Then the system Ax = b of n linear equations in n unknowns is solvable. ♦

In particular, if A is a square matrix, then Ax = b is solvable (and in fact has a unique solution) precisely when A is invertible. It is easy to see that in this case, the solution to Ax = b is given by x = A^{−1}b, but we now demonstrate that it is possible to determine the solution to this system of equations without ever computing a matrix inverse.

Theorem 7.1.2 (Cramer's Rule). Let A be an invertible n × n matrix over F, and let b ∈ F^n. Let a = A^{−1}b = (α_1, ..., α_n)^T. Then

(a) Aa = b

(b) For each i ≤ n,

$$\alpha_i = \frac{\det A_i}{\det A} \tag{7.1}$$

where A_i is the matrix obtained by replacing the i-th column of A by b.

♦

Numerical exercise 7.1.3. Use Cramer's Rule to solve the following systems of linear equations. Self-check: plug your answers back into the original equations.

(a)

$$\begin{aligned} x_1 + 2x_2 &= 3 \\ x_1 - x_2 &= 6 \end{aligned}$$


(b)

$$\begin{aligned} -x_1 + x_2 - x_3 &= 4 \\ 2x_1 + 3x_2 + 4x_3 &= -2 \\ -x_1 - x_2 - 3x_3 &= 3 \end{aligned}$$
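Cramer's Rule is short to implement. Here is a Python sketch (my addition) applying it to system (b) above, with the suggested self-check.

```python
import numpy as np

A = np.array([[-1., 1., -1.],
              [ 2., 3., 4.],
              [-1., -1., -3.]])
b = np.array([4., -2., 3.])
d = np.linalg.det(A)                  # 6, so A is invertible
x = []
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b                      # A_i: replace the i-th column by b
    x.append(np.linalg.det(Ai) / d)   # alpha_i = det(A_i) / det(A)
print(x)
print(np.allclose(A @ np.array(x), b))   # self-check: plug back in
```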

(R Contents)

Chapter 8

(F) Eigenvectors and Eigenvalues

(R Contents)

8.1 Eigenvector and eigenvalue basics

Definition 8.1.1 (Eigenvector). Let A ∈ M_n(F) and let v ∈ F^n. Then v is an eigenvector of A if v ≠ 0 and there exists λ ∈ F such that Av = λv.

Definition 8.1.2 (Eigenvalue). Let A ∈ M_n(F). Then λ ∈ F is an eigenvalue of A if there exists a nonzero vector v ∈ F^n such that Av = λv.

Definition 8.1.3. Let A ∈ M_n(F). Let λ ∈ F be a scalar and v ∈ F^n a nonzero vector. If Av = λv then we say that λ and v are a corresponding eigenvalue-eigenvector pair.

Exercise 8.1.4. Consider the 2 × 2 real matrix

$$B = \begin{pmatrix} 1 & 1 \\ -2 & 4 \end{pmatrix}.$$

(a) Prove that the numbers λ_1 = 2 and λ_2 = 3 are eigenvalues of this matrix. Find the corresponding eigenvectors.

(b) Prove that B has no other eigenvalues.

Exercise 8.1.5. What are the eigenvalues and eigenvectors of the n × n identity matrix?

Exercise 8.1.6. Let D = diag(λ_1, ..., λ_n) be the n × n diagonal matrix whose diagonal entries are λ_1, ..., λ_n. What are the eigenvalues and eigenvectors of D?

Exercise 8.1.7. Determine the eigenvalues of the 2 × 2 matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.

Exercise 8.1.8. Find examples of real 2 × 2 matrices with

(a) no eigenvectors in R²

(b) exactly one eigenvector in R², up to scaling

(c) exactly two eigenvectors in R², up to scaling

(d) infinitely many eigenvectors in R², none of which is a scalar multiple of another

Exercise 8.1.9. Let A ∈ M_n(F) be a matrix and let v_1 and v_2 be eigenvectors to distinct eigenvalues. Then v_1 + v_2 is not an eigenvector.
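A quick numerical companion (my aside) to Exercise 8.1.4, using numpy's eigenvalue routine:

```python
import numpy as np

B = np.array([[1., 1.], [-2., 4.]])
# Check two eigenvalue-eigenvector pairs by hand: Bv = lambda * v.
for lam, v in [(2.0, np.array([1., 1.])), (3.0, np.array([1., 2.]))]:
    print(lam, np.allclose(B @ v, lam * v))   # True for both pairs
print(np.linalg.eigvals(B))                   # the two eigenvalues 2 and 3
```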


Exercise 8.1.10. Let A ∈ M_n(F) be a matrix to which every nonzero vector is an eigenvector. Then A is a scalar matrix (R Def. 2.2.13).

♥ Exercise 8.1.11. Show that eigenvectors of a matrix corresponding to distinct eigenvalues are linearly independent.

Definition 8.1.12 (Eigenbasis). Let A ∈ M_n(F) be a matrix. An eigenbasis of A is a basis of F^n made up of eigenvectors of A.

Exercise 8.1.13. Find all eigenbases of I.

Exercise 8.1.14. Let λ_1, ..., λ_n be scalars. Find an eigenbasis of diag(λ_1, ..., λ_n).

Exercise 8.1.15. Prove that $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ has no eigenbasis.

Proposition 8.1.16. Let A ∈ M_n(F). Then A is nonsingular if and only if 0 is not an eigenvalue of A.

Definition 8.1.17 (Left eigenvector). Let A ∈ M_n(F). Then x ∈ F^{1×n} is a left eigenvector if x ≠ 0 and there exists λ ∈ F such that xA = λx.

Definition 8.1.18 (Left eigenvalue). Let A ∈ M_n(F). Then λ ∈ F is a left eigenvalue of A if there exists a nonzero row vector v ∈ F^{1×n} such that vA = λv.

Convention 8.1.19. When we use the word "eigenvector" without a modifier, we refer to a right eigenvector; the term "right eigenvector" is occasionally used for clarity.

Exercise 8.1.20. Let A ∈ M_n(F). Show that if x is a right eigenvector to eigenvalue λ and y^T is a left eigenvector to eigenvalue μ ≠ λ, then y^T · x = 0, i.e., y ⊥ x.

Definition 8.1.21 (Eigenspace). Let A ∈ M_n(F). We denote by U_λ the set

U_λ := {v ∈ F^n | Av = λv}. (8.1)

This set is called the eigenspace corresponding to the eigenvalue λ. The following exercise explains this terminology.

Exercise 8.1.22. Let A ∈ M_n(F) be a square matrix with eigenvalue λ. Show that U_λ is a subspace of F^n.

Definition 8.1.23 (Geometric multiplicity). Let A ∈ M_n(F) be a square matrix and let λ be an eigenvalue of A. Then the geometric multiplicity of λ is dim U_λ. The next exercise provides a method of calculating this quantity.

Exercise 8.1.24. Let λ be an eigenvalue of the n × n matrix A. Give a simple expression for the geometric multiplicity in terms of A and λ.

Exercise 8.1.25. If N is a nilpotent matrix (R Def. 2.2.18) and λ is an eigenvalue of N, then λ = 0.

Exercise 8.1.26. Let N be a nilpotent matrix.

(a) Show that I + N is invertible.

(b) Find a nonsingular matrix A and a nilpotent matrix N such that A + N is singular.

(c) Prove that there exist a nonsingular diagonal matrix D and a nilpotent matrix N such that D + N is singular.

(R Contents)

8.2 Similar matrices and diagonalizability

Definition 8.2.1 (Similar matrices). Let A, B ∈ M_n(F). Then A and B are similar (notation: A ∼ B) if there exists a nonsingular matrix S such that B = S^{−1}AS.

Exercise 8.2.2. Let A and B be similar matrices. Show that, for k ≥ 0, we have A^k ∼ B^k.

Proposition 8.2.3. Let A ∼ B (R Def. 8.2.1). Then det A = det B.

Exercise 8.2.4. Let N ∈ M_n(F) be a matrix. Then N is nilpotent if and only if it is similar to a strictly upper triangular matrix.

Proposition 8.2.5. Every matrix is similar to a block diagonal matrix where each block has the form αI + N where α ∈ F and N is nilpotent.

Blocks of this form are "proto-Jordan blocks." We will discuss Jordan blocks and the Jordan canonical form of matrices in Section ??

Definition 8.2.6 (Diagonalizability). Let A ∈ M_n(F). Then A is diagonalizable if A is similar to a diagonal matrix.

Exercise 8.2.7.

(a) Prove that $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is not diagonalizable.

(b) Prove that $\begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}$ is diagonalizable.

(c) When is the matrix $\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ 0 & \alpha_{22} \end{pmatrix}$ diagonalizable?

Proposition 8.2.8. Let A ∈ M_n(F). Then A is diagonalizable if and only if A has an eigenbasis.

(R Contents)

8.3 Polynomial basics

We now digress from the theory of eigenvectors and eigenvalues in order to build a foundation in polynomials. We will discuss polynomials in more depth in Section 14.4, but now we establish the basics necessary for studying the characteristic polynomial in Section 8.4.
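For Exercise 8.2.7(b), an eigenbasis can be exhibited explicitly and used to diagonalize, illustrating Proposition 8.2.8. A numpy sketch (mine; the eigenvector matrix S is computed by hand):

```python
import numpy as np

A = np.array([[1., 2.], [0., 3.]])
S = np.array([[1., 1.],               # columns: eigenvectors (1,0) and (1,1)
              [0., 1.]])              # A(1,0) = 1*(1,0), A(1,1) = 3*(1,1)
print(np.linalg.inv(S) @ A @ S)       # S^{-1} A S = diag(1, 3)
```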

Definition 8.3.1 (Polynomial). A polynomial over the field F is a formal expression¹ f of the form

f = α_0 + α_1 t + α_2 t² + · · · + α_n t^n (8.2)

We may omit any terms with zero coefficient, e.g.,

6 + 0t − 3t² + 0t³ = 6 − 3t² (8.3)

The set of polynomials over the field F is denoted F[t].

¹In Chapter 14, we will refine this definition and study polynomials more formally.

Definition 8.3.2 (Zero polynomial). The polynomial which has all coefficients equal to zero is called the zero polynomial and is denoted by 0.

Definition 8.3.3. The leading term of a polynomial f = α_0 + α_1 t + · · · + α_n t^n is the term corresponding to the highest power of t with a nonzero coefficient, that is, the term α_k t^k where α_k ≠ 0 and α_j = 0 for all j > k. The zero polynomial does not have a leading term. The leading coefficient of a polynomial f is the coefficient of the leading term of f. The degree of a polynomial f, denoted deg f, is the exponent of its leading term. A polynomial is monic if its leading coefficient is 1.

Convention 8.3.4. The zero polynomial has degree −∞.

Example 8.3.5. Let f = 6 − 3t² + 4t⁵. Then the leading term of f is 4t⁵, the leading coefficient of f is 4, and deg f = 5.

Exercise 8.3.6. Which polynomials have degree 0?

Notation 8.3.7. We denote the set of polynomials of degree at most n over F by P_n(F).

Definition 8.3.8 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f, or f is divisible by g, written g | f, if there exists a polynomial h ∈ F[t] such that f = gh. In this case we say that g is a divisor of f and f is a multiple of g.

Definition 8.3.9 (Root of a polynomial). Let f ∈ F[t] be a polynomial. Then ζ ∈ F is a root of f if f(ζ) = 0.

Proposition 8.3.10. Let f ∈ F[t] be a polynomial and let ζ ∈ F. Then ζ is a root of f if and only if (t − ζ) | f.

Definition 8.3.11 (Multiplicity of a root). Let f be a polynomial and let ζ be a root of f. The multiplicity of the root ζ is the largest k for which (t − ζ)^k | f.

Proposition 8.3.12. Let f be a polynomial of degree n. Then f has at most n roots (counting multiplicity).

Theorem 8.3.13 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f = k ≥ 1, then f can be written as

$$f = \alpha_k \prod_{i=1}^{k} (t - \zeta_i) \tag{8.4}$$

where α_k is the leading coefficient of f and the ζ_i are complex numbers. ♦

(R Contents)

8.4 The characteristic polynomial

We now establish a method for finding the eigenvalues of general n × n matrices.

Definition 8.4.1 (Characteristic polynomial). Let A be an n × n matrix. Then the characteristic polynomial of A is defined by

f_A(t) := det(tI − A). (8.5)

Exercise 8.4.2.

(a) What is the characteristic polynomial of a 1 × 1 matrix?

(b) Calculate the characteristic polynomial for the general 2 × 2 matrix A = $\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}$.

Exercise 8.4.3. Show that the characteristic polynomial of an n × n matrix is a monic polynomial of degree n.

Theorem 8.4.4. The eigenvalues of a square matrix A are precisely the roots of its characteristic polynomial. ♦

Proposition 8.4.5. Every matrix A ∈ M_3(R) has an eigenvector.

Exercise 8.4.6. Find a matrix A ∈ M_2(R) which does not have an eigenvector.

Exercise 8.4.7. Let N be an n × n nilpotent matrix. What is its characteristic polynomial?

In Section 8.1, we defined the geometric multiplicity of an eigenvalue (R Def. 8.1.23). We now define the algebraic multiplicity of an eigenvalue.

Definition 8.4.8 (Algebraic multiplicity). Let A be a square matrix with eigenvalue λ. The algebraic multiplicity of λ is its multiplicity as a root of the characteristic polynomial of A.

Proposition 8.4.9. Let A be an n × n matrix with distinct complex eigenvalues λ_1, ..., λ_ℓ, and let k_i be the algebraic multiplicity of λ_i for each i. Then

$$\sum_{i=1}^{\ell} k_i = n \tag{8.6}$$

and

$$f_A = \prod_{i=1}^{\ell} (t - \lambda_i)^{k_i}. \tag{8.7}$$

Proposition 8.4.10. Let A be as in the preceding proposition. Show that the coefficient of t^{n−ℓ} in f_A is

$$(-1)^{\ell} \sum_{\substack{i_1, \dots, i_\ell \\ \text{distinct}}} \lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_\ell}.$$
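Numerically, the characteristic polynomial and its roots are available in numpy: np.poly applied to a square matrix returns the coefficients of det(tI − A). The sketch below (my aside) illustrates Theorem 8.4.4 on the matrix of Exercise 8.1.4.

```python
import numpy as np

A = np.array([[1., 1.], [-2., 4.]])
coeffs = np.poly(A)           # [1., -5., 6.]  ->  t^2 - 5t + 6
print(coeffs)
print(np.roots(coeffs))       # the eigenvalues 2 and 3, up to rounding
```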

Exercise 8.4.6. Find a matrix A ∈ M2(R) Theorem 8.4.11. Let A be an n × n ma- which does not have an eigenvector. trix with eigenvalues λ1, . . . , λn (with their al- 64 CHAPTER 8. (F) EIGENVECTORS AND EIGENVALUES gebraic multiplicities). Then Then the of f is the matrix C(f) ∈ Mn(F) defined by n X   Tr A = λi (8.8) 0 0 0 ··· 0 −α0 i=1 1 0 0 ··· 0 −α1  n   0 1 0 ··· 0 −α2  Y C(f) :=   . (8.11) det A = λi (8.9) ......  ......  i=1   0 0 0 ··· 0 −αn−2 ♦ 0 0 0 ··· 1 −αn−1 Proposition 8.4.18. Let f be a monic poly- Proposition 8.4.12. Let A be a square ma- nomial and let A = C(f) be its companion trix and let λ be an eigenvalue of A. Then the matrix. Then the characteristic polynomial of geometric multiplicity of λ is less than or equal A is equal to f. That is, to the algebraic multiplicity of λ. f = fA = det(tI − C(f)) . (8.12) Exercise 8.4.13. Determine the eigenvalues Corollary 8.4.19. Every monic polynomial and their (algebraic and geometric) multiplici- f ∈ Q[t] is the characteristic polynomial of a ties of the all ones matrix Jn over R. rational matrix.
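Definition 8.4.1 and Theorem 8.4.11 invite a quick numerical sanity check. A minimal sketch (Python/NumPy assumed; the matrix is an arbitrary choice, not from the text):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

print(np.poly(A))            # coefficients of f_A: [1., -4., 3.], i.e. t^2 - 4t + 3

lam = np.linalg.eigvals(A)   # eigenvalues 1 and 3, the roots of f_A (Thm. 8.4.4)
print(np.isclose(np.trace(A), lam.sum()))          # (8.8)
print(np.isclose(np.linalg.det(A), lam.prod()))    # (8.9)
```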

Proposition 8.4.14. Let A be a square matrix. Then A is diagonalizable over C if and only if for every eigenvalue λ, the geometric and algebraic multiplicities of λ are equal.

Proposition 8.4.15. Let A ∈ Mₙ(F). Then the left eigenvalues of A are the same as the right eigenvalues, including their geometric and algebraic multiplicities.

Proposition 8.4.16. Suppose n ≥ k. Let A ∈ F^{k×n} and B ∈ F^{n×k}. Then f_{BA} = f_{AB} · t^{n−k}.

Definition 8.4.17 (Companion matrix). Let f ∈ Pₙ[F] be a monic polynomial, say

    f = α₀ + α₁t + ··· + α_{n−1}t^{n−1} + αₙtⁿ .    (8.10)

Then the companion matrix of f is the matrix C(f) ∈ Mₙ(F) defined by

    C(f) := \begin{pmatrix}
    0 & 0 & 0 & \cdots & 0 & -α_0 \\
    1 & 0 & 0 & \cdots & 0 & -α_1 \\
    0 & 1 & 0 & \cdots & 0 & -α_2 \\
    \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
    0 & 0 & 0 & \cdots & 0 & -α_{n-2} \\
    0 & 0 & 0 & \cdots & 1 & -α_{n-1}
    \end{pmatrix} .    (8.11)

Proposition 8.4.18. Let f be a monic polynomial and let A = C(f) be its companion matrix. Then the characteristic polynomial of A is equal to f. That is,

    f = f_A = det(tI − C(f)) .    (8.12)

Corollary 8.4.19. Every monic polynomial f ∈ Q[t] is the characteristic polynomial of a rational matrix.

(R Contents)

8.5 The Cayley-Hamilton Theorem

In Section 2.3, we defined how to substitute a square matrix into a polynomial (R Def. 2.3.3). We repeat that definition here.

Definition 8.5.1 (Substitution of a matrix into a polynomial). Let f ∈ F[t] be the polynomial (R Def. 8.3.1) defined by

    f = α₀ + α₁t + ··· + α_d t^d .

Just as we may substitute ζ ∈ F for the variable t in f to obtain a value f(ζ) ∈ F, we may also "plug in" the matrix A ∈ Mₙ(F) to obtain f(A) ∈ Mₙ(F). The only thing we have to be careful about is what we do with the scalar term α₀; we replace it with α₀ times the identity matrix, so

    f(A) := α₀I + α₁A + ··· + α_d A^d .    (8.13)

Exercise 8.5.2. Let A and B be similar matrices, and let f be a polynomial. Show that f(A) ∼ f(B).

The main result of this section is the following theorem.

Theorem 8.5.3 (Cayley-Hamilton Theorem). Let A be an n × n matrix. Then f_A(A) = 0.

Exercise 8.5.4. What is wrong with the following "proof" of the Cayley-Hamilton Theorem?

    f_A(A) = det(AI − A) = det 0 = 0 .    (8.14)

Exercise 8.5.5. Prove the Cayley-Hamilton Theorem by brute force
(a) for 1 × 1 matrices,
(b) for 2 × 2 matrices.

We will first prove the Cayley-Hamilton Theorem for diagonal matrices, and then more generally for diagonalizable matrices (R Def. 8.2.6).

Proposition 8.5.6. Let D be a diagonal matrix, and let f_D be its characteristic polynomial. Then f_D(D) = 0.

Proposition 8.5.7. Let A ∼ B, so B = S⁻¹AS. Then

    tI − B = S⁻¹ (tI − A) S .    (8.15)

Proposition 8.5.8. If A ∼ B, then the characteristic polynomials of A and B are equal, i. e., f_A = f_B.

Proposition 8.5.9. Let A be an n × n matrix with characteristic polynomial f_A = ∏_{i=1}^{n} (t − λᵢ). Then A is diagonalizable if and only if

    A ∼ \begin{pmatrix}
    λ_1 & 0 & \cdots & 0 \\
    0 & λ_2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & λ_n
    \end{pmatrix} .    (8.16)

Lemma 8.5.10. Let g ∈ F[t], and let A, S ∈ Mₙ(F) with S nonsingular. Then g(S⁻¹AS) = S⁻¹ g(A) S. ♦

Corollary 8.5.11. Let g ∈ F[t] and let A, B ∈ Mₙ(F) with A ∼ B. Then g(A) ∼ g(B).

Proposition 8.5.12. The Cayley-Hamilton Theorem holds for diagonalizable matrices.

Proposition 8.5.13. Show that the diagonalizable matrices are dense in Mₙ(C), i. e., every n × n complex matrix is the (elementwise) limit of a sequence of diagonalizable matrices.

Exercise 8.5.14. Combine the preceding two exercises to complete the proof of the Cayley–Hamilton theorem over C.
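Proposition 8.4.18 and the Cayley-Hamilton Theorem can both be spot-checked numerically. In the sketch below (Python/NumPy assumed; the polynomial and the matrix are arbitrary choices), Horner's rule evaluates f_A at the matrix A, with the constant term becoming α₀I as in (8.13).

```python
import numpy as np

def companion(alphas):
    """C(f) as in (8.11), for monic f = alphas[0] + alphas[1] t + ... + t^n."""
    n = len(alphas)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)         # the subdiagonal of 1s
    C[:, -1] = -np.asarray(alphas)     # last column: -alpha_0, ..., -alpha_{n-1}
    return C

# f = t^3 - 2t - 5, i.e. alpha_0 = -5, alpha_1 = -2, alpha_2 = 0
C = companion([-5., -2., 0.])
print(np.poly(C))                      # [1., 0., -2., -5.]: f_C = f (Prop. 8.4.18)

# Cayley-Hamilton: evaluate f_A at A by Horner's rule
A = np.array([[2., 1., 0.], [0., 3., 1.], [1., 0., 1.]])
fA = np.zeros_like(A)
for c in np.poly(A):
    fA = fA @ A + c * np.eye(3)
print(np.allclose(fA, 0))              # True: f_A(A) = 0 (Thm. 8.5.3)
```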

Proposition 8.5.15 (Identity principle). Let f(x₁, . . . , x_N) be a multivariate polynomial with integer coefficients. If f is identically zero over Z, i. e., for all α₁, . . . , α_N ∈ Z we have f(α₁, . . . , α_N) = 0, then f is formally zero (all coefficients of f are zero). In particular, f is the zero polynomial over any field.

Exercise 8.5.16. Use the Identity Principle to complete the proof of the Cayley–Hamilton Theorem over any field.

(R Contents)

8.6 Additional exercises

Exercise 8.6.1. Let N be an n × n nilpotent matrix over any field. Show that Nⁿ = 0.

The following two exercises are easier for diagonalizable matrices.

Exercise 8.6.2. Let f(t) = Σ_{i=0}^{∞} αᵢtⁱ be a function which is convergent for all t such that |t| < r for some r ∈ R⁺ ∪ {∞}. Define, for A ∈ Mₙ(R),

    f(A) = Σ_{i=0}^{∞} αᵢAⁱ .    (8.17)

Prove that f(A) converges if |λᵢ| < r for all eigenvalues λᵢ of A. In particular, e^A always converges.

Exercise 8.6.3.
(a) Find square matrices A and B such that e^{A+B} ≠ e^A e^B.
(b) Give a natural sufficient condition under which e^{A+B} = e^A e^B.
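The series (8.17) is easy to experiment with by truncation. Below is a minimal sketch (Python/NumPy assumed; truncating at 50 terms is ample for these small matrices). Fair warning: the chosen pair A, B gives away one possible answer to part (a) of Exercise 8.6.3.

```python
import numpy as np

def exp_of(A, terms=50):
    # partial sums of (8.17) with f = exp, i.e. sum of A^i / i!
    total, term = np.eye(len(A)), np.eye(len(A))
    for i in range(1, terms):
        term = term @ A / i
        total = total + term
    return total

A = np.array([[0., 1.], [0., 0.]])     # a non-commuting pair
B = np.array([[0., 0.], [1., 0.]])
print(np.allclose(exp_of(A + B), exp_of(A) @ exp_of(B)))   # False: they differ
```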

Chapter 9

(R) Orthogonal Matrices

9.1 Orthogonal matrices

In Section 1.4, we defined the standard dot product (R Def. 1.4.1) and the notion of orthogonal vectors (R Def. 1.4.8). In this chapter we study orthogonal matrices, matrices whose columns form an orthonormal basis (R Def. 1.5.6) of Rⁿ.

Definition 9.1.1 (Orthogonal matrix). The matrix A ∈ Mₙ(R) is orthogonal if AᵀA = I. The set of orthogonal n × n matrices is denoted by O(n).

Fact 9.1.2. A ∈ Mₙ(R) is orthogonal if and only if its columns form an orthonormal basis of Rⁿ.

Proposition 9.1.3. O(n) is a group (R Def. 14.2.1) under matrix multiplication (it is called the orthogonal group).

Exercise 9.1.4. Which diagonal matrices are orthogonal?

Theorem 9.1.5 (Third Miracle of Linear Algebra). Let A ∈ Mₙ(R). Then the columns of A are orthonormal if and only if the rows of A are orthonormal. ♦

Proposition 9.1.6. Let A ∈ O(n). Then all eigenvalues of A have absolute value 1.

Exercise 9.1.7. The matrix A ∈ Mₙ(R) is orthogonal if and only if A preserves the dot product, i. e., for all v, w ∈ Rⁿ, we have (Av)ᵀ(Aw) = vᵀw.

Exercise 9.1.8. The matrix A ∈ Mₙ(R) is orthogonal if and only if A preserves the norm, i. e., for all v ∈ Rⁿ, we have ‖Av‖ = ‖v‖.
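Each characterization in this section can be tested numerically. In the sketch below (Python/NumPy assumed), we obtain an orthogonal matrix from the QR factorization of a random matrix and check Definition 9.1.1, Theorem 9.1.5, and the norm preservation of Exercise 9.1.8.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Gram-Schmidt via QR

print(np.allclose(Q.T @ Q, np.eye(4)))   # Def. 9.1.1: Q^T Q = I
print(np.allclose(Q @ Q.T, np.eye(4)))   # rows orthonormal too (Thm. 9.1.5)

v = rng.standard_normal(4)
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))   # Ex. 9.1.8
```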


9.2 Orthogonal similarity

Definition 9.2.1 (Orthogonal similarity). Let A, B ∈ Mₙ(R). We say that A is orthogonally similar to B, denoted A ∼ₒ B, if there exists an orthogonal matrix O such that A = O⁻¹BO.

Note that O⁻¹BO = OᵀBO because O is orthogonal.

Proposition 9.2.2. Let A ∼ₒ B.
(a) If A is symmetric then so is B.
(b) If A is orthogonal then so is B.

Proposition 9.2.3. Let A ∼ₒ diag(λ₁, . . . , λₙ). Then
(a) if all eigenvalues of A are real then A is symmetric;
(b) if all eigenvalues of A have unit absolute value then A is orthogonal.

Proposition 9.2.4. A ∈ Mₙ(R) has an orthonormal eigenbasis if and only if A is orthogonally similar to a diagonal matrix.

Proposition 9.2.5. Let A ∈ Mₙ(R). Then A ∈ O(n) if and only if it is orthogonally similar to a matrix which is the diagonal sum of some of the following: an identity matrix, a negative identity matrix, 2 × 2 rotation matrices (compare with Prop. 16.4.45).

Examples 9.2.6. The following are examples of the matrices described in the preceding proposition.

(a) \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}

(b) \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \sqrt{3}/2 & -1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & \sqrt{3}/2
\end{pmatrix}

9.3 Additional exercises

Definition 9.3.1 (Hadamard matrix). The matrix A = (αᵢⱼ) ∈ Mₙ(R) is an Hadamard matrix if αᵢⱼ = ±1 for all i, j, and the columns of A are orthogonal. We denote by H the set

    H := {n | an n × n Hadamard matrix exists} .    (9.1)

Example 9.3.2. The matrix \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} is an Hadamard matrix.

Exercise 9.3.3. Let A be an n × n Hadamard matrix. Create a 2n × 2n Hadamard matrix.

Proposition 9.3.4. If n ∈ H and n > 2, then 4 | n.

Proposition 9.3.5. If p is prime and p ≡ −1 (mod 4), then p + 1 ∈ H.

Proposition 9.3.6. If k, ℓ ∈ H, then kℓ ∈ H.

Proposition 9.3.7. Let A be an n × n Hadamard matrix. Then (1/√n) A is an orthogonal matrix.

Chapter 10

(R) The Spectral Theorem

10.1 Statement of the Spectral Theorem

The Spectral Theorem is one of the most significant results of linear algebra, as well as one of the most frequently applied mathematical results in pure math, applied math, and science. We will see a number of different versions of this theorem, and we will not prove it until Section 19.4. However, we have now developed the tools necessary to understand the statement of the theorem and some of its applications.

Theorem 10.1.1 (The Spectral Theorem for real symmetric matrices). Let A ∈ Mₙ(R) be a real symmetric matrix. Then A has an orthonormal eigenbasis. ♦

The Spectral Theorem can be restated in terms of orthogonal similarity (R Def. 9.2.1).

Theorem 10.1.2 (The Spectral Theorem for real symmetric matrices, restated). Let A ∈ Mₙ(R) be a real symmetric matrix. Then A is orthogonally similar to a diagonal matrix. ♦

Exercise 10.1.3. Verify that these two formulations of the Spectral Theorem are equivalent.

Corollary 10.1.4. Let A be a real symmetric matrix. Then A is diagonalizable.

Corollary 10.1.5. Let A be a real symmetric matrix. Then all of the eigenvalues of A are real.

10.2 Applications of the Spectral Theorem

Although we have not yet proved the Spectral Theorem, we can already begin to study some of its many applications.

Proposition 10.2.1. If two symmetric matrices are similar then they are orthogonally similar.

Exercise 10.2.2. Let A be a symmetric real n × n matrix, and let v ∈ Rⁿ. Let b = (b₁, . . . , bₙ) be an orthonormal eigenbasis of A. Express vᵀAv in terms of the eigenvalues and the coordinates of v with respect to b.

Definition 10.2.3 (Positive definite matrix). An n × n real matrix A ∈ Mₙ(R) is positive definite if for all x ∈ Rⁿ (x ≠ 0), we have xᵀAx > 0.


Proposition 10.2.4. Let A ∈ Mₙ(R) be a real symmetric n × n matrix. Then A is positive definite if and only if all eigenvalues of A are positive.

Another consequence of the Spectral Theorem is Rayleigh's Principle.

Definition 10.2.5 (Rayleigh quotient). Let A ∈ Mₙ(R). The Rayleigh quotient of A is the function R_A : Rⁿ \ {0} → R defined by

    R_A(v) = vᵀAv / ‖v‖² .    (10.1)

Recall that ‖v‖² = vᵀv (R Def. 1.5.2).

Proposition 10.2.6 (Rayleigh's Principle). Let A be an n × n real symmetric matrix with eigenvalues λ₁ ≥ ··· ≥ λₙ. Then

(a) max_{v ∈ Rⁿ\{0}} R_A(v) = λ₁

(b) min_{v ∈ Rⁿ\{0}} R_A(v) = λₙ

Theorem 10.2.7 (Courant-Fischer). Let A be a symmetric real matrix with eigenvalues λ₁ ≥ ··· ≥ λₙ. Then

    λᵢ = max_{U ≤ Rⁿ, dim U = i} min_{v ∈ U\{0}} R_A(v) .    (10.2)

♦

Theorem 10.2.8 (Interlacing). Let A ∈ Mₙ(R) be an n × n symmetric real matrix. Let B be the (n − 1) × (n − 1) matrix obtained by deleting the i-th column and the i-th row of A (so B is also symmetric). Prove that the eigenvalues of A and B interlace, i. e., if the eigenvalues of A are λ₁ ≥ ··· ≥ λₙ and the eigenvalues of B are μ₁ ≥ ··· ≥ μ_{n−1}, then

    λ₁ ≥ μ₁ ≥ λ₂ ≥ μ₂ ≥ ··· ≥ λ_{n−1} ≥ μ_{n−1} ≥ λₙ .

♦
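Rayleigh's Principle is pleasant to observe experimentally. The sketch below (Python/NumPy assumed; the random symmetric matrix is an arbitrary choice) samples the Rayleigh quotient (10.1) and checks that every sample falls between λₙ and λ₁.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                        # an arbitrary symmetric matrix

def R(v):
    return (v @ A @ v) / (v @ v)         # the Rayleigh quotient (10.1)

lam = np.sort(np.linalg.eigvalsh(A))     # eigenvalues in increasing order
samples = [R(rng.standard_normal(5)) for _ in range(1000)]
print(lam[0] <= min(samples) and max(samples) <= lam[-1])   # True (Prop. 10.2.6)
```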

Chapter 11

(F, R) Bilinear and Quadratic Forms

11.1 (F) Linear and bilinear forms

Definition 11.1.1 (Linear form). A linear form is a function f : Fⁿ → F with the following properties.
(a) f(x + y) = f(x) + f(y) for all x, y ∈ Fⁿ;
(b) f(λx) = λf(x) for all x ∈ Fⁿ and λ ∈ F.

Exercise 11.1.2. Let f be a linear form. Show that f(0) = 0.

Definition 11.1.3 (Dual space). The set of linear forms f : Fⁿ → F is called the dual space of Fⁿ and is denoted (Fⁿ)*.

Example 11.1.4. The function f(x) = x₁ + ··· + xₙ (where x = (x₁, . . . , xₙ)ᵀ) is a linear form. More generally, for any a = (α₁, . . . , αₙ)ᵀ ∈ Fⁿ, the function

    f(x) = aᵀx = Σ_{i=1}^{n} αᵢxᵢ    (11.1)

is a linear form.

Theorem 11.1.5 (Representation Theorem for Linear Forms). Every linear form f : Fⁿ → F has the form (11.1) for some column vector a ∈ Fⁿ. ♦

Definition 11.1.6 (Bilinear form). A bilinear form is a function f : Fⁿ × Fⁿ → F with the following properties.
(a) f(x₁ + x₂, y) = f(x₁, y) + f(x₂, y)
(b) f(λx, y) = λf(x, y)
(c) f(x, y₁ + y₂) = f(x, y₁) + f(x, y₂)
(d) f(x, λy) = λf(x, y)

Exercise 11.1.7. Let f : Fⁿ × Fⁿ → F be a bilinear form. Show that for all x, y ∈ Fⁿ, we have

    f(x, 0) = f(0, y) = 0 .    (11.2)

The next result explains the term "bilinear."

Proposition 11.1.8. The function f : Fⁿ × Fⁿ → F is a bilinear form exactly if
(a) for all a ∈ Fⁿ the function f_a^{(1)} : Fⁿ → F defined by f_a^{(1)}(x) = f(a, x) is a linear form and
(b) for all b ∈ Fⁿ the function f_b^{(2)} : Fⁿ → F defined by f_b^{(2)}(x) = f(x, b) is a linear form.


Examples 11.1.9. The standard dot product

    f(x, y) = xᵀy = Σ_{i=1}^{n} xᵢyᵢ    (11.3)

is a bilinear form. More generally, for any matrix A ∈ Mₙ(F), the expression

    f(x, y) = xᵀAy = Σ_{i=1}^{n} Σ_{j=1}^{n} αᵢⱼxᵢyⱼ    (11.4)

is a bilinear form.

Theorem 11.1.10 (Representation Theorem for bilinear forms). Every bilinear form f has the form (11.4) for some matrix A. ♦

Definition 11.1.11 (Nonsingular bilinear form). We say that the bilinear form f(x, y) = xᵀAy is nonsingular if the matrix A is nonsingular.

Exercise 11.1.12. Is the standard dot product nonsingular?

Exercise 11.1.13.
(a) Assume F does not have characteristic 2, i. e., 1 + 1 ≠ 0 in F (R Section 14.3 for more about the characteristic of a field). Prove: if n is odd and the bilinear form f : Fⁿ × Fⁿ → F satisfies f(x, x) = 0 for all x ∈ Fⁿ, then f is singular.
(b)* Prove this without the assumption that 1 + 1 ≠ 0.
(c) Over every field F, find a nonsingular bilinear form f over F² such that f(x, x) = 0 for all x ∈ F².
(d) Extend this to all even dimensions.
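A bilinear form of the shape (11.4) can be checked mechanically. A minimal sketch (Python/NumPy assumed; the matrix and vectors are arbitrary choices) verifying properties (b) and (c) of Definition 11.1.6:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])                 # an arbitrary matrix

def f(x, y):
    return x @ A @ y                     # f(x, y) = x^T A y, as in (11.4)

x = np.array([1., 2.])
y1, y2 = np.array([3., 0.]), np.array([-1., 5.])
lam = 7.0
print(np.isclose(f(x, y1 + y2), f(x, y1) + f(x, y2)))  # Def. 11.1.6 (c)
print(np.isclose(f(lam * x, y1), lam * f(x, y1)))      # Def. 11.1.6 (b)
```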

11.2 (F) Multivariate polynomials

Definition 11.2.1 (Multivariate monomial). A monomial in the variables x₁, . . . , xₙ is an expression of the form

    f = α ∏_{i=1}^{n} xᵢ^{kᵢ}

for some nonzero scalar α and exponents kᵢ. The degree of this monomial is

    deg f := Σ_{i=1}^{n} kᵢ .    (11.5)

Examples 11.2.2. The following expressions are multivariate monomials of degree 4 in the variables x₁, . . . , x₆: x₁x₂²x₃, 5x₄x₅³, −3x₆⁴.

Definition 11.2.3 (Multivariate polynomial). A multivariate polynomial is an expression which is the sum of multivariate monomials.

Observe that the preceding definition includes the possibility of the empty sum, corresponding to the 0 polynomial.

Definition 11.2.4 (Monic monomial). We call the monomials of the form ∏_{i=1}^{n} xᵢ^{kᵢ} monic. We define the monic part of the monomial f = α ∏_{i=1}^{n} xᵢ^{kᵢ} to be the monomial ∏_{i=1}^{n} xᵢ^{kᵢ}.

Definition 11.2.5 (Standard form of a multivariate polynomial). A multivariate polynomial is in standard form if it is expressed as a (possibly empty) sum of monomials with distinct monic parts. The empty sum of monomials is the zero polynomial, denoted by 0.

Examples 11.2.6. The following are multivariate polynomials in the variables x₁, . . . , x₇ (in standard form).
(a) 3x₁x₃x₅ + 2x₂x₆ − x₄³
(b) 4x₁ + 2x₅³x₇ + 3x₁x₅x₇²
(c) 0

Definition 11.2.7 (Degree of a multivariate polynomial). The degree of a multivariate polynomial f is the highest degree of a monomial in the standard form expression for f. The degree of the 0 polynomial¹ is defined to be −∞.

¹This is in accordance with the natural convention that the maximum of the empty list is −∞.

Exercise 11.2.8. Let f and g be multivariate polynomials. Then
(a) deg(f + g) ≤ max{deg f, deg g} ;
(b) deg(fg) = deg f + deg g .
Note that by our convention, these rules remain valid if f or g is the zero polynomial.

Definition 11.2.9 (Homogeneous multivariate polynomial). The multivariate polynomial f is a homogeneous polynomial of degree k if every monomial in the standard form expression of f has degree k.

Fact 11.2.10. The 0 polynomial is a homogeneous polynomial of degree k for all k.

Fact 11.2.11.
(a) If f and g are homogeneous polynomials of degree k, then f + g is a homogeneous polynomial of degree k.
(b) If f is a homogeneous polynomial of degree k and g is a homogeneous polynomial of degree ℓ, then fg is a homogeneous polynomial of degree k + ℓ.

Exercise 11.2.12. For all n and k, count the monic monomials of degree k in the variables x₁, . . . , xₙ. Note that this is the dimension (R Def. 15.3.6) of the space of homogeneous polynomials of degree k.

Exercise 11.2.13. What are the homogeneous polynomials of degree 0?

Fact 11.2.14. Linear forms are exactly the homogeneous polynomials of degree 1.

In the next section we explore quadratic forms, which are the homogeneous polynomials of degree 2 (R (11.6)).

11.3 (R) Quadratic forms

In this section we restrict our attention to the field of real numbers.

Definition 11.3.1 (Quadratic form). A quadratic form is a function Q : Rⁿ → R where Q(x) = f(x, x) for some bilinear form f.

Definition 11.3.2. Let A ∈ Mₙ(R) be an n × n matrix. The quadratic form associated with A, denoted Q_A, is defined by

    Q_A(x) = xᵀAx = Σ_{i=1}^{n} Σ_{j=1}^{n} αᵢⱼxᵢxⱼ .    (11.6)

Note that for all matrices A ∈ Mₙ(R), we have Q_A(0) = 0. Note further that for a symmetric matrix B, the quadratic form Q_B is the numerator of the Rayleigh quotient R_B (R Def. 10.2.5).

Proposition 11.3.3. For all A ∈ Mₙ(R), there is a unique B ∈ Mₙ(R) such that B is a symmetric matrix and Q_A = Q_B, i. e., for all x ∈ Rⁿ, we have

    xᵀAx = xᵀBx .

Exercise 11.3.4. Find the symmetric matrix B such that

    Q_B(x) = 3x₁² − 7x₁x₂ + 2x₂²

where x = (x₁, x₂)ᵀ.

Definition 11.3.5. Let Q be a quadratic form in n variables.
(a) Q is positive definite if Q(x) > 0 for all x ≠ 0.
(b) Q is positive semidefinite if Q(x) ≥ 0 for all x.
(c) Q is negative definite if Q(x) < 0 for all x ≠ 0.
(d) Q is negative semidefinite if Q(x) ≤ 0 for all x.
(e) Q is indefinite if it is neither positive semidefinite nor negative semidefinite, i. e., there exist x, y ∈ Rⁿ such that Q(x) > 0 and Q(y) < 0.

Definition 11.3.6. We say that a matrix A ∈ Mₙ(R) is positive definite if it is symmetric and its associated quadratic form Q_A is positive definite. Positive semidefinite, negative definite, negative semidefinite, and indefinite symmetric matrices are defined analogously.

Notice that we shall not call a non-symmetric matrix A positive definite, etc., even if the quadratic form Q_A is positive definite, etc.

Exercise 11.3.7. Categorize diagonal matrices according to their definiteness, i. e., tell which diagonal matrices are positive definite, etc.

Exercise 11.3.8. Let A ∈ Mₙ(R). Assume Q_A is positive definite and λ is a real eigenvalue of A. Prove λ > 0. Note that A is not necessarily symmetric.
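Proposition 11.3.3 is constructive: symmetrization does the job. The sketch below (Python/NumPy assumed) applies it to the quadratic form of Exercise 11.3.4, so be warned that it also gives that exercise's answer away.

```python
import numpy as np

A = np.array([[3., -7.],
              [0., 2.]])                 # Q_A(x) = 3x1^2 - 7x1x2 + 2x2^2

B = (A + A.T) / 2                        # the symmetric B with Q_B = Q_A
print(B)                                 # [[ 3.  -3.5], [-3.5  2. ]]

rng = np.random.default_rng(3)
x = rng.standard_normal(2)
print(np.isclose(x @ A @ x, x @ B @ x))  # True: the two forms agree
```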

Corollary 11.3.9. If A is a positive definite (and therefore symmetric by definition) matrix and λ is an eigenvalue of A then λ > 0.

Proposition 11.3.10. Let A ∈ Mₙ(R) be a symmetric real matrix with eigenvalues λ₁ ≥ ··· ≥ λₙ.
(a) Q_A is positive definite if and only if λᵢ > 0 for all i.
(b) Q_A is positive semidefinite if and only if λᵢ ≥ 0 for all i.
(c) Q_A is negative definite if and only if λᵢ < 0 for all i.
(d) Q_A is negative semidefinite if and only if λᵢ ≤ 0 for all i.
(e) Q_A is indefinite if and only if there exist i and j such that λᵢ > 0 and λⱼ < 0.

Proposition 11.3.11. If A ∈ Mₙ(R) is positive definite, then its determinant is positive.

Exercise 11.3.12. Show that if A and B are symmetric n × n matrices and A ∼ B and A is positive definite, then so is B.

Definition 11.3.13 (Corner matrix). Let A = (αᵢⱼ) ∈ Mₙ(R) and define, for k = 1, . . . , n, the corner matrix Aₖ := (αᵢⱼ)ᵢ,ⱼ₌₁,...,ₖ to be the k × k submatrix of A obtained by taking the intersection of the first k rows of A with the first k columns of A. In particular, Aₙ = A. The k-th corner determinant of A is det Aₖ.

Exercise 11.3.14. Let A ∈ Mₙ(R) be a symmetric matrix.
(a) If A is positive definite then all of its corner matrices are positive definite.
(b) If A is positive definite then all of its corner determinants are positive.

The next theorem says that (b) is actually a necessary and sufficient condition for positive definiteness.

Theorem 11.3.15. Let A ∈ Mₙ(R) be a symmetric matrix. A is positive definite if and only if all of its corner determinants are positive. ♦

Exercise 11.3.16. Show that the following statement is false for every n ≥ 2: The n × n symmetric matrix A is positive semidefinite if and only if all corner determinants are nonnegative.

Exercise 11.3.17. Show that the statement of Ex. 11.3.16 remains false for n ≥ 3 if in addition we require all diagonal elements to be positive.

∗ ∗ ∗

In the next sequence of exercises we study the definiteness of quadratic forms Q_A where A is not necessarily symmetric.

Exercise 11.3.18. Let A ∈ Mₙ(R) be an n × n matrix such that Q_A is positive definite. Show that A need not have all of its corner determinants positive. Give a 2 × 2 counterexample. Contrast this with part (b) of Ex. 11.3.14.
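Theorem 11.3.15 can be checked against Proposition 11.3.10 numerically. A minimal sketch (Python/NumPy assumed; the tridiagonal matrix is an arbitrary choice):

```python
import numpy as np

A = np.array([[2., -1., 0.],
              [-1., 2., -1.],
              [0., -1., 2.]])            # an arbitrary symmetric matrix

corners = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]   # Def. 11.3.13
print(corners)                            # [2.0, 3.0, 4.0]: all positive
print(np.all(np.linalg.eigvalsh(A) > 0))  # True, agreeing with Prop. 11.3.10(a)
```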

Proposition 11.3.19. Let A ∈ Mₙ(R). If Q_A is positive definite, then so is Q_{Aᵀ}; analogous results hold for positive semidefinite, negative definite, negative semidefinite, and indefinite quadratic forms.

Exercise 11.3.20. Let A, B, C ∈ Mₙ(R). Assume B = CᵀAC. Prove: if Q_A is positive semidefinite then so is Q_B.

Exercise 11.3.21. Let A, B ∈ Mₙ(R). Show that if A ∼ₒ B (A and B are orthogonally similar, R Def. 9.2.1) and Q_A is positive definite then Q_B is positive definite.

Exercise 11.3.22. Let A = \begin{pmatrix} α & β \\ γ & δ \end{pmatrix}. Prove that Q_A is positive definite if and only if α, δ > 0 and (β + γ)² < 4αδ.

Exercise 11.3.23. Let A, B ∈ Mₙ(R) be matrices that are not necessarily symmetric.
(a) Prove that if A ∼ B and Q_A is positive definite, then Q_B cannot be negative definite.
(b) Find 2 × 2 matrices A and B such that A ∼ B and Q_A is positive definite but Q_B is indefinite. (Contrast this with Ex. 11.3.12.)

11.4 (F) Geometric algebra (optional)

Let us fix a bilinear form f : Fⁿ × Fⁿ → F.

Definition 11.4.1 (Orthogonality). Let x, y ∈ Fⁿ. We say that x and y are orthogonal with respect to f (notation: x ⊥ y) if f(x, y) = 0.

Definition 11.4.2. Let S, T ⊆ Fⁿ. For v ∈ Fⁿ, we say that v is orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is orthogonal to T (notation: S ⊥ T) if s ⊥ t for all s ∈ S and t ∈ T.

Definition 11.4.3. Let S ⊆ Fⁿ. Then S^⊥ ("S perp") is the set of vectors orthogonal to S, i. e.,

    S^⊥ := {v ∈ Fⁿ | v ⊥ S} .    (11.7)

Proposition 11.4.4. For all subsets S ⊆ Fⁿ, we have S^⊥ ≤ Fⁿ.

Proposition 11.4.5. Let S ⊆ Fⁿ. Then S ⊆ S^{⊥⊥}.

Exercise 11.4.6. Prove: (Fⁿ)^⊥ = {0} if and only if f is nonsingular.

Exercise 11.4.7. Verify {0}^⊥ = Fⁿ.

Exercise 11.4.8. What is ∅^⊥ ?

∗ ∗ ∗

For the rest of this section we assume that f is nonsingular.

Theorem 11.4.9 (Dimensional complementarity). Let U ≤ Fⁿ. Then

    dim U + dim U^⊥ = n .    (11.8)

♦

Corollary 11.4.10. Let S ⊆ Fⁿ. Then

    S^{⊥⊥} = span(S) .    (11.9)

In particular, if U ≤ Fⁿ then

    U^{⊥⊥} = U .    (11.10)

Definition 11.4.11 (Isotropic vector). The vector v ∈ Fⁿ is isotropic if v ≠ 0 and v ⊥ v.

Exercise 11.4.12. Let A = I, i. e., f(x, y) = xᵀy (the standard dot product).
(a) Prove: over R, there are no isotropic vectors.
(b) Find isotropic vectors in C², F₅², and F₂².
(c) For what primes p is there an isotropic vector in F_p²?

Definition 11.4.13 (Totally isotropic subspace). The subspace U ≤ Fⁿ is totally isotropic if U ⊥ U, i. e., U ≤ U^⊥.

Exercise 11.4.14. If S ⊆ Fⁿ and S ⊥ S then span(S) is a totally isotropic subspace.

Corollary 11.4.15 (to Theorem 11.4.9). If U ≤ Fⁿ is a totally isotropic subspace then dim U ≤ ⌊n/2⌋.

Exercise 11.4.16. For even n, find an (n/2)-dimensional totally isotropic subspace in Cⁿ, F₅ⁿ, and F₂ⁿ.

Exercise 11.4.17. Let F be a field and let k ≥ 2. Consider the following statement.

Stm(F, k): If U ≤ Fⁿ is a k-dimensional subspace then U contains an isotropic vector.

Prove:
(a) if k ≤ ℓ and Stm(F, k) is true, then Stm(F, ℓ) is also true;
(b) Stm(F, 2) is true
  (b1) for F = F₂ and
  (b2) for F = C;
(c) Stm(F, 2) holds for all finite fields of characteristic 2;
(d) Stm(F, 2) is false for all finite fields of odd characteristic;
(e)* Stm(F, 3) holds for all finite fields.

Exercise 11.4.18. Prove: if Stm(F, 2) is true then every maximal totally isotropic subspace of Fⁿ has dimension ⌊n/2⌋. In particular, this conclusion holds for F = F₂ and for F = C.

Exercise 11.4.19. Prove: for all finite fields F, every maximal totally isotropic subspace of Fⁿ has dimension ≥ n/2 − 1.

Chapter 12

(C) Complex Matrices

12.1 Complex numbers

Before beginning our discussion of matrices with entries taken from the field C, we provide a refresher of complex numbers and their properties.

Definition 12.1.1 (Complex number). A complex number is a number z ∈ C of the form z = a + bi, where a, b ∈ R and i = √−1.

Notation 12.1.2. Let z be a complex number. Then Re z denotes the real part of z and Im z denotes the imaginary part of z. In particular, if z = a + bi, then Re z = a and Im z = b.

Definition 12.1.3 (Complex conjugate). Let z = a + bi ∈ C. Then the complex conjugate of z, denoted \overline{z}, is

    \overline{z} = a − bi .    (12.1)

Exercise 12.1.4. Let z₁, z₂ ∈ C. Show that

    \overline{z_1 · z_2} = \overline{z_1} · \overline{z_2} .    (12.2)

Fact 12.1.5. Let z = a + bi. Then z\overline{z} = a² + b². In particular, z\overline{z} ∈ R and z\overline{z} ≥ 0.

Definition 12.1.6 (Magnitude of a complex number). Let z ∈ C. Then the magnitude, norm, or absolute value of z is |z| = √(z\overline{z}). If |z| = 1, then z is said to have unit norm.

Proposition 12.1.7. Let z ∈ C have unit norm. Then z can be expressed in the form

    z = cos θ + i sin θ    (12.3)

for some θ ∈ [0, 2π).

Proposition 12.1.8. The complex number z has unit norm if and only if \overline{z} = z⁻¹.

Until now, we have dealt with matrices over the reals or over a general field F. We now turn our attention specifically to matrices with complex entries.


12.2 Hermitian dot product in Cⁿ

Definition 12.2.1 (Conjugate-transpose). Let A = (αᵢⱼ) ∈ C^{k×ℓ}. The conjugate-transpose of A is the ℓ × k matrix A* whose (i, j) entry is \overline{α_{ji}}. The conjugate-transpose is also called the (Hermitian) adjoint.

Example 12.2.2. The conjugate-transpose of the matrix

    \begin{pmatrix} 2 & 3 - 4i & -2i \\ 1 + i & -5 & 6i \end{pmatrix}

is the matrix

    \begin{pmatrix} 2 & 1 - i \\ 3 + 4i & -5 \\ 2i & -6i \end{pmatrix} .

Fact 12.2.3. Let A, B ∈ C^{k×n}. Then

    (A + B)* = A* + B* .    (12.4)

Exercise 12.2.4. Let A ∈ C^{k×n} and let B ∈ C^{n×ℓ}. Show that (AB)* = B*A*.

Fact 12.2.5. Let λ ∈ C and A ∈ C^{k×n}. Then (λA)* = \overline{λ}A*.

In Section 1.4, we defined the standard dot product (R Def. 1.4.1). We now define the standard Hermitian dot product for vectors in Cⁿ.

Definition 12.2.6 (Standard Hermitian dot product). Let v, w ∈ Cⁿ. Then the Hermitian dot product of v with w is

    v · w := v*w = Σ_{i=1}^{n} \overline{α_i} βᵢ    (12.5)

where v = (α₁, . . . , αₙ)ᵀ and w = (β₁, . . . , βₙ)ᵀ.

In particular, observe that v*v is real and positive for all v ≠ 0. The following pair of exercises shows some of the things that would go wrong if we did not conjugate.

Exercise 12.2.7. Find a nonzero vector v ∈ Cⁿ such that vᵀv = 0.

Exercise 12.2.8. Let A ∈ C^{k×n}.
(a) Show that rk(A*A) = rk A.
(b) Find A ∈ M₂(C) such that rk(AᵀA) < rk A.

The standard dot product in Rⁿ (R Def. 1.4.1) carries with it the notions of norm and orthogonality; likewise, the standard Hermitian dot product carries these notions with it.

Definition 12.2.9 (Norm). Let v ∈ Cⁿ. The norm of v, denoted ‖v‖, is

    ‖v‖ := √(v · v) .    (12.6)

This norm is also referred to as the (complex) Euclidean norm or the ℓ₂ norm.

Fact 12.2.10. If v = (α₁, . . . , αₙ)ᵀ, then

    ‖v‖ = √( Σ_{i=1}^{n} |αᵢ|² ) .    (12.7)

Definition 12.2.11 (Orthogonality). The vectors v, w ∈ Cⁿ are orthogonal (notation: v ⊥ w) if v · w = 0.

Definition 12.2.12 (Orthogonal system). An orthogonal system in Cⁿ is a list of (pairwise) orthogonal nonzero vectors in Cⁿ.

Exercise 12.2.13. Let S ⊆ Cⁿ be an orthogonal system in Cⁿ. Prove that S is linearly independent.

Definition 12.2.14 (Orthonormal system). An orthonormal system in Cⁿ is a list of (pairwise) orthogonal vectors in Cⁿ, all of which have unit norm. So (v₁, v₂, . . . ) is an orthonormal system if vᵢ · vⱼ = δᵢⱼ for all i, j.

Definition 12.2.15 (Orthonormal basis). An orthonormal basis of Cⁿ is an orthonormal system that is a basis of Cⁿ.

12.3 Hermitian and unitary matrices

Definition 12.3.1 (Self-adjoint matrix). The matrix A ∈ Mₙ(C) is said to be self-adjoint or Hermitian if A* = A.

Example 12.3.2. The matrix \begin{pmatrix} 8 & 2 + 5i \\ 2 - 5i & -3 \end{pmatrix} is self-adjoint.

Exercise 12.3.3. Which diagonal matrices are Hermitian?

Theorem 12.3.4. Let A be a Hermitian matrix. Then all eigenvalues of A are real. ♦

Exercise 12.3.5 (Alternative proof of the real Spectral Theorem). The key part of the proof of the Spectral Theorem (R Theorem 19.4.4) given in Section 19.4 is the following lemma (R Lemma 19.4.7): Let A be a symmetric real matrix. Then A has an eigenvector. Derive this lemma from Theorem 12.3.4.

Recall that a square matrix A is orthogonal (R Def. 9.1.1) if AᵀA = I. Unitary matrices are the complex generalization of orthogonal matrices.

Definition 12.3.6 (Unitary matrix). The matrix A ∈ Mₙ(C) is unitary if A*A = I. The set of unitary n × n matrices is denoted by U(n).

Fact 12.3.7. A ∈ Mₙ(C) is unitary if and only if its columns form an orthonormal basis of Cⁿ.

Proposition 12.3.8. U(n) is a group under matrix multiplication (it is called the unitary group).

Exercise 12.3.9. Which diagonal matrices are unitary?

Theorem 12.3.10 (Third Miracle of Linear Algebra). Let A ∈ Mₙ(C). Then the columns of A are orthonormal if and only if the rows of A are orthonormal. ♦

Proposition 12.3.11. Let A ∈ U(n). Then all eigenvalues of A have absolute value 1.

Exercise 12.3.12. The matrix A ∈ Mₙ(C) is unitary if and only if A preserves the Hermitian dot product, i. e., for all v, w ∈ Cⁿ, we have (Av)*(Aw) = v*w.

Exercise 12.3.13. The matrix A ∈ Mₙ(C) is unitary if and only if A preserves the norm, i. e., for all v ∈ Cⁿ, we have ‖Av‖ = ‖v‖.

Warning. The proof of this is trickier than in the real case (R Ex. 9.1.8).
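The basic Hermitian notions are easy to verify numerically. A minimal sketch (Python/NumPy assumed), using the matrix of Example 12.3.2:

```python
import numpy as np

A = np.array([[8, 2 + 5j],
              [2 - 5j, -3]])             # the matrix of Example 12.3.2

print(np.allclose(A, A.conj().T))        # A* = A: A is self-adjoint
print(np.allclose(np.linalg.eigvals(A).imag, 0))   # real eigenvalues (Thm. 12.3.4)

v = np.array([1 + 1j, 2j])
print(v.conj() @ v)                      # v* v is real and positive: (6+0j)
```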

12.4 Normal matrices and unitary similarity

Definition 12.4.1 (Normal matrix). A matrix A ∈ Mₙ(C) is normal if it commutes with its conjugate-transpose, i. e., AA* = A*A.

Exercise 12.4.2. Which diagonal matrices are normal?

Definition 12.4.3 (Unitary similarity). Let A, B ∈ Mₙ(C). We say that A is unitarily similar to B, denoted A ∼ᵤ B, if there exists a unitary matrix U such that A = U⁻¹BU.

Note that U⁻¹BU = U*BU because U is unitary.

Proposition 12.4.4. Let A ∼ᵤ B.
(a) If A is Hermitian then so is B.
(b) If A is unitary then so is B.
(c) If A is normal then so is B.

Proposition 12.4.5. Let A ∼ᵤ diag(λ₁, . . . , λₙ). Then
(a) A is normal;
(b) if all eigenvalues of A are real then A is Hermitian;
(c) if all eigenvalues of A have unit absolute value then A is unitary.

We note that all of these implications are actually "if and only if," as we shall demonstrate (R Theorem 12.4.14).

Proposition 12.4.6. A ∈ Mₙ(C) has an orthonormal eigenbasis if and only if A is unitarily similar to a diagonal matrix.

We now show (Cor. 12.4.8) that this condition is equivalent to A being normal.

Lemma 12.4.7. Let A, B ∈ Mₙ(C) with A ∼ᵤ B. Then A is normal if and only if B is normal. ♦

Corollary 12.4.8. If A is unitarily similar to a diagonal matrix, then A is normal.

Unitary similarity is a powerful tool to study matrices, owing largely to the following theorem.

Theorem 12.4.9 (Schur). Every matrix A ∈ Mₙ(C) is unitarily similar to a triangular matrix. ♦

Definition 12.4.10 (Dense subset). A subset S ⊆ F^{k×n} is dense in F^{k×n} if for every A ∈ F^{k×n} and every ε > 0, there exists B ∈ S such that every entry of the matrix A − B has absolute value less than ε.

Proposition 12.4.11. Diagonalizable matrices are dense in Mₙ(C).

Exercise 12.4.12. Complete the proof of the Cayley-Hamilton Theorem over C (R Theorem 8.5.3) using Prop. 12.4.11.

We now generalize the Spectral Theorem to normal matrices.

Theorem 12.4.13 (Complex Spectral Theorem). Let A ∈ Mₙ(C). Then A has an orthonormal eigenbasis if and only if A is normal. ♦

Before giving the proof, let us restate the theorem.

Theorem 12.4.14 (Complex Spectral Theorem, restated). Let A ∈ Mₙ(C). Then A is unitarily similar to a diagonal matrix if and only if A is normal. ♦

Exercise 12.4.15. Prove that Theorems 12.4.13 and 12.4.14 are equivalent.

Exercise 12.4.16. Infer Theorem 12.4.14 from Schur's Theorem (Theorem 12.4.9) via the following lemma.

Lemma 12.4.17. If a triangular matrix is normal, then it is diagonal. ♦

Theorem 12.4.18 (Real version of Schur's Theorem). If A ∈ Mₙ(R) and all eigenvalues of A are real, then A is orthogonally similar to an upper triangular matrix. ♦

Exercise 12.4.19 (Third proof of the real Spectral Theorem). Infer the Spectral Theorem for real symmetric matrices (R Theorem 10.1.1) from the complex Spectral Theorem and the fact, separately proved, that all eigenvalues of a symmetric matrix are real (R Theorem 12.3.4).

12.5 Additional exercises

Exercise 12.5.1. Find an n × n unitary circulant matrix (R Def. 2.5.12) with no zero entries.

Definition 12.5.2 (Discrete Fourier Transform matrix). The Discrete Fourier Transform (DFT) matrix is the n × n Vandermonde matrix (R Def. 2.5.9) F generated by 1, ω, ω², . . . , ω^{n−1}, where ω is a primitive n-th root of unity.

Exercise 12.5.3. Let F be the n × n DFT matrix. Prove that (1/√n) F is unitary.

Exercise 12.5.4. Which circulant matrices are unitary?
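Exercise 12.5.3 can be checked numerically for a small n. A minimal sketch (Python/NumPy assumed; n = 5 is an arbitrary choice):

```python
import numpy as np

n = 5
omega = np.exp(2j * np.pi / n)                  # a primitive n-th root of unity
F = omega ** np.outer(np.arange(n), np.arange(n))  # Vandermonde of 1, w, ..., w^{n-1}
U = F / np.sqrt(n)
print(np.allclose(U.conj().T @ U, np.eye(n)))   # True: (1/sqrt(n)) F is unitary
```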

Chapter 13

(C, R) Matrix Norms

13.1 (R) Operator norm

Definition 13.1.1 (Operator norm). The operator norm of a matrix A ∈ R^{k×n} is defined to be

    ‖A‖ = max_{x ∈ Rⁿ, x ≠ 0} ‖Ax‖ / ‖x‖    (13.1)

where the ‖·‖ notation on the right-hand side represents the Euclidean norm.

Proposition 13.1.2. Let A ∈ R^{k×n}. Then ‖A‖ exists.

Proposition 13.1.3. Let A ∈ R^{k×n} and λ ∈ R. Then ‖λA‖ = |λ|‖A‖.

Proposition 13.1.4 (Triangle inequality). Let A, B ∈ R^{k×n}. Then ‖A + B‖ ≤ ‖A‖ + ‖B‖.

Proposition 13.1.5 (Submultiplicativity). Let A ∈ R^{k×n} and B ∈ R^{n×ℓ}. Then ‖AB‖ ≤ ‖A‖ · ‖B‖.

Exercise 13.1.6. Let A = [a₁ | ··· | aₙ] ∈ R^{k×n}. (The aᵢ are the columns of A.) Show that ‖A‖ ≥ ‖aᵢ‖ for every i.

Exercise 13.1.7. Let A = (αᵢⱼ) ∈ R^{k×n}. Show that ‖A‖ ≥ |αᵢⱼ| for every i and j.

Proposition 13.1.8. If A is an orthogonal matrix then ‖A‖ = 1.

Proposition 13.1.9. Let A ∈ R^{k×n}. Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then

    ‖SA‖ = ‖A‖ = ‖AT‖ .    (13.2)

Proposition 13.1.10. Let A be a symmetric real matrix with eigenvalues λ₁ ≥ ··· ≥ λₙ. Then ‖A‖ = max |λᵢ|.

Exercise 13.1.11. Let A ∈ R^{k×n}. Show that AᵀA is positive semidefinite (R Def. 11.3.6).

Exercise 13.1.12. Let A ∈ R^{k×n} and let AᵀA have eigenvalues λ₁ ≥ ··· ≥ λₙ. Show that ‖A‖ = √λ₁.

Proposition 13.1.13. For all A ∈ R^{k×n}, we have ‖Aᵀ‖ = ‖A‖.

Exercise 13.1.14.
(a) Find a stochastic matrix (R Def. 22.1.2) of norm greater than 1.
(b) Find an n × n stochastic matrix of norm √n.
(c) Show that an n × n stochastic matrix cannot have norm greater than √n.


Numerical exercise 13.1.15.
(a) Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. Calculate ‖A‖.
(b) Let B = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}. Calculate ‖B‖.

13.2 (R) Frobenius norm

Definition 13.2.1 (Frobenius norm). Let A = (αᵢⱼ) ∈ R^{k×n} be a matrix. The Frobenius norm of A, denoted ‖A‖_F, is defined as

    ‖A‖_F = √( Σ_{i,j} |αᵢⱼ|² ) .    (13.3)

Proposition 13.2.2. Let A ∈ R^{k×n}. Then ‖A‖_F = √(Tr(AᵀA)).

Proposition 13.2.3. Let A ∈ R^{k×n} and λ ∈ R. Then ‖λA‖_F = |λ|‖A‖_F.

Proposition 13.2.4 (Triangle inequality). Let A, B ∈ R^{k×n}. Then ‖A + B‖_F ≤ ‖A‖_F + ‖B‖_F.

Proposition 13.2.5 (Submultiplicativity). Let A ∈ R^{k×n} and B ∈ R^{n×ℓ}. Then ‖AB‖_F ≤ ‖A‖_F · ‖B‖_F.

Proposition 13.2.6. If A ∈ O(k) then ‖A‖_F = √k.

Proposition 13.2.7. Let A ∈ R^{k×n}. Let S ∈ O(k) and T ∈ O(n) be orthogonal matrices. Then

    ‖SA‖_F = ‖A‖_F = ‖AT‖_F .    (13.4)

Proposition 13.2.8. For all A ∈ R^{k×n}, we have ‖Aᵀ‖_F = ‖A‖_F.

Proposition 13.2.9. Let A ∈ R^{k×n}. Then

    ‖A‖ ≤ ‖A‖_F ≤ √n ‖A‖ .    (13.5)

Exercise 13.2.10. Prove ‖A‖ = ‖A‖_F if and only if rk A = 1. Use the Singular Value Decomposition (R Theorem 21.1.2).

Exercise 13.2.11. Let A be a symmetric real matrix. Show that ‖A‖_F = √n ‖A‖ if and only if A = λR for some reflection matrix (R ??) R.

13.3 (C) Complex Matrices

Exercise 13.3.1. Generalize the definition of the operator norm and statements 13.1.2–13.1.12 to C.

Exercise 13.3.2. Generalize the definition of the Frobenius norm and statements 13.2.2–13.2.10 to C.

Exercise 13.3.3. Let A be a normal matrix with eigenvalues λ₁, . . . , λₙ. Show that ‖A‖_F = √n ‖A‖ if and only if |λ₁| = ··· = |λₙ|.

Question 13.3.4. Is normality necessary?
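Both norms of this chapter are available numerically, so Proposition 13.2.9 can be explored directly. Note that the sketch below (Python/NumPy assumed) prints the numerical answer to part (a) of Numerical exercise 13.1.15, so treat it as a check rather than a solution.

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.]])                 # Numerical exercise 13.1.15 (a)

op = np.linalg.norm(A, 2)                # operator norm = sqrt(lambda_1(A^T A))
fro = np.linalg.norm(A, 'fro')           # Frobenius norm (13.3)
print(op, fro)                           # about 1.618 and 1.732
print(op <= fro <= np.sqrt(2) * op)      # Prop. 13.2.9 with n = 2
```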

Part II

Linear Algebra of Vector Spaces

Introduction to Part II

TO BE WRITTEN.

(R Contents)

Chapter 14

(F) Preliminaries

(R Contents)

14.1 Modular arithmetic

Notation 14.1.1. The integers {. . . , −2, −1, 0, 1, 2, . . . } are denoted by Z. The set {1, 2, 3, . . . } of positive integers is denoted by Z⁺.

Definition 14.1.2 (Sum of sets). Let A, B ⊆ Z. Then A + B is the set

    A + B = {a + b | a ∈ A, b ∈ B} .    (14.1)

Definition 14.1.3 (Cartesian product). Let A and B be sets. Then the Cartesian product of A and B is the set A × B defined by

    A × B = {(a, b) | a ∈ A, b ∈ B} .    (14.2)

Notice that (a, b) in the above definition is an ordered pair. In particular, if a₁, a₂ ∈ A are distinct, then (a₁, a₂) and (a₂, a₁) are distinct elements of A × A.

Notation 14.1.4 (Cardinality). For a set A we denote the cardinality of A (the number of elements of A) by |A|.

For instance, |{4, 5, 6, 4}| = 3.

Exercise 14.1.5. Let A, B ⊆ Z with |A| = n and |B| = m. Show that
(a) m + n − 1 ≤ |A + B| ≤ mn
(b) |A + A| ≤ \binom{n+1}{2}
(c) |A + A + A| ≤ \binom{n+2}{3}
(d) |A × B| = mn

Definition 14.1.6 (Divisibility). Let a, b ∈ Z. Then a divides b, or a is a divisor of b (notation: a | b), if there exists some c ∈ Z such that b = ac.

Proposition 14.1.7. For all a ∈ Z, 1 | a.

Proposition 14.1.8. Show that 0 | 0.

Notice that this does not violate any rules that we know about division by 0 because our definition of divisibility does not involve division.

Exercise 14.1.9. For what a do we have a | b for all b ∈ Z?

Exercise 14.1.10. For what b do we have a | b for all a ∈ Z?

Proposition 14.1.11. Let a, b, c ∈ Z. If c | a and c | b then c | a + b.


Proposition 14.1.12 (Transitivity of divisibility). Let a, b, c ∈ Z. If a | b and b | c, then a | c.

Definition 14.1.13 (Congruence modulo m). Let a, b, m ∈ Z. Then a is congruent to b modulo m (written a ≡ b (mod m)) if m | a − b.

Exercise 14.1.14. Let a, b ∈ Z. Show
(a) a ≡ b (mod 1)
(b) a ≡ b (mod 2) if and only if a and b are either both even or both odd
(c) a ≡ b (mod 0) if and only if a = b

Exercise 14.1.15. Let a, b, c, d, m ∈ Z. Show that if a ≡ c (mod m) and b ≡ d (mod m), then
(a) a + b ≡ c + d (mod m)
(b) a − b ≡ c − d (mod m)
(c) ab ≡ cd (mod m)

Exercise 14.1.16. Let k ≥ 0. Show that if a ≡ b (mod m), then aᵏ ≡ bᵏ (mod m).

Example 14.1.17. Modular arithmetic (arithmetic modulo m) models periodic phenomena. When we teach modular arithmetic to kids (Fermat's little Theorem is a perfectly reasonable topic for math-minded eighth-graders), we may refer to the subject as "calendar arithmetic," the calendar being a source of periodicity. For example, if August 3 is a Wednesday, then August 24 is also a Wednesday, because 3 ≡ 24 (mod 7).

Definition 14.1.18 (Binary relation). A binary relation on the set A is a subset R ⊆ A × A. The relation R holds for the elements a, b ∈ A if (a, b) ∈ R. In this case, we write aRb or R(a, b).

Definition 14.1.19 (Equivalence relation). Let ∼ be a binary relation on a set A. The relation ∼ is said to be an equivalence relation if it has the following properties. For all a, b, c ∈ A,
(a) a ∼ a (reflexivity)
(b) a ∼ b if and only if b ∼ a (symmetry)
(c) If a ∼ b and b ∼ c, then a ∼ c (transitivity)

Proposition 14.1.20. For any fixed m ∈ Z, "congruence modulo m" is an equivalence relation on Z.

Definition 14.1.21 (Equivalence class). Let A be a set with an equivalence relation ∼, and let a ∈ A. The equivalence class of a with respect to ∼, denoted [a], is the set

    [a] = {b ∈ A | a ∼ b}    (14.3)

Definition 14.1.22 (Residue classes). The equivalence classes of the equivalence relation "congruence modulo m" in Z are called "modulo m residue classes".

Exercise 14.1.23.
(a) What are the modulo 2 residue classes?
(b) Prove: If m ≥ 1 then there are m residue classes modulo m, and every integer belongs to exactly one of them.

Definition 14.1.24 (Partition). Let B₁, . . . , Bₘ be non-empty subsets of the set A. We say that the set Π = {B₁, . . . , Bₘ} is a partition of A if every element of A belongs to exactly one of the Bᵢ. We call the Bᵢ the blocks of the partition.

Exercise 14.1.25 (Partition defines equivalence relation). Given a partition Π of the set A, let us define the relation ∼_Π on A by letting a ∼_Π b if and only if a and b belong to the same block of Π. Show that ∼_Π is an equivalence relation and the blocks of Π are the equivalence classes of ∼_Π.

The next theorem shows that equivalence relations define partitions.

Theorem 14.1.26 (Fundamental Theorem of Equivalence Relations). Let R be an equivalence relation on the set A. Show that there exists a unique partition Π of A such that R = ∼_Π. ♦

The proof is contained in the following exercise.

Exercise 14.1.27. Let ∼ be an equivalence relation on the set A and let a, b ∈ A. Prove: (a) a ∈ [a]; (b) if [a] ∩ [b] ≠ ∅ then [a] = [b]. Indicate how each axiom of equivalence relations is used in your proof.

Example 14.1.28. Theorem 14.1.26 might as well be called the "fundamental theorem of intelligence." It underlies all concept formation. Humanity discovered that there was something common in a set of three trees, three bears, three mountains; hence came the abstract concept of the number "three." More generally, what was discovered was the equivalence relation of equicardinality (having the same number of elements: same number of trees, goats, rocks), and the concept of "number" emerged as our ancestors gave names to the equivalence classes. The notion of "having the same color" is a (somewhat fuzzy) equivalence relation; the equivalence classes are the colors, hence the notion of "color" was born.

A vast number of mathematical concepts give names to equivalence classes of important equivalence relations. Of course in real life (as opposed to mathematics) we always encounter fuzzy boundaries; the relations we discover are "similarity relations" as opposed to exact equivalence relations, but the basic idea is the same: we want to form clusters of "similar" objects, and the clusters themselves will represent the new concept. (Think of anything from "flavor" or "texture" to "honesty" or "creditworthiness" or the concept of a "concept.") Much of Statistics and Machine Learning tries to discover clusters and thus imitate the concept-forming ability of human intelligence.

Definition 14.1.29 (Sum of residue classes). Let [a] and [b] be residue classes modulo m. Then their sum is defined by the equation

    [a] + [b] = [a + b]    (14.4)

Definition 14.1.30 (Product of residue classes). Let [a] and [b] be residue classes of Z modulo m. Then their product is defined by the equation

    [a] · [b] = [a · b]    (14.5)

These are two examples of "definitions by representatives." The question arises, are such definitions sound? Let us take two residue classes mod m, call them A and B. If a ∈ A and b ∈ B, we define A + B to be the residue class [a + b]. But what if we pick another pair of representatives, a′ ∈ A and b′ ∈ B? Do we get the same answer? Is it the case that [a + b] = [a′ + b′]? The positive answer is stated in the next exercise.

Exercise 14.1.31. Show that the sum and product of residue classes are well defined, that is, that they do not depend on our choice of representative for each residue class.

Proposition 14.1.32. Let Zₘ denote the set of residue classes modulo m with the operations of addition and multiplication as defined above. Verify the following properties of these operations.
(a) For all a, b ∈ Zₘ, there exist unique a + b ∈ Zₘ and a · b ∈ Zₘ
(b) For all a, b, c ∈ Zₘ we have (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c) (associativity)
(c) There exists an "additively neutral" element in Zₘ, which we call "zero" and denote by 0, such that for all a ∈ Zₘ we have 0 + a = a + 0 = a (zero element)
(d) There exists a "multiplicatively neutral" element in Zₘ, which we call the "(multiplicative) identity" and denote by 1, such that for all a ∈ Zₘ we have 1 · a = a · 1 = a (identity element)
(e) For each a ∈ Zₘ, there exists an element −a ∈ Zₘ such that a + (−a) = 0 (additive inverse)
(f) For all a, b ∈ Zₘ, we have a + b = b + a and a · b = b · a (commutativity)
(g) For all a, b, c ∈ Zₘ, a · (b + c) = a · b + a · c (distributivity)
Further, if p is a prime number, then
(h) For all a ∈ Z_p, if a ≠ 0 then there exists an element a⁻¹ ∈ Z_p such that a · a⁻¹ = 1 (multiplicative inverse)
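Well-definedness (Exercise 14.1.31) is easy to test experimentally. A minimal sketch in plain Python, representing the class [a] by its least nonnegative representative a mod m (this representation is our choice, not part of the text):

```python
m = 7

def cls(a):
    return a % m          # the least nonnegative representative of [a]

a, a2 = 3, 3 + 5 * m      # two representatives of the same class
b, b2 = 5, 5 - 2 * m
print(cls(a + b) == cls(a2 + b2))   # True: [a] + [b] is well defined (14.4)
print(cls(a * b) == cls(a2 * b2))   # True: [a] . [b] is well defined (14.5)
```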

(R Contents)

14.2 Abelian groups

A group is a set with a binary operation satisfying certain axioms. The set Z with the operation of addition is an example of a group. A class of further examples are the sets Zₘ of residue classes modulo m, with addition being the operation.

We now generalize this notion to include a much larger collection of algebraic structures.

Definition 14.2.1 (Group). A group is a set G along with a binary operation ◦ that satisfies the following axioms.
(a) For all a, b ∈ G, there exists a unique element a ◦ b ∈ G
(b) For all a, b, c ∈ G, (a ◦ b) ◦ c = a ◦ (b ◦ c) (associativity)
(c) There exists a neutral element e ∈ G such that for all a ∈ G, e ◦ a = a ◦ e = a (identity element in multiplicative notation, zero in additive notation)
(d) For each a ∈ G, there exists b ∈ G such that a ◦ b = b ◦ a = e (inverses)

Proposition 14.2.2. The neutral (identity or zero) element of G is unique. For every a ∈ G, the inverse of a is unique.

The inverse of an element a is often written as a⁻¹, and we often write ab instead of a ◦ b. The set G along with the binary operation ◦ is the group (G, ◦). We often omit ◦ and refer to G as the group when the binary operation is clear from context. Groups satisfying the additional axiom
(e) a ◦ b = b ◦ a for all a, b ∈ G (commutativity)
are called abelian groups.

When discussing abelian groups, we often use additive notation (that is, we write the binary operation ◦ as +), and we write the neutral element as 0 and the additive inverse of an element a as −a.

Definition 14.2.3 (Order of a group). Let G be a group. The order of G is its cardinality, |G| (the number of its elements).

Observe that Prop. 14.1.32 asserted that (Zₘ, +) is an abelian group and that (Z_p^×, ·) is an abelian group when p is prime; here Z_p^× = Z_p \ {0}. We now present more examples of groups.

Examples 14.2.4. Show that the following are abelian groups.
(a) The integers with addition, (Z, +)
(b) The real numbers with addition, (R, +)
(c) The real numbers except for 0 with multiplication, (R^×, ×)
(d) For any set Ω, the set R^Ω of real functions f : Ω → R with pointwise addition, (R^Ω, +), where pointwise addition is defined by the rule (f + g)(ω) = f(ω) + g(ω) for all ω ∈ Ω.
(e) The complex n-th roots of unity, that is, the set {ω ∈ C | ωⁿ = 1}, with multiplication.

Definition 14.2.5. In an additive abelian group, we define the sum Σ_{i=1}^{k} aᵢ by induction. For the base case of k = 1, Σ_{i=1}^{1} aᵢ = a₁, and for k > 1,

    Σ_{i=1}^{k} aᵢ = aₖ + Σ_{i=1}^{k−1} aᵢ .    (14.6)

Convention 14.2.6 (Empty sum). The empty sum is equal to 0, that is,

    Σ_{i ∈ ∅} aᵢ = 0 .    (14.7)

The notation Σ_{i=1}^{0} aᵢ refers to the empty sum, so this sum is zero.

Convention 14.2.7 (Empty product). By convention, the empty product ∏_{i ∈ ∅} aᵢ is equal to 1. In particular, 0⁰ = 1 and 0! = 1. The notation ∏_{i=1}^{0} aᵢ refers to the empty product, so this product is 1.

Exercise 14.2.8. Let G be an (additive) abelian group and let A, B ⊆ G with |A| = n and |B| = m. Show that
(a) |A + B| ≤ mn
(b) |A + A| ≤ \binom{n+1}{2}
(c) |A + A + A| ≤ \binom{n+2}{3}

Definition 14.2.9 (Subgroup). Let G be a group. If H ⊆ G is nonempty and is a group under the same operation as G, we say that H is a subgroup of G, denoted H ≤ G.

Proposition 14.2.10. Let G be a group and H ⊆ G be a subset. H is a subgroup of G if and only if
(a) The identity element of G is a member of H. (In additive notation, 0 ∈ H.)
(b) H is closed under the operation: if a, b ∈ H then ab ∈ H. (In additive notation, a + b ∈ H.)
(c) H is closed under inversion: if a ∈ H then a⁻¹ ∈ H. (In additive notation, −a ∈ H.)

Exercise 14.2.11. The "subgroup" relation is transitive: a subgroup of a subgroup is a subgroup.

Proposition 14.2.12. The intersection of any collection of subgroups of a group G is itself a subgroup of G.

Exercise 14.2.13. Let G be a group and let H, K ≤ G. Then H ∪ K ≤ G if and only if H ⊆ K or K ⊆ H.

Proposition 14.2.14. Let G be a group and let H ≤ G. Then
(a) The identity of H is the same as the identity of G.
(b) Let a ∈ H. The inverse of a in H is the same as the inverse of a in G.

Notation 14.2.15. Let G be an abelian group, written additively. Let H, K ⊆ G. We write −H for the set −H = {−h | h ∈ H}, and H − K for the set

    H − K = H + (−K) = {h − k | h ∈ H, k ∈ K}    (14.8)

Proposition 14.2.16. Let G be an abelian group, written additively. Let H ⊆ G. Then H ≤ G if and only if
(a) 0 ∈ H
(b) −H ⊆ H (closed under additive inverses)
(c) H + H ⊆ H (closed under addition)

Proposition 14.2.17. Let G be an abelian group, written additively. Let H ⊆ G. Then H ≤ G if and only if H ≠ ∅ and H − H ⊆ H (that is, H is closed under subtraction).

(R Contents)

14.3 Fields

Definition 14.3.1 (Field). A field is a set F with two operations, addition and multiplication, satisfying the following axioms.
(a) (F, +) is an abelian group.
(b) (F^×, ·) is an abelian group, where F^× = F \ {0}.
(c) Distributivity holds: a(b + c) = ab + ac.

Proposition 14.3.2. For all a ∈ F we have 0 · a = a · 0 = 0 and 1 · a = a · 1 = a.

Exercise 14.3.3. A field has at least 2 elements: 0 and 1; they are not equal. Why?

Exercise 14.3.4.
(a) Q (rational numbers), R (real numbers), and C (complex numbers) are examples of fields.
(b) Z is not a field.
(c) For p a prime number, we define F_p = Z_p, the set of residue classes modulo p with addition and multiplication as the operations. F_p is a field.
(d) Zₘ is a field if and only if m is a prime number.

The field F_p has p elements; there are finite fields other than F_p as well. The number of elements of a field is referred to as the order of the field. Fields were invented by Évariste Galois (1811–1832) (along with groups and the notion of abstract algebra). Galois showed that (a) the order of every finite field is a prime power and (b) for every prime power q there exists a unique field F_q of order q. Note that F_q ≠ Z_q unless q is a prime.

Exercise 14.3.5.
(a) Prove that the set Q[√2] = {a + b√2 | a, b ∈ Q} is a field.
(b) Prove that the set Q[∛2] = {a + b∛2 + c∛4 | a, b, c ∈ Q} is a field.

Exercise 14.3.6. Let p be a prime. Consider the set F_p[i] of formal expressions of the form a + bi (a, b ∈ F_p), where multiplication is performed by observing the rule i² = −1. Determine for what primes p the set F_p[i] is a field. (Experiment, find a pattern, prove.) This exercise will give you an infinite number of fields of order p².

(R Contents)

14.4 Polynomials

In Section 8.3, we described some basic theory of polynomials. We now develop a more formal theory of polynomials. We view a polynomial f as a formal expression whose coefficients are taken from a field F of scalars.

Definition 14.4.1 (Polynomial). A polynomial over the field F is an expression¹ of the form

    f = α₀ + α₁t + α₂t² + ··· + αₙtⁿ    (14.9)

where the coefficients αᵢ are scalars (elements of F), and t is a symbol. The set of all polynomials over F is denoted F[t]. Two expressions, (14.9) and

    g = β₀ + β₁t + β₂t² + ··· + βₘtᵐ    (14.10)

define the same polynomial if they only differ in leading zero coefficients, i. e., there is some k for which α₀ = β₀, . . . , αₖ = βₖ, and all coefficients αⱼ, βⱼ are zero for j > k. We may omit any terms with zero coefficient, e. g.,

    3 + 0t + 2t² + 0t³ = 3 + 2t²    (14.11)

¹Strictly speaking, a polynomial is an equivalence class of such expressions, as explained in the preceding sentence.

Definition 14.4.2 (Zero polynomial). The polynomial which has all coefficients equal to zero is called the zero polynomial and is denoted by 0.

Definition 14.4.3 (Leading term). The leading term of a polynomial f = α₀ + α₁t + ··· + αₙtⁿ is the term corresponding to the highest power of t with a nonzero coefficient, that is, the term αₖtᵏ where αₖ ≠ 0 and αⱼ = 0 for all j > k. The zero polynomial does not have a leading term.

Definition 14.4.4 (Leading coefficient). The leading coefficient of a polynomial f is the coefficient of the leading term of f.

For example, the leading term of the polynomial 3 + 2t² + 5t⁷ is 5t⁷ and the leading coefficient is 5.

Definition 14.4.5 (Monic polynomial). A polynomial is monic if its leading coefficient is 1.

Definition 14.4.6 (Degree of a polynomial). The degree of a polynomial f = α₀ + α₁t + ··· + αₙtⁿ, denoted deg f, is the exponent of its leading term.

For example, deg(3 + 2t² + 5t⁷) = 7. This definition does not assign a degree to the zero polynomial.

Convention 14.4.7. The zero polynomial has degree −∞.

In fact, this is the only sensible assignment of degree to the zero polynomial (see Ex. 14.4.16).

Exercise 14.4.8. Which polynomials have degree 0?

Notation 14.4.9. We denote the set of polynomials of degree at most n over F by Pₙ[F].

Definition 14.4.10 (Sum and difference of polynomials). Let f = α₀ + α₁t + ··· + αₙtⁿ and g = β₀ + β₁t + ··· + βₙtⁿ be polynomials. Then the sum of f and g is defined as

    f + g = (α₀ + β₀) + (α₁ + β₁)t + ··· + (αₙ + βₙ)tⁿ    (14.12)

and the difference f − g is defined as

    f − g = (α₀ − β₀) + (α₁ − β₁)t + ··· + (αₙ − βₙ)tⁿ    (14.13)

Note that f and g need not be of the same degree; we can add on leading zeros if necessary.

Numerical exercise 14.4.11. Let

    f = 2t + t²
    g = 3 + t + 2t² + 3t³
    h = 5 + t³ + t⁴

Compute the polynomials
(a) e₁ = f − g
(b) e₂ = g − h
(c) e₃ = h − f
Self-check: verify that e₁ + e₂ + e₃ = 0.

Proposition 14.4.12. Addition of polynomials is (a) commutative and (b) associative, that is, if f, g, h ∈ F[t] then
(a) f + g = g + f
(b) f + (g + h) = (f + g) + h

Definition 14.4.13 (Multiplication of polynomials). Let f = α₀ + α₁t + ··· + αₙtⁿ and g = β₀ + β₁t + ··· + βₘtᵐ be polynomials. Then the product of f and g is defined as

    f · g = Σ_{i=0}^{n+m} ( Σ_{j+k=i} αⱼβₖ ) tⁱ .    (14.14)

Numerical exercise 14.4.14. Let f, g, and h be as in Exercise 14.4.11. Compute
(a) e₄ = f · g
(b) e₅ = f · h
(c) e₆ = f · (g + h)
Self-check: verify that e₄ + e₅ = e₆.

Proposition 14.4.15. Let f, g ∈ F[t]. Then
(a) deg(f + g) ≤ max{deg f, deg g} ,
(b) deg(fg) = deg f + deg g .
Note that both of these statements hold even if one of the polynomials is the zero polynomial.

Exercise 14.4.16. Show that Convention 14.4.7 is the only sensible assignment of "degree" to the zero polynomial in the following sense: no other value from Z ∪ {±∞} will preserve the validity of Prop. 14.4.15.

Notation 14.4.17 (Set of functions). Let A and B be sets. The set of functions f : B → A is denoted by A^B. Here B is the domain of the functions f in question and A is the target set or codomain (into which f maps B).

The following exercise explains the reason for this notation.

Exercise 14.4.18. Let A and B be finite sets. Show that |A^B| = |A|^{|B|}.

Definition 14.4.19 (Substitution). For ζ ∈ F and f = α₀ + α₁t + ··· + αₙtⁿ ∈ F[t], we set

    f(ζ) = α₀ + α₁ζ + ··· + αₙζⁿ ∈ F    (14.15)

The substitution t ↦ ζ defines a mapping F[t] → F which assigns the value f(ζ) to f. We denote the F → F function ζ ↦ f(ζ) by f̃ and call f̃ a polynomial function. So f̃ is a function while f is a formal expression.

Definition 14.4.20. If A is a k × k matrix, then we define

    f(A) = α₀Iₖ + α₁A + ··· + αₙAⁿ ∈ Mₖ(F) .    (14.16)

Definition 14.4.21 (Divisibility of polynomials). Let f, g ∈ F[t]. We say that g divides f, or f is divisible by g, written g | f, if there exists a polynomial h ∈ F[t] such that f = gh. In this case we say that g is a divisor of f and f is a multiple of g.

Notation 14.4.22 (Divisors of a polynomial). Let f ∈ F[t]. We denote by Div(f) the set of all divisors of f. We denote by Div(f, g) the set of polynomials which divide both f and g, that is, Div(f, g) = Div(f) ∩ Div(g).

Theorem 14.4.23 (Division Theorem). Let f, g ∈ F[t] where g is not the zero polynomial. Then there exist unique polynomials q and r such that

    f = qg + r    (14.17)

and deg r < deg g. ♦

Exercise 14.4.24. Let f ∈ F[t] and ζ ∈ F.
(a) Show that there exist q ∈ F[t] and ξ ∈ F such that

    f = (t − ζ)q + ξ

(b) Prove that ξ = f(ζ)

Exercise 14.4.25. Let f ∈ F[t].
(a) Show that if F is an infinite field, then f̃ = 0 if and only if f = 0.
(b) Let F be a finite field of order q.
  (b1) Prove that there exists f ≠ 0 such that f̃ = 0. Do not use (b3) to solve this; your solution should be just one line, based on a very simple formula defining f.
  (b2) Show that if deg f < q, then f̃ = 0 if and only if f = 0.
  (b3) (∗) (Fermat's Little Theorem, generalized) Show that if |F| = q and f = t^q − t, then f̃ = 0.

Definition 14.4.26 (Ideal). Let I ⊆ F[t]. Then I is an ideal if the following three conditions hold.
(a) 0 ∈ I
(b) I is closed under addition
(c) If f ∈ I and g ∈ F[t] then fg ∈ I

Notation 14.4.27. Let f ∈ F[t]. We denote by (f) the set of all multiples of f, i. e.,

    (f) = {fg | g ∈ F[t]} .

Proposition 14.4.28. Let f, g ∈ F[t]. Then (f) ⊆ (g) if and only if g | f.

Definition 14.4.29 (Principal ideal). Let f ∈ F[t]. The set (f) is called the principal ideal generated by f, and f is said to be a generator of this ideal.

Exercise 14.4.30. For f ∈ F[t], verify that (f) is indeed an ideal of F[t].

Theorem 14.4.31 (Every ideal is principal). Every ideal of F[t] is principal. ♦

Proposition 14.4.32. Let f, g ∈ F[t]. Then (f) = (g) if and only if there exists ζ ≠ 0 in F such that g = ζf.

Theorem 14.4.31 will be our tool to prove the existence of greatest common divisors of polynomials.

Definition 14.4.33 (Greatest common divisor). Let f1, . . . , fk, g ∈ F[t]. We say that g is a greatest common divisor (gcd) of f1, . . . , fk if
(a) g is a common divisor of the fi, i. e., g | fi for all i,
(b) g is a common multiple of all common divisors of the fi, i. e., for all e ∈ F[t], if e | fi for all i, then e | g.

Proposition 14.4.34. Let f1, . . . , fk, d, d1, d2 ∈ F[t].
(a) Let ζ be a nonzero scalar. If d is a gcd of f1, . . . , fk, then ζd is also a gcd of f1, . . . , fk.
(b) If d1 and d2 are both gcds of f1, . . . , fk, then there exists ζ ∈ F^× = F \ {0} such that d2 = ζd1.

Exercise 14.4.35. Let f1, . . . , fk ∈ F[t]. Show that gcd(f1, . . . , fk) = 0 if and only if f1 = ··· = fk = 0.

Proposition 14.4.36. Let f1, . . . , fk ∈ F[t] and suppose not all of the fi are 0. Then among all of the greatest common divisors of f1, . . . , fk, there is a unique monic polynomial.

For the sake of uniqueness of the gcd notation, we write d = gcd(f1, . . . , fk) if, in addition to (a) and (b),
(c) d is monic or d = 0.

Theorem 14.4.37 (Existence of gcd). Let f1, . . . , fk ∈ F[t]. Then gcd(f1, . . . , fk) exists and, moreover, there exist polynomials g1, . . . , gk such that

    gcd(f1, . . . , fk) = Σ_{i=1}^{k} fi gi    (14.18)

Lemma 14.4.38 (Euclid's Lemma). Let f, g, h ∈ F[t]. Then Div(f, g) = Div(f − gh, g). ♦

Theorem 14.4.39. Let f1, . . . , fk, g ∈ F[t]. Then g is a gcd of f1, . . . , fk if and only if Div(f1, . . . , fk) = Div(g). ♦

Exercise 14.4.40. Let f ∈ F[t]. Show that Div(f, 0) = Div(f).

By applying the above theorem and Euclid's Lemma, we arrive at Euclid's Algorithm for determining the gcd of polynomials.

Theorem 14.4.41 (Euclid's Algorithm). The following algorithm can be used to determine the greatest common divisor of two polynomials f0 and g0. In each iteration of the while loop, r = f − g · q, where r and q are the polynomials guaranteed by the Division Theorem. In particular, deg r < deg g.

    f ← f0
    g ← g0
    while g ≠ 0 do
        Find q and r by the Division Theorem
        f ← g
        g ← r
    end while
    return f

The following example demonstrates Euclid's algorithm.

Example 14.4.42. Let f = t^5 + 2t^4 − 3t^3 + t^2 − 5t + 4 and let g = t^2 − 1 over the real numbers. Then

    Div(f, g) = Div(f − (t^3 + 2t^2 − 2t + 3)g, g)
              = Div(t^2 − 1, −7t + 7)
              = Div(t^2 − 1 + (t/7)(−7t + 7), −7t + 7)
              = Div(t − 1, −7t + 7)
              = Div(t − 1 + (1/7)(−7t + 7), −7t + 7)
              = Div(0, −7t + 7)
              = Div(−7t + 7)

Thus −7t + 7 is a gcd of f and g, and we can multiply by −1/7 to get a monic polynomial. In particular, t − 1 is the gcd of f and g, and we may write gcd(f, g) = t − 1.

Exercise 14.4.43. Let

    f1 = t^2 + t − 2    (14.19)
    f2 = t^2 + 3t + 2    (14.20)
    f3 = t^3 − 1    (14.21)
    f4 = t^4 − t^2 + t − 1    (14.22)

over the real numbers. Determine the following greatest common divisors.
(a) gcd(f1, f2)
(b) gcd(f1, f3)
(c) gcd(f1, f3, f4)
(d) gcd(f1, f2, f3, f4)
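Euclid's Algorithm (Theorem 14.4.41) translates directly into code once the Division Theorem is implemented by long division. The sketch below is ours; it uses exact rational arithmetic (fractions.Fraction) so that the run reproduces Example 14.4.42 faithfully. A polynomial is again a coefficient list, with the zero polynomial represented by the empty list.

    from fractions import Fraction

    def poly_trim(f):
        # drop trailing zero coefficients; [] represents the zero polynomial
        while f and f[-1] == 0:
            f = f[:-1]
        return f

    def poly_divmod(f, g):
        # Division Theorem (14.4.23): f = q*g + r with deg r < deg g
        g = poly_trim([Fraction(c) for c in g])
        r = poly_trim([Fraction(c) for c in f])
        assert g, "division by the zero polynomial"
        q = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
        while r and len(r) >= len(g):
            shift = len(r) - len(g)
            c = r[-1] / g[-1]              # cancel the leading term
            q[shift] = c
            for i, gc in enumerate(g):
                r[i + shift] -= c * gc
            r = poly_trim(r)
        return poly_trim(q), r

    def poly_gcd(f, g):
        # Euclid's Algorithm (Theorem 14.4.41), normalized to the monic gcd
        f, g = poly_trim(list(f)), poly_trim(list(g))
        while g:
            _, r = poly_divmod(f, g)
            f, g = g, r
        return [Fraction(c) / Fraction(f[-1]) for c in f] if f else []

    # Example 14.4.42: f = t^5 + 2t^4 - 3t^3 + t^2 - 5t + 4,  g = t^2 - 1
    f = [4, -5, 1, -3, 2, 1]
    g = [-1, 0, 1]
    assert poly_gcd(f, g) == [-1, 1]       # the monic gcd is t - 1

Self-check: poly_divmod(f, g) returns the quotient t^3 + 2t^2 − 2t + 3 and the remainder −7t + 7 computed in the example.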

Proposition 14.4.44. Let f, g, h ∈ F[t] and d = gcd(g, h). Then gcd(fg, fh) = fd.

Proposition 14.4.45. If f | gh and gcd(f, g) = 1, then f | h.

Exercise 14.4.46. Determine gcd(f, f′) where f = t^n + t + 1 (over R).

Exercise 14.4.47. Determine gcd(t^n − 1, t^2 + t + 1) over the real numbers.

Let f ∈ F[t] and let F be a subfield of the field G. Then we can also view f as a polynomial over G. However, this changes the set Div(f) and therefore in principle it could affect the gcd of two polynomials. We shall see that up to scalar factors nothing changes, but in order to be able to reason about this question, we temporarily use the notation gcd_F(f, g) and gcd_G(f, g) to denote the gcd with respect to the corresponding fields.

Exercise 14.4.48 (Insensitivity of gcd to field extensions). Let F be a subfield of G, and let f, g ∈ F[t]. Then

    gcd_F(f, g) = gcd_G(f, g) .

Definition 14.4.49 (Irreducible polynomial). A polynomial f ∈ F[t] is irreducible over F if deg f ≥ 1 and for all g, h ∈ F[t], if f = gh then either deg g = 0 or deg h = 0.

We shall give examples of irreducible polynomials over various fields in Examples ??.

Proposition 14.4.50. If f ∈ F[t] is irreducible and ζ is a nonzero scalar, then ζf is irreducible.

Proposition 14.4.51. Let f ∈ F[t] with deg f = 1. Then f is irreducible.

Proposition 14.4.52. Let f be an irreducible polynomial, and let f | gh. Then either f | g or f | h.

Proposition 14.4.53. Every nonzero polynomial is a product of irreducible polynomials.

Theorem 14.4.54 (Unique Factorization). Every polynomial in F[t] can be uniquely written as the product of irreducible polynomials over F. ♦

Uniqueness holds up to the order of the factors and scalar multiples, i. e., if f1 ··· fk = g1 ··· gℓ where the fi and gj are irreducible, then k = ℓ and there exists a permutation (bijection) σ : {1, . . . , k} → {1, . . . , k} and nonzero scalars αi ∈ F such that fi = αi g_σ(i).

Definition 14.4.55 (Root of a polynomial). Let f ∈ F[t]. We say that ζ ∈ F is a root of f if f(ζ) = 0.

Proposition 14.4.56. Let ζ ∈ F and let f ∈ F[t]. Then

    t − ζ | f − f(ζ) .

Corollary 14.4.57. Let ζ ∈ F and f ∈ F[t]. Then ζ is a root of f if and only if (t − ζ) | f.

Definition 14.4.58 (Multiplicity). The multiplicity of a root ζ of a polynomial f ∈ F[t] is the largest k for which (t − ζ)^k | f.

Exercise 14.4.59. Let f ∈ R[t]. Show that f(√−1) = 0 if and only if (t^2 + 1) | f.

Proposition 14.4.60. Let f be a polynomial of degree n. Then f has at most n roots (counting multiplicities).

Theorem 14.4.61 (Fundamental Theorem of Algebra). Let f ∈ C[t]. If deg f ≥ 1, then f has a complex root, i. e., there exists ζ ∈ C such that f(ζ) = 0. ♦

Proposition 14.4.62. If f ∈ C[t], then f is irreducible over C if and only if deg f = 1.

Proposition 14.4.63. If f ∈ C[t] and deg f = k ≥ 0, then f can be written as

    f = αk ∏_{i=1}^{k} (t − ζi)    (14.23)

where αk is the leading coefficient of f and the ζi are complex numbers.

Proposition 14.4.64. Let f ∈ P2[R] be given by f = at^2 + bt + c with a ≠ 0. Then f is irreducible over R if and only if b^2 − 4ac < 0.

Exercise 14.4.65. Let f ∈ F[t].
(a) If f has a root in F and deg f ≥ 2, then f is reducible.
(b) Find a reducible polynomial over R that has no real root.

Proposition 14.4.66. Let f ∈ R[t] be of odd degree. Then f has a real root.

Proposition 14.4.67. Let f ∈ R[t] and ζ ∈ C. Then f(ζ̄) = \overline{f(ζ)} (the complex conjugate of f(ζ)). Conclude that if ζ is a complex root of f, then so is ζ̄.

Exercise 14.4.68. Let f ∈ R[t]. Show that if f is irreducible over R, then deg f ≤ 2.

Exercise 14.4.69. Let f ∈ R[t] and f ≠ 0. Show that f can be written as f = ∏ gi where each gi has degree 1 or 2.

Definition 14.4.70 (Formal derivative of a polynomial). Let f = Σ_{i=0}^{n} αi t^i. Then the formal derivative of f is defined to be

    f′ = Σ_{k=1}^{n} k αk t^(k−1) ,    (14.24)

that is,

    f′ = α1 + 2α2 t + ··· + n αn t^(n−1) .    (14.25)

Note that this definition works even over finite fields. We write f^(k) to mean the k-th derivative of f, defined inductively as f^(0) = f and f^(k+1) = (f^(k))′.

Proposition 14.4.71 (Linearity of differentiation). Let f and g be polynomials and let ζ be a scalar. Then
(a) (f + g)′ = f′ + g′ ,
(b) (ζf)′ = ζf′ .

Proposition 14.4.72 (Product rule). Let f and g be polynomials. Then

    (fg)′ = f′g + fg′    (14.26)
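The formal derivative (Definition 14.4.70) in the same coefficient-list style; this short Python sketch (ours) works verbatim over any field as long as the product k * a is interpreted in that field:

    def poly_derivative(f):
        # f' = sum over k >= 1 of k * a_k * t^(k-1)   (Eq. 14.24)
        return [k * a for k, a in enumerate(f)][1:]

    # f = 3 + t + 2t^2 + 3t^3  =>  f' = 1 + 4t + 9t^2
    assert poly_derivative([3, 1, 2, 3]) == [1, 4, 9]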

Definition 14.4.73 (Composition of polynomials). Let f and g be polynomials. Then the composition of f with g, denoted f ◦ g, is the polynomial obtained by replacing all occurrences of the symbol t in the expression for f with g, i. e., f ◦ g = f(g) (we "substitute g" into f).

Proposition 14.4.74. Let f and g be polynomials and let ζ be a scalar. Then

    (f ◦ g)(ζ) = f(g(ζ)) .    (14.27)

Proposition 14.4.75 (Chain Rule). Let f, g ∈ F[t] and let h = f ◦ g. Then

    h′ = (f′ ◦ g) · g′ .    (14.28)

Proposition 14.4.76.

(a) Let F be Q, R, or C. (This part holds for all subfields of C and more generally for all fields of characteristic 0.) Let f ∈ F[t] and let ζ ∈ F. Then (t − ζ)^k | f if and only if

    f(ζ) = f′(ζ) = ··· = f^(k−1)(ζ) = 0 .

(b) This is false if F = Fp.

Proposition 14.4.77. Let f ∈ C[t]. Then f has no multiple roots if and only if gcd(f, f′) = 1.

Exercise 14.4.78. Let n ≥ 1. Prove that the polynomial f = t^n + t + 1 has no multiple roots in C.
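Proposition 14.4.77 yields a purely mechanical multiple-root test: no roots need to be found. Reusing the poly_gcd and poly_derivative sketches from earlier in this section (our code, not the book's), we can spot-check Exercise 14.4.78 for a particular n:

    # f = t^5 + t + 1 (the case n = 5 of Exercise 14.4.78)
    f = [1, 1, 0, 0, 0, 1]
    # gcd(f, f') = 1, so f has no multiple roots in C (Prop. 14.4.77)
    assert poly_gcd(f, poly_derivative(f)) == [1]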


Chapter 15

(F) Vector Spaces: Basic Concepts

15.1 Vector spaces

Many sets with which we are familiar have the desirable property that they are closed under "linear combinations" of their elements. Consider, for example, the set R^R of real functions f : R → R. If f, g ∈ R^R and α, β ∈ R, then αf + βg ∈ R^R. The same is true for the set C[0, 1] of continuous functions f : [0, 1] → R as well as many other sets, like the set R[t] of polynomials with real coefficients (where t denotes the variable), the set R^N of sequences of real numbers, and the 2- and 3-dimensional geometric spaces G2 and G3. We formalize the common properties of these sets, endowed with the notion of linear combination, with the concept of a vector space.

Let F be a (finite or infinite) field. The reader not familiar with fields may think of F as being R or C.

Definition 15.1.1 (Vector space). A vector space V over a field F is a set V with certain operations. We refer to the elements of V as "vectors" and the elements of F as "scalars." We assume a binary operation of addition of vectors, and an operation of multiplication of a vector by a scalar, satisfying the following axioms.

(a) (V, +) is an abelian group (see Sec. 14.2). In other words,
  (a1) For all u, v ∈ V there exists a unique element u + v ∈ V
  (a2) Addition is commutative: for all u, v ∈ V we have u + v = v + u
  (a3) Addition is associative: for all u, v, w ∈ V we have u + (v + w) = (u + v) + w
  (a4) There is a zero element, denoted 0, such that for all v ∈ V we have 0 + v = v
  (a5) Every element v ∈ V has an additive inverse, denoted −v, such that v + (−v) = 0.

(b) V comes with a scaling function V × F → V such that


  (b1) For all α ∈ F and v ∈ V, there exists a unique vector αv ∈ V
  (b2) For all α, β ∈ F and v ∈ V, (αβ)v = α(βv)
  (b3) For all α, β ∈ F and v ∈ V, (α + β)v = αv + βv
  (b4) For every α ∈ F and u, v ∈ V, α(u + v) = αu + αv
  (b5) For every v ∈ V, 1 · v = v (normalization)

The zero vector in a vector space V is written as 0_V, but we often just write 0 when the context is clear.

Property (b2) is referred to as "pseudo-associativity," because it is a form of associativity in which we are dealing with different operations (multiplication in F and scaling of vectors). Similarly, Properties (b3) and (b4) are both types of "pseudo-distributivity."

Proposition 15.1.2. Let V be a vector space over the field F. For all v ∈ V, α ∈ F we have
(a) 0v = 0.
(b) α0 = 0.
(c) αv = 0 if and only if either α = 0 or v = 0.

Exercise 15.1.3. Let V be a vector space and let x ∈ V. Show that x + x = x if and only if x = 0.

Example 15.1.4 (Elementary geometry). The most natural examples of vector spaces are the geometric spaces G2 and G3. We write G2 for the plane and G3 for the "space" familiar from elementary geometry. We think of G2 and G3 as having a special point called the origin. We view the points of G2 and G3 as "vectors" (line segments from the origin to the point). Addition is defined by the parallelogram rule and multiplication by scalars (over R) by scaling. Observe that G2 and G3 are vector spaces over R. These classical geometries form the foundation of our intuition about vector spaces.

Note that G2 is not the same as R^2. Vectors in G2 are directed segments (geometric objects), while the vectors in R^2 are pairs of numbers. The connection between these two is one of the great discoveries of the mathematics of the modern era (Descartes).

Examples 15.1.5. Show that the following are vector spaces over R.
(a) R^n
(b) Mn(R)
(c) R^{k×n}
(d) C[0, 1], the space of continuous real-valued functions f : [0, 1] → R
(e) The space R^N of infinite sequences of real numbers
(f) The space R^R of real functions f : R → R

(g) The space R[t] of polynomials in one variable with real coefficients
(h) For all k ≥ 0, the space Pk[R] of polynomials of degree at most k with coefficients in R
(i) The space R^Ω of functions f : Ω → R where Ω is an arbitrary set

In Section 11.1, we defined the notion of a linear form over F^n (R Def. 11.1.1). This generalizes immediately to vector spaces over F.

Definition 15.1.6 (Linear form). Let V be a vector space over F. A linear form is a function f : V → F with the following properties.
(a) f(x + y) = f(x) + f(y) for all x, y ∈ V;
(b) f(λx) = λf(x) for all x ∈ V and λ ∈ F.

Definition 15.1.7 (Dual space). Let V be a vector space over F. The set of linear forms f : V → F is called the dual space of V and is denoted V*.

Exercise 15.1.8. Let V be a vector space over F. Show that V* is also a vector space over F.

In Section 1.1, we defined linear combinations of column vectors (R Def. 1.1.13). This is easily generalized to linear combinations of vectors in any vector space.

Definition 15.1.9 (Linear combination). Let V be a vector space over F, and let v1,..., vk ∈ V, α1, . . . , αk ∈ F. Then Σ_{i=1}^{k} αi vi is called a linear combination of the vectors v1,..., vk. The linear combination for which all coefficients are zero is the trivial linear combination.

Exercise 15.1.10 (Empty linear combination). What is the linear combination of the empty set? R Convention 14.2.6 explains our convention for the empty sum.

Exercise 15.1.11.
(a) Express the polynomial t − 1 as a linear combination of the polynomials t^2 − 1, (t − 1)^2, t^2 − 3t + 2.
(b) Give an elegant proof that the polynomial t^2 + 1 cannot be expressed as a linear combination of the polynomials t^2 − 1, (t − 1)^2, t^2 − 3t + 2.

Exercise 15.1.12. For α ∈ R, express cos(t + α) as a linear combination of cos t and sin t.

15.2 Subspaces and span

In Section 1.2, we studied subspaces and span in the context of F^k. We now generalize this to arbitrary vector spaces.

For this section, we will take V to be a vector space over a field F and we will let W, S ⊆ V.

Definition 15.2.1 (Subspace). W ⊆ V is a subspace (notation: W ≤ V) if W is closed under linear combinations.

Proposition 15.2.2. Let W ≤ V. Then W is also a vector space.

Exercise 15.2.3. Verify that Propositions 1.2.3 and 1.2.9 hold when F^n is replaced by V.

Exercise 15.2.4. Describe all subspaces of the geometric spaces G2 and G3.

Exercise 15.2.5. Determine which of the following are subspaces of R[t], the space of polynomials in one variable over R.
(a) {f ∈ R[t] | deg(f) = 5}
(b) {f ∈ R[t] | deg(f) ≤ 5}
(c) {f ∈ R[t] | f(1) = 1}
(d) {f ∈ R[t] | f(1) = 0}
(e) {f ∈ R[t] | f(√2) = 0}
(f) {f ∈ R[t] | f(√−1) = 0}
(g) {f ∈ R[t] | f(1) = f(2)}
(h) {f ∈ R[t] | f(1) = (f(2))^2}
(i) {f ∈ R[t] | f(1)f(2) = 0}
(j) {f ∈ R[t] | f(1) = 3f(2) + 4f(3)}
(k) {f ∈ R[t] | f(1) = 3f(2) + 4f(3) + 1}
(l) {f ∈ R[t] | f(1) ≤ f(2)}

Definition 15.2.6 (Span). Let V be a vector space and let v1,..., vm ∈ V. Then the span of S = {v1,..., vm}, denoted span(v1,..., vm), is the smallest subspace of V containing S, i. e.,
(a) span S ⊇ S;
(b) span S is closed under linear combinations;
(c) for every subspace W ≤ V, if S ⊆ W then span S ≤ W.

Exercise 15.2.7. Repeat the exercises of Section 1.2, replacing F^k by V.

15.3 Linear independence and bases

Let V be a vector space. In Section 1.3, we defined the notion of linear independence of matrices (R Def. 1.3.5). We now generalize this to linear independence of a list (R Def. 1.3.1) of vectors in a general vector space.

Definition 15.3.1 (Linear independence). The list (v1,..., vk) of vectors in V is said to be linearly independent over F if the only linear combination equal to 0 is the trivial linear combination. The list (v1,..., vk) is linearly dependent if it is not linearly independent.

Definition 15.3.2. If a list (v1,..., vk) of vectors is linearly independent (dependent), we say that the vectors v1,..., vk are linearly independent (dependent).

Definition 15.3.3. We say that a set of vectors is linearly independent if a list formed by its elements (in any order and without repetitions) is linearly independent.

Exercise 15.3.4. Show that the following sets are linearly independent over Q.
(a) {1, √2, √3}
(b) {√x | x is square-free} (an integer n is square-free if there is no perfect square k ≠ 1 such that k | n)

Exercise 15.3.5. Let U1, U2 ≤ V with U1 ∩ U2 = {0}. Let v1,..., vk ∈ U1 and w1,..., wℓ ∈ U2. If the lists (v1,..., vk) and (w1,..., wℓ) are linearly independent, then so is the list (v1,..., vk, w1,..., wℓ).

Definition 15.3.6 (Rank and dimension). The rank of a set of vectors is the maximum number of linearly independent vectors among them. For a vector space V, the dimension of V is its rank, that is, dim V = rk V.

Exercise 15.3.7.
(a) When are two vectors in G2 linearly dependent?
(b) When are two vectors in G3 linearly dependent?
(c) When are three vectors in G3 linearly dependent?
Phrase your answers in geometric terms.

Definition 15.3.8. We say that the vector w depends on the list (v1,..., vk) of vectors if w ∈ span(v1,..., vk), i. e., if w can be expressed as a linear combination of the vi.

Definition 15.3.9 (Linear independence of an infinite list). We say that an infinite list (vi | i ∈ I) (where I is an index set) is linearly independent if every finite sublist (vi | i ∈ J) (where J ⊆ I and |J| < ∞) is linearly independent.

Exercise 15.3.10. Verify that Exercises 1.3.11-1.3.26 hold in general vector spaces (replace F^n by V where necessary).

Example 15.3.11. For k = 0, 1, 2, . . . , let fk be a polynomial of degree k. Show that the infinite list (f0, f1, f2, . . . ) is linearly independent.

Exercise 15.3.12. Find three nonzero vectors in G3 that are linearly dependent but no two of which are parallel.

Exercise 15.3.13. Prove that for all α, β ∈ R, the functions sin(t), sin(t + α), sin(t + β) are linearly dependent (as members of the function space R^R).

Exercise 15.3.14. Let α1 < α2 < ··· < αn ∈ R. Consider the vectors

    vi = (α1^i, α2^i, . . . , αn^i)^T    (15.1)

for i ≥ 0 (recall the convention that α^0 = 1 even if α = 0). Show that (v0,..., v_{n−1}) is linearly independent.

Exercise 15.3.15 (Moment curve). Find a continuous curve in R^n, i. e., a continuous injective function f : R → R^n, such that every set of n points on the curve is linearly independent. The simplest example is called the "moment curve," and we bet you will find it.

Exercise 15.3.16. Let α1 < α2 < ··· < αn ∈ R, and define the degree-n polynomial

    f = ∏_{j=1}^{n} (t − αj) .

For each 1 ≤ i ≤ n, define the polynomial gi of degree n − 1 by

    gi = f/(t − αi) = ∏_{j≠i} (t − αj) .

Prove that the polynomials g1, . . . , gn are linearly independent.

Definition 15.3.17. We say that a set S ⊆ V generates V if span(S) = V. If S generates V, then S is said to be a set of generators of V.

Definition 15.3.18 (Finite-dimensional vector space). We say that the vector space V is finite dimensional if V has a finite set of generators. A vector space which is not finite dimensional is infinite dimensional.

Exercise 15.3.19. Show that F[t] is infinite dimensional.

Definition 15.3.20 (Basis). A list e = (e1,..., ek) of vectors is a basis of V if e is linearly independent and generates V.

In Section 1.3, we defined the standard basis of the space F^n (R Def. 1.3.34). However, general vector spaces do not have the notion of a standard basis.

Note that a list of vectors is not the same as a set of vectors, but a list of vectors which is linearly independent necessarily has no repeated elements. Note further that lists carry with them an inherent ordering; that is, bases are ordered.

Examples 15.3.21.
(a) Show that the polynomials 1, t, t^2, . . . , t^k form a basis for Pk[F].
(b) Show that the polynomials t^2 + t + 1, t^2 − 2t + 2, t^2 − t − 1 form a basis of P2[F].
(c) Express the polynomial f = 1 as a linear combination of these basis vectors.
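The vectors of Exercise 15.3.14 above are the columns of a Vandermonde matrix, so their linear independence can be spot-checked numerically. A small numpy sketch (ours; the choice of the αi is arbitrary as long as they are distinct):

    import numpy as np

    alphas = np.array([1.0, 2.0, 3.0, 4.0])         # alpha_1 < ... < alpha_n, distinct
    V = np.vander(alphas, increasing=True)           # column i is (alpha_1^i, ..., alpha_n^i)^T = v_i
    assert np.linalg.matrix_rank(V) == len(alphas)   # v_0, ..., v_{n-1} are linearly independent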

Examples 15.3.22. For each of the following sets S, describe the vectors in span(S) and give a basis for span(S).
(a) S = {t, t^2} ⊆ F[t]
(b) S = {sin(t), cos(t), cos(2t), e^{it}} ⊆ C^R
(c) S = { (1, 1, 1)^T, (0, 1, 0)^T, (4, −7, 4)^T } ⊆ F^3
(d) S consists of the 2 × 2 matrices

    [ 1 0 ]   [ 3  0 ]   [ 0 2 ]   [ 2  3 ]
    [ 0 1 ] , [ 0 −5 ] , [ 0 0 ] , [ 0 −7 ]      (S ⊆ M2(F))

(e) S consists of the diagonal matrices diag(1, −3, 0), diag(0, 6, 2), diag(1, 0, 1), diag(2, 3, 3) ⊆ M3(F)

Example 15.3.23. Show that the polynomials t^2, (t + 1)^2, and (t + 2)^2 form a basis for P2[F]. Express the polynomial t in terms of this basis, and write its coordinate vector.

Exercise 15.3.24. For α ∈ R, write the coordinate vector of cos(t + α) in the basis (cos t, sin t).

Exercise 15.3.25. Find a basis of the 0-weight subspace of F^k (the 0-weight subspace is defined in R Ex. 1.2.7).

Exercise 15.3.26.
(a) Find a basis of Mn(F).
(b) Find a basis of Mn(F) consisting of nonsingular matrices.

Warning. Part (b) is easier for fields of characteristic 0 than for fields of finite characteristic.

Proposition 15.3.27. Let b = (b1,..., bn) be a list of vectors in V. Then b is a basis of V if and only if every vector can be uniquely expressed as a linear combination of the bi, i. e., for every v ∈ V, there exists a unique list of coefficients (α1, . . . , αn) such that v = Σ_{i=1}^{n} αi bi.

Definition 15.3.28 (Maximal linearly independent set). A linearly independent set S ⊆ V is maximal if, for all v ∈ V \ S, S ∪ {v} is not linearly independent.

Proposition 15.3.29. Let e be a list of vectors in a vector space V. Then e is a basis of V if and only if it is a maximal linearly independent set.

Proposition 15.3.30. Let V be a vector space. Then V has a basis (Zorn's lemma is needed for the infinite-dimensional case).

Proposition 15.3.31. Let e be a linearly independent list of vectors in V. Then it is possible to extend e to a basis of V, that is, there exists a basis of V which has e as a sublist.

Proposition 15.3.32. Let V be a vector space and let S ⊆ V be a set of generators of V. Then there exists a list e of vectors in S such that e is a basis of V.

Definition 15.3.33 (Coordinates). The coefficients α1, . . . , αn of Prop. 15.3.27 are called the coordinates of v with respect to the basis b.
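Example 15.3.23 amounts to solving a small linear system: write t = x1 t^2 + x2 (t + 1)^2 + x3 (t + 2)^2 and compare coefficients of 1, t, t^2. A numpy sketch (ours):

    import numpy as np

    B = np.array([[0., 1., 4.],
                  [0., 2., 4.],
                  [1., 1., 1.]])   # columns: t^2, (t+1)^2, (t+2)^2 in the basis (1, t, t^2)
    x = np.linalg.solve(B, np.array([0., 1., 0.]))   # coordinates of the polynomial t
    assert np.allclose(x, [-0.75, 1., -0.25])
    # so t = -(3/4) t^2 + (t+1)^2 - (1/4)(t+2)^2, and the coordinate vector is (-3/4, 1, -1/4)^T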

Definition 15.3.34 (Coordinate vector). Let b = (b1,..., bk) be a basis of the vector space V, and let v ∈ V. Then the column vector representation of v with respect to the basis b, or the coordinatization of v with respect to b, denoted by [v]b, is obtained by arranging the coordinates of v with respect to b in a column, i. e.,

    [v]b = (α1, α2, . . . , αk)^T    (15.2)

where v = Σ_{i=1}^{k} αi bi.

15.4 The First Miracle of Linear Algebra

In Section 1.3, we proved the First Miracle of Linear Algebra for F^n (R Theorem 1.3.40). This generalizes immediately to abstract vector spaces.

Theorem 15.4.1 (First Miracle of Linear Algebra). Let v1,..., vk be linearly independent with vi ∈ span(w1,..., wm) for all i. Then k ≤ m.

The proof of this theorem requires the following lemma.

Lemma 15.4.2 (Steinitz exchange lemma). Let (v1,..., vk) be a linearly independent list such that vi ∈ span(w1,..., wm) for all i. Then there exists j (1 ≤ j ≤ m) such that the list (wj, v2,..., vk) is linearly independent. ♦

Corollary 15.4.3. Let V be a vector space. All bases of V have the same cardinality.

This is an immediate corollary to the First Miracle.

The following theorem is essentially a restatement of the First Miracle of Linear Algebra.

Theorem 15.4.4.
(a) Use the First Miracle to derive the fact that rk(v1,..., vk) = dim(span(v1,..., vk)).
(b) Derive the First Miracle from the statement that rk(v1,..., vk) = dim(span(v1,..., vk)).
♦

Exercise 15.4.5. Let V be a vector space of dimension n, and let v1,..., vn ∈ V. The following are equivalent:
(a) (v1,..., vn) is a basis of V
(b) v1,..., vn are linearly independent
(c) V = span(v1,..., vn)

Exercise 15.4.6. Show that dim(F^n) = n.

Exercise 15.4.7. Show that dim(F^{k×n}) = kn.

Exercise 15.4.8. Show that dim(Pk) = k + 1, where Pk is the space of polynomials of degree at most k.

Exercise 15.4.9. What is the dimension of the subspace of R[t] consisting of polynomials f of degree at most n such that f(√−1) = 0?

Exercise 15.4.10. Show that, if f is a polynomial of degree n, then (f(t), f(t + 1), . . . , f(t + n)) is a basis of Pn[F].

Exercise 15.4.11. Show that any list of polynomials, one of each degree 0, . . . , n, forms a basis of Pn[F].

Proposition 15.4.12. Let V be an n-dimensional vector space with subspaces U1, U2 such that U1 ∩ U2 = {0}. Then

    dim U1 + dim U2 ≤ n .    (15.3)

Proposition 15.4.13 (Modular equation). Let V be a vector space, and let U1, U2 ≤ V. Then

    dim(U1 + U2) + dim(U1 ∩ U2) = dim U1 + dim U2 .    (15.4)

Exercise 15.4.14. Let A = (αij) ∈ R^{n×n}, and assume the columns of A are linearly independent. Prove that it is always possible to change the value of an entry in the first row so that the columns of A become linearly dependent.

Exercise 15.4.15. Call a sequence (a0, a1, a2, . . . ) "Fibonacci-like" if for all n, a_{n+2} = a_{n+1} + a_n.
(a) Prove that Fibonacci-like sequences form a 2-dimensional vector space.
(b) Find a basis for the space of Fibonacci-like sequences.
(c) Express the Fibonacci sequence (0, 1, 1, 2, 3, 5, 8, . . . ) as a linear combination of these basis vectors.

♥ Exercise 15.4.16. Let f be a polynomial. Prove that f has a multiple g = f · h ≠ 0 in which every exponent is prime, i. e.,

    g = Σ_{p prime} αp t^p    (15.5)

for some coefficients αp.

Definition 15.4.17 (Elementary operations). The three types of elementary operations defined as "elementary column operations" in Sec. 3.2 can be applied to any list of vectors.

Proposition 15.4.18. Performing elementary operations does not change the rank of a set of vectors.

15.5 Direct sums

Definition 15.5.1 (Disjoint subspaces). Two subspaces U1, U2 ≤ V are disjoint if U1 ∩ U2 = {0}.

Definition 15.5.2 (Direct sum). Let V be a vector space and let U1, U2 ≤ V. We say that V is the direct sum of U1 and U2 (notation: V = U1 ⊕ U2) if V = U1 + U2 and U1 ∩ U2 = {0}. More generally, we say that V is the direct sum of subspaces U1,...,Uk ≤ V (notation: V = U1 ⊕ ··· ⊕ Uk) if V = U1 + ··· + Uk and for all i,

    Ui ∩ ( Σ_{j≠i} Uj ) = {0} .    (15.6)

Proposition 15.5.3. Let V be a vector space and let U1,...,Uk ≤ V. Then

    dim(U1 ⊕ ··· ⊕ Uk) = Σ_{i=1}^{k} dim Ui .    (15.7)

The following exercise shows that this could actually be used as the definition of the direct sum in finite-dimensional spaces, but that statement in fact holds in infinite-dimensional spaces as well.

Proposition 15.5.4. Let V be a finite-dimensional vector space, and let U1,...,Uk ≤ V, with

    dim ( Σ_{i=1}^{k} Ui ) = Σ_{i=1}^{k} dim Ui .

Then

    Σ_{i=1}^{k} Ui = ⊕_{i=1}^{k} Ui .

Proposition 15.5.5. Let V be a vector space and let U1,...,Uk ≤ V. Then W = Σ_{i=1}^{k} Ui is a direct sum if and only if for every choice of k vectors ui (i = 1, . . . , k) where ui ∈ Ui \ {0}, the vectors u1,..., uk are linearly independent.

We note that the notion of direct sum extends the notion of linear independence to subspaces.

Proposition 15.5.6. The vectors v1,..., vk are linearly independent if and only if

    Σ_{i=1}^{k} span(vi) = ⊕_{i=1}^{k} span(vi) .    (15.8)

Chapter 16

(F) Linear Maps

16.1 Linear map basics

Definition 16.1.1 (Linear map). Let V and W be vector spaces over the same field F. A function ϕ : V → W is called a linear map or homomorphism if for all v, w ∈ V and α ∈ F
(a) ϕ(αv) = αϕ(v)
(b) ϕ(v + w) = ϕ(v) + ϕ(w)

Exercise 16.1.2. Show that if ϕ : V → W is a linear map, then ϕ(0_V) = 0_W.

Proposition 16.1.3. Linear maps preserve linear combinations, i. e.,

    ϕ( Σ_{i=1}^{k} αi vi ) = Σ_{i=1}^{k} αi ϕ(vi)    (16.1)

Exercise 16.1.4. True or false?
(a) If the vectors v1,..., vk are linearly independent, then ϕ(v1), . . . , ϕ(vk) are linearly independent.
(b) If the vectors v1,..., vk are linearly dependent, then ϕ(v1), . . . , ϕ(vk) are linearly dependent.

Example 16.1.5. Our prime examples of linear maps are those defined by matrix multiplication. Every matrix A ∈ F^{k×n} defines a linear map ϕ_A : F^n → F^k by x ↦ Ax.

Example 16.1.6. The projection of R^3 onto R^2 defined by

    (α1, α2, α3)^T ↦ (α1, α2)^T    (16.2)

is a linear map (verify!).

Example 16.1.7. The map f ↦ ∫_0^1 f(t) dt is a linear map from C[0, 1] → R (verify!).

Example 16.1.8. Differentiation d/dt is a linear map d/dt : Pn[F] → P_{n−1}[F] from the space of polynomials of degree ≤ n to the space of polynomials of degree ≤ n − 1 (verify!).

Example 16.1.9. The map ϕ : Pn[R] → R^k defined by

    f ↦ (f(α1), f(α2), . . . , f(αk))^T    (16.3)

where α1 < α2 < ··· < αk is a linear map (verify!).


Example 16.1.10. Let us fix n + 1 distinct real numbers α0 < α1 < ··· < αn. Let f ∈ R^R be a real function. Interpolation is the map f ↦ L(f) ∈ Pn(R) where L(f) = p is the unique polynomial of degree ≤ n with the property that p(αi) = f(αi) (i = 0, . . . , n). This is a linear map from R^R → Pn(R) (verify!).

Example 16.1.11. The map ϕ : Pn(R) → Tn = span{1, cos w, cos 2w, . . . , cos nw} defined by

    ϕ(α0 + α1 t + α2 t^2 + ··· + αn t^n) = α0 + α1 cos w + α2 cos 2w + ··· + αn cos nw    (16.4)

is a linear map from polynomials to trigonometric polynomials (verify!).

Notation 16.1.12. The set of linear maps ϕ : V → W is denoted by Hom(V, W).

Fact 16.1.13. Hom(V, W) is a subspace of the vector space W^V.

Definition 16.1.14 (Composition of linear maps). Let U, V, and W be vector spaces, and let ϕ : U → V and ψ : V → W be linear maps. Then the composition of ψ with ϕ, denoted by ψ ◦ ϕ or ψϕ, is the map η : U → W defined by

    η(v) := ψ(ϕ(v))    (16.5)

FIGURE: U —ϕ→ V —ψ→ W

Proposition 16.1.15. Let U, V, and W be vector spaces and let ϕ : U → V and ψ : V → W be linear maps. Then ψ ◦ ϕ is a linear map.

Linear maps are uniquely determined by their action on a basis, and we are free to choose this action arbitrarily. This is more formally expressed in the following theorem.

Theorem 16.1.16 (Degree of freedom of linear maps). Let V and W be vector spaces with e = (e1,..., ek) a basis of V, and w1,..., wk arbitrary vectors in W. Then there exists a unique linear map ϕ : V → W such that ϕ(ei) = wi for 1 ≤ i ≤ k. ♦

Exercise 16.1.17. Show that dim(Hom(V, W)) = dim(V) · dim(W).

In Section 15.3 we represented vectors by the column vectors of their coordinates with respect to a given basis. As the next step in translating geometric objects to tables of numbers, we assign matrices to linear maps relative to given bases in the domain and the target space. Our key tool for this endeavor is Theorem 16.1.16.

16.2 Isomorphisms

Let V and W be vector spaces over the same field.

Definition 16.2.1 (Isomorphism). A linear map ϕ ∈ Hom(V, W) is said to be an isomorphism if it is a bijection. If there exists an isomorphism between V and W, then V and W are said to be isomorphic. The circumstance that "V is isomorphic to W" is denoted V ≅ W.

Fact 16.2.2. The inverse of an isomorphism is an isomorphism.

Fact 16.2.3. Isomorphisms preserve linear independence and, moreover, map bases to bases.

Fact 16.2.4. Let V and W be vector spaces, let ϕ : V → W be an isomorphism, and let v1,..., vk ∈ V. Then

    rk(v1,..., vk) = rk(ϕ(v1), . . . , ϕ(vk))

Exercise 16.2.5. Show that ≅ is an equivalence relation, that is, for vector spaces U, V, and W:
(a) V ≅ V (reflexive)
(b) If V ≅ W then W ≅ V (symmetric)
(c) If U ≅ V and V ≅ W then U ≅ W (transitive)

Exercise 16.2.6. Let V be an n-dimensional vector space over F. Show that V ≅ F^n.

Proposition 16.2.7. Two vector spaces over the same field are isomorphic if and only if they have the same dimension.

16.3 The Rank-Nullity Theorem

Definition 16.3.1 (Image). The image of a linear map ϕ, denoted im(ϕ), is

    im(ϕ) := {ϕ(v) | v ∈ V}    (16.6)

Definition 16.3.2 (Kernel). The kernel of a linear map ϕ, denoted ker(ϕ), is

    ker(ϕ) := {v ∈ V | ϕ(v) = 0_W} = ϕ^{−1}(0_W)    (16.7)

Proposition 16.3.3. Let ϕ : V → W be a linear map. Then im(ϕ) ≤ W and ker(ϕ) ≤ V.

Definition 16.3.4 (Rank of a linear map). The rank of a linear map ϕ is defined as

    rk(ϕ) := dim(im(ϕ)) .

Definition 16.3.5 (Nullity). The nullity of a linear map ϕ is defined as

    nullity(ϕ) := dim(ker(ϕ)) .

Exercise 16.3.6. Let ϕ : V → W be a linear map.
(a) Show that dim(im ϕ) ≤ dim V.
(b) Use this to reprove rk(AB) ≤ min{rk(A), rk(B)}.

Theorem 16.3.7 (Rank-Nullity Theorem). Let ϕ : V → W be a linear map. Then

    rk(ϕ) + nullity(ϕ) = dim(V) .    (16.8)

♦
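The Rank-Nullity Theorem is easy to test numerically for the maps ϕ_A : R^n → R^k of Example 16.1.5. A numpy sketch (ours), computing the rank and the kernel dimension independently before comparing them with dim V:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.]])                 # rank 1: the second row is twice the first
    rank = np.linalg.matrix_rank(A)
    # nullity = dim ker(A) = (number of columns) - (number of nonzero singular values)
    sing = np.linalg.svd(A, compute_uv=False)
    nullity = A.shape[1] - int(np.sum(sing > 1e-10))
    assert rank + nullity == A.shape[1]          # rk(phi) + nullity(phi) = dim V = 3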

Exercise 16.3.8. Let n = k + ℓ. Find a linear map ϕ : R^n → R^n which has rank k, and therefore has nullity ℓ.

Examples 16.3.9. Find the rank and nullity of each of the linear maps in Examples 16.1.6-16.1.11.

16.4 Linear transformations

Let V be a vector space over the field F.

In Section 16.1, we defined the notion of a linear map. We now restrict ourselves to those maps which map a vector space into itself.

Definition 16.4.1 (Linear transformation). If ϕ : V → V is a linear map, then ϕ is said to be a linear transformation or a linear operator.

Definition 16.4.2 (Identity transformation). The identity transformation of V is the transformation id : V → V defined by id(v) = v for all v ∈ V.

Definition 16.4.3 (Scalar transformation). The scalar transformations of V are the transformations of the form v ↦ αv for a fixed α ∈ F. In other words, a scalar transformation can be written as α · id.

Examples 16.4.4. The following are linear transformations of the 2-dimensional geometric space G2 (verify!).
(a) Rotation about the origin by an angle θ
(b) Reflection about any line passing through the origin
(c) Shearing parallel to a line (tilting the deck)
(d) Scaling by α in one direction

FIGURES

Examples 16.4.5. The following are linear transformations of the 3-dimensional geometric space G3 (verify!).
(a) Rotation by θ about any line
(b) Projection onto any plane
(c) Reflection through any plane passing through the origin
(d) Central reflection

Example 16.4.6. Let Sα : R^R → R^R be the left shift by α operator, defined by Sα(f) = g where

    g(t) := f(t + α) .    (16.9)

Sα is a linear transformation (verify!).

Examples 16.4.7. The following are linear transformations of Pn[F] (verify!).
(a) Differentiation d/dt
(b) Left shift by α, f ↦ Sα(f)
(c) Multiplication by t, followed by differentiation, i. e., the map f ↦ (tf)′

Examples 16.4.8. The following are linear transformations of the space R^N of infinite sequences of real numbers (verify!).
(a) Left shift, (α0, α1, α2, . . . ) ↦ (α1, α2, α3, . . . )
(b) Difference, (α0, α1, α2, . . . ) ↦ (α1 − α0, α2 − α1, α3 − α2, . . . )

Example 16.4.9. Let V = span(sin t, cos t). Differentiation is a linear transformation of V (verify!).

Exercise 16.4.10. The difference operator ∆ : Pk[F] → Pk[F], defined by ∆f := S1(f) − f (with S1 the shift by 1 of Example 16.4.6), is linear, and deg(∆f) = deg(f) − 1 if deg(f) ≥ 1.

16.4.1 Eigenvectors, eigenvalues, eigenspaces

In Chapter 8, we discussed the notion of eigenvectors and eigenvalues of square matrices. This is easily generalized to eigenvectors and eigenvalues of linear transformations.

Definition 16.4.11 (Eigenvector). Let ϕ : V → V be a linear transformation. Then v ∈ V is an eigenvector of ϕ if v ≠ 0 and there exists λ ∈ F such that ϕ(v) = λv. In this case we say that v is an eigenvector to eigenvalue λ.

Definition 16.4.12 (Eigenvalue). Let ϕ : V → V be a linear transformation. Then λ ∈ F is an eigenvalue of ϕ if there exists a nonzero vector v ∈ V such that ϕ(v) = λv.

Compare these definitions with those in Chapter 8. In particular, if A is a square matrix, verify that its eigenvectors and eigenvalues are the same as those of the linear transformation ϕ_A associated with A (R Example 16.1.5), i. e., the map x ↦ Ax.

Exercise 16.4.13. Let ϕ : V → V be a linear transformation and let v1 and v2 be eigenvectors to distinct eigenvalues. Then v1 + v2 is not an eigenvector.

Exercise 16.4.14. Let ϕ : V → V be a linear transformation such that every nonzero vector is an eigenvector. Then ϕ is a scalar transformation.

Exercise 16.4.15. Let ϕ : V → V be a linear transformation and let v1,..., vk be eigenvectors to distinct eigenvalues. Then the vi are linearly independent.

Definition 16.4.16 (Eigenspace). Let V be a vector space and let ϕ : V → V be a linear transformation. We denote by Uλ the set

    Uλ := {v ∈ V | ϕ(v) = λv} .    (16.10)

This set is called the eigenspace corresponding to the eigenvalue λ.

Exercise 16.4.17. Let ϕ : V → V be a linear transformation. Show that, for all λ ∈ F, Uλ is a subspace of V.

Proposition 16.4.18. Let ϕ : V → V be a linear transformation. Then

    Σ_λ Uλ = ⊕_λ Uλ ,    (16.11)

where ⊕ represents the direct sum (R Def. 15.5.2) and the summation is over all eigenvalues.

Examples 16.4.19. Determine the rank, nullity, eigenvalues (and their geometric multiplicities), and eigenvectors of each of the transformations in Examples 16.4.4-16.4.10.

Definition 16.4.20 (Eigenbasis). Let ϕ : V → V be a linear transformation. An eigenbasis of ϕ is a basis of V consisting of eigenvectors of ϕ.

Exercise 16.4.21.

16.4.2 Invariant subspaces

Definition 16.4.22 (Invariant subspace). Let ϕ : V → V and let W ≤ V. Then W is a ϕ-invariant subspace of V if, for all w ∈ W, we have ϕ(w) ∈ W. For any linear transformation ϕ, the trivial invariant subspace is the subspace {0}.

Exercise 16.4.23. Let ϕ : G3 → G3 be a rotation about the vertical axis through the origin. What are the ϕ-invariant subspaces?

Exercise 16.4.24. Let π : G3 → G3 be the projection onto the horizontal plane. What are the π-invariant subspaces?

Exercise 16.4.25.
(a) What are the invariant subspaces of id?
(b) What are the invariant subspaces of the 0 transformation?

Exercise 16.4.26. Let ϕ : V → V be a linear transformation and let λ be an eigenvalue of ϕ. What are the invariant subspaces of ϕ + λ id ?

Exercise 16.4.27. Over every field F, find an infinite-dimensional vector space V and linear transformation ϕ : V → V that has no finite-dimensional invariant subspaces other than {0}.

Proposition 16.4.28. Let ϕ : V → V be a linear transformation. Then ker ϕ and im ϕ are invariant subspaces.

Proposition 16.4.29. Let ϕ : V → V be a linear transformation and let W1, W2 ≤ V be ϕ-invariant subspaces. Then
(a) W1 + W2 is a ϕ-invariant subspace;
(b) W1 ∩ W2 is a ϕ-invariant subspace.

Definition 16.4.30 (Restriction to a subspace). Let ϕ : V → V be a linear map and let W ≤ V be a ϕ-invariant subspace of V. The restriction of ϕ to W, denoted ϕW, is the linear map ϕW : W → W defined by ϕW(w) = ϕ(w) for all w ∈ W.

Proposition 16.4.31. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant subspace. Let f ∈ F[t]. Then

    f(ϕW) = f(ϕ)W .    (16.12)

Proposition 16.4.32. Let ϕ : V → V be a linear transformation. The following are equivalent.
(a) ϕ is a scalar transformation;
(b) all subspaces of V are ϕ-invariant;
(c) all 1-dimensional subspaces of V are ϕ-invariant;
(d) all hyperplanes (R Def. 5.2.1) are ϕ-invariant.

Exercise 16.4.33. Let S be the shift operator (R Example 16.4.8 (a)) on the space R^N of sequences of real numbers, defined by

    S(α0, α1, α2, . . . ) := (α1, α2, α3, . . . ) .    (16.13)

(a) In Ex. 15.4.15, we defined the space of Fibonacci-like sequences. Show that this is an S-invariant subspace of R^N.
(b) Find an eigenbasis of S in this subspace.
(c) Use the result of part (b) to find an explicit formula for the n-th Fibonacci number.

Definition 16.4.34 (Minimal invariant subspace). Let ϕ : V → V be a linear transformation. Then U ≤ V is a minimal invariant subspace of ϕ if the only invariant subspace properly contained in U is {0}.

Exercise 16.4.35. Let

    ρ = [ 0 0 ··· 0 1 ]
        [ 1 0 ··· 0 0 ]
        [ 0 1 ··· 0 0 ]
        [ ⋮ ⋮     ⋮ ⋮ ]
        [ 0 0 ··· 1 0 ]

be an n × n matrix.
(a) Count the invariant subspaces of ρ over C
(b) Find a 2-dimensional minimal invariant subspace of ρ over R
(c) Count the invariant subspaces of ρ over Q if n is prime

Definition 16.4.36. Let ϕ : V → V be a linear transformation. Then the n-th power of ϕ is defined as

    ϕ^n := ϕ ◦ ··· ◦ ϕ  (n times) .    (16.14)

Definition 16.4.37. Let ϕ : V → V be a linear transformation and let f ∈ F[t], say f = α0 + α1 t + ··· + αn t^n. Then we define f(ϕ) by

    f(ϕ) := α0 id + α1 ϕ + ··· + αn ϕ^n .    (16.15)

In particular,

    f(ϕ)(v) = α0 v + α1 ϕ(v) + ··· + αn ϕ^n(v) .    (16.16)

Exercise 16.4.38. Let ϕ : V → V be a linear transformation and let f ∈ F[t]. Verify that f(ϕ) is a linear transformation of V.
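Definitions 16.4.36 and 16.4.37 can be tried out on the cyclic matrix ρ of Exercise 16.4.35: since ρ permutes the standard basis cyclically, ρ^n = I, so the polynomial f = t^n − 1 satisfies f(ρ) = 0. A numpy sketch (ours) for n = 4:

    import numpy as np

    n = 4
    rho = np.roll(np.eye(n), 1, axis=0)          # the n x n cyclic matrix of Exercise 16.4.35
    # f = t^n - 1 gives f(rho) = rho^n - I = 0
    f_of_rho = np.linalg.matrix_power(rho, n) - np.eye(n)
    assert np.allclose(f_of_rho, 0)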

Definition 16.4.39 (Chain of invariant subspaces). Let ϕ : V → V be a linear transformation, and let U1,...,Uk ≤ V be ϕ-invariant subspaces. We say that the Ui form a chain if whenever i ≠ j, either Ui ≤ Uj or Uj ≤ Ui.

Definition 16.4.40 (Maximal chain). Let ϕ : V → V be a linear transformation and let U1,...,Uk ≤ V be a chain of ϕ-invariant subspaces. We say that this chain is maximal if, for every ϕ-invariant subspace W ≤ V (W ≠ Ui for all i), the Ui together with W do not form a chain.

Exercise 16.4.41. Let d/dt : Pn(R) → Pn(R) be the derivative linear transformation (R Def. 14.4.70) of the space of real polynomials of degree at most n.
(a) Prove that the number of d/dt-invariant subspaces is n + 2;
(b) give a very simple description of each;
(c) prove that they form a maximal chain.

♥ Exercise 16.4.42. Let V = Fp^{Fp}, and let ϕ be the shift-by-1 operator, i. e., ϕ(f)(t) = f(t + 1). What are the invariant subspaces of ϕ? Prove that they form a maximal chain.

Proposition 16.4.43. Let ϕ : V → V be a linear transformation, and let f ∈ F[t]. Then ker f(ϕ) and im f(ϕ) are invariant subspaces.

The next exercise shows that invariant subspaces generalize the notion of eigenvectors.

Exercise 16.4.44. Let v ∈ V be a nonzero vector and let ϕ : V → V be a linear transformation. Then v is an eigenvector of ϕ if and only if span(v) is ϕ-invariant.

Proposition 16.4.45. Let V be a vector space over R and let ϕ : V → V be a linear transformation. Then ϕ has an invariant subspace of dimension at most 2.

Proposition 16.4.46. Let V be an n-dimensional vector space and let ϕ : V → V be a linear transformation. The following are equivalent.
(a) There exists a maximal chain of subspaces, all of which are invariant.
(b) There is a basis b of V such that [ϕ]b is triangular.

Proposition 16.4.47. Let V be a finite-dimensional vector space and let ϕ : V → V be a linear transformation.
(a) Let b be a basis of V. Then [ϕ]b is triangular if and only if every initial segment of b spans a ϕ-invariant subspace of V.
(b) If such a basis exists, then such an orthonormal basis exists.

Proposition 16.4.48. Let V be a finite-dimensional vector space with basis b and let ϕ : V → V be a linear transformation. Then [ϕ]b is diagonal if and only if b is an eigenbasis of ϕ.

Exercise 16.4.49. Infer Schur's Theorem (R Theorem 12.4.9) and the real version of Schur's Theorem (R Theorem 12.4.18) from the preceding exercise.

16.5 Coordinatization

In Section 15.3, we defined coordinatization of a vector with respect to some basis (R Def. 15.3.34). We now extend this to the notion of coordinatization of linear maps.

Definition 16.5.1 (Coordinatization). Let V be an n-dimensional vector space with basis e = (e1,..., en), and let W be an m-dimensional vector space with basis f = (f1,..., fm). Let αij (1 ≤ i ≤ m, 1 ≤ j ≤ n) be coefficients such that ϕ(ej) = Σ_{i=1}^{m} αij fi. Then the matrix representation or coordinatization of ϕ with respect to the bases e and f is the m × n matrix

    [ϕ]e,f := [ α11 ··· α1n ]
              [  ⋮   ⋱   ⋮  ]    (16.17)
              [ αm1 ··· αmn ]

If ϕ : V → V is a linear transformation, then we write [ϕ]e instead of [ϕ]e,e.

So the j-th column of [ϕ]e,f is [ϕ(ej)]f, the coordinate vector of the image of the j-th basis vector of V in the basis f of W.

Exercise 16.5.2. Write the matrix representation of each of the linear transformations in Ex. 16.4.4 in a basis consisting of perpendicular unit vectors.

Exercise 16.5.3. Compute the matrix of the "rotation by θ" transformation with respect to the basis of two unit vectors at an angle θ.

Exercise 16.5.4. In the preceding two exercises, we computed two matrices representing the "rotation by θ" transformation, corresponding to the two different bases considered. Compare (a) the traces and (b) the determinants (R Prop. 6.2.3 (a)) of these two matrices.

Example 16.5.5. Write the matrix representation of each of the linear transformations in Ex. 16.4.5 in the basis consisting of three mutually perpendicular unit vectors.

Example 16.5.6. Write the matrix representation of each of the linear transformations in Ex. 16.4.7 in the basis (1, t, t^2, . . . , t^n) of the polynomial space Pn(F).

The next exercise demonstrates that under our rules of coordinatization, the action of a linear map corresponds to multiplying a column vector by a matrix.

Proposition 16.5.7. For any v ∈ V, if ϕ : V → W is a linear map, then

    [ϕ(v)]f = [ϕ]e,f [v]e
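Definition 16.5.1 in action: the j-th column of [ϕ]e,f is the coordinate vector of the image of the j-th basis vector. The following numpy sketch (ours) builds the matrix of differentiation on P3[R] in the basis (1, t, t^2, t^3), cf. Example 16.5.6 and Examples 16.4.7:

    import numpy as np

    n = 3
    D = np.zeros((n + 1, n + 1))
    for j in range(1, n + 1):
        D[j - 1, j] = j          # d/dt (t^j) = j * t^(j-1), placed in column j

    # check Proposition 16.5.7 on f = 4 + 3t + 2t^2 + t^3: f' = 3 + 4t + 3t^2
    f = np.array([4., 3., 2., 1.])
    assert np.allclose(D @ f, [3., 4., 3., 0.])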

Moreover, coordinatization treats eigenvectors the way we would expect.

Proposition 16.5.8. Let V be a vector space with basis b and let ϕ : V → V be a linear transformation. Write A = [ϕ]b. Then
(a) A and ϕ have the same eigenvalues;
(b) v ∈ V is an eigenvector of ϕ with eigenvalue λ if and only if [v]b is an eigenvector of A with eigenvalue λ.

The next exercise shows that under our coordinatization, composition of linear maps (R Def. 16.1.14) corresponds to matrix multiplication. This gives a natural explanation of why we multiply matrices the way we do, and why this operation is associative.

Proposition 16.5.9. Let U, V, and W be vector spaces with bases e, f, and g, respectively, and let ϕ : U → V and ψ : V → W be linear maps. Then

    [ψϕ]e,g = [ψ]f,g [ϕ]e,f    (16.18)

Exercise 16.5.10. Explain the comment before the preceding exercise.

Exercise 16.5.11. Let V and W be vector spaces, with dim V = n and dim W = k. Infer from Prop. 16.5.9 that coordinatization is an isomorphism between Hom(V, W) and F^{k×n}.

Exercise 16.5.12. Let ρθ be the "rotation by θ" linear map in G2.
(a) Show that, for α, β ∈ R, ρ_{α+β} = ρ_α ◦ ρ_β.
(b) Use matrix multiplication to derive the addition formulas for sin and cos.

Example 16.5.13. Let V be an n-dimensional vector space with basis b, and let A ∈ F^{k×n}. Define ϕ : V → F^k by x ↦ A[x]b.
(a) Show that ϕ is a linear map.
(b) Show that rk ϕ = rk A.

Definition 16.5.14. Let V be a finite-dimensional vector space over R and let b be a basis of V. Let ϕ : V → V be a nonsingular linear transformation. We say that ϕ is sense-preserving if det[ϕ]b > 0 and ϕ is sense-reversing if det[ϕ]b < 0.

Proposition 16.5.15.
(a) The sense-preserving linear transformations of the plane are rotations.
(b) The sense-reversing transformations of the plane are reflections about a line through the origin.

Exercise 16.5.16. What linear transformations of G2 fix the origin?

Definition 16.5.17 (Rotational reflection). A rotational reflection of G3 is a rotation followed by a central reflection.

Proposition 16.5.18.
(a) The sense-preserving linear transformations of 3-dimensional space are rotations about an axis.
(b) The sense-reversing linear transformations of 3-dimensional space are rotational reflections.

16.6 Change of basis

We now have the equipment necessary to discuss change of basis transformations and the result of change of basis on the matrix representation of linear maps.

Definition 16.6.1 (Change of basis transformation). Let V be a vector space with bases e = (e1,..., en) and e′ = (e′1,..., e′n). Then the change of basis transformation (from e to e′) is the linear transformation σ : V → V given by σ(ei) = e′i, for 1 ≤ i ≤ n.

Proposition 16.6.2. Let V be a vector space with bases e and e′, and let σ : V → V be the change of basis transformation from e to e′. Then [σ]e = [σ]e′.

For this reason, we often denote the matrix representation of the change of basis transformation σ by [σ] rather than by, e. g., [σ]e.

Fact 16.6.3. Let σ : V → V be a change of basis transformation. Then [σ] is invertible.

Notation 16.6.4. Let V be a vector space with bases e and e′. When changing basis from e to e′, we sometimes refer to e as the "old" basis and e′ as the "new" basis. So if v ∈ V is a vector, we often write [v]old in place of [v]e and [v]new in place of [v]e′. Likewise, if W is a vector space with bases f and f′ and we change bases from f to f′, we consider f the "old" basis and f′ the "new" basis. So if ϕ : V → W is a linear map, we write [ϕ]old in place of [ϕ]e,f and [ϕ]new in place of [ϕ]e′,f′.

Proposition 16.6.5. Let v ∈ V and let e and e′ be bases of V. Let σ be the change of basis transformation from e to e′. Then

    [v]new = [σ]^{−1} [v]old .    (16.19)

Numerical exercise 16.6.6. For each of the following vector spaces V, compute the change of basis matrix from e to e′. Self-check: pick some v ∈ V, determine [v]e and [v]e′, and verify that Equation (16.19) holds.
(a) V = G2, e = (e1, e2) is a pair of perpendicular unit vectors, and e′ = (e′1, e′2), where e′i is ei rotated by θ
(b) V = P2[F], e = (1, t, t^2), e′ = (t^2, (t + 1)^2, (t + 2)^2)
(c) V = F^3, e = ( (1, 1, 1)^T, (2, 0, −1)^T, (0, 1, 2)^T ), e′ = ( (1, 0, 1)^T, (−1, 1, 1)^T, (0, 1, 1)^T )

Just as coordinates change with respect to different bases, so do the matrix representations of linear maps.

Proposition 16.6.7. Let V and W be finite-dimensional vector spaces, let e and e′ be bases of V, let f and f′ be bases of W, and let ϕ : V → W be a linear map. Then

    [ϕ]new = T^{−1} [ϕ]old S    (16.20)

where S is the change of basis matrix from e to e′ and T is the change of basis matrix from f to f′.
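Equation (16.19) can be verified numerically on part (c) of Numerical exercise 16.6.6. In the sketch below (ours), the columns of E and E2 hold the two bases in the standard coordinates of F^3; the change of basis matrix is then S = E^{−1}E2, since its j-th column is [e′_j]e.

    import numpy as np

    E = np.array([[1., 2., 0.],
                  [1., 0., 1.],
                  [1., -1., 2.]])     # old basis e, one vector per column
    E2 = np.array([[1., -1., 0.],
                   [0., 1., 1.],
                   [1., 1., 1.]])     # new basis e'

    S = np.linalg.solve(E, E2)        # [sigma] = E^{-1} E2

    v = np.array([2., 0., 3.])        # an arbitrary vector, in standard coordinates
    v_old = np.linalg.solve(E, v)     # [v]_e
    v_new = np.linalg.solve(E2, v)    # [v]_e'
    assert np.allclose(np.linalg.solve(S, v_old), v_new)   # Eq. (16.19)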

Proposition 16.6.8. Let N be a nilpotent matrix. Then the linear transformation defined by x ↦ Nx has a chain of invariant subspaces.

Proposition 16.6.9. Every matrix A ∈ Mn(C) is similar to a triangular matrix.

Chapter 17

(F) Block Matrices (optional)

17.1 Block matrix basics

Definition 17.1.1 (Block matrix). We sometimes write matrices as block matrices, where each entry actually represents a matrix.

Example 17.1.2. The matrix

    A = [ 1  2 −3  4 ]
        [ 2  1  6 −1 ]
        [ 0 −3 −1  2 ]

may be symbolically written as the block matrix

    A = [ A11 A12 ]
        [ A21 A22 ]

where

    A11 = [ 1 2 ] ,  A12 = [ −3  4 ] ,
          [ 2 1 ]          [  6 −1 ]

A21 = (0, −3), and A22 = (−1, 2).

Definition 17.1.3 (Block-diagonal matrix). A square matrix A is said to be a block-diagonal matrix if it can be written in the form A = diag(A1, A2, . . . , Ak), with all entries outside the diagonal blocks equal to 0, where the diagonal blocks Ai are square matrices. In this case, we say that A is the diagonal sum of the matrices A1,...,Ak.

Example 17.1.4. The matrix

    A = [  1 2 0 0  0 0 ]
        [ −3 7 0 0  0 0 ]
        [  0 0 6 0  0 0 ]
        [  0 0 0 4  6 0 ]
        [  0 0 0 2 −1 3 ]
        [  0 0 0 3  2 5 ]

is a block-diagonal matrix. We may write A = diag(A1, A2, A3) where

    A1 = [  1 2 ] ,  A2 = ( 6 ) ,  A3 = [ 4  6 0 ]
         [ −3 7 ]                       [ 2 −1 3 ]
                                        [ 3  2 5 ]

Note that the blocks need not be of the same size.

Definition 17.1.5 (Block-triangular matrix). A square matrix A is said to be a block-triangular matrix if it can be written in the form

    A = [ A11 A12 ··· A1k ]
        [  0  A22 ··· A2k ]
        [       ⋱         ]
        [  0   0  ··· Akk ]

for blocks Aij, where the diagonal blocks Aii are square matrices.


Exercise 17.1.6. Show that block matrices multiply in the same way as matrices. More precisely, suppose that A = (Aij) and B = (Bjk) are block matrices where Aij ∈ F^{ri×sj} and Bjk ∈ F^{sj×tk}. Let C = AB (why is this product defined?). Show that C = (Cik) where Cik ∈ F^{ri×tk} and

    Cik = Σ_j Aij Bjk .    (17.1)

17.2 Arithmetic of block-diagonal and block-triangular matrices

Proposition 17.2.1. Let A = diag(A1,...,An) and B = diag(B1,...,Bn) be block-diagonal matrices with blocks of the same size and let λ ∈ F. Then

    A + B = diag(A1 + B1,...,An + Bn)    (17.2)
    λA = diag(λA1, . . . , λAn)    (17.3)
    AB = diag(A1B1,...,AnBn) .    (17.4)

Proposition 17.2.2. Let A = diag(A1,...,An) be a block-diagonal matrix. Then A^k = diag(A1^k,...,An^k) for all k.

Proposition 17.2.3. Let f ∈ F[t] be a polynomial and let A = diag(A1,...,An) be a block-diagonal matrix. Then

    f(A) = diag(f(A1), . . . , f(An)) .    (17.5)

In our discussion of the arithmetic of block-triangular matrices, we are interested only in the block-diagonal entries.

Proposition 17.2.4. Let A and B be block-upper triangular matrices with diagonal blocks A1, . . . , An and B1, . . . , Bn respectively, with blocks of the same size, and let λ ∈ F (the entries above the diagonal blocks, marked ∗ in the displays below, are unspecified). Then A + B, λA, and AB are block-upper triangular with

    diagonal blocks A1 + B1, . . . , An + Bn for A + B ,    (17.6)
    diagonal blocks λA1, . . . , λAn for λA ,    (17.7)
    diagonal blocks A1B1, . . . , AnBn for AB .    (17.8)
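Proposition 17.2.3 is easy to test numerically. A numpy sketch (ours; diag_sum and the test polynomial f = t^2 − 3t + 2 are our own choices), using the first two blocks of Example 17.1.4:

    import numpy as np

    def diag_sum(*blocks):
        # assemble the diagonal sum of square blocks (Definition 17.1.3)
        n = sum(b.shape[0] for b in blocks)
        A = np.zeros((n, n))
        i = 0
        for b in blocks:
            k = b.shape[0]
            A[i:i+k, i:i+k] = b
            i += k
        return A

    def f(M):
        # plug the matrix M into f(t) = t^2 - 3t + 2
        return M @ M - 3 * M + 2 * np.eye(M.shape[0])

    A1 = np.array([[1., 2.], [-3., 7.]])
    A2 = np.array([[6.]])
    A = diag_sum(A1, A2)
    assert np.allclose(f(A), diag_sum(f(A1), f(A2)))   # Prop. 17.2.3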

Proposition 17.2.5. Let A be as in Prop. 17.2.4. Then A^k is block-upper triangular with diagonal blocks A1^k, . . . , An^k for all k.    (17.9)

Proposition 17.2.6. Let f ∈ F[t] be a polynomial and let A be as in Prop. 17.2.4. Then f(A) is block-upper triangular with diagonal blocks f(A1), . . . , f(An).    (17.10)

Chapter 18

(F) Minimal Polynomials of Matrices and Linear Transformations (optional)

18.1 The minimal polynomial

All matrices in this section are square.

In Section 8.5, we defined what it means to plug a matrix into a polynomial (R Def. 2.3.3).

Definition 18.1.1 (Annihilation). Let f ∈ F[t]. We say that f annihilates the matrix A ∈ Mn(F) if f(A) = 0.

Exercise 18.1.2. Let A be a matrix. Prove, without using the Cayley-Hamilton Theorem, that for every matrix there is a nonzero polynomial that annihilates it, i. e., for every matrix A there is a nonzero polynomial f such that f(A) = 0. Show that such a polynomial of degree at most n^2 exists.

Definition 18.1.3 (Minimal polynomial). Let A ∈ Mn(F). The polynomial f ∈ F[t] is a minimal polynomial of A if f is a nonzero polynomial of lowest degree that annihilates A.

Example 18.1.4. The polynomial t − 1 is a minimal polynomial of the n × n identity matrix.

Exercise 18.1.5. Let A be the diagonal matrix diag(3, 3, 7, 7, 7). Find a minimal polynomial for A. Compare your answer to the characteristic polynomial fA. Recall (R Prop. 2.3.4) that for all f ∈ F[t], we have

    f(diag(λ1, . . . , λn)) = diag(f(λ1), . . . , f(λn)) .

Proposition 18.1.6. Let A ∈ Mn(F) and let m be a minimal polynomial of A. Then for all g ∈ F[t], we have g(A) = 0 if and only if m | g.

Corollary 18.1.7. Let A ∈ Mn(F). Then the minimal polynomial of A is unique up to nonzero scalar factors.


Convention 18.1.8. When discussing "the" minimal polynomial of a matrix, we refer to the unique monic minimal polynomial, denoted m_A.

Corollary 18.1.9 (Cayley-Hamilton restated). The minimal polynomial of a matrix divides its characteristic polynomial.

Corollary 18.1.10. Let A ∈ Mn(F). Then deg m_A ≤ n.

Example 18.1.11. m_I = t − 1.

Exercise 18.1.12. Let
\[
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.
\]
Prove m_A = (t − 1)².

Exercise 18.1.13. Find two 2 × 2 matrices with the same characteristic polynomial but different minimal polynomials.

Exercise 18.1.14. Let
A = diag(λ1, ..., λ1, λ2, ..., λ2, ..., λk, ..., λk)
where the λi are distinct. Prove that
\[
m_A = \prod_{i=1}^{k} (t - \lambda_i). \tag{18.1}
\]

Exercise 18.1.15. Let A be a block-diagonal matrix, say
\[
A = \begin{pmatrix}
A_1 & & & \\
& A_2 & & 0 \\
& & \ddots & \\
0 & & & A_k
\end{pmatrix}.
\]
Give a simple expression for m_A in terms of the m_{A_i}:
m_A = lcm_i m_{A_i}.

Proposition 18.1.16. Let A ∈ Mn(F). Then λ is an eigenvalue of A if and only if m_A(λ) = 0.

Exercise 18.1.17. Prove: similar matrices have the same minimal polynomial, i.e., if A ∼ B then m_A = m_B.

Proposition 18.1.18. Let A ∈ Mn(C). If m_A does not have multiple roots then A is diagonalizable.

In fact, this condition is necessary and sufficient (R Theorem 18.2.20).

18.2 Minimal polynomials of linear transformations

In this section, V is an n-dimensional vector space over F. In Section 16.4.2 we defined what it means to plug a linear transformation into a polynomial (R Def. 16.4.37).

Exercise 18.2.1. Define what it means for the polynomial f ∈ F[t] to annihilate the linear transformation ϕ : V → V.

Exercise 18.2.2. Let ϕ : V → V be a linear transformation. Prove that there is a nonzero polynomial that annihilates ϕ.

Exercise 18.2.3. Define a minimal polynomial of a linear transformation ϕ : V → V.
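The matrix examples above (Ex. 18.1.5, Ex. 18.1.15) are easy to explore numerically. Below is a minimal sketch (NumPy assumed; the helper name is ours, not the book's) that finds the monic minimal polynomial of a small matrix by searching for the first power A^k that is a linear combination of I, A, ..., A^{k−1}, in the spirit of Ex. 18.1.2.

```python
import numpy as np

def minimal_polynomial_coeffs(A, tol=1e-9):
    """Return coefficients c_0..c_k (monic) of the least-degree polynomial
    with A^k + c_{k-1} A^{k-1} + ... + c_0 I = 0 (numerical sketch)."""
    n = A.shape[0]
    powers = [np.eye(n)]
    for k in range(1, n + 1):
        powers.append(powers[-1] @ A)
        # Try to express A^k as a combination of the lower powers.
        M = np.column_stack([P.flatten() for P in powers[:-1]])
        b = powers[-1].flatten()
        c, *_ = np.linalg.lstsq(M, b, rcond=None)
        if np.linalg.norm(M @ c - b) < tol:
            return np.append(-c, 1.0)  # coefficients of t^0, ..., t^k
    raise RuntimeError("unreachable: Cayley-Hamilton bounds the degree by n")

A = np.diag([3, 3, 7, 7, 7])
print(np.round(minimal_polynomial_coeffs(A), 6))
# Expect the coefficients of (t-3)(t-7) = t^2 - 10 t + 21: [21, -10, 1]
```

For A = diag(3, 3, 7, 7, 7) this returns the coefficients of (t − 3)(t − 7), illustrating that the minimal polynomial can be a proper divisor of the characteristic polynomial (t − 3)²(t − 7)³.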

Example 18.2.4. The polynomial t − 1 is a minimal polynomial of the identity transformation.

Proposition 18.2.5. Let ϕ : V → V be a linear transformation and let m be a minimal polynomial of ϕ. Then for all g ∈ F[t], we have g(ϕ) = 0 if and only if m | g.

Corollary 18.2.6. Let ϕ : V → V be a linear transformation. Then the minimal polynomial of ϕ is unique up to nonzero scalar factors.

Convention 18.2.7. When discussing "the" minimal polynomial of a linear transformation, we shall refer to the unique monic minimal polynomial, denoted mϕ.

Proposition 18.2.8. Let ϕ : V → V be a linear transformation. Then deg mϕ ≤ n.

Example 18.2.9. m_id = t − 1.

Exercise 18.2.10. Let ϕ : V → V be a linear transformation with an eigenbasis. Prove: all roots of mϕ belong to F and mϕ has no multiple roots. Let b1, ..., bn be an eigenbasis, and let ϕ(bi) = λi bi for each i. Find mϕ in terms of the λi.

Proposition 18.2.11. Let ϕ : V → V be a linear transformation. Then λ is an eigenvalue of ϕ if and only if mϕ(λ) = 0.

Exercise 18.2.12 (Consistency of translation).

(a) Let b be a basis of V and let ϕ : V → V be a linear transformation. Let A = [ϕ]_b. Show that mϕ = m_A.

(b) Use this to give a second proof of Ex. 18.1.17 (similar matrices have the same minimal polynomial).

Recall the definition of an invariant subspace of the linear transformation ϕ (R Def. 16.4.22).

Definition 18.2.13 (Minimal invariant subspace). Let ϕ : V → V and let W ≤ V. We say that W is a minimal ϕ-invariant subspace if W is a ϕ-invariant subspace, W ≠ {0}, and the only ϕ-invariant subspaces of W are W and {0}.

Lemma 18.2.14. Let ϕ : V → V be a linear transformation and let W ≤ V be a ϕ-invariant subspace. Let ϕ_W denote the restriction of ϕ to W. Then m_{ϕ_W} | mϕ. ♦

Proposition 18.2.15. Let V = W1 ⊕ ··· ⊕ Wk where the Wi are ϕ-invariant subspaces and ⊕ denotes their direct sum (R Def. 15.5.2). If mi denotes the minimal polynomial of ϕ_{Wi}, then
\[
m_\varphi = \operatorname{lcm}_i\, m_i. \tag{18.2}
\]

(a) Prove this without using Ex. 18.1.15, i.e., without translating the problem to matrices.

(b) Prove this using Ex. 18.1.15.

Theorem 18.2.16. Let V be a finite-dimensional vector space and let ϕ : V → V be a linear transformation. If mϕ has an irreducible factor f of degree d, then there is a d-dimensional minimal invariant subspace W such that
\[
m_{\varphi_W} = f. \tag{18.3}
\]
♦

Corollary 18.2.17. Let V be a finite-dimensional vector space over R and let ϕ : V → V be a linear transformation. Then V has a ϕ-invariant subspace of dimension 1 or 2.

Proposition 18.2.18. Let ϕ : V → V be a linear transformation. Let f | mϕ and let W = ker f(ϕ). Then

(a) m_{ϕ_W} | f;

(b) if gcd(f, mϕ/f) = 1, then m_{ϕ_W} = f.

Toward the proof, observe:

(a) f(ϕ_W) = f(ϕ)|_W = 0;

(b) mϕ = lcm(m_{ϕ_W}, m_{ϕ_{W′}}), where W′ = ker((mϕ/f)(ϕ)), i.e., V = W ⊕ W′.

Proposition 18.2.19. Let ϕ : V → V be a linear transformation and let f, g ∈ F[t] with mϕ = f · g and gcd(f, g) = 1. Then

V = (ker f(ϕ)) ⊕ (ker g(ϕ)). (18.4)

Theorem 18.2.20. Let A ∈ Mn(C). Then A is diagonalizable if and only if m_A does not have multiple roots over C. ♦

Theorem 18.2.21. Let F = C and let ϕ : V → V be a linear transformation. Then ϕ has an eigenbasis if and only if mϕ does not have multiple roots. ♦

The next theorem represents a step toward canonical forms of matrices.

Theorem 18.2.22. Let A ∈ Mn(F). Then there is a matrix B ∈ Mn(F) such that A ∼ B and B is the diagonal sum of matrices whose minimal polynomials are powers of irreducible polynomials. ♦

Chapter 19

(R) Euclidean Spaces

19.1 Inner products

Let V be a vector space over R. In Section 1.4, we introduced the standard dot product of R^n (R Def. 1.4.1). We now generalize this to the notion of an inner product over a real vector space V.

Definition 19.1.1 (Euclidean space). A Euclidean space V is a vector space over R endowed with an inner product ⟨·, ·⟩ : V × V → R that is bilinear, symmetric, and positive definite. We explain these terms.

(a) To all u, v ∈ V, a unique real number denoted by ⟨u, v⟩ is assigned. The number ⟨u, v⟩ is called the inner product of u and v.

(b) For all u, v, w ∈ V we have ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.

(c) For all u, w ∈ V and all α ∈ R we have ⟨αu, w⟩ = α⟨u, w⟩. (Items (b) and (c) express that the inner product is a linear function of its first argument, for any fixed value of the second argument.)

(d) For all u, v, w ∈ V we have ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.

(e) For all u, w ∈ V and all α ∈ R we have ⟨u, αw⟩ = α⟨u, w⟩. (Items (d) and (e) express that the inner product is a linear function of its second argument, for any fixed value of the first argument. So items (b)-(e) justify the term bilinear.)

(f) For all v, w ∈ V we have ⟨v, w⟩ = ⟨w, v⟩. (The inner product is symmetric. Given this assumption, (d) and (e) can be dropped.)

(g) For all v ∈ V, if v ≠ 0 then ⟨v, v⟩ > 0. (The inner product is positive definite.)

Observe that the standard dot product of R^n that was introduced in Section 1.4 has all of these properties and, in particular, the vector space R^n endowed with the standard dot product as the inner product is a Euclidean space.

Exercise 19.1.2. Let V be a Euclidean space with inner product ⟨·, ·⟩. Show that for all v ∈ V, we have
⟨v, 0⟩ = ⟨0, v⟩ = 0. (19.1)

Note in particular that ⟨v, v⟩ ≥ 0 for all v ∈ V.


Examples 19.1.3. The following vector spaces together with the specified inner products are Euclidean spaces (verify this).

(a) The vectors in our 2-dimensional and 3-dimensional geometries with the scalar product defined as v · w = |v| · |w| · cos(θ), where θ is the angle between the vectors v and w.

(b) V = R^n, with ⟨x, y⟩ = x^T y (the standard dot product).

(c) V = C[0, 1] (the space of continuous functions f : [0, 1] → R) with the inner product ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt.

(d) V = R[t] (the space of polynomials with real coefficients) with the inner product ⟨f, g⟩ = ∫_{−∞}^{∞} f(t)g(t)e^{−t²/2} dt. More generally, let ρ(t) be any nonnegative continuous function that is not identically 0 and has the property that ∫_{−∞}^{∞} ρ(t)t^{2n} dt < ∞ for all nonnegative integers n (such a function ρ is called a weight function). Then R[t] is a Euclidean space with the inner product ⟨f, g⟩ = ∫_{−∞}^{∞} f(t)g(t)ρ(t) dt.

(e) V = R^{k×n}, with ⟨A, B⟩ = Tr(AB^T).

(f) V = R^n, with ⟨x, y⟩ = x^T Ay, where A ∈ Mn(R) is a symmetric positive definite (R Def. 10.2.3) n × n real matrix.

(g) Let (Ω, P) be a probability space and let V be the space of random variables with zero expected value and finite variance. We consider two random variables equal if they are equal with probability 1. The covariance of the random variables X, Y is defined as Cov(X, Y) = E(XY) − E(X)E(Y). With covariance as the inner product, V is a Euclidean space.

Notice that the same vector space can be endowed with different inner products (for example, different weight functions for the inner product on R[t]).

Because they have inner products, Euclidean spaces carry with them the notion of distance ("norm") and the notion of two vectors being perpendicular ("orthogonality"). Just as inner products generalize the standard dot product in R^n, these concepts generalize the definitions of norm and orthogonality presented for R^n (with respect to the standard dot product) in Section 1.4.

Definition 19.1.4 (Norm). Let V be a Euclidean space, and let v ∈ V. Then the norm of v, denoted ∥v∥, is
∥v∥ := √⟨v, v⟩. (19.2)

The notion of a norm allows us to easily define the distance between two vectors.

Definition 19.1.5. Let V be a Euclidean space, and let v, w ∈ V. Then the distance between the vectors v and w, denoted d(v, w), is
d(v, w) := ∥v − w∥. (19.3)

The following two theorems show that distance in Euclidean spaces behaves the way we are used to it behaving in R^n.

Theorem 19.1.6 (Cauchy-Schwarz inequality). Let V be a Euclidean space, and let v, w ∈ V. Then
|⟨v, w⟩| ≤ ∥v∥ · ∥w∥. (19.4)
♦

Theorem 19.1.7 (Triangle inequality). Let V be a Euclidean space, and let v, w ∈ V. Then
∥v + w∥ ≤ ∥v∥ + ∥w∥. (19.5)
♦

Exercise 19.1.8. Show that the triangle inequality is equivalent to the Cauchy-Schwarz inequality.

Definition 19.1.9 (Angle between vectors). Let V be a Euclidean space, and let v, w ∈ V. The angle θ between v and w is defined by
θ := arccos(⟨v, w⟩ / (∥v∥∥w∥)). (19.6)

Observe that from the Cauchy-Schwarz inequality, we have (for v, w ≠ 0) that
−1 ≤ ⟨v, w⟩ / (∥v∥∥w∥) ≤ 1, (19.7)
so this definition is valid, as this term is always in the domain of arccos.

Definition 19.1.10 (Orthogonality). Let V be a Euclidean space, and let v, w ∈ V. Then v and w are orthogonal (denoted v ⊥ w) if ⟨v, w⟩ = 0. A set of vectors S = {v1, ..., vn} is said to be orthogonal if vi ⊥ vj whenever i ≠ j.

Observe that two vectors are orthogonal when the angle between them is π/2. This agrees with the notion of orthogonality being a generalization of perpendicularity.

Exercise 19.1.11. Let V be a Euclidean space. What vectors are orthogonal to every vector?

Exercise 19.1.12. Let V = C[0, 2π] be the space of continuous functions f : [0, 2π] → R, endowed with the inner product
⟨f, g⟩ = ∫_0^{2π} f(t)g(t) dt. (19.8)
Show that the set {1, cos t, sin t, cos(2t), sin(2t), cos(3t), ...} is an orthogonal set in this Euclidean space.

Definition 19.1.13 (Orthogonal system). An orthogonal system in a Euclidean space V is a list of (pairwise) orthogonal nonzero vectors in V.

Proposition 19.1.14. Every orthogonal system in a Euclidean space is linearly independent.

Definition 19.1.15 (Gram matrix). Let V be a Euclidean space, and let v1, ..., vk ∈ V. The Gram matrix of v1, ..., vk is the k × k matrix whose (i, j) entry is ⟨vi, vj⟩, that is,
G = G(v1, ..., vk) := (⟨vi, vj⟩)_{i,j=1}^{k}. (19.9)
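As a quick sanity check on Exercise 19.1.12 and on the Gram matrix just defined, here is a small numerical sketch (NumPy assumed; the quadrature only approximates the integral inner product (19.8)):

```python
import numpy as np

# Numerical check of Exercise 19.1.12: {1, cos t, sin t, cos 2t, sin 2t}
# is an orthogonal set in C[0, 2pi] with <f, g> = integral_0^{2pi} f g dt.
t = np.linspace(0, 2 * np.pi, 20001)

def inner(f, g):
    return np.trapz(f(t) * g(t), t)   # quadrature approximation of (19.8)

fs = [np.cos, np.sin, lambda s: np.cos(2 * s), lambda s: np.sin(2 * s)]
fs = [lambda s: np.ones_like(s)] + fs
G = np.array([[inner(f, g) for g in fs] for f in fs])  # Gram matrix (19.9)
print(np.round(G, 6))
```

The off-diagonal entries vanish (up to quadrature error), so this is an orthogonal system; the diagonal shows the norms squared (2π for the constant function, π for the others).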

Exercise 19.1.16. Let V be a Euclidean space. Show that the vectors v1, ..., vk ∈ V are linearly independent if and only if det G(v1, ..., vk) ≠ 0.

Exercise 19.1.17. Let V be a Euclidean space and let v1, ..., vk ∈ V. Show
rk(v1, ..., vk) = rk(G(v1, ..., vk)). (19.10)

Definition 19.1.18 (Orthonormal system). An orthonormal system in a Euclidean space V is a list of (pairwise) orthogonal vectors in V, all of which have unit norm. So (v1, v2, ...) is an orthonormal system if ⟨vi, vj⟩ = δij for all i, j.

In the case of finite-dimensional Euclidean spaces, we are particularly interested in orthonormal bases.

Definition 19.1.19 (Orthonormal basis). Let V be a Euclidean space. A list (v1, ..., vn) is an orthonormal basis (ONB) of V if it is a basis and {v1, ..., vn} is an orthonormal set.

In the next section, not only will we show that every finite-dimensional Euclidean space has an orthonormal basis, but we will, moreover, demonstrate an algorithm for transforming any basis into an orthonormal basis.

Proposition 19.1.20. Let V be a Euclidean space with orthonormal basis b. Then for all v, w ∈ V,
⟨v, w⟩ = [v]_b^T [w]_b. (19.11)

Proposition 19.1.21. Let V be a Euclidean space. Every linear form f : V → R (R Def. 15.1.6) can be written as
f(x) = ⟨a, x⟩ (19.12)
for a unique a ∈ V.

19.2 Gram-Schmidt orthogonalization

The main result of this section is the following theorem.

Theorem 19.2.1. Every finite-dimensional Euclidean space has an orthonormal basis. In fact, every orthonormal system extends to an orthonormal basis. ♦

Before we prove this theorem, we will first develop Gram-Schmidt orthogonalization, an online procedure that takes a list of vectors as input and produces a list of orthogonal vectors satisfying certain conditions. We formalize this below.

Theorem 19.2.2 (Gram-Schmidt orthogonalization). Let V be a Euclidean space and let v1, ..., vk ∈ V. For i = 1, ..., k, let Ui = span(v1, ..., vi). Then there exist vectors e1, ..., ek such that

(a) for all j ≥ 1, we have vj − ej ∈ U_{j−1};

(b) the ej are pairwise orthogonal.

Moreover, the ej are uniquely determined. ♦

Note that U0 = span(∅) = {0}.

Let GS(k) denote the statement of Theorem 19.2.2 for a particular value of k. We prove Theorem 19.2.2 by induction on k. The inductive step is based on the following lemma.

Lemma 19.2.3. Assume GS(k) holds. Then, for all j ≤ k, we have span(e1, ..., ej) = Uj. ♦

Exercise 19.2.4. Prove GS(1).

Exercise 19.2.5. Let k ≥ 2 and assume GS(k − 1) is true. Look for ek in the form
\[
e_k = v_k - \sum_{i=1}^{k-1} \alpha_i e_i \tag{19.13}
\]
Prove that the only possible vector ek satisfying the conditions of GS(k) is the one for which
\[
\alpha_i = \frac{\langle v_k, e_i\rangle}{\|e_i\|^2} \tag{19.14}
\]
except in the case where ei = 0 (in that case, αi can be chosen arbitrarily).

Exercise 19.2.6. Prove that ek as constructed in the previous exercise satisfies GS(k).

This completes the proof of Theorem 19.2.2.

Proposition 19.2.7. ei = 0 if and only if vi ∈ span(v1, ..., v_{i−1}).

Proposition 19.2.8. The Gram-Schmidt procedure preserves linear independence.

Proposition 19.2.9. If (v1, ..., vk) is a basis of V, then so is (e1, ..., ek).

Exercise 19.2.10. Let e = (e1, ..., en) be an orthogonal basis of V. From e, construct an orthonormal basis e′ = (e′1, ..., e′n) of V.

Exercise 19.2.11. Conclude that every finite-dimensional Euclidean space has an orthonormal basis.

Numerical exercise 19.2.12. Apply the Gram-Schmidt procedure to find an orthonormal basis for each of the following Euclidean spaces V from the basis b. Self-check: once you have applied Gram-Schmidt, verify that you have obtained an orthonormal set of vectors.

(a) V = R³ with the standard dot product, and b = ((1, 0, 1)^T, (1, 1, 0)^T, (−1, 1, 2)^T)

(b) V = P2[R], with ⟨f, g⟩ = ∫_{−∞}^{∞} e^{−t²/2} f(t)g(t) dt, and b = (1, t, t²) (Hermite polynomials)

(c) V = P2[R], with ⟨f, g⟩ = ∫_{−1}^{1} (1/√(1 − t²)) f(t)g(t) dt, and b = (1, t, t²) (Chebyshev polynomials of the first kind)

(d) V = P2[R], with ⟨f, g⟩ = ∫_{−1}^{1} √(1 − t²) f(t)g(t) dt, and b = (1, t, t²) (Chebyshev polynomials of the second kind)
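The construction of Exercise 19.2.5 translates directly into code. The following sketch (NumPy assumed; the function name is ours) implements formula (19.14) and then normalizes, solving part (a) of Numerical exercise 19.2.12:

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthogonalize a list of vectors per Theorem 19.2.2:
    e_k = v_k - sum_i <v_k, e_i>/||e_i||^2 * e_i, skipping zero e_i."""
    es = []
    for v in vectors:
        e = v.astype(float)
        for u in es:
            nu = u @ u
            if nu > tol:                 # skip e_i = 0 (Prop. 19.2.7)
                e = e - (v @ u) / nu * u
        es.append(e)
    return es

b = [np.array([1.0, 0, 1]), np.array([1.0, 1, 0]), np.array([-1.0, 1, 2])]
es = gram_schmidt(b)
onb = [e / np.linalg.norm(e) for e in es]   # normalize (Ex. 19.2.10)
# self-check: the Gram matrix of an orthonormal basis is the identity
print(np.round(np.array([[u @ w for w in onb] for u in onb]), 6))
```

Printing the Gram matrix of the result is exactly the self-check suggested in Numerical exercise 19.2.12: it should be the 3 × 3 identity.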

19.3 Isometries and orthogonal transformations

Definition 19.3.1 (Isometry). Let V and W be Euclidean spaces. Then an isometry f : V → W is a bijection that preserves the inner product, i.e., for all v1, v2 ∈ V, we have
⟨v1, v2⟩_V = ⟨f(v1), f(v2)⟩_W. (19.15)
The Euclidean spaces V and W are isometric if there is an isometry between them.

Theorem 19.3.2. If V and W are finite-dimensional Euclidean spaces, then they are isometric if and only if dim V = dim W. ♦

Proposition 19.3.3. Let V and W be Euclidean spaces. Then ϕ : V → W is an isometry if and only if it maps an orthonormal basis of V to an orthonormal basis of W.

Proposition 19.3.4. Let ϕ : V → W be an isomorphism that preserves orthogonality (so v ⊥ w if and only if ϕ(v) ⊥ ϕ(w)). Show that there is an isometry ψ and a nonzero scalar λ such that ϕ = λψ.

The geometric notion of congruence is captured by the concept of orthogonal transformations.

Definition 19.3.5 (Orthogonal transformation). Let V be a Euclidean space. A linear transformation ϕ : V → V is called an orthogonal transformation if it is an isometry. The set of orthogonal transformations of V is denoted by O(V), and is called the orthogonal group of V.

Proposition 19.3.6. The set O(V) is a group (R Def. 14.2.1) under composition.

Exercise 19.3.7. The linear transformation ϕ : V → V is orthogonal if and only if ϕ preserves the norm, i.e., for all v ∈ V, we have ∥ϕv∥ = ∥v∥.

Theorem 19.3.8. Let ϕ ∈ O(V). Then all eigenvalues of ϕ are ±1. ♦

Proposition 19.3.9. Let V be a Euclidean space and let e = (e1, ..., en) be an orthonormal basis of V. Then ϕ : V → V is an orthogonal transformation if and only if (ϕ(e1), ..., ϕ(en)) is an orthonormal basis.

Proposition 19.3.10 (Consistency of translation). Let V be a Euclidean space with orthonormal basis b, and let ϕ : V → V be a linear transformation. Then ϕ is orthogonal if and only if [ϕ]_b is an orthogonal matrix (R Def. 9.1.1).

Definition 19.3.11. Let V be a Euclidean space and let S, T ⊆ V. For v ∈ V, we say that v is orthogonal to S (notation: v ⊥ S) if for all s ∈ S, we have v ⊥ s. Moreover, we say that S is orthogonal to T (notation: S ⊥ T) if s ⊥ t for all s ∈ S and t ∈ T.

Definition 19.3.12. Let V be a Euclidean space and let S ⊆ V. Then S⊥ ("S perp") is the set of vectors orthogonal to S, i.e.,
S⊥ := {v ∈ V | v ⊥ S}. (19.16)

Proposition 19.3.13. For all subsets S ⊆ V, we have S⊥ ≤ V.

Proposition 19.3.14. Let S ⊆ V. Then S ⊆ S⊥⊥.

Exercise 19.3.15. Verify

(a) {0}⊥ = V

(b) ∅⊥ = V

(c) V⊥ = {0}

The next theorem says that the direct sum (R Def. 15.5.2) of a subspace and its perp is the entire space.

Theorem 19.3.16. If dim V < ∞ and W ≤ V, then V = W ⊕ W⊥. ♦

The proof of this theorem requires the following lemma.

Lemma 19.3.17. Let V be a vector space with dim V = n, and let W ≤ V. Then
dim W⊥ = n − dim W. (19.17)
♦

Proposition 19.3.18. Let V be a finite-dimensional Euclidean space and let S ⊆ V. Then
S⊥ = (span(S))⊥. (19.18)

Proposition 19.3.19. Let U1, ..., Uk be pairwise orthogonal subspaces (i.e., Ui ⊥ Uj whenever i ≠ j). Then
\[
\sum_{i=1}^{k} U_i = \bigoplus_{i=1}^{k} U_i.
\]

We now study the linear map analogue of the transpose of a matrix, known as the adjoint of the linear map.

Theorem 19.3.20. Let V and W be Euclidean spaces, and let ϕ : V → W be a linear map. Then there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W, we have
⟨ϕv, w⟩ = ⟨v, ψw⟩. (19.19)
♦

Note that the inner product above refers to inner products in two different spaces. To be more specific, we should have written
⟨ϕv, w⟩_W = ⟨v, ψw⟩_V. (19.20)

Definition 19.3.21. The linear map ψ whose existence is guaranteed by Theorem 19.3.20 is called the adjoint of ϕ and is denoted ϕ*. So for all v ∈ V and w ∈ W, we have
⟨ϕv, w⟩ = ⟨v, ϕ*w⟩. (19.21)

The next exercise shows the relationship between the coordinatization of ϕ and of ϕ*. The reason we denote the adjoint of the linear map ϕ by ϕ* rather than by ϕ^T will become clear in Section 20.4.

Proposition 19.3.22. Let V, W, and ϕ be as in the statement of Theorem 19.3.20. Let b1 be an orthonormal basis of V and let b2 be an orthonormal basis of W. Then
[ϕ*]_{b2,b1} = [ϕ]^T_{b1,b2}. (19.22)

19.4 First proof of the Spectral Theorem

We first stated the Spectral Theorem for real symmetric matrices in Chapter 10. We now restate the theorem in the context of Euclidean spaces and break its proof into a series of exercises.

Let V be a Euclidean space.

Definition 19.4.1 (Symmetric linear transformation). Let ϕ : V → V be a linear transformation. Then ϕ is symmetric if, for all v, w ∈ V, we have
⟨v, ϕ(w)⟩ = ⟨ϕ(v), w⟩. (19.23)

Proposition 19.4.2. Let ϕ : V → V be a symmetric transformation, and let v1, ..., vk be eigenvectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.

Proposition 19.4.3. Let b be an orthonormal basis of the finite-dimensional Euclidean space V, and let ϕ : V → V be a linear transformation. Then ϕ is symmetric if and only if the matrix [ϕ]_b is symmetric.

In particular, this proposition establishes the equivalence of Theorem 10.1.1 and Theorem 19.4.4. The main theorem of this section is the Spectral Theorem.

Theorem 19.4.4 (Spectral Theorem). Let V be a finite-dimensional Euclidean space and let ϕ : V → V be a symmetric linear transformation. Then ϕ has an orthonormal eigenbasis. ♦

Proposition 19.4.5. Let ϕ be a symmetric linear transformation of the Euclidean space V, and let W ≤ V. If W is ϕ-invariant (R Def. 16.4.22) then W⊥ is ϕ-invariant.

Lemma 19.4.6. Let ϕ : V → V be a symmetric linear transformation and let W be a ϕ-invariant subspace of V. Then the restriction of ϕ to W is also symmetric. ♦

The heart of the proof of the Spectral Theorem is the following lemma.

Main Lemma 19.4.7. Let ϕ be a symmetric linear transformation of a Euclidean space of dimension ≥ 1. Then ϕ has an eigenvector.

Exercise 19.4.8. Assuming Lemma 19.4.7, prove the Spectral Theorem by induction on dim V.

Before proving the Main Lemma, we digress briefly into analysis.

Definition 19.4.9 (Convergence). Let V be a Euclidean space and let v = (v1, v2, v3, ...) be a sequence of vectors in V. We say that the sequence v converges to the vector v if lim_{i→∞} ∥vi − v∥ = 0, that is, if for all ε > 0 there exists N such that ∥vi − v∥ < ε for all i > N.

Definition 19.4.10 (Open set). Let S ⊆ R^n. We say that S is open if, for every v ∈ S, there exists ε > 0 such that the set {w ∈ R^n | ∥v − w∥ < ε} is a subset of S.

Definition 19.4.11 (Closed set). A set S ⊆ V is closed if, whenever there is a sequence of vectors v1, v2, v3, ... ∈ S that converges to a vector v ∈ V, we have v ∈ S.

Definition 19.4.12 (Bounded set). A set S ⊆ V is bounded if there is some v ∈ V and r ∈ R such that for all w ∈ S, we have d(v, w) < r (R Def. 19.1.5: d(v, w) = ∥v − w∥).

Definition 19.4.13 (Compactness). Let V be a finite-dimensional Euclidean space, and let S ⊆ V. Then S is compact if it is closed and bounded. (The reader familiar with topology may recognize this not as the standard definition of compactness in a general metric space, but rather as an equivalent definition for finite-dimensional Euclidean spaces that arises as a consequence of the Heine-Borel Theorem.)

Definition 19.4.14. Let S be a set and let f : S → R be a function. We say that f attains its maximum if there is some s0 ∈ S such that for all s ∈ S, we have f(s0) ≥ f(s). The notion of a function attaining its minimum is defined analogously.

Theorem 19.4.15. Let V be a Euclidean space and let S ⊆ V be a compact and nonempty subset. If f : S → R is continuous, then f attains its maximum and its minimum. ♦

Definition 19.4.16 (Rayleigh quotient for linear transformations). Let ϕ : V → V be a linear transformation. Then the Rayleigh quotient of ϕ is the function Rϕ : V \ {0} → R defined by
\[
R_\varphi(v) := \frac{\langle v, \varphi(v)\rangle}{\|v\|^2}. \tag{19.24}
\]

While this function is defined for all linear transformations, its significance is in its applications to symmetric linear transformations.

Exercise 19.4.17. Let ϕ : V → V be a linear transformation, and let v ∈ V and λ ∈ R^×. Verify that Rϕ(λv) = Rϕ(v).

Proposition 19.4.18. Let ϕ : V → V be a linear transformation. Then the Rayleigh quotient Rϕ attains its maximum and its minimum.

Exercise 19.4.19. Consider the real function
f(t) = (α + βt + γt²) / (δ + εt²)
where α, β, γ, δ, ε ∈ R and δ² + ε² ≠ 0. Suppose f(0) ≥ f(t) for all t ∈ R. Show that β = 0.

Definition 19.4.20 (arg max). Let S be a set and let f : S → R be a function which attains its maximum. Then arg max f := {s0 ∈ S | f(s0) ≥ f(s) for all s ∈ S}.

Convention 19.4.21. Let S be a set and let f : S → R be a function which attains its maximum. We often write arg max f to refer to any (arbitrarily chosen) element of the set arg max f, rather than the set itself.

Proposition 19.4.22. Let ϕ be a symmetric linear transformation of the Euclidean space V, and let v0 = arg max Rϕ(v). Then v0 is an eigenvector of ϕ.
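Propositions 19.4.18 and 19.4.22 can be observed numerically. A small sketch (NumPy assumed) compares the maximizer of the Rayleigh quotient with the top eigenpair returned by a library eigensolver:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                 # a symmetric matrix

def rayleigh(A, v):
    return (v @ A @ v) / (v @ v)  # Def. 19.4.16

w, U = np.linalg.eigh(A)          # eigenvalues in ascending order
v0 = U[:, -1]                     # eigenvector of the largest eigenvalue
print(np.isclose(rayleigh(A, v0), w[-1]))   # True: R_phi attains w[-1] there

# random vectors never exceed the maximum eigenvalue (Prop. 19.4.18)
vs = rng.standard_normal((1000, 4))
print(max(rayleigh(A, v) for v in vs) <= w[-1] + 1e-12)  # True
```

This is consistent with the Main Lemma: the vector maximizing Rϕ is an eigenvector, and the maximum value is the largest eigenvalue.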

Prop. 19.4.22 completes the proof of the Main Lemma and thereby the proof of the Spectral Theorem.

Proposition 19.4.23. If two symmetric matrices are similar then they are orthogonally similar.

Chapter 20

(C) Hermitian Spaces

In Chapter 19, we discussed Euclidean spaces, whose underlying vector spaces were real. We now generalize this to the notion of Hermitian spaces, whose underlying vector spaces are complex.

20.1 Hermitian spaces

Definition 20.1.1 (Sesquilinear forms). Let V be a vector space over C. The function f : V × V → C is a sesquilinear form if the following four conditions are met.

(a) f(v, w1 + w2) = f(v, w1) + f(v, w2) for all v, w1, w2 ∈ V,

(b) f(v1 + v2, w) = f(v1, w) + f(v2, w) for all v1, v2, w ∈ V,

(c) f(v, λw) = λf(v, w) for all v, w ∈ V and λ ∈ C,

(d) f(λv, w) = λ̄f(v, w) for all v, w ∈ V and λ ∈ C, where λ̄ is the complex conjugate of λ.

Examples 20.1.2. The function f(v, w) = v*w is a sesquilinear form over C^n. More generally, for any A ∈ Mn(C),
f(v, w) = v*Aw (20.1)
is sesquilinear.

Exercise 20.1.3. Let V be a vector space over C and let f : V × V → C be a sesquilinear form. Show that for all v ∈ V, we have
f(v, 0) = f(0, v) = 0. (20.2)

Definition 20.1.4 (Hermitian form). Let V be a complex vector space. The function f : V × V → C is Hermitian if for all v, w ∈ V,
\[
f(v, w) = \overline{f(w, v)}. \tag{20.3}
\]
A sesquilinear form that is Hermitian is called a Hermitian form.

Exercise 20.1.5. For what matrices A ∈ Mn(C) is the sesquilinear form f(v, w) = v*Aw Hermitian?

Exercise 20.1.6. Show that for Hermitian forms, (b) follows from (a) and (d) follows from (c) in Def. 20.1.1.

Fact 20.1.7. Let V be a vector space over C and let f : V × V → C be a Hermitian form. Then f(v, v) ∈ R for all v ∈ V.


Definition 20.1.8. Let V be a vector space over C, and let f : V × V → C be a Hermitian form.

(a) If f(v, v) > 0 for all v ≠ 0, then f is positive definite.

(b) If f(v, v) ≥ 0 for all v ∈ V, then f is positive semidefinite.

(c) If f(v, v) < 0 for all v ≠ 0, then f is negative definite.

(d) If f(v, v) ≤ 0 for all v ∈ V, then f is negative semidefinite.

(e) If there exist v, w ∈ V such that f(v, v) > 0 and f(w, w) < 0, then f is indefinite.

Exercise 20.1.9. For what A ∈ Mn(C) is v*Aw positive definite, positive semidefinite, etc.?

Exercise 20.1.10. Let V be the space of continuous functions f : [0, 1] → C, and let ρ : [0, 1] → C be a continuous "weight function." Define
\[
F(f, g) := \int_0^1 \overline{f(t)}\, g(t)\, \rho(t)\, dt. \tag{20.4}
\]

(a) Show that F is a sesquilinear form.

(b) Under what conditions on ρ is F Hermitian?

(c) Under what conditions on ρ is F Hermitian and positive definite?

Exercise 20.1.11. Let r = (ρ0, ρ1, ...) be an infinite sequence of complex numbers. Consider the space V of infinite sequences (α0, α1, ...) of complex numbers such that
\[
\sum_{i=0}^{\infty} |\alpha_i|^2 |\rho_i| < \infty. \tag{20.5}
\]
For a = (α0, α1, ...) and b = (β0, β1, ...), define
\[
F(a, b) := \sum_{i=0}^{\infty} \overline{\alpha_i}\, \beta_i\, \rho_i. \tag{20.6}
\]

(a) Show that F is a sesquilinear form.

(b) Under what conditions on r is F Hermitian?

(c) Under what conditions on r is F Hermitian and positive definite?

Definition 20.1.12 (Hermitian space). A Hermitian space is a vector space V over C endowed with a positive definite, Hermitian, sesquilinear inner product ⟨·, ·⟩ : V × V → C.

Example 20.1.13. The standard example of a (complex) Hermitian space is C^n endowed with the standard Hermitian dot product (R Def. 12.2.6), that is, ⟨v, w⟩ := v*w. Note that the standard Hermitian dot product is sesquilinear and positive definite.

Example 20.1.14. Let V be the space of continuous functions f : [0, 1] → C, and define the inner product
\[
\langle f, g\rangle = \int_0^1 \overline{f(t)}\, g(t)\, dt.
\]

Then V endowed with this inner product is a Hermitian space.

Example 20.1.15. The space ℓ²(C) of sequences (α0, α1, ...) of complex numbers such that
\[
\sum_{i=0}^{\infty} |\alpha_i|^2 < \infty \tag{20.7}
\]
endowed with the inner product
\[
\langle a, b\rangle = \sum_{i=0}^{\infty} \overline{\alpha_i}\, \beta_i \tag{20.8}
\]
where a = (α0, α1, ...) and b = (β0, β1, ...) is a Hermitian space. This is one of the standard representations of the complex "separable Hilbert space."

Euclidean spaces generalize the geometric concepts of distance and perpendicularity via the notions of norm and orthogonality, respectively; these are easily extended to complex Hermitian spaces.

Definition 20.1.16 (Norm). Let V be a Hermitian space, and let v ∈ V. Then the norm of v, denoted ∥v∥, is defined to be
∥v∥ := √⟨v, v⟩. (20.9)

Just as in Euclidean spaces, the notion of a norm allows us to define the distance between two vectors in a Hermitian space.

Definition 20.1.17. Let V be a Hermitian space, and let v, w ∈ V. Then the distance between the vectors v and w, denoted d(v, w), is
d(v, w) := ∥v − w∥. (20.10)

Distance in Hermitian spaces obeys the same properties that we are used to in Euclidean spaces.

Theorem 20.1.18 (Cauchy-Schwarz inequality). Let V be a Hermitian space, and let v, w ∈ V. Then
|⟨v, w⟩| ≤ ∥v∥ · ∥w∥. (20.11)
♦

Theorem 20.1.19 (Triangle inequality). Let V be a Hermitian space, and let v, w ∈ V. Then
∥v + w∥ ≤ ∥v∥ + ∥w∥. (20.12)
♦

Again, like in Euclidean spaces, norms carry with them the notion of angle; however, because ⟨v, w⟩ is not necessarily real, our definition of angle is not identical to the definition of angle presented in Section 19.1.

Definition 20.1.20 (Orthogonality). Let V be a Hermitian space. Then we say that v, w ∈ V are orthogonal (notation: v ⊥ w) if ⟨v, w⟩ = 0.

Exercise 20.1.21. Let V be a Hermitian space. What vectors are orthogonal to every vector?

Definition 20.1.22 (Orthogonal system). An orthogonal system in a Hermitian space V is a list of (pairwise) orthogonal nonzero vectors in V.

Proposition 20.1.23. Every orthogonal system in a Hermitian space is linearly independent.

Definition 20.1.24 (Gram matrix). Let V be a Hermitian space, and let v1, ..., vk ∈ V. The Gram matrix of v1, ..., vk is the k × k matrix whose (i, j) entry is ⟨vi, vj⟩, that is,
G = G(v1, ..., vk) := (⟨vi, vj⟩)_{i,j=1}^{k}. (20.13)

Exercise 20.1.25. Let V be a Hermitian space. Show that the vectors v1, ..., vk ∈ V are linearly independent if and only if det G(v1, ..., vk) ≠ 0.

Exercise 20.1.26. Let V be a Hermitian space and let v1, ..., vk ∈ V. Show
rk(v1, ..., vk) = rk(G(v1, ..., vk)). (20.14)

Definition 20.1.27 (Orthonormal system). An orthonormal system in a Hermitian space V is a list of (pairwise) orthogonal vectors in V, all of which have unit norm. So (v1, v2, ...) is an orthonormal system if ⟨vi, vj⟩ = δij for all i, j.

In the case of finite-dimensional Hermitian spaces, we are particularly interested in orthonormal bases, just as we were interested in orthonormal bases for finite-dimensional Euclidean spaces.

Definition 20.1.28 (Orthonormal basis). Let V be a Hermitian space. An orthonormal basis of V is an orthonormal system that is a basis of V.

Exercise 20.1.29. Generalize the Gram-Schmidt orthogonalization procedure (R Section 19.2) to complex Hermitian spaces. Theorem 19.2.2 will hold verbatim, replacing the word "Euclidean" by "Hermitian."

Proposition 20.1.30. Every finite-dimensional Hermitian space has an orthonormal basis. In fact, every orthonormal list of vectors can be extended to an orthonormal basis.

Proposition 20.1.31. Let V be a Hermitian space with orthonormal basis b. Then for all v, w ∈ V,
⟨v, w⟩ = [v]_b* [w]_b. (20.15)

Proposition 20.1.32. Let V be a Hermitian space. Every linear form f : V → C (R Def. 15.1.6) can be written as
f(x) = ⟨a, x⟩ (20.16)
for a unique a ∈ V.

20.2 Hermitian transformations

Definition 20.2.1 (Hermitian linear transformation). Let V be a Hermitian space. Then the linear transformation ϕ : V → V is Hermitian if for all v, w ∈ V, we have
⟨ϕv, w⟩ = ⟨v, ϕw⟩. (20.17)

Theorem 20.2.2. All eigenvalues of a Hermitian linear transformation are real. ♦

Proposition 20.2.3. Let ϕ : V → V be a Hermitian linear transformation, and let v1, ..., vk be eigenvectors of ϕ with distinct eigenvalues. Then vi ⊥ vj whenever i ≠ j.

Proposition 20.2.4. Let b be an orthonormal basis of the Hermitian space V, and let ϕ : V → V be a linear transformation. Then the transformation ϕ is Hermitian if and only if the matrix [ϕ]_b is Hermitian.

Observe that the definition of a Hermitian linear transformation is analogous to that of a symmetric linear transformation of a Euclidean space (R Def. 19.4.1), but they are not fully analogous. In particular, while a linear transformation of a Euclidean space that has an orthonormal eigenbasis is necessarily symmetric, the analogous statement in complex spaces involves "normal transformations" (R Def. 20.5.1), as opposed to Hermitian transformations.

Exercise 20.2.5. Find a linear transformation of C^n that has an orthonormal eigenbasis but is not Hermitian.

However, the Spectral Theorem does extend to Hermitian linear transformations.

Theorem 20.2.6 (Spectral Theorem for Hermitian transformations). Let V be a Hermitian space and let ϕ : V → V be a Hermitian linear transformation. Then

(a) ϕ has an orthonormal eigenbasis;

(b) all eigenvalues of ϕ are real.

Exercise 20.2.7. Prove the converse of the Spectral Theorem for Hermitian transformations: If ϕ : V → V satisfies (a) and (b), then ϕ is Hermitian.

In Section 20.6, we shall see a more general form of the Spectral Theorem which extends part (a) to normal transformations (R Theorem 20.6.1).

20.3 Unitary transformations

In Section 19.3, we introduced orthogonal transformations (R Def. 19.3.5), which captured the geometric notion of congruence in Euclidean spaces. The complex analogues of real orthogonal transformations are called unitary transformations.

Definition 20.3.1 (Unitary transformation). Let V be a Hermitian space. Then the transformation ϕ : V → V is unitary if it preserves the inner product, i.e.,
⟨ϕv, ϕw⟩ = ⟨v, w⟩ (20.18)
for all v, w ∈ V. The set of unitary transformations ϕ : V → V is denoted by U(V).

Proposition 20.3.2. The set U(V) is a group (R Def. 14.2.1) under composition.

Exercise 20.3.3. The linear transformation ϕ : V → V is unitary if and only if ϕ preserves the norm, i.e., for all v ∈ V, we have ∥ϕv∥ = ∥v∥.
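By the consistency of translation (Prop. 20.2.4), Theorem 20.2.6 can be observed numerically on Hermitian matrices. A minimal sketch (NumPy assumed) builds a random Hermitian matrix and checks both conclusions:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2          # a Hermitian matrix

w, U = np.linalg.eigh(A)          # spectral decomposition of a Hermitian matrix
print(np.allclose(w.imag, 0))                   # eigenvalues are real (20.2.6(b))
print(np.allclose(U.conj().T @ U, np.eye(4)))   # columns form an orthonormal eigenbasis
print(np.allclose(A @ U, U * w))                # A u_i = lambda_i u_i
```

Note that eigh returns a real eigenvalue array by construction, which is exactly part (b) of the theorem; the orthonormality check is part (a).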

Warning. The proof of this is trickier than in the real case (R Ex. 19.3.7).

Theorem 20.3.4. Let ϕ ∈ U(V). Then all eigenvalues of ϕ have absolute value 1. ♦

Proposition 20.3.5 (Consistency of translation). Let b be an orthonormal basis of the Hermitian space V, and let ϕ : V → V be a linear transformation. Then the transformation ϕ is unitary if and only if the matrix [ϕ]_b is unitary.

Theorem 20.3.6 (Spectral Theorem for unitary transformations). Let V be a Hermitian space and let ϕ : V → V be a unitary transformation. Then

(a) ϕ has an orthonormal eigenbasis;

(b) all eigenvalues of ϕ have unit absolute value.

Exercise 20.3.7. Prove the converse of the Spectral Theorem for unitary transformations: If ϕ : V → V satisfies (a) and (b), then ϕ is unitary.

20.4 Adjoint transformations in Hermitian spaces

We now study the linear map analogue of the conjugate-transpose of a matrix, known as the Hermitian adjoint.

Theorem 20.4.1. Let V and W be Hermitian spaces, and let ϕ : V → W be a linear map. Then there exists a unique linear map ψ : W → V such that for all v ∈ V and w ∈ W, we have
⟨ϕv, w⟩ = ⟨v, ψw⟩. (20.19)
♦

Note that the inner product above refers to inner products in two different spaces. To be more specific, we should have written
⟨ϕv, w⟩_W = ⟨v, ψw⟩_V. (20.20)

Definition 20.4.2. The linear map ψ whose existence is guaranteed by Theorem 20.4.1 is called the adjoint of ϕ and is denoted ϕ*. So for all v ∈ V and w ∈ W, we have
⟨ϕv, w⟩ = ⟨v, ϕ*w⟩. (20.21)

Proposition 20.4.3 (Consistency of translation). Let V, W, and ϕ be as in the statement of Theorem 20.4.1. Let b1 be an orthonormal basis of V and let b2 be an orthonormal basis of W. Then
[ϕ*]_{b2,b1} = [ϕ]*_{b1,b2}. (20.22)

TO BE WRITTEN.

20.5 Normal transformations

Definition 20.5.1 (Normal transformation). Let V be a Hermitian space. The transformation ϕ : V → V is normal if it commutes with its adjoint, i.e., ϕ*ϕ = ϕϕ*.

TO BE WRITTEN.

20.6 The Complex Spectral Theorem for normal transformations

The main result of this section is the generalization of the Spectral Theorem to normal transformations.

Theorem 20.6.1 (Complex Spectral Theorem). Let ϕ : V → V be a linear transformation. Then ϕ has an orthonormal eigenbasis if and only if ϕ is normal. ♦

Chapter 21

(R, C) The Singular Value Decomposition

In this chapter, we discuss matrices over C, but every statement of this chapter holds over R as well, if we replace matrix adjoints by transposes and the word "unitary" by "orthogonal."

21.1 The Singular Value Decomposition

In this chapter we study the "Singular Value Decomposition," an important tool in many areas of math and computer science.

Notation 21.1.1. For r ≤ min{k, n}, the matrix diag_{k×n}(σ1, ..., σr) is the k × n matrix with σi (1 ≤ i ≤ r) in the (i, i) entry and 0 everywhere else, i.e., the matrix
\[
\begin{pmatrix}
\sigma_1 & & & \\
& \ddots & & 0 \\
& & \sigma_r & \\
& 0 & & 0
\end{pmatrix}
\]
Note that such a matrix is a "diagonal" matrix which is not necessarily square.

Theorem 21.1.2 (Singular Value Decomposition). Let A ∈ C^{k×n}. Then there exist unitary matrices S ∈ U(k) and T ∈ U(n) such that
S*AT = diag_{k×n}(σ1, ..., σr) (21.1)
where each "singular value" σi is real, σ1 ≥ σ2 ≥ ··· ≥ σr > 0, and r = rk A. If A ∈ R^{k×n}, then we can let S and T be real, i.e., S ∈ O(k) and T ∈ O(n).

Theorem 21.1.3. The singular values of A are the square roots of the nonzero eigenvalues of A*A. ♦

The remainder of this section is dedicated to proving the following restatement of Theorem 21.1.2.

Theorem 21.1.4 (Singular Value Decomposition, restated). Let V and W be Hermitian spaces with dim V = n and dim W = k. Let ϕ : V → W be a linear map of rank r. Then there exist orthonormal bases e and f of V and W, respectively, and real numbers σ1 ≥ ··· ≥ σr > 0 such that ϕe_i = σi f_i and ϕ*f_i = σi e_i for i = 1, ..., r, and ϕe_j = 0 = ϕ*f_j for j > r.

Exercise 21.1.5. Show that Theorem 21.1.4 is equivalent to Theorem 21.1.2.

Exercise 21.1.6. Let V and W be Hermitian spaces and let ϕ : V → W be a linear map of rank r. Let e = (e1, ..., er, ..., en) be an orthonormal eigenbasis of ϕ*ϕ (why does such a basis exist?) with corresponding eigenvalues λ1 ≥ ··· ≥ λr > 0 = λ_{r+1} = ··· = λn. Let σi = √λi for i = 1, ..., r. Let f_i = (1/σi) ϕe_i for i = 1, ..., r. Then for 1 ≤ i ≤ r,

(a) ϕe_i = σi f_i;

(b) ϕ*f_i = σi e_i.

Exercise 21.1.7. The f_i of the preceding exercise are orthonormal.

Exercise 21.1.8. Complete the proof of Theorem 21.1.4, and hence of Theorem 21.1.2.

21.2 Low-rank approximation

In this section, we use the Singular Value Decomposition to find low-rank approximations to a matrix, that is, the matrix of a given rank which is "closest" to a specified matrix under the operator norm (R Def. 13.1.1).

Definition 21.2.1 (Truncated diagonal matrix). Let D = diag_{k×n}(σ1, ..., σr) be a rank-r matrix (so σ1, ..., σr ≠ 0). The rank-ℓ truncation (ℓ ≤ r) of D, denoted D_ℓ, is the k × n matrix D_ℓ = diag_{k×n}(σ1, ..., σ_ℓ).

The next theorem explains how the Singular Value Decomposition helps us find low-rank approximations.

Theorem 21.2.2 (Nearest low-rank matrix). Let A ∈ C^{k×n} be a matrix of rank r with singular values σ1 ≥ ··· ≥ σr > 0. Define S, T, and D as guaranteed by the Singular Value Decomposition Theorem, so that
S*AT = diag_{k×n}(σ1, ..., σr) = D. (21.2)
Given ℓ ≤ r, let
D_ℓ = diag_{k×n}(σ1, ..., σ_ℓ) (21.3)
and define B_ℓ = S D_ℓ T*. Then B_ℓ is the matrix of rank at most ℓ which is nearest to A under the operator norm, i.e., rk B_ℓ = ℓ and for all B ∈ C^{k×n}, if rk B ≤ ℓ, then ∥A − B_ℓ∥ ≤ ∥A − B∥.

Exercise 21.2.3. Let A, B_ℓ, D, and D_ℓ be as in the statement of Theorem 21.2.2. Show

(a) ∥A − B_ℓ∥ = ∥D − D_ℓ∥;

(b) ∥D − D_ℓ∥ = σ_{ℓ+1}.

As with the proof of the Singular Value Decomposition Theorem, Theorem 21.2.2 is easier to prove in terms of linear maps. We restate the theorem as follows.

Theorem 21.2.4. Let V and W be Hermitian spaces with orthonormal bases e and f, respectively, and let ϕ : V → W be a linear map such that ϕe_i = σi f_i for i = 1, ..., r. Define the truncated map ϕ_ℓ : V → W by
\[
\varphi_\ell e_i =
\begin{cases}
\sigma_i f_i & 1 \le i \le \ell \\
0 & \text{otherwise}
\end{cases}
\tag{21.4}
\]
Then whenever ψ : V → W is a linear map of rank ≤ ℓ, we have ∥ϕ − ϕ_ℓ∥ ≤ ∥ϕ − ψ∥.

Exercise 21.2.5. Show that Theorem 21.2.4 is equivalent to Theorem 21.2.2.

Exercise 21.2.6. Let ϕ and ϕ_ℓ be as in the statement of the preceding theorem. Show ∥ϕ − ϕ_ℓ∥ = σ_{ℓ+1}.

It follows that in order to prove Theorem 21.2.4, it suffices to show that for all linear maps ψ : V → W of rank ≤ ℓ, we have ∥ϕ − ψ∥ ≥ σ_{ℓ+1}.

Exercise 21.2.7. Let ψ : V → W be a linear map of rank ≤ ℓ. Show that there exists v ∈ ker ψ such that
\[
\frac{\|(\varphi - \psi)v\|}{\|v\|} \ge \sigma_{\ell+1}. \tag{21.5}
\]

Exercise 21.2.8. Complete the proof of Theorem 21.2.4, hence of Theorem 21.2.2.
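The results just proved (Theorems 21.1.3 and 21.2.2, and Exercise 21.2.3(b)) can all be checked numerically with a library SVD. A minimal sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 8))

# Singular Value Decomposition: A = S @ diag(sigma) @ T*  (Theorem 21.1.2)
S, sigma, Tstar = np.linalg.svd(A)

# Theorem 21.1.3: singular values = sqrt of nonzero eigenvalues of A* A
eig = np.linalg.eigvalsh(A.conj().T @ A)[::-1]        # descending order
print(np.allclose(sigma, np.sqrt(np.abs(eig[:5]))))   # True

# Rank-l truncation B_l (Theorem 21.2.2): the nearest rank-l matrix
l = 2
B_l = S[:, :l] * sigma[:l] @ Tstar[:l, :]
# Exercise 21.2.3(b): the operator-norm error equals sigma_{l+1}
print(np.isclose(np.linalg.norm(A - B_l, 2), sigma[l]))  # True (0-based index)
```

Here the operator norm of a matrix is its largest singular value, which is why the error of the rank-ℓ truncation is exactly σ_{ℓ+1}.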

Exercise 21.2.9. Show that the matrix B_ℓ whose existence is guaranteed by Theorem 21.2.2 is unique, i.e., if there is a matrix B′ ∈ C^{k×n} of rank ℓ such that ∥A − B′∥ ≤ ∥A − B∥ for all rank-ℓ matrices B ∈ C^{k×n}, then B′ = B_ℓ.

In fact, the rank-ℓ matrix guaranteed by Theorem 21.2.2 to be nearest to A under the operator norm is also the rank-ℓ matrix nearest to A under the Frobenius norm (R Def. 13.2.1).

Theorem 21.2.10. The statement of Theorem 21.2.2 holds for the same matrix B_ℓ when the operator norm is replaced by the Frobenius norm. That is, we also have ∥A − B_ℓ∥_F ≤ ∥A − B∥_F for all rank-ℓ matrices B ∈ C^{k×n}. ♦

Chapter 22

(R) Finite Markov Chains

22.1 Stochastic matrices

Definition 22.1.1 (Nonnegative matrix). We say that A is a nonnegative matrix if all entries of A are nonnegative.

Definition 22.1.2 (Stochastic matrix). A square matrix A = (αij) ∈ Mn(R) is stochastic if it is nonnegative and the rows of A sum to 1, i.e., Σ_{j=1}^{n} αij = 1 for all i.

Examples 22.1.3. The following matrices are stochastic.

(a) (1/n) J_n

(b) Permutation matrices

(c)
\[
\begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}
\]

(d)
\[
\begin{pmatrix}
1 & & & \\
1 & & 0 & \\
\vdots & & & \\
1 & & & \\
\end{pmatrix}
\]
(the matrix whose first column is all 1s and whose other entries are 0)

Exercise 22.1.4. Let A ∈ R^{k×n} and B ∈ R^{n×m}. Prove that if A and B are stochastic matrices then AB is a stochastic matrix.

Exercise 22.1.5.

(a) Let A be a stochastic matrix. Prove that A1 = 1.

(b) Show that the converse is false.

(c) Show that A is stochastic if and only if A is nonnegative and A1 = 1.

Definition 22.1.6 (Probability distribution). A probability distribution is a list of nonnegative numbers which add to 1.

Fact 22.1.7. Every row of a stochastic matrix is a probability distribution.
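A small sketch (NumPy assumed; the helper name is ours) checking Def. 22.1.2 and Exercises 22.1.4 and 22.1.5(a) on the examples above:

```python
import numpy as np

def is_stochastic(A, tol=1e-12):
    """Def. 22.1.2: nonnegative with every row summing to 1."""
    return (A >= 0).all() and np.allclose(A.sum(axis=1), 1, atol=tol)

A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
P = np.eye(3)[[2, 0, 1]]                 # a permutation matrix
print(is_stochastic(A), is_stochastic(P))        # True True

# Exercise 22.1.4: a product of stochastic matrices is stochastic
print(is_stochastic(A @ A))                      # True
# Exercise 22.1.5(a): A @ 1 = 1
print(np.allclose(A @ np.ones(2), np.ones(2)))   # True
```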


22.2 Finite Markov Chains

A finite Markov Chain is a stochastic process defined by a finite number of states and constant transition probabilities. We consider the trajectory of a particle that can move between states at discrete time steps with the given transition probabilities. In this section, we denote by Ω = [n] the finite set of states.

FIGURE HERE

This is described more formally below.

Definition 22.2.1 (Finite Markov Chain). Let Ω = [n] be a set of states. We denote by Xt ∈ Ω the position of the particle at time t. The transition probability pij is the probability that the particle moves to state j at time t + 1, given that it is in state i at time t, i.e.,
pij = P(X_{t+1} = j | X_t = i). (22.1)
In particular, each pij ≥ 0 and Σ_{j=1}^{n} pij = 1 for every i. The infinite sequence (X0, X1, X2, ...) is a stochastic process.

Definition 22.2.2 (Transition matrix). The transition matrix corresponding to our finite Markov Chain is the matrix T = (pij).

Fact 22.2.3. The transition matrix T of a finite Markov Chain is a stochastic matrix, and every stochastic matrix is the transition matrix of a finite Markov Chain.

Notation 22.2.4 (r-step transition probability). The r-step transition probability from state i to state j, denoted p_ij^(r), is defined by
p_ij^(r) = P(X_{t+r} = j | X_t = i). (22.2)

Exercise 22.2.5 (Evolution of Markov Chains, I). Let T = (pij) be the transition matrix corresponding to a finite Markov Chain. Show that T^r = (p_ij^(r)), where p_ij^(r) is the r-step transition probability (R Notation 22.2.4).

Definition 22.2.6. Let q_{t,i} be the probability that the particle is in state i at time t. We define q_t = (q_{t,1}, ..., q_{t,n}) to be the distribution of the particle at time t.

Fact 22.2.7. For every t ≥ 0, Σ_{i=1}^{n} q_{t,i} = 1.

Exercise 22.2.8 (Evolution of Markov Chains, II). Let T be the transition matrix of a finite Markov Chain. Show that q_{t+1} = q_t T and conclude that q_t = q_0 T^t.

Definition 22.2.9 (Stationary distribution). The probability distribution q is a stationary distribution if q = qT, i.e., if q is a left eigenvector (R Def. 8.1.17) with eigenvalue 1.

Proposition 22.2.10. Let A ∈ Mn(R) be a stochastic matrix. Show that A has a left eigenvector with eigenvalue 1.

The preceding proposition in conjunction with the next theorem shows that every stochastic matrix has a stationary distribution. Recall (R Def. 22.1.1) that a nonnegative matrix is a matrix whose entries are all nonnegative; a nonnegative vector is defined similarly. Do not prove the following theorem.

Theorem 22.2.11 (Perron-Frobenius). Every nonnegative square matrix has a nonnegative eigenvector.

Corollary 22.2.12. Every finite Markov Chain has a stationary distribution.

Proposition 22.2.13. If T is the transition matrix of a finite Markov Chain and lim_{r→∞} T^r = L exists, then every row of L is a stationary distribution.
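The evolution law of Exercise 22.2.8 and the stationary distribution of Definition 22.2.9 can be explored numerically. A minimal sketch (NumPy assumed), using the 2 × 2 stochastic matrix from Examples 22.1.3(c):

```python
import numpy as np

T = np.array([[0.8, 0.2],
              [0.3, 0.7]])       # transition matrix of a two-state chain

# Evolution (Ex. 22.2.8): q_t = q_0 T^t
q = np.array([1.0, 0.0])
for _ in range(50):
    q = q @ T
print(np.round(q, 6))            # approaches the stationary distribution

# Stationary distribution (Def. 22.2.9): a left eigenvector with eigenvalue 1,
# i.e., a right eigenvector of T^T.
w, V = np.linalg.eig(T.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()                   # normalize to a probability distribution
print(np.round(pi, 6))           # [0.6, 0.4] for this T
print(np.allclose(pi @ T, pi))   # True: pi = pi T
```

For this chain the iterated distribution converges to the stationary distribution (0.6, 0.4), illustrating Proposition 22.2.13.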

In order to determine which Markov Chains have transition matrices that converge, we study the directed graphs associated with finite Markov Chains.

22.3 Digraphs

TO BE WRITTEN.

22.4 Digraphs and Markov Chains

Every finite Markov Chain has an associated digraph, where i → j if and only if pij ≠ 0.

FIGURE HERE

Definition 22.4.1 (Irreducible Markov Chain). A finite Markov Chain is irreducible if its associated digraph is strongly connected.

Proposition 22.4.2. If T is the transition matrix of an irreducible finite Markov Chain and lim_{r→∞} T^r = L exists, then all rows of L are the same, so rk L = 1.

Definition 22.4.3 (Ergodic Markov Chain). A finite Markov Chain is ergodic if its associated digraph is strongly connected and aperiodic.

The following theorem establishes a sufficient condition under which the probability distribution of a finite Markov Chain converges to the stationary distribution. For irreducible Markov Chains, this is necessary and sufficient.

Theorem 22.4.4. If T is the transition matrix of an ergodic finite Markov Chain, then lim_{r→∞} T^r exists.

Proposition 22.4.5. For irreducible Markov Chains, the stationary distribution is unique.

22.5 Finite Markov Chains and undirected graphs

TO BE WRITTEN.

22.6 Additional exercises

Definition 22.6.1 (Doubly stochastic matrix). A matrix A = (αij) is doubly stochastic if both A and A^T are stochastic, i.e., 0 ≤ αij ≤ 1 for all i, j and every row and column sum is equal to 1.

Examples 22.6.2. The following matrices are doubly stochastic.

(a) (1/n) J_n

(b) Permutation matrices

(c)
\[
\begin{pmatrix}
0.1 & 0.3 & 0.6 \\
0.2 & 0.5 & 0.3 \\
0.7 & 0.2 & 0.1
\end{pmatrix}
\]

Theorem* 22.6.3 (Birkhoff's Theorem). The set of doubly stochastic matrices is the convex hull (R Def. 5.3.6) of the set of permutation matrices (R Def. 2.4.3). In other words, every doubly stochastic matrix can be expressed as a convex combination (R Def. 5.3.1) of permutation matrices. ♦

Chapter 23

More Chapters

TO BE WRITTEN.

Babai: Discover Linear Algebra. This chapter last updated July 7, 2016 c 2016 L´aszl´oBabai. Prove (a) and (b) together. Prove that the sub- space defined by (b) satisfies the definition.

1.2.15 R Exercise R Solution R Preceding exercise.

Chapter 24 1.2.16 R Exercise R Solution Show that span(T ) ≤ span(S).

Hints 1.2.18 R Exercise R Solution

It is clear that U1 + U2 ⊆ span(U1 ∪ U2). For the reverse inclusion you need to show that a linear combination from U1 ∪ U2 can always be 24.1 Column Vectors written as a sum u1 + u2 for some u1 ∈ U1 and u2 ∈ U2. R R 1.1.17 Exercise Solution 1.3.9 R Exercise R Solution (b) Consider the sums of the entries in each You need to find α1, α2, α3 such that column vector.  1  4 −2 0 1.2.7 R Exercise R Solution α1 −3 + α2 2 + α3 −8 = 0 . Show that the sum of two vectors of zero weight 2 1 3 0 has zero weight and that scaling a zero-weight vector produces a zero-weight vector. This gives you a system of linear equations. Find a nonzero solution. 1.2.8 R Exercise R Solution n 1.3.11 R Exercise R Solution What are the subspaces of R which are P spanned by one vector? Assume αivi = 0. What condition on αj allows you to express vj in terms of the other 1.2.9 R Exercise R Solution vectors? (c) The “if” direction is trivial. For the “only 1.3.13 R Exercise R Solution if” direction, begin with vectors w1 ∈ W1 \ W2 and w2 ∈ W2 \ W1. Where is w1 + w2? What does the empty sum evaluate to?

1.2.12 R Exercise R Solution 1.3.15 R Exercise R Solution

Babai: Discover Linear Algebra. 156 This chapter last updated August 18, 2016 c 2016 L´aszl´oBabai. 24.1. COLUMN VECTORS 157

Find a nontrivial linear combination of v and by some wi1 , then v2 by some wi2 , etc. So in v that evaluates to 0. the end, wi1 ,..., wik are linearly independent.

1.3.16 R Exercise R Solution 1.3.45 R Exercise R Solution The only linear combinations of the list (v) are Use the standard basis of Fk and the preceding of the form αv for α ∈ F. exercise. 1.3.17 R Exercise R Solution 1.3.46 R Exercise R Solution

R Ex. 1.3.13. It is clear that rk(v1,..., vk) ≤ dim(span(v1,..., vk) (why?). For the other R R 1.3.21 Exercise Solution direction, first assume that the vi are linearly P If αivi = 0 and not all the αi are 0, then it independent, and show that span(v1,..., vk) must be the case that αk+1 6= 0 (why?). cannot contain a list of more than k linearly independent vectors. 1.3.22 R Exercise R Solution R Prop. 1.3.11 1.3.51 R Exercise R Solution Show that if this list were not linearly inde- R R 1.3.23 Exercise Solution pendent, then U1 ∩ U2 would contain a nonzero What is the simplest nontrivial linear combi- vector. nation of such a list which evaluates to zero? 1.3.52 R Exercise R Solution 1.3.24 R Exercise R Solution R Ex. 1.3.51. Combine Prop. 1.3.15 and Fact 1.3.20. 1.3.53 R Exercise R Solution R R 1.3.27 Exercise Solution Start with a basis of U1 ∩ U2. Begin with a basis. How can you add one more vector that will satisfy the condition? 1.4.9 R Exercise R Solution Only the zero vector. Why? 1.3.41 R Exercise R Solution Suppose not. Use Lemma 1.3.21 to show that 1.4.10 R Exercise R Solution span(w1,..., wm) ≤ span(v2,..., vk) and ar- Express 1 · x in terms of the entries of x. rive at a contradiction. 1.5.4 R Exercise R Solution 1.3.42 R Exercise R Solution Begin with a linear combination W of the vec- Use the Steinitz exchange lemma to replace v1 tors in S which evaluates to zero. Conclude 158 CHAPTER 24. HINTS

that all coefficients must be 0 by taking the Consider (A − B)x. dot product of W with each member of S. 2.2.31 R Exercise R Solution 1.6.4 R Exercise R Solution R Prop. 2.2.28. Incidence vectors. 2.2.32 R Exercise R Solution R Prop. 2.2.29 24.2 Matrices 2.2.33 R Exercise R Solution R Prop. 2.2.29 2.2.3 R Exercise R Solution R R Prove only one of these, then infer the other 2.2.38 Exercise Solution using the transpose (R Ex. 2.2.2). What is the (i, i) entry of AB?

2.2.17 R Exercise R Solution 2.2.39 R Exercise R Solution (d3) The size is the sum of the squares of the R Preceding exercise. entries. This is the Frobenius norm of the ma- 2.2.41 R Exercise R Solution trix. If Tr(AB) = 0 for all A, then B = 0. 2.2.19 R Exercise R Solution 2.3.2 R Exercise R Solution Show, by induction on k, that the (i, j) entry Induction on k. Use the preceding exercise. of Ak is zero whenever j ≤ i + k − 1. 2.3.4 R Exercise R Solution 2.2.27 R Exercise R Solution Induction on the degree of f. Write B as a sum of matrices of the form [0 | · · · | 0 | bi | 0 | · · · | 0] and then use 2.3.6 R Exercise R Solution distributivity. Observe that the off-diagonal entries do not af- fect the diagonal entries. 2.2.28 R Exercise R Solution The i-th entry of Ax is the dot product of the 2.3.7 R Exercise R Solution i-th row of A with x. Induction on k. Use the preceding exercise.

2.2.29 R Exercise R Solution 2.5.1 R Exercise R Solution Apply the preceding exercise to B = I. R Ex. 2.2.2

2.2.30 R Exercise R Solution 2.5.5 R Exercise R Solution 24.3. MATRIX RANK 159

R Ex. 2.2.14. Consider the k ×r submatrix formed by taking r linearly independent columns. 2.5.6 R Exercise R Solution Multiply by the matrix with a 1 in the (i, j) 3.5.9 R Exercise R Solution position and 0 everywhere else. Extend a basis of U to a basis of Fn, and con- 2.5.7 R Exercise R Solution sider the action of the matrix A on this basis.

3.6.7 R Exercise R Solution (a) Trace. R Ex. 2.2.19.

2.5.8 R Exercise R Solution Incidence vectors (R Def. 1.6.2). 24.4 Theory of Systems of 2.5.13 R Exercise R Solution Linear Equations I: (b) That particular circulant matrix is the one corresponding to the cyclic permuta- Qualitative Theory tion of all elements, i. e., the matrix C = C(0, 1, 0, 0,..., 0). 24.5 Affine and Convex Combinations 24.3 Matrix Rank

R R 3.1.5 R Exercise R Solution 5.1.14 Exercise Solution Consider the matrix [A | B] (the k × 2n matrix (b) Let v0 ∈ S and let U = {s − v0 | s ∈ S}. obtained by concatenating the columns of B Show that onto A), and compare its column rank to the quantities above. (i) U ≤ Fn; 3.3.2 R Exercise R Solution (ii) S is a translate of U. Elementary column operations do not change the column space of a matrix. For uniqueness, you need to show that if U + 3.3.12 R Exercise R Solution u = V + v = S, then U = V . 160 CHAPTER 24. HINTS

24.6 The Determinant Use Ex. 8.1.9.

8.1.24 R Exercise R Solution 6.2.22 R Exercise R Solution Use Rank-Nullity. Identity matrix. 8.5.6 R Exercise R Solution 6.2.23 R Exercise R Solution R Prop. 2.3.4 Consider the n! term expansion of the deter- minant (Eq. (6.20)). What happens with the contributions of σ and σ−1? What if σ = σ−1? 24.9 Orthogonal Matrices What if σ has a fixed point? 6.3.7 R Exercise R Solution 24.10 The Spectral Theo- Let fen denote the number of fixed-point-free rem even permutations of the set {1, . . . , n}, and let fon denote the number of fixed-point-free odd 24.11 Bilinear and Quadratic permutations of the set [n]. Write fon − fen as the determinant of a familiar matrix. Forms 6.4.6 R Exercise R Solution det A = ±1. 11.1.5 R Exercise R Solution Let e ,..., e be the standard basis of n 6.4.7 R Exercise R Solution 1 n F (R Def. 1.3.34). Let αi = f(ei). Show that R Ex. 6.2.23 T a = (α1, . . . , αn) works. R R 24.7 Theory of Systems of 11.1.10 Exercise Solution Generalize the hint to Theorem 11.1.5. Linear Equations II: R R Cramer’s Rule 11.1.13 Exercise Solution (c) Skew symmetric matrices. R Ex. 6.2.21

24.8 Eigenvectors and 11.3.3 R Exercise R Solution Eigenvalues Give a very simple expression for B in terms of A.

(b) R Exercise R Solution 11.3.9 R Exercise R Solution 24.12. COMPLEX MATRICES 161

Use the fact that λ is real. 24.13 Matrix Norms

11.3.15 R Exercise R Solution Interlacing. 13.1.2 R Exercise R Solution kAxk R R Show that max exists. Why does this 11.3.23 Exercise Solution n x∈R kxk (b) Triangular matrices. kxk=1 suffice?

24.12 Complex Matrices 13.1.13 R Exercise R Solution n−k Use the fact that fAB = fBA · t (R Prop. 8.4.16). 12.2.8 R Exercise R Solution

(a) 24.14 Preliminaries (b) Prove that your example must have rank 24.15 Vector Spaces 1.

12.4.9 R Exercise R Solution 15.3.14 R Exercise R Solution Induction on n. Use the fact that a polynomial of degree n has at most n roots. 12.4.15 R Exercise R Solution R Prop. 12.4.6. 15.4.9 R Exercise R Solution R Ex. 14.4.59. 12.4.19 R Exercise R Solution R Theorem 12.4.18 15.4.10 R Exercise R Solution 12.5.4 R Exercise R Solution R Ex. 16.4.10.

Pick λ0, . . . , λn−1 to be complex numbers with T 15.4.13 R Exercise R Solution unit absolute value. Let w = (λ0, . . . , λn−1) . √1 Start with a basis of U1 ∩ U2. Prove that the entries of n F w generate a uni- tary circulant matrix, and that all unitary cir- R R culant matrices are of this form. The λi are the 15.4.14 Exercise Solution eigenvalues of the resulting circulant matrix. It is an immediate result of Cor. 15.4.3 that 162 CHAPTER 24. HINTS

15.4.18 R Exercise R Solution
Their span is not changed.

24.16 Linear Maps

16.2.6 R Exercise R Solution
Coordinate vector.

16.3.7 R Exercise R Solution
Let e_1, . . . , e_k be a basis of ker(ϕ). Extend this to a basis e_1, . . . , e_n of V, and show that ϕ(e_{k+1}), . . . , ϕ(e_n) is a basis for im(ϕ).

16.4.32 R Exercise R Solution
(a) R Theorem 11.1.5.
(b)
(c) R Prop. 16.4.14.
(d)

16.6.7 R Exercise R Solution
Write A = [ϕ]_old and A′ = [ϕ]_new. Show that for all x ∈ F^n (n = dim V), we have A′x = T^{−1}ASx.

24.17 Block Matrices

24.18 Minimal Polynomials of Matrices and Linear Transformations

24.19 Euclidean Spaces

19.1.6 R Exercise R Solution
Consider the quadratic polynomial f(t) = ‖tv + w‖^2. Note that f(t) ≥ 0 for all t. What does this say about the discriminant of f? (A worked version of this computation appears at the end of this section.)

19.1.7 R Exercise R Solution
Derive this from the Cauchy-Schwarz inequality.

19.1.21 R Exercise R Solution

19.3.20 R Exercise R Solution
R Prop. 19.1.21.

19.4.19 R Exercise R Solution
f′(0) = 0.

19.4.22 R Exercise R Solution
Define U = {v_0}^⊥, and for each u ∈ U \ {0}, consider the function f_u : R → R defined by f_u(t) = R_ϕ(v_0 + tu). Apply the preceding exercise to this function.
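For 19.1.6, the computation the hint points to runs as follows (a standard derivation, spelled out here for convenience): for all t ∈ R,
\[ 0 \le f(t) = \|tv + w\|^2 = t^2\|v\|^2 + 2t\,(v \cdot w) + \|w\|^2, \]
so if v ≠ 0, the discriminant of this quadratic polynomial satisfies 4(v · w)^2 − 4‖v‖^2‖w‖^2 ≤ 0, i.e., |v · w| ≤ ‖v‖ ‖w‖, which is the Cauchy-Schwarz inequality.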

24.20 Hermitian Spaces

20.1.19 R Exercise R Solution
Derive this from the Cauchy-Schwarz inequality.

20.1.32 R Exercise R Solution
R Theorem 11.1.5.

20.2.2 R Exercise R Solution
For an eigenvalue λ, you need to show that λ̄ = λ. The proof is just one line: consider the expression x*Ax. (A sketch of this line appears at the end of this section.)

20.2.5 R Exercise R Solution
Which diagonal matrices are Hermitian?

20.4.1 R Exercise R Solution
R Prop. 20.1.32.
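For 20.2.2, a minimal sketch of that one line (our phrasing; here x is an eigenvector of the Hermitian matrix A to the eigenvalue λ, so A* = A and x ≠ 0):
\[ (x^*Ax)^* = x^*A^*x = x^*Ax, \]
so the scalar x*Ax is real; since λ‖x‖^2 = x*Ax and ‖x‖^2 > 0, the eigenvalue λ = x*Ax/‖x‖^2 is real.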

24.21 The Singular Value Decomposition

21.2.7 R Exercise R Solution
It suffices to show that there exists v ∈ ker ψ such that ‖ϕv‖ ≥ σ_{ℓ+1}‖v‖ (why?). Pick v ∈ ker ψ ∩ span(e_1, . . . , e_{ℓ+1}) and show that this works.
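Such a v exists by a dimension count (assuming, as in the setting of this exercise, that rk ψ ≤ ℓ, so dim ker ψ ≥ n − ℓ): since the two subspaces live in an n-dimensional space,
\[ \dim\bigl(\ker\psi \cap \mathrm{span}(e_1, \dots, e_{\ell+1})\bigr) \ge (n - \ell) + (\ell + 1) - n = 1. \]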

24.22 Finite Markov Chains

22.1.5 R Exercise R Solution
For a general matrix B ∈ M_n(R), what is the i-th entry of B1?

22.2.10 R Exercise R Solution
R Prop. 8.4.15.

22.6.3 R Exercise R Solution
Marriage Theorem.

Chapter 25

Solutions

25.1 Column Vectors

1.1.16 R Exercise R Hint
The solution displays the vector as an explicit linear combination of the given vectors, with coefficients 2 and −6.

25.2 Matrices

25.3 Matrix Rank

25.4 Theory of Systems of Linear Equations I: Qualitative Theory

25.5 Affine and Convex Combinations

25.6 The Determinant

25.7 Theory of Systems of Linear Equations II: Cramer's Rule

25.8 Eigenvectors and Eigenvalues

8.1.5 R Exercise R Hint
Every vector in F^n is an eigenvector of I_n with eigenvalue 1.

8.1.24 R Exercise R Hint
dim U_λ = n − rk(λI − A).
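As a sanity check (our example): for A = I_n and λ = 1,
\[ \dim U_1 = n - \mathrm{rk}(1 \cdot I - I_n) = n - 0 = n, \]
in agreement with the solution of 8.1.5.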

Babai: Discover Linear Algebra. This chapter last updated August 9, 2016. © 2016 László Babai.

25.9 Orthogonal Matrices

25.10 The Spectral Theorem

10.2.2 R Exercise R Hint
\[
v^T A v = \left( \sum_{i=1}^{n} \alpha_i b_i \right)^{\!T} \! A \left( \sum_{i=1}^{n} \alpha_i b_i \right)
= \left( \sum_{i=1}^{n} \alpha_i b_i \right)^{\!T} \left( \sum_{i=1}^{n} \alpha_i A b_i \right)
= \left( \sum_{i=1}^{n} \alpha_i b_i^T \right) \left( \sum_{i=1}^{n} \alpha_i \lambda_i b_i \right)
= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j \, b_j^T b_i
= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j \, (b_j \cdot b_i)
= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \alpha_i \alpha_j \, \delta_{ij}
= \sum_{i=1}^{n} \lambda_i \alpha_i^2 .
\]

10.2.4 R Exercise R Hint
First suppose A is positive definite. Let v be an eigenvector of A, with eigenvalue λ. Then, because v ≠ 0, we have
\[ 0 < v^T A v = v^T \lambda v = \lambda\, v^T v = \lambda \|v\|^2 . \]
Hence λ > 0.
Now suppose A is symmetric and all of its eigenvalues are positive. By the Spectral Theorem, A has an orthonormal eigenbasis, say b = (b_1, . . . , b_n), with Ab_i = λ_i b_i for all i. Let v ∈ R^n be a nonzero vector. Then we can write v = \sum_{i=1}^{n} \alpha_i b_i for some scalars α_i. Then
\[ v^T A v = \sum_{i=1}^{n} \lambda_i \alpha_i^2 > 0, \]
so A is positive definite.
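A small illustrative instance (our numbers): the symmetric matrix
\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \]
has eigenvalues 1 and 3, both positive, and correspondingly, for every nonzero v = (x, y)^T,
\[ v^T A v = 2x^2 + 2xy + 2y^2 = x^2 + y^2 + (x + y)^2 > 0 . \]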

25.11 Bilinear and Quadratic Forms

11.3.3 R Exercise R Hint
B = (A + A^T)/2. Verify that this and only this matrix works.
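For instance (illustrative numbers, ours): with
\[ A = \begin{pmatrix} 1 & 4 \\ 0 & 3 \end{pmatrix}, \qquad B = \frac{A + A^T}{2} = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \]
we have x^T A x = x^T B x = x_1^2 + 4x_1x_2 + 3x_2^2 for all x, and B is the unique symmetric matrix with this property.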

25.12 Complex Matrices

25.13 Matrix Norms

25.14 Preliminaries

25.15 Vector Spaces

25.16 Linear Maps

25.17 Block Matrices

25.18 Minimal Polynomials of Matrices and Linear Transformations

18.2.10 R Exercise R Hint
\[ m_A = {\prod_{i=1}^{n}}' \,(t - \lambda_i), \]
where the prime means that we take each factor only once (so eigenvalues λ_i with multiplicity greater than 1 only contribute one factor of t − λ_i).
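For example (ours): if A = diag(2, 2, 3), then the characteristic polynomial is f_A = (t − 2)^2 (t − 3), while
\[ m_A = (t - 2)(t - 3). \]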

25.19 Euclidean Spaces

25.20 Hermitian Spaces

25.21 The Singular Value Decomposition

25.22 Finite Markov Chains

Index

0, 4, 10, 13, 14
1, 4, 13, 20
Affine basis, 44, 44
Affine combination, 42, 42, 45
Affine hull, 42, 42–43, 46
Affine independence, 44, 44
Affine subspace, 40
    of F^k, 42, 42–45
    codimension of, 44
    dimension of, 44, 44
    intersection of, 42, 43
All-ones matrix, see J matrix
All-ones vector, see 1
Basis
    of a subspace of F^k, 11, 11–12
    of F^k, 12, 41
Bilinear form
    nonsingular, 72, 72, 76
    over F^n, 71, 71–72
    representation theorem for, 72
C^n
    orthonormal basis of, 80
Cauchy-Binet formula, 56
Cayley-Hamilton Theorem, 65, 65, 81
    for diagonal matrices, 65
    for diagonalizable matrices, 65
Characteristic polynomial, 63, 63–64
    of nilpotent matrices, 63
    of similar matrices, 65
Codimension, 34
    of an affine subspace, 44
Cofactor, 54
Cofactor expansion, 54, 54–55
    skew, 55
Column space, 28, 32
Column vector, 3, 3–14
    addition of, 4, 4–5
    dot product of, 12, 12–14, 20, 67, 72, 77
    isotropic, 77, 77
    linear combination of, 5, 5–6, 13, 21
    linear dependence of, 8, 8–53
    linear independence of, 8, 8–10, 12, 14, 26, 32, 35, 41, 53, 60
    multiplication by a scalar, 5, 5


    norm of, 79
    orthogonality of, 13, 13–14, 20, 79
        with respect to a bilinear form, 76
    parallel, 10, 10
Column-rank, 28, 28, 32
    full, 28, 33
Companion matrix, 64, 64
Complex conjugate, 78
Complex number, 78, 78
    absolute value of, 78
    imaginary part of, 78
    magnitude of, 78
    norm of, 78
    real part of, 78
    unit norm, 78, 78
Convex combination, 45
Convex hull, 45, 45–46
Convex set, 45, 45–47
    dimension of, 46
    intersection of, 45
Coordinates, 69
Corank
    of a matrix, 35, 35
    of a set, 35
Courant-Fischer Theorem, 70
Cramer's Rule, 57, 57
Determinant, 41, 49, 51, 51–56, 64, 75
    corner, 75, 75
    multilinearity of, 52
    of a matrix product, 52
Diagonalizable matrix, 61, 61
Dimension, 10, 12
    of a convex set, 46
    of an affine subspace, 44, 44
Discrete Fourier Transform, 82, 82
Dot product, 12, 12–14, 20, 67, 72, 77
Dual space
    of F^n, 71
Eigenbasis
    of F^n, 60, 60
    orthonormal, 68, 69, 81
Eigenspace, 60, 60
Eigenvalue
    interlacing of, 70
    of a matrix, 59, 59–61, 63, 74, 75, 83, 84
        algebraic multiplicity of, 63, 63–64
        geometric multiplicity of, 60, 60, 64
        left, 60, 60, 64
    of a real symmetric matrix, 69
Eigenvector
    of a matrix, 59, 59–61
        left, 60
Elementary column operation, 29, 29–32, 35, 53
    and the determinant, 52
Elementary matrix, 30
Elementary row operation, 29, 29–32, 35
Empty list, 6
Empty sum, 4
Euclidean norm, 14, 53, 67, 80
    complex, 79
F^k, 4, 15

    basis of, 12, 41
    dimension of, 12
    standard basis of, 11, 19, 20
    subspace of, 6, 6–8, 45
        codimension of, 34, 34
        dimension of, 10, 10–12
        disjoint, 12
        intersection of, 7
        totally isotropic, 77, 77
        trivial, 7
        union of, 7
F^{k×n}, 15
F[t], 62
Field, 3
    of characteristic 2, 53
First Miracle of Linear Algebra, 11, 11–12
Frobenius norm, 84, 84
    submultiplicativity of, 84
Fundamental Theorem of Algebra, 62
Gauss-Jordan elimination, 31
Gaussian elimination, 31
Generalized Fisher Inequality, 14
Group
    orthogonal, see Orthogonal group
    symmetric, see Symmetric group
    unitary, see Unitary group
Hadamard matrix, 68, 68
Hadamard's Inequality, 53
Half-space, 46, 46
Helly's Theorem, 46, 46–47
Hermitian dot product, 79, 79–80
Hyperplane, 44, 44–45
    intersection of, 45
    linear, see Linear hyperplane
I, see Identity matrix
Identity matrix, 19, 19, 33, 59
Incidence vector, 14, 14
J matrix, 16, 20
Jordan block, 27
k-cycle, 50, 51
Kronecker delta, 19
ℓ² norm, 14, 79
Linear combination
    affine, see Affine combination
    of column vectors, 5, 5–6, 8, 13, 21
    trivial, 8
Linear dependence
    of column vectors, 8, 8–53
Linear form
    over F^n, 71, 71, 73
Linear hyperplane, 44
    intersection of, 45
Linear independence
    maximal, 11, 11
    of column vectors, 8, 8–10, 12, 14, 26, 32, 35, 41, 53, 60
List, 8
    concatenation of, 8
    empty, 9
List of generators, 10, 11
M_n(F), 15

Matrix, 15, 15–27
    addition of, 17, 17
    adjugate of, 55
    augmented, 40
    block, 124, 124
        multiplication of, 125
    block-diagonal, 124, 124–125
    block-triangular, 124, 125, 126
    characteristic polynomial of, 63, 63–64
    circulant, 26, 27, 82
    commutator of, 26, 26
    conjugate-transpose of, 78, 78–79
    corank of, 35, 35
    corner, 75, 75
    definiteness of, 74, 74–75
    diagonal, 16, 16, 19, 22, 26, 59, 74, 80
        determinant of, 51
    diagonalizable, 61, 61, 81
    doubly stochastic, 153, 153–154
    exponentiation, 19, 19, 20, 22, 23, 125, 126
    Hermitian, 80, 80–81
    Hermitian adjoint of, 78, 78–79
    identity, see Identity matrix
    indefinite, 74
    integral, 15, 26
    inverse, 33, 33–34, 41, 56
    invertibility of, 33, 33–34, 41, 56, 57
    kernel of, 35, 39
    left inverse of, 33, 33–34, 41
    lower triangular, 16, 16
    multiplication, 18, 18–22
        and the determinant, 52
        associativity of, 18
        distributivity of, 18
    multiplication by a scalar, 17, 18
    negative definite, 74
    negative semidefinite, 74
    nilpotent, 20, 20, 60, 61
        characteristic polynomial of, 63
    nonnegative, 151
    nonsingular, 32, 32–33, 36, 40, 41, 56, 57, 60, 72
    normal, 81, 81–82, 84
    null space of, 35, 39, 39
    nullity of, 39
    orthogonal, 67, 67–68, 83, 84
    orthogonally similar, 67, 67–69
    permutation, see Permutation matrix
    positive definite, 69, 70, 74, 83
    positive semidefinite, 74
    rank, 32, 32–37, 79
        full, 32
    right inverse of, 33, 33–34, 41
    rotation, see Rotation matrix
    scalar, 19, 26, 60
    self-adjoint, 80, 80–81
    similarity of, 61, 61, 65
    skew-symmetric, 53
    square, 15, 17, 32
    stochastic, 83, 151, 151
    strictly lower triangular, 16, 16
    strictly upper triangular, 16, 16, 20
    symmetric, 17, 17, 26, 67–69, 83, 84
    trace of, 21, 21–22
    transpose, 17, 17, 18, 26
        determinant of, 51
    unitary, 80, 80–82
    unitary similarity of, 81, 81–82

    upper triangular, 16, 16, 23–24, 31
        determinant of, 51, 52
    Vandermonde, 82
    zero, see Zero matrix
Modular equation, 12
    for codimension, 34
Monomial
    multivariate, 72, 72
        degree of, 72
        monic, 72
        monic part of, 72
Multilinearity, 52
    of the determinant, 52
Norm
    of column vectors, 14, 53, 67, 79, 80
Null space
    of a matrix, 35, 39, 39
Nullity
    of a matrix, 39
Operator norm
    of a matrix, 83, 83–84
    submultiplicativity of, 83
Orthogonal group, 67
Orthogonal matrix
    eigenvalues of, 67
Orthogonal system, 14, 79
Orthogonality
    of column vectors, 13, 13–14, 20, 60, 79
    with respect to a bilinear form, 76
Orthogonally similar matrices, 67, 67–69, 76
Orthonormal basis
    of a subspace of R^n, 14
    of C^n, 80
    of R^n, 67, 68
Orthonormal system, 14, 67, 80
P_n(F), 62
Parallelepiped, 54, 54
Parallelogram, 54, 54
Permutation, 24, 24–25, 49, 49–51
    composition of, 24, 24, 49, 50
    cycle decomposition, 50, 51
    disjoint, 50, 50
    even, 49, 49
    fixed point of, 55, 55
    identity, 25
    inverse of, 25, 25
    inversion of, 49, 49–51
    odd, 49
    parity, 49, 50, 51
    product of, 49, 50
    sign, 49, 50
Permutation matrix, 24
Polynomial, 22, 62, 62, 64
    degree of, 62, 62
    divisibility of, 62
    leading coefficient of, 62, 62
    leading term of, 62, 62
    monic, 62
    multivariate, 72, 72–73
        degree of, 73
        homogeneous, 73, 73
        standard form, 73
    of a matrix, 26, 27
    of matrices, 22, 22, 23, 64, 65, 65, 125, 126
    root of, 62, 62

        multiplicity, 62
    zero, 62, 62
Quadratic form, 74, 74–76
    definiteness of, 74, 74–76
    indefinite, 74
    negative definite, 74
    negative semidefinite, 74
    positive definite, 74
    positive semidefinite, 74
    corank of, 35
R^n
    orthonormal basis of, 67, 68
    subspace of, 7
Radon's Lemma, 46
Rank
    of a matrix, 32, 32–37, 79
        full, 32
    of column vectors, 10, 10–12
Rank-Nullity Theorem
    for matrices, 39
Rayleigh quotient, 70, 74
Rayleigh's Principle, 70
Reflection matrix, 84
Rook arrangement, 24, 24
Rotation matrix, 19, 19
Row space, 28
Row vector, 15, 16
Row-rank, 28
    full, 28, 33
S_n, see Symmetric group
Scalar, 3
Scaling, 29, 29
Scaling matrix, 30
Schur's Theorem, 81
    real version, 82
Second Miracle of Linear Algebra, 31, 31
Set
    affine-closed, 42, 42, 44
        intersection of, 42
    closed, 46
    convex, 45, 45–47
        dimension of, 46
        intersection of, 45
    open, 46, 138
    sum of, 8, 8, 12
    translate of, 40, 40, 43
Shearing, 29, 29
Shearing matrix, 30
Similar matrices, 61, 61, 65, 75, 76
Skew cofactor expansion, 55
Span
    of column vectors, 7, 7–8, 10, 41
    transitivity of, 8
Spectral Theorem
    for normal matrices, 82
    for real symmetric matrices, 69, 69–70, 80, 82
        applications of, 69–70
Standard basis of F^k, 11, 19, 20
Standard unit vector, 11, 19
Steinitz exchange lemma, 11
Straight-line segment, 46, 46
Sublist, 9, 9
Submatrix, 26, 32, 33
Subset
    dense, 81, 81
Subspace

    affine, see Affine subspace
    of F^k, 6, 6–8, 45
        basis of, 11, 11–12
        codimension of, 34, 34
        dimension of, 10, 10–12
        disjoint, 12
        intersection of, 7
        totally isotropic, 77, 77
        union of, 7
        zero-weight, 7
Swapping, 29, 29
Swapping matrix, 30
Symmetric group, 49, 50
System of linear equations, 6, 38–41, 48
    augmented, 40
    homogeneous, 39, 43, 53
        solution space of, 39, 39–40
        trivial solution to, 39
    set of solutions to, 40, 43
    solvable, 38, 40–41, 57
Third Miracle of Linear Algebra, 67, 80
Trace
    of a matrix, 21, 21–22, 64
Transposition, 49, 49–50
    neighbor, 50, 50
Triangle inequality, 83, 84
Unitarily similar matrices, 81, 81–82
Unitary group, 80
Unitary matrix
    eigenvalues of, 80
Vandermonde determinant, 55
Vandermonde matrix, 26, 26, 55
Vector
    isotropic, 13, 13
Zero matrix, 16, 21, 33
Zero polynomial, 62, 62
Zero vector, see 0