Linear Algebra
Department of Statistics and Operations Research Refresher Course, Summer 2019

Original Author: Oleg Mayba (UC Berkeley, Department of Statistics, 2006; DMS Grant No 0130526)
Modified By: Eric Lock (UNC, 2010 & 2011), Gen Li (UNC, 2012), Michael Lamm (UNC, 2013), Wen Jenny Shi (UNC, 2014), Meilei Jiang (UNC, 2015 & 2016), Iain Carmichael (UNC, 2017), Adam Waterbury (UNC, 2018), Brendan Brown (UNC, 2019)
Instructor: Brendan Brown (UNC at Chapel Hill)

August 9, 2019

Contents

1 Introduction
2 Vector Spaces
   2.1 Basic Concepts
   2.2 Special Spaces
   2.3 Properties and geometry of inner product spaces
   2.4 Gram-Schmidt Process
   2.5 Comments on infinite-dimensional spaces
   Exercises
3 Matrices and Matrix Algebra
   3.1 Matrix Operations
   3.2 Special Matrices
   3.3 The Four Fundamental Spaces
   3.4 Sample Covariance matrix
   Exercises
4 Projections and Least Squares Estimation
   4.1 Projections
   4.2 Applications to Statistics: Least Squares Estimator
   Exercises
5 Linear functionals, Riesz representation and hyperplane separation
   5.1 Linear functionals
   5.2 Hyperplane separation
6 Matrix Decompositions
   6.1 Determinants
   6.2 Eigenvalues and Eigenvectors
   6.3 Complex Matrices
   6.4 Facts that lead up to the spectral theorem
   6.5 Spectral Theorem
   6.6 Examples: Spectral theory
   Exercises
7 Tensors
   7.1 Exercises
8 Singular Value Decomposition
   8.1 Definition
   8.2 Low rank approximation
   8.3 A few Applications
   8.4 Principal Components Analysis
   8.5 The $25,000,000,000 Eigenvector: the Linear Algebra Behind Google
   Exercises
9 Matrix functions and differentiation
   9.1 Basics
   9.2 Jacobian and Chain Rule
   9.3 Matrix functions
   Exercises
10 Computation
   10.1 Power method
   10.2 Gradient Descent
11 Statistics: Random Variables
   11.1 Expectation, Variance and Covariance
   11.2 Distribution of Functions of Random Variables
   11.3 Derivation of Common Univariate Distributions
   11.4 Random Vectors: Expectation and Variance
   Exercises
12 Further Applications to Statistics: Normal Theory and F-test
   12.1 Bivariate Normal Distribution
   12.2 F-test
   Exercises
13 References
1 Introduction

These notes are intended for use in the warm-up camp for incoming UNC STOR graduate students. Welcome to Carolina!

We assume that you have taken a linear algebra course before and that most of the material in these notes will be a review of what you already know. If some of the material is unfamiliar, do not be intimidated! We hope you find these notes helpful. If not, you can consult the references listed at the end, or any other textbook of your choice, for more information or another style of presentation (most of the proofs in the linear algebra part have been adapted from Strang, the proof of the F-test from Montgomery et al., and the proof of the bivariate normal density from Bickel and Doksum).

Linear algebra is an important and fundamental mathematical tool for probability, statistics, numerical analysis and operations research. Much of the material in these notes will show up in your future study and research. There will be eight algebra classes in total. In the second week we may cover some topics in computational linear algebra with applications to statistics and machine learning (PCA, PageRank, spectral clustering, etc.). See the notes and course schedule on the UNC STOR bootcamp webpage.

2 Vector Spaces

A set $V$ is a vector space over $\mathbb{R}$ (or any field) if it comes with two operations defined on it:

1. Vector addition, which assigns to each pair of vectors $v_1, v_2 \in V$ another vector $w \in V$ (we write $v_1 + v_2 = w$)
2. Scalar multiplication, which assigns to each vector $v \in V$ and each scalar $r \in \mathbb{R}$ (the field) another vector $w \in V$ (we write $rv = w$)

The elements of $V$ are called vectors and must satisfy the following 8 conditions, $\forall\, v_1, v_2, v_3 \in V$ and $\forall\, r_1, r_2 \in \mathbb{R}$:

1. Commutativity of vector addition: $v_1 + v_2 = v_2 + v_1$
2. Associativity of vector addition: $(v_1 + v_2) + v_3 = v_1 + (v_2 + v_3)$
3. Identity element of vector addition: $\exists$ a vector $0 \in V$ s.t. $v + 0 = v$, $\forall\, v \in V$
4. Inverse elements of vector addition: $\forall\, v \in V$ $\exists\, w = -v \in V$ s.t. $v + w = 0$
5. Compatibility of scalar multiplication with (field) multiplication: $r_1(r_2 v) = (r_1 r_2)v$, $\forall\, v \in V$
6. Distributivity of scalar multiplication with respect to (field) addition: $(r_1 + r_2)v = r_1 v + r_2 v$, $\forall\, v \in V$
7. Distributivity of scalar multiplication with respect to vector addition: $r(v_1 + v_2) = r v_1 + r v_2$, $\forall\, r \in \mathbb{R}$
8. Identity element of scalar multiplication: $1v = v$, $\forall\, v \in V$

Vector spaces over fields other than $\mathbb{R}$ are defined similarly, with the multiplicative identity of the field replacing $1$. We won't concern ourselves with those spaces, except for when we need complex numbers later on. Also, we'll be using the symbol $0$ to designate both the number $0$ and the vector $0$ in $V$; you should always be able to tell the difference from the context. Sometimes we'll emphasize that we're dealing with, say, the $n \times 1$ vector $0$ by writing $0_{n \times 1}$.

Here are some examples of vector spaces:

1. $\mathbb{R}^n$ with the usual operations of element-wise addition and scalar multiplication. An example of these operations in $\mathbb{R}^2$ is illustrated in Figure 1.
2. The vector space $F_{[-1,1]}$ of all functions defined on the interval $[-1, 1]$, where we define $(f + g)(x) = f(x) + g(x)$ and $(rf)(x) = r f(x)$.

[Figure 1: Vector Addition and Scalar Multiplication]
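The two defining operations are easy to experiment with numerically. Below is a minimal sketch (added here for illustration; not part of the original notes) that reproduces the $\mathbb{R}^2$ operations of Figure 1 with NumPy arrays and spot-checks several of the 8 conditions; the particular vectors and scalars are arbitrary choices.

```python
import numpy as np

# Vectors in R^2, as in Figure 1; the specific values are arbitrary.
v = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])
r1, r2 = 2.0, -0.5  # arbitrary scalars

# The two vector-space operations: element-wise addition and scalar multiplication.
print("v + w =", v + w)   # vector addition
print("2w    =", 2 * w)   # scalar multiplication

# Spot-check a few of the 8 conditions (they hold exactly in R^n;
# with floats, equality can be off by rounding, hence allclose).
assert np.allclose(v + w, w + v)                    # 1. commutativity
assert np.allclose(v + np.zeros(2), v)              # 3. identity element of addition
assert np.allclose(r1 * (r2 * v), (r1 * r2) * v)    # 5. compatibility
assert np.allclose((r1 + r2) * v, r1 * v + r2 * v)  # 6. distributivity over field addition
assert np.allclose(r1 * (v + w), r1 * v + r1 * w)   # 7. distributivity over vector addition
```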
2.1 Basic Concepts

Subspace and span. We say that $S \subset V$ is a subspace of $V$ if $S$ is closed under vector addition and scalar multiplication, i.e.

1. $\forall\, s_1, s_2 \in S$, $s_1 + s_2 \in S$
2. $\forall\, s \in S$, $\forall\, r \in \mathbb{R}$, $rs \in S$

You can verify that if those conditions hold, $S$ is a vector space in its own right (it satisfies the 8 conditions above). Note also that $S$ has to be non-empty; the empty set is not allowed as a subspace.

Examples:

1. The subset $\{0\}$ is always a subspace of a vector space $V$.
2. Given a set of vectors $S \subset V$, $\mathrm{span}(S) = \{w : w = \sum_{i=1}^n r_i v_i,\ r_i \in \mathbb{R} \text{ and } v_i \in S\}$, the set of all linear combinations of elements of $S$ (see below for the definition), is a subspace of $V$.
3. $S = \{(x, y) \in \mathbb{R}^2 : y = 0\}$ is a subspace of $\mathbb{R}^2$ (the x-axis).
4. The set of all continuous functions defined on the interval $[-1, 1]$ is a subspace of $F_{[-1,1]}$.

For all of the above examples, you should check for yourself that they are in fact subspaces.

Given vectors $v_1, v_2, \ldots, v_n \in V$, we say that $w \in V$ is a linear combination of $v_1, v_2, \ldots, v_n$ if for some $r_1, r_2, \ldots, r_n \in \mathbb{R}$ we have $w = \sum_{i=1}^n r_i v_i$. If every vector in $V$ is a linear combination of $S = \{v_1, v_2, \ldots, v_n\}$, so that $\mathrm{span}(S) = V$, then we say $S$ spans $V$.

Fact: A set $S \subseteq V$ is a subspace if and only if it is closed under linear combinations.

Linear independence and dependence. Given vectors $v_1, v_2, \ldots, v_n \in V$, we say that $v_1, v_2, \ldots, v_n$ are linearly independent if $\sum_{i=1}^n r_i v_i = 0 \implies r_1 = r_2 = \ldots = r_n = 0$, i.e. the only linear combination of $v_1, v_2, \ldots, v_n$ that produces the $0$ vector is the trivial one. We say that $v_1, v_2, \ldots, v_n$ are linearly dependent otherwise. (A numerical sketch of span and linear independence appears at the end of this section.)

We now prove two results that will be used later and give some practice with linear algebra proofs.

Theorem: Let $I, S \subset V$ be such that $I$ is linearly independent and $S$ spans $V$. Then for every $x \in I$ there exists a $y \in S$ such that $\{y\} \cup I \setminus \{x\}$ is linearly independent.

Proof: We prove this result by contradiction. First, here are two facts that can be easily verified from the definitions above:

1. If a set $J \subset V$ is linearly independent, then $J \cup \{y\}$ is linearly dependent if and only if $y \in \mathrm{span}(J)$.
2. If $S, T \subset V$ with $T \subset \mathrm{span}(S)$, then $\mathrm{span}(T) \subset \mathrm{span}(S)$.

Assume for the sake of contradiction that the claim does not hold, i.e. suppose there exists an $x \in I$ such that for all $y \in S$, $\{y\} \cup I \setminus \{x\}$ is linearly dependent. Let $I' = I \setminus \{x\}$. Since $I$ is linearly independent, it follows that $I'$ is also linearly independent. Then by the first fact above, the fact that $\{y\} \cup I'$ is linearly dependent implies $y \in \mathrm{span}(I')$. Moreover, this holds for all $y \in S$, so $S \subset \mathrm{span}(I')$. By the second fact we then have that $\mathrm{span}(S) \subset \mathrm{span}(I')$. Now, since $S$ spans $V$, it follows that $x \in V = \mathrm{span}(S) \subset \mathrm{span}(I') = \mathrm{span}(I \setminus \{x\})$. This means there exist $v_1, v_2, \ldots, v_n \in I \setminus \{x\}$ and $r_1, r_2, \ldots, r_n \in \mathbb{R}$ such that $0 = x - \sum_{i=1}^n r_i v_i$, a nontrivial linear combination of elements of $I$ that equals $0$, contradicting the assumption that $I$ is linearly independent.

Corollary: Let $I, S \subset V$ be such that $I$ is linearly independent and $S$ spans $V$. Then $|I| \leq |S|$, where $|\cdot|$ denotes the number of elements of a set (possibly infinite).

Proof: If $|S| = \infty$ then the claim holds.
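As a complement to the definitions above, here is a minimal computational sketch (added for illustration; not part of the original notes) of checking linear independence and span membership in $\mathbb{R}^m$. It uses the standard fact that $v_1, \ldots, v_n$ are linearly independent exactly when the matrix with columns $v_1, \ldots, v_n$ has rank $n$, and that $w \in \mathrm{span}(v_1, \ldots, v_n)$ exactly when appending $w$ as an extra column leaves the rank unchanged. The helper names `linearly_independent` and `in_span` are ours; `numpy.linalg.matrix_rank` computes a numerical rank, so this is reliable for small, well-scaled examples like the ones below.

```python
import numpy as np

def linearly_independent(vectors):
    """True if the given vectors (1-D arrays of equal length) are linearly independent."""
    A = np.column_stack(vectors)
    # Independent iff the column rank equals the number of vectors.
    return np.linalg.matrix_rank(A) == len(vectors)

def in_span(w, vectors):
    """True if w is a linear combination of the given vectors."""
    A = np.column_stack(vectors)
    # Appending w as a column leaves the rank unchanged iff w is in the span.
    return np.linalg.matrix_rank(np.column_stack([A, w])) == np.linalg.matrix_rank(A)

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])

print(linearly_independent([v1, v2]))                # True
print(linearly_independent([v1, v2, v1 + v2]))       # False: v1 + v2 is a nontrivial combination
print(in_span(np.array([2.0, 3.0, 5.0]), [v1, v2]))  # True: equals 2*v1 + 3*v2
print(in_span(np.array([0.0, 0.0, 1.0]), [v1, v2]))  # False
```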