Chapter 1 Introduction to Matrices
Contents

1.0 Introduction
1.1 Basics
1.2 Matrix structures
    Notation
    Common matrix shapes and types
    Matrix transpose and symmetry
1.3 Multiplication
    Vector-vector multiplication
    Matrix-vector multiplication
    Matrix-matrix multiplication
    Matrix multiplication properties
    Using matrix-vector operations in high-level computing languages
    Invertibility
1.4 Orthogonality
1.5 Matrix determinant
1.6 Eigenvalues
    Properties of eigenvalues
    Trace
1.7 Appendix: Fields, Vector Spaces, Linear Transformations

© J. Lipor, January 5, 2021, 14:27 (final version)

1.0 Introduction

This chapter reviews vectors and matrices and basic properties like shape, orthogonality, determinant, eigenvalues, and trace. It also reviews some operations like multiplication and transpose.

1.1 Basics

Why vector?
• Data organization
• To group into matrices (data) or to be acted on by matrices (operations)

Example: Personal attributes: age, height, weight, eye color (?) ...

Example: Digital grayscale image (!)
• A vector in the vector space of 2D arrays.
• I rarely think of a digital image as a "matrix"! (Read the Appendix about general vector spaces.)

Why matrix? Two reasons:
• data array
• linear operation, aka linear map or linear transformation

Example: data matrix: return to personal attributes: person1, person2, ...

Example: linear operation: DFT matrix in 1D, DFT matrix in 2D, unified as a matrix-vector operation:
\[
\underbrace{X}_{N \times 1 \text{ spectrum}}
= \underbrace{W}_{N \times N \text{ DFT matrix}}
\, \underbrace{x}_{N \times 1 \text{ signal}} .
\]
(A small numerical check of this matrix-vector view of the DFT appears after the convolution example below.)

Example (classic): solve a system of linear equations with N equations in N unknowns:
\[
\begin{aligned}
a x_1 + b x_2 + c x_3 &= u \\
d x_1 + e x_2 + f x_3 &= v \\
g x_1 + h x_2 + i x_3 &= w
\end{aligned}
\;\Longrightarrow\;
A x = b, \quad \text{with} \quad
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \quad
b = \begin{bmatrix} u \\ v \\ w \end{bmatrix}.
\]
Certainly the matrix-vector notation Ax = b is more concise (and general).
A traditional linear algebra course would focus extensively on solving Ax = b.
This topic will not be a major focus of this course!

Example: linear operation: convolution in DSP:
\[
y[n] = \sum_{k=0}^{K-1} h[k] \, x[n-k].
\]
The input signal (x[0], ..., x[N-1]) passes through an LTI system with impulse response h[0], ..., h[K-1] to produce the output signal (y[0], ..., y[M-1]).

Matrix-vector representation of convolution: y = Hx, where x is a length-N vector, y is a length-M vector, and H is an M × N matrix with elements H_{mn} = h[m - n].

What is M in terms of N and K? (Choose best answer.)
A: max(N, K) - 1
B: max(N, K)
C: N + K
D: N + K + 1
E: None of these.
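As a concrete check of the matrix-vector view of convolution, here is a small NumPy sketch. It is not part of the original notes: the helper `convolution_matrix` is just an illustrative name, and the output length of the full (untruncated) convolution, M = N + K - 1, is assumed.

```python
import numpy as np

# A small sketch (not part of the notes): build the convolution matrix H with
# entries H[m, n] = h[m - n] (zero otherwise) and check y = H x against a
# direct convolution. The full-convolution output length M = N + K - 1 is
# assumed here.

def convolution_matrix(h, N):
    """Return the M x N matrix H with H[m, n] = h[m - n], where M = N + len(h) - 1."""
    K = len(h)
    M = N + K - 1
    H = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            if 0 <= m - n < K:
                H[m, n] = h[m - n]
    return H

x = np.array([1.0, 2.0, 3.0, 4.0])            # length-N input signal
h = np.array([1.0, -1.0, 0.5])                # length-K impulse response
H = convolution_matrix(h, len(x))
print(H.shape)                                # (M, N)
print(np.allclose(H @ x, np.convolve(h, x)))  # True: y = H x matches the convolution sum
```

Each column of H is a shifted copy of the impulse response, which is why H has the constant-diagonal (Toeplitz) structure implied by H_{mn} = h[m - n].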
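The earlier claim that the DFT spectrum is just X = Wx can be checked the same way. The sketch below is not from the notes; it assumes the unnormalized convention W_{mn} = exp(-j 2 pi m n / N), which matches NumPy's np.fft.fft (the notes may adopt a different scaling of W).

```python
import numpy as np

# A quick check (not from the notes) of the matrix-vector view X = W x,
# assuming W[m, n] = exp(-2j*pi*m*n/N), the convention used by np.fft.fft.

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)  # N x N DFT matrix

rng = np.random.default_rng(0)
x = rng.random(N)                             # length-N signal
X = W @ x                                     # spectrum as a matrix-vector product
print(np.allclose(X, np.fft.fft(x)))          # True
```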
There are other matrix operators for other important linear operations: wavelet transform, DCT, 2D DFT, ...

Machine learning setup

In machine learning, we describe our data using the language of linear algebra. We generally work with a set of N-dimensional vectors x_1, ..., x_p representing measurements or observations of some phenomenon we care about, called feature vectors, whose coordinates represent a set of features/attributes/predictors/covariates. In supervised learning, we also have a set of scalars y_1, ..., y_p called the labels/outputs/responses. When y_i belongs to a finite set of discrete values (e.g., 1, ..., K), we call the problem classification, and when y_i is real valued, we call the problem regression. The goal of (supervised) machine learning is to learn some function f such that f(x_i) ≈ y_i for i = 1, ..., p. The language of matrices and vectors allows us to model this problem!

Example: data matrix

A term-document matrix is used in information retrieval.
Document 1: ECE510 meets on Tuesdays and Thursdays
Document 2: Tuesday is the most exciting day of the week
Document 3: Let us meet next Thursday.
Keywords (terms): ECE510, meet, Tuesday, Thursday, exciting, day, week, next
(Note: use stems; ignore generic words like "the" and "and".)

Term-document (binary) matrix:

Term      Doc1  Doc2  Doc3
ECE510      1     0     0
meet        1     0     1
Tuesday     1     1     0
Thursday    1     0     1
exciting    0     1     0
day         0     1     0
week        0     1     0
next        0     0     1

The entries in a (mostly) numerical table like this are naturally represented by an 8 × 3 matrix T (because each column is a vector in $\mathbb{R}^8$ and there are three columns):
\[
T = \begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}.
\]
Why "mostly"? Column and row labels...

Query as a matrix-vector operation: find all documents relevant to the query "exciting days of the week." The corresponding query vector is
\[
q = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \end{bmatrix}^T.
\]
In matrix terms, find the columns of T that are "close" (this will be defined precisely later) to the query vector q.

This is an information retrieval application expressed using matrix/vector operations.

Example: Networks, Graphs, and Adjacency Matrices

Consider a set of six web pages that are related via web links (outlinks and inlinks) according to a directed graph on nodes 1, ..., 6 [2, Ex. 1.3, p. 7]. (Figure of the directed graph omitted.)

The 6 × 6 adjacency matrix A for this graph has a nonzero value (unity) in element A_{ij} if there is a link from node (web page) j to node i:
\[
A = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix},
\qquad
L = \begin{bmatrix}
0 & 1/3 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 1/3 & 1/2 \\
1/3 & 0 & 0 & 0 & 1/3 & 0 \\
1/3 & 1/3 & 0 & 0 & 0 & 1/2 \\
0 & 0 & 1 & 0 & 1/3 & 0
\end{bmatrix}.
\]
The link graph matrix L is found from the adjacency matrix by normalizing each column with respect to the number of outlinks, so that each column sums to unity. We will see later that Google's PageRank algorithm [3] for quantifying the importance of web pages is related to an eigenvector of L. (There is a left eigenvector with λ = 1, so there must also be a right eigenvector with that eigenvalue, and that is the one of interest.)

So evidently representing web page relationships using a matrix and computing properties of that matrix is useful in many domains, even some that might seem quite unexpected at first.
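To make the column normalization concrete, here is a small NumPy sketch (not part of the notes). It forms L by dividing each nonzero column of A by its column sum and then inspects the eigenvector of L with the largest-magnitude eigenvalue; the actual PageRank algorithm adds refinements (dangling-node handling, damping) that this toy version ignores.

```python
import numpy as np

# A minimal sketch (not part of the notes): form L by dividing each nonzero
# column of A by its column sum (= number of outlinks), then look at the
# eigenvector of L with the largest-magnitude eigenvalue.

A = np.array([
    [0, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

outlinks = A.sum(axis=0)                      # column sums = outlinks per page
L = A / np.where(outlinks == 0, 1, outlinks)  # page 4 has no outlinks; its column stays zero
print(L.sum(axis=0))                          # columns with outlinks sum to unity

vals, vecs = np.linalg.eig(L)
k = np.argmax(np.abs(vals))                   # dominant (largest-magnitude) eigenvalue
v = np.real(vecs[:, k])
print(np.round(np.real(vals[k]), 3))          # dominant eigenvalue of this toy L
print(np.round(v / v.sum(), 3))               # its eigenvector, scaled to sum to one
```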
Example: a simple linear operation

In the preceding examples, the matrix was a 2D array of data. Recall that there were two "whys" for matrices: data and linear operations. Now we turn to an example where the matrix is associated with a linear operation that we apply to vectors.

Consider a polygon P1 defined by N vertex points (x_1, y_1), ..., (x_N, y_N), connected in that order, and where the last point is also connected to the first point, so there are N edges. Define a new polygon P2 where each vertex is the average of the two points along an edge of polygon P1. (Figure omitted: a five-vertex polygon P1 with vertices (x_1, y_1), ..., (x_5, y_5) and the averaged polygon P2 with vertices (x̂_1, ŷ_1), ..., (x̂_5, ŷ_5).)

Mathematically, the new vertex points are linearly related to the old points as:
\[
\hat{x}_1 = \frac{x_1 + x_2}{2}, \qquad \hat{y}_1 = \frac{y_1 + y_2}{2}.
\]
In general:
\[
\begin{bmatrix} \hat{x}_1 \\ \vdots \\ \hat{x}_{N-1} \\ \hat{x}_N \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} x_1 + x_2 \\ \vdots \\ x_{N-1} + x_N \\ x_N + x_1 \end{bmatrix},
\qquad
\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_{N-1} \\ \hat{y}_N \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} y_1 + y_2 \\ \vdots \\ y_{N-1} + y_N \\ y_N + y_1 \end{bmatrix}.
\tag{1.1}
\]
This is a linear operation, so we can write it concisely in matrix-vector form:
\[
\hat{x} = A x, \qquad \hat{y} = A y, \qquad
A \triangleq \frac{1}{2}
\begin{bmatrix}
1 & 1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 1 & 0 & \cdots & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & \cdots & 1
\end{bmatrix}.
\tag{1.2}
\]
The matrix form might not seem more illuminating than the expressions in (1.1) at first. But now suppose we ask the following question: what happens to the polygon shape if we perform this process repeatedly (assuming we do some appropriate normalization)? The answer is not obvious from (1.1), but in matrix form the answer is related to the power iteration that we will discuss later for computing the principal eigenvector of A.
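To see this connection numerically, here is a rough NumPy sketch (not part of the notes). It builds the matrix A of (1.2) for N = 5 and applies it repeatedly with a normalization step, exactly as in a power iteration; for this particular A the principal eigenvector is the constant vector (eigenvalue 1), so every vertex coordinate is pulled toward the average, i.e., the polygon contracts toward its centroid.

```python
import numpy as np

# A rough sketch (not from the notes): build the averaging matrix A of (1.2)
# for N = 5 vertices, then apply it repeatedly to the x-coordinates with a
# normalization step, as in the power iteration mentioned above.

N = 5
A = 0.5 * (np.eye(N) + np.roll(np.eye(N), 1, axis=1))  # row n: 1/2 at n and at (n+1) mod N

rng = np.random.default_rng(0)
x = rng.random(N)              # x-coordinates of an arbitrary polygon
v = x.copy()
for _ in range(50):            # repeated averaging = power iteration on A
    v = A @ v
    v = v / np.linalg.norm(v)  # the "appropriate normalization"

# The iterate approaches the principal eigenvector of A, here the constant
# vector ones/sqrt(N): all normalized coordinates become (nearly) equal.
print(np.round(v, 3))
print(np.allclose(v, np.ones(N) / np.sqrt(N), atol=1e-3))  # True
```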