<<

Linear algebra

Kevin P. Murphy Last updated January 25, 2008

1 Introduction Linear algebra is the study of matrices and vectors. Matrices can be used to represent linear functions1 as well as systems of linear equations. For example,

4x 5x = 13 1 2 (2) 2x −+ 3x =− 9 − 1 2 can be represented more compactly by Ax = b (3) where 4 5 13 A = , b = (4) 2− 3 9  −   −  Linear algebra also forms the the basis of many machine learning methods, such as , PCA, Kalman filtering, etc. Note: Much of this chapter is based on notes written by Zico Kolter, and is used with his permission. 2 Basic notation We use the following notation:

m n By A R × we denote a with m rows and n columns, where the entries of A are real numbers. • ∈ By x Rn, we denote a vector with n entries. Usually a vector x will denote a column vector — i.e., a matrix • with n∈rows and 1 column. If we want to explicitly represent a row vector —a matrixwith 1 rowand n columns — we typically write xT (here xT denotes the transpose of x, which we will define shortly). The ith element of a vector x is denoted x : • i x1 x2 x =  .  . .    x   n    We use the notation a (or A , A , etc) to denote the entry of A in the ith row and jth column: • ij ij i,j a a a 11 12 · · · 1n a21 a22 a2n A =  . . ·. · · .  ......    a a a   m1 m2 · · · mn    1A function f : Rm Rn is called linear if → f(c1x + c2y) = c1f(x)+ c2f(y) (1) for all scalars c1,c2 and vectors x, y. Hence we can predict the output of a linear function in terms of its response to simple inputs.

1 We denote the jth column of A by a or A : • j :,j

A = a|a |a | .  1 2 · · · n  | | |   We denote the ith row of A by aT or A : • i i,: T — a1 — T — a2 — A =  .  . .    — aT —   m   

A is a matrixwhere all non-diagonalelementsare 0. Thisis typically denoted D = diag(d1, d2,...,dn), • with d1 d2 D =  .  (5) ..    d   n   n n The , denoted I R × , is a square matrix with ones on the diagonal and zeros everywhere • ∈ m n else, I = diag(1, 1,..., 1). It has the property that for all A R × , ∈ AI = A = IA (6)

where the size of I is determined by the dimensions of A so that is possible. (We define matrix multiplication below.) A block diagonal matrix is one which contains matrices on its main diagonal, and is 0 everywhere else, e.g., • A 0 (7) 0 B   The unit vector e is a vector of all 0’s, except entry i, which has value 1: • i T ei = (0,..., 0, 1, 0,..., 0) (8)

The vector of all ones is denoted 1. The vector of all zeros is denoted 0.

3 Matrix Multiplication m n n p The product of two matrices A R × and B R × is the matrix ∈ ∈ m p C = AB R × , (9) ∈ where n Cij = AikBkj . (10) Xk=1 Note that in order for the matrix product to exist, the number of columns in A must equal the number of rows in B. There are many ways of looking at matrix multiplication, and we’ll start by examining a few special cases.

2 3.1 Vector-Vector Products Given two vectors x, y Rn, the quantity xT y, called the inner product, dot product or scalar product of the vectors, is a real number∈ given by n xT y = x, y R = x y . (11) h i ∈ i i i=1 X Note that it is always the case that xT y = yT x. Given vectors x Rm, y Rn (they no longer have to be the same size), xyT is called the outer product of the ∈ ∈ T vectors. It is a matrix whose entries are given by (xy )ij = xiyj, i.e.,

x y x y x y 1 1 1 2 · · · 1 n x2y1 x2y2 x2yn T m n xy R × =  . . ·. · · .  . (12) ∈ . . .. .    x y x y x y   m 1 m 2 · · · m n    3.2 Matrix-Vector Products m n n m Given a matrix A R × and a vector x R , their product is a vector y = Ax R . There are a couple ways of looking at matrix-vector∈ multiplication, and∈ we will look at them both. ∈ If we write A by rows, then we can express Ax as,

T T — a1 — a1 x T T — a2 — a2 x y =  .  x =  .  . (13) . .      — aT —   aT x   m   m      T In other words, the ith entry of y is equal to the inner product of the ith row of A and x, yi = ai x. In Matlab notation, we have y = [A(1,:)*x1; ...; A(m,:)*xn] Alternatively, let’s write A in column form. In this case we see that,

x1 | | | x2 y = a1 a2 an  .  = a1 x1 + a2 x2 + . . . + an xn . (14)  · · ·  .         | | |  x     n          In other words, y is a of the columns of A, where the coefficients of the linear combination are given by the entries of x. In Matlab notation, we have y = A(:,1)*x1 + ...+ A(:,n)*xn So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a T T m n m n T row vector. This is written, y = x A for A R × , x R , and y R . As before, we can express y in two obvious ways, depending on whether we express∈ A in terms∈ on its rows or∈ columns. In the first case we express A in terms of its columns, which gives

yT = xT a|a |a | = xT a xT a xT a (15)  1 2 · · · n  1 2 · · · n | | |     which demonstrates that the ith entry of yT is equal to the inner product of x and the ith column of A.

3 Finally, expressing A in terms of rows we get the final representation of the vector-matrix product,

T — a1 — — aT — T 2 y = x1 x2 xn  .  (16) · · · .      — aT —   m    (17) T T T = x1 — a1 — + x2 — a2 — + ... + xn — an — (18) so we see that yT is a linear combination of therows of A, where the coefficients for the linear combination are given by the entries of x. 3.3 Matrix-Matrix Products Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication C = AB as defined at the beginning of this section. First we can view matrix-matrix multiplication as a set of vector-vector products. The most obvious viewpoint, which follows immediately from the definition, is that the i, j entry of C is equal to the innerproductofthe ithrow of A and the jth rowof B. Symbolically, this looks like the following,

T T T T — a1 — a1 b1 a1 b2 a1 bp — aT — aT b aT b · · · aT b 2 | | | 2 1 2 2 · · · 2 p C = AB =  .  b1 b2 bp =  . . . .  . (19) .  · · ·  . . .. .      — aT —  | | |  aT b aT b aT b   m     m 1 m 2 · · · m p      m n n p n n Remember that since A R × and B R × , ai R and bj R , so these inner products all make sense. This is the most “natural” representation∈ when∈ we represent∈ A by rows and∈ B by columns. A special case of this result arises in statistical applications where A = X and B = XT , where X is the n d design matrix, whose rows are the cases. In this case, XXT is an n n matrix of inner products called× the : × xT x xT x 1 1 · · · 1 n T .. XX =  .  (20) xT x xT x  n 1 · · · n n Alternatively, we can represent A by columns, and B by rows, which leads to the interpretation of AB as a sum of outer products. Symbolically,

T — b1 — — bT — n | | |  2  T C = AB = a1 a2 an . = aibi . (21)  · · ·  .   i=1 | | |  — bT —  X    n    Put another way, AB is equal to the sum, over all i, of the outer product of the ith column of A and the ith row of B. Rm Rp T Since, in this case, ai and bi , the dimension of the outer product aibi is m p, which coincides with the dimension of C. ∈ ∈ × If A = XT and B = X, where X is the n d design matrix,wegeta d d matrix which is proportional to the empirical of the data (assuming× it has been centered): ×

x2 x x x x n i,1 i,1 i,2 · · · i,1 i,d T T .. X X = xixi =  .  (22) i=1 i x x x x x2 X X  i,d i,1 i,d i,2 · · · i,d   

4 Second, we can also view matrix-matrix multiplication as a set of matrix-vector products. Specifically, if we represent B by columns, we can view the columns of C as matrix-vector products between A and the columns of B. Symbolically, | | | | | | C = AB = A b b b = Ab Ab Ab . (23)  1 2 · · · p   1 2 · · · p  | | | | | |     Here the ith column of C is given by the matrix-vector product with the vector on the right, ci = Abi. These matrix- vector products can in turn be interpreted using both viewpoints given in the previous subsection. Finally, we have the analogous viewpoint, where we represent A by rows, and view the rows of C as the matrix-vector product between the rows of A and C. Symbolically,

T T — a1 — — a1 B — T T — a2 — — a2 B — C = AB =  .  B =  .  . (24) . .      — aT —   — aT B —   m   m      T T Here the ith row of C is given by the matrix-vector product with the vector on the left, ci = ai B. It may seem like overkill to dissect matrix multiplication to such a large degree, especially when all these view- points follow immediately from the initial definition we gave (in about a line of math) at the beginning of this section. However, virtually all of linear algebra deals with matrix multiplications of some kind, and it is worthwhile to spend some time trying to develop an intuitive understanding of the viewpoints presented here. In addition to this, it is useful to know a few basic properties of matrix multiplication at a higher level:

Matrix multiplication is associative: (AB)C = A(BC). • Matrix multiplication is distributive: A(B + C)= AB + AC. • Matrix multiplication is, in general, not commutative; that is, it can be the case that AB = BA. • 6 3.4 Multiplication by diagonal matrices Sometimes either A or B are diagonal matrices. If we post multiply A by a diagonal matrix B = diag(s), then we just scale each column by the corresponding scale factor in s. In Matlab notation

A * diag(s) = [s1*A(:,1), ..., sn*A(:,n)] A special case of this is if A has just one row, call it aT . Then we are just scaling each element of a by the corre- sponding entry in s. Hence in Matlab notation

a' * diag(s) = [s1*a1, ..., sn*an] = a .* s Similarly, if we pre multiply B by a diagonal matrix A = diag(s), then we just scale each row by the corresponding scale factor in s. In Matlab notation

diag(s) * B = [s1*B(1,:); ...; sm*B(m,:)] A special case of this is if B has just one column, call it b. Then we are just scaling each element of v by the corresponding entry in s. Hence in Matlab notation

diag(s) * v = [s1*v1; ...; sm * vm] = s .* v

5 4 Operations on matrices 4.1 The transpose m n The transpose of a matrix results from “flipping” the rows and columns. Given a matrix A R × , is transpose, written AT , is defined as ∈ T n m T A R × , (A ) = A . (25) ∈ ij ji We have in fact already been using the transpose when describing row vectors, since the transpose of a column vector is naturally a row vector. The following properties of transposes are easily verified:

(AT )T = A • (AB)T = BT AT • (A + B)T = AT + BT • 4.2 The trace of a square matrix n n The trace of a square matrix A R × , denoted tr(A) (or just trA if the parentheses are obviously implied), is the sum of diagonal elements in the∈ matrix: n trA = Aii. (26) i=1 X The trace has the following properties:

n n T For A R × , trA = trA . • ∈ n n For A, B R × , tr(A + B)=trA + trB. • ∈ n n For A R × , t R, tr(tA)= t trA. • ∈ ∈ For A, B such that AB is square, trAB = trBA. • For A, B, C such that ABC is square, • trABC = trBCA = trCAB (27)

and so on for the productof more matrices. This is called the cyclic permutation property of the trace operator.

from the cyclic permutation property, we can derive the trace trick, which rewrites the scalar inner product xT Ax as follows xT Ax = tr(xT Ax)= tr(xxT A) (28) We shall use this result later in the book. The trace of two matrices, tr(AB), is the sum of a vector which contains the inner product of the rows of A and the columns of B. In Matlab notation we have

trace(A B) = sum([a(1,:)*b(:,1), ..., a(n,:)*b(:,n)]) Thus tr(AB) is a way to define the distance between two matrices.

6 4.3 Adjugate (classical adjoint) of a square matrix Below we introduce some rather abstract material that will be necessary for defining matrix determinants and inverse. However, it can be skipped without loss of continuity. This subsection is based on the wikipedia entry.2 Let A be an n n matrix. Let M˜ be the matrix obtained by deletring row i and column j from A. Define × ij M = det M˜ to be the (i, j)’th of A. Define C = 1i+jM to be the (i, j)’th cofactor of A. Finally, ij ij ij − ij define the adjugate or classical adjoint of A to be adj(A) = CT . Thus the adjoint stores the cofactors, but in transposed order. Let us consider a 2 2 example. × a b d b C C adj = = 11 21 (29) c d c− a C C   −   12 22 Now let A be the following matrix: A11 A12 A13 A = A A A (30)  21 22 23 A31 A32 A33 Then its adjugate is   A A A A A A + 22 23 12 13 + 12 13 A32 A33 − A32 A33 A22 A23  

 A21 A23 A11 A13 A11 A13  adj(A)=  +  (31) − A31 A33 A31 A33 − A21 A23         A21 A22 A11 A12 A11 A12  + +   A31 A32 − A31 A32 A21 A22      For example, entry (2,1) is obtained by “crossing out” row 1 and column 2 from A, and computing -1 times the determinant of the resulting submatrix:

A21 A23 adj(A)2,1 = det (32) − A31 A33   4.4 Determinant of a square matrix A square n n matrix A can be viewed as a linear transformation that scales and rotates a vector. The determinant of a square× matrix det(A) = A is a measure of how much it changes a unit volume. The determinant operator | | n n satisfies these properties, where A, B R × ∈ A = AT (33) | | | | AB = A B (34) | | | || | A =0 A issingular (35) | | 1 ⇐⇒ A − = 1/ A (36) | | | | We can define the determinantin terms of an expansionof its cofactors (see Section 4.3) along any row i or column j as follows: n n det A = aij Cij = aij Cij (37) i=1 j=1 X X This is called Laplace’s expansion of the determinant. We see that the determinant is given by the inner product of any row i of A with the corresponding row i of C. Hence

A adj(A) = det(A) I (38)

2http://en.wikipedia.org/wiki/Adjugate_matrix.

7 We will return to this important formula below. We now give some examples. Consider a 2 2 matrix. From Equation 29, and summing along the rows for column 1, we have, × a b det = aC + cC = ad cb (39) c d 11 21 −   Alternatively, summing along the columns for row 2 we have

a b det = cC + dC = cb + da (40) c d 21 22 −   Now consider a 3 3 matrix. From Equation 31, and summing along the rows for column 1, we have ×

A22 A23 A21 A23 A21 A22 det(A) = a11 a12 + a13 (41) A32 A33 − A31 A33 A31 A32

a a a + a a a + a a a = 11 22 33 12 23 31 13 21 32 (42) a a a a a a a a a − 11 23 32 − 12 21 33 − 13 22 31 Note that the definition of the cofactors involves a determinant. We can make this explicit by giving the general (recursive) formula:

n A = ( 1)i+j a M˜ (for any j 1,...,n) | | − ij | i,j | ∈ i=1 Xn = ( 1)i+j a M˜ (for any i 1,...,n) − ij | i,j | ∈ j=1 X

where M˜ i,j is the minor matrix obtained by deleting row i and column j from A. The base case is that A = a11 1 1 n n | | for A R × . If we were to expand this formula completely for A R × , there would be a total of n! (n factorial) different∈ terms. For this reason, we hardly even explicitly write the complete∈ equation of the determinant for matrices bigger than 3 3. Instead, we will see below that there are various numerical methods for computing A . × | | 4.5 The inverse of a square matrix n n 1 The inverse of a square matrix A R × is denoted A− , and is the unique matrix such that ∈ 1 1 A− A = I = AA− . (43)

1 Note that A− only exists if det(A) > 0. If det(A)=0, it is called a singular matrix. n n The following are properties of the inverse; all assume that A, B R × are non-singular: ∈ 1 1 (A− )− = A • 1 1 If Ax = b, we can multiply by A− on both sides to obtain x = A− b. This demonstrates the inverse with • respect to the original system of linear equalities we began this review with.

1 1 1 (AB)− = B− A− • 1 T T 1 T (A− ) = (A )− . For this reason this matrix is often denoted A− . • From Equation 38, we have that if A is non-singular, then

1 1 A− = adj(A) (44) A | | While this is a nice “explicit” formula for the inverse of matrix, we should note that, numerically, there are in fact much more efficient ways of computing the inverse. In Matlab, we just type inv(A).

8 1 For the case of a 2 2 matrix, the expression for A− is simple enough to give explicitly. We have × a b 1 1 d b A = , A− = − (45) c d A c a   | | −  For a block diagonal matrix, the inverse is obtained by simply inverting each block separately, e.g.,

1 1 A 0 − A− 0 = 1 (46) 0 B 0 B−     4.6 Schur complements and the matrix inversion lemma (This section is based on [Jor07, ch13]. We will use these results in Section ??, when deriving equations for inferring marginals and conditionals of multivariate Gaussians. We will also use special cases of these results (such as the matrix inversion lemma) in many other places throughout the book. Consider a general partitioned matrix E F M = (47) G H   1 where we assume E and H are invertible. The goal is to derive an expression for M− . If we could block diagonalize M, it would be easier to invert. To zero out the top right block of M we can pre-multiply as follows I FH 1 E F E FH 1G 0 − = − (48) 0− I G H − G H       Similarly, to zero out the bottom right we can post-multiply as follows 1 1 E FH− G 0 I 0 E FH− G 0 − 1 = − (49) G H H− G I 0 H   −    Putting it all together we get 1 1 I FH− E F I 0 E FH− G 0 − 1 = − (50) 0 I G H H− G I 0 H     −    X M Z W Taking determinants we| get {z } | {z } | {z } | {z } 1 X M Z = M = W = E FH− G H (51) | || || | | | | | | − || | Let us define Schur complement of M wrt H as 1 M/H = E FH− G (52) − Then Equation 51 becomes M = M/H H (53) | | | || | So we can see that M/H acts somewhat like a division operator. We can derive the inverse of M as follows 1 1 1 1 Z− M− X− = W− (54) 1 1 M− = ZW− X (55) hence 1 1 1 E F − I 0 (M/H)− 0 I FH− = 1 1 − (56) G H H− G I 0 H− 0 I   −      1 1 (M/H)− 0 I FH− = 1 1 1 − (57) H− G(M/H)− H− 0 I −    1 1 1 (M/H)− (M/H)− FH− = 1 1 1 − 1 1 1 (58) H− G(M/H)− H− + H− G(M/H)− FH− − 

9 1 Alternatively, we could have decomposed the matrix M in terms of E and M/E = (H GE− F), yielding − 1 1 1 1 1 1 1 E F − E− + E− F(M/E)− GE− E− F(M/E)− = 1 1 1 (59) G H (M/E)− GE− (M/E)−    −  Equating the top left block of these two expression yields the following formula:

1 1 1 1 1 (M/H)− = E− + E− F(M/E)− GE− (60) (61)

Simplifiying this leads to the widely used matrix inversion lemma or the Sherman-Morrison-Woodbury formula:

1 1 1 1 1 1 1 (E FH− G)− = E− + E− F(H GE− F)− GE− (62) − − In the special case that H = 1 a scalar, F = u a column vector, G = vT a row vector, we get the following formula for the rank one update of an− inverse matrix

T 1 1 1 T 1 1 T 1 (E + uv )− = E− + E− u( 1 v E− u)− v E− (63) − − 1 T 1 1 E− uv E− = E− 1 (64) − 1+ vT E− u We can derive another useful formula by equating the top right blocks of Equations 58 and 59 to get

1 1 1 1 (M/H)− FH− = E− F(M/E)− (65) − 1 1 1 1 1 1 (E FH− G)− FH− = E− F(H GE− F)− (66) − − 5 Special kinds of matrices 5.1 Symmetric Matrices

A square matrix is called Hermitian if A = A∗, where A∗ is the transpose of the complex conjugate of A. A matrix is symmetric if it is real and Hermitian, and hence A = AT . In the rest of this section, we shall assume A is real. T n n T A matrix is anti-symmetric if A = A . It is easy to show that for any matrix A R × , the matrix A + A T − ∈ n n is symmetric and the matrix A A is anti-symmetric. From this it follows that any square matrix A R × can be represented as a sum of a symmetric− matrix and an anti-, since ∈ 1 1 A = (A + AT )+ (A AT ) (67) 2 2 − and the first matrix on the right is symmetric, while the second is anti-symmetric. It turns out that symmetric matrices occur a great deal in practice, and they have many nice properties which we will look at shortly. It is common to denote the set of all symmetric matrices of size n as Sn, so that A Sn means that A is a symmetric n n matrix. ∈ × 5.2 Orthogonal Matrices n T n Two vectors x, y R are orthogonal if x y = 0. A vector x R is normalized if x 2 = 1. A square matrix n n ∈ ∈ k k U R × is orthogonal (note the different meaning of the term orthogonal when talking about vectors versus matrices)∈ if all its columns are orthogonal to each other and are normalized (the columns are then referred to as being orthonormal). It follows immediately from the definition of orthogonality and normality that

UT U = I = UUT . (68)

In other words, the inverse of an is its transpose. Note that if U is not square — i.e., U m n T T ∈ R × , n

10 Another nice property of orthogonal matrices is that operating on a vector with an orthogonal matrix will not change its Euclidean norm, i.e., Ux = x (69) k k2 k k2 n n n for any x R , U R × orthogonal. Similarly, one can show that the angle between two vectors is preserved after they are transformed∈ ∈ by an orthogonalmatrix. In summary, transformations by orthogonal matrices are generalizations of rotations and reflections, since they preserve lengths and angles. Computationally, orthogonal matrices are very desirable because they do not magnify roundoff or other kinds of errors. An example of an orthogonal matrix is a . A rotation in 3d by angle α about the z axis is given by

cos(α) sin(α) 0 − R(α)= sin(α) cos(α) 0 (70)  0 0 1   If α = 45◦, this becomes 1 1 0 √2 − √2 R(45) = 1 1 0 (71)  √2 √2  0 0 1   where 1 =0.7071. We see that R( α)= RT , so this is an orthogonal matrix. √2 − 5.3 Positive definite Matrices n n T Given a matrix square A R × and a vector x R, the scalar value x Ax is called a quadratic form. Written explicitly, we see that ∈ ∈ n n T x Ax = Aij xixj . (72) i=1 j=1 X X Note that, 1 1 xT Ax = (xT Ax)T = xT AT x = xT ( A + AT )x (73) 2 2 i.e., only the symmetric part of A contributes to the quadratic form. For this reason, we often implicitly assume that the matrices appearing in a quadratic form are symmetric. We give the following definitions:

A symmetric matrix A Sn is positive definite (pd) iff for all non-zero vectors x Rn, xT Ax > 0. This is • usually denoted A 0 (or∈ just A > 0), and often times the set of all positive definite∈ matrices is denoted Sn . ++ A symmetric matrix A Sn is positive semidefinite (PSD) iff for all vectors xT Ax 0. This is written A 0 • (or just A 0), and the∈ set of all positive semidefinite matrices is often denoted Sn .≥  ≥ + Likewise, a symmetric matrix A Sn is negative definite (ND), denoted A 0 (or just A < 0) iff for all • non-zero x Rn, xT Ax < 0. ∈ ≺ ∈ Similarly, a symmetric matrix A Sn is negative semidefinite (NSD), denoted A 0 (or just A 0) iff for • all x Rn, xT Ax 0. ∈  ≤ ∈ ≤ Finally, a symmetric matrix A Sn is indefinite, if it is neither positive semidefinite nor negative semidefinite • — i.e., if there exists x , x R∈n such that xT Ax > 0 and xT Ax < 0. 1 2 ∈ 1 1 2 2 It should be obvious that if A is positive definite, then A is negative definite and vice versa. Likewise, if A is positive semidefinite then A is negative semidefinite and vice− versa. If A is indefinite, then so is A. It can also be shown that positive definite− and negative definite matrices are always invertible. − Finally, there is one type of positive definite matrix that comes up frequently,and so deserves some special mention. m n T Given any matrix A R × (not necessarily symmetric or even square), the Gram matrix G = A A is always ∈

11 positive semidefinite. Further, if m n (and we assume for convenience that A is full rank), then G = AT A is positive definite. ≥ Note that if all elements of A are positive, it does not mean A is necessarily pd. For example,

4 3 A = (74) 3 2   is not pd. Conversely, a pd matrix can have negative entries e.g.,

2 1 A = (75) 1− 2 −  A sufficient condition for a (real, symmetric) matrix to be pd is that it is diagonally dominant, i.e., if in every row of the matrix, the magnitude of the diagonal entry in that row is larger than the sum of the magnitudes of all the other (non-diagonal) entries in that row. More precisely,

a > a for all i (76) | ii| | ij | j=i X6 6 Properties of matrices 6.1 and rank

A set of vectors x1, x2,... xn is said to be (linearly) independent if no vector can be represented as a linear combination of the{ remaining vectors.} Conversely, a vector which can be represented as a linear combination of the remaining vectors is said to be (linearly) dependent. For example, if

n 1 − xn = αixi (77) i=1 X

for some α1,...,αn 1 then xn is dependent on x1,..., xn 1 ; otherwise, it is independent of x1,..., xn 1 . The column{ rank−of} a matrix A is the largest number{ of columns− } of A that constitute linearly independent{ set.− } In the same way, the row rank is the largest number of rows of A that constitute a linearly independent set. It is a basic fact of linear algebra that for any matrix A, columnrank(A) = rowrank(A), and so this quantity is simply refereed to as the rank of A, denoted as rank(A). The following are some basic properties of the rank:

m n For A R × , rank(A) min(m,n). If rank(A) = min(m,n), then A is said to be full rank, otherwise it • is called∈rank deficient. ≤

m n T For A R × , rank(A) = rank(A ). • ∈ m n n p For A R × , B R × , rank(AB) min(rank(A), rank(B)). • ∈ ∈ ≤ m n For A, B R × , rank(A + B) rank(A) + rank(B). • ∈ ≤ 6.2 Range and nullspace of a matrix The span of a set of vectors x , x ,... x is the set of all vectors that can be expressed as a linear combination of { 1 2 n} x1,..., xn . That is, { } n span( x ,... x )= v : v = α x , α R . (78) { 1 n} i i i ∈ ( i=1 ) X n Itcan beshownthatif x1,..., xn isasetof n linearly independent vectors, where each xi R , then span( x1,... xn )= Rn. In other words, any{ vector v } Rn can be written as a linear combination of x through∈x . { } ∈ 1 n The dimension of some subspace S, dim(S), is definedto be d if there exists some spanning set b1,..., bd that is linearly independent. { }

12 Figure 1: Visualization of the nullspace and range of an m × n matrix A. Here y1 = Ax1, so y1 is in the range (is reachable); similarly for y2. Also Ax3 = 0, so x3 is in the nullspace (gets mapped to 0); similarly for x4.

m n The range (sometimes also called the column space) of a matrix A R × is the the span of the columns of A. In other words, ∈ range(A)= v Rm : v = Ax, x Rn . (79) { ∈ ∈ } This can be thought of as the set of vectors that can be “reached” or “generated” by A. The nullspace of a matrix m n A R × is the set of all vectors that get mapped to the null vector when multiplied by A, i.e., ∈ nullspace(A)= x Rn : Ax = 0 . (80) { ∈ } See Figure 1 for an illustration of the range and nullspace of a matrix. We shall discuss how to compute the range and nullspace of a matrix numerically in Section 12.1 below. 6.3 Norms of vectors A norm of a vector x is informally measure of the “length” of the vector. For example, we have the commonly-used k k Euclidean or `2 norm, n x = x2. (81) k k2 v i ui=1 uX 2 T t Note that x 2 = x x. More formally,k k a norm is any function f : Rn R that satisfies 4 properties: → 1. For all x Rn, f(x) 0 (non-negativity). ∈ ≥ 2. f(x)=0 if and only if x = 0 (definiteness). 3. For all x Rn, t R, f(tx)= t f(x) (homogeneity). ∈ ∈ | | 4. For all x, y Rn, f(x + y) f(x)+ f(y) (triangle inequality). ∈ ≤ Other examples of norms are the `1 norm, n x = x (82) k k1 | i| i=1 X and the ` norm, ∞ x = maxi xi . (83) k k∞ | | In fact, all three norms presented so far are examples of the family of `p norms, which are parameterized by a real number p 1, and defined as ≥ n 1/p x = x p . (84) k kp | i| i=1 ! X

13 6.4 Matrix norms and condition numbers 1 (The following section is based on [Van06, p93].) Assume A is a non-singular matrix, and Ax = b, so x = A− b is a unique solution. If we change b to b + ∆b, then A(x + ∆x)= b + ∆b (85) so the new solution is x + ∆x where 1 ∆x = A− ∆b (86) We say that A is well-conditioned if a small ∆b results in a small ∆x; otherwise we say that A is ill-conditioned. For example, suppose 10 10 1 1 1 1 1 10 10 A = 2 10 10 , A− = − 10 10 (87) 1+10− 1 10− 1+10 10  −   −  The solution for b = (1, 1) is x = (1, 1). If we change b by ∆b, the solution changes to 10 1 ∆b1 10 (∆b1 ∆b2) ∆x = A− ∆b = (88) ∆b +− 1010(∆b − ∆b )  1 1 − 2  So a small change in b can lead to an extremely large change in x. In general, we can think of a matrix A as a defining a linear function f(x) = Ax. The expression Ax / x gives the amplication factor or gain of the system in the direction x. We define the norm of A as the maximum|| || gain|| || (over all directions): Ax A = max || || = max Ax (89) || || x=0 x x =1 || || 6 || || || || n This is sometimes called the 2-norm of a matrix. For example, if A = diag(a1,...,an), then A = maxi=1 ai . In general, we have to use numerical methods to compute A ; in Matlab, we can just type norm(A)|| || . | | Other matrix norms exist. For example, the Frobenius|| || norm of a matrix is defined as

m n 2 T A F = a = tr(A A) (90) || || v ij ui=1 j=1 uX X q t Note that this is the same as the 2-normof A(:), the vector gotten by stacking all the columns of A together. In Matlab, just type norm(A,'fro'). We now show how the norm of a matrix can be used to quantify how well-conditioned a linear system is. Since 1 ∆x = A− ∆b, one can show the following upper bound on the absolute error 1 ∆x A− ∆b (91) || || ≤ || || || || 1 So large A− means ∆x can be large even when ∆b is small. Also, since b A x , and hence b || || || || || || || || ≤ || || || || x ||A|| , we have ≥ || || 1 ∆x A− 1 ∆b || || || || ∆b A A− || || (92) x ≤ x || || ≤ || |||| || b || || || || || || thus establishing an upper bound on the relative error. The term 1 κ(A)= A A− (93) || |||| || is called the condition number of A. Large κ(A) means ∆x / x can be large even when ∆b / b is small. For the 2-norm case, the condition number is equal to the|| ratio|| of|| the|| largest to smallest singular|| values|| || ||: κ(A) = σmax/σmin. In Matlab, just type cond(A). We say A is well conditioned if κ(A) is small (close to 1), and ill-conditioned if κ(A) is large. A large condition number means A is nearly singular. This is a better measure of nearness to singularity than the size of the determinant. 100 For example, suppose A = 0.1I100 100. Then det(A) = 10− , but κ(A)=1, reflecting the fact that Ax simply scales the entries of x by 0.1, and thus× is well-behaved.

14 7 Eigenvalues and eigenvectors 7.1 Square matrices n n n Given a square matrix A R × , we say that λ C is an eigenvalue of A and x C is the corresponding eigenvector3 if ∈ ∈ ∈ Ax = λx, x =0 . (94) 6 Intuitively, this definition means that multiplying A by the vector x results in a new vector that points in the same direction as x, but scaled by a factor λ. Also note that for any eigenvector x Cn, and scalar t C, A(cx)= cAx = cλx = λ(cx), so cx is also an eigenvector. For this reason when we talk about∈ “the” eigenvector∈ associated with λ, we usually assume that the eigenvector is normalized to have length 1 (this still creates some ambiguity, since x and x will both be eigenvectors, but we will have to live with this). − We can rewrite the equation above to state that (λ, x) is an eigenvalue-eigenvector pair of A if, (λI A)x = 0, x = 0 . (95) − 6 But (λI A)x = 0 has a non-zero solution to x if and only if (λI A) has a non-empty nullspace, which is only the case if (λ−I A) is singular, i.e., − − (λI A) =0 . (96) | − | We can now use the previous definition of the determinant to expand this expression into a (very large) polynomial in λ, where λ will have maximum degree n; this is called the characteristic polynomial. We then find the n (possibly complex) roots of this polynomial to find the n eigenvalues λ1,...,λn. To find the eigenvector corresponding to the eigenvalue λi, we simply solve the linear equation (λiI A)x = 0. It should be noted that this is not the method which is actually used in practice to numerically com−pute the eigenvalues and eigenvectors (remember that the complete expansion of the determinant has n! terms); it is rather a mathematical argument. n n The following are properties of eigenvalues and eigenvectors (in all cases assume A R × has eigenvalues ∈ λi,...,λn and associated eigenvectors x1,... xn): The trace of a A is equal to the sum of its eigenvalues, • n trA = λi . (97) i=1 X The determinant of A is equal to the product of its eigenvalues, • n A = λ . (98) | | i i=1 Y The rank of A is equal to the number of non-zero eigenvalues of A. • 1 1 If A is non-singular then 1/λ is an eigenvalue of A− with associated eigenvector x , i.e., A− x = (1/λ )x . • i i i i i The eigenvalues of a diagonal matrix D = diag(d ,...d ) are just the diagonal entries d ,...d . • 1 n 1 n We can write all the eigenvector equations simultaneously as

AX = XΛ (99)

n n where the columns of X R × are the eigenvectors of A and Λ is a diagonal matrix whose entries are the eigenval- ues of A, i.e., ∈

n n | | | X R × = x1 x2 xn , Λ = diag(λ ,...,λ ) . (100) ∈  · · ·  1 n | | |   3Note that λ and the entries of x are actually in C, the set of complex numbers, not just the reals; we will see shortly why this is necessary. Don’t worry about this technicality for now, you can think of complex vectors in the same way as real vectors.

15 If the eigenvectors of A are linearly independent, then the matrix X will be invertible, so 1 A = XΛX− . (101) A matrix that can be written in this form is called diagonalizable. 7.2 Symmetric matrices Two remarkable properties come about when we look at the eigenvalues and eigenvectors of a symmetric matrix A Sn. First, it can be shown that all the eigenvalues of A are real. Secondly, the eigenvectors of A are orthonormal, i.e.,∈ T T T T ui uj =0 if i = j, and ui ui =1, where ui are the eigenvectors. In matrix form, this becomes U U = UU = I and U =1. We6 can therefore represent A as | | n T T A = UΛU = λiuiui (102) i=1 X remembering from above that the inverse of an orthogonal matrix is just its transpose. We can write this symbolically as T λ1 u1 λ − uT − | | | 2 − 2 − A = = u1 u2 un  .   .  (103)  · · ·  .. .     | | |  λ   uT     n − n −     = λ u| uT + + λ u| uT (104) 1  1 − 1 − · · · n  n − n − |  |  As an example of diagonalization,  suppose we are given the following  real, symmetric matrix 1.5 0.5 0 A = 0.5− 1.5 0 (105) −0 03   We can diagonalize this in Matlab as follows.

Listing 1: Part of rotationDemo [U,D]=eig(A) U = -0.7071 -0.7071 0 -0.7071 0.7071 0 0 0 1.0000 D = 1 0 0 0 2 0 0 0 3 We recognize from Equation 71 that U corresponds to rotation by 45 degrees. and D corresponds to scaling by diag(1, 2, 3). So the whole matrix is equivalent to a rotation by 45◦, followed by scaling, followed by rotation by 45◦. Note that there is a sign ambiguity for each column of U, but the resulting answer is always correct: − Listing 2: Output of rotationDemo >> U*D*U' ans = 1.5000 -0.5000 0 -0.5000 1.5000 0 0 0 3.0000 We can also use the diagonalization property to show that the definiteness of a matrix depends entirely on the sign of its eigenvalues. Suppose A Sn = UΛUT . Then ∈ n T T T T 2 x Ax = x UΛU x = y Λy = λiyi (106) i=1 X 16 where y = UT x (and since U is full rank, any vector y Rn can be represented in this form). Because y2 is always ∈ i positive, the sign of this expression depends entirely on the λi’s. If all λi > 0, then the matrix is positive definite; if all λi 0, it is positive semidefinite. Likewise, if all λi < 0 or λi 0, then A is negative definite or negative semidefinite≥ respectively. Finally, if A has both positive and negative eigenvalues,≤ it is indefinite. 7.3 Covariance matrices An important application of eigenvectors is to visualizing the probability density function of a multivariate Gaus- sian, also called a multivariate Normal (MVN). We study this in more detail in Section ??, but we introduce it here, since it serves as a good illustration of many ideas in linear algebra. The density is defined as follows:

def 1 1 T 1 (x µ, Σ) = exp[ (x µ) Σ− (x µ)] (107) N | (2π)d/2 Σ 1/2 − 2 − − | | where d is the dimensionality of x, µ = E[X] is the d 1 mean vector, and Σ is a d d symmetric and positive definite covariance matrix, defined by × ×

Cov(X) = Σ def= E [(X E X)(X E X)T ] (108) − − Var (X ) Cov(X ,X ) Cov(X ,X ) 1 1 2 · · · 1 d . . .. . =  . . . .  (109) Cov(X ,X ) Cov(X ,X ) Var (X )  d 1 d 2 · · · d    If d =1, this reduces to the familiar univariate Gaussian density 1 1 (x µ, σ2) def= exp[ (x µ)2] (110) N | (2π)1/2σ −2σ2 −

The expression inside the exponent of the MVN is a quadratic form called the Mahalanobis distance, and is a scalar value equal to T 1 T 1 T 1 T 1 (x µ) Σ− (x µ)= x Σ− x + µ Σ− µ 2µ Σ− x (111) − − − Intuitively, this just measures the distance of point x from µ, where each dimension gets weighted differently. In the case of a diagonal covariance, it becomes a weighted Euclidean distance:

d T 1 2 1 ∆ = (x µ) Σ− (x µ)= (x µ ) Σ− (112) − − i − i ii i=1 X

So if Σii is very large, meaning dimension i has high variance, then it will be downweighted (since we use 1/Σii) when comparing x to µ. This is a formof feature selection. Figure 2 plots some MVN densities in 2d for three differentkinds of covariance matrices.4 A full covariance matrix has d(d + 1)/2 parameters (we divide by 2 since it is symmetric). A diagonal covariance matrix has d parameters. A 2 spherical or isotropic covariance, Σ= σ Id, has one free parameter. The equation ∆= const defines an ellipsoid, which are the level sets of constant probability density. We see that a general covariance matrix has elliptical contours; for a diagonal covariance, the ellipse is axis aligned; and for a spherical covariance, the ellipse is a circle (same “spread” in all directions). We now explain why the contours have this form, using eigenanalysis. Letting Σ = UΛUT we find

p 1 T 1 1 1 T 1 T Σ− = U− Λ− U− = UΛ− U = u u (113) λ i i i=1 i X 4These plots were made by creating a dense grid of points using meshgrid, evaluating the pdf on each grid cell using mvnpdf, and then using the surfc and contour commands. See gaussPlot2dDemo for details.

17 Full, S=[2.0 1.5 ; 1.5 4.0], ρ=0.53 Diagonal, S=[1.2 −0.0 ; −0.0 4.8], ρ=−0.00 Spherical, S=[1.0 0.0 ; 0.0 1.0], ρ=0.00

0.018 0.018 0.04

0.016 0.016 0.035 0.014 0.014 0.03 0.012 0.012 0.025 0.01 0.01 0.02 0.008 0.008 p(x,y) p(x,y) p(x,y) 0.015 0.006 0.006 0.01 0.004 0.004

0.002 0.002 0.005

0 0 0 5 5 5

5 5 5

0 0 0 0 0 0

−5 −5 −5 y −5 y −5 y −5 x x x

Full, S=[2.0 1.5 ; 1.5 4.0] Diagonal, S=[1.2 −0.0 ; −0.0 4.8] Spherical, S=[1.0 0.0 ; 0.0 1.0] 5 5 5

4 4 4

3 3 3

2 2 2

1 1 1

y 0 y 0 y 0

−1 −1 −1

−2 −2 −2

−3 −3 −3

−4 −4 −4

−5 −5 −5 −5 −4 −3 −2 −1 0 1 2 3 4 5 −5 −4 −3 −2 −1 0 1 2 3 4 5 −5 −4 −3 −2 −1 0 1 2 3 4 5 x x x

Figure 2: Visualization of 2 dimensional Gaussian densities. Left: a full covariance matrix Σ. Middle: Σ has been decorrelated (see Section ??). Right: Σ has been whitened (see Section 7.3.1). This figure was produced by gaussPlot2dDemo.

Hence p T 1 T 1 T (x µ) Σ− (x µ) = (x µ) u u (x µ) (114) − − − λ i i − i=1 i ! p X p 1 y2 = (x µ)T u uT (x µ)= i (115) λ − i i − λ i=1 i i=1 i X X def T where yi = ui (x µ). The y variables define a new coordinate system that is shifted (by µ) and rotated (by U) with respect to the original− x coordinates: y = U(x µ). Recall that the equation for an ellipse in 2D is −

y2 y2 1 + 2 =1 (116) λ1 λ2 Hence we see that the contours of equal probability density of a Gaussian lie along ellipses: see Figure 3. This gives us a fast way to plot a 2d Gaussian, without having to evaluate the density on a grid of points and use 1 the contour command. If X is a matrix of points on the circle, then Y = UΛ 2 X is a matrix of points on the ellipse represented by Σ = UΛUT . To see this, we use the result

Cov[Ax]= A Cov[x]AT (117)

to conclude 1 1 1 1 Cov[y]= UΛ 2 Cov[x]Λ 2 UT = UΛ 2 Λ 2 UT = Σ (118) 1 5 where Λ 2 = diag(√Λii). We can implement this in Matlab as follows.

5We can explain the k = √6 term as follows. We use the fact that the Mahalanobis distance is a sum of squares of d Gaussian random variables, to conclude that ∆ χ2, which is the χ-squared distribution with d degrees of freedom (see Section ??). Hence we can find the value of ∆ ∼ d that corresponds to a 95% confidence interval by using chi2inv(0.95, 2), which is approximately 6. So by setting the radius to √6, we will enclose 95% of the probability mass.

18 x2 u2

u1

y2 y1 µ

1/2 λ2 1/2 λ1

x1

Figure 3: Visualization of a 2 dimensional Gaussian density. The major and minor axes of the ellipse are defined by the first two eigenvectors of the covariance matrix, namely u1 and u2. Source: [Bis06] Figure 2.7.

Listing 3: gaussPlot2d function h=gaussPlot2d(mu, Sigma, color) [U, D] = eig(Sigma); n = 100; t = linspace(0, 2*pi, n); xy = [cos(t); sin(t)]; k = sqrt(6); %k = sqrt(chi2inv(0.95, 2)); w = (k * U * sqrt(D)) * xy; z = repmat(mu, [1 n]) + w; h = plot(z(1, :), z(2, :), color); axis('equal');

7.3.1 Whitening data The above plotting routine took a spherical covariance matrix and converted it into an ellipse. We can also go in the opposite direction. Let x (µ, Σ), where Σ = UΛUT . Decorrelation is a linear operation which removes any off-diagonal entries from Σ∼.N Define y = UT x. Then

Cov[y]= UT ΣU = UT UΛUT U = Λ (119)

which is a diagonal matrix. We can go further and convert this to a spherical matrix, by making the diagonal entries be 1 (unit variance). This is called whitening or sphereing the data. Let

1 T y = Λ− 2 U x (120)

Then

1 1 1 1 1 1 T T T Cov[y] = Λ− 2 U ΣUΛ− 2 =Λ− 2 U (UΛU U)Λ− 2 =Λ− 2 ΛΛ− 2 = I (121)

See Figure 2 for an example. In Matlab, we can implement this as follows

Listing 4: Part of gaussPlot2dDemo mu = [1 0]'; S = [2 1.5; 1.5 4]; [U,D] = eig(S); A = sqrt(inv(D))*U'; % whitening transform (without centering) mu2 = A*mu; S2 =A*S*A'; assert(approxeq(S2, eye(2)))

19 Finally, we can whiten the data and center it, by subtracting off the mean.

1 T y = Λ− 2 U (x µ) (122) 1 − T E[y] = Λ− 2 U (µ µ)= 0 (123) − 7.3.2 Standardizing data Standardizing data is a simple preprocessing step that is a weaker form of whitening, which makes each dimension be zero mean and have unit variance, but which does not remove any correlation (since it operates on each dimension separately). Standardized data is often represented by Z. Symbolically we have

xij x¯.j zij = − (124) σj We can do this in MATLAB using the MLAPA function standardize, or the function zscore.

Listing 5: standardize function [Z, mu, sigma] = standardize(X, mu, sigma) X = double(X); if nargin < 2 mu = mean(X); sigma = std(X); ndx = find(sigma < eps); sigma(ndx) = 1; end [n d] = size(X); Z = X - repmat(mu, [n 1]); Z = Z ./ repmat(sigma, [n 1]); Of course, in a supervised learning setting, we must apply the same transformation to the test data. Thus it is very common to see the following idiom (we will see examples in Chapter ??):

Listing 6: : [Xtrain, mu, sigma] = [ones(size(Xtrain,1), 1) standardize(Xtrain)]; w = fitModel(Xtrain); % plug in appropriate learning procedure Xtest = [ones(size(Xtest,1),1) standardize(Xtest, mu, sigma)]; ypred = testModel(Xtest, w); % plug in appropriate test procedure Alternatively, we can rescale the data, e.g., to 0:1 or -1:1, using the rescaleData function. We must rescale the test data in the same way, as in the example below.

Listing 7: : xTrain = -10:1:10; [zTrain, minx, rangex] = rescaleData(xTrain, -1, 1);% zTrain= -1:1 xTest = [-11, 8]; [zTest] = rescaleData(xTest, -1, 1, minx, rangex) % [-1.1, 0.8]

8 Matrix calculus While the topics in the previous sections are typically covered in a standard course on linear algebra, one topic that does not seem to be covered very often (and which we will use extensively) is the extension of calculus to the vector setting. Despite the fact that all the actual calculus we use is relatively trivial, the notation can often make things look much more difficult than they are. In this section we present some basic definitions of matrix calculus and provide a few examples. We concentrate on functions that maps vectors / matrices to scalars, i.e., their output is scalar valued. 8.1 The gradient of a function wrt a matrix m n Suppose that f : R × R is a function that takes as input a matrix A of size m n and returns a real value. Then → m n × the gradient of f (with respect to A R × ) is the matrix of partial derivatives, defined as: ∈

20 ∂f(A) ∂f(A) ∂f(A) ∂A11 ∂A12 ∂A1n ∂f(A) ∂f(A) · · · ∂f(A)   Rm n ∂A21 ∂A22 ∂A2n Af(A) × = . . ·. · · . (125) ∇ ∈  . . .. .   . . .   ∂f(A) ∂f(A) ∂f(A)     ∂Am1 ∂Am2 · · · ∂Amn  i.e., an m n matrix with   × ∂f(A) ( Af(A))ij = . (126) ∇ ∂Aij We give some examples below (based on [Jor07, ch13]). We will use these examples when computing the maximum likelihood estimate of the covariance matrix for a multivariate Gaussian in Section ??. 8.1.1 Derivative of trace and quadratic forms Recall that (AB)km = aklblm (127) X` and hence that tr(AB)= (AB)kk = aklblm (128) Xk Xk X` Hence ∂ ∂ tr(AB)= = aklblm = bji (129) ∂aij ∂aij Xk X` so tr(AB)= BT (130) ∇A We can use this, plus the trace trick, to compute the derivative of a quadratic form wrt a matrix:

xT Ax = tr(xT Ax)= tr(xxT A) = (xxT )T = xxT (131) ∇A ∇A ∇A 8.1.2 Derivative of a (log) determinant Recall from Equation 37 that A = a C for any row i, where C is the cofactor. Hence | | j ij ij ij P ∂ A = Cij (132) ∂aij | |

since Cij does not contain any terms involving aij (since we deleted row i and column j. Hence

T T A = adj(A) = A A− (133) ∇A| | | | Now we can use the chain rule to see that ∂ log A ∂ log A ∂ A 1 ∂ A | | = | | | | = | | . (134) ∂A ∂ A ∂A A ∂A ij | | ij | | ij Hence 1 T log A = A = A− (135) ∇A | | A ∇A| | | | Note the similarity to the scalar case, where ∂/(∂x) log x =1/x.

21 8.2 The gradient of a function wrt a vector Rn Note that the size of Af(A) is always the same as the size of A. So if, in particular, A is just a vector x , we have ∇ ∈ ∂f(x) ∂x1 ∂f(x)  ∂x2  xf(x)= . . (136) ∇ .  .   ∂f(x)     ∂xn    It is very important to remember that the gradient of a function is only defined if the function is real-valued, that is, if n n it returns a scalar value. We can not, for example, take the gradient of Ax, A R × with respect to x, since this quantity is vector-valued. ∈ It follows directly from the equivalent properties of partial derivatives that:

x(f(x)+ g(x)) = xf(x)+ xg(x). • ∇ ∇ ∇ For t R, x(tf(x)) = t xf(x). • ∈ ∇ ∇ We give some examples below, which we will use when computing the least squares estimate (see Section ??) as well as when computing the maximum likelihood estimate of the mean of a multivariate Gaussian in Section ??. 8.2.1 Derivative of a scalar product For x Rn, let f(x)= aT x for some known vector a Rn. Then ∈ ∈ n ∂f(x) ∂ = a x = a . (137) ∂x ∂x i i k k k i=1 X From this we can easily see that xaT x = a (138) ∇ This should be compared to the analogous situation in single variable calculus, where ∂/(∂x) ax = a. 8.2.2 Derivative of a quadratic form Now consider the quadratic form f(x)= xT Ax for A Sn. Remember that ∈ n n f(x)= Aij xixj (139) i=1 j=1 X X so n n n n ∂f(x) ∂ = A x x = A x + A x (140) ∂x ∂x ij i j ik i kj j k k i=1 j=1 i=1 j=1 X X X X Hence xxT Ax = (A + AT )x (141) ∇ If A is symmetric, this becomes xxT Ax = 2Ax. Again, this should remind you of the analogous fact in single- variable calculus, that ∂/(∂x) ax2∇=2ax. 8.3 The Jacobian Let f be a function that maps Rn to Rm, and let y = f(x). Define the Jacobian matrix as

∂y1 ∂y1 ∂x1 ∂x ∂(y ,...,y ) · · · n def 1 m def . .. . Jx y = =  . . .  (142) → ∂(x1,...,xn) ∂ym ∂ym  ∂x1 ∂xn   · · · 

22 T If we denote the i’th component of the output, then the i’th row of J is ( xfi(x)) . If f is a linear function, so y = Ax, where A is m n, we can∇ compute the gradient of each component of T × T the output separately. We have yi = Ai,:x, so from Equation ??, we have that the the i’th row of J is Ai,:, so Jacobian(Ax)= A. 8.4 The Hessian of a function wrt a vector Suppose that f : Rn R is a function that takes a vector in Rn and returns a real number. Then the → with respect to x, written x2 f(x) or simply as H is the n n matrix of partial derivatives, ∇ × 2 2 2 ∂ f(x) ∂ f(x) ∂ f(x) 2 ∂x1 ∂x1∂x2 ∂x1∂xn 2 2 · · · 2 ∂ f(x) ∂ f(x) ∂ f(x)  2  2 Rn n ∂x2∂x1 ∂x2 · · · ∂x2∂xn xf(x) × = . . . . . (143) ∇ ∈  . . .. .   2 2 2   ∂ f(x) ∂ f(x) ∂ f(x)   2   ∂xn∂x1 ∂xn∂x2 · · · ∂xn  Obviously this is a symmetric matrix, since   ∂2f(x) ∂2f(x) = . (144) ∂xi∂xj ∂xj ∂xi We see that the Hessian is just the Jacobian of the gradient. Note that the Hessian matrix is often written as T H = x( xf(x)) (145) ∇ ∇ where the term inside the bracketreturns a row vector, which we then operate on elementwise, to create an outerproduct- like matrix. We cannot write the Hessian as H = x( xf(x)), since that would require computing the gradient of a column vector. ∇ ∇ 8.4.1 Hessian of a quadratic form Consider the quadratic form f(x)= xT Ax for A Sn. From above, we have ∈ xxT Ax = (A + AT )x (146) ∇ Hence n n ∂2f(x) ∂2 = A x x = A + A =2A . (147) ∂x ∂x ∂x ∂x ij i j k` `k k` k ` k ` i=1 j=1 so X X 2 T T xx Ax = A + A (148) ∇ If A is symmetric, this is 2A, and is again analogous to the single-variable fact that ∂2/(∂x2) ax2 =2a. 8.5 Eigenvalues as Optimization Finally, we use matrix calculus to solve an optimization problem in a way that leads directly to eigenvalue/eigenvector analysis. Consider the following, equality constrained optimization problem: T 2 maxx Rn x Ax subject to x 2 =1 (149) ∈ k k for a symmetric matrix A Sn. A standard way of solving optimization problems with equality constraints is by forming the Lagrangian, an∈ objective function that includes the equality constraints. The Lagrangian in this case can be given by (x, λ)= xT Ax + λ(1 xT x) (150) L − where λ is called the Lagrange multiplier associated with the equality constraint. It can be established that for x∗ to be a optimal point to the problem, the gradient of the Lagrangian has to be zero at x∗ (this is not the only condition, but it is required). That is, x (x, λ)=2AT x 2λx = 0. (151) ∇ L − Notice that this is just the linear equation Ax = λx. This shows that the only points which can possibly maximize (or minimize) xT Ax assuming xT x =1 are the eigenvectors of A.

23 d d n-d d R d X Q 0 n = n-d

Figure 4: QR decomposition of a non-square matrix A = QR, where QT Q = I and R is upper triangular. The shaded parts of R, and all the below-diagonal terms, are zero. The shaded entries in Q are not computed in the economy-sized version, since they are not needed.

9 Matrix factorizations There are various ways to decompose a matrix into a product of other matrices. In Section 7, we discussed diago- 1 nalizing a square matrix A = XΛX− . In Section ?? below, we will discuss a way to generalize this to non-square matrices using a technique called SVD. In this section, we present some other useful matrix factorizations. 9.1 Cholesky factorization of pd matrices Any symmetric positive definite matrix can be factorized as A = RT R, where R is upper triangular with real, positive diagonal elements. This is called a Cholesky factorization: In Matlab, type R = chol(A). In fact, we can use this function to determine if a matrix is positive definite, as follows:

Listing 8: isposdef function b = isposdef(a) % Written by Tom Minka [R,p] = chol(a); b = (p == 0); (This method is faster than checking that all the eigenvalues are positive.) We can create a random pd matrix using the following function.

Listing 9: randpd function M = randpd(d) A = rand(d); M =A*A'; The Cholesky decomposition of a covariance matrix can be used to sample from a multivariate Gaussian. Let y (µ, Σ) and Σ = LLT , where L = RT is lower triangular. We first sample x (0, I), which is easy because∼ N it just requires sampling from d separate 1d Gaussians. We then set y = LT x + µ∼. This N is valid since

Cov[y]= LT Cov[x]L = LT IL = Σ (152)

In Matlab you type

Listing 10: : X = chol(Sigma)'*randn(d,n) + repmat(mu,n,1); Of course, if you have the Matlab statistics toolbox, you can just call mvnrnd, but it is useful to know this other method, too. 9.2 QR decomposition The QR decomposition of an n d matrix X is defined by × X = QR (153)

24 n n m-n n n n A U Σ VT m =

Figure 5: SVD decomposition of a non-square matrix A = UΣVT . The shaded parts of Σ, and all the off-diagonal terms, are zero. The shaded entries in U and Σ are not computed in the economy-sized version, since they are not needed.

where Q is an n n orthogonal matrix (hence QT Q = I and QQT = I), and R is an n d matrix, which is only non-zero in its upper-right× triangle (see Figure 4). In Matlab, type [Q,R] = qr(X). It is× also possible to compute an economy-sized QR decomposition, using [Q,R] = qr(X,0). In this case, Q is just n d and R is d d, where the lower triangle is zero (see Figure 4). We will use this when we study least squares in Section× ??. × 10 Singular value decomposition (SVD) Any (real) m n matrix A can be decomposed as ×

T | T | T A = UΣV = λ u1 v + + λ ur v (154) 1   − 1 − · · · r   − r − |  |      where U is an m m whose columns are orthornormal (so UT U = I), V is n n matrix whose rows and columns are orthonormal (so× VT V = VVT = I), and Σ is a m n matrix containing× the r = min(m,n) singular values × σi 0 on the main diagonal, with 0s filling the rest of the matrix. The columns of U are the left singular vectors, and the≥ columns of V are the right singular vectors. See Figure 5 for an example. In Matlab, you can compute the SVD using [U,Sigma,V] = svd(A). As is apparent from Figure 5, if mn, we take the first n columns of U×, make Σ be n n, and× leave V as is (this is also called a thin SVD). × A square diagonal matrix is nonsingular iff its diagonal elements are nonzero. The SVD implies that any square matrix is nonsingular if, and only if, its singular values are nonzero. The most numerically reliable way to determine whether matrices are singular is to test their singular values. This is far better than trying to compute determinants, which are numerically unstable. 10.1 Connection between SVD and eigenvalue decomposition If A is real, symmetric and positive definite, then the singular values are equal to the eigenvalues, and the left and right singular vectors are equal to the eigenvectors (up to a sign change). However, Matlab always returns the singular values in decreasing order, whereas the eigenvalues need not necessarily be sorted. This is illustrated below.

Listing 11: : >> A=randpd(3) A = 0.9302 0.4036 0.7065 0.4036 0.8049 0.4521 0.7065 0.4521 0.5941

>> [U,S,V]=svd(A) U = -0.6597 0.5148 -0.5476 -0.5030 -0.8437 -0.1872 -0.5584 0.1520 0.8155 S = 1.8361 0 0

25 0 0.4772 0 0 0 0.0159 V = -0.6597 0.5148 -0.5476 -0.5030 -0.8437 -0.1872 -0.5584 0.1520 0.8155

>> [U,Lam]=eig(A) U = 0.5476 0.5148 0.6597 0.1872 -0.8437 0.5030 -0.8155 0.1520 0.5584 Lam = 0.0159 0 0 0 0.4772 0 0 0 1.8361 In general, for an arbitrary real matrix A, if A = UΣVT , we have

AT A = VΣT UT UΣVT = V(ΣT Σ)VT (155) (AT A)V = V(ΣT Σ)= VD (156) so the eigenvectors of AT A are equal to V, the right singular vectors of A. Also, the eigenvalues of AT A are equal to the elements of Σ2, the squared singular values. Similarly

AAT = UΣVT VΣT UT = U(ΣΣT )UT (157) (AAT )U = U(ΣΣT )= UD (158) so the eigenvectors of AAT are equal to U, the left singular vectors of A. Also, the eigenvalues of AAT are equal to the squared singular values. We will use these results when we study PCA (see Section ??), which is very closely related to SVD. 10.2 Pseudo inverse

The pseudo-inverse of an m n matrix A, denoted A†, is defined as the unique matrix that satisfies the following 4 properties: ×

AA†A = A (159)

A†AA† = A† (160) T (AA†) = AA† (161) T (A†A) = A†A (162)

1 If A is square and non-singular, then A† = A− . If m>n and the columns of A are linearly independent, then

T 1 T A† = (A A)− A (163) which is the same expression as arises in the normal equations (see Section ??). In this case, A† is a left inverse because T 1 T A†A = (A A)− A A = I (164) but is not a right inverse because T 1 T AA† = A(A A)− A (165) only has rank n, and so cannot be the m m identity matrix. In general, we can compute the pseudo× inverse using the SVD decomposition A = UΣVT . If A is square and non-singular, we can compute A† easily:

1 T 1 T A† = A− = (UΣV )− = V[diag(1/σj )]U (166)

26 n k k n Σ VT k A U m ¼

Figure 6: Truncated SVD.

rank 1 rank 2 rank 5

rank 10 rank 20 rank 200

Figure 7: Low rank approximations to a 200 × 320 image. Ranks 1, 2, 5, 10, 20 and 200 (original is bottom right). Produced by svdImageDemo.

T T since V V = I and U U = I. When computing the pseudo inverse, if any σj =0, we replace 1/σj with 0. Suppose the first r = rank(A) singular values are nonzero, and the rest are zero. Then

1 1 Σ† = diag(σ1− ,...,σr− , 0,..., 0) (167) Hence T A† = VΣ†U (168) 1 amounts to only using the first r rows of V, U and Σ− , since the remaining elements of Σ† will be 0. In mat- lab, we can just type pinv(A). The (abbreviated) source code for this built-in function is shown below. (Use type(which('pinv')) to see the full source.)

Listing 12: pinv
function B = pinv(A)
[U,S,V] = svd(A,0);  % if m>n, only compute first n cols of U
s = diag(S);
tol = max(size(A)) * eps(max(s));  % default tolerance (omitted from the original abbreviated listing)
r = sum(s > tol);    % rank
w = diag(ones(r,1) ./ s(1:r));
B = V(:,1:r) * w * U(:,1:r)';

We will see how to use the pseudo-inverse for solving under-determined linear systems in Section 11.3.
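To see Equations 167-168 in action, here is a small sketch (following the same recipe on a deliberately rank-deficient matrix; the matrix and tolerance are illustrative assumptions, not from the original notes):

% Pseudo-inverse of a rank-deficient matrix, built directly from the SVD.
A = randn(5,2) * randn(2,3);          % 5x3 matrix of rank 2 by construction
[U,S,V] = svd(A,0);
s = diag(S);
tol = max(size(A)) * eps(max(s));
r = sum(s > tol);                     % numerical rank, here r = 2
Apinv = V(:,1:r) * diag(1./s(1:r)) * U(:,1:r)';   % V * Sigma-dagger * U'
assert(norm(Apinv - pinv(A)) < 1e-10)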

10.3 Low rank approximation of matrices
(The following section is based on [Mol04, sec10.11].) The economy-sized SVD
A = UΣV^T    (169)

Figure 8: Singular values (left) and log singular values (right) for the clown image.

can be rewritten as
A = E_1 + E_2 + · · · + E_r    (170)
where r = min(m, n). The component matrices E_i are rank one outer products:

E_i = σ_i u_i v_i^T    (171)
The norm of each matrix is the corresponding singular value, ||E_i|| = σ_i. If we truncate this sum to the first k terms,
A_k = E_1 + · · · + E_k = U_{:,1:k} Σ_{1:k,1:k} V_{:,1:k}^T    (172)
we get a rank k approximation to the matrix: see Figure 6. This is called a truncated SVD. The total number of parameters needed to represent an m × n matrix using a rank k approximation is
mk + k + kn = k(m + n + 1)    (173)

As an example, consider the 200 × 320 pixel image in Figure 7 (top left). This has 64,000 numbers in it. We see that a rank 20 approximation, with only (200 + 320 + 1) × 20 = 10,420 numbers, is a very good approximation. The error in this approximation is given by

||A − A_k|| = σ_{k+1}    (174)
Since the singular values often die off very quickly (see Figure 8), a low-rank approximation is often quite accurate. In Section ??, we will see that the truncated SVD is the same thing as principal components analysis (PCA). We can construct a low rank approximation to a matrix (here, an image) as follows.

Listing 13: Part of svdImageDemo
load clown; % built-in image
[U,S,V] = svd(X,0);
k = 20;
Xhat = (U(:,1:k)*S(1:k,1:k)*V(:,1:k)');
image(Xhat);
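As a check on Equation 174, the following sketch (using a random matrix rather than the clown image) confirms that the 2-norm error of the rank-k truncation is the (k+1)st singular value:

% The spectral norm error of the rank-k truncated SVD equals sigma_{k+1}.
A = randn(30,20);
[U,S,V] = svd(A,0);
k = 5;
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
sigma = diag(S);
assert(abs(norm(A - Ak) - sigma(k+1)) < 1e-10)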

11 Solving systems of linear equations
Consider solving a generic linear system Ax = b, where A is m × n. There are three cases, depending on whether A is square (m = n), tall and skinny (m > n), or short and fat (m < n).
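For the square non-singular case, the backslash operator gives the unique solution; a minimal sketch (not from the original notes), which also illustrates the standard advice to prefer A\b over inv(A)*b:

% Square, non-singular case: backslash returns the unique solution.
n = 4;
A = randn(n); b = randn(n,1);
x1 = A \ b;         % preferred: solves the system without forming inv(A)
x2 = inv(A) * b;    % also works, but is slower and generally less accurate
assert(norm(A*x1 - b) < 1e-10)
assert(norm(x1 - x2)  < 1e-8)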

11.2 m > n: Over-determined case
If m > n, the system is over-determined: there are more equations than unknowns. In general, there is no exact solution. Typically one seeks the least squares solution:

x̂ = arg min_x ||Ax − b||^2    (175)
This solution minimizes the sum of the squared residuals:

||Ax − b||^2 = \sum_{i=1}^m r_i^2    (176)
r_i = a_{i1} x_1 + · · · + a_{in} x_n − b_i,   i = 1 : m    (177)
In Matlab, the backslash operator x=A\b gives the least squares solution; internally Matlab uses QR decomposition: see Section 9.2.
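A small sketch of the over-determined case (assuming A has full column rank; not an example from the original notes), comparing backslash with the normal-equations formula of Equation 163 and with the pseudo-inverse:

% Over-determined least squares: three equivalent ways to compute x-hat.
A = randn(10,3); b = randn(10,1);
x1 = A \ b;               % QR-based least squares solution
x2 = (A'*A) \ (A'*b);     % normal equations (simple, but less numerically stable)
x3 = pinv(A) * b;         % pseudo-inverse solution
assert(norm(x1 - x2) < 1e-8)
assert(norm(x1 - x3) < 1e-8)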

11.3 m < n: Under-determined case

A = UΣV^T = U \begin{pmatrix} σ_1 & & \\ & \ddots & \\ & & σ_n \end{pmatrix} \begin{pmatrix} — v_1^T — \\ \vdots \\ — v_n^T — \end{pmatrix}    (178)
Hence
Ax = U \begin{pmatrix} σ_1 v_1^T x \\ \vdots \\ σ_n v_n^T x \end{pmatrix} = U \begin{pmatrix} σ_1 < v_1, x > \\ \vdots \\ σ_n < v_n, x > \end{pmatrix}    (179)
= \begin{pmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{pmatrix} \begin{pmatrix} σ_1 < v_1, x > \\ \vdots \\ σ_n < v_n, x > \end{pmatrix}    (180)
= \begin{pmatrix} u_{11} σ_1 < v_1, x > + \cdots + u_{1n} σ_n < v_n, x > \\ \vdots \\ u_{n1} σ_1 < v_1, x > + \cdots + u_{nn} σ_n < v_n, x > \end{pmatrix} = \sum_{j=1}^n σ_j < v_j, x > u_j    (181)

Suppose that the first r singular values of A are nonzero and that the rest are zero.^6 Hence

Ax = \sum_{j:σ_j>0} σ_j < v_j, x > u_j = \sum_{j=1}^r σ_j < v_j, x > u_j    (182)

Thus any Ax can be written as a linear combination of the left singular vectors u_1, ..., u_r, so the range of A is given by
range(A) = span{u_j : σ_j > 0}    (183)
with dimension r. To find a basis for the null space, let us now define a second vector y ∈ R^n that is a linear combination solely of the right singular vectors for the zero singular values,

y = \sum_{j:σ_j=0} c_j v_j = \sum_{j=r+1}^n c_j v_j    (184)

Since the v_j's are orthonormal, we have

Ay = U \begin{pmatrix} σ_1 < v_1, y > \\ \vdots \\ σ_r < v_r, y > \\ σ_{r+1} < v_{r+1}, y > \\ \vdots \\ σ_n < v_n, y > \end{pmatrix} = U \begin{pmatrix} σ_1 \cdot 0 \\ \vdots \\ σ_r \cdot 0 \\ 0 \cdot < v_{r+1}, y > \\ \vdots \\ 0 \cdot < v_n, y > \end{pmatrix} = U 0 = 0    (185)
Hence the right singular vectors corresponding to the zero singular values form an orthonormal basis for the null space:

nullspace(A) = span{v_j : σ_j = 0}    (186)
with dimension n − r. We see that
dim(range(A)) + dim(nullspace(A)) = r + (n − r) = n    (187)
In words, this is often written as
rank + nullity = n    (188)
This is called the rank-nullity theorem.
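The rank-nullity theorem is easy to verify numerically. In the sketch below (a deliberately rank-deficient matrix, not an example from the notes), orth and null return orthonormal bases for the range and null space, computed internally from the SVD:

% Verify rank + nullity = n for a rank-deficient matrix.
A = randn(4,2) * randn(2,5);    % 4x5 matrix of rank 2, so n = 5
n = size(A,2);
R = orth(A);                    % orthonormal basis for range(A), 4x2
N = null(A);                    % orthonormal basis for nullspace(A), 5x3
assert(size(R,2) + size(N,2) == n)   % rank + nullity = 2 + 3 = 5
assert(norm(A*N) < 1e-10)            % A maps the null space to zero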

12.2 Existence and uniqueness of solutions
We can finally answer our question about the existence and uniqueness of solutions to Ax = b. If A is full rank (r = n), then Ax can generate every vector b ∈ R^n. Hence for any b, there is an x such that Ax = b, namely x = A^{-1} b, which can be rewritten as
x = A^{-1} b = (UΣV^T)^{-1} b = VΣ^{-1}U^T b = \sum_{j=1}^n < u_j, b > σ_j^{-1} v_j    (189)
Furthermore, this solution must be unique. To see why, suppose y is some other solution, so Ay = b. Define v = y − x. Then
Ay = A(x + v) = Ax + Av = b + Av = b    (190)
Hence v ∈ nullspace(A). So if nullspace(A) = {0} (since nullity = 0), then we must have v = 0, so the solution x is unique.

6This is the opposite convention of [Bee07].
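Equation 189 can be checked directly; a minimal sketch for a random square non-singular A (not from the original notes):

% Solve Ax = b via the SVD expansion of Equation 189, and compare with backslash.
n = 4;
A = randn(n); b = randn(n,1);
[U,S,V] = svd(A);
sigma = diag(S);
x = zeros(n,1);
for j = 1:n
    x = x + (U(:,j)'*b) / sigma(j) * V(:,j);   % <u_j, b> sigma_j^{-1} v_j
end
assert(norm(A*x - b) < 1e-10)
assert(norm(x - A\b) < 1e-10)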

Now suppose A is not full rank (r < n), so the null space is non-trivial. Let z be any particular solution, so Az = b, and let y ∈ nullspace(A). Then A(z + y) = Az + Ay = b + 0 = b.

Hence x = z + y is also a solution for any y in the null space. To find a particular solution z, we can use the pseudo-inverse:

z = A†b = VΣ†U^T b = \sum_{j=1}^r < u_j, b > σ_j^{-1} v_j    (193)

Az = AA†b = U diag(σ_1, ..., σ_r, 0, ..., 0) V^T V diag(1/σ_1, ..., 1/σ_r, 0, ..., 0) U^T b    (194)
= U_{:,1:r} U_{:,1:r}^T b = \sum_{j=1}^r u_j < u_j, b > = b    (195)

The last step follows since b ∈ range(A) (by assumption), and hence b must lie in span{u_j : σ_j > 0}. It is natural to ask: what happens if we apply the pseudo-inverse to a b that is not in the range of A? This happens when A is over-determined (more rows than columns in A). In this case, z = A†b is not a solution, Az ≠ b. However, it is the “closest thing to a solution”, in the sense that it minimizes the residual norm:

||Az − b|| ≤ ||Ay − b||   ∀ y ∈ R^n    (196)
This is called a least squares solution: see Section ??.

12.3 Worked example
Consider the following example from [Van06, p53]. We have the following linear system of equations, where A is singular and lower triangular:
\begin{pmatrix} 6 & 0 & 0 \\ 2 & 0 & 0 \\ -4 & 1 & -2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}    (197)
From the first equation, x_1 = b_1/6. From the second equation,
2x_1 = b_2  ⇒  2 b_1/6 = b_2  ⇒  b_1 = 3b_2    (198)

So the system is only solvable if b_1 = 3b_2, and in that case, we can choose x_2 arbitrarily. Finally, the third equation can be satisfied by taking
x_3 = -(1/2)(b_3 + 4x_1 - x_2) = -(1/3) b_1 - (1/2) b_3 + (1/2) x_2    (199)

So if b_1 ≠ 3b_2, the equation has no solution. If b_1 = 3b_2, there are infinitely many solutions: x = (b_1/6, α, −b_1/3 − b_3/2 + α/2) is a solution for all α. Let us now see how we can derive these results using the SVD. The following code fragment verifies that b = (3, 1, 10) (which satisfies b_1 = 3b_2) is in the range of A, by showing that there are coefficients c that can multiply the left singular vectors to generate b.

Listing 14: Part of svdSpanDemo
A = [6 0 0; 2 0 0; -4 1 -2];
[U,S,V] = svd(A,'econ');
S = diag(S);
ndx0 = find(S<=eps);
ndxPos = find(S>eps);
R = U(:,ndxPos); % range
N = V(:,ndx0);   % null space
b = [3 1 10]';   % this satisfies b1=3b2
% verify that b is in the range
c = R \ b;
assert(approxeq(b, sum(R*diag(c),2)))

Listing 15: Part of svdSpanDemo
z = pinv(A)*b;
assert(approxeq(norm(A*z - b), 0))

We can also create other solutions by adding random multiples of vectors in the null space.

Listing 16: Part of svdSpanDemo
c = rand(length(ndx0),1);
x = z + sum(N*diag(c),2);
assert(approxeq(norm(A*x - b), 0))

Similarly, it is easy to verify that for b = (1, 1, 10) there is no solution, since b is not in the range of A (it cannot be generated by Az for any z):

Listing 17: Part of svdSpanDemo
b = [1 1 10]'; % this does not satisfy b1=3b2
c = R \ b;
assert(not(approxeq(b, sum(R*diag(c),2))))
z = pinv(A)*b;
assert(not(approxeq(norm(A*z - b), 0)))
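Finally, we can check the hand-derived family of solutions x = (b_1/6, α, −b_1/3 − b_3/2 + α/2) from Equation 199 against the same A and b (a small sketch, not part of svdSpanDemo):

% Check the closed-form family of solutions derived by hand above.
A = [6 0 0; 2 0 0; -4 1 -2];
b = [3 1 10]';                 % satisfies b1 = 3*b2, so solutions exist
for alpha = [-2 0 1.5]
    x = [b(1)/6; alpha; -b(1)/3 - b(3)/2 + alpha/2];
    assert(norm(A*x - b) < 1e-10)
end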

References
[Bee07] K. Beers. Numerical Methods for Chemical Engineering: Applications in MATLAB. Cambridge University Press, 2007.
[Bis06] C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[Jor07] M. I. Jordan. An Introduction to Probabilistic Graphical Models. 2007. In preparation.
[Mol04] C. Moler. Numerical Computing with MATLAB. SIAM, 2004.
[Van06] L. Vandenberghe. Applied Numerical Computing: Lecture Notes, 2006.
