Linear Algebra

Kevin P. Murphy
Last updated January 25, 2008

1 Introduction

Linear algebra is the study of matrices and vectors. Matrices can be used to represent linear functions¹ as well as systems of linear equations. For example, the system

\[
\begin{aligned}
4x_1 - 5x_2 &= -13 \\
-2x_1 + 3x_2 &= 9
\end{aligned}
\tag{2}
\]

can be represented more compactly by

\[
Ax = b \tag{3}
\]

where

\[
A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}, \quad
b = \begin{bmatrix} -13 \\ 9 \end{bmatrix}. \tag{4}
\]

Linear algebra also forms the basis of many machine learning methods, such as linear regression, PCA, Kalman filtering, etc.

Note: Much of this chapter is based on notes written by Zico Kolter, and is used with his permission.

¹ A function $f : \mathbb{R}^m \to \mathbb{R}^n$ is called linear if

\[
f(c_1 x + c_2 y) = c_1 f(x) + c_2 f(y) \tag{1}
\]

for all scalars $c_1, c_2$ and vectors $x, y$. Hence we can predict the output of a linear function in terms of its response to simple inputs.

2 Basic notation

We use the following notation:

• By $A \in \mathbb{R}^{m \times n}$ we denote a matrix with $m$ rows and $n$ columns, where the entries of $A$ are real numbers.

• By $x \in \mathbb{R}^n$ we denote a vector with $n$ entries. Usually a vector $x$ will denote a column vector, i.e., a matrix with $n$ rows and 1 column. If we want to explicitly represent a row vector (a matrix with 1 row and $n$ columns), we typically write $x^T$ (here $x^T$ denotes the transpose of $x$, which we will define shortly).

• The $i$th element of a vector $x$ is denoted $x_i$:

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]

• We use the notation $a_{ij}$ (or $A_{ij}$, $A_{i,j}$, etc.) to denote the entry of $A$ in the $i$th row and $j$th column:

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.
\]

• We denote the $j$th column of $A$ by $a_j$ or $A_{:,j}$:

\[
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}.
\]

• We denote the $i$th row of $A$ by $a_i^T$ or $A_{i,:}$:

\[
A = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}.
\]

• A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with

\[
D = \begin{bmatrix}
d_1 & & & \\
& d_2 & & \\
& & \ddots & \\
& & & d_n
\end{bmatrix}. \tag{5}
\]

• The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else, $I = \mathrm{diag}(1, 1, \ldots, 1)$.
It has the property that, for all $A \in \mathbb{R}^{m \times n}$,

\[
AI = A = IA \tag{6}
\]

where the size of $I$ is determined by the dimensions of $A$ so that matrix multiplication is possible. (We define matrix multiplication below.)

• A block diagonal matrix is one which contains matrices on its main diagonal, and is 0 everywhere else, e.g.,

\[
\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}. \tag{7}
\]

• The unit vector $e_i$ is a vector of all 0's, except entry $i$, which has value 1:

\[
e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^T. \tag{8}
\]

The vector of all ones is denoted $\mathbf{1}$. The vector of all zeros is denoted $\mathbf{0}$.

3 Matrix Multiplication

The product of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix

\[
C = AB \in \mathbb{R}^{m \times p}, \tag{9}
\]

where

\[
C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \tag{10}
\]

Note that in order for the matrix product to exist, the number of columns in $A$ must equal the number of rows in $B$. There are many ways of looking at matrix multiplication, and we'll start by examining a few special cases.

3.1 Vector-Vector Products

Given two vectors $x, y \in \mathbb{R}^n$, the quantity $x^T y$, called the inner product, dot product or scalar product of the vectors, is a real number given by

\[
x^T y = \langle x, y \rangle = \sum_{i=1}^{n} x_i y_i \in \mathbb{R}. \tag{11}
\]

Note that it is always the case that $x^T y = y^T x$.

Given vectors $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ (they no longer have to be the same size), $xy^T$ is called the outer product of the vectors. It is a matrix whose entries are given by $(xy^T)_{ij} = x_i y_j$, i.e.,

\[
xy^T \in \mathbb{R}^{m \times n} =
\begin{bmatrix}
x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\
x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\
\vdots & \vdots & \ddots & \vdots \\
x_m y_1 & x_m y_2 & \cdots & x_m y_n
\end{bmatrix}. \tag{12}
\]

3.2 Matrix-Vector Products

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^n$, their product is a vector $y = Ax \in \mathbb{R}^m$. There are a couple of ways of looking at matrix-vector multiplication, and we will look at them both.

If we write $A$ by rows, then we can express $Ax$ as

\[
y = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} x
  = \begin{bmatrix} a_1^T x \\ a_2^T x \\ \vdots \\ a_m^T x \end{bmatrix}. \tag{13}
\]

In other words, the $i$th entry of $y$ is equal to the inner product of the $i$th row of $A$ and $x$, $y_i = a_i^T x$. In Matlab notation, we have

    y = [A(1,:)*x; ...; A(m,:)*x]

Alternatively, let's write $A$ in column form.
In this case we see that

\[
y = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
    \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
  = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n. \tag{14}
\]

In other words, $y$ is a linear combination of the columns of $A$, where the coefficients of the linear combination are given by the entries of $x$. In Matlab notation, we have

    y = A(:,1)*x(1) + ... + A(:,n)*x(n)

So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a row vector. This is written $y^T = x^T A$ for $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^m$, and $y \in \mathbb{R}^n$. As before, we can express $y^T$ in two obvious ways, depending on whether we express $A$ in terms of its rows or columns. In the first case we express $A$ in terms of its columns, which gives

\[
y^T = x^T \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
    = \begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix}, \tag{15}
\]

which demonstrates that the $i$th entry of $y^T$ is equal to the inner product of $x$ and the $i$th column of $A$.

Finally, expressing $A$ in terms of rows, we get the final representation of the vector-matrix product. Since $x \in \mathbb{R}^m$ and $A$ has $m$ rows,

\begin{align}
y^T &= \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix}
       \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix} \tag{16} \\
    &= x_1 a_1^T + x_2 a_2^T + \cdots + x_m a_m^T, \tag{18}
\end{align}

so we see that $y^T$ is a linear combination of the rows of $A$, where the coefficients for the linear combination are given by the entries of $x$.

3.3 Matrix-Matrix Products

Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication $C = AB$ as defined at the beginning of this section.

First, we can view matrix-matrix multiplication as a set of vector-vector products. The most obvious viewpoint, which follows immediately from the definition, is that the $(i, j)$ entry of $C$ is equal to the inner product of the $i$th row of $A$ and the $j$th column of $B$. Symbolically, this looks like the following:

\[
C = AB =
\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}
\begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}
=
\begin{bmatrix}
a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_p \\
a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_p \\
\vdots & \vdots & \ddots & \vdots \\
a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_p
\end{bmatrix}. \tag{19}
\]

Remember that since $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, we have $a_i \in \mathbb{R}^n$ and $b_j \in \mathbb{R}^n$, so these inner products all make sense. This is the most "natural" representation when we represent $A$ by rows and $B$ by columns.

A special case of this result arises in statistical applications where $A = X$ and $B = X^T$, where $X$ is the $n \times d$ design matrix, whose rows are the data cases. In this case, $XX^T$ is an $n \times n$ matrix of inner products called the Gram matrix:

\[
XX^T =
\begin{bmatrix}
x_1^T x_1 & \cdots & x_1^T x_n \\
\vdots & \ddots & \vdots \\
x_n^T x_1 & \cdots & x_n^T x_n
\end{bmatrix}. \tag{20}
\]

Alternatively, we can represent $A$ by columns and $B$ by rows (so that here $b_i^T$ denotes the $i$th row of $B$), which leads to the interpretation of $AB$ as a sum of outer products. Symbolically,

\[
C = AB =
\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_n^T \end{bmatrix}
= \sum_{i=1}^{n} a_i b_i^T. \tag{21}
\]

Put another way, $AB$ is equal to the sum, over all $i$, of the outer product of the $i$th column of $A$ and the $i$th row of $B$. Since, in this case, $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^p$, the dimension of the outer product $a_i b_i^T$ is $m \times p$, which coincides with the dimension of $C$.

If $A = X^T$ and $B = X$, where $X$ is the $n \times d$ design matrix, we get a $d \times d$ matrix which is proportional to the empirical covariance matrix of the data (assuming it has been centered):

\[
X^T X = \sum_{i=1}^{n} x_i x_i^T = \sum_{i=1}^{n}
\begin{bmatrix}
x_{i,1}^2 & x_{i,1} x_{i,2} & \cdots & x_{i,1} x_{i,d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{i,d} x_{i,1} & x_{i,d} x_{i,2} & \cdots & x_{i,d}^2
\end{bmatrix}. \tag{22}
\]

Second, we can also view matrix-matrix multiplication as a set of matrix-vector products. Specifically, if we represent $B$ by columns, we can view the columns of $C$ as matrix-vector products between $A$ and the columns of $B$. Symbolically,

\[
C = AB = A \begin{bmatrix} b_1 & b_2 & \cdots & b_p \end{bmatrix}
       = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_p \end{bmatrix}. \tag{23}
\]

Here the $i$th column of $C$ is given by the matrix-vector product with the vector on the right, $c_i = Ab_i$. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection.
Finally, we have the analogous viewpoint, where we represent $A$ by rows and view the rows of $C$ as vector-matrix products between the rows of $A$ and the matrix $B$: the $i$th row of $C$ is $c_i^T = a_i^T B$.
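
The four viewpoints in this subsection can be checked numerically. The following is a minimal pure-Python sketch, not part of the original notes: matrices are assumed to be stored as lists of rows, and all helper names (`column`, `dot`, `matmul_inner`, `matmul_outer`, `matmul_columns`, `matmul_rows`) are illustrative choices made here. Each function builds $C = AB$ using one viewpoint, and all four should agree.

```python
# Pure-Python sketch: a matrix is a list of rows. Helper names are hypothetical.

def column(M, j):
    """Return column j of M as a list."""
    return [row[j] for row in M]

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def matmul_inner(A, B):
    """C[i][j] = inner product of row i of A and column j of B."""
    p = len(B[0])
    return [[dot(row, column(B, j)) for j in range(p)] for row in A]

def matmul_outer(A, B):
    """C = sum over k of outer(column k of A, row k of B)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for k in range(n):
        a_k, b_k = column(A, k), B[k]
        for i in range(m):
            for j in range(p):
                C[i][j] += a_k[i] * b_k[j]
    return C

def matmul_columns(A, B):
    """Column j of C is the matrix-vector product A * (column j of B)."""
    cols = [[dot(row, column(B, j)) for row in A] for j in range(len(B[0]))]
    # cols[j][i] holds C[i][j], so transpose back to a list of rows.
    return [list(r) for r in zip(*cols)]

def matmul_rows(A, B):
    """Row i of C is the vector-matrix product (row i of A) * B."""
    p = len(B[0])
    return [[dot(row, column(B, j)) for j in range(p)] for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]             # 3 x 2

results = [f(A, B) for f in (matmul_inner, matmul_outer,
                             matmul_columns, matmul_rows)]
print(results[0])
print(all(r == results[0] for r in results))
```

The views differ only in how the same sums are grouped, so which one is most convenient depends on whether the rows or the columns of $A$ and $B$ are the objects of interest.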
