B553 Lecture 5: Matrix Algebra Review
Kris Hauser
January 19, 2012

We have seen in prior lectures how vectors represent points in $\mathbb{R}^n$ and gradients of functions. Matrices represent linear transformations of vector quantities. This lecture presents the standard matrix notation, conventions, and basic identities that will be used throughout this course. From this point on we also drop the boldface notation for vectors, and it will remain this way for the rest of the class.

1 Matrices

A matrix $A$ represents a linear transformation from an $n$-dimensional vector space to an $m$-dimensional one. It is given by an $m \times n$ array of real numbers. Matrices are usually denoted by uppercase letters (e.g., $A, B, C$), with the entry in the $i$'th row and $j$'th column denoted by the subscript $\cdot_{i,j}$ or, when it is unambiguous, $\cdot_{ij}$ (e.g., $A_{1,2}$, $A_{1p}$).

\[
A = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{bmatrix} \tag{1}
\]

1.1 Matrix-Vector Product

An $m \times n$ matrix $A$ transforms $n$-dimensional vectors $x = (x_1, \ldots, x_n)$ into $m$-dimensional vectors $y = (y_1, \ldots, y_m) = Ax$ as follows:

\[
\begin{aligned}
y_1 &= \sum_{j=1}^n A_{1j} x_j \\
&\;\;\vdots \\
y_m &= \sum_{j=1}^n A_{mj} x_j
\end{aligned} \tag{2}
\]

Or, more concisely, $y_i = \sum_{j=1}^n A_{ij} x_j$ for $i = 1, \ldots, m$. (Note that matrix-vector multiplication is not commutative: $xA$ is an invalid operation.)

Linearity of matrix-vector multiplication. Matrix-vector multiplication is linear, that is, $A(ax + by) = aAx + bAy$ for all scalars $a$, $b$ and all vectors $x$, $y$. It is also linear with respect to component-wise addition and scalar multiplication of matrices, as long as the matrices are of the same size. More precisely, if $A$ and $B$ are both $m \times n$ matrices, then $(aA + bB)x = aAx + bBx$ for all $a$, $b$, and $x$.

Identity matrix. One special matrix that occurs frequently is the $n \times n$ identity matrix $I_n$, which has 0's in all off-diagonal positions $I_{ij}$ with $i \neq j$, and 1's in all diagonal positions $I_{ii}$. It is significant because $I_n x = x$ for all $x \in \mathbb{R}^n$.
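The matrix-vector product, its linearity, and the identity property can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the helper names `mat_vec` and `identity` and all numeric values are assumptions chosen for illustration, and matrices are stored as lists of rows.

```python
def mat_vec(A, x):
    """Return y = A x, with y_i = sum_j A_ij x_j (A is a list of m rows of length n)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have the same length as x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def identity(n):
    """Return the n x n identity matrix: 1 on the diagonal, 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Illustrative values (assumptions, not from the lecture).
A = [[1, 2, 3],
     [4, 5, 6]]          # a 2 x 3 matrix, so m = 2 and n = 3
x = [1, 0, -1]
y = [2, 1, 0]
a, b = 3, -2

# Linearity: A(ax + by) should equal a(Ax) + b(Ay).
lhs = mat_vec(A, [a * xi + b * yi for xi, yi in zip(x, y)])
rhs = [a * u + b * v for u, v in zip(mat_vec(A, x), mat_vec(A, y))]

print(mat_vec(A, x))            # the 2-dimensional vector Ax
print(lhs == rhs)               # linearity check
print(mat_vec(identity(3), x))  # multiplying by I_3 returns x unchanged
```

Storing a matrix as a list of rows makes each output component a single dot product of one row with $x$, which mirrors the formula for $y_i$ directly.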
1.2 Matrix Product

When two linear transformations are performed one after the other, the result is also a linear transformation. Suppose $A$ is $m \times n$, $B$ is $n \times p$, and $x$ is a $p$-dimensional vector, and consider the result of $A(Bx)$ (that is, first multiplying by $B$ and then multiplying the result by $A$). We see that

\[
Bx = \Big( \sum_{j=1}^p B_{1j} x_j, \ldots, \sum_{j=1}^p B_{nj} x_j \Big) \tag{3}
\]

and, for any $n$-dimensional vector $y$,

\[
Ay = \Big( \sum_{k=1}^n A_{1k} y_k, \ldots, \sum_{k=1}^n A_{mk} y_k \Big). \tag{4}
\]

So, substituting $y = Bx$,

\[
A(Bx) = \Big( \sum_{k=1}^n A_{1k} \Big( \sum_{j=1}^p B_{kj} x_j \Big), \ldots, \sum_{k=1}^n A_{mk} \Big( \sum_{j=1}^p B_{kj} x_j \Big) \Big). \tag{5}
\]

Rearranging the summations, we see that

\[
A(Bx) = \Big( \sum_{j=1}^p \Big( \sum_{k=1}^n A_{1k} B_{kj} \Big) x_j, \ldots, \sum_{j=1}^p \Big( \sum_{k=1}^n A_{mk} B_{kj} \Big) x_j \Big). \tag{6}
\]

In other words, we have $A(Bx) = Cx$ if we form the $m \times p$ matrix $C$ such that

\[
C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}. \tag{7}
\]

This is exactly the definition of the matrix product, and we say $C = AB$. The entry $C_{ij}$ can also be obtained by taking the dot product of the $i$'th row of $A$ and the $j$'th column of $B$.

Matrix product is associative but not commutative. By the above derivation we can drop the parentheses: $A(Bx) = (AB)x$. So, matrix-vector and matrix-matrix multiplication are associative. Note again, however, that matrix-matrix multiplication is not commutative, that is, $AB \neq BA$ in general.

Column and row vectors. If we write an $n$-dimensional vector $x$ as an $n \times 1$ matrix (also denoted by the lowercase $x$), the matrix-vector product $y = Ax$ becomes an ordinary matrix product $y = Ax$. Here, if $A$ is an $m \times n$ matrix, then $y$ is an $m \times 1$ matrix.

\[
\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{m,1} & \cdots & A_{m,n} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \tag{8}
\]

Hence, there is a one-to-one correspondence between vectors and matrices with one column. These matrices are called column vectors and will be our default notation for vectors throughout the rest of the course. We will occasionally also deal with row vectors, which are matrices with a single row.

1.3 Transpose

The transpose $A^T$ of a matrix $A$ simply switches $A$'s rows and columns:

\[
(A^T)_{ij} = A_{ji}. \tag{9}
\]

If $A$ is $m \times n$, then $A^T$ is $n \times m$.

Symmetric matrix.
If $A = A^T$, then $A$ is symmetric.

1.4 Matrix Inverse

An inverse $A^{-1}$ of an $n \times n$ square matrix $A$ is a matrix that satisfies the following equation:

\[
AA^{-1} = A^{-1}A = I_n \tag{10}
\]

where $I_n$ is the identity matrix. Not all square matrices have an inverse; a matrix without one is said to be not invertible (or singular). Invertible matrices are significant because, when $A$ is invertible, the system of linear equations $Ax = b$ has the unique solution $x = A^{-1}b$. This holds for any $b$. If the matrix is not invertible, then such an equation may or may not have a solution.

Orthogonal matrix. An orthogonal matrix is a square matrix that satisfies $AA^T = I_n$. In other words, its transpose is its inverse.

1.5 Matrix identities

Identities involving the transpose:

• $(cA)^T = cA^T$ for any real value $c$.
• $(A + B)^T = A^T + B^T$.
• $(AB)^T = B^T A^T$.
• All $1 \times 1$ matrices are symmetric, the identity matrix is symmetric, and all uniform scalings of a symmetric matrix are symmetric.
• $A + A^T$ is symmetric.
• The dot product $x \cdot y$ is equal to $x^T y$, with $x$ and $y$ denoting the column vector representations of $x$ and $y$, respectively.
• $x^T A y = y^T A^T x$, with $x$ and $y$ column vectors.

Identities involving the inverse:

• $I_n^{-1} = I_n$.
• $(cA)^{-1} = \frac{1}{c} A^{-1}$ for any real value $c \neq 0$.
• $(AB)^{-1} = B^{-1} A^{-1}$ if both $B$ and $A$ are invertible.
• If $A$ and $B$ are invertible, then $(ABA^{-1})^{-1} = AB^{-1}A^{-1}$.

1.6 Common mistakes

Matrix expressions are similar to standard expressions over real numbers: addition and subtraction behave the same way, multiplication behaves nearly the same way, and inverses play a role analogous to division. But this similarity leads to common pitfalls when manipulating matrix equations. Here are some common mistakes that you should look out for.

1. Swapping the arguments of a matrix product.
2. Propagating transposes or inverses into a matrix product without swapping the order of arguments.
3. Assuming that a matrix is invertible (or worse, assuming a non-square matrix is invertible).
4. Performing operations on matrices of incompatible size.

2 Rank, Null space, and Definiteness

If $A$ is not invertible (for instance, it may not be square) then the system of linear equations $Ax = b$ may not have a solution $x$. Or, it may have an infinite number of solutions. Or, it may have solutions for some $b$'s and not others. We would like to characterize, based on properties of $A$, when such equations can be solved.

2.1 Matrix rank

Consider the columns of $A$ as a list of vectors $a_1, \ldots, a_n$. Recall that if $b \in \mathrm{Span}(a_1, \ldots, a_n)$, then $b$ is a linear combination of $a_1, \ldots, a_n$. If this holds, then it is sufficient to set each component $x_i$ to the respective coefficient on $a_i$ in order to solve $Ax = b$. On the other hand, if $b \notin \mathrm{Span}(a_1, \ldots, a_n)$, then there is no solution. So, the set of vectors $b$ such that $Ax = b$ has a solution is precisely $\mathrm{Span}(a_1, \ldots, a_n)$.

Rank. The rank of an $m \times n$ matrix $A$ is the size of the largest subset of $\{a_1, \ldots, a_n\}$ that is linearly independent. In other words, if $A$ has rank $k$, then $\mathrm{Span}(a_1, \ldots, a_n)$ is a $k$-dimensional subspace of $\mathbb{R}^m$. If $k = n$, then $A$ is said to have full column rank, and the system $Ax = b$ has at most one solution. If $k = m$, then $A$ is said to have full row rank, and the system $Ax = b$ has at least one solution. If $k = m = n$, then $A$ is invertible.

Overdetermined system. Now suppose that the rank of $A$ is $k < m$. Then there are some possible values of $b$ that are not attainable by linear combinations of $a_1, \ldots, a_n$. Such systems are known as overdetermined because there are more constraints than can be fulfilled by adjusting the values of $x$. Overdetermined systems are usually not solved exactly, but are more often solved in a least squares sense, $\min_x \|Ax - b\|^2$.

Underdetermined system. If the rank of $A$ is $k < n$, then for any given $x_0$ there are an infinite number of solutions $x$ to the equation $Ax = Ax_0$. To see this, let some column of $A$ be linearly dependent on the remaining columns.
Suppose this column is $a_1$ without loss of generality. Then $a_1 - \sum_{i=2}^n c_i a_i = 0$ for some coefficients $c_i$, which means that $Av = 0$ for the vector $v = (1, -c_2, \ldots, -c_n)$. So, any multiple $cv$ can be added to $x_0$ without changing the product: $A(x_0 + cv) = Ax_0$ for every scalar $c$. Such systems are known as underdetermined because they may be solved by multiple values of $x$.

A system can be both underdetermined and overdetermined if $k < m$ and $k < n$. This means there are some values of $b$ for which there is no solution, but for those that do have a solution, there are an infinite number of solutions.
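The argument for underdetermined systems can be illustrated with a small numeric case. The sketch below is an assumption-laden example rather than material from the lecture: the helper `mat_vec`, the $2 \times 3$ matrix, and the vectors are all hypothetical. The first column is built as $a_1 = 2a_2 + 3a_3$, so $c_2 = 2$, $c_3 = 3$, and $v = (1, -2, -3)$.

```python
def mat_vec(A, x):
    """Return the matrix-vector product A x for A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical 2 x 3 matrix with columns a1 = (2, 3), a2 = (1, 0), a3 = (0, 1).
# Here a1 = 2*a2 + 3*a3, so the columns are linearly dependent (rank 2 < n = 3).
A = [[2, 1, 0],
     [3, 0, 1]]

v = [1, -2, -3]      # v = (1, -c2, -c3) with c2 = 2, c3 = 3
x0 = [1, 1, 1]       # an arbitrary starting point

b = mat_vec(A, x0)   # the right-hand side that x0 solves

# Adding any multiple of v to x0 gives another solution of A x = b.
shifted = [mat_vec(A, [xi + c * vi for xi, vi in zip(x0, v)])
           for c in (-2, 1, 5)]

print(mat_vec(A, v))                  # A v is the zero vector
print(b)                              # A x0
print(all(s == b for s in shifted))   # every shifted point maps to the same b
```

Because the rank of this matrix equals $m = 2$, every $b$ is attainable, so the example is underdetermined without also being overdetermined.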