Some Important Properties for Matrix Calculus

Dawen Liang
Carnegie Mellon University
[email protected]

1 Introduction

Matrix calculation plays an essential role in many machine learning algorithms, among which matrix calculus is the most commonly used tool. In this note, based on properties from differential calculus, we show that they are all adaptable to matrix calculus¹. At the end, an example on least-squares linear regression is presented.

2 Notation

A matrix is represented as a bold upper-case letter, e.g. $\mathbf{X}$, where $\mathbf{X}_{m,n}$ indicates that the numbers of rows and columns are $m$ and $n$, respectively. A vector is represented as a bold lower-case letter, e.g. $\mathbf{x}$; in this note it is an $n \times 1$ column vector. An important concept for an $n \times n$ matrix $\mathbf{A}_{n,n}$ is the trace $\mathrm{Tr}(\mathbf{A})$, defined as the sum of the diagonal:

$$\mathrm{Tr}(\mathbf{A}) = \sum_{i=1}^{n} A_{ii} \tag{1}$$

where $A_{ii}$ indexes the element at the $i$th row and $i$th column.

3 Properties

The derivative of a matrix is usually referred to as the gradient, denoted $\nabla$. Consider a function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$; the gradient of $f(\mathbf{A})$ w.r.t. $\mathbf{A}_{m,n}$ is:

$$\nabla_{\mathbf{A}} f(\mathbf{A}) = \frac{\partial f(\mathbf{A})}{\partial \mathbf{A}} =
\begin{pmatrix}
\frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{12}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\
\frac{\partial f}{\partial A_{21}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial A_{m1}} & \frac{\partial f}{\partial A_{m2}} & \cdots & \frac{\partial f}{\partial A_{mn}}
\end{pmatrix}$$

This definition is very similar to the differential derivative, thus a few simple properties hold (the matrix $\mathbf{A}$ below is a square matrix with the same dimension as the vectors):

$$\nabla_{\mathbf{x}}\, \mathbf{b}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b} \tag{2}$$

$$\nabla_{\mathbf{A}}\, \mathrm{Tr}(\mathbf{X}\mathbf{A}\mathbf{Y}) = \mathbf{X}^T\mathbf{Y}^T \tag{3}$$

$$\nabla_{\mathbf{x}}\, \mathbf{x}^T\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} \tag{4}$$

$$\nabla_{\mathbf{A}^T} f(\mathbf{A}) = \left(\nabla_{\mathbf{A}} f(\mathbf{A})\right)^T \tag{5}$$

where the superscript $T$ denotes the transpose of a matrix or a vector.

Now let us turn to the properties of the derivative of the trace. First of all, a few useful properties of the trace itself:

$$\mathrm{Tr}(\mathbf{A}) = \mathrm{Tr}(\mathbf{A}^T) \tag{6}$$

$$\mathrm{Tr}(\mathbf{ABC}) = \mathrm{Tr}(\mathbf{BCA}) = \mathrm{Tr}(\mathbf{CAB}) \tag{7}$$

$$\mathrm{Tr}(\mathbf{A} + \mathbf{B}) = \mathrm{Tr}(\mathbf{A}) + \mathrm{Tr}(\mathbf{B}) \tag{8}$$

which are all easily derived.

¹Some of the detailed derivations omitted in this note can be found at http://www.cs.berkeley.edu/~jduchi/projects/matrix_prop.pdf
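These identities are easy to sanity-check numerically. The sketch below is not part of the original note: it uses NumPy, and the helper name and random seed are my own choices. It compares Eqs. (2) and (4) against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))  # square matrix, as assumed for Eqs. (2) and (4)
b = rng.normal(size=n)
x = rng.normal(size=n)

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    return np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                     for e in np.eye(len(x))])

# Eq. (2): grad_x b^T A x = A^T b
g2 = numerical_grad(lambda x: b @ A @ x, x)
assert np.allclose(g2, A.T @ b, atol=1e-5)

# Eq. (4): grad_x x^T A x = A x + A^T x
g4 = numerical_grad(lambda x: x @ A @ x, x)
assert np.allclose(g4, A @ x + A.T @ x, atol=1e-5)
```

Since both functions are at most quadratic in $\mathbf{x}$, central differences are exact up to floating-point rounding, so a loose tolerance suffices.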
Note that the cyclic property (Eq. 7) can be extended to the more general case with an arbitrary number of matrices. Thus, for the derivatives:

$$\nabla_{\mathbf{A}} \mathrm{Tr}(\mathbf{AB}) = \mathbf{B}^T \tag{9}$$

Proof: just expand $\mathrm{Tr}(\mathbf{AB})$ according to the trace definition (Eq. 1).

$$\nabla_{\mathbf{A}} \mathrm{Tr}(\mathbf{ABA}^T\mathbf{C}) = \mathbf{CAB} + \mathbf{C}^T\mathbf{AB}^T \tag{10}$$

Proof: write $u(\mathbf{A}) = \mathbf{AB}$ and $v(\mathbf{A}^T) = \mathbf{A}^T\mathbf{C}$. Then

$$\begin{aligned}
\nabla_{\mathbf{A}} \mathrm{Tr}(\mathbf{ABA}^T\mathbf{C})
&= \nabla_{\mathbf{A}} \mathrm{Tr}\big(u(\mathbf{A})\, v(\mathbf{A}^T)\big) \\
&= \nabla_{\mathbf{A}:u(\mathbf{A})} \mathrm{Tr}\big(u(\mathbf{A})\, v(\mathbf{A}^T)\big) + \nabla_{\mathbf{A}:v(\mathbf{A}^T)} \mathrm{Tr}\big(u(\mathbf{A})\, v(\mathbf{A}^T)\big) \\
&= \big(\mathbf{B}\mathbf{A}^T\mathbf{C}\big)^T + \Big(\nabla_{\mathbf{A}^T:v(\mathbf{A}^T)} \mathrm{Tr}\big(u(\mathbf{A})\, v(\mathbf{A}^T)\big)\Big)^T \\
&= \mathbf{C}^T\mathbf{A}\mathbf{B}^T + \big((\mathbf{C}\mathbf{A}\mathbf{B})^T\big)^T \\
&= \mathbf{C}\mathbf{A}\mathbf{B} + \mathbf{C}^T\mathbf{A}\mathbf{B}^T
\end{aligned}$$

Here we make use of the product rule for derivatives: $(u(x)v(x))' = u'(x)v(x) + u(x)v'(x)$. The notation $\nabla_{\mathbf{A}:u(\mathbf{A})}$ means taking the derivative w.r.t. $\mathbf{A}$ only through $u(\mathbf{A})$; the same applies to $\nabla_{\mathbf{A}^T:v(\mathbf{A}^T)}$. The chain rule is used here, and the conversion from $\nabla_{\mathbf{A}:v(\mathbf{A}^T)}$ to $\nabla_{\mathbf{A}^T:v(\mathbf{A}^T)}$ is based on Eq. 5.

4 An Example on Least-squares Linear Regression

Now we will derive the solution for least-squares linear regression in matrix form, using the properties shown above. We know that least-squares linear regression has a closed-form solution (often referred to as the normal equation).

Assume we have $N$ data points $\{\mathbf{x}^{(i)}, y^{(i)}\}_{i=1}^{N}$, and the linear regression function $h_{\boldsymbol{\theta}}(\mathbf{x})$ is parametrized by $\boldsymbol{\theta}$. We can rearrange the data into matrix form:

$$\mathbf{X} = \begin{pmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(N)})^T \end{pmatrix} \qquad
\mathbf{y} = \begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{pmatrix}$$

Thus the error can be represented as:

$$\mathbf{X}\boldsymbol{\theta} - \mathbf{y} = \begin{pmatrix} h_{\boldsymbol{\theta}}(\mathbf{x}^{(1)}) - y^{(1)} \\ h_{\boldsymbol{\theta}}(\mathbf{x}^{(2)}) - y^{(2)} \\ \vdots \\ h_{\boldsymbol{\theta}}(\mathbf{x}^{(N)}) - y^{(N)} \end{pmatrix}$$

The squared error $E(\boldsymbol{\theta})$, according to the numerical definition:

$$E(\boldsymbol{\theta}) = \frac{1}{2} \sum_{i=1}^{N} \big(h_{\boldsymbol{\theta}}(\mathbf{x}^{(i)}) - y^{(i)}\big)^2$$

which is equivalent to the matrix form:

$$E(\boldsymbol{\theta}) = \frac{1}{2} (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$$

Take the derivative (note that $(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$ is a $1 \times 1$ matrix, thus $\mathrm{Tr}(\cdot) = (\cdot)$):

$$\begin{aligned}
\nabla_{\boldsymbol{\theta}} E(\boldsymbol{\theta})
&= \frac{1}{2} \nabla\, (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T (\mathbf{X}\boldsymbol{\theta} - \mathbf{y}) \\
&= \frac{1}{2} \nabla\, \mathrm{Tr}\big(\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - \mathbf{y}^T\mathbf{X}\boldsymbol{\theta} - \boldsymbol{\theta}^T\mathbf{X}^T\mathbf{y} + \mathbf{y}^T\mathbf{y}\big) \\
&= \frac{1}{2} \big(\nabla\, \mathrm{Tr}(\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\theta}) - \nabla\, \mathrm{Tr}(\mathbf{y}^T\mathbf{X}\boldsymbol{\theta}) - \nabla\, \mathrm{Tr}(\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{y})\big) \\
&= \frac{1}{2} \big(\nabla\, \mathrm{Tr}(\boldsymbol{\theta}\,\mathbf{I}\,\boldsymbol{\theta}^T\mathbf{X}^T\mathbf{X}) - (\mathbf{y}^T\mathbf{X})^T - \mathbf{X}^T\mathbf{y}\big)
\end{aligned}$$

The first term can be computed using Eq. 10, where $\mathbf{A} = \boldsymbol{\theta}$, $\mathbf{B} = \mathbf{I}$, and $\mathbf{C} = \mathbf{X}^T\mathbf{X}$ (note that in this case $\mathbf{C} = \mathbf{C}^T$).
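As a quick numerical check (my own addition, not from the note; the dimensions and seed below are arbitrary), Eq. 10 can be verified against entrywise finite differences in NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, n))
C = rng.normal(size=(n, n))

def f(A):
    # scalar function Tr(A B A^T C) from Eq. (10)
    return np.trace(A @ B @ A.T @ C)

# entrywise central finite differences w.r.t. A
eps = 1e-6
grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        grad_fd[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

grad_closed = C @ A @ B + C.T @ A @ B.T  # right-hand side of Eq. (10)
assert np.allclose(grad_fd, grad_closed, atol=1e-5)
```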
Plugging back into the derivation:

$$\begin{aligned}
\nabla_{\boldsymbol{\theta}} E(\boldsymbol{\theta})
&= \frac{1}{2} \big(\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} + \mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y}\big) \\
&= \frac{1}{2} \big(2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y}\big)
\end{aligned}$$

Setting the gradient to $0$:

$$\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} = \mathbf{X}^T\mathbf{y}$$

$$\boldsymbol{\theta}_{LS} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

The normal equation is obtained in matrix form.
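The closed-form solution can be cross-checked against a library solver. The sketch below (my own, with synthetic data; the sizes, seed, and noise level are arbitrary) compares the normal equation with NumPy's built-in least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 50, 3
X = rng.normal(size=(N, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.01 * rng.normal(size=N)

# Normal equation: solve X^T X theta = X^T y
# (solving the linear system avoids forming the explicit inverse)
theta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Reference: NumPy's built-in least-squares solver
theta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta_ls, theta_ref)
```

In practice one solves the linear system rather than inverting $\mathbf{X}^T\mathbf{X}$, which is cheaper and numerically more stable.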
