Vector and Matrix Calculus

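Herman Kamper
Published: 2013-01-30. Last update: 2021-07-26

1 Introduction

As explained in detail in [1], there unfortunately exist multiple competing notations concerning the layout of matrix derivatives. This can cause a lot of difficulty when consulting several sources, since different sources might use different conventions. Some sources, for example [2] (from which I use a lot of identities), even use a mixed layout (according to [1, Notes]). Identities for both the numerator layout (sometimes called the Jacobian formulation) and the denominator layout (sometimes called the Hessian formulation) are given in [1], which makes it easy to check which layout a particular source uses. I will aim to stick to the denominator layout, which seems to be the most widely used in the fields of statistics and pattern recognition (e.g. [3] and [4, pp. 327–332]). Other useful references concerning matrix calculus include [5] and [6]. In this document column vectors are assumed in all cases except where specifically stated otherwise.

Table 1: Derivatives of scalars, vector functions and matrices [1, 6].

| | scalar $y$ | column vector $\mathbf{y} \in \mathbb{R}^{m}$ | matrix $\mathbf{Y} \in \mathbb{R}^{m \times n}$ |
|---|---|---|---|
| scalar $x$ | scalar $\partial y / \partial x$ | row vector $\partial \mathbf{y} / \partial x \in \mathbb{R}^{1 \times m}$ | matrix $\partial \mathbf{Y} / \partial x$ (only numerator layout) |
| column vector $\mathbf{x} \in \mathbb{R}^{n}$ | column vector $\partial y / \partial \mathbf{x} \in \mathbb{R}^{n}$ | matrix $\partial \mathbf{y} / \partial \mathbf{x} \in \mathbb{R}^{n \times m}$ | |
| matrix $\mathbf{X} \in \mathbb{R}^{p \times q}$ | matrix $\partial y / \partial \mathbf{X} \in \mathbb{R}^{p \times q}$ | | |

2 Definitions

Table 1 indicates the six possible kinds of derivatives when using the denominator layout. Using this layout notation consistently, we have the following definitions.

The derivative of a scalar function $f : \mathbb{R}^{n} \to \mathbb{R}$ with respect to vector $\mathbf{x} \in \mathbb{R}^{n}$ is

\[
\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(\mathbf{x})}{\partial x_1} \\
\frac{\partial f(\mathbf{x})}{\partial x_2} \\
\vdots \\
\frac{\partial f(\mathbf{x})}{\partial x_n}
\end{bmatrix}
\tag{1}
\]

This is the gradient of $f$ written as a column vector, consistent with (5); in the numerator layout the derivative would instead be its transpose. (Some authors simply call the derivative the gradient, irrespective of whether numerator or denominator layout is used.)

The derivative of a vector function $\mathbf{f} : \mathbb{R}^{n} \to \mathbb{R}^{m}$, where $\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \ldots & f_m(\mathbf{x}) \end{bmatrix}^{T}$ and $\mathbf{x} \in \mathbb{R}^{n}$, with respect to scalar $x_i$ is

\[
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_i} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f_1(\mathbf{x})}{\partial x_i} & \frac{\partial f_2(\mathbf{x})}{\partial x_i} & \ldots & \frac{\partial f_m(\mathbf{x})}{\partial x_i}
\end{bmatrix}
\tag{2}
\]

The derivative of a vector function $\mathbf{f} : \mathbb{R}^{n} \to \mathbb{R}^{m}$, where $\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) & f_2(\mathbf{x}) & \ldots & f_m(\mathbf{x}) \end{bmatrix}^{T}$, with respect to vector $\mathbf{x} \in \mathbb{R}^{n}$ is

\[
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial \mathbf{x}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_1} \\
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_2} \\
\vdots \\
\frac{\partial \mathbf{f}(\mathbf{x})}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f_1(\mathbf{x})}{\partial x_1} & \frac{\partial f_2(\mathbf{x})}{\partial x_1} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_1} \\
\frac{\partial f_1(\mathbf{x})}{\partial x_2} & \frac{\partial f_2(\mathbf{x})}{\partial x_2} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(\mathbf{x})}{\partial x_n} & \frac{\partial f_2(\mathbf{x})}{\partial x_n} & \cdots & \frac{\partial f_m(\mathbf{x})}{\partial x_n}
\end{bmatrix}
\tag{3}
\]

This is just the transpose of the Jacobian matrix.

The derivative of a scalar function $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ with respect to matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ is

\[
\frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \overset{\text{def}}{=}
\begin{bmatrix}
\frac{\partial f(\mathbf{X})}{\partial X_{11}} & \frac{\partial f(\mathbf{X})}{\partial X_{12}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{1n}} \\
\frac{\partial f(\mathbf{X})}{\partial X_{21}} & \frac{\partial f(\mathbf{X})}{\partial X_{22}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f(\mathbf{X})}{\partial X_{m1}} & \frac{\partial f(\mathbf{X})}{\partial X_{m2}} & \cdots & \frac{\partial f(\mathbf{X})}{\partial X_{mn}}
\end{bmatrix}
\tag{4}
\]

Observe that (1) is just a special case of (4) for column vectors.

Often (as in [3]) the gradient notation is used as an alternative to the notation used above, for example:

\[
\nabla_{\mathbf{x}} f(\mathbf{x}) = \frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} \tag{5}
\]

\[
\nabla_{\mathbf{X}} f(\mathbf{X}) = \frac{\partial f(\mathbf{X})}{\partial \mathbf{X}} \tag{6}
\]

3 Identities

3.1 Scalar-by-vector product rule

If $\mathbf{a} \in \mathbb{R}^{m}$, $\mathbf{b} \in \mathbb{R}^{n}$ and $\mathbf{C} \in \mathbb{R}^{m \times n}$ then

\[
\mathbf{a}^{T} \mathbf{C} \mathbf{b}
= \sum_{i=1}^{m} a_i (\mathbf{C}\mathbf{b})_i
= \sum_{i=1}^{m} a_i \left( \sum_{j=1}^{n} C_{ij} b_j \right)
= \sum_{i=1}^{m} \sum_{j=1}^{n} C_{ij} a_i b_j
\tag{7}
\]

Now assume we have vector functions $\mathbf{u} : \mathbb{R}^{q} \to \mathbb{R}^{m}$, $\mathbf{v} : \mathbb{R}^{q} \to \mathbb{R}^{n}$ and a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$. The vector functions $\mathbf{u}$ and $\mathbf{v}$ are functions of $\mathbf{x} \in \mathbb{R}^{q}$, but $\mathbf{A}$ is not. We want to find an identity for

\[
\frac{\partial\, \mathbf{u}^{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} \tag{8}
\]

From (7), we have:

\[
\begin{aligned}
\left[ \frac{\partial\, \mathbf{u}^{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}} \right]_{l}
= \frac{\partial\, \mathbf{u}^{T} \mathbf{A} \mathbf{v}}{\partial x_l}
&= \frac{\partial}{\partial x_l} \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i v_j \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \frac{\partial}{\partial x_l} \left[ u_i v_j \right] \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} \left[ \frac{\partial u_i}{\partial x_l} v_j + u_i \frac{\partial v_j}{\partial x_l} \right] \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} v_j \frac{\partial u_i}{\partial x_l}
 + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned}
\tag{9}
\]

Now we can show (by writing out the elements [Notebook, 2012-05-22]) that:

\[
\begin{aligned}
\left[ \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^{T} \mathbf{u} \right]_{l}
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} v_j \frac{\partial u_i}{\partial x_l}
 + \sum_{i=1}^{m} \sum_{j=1}^{n} (\mathbf{A}^{T})_{ji} u_i \frac{\partial v_j}{\partial x_l} \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} v_j \frac{\partial u_i}{\partial x_l}
 + \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} u_i \frac{\partial v_j}{\partial x_l}
\end{aligned}
\tag{10}
\]

A comparison of (9) and (10) completes the proof that

\[
\frac{\partial\, \mathbf{u}^{T} \mathbf{A} \mathbf{v}}{\partial \mathbf{x}}
= \frac{\partial \mathbf{u}}{\partial \mathbf{x}} \mathbf{A} \mathbf{v}
+ \frac{\partial \mathbf{v}}{\partial \mathbf{x}} \mathbf{A}^{T} \mathbf{u}
\tag{11}
\]

3.2 Useful identities from scalar-by-vector product rule

From (11) it follows, with vectors and matrices $\mathbf{b} \in \mathbb{R}^{m}$, $\mathbf{d} \in \mathbb{R}^{q}$, $\mathbf{x} \in \mathbb{R}^{n}$, $\mathbf{B} \in \mathbb{R}^{m \times n}$, $\mathbf{C} \in \mathbb{R}^{m \times q}$, $\mathbf{D} \in \mathbb{R}^{q \times n}$, that

\[
\frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})^{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}}
= \frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})}{\partial \mathbf{x}} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})
+ \frac{\partial (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}} \mathbf{C}^{T} (\mathbf{B}\mathbf{x} + \mathbf{b})
\tag{12}
\]

resulting in the identity:

\[
\frac{\partial (\mathbf{B}\mathbf{x} + \mathbf{b})^{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d})}{\partial \mathbf{x}}
= \mathbf{B}^{T} \mathbf{C} (\mathbf{D}\mathbf{x} + \mathbf{d}) + \mathbf{D}^{T} \mathbf{C}^{T} (\mathbf{B}\mathbf{x} + \mathbf{b})
\tag{13}
\]

by using the easily verifiable identities:

\[
\frac{\partial (\mathbf{u}(\mathbf{x}) + \mathbf{v}(\mathbf{x}))}{\partial \mathbf{x}}
= \frac{\partial \mathbf{u}(\mathbf{x})}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}(\mathbf{x})}{\partial \mathbf{x}}
\tag{14}
\]

\[
\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}^{T} \tag{15}
\]

\[
\frac{\partial \mathbf{a}}{\partial \mathbf{x}} = \mathbf{0} \tag{16}
\]

Some other useful special cases of (11):

\[
\frac{\partial\, \mathbf{x}^{T} \mathbf{A} \mathbf{b}}{\partial \mathbf{x}} = \mathbf{A}\mathbf{b} \tag{17}
\]

\[
\frac{\partial\, \mathbf{x}^{T} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^{T}) \mathbf{x} \tag{18}
\]

\[
\frac{\partial\, \mathbf{x}^{T} \mathbf{A} \mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x} \quad \text{if } \mathbf{A} \text{ is symmetric} \tag{19}
\]

3.3 Derivatives of determinant

See [7, p. 374] for the definition of cofactors. Also see [Notebook, 2012-05-22].

We can write the determinant of matrix $\mathbf{X} \in \mathbb{R}^{n \times n}$, expanded along any row $i$ with cofactors $C_{ij}$, as

\[
|\mathbf{X}| = X_{i1} C_{i1} + X_{i2} C_{i2} + \ldots + X_{in} C_{in} = \sum_{j=1}^{n} X_{ij} C_{ij} \tag{20}
\]

Thus the derivative will be

\[
\begin{aligned}
\left[ \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} \right]_{kl}
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{i1} C_{i1} + X_{i2} C_{i2} + \ldots + X_{in} C_{in} \right\} \\
&= \frac{\partial}{\partial X_{kl}} \left\{ X_{k1} C_{k1} + X_{k2} C_{k2} + \ldots + X_{kn} C_{kn} \right\}
 \quad \text{(can choose $i$ any number, so choose $i = k$)} \\
&= C_{kl}
\end{aligned}
\tag{21}
\]

The last step holds because each cofactor $C_{kj}$ is computed with row $k$ deleted, so none of them depends on $X_{kl}$. Thus (see [7, p. 386])

\[
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \operatorname{cofactor} \mathbf{X} = (\operatorname{adj} \mathbf{X})^{T} \tag{22}
\]

But we know that the inverse of $\mathbf{X}$ is given by [7, p. 387]

\[
\mathbf{X}^{-1} = \frac{1}{|\mathbf{X}|} \operatorname{adj} \mathbf{X} \tag{23}
\]

thus

\[
\operatorname{adj} \mathbf{X} = |\mathbf{X}| \mathbf{X}^{-1} \tag{24}
\]

which, when substituted into (22), results in the identity

\[
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}| (\mathbf{X}^{-1})^{T} \tag{25}
\]

From (25) we can also write

\[
\left[ \frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} \right]_{kl}
= \frac{\partial \ln |\mathbf{X}|}{\partial X_{kl}}
= \frac{1}{|\mathbf{X}|} \frac{\partial |\mathbf{X}|}{\partial X_{kl}}
= \frac{1}{|\mathbf{X}|} |\mathbf{X}| \left[ (\mathbf{X}^{-1})^{T} \right]_{kl}
\tag{26}
\]

giving the identity

\[
\frac{\partial \ln |\mathbf{X}|}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^{T} \tag{27}
\]

4 References

[1] Matrix calculus. [Online]. Available: http://en.wikipedia.org/wiki/Matrix_calculus
[2] K. B. Petersen and M. S. Pedersen, "The matrix cookbook," 2008.
[3] A. Ng, Machine Learning. Class notes for CS229, Stanford Engineering Everywhere, Stanford University, 2008. [Online]. Available: http://see.stanford.edu
[4] S. R. Searle, Matrix Algebra Useful for Statistics. New York, NY: John Wiley & Sons, 1982.
[5] J. R. Schott, Matrix Analysis for Statistics. New York, NY: John Wiley & Sons, 1996.
[6] T. P. Minka, "Old and new matrix algebra useful for statistics," 2000. [Online]. Available: http://research.microsoft.com/en-us/um/people/minka/papers/matrix
[7] D. G. Zill and M. R. Cullen, Advanced Engineering Mathematics, 3rd ed. Jones and Bartlett, 2006.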
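A numerical sanity check can make the layout convention concrete. The sketch below is not part of the original derivations: it compares identities (18), (22) and (27) against central finite differences, arranging the partial derivatives in the denominator layout of (1) and (4). It uses only the Python standard library, and every helper name, test matrix and step size in it is a hypothetical choice made for illustration.

```python
import math

# Hypothetical helpers for this sketch; pure Python, small dense matrices only.

def quad_form(A, x):
    """Scalar x^T A x for a square list-of-lists A and a list x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def quad_form_derivative(A, x):
    """Analytic derivative from identity (18): (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def det3(X):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (X[0][0] * (X[1][1] * X[2][2] - X[1][2] * X[2][1])
            - X[0][1] * (X[1][0] * X[2][2] - X[1][2] * X[2][0])
            + X[0][2] * (X[1][0] * X[2][1] - X[1][1] * X[2][0]))

def cofactors3(X):
    """Matrix of cofactors C_kl of a 3x3 matrix, as used in (20) to (22)."""
    C = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        for l in range(3):
            r = [i for i in range(3) if i != k]  # rows left after deleting row k
            c = [j for j in range(3) if j != l]  # columns left after deleting column l
            minor = X[r[0]][c[0]] * X[r[1]][c[1]] - X[r[0]][c[1]] * X[r[1]][c[0]]
            C[k][l] = (-1) ** (k + l) * minor
    return C

def numeric_derivative_vector(f, x, h=1e-6):
    """Central differences of scalar f, stacked as in definition (1)."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out

def numeric_derivative_matrix(f, X, h=1e-6):
    """Central differences of scalar f, arranged as in definition (4)."""
    G = [[0.0] * len(X[0]) for _ in X]
    for k in range(len(X)):
        for l in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[k][l] += h
            Xm[k][l] -= h
            G[k][l] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def max_abs_diff(G, H):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(a - b) for rg, rh in zip(G, H) for a, b in zip(rg, rh))

# Arbitrary test data (assumed values, not taken from the text).
A = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [0.5, 0.0, 2.0]]
x = [0.3, -1.2, 2.0]
X = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]  # determinant is positive

# Identity (18): derivative of x^T A x equals (A + A^T) x.
numeric_q = numeric_derivative_vector(lambda z: quad_form(A, z), x)
err_quad = max(abs(a - b) for a, b in zip(numeric_q, quad_form_derivative(A, x)))

# Identities (21) and (22): derivative of |X| equals the cofactor matrix.
C = cofactors3(X)
err_det = max_abs_diff(numeric_derivative_matrix(det3, X), C)

# Identity (27): derivative of ln|X| equals (X^-1)^T, which is C / |X| by (23).
inv_T = [[c / det3(X) for c in row] for row in C]
err_logdet = max_abs_diff(
    numeric_derivative_matrix(lambda M: math.log(det3(M)), X), inv_T)

print("max abs errors:", err_quad, err_det, err_logdet)
```

All three errors should be small relative to the entries involved. Storing each finite-difference quotient at the position of the variable it differentiates with respect to is what makes this a check of the denominator layout; a numerator-layout check would compare against the transposed arrays.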
