Vector/Matrix Calculus

Vector/Matrix Calculus

Vector/Matrix Calculus In neural networks, we often encounter prob- lems with analysis of several variables. Vec- tor/Matrix calculus extends calculus of one vari- able into that of a vector or a matrix of vari- ables. Vector Gradient: Let g(w) be a differentiable scalar function of m variables, where T w = [w1,...,wm] Then the vector gradient of g(w) w.r.t. w is the m-dimensional vector of partial derivatives of g ∂g ∂w ∂g . 1 = ∇g = ∇wg = . ∂w ∂g ∂wm Similarly we can define second-order gradient or Hessian matrix. 1 Hessian matrix is defined as ∂2g ∂2g 2 ··· ∂w w ∂2g ∂w1 1 m = . 2 . ∂w 2 2 ∂ g ··· ∂ g ∂wmw1 ∂w2 m Jacobian matrix: Generalization to the vector valued functions T g(w) = [g1(w),...,gn(w)] leads to a definition of the Jacobian matrix of g w.r.t. w ∂g1 ··· ∂gn ∂g ∂w1 ∂w1 = . ∂w ∂g ∂gn 1 ··· ∂wm ∂wm In this vector convention the columns of the Jacobian matrix are gradients of the corre- sponding components functions gi(w) w.r.t. the vector w. 2 Differentiation Rules: The differentiation rules are analogous to the case of ordinary functions: • ∂f(w)g(w) ∂f(w) ∂g(w) = g(w)+ f(w) ∂w ∂w ∂w • ∂f(w) ∂g(w) ∂f(w)/g(w) g(w) − f(w) = ∂w ∂w ∂w g2(w) • ∂f(g(w)) ∂g(w) = f ′(g(w)) ∂w ∂w 3 Example: Consider m T g(w)= aiwi = a w i=1 where a is a constant vector. Thus, a ∂g 1 = . ∂w am or in the vector notation ∂aT w = a. ∂w T 3 Example: Let w = (w1,w2,w3) ∈ R . g(w) = 2w1 + 5w2 + 12w3 = 2 5 12 w. Because ∂g ∂ = (2w1 +5w2 +12w3)=2+0+0=2 ∂w1 ∂w1 hence, 2 ∂g = 5 . ∂w 12 4 Example: Consider m m T g(w)= aijwiwj = w Aw i=1 j=1 where A is a constant square matrix. Thus, m m j=1 wja1j + i=1 wiai1 ∂g . = . ∂w m m j=1 wjamj + i=1 wiaim and so in the vector notation ∂wT Aw = Aw + AT w. ∂w Hessian of g(w) is 2a ··· a + a ∂2wT Aw 11 1m m1 = . ∂w2 a 1 + a1 ··· 2amm m m which equals to A + AT . T 2 Example: Let w = (w1,w2) ∈ R . g(w) = 3w1w1 + 2w1w2 + 6w2w1 + 5w2w2 5 3 2 w1 = w1 w2 6 5 w2 ∂g ∂ = (3w1w1 + 2w1w2 + 6w2w1 + 5w2w2) ∂w1 ∂w1 = 3 ∗ 2w1 + 2w2 + 6w2 +0=6w1 + 8w2 ∂g =0+2 ∗ w1 + 6w1 + 5 ∗ 2w2 = 8w1 + 10w2 ∂w2 and so in the vector notation ∂wT Aw 3 2 w 3 6 w = 1 + 1 ∂w 6 5 w2 2 5 w2 6 8 w = 1 8 10 w2 Hessian of g(w) is ∂2wT Aw ∂ ∂wT Aw = ( ) ∂w2 ∂w ∂w 2 ∗ 3 2+6 6 8 = = . 6+2 2 ∗ 5 8 10 6 Matrix Gradient: Consider a scalar valued func- tion g(W) of the n × m matrix W = {wij} (e.g. determinant of a matrix). The matrix gradi- ent w.r.t W is a matrix of the same dimension as W consisting of partial derivatives of g(W) w.r.t. components of W: ∂g ∂g ∂w ··· ∂w ∂g .11 .1n = . ∂W ∂g ∂g ··· ∂wm1 ∂wmn ∂g Example: If g(W) = tr(W), then ∂W = I Example: Consider a matrix function m m T g(W)= wijaiaj = a Wa i=1 j=1 i.e. assume that a is a constant vector, whereas W is a matrix of variables. Taking the gradient T w.r.t. to W yields ∂a Wa = a a . Thus, in the ∂wij i j matrix form ∂aT Wa = aaT ∂W 7 Example: g(W) = 9w11 + 6w21 + 6w12 + 4w22 w w 3 = 3 2 11 12 w21 w22 2 Thus we have ∂g ∂ = (9w11 + 6w21 + 6w12 + 4w22) = 9 ∂w11 ∂w11 ∂g ∂ = (9w11 + 6w21 + 6w12 + 4w22) = 6 ∂w12 ∂w12 ∂g ∂ = (9w11 + 6w21 + 6w12 + 4w22) = 6 ∂w21 ∂w21 ∂g ∂ = (9w11 + 6w21 + 6w12 + 4w22) = 4 ∂w22 ∂w22 hence ∂g 9 6 3 = = 3 2 ∂W 6 4 2 8 Example: Let W be an invertible square ma- trix of dimension m with determinant, det(W). Then, ∂ det W = (WT )−1 det W ∂W Recall that 1 W−1 = adj(W). det W In the above adj(W) is the adjoint of W: W11 ··· Wm1 . adj(W)= . W1 ··· Wmm m where Wij is a cofactor obtained by multiplying term (−1)i+j by the determinant of a matrix obtained from W by removing the ith row and the jth column. Recall that the determinant of W can be also obtained using cofactors: m det W = wikWik. k=1 9 In the above formula i denotes an arbitrary row. Now taking derivative of det(W) w.r.t. Wij gives ∂ det(W) = Wij ∂wij but from the definition of the matrix gradient it follows that ∂ det(W) = adj(W)T ∂W Using the formula for the inverse of W. 1 W−1 = adj(W). det W We have ∂ det(W) = (WT )−1 det W ∂W Homework: Prove that ∂ log |det(W)| 1 ∂ |det W| = = (WT )−1. ∂W |det W| ∂W 10.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    10 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us