Matrix Calculus

Appendix F: MATRIX CALCULUS

TABLE OF CONTENTS

§F.1. Introduction
§F.2. The Derivatives of Vector Functions
  §F.2.1. Derivative of Vector with Respect to Vector
  §F.2.2. Derivative of a Scalar with Respect to Vector
  §F.2.3. Derivative of Vector with Respect to Scalar
  §F.2.4. Jacobian of a Variable Transformation
§F.3. The Chain Rule for Vector Functions
§F.4. The Derivative of Scalar Functions of a Matrix
  §F.4.1. Functions of a Matrix Determinant
§F.5. The Matrix Differential


§F.1. Introduction

In this Appendix we collect some useful formulas of matrix calculus that often appear in finite element derivations.

§F.2. The Derivatives of Vector Functions

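Let $\mathbf{x}$ and $\mathbf{y}$ be vectors of orders $n$ and $m$ respectively:

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix},
\tag{F.1}
$$

where each component $y_i$ may be a function of all the $x_j$, a fact represented by saying that $\mathbf{y}$ is a function of $\mathbf{x}$, or

$$
\mathbf{y} = \mathbf{y}(\mathbf{x}). \tag{F.2}
$$

If $n = 1$, $\mathbf{x}$ reduces to a scalar, which we call $x$. If $m = 1$, $\mathbf{y}$ reduces to a scalar, which we call $y$. Various applications are studied in the following subsections.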

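§F.2.1. Derivative of Vector with Respect to Vector

The derivative of the vector $\mathbf{y}$ with respect to vector $\mathbf{x}$ is the $n \times m$ matrix

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} \overset{\mathrm{def}}{=}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}
\tag{F.3}
$$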

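§F.2.2. Derivative of a Scalar with Respect to Vector

If $y$ is a scalar,

$$
\frac{\partial y}{\partial \mathbf{x}} \overset{\mathrm{def}}{=}
\begin{bmatrix}
\dfrac{\partial y}{\partial x_1} \\
\dfrac{\partial y}{\partial x_2} \\
\vdots \\
\dfrac{\partial y}{\partial x_n}
\end{bmatrix}.
\tag{F.4}
$$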

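§F.2.3. Derivative of Vector with Respect to Scalar

If $x$ is a scalar,

$$
\frac{\partial \mathbf{y}}{\partial x} \overset{\mathrm{def}}{=}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x}
\end{bmatrix}
\tag{F.5}
$$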
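The row and column layout of (F.3) is easy to get backwards, so a small numerical sketch may help. It estimates the derivative matrix by central differences and stores $\partial y_j / \partial x_i$ in row $i$, column $j$. The helper name `derivative_matrix`, the step size `h`, and the evaluation point `x0` are assumptions made for illustration; the map is the one treated in Example F.1.

```python
import numpy as np

def derivative_matrix(f, x, h=1e-6):
    """Central-difference estimate of dy/dx in the layout of (F.3):
    an n x m array whose (i, j) entry approximates d y_j / d x_i."""
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    D = np.zeros((x.size, m))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        # Row i holds the derivatives of every y_j with respect to x_i
        D[i, :] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2.0 * h)
    return D

def example_map(x):
    # y1 = x1^2 - x2, y2 = x3^2 + 3 x2
    return np.array([x[0] ** 2 - x[1], x[2] ** 2 + 3.0 * x[1]])

x0 = np.array([1.0, 2.0, 3.0])   # arbitrary evaluation point (assumption)
D = derivative_matrix(example_map, x0)
print(D.shape)                   # (3, 2): n rows, m columns
print(np.round(D, 6))
```

Transposing `D` gives the $m \times n$ layout in which each row belongs to one component of $\mathbf{y}$.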


Remark F.1. Many authors, notably in statistics and economics, define the derivatives as the transposes of those given above.[1] This has the advantage of better agreement of matrix products with composition schemes such as the chain rule. Evidently the notation is not yet stable.

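Example F.1. Given

$$
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\tag{F.6}
$$

and

$$
\begin{aligned}
y_1 &= x_1^2 - x_2 \\
y_2 &= x_3^2 + 3x_2
\end{aligned}
\tag{F.7}
$$

the partial derivative matrix $\partial \mathbf{y} / \partial \mathbf{x}$ is computed as follows:

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \\
\dfrac{\partial y_1}{\partial x_3} & \dfrac{\partial y_2}{\partial x_3}
\end{bmatrix}
=
\begin{bmatrix}
2x_1 & 0 \\
-1 & 3 \\
0 & 2x_3
\end{bmatrix}
\tag{F.8}
$$

§F.2.4. Jacobian of a Variable Transformation

In multivariate analysis, if $\mathbf{x}$ and $\mathbf{y}$ are of the same order, the determinant of the square matrix $\partial \mathbf{x} / \partial \mathbf{y}$, that is

$$
J = \left| \frac{\partial \mathbf{x}}{\partial \mathbf{y}} \right|
\tag{F.9}
$$

is called the Jacobian of the transformation determined by $\mathbf{y} = \mathbf{y}(\mathbf{x})$. The inverse determinant is

$$
J^{-1} = \left| \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right|.
\tag{F.10}
$$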

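Example F.2. The transformation from spherical to Cartesian coordinates is defined by

$$
x = r \sin\theta \cos\psi, \qquad y = r \sin\theta \sin\psi, \qquad z = r \cos\theta
\tag{F.11}
$$

where $r > 0$, $0 < \theta < \pi$ and $0 \le \psi < 2\pi$. To obtain the Jacobian of the transformation, let

$$
\begin{aligned}
x &\equiv x_1, & y &\equiv x_2, & z &\equiv x_3 \\
r &\equiv y_1, & \theta &\equiv y_2, & \psi &\equiv y_3
\end{aligned}
\tag{F.12}
$$

Then

$$
J = \left| \frac{\partial \mathbf{x}}{\partial \mathbf{y}} \right| =
\begin{vmatrix}
\sin y_2 \cos y_3 & \sin y_2 \sin y_3 & \cos y_2 \\
y_1 \cos y_2 \cos y_3 & y_1 \cos y_2 \sin y_3 & -y_1 \sin y_2 \\
-y_1 \sin y_2 \sin y_3 & y_1 \sin y_2 \cos y_3 & 0
\end{vmatrix}
= y_1^2 \sin y_2 = r^2 \sin\theta.
\tag{F.13}
$$

The foregoing definitions can be used to obtain derivatives of many frequently used expressions, including quadratic and bilinear forms.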
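As a numerical cross-check of Example F.2, the sketch below forms $\partial \mathbf{x} / \partial \mathbf{y}$ by central differences in the layout of (F.3) and compares its determinant with $r^2 \sin\theta$. The function names, the step size, and the sample point are illustrative assumptions.

```python
import numpy as np

def cartesian_from_spherical(y):
    # y = (r, theta, psi) -> (x, y, z) as in (F.11)
    r, theta, psi = y
    return np.array([r * np.sin(theta) * np.cos(psi),
                     r * np.sin(theta) * np.sin(psi),
                     r * np.cos(theta)])

def dx_dy(y, h=1e-6):
    # Entry (i, j) approximates d x_j / d y_i, the layout of (F.3)
    y = np.asarray(y, dtype=float)
    D = np.zeros((3, 3))
    for i in range(3):
        step = np.zeros(3)
        step[i] = h
        D[i, :] = (cartesian_from_spherical(y + step)
                   - cartesian_from_spherical(y - step)) / (2.0 * h)
    return D

y0 = np.array([2.0, 0.7, 1.2])              # sample (r, theta, psi), an assumption
J_numeric = np.linalg.det(dx_dy(y0))        # determinant of the estimated matrix
J_closed = y0[0] ** 2 * np.sin(y0[1])       # r^2 sin(theta) from (F.13)
print(J_numeric, J_closed)
```

Because a matrix and its transpose have the same determinant, the value of $J$ does not depend on which of the two layout conventions is used.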

[1] One author puts it this way: “When one does matrix calculus, one quickly finds that there are two kinds of people in this world: those who think the gradient is a row vector, and those who think it is a column vector.”


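Example F.3. Consider the quadratic form

$$
y = \mathbf{x}^T \mathbf{A} \mathbf{x}
\tag{F.14}
$$

where $\mathbf{A}$ is a square matrix of order $n$. Using the definition (F.3) one obtains

$$
\frac{\partial y}{\partial \mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{A}^T \mathbf{x}
\tag{F.15}
$$

and if $\mathbf{A}$ is symmetric,

$$
\frac{\partial y}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}.
\tag{F.16}
$$

We can of course continue the differentiation process:

$$
\frac{\partial^2 y}{\partial \mathbf{x}^2} = \frac{\partial}{\partial \mathbf{x}} \left( \frac{\partial y}{\partial \mathbf{x}} \right) = \mathbf{A} + \mathbf{A}^T,
\tag{F.17}
$$

and if $\mathbf{A}$ is symmetric,

$$
\frac{\partial^2 y}{\partial \mathbf{x}^2} = 2\mathbf{A}.
\tag{F.18}
$$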
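The step from (F.14) to (F.15) can be spelled out by components. The following is a sketch in which $a_{ij}$ denotes the entries of $\mathbf{A}$ and $k$ is a free index; both are notational assumptions introduced here.

```latex
y = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}\,x_i\,x_j
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x_k}
  = \sum_{j=1}^{n} a_{kj}\,x_j + \sum_{i=1}^{n} a_{ik}\,x_i
  = (\mathbf{A}\mathbf{x})_k + (\mathbf{A}^T\mathbf{x})_k ,
\qquad k = 1,\dots,n.
```

The first sum collects the terms in which $x_k$ appears as the left factor and the second those in which it appears as the right factor. Stacking the $n$ components into a column gives the vector form of the gradient.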

The following table collects several useful vector derivative formulas.

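$$
\begin{array}{c|c}
\mathbf{y} & \dfrac{\partial \mathbf{y}}{\partial \mathbf{x}} \\
\hline
\mathbf{A}\mathbf{x} & \mathbf{A}^T \\
\mathbf{x}^T \mathbf{A} & \mathbf{A} \\
\mathbf{x}^T \mathbf{x} & 2\mathbf{x} \\
\mathbf{x}^T \mathbf{A} \mathbf{x} & \mathbf{A}\mathbf{x} + \mathbf{A}^T \mathbf{x}
\end{array}
$$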
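The four table entries can be spot-checked numerically. The sketch below uses a random matrix and vector and a central-difference helper that follows the $n \times m$ layout of this Appendix; the helper name, the seed, and the size `n = 4` are assumptions made for illustration.

```python
import numpy as np

def derivative_matrix(f, x, h=1e-6):
    # Entry (i, j) approximates d f_j / d x_i (n x m layout)
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    D = np.zeros((x.size, m))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        D[i, :] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2.0 * h)
    return D

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # generic, non-symmetric A
x = rng.standard_normal(n)

D_Ax = derivative_matrix(lambda v: A @ v, x)        # table row: A x
D_xTA = derivative_matrix(lambda v: v @ A, x)       # table row: x^T A
D_xTx = derivative_matrix(lambda v: v @ v, x)       # table row: x^T x
D_quad = derivative_matrix(lambda v: v @ A @ v, x)  # table row: x^T A x

print(np.allclose(D_Ax, A.T, atol=1e-6))
print(np.allclose(D_xTA, A, atol=1e-6))
print(np.allclose(D_xTx[:, 0], 2.0 * x, atol=1e-6))
print(np.allclose(D_quad[:, 0], A @ x + A.T @ x, atol=1e-6))
```

Using a non-symmetric `A` matters here: with a symmetric matrix the first two rows of the table could not be told apart.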

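§F.3. The Chain Rule for Vector Functions

Let

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix} \qquad \text{and} \qquad
\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}
\tag{F.19}
$$

where $\mathbf{z}$ is a function of $\mathbf{y}$, which is in turn a function of $\mathbf{x}$. Using the definition (F.3), we can write

$$
\left( \frac{\partial \mathbf{z}}{\partial \mathbf{x}} \right)^T =
\begin{bmatrix}
\dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\
\dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2} & \cdots & \dfrac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial z_m}{\partial x_1} & \dfrac{\partial z_m}{\partial x_2} & \cdots & \dfrac{\partial z_m}{\partial x_n}
\end{bmatrix}
\tag{F.20}
$$

Each entry of this matrix may be expanded as

$$
\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j},
\qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n.
\tag{F.21}
$$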


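Then

$$
\begin{aligned}
\left( \frac{\partial \mathbf{z}}{\partial \mathbf{x}} \right)^T
&=
\begin{bmatrix}
\sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_1}{\partial y_q} \dfrac{\partial y_q}{\partial x_n} \\
\sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_2}{\partial y_q} \dfrac{\partial y_q}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_m}{\partial y_q} \dfrac{\partial y_q}{\partial x_n}
\end{bmatrix} \\
&=
\begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\
\dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} & \cdots & \dfrac{\partial z_2}{\partial y_r} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \dfrac{\partial z_m}{\partial y_2} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \dfrac{\partial y_r}{\partial x_2} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix} \\
&= \left( \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \right)^T \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \right)^T
= \left( \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \right)^T.
\end{aligned}
\tag{F.22}
$$

Here every sum runs over $q = 1, \ldots, r$. On transposing both sides, we finally obtain

$$
\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}},
\tag{F.23}
$$

which is the chain rule for vectors. If all vectors reduce to scalars,

$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x} \frac{\partial z}{\partial y} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial x},
\tag{F.24}
$$

which is the conventional chain rule of calculus. Note, however, that when we are dealing with vectors, the chain of matrices builds “toward the left.” For example, if $\mathbf{w}$ is a function of $\mathbf{z}$, which is a function of $\mathbf{y}$, which is a function of $\mathbf{x}$,

$$
\frac{\partial \mathbf{w}}{\partial \mathbf{x}} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \frac{\partial \mathbf{w}}{\partial \mathbf{z}}.
\tag{F.25}
$$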

On the other hand, in the ordinary chain rule one can build the product to the right or to the left indistinctly, because scalar multiplication is commutative.
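The ordering of the factors in the vector chain rule can be checked on a small composite map. In the sketch below the two maps `y_of_x` and `z_of_y`, the sizes ($n = 3$, $r = 2$, $m = 4$), and the evaluation point are hypothetical choices made only for illustration; derivatives are estimated by central differences in the $n \times m$ layout of this Appendix.

```python
import numpy as np

def derivative_matrix(f, x, h=1e-6):
    # Entry (i, j) approximates d f_j / d x_i (n x m layout)
    x = np.asarray(x, dtype=float)
    m = np.atleast_1d(f(x)).size
    D = np.zeros((x.size, m))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        D[i, :] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2.0 * h)
    return D

def y_of_x(x):   # hypothetical map from R^3 to R^2
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0]])

def z_of_y(y):   # hypothetical map from R^2 to R^4
    return np.array([y[0] ** 2, y[0] * y[1], np.exp(y[1]), y[0] + y[1]])

x0 = np.array([0.5, -1.2, 0.3])
D_yx = derivative_matrix(y_of_x, x0)                        # shape (3, 2)
D_zy = derivative_matrix(z_of_y, y_of_x(x0))                # shape (2, 4)
D_zx = derivative_matrix(lambda x: z_of_y(y_of_x(x)), x0)   # shape (3, 4)

# The innermost derivative comes first: (3 x 2) times (2 x 4) gives (3 x 4)
print(np.allclose(D_zx, D_yx @ D_zy, atol=1e-6))
```

With these sizes the reversed product `D_zy @ D_yx` is not even shape-compatible, which is one practical way to catch a factor written in the wrong order.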

§F.4. The Derivative of Scalar Functions of a Matrix

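Let $\mathbf{X} = (x_{ij})$ be a matrix of order $(m \times n)$ and let

$$
y = f(\mathbf{X}),
\tag{F.26}
$$

be a scalar function of $\mathbf{X}$. The derivative of $y$ with respect to $\mathbf{X}$, denoted by

$$
\frac{\partial y}{\partial \mathbf{X}},
\tag{F.27}
$$

is defined as the following matrix of order $(m \times n)$:

$$
\mathbf{G} = \frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1n}} \\
\dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2n}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial y}{\partial x_{m1}} & \dfrac{\partial y}{\partial x_{m2}} & \cdots & \dfrac{\partial y}{\partial x_{mn}}
\end{bmatrix}
= \left[ \frac{\partial y}{\partial x_{ij}} \right]
= \sum_{i,j} \mathbf{E}_{ij} \frac{\partial y}{\partial x_{ij}},
\tag{F.28}
$$

where $\mathbf{E}_{ij}$ denotes the elementary matrix* of order $(m \times n)$. This matrix $\mathbf{G}$ is also known as a gradient matrix.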

Example F.4. Find the gradient matrix if $y$ is the trace of a square matrix $\mathbf{X}$ of order $n$, that is

$$
y = \operatorname{tr}(\mathbf{X}) = \sum_{i=1}^{n} x_{ii}.
\tag{F.29}
$$

Obviously all non-diagonal partials vanish whereas the diagonal partials equal one, thus

$$
\mathbf{G} = \frac{\partial y}{\partial \mathbf{X}} = \mathbf{I},
\tag{F.30}
$$

where $\mathbf{I}$ denotes the identity matrix of order $n$.
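The gradient matrix can be assembled numerically one entry at a time by perturbing $\mathbf{X}$ along each elementary matrix $\mathbf{E}_{ij}$. The sketch below does this with central differences and applies it to the trace; the helper name `gradient_matrix`, the step size, and the sample matrix are assumptions for illustration.

```python
import numpy as np

def gradient_matrix(f, X, h=1e-6):
    """Central-difference estimate of G = dy/dX for a scalar function f(X).
    Entry (i, j) is obtained by perturbing X along the elementary matrix E_ij."""
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = 1.0                      # elementary matrix E_ij
            G[i, j] = (f(X + h * E) - f(X - h * E)) / (2.0 * h)
    return G

X = np.arange(9, dtype=float).reshape(3, 3)    # arbitrary 3 x 3 sample matrix
G = gradient_matrix(np.trace, X)
print(np.round(G, 6))
```

Since the trace is linear in the entries of $\mathbf{X}$, the result does not depend on the particular sample matrix chosen.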

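§F.4.1. Functions of a Matrix Determinant

An important family of derivatives with respect to a matrix involves functions of the determinant of a matrix, for example $y = |\mathbf{X}|$ or $y = |\mathbf{A}\mathbf{X}|$. Suppose that we have a matrix $\mathbf{Y} = [y_{ij}]$ whose components are functions of a matrix $\mathbf{X} = [x_{rs}]$, that is $y_{ij} = f_{ij}(x_{rs})$, and set out to build the matrix

$$
\frac{\partial |\mathbf{Y}|}{\partial \mathbf{X}}.
\tag{F.31}
$$

Using the chain rule we can write

$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \sum_i \sum_j \frac{\partial |\mathbf{Y}|}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_{rs}}.
\tag{F.32}
$$

But

$$
|\mathbf{Y}| = \sum_j y_{ij} Y_{ij},
\tag{F.33}
$$

where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|\mathbf{Y}|$. Since the cofactors $Y_{i1}, Y_{i2}, \ldots$ are independent of the element $y_{ij}$, we have

$$
\frac{\partial |\mathbf{Y}|}{\partial y_{ij}} = Y_{ij}.
\tag{F.34}
$$

It follows that

$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \sum_i \sum_j Y_{ij} \frac{\partial y_{ij}}{\partial x_{rs}}.
\tag{F.35}
$$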

* The elementary matrix Eij of order m × n has all zero entries except for the (i, j) entry, which is one.


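There is an alternative form of this result which is occasionally useful. Define

$$
a_{ij} = Y_{ij}, \qquad \mathbf{A} = [a_{ij}], \qquad b_{ij} = \frac{\partial y_{ij}}{\partial x_{rs}}, \qquad \mathbf{B} = [b_{ij}].
\tag{F.36}
$$

Then it can be shown that

$$
\frac{\partial |\mathbf{Y}|}{\partial x_{rs}} = \operatorname{tr}(\mathbf{A}\mathbf{B}^T) = \operatorname{tr}(\mathbf{B}^T \mathbf{A}).
\tag{F.37}
$$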
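The trace form follows in one line from the double sum over cofactors. As a sketch, writing out the diagonal entries of the two products (the index $i$ labels rows and $j$ columns, as in the definitions of $\mathbf{A}$ and $\mathbf{B}$):

```latex
\operatorname{tr}(\mathbf{A}\mathbf{B}^T)
  = \sum_i (\mathbf{A}\mathbf{B}^T)_{ii}
  = \sum_i \sum_j a_{ij}\, b_{ij}
  = \sum_i \sum_j Y_{ij}\, \frac{\partial y_{ij}}{\partial x_{rs}}
  = \frac{\partial |\mathbf{Y}|}{\partial x_{rs}},
\qquad
\operatorname{tr}(\mathbf{B}^T\mathbf{A})
  = \sum_j (\mathbf{B}^T\mathbf{A})_{jj}
  = \sum_j \sum_i b_{ij}\, a_{ij}.
```

Both traces reduce to the same entrywise sum of products $a_{ij} b_{ij}$, which is why the two orderings agree.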

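Example F.5. If $\mathbf{X}$ is a nonsingular square matrix and $\mathbf{Z} = |\mathbf{X}|\,\mathbf{X}^{-1}$ its adjugate (the transpose of its cofactor matrix),

$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{Z}^T.
\tag{F.38}
$$

If $\mathbf{X}$ is also symmetric,

$$
\mathbf{G} = \frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = 2\mathbf{Z}^T - \operatorname{diag}(\mathbf{Z}^T).
\tag{F.39}
$$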
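Both formulas of Example F.5 can be checked numerically. The sketch below assumes that the symmetric case means varying only the independent entries, so that $x_{ij}$ and $x_{ji}$ are perturbed together, and that $\operatorname{diag}(\cdot)$ keeps the diagonal and zeroes everything else. The helper name `det_gradient` and the two sample matrices are illustrative assumptions.

```python
import numpy as np

def det_gradient(X, symmetric=False, h=1e-6):
    """Central-difference estimate of d|X|/dX, entry by entry.
    With symmetric=True, x_ij and x_ji are perturbed together, i.e. only
    the independent entries of a symmetric X are varied."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            if symmetric:
                E[j, i] = 1.0
            G[i, j] = (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)) / (2.0 * h)
    return G

# General (non-symmetric) nonsingular sample matrix
X = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, -1.0, 4.0]])
Z = np.linalg.det(X) * np.linalg.inv(X)
G_general = det_gradient(X)
print(np.allclose(G_general, Z.T, atol=1e-6))

# Symmetric nonsingular sample matrix
S = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, -1.0],
              [0.5, -1.0, 5.0]])
Z_sym = np.linalg.det(S) * np.linalg.inv(S)
G_sym = det_gradient(S, symmetric=True)
expected_sym = 2.0 * Z_sym.T - np.diag(np.diag(Z_sym.T))
print(np.allclose(G_sym, expected_sym, atol=1e-6))
```

The factor of two on the off-diagonal entries arises because each independent off-diagonal variable appears in two positions of a symmetric matrix.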

§F.5. The Matrix Differential

For a scalar function $f(\mathbf{x})$, where $\mathbf{x}$ is an $n$-vector, the ordinary differential of multivariate calculus is defined as

$$
df = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, dx_i.
\tag{F.40}
$$

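In harmony with this formula, we define the differential of an $m \times n$ matrix $\mathbf{X} = [x_{ij}]$ to be

$$
d\mathbf{X} \overset{\mathrm{def}}{=}
\begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}.
\tag{F.41}
$$

This definition complies with the multiplicative and additive rules

$$
d(\alpha \mathbf{X}) = \alpha\, d\mathbf{X}, \qquad d(\mathbf{X} + \mathbf{Y}) = d\mathbf{X} + d\mathbf{Y},
\tag{F.42}
$$

where $\alpha$ is a constant scalar.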

If $\mathbf{X}$ and $\mathbf{Y}$ are product-conforming matrices, it can be verified that the differential of their product is

$$
d(\mathbf{X}\mathbf{Y}) = (d\mathbf{X})\mathbf{Y} + \mathbf{X}(d\mathbf{Y}),
\tag{F.43}
$$

which is an extension of the well-known rule $d(xy) = y\,dx + x\,dy$ for scalar functions.
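One informal way to see the product rule, offered here as a sketch and not as a rigorous proof, is to expand the perturbed product and discard the term that is of second order in the differentials:

```latex
(\mathbf{X} + d\mathbf{X})(\mathbf{Y} + d\mathbf{Y}) - \mathbf{X}\mathbf{Y}
  = (d\mathbf{X})\,\mathbf{Y} + \mathbf{X}\,(d\mathbf{Y}) + (d\mathbf{X})(d\mathbf{Y})
  \;\approx\; (d\mathbf{X})\,\mathbf{Y} + \mathbf{X}\,(d\mathbf{Y}).
```

The order of the factors must be preserved in each term because matrix multiplication is not commutative.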


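Example F.7. With the same assumptions as above, find $d(\mathbf{X}^{-1})$. The quickest derivation follows by differentiating both sides of the identity $\mathbf{X}^{-1}\mathbf{X} = \mathbf{I}$:

$$
d(\mathbf{X}^{-1})\,\mathbf{X} + \mathbf{X}^{-1}\, d\mathbf{X} = \mathbf{0},
\tag{F.45}
$$

from which

$$
d(\mathbf{X}^{-1}) = -\mathbf{X}^{-1}\, d\mathbf{X}\, \mathbf{X}^{-1}.
\tag{F.46}
$$

If $\mathbf{X}$ reduces to the scalar $x$ we have

$$
d\!\left( \frac{1}{x} \right) = -\frac{dx}{x^2}.
\tag{F.47}
$$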
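The differential of the inverse is a first-order statement, so it can be checked by comparing the actual change in the inverse under a small perturbation with the predicted one. In the sketch below the sample matrix, the perturbation `dX`, and its $10^{-6}$ scale are assumptions chosen for illustration.

```python
import numpy as np

X = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, 1.0],
              [1.0, -1.0, 4.0]])          # nonsingular sample matrix
dX = 1e-6 * np.array([[1.0, -2.0, 0.5],
                      [0.3, 1.0, -1.0],
                      [2.0, 0.0, 1.0]])   # small perturbation

X_inv = np.linalg.inv(X)
actual = np.linalg.inv(X + dX) - X_inv    # true change of the inverse
predicted = -X_inv @ dX @ X_inv           # first-order prediction

# Relative discrepancy; it should be of the order of the perturbation size
rel_err = np.linalg.norm(actual - predicted) / np.linalg.norm(predicted)
print(rel_err)

# Scalar special case: d(1/x) = -dx / x^2
x, dx = 2.0, 1e-6
actual_scalar = 1.0 / (x + dx) - 1.0 / x
predicted_scalar = -dx / x ** 2
print(actual_scalar, predicted_scalar)
```

Halving the scale of `dX` should roughly halve `rel_err`, which is the expected behavior when the neglected terms are of second order.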
