Paul Klein
Stockholms universitet
October 1999
Calculus with vectors and matrices
1.1 The gradient and the Hessian
The purpose of this section is to make sense of expressions like
$$
\frac{\partial f(x)}{\partial x^T} = \nabla_x f(x) = \nabla f(x) = f'(x) = f_x(x) \tag{1}
$$

where $f : \mathbb{R}^n \to \mathbb{R}^m$. Of course, we already know what a partial derivative is and how to calculate it. What this section will tell us is how to arrange the partial derivatives into a matrix (the gradient), and the rules of arithmetic that follow from adopting our particular arrangement convention.
Definition 1.1 Let $f : \mathbb{R}^n \to \mathbb{R}^m$ have partial derivatives at $x$. Then

$$
\underbrace{\frac{\partial f(x)}{\partial x^T}}_{m \times n} \triangleq
\begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x)}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m(x)}{\partial x_1} & \cdots & \dfrac{\partial f_m(x)}{\partial x_n}
\end{bmatrix} \tag{2}
$$
and

$$
\underbrace{\frac{\partial f(x)}{\partial x}}_{n \times m} \triangleq \left( \frac{\partial f(x)}{\partial x^T} \right)^T \tag{3}
$$

where $A^T$ is the transpose of $A$.
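For concreteness, here is a small worked illustration of the convention: take $n = m = 2$ and $f(x) = (x_1 x_2,\; x_1 + x_2^2)$. Then

$$
\frac{\partial f(x)}{\partial x^T} =
\begin{bmatrix}
x_2 & x_1 \\
1 & 2x_2
\end{bmatrix},
\qquad
\frac{\partial f(x)}{\partial x} =
\begin{bmatrix}
x_2 & 1 \\
x_1 & 2x_2
\end{bmatrix},
$$

so each row of $\partial f(x)/\partial x^T$ collects the partial derivatives of one coordinate function $f_i$.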
Definition 1.2 Let $f : \mathbb{R}^n \to \mathbb{R}^n$ have partial derivatives at $x$. Then the (scalar-valued) Jacobian of $f$ at $x$ is defined via
$$
J_f(x) \triangleq \det\left( \frac{\partial f(x)}{\partial x^T} \right). \tag{4}
$$
Remark 1.1 Sometimes the gradient $\dfrac{\partial f(x)}{\partial x^T}$ itself is called the Jacobian. Here the Jacobian is defined as the determinant of the gradient.
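For instance, with $n = 2$ and $f(x) = (x_1 + x_2,\; x_1 x_2)$,

$$
J_f(x) = \det
\begin{bmatrix}
1 & 1 \\
x_2 & x_1
\end{bmatrix}
= x_1 - x_2 .
$$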
The following properties of the gradient follow straightforwardly from the definition.
Proposition 1.1

1. Let x be an n × 1 vector and A an m × n matrix. Then
$$
\frac{\partial}{\partial x^T} \left[ A x \right] = A. \tag{5}
$$
2. Let x be an n × 1 vector and A an n × m matrix. Then
$$
\frac{\partial}{\partial x^T} \left[ x^T A \right] = A^T. \tag{6}
$$
3. Let x be an n × 1 vector and A an n × n matrix. Then
$$
\frac{\partial}{\partial x^T} \left[ x^T A x \right] = x^T \left( A + A^T \right). \tag{7}
$$
4. Let x be an n × 1 vector and A an n × n symmetric matrix. Then
$$
\frac{\partial}{\partial x^T} \left[ x^T A x \right] = 2 x^T A. \tag{8}
$$
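As a quick check of rule (7), take $n = 2$ and write $A = (a_{ij})$. Then $x^T A x = a_{11} x_1^2 + (a_{12} + a_{21}) x_1 x_2 + a_{22} x_2^2$, so

$$
\frac{\partial}{\partial x^T}\left[ x^T A x \right] =
\begin{bmatrix}
2 a_{11} x_1 + (a_{12} + a_{21}) x_2 & \; (a_{12} + a_{21}) x_1 + 2 a_{22} x_2
\end{bmatrix}
= x^T \left( A + A^T \right),
$$

and rule (8) follows by setting $A^T = A$.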
If f is scalar-valued, it is straightforward to define the second derivative (Hessian) as follows.
Definition 1.3 Let $f : \mathbb{R}^n \to \mathbb{R}$ have continuous first and second partial derivatives at $x$ (so as to satisfy the requirements of Young’s theorem). Then

$$
\underbrace{\frac{\partial^2 f(x)}{\partial x \, \partial x^T}}_{n \times n} \triangleq
\begin{bmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{bmatrix} \triangleq f''(x). \tag{9}
$$
Note that, by Young’s theorem, the Hessian of a scalar-valued function is symmetric.
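To make the definition concrete, let $n = 2$ and $f(x) = x_1^2 x_2$. Then

$$
\frac{\partial^2 f(x)}{\partial x \, \partial x^T} =
\begin{bmatrix}
2 x_2 & 2 x_1 \\
2 x_1 & 0
\end{bmatrix},
$$

which is indeed symmetric.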
Proposition 1.2 Let $f(x) \triangleq x^T A x$ where $A$ is symmetric. Then
$$
\frac{\partial^2 f(x)}{\partial x \, \partial x^T} = 2A. \tag{10}
$$
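One way to see this: by rule (8), $\partial f(x)/\partial x^T = 2 x^T A$, so by (3), $\partial f(x)/\partial x = 2 A^T x = 2 A x$. Differentiating once more using rule (5),

$$
\frac{\partial^2 f(x)}{\partial x \, \partial x^T} = \frac{\partial}{\partial x^T}\left[ 2 A x \right] = 2A.
$$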
Occasionally we run into matrix-valued functions, and the way forward then is
to vectorize and then differentiate.
Definition 1.4 Let $A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$ be an $m \times n$ matrix, where each column $a_j$ is $m \times 1$. Then

$$
\underbrace{\mathrm{vec}(A)}_{mn \times 1} \triangleq
\begin{bmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n
\end{bmatrix}. \tag{11}
$$
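For instance, with $m = n = 2$,

$$
\mathrm{vec}
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} \\ a_{21} \\ a_{12} \\ a_{22}
\end{bmatrix},
$$

that is, the columns are stacked on top of each other.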
Definition 1.5 Let $f : \mathbb{R}^k \to \mathbb{R}^{n \times m}$ have partial derivatives at $x$. Then
$$
\underbrace{\frac{\partial f(x)}{\partial x^T}}_{nm \times k} \triangleq \frac{\partial \, \mathrm{vec} f(x)}{\partial x^T}. \tag{12}
$$
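As an illustration, let $k = 1$ and $f : \mathbb{R} \to \mathbb{R}^{2 \times 2}$ with $f(x) = \begin{bmatrix} x & x^2 \\ 1 & 0 \end{bmatrix}$. Then

$$
\frac{\partial f(x)}{\partial x^T} = \frac{\partial}{\partial x^T}
\begin{bmatrix}
x \\ 1 \\ x^2 \\ 0
\end{bmatrix}
=
\begin{bmatrix}
1 \\ 0 \\ 2x \\ 0
\end{bmatrix},
$$

a $4 \times 1$ matrix, consistent with $nm \times k = 4 \times 1$.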
Having defined the vec operator, we quickly run into cases where we need the
Kronecker product, defined as follows.
Definition 1.6 Let $A$ be an $m \times n$ matrix and $B$ a $k \times l$ matrix. Denote the element in the $i$:th row and $j$:th column of $A$ by $a_{ij}$. Then
$$
\underbrace{A \otimes B}_{mk \times ln} \triangleq
\begin{bmatrix}
a_{11} B & \cdots & a_{1n} B \\
\vdots & \ddots & \vdots \\
a_{m1} B & \cdots & a_{mn} B
\end{bmatrix}. \tag{13}
$$
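For example, with $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = I_2$,

$$
A \otimes B =
\begin{bmatrix}
1 & 0 & 2 & 0 \\
0 & 1 & 0 & 2 \\
3 & 0 & 4 & 0 \\
0 & 3 & 0 & 4
\end{bmatrix}.
$$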
Proposition 1.3 Let $A$ ($k \times l$), $B$ ($m \times n$) and $C$ ($p \times q$) be matrices such that the product $ABC$ is defined. Then
$$
\mathrm{vec}(ABC) = \left( C^T \otimes A \right) \mathrm{vec}(B). \tag{14}
$$
Proof. Exercise.
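A quick concrete check of (14): let $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$ and $C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then $ABC = b_{11} + 2 b_{21}$, while

$$
\left( C^T \otimes A \right) \mathrm{vec}(B) =
\begin{bmatrix}
1 & 2 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
b_{11} \\ b_{21} \\ b_{12} \\ b_{22}
\end{bmatrix}
= b_{11} + 2 b_{21},
$$

as (14) asserts.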
Occasionally we find ourselves wanting to differentiate a vector-valued function with respect to a matrix. Again the way forward is to vectorize.
Proposition 1 Whenever the following expressions are defined, they are true.
The trace of a matrix $A$ is denoted by $\mathrm{tr}(A)$. [Various rules of arithmetic omitted in this version. See the bibliography for sources.]
Definition 1.7 Let $f : \mathbb{R}^{n \times m} \to \mathbb{R}^k$ have partial derivatives at $A$. Then
$$
\underbrace{\frac{\partial f(A)}{\partial A^T}}_{k \times nm} \triangleq \frac{\partial f(A)}{\partial (\mathrm{vec} A)^T}. \tag{15}
$$
Example 1.1 Let $f : \mathbb{R}^{n \times m} \to \mathbb{R}^n$ be defined via $f(\Phi) \triangleq \Phi k$ where $k \in \mathbb{R}^m$ is a