On det X, log det X and log det X^T X

Andersen Ang

Mathématique et de Recherche opérationnelle, Faculté polytechnique de Mons, UMONS, Mons, Belgium. Email: [email protected]. Homepage: angms.science

August 14, 2017

Overview

1 On det X
  - Physical meaning of det X
  - Jacobi's Formula and ∂ det X / ∂X = det X · X^{-T}

2 On log det X
  - Physical meaning of log det X
  - log det X is concave
  - (Fast) computation of log det

3 On log det X^T X
  - Physical meaning of log det X^T X
  - Derivative of log det(X^T X + δI)

4 Summary

What is det: the volume of a parallelepiped

For a square matrix X = [x1 x2 ... xn] ∈ R^{n×n} with linearly independent columns, det X gives the volume of the parallelepiped spanned by the columns xi:

det X = vol({x1, x2, ..., xn})

Why? Consider the case n = 2: for two vectors x1, x2 ∈ R^2, we can always perform Gram-Schmidt orthogonalization to get two vectors v1, v2:

v1 = x1
v2 = c12 · v1 + v2^⊥

where the vector v2^⊥ is orthogonal to v1. Then

det(v1, v2) = det(v1, c12 · v1 + v2^⊥)
            = det(v1, v2^⊥)
            = signed volume(v1, v2)

The same argument applies to any n.
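As a concrete check of the n = 2 argument, here is a short NumPy sketch (the two vectors are arbitrary example values, not from the slides): adding a multiple of one column to another leaves the determinant unchanged, so the determinant equals the base-times-height area.

```python
import numpy as np

# Two vectors in R^2 (arbitrary example values).
x1 = np.array([3.0, 1.0])
x2 = np.array([1.0, 2.0])

# Gram-Schmidt: v2_perp is the component of x2 orthogonal to x1.
v1 = x1
v2_perp = x2 - (x2 @ v1) / (v1 @ v1) * v1

# Adding a multiple of one column to another does not change det,
# so det[x1 x2] = det[v1 v2_perp] = base * height (up to sign).
det_x = np.linalg.det(np.column_stack([x1, x2]))
det_v = np.linalg.det(np.column_stack([v1, v2_perp]))
area  = np.linalg.norm(v1) * np.linalg.norm(v2_perp)
```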

For details, see Hannah, John. "A geometric approach to determinants." American Mathematical Monthly 103 (1996): 401-409.

Derivative of det X - the Jacobi's Formula

For a non-singular matrix X, recall:
- the adjugate-determinant-inverse relationship: adj X = det X · X^{-1}
- the adjugate-cofactor relationship: adj X = C^T

Jacobi's Formula gives the derivative of det X with respect to (w.r.t.) a scalar x:

∂ det X / ∂x = det X · Tr( X^{-1} ∂X/∂x )        (1)
             = Tr( det X · X^{-1} · ∂X/∂x )      (det X · X^{-1} = adj X)
             = Tr( adj X · ∂X/∂x )               (adj X = C^T)
             = Tr( C^T ∂X/∂x )
             = ⟨ C, ∂X/∂x ⟩_F

So the derivative of det X equals the (Frobenius) matrix inner product of the cofactor matrix C and the derivative of X.

Derivative of det X - the Jacobi's Formula

The equation ∂ det X/∂x = ⟨ C, ∂X/∂x ⟩_F makes sense because:
- the derivative of the scalar value det X w.r.t. the scalar x is a scalar
- ⟨ C, ∂X/∂x ⟩_F is a scalar

The derivative of det X w.r.t. the matrix X is a matrix. The expression is

∂ det X / ∂X = det X · X^{-T}

For the derivation, refer to the previous document.
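The matrix expression ∂ det X/∂X = det X · X^{-T} can be sanity-checked against entrywise finite differences (a NumPy sketch with an arbitrary random 4×4 matrix; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))

# Closed form from the slide: d(det X)/dX = det(X) * X^{-T} (the cofactor matrix).
grad = np.linalg.det(X) * np.linalg.inv(X).T

# Central finite differences, entry by entry.
eps = 1e-6
fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd[i, j] = (np.linalg.det(X + E) - np.linalg.det(X - E)) / (2 * eps)
```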

What is log det

The log-determinant of a matrix X is log det X.
- X has to be square (∵ det)
- X has to be positive definite (pd), because
  - det X = Π_i λi
  - all eigenvalues of a pd matrix are positive
  - the domain of log is the positive real numbers (the log of a negative number is a complex number, which is out of context here)
- if X is positive semidefinite (psd), it is better to consider a regularized version log det(X + δI), where δ > 0 is a small positive number. As all eigenvalues of a psd matrix are non-negative (zero or positive), adding a small positive δ removes the zero eigenvalues and turns X + δI into a pd matrix
- log det is a concave function
- ∂ log det X / ∂X = X^{-T}
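The regularization trick above can be illustrated numerically (a NumPy sketch; the rank-deficient matrix and the value of δ are arbitrary example choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
X = A @ A.T          # rank-3 psd 5x5 matrix: det X = 0, log det X = -inf
delta = 1e-6

# Every eigenvalue of X + delta*I is at least delta > 0,
# so the log-determinant of the regularized matrix is finite.
sign, logdet = np.linalg.slogdet(X + delta * np.eye(5))
```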

Why log det

For a matrix X, the expression det X = Π λi is dominated by the leading eigenvalues. Such domination is not a problem if we only care about the leading eigenvalues.

When all the eigenvalues are important, one way to suppress the leading eigenvalues is to use log. Hence we have

log det X = Σ log λi = log λ1 + log λ2 + ...

Again, since the input of log should be neither negative nor zero, the matrix X here should be pd so that λi > 0.

log det X is concave

For f(X) = log det X with X ∈ S^n_++, f is concave in X.

Proof. Showing that f is concave in X is equivalent to showing that its restriction to any line, g(t) = f(Z + tV), is concave in t, where Z ∈ S^n_++, V ∈ S^n, and dom g = {t | Z + tV ≻ 0} (an interval containing t = 0).

g(t) = log det(Z + tV )  1 − 1 − 1 1  = log det Z 2 (I + tZ 2 VZ 2 )Z 2  1 1  − 1 − 1 = log det Z 2 (I + tVˆ )Z 2 let Vˆ = Z 2 VZ 2 h  ˆ  i = log det I + tV det Z ∵ det(AB) = det(A) det(B)   = log det I + tVˆ + log det Z   = log det QQT + tQΛQT + log det Z Vˆ = QΛQT , QQT = QT Q = I   = log det Q(I + tΛ)QT + log det Z = log det(QQT ) + log det(I + tΛ) + log det Z = log det(I + tΛ) + log det Z QQT = I, det I = 1, log 1 = 0 Q Q = log (1 + tλi) + log det Z det Λ = λi P = log(1 + tλi) + log det Z

∂g/∂t = Σ λi / (1 + t λi)   and   ∂²g/∂t² = − Σ λi² / (1 + t λi)² < 0,

so g(t) is concave, hence f(X) is concave (and −f(X) is convex).

Reference: the book Convex Optimization by Boyd and Vandenberghe.
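The final identity g(t) = Σ log(1 + tλi) + log det Z can be verified numerically (a NumPy sketch; the random Z, V and the value of t are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
Z = B @ B.T + n * np.eye(n)                  # symmetric positive definite
V = rng.standard_normal((n, n))
V = (V + V.T) / 2                            # symmetric

# Z^{-1/2} via the eigendecomposition of Z.
w, Q = np.linalg.eigh(Z)
Z_inv_half = Q @ np.diag(w ** -0.5) @ Q.T

# Eigenvalues of V-hat = Z^{-1/2} V Z^{-1/2}.
lam = np.linalg.eigvalsh(Z_inv_half @ V @ Z_inv_half)

t = 0.1                                      # small enough that Z + tV stays pd
g_direct = np.linalg.slogdet(Z + t * V)[1]
g_slide  = np.sum(np.log(1 + t * lam)) + np.linalg.slogdet(Z)[1]
```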

Application of concavity of log det X: Taylor bound

Fact: the first-order Taylor approximation is a global over-estimator of a concave function. That is,

f(x) ≤ f(y) + ∇f(y)^T (x − y)

As log det X is concave, it is upper bounded by its first-order Taylor approximation:

log det X ≤ log det Y + ⟨ Y^{-T}, X − Y ⟩_F

Again, Y^{-T} and X − Y are matrices while log det X and log det Y are scalars, so the matrix inner product ⟨·,·⟩ has to be applied. Equivalently,

log det X ≤ log det Y + Tr( Y^{-1} (X − Y) )

Such an expression can be used for the minimization of log det X.
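The Taylor upper bound can be illustrated numerically (a NumPy sketch; the random pd matrices X and Y are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4

def random_spd(rng, n):
    """Generate an arbitrary symmetric positive definite matrix."""
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)

X = random_spd(rng, n)
Y = random_spd(rng, n)

# Concavity: log det X <= log det Y + Tr(Y^{-1} (X - Y)).
lhs = np.linalg.slogdet(X)[1]
rhs = np.linalg.slogdet(Y)[1] + np.trace(np.linalg.inv(Y) @ (X - Y))
```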

(Fast) computation of log det

Ways to compute log det X:

1. By definition. Compute det X, for example using the Laplace cofactor expansion formula, then take the log.

2. Eigenvalues. Since det X = Π λi, compute the eigenvalues of X, then log det X = Σ log λi.

3. Cholesky factorization. For X pd, apply the Cholesky decomposition X = LL^T; then det L = Π Lii, and log det X = 2 Σ log Lii.

There are many other methods, for example methods that approximate det X for a big matrix X.
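The three methods above can be compared directly (a NumPy sketch on an arbitrary random pd matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 6))
X = B @ B.T + 6 * np.eye(6)          # positive definite example matrix

# 1. By definition: det, then log (can over/underflow for big matrices).
ld_def = np.log(np.linalg.det(X))

# 2. Eigenvalues: log det X = sum of log eigenvalues.
ld_eig = np.sum(np.log(np.linalg.eigvalsh(X)))

# 3. Cholesky: X = L L^T, so log det X = 2 * sum(log L_ii).
L = np.linalg.cholesky(X)
ld_chol = 2 * np.sum(np.log(np.diag(L)))
```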

On log det X^T X: why log det X^T X

det X requires X to be a square matrix.

For non-square X, one can try det(X^T X), where X^T X is the Gram matrix of X and is always psd: y^T X^T X y = ‖Xy‖₂² ≥ 0.

Again, it is better to consider a regularized version log det(X^T X + δI) to remove the possibility of having det(X^T X) = 0.

Note. log det(X^T X) and log det(X^T X + δI) are neither concave nor convex in X.
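A small NumPy sketch of the Gram-matrix construction for a non-square X (the sizes and δ are arbitrary example choices):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((8, 3))      # tall, non-square matrix
delta = 1e-8

G = X.T @ X                          # 3x3 Gram matrix, always psd
eigs = np.linalg.eigvalsh(G)         # all eigenvalues are >= 0

# The regularized log-determinant is well defined even if G is singular.
sign, ld = np.linalg.slogdet(G + delta * np.eye(3))
```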

Derivative of log det(X^T X + δI)

Let the matrix B = X^T X + δI to shorten the notation.

Again, the derivative of the scalar-valued function log det B w.r.t. a scalar x is a scalar, so the following expression, obtained by the chain rule through the scalar det B, is correct:

∂ log det B / ∂x = (∂ log det B / ∂ det B) · (∂ det B / ∂x)        (2)

And the following expression, obtained by naively applying the chain rule through the matrix B, is wrong:

∂ log det B / ∂x = (∂ log det B / ∂B) · (∂B / ∂x)

The correct expression (with the inner product operator) should be

∂ log det B / ∂x = Tr[ (∂ log det B / ∂B)^T (∂B / ∂x) ]        (3)

Derivative of log det(X^T X + δI) ... 2

Consider equation (2):

∂ log det B / ∂x = (∂ log det B / ∂ det B) · (∂ det B / ∂x)

∂ log det B / ∂ det B is just a simple log differentiation, as det B is a scalar:

∂ log det B / ∂ det B = 1 / det B        (4)

For ∂ det B / ∂x, apply Jacobi's Formula:

∂ det B / ∂x = det B · Tr( B^{-1} ∂B/∂x )        (5)

Putting (4) and (5) into (2):

∂ log det B / ∂x = Tr( B^{-1} ∂B/∂x )

Derivative of log det(X^T X + δI) ... 3

Putting B = X^T X + δI back, we have

∂ log det(X^T X + δI) / ∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X + δI)/∂x ]

As ∂(X^T X + δI)/∂x = ∂(X^T X)/∂x + ∂(δI)/∂x, and ∂(δI)/∂x gives the zero matrix,

∂ log det(X^T X + δI) / ∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]        (6)

Again, note that log det(·) and x are scalars, while (X^T X + δI)^{-1} and ∂(X^T X)/∂x are matrices. It is the trace operator that turns the matrix product (X^T X + δI)^{-1} ∂(X^T X)/∂x back into a scalar, so that equation (6) makes sense!

Derivative of log det(X^T X + δI) ... 4

Now consider another way to show the same result. Consider (3):

∂ log det B / ∂x = Tr[ (∂ log det B / ∂B)^T (∂B / ∂x) ]

Since ∂ log det B / ∂B = B^{-T} (page 5), we have

∂ log det B / ∂x = Tr( B^{-1} ∂B/∂x )

From the last page we have ∂B/∂x = ∂(X^T X)/∂x, so

∂ log det B / ∂x = Tr( B^{-1} ∂(X^T X)/∂x )

Putting back B = X^T X + δI, we get the same result as on the last page.

Derivative of log det(X^T X + δI) ... 5

So we have

∂ log det(X^T X + δI) / ∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]

Putting x = Xij, we have ∂(X^T X)/∂Xij = X^T J^{ij} + J^{ji} X, where J^{ij} is the single-entry matrix with 1 at entry (i, j) and 0 elsewhere. We get

∂ log det(X^T X + δI) / ∂Xij = Tr[ (X^T X + δI)^{-1} (X^T J^{ij} + J^{ji} X) ]
                             = Tr[ (X^T X + δI)^{-1} X^T J^{ij} ] + Tr[ (X^T X + δI)^{-1} J^{ji} X ]

By Tr(A J^{ij}) = [A^T]_{ij} and Tr(A J^{ji} B) = [BA]_{ij}, we have

∂ log det(X^T X + δI) / ∂Xij = [ ( (X^T X + δI)^{-1} X^T )^T ]_{ij} + [ X (X^T X + δI)^{-1} ]_{ij}

As X^T X + δI is symmetric, so is its inverse, the two terms are equal, and therefore

∂ log det(X^T X + δI) / ∂X = 2 X (X^T X + δI)^{-1}
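The closed-form gradient 2X(X^T X + δI)^{-1} can be checked against entrywise finite differences (a NumPy sketch; the matrix size, δ and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
X = rng.standard_normal((m, n))
delta = 1e-3

def f(X):
    """Scalar function log det(X^T X + delta*I)."""
    return np.linalg.slogdet(X.T @ X + delta * np.eye(n))[1]

# Closed form from the slide.
grad = 2 * X @ np.linalg.inv(X.T @ X + delta * np.eye(n))

# Central finite differences, entry by entry.
eps = 1e-6
fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
```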

Last page - summary

- Physical meanings of det X and log det X
- log det X is concave in X
- Jacobi's Formula and the derivatives w.r.t. the matrix variable X of det X, log det X and log det(X^T X + δI)

End of document
