On det X, log det X and log det X^T X
Andersen Ang
Mathématique et de Recherche opérationnelle, Faculté polytechnique de Mons, UMONS, Mons, Belgium. email: [email protected] homepage: angms.science
August 14, 2017
Overview

1. On det X
   - Physical meaning of det X
   - Jacobi's Formula and ∂ det X/∂X = det X · X^{-T}
2. On log det X
   - Physical meaning of log det X
   - log det X is concave
   - (Fast) computation of log det
3. On log det X^T X
   - Physical meaning of log det X^T X
   - Derivative of log det(X^T X + δI)
4. Summary
What is det : the volume of a parallelepiped
For a square matrix X = [x1 x2 ... xn] ∈ R^{n×n} with linearly independent columns, det X gives the (signed) volume of the parallelepiped spanned by the columns xi:

  det X = vol({x1, x2, ..., xn})

Why? Consider the case n = 2: given two vectors x1, x2 ∈ R^2, we can always perform Gram-Schmidt orthogonalization to get two vectors v1, v2:

  v1 = x1
  v2 = c12 v1 + v2^⊥

where the vector v2^⊥ is orthogonal to v1. Then

  det(v1, v2) = det(v1, c12 v1 + v2^⊥) = det(v1, v2^⊥) = signed volume(v1, v2)

The same argument applies to any n.
For details, see Hannah, John. "A geometric approach to determinants." American Mathematical Monthly (1996): 401-409.

Derivative of det X : Jacobi's Formula
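A quick numerical sketch of this volume interpretation (using numpy; the matrix and dimension are illustrative). QR factorization plays the role of Gram-Schmidt: the |R_ii| are the lengths of the orthogonalized components, so their product is the parallelepiped volume.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))  # random square matrix, columns x_1 ... x_n

# QR is Gram-Schmidt in matrix form: X = QR with Q orthogonal.
# |R_ii| is the length of the component of x_i orthogonal to x_1..x_{i-1},
# so the parallelepiped volume is the product of the |R_ii|.
Q, R = np.linalg.qr(X)
volume = np.prod(np.abs(np.diag(R)))

print(abs(np.linalg.det(X)), volume)  # the two values agree
```

The sign of det X carries the orientation, which is why only |det X| matches the (unsigned) volume here.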
For a non-singular matrix X, recall:
- the adjugate-determinant-inverse relationship: adj X = det X · X^{-1}
- the adjugate-cofactor relationship: adj X = C^T, where C is the cofactor matrix

Jacobi's Formula gives the derivative of det X with respect to (w.r.t.) a scalar x:

  ∂ det X/∂x = det X · Tr( X^{-1} ∂X/∂x )        (1)
             = Tr( det X · X^{-1} ∂X/∂x )         (det X · X^{-1} = adj X)
             = Tr( adj X · ∂X/∂x )                (adj X = C^T)
             = Tr( C^T ∂X/∂x )
             = ⟨ C, ∂X/∂x ⟩_F

So the derivative of det X equals the (Frobenius) matrix inner product of the cofactor matrix and the derivative of X.
The equation ∂ det X/∂x = ⟨ C, ∂X/∂x ⟩_F makes sense because:
- the derivative of the scalar value det X w.r.t. the scalar x is a scalar
- ⟨ C, ∂X/∂x ⟩_F is a scalar

The derivative of det X w.r.t. the matrix X is a matrix. The expression is

  ∂ det X/∂X = det X · X^{-T}

For the derivation, refer to the previous document.
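The matrix expression ∂ det X/∂X = det X · X^{-T} can be sanity-checked against entrywise finite differences (a sketch using numpy; matrix and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))

# Analytic gradient from Jacobi's Formula: d det(X)/dX = det(X) * X^{-T}
grad = np.linalg.det(X) * np.linalg.inv(X).T

# Central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num[i, j] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * eps)

print(np.max(np.abs(grad - num)))  # small: the two gradients match
```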
What is log det
The log-determinant of a matrix X is log det X.
- X has to be square (∵ det)
- X has to be positive definite (pd), because
  - det X = ∏_i λ_i
  - all eigenvalues of a pd matrix are positive
  - the domain of log is the positive real numbers (the log of a negative number produces a complex number, which is out of context here)
- if X is only positive semidefinite (psd), it is better to consider a regularized version log det(X + δI), where δ > 0 is a small positive number. As all eigenvalues of a psd matrix are non-negative (zero or positive), adding a small positive number δ removes all the zero eigenvalues and turns X + δI into a pd matrix
- log det is a concave function
- ∂ log det X/∂X = X^{-T}
Why log det
For a matrix X, the expression det X = ∏_i λ_i is dominated by the leading eigenvalues. Such domination is not a problem if we only care about the leading eigenvalues.

When all the eigenvalues are important, one way to suppress the leading eigenvalues is to take the log. Hence we have

  log det X = ∑_i log λ_i = log λ1 + log λ2 + ...

Again, since the input of log should be neither negative nor zero, the matrix X here should be pd so that λ_i > 0.
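The identity log det X = ∑_i log λ_i can be verified numerically (a sketch using numpy; the pd matrix is constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
X = A @ A.T + 5 * np.eye(5)  # a positive definite matrix

lam = np.linalg.eigvalsh(X)          # eigenvalues (all positive since X is pd)
logdet_eig = np.sum(np.log(lam))     # log det X = sum_i log(lambda_i)
logdet_direct = np.log(np.linalg.det(X))
print(logdet_eig, logdet_direct)     # the two values agree
```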
log det X is concave
For f(X) = log det X with X ∈ S^n_{++}, f is concave in X.

Proof. Showing that f is concave in X is equivalent to showing that g(t) = f(Z + tV) is concave in t for every Z ∈ S^n_{++} and V ∈ S^n, where dom g = {t | Z + tV ≻ 0} (an interval containing t = 0).
g(t) = log det(Z + tV)
     = log det( Z^{1/2} (I + t Z^{-1/2} V Z^{-1/2}) Z^{1/2} )
     = log det( Z^{1/2} (I + t V̂) Z^{1/2} )          let V̂ = Z^{-1/2} V Z^{-1/2}
     = log [ det(I + t V̂) · det Z ]                  ∵ det(AB) = det A · det B
     = log det(I + t V̂) + log det Z
     = log det( QQ^T + t QΛQ^T ) + log det Z         V̂ = QΛQ^T, QQ^T = Q^T Q = I
     = log det( Q (I + tΛ) Q^T ) + log det Z
     = log det(QQ^T) + log det(I + tΛ) + log det Z
     = log det(I + tΛ) + log det Z                   QQ^T = I, det I = 1, log 1 = 0
     = log ∏_i (1 + t λ_i) + log det Z               det(I + tΛ) = ∏_i (1 + t λ_i)
     = ∑_i log(1 + t λ_i) + log det Z
Then

  ∂g/∂t = ∑_i λ_i / (1 + t λ_i)   and   ∂²g/∂t² = − ∑_i λ_i² / (1 + t λ_i)² < 0,

so g(t) is concave, hence f(X) is concave (and −f(X) is convex).

Reference: the appendix of Convex Optimization by Boyd and Vandenberghe.
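The restriction-to-a-line argument above can be sanity-checked with a midpoint-concavity test along a random line (a sketch using numpy; the matrices Z, V and the interval [a, b] are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
Z = A @ A.T + n * np.eye(n)                          # Z positive definite
V = rng.standard_normal((n, n)); V = (V + V.T) / 2   # V symmetric

def g(t):
    # log det(Z + tV); slogdet returns (sign, log|det|) and is numerically stable
    return np.linalg.slogdet(Z + t * V)[1]

# midpoint concavity: g((a+b)/2) >= (g(a) + g(b)) / 2 for a, b in dom g
a, b = -0.05, 0.05
print(g((a + b) / 2) >= (g(a) + g(b)) / 2)  # True
```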
Application of the concavity of log det X : Taylor bound
Fact: the first-order Taylor approximation is a global over-estimator of a concave function. That is,
f(x) ≤ f(y) + ∇f(y)T (x − y)
As log det X is concave, it is upper bounded by its first-order Taylor approximation:
  log det X ≤ log det Y + ⟨ Y^{-T}, X − Y ⟩_F
Again, Y^{-T} and X − Y are matrices while log det X and log det Y are scalars, so the matrix inner product ⟨·,·⟩_F has to be applied. Equivalently,

  log det X ≤ log det Y + Tr( Y^{-1} (X − Y) )

Such an expression can be used in the minimization of log det X.
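The bound can be checked numerically for random pd matrices (a sketch using numpy; the matrix construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

def random_pd(rng, n):
    # A @ A.T is psd; adding n*I makes it safely positive definite
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

X = random_pd(rng, n)
Y = random_pd(rng, n)

lhs = np.linalg.slogdet(X)[1]                                   # log det X
rhs = np.linalg.slogdet(Y)[1] + np.trace(np.linalg.inv(Y) @ (X - Y))
print(lhs <= rhs)  # True: the tangent plane over-estimates a concave function
```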
(Fast) computation of log det
Ways to compute log det X :
1. By definition. Compute det X, then take the log. For example, using the Laplace cofactor expansion formula.

2. Eigenvalues (det X = ∏_i λ_i). Compute the eigenvalues of X, then log det X = ∑_i log λ_i.

3. Cholesky factorization. For X pd, apply the Cholesky decomposition X = LL^T; then with det L = ∏_i L_ii, log det X = 2 ∑_i log L_ii.

Note that methods 2 and 3 never form det X itself; this matters numerically, since det X can overflow or underflow in floating point for large matrices, while the sum of logs stays well-scaled.

There are many other methods, for example approximating det X for a big matrix X.
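The methods above can be compared directly (a sketch using numpy; the size n = 200 is chosen to make the naive method overflow):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)  # positive definite

# Method 2: eigenvalues, log det X = sum_i log(lambda_i)
ld_eig = np.sum(np.log(np.linalg.eigvalsh(X)))

# Method 3: Cholesky X = L L^T, log det X = 2 * sum_i log(L_ii)
L = np.linalg.cholesky(X)
ld_chol = 2 * np.sum(np.log(np.diag(L)))

# Method 1 (naive) overflows: det X exceeds the float64 range for n this large
print(np.linalg.det(X))    # inf
print(ld_eig, ld_chol)     # finite, and equal to each other
```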
On log det X^T X : why log det X^T X
det X requires X to be a square matrix.

For non-square X, one can instead consider det(X^T X), where X^T X is the Gram matrix of X and is always psd: y^T X^T X y = ‖Xy‖₂² ≥ 0.

Again, since X^T X may be singular, it is better to consider a regularized version log det(X^T X + δI), which removes the possibility of having det(X^T X) = 0.

Note. log det(X^T X) and log det(X^T X + δI) are neither concave nor convex in X.
Derivative of log det(X^T X + δI)
Let matrix B = XT X + δI to shorten the notation.
Again, the derivative of the scalar-valued function log det B w.r.t. a scalar x is a scalar, so the following expression, obtained from the chain rule, is correct:

  ∂ log det B/∂x = (∂ log det B/∂ det B) · (∂ det B/∂x)        (2)

while the following expression is wrong (a scalar cannot equal a matrix product of these shapes):

  ∂ log det B/∂x = (∂ log det B/∂B) · (∂B/∂x)

The correct expression (with the inner product / trace operator) should be

  ∂ log det B/∂x = Tr[ (∂ log det B/∂B)^T · ∂B/∂x ]            (3)
Derivative of log det(X^T X + δI) ... 2
Consider equation (2): ∂ log det B/∂x = (∂ log det B/∂ det B) · (∂ det B/∂x).

∂ log det B/∂ det B is just a simple log differentiation, as det B is a scalar:

  ∂ log det B/∂ det B = 1/det B                                (4)

For ∂ det B/∂x, apply Jacobi's Formula:

  ∂ det B/∂x = det B · Tr( B^{-1} ∂B/∂x )                      (5)

Putting (4) and (5) into (2):

  ∂ log det B/∂x = Tr( B^{-1} ∂B/∂x )
Derivative of log det(X^T X + δI) ... 3
Putting B = X^T X + δI back, we have

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X + δI)/∂x ]

As ∂(X^T X + δI)/∂x = ∂(X^T X)/∂x + ∂(δI)/∂x, and ∂(δI)/∂x is the zero matrix,

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]      (6)

Again, note that log det(·) and x are scalars but (X^T X + δI)^{-1} and ∂(X^T X)/∂x are matrices. It is the trace operator that turns the matrix (X^T X + δI)^{-1} ∂(X^T X)/∂x back into a scalar, so that equation (6) makes sense!
Derivative of log det(X^T X + δI) ... 4
Now consider another way to obtain the same result. Consider (3):

  ∂ log det B/∂x = Tr[ (∂ log det B/∂B)^T ∂B/∂x ]

Since ∂ log det B/∂B = B^{-T} (shown earlier), and B = X^T X + δI is symmetric so that (B^{-T})^T = B^{-1}, we have

  ∂ log det B/∂x = Tr( B^{-1} ∂B/∂x )

From the last page, ∂B/∂x = ∂(X^T X)/∂x, so

  ∂ log det B/∂x = Tr( B^{-1} ∂(X^T X)/∂x )

Putting back B = X^T X + δI, we get the same result as on the last page.
Derivative of log det(X^T X + δI) ... 5

So we have

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]

Putting x = X_ij, we have ∂(X^T X)/∂X_ij = X^T J^{ij} + J^{ji} X, where J^{ij} is the single-entry matrix (1 at position (i, j), 0 elsewhere). We get

  ∂ log det(X^T X + δI)/∂X_ij = Tr[ (X^T X + δI)^{-1} ( X^T J^{ij} + J^{ji} X ) ]
                              = Tr[ (X^T X + δI)^{-1} X^T J^{ij} ] + Tr[ (X^T X + δI)^{-1} J^{ji} X ]

By Tr(A J^{ij}) = [A^T]_{ij} and Tr(A J^{ji} B) = [BA]_{ij}, we have

  ∂ log det(X^T X + δI)/∂X_ij = [ ( (X^T X + δI)^{-1} X^T )^T ]_{ij} + [ X (X^T X + δI)^{-1} ]_{ij}

As X^T X + δI is symmetric,

  ∂ log det(X^T X + δI)/∂X = 2 X (X^T X + δI)^{-1}
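The closed-form gradient 2X(X^T X + δI)^{-1} can be verified against entrywise finite differences (a sketch using numpy; the shape 6×4 and δ = 0.01 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 4
X = rng.standard_normal((m, n))   # a non-square matrix
delta = 1e-2

def f(X):
    # f(X) = log det(X^T X + delta * I), via the stable slogdet
    return np.linalg.slogdet(X.T @ X + delta * np.eye(n))[1]

# Closed-form gradient derived above: 2 X (X^T X + delta I)^{-1}
grad = 2 * X @ np.linalg.inv(X.T @ X + delta * np.eye(n))

# Central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num[i, j] = (f(Xp) - f(Xm)) / (2 * eps)

print(np.max(np.abs(grad - num)))  # small: the gradients match
```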
Last page - summary
- Physical meanings of det X and log det X
- log det X is concave in X
- Jacobi's Formula and the derivatives w.r.t. the matrix variable X of det X, log det X and log det(X^T X + δI)

End of document