On det X, log det X and log det X^T X
Andersen Ang
Mathématique et de Recherche opérationnelle, Faculté polytechnique de Mons, UMONS, Mons, Belgium. email: [email protected] homepage: angms.science
August 14, 2017
Overview

1. On det X
   - Physical meaning of det X
   - Jacobi's Formula and ∂ det X/∂X = det X · X^{-T}
2. On log det X
   - Physical meaning of log det X
   - log det X is concave
   - (Fast) computation of log det
3. On log det X^T X
   - Physical meaning of log det X^T X
   - Derivative of log det(X^T X + δI)
4. Summary
What is det : the volume of a parallelepiped
For a square matrix X = [x1 x2 ... xn] ∈ R^{n×n} with linearly independent columns, det X gives the (signed) volume of the parallelepiped spanned by the columns xi:

  det X = vol({x1, x2, ..., xn})

Why? Consider the case n = 2: given two vectors x1, x2 ∈ R^2, we can always perform Gram-Schmidt orthogonalization to get two vectors v1, v2:

  v1 = x1
  v2 = c12 v1 + v2^⊥

where the vector v2^⊥ is orthogonal to v1. Then

  det(v1, v2) = det(v1, c12 v1 + v2^⊥) = det(v1, v2^⊥) = signed volume(v1, v2)

The same argument applies to any n.
For details, see Hannah, John. "A geometric approach to determinants." American Mathematical Monthly (1996): 401-409.

Derivative of det X : Jacobi's Formula
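A quick numerical sketch of this volume interpretation (using numpy; the matrix and dimension are illustrative). QR factorization plays the role of Gram-Schmidt: the |R_ii| are the lengths of the orthogonalized components, so their product is the parallelepiped volume.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))  # random square matrix, columns x_1 ... x_n

# QR is Gram-Schmidt in matrix form: X = QR with Q orthogonal.
# |R_ii| is the length of the component of x_i orthogonal to x_1..x_{i-1},
# so the parallelepiped volume is the product of the |R_ii|.
Q, R = np.linalg.qr(X)
volume = np.prod(np.abs(np.diag(R)))

print(abs(np.linalg.det(X)), volume)  # the two values agree
```

The sign of det X carries the orientation, which is why only |det X| matches the (unsigned) volume here.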
For a non-singular matrix X, recall:
- the adjugate-determinant-inverse relationship: adj X = det X · X^{-1}
- the adjugate-cofactor relationship: adj X = C^T, where C is the cofactor matrix

Jacobi's Formula gives the derivative of det X with respect to (w.r.t.) a scalar x:

  ∂ det X/∂x = det X · Tr( X^{-1} ∂X/∂x )        (1)
             = Tr( det X · X^{-1} ∂X/∂x )         (det X · X^{-1} = adj X)
             = Tr( adj X · ∂X/∂x )                (adj X = C^T)
             = Tr( C^T ∂X/∂x )
             = ⟨ C, ∂X/∂x ⟩_F

So the derivative of det X equals the (Frobenius) matrix inner product of the cofactor matrix and the derivative of X.
The equation ∂ det X/∂x = ⟨ C, ∂X/∂x ⟩_F makes sense because:
- the derivative of the scalar value det X w.r.t. the scalar x is a scalar
- ⟨ C, ∂X/∂x ⟩_F is a scalar

The derivative of det X w.r.t. the matrix X is a matrix. The expression is

  ∂ det X/∂X = det X · X^{-T}

For the derivation, refer to the previous document.
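The matrix expression ∂ det X/∂X = det X · X^{-T} can be sanity-checked against entrywise finite differences (a sketch using numpy; matrix and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))

# Analytic gradient from Jacobi's Formula: d det(X)/dX = det(X) * X^{-T}
grad = np.linalg.det(X) * np.linalg.inv(X).T

# Central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num[i, j] = (np.linalg.det(Xp) - np.linalg.det(Xm)) / (2 * eps)

print(np.max(np.abs(grad - num)))  # small: the two gradients match
```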
What is log det
The log-determinant of a matrix X is log det X.
- X has to be square (∵ det)
- X has to be positive definite (pd), because
  - det X = ∏_i λ_i
  - all eigenvalues of a pd matrix are positive
  - the domain of log is the positive real numbers (the log of a negative number produces a complex number, which is out of context here)
- if X is only positive semidefinite (psd), it is better to consider a regularized version log det(X + δI), where δ > 0 is a small positive number. As all eigenvalues of a psd matrix are non-negative (zero or positive), adding a small positive number δ removes all the zero eigenvalues and turns X + δI into a pd matrix
- log det is a concave function
- ∂ log det X/∂X = X^{-T}
Why log det
For a matrix X, the expression det X = ∏_i λ_i is dominated by the leading eigenvalues. Such domination is not a problem if we only care about the leading eigenvalues.

When all the eigenvalues are important, one way to suppress the leading eigenvalues is to take the log. Hence we have

  log det X = ∑_i log λ_i = log λ1 + log λ2 + ...

Again, since the input of log should be neither negative nor zero, the matrix X here should be pd so that λ_i > 0.
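The identity log det X = ∑_i log λ_i can be verified numerically (a sketch using numpy; the pd matrix is constructed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
X = A @ A.T + 5 * np.eye(5)  # a positive definite matrix

lam = np.linalg.eigvalsh(X)          # eigenvalues (all positive since X is pd)
logdet_eig = np.sum(np.log(lam))     # log det X = sum_i log(lambda_i)
logdet_direct = np.log(np.linalg.det(X))
print(logdet_eig, logdet_direct)     # the two values agree
```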
log det X is concave
For f(X) = log det X with X ∈ S^n_{++}, f is concave in X.

Proof. Showing that f is concave in X is equivalent to showing that g(t) = f(Z + tV) is concave in t for every Z ∈ S^n_{++} and V ∈ S^n, where dom g = {t | Z + tV ≻ 0} (an interval containing t = 0).
g(t) = log det(Z + tV)
     = log det( Z^{1/2} (I + t Z^{-1/2} V Z^{-1/2}) Z^{1/2} )
     = log det( Z^{1/2} (I + t V̂) Z^{1/2} )          let V̂ = Z^{-1/2} V Z^{-1/2}
     = log [ det(I + t V̂) · det Z ]                  ∵ det(AB) = det A · det B
     = log det(I + t V̂) + log det Z
     = log det( QQ^T + t QΛQ^T ) + log det Z         V̂ = QΛQ^T, QQ^T = Q^T Q = I
     = log det( Q (I + tΛ) Q^T ) + log det Z
     = log det(QQ^T) + log det(I + tΛ) + log det Z
     = log det(I + tΛ) + log det Z                   QQ^T = I, det I = 1, log 1 = 0
     = log ∏_i (1 + t λ_i) + log det Z               det(I + tΛ) = ∏_i (1 + t λ_i)
     = ∑_i log(1 + t λ_i) + log det Z
Then

  ∂g/∂t = ∑_i λ_i / (1 + t λ_i)   and   ∂²g/∂t² = − ∑_i λ_i² / (1 + t λ_i)² < 0,

so g(t) is concave, hence f(X) is concave (and −f(X) is convex).

Reference: the appendix of Convex Optimization by Boyd and Vandenberghe.
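The restriction-to-a-line argument above can be sanity-checked with a midpoint-concavity test along a random line (a sketch using numpy; the matrices Z, V and the interval [a, b] are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
Z = A @ A.T + n * np.eye(n)                          # Z positive definite
V = rng.standard_normal((n, n)); V = (V + V.T) / 2   # V symmetric

def g(t):
    # log det(Z + tV); slogdet returns (sign, log|det|) and is numerically stable
    return np.linalg.slogdet(Z + t * V)[1]

# midpoint concavity: g((a+b)/2) >= (g(a) + g(b)) / 2 for a, b in dom g
a, b = -0.05, 0.05
print(g((a + b) / 2) >= (g(a) + g(b)) / 2)  # True
```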
Application of the concavity of log det X : Taylor bound
Fact: the first-order Taylor approximation is a global over-estimator of a concave function. That is,
f(x) ≤ f(y) + ∇f(y)T (x − y)
As log det X is concave, it is upper bounded by its first-order Taylor approximation:
  log det X ≤ log det Y + ⟨ Y^{-T}, X − Y ⟩_F
Again, Y^{-T} and X − Y are matrices while log det X and log det Y are scalars, so the matrix inner product ⟨·,·⟩_F has to be applied. Equivalently,

  log det X ≤ log det Y + Tr( Y^{-1} (X − Y) )

Such an expression can be used in the minimization of log det X.
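The bound can be checked numerically for random pd matrices (a sketch using numpy; the matrix construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

def random_pd(rng, n):
    # A @ A.T is psd; adding n*I makes it safely positive definite
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

X = random_pd(rng, n)
Y = random_pd(rng, n)

lhs = np.linalg.slogdet(X)[1]                                   # log det X
rhs = np.linalg.slogdet(Y)[1] + np.trace(np.linalg.inv(Y) @ (X - Y))
print(lhs <= rhs)  # True: the tangent plane over-estimates a concave function
```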
(Fast) computation of log det
Ways to compute log det X :
1. By definition. Compute det X, then take the log. For example, using the Laplace cofactor expansion formula.

2. Eigenvalues (det X = ∏_i λ_i). Compute the eigenvalues of X, then log det X = ∑_i log λ_i.

3. Cholesky factorization. For X pd, apply the Cholesky decomposition X = LL^T; then with det L = ∏_i L_ii, log det X = 2 ∑_i log L_ii.

Note that methods 2 and 3 never form det X itself; this matters numerically, since det X can overflow or underflow in floating point for large matrices, while the sum of logs stays well-scaled.

There are many other methods, for example approximating det X for a big matrix X.
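The methods above can be compared directly (a sketch using numpy; the size n = 200 is chosen to make the naive method overflow):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
A = rng.standard_normal((n, n))
X = A @ A.T + n * np.eye(n)  # positive definite

# Method 2: eigenvalues, log det X = sum_i log(lambda_i)
ld_eig = np.sum(np.log(np.linalg.eigvalsh(X)))

# Method 3: Cholesky X = L L^T, log det X = 2 * sum_i log(L_ii)
L = np.linalg.cholesky(X)
ld_chol = 2 * np.sum(np.log(np.diag(L)))

# Method 1 (naive) overflows: det X exceeds the float64 range for n this large
print(np.linalg.det(X))    # inf
print(ld_eig, ld_chol)     # finite, and equal to each other
```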
On log det X^T X : why log det X^T X
det X requires X to be a square matrix.

For non-square X, one can instead consider det(X^T X), where X^T X is the Gram matrix of X and is always psd: y^T X^T X y = ‖Xy‖₂² ≥ 0.

Again, since X^T X may be singular, it is better to consider a regularized version log det(X^T X + δI), which removes the possibility of having det(X^T X) = 0.

Note. log det(X^T X) and log det(X^T X + δI) are neither concave nor convex in X.
Derivative of log det(X^T X + δI)
Let matrix B = XT X + δI to shorten the notation.
Again, the derivative of the scalar-valued function log det B w.r.t. a scalar x is a scalar, so the following expression, obtained from the chain rule, is correct:

  ∂ log det B/∂x = (∂ log det B/∂ det B) · (∂ det B/∂x)        (2)

while the following expression is wrong (a scalar cannot equal a matrix product of these shapes):

  ∂ log det B/∂x = (∂ log det B/∂B) · (∂B/∂x)

The correct expression (with the inner product / trace operator) should be

  ∂ log det B/∂x = Tr[ (∂ log det B/∂B)^T · ∂B/∂x ]            (3)
Derivative of log det(X^T X + δI) ... 2
Consider equation (2): ∂ log det B/∂x = (∂ log det B/∂ det B) · (∂ det B/∂x).

∂ log det B/∂ det B is just a simple log differentiation, as det B is a scalar:

  ∂ log det B/∂ det B = 1/det B                                (4)

For ∂ det B/∂x, apply Jacobi's Formula:

  ∂ det B/∂x = det B · Tr( B^{-1} ∂B/∂x )                      (5)

Putting (4) and (5) into (2):

  ∂ log det B/∂x = Tr( B^{-1} ∂B/∂x )
Derivative of log det(X^T X + δI) ... 3
Putting B = X^T X + δI back, we have

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X + δI)/∂x ]

As ∂(X^T X + δI)/∂x = ∂(X^T X)/∂x + ∂(δI)/∂x, and ∂(δI)/∂x is the zero matrix,

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]      (6)

Again, note that log det(·) and x are scalars but (X^T X + δI)^{-1} and ∂(X^T X)/∂x are matrices. It is the trace operator that turns the matrix (X^T X + δI)^{-1} ∂(X^T X)/∂x back into a scalar, so that equation (6) makes sense!
Derivative of log det(X^T X + δI) ... 4
Now consider another way to obtain the same result. Consider (3):

  ∂ log det B/∂x = Tr[ (∂ log det B/∂B)^T ∂B/∂x ]

Since ∂ log det B/∂B = B^{-T} (shown earlier), and B = X^T X + δI is symmetric so that (B^{-T})^T = B^{-1}, we have

  ∂ log det B/∂x = Tr( B^{-1} ∂B/∂x )

From the last page, ∂B/∂x = ∂(X^T X)/∂x, so

  ∂ log det B/∂x = Tr( B^{-1} ∂(X^T X)/∂x )

Putting back B = X^T X + δI, we get the same result as on the last page.
Derivative of log det(X^T X + δI) ... 5

So we have

  ∂ log det(X^T X + δI)/∂x = Tr[ (X^T X + δI)^{-1} ∂(X^T X)/∂x ]

Putting x = X_ij, we have ∂(X^T X)/∂X_ij = X^T J^{ij} + J^{ji} X, where J^{ij} is the single-entry matrix (1 at position (i, j), 0 elsewhere). We get

  ∂ log det(X^T X + δI)/∂X_ij = Tr[ (X^T X + δI)^{-1} ( X^T J^{ij} + J^{ji} X ) ]
                              = Tr[ (X^T X + δI)^{-1} X^T J^{ij} ] + Tr[ (X^T X + δI)^{-1} J^{ji} X ]

By Tr(A J^{ij}) = [A^T]_{ij} and Tr(A J^{ji} B) = [BA]_{ij}, we have

  ∂ log det(X^T X + δI)/∂X_ij = [ ( (X^T X + δI)^{-1} X^T )^T ]_{ij} + [ X (X^T X + δI)^{-1} ]_{ij}

As X^T X + δI is symmetric,

  ∂ log det(X^T X + δI)/∂X = 2 X (X^T X + δI)^{-1}
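The closed-form gradient 2X(X^T X + δI)^{-1} can be verified against entrywise finite differences (a sketch using numpy; the shape 6×4 and δ = 0.01 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 6, 4
X = rng.standard_normal((m, n))   # a non-square matrix
delta = 1e-2

def f(X):
    # f(X) = log det(X^T X + delta * I), via the stable slogdet
    return np.linalg.slogdet(X.T @ X + delta * np.eye(n))[1]

# Closed-form gradient derived above: 2 X (X^T X + delta I)^{-1}
grad = 2 * X @ np.linalg.inv(X.T @ X + delta * np.eye(n))

# Central finite-difference check, entry by entry
eps = 1e-6
num = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        Xp = X.copy(); Xp[i, j] += eps
        Xm = X.copy(); Xm[i, j] -= eps
        num[i, j] = (f(Xp) - f(Xm)) / (2 * eps)

print(np.max(np.abs(grad - num)))  # small: the gradients match
```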
Last page - summary
- Physical meanings of det X and log det X
- log det X is concave in X
- Jacobi's Formula and the derivatives w.r.t. the matrix variable X of det X, log det X and log det(X^T X + δI)

End of document