On det X, log det X and log det X^T X

Andersen Ang
Mathématique et de Recherche opérationnelle, Faculté polytechnique de Mons, UMONS, Mons, Belgium
email: [email protected]
homepage: angms.science
August 14, 2017

Overview

1. On det X
   - Physical meaning of det X
   - Jacobi's Formula and $\frac{\partial \det X}{\partial X} = \det X \cdot X^{-T}$
2. On log det X
   - Physical meaning of log det X
   - log det X is concave
   - (Fast) computation of log det
3. On log det X^T X
   - Physical meaning of log det X^T X
   - Derivative of log det(X^T X + δI)
4. Summary

What is det: the volume of a parallelepiped

For a square matrix $X = [x_1\ x_2\ \dots\ x_n] \in \mathbb{R}^{n \times n}$ with linearly independent columns, $\det X$ gives the (signed) volume of the parallelepiped spanned by the columns $x_i$:

\[ \det X = \mathrm{vol}(\{x_1, x_2, \dots, x_n\}) \]

Why? Consider the case $n = 2$. Given two vectors $x_1, x_2 \in \mathbb{R}^2$, we can always perform Gram-Schmidt orthogonalization to get

\[ v_1 = x_1, \qquad v_2 = c_{12} v_1 + v_2^{\perp}, \]

where $v_2$ stands for $x_2$ and the vector $v_2^{\perp}$ is orthogonal to $v_1$. Adding a multiple of one column to another does not change the determinant, so

\[
\begin{aligned}
\det(v_1, v_2) &= \det(v_1, c_{12} v_1 + v_2^{\perp}) \\
&= \det(v_1, v_2^{\perp}) \\
&= \text{signed volume}(v_1, v_2).
\end{aligned}
\]

The same argument applies to any $n$.

For details, see Hannah, John. "A geometric approach to determinants." American Mathematical Monthly (1996): 401-409.

Derivative of det X - Jacobi's Formula

For a non-singular matrix $X$, recall that:

- adjugate-det-inverse relationship: $\mathrm{adj} X = \det X \cdot X^{-1}$
- adjugate-cofactor relationship: $\mathrm{adj} X = C^T$, where $C$ is the cofactor matrix of $X$

Jacobi's Formula gives the derivative of $\det X$ with respect to (w.r.t.) a scalar $x$:

\[
\begin{aligned}
\frac{\partial \det X}{\partial x}
&= \det X \cdot \mathrm{Tr}\Big( X^{-1} \frac{\partial X}{\partial x} \Big) \qquad (1) \\
&= \mathrm{Tr}\Big( \underbrace{\det X \cdot X^{-1}}_{\mathrm{adj} X} \frac{\partial X}{\partial x} \Big) \\
&= \mathrm{Tr}\Big( \underbrace{\mathrm{adj} X}_{C^T} \frac{\partial X}{\partial x} \Big) \\
&= \Big\langle C, \frac{\partial X}{\partial x} \Big\rangle_F
\end{aligned}
\]

So the derivative of $\det X$ equals the matrix inner product of the cofactor matrix and the derivative of $X$.

The equation $\frac{\partial \det X}{\partial x} = \big\langle C, \frac{\partial X}{\partial x} \big\rangle_F$ makes sense because:

- The derivative of the scalar value $\det X$ w.r.t. the scalar $x$ is a scalar.
- $\big\langle C, \frac{\partial X}{\partial x} \big\rangle_F$ is a scalar.

The derivative of $\det X$ w.r.t. the matrix $X$ is a matrix. The expression is

\[ \frac{\partial \det X}{\partial X} = \det X \cdot X^{-T} \]

For derivation, refer to previous document.
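As a quick numerical sanity check of Jacobi's Formula and of $\partial \det X / \partial X = \det X \cdot X^{-T}$, here is a minimal sketch in Python with NumPy. The helper names (`det_gradient`, `numerical_gradient`), the test matrix, and the direction `D` are illustrative assumptions, not part of the original slides; the check uses central finite differences, which is only an approximation.

```python
import numpy as np

def det_gradient(X):
    """Closed-form gradient of det X w.r.t. X: det(X) * X^{-T}."""
    return np.linalg.det(X) * np.linalg.inv(X).T

def numerical_gradient(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
# Hypothetical test matrix: random entries plus a diagonal shift
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)

analytic = det_gradient(X)
numeric = numerical_gradient(np.linalg.det, X)

# Jacobi's Formula for a scalar parameter x with X(x) = X + x * D,
# so that dX/dx = D (D is an arbitrary illustrative direction)
D = rng.standard_normal((4, 4))
jacobi = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ D)

# Inner-product form <C, dX/dx>_F, using C = det(X) * X^{-T}
inner_product = np.sum(analytic * D)

print(np.max(np.abs(analytic - numeric)))
print(jacobi, inner_product)
```

The two printed scalars should agree up to rounding, and the first printed value should be small, since the cofactor matrix $C = \det X \cdot X^{-T}$ is exactly the gradient of $\det X$.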
What is log det

The log-determinant of a matrix $X$ is $\log\det X$.

- $X$ has to be square (because of det).
- $X$ has to be positive definite (pd), because
  - $\det X = \prod_i \lambda_i$,
  - all eigenvalues of a pd matrix are positive,
  - the domain of log has to be the positive real numbers (the log of a negative number is a complex number, which is out of context here).
- If $X$ is positive semidefinite (psd), it is better to consider a regularized version $\log\det(X + \delta I)$, where $\delta > 0$ is a small positive number. As all eigenvalues of a psd matrix are non-negative (zero or positive), adding a small positive number $\delta$ removes all the zero eigenvalues and turns $X + \delta I$ into a pd matrix.
- log det is a concave function.
- $\frac{\partial \log\det X}{\partial X} = X^{-T}$

Why log det

For a matrix $X$, the expression $\det X = \prod_i \lambda_i$ is dominated by the leading eigenvalues. Such domination is not a problem if we only care about the leading eigenvalues. When all the eigenvalues are important, one way to suppress the leading eigenvalues is to use log. Hence we have

\[ \log\det X = \log \prod_i \lambda_i = \sum_i \log \lambda_i = \log\lambda_1 + \log\lambda_2 + \dots \]

Again, since the input of log must be neither negative nor zero, the matrix $X$ here should be pd so that $\lambda_i > 0$.

log det X is concave

For $f(X) = \log\det X$ with $X \in S^n_{++}$, $f$ is concave in $X$.

Proof.
Showing that $f$ is concave in $X$ is equivalent to showing that $g(t) = f(X) = f(Z + tV)$ is concave in $t$, where $Z, V \in S^n$ and $\operatorname{dom} g = \{t \mid Z + tV \succ 0\}$, with $Z \succ 0$ so that $t = 0$ lies in $\operatorname{dom} g$.

\[
\begin{aligned}
g(t) &= \log\det(Z + tV) \\
&= \log\det\big( Z^{1/2} (I + t Z^{-1/2} V Z^{-1/2}) Z^{1/2} \big) \\
&= \log\det\big( Z^{1/2} (I + t\hat{V}) Z^{1/2} \big) && \text{let } \hat{V} = Z^{-1/2} V Z^{-1/2} \\
&= \log\big[ \det(I + t\hat{V}) \det Z \big] && \because \det(AB) = \det(A)\det(B) \\
&= \log\det(I + t\hat{V}) + \log\det Z \\
&= \log\det(QQ^T + tQ\Lambda Q^T) + \log\det Z && \hat{V} = Q\Lambda Q^T,\ QQ^T = Q^TQ = I \\
&= \log\det\big( Q(I + t\Lambda)Q^T \big) + \log\det Z \\
&= \log\det(QQ^T) + \log\det(I + t\Lambda) + \log\det Z \\
&= \log\det(I + t\Lambda) + \log\det Z && QQ^T = I,\ \det I = 1,\ \log 1 = 0 \\
&= \log \prod_i (1 + t\lambda_i) + \log\det Z && \det\Lambda = \prod_i \lambda_i \\
&= \sum_i \log(1 + t\lambda_i) + \log\det Z
\end{aligned}
\]

Here $\lambda_i$ are the eigenvalues of $\hat{V}$. Differentiating,

\[ \frac{\partial g}{\partial t} = \sum_i \frac{\lambda_i}{1 + t\lambda_i} \quad \text{and} \quad \frac{\partial^2 g}{\partial t^2} = -\sum_i \frac{\lambda_i^2}{(1 + t\lambda_i)^2} \le 0, \]

so $g(t)$ is concave, and hence $f(X)$ is concave (and $-f(X)$ is convex).

Reference: appendix of the Convex Optimization book by Boyd

Application of concavity of log det X: Taylor bound

Fact: the first order Taylor approximation is a global over-estimator of a concave function. That is,

\[ f(x) \le f(y) + \nabla f(y)^T (x - y) \]

As $\log\det X$ is concave, it is upper bounded by its first order Taylor approximation:

\[ \log\det X \le \log\det Y + \big\langle Y^{-T}, X - Y \big\rangle_F \]

Again, $Y^{-T}$ and $X - Y$ are matrices while $\log\det X$ and $\log\det Y$ are scalars, so the matrix inner product $\langle \cdot, \cdot \rangle$ has to be applied:

\[ \log\det X \le \log\det Y + \mathrm{Tr}\big( Y^{-1}(X - Y) \big) \]

Such an expression can be used for minimization of $\log\det X$.

(Fast) computation of log det

Ways to compute $\log\det X$:

1. By definition. Compute $\det X$, then take the log. For example, using the Laplace cofactor expansion formula.
2. Eigenvalues ($\det X = \prod_i \lambda_i$). Compute the eigenvalues of $X$, then $\log\det X = \sum_i \log\lambda_i$.
3. Cholesky factorization. For $X$ being pd, apply the Cholesky decomposition $X = LL^T$; then with $\det L = \prod_i L_{ii}$, $\log\det X = 2\sum_i \log L_{ii}$.

There are many other methods, for example approximating $\det X$ for a big matrix $X$.

On log det X^T X: why log det X^T X

$\det X$ requires $X$ to be a square matrix.
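For non-square $X$, one can try $\det X^T X$, where $X^T X$ is the Gram matrix of $X$ and it is always psd: $y^T X^T X y = \|Xy\|_2^2 \ge 0$. Again it is better to consider a regularized version $\log\det(X^T X + \delta I)$ to remove the possibility of having $\det(X^T X) = 0$.

Note. $\log\det X^T X$ and $\log\det(X^T X + \delta I)$ are neither concave nor convex in $X$.

Derivative of log det(X^T X + δI)

Let the matrix $B = X^T X + \delta I$ to shorten the notation. Again, the derivative of the scalar-valued function $\log\det B$ w.r.t. a scalar $x$ is a scalar, so the following expression after using the chain rule is correct:

\[ \frac{\partial \log\det B}{\partial x} = \frac{\partial \log\det B}{\partial \det B} \frac{\partial \det B}{\partial x} \qquad (2) \]

And the following expression after using the chain rule is wrong, because its right-hand side is a product of two matrices rather than a scalar:

\[ \frac{\partial \log\det B}{\partial x} = \frac{\partial \log\det B}{\partial B} \frac{\partial B}{\partial x} \]

The correct expression (with the inner product operator) should be

\[ \frac{\partial \log\det B}{\partial x} = \mathrm{Tr}\Big[ \Big( \frac{\partial \log\det B}{\partial B} \Big)^T \frac{\partial B}{\partial x} \Big] \qquad (3) \]

Derivative of log det(X^T X + δI) ... 2

Consider equation (2): $\frac{\partial \log\det B}{\partial x} = \frac{\partial \log\det B}{\partial \det B} \frac{\partial \det B}{\partial x}$.

The factor $\frac{\partial \log\det B}{\partial \det B}$ is just a simple log differentiation, as $\det B$ is a scalar:

\[ \frac{\partial \log\det B}{\partial \det B} = \frac{1}{\det B} \qquad (4) \]

For $\frac{\partial \det B}{\partial x}$, apply Jacobi's Formula:

\[ \frac{\partial \det B}{\partial x} = \det B \cdot \mathrm{Tr}\Big( B^{-1} \frac{\partial B}{\partial x} \Big) \qquad (5) \]

Put (4) and (5) into (2):

\[ \frac{\partial \log\det B}{\partial x} = \mathrm{Tr}\Big( B^{-1} \frac{\partial B}{\partial x} \Big) \]

Derivative of log det(X^T X + δI) ... 3

Putting $B = X^T X + \delta I$ back, we have

\[ \frac{\partial \log\det(X^T X + \delta I)}{\partial x} = \mathrm{Tr}\Big( (X^T X + \delta I)^{-1} \frac{\partial (X^T X + \delta I)}{\partial x} \Big) \]

As $\frac{\partial (X^T X + \delta I)}{\partial x} = \frac{\partial X^T X}{\partial x} + \frac{\partial \delta I}{\partial x}$ and $\frac{\partial \delta I}{\partial x}$ is the zero matrix,

\[ \frac{\partial \log\det(X^T X + \delta I)}{\partial x} = \mathrm{Tr}\Big( (X^T X + \delta I)^{-1} \frac{\partial X^T X}{\partial x} \Big) \qquad (6) \]

Again, note that $\log\det(\cdot)$ and $x$ are scalars but $(X^T X + \delta I)^{-1}$ and $\frac{\partial X^T X}{\partial x}$ are matrices. It is the trace operator that turns the matrix $(X^T X + \delta I)^{-1} \frac{\partial X^T X}{\partial x}$ back into a scalar, so that equation (6) makes sense!

Derivative of log det(X^T X + δI) ... 4

Now consider another way to show the same result.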
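Consider (3):

\[ \frac{\partial \log\det B}{\partial x} = \mathrm{Tr}\Big[ \Big( \frac{\partial \log\det B}{\partial B} \Big)^T \frac{\partial B}{\partial x} \Big] \]

Since $\frac{\partial \log\det B}{\partial B} = B^{-T}$ (stated in the "What is log det" slide) and $(B^{-T})^T = B^{-1}$, we have

\[ \frac{\partial \log\det B}{\partial x} = \mathrm{Tr}\Big[ B^{-1} \frac{\partial B}{\partial x} \Big] \]

From the previous slide we have $\frac{\partial B}{\partial x} = \frac{\partial X^T X}{\partial x}$, so

\[ \frac{\partial \log\det B}{\partial x} = \mathrm{Tr}\Big[ B^{-1} \frac{\partial X^T X}{\partial x} \Big] \]

Putting back $B = X^T X + \delta I$, we get the same result as equation (6).

Derivative of log det(X^T X + δI) ... 5

So we have

\[ \frac{\partial \log\det(X^T X + \delta I)}{\partial x} = \mathrm{Tr}\Big[ (X^T X + \delta I)^{-1} \frac{\partial X^T X}{\partial x} \Big] \]

Putting $x = X_{ij}$, we have $\frac{\partial X^T X}{\partial X_{ij}} = X^T J^{ij} + J^{ji} X$, where $J^{ij}$ is the single-entry matrix. We get

\[
\begin{aligned}
\frac{\partial \log\det(X^T X + \delta I)}{\partial X_{ij}}
&= \mathrm{Tr}\Big[ (X^T X + \delta I)^{-1} (X^T J^{ij} + J^{ji} X) \Big] \\
&= \mathrm{Tr}\Big[ (X^T X + \delta I)^{-1} X^T J^{ij} \Big] + \mathrm{Tr}\Big[ (X^T X + \delta I)^{-1} J^{ji} X \Big]
\end{aligned}
\]

By $\mathrm{Tr}(A J^{ij}) = [A^T]_{ij}$ and $\mathrm{Tr}(A J^{ji} B) = [BA]_{ij}$, we have

\[ \frac{\partial \log\det(X^T X + \delta I)}{\partial X_{ij}} = \Big[ \big( (X^T X + \delta I)^{-1} X^T \big)^T \Big]_{ij} + \Big[ X (X^T X + \delta I)^{-1} \Big]_{ij} \]

As $X^T X + \delta I$ is symmetric,

\[ \frac{\partial \log\det(X^T X + \delta I)}{\partial X} = 2X(X^T X + \delta I)^{-1} \]

Last page - summary

- Physical meanings of det X and log det X
- log det X is concave in X
- Jacobi Formula and derivatives w.r.t.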
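The closed-form gradient $2X(X^T X + \delta I)^{-1}$ can be checked numerically. The following is a minimal sketch in Python with NumPy under stated assumptions: the function names, the matrix size, and the value of `delta` are illustrative choices and not from the slides. It evaluates $\log\det(X^T X + \delta I)$ through a Cholesky factor (the $2\sum_i \log L_{ii}$ formula), compares it with NumPy's `slogdet`, and compares the closed-form gradient with central finite differences.

```python
import numpy as np

def logdet_gram(X, delta):
    """log det(X^T X + delta*I) via Cholesky: 2 * sum(log(diag(L)))."""
    B = X.T @ X + delta * np.eye(X.shape[1])
    L = np.linalg.cholesky(B)          # B = L L^T, valid since B is pd
    return 2.0 * np.sum(np.log(np.diag(L)))

def logdet_gram_gradient(X, delta):
    """Closed-form gradient 2 X (X^T X + delta*I)^{-1}."""
    B = X.T @ X + delta * np.eye(X.shape[1])
    # B is symmetric, so (B^{-1} X^T)^T = X B^{-1}; solve avoids an explicit inverse
    return 2.0 * np.linalg.solve(B, X.T).T

def finite_difference_gradient(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))        # hypothetical non-square test matrix
delta = 1e-3                           # hypothetical regularization value

analytic = logdet_gram_gradient(X, delta)
numeric = finite_difference_gradient(lambda M: logdet_gram(M, delta), X)

# Cross-check the Cholesky-based value against NumPy's slogdet
sign, slogdet_value = np.linalg.slogdet(X.T @ X + delta * np.eye(3))

print(np.max(np.abs(analytic - numeric)))
print(logdet_gram(X, delta), slogdet_value)
```

Using `np.linalg.solve` rather than forming the inverse is a design choice made here for numerical stability; the slides themselves only state the formula.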
