Sam Roweis' Notes on Matrix Identities
sam roweis (revised June 1999)

note that a, b, c and A, B, C do not depend on X, Y, x, y or z.

0.1 basic formulae

A(B + C) = AB + AC    (1a)
(A + B)^T = A^T + B^T    (1b)
(AB)^T = B^T A^T    (1c)
(AB)^{-1} = B^{-1} A^{-1}   if the individual inverses exist    (1d)
(A^{-1})^T = (A^T)^{-1}    (1e)

0.2 trace, determinant and rank

|AB| = |A| |B|    (2a)
|A^{-1}| = 1 / |A|    (2b)
|A| = \prod \text{evals}    (2c)
Tr[A] = \sum \text{evals}    (2d)
Tr[ABC \cdots] = Tr[BC \cdots A] = Tr[C \cdots AB] = \cdots   if the cyclic products are well defined    (2e)
rank[A] = rank[A^T A] = rank[A A^T]    (2f)
condition number = \gamma = \sqrt{ \text{biggest eval} / \text{smallest eval} }    (2g)

derivatives of scalar forms with respect to scalars, vectors, or matrices are indexed in the obvious way. similarly, the indexing for derivatives of vectors and matrices with respect to scalars is straightforward.

0.3 derivatives of traces

\partial Tr[X] / \partial X = I    (3a)
\partial Tr[XA] / \partial X = \partial Tr[AX] / \partial X = A^T    (3b)
\partial Tr[X^T A] / \partial X = \partial Tr[A X^T] / \partial X = A    (3c)
\partial Tr[X^T A X] / \partial X = (A + A^T) X    (3d)
\partial Tr[X^{-1} A] / \partial X = -(X^{-1} A X^{-1})^T    (3e)

0.4 derivatives of determinants

\partial |AXB| / \partial X = |AXB| (X^{-1})^T = |AXB| (X^T)^{-1}    (4a)
\partial \ln|X| / \partial X = (X^{-1})^T = (X^T)^{-1}    (4b)
\partial \ln|X(z)| / \partial z = Tr[ X^{-1} \, \partial X / \partial z ]    (4c)
\partial |X^T A X| / \partial X = |X^T A X| \left( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} \right)    (4d)

0.5 derivatives of scalar forms

\partial (a^T x) / \partial x = \partial (x^T a) / \partial x = a    (5a)
\partial (x^T A x) / \partial x = (A + A^T) x    (5b)
\partial (a^T X b) / \partial X = a b^T    (5c)
\partial (a^T X^T b) / \partial X = b a^T    (5d)
\partial (a^T X a) / \partial X = \partial (a^T X^T a) / \partial X = a a^T    (5e)
\partial (a^T X^T C X b) / \partial X = C^T X a b^T + C X b a^T    (5f)
\partial \left( (Xa + b)^T C (Xa + b) \right) / \partial X = (C + C^T)(Xa + b) a^T    (5g)

the derivative of one vector y with respect to another vector x is a matrix whose (i, j)th element is \partial y(j) / \partial x(i). such a derivative should be written as \partial y^T / \partial x, in which case it is the Jacobian matrix of y with respect to x. its determinant represents the ratio of the hypervolume dy to that of dx, so that \int f(y) \, dy = \int f(y(x)) \, |\partial y^T / \partial x| \, dx. however, the sloppy forms \partial y / \partial x, \partial y^T / \partial x^T and \partial y / \partial x^T are often used for this Jacobian matrix.

0.6 derivatives of vector/matrix forms

\partial (X^{-1}) / \partial z = -X^{-1} (\partial X / \partial z) X^{-1}    (6a)
\partial (Ax) / \partial z = A (\partial x / \partial z)    (6b)
\partial (XY) / \partial z = X (\partial Y / \partial z) + (\partial X / \partial z) Y    (6c)
\partial (AXB) / \partial z = A (\partial X / \partial z) B    (6d)
\partial (x^T A) / \partial x = A    (6e)
\partial (x^T) / \partial x = I    (6f)
\partial (x^T A x x^T) / \partial x = (A + A^T) x x^T + x^T A x \, I    (6g)

0.7 constrained maximization

the maximum over x of the quadratic form

\mu^T x - \tfrac{1}{2} x^T A^{-1} x    (7a)

subject to the J conditions c_j(x) = 0 is given by

A\mu + A C \Lambda,   where   \Lambda = -(C^T A C)^{-1} C^T A \mu    (7b)

and the jth column of C is \partial c_j(x) / \partial x.

0.8 symmetric matrices

symmetric matrices have real eigenvalues (though perhaps not distinct) and can always be diagonalized to the form

A = C \Lambda C^T    (8)

where the columns of C are (orthonormal) eigenvectors (i.e. C C^T = I) and the diagonal of \Lambda holds the eigenvalues.
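
a quick way to sanity-check derivative identities like those in sections 0.3-0.5 is to compare them with finite differences at a random test point. the sketch below is not part of the original notes: it assumes NumPy, and num_grad plus all variable names are illustrative choices. it spot-checks (3b), (4b) and the symmetric eigendecomposition (8).

# numerical spot-check of a few identities above using central finite differences.
# assumes NumPy only; names are illustrative, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test point

def num_grad(f, X, eps=1e-6):
    # finite-difference estimate of the derivative of a scalar f with respect
    # to a matrix X, with (i, j) entry = df/dX_ij (the convention implied by (3b))
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# (3b)  d Tr[XA] / dX = A^T
assert np.allclose(num_grad(lambda M: np.trace(M @ A), X), A.T, atol=1e-5)

# (4b)  d ln|X| / dX = (X^{-1})^T   (slogdet gives log of the absolute determinant)
lhs = num_grad(lambda M: np.linalg.slogdet(M)[1], X)
assert np.allclose(lhs, np.linalg.inv(X).T, atol=1e-5)

# (8)   a symmetric matrix diagonalizes as S = C Lambda C^T with C C^T = I
S = A + A.T                                   # symmetrize to get real eigenvalues
evals, C = np.linalg.eigh(S)                  # columns of C are orthonormal eigenvectors
assert np.allclose(C @ np.diag(evals) @ C.T, S)
assert np.allclose(C @ C.T, np.eye(n))

print("all spot-checks passed")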
0.9 block matrices

for conformably partitioned block matrices, addition and multiplication are performed by adding and multiplying blocks in exactly the same way as scalar elements of regular matrices. however, determinants and inverses of block matrices are very tricky; for 2 blocks by 2 blocks the results are:

\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = |A_{22}| \cdot |F_{11}| = |A_{11}| \cdot |F_{22}|    (9a)

\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^{-1}
  = \begin{bmatrix} F_{11}^{-1} & -A_{11}^{-1} A_{12} F_{22}^{-1} \\ -F_{22}^{-1} A_{21} A_{11}^{-1} & F_{22}^{-1} \end{bmatrix}    (9b)
  = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} F_{22}^{-1} A_{21} A_{11}^{-1} & -F_{11}^{-1} A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} F_{11}^{-1} & A_{22}^{-1} + A_{22}^{-1} A_{21} F_{11}^{-1} A_{12} A_{22}^{-1} \end{bmatrix}    (9c)

where

F_{11} = A_{11} - A_{12} A_{22}^{-1} A_{21}   and   F_{22} = A_{22} - A_{21} A_{11}^{-1} A_{12}

for block diagonal matrices things are much easier:

\begin{vmatrix} A_{11} & 0 \\ 0 & A_{22} \end{vmatrix} = |A_{11}| \, |A_{22}|    (9d)

\begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix}^{-1} = \begin{bmatrix} A_{11}^{-1} & 0 \\ 0 & A_{22}^{-1} \end{bmatrix}    (9e)

0.10 matrix inversion lemma (sherman-morrison-woodbury)

using the above results for block matrices we can make some substitutions and get the following important results:

(A + X B X^T)^{-1} = A^{-1} - A^{-1} X (B^{-1} + X^T A^{-1} X)^{-1} X^T A^{-1}    (10)

|A + X B X^T| = |B| \, |A| \, |B^{-1} + X^T A^{-1} X|    (11)

where A and B are square and invertible matrices but need not be of the same dimension. this lemma often allows a really hard inverse to be converted into an easy inverse. the most typical example of this is when A is large but diagonal, and X has many rows but few columns.
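
the sketch below (not from the notes; NumPy assumed, names like n, k and small are illustrative) checks (10) and (11) numerically in exactly the regime just described: A large but diagonal, X tall and thin, so the lemma route only ever inverts a small k x k matrix.

# numerical check of the matrix inversion lemma (10) and determinant identity (11)
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 3                                  # A is n x n, B is k x k, X is n x k
a = rng.uniform(1.0, 2.0, size=n)              # diagonal of A, kept as a vector
A = np.diag(a)
X = rng.standard_normal((n, k))
B = np.eye(k) + 0.1 * rng.standard_normal((k, k))

# direct (expensive) route: invert the full n x n matrix
M = A + X @ B @ X.T
M_inv_direct = np.linalg.inv(M)

# lemma route: A^{-1} is trivial and only a k x k matrix gets inverted
Ainv = np.diag(1.0 / a)
small = np.linalg.inv(B) + X.T @ Ainv @ X      # k x k
M_inv_lemma = Ainv - Ainv @ X @ np.linalg.inv(small) @ X.T @ Ainv
assert np.allclose(M_inv_direct, M_inv_lemma, atol=1e-8)

# determinant identity (11), compared on the log scale for numerical stability
logdet_direct = np.linalg.slogdet(M)[1]
logdet_lemma = (np.linalg.slogdet(B)[1] + np.sum(np.log(a))
                + np.linalg.slogdet(small)[1])
assert np.isclose(logdet_direct, logdet_lemma)

print("inversion and determinant lemmas check out")

on the lemma route the dominant cost is forming X^T A^{-1} X and solving the k x k system, roughly O(n k^2 + k^3), versus O(n^3) for inverting M directly.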