The Matrix Cookbook [ http://matrixcookbook.com ]
Kaare Brandt Petersen Michael Syskind Pedersen
Version: September 5, 2007
What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.
Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large number of sources. These sources include similar but shorter notes found on the internet and appendices in books; see the references for a full list.
Errors: Very likely there are errors, typos, and mistakes, for which we apologize and would be grateful to receive corrections at [email protected].
It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing, and the version will be apparent from the date in the header.
Suggestions: Your suggestions for additional content or elaboration of some topics are most welcome at [email protected].
Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.
Acknowledgements: We would like to thank the following for contributions and suggestions: Bill Baxter, Christian Rishøj, Douglas L. Theobald, Esben Hoegh-Rasmussen, Jan Larsen, Korbinian Strimmer, Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loic Thibaut, Ole Winther, Stephan Hattinger, and Vasile Sima. We would also like to thank The Oticon Foundation for funding our PhD studies.
Contents
1 Basics
  1.1 Trace and Determinants
  1.2 The Special Case 2x2

2 Derivatives
  2.1 Derivatives of a Determinant
  2.2 Derivatives of an Inverse
  2.3 Derivatives of Eigenvalues
  2.4 Derivatives of Matrices, Vectors and Scalar Forms
  2.5 Derivatives of Traces
  2.6 Derivatives of vector norms
  2.7 Derivatives of matrix norms
  2.8 Derivatives of Structured Matrices

3 Inverses
  3.1 Basic
  3.2 Exact Relations
  3.3 Implication on Inverses
  3.4 Approximations
  3.5 Generalized Inverse
  3.6 Pseudo Inverse

4 Complex Matrices
  4.1 Complex Derivatives

5 Decompositions
  5.1 Eigenvalues and Eigenvectors
  5.2 Singular Value Decomposition
  5.3 Triangular Decomposition

6 Statistics and Probability
  6.1 Definition of Moments
  6.2 Expectation of Linear Combinations
  6.3 Weighted Scalar Variable

7 Multivariate Distributions
  7.1 Student's t
  7.2 Cauchy
  7.3 Gaussian
  7.4 Multinomial
  7.5 Dirichlet
  7.6 Normal-Inverse Gamma
  7.7 Wishart
  7.8 Inverse Wishart
8 Gaussians
  8.1 Basics
  8.2 Moments
  8.3 Miscellaneous
  8.4 Mixture of Gaussians

9 Special Matrices
  9.1 Orthogonal, Ortho-symmetric, and Ortho-skew
  9.2 Units, Permutation and Shift
  9.3 The Singleentry Matrix
  9.4 Symmetric and Antisymmetric
  9.5 Orthogonal matrices
  9.6 Vandermonde Matrices
  9.7 Toeplitz Matrices
  9.8 The DFT Matrix
  9.9 Positive Definite and Semi-definite Matrices
  9.10 Block matrices

10 Functions and Operators
  10.1 Functions and Series
  10.2 Kronecker and Vec Operator
  10.3 Solutions to Systems of Equations
  10.4 Vector Norms
  10.5 Matrix Norms
  10.6 Rank
  10.7 Integral Involving Dirac Delta Functions
  10.8 Miscellaneous

A One-dimensional Results
  A.1 Gaussian
  A.2 One Dimensional Mixture of Gaussians

B Proofs and Details
  B.1 Misc Proofs
Notation and Nomenclature

A          Matrix
A_ij       Matrix indexed for some purpose
A_i        Matrix indexed for some purpose
A^ij       Matrix indexed for some purpose
A^n        The nth power of a square matrix
A^{-1}     The inverse matrix of the matrix A
A^+        The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}    The square root of a matrix (if unique), not elementwise
(A)_ij     The (i, j)th entry of the matrix A
A_ij       The (i, j)th entry of the matrix A
[A]_ij     The ij-submatrix, i.e. A with the ith row and jth column deleted
a          Vector
a_i        Vector indexed for some purpose
a_i        The ith element of the vector a
a          Scalar
det(A)     Determinant of A
Tr(A)      Trace of the matrix A
diag(A)    Diagonal matrix of the matrix A, i.e. (diag(A))_ij = δ_ij A_ij
vec(A)     The vector-version of the matrix A (see Sec. 10.2.2)
sup        Supremum of a set
||A||      Matrix norm (subscript if any denotes what norm)
A^T        Transposed matrix
A^*        Complex conjugated matrix
A^H        Transposed and complex conjugated matrix (Hermitian)
A ∘ B      Hadamard (elementwise) product
A ⊗ B      Kronecker product
0          The null matrix. Zero in all entries.
I          The identity matrix
J^ij       The single-entry matrix, 1 at (i, j) and zero elsewhere
Σ          A positive definite matrix
Λ          A diagonal matrix

1 Basics

(AB)^{-1} = B^{-1} A^{-1}                                  (1)
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}                    (2)
(A^T)^{-1} = (A^{-1})^T                                    (3)
(A + B)^T = A^T + B^T                                      (4)
(AB)^T = B^T A^T                                           (5)
(ABC...)^T = ...C^T B^T A^T                                (6)
(A^H)^{-1} = (A^{-1})^H                                    (7)
(A + B)^H = A^H + B^H                                      (8)
(AB)^H = B^H A^H                                           (9)
(ABC...)^H = ...C^H B^H A^H                                (10)

1.1 Trace and Determinants

Tr(A) = Σ_i A_ii                                           (11)
Tr(A) = Σ_i λ_i,   λ_i = eig(A)                            (12)
Tr(A) = Tr(A^T)                                            (13)
Tr(AB) = Tr(BA)                                            (14)
Tr(A + B) = Tr(A) + Tr(B)                                  (15)
Tr(ABC) = Tr(BCA) = Tr(CAB)                                (16)
det(A) = Π_i λ_i,   λ_i = eig(A)                           (17)
det(cA) = c^n det(A),   if A ∈ R^{n×n}                     (18)
det(AB) = det(A) det(B)                                    (19)
det(A^{-1}) = 1/det(A)                                     (20)
det(A^n) = det(A)^n                                        (21)
det(I + uv^T) = 1 + u^T v                                  (22)
det(I + εA) ≅ 1 + ε Tr(A),   ε small                       (23)

1.2 The Special Case 2x2

Consider the matrix A

A = [ A11  A12 ]
    [ A21  A22 ]

Determinant and trace

det(A) = A11 A22 − A12 A21                                 (24)
Tr(A) = A11 + A22                                          (25)

Eigenvalues

λ² − λ · Tr(A) + det(A) = 0

λ1 = (Tr(A) + sqrt(Tr(A)² − 4 det(A)))/2     λ2 = (Tr(A) − sqrt(Tr(A)² − 4 det(A)))/2

λ1 + λ2 = Tr(A)     λ1 λ2 = det(A)

Eigenvectors

v1 ∝ [ A12 ; λ1 − A11 ]     v2 ∝ [ A12 ; λ2 − A11 ]

Inverse

A^{-1} = (1/det(A)) [ A22  −A12 ]                          (26)
                    [ −A21  A11 ]

2 Derivatives

This section covers the differentiation of a number of expressions with
respect to a matrix X. Note that it is always assumed that X has no special structure, i.e. that the elements of X are independent (e.g. not symmetric, Toeplitz, positive definite). See Section 2.8 for differentiation of structured matrices. The basic assumption can be written as a formula:

∂X_kl / ∂X_ij = δ_ik δ_lj                                  (27)

that is, for vector and matrix forms,

(∂x/∂y)_i = ∂x_i/∂y     (∂x/∂y)_i = ∂x/∂y_i     (∂x/∂y)_ij = ∂x_i/∂y_j

The following rules are general and very useful when deriving the differential of an expression ([18]):

∂A = 0   (A is a constant)                                 (28)
∂(αX) = α ∂X                                               (29)
∂(X + Y) = ∂X + ∂Y                                         (30)
∂(Tr(X)) = Tr(∂X)                                          (31)
∂(XY) = (∂X)Y + X(∂Y)                                      (32)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)                             (33)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)                             (34)
∂(X^{-1}) = −X^{-1}(∂X)X^{-1}                              (35)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)                           (36)
∂(ln(det(X))) = Tr(X^{-1} ∂X)                              (37)
∂X^T = (∂X)^T                                              (38)
∂X^H = (∂X)^H                                              (39)

2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr(Y^{-1} ∂Y/∂x)                       (40)

2.1.2 Linear forms

∂det(X)/∂X = det(X)(X^{-1})^T                              (41)
∂det(AXB)/∂X = det(AXB)(X^{-1})^T = det(AXB)(X^T)^{-1}     (42)

2.1.3 Square forms

If X is square and invertible, then

∂det(X^T AX)/∂X = 2 det(X^T AX) X^{-T}                     (43)

If X is not square but A is symmetric, then

∂det(X^T AX)/∂X = 2 det(X^T AX) AX(X^T AX)^{-1}            (44)

If X is not square and A is not symmetric, then

∂det(X^T AX)/∂X = det(X^T AX)(AX(X^T AX)^{-1} + A^T X(X^T A^T X)^{-1})   (45)

2.1.4 Other nonlinear forms

Some special cases are (see [9, 7])

∂ln det(X^T X)/∂X = 2(X^+)^T                               (46)
∂ln det(X^T X)/∂X^+ = −2X^T                                (47)
∂ln |det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}                  (48)
∂det(X^k)/∂X = k det(X^k) X^{-T}                           (49)

2.2 Derivatives of an Inverse

From [26] we have the basic identity

∂Y^{-1}/∂x = −Y^{-1} (∂Y/∂x) Y^{-1}                        (50)

from which it follows

∂(X^{-1})_kl / ∂X_ij = −(X^{-1})_ki (X^{-1})_jl            (51)
∂(a^T X^{-1} b)/∂X = −X^{-T} a b^T X^{-T}                  (52)
∂det(X^{-1})/∂X = −det(X^{-1})(X^{-1})^T                   (53)
∂Tr(AX^{-1}B)/∂X = −(X^{-1}BAX^{-1})^T                     (54)
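As a quick sanity check (our addition, not part of the Cookbook itself), the determinant differential (36)/(40) and the inverse differential (50) can be verified numerically with NumPy using central finite differences; the matrix sizes, seed, and tolerances below are arbitrary choices.

```python
# Finite-difference check of eq (36): d det(X) = det(X) Tr(X^{-1} dX)
# and eq (50): d X^{-1} = -X^{-1} (dX) X^{-1}.  Not from the text; a sketch.
import numpy as np

rng = np.random.default_rng(0)
n, eps = 4, 1e-6
X = rng.standard_normal((n, n)) + n * np.eye(n)   # keep X well-conditioned
dX = rng.standard_normal((n, n))                  # arbitrary direction

# eq (36): directional derivative of det
lhs = (np.linalg.det(X + eps * dX) - np.linalg.det(X - eps * dX)) / (2 * eps)
rhs = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)
assert np.isclose(lhs, rhs, rtol=1e-4)

# eq (50): directional derivative of the inverse
num = (np.linalg.inv(X + eps * dX) - np.linalg.inv(X - eps * dX)) / (2 * eps)
ana = -np.linalg.inv(X) @ dX @ np.linalg.inv(X)
assert np.allclose(num, ana, atol=1e-5)
```

The same pattern (compare a central difference against the stated closed form) works for most identities in this section.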
2.3 Derivatives of Eigenvalues

∂/∂X Σ eig(X) = ∂Tr(X)/∂X = I                              (55)
∂/∂X Π eig(X) = ∂det(X)/∂X = det(X) X^{-T}                 (56)

2.4 Derivatives of Matrices, Vectors and Scalar Forms

2.4.1 First Order

∂(x^T a)/∂x = ∂(a^T x)/∂x = a                              (57)
∂(a^T X b)/∂X = a b^T                                      (58)
∂(a^T X^T b)/∂X = b a^T                                    (59)
∂(a^T X a)/∂X = ∂(a^T X^T a)/∂X = a a^T                    (60)
∂X/∂X_ij = J^ij                                            (61)
∂(XA)_ij/∂X_mn = δ_im (A)_nj = (J^mn A)_ij                 (62)
∂(X^T A)_ij/∂X_mn = δ_in (A)_mj = (J^nm A)_ij              (63)

2.4.2 Second Order

∂/∂X_ij Σ_klmn X_kl X_mn = 2 Σ_kl X_kl                     (64)
∂(b^T X^T X c)/∂X = X(b c^T + c b^T)                       (65)
∂(Bx + b)^T C (Dx + d)/∂x = B^T C (Dx + d) + D^T C^T (Bx + b)   (66)
∂(X^T B X)_kl / ∂X_ij = δ_lj (X^T B)_ki + δ_kj (BX)_il     (67)
∂(X^T B X)/∂X_ij = X^T B J^ij + J^ji B X,   (J^ij)_kl = δ_ik δ_jl   (68)

See Sec. 9.3 for useful properties of the single-entry matrix J^ij.

∂(x^T B x)/∂x = (B + B^T) x                                (69)
∂(b^T X^T D X c)/∂X = D^T X b c^T + D X c b^T              (70)
∂/∂X (Xb + c)^T D (Xb + c) = (D + D^T)(Xb + c) b^T         (71)

Assume W is symmetric, then

∂/∂s (x − As)^T W (x − As) = −2 A^T W (x − As)             (72)
∂/∂s (x − s)^T W (x − s) = −2 W (x − s)                    (73)
∂/∂x (x − As)^T W (x − As) = 2 W (x − As)                  (74)
∂/∂A (x − As)^T W (x − As) = −2 W (x − As) s^T             (75)

2.4.3 Higher order and non-linear

∂(X^n)_kl / ∂X_ij = Σ_{r=0}^{n−1} (X^r J^ij X^{n−1−r})_kl  (76)

For a proof of the above, see B.1.1.

∂/∂X a^T X^n b = Σ_{r=0}^{n−1} (X^r)^T a b^T (X^{n−1−r})^T (77)

∂/∂X a^T (X^n)^T X^n b = Σ_{r=0}^{n−1} [ X^{n−1−r} a b^T (X^n)^T X^r + (X^r)^T X^n a b^T (X^{n−1−r})^T ]   (78)

See B.1.1 for a proof. Assume s and r are functions of x, i.e.
s = s(x), r = r(x), and that A is a constant; then

∂/∂x (s^T A r) = (∂s/∂x)^T A r + (∂r/∂x)^T A^T s           (79)

2.4.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian of

f = x^T A x + b^T x                                        (80)
∇_x f = ∂f/∂x = (A + A^T) x + b                            (81)
∂²f/∂x∂x^T = A + A^T                                       (82)

2.5 Derivatives of Traces

2.5.1 First Order

∂/∂X Tr(X) = I                                             (83)
∂/∂X Tr(XA) = A^T                                          (84)
∂/∂X Tr(AXB) = A^T B^T                                     (85)
∂/∂X Tr(AX^T B) = BA                                       (86)
∂/∂X Tr(X^T A) = A                                         (87)
∂/∂X Tr(AX^T) = A                                          (88)
∂/∂X Tr(A ⊗ X) = Tr(A) I                                   (89)

2.5.2 Second Order

∂/∂X Tr(X²) = 2X^T                                         (90)
∂/∂X Tr(X²B) = (XB + BX)^T                                 (91)
∂/∂X Tr(X^T B X) = BX + B^T X                              (92)
∂/∂X Tr(X B X^T) = X B^T + X B                             (93)
∂/∂X Tr(AXBX) = A^T X^T B^T + B^T X^T A^T                  (94)
∂/∂X Tr(X^T X) = 2X                                        (95)
∂/∂X Tr(B X X^T) = (B + B^T) X                             (96)
∂/∂X Tr(B^T X^T C X B) = C^T X B B^T + C X B B^T           (97)
∂/∂X Tr(X^T B X C) = B X C + B^T X C^T                     (98)
∂/∂X Tr(A X B X^T C) = A^T C^T X B^T + C A X B             (99)
∂/∂X Tr[(AXb + c)(AXb + c)^T] = 2 A^T (AXb + c) b^T        (100)
∂/∂X Tr(X ⊗ X) = ∂/∂X Tr(X)Tr(X) = 2 Tr(X) I               (101)

See [7].

2.5.3 Higher Order

∂/∂X Tr(X^k) = k (X^{k−1})^T                               (102)
∂/∂X Tr(A X^k) = Σ_{r=0}^{k−1} (X^r A X^{k−r−1})^T         (103)
∂/∂X Tr(B^T X^T C X X^T C X B) = C X X^T C X B B^T + C^T X B B^T X^T C^T X + C X B B^T X^T C X + C^T X X^T C^T X B B^T   (104)

2.5.4 Other

∂/∂X Tr(A X^{-1} B) = −(X^{-1} B A X^{-1})^T = −X^{-T} A^T B^T X^{-T}   (105)

Assume B and C to be symmetric, then

∂/∂X Tr[(X^T C X)^{-1} A] = −(C X (X^T C X)^{-1})(A + A^T)(X^T C X)^{-1}   (106)
∂/∂X Tr[(X^T C X)^{-1}(X^T B X)] = −2 C X (X^T C X)^{-1} X^T B X (X^T C X)^{-1} + 2 B X (X^T C X)^{-1}   (107)

See [7].
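The trace derivatives above are easy to test elementwise. The following sketch (ours, not from the Cookbook; `num_grad` is a hypothetical helper name) checks (84) and (92) against a central-difference gradient.

```python
# Elementwise finite-difference check of eq (84): d Tr(XA)/dX = A^T
# and eq (92): d Tr(X^T B X)/dX = B X + B^T X.  A sanity sketch only.
import numpy as np

rng = np.random.default_rng(1)
n, eps = 3, 1e-6
X, A, B = (rng.standard_normal((n, n)) for _ in range(3))

def num_grad(f, X, eps=eps):
    """Central-difference gradient of a scalar function f of a matrix X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# eq (84)
assert np.allclose(num_grad(lambda Z: np.trace(Z @ A), X), A.T, atol=1e-5)
# eq (92)
assert np.allclose(num_grad(lambda Z: np.trace(Z.T @ B @ Z), X), B @ X + B.T @ X, atol=1e-5)
```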
2.6 Derivatives of vector norms

2.6.1 Two-norm

∂/∂x ||x − a||₂ = (x − a)/||x − a||₂                       (108)
∂/∂x (x − a)/||x − a||₂ = I/||x − a||₂ − (x − a)(x − a)^T/||x − a||₂³   (109)
∂||x||₂²/∂x = ∂(x^T x)/∂x = 2x                             (110)

2.7 Derivatives of matrix norms

For more on matrix norms, see Sec. 10.5.

2.7.1 Frobenius norm

∂/∂X ||X||_F² = ∂/∂X Tr(X X^H) = 2X                        (111)

See (201).

2.8 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. it is symmetric, Toeplitz, etc. In that case the derivatives of the previous sections do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):

df/dA_ij = Σ_kl (∂f/∂A_kl)(∂A_kl/∂A_ij) = Tr[ (∂f/∂A)^T ∂A/∂A_ij ]   (112)

The matrix differentiated with respect to itself is in this document referred to as the structure matrix of A and is defined simply by

∂A/∂A_ij = S^ij                                            (113)

If A has no special structure we have simply S^ij = J^ij, that is, the structure matrix is simply the single-entry matrix. Many structures have a representation in single-entry matrices; see Sec. 9.3.6 for more examples of structure matrices.

2.8.1 The Chain Rule

Sometimes the objective is to find the derivative of a matrix which is a function of another matrix. Let U = f(X); the goal is to find the derivative of the function g(U) with respect to X:

∂g(U)/∂X = ∂g(f(X))/∂X                                     (114)

The chain rule can then be written the following way:

∂g(U)/∂x_ij = Σ_{k=1}^{M} Σ_{l=1}^{N} (∂g(U)/∂u_kl)(∂u_kl/∂x_ij)   (115)

Using matrix notation, this can be written as:

∂g(U)/∂X_ij = Tr[ (∂g(U)/∂U)^T ∂U/∂X_ij ]                  (116)

2.8.2 Symmetric

If A is symmetric, then S^ij = J^ij + J^ji − J^ij J^ij and therefore

df/dA = [∂f/∂A] + [∂f/∂A]^T − diag[∂f/∂A]                  (117)

That is, e.g. ([5]):

∂Tr(AX)/∂X = A + A^T − (A ∘ I),   see (121)                (118)
∂det(X)/∂X = det(X)(2X^{-1} − (X^{-1} ∘ I))                (119)
∂ln det(X)/∂X = 2X^{-1} − (X^{-1} ∘ I)                     (120)

2.8.3 Diagonal

If X is diagonal, then ([18]):

∂Tr(AX)/∂X = A ∘ I                                         (121)

2.8.4 Toeplitz

Like symmetric and diagonal matrices, Toeplitz matrices have a special structure which should be taken into account when taking the derivative with respect to a matrix with Toeplitz structure:

∂Tr(AT)/∂T = ∂Tr(TA)/∂T ≡ α(A)                             (122)

where

α(A) =
[ Tr(A)                     Tr([A^T]_n1)     Tr([[A^T]_1n]_{n−1,2})   ···   A_n1 ]
[ Tr([A^T]_1n)              Tr(A)            Tr([A^T]_n1)              ⋱    ⋮ ]
[ Tr([[A^T]_1n]_{2,n−1})    Tr([A^T]_1n)     ⋱                         ⋱    Tr([[A^T]_1n]_{n−1,2}) ]
[ ⋮                         ⋱                ⋱                         ⋱    Tr([A^T]_n1) ]
[ A_1n                      ···              Tr([[A^T]_1n]_{2,n−1})    Tr([A^T]_1n)    Tr(A) ]

As can be seen, the derivative α(A) also has a Toeplitz structure. Each value on the diagonal is the sum of all the diagonal values of A; the values on the diagonals next to the main diagonal equal the sum of the diagonal next to the main diagonal in A^T. This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix is also symmetric, the same derivative yields

∂Tr(AT)/∂T = ∂Tr(TA)/∂T = α(A) + α(A)^T − α(A) ∘ I         (123)

3 Inverses

3.1 Basic

3.1.1 Definition

The inverse A^{-1} of a matrix A ∈ C^{n×n} is defined such that

A A^{-1} = A^{-1} A = I,                                   (124)

where I is the n × n identity matrix. If A^{-1} exists, A is said to be nonsingular; otherwise A is said to be singular (see e.g. [12]).

3.1.2 Cofactors and Adjoint

The submatrix of a matrix A, denoted by [A]_ij, is the (n − 1) × (n − 1) matrix obtained by deleting the ith row and the jth column of A.
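The symmetric-structure result (120) can be checked numerically (a sketch of ours, not from the text): parameterize a symmetric X by its upper triangle, so perturbing the (i, j) and (j, i) entries together respects the structure, and compare against 2X^{-1} − (X^{-1} ∘ I).

```python
# Finite-difference check of eq (120) for a symmetric positive definite X:
# d ln det(X)/dX = 2 X^{-1} - (X^{-1} ∘ I) when symmetry is enforced.
import numpy as np

rng = np.random.default_rng(2)
n, eps = 3, 1e-6
S = rng.standard_normal((n, n))
X = S @ S.T + n * np.eye(n)          # symmetric positive definite

G = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] += eps
        E[j, i] += eps               # symmetric perturbation
        if i == j:
            E[i, j] = eps            # a diagonal entry is one parameter
        G[i, j] = (np.log(np.linalg.det(X + E)) - np.log(np.linalg.det(X - E))) / (2 * eps)

Xi = np.linalg.inv(X)
assert np.allclose(G, 2 * Xi - Xi * np.eye(n), atol=1e-5)
```

Note how the off-diagonal entries pick up the factor 2 relative to the unstructured derivative (37), exactly as (117) predicts.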
The (i, j) cofactor of a matrix is defined as

cof(A, i, j) = (−1)^{i+j} det([A]_ij).                     (125)

The matrix of cofactors can be created from the cofactors:

cof(A) = [ cof(A, 1, 1)  ···  cof(A, 1, n) ]
         [ ⋮             cof(A, i, j)   ⋮  ]               (126)
         [ cof(A, n, 1)  ···  cof(A, n, n) ]

The adjoint matrix is the transpose of the cofactor matrix:

adj(A) = (cof(A))^T.                                       (127)

3.1.3 Determinant

The determinant of a matrix A ∈ C^{n×n} is defined as (see [12])

det(A) = Σ_{j=1}^{n} (−1)^{j+1} A_1j det([A]_1j)           (128)
       = Σ_{j=1}^{n} A_1j cof(A, 1, j).                    (129)

3.1.4 Construction

The inverse matrix can be constructed, using the adjoint matrix, by

A^{-1} = (1/det(A)) · adj(A)                               (130)

For the case of 2 × 2 matrices, see Section 1.2.

3.1.5 Condition number

The condition number of a matrix c(A) is the ratio between the largest and the smallest singular value of the matrix (see Section 5.2 on singular values),

c(A) = d₊/d₋                                               (131)

The condition number can be used to measure how close to singular a matrix is: a large condition number indicates that the matrix is nearly singular. The condition number can also be estimated from matrix norms. Here

c(A) = ||A|| · ||A^{-1}||,                                 (132)

where ||·|| is a norm such as e.g. the 1-norm, the 2-norm, the ∞-norm or the Frobenius norm (see Sec. 10.5 for more on matrix norms). The 2-norm of A equals sqrt(max(eig(A^H A))) [12, p. 57]. For a symmetric matrix, this reduces to ||A||₂ = max(|eig(A)|) [12, p. 394]. If the matrix is symmetric and positive definite, ||A||₂ = max(eig(A)). The condition number based on the 2-norm thus reduces to

||A||₂ ||A^{-1}||₂ = max(eig(A)) max(eig(A^{-1})) = max(eig(A))/min(eig(A)).   (133)

3.2 Exact Relations

3.2.1 Basic

(AB)^{-1} = B^{-1} A^{-1}                                  (134)

3.2.2 The Woodbury identity

The Woodbury identity comes in many variants.
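The cofactor construction (125)-(130) is concrete enough to code directly. The sketch below (ours; `cof` mirrors the cof(A, i, j) notation above) builds the adjoint and recovers the inverse, cross-checked against NumPy.

```python
# Inverse via the adjoint matrix, eqs (125)-(130): A^{-1} = adj(A)/det(A).
# Illustrative only; for real work use np.linalg.inv or a factorization.
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular

def cof(A, i, j):
    """(i, j) cofactor: signed determinant of A with row i, column j deleted."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

C = np.array([[cof(A, i, j) for j in range(n)] for i in range(n)])
adjA = C.T                                  # eq (127): adj(A) = cof(A)^T
A_inv = adjA / np.linalg.det(A)             # eq (130)
assert np.allclose(A_inv, np.linalg.inv(A))
```

This costs O(n) determinant evaluations per entry, which is why (130) is a theoretical tool rather than a numerical algorithm.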
The latter of the two can be found in [12]:

(A + CBC^T)^{-1} = A^{-1} − A^{-1}C(B^{-1} + C^T A^{-1}C)^{-1}C^T A^{-1}   (135)
(A + UBV)^{-1} = A^{-1} − A^{-1}U(B^{-1} + VA^{-1}U)^{-1}VA^{-1}           (136)

If P, R are positive definite, then (see [29])

(P^{-1} + B^T R^{-1}B)^{-1}B^T R^{-1} = PB^T(BPB^T + R)^{-1}   (137)

3.2.3 The Kailath Variant

(A + BC)^{-1} = A^{-1} − A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1}     (138)

See [4, page 153].

3.2.4 The Searle Set of Identities

The following set of identities can be found in [24, page 151]:

(I + A^{-1})^{-1} = A(A + I)^{-1}                              (139)
(A + BB^T)^{-1}B = A^{-1}B(I + B^T A^{-1}B)^{-1}               (140)
(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B = B(A + B)^{-1}A       (141)
A − A(A + B)^{-1}A = B − B(A + B)^{-1}B                        (142)
A^{-1} + B^{-1} = A^{-1}(A + B)B^{-1}                          (143)
(I + AB)^{-1} = I − A(I + BA)^{-1}B                            (144)
(I + AB)^{-1}A = A(I + BA)^{-1}                                (145)

3.2.5 Rank-1 update of Moore-Penrose Inverse

The following is a rank-1 update for the Moore-Penrose pseudo-inverse; a proof can be found in [17]. The matrix G is defined below:

(A + cd^T)^+ = A^+ + G                                         (146)

Using the notation

β = 1 + d^T A^+ c                                              (147)
v = A^+ c                                                      (148)
n = (A^+)^T d                                                  (149)
w = (I − AA^+) c                                               (150)
m = (I − A^+A)^T d                                             (151)

the solution is given as six different cases, depending on the entities ||w||, ||m||, and β. Please note that for any (column) vector v it holds that v^+ = v^T (v^T v)^{-1} = v^T / ||v||². The solution is:

Case 1 of 6: If ||w|| ≠ 0 and ||m|| ≠ 0. Then

G = −v w^+ − (m^+)^T n^T + β (m^+)^T w^+                       (152)
  = −(1/||w||²) v w^T − (1/||m||²) m n^T + (β/(||m||²||w||²)) m w^T   (153)

Case 2 of 6: If ||w|| = 0 and ||m|| ≠ 0 and β = 0. Then

G = −v v^+ A^+ − (m^+)^T n^T                                   (154)
  = −(1/||v||²) v v^T A^+ − (1/||m||²) m n^T                   (155)

Case 3 of 6: If ||w|| = 0 and β ≠ 0. Then

G = (1/β) m v^T A^+ − (β/(||v||²||m||² + |β|²)) ( (||v||²/β) m + v ) ( (||m||²/β) (A^+)^T v + n )^T   (156)

Case 4 of 6: If ||w|| ≠ 0 and ||m|| = 0 and β = 0.
Then

G = −A^+ n n^+ − v w^+                                         (157)
  = −(1/||n||²) A^+ n n^T − (1/||w||²) v w^T                   (158)

Case 5 of 6: If ||m|| = 0 and β ≠ 0. Then

G = (1/β) A^+ n w^T − (β/(||n||²||w||² + |β|²)) ( (||w||²/β) A^+ n + v ) ( (||n||²/β) w + n )^T   (159)

Case 6 of 6: If ||w|| = 0 and ||m|| = 0 and β = 0. Then

G = −v v^+ A^+ − A^+ n n^+ + (v^+ A^+ n) v n^+                 (160)
  = −(1/||v||²) v v^T A^+ − (1/||n||²) A^+ n n^T + (v^T A^+ n/(||v||²||n||²)) v n^T   (161)

3.3 Implication on Inverses

If (A + B)^{-1} = A^{-1} + B^{-1} then A B^{-1} A = B A^{-1} B (162)

See [24].

3.3.1 A PosDef identity

Assume P, R to be positive definite and invertible, then

(P^{-1} + B^T R^{-1}B)^{-1}B^T R^{-1} = PB^T(BPB^T + R)^{-1}   (163)

See [29].

3.4 Approximations

The following is a Taylor expansion

(I + A)^{-1} = I − A + A² − A³ + ...                           (164)

The following approximation is from [21] and holds when A is large and symmetric

A − A(I + A)^{-1}A ≅ I − A^{-1}                                (165)

If σ² is small compared to Q and M then

(Q + σ²M)^{-1} ≅ Q^{-1} − σ²Q^{-1}MQ^{-1}                      (166)

3.5 Generalized Inverse

3.5.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^− such that (see [25])

A A^− A = A                                                    (167)

The matrix A^− is not unique.

3.6 Pseudo Inverse

3.6.1 Definition

The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils

I    A A^+ A = A
II   A^+ A A^+ = A^+
III  A A^+ symmetric
IV   A^+ A symmetric

The matrix A^+ is unique and always exists. Note that in the case of complex matrices, the symmetric condition is substituted by a condition of being Hermitian.
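The four Penrose conditions are straightforward to verify numerically (our sketch, not from the text), even for a rank-deficient matrix where no ordinary inverse exists.

```python
# Checking the four Moore-Penrose conditions I-IV for A^+,
# using numpy's pinv on a deliberately rank-deficient 5x4 matrix.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank <= 3
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)          # I
assert np.allclose(Ap @ A @ Ap, Ap)        # II
assert np.allclose((A @ Ap).T, A @ Ap)     # III: A A^+ symmetric
assert np.allclose((Ap @ A).T, Ap @ A)     # IV: A^+ A symmetric
```

Conditions III and IV are what single out A^+ among the many generalized inverses satisfying (167).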
3.6.2 Properties

Assume A^+ to be the pseudo-inverse of A; then (see [3])

(A^+)^+ = A                                                    (168)
(A^T)^+ = (A^+)^T                                              (169)
(cA)^+ = (1/c)A^+                                              (170)
(A^T A)^+ = A^+ (A^T)^+                                        (171)
(A A^T)^+ = (A^T)^+ A^+                                        (172)

Assume A to have full rank, then

(A A^+)(A A^+) = A A^+                                         (173)
(A^+ A)(A^+ A) = A^+ A                                         (174)
Tr(A A^+) = rank(A A^+)   (see [25])                           (175)
Tr(A^+ A) = rank(A^+ A)   (see [25])                           (176)

3.6.3 Construction

Assume that A has full rank; then

A n × n   Square   rank(A) = n   ⇒   A^+ = A^{-1}
A n × m   Broad    rank(A) = n   ⇒   A^+ = A^T(AA^T)^{-1}
A n × m   Tall     rank(A) = m   ⇒   A^+ = (A^T A)^{-1}A^T

Assume A does not have full rank, i.e. A is n × m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = UDV^T by

A^+ = V_r D_r^{-1} U_r^T                                       (177)

where U_r, D_r, and V_r are the matrices with the degenerate rows and columns deleted. A different way is this: there always exist two matrices C (n × r) and D (r × m) of rank r such that A = CD. Using these matrices it holds that

A^+ = D^T(DD^T)^{-1}(C^T C)^{-1}C^T                            (178)

See [3].

4 Complex Matrices

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied ([7]):

df(z)/dz = ∂ℜ(f(z))/∂ℜz + i ∂ℑ(f(z))/∂ℜz                       (179)

or, in a more compact form,

∂f(z)/∂ℑz = i ∂f(z)/∂ℜz.                                       (181)

In general, expressions involving a complex conjugate or conjugate transpose do not satisfy the Cauchy-Riemann equations, so a more general definition of the complex derivative is used:

• Generalized Complex Derivative:

df(z)/dz = (1/2)( ∂f(z)/∂ℜz − i ∂f(z)/∂ℑz ).                   (182)

• Conjugate Complex Derivative:

df(z)/dz* = (1/2)( ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz ).                  (183)

The Generalized Complex Derivative equals the normal derivative when f is an analytic function. For a non-analytic function such as f(z) = z*, the derivative equals zero. The Conjugate Complex Derivative equals zero when f is an analytic function. The Conjugate Complex Derivative has e.g. been used by [20] when deriving a complex gradient. Notice:

df(z)/dz ≠ ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz.                            (184)

• Complex Gradient Vector: If f is a real function of a complex vector z, then the complex gradient vector is given by

∇f(z) = 2 df(z)/dz* = ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz.                 (185)

• Complex Gradient Matrix: If f is a real function of a complex matrix Z, then the complex gradient matrix is given by ([2])

∇f(Z) = 2 df(Z)/dZ* = ∂f(Z)/∂ℜZ + i ∂f(Z)/∂ℑZ.                 (186)

4.1.1 The Chain Rule for complex numbers

The chain rule is a little more complicated when the function of a complex u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied ([7]):

∂g(u)/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂u*)(∂u*/∂x)
         = (∂g/∂u)(∂u/∂x) + (∂g*/∂u)* (∂u*/∂x)                 (187)

Notice, if the function is analytic, the second term reduces to zero, and the expression reduces to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:

∂g(U)/∂X = ∂Tr[(∂g(U)/∂U)^T ∂U]/∂X + ∂Tr[(∂g(U)/∂U*)^T ∂U*]/∂X   (188)

4.1.2 Complex Derivatives of Traces

If the derivatives involve complex numbers, the conjugate transpose is often involved. The most useful way to show a complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:

∂Tr(X*)/∂ℜX = ∂Tr(X^H)/∂ℜX = I                                 (189)

∂Tr(AX^H)/∂ℜX = A                                              (193)

∂Tr(AX*)/∂ℜX = A^T                                             (195)

∂Tr(XX^H)/∂ℜX = ∂Tr(X^H X)/∂ℜX = 2ℜX

∂Tr(XX^H)/∂X = X*                                              (199)
∂Tr(XX^H)/∂X* = X                                              (200)

Since the function Tr(XX^H) is a real function of the complex matrix X, the complex gradient matrix (186) is given by

∇Tr(XX^H) = 2 ∂Tr(XX^H)/∂X* = 2X                               (201)

4.1.3 Complex Derivative Involving Determinants

Here, a calculation example is provided. The objective is to find the derivative of det(X^H AX) with respect to X ∈ C^{m×n}. The derivative is found with respect to the real part and the imaginary part of X; by use of (36) and (32), det(X^H AX) can be calculated as (see App. B.1.2 for details)

∂det(X^H AX)/∂X = (1/2)( ∂det(X^H AX)/∂ℜX − i ∂det(X^H AX)/∂ℑX )