Matrix Calculus Notes

James Burke, 2000

Linear Spaces and Operators

Let $X$ and $Y$ be normed linear spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$.

A map $L : X \to Y$ is a linear transformation (or operator) if
\[ L(\alpha x + \beta z) = \alpha L(x) + \beta L(z) \quad \forall\, x, z \in X \text{ and } \alpha, \beta \in \mathbb{R}. \]

$L[X, Y]$ denotes the normed space of continuous linear operators from $X$ to $Y$, with norm
\[ \|T\| := \sup_{\|x\|_X \le 1} \|Tx\|_Y \quad \forall\, T \in L[X, Y]. \]

$X^* := L[X, \mathbb{R}]$ is the topological dual of $X$, with the duality pairing
\[ \langle \phi, x \rangle = \phi(x) \quad \forall\, (\phi, x) \in X^* \times X. \]

The duality pairing gives rise to adjoints of linear operators: $T \in L[X, Y]$ defines $T^* \in L[Y^*, X^*]$ by
\[ \langle y^*, T(x) \rangle = \langle T^*(y^*), x \rangle \quad \forall\, (y^*, x) \in Y^* \times X. \]

Matrix Representations

Let $\{x^j\}_{j=1}^n$ and $\{y^i\}_{i=1}^m$ be bases for $X$ and $Y$. Given $x = \sum_{j=1}^n a_j x^j \in X$, the linear mapping
\[ x \overset{\kappa}{\longmapsto} (a_1, \dots, a_n)^T \]
is a linear isomorphism between $X$ and $\mathbb{R}^n$ and is called the coordinate mapping from $X$ to $\mathbb{R}^n$ associated with the basis $\{x^j\}_{j=1}^n$. Similarly, let $\eta$ be the coordinate mapping from $Y$ to $\mathbb{R}^m$ associated with the basis $\{y^i\}_{i=1}^m$.

Given $T \in L[X, Y]$, there exist uniquely defined $\{t_{ij} \mid i = 1, \dots, m;\ j = 1, \dots, n\} \subset \mathbb{R}$ such that
\[ T x^j = \sum_{i=1}^m t_{ij}\, y^i, \quad j = 1, \dots, n. \]
Therefore $\eta(Tx) = \mathbf{T}\,\kappa(x)$, where $\mathbf{T} = (t_{ij}) \in \mathbb{R}^{m \times n}$ is the matrix representation of $T$ with respect to these bases.

The Kronecker Product

Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{s \times t}$. The Kronecker product of $A$ with $B$ is the $ms \times nt$ matrix given by
\[ A \otimes B := \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}. \]

For example,
\[ A = \begin{bmatrix} 2 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & -1 \\ 2 & 1 \end{bmatrix}, \quad A \otimes B = \begin{bmatrix} 0 & -2 & 0 & 1 \\ 4 & 2 & -2 & -1 \end{bmatrix}. \]

The Matrix Coordinate Map "vec"

Given $A \in \mathbb{R}^{m \times n}$,
\[ \operatorname{vec}(A) := \begin{bmatrix} A_{\cdot 1} \\ A_{\cdot 2} \\ \vdots \\ A_{\cdot n} \end{bmatrix}, \]
that is, $\operatorname{vec}(A)$ is the $mn$-vector obtained by stacking the columns of $A$ on top of each other.

Clearly, $\langle A, B \rangle_F = \operatorname{vec}(A)^T \operatorname{vec}(B)$ and $\operatorname{vec} \in L[\mathbb{R}^{m \times n}, \mathbb{R}^{mn}]$.

Properties of the Kronecker Product

1. $A \otimes B \otimes C = (A \otimes B) \otimes C = A \otimes (B \otimes C)$.
2. $(A \otimes B)(C \otimes D) = AC \otimes BD$ when $AC$ and $BD$ exist.
3. $(A \otimes B)^T = A^T \otimes B^T$.
4. $A^\dagger \otimes B^\dagger = (A \otimes B)^\dagger$, where $M^\dagger$ is the Moore-Penrose pseudoinverse of the matrix $M$.
5. For vectors $a$ and $b$, $\operatorname{vec}(ab^T) = b \otimes a$.
6. If $AXB$ is a well-defined matrix product, then
\[ \operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X). \]
In particular, $\operatorname{vec}(AX) = (I \otimes A)\operatorname{vec}(X)$ and $\operatorname{vec}(XB) = (B^T \otimes I)\operatorname{vec}(X)$, where $I$ is interpreted as the identity matrix of the appropriate dimension.

Spectrum of the Kronecker Product

Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$ have spectra $\{\lambda_i\}_{i=1}^n$ and $\{\mu_j\}_{j=1}^m$, respectively, counted with multiplicity. Then the eigenvalues of $A \otimes B$ are
\[ \lambda_i \mu_j, \quad i = 1, \dots, n,\ j = 1, \dots, m, \]
and the eigenvalues of $(I_m \otimes A) + (B \otimes I_n)$ are
\[ \lambda_i + \mu_j, \quad i = 1, \dots, n,\ j = 1, \dots, m. \]
The matrix $(I_m \otimes A) + (B \otimes I_n)$ is called the Kronecker sum of $A$ and $B$.

In particular, $\operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\operatorname{tr}(B)$ and $\det(A \otimes B) = \det(A)^m \det(B)^n$.

Singular Values of the Kronecker Product

Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{s \times t}$ have singular value decompositions $U_A \Sigma_A V_A^T$ and $U_B \Sigma_B V_B^T$. Then, after reordering, the singular value decomposition of $A \otimes B$ is
\[ (U_A \otimes U_B)(\Sigma_A \otimes \Sigma_B)(V_A^T \otimes V_B^T). \]
In particular, the nonzero singular values of $A \otimes B$ are
\[ \sigma_i(A)\,\sigma_j(B), \quad i = 1, \dots, \operatorname{rank} A,\ j = 1, \dots, \operatorname{rank} B. \]

Matrix Representations for $L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$

On $\mathbb{R}^{m \times n}$ the matrices $\{E_{ij} \mid i = 1, \dots, m;\ j = 1, \dots, n\}$, where $E_{ij}$ is the matrix having a one in the $ij$ position and zeros elsewhere, form the standard unit coordinate basis for $\mathbb{R}^{m \times n}$.

Observe that vec is the coordinate mapping on $\mathbb{R}^{m \times n}$ associated with this basis, where the coordinates are ordered by columns.
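The Kronecker and vec identities above lend themselves to a quick numerical check. The following is a minimal NumPy sketch, not part of the original notes: the helper `vec`, the random test matrices, their shapes, and the use of symmetric matrices for the spectrum check (so that all eigenvalues are real and can be sorted) are assumptions made purely for illustration. With `P` of size n x n and `Q` of size m x m, the Kronecker sum is formed so that both terms are mn x mn.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return np.asarray(M).reshape(-1, order="F")

# The small worked example from the text: A is 1x2, B is 2x2.
A_small = np.array([[2, -1]])
B_small = np.array([[0, -1], [2, 1]])
kron_small = np.kron(A_small, B_small)

# Property 6, vec(AXB) = (B^T kron A) vec(X), on random conformable matrices.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))
vec_identity_holds = np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Spectrum checks with symmetric matrices (illustrative assumption).
P = rng.standard_normal((3, 3))
P = P + P.T                      # n = 3
Q = rng.standard_normal((2, 2))
Q = Q + Q.T                      # m = 2
lam = np.linalg.eigvalsh(P)
mu = np.linalg.eigvalsh(Q)

products = np.sort(np.outer(lam, mu).ravel())    # all lambda_i * mu_j
sums = np.sort(np.add.outer(lam, mu).ravel())    # all lambda_i + mu_j

kron_sum = np.kron(np.eye(2), P) + np.kron(Q, np.eye(3))
product_spectrum_holds = np.allclose(np.linalg.eigvalsh(np.kron(P, Q)), products)
sum_spectrum_holds = np.allclose(np.linalg.eigvalsh(kron_sum), sums)
```

Note that `reshape(-1, order="F")` is what makes `vec` stack columns; NumPy's default row-major flattening would stack rows instead and the identity in property 6 would fail.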
We show how to use vec and $\otimes$ to compute a matrix representation for elements of $L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$ with respect to the standard unit coordinate bases on $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{s \times t}$.

Example 1

Let $A \in \mathbb{R}^{s \times m}$ and $B \in \mathbb{R}^{n \times t}$, and define $T \in L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$ by $T(X) = AXB$. Then, using the coordinate mapping vec, we get
\[ \operatorname{vec}(T(X)) = \operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X). \]
Hence, the matrix representation of $T$ in the unit coordinate bases is
\[ \mathbf{T} = B^T \otimes A. \]

Example 2

Define $T \in L[\mathbb{R}^{n \times n}, \mathbb{R}^{n \times n}]$ by $T(X) = AX + XB$, where $A, B \in \mathbb{R}^{n \times n}$. Then
\[ \operatorname{vec}(T(X)) = \operatorname{vec}(AX) + \operatorname{vec}(XB) = (I \otimes A)\operatorname{vec}(X) + (B^T \otimes I)\operatorname{vec}(X) = [(I_n \otimes A) + (B^T \otimes I_n)]\operatorname{vec}(X). \]
That is, the matrix representation of $T$ in the unit coordinate bases is
\[ \mathbf{T} = (I_n \otimes A) + (B^T \otimes I_n), \]
the Kronecker sum of $A$ and $B^T$.

The Derivative of det

The standard way to compute the derivative of the determinant is to use Laplace's formula: for all $i_0, j_0 \in \{1, 2, \dots, n\}$,
\[ \det(X) = \sum_{i=1}^n x_{ij_0} (-1)^{i+j_0} \det(X(i, j_0)) = \sum_{j=1}^n x_{i_0 j} (-1)^{i_0+j} \det(X(i_0, j)), \]
where $X(i, j) \in \mathbb{R}^{(n-1) \times (n-1)}$ is obtained from $X$ by deleting the $i$th row and $j$th column. Since $X(i, j)$ does not depend on $x_{ij}$, this formula immediately tells us that
\[ \frac{\partial \det(X)}{\partial x_{ij}} = (-1)^{i+j} \det(X(i, j)) \quad \forall\, i, j \in \{1, 2, \dots, n\}. \]

Consequently, the derivative of the determinant can be written in terms of the classical adjoint of $X$:
\[ \operatorname{adj}(X) := \left[ (-1)^{i+j} \det(X(i, j)) \right]^T. \]
That is,
\[ (\det(\cdot))'(X)(D) = \langle \operatorname{adj}(X)^T, D \rangle_F = \operatorname{tr}(\operatorname{adj}(X) D), \quad\text{so}\quad \nabla \det(X) = \operatorname{adj}(X)^T. \]
In differential notation,
\[ d \det(X) = \langle \operatorname{adj}(X)^T, dX \rangle_F, \]
which more explicitly describes how to apply the chain rule.

The Determinant

The determinant is the unique alternating multilinear form on the columns (or rows) of a matrix whose value at the identity is 1.

Determinants have a much longer history than matrices themselves. They were derived to solve linear systems long before the invention of matrices. The culmination of this effort is what we now call Cramer's rule.
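As a numerical aside, the cofactor formula for the derivative of det given above can be compared against a finite difference. The sketch below is an illustration under stated assumptions, not the author's method: `adjugate` is a hypothetical helper that builds the classical adjoint directly from cofactors (fine for small matrices, far too slow for large ones), and the matrix size, random seed, and step size `h` are arbitrary choices.

```python
import numpy as np

def adjugate(X):
    """Classical adjoint of X: the transpose of its cofactor matrix."""
    n = X.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # X(i, j): delete row i and column j.
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))   # direction of differentiation

grad = adjugate(X).T              # gradient of det at X

# Directional derivative in its two equivalent forms.
directional = np.sum(grad * D)            # Frobenius inner product <adj(X)^T, D>_F
trace_form = np.trace(adjugate(X) @ D)    # tr(adj(X) D)

# Central finite difference of t -> det(X + t D) at t = 0.
h = 1e-6
fd = (np.linalg.det(X + h * D) - np.linalg.det(X - h * D)) / (2 * h)
```

The values `directional`, `trace_form`, and `fd` should agree up to finite-difference and rounding error.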
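Cramer's rule tells us that
\[ A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_n. \]
So, when $\det(A) \ne 0$, $A^{-1}$ exists and we have
\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) = \det(A^{-1}) \operatorname{adj}(A) \quad\text{and}\quad \operatorname{adj}(A) = \det(A)\, A^{-1}. \]
In particular, when $A^{-1}$ exists, we have
\[ \nabla \det(A) = \operatorname{adj}(A)^T = \det(A)\, A^{-T}. \]

The Banach Lemma

The spectral radius of $A \in \mathbb{R}^{n \times n}$ is the maximum modulus of its spectrum,
\[ \rho(A) := \max \{ |\lambda| \mid \det(\lambda I - A) = 0 \}. \]

Lemma: Given $A \in \mathbb{R}^{n \times n}$, if $\rho(A) < 1$, then $(I - A)^{-1}$ exists and is given by the geometric series
\[ (I - A)^{-1} = I + A + A^2 + A^3 + \cdots. \]
In addition, we have
\[ \frac{1}{1 + \rho(A)} \le \rho((I - A)^{-1}) \le \frac{1}{1 - \rho(A)}. \]

Derivatives of $A^{-1}$: $d(X^{-1}) = -X^{-1}(dX)X^{-1}$

The general linear group of degree $n$ over $\mathbb{R}$, $GL_n(\mathbb{R})$, is the set of real nonsingular $n \times n$ matrices.

Define $\Phi : GL_n(\mathbb{R}) \to GL_n(\mathbb{R})$ by $\Phi(A) := A^{-1}$. Let $A \in GL_n(\mathbb{R})$ and $\Delta A \in \mathbb{R}^{n \times n}$ be such that $\rho(A^{-1}\Delta A) < 1$. Then
\[ \begin{aligned} (A + \Delta A)^{-1} &= (A(I + A^{-1}\Delta A))^{-1} \\ &= (I + A^{-1}\Delta A)^{-1} A^{-1} \\ &= \left( I - A^{-1}\Delta A + (A^{-1}\Delta A)^2 + o(\|\Delta A\|^2) \right) A^{-1} \quad \text{(Banach Lemma)} \\ &= A^{-1} - A^{-1}\Delta A\, A^{-1} + A^{-1}\Delta A\, A^{-1}\Delta A\, A^{-1} + o(\|\Delta A\|^2). \end{aligned} \]
So
\[ \Phi'(A)(D) = -A^{-1} D A^{-1} \quad\text{and}\quad \Phi''(A)(D, D) = 2 A^{-1} D A^{-1} D A^{-1}, \]
and
\[ \operatorname{vec}(\Phi'(A)(D)) = -\operatorname{vec}(A^{-1} D A^{-1}) = -(A^{-T} \otimes A^{-1}) \operatorname{vec}(D) \implies \nabla \Phi(A) = -A^{-T} \otimes A^{-1}. \]
This procedure shows that $\Phi$ is $C^\infty$, and all of its derivatives are easily computed from the Banach Lemma.

Chain Rule Example

Define
\[ \psi(V) := \begin{cases} \ln \det(X^T V^{-1} X), & V \in \mathbb{S}^n_{++}, \\ +\infty, & \text{otherwise.} \end{cases} \]
Then, for $V \in \mathbb{S}^n_{++}$,
\[ \begin{aligned} \psi'(V)(D) &= \left\langle \nabla(\ln \det(\cdot))(X^T V^{-1} X),\ (X^T (\cdot)^{-1} X)'(V)(D) \right\rangle \\ &= \left\langle (X^T V^{-1} X)^{-1},\ -X^T V^{-1} D V^{-1} X \right\rangle \\ &= -\operatorname{tr}\left( (X^T V^{-1} X)^{-1} X^T V^{-1} D V^{-1} X \right) \\ &= -\operatorname{tr}\left( V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1} D \right) \\ &= \left\langle -V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1},\ D \right\rangle, \end{aligned} \]
so that
\[ \nabla \psi(V) = -V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1} \in \mathbb{S}^n. \]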
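The derivative of the inverse and the gradient from the chain rule example can both be checked against central finite differences. What follows is a minimal NumPy sketch under stated assumptions: the names `vec`, `psi`, and `grad_psi` are chosen here for illustration (`psi` stands for the log-determinant function of the chain rule example), and the dimensions, random test data, and step size `h` are arbitrary. The direction `S` is taken symmetric so that `V + h * S` stays in the symmetric positive definite cone for small `h`.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
n, k = 4, 2
h = 1e-6  # finite-difference step (illustrative choice)

# --- Derivative of the inverse: Phi'(A)(D) = -A^{-1} D A^{-1} ---
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # small perturbation of I
D = rng.standard_normal((n, n))
A_inv = np.linalg.inv(A)

dPhi = -A_inv @ D @ A_inv
dPhi_kron = -np.kron(A_inv.T, A_inv) @ vec(D)      # same map in vec coordinates
fd_inv = (np.linalg.inv(A + h * D) - np.linalg.inv(A - h * D)) / (2 * h)

# --- Chain rule example: psi(V) = ln det(X^T V^{-1} X) for V positive definite ---
def psi(V, X):
    _, logdet = np.linalg.slogdet(X.T @ np.linalg.solve(V, X))
    return logdet

def grad_psi(V, X):
    W = np.linalg.solve(V, X)                      # W = V^{-1} X
    # -V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1}, using V = V^T so X^T V^{-1} = W^T
    return -W @ np.linalg.solve(X.T @ W, W.T)

X = rng.standard_normal((n, k))
G = rng.standard_normal((n, n))
V = G @ G.T + n * np.eye(n)                        # symmetric positive definite
S = rng.standard_normal((n, n))
S = S + S.T                                        # symmetric direction

grad_V = grad_psi(V, X)
fd_psi = (psi(V + h * S, X) - psi(V - h * S, X)) / (2 * h)
directional_psi = np.sum(grad_V * S)               # <grad psi(V), S>_F
```

Here `fd_inv` should match `dPhi`, `vec(dPhi)` should match `dPhi_kron`, and `fd_psi` should match `directional_psi`, each up to finite-difference and rounding error. Using `solve` and `slogdet` rather than forming explicit inverses and determinants is a numerical-stability choice of this sketch, not something prescribed by the notes.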
