Chapter 2 Matrix Calculus and Notation

Chapter 2 Matrix Calculus and Notation 2.1 Introduction: Can It Possibly Be That Simple? In October of 2005, I scribbled in a notebook, “can it possibly be that simple?” I was referring to the sensitivity of transient dynamics (the eventual results appear in Chap. 7), and had just begun to use matrix calculus as a tool. The answer to my question was yes. It can be that simple. This book relies on this set of mathematical techniques. This chapter introduces the basics, which will be used throughout the text. For more information, I recommend four sources in particular. The most complete treatment, but not the easiest starting point, is the book by Magnus and Neudecker (1988). More accessible introductions can be found in the paper by Magnus and Neudecker (1985) and especially the text by Abadir and Magnus (2005). A review paper by Nel (1980) is helpful in placing the Magnus-Neudecker formulation in the context of other attempts at a calculus of matrices. Sensitivity analysis asks how much change in an outcome variable y is caused by a change in some parameter x. At its most basic level, and with some reasonable assumptions about the continuity and differentiability of the functional relationships involved, the solution is given by differential calculus. If y is a function of x, then the derivative dy dx tells how y responds to a change in x, i.e., the sensitivity of y to a change in x. However, the outcomes of a demographic calculation may be scalar-valued (e.g., the population growth rate λ), vector-valued (e.g., the stable stage distribution), or matrix-valued (e.g., the fundamental matrix). Any of these outcomes may be © The Author(s) 2019 13 H. Caswell, Sensitivity Analysis: Matrix Methods in Demography and Ecology, Demographic Research Monographs, https://doi.org/10.1007/978-3-030-10534-1_2 14 2 Matrix Calculus and Notation functions of scalar-valued parameters (e.g., the Gompertz aging rate), vector-valued parameters (e.g., the mortality schedule), or matrix-valued parameters (e.g., the transition matrix) parameters. Thus, sensitivity analysis in demography requires more than the simple derivative in (2.1). We want a consistent and flexible approach to differentiating ⎧ ⎫ ⎧ ⎫ ⎨ scalar-valued ⎬ ⎨ scalar ⎬ ⎩ vector-valued ⎭ functions of ⎩ vector ⎭ arguments matrix-valued matrix 2.2 Notation and Matrix Operations 2.2.1 Notation Matrices are denoted by upper case bold symbols (e.g., A), vectors (usually) by lower case bold symbols (n). The (i, j) entry of the matrix A is aij , and the ith entry of the vector n is ni. Sometimes we will use MATLAB notation, and write X(i, :) = row i of X (2.1) X(:,j) = column j of X (2.2) The notation x(ij) denotes a matrix whose (i, j) entry is x. For example, dyi dxj is the matrix whose (i, j) entry is the derivative of yi with respect to xj . The transpose of X is XT. Logarithms are natural logarithms. The vector norm x is, unless noted otherwise, the 1-norm. The symbol D (x) denotes the square matrix with x on the diagonal and zeros elsewhere. The symbol 1 denotes a vector of ones. The vector ei is a unit vector with 1 in the ith entry and zeros elsewhere. The identity matrix is I. Where necessary for clarity, the dimension of matrices or vectors will be indicated by a subscript. Thus Is is a s × s identity matrix, 1s is an s × 1 vector of ones, and Xm×n is a m × n matrix. In some places (Chaps. 6 and 10) block-structured matrices appear; these are denoted by either A or A˜ , depending on the context and the role of the matrix. 2.2 Notation and Matrix Operations 15 2.2.2 Operations In addition to the familiar matrix product AB, we will also use the Hadamard, or elementwise product A ◦ B = aij bij (2.3) and the Kronecker product A ⊗ B = aij B (2.4) The Hadamard product requires that A and B be the same size. The Kronecker product is defined for any sizes of A and B. Some useful properties of the Kronecker product include − − − (A ⊗ B) 1 = A 1 ⊗ B 1 (2.5) (A ⊗ B)T = AT ⊗ BT (2.6) A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C) (2.7) and, provided that the matrices are of the right size for the products to be defined, (A1 ⊗ B1)(A2 ⊗ B2) = (A1A2 ⊗ B1B2) . (2.8) 2.2.3 The Vec Operator and Vec-Permutation Matrix The vec operator transforms a m × n matrix A into a mn × 1 vector by stacking the columns one above the next, ⎛ ⎞ A(:, 1) ⎜ ⎟ = . vec A ⎝ . ⎠ (2.9) A(:,n) For example, ⎛ ⎞ a ⎜ ⎟ ab ⎜ c ⎟ vec = ⎝ ⎠ . (2.10) cd b d 16 2 Matrix Calculus and Notation The vec of A and the vec of AT are rearrangements of the same entries; they are related by T vec A = Km,nvec A (2.11) where A is m × n and Km,n is the vec-permutation matrix (Henderson and Searle 1981)orcommutation matrix (Magnus and Neudecker 1979). The vec-permutation matrix can be calculated as m n = ⊗ T Km,n Eij Eij (2.12) i=1 j=1 where Eij is a matrix, of dimension m × n, with a 1 in the (i, j) entry and zeros elsewhere. Like any permutation matrix, K−1 = KT. The vec operator and the vec-permutation matrix are particularly important in multistate models (e.g., age×stage-classified models), where they are used in both the formulation and analysis of the models (e.g., Caswell 2012, 2014; Caswell and Salguero-Gómez 2013; Caswell et al. 2018); see also Chap. 6. Extensions to an arbitrary number of dimensions, so-called hyperstate models, have been presented by Roth and Caswell (2016). 2.2.4 Roth’s Theorem The vec operator and the Kronecker product are connected by a theorem due to Roth (1934): vec (ABC) = CT ⊗ A vec B. (2.13) We will often want to obtain the vec of a matrix that appears in the middle of a product; we will use Roth’s theorem repeatedly. 2.3 Defining Matrix Derivatives The derivative of a scalar y with respect to a scalar x is familiar. What, however, does it mean to speak of the derivative of a scalar with respect to a vector, or of a vector with respect to another vector, or any other combination? These can be defined in more than one way and the choice is critical (Nel 1980; Magnus and Neudecker 1985). This book relies on the notation due to Magnus and Neudecker, because it makes certain operations possible and consistent. •Ifx and y are scalars, the derivative of y with respect to x is the familiar derivative dy/dx. 2.4 The Chain Rule 17 •Ify is a n × 1 vector and x a scalar, the derivative of y with respect to x is the n × 1 column vector ⎛ ⎞ dy1 ⎜ ⎟ ⎜ dx ⎟ dy ⎜ . ⎟ = ⎜ . ⎟ . (2.14) dx ⎝ . ⎠ dyn dx •Ify is a scalar and x a m × 1 vector, the derivative of y with respect to x is the 1 × m row vector (called the gradient vector) dy ∂y ∂y = ··· . (2.15) T dx ∂x1 ∂xm Note the orientation of dy/dx as a column vector and dy/dxT as a row vector. •Ify is a n × 1 vector and x a m × 1 vector, the derivative of y with respect to x is defined to be the n × m matrix whose (i, j) entry is the derivative of yi with respect to xj , i.e., dy dyi = (2.16) dxT dxj (this matrix is called the Jacobian matrix). • Derivatives involving matrices are written by first transforming the matrices into vectors using the vec operator, and then applying the rules for vector differentiation to the resulting vectors. Thus, the derivative of the m × n matrix Y with respect to the p × q matrix X is the mn × pq matrix dvec Y . (2.17) d (vec X)T From now on, I will write vec TX for (vec X)T. 2.4 The Chain Rule The chain rule for differentiation is your friend. The Magnus-Neudecker notation, unlike some alternatives, extends the familiar scalar chain rule to derivatives of vectors and matrices (Nel 1980; Magnus and Neudecker 1985). If u (size m × 1) is a function of v (size n × 1) and v is a function of x (size p × 1), then du du dv = (2.18) T T T dx dv dx m×p m×n n×p 18 2 Matrix Calculus and Notation Notice that the dimensions are correct, and that the order of the multiplication matters. Checking dimensional consistency in this way is a useful way to find errors. 2.5 Derivatives from Differentials The key to the matrix calculus of Magnus and Neudecker (1988) is the relationship between the differential and the derivative of a function. Experience suggests that, for many readers of this book, this relationship is shrouded in the mists of long-ago calculus classes. 2.5.1 Differentials of Scalar Function Start with scalars. Suppose that y = f(x) is a differentiable function at x = x0. Then the derivative of y with respect to x at the point x0 is defined as f(x0 + h) − f(x0) f (x0) = lim . (2.19) h→0 h Now define the differential of y. This is customarily denoted dy, but for the moment, I will denote it by cy. The differential of y at x0 is a function of h, defined by cy(x0,h)= f (x0)h.

Load more