R Functions to Symbolically Compute the Central and Non-Central Moments of the Multivariate Normal Distribution
JSS Journal of Statistical Software, MMMMMM YYYY, Volume VV, Code Snippet II. doi:10.18637/jss.v000.i00

Kem Phillips

Abstract

The central moments of the multivariate normal distribution are functions of its n x n variance-covariance matrix \Sigma. These moments can be expressed symbolically as linear combinations of products of powers of the elements of \Sigma. A formula for these moments, derived by differentiating the characteristic function, is developed. The formula requires searching integer matrices for those whose n successive row and column sums equal the n exponents of the moment. The formula is implemented in R; functions to display moments in LaTeX and to evaluate them at specified variance-covariance matrices are included.

This vignette also describes how the symmoments package has been augmented to calculate symbolic representations of non-central multivariate moments, and to calculate the expected value of such moments as well as the expected value of multivariate polynomials. In addition, it is shown that first non-central multivariate moments are in one-to-one correspondence with phylogenetic trees. Functions making this correspondence, as well as a function for the correspondence with matchings, are provided.

Keywords: non-central moments, multivariate normal distribution, symbolic computation, multivariate polynomials, phylogenetic trees, R, LaTeX.

1. Introduction

The central moments of an n-dimensional random vector X are defined as

    m_{k_1,\ldots,k_n} = E[(X_1 - \mu_1)^{k_1} (X_2 - \mu_2)^{k_2} \cdots (X_n - \mu_n)^{k_n}],    (1)

where E[\cdot] denotes expected value. Suppose that X is distributed according to the multivariate normal distribution with mean \mu and variance-covariance matrix

    \Sigma = (\sigma_{ij}),    (2)

where the variance terms are \sigma_{ii}, i = 1, \ldots, n, the covariance terms are \sigma_{ij}, i \neq j, and by symmetry \sigma_{ij} = \sigma_{ji}. For the multivariate normal distribution the central moments are not functions of the mean vector \mu; they depend only on the variance-covariance terms \sigma_{ij}.

Simple cases are familiar. Setting \mu to 0,

    m_2 = E[X_1^2] = \sigma_{1,1}    (3)

    m_{1,1} = E[X_1 X_2] = \sigma_{1,2}    (4)

Slightly more complicated cases can be computed directly, or by manipulating simple expressions obtained for moments of the form E[X_1 \cdots X_n] (Wikipedia contributors 2009). For example,

    m_{2,2} = E[X_1^2 X_2^2] = 2\sigma_{1,2}^2 + \sigma_{1,1}\sigma_{2,2}    (5)

    m_{2,1,1} = E[X_1^2 X_2^1 X_3^1] = 2\sigma_{1,2}\sigma_{1,3} + \sigma_{1,1}\sigma_{2,3}    (6)

    m_{3,1} = E[X_1^3 X_2^1] = 3\sigma_{1,1}\sigma_{1,2}    (7)

Although some higher-order moments follow known patterns, most are much harder to determine by simple calculations.
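As a quick numerical illustration of these closed forms, Equation 5 can be checked by simulation. The short R snippet below is only a sanity-check sketch (it assumes the MASS package for mvrnorm(), and the particular \Sigma is arbitrary); the symmoments package described later derives such expressions symbolically, not by simulation.

## Check E[X1^2 X2^2] = 2*sigma12^2 + sigma11*sigma22 (Equation 5) by simulation
library(MASS)                              # for mvrnorm()
set.seed(1)
Sigma <- matrix(c(1.0, 0.6,
                  0.6, 2.0), nrow = 2, byrow = TRUE)
X <- mvrnorm(n = 1e6, mu = c(0, 0), Sigma = Sigma)
c(simulated = mean(X[, 1]^2 * X[, 2]^2),
  formula   = 2 * Sigma[1, 2]^2 + Sigma[1, 1] * Sigma[2, 2])

The two values should agree to within Monte Carlo error.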
We wish to compute the symbolic expression of any moment m_{k_1,\ldots,k_n} in terms of the n(n+1)/2 symbols \sigma_{ij}. Note that if \sum_{i=1}^{n} k_i is an odd integer, the moment is 0, so only moments for which this sum is even need be considered.

The multivariate normal distribution is fundamental to mathematical statistics, and its moments play a central role in statistical methodology. Various methods have been developed to compute them numerically (Muirhead 1982, p. 46; Anderson 1971, p. 49). Kan (2008) developed a formula (his Proposition 1) for the central moments as a repeated sum. He gives an excellent review of other formulas that have been developed, and cites Isserlis (1918) as deriving the first expression for the central moments. Muirhead (1982, p. 49) used the matrix derivatives of the multivariate normal distribution's characteristic function to derive a formula for multivariate cumulants. Tracy and Sultan (1993) also used matrix derivatives to derive an expression for the distribution's moments (their Theorem 2), based on a recurrence relationship of the derivatives.

This article develops a new explicit formula for the moments, starting from the derivatives of the characteristic function. The expression for the moments is based on a search algorithm over certain integer matrices. The formula has been translated into R functions that produce symbolic representations of moments in terms of the variance-covariance terms \sigma_{ij}. In addition, a formula for non-central moments is developed from the central moments. This formula is programmed, and is extended to the expected values of multivariate polynomials. Finally, it is shown that first central moments are equivalent to phylogenetic trees, and R functions making this correspondence are described.

The functions described here are based on Phillips (2010) and are available in the package symmoments, implemented in the R system for statistical computing (R Development Core Team 2008). Both R itself and the symmoments package (as well as all other packages used in this paper) are available under the terms of the General Public License (GPL) from the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/).

2. Development of the formula for central moments

The moments of any distribution can be represented by the derivatives of the distribution's characteristic function. The characteristic function of the multivariate normal distribution is (Muirhead 1982, pp. 5, 49)

    E[e^{i t^\top X}] = e^{i t^\top \mu - \frac{1}{2} t^\top \Sigma t},    (8)

where t = (t_1, t_2, \ldots, t_n). Within a constant, the moment is the (k_1, \ldots, k_n)-order derivative of the characteristic function evaluated at t = 0:

    m_{k_1,\ldots,k_n} = i^{-\sum_{i=1}^{n} k_i} \, \frac{d^{\sum_{i=1}^{n} k_i}}{d^{k_1}t_1 \, d^{k_2}t_2 \cdots d^{k_n}t_n} \, E[e^{i t^\top X}] \Big|_{t=0},    (9)

where i is the imaginary unit. Expanding the exponential into an infinite sum, this is

    m_{k_1,\ldots,k_n} = i^{-\sum_{i=1}^{n} k_i} \, \frac{d^{\sum_{i=1}^{n} k_i}}{d^{k_1}t_1 \, d^{k_2}t_2 \cdots d^{k_n}t_n} \sum_{\ell=0}^{\infty} \big( i t^\top \mu - \tfrac{1}{2} (t^\top \Sigma t) \big)^{\ell} / \ell! \, \Big|_{t=0}.    (10)

Since we are computing central moments, we set \mu = 0, so the term i t^\top \mu does not appear:

    m_{k_1,\ldots,k_n} = i^{-\sum_{i=1}^{n} k_i} \, \frac{d^{\sum_{i=1}^{n} k_i}}{d^{k_1}t_1 \, d^{k_2}t_2 \cdots d^{k_n}t_n} \sum_{\ell=0}^{\infty} \big( -\tfrac{1}{2} (t^\top \Sigma t) \big)^{\ell} / \ell! \, \Big|_{t=0}.    (11)

Since \sum_{i=1}^{n} k_i is even, the term i^{-\sum_{i=1}^{n} k_i} = (-1)^{-\sum_{i=1}^{n} k_i / 2}. This term ultimately cancels with the negative sign in the infinite sum, and it will be omitted for convenience of notation. The expression in t is

    t^\top \Sigma t = \sum_{ij} \sigma_{ij} t_i t_j.    (12)

We need to find the coefficient of t_1^{\ell_1} t_2^{\ell_2} \cdots t_n^{\ell_n} in

    (t^\top \Sigma t)^{\ell} = \Big( \sum_{ij} \sigma_{ij} t_i t_j \Big)^{\ell} = \sum_{ij} \sigma_{ij} t_i t_j \, \cdots \, \sum_{ij} \sigma_{ij} t_i t_j.    (13)

All products of elements of the sum will occur. Any product is obtained by choosing \sigma_{ij} t_i t_j a certain number of times, say \ell_{ij}. Since one term is chosen from each of the \ell factors, \sum_{ij} \ell_{ij} = \ell. Further, for any such matrix (\ell_{ij}) there will be such a term, since it can be constructed by choosing \sigma_{ij} t_i t_j from the first \ell_{ij} factors, and so forth for each (ij) until \ell is exhausted. For any (\ell_{ij}) there are

    \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}}    (14)

ways to choose the terms, where this is the multinomial coefficient.
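As a brief aside, the multinomial coefficient in Equation 14 is straightforward to compute in R. The helper below is purely illustrative (the name multinomial_coef is an assumption, not part of the symmoments interface) and works on the log scale to avoid overflow for higher-order moments.

## l! / (l_11! * l_12! * ... * l_nn!) for a vector of non-negative counts
multinomial_coef <- function(l_ij) {
  exp(lgamma(sum(l_ij) + 1) - sum(lgamma(l_ij + 1)))
}
multinomial_coef(c(2, 0, 0, 0))   # one way to pick the same factor twice
multinomial_coef(c(1, 1, 0, 0))   # two ways to pick two different factors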
So,

    \Big( \sum_{ij} \sigma_{ij} t_i t_j \Big)^{\ell} = \sum_{\{(\ell_{ij}) \,:\, \sum_{ij} \ell_{ij} = \ell\}} \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}} \prod_{ij} (\sigma_{ij} t_i t_j)^{\ell_{ij}}    (15)

    = \sum_{\{(\ell_{ij}) \,:\, \sum_{ij} \ell_{ij} = \ell\}} \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}} \prod_{ij} \sigma_{ij}^{\ell_{ij}} \prod_{ij} (t_i t_j)^{\ell_{ij}}    (16)

For the moment, we distinguish between \sigma_{ij} and \sigma_{ji} as symbols. As a result, each \prod_{ij} \sigma_{ij}^{\ell_{ij}} is unique, as determined by a unique (\ell_{ij}). However, t_i t_j = t_j t_i, and since each \sigma_{ij} is combined with two t's, the total exponent in t is 2\ell. That is, a term t_1^{\ell_1} t_2^{\ell_2} \cdots t_n^{\ell_n} must have \sum_{i=1}^{n} \ell_i = 2\ell. We need to determine the terms for which, for any (\ell_1, \ldots, \ell_n),

    \prod_{ij} (t_i t_j)^{\ell_{ij}} = t_1^{\ell_1} \cdots t_n^{\ell_n}.    (17)

We will get t_k in the product in the following mutually exclusive cases:

    Condition         Exponent of t_k
    i = k, j != k     1
    i != k, j = k     1                  (18)
    i = j = k         2

So the exponent of t_k will be

    \sum_{i=k,\, j \neq k} \ell_{ij} + \sum_{i \neq k,\, j=k} \ell_{ij} + 2 \ell_{kk}.    (19)

This sum is obtained by adding the sum of \ell_{ij} across row k to the sum down column k, since the diagonal element k occurs in both sums. That is, we get

    \sum_{i} \ell_{ik} + \sum_{j} \ell_{kj} = \sum_{i} (\ell_{ik} + \ell_{ki}) = \ell_k.    (20)

We can now partition the set of (\ell_{ij}) in Equation 16 according to these sums, that is, \{\ell_k, k = 1, \ldots, n\}. As stated before, the sum of the exponents \ell_k must be 2\ell:

    \Big( \sum_{ij} \sigma_{ij} t_i t_j \Big)^{\ell} = \sum_{\{(\ell_1,\ldots,\ell_n) \,:\, \sum_{k} \ell_k = 2\ell\}} \; \sum_{\{(\ell_{11},\ldots,\ell_{nn}) \,:\, \sum_{i} (\ell_{ik} + \ell_{ki}) = \ell_k,\, k=1,\ldots,n\}} \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}} \Big( \prod_{ij} \sigma_{ij}^{\ell_{ij}} \Big) \prod_{i=1}^{n} t_i^{\ell_i}.    (21)

Since differentiation is distributive with respect to addition and multiplication by constants, the derivative of the product of t's can be determined from the derivatives of the individual terms:

    \frac{d^{\sum_{i=1}^{n} k_i}}{d^{k_1}t_1 \, d^{k_2}t_2 \cdots d^{k_n}t_n} \prod_{i=1}^{n} t_i^{\ell_i} = \prod_{i=1}^{n} \frac{d^{k_i}}{d t_i^{k_i}} t_i^{\ell_i}    (22)

    = \prod_{i=1}^{n} I\{k_i \le \ell_i\} \frac{\ell_i!}{(\ell_i - k_i)!} t_i^{\ell_i - k_i}.    (23)

Thus,

    \frac{d^{\sum_{i=1}^{n} k_i}}{d^{k_1}t_1 \, d^{k_2}t_2 \cdots d^{k_n}t_n} \Big( \sum_{ij} \sigma_{ij} t_i t_j \Big)^{\ell} = \sum_{\{(\ell_1,\ldots,\ell_n) \,:\, \sum_{i} \ell_i = 2\ell\}} \; \sum_{\{(\ell_{11},\ell_{12},\ldots,\ell_{nn}) \,:\, \sum_{i} (\ell_{ih} + \ell_{hi}) = \ell_h,\, h=1,\ldots,n\}} \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}} \Big( \prod_{ij} \sigma_{ij}^{\ell_{ij}} \Big) \prod_{i=1}^{n} I\{k_i \le \ell_i\} \frac{\ell_i!}{(\ell_i - k_i)!} t_i^{\ell_i - k_i}.    (24)

Incorporating the constants from Equation 11, and noting again that the negative signs cancel, the full sum is

    \sum_{\ell=0}^{\infty} \Big( \tfrac{1}{2} \Big)^{\ell} / \ell! \sum_{\{(\ell_1,\ldots,\ell_n) \,:\, \sum_{i} \ell_i = 2\ell\}} \; \sum_{\{(\ell_{11},\ell_{12},\ldots,\ell_{nn}) \,:\, \sum_{j} (\ell_{hj} + \ell_{jh}) = \ell_h,\, h=1,\ldots,n\}} \binom{\ell}{\ell_{11} \, \ldots \, \ell_{nn}} \Big( \prod_{ij} \sigma_{ij}^{\ell_{ij}} \Big) \prod_{i=1}^{n} I\{k_i \le \ell_i\} \frac{\ell_i!}{(\ell_i - k_i)!} t_i^{\ell_i - k_i}.    (25)

Setting t = 0, only terms with \ell_i = k_i for all i will remain.
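To make the search step concrete, the sketch below enumerates, by brute force, the non-negative integer matrices (\ell_{ij}) that survive when t = 0: those whose entries total \ell = (k_1 + \cdots + k_n)/2 and whose i-th row sum plus i-th column sum equals k_i for every i. The helper name enumerate_L and its implementation are illustrative assumptions only; the symmoments package performs this search far more efficiently, and the grid used here is feasible only for very small moments.

## Enumerate n x n non-negative integer matrices (l_ij) with sum(l_ij) = l
## and with row-i sum plus column-i sum equal to k_i for every i.
enumerate_L <- function(k) {
  n <- length(k)
  l <- sum(k) / 2
  stopifnot(l == round(l))                    # guard: odd-order moments are zero
  grid <- expand.grid(rep(list(0:l), n * n))  # each entry can be at most l
  keep <- apply(grid, 1, function(v) {
    L <- matrix(v, n, n)
    sum(L) == l && all(rowSums(L) + colSums(L) == k)
  })
  lapply(which(keep), function(i) matrix(unlist(grid[i, ]), n, n))
}

## The four matrices contributing to m_{2,2} (cf. Equation 5)
enumerate_L(c(2, 2))

For k = (2, 2) this returns one matrix with \ell_{11} = \ell_{22} = 1 and three with \ell_{12} + \ell_{21} = 2, mirroring the \sigma_{1,1}\sigma_{2,2} and 2\sigma_{1,2}^2 terms of Equation 5.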