2. Wishart Distributions and Inverse-Wishart Sampling


Stanley Sawyer | Washington University | Version of April 30, 2007

1. Introduction. The Wishart distribution $W(\Sigma; d, n)$ is a probability distribution of random nonnegative-definite $d \times d$ matrices that is used to model random covariance matrices. The parameter $n$ is the number of degrees of freedom, and $\Sigma$ is a nonnegative-definite symmetric $d \times d$ matrix called the scale matrix. By definition,
$$ W \sim W(\Sigma; d, n) \sim \sum_{i=1}^n X_i X_i', \qquad X_i \sim N(0, \Sigma) \eqno(1.1) $$
so that $W \sim W(\Sigma; d, n)$ is the distribution of a sum of $n$ rank-one matrices defined by independent normal vectors $X_i \in \mathbf{R}^d$ with $E(X_i) = 0$ and $\mathrm{Cov}(X_i) = \Sigma$. In particular,
$$ E(W) = n\,E(X_i X_i') = n\,\mathrm{Cov}(X_i) = n\Sigma $$
(See multivar.tex for more details.)

In general, any $X \sim N(\mu, \Sigma)$ can be represented as $X = \mu + AZ$ with $Z \sim N(0, I_d)$, so that
$$ \Sigma = \mathrm{Cov}(X) = A\,\mathrm{Cov}(Z)\,A' = AA' \eqno(1.2) $$
The easiest way to find $A$ in terms of $\Sigma$ is the Cholesky decomposition, which finds the unique lower triangular matrix $A$ with $A_{ii} \ge 0$ such that $AA' = \Sigma$. Then by (1.1) and (1.2) with $\mu = 0$,
$$ W(\Sigma; d, n) \sim \sum_{i=1}^n (AZ_i)(AZ_i)' \sim A \Big( \sum_{i=1}^n Z_i Z_i' \Big) A' \sim A\,W(d, n)\,A', \qquad Z_i \sim N(0, I_d) \eqno(1.3) $$
where $W(d, n) = W(I_d; d, n)$. In particular, $W(\Sigma; d, n)$ can easily be represented in terms of $W(d, n) = W(I_d; d, n)$.

Assume in the following that $n > d$ and that $\Sigma$ is invertible. Then the density of the random $d \times d$ matrix $W$ in (1.1) can be written
$$ f(w; n, \Sigma) = \frac{|w|^{(n-d-1)/2} \exp\big(-(1/2)\,\mathrm{tr}(w\Sigma^{-1})\big)}{2^{dn/2}\,\pi^{d(d-1)/4}\,|\Sigma|^{n/2} \prod_{i=1}^d \Gamma\big((n+1-i)/2\big)} \eqno(1.4) $$
where $|w| = \det(w)$, $|\Sigma| = \det(\Sigma)$, and $f(w; n, \Sigma) = 0$ unless $w$ is symmetric and positive definite (Anderson 2003, Section 7.2, page 252).

2. The Inverse-Wishart Conjugate Prior. An important use of the Wishart distribution is as a conjugate prior for multivariate normal sampling. This leads to a $d$-dimensional analog of the inverse-gamma-normal conjugate prior for normal sampling in one dimension.
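The definition (1.1) together with the representation (1.3) gives a direct, if slow, sampler: draw $n$ vectors $X_i = AZ_i$ and sum their outer products. A minimal numpy sketch (the function name is ours, not from the notes):

```python
import numpy as np

def wishart_basic(Sigma, n, rng=None):
    """Draw W ~ W(Sigma; d, n) straight from definition (1.1):
    the sum of n outer products X_i X_i' with X_i ~ N(0, Sigma)."""
    rng = np.random.default_rng(rng)
    d = Sigma.shape[0]
    A = np.linalg.cholesky(Sigma)       # Sigma = A A', A lower triangular, cf. (1.2)
    Z = rng.standard_normal((n, d))     # rows are Z_i ~ N(0, I_d)
    X = Z @ A.T                         # rows are X_i = A Z_i ~ N(0, Sigma), cf. (1.3)
    return X.T @ X                      # sum over i of X_i X_i'
```

Averaging many draws recovers $E(W) = n\Sigma$, which is a convenient sanity check on any Wishart sampler.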
The likelihood function of $n$ independent observations $X_i \sim N(\mu, \Sigma)$ for a $d \times d$ positive definite matrix $\Sigma$ is
$$ L(\mu, \Sigma; X) = \prod_{i=1}^n \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\big(-(1/2)(X_i - \mu)'\Sigma^{-1}(X_i - \mu)\big) = \frac{1}{(2\pi)^{nd/2}\,|\Sigma|^{n/2}} \exp\Big(-(1/2)\sum_{i=1}^n (X_i - \mu)'\Sigma^{-1}(X_i - \mu)\Big) \eqno(2.1) $$
The sum in (2.1) can be written
$$ \sum_{i=1}^n \sum_{a=1}^d \sum_{b=1}^d (X_{ia} - \mu_a)(\Sigma^{-1})_{ab}(X_{ib} - \mu_b) = \sum_{a=1}^d \sum_{b=1}^d (\Sigma^{-1})_{ab} \sum_{i=1}^n (X_{ia} - \mu_a)(X_{ib} - \mu_b) = \sum_{a=1}^d \sum_{b=1}^d (\Sigma^{-1})_{ab}\, Q(\mu)_{ab} = \mathrm{tr}\big(\Sigma^{-1} Q(\mu)\big) \eqno(2.2) $$
where
$$ Q(\mu) = \sum_{i=1}^n (X_i - \mu)(X_i - \mu)' = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})' + n(\bar{X} - \mu)(\bar{X} - \mu)' = Q_0 + n\nu\nu', \qquad \nu = \bar{X} - \mu \eqno(2.3) $$
Substituting (2.3) into (2.2) and (2.1) leads to the expression
$$ L(\mu, \Sigma; X) = \frac{\exp\big(-(1/2)\,\mathrm{tr}(Q_0\Sigma^{-1})\big)\,\exp\big(-(1/2)\,n\,\nu'\Sigma^{-1}\nu\big)}{(2\pi)^{nd/2}\,|\Sigma|^{n/2}} = C_{nd}\,|\Sigma^{-1}|^{(n-1)/2} \exp\big(-(1/2)\,\mathrm{tr}(Q_0\Sigma^{-1})\big) \times \frac{1}{\sqrt{2\pi|\Sigma|}} \exp\big(-(1/2)\,n\,(\mu - \bar{X})'\Sigma^{-1}(\mu - \bar{X})\big) \eqno(2.4) $$
where
$$ Q_0 = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})' \eqno(2.5) $$
Note that the integral $\int L(\mu, \Sigma; X)\,d\mu$ of (2.4) is the same as the Wishart density (1.4) with $\Sigma^{-1}$ replaced by $w$ and with $Q_0$ in (2.4) replaced by $\Sigma^{-1}$ in (1.4), so that $Q_0\Sigma^{-1}$ is replaced by $w\Sigma^{-1}$, and with $n$ replaced by $n - d$, up to multiplicative constants that depend only on $n$, $d$, and $X$.

The similarity in form between (1.4) and the first factor in (2.4) suggests that we might be able to sample from the density $L(\mu, \Sigma; X)$ in (2.4) by generating random variables by
$$ \text{(i)}\quad W \sim W(Q_0^{-1}; d, n) \qquad \text{(ii)}\quad \Sigma = W^{-1} \qquad \text{(iii)}\quad \mu = \bar{X} + (A/\sqrt{n})\,Z, \quad \Sigma = AA', \quad Z \sim N(0, I_d) \eqno(2.6) $$
One subtlety is that the density of $\Sigma$ in (2.6) will not be $f(\Sigma^{-1}; n, Q_0^{-1})$ for $f$ in (1.4), or at least will not be this density with respect to Lebesgue measure $d\Sigma$ in $\mathbf{R}^{d^2}$. In general, for any function $\phi(y) \ge 0$,
$$ E\big(\phi(\Sigma)\big) = E\big(\phi(W^{-1})\big) = \int \phi(y^{-1})\,f(y; n, Q_0^{-1})\,dy = \int \phi(y)\,f(y^{-1}; n, Q_0^{-1})\,d(y^{-1}) = \int \phi(y)\,f(y^{-1}; n, Q_0^{-1})\,J_y(y^{-1})\,dy $$
where $J_y(y^{-1})$ is the absolute value of the Jacobian determinant of the mapping $y \to y^{-1}$.
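The three-step scheme (2.6) is straightforward to sketch in code. The sketch below uses scipy's Wishart sampler for step (i) (scipy parameterizes $W(\text{scale}; d, \text{df})$ with $E(W) = \text{df}\cdot\text{scale}$, matching the convention here); the function name and arrangement are ours, and the density subtlety discussed in the text applies to what this draw actually represents:

```python
import numpy as np
from scipy.stats import wishart

def posterior_draw(X, rng=None):
    """One draw of (mu, Sigma) following scheme (2.6):
    (i)  W ~ W(Q0^{-1}; d, n), (ii) Sigma = W^{-1},
    (iii) mu = Xbar + (A/sqrt(n)) Z with Sigma = A A'."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    Xbar = X.mean(axis=0)
    Q0 = (X - Xbar).T @ (X - Xbar)                    # scatter matrix, cf. (2.5)
    W = wishart.rvs(df=n, scale=np.linalg.inv(Q0),    # step (i)
                    random_state=rng)
    Sigma = np.linalg.inv(W)                          # step (ii)
    A = np.linalg.cholesky(Sigma)                     # Sigma = A A'
    mu = Xbar + (A @ rng.standard_normal(d)) / np.sqrt(n)  # step (iii)
    return mu, Sigma
```

Here step (iii) is equivalent to drawing $\mu \mid \Sigma \sim N(\bar{X}, \Sigma/n)$.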
Among all invertible matrices, a space of dimension $d^2$, the Jacobian is $J_y(y^{-1}) = |y|^{-2d}$ (see Theorem 4.1 below). However, $f(w; n, \Sigma)$ in (1.4) is derived using the "Bartlett decomposition" (see Section 3 below) to parametrize positive definite symmetric matrices by a flat space of dimension $d(d+1)/2$. Anderson (2003) states $J_y(y^{-1}) = |y|^{-d-1}$ for the mapping $y \to y^{-1}$ restricted to symmetric matrices, but refers only to a theorem in his Appendix (Theorem A.4.6) that contains only the $d^2$-dimensional result.

In any event, substituting $J_y(y^{-1}) = |y|^{-d-1}$ above leads to
$$ E\big(\phi(\Sigma)\big) = \int \phi(y)\,|y|^{-d-1}\,f(y^{-1}; n, Q_0^{-1})\,dy $$
Thus by (1.4) the joint density of $(\mu, \Sigma)$ generated by (2.6) is
$$ g(\mu, \Sigma) = C\,|\Sigma|^{-(n+d+1)/2} \exp\big(-(1/2)\,\mathrm{tr}(\Sigma^{-1} Q_0)\big) \times \frac{1}{\sqrt{2\pi|\Sigma|}} \exp\big(-(1/2)\,n\,(\mu - \bar{X})'\Sigma^{-1}(\mu - \bar{X})\big) \eqno(2.7) $$
The first factor in (2.7) is called the inverse-Wishart distribution by Anderson (2003, Theorem 7.7.1).

Gelman et al. (2003) define the four-parameter inverse-Wishart-normal density for $(\mu, \Sigma)$ as the density generated by
$$ \Sigma \sim W^{-1}(\Lambda_0^{-1}; d, \nu_0) \quad \text{(that is, } \Sigma^{-1} \sim W(\Lambda_0^{-1}; d, \nu_0)\text{)}, \qquad \mu \mid \Sigma \sim N(\mu_0, \Sigma/\kappa_0) \eqno(2.8) $$
where $\kappa_0$ and $\nu_0$ are positive numbers, $\mu_0$ is a vector in $\mathbf{R}^d$, and $\Lambda_0$ is a $d \times d$ positive definite matrix. They also state the density
$$ p(\mu, \Sigma) = |\Sigma|^{-(\nu_0+d+1)/2} \exp\big(-(1/2)\,\mathrm{tr}(\Lambda_0\Sigma^{-1})\big) \times |\Sigma|^{-1/2} \exp\big(-(\kappa_0/2)(\mu - \mu_0)'\Sigma^{-1}(\mu - \mu_0)\big) \eqno(2.9) $$
for (2.8), which is the same as (2.7) with $\nu_0 = n$, $\Lambda_0 = Q_0$, $\kappa_0 = n$, and $\mu_0 = \bar{X}$. Gelman et al. (2003) say that updating (2.8) or (2.9) with an independent multivariate normal sample $X_1, X_2, \ldots, X_n$ with distribution $N(\mu, \Sigma)$ preserves the form of the distribution with $\mu_0$ etc. replaced by
$$ \mu_n = \frac{n}{\kappa_0 + n}\,\bar{X} + \frac{\kappa_0}{\kappa_0 + n}\,\mu_0, \qquad \kappa_n = \kappa_0 + n, \qquad \nu_n = \nu_0 + n, \qquad \Lambda_n = \Lambda_0 + Q_0 + \frac{\kappa_0 n}{\kappa_0 + n}(\bar{X} - \mu_0)(\bar{X} - \mu_0)' \eqno(2.10) $$
for $Q_0$ in (2.5). Since $\nu_0$ appears in (2.10) only in the additive update $\nu_n = \nu_0 + n$, the initial power of $\Sigma$ in the density does not matter.
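The conjugate update (2.10) is pure arithmetic on the four hyperparameters. A minimal numpy sketch (function name ours):

```python
import numpy as np

def niw_update(mu0, kappa0, nu0, Lambda0, X):
    """Apply the conjugate update (2.10) of the inverse-Wishart-normal
    hyperparameters given a sample X with n rows of d-vectors."""
    n, d = X.shape
    Xbar = X.mean(axis=0)
    Q0 = (X - Xbar).T @ (X - Xbar)          # scatter matrix (2.5)
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (n * Xbar + kappa0 * mu0) / kappa_n
    dev = (Xbar - mu0).reshape(-1, 1)
    Lambda_n = Lambda0 + Q0 + (kappa0 * n / kappa_n) * (dev @ dev.T)
    return mu_n, kappa_n, nu_n, Lambda_n
```

As the text notes, $\nu_0$ enters only additively, so the choice of initial power of $|\Sigma|$ does not affect the update mechanics.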
Gelman et al. also discuss using a Jeffreys prior
$$ \pi_0(\mu, \Sigma) = C/|\Sigma|^{(d+1)/2} $$
This would amount to increasing $n$ by $d + 1$ in (2.6) or (2.7), or $\nu_0$ by $d + 1$ in (2.9). This would at least guarantee that the random matrices $W$ in (2.6) were always invertible.

3. A Fast Way to Generate Wishart-Distributed Random Variables. Suppose that we want to estimate parameters in a model with independent multivariate normal variables. Bayesian methods based on Gibbs sampling using (2.4) and (2.6), or (2.8), depend on being able to simulate Wishart-distributed random matrices in an efficient manner.

If we simulate $W \sim W(\Sigma; d, n)$ using the basic definition (1.1)-(1.3), then we have to generate $nd$ independent standard normal random variables and use on the order of $nd^2$ operations for each simulated value of $W$. Odell and Feiveson (1966) (referenced in Liu, 2001) developed a way to simulate $W$ from only $O(d^2)$ random variables, with a number of operations that does not depend on $n$, which is a considerable improvement in time if $n \gg d$.

Theorem 3.1 (Odell and Feiveson, 1966). Suppose that $V_i$ ($1 \le i \le d$) are independent random variables, where $V_i$ has a chi-square distribution with $n - i + 1$ degrees of freedom (so that $n - d + 1 \le n - i + 1 \le n$). Suppose that $N_{ij}$ are independent normal random variables with mean zero and variance one for $1 \le i < j \le d$, also independent of the $V_i$. Define random variables $b_{ij}$ for $1 \le i, j \le d$ by $b_{ji} = b_{ij}$ for $1 \le i < j \le d$ and
$$ b_{ii} = V_i + \sum_{r=1}^{i-1} N_{ri}^2, \quad 1 \le i \le d; \qquad b_{ij} = N_{ij}\sqrt{V_i} + \sum_{r=1}^{i-1} N_{ri} N_{rj}, \quad i < j \le d \eqno(3.1) $$
Then $B = \{b_{ij}\}$ has a Wishart distribution $W(d, n) = W(I_d; d, n)$.

Remarks. (1) We assume that empty sums in (3.1) are zero, so that (3.1) implies $b_{11} = V_1$ and $b_{1j} = N_{1j}\sqrt{V_1}$. Note that each diagonal entry $b_{ii}$ is individually chi-square with $n$ degrees of freedom. Odell and Feiveson state the theorem with $n - 1$ in place of $n$, so that $V_i$ is chi-square with $n - i$ degrees of freedom (ranging from $n - d$ to $n - 1$) and the conclusion is that $B \sim W(d, n - 1)$.
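The entries (3.1) are exactly those of $B = T'T$ where $T$ is upper triangular with $T_{ii} = \sqrt{V_i}$ and $T_{ij} = N_{ij}$ for $i < j$ (the Bartlett factor), which gives a compact implementation. A minimal numpy sketch (function name ours):

```python
import numpy as np

def wishart_bartlett(d, n, rng=None):
    """Draw B ~ W(I_d; d, n) via the Odell-Feiveson/Bartlett construction:
    B = T'T with T upper triangular, T_ii = sqrt(V_i), V_i ~ chi^2_{n-i+1},
    and T_ij = N_ij standard normal for i < j."""
    rng = np.random.default_rng(rng)
    T = np.zeros((d, d))
    for i in range(d):                 # i is 0-based; df = n - (i+1) + 1 = n - i
        T[i, i] = np.sqrt(rng.chisquare(n - i))
        T[i, i + 1:] = rng.standard_normal(d - i - 1)
    # (T'T)_ii = V_i + sum_{r<i} N_ri^2 and
    # (T'T)_ij = N_ij sqrt(V_i) + sum_{r<i} N_ri N_rj, matching (3.1)
    return T.T @ T
```

For a general scale matrix, (1.3) gives $W(\Sigma; d, n) \sim A\,B\,A'$ with $AA' = \Sigma$ from the Cholesky decomposition; only $d$ chi-square and $d(d-1)/2$ normal variates are needed per draw.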