2. Wishart Distributions and Inverse-Wishart Sampling


Stanley Sawyer | Washington University | Version of April 30, 2007

1. Introduction. The Wishart distribution $W(\Sigma; d, n)$ is a probability distribution of random nonnegative-definite $d \times d$ matrices that is used to model random covariance matrices. The parameter $n$ is the number of degrees of freedom, and $\Sigma$ is a nonnegative-definite symmetric $d \times d$ matrix called the scale matrix. By definition,
$$ W \sim W(\Sigma; d, n) \sim \sum_{i=1}^n X_i X_i', \qquad X_i \sim N(0, \Sigma) \eqno(1.1) $$
so that $W \sim W(\Sigma; d, n)$ is the distribution of a sum of $n$ rank-one matrices defined by independent normal vectors $X_i \in \mathbf{R}^d$ with $E(X_i) = 0$ and $\mathrm{Cov}(X_i) = \Sigma$. In particular,
$$ E(W) = n\,E(X_i X_i') = n\,\mathrm{Cov}(X_i) = n\Sigma $$
(See multivar.tex for more details.)

In general, any $X \sim N(\mu, \Sigma)$ can be represented as $X = \mu + AZ$ with $Z \sim N(0, I_d)$, so that
$$ \Sigma = \mathrm{Cov}(X) = A\,\mathrm{Cov}(Z)\,A' = AA' \eqno(1.2) $$
The easiest way to find $A$ in terms of $\Sigma$ is the Cholesky decomposition, which finds the unique lower triangular matrix $A$ with $A_{ii} \ge 0$ such that $AA' = \Sigma$. Then by (1.1) and (1.2) with $\mu = 0$,
$$ W(\Sigma; d, n) \sim \sum_{i=1}^n (AZ_i)(AZ_i)' \sim A \Big( \sum_{i=1}^n Z_i Z_i' \Big) A' \sim A\,W(d, n)\,A', \qquad Z_i \sim N(0, I_d) \eqno(1.3) $$
where $W(d, n) = W(I_d; d, n)$. In particular, $W(\Sigma; d, n)$ can easily be represented in terms of $W(d, n) = W(I_d; d, n)$.

Assume in the following that $n > d$ and that $\Sigma$ is invertible. Then the density of the random $d \times d$ matrix $W$ in (1.1) can be written
$$ f(w; n, \Sigma) = \frac{|w|^{(n-d-1)/2} \exp\big(-(1/2)\,\mathrm{tr}(w\Sigma^{-1})\big)}{2^{dn/2}\,\pi^{d(d-1)/4}\,|\Sigma|^{n/2} \prod_{i=1}^d \Gamma\big((n+1-i)/2\big)} \eqno(1.4) $$
where $|w| = \det(w)$, $|\Sigma| = \det(\Sigma)$, and $f(w; n, \Sigma) = 0$ unless $w$ is symmetric and positive definite (Anderson 2003, Section 7.2, page 252).

2. The Inverse-Wishart Conjugate Prior. An important use of the Wishart distribution is as a conjugate prior for multivariate normal sampling. This leads to a $d$-dimensional analog of the inverse-gamma-normal conjugate prior for normal sampling in one dimension.
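The definition (1.1) together with the representation (1.3) gives a direct, if slow, sampler: draw $n$ vectors $X_i = AZ_i$ and sum their outer products. A minimal numpy sketch (the function name is ours, not from the notes):

```python
import numpy as np

def wishart_basic(Sigma, n, rng=None):
    """Draw W ~ W(Sigma; d, n) straight from definition (1.1):
    the sum of n outer products X_i X_i' with X_i ~ N(0, Sigma)."""
    rng = np.random.default_rng(rng)
    d = Sigma.shape[0]
    A = np.linalg.cholesky(Sigma)       # Sigma = A A', A lower triangular, cf. (1.2)
    Z = rng.standard_normal((n, d))     # rows are Z_i ~ N(0, I_d)
    X = Z @ A.T                         # rows are X_i = A Z_i ~ N(0, Sigma), cf. (1.3)
    return X.T @ X                      # sum over i of X_i X_i'
```

Averaging many draws recovers $E(W) = n\Sigma$, which is a convenient sanity check on any Wishart sampler.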
The likelihood function of $n$ independent observations $X_i \sim N(\mu, \Sigma)$ for a $d \times d$ positive definite matrix $\Sigma$ is
$$ L(\mu, \Sigma; X) = \prod_{i=1}^n \frac{1}{\sqrt{(2\pi)^d |\Sigma|}} \exp\big(-(1/2)(X_i - \mu)'\Sigma^{-1}(X_i - \mu)\big) = \frac{1}{(2\pi)^{nd/2}\,|\Sigma|^{n/2}} \exp\Big(-(1/2)\sum_{i=1}^n (X_i - \mu)'\Sigma^{-1}(X_i - \mu)\Big) \eqno(2.1) $$
The sum in (2.1) can be written
$$ \sum_{i=1}^n \sum_{a=1}^d \sum_{b=1}^d (X_{ia} - \mu_a)(\Sigma^{-1})_{ab}(X_{ib} - \mu_b) = \sum_{a=1}^d \sum_{b=1}^d (\Sigma^{-1})_{ab} \sum_{i=1}^n (X_{ia} - \mu_a)(X_{ib} - \mu_b) = \sum_{a=1}^d \sum_{b=1}^d (\Sigma^{-1})_{ab}\, Q(\mu)_{ab} = \mathrm{tr}\big(\Sigma^{-1} Q(\mu)\big) \eqno(2.2) $$
where
$$ Q(\mu) = \sum_{i=1}^n (X_i - \mu)(X_i - \mu)' = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})' + n(\bar{X} - \mu)(\bar{X} - \mu)' = Q_0 + n\nu\nu', \qquad \nu = \bar{X} - \mu \eqno(2.3) $$
Substituting (2.3) into (2.2) and (2.1) leads to the expression
$$ L(\mu, \Sigma; X) = \frac{\exp\big(-(1/2)\,\mathrm{tr}(Q_0\Sigma^{-1})\big)\,\exp\big(-(1/2)\,n\,\nu'\Sigma^{-1}\nu\big)}{(2\pi)^{nd/2}\,|\Sigma|^{n/2}} = C_{nd}\,|\Sigma^{-1}|^{(n-1)/2} \exp\big(-(1/2)\,\mathrm{tr}(Q_0\Sigma^{-1})\big) \times \frac{1}{\sqrt{2\pi|\Sigma|}} \exp\big(-(1/2)\,n\,(\mu - \bar{X})'\Sigma^{-1}(\mu - \bar{X})\big) \eqno(2.4) $$
where
$$ Q_0 = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})' \eqno(2.5) $$
Note that the integral $\int L(\mu, \Sigma; X)\,d\mu$ of (2.4) is the same as the Wishart density (1.4) with $\Sigma^{-1}$ replaced by $w$ and with $Q_0$ in (2.4) replaced by $\Sigma^{-1}$ in (1.4), so that $Q_0\Sigma^{-1}$ is replaced by $w\Sigma^{-1}$, and with $n$ replaced by $n - d$, up to multiplicative constants that depend only on $n$, $d$, and $X$.

The similarity in form between (1.4) and the first factor in (2.4) suggests that we might be able to sample from the density $L(\mu, \Sigma; X)$ in (2.4) by generating random variables by
$$ \text{(i)}\quad W \sim W(Q_0^{-1}; d, n) \qquad \text{(ii)}\quad \Sigma = W^{-1} \qquad \text{(iii)}\quad \mu = \bar{X} + (A/\sqrt{n})\,Z, \quad \Sigma = AA', \quad Z \sim N(0, I_d) \eqno(2.6) $$
One subtlety is that the density of $\Sigma$ in (2.6) will not be $f(\Sigma^{-1}; n, Q_0^{-1})$ for $f$ in (1.4), or at least will not be this density with respect to Lebesgue measure $d\Sigma$ in $\mathbf{R}^{d^2}$. In general, for any function $\phi(y) \ge 0$,
$$ E\big(\phi(\Sigma)\big) = E\big(\phi(W^{-1})\big) = \int \phi(y^{-1})\,f(y; n, Q_0^{-1})\,dy = \int \phi(y)\,f(y^{-1}; n, Q_0^{-1})\,d(y^{-1}) = \int \phi(y)\,f(y^{-1}; n, Q_0^{-1})\,J_y(y^{-1})\,dy $$
where $J_y(y^{-1})$ is the absolute value of the Jacobian determinant of the mapping $y \to y^{-1}$.
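The three-step scheme (2.6) is straightforward to sketch in code. The sketch below uses scipy's Wishart sampler for step (i) (scipy parameterizes $W(\text{scale}; d, \text{df})$ with $E(W) = \text{df}\cdot\text{scale}$, matching the convention here); the function name and arrangement are ours, and the density subtlety discussed in the text applies to what this draw actually represents:

```python
import numpy as np
from scipy.stats import wishart

def posterior_draw(X, rng=None):
    """One draw of (mu, Sigma) following scheme (2.6):
    (i)  W ~ W(Q0^{-1}; d, n), (ii) Sigma = W^{-1},
    (iii) mu = Xbar + (A/sqrt(n)) Z with Sigma = A A'."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    Xbar = X.mean(axis=0)
    Q0 = (X - Xbar).T @ (X - Xbar)                    # scatter matrix, cf. (2.5)
    W = wishart.rvs(df=n, scale=np.linalg.inv(Q0),    # step (i)
                    random_state=rng)
    Sigma = np.linalg.inv(W)                          # step (ii)
    A = np.linalg.cholesky(Sigma)                     # Sigma = A A'
    mu = Xbar + (A @ rng.standard_normal(d)) / np.sqrt(n)  # step (iii)
    return mu, Sigma
```

Here step (iii) is equivalent to drawing $\mu \mid \Sigma \sim N(\bar{X}, \Sigma/n)$.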
Among all invertible matrices, a space of dimension $d^2$, the Jacobian is $J_y(y^{-1}) = |y|^{-2d}$ (see Theorem 4.1 below). However, $f(w; n, \Sigma)$ in (1.4) is derived using the "Bartlett decomposition" (see Section 3 below) to parametrize positive definite symmetric matrices by a flat space of dimension $d(d+1)/2$. Anderson (2003) states $J_y(y^{-1}) = |y|^{-d-1}$ for the mapping $y \to y^{-1}$ restricted to symmetric matrices, but refers only to a theorem in his Appendix (Theorem A.4.6) that contains only the $d^2$-dimensional result.

In any event, substituting $J_y(y^{-1}) = |y|^{-d-1}$ above leads to
$$ E\big(\phi(\Sigma)\big) = \int \phi(y)\,|y|^{-d-1}\,f(y^{-1}; n, Q_0^{-1})\,dy $$
Thus by (1.4) the joint density of $(\mu, \Sigma)$ generated by (2.6) is
$$ g(\mu, \Sigma) = C\,|\Sigma|^{-(n+d+1)/2} \exp\big(-(1/2)\,\mathrm{tr}(\Sigma^{-1} Q_0)\big) \times \frac{1}{\sqrt{2\pi|\Sigma|}} \exp\big(-(1/2)\,n\,(\mu - \bar{X})'\Sigma^{-1}(\mu - \bar{X})\big) \eqno(2.7) $$
The first factor in (2.7) is called the inverse-Wishart distribution by Anderson (2003, Theorem 7.7.1).

Gelman et al. (2003) define the four-parameter inverse-Wishart-normal density for $(\mu, \Sigma)$ as the density generated by
$$ \Sigma \sim W^{-1}(\Lambda_0^{-1}; d, \nu_0) \quad \text{(that is, } \Sigma^{-1} \sim W(\Lambda_0^{-1}; d, \nu_0)\text{)}, \qquad \mu \mid \Sigma \sim N(\mu_0, \Sigma/\kappa_0) \eqno(2.8) $$
where $\kappa_0$ and $\nu_0$ are positive numbers, $\mu_0$ is a vector in $\mathbf{R}^d$, and $\Lambda_0$ is a $d \times d$ positive definite matrix. They also state the density
$$ p(\mu, \Sigma) = |\Sigma|^{-(\nu_0+d+1)/2} \exp\big(-(1/2)\,\mathrm{tr}(\Lambda_0\Sigma^{-1})\big) \times |\Sigma|^{-1/2} \exp\big(-(\kappa_0/2)(\mu - \mu_0)'\Sigma^{-1}(\mu - \mu_0)\big) \eqno(2.9) $$
for (2.8), which is the same as (2.7) with $\nu_0 = n$, $\Lambda_0 = Q_0$, $\kappa_0 = n$, and $\mu_0 = \bar{X}$. Gelman et al. (2003) say that updating (2.8) or (2.9) with an independent multivariate normal sample $X_1, X_2, \ldots, X_n$ with distribution $N(\mu, \Sigma)$ preserves the form of the distribution with $\mu_0$ etc. replaced by
$$ \mu_n = \frac{n}{\kappa_0 + n}\,\bar{X} + \frac{\kappa_0}{\kappa_0 + n}\,\mu_0, \qquad \kappa_n = \kappa_0 + n, \qquad \nu_n = \nu_0 + n, \qquad \Lambda_n = \Lambda_0 + Q_0 + \frac{\kappa_0 n}{\kappa_0 + n}(\bar{X} - \mu_0)(\bar{X} - \mu_0)' \eqno(2.10) $$
for $Q_0$ in (2.5). Since $\nu_0$ appears in (2.10) only in the additive update $\nu_n = \nu_0 + n$, the initial power of $\Sigma$ in the density does not matter.
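The conjugate update (2.10) is pure arithmetic on the four hyperparameters. A minimal numpy sketch (function name ours):

```python
import numpy as np

def niw_update(mu0, kappa0, nu0, Lambda0, X):
    """Apply the conjugate update (2.10) of the inverse-Wishart-normal
    hyperparameters given a sample X with n rows of d-vectors."""
    n, d = X.shape
    Xbar = X.mean(axis=0)
    Q0 = (X - Xbar).T @ (X - Xbar)          # scatter matrix (2.5)
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (n * Xbar + kappa0 * mu0) / kappa_n
    dev = (Xbar - mu0).reshape(-1, 1)
    Lambda_n = Lambda0 + Q0 + (kappa0 * n / kappa_n) * (dev @ dev.T)
    return mu_n, kappa_n, nu_n, Lambda_n
```

As the text notes, $\nu_0$ enters only additively, so the choice of initial power of $|\Sigma|$ does not affect the update mechanics.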
Gelman et al. also discuss using a Jeffreys prior
$$ \pi_0(\mu, \Sigma) = C/|\Sigma|^{(d+1)/2} $$
This would amount to increasing $n$ by $d + 1$ in (2.6) or (2.7), or $\nu_0$ by $d + 1$ in (2.9). This would at least guarantee that the random matrices $W$ in (2.6) were always invertible.

3. A Fast Way to Generate Wishart-Distributed Random Variables. Suppose that we want to estimate parameters in a model with independent multivariate normal variables. Bayesian methods based on Gibbs sampling using (2.4) and (2.6), or (2.8), depend on being able to simulate Wishart-distributed random matrices in an efficient manner.

If we simulate $W \sim W(\Sigma; d, n)$ using the basic definition (1.1)-(1.3), then we have to generate $nd$ independent standard normal random variables and use on the order of $nd^2$ operations for each simulated value of $W$. Odell and Feiveson (1966) (referenced in Liu, 2001) developed a way to simulate $W$ from only $O(d^2)$ random variables, with a number of operations that does not depend on $n$, which is a considerable improvement in time if $n \gg d$.

Theorem 3.1 (Odell and Feiveson, 1966). Suppose that $V_i$ ($1 \le i \le d$) are independent random variables, where $V_i$ has a chi-square distribution with $n - i + 1$ degrees of freedom (so that $n - d + 1 \le n - i + 1 \le n$). Suppose that $N_{ij}$ are independent normal random variables with mean zero and variance one for $1 \le i < j \le d$, also independent of the $V_i$. Define random variables $b_{ij}$ for $1 \le i, j \le d$ by $b_{ji} = b_{ij}$ for $1 \le i < j \le d$ and
$$ b_{ii} = V_i + \sum_{r=1}^{i-1} N_{ri}^2, \quad 1 \le i \le d; \qquad b_{ij} = N_{ij}\sqrt{V_i} + \sum_{r=1}^{i-1} N_{ri} N_{rj}, \quad i < j \le d \eqno(3.1) $$
Then $B = \{b_{ij}\}$ has a Wishart distribution $W(d, n) = W(I_d; d, n)$.

Remarks. (1) We assume that empty sums in (3.1) are zero, so that (3.1) implies $b_{11} = V_1$ and $b_{1j} = N_{1j}\sqrt{V_1}$. Note that each diagonal entry $b_{ii}$ is individually chi-square with $n$ degrees of freedom. Odell and Feiveson state the theorem with $n - 1$ in place of $n$, so that $V_i$ is chi-square with $n - i$ degrees of freedom (ranging from $n - d$ to $n - 1$) and the conclusion is that $B \sim W(d, n - 1)$.
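The entries (3.1) are exactly those of $B = T'T$ where $T$ is upper triangular with $T_{ii} = \sqrt{V_i}$ and $T_{ij} = N_{ij}$ for $i < j$ (the Bartlett factor), which gives a compact implementation. A minimal numpy sketch (function name ours):

```python
import numpy as np

def wishart_bartlett(d, n, rng=None):
    """Draw B ~ W(I_d; d, n) via the Odell-Feiveson/Bartlett construction:
    B = T'T with T upper triangular, T_ii = sqrt(V_i), V_i ~ chi^2_{n-i+1},
    and T_ij = N_ij standard normal for i < j."""
    rng = np.random.default_rng(rng)
    T = np.zeros((d, d))
    for i in range(d):                 # i is 0-based; df = n - (i+1) + 1 = n - i
        T[i, i] = np.sqrt(rng.chisquare(n - i))
        T[i, i + 1:] = rng.standard_normal(d - i - 1)
    # (T'T)_ii = V_i + sum_{r<i} N_ri^2 and
    # (T'T)_ij = N_ij sqrt(V_i) + sum_{r<i} N_ri N_rj, matching (3.1)
    return T.T @ T
```

For a general scale matrix, (1.3) gives $W(\Sigma; d, n) \sim A\,B\,A'$ with $AA' = \Sigma$ from the Cholesky decomposition; only $d$ chi-square and $d(d-1)/2$ normal variates are needed per draw.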