Stat 5101 Lecture Slides: Deck 8, Dirichlet Distribution
Charles J. Geyer
School of Statistics
University of Minnesota

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

The Dirichlet Distribution

The Dirichlet distribution is to the beta distribution as the multinomial distribution is to the binomial distribution.

We get it by the same process by which we got the beta distribution (slides 128-137, deck 3), only multivariate. Recall the basic theorem about gamma and beta (same slides referenced above).

The Dirichlet Distribution (cont.)

Theorem 1. Suppose $X$ and $Y$ are independent gamma random variables
$$X \sim \operatorname{Gam}(\alpha_1, \lambda)$$
$$Y \sim \operatorname{Gam}(\alpha_2, \lambda)$$
Then
$$U = X + Y$$
$$V = X / (X + Y)$$
are independent random variables, and
$$U \sim \operatorname{Gam}(\alpha_1 + \alpha_2, \lambda)$$
$$V \sim \operatorname{Beta}(\alpha_1, \alpha_2)$$

The Dirichlet Distribution (cont.)

Corollary 1. Suppose $X_1, X_2, \ldots$ are independent gamma random variables with the same rate parameter,
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda).$$
Then the random variables
$$\frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha_1, \alpha_2)$$
$$\frac{X_1 + X_2}{X_1 + X_2 + X_3} \sim \operatorname{Beta}(\alpha_1 + \alpha_2, \alpha_3)$$
$$\vdots$$
$$\frac{X_1 + \cdots + X_{d-1}}{X_1 + \cdots + X_d} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{d-1}, \alpha_d)$$
are independent and have the asserted distributions.

The Dirichlet Distribution (cont.)

From the first assertion of the theorem we know that
$$X_1 + \cdots + X_{k-1} \sim \operatorname{Gam}(\alpha_1 + \cdots + \alpha_{k-1}, \lambda)$$
and that this sum is independent of $X_k$. Thus the second assertion of the theorem says
$$\frac{X_1 + \cdots + X_{k-1}}{X_1 + \cdots + X_k} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{k-1}, \alpha_k) \qquad (*)$$
and $(*)$ is independent of $X_1 + \cdots + X_k$. That proves the corollary.

The Dirichlet Distribution (cont.)

Theorem 2. Suppose $X_1, X_2, \ldots$ are as in the corollary. Then the random variables
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
satisfy
$$\sum_{i=1}^{d} Y_i = 1, \quad \text{almost surely,}$$
and the joint density of $Y_2, \ldots, Y_d$ is
$$f(y_2, \ldots, y_d) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \, (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1}.$$
The Dirichlet distribution with parameter vector $(\alpha_1, \ldots, \alpha_d)$ is the distribution of the random vector $(Y_1, \ldots, Y_d)$.

The Dirichlet Distribution (cont.)

Let the random variables in Corollary 1 be denoted $W_2, \ldots, W_d$, so these are independent and
$$W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i} \sim \operatorname{Beta}(\alpha_1 + \cdots + \alpha_{i-1}, \alpha_i).$$
Then
$$Y_i = (1 - W_i) \prod_{j=i+1}^{d} W_j, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the product is empty and equal to one.

The Dirichlet Distribution (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d} \qquad\qquad W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i}$$
The inverse transformation is
$$W_i = \frac{Y_1 + \cdots + Y_{i-1}}{Y_1 + \cdots + Y_i} = \frac{1 - Y_i - \cdots - Y_d}{1 - Y_{i+1} - \cdots - Y_d}, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the sum in the denominator of the fraction on the right is empty and equal to zero, so the denominator itself is equal to one.

The Dirichlet Distribution (cont.)

$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$$
This transformation has components of the Jacobian matrix
$$\frac{\partial w_i}{\partial y_i} = -\frac{1}{1 - y_{i+1} - \cdots - y_d}$$
$$\frac{\partial w_i}{\partial y_j} = 0, \qquad j < i$$
$$\frac{\partial w_i}{\partial y_j} = -\frac{1}{1 - y_{i+1} - \cdots - y_d} + \frac{1 - y_i - \cdots - y_d}{(1 - y_{i+1} - \cdots - y_d)^2}, \qquad j > i$$

The Dirichlet Distribution (cont.)

Since this Jacobian matrix is triangular, the determinant is the product of the diagonal elements
$$\lvert \det \nabla h(y_2, \ldots, y_d) \rvert = \prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$$
(the $i = d$ diagonal element has absolute value one, so the product stops at $d - 1$).

The Dirichlet Distribution (cont.)

The joint density of $W_2, \ldots, W_d$ is
$$\prod_{i=2}^{d} \frac{\Gamma(\alpha_1 + \cdots + \alpha_i)}{\Gamma(\alpha_1 + \cdots + \alpha_{i-1}) \Gamma(\alpha_i)} \, w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
because the gamma functions in the normalizing constants telescope.
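The slides above assert that the gamma-ratio vector of Theorem 2 and the product-of-betas construction via the $W_i$ define the same random vector. As a sanity check, here is a minimal simulation sketch; it is not part of the slides, and the variable names, the test parameter vector, and the use of numpy/scipy are my own assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = np.array([2.0, 3.0, 5.0])  # hypothetical (alpha_1, ..., alpha_d)
lam = 1.0                          # common rate; it cancels in the ratios
d, n = len(alpha), 100_000

# Gamma construction (Theorem 2): X_i ~ Gam(alpha_i, lambda),
# Y_i = X_i / (X_1 + ... + X_d).  NumPy parameterizes by scale = 1 / rate.
X = rng.gamma(shape=alpha, scale=1.0 / lam, size=(n, d))
Y_gamma = X / X.sum(axis=1, keepdims=True)
assert np.allclose(Y_gamma.sum(axis=1), 1.0)  # the Y_i sum to one

# Beta construction (Corollary 1): independent
# W_i ~ Beta(alpha_1 + ... + alpha_{i-1}, alpha_i), i = 2, ..., d;
# column k of W holds W_{k+2}.
W = np.column_stack([rng.beta(alpha[:i - 1].sum(), alpha[i - 1], size=n)
                     for i in range(2, d + 1)])
Y_beta = np.empty((n, d))
Y_beta[:, 0] = W.prod(axis=1)             # Y_1 = W_2 W_3 ... W_d
for i in range(2, d + 1):                 # Y_i = (1 - W_i) prod_{j>i} W_j
    Y_beta[:, i - 1] = (1.0 - W[:, i - 2]) * W[:, i - 1:].prod(axis=1)
assert np.allclose(Y_beta.sum(axis=1), 1.0)

# The two samples should agree in distribution; compare one coordinate
# with a two-sample Kolmogorov-Smirnov test (a check, not a proof).
print(stats.ks_2samp(Y_gamma[:, 0], Y_beta[:, 0]).pvalue)
```

If the two constructions agree, the reported p-value should usually be well above conventional significance levels.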
The Dirichlet Distribution (cont.)

Collecting the pieces of the change of variables:

PDF of the $W$'s: $\displaystyle \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$

Jacobian: $\displaystyle \prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$

Transformation: $\displaystyle w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}$

The PDF of $Y_2, \ldots, Y_d$ is
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} \frac{(1 - y_i - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_{i-1} - 1} \, y_i^{\alpha_i - 1}}{(1 - y_{i+1} - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_i - 1}}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \, (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1}$$
because the factors $1 - y_{i+1} - \cdots - y_d$ cancel telescopically (the denominator for index $i$ is the numerator factor for index $i + 1$).

Univariate Marginals

Write $I = \{1, \ldots, d\}$. By definition
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
has the beta distribution with parameters $\alpha_i$ and $\sum_{j \in I,\, j \neq i} \alpha_j$, by Theorem 1, because
$$X_i \sim \operatorname{Gam}(\alpha_i, \lambda)$$
$$\sum_{\substack{j \in I \\ j \neq i}} X_j \sim \operatorname{Gam}\Biggl(\, \sum_{\substack{j \in I \\ j \neq i}} \alpha_j, \lambda \Biggr)$$

Multivariate Marginals

Multivariate marginals are "almost" Dirichlet. As was the case with the multinomial, if we collapse categories, we get a Dirichlet.

Let $\mathcal{A}$ be a partition of $I$, and define
$$Z_A = \sum_{i \in A} Y_i, \qquad A \in \mathcal{A}$$
$$\beta_A = \sum_{i \in A} \alpha_i, \qquad A \in \mathcal{A}$$
Then the random vector having components $Z_A$ has the Dirichlet distribution with parameter vector having components $\beta_A$.

Conditionals

$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot \frac{X_1 + \cdots + X_k}{X_1 + \cdots + X_d} = \frac{X_i}{X_1 + \cdots + X_k} \cdot (Y_1 + \cdots + Y_k) = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$

Conditionals (cont.)

$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$
When we condition on $Y_{k+1}, \ldots, Y_d$, the second factor above is a constant, and the first factor is a component of another Dirichlet random vector having components
$$Z_i = \frac{X_i}{X_1 + \cdots + X_k}, \qquad i = 1, \ldots, k.$$
So conditionals of the Dirichlet are a constant times a Dirichlet.

Moments

From the marginals being beta, we have
$$E(Y_i) = \frac{\alpha_i}{\alpha_1 + \cdots + \alpha_d}$$
$$\operatorname{var}(Y_i) = \frac{\alpha_i \sum_{j \in I,\, j \neq i} \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}$$

Moments (cont.)

From the PDF we get the "theorem associated with the Dirichlet distribution"
$$\int \cdots \int (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \Biggl( \prod_{i=2}^{d} y_i^{\alpha_i - 1} \Biggr) \, dy_2 \cdots dy_d = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d)}$$
so, using $Y_1 = 1 - Y_2 - \cdots - Y_d$,
$$E(Y_1 Y_2) = \frac{\Gamma(\alpha_1 + 1) \Gamma(\alpha_2 + 1) \Gamma(\alpha_3) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d + 2)} \cdot \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} = \frac{\alpha_1 \alpha_2}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)}$$

Moments (cont.)

The result on the preceding slide holds when 1 and 2 are replaced by $i$ and $j$ for $i \neq j$, and
$$\operatorname{cov}(Y_i, Y_j) = E(Y_i Y_j) - E(Y_i) E(Y_j)$$
$$= \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)} - \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2}$$
$$= \frac{\alpha_i \alpha_j}{\alpha_1 + \cdots + \alpha_d} \Biggl[ \frac{1}{\alpha_1 + \cdots + \alpha_d + 1} - \frac{1}{\alpha_1 + \cdots + \alpha_d} \Biggr]$$
$$= -\frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}$$
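The moment formulas above are easy to check by simulation. The following sketch (again mine, not part of the slides; the parameter vector is an arbitrary hypothetical choice) simulates Dirichlet draws via the gamma representation of Theorem 2 and compares sample moments with the closed forms, writing $s = \alpha_1 + \cdots + \alpha_d$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.5, 2.0, 0.5, 4.0])  # hypothetical test parameters
s = alpha.sum()                          # s = alpha_1 + ... + alpha_d
n = 200_000

# Dirichlet(alpha) draws via the gamma representation
# (the common rate cancels, so NumPy's default scale = 1 is fine)
X = rng.gamma(shape=alpha, size=(n, len(alpha)))
Y = X / X.sum(axis=1, keepdims=True)

# E(Y_i) = alpha_i / s
print(Y.mean(axis=0), alpha / s)

# var(Y_i) = alpha_i (s - alpha_i) / (s^2 (s + 1)),
# since the sum over j != i of alpha_j equals s - alpha_i
print(Y.var(axis=0), alpha * (s - alpha) / (s**2 * (s + 1)))

# cov(Y_i, Y_j) = -alpha_i alpha_j / (s^2 (s + 1)) for i != j
print(np.cov(Y[:, 0], Y[:, 1])[0, 1],
      -alpha[0] * alpha[1] / (s**2 * (s + 1)))
```

Note that the covariance is negative for every pair, which is what the constraint $\sum_i Y_i = 1$ forces: if one coordinate is large, the others must on average be small.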