A Useful distributions

Bernoulli distribution

A Bernoulli random variable is the indicator function of an event or, in other words, a discrete random variable whose only possible values are zero and one. If $X \sim \mathcal{B}e(p)$, then $P(X = 1) = p$ and $P(X = 0) = 1 - p$. The probability mass function is
\[
\mathcal{B}e(x; p) =
\begin{cases}
1 - p & \text{if } x = 0,\\
p & \text{if } x = 1.
\end{cases}
\]

Normal distribution

Arguably the most used (and abused) distribution. Its density is
\[
\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\]
with mean $\mu$ and variance $\sigma^2$.

Beta distribution

The support of a Beta distribution is the interval $(0,1)$. For this reason it is often used as a prior distribution for an unknown probability. The distribution is parametrized in terms of two positive parameters, $a$ and $b$, and is denoted by $\mathcal{B}(a, b)$. Its density is
\[
\mathcal{B}(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \, x^{a-1}(1-x)^{b-1}, \qquad 0 < x < 1.
\]
For a random variable $X \sim \mathcal{B}(a, b)$ we have
\[
E(X) = \frac{a}{a+b}, \qquad \mathrm{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}.
\]
A multivariate generalization is provided by the Dirichlet distribution.
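As a quick numerical illustration, the following R snippet (the parameter values a = 2, b = 5 are arbitrary) checks the Beta moments above by simulation:

a <- 2; b <- 5
x <- rbeta(1e5, a, b)
mean(x); a / (a + b)                        # both close to 2/7
var(x); a * b / ((a + b)^2 * (a + b + 1))   # both close to 10/392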

Gamma distribution

A random variable $X$ has a Gamma distribution, with parameters $(a, b)$, if it has density
\[
\mathcal{G}(x; a, b) = \frac{b^a}{\Gamma(a)} \, x^{a-1} \exp(-bx), \qquad x > 0,
\]
where $a$ and $b$ are positive parameters. We find that
\[
E(X) = \frac{a}{b}, \qquad \mathrm{Var}(X) = \frac{a}{b^2}.
\]
If $a > 1$, there is a unique mode at $(a-1)/b$. For $a = 1$, the density reduces to the (negative) exponential density with parameter $b$. For $(a = k/2, b = 1/2)$ it is a Chi-square distribution with $k$ degrees of freedom, $\chi^2(k)$.
If $X \sim \mathcal{G}(a, b)$, the density of $Y = 1/X$ is called Inverse-Gamma, with parameters $(a, b)$, and we have $E(Y) = b/(a-1)$ if $a > 1$ and $\mathrm{Var}(Y) = b^2/((a-1)^2(a-2))$ if $a > 2$.

Student-t distribution

If $Z \sim \mathcal{N}(0, 1)$, $U \sim \chi^2(k)$, $k > 0$, and $Z$ and $U$ are independent, then the random variable $T = Z/\sqrt{U/k}$ has a (central) Student-t distribution with $k$ degrees of freedom, with density
\[
f(t; k) = c \left( 1 + \frac{t^2}{k} \right)^{-\frac{k+1}{2}},
\]
where $c = \Gamma((k+1)/2)/(\Gamma(k/2)\sqrt{k\pi})$. We write $T \sim \mathcal{T}(0, 1, k)$ or simply $T \sim \mathcal{T}_k$. It is clear from the definition that the density is positive on the whole real line and symmetric around the origin. It can be shown that, as $k$ increases to infinity, the density converges to a standard Normal density at any point. We have
\[
E(T) = 0 \ \text{if } k > 1, \qquad \mathrm{Var}(T) = \frac{k}{k-2} \ \text{if } k > 2.
\]
If $T \sim \mathcal{T}(0, 1, k)$, then $X = \mu + \sigma T$ has a Student-t distribution with parameters $(\mu, \sigma^2)$ and $k$ degrees of freedom; we write $X \sim \mathcal{T}(\mu, \sigma^2, k)$. Clearly $E(X) = \mu$ if $k > 1$ and $\mathrm{Var}(X) = \sigma^2 \frac{k}{k-2}$ if $k > 2$.
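The defining construction $T = Z/\sqrt{U/k}$ can be checked directly by simulation in R; the sketch below (with the arbitrary choice k = 5) compares the sample with the t distribution with k degrees of freedom:

k <- 5
z <- rnorm(1e5)
u <- rchisq(1e5, df = k)
t1 <- z / sqrt(u / k)        # Student-t draws by construction
var(t1); k / (k - 2)         # both close to 5/3
ks.test(t1, "pt", df = k)    # consistent with a t density with k df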

Normal-Gamma distribution

Let $(X, Y)$ be a bivariate random vector. If $X \mid Y = y \sim \mathcal{N}(\mu, (n_0 y)^{-1})$ and $Y \sim \mathcal{G}(a, b)$, then we say that $(X, Y)$ has a Normal-Gamma density with parameters $(\mu, n_0^{-1}, a, b)$ (where of course $\mu \in \mathbb{R}$ and $n_0, a, b \in \mathbb{R}^+$). We write
\[
(X, Y) \sim \mathcal{NG}(\mu, n_0^{-1}, a, b).
\]
The marginal density of $X$ is a Student-t, $X \sim \mathcal{T}(\mu, (n_0 \tfrac{a}{b})^{-1}, 2a)$.
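A minimal R sketch of this hierarchical construction (all parameter values are arbitrary), verifying the Student-t marginal by rescaling X to a standard t with 2a degrees of freedom:

mu <- 0; n0 <- 2; a <- 3; b <- 4
y <- rgamma(1e5, a, b)                  # Y ~ G(a, b)
x <- rnorm(1e5, mu, 1 / sqrt(n0 * y))   # X | Y = y ~ N(mu, (n0 y)^-1)
ks.test((x - mu) * sqrt(n0 * a / b), "pt", df = 2 * a)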

Multivariate Normal distribution

A continuous random vector $Y = (Y_1, \ldots, Y_k)'$ has a $k$-variate Normal distribution with parameters $\mu = (\mu_1, \ldots, \mu_k)'$ and $\Sigma$, where $\mu \in \mathbb{R}^k$ and $\Sigma$ is a symmetric positive-definite matrix, if it has density
\[
\mathcal{N}_k(y; \mu, \Sigma) = |\Sigma|^{-1/2} (2\pi)^{-k/2} \exp\left( -\frac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right), \qquad y \in \mathbb{R}^k,
\]
where $|\Sigma|$ denotes the determinant of the matrix $\Sigma$. We write $Y \sim \mathcal{N}_k(\mu, \Sigma)$. Clearly, if $k = 1$, so that $\Sigma$ is a scalar, the $\mathcal{N}_k(\mu, \Sigma)$ density reduces to the univariate Normal density.
We have $E(Y_i) = \mu_i$ and, denoting by $\sigma_{i,j}$ the elements of $\Sigma$, $\mathrm{Var}(Y_i) = \sigma_{i,i}$ and $\mathrm{Cov}(Y_i, Y_j) = \sigma_{i,j}$. The inverse of the covariance matrix $\Sigma$, $\Phi = \Sigma^{-1}$, is the precision matrix of $Y$.
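Draws from $\mathcal{N}_k(\mu, \Sigma)$ are easily generated from standard Normals via a Cholesky factor of $\Sigma$; a small R sketch with illustrative values:

mu <- c(1, -1)
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
L <- t(chol(Sigma))                    # Sigma = L %*% t(L)
z <- matrix(rnorm(2 * 1e4), nrow = 2)
y <- mu + L %*% z                      # each column is a N_2(mu, Sigma) draw
rowMeans(y)                            # close to mu
cov(t(y))                              # close to Sigma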

Several results are of interest; their proofs can be found in any multivariate analysis textbook (see, e.g., Barra and Herbach (1981), pp. 92, 96).

1. If $Y \sim \mathcal{N}_k(\mu, \Sigma)$ and $X$ is a linear transformation of $Y$, that is $X = AY$ where $A$ is an $n \times k$ matrix, then $X \sim \mathcal{N}_n(A\mu, A\Sigma A')$.
2. Let $X$ and $Y$ be two random vectors, with covariance matrices $\Sigma_X$ and $\Sigma_Y$, respectively. Let $\Sigma_{YX}$ be the covariance between $Y$ and $X$, i.e. $\Sigma_{YX} = E((Y - E(Y))(X - E(X))')$. The covariance between $X$ and $Y$ is then $\Sigma_{XY} = \Sigma_{YX}'$. Suppose that $\Sigma_X$ is nonsingular. Then it can be proved that the joint distribution of $(X, Y)$ is Gaussian if and only if the following conditions are satisfied: (i) $X$ has a Gaussian distribution; (ii) the conditional distribution of $Y$ given $X = x$ is a Gaussian distribution whose mean is
\[
E(Y \mid X = x) = E(Y) + \Sigma_{YX} \Sigma_X^{-1} (x - E(X))
\]
and whose covariance matrix is

\[
\Sigma_{Y|X} = \Sigma_Y - \Sigma_{YX} \Sigma_X^{-1} \Sigma_{XY}.
\]
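In R, the conditional mean and covariance matrix can be computed directly from the partitioned moments; all values below are illustrative:

SX  <- matrix(c(2, 0.3, 0.3, 1), 2, 2)   # Cov(X)
SY  <- matrix(1)                         # Cov(Y)
SYX <- matrix(c(0.5, 0.2), 1, 2)         # Cov(Y, X)
mX <- c(0, 0); mY <- 1; x <- c(1, -1)
mY + SYX %*% solve(SX, x - mX)           # E(Y | X = x)
SY - SYX %*% solve(SX, t(SYX))           # Sigma_{Y|X}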

Multinomial distribution

Consider a set of $n$ independent and identically distributed observations taking values in a finite label set $\{L_1, L_2, \ldots, L_k\}$. Denote by $p_i$ the probability of an observation being equal to $L_i$, $i = 1, \ldots, k$. The vector of label counts $X = (X_1, \ldots, X_k)$, where $X_i$ is the number of observations equal to $L_i$ ($i = 1, \ldots, k$), has a Multinomial distribution, whose probability mass function is
\[
\mathcal{M}ult(x_1, \ldots, x_k; n, p) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k},
\]
where $p = (p_1, \ldots, p_k)$ and the counts $x_1, \ldots, x_k$ satisfy the constraint $\sum_i x_i = n$.
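In R the Multinomial is available directly (illustrative probabilities):

p <- c(0.2, 0.5, 0.3)
rmultinom(1, size = 10, prob = p)           # one vector of label counts
dmultinom(c(2, 5, 3), size = 10, prob = p)  # pmf above at (2, 5, 3)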

Dirichlet distribution

The Dirichlet distribution is a multivariate generalization of the Beta distribution. Consider a parameter vector $a = (a_1, \ldots, a_k)$. The Dirichlet distribution $\mathcal{D}ir(a)$ has $(k-1)$-dimensional density
\[
\mathcal{D}ir(x_1, \ldots, x_{k-1}; a) = \frac{\Gamma(a_1 + \cdots + a_k)}{\Gamma(a_1) \cdots \Gamma(a_k)} \, x_1^{a_1 - 1} \cdots x_{k-1}^{a_{k-1} - 1} \left( 1 - \sum_{i=1}^{k-1} x_i \right)^{a_k - 1},
\]
for $\sum_{i=1}^{k-1} x_i < 1$, $x_i > 0$, $i = 1, \ldots, k-1$.
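A standard way to sample a Dirichlet vector uses independent Gammas: if $g_i \sim \mathcal{G}(a_i, 1)$ independently, then $g/\sum_i g_i \sim \mathcal{D}ir(a)$. In R (illustrative parameters):

a <- c(2, 3, 5)
g <- rgamma(length(a), shape = a)
x <- g / sum(g)    # one draw from Dir(2, 3, 5); the components sum to 1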

Wishart distribution

Let $W$ be a symmetric positive-definite matrix of random variables $w_{i,j}$, $i, j = 1, \ldots, k$. The distribution of $W$ is the joint distribution of its entries (in fact, the distribution of the $k(k+1)/2$-dimensional vector of the distinct entries). We say that $W$ has a Wishart distribution with parameters $\alpha$ and $B$ ($\alpha > (k-1)/2$ and $B$ a symmetric, positive-definite matrix), if it has density
\[
\mathcal{W}_k(W; \alpha, B) = c \, |W|^{\alpha - (k+1)/2} \exp\left( -\mathrm{tr}(BW) \right),
\]
where $c = |B|^\alpha / \Gamma_k(\alpha)$, $\Gamma_k(\alpha) = \pi^{k(k-1)/4} \prod_{i=1}^k \Gamma((2\alpha + 1 - i)/2)$ is the generalized gamma function, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix argument. We write $W \sim \mathcal{W}_k(\alpha, B)$ or just $W \sim \mathcal{W}(\alpha, B)$. We have
\[
E(W) = \alpha B^{-1}.
\]

The Wishart distribution arises in sampling from a multivariate Gaussian distribution. If $(Y_1, \ldots, Y_n)$, $n > 1$, is a random sample from a multivariate normal distribution $\mathcal{N}_k(\mu, \Sigma)$ and $\bar{Y} = \sum_{i=1}^n Y_i / n$, then $\bar{Y} \sim \mathcal{N}_k(\mu, \Sigma/n)$ and
\[
S = \sum_{i=1}^n (Y_i - \bar{Y})(Y_i - \bar{Y})'
\]
is independent of $\bar{Y}$ and has a Wishart distribution $\mathcal{W}_k((n-1)/2, \Sigma^{-1}/2)$. In particular, if $\mu = 0$, then

\[
W = \sum_{i=1}^n Y_i Y_i' \sim \mathcal{W}_k\left( \frac{n}{2}, \frac{1}{2} \Sigma^{-1} \right),
\]
whose density (for $n > k - 1$) is

\[
f(w; n, \Sigma) \propto |W|^{\frac{n-k-1}{2}} \exp\left( -\frac{1}{2} \mathrm{tr}(\Sigma^{-1} W) \right).
\]

In fact, the Wishart distribution is usually parametrized in $n$ and $\Sigma$, as in the expression above; then the parameter $n$ is called the degrees of freedom. Note that $E(W) = n\Sigma$. We used the parametrization in $\alpha$ and $B$ for analogy with the Gamma distribution; indeed, if $k = 1$, so that $B$ is a scalar, then $\mathcal{W}_1(\alpha, B)$ reduces to the Gamma density $\mathcal{G}(\cdot; \alpha, B)$.
The following properties of the Wishart distribution can be proved. Let $W \sim \mathcal{W}_k(\alpha = n/2, B = \Sigma^{-1}/2)$ and $Y = AWA'$, where $A$ is an $(m \times k)$ matrix of real numbers ($m \leq k$). Then $Y$ has a Wishart distribution of dimension $m$ with parameters $\alpha$ and $\frac{1}{2}(A\Sigma A')^{-1}$, if the latter exists. In particular, if $W$ and $\Sigma$ are conformably partitioned into
\[
W = \begin{pmatrix} W_{1,1} & W_{1,2} \\ W_{2,1} & W_{2,2} \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix},
\]
where $W_{1,1}$ and $\Sigma_{1,1}$ are $h \times h$ matrices ($1 \leq h < k$), then
\[
W_{1,1} \sim \mathcal{W}_h\left( \alpha = \frac{n}{2}, \frac{1}{2} \Sigma_{1,1}^{-1} \right).
\]
This property allows one to compute the marginal distribution of the elements on the diagonal of $W$; for example, if $k = 2$ and $A = (1, 0)$, then $Y = w_{1,1} \sim \mathcal{G}(\alpha = n/2, \sigma_{1,1}^{-1}/2)$, where $\sigma_{1,1}$ is the first element of the diagonal of $\Sigma$. It follows that $w_{1,1}/\sigma_{1,1} \sim \chi^2(n)$. Then
\[
E(w_{1,1}) = n\sigma_{1,1}, \qquad \mathrm{Var}(w_{1,1}) = 2n\sigma_{1,1}^2.
\]
More generally, it can be proved that

\[
\mathrm{Var}(w_{i,j}) = n(\sigma_{i,j}^2 + \sigma_{i,i}\sigma_{j,j}), \qquad
\mathrm{Cov}(w_{i,j}, w_{l,m}) = n(\sigma_{i,l}\sigma_{j,m} + \sigma_{i,m}\sigma_{j,l}).
\]
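These moments can be checked by simulation with stats::rWishart, which uses the $(n, \Sigma)$ parametrization; the values below are illustrative:

n <- 10
Sigma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
W <- rWishart(1e4, df = n, Sigma = Sigma)   # 2 x 2 x 1e4 array of draws
apply(W, c(1, 2), mean)                     # close to n * Sigma
var(W[1, 1, ])                              # close to 2 * n * Sigma[1, 1]^2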

If $W \sim \mathcal{W}_k(\alpha = n/2, B = \Sigma^{-1}/2)$, then $V = W^{-1}$ has an Inverse-Wishart distribution and
\[
E(V) = E(W^{-1}) = \left( \alpha - \frac{k+1}{2} \right)^{-1} B = \frac{1}{n - k - 1} \, \Sigma^{-1}.
\]

Multivariate Student-t distribution

If $Y$ is a $p$-variate random vector with $Y \sim \mathcal{N}_p(0, \Sigma)$ and $U \sim \chi^2(k)$, with $Y$ and $U$ independent, then $X = \frac{Y}{\sqrt{U/k}} + \mu$ has a $p$-variate Student-t distribution, with parameters $(\mu, \Sigma)$ and $k > 0$ degrees of freedom, with density
\[
f(x) = c \left[ 1 + \frac{1}{k} (x - \mu)' \Sigma^{-1} (x - \mu) \right]^{-(k+p)/2}, \qquad x \in \mathbb{R}^p,
\]
where $c = \Gamma((k+p)/2) / (\Gamma(k/2) \pi^{p/2} k^{p/2} |\Sigma|^{1/2})$. We write $X \sim \mathcal{T}(\mu, \Sigma, k)$. For $p = 1$ it reduces to the univariate Student-t distribution. We have

\[
E(X) = \mu \ \text{if } k > 1, \qquad \mathrm{Var}(X) = \frac{k}{k-2} \, \Sigma \ \text{if } k > 2.
\]
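The defining construction can again be checked by simulation; a short R sketch with arbitrary values of k, mu and Sigma:

k <- 6; mu <- c(0, 0)
Sigma <- matrix(c(1, 0.4, 0.4, 1), 2, 2)
y <- t(chol(Sigma)) %*% matrix(rnorm(2 * 1e4), nrow = 2)  # N_2(0, Sigma)
u <- rchisq(1e4, df = k)
x <- mu + y / rep(sqrt(u / k), each = 2)    # multivariate-t draws
cov(t(x)); (k / (k - 2)) * Sigma            # both close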

Multivariate Normal-Gamma distribution

Let $(X, Y)$ be a random vector, with $X \mid Y = y \sim \mathcal{N}_m(\mu, (N_0 y)^{-1})$ and $Y \sim \mathcal{G}(a, b)$. Then we say that $(X, Y)$ has a Normal-Gamma density with parameters $(\mu, N_0^{-1}, a, b)$, denoted as $(X, Y) \sim \mathcal{NG}(\mu, N_0^{-1}, a, b)$. The marginal density of $X$ is a multivariate Student-t, $X \sim \mathcal{T}(\mu, (N_0 \tfrac{a}{b})^{-1}, 2a)$, so that $E(X) = \mu$ and $\mathrm{Var}(X) = N_0^{-1} b / (a - 1)$.

B Matrix algebra: Singular Value Decomposition

Let $M$ be a $p \times q$ matrix and let $r = \min\{p, q\}$. The singular value decomposition (SVD) of $M$ consists in a triple of matrices $(U, D, V)$ with the following properties:
(i) $U$ is a $p \times p$ orthogonal matrix;
(ii) $V$ is a $q \times q$ orthogonal matrix;
(iii) $D$ is a $p \times q$ matrix with entries $D_{ij} = 0$ for $i \neq j$;
(iv) $UDV' = M$.
If $M$ is a square matrix, $D$ is a diagonal matrix. If, in addition, $M$ is non-negative definite, then $D_{ii} \geq 0$ for every $i$. In this case one can define a diagonal matrix $S$ by setting $S_{ii} = \sqrt{D_{ii}}$, so that $M = US^2V'$. It can be shown that if $M$ is also symmetric, such as, for example, a variance matrix, then $M = US^2U'$. $M$ is invertible if and only if $S_{ii} > 0$ for every $i$.
The SVD has many applications in numerical linear algebra. For example, it can be used to compute a square root¹ of a variance matrix $M$, i.e., a square matrix $N$ such that $M = N'N$. In fact, if $M = US^2U'$, it is enough to set $N = SU'$. The inverse of $M$ can also be easily computed from its SVD, provided $M$ is invertible. In fact, it is immediate to verify that $M^{-1} = US^{-2}U'$. Note also that $S^{-1}U'$ is a square root of $M^{-1}$. More generally, for a noninvertible $M$, a generalized inverse $M^-$ is a matrix with the property that $MM^-M = M$. The generalized inverse of a variance matrix can be found by defining the diagonal matrix $S^-$,
\[
S_{ii}^- =
\begin{cases}
S_{ii}^{-1} & \text{if } S_{ii} > 0,\\
0 & \text{if } S_{ii} = 0,
\end{cases}
\]
and setting $M^- = U(S^-)^2V'$.
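These constructions translate directly into R via svd(); the sketch below uses an arbitrary 2 x 2 variance matrix (for a symmetric positive-definite M, the U and V factors coincide up to sign):

M <- matrix(c(4, 1, 1, 3), 2, 2)
e <- svd(M)
S <- diag(sqrt(e$d))
N <- S %*% t(e$u)                           # square root: t(N) %*% N = M
Minv <- e$u %*% diag(1 / e$d) %*% t(e$u)    # M^{-1} = U S^{-2} U'
all.equal(t(N) %*% N, M)                    # TRUE
all.equal(Minv, solve(M))                   # TRUE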

In package dlm the SVD is used extensively to compute filtering and smoothing in a numerically stable way; see Wang et al. (1992) and Zhang and Li (1996) for a complete discussion of the algorithms used.

¹ Note that our definition of matrix square root differs slightly from the most common one, based on the relation $M = M^{1/2} M^{1/2}$.

For example, consider the filtering recursion used to compute $C_t$. The calculation can be broken down in three steps:
(i) compute $R_t = G_t C_{t-1} G_t' + W_t$;
(ii) compute $C_t^{-1} = F_t' V_t^{-1} F_t + R_t^{-1}$;
(iii) invert $C_t^{-1}$.

Suppose $(U_{C,t-1}, S_{C,t-1})$ are the components of the SVD of $C_{t-1}$, so that $C_{t-1} = U_{C,t-1} S_{C,t-1}^2 U_{C,t-1}'$, and $N_{W,t}$ is a square root of $W_t$. Define the $2p \times p$ partitioned matrix
\[
M = \begin{bmatrix} S_{C,t-1} U_{C,t-1}' G_t' \\ N_{W,t} \end{bmatrix}
\]
and let $(U, D, V)$ be its SVD. The crossproduct of $M$ is

\[
M'M = VDU'UDV' = VD^2V' = G_t U_{C,t-1} S_{C,t-1}^2 U_{C,t-1}' G_t' + N_{W,t}' N_{W,t} = G_t C_{t-1} G_t' + W_t = R_t.
\]

Since $V$ is orthogonal and $D^2$ is diagonal, $(V, D^2, V)$ is the SVD of $R_t$ and we can set $U_{R,t} = V$ and $S_{R,t} = D$. Now consider a square root $N_{V,t}$ of $V_t^{-1}$ and define the $(m + p) \times p$ partitioned matrix

\[
M = \begin{bmatrix} N_{V,t} F_t U_{R,t} \\ S_{R,t}^{-1} \end{bmatrix}
\]
and let $(U, D, V)$ be its SVD. The crossproduct of $M$ is

\[
M'M = VDU'UDV' = VD^2V' = U_{R,t}' F_t' N_{V,t}' N_{V,t} F_t U_{R,t} + S_{R,t}^{-2}.
\]

Premultiplying by $U_{R,t}$ and postmultiplying by $U_{R,t}'$ we obtain

\[
U_{R,t} M'M U_{R,t}' = U_{R,t} V D^2 V' U_{R,t}' = U_{R,t} U_{R,t}' F_t' N_{V,t}' N_{V,t} F_t U_{R,t} U_{R,t}' + U_{R,t} S_{R,t}^{-2} U_{R,t}' = F_t' V_t^{-1} F_t + R_t^{-1} = C_t^{-1}.
\]

The expression for $C_t^{-1}$ follows from the formulas in Proposition 2.2 by applying the general matrix equality

\[
(A + BEB')^{-1} = A^{-1} - A^{-1} B (B' A^{-1} B + E^{-1})^{-1} B' A^{-1},
\]

in which $A$ and $E$ are nonsingular matrices of orders $m$ and $n$, and $B$ is an $m \times n$ matrix.
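The identity is easy to verify numerically in R for arbitrary conformable matrices (m = 3, n = 2 below are arbitrary choices):

set.seed(1)
A <- crossprod(matrix(rnorm(9), 3, 3)) + diag(3)   # nonsingular 3 x 3
E <- crossprod(matrix(rnorm(4), 2, 2)) + diag(2)   # nonsingular 2 x 2
B <- matrix(rnorm(6), 3, 2)
lhs <- solve(A + B %*% E %*% t(B))
rhs <- solve(A) - solve(A) %*% B %*%
  solve(t(B) %*% solve(A) %*% B + solve(E)) %*% t(B) %*% solve(A)
all.equal(lhs, rhs)                                # TRUE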

Since $U_{R,t} V$ is the product of two orthogonal matrices of order $p$, it is itself an orthogonal matrix. Therefore $U_{R,t} V D^2 V' U_{R,t}'$ gives the SVD of $C_t^{-1}$: the "U" matrix of the SVD of $C_t^{-1}$ is $U_{R,t} V$ and the "S" matrix is $D$. It follows that for the SVD of $C_t$ we have $U_{C,t} = U_{R,t} V$ and $S_{C,t} = D^{-1}$.
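The two SVD steps above can be collected into a small R function. This is only a sketch of the idea, not the actual dlm internals: the helper name svd_step and its arguments are ours, and NW, NV denote square roots, in the sense t(N) %*% N, of $W_t$ and $V_t^{-1}$.

svd_step <- function(UC, SC, Gt, NW, Ft, NV) {
  # step 1: SVD of M = rbind(SC UC' Gt', NW); then R_t = V D^2 V'
  s1 <- svd(rbind(SC %*% t(UC) %*% t(Gt), NW))
  UR <- s1$v                                    # "U" factor of R_t
  # step 2: SVD of M = rbind(NV Ft UR, diag(1/d)); then
  # C_t = UR V diag(1/d^2) V' UR'
  s2 <- svd(rbind(NV %*% Ft %*% UR, diag(1 / s1$d)))
  list(UC = UR %*% s2$v, SC = diag(1 / s2$d))   # SVD factors of C_t
}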

References

Akaike, H. (1974a). Markovian representation of stochastic processes and its application to the analysis of autoregressive moving average processes, Annals of the Institute of Statistical Mathematics 26: 363–387.
Akaike, H. (1974b). Stochastic theory of minimal realization, IEEE Trans. on Automatic Control 19: 667–674.
Amisano, G. and Giannini, C. (1997). Topics in Structural VAR Econometrics, 2nd edn, Springer, Berlin.
Anderson, B. and Moore, J. (1979). Optimal Filtering, Prentice-Hall, Englewood Cliffs.
Aoki, M. (1987). State Space Modeling of Time Series, Springer-Verlag, New York.
Barndorff-Nielsen, O., Cox, D. and Klüppelberg, C. (eds) (2001). Complex Stochastic Systems, Chapman & Hall, London.
Barra, J. and Herbach, L. H. (1981). Mathematical Basis of Statistics, Academic Press, New York.
Bauwens, L., Lubrano, M. and Richard, J.-F. (1999). Bayesian Inference in Dynamic Econometric Models, Oxford University Press, New York.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Published posthumously in Phil. Trans. Roy. Stat. Soc. London, 53, 370–418 and 54, 296–325. Reprinted in Biometrika 45 (1958), 293–315, with a biographical note by G. A. Barnard. Reproduced in Press (1989), 185–217.
Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis, Springer, Berlin.
Bernardo, J. (1979a). Expected information as expected utility, Annals of Statistics pp. 686–690.
Bernardo, J. (1979b). Reference posterior distributions for Bayesian inference (with discussion), Journal of the Royal Statistical Society, Series B pp. 113–147.
Bernardo, J. and Smith, A. (1994). Bayesian Theory, Wiley, Chichester.
Berndt, R. (1991). The Practice of Econometrics, Addison-Wesley.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31: 307–327.
Box, G., Jenkins, G. and Reinsel, G. (2008). Time Series Analysis: Forecasting and Control, 4th edn, Wiley, New York.

Brandt, P. (2008). MSBVAR: Markov-Switching Bayesian Vector Autoregression Models. R package version 0.3.2. URL: http://www.utdallas.edu/~pbrandt/
Brown, P., Le, N. and Zidek, J. (1994). Inference for a covariance matrix, in P. Freeman and A. F. M. Smith (eds), Aspects of Uncertainty: A Tribute to D. V. Lindley, Wiley, Chichester, pp. 77–92.
Caines, P. (1988). Linear Stochastic Systems, Wiley, New York.
Campbell, J., Lo, A. and MacKinley, A. (1996). The Econometrics of Financial Markets, Princeton University Press, Princeton.
Canova, F. (2007). Methods for Applied Macroeconomic Research, Princeton University Press, Princeton.
Cappé, O., Godsill, S. and Moulines, E. (2007). An overview of existing methods and recent advances in sequential Monte Carlo, Proceedings of the IEEE 95: 899–924.
Cappé, O., Moulines, E. and Rydén, T. (2005). Inference in Hidden Markov Models, Springer, New York.
Carmona, R. A. (2004). Statistical Analysis of Financial Data in S-Plus, Springer-Verlag, New York.
Caron, F., Davy, M., Doucet, A., Duflos, E. and Vanheeghe, P. (2008). Bayesian inference for linear dynamic models with Dirichlet process mixtures, IEEE Transactions on Signal Processing 56: 71–84.
Carter, C. and Kohn, R. (1994). On Gibbs sampling for state space models, Biometrika 81: 541–553.
Chang, Y., Miller, J. and Park, J. (2005). Extracting a common stochastic trend: Theories with some applications, Technical report, Rice University. URL: http://ideas.repec.org/p/ecl/riceco/2005-06.html
Chatfield, C. (2004). The Analysis of Time Series, 6th edn, CRC-Chapman & Hall, London.
Cifarelli, D. and Muliere, P. (1989). Statistica Bayesiana, Iuculano Editore, Pavia. (In Italian).
Consonni, G. and Veronese, P. (2001). Conditionally reducible natural exponential families and enriched conjugate priors, Scandinavian Journal of Statistics 28: 377–406.
Consonni, G. and Veronese, P. (2003). Enriched conjugate and reference priors for the Wishart family on symmetric cones, Annals of Statistics 31: 1491–1516.
Cowell, R., Dawid, P., Lauritzen, S. and Spiegelhalter, D. (1999). Probabilistic Networks and Expert Systems, Springer-Verlag, New York.
D'Agostino, R. and Stephens, M. (eds) (1986). Goodness-of-fit Techniques, Dekker, New York.
Dalal, S. and Hall, W. (1983). Approximating priors by mixtures of conjugate priors, J. Roy. Statist. Soc. Ser. B 45: 278–286.
Dawid, A. (1981). Some matrix-variate distribution theory: Notational considerations and a Bayesian application, Biometrika 68: 265–274.
Dawid, A. and Lauritzen, S. (1993). Hyper-Markov laws in the statistical analysis of decomposable graphical models, Ann. Statist. 21: 1272–1317.
de Finetti, B. (1970a). Teoria della probabilità I, Einaudi, Torino. English translation as Theory of Probability I in 1974, Wiley, Chichester.
de Finetti, B. (1970b). Teoria della probabilità II, Einaudi, Torino. English translation as Theory of Probability II in 1975, Wiley, Chichester.
De Finetti, B. (1972). Probability, Induction and Statistics, Wiley, Chichester.

DeGroot, M. (1970). Optimal Statistical Decisions, McGraw Hill, New York.
Del Moral, P. (2004). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications, Springer-Verlag, New York.
Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion, in J. Bernardo, M. DeGroot, D. Lindley and A. Smith (eds), Bayesian Statistics 2, Elsevier Science Publishers B.V. (North Holland), pp. 133–156.
Diebold, F. and Li, C. (2006). Forecasting the term structure of government bond yields, Journal of Econometrics 130: 337–364.
Diebold, F., Rudebusch, G. and Aruoba, S. (2006). The macroeconomy and the yield curve: A dynamic latent factor approach, Journal of Econometrics 131: 309–338.
Doan, T., Litterman, R. and Sims, C. (1984). Forecasting and conditional projection using realistic prior distributions, Econometric Reviews 3: 1–144.
Doucet, A., De Freitas, N. and Gordon, N. (eds) (2001). Sequential Monte Carlo Methods in Practice, Springer, New York.
Durbin, J. and Koopman, S. (2001). Time Series Analysis by State Space Methods, Oxford University Press, Oxford.
Engle, R. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation, Econometrica 50: 987–1008.
Engle, R. and Granger, C. (1987). Co-integration and error correction: Representation, estimation, and testing, Econometrica 55: 251–276.
Fearnhead, P. (2002). Markov chain Monte Carlo, sufficient statistics, and particle filters, Journal of Computational and Graphical Statistics 11: 848–862.
Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic factor model: Identification and estimation, Review of Economics and Statistics 82: 540–552.
Frühwirth-Schnatter, S. (1994). Data augmentation and dynamic linear models, Journal of Time Series Analysis 15: 183–202.
Frühwirth-Schnatter, S. and Kaufmann, S. (2008). Model-based clustering of multiple time series, Journal of Business and Economic Statistics 26: 78–89.
Frühwirth-Schnatter, S. and Wagner, H. (2008). Stochastic model specification search for Gaussian and non-Gaussian state space models, IFAS Research Papers, 2008-36.
Gamerman, D. and Migon, H. (1993). Dynamic hierarchical models, Journal of the Royal Statistical Society, Series B 55: 629–642.
Gelman, A., Carlin, J., Stern, H. and Rubin, D. (2004). Bayesian Data Analysis, 2nd edn, Chapman & Hall/CRC, Boca Raton.
Geweke, J. (2005). Contemporary Bayesian Econometrics and Statistics, Wiley, Hoboken.
Gilbert, P. (2008). Brief User's Guide: Dynamic Systems Estimation. URL: http://www.bank-banque-canada.ca/pgilbert/
Gilks, W. and Berzuini, C. (2001). Following a moving target – Monte Carlo inference for dynamic Bayesian models, Journal of the Royal Statistical Society, Series B 63: 127–146.
Gilks, W., Best, N. and Tan, K. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling (Corr: 97V46 p541–542 with R. M. Neal), Applied Statistics 44: 455–472.
Gilks, W. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling, Applied Statistics 41: 337–348.

Gourieroux, C. and Monfort, A. (1997). Time Series and Dynamic Models, Cambridge University Press, Cambridge.
Granger, C. (1981). Some properties of time series data and their use in econometric model specification, Journal of Econometrics 16: 150–161.
Gross, J. (n.d.). nortest: Tests for Normality. R package version 1.0.
Hannan, E. and Deistler, M. (1988). The Statistical Theory of Linear Systems, Wiley, New York.
Harrison, P. and Stevens, C. (1976). Bayesian forecasting (with discussion), Journal of the Royal Statistical Society, Series B 38: 205–247.
Harvey, A. (1989). Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, Cambridge.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57: 97–109.
Hutchinson, C. (1984). The Kalman filter applied to aerospace and electronic systems, IEEE Transactions on Aerospace and Electronic Systems AES-20: 500–504.
Hyndman, R. (2008). forecast: Forecasting Functions for Time Series. R package version 1.14. URL: http://www.robhyndman.com/Rlibrary/forecast/
Hyndman, R. (n.d.). Time Series Data Library. URL: http://www.robjhyndman.com/TSDL
Hyndman, R., Koehler, A., Ord, J. and Snyder, R. (2008). Forecasting with Exponential Smoothing, Springer, Berlin.
Jacquier, E., Polson, N. and Rossi, P. (1994). Bayesian analysis of stochastic volatility models (with discussion), Journal of Business and Economic Statistics 12: 371–417.
Jazwinski, A. (1970). Stochastic Processes and Filtering Theory, Academic Press, New York.
Jeffreys, H. (1998). Theory of Probability, 3rd edn, Oxford University Press, New York.
Johannes, M. and Polson, N. (2009). MCMC methods for continuous-time financial econometrics, in Y. Ait-Sahalia and L. Hansen (eds), Handbook of Financial Econometrics, Elsevier. (To appear).
Kalman, R. (1960). A new approach to linear filtering and prediction problems, Trans. of the AMSE – Journal of Basic Engineering (Series D) 82: 35–45.
Kalman, R. (1961). On the general theory of control systems, Proc. IFAC Congr. 1st 1: 481–491.
Kalman, R. (1968). Contributions to the theory of optimal control, Bol. Soc. Mat. Mexicana 5: 558–563.
Kalman, R. and Bucy, R. (1963). New results in linear filtering and prediction theory, Trans. of the AMSE – Journal of Basic Engineering (Series D) 83: 95–108.
Kalman, R., Ho, Y. and Narenda, K. (1963). Controllability of linear dynamical systems, in J. Lasalle and J. Diaz (eds), Contributions to Differential Equations, Vol. 1, Wiley Interscience.
Kim, C.-J. and Nelson, C. (1999). State Space Models with Regime Switching, MIT Press, Cambridge.
Kolmogorov, A. (1941). Interpolation and extrapolation of stationary random sequences, Bull. Moscow University, Ser. Math. 5.

Künsch, H. (2001). State space and hidden Markov models, in O. Barndorff-Nielsen, D. Cox and C. Klüppelberg (eds), Complex Stochastic Systems, Chapman & Hall/CRC, Boca Raton, pp. 109–173.
Kuttner, K. (1994). Estimating potential output as a latent variable, Journal of Business and Economic Statistics 12: 361–368.
Landim, F. and Gamerman, D. (2000). Dynamic hierarchical models: an extension to matrix-variate observations, Computational Statistics and Data Analysis 35: 11–42.
Laplace, P. (1814). Essai Philosophique sur les Probabilités, Courcier, Paris. The 5th edn (1825) was the last revised by Laplace. English translation in 1952 as Philosophical Essay on Probabilities, Dover, New York.
Lau, J. and So, M. (2008). Bayesian mixture of autoregressive models, Computational Statistics and Data Analysis 53: 38–60.
Lauritzen, S. (1981). Time series analysis in 1880: A discussion of contributions made by T. N. Thiele, International Statist. Review 49: 319–331.
Lauritzen, S. (1996). Graphical Models, Oxford University Press, Oxford.
Lindley, D. (1978). The Bayesian approach (with discussion), Scandinavian Journal of Statistics 5: 1–26.
Lindley, D. and Smith, A. (1972). Bayes estimates for the linear model, Journal of the Royal Statistical Society, Series B 34: 1–41.
Lipster, R. and Shiryayev, A. (1972). Statistics of conditionally Gaussian random sequences, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Univ. California Press, Berkeley.
Litterman, R. (1986). Forecasting with Bayesian vector autoregressions – five years of experience, Journal of Business and Economic Statistics 4: 25–38.
Liu, J. (2001). Monte Carlo Strategies in Scientific Computing, Springer, New York.
Liu, J. and West, M. (2001). Combined parameter and state estimation in simulation-based filtering, in A. Doucet, N. De Freitas and N. Gordon (eds), Sequential Monte Carlo Methods in Practice, Springer, New York.
Ljung, G. and Box, G. (1978). On a measure of lack of fit in time series models, Biometrika 65: 297–303.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis, Springer-Verlag, Berlin.
Maybeck, P. (1979). Stochastic Models, Estimation and Control, Vol. 1 and 2, Academic Press, New York.
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A. and Teller, E. (1953). Equations of state calculations by fast computing machines, Journal of Chemical Physics 21: 1087–1091.
Migon, H., Gamerman, D., Lopez, H. and Ferreira, M. (2005). Bayesian dynamic models, in D. Day and C. Rao (eds), Handbook of Statistics, Vol. 25, Elsevier B.V., chapter 19, pp. 553–588.
Morf, M. and Kailath, T. (1975). Square-root algorithms for least-squares estimation, IEEE Trans. Automatic Control AC-20: 487–497.
Muliere, P. (1984). Modelli lineari dinamici, Studi Statistici (8), Istituto di Metodi Quantitativi, Bocconi University, Milan. (In Italian).
O'Hagan, A. (1994). Bayesian Inference, Kendall's Advanced Theory of Statistics, 2B, Edward Arnold, London.
Oshman, Y. and Bar-Itzhack, I. (1986). Square root filtering via covariance and information eigenfactors, Automatica 22: 599–604.

Petris, G. and Tardella, L. (2003). A geometric approach to transdimensional Markov chain Monte Carlo, Canadian Journal of Statistics 31: 469–482.
Pfaff, B. (2008a). Analysis of Integrated and Cointegrated Time Series with R, 2nd edn, Springer, New York.
Pfaff, B. (2008b). VAR, SVAR and SVEC models: Implementation within R package vars, Journal of Statistical Software 27. URL: http://www.jstatsoft.org/v27/i04/
Pitt, M. and Shephard, N. (1999). Filtering via simulation: Auxiliary particle filters, Journal of the American Statistical Association 94: 590–599.
Plackett, R. (1950). Some theorems in least squares, Biometrika 37: 149–157.
Poirier, D. (1995). Intermediate Statistics and Econometrics: a Comparative Approach, MIT Press, Cambridge.
Pole, A., West, M. and Harrison, J. (n.d.). Applied Bayesian Forecasting and Time Series Analysis, Chapman & Hall, New York.
Prakasa Rao, B. (1999). Statistical Inference for Diffusion Type Processes, Oxford University Press, New York.
Rabiner, L. and Juang, B. (1993). Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs.
Rajaratnam, B., Massam, H. and Carvalho, C. (2008). Flexible covariance estimation in graphical Gaussian models, Annals of Statistics 36: 2818–2849.
Reinsel, G. (1997). Elements of Multivariate Time Series Analysis, 2nd edn, Springer-Verlag, New York.
Robert, C. (2001). The Bayesian Choice, 2nd edn, Springer-Verlag, New York.
Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd edn, Springer, New York.
Rydén, T. and Titterington, D. (1998). Computational Bayesian analysis of hidden Markov models, J. Comput. Graph. Statist. 7: 194–211.
Savage, L. (1954). The Foundations of Statistics, Wiley, New York.
Schervish, M. (1995). Theory of Statistics, Springer-Verlag, New York.
Shephard, N. (1994). Partial non-Gaussian state space models, Biometrika 81: 115–131.
Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility, in D. Cox, D. Hinkley and O. Barndorff-Nielsen (eds), Time Series Models with Econometric, Finance and Other Applications, Chapman and Hall, London, pp. 1–67.
Shumway, R. and Stoffer, D. (2000). Time Series Analysis and its Applications, Springer-Verlag, New York.
Sokal, A. (1989). Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms, Cours de Troisième Cycle de la Physique en Suisse Romande, Lausanne.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in J. Neyman (ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, University of California Press, Berkeley, pp. 197–206.
Storvik, G. (2002). Particle filters for state-space models with the presence of unknown static parameters, IEEE Transactions on Signal Processing 50: 281–289.
Theil, H. (1966). Applied Economic Forecasting, North Holland, Amsterdam.

Thiele, T. (1880). Om anvendelse af mindste kvadraters methode i nogle tilfælde, hvor en komplikation af visse slags uensartede tilfældige fejlkilder giver fejlene en "systematisk" karakter, Det Kongelige Danske Videnskabernes Selskabs Skrifter – Naturvidenskabelig og Mathematisk Afdeling pp. 381–408. English translation in: Thiele: Pioneer in Statistics, S. L. Lauritzen, Oxford University Press (2002).
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion), Annals of Statistics 22: 1701–1786.
Uhlig, H. (1994). On singular Wishart and singular multivariate beta distributions, Annals of Statistics 22: 395–405.
Venables, W. and Ripley, B. (2002). Modern Applied Statistics with S, 4th edn, Springer-Verlag, New York.
Wang, L., Libert, G. and Manneback, P. (1992). Kalman filter algorithm based on singular value decomposition, Proc. of the 31st Conf. on Decision and Control, pp. 1224–1229.
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd edn, Springer, New York.
West, M., Harrison, J. and Migon, H. (1985). Dynamic generalized linear models and Bayesian forecasting, Journal of the American Statistical Association 80: 73–83.
Wiener, N. (1949). The Extrapolation, Interpolation and Smoothing of Stationary Time Series, Wiley, New York.
Wold, H. (1938). A Study in the Analysis of Stationary Time Series, Almquist and Wiksell, Uppsala.
Wuertz, D. (2008). fBasics: Rmetrics – Markets and Basic Statistics. R package version 280.74. URL: http://www.rmetrics.org
Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics, Wiley, New York.
Zhang, Y. and Li, R. (1996). Fixed-interval smoothing algorithm based on singular value decomposition, Proceedings of the 1996 IEEE International Conference on Control Applications, pp. 916–921.