
A Useful distributions

Bernoulli distribution

A Bernoulli random variable is the indicator function of an event or, in other words, a discrete random variable whose only possible values are zero and one. If $X \sim \mathcal{B}e(p)$, then
\[
P(X = 1) = 1 - P(X = 0) = p.
\]
The probability mass function is
\[
\mathcal{B}e(x; p) =
\begin{cases}
1 - p & \text{if } x = 0,\\
p & \text{if } x = 1.
\end{cases}
\]

Normal distribution

Arguably the most used (and abused) probability distribution. Its density is
\[
\mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right),
\]
with expected value $\mu$ and variance $\sigma^2$.

Beta distribution

The support of a Beta distribution is the interval $(0, 1)$. For this reason it is often used as prior distribution for an unknown probability. The distribution is parametrized in terms of two positive parameters, $a$ and $b$, and is denoted by $\mathcal{B}(a, b)$. Its density is
\[
\mathcal{B}(x; a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1 - x)^{b-1}, \qquad 0 < x < 1.
\]
For a random variable $X \sim \mathcal{B}(a, b)$ we have
\[
E(X) = \frac{a}{a + b}, \qquad Var(X) = \frac{ab}{(a + b)^2(a + b + 1)}.
\]
A multivariate generalization is provided by the Dirichlet distribution.

Gamma distribution

A random variable $X$ has a Gamma distribution, with parameters $(a, b)$, if it has density
\[
\mathcal{G}(x; a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} \exp(-bx), \qquad x > 0,
\]
where $a$ and $b$ are positive parameters. We find that
\[
E(X) = \frac{a}{b}, \qquad Var(X) = \frac{a}{b^2}.
\]
If $a > 1$, there is a unique mode at $(a - 1)/b$. For $a = 1$, the density reduces to the (negative) exponential distribution with parameter $b$. For $(a = k/2, b = 1/2)$ it is a Chi-square distribution with $k$ degrees of freedom, $\chi^2(k)$. If $X \sim \mathcal{G}(a, b)$, the density of $Y = 1/X$ is called Inverse-Gamma, with parameters $(a, b)$, and we have $E(Y) = b/(a - 1)$ if $a > 1$ and $Var(Y) = b^2/((a - 1)^2(a - 2))$ if $a > 2$.

Student-t distribution

If $Z \sim \mathcal{N}(0, 1)$, $U \sim \chi^2(k)$, $k > 0$, and $Z$ and $U$ are independent, then the random variable $T = Z/\sqrt{U/k}$ has a (central) Student-t distribution with $k$ degrees of freedom, with density
\[
f(t; k) = c \left( 1 + \frac{t^2}{k} \right)^{-\frac{k+1}{2}},
\]
where $c = \Gamma((k + 1)/2)/(\Gamma(k/2)\sqrt{k\pi})$. We write $T \sim \mathcal{T}(0, 1, k)$ or simply $T \sim \mathcal{T}_k$. It is clear from the definition that the density is positive on the whole real line and symmetric around the origin. It can be shown that, as $k$ increases to infinity, the density converges to a standard Normal density at any point. We have
\[
E(T) = 0 \ \text{if } k > 1, \qquad Var(T) = \frac{k}{k - 2} \ \text{if } k > 2.
\]
If $T \sim \mathcal{T}(0, 1, k)$, then $X = \mu + \sigma T$ has a Student-t distribution with parameters $(\mu, \sigma^2)$ and $k$ degrees of freedom; we write $X \sim \mathcal{T}(\mu, \sigma^2, k)$. Clearly $E(X) = \mu$ if $k > 1$ and $Var(X) = \sigma^2 \frac{k}{k-2}$ if $k > 2$.

Normal-Gamma distribution

Let $(X, Y)$ be a bivariate random vector. If $X \mid Y = y \sim \mathcal{N}(\mu, (n_0 y)^{-1})$ and $Y \sim \mathcal{G}(a, b)$, then we say that $(X, Y)$ has a Normal-Gamma density with parameters $(\mu, n_0^{-1}, a, b)$ (where of course $\mu \in \mathbb{R}$ and $n_0, a, b \in \mathbb{R}^+$). We write $(X, Y) \sim \mathcal{NG}(\mu, n_0^{-1}, a, b)$. The marginal density of $X$ is a Student-t, $X \sim \mathcal{T}(\mu, (n_0 \frac{a}{b})^{-1}, 2a)$.

Multivariate Normal distribution

A continuous random vector $Y = (Y_1, \ldots, Y_k)'$ has a $k$-variate Normal distribution with parameters $\mu = (\mu_1, \ldots, \mu_k)'$ and $\Sigma$, where $\mu \in \mathbb{R}^k$ and $\Sigma$ is a symmetric positive-definite matrix, if it has density
\[
\mathcal{N}_k(y; \mu, \Sigma) = |\Sigma|^{-1/2} (2\pi)^{-k/2} \exp\left( -\frac{1}{2}(y - \mu)'\Sigma^{-1}(y - \mu) \right), \qquad y \in \mathbb{R}^k,
\]
where $|\Sigma|$ denotes the determinant of the matrix $\Sigma$. We write $Y \sim \mathcal{N}_k(\mu, \Sigma)$. Clearly, if $k = 1$, so that $\Sigma$ is a scalar, $\mathcal{N}_k(\mu, \Sigma)$ reduces to the univariate Normal density. We have $E(Y_i) = \mu_i$ and, denoting by $\sigma_{i,j}$ the elements of $\Sigma$, $Var(Y_i) = \sigma_{i,i}$ and $Cov(Y_i, Y_j) = \sigma_{i,j}$. The inverse of the covariance matrix $\Sigma$, $\Phi = \Sigma^{-1}$, is the precision matrix of $Y$.
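As a quick numerical illustration of the density above (only a minimal sketch, assuming NumPy and SciPy are available; the evaluation point and parameter values are arbitrary), the formula can be evaluated directly and compared with SciPy's multivariate normal density:

import numpy as np
from scipy import stats

def mvn_density(y, mu, sigma):
    """N_k(y; mu, Sigma) computed directly from the formula above."""
    k = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(sigma, diff)   # (y - mu)' Sigma^{-1} (y - mu)
    return np.linalg.det(sigma) ** -0.5 * (2 * np.pi) ** (-k / 2) * np.exp(-0.5 * quad)

mu = np.array([0.0, 1.0, -1.0])                  # illustrative parameters
sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
y = np.array([0.5, 0.5, -0.5])
print(mvn_density(y, mu, sigma),
      stats.multivariate_normal(mean=mu, cov=sigma).pdf(y))  # the two values should agree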
Several results are of interest; their proofs can be found in any multivariate analysis textbook (see, e.g., Barra and Herbach, 1981, pp. 92, 96).

1. If $Y \sim \mathcal{N}_k(\mu, \Sigma)$ and $X$ is a linear transformation of $Y$, that is $X = AY$ where $A$ is an $n \times k$ matrix, then $X \sim \mathcal{N}_n(A\mu, A\Sigma A')$.

2. Let $X$ and $Y$ be two random vectors, with covariance matrices $\Sigma_X$ and $\Sigma_Y$, respectively. Let $\Sigma_{YX}$ be the covariance between $Y$ and $X$, i.e. $\Sigma_{YX} = E((Y - E(Y))(X - E(X))')$. The covariance between $X$ and $Y$ is then $\Sigma_{XY} = \Sigma_{YX}'$. Suppose that $\Sigma_X$ is nonsingular. Then it can be proved that the joint distribution of $(X, Y)$ is Gaussian if and only if the following conditions are satisfied: (i) $X$ has a Gaussian distribution; (ii) the conditional distribution of $Y$ given $X = x$ is a Gaussian distribution whose mean is
\[
E(Y \mid X = x) = E(Y) + \Sigma_{YX}\Sigma_X^{-1}(x - E(X))
\]
and whose covariance matrix is
\[
\Sigma_{Y|X} = \Sigma_Y - \Sigma_{YX}\Sigma_X^{-1}\Sigma_{XY}.
\]

Multinomial distribution

Consider a set of $n$ independent and identically distributed observations taking values in a finite label set $\{L_1, L_2, \ldots, L_k\}$. Denote by $p_i$ the probability of an observation being equal to $L_i$, $i = 1, \ldots, k$. The vector of label counts $X = (X_1, \ldots, X_k)$, where $X_i$ is the number of observations equal to $L_i$ ($i = 1, \ldots, k$), has a Multinomial distribution, whose probability mass function is
\[
\mathcal{M}ult(x_1, \ldots, x_k; n, p) = \frac{n!}{x_1! \cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k},
\]
where $p = (p_1, \ldots, p_k)$ and the counts $x_1, \ldots, x_k$ satisfy the constraint $\sum_i x_i = n$.

Dirichlet distribution

The Dirichlet distribution is a multivariate generalization of the Beta distribution. Consider a parameter vector $a = (a_1, \ldots, a_k)$. The Dirichlet distribution $\mathcal{D}ir(a)$ has $(k-1)$-dimensional density
\[
\mathcal{D}ir(x_1, \ldots, x_{k-1}; a) = \frac{\Gamma(a_1 + \cdots + a_k)}{\Gamma(a_1) \cdots \Gamma(a_k)}\, x_1^{a_1 - 1} \cdots x_{k-1}^{a_{k-1} - 1} \left( 1 - \sum_{i=1}^{k-1} x_i \right)^{a_k - 1},
\]
for $\sum_{i=1}^{k-1} x_i < 1$, $x_i > 0$, $i = 1, \ldots, k - 1$.

Wishart distribution

Let $W$ be a symmetric positive-definite matrix of random variables $w_{i,j}$, $i, j = 1, \ldots, k$. The distribution of $W$ is the joint distribution of its entries (in fact, the distribution of the $k(k+1)/2$-dimensional vector of the distinct entries). We say that $W$ has a Wishart distribution with parameters $\alpha$ and $B$ ($\alpha > (k-1)/2$ and $B$ a symmetric, positive-definite matrix), if it has density
\[
\mathcal{W}_k(W; \alpha, B) = c\, |W|^{\alpha - (k+1)/2} \exp\left( -\mathrm{tr}(BW) \right),
\]
where $c = |B|^{\alpha}/\Gamma_k(\alpha)$, $\Gamma_k(\alpha) = \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma((2\alpha + 1 - i)/2)$ is the generalized gamma function, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix argument. We write $W \sim \mathcal{W}_k(\alpha, B)$ or just $W \sim \mathcal{W}(\alpha, B)$. We have
\[
E(W) = \alpha B^{-1}.
\]
The Wishart distribution arises in sampling from a multivariate Gaussian distribution. If $(Y_1, \ldots, Y_n)$, $n > 1$, is a random sample from a multivariate normal distribution $\mathcal{N}_k(\mu, \Sigma)$ and $\bar{Y} = \sum_{i=1}^{n} Y_i / n$, then $\bar{Y} \sim \mathcal{N}_k(\mu, \Sigma/n)$ and
\[
S = \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y})'
\]
is independent of $\bar{Y}$ and has a Wishart distribution $\mathcal{W}_k((n-1)/2, \Sigma^{-1}/2)$. In particular, if $\mu = 0$, then
\[
W = \sum_{i=1}^{n} Y_i Y_i' \sim \mathcal{W}_k\left( \frac{n}{2}, \frac{1}{2}\Sigma^{-1} \right),
\]
whose density (for $n > k - 1$) is
\[
f(w; n, \Sigma) \propto |W|^{\frac{n-k-1}{2}} \exp\left( -\frac{1}{2}\mathrm{tr}(\Sigma^{-1} W) \right).
\]
In fact, the Wishart distribution is usually parametrized in $n$ and $\Sigma$, as in the expression above; then the parameter $n$ is called the degrees of freedom. Note that $E(W) = n\Sigma$. We used the parametrization in $\alpha$ and $B$ for analogy with the Gamma distribution; indeed, if $k = 1$, so that $B$ is a scalar, then $\mathcal{W}_1(\alpha, B)$ reduces to the Gamma density $\mathcal{G}(\cdot; \alpha, B)$.
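As a minimal simulation sketch of the sampling connection just described (assuming NumPy; the values of $n$ and $\Sigma$ below are arbitrary illustrations), the Monte Carlo average of $W = \sum_i Y_i Y_i'$ with $\mu = 0$ should be close to $n\Sigma$:

import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                                   # illustrative sample size and dimension
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])    # illustrative Sigma

def draw_wishart(rng, n, sigma):
    y = rng.multivariate_normal(np.zeros(len(sigma)), sigma, size=n)   # n draws Y_i
    return y.T @ y                            # W = sum_i Y_i Y_i'

w_mean = np.mean([draw_wishart(rng, n, sigma) for _ in range(20_000)], axis=0)
print(w_mean)                                 # Monte Carlo estimate of E(W)
print(n * sigma)                              # should be close to the estimate above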
The following properties of the Wishart distribution can be proved. Let $W \sim \mathcal{W}_k(\alpha = n/2, B = \Sigma^{-1}/2)$ and $Y = AWA'$, where $A$ is an $(m \times k)$ matrix of real numbers ($m \le k$). Then $Y$ has a Wishart distribution of dimension $m$ with parameters $\alpha$ and $\frac{1}{2}(A\Sigma A')^{-1}$, if the latter exists. In particular, if $W$ and $\Sigma$ are partitioned conformably into
\[
W = \begin{pmatrix} W_{1,1} & W_{1,2} \\ W_{2,1} & W_{2,2} \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{1,1} & \Sigma_{1,2} \\ \Sigma_{2,1} & \Sigma_{2,2} \end{pmatrix},
\]
where $W_{1,1}$ and $\Sigma_{1,1}$ are $h \times h$ matrices ($1 \le h < k$), then
\[
W_{1,1} \sim \mathcal{W}_h\left( \alpha = \frac{n}{2}, \frac{1}{2}\Sigma_{1,1}^{-1} \right).
\]
This property allows one to compute the marginal distribution of the elements on the diagonal of $W$; for example, if $k = 2$ and $A = (1, 0)$, then $Y = w_{1,1} \sim \mathcal{G}(\alpha = n/2, \sigma_{1,1}^{-1}/2)$, where $\sigma_{1,1}$ is the first element of the diagonal of $\Sigma$. It follows that $w_{1,1}/\sigma_{1,1} \sim \chi^2(n)$.
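The last statement can likewise be checked by a small simulation (again only a sketch, assuming NumPy and SciPy, with arbitrary illustrative values of $n$ and $\Sigma$), comparing simulated values of $w_{1,1}/\sigma_{1,1}$ with a $\chi^2(n)$ distribution:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 8                                        # illustrative degrees of freedom
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])   # illustrative 2 x 2 Sigma

def draw_w11(rng, n, sigma):
    y = rng.multivariate_normal(np.zeros(2), sigma, size=n)   # n draws Y_i
    return (y.T @ y)[0, 0]                   # (1,1) entry of W = sum_i Y_i Y_i'

w11 = np.array([draw_w11(rng, n, sigma) for _ in range(20_000)])
# A Kolmogorov-Smirnov test against chi^2(n) should not indicate lack of fit.
print(stats.kstest(w11 / sigma[0, 0], stats.chi2(df=n).cdf))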