1 Appendix: Common distributions
This chapter provides details for common univariate and multivariate distributions, including definitions, moments, and simulation. Many distributions can be parameterized in different ways. Devroye (1986) provides a complete treatment of random number generation, although care must be taken to ensure the parameterizations are consistent.
Uniform
A random variable $X$ has a uniform distribution on the interval $[\alpha, \beta]$, denoted $X \sim \mathcal{U}[\alpha, \beta]$, if the probability density function (pdf) is
$$p(x \mid \alpha, \beta) = \frac{1}{\beta - \alpha} \quad (1)$$
for $x \in [\alpha, \beta]$ and 0 otherwise. The mean and variance of a uniform random variable are $E(X) = \frac{\alpha + \beta}{2}$ and $\mathrm{var}(X) = \frac{(\beta - \alpha)^2}{12}$, respectively. The uniform distribution plays a foundational role in random number generation. In particular, uniform random numbers are required for the inverse transform simulation method, accept-reject algorithms, and the Metropolis algorithm. Fast and accurate pre-programmed algorithms are available in most statistical software packages and programming languages.
Bernoulli
A random variable $X \in \{0, 1\}$ has a Bernoulli distribution with parameter $\theta$, denoted $X \sim \mathcal{B}er(\theta)$, if the probability mass function is
$$\mathrm{Prob}(X = x \mid \theta) = \theta^x (1 - \theta)^{1-x}. \quad (2)$$
The mean and variance of a Bernoulli random variable are $E(X) = \theta$ and $\mathrm{var}(X) = \theta(1 - \theta)$, respectively. To simulate $X \sim \mathcal{B}er(\theta)$,

1. Draw $U \sim \mathcal{U}(0, 1)$
2. Set $X = 1$ if $U < \theta$, and $X = 0$ otherwise.
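The two steps above amount to a single comparison against a uniform draw. A minimal sketch using Python's standard library (the function name `bernoulli` is ours):

```python
import random

def bernoulli(theta, rng=random):
    """Draw X ~ Ber(theta) by inverse transform: X = 1 if U < theta."""
    u = rng.random()  # U ~ Uniform(0, 1)
    return 1 if u < theta else 0

# Usage: the Monte Carlo average estimates E(X) = theta.
random.seed(0)
draws = [bernoulli(0.3) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.3
```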
Binomial
A random variable $X \in \{0, 1, \ldots, n\}$ has a binomial distribution with parameters $n$ and $\theta$, denoted $X \sim \mathcal{B}in(n, \theta)$, if the probability mass function is
$$\mathrm{Prob}(X = x \mid n, \theta) = \frac{n!}{x!(n-x)!}\, \theta^x (1 - \theta)^{n-x}, \quad (3)$$
where $n! = n(n-1)! = n(n-1)\cdots 2 \cdot 1$. The mean and variance of a binomial random variable are $E(X) = n\theta$ and $\mathrm{var}(X) = n\theta(1 - \theta)$, respectively. The binomial distribution arises as the distribution of a sum of $n$ independent Bernoulli trials. The binomial is closely related to a number of other distributions. If $W_1, \ldots, W_n$ are i.i.d. $\mathcal{B}er(p)$, then $\sum_{i=1}^{n} W_i \sim \mathcal{B}in(n, p)$. As $n \to \infty$ and $p \to 0$ with $np = \lambda$, $\mathcal{B}in(n, p)$ converges in distribution to a Poisson distribution with parameter $\lambda$. To simulate $X \sim \mathcal{B}in(n, \theta)$,

1. Draw $W_1, \ldots, W_n$ independently, $W_i \sim \mathcal{B}er(\theta)$
2. Set $X = \#(W_i = 1)$.
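The sum-of-Bernoullis construction above translates directly into code; a sketch (function name ours):

```python
import random

def binomial(n, theta, rng=random):
    """Draw X ~ Bin(n, theta) as the number of successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < theta)

random.seed(1)
x = binomial(10, 0.5)
print(x)  # an integer between 0 and 10
```

This is $O(n)$ per draw; for large $n$, library generators use faster accept-reject methods.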
Poisson
A random variable $X \in \mathbb{N}_+$ (the non-negative integers) has a Poisson distribution with parameter $\lambda$, denoted $X \sim \mathcal{P}oi(\lambda)$, if the probability mass function is
$$\mathrm{Prob}(X = x \mid \lambda) = \frac{\lambda^x}{x!} e^{-\lambda}. \quad (4)$$
The mean and variance of a Poisson random variable are $E(X) = \lambda$ and $\mathrm{var}(X) = \lambda$, respectively.

To simulate $X \sim \mathcal{P}oi(\lambda)$,

1. Draw $\{Z_i\}_{i \geq 1}$ independently, $Z_i \sim \exp(1)$
2. Set $X = \inf\left\{n \geq 0 : \sum_{i=1}^{n+1} Z_i > \lambda\right\}$.

Exponential
A random variable $X \in \mathbb{R}_+$ has an exponential distribution with parameter $\lambda$, denoted $X \sim \exp(\lambda)$, if the pdf is
$$p(x \mid \lambda) = \frac{1}{\lambda} \exp\left(-\frac{x}{\lambda}\right). \quad (5)$$
The mean and variance of an exponential random variable are $E(X) = \lambda$ and $\mathrm{var}(X) = \lambda^2$, respectively.

The inverse transform method is the easiest way to simulate exponential random variables, since the cumulative distribution function is $F(x) = 1 - e^{-x/\lambda}$. To simulate $X \sim \exp(\lambda)$,

1. Draw $U \sim \mathcal{U}[0, 1]$
2. Set $X = -\lambda \ln(1 - U)$.

Gamma
A random variable $X \in \mathbb{R}_+$ has a gamma distribution with parameters $\alpha$ and $\beta$, denoted $X \sim \mathcal{G}(\alpha, \beta)$, if the pdf is
$$p(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} \exp(-\beta x), \quad (6)$$
where the gamma function $\Gamma(\alpha)$ is defined in Appendix 4. The mean and variance of a gamma random variable are $E(X) = \alpha\beta^{-1}$ and $\mathrm{var}(X) = \alpha\beta^{-2}$, respectively. It is important to note that there are different parameterizations of the gamma distribution. For example, some authors (and MATLAB) parameterize the gamma density as
$$p(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha - 1} \exp(-x/\beta).$$
Notice that if $Y \sim \mathcal{G}(\alpha, 1)$ and $X = Y/\beta$, then $X \sim \mathcal{G}(\alpha, \beta)$. To see this, note that the inverse transform is $Y = \beta X$ and $dY/dX = \beta$, which implies that
$$p(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)} (\beta x)^{\alpha - 1} \exp(-\beta x)\, \beta = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha - 1} \exp(-\beta x),$$
which is the density of a $\mathcal{G}(\alpha, \beta)$ random variable. The exponential distribution is a special case of the gamma distribution when $\alpha = 1$: $X \sim \mathcal{G}(1, \beta)$ is exponentially distributed with mean $\beta^{-1}$. Gamma random variable simulation is standard, with built-in generators in most software packages. These algorithms typically use accept/reject algorithms that are customized to the specific values of $\alpha$ and $\beta$. To simulate $X \sim \mathcal{G}(\alpha, \beta)$ when $\alpha$ is integer-valued,

1. Draw $X_1, \ldots, X_\alpha$ independently, $X_i \sim \exp(1)$
2. Set $X = \beta^{-1} \sum_{i=1}^{\alpha} X_i$.

For non-integer $\alpha$, accept-reject methods provide fast and accurate algorithms for gamma simulation. To avoid confusion over parameterizations, the transformation method can be used. To simulate $X \sim \mathcal{G}(\alpha, \beta)$,

1. Draw $Y \sim \mathcal{G}(\alpha, 1)$
2. Set $X = Y/\beta$.
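The transformation method above avoids parameterization confusion in practice. A sketch in Python, where `random.gammavariate(alpha, beta)` uses the scale parameterization (mean $\alpha\beta$), so the rate-parameterized draw is obtained by dividing a $\mathcal{G}(\alpha, 1)$ draw by $\beta$ (function name ours):

```python
import random

def gamma_rate(alpha, beta, rng=random):
    """Draw X ~ G(alpha, beta) in the rate parameterization (mean alpha/beta)
    via the transformation method: Y ~ G(alpha, 1), X = Y / beta.
    Note: random.gammavariate itself uses the scale parameterization."""
    y = rng.gammavariate(alpha, 1.0)  # Y ~ G(alpha, 1)
    return y / beta

random.seed(2)
draws = [gamma_rate(3.0, 2.0) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to alpha/beta = 1.5
```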
Beta
A random variable $X \in [0, 1]$ has a beta distribution with parameters $\alpha$ and $\beta$, denoted $X \sim \mathcal{B}(\alpha, \beta)$, if the pdf is
$$p(x \mid \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad (7)$$
where
$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
is the beta function. Since $\int p(x \mid \alpha, \beta)\, dx = 1$,
$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1}(1 - x)^{\beta - 1}\, dx.$$
The mean and variance of a beta random variable are
$$E(X) = \frac{\alpha}{\alpha + \beta} \quad \text{and} \quad \mathrm{var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}, \quad (8)$$
respectively. If $\alpha = \beta = 1$, then $X \sim \mathcal{U}(0, 1)$.

If $\alpha$ and $\beta$ are integers, to simulate $X \sim \mathcal{B}(\alpha, \beta)$,

1. Draw $X_1 \sim \mathcal{G}(\alpha, 1)$ and $X_2 \sim \mathcal{G}(\beta, 1)$
2. Set $X = \dfrac{X_1}{X_1 + X_2}$.

For the general case, fast algorithms involving accept-reject, composition, and transformation methods are available in standard software packages.
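The gamma-ratio construction above is a one-liner given a gamma generator; a sketch (function name ours):

```python
import random

def beta_rv(alpha, beta, rng=random):
    """Draw X ~ B(alpha, beta) from two gammas: X = X1 / (X1 + X2)."""
    x1 = rng.gammavariate(alpha, 1.0)  # X1 ~ G(alpha, 1)
    x2 = rng.gammavariate(beta, 1.0)   # X2 ~ G(beta, 1)
    return x1 / (x1 + x2)

random.seed(3)
draws = [beta_rv(2, 3) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to alpha/(alpha+beta) = 0.4
```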
Chi-squared
A random variable $X \in \mathbb{R}_+$ has a chi-squared distribution with parameter $\nu$, denoted $X \sim \chi^2_\nu$, if the pdf is
$$p(x \mid \nu) = \frac{1}{2^{\nu/2}\, \Gamma\left(\frac{\nu}{2}\right)}\, x^{\nu/2 - 1} \exp\left(-\frac{x}{2}\right). \quad (9)$$
The mean and variance of $X$ are $E(X) = \nu$ and $\mathrm{var}(X) = 2\nu$, respectively. The $\chi^2$-distribution is a special case of the gamma distribution: $\chi^2_\nu = \mathcal{G}\left(\frac{\nu}{2}, \frac{1}{2}\right)$. Simulating chi-squared random variables typically uses the transformation method. For integer values of $\nu$, the following two-step procedure simulates a $\chi^2_\nu$ random variable:

1. Draw $Z_1, \ldots, Z_\nu$ independently, $Z_i \sim \mathcal{N}(0, 1)$
2. Set $X = \sum_{i=1}^{\nu} Z_i^2$.

When $\nu$ is large, simulating $\nu$ normal random variables is computationally costly, and alternative, more computationally efficient algorithms use gamma random variable generation.
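The sum-of-squared-normals procedure above can be sketched directly (function name ours; as noted, for large $\nu$ a gamma generator is preferable):

```python
import random

def chi2(nu, rng=random):
    """Draw X ~ chi^2_nu as a sum of nu squared standard normals (integer nu)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))

random.seed(4)
draws = [chi2(5) for _ in range(50_000)]
print(sum(draws) / len(draws))  # close to nu = 5
```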
Inverse gamma
A random variable $X \in \mathbb{R}_+$ has an inverse gamma distribution, denoted by $X \sim \mathcal{IG}(\alpha, \beta)$, if the pdf is
$$p(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\exp(-\beta/x)}{x^{\alpha + 1}}. \quad (10)$$
The mean and variance of the inverse gamma distribution are
$$E(X) = \frac{\beta}{\alpha - 1} \quad \text{and} \quad \mathrm{var}(X) = \frac{\beta^2}{(\alpha - 1)^2(\alpha - 2)} \quad (11)$$
for $\alpha > 2$. If $Y \sim \mathcal{G}(\alpha, \beta)$, then $X = Y^{-1} \sim \mathcal{IG}(\alpha, \beta)$. To see this, note that
$$1 = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} \exp(-\beta y)\, dy = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-(\alpha - 1)} \exp\left(-\frac{\beta}{x}\right) \frac{1}{x^2}\, dx = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)} \frac{1}{x^{\alpha + 1}} \exp\left(-\frac{\beta}{x}\right) dx.$$

The following two steps simulate an $\mathcal{IG}(\alpha, \beta)$ random variable:

1. Draw $Y \sim \mathcal{G}(\alpha, 1)$
2. Set $X = \beta/Y$.

Again, as in the case of the gamma distribution, some authors use a different parameterization for this distribution, so it is important to be careful to make sure you are drawing using the correct parameters. In the case of prior distributions over scale parameters such as $\sigma^2$, it is additionally complicated because some authors such as Zellner (1971) parameterize models in terms of $\sigma$ instead of $\sigma^2$.
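The two steps above reduce to inverting a scaled gamma draw; a sketch (function name ours):

```python
import random

def inv_gamma(alpha, beta, rng=random):
    """Draw X ~ IG(alpha, beta): Y ~ G(alpha, 1), X = beta / Y."""
    return beta / rng.gammavariate(alpha, 1.0)

random.seed(5)
draws = [inv_gamma(4.0, 3.0) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to beta/(alpha - 1) = 1.0
```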
Normal
A random variable $X \in \mathbb{R}$ has a normal distribution with parameters $\mu$ and $\sigma^2$, denoted $X \sim \mathcal{N}(\mu, \sigma^2)$, if the pdf is
$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \quad (12)$$
The mean and variance are $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2$.
Given the importance of normal random variables, all software packages have func- tions to draw normal random variables. The algorithms typically use transformation methods drawing uniform and exponential random variables or look-up tables.
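One classical transformation method of this kind is the Box-Muller algorithm, which converts two uniform draws into two independent standard normals. A sketch (production generators typically use faster methods, such as the ziggurat algorithm):

```python
import math
import random

def box_muller(rng=random):
    """Return two independent N(0, 1) draws from two uniforms (Box-Muller)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 in (0, 1] avoids log(0)
    z1 = r * math.cos(2.0 * math.pi * u2)
    z2 = r * math.sin(2.0 * math.pi * u2)
    return z1, z2

# mu + sigma * z then delivers a general N(mu, sigma^2) draw.
random.seed(6)
z1, z2 = box_muller()
print(z1, z2)
```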
Lognormal
A random variable $X \in \mathbb{R}_+$ has a lognormal distribution with parameters $\mu$ and $\sigma^2$, denoted by $X \sim \mathcal{LN}(\mu, \sigma^2)$, if the pdf is
$$p(x \mid \mu, \sigma^2) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(\ln x - \mu)^2\right). \quad (13)$$
The mean and variance of the lognormal distribution are $E(X) = e^{\mu + \frac{\sigma^2}{2}}$ and $\mathrm{var}(X) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1)$. It is related to a normal distribution via the transformation $X = e^{\mu + \sigma Z}$, where $Z \sim \mathcal{N}(0, 1)$. Although finite moments of the lognormal exist, the distribution does not admit a moment-generating function.

Simulating lognormal random variables via the transformation method is straightforward: if $\varepsilon \sim \mathcal{N}(0, 1)$, then $X = e^{\mu + \sigma\varepsilon}$ is $\mathcal{LN}(\mu, \sigma^2)$.

Truncated Normal

A random variable $X$ has a truncated normal distribution with parameters $\mu$, $\sigma^2$ and truncation region $(a, b)$ if the pdf is
$$p(x \mid a < x < b) = \frac{\phi(x \mid \mu, \sigma^2)}{\Phi(b \mid \mu, \sigma^2) - \Phi(a \mid \mu, \sigma^2)},$$
where $\phi$ is the normal pdf, $\Phi$ is the normal CDF, and it is clear that $\int_{-\infty}^{b} \phi(x \mid \mu, \sigma^2)\, dx = \Phi(b \mid \mu, \sigma^2)$. The mean of a truncated normal distribution is
$$E(X \mid a < X < b) = \mu + \sigma\, \frac{\phi_a - \phi_b}{\Phi_b - \Phi_a},$$
where $\phi_x$ is the standard normal density evaluated at $(x - \mu)/\sigma$ and $\Phi_x$ is the standard normal CDF evaluated at $(x - \mu)/\sigma$. The inversion method can be used to simulate a truncated normal random variable. A two-step algorithm provides a draw from a truncated standard normal:

1. Draw $U \sim \mathcal{U}[0, 1]$
2. Set $X = \Phi^{-1}\left[\Phi(a) + U\left(\Phi(b) - \Phi(a)\right)\right]$,

where $\Phi(a) = \int_{-\infty}^{a} (2\pi)^{-1/2} \exp(-x^2/2)\, dx$. For a general truncated normal, $X \sim \mathcal{N}(\mu, \sigma^2)\, 1_{[a,b]}$,

1. Draw $U \sim \mathcal{U}[0, 1]$
2. Set $X = \mu + \sigma\, \Phi^{-1}\left[\Phi\left(\frac{a - \mu}{\sigma}\right) + U\left(\Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)\right)\right]$,

where $\Phi^{-1}$ is the inverse of the standard normal CDF (computable from the inverse error function).
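The inversion method for the truncated normal can be sketched with the standard library's `statistics.NormalDist`, which provides the standard normal CDF and its inverse (the function name `trunc_normal` is ours):

```python
import random
from statistics import NormalDist

def trunc_normal(mu, sigma, a, b, rng=random):
    """Draw X ~ N(mu, sigma^2) truncated to [a, b] by the inversion method."""
    nd = NormalDist()                  # standard normal: Phi and Phi^{-1}
    lo = nd.cdf((a - mu) / sigma)
    hi = nd.cdf((b - mu) / sigma)
    u = rng.random()
    return mu + sigma * nd.inv_cdf(lo + u * (hi - lo))

random.seed(7)
draws = [trunc_normal(0.0, 1.0, -1.0, 2.0) for _ in range(10_000)]
print(min(draws), max(draws))  # all draws fall inside [-1, 2]
```

Inversion is exact but can lose accuracy when the truncation region lies far in the tails, where accept-reject schemes are preferred.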
Double exponential
A random variable $X \in \mathbb{R}$ has a double exponential (or Laplace) distribution with parameters $\mu$ and $\sigma$, denoted $X \sim \mathcal{DE}(\mu, \sigma)$, if the pdf is
$$p(x \mid \mu, \sigma) = \frac{1}{2\sigma} \exp\left(-\frac{1}{\sigma}\, |x - \mu|\right). \quad (14)$$
The mean and variance are $E(X) = \mu$ and $\mathrm{var}(X) = 2\sigma^2$.

The following two steps utilize the composition method to simulate a $\mathcal{DE}(\mu, \sigma)$ random variable:

1. Draw $\lambda \sim \mathcal{E}(2)$ and $\varepsilon \sim \mathcal{N}(0, 1)$
2. Set $X = \mu + \sigma\sqrt{\lambda}\,\varepsilon$.
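The composition above exploits the fact that the Laplace distribution is a normal variance-mixture with exponential mixing. A sketch (function name ours; `random.expovariate` takes a rate, so rate 0.5 gives the exponential with mean 2):

```python
import math
import random

def double_exponential(mu, sigma, rng=random):
    """Draw X ~ DE(mu, sigma) by composition: lambda ~ Exp(mean 2),
    epsilon ~ N(0, 1), X = mu + sigma * sqrt(lambda) * epsilon."""
    lam = rng.expovariate(0.5)   # exponential with mean 2
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * math.sqrt(lam) * eps

random.seed(8)
draws = [double_exponential(1.0, 0.5) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to mu = 1.0
```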
Check exponential
A random variable $X \in \mathbb{R}$ has a check (or asymmetric) exponential distribution with parameters $\mu$, $\sigma$, and $q$, denoted $X \sim \mathcal{CE}(\mu, \sigma, q)$, if the pdf is
$$p(x \mid \mu, \sigma, q) = \frac{1}{\sigma_q} \exp\left(-\frac{1}{\sigma}\, \rho_q(x - \mu)\right), \quad (15)$$
where $\rho_q(x) = |x| + (2q - 1)x$ is the check function and $\sigma_q = \sigma/(2q(1 - q))$ is the normalizing constant. The double exponential is a special case when $q = \frac{1}{2}$.

The following two steps utilize the composition method to simulate a $\mathcal{CE}(\mu, \sigma, q)$ random variable:

1. Draw $\lambda \sim \mathcal{E}(\sigma_q/\sigma)$ and $\varepsilon \sim \mathcal{N}(0, 1)$
2. Set $X = \mu - (2q - 1)\sigma\lambda + \sigma\sqrt{\lambda}\,\varepsilon$.

Student T
A random variable $X \in \mathbb{R}$ has a t-distribution with parameters $\nu$, $\mu$, and $\sigma^2$, denoted $X \sim t_\nu(\mu, \sigma^2)$, if the pdf is
$$p(x \mid \nu, \mu, \sigma^2) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu\sigma^2}} \left(1 + \frac{(x - \mu)^2}{\nu\sigma^2}\right)^{-\frac{\nu + 1}{2}}. \quad (16)$$
When $\mu = 0$ and $\sigma = 1$, the distribution is denoted merely as $t_\nu$. The mean and variance of the t-distribution are $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2 \frac{\nu}{\nu - 2}$ for $\nu > 2$. The Cauchy distribution is the special case where $\nu = 1$.

The following two steps utilize the composition method to simulate a $t_\nu(\mu, \sigma^2)$ random variable:

1. Draw $X_1 \sim \mathcal{N}(0, \sigma^2)$ and $X_2 \sim \chi^2_\nu$
2. Set $X = \mu + \dfrac{X_1}{(X_2/\nu)^{1/2}}$.
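The normal-over-chi-squared composition above can be sketched as follows (function name ours; the chi-squared draw uses the gamma relation $\chi^2_\nu = \mathcal{G}(\nu/2, 1/2)$):

```python
import random

def student_t(nu, mu=0.0, sigma=1.0, rng=random):
    """Draw X ~ t_nu(mu, sigma^2) by composition: X1 ~ N(0, sigma^2),
    X2 ~ chi^2_nu, X = mu + X1 / sqrt(X2 / nu)."""
    x1 = rng.gauss(0.0, sigma)
    x2 = 2.0 * rng.gammavariate(nu / 2.0, 1.0)  # chi^2_nu = 2 * G(nu/2, scale 1)
    return mu + x1 / (x2 / nu) ** 0.5

random.seed(9)
draws = [student_t(5) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to mu = 0
```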
Inverse Gaussian
A random variable $X \in \mathbb{R}_+$ has an inverse Gaussian distribution with parameters $\mu$ and $\lambda$, denoted $X \sim \mathcal{IN}(\mu, \lambda)$, if the pdf is
$$p(x \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left(-\frac{\lambda(x - \mu)^2}{2\mu^2 x}\right). \quad (17)$$
The mean and variance of an inverse Gaussian random variable are $E(X) = \mu$ and $\mathrm{var}(X) = \mu^3/\lambda$, respectively.

To simulate an inverse Gaussian $\mathcal{IN}(\mu, \lambda)$ random variable,

1. Draw $U \sim \mathcal{U}(0, 1)$ and $V \sim \chi^2_1$
2. Set $W = \mu\left[1 + \frac{\mu V}{2\lambda}\left(1 - \sqrt{1 + \frac{4\lambda}{\mu V}}\right)\right]$
3. If $U < \frac{\mu}{\mu + W}$, set $X = W$; if $U \geq \frac{\mu}{\mu + W}$, set $X = \frac{\mu^2}{W}$.
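The three steps above (the algorithm of Michael, Schucany, and Haas) can be sketched as follows (function name ours):

```python
import math
import random

def inverse_gaussian(mu, lam, rng=random):
    """Draw X ~ IN(mu, lam) by the Michael-Schucany-Haas algorithm."""
    v = rng.gauss(0.0, 1.0) ** 2  # V ~ chi^2_1
    w = mu * (1.0 + (mu * v) / (2.0 * lam)
              * (1.0 - math.sqrt(1.0 + 4.0 * lam / (mu * v))))
    u = rng.random()
    # Accept the smaller root w with probability mu / (mu + w).
    return w if u < mu / (mu + w) else mu * mu / w

random.seed(10)
draws = [inverse_gaussian(1.5, 2.0) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to mu = 1.5
```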
Generalized inverse Gaussian
A random variable $X \in \mathbb{R}_+$ has a generalized inverse Gaussian distribution with parameters $a$, $b$, and $p$, denoted $X \sim \mathcal{GIG}(a, b, p)$, if the pdf is
$$p(x \mid a, b, p) = \frac{(a/b)^{p/2}}{2K_p\left(\sqrt{ab}\right)}\, x^{p-1} \exp\left(-\frac{1}{2}\left(ax + \frac{b}{x}\right)\right), \quad (18)$$
where $K_p$ is the modified Bessel function of the second kind. The mean and variance are known, but are complicated expressions involving Bessel functions. The gamma distribution is the special case with $b = 0$ (with shape $p$ and rate $a/2$), and the inverse gamma is the special case with $a = 0$.

Simulating GIG random variables is typically done using rejection methods.

Multinomial
A vector of random variables $X = (X_1, \ldots, X_k)$ has a multinomial distribution, denoted $X \sim \mathrm{Mult}(n; p_1, \ldots, p_k)$, if
$$p(X = x \mid p_1, \ldots, p_k) = \frac{n!}{x_1! \cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}, \quad (19)$$
where $\sum_{i=1}^{k} x_i = n$. The multinomial distribution is a natural extension of the Bernoulli and binomial distributions. The Bernoulli distribution gives a single trial resulting in success or failure. The binomial distribution is an extension that involves $n$ independently repeated Bernoulli trials. The multinomial allows for multiple outcomes, instead of the two outcomes in the binomial distribution. There are still $n$ total trials, but now the outcome of each trial is assigned to one of $k$ categories, and $x_i$ counts the number of outcomes in category $i$. The probability of category $i$ is $p_i$. The mean, variance, and covariances of the multinomial distribution are given by
$$E(X_i) = np_i, \quad \mathrm{var}(X_i) = np_i(1 - p_i), \quad \text{and} \quad \mathrm{cov}(X_i, X_j) = -np_ip_j. \quad (20)$$
Multinomial distributions are often used in modeling finite mixture distributions, where the multinomial random variables represent the various mixture components.
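A multinomial draw reduces to $n$ categorical trials, each assigned by inverse transform on the category probabilities; a sketch (function name ours):

```python
import random

def multinomial(n, probs, rng=random):
    """Draw counts (X_1, ..., X_k) ~ Mult(n; p_1, ..., p_k) by assigning
    each of n trials to a category via the inverse transform."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:                      # guard against rounding in the last bin
            counts[-1] += 1
    return counts

random.seed(11)
x = multinomial(100, [0.2, 0.3, 0.5])
print(x, sum(x))  # three counts summing to 100
```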
Standard software packages provide multinomial samplers.

Dirichlet
A vector of random variables $X = (X_1, \ldots, X_k)$ with $\sum_{i=1}^{k} X_i = 1$ has a Dirichlet distribution, denoted $X \sim \mathcal{D}(\alpha_1, \ldots, \alpha_k)$, if
$$p(x \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}. \quad (21)$$
The Dirichlet distribution is used as a prior for mixture probabilities in mixture models. The mean, variance, and covariances of the Dirichlet distribution are given by
$$E(X_i) = \frac{\alpha_i}{\sum_{i=1}^{k} \alpha_i},$$
$$\mathrm{var}(X_i) = \frac{\alpha_i \sum_{j \neq i} \alpha_j}{\left(\sum_{i=1}^{k} \alpha_i\right)^2 \left(\sum_{i=1}^{k} \alpha_i + 1\right)},$$
$$\mathrm{cov}(X_i, X_j) = \frac{-\alpha_i \alpha_j}{\left(\sum_{i=1}^{k} \alpha_i\right)^2 \left(\sum_{i=1}^{k} \alpha_i + 1\right)}.$$
To simulate a Dirichlet $X = (X_1, \ldots, X_k) \sim \mathcal{D}(\alpha_1, \ldots, \alpha_k)$, draw $k$ independent gamma variates $Y_i \sim \mathcal{G}(\alpha_i, 1)$ and then set $X_i = Y_i / \sum_{i=1}^{k} Y_i$.

Multivariate normal
A $1 \times p$ random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution with parameters $\mu$ and $\Sigma$, denoted $X \sim \mathcal{MVN}(\mu, \Sigma)$, if the pdf is
$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)\, \Sigma^{-1} (x - \mu)'\right), \quad (22)$$
where $|\Sigma|$ is the determinant of the positive definite symmetric matrix $\Sigma$. The mean and covariance matrix of a multivariate normal random variable are $E(X) = \mu$ and $\mathrm{cov}(X) = \Sigma$, respectively.
Given the importance of multivariate normal random variables, all software packages have functions to draw multivariate normal random vectors.
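A standard implementation sets $X = \mu + LZ$, where $L$ is the Cholesky factor of $\Sigma$ and $Z$ is a vector of i.i.d. standard normals. A minimal pure-Python sketch for small matrices (function names are ours; production code would use a linear algebra library):

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L' = a, for a small positive definite matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / l[j][j]
    return l

def mvn(mu, sigma, rng=random):
    """Draw X ~ MVN(mu, Sigma) as X = mu + L Z with Z i.i.d. standard normal."""
    l = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(l[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

random.seed(12)
x = mvn([1.0, -1.0], [[2.0, 0.5], [0.5, 1.0]])
print(x)  # a 2-vector
```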
Multivariate t
A $1 \times p$ random vector $X \in \mathbb{R}^p$ has a multivariate t-distribution with parameters $\nu$, $\mu$, and $\Sigma$, denoted $X \sim \mathcal{MV}t_\nu(\mu, \Sigma)$, if the pdf is given by
$$p(x \mid \nu, \mu, \Sigma) = \frac{\Gamma\left(\frac{\nu + p}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{p/2} |\Sigma|^{1/2}} \left(1 + \frac{(x - \mu)\, \Sigma^{-1} (x - \mu)'}{\nu}\right)^{-\frac{\nu + p}{2}}. \quad (23)$$
The mean and covariance matrix of a multivariate t random variable are $E(X) = \mu$ and $\mathrm{cov}(X) = \nu\Sigma/(\nu - 2)$, respectively. The following two steps provide a draw from a multivariate t-distribution:

1. Draw $Y \sim \chi^2_\nu$ and $Z \sim \mathcal{MVN}(0, \Sigma)$
2. Set $X = \mu + \dfrac{Z}{\sqrt{Y/\nu}}$.

Wishart
A random $m \times m$ matrix $\Sigma$ has a Wishart distribution, $\Sigma \sim \mathcal{W}_m(v, V)$, if the density function is given by
$$p(\Sigma \mid v, V) = \frac{|\Sigma|^{\frac{v - m - 1}{2}}}{2^{\frac{vm}{2}} |V|^{\frac{v}{2}}\, \Gamma_m\left(\frac{v}{2}\right)} \exp\left(-\frac{1}{2} \mathrm{tr}\left(V^{-1}\Sigma\right)\right), \quad (24)$$
for $v > m$, where
$$\Gamma_m\left(\frac{v}{2}\right) = \pi^{\frac{m(m-1)}{4}} \prod_{k=1}^{m} \Gamma\left(\frac{v - k + 1}{2}\right) \quad (25)$$
is the multivariate gamma function. If $v < m$, then $\Sigma$ does not have a density (although its distribution is defined). The Wishart distribution arises naturally in multivariate settings with normally distributed random variables as the distribution of quadratic forms of multivariate normal random variables.

The Wishart distribution can be viewed as a multivariate generalization of the $\chi^2$ distribution. From this, it is clear how to sample from a Wishart distribution:

1. Draw $X_j \sim \mathcal{MVN}(0, V)$ for $j = 1, \ldots, v$
2. Set $S = \sum_{j=1}^{v} X_j X_j'$.

Inverted Wishart
A random $m \times m$ matrix $\Sigma$ has an inverted Wishart distribution, denoted $\Sigma \sim \mathcal{IW}_m(b, B)$, if the density function is
$$p(\Sigma \mid b, B) = \frac{|B|^{\frac{b}{2}}\, |\Sigma|^{-\frac{b + m + 1}{2}}}{2^{\frac{bm}{2}}\, \Gamma_m\left(\frac{b}{2}\right)} \exp\left(-\frac{1}{2} \mathrm{tr}\left(\Sigma^{-1} B\right)\right). \quad (26)$$
This also implies that $\Sigma^{-1}$ has a Wishart distribution, $\Sigma^{-1} \sim \mathcal{W}_m(b, B^{-1})$. The Jacobian of the transformation is
$$\left|\frac{\partial \Sigma^{-1}}{\partial \Sigma}\right| = |\Sigma|^{-(m+1)}.$$

To generate $\Sigma \sim \mathcal{IW}_m(b, B)$, follow the two-step procedure:

1. Draw $X_i \sim \mathcal{MVN}(0, B^{-1})$ for $i = 1, \ldots, b$
2. Set $\Sigma = \left(\sum_{i=1}^{b} X_i X_i'\right)^{-1}$.
In cases where $m$ is extremely large, there are more efficient algorithms for drawing inverted Wishart random variables that factor $\Sigma$.
2 Likelihoods, priors, and posteriors
This appendix provides likelihoods and priors for the following types of observed data: Bernoulli, Poisson, normal, normal regression, and multivariate normal. For each specification, proper conjugate and Jeffreys' priors are given.
2.1 Bernoulli observations
Likelihood: If $y_t \mid \theta \sim \mathcal{B}er(\theta)$, $\theta \in [0, 1]$, then the likelihood is
$$p(y \mid \theta) = \prod_{t=1}^{T} p(y_t \mid \theta) = \prod_{t=1}^{T} \theta^{y_t}(1 - \theta)^{1 - y_t} = \theta^{\sum_{t=1}^{T} y_t}(1 - \theta)^{T - \sum_{t=1}^{T} y_t}, \quad (27)$$
where $\sum_{t=1}^{T} y_t$ is a sufficient statistic. Fisher's information for Bernoulli observations is
$$I(\theta) = -E_\theta\left[\frac{\partial^2 \ln p(y_t \mid \theta)}{\partial \theta^2}\right] = \frac{1}{\theta(1 - \theta)},$$
where $E_\theta$ denotes the expectation of $y_t$ conditional on $\theta$.
Priors: A proper conjugate prior for Bernoulli observations is the beta distribution. If $\theta \sim \mathcal{B}(a, A)$, then
$$p(\theta) = \frac{\Gamma(a + A)}{\Gamma(a)\Gamma(A)}\, \theta^{a - 1}(1 - \theta)^{A - 1} = \frac{\theta^{a - 1}(1 - \theta)^{A - 1}}{B(a, A)},$$
where $B(a, A)$ is the beta function. Jeffreys' prior is
$$p(\theta) \propto I(\theta)^{\frac{1}{2}} = \theta^{-\frac{1}{2}}(1 - \theta)^{-\frac{1}{2}},$$
which is the kernel of a $\mathcal{B}\left(\frac{1}{2}, \frac{1}{2}\right)$ distribution.
Posterior: By Bayes rule, the posterior distribution is
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \theta^{a + \sum_{t=1}^{T} y_t - 1}(1 - \theta)^{A + T - \sum_{t=1}^{T} y_t - 1} \sim \mathcal{B}(a_T, A_T),$$
where $a_T = a + \sum_{t=1}^{T} y_t$ and $A_T = A + T - \sum_{t=1}^{T} y_t$. The moments of the beta distribution are given in Appendix 2.

Marginal likelihood: The marginal likelihood is
$$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta = \frac{1}{B(a, A)} \int \theta^{a_T - 1}(1 - \theta)^{A_T - 1}\, d\theta = \frac{B(a_T, A_T)}{B(a, A)}.$$
Predictive likelihood: The predictive distribution is
$$p\left(y_{T+1} \mid y^T\right) = \frac{p\left(y^{T+1}\right)}{p\left(y^T\right)} = \frac{\int p\left(y^{T+1} \mid \theta\right) p(\theta)\, d\theta}{\int p\left(y^T \mid \theta\right) p(\theta)\, d\theta} = \frac{B(y_{T+1} + a_T,\ 1 - y_{T+1} + A_T)}{B(a_T, A_T)}.$$
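The conjugate Beta-Bernoulli updates above reduce to simple counting, and the marginal likelihood to a ratio of beta functions; a minimal sketch (function names are ours, with `math.lgamma` used for numerical stability):

```python
import math

def beta_posterior(y, a, A):
    """Posterior parameters (a_T, A_T) for theta ~ B(a, A) given Bernoulli data y."""
    s = sum(y)
    return a + s, A + len(y) - s

def log_marginal_likelihood(y, a, A):
    """log p(y) = log B(a_T, A_T) - log B(a, A)."""
    def log_beta(p, q):
        return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    a_T, A_T = beta_posterior(y, a, A)
    return log_beta(a_T, A_T) - log_beta(a, A)

y = [1, 0, 1, 1, 0]
a_T, A_T = beta_posterior(y, 1.0, 1.0)  # uniform B(1, 1) prior
print(a_T, A_T, a_T / (a_T + A_T))      # posterior mean a_T / (a_T + A_T)
```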
2.2 Multinomial observations
Likelihood: Multinomial observations consist of data from $k$ different categories, and $y_i$ counts the number of observations in the $i$th category. If $y \mid \theta \sim \mathrm{Mult}(T; \theta_1, \ldots, \theta_k)$, $\theta_i \in [0, 1]$, then the likelihood of $T$ trials is
$$p(y_1, \ldots, y_k \mid \theta_1, \ldots, \theta_k) \propto \theta_1^{y_1} \cdots \theta_k^{y_k}, \quad \text{where } \sum_{i=1}^{k} y_i = T \text{ and } \sum_{i=1}^{k} \theta_i = 1.$$
Prior: If we assume a Dirichlet prior distribution, $\theta \sim \mathrm{Dir}(\alpha)$, with density