Stat 609: Mathematical Statistics Lecture 16
Lecture 16: Hierarchical models and miscellanea

UW-Madison (Statistics), Stat 609, 2015

It is often easier to model a practical situation by thinking of things in a hierarchy.

Example 4.4.1 (binomial-Poisson hierarchy)
An insect lays many eggs, each surviving with probability $p$. On average, how many eggs will survive? Let $Y$ be the number of eggs and $X$ the number of survivors; both are random variables. We can model this situation by first modeling the distribution of $Y$; given $Y$, we then model the conditional distribution of $X|Y$. From these we obtain the joint distribution of $(X,Y)$ and the marginal distributions of $X$ and $Y$.

We model the number of eggs by a Poisson distribution, $Y \sim \text{Poisson}(\lambda)$, where $\lambda > 0$ is the average number of eggs. Given $Y$, we model the number of survivors as
$$X|Y \sim \text{binomial}(p, Y).$$

What is the marginal distribution of $X$? For $x = 0, 1, 2, \ldots$ (note $X \le Y$, so the sum starts at $y = x$),
$$P(X = x) = \sum_{y=x}^{\infty} P(X = x, Y = y) = \sum_{y=x}^{\infty} P(X = x \mid Y = y)\, P(Y = y)$$
$$= \sum_{y=x}^{\infty} \binom{y}{x} p^x (1-p)^{y-x}\, \frac{e^{-\lambda}\lambda^y}{y!}$$
$$= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{y=x}^{\infty} \frac{[(1-p)\lambda]^{y-x}}{(y-x)!}$$
$$= \frac{(\lambda p)^x e^{-\lambda}}{x!} \sum_{t=0}^{\infty} \frac{[(1-p)\lambda]^t}{t!} \qquad (t = y - x)$$
$$= \frac{(\lambda p)^x e^{-\lambda}}{x!}\, e^{(1-p)\lambda} = \frac{e^{-p\lambda}(\lambda p)^x}{x!}.$$
The marginal distribution of $X$ is Poisson($p\lambda$)!

Then $E(X) = p\lambda$, which can also be obtained from
$$E(X) = E[E(X|Y)] = E(pY) = pE(Y) = p\lambda$$
without using the marginal distribution of $X$. If we had begun our model by saying $X \sim \text{Poisson}(\theta)$, then $\theta = p\lambda$ with $Y$ playing no role at all. Introducing $Y$ into the hierarchy mainly aids our understanding of the model.

Example (binomial-binomial hierarchy)
A very similar hierarchical model can be described as follows. A market survey is conducted to study whether a new product is preferred over the product currently available in the market (the old product). The survey is conducted by mail: questionnaires are sent, along with sample products (both new and old), to $N$ customers randomly selected from a population.
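As an aside (an illustrative addition, not part of the original slides), the Poisson($p\lambda$) result of Example 4.4.1 is easy to check with a short Monte Carlo sketch; the sampler and the parameter values $\lambda = 6$, $p = 0.3$ below are arbitrary choices:

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's multiplication method: count uniforms until the
    # running product drops below e^{-lam}.
    L = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > L:
        k += 1
        prod *= rng.random()
    return k

def simulate_survivors(lam, p, n_sims, seed=0):
    """Binomial-Poisson hierarchy: Y ~ Poisson(lam), X|Y ~ binomial(p, Y)."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n_sims):
        y = poisson_draw(rng, lam)                   # number of eggs
        x = sum(rng.random() < p for _ in range(y))  # surviving eggs
        xs.append(x)
    return xs

lam, p = 6.0, 0.3
xs = simulate_survivors(lam, p, 50_000)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# Theory: X ~ Poisson(p * lam), so both mean and variance should be
# close to p * lam = 1.8.
```

Both sample moments land near $p\lambda = 1.8$, consistent with the Poisson marginal derived above.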
Each customer is asked to fill out the questionnaire and return it, with response 1 (new is better than old) or 0 (otherwise).

Let $X$ be the number of ones among the returned questionnaires. What is the distribution of $X$? If every customer returns the questionnaire, then (from elementary probability) $X \sim \text{binomial}(p, N)$, assuming the population is large enough that customers respond independently, where $p \in (0,1)$ is the overall rate of customers who prefer the new product.

Some customers, however, do not return their questionnaires. Let $Y$ be the number of customers who respond. If customers return the questionnaire independently with the same probability $\pi \in (0,1)$, then $Y \sim \text{binomial}(\pi, N)$. Given $Y = y$ (an integer between 0 and $N$), $X|Y = y \sim \text{binomial}(p, y)$ if $y \ge 1$, and $X$ is the point mass at 0 if $y = 0$. For $x = 0, 1, \ldots, N$,
$$P(X = x) = \sum_{k=x}^{N} P(X = x, Y = k) = \sum_{k=x}^{N} P(X = x \mid Y = k)\, P(Y = k)$$
$$= \sum_{k=x}^{N} \binom{k}{x} p^x (1-p)^{k-x} \binom{N}{k} \pi^k (1-\pi)^{N-k}$$
$$= \binom{N}{x} (p\pi)^x (1 - p\pi)^{N-x} \sum_{k=x}^{N} \binom{N-x}{k-x} \left[\frac{(1-p)\pi}{1 - p\pi}\right]^{k-x} \left[\frac{1-\pi}{1 - p\pi}\right]^{N-k}$$
$$= \binom{N}{x} (p\pi)^x (1 - p\pi)^{N-x},$$
since the bracketed terms sum to 1 and the last sum is therefore a full binomial expansion. It turns out that the marginal distribution of $X$ is the binomial($p\pi$, $N$) distribution.

Hierarchical models can have more than two stages. The advantage is that complicated processes may be modeled by a sequence of relatively simple models placed in a hierarchy; conditional distributions play a central role. The random variables in hierarchical models may be all discrete (as in the previous examples), all continuous, or some discrete and some continuous.

More general joint pdf's and conditional distributions
So far we have considered only situations where all random variables are continuous or all are discrete. What if $X$ is a continuous random variable and $Y$ is a discrete random variable taking values in a set $\mathcal{Y}$?
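Before moving on, the binomial($p\pi$, $N$) identity from the survey example can be verified exactly, without simulation, by summing the two-stage pmf directly. This numerical check is an addition to the notes, with $N$, $p$, $\pi$ chosen arbitrarily:

```python
from math import comb

def hierarchy_pmf(x, N, p, pi):
    """P(X = x) via the two-stage sum:
    Y ~ binomial(pi, N), X|Y = k ~ binomial(p, k)."""
    return sum(
        comb(k, x) * p**x * (1 - p)**(k - x)       # P(X = x | Y = k)
        * comb(N, k) * pi**k * (1 - pi)**(N - k)   # P(Y = k)
        for k in range(x, N + 1)
    )

def binom_pmf(x, N, q):
    return comb(N, x) * q**x * (1 - q)**(N - x)

N, p, pi = 12, 0.6, 0.8
max_err = max(
    abs(hierarchy_pmf(x, N, p, pi) - binom_pmf(x, N, p * pi))
    for x in range(N + 1)
)
# The two pmfs agree up to floating-point rounding error.
```

The maximum discrepancy over all $x$ is at the level of floating-point rounding, confirming the algebraic identity.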
We can define the joint "pdf" to be a function $f(x,y)$ satisfying
$$P(X \le x, Y = y) = \int_{-\infty}^{x} f(t, y)\, dt, \qquad x \in \mathbb{R},\ y \in \mathcal{Y}.$$
Then the marginal pdf of $X$ and the marginal pmf of $Y$ are, respectively,
$$f_X(x) = \sum_{y \in \mathcal{Y}} f(x,y) \qquad \text{and} \qquad f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx.$$
The conditional pdf of $X|Y$ and the conditional pmf of $Y|X$ can be defined as before. Similarly, we can deal with the situation where $X$ is a continuous random vector and $Y$ is a discrete random vector. If $X$ or $Y$ is neither discrete nor continuous, we can still define conditional distributions, but doing so involves higher-level mathematics.

Example.
Suppose that $X$ is a continuous random variable and $Y$ is a discrete random variable with $\mathcal{Y} = \{0, 1\}$, and the joint pdf is
$$f(x,y) = \begin{cases} \dfrac{\alpha}{\sqrt{2\pi}}\, e^{-x^2/2} & y = 0,\\[6pt] \dfrac{1-\alpha}{2}\, e^{-|x|} & y = 1, \end{cases} \qquad x \in \mathbb{R},$$
where $0 < \alpha < 1$. Then
$$f_X(x) = f(x,0) + f(x,1) = \frac{\alpha}{\sqrt{2\pi}}\, e^{-x^2/2} + \frac{1-\alpha}{2}\, e^{-|x|}, \qquad x \in \mathbb{R},$$
$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx = \begin{cases} \alpha & y = 0,\\ 1-\alpha & y = 1, \end{cases}$$
$$f(y|x) = \frac{f(x,y)}{f_X(x)} = \begin{cases} \dfrac{\frac{\alpha}{\sqrt{2\pi}} e^{-x^2/2}}{\frac{\alpha}{\sqrt{2\pi}} e^{-x^2/2} + \frac{1-\alpha}{2} e^{-|x|}} & y = 0,\\[12pt] \dfrac{\frac{1-\alpha}{2} e^{-|x|}}{\frac{\alpha}{\sqrt{2\pi}} e^{-x^2/2} + \frac{1-\alpha}{2} e^{-|x|}} & y = 1, \end{cases}$$
$$f(x|y) = \frac{f(x,y)}{f_Y(y)} = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2} & y = 0,\\[6pt] \dfrac{1}{2}\, e^{-|x|} & y = 1, \end{cases} \qquad x \in \mathbb{R}.$$

Mixture distributions
In the previous example, the pdf $f_X$ is referred to as a mixture distribution (or mixture pdf), since it is a convex combination of two distributions (pdf's).

Example 4.4.5 (three-stage hierarchy)
Consider a generalization of Example 4.4.1, where instead of one mother insect there is a large number of mothers, and one mother is chosen at random from a (possibly continuous) distribution $G$. The following three-stage hierarchy may be more appropriate:
number of survivors $X$ from $Y$ eggs: $X|Y \sim \text{binomial}(p, Y)$
number of eggs $Y$ from a given mother $Z$: $Y|Z \sim \text{Poisson}(Z)$
a particular mother $Z$: $Z \sim G$

Suppose that $Z \sim \text{exponential}(\beta)$.
Then
$$E(X) = E[E(X|Y)] = pE(Y) = pE[E(Y|Z)] = pE(Z) = p\beta.$$
The three-stage model can be thought of as a two-stage model by combining the last two stages; we just need to obtain the marginal distribution of $Y$:
$$P(Y = y) = \int_0^{\infty} f(y|z)\, f_Z(z)\, dz = \int_0^{\infty} \frac{e^{-z} z^y}{y!} \cdot \frac{1}{\beta}\, e^{-z/\beta}\, dz$$
$$= \frac{1}{\beta\, y!} \int_0^{\infty} z^y e^{-z(1+\beta^{-1})}\, dz = \frac{1}{\beta\, y!} \cdot \frac{\Gamma(y+1)}{(1+\beta^{-1})^{y+1}}$$
$$= \frac{1}{\beta} \cdot \frac{\beta^{y+1}}{(1+\beta)^{y+1}} = \frac{1}{1+\beta} \left(1 - \frac{1}{1+\beta}\right)^y, \qquad y = 0, 1, 2, \ldots,$$
which is the geometric distribution with success probability $(1+\beta)^{-1}$. The hierarchy therefore reduces to
number of survivors $X$ from $Y$ eggs: $X|Y \sim \text{binomial}(p, Y)$
number of eggs $Y$: $Y \sim \text{geometric}((1+\beta)^{-1})$
However, the three-stage model is easier to understand.

Calculation of mean and variance
Aside from the advantage in understanding models, hierarchical models can often make the calculation of means and variances easier. Let $X$ be a random variable whose pdf is the noncentral chi-square pdf with noncentrality parameter $\lambda$ and $n$ degrees of freedom:
$$f_X(x) = \sum_{k=0}^{\infty} \frac{x^{n/2+k-1} e^{-x/2}}{\Gamma(n/2+k)\, 2^{n/2+k}} \cdot \frac{(\lambda/2)^k e^{-\lambda/2}}{k!}, \qquad x > 0.$$
Calculating $E(X)$ directly from this pdf is not easy. However, this pdf is the marginal pdf of $X$ in the following two-stage model:
$X|K \sim$ chi-square with $n + 2K$ degrees of freedom
$K \sim \text{Poisson}(\lambda/2)$
Then
$$E(X) = E[E(X|K)] = E(n + 2K) = n + 2E(K) = n + \lambda,$$
$$\text{Var}(X) = E[\text{Var}(X|K)] + \text{Var}(E(X|K)) = E(2n + 4K) + \text{Var}(n + 2K)$$
$$= 2n + 4E(K) + 4\text{Var}(K) = 2n + 4\lambda.$$

Example 4.4.6.
Suppose that
$$X|P \sim \text{binomial}(n, P), \qquad P \sim \text{beta}(\alpha, \beta).$$
How do we calculate $E(X)$ and $\text{Var}(X)$?
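As a quick sanity check before answering (an addition, not from the notes), the noncentral chi-square moments $E(X) = n + \lambda$ and $\text{Var}(X) = 2n + 4\lambda$ can be reproduced by simulating the two-stage model directly; $n = 3$ and $\lambda = 4$ are arbitrary test values:

```python
import math
import random

def poisson_draw(rng, lam):
    # Knuth's multiplication method for Poisson(lam)
    L = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > L:
        k += 1
        prod *= rng.random()
    return k

def noncentral_chisq_draw(rng, n, lam):
    """One draw via the hierarchy: K ~ Poisson(lam/2),
    then X|K ~ chi-square with n + 2K degrees of freedom
    (a sum of n + 2K squared standard normals)."""
    k = poisson_draw(rng, lam / 2)
    return sum(rng.gauss(0, 1) ** 2 for _ in range(n + 2 * k))

rng = random.Random(1)
n, lam, n_sims = 3, 4.0, 20_000
xs = [noncentral_chisq_draw(rng, n, lam) for _ in range(n_sims)]
mean = sum(xs) / n_sims
var = sum((x - mean) ** 2 for x in xs) / n_sims
# Theory: E(X) = n + lam = 7, Var(X) = 2n + 4*lam = 22.
```

The simulated mean and variance agree with $n + \lambda = 7$ and $2n + 4\lambda = 22$ up to Monte Carlo error.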
$$E(X) = E[E(X|P)] = E(nP) = nE(P) = \frac{n\alpha}{\alpha+\beta}.$$
Using the variance formula derived previously,
$$\text{Var}(X) = \text{Var}(E(X|P)) + E[\text{Var}(X|P)] = \text{Var}(nP) + E[nP(1-P)]$$
$$= n^2 \text{Var}(P) + nE(P) - nE(P^2)$$
$$= \frac{n^2 \alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} + \frac{n\alpha}{\alpha+\beta} - \frac{n\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}$$
$$= \frac{n\alpha\beta(\alpha+\beta+n)}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$

Theorem 4.7.9 (Covariance inequality)
Let $X$ be a random variable and $g$ and $h$ be nondecreasing functions such that $E[g(X)]$, $E[h(X)]$, and $E[g(X)h(X)]$ exist.
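The formulas of Example 4.4.6 can be checked numerically against the marginal pmf of $X$, which is the standard beta-binomial pmf $P(X = x) = \binom{n}{x} B(x+\alpha,\, n-x+\beta)/B(\alpha,\beta)$ (a known result, not derived in the notes); the parameter values below are arbitrary:

```python
from math import comb, gamma

def beta_fn(a, b):
    # Beta function via the gamma function
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_binomial_pmf(x, n, a, b):
    """P(X = x) after integrating P ~ beta(a, b) out of X|P ~ binomial(n, P)."""
    return comb(n, x) * beta_fn(x + a, n - x + b) / beta_fn(a, b)

n, a, b = 10, 2.0, 3.0
pmf = [beta_binomial_pmf(x, n, a, b) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

# Closed forms from Example 4.4.6:
mean_formula = n * a / (a + b)                                        # = 4.0
var_formula = n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))  # = 6.0
```

The pmf sums to 1, and the moments computed from it match the hierarchical formulas $E(X) = n\alpha/(\alpha+\beta)$ and $\text{Var}(X) = n\alpha\beta(\alpha+\beta+n)/[(\alpha+\beta)^2(\alpha+\beta+1)]$ to floating-point precision.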