Multinomial Distribution
In general, any procedure that seeks to determine whether a set of data could reasonably have originated from some given probability distribution, or class of probability distributions, is called a goodness-of-fit test. The principle behind the particular goodness-of-fit test we will look at is very straightforward: first, the observed data are grouped, more or less arbitrarily, into k classes; then each class's "expected" occupancy is calculated on the basis of the presumed model. If the set of observed and expected frequencies shows considerably more disagreement than sampling variability would predict, our conclusion will be that the presumed $p_X(k)$ or $f_Y(y)$ was incorrect.

In practice, goodness-of-fit tests have several variants, depending on the specificity of the null hypothesis. Section 10.3 describes the approach to take when both the form of the presumed data model and the values of its parameters are known. More typically, we know the form of $p_X(k)$ or $f_Y(y)$, but their parameters need to be estimated; these cases are taken up in Section 10.4.

A somewhat different application of goodness-of-fit testing is the focus of Section 10.5. There, the null hypothesis is that two random variables are independent. In more than a few fields of endeavor, tests for independence are among the most frequently used of all inference procedures.

10.2 The Multinomial Distribution

Their diversity notwithstanding, most goodness-of-fit tests are based on essentially the same statistic, one that has an asymptotic chi-square distribution. The underlying structure of that statistic, though, derives from the multinomial distribution, a direct extension of the familiar binomial. In this section we define the multinomial and state those of its properties that relate to goodness-of-fit testing.

Given a series of n independent Bernoulli trials, each with success probability p, we know that the pdf for X, the total number of successes, is

$$P(X = k) = p_X(k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, \ldots, n \qquad (10.2.1)$$

One of the obvious ways to generalize Equation 10.2.1 is to consider situations in which, at each trial, one of t outcomes can occur, rather than just one of two. That is, we will assume that each trial results in one of the outcomes $r_1, r_2, \ldots, r_t$, where $p(r_i) = p_i$, $i = 1, 2, \ldots, t$ (see Figure 10.2.1). It follows, of course, that $\sum_{i=1}^{t} p_i = 1$.

[Figure 10.2.1: n independent trials, each resulting in one of the possible outcomes $r_1, r_2, \ldots, r_t$, where $p_i = P(r_i)$, $i = 1, 2, \ldots, t$.]

In the binomial model, the two possible outcomes are denoted s and f, where $P(s) = p$ and $P(f) = 1 - p$. Moreover, the outcomes of the n trials can be nicely summarized with a single random variable X, where X denotes the number of successes. In the more general multinomial model, we need a random variable to count the number of times that each of the $r_i$'s occurs. To that end, we define

$$X_i = \text{number of times } r_i \text{ occurs}, \quad i = 1, 2, \ldots, t$$

For a given set of n trials, $X_1 = k_1, X_2 = k_2, \ldots, X_t = k_t$, where $\sum_{i=1}^{t} k_i = n$.

Theorem 10.2.1. Let $X_i$ denote the number of times that the outcome $r_i$ occurs, $i = 1, 2, \ldots, t$, in a series of n independent trials, where $p_i = P(r_i)$. Then the vector $(X_1, X_2, \ldots, X_t)$ has a multinomial distribution and

$$p_{X_1, X_2, \ldots, X_t}(k_1, k_2, \ldots, k_t) = P(X_1 = k_1, X_2 = k_2, \ldots, X_t = k_t) = \frac{n!}{k_1!\, k_2! \cdots k_t!}\, p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t},$$

$$k_i = 0, 1, \ldots, n; \quad i = 1, 2, \ldots, t; \quad \sum_{i=1}^{t} k_i = n$$

Proof. Any particular sequence of $k_1$ $r_1$'s, $k_2$ $r_2$'s, $\ldots$, and $k_t$ $r_t$'s has probability $p_1^{k_1} p_2^{k_2} \cdots p_t^{k_t}$. Moreover, the total number of outcome sequences that will generate the values $(k_1, k_2, \ldots, k_t)$ is the number of ways to permute n objects, $k_1$ of one type, $k_2$ of a second type, $\ldots$, and $k_t$ of a tth type. By Theorem 2.6.2 that number is $n!/(k_1!\, k_2! \cdots k_t!)$, and the statement of the theorem follows. ∎
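The formula in Theorem 10.2.1 translates directly into code. Below is a minimal sketch in Python, assuming nothing beyond the standard library (the helper name multinomial_pmf is ours, not the text's; math.prod requires Python 3.8+). The final assertion confirms that for t = 2 the formula collapses to the binomial pdf of Equation 10.2.1.

```python
from math import factorial, prod

def multinomial_pmf(ks, ps):
    """P(X1 = k1, ..., Xt = kt) = n! / (k1! k2! ... kt!) * p1^k1 * ... * pt^kt."""
    n = sum(ks)  # by the theorem's side condition, the k_i must sum to n
    coef = factorial(n) // prod(factorial(k) for k in ks)  # multinomial coefficient
    return coef * prod(p ** k for k, p in zip(ks, ps))

# With t = 2 the formula reduces to the binomial pdf (C(10,3) = 120):
assert abs(multinomial_pmf([3, 7], [0.4, 0.6]) - 120 * 0.4**3 * 0.6**7) < 1e-12
```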
Depending on the context, the $r_i$'s associated with the n trials in Figure 10.2.1 can be either single numerical values (or categories) or ranges of numerical values (or categories). Example 10.2.1 illustrates the first type; Example 10.2.2, the second. The only requirements imposed on the $r_i$'s are (1) they must span all of the outcomes possible at a given trial and (2) they must be mutually exclusive.

Example 10.2.1. Suppose a loaded die is tossed twelve times, where

$$p_i = P(\text{Face } i \text{ appears}) = ci, \quad i = 1, 2, \ldots, 6$$

What is the probability that each face will appear exactly twice? Note that

$$\sum_{i=1}^{6} p_i = 1 = \sum_{i=1}^{6} ci = c \cdot \frac{6(6+1)}{2}$$

which implies that $c = \frac{1}{21}$ (and $p_i = i/21$). In the terminology of Theorem 10.2.1, the possible outcomes at each trial are the $t = 6$ faces, 1 ($= r_1$) through 6 ($= r_6$), and $X_i$ is the number of times face i occurs, $i = 1, 2, \ldots, 6$. The question is asking for the probability of the vector

$$(X_1, X_2, X_3, X_4, X_5, X_6) = (2, 2, 2, 2, 2, 2)$$

According to Theorem 10.2.1,

$$P(X_1 = 2, X_2 = 2, \ldots, X_6 = 2) = \frac{12!}{2!\, 2! \cdots 2!} \left(\frac{1}{21}\right)^2 \left(\frac{2}{21}\right)^2 \cdots \left(\frac{6}{21}\right)^2 = 0.0005$$

Example 10.2.2. Five observations are drawn at random from the pdf

$$f_Y(y) = 6y(1 - y), \quad 0 \leq y \leq 1$$

What is the probability that one of the observations lies in the interval [0, 0.25), none in the interval [0.25, 0.50), three in the interval [0.50, 0.75), and one in the interval [0.75, 1.00]?

[Figure 10.2.2: the pdf $f_Y(y) = 6y(1 - y)$ plotted on [0, 1], partitioned at 0.25, 0.50, and 0.75 into the ranges $r_1$ through $r_4$ with areas $p_1$ through $p_4$.]

Figure 10.2.2 shows the pdf being sampled, together with the ranges $r_1, r_2, r_3$, and $r_4$, and the intended disposition of the five data points. The $p_i$'s of Theorem 10.2.1 are now areas. Integrating $f_Y(y)$ from 0 to 0.25, for example, gives

$$p_1 = \int_0^{0.25} 6y(1 - y)\, dy = \left[3y^2 - 2y^3\right]_0^{0.25} = \frac{5}{32}$$

By symmetry, $p_4 = \frac{5}{32}$. Moreover, since the area under $f_Y(y)$ equals 1,

$$p_2 = p_3 = \frac{1}{2}\left(1 - \frac{10}{32}\right) = \frac{11}{32}$$

Let $X_i$ denote the number of observations that fall into the ith range, $i = 1, 2, 3, 4$. The probability associated with the multinomial vector (1, 0, 3, 1), then, is 0.0198:

$$P(X_1 = 1, X_2 = 0, X_3 = 3, X_4 = 1) = \frac{5!}{1!\, 0!\, 3!\, 1!} \left(\frac{5}{32}\right)^1 \left(\frac{11}{32}\right)^0 \left(\frac{11}{32}\right)^3 \left(\frac{5}{32}\right)^1 = 0.0198$$

A Multinomial/Binomial Relationship

Since the multinomial pdf is conceptually a straightforward generalization of the binomial pdf, it should come as no surprise that each $X_i$ in a multinomial vector is, itself, a binomial random variable.

Theorem 10.2.2. Suppose the vector $(X_1, X_2, \ldots, X_t)$ is a multinomial random variable with parameters $n, p_1, p_2, \ldots,$ and $p_t$. Then the marginal distribution of $X_i$, $i = 1, 2, \ldots, t$, is the binomial pdf with parameters n and $p_i$.

Proof. To deduce the pdf for $X_i$, we need simply dichotomize the possible outcomes at each trial into "$r_i$" and "not $r_i$." Then $X_i$ becomes, in effect, the number of "successes" in n independent Bernoulli trials, where the probability of success at any given trial is $p_i$. By Theorem 3.2.1, it follows that $X_i$ is a binomial random variable with parameters n and $p_i$. ∎
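Both worked examples, and the marginal claim of Theorem 10.2.2, can be spot-checked numerically. The sketch below assumes SciPy is installed; scipy.stats.multinomial.pmf evaluates the pmf of Theorem 10.2.1 directly.

```python
from scipy.stats import binom, multinomial

# Example 10.2.1: loaded die, p_i = i/21, twelve tosses, each face exactly twice
print(multinomial.pmf([2] * 6, n=12, p=[i / 21 for i in range(1, 7)]))  # ~0.0005

# Example 10.2.2: five draws, class probabilities (5/32, 11/32, 11/32, 5/32)
print(multinomial.pmf([1, 0, 3, 1], n=5, p=[5/32, 11/32, 11/32, 5/32]))  # ~0.0198

# Theorem 10.2.2: summing a trinomial pmf over k2 recovers Binomial(n, p1) for X1
n, p = 4, [0.2, 0.3, 0.5]
for k1 in range(n + 1):
    marginal = sum(multinomial.pmf([k1, k2, n - k1 - k2], n=n, p=p)
                   for k2 in range(n - k1 + 1))
    assert abs(marginal - binom.pmf(k1, n, p[0])) < 1e-12
```

The last loop mirrors the proof's dichotomization: collapsing every outcome other than $r_1$ into a single "not $r_1$" category leaves a binomial count.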
Comment. Theorem 10.2.2 gives the pdf for any given $X_i$ in a multinomial vector. Since that pdf is the binomial, we also know that the mean and variance of each $X_i$ are $E(X_i) = np_i$ and $\text{Var}(X_i) = np_i(1 - p_i)$, respectively.

Example 10.2.3. A physics professor has just given an exam to fifty students enrolled in a thermodynamics class. From past experience, she has reason to believe that the scores will be normally distributed with $\mu = 80.0$ and $\sigma = 5.0$. Students scoring ninety or above will receive A's; between eighty and eighty-nine, B's; and so on. What are the expected values and variances for the numbers of students receiving each of the five letter grades?

Let Y denote the score a student earns on the exam, and let $r_1, r_2, r_3, r_4$, and $r_5$ denote the ranges corresponding to the letter grades A, B, C, D, and F, respectively. Then

$$p_1 = P(\text{Student earns an A}) = P(90 \leq Y \leq 100) = P\left(\frac{90 - 80}{5} \leq \frac{Y - 80}{5} \leq \frac{100 - 80}{5}\right) = P(2.00 \leq Z \leq 4.00) = 0.0228$$

If $X_1$ is the number of A's that are earned,

$$E(X_1) = np_1 = 50(0.0228) = 1.14$$

and

$$\text{Var}(X_1) = np_1(1 - p_1) = 50(0.0228)(0.9772) = 1.11$$

Table 10.2.1 lists the means and variances for all the $X_i$'s. Each is an illustration of the Comment following Theorem 10.2.2.

Table 10.2.1

Score            Grade    p_i      E(X_i)    Var(X_i)
90 ≤ Y ≤ 100     A        0.0228    1.14      1.11
80 ≤ Y < 90      B        0.4772   23.86     12.47
70 ≤ Y < 80      C        0.4772   23.86     12.47
60 ≤ Y < 70      D        0.0228    1.14      1.11
Y < 60           F        0.0000    0.00      0.00
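Table 10.2.1 can be reproduced from the normal model. A short sketch, assuming SciPy's norm for the normal cdf; because it uses the exact cdf rather than a rounded z table, the printed values differ slightly from the table (e.g., $p_1$ comes out 0.0227 rather than 0.0228).

```python
from scipy.stats import norm

n, mu, sigma = 50, 80.0, 5.0  # class size and presumed score distribution

# Grade ranges: A = [90, 100], B = [80, 90), C = [70, 80), D = [60, 70), F = (-inf, 60)
grades = [("A", 90, 100), ("B", 80, 90), ("C", 70, 80),
          ("D", 60, 70), ("F", float("-inf"), 60)]

print(f"{'Grade':>5} {'p_i':>8} {'E(X_i)':>8} {'Var(X_i)':>9}")
for grade, lo, hi in grades:
    p = norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)  # area under the normal curve
    print(f"{grade:>5} {p:8.4f} {n * p:8.2f} {n * p * (1 - p):9.2f}")
```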