The Bernoulli process and discrete distributions
Math 217 Probability and Statistics
Prof. D. Joyce, Fall 2014
We've been looking at Bernoulli processes for a while, but didn't give them a name. Repeatedly flipping a coin is an example. Repeatedly tossing a die is another, if you only care whether a particular number comes up or not. Sampling with replacement is another.

The definition. A single trial for a Bernoulli process, called a Bernoulli trial, ends with one of two outcomes, one often called success, the other called failure. Success occurs with probability p while failure occurs with probability 1 − p. For a Bernoulli process, we'll always denote 1 − p as q. Rather than use the outcomes heads and tails that are appropriate for flipping coins, we'll assume the outcomes are 0 and 1, where 1 denotes success and 0 denotes failure. A random variable X that has two outcomes, 0 and 1, where P(X = 1) = p, is said to have a Bernoulli distribution with parameter p, Bernoulli(p).

The Bernoulli process consists of repeated independent Bernoulli trials with the same parameter p. Another way of expressing that is that Bernoulli trials form a random sample from the Bernoulli population. It's assumed that the probability p applies to every trial and that the outcomes of each trial are independent of all the rest. We'll use the random variable X_i for the outcome of the i-th trial in a Bernoulli process. The first n trials of a Bernoulli process X = (X_1, X_2, ..., X_n) form a random sample from a Bernoulli distribution with parameter p.

The first serious development in the theory of probability was in the 1650s, when Pascal and Fermat investigated the binomial distribution in the special case p = 1/2. Pascal published the resulting theory of binomial coefficients and properties of what we now call Pascal's triangle. In the very early 1700s Jacob Bernoulli extended these results to general values of p. Bernoulli processes are named in honor of Jacob Bernoulli, who studied them in detail and proved the first version of the Central Limit Theorem, something we'll look at later in the course.

Sampling with replacement. Sampling with replacement occurs when a set of N elements has a subset of M "preferred" elements, and n elements are chosen at random, but the n elements don't have to be distinct. In other words, after one element is chosen, it's put back in the set so it can be chosen again. Selecting a preferred element is success, and that happens with probability p = M/N, while selecting a nonpreferred element is failure, and that happens with probability q = 1 − p = (N − M)/N. Thus, sampling with replacement is a Bernoulli process.

Distributions associated to the Bernoulli process. You can ask various questions about a Bernoulli process, and the answers to these questions have various distributions.

• If you ask how many successes there will be among n Bernoulli trials, then the answer will have a binomial distribution, Binomial(n, p). Formally, it's the sum X_1 + X_2 + ... + X_n of a Bernoulli sample of n i.i.d. random variables with a Bernoulli(p) distribution. We've looked at that before and determined that the probability that there are k successes is

    (n choose k) p^k q^(n−k).

• If you ask how many trials it will take to get the first success, then the answer will have a geometric distribution, Geometric(p).

• If you ask how many trials there will be to get the r-th success, then the answer will have a negative binomial distribution, NegativeBinomial(p, r).

• Given that there are M successes among N trials, if you ask how many of the first n trials are successes, then the answer will have a Hypergeometric(N, M, n) distribution.

The geometric distribution. Geometric(p). In a Bernoulli process with probability p of success, the number of trials X it takes to get the first success has a geometric distribution. Let's determine the probability that the first success occurs on the t-th trial.

The probability that the first success occurs on the first trial is, of course, p. In order for the first success to occur on the second trial, the first trial has to be a failure, which occurs with probability 1 − p, followed immediately by a success, which occurs with probability p. Thus the probability that the first success occurs on the second trial is qp. You can see where this is going. For the first success to occur on the t-th trial, there have to be t − 1 failures followed by one success, and that says the probability of the first success on trial t is

    q^(t−1) p.

The negative binomial distribution. NegativeBinomial(p, r). The negative binomial distribution generalizes the geometric distribution. Whereas the geometric distribution gives the probability that the first success occurs on the t-th trial, the negative binomial distribution gives the probability that the r-th success occurs on the t-th trial, where r is any positive integer. Thus, Geometric(p) = NegativeBinomial(p, 1).

So what is that probability? Among the first t − 1 trials there have to be exactly r − 1 successes and t − r failures, and that happens with probability (t−1 choose r−1) p^(r−1) q^(t−r); then the t-th trial has to be a success, and that happens with probability p. Therefore, the probability that the r-th success occurs on the t-th trial is

    (t−1 choose r−1) p^r q^(t−r).

Note: there are other parameterizations besides the p and r used here and in our text. It's not uncommon for a family of distributions to be parameterized in different ways. By the way, the "negative" in the name of this family comes from the −1's in the binomial coefficients. Perhaps a better name would have been depressed binomial distribution or something like that.

The hypergeometric distribution. Hypergeometric(N, M, n). The hypergeometric distribution gives you a certain conditional probability that occurs in a Bernoulli process. Given that there are M successes among N trials, it tells you the probability that there are k successes among the first n trials. Here, k can be any nonnegative integer.

To compute this probability, take the sample space to be the (N choose M) ways that you can arrange M successes among the N trials. Note that they each have the same probability, so we're dealing with uniform discrete probability here. Our answer won't involve p. Among those (N choose M) arrangements, how many of them have k successes among the first n trials? There are (n choose k) ways to place the k successes among the first n trials, but then we have (N−n choose M−k) ways to place the remaining M − k successes among the remaining N − n trials. Therefore, given that there are M successes among N trials, the probability that there are k successes among the first n trials is

    (n choose k) (N−n choose M−k) / (N choose M).

Sampling without replacement. Note that this is the same probability that we get from sampling without replacement. When you have N balls, M of them preferred, and select n of them, that formula gives you the probability that you've selected k preferred ones.

Math 217 Home Page at http://math.clarku.edu/~djoyce/ma217/
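To see sampling with replacement behaving as a Bernoulli process, here is a minimal simulation sketch in Python (not part of the notes; the parameter values N = 10, M = 4 are arbitrary choices for illustration). Each draw with replacement is one Bernoulli trial with success probability p = M/N = 0.4.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

N, M = 10, 4            # N elements, M of them "preferred"
p = M / N               # success probability for a single draw

# Each draw with replacement is one Bernoulli trial:
# success (1) if the chosen element is preferred, failure (0) otherwise.
draws = 100_000
successes = sum(1 for _ in range(draws) if random.randrange(N) < M)

# The empirical success rate should be close to p = 0.4.
print(successes / draws)
```

Because each element is returned to the set before the next draw, the trials stay independent with the same p, which is exactly the Bernoulli-process requirement.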
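The binomial probability (n choose k) p^k q^(n−k) from the first bullet can be computed directly with Python's math.comb. This is a sketch, not part of the notes; n = 10 and p = 0.3 are arbitrary example values.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials:
    (n choose k) p^k q^(n-k), with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# By the binomial theorem, the probabilities over k = 0, ..., n sum to 1.
n, p = 10, 0.3
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print(round(total, 12))  # 1.0
```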
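The relation Geometric(p) = NegativeBinomial(p, 1) can be verified numerically from the two formulas q^(t−1) p and (t−1 choose r−1) p^r q^(t−r). Again a sketch; the function names are mine and p = 0.25 is arbitrary.

```python
from math import comb

def geometric_pmf(t, p):
    """First success on trial t: q^(t-1) p."""
    return (1 - p)**(t - 1) * p

def negative_binomial_pmf(t, r, p):
    """r-th success on trial t: (t-1 choose r-1) p^r q^(t-r)."""
    return comb(t - 1, r - 1) * p**r * (1 - p)**(t - r)

p = 0.25
# With r = 1, (t-1 choose 0) = 1 and the negative binomial pmf
# collapses to the geometric pmf.
for t in range(1, 30):
    assert abs(geometric_pmf(t, p) - negative_binomial_pmf(t, 1, p)) < 1e-15
```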
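The hypergeometric formula can likewise be computed and sanity-checked: summing it over all feasible k gives 1 (that's Vandermonde's identity), and, as the notes observe, nothing in the computation involves p. A sketch with arbitrary values N = 20, M = 8, n = 5:

```python
from math import comb

def hypergeometric_pmf(k, N, M, n):
    """Given M successes among N trials, probability of k successes
    among the first n trials: (n choose k)(N-n choose M-k) / (N choose M)."""
    return comb(n, k) * comb(N - n, M - k) / comb(N, M)

N, M, n = 20, 8, 5
# k runs over the feasible values 0, ..., min(n, M); infeasible terms
# vanish because comb returns 0 when the top index is too small.
total = sum(hypergeometric_pmf(k, N, M, n) for k in range(min(n, M) + 1))
print(round(total, 12))  # 1.0
```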