Lecture 8: Geometric and Binomial Distributions
Statistics 101, Mine Çetinkaya-Rundel
September 22, 2011

Recap: Dangling thread from last time

From OpenIntro quiz 2:

$$P(\text{took} \mid \text{not valuable}) = \frac{P(\text{took and not valuable})}{P(\text{not valuable})} = \frac{0.1403}{0.1403 + 0.5621} = 0.1997$$

Recap: Clicker question (graded)

Which of the following is false?
(a) The Z score for the mean of a distribution of any shape is 0.
(b) Half the observations in a distribution of any shape have positive Z scores.
(c) The median of a right skewed distribution has a negative Z score.
(d) The majority of the values in a left skewed distribution have positive Z scores.

Milgram experiment

Stanley Milgram, a Yale University psychologist, conducted a series of experiments on obedience to authority starting in 1963.
Experimenter (E) orders the teacher (T), the subject of the experiment, to give severe electric shocks to a learner (L) each time the learner answers a question incorrectly.
The learner is actually an actor, and the electric shocks are not real, but a prerecorded sound is played each time the teacher administers an electric shock.

Milgram experiment (cont.)

These experiments measured the willingness of study participants to obey an authority figure who instructed them to perform acts that conflicted with their personal conscience.
Milgram found that about 65% of people would obey authority and give such shocks.
Over the years, additional research suggested this number is approximately consistent across communities and time.

Bernoulli random variables

Each person in Milgram's experiment can be thought of as a trial.
A person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock.
Since only 35% of people refused to administer a shock, the probability of success is $p = 0.35$.
When an individual trial has only two possible outcomes, it is called a Bernoulli random variable.

Simulation of Milgram's experiment

Imagine a hat with 100 pieces of paper in it: 35 are marked "refuse" and 65 are marked "shock". These represent the subjects in Milgram's experiment.

    > outcomes <- c(rep("refuse", 35), rep("shock", 65))

Let's simulate a sample of 10 subjects, with replacement so that the probability of success stays constant at 0.35 at each trial.

    > simSample <- sample(outcomes, size = 10, replace = TRUE)

Now let's take a look at who is in our simulated sample:

    > table(simSample)
    simSample
     shock refuse
         7      3

The sample proportion of successes, $\hat{p}$, is $3/10$.

Geometric distribution

Dr. Smith wants to repeat Milgram's experiments but she only wants to sample people until she finds someone who will not inflict a severe shock. What is the probability that she stops after the first person?

$$P(\text{1st person refuses}) = 0.35$$

... the third person?

$$P(\text{1st and 2nd shock; 3rd refuses}) = \overset{S}{0.65} \times \overset{S}{0.65} \times \overset{R}{0.35} = 0.65^2 \times 0.35 \approx 0.15$$

... the tenth person?

$$P(\text{9 shock; 10th refuses}) = \underbrace{\overset{S}{0.65} \times \cdots \times \overset{S}{0.65}}_{9 \text{ of these}} \times \overset{R}{0.35} = 0.65^9 \times 0.35 \approx 0.0072$$

Geometric distribution (cont.)

The geometric distribution describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables.
independence: outcomes of trials don't affect each other
identical: the probability of success is the same for each trial

Geometric probabilities
If $p$ represents probability of success, $(1 - p)$ represents probability of failure, and $n$ represents number of independent trials,

$$P(\text{success on the } n^{\text{th}} \text{ trial}) = (1 - p)^{n-1} p$$

Clicker question

Can we calculate the probability of rolling a 6 for the first time on the 6th roll of a die using the geometric distribution?
(a) no, on the roll of a die there are more than 2 possible outcomes
(b) yes, why not

Expected value

How many people is Dr. Smith expected to test to find the first one that refuses to administer the shock?
The expected value, or the mean, of a geometric distribution is defined as $\frac{1}{p}$.

$$\mu = \frac{1}{p} = \frac{1}{0.35} = 2.86$$

She is expected to test 2.86 people, including the first one that refuses to administer the shock.
But how can she test a non-whole number of people?

Expected value and its variability

Mean and standard deviation of geometric distribution

$$\mu = \frac{1}{p} \qquad \sigma = \sqrt{\frac{1 - p}{p^2}}$$

Going back to Dr. Smith's experiment:

$$\sigma = \sqrt{\frac{1 - p}{p^2}} = \sqrt{\frac{1 - 0.35}{0.35^2}} = 2.3$$

Dr. Smith is expected to test 2.86 people, including the first one that refuses to administer the shock, give or take 2.3 people.
These values only make sense in the context of repeating the experiment many, many times.

Binomial distribution

Suppose we randomly select four individuals to participate in Dr. Smith's experiment. What is the probability that exactly one of them will refuse to administer the shock?
Let's call these people Allen (A), Brittany (B), Caroline (C), and Damian (D). Each one of the four scenarios below will satisfy the condition of "exactly one of them refusing to administer the shock":

1. A = refuse, B = shock, C = shock, D = shock: $0.35 \times 0.65^3 = 0.0961$
2. A = shock, B = refuse, C = shock, D = shock: $0.35 \times 0.65^3 = 0.0961$
3. A = shock, B = shock, C = refuse, D = shock: $0.35 \times 0.65^3 = 0.0961$
4. A = shock, B = shock, C = shock, D = refuse: $0.35 \times 0.65^3 = 0.0961$

The probability of exactly one out of four people refusing to administer the shock is the sum of all of these probabilities.

$$4 \times 0.0961 = 0.3844$$

Binomial distribution

The question from the prior slide asked for the probability of a given number of successes, $k$, in a given number of trials, $n$ ($k = 1$ success in $n = 4$ trials), and we calculated this probability as

$$\text{\# of scenarios} \times P(\text{single scenario})$$

\# of scenarios: there is a less tedious way to figure this out, we'll get to that shortly...
$P(\text{single scenario}) = p^k (1 - p)^{(n-k)}$: probability of success to the power of number of successes, probability of failure to the power of number of failures

The binomial distribution describes the probability of having exactly $k$ successes in $n$ independent Bernoulli trials with probability of success $p$.

Counting the # of scenarios

Earlier we wrote out all possible scenarios that fit the condition of exactly one person refusing to administer the shock. If $n$ were larger and/or $k$ were different from 1, writing out the scenarios would get even more tedious. For example, what if $n = 9$ and $k = 2$:

    RRSSSSSSS
    SRRSSSSSS
    SSRRSSSSS
    ...
    SSRSSRSSS
    ...
    SSSSSSSRR

Writing out all possible scenarios is way too tedious, and is prone to errors.

Calculating the # of scenarios

Choose function
The choose function is useful for calculating the number of ways to choose $k$ successes in $n$ trials.

$$\binom{n}{k} = \frac{n!}{k!(n - k)!}$$

$k = 1$, $n = 4$: $\binom{4}{1} = \frac{4!}{1!(4-1)!} = \frac{4 \times 3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} = 4$
$k = 2$, $n = 9$: $\binom{9}{2} = \frac{9!}{2!(9-2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36$

Note: You can also use R for these calculations:

    > choose(9,2)
    [1] 36

Properties of the choose function

If $k = 1$, only 1 of the $n$ trials results in a success; it could be the first, the second, ..., or the $n^{\text{th}}$ trial, so there are $n$ ways this can happen:

$$\binom{n}{1} = n$$

If $k = n$, all $n$ trials result in a success, and there's only one way this can happen:

$$\binom{n}{n} = 1$$

If $k = 0$, all $n$ trials result in a failure, and there's only one way this can happen as well:

$$\binom{n}{0} = 1$$

Binomial distribution (cont.)

Binomial probabilities
If $p$ represents probability of success, $(1 - p)$ represents probability of failure, $n$ represents number of independent trials, and $k$ represents number of successes,

$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k (1 - p)^{(n-k)}$$

We can use the binomial distribution to calculate the probability of $k$ successes in $n$ trials, as long as
1. the trials are independent
2. the number of trials, $n$, is fixed
3. each trial outcome can be classified as a success or a failure
4. the probability of success, $p$, is the same for each trial

Expected value

A September 2011 Gallup survey suggests that 38% of Americans are planning on voting for Barack Obama in the 2012 presidential election.

Expected value and its variability

Mean and standard deviation of binomial distribution
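
As a rough check on the geometric and binomial calculations worked earlier for Dr. Smith's experiment, the sketch below recomputes them in R. It is not part of the original slides: the helper name `geom_first_success` and the variable names are made up for illustration. It relies on base R's `dgeom()`, which counts failures before the first success (so trial n corresponds to n - 1 failures), and on `dbinom()` and `choose()`.

```r
# Probability of success (a subject refuses), taken from the lecture.
p <- 0.35

# Geometric: P(first success on trial n) = (1 - p)^(n - 1) * p.
# Hypothetical helper: dgeom() takes the number of failures before
# the first success, so trial n is passed as n - 1.
geom_first_success <- function(n, p) dgeom(n - 1, prob = p)

# Dr. Smith stops after the 1st, 3rd, or 10th person.
geom_probs <- geom_first_success(c(1, 3, 10), p)

# Mean and standard deviation of the geometric distribution.
geom_mean <- 1 / p
geom_sd <- sqrt((1 - p) / p^2)

# Binomial: exactly k = 1 refusal among n = 4 people.
n_scenarios <- choose(4, 1)                      # number of scenarios
by_hand <- n_scenarios * p^1 * (1 - p)^(4 - 1)   # scenarios x P(single scenario)
built_in <- dbinom(1, size = 4, prob = p)        # same quantity via dbinom()

print(round(geom_probs, 4))
print(round(c(mean = geom_mean, sd = geom_sd), 2))
print(c(by_hand = by_hand, built_in = built_in))
```

The unrounded binomial result is about 0.3845; the 0.3844 obtained by hand comes from rounding each scenario to 0.0961 before multiplying by 4.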