
Lab 5 for Math 17: Distributions and Applications

Recall: The distribution formed by considering the value of a statistic for every possible sample of a given size n from the population is called the sampling distribution of the statistic. It is usually too difficult to enumerate all possible samples and compute all possible values of the statistic by hand, but we can approximate the distribution by taking a “large” number of samples (via simulation) to help visualize it. ________ helps us determine the form of some common sampling distributions.
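For a very small population, every possible sample can in fact be listed. The following is a minimal Python sketch of that idea; the five population values and the sample size n = 2 are made up for illustration and are not part of the lab.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population (values invented for illustration only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# The statistic of interest (here, the sample mean) for each sample.
sample_means = [mean(s) for s in samples]

# The collection of these values is the sampling distribution of the mean.
print("number of possible samples:", len(samples))
print("sample means:", sorted(sample_means))
print("mean of sample means:", mean(sample_means))
print("population mean:", mean(population))
```

With a realistic population the number of possible samples is far too large to list, which is why the lab approximates the sampling distribution by simulation instead.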

1 Coin Activity

Suppose we want to understand how the sample mean year on pennies behaves. The population of pennies we have available for investigation is a collection of 1002 pennies obtained from the UMass Five College Credit Union on August 25, 2010 ($10 in pennies was asked for). What do you think the distribution of year looks like for the population of pennies? Explain.

Obtain a sample of 30 pennies, and compute the sample mean year. What value do you get? We note that due to time constraints, we are not sampling with replacement.

Compare your mean value with the class (class graph). Are the values very different?

What does the distribution of sample mean year look like based on the graph?

Do you think looking at roughly 30 samples of size 30 is good enough to tell us about the distribution of sample mean year when n is 30?
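One way to explore this question is to compare roughly 30 samples with a much larger number of samples. The Python sketch below does so under loud assumptions: the actual penny years are not available here, so `population` is a hypothetical stand-in of 1002 years in which recent years are common and older years are rare. Only the sample size of 30 and sampling without replacement come from the activity.

```python
import random
from statistics import mean, stdev

rng = random.Random(17)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the 1002 pennies (NOT the real data):
# recent years are common, older years increasingly rare.
population = [2010 - min(int(rng.expovariate(1 / 12)), 50) for _ in range(1002)]

def sample_mean_years(reps, n=30):
    """Mean year of `reps` samples of size n, each drawn without replacement."""
    return [sum(rng.sample(population, n)) / n for _ in range(reps)]

class_sized = sample_mean_years(30)   # roughly one sample per student
many = sample_mean_years(10_000)      # a "large" number of samples

print(f"population mean year: {mean(population):.2f}")
print(f"30 samples:     mean {mean(class_sized):.2f}, sd {stdev(class_sized):.2f}")
print(f"10,000 samples: mean {mean(many):.2f}, sd {stdev(many):.2f}")
```

Changing the seed shows how much the summary of only 30 sample means moves from run to run, while the summary of 10,000 sample means stays nearly fixed.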

2 Sampling Distribution of the Sample Proportion

For the purposes of this example, the bin filled with balls represents the population of all possible birds that could be captured as part of an upcoming study looking for a genetic trait which is known to be harmful to carriers and sometimes fatal to those which exhibit the trait (think sickle cell anemia idea but for birds). Let white balls denote birds that do not have the trait and are also not carriers. Let red balls denote birds that are carriers but do not themselves exhibit the trait, and let green balls denote birds that do exhibit the trait (also then carriers).

Looking at the bin, what are your initial guesses as to the composition of this population? ____% white, ____% red, ____% green, with ____ total balls

With the understanding that you could choose a combination of colors (e.g., red + green = all carriers) and estimate the proportion for that combination, what combination (or single color) do you want to investigate? (You cannot choose single green vs. white+red).

What color/combination did the class decide on?

Working in groups of 2 and taking turns as appropriate, each group should take samples of size 25, 50, and 100 from the bin and count the number of balls meeting the criteria above (the class color/combination selected). Both members need to count the balls meeting the chosen criteria and agree on the count before recording it for the class. Be sure to return each sample to the population without losing any members! (Also, don’t take all three samples at once; take one, return the balls, then take the second, and so on.)

Sample size    Small (n = 25)    Medium (n = 50)    Large (n = 100)
Count          ________          ________           ________

Explain why this is NOT equivalent to capture-recapture sampling.

What does it look like the counts are close to for each sample size? What proportion is that (roughly)?

(Class values will be entered into R/Rcmdr for analysis). What values does the class get as the average of the sample proportions for each sample size?

What values does the class get as the standard deviation of the sample proportions for each sample size?

The population proportion corresponding to the class color/combination is ____%. Does the sample proportion appear to be an unbiased statistic?

What does the effect of sample size on the standard deviation of the sampling distribution of $\hat{p}$ appear to be?

What shapes do the distributions for each sample size have (will be hard to tell with our small number of repetitions)?

The sampling distribution of $\hat{p}$ can be described as approximately normal for large sample sizes where p is not too near 0 or 1, with mean $\mu_{\hat{p}} = p$, the population proportion, and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ (assuming that not more than 10% of the population is used as a sample).

For n = 25, 50, 100, compute the standard deviations for $\hat{p}$ based on the now known population proportion. Do the observed standard deviations of the sample proportions match up?
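The comparison between theoretical and observed standard deviations can be sketched in Python as follows. The population proportion p = 0.30 and the bin of 1000 balls are assumptions made only for illustration, since the class value is filled in during the lab. The function names are likewise hypothetical.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(5)  # fixed seed so the run is repeatable

# Assumed values (NOT from the lab): 1000 balls, 30% matching the chosen color(s).
p = 0.30
bin_of_balls = [1] * 300 + [0] * 700

def theoretical_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def simulated_props(n, reps=5000):
    """Sample proportions from `reps` samples of size n, drawn without replacement."""
    return [sum(rng.sample(bin_of_balls, n)) / n for _ in range(reps)]

for n in (25, 50, 100):
    props = simulated_props(n)
    print(f"n={n:3d}  theoretical sd {theoretical_sd(p, n):.4f}  "
          f"simulated mean {mean(props):.4f}  simulated sd {stdev(props):.4f}")
```

Because the balls are drawn without replacement, the simulated standard deviation for n = 100 (10% of this assumed bin) can be expected to fall slightly below the formula value, which is the reason for the 10% condition.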

3 Sampling Distribution of the Sample Mean

For sample means, we will learn about the sampling distribution via an applet (link online). Point your (Java-enabled) browser to http://onlinestatbook.com/stat_sim/sampling_dist/index.html

In this applet, when you first hit Begin, a histogram of a normal distribution is displayed at the top of the screen. This is the parent population from which samples are taken (think of it as the bin of balls), except that it shows the distribution. The mean of that distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same for a normal distribution, the two lines overlap. The red line extends from the mean one standard deviation in each direction. The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The option N in those histograms is the size of the sample you are drawing from the population. We will be exploring the distribution of the sample mean by drawing many samples from the parent distribution and examining the distribution of the sample means we get.

Step 1. Describe the parent population. What distribution is it and what is its mean and standard deviation?

Step 2. You can see the third histogram is already set to “Mean”, with a sample size of N = 5. Click Animated sample once. The animation shows five observations being drawn from the parent distribution. Their mean is computed and dropped down onto the third histogram. For your sample, what was the sample mean?

Step 3. Click Animated sample again. A new set of five observations is drawn, and their mean is computed and dropped as the second sample mean onto the third histogram. What did the mean of the sample means (yes, we are interested in the mean of sample means as part of the sampling distribution) change to?

Step 4. Click Animated sample one more time. What did the mean of the sample means update to now?

Step 5. Click 10,000. This takes 10,000 samples at once (no more animation) and will place those 10,000 sample means on the third histogram and update the mean and standard deviation of the sample means. Record the mean and standard deviation of the sample means. What shape does this third histogram have? How do these findings compare to the parent distribution?

Step 6. Hit Clear Lower 3 in the upper right corner. Change N = 5 to N = 25 for the third histogram. Click Animated sample at least once (convince yourself it is actually drawing samples of 25 now). Then take 10,000 at once. Record the mean and standard deviation of the sample means. What shape does the third histogram have? How do these findings compare to the parent distribution?

Step 7. Compare the different standard deviations from Steps 5 and 6. What effect does sample size appear to have on standard deviation of the sample means?

Step 8. Hit Clear Lower 3. Change the parent distribution to Skewed. What are the new mean and standard deviation of the parent distribution? Which direction is this distribution skewed?

Step 9. Set N = 5 back for the third histogram. Set “Mean” and N = 25 for the fourth histogram. Hit 10,000 at once. (This will take 10,000 samples of size 5, compute the sample means and put those means in the third histogram, as well as take 10,000 samples of size 25, compute the sample means and put those means in the fourth histogram). What do the distributions look like for the third and fourth histograms? Are they skewed like the parent population? What are the means and standard deviations for each histogram?

Step 10. Hit Clear Lower 3. Change the parent distribution to Custom. Draw in a custom distribution (left click and drag the mouse over the top histogram). Sketch your custom distribution below. What are its mean and standard deviation?

Step 11. Hit 10,000 at once (leave the settings on the third and fourth histograms alone). (You could click Animated sample once to convince yourself it is really drawing from your new distribution.) What do the third and fourth histograms look like? Anything like the parent distribution? What are their means and standard deviations?

The sampling distribution of the sample mean, $\bar{X}$, can be described as having mean $\mu_{\bar{X}} = \mu$, the population mean, and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. The distribution is exactly normal if the parent population is normal. Finally, the Central Limit Theorem tells us the distribution will be approximately normal with the mean and standard deviation stated above if n is sufficiently large, even if the population distribution is not normal.
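The applet steps above can be imitated with a short simulation. This Python sketch is not the applet's code. It assumes an exponential parent distribution with mean 1 and standard deviation 1 as a stand-in for a skewed parent, and the names `MU`, `SIGMA`, and `sample_means` are hypothetical.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(2010)  # fixed seed so the run is repeatable

# Assumed skewed parent (NOT the applet's): exponential, mean 1, sd 1.
MU, SIGMA = 1.0, 1.0

def sample_means(n, reps=10_000):
    """Means of `reps` samples of size n drawn from the assumed parent."""
    return [sum(rng.expovariate(1 / MU) for _ in range(n)) / n
            for _ in range(reps)]

results = {}
for n in (5, 25):
    means = sample_means(n)
    results[n] = (mean(means), stdev(means))
    print(f"N={n:2d}: mean of sample means {results[n][0]:.3f}, "
          f"sd of sample means {results[n][1]:.3f}, "
          f"sigma/sqrt(N) {SIGMA / math.sqrt(n):.3f}")
```

Plotting a histogram of `sample_means(5)` next to `sample_means(25)` would show the skewness fading as N grows, which is the behavior Steps 9 and 11 ask about.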

4 Application Example

A rental car company is interested in the number of miles put on their rental cars by their clients as part of a project where they may trade in some cars in the Cash for Clunkers program. From past experience, they believe the population distribution of mileage has a mean of 60 miles and a standard deviation of 60 miles. They obtain a random sample of 50 mileages from their rental car fleet and obtain a sample mean of 73.31 miles. The company executives are worried: has the average number of miles put on the cars gone up? Your job is to help them figure out if the data suggest an increase in average number of miles put on the cars.

a. What is the sampling distribution of the sample mean mileage put on rental cars? (Give distribution type, mean, and standard deviation.) What result allows you to provide this distribution?

b. What is the probability you would see a sample mean of 73.31 or greater if the population mean and standard deviation were both really 60?

c. Would you tell the executives that the average number of miles put on the rental cars has increased? (How unusual is 73.31 if the mean is really 60, assuming the standard deviation is correct?)

d. In practice, do you think the standard deviation of the parent distribution would be known? How would you get around it being unknown? What value could you substitute for σ in our calculations relating to the CLT? This swap and its consequences will be a focus of our discussions next week as we start developing confidence intervals.
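The calculation in parts a and b can be sketched in Python using the normal approximation from the Central Limit Theorem. The numbers come from the problem statement; the variable names are illustrative, and treating n = 50 as "sufficiently large" is an assumption of the sketch.

```python
import math
from statistics import NormalDist

mu, sigma, n = 60, 60, 50   # population mean, population sd, sample size
x_bar = 73.31               # observed sample mean

# Standard deviation of the sample mean: sigma / sqrt(n).
se = sigma / math.sqrt(n)

# Approximate sampling distribution of the sample mean (CLT).
sampling_dist = NormalDist(mu, se)

# Standardized distance of the observed mean from the assumed mean.
z = (x_bar - mu) / se

# P(sample mean >= 73.31) if the population mean really is 60.
p_upper = 1 - sampling_dist.cdf(x_bar)

print(f"standard deviation of the sample mean: {se:.3f}")
print(f"z-score: {z:.3f}")
print(f"P(sample mean >= {x_bar}): {p_upper:.4f}")
```

How small this probability must be before the result counts as "unusual" is a judgment left to part c.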

5 More Applications

1. Suppose 40 percent of the voters in a large city prefer candidate Q for mayor. A random sample of 2400 city voters is taken.

a. What is the sampling distribution of the sample proportion of city voters who prefer candidate Q for mayor? Check that this distribution is valid.

b. What is the probability that the sample taken results in a sample proportion of .426 or higher?

2. A researcher is investigating deaths among a new invasive species of beetles treated with various insecticides. Age of death is recorded for fully matured adult beetles at various doses of insecticides. Since only fully matured adult beetles are included, and because ages at death are usually not bell-shaped, the researcher records age at death for 50 beetles at each insecticide/dosage level to help study average age at death.

a. What is the significance of studying 50 beetles at each treatment level if you know you want to examine the sample mean?

b. Suppose the population mean age at death for the beetle population at a specific treatment level is 20 days with a population standard deviation of 3 days. What is the sampling distribution of the sample mean for that treatment level for the sample of size 50 taken?

c. What is the probability a sample of size 50 results in a sample mean between 19 and 21?
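Both problems above reduce to a normal-approximation calculation, sketched here in Python. The inputs are taken from the problem statements. The validity check uses the common rule of thumb that np and n(1-p) should each be at least 10, which is an assumption; the course may use a different threshold.

```python
import math
from statistics import NormalDist

# Problem 1: sample proportion, p = 0.40, n = 2400.
p, n = 0.40, 2400
# Assumed rule of thumb for validity: np >= 10 and n(1-p) >= 10.
valid = n * p >= 10 and n * (1 - p) >= 10
se_p = math.sqrt(p * (1 - p) / n)               # sd of the sample proportion
prob_1b = 1 - NormalDist(p, se_p).cdf(0.426)    # P(p-hat >= 0.426)

# Problem 2: sample mean, mu = 20 days, sigma = 3 days, sample size 50.
mu, sigma, m = 20, 3, 50
se_mean = sigma / math.sqrt(m)                  # sd of the sample mean
dist = NormalDist(mu, se_mean)
prob_2c = dist.cdf(21) - dist.cdf(19)           # P(19 < X-bar < 21)

print(f"Problem 1: valid={valid}, sd of p-hat={se_p:.4f}, P(p-hat >= .426)={prob_1b:.4f}")
print(f"Problem 2: sd of X-bar={se_mean:.4f}, P(19 < X-bar < 21)={prob_2c:.4f}")
```

In Problem 2 the normal approximation rests on the sample size of 50 being large enough for the Central Limit Theorem to apply, which is the point of part a.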

6 To Turn In

In a recent election, 62 percent of voters voted in favor of a new law. A related law is coming up for a vote in a neighboring state. A random sample of 80 voters in the neighboring state reveals that 43 of the 80 are in favor of the related law. If the percent in favor is really the same in both states, how unusual is the result of the sample poll or something more extreme (for direction of extreme use smaller values)?
