Probability: What Affects Estimates

Spring 2010 Math 263 Deb Hughes Hallett

Class 13: Confidence Intervals for Means

Statistical Inference

We take a sample to learn about a population. There are two ways that we can draw a conclusion:

Estimation, using confidence intervals. Here we use the sample to make an estimate of a population parameter, such as the population mean, $\mu$, or the population proportion, $p$.
--For example, estimate the mean income in a community from a sample.

Hypothesis testing. Here we test a claim about a population.
--For example, test the claim that a drug lowers blood pressure significantly.

Example: What is the Effect of the Police Radar?

US traffic police often use radar to catch speeding drivers. To alert them to the presence of police radar, some drivers mount radar detectors in their cars. This has led to a debate:[1] Are radar detectors a useful reminder to stay within the speed limit, or are they simply a way of avoiding police detection? A study[2] in Maryland found that a sample of 22 cars with radar detectors slowed down an average of 11 mph in the presence of radar. Suppose that the speed reduction of individual cars was normally distributed with standard deviation 2 mph.[3]

Ex: What does this sample tell us about the average drop in speed of all cars with radar detectors? What is:

Variable type (quantitative/categorical?): Quantitative
Population: All cars with radar detectors
Population parameter: Average drop in speed of all cars with radar detectors
Sample: The 22 cars sampled
Sample statistic: Average drop in speed of cars in the sample, 11 mph
Estimate of population mean: We use the sample mean, 11 mph, as an estimate of the population mean.

How far from the true mean could this estimate be?

Confidence Intervals

To see how far from the true value our estimate of 11 mph could be, we construct a confidence interval, in which the true population mean is likely to lie.
The margin of error and the width of the confidence interval depend on how much the sample means vary between samples; this is determined by the Central Limit Theorem.[4] The Central Limit Theorem tells us the mean drop in speed for a sample of 22 cars is normally distributed with mean equal to the mean drop in speed of the population (which we don’t know) and standard deviation $\sigma/\sqrt{n} = 2/\sqrt{22} \approx 0.43$ mph.

Suppose the mean drop in speed for the population was 11 mph. (Note: It wasn’t exactly 11 mph, as 11 mph is the sample mean, but we expect the population mean is close to 11 mph.) Then the distribution of sample means for samples of size 22 would look like this:

[Figure: Distribution of Average Drop in Speed for Samples of 22 Cars (population: mean 11 mph, standard deviation 2 mph). Horizontal axis: Drop in speed (mph), 9.00 to 13.00.]

[1] From Ohio State’s EESEE, based on work by N. Teed, K. Adrian, R. Khoblanch, 1991, www.whfreeman.com/scc6e
[2] www.afn.org/nafn 09444/ scanlaws/
[3] We are going to need to know the standard deviation of the population distribution, so we take this to be 2.
[4] We can use the Central Limit Theorem even though the sample size is less than 30 because the original distribution is normal.

The graph suggests almost all the mean drops in speed are between 10 mph and 12 mph. Since 95% of a normal distribution lies within 2 standard deviations of the mean, we conclude that 95% of the sample mean drops in speed are roughly between

11 - 2(0.43) mph and 11 + 2(0.43) mph
= 11 - 0.86 mph and 11 + 0.86 mph
= 10.14 mph and 11.86 mph.

The interval (10.14, 11.86) is called a confidence interval.

More accurate Confidence Interval

Ex: Use the table to find more accurate $z$-values on either side of 0 containing 95% of the data.

We want the $z$-values leaving 2.5% outside in each tail; the closest values are $z = -1.96$ and $z = 1.96$. More precisely, we can now say that 95% of the sample mean speed drops are between

11 - 1.96(0.43) mph and 11 + 1.96(0.43) mph
= 11 - 0.8 mph and 11 + 0.8 mph
= 10.2 mph and 11.8 mph.
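The arithmetic above can be checked with a short Python sketch. The function name `z_confidence_interval` is hypothetical (it is not part of the notes), and the code assumes, as the example does, that the population standard deviation is known and the sample mean is normally distributed. It looks up the z-value with the standard library rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (low, high, margin) for a confidence interval with known sigma.

    Hypothetical helper for illustration; assumes a normal sampling
    distribution for the sample mean.
    """
    # z-value leaving (1 - level) / 2 in each tail of the standard normal
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Standard deviation of the sample mean: sigma / sqrt(n)
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin, margin


# Radar example: sample mean 11 mph, sigma = 2 mph, n = 22 cars
low, high, margin = z_confidence_interval(11, 2, 22)
print(f"95% CI: ({low:.1f}, {high:.1f}) mph, margin of error {margin:.1f} mph")
# prints "95% CI: (10.2, 11.8) mph, margin of error 0.8 mph"
```

Passing `level=0.90` or `level=0.99` to the same function gives intervals at other confidence levels from the same sample.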
The interval (10.2, 11.8) is called the 95% confidence interval. It tells us that the average drop in speed for the whole population has a 95% chance of being in this interval. The 0.8 mph is called the margin of error.

Formula for Confidence Interval for Means

In the previous example, we see that the confidence interval was constructed like this:

$$11 \pm 1.96 \cdot \frac{2}{\sqrt{22}}$$

Here 11 is the mean, $\bar{x}$, of the sample; 1.96 is the $z$-value corresponding to 95% of the data; 2 is the standard deviation, $\sigma$; and 22 is the sample size, $n$. Thus, in general, the 95% confidence interval is

$$\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

The margin of error is

$$1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

Other Confidence Levels

We have found a 95% confidence interval for the mean speed reduction for cars with radar detectors. It is also possible to estimate the mean speed reduction by using 90% and 99% confidence intervals from the same sample.

Ex: How are the 95%, 90%, 99% confidence intervals related?

Center of intervals: All centered at 11 mph.
Spread of intervals: The 90% confidence interval is shorter than the 95% confidence interval because the 90% interval does not have to be as sure that it contains the true value. The 99% confidence interval is longer than the 95% interval.

Thus changing the confidence level makes the interval longer or shorter, but does not alter its center.

Ex: Find $z$-values for 90%, 95%, 99% confidence intervals.

Confidence level    90%      95%     99%
z-value             1.645    1.96    2.575

Ex: What are the 90% and 99% confidence intervals for the drop in speed?

90% confidence: 11 ± 1.645(0.43) = 11 ± 0.7 mph, that is, (10.3, 11.7)
99% confidence: 11 ± 2.575(0.43) = 11 ± 1.1 mph, that is, (9.9, 12.1)

Interpreting Confidence Intervals

Informally we can say
there’s a 90% chance that the mean speed drop is in the interval (10.3, 11.7),
there’s a 95% chance that the mean speed drop is in the interval (10.2, 11.8),
there’s a 99% chance that the mean speed drop is in the interval (9.9, 12.1).

However, this is not quite correct: the mean is a fixed number, so it either is, or isn’t, in each of these intervals, and the probability is either 0 or 1.
More properly, we say that the method which produced a 95% interval covers the true mean 95% of the time.

Ex: True or false: The 95% confidence interval tells us that 95% of the times we measure a speed drop, we will find it between 10.2 mph and 11.8 mph.

False: The confidence interval tells us that the mean of the population has a 95% chance of being in this interval, not that 95% of the individual readings are in this interval.

Choosing Sample Size for the Margin of Error

If the sample size was 50 (instead of 22), find the standard deviation of the sampling distribution, the margin of error and the 95% confidence interval.

Standard deviation = $2/\sqrt{50} \approx 0.28$ mph
Margin of error = 1.96(0.28) = 0.55 mph
Confidence interval: (11 - 0.55 mph, 11 + 0.55 mph) = (10.45 mph, 11.55 mph)

Thus we can be 95% certain that the average drop in speed of the population of all cars with radar detectors is between 10.45 mph and 11.55 mph.

Ex: Why does increasing the sample size decrease the margin of error? Explain mathematically and intuitively.

Mathematically, the square root of the sample size is in the denominator of the expression for the standard deviation and the margin of error, so both decrease as the sample size increases.

Intuitively, extreme values are more likely to average out in a larger sample, so the sampling distribution is less spread out; that is, it has a smaller standard deviation. Thus the margin of error gets smaller as the sample size gets larger.

Ex: If you needed a more precise estimate of the drop in speed, to within 0.1 mph, how large a sample is required?

We need the margin of error to be 0.1, and we solve for the sample size that achieves this. Since the margin of error is $1.96 \cdot \sigma/\sqrt{n}$ with $\sigma = 2$, we have

$$1.96 \cdot \frac{2}{\sqrt{n}} = 0.1, \quad\text{so}\quad \sqrt{n} = \frac{1.96 \cdot 2}{0.1} = 39.2 \quad\text{and}\quad n = 39.2^2 = 1536.64.$$

Rounding up, a sample of 1537 cars is needed.
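The dependence of the margin of error on sample size, and the sample-size calculation above, can be sketched in Python as follows. The names `Z_VALUES`, `margin_of_error` and `required_sample_size` are hypothetical helpers introduced for illustration; the z-values are the table values from the notes, and the population standard deviation is assumed known.

```python
from math import ceil, sqrt

# z-values taken from the table in the notes (not computed here)
Z_VALUES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}


def margin_of_error(sigma, n, level=0.95):
    """Margin of error z * sigma / sqrt(n) for a known sigma."""
    return Z_VALUES[level] * sigma / sqrt(n)


def required_sample_size(sigma, target_margin, level=0.95):
    """Smallest n whose margin of error is at most target_margin."""
    # Solve z * sigma / sqrt(n) = target_margin for n, then round up,
    # since a fractional car cannot be sampled.
    return ceil((Z_VALUES[level] * sigma / target_margin) ** 2)


# The margin of error shrinks as the sample size grows (sigma = 2 mph)
for n in (22, 50, 200):
    print(n, round(margin_of_error(2, n), 2))

# Sample size needed to estimate the mean drop to within 0.1 mph
print(required_sample_size(2, 0.1))  # prints 1537
```

Because n enters through a square root, halving the margin of error requires roughly four times as many cars.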
Other Examples

Ex: A US Department of Agriculture (USDA) study[5] found that the mean price received by a sample of 22 farmers for corn was $2.08 per bushel with standard error $0.176 per bushel. Find a 95% confidence interval for the price of corn. What is the margin of error?

We do not use the 22, as we are given the standard error, 0.176, directly, so the confidence interval is

(2.08 - 1.96(0.176), 2.08 + 1.96(0.176)) = (1.74, 2.42)

The true mean price was likely between $1.74 and $2.42. The margin of error is 1.96(0.176) = $0.345.

Ex: The 95% confidence interval for the difference in birth weight[6] (nonsmokers - smokers), in grams, between babies of mothers who do not smoke and babies of mothers who do is (167, 595). Explain what this interval tells us. What is the best single number estimate of the weight difference?

The interval tells us that babies of nonsmokers are estimated to weigh (167 + 595)/2 = 381 grams more, on average, than babies of smokers; the true difference is likely to be between 167 and 595 grams.
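Both examples above reduce to small calculations, sketched here in Python. The function names `interval_from_standard_error` and `estimate_and_margin` are hypothetical, and the default z-value of 1.96 assumes a 95% confidence level as in the notes.

```python
def interval_from_standard_error(estimate, standard_error, z=1.96):
    """Confidence interval when the standard error is already given."""
    margin = z * standard_error
    return estimate - margin, estimate + margin, margin


def estimate_and_margin(low, high):
    """Recover the point estimate (midpoint) and margin of error
    (half-width) from a reported confidence interval."""
    return (low + high) / 2, (high - low) / 2


# Corn prices: mean $2.08 per bushel, standard error $0.176
corn_low, corn_high, corn_margin = interval_from_standard_error(2.08, 0.176)
print(f"({corn_low:.2f}, {corn_high:.2f}), margin {corn_margin:.3f}")

# Birth weight difference: reported 95% interval (167, 595) grams
center, half_width = estimate_and_margin(167, 595)
print(center, half_width)  # prints 381.0 214.0
```

The second function makes explicit that the best single estimate is the center of the interval, with the half-width serving as the margin of error.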
