AP Statistics Chapter 10

Confidence intervals for Proportions Chapter 18 Objectives: 1. Standard Error 2. Confidence Interval 3. One-proportion z-interval 4. Margin of Error 5. Critical Value Introduction • Statistical Inference – Involves methods of using information from a sample to draw conclusions regarding the population. – In formal statistical inference, we use probability to express the strength of our conclusions. Two Most Common Types of Formal Statistical Inference 1. Confidence Intervals – Estimate the value of a population parameter. 2. Tests of Significance – Assess the evidence for a claim about a population. Conference Intervals and Tests of Significance • Both types of inference are based on the sampling distributions of a sample statistic. CONFIDENCE INTERVAL The sample proportion pˆ We now study categorical data and draw inference on the proportion, or percentage, of the population with a specific characteristic. If we call a given categorical characteristic in the population “success,” then the sample proportion of successes, p ˆ ,is: count of successes in the sample pˆ count of observations in the sample We choose 50 people in an undergrad class, and find that 10 of them are Hispanic: = (10)/(50) = 0.2 (proportion of Hispanics in sample) You treat a group of 120 Herpes patients given a new drug; 30 get better: = (30)/(120) = 0.25 (proportion of patients improving in sample) Sampling distribution of pˆ The sampling distribution of p ˆ is never exactly normal. But as the sample size increases, the sampling distribution of becomes approximately normal. Standard Error • Both of the sampling distributions we’ve looked at are Normal. – For proportions pq SD pˆ n – For means SD y n Standard Error • When we don’t know p or σ, (which we normally don’t, because they are population parameters) we’re stuck, right? • Nope. We will use sample statistics to estimate these population parameters. • Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error. Standard Error • For a sample proportion, the standard error is pqˆˆ SE pˆ n • For the sample mean, the standard error is s SE y n A Confidence Interval • Recall that the sampling distribution model of p ˆ is centered at p, with standard deviation p q . n • Since we don’t know p, we can’t find the true standard deviation of the sampling distribution model, so we need to find the standard error: pˆqˆ SE(pˆ) n A Confidence Interval • By the 68-95-99.7% Rule, we know pˆ – about 68% of all samples will have ’s within 1 SE of p – about 95% of all samples will have ’s within 2 SEs of p – about 99.7% of all samples will have ’s within 3 SEs of p • We can look at this from ’s point of view… A Confidence Interval pˆ • Consider the 95% level: – There’s a 95% chance that p is no more than 2 SEs away from . – So, if we reach out 2 SEs, we are 95% sure that p will be in that interval. In other words, if we reach out 2 SEs in either direction of , we can be 95% confident that this interval contains the true proportion. • This is called a 95% confidence interval. A Confidence Interval Confidence Interval • Definition – Confidence Interval is a range of values used to estimate the true value of a population parameter. What Does “95% Confidence” Really Mean? • The figure to the right shows that some of our confidence intervals (from 20 random samples) capture the true proportion (the green horizontal line), while others do not: What Does “95% Confidence” Really Mean? • Our confidence is in the process of constructing the interval, not in any one interval itself. • Thus, we expect 95% of all 95% confidence intervals to contain the true parameter that they are estimating. A Level C Confidence Interval has Two Parts: 1. An interval calculated from the data, usually of the form estimate ± margin of error – Example: estimate – margin of error – how accurate we believe our estimate is, based on the variability of the estimate. For a 95% confidence interval the margin of error would be 2 SE ( pˆ ) . 2. A Confidence Level C, which gives the probability that the interval will capture the true parameter value in repeated samples. – Example: 95% confidence interval – normally use confidence level of 90% or higher Margin of Error: Certainty vs. Precision • We can claim, with 95% confidence, that the interval pˆ 2SE(pˆ) contains the true population proportion. – The extent of the interval on either side of p ˆ is called the margin of error (ME). • In general, confidence intervals have the form estimate ± ME. • The more confident we want to be, the larger our ME needs to be, making the interval wider. Margin of Error: Certainty vs. Precision Margin of Error: Certainty vs. Precision • To be more confident, we wind up being less precise. – We need more values in our confidence interval to be more certain. • Because of this, every confidence interval is a balance between certainty and precision. • The tension between certainty and precision is always there. – Fortunately, in most cases we can be both sufficiently certain and sufficiently precise to make useful statements. Margin of Error: Certainty vs. Precision • The choice of confidence level is somewhat arbitrary, but keep in mind this tension between certainty and precision when selecting your confidence level. • The most commonly chosen confidence levels are 90%, 95%, and 99% (but any percentage can be used). Critical Values • The ‘2’ in p ˆˆ 2 SE ( p ) (our 95% confidence interval) came from the 68-95-99.7% Rule. • Using a table or technology, we find that a more exact value for our 95% confidence interval is 1.96 instead of 2. – We call 1.96 the critical value and denote it z*. • For any confidence level, we can find the corresponding critical value (the number of SEs that corresponds to our confidence interval level). Example: Confidence Level Upper Critical Lower Critical Value Value Critical Values • Example: For a 90% confidence interval, the critical value is 1.645: z* • The critical value z* is the number (z-score) on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur, for a given confidence level. z* is the same for any normal distribution for a given confidence level Problem: • Find the critical value z* for a confidence level of 88%? • invNorm(.94)=1.555 Your Turn: • Find the critical value z* for a confidence level of 73%? • invNorm(.865)=1.103 Objective • Construct confidence intervals, and recognize the significance they have on statistical inference. Assumptions and Conditions • Here are the assumptions and the corresponding conditions you must check before creating a confidence interval for a proportion: • Independence Assumption: We first need to Think about whether the Independence Assumption is plausible. It’s not one you can check by looking at the data. Instead, we check two conditions to decide whether independence is reasonable. Assumptions and Conditions – Randomization Condition: Were the data sampled at random or generated from a properly randomized experiment? Proper randomization can help ensure independence. – 10% Condition: Is the sample size no more than 10% of the population? . Sample Size Assumption: The sample needs to be large enough for us to be able to use the CLT. – Success/Failure Condition: We must expect at least 10 “successes” and at least 10 “failures.” One-Proportion z-Interval • When the conditions are met, we are ready to find the confidence interval for the population proportion, p. • The confidence interval is pˆ z SEpˆ where pˆqˆ SE(pˆ) n • The critical value, z*, depends on the particular confidence level, C, that you specify. Procedure: Confidence Interval for a Population Proportion 1. Identify the population of interest and the parameter you want to draw conclusions about (population proportion p). 2. Choose the appropriate inference procedure. Verify the conditions for using the selected procedure. – Conditions population proportion; • Random condition • 10% condition • Success/Failure condition 3. If the conditions are met, carry out the inference procedure. – Confidence interval (CI) • CI = estimate ± margin of error – In general • CI = estimate ± z* • SE – For population proportion p • Estimate = pˆ pqˆˆ SE() pˆ • n • z*: calculated based on the confidence level • CI for p: pˆˆ z* SE ( p ) 4. Interpret your results in the context of the problem. • Summary – Confidence interval for population proportion p pˆˆ z* SE() p Standard error of the Unbiased estimate sampling distribution of of population Upper critical value sample proportions proportion p for confidence level Example - Medication side effects Arthritis is a painful, chronic inflammation of the joints. An experiment on the side effects of pain relievers examined arthritis patients to find the proportion of patients who suffer side effects. It was found that 23 out of 440 arthritis patients suffered side effects. What are some side effects of ibuprofen? Serious side effects (seek medical attention immediately): Find a 90% Confidence Interval. Allergic reaction (difficulty breathing, swelling, or hives), Muscle cramps, numbness, or tingling, Ulcers (open sores) in the mouth, Rapid weight gain (fluid retention), Seizures, Black, bloody, or tarry stools, Blood in your urine or vomit, Decreased hearing or ringing in the ears, Jaundice (yellowing of the skin or eyes), or Abdominal cramping, indigestion, or heartburn, Less serious side effects (discuss with your doctor): Dizziness or headache, Nausea, gaseousness, diarrhea, or constipation, Depression, Fatigue or weakness, Dry mouth, or Irregular menstrual periods Solution • Check Conditions – Randomization Condition; assume the 440 arthritis patients were randomly selected. – 10% Condition; it is reasonable to assume there are more than 4,400 total arthritis patients. – Success/Failure pˆCondition: n = (440)(23/440) = 23 and nq ˆ = (440)(317/440) = 317, both are greater than 10.

Load more