Sampling Distribution of Sample Proportion P̂

Lecture 22 Hypothesis Testing About a Proportion

Our first example of Hypothesis Testing tests hypotheses about a proportion. These tests apply when you are considering a binary categorical variable in a population and want to know whether it is plausible that the proportion of successes is different from some proposed value.

5 Steps of HT About a Proportion

1. Identify Hypotheses - From the problem you will get the proposed value, called the test proportion p0, of the population proportion p. In each problem p0 will be a number coming from the question. The Null Hypothesis is always that p equals that value. The Alternate Hypothesis can be any of the following three, depending on what the question asks for evidence in support of.

H0 : p = p0
(a) HA : p < p0
(b) HA : p > p0
(c) HA : p ≠ p0 (two-tailed)

• In your calculations p will always be a variable, because we will never know it. On the other hand, p0 will in each problem be a number, which you can plug in wherever p0 appears below.

5 Steps of HT About a Proportion (step 2)

2. Assume H0; Give Sampling Distribution - Assume p = p0. That means the distribution of P̂, the sample proportion of a sample of size n,
... has mean p0,
... has standard error √(p0(1 − p0)/n),
... and is roughly normal.

[Figure: Sampling Distribution of Sample Proportion P̂]

• This describes the typical values for P̂ you would expect to get if you took a random sample of size n from a population where the population proportion was p = p0. We are going to see how the specific p̂ we got in our sample compares with these expected results. If it looks surprising assuming p = p0, then that assumption is probably wrong.

5 Steps of HT About a Proportion (step 3)

3. Calculate p-value - The calculation of the p-value depends on which of the three different Alternate Hypotheses you are using, and on the actual p̂ you get from your sample.

(a) HA : p < p0
p-val = NORMDIST(p̂, p0, SQRT(p0(1 − p0)/n), 1)
(b) HA : p > p0
p-val = 1 − NORMDIST(p̂, p0, SQRT(p0(1 − p0)/n), 1)
(c) HA : p ≠ p0 (twice the smaller of the values above)
p-val = 2 × MIN(a, b)

• You can just remember these three calculations, but it is worth seeing what the logic is. Remember the p-value is the probability (area) of getting results at least as convincing as yours in favor of HA. So if HA is p > p0, it is the probability of getting your p̂ or higher, because anything more than your p̂ is at least as convincing as your result that p > p0. By the same reasoning, if HA is p < p0 then the relevant probability is that of getting your p̂ or anything smaller. Most subtly, if you are looking for evidence that p ≠ p0, then anything as far away or further from p0 as your p̂ is would be as convincing. That gives you two tails of equal area, so twice the area of one tail.

5 Steps of HT About a Proportion (step 4)

4. Conclusion - If the problem provides a significance level α, then if p-value < α conclude

• This data is significant evidence at the [α] significance level that the proportion of all [POPULATION] which is [VARIABLE] is [less / more / different from] [p0]

otherwise conclude

• This data is not significant evidence at the [α] significance level that the proportion of all [POPULATION] which is [VARIABLE] is [less / more / different from] [p0]

If there is no significance level, make a reasonable assessment, but do not use the word significant.

• The wording here requires care. You do not prove HA; you do not find evidence for H0. You don't use the word significant if it is not a significance test. It is important that you express HA in terms of the original problem, because it is easy for all the formal language and subtle logic to leave you not understanding whether you found evidence that the medicine worked or that it didn't work.

5 Steps of HT About a Proportion (step 5)

5. Assumptions
(a) SRS - The sample is a simple random sample.
(b) Large Population (Independence) - The population is at least 20 times the sample size: N ≥ 20n.
(c) Rule of 15 (Normality) - np0 ≥ 15 and n(1 − p0) ≥ 15.

• This is all the same as before, except notice that we use p0 in place of p or p̂ in the last assumption. This is confusing, but the idea is that we always use the best stand-in for p that we have access to in each procedure. In the confidence interval the best we have is p̂, while here we have p0, which is what we are assuming p is while we are doing the calculation.

Example (At Last!)

In a sample of 120 customers, 78 said they preferred the ad with the mauve color scheme over the one with the taupe. Is this evidence at the 1% level that a majority (i.e. more than 50%) prefer mauve?

Population = All customers
Variable = Whether they prefer mauve or taupe
Parameter = Proportion of all customers who prefer mauve

• Often hypothesis testing about a proportion will ask for evidence that something is true of a majority of the population. Of course majority always means more than 50%, so the test proportion will be p0 = 0.5. This is one of the cases where the test proportion appears in the question implicitly but not explicitly.

H. T. Example Step 1

1. Identify Hypotheses - The problem asks for evidence that the proportion of all customers who prefer mauve is more than 50%, so the Alternate Hypothesis is

HA : p > 0.5

which means that the Null Hypothesis is

H0 : p = 0.5

This means that p0 = 0.5.

• Notice the choice of HA was based on what the question asked for evidence for, not on the data from the sample. Choosing your Hypotheses based on the data from the sample is an example of the Cardinal Sin of Hypothesis Testing.

H. T. Example Steps 2-3

2. Sampling Distribution - We assume p = 0.5 and we know n = 120, so the sampling distribution of P̂ is normal with a mean of μ_P̂ = 0.5 and a standard error of σ_P̂ = √(0.5 × 0.5/120) = 0.0456.

3. Calculate p-value - Since HA : p > p0 is alternative (b) and p̂ = 78/120 = 0.65, we use

p-value = 1 − NORMDIST(p̂, p0, SQRT(p0(1 − p0)/n), 1)
= 1 − NORMDIST(78/120, 0.5, SQRT(0.5 * 0.5/120), 1)
= 0.000508 = 0.0508%

H. T. Example Step 4

4. Conclusion - The problem gave us a significance level of α = 1%, so since the p-value of 0.05% is less than the significance level of 1%,

This data is significant evidence at the 1% significance level that the proportion of all customers who prefer mauve is more than 50%.

Notice this conclusion gives the actual conclusion, the significance level and the Alternate Hypothesis, which it expresses by relating the parameter, with its associated population and variable, via an inequality to the test proportion.

• I will expect to see all the things I highlighted above whenever you reach a conclusion in a hypothesis test. Notice I am not fussy about, for example, how you express the variable. Best is a sentence an ordinary person can understand.

• The book is fond of a more old-fashioned (and in my opinion more confusing) way to express the conclusion. If the data is significant you say "We reject the null hypothesis" and if it is not significant you say "We fail to reject the null hypothesis." I do not use that terminology and don't want you to use that terminology, but you should be prepared to see it in online homework problems, and possibly in your future life.

H. T. Example Step 5

5. Assumptions -

• SRS - Not met - The problem does not say how the sample was taken.
• Large Population (Independence) - Met - We can probably assume there are more than 20 × 120 = 2,400 customers for your company.
• Rule of 15 (Normality) - Met - np0 = 120 * 0.5 = 60 ≥ 15 and n(1 − p0) = 120 * 0.5 = 60 ≥ 15.

• In real life, we accept sampling methods that are not simple random samples but where we cannot see plausible sources of bias. For purposes of questions on tests and such, if the problem doesn't say, the assumption is not met.
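As a cross-check on the mauve example, here is a minimal Python sketch of steps 2 through 5. It assumes that the standard library's NormalDist(...).cdf(x) plays the same role as NORMDIST(x, mean, sd, 1) in the notes; the variable names are illustrative and do not come from the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Data from the mauve example
n = 120          # sample size
successes = 78   # customers who preferred mauve
p0 = 0.5         # test proportion ("majority" means more than 50%)
alpha = 0.01     # significance level

# Step 2: sampling distribution of the sample proportion, assuming p = p0
p_hat = successes / n
std_error = sqrt(p0 * (1 - p0) / n)
sampling_dist = NormalDist(mu=p0, sigma=std_error)

# Step 3: alternative (b), HA: p > p0, so we want the upper-tail area
p_value = 1 - sampling_dist.cdf(p_hat)

# Step 4: compare the p-value with the significance level
is_significant = p_value < alpha

# Step 5: the two assumptions that can be checked numerically
rule_of_15_met = n * p0 >= 15 and n * (1 - p0) >= 15
min_population = 20 * n  # population size needed for the Large Population assumption

print(f"p_hat = {p_hat:.4f}, standard error = {std_error:.4f}")
print(f"p-value = {p_value:.6f}, significant at 1%: {is_significant}")
print(f"Rule of 15 met: {rule_of_15_met}, population needed: {min_population}")
```

The SRS assumption cannot be checked by a calculation; it depends on how the sample was taken.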
A Fast Example (Two-Tailed)

Test the claim at the 5% significance level that the proportion is different from 0.3 if a simple random sample of 80 had 21 successes.

1. Hypotheses - H0 : p = 0.3 and HA : p ≠ 0.3

2. Sampling Distribution - P̂ is normal with mean 0.3 and standard error √(0.3 × 0.7/80) = 0.0512.

3. p-value - Situation (c), Two-Tailed Alternative, so

NORMDIST(21/80, 0.3, SQRT(0.3 * 0.7/80), 1) = 0.232
1 − NORMDIST(21/80, 0.3, SQRT(0.3 * 0.7/80), 1) = 0.768
2 * MIN(0.232, 0.768) = 0.464

4. Conclusion - Since the p-value of 46.4% is more than 5%, this data is not significant evidence at the 5% significance level that p is different from 0.3.

5. Assumptions - SRS - Met (it says so in the problem). Large Population - Don't know; we would need a population of at least 20 × 80 = 1600. Rule of 15 - Met, 80 * 0.3 = 24 ≥ 15 and 80 * 0.7 = 56 ≥ 15.

Key Points

You should know...

• How to extract the Null and Alternate Hypotheses (and the test proportion) from the question, and to write them in the form H0 : p = 0.25, HA : p > 0.25

• How to calculate the p-value based on the sample proportion, the test proportion and which of the three HAs you are using.
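The three p-value calculations from step 3 can be collected into one small helper. The following is a sketch, not part of the lecture: the function name proportion_p_value and the labels "less", "greater" and "two-sided" are hypothetical, and NormalDist(...).cdf from the Python standard library is assumed to stand in for NORMDIST(..., 1). It is applied here to the numbers from the two-tailed fast example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_p_value(p_hat, p0, n, alternative):
    """Return the p-value for a test about a proportion.

    alternative is "less" for HA: p < p0, "greater" for HA: p > p0,
    or "two-sided" for HA: p != p0 (hypothetical labels for this sketch).
    """
    # Sampling distribution of the sample proportion, assuming p = p0
    sampling_dist = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))
    lower_tail = sampling_dist.cdf(p_hat)  # case (a)
    upper_tail = 1 - lower_tail            # case (b)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return upper_tail
    if alternative == "two-sided":
        return 2 * min(lower_tail, upper_tail)  # case (c)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Fast example: 21 successes in a simple random sample of 80, test proportion 0.3
fast_lower = proportion_p_value(21 / 80, 0.3, 80, "less")
fast_upper = proportion_p_value(21 / 80, 0.3, 80, "greater")
fast_two_sided = proportion_p_value(21 / 80, 0.3, 80, "two-sided")

print(f"lower tail = {fast_lower:.3f}, upper tail = {fast_upper:.3f}")
print(f"two-sided p-value = {fast_two_sided:.3f}")
print("significant at 5%:", fast_two_sided < 0.05)
```

Computing both tails once and taking twice the smaller mirrors the 2 × MIN(a, b) rule, so the same code path serves all three alternatives.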
