CHAPTER 8: TESTING FOR PROPORTIONS

8.2 Two-sample test for proportions

Outline of Chapter 8

• 8.1 One-sample test for proportion

• 8.2 Two-sample test for proportions

What Have We Learned from 8.1?

• When sample size is large enough, we can use a simple normal distribution based z-test to deal with proportions.

• The key is to compare the obtained z-score with the critical z-values.

• Critical z-values are determined by the level of significance (alpha level) and the direction of the test.

These will also hold true for 8.2.

When to Use the Two-sample Test for Proportions?

Use this test when you want to determine whether the proportion of a certain outcome (e.g., the % of adults who snore) is the same or different across two groups (e.g., young people vs. older people).

* We will focus only on two-tailed tests in this section.

Example: Snoring Research

Snoring research: In a random sample of 995 adult respondents, 366 (36.8%) snored at least a few nights a week. Splitting the sample into two age categories, 48 (26.1%) of the 184 respondents aged 30 or below snored, compared with 318 (39.2%) of the remaining 811 respondents, who were over 30.

Is the difference of 13.1% real or due only to variability?
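The percentages in the example can be reproduced directly from the counts. The short sketch below uses only the numbers quoted above; the variable names are illustrative.

```python
# Counts taken from the snoring example.
snorers_young, n_young = 48, 184    # respondents aged 30 or below
snorers_older, n_older = 318, 811   # respondents over 30

p_hat_young = snorers_young / n_young
p_hat_older = snorers_older / n_older
diff = p_hat_older - p_hat_young    # observed difference in sample proportions

print(f"30 or below: {p_hat_young:.3f}")
print(f"over 30:     {p_hat_older:.3f}")
print(f"difference:  {diff:.3f}")
```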

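The Sampling Distribution of the Difference

• To conduct a hypothesis test, we first need to understand the sampling distribution of the random quantity of interest (here, the observed difference of 13.1%).

• The sampling distribution for a difference between two independent proportions can again be derived using the Central Limit Theorem. Since both sample proportions approximately follow a normal distribution when the sample sizes are large enough, their difference $\hat{p}_1 - \hat{p}_2$ also approximately follows a normal distribution, with mean $P_1 - P_2$ and standard deviation

$$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$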
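One way to get a feel for this sampling distribution is a small simulation. The sketch below is only an illustration: the "true" population proportions (0.39 and 0.26) are assumptions chosen to resemble the snoring example, and the number of repetitions (2000) is arbitrary. It compares the simulated spread of the difference with the standard formula sqrt(P1(1 - P1)/n1 + P2(1 - P2)/n2) for two independent samples.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) population proportions; sample sizes from the example.
P1, N1 = 0.39, 811   # over-30 group
P2, N2 = 0.26, 184   # 30-or-below group

def sample_proportion(p, n):
    """Draw n independent yes/no responses and return the share of 'yes'."""
    return sum(random.random() < p for _ in range(n)) / n

# Repeat the two-sample survey many times and record the difference each time.
diffs = [sample_proportion(P1, N1) - sample_proportion(P2, N2)
         for _ in range(2000)]

simulated_mean = statistics.mean(diffs)
simulated_sd = statistics.stdev(diffs)
theoretical_sd = math.sqrt(P1 * (1 - P1) / N1 + P2 * (1 - P2) / N2)

print(f"mean of differences: {simulated_mean:.3f} (theory: {P1 - P2:.3f})")
print(f"SD of differences:   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

With samples of this size, the simulated mean and standard deviation should sit close to the theoretical values, and a histogram of `diffs` would look roughly bell-shaped.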

Assumptions and Conditions

1. Independent samples condition

• The two groups we are comparing must be independent of each other (usually evident from the way the data are collected).

• Example: the same group of people measured before and after a treatment is not independent.

2. Random sampling assumption: the two samples need to be drawn randomly.

Test the Equality of the Two Proportions

• Null hypothesis: H0 : P1 – P2 = 0 (or P1 = P2 )

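• Under the null, the two proportions are the same, so we can pool the observations to simplify the computation, i.e., combine the counts to get an overall proportion (aka the pooled proportion):

$$\hat{p}_{pooled} = \frac{x_1 + x_2}{n_1 + n_2}$$

where $x_1$ and $x_2$ are the numbers of successes in the two samples, and $n_1$ and $n_2$ are the sample sizes.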
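For the snoring data, pooling simply means adding up the snorers and the respondents across both age groups. A minimal sketch, with illustrative variable names:

```python
# Snorers (x) and sample sizes (n) for the two age groups in the example.
x1, n1 = 48, 184     # aged 30 or below
x2, n2 = 318, 811    # over 30

# Pooled proportion: all snorers divided by all respondents.
p_pooled = (x1 + x2) / (n1 + n2)

print(f"pooled proportion: {p_pooled:.3f}")
```

The result matches the overall 36.8% reported for the full sample of 995 respondents.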

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

• Null hypothesis: H0 : P1 – P2 = 0 (or P1 = P2 )

• When the sample sizes are large enough, $\hat{p}_1 - \hat{p}_2$ approximately follows a normal distribution with zero mean and standard deviation $SD(\hat{p}_1 - \hat{p}_2)$ under the null.

• If we can figure out a way to estimate $SD(\hat{p}_1 - \hat{p}_2)$, we can apply the same method as in the last section to find the z-score.

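Estimate the Standard Error

• Because we hypothesize that the proportions are equal, we pool them to estimate the standard error:

$$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_{pooled}(1 - \hat{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\hat{p}_{pooled}$ is the pooled proportion.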
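As a rough illustration, the pooled standard error for the snoring data can be computed as follows. The function name `pooled_standard_error` is hypothetical, and the formula is the standard pooled form sqrt(p(1 - p)(1/n1 + 1/n2)), where p is the pooled proportion.

```python
import math

def pooled_standard_error(x1, n1, x2, n2):
    """Standard error of p1_hat - p2_hat under H0: P1 = P2."""
    p_pooled = (x1 + x2) / (n1 + n2)
    return math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Snoring example: 48 of 184 (aged 30 or below) and 318 of 811 (over 30).
se_pooled = pooled_standard_error(48, 184, 318, 811)
print(f"pooled SE: {se_pooled:.4f}")
```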

Two-Proportion z-test

• Now we find the test statistic:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$$

• When the conditions are met and the null hypothesis is true, this statistic approximately follows the standard normal distribution.
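Putting the pieces together for the snoring example gives the z-score. This is a minimal sketch: the function name `two_proportion_z` is illustrative, and the over-30 group is taken as sample 1 so that the difference is positive (swapping the groups only flips the sign of z).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 - P2 = 0, using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se_pooled

# Sample 1: over 30 (318 of 811); sample 2: aged 30 or below (48 of 184).
z = two_proportion_z(318, 811, 48, 184)
print(f"z = {z:.2f}")
```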

Rejection Region for Two-tailed Test

• Because this is a two-tailed test, a difference that is either too large or too small may lead to rejection of the null.

• This translates to a rejection region with two tails: either a “too positive” or a “too negative” z-score will reject the null.

• Given a level of significance of α = 0.05, the critical values are -1.96 and 1.96. The rejection regions are

z < -1.96 and z > 1.96
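The critical values need not be looked up in a table. The sketch below uses `statistics.NormalDist` from the Python standard library to recover them for any α; the helper names are illustrative.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Positive critical z-value for a two-tailed test at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z, alpha=0.05):
    """True if z falls in either tail of the two-tailed rejection region."""
    return abs(z) > critical_value(alpha)

print(round(critical_value(0.05), 2))   # 1.96
print(in_rejection_region(3.33))        # far in the upper tail
print(in_rejection_region(-1.20))       # between the two critical values
```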

Conclusion

• Because the computed z-score (about 3.33 in absolute value) falls in the rejection region, we reject the null hypothesis.

At the 5% level of significance, there is sufficient evidence to reject the claim that the two age groups have the same proportion of snorers.

• On the next slide, we will tackle the same problem by comparing the p-value with the significance level.

P-value Method

• The p-value is the probability, assuming the null is true, of observing a difference at least as extreme as 0.131 in either direction.

• The two-sided p-value is 0.0008. In other words, if the null is true, there is only a 0.0008 chance of observing a difference of 13.1% or larger in absolute value. Since this is below the alpha level of 0.05, we reject the null hypothesis.
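The two-sided p-value can be sketched from the z-score with the standard normal CDF. The code below recomputes z from the snoring counts so that it runs on its own; the function names are illustrative. The result should land close to the 0.0008 quoted above, with small differences due to rounding.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 - P2 = 0, using the pooled standard error."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se_pooled

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = two_proportion_z(318, 811, 48, 184)
p_value = two_sided_p_value(z)

print(f"z = {z:.2f}, two-sided p-value = {p_value:.5f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```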

Confidence Interval Estimate for the Difference of the Two Proportions

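• We cannot use the pooled proportion to estimate the standard error here, since we do not know whether the two proportions are equal.

• To estimate the standard error of the difference between the two sample proportions, we have to use both sample proportions. Here is the formula:

$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$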

Construct the Confidence Interval

• When the sample sizes and independence conditions are met, we are ready to find the confidence interval for the difference of two proportions P1 - P2.

• Plugging in the estimated standard error, we have the CI as:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$$

• The critical value z* depends on the particular confidence level. For a typical 95% CI, z*=1.96.
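As a hedged illustration, a 95% confidence interval for the snoring example can be sketched as below. It uses the unpooled standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2) built from the two sample proportions; the function name `diff_proportion_ci` is hypothetical, and the over-30 group is taken as sample 1.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for P1 - P2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Sample 1: over 30 (318 of 811); sample 2: aged 30 or below (48 of 184).
lower, upper = diff_proportion_ci(318, 811, 48, 184)
print(f"95% CI for P1 - P2: ({lower:.3f}, {upper:.3f})")
```

The interval lies entirely above zero, which agrees with the test's conclusion that the two proportions differ.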

Summary

• We introduced a normal distribution based z-test for comparing proportions of two populations.

• The key is to compare the obtained z-score with the critical z-values. To calculate the obtained z-score, we need to calculate the pooled sample proportion first.

• For α = 0.05, the critical values are -1.96 and 1.96. The rejection regions for a two-tailed test are z < -1.96 and z > 1.96.
