
Chapter 10 Chi-Square Tests and the F-Distribution

§ 10.1 Goodness of Fit

Multinomial Experiments

A multinomial experiment is a probability experiment consisting of a fixed number of independent trials in which there are more than two possible outcomes for each trial. (Unlike the binomial experiment, in which there are only two possible outcomes.)

Example: A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below.

Topping      f
Cheese       41%
Pepperoni    25%
Sausage      15%
Mushrooms    10%
Onions        9%

Each outcome is classified into one of the categories, and the probability for each possible outcome is fixed.


Chi-Square Goodness-of-Fit Test

A chi-square goodness-of-fit test is used to test whether a frequency distribution fits an expected distribution. To calculate the test statistic for the chi-square goodness-of-fit test, the observed frequencies and the expected frequencies are used.

The observed frequency O of a category is the frequency for the category observed in the sample. The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained assuming the specified (or hypothesized) distribution. The expected frequency for the ith category is

Ei = n·pi, where n is the number of trials (the sample size) and pi is the assumed probability of the ith category.


Observed and Expected Frequencies

Example: 200 teenagers are randomly selected and asked what their favorite pizza topping is. The results are shown below. Find the observed frequencies and the expected frequencies.

Topping      Results (n = 200)   % of teenagers   Observed Frequency   Expected Frequency
Cheese       78                  41%              78                   200(0.41) = 82
Pepperoni    52                  25%              52                   200(0.25) = 50
Sausage      30                  15%              30                   200(0.15) = 30
Mushrooms    25                  10%              25                   200(0.10) = 20
Onions       15                   9%              15                   200(0.09) = 18
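As a quick check, the expected frequencies Ei = n·pi can be computed with a short Python sketch (assuming the claimed percentages and n = 200 from the example above):

```python
# Expected frequencies Ei = n * pi for the pizza-topping example.
n = 200
claimed = {"Cheese": 0.41, "Pepperoni": 0.25, "Sausage": 0.15,
           "Mushrooms": 0.10, "Onions": 0.09}

expected = {topping: round(n * p, 2) for topping, p in claimed.items()}
print(expected)
# {'Cheese': 82.0, 'Pepperoni': 50.0, 'Sausage': 30.0, 'Mushrooms': 20.0, 'Onions': 18.0}
```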


Chi-Square Goodness-of-Fit Test

For the chi-square goodness-of-fit test to be used, the following must be true.

1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.

The Chi-Square Goodness-of-Fit Test
If the conditions listed above are satisfied, then the sampling distribution for the goodness-of-fit test is approximated by a chi-square distribution with k – 1 degrees of freedom, where k is the number of categories. The test statistic for the chi-square goodness-of-fit test is

χ² = Σ (O − E)² / E

where O represents the observed frequency of each category and E represents the expected frequency of each category. The test is always a right-tailed test.


Chi-Square Goodness-of-Fit Test

Performing a Chi-Square Goodness-of-Fit Test (In Words / In Symbols)

1. Identify the claim. State the null and alternative hypotheses.   (State H0 and Ha.)
2. Specify the level of significance.   (Identify α.)
3. Identify the degrees of freedom.   (d.f. = k – 1)
4. Determine the critical value.   (Use Table 6 in Appendix B.)
5. Determine the rejection region.


6. Calculate the test statistic.   (χ² = Σ (O − E)² / E)
7. Make a decision to reject or fail to reject the null hypothesis.   (If χ² is in the rejection region, reject H0; otherwise, fail to reject H0.)
8. Interpret the decision in the context of the original claim.


Chi-Square Goodness-of-Fit Test

Example: A researcher claims that the distribution of favorite pizza toppings among teenagers is as shown below. 200 randomly selected teenagers are surveyed.

Topping      Frequency, f
Cheese       41%
Pepperoni    25%
Sausage      15%
Mushrooms    10%
Onions        9%

Using α = 0.01 and the observed and expected values previously calculated, test the researcher's claim using a chi-square goodness-of-fit test.


Example continued:

H0: The distribution of pizza toppings is 41% cheese, 25% pepperoni, 15% sausage, 10% mushrooms, and 9% onions. (Claim)

Ha: The distribution of pizza toppings differs from the claimed or expected distribution.

Because there are 5 categories, the chi-square distribution has k – 1 = 5 – 1 = 4 degrees of freedom.

With d.f. = 4 and α = 0.01, the critical value is χ²₀ = 13.277.


Example continued:

Topping      Observed Frequency   Expected Frequency
Cheese       78                   82
Pepperoni    52                   50
Sausage      30                   30
Mushrooms    25                   20
Onions       15                   18

[Figure: chi-square curve with the rejection region (α = 0.01) to the right of χ²₀ = 13.277.]

χ² = Σ (O − E)²/E
   = (78 − 82)²/82 + (52 − 50)²/50 + (30 − 30)²/30 + (25 − 20)²/20 + (15 − 18)²/18
   ≈ 2.025

Fail to reject H0. There is not enough evidence at the 1% level to reject the researcher's claim.
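The result can be reproduced in software. Here is a minimal Python sketch (using scipy.stats, with the observed counts and claimed proportions from this example) that computes the test statistic and the critical value:

```python
from scipy import stats

observed = [78, 52, 30, 25, 15]             # sample counts (n = 200)
claimed_p = [0.41, 0.25, 0.15, 0.10, 0.09]  # hypothesized distribution
expected = [200 * p for p in claimed_p]     # 82, 50, 30, 20, 18

# Chi-square goodness-of-fit statistic and p-value
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

# Right-tailed critical value with d.f. = k - 1 = 4 at alpha = 0.01
critical_value = stats.chi2.ppf(1 - 0.01, df=4)

print(round(chi2_stat, 3))          # ≈ 2.025
print(round(critical_value, 3))     # ≈ 13.277
print(chi2_stat > critical_value)   # False -> fail to reject H0
```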


§ 10.2 Independence

Contingency Tables

An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.

The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender. (Adapted from Insurance Institute for Highway Safety)

                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older
Male      32      51      52      43      28      10
Female    13      22      33      21      10       6


Expected Frequency

Assuming the two variables are independent, you can use the contingency table to find the expected frequency for each cell.

Finding the Expected Frequency for Contingency Table Cells

The expected frequency for a cell Er,c in a contingency table is

Expected frequency Er,c = (sum of row r) × (sum of column c) / (sample size)


Expected Frequency

Example: Find the expected frequency for each "Male" cell in the contingency table for the sample of 321 fatally injured drivers. Assume that the variables, age and gender, are independent.

                              Age
Gender    16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male      32      51      52      43      28      10             216
Female    13      22      33      21      10       6             105
Total     45      73      85      64      38      16             321


Example continued: Using the totals from the table above,

Expected frequency Er,c = (sum of row r) × (sum of column c) / (sample size)

E1,1 = (216 × 45)/321 ≈ 30.28    E1,2 = (216 × 73)/321 ≈ 49.12    E1,3 = (216 × 85)/321 ≈ 57.20
E1,4 = (216 × 64)/321 ≈ 43.07    E1,5 = (216 × 38)/321 ≈ 25.57    E1,6 = (216 × 16)/321 ≈ 10.77
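The full matrix of expected frequencies can also be computed with a short Python sketch (using only the observed counts from the table above):

```python
# Expected frequencies E[r][c] = (row total * column total) / sample size
observed = [
    [32, 51, 52, 43, 28, 10],   # Male
    [13, 22, 33, 21, 10,  6],   # Female
]

row_totals = [sum(row) for row in observed]        # [216, 105]
col_totals = [sum(col) for col in zip(*observed)]  # [45, 73, 85, 64, 38, 16]
n = sum(row_totals)                                # 321

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
for row in expected:
    print([round(e, 2) for e in row])
# [30.28, 49.12, 57.2, 43.07, 25.57, 10.77]
# [14.72, 23.88, 27.8, 20.93, 12.43, 5.23]
```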


Chi-Square Independence Test

A chi-square independence test is used to test the independence of two variables. Using a chi-square test, you can determine whether the occurrence of one variable affects the probability of the occurrence of the other variable.

For the chi-square independence test to be used, the following must be true.

1. The observed frequencies must be obtained by using a random sample.
2. Each expected frequency must be greater than or equal to 5.


Chi-Square Independence Test

The Chi-Square Independence Test
If the conditions listed are satisfied, then the sampling distribution for the chi-square independence test is approximated by a chi-square distribution with (r – 1)(c – 1) degrees of freedom, where r and c are the number of rows and columns, respectively, of a contingency table. The test statistic for the chi-square independence test is

χ² = Σ (O − E)² / E

where O represents the observed frequencies and E represents the expected frequencies. The test is always a right-tailed test.


Chi-Square Independence Test

Performing a Chi-Square Independence Test (In Words / In Symbols)

1. Identify the claim. State the null and alternative hypotheses.   (State H0 and Ha.)
2. Specify the level of significance.   (Identify α.)
3. Identify the degrees of freedom.   (d.f. = (r – 1)(c – 1))
4. Determine the critical value.   (Use Table 6 in Appendix B.)
5. Determine the rejection region.


6. Calculate the test statistic.   (χ² = Σ (O − E)² / E)
7. Make a decision to reject or fail to reject the null hypothesis.   (If χ² is in the rejection region, reject H0; otherwise, fail to reject H0.)
8. Interpret the decision in the context of the original claim.


Chi-Square Independence Test

Example: The following contingency table shows a random sample of 321 fatally injured passenger vehicle drivers by age and gender. The expected frequencies are displayed in parentheses. At α = 0.05, can you conclude that the drivers' ages are related to gender in such accidents?

                              Age
Gender    16–20     21–30     31–40     41–50     51–60     61 and older   Total
Male      32        51        52        43        28        10             216
          (30.28)   (49.12)   (57.20)   (43.07)   (25.57)   (10.77)
Female    13        22        33        21        10         6             105
          (14.72)   (23.88)   (27.80)   (20.93)   (12.43)   (5.23)
Total     45        73        85        64        38        16             321


Example continued: Because each expected frequency is at least 5 and the drivers were randomly selected, the chi-square independence test can be used to test whether the variables are independent.

H0: The drivers’ ages are independent of gender.

Ha: The drivers’ ages are dependent on gender. (Claim)

d.f. = (r – 1)(c – 1) = (2 – 1)(6 – 1) = (1)(5) = 5

With d.f. = 5 and α = 0.05, the critical value is χ²₀ = 11.071.


Example continued:

O     E        O – E    (O – E)²   (O – E)²/E
32    30.28     1.72     2.9584     0.0977
51    49.12     1.88     3.5344     0.0720
52    57.20    −5.20    27.0400     0.4727
43    43.07    −0.07     0.0049     0.0001
28    25.57     2.43     5.9049     0.2309
10    10.77    −0.77     0.5929     0.0551
13    14.72    −1.72     2.9584     0.2010
22    23.88    −1.88     3.5344     0.1480
33    27.80     5.20    27.0400     0.9727
21    20.93     0.07     0.0049     0.0002
10    12.43    −2.43     5.9049     0.4751
6      5.23     0.77     0.5929     0.1134

χ² = Σ (O − E)²/E ≈ 2.84

[Figure: chi-square curve with the rejection region (α = 0.05) to the right of χ²₀ = 11.071.]

Fail to reject H0. There is not enough evidence at the 5% level to conclude that age is dependent on gender in such accidents.
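The same decision can be reproduced with a short Python sketch using scipy.stats.chi2_contingency, which takes only the observed counts and computes the expected frequencies, degrees of freedom, and statistic internally (the exact statistic differs slightly from 2.84 because the hand calculation above uses rounded expected frequencies):

```python
from scipy import stats

# Observed counts: rows = gender (Male, Female), columns = age group
observed = [
    [32, 51, 52, 43, 28, 10],
    [13, 22, 33, 21, 10,  6],
]

# correction=False keeps the plain chi-square statistic (no Yates correction)
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

critical_value = stats.chi2.ppf(1 - 0.05, df=dof)   # dof = (2 - 1)(6 - 1) = 5

print(round(chi2_stat, 2))          # ≈ 2.83 (2.84 with rounded expected frequencies)
print(round(critical_value, 3))     # ≈ 11.07
print(chi2_stat > critical_value)   # False -> fail to reject H0
```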


§ 10.3 Comparing Two Variances

F-Distribution

Let s1² and s2² represent the sample variances of two different populations. If both populations are normal and the population variances σ1² and σ2² are equal, then the sampling distribution of

F = s1²/s2²

is called an F-distribution. There are several properties of this distribution.

1. The F-distribution is a family of curves, each of which is determined by two types of degrees of freedom: the degrees of freedom corresponding to the variance in the numerator, denoted d.f.N, and the degrees of freedom corresponding to the variance in the denominator, denoted d.f.D.


Properties of the F-distribution continued:

2. F-distributions are positively skewed.
3. The total area under each curve of an F-distribution is equal to 1.
4. F-values are always greater than or equal to 0.
5. For all F-distributions, the mean value of F is approximately equal to 1.

[Figure: F-distribution curves for (d.f.N, d.f.D) = (1, 8), (8, 26), (16, 7), and (3, 11), plotted for F from 0 to 4.]
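Property 5 can be made more precise: the mean of an F-distribution is d.f.D/(d.f.D − 2) for d.f.D > 2, which approaches 1 as the denominator degrees of freedom grow. A short Python sketch (using scipy.stats.f with a few degree-of-freedom pairs chosen only for illustration) shows this:

```python
from scipy import stats

# Mean of the F-distribution for increasing denominator degrees of freedom.
# The mean equals dfd / (dfd - 2) for dfd > 2 and tends toward 1.
for dfn, dfd in [(3, 5), (3, 11), (8, 26), (8, 100)]:
    print(dfn, dfd, round(stats.f.mean(dfn, dfd), 3))
# 3 5   1.667
# 3 11  1.222
# 8 26  1.083
# 8 100 1.02
```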


Critical Values for the F-Distribution

Finding Critical Values for the F-Distribution

1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f.N.
3. Determine the degrees of freedom for the denominator, d.f.D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis test is
   a. one-tailed, use the α F-table.
   b. two-tailed, use the ½α F-table.


Critical Values for the F-Distribution

Example: Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5, and d.f.D = 28.

Appendix B, Table 7: F-Distribution (α = 0.05)

                       d.f.N: Degrees of freedom, numerator
d.f.D: denominator     1        2        3        4        5        6
1                      161.4    199.5    215.7    224.6    230.2    234.0
2                      18.51    19.00    19.16    19.25    19.30    19.33
...
27                     4.21     3.35     2.96     2.73     2.57     2.46
28                     4.20     3.34     2.95     2.71     2.56     2.45
29                     4.18     3.33     2.93     2.70     2.55     2.43

The critical value is F0 = 2.56.


Critical Values for the F-Distribution

Example: Find the critical F-value for a two-tailed test when α = 0.10, d.f.N = 4, and d.f.D = 6. For a two-tailed test, use ½α = ½(0.10) = 0.05.

Appendix B, Table 7: F-Distribution (α = 0.05)

                       d.f.N: Degrees of freedom, numerator
d.f.D: denominator     1        2        3        4        5        6
1                      161.4    199.5    215.7    224.6    230.2    234.0
2                      18.51    19.00    19.16    19.25    19.30    19.33
3                      10.13    9.55     9.28     9.12     9.01     8.94
4                      7.71     6.94     6.59     6.39     6.26     6.16
5                      6.61     5.79     5.41     5.19     5.05     4.95
6                      5.99     5.14     4.76     4.53     4.39     4.28
7                      5.59     4.74     4.35     4.12     3.97     3.87

The critical value is F0 = 4.53.
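Both table lookups can be reproduced with a short Python sketch (using scipy.stats.f.ppf with the α values and degrees of freedom from the two examples above):

```python
from scipy import stats

# Right-tailed test: alpha = 0.05, d.f.N = 5, d.f.D = 28
f_right = stats.f.ppf(1 - 0.05, 5, 28)   # ppf(probability, dfn, dfd)
print(round(f_right, 2))                 # ≈ 2.56

# Two-tailed test: alpha = 0.10, so use (1/2)alpha = 0.05; d.f.N = 4, d.f.D = 6
f_two = stats.f.ppf(1 - 0.05, 4, 6)
print(round(f_two, 2))                   # ≈ 4.53
```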

Two-Sample F-Test for Variances

Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances σ1² and σ2² when a sample is randomly selected from each population. The populations must be independent and normally distributed. The test statistic is

F = s1²/s2²

where s1² and s2² represent the sample variances with s1² ≥ s2². The degrees of freedom for the numerator is d.f.N = n1 – 1 and the degrees of freedom for the denominator is d.f.D = n2 – 1, where n1 is the size of the sample having variance s1² and n2 is the size of the sample having variance s2².
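To make the mechanics concrete, here is a minimal Python sketch of the test statistic (the two samples below are made up purely for illustration; the larger sample variance goes in the numerator so that F ≥ 1):

```python
from scipy import stats
from statistics import variance   # sample variance (n - 1 in the denominator)

# Hypothetical samples, for illustration only
sample_1 = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5]
sample_2 = [13.0, 13.4, 13.1, 13.6, 13.2, 13.5]

var_1, var_2 = variance(sample_1), variance(sample_2)

# Place the larger sample variance in the numerator (s1^2 >= s2^2)
if var_1 >= var_2:
    f_stat, dfn, dfd = var_1 / var_2, len(sample_1) - 1, len(sample_2) - 1
else:
    f_stat, dfn, dfd = var_2 / var_1, len(sample_2) - 1, len(sample_1) - 1

# Right-tailed critical value at alpha = 0.05
critical_value = stats.f.ppf(1 - 0.05, dfn, dfd)

print(round(f_stat, 3), round(critical_value, 3))
# Reject H0 (equal variances) if the statistic falls in the rejection region
print(f_stat > critical_value)
```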


Two-Sample F-Test for Variances

Using a Two-Sample F-Test to Compare σ1² and σ2² (In Words / In Symbols)

1. Identify the claim. State the null and alternative hypotheses.   (State H0 and Ha.)
2. Specify the level of significance.   (Identify α.)
3. Identify the degrees of freedom.   (d.f.N = n1 – 1, d.f.D = n2 – 1)
4. Determine the critical value.   (Use Table 7 in Appendix B.)

Continued.
