Simple Linear Regression Notes


Contingency Table Analysis

When finished with these notes, you should be able to:
- Define the terms
- Explain why the test is conducted as it is
- Test a hypothesis of equal proportions and a test of independence
- Determine when it should be used
- Explain the requirements of the procedure
- Determine if the requirements are met
- Calculate the values using NCSS

1. Terms

1.1 Contingency table (also called a pivot table): a table that compares two qualitative variables and counts the number of observations that fall in each combination of the variables.

Example: You have asked 100 politicians their party affiliation and whether they will vote for a specific candidate. The following table was constructed from their answers:

Observed Counts of Party Affiliation and Voting Combinations

             Party Affiliation
Vote     Dem   Rep   Other   Total
Yes       24    18      18      60
No        16    22       2      40
Total     40    40      20     100

1.2 Independence: The proportion of objects in a specific row does not depend on the column.

Example: If party affiliation and the vote are independent, then the proportion of Democrats who vote Yes should be the same as the proportion of Republicans who vote Yes, and the same for the Other group.

Since 60% of politicians overall vote Yes, you would expect 60% of Democrats, 60% of Republicans, and 60% of the Other group to vote Yes if party were independent of the vote. Each expected count is therefore the row total times the column total divided by the grand total; for example, Democrats voting Yes: 60 * 40 / 100 = 24.
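The expected counts can be reproduced with a short script. This is only a sketch in plain Python, not the NCSS or SAS procedure used in the course; the function name `expected_counts` is made up for illustration, and the observed counts are those from the party affiliation example.

```python
# Observed counts from the example (rows: Yes, No; columns: Dem, Rep, Other).
observed = [
    [24, 18, 18],
    [16, 22, 2],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total.

    Hypothetical helper written for these notes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Yes", "No"], expected):
    print(label, row)
```

Each row of the result keeps the same 60% / 40% split in every column, which is exactly what independence means here.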

Expected Counts if Party Was Independent of Vote

             Party Affiliation
Vote     Dem   Rep   Other   Total
Yes       24    24      12      60
No        16    16       8      40
Total     40    40      20     100

2. Explanation of Test Procedure

If these were the population values, what would you conclude about the observed counts? Is party affiliation independent of voting?

If these were sample values, then we need to see whether the distance between the observed counts and the expected counts is larger than would occur just by chance. If the difference is too large, you conclude there is a relationship.

For the test, we will calculate a measure of the distance between the observed and expected counts and sum this over all combinations. Since some differences are positive and some are negative, the square of the distance will be used. Also, since larger expected counts could have large differences just by chance, we will divide the squared distance by the expected count. This is called the chi-square test statistic, where o is the observed count and e is the expected count in a cell:

$$\chi^2 = \sum_{\text{all cells}} \frac{(o - e)^2}{e}$$

This is then compared to a value from a χ² table with degrees of freedom equal to the number of rows minus one times the number of columns minus one: (r-1)*(c-1).

Example: The rejection region is a range of values such that, if independence were true, a chi-square value in this range would occur by chance only 5% of the time.

For a table with 2 rows and 3 columns, there are (2-1)*(3-1) = 2 degrees of freedom, and the table value would be 5.991. Therefore you would reject independence if you find a test statistic value larger than 5.991.
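The table value 5.991 can be checked by hand in this one special case. With exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2), so the critical value is -2*ln(alpha). The sketch below relies on that fact and is valid only for 2 degrees of freedom; for any other table size, use a chi-square table or a statistics package. The function name `chi2_critical_df2` is hypothetical.

```python
import math


def chi2_critical_df2(alpha):
    """Upper-tail critical value of the chi-square distribution with 2 df.

    Valid ONLY for 2 degrees of freedom, where P(X > x) = exp(-x / 2).
    Hypothetical helper written for these notes.
    """
    return -2.0 * math.log(alpha)


rows, cols = 2, 3                  # Yes/No by Dem/Rep/Other
df = (rows - 1) * (cols - 1)       # (r-1)*(c-1)
critical = chi2_critical_df2(0.05)
print(df, round(critical, 3))
```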

For the previous two tables, the test statistic is

24  242 18  242 (18 12)2  2     24 24 12 16 162 22 162 2 82    11.25 16 16 8

Therefore we would reject independence.

Recap:
Ho: Voting and party affiliation are independent
H1: Voting is related to party affiliation
R.R.: Reject Ho if the χ² test statistic value > χ²(2) = 5.991
T.S.: χ² = 11.25
Decision: Reject Ho
Conclusion: We can say that voting is associated with party affiliation.

For other examples double click the embedded Excel file below

Can you conclude that the proportion of cars in each origin category (Foreign versus Domestic) depends on the type of car (SUV, Pickup, Sedan)?

You took a random sample of cars and recorded both the type of car and the origin of the car. You are given that the chi-square test statistic value is 7.503, or click on the following link: http://wweb.uta.edu/faculty/eakin/busa5325/chisqTest.xls

For use in the Department of Labor, see Table 2.5 at: http://www.dol.gov/esa/whd/fmla/chapter1.htm

Exercise on Blackboard. Due date on Blackboard.

3. Test of equal proportions versus the test of independence

- Both tests use the same test statistic and same rejection region. The only difference between the test of equal proportions and the test of independence is in the way the data is collected.
- The data for chi-square tests can be collected identically to the ways it is collected in Analysis of Variance, except that in ANOVA the response is quantitative and only the factors are qualitative, while in the chi-square test both the response and the factor are qualitative.

For the chi-square test:
- When you take a random sample of objects and have measured two qualitative variables on each object, the test becomes a test of independence of the two variables. The null hypothesis would be independence and the alternative dependence.
- The test becomes a test of equal proportions under either of the following data collection approaches: (1) when you take 3 or more random samples or (2) when an experiment is conducted and experimental units are randomly assigned to the different treatment levels. The response variable is qualitative in each data collection approach. The null would be equal proportions and the alternative would be that at least two proportions differ.

4. What are the requirements of the procedure?

- The chance of an object being in a particular combination does not change from object to object.
- The objects are randomly and independently selected.
- You have enumerated all possible values of the qualitative variables.
- The sample size is large enough (all expected counts must equal or exceed 5) so that the sample proportions are approximately normally distributed.

5. How do you check the requirements?

Only the sample size requirement can be checked by the computer.
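As an illustration of that check, the sketch below computes every expected count from the row and column totals and verifies that none falls below 5. It is plain Python rather than the NCSS or SAS output, and the function name `meets_sample_size_requirement` is hypothetical.

```python
def meets_sample_size_requirement(table, minimum=5):
    """Return True if every expected count is at least `minimum`.

    Expected counts are row total * column total / grand total.
    Hypothetical helper written for these notes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [r * c / grand_total for r in row_totals for c in col_totals]
    return min(expected) >= minimum


# Party affiliation example (rows: Yes, No; columns: Dem, Rep, Other).
observed = [[24, 18, 18], [16, 22, 2]]
ok = meets_sample_size_requirement(observed)
print(ok)
```

Note that the requirement applies to the expected counts, not the observed ones: the observed table contains a cell with only 2 observations, but its expected count is 8, so the requirement is met.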

6. SAS

Click here to see how to run the chi-square test for a contingency table with SAS
