Data Analysis for Two-Way Tables Section 9.1: Chi-Square Test for Two-Way Tables
Section 2.5: Data Analysis for Two-Way Tables
Section 9.1: Chi-Square Test for Two-Way Tables

Learning goals for this chapter:
- Find the joint, marginal, and conditional distributions from a two-way table of counts, by hand and with SPSS.
- Determine from the wording of the story whether the question is asking for a joint, marginal, or conditional percentage/probability.
- Know when two-way tables and the chi-square test are the correct statistical technique for a story.
- Perform a hypothesis test for a $\chi^2$ test, including: stating the hypotheses, obtaining the test statistic and P-value from SPSS, and writing a conclusion in terms of the story.
- Check the assumptions to see whether it is appropriate to use a $\chi^2$ test, using the footnote of the SPSS test output.

Two-way tables and the chi-square test are used when you are studying the association between 2 categorical variables.

The joint distribution of the 2 categorical variables is found in the inner squares of the table. Each entry is

$$\text{joint proportion} = \frac{\text{cell count}}{\text{total count}}$$

All of the joint proportions should add to 1.

The marginal distribution allows us to study 1 variable at a time. You get the marginals by adding across a row or down a column for the specific variable you are interested in. The marginals are written in the margins of the table (far right and very bottom). The marginals for the row variable should add to 1. The marginals for the column variable should add to 1.

Conditional distribution: If you know the value of one variable for sure (you have “reduced your world”), what are the respective percentages for the other variable? Bar graphs are a good way to display conditional distributions.

Hypothesis testing with 2-way tables

H0: There is no association between the row and column variables in the population.
Ha: There is an association between the row and column variables in the population.

To test the null hypothesis, compare the observed cell counts with the expected cell counts calculated under the assumption that the null hypothesis is true.
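The comparison of observed and expected counts can be sketched in Python as a cross-check on SPSS. This is an illustration only, not part of the course software: the function names are hypothetical, the expected count is taken as row total × column total / n, the statistic is the sum of (observed − expected)² / expected over all cells, and the degrees of freedom are (r − 1)(c − 1). The P-value helper assumes the standard recurrence for the chi-square upper tail. The counts are those of the wine and music example in this section.

```python
import math

# Wine and music counts from this section.
# Rows: wine (French, Italian, Other); columns: music (None, French, Italian).
WINE_COUNTS = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def expected_counts(observed):
    """Expected count per cell under H0: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """X^2 = sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


def chi_square_p_value(x2, df):
    """Upper-tail probability P(chi-square with df degrees of freedom >= x2).

    Starts from the closed form for df = 2 (even df) or df = 1 (odd df)
    and adds one term per extra 2 degrees of freedom.
    """
    h = x2 / 2
    if df % 2 == 0:
        q, a = math.exp(-h), 1.0
    else:
        q, a = math.erfc(math.sqrt(h)), 0.5
    while a < df / 2:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1
    return q


if __name__ == "__main__":
    x2 = chi_square_statistic(WINE_COUNTS)
    df = degrees_of_freedom(WINE_COUNTS)
    smallest = min(min(row) for row in expected_counts(WINE_COUNTS))
    print(f"X^2 = {x2:.3f}, df = {df}, P-value = {chi_square_p_value(x2, df):.3f}")
    print(f"smallest expected count = {smallest:.2f}")
```

For the wine counts this gives a statistic of about 18.279 with 4 degrees of freedom and a smallest expected count of about 9.57, in agreement with the SPSS output for that example.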
Test statistic: Chi-Square Test Statistic

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

$$\text{Expected count} = \frac{\text{row total} \times \text{column total}}{n},$$

where $n$ = total # of observations for the table.

The $X^2$ test statistic has an approximately chi-square distribution. To use the chi-square table, you need the degrees of freedom, $(r-1)(c-1)$, where $r$ is the number of rows and $c$ is the number of columns. Go to Table F in the back of the book.

WE WILL LET SPSS CALCULATE THE TEST STATISTIC AND P-VALUE FOR US. YOU DO NOT NEED TO KNOW HOW TO USE THE TABLE.

The P-value for the chi-square test is $P(\chi^2 \ge X^2)$. (We’ll be using SPSS to do the test.)

The chi-square test becomes more accurate as the cell counts increase and for tables larger than 2x2.

For tables larger than 2x2, use the chi-square test whenever:
- the average of the expected counts is 5 or more,
- the smallest expected count is 1 or more, and
- fewer than 20% of the cells have expected counts of less than 5.

For 2x2 tables, use the chi-square test whenever all 4 expected cell counts are 5 or more.

Example: Market researchers know that background music can influence the mood and purchasing behavior of customers. One study in a supermarket in Northern Ireland compared 3 treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the number of bottles of French, Italian, and other wine purchased. Here is the 2-way table that summarizes the data in counts (total # of bottles sold = 243):

                   Music
Wine        None   French   Italian
French        30       39        30
Italian       11        1        19
Other         43       35        35

Calculate the joint distribution for music and wine (percent of all 243 bottles):

                   Music
Wine        None   French   Italian
French      12.3     16.0      12.3
Italian      4.5      0.4       7.8
Other       17.7     14.4      14.4

Calculate the marginal distribution for music:

                          Music
Wine               None   French   Italian
French             12.3     16.0      12.3
Italian             4.5      0.4       7.8
Other              17.7     14.4      14.4
Marg. for music    34.6     30.9      34.6

Calculate the marginal distribution for wine:

                          Music
Wine               None   French   Italian   Marg. for wine
French             12.3     16.0      12.3             40.7
Italian             4.5      0.4       7.8             12.8
Other              17.7     14.4      14.4             46.5
Marg. for music    34.6     30.9      34.6              100

Questions (joint, marginal, conditional?):
1. “What percent of all wine bought was Italian with French music playing in the store?”
2. “Of the Italian wine purchased, what percent was from a store playing French music?”
3. “What percent of wine bought was Italian?”
4. “What percent of the wine purchased from French music-playing stores was French?”
5. “What percent of wine was purchased from a store with no music playing?”

Using SPSS, set up the data so that you have a wine column, a music column, and a purchase column (where you will input the counts from inside the table):

Wine       Music      Purchase
French     None             30
Italian    None             11
Other      None             43
French     French           39
Italian    French            1
Other      French           35
French     Italian          30
Italian    Italian          19
Other      Italian          35

Then go to Data --> Weight Cases. Click “Weight cases by” and then move “purchase” into the “Frequency Variable” box. Click OK.

Do Analyze --> Descriptive Statistics --> Crosstabs. Put “wine” into the “Rows” box and “music” into the “Columns” box. Under “Cells,” make sure “Observed” is checked. Click OK. You will get:

Type of Wine * Type of Music Crosstabulation
Count
                         Type of Music
Type of Wine    French   Italian   None   Total
French              39        30     30      99
Italian              1        19     11      31
Other               35        35     43     113
Total               75        84     84     243

Then, if you want the percentages for the joint and marginal distributions instead of counts, go back to your data and do Analyze --> Descriptive Statistics --> Crosstabs --> (your rows and columns should still be entered from the previous step) --> Click “Cells” --> Click “Total.” Also, un-check “Observed” so that your table does not also include the counts and become too crowded.
Click “Continue” and then “OK.” You will get:

Type of Wine * Type of Music Crosstabulation
% of Total
                         Type of Music
Type of Wine    French   Italian    None    Total
French           16.0%     12.3%   12.3%    40.7%
Italian            .4%      7.8%    4.5%    12.8%
Other            14.4%     14.4%   17.7%    46.5%
Total            30.9%     34.6%   34.6%   100.0%

Is there a relationship in the population between the type of wine purchased and the type of music that is playing? Perform a significance test, and write a short summary of your conclusion.

Hypotheses:
Test statistic:
P-value:
Conclusion in terms of the story:

Was it appropriate to use the chi-square test here? Justify your answer.

To make SPSS do the hypothesis test, go back to Analyze --> Descriptive Statistics --> Crosstabs --> Cells. Click “Total” again to remove its check mark, and check “Expected” under “Counts.” Click Continue. Then click Statistics --> Chi-Square --> Continue --> OK. You will get:

Chi-Square Tests
                      Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square    18.279a     4   .001
Likelihood Ratio      21.875      4   .000
N of Valid Cases      243
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.57.

Use the “Pearson Chi-Square” row to get your $X^2$ test statistic, and the “Asymp. Sig.” column to get the P-value.

Example: Psychological and social factors can influence the survival of patients with serious diseases. One study examined the relationship between survival of patients with coronary heart disease and pet ownership. Each of 92 patients was classified as having a pet or not and by whether they survived for one year. The researchers suspect that having a pet might be connected to patient status. Here are the data:

                   Pet ownership
Patient Status     No    Yes
Alive              28     50
Dead               11      3
Total              39     53

a) Find the joint and marginal distributions (in probabilities) of patient status and pet ownership.

                        Pet ownership
Patient Status       No      Yes     Marg. for status
Alive             0.304    0.543                0.848
Dead              0.120    0.033                0.152
Marg. for pets    0.424    0.576

b) Assuming a patient is still alive, what is the probability that the patient owns a pet? Is this a joint, marginal, or conditional probability?

c) What is the probability that a patient is still alive and owns a pet? Is this a joint, marginal, or conditional probability?

d) What is the probability that a patient owns a pet? Is this a joint, marginal, or conditional probability?

e) State the hypotheses for a $\chi^2$ test of this problem, and find the $X^2$ test statistic, its degrees of freedom, and the P-value. State your conclusion in terms of the original problem.

Hypotheses:
Test statistic:
P-value:
Conclusion in terms of the story:

Chi-Square Tests
                                Value       df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square              8.851(b)     1   .003
Continuity Correction(a)        7.190        1   .007
Likelihood Ratio                9.011        1   .003
Fisher's Exact Test                                            .006         .004
Linear-by-Linear Association    8.755        1   .003
N of Valid Cases                92

Student Handout for M&Ms/Skittles Activity (Chapter 9: Two Way Distributions)

Part 1: Plain vs. Peanut M&Ms

1. Your data for plain (mine for peanut), in counts:

          Brown   Yellow   Red   Blue   Orange   Green   Total
Plain
Peanut        2        3     5      0        8       4      22
Total

Overall total number of plain and peanut M&Ms counted:

2. Joint Distribution (in white boxes). Divide each count above by the overall total of M&Ms.
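The division in step 2 is the same calculation used for the wine example. As a minimal sketch (not part of the original handout, with hypothetical function names), the Python below computes joint, marginal, and conditional distributions from a table of counts. It is applied to the wine counts from this section, because the Plain row of the M&M table is left for you to fill in.

```python
# Wine and music counts from this section.
# Rows: wine (French, Italian, Other); columns: music (None, French, Italian).
WINE_COUNTS = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]


def joint_distribution(counts):
    """Each cell count divided by the overall total."""
    n = sum(sum(row) for row in counts)
    return [[c / n for c in row] for row in counts]


def marginal_distributions(counts):
    """Row totals and column totals, each divided by the overall total."""
    n = sum(sum(row) for row in counts)
    row_marginals = [sum(row) / n for row in counts]
    col_marginals = [sum(col) / n for col in zip(*counts)]
    return row_marginals, col_marginals


def conditional_given_row(counts, row_index):
    """Distribution of the column variable within one row ("reduced world")."""
    row_total = sum(counts[row_index])
    return [c / row_total for c in counts[row_index]]


if __name__ == "__main__":
    for row in joint_distribution(WINE_COUNTS):
        print([round(100 * p, 1) for p in row])  # joint percentages
    wine_marg, music_marg = marginal_distributions(WINE_COUNTS)
    print([round(100 * p, 1) for p in wine_marg])   # marginal for wine
    print([round(100 * p, 1) for p in music_marg])  # marginal for music
```

To use it for the M&M activity, replace the table with a two-row list of your Plain counts and the Peanut counts, leaving out the Total row and column.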