Learn to Use Pearson’s Chi-Squared Test in SPSS With Data From the American National Election Study (2012)

© 2015 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This dataset example introduces Pearson’s chi-squared test for independence. It is often referred to using the symbol χ². Pearson’s chi-squared test allows researchers to test whether the distributions of two categorical variables are independent of each other or if they are related. In that way, this technique builds directly on cross-tabulation. Many other statistical tests of independence exist, but Pearson’s chi-squared test is one of the most frequently used among them.

This example describes Pearson’s chi-squared test, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate this using a subset of data from the 2012 American National Election Study (ANES). Specifically, we test whether views on same-sex marriage differ in a statistically significant way between men and women. You will find links to this dataset and a guide to carrying out Pearson’s chi-squared test using statistical software.

What Is Pearson’s Chi-Squared Test for Independence?

Pearson’s chi-squared test for independence measures whether the distributions of two categorical variables are independent of each other. In other words, it measures whether there is a meaningful relationship between the two variables. Categorical variables are defined by having a fixed number of groupings or categories determined by a preassigned code. Those categories must be mutually exclusive and exhaustive, meaning that every case must fall into one and only one of the categories.

When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the standard null hypothesis is that the two categorical variables in question are independent of each other (i.e., they are not associated). The test begins with a cross-tabulation of the two categorical variables to produce a contingency table. It then compares the observed number of cases in each cell of the contingency table to the number of cases that would be expected if the two variables were completely unrelated. Some difference between the observed and expected count in each cell is likely simply due to random chance. Pearson’s chi-squared test is designed to help us determine whether the differences are large enough to declare the result statistically significant. Doing so would lead us to reject the null hypothesis (H0) of independence and conclude that there likely is a relationship between the two variables.

Calculating Pearson’s Chi-Squared Test

The logic of comparing observed to expected counts in a contingency table is best illustrated by explaining the formula used to calculate Pearson’s chi-squared test of independence.

First, suppose you have a contingency table with R rows and C columns, meaning that there are a total of R×C cells in the table. The expected number of observations in each cell under the null hypothesis of independence is:

(1)  $$E_{r,c} = \frac{N_r \times N_c}{N_T}$$

Where:

• $E_{r,c}$ = the expected number of observations in the cell in row r and column c
• $N_r$ = the number of observations in row r
• $N_c$ = the number of observations in column c
• $N_T$ = the total number of observations in the table

Table 1 illustrates this with two hypothetical variables – Party Identification and Gender. In this hypothetical example, there are 60 women and 40 men, constituting a sample of 100. Of these, 45 are Democrats, 20 are Independents, and 35 are Republicans. This represents a 3 × 2 contingency table, producing 6 cells for which we need to calculate the expected number of observations if gender and party identification are completely unrelated to each other.

Table 1: Empty Table of Party Identification and Gender

Party ID       Female    Male    Sum
Democrat       ?         ?        45
Independent    ?         ?        20
Republican     ?         ?        35
Sum            60        40      100

We can use the formula in Equation 1 to replace the question marks in Table 1 with the expected number of observations that should fall in each cell. Remember, these calculations represent what we would expect to see if the two variables were independent (i.e., if the null hypothesis were true). Table 2 provides the results.

Table 2: Expected Counts of Party Identification and Gender

Party ID       Female                  Male                    Sum
Democrat       (45 × 60)/100 = 27      (45 × 40)/100 = 18       45
Independent    (20 × 60)/100 = 12      (20 × 40)/100 = 8        20
Republican     (35 × 60)/100 = 21      (35 × 40)/100 = 14       35
Sum            60                      40                      100
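The expected counts in Table 2 can be reproduced with a few lines of Python. This is only a minimal sketch of Equation 1; the function and variable names (`expected_counts`, `row_totals`, `col_totals`) are illustrative choices, not part of the original guide or of SPSS.

```python
# Marginal totals taken from the hypothetical example in Table 1.
row_totals = {"Democrat": 45, "Independent": 20, "Republican": 35}
col_totals = {"Female": 60, "Male": 40}
grand_total = sum(row_totals.values())  # total number of observations (100)


def expected_counts(row_totals, col_totals, grand_total):
    """Apply Equation 1 to every cell: row total * column total / grand total."""
    return {
        (row, col): n_row * n_col / grand_total
        for row, n_row in row_totals.items()
        for col, n_col in col_totals.items()
    }


expected = expected_counts(row_totals, col_totals, grand_total)
for (row, col), value in expected.items():
    print(f"{row:<12} {col:<7} {value:5.1f}")
```

Because every expected count is built from the marginal totals, the expected counts in each row and column add up to the same totals as in Table 1.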

Table 2 shows that if the two variables are unrelated, we would expect to have 27 female Democrats, 12 female Independents, 14 male Republicans, and so forth in our sample of 100 respondents. Next we must compare these expected counts for each cell in the contingency table to what we actually observe in our sample. Every sample of data differs due to random chance, or what is sometimes referred to as sampling error. Thus we would never expect the observed and expected counts to be exactly equal to each other. Rather, we will almost certainly see some differences between them. The question is whether those differences are large enough to lead us to reject the null hypothesis. This is exactly what Pearson’s chi-squared test measures.

The formula for Pearson’s chi-squared test of independence is presented in Equation 2.

(2)  $$\chi^2 = \sum_{r=1}^{R} \sum_{c=1}^{C} \frac{(O_{r,c} - E_{r,c})^2}{E_{r,c}}$$

Where:

• $O_{r,c}$ = the observed count in the cell defined by row r and column c
• $E_{r,c}$ = the expected count in the cell defined by row r and column c

Note that the calculation for each cell is then summed (as denoted by the summation symbols for both rows and columns).
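The cell-by-cell calculation and summation can be sketched in Python as follows. The observed counts used here are invented purely for illustration (they are not from the guide or from any dataset); they were chosen only so that their row and column totals match the hypothetical party identification and gender example. The function name `chi_squared_statistic` is likewise an assumption.

```python
def chi_squared_statistic(observed):
    """Pearson's chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(observed):
        for c, obs in enumerate(row):
            exp = row_totals[r] * col_totals[c] / total  # expected count for this cell
            statistic += (obs - exp) ** 2 / exp          # this cell's contribution
    return statistic


# Hypothetical observed counts, invented for illustration only.
# Rows: Democrat, Independent, Republican. Columns: Female, Male.
observed = [[30, 15], [12, 8], [18, 17]]
chi2 = chi_squared_statistic(observed)
print(round(chi2, 3))
```

For these made-up counts the statistic is about 1.90. If the observed counts were exactly equal to the expected counts, every term would be zero and so would the statistic.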

The resulting χ² statistic follows a chi-squared probability distribution with degrees of freedom (df) equal to (R − 1)(C − 1), where:

• R = the number of rows in the contingency table
• C = the number of columns in the contingency table

If the resulting χ² value is large enough to exceed a critical threshold, that constitutes evidence for rejecting the null hypothesis and concluding that there likely is a relationship between the two variables. Accounting for the degrees of freedom is important because, as the number of cells in the table increases, the value of the chi-squared statistic tends to get larger even if there is no relationship between the two variables in question. Basing the degrees of freedom on the number of rows and columns in the table compensates for this.

Social scientists generally choose a critical value for the chi-squared test such that, if the null hypothesis were true, there would be less than a .05 probability of seeing a result this extreme (in this case, differences this large between the observed and expected counts) strictly due to random chance. Thus researchers tend to reject the null hypothesis only when a test statistic has a corresponding significance level (p-value) equal to or less than .05.
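To make the link between the statistic, the degrees of freedom, and the .05 threshold concrete, here is a small standard-library sketch. It relies on a closed-form expression for the upper-tail probability that holds only when df is even, which covers the 3 × 2 tables in this guide (df = 2). The function names are assumptions made for illustration; in practice, statistical software (for example, the chi-squared distribution functions in SciPy) handles any df.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1)(C - 1) for a contingency table with R rows and C columns."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail(statistic, df):
    """P(X >= statistic) for a chi-squared variable with an EVEN number of df.

    Uses the closed form exp(-x/2) * sum over i < k of (x/2)^i / i!, with df = 2k.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = statistic / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))


df = degrees_of_freedom(3, 2)            # a 3 x 2 table
critical_value = -2.0 * math.log(0.05)   # .05 critical value; this shortcut is valid only for df = 2
print(df, round(critical_value, 3), round(chi2_upper_tail(critical_value, df), 3))
```

With df = 2 the critical value is roughly 5.99, so a chi-squared statistic above that value corresponds to a p-value below .05.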

Assumptions Behind the Method

Nearly every statistical test relies on some underlying assumptions, and they all are affected by the mix of data you happen to have. Critical considerations for Pearson’s chi-squared test include:

• The two variables in question are categorical, each with two or more categories.
• The expected count does not drop below 5 in more than 20% of the cells in the contingency table at hand.

If your table is only 2 × 2, it is recommended that you have at least 10 observations per cell. If not, Fisher’s exact probability test is a more robust alternative to Pearson’s chi-squared test; it is readily available in most statistical software packages.
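The expected-count guideline is easy to check before running the test. The sketch below uses the expected counts from the hypothetical party identification and gender example; the helper names and the way the 20% rule is encoded are assumptions made for illustration.

```python
def share_below_threshold(expected, threshold=5.0):
    """Share of cells whose expected count is below `threshold`."""
    cells = [value for row in expected for value in row]
    return sum(value < threshold for value in cells) / len(cells)


def meets_rule_of_thumb(expected, max_share=0.20):
    """True when no more than `max_share` of cells have an expected count below 5."""
    return share_below_threshold(expected) <= max_share


# Expected counts from the hypothetical party identification example.
expected_table_2 = [[27, 18], [12, 8], [21, 14]]
small_share = share_below_threshold(expected_table_2)
print(small_share, meets_rule_of_thumb(expected_table_2))
```

Here no cell has an expected count below 5, so the guideline is satisfied.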

Illustrative Example: Gender and Views on Same-Sex Marriage

This example presents a Pearson’s chi-squared test of independence between two variables taken from the 2012 American National Election Study (ANES). Specifically, it examines whether views on same-sex marriage are independent of gender. This is pertinent because understanding differences in views on sensitive issues might make formulating policy around them easier.

This example addresses the following research question:

Do views on same-sex marriage differ between men and women?

Stated in the form of a null hypothesis:

H0 = There is no association between views on same-sex marriage and gender.

The Data

This example uses a subset of data from the American National Election Studies series – a series of national political surveys conducted nearly every two years in the United States since 1948. This extract from the 2012 study includes 5492 respondents. The two variables we examine are:

• The respondent’s gender (sex)
• The respondent’s views on same-sex marriage (gaymarry)

For gender, respondents were classified as male or female. For the second variable, respondents chose one of three answers:

1. Gay and lesbian couples should be allowed to legally marry.
2. Gay and lesbian couples should be allowed civil unions but not marriage.
3. There should be no legal recognition of gay or lesbian couples’ relationships.

Both of these are categorical variables, making them appropriate to use for cross-tabulation and, thus, for using Pearson’s chi-squared test to determine whether there is a statistically significant relationship between them.
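As a rough sketch of what cross-tabulating two categorical variables involves, the Python snippet below counts how many records fall into each combination of categories. The six records are made up for illustration and are not taken from the ANES file; the category labels are shortened stand-ins for the answer options, and the actual codes stored in the sex and gaymarry variables may differ.

```python
from collections import Counter

# Made-up respondent records as (sex, gaymarry) pairs; NOT real ANES data.
records = [
    ("Male", "Marriage"),
    ("Female", "Marriage"),
    ("Female", "Civil unions"),
    ("Male", "No recognition"),
    ("Female", "Marriage"),
    ("Male", "Civil unions"),
]

cell_counts = Counter(records)                      # observed count in each cell
sex_totals = Counter(sex for sex, _ in records)     # column totals
view_totals = Counter(view for _, view in records)  # row totals

for (sex, view), count in sorted(cell_counts.items()):
    print(f"{sex:<7} {view:<15} {count}")
```

Because each record belongs to exactly one category of each variable, the cell counts always add up to the number of records.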

Analyzing the Data

The first step is to generate a cross-tabulation of the two variables in question. Since we are also going to calculate Pearson’s chi-squared statistic, we should present both the observed and expected counts in the resulting contingency table. We also present column percentages to facilitate comparison of views on same-sex marriage among men versus women. Table 3 presents the contingency table.

Table 3: Cross-tabulation between views on same-sex marriage and gender, including observed and expected counts, 2012 American National Election Study.

                                                          Men        Women      Total
Gay and lesbian couples should be allowed to legally marry
    Count                                                 1000       1221       2221
    Expected Count                                        1079.4     1141.6     2221.0
    % of Gender                                           37.47%     43.25%     40.44%
Gay and lesbian couples should be allowed to form civil unions
    Count                                                 943        918        1861
    Expected Count                                        904.4      956.6      1861.0
    % of Gender                                           35.33%     32.52%     33.89%
There should be no legal recognition of same-sex couples
    Count                                                 726        684        1410
    Expected Count                                        685.2      724.8      1410.0
    % of Gender                                           27.20%     24.23%     25.67%
Total
    Count                                                 2669       2823       5492
    Expected Count                                        2669.0     2823.0     5492.0
    % of Gender                                           100.0      100.0      100.0

Table 3 includes a total of 5492 respondents. Of those, 2669 were men and 2823 were women. Of the men, about 37% thought that same-sex marriage should be legal, while just over 43% of women held that view. In contrast, just over 27% of men felt there should be no legal recognition of same-sex couples, while only 24% of women felt that way. These results suggest a relationship between gender and views on same-sex marriage, but we can use Pearson’s chi-squared test to evaluate this formally.
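The expected counts and column percentages reported in Table 3 can be recomputed from the observed counts alone. The following is a minimal Python sketch; the shortened row labels and the variable names are illustrative choices rather than anything produced by SPSS.

```python
# Observed counts from Table 3, as (men, women) for each response.
observed = {
    "Legal marriage": (1000, 1221),
    "Civil unions": (943, 918),
    "No legal recognition": (726, 684),
}

col_totals = [sum(col) for col in zip(*observed.values())]  # men, women
total = sum(col_totals)

table = {}
for view, counts in observed.items():
    row_total = sum(counts)
    table[view] = {
        # Expected count: row total * column total / grand total.
        "expected": [round(row_total * n_col / total, 1) for n_col in col_totals],
        # Column percentage: share of each gender giving this response.
        "percent": [round(100 * n / n_col, 2) for n, n_col in zip(counts, col_totals)],
    }

for view, stats in table.items():
    print(view, stats["expected"], stats["percent"])
```

The rounded values match the Expected Count and % of Gender rows of Table 3.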

Applying Equation 2 to the data in Table 3 produces a value of Pearson’s chi-squared statistic of 19.27 (df = 2). This value must then be compared to a chi-squared probability distribution to determine the probability of observing a value for the chi-squared test this large or larger strictly due to random chance. This probability, or significance level, helps us decide whether to reject or fail to reject the null hypothesis of no relationship.

This could be done by comparing the calculated value of the test statistic – 19.27 in this case – to a table of critical values for the chi-squared distribution. Many textbooks contain such tables. However, nearly all statistical software programs (e.g., SPSS, R, Stata, SAS) do this automatically and report the exact p-value (level of significance). For this example, the p-value is much lower than the traditional cut-off value of 0.05, meaning we would reject the null hypothesis of no relationship.
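For readers who want to check these figures outside SPSS, the statistic and its p-value can be recomputed from the observed counts in Table 3. This is a sketch using only the Python standard library; the p-value line uses a shortcut that is exact only when df = 2, as it is here, and the variable names are illustrative.

```python
import math

# Observed counts from Table 3: rows are the three responses, columns are men and women.
observed = [[1000, 1221], [943, 918], [726, 684]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for r, row in enumerate(observed):
    for c, obs in enumerate(row):
        exp = row_totals[r] * col_totals[c] / total  # expected count under independence
        chi2 += (obs - exp) ** 2 / exp               # contribution of this cell

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi2 / 2)  # upper-tail probability; this formula holds only for df = 2

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.6f}")
```

The statistic rounds to 19.27 with df = 2, and the resulting p-value is well below 0.001, consistent with rejecting the null hypothesis at the .05 level.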

Presenting Results

A Pearson’s chi-squared test of independence can be reported as follows:

“We used a subset of data from the 2012 American National Election Study to test the null hypothesis:

H0 = There is no association between views on same-sex marriage and gender.

The data included 5492 respondents who reported their gender and their views on same-sex marriage. Table 3 shows that women are somewhat more likely to support legal marriage for same-sex couples (43%) compared to men (37%). In contrast, men are somewhat more likely to say there should be no legal recognition for same-sex couples (27%) than are women (24%). A Pearson’s chi-squared test of independence was calculated, yielding a statistic of 19.27, df = 2, p < 0.05. This leads us to reject the null hypothesis of no relationship between gender and views on same-sex marriage. The analysis supports the conclusion that gender and views on same-sex marriage are associated with each other.”

Review

Pearson’s chi-squared test of independence is a statistical test used to evaluate whether or not there is an association between two categorical variables. While the test is valuable, it only tests against the null hypothesis of no relationship. Rejecting that null does not say anything about what sort of relationship there is between the variables in the contingency table – only that some sort of relationship appears to exist.

You should know:

• What types of variable are suitable for Pearson’s chi-squared test of independence.
• How Pearson’s chi-squared test is an extension of cross-tabulation.
• How to compute and interpret Pearson’s chi-squared test, including both observed and expected counts.
• How to report the results of a Pearson’s chi-squared test of independence.

Your Turn

You can download this sample dataset along with a guide showing how to produce Pearson’s chi-squared test of independence using statistical software. The sample dataset also includes another variable called religimport that records whether or not a respondent says that religion is an important part of their life. See if you can reproduce the results presented here, then try producing your own Pearson’s chi-squared test of independence between gender and religious importance.
