
Learn to Use Fisher’s Exact Test in SPSS With Greater Manchester Police’s Stop and Search Data (2017)

© 2019 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This example dataset introduces Fisher’s Exact test. This test allows researchers to test, in small samples, whether there is an association between two categorical variables, each of which is measured dichotomously, i.e., has only two groups (e.g., male/female, employed/unemployed). This example describes Fisher’s Exact test, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate Fisher’s Exact test using a subset of data from the 2017 (December) Greater Manchester Police’s Stop and Search data. Specifically, we test whether there is an association between gender and the (reported) reason for an individual being stopped and searched by the Police. Fisher’s Exact test is used instead of Pearson’s Chi-squared test when you have a small sample, which is typically anything under 1,000 cases. Fisher’s Exact test does not rely on distributional assumptions, which makes it particularly appropriate for the analysis of small samples. Fisher’s Exact test can be used on larger samples, but it is better to use alternative tests in this situation, such as Pearson’s Chi-squared, because Fisher’s Exact test was specifically designed to overcome the problems of small sample sizes in 2 × 2 contingency tables.

This page provides links to this sample dataset and a guide to producing the Fisher’s Exact test using statistical software.

What Is a Fisher’s Exact test?

The Fisher’s Exact test is a method for determining whether there is an association between two categorical variables (e.g., sex, ethnicity, occupation), each of which is measured dichotomously, that is, each has only two groups (e.g., male/female, employed/unemployed). In the form described here, the test applies to binary categorical variables arranged in a 2 × 2 table. Fisher’s Exact test is best suited for small samples (typically anything under 1,000 cases) because it does not rely on distributional assumptions. Extensions of the test can handle categorical variables with more than two categories, and the test can be run on larger samples, but there are better tests to choose for this sort of data, for example, Pearson’s Chi-squared test.

When computing formal statistical tests, one first defines the null hypothesis (H0) to be tested. In this case, the standard null hypothesis is that there is no association between the two variables. Some association is expected in the sample simply due to error, i.e., random chance in sampling. The Fisher’s Exact test conducted here is designed to help us determine whether the observed association is large enough to be declared statistically significant. “Large enough” is typically defined as a test with a level of statistical significance, or p-value, of less than .05, meaning that sample associations this large or larger would occur “just by random chance” in fewer than 5% of samples of this size if the null hypothesis were true. In that case, we would “reject the null hypothesis (H0) of no association between the two variables” at the .05 level.

Calculating Fisher’s Exact Test

Fisher’s Exact test was designed to correct the problem of small sample sizes when analysing 2 × 2 contingency tables. The test works by calculating the exact probability of the observed table, and of tables more extreme than it, under the maintained assumption of no association. By contrast, the Chi-square test statistic has a sampling distribution, which is approximately a Chi-square distribution; the larger the sample, the better the approximation and the less we should worry that it is an approximation and not an exact distribution. However, in small samples, the approximation makes significance testing of the Chi-square test statistic potentially inaccurate.

To illustrate, let’s imagine that we have collected data from 15 participants, 8 men and 7 women. We have asked them to identify (yes/no) whether they regularly use hair removal products.

Table 1 shows the results below.

Table 1: Cross-tabulation of Gender and Regular Use of Hair Removal Products.

                                          Gender
Regularly uses hair removal products    Male    Female    Total
Yes                                        2         3        5
No                                         6         4       10
Total                                      8         7       15

The cross-tabulation suggests a possible association, as 3 of the 5 regular hair removal product users are female, whereas 6 of the 10 non-regular users are male. However, we do not know whether this pattern is statistically significant: what is the probability that regular use of hair removal products would be this unevenly, or more unevenly, distributed between males and females just by chance if there is truly no association?

Table 1 is also known as a 2 × 2 contingency table; before we can calculate the probability, it is helpful to add some notation to Table 1, as illustrated in Table 2 below.

Table 2: Cross-tabulation of Gender and Regular Use of Hair Removal Products (Notation Added).

                                          Gender
Regularly uses hair removal products    Male         Female       Total
Yes                                     2 (a)        3 (b)        5 (a + b)
No                                      6 (c)        4 (d)        10 (c + d)
Total                                   8 (a + c)    7 (b + d)    15 (a + b + c + d = n)

Fisher’s Exact test uses the hypergeometric distribution to calculate the probability of observing a given 2 × 2 table when there is no association between the two variables. The formula is given in Equation (1).

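(1)

\[
p = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{n!\,a!\,b!\,c!\,d!}
\]

where:

• a, b, c, d = the cell counts and n = the total number of cases, as labelled in Table 2
• ! = the factorial operator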
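As a hedged illustration, Equation 1 can be evaluated directly for the observed counts in Table 2. The sketch below is written in Python rather than SPSS, and the function name table_probability is our own choice for illustration, not part of any package.

```python
from math import factorial


def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Probability of one 2 x 2 table with fixed margins, following Equation 1."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


# Observed counts from Table 2: a = 2, b = 3, c = 6, d = 4.
p_observed = table_probability(2, 3, 6, 4)
print(round(p_observed, 3))  # 0.326
```

This value is the probability of the observed table alone; it is not yet the p-value of the test, which also requires the probabilities of the other possible tables.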

To calculate the p-value with Equation 1, we need to consider all the possible 2 × 2 tables that could be constructed from these data while keeping the row and column totals fixed, as illustrated in Table 3 below.

Table 3: The Possible 2 × 2 Contingency Tables for the Cross-tabulation of Gender and Regular Use of Hair Removal Products.

Yes    0 5    1 4    2 3    3 2    4 1    5 0
No     8 2    7 3    6 4    5 5    4 6    3 7

Each column is one possible table. In each pair, the first count is for males and the second is for females.

You then calculate the probability for each table, using Equation 1, as shown in Table 4 below.

Table 4: The Probability for Possible 2 × 2 Contingency Tables for the Cross-tabulation of Gender and Regular Use of Hair Removal Products.

Table          0 5      1 4      2 3      3 2      4 1      5 0
               8 2      7 3      6 4      5 5      4 6      3 7
Probability    0.007    0.093    0.326    0.392    0.163    0.019
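The enumeration behind Tables 3 and 4 can be sketched in Python as follows. This is a minimal illustration that assumes the row and column totals of Table 1 stay fixed; the variable names are our own, and the binomial-coefficient form used here is algebraically equivalent to Equation 1.

```python
from math import comb

# Fixed margins from Table 1: 5 "Yes", 10 "No", 8 males, 7 females.
row_yes, row_no, col_male, col_female = 5, 10, 8, 7
n = row_yes + row_no

probabilities = {}
# a is the number of males answering "Yes"; it determines the other three cells.
for a in range(max(0, row_yes - col_female), min(row_yes, col_male) + 1):
    b = row_yes - a
    c = col_male - a
    d = col_female - b
    # Hypergeometric probability of this table under no association.
    probabilities[(a, b, c, d)] = (
        comb(col_male, a) * comb(col_female, b) / comb(n, row_yes)
    )

for cells, p in probabilities.items():
    print(cells, round(p, 3))
```

The six probabilities printed match the entries of Table 4 and sum to 1, which is a useful check that no possible table has been missed.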

Fisher’s Exact test works best with one-tailed hypotheses, although it can also be used for two-tailed hypotheses.

To calculate the probability for a one-tailed hypothesis, we need to sum the probability of the given table and of every other table that is configured in the same direction as the given table but is more extreme than the observed frequencies. In this example that would be 0.326 + 0.093 + 0.007 = 0.426.

To calculate the probability for a two-tailed hypothesis, we take the tables on the other side of the observed frequencies (to the right in Table 4) that have a probability less than or equal to that of the given table, sum their probabilities, and add the result to the one-tailed probability. In this example that would be 0.426 + 0.163 + 0.019 = 0.608.
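The one-tailed and two-tailed sums described above can be sketched in Python. This is an illustrative calculation for the toy example only; the helper name prob and the small tolerance used when comparing probabilities are our own assumptions.

```python
from math import comb


def prob(a: int) -> float:
    """Probability of the table with `a` males answering "Yes" (margins of Table 1)."""
    return comb(8, a) * comb(7, 5 - a) / comb(15, 5)


observed_a = 2
p_obs = prob(observed_a)

# One-tailed: the observed table plus the tables more extreme in the same direction.
one_tailed = sum(prob(a) for a in range(0, observed_a + 1))

# Two-tailed: every table whose probability does not exceed that of the observed table.
two_tailed = sum(prob(a) for a in range(0, 6) if prob(a) <= p_obs + 1e-12)

print(round(one_tailed, 3), round(two_tailed, 3))
```

Because the sketch sums unrounded probabilities, the one-tailed value prints as 0.427 rather than the 0.426 obtained by adding the rounded entries of Table 4; the two-tailed value is 0.608 either way.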

In our example, the Fisher’s Exact test p-value is 0.426 for a one-tailed hypothesis and 0.608 for a two-tailed hypothesis. Both values are well above .05, so we fail to reject H0; in other words, we have no evidence of an association between the two variables.
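For comparison, the large-sample approximation discussed earlier can be applied to the same 15 cases. The Python sketch below uses the standard shortcut formula for Pearson’s Chi-square in a 2 × 2 table and the identity that, with 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x / 2)). Neither formula appears in this guide, so treat the block as an illustration under those assumptions.

```python
from math import erfc, sqrt

a, b, c, d = 2, 3, 6, 4  # observed counts from Table 1
n = a + b + c + d

# Pearson's Chi-square statistic for a 2 x 2 table (no continuity correction).
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Upper-tail probability of a Chi-square distribution with 1 degree of freedom.
p_asymptotic = erfc(sqrt(chi_square / 2))

print(round(chi_square, 3), round(p_asymptotic, 3))
```

The approximate p-value comes out at roughly 0.46, noticeably different from the exact two-tailed value of 0.608, which illustrates why the approximation is not trusted in a sample this small.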

Assumptions Behind the Method

All statistical tests rely on some underlying assumptions, and they all are affected by the type of data that you have.

Assumptions of the Fisher’s Exact test

• Both variables must have two categorical, independent groups.
• There must be independence of observations, so there is no relationship between the groups or between the observations in each group.

These assumptions are not typically testable from the sample data and are related to the research design. The second assumption is only likely to be violated if the data were sampled by pairs rather than individuals (e.g., couples rather than individual persons). It is important to understand how your data were collected and categorised; this will help you avoid violating the two assumptions.

Illustrative Example: Association Between Gender and the (Reported) Reason for an Individual Being Stopped and Searched by the Police

This example presents a Fisher’s Exact test using two variables from the 2017 (December) Greater Manchester Police’s Stop and Search data. Specifically, we test whether there is an association between gender and the (reported) reason for an individual being stopped and searched by the Police.

Thus, this example addresses the following research question:

Does the reason reported for an individual being stopped and searched by the Police vary according to an individual’s gender?

Stated in the form of a null hypothesis:

H0 = There will be no association between gender and the reason reported for an individual being stopped and searched by the Police.

It should be noted that this hypothesis is two-tailed.


The Data

This example uses a subset of data from the 2017 (December) Greater Manchester Police’s Stop and Search data. It should be noted that these data have been cleaned and have fewer variables than the original data source. This extract includes 135 respondents, which is a small sample. The two variables we examine are:

• Respondent’s gender (Gender)
• Object of search (ObjectSearch)

The first variable, Respondent’s gender (Gender), is coded 1 if male and 2 if female. The object of search (ObjectSearch) is treated dichotomously in our data and is coded 1 if the individual was stopped and searched for controlled drugs and 2 if stopped and searched for an object to threaten or harm/firearms/offensive weapons. We treat both variables as categorical, in line with common practice in social science research. In addition, both variables are dichotomous.

Analysing the Data

Before conducting Fisher’s Exact test, we should first examine each variable in isolation. We start by presenting a frequency distribution of Gender in Table 5. Table 5 shows that the majority of the sample are male (92.5%). It should be noted that there are 25 missing cases.

Table 5: Frequency Distribution of Gender.

                    Frequency    Percent    Valid percent    Cumulative percent
Valid      Male           149       80.1             92.5                  92.5
           Female          12        6.5              7.5                 100.0
           Total          161                       100.0
Missing                    25       13.4
Total                     186      100.0
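The Percent and Valid percent columns of Table 5 can be checked by hand. A short Python sketch, using only the frequencies reported in the table (the variable names are illustrative):

```python
counts = {"Male": 149, "Female": 12}  # valid frequencies from Table 5
missing = 25

valid_total = sum(counts.values())   # cases with a recorded gender
grand_total = valid_total + missing  # all cases in the extract

rows = {}
for label, freq in counts.items():
    percent = 100 * freq / grand_total        # share of all cases
    valid_percent = 100 * freq / valid_total  # share of non-missing cases
    rows[label] = (round(percent, 1), round(valid_percent, 1))
    print(label, *rows[label])

missing_percent = round(100 * missing / grand_total, 1)
print("Missing", missing_percent)
```

Percent uses all 186 cases as the denominator, whereas Valid percent uses only the 161 cases with a recorded gender, which is why the two columns differ.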

Table 6 shows the frequency distribution of ObjectSearch. The majority (74.8%) of searches were for controlled drugs. It should be noted that there are 51 missing cases.

Table 6: Frequency Distribution of ObjectSearch.

                                                                   Frequency    Percent    Valid percent    Cumulative percent
Valid      Controlled drugs                                              101       54.3             74.8                  74.8
           Object to threaten or harm/firearms/offensive weapon           34       18.3             25.2                 100.0
           Total                                                         135       72.6            100.0
Missing                                                                   51       27.4
Total                                                                    186      100.0

Tables 5 and 6 show the distribution of each of these variables by itself, but they cannot tell us whether the two variables are associated with each other.
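To examine the two variables jointly, the cases first have to be cross-tabulated. The Python sketch below shows the idea on a handful of invented records; the values are hypothetical and are not taken from the Stop and Search dataset. It assumes that a case enters the cross-tabulation only when both variables are observed.

```python
from collections import Counter

# Hypothetical coded records (Gender, ObjectSearch); None marks a missing value.
# These rows are invented for illustration only.
records = [(1, 1), (1, 2), (2, 1), (1, 1), (None, 1), (2, 2), (1, None), (1, 1)]

# Keep only the cases where both variables are observed.
complete = [(g, o) for g, o in records if g is not None and o is not None]

# Count how many cases fall into each Gender x ObjectSearch cell.
cell_counts = Counter(complete)
for gender in (1, 2):
    print(gender, [cell_counts[(gender, obj)] for obj in (1, 2)])
print("valid cases:", len(complete))
```

Under this assumption, the number of valid cases in a cross-tabulation can be smaller than the valid count reported for either variable on its own.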

Conducting Fisher’s Exact Test

Table 7 presents the results of the Fisher’s Exact test.

Table 7: Results of the Fisher’s Exact Test.

                                  Value    df    Asymptotic significance    Exact significance    Exact significance
                                                 (2-sided)                  (2-sided)             (1-sided)
Pearson’s Chi-Square              5.495     1    .019
Continuity Correction             4.668     1    .062
Likelihood Ratio                108.321     1    .031
Fisher’s Exact test                                                         .038                  .038
Linear-by-Linear Association      5.447     1    .020
N of Valid Cases                    116

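We can see that the Fisher’s Exact test result is statistically significant (two-sided p = .038), because the p-value is below the .05 threshold. We can therefore reject the null hypothesis and conclude that the two variables are associated with each other.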

Presenting Results

A Fisher’s Exact test can be reported as follows:

“We used a subset of data from the 2017 (December) Greater Manchester Police’s Stop and Search data to test whether there is an association between gender and the (reported) reason for an individual being stopped and searched by the Police. Thus, we tested the following null hypothesis:

H0 = There will be no association between gender and the reason reported for an individual being stopped and searched by the Police.

The data included 135 adults. There was a significant association between gender and the reason for an individual being stopped and searched by the Police, p = .038. This leads us to reject the null hypothesis of no association between gender and the reason for an individual being stopped and searched by the Police.”

Review

The Fisher’s Exact test is a statistical test used to evaluate whether there is an association between two dichotomous categorical variables.

You should know:

• What types of variables are suited for a Fisher’s Exact test.
• The basic assumptions underlying this statistical test.
• How to compute and interpret a Fisher’s Exact test.
• How to report the results of a Fisher’s Exact test.

Your Turn

You can download this sample dataset along with a guide showing how to produce a Fisher’s Exact test using statistical software. The sample dataset also includes another variable called Ethnicity, which relates to the individual’s self-defined ethnicity. See whether you can reproduce the results presented here for the Gender variable and then try producing your own Fisher’s Exact test substituting Ethnicity for Gender in the analysis.
