Learn to Use the F-Test to Compare Two Variances in R With Data From the General Social Survey (2016)

© 2019 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets. SAGE Research Methods Datasets Part 2

Student Guide

Introduction

This example dataset introduces the F-test to Compare Two Variances. An “F-test” is a generic term for any statistical test whose test statistic (usually known as the F-statistic) follows the F-distribution. For example, the F-distribution is used to assess statistical significance in both ANOVA and OLS regression. Here, however, the term “F-test” refers to the F-test to Compare Two Variances, and it is this test that is the focus of this dataset. The F-test to Compare Two Variances allows us to compare the variance of a continuous or interval-level dependent variable (e.g., income, opinion on government spending) across two subsamples defined by a categorical independent variable (e.g., sex, ethnicity [White/Black]). By comparing the variances of our dependent variable across the groups of our independent variable, we can assess whether the variance of the former differs at different values of the latter.

This example describes the F-test to Compare Two Variances, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate the test using a subset of data from the 2016 General Social Survey (GSS). Specifically, we test the extent to which the variance of an individual’s views on government spending on defence differs by whether the individual has a shotgun in the home. This page provides links to this sample dataset and a guide to producing the F-test to Compare Two Variances using statistical software.

What Is an F-test?

F-test refers, generically, to any statistical test where the test statistic (usually called the F-statistic) follows an F-distribution. However, the term F-test is commonly used to refer to the F-test to Compare Two Variances. This test allows us to determine whether the variance of a continuous, interval-level dependent variable is equal across two subsamples defined by a categorical independent variable. In our example, we want to examine whether the variances of individuals’ views on government spending on defence are equal across the two categories of the indicator variable shotgun ownership (yes/no). In other words, is there a statistically significant difference by shotgun ownership in the variance of individuals’ views on government spending on defence?

When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the null hypothesis is that the variance of views on government spending on defence is the same for individuals who own a shotgun and for those who do not. Some difference between the sample variances is expected simply due to sampling error, that is, random chance in which individuals happen to be sampled. The F-test to Compare Two Variances conducted here will help us determine whether the observed difference in variances is too large to be plausibly explained by chance alone, in which case we declare the result statistically significant. “Large enough” is usually defined as a test statistic with an associated level of statistical significance, or p-value, of less than .05. This would lead us to reject the null hypothesis (H0) of equal variances and conclude that there is likely an association between the categories of the independent variable and the variance of the dependent variable being tested.

Calculating an F-Test to Compare Two Variances

The F-test to Compare Two Variances compares the variance of a continuous or interval-level variable across two subsamples defined by a categorical independent variable by dividing one group’s variance by the other’s; the resulting ratio is the F-statistic. The test operates on the null hypothesis that the two groups have equal variances. To illustrate, imagine that we had collected data on social media usage (measured in hours per day) from a small sample of 20 adolescents, 10 males and 10 females. Table 1 below shows our data.

Table 1: Frequency Distribution of Social Media Usage by Gender.

Males (hours per day)    Females (hours per day)
2                        8
1                        7
4                        6
3                        6
3                        9
4                        10
2                        5
1                        7
6                        8
5                        5

If we review Table 1, we can see that there appear to be more high scores in the females’ column than in the males’, which suggests that females spend longer on social media. To explore this further, we can compare the mean social media usage of each group: 3.1 hours for males and 7.1 hours for females. The difference in means again suggests a difference between our groups. Here, however, we want to examine the variance of each group and then determine whether any difference in the variance of social media usage (in hours) between the sexes is statistically significant. To do this, we need to calculate the variance for each group using Equation 1 below.

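$$S^2 = \frac{\sum (X - \bar{x})^2}{n - 1} \qquad (1)$$

where X is each individual score, x̄ is the group mean, and n is the number of observations in the group.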
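Equation 1 translates directly into code. The minimal Python sketch below (the function name `sample_variance` is our own, not part of the guide) sums the squared deviations from the mean and divides by n − 1; applied to the females’ column of Table 1, it reproduces the hand calculation of 24.9 / 9.

```python
def sample_variance(values):
    """Equation 1: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Sum of the (X - x_bar)^2 column from the worked tables.
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return sum_sq_dev / (n - 1)


# Females' daily social media usage (hours), from Table 1.
females = [8, 7, 6, 6, 9, 10, 5, 7, 8, 5]
print(round(sample_variance(females), 2))  # 2.77
```

Python’s standard library also provides `statistics.variance`, which uses the same n − 1 denominator and can serve as a cross-check.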

Table 2 below shows, for each male, the deviation of his score from the males’ mean (x̄ = 3.1) and the squared deviation.

Table 2: Males’ Social Media Usage – Variances.

Hours (X)    X − x̄    (X − x̄)²
2            −1.1     1.21
1            −2.1     4.41
4             0.9     0.81
3            −0.1     0.01
3            −0.1     0.01
4             0.9     0.81
2            −1.1     1.21
1            −2.1     4.41
6             2.9     8.41
5             1.9     3.61

From Table 2, the sum of the squared deviations, Σ(X − x̄)², is 24.9. Inserting this figure into Equation 1, as shown below, gives the variance of the males’ social media usage.

$$S^2 = \frac{24.9}{10 - 1} = 2.77$$

The variance for the Males group is 2.77. We now need to repeat this process for the Females group (x̄ = 7.1), as shown in Table 3.

Table 3: Females’ Social Media Usage – Variances.

Hours (X)    X − x̄    (X − x̄)²
8             0.9     0.81
7            −0.1     0.01
6            −1.1     1.21
6            −1.1     1.21
9             1.9     3.61
10            2.9     8.41
5            −2.1     4.41
7            −0.1     0.01
8             0.9     0.81
5            −2.1     4.41

From Table 3, the sum of the squared deviations, Σ(X − x̄)², is also 24.9. Inserting this figure into Equation 1 gives the variance of the females’ social media usage.

$$S^2 = \frac{24.9}{10 - 1} = 2.77$$

The variance for the Females group is also 2.77. Now that we have both variances, we can calculate our F-statistic using the formula for the F-test to Compare Two Variances, shown in Equation 2 below. Once we have calculated the F-statistic, we can determine whether our variances are significantly different.

$$F = \frac{S_1^2}{S_2^2} \qquad (2)$$

By convention, the larger of the two variances is always placed in the numerator (S₁²), so that F is at least 1. In our data the two sample variances happen to be identical (the apparent difference between the groups lies in their means, not their spread), so F is the same whichever group is placed in the numerator. Inserting our data into Equation 2 gives our F-statistic:

$$F = \frac{2.77}{2.77} = 1.00$$

Our F-statistic is 1.00, and we now need to consult a table of critical values of the F-distribution to determine whether this is a statistically significant result. Each group contributes n − 1 = 9 degrees of freedom, so we need the critical value of F with (9, 9) degrees of freedom. At the .05 significance level, this is 3.18 (or 4.03 if the .05 is split across both tails for a two-tailed test). Our F = 1.00 falls well below either value, so our finding is not statistically significant. Therefore, we cannot reject (at the 5% level) the null hypothesis that there is equal variance between males and females in daily social media usage.
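As a quick check on the hand calculations, the sketch below computes the F-statistic for the Table 1 data with Python’s standard-library `statistics.variance`, which, like Equation 1, divides by n − 1. Following the convention of putting the larger variance in the numerator, it also returns the matching degrees of freedom. The function name `f_ratio` is our own, and the critical value still has to come from an F table or statistical software.

```python
from statistics import variance


def f_ratio(group_a, group_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(group_a), variance(group_b)
    if var_a >= var_b:
        return var_a / var_b, len(group_a) - 1, len(group_b) - 1
    return var_b / var_a, len(group_b) - 1, len(group_a) - 1


# Table 1: daily social media usage (hours).
males = [2, 1, 4, 3, 3, 4, 2, 1, 6, 5]
females = [8, 7, 6, 6, 9, 10, 5, 7, 8, 5]

F, df1, df2 = f_ratio(males, females)
print(f"F({df1}, {df2}) = {F:.2f}")  # F(9, 9) = 1.00
```

Returning the degrees of freedom alongside F keeps the numerator/denominator order attached to the statistic, which matters when looking up the critical value.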

Social scientists generally choose a critical value for an F-test such that, if the null hypothesis were true, there would be less than a .05 probability of obtaining an F-statistic at least that large by random chance. Thus, researchers tend to reject the null hypothesis only when the F-statistic has a corresponding significance level (p-value) of .05 or less.

This example has shown you how to calculate an F-statistic by hand, which is relatively easy to do with small samples. With larger samples, however, it is much easier to use statistical software.

Assumptions Behind the Test

All statistical tests rely on some underlying assumptions, and they are all affected by the type of data that you have.

Assumptions of the F-test to Compare Two Variances

• The variable whose variances are being compared (i.e., the dependent variable) must be continuous, measured at the interval (or ratio) level. Ordinal scales with several ordered categories are sometimes treated as approximately interval, as in the example below.
• The variable that splits the sample into the groups being compared (i.e., the independent variable) must be categorical, with two independent groups.
• Observations must be independent, so there is no relationship between the groups or between the observations within each group.
• The dependent variable must be normally distributed within each group.

Assumptions one, two, and three are not typically testable from the sample data; they relate to the research design. The third assumption is likely to be violated only if, for example, the data were sampled in pairs rather than as individuals (e.g., couples rather than individual persons). It is important to understand how your data were collected and categorised; this will help you avoid violating the first three assumptions. The fourth assumption can easily be checked using statistical software.

Illustrative Example: Shotgun Ownership and Views on Government Spending on Defence

This example presents an F-test to Compare Two Variances using two variables from the 2016 GSS. Specifically, we test whether the variance of individuals’ views on government spending on defence is equal across the two categories of shotgun ownership (yes/no).

Thus, this example addresses the following research question:

Does the variance of individuals’ views on government spending on defence differ by shotgun ownership?

Stated in the form of a null hypothesis:

H0: There will be no difference in the variances of views on government spending on defence by shotgun ownership status.

It should be noted that this is a two-tailed hypothesis: it does not predict which group will have the larger variance.

The Data

This example uses a subset of data from the 2016 GSS. This extract includes 1,338 respondents. Please note that the original dataset is larger than this, but it has been “cleaned” to include only those who responded to our dependent variable. The two variables we examine are:

• Whether there is a shotgun in the home (SHOTGUN).
• Respondent’s views on government spending on defense (SPARMS).

The first variable, SHOTGUN, is coded 1 if a respondent reports “yes” and 2 if “no.” The second variable, SPARMS, is coded 1 for “Spend Much More,” 2 for “Spend More,” 3 for “Spend Same,” 4 for “Spend Less,” and 5 for “Spend Much Less.” Because our dependent variable is treated as interval-level data on a scale of decreasing support for defence spending, and our independent variable is categorical with two groups, an F-test to Compare Two Variances is appropriate for these data.

Analysing the Data

Before conducting the F-test to Compare Two Variances, we should first examine each variable in isolation and determine whether our data are normally distributed. We start by presenting a frequency table of SHOTGUN in Table 4. We can see that the majority (81%) of respondents who answered the question do not have a shotgun in the home. We can also note that there are 984 missing cases.

Table 4: Frequency Distribution of SHOTGUN.

Response                      Frequency   Percent   Valid percent   Cumulative percent
Valid: Yes                          346      12.3            19.0                 19.0
Valid: No                         1,477      52.6            81.0                100.0
Valid: Total                      1,823      64.9           100.0
Missing: IAP (Inapplicable)         978      34.8
Missing: Don’t know                   6       0.2
Missing: Total                      984      35.1
Total                             2,807     100.0
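The Percent, Valid percent, and Cumulative percent columns follow simple arithmetic: Percent uses all cases (valid plus missing) as its base, while Valid percent and Cumulative percent use only the valid cases. A minimal Python sketch of that arithmetic is below; the function name and the dictionary input format are our own assumptions, and the counts are taken from Table 4.

```python
def frequency_table(valid_counts, missing_counts):
    """Rows of (label, frequency, percent, valid_percent, cumulative_percent).

    percent uses all cases as its base; valid_percent and cumulative_percent
    use valid cases only. Missing categories get None for the valid columns.
    """
    n_valid = sum(valid_counts.values())
    n_total = n_valid + sum(missing_counts.values())
    rows = []
    cumulative = 0.0
    for label, freq in valid_counts.items():  # dicts keep insertion order
        valid_pct = 100.0 * freq / n_valid
        cumulative += valid_pct
        rows.append((label, freq, round(100.0 * freq / n_total, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    for label, freq in missing_counts.items():
        rows.append((label, freq, round(100.0 * freq / n_total, 1), None, None))
    return rows


# Counts from Table 4 (SHOTGUN).
for row in frequency_table({"Yes": 346, "No": 1477},
                           {"IAP (Inapplicable)": 978, "Don't know": 6}):
    print(row)
```

The same function applies unchanged to the SPARMS counts, where the cumulative column accumulates across the five ordered response categories.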

Table 5 below shows the frequency distribution of SPARMS. We can see that 43.9% of those who answered the question think that more or much more should be spent on defence, whereas just over a third (35.3%) think spending should stay the same. We can also note that there are 1,469 missing cases, which represent just over half of the sample (52.3%).

Table 5: Frequency Distribution of SPARMS.

Response                      Frequency   Percent   Valid percent   Cumulative percent
Valid: Spend Much More              179       6.4            13.4                 13.4
Valid: Spend More                   408      14.5            30.5                 43.9
Valid: Spend Same                   472      16.8            35.3                 79.1
Valid: Spend Less                   193       6.9            14.4                 93.6
Valid: Spend Much Less               86       3.1             6.4                100.0
Valid: Total                      1,338      47.7           100.0
Missing: IAP                      1,448      51.6
Missing: Can’t Choose                11       0.4
Missing: NA                          10       0.4
Missing: Total                    1,469      52.3
Total                             2,807     100.0

Tables 4 and 5 show the distribution of each variable individually. Before running the F-test to Compare Two Variances, we must check the normality assumption. A common way to do this is to inspect Q–Q plots in statistical software, such as IBM® SPSS®. Figure 1 shows the Q–Q plots for our data.

Figure 1: Q–Q Plots of SHOTGUN: SPARMS.


We can use the Q–Q Plots in Figure 1 to eyeball our data; the data look approximately normal, as the dots follow the line, and there are no major deviations from it. Therefore, our data meet the assumption on normality required for conducting the F-test to Compare Two Variances.
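To show what a Q–Q plot is built from, the sketch below pairs each sorted observation with the standard-normal quantile expected at its rank, using the plotting position (i − 0.5)/n. That is one common choice; statistical packages may use slightly different positions. It also reports the correlation between the two sets of quantiles as a rough summary of how straight the plot is. It relies only on Python’s standard library (`statistics.NormalDist`), the function names are our own, and it is demonstrated on the males’ column of Table 1 rather than on the GSS data. For the GSS analysis, one would run it on SPARMS separately within each SHOTGUN group; because SPARMS takes only five values, its points form horizontal steps rather than a smooth line.

```python
import math
from statistics import NormalDist, fmean


def qq_points(data):
    """Pair sorted observations with standard-normal quantiles at (i - 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(data):
    """Pearson correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    points = qq_points(data)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


males = [2, 1, 4, 3, 3, 4, 2, 1, 6, 5]  # Table 1, males' column
for z, x in qq_points(males):
    print(f"{z:6.2f}  {x}")
print(round(qq_correlation(males), 3))
```

Plotting the printed pairs (theoretical on the horizontal axis, observed on the vertical) gives the Q–Q plot; the visual check described above is still the main judgement.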

Conducting the F-Test to Compare Two Variances

Table 6 presents the findings of our F-test to Compare Two Variances.

Table 6: F-Test to Compare Two Variances.

(Dependent variable: view of government spending on defense.)

Statistic              No shotgun in home   Shotgun in home
Mean                             2.747126       2.398809524
Variance                         1.121574       1.067543484
Observations                          696               168
df                                    695               167
F                                1.050612
P(F ≤ f) one-tail                0.352762
F Critical one-tail              1.281981

We can see that the F-statistic is 1.05 (df = 695, 167; one-tailed p = .352); therefore, our result is not significant. Because our hypothesis is two-tailed, the relevant p-value is twice the one-tailed value, about .71. Our F-statistic is also lower than the F critical value, which confirms that the result is not statistically significant. Therefore, we fail to reject the null hypothesis of no difference in the variance of views on government spending on defence between those who have a shotgun in the home and those who do not. Our findings are consistent with equal variance in respondents’ views on government spending on defence between those who have a shotgun in the home and those who do not.
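Table 6 reports only a one-tailed probability. The sketch below recomputes the F-statistic from the two variances and group sizes in Table 6 and derives both one- and two-tailed p-values using only Python’s standard library. The F-distribution tail probability comes from the regularized incomplete beta function, evaluated with a standard continued-fraction expansion (modified Lentz method). This is our own illustrative implementation with hypothetical function names, not the routine that produced Table 6; for real work, a tested library routine such as `scipy.stats.f.sf` in Python or `pf(..., lower.tail = FALSE)` in R is preferable.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-13, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion directly where it converges fast, else the symmetry I_x(a,b) = 1 - I_{1-x}(b,a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F-distribution with (df1, df2) degrees of freedom."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def two_variance_f_test(var_1, n_1, var_2, n_2):
    """F-test to Compare Two Variances from summary statistics (larger variance on top)."""
    if var_1 < var_2:
        var_1, n_1, var_2, n_2 = var_2, n_2, var_1, n_1
    f = var_1 / var_2
    df1, df2 = n_1 - 1, n_2 - 1
    p_one = f_upper_tail(f, df1, df2)
    # Two-tailed p: double the upper tail (larger variance on top), capped at 1.
    return f, df1, df2, p_one, min(1.0, 2.0 * p_one)


# Variances and group sizes from Table 6 (no shotgun vs. shotgun in home).
f, df1, df2, p_one, p_two = two_variance_f_test(1.121574, 696, 1.067543484, 168)
print(f"F({df1}, {df2}) = {f:.2f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.2f}")
```

Working from summary statistics is enough here because the F-test to Compare Two Variances depends on the data only through each group’s variance and size.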

Presenting Results

An F-test to Compare Two Variances can be reported as follows:

“We used a subset of the GSS 2016 dataset to test whether there was equal variance in views on government spending on defence between those who had a shotgun in the home and those who did not.

H0: There will be no difference in the variances of views on government spending on defence by shotgun ownership status.

The data included 1,338 adults, of whom 864 had valid responses on both variables. There was no significant difference between the variances of respondents’ views on government spending on defence for those who had a shotgun in the home and those who did not: F(695, 167) = 1.05, p = .71 (two-tailed). This leads us to fail to reject the null hypothesis of no difference in the variances of respondents’ views on government spending on defence between those who have a shotgun in the home and those who do not.”

Review

The F-test to Compare Two Variances is a test to identify whether there are equal variances of a continuous or interval-level dependent variable across two groups of a categorical, independent variable. It tests the null hypothesis of no difference in variances between the groups.

You should know:

• What types of variables are suited for an F-test to Compare Two Variances.
• The basic assumptions underlying this statistical test.
• How to compute and interpret an F-test to Compare Two Variances.
• How to report the results of an F-test to Compare Two Variances.

Your Turn

You can download this sample dataset along with a guide showing how to produce an F-test to Compare Two Variances using statistical software. The sample dataset also includes another variable called GENDER1, which records whether the respondent is male or female. See whether you can reproduce the results presented here for the SHOTGUN variable, and then try producing your own F-test to Compare Two Variances by substituting “GENDER1” for “SHOTGUN” in the analysis.
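If you work in Python rather than R, the sketch below shows one way to repeat the analysis with a different grouping variable. It assumes, hypothetically, that the dataset has been exported to CSV with columns named after the GSS variables (SPARMS, SHOTGUN, GENDER1) and that missing values are stored as blank cells; check both assumptions against the file you download. It returns each group’s size and variance plus the F-statistic with the larger variance on top; the p-value still comes from an F table or software. It is demonstrated on the Table 1 data rearranged into one row per adolescent, with illustrative column names. R users can instead call the built-in `var.test()` function.

```python
import csv
import io
from statistics import variance


def variances_by_group(rows, group_col, value_col):
    """Map each level of group_col to (n, sample variance of value_col).

    Rows with a blank group label or a non-numeric value are skipped,
    on the (hypothetical) assumption that missing data are blank cells.
    """
    values = {}
    for row in rows:
        group = (row.get(group_col) or "").strip()
        raw = (row.get(value_col) or "").strip()
        if not group:
            continue
        try:
            value = float(raw)
        except ValueError:
            continue
        values.setdefault(group, []).append(value)
    return {g: (len(v), variance(v)) for g, v in values.items()}


def f_from_groups(summary):
    """Return (numerator group, F, (df1, df2)); requires exactly two groups."""
    if len(summary) != 2:
        raise ValueError(f"expected two groups, found {sorted(summary)}")
    (g_hi, (n_hi, v_hi)), (g_lo, (n_lo, v_lo)) = sorted(
        summary.items(), key=lambda item: item[1][1], reverse=True)
    return g_hi, v_hi / v_lo, (n_hi - 1, n_lo - 1)


# Demonstration on Table 1, one row per adolescent (column names are illustrative).
table1 = "GROUP,HOURS\n" + "".join(
    f"{sex},{h}\n"
    for sex, hs in (("Male", [2, 1, 4, 3, 3, 4, 2, 1, 6, 5]),
                    ("Female", [8, 7, 6, 6, 9, 10, 5, 7, 8, 5]))
    for h in hs)
summary = variances_by_group(csv.DictReader(io.StringIO(table1)), "GROUP", "HOURS")
print(summary)
print(f_from_groups(summary))

# With the downloaded file (hypothetical name), the same calls would be, e.g.:
# with open("gss2016_extract.csv", newline="") as fh:
#     print(f_from_groups(variances_by_group(csv.DictReader(fh), "GENDER1", "SPARMS")))
```

Requiring exactly two groups is deliberate: if missing-value codes appear as extra categories, the function raises an error instead of silently comparing the wrong groups.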
