Learn to Use the F-Test to Compare Two Variances in R With Data From the General Social Survey (2016)

© 2019 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This example dataset introduces the F-test to Compare Two Variances. An “F-test” is a generic term for any statistical test in which the test statistic (usually known as the F-statistic) follows the F-distribution under the null hypothesis. For example, the F-distribution is at the heart of determining “goodness of fit” in both ANOVA and OLS regression. However, the term “F-test” is used here to refer specifically to the F-test to Compare Two Variances, and this test is the focus of this dataset.

The F-test to Compare Two Variances allows us to compare the variance of a continuous or interval-level dependent variable (e.g., income, opinion on government spending) across two subsamples or groupings defined by a categorical independent variable (e.g., sex, ethnicity [White/Black]). By comparing the variances of our dependent variable across the groupings of our independent variable, we can assess whether the variance of the former differs across the values of the latter.

This example describes the F-test to Compare Two Variances, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate the F-test to Compare Two Variances using a subset of data from the 2016 General Social Survey (GSS). Specifically, we test the extent to which the
variance of an individual’s views on government spending differs by whether the individual has a shotgun in the home. This page provides links to this sample dataset and a guide to producing the F-test to Compare Two Variances using statistical software.

What Is an F-test?

F-test refers, generically, to any statistical test where the test statistic (usually called the F-statistic) follows an F-distribution. However, the term is commonly used to mean the F-test to Compare Two Variances. This test allows us to determine whether the variance of a continuous, interval-level dependent variable is equal across two subsamples or groupings of a categorical independent variable. In our example, we want to examine whether the variances of individuals’ views on government spending on defence are equal across the categories of the indicator variable shotgun ownership (yes/no). In other words, is there a statistically significant difference by shotgun ownership in the variance of views on government spending on defence?

When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the standard null hypothesis is that the variance of views on government spending on defence is the same for individuals who own a shotgun and those who do not. Some difference in variances is expected simply due to sampling error, i.e., random chance in sampling. The F-test to Compare Two Variances conducted here will help us determine whether the difference in variances is large enough to have been unlikely to arise by chance alone, so that we can declare the difference statistically significant. “Large enough,” as usually defined, is a test statistic with an associated level of statistical significance, or p-value, of less than .05.
This would lead us to reject the null hypothesis (H0) of equal variances and conclude that there likely is an association between the categories of the (independent) categorical variable and the variance of the (dependent) variable being tested.

Calculating an F-Test to Compare Two Variances

The F-test to Compare Two Variances works by using an F-statistic to compare the variance of a continuous or interval-level variable across two subsamples or groupings defined by a categorical independent variable. The F-statistic is obtained by dividing the variance of one group by the variance of the other. The test operates on the null hypothesis that the variance is equal between the groups.

To illustrate, imagine that we had collected data on social media usage (measured in hours per day) from a small sample of adolescents, categorised into two groups (males and females, n = 10 in each group). Table 1 below shows our data.

Table 1: Frequency Distribution of Social Media Usage by Gender.

Males’ daily social media usage (in hours)    Females’ daily social media usage (in hours)
2                                             8
1                                             7
4                                             6
3                                             6
3                                             9
4                                             10
2                                             5
1                                             7
6                                             8
5                                             5

If we review Table 1, we can see that there appears to be a greater number of higher scores in the females’ column than in the males’ column, suggesting that females spend longer on social media. To explore this further, we can compare the mean social media usage of each group, which is 3.1 hours for males and 7.1 hours for females. The difference in mean social media usage again suggests a difference between our groups.
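The group means quoted above can be checked in R. The following is a minimal sketch, assuming the Table 1 values are typed in by hand; the object names `males`, `females`, `usage`, and `group_means` are illustrative and do not come from the guide or from the GSS data file.

```r
# Daily social media usage (hours) from Table 1.
# The vector names are illustrative assumptions, not part of the guide.
males   <- c(2, 1, 4, 3, 3, 4, 2, 1, 6, 5)
females <- c(8, 7, 6, 6, 9, 10, 5, 7, 8, 5)

# Stack both groups into one long-format data frame:
# one row per respondent, with a column identifying the group.
usage <- data.frame(
  hours  = c(males, females),
  gender = rep(c("Male", "Female"),
               times = c(length(males), length(females)))
)

# Mean usage within each group.
group_means <- tapply(usage$hours, usage$gender, mean)
print(group_means)
```

Holding the data in long format, with a grouping column, is one reasonable choice here because survey files are typically laid out with one row per respondent; two separate vectors would work equally well for this small example.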
However, here we want to examine the variance of each group and then determine whether the variance of social media usage (in hours) differs between the sexes at a statistically significant level. To do this, we need to calculate the variance for each of our groups, using Equation 1 below.

$$S^2 = \frac{\sum (X - \bar{x})^2}{n - 1} \tag{1}$$

Table 2 below shows the squared deviations from the mean for the Males group.

Table 2: Males’ Social Media Usage – Variances.

Males’ daily social media usage (in hours)    X − x̄    (X − x̄)²
2                                             −1.1     1.21
1                                             −2.1     4.41
4                                             0.9      0.81
3                                             −0.1     0.01
3                                             −0.1     0.01
4                                             0.9      0.81
2                                             −1.1     1.21
1                                             −2.1     4.41
6                                             2.9      8.41
5                                             1.9      3.61

From Table 2, we can calculate our total ∑(X − x̄)² to be 24.9. If we then insert this figure into Equation 1 as shown below, we can calculate the variance of the Males’ social media usage.

$$S^2 = \frac{24.9}{10 - 1} = 2.77$$

The variance for the Males group is 2.77. We now need to repeat this process for the Females group, as shown in Table 3.

Table 3: Females’ Social Media Usage – Variances.

Females’ daily social media usage (in hours)    X − x̄    (X − x̄)²
8                                               0.9      0.81
7                                               −0.1     0.01
6                                               −1.1     1.21
6                                               −1.1     1.21
9                                               1.9      3.61
10                                              2.9      8.41
5                                               −2.1     4.41
7                                               −0.1     0.01
8                                               0.9      0.81
5                                               −2.1     4.41

From Table 3, we can calculate our total ∑(X − x̄)² to be 24.9. If we then insert this figure into Equation 1 as shown below, we can calculate the variance of the Females’ social media usage.

$$S^2 = \frac{24.9}{10 - 1} = 2.77$$

The variance for the Females group is 2.77. Although the group means differ, the scores are spread around those means in exactly the same way, so the two variances are equal.

Now that we have both our variances, we can calculate our F-statistic using the formula for the F-test to Compare Two Variances, as shown in Equation 2 below. Once we have calculated the F-statistic, we can determine whether our variances are significantly different.

$$F = \frac{S_1^2}{S_2^2} \tag{2}$$

Please note that the larger variance is conventionally placed in the numerator, so that F is at least 1. In our data the two variances are equal, so the ordering makes no difference; we take S1² to be the variance for Males. If we insert our data into Equation 2, as shown below, we can calculate our F-statistic.

$$F = \frac{2.77}{2.77} = 1.00$$

Our F-statistic is 1.00, and we now need to consult a table of the critical values of the F-distribution to determine whether this is a statistically significant result. Each group contributes n − 1 = 9 degrees of freedom, so the test has df = 9 for both the numerator and the denominator. At a .05 significance level, the F-statistic should be ≥3.18 when the whole 5% is placed in the upper tail, or ≥4.03 when it is split across both tails, as in a two-sided test. For our data F = 1.00 (df = 9, 9), which is below either critical value, so our findings are not statistically significant. Therefore, we cannot reject (at the 5% level) that there is