
Learn About ANCOVA in SPSS With Data From the Early Childhood Longitudinal Study (1998)

© 2015 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction

This dataset example introduces analysis of covariance (ANCOVA). This method allows researchers to compare the means of a single continuous variable for two or more subsets of the data to evaluate whether the means for each subset are statistically significantly different from each other, while adjusting for one or more variables that might covary with the continuous variable of interest.

The subsets of the data are defined by values from a categorical variable in the dataset. This technique builds on one-way ANOVA by allowing the researcher to make statistical adjustments using additional covariates to obtain more efficient and/or unbiased estimates of group differences.

This example describes ANCOVA, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate ANCOVA using data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLSK) at the National Center for Education Statistics (https://nces.ed.gov/ecls/kindergarten.asp). Specifically, we test whether kindergarten students’ scores on a general knowledge test in the Spring differ across income categories, adjusting for their general knowledge score from the preceding Fall. Analyses like this might help researchers and policy makers better understand early childhood education.

What Is ANCOVA?

ANCOVA is a method for testing whether the means of a given variable are different between two or more subsets of the data. Those subsets are typically defined by categories of another variable. ANCOVA is an extension of one-way ANOVA, the subject of another Sage Dataset example. The difference between the two techniques is that ANCOVA allows the researcher to make a statistical adjustment to the estimated mean differences between groups to equate them on some related variable. There are two main reasons why you might wish to do this.

First, this allows you to control for the influence of variables that covary with the continuous variable of interest. For example, you might compute the mean body weight for people who live in urban, suburban and rural areas in your sample of data and be interested in determining if the mean weights are statistically significantly different from each other across the three groups. However, if the areas have different proportions of people living in poverty in them, and poverty is correlated with being overweight, then you could adjust your estimates of average weight differences by including poverty as a covariate.

Second, ANCOVA lets researchers gain a more precise estimate of the differences between groups by reducing within-group error. A covariate that is related to the outcome variable will explain some of the variance in that outcome. Recall from the prerequisite ANOVA example that the F-test is based on the ratio of the model-explained variance to the unexplained or ‘error’ variance. If we reduce the error variance, then we increase the relative amount of variance explained by our model (the groups) and hence can more accurately assess its implications. If the covariate is unrelated to the groups, as is the case in an experiment where participants are randomly assigned to groups, then adjusting for it reduces the within-group variance and will therefore only have the effect of increasing the chance of finding a statistically significant result, while the size of the estimated differences between groups will remain the same.

A typical scenario for this might be in an educational research project, where you are interested in the effect of several different randomly assigned treatments on reading scores. Using ANCOVA, you could use baseline reading score as a covariate. This would have the effect of substantially reducing the within-group variance in post-intervention scores and thereby increasing power, allowing for smaller true differences to be detected. In the other scenario, where the researcher is estimating differences based on intact groups, such as country, adjusting for a covariate might also be expected to reduce the between-group variance because it is related both to the group and to the outcome. This in turn may have the effect of increasing or decreasing the post-adjustment group differences, which are the group differences net of those explained by the covariate.

In both cases, ANCOVA is equivalent to multiple regression, where the groups are defined by dummy variables and the covariate entered as an additional predictor. Multiple regression with dummy variables is the subject of another Sage Datasets example. The choice of which analysis to conduct is largely a matter of what is most familiar to the researcher. Typically psychologists are familiar with ANOVA and ANCOVA while sociologists and political scientists tend to use regression models more often. The results will be equivalent.

When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the null hypothesis is that the means of the test, or dependent, variable do not differ across the groups defined by the categories of our independent, or factor, variable, controlling for our covariate(s). Some difference between these means is expected, simply due to random chance in sampling. ANCOVA is designed to help us determine if any differences between groups are large enough to declare the test statistically significant. “Large enough” is typically defined as a test statistic with a level of statistical significance, or p-value, of less than 0.05. This would lead us to reject the null hypothesis (H0) of no difference and conclude that there likely is a relationship between the two variables.

Calculating ANCOVA

In this example we do not go into the computations needed to estimate the model but focus simply on the model and its implications and interpretation. The logic behind ANCOVA is that if the group means are the same as the overall, or ‘grand mean’ for the pooled sample, after adjusting for our covariate, then we would expect to see the same amount of variance within each group as we see between each group. The test statistic for ANCOVA, like ANOVA, is called an F-statistic and it is computed as the ratio of the between-group, or model, variance to the within-group, or residual, variance, conditional on the covariate(s). If the between-group variance is large relative to the within-group variance, then F will be large. Just like the t-statistic, if it is larger than some critical value, then the test is statistically significant and we can reject the hypothesis that all of the group means are the same in the population from which our data are drawn.
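For readers who want to see the ratio written out, one standard way to express it is as a comparison of two nested regression models. This is a sketch under stated assumptions rather than the formula used by any particular software: the symbols k (number of groups), c (number of covariates), N (number of observations) and the "reduced" and "full" labels are introduced here for illustration and do not appear in the original guide.

```latex
% Reduced model: covariate(s) only. Full model: covariate(s) plus group indicators.
% SSE denotes the residual (error) sum of squares of each model.
F = \frac{\left(\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}\right) / (k - 1)}
         {\mathrm{SSE}_{\text{full}} / (N - k - c)}
```

The numerator is the variance attributable to the groups after the covariate(s) have been taken into account, and the denominator is the residual variance of the full model.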

The simplest representation of the ANCOVA model with three groups is as a regression model:

\[ Y_{ij} = \beta_0 + \beta_1 G_{1i} + \beta_2 G_{2i} + \beta_3 X_i + \varepsilon_{ij} \tag{1} \]

where $G_{1i}$ and $G_{2i}$ are dichotomous indicators of treatment groups 1 and 2, with 1 denoting treatment and 0 denoting control. $\beta_3$ is the marginal effect, or slope, of X on Y. If this model is estimated after centering X (subtracting the overall mean of X from each value of X), then $\beta_0$ yields the mean of the control group adjusted for X; adding $\beta_1$ to $\beta_0$ gives the mean for treatment group 1 adjusted for X, while adding $\beta_2$ to $\beta_0$ yields the adjusted mean for treatment group 2.
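The relationship between the regression coefficients and the adjusted means can be illustrated with a small sketch. Everything here is an assumption made for illustration: the twelve data points are invented, the helper names `solve` and `ols` are hypothetical, and the model is fitted with a hand-written least-squares routine from the Python standard library rather than with SPSS.

```python
# Toy ANCOVA-as-regression sketch (standard library only; the data are made up).

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (group, covariate X, outcome Y); group 0 is the control.
data = [
    (0, 10, 20), (0, 12, 23), (0, 14, 24), (0, 16, 27),
    (1, 11, 24), (1, 13, 26), (1, 15, 29), (1, 17, 31),
    (2, 12, 28), (2, 14, 31), (2, 16, 32), (2, 18, 35),
]
x_mean = sum(x for _, x, _ in data) / len(data)

# Design matrix columns: intercept, G1, G2, centered X (X minus its overall mean).
design = [[1.0, float(g == 1), float(g == 2), x - x_mean] for g, x, _ in data]
outcome = [float(y) for _, _, y in data]
b0, b1, b2, b3 = ols(design, outcome)

# Adjusted means as described in the text: b0, b0 + b1, b0 + b2.
adjusted_means = {"control": b0, "treatment 1": b0 + b1, "treatment 2": b0 + b2}

# Unadjusted group means, for comparison (four observations per group).
raw_means = {
    name: sum(y for g, _, y in data if g == code) / 4
    for code, name in enumerate(adjusted_means)
}
```

In this invented example the groups differ on X as well as on Y, so the adjusted means sit closer together than the raw means, which mirrors the intact-groups scenario described earlier.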

Testing for Specific Group Differences

The F-test produced by an ANCOVA is what is known as an omnibus test. It does not tell us which particular groups are different from each other, only that at least some are different. This is in fact the advantage of the method over simply running pairs of t-tests for each combination of groups. A test that rejects the null hypothesis when p < 0.05 has a less than 5%, or 1 in 20, chance of producing a significant result when the null hypothesis is true. If we conduct, say, 10 t-tests to look for mean differences among all possible pairs of groups defined by a 5-category independent variable, the chance of wrongly rejecting the null hypothesis in at least one of those tests can be as high as 50% (about 40% if the tests were independent), not 5%. In order to counteract this inflation of chance findings, more stringent or conservative tests are used when, subsequent to finding a significant F-test, we go on to explore particular contrasts between specific groups. These are called post-hoc tests. The most commonly used are Tukey's and Bonferroni, which are both available in statistical software. It is also possible to conduct a limited set of planned-in-advance contrasts as part of the main ANCOVA, for instance in a clinical trial, comparing experimental treatments with a control group but not with each other. It is beyond the scope of this example to discuss planned contrasts further.
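The arithmetic behind the multiple-testing problem can be sketched in a few lines. The independence calculation is a simplifying assumption added here for illustration; pairwise tests that share groups are not independent, so that figure is only a rough guide.

```python
from math import comb

alpha = 0.05               # per-test significance level used in the text
groups = 5                 # categories of the independent variable
n_tests = comb(groups, 2)  # number of distinct pairs of groups

# Upper bound on the chance of at least one false rejection across all tests.
upper_bound = min(1.0, n_tests * alpha)

# Value under the ASSUMPTION that the tests are independent of each other.
if_independent = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: compare each p-value with alpha divided by the number of tests.
bonferroni_alpha = alpha / n_tests
```

With five groups there are ten pairs, so a Bonferroni-corrected test would require each pairwise p-value to fall below 0.005 rather than 0.05.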

Assumptions Behind the Method

Nearly every statistical test relies on some underlying assumptions, and they all are affected by the mix of data you happen to have. Critical considerations for ANCOVA include:

• The observations in each group are sampled independently of each other.
• The observations in each group are drawn from populations that are normally distributed.
• The variances of the variable of interest are approximately equal across groups.
• The slopes of the covariate(s) on Y are homogeneous across groups.

The first assumption is not typically testable from the sample data. However, if the data is sampled by pairs rather than individuals (e.g. couples rather than individual persons), then the independence assumption is likely violated. The second can be addressed by examining the sample distribution for normality. The remaining assumptions can easily be tested in most statistical software programs, although it is beyond the scope of this example to discuss this further.

Illustrative Example: General Knowledge Performance in Kindergarten in the U.S.

This example presents an ANCOVA designed to test whether kindergarten student performance on a general knowledge score in the Spring differs based on their family’s income, controlling for their general knowledge score from the previous Fall. This example therefore addresses the following research question:

Do children from families with higher incomes have higher average general knowledge scores, controlling for their previous general knowledge score?

We can also state this in the form of a null hypothesis:

H0 = After accounting for a student’s general knowledge in the Fall, there is no difference in their Spring general knowledge score based on family income.

The Data

This example uses a subset of data from the first wave of the ECLSK dataset. This extract includes data from 11,933 students who were in kindergarten in the 1998–99 academic year. The three variables we examine are:

• General knowledge performance in the Fall of 1998 (c1rgscal).
• General knowledge performance in the Spring of 1999 (c2rgscal).
• Household income divided into three categories (incomecat).

The two general knowledge variables are scales based on student responses to a large number of test items. The scales are built using item response theory, which is a common method of measuring performance based on multiple test items, and include questions about science and social studies. The score for the Fall ranges from about 7 to almost 48, with a mean of 23 and a standard deviation of about 7.4. The score for the Spring ranges from almost 8 to just over 48, with a mean of 28.2 and a standard deviation of about 7.6.

The variable incomecat is based on the total family income reported in the survey (p2income), but here divided into three roughly equal sized groups:

1 = Low (below $40,000 annually)
2 = Middle (from $40,000 to $69,999 annually)
3 = High ($70,000 and above annually)
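The recoding rule above can be written as a small function. This is a minimal sketch: the function name `income_category` is hypothetical, and it assumes income is available as a dollar amount, which may not match how p2income is coded in the raw ECLSK file.

```python
def income_category(annual_income):
    """Return 1 (Low), 2 (Middle) or 3 (High) for a family income in dollars.

    Cut points follow the three categories listed in the text. Treating income
    as a plain dollar amount is an assumption made for this illustration.
    """
    if annual_income < 40_000:
        return 1  # Low: below $40,000
    if annual_income < 70_000:
        return 2  # Middle: $40,000 to $69,999
    return 3      # High: $70,000 and above
```

For example, `income_category(55_000)` falls in the Middle category.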

Analyzing the Data

Before conducting the ANCOVA, we should first examine each variable in isolation. We start by presenting histograms of the Fall and Spring general knowledge scores in Figures 1 and 2, respectively.

Figure 1: Histogram showing the distribution of the general knowledge score for kindergarten students in the Fall of 1998, ECLSK.

Figure 2: Histogram showing the distribution of the general knowledge score for kindergarten students in the Spring of 1999, ECLSK.


Figure 1 shows that the majority of general knowledge scores for the Fall of 1998 cluster around the mean of 23, with most falling between 10 and 40. There is a slight positive skew to the distribution resulting from a small number of somewhat higher scores, but there appears to be little to worry about for this analysis.

Figure 2 shows that the majority of general knowledge scores for the Spring of 1999 cluster around the mean of 28, with most falling between 10 and 45. The slight positive skew that existed in the Fall measure appears to have largely disappeared by the Spring of 1999, as the distribution in Figure 2 appears to be symmetric.

Table 1 presents a frequency distribution of the 3-category income variable. We have 4729 (39.63%) observations in the Low category, 3726 (31.22%) in the Middle category, and 3478 (29.15%) in the High category.

Table 1: Frequency distribution of income categories, ECLSK.

Category         Frequency   Percent   Cumulative Percent
Low Income            4729     39.63                39.63
Middle Income         3726     31.22                70.85
High Income           3478     29.15                100.0
Total                11933     100.0                100.0
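As a quick check on how the Percent and Cumulative Percent columns follow from the frequencies, the calculation can be sketched as below. The counts are taken from Table 1; the variable names are chosen here for illustration.

```python
# Frequencies from Table 1.
counts = {"Low": 4729, "Middle": 3726, "High": 3478}
total = sum(counts.values())

running = 0
rows = []
for name, n in counts.items():
    running += n
    percent = round(100 * n / total, 2)
    cumulative = round(100 * running / total, 2)
    rows.append((name, n, percent, cumulative))
```

Each row of `rows` holds the category, its frequency, its percentage of the total, and the cumulative percentage up to and including that category.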

Figures 1 and 2, along with Table 1, show the distribution of each of these variables by themselves. Now we turn to estimating the ANCOVA model to determine if general knowledge scores differ across categories of income, adjusting for previous levels of general knowledge.

Table 2 presents the average Spring general knowledge scores for students from relatively low, middle, and higher income families. The Mean column reports these averages without controlling for the influence of the general knowledge score from the previous Fall. The Adjusted Mean column reports the means again, this time adjusted for the possible confounding effects of the prior general knowledge score.

Table 2: Results from ANCOVA analysis of average general knowledge scores for kindergarten students in Spring, 1999 across income categories, controlling for average general knowledge scores in Fall, 1998, ECLSK.

Category             Sample Size   Mean   Adjusted Mean
Low Income                  4729   25.1            27.7
Middle Income               3726   29.1            28.4
High Income                 3478   31.6            28.7

F-Statistic                10818
Degrees of Freedom             3
Significance              <0.001

Table 2 shows that the average general knowledge score for students from lower income families was 25.1, rising to 29.1 for those in middle income families and 31.6 for those in higher income families. However, the gaps between income groups are smaller once those means are adjusted for the scores students had in the previous Fall. This means that part of the differences across income groups observed in the unadjusted means (the Mean column of Table 2) results from similar differences across income groups in the Fall, 1998 scores.
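The shrinking of the gaps can be made concrete with the figures reported in Table 2. This is simple arithmetic on the published means, not a re-analysis of the data; the variable names are chosen here for illustration.

```python
# group: (unadjusted mean, adjusted mean), copied from Table 2
table2 = {
    "Low": (25.1, 27.7),
    "Middle": (29.1, 28.4),
    "High": (31.6, 28.7),
}

# Gap between the High and Low income groups before and after adjustment.
raw_gap = table2["High"][0] - table2["Low"][0]
adjusted_gap = table2["High"][1] - table2["Low"][1]

# Share of the original gap that remains after adjusting for the Fall score.
share_remaining = adjusted_gap / raw_gap
```

The High-Low gap falls from about 6.5 points to about 1.0 point, so roughly 15% of the unadjusted gap remains once Fall scores are taken into account.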

Still, the F-statistic reported in Table 2 of 10818 with 3 degrees of freedom is much larger than we would expect to observe strictly due to random chance (p-value < 0.001). This would lead us to reject the null hypothesis of no difference in the Spring, 1999 general knowledge scores across income categories, even after controlling for students’ Fall, 1998 general knowledge scores.

Having run an overall ANCOVA, researchers may want to explore specific pairwise comparisons between the adjusted means for the various subsets of data. In this case, that would involve comparing the adjusted mean of the Spring general knowledge score for each income category to the other two (e.g. Low to Middle; Low to High; Middle to High). When making any pairwise comparison, the implied null hypothesis is that there is no difference in adjusted mean general knowledge scores between the two groups in the pair. We run a Bonferroni test making all three comparisons and find statistically significant differences (p-value < 0.05) in every case. Hence we can reject the null hypothesis of no difference for each pair.
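The three pairwise contrasts can be listed from the adjusted means in Table 2, as in the sketch below. It reproduces only the differences and the Bonferroni threshold; the standard errors and p-values themselves come from the statistical software and are not recomputed here.

```python
from itertools import combinations

# Adjusted means from Table 2.
adjusted = {"Low": 27.7, "Middle": 28.4, "High": 28.7}

# All distinct pairs of income categories, in the order listed in the text.
pairs = list(combinations(adjusted, 2))

# Difference in adjusted means for each pair (second group minus first).
differences = {(a, b): round(adjusted[b] - adjusted[a], 1) for a, b in pairs}

# Bonferroni: with three comparisons, each unadjusted p-value is compared with
# 0.05 / 3 (equivalently, each p-value is multiplied by 3 and compared with 0.05).
per_test_alpha = 0.05 / len(pairs)
```

The adjusted differences are small in absolute terms, which is a reminder that with a sample of this size even modest differences can be statistically significant.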

Readers interested in doing the calculations for the F-test by hand or outside of statistical software should know that many statistics textbooks include tables of critical values for the F-distribution in an appendix, and such tables are widely available online as well.

Presenting Results

The results from an ANCOVA can be reported as follows:

“We used a subset of data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLSK) at the National Center for Education Statistics to test the following null hypothesis:

H0 = After accounting for a student’s general knowledge in the Fall, there is no difference in their Spring general knowledge score based on family income.

The dataset includes 11,933 students. Table 2 presents the results of an ANCOVA model designed to test the null hypothesis. Table 2 shows that the average general knowledge score for students from lower income families was 25.1, rising to 29.1 for those in middle income families and 31.6 for those in higher income families. However, the gaps between income groups are smaller once those means are adjusted for the scores students had in the previous Fall. This means that part of the differences across income groups observed in the unadjusted means (the Mean column of Table 2) results from similar differences across income groups in the Fall, 1998 scores.

Still, the F-statistic reported in Table 2 of 10818 with 3 degrees of freedom is much larger than we would expect to observe strictly due to random chance (p-value < 0.001). This would lead us to reject the null hypothesis of no difference in the Spring, 1999 general knowledge scores across income categories, even after controlling for students’ Fall, 1998 general knowledge scores. Furthermore, we ran a Bonferroni test of all pairwise comparisons (Low to Middle; Low to High; Middle to High) and found statistically significant differences (p-value < 0.05) in every case.”

Review

ANCOVA is a statistical test used to evaluate whether the mean of a continuous variable differs between two or more groups, after adjusting for one or more covariates. It tests the null hypothesis of no difference between group means. Thus, it tests whether a continuous variable and a categorical variable are related to each other, after taking into account one or more covariates.

You should know:

• What types of variable are suitable for ANCOVA.
• The basic assumptions underlying ANCOVA.
• How to estimate and interpret ANCOVA.
• How to report the results of ANCOVA.

Your Turn

You can download this sample dataset along with a guide showing how to produce an ANCOVA using statistical software. The sample dataset includes Fall and Spring performance scores for reading (c1r4rscl and c2r4rscl) and for math (c1r4mscl and c2r4mscl). See if you can reproduce the results presented here, and then try producing your own ANCOVA using these subject-specific scores rather than the general knowledge scores to see if differences across income groups appear.
