Learn About ANCOVA in SPSS with Data from the Early Childhood Longitudinal Study (1998)
© 2015 SAGE Publications, Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.
Student Guide
Introduction
This dataset example introduces Analysis of Covariance (ANCOVA). This method allows researchers to compare the means of a single continuous variable for two or more subsets of the data to evaluate whether the means for each subset are statistically significantly different from each other, while adjusting for one or more variables that might covary with the continuous variable of interest. The subsets of the data are defined by values from a categorical variable in the dataset. This technique builds on one-way ANOVA by allowing the researcher to make statistical adjustments using additional covariates to obtain more efficient and/or unbiased estimates of group differences. This example describes ANCOVA, discusses the assumptions underlying it, and shows how to compute and interpret it.
We illustrate ANCOVA using data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998–99 (ECLSK) at the National Center for Education Statistics (https://nces.ed.gov/ecls/kindergarten.asp). Specifically, we test whether kindergarten students’ scores on a general knowledge test in the Spring differ across income categories, adjusting for their general knowledge score from the preceding Fall. Analysis like this might help researchers and policy makers better understand early childhood education.
What Is ANCOVA?
ANCOVA is a method for testing whether the means of a given variable are different between two or more subsets of the data. Those subsets are typically defined by categories of another variable. ANCOVA is an extension of one-way ANOVA, the subject of another Sage Dataset example. The difference between the two techniques is that ANCOVA allows the researcher to make a statistical adjustment to the estimated mean differences between groups to equate them on some related variable. There are two main reasons why you might wish to do this.
First, this allows you to control for the influence of variables that covary with the continuous variable of interest. For example, you might compute the mean body weight for people who live in urban, suburban and rural areas in your sample of data and be interested in determining whether the mean weights are statistically significantly different from each other across the three groups. However, if the areas have different proportions of people living in poverty, and poverty is correlated with being overweight, then you could adjust your estimates of average weight differences by including poverty as a covariate.
Second, ANCOVA lets researchers gain a more precise estimate of the differences between groups by reducing within-group error variance. A covariate that is related to the outcome variable will explain some of the variance in that outcome. Recall from the prerequisite ANOVA example that the F-test is based on the ratio of the model-explained variance to the unexplained, or ‘error’, variance. If we reduce the error variance, then we increase the relative amount of variance explained by our model (the groups) and hence can more accurately assess its implications.
If the covariate is unrelated to the groups, as is the case in an experiment where participants are randomly assigned to groups, then adjusting for it reduces the within-group variance and will therefore only have the effect of increasing the chance of finding a statistically significant result, while the size of the estimated differences between groups will, on average, remain the same. A typical scenario for this might be in an educational research project, where you are interested in the effect of several different randomly assigned treatments on reading scores. Using ANCOVA, you could use baseline reading score as a covariate. This would have the effect of substantially reducing the within-group variance in post-intervention scores and thereby increasing power, allowing for smaller true differences to be detected.
In the other scenario, where the researcher is estimating differences based on intact groups, such as country, adjusting for a covariate might also be expected to reduce the between-group variance because it is related both to the group and to the outcome. This in turn may have the effect of increasing or decreasing the post-adjustment group differences, which are the group differences net of those explained by the covariate.
In both cases, ANCOVA is equivalent to multiple regression, where the groups are defined by dummy variables and the covariate is entered as an additional predictor. Multiple regression with dummy variables is the subject of another Sage Datasets example. The choice of which analysis to conduct is largely a matter of what is most familiar to the researcher. Typically psychologists are familiar with ANOVA and ANCOVA while sociologists and political scientists tend to use regression models more often. The results will be equivalent.
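The regression form of ANCOVA described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure SPSS uses internally: the function names `solve_linear_system` and `fit_ancova`, the group labels, and the toy data are all hypothetical, and the fit is ordinary least squares via the normal equations. The covariate is centered at its overall mean so that the intercept can be read as the adjusted mean of the reference group.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ancova(rows, reference="control"):
    """Fit y ~ group dummies + centered covariate.

    rows: list of (group, x, y) tuples.
    Returns (adjusted_means, slope), where adjusted_means maps each group
    to its predicted outcome at the overall mean of the covariate.
    """
    groups = sorted({g for g, _, _ in rows if g != reference})
    mean_x = sum(x for _, x, _ in rows) / len(rows)
    # Columns: intercept, one dummy per non-reference group, centered covariate.
    design = [
        [1.0] + [1.0 if g == name else 0.0 for name in groups] + [x - mean_x]
        for g, x, _ in rows
    ]
    y = [outcome for _, _, outcome in rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    adjusted = {reference: beta[0]}
    for name, coef in zip(groups, beta[1:-1]):
        adjusted[name] = beta[0] + coef
    return adjusted, beta[-1]


# Invented toy data: (group, baseline score, follow-up score).
toy = [
    ("control", 8.0, 10.1), ("control", 10.0, 11.2), ("control", 12.0, 12.0),
    ("t1", 9.0, 12.4), ("t1", 11.0, 13.1), ("t1", 13.0, 14.6),
    ("t2", 7.0, 12.9), ("t2", 10.0, 14.8), ("t2", 13.0, 16.2),
]
adjusted_means, slope = fit_ancova(toy)
print(adjusted_means, slope)
```

The differences between entries of `adjusted_means` are the group differences net of the covariate; an ANCOVA routine run on the same data should report the same adjusted means, alongside the F-test that this sketch omits.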
When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the null hypothesis is that the means of the dependent (or test) variable do not differ across the groups defined by the categories of our independent (or factor) variable, controlling for our covariate(s). Some difference between these means is expected, simply due to random chance in sampling. ANCOVA is designed to help us determine if any differences between groups are large enough to declare the test statistically significant. “Large enough” is typically defined as a test statistic with a level of statistical significance, or p-value, of less than 0.05. This would lead us to reject the null hypothesis (H0) of no difference and conclude that there likely is a relationship between the two variables.
Calculating ANCOVA
In this example we do not go into the computations needed to estimate the model but focus simply on the model and its implications and interpretation. The logic behind ANCOVA is that if the group means are the same as the overall, or ‘grand’, mean for the pooled sample, after adjusting for our covariate, then we would expect to see the same amount of variance within groups as we see between groups. The test statistic for ANCOVA, like ANOVA, is called an F-statistic and it is computed as the ratio of the between-group, or model, variance to the within-group, or residual, variance, conditional on the covariate(s). If the between-group variance is large relative to the within-group variance, then F will be large.
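One common way to write this ratio is as a comparison of two nested regression models. The notation here is an assumption introduced for illustration and does not appear in the original guide: k is the number of groups, n the number of observations, and there is a single covariate. The "reduced" model contains only the covariate; the "full" model adds the k − 1 group indicators.

```latex
F = \frac{\left(SS_{\text{res}}^{\text{reduced}} - SS_{\text{res}}^{\text{full}}\right) / (k - 1)}
         {SS_{\text{res}}^{\text{full}} / (n - k - 1)}
```

The numerator is the residual variation explained by group membership over and above the covariate, per degree of freedom; the denominator is the remaining within-group variation per degree of freedom.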
Just like the t-statistic, if F is larger than some critical value, then the test is statistically significant and we can reject the hypothesis that all of the group means are the same in the population from which our data are drawn. The simplest representation of the ANCOVA model with three groups is as a regression model:
(1) $Y_{ij} = \beta_0 + \beta_1 G_{1i} + \beta_2 G_{2i} + \beta_3 X_i + \varepsilon_{ij}$
where $G_{1i}$ and $G_{2i}$ are dichotomous indicators of treatment groups 1 and 2, with 1 denoting membership in that treatment group and 0 otherwise; the control group is the reference category, for which both indicators are 0. $\beta_3$ is the marginal effect, or slope, of $X$ on $Y$. If this model is estimated after centering $X$ (subtracting the overall mean of $X$ from each value of $X$), then $\beta_0$ yields the adjusted mean of the control group, that is, its expected value of $Y$ at the overall mean of $X$; adding $\beta_1$ to $\beta_0$ gives the mean for treatment group 1 adjusted for $X$, while adding $\beta_2$ to $\beta_0$ yields the adjusted mean for treatment group 2.
Testing for Specific Group Differences
The F-test produced by an ANCOVA is what is known as an omnibus test. It does not tell us which particular groups are different from each other, only that at least some are different. This is in fact the advantage of the method over simply running pairs of t-tests for each combination of groups. A test that rejects the null hypothesis when p < 0.05 has a 5%, or 1 in 20, chance of rejecting a null hypothesis that is in fact true. If we conduct, say, 10 t-tests to look for mean differences among all possible pairs of groups defined by a 5-category independent variable, the chance of falsely rejecting the null hypothesis in at least one of those tests is no longer 5%: it is roughly 40% if the tests are treated as independent (1 − 0.95^10 ≈ 0.40), and can be as high as 50%. In order to counteract this inflating of chance findings, more stringent or conservative tests are used when, subsequent to finding a significant F-test, we go on to explore particular contrasts between specific groups. These are called post-hoc tests.
The most commonly used are Tukey's test and the Bonferroni correction, which are both available in statistical software. It is also possible to conduct a limited set of planned-in-advance contrasts as part of the main ANCOVA, for instance in a clinical trial, comparing experimental treatments with a control group but not with each other.
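The arithmetic behind the multiple-comparisons problem, and the Bonferroni remedy, can be checked with a short Python sketch. The function names are hypothetical, and the familywise calculation assumes the tests are independent, which pairwise comparisons that share groups are not; treat the result as an approximation.

```python
from math import comb


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false rejection across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


n_groups = 5
n_pairs = comb(n_groups, 2)  # number of distinct pairs among 5 groups
print(n_pairs)
print(round(familywise_error_rate(n_pairs), 3))
print(bonferroni_alpha(n_pairs))
```

With five groups there are 10 pairwise comparisons, so under the Bonferroni correction each comparison would be judged against a threshold of 0.05 / 10 rather than 0.05. Tukey's procedure handles the same problem differently and is generally less conservative when all pairwise comparisons are of interest.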