Learn About ANCOVA in SPSS with Data from the Eurobarometer (63.1, Jan–Feb 2005)
Student Guide

Introduction

This dataset example introduces ANCOVA (Analysis of Covariance). This method allows researchers to compare the means of a single variable across two or more subsets of the data, to evaluate whether the means for each subset are statistically significantly different from each other, while adjusting for one or more covariates. The technique builds on one-way ANOVA but allows the researcher to make statistical adjustments using additional covariates in order to obtain more efficient and/or unbiased estimates of group differences.

This example describes ANCOVA, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate this using a subset of data from the 2005 Eurobarometer: Europeans, Science and Technology (EB63.1). Specifically, we test whether attitudes to science and faith differ across countries, after adjusting for differing levels of scientific knowledge between those countries. This is useful if we want to understand the extent of persistent differences in attitudes to science across countries, regardless of the differing levels of information available to citizens. This page provides links to the sample dataset and a guide to producing an ANCOVA using statistical software.

What Is ANCOVA?

ANCOVA is a method for testing whether or not the means of a given variable are different between two, but usually more, subsets of the data. Those subsets are typically defined by categories of another variable. It is an extension of one-way ANOVA, the subject of another SAGE Dataset example. The difference between the two techniques is that ANCOVA allows the researcher to make a statistical adjustment to the estimated mean differences between groups in order to equate the groups on some related variable. There are two main reasons why you might wish to do this.

The first is to eliminate confounding influences when the groups are not randomly assigned. Such groups are sometimes referred to as 'intact groups'. For example, you might compute the mean weight for people who live in urban, suburban, and rural areas in your sample of data and be interested in determining whether the mean weights are the same across the three groups in the population from which they are drawn. That is to say, whether the means are statistically significantly different from each other. However, you think that the areas have different proportions of people living in poverty, and you know that poverty is a predictor of being overweight. You therefore wish to estimate the area differences net of differences brought about by variation in poverty.

The second reason for using ANCOVA is that the researcher wants to gain a more precise estimate of the differences between groups. It accomplishes this by reducing within-group error variance. A covariate that is related to the outcome variable will explain some of the variance in that outcome.
Recall from the prerequisite ANOVA example that the F-test is based on the ratio of model-explained variance to unexplained, or 'error', variance. If we reduce the error variance, then we increase the relative amount of variance explained by our model (the groups) and hence can more accurately assess its implications. If the covariate is unrelated to the groups, as is the case in an experiment where participants are randomly assigned to groups, then adjusting for it reduces the within-group variance and will therefore only have the effect of increasing the chance of finding a statistically significant result, while the size of the estimated differences between groups will remain the same. A typical scenario for this might be an educational research project in which you are interested in the effect of several different randomly assigned treatments on reading scores. Using ANCOVA, you could use baseline reading score as a covariate. This would substantially reduce the within-group variance in post-intervention scores and thereby increase power, allowing smaller true differences to be detected.

In the other scenario, where the researcher is estimating differences based on intact groups, such as country in the example above, adjusting for a covariate might also be expected to reduce the between-group variance, because the covariate is related both to the groups and to the outcome. This in turn may have the effect of increasing or decreasing the post-adjustment group differences, which are the group differences net of those explained by the covariate.

In both cases, ANCOVA is equivalent to multiple regression in which the groups are defined by dummy variables and the covariate is entered as an additional predictor. Multiple regression with dummy variables is the subject of another SAGE Dataset example. The choice of which analysis to conduct is largely a matter of what is most familiar to the researcher. Typically, psychologists are familiar with ANOVA and ANCOVA, while sociologists and political scientists tend to use regression models more often. The results will be equivalent.

When computing formal statistical tests, it is customary to define the null hypothesis (H0) to be tested. In this case, the null hypothesis is that the means of all of the groups, defined by the categories of our independent, or factor, variable, on the test, or dependent, variable do not differ from each other, controlling for our covariate(s). Some difference between these means is expected simply due to random chance in sampling. The ANCOVA conducted here is designed to help us determine whether any differences between groups are large enough to declare the test statistically significant. "Large enough" is typically defined as a test statistic with a level of statistical significance, or p-value, of less than 0.05. This would lead us to reject the null hypothesis (H0) of no difference and conclude that there likely is a relationship between the two variables.

Calculating ANCOVA

In this example we do not go into the computations needed to estimate the model but focus simply on the model and its implications and interpretation.
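In SPSS, a model of this kind is typically estimated through the General Linear Model procedure (Analyze > General Linear Model > Univariate), which corresponds to the UNIANOVA syntax command. The sketch below is a minimal illustration only; the variable names (sci_faith_attitude for the attitude scale, country for the grouping factor, and sci_knowledge for the knowledge covariate) are hypothetical stand-ins for the names used in the downloadable dataset.

    * Illustrative syntax - variable names are hypothetical placeholders.
    UNIANOVA sci_faith_attitude BY country WITH sci_knowledge
      /METHOD=SSTYPE(3)
      /INTERCEPT=INCLUDE
      /PRINT=DESCRIPTIVE ETASQ
      /CRITERIA=ALPHA(.05)
      /DESIGN=sci_knowledge country.

The DESIGN subcommand lists the covariate followed by the factor, and the F-test reported for the factor in the Tests of Between-Subjects Effects output table is the ANCOVA test of country differences adjusted for scientific knowledge.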
The logic behind ANCOVA is that if the group means are the same as the overall, or 'grand', mean for the pooled sample, after adjusting for our covariate, then we would expect to see the same amount of variance within each group as we see between the groups. The test statistic for ANCOVA, like ANOVA, is called an F-statistic, and it is computed as the ratio of the between-group, or model, variance to the within-group, or residual, variance, conditional on the covariate(s). If the between-group variance is large relative to the within-group variance, then F will be large. Just like the t-statistic, if F is larger than some critical value, then the test is statistically significant and we can reject the hypothesis that all of the groups' means are the same in the population from which our data are drawn.

The simplest representation of the ANCOVA model with three groups is as a regression model:

(1) Yij = β0 + β1Gi1 + β2Gi2 + β3Xi + εij

where Gi1 and Gi2 are dichotomous indicators of membership of treatment groups 1 and 2 respectively (1 denoting membership of that group and 0 otherwise), and β3 is the slope of X on Y. If this model is estimated after centering X (subtracting the grand mean of X from each value of X), then β0 yields the covariate-adjusted mean of the control group; adding β1 to β0 gives the mean for treatment group 1 adjusted for X, while adding β2 to β0 yields the adjusted mean for treatment group 2.

Testing for Specific Group Differences

The F-test produced by an ANCOVA is what is known as an omnibus test. It doesn't tell us which particular groups are different from each other, only that at least some are different. This is in fact the advantage of the method over simply running pairs of t-tests for each combination of groups. The reason for this is beyond the scope of this example to explain in detail, but the intuition behind it is as follows. Statistical tests which reject the null when p < .05 treat the probability of obtaining the result by chance as being less than 5%, or 1 in 20. If we conduct, say, 10 t-tests to look for mean differences amongst all possible pairs of groups defined by a 5-category independent variable, the chance of rejecting at least one null hypothesis by chance alone is no longer 5% but roughly 40% (if the tests were independent, 1 − 0.95^10 ≈ 0.40). In order to counteract this inflation of chance findings, more stringent, or conservative, tests are used when, subsequent to finding a significant F-test, we go on to explore particular contrasts between specific groups.
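In SPSS, one convenient way to obtain the covariate-adjusted group means and conservative pairwise contrasts described here is to add an EMMEANS subcommand to the UNIANOVA syntax sketched earlier. As before, the variable names are hypothetical placeholders rather than the actual names in the dataset.

    * Illustrative syntax - variable names are hypothetical placeholders.
    UNIANOVA sci_faith_attitude BY country WITH sci_knowledge
      /METHOD=SSTYPE(3)
      /EMMEANS=TABLES(country) WITH(sci_knowledge=MEAN) COMPARE ADJ(BONFERRONI)
      /PRINT=DESCRIPTIVE
      /CRITERIA=ALPHA(.05)
      /DESIGN=sci_knowledge country.

The EMMEANS subcommand evaluates each country's estimated marginal mean at the overall mean of the covariate, which mirrors the centering described in Equation (1), and ADJ(BONFERRONI) applies a Bonferroni correction to the pairwise comparisons so that the familywise error rate is held near 0.05 despite the large number of contrasts.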