Field, A. P. (2005) Is the meta-analysis of correlation coefficients accurate when population correlations vary? Psychological Methods, 10 (4). pp. 444-467. ISSN 1082-989X

This version is available from Sussex Research Online: http://sro.sussex.ac.uk/id/eprint/718/

Running Head: Random-effects methods of meta-analysis of correlation coefficients

Is the Meta-Analysis of Correlation Coefficients Accurate when Population Correlations Vary?

Andy P. Field
University of Sussex, UK

Abstract

One conceptualization of meta-analysis is that studies within the meta-analysis are sampled from populations with mean effect sizes that vary (random-effects models). The consequences of not applying such models, and the comparison of different methods, have been hotly debated. A Monte Carlo study compared the efficacy of Hedges and Vevea's random-effects method of meta-analysis with Hunter and Schmidt's over a wide range of conditions, as the variability in population correlations increases. (1) The Hunter-Schmidt method produced estimates of the average correlation with the least error, although estimates from both methods were very accurate; (2) confidence intervals from Hunter and Schmidt's method were always slightly too narrow but became more accurate than those from Hedges and Vevea's method as the number of studies included in the meta-analysis, the size of the true correlation, and the variability of correlations increased; and (3) the study weights did not explain the differences between the methods.

Is the Meta-Analysis of Correlation Coefficients Accurate when Population Correlations Vary?

Meta-analysis is a statistical technique for assimilating research findings that was developed because of the failure of discursive reviews to provide objective assessments of the substantive importance of empirical effects (see Wolf, 1986).
Although its objectivity can also be limited (for example, by the selective inclusion of studies and the difficulty of including all relevant studies because of unpublished research findings), Field (2001, 2003a, 2003b) reports a remarkable increase in its usage since Glass (1976), Hedges and Olkin (1985), Rosenthal and Rubin (1978), Schmidt and Hunter (1977), Hunter, Schmidt and Jackson (1982), and Hunter and Schmidt (1990a) made their groundbreaking contributions. However, meta-analysis is not without controversy: recent debate has centered on the appropriate application of meta-analytic methods (e.g., Field, 2003a, 2003b; Hunter & Schmidt, 2000) and on comparisons of different methods (Field, 2001; Hall & Brannick, 2002; Johnson, Mullen & Salas, 1995; Schulze, 2004). This paper reviews these controversies before presenting empirical data comparing two different methods over a wide range of conditions.

Methods of Meta-Analysis

In meta-analysis, effect-size estimates from different studies are combined to estimate the true size of the effect in the population. Although several effect-size estimates are available (e.g., the Pearson product-moment correlation coefficient, r; Cohen's effect-size index, d; odds ratios; risk rates; and risk differences), they all, at some level, represent the same thing, and in some cases they can be converted into one of the other metrics (see Rosenthal, 1991; Wolf, 1986).^i The general meta-analytic framework is similar for all of these metrics: the population effect size is estimated by taking effect sizes from individual studies, converting them to a common metric, and then calculating a weighted average effect size that has an associated standard error. The weight given to a particular study is often based on the sample size for that study (usually the sampling variance of the effect size), which is an indicator of the sampling accuracy of that particular study. Confidence intervals can be constructed around the weighted average, and its significance can be determined from a z-test. Meta-analysis can also be used to assess the similarity of effect sizes across studies using tests of homogeneity (Hedges & Olkin, 1985) or variance estimates (Hunter & Schmidt, 1990a, 2004).

Fixed- and Random-Effects Methods

One controversy within the meta-analysis literature concerns the appropriate application of methods (Field, 2003a, 2003b; Hunter & Schmidt, 2000). In essence, there are two ways to conceptualise meta-analysis: fixed- and random-effects models (see Hedges, 1992; Hedges & Vevea, 1998; Hunter & Schmidt, 2000).^ii The fixed-effect conceptualisation assumes that the studies in a meta-analysis are sampled from a population with a fixed effect size, or one that can be predicted from a few predictors; in the simplest case, the effect size in the population is constant for all studies included in the meta-analysis (Hunter & Schmidt, 2000). The alternative is to assume that population effect sizes vary randomly from study to study; that is, the studies in a meta-analysis come from populations of effect sizes that are likely to have different means. Population effect sizes can therefore be thought of as being sampled from a universe of possible effects, a 'superpopulation' (Becker, 1996; Hedges, 1992). Which of the two conceptualisations to use is controversial, and the issue hinges on two things: the assumptions that can realistically be made about the populations from which studies are sampled, and the types of inferences that researchers wish to make from the meta-analysis.
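To make the general framework concrete, the sketch below computes a weighted average correlation under a random-effects model in the spirit of Hedges and Vevea (1998): correlations are converted to a common metric with Fisher's z transformation, weighted by inverse variance, and the between-studies variance (tau-squared) is estimated by the method of moments from the homogeneity statistic Q. This is an illustrative sketch rather than the paper's exact procedure, and the function name and example data are hypothetical.

import numpy as np
from scipy import stats

def random_effects_meta(rs, ns, alpha=0.05):
    """Random-effects weighted average of correlations via Fisher's z.

    A sketch in the spirit of Hedges and Vevea (1998); tau^2 is the
    method-of-moments estimate. Not the paper's exact procedure.
    """
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    zs = np.arctanh(rs)           # convert r to the common metric (Fisher z)
    v = 1.0 / (ns - 3.0)          # sampling variance of z for each study
    w = 1.0 / v                   # fixed-effect (inverse-variance) weights

    # Fixed-effect mean and the homogeneity statistic Q
    z_fixed = np.sum(w * zs) / np.sum(w)
    Q = np.sum(w * (zs - z_fixed) ** 2)

    # Between-studies variance: method-of-moments estimate, floored at zero
    k = len(zs)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's sampling variance
    w_star = 1.0 / (v + tau2)
    z_bar = np.sum(w_star * zs) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))

    # Confidence interval and z-test in the z metric, back-transformed to r
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)
    ci = np.tanh([z_bar - z_crit * se, z_bar + z_crit * se])
    return np.tanh(z_bar), ci, z_bar / se

# Hypothetical example: five studies with correlations rs and sample sizes ns
r_bar, ci, z_stat = random_effects_meta(rs=[.30, .45, .12, .38, .25],
                                        ns=[44, 120, 60, 85, 200])
print(f"mean r = {r_bar:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}], z = {z_stat:.2f}")

Setting the between-studies variance to zero in the final weighting step recovers the fixed-effect version of the same calculation, which is the formal difference between the two conceptualisations described above.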
On the first of these points, there has been support for the position that real-world data are likely to have variable population parameters (Field, 2003a; Hunter & Schmidt, 1990b, 2000; National Research Council, 1992; Osburn & Callender, 1992), and empirical data have shown that real-world data do not conform to the assumption of fixed population parameters (Barrick & Mount, 1991). Figure 1 shows the distribution of between-studies standard-deviation estimates calculated (where data were available) for all meta-analytic studies using correlation coefficients published in Psychological Bulletin between 1997 and 2002.^iii These estimates are based on the Hunter-Schmidt method (they are the square root of Equation 15, described later). The histogram shows that meta-analytic studies typically give rise to between-studies standard-deviation estimates ranging from 0 to 0.3, with values of 0 being very common. However, when effect-size variability is present, it most frequently lies in the region of 0.10 to 0.16 (broadly consistent with Barrick & Mount, 1991), and values as high as 0.3 are relatively infrequent.

On the second point, Hedges and Vevea (1998) suggested that the choice of model depends on the type of inferences the researcher wishes to make: fixed-effect models are appropriate for inferences that extend only to the studies included in the meta-analysis (conditional inferences), whereas random-effects models allow inferences that generalise beyond the studies included in the meta-analysis (unconditional inferences). Psychologists typically wish to make generalizations beyond the studies included in the meta-analysis, so random-effects models are more appropriate (Field, 2003a; Hunter & Schmidt, 2000).

This debate has been exacerbated by the fact that fixed- and random-effects meta-analytic methods are frequently misapplied. Despite good evidence that real-world data support a random-effects conceptualisation, psychologists routinely apply fixed-effects meta-analytic methods to their data. For example, Hunter and Schmidt (2000) listed 21 recent examples of meta-analytic studies using fixed-effects methods in the major review journal for psychology (Psychological Bulletin), compared with none using random-effects models. The theoretical consequence, according to Hunter and Schmidt (2000), is that the significance tests of the average effect size fail to control the Type I error rate: they predicted inflated error rates of between 11% and 28%. In fact, Field (2003a) has shown, using Monte Carlo simulations, that Type I error rates are inflated from 5% to anywhere between 43% and 80%. So, of the 21 meta-analyses reported by Hunter and Schmidt (2000), anywhere between 9 and 17 are likely to have reported significant effects when in reality no true effect may have existed in the population (see Field, 2003a).

One function of the homogeneity of effect-size measures mentioned earlier