Resampling Tests for Meta-Analysis of Ecological Data
June 1997 REPORTS
Ecology, 78(5), 1997, pp. 1277-1283
© 1997 by the Ecological Society of America

RESAMPLING TESTS FOR META-ANALYSIS OF ECOLOGICAL DATA

DEAN C. ADAMS, JESSICA GUREVITCH, AND MICHAEL S. ROSENBERG

Department of Ecology and Evolution, State University of New York at Stony Brook, Stony Brook, New York 11794-5245 USA

Abstract. Meta-analysis is a statistical technique that allows one to combine the results from multiple studies to glean inferences on the overall importance of various phenomena. This method can prove to be more informative than common "vote counting," in which the number of significant results is compared to the number with nonsignificant results to determine whether the phenomenon of interest is globally important. While the use of meta-analysis is widespread in medicine and the social sciences, only recently has it been applied to ecological questions. We compared the results of parametric confidence limits and homogeneity statistics commonly obtained through meta-analysis to those obtained from resampling methods to ascertain the robustness of standard meta-analytic techniques. We found that confidence limits based on bootstrapping methods were wider than standard confidence limits, implying that resampling estimates are more conservative. In addition, we found that significance tests based on homogeneity statistics differed occasionally from results of randomization tests, implying that inferences based solely on chi-square significance tests may lead to erroneous conclusions. We conclude that resampling methods should be incorporated in meta-analysis studies, to ensure proper evaluation of main effects in ecological studies.

Key words: bootstrapping; meta-analysis; randomization tests; resampling statistics vs. standard methods; statistical techniques.
Manuscript received 16 May 1996; revised 9 September 1996; accepted 13 September 1996; final version received 24 October 1996.

INTRODUCTION

There is a compelling need for new methods for combining ecological data from different experimental studies in order to reach general conclusions. While conventional reviews and syntheses of ecological data have relied on subjective, narrative methods, or "vote-counting" approaches, ecologists have recently begun to explore the use of meta-analysis to address this need. Meta-analysis is a set of statistical methods that provides a rigorous framework for the quantitative synthesis of the results of independent studies. The use of these techniques has become widespread and even routine in medicine and in the social sciences, but their potential for integrating ecological data is just beginning to be realized (Arnqvist and Wooster 1995). Recent applications to ecological problems include syntheses of the experimental evidence for competition (Gurevitch et al. 1992), the responses of woody plant species to elevated CO2 (Curtis 1996), and the effectiveness of crop diversification in deterring herbivorous insects (Tonhasca and Byrne 1994).

Most researchers use parametric fixed-effects models, although random-effects models exist (Raudenbush 1994), and a mixed model has been proposed (Gurevitch and Hedges 1993). These procedures are summarized elsewhere (Hedges and Olkin 1985, Cooper and Hedges 1994). The parametric model used to derive the effect size $d$ used in meta-analysis relies on the assumption that the observations in the experimental and control groups are normally distributed for each study. The large-sample distribution of $d$ tends to normality, and the large-sample approximation to the distribution of the effect size estimator is fairly reliable for sample sizes that exceed 10 in each of the control and experimental groups. The large-sample approximation is likely to be less accurate when sample sizes are small, when there are large differences in sample size between the experimental and control groups, and with very large effect sizes (Hedges and Olkin 1985). Unfortunately, it is not uncommon for ecological data to violate each of these conditions (e.g., Gurevitch et al. 1992). In addition, the test statistic used to assess the homogeneity of the effect sizes among studies, $Q$, is approximately chi-square distributed when the above assumptions of normality are met. If they are violated, the conventional tests of homogeneity may be flawed (Hedges and Olkin 1985).

One alternative to traditional parametric and nonparametric statistical tests is the use of resampling methods. These computer-intensive techniques are now beginning to gain wider application in single-study analyses in ecology and evolution (Manly 1991, Crowley 1992). Such methods have not previously been applied to meta-analysis. Resampling methods test the significance of a statistic by generating a distribution of that statistic by permuting the data many times, each time recalculating the statistic. By comparing the original statistic to this generated distribution, a significance level can be determined (Kempthorne and Doerfler 1969, Manly 1991). Resampling methods such as the bootstrap can also be used to estimate confidence limits for statistics. Because they generate their own distributions, resampling methods are free from the distribution assumptions of parametric tests, and, in many cases, may be more powerful than conventional nonparametric ranking approaches (Manly 1991, Adams and Anthony 1996).

Despite their growing popularity in primary analyses, resampling and randomization techniques have not been used in meta-analysis in any field, and their application to this secondary level of analysis raises questions that have not been addressed before by statisticians. Because ecological data may violate some of the assumptions for common meta-analysis statistics, we propose an approach by which resampling methods can be applied to the statistical tests of significance and to the calculation of confidence limits in meta-analysis. We then compare the results of these analyses to the results from standard meta-analytic methods for three ecological data sets.

METHODS

Calculations

In conventional meta-analysis, effect sizes are calculated from the means, sample sizes, and standard deviations of the experimental and control groups in each study (Hedges and Olkin 1985), and are then combined to obtain an estimate of the mean effect size for each class, $d_{i+}$, as well as the grand mean effect size for all studies, $d_{++}$. It is often of interest to test whether classes of studies differ in their effect sizes. A homogeneity statistic, $Q_B$, can be used to assess whether the classes of studies differ significantly from one another, and the statistic $Q_W$ can be used to test for within-class homogeneity (see Table 1 for formulas). This method of determining within- and between-class homogeneity is analogous to the partitioning of variance into within- and between-group components in an analysis of variance (see Gurevitch and Hedges [1993] for a more detailed explanation).

TABLE 1. Equations used in the calculation of mean effect sizes and homogeneity components in meta-analysis. Symbols in equations follow Gurevitch and Hedges (1993) and are as follows: $\bar{X}^C_{ij}$ = mean of control group, $\bar{X}^E_{ij}$ = mean of the experimental group, $s_{ij}$ = pooled standard deviation of the control and experimental groups, $J$ = correction term for bias because of small sample size, $w$ = weighting for each study [see Methods: Calculations for parametric and nonparametric weighting schemes].

| Statistic | Symbol | Equation |
| --- | --- | --- |
| Study effect size | $d_{ij}$ | $d_{ij} = \dfrac{\bar{X}^E_{ij} - \bar{X}^C_{ij}}{s_{ij}}\, J$ |
| Class effect size | $d_{i+}$ | $d_{i+} = \dfrac{\sum_{j=1}^{k_i} w_{ij} d_{ij}}{\sum_{j=1}^{k_i} w_{ij}}$ |
| Grand mean effect size | $d_{++}$ | $d_{++} = \dfrac{\sum_{i=1}^{m} \sum_{j=1}^{k_i} w_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{k_i} w_{ij}}$ |
| Homogeneity within classes | $Q_W$ | $Q_W = \sum_{i=1}^{m} \sum_{j=1}^{k_i} w_{ij} (d_{ij} - d_{i+})^2$ |
| Homogeneity between classes | $Q_B$ | $Q_B = \sum_{i=1}^{m} \sum_{j=1}^{k_i} w_{ij} (d_{i+} - d_{++})^2$ |

Studies are typically weighted by an estimate of the precision of the effect size, based on the reasonable assumption that more-precise studies (e.g., those with larger sample sizes) should be weighted more heavily than those that are less precise. The parametric weights usually used are inversely proportional to the estimated sampling variance, and are calculated as $w_{ij} = 1/v_{ij}$, where

$$v_{ij} = \frac{N^E_{ij} + N^C_{ij}}{N^E_{ij} N^C_{ij}} + \frac{d_{ij}^2}{2\,(N^E_{ij} + N^C_{ij})}$$

(Hedges and Olkin 1985). $N^E$ and $N^C$ are the experimental and control group sample sizes, and $d_{ij}$ is the effect size for that study. This weighting minimizes the variance of $d_{i+}$, and is the most precise weighting estimate when the assumptions based on large-sample theory are satisfied. An alternative weighting that makes fewer assumptions, but still incorporates the desired property of counting larger studies more heavily than small ones is

$$w_{ij} = \frac{N^E_{ij} N^C_{ij}}{N^E_{ij} + N^C_{ij}}$$

[...] mean effect sizes and 95% confidence limits for each class using parametric, fixed-effects model and mixed-effects model meta-analytic techniques. We also calculated the between-class homogeneity ($Q_B$) for each data set and tested this against a chi-square distribution to determine if classes differed significantly from one another. We then calculated bootstrap confidence limits for the mean effect sizes for each class and for the grand mean effect sizes for comparison.

We first calculated bootstrap confidence limits for the mean class effect sizes using the conventional method, the percentile bootstrap (Efron 1979). For each class, we chose i studies with replacement and calculated a weighted mean effect size.
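
For illustration only, the weighting schemes and the percentile bootstrap for a class mean effect size can be sketched in Python. This is a minimal sketch under stated assumptions, not the software used in this study. The function names, the number of bootstrap replicates (`n_boot`), the random seed, the common approximation used here for the correction term J, and the choice to draw as many studies per replicate as the class contains are all assumptions introduced for the example.

```python
import random


def hedges_d(mean_e, mean_c, sd_pooled, n_e, n_c):
    """Study effect size: standardized mean difference times the correction J."""
    # Assumption: J is computed with the common approximation
    # J = 1 - 3 / (4 (N_E + N_C - 2) - 1); the text only names J.
    j = 1.0 - 3.0 / (4.0 * (n_e + n_c - 2) - 1.0)
    return (mean_e - mean_c) / sd_pooled * j


def sampling_variance(d, n_e, n_c):
    """Large-sample variance v of the effect size d."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2.0 * (n_e + n_c))


def parametric_weight(d, n_e, n_c):
    """Parametric weight w = 1 / v."""
    return 1.0 / sampling_variance(d, n_e, n_c)


def sample_size_weight(n_e, n_c):
    """Alternative weight based on sample sizes only."""
    return n_e * n_c / (n_e + n_c)


def weighted_mean(effects, weights):
    """Weighted mean effect size for one class of studies."""
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


def percentile_bootstrap_ci(effects, weights, n_boot=4999, alpha=0.05, seed=0):
    """Percentile bootstrap confidence limits for a weighted mean effect size.

    Studies (effect size together with its weight) are resampled with
    replacement; each replicate draws as many studies as the class holds
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    k = len(effects)
    boot_means = []
    for _ in range(n_boot):
        picks = rng.choices(range(k), k=k)  # indices drawn with replacement
        boot_means.append(
            weighted_mean([effects[i] for i in picks],
                          [weights[i] for i in picks])
        )
    boot_means.sort()
    # Lower and upper percentile positions in the sorted replicates.
    lower = boot_means[int((alpha / 2.0) * n_boot)]
    upper = boot_means[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper
```

With hypothetical effect sizes and sample sizes for one class, the weights would be built study by study with `parametric_weight` or `sample_size_weight` and passed, together with the effect sizes, to `percentile_bootstrap_ci`. Resampling each effect size together with its weight keeps a study's precision attached to it in every replicate.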