
Journal of Epidemiology and Community Health 1991; 45: 89-92

REVIEW ARTICLE: methods in epidemiology, V

The potential and limitations of meta-analysis

Tim D Spector, Simon G Thompson

Department of Environmental and Preventive Medicine, St Bartholomew's Hospital Medical College, Charterhouse Square, London EC1M 6BQ
T D Spector

Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel St, London WC1
S G Thompson

Correspondence to: Dr Spector

Accepted for publication October 1990

We are currently witnessing an "explosion" of meta-analyses and overviews in the scientific literature. This is a relatively new phenomenon and this article addresses some of the important issues raised by their increasing use. In particular the differing applications and limitations of meta-analysis are discussed, with a review of the analytic methods used and the problems of bias encountered.

What is meta-analysis?
Meta-analysis has come to refer to the combining of results from a number of experiments or studies examining the same question. Such a process is not new, and some meta-analytic studies were reported as early as 1955.1 However, only since the term meta-analysis was first used in 19762 has the technique become recognised as an analytical method. Meta-analysis is a discipline that reviews critically and combines statistically the results of previous research in an attempt to summarise the totality of evidence relating to a particular medical issue. The term meta-analysis is now often used synonymously with overview.

Why use meta-analysis?
Traditionally, when seeking advice in controversial or novel areas, clinicians and scientists have relied heavily on "informed" editorials or narrative reviews. There is now good evidence to suggest that these traditional methods are subject to bias and inaccuracy.3 Reviewers using traditional methods are less likely to detect a small but significant effect or difference compared with reviewers using formal statistical techniques.4 In controversial topics, such as reviews of the uses of new procedures, the enthusiasm for the procedure may be associated more with the specialty of the reviewer than with the results of the trials.5 As most current medical reviews do not use scientific methods to assess and present evidence, different reviewers often reach different conclusions based on the same data.6 For these reasons some formal statistical process of review should replace the informal approach.

Meta-analysis can be used to resolve uncertainty when reports, editorials or reviews disagree. Although the randomised controlled clinical trial is now accepted as the best method of assessing therapeutic regimes, individual trials may produce false positive or negative conclusions. Small numbers and the consequent lack of power of any individual study is usually the main problem area.7 The problem of small numbers is particularly relevant when dealing with subgroup analysis, for which very often the randomised controlled trial was not designed. Combining the results of comparable trials or studies can reduce the random errors that may predominate in any individual study. The larger the sample size available, the more precise the estimate of the effect, and subgroup effects can be more reliably investigated.
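As an illustration of this point about power, the short sketch below (Python, with purely hypothetical event rates and trial sizes, not taken from any of the studies cited) compares the chance that a single small trial will detect a modest reduction in event rate with the chance for the much larger sample that a meta-analysis can assemble; the usual normal approximation for comparing two proportions is used.

    # Illustrative only: power of a two-arm trial to detect a difference in
    # event rates, using the normal approximation for comparing two proportions.
    # The event rates and sample sizes below are hypothetical.
    from math import sqrt, erf

    def normal_cdf(x):
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))

    def power_two_proportions(p_control, p_treated, n_per_arm, alpha_z=1.96):
        """Approximate power of a two sided 5% level test comparing two proportions."""
        se = sqrt(p_control * (1 - p_control) / n_per_arm +
                  p_treated * (1 - p_treated) / n_per_arm)
        z = abs(p_control - p_treated) / se
        return normal_cdf(z - alpha_z)

    # A treatment assumed to reduce an event rate from 10% to 8%
    print(power_two_proportions(0.10, 0.08, n_per_arm=200))   # single small trial: about 0.10
    print(power_two_proportions(0.10, 0.08, n_per_arm=5000))  # pooled sample: about 0.94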

It has been suggested by some authors that only randomised controlled trials should be subjected to meta-analysis.8 However this restriction is not desirable; aetiological meta-analyses (ie, of case-control or prospective studies) have recently been carried out, usually to clarify inconsistent findings or to estimate the true effect of a risk factor. However the interpretation of a meta-analysis of randomised controlled trials is usually simpler. If all relevant clinical trials are included and these are free from bias (ie, trials are randomised, all randomised individuals are included in "intention to treat" analyses, and outcome assessments are objective or blinded), a meta-analysis will give an unbiased assessment of a treatment's efficacy.9 In observational epidemiology, potential bias in individual studies (through confounding, misclassification, or other causes) will always remain a problem, especially when effect sizes are small. If such biases are to an extent consistent over different studies, a meta-analysis will reflect both the true effect and the biases. However the increasing use of meta-analysis in observational studies should encourage the more formal reporting of aetiological studies, to facilitate the combining of such results. Indeed the direct comparison of results from meta-analyses of randomised controlled trials and of the related observational studies is a novel and informative advance.10 11

Examples of meta-analysis
There are now many examples of meta-analysis in a great variety of medical specialities that demonstrate their potential usefulness. One of the early important studies concerned the use of beta blockers in myocardial infarction,12 13 which showed the efficacy of post-discharge treatment by combining the results of over 60 small studies. It also produced a useful framework for future studies. Another meta-analysis has concluded that steroids are of benefit in bacterial meningitis in children,14 another that H2 antagonists are of only minor benefit in the treatment of gastrointestinal haemorrhage, and only in gastric ulcers.15 Although the vast majority of meta-analyses concern the assessment of treatment efficacy in randomised controlled trials, a few studies have addressed contentious aetiological issues such as the quantification of the effect of passive smoking on the risk of lung cancer,16 alcohol consumption in breast cancer,17 the oral contraceptive pill in rheumatoid arthritis,18 and leukaemia in refinery workers.19

Study design in meta-analysis
With the proliferation of meta-analyses, it has become apparent that their design, methods and publication should be conducted in a rigorous scientific manner, akin to that currently expected of randomised controlled trials. This is to allow critical appraisal of each individual meta-analysis in terms of its methodology and therefore the validity of its conclusions. A meta-analysis should be a research study in its own right. Specific a priori aims should be set out and a working protocol established.

Having defined the aims of the study, a thorough search of relevant publications needs to be performed. Computer searches have aided the inclusion of large numbers of trials in published meta-analyses. However, several studies have shown that less than two thirds of relevant trials are uncovered by computer searches.20 Therefore computer searches should be supplemented by the bibliographies of textbooks, reviews, and the studies themselves, and information from specialists in the field. Where possible, databases of ongoing clinical trials should be consulted.

In order to reduce bias, the inclusion of studies should be based on predetermined criteria. For example in clinical trials, evidence of randomisation is usually regarded as crucial21; in some situations a minimum study size might be desirable. Ideally all studies should be assessed in a blinded fashion by independent observers, although this is often difficult and impractical to perform. The decision to include studies should consider whether treatments, outcomes, and case definitions are similar enough to be combined. Opinions will of course differ as to how strict inclusion criteria should be. Some argue that where certain methodological differences occur it is wrong to produce summary estimates.22 Others argue that the more varied the studies included, the more generalisable and applicable the results.23 Differences between studies are likely to result in differences in the size, rather than the direction, of the effect.24 Peto has also pointed out the tendency for trials addressing related questions to produce answers in a similar direction, despite methodological variations.9 It is reassuring to note however that different meta-analyses of the same subject, that differ in the number of trials included, usually reach similar conclusions.25

At present most meta-analyses performed do not take into account the quality of the individual studies included, and results are weighted simply in favour of the large study over the small. In principle it would seem desirable to down weight those studies of "doubtful" quality relative to "good" quality studies, because of their greater likelihood of bias. Some authors have proposed that studies can be weighted in terms of independently assessed "quality", derived from a large number of predetermined "quality" criteria.26 The pooled estimate can then be adjusted accordingly, or else the quality score used to exclude studies. A simpler method for clinical trials has been proposed which concentrates on three areas of potential bias, namely treatment allocation by randomisation, inclusion of all randomised individuals in analysis, and the blindness of the outcome assessments.27 Quality assessments have also been used in epidemiological studies.17 28 The major problem with quality weighting is that it must remain arbitrary and to an extent subjective. A single choice of weights is difficult to justify; for example, is it worse to have poor blinding or poor randomisation? Moreover the procedure goes against the general purpose of meta-analysis, that is to obtain an objective summary of the available evidence. Because of the time and resources needed to undertake full quality assessment, its routine use cannot be recommended unless its true worth becomes established.
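To show the kind of calculation involved, the sketch below pools some invented study estimates twice: once with ordinary inverse variance weights and once with those weights multiplied by an assumed quality score between 0 and 1. The estimates, scores, and weighting scheme are hypothetical, chosen only to illustrate that different (equally defensible) choices of weights give different answers, which is precisely the arbitrariness referred to above.

    # Illustration of quality weighted pooling (hypothetical data and scores).
    # w_i = 1/v_i gives the ordinary fixed effect estimate; quality weighting
    # multiplies each w_i by an assumed score q_i in (0, 1].
    log_or  = [-0.30, -0.10, -0.45, 0.05]    # log odds ratios from four studies
    var     = [0.04, 0.02, 0.10, 0.06]       # their variances
    quality = [0.9, 0.7, 0.4, 0.8]           # assumed quality scores (arbitrary)

    def pooled(estimates, weights):
        return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    w_plain   = [1.0 / v for v in var]
    w_quality = [q / v for q, v in zip(quality, var)]

    print(round(pooled(log_or, w_plain), 3))    # ignoring quality
    print(round(pooled(log_or, w_quality), 3))  # quality weighted: a different answer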
Publication bias
Publication bias is a potential problem in all meta-analyses.29 30 It arises from the fact that unpublished papers may contradict the findings of the overview, due to the overrepresentation of published "positive" (ie, statistically significant) studies. There is now good evidence that negative studies in medicine are less likely to be published than positive ones.31 32 The likelihood of this bias altering the conclusions will depend on the chances of the existence of important numbers of unpublished papers. This is less likely to occur when the result is of considerable importance (eg, vitamin supplementation and neural tube defects)33 or when the questions can only be answered by large costly studies which are likely to reach publication (eg, trials of thrombolytics on cardiovascular mortality).

The question of publication bias needs to be addressed in all meta-analyses and its importance considered. There are now several methods of confronting the problem. One involves a simple calculation of the number of studies needed to refute the conclusions of the meta-analysis.34 Another method is a visual one based on a "funnel plot", an example of which is given by Vandenbroucke.35 The basis of this is that if the observed effect sizes are plotted according to sample size they should scatter around an underlying "true" value, producing a funnel pattern. Gaps in the plot indicate potential unpublished studies and the possibility of bias. Begg has also produced a quantitative method of estimating the maximum potential effect of publication bias using the sample size of the study and an estimate of the size of the source population.30 36 The problems of this method are that information is needed on specific incidence rates and the proportion of a population who would enrol in a trial, and these details are not usually available with any accuracy.
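The first of these approaches is Rosenthal's "file drawer" calculation.34 A minimal sketch is given below (Python, with hypothetical Z values for the published studies); it estimates how many unpublished studies averaging a null result would have to exist to render the combined one sided result non-significant at the 5% level.

    # Rosenthal's "fail-safe N" (file drawer) calculation, illustrative only.
    # z_values are hypothetical standard normal deviates, one per published study.
    from math import ceil

    def fail_safe_n(z_values, z_alpha=1.645):
        """Number of unpublished studies averaging z = 0 needed to make the
        combined one sided result non-significant at the 5% level."""
        k = len(z_values)
        sum_z = sum(z_values)
        n = (sum_z ** 2) / (z_alpha ** 2) - k
        return max(0, ceil(n))

    z_values = [1.9, 0.8, 2.3, 1.2, 1.6]    # hypothetical study results
    print(fail_safe_n(z_values))            # about 18 further null studies

If the number returned is small, a handful of studies left in file drawers could overturn the conclusion; if it is implausibly large, publication bias is less of a worry.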

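A funnel plot of the kind described above can be drawn directly from the studies' summary results. The sketch below simulates a set of hypothetical trials around an assumed true log odds ratio and plots each observed effect against sample size; a gap in one of the lower corners of the resulting funnel would suggest missing (unpublished) small studies.

    # Funnel plot sketch: effect size plotted against sample size (simulated data).
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    true_log_or = -0.2
    n = rng.integers(50, 2000, size=40)      # hypothetical trial sizes
    se = 2.0 / np.sqrt(n)                    # rough standard error per trial
    log_or = rng.normal(true_log_or, se)     # small trials scatter more widely

    plt.scatter(log_or, n)
    plt.axvline(true_log_or, linestyle="--")
    plt.xlabel("Observed log odds ratio")
    plt.ylabel("Sample size")
    plt.title("Funnel plot (simulated studies)")
    plt.show()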
Another approach has been to seek out and include all unpublished studies performed, either from abstracts of meetings or by direct correspondence with other investigators. Although less open to publication bias, a new problem of data quality is encountered. The decision to use abstracts or study summaries is a contentious one. Some editors have advised against their use in referencing.37 About half of all abstracts never appear as full publications.29 Chalmers and coworkers attempted to identify factors which determined subsequent full publication of abstracts in the perinatal field.38 They were unable to detect any differences in methodological quality, but did find that sample size was a significant factor in determining publication. The effect of sample size has also been shown by others.29 However, although small studies are more likely to remain unpublished, those with large effects may be preferentially published.39 Obtaining information from the authors of unpublished studies has other inherent problems, as information obtained from an investigator may be subject to bias, both on the part of the meta-analyst and the original researcher. A meta-analyst thus has to weigh up the risks of including biased data (while increasing the power of the study) against the risks of publication bias.

Theoretically publication bias could be prevented or markedly reduced if researchers reported all studies undertaken and journals accepted papers based on methods rather than results. These ideals may be a long way off, and perhaps the most practical step would be the extension of clinical trial registers into other fields and disciplines.40 41

Statistical methods
The first step in meta-analysis should simply be to display the estimated treatment effects, together with their confidence intervals, for each study. Although the smallest and least informative studies have wide confidence intervals that tend to dominate the diagram visually, the careful inspection of such displays often prompts most of the conclusions that will emerge from a numerical analysis. There are two general philosophies for producing a combined estimate of effect and its confidence interval, the so called "fixed effect" and "random effects" methods. They differ in their assumptions about the true underlying treatment effects in the different studies.

In the fixed effect method, all the studies are assumed to be estimating the same underlying treatment effect. In this situation, the most precise overall average of observed treatment effects is obtained by weighting each individual treatment effect inversely according to its variance.42 This can be applied directly, for example, to log odds ratios as summaries of each trial's observed treatment effect. Logistic regression is also sometimes used,17 and is in fact equivalent to such an analysis. The Mantel-Haenszel method weights the odds ratios (not their logarithms) approximately inversely according to their variances43; in many instances the choice between odds ratios and log odds ratios is unimportant. Peto's "observed minus expected" (O-E) method13 44 is equivalent to the Mantel-Haenszel test. For each study, the "observed" number of events in the treated or exposed group is compared with that "expected" if the treatment or exposure had no effect. If the observed numbers (O) differ systematically from the expected numbers (E), this provides evidence of an effect of treatment. A test is provided by totalling the O-E differences, and their variances, across the studies to see if the totalled (O-E) differs more from zero than is compatible with chance. The calculations are thus easy to perform and to present. A disadvantage for general use is that the approximation provided for the overall estimate of the odds ratio is not a good one if the odds ratio is far from unity45; this is most unlikely to be a problem in clinical trials, but could be in meta-analyses of epidemiological studies.
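To make the fixed effect calculations concrete, the sketch below works through both approaches for a handful of invented 2×2 tables (events and totals in treated and control groups): inverse variance weighting of the log odds ratios,42 and Peto's O-E method with its hypergeometric variances.13 44 It illustrates the formulae only and does not reproduce any published analysis.

    # Fixed effect pooling from hypothetical 2x2 tables (illustrative only).
    # Each study: (events_treated, total_treated, events_control, total_control)
    from math import log, exp, sqrt

    studies = [(12, 150, 20, 148), (8, 95, 14, 100), (30, 420, 41, 410)]

    # 1. Inverse variance weighting of the log odds ratios
    sum_w, sum_w_logor = 0.0, 0.0
    for a, n1, c, n0 in studies:
        b, d = n1 - a, n0 - c                  # non-events
        log_or = log((a * d) / (b * c))
        var = 1/a + 1/b + 1/c + 1/d            # approximate variance of the log odds ratio
        w = 1 / var
        sum_w += w
        sum_w_logor += w * log_or
    pooled = sum_w_logor / sum_w
    se_pooled = sqrt(1 / sum_w)
    print("Inverse variance OR:", round(exp(pooled), 2),
          "95% CI:", round(exp(pooled - 1.96 * se_pooled), 2),
          "to", round(exp(pooled + 1.96 * se_pooled), 2))

    # 2. Peto's observed minus expected (O-E) method
    sum_oe, sum_v = 0.0, 0.0
    for a, n1, c, n0 in studies:
        n = n1 + n0
        events = a + c
        e = events * n1 / n                    # expected events in the treated group
        v = events * (n - events) * n1 * n0 / (n * n * (n - 1))  # hypergeometric variance
        sum_oe += a - e
        sum_v += v
    print("Peto OR:", round(exp(sum_oe / sum_v), 2),
          "  chi-squared (1 df):", round(sum_oe ** 2 / sum_v, 1))

With trials of reasonable size and odds ratios not far from unity the two approaches give very similar answers, as noted above.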
The choice between these fixed effect methods would rarely materially affect the conclusions being drawn. A more important consideration is the possibility of heterogeneity between the studies, that is failure of the assumption underlying all the fixed effect methods. The evidence for heterogeneity, ie, systematic differences between the underlying true treatment effects in different studies, can be assessed formally using a χ² test.46 However the test lacks power, and even in the absence of "statistically significant" heterogeneity, one may want to explore the analysis further. One approach is to attempt to "explain" the heterogeneity in terms of characteristics of the studies or the patients included. If such divisions reveal possible sources of heterogeneity, interpretation is necessarily cautious because the analyses are "post hoc", that is, inspired by looking at the data.

Often the sources of any heterogeneity are intangible. If so, it may be difficult to justify a single combined estimate for all the studies. One formal approach is the random effects method,47 in which both a between study variance and the within study variances are taken into account in deriving the weighting given to each study. However, the method cannot be regarded as a panacea for heterogeneity. The between study variance, estimated from the χ² statistic for heterogeneity, is itself imprecise and, being often strongly dependent on the inclusion or exclusion of small studies, is susceptible to the effects of publication bias. Also, the representation of differences between studies by a single variance is conceptually inadequate.

The numerical methods used in meta-analysis are therefore most reasonably based on the following sequence. A fixed effect method may be used initially, but it should be followed by an assessment of heterogeneity. The random effects method may then be useful in assessing the robustness of the initial conclusions to failure in the assumption of no heterogeneity. If the conclusions from each method agree, there is naturally greater confidence in them; if not, the fact that interpretation is problematic should be made explicit.
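The heterogeneity test and the random effects adjustment can be sketched in the same way, again with hypothetical log odds ratios and variances: Cochran's Q statistic is referred to a χ² distribution on k − 1 degrees of freedom, and the between study variance τ² is estimated by the method of DerSimonian and Laird47 before being added to each within study variance.

    # Heterogeneity test and DerSimonian-Laird random effects pooling
    # (hypothetical study estimates; illustrative only).
    from math import sqrt, exp

    log_or = [-0.55, -0.05, -0.80, 0.20, -0.30]   # log odds ratios from five studies
    var    = [0.05, 0.03, 0.12, 0.08, 0.04]       # their within study variances

    w = [1 / v for v in var]
    fixed = sum(wi * yi for wi, yi in zip(w, log_or)) / sum(w)

    # Cochran's Q: compare with chi-squared on k - 1 degrees of freedom
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_or))
    k = len(log_or)

    # DerSimonian-Laird estimate of the between study variance tau^2
    # (with these figures Q exceeds its degrees of freedom, so tau^2 > 0)
    tau2 = max(0.0, (q - (k - 1)) / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))

    # Random effects weights incorporate both sources of variation
    w_re = [1 / (v + tau2) for v in var]
    random_effects = sum(wi * yi for wi, yi in zip(w_re, log_or)) / sum(w_re)
    se_re = sqrt(1 / sum(w_re))

    print("Q =", round(q, 2), "on", k - 1, "df;  tau^2 =", round(tau2, 3))
    print("Fixed effect OR:", round(exp(fixed), 2),
          "  Random effects OR:", round(exp(random_effects), 2),
          "(95% CI", round(exp(random_effects - 1.96 * se_re), 2),
          "to", round(exp(random_effects + 1.96 * se_re), 2), ")")

When τ² is estimated as zero the random effects and fixed effect results coincide; with modest heterogeneity, as here, the random effects estimate is usually similar but carries a wider confidence interval.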

Conclusions
Meta-analysis is here to stay. Epidemiologists, statisticians, and clinicians should all be aware of the uses and limitations of the technique. A useful by-product of the growing use of this form of analysis has been the greater awareness of the need for consistency in the way clinical trials and epidemiological studies are presented, so that the results from these studies can be combined. This will undoubtedly have the effect of improving the quality of methodology, assessment, and presentation of studies, and the availability of study data for future meta-analysis.

Despite the potential problems and pitfalls we have outlined, meta-analysis should play a leading role in the review of scientific issues. This necessitates a fuller understanding of meta-analysis as a routine analytical tool, but also a wider appreciation of the issues involved.

We would like to thank Dr … of the University of Maryland for her advice and comments during the preparation of this manuscript.

References
1 Beecher HK. The powerful placebo. JAMA 1955; 159: 1602-6.
2 Glass GV. Primary, secondary and meta-analysis of research. Educ Res 1976; 5: 3-8.
3 Teagarden JR. Meta-analysis: whither narrative review? Pharmacotherapy 1989; 9: 274-84.
4 Cooper HM, Rosenthal R. Statistical versus traditional procedures for summarising research findings. Psychol Bull 1980; 87: 442-9.
5 Chalmers TC, Frank CS, Reitman D. Minimising the three stages of publication bias. JAMA 1990; 263: 1392-5.
6 Mulrow CD. The medical review article: state of the science. Ann Intern Med 1987; 106: 485-8.
7 Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomised controlled trial: survey of 71 "negative" trials. N Engl J Med 1978; 299: 690-4.
8 Bulpitt CJ. Meta-analysis. Lancet 1988; ii: 93-4.
9 Peto R. Why do we need systematic overviews of randomised trials? Stat Med 1987; 6: 233-40.
10 MacMahon S, Peto R, Cutler J, et al. Blood pressure, stroke, and coronary heart disease. Part 1, prolonged differences in blood pressure: prospective observational studies corrected for the regression dilution bias. Lancet 1990; i: 765-73.
11 Collins R, Peto R, MacMahon S, et al. Blood pressure, stroke and coronary heart disease. Part 2, short term reductions in blood pressure: overview of randomised trials in their epidemiological context. Lancet 1990; i: 827-38.
12 Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta-blockade during and after myocardial infarction: an overview of the randomised trials. N Engl J Med 1987; 316: 450-5.
13 Yusuf S, Collins R, Peto R, et al. Intravenous and intracoronary fibrinolytic therapy in acute myocardial infarction: overview of results on mortality, reinfarction and side effects from 33 randomised controlled trials. Eur Heart J 1985; 6: 556-85.
14 Havens PL, Wendelberger KJ, Hoffman GM, Lee MB, Chusid MJ. Corticosteroids as adjunctive therapy in bacterial meningitis. Am J Dis Child 1989; 143: 1051-5.
15 Collins R, Langman M. Treatment with histamine H2 antagonists in acute upper gastrointestinal haemorrhage. N Engl J Med 1985; 313: 660-6.
16 Wald NJ, Nanchahal K, Thompson SG, Cuckle HS. Does breathing other people's tobacco smoke cause lung cancer? BMJ 1986; 293: 1217-22.
17 Longnecker MP, Berlin JA, Orza MJ, Chalmers TC. A meta-analysis of alcohol consumption in relation to risk of breast cancer. JAMA 1988; 260: 652-6.
18 Spector TD, Hochberg MC. A meta-analysis of the association between the oral contraceptive pill and the development of rheumatoid arthritis. J Clin Epidemiol 1990; 43: 1221-30.
19 Wong O, Raabe GK. Critical review of cancer epidemiology in petroleum industry employees, with a quantitative meta-analysis by cancer site. Am J Ind Med 1989; 15: 283-310.
20 Dickersin K, Hewitt P, Mutch L, Chalmers I, Chalmers TC. Perusing the literature: comparison of MEDLINE searching with a perinatal clinical trials database. Controlled Clin Trials 1985; 6: 306-17.
21 Sacks H, Chalmers TC, Smith H. Randomised versus historical controls for clinical trials. Am J Med 1982; 72: 233-40.
22 Goldman L, Feinstein AR. Anticoagulants and myocardial infarction: the problems of pooling, drowning and floating. Ann Intern Med 1979; 90: 92-4.
23 Hedges LV. Commentary. Stat Med 1987; 6: 381-5.
24 Chalmers I, Hetherington J, Elbourne D, Keirse MJ, Enkin M. Materials and methods used in synthesising evidence to evaluate the effects of care during pregnancy and childbirth. In: Chalmers I, Enkin M, Keirse MJ, eds. Effective care in pregnancy and childbirth. Vol 1. Oxford: Oxford University Press, 1989: 39-65.
25 Chalmers TC, Berrier J, Sacks HS, Levin H, Reitman D, Nagalingham R. Meta-analysis of clinical trials as a scientific discipline: replicate variability and comparison of studies that agree and disagree. Stat Med 1987; 6: 733-44.
26 Chalmers TC, Smith H, Blackburn B, et al. A method for assessing the quality of a randomised control trial. Controlled Clin Trials 1981; 2: 31-49.
27 Prendiville W, Elbourne D, Chalmers I. The effects of routine oxytocic administration in the management of the third stage of labour: an overview of the evidence from controlled trials. Br J Obstet Gynaecol 1988; 95: 3-16.
28 Lichtenstein MJ, Mulrow CD, Elwood PC. Guidelines for reading case-control studies. J Chron Dis 1987; 40: 893-903.
29 Dickersin K. The existence of publication bias and risk factors for its occurrence. JAMA 1990; 263: 1385-9.
30 Begg CB, Berlin JA. Publication bias: a problem in interpreting medical data. J R Stat Soc A 1988; 151: 419-63.
31 Simes RJ. Publication bias: the case for an international registry of clinical trials. J Clin Oncol 1986; 4: 1529-41.
32 Dickersin K, Chan SS, Chalmers TC, Sacks HS, Smith H. Publication bias and clinical trials. Controlled Clin Trials 1987; 8: 343-53.
33 Angell M. Negative studies (Editorial). N Engl J Med 1989; 321: 464-6.
34 Rosenthal R. The "file drawer problem" and tolerance for null results. Psychol Bull 1979; 86: 638-41.
35 Vandenbroucke JP. Passive smoking and lung cancer: a publication bias? BMJ 1988; 296: 391-2.
36 Begg CB. A measure to aid the interpretation of published clinical trials. Stat Med 1985; 4: 1-9.
37 Editorial. Uniform requirements for manuscripts submitted to biomedical journals. Lancet 1979; i: 428-31.
38 Chalmers I, Adams M, Dickersin K, et al. A cohort study of summary reports of controlled trials. JAMA 1990; 263: 1401-5.
39 Berlin JA, Begg CB, Louis TA. An assessment of publication bias using a sample of published clinical trials. J Am Stat Assoc 1989; 84: 381-92.
40 Chalmers I, Hetherington J, Newdick M, et al. The Oxford Database of Perinatal Trials: developing a register of published reports of controlled trials. Controlled Clin Trials 1986; 7: 306-25.
41 Hubbard SM, Henney JE, DeVita VT. A computer database for information on cancer treatment. N Engl J Med 1987; 316: 315-8.
42 Armitage P, Berry G. Statistical methods in medical research. Oxford: Blackwell Scientific Publications, 1987: 194-5.
43 Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959; 22: 719-48.
44 Collins R, Yusuf S, Peto R. Overview of randomised trials of diuretics in pregnancy. BMJ 1985; 290: 17-23.
45 Greenland S, Salvan A. Bias in the one-step method for pooling study results. Stat Med 1990; 9: 247-52.
46 Berlin JA, Laird NM, Sacks HS, Chalmers TC. A comparison of statistical methods for combining event rates from clinical trials. Stat Med 1989; 8: 141-51.
47 DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials 1986; 7: 177-88.