
RESEARCH BMJ: first published as 10.1136/bmj.b2981 on 7 August 2009. Downloaded from

Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications

Santiago G Moreno, research student,1 Alex J Sutton, professor of medical statistics,1 Erick H Turner, assistant professor,2 Keith R Abrams, professor of medical statistics,1 Nicola J Cooper, senior research fellow,1 Tom M Palmer, research associate,3 A E Ades, professor of public health science4

1 Department of Health Sciences, University of Leicester, Leicester LE1 7RH
2 Department of Psychiatry, Oregon Health and Science University, Portland Veterans Affairs Medical Center, Portland, Oregon, USA
3 MRC Centre for Causal Analyses in Translational Epidemiology, Department of Social Medicine, University of Bristol
4 Department of Community Based Medicine, University of Bristol

Correspondence to: S G Moreno [email protected]

Cite this as: BMJ 2009;339:b2981 doi:10.1136/bmj.b2981

ABSTRACT

Objective To assess the performance of novel contour enhanced funnel plots and a regression based adjustment method to detect and adjust for publication biases.

Design Secondary analysis of a published systematic literature review.

Data sources Placebo controlled trials of antidepressants previously submitted to the US Food and Drug Administration (FDA) and matching journal publications.

Methods Publication biases were identified using novel contour enhanced funnel plots, a regression based adjustment method, Egger’s test, and the trim and fill method. Results were compared with a meta-analysis of the gold standard data submitted to the FDA.

Results Severe asymmetry was observed in the contour enhanced funnel plot that appeared to be heavily influenced by the statistical significance of results, suggesting publication biases as the cause of the asymmetry. Applying the regression based adjustment method to the journal data produced a similar pooled effect to that observed by a meta-analysis of the FDA data. Contrasting journal and FDA results suggested that, in addition to other deviations from study protocol, switching from an intention to treat analysis to a per protocol one would contribute to the observed discrepancies between the journal and FDA results.

Conclusion Novel contour enhanced funnel plots and a regression based adjustment method worked convincingly and might have an important part to play in combating publication biases.

INTRODUCTION

In 2008 Turner et al published a study in the New England Journal of Medicine showing that the scientific journal literature on antidepressants was biased towards “favourable” results.1 The authors compared the results in journal based reports of trials with data on the corresponding trials submitted to the US Food and Drug Administration (FDA) when applying for licensing. The discrepancies observed in the journal based reports were due to publication biases. Although the term “publication bias” has been used historically to refer to the suppression of whole studies based on (the lack of) statistical significance or interest level, a variety of mechanisms can distort the published literature. These include, in addition to the suppression of whole studies, selective reporting of outcomes or subgroups; “data massaging,” such as the selective exclusion of patients from the analysis; and biases regarding timelines.2 A good umbrella term for all these is dissemination biases3 4; in keeping with common usage we refer to them as publication biases. If such biases are present, any decision making based on the literature could be misleading,5 6 not least through obtaining inflated clinical effects from meta-analysis.7

The FDA dataset is assumed to be an unbiased (but not the complete) body of evidence in the specialty of antidepressants and so is regarded as a gold standard data source owing to the legal requirements of submitting evidence in its entirety to the FDA and its careful monitoring for deviations from protocol.8-10 A gold standard dataset will not, however, be available in most contexts. In the absence of a gold standard, meta-analysts have had to rely on analytical methods to both detect and adjust for publication biases. This has been an active area of methodology development over the past decades, with much written on approaches to deal with publication biases in a meta-analysis context.2 These include graphical diagnostic approaches and formal statistical tests to detect the presence of publication bias, and statistical approaches to modify effect sizes to adjust a meta-analysis estimate when the presence of publication bias is suspected.2 While the performance of many of these methods has been evaluated using simulation studies, concerns remain as to whether the simulations reflect real life situations and therefore whether their perceived performance is representative of what would happen if they were used in practice. Understandably this has led to caution in the use of the methods, particularly for those that adjust effect sizes for publication biases6; but ultimately this is what is required for rational decision making if publication biases exist.

BMJ | ONLINE FIRST | bmj.com page 1 of 7 RESEARCH

We consider what we believe are currently the best methods for identifying and adjusting for publication biases, both of which have been described only recently. Specifically, we consider a funnel plot (a scatter plot of effect estimates versus associated standard errors) enhanced by contours separating areas of statistical significance from non-significance.11 These contours help distinguish publication biases from other factors that lead to asymmetry in the funnel plot. The method used to adjust a meta-analysis for publication bias is based on a regression line fitted to the funnel plot.12 The adjusted effect size is obtained by extrapolating the regression line to predict the effect size that would be seen in a hypothetical study of infinite size, that is, one whose effect size has zero associated standard error. For comparison and completeness we consider established methods to deal with publication bias. These are the regression based Egger’s test for funnel asymmetry,13 and the trim and fill method,14 which adjusts a meta-analysis for publication bias by imputing studies to rectify any asymmetry in the funnel plot.

The dataset from Turner et al provides a unique opportunity to evaluate the performance of these analytical methods against a gold standard. We present the results of applying the diagnostic and adjustment methods to the journal published results and compare the findings with those obtained through (gold standard) analysis of the data submitted to the FDA.

METHODS

A full description of the dataset, how it was obtained, and the references to the trials associated with it have been published previously.1 Briefly, Turner et al identified the cohort of all phase II and phase III short term double blind placebo controlled trials used for the licensing of antidepressant drugs between 1987 and 2004 by the FDA. Seventy four trials registered with the FDA and involving 12 drugs and 12 564 patients were identified. To compare drug efficacy reported by the published literature with that of the FDA gold standard, Turner et al collected data on the primary outcome from both sources. Once the primary outcome data were extracted from the FDA trial registry, they searched the published scientific literature for publications matching the same trials. When a match was identified, they extracted data on the article’s apparent primary efficacy outcome. Because studies reported their outcomes on different scales, they expressed all effect sizes as standardised mean differences using Hedges’ g scores (accompanied by corresponding variances).15 Among the 74 studies registered with the FDA, 23 (31%), accounting for 3449 participants, were not published. Overall, larger effects were derived from the journal data than from the FDA data. Among the 38 studies with results viewed by the FDA as statistically significant, only one was unpublished. Conversely, inconclusive studies were, with three exceptions, either not published (22 studies) or published in conflict with the FDA findings (11 studies). Moreover, 94% of published studies reported a positive significant result for their primary outcome, compared with 51% according to the FDA. Data for the analysis were extracted from the previous paper (table C in the appendix),1 in which two studies were combined, making a total of 73 studies in our assessment.

Analysis

We applied two novel methods to the journal dataset: the contour enhanced funnel plot11 16 to detect publication biases, and a regression based adjustment method12 to adjust for them. For completeness and comparison we also applied to the dataset the most established and commonly used methods to deal with publication biases, namely Egger’s regression test13 for detecting bias, and the trim and fill adjustment method (fixed effects linear estimator).14 17-19 The trim and fill method is an iterative non-parametric technique that uses rank based data augmentation to adjust for publication bias by imputing studies estimated to be missing from the dataset. We use fixed effect models for the primary analysis in this paper; we also reanalysed the data using random effects models as a sensitivity analysis. Stata v.9.2 was used for all the analyses.

Contour enhanced funnel plots

In its simplest form a funnel plot is a scatter plot of study effect sizes (x axis) against their estimated standard errors (y axis).20 When no bias is present such a plot should be symmetrical, with increasing variability in effect sizes being observed in the less precise studies towards the bottom of the plot, producing a funnel shape. Asymmetry in this plot may indicate that publication biases are present through the lack of observed data points in a region of the plot.20 Asymmetry alone does not necessarily imply publication biases exist, however, since alternative explanations for the asymmetry may be present.21 For example, confounding factors (that is, any unmeasured variable associated with both study precision and effect size) may distort the appearance of the plot. It has been observed that certain aspects of trial quality may influence the estimates of effect size,22-25 and empirical evidence suggests that small studies are, on average, of lower quality and this could induce asymmetry on a funnel plot.26 Mechanisms such as this lead to what have been termed small study effects,21 26-28 and their presence will also make funnel plots asymmetrical.

With a view to disentangling genuine publication biases from other causes of funnel asymmetry, the funnel plot can be enhanced by including contours that partition it into areas of statistical significance and non-significance11 16 based on the standard Wald test, marking traditionally perceived milestones of significance (for example, the 1%, 5%, and 10% levels).29 In this way the level of statistical significance of every study’s effect estimate is identified. Since there is evidence that publication biases are related to these milestones,30 31 this can aid interpretation of the funnel plot: if studies seem to be missing in areas of statistical non-significance, then this adds credence to the notion that the asymmetry is due to publication biases. In such cases an attempt should be made to adjust for such biases (in the absence of being able to obtain gold standard data unaffected by publication biases, such as data from regulatory authorities like the FDA). Conversely, if the parts of the funnel where studies are perceived to be missing are found in areas of higher statistical significance, the cause of asymmetry is more likely to be due to factors other than publication biases.

Regression based adjustment

The regression based adjustment method fits a regression line of best fit to the data presented on a funnel plot.32 An adjusted pooled estimate of effect is obtained by predicting, from the regression line, the pooled effect size for an ideal study of infinite size (hence with zero standard error), which would be located at the top of a funnel plot, since it is hypothesised that there would be no bias in studies of that size. This idea has been discussed in the literature33-35 (and additionally, such metaregressions are commonly used to test for the presence of publication bias13), but only recently has the notion been formally evaluated.12 In that evaluation the performance of several different regression models was considered over an extensive range of meta-analytical and publication bias scenarios. The best models were shown to consistently outperform the established trim and fill method. One of these, the quadratic version of the original Egger’s regression test,13 is implemented here. This assumes a linear trend between the effect size and its variance (rather than its standard error, as assumed in the original Egger’s test). Other models considered in the simulation study were designed for binary outcomes exclusively and are not considered here.

RESULTS

Figure 1A displays a contour enhanced funnel plot of the studies submitted to the FDA, with the corresponding fixed effect meta-analysis pooled estimate providing a weighted average of effect sizes across trials (g score 0.31, 95% confidence interval 0.27 to 0.35). This funnel plot is reasonably symmetrical (Egger’s test P=0.10), which is consistent with the hypothesis that the FDA is an unbiased and appropriate gold standard data source.

The contour enhanced funnel plot for the journal data (fig 1B) is different and highly asymmetrical (Egger’s test P<0.001). A meta-analysis of these data results in a higher average effect size (g score 0.41, 0.37 to 0.45). Most of the study estimates now lie above (but many close to) the right contour line, indicating a statistically significant benefit at the 5% level, with few studies located below this 5% contour line (that is, not reaching significance at the 5% level). Crucially, the area where studies seem to be “missing” is contained within the area where non-significant studies would be located, inside the triangle defined by the P=0.10 contour boundaries. This adds further credence to the hypothesis that the observed asymmetry is caused by publication biases. Hence, even without the availability of the corresponding funnel plot for the FDA data (fig 1A), a contour enhanced funnel plot has convincingly identified publication biases as a major problem for the journal data.

For the journal dataset, the trim and fill method imputed a total of 18 “missing” studies (all in the region of non-statistical significance indicated by squares in figure 1C). This agrees reasonably well with the truth, as 23 studies identified through the FDA registry were not identified in the journal literature. The application of the trim and fill method reduced the average effect size to 0.35 (95% confidence interval 0.31 to 0.39), which is about halfway between the FDA and journal estimates (all three estimates are presented in figure 1C).

The fitted line corresponding to the regression based adjustment method is plotted in figure 1D (orange dashed line). The adjusted estimate is obtained by extrapolating the line to where the standard error is 0 (at the top of figure 1D). This produces an adjusted average effect size of 0.29 (95% confidence interval 0.23 to 0.35), which is close to the estimate produced by the meta-analysis of the FDA data (0.31, 0.27 to 0.35).

The situation is complicated by the fact that among the FDA non-significant studies that were published in medical journals, most were published as if they were significant. This is investigated in figure 2A by linking the effect sizes from each study where estimates were available from both data sources (69% (n=50) of all the trials), using arrows indicating the magnitude and direction of change between FDA and published effect sizes. The effect size differed between FDA and journal analyses in 62% (n=31) of the 50 trials by at least a g score of 0.01. Of these, the journal published effects were larger in 77% (n=24) of the studies (arrow pointing to right). As expected, a meta-analysis of these data produces a higher average effect size for the journal data (g score=0.41, 95% confidence interval 0.37 to 0.45) compared with the matched FDA data (0.37, 0.33 to 0.41). About eight studies in figure 2 achieve statistical significance at the 5% level when published in medical journals, contradicting their non-significant FDA submission, whereas no journal publication revokes statistical significance previously reported to the FDA. This suggests that reporting biases within published studies are directed towards the realisation of statistical significance. Similarly, 96% (n=21) of the 22 unpublished studies (in journals) were non-significant when submitted to the FDA (fig 2B), which again supports the hypothesis of the presence of publication biases. The fixed effect meta-analysis estimate for these 22 unpublished studies (0.15, 95% confidence interval 0.08 to 0.22) was far lower than the one for published studies (0.41, 0.37 to 0.45; fig 2B), adding further support that serious publication biases are present in the journal data.

A reanalysis of the data using random effects models produced similar results to the fixed effect (proportion of total variability explained by heterogeneity (I2) was 16% for the FDA data and 0% for the journal data).36 Details are available on request from the first author.
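As an illustration of the regression based adjustment described in the Methods, the following Python sketch fits a weighted least squares line of effect size on variance (weights equal to inverse variance) and reads off the intercept, which is the predicted effect where the standard error is 0. It is a minimal sketch under stated assumptions and not the Stata code used for the analyses: the function names and the four example studies are hypothetical, only point estimates are computed, and the handling of between study dispersion and confidence intervals in the evaluated models12 is omitted.

```python
import math

def fixed_effect_pooled(effects, ses):
    """Inverse-variance fixed effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def regression_adjusted_estimate(effects, ses):
    """Weighted least squares fit of effect size on variance (weights 1/variance).

    Returns (intercept, slope). The intercept is the effect predicted for a
    hypothetical study with zero standard error. Needs at least two studies
    with different standard errors.
    """
    x = [se ** 2 for se in ses]              # predictor: variance of each study
    w = [1.0 / v for v in x]                 # inverse-variance weights
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, effects))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data (not from the paper): smaller studies show larger effects,
# constructed so that effect = 0.3 + 2 * variance.
ses = [0.1, 0.2, 0.3, 0.4]
effects = [0.32, 0.38, 0.48, 0.62]

pooled, pooled_se = fixed_effect_pooled(effects, ses)
adjusted, slope = regression_adjusted_estimate(effects, ses)
print(f"unadjusted fixed effect estimate: {pooled:.3f}")
print(f"regression adjusted estimate:     {adjusted:.3f}")
```

In this constructed example the adjusted estimate falls below the unadjusted pooled estimate, which mirrors the direction of the adjustment reported for the journal data.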

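The contours described under “Contour enhanced funnel plots” can be sketched in a few lines of Python. Assuming a two sided Wald test with a normally distributed effect estimate, a study lies on the contour for significance level alpha when its absolute effect equals the corresponding normal quantile multiplied by its standard error. The function names below are hypothetical and the example values are illustrative only; the published plots were produced in Stata.16

```python
import math
from statistics import NormalDist

def two_sided_p(effect, se):
    """Two sided Wald test P value for an effect estimate and its standard error."""
    z = abs(effect) / se
    return math.erfc(z / math.sqrt(2))       # equals 2 * (1 - Phi(z))

def significance_band(effect, se):
    """Contour region of the funnel plot in which a study falls."""
    p = two_sided_p(effect, se)
    if p < 0.01:
        return "<1%"
    if p < 0.05:
        return "1-5%"
    if p < 0.10:
        return "5-10%"
    return ">10%"

def contour_boundary(se, level):
    """Absolute effect size on the contour for a given two sided level."""
    z = NormalDist().inv_cdf(1 - level / 2)
    return z * se

# Illustrative values only: four effects, each with a standard error of 0.1.
for effect in (0.41, 0.20, 0.18, 0.10):
    print(effect, significance_band(effect, 0.1))
```

Because the boundaries are straight lines through the point where both the effect and the standard error are 0, the non-significant region forms the triangle referred to in the Results.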

Fig 1 | Contour enhanced funnel plots (95% CI at top). (A) Studies submitted to Food and Drug Administration (FDA). (B) Studies published in journals. (C) Implementation of trim and fill method on journal data. (D) Implementation of regression adjustment model on journal data (adjusted effect at top where SE is 0). [Figure not reproduced. x axis: effect estimate; y axis: standard error; shaded contours mark significance levels <1%, 1-5%, 5-10%, and >10%.]

Fig 2 | Contour enhanced funnel plots displaying discrepancy between Food and Drug Administration (FDA) data and journal data. (A) Arrows joining effect results from same studies when both were available from FDA and journals. (B) Estimates of effect only available from FDA (not journal published studies). [Figure not reproduced. x axis: effect estimate; y axis: standard error.]

DISCUSSION

The application of two novel approaches to identify and adjust for publication biases in a dataset derived from a journal publication, where a gold standard dataset exists, produced encouraging results. Firstly, detection of publication biases was convincing using a contour enhanced funnel plot. Secondly, the regression based method produced a corrected average effect size, which was close to that obtained from the FDA dataset (and closer than that obtained by the trim and fill method).

This assessment does, however, have limitations. Firstly, the findings relate to a single dataset and thus are not necessarily generalisable to other examples. Specifically, all the trials were sponsored by the pharmaceutical industry and we make the assumption that the FDA data are completely unbiased. Furthermore, the methods under evaluation were designed primarily for the assessment of efficacy outcomes and they might not be appropriate for safety outcomes; for example, there may be incentives to suppress statistically significant safety outcomes (rather than non-significant ones). This is an area that requires more research.

Debate is ongoing about the usefulness of funnel plots and related tests for the identification of publication biases. Although their use is widely advocated2 37 some question their validity,27 38-41 including in this journal.42 We think the analysis presented here provides strong evidence that they do have a useful role. Recently there has been a lot of research into refining tests for funnel plot asymmetry,13 26 43-45 and while we support the formalisation of such an assessment, none of the tests (nor trim and fill or the regression adjustment method) considers the statistical significance of the available study estimates. For this reason we think the consideration of the contours on the funnel plot to be an essential component of distinguishing publication biases from other causes of funnel plot asymmetry.

We make no claim that the contours can distinguish between the different mechanisms for publication bias, for example whether it is missing whole studies, selectively reported outcomes, or “massaged” data that have led to the distorted funnel plot. (Because we have the FDA data, we do go on to disentangle this (fig 2) but generally this will not be possible.) But we do not think this is an important limitation because all these biases have the same effect in a meta-analysis: they are all assumed to be related to statistical significance and they all result in an exaggeration of the pooled effect. There is empirical evidence to support this notion for the effect of reporting biases within published clinical trials in general46-48 and for trials on antidepressants in particular.1 49 50 Potential mechanisms that are known to induce this include: (a) selectivity in which outcomes are reported or labelled as primary in journal publications; (b) post hoc searches for statistical significance using numerous hypothesis tests (that is, data dredging or fishing); and (c) selectivity in the analysis methods applied to the data for journal publication. Regarding the last point, the FDA makes its recommendations based on the intention to treat principle,51 52 whereas only half the journal publications are analysed and reported using this approach.53-56 The usual alternative, the per protocol approach to analysis, excludes dropouts and non-adherents (or patients with protocol deviations in general) and aims to estimate drug efficacy, which will tend to inflate effect sizes compared with the intention to treat approach, which estimates effectiveness.57-60 An estimate from a per protocol analysis will generally have less precision than for the associated intention to treat analyses owing to the removal of patients with protocol deviations,61 62 which would result in a shift downwards along the y axis of a funnel plot. This is consistent with what is observed in figure 2A, where most arrows are in a downward (as well as right moving) direction. How much such a mechanism commonly contributes to funnel plot asymmetry would be worthy of further investigation.

Few methods for specifically addressing outcome63 64 and subgroup reporting biases65 exist, and further development of analytical methods to specifically tackle aspects of reporting biases within studies is encouraged. Nevertheless, it is reassuring that the methods used in this article to address publication and related biases generally seem to work well in the presence of multiple types of publication biases. We no longer advocate the use of the trim and fill method because of problems identified through simulation studies.12 40 66 The regression adjustment method, which is easy to carry out,67 consistently outperformed the trim and fill method in an extensive simulation study12 (as well as within this particular dataset).

We consider technical issues relating to the influence of the choice of outcome metric on the robustness of the results, and the analysis methods used within the assessments. Firstly, the Hedges’ g score outcome metric was used throughout the analysis. This includes a correction for small sample size. An alternative metric, without the correction, is the Cohen’s d score, which could also have been used. However, this would have negligible influence on the funnel plots presented here since the correction is still modest even for the smallest trials (n=25). An additional consideration is that the contours on the funnels are constructed assuming normality of the effect size since they are based on the Wald test. We acknowledge that this may not be exactly the statistical test used in the original analyses for some of the trials. For example, for trials with small sample sizes, a t test may have been used. However, as the Wald and t test statistics converge as the sample size increases, this is only going to affect the assessment of the most imprecise trials at the bottom of the funnel, and all our findings are clearly robust to this.

The 73 randomised controlled trials considered here correspond to 12 different antidepressants. Despite this, there was little statistical heterogeneity in both datasets and so we carried out fixed effect analyses for simplicity (and findings are consistent if random effects are used). There is an ever present tension in meta-analysis between “lumping and splitting” studies, and an argument could be made for allowing for specific differences in drug treatment by stratifying them and carrying out 12 separate analyses. Challenges would arise if attempting to detect and adjust for publication biases in each of the analyses independently owing to the difficulty of interpreting funnel plots with small numbers of studies and the limited power of statistical methods.26 We agree with the suggestions of Shang et al,68 in their assessment of biases in the homoeopathy trial literature (which has some commonalities with the analysis presented here), that it is advantageous to “borrow strength” from a large number of trials and provide empirical information to assist reviewers and readers in the interpretation of findings from small meta-analyses that focus on a specific intervention. Furthermore, investigations of extensions of the existing statistical methods that would formalise such ideas for borrowing strength to produce stratum specific tests and estimates of bias are under way.

Given the apparent biases in the journal based literature for these placebo controlled trials on antidepressants, we are concerned about the validity of the findings of a recent high profile network meta-analysis69 of non-placebo controlled trials on antidepressants as no assessment of potential publication biases seemed to be carried out.70

Undoubtedly the best solution to publication biases is to prevent them from occurring in the first place.2


Using a gold standard data source, such as the FDA trial registry database, is one way of achieving this. However, this is still a long way off from becoming a reality for many analyses. Hence we often have to rely on analytical methods to deal with the problem, and we believe that the contour enhanced funnel plot and the regression based adjustment method provide important developments in the toolkit to combat publication biases.

WHAT IS ALREADY KNOWN ON THIS TOPIC

Publication biases exaggerate clinical effects resulting in potentially erroneous clinical decision making

While most of the attention has focused on the non-publication of whole studies, the problem of reporting biases within published studies is receiving increased attention

WHAT THIS STUDY ADDS

Mechanisms including suppression of whole studies, selective outcome reporting, and data “massaging” (for example, selective exclusion of patients from the analysis) may act simultaneously, but may be motivated by underlying statistical significance

Contour enhanced funnel plots and a regression based adjustment method to identify and adjust for multiple publication biases using real data where a gold standard exists showed promising results

Contributors: AJS conceived the project and led the research together with SGM. AJS and SGM carried out the statistical analyses and interpretation of the data. EHT, AEA, KRA, and NJC participated in data analysis and interpretation. TMP made a substantial contribution by designing and developing the plots. SGM and AJS drafted the paper, which was revised by all coauthors through substantial contributions to the contents of the paper. All authors approved the final version of the paper for publication. SGM is the guarantor.

Funding: SGM was supported by a Medical Research Council Health Services Research Collaboration studentship in the UK. AEA was funded by the Medical Research Council Health Services Research Collaboration. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, and writing and publishing the report. The corresponding author as well as the other authors had access to all the data and take responsibility for the integrity of the data and the accuracy of the data analysis.

Competing interests: None declared.

Ethical approval: Not required.

Data sharing: Data are available on request from the first author.

1 Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252-60.
2 Rothstein HR, Sutton AJ, Borenstein M. Publication bias in meta-analysis. Prevention, assessment and adjustments. Chichester: Wiley, 2005.
3 Bax L, Ikeda N, Fukui N, Yaju Y, Tsuruta H, Moons KGM. More than numbers: the power of graphs in meta-analysis. Am J Epidemiol 2009;169:249-55.
4 Song F, Eastwood AJ, Gilbody S, Duley L, Sutton AJ. Publication and related biases. Health Technol Assess 2000;4:1-115.
5 Egger M, Smith GD. Misleading meta-analysis [editorial]. BMJ 1995;310:752-4.
6 Sterne JAC, Egger M, Davey Smith G. Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101-5.
7 Egger M, Davey Smith G, Phillips AN. Meta-analysis: principles and procedures. BMJ 1997;315:1533-7.
8 Ioannidis JPA. Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials? Philos Ethics Humanit Med 2008;3:14.
9 Chan A-W. Bias, spin, and misreporting: time for full access to trial protocols and results. PLoS Med 2008;5:e230.
10 Turner EH. A taxpayer-funded clinical trials registry and results database. PLoS Med 2004;1:e60.
11 Peters J, Sutton AJ, Jones DR, Abrams KR, Rushton L. Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. J Clin Epidemiol 2008:991-6.
12 Moreno SG, Sutton AJ, Ades AE, Stanley TD, Abrams KR, Peters JL, et al. Assessment of regression-based methods to adjust for publication bias through a comprehensive simulation study. BMC Med Res Methodol 2009;9:2.
13 Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629-34.
14 Duval S, Tweedie RL. Trim and fill: a simple funnel plot based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000;56:455-63.
15 Hedges LV. Estimating effect size from a series of independent experiments. Psychol Bull 1982;92:490-9.
16 Palmer TM, Peters JL, Sutton AJ, Moreno SG. Contour-enhanced funnel plots for meta-analysis. Stata J 2008;8:242-54.
17 Sutton AJ, Duval SJ, Tweedie RL, Abrams KR, Jones DR. Empirical assessment of effect of publication bias on meta-analyses. BMJ 2000;320:1574-7.
18 Duval S, Tweedie RL. A nonparametric “trim and fill” method of accounting for publication bias in meta-analysis. J Am Stat Assoc 2000;95:89-98.
19 Steichen TJ. METATRIM: stata module to perform nonparametric analysis of publication bias. Stata Tech Bull 2000;61:8-14.
20 Sterne JAC, Egger M. Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001;54:1046-55.
21 McMahon B, Holly L, Harrington R, Roberts C, Green J. Do larger studies find smaller effects? The example of studies for the prevention of conduct disorder. Eur Child Adolesc Psychiatry 2008;17:432-7.
22 Egger M, Ebrahim S, Smith GD. Where now for meta-analysis? Int J Epidemiol 2002;31:1-5.
23 Sterne JAC, Jüni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Stat Med 2002;21:1513-24.
24 Pildal J, Hrobjartsson A, Jorgensen KJ, Hilden J, Altman DG, Gøtzsche PC. Impact of allocation concealment on conclusions drawn from meta-analyses of randomized trials. Int J Epidemiol 2007;36:847-57.
25 Wood L, Egger M, Gluud LL, Schulz KF, Juni P, Altman DG, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601-5.
26 Sterne JAC, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119-29.
27 Ioannidis JPA. Interpretation of tests of heterogeneity and bias in meta-analysis. J Eval Clin Pract 2008;14:951-7.
28 Jüni P, Nüesch E, Reichenbach S, Rutjes A, Scherrer M, Bürgi E, et al. Overestimation of treatment effects associated with small sample size in osteoarthritis research. (Abstracts of the 16th Cochrane Colloquium). German J Qual Health Care 2008;102:7-99.
29 Gerber AS, Malhotra N. Publication bias in empirical sociological research: do arbitrary significance levels distort published results? Sociol Methods Res 2008;37:3-30.
30 Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR. Publication bias in clinical research. Lancet 1991;337:867-72.
31 Ioannidis JPA. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 1998;279:281-6.
32 Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559-73.
33 Steichen TJ. METABIAS: tests for publication bias in meta-analysis. Stata Tech Bull 1998;7:125-33.
34 DuMouchel W, Normand SL. Computer-modeling and graphical strategies for meta-analysis. In: Stangl DK, Berry DA, eds. Meta-analysis in medicine and health policy. New York: Marcel Dekker, 2000:157.
35 Copas JB, Malley PF. A robust P-value for treatment effect in meta-analysis with publication bias. Stat Med 2008;27:4267-78.
36 Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327:557-60.
37 Sterne JAC, Egger M, Moher D, eds. Chapter 10: addressing reporting biases. In: Higgins JPT, Green S, ed. Cochrane handbook for systematic reviews of intervention. Version 5.0.0 (updated Feb 2008). Oxford: Cochrane Collaboration, 2008 (available from www.cochrane-handbook.org).
38 Ioannidis JPA, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ 2007;176:1091-6.
39 Terrin N, Schmid CH, Lau J. In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. J Clin Epidemiol 2005;58:894-901.
40 Terrin N, Schmid CH, Lau J, Olkin I. Adjusting for publication bias in the presence of heterogeneity. Stat Med 2003;22:2113-26.

41 Tang JL, Liu JL. Misleading funnel plot for detection of bias in meta-analysis. J Clin Epidemiol 2000;53:477-84.
42 Lau J, Ioannidis JPA, Terrin N, Schmid C, Olkin I. The case of the misleading funnel plot. BMJ 2006;333:597-600.
43 Harbord RM, Egger M, Sterne JAC. A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat Med 2006;25:3443-57.
44 Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of two methods to detect publication bias in meta-analysis. JAMA 2006;295:676-80.
45 Rücker G, Schwarzer G, Carpenter J. Arcsine test for publication bias in meta-analyses with binary outcomes. Stat Med 2008;27:746-63.
46 Chan A-W, Hróbjartsson A, Haahr MT, Gøtzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials. Comparison of protocols to published articles. JAMA 2004;291:2457-65.
47 Chan A-W, Krleza-Jeric K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ 2004;17:735-40.
48 Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan A, Cronin E, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 2008;3:e3081.
49 Hotopf M, Barbui C. Bias in the evaluation of antidepressants. Epidemiol Psichiatr Soc 2005;14:55-7.
50 Furukawa TA, Watanabe N, Omori IM, Montori VM, Guyatt GH. Association between unreported outcomes and effect size estimates in Cochrane meta-analyses. JAMA 2007;297:468-70.
51 Heritier SR, Gebski VJ, Keech AC. Inclusion of patients in clinical trial analysis: the intention-to-treat principle. Med J Aust 2003;179:438-40.
52 Lewis JA. Statistical principles for clinical trials (ICH E9): an introductory note on an international guideline. Stat Med 1999;18:1903-42.
53 Gravel J, Opatrny L, Shapiro S. The intention-to-treat approach in randomized controlled trials: are authors saying what they do and doing what they say? Clin Trials 2007;4:350-6.
54 Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B. Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003;326:1171-3.
55 Hotopf M, Lewis G, Normand C. Putting trials on trial—the costs and consequences of small trials in depression: a systematic review of methodology. J Epidemiol Community Health 1997;51:354-8.
56 Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670-4.
57 Gartlehner G, Hansen RA, Nissman D, Lohr KN, Carey TS. A simple and valid tool distinguished efficacy from effectiveness studies. J Clin Epidemiol 2006;59:1040-8.
58 Bollini P, Pampallona S, Tibaldi G, Kupelnick B, Munizza C. Effectiveness of antidepressants. Meta-analysis of dose-effect relationships in randomised clinical trials. Br J Psychiatry 1999;174:297-303.
59 Revicki DA, Frank L. Pharmacoeconomic evaluation in the real world. Effectiveness versus efficacy studies. Pharmacoeconomics 1999;15:423-34.
60 Schulz KF, Grimes DA. Sample size slippages in randomised trials: exclusions and the lost and wayward. Lancet 2002;359:781-5.
61 Porta N, Bonet C, Cobo E. Discordance between reported intention-to-treat and per protocol analyses. J Clin Epidemiol 2007;60:663-9.
62 Fergusson D, Aaron SD, Guyatt G, Hebert P. Post-randomisation exclusions: the intention to treat principle and excluding patients from analysis. BMJ 2002;325:652-4.
63 Williamson PR, Gamble C. Application and investigation of a bound for outcome reporting bias. Trials 2007;8:9.
64 Hutton JL, Williamson PR. Bias in meta-analysis due to outcome variable selection within studies. Appl Stat 2000;49:359-70.
65 Hahn S, Williamson PR, Hutton JL, Garner P, Flynn EV. Assessing the potential for bias in meta-analysis due to selective reporting of subgroup analyses within studies. Stat Med 2000;19:3325-36.
66 Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Performance of the trim and fill method in the presence of publication bias and between-study heterogeneity. Stat Med 2007;26:4544-62.
67 Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat Med 1999;18:2693-708.
68 Shang A, Huwiler-Müntener K, Nartey L, Jüni P, Dörig S, Sterne JAC, et al. Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo-controlled trials of homoeopathy and allopathy. Lancet 2005;366:726-32.
69 Cipriani A, Furukawa TA, Salanti G, Geddes JR, Higgins JPT, Churchill R, et al. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet 2009;373:746-58.
70 Turner EH, Moreno SG, Sutton AJ. Concerns about reported rank-order of antidepressant efficacy. Lancet 2009;373:1760.

Accepted: 10 May 2009
