TICS-1311; No. of Pages 7

Review

Publication and other reporting biases in cognitive sciences: detection, prevalence, and prevention

John P.A. Ioannidis (1,2,3,4), Marcus R. Munafò (5,6), Paolo Fusar-Poli (7), Brian A. Nosek (8), and Sean P. David (1)

(1) Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
(2) Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA 94305, USA
(3) Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA 94305, USA
(4) Meta-Research Innovation Center at Stanford (METRICS), Stanford, CA 94305, USA
(5) MRC Integrative Epidemiology Unit, UK Centre for Tobacco and Alcohol Studies, University of Bristol, Bristol, UK
(6) School of Experimental Psychology, University of Bristol, Bristol, UK
(7) Department of Psychosis Studies, Institute of Psychiatry, King's College London, London, UK
(8) Center for Open Science, and Department of Psychology, University of Virginia, VA, USA
Recent systematic reviews and empirical evaluations of the cognitive sciences literature suggest that publication and other reporting biases are prevalent across diverse domains of cognitive science. In this review, we summarize the various forms of publication and reporting biases and other questionable research practices, and overview the available methods for probing into their existence. We discuss the available empirical evidence for the presence of such biases across the neuroimaging, animal, other preclinical, psychological, clinical trials, and genetics literature in the cognitive sciences. We also highlight emerging solutions (from study design to data analyses and reporting) to prevent bias and improve fidelity in the field of cognitive science research.

Corresponding author: Ioannidis, J.P.A. ([email protected]).
Keywords: publication bias; reporting bias; cognitive sciences; neuroscience; bias.
1364-6613/$ – see front matter. © 2014 Published by Elsevier Ltd. http://dx.doi.org/10.1016/j.tics.2014.02.010
Trends in Cognitive Sciences xx (2014) 1–7

Introduction
The promise of science is that the generation and evaluation of evidence are done transparently to promote an efficient, self-correcting process. However, multiple biases may produce inefficiency in knowledge building. In this review, we discuss the importance of publication and other reporting biases, and present some potential correctives that may reduce bias without disrupting innovation. Consideration of publication and other reporting biases is particularly timely for the cognitive sciences because the field is expanding rapidly. As such, preventing or remedying these biases will have a substantial impact on the efficient development of a credible corpus of published research.

Definitions of biases and relevance for cognitive sciences
The terms 'publication bias' and 'selective reporting bias' refer to the differential choice to publish studies or report particular results, respectively, depending on the nature or directionality of the findings [1]. There are several forms of such biases in the literature [2], including: (i) study publication bias, where studies are less likely to be published when they reach statistically nonsignificant findings; (ii) selective outcome reporting bias, where multiple outcomes are evaluated in a study and outcomes found to be nonsignificant are less likely to be published than statistically significant ones; and (iii) selective analysis reporting bias, where certain data are analyzed using different analytical options and publication favors the more impressive, statistically significant variants of the results (Box 1).
Cognitive science uses a range of methods producing a multitude of measures and rich data sets. Given this quantity of data, the potential analyses available, and pressures to publish, the temptations to use arbitrary (instead of planned) statistical analyses and to mine data are substantial and may lead to questionable research practices. Even extreme cases of fraud in the cognitive sciences, such as reporting of experiments that never took place, falsifying results, and confabulating data, have recently been reported [3–5]. Overt fraud is probably rare, but evidence of reporting biases in multiple domains of the cognitive sciences (discussed below) raises concern about the prevalence of questionable practices that occur via motivated reasoning without any intent to be fraudulent.

Tests for publication and other reporting biases
Tests for single studies, specific topics, and wider disciplines
Explicit documentation of publication and other reporting biases requires availability of the protocols, data, and results of primary analyses from conducted studies, so that these can be compared against the published literature.
However, such materials are often not available. A few empirical studies have retrieved study protocols from authors or trial data from submissions to the US Food and Drug Administration (FDA) [6–9]. These studies have shown that deviations in the analysis plan between protocols and published papers are common [6], and that effect sizes of drug interventions are larger in the published literature than in the corresponding data from the same trials submitted to the FDA [7,8]. It is challenging, but more common, to detect these biases using only the published literature. Several tests have been developed for this purpose; some are informal, whereas others use statistical methods of variable rigor. These tests may be applied to single studies, multiple studies on specific questions, or wider disciplines.
Evaluating biases in each single study is attractive, but most difficult, because the data are usually limited, unless designs and analysis plans are registered a priori [9]. It is easier to evaluate bias across multiple studies performed on the same question. When tests of bias are applied to a wider scientific corpus, it is difficult to pinpoint which single studies in this corpus of evidence are affected more by bias. The tractable goal is to gain insight into the average bias in the field.

Box 1. Various publication and reporting biases
Study publication bias can distort meta-analyses and systematic reviews of a large number of published studies when authors are more likely to submit, and/or editors more likely to publish, studies with 'positive' (i.e., statistically significant) results, or to suppress 'negative' (i.e., nonsignificant) results.
Study publication bias is also called the 'file drawer problem' [88], whereby most studies with null effects are never submitted and are thereby left in the file drawer. However, study publication bias is probably also accentuated by peer reviewers and editors, even for 'negative' studies that are submitted by their authors for publication. Sometimes, studies with 'negative' results may be published, but with greater delay than studies with 'positive' results, causing the so-called 'time-lag bias' [89].
In the 'Proteus phenomenon', 'positive' results are published quickly, and there is then a window of opportunity for the publication also of 'negative' results contradicting the original observation, which may be tantalizing and interesting to editors and investigators. The net effect is rapidly alternating extreme results [17,90].
Selective outcome or analysis reporting biases can occur in many forms, including selective reporting of some of the analyzed outcomes in a study when others exist; selective reporting of a specific outcome; and incomplete reporting of an outcome [91]. Examples of selective analysis reporting include selective reporting of subgroup analyses [92], reporting of per-protocol instead of intention-to-treat analyses [93], and relegation of primary outcomes to secondary outcomes [94], so that nonsignificant results for the pre-specified primary outcomes are subordinated to nonprimary positive results [8].
As a result of all these selective reporting biases, the published literature is likely to have an excess of statistically significant results. Even when associations and effects are genuinely non-null, the published results are likely to overestimate their magnitude [95,96].

Small-study effects
Tests of small-study effects have been popular since the mid-1990s. They evaluate studies included in the same meta-analysis and assess whether effect sizes are related to study size. When small studies have larger effects than large studies, this may reflect publication or selective reporting biases, but alternative explanations exist (as reviewed elsewhere [10]). The sensitivity and specificity of these tests in real life are unknown, but simulation studies have evaluated their performance in different settings. Published recommendations suggest cautious use of such tests [10]. In brief, visual evaluations of inverted funnel plots without statistical testing are precarious [11]; some test variants have better type I and type II error properties than others [12,13]; and, for most meta-analyses, where there are a limited number of studies, the power of these tests is low [10].

Selection models
Selection model approaches evaluate whether the pattern of results accumulated from a number of studies suggests an underlying filtering process, such as the nonpublication of a given percentage of results that had not reached formal statistical significance [14–16]. These methods have been less widely applied than small-study effect tests, even though they may be more promising. One application of a selection model approach examined an entire discipline (Alzheimer's disease genetics) and found that selection forces may differ for first and/or discovery results versus subsequent replication and/or refutation results or late replication efforts [17]. Another method that probes for data 'fiddling' has been proposed [18] to identify selective reporting of extremely promising P values in a body of published results.

Excess significance
Excess significance testing evaluates whether the number of statistically significant results in a corpus of studies is too high, under some plausible assumptions about the magnitude of the true effect size [19]. The Ioannidis test [19] can be applied to meta-analyses of multiple studies and also to larger fields and disciplines where many meta-analyses are compiled. The number of expected studies with nominally statistically significant results is estimated by summing the calculated power of all the considered studies. Simulations suggest that the most appropriate assumption is the effect size of the largest study in each meta-analysis [20].
Francis has applied a variant of the excess significance test on multiple occasions to discrete psychological science studies in which multiple experiments have statistically significant results [21]. The probability is estimated that all experiments in the study would have had statistically significant results. The appropriateness of applying excess significance testing to single studies has been questioned. Obviously, this application is more charged, because specific studies are pinpointed as being subject to bias.

Other field-wide assessments
In some fields, such as neuroimaging, where identification of foci rather than effect sizes is the goal, one approach evaluates whether the number of claimed discovered foci is related to the sample size of the performed studies [22]. In the absence of bias, one would expect larger studies to have larger power and, thus, to detect more foci. Lack of such a relation or, even worse, an inverse relation, with fewer foci discovered by larger studies, offers indirect evidence of bias.
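The excess-significance logic described above can be sketched in a few lines: under an assumed true effect size, sum each study's estimated power to obtain the expected number of 'positive' studies, then ask whether the observed count is improbably high. The study sizes, counts, and assumed effect below are invented for illustration, and the normal-approximation power formula and binomial comparison are simplifications of the published method [19].

```python
# Sketch of an excess-significance check: expected significant studies =
# sum of per-study power under an assumed true effect; compare with the
# observed count. All numbers here are invented for illustration.
from scipy.stats import norm, binomtest

def power_two_group(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-group comparison of means
    (normal approximation) for standardized effect size d."""
    se = (2.0 / n_per_group) ** 0.5        # SE of the standardized mean difference
    z_crit = norm.ppf(1 - alpha / 2)
    z_effect = d / se
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

def excess_significance(ns_per_group, n_significant, assumed_d):
    powers = [power_two_group(assumed_d, n) for n in ns_per_group]
    expected = sum(powers)
    # With equal study sizes the per-study power is constant, so a binomial
    # test gives the probability of seeing at least this many 'positives'.
    p = binomtest(n_significant, len(powers), powers[0],
                  alternative='greater').pvalue
    return expected, p

# 20 studies with 25 participants per group, 15 nominally significant,
# assuming the true standardized effect is a modest d = 0.2.
expected, p = excess_significance([25] * 20, 15, 0.2)
print(f"expected significant: {expected:.1f} of 20; P of >=15 observed: {p:.2g}")
```

With these made-up inputs, only about two significant results would be expected, so fifteen observed 'positives' would signal strong excess significance. The published test uses a chi-square statistic and accommodates unequal powers across studies; this sketch assumes equal study sizes for simplicity.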
Methods for correction of bias
Both small-study effects approaches and selection models allow extensions that correct for the potential presence of bias. For example, one can estimate the extrapolated effect size for a study with a theoretically infinite sample size, which is immune to small-study bias [23]. For selection models, one may impute missing nonsignificant studies in meta-analyses to estimate corrected effect sizes [16]. Another popular approach is the trim-and-fill method, and variants thereof, for imputing missing studies [24]. Other methods try to model explicitly the impact of multiple biases on the data [25,26]. Although these methods are a focus of active research, and some are even frequently used, their performance is largely unknown. They may generate a false sense of security that we can remedy distorted and/or selectively reported data. More solid approaches for correction of bias would require access to raw data, protocols, analysis codes, and registries of unpublished information, but these are usually not available.

Empirical evidence for publication and other reporting biases in cognitive sciences
Neuroimaging
With over 3000 publications annually in the past decade, including several meta-analyses [27] (Figure 1), the identification of potential publication or reporting biases is of crucial interest for the future of neuroimaging. The first study to evaluate evidence for an excess of statistically significant results in neuroimaging focused on brain volumetric studies based on region-of-interest (ROI) analyses of psychiatric conditions [28]. The study analyzed 41 meta-analyses (461 data sets) and concluded that there were too many studies with statistically significant results, probably due to selective outcome or analysis reporting [28]. Another evaluation assessed studies using voxel-based morphometry (VBM) methods, which are less subject to selective reporting of selected ROIs [29]. Still, in a data set of 47 whole-brain meta-analyses including 324 individual VBM studies, the number of foci reported in small VBM studies, and even in meta-analyses with few studies, was often inflated, consistent with reporting biases [30]. Interestingly, this and another investigation suggested that studies with fewer coauthors tend to report larger brain abnormalities [31]. Similar biases were also detected in the use of functional MRI (fMRI) in an evaluation of 94 whole-brain fMRI meta-analyses with 1788 unique data sets of psychiatric or neurological conditions and tasks [22]. Reporting biases seemed to affect small fMRI studies, which may be analyzed and reported in ways that generate a larger number of foci (many foci, especially in smaller studies, may represent false positives) [22].

Figure 1. Brain-imaging studies over the past 50 years. Number of brain-imaging studies published in PubMed (search term 'neuroimaging', updated up to 2012) over the past 50 years. [The plot shows annual publication counts, on a y-axis from 0 to 10 000, for the years 1962 through 2012.]

Animal studies and other preclinical studies
The poor reproducibility of preclinical studies, including animal studies, has recently been highlighted by several failures to replicate key findings [32–34]. Irreproducible results indicate that various biases may be operating in this literature.
In some fields of in vitro and cell-based preclinical research, direct replication attempts are uncommon and, thus, bias is difficult to assess empirically. Conversely, there are often multiple animal model studies performed on similar questions. Empirical evaluations have shown that small studies consistently give more favorable results than larger studies [35] and that study quality is inversely related to effect size [36–39]. A large-scale meta-epidemiological evaluation [37,40] used data from the Collaborative Approach to Meta-Analysis and Review of Animal Data in Experimental Studies consortium (CAMARADES; http://www.camarades.info/). It assessed 4445 data sets synthesized in 160 meta-analyses of neurological disorders, and found clear evidence of an excess of significance. This observation suggests strong biases, with selective analysis and outcome reporting biases being plausible explanations.

Psychological science
Over 50 years ago, Sterling [41] offered evidence that most published psychological studies reported positive results. This trend has been persistent and perhaps even increasing over time [42], particularly in the USA [43]. Evidence suggests that perceived bias against negative results in peer review [44], and a desire for aesthetically pleasing positive effects [45], lead authors to be less likely even to attempt to publish negative studies [44]. Instead, many psychological scientists report taking advantage of selective reporting and flexibility in analysis to make their research results more publishable [46,47]. Such practices may contribute to unusually elevated likelihoods of positive results just below the nominal 0.05 threshold for significance ('P-hacking') [48,49].

Clinical trials
Publication and other reporting biases in the clinical trials literature have often been ascribed to financial vested interests of manufacturer sponsors [50]. For example, in a review [51] of all randomized controlled trials of nicotine replacement therapy (NRT) for smoking cessation, more industry-supported trials (51%) than nonindustry trials (22%) reported statistically significant results; this difference was unexplained by trial characteristics.
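One way to probe the 'P-hacking' pattern described above, an excess of reported P values just below the 0.05 threshold, is a caliper-style comparison of counts just below versus just above the threshold. This is a rough sketch rather than the published procedures of [48,49], and the P values are invented:

```python
# Caliper-style check: under smooth reporting, a P value falling within a
# narrow window around 0.05 should be about equally likely to land on
# either side; a pile-up just below 0.05 is a warning sign.
from scipy.stats import binomtest

def caliper_check(p_values, threshold=0.05, width=0.01):
    below = sum(threshold - width <= p < threshold for p in p_values)
    above = sum(threshold < p <= threshold + width for p in p_values)
    # One-sided binomial test of 'more just below than just above'.
    pval = binomtest(below, below + above, 0.5, alternative='greater').pvalue
    return below, above, pval

# Invented collection of reported P values from a hypothetical literature.
reported = [0.003, 0.041, 0.044, 0.046, 0.047, 0.048, 0.049, 0.049,
            0.052, 0.058, 0.12, 0.21, 0.62]
below, above, pval = caliper_check(reported)
print(f"{below} just below vs {above} just above 0.05 (one-sided P = {pval:.3f})")
```

With counts this small the check has little power, which is why field-wide applications pool many results. Note also that this kind of check cannot distinguish P-hacking from selective publication: both produce the same asymmetry around the threshold.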
Moreover, industry-supported trials indicated a larger effect of NRT [summary odds ratio 1.90, 95% confidence interval (CI) 1.67–2.16] than nonindustry trials (summary odds ratio 1.61, 95% CI 1.43–1.80). Evidence of excess significance has also been documented in trials of neuroleptics [19]. Comparisons of published results against FDA records show that, although almost half of the trials of antidepressants for depression have negative results in the FDA records, these negative results either remain unpublished or are published with distorted reporting that presents them as positive [8]; thus, the published literature shows larger estimates of treatment effects for antidepressants than do the FDA data. A similar pattern has also been recorded for trials of antipsychotics [7].
Financial conflicts are not the only explanation for such biases. For example, problems similar to those with antidepressants also seem to exist in the literature of randomized trials of psychotherapies [52,53]. Allegiance bias towards favoring the specific cognitive, behavioral, or social theories that underlie these interventions may be responsible.

Genetics
Over the past couple of decades, candidate gene studies have exemplified many of the problems of poor reproducibility; few of the early studies in this literature could be replicated reliably [54,55]. This problem has been even worse in the literature on the genetics of cognitive traits and personality, where phenotype definitions are flexible [56]. The development of genome-wide association methods necessitated greater statistical stringency (given the number of statistical tests conducted) and larger sample sizes. This has transformed the reliability of genetic research, with fewer but more robust genetic signals now being identified [57]. The successful transformation has required international collaboration, the combination of data in large consortia, and the adoption of rigorous replication practices in meta-analysis within consortia [58,59].

Approaches to prevent bias
Most scientists embrace fundamental scientific values, such as disinterestedness and transparency, even while believing that other scientists do not [60]. However, noble intentions may not be sufficient to decrease biases when people are not able to recognize or control their own biases [61,62], rationalize biases through motivated reasoning [63], and are embedded in a culture that implicitly rewards the expression of biases for personal career advancement (i.e., publication over accuracy). Actual solutions should recognize the limits of human capacity to act on intentions, and nudge the incentives that motivate and constrain behavior. Figure 2 illustrates common practices and possible solutions across the workflow for addressing multiple biases, which are discussed below.

Figure 2. Common practices and possible solutions across the workflow for addressing publication biases. [The figure maps the workflow stages (design study, conduct study, analyze data, report results) to problems (underpowered designs; many measures and no analysis constraints; no reporting standards; data peeking; presenting exploratory analysis as confirmatory; selective reporting; no reporting; not publishing null results; not publishing replications; materials unavailable for replication; data unavailable for reanalysis) and to solutions (register study; specify analysis plan; data collection start and stop rules; distinguish confirmatory and exploratory analysis; disclose all measures, conditions, and outcomes of the pre-specified analysis plan; open materials; open data; replication; registered reports; require disclosure; reward open practices; post-publication review).] Red problems and green solutions are mostly controllable by researchers; purple problems and blue solutions are mostly controllable by journal editors. Funding agencies may also be major players in shaping incentives and reward systems.

Study registration
Study registration has been used primarily for clinical trials. Most leading journals now require protocol registration for clinical trials at inception as a condition of publication. Using public registries such as http://clinicaltrials.gov/, one can discover studies that were conducted and never reported. However, still only a minority of clinical trials are registered, and registration is uncommon for other types of design for which it may also be useful [64–66].

Pre-specification
Pre-specification of the project design and analysis plan allows pre-specified analyses to be separated from post hoc exploratory analyses. Currently, registered studies do not necessarily include such detailed pre-specification. Exploratory analyses can be useful, but they are tentative. Pre-specified analyses can be interpreted with greater confidence [67]. Researchers can register and pre-specify analyses through services such as the Open Science Framework (OSF; http://osf.io/).

Data, protocol, code, and analyses availability
For materials that can be shared digitally, transparency of the materials, data, and research process is now straightforward. There are hundreds of data repositories available for data storage: some are generic (e.g., http://thedata.org/), others are specific to particular types of data (e.g., http://openfmri.org/), and others offer additional services for collaboration and project management (e.g., OSF and http://figshare.com/). There are also an increasing number of options for registering protocols (e.g., PROSPERO for systematic reviews, Trials, and F1000 journals).

Replication
Another core value of science is reproducibility [68]. Present incentives for innovation vastly outweigh those for replication, making replications rare in many fields [69,70]. Journals are altering those incentives with a 'registered reports' publishing option, which avoids bias on the part of authors or reviewers who might otherwise be swayed by statistical significance or the 'information gain' [71] arising from the results of a study [72]. Authors propose replicating an existing result, providing a justification of its importance and a description of the protocol and analysis plan.
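The distortion that registration and registered reports are meant to prevent can be illustrated with a toy Monte Carlo simulation (all numbers invented): when only statistically significant estimates reach the literature, the pooled published effect overshoots the true effect, mirroring the published-versus-FDA discrepancies described earlier.

```python
# Toy simulation of significance-filtered publication. Every study estimates
# the same true effect; 'publishing' only the nominally significant studies
# inflates the apparent effect. All parameter values are invented.
import random
from statistics import mean

random.seed(1)                      # reproducible illustration
TRUE_EFFECT = 0.2                   # true standardized effect (invented)
SE = 0.15                           # per-study standard error (invented)
Z_CRIT = 1.96                       # two-sided 0.05 significance threshold

estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(2000)]
published = [d for d in estimates if abs(d) / SE > Z_CRIT]

print(f"mean of all studies:        {mean(estimates):.2f}")
print(f"mean of 'published' subset: {mean(published):.2f} "
      f"({len(published)} of {len(estimates)} significant)")
```

In this configuration the mean of all studies sits near the true value of 0.2, whereas the significance-filtered subset averages well above it; registration makes the unfiltered set discoverable.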
Peer review and conditional acceptance before data collection lower the barrier for authors to pursue replication, and increase confidence in the results of the replication via peer review and pre-specification. Multiple journals in psychology and neuroscience have adopted this format [73–75] (http://www.psychologicalscience.org/index.php/replication), and funders such as the NIH are considering support for replication research [76].

Collaborative consortia
Incentives for replication research are enhanced further through crowdsourcing and collaborative consortia that increase the feasibility and impact of replication. The Reproducibility Project is an open collaboration of more than 180 researchers aiming to estimate the reproducibility of psychological science. Replication teams each select a study from a sample of articles published in three psychology journals in 2008. All teams follow a protocol that provides quality assurance and adherence to standards. The effort distributes the workload among many contributors and aggregates the replications for maximal value [77,78]. The Many Labs Project implemented a single study session including 13 experimental effects in psychology [79]. The same protocol (with translation) was administered in 36 laboratory and web samples around the world (total n = 6344).
Commentators also note untapped opportunities for replication research in the context of advanced undergraduate and graduate pedagogy [80,81]. The Collaborative Replications and Education Project (CREP; https://osf.io/fLAuE/) identified ten published studies from 2010 that were amenable to integration and execution as course-based research projects. Project coordinators provided instructional aids and defined standards for documenting and aggregating the results of replications across institutions.
Some fields, such as genetics, have been transformed by the widespread adoption of replication practices as a sine qua non in the setting of large-scale collaborative consortia in which tens or hundreds of teams participate. Such consortia help ensure maximal standardization or harmonization of the analysis plan across the participating teams.

Pre-publication review practices
Authors follow standards and checklists for formatting manuscripts for publication, but there are few standards for the content of what is reported in manuscripts in basic science. In January 2014, Psychological Science introduced a four-item checklist of disclosure standards for submission: rationale for sample size, outcome variables, experimental conditions, and use of covariates in analysis. Enriched checklists and reporting standards, such as CONSORT and ARRIVE, are in use in clinical trials and animal research, respectively (http://www.consort-statement.org/; http://www.nc3rs.org.uk/page.asp?id=1357). In addition, specialized reporting standards have been introduced for some methods, such as fMRI [82]. Such practices may improve the consistency and clarity of reporting.

Publication practices
Beyond disclosure requirements and replication reports, journals can encourage or require practices that shift incentives and decrease bias. For example, in 2014, eight charter journals in psychology and neuroscience adopted badges to acknowledge articles that met criteria for Open Data, Open Materials, and Preregistration (http://osf.io/TVyXZ). Furthermore, journals can make explicit that replications are welcome, not discouraged [83], particularly of research published previously in that journal. Finally, journals such as Psychological Science are changing to allow fuller description of methods [84]. Full description may facilitate the adoption of standards in contexts that allow substantial flexibility. For example, a study of 241 studies using fMRI found 223 unique analysis pipelines, with little justification for this variation [85]. Furthermore, activation patterns were strongly dependent on the analysis pipeline applied.

Post-publication review practices
Retractions are a rare event [86], as are full re-analyses of published results. Increasing the opportunities for, and exposure to, post-publication review commentary and re-analysis may be useful. Post-publication comments can now be made in PubMed Commons and have long been possible in the BMJ, Frontiers, PLoS, and other journal groups.

Incentive structures
The primary currency of academic science is publication. Incentives need to be realigned so that emphasis is given not to the quantity of publications, but to the quality and impact of the work, reproducibility track record, and adherence to a sharing culture (for data, protocols, analyses, and materials) [61,87].

Concluding remarks
Overall, publication and other selective reporting biases are probably prevalent and influential in diverse cognitive sciences. Different approaches have been proposed to remedy these biases. Box 2 summarizes some outstanding questions in this process.

Box 2. Outstanding questions
There are several outstanding questions about how to minimize the impact of biases in the literature. These include: who should take the lead in changing various practices (journals, scientists, editors, funding agencies, regulators, patients, the public, or combinations of the above); how to ensure that more stringent requirements to protect from bias do not stifle creativity by increasing the burden of documentation; the extent to which new measures to curtail bias may force investigators, authors, and journals to adapt their practices in a merely normative fashion, satisfying new checklists of requirements without really changing the essence of their research practices; how to avoid excessive, unnecessary replication happening at the expense of not investing in new research directions; how to optimize the formation and efficiency of collaborative consortia of scientists; and how to tackle potential caveats or collateral problems that may ensue while trying to maximize public access to raw data (e.g., in theory, analyses may be performed by ambitious re-analysts with the explicit purpose of showing that they get a different, and thus more easily publishable, result).
TICS-1311; No. of Pages 7
Review Trends in Cognitive Sciences xxx xxxx, Vol. xxx, No. x
There are multiple entry points for each of diverse ‘problems’, from designing to conducting studies, analyzing data, and reporting results. ‘Solutions’ are existing, but uncommon, practices that should improve the fidelity of scientific inference in the cognitive sciences. We have limited evidence on the effectiveness of these approaches. Moreover, improvements may require interventions at multiple levels. Such interventions should aim at improving the efficiency and reproducibility of science without stifling creativity and the drive for discovery.

Acknowledgments
S.P.D. acknowledges funding from the National Institute on Drug Abuse US Public Health Service grant DA017441 and a research stipend from the Institute of Medicine (IOM) (Puffer/American Board of Family Medicine/IOM Anniversary Fellowship). M.R.M. acknowledges support for his time from the UK Centre for Tobacco Control Studies (a UKCRC Public Health Research Centre of Excellence) and funding from the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Medical Research Council, and the National Institute for Health Research under the auspices of the UK Clinical Research Collaboration. P.F-P. acknowledges funding from the EU-GEI study (project of the European network of national schizophrenia networks studying Gene-Environment Interactions). B.A.N. acknowledges funding from the Fetzer Franklin Fund, the Laura and John Arnold Foundation, the Alfred P. Sloan Foundation, and the John Templeton Foundation.

References
1 Green, S. et al. (2011) Cochrane Handbook for Systematic Reviews of Interventions, version 5.1.0, The Cochrane Collaboration
2 Dwan, K. et al. (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3, e3081
3 Vogel, G. (2011) Scientific misconduct. Psychologist accused of fraud on ‘astonishing scale’. Science 334, 579
4 Vogel, G. (2008) Scientific misconduct. Fraud charges cast doubt on claims of DNA damage from cell phone fields. Science 321, 1144–1145
5 Vogel, G. (2006) Developmental biology. Fraud investigation clouds paper on early cell fate. Science 314, 1367–1369
6 Saquib, N. et al. (2013) Practices and impact of primary outcome adjustment in randomized controlled trials: meta-epidemiologic study. BMJ 347, f4313
7 Turner, E.H. et al. (2012) Publication bias in antipsychotic trials: an analysis of efficacy comparing the published literature to the US Food and Drug Administration database. PLoS Med. 9, e1001189
8 Turner, E.H. et al. (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. N. Engl. J. Med. 358, 252–260
9 Dwan, K. et al. (2013) Systematic review of the empirical evidence of study publication bias and outcome reporting bias – an updated review. PLoS ONE 8, e66844
10 Sterne, J.A. et al. (2011) Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 343, d4002
11 Lau, J. et al. (2006) The case of the misleading funnel plot. BMJ 333, 597–600
12 Peters, J.L. et al. (2006) Comparison of two methods to detect publication bias in meta-analysis. JAMA 295, 676–680
13 Harbord, R.M. et al. (2006) A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat. Med. 25, 3443–3457
14 Copas, J.B. and Malley, P.F. (2008) A robust P-value for treatment effect in meta-analysis with publication bias. Stat. Med. 27, 4267–4278
15 Copas, J. and Jackson, D. (2004) A bound for publication bias based on the fraction of unpublished studies. Biometrics 60, 146–153
16 Hedges, L.V. and Vevea, J.L. (1996) Estimating effect size under publication bias: small sample properties and robustness of a random effects selection model. J. Educ. Behav. Stat. 21, 299–332
17 Pfeiffer, T. et al. (2011) Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias. PLoS ONE 6, e18362
18 Gadbury, G.L. and Allison, D.B. (2012) Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature. PLoS ONE 7, e46363
19 Ioannidis, J.P. and Trikalinos, T.A. (2007) An exploratory test for an excess of significant findings. Clin. Trials 4, 245–253
20 Ioannidis, J.P.A. (2014) Clarifications on the application and interpretation of the test for excess significance and its extensions. J. Math. Psychol. 57, 187
21 Francis, G. (2012) The psychology of replication and replication in psychology. Perspect. Psychol. Sci. 7, 585–594
22 David, S.P. et al. (2013) Potential reporting bias in FMRI studies of the brain. PLoS ONE 8, e70104
23 Rücker, G. et al. (2011) Detecting and adjusting for small-study effects in meta-analysis. Biom. J. 53, 351–368
24 Duval, S. and Tweedie, R. (2000) Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56, 455–463
25 Ioannidis, J.P. (2011) Adjusting for bias: a user’s guide to performing plastic surgery on meta-analyses of observational studies. Int. J. Epidemiol. 40, 777–779
26 Thompson, S. et al. (2011) A proposed method of bias adjustment for meta-analyses of published observational studies. Int. J. Epidemiol. 40, 765–777
27 Jennings, R.G. and Van Horn, J.D. (2012) Publication bias in neuroimaging research: implications for meta-analyses. Neuroinformatics 10, 67–80
28 Ioannidis, J.P. (2011) Excess significance bias in the literature on brain volume abnormalities. Arch. Gen. Psychiatry 68, 773–780
29 Borgwardt, S. et al. (2012) Why are psychiatric imaging methods clinically unreliable? Conclusions and practical guidelines for authors, editors and reviewers. Behav. Brain Funct. 8, 46
30 Fusar-Poli, P. et al. (2013) Evidence of reporting biases in voxel-based morphometry (VBM) studies of psychiatric and neurological disorders. Hum. Brain Mapp. http://dx.doi.org/10.1002/hbm.22384
31 Sayo, A. et al. (2012) Study factors influencing ventricular enlargement in schizophrenia: a 20 year follow-up meta-analysis. Neuroimage 59, 154–167
32 Begley, C.G. and Ellis, L.M. (2012) Drug development: raise standards for preclinical cancer research. Nature 483, 531–533
33 Prinz, F. et al. (2011) Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712
34 ter Riet, G. et al. (2012) Publication bias in laboratory animal research: a survey on magnitude, drivers, consequences and potential solutions. PLoS ONE 7, e43404
35 Sena, E.S. et al. (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 8, e1000344
36 Ioannidis, J.P. (2012) Extrapolating from animals to humans. Sci. Transl. Med. 4, 151ps15
37 Jonasson, Z. (2005) Meta-analysis of sex differences in rodent models of learning and memory: a review of behavioral and biological data. Neurosci. Biobehav. Rev. 28, 811–825
38 Macleod, M.R. et al. (2008) Evidence for the efficacy of NXY-059 in experimental focal cerebral ischaemia is confounded by study quality. Stroke 39, 2824–2829
39 Sena, E. et al. (2007) How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci. 30, 433–439
40 Button, K.S. et al. (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376
41 Sterling, T.D. (1959) Publication decisions and their possible effects on inferences drawn from tests of significance, or vice versa. J. Am. Stat. Assoc. 54, 30–34
42 Fanelli, D. (2012) Negative results are disappearing from most disciplines and countries. Scientometrics 90, 891–904
43 Fanelli, D. and Ioannidis, J.P. (2013) US studies may overestimate effect sizes in softer research. Proc. Natl. Acad. Sci. U.S.A. 110, 15031–15036
44 Greenwald, A.G. (1975) Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20
45 Giner-Sorolla, R. (2012) Science or art? How esthetic standards grease the way through the publication bottleneck but undermine science. Perspect. Psychol. Sci. 7, 562–571
46 John, L.K. et al. (2012) Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532
47 Simmons, J.P. et al. (2011) False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366
48 Masicampo, E.J. and Lalande, D.R. (2012) A peculiar prevalence of p values just below .05. Q. J. Exp. Psychol. (Colchester) 65, 2271–2279
49 Simonsohn, U. et al. (2014) P-curve: a key to the file-drawer. J. Exp. Psychol. Gen. (in press)
50 Stamatakis, E. et al. (2013) Undue industry influences that distort healthcare research, strategy, expenditure and practice: a review. Eur. J. Clin. Invest. 43, 469–475
51 Etter, J.F. et al. (2007) The impact of pharmaceutical company funding on results of randomized trials of nicotine replacement therapy for smoking cessation: a meta-analysis. Addiction 102, 815–822
52 Cuijpers, P. et al. (2013) Comparison of psychotherapies for adult depression to pill placebo control groups: a meta-analysis. Psychol. Med. 44, 685–695
53 Cuijpers, P. et al. (2010) Efficacy of cognitive-behavioural therapy and other psychological treatments for adult depression: meta-analytic study of publication bias. Br. J. Psychiatry 196, 173–178
54 Munafò, M.R. (2009) Reliability and replicability of genetic association studies. Addiction 104, 1439–1440
55 Ioannidis, J.P. et al. (2001) Replication validity of genetic association studies. Nat. Genet. 29, 306–309
56 Munafò, M.R. and Flint, J. (2011) Dissecting the genetic architecture of human personality. Trends Cogn. Sci. 15, 395–400
57 Hirschhorn, J.N. (2009) Genomewide association studies: illuminating biologic pathways. N. Engl. J. Med. 360, 1699–1701
58 Evangelou, E. and Ioannidis, J.P. (2013) Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389
59 Panagiotou, O.A. et al. (2013) The power of meta-analysis in genome-wide association studies. Annu. Rev. Genomics Hum. Genet. 14, 441–465
60 Anderson, M.S. et al. (2007) Normative dissonance in science: results from a national survey of U.S. scientists. J. Empir. Res. Hum. Res. Ethics 2, 3–14
61 Nosek, B.A. et al. (2012) Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7, 615–631
62 Nosek, B.A. et al. (2012) Implicit social cognition. In Handbook of Social Cognition (Fiske, S. and Macrae, C.N., eds), pp. 31–53, Sage
63 Kunda, Z. (1990) The case for motivated reasoning. Psychol. Bull. 108, 480–498
64 Ioannidis, J.P. (2012) The importance of potential studies that have not existed and registration of observational data sets. JAMA 308, 575–576
65 World Medical Association (2013) World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA 310, 2191–2194
66 Dal-Ré, R. et al. (2014) Making prospective registration of observational research a reality. Sci. Transl. Med. 6, 224cm1
67 Wagenmakers, E-J. et al. (2012) An agenda for purely confirmatory research. Perspect. Psychol. Sci. 7, 632–638
68 Schmidt, S. (2009) Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13, 90–100
69 Collins, H.M. (1985) Changing Order, Sage
70 Makel, M.C. et al. (2012) Replications in psychology research: how often do they really occur? Perspect. Psychol. Sci. 7, 537–542
71 Evangelou, E. et al. (2012) Perceived information gain from randomized trials correlates with publication in high-impact factor journals. J. Clin. Epidemiol. 65, 1274–1281
72 Emerson, G.B. et al. (2010) Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Arch. Intern. Med. 170, 1934–1939
73 Chambers, C.D. (2013) Registered reports: a new publishing initiative at Cortex. Cortex 49, 609–610
74 Nosek, B.A. and Lakens, D.E. (2013) Call for proposals: special issue of Social Psychology on ‘Replications of important results in social psychology’. Soc. Psychol. 44, 59–60
75 Wolfe, J. (2013) Registered reports and replications in attention, perception, and psychophysics. Atten. Percept. Psychophys. 75, 781–783
76 Wadman, M. (2013) NIH mulls rules for validating key results. Nature 500, 14–16
77 Open Science Collaboration (2012) An open, large-scale collaborative effort to estimate the reproducibility of psychological science. Perspect. Psychol. Sci. 7, 657–660
78 Open Science Collaboration (2014) The Reproducibility Project: a model of large-scale collaboration for empirical research on reproducibility. In Implementing Reproducible Computational Research (A Volume in the R Series) (Stodden, V. et al., eds), Taylor & Francis (in press)
79 Klein, R.A. and Ratliff, K.A. (2014) Investigating variation in replicability: a ‘Many Labs’ replication project. Registered design. Soc. Psychol. (in press)
80 Frank, M.C. and Saxe, R. (2012) Teaching replication. Perspect. Psychol. Sci. 7, 600–604
81 Grahe, J.E. et al. (2012) Harnessing the undiscovered resource of student research projects. Perspect. Psychol. Sci. http://dx.doi.org/10.1177/1745691612459057
82 Poldrack, R.A. et al. (2008) Guidelines for reporting an fMRI study. Neuroimage 40, 409–414
83 Neuliep, J.W. and Crandell, R. (1990) Editorial bias against replication research. J. Soc. Behav. Pers. 5, 85–90
84 Eich, E. (2014) Business not as usual. Psychol. Sci. http://dx.doi.org/10.1177/0956797613512465
85 Carp, J. (2012) The secret lives of experiments: methods reporting in the fMRI literature. Neuroimage 63, 289–300
86 Steen, R.G. (2010) Retractions in the scientific literature: do authors deliberately commit research fraud? J. Med. Ethics 37, 113–117
87 Koole, S.L. and Lakens, D. (2012) Rewarding replications: a sure and simple way to improve psychological science. Perspect. Psychol. Sci. 7, 608–614
88 Rosenthal, R. (1979) The file drawer problem and tolerance for null results. Psychol. Bull. 86, 638–641
89 Ioannidis, J.P. (1998) Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 279, 281–286
90 Ioannidis, J.P. and Trikalinos, T.A. (2005) Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J. Clin. Epidemiol. 58, 543–549
91 Kirkham, J.J. et al. (2010) The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340, c365
92 Hahn, S. et al. (2000) Assessing the potential for bias in meta-analysis due to selective reporting of subgroup analyses within studies. Stat. Med. 19, 3325–3336
93 Moreno, S.G. et al. (2009) Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. BMJ 339, b2981
94 Vedula, S.S. et al. (2009) Outcome reporting in industry-sponsored trials of gabapentin for off-label use. N. Engl. J. Med. 361, 1963–1971
95 Ioannidis, J.P. (2008) Why most discovered true associations are inflated. Epidemiology 19, 640–648
96 Young, N.S. et al. (2008) Why current publication practices may distort science. PLoS Med. 5, e201