
Review

Publication and other reporting biases in cognitive sciences: detection, prevalence, and prevention

John P.A. Ioannidis (1,2,3,4), Marcus R. Munafò (5,6), Paolo Fusar-Poli (7), Brian A. Nosek (8), and Sean P. David (1)

1 Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
2 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA 94305, USA
3 Department of Statistics, Stanford University School of Humanities and Sciences, Stanford, CA 94305, USA
4 Meta-Research Innovation Center at Stanford (METRICS), Stanford, CA 94305, USA
5 MRC Integrative Epidemiology Unit, UK Centre for Tobacco and Alcohol Studies, University of Bristol, Bristol, UK
6 School of Experimental Psychology, University of Bristol, Bristol, UK
7 Department of Psychosis Studies, Institute of Psychiatry, King's College London, London, UK
8 Center for Open Science, and Department of Psychology, University of Virginia, VA, USA

Recent systematic reviews and empirical evaluations of the cognitive sciences literature suggest that publication and other reporting biases are prevalent across diverse domains of cognitive science. In this review, we summarize the various forms of publication and reporting biases and other questionable research practices, and overview the available methods for probing into their existence. We discuss the available empirical evidence for the presence of such biases across the neuroimaging, animal, other preclinical, psychological, clinical trials, and genetics literature in the cognitive sciences. We also highlight emerging solutions (from study design to data analyses and reporting) to prevent bias and improve fidelity in the field of cognitive science research.

Corresponding author: Ioannidis, J.P.A. ([email protected]).
Keywords: publication bias; reporting bias; cognitive sciences; neuroscience; bias.

Introduction
The promise of science is that the generation and evaluation of evidence is done transparently to promote an efficient, self-correcting process. However, multiple biases may produce inefficiency in knowledge building. In this review, we discuss the importance of publication and other reporting biases, and present some potential correctives that may reduce bias without disrupting innovation. Consideration of publication and other reporting biases is particularly timely for the cognitive sciences because the field is expanding rapidly. As such, preventing or remedying these biases will have a substantial impact on the efficient development of a credible corpus of published research.

Definitions of biases and relevance for cognitive sciences
The terms 'publication bias' and 'selective reporting bias' refer to the differential choice to publish studies or report particular results, respectively, depending on the nature or directionality of findings [1]. There are several forms of such biases in the literature [2], including: (i) study publication bias, where studies are less likely to be published when they reach nonstatistically significant findings; (ii) selective outcome reporting bias, where multiple outcomes are evaluated in a study and outcomes found to be nonsignificant are less likely to be published than are statistically significant ones; and (iii) selective analysis reporting bias, where certain data are analyzed using different analytical options and publication favors the more impressive, statistically significant variants of the results (Box 1).

Cognitive science uses a range of methods producing a multitude of measures and rich data sets. Given this quantity of data, the potential analyses available, and pressures to publish, the temptations to use arbitrary (instead of planned) statistical analyses and to mine data are substantial and may lead to questionable research practices. Even extreme cases of fraud in the cognitive sciences, such as reporting of experiments that never took place, falsifying results, and confabulating data, have been reported recently [3–5]. Overt fraud is probably rare, but evidence of reporting biases in multiple domains of the cognitive sciences (discussed below) raises concern about the prevalence of questionable practices that occur without any intent to be fraudulent.

Box 1. Various publication and reporting biases
- Study publication bias can distort meta-analyses and systematic reviews of a large number of published studies when authors are more likely to submit, and/or editors more likely to publish, studies with 'positive' (i.e., statistically significant) results or suppress 'negative' (i.e., nonsignificant) results.
- Study publication bias is also called the 'file drawer problem' [88], whereby most studies with null effects are never submitted and are thereby left in the file drawer. However, study publication bias is probably also accentuated by peer reviewers and editors, even for 'negative' studies that are submitted by their authors for publication.
- Sometimes, studies with 'negative' results may be published, but with greater delay than studies with 'positive' results, causing so-called 'time-lag bias' [89].
- In the 'Proteus phenomenon', 'positive' results are published quickly and then there is a window of opportunity for the publication also of 'negative' results contradicting the original observation, which may be tantalizing and interesting to editors and investigators. The net effect is rapidly alternating extreme results [17,90].
- Selective outcome or analysis reporting biases can occur in many forms, including selective reporting of some of the analyzed outcomes in a study when others exist; selective reporting of a specific outcome; and incomplete reporting of an outcome [91].
- Examples of selective analysis reporting include selective reporting of subgroup analyses [92], reporting of per protocol instead of intention-to-treat analyses [93], and relegation of primary outcomes to secondary outcomes [94] so that nonsignificant results for the pre-specified primary outcomes are subordinated to nonprimary positive results [8].
- As a result of all these selective reporting biases, the published literature is likely to have an excess of statistically significant results. Even when associations and effects are genuinely non-null, the published results are likely to overestimate their magnitude [95,96].

Tests for publication and other reporting biases

Tests for single studies, specific topics, and wider disciplines
Explicit documentation of publication and other reporting biases requires availability of protocols, data, and results of primary analyses from conducted studies so that these can be compared against the published literature. However, these are not often available. A few empirical studies have retrieved study protocols from authors or trial data from submissions to the US Food and Drug Administration (FDA) [6–9]. These studies have shown that deviations in the analysis plan between protocols and published papers are common [6], and effect sizes of drug interventions are larger in the published literature compared with the corresponding data from the same trials submitted to the FDA [7,8]. It is challenging, but more common, to detect these biases using only the published literature. Several tests have been developed for this purpose; some are informal, whereas others use statistical methods of variable rigor. These tests may be applied to single studies, multiple studies on specific questions, or wider disciplines.

Evaluating biases in each single study is attractive, but most difficult, because the data are usually limited, unless designs and analysis plans are registered a priori [9]. It is easier to evaluate bias across multiple studies performed on the same question. When tests of bias are applied to a wider scientific corpus, it is difficult to pinpoint which single studies in this corpus of evidence are affected more by bias. The tractable goal is to gain insight into the average bias in the field.

Small-study effects
Tests of small-study effects have been popular since the mid-1990s. They evaluate studies included in the same meta-analysis and assess whether effect sizes are related to study size. When small studies have larger effects compared with large studies, this may reflect publication or selective reporting biases, but alternative explanations exist (as reviewed elsewhere [10]). The sensitivity and specificity of these tests in real life are unknown, but simulation studies have evaluated their performance in different settings. Published recommendations suggest cautious use of such tests [10]. In brief, visual evaluations of inverted funnel plots without statistical testing are precarious [11]; some test variants have better type I and type II error properties than others [12,13]; and, for most meta-analyses, where there are a limited number of studies, the power of these tests is low [10].
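To make the logic of these tests concrete, the sketch below runs an Egger-style regression (one widely used small-study-effects test, not necessarily the specific variants evaluated in the studies cited above) on hypothetical meta-analysis data; the effect sizes, standard errors, and interpretation threshold are assumptions for illustration only.

```python
# Minimal sketch of an Egger-style regression for small-study effects.
# All numbers are hypothetical; illustrative only, not the exact tests cited in the text.
import numpy as np
import statsmodels.api as sm

# Hypothetical meta-analysis: per-study effect estimates and standard errors
# (smaller studies have larger standard errors and, here, larger effects).
effects = np.array([0.62, 0.55, 0.48, 0.40, 0.22, 0.18, 0.15])
se      = np.array([0.30, 0.28, 0.25, 0.20, 0.10, 0.08, 0.07])

# Regress the standardized effect (effect / SE) on precision (1 / SE).
# Without small-study effects, the intercept should be close to zero.
z = effects / se
precision = 1.0 / se
fit = sm.OLS(z, sm.add_constant(precision)).fit()

print(f"Egger intercept = {fit.params[0]:.2f} (p = {fit.pvalues[0]:.3f})")
```

An intercept far from zero signals funnel-plot asymmetry, but, as noted above, such asymmetry can have explanations other than reporting bias and the test has low power with few studies.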

Selection models
Selection model approaches evaluate whether the pattern of results that have accumulated from a number of studies suggests an underlying filtering process, such as the nonpublication of a given percentage of results that had not reached formal statistical significance [14–16]. These methods have been less widely applied than small-study effect tests, even though they may be more promising. One application of a selection model approach examined an entire discipline (Alzheimer's disease genetics) and found that selection forces may be different for first and/or discovery results versus subsequent replication and/or refutation results or late replication efforts [17]. Another method that probes for data 'fiddling' has been proposed [18] to identify selective reporting of extremely promising P values in a body of published results.
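The selection process that these models try to invert can be illustrated with a small simulation (a conceptual sketch only, not the estimators of the cited selection models): studies are generated around a modest true effect, nonsignificant results are 'published' only with some probability, and the naive average over published studies overestimates the truth. The true effect, sample sizes, and publication probability are all assumed values.

```python
# Conceptual sketch of a significance-based publication filter (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(0)
true_effect, n_per_arm, n_studies = 0.15, 40, 2000
publish_prob_if_nonsig = 0.3          # assumed: 30% of nonsignificant studies appear

se = np.sqrt(2.0 / n_per_arm)                      # approx. SE of a standardized mean difference
observed = rng.normal(true_effect, se, n_studies)  # per-study effect estimates
significant = np.abs(observed / se) > 1.96         # two-sided test at alpha = 0.05

published = significant | (rng.random(n_studies) < publish_prob_if_nonsig)

print(f"true effect:              {true_effect:.2f}")
print(f"mean over all studies:    {observed.mean():.2f}")
print(f"mean over published only: {observed[published].mean():.2f}")  # inflated
```

Selection models work in the opposite direction: given the published results, they estimate how severe such a filter would have to be to produce the observed pattern.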

Excess significance
Excess significance testing evaluates whether the number of statistically significant results in a corpus of studies is too high, under some plausible assumptions about the magnitude of the true effects [19]. The Ioannidis test [19] can be applied to meta-analyses of multiple studies and also to larger fields and disciplines where many meta-analyses are compiled. The number of expected studies with nominally statistically significant results is estimated by summing the calculated power of all the considered studies. Simulations suggest that the most appropriate assumption is the effect size of the largest study in each meta-analysis [20].

Francis has applied a variant of the excess significance test on multiple occasions to discrete psychological science studies in which multiple experiments have statistically significant results [21]. The probability is estimated that all experiments in the study would have had statistically significant results. The appropriateness of applying excess significance testing to single studies has been questioned. Obviously, this application is more charged, because specific studies are pinpointed as being subject to bias.
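The following is a rough sketch of the excess-significance logic for a single hypothetical meta-analysis: the expected number of 'positive' studies is the sum of per-study power, computed under a plausible true effect (here taken, as suggested above, to be the effect of the largest study). The sample sizes, outcomes, assumed effect, and the crude binomial comparison are illustrative assumptions, not the published procedure.

```python
# Rough sketch of an excess-significance check (hypothetical data; normal approximation).
import numpy as np
from scipy import stats

n_per_arm   = np.array([12, 15, 18, 20, 25, 30, 45, 120])  # hypothetical group sizes
significant = np.array([1, 1, 1, 0, 1, 1, 1, 1])           # hypothetical outcomes (1 = p < 0.05)
plausible_effect = 0.25                                     # e.g., effect size of the largest study

# Power of a two-sample comparison at alpha = 0.05, normal approximation.
z_crit = stats.norm.ppf(0.975)
ncp = plausible_effect * np.sqrt(n_per_arm / 2.0)
power = stats.norm.sf(z_crit - ncp) + stats.norm.cdf(-z_crit - ncp)

observed, expected = significant.sum(), power.sum()
# Crude binomial comparison using the mean power (the published test is more careful).
p_excess = stats.binom.sf(observed - 1, n=len(power), p=power.mean())
print(f"observed = {observed}, expected = {expected:.1f}, P(excess) = {p_excess:.3f}")
```

When the observed count of significant results clearly exceeds what the summed power can support, selective reporting of some kind is a plausible explanation.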

Other field-wide assessments
In some fields, such as neuroimaging, where identification of foci rather than effect sizes is the goal, one approach evaluates whether the number of claimed discovered foci is related to the sample size of the performed studies [22]. In the absence of bias, one would expect larger studies to have larger power and, thus, detect more loci. Lack of such a relation or, even worse, an inverse relation with fewer foci discovered with larger studies, offers indirect evidence of bias.
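A minimal sketch of this kind of field-wide check, on hypothetical numbers: with unbiased reporting, the count of discovered foci should tend to increase with sample size, so a flat or negative rank correlation is a warning sign. The data below are invented for illustration.

```python
# Hypothetical illustration of relating reported foci counts to study size.
import numpy as np
from scipy import stats

sample_size   = np.array([12, 15, 18, 22, 30, 41, 55, 80, 120, 200])
reported_foci = np.array([14, 11, 16,  9,  8, 10,  7,  6,   9,   8])

rho, p_value = stats.spearmanr(sample_size, reported_foci)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A negative rho (more foci reported by smaller studies) is the pattern flagged as suspicious above.
```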

Methods for correction of bias
Both small-study effects approaches and selection models allow extensions to correct for the potential presence of bias. For example, one can estimate the extrapolated effect size for a study with theoretically infinite sample size that is immune from small-study bias [23]. For selection models, one may impute missing nonsignificant studies in meta-analyses to estimate corrected effect sizes [16]. Another popular approach is the trim-and-fill method, and variants thereof, for imputing missing studies [24]. Other methods try to model explicitly the impact of multiple biases on the data [25,26]. Although these methods are a focus of active research and some are even frequently used, their performance is largely unknown. They may generate a false sense of security that we can remedy distorted and/or selectively reported data. More solid approaches for correction of bias would require access to raw data, protocols, analysis codes, and registries of unpublished information, but these are usually not available.
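As a schematic of the extrapolation idea only (not the specific method of [23] or trim-and-fill [24]), one can regress observed effects on their standard errors and read the intercept as the predicted effect of a hypothetical study with vanishing standard error, i.e., infinite sample size. The data below are hypothetical.

```python
# Schematic extrapolation to a study with SE -> 0 (infinite sample size); illustrative only.
import numpy as np
import statsmodels.api as sm

effects = np.array([0.62, 0.50, 0.41, 0.35, 0.28, 0.21, 0.18])
se      = np.array([0.30, 0.26, 0.22, 0.18, 0.12, 0.09, 0.07])

# Model: effect = a + b * SE, weighted by inverse variance; the intercept a is the
# small-study-adjusted estimate under this (strong) assumed model.
fit = sm.WLS(effects, sm.add_constant(se), weights=1.0 / se**2).fit()

print(f"naive mean effect:             {effects.mean():.2f}")
print(f"extrapolated effect at SE = 0: {fit.params[0]:.2f}")
```

As the text cautions, such corrections rest on untestable assumptions about why the asymmetry arose and can create a false sense of security.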

Empirical evidence for publication and other reporting biases in cognitive sciences

Neuroimaging
With over 3000 publications annually in the past decade, including several meta-analyses [27] (Figure 1), the identification of potential publication or reporting biases is of crucial interest for the future of neuroimaging. The first study to evaluate evidence for an excess of statistically significant results in neuroimaging focused on brain volumetric studies based on region-of-interest (ROI) analyses of psychiatric conditions [28]. The study analyzed 41 meta-analyses (461 data sets) and concluded that there were too many studies with statistically significant results, probably due to selective outcome or analysis reporting [28]. Another evaluation assessed studies using voxel-based morphometry (VBM) methods, which are less subject to selective reporting of selected ROIs [29]. Still, in a data set of 47 whole-brain meta-analyses, including 324 individual VBM studies, the number of foci reported in small VBM studies, and even in meta-analyses with few studies, was often inflated, consistent with reporting biases [30]. Interestingly, this and another investigation suggested that studies with fewer coauthors tend to report larger brain abnormalities [31]. Similar biases were also detected in the use of functional MRI (fMRI) in an evaluation of 94 whole-brain fMRI meta-analyses with 1788 unique data sets of psychiatric or neurological conditions and tasks [22]. Reporting biases seemed to affect small fMRI studies, which may be analyzed and reported in ways that generate a larger number of foci (many foci, especially in smaller studies, may represent false positives) [22].

[Figure 1: plot of the number of publications per year (y-axis 'Number of publications', 0–10 000) from 1962 to 2012.]
Figure 1. Brain-imaging studies over the past 50 years. Number of brain-imaging studies published in PubMed (search term 'neuroimaging', updated up to 2012) over the past 50 years.

Animal studies and other preclinical studies
The poor reproducibility of preclinical studies, including animal studies, has recently been highlighted by several failures to replicate key findings [32–34]. Irreproducible results indicate that various biases may be operating in this literature.

In some fields of in vitro and cell-based preclinical research, direct replication attempts are uncommon and, thus, bias is difficult to assess empirically. Conversely, there are often multiple animal model studies performed on similar questions. Empirical evaluations have shown that small studies consistently give more favorable results compared with larger studies [35] and that study quality is inversely related to effect size [36–39]. A large-scale meta-epidemiological evaluation [37,40] used data from the Collaborative Approach to Meta-Analysis and Review of Animal Data in Experimental Studies consortium (CAMARADES; http://www.camarades.info/). It assessed 4445 data sets synthesized in 160 meta-analyses of neurological disorders, and found clear evidence of an excess of significance. This observation suggests strong biases, with selective analysis and outcome reporting biases being plausible explanations.

Psychological science
Over 50 years ago, Sterling [41] offered evidence that most published psychological studies reported positive results. This trend has been persistent and perhaps even increasing over time [42], particularly in the USA [43]. Evidence suggests that perceived bias against negative results in peer review [44] and a desire for aesthetically pleasing positive effects [45] lead authors to be less likely even to attempt to publish negative studies [44]. Instead, many psychological scientists report taking advantage of selective reporting and flexibility in analysis to make their research results more publishable [46,47]. Such practices may contribute to unusually elevated likelihoods of positive results that are just below the nominal 0.05 threshold for significance ('P hacking') [48,49].
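One simple way to probe for this pattern in a body of literature is to tabulate reported p-values in narrow bins on either side of 0.05; a pile-up just below the threshold is the signature discussed above. The toy sketch below uses invented values and is not the p-curve method cited in the text.

```python
# Toy check for a pile-up of reported p-values just below 0.05 (hypothetical values).
import numpy as np

reported_p = np.array([0.003, 0.012, 0.021, 0.031, 0.038, 0.041, 0.043, 0.044,
                       0.046, 0.047, 0.048, 0.049, 0.052, 0.061, 0.074])

just_below = np.sum((reported_p > 0.040) & (reported_p <= 0.050))
just_above = np.sum((reported_p > 0.050) & (reported_p <= 0.060))
print(f"p in (0.04, 0.05]: {just_below}; p in (0.05, 0.06]: {just_above}")
# Many more values just below than just above the threshold is consistent with selective
# reporting or flexible analysis ('P hacking'), although it is not proof of it.
```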

Clinical trials
Publication and other reporting biases in the clinical trials literature have often been ascribed to financial vested interests from manufacturer sponsors [50]. For example, in a review [51] of all randomized controlled trials of nicotine replacement therapy (NRT) for smoking cessation, more industry-supported trials (51%) reported statistically significant results compared with nonindustry trials (22%); this difference was unexplained by trial characteristics. Moreover, industry-supported trials indicated a larger effect of NRT [summary odds ratio 1.90, 95% confidence interval (CI) 1.67–2.16] compared with nonindustry trials (summary odds ratio 1.61, 95% CI 1.43–1.80). Evidence of excess significance has also been documented in trials of neuroleptics [19]. Comparisons of published results against FDA records show that, although almost half of the trials on antidepressants for depression have negative results in the FDA records, these negative results either remain unpublished or are published with distorted reporting that shows them as positive [8]; thus, the published literature shows larger estimates of treatment effects for antidepressants than do the FDA data. A similar pattern has also been recorded for trials on antipsychotics [7].

Financial conflicts are not the only explanation for such biases. For example, problems similar to those seen with antidepressants also seem to exist in the literature of randomized trials on psychotherapies [52,53]. Allegiance towards favoring specific cognitive, behavioral, or social theories that underlie these interventions may be responsible.

Genetics
Over the past couple of decades, candidate gene studies have exemplified many of the problems of poor reproducibility; few of the early studies in this literature could be replicated reliably [54,55]. This problem has been even worse in the literature on the genetics of cognitive traits and personality, where phenotype definitions are flexible [56]. The development of genome-wide association methods necessitated greater statistical stringency (given the number of statistical tests conducted) and larger sample sizes. This has transformed the reliability of genetic research, with fewer but more robust genetic signals now being identified [57]. The successful transformation has required international collaboration, the combination of data in large consortia, and the adoption of rigorous replication practices in meta-analysis within consortia [58,59].

Approaches to prevent bias
Most scientists embrace fundamental scientific values, such as disinterestedness and transparency, even while believing that other scientists do not [60]. However, noble intentions may not be sufficient to decrease biases when people are not able to recognize or control their own biases [61,62], rationalize biases through motivated reasoning [63], and are embedded in a culture that implicitly rewards the expression of biases for personal career advancement (i.e., publication over accuracy). Actual solutions should recognize the limits of human capacity to act on intentions, and nudge the incentives that motivate and constrain behavior. Figure 2 illustrates common practices and possible solutions across the workflow for addressing multiple biases, which are discussed below.

[Figure 2 schematic: 'problems' and 'solutions' mapped onto the workflow of designing a study, conducting the study, analyzing data, and reporting results. Problems include: no reporting standards; not publishing null results; not publishing replications; many measures and no analysis constraints; materials unavailable for replication; presenting exploratory analysis as confirmatory; data unavailable for reanalysis; underpowered designs; data peeking; selective reporting; no reporting. Solutions include: register the study; specify the analysis plan; data collection start and stop rules; distinguish confirmatory from exploratory analysis; disclose all measures, conditions, and outcomes of the pre-specified analysis plan; open materials; open data; replication; registered reports; required disclosure; rewarding open practices; post-publication review.]
Figure 2. Common practices and possible solutions across the workflow for addressing publication biases. Red problems and green solutions are mostly controllable by researchers; purple problems and blue solutions are mostly controllable by journal editors. Funding agencies may also be major players in shaping incentives and reward systems.

Study registration
Study registration has been used primarily for clinical trials. Most leading journals now require protocol registration for clinical trials at inception as a condition of publication. Using public registries such as http://clinicaltrials.gov/, one can discover studies that were conducted and never reported. However, still only a minority of clinical trials are registered, and registration is uncommon for other types of design for which registration may also be useful [64–66].

Pre-specification
Pre-specification of the project design and analysis plan can allow separating pre-specified from post hoc exploratory analyses. Currently, registered studies do not necessarily include such detailed pre-specification. Exploratory analyses can be useful, but they are tentative. Pre-specified analyses can be interpreted with greater confidence [67]. Researchers can register and pre-specify analyses through services such as the Open Science Framework (OSF; http://osf.io/).

Data, protocol, code, and analyses availability
For materials that can be shared digitally, transparency of the materials, data, and research process is now straightforward. There are hundreds of data repositories available for data storage: some are generic (e.g., http://thedata.org/), whereas others are specific for particular types of data (e.g., http://openfmri.org/), and others offer additional services for collaboration and project management (e.g., OSF and http://figshare.com/). There are also an increasing number of options for registering protocols (e.g., PROSPERO for systematic reviews, Trials, and F1000 journals).

Replication
Another core value of science is reproducibility [68]. Present incentives for innovation vastly outweigh those for replication, making replications rare in many fields [69,70]. Journals are altering those incentives with a 'registered reports' publishing option to avoid bias on the part of authors or reviewers who might otherwise be swayed by statistical significance or the 'information gain' [71] arising from the results of a study [72]. Authors propose replicating an existing result and provide a justification of its importance and a description of the protocol and analysis plan. Peer review and conditional acceptance before data collection lower the barrier for authors to pursue replication, and increase confidence in the results of the replication via registration and pre-specification. Multiple journals in psychology and neuroscience have adopted this format [73–75] (http://www.psychologicalscience.org/index.php/replication), and funders such as the NIH are considering support for replication research [76].

Collaborative consortia
Incentives for replication research are enhanced further through crowdsourcing and collaborative consortia that increase the feasibility and impact of replication. The Reproducibility Project is an open collaboration of more than 180 researchers aiming to estimate the reproducibility of psychological science. Replication teams each select a study from a sample of articles published in three psychology journals in 2008. All teams follow a protocol that provides quality assurance and adherence to standards. The effort distributes workload among many contributors and aggregates the replications for maximal value [77,78]. The Many Labs Project implemented a single study session including 13 experimental effects in psychology [79]. The same protocol (with translation) was administered in 36 laboratory and web samples around the world (total n = 6344).

Commentators also note untapped opportunities for replication research in the context of advanced undergraduate and graduate pedagogy [80,81]. The Collaborative Replications and Education Project (CREP; https://osf.io/fLAuE/) identified ten published studies from 2010 that were amenable to integration and execution as course-based research projects. Project coordinators provided instructional aids and defined standards for documenting and aggregating the results of replications across institutions.

Some fields, such as genetics, have been transformed by the widespread adoption of replication practices as a sine qua non in the setting of large-scale collaborative consortia, where tens and hundreds of teams participate. Such consortia help ensure that maximal standardization or harmonization exists in the analysis plan across the participating teams.

Pre-publication review practices
Authors follow standards and checklists for formatting manuscripts for publication, but there are few standards for the content of what is reported in manuscripts in basic science. In January 2014, Psychological Science introduced a four-item checklist of disclosure standards for submission: rationale for sample size, outcome variables, experimental conditions, and use of covariates in analysis. Enriched checklists and reporting standards, such as CONSORT and ARRIVE, are in use in clinical trials and animal research, respectively (http://www.consort-statement.org/; http://www.nc3rs.org.uk/page.asp?id=1357). In addition, specialized reporting standards have been introduced for some methods, such as fMRI [82]. Such practices may improve the consistency and clarity of reporting.

Publication practices
Beyond disclosure requirements and replication reports, journals can encourage or require practices that shift incentives and decrease bias. For example, in 2014, eight charter journals in psychology and neuroscience adopted badges to acknowledge articles that met criteria for Open Data, Open Materials, and Preregistration (http://osf.io/TVyXZ). Furthermore, journals can make explicit that replications are welcome, not discouraged [83], particularly of research that had been published previously in that journal. Finally, journals such as Psychological Science are changing to allow fuller description of methods [84]. Full description may facilitate the adoption of standards in contexts that allow substantial flexibility. For example, a study of 241 studies using fMRI found 223 unique analysis pipelines, with little justification for this variation [85]. Furthermore, activation patterns were strongly dependent on the analysis pipeline applied.

Post-publication review practices
Retractions are a rare event [86], as are full re-analyses of published results. Increasing opportunities for, and exposure to, post-publication review commentary and re-analysis may be useful. Post-publication comments can now be made in PubMed Commons and have been possible for a long time in the BMJ, Frontiers, PLoS, and other journal groups.

Incentive structures
The primary currency of academic science is publication. Incentives need to be realigned so that emphasis is given not to the quantity of publications, but to the quality and impact of the work, reproducibility track record, and adherence to a sharing culture (for data, protocols, analyses, and materials) [61,87].

Concluding remarks
Overall, publication and other selective reporting biases are probably prevalent and influential in diverse cognitive sciences. Different approaches have been proposed to remedy these biases. Box 2 summarizes some outstanding questions in this process. As shown in Figure 2, there are multiple entry points for each of diverse 'problems', from designing and conducting studies to analyzing data and reporting results. 'Solutions' are existing, but uncommon, practices that should improve the fidelity of scientific inference in the cognitive sciences. We have limited evidence on the effectiveness of these approaches. Moreover, improvements may require interventions at multiple levels. Such interventions should aim at improving the efficiency and reproducibility of science without stifling creativity and the drive for discovery.

Box 2. Outstanding questions
There are several outstanding questions about how to minimize the impact of biases in the literature. These include: who should take the lead in changing various practices (journals, scientists, editors, funding agencies, regulators, patients, the public, or combinations of the above); how to ensure that more stringent requirements to protect from bias do not stifle creativity by increasing the burden of documentation; the extent to which new measures to curtail bias may force investigators, authors, and journals to adapt their practices in a normative fashion, satisfying new checklists of requirements but without really changing the essence of the research practices; how to avoid excessive, unnecessary replication happening at the expense of not investing in new research directions; how to optimize the formation and efficiency of collaborative consortia of scientists; and how to tackle potential caveats or collateral problems that may ensue while trying to maximize public access to raw data [e.g., in theory, analyses may be performed by ambitious re-analysts with the explicit purpose of showing that they get a different (and, thus, more easily publishable) result].

Acknowledgments
S.P.D. acknowledges funding from the National Institute on Drug Abuse US Public Health Service grant DA017441 and a research stipend from the Institute of Medicine (IOM) (Puffer/American Board of Family Medicine/IOM Anniversary Fellowship). M.R.M. acknowledges support for his time from the UK Centre for Tobacco Control Studies (a UKCRC Public Health Research Centre of Excellence) and funding from the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the Medical Research Council, and the National Institute for Health Research under the auspices of the UK Clinical Research Collaboration. P.F-P. acknowledges funding from the EU-GEI study (project of the European network of national schizophrenia networks studying Gene-Environment Interactions). B.A.N. acknowledges funding from the Fetzer Franklin Fund, the Laura and John Arnold Foundation, the Alfred P. Sloan Foundation, and the John Templeton Foundation.

References
1 Green, S. et al. (2011) Cochrane Handbook for Systematic Reviews of Interventions, version 5.1.0, The Cochrane Collaboration
2 Dwan, K. et al. (2008) Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3, e3081
3 Vogel, G. (2011) Scientific misconduct. Psychologist accused of fraud on 'astonishing scale'. Science 334, 579
4 Vogel, G. (2008) Scientific misconduct. Fraud charges cast doubt on claims of DNA damage from cell phone fields. Science 321, 1144–1145
5 Vogel, G. (2006) Developmental biology. Fraud investigation clouds paper on early cell fate. Science 314, 1367–1369
6 Saquib, N. et al. (2013) Practices and impact of primary outcome adjustment in randomized controlled trials: meta-epidemiologic study. BMJ 347, f4313
7 Turner, E.H. et al. (2012) Publication bias in antipsychotic trials: an analysis of efficacy comparing the published literature to the US Food and Drug Administration database. PLoS Med. 9, e1001189
8 Turner, E.H. et al. (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. N. Engl. J. Med. 358, 252–260
9 Dwan, K. et al. (2013) Systematic review of the empirical evidence of study publication bias and outcome reporting bias – an updated review. PLoS ONE 8, e66844
10 Sterne, J.A. et al. (2011) Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ 343, d4002
11 Lau, J. et al. (2006) The case of the misleading funnel plot. BMJ 333, 597–600
12 Peters, J.L. et al. (2006) Comparison of two methods to detect publication bias in meta-analysis. JAMA 295, 676–680
13 Harbord, R.M. et al. (2006) A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Stat. Med. 25, 3443–3457
14 Copas, J.B. and Malley, P.F. (2008) A robust P-value for treatment effect in meta-analysis with publication bias. Stat. Med. 27, 4267–4278
15 Copas, J. and Jackson, D. (2004) A bound for publication bias based on the fraction of unpublished studies. Biometrics 60, 146–153
16 Hedges, L.V. and Vevea, J.L. (1996) Estimating effect size under publication bias: small sample properties and robustness of a random effects selection model. J. Educ. Behav. Stat. 21, 299–332
17 Pfeiffer, T. et al. (2011) Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias. PLoS ONE 6, e18362
18 Gadbury, G.L. and Allison, D.B. (2012) Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature. PLoS ONE 7, e46363
19 Ioannidis, J.P. and Trikalinos, T.A. (2007) An exploratory test for an excess of significant findings. Clin. Trials 4, 245–253
20 Ioannidis, J.P.A. (2014) Clarifications on the application and interpretation of the test for excess significance and its extensions. J. Math. Psychol. 57, 187
21 Francis, G. (2012) The psychology of replication and replication in psychology. Perspect. Psychol. Sci. 7, 585–594
22 David, S.P. et al. (2013) Potential reporting bias in fMRI studies of the brain. PLoS ONE 8, e70104
23 Rücker, G. et al. (2011) Detecting and adjusting for small-study effects in meta-analysis. Biom. J. 53, 351–368
24 Duval, S. and Tweedie, R. (2000) Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56, 455–463
25 Ioannidis, J.P. (2011) Adjusting for bias: a user's guide to performing plastic surgery on meta-analyses of observational studies. Int. J. Epidemiol. 40, 777–779
26 Thompson, S. et al. (2011) A proposed method of bias adjustment for meta-analyses of published observational studies. Int. J. Epidemiol. 40, 765–777
27 Jennings, R.G. and Van Horn, J.D. (2012) Publication bias in neuroimaging research: implications for meta-analyses. Neuroinformatics 10, 67–80
28 Ioannidis, J.P. (2011) Excess significance bias in the literature on brain volume abnormalities. Arch. Gen. Psychiatry 68, 773–780
29 Borgwardt, S. et al. (2012) Why are psychiatric imaging methods clinically unreliable? Conclusions and practical guidelines for authors, editors and reviewers. Behav. Brain Funct. 8, 46
30 Fusar-Poli, P. et al. (2013) Evidence of reporting biases in voxel-based morphometry (VBM) studies of psychiatric and neurological disorders. Hum. Brain Mapp. http://dx.doi.org/10.1002/hbm.22384
31 Sayo, A. et al. (2012) Study factors influencing ventricular enlargement in schizophrenia: a 20 year follow-up meta-analysis. Neuroimage 59, 154–167
32 Begley, C.G. and Ellis, L.M. (2012) Drug development: raise standards for preclinical cancer research. Nature 483, 531–533
33 Prinz, F. et al. (2011) Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712
34 ter Riet, G. et al. (2012) Publication bias in laboratory animal research: a survey on magnitude, drivers, consequences and potential solutions. PLoS ONE 7, e43404
35 Sena, E.S. et al. (2010) Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol. 8, e1000344
36 Ioannidis, J.P. (2012) Extrapolating from animals to humans. Sci. Transl. Med. 4, 151ps15
37 Jonasson, Z. (2005) Meta-analysis of sex differences in rodent models of learning and memory: a review of behavioral and biological data. Neurosci. Biobehav. Rev. 28, 811–825
38 Macleod, M.R. et al. (2008) Evidence for the efficacy of NXY-059 in experimental focal cerebral ischaemia is confounded by study quality. Stroke 39, 2824–2829
39 Sena, E. et al. (2007) How can we improve the pre-clinical development of drugs for stroke? Trends Neurosci. 30, 433–439
40 Button, K.S. et al. (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376
41 Sterling, T.D. (1959) Publication decisions and their possible effects on inferences drawn from tests of significance, or vice versa. J. Am. Stat. Assoc. 54, 30–34
42 Fanelli, D. (2012) Negative results are disappearing from most disciplines and countries. Scientometrics 90, 891–904
43 Fanelli, D. and Ioannidis, J.P. (2013) US studies may overestimate effect sizes in softer research. Proc. Natl. Acad. Sci. U.S.A. 110, 15031–15036
44 Greenwald, A.G. (1975) Consequences of prejudice against the null hypothesis. Psychol. Bull. 82, 1–20
45 Giner-Sorolla, R. (2012) Science or art? How esthetic standards grease the way through the publication bottleneck but undermine science. Perspect. Psychol. Sci. 7, 562–571
46 John, L.K. et al. (2012) Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532
47 Simmons, J.P. et al. (2011) False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366
48 Masicampo, E.J. and Lalande, D.R. (2012) A peculiar prevalence of p values just below .05. Q. J. Exp. Psychol. (Colchester) 65, 2271–2279
49 Simonsohn, U. et al. (2014) P-curve: a key to the file-drawer. J. Exp. Psychol. Gen. (in press)
50 Stamatakis, E. et al. (2013) Undue industry influences that distort healthcare research, strategy, expenditure and practice: a review. Eur. J. Clin. Invest. 43, 469–475
51 Etter, J.F. et al. (2007) The impact of pharmaceutical company funding on results of randomized trials of nicotine replacement therapy for smoking cessation: a meta-analysis. Addiction 102, 815–822
52 Cuijpers, P. et al. (2013) Comparison of psychotherapies for adult depression to pill placebo control groups: a meta-analysis. Psychol. Med. 44, 685–695
53 Cuijpers, P. et al. (2010) Efficacy of cognitive-behavioural therapy and other psychological treatments for adult depression: meta-analytic study of publication bias. Br. J. Psychiatry 196, 173–178
54 Munafò, M.R. (2009) Reliability and replicability of genetic association studies. Addiction 104, 1439–1440
55 Ioannidis, J.P. et al. (2001) Replication validity of genetic association studies. Nat. Genet. 29, 306–309
56 Munafò, M.R. and Flint, J. (2011) Dissecting the genetic architecture of human personality. Trends Cogn. Sci. 15, 395–400
57 Hirschhorn, J.N. (2009) Genomewide association studies: illuminating biologic pathways. N. Engl. J. Med. 360, 1699–1701
58 Evangelou, E. and Ioannidis, J.P. (2013) Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389
59 Panagiotou, O.A. et al. (2013) The power of meta-analysis in genome-wide association studies. Annu. Rev. Genomics Hum. Genet. 14, 441–465
60 Anderson, M.S. et al. (2007) Normative dissonance in science: results from a national survey of U.S. scientists. J. Empir. Res. Hum. Res. Ethics 2, 3–14
61 Nosek, B.A. et al. (2012) Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7, 615–631
62 Nosek, B.A. et al. (2012) Implicit social cognition. In Handbook of Social Cognition (Fiske, S. and Macrae, C.N., eds), pp. 31–53, Sage
63 Kunda, Z. (1990) The case for motivated reasoning. Psychol. Bull. 108, 480–498
64 Ioannidis, J.P. (2012) The importance of potential studies that have not existed and registration of observational data sets. JAMA 308, 575–576
65 World Medical Association (2013) World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA 310, 2191–2194
66 Dal-Ré, R. et al. (2014) Making prospective registration of observational research a reality. Sci. Transl. Med. 6, 224cm1
67 Wagenmakers, E-J. et al. (2012) An agenda for purely confirmatory research. Perspect. Psychol. Sci. 7, 632–638
68 Schmidt, S. (2009) Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Rev. Gen. Psychol. 13, 90–100
69 Collins, H.M. (1985) Changing Order, Sage
70 Makel, M.C. et al. (2012) Replications in psychology research: how often do they really occur? Perspect. Psychol. Sci. 7, 537–542
71 Evangelou, E. et al. (2012) Perceived information gain from randomized trials correlates with publication in high-impact factor journals. J. Clin. Epidemiol. 65, 1274–1281
72 Emerson, G.B. et al. (2010) Testing for the presence of positive-outcome bias in peer review: a randomized controlled trial. Arch. Intern. Med. 170, 1934–1939
73 Chambers, C.D. (2013) Registered reports: a new publishing initiative at Cortex. Cortex 49, 609–610
74 Nosek, B.A. and Lakens, D.E. (2013) Call for proposals: special issue of Social Psychology on 'Replications of important results in social psychology'. Soc. Psychol. 44, 59–60
75 Wolfe, J. (2013) Registered reports and replications in attention, perception, and psychophysics. Atten. Percept. Psychophys. 75, 781–783
76 Wadman, M. (2013) NIH mulls rules for validating key results. Nature 500, 14–16
77 Open Science Collaboration (2012) An open, large-scale collaborative effort to estimate the reproducibility of psychological science. Perspect. Psychol. Sci. 7, 657–660
78 Open Science Collaboration (2014) The Reproducibility Project: a model of large-scale collaboration for empirical research on reproducibility. In Implementing Reproducible Computational Research (A Volume in the R Series) (Stodden, V. et al., eds), Taylor & Francis (in press)
79 Klein, R.A. and Ratliff, K.A. (2014) Investigating variation in replicability: a 'Many Labs' replication project. Registered design. Soc. Psychol. (in press)
80 Frank, M.C. and Saxe, R. (2012) Teaching replication. Perspect. Psychol. Sci. 7, 600–604
81 Grahe, J.E. et al. (2012) Harnessing the undiscovered resource of student research projects. Perspect. Psychol. Sci. http://dx.doi.org/10.1177/1745691612459057
82 Poldrack, R.A. et al. (2008) Guidelines for reporting an fMRI study. Neuroimage 40, 409–414
83 Neuliep, J.W. and Crandell, R. (1990) Editorial bias against replication research. J. Soc. Behav. Pers. 5, 85–90
84 Eich, E. (2014) Business not as usual. Psychol. Sci. http://dx.doi.org/10.1177/0956797613512465
85 Carp, J. (2012) The secret lives of experiments: methods reporting in the fMRI literature. Neuroimage 63, 289–300
86 Steen, R.G. (2010) Retractions in the scientific literature: do authors deliberately commit research fraud? J. Med. Ethics 37, 113–117
87 Koole, S.L. and Lakens, D. (2012) Rewarding replications: a sure and simple way to improve psychological science. Perspect. Psychol. Sci. 7, 608–614
88 Rosenthal, R. (1979) The file drawer problem and tolerance for null results. Psychol. Bull. 86, 638–641
89 Ioannidis, J.P. (1998) Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 279, 281–286
90 Ioannidis, J.P. and Trikalinos, T.A. (2005) Early extreme contradictory estimates may appear in published research: the Proteus phenomenon in molecular genetics research and randomized trials. J. Clin. Epidemiol. 58, 543–549
91 Kirkham, J.J. et al. (2010) The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340, c365
92 Hahn, S. et al. (2000) Assessing the potential for bias in meta-analysis due to selective reporting of subgroup analyses within studies. Stat. Med. 19, 3325–3336
93 Moreno, S.G. et al. (2009) Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. BMJ 339, b2981
94 Vedula, S.S. et al. (2009) Outcome reporting in industry-sponsored trials of gabapentin for off-label use. N. Engl. J. Med. 361, 1963–1971
95 Ioannidis, J.P. (2008) Why most discovered true associations are inflated. Epidemiology 19, 640–648
96 Young, N.S. et al. (2008) Why current publication practices may distort science. PLoS Med. 5, e201