Thesis

Role of systematic reviews in the identification and prevention of poor quality research and scientific misconduct: random reflections based on anaesthesiologic cases

ELIA, Nadia

Abstract

Research in clinical medicine often suffers from poor methodological quality and is sometimes based on fraudulent data, which may bias the conclusions of systematic reviews. This dissertation develops the idea that not all research malpractices and misconduct threaten scientific knowledge, nor do they all bias the conclusions of systematic reviews to the same extent; rather, the strict procedures applied during proper systematic reviewing should make systematic reviews robust to some common errors and misconduct. Moreover, well-performed systematic reviews could become part of the solution to the problem by providing clear research agendas and important methodological information that may be used for future research. Furthermore, systematic reviewers may correct the published literature after having identified errors during the critical appraisal of articles, and they may even play the role of whistle-blowers by identifying new, as yet unrecognised, cases of misconduct.

Reference

ELIA, Nadia. Role of systematic reviews in the identification and prevention of poor quality research and scientific misconduct: random reflections based on anaesthesiologic cases. Thèse de privat-docent : Univ. Genève, 2017

DOI : 10.13097/archive-ouverte/unige:92532

Available at: http://archive-ouverte.unige.ch/unige:92532

Disclaimer: layout of this document may differ from the published version.


Clinical Medicine Section Department of Anaesthesiology, Pharmacology and Intensive Care Department of Community Medicine and Primary Care

"Role of systematic reviews in the identification and prevention of poor quality research and scientific misconduct: random reflections based on anaesthesiologic cases"

Thesis submitted to the Faculty of Medicine of the University of Geneva

for the degree of Privat-Docent by

Nadia ELIA

Geneva

2016


“There is something extraordinarily satisfying in designing an RCT of “place of therapy”, writing the protocol in such a way as to avoid all the ethical pitfalls, persuading all the necessary people to participate, and checking to see that no one cheats” 1

Archibald Cochrane, 1972


Summary

Research in clinical medicine often suffers from poor methodological quality and is sometimes based on fraudulent data. Although systematic reviews are generally recognised as generating the highest level of evidence on the impact of an intervention, through the synthesis of all existing research on a topic, a broad area of research has developed to highlight that some common malpractices and major misconduct in original reports may bias the conclusions of systematic reviews, calling their reliability into question.

In this dissertation, I develop the idea that not all research malpractices and misconduct threaten scientific knowledge, nor do they all bias the conclusions of systematic reviews to the same extent; rather, the strict procedures applied during proper systematic reviewing should make systematic reviews robust to some common errors and misconduct. Moreover, I raise the idea that well-performed systematic reviews could become part of the solution to the broad problem of poor quality research and research misconduct: by providing clear research agendas which highlight what research is, and is not, needed, thereby decreasing the number of redundant trials performed and published, and by providing important methodological information that may be used for future research. Furthermore, systematic reviewers may play an important role by correcting the published literature after having identified errors during the critical appraisal of articles, and they may even play the role of whistle-blowers by identifying new, as yet unrecognised, cases of misconduct. Finally, I suggest that performing a systematic review during the academic curriculum could improve the methodological understanding of young physician-researchers.

Although most of the answers to the difficult problem of research malpractice and misconduct probably lie within academic structures, policies and regulations, systematic reviews may be seen as a useful tool to support better quality in the published medical literature.


Table of contents

Foreword
1. Introduction
   1.1 Context
   1.2 Aims and structure of the dissertation
2. The experimental clinical trial
   2.1 Steps from the idea to publication
   2.2 Potential malpractices
   2.3 Safeguards
3. Honest error or misconduct?
4. The systematic review
   4.1 Malpractices that do not threaten the conclusions of systematic reviews
   4.2 Malpractices that do threaten the conclusions of systematic reviews
   4.3 Additional roles
5. Personal contributions in this area of research
   5.1 Ketamine for post-operative pain - a quantitative systematic review of randomised trials
   5.2 Fate of articles that warranted retraction due to ethical concerns: a descriptive cross-sectional study
   5.3 Susceptibility to fraud in systematic reviews: lessons from the Reuben case
   5.4 How do authors of systematic reviews deal with research misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors
   5.5 Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection
6. Discussion
   6.1 What should be done?
   6.2 Where do systematic reviews fit into the picture?
7. Conclusion
8. Bibliography


Foreword

One morning in early 2009, as I arrived in my office at the Department of Anaesthesiology of the University Hospitals of Geneva, I found my boss and mentor with a strange look on his face. We had a problem. Dr Scott S Reuben, an eminent American anaesthesiologist, had admitted having falsified and fabricated data in 21 of his articles, published between 1996 and 2008. These articles were going to be retracted. This was like an earthquake in the field of anaesthesiology. The news was immediately, and noisily, disseminated.2,3,4,5,6 Scott Reuben had published numerous original research articles on the topic of postoperative pain, and anaesthesiologists were questioning what should still be regarded as “known” in this field.7 More worrying to me was the fact that an eminent anaesthesiologist, P.F. White, suggested that “the retraction of Dr. Reuben’s articles would compromise every meta-analysis, editorial and systematic review of analgesia trials that included these fabricated findings”.8 I felt particularly concerned, since I had been an author or co-author of such systematic reviews and meta-analyses.9

The following years confirmed that the "Reuben case" was unfortunately not an isolated one, with the emergence of two new scandals in anaesthesiology: Joachim Boldt in 2010, and Yoshitaka Fujii in 2011. One positive effect of these scandals was to motivate me, with my background in anaesthesiology, epidemiology and public health, to investigate these issues further. What was this all about? Who were these doctors who "cheated" in their research publications? Was there a way to identify fraudulent data? I also started to wonder whether P.F. White’s concerns could be justified: where should systematic reviews and meta-analyses stand in the context of fraudulent science? These questions have shaped my research interests ever since, and are the object of the present thesis.


1. Introduction

1.1 Context

Experimental research in clinical medicine often suffers from poor methodological quality,10,11,12,13,14 and is sometimes based on fraudulent practices.15,16,17,18 These errors and misconduct (hereafter called “malpractices”) threaten the quality of patient care,19,20,21,22,23 damage the reputation of science,24,25 and may also threaten the conclusions of systematic reviews relying on these trials.8,7,26 Although systematic reviews are generally regarded as generating the highest level of evidence on the impact of an intervention, through the synthesis of all existing research on a topic, a broad area of research has developed to highlight the impact that some common malpractices in original reports may have on the conclusions of systematic reviews, questioning their reliability in this context.27,28,29,30,31,32 Less attention has been paid to the fact that not all research malpractices bias the conclusions of systematic reviews, that systematic reviews may sometimes correct some of these malpractices, and that systematic reviewers may even act as whistle-blowers in the context of research misconduct.

1.2 Aims and structure of the dissertation

The aim of this dissertation is to discuss the impact of the different research malpractices on the conclusions of systematic reviews, and to explore what additional roles systematic reviewers could play in this context. I will start by describing the steps underlying a clinical trial and illustrate the potential malpractices, as well as the safeguards, that exist throughout the research process. For clarity, I will focus on the experimental clinical research design. Secondly, I will briefly discuss the fuzzy limit between “honest error” and “major misconduct”, and highlight that reprehensible misconduct does not always threaten scientific knowledge or the conclusions of systematic reviews. Thirdly, I will briefly describe the steps underlying a systematic review to illustrate why some malpractices in the included trials remain harmless to the conclusions of the systematic review, how systematic reviewers may identify and sometimes correct these malpractices, and how systematic reviewers may even identify new, previously unrecognised, cases of intentional misconduct. The fourth part will report my personal contributions to this area of research, and I will end by discussing the future roles of systematic reviewers in the context of research malpractice.

Systematic reviews are often criticised or misunderstood.33 Although problems related to malpractices and misconduct in the conduct of systematic reviews themselves do exist, and are becoming more frequent,34 I will not develop these in this dissertation, as they would represent a topic of their own. On the contrary, my aim here is to highlight some of the strengths of systematic reviews that are often overlooked or ignored. Therefore, the assumption underlying the entire dissertation is that the systematic review is well performed, and adheres to the standards of high quality methodology.

2. The experimental clinical trial

In an experimental trial, as opposed to an observational study, the investigator administers an intervention to a group of patients not previously exposed to the intervention. In a randomised controlled trial (RCT), some patients receive the intervention while others do not, and this is determined by chance. The strength of RCTs is that randomisation distributes all potential confounding variables, both known and unknown, equally among the groups, so that the difference in outcome between the groups can be attributed to the intervention only. RCTs represent the gold standard design for assessing the impact of an intervention. Correctly performing an RCT requires adherence to a number of rules and procedures.

2.1 Steps from the idea to publication

Research starts with an idea, which is translated into a good research question that needs to be both relevant and answerable.35,36 This is followed by the elaboration of a research protocol describing the conduct of the research in detail. The protocol is an official document that guides the researcher throughout the trial, can be amended, and must be submitted to different regulatory bodies for approval. The protocol should justify the choice of the study design and give a clear description of the population investigated, the intervention administered, the comparator chosen, and the primary and secondary outcomes of interest, to minimise the risk of subsequent biases. The number of patients required to reach a conclusion also needs to be justified, based on an a priori sample size calculation. This is especially important in experimental trials applying an intervention of unknown impact to participants, as it would be unethical to include either too few,37,38 or too many, patients in such a trial. The research protocol must be submitted to an ethics committee for formal approval, in order to guarantee the safety of the included patients and to check that patients are properly informed of the aims of the trial, of their right to participate (or not), and of their right to change their mind regarding their participation at any time, without justification. The ethics committee should also verify the relevance, feasibility and adequacy of the methods proposed. If the trial involves the administration of a new medication, or of an old medication for a new indication, the protocol must also be submitted to the national agency for drug regulation (in Switzerland: Swissmedic). If external financing of the trial is needed, the protocol may also be submitted to a sponsor (sometimes a private actor). Each body reviews the protocol separately, and all approvals are required before patient enrolment can start.
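As an aside, the logic of such an a priori sample size calculation can be sketched numerically. The following is a minimal illustration using the standard normal-approximation formula for comparing two means; the function name and the pain-scale figures (difference, standard deviation) are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-group comparison
    of means (normal approximation): n = 2 * (z_a + z_b)^2 * (sd/delta)^2."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # two-sided significance level
    z_b = z(power)           # desired statistical power
    return ceil(2 * (z_a + z_b) ** 2 * (sd / delta) ** 2)

# hypothetical example: detecting a 10 mm difference on a 100 mm
# pain scale, assuming a standard deviation of 20 mm
print(n_per_group(delta=10, sd=20))  # 63 patients per group
```

Note how the required number of patients grows rapidly as the expected difference shrinks or the variability increases, which is why the calculation must be justified before the trial starts.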
After formal approval of the protocol by all regulatory bodies and co-authors, the protocol needs to be published or registered in an open-access database such as, for example, ClinicalTrials.gov. Patients fulfilling the inclusion criteria can then be approached, and those who agree to participate are asked to sign an informed consent form before they can be enrolled in the study and randomised. Any deviation from the initial protocol needs to be documented; amendments need to be submitted for approval to the regulatory bodies, and all documents regarding these changes

should be completely and transparently reported in a “Trial Master File”. Once data collection is over, the results need to be analysed and interpreted, and any potential biases discussed. Finally, the results of the trial are clearly and transparently reported in a scientific article, which is submitted to a journal for publication. Journal editors send the submitted article to experts (called peer-reviewers) for their opinion; peer-reviewers may recommend rejecting the article or propose ways to improve it. In case of rejection, the article can be improved and submitted to another journal for consideration. Eventually, the article gets published and will either remain unread or uncited,39,40 or will be read, cited, commented on and, in rare cases, retracted. The major steps underlying a randomised controlled trial are illustrated in Figure 1A.

2.2 Potential malpractices

Malpractices, ranging from harmless innocent errors to intentional misconduct, may occur throughout the process of a clinical trial. The first possible problem is to start research based on an irrelevant research question. Although science requires that experiments be repeated to demonstrate the consistency of results over time and place, a good research question should aim to provide an answer that is not yet known; formulating one requires a careful review of the published literature on the topic, which is still too often lacking.11,41 Performing a trial when the answer to the research question is already known raises ethical problems, especially in the field of experimental research on humans. The limit between a relevant and a redundant research question, however, is less clear.36 This will be discussed further in chapter 5.5. It has become highly improbable that a clinical trial could start without a research protocol at all, since the protocol needs to be submitted to regulatory bodies.
However, the written protocol may be incomplete, or may remain unregistered and inaccessible.42,43 Also, the choice of an inadequate study design and the lack, or unclear definition, of primary and secondary outcomes are frequently described.14,44 Studies comparing published research with the corresponding protocols showed that over 60% had at least one primary outcome that had been changed, introduced or omitted compared with the protocol,45,46,47 even in high impact factor journals.48 The use of non-validated scales for the measurement of outcomes,49 and the lack of an a priori sample size calculation, are also common problems.11,14,42 It has been shown that less than 5% of the studies in anaesthesiology reported an a priori sample size calculation before 1985; that proportion had grown to about 55% by 200112 and, in a review of RCTs published in the European Journal of Anaesthesiology, we showed that it had further improved to 74% ten years later.50 Improvements in the quality of published RCTs have been described by others.51 It is surprising, however, that a quarter of the RCTs published in anaesthesiology still do not justify their sample size.50 During the data collection stage, malpractices can take various forms,52 which can be regrouped under the headings of “information biases”, when the variables or outcomes are systematically recorded differently across the groups due to, for example, lack or violation of blinding,53 or

“selection biases” when, for example in an RCT, concealment of the randomisation code is violated. Finally, unreported deviations from the protocol have also been described.46,54 During the data analysis stage, authors may be tempted to change the primary outcome in favour of one that is statistically significant,47 to apply an inadequate statistical test to show more impressive results, or to selectively report some results55,56,57,58 or subgroup analyses,59 just to name a few very common malpractices.60 Over-interpretation of study results, or drawing conclusions that are not supported by the data, have also been reported frequently.61 Any intentional attempt to hide or improve the appearance of the results, for example by ignoring outliers, is known as “data falsification”. In some rare cases, an article may be published based on inexistent, invented data; this is known as “data fabrication”.16,17 Many problems may occur when writing the article, such as authorship issues (guest or ghost authorship),62 failure to report an author’s or a sponsor-related conflict of interest transparently,19,63,64 and plagiarism of sentences from previously published work. Self-citation of one’s own work, or selective citation of articles supporting one’s own view65,66 or of statistically significant results,67 is almost the rule, and has been suggested to be a gender issue.59 Simultaneous submission of the same article to different journals, which may lead to multiple publications of the same data, still happens, although it is prohibited.
Finally, perhaps the worst fate for an article is to end up not being published at all;69,70 this has been shown to be frequent even for RCTs, in particular when they show non-significant results.71,72 Recent initiatives, such as the mandatory registration of research protocols, or the “Restoring Invisible and Abandoned Trials” (RIAT) initiative, have emerged to counterbalance this problem.73 Potential malpractices during the process of an RCT are summarised in Table 1.

2.3 Safeguards

Safeguards to prevent or correct malpractices during the research process exist. They include controls on the protocol (before the trial starts), audits and monitoring performed during data collection, peer-review of the article (before publication), comments by attentive readers (after publication), and, eventually, publication of corrections or retraction of the published article. Furthermore, the research team, and academic rules and attitudes towards research integrity, play an important role that may influence the risk of malpractices. Protocol submission to regulatory bodies usually represents a step at which problems are highlighted and corrected, thanks to the reviews performed by the ethics committee, the drug regulation agency or funders. This remains a difficult task, and a large variety of standards have been described across ethics committees.74 Attempts to harmonise procedures are emerging;75,76 however, the impact of ethical peer-review on subsequent trial quality remains unclear.41,77,78 Sometimes, authors fail to seek any formal ethical approval or patient consent; this may hide even more serious forms of malpractice.

Box 1 The Reuben case: uncovering fraud through the lack of ethics committee approval.

Dr Scott S Reuben was an American anaesthesiologist with an international reputation built on his research on multimodal analgesia for the prevention and control of post-operative pain. He was the chief of the acute pain clinic at Baystate Medical Center, Springfield, Massachusetts, when a routine review of research abstracts discovered, in May 2008, that two of the studies he intended to present during an internal “research week” had not received formal approval by an ethics committee.2 An investigation was launched by his institution and, a year later, Dr Reuben admitted having fabricated data, and invented patients, in 21 research articles published over a 13-year period. Dr Reuben assumed the entire responsibility for the fraud, and no further action was taken against his co-authors. He was finally sentenced to six months in jail and three years of supervised release. He was also ordered to repay about $400,000 after he pleaded “guilty to health care fraud”, according to Patrick Johnson from The Republican (June 2010).

The prospective registration of research protocols has become mandatory for publication in most peer-reviewed journals. Although clinical trial registries do not yet control the quality of the research protocol, this procedure should force authors to stick to their initial ideas, and prevent them from switching endpoints or introducing new endpoints that were accidentally found to be more interesting because they were statistically significant.43,79,80 It should also help to reduce the non-publication of “negative” trials, and facilitate the work of peer-reviewers.81 Some journals encourage and accept the publication of research protocols; in this case, the protocols benefit from peer-review, and may be improved.82 Audits can be performed by the ethics committee or by the drug regulation agency at any time during data collection. These, however, are not systematic and are performed on a random basis, but they have been shown to sometimes identify and correct malpractices.83,4 Monitoring of the trial can be performed by an independent structure; this consists of regular data quality control and is performed at the request of the trial investigators. As illustrated in Table 1, peer-reviewers are given a central role in guaranteeing the good quality of the published literature.
However, it has been shown that they remain unable to detect most major errors in RCTs.84,85,86 Although RCTs published in high impact factor journals have been shown to be more likely to report on methods to prevent biases than those published in lower impact factor journals, which could be attributed to better peer-review, this does not guarantee a low risk of bias.87 More worrying is the fact that training peer-reviewers may not be of any use,85,88,89 and worse still is the recent discovery of fraudulent peer-review rings, in which some authors manage to review their own papers through the use of fake email addresses.90,91,92 Software is now used by most journals to detect plagiarism,93 and the occurrence of copy-paste types of plagiarism is likely to decrease. Plagiarism of ideas, though,

will remain a problem. Sometimes, errors or malpractices are discovered by attentive readers after publication, eventually leading to correction or retraction of the article.

Box 2 The Boldt case: uncovering fraud through cautious readers.

Prof Joachim Boldt was a renowned German anaesthesiologist who had built his reputation through his publications on the use of colloids. He was the head of the department of anaesthesiology at the Klinikum Ludwigshafen, a hospital in Germany. In December 2009, the journal Anesthesia & Analgesia published an article authored by Prof Boldt, which was followed by three letters from readers to the Editor-in-Chief of the journal. These letters all highlighted that the data reported in the paper were rather unlikely (the standard deviations for the IL-6 concentrations were too small). The Editor-in-Chief, who agreed with the readers’ concerns, contacted the body responsible for the ethical conduct of research at Klinikum Ludwigshafen (LAK-RLP), which started an investigation into the trial. About a year later, after numerous errors had been identified in the article, it was retracted. The investigation concluded that the study had been fabricated; none of the patients or analyses could be identified. Prof Boldt admitted having fabricated the data and having signed the copyright transfer forms on behalf of his co-authors.94 Following that case of fraud, Prof Boldt’s institution started an internal investigation into his work over the previous 10 years; its conclusions led to the retraction of about 90 articles published in 18 journals.16

The proportion of retracted articles has increased rapidly, and a large proportion of retractions were shown to be due to misconduct,95 highlighting both low barriers to the publication of fraudulent articles and easier retraction practices.96 A “retraction index”* showed that the chance of an article being retracted was higher when it was published in a high impact factor journal.97 However, it is widely believed that retracted articles represent only the tip of the iceberg, with most malpractices remaining unidentified, and even some identified frauds remaining unretracted. This problem will be discussed further in chapter 5.2. Table 1 summarises the potential malpractices and safeguards discussed above.

* Computed as the number of retractions multiplied by 1,000, divided by the number of articles published in these journals during the same time period.
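To make the definition above concrete, the index reduces to a one-line calculation; the journal figures used here are purely hypothetical.

```python
def retraction_index(n_retractions, n_published):
    """Retraction index as defined above: number of retractions
    per 1,000 articles published in the same journals over the
    same time period."""
    return 1000 * n_retractions / n_published

# hypothetical journal: 4 retractions among 2,500 published articles
print(retraction_index(4, 2500))  # 1.6
```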

Figure 1 Steps underlying a randomized controlled trial and a systematic review

[Figure: two parallel flowcharts. A. Randomized controlled trial: idea; research question; protocol (study design, population, intervention/comparator, primary/secondary outcomes, sample size, plan of analysis), reviewed by the ethics committee, drug regulation agency and sponsor; registration; patient enrolment (information, consent), randomisation and concealment; data collection, with possible audits and monitoring; data analyses; data interpretation and conclusions; writing and publishing the article (peer-reviewers, editors, readers). B. Systematic review: idea; research question; protocol (plan of literature search, inclusion/exclusion criteria, intervention/comparator, primary/secondary outcomes, plan of analysis); article retrieval, critical appraisal, double data extraction and contact with study authors; data syntheses, with or without meta-analyses; data interpretation, conclusions and research agenda; writing and publishing the article.]

TABLE 1 Malpractices and safeguards throughout the process of an RCT

Defining the research question
  Potential malpractices: clinically irrelevant question; lack of a clear research question; incomplete review of the existing literature
  Institutional safeguards: protocol peer-review by the ethics committee, the national agency for drug regulation and funding agencies; protocol publication in a journal

Writing the research protocol
  Potential malpractices: wrong design; ill-defined primary and secondary outcomes; use of non-validated outcome measures; lack of sample size calculation; lack of analysis plan
  Institutional safeguards: as above

Submitting the research protocol
  Potential malpractices: failure to submit the protocol for approval; failure to seek patient informed consent; failure to register the protocol
  Institutional safeguards: as above

Performing the research
  Potential malpractices: selection bias; information bias; deviation from protocol
  Institutional safeguards: audits; monitoring

Analysing the results
  Potential malpractices: changing the primary outcome; wrong statistical tests applied; data falsification (i.e. ignoring outliers); data fabrication

Interpreting the results
  Potential malpractices: biased and/or over-interpretation of results
  Institutional safeguards: peer-reviewers; editors

Writing the article
  Potential malpractices: not reporting protocol violations; selective reporting of outcomes; ghost/guest authorship; not declaring conflicts of interest; plagiarism; selective or self-citation
  Institutional safeguards: peer-reviewers; editors; plagiarism software

Publishing the article
  Potential malpractices: simultaneous submission to multiple journals; duplicate publication; failure to publish research; falsifying the peer-review process
  Institutional safeguards: editors; peer-reviewers

Post-publication
  Potential malpractices: non-correction of identified errors; non-retraction of articles that warrant retraction
  Institutional safeguards: editors and publishers

3. Honest error or misconduct?

It is usually accepted that minor errors ought to be identified and corrected, but that authors need not be blamed or punished.98 However, some errors are considered so important that they cannot be corrected and, because they are misleading, or considered misconduct, retraction of the article becomes necessary. The limit between “honest error” and “misconduct” rests on the intention of the authors: misconduct is an intentional error, as opposed to an honest error, which is supposedly “unconscious”.

Therefore, differentiating errors (which deserve correction) from misconduct (which deserves correction and sanction) becomes a subjective and moral matter, leading to varying and inconsistent gradings of the different malpractices.99 Initially, “misconduct” described only data fabrication, falsification and plagiarism,100,101 which are undoubtedly intentional malpractices. Some authors have since introduced the concept of “questionable research practices”24 to describe more common (perhaps less conscious) malpractices, and some have further tried rating misconduct from “minor” to “major”.102 (Fig. 2)

Classifying the different malpractices as either minor errors or major misconduct remains difficult. For example, the lack of a clear definition of the primary outcome can be an unconscious error made by an inexperienced researcher, but can also be an intentional malpractice committed in the hope of manipulating the data during analysis. The same can be said of most of the other malpractices discussed above. A less moral, and perhaps more objective, way to consider the problem is to look at it from the point of view of science. In this case, we realise, first, that honest errors can falsify scientific knowledge just as much as some major misconduct can, since both may lead to conclusions that do not reflect reality, and, secondly, that not all recognised misconduct threatens scientific knowledge.103 For example, plagiarism of someone else’s wording or ideas without acknowledgement, or guest authorship, although unfair to the original authors, illegal and reprehensible, may not affect the validity of the scientific knowledge. This is good news for systematic reviewers, since not all research malpractices will bias their conclusions. In the next chapter, I will describe the process of a systematic review, discuss whether each of the above-cited malpractices could bias the conclusions of systematic reviews, and describe how the process of systematic reviewing may correct malpractices or identify new cases of misconduct.

Figure 2 Varying definitions of research misconduct

[Figure: three definitions of research misconduct, from: Richard Smith, “What is research misconduct?”, The COPE Report 2000, BMJ Books 2000 (http://publicationethics.org/files/u7141/COPE2000pdfcomplete.pdf); Emilio Bossi, “Scientific integrity, misconduct in science”, Swiss Medical Weekly 2010;140(13-14):183-186; and René Custers, “Research misconduct - The grey area of Questionable Research Practices” (http://www.vib.be/en/news/Pages/Research-misconduct---The-grey-area-of-Questionable-Research-Practices.aspx).]

4. The systematic review

The structure of a systematic review is very similar to that of a clinical trial (Fig. 1B). The process starts with an idea and the formulation of a clear research question, and is followed by the elaboration of a research protocol describing the major steps of the review. The first of these steps is a transparent and exhaustive search of the literature for all published and unpublished articles potentially answering the research question.104 These articles are selected according to pre-defined inclusion criteria, and all potentially relevant articles are critically appraised with regard to validity and research methodology. Outcomes of interest are extracted by at least two authors separately, to control for potential data extraction errors, and contact with study authors is often necessary to clarify the methodology used or to unearth hidden (unreported) outcomes.105,106 The results are then synthesised at the study level, based on the summary measures reported in the articles, and, when deemed adequate, a weighted pooled summary estimate of the results can be computed; this is known as a “meta-analysis”. If such pooling of the data does not seem reasonable, the systematic review remains descriptive, and is also called a “qualitative systematic review”. The review’s findings are then interpreted, together with a thorough discussion of the strengths and limitations of the analyses. The review ends with a conclusion, sometimes with recommendations, but most of all with a research agenda highlighting what is known and what needs further clarification.107 The process leading to the publication of the finalised article is very similar to that of a clinical trial. Since systematic reviews rely on previously performed trials, their conclusions can obviously suffer from some of the malpractices discussed above.
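The weighted pooled summary estimate mentioned above is classically obtained by inverse-variance weighting, in which more precise studies carry more weight. A minimal fixed-effect sketch follows; the three study estimates and variances are hypothetical, not drawn from any review discussed here.

```python
def fixed_effect_pool(estimates, variances):
    """Fixed-effect inverse-variance meta-analysis: each study's
    estimate is weighted by the inverse of its variance, so the
    most precise studies contribute most to the pooled result."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_variance = 1.0 / sum(weights)
    return pooled, pooled_variance

# three hypothetical trials reporting the same mean difference
pooled, var = fixed_effect_pool([1.2, 0.8, 1.0], [0.04, 0.09, 0.05])
print(round(pooled, 2))  # 1.05
```

Note that the pooled variance is always smaller than the smallest single-study variance, which is precisely how a meta-analysis gains statistical power over its individual trials.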
Interestingly, some of these malpractices remain harmless to their conclusions, either because they do not directly threaten the scientific knowledge, or because the strict procedures applied during systematic reviewing have the potential to detect and correct some of the malpractices, or to take them into account during the analyses. Additionally, systematic reviewers may play an important role not only by suspecting or identifying misconduct in original trials, but also by helping to improve the design and conduct of subsequent research.

4.1 Malpractices that do not threaten the conclusions of systematic reviews

Some malpractices do not threaten the scientific knowledge, and therefore do not threaten the conclusions of a well-performed systematic review either. Examples of such malpractices include guest authorship, or plagiarism of the ideas or wording of others. Although reprehensible, these malpractices are unlikely to bias the conclusions of systematic reviews. Even failure to obtain formal ethical approval for the conduct of a clinical trial, or failure to obtain patients' informed consent, should not bias the conclusions of a systematic review, provided the trial has otherwise been well performed.

Some other malpractices may theoretically threaten the scientific knowledge at the article level, but may not be such a threat to the conclusions of systematic reviews, either because they are not considered by systematic reviewers, or because they may be detected and actions may be taken to avoid drawing biased conclusions. For example, the poor formulation of a research question is unlikely to be a problem, since research based on an "unanswerable" question is likely to remain unpublished, research based on a "clinically irrelevant" question is likely not to be selected during the review process, and redundant research will tend to increase the power of a meta-analysis and the generalizability of the conclusions of a systematic review. Also, good systematic reviewing implies the critical appraisal of retrieved articles, and trials using an inadequate study design will most likely not be retained for the analyses. Changing the primary outcome during data analysis will also remain mostly harmless, because systematic reviewers include trials on the basis of predefined methodological criteria, rarely on the basis of their primary outcomes (although this can happen), and never on the basis of the statistical significance of the reported results. Systematic reviewers may detect changes made to the primary outcome by looking at the registered protocol, if available, and perform sensitivity analyses accordingly. Since quantitative systematic reviews recalculate all measures of impact of the tested intervention using the summary estimates reported at the study level, the choice of an inadequate statistical test in the primary report is unlikely to bias the results of a quantitative systematic review.
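To illustrate the recalculation of effect measures mentioned above, the sketch below recomputes a risk ratio and its 95% confidence interval directly from a trial's reported group counts, rather than trusting the statistics printed in the article. The counts are invented, and the log-scale standard error formula is the standard large-sample approximation.

```python
# Sketch: recalculating a trial's effect measure from its reported 2x2
# counts. The counts below are invented for illustration.
import math

def risk_ratio(e_t, n_t, e_c, n_c):
    # events / patients in the treatment and control groups
    rr = (e_t / n_t) / (e_c / n_c)
    se_log = math.sqrt(1/e_t - 1/n_t + 1/e_c - 1/n_c)   # SE of log(RR)
    lo, hi = (math.exp(math.log(rr) + z * se_log) for z in (-1.96, 1.96))
    return rr, lo, hi

rr, lo, hi = risk_ratio(12, 50, 24, 50)  # hypothetical trial
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```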
Although underpowered trials are likely to report an overestimated intervention effect, their potential impact when included in a systematic review remains a matter of debate.108 Interestingly, 70% of the systematic reviews published in the Cochrane database have been shown to be based only on underpowered studies.109 Moreover, systematic reviews may be viewed as designs that can make sense of underpowered trials, thereby justifying their publication, since these trials may subsequently be included in a larger meta-analysis and allow the estimation of a plausible effect that may be challenged in future studies.110 Systematic biases during patient enrolment and/or data collection can certainly bias the estimate of the effect of an intervention,111 and it is the difficult duty of systematic reviewers to critically appraise all retrieved trials and to decide whether or not biases may have distorted the reported results and therefore justify the non-inclusion of the trials in the analyses.32 Some trials fulfilling all inclusion criteria may thus finally be excluded if a major bias has been identified. For minor risks of bias, not judged severe enough to justify the exclusion of the article from the review, systematic reviewers are encouraged to rate the risk of bias using dedicated scores (for example, the Jadad scale,112 the modified Oxford scale,113 or the Cochrane risk of bias tool,114 to name a few). This quality rating allows subgroup analyses according to different risks of bias, and helps quantify the impact of bias on the review's conclusions. The choice of the scale to use, however, remains unclear to many authors.115 Personal contact with study authors is a very important part of a systematic review, as it allows clarification of methodological issues and of ill-defined primary and secondary outcomes, and also helps unearth unreported outcomes.69,105,116 It has been shown to be a valuable way to decrease selective outcome reporting.117

Other typical examples of research malpractices that remain mostly harmless to the conclusions of systematic reviews include citation malpractices (selective or self-citation) and over-interpretation of results by study authors, or conclusions that are not based on the reported results. The latter are harmless to quantitative systematic reviews, since the interpretations of study authors are rarely read by systematic reviewers. However, qualitative systematic reviews may be more sensitive to over-interpretation of results; this will be illustrated in chapter 5.4. Non-publication of completed research can bias the scientific knowledge and the conclusions of systematic reviews towards an over-estimation of the impact of an intervention,118 but sometimes also towards an underestimation of this effect,104 and there are clinically relevant examples in the medical literature of non-publication of studies with statistically significant adverse effects.119 Systematic reviewers are encouraged to search for unpublished trials,104,120,121 and/or to test for "publication bias" through visual examination of funnel plots, in an attempt to correct for these malpractices.122 Whether statistical correction of an assumed publication bias is a good idea remains a matter of debate.29 Finally, duplicate publication, meaning the publication of the results of the same trial in more than one article, can be a threat to the scientific knowledge and to systematic reviews if it remains unrecognised.123 Systematic reviewers have been shown to be well trained in detecting such duplicate reports,124 and are therefore able to identify them, exclude the duplicates, and report on the redundant publication.
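The visual funnel-plot examination mentioned above is often complemented by a formal asymmetry test. The sketch below implements the core of Egger's regression test, one common formalisation (the text does not name a specific test, so this choice is an assumption): the standardised effect is regressed on precision, and an intercept far from zero suggests small-study effects such as publication bias. All trial data are invented.

```python
# Minimal sketch of Egger's regression test for funnel-plot asymmetry:
# regress the standardised effect (effect / SE) on precision (1 / SE);
# an intercept far from zero suggests small-study (publication) bias.
# Trial data are invented for illustration.

def egger_intercept(effects, ses):
    # ordinary least squares of y = a + b * x, returning the intercept a
    y = [e / s for e, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx

# small trials (large SE) showing bigger effects -> asymmetric funnel
effects = [-1.2, -1.0, -0.8, -0.5, -0.4]
ses = [0.50, 0.40, 0.30, 0.15, 0.10]
print(f"Egger intercept: {egger_intercept(effects, ses):.2f}")
```

A full implementation would also report a significance test on the intercept; this sketch only shows the regression step.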

4.2 Malpractices that do threaten the conclusions of systematic reviews

Some malpractices, however, may bias the scientific knowledge and the conclusions of systematic reviews, since they are likely to remain unnoticed by systematic reviewers. These include data fabrication and falsification, but also the non-declaration of protocol violations (when the protocol is not prospectively registered or accessible) or of authors' conflicts of interest,125,126 ghost authorship (if associated with a conflict of interest),127,128 corrupted peer review,90 and failure to correct identified errors or to retract articles that ought to be retracted. This list only considers the most common malpractices. Systematic reviewers should therefore consider these potentially misleading malpractices when discussing the results of their review. Interestingly, recent cases of massive fraud in anaesthesiology have highlighted that some cases of malpractice and misconduct may be inter-related. For example, the lack of ethical approval may hide other science-threatening misconduct, such as data fabrication, as illustrated by the cases described in Boxes 1 and 2. Therefore, even malpractices that do not directly impact the scientific knowledge may be symptoms of more science-threatening practices.

4.3 Additional roles

The previous paragraphs have illustrated why systematic reviews are vulnerable to research malpractices and misconduct, though probably less so than original research. Interestingly, systematic reviewers may play additional roles in this context. First, the systematic review can be used as a design to investigate the prevalence of research misconduct in the published literature,17 or the impact of different malpractices on a trial's or a systematic review's conclusions,129 such as publication or outcome reporting bias,55 sponsorship,130,131 authors' financial conflicts of interest,132,133 or data falsification or fabrication,22,134 to name just a few. Secondly, because they read everything that has been published worldwide on a given topic, systematic reviewers are in a very good position to identify malpractices, or even to suspect new cases of misconduct.135 For instance, systematic reviewers can detect "copy-paste like" duplicate publications,124,136 and it has been suggested that they should also try to identify trials lacking reports of formal ethical approval.137,138 In some cases, systematic reviewers may come to suspect data fabrication as well, although there is to date no widely accepted and validated way to detect such cases.

Box 3 The Fujii case: uncovering fraud through a systematic review

Prof Yoshitaka Fujii was a Japanese anaesthesiologist whose area of research focused on the prevention of post-operative nausea and vomiting. He worked in various institutions in Japan. Suspicion regarding the reliability of his research arose in 2001, with the publication by Peter Kranke et al. of a systematic review investigating the impact of the drug granisetron on the incidence of post-operative nausea and vomiting.139 In this systematic review, it became obvious that something was wrong with the Fujii papers; the authors of the systematic review realised that the incidences of headache in the control groups of these trials were too similar to have occurred by chance. However, they were unable to prove data falsification and shared their concern through a letter to the editor.140 In 2012, Toho University asked for the retraction of 8 of Prof Fujii's trials for "failure to seek formal ethical approval". This was followed by the publication of a report by John Carlisle,141 who analysed the data reported in Fujii's articles and showed that they were not consistent with what would be expected from real samples of humans (lack of variability in different variables). An investigation was then launched, and it ended with the dramatic conclusion that, of 212 articles examined, 126 had been totally fabricated and 46 contained fabricated data, while for 37 no conclusion could be drawn. Only three trials were valid.142
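A much-simplified sketch of the kind of statistical reasoning used in the Kranke and Carlisle analyses is shown below: it asks whether control-group event counts across trials are more similar than genuine binomial sampling would allow, using a Monte Carlo comparison. The counts are invented, and this is an illustrative analogue of the idea, not the actual method of either publication.

```python
# Simplified sketch: are control-group event counts across trials MORE
# similar than chance allows? We compare the observed chi-square spread
# with the spread produced by genuine binomial sampling (Monte Carlo).
# All trial counts are invented; this is not Carlisle's actual method.
import random

def chisq(counts, n_per_trial):
    # Pearson chi-square of event/non-event counts against the pooled rate
    p = sum(counts) / (len(counts) * n_per_trial)
    exp_e, exp_ne = n_per_trial * p, n_per_trial * (1 - p)
    return sum((c - exp_e) ** 2 / exp_e
               + ((n_per_trial - c) - exp_ne) ** 2 / exp_ne
               for c in counts)

def p_too_similar(counts, n_per_trial, sims=5000, seed=1):
    # probability that random sampling yields a spread as small as observed
    rng = random.Random(seed)
    obs = chisq(counts, n_per_trial)
    p = sum(counts) / (len(counts) * n_per_trial)
    hits = 0
    for _ in range(sims):
        sim = [sum(rng.random() < p for _ in range(n_per_trial))
               for _ in range(len(counts))]
        hits += chisq(sim, n_per_trial) <= obs
    return hits / sims

# ten invented trials of 50 patients each, all reporting 9-11 headaches
suspicious = [10, 10, 9, 10, 11, 10, 10, 9, 10, 11]
print(f"P(spread this small by chance) ~ {p_too_similar(suspicious, 50):.3f}")
```

A very small probability means the reported counts are implausibly homogeneous, which is what raised suspicion in the granisetron trials.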

The problem is that it remains unclear to many systematic reviewers what the correct procedure should be when uncovering such misconduct;143 this will be discussed in chapter 5.5. Finally, systematic reviewers can serve to guide and improve the design and relevance of future research. For example, they may identify when a research question has become irrelevant, because the answer to that question is known, and highlight exactly what still needs to be investigated in order to avoid future useless research.144

TABLE 2 Systematic reviews in the context of research malpractice

Malpractices that do not threaten the scientific knowledge or the conclusions of systematic reviews:
- Incomplete review of existing literature: can serve as a basis for the identification of future research needed
- Guest authorship
- Plagiarism: can identify plagiarism
- Failure to submit the protocol to an ethics committee for approval; failure to seek patient informed consent: can identify trials that fail to report on formal ethical approval or patient consent
- Failure to register the protocol: can identify trials whose protocol is not accessible

Malpractices that can threaten the scientific knowledge, but not necessarily the conclusions of systematic reviews:
- Clinically irrelevant or unclear research question: can identify questions that have already been answered, and those which deserve further investigation
- Wrong design: can exclude inadequate designs from the analyses
- Deviation from protocol; changing the primary outcome: can identify protocol deviations and changes in primary outcome if the protocol is accessible
- Lack of analysis plan, or wrong statistical tests applied: can recalculate measures of effect
- Lack of sample size calculation: can pool underpowered trials
- Selection or information bias; violation of blinding: can rate the risk of bias through critical appraisal
- Ill-defined primary and secondary outcomes; selective reporting of outcomes: can contact study authors to clarify outcome definitions and unearth unreported outcomes
- Selective or self-citation: not considered by systematic reviewers
- Biased or over-interpretation of results: not considered for quantitative systematic reviews
- Failure to publish research: can search for unpublished trials or test for publication bias
- Duplicate publication: can identify duplicate publications and exclude duplicates from the analyses

Malpractices that threaten the scientific knowledge and the conclusions of systematic reviews:
- Data falsification (ignoring outliers)*
- Data fabrication*
- Not reporting protocol violations
- Not declaring conflicts of interest
- Ghost authorship (with conflict of interest)
- Peer-review organisation
- Non-correction of identified errors
- Non-retraction of articles that warrant retraction

*May be suspected in exceptional cases, although inconsistently.


5 Personal contributions in this area of research

The following chapter reports my personal contributions regarding the problems of research malpractices and misconduct for systematic reviewers. Chapter 5.1 is an example of a systematic review performed to evaluate the impact of adding ketamine to an anaesthetic-analgesic regimen in order to achieve better control of postoperative pain. This article illustrates how the procedures underlying systematic reviewing routinely lead to the identification of common errors and of suboptimal reporting of research outcomes, but also how systematic reviewers work to "fill in the gaps" and to correct poor quality or badly reported research. Chapter 5.2 explores an overlooked threat to systematic reviews: the non-retraction of articles that warranted retraction. Although some articles had been recognised as fraudulent and required retraction, the retraction process was not straightforward, and some articles may have remained unretracted, their status unknown to systematic reviewers. Chapter 5.3 is a re-analysis of systematic reviews that illustrates what may happen to the conclusions of systematic reviews when fraudulent data have been inadvertently included in meta-analyses. Chapter 5.4 is a cross-sectional analysis of systematic reviews, together with a survey of their authors, that examines the hypothesis that systematic reviewers are in a privileged position to identify or suspect research malpractices or misconduct, and that they are both aware of this and ready to apply protective measures accordingly. Finally, chapter 5.5 proposes a new role for systematic reviewers, which consists of guiding researchers to avoid performing research on clinically irrelevant questions, and of helping them improve their study designs.


5.1 Ketamine for post-operative pain - a quantitative systematic review of randomised trials

Authors: Elia N and Tramèr MR
Published in: Pain 2005; 113: 61-70
IF 2005: 4.309

Summary

In this systematic review with meta-analyses, our aim was to identify the impact of adding ketamine to anaesthesia on the control of postoperative pain and on adverse effects. We included 53 trials (about 3000 patients) originating from 25 countries and reporting on many different settings. This example illustrates that systematic reviewing implies searching for published and unpublished trials (through contacts with the manufacturer of ketamine), in any language (retrieved articles were written in English, German, Spanish and Japanese), contacting study authors to search for additional information (11 of the 20 authors contacted answered), and rating trial quality (which was performed, ten years ago, using a modified Oxford scale based on randomisation, concealment of allocation, blinding, and reporting of the flow of patients). This example also illustrates how systematic reviewers may identify studies with inadequate control groups (69 trials identified and excluded from the analyses), identify problems related to the selective non-reporting of adverse effects and to the reporting of non-relevant outcomes, and deplore the small sizes of the trials performed in anaesthesiology and the fact that some authors are unable or unwilling to respond to inquiries. Finally, this systematic review illustrates how the lack of a rational clinical research agenda can result in unclear evidence of the effect of an intervention despite the randomisation of about 3000 patients, and it provides an important, useful and clear research agenda to guide further research.

The interested reader may want to read a recent systematic review from the same authors testing the impact of ketamine in a very specific setting (mixed with an opioid in a patient-controlled analgesia device) on postoperative pain and side effects. Although the topic and the authors remained similar, the methods used have changed, especially regarding the evaluation of trial quality and the use of trial sequential analyses. This article was not included in the present dissertation due to space limitations.

Ref: Benefit and harm of adding ketamine to an opioid in a patient-controlled analgesia device for the control of postoperative pain: systematic review and meta-analyses of randomized controlled trials with trial sequential analyses. Assouline B, Tramèr MR, Kreienbühl L, Elia N PAIN (in press)

Pain 113 (2005) 61–70 www.elsevier.com/locate/pain

Ketamine and postoperative pain – a quantitative systematic review of randomised trials

Nadia Elia*, Martin R. Tramèr

EBCAP Institute (Evidence-Based Critical care, Anaesthesia and Pain treatment), Division of Anaesthesiology, Geneva University Hospitals, 24 Rue Micheli-du-Crest, CH-1211 Geneva 14, Switzerland

Received 5 May 2004; received in revised form 10 September 2004; accepted 28 September 2004

Abstract

Ketamine, an N-methyl-D-aspartate receptor antagonist, is known to be analgesic and to induce psychomimetic effects. Benefits and risks of ketamine for the control of postoperative pain are not well understood. We systematically searched for randomised comparisons of ketamine with inactive controls in surgical patients, reporting on pain outcomes, opioid sparing, and adverse effects. Data were combined using a fixed effect model. Fifty-three trials (2839 patients) from 25 countries reported on a large variety of different ketamine regimens and surgical settings. Sixteen studies tested prophylactic intravenous ketamine (median dose 0.4 mg/kg, range 0.1–1.6) in 850 adults. Weighted mean difference (WMD) for postoperative pain intensity (0–10 cm visual analogue scale) was −0.89 cm at 6 h, −0.42 at 12 h, −0.35 at 24 h and −0.27 at 48 h. Cumulative morphine consumption at 24 h was significantly decreased with ketamine (WMD −15.7 mg). There was no difference in morphine-related adverse effects. The other 37 trials tested, in adults or children, prophylactic or therapeutic ketamine orally, intramuscularly, subcutaneously, intra-articularly, caudally, epidurally, transdermally, peripherally or added to a PCA device; meta-analyses were deemed inappropriate. The highest risk of hallucinations was in awake or sedated patients receiving ketamine without benzodiazepine; compared with controls, the odds ratio (OR) was 2.32 (95%CI, 1.09–4.92), number-needed-to-harm (NNH) 21. In patients undergoing general anaesthesia, the incidence of hallucinations was low and independent of benzodiazepine premedication; OR 1.49 (95%CI 0.18–12.6), NNH 286. Despite many published randomised trials, the role of ketamine, as a component of perioperative analgesia, remains unclear.

© 2004 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.

Keywords: Systematic review; Meta-analysis; Surgery; Postoperative; Pain; Analgesia

1. Introduction

The identification of the N-methyl-D-aspartate (NMDA) receptor and its role in pain perception has brought a new interest to ketamine as a possible adjuvant to multimodal pain treatment (Klepstad et al., 1990). During the last 15 years, a large number of clinical trials have been published that tested this NMDA-receptor antagonist for the management of postoperative pain. However, the evidence for a beneficial effect of ketamine in this setting remains controversial, and the fear of psychomimetic adverse effects such as hallucinations or nightmares limits its widespread use in clinical practice.

In 1999, Schmid et al. published a review on ketamine for postoperative pain treatment (Schmid et al., 1999). The authors concluded that ketamine was analgesic and useful in surgical patients. However, the reviewed trials reported on a wide diversity of clinical settings and ketamine regimens, and the authors were unable to quantify the analgesic effect and the risk of harm of ketamine. Many new reports have been published since. Also, to aid in clinical decision making, health care providers need to know how well ketamine works as an analgesic and what the incidence of adverse effects is. Thus, valid quantitative information is needed about minimal effective dose, dose-responsiveness,

* Corresponding author. Tel.: 41 22 382 75 17; fax: 41 22 382 75 11. E-mail address: [email protected] (N. Elia). URL: http://www.hcuge.ch/anesthesie/data.htm

0304-3959/$20.00 © 2004 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.pain.2004.09.036

62 N. Elia, M.R. Tramèr / Pain 113 (2005) 61–70

and adverse effect profile. The aim of this systematic review is to address these issues.

Box 1: Modified Oxford Scale

Validity score (0–7)
Randomisation: 0 None; 1 Mentioned; 2 Described and adequate
Concealment of allocation: 0 None; 1 Yes
Double blinding: 0 None; 1 Mentioned; 2 Described and adequate
Flow of patients: 0 None; 1 Described but incomplete; 2 Described and adequate

2. Methods

We followed the QUOROM statement that recommends standards to improve the quality of reporting of meta-analyses (Moher et al., 1999).

2.1. Systematic search

Published reports of randomised trials testing ketamine for postoperative pain management were searched in Medline, Embase, CINHAL, Biosis previews, Indmed, and the Cochrane Controlled Trials Register. We used the free text and MeSH terms 'ketamine', 'ketalar', 'ketanest', 'keta*', 'surgery', 'surgical', 'post(-)operative', 'pain treatment', 'analgesia' and 'analgesic', without language restriction. We limited our search to randomised trials and humans. The last electronic search was in November 2003. We identified additional studies from the bibliographies of retrieved reports and reviews (Eide et al., 1995; Granry et al., 2000; Kohrs and Durieux, 1998; Schmid et al., 1999; White et al., 1982). We contacted the manufacturer of ketamine (Pfizer AG, 8052 Zürich, Switzerland) and asked for additional reports including unpublished data. We contacted authors of original reports for translation or to obtain additional information.

2.2. Inclusion and exclusion criteria

We considered randomised trials with an inactive (placebo or 'no treatment') control group, and that reported on postoperative pain outcomes or adverse drug reactions related to ketamine. Relevant pain outcomes included pain intensity, time to first analgesic request, postoperative cumulative morphine consumption and morphine-related adverse effects. We excluded trials including less than 10 patients per group and those reporting on premedication, chronic pain, or emergency medicine. Data from animal studies, abstracts, letters or reviews were not considered.

2.3. Validity scoring

One author (NE) screened the abstracts of all retrieved reports and excluded articles that clearly did not meet our inclusion criteria. Both authors then independently read the included reports and assessed their methodological validity using a modified 7-point, 4-item Oxford scale (Box 1) (Jadad et al., 1996; Pasquina et al., 2003). We resolved discrepancies by discussion. As there was a prior agreement that we would exclude reports without randomisation, the minimum score of an included trial was one, and the maximum score was seven.

2.4. Data extraction

We extracted information about ketamine (time and route of administration, dose), number of patients enrolled and analysed, observation period, surgery, postoperative analgesia and pain outcomes. Dichotomous data on presence or absence of adverse effects were extracted. There was a pre-hoc agreement that a morphine-sparing effect would be clinically relevant if there was a decrease in morphine-related adverse effects. Definitions of adverse effects were taken as reported in the original trials.

To facilitate comparisons between trials, we normalised ketamine regimens to milligrams per kilogram of bodyweight (mg/kg). When the trials did not report on average bodyweight of the studied population, we assumed it was 70 kg.

Ketamine administration at induction of anaesthesia and during or immediately after surgery (before the patient complained of pain) was regarded as a prophylactic regimen. Postoperative administration in a patient with overt pain was defined as a therapeutic regimen. When a trial tested ketamine before versus after surgery, and there was no evidence of any difference, we combined the data.

2.5. Meta-analyses

For continuous data, we calculated weighted mean differences (WMD) with 95% confidence interval (CI). If the authors did not report on mean values and standard deviations, we contacted them. If they did not answer and the data were reported as graphs, we extracted the data from the graphs. Dichotomous data on adverse effects were summarised using relative risks (RR) with 95%CI. For rare outcomes (for instance, psychomimetic adverse effects) we computed Peto odds ratios (OR) that deal better with zero cells. If the 95%CI included one, we assumed that the difference between ketamine and control was not statistically significant. To estimate the clinical relevance of an adverse effect we calculated the number-needed-to-harm (NNH)

(McQuay and Moore, 1998). NNH were calculated whether or not the OR suggested a statistically significant difference between ketamine and control. We used a fixed effect model throughout since we combined data only when they were clinically homogenous. Analyses were performed using ReviewManager software (version 4.0, Cochrane Collaboration) and Excel. Data were graphically plotted using forest plots to evaluate treatment effects, and L'Abbé plots (event rate scatters) to explore variability of the data.

Table 1. Included randomised trials, according to clinical settings
(Route / Goal / Nb trials / Nb patients K/C / Mg/kg)
Adults:
- IV / P / 16 / 509/341 / 0.4 [0.1; 1.6]
- IV PCA / T / 5 / 147/137 / 1.0 [0.3; 2.1]
- Epid / P / 10 / 285/169 / 0.3 [0.1; 1.2]
- Epid / T / 2 / 65/66 / 0.7 [0.5; 1.0]
- IVO24 h / PCT / 3 / 64/48 / 3.4 [2.0; 6.8]
- IV (sedation) / P / 3 / 126/76 / 0.9 [0.5; 6.9]
Children:
- IV / P / 2 / 44/32 / 0.5 [0.5; 1.0]
- Caudal / P / 4 / 82/86 / 0.5 [0.3; 0.5]
Other regimens in adults or children:
- PO, IM, IA, IV, SC, TTS, NB / P/T / 10 / 249/195 / 0.5 [0.1; 10]

Number of trials do not add up since 2 trials tested two routes of administration. IV, intravenous; Epid, epidural; PO, per os; IM, intramuscular; IA, intra-articular; SC, subcutaneous; TTS, transcutaneous transport system; NB, nerve block (brachial plexus block and Bier's block); PCA, patient controlled analgesia; P, prophylactic; T, therapeutic; K, ketamine; C, control. Regimens have been converted to (cumulative) mg/kg taking into account average bodyweights as reported in the original trials. Ketamine regimens are expressed as median [range].

3. Results

3.1. Systematic search

We identified 241 trials; 188 were subsequently excluded (Appendix A: Flow chart). Fifty-three valid randomised trials tested ketamine in 2,721 adults and children (references, see included randomised trials; detailed information, see Appendix A: Analysed trials). The manufacturer of ketamine did not provide any data. We contacted the authors of 20 reports to obtain additional information; eleven answered (Ngan Kee et al., 1997; Lee and Sanders, 2000; Kakinohana et al., 2000; Burstal et al., 2001; Subramaniam et al., 2001a; Ünlügenç et al., 2002; Jaksch et al., 2002; Ünlügenç et al., 2003; Rosseland et al., 2003; Kararmaz et al., 2003; Xie et al., 2003). We did not retrieve any unpublished data. The reports were published between 1971 and 2003. They were written in English (n=49), German (2), Spanish (1), and Japanese (1), and originated from 25 countries (six from Turkey; five from the US; four each from China and Germany; three each from France, Norway and the UK; two each from Australia, Belgium, Brazil, Greece, India, Israel and Japan; and one each from Austria, Denmark, Ireland, New Zealand, Saudi Arabia, South Africa, South Korea, Spain, Sweden, Taiwan and United Arab Emirates). Two studies declared sponsorship by the manufacturer (Reeves et al., 2001; Weber and Wulf, 2003).

The median number of patients receiving ketamine per study was 25 (range, 10–105). The median modified Oxford scale score was 4 (range, 2–6). Four trials tested S(+) ketamine (Himmelseher et al., 2001; Lauretti et al., 2001; Jaksch et al., 2002; Weber and Wulf, 2003); all others used racemic ketamine. The trials described a large variety of ketamine regimens and surgical settings (Table 1). Two trials tested two different regimens (De Kock et al., 2001; Xie et al., 2003). The largest clinically homogenous subgroup (16 trials) tested prophylactic intravenous ketamine in adults undergoing general anaesthesia. These trials shall be discussed in more detail.

3.2. Prophylactic intravenous ketamine

3.2.1. Trials

Of the 16 trials, two only (Ngan Kee et al., 1997; Roytblat et al., 1993) had been included in the review by Schmid et al. (1999). However, of those considered by Schmid et al., seven were excluded by us: four lacked an inactive control group (Fu et al., 1997; Jahangir et al., 1993; Owen et al., 1987; Wilder-Smith et al., 1998), one was on healthy volunteers (Maurset et al., 1989), one had less than 10 patients per group (Tverskoy et al., 1994), and one was not randomised (Clausen et al., 1975).

In the 16 trials, 889 adults were randomised and data from 850 were subsequently analysed by the original investigators. Most trials reported on follow-up periods of 24–48 h. One trial reported on additional outcomes after three days (Menigaux et al., 2001), one after five days (Jaksch et al., 2002), and one after one year (De Kock et al., 2001).

3.2.2. Surgical settings and ketamine regimens

Surgeries were abdominal (7 trials), gynaecologic (4), orthopaedic (3), maxillo-facial (1), and outpatient surgery (1). In 11 trials, a single bolus of ketamine was injected at induction of anaesthesia or immediately after surgery. In four, a preoperative bolus was followed by a continuous infusion throughout surgery (De Kock et al., 2001; Guignard et al., 2002; Jaksch et al., 2002; Kararmaz et al., 2003). One trial tested both bolus injection and continuous infusion (Heinke and Grimm, 1999). In three trials, an identical dose of ketamine was administered either before surgery or at the end of surgery; in none of these did the time of administration have an impact on the outcome (Dahl et al., 2000; Gilabert Morell and Sanchez Perez, 2002; Menigaux et al., 2000). The average body weight in 51 trials was 70.3 kg; the median dose of ketamine across all trials was 0.4 mg/kg (range, 0.1–1.6).

3.2.3. Analgesic efficacy of intravenous ketamine

3.2.3.1. Pain intensity. Ten trials reported on pain intensity at rest, measured on a 0–10 cm visual analogue scale (Fig. 1; detailed data see Appendix A: VAS of Pain Intensity). In controls, pain intensity rarely reached more than 4 cm on the 10 cm scale. Combined data showed a consistent and statistically significant decrease in pain intensity at rest with ketamine compared with control at 6 h postoperatively (WMD, −0.89 cm), 12 h (−0.42 cm), 24 h (−0.35 cm), and 48 h (−0.27 cm). We found no evidence of a relationship between the dose of ketamine and analgesic efficacy. Four trials reported on pain intensity on movement (De Kock et al., 2001; Jaksch et al., 2002; Kararmaz et al., 2003; Ngan Kee et al., 1997), each at different time points; meta-analysis was deemed inappropriate.

Fig. 1. VAS of pain intensity (0–10 cm). On the L'Abbé plots, the sizes of the symbols represent the sizes of the trials. WMD, weighted mean difference; CI, confidence interval; VAS, visual analogue scale of pain intensity; for detailed information see Appendix A.

3.2.3.2. Cumulative morphine consumption at 24 h. Seven trials reported on morphine consumed with a PCA pump during the first 24 h after surgery. In two trials (De Kock et al., 2001; Guignard et al., 2002) the data were not reported as means ± standard deviations and the authors did not respond to our inquiries. In four trials, average morphine consumption in controls was between 34.2 and 49.7 mg. In one trial, ketorolac was added to the morphine in the PCA; in that trial, average cumulative morphine consumption in controls was 15.6 mg only (Gilabert Morell and Sanchez Perez, 2002). All trials except one (Jaksch et al., 2002) reported on a statistically significant decrease in morphine consumption with ketamine; morphine-sparing ranged from 9% (Jaksch et al., 2002) to 47% (Menigaux et al., 2000) (median, 32%). When data from all four trials were combined, WMD in favour of ketamine was about −16 mg. Including the trial that was using ketorolac did not change that result (Fig. 2A; detailed data see Appendix A: Cumulative Morphine Consumption).

Fig. 2. Opioid-sparing. (A) 24 h morphine consumption. On the L'Abbé plot, the sizes of the symbols represent the sizes of the trials; WMD, weighted mean difference; CI, confidence interval. (B) Opioid-related adverse effects. For each adverse effect a meta-analysis is shown; RR, relative risk; PONV, postoperative nausea and vomiting; n, number of patients with adverse effect; N, total number of patients. For detailed information see Appendix A.

3.2.3.3. Opioid-related adverse effects. In 12 trials, postoperative analgesia was with systemic morphine, piritramid or fentanyl; two of those did not report on opioid-related adverse effects (Heinke and Grimm, 1999; Ngan Kee et al., 1997). One study reported on the number of patients presenting less than five episodes of nausea (De Kock et al., 2001); these data could not be combined with nausea data from the other trials. The nine remaining trials reported on presence or absence of nausea (5 trials), vomiting (4), nausea or vomiting (4), pruritus (2), drowsiness (2), or urinary retention (3) (Fig. 2B; detailed data see Appendix A: Opioid-Related Adverse Effects). Combined data showed no statistically significant difference between ketamine and control for any of these opioid-related adverse effects.

3.2.3.4. Time to first request of analgesic. Seven trials reported on the time when patients first requested an analgesic postoperatively (see Appendix A: Time to First Request of Analgesic). In six of seven trials, average delay in controls ranged from 10 to 30 min. In one study, wound infiltration with ropivacaine 200 mg increased this delay to 121 min (Papaziogas et al., 2001). When data from all seven trials were combined, the average improvement with ketamine was about 16 min. Exclusion of the trial that was using ropivacaine did not change that result.

3.2.3.5. Long term follow up. In one trial, patients were contacted after two weeks, one month, six months and one year and were asked if they felt any pain or particular sensations at the scar area, and whether this pain required medication (De Kock et al., 2001). At six months, significantly fewer patients who had received 'high dose' intravenous ketamine (bolus 0.5 mg/kg, infusion 0.25 mg/kg/h during surgery) suffered from residual pain requiring analgesic medication. After one year, three of 17 controls still had pain compared with none of those who had received intravenous ketamine.

3.3. Other ketamine regimens

Thirty-seven trials tested a variety of regimens and routes of administration of ketamine in children or adults (Appendix A: Analysed Trials).

3.3.1. Ketamine added to an opioid in an intravenous PCA device
In five trials (284 adults), ketamine was added to morphine or tramadol in a PCA. Regimens ranged from 1 to 2.5 mg of ketamine per mg of morphine, and 0.2 mg of ketamine per mg of tramadol. Meta-analysis was deemed inappropriate. Authors of three trials concluded that ketamine decreased postoperative pain intensity; the two others concluded that it did not.

3.3.2. Epidural ketamine in adults
Ten trials reported on prophylactic epidural ketamine in 454 adults. Two further trials reported on therapeutic epidural ketamine in 131 patients. There was a large variability in epidural regimens; meta-analysis was deemed inappropriate. Authors of five of ten prophylactic and of both therapeutic studies concluded that ketamine decreased postoperative pain; the five others concluded that it did not.

3.3.3. Prolonged (>24 h) intravenous ketamine in adults
In three trials (112 patients), an intravenous infusion of ketamine was started at the beginning of surgery and continued until 24 or 48 h after surgery. Meta-analysis was deemed inappropriate. Authors of one report concluded that ketamine decreased postoperative pain; the two others concluded that it did not.

3.3.4. Intravenous ketamine in sedated adults
Three trials reported on intravenous ketamine during sedation in 36 intubated patients in the intensive care unit, and 166 patients undergoing breast biopsy or cataract extraction. Surgical procedures were not comparable and pain measurements were inconsistent; meta-analysis was deemed inappropriate. Authors of all reports concluded that ketamine was beneficial during sedation since it decreased the number of rescue analgesics that were needed during surgery and the number of patients with inadequate sedation.

3.3.5. Intravenous and caudal ketamine in children
Two paediatric trials (76 children) tested prophylactic intravenous ketamine, and four (168 children) reported on ketamine added to caudal ropivacaine, bupivacaine, or alfentanil. Postoperative pain measurements were inconsistent; meta-analysis was deemed inappropriate. All authors concluded that ketamine decreased postoperative pain.

3.3.6. Other ketamine regimens
Ten trials (444 patients) reported on a variety of other regimens. Ketamine was given intra-articularly, intramuscularly, subcutaneously, orally, transdermally, added to a brachial plexus block with local anaesthetic, as an adjuvant to intravenous regional anaesthesia, or as a combination of an intravenous preoperative bolus with postoperative PCA. Authors of five reports concluded that ketamine was beneficial; the five others concluded that it was not.

3.4. Safety analysis

3.4.1. Hallucinations
Thirty trials reported on hallucinations. We tested the impact of the dose of ketamine, of a premedication with benzodiazepines, and of patients' vigilance at the time of drug administration on the risk of hallucinations. Benzodiazepines were midazolam, diazepam, lorazepam, lormetazepam, or temazepam. We assumed that for these sensitivity analyses, the route of administration of ketamine was not a factor. There was no graphical evidence of a relationship between the dose of ketamine and the risk of having hallucinations (Appendix A: Risk of Hallucinations According to the Weight Adjusted Ketamine Dose).

To test the impact of concomitant benzodiazepines and patients' vigilance we performed four subgroup analyses (Fig. 3). In eight trials, patients received ketamine during general anaesthesia after benzodiazepine premedication (Abdel-Ghaffar et al., 1998; De Kock et al., 2001; Roytblat et al., 1993; Gilabert Morell and Sanchez Perez, 2002; Guignard et al., 2002; Papaziogas et al., 2001; Subramaniam et al., 2001b; Weir et al., 1998). Three of 289 (1%) patients experienced hallucinations with ketamine, compared with one of 154 (0.6%) controls; OR 1.49 (95%CI, 0.18–12.6), NNH 257. In four trials, 107 patients received ketamine (91 controls) during general anaesthesia without a benzodiazepine premedication (Lee and Sanders, 2000; Menigaux et al., 2000; Menigaux et al., 2001; Ozbek et al., 2002). There were no reports of hallucinations. Thus, in patients undergoing a general anaesthetic with or without benzodiazepine premedication, three of 396 (0.8%) presented hallucinations with ketamine compared with one of 245 (0.4%) controls; OR 1.49 (95%CI, 0.19–12.6), NNH 286.

Fig. 3. Risk of hallucinations according to patients' vigilance and premedication with benzodiazepines. Each symbol represents one comparison between ketamine and control. GA, general anaesthesia; benzo, benzodiazepine; NNH, number-needed-to-harm. Benzodiazepines were midazolam, diazepam, lorazepam, lormetazepam or temazepam.
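The NNH values in these subgroup analyses follow the usual definition: the reciprocal of the absolute risk difference between the pooled ketamine and control groups (McQuay and Moore, 1998). A minimal sketch (the function name `nnh` is ours; note that the odds ratios in the paper were pooled at trial level, so a crude OR computed from these totals would differ slightly):

```python
def nnh(events_trt, n_trt, events_ctl, n_ctl):
    """Number-needed-to-harm: reciprocal of the absolute risk increase."""
    risk_difference = events_trt / n_trt - events_ctl / n_ctl
    return 1.0 / risk_difference

# Pooled hallucination counts as reported in the review:
# anaesthetised patients with benzodiazepine premedication, 3/289 vs 1/154
print(round(nnh(3, 289, 1, 154)))    # 257, as reported
# all awake or sedated patients, 33/447 vs 13/348
print(round(nnh(33, 447, 13, 348)))  # 27, as reported
```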

In seven trials (eight comparisons), patients were awake or sedated after a benzodiazepine premedication (Adriaenssens et al., 1999; Azevedo et al., 2000; Badrinath et al., 2000; Ilkjaer et al., 1998; Lauretti et al., 2001; Subramaniam et al., 2001a; Unlügenç et al., 2002). Of 217 patients who received ketamine, nine (4.1%) had hallucinations, compared with two of 155 (1.3%) controls; OR 2.19 (95%CI, 0.58–8.27), NNH 35. In nine trials, patients were awake or sedated without a benzodiazepine premedication (Frey et al., 1999; Gorgias et al., 2001; Hercock et al., 1999; Huang et al., 2000; Joachimsson et al., 1986; Lee et al., 2002; Qureshi et al., 1995; Reeves et al., 2001; Yanli and Eren, 1996). Of 230 patients who received ketamine, 24 (10.4%) had hallucinations, compared with 11 of 193 (5.7%) controls; OR 2.32 (95%CI, 1.09–4.92), NNH 21. Thus, of all awake or sedated patients, 33 of 447 (7.4%) presented hallucinations with ketamine compared with 13 of 348 (3.7%) controls; OR 2.28 (95%CI, 1.19–4.40), NNH 27.

3.4.2. Nightmares
Thirteen trials reported on the incidence of nightmares. With ketamine, 10 of 421 (2.4%) patients had nightmares, compared with two of 262 (0.8%) controls; OR 2.64 (95%CI, 0.76–9.12), NNH 62.

3.4.3. Pleasant dreams
Eight trials reported on pleasant dreams. With ketamine, 29 of 159 patients (18.2%) had pleasant dreams, compared with 14 of 145 (9.7%) controls; OR 1.96 (95%CI, 1.01–3.81), NNT 12.

3.4.4. Visual disturbances
Thirteen trials reported on visual disturbances (diplopia or nystagmus). With ketamine, 23 of 373 (6.2%) patients had visual disturbances, compared with seven of 271 (2.6%) controls; OR 2.34 (95%CI, 1.09–5.04), NNH 28.

4. Discussion

4.1. Efficacy and harm

Knowing that ketamine has some analgesic effect in surgical patients begs the question as to whether it should be used more frequently as an adjuvant to multimodal perioperative analgesia. When administered intravenously during anaesthesia in adults, ketamine decreases postoperative pain intensity up to 48 h, decreases cumulative 24 h morphine consumption, and delays the time to first request of rescue analgesic. When assessing the clinical relevance of these potentially beneficial effects, several issues need to be considered. Firstly, pain intensity was decreased by about 1 cm on a 10 cm pain scale at 6 h; at subsequent time points, this benefit further decreased but the effect was still statistically significant at 48 h. In control groups, pain intensity was about 4 cm on the 10 cm scale; thus, patients who received ketamine had a decline in pain intensity of about 25% at 6 h and of about 20% at 24 h compared with those who received a placebo. Since patients in control groups did not experience very severe pain, the clinical relevance of this improvement remains unclear (Kalso et al., 2002). Secondly, in clinical practice, both decreased pain intensity and decreased opioid consumption may be regarded as two linked proxies of analgesic efficacy, and it may not be feasible to clearly separate between these two endpoints. During the first 24 postoperative hours, controls consumed about 40 mg of morphine on average; this corresponds to the usual average amount of morphine consumed during the first day after major surgery (Walder et al., 2001). In four of five trials, this dose of morphine was reduced by 27–47%, and at the same time, patients had a reduction in pain intensity. Thus, the beneficial albeit moderate effect of ketamine on pain intensity should be interpreted in conjunction with the opioid-sparing effect. Thirdly and interestingly, there was no decrease in the incidence of morphine-related adverse effects although morphine consumption was clearly reduced with ketamine. One reason may be that these trials mainly concentrated on efficacy and did not systematically evaluate and report on adverse effects. This limitation on the reporting of adverse effects has been described in other pain settings (Edwards et al., 1999). Another reason may be that the decrease in morphine consumption was not strong enough to impact the incidence of morphine-related adverse effects; this would weaken the clinical relevance of the opioid-sparing. Finally, with ketamine, the delay until patients requested the first rescue analgesic was prolonged by about 16 min. This result was statistically significant but of no clinical importance; however, it was consistent with the overall analgesic efficacy of ketamine. One trial only looked at long term outcomes (De Kock et al., 2001). Although that study included a limited number of patients, the authors were able to show a beneficial effect of ketamine on the persistence of painful sensations around the scar for up to 6 months after surgery. These data support the biological basis of ketamine as an antagonist at the NMDA receptor.

In children, there was some evidence for an analgesic effect with intravenous or caudal ketamine. All paediatric trials concluded that ketamine may be of use. The role of ketamine as an adjunct to morphine-PCA, however, remains unclear.

A further issue that needs to be considered when discussing the usefulness of ketamine as a component of perioperative analgesia is harm. The risk of psychomimetic adverse effects such as hallucinations is perhaps the main reason why many clinicians are apprehensive in using ketamine. Sensitivity analyses provided some evidence that patients' vigilance, and less so concomitant use of benzodiazepines, had an impact on the risk of hallucinations. Three conclusions may be drawn. Firstly, in anaesthetised patients, the risk of ketamine-induced hallucinations is minimal. Secondly, benzodiazepines should not be regarded as a universally effective protection against hallucinations; even with benzodiazepine premedication, one in 35 non-anaesthetised patients will have hallucinations who would not have done so had they not received ketamine. Finally, ketamine should be given very carefully to patients who are not anaesthetised; patients should be informed about this potential risk and they should be carefully observed during the procedure. There were also other ketamine-related adverse effects described such as pleasant dreams or visual disturbances; the clinical relevance of those is not obvious.

4.2. Limitations

This systematic review has several limitations; most of them are related to methodological weaknesses of the original studies. For example, the original trials did not allow us to identify patient-related predictive factors for the emergence of hallucinations or to distinguish between the efficacy of different benzodiazepines. Also, we were unable to establish a dose-response for efficacy or harm, nor were we able to quantify the impact of perioperative ketamine on the emergence of chronic pain. One of the major problems was the wide variability in clinical settings, ketamine regimens and reported outcomes. Fifty-three trials were conducted in 25 countries. This illustrates a large and worldwide interest in this drug. However, most trials were of limited size; groups rarely exceeded 30 patients. Also, for some routes of administration, such as transdermally or intra-articularly, distinct pharmacokinetic profiles made it impossible to combine these trials with the others. There was a large variety in outcome measures, and those outcomes that might be regarded as clinically relevant (i.e. opioid-related adverse effects) were inconsistently reported. Finally, data reporting was often unsatisfactory; multiple trials presented data as figures only and authors, when asked for more detailed information, were unable or unwilling to respond to our enquiry. As a consequence, for the majority of the regimens, data pooling was impossible or was deemed inadequate. All these problems have been described before for other acute pain settings (Walder et al., 2001). It is tempting to believe that this poor output, despite a large number of randomised patients, was mainly due to the obvious lack of a rational clinical trial program. Indeed, two trials only declared sponsorship by the manufacturer, suggesting that ketamine has never reached high priority for the manufacturer.

Ketamine for chronic pain (Hocking and Cousins, 2003) and for cancer pain (Bell et al., 2003) has been reviewed recently. The authors concluded that the evidence for the efficacy of ketamine in the chronic pain setting was weak due to a lack of high quality, randomised trials with an appropriate number of patients and standardised ketamine regimens.

4.3. Research agenda

This systematic review highlights some areas where further research with ketamine is warranted but also areas where further research is not justified. For instance, three trials were unable to show a benefit when ketamine was administered preemptively, echoing similar negative results of preemptive analgesia with other agents (Moiniche et al., 2002). We were unable to identify a dose-response for analgesic efficacy, and in anaesthetised patients, the risk of psychomimetic adverse effects was not a limiting factor. Thus, future studies could test, in anaesthetised patients, higher doses than those reported in these trials. Patients should undergo major or painful procedures; only then may the analgesic efficacy of ketamine be measured (Kalso et al., 2002). Trials should include long-term follow-up (up to one year) to investigate the impact of perioperative ketamine on chronic pain. Further studies should be of reasonable size, and should report on clinically relevant endpoints of both analgesic efficacy (VAS of pain intensity, 24 h cumulative morphine consumption) and harm (psychomimetic effects, prolonged sedation). A further interesting setting that should be studied is that of morphine-resistant pain. In postoperative patients who experienced only partial pain relief with morphine (i.e. VAS pain intensity >6/10 despite intravenous morphine 0.1 mg/kg), small doses of ketamine (0.25 mg/kg, up to three times) rapidly improved pain relief (Weinbroum, 2003). These observations support the role of the NMDA receptor in the acute pain setting. Adverse drug reactions should be carefully watched for and any alleged opioid-sparing should be confirmed by a reduction in opioid-related adverse effects. Finally, before administering ketamine to patients who are not anaesthetised, a careful evaluation of the expected benefits and risks should be established.

Conflict of interest statement

None declared.

Acknowledgements

Special thanks go to D. Haake for his help in searching electronic databases, to G. Elia for statistical help, and to Drs R. Burstal, W. Jaksch, M. Kakinohana, A. Kararmaz, H.M. Lee, W.D. Ngan Kee, L.A. Rosseland, K. Subramaniam, H. Unlügenç, and H. Xie, who responded to our inquiries. Dr Tramèr was a beneficiary of a PROSPER grant (No 3233-051939.97/2) from the Swiss National Science Foundation. The salary of Dr Elia was provided by the Swiss National Science Foundation (No 3200-064800.01/1) and by the EBCAP Institute.

Appendix A

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.pain.2004.09.036

References

Bell RF, Eccleston C, Kalso E. Ketamine as adjuvant to opioids for cancer pain. A qualitative systematic review. J Pain Sympt Manag 2003;26:867–75.
Clausen L, Sinclair DM, Van Hasselt CH. Intravenous ketamine for postoperative analgesia. S Afr Med J 1975;49:1437–40.
Edwards JE, McQuay HJ, Moore RA, Collins SL. Reporting of adverse effects in clinical trials should be improved: lessons from acute postoperative pain. J Pain Sympt Manag 1999;18(6):427–37.
Eide PK, Stubhaug A, Oye I. The NMDA-antagonist ketamine for prevention and treatment of acute and chronic post-operative pain. Baillière's Clin Anaesthesiol 1995;9:539–54.
Fu ES, Miguel R, Scharf JE. Preemptive ketamine decreases postoperative narcotic requirements in patients undergoing abdominal surgery. Anesth Analg 1997;84:1086–90.
Granry JC, Dube L, Turroques H, Conreux F. Ketamine: new uses for an old drug. Curr Opin Anaesthesiol 2000;13:299–302.
Hocking G, Cousins MJ. Ketamine in chronic pain management: an evidence-based review. Anesth Analg 2003;97:1730–9.
Jadad AR, Carroll D, Moore RA, McQuay HJ. Developing a database of published reports of randomised clinical trials in pain research. Pain 1996;6:239–46.
Jahangir SM, Islam F, Aziz L. Ketamine infusion for postoperative analgesia in asthmatics: a comparison with intermittent meperidine. Anesth Analg 1993;76:45–9.
Kalso E, Smith L, McQuay HJ, Moore RA. No pain, no gain: clinical excellence and scientific rigour. Lessons learned from IA morphine. Pain 2002;98(3):269–75.
Klepstad P, Maurset A, Moberg ER, Oye I. Evidence of a role for NMDA receptors in pain perception. Eur J Pharmacol 1990;187:513–8.
Kohrs R, Durieux M. Ketamine: teaching an old drug new tricks. Anesth Analg 1998;87:1186–93.
Maurset A, Skoglund LA, Hustveit O, Oye I. Comparison of ketamine and pethidine in experimental and postoperative pain. Pain 1989;36:37–41.
McQuay HJ, Moore RA. Number needed to treat and relative risk reduction. Ann Intern Med 1998;128:72–3.
Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of reporting of meta-analyses. Lancet 1999;354:1896–900.
Moiniche S, Kehlet H, Dahl JB. A qualitative and quantitative systematic review of preemptive analgesia for postoperative pain relief: the role of timing of analgesia. Anesthesiology 2002;96:725–41.
Owen H, Reekie RM, Clements JA, Watson R, Nimmo WS. Analgesia from morphine and ketamine. A comparison of infusions of morphine and ketamine for postoperative analgesia. Anaesthesia 1987;42:1051–6.
Pasquina P, Tramèr MR, Walder B. Prophylactic respiratory physiotherapy after cardiac surgery: systematic review. Br Med J 2003;327:1379.
Schmid RL, Sandler AN, Katz J. Use and efficacy of low-dose ketamine in the management of acute postoperative pain: a review of current techniques and outcomes. Pain 1999;82:111–25.
Tverskoy M, Oz Y, Isakson A, Finger J, Bradley EL Jr, Kissin I. Preemptive effect of fentanyl and ketamine on postoperative pain and wound hyperalgesia. Anesth Analg 1994;78:205–9.
Walder B, Schafer M, Henzi I, Tramèr MR. Efficacy and safety of patient-controlled opioid analgesia for acute postoperative pain. A quantitative systematic review. Acta Anaesthesiol Scand 2001;45:795–804.
Weinbroum AA. A single small dose of postoperative ketamine provides rapid and sustained improvement in morphine analgesia in the presence of morphine-resistant pain. Anesth Analg 2003;96:789–95.
White PF, Way WL, Trevor AJ. Ketamine: its pharmacology and therapeutic uses. Anesthesiology 1982;56:119–36.
Wilder-Smith OH, Arendt-Nielsen L, Gaumann D, Tassonyi E, Rifat KR. Sensory changes and pain after abdominal hysterectomy: a comparison of anesthetic supplementation with fentanyl versus magnesium or ketamine. Anesth Analg 1998;86:95–101.

Included randomised trials

Abdel-Ghaffar ME, Abdulatif M, Al Ghamdi A, Mowafi H, Anwar A. Epidural ketamine reduces post-operative epidural PCA consumption of fentanyl/bupivacaine. Can J Anaesth 1998;45:103–9.
Adriaenssens G, Vermeyen KM, Hoffmann VL, Mertens E, Adriaensen HF. Postoperative analgesia with i.v. patient-controlled morphine: effect of adding ketamine. Br J Anaesth 1999;83:393–6.
Azevedo VM, Lauretti GR, Pereira NL, Reis MP. Transdermal ketamine as an adjuvant for postoperative analgesia after abdominal gynecological surgery using lidocaine epidural blockade. Anesth Analg 2000;91:1479–82.
Badrinath S, Avramov MN, Shadrick M, Witt TR, Ivankovich AD. The use of a ketamine-propofol combination during monitored anesthesia care. Anesth Analg 2000;90:858–62.
Burstal R, Danjoux G, Hayes C, Lantry G. PCA ketamine and morphine after abdominal hysterectomy. Anaesth Intensive Care 2001;29:246–51.
Chia YY, Liu K, Liu YC, Chang HC, Wong CS. Adding ketamine in a multimodal patient-controlled epidural regimen reduces postoperative pain and analgesic consumption. Anesth Analg 1998;86:1245–9.
Dahl V, Ernoe PE, Steen T, Raeder JC, White PF. Does ketamine have preemptive effects in women undergoing abdominal hysterectomy procedures? Anesth Analg 2000;90:1419–22.
De Kock M, Lavand'homme P, Waterloos H. Balanced analgesia in the perioperative period: is there a place for ketamine? Pain 2001;92:373–80.
Edwards ND, Fletcher A, Cole JR, Peacock JE. Combined infusions of morphine and ketamine for postoperative pain in elderly patients. Anaesthesia 1993;48:124–7.
Frey K, Sukhani R, Pawlowski J, Pappas AL, Mikat-Stevens M, Slogoff S. Propofol versus propofol-ketamine sedation for retrobulbar nerve block: comparison of sedation quality, intraocular pressure changes, and recovery profiles. Anesth Analg 1999;89:317–21.
Gilabert Morell A, Sanchez Perez C. Effect of low-dose intravenous ketamine in postoperative analgesia for hysterectomy and adnexectomy. Rev Esp Anestesiol Reanim 2002;49:247–53.
Gorgias NK, Maidatsi PG, Kyriakidis AM, Karakoulas KA, Alvanos DN, Giala MM. Clonidine versus ketamine to prevent tourniquet pain during intravenous regional anesthesia with lidocaine. Reg Anesth Pain Med 2001;26:512–7.
Guignard B, Coste C, Costes H, Sessler DI, Lebrault C, Morris W, Simonnet G, Chauvin M. Supplementing desflurane-remifentanil anesthesia with small-dose ketamine reduces perioperative opioid analgesic requirements. Anesth Analg 2002;95:103–8.
Heinke W, Grimm D. Preemptive effects caused by co-analgesia with ketamine in gynecological laparotomies? Anaesthesiol Reanim 1999;24:60–4.
Hercock T, Gillham MJ, Sleigh J, Jones SF. The addition of ketamine to patient controlled morphine analgesia does not improve quality of analgesia after total abdominal hysterectomy. Acute Pain 1999;2:68–72.
Himmelseher S, Ziegler-Pithamitsis D, Kochs E, Jelen-Esselborn S, Martin J, Argiriadou H. Small-dose S(+)-ketamine reduces postoperative pain when applied with ropivacaine in epidural anesthesia for total knee arthroplasty. Anesth Analg 2001;92:1290–5.
Huang GS, Yeh CC, Kong SS, Lin TC, Ho ST, Wong CS. Intra-articular ketamine for pain control following arthroscopic knee surgery. Acta Anaesthesiol Sin 2000;38:131–6.
Ilkjaer S, Nikolajsen L, Hansen TM, Wernberg M, Brennum J, Dahl JB. Effect of i.v. ketamine in combination with epidural bupivacaine or epidural morphine on postoperative pain and wound tenderness after renal surgery. Br J Anaesth 1998;81:707–12.
Jaksch W, Lang S, Reichhalter R, Raab G, Dann K, Fitzal S. Perioperative small-dose S(+)-ketamine has no incremental beneficial effects on postoperative pain when standard-practice opioid infusions are used. Anesth Analg 2002;94:981–6.
Javery KB, Usserry TW, Steger HG, Colclough GW. Comparison of morphine and morphine with ketamine for postoperative analgesia. Can J Anaesth 1996;43:212–5.
Joachimsson PO, Hedstrand U, Eklund A. Low-dose ketamine infusion for analgesia during postoperative ventilator treatment. Acta Anaesthesiol Scand 1986;30:697–702.
Kakinohana M, Hasegawa A, Taira Y, Okuda Y. Pre-emptive analgesia with intravenous ketamine reduces postoperative pain in young patients after appendicectomy: a randomized control study. Masui 2000;49:1092–6.
Kararmaz A, Kaya S, Karaman H, Turhanoglu S, Ozyilmaz MA. Intraoperative intravenous ketamine in combination with epidural analgesia: postoperative analgesia after renal surgery. Anesth Analg 2003;97:1092–6.
Kawana Y, Sato H, Shimada H, Fujita N, Ueda Y, Hayashi A, Araki Y. Epidural ketamine for postoperative pain relief after gynecologic operations: a double-blind study and comparison with epidural morphine. Anesth Analg 1987;66:735–8.
Kirdemir P, Oezkocak I, Demir T, Goegueş N. Comparison of postoperative analgesic effects of preemptively used epidural ketamine and neostigmine. J Clin Anesth 2000;12:543–8.
Lauretti GR, Oliveira AP, Rodrigues AM, Paccola CA. The effect of transdermal nitroglycerin on spinal S(+)-ketamine antinociception following orthopedic surgery. J Clin Anesth 2001;13:576–81.
Lee HM, Sanders GM. Caudal ropivacaine and ketamine for postoperative analgesia in children. Anaesthesia 2000;55:806–10.
Lee IO, Kim WK, Kong MH, Lee MK, Kim NS, Choi YS, Lim SH. No enhancement of sensory and motor blockade by ketamine added to ropivacaine interscalene brachial plexus blockade. Acta Anaesthesiol Scand 2002;46:821–6.
Lehman KA, Klaschik M. Lack of pre-emptive analgesic effect of low-dose ketamine in postoperative patients. A prospective, randomised double-blind study. Schmerz 2001;15:248–53.
Menigaux C, Fletcher D, Dupont X, Guignard B, Guirimand F, Chauvin M. The benefits of intraoperative small-dose ketamine on postoperative pain after anterior cruciate ligament repair. Anesth Analg 2000;90:129–35.
Menigaux C, Guignard B, Fletcher D, Sessler DI, Dupont X, Chauvin M. Intraoperative small-dose ketamine enhances analgesia after outpatient knee arthroscopy. Anesth Analg 2001;93:606–12.
Murray WB, Yankelowitz SM, le Roux M, Bester HF. Prevention of post-tonsillectomy pain with analgesic doses of ketamine. S Afr Med J 1987;72:839–42.
Naguib M, Sharif AMY, Seraj M, El Gammal M, Dawlatly AA. Ketamine for caudal analgesia in children: comparison with bupivacaine. Br J Anaesth 1991;67:559–64.
Ngan Kee WD, Khaw KS, Ma ML, Mainland PA, Gin T. Postoperative analgesic requirement after cesarean section: a comparison of anesthetic induction with ketamine or thiopental. Anesth Analg 1997;85:1294–8.
Ong EL, Osborne GA. Ketamine for co-induction of anaesthesia in oral surgery. Ambul Surg 2001;9:131–5.
Ozbek H, Bilen A, Ozcengiz D, Gunes Y, Ozalevli M, Akman H. The comparison of caudal ketamine, alfentanil and ketamine plus alfentanil administration for postoperative analgesia in children. Paediatr Anaesth 2002;12:610–6.
Papaziogas B, Argiriadou H, Papagiannopoulou P, Pavlidis T, Georgiou M, Sfyra E, Papaziogas T. Preincisional intravenous low-dose ketamine and local infiltration with ropivacaine reduces postoperative pain after laparoscopic cholecystectomy. Surg Endosc 2001;15:1030–3.
Qureshi FA, Mellis PT, McFadden MA. Efficacy of oral ketamine for providing sedation and analgesia to children requiring laceration repair. Pediatr Emerg Care 1995;11:93–7.
Reeves M, Lindholm DE, Myles PS, Fletcher H, Hunt JO. Adding ketamine to morphine for patient-controlled analgesia after major abdominal surgery: a double-blinded, randomized controlled trial. Anesth Analg 2001;93:116–20.
Rosseland LA, Stubhaug A, Sandberg L, Breivik H. Intra-articular (IA) catheter administration of postoperative analgesics. A new trial design allows evaluation of baseline pain, demonstrates large variation in need of analgesics, and finds no analgesic effect of IA ketamine compared with IA saline. Pain 2003;104:25–34.
Roytblat L, Korotkoruchko A, Katz J, Glazer M, Greemberg L, Fisher A. Postoperative pain: the effect of low-dose ketamine in addition to general anesthesia. Anesth Analg 1993;77:1161–5.
Sadove MS, Shulman M, Hatano S, Fevold N. Analgesic effects of ketamine administered in subdissociative doses. Anesth Analg 1971;50:452–7.
Stubhaug A, Breivik H, Eide PK, Kreunen M, Foss A. Mapping of punctuate hyperalgesia around a surgical incision demonstrates that ketamine is a powerful suppressor of central sensitization to pain following surgery. Acta Anaesthesiol Scand 1997;41:1124–32.
Subramaniam B, Subramaniam K, Pawar DK, Sennaraj B. Preoperative epidural ketamine in combination with morphine does not have a clinically relevant intra- and postoperative opioid-sparing effect. Anesth Analg 2001a;93:1321–6.
Subramaniam K, Subramaniam B, Pawar DK, Kumar L. Evaluation of the safety and efficacy of epidural ketamine combined with morphine for postoperative analgesia after major upper abdominal surgery. J Clin Anesth 2001b;13:339–44.
Susuki M, Tsueda K, Lansig PS, Tolan MM, Fuhrman TM, Ignacio CI, Sheppard RA. Small-doses ketamine enhances morphine induced analgesia after outpatient surgery. Anesth Analg 1999;89:98–103.
Unlügenç H, Guenduez M, Oezalevli M, Akman H. A comparative study on the analgesic effect of tramadol, tramadol plus magnesium, and tramadol plus ketamine for postoperative pain management after major abdominal surgery. Acta Anaesthesiol Scand 2002;46:1025–30.
Unlügenç H, Ozalevli M, Guler T, Isik G. Postoperative pain management with intravenous patient-controlled morphine: comparison of the effect of adding magnesium or ketamine. Eur J Anaesth 2003;20:416–21.
Weber F, Wulf H. Caudal bupivacaine and S(+)-ketamine for postoperative analgesia in children. Paediatr Anaesth 2003;13:244–8.
Weir PS, Fee JP. Double-blind comparison of extradural block with three bupivacaine-ketamine mixtures in knee arthroplasty. Br J Anaesth 1998;80:299–301.
Xie H, Wang X, Liu G, Wang G. Analgesic effects and pharmacokinetics of a low dose of ketamine preoperatively administered epidurally or intravenously. Clin J Pain 2003;19:317–22.
Yanli Y, Eren A. The effect of extradural ketamine on onset time and sensory block in extradural anaesthesia with bupivacaine. Anaesthesia 1996;51:84–6.
Zohar E, Luban I, Zunser I, Shapiro A, Jedeikin R, Fredman B. Patient-controlled bupivacaine wound instillation following cesarean section: the lack of efficacy of adjuvant ketamine. J Clin Anesth 2002;14:505–11.

5.2 Fate of articles that warranted retraction due to ethical concerns: a descriptive cross-sectional study.

Authors: Elia N, Wager E, Tramèr MR. Published in: PLoS ONE 2014; 9(1): e85846. IF 2014: 3.234

Summary

Following the « Boldt debacle »,16 named after the German anaesthetist Dr. Joachim Boldt, who was found to have forgotten to seek formal ethical approval for 90 of his articles (Box 2), the Editors-in-Chief of the 18 journals that had published his papers declared their intention to retract these articles. In this cross-sectional study, we aimed to check whether, a year and a half later, these articles had indeed been retracted, and whether the retractions had been performed in accordance with the published recommendations. Our study showed that 9 of the 88 articles (10%) had not been retracted at all (no notice of retraction published, no marked pdf) and could not be identified as requiring retraction by a reader unaware of the scandal. Only 5 had been retracted according to all the published requirements. For the other articles, the way they had been retracted varied with regard to who took responsibility for the retraction, the marking of the pdf, and free access to the retracted article. Interestingly, 14 articles still required payment to be accessed, even though they had been retracted! Responsibility for both the lack of retraction and the inadequate retractions was shared by editors and publishers, compounded by the absence of anyone responsible for checking that the retractions had actually been performed. The reasons given ranged from personal problems of the editor to legal threats from Dr. Boldt's co-authors.

This article was commented on by a renowned website dedicated to research misconduct:

RetractionWatch: http://retractionwatch.com/2013/09/10/what-happened-to-joachim-boldts-88-papers-that-were-supposed-to-be-retracted/

Fate of Articles That Warranted Retraction Due to Ethical Concerns: A Descriptive Cross-Sectional Study

Nadia Elia1,2*, Elizabeth Wager3, Martin R. Tramèr1 1 Division of Anaesthesiology, Geneva University Hospitals, and Medical Faculty, University of Geneva, Geneva, Switzerland, 2 Institute of Social and Preventive Medicine, Medical Faculty, Geneva, Switzerland, 3 Sideview, Princes Risborough, United Kingdom

Abstract

Objective: To study journals’ responses to a request from the State Medical Association of Rheinland-Pfalz, Germany, to retract 88 articles due to ethical concerns, and to check whether the resulting retractions followed published guidelines.

Design: Descriptive cross-sectional study.

Population: 88 articles (18 journals) by the anaesthesiologist Dr. Boldt that warranted retraction.

Method: According to the recommendations of the Committee on Publication Ethics, we regarded a retraction as adequate when a retraction notice was published, linked to the retracted article, identified the title and authors of the retracted article in its heading, explained the reason and who took responsibility for the retraction, and when the retracted article was freely accessible and marked using a transparent watermark that preserved original content. Two authors extracted data independently (January 2013) and contacted editors-in-chief and publishers for clarification in cases of inadequate retraction.

Results: Five articles (6%) fulfilled all criteria for adequate retraction. Nine (10%) were not retracted (no retraction notice published, full text article not marked). 79 (90%) retraction notices were published, 76 (86%) were freely accessible, but only 15 (17%) were complete. 73 (83%) full text articles were marked as retracted, of which 14 (16%) had an opaque watermark hiding parts of the original content, and 11 (13%) had all original content deleted. 59 (67%) retracted articles were freely accessible. One editor-in-chief stated personal problems as a reason for incomplete retractions, eight blamed their publishers. Two publishers cited legal threats from Dr. Boldt’s co-authors which prevented them from retracting articles.

Conclusion: Guidelines for retracting articles are incompletely followed. The role of publishers in the retraction process needs to be clarified and standards are needed on marking retracted articles. It remains unclear who should check that retractions are done properly. Legal safeguards are required to allow retraction of articles against the wishes of authors.

Citation: Elia N, Wager E, Tramèr MR (2014) Fate of Articles That Warranted Retraction Due to Ethical Concerns: A Descriptive Cross-Sectional Study. PLoS ONE 9(1): e85846. doi:10.1371/journal.pone.0085846 Editor: K. Brad Wray, State University of New York, Oswego, United States of America Received August 26, 2013; Accepted October 24, 2013; Published January 22, 2014 Copyright: © 2014 Elia et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: The authors have no funding or support to report. Competing Interests: MRT is the editor-in-chief and NE is an associate editor of the European Journal of Anaesthesiology, which has retracted eight articles by Boldt. EW is an author of the COPE guidelines on retraction and is a self-employed consultant who provides training on publication ethics; she was Chair of COPE (2009–12), an unpaid position; she is a member of the Ethics Committee of the BMJ and has received travel expenses from COPE and the BMJ. She has acted as a paid consultant to BMJ Group, Wiley-Blackwell, and Elsevier, and has been paid to speak at meetings run by Springer. This does not affect the authors' adherence to all the PLOS ONE policies on sharing data and materials. The authors declare that they received no support from any organization for the submitted work; have no financial relationships with any other organizations that might have an interest in the submitted work in the previous three years; and no other relationships or activities that could appear to have influenced the submitted work. * E-mail: [email protected]

Introduction

On the 25th of February 2011, the State Medical Association of Rheinland-Pfalz, Germany, informed all affected medical journals of the results of its evaluation regarding the status of Institutional Review Board approval for research conducted by the anaesthetist Dr. Joachim Boldt. The evaluation revealed that 88 original articles authored by Dr. Boldt, and published in 18 peer-reviewed journals, lacked formal ethical approval. As a consequence, the editors-in-chief of these 18 journals signed a common statement declaring their intention to retract these articles from their journals, and to publish formal retraction notices [1].

Authoritative bodies such as the Committee On Publication Ethics (COPE) [2] and the National Library of Medicine (NLM) [3] have produced guidelines concerning the retraction of fraudulent research papers. According to COPE, retraction notices should be published and linked to the retracted article, be clearly identified as a retraction (not as a correction or a comment), identify the retracted article by including the title and authors in the retraction heading, explain who is retracting and the reasons for retraction, and finally, be freely accessible to all readers, thus not hidden behind access barriers [4]. NLM adds a statement stipulating that "The retraction should appear on a numbered page in a prominent section in an issue of the print journal that published the

retracted article as well as in the online version, be listed in the Table of Contents page." NLM does not remove the original reference of a retracted article, but updates the citation to indicate it has been retracted and adds a link to the retraction statement. The original article should be retained unchanged, except for a watermark on the PDF indicating on each page that it is "retracted". Sox and Rennie have further recommended that, as with retraction notices, journals should provide free access to the full text of retracted fraudulent articles [5].

In earlier cases, some occurring before these recommendations were published, journals were found not to have retracted articles appropriately [5,6]. Although the importance of appropriately retracting fraudulent articles has often been underlined in Commentaries and Editorials [7–9], studies attempting to quantify the problem are still lacking. We therefore set out to study the fate of the 88 articles by Dr. Joachim Boldt that were meant to be retracted in early summer 2011 because of lack of ethical approval.

Methods

This study is reported according to the STROBE statement for reporting cross-sectional studies.

Ethics statement

Ethical approval was not required for this study.

Study design and setting

This was a descriptive cross-sectional study. All searches and data extraction were done in January 2013.

Study selection

We selected all 88 articles listed in the document "Editors-in-Chief Statement Regarding Published Clinical Trials Conducted without IRB Approval by Joachim Boldt" (published in February 2011) for which the State Medical Association of Rheinland-Pfalz, Germany, was unable to verify approval by a competent ethics committee and had therefore recommended that they should be retracted [1].

Variables, data extraction and data sources

For each article, we checked whether all criteria for adequate retraction, as stipulated by COPE, were fulfilled [2]. In addition (following other recommendations) we checked whether the full text article was freely accessible [5], and whether the original content was preserved [10].

For each title, two authors (NE, MRT) independently recorded whether or not a retraction notice had been published in the respective journal (yes, no), whether the retraction notice was linked to the retracted article (yes, no), was listed in the journal's table of contents (completely: complete reference of retracted article is listed in table of contents; incompletely: retraction is listed in table of contents but reference is lacking; none: retraction notice is not listed), and whether the heading of the retraction notice included the title (yes, no) and the authors (yes, no) of the retracted article. For each retraction notice that could be retrieved, we checked whether or not the notice provided explanations concerning the reason for retraction (yes, no), and described who was responsible for the retraction (editor-in-chief, editorial board, publisher, etc). We checked on the article PDFs whether, and how, they were marked, for instance, using a watermark indicating "retracted" across all pages. We classified the watermarks as transparent (i.e. underlying text, figures or tables were readable) or opaque (i.e. underlying text, figures and tables were obscured).

In order to assess the accessibility of retraction notices and full text retracted articles, all searches were performed from a private computer without subscription to any journal. For the searches of the full text PDF, we copied the titles of retracted articles into Google®. If a link to Pubmed was provided, that link was tried first. If a link to the full text was provided in Pubmed, that link was used. If the Pubmed link led to the full text (online or PDF), the article was classified as freely accessible through Pubmed. If the Pubmed link led to a login page requiring registration and/or a fee for access to the article, the article was classified as not freely accessible through Pubmed. For all articles that were not freely accessible through Pubmed, alternative links provided through Google were searched (for instance, ScienceDirect [http://www.sciencedirect.com/], ResearchGate [http://www.researchgate.net/] or Google Scholar [http://scholar.google.ch/]). When this secondary search was successful, the article was classified as freely accessible through alternative web sources. If it was not, it was classified as not freely accessible through alternative web sources. The same procedure was repeated to assess accessibility of retraction notices.

We considered a retraction as adequate if the following criteria were fulfilled: a retraction notice was published and linked to the retracted article, was identified as a "retraction" in the table of contents of the journal, included title and authors of the retracted article in its heading, explained who took responsibility for the retraction, gave reasons for retraction, and was freely accessible, and if the PDF of the retracted article was clearly labelled as "retracted" using a transparent watermark preserving original content and was freely accessible.

Finally, we contacted the editors-in-chief of those journals that had failed to retract articles correctly, and asked them for an explanation. If feasible, we also contacted the journal publishers and asked for explanations.

Bias

Data extraction was performed by two authors (NE and MT) independently in order to minimise the risk of extraction errors. Clear procedures were defined before performing the searches in order to guarantee reproducibility of the findings.

Study size and statistical analyses

The study sample was defined as all trials that were authored or co-authored by Dr. Joachim Boldt and that were found by an official inquiry to warrant retraction because of lack of formal ethical approval. This is a descriptive study; there was no intention to search for associations between variables or to draw statistical inferences; therefore, no sample size calculation was performed. Results are reported as frequencies and percentages.

Results

Journals and publishers

The 88 articles were published in Anesthesia and Analgesia (22 articles), British Journal of Anaesthesia (11), Journal of Cardiothoracic and Vascular Anesthesia (9), European Journal of Anaesthesiology (8), Anaesthesia (6), Anästhesiologie Intensivmedizin Notfallmedizin Schmerztherapie (6), Canadian Journal of Anesthesia (5), Intensive Care Medicine (5), Acta Anaesthesiologica Scandinavica (3), Der Anästhesist (2), Annals of Thoracic Surgery (2), Critical Care Medicine (2), Thoracic and Cardiovascular Surgery (2), Medical Science Monitor (2), Minerva Anestesiologica (2), Anesthesiology (1), Journal of Cranio-Maxillo-Facial Surgery (1), and Vox Sanguinis (1) (Table 1).

The 18 journals were published by nine publishers (Table 1): Lippincott Williams & Wilkins, Wiley-Blackwell, Springer, and Elsevier (three journals each), Thieme (2), Oxford University Press, Edizioni Minerva Medica and Medical Science International (1 each). One journal (European Journal of Anaesthesiology) had changed

publishers in 2009 (from Cambridge University Press to Lippincott Williams & Wilkins).

[Table 1. Summary of 88 articles that warranted retraction. The original two-page table lists, for each journal and publisher, the number of retractions, the month of publication and the characteristics of each retraction notice, and the marking and free accessibility of the full text PDFs; it is not legibly reproducible in this layout. A table footnote records that, since January 2013, one publisher (Wiley-Blackwell) has modified its retraction notices after learning of our findings. doi:10.1371/journal.pone.0085846.t001]
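The accessibility classification described in the Methods is a fixed two-step decision rule: a PubMed link is tried first, and only articles that fail there are searched through alternative web sources found via Google. The sketch below is illustrative only; the study's searches were done by hand, and the two callables are hypothetical stand-ins for those manual checks, not part of the original work.

```python
def classify_accessibility(pubmed_gives_full_text, alternative_gives_full_text):
    """Classify a document (article or retraction notice) as in the Methods.

    Each argument is a callable standing in for a manual search; it
    returns True when the full text opened without registration or fee.
    """
    if pubmed_gives_full_text():
        return "freely accessible through Pubmed"
    # PubMed led to a login or payment page: fall back to alternative
    # sources found via Google (e.g. ScienceDirect, ResearchGate,
    # Google Scholar).
    if alternative_gives_full_text():
        return "freely accessible through alternative web sources"
    return "not freely accessible"
```

Under this rule, an article paywalled on PubMed but openly posted elsewhere is classified as freely accessible through alternative web sources, which is how the study separated its two "freely accessible" categories.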

Outcome data and main results

Five retractions (6%; published in one journal) fulfilled all predefined criteria of adequate retraction (Fig. 1).

Retraction notices. No retraction notices had been published for nine of the articles (10%; five journals). Of these five journals, three (publishing four articles) had not published any retraction notice at all, one had published retraction notices for only two of six articles, and one had published a retraction notice for only one of two articles (Table 1). Retraction notices for the remaining 79 articles were published between May 2011 and October 2011.

Each of the 79 retraction notices was linked, in Pubmed, to the retracted article; 55 (63%) notices were clearly identified as "Retractions", one was labelled "Statement", one "Erratum", and 22 (originating from one journal) were not labelled at all and referred to an Editor's Note named "Notice of retraction". 76 retraction notices (86%) were listed in the table of contents of the respective journals; in 52 (59%; nine journals) the complete reference of the retracted article was listed in the table of contents, and in 24 (27%; four journals) the retraction was listed in the table of contents but the reference was lacking. Three retraction notices (two journals) were not listed in the table of contents.

The formats of the retraction notices were consistent within each journal, but differed between journals (Table 1): 53 notices (60%) included the title of the retracted article, and 32 (36%) included the names of the authors in their heading. Twenty notices (23%; four journals) included both the title and the authors in their heading, and 14 (16%) included neither of them in the heading, but included the reference in the text of the retraction notice.

Seventy-one retraction notices (81%) described who had taken responsibility for the retraction. Responsibilities varied across journals. In seven journals (48 retraction notices), the editor-in-chief signed the retractions, in three (ten notices), it was the editor-in-chief with the publisher, and in two (13 notices), responsibility was taken by the editorial board. In three journals (eight notices), there was no indication of who had retracted the article; for example, one stated, "the following article has been retracted...". Reasons for the retraction were explicitly provided in the notices in 13 journals (49 retraction notices); in two journals (30 notices), the notice referred to an editorial that explained the context of the retractions.

Sixty-seven retraction notices (76%) were accessible through Pubmed, nine retraction notices (10%; one journal) were accessible through an alternative weblink (ScienceDirect) but not through Pubmed, and three (3%; two journals) were not freely accessible. Overall, only 15 retraction notices (17%; three journals) were found to fulfil all predefined criteria for an adequate retraction notice (Fig. 1).

Retracted articles. Fifteen articles (17%) were not marked as retracted. Nine of these also lacked a retraction notice. Of the 73 articles (83%) that were marked as retracted, 48 (55%; eight journals) had transparent watermarks, although in ten of those (11%; two journals) the watermark was almost invisible (Fig. 2). The other 14 articles (16%; four journals) had opaque marks across all pages that completely obscured parts of text, tables or figures (Fig. 3). Eleven articles (13%; all from one journal) had their entire content (i.e. text, tables, figures, references) deleted. In one journal, only seven of eight full texts were labelled with a watermark although retraction notices were published for all eight.
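The percentages in this section are computed against the full sample of 88 articles, not against the 79 published notices or the 73 marked articles, which is easy to misread. A quick rounding check (a sketch for the reader, not the authors' analysis code) reproduces the reported figures:

```python
# Rounding check of the reported frequencies; the denominator is the
# full sample of 88 articles throughout.
N = 88

def pct(count, denominator=N):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * count / denominator)

assert pct(5) == 6    # articles adequately retracted
assert pct(9) == 10   # articles not retracted at all
assert pct(79) == 90  # retraction notices published
assert pct(76) == 86  # notices listed in a table of contents
assert pct(73) == 83  # full texts marked as retracted
assert pct(15) == 17  # notices fulfilling all predefined criteria
```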


Figure 1. Flow chart. 1 Retraction notice linked to the retracted article, identified as a retraction in the table of contents of the journal, included the title and authors of the retracted article in its heading, explained who took the responsibility for retraction, and provided reasons for retraction. 2 Watermark transparent, not opaque, and the original text preserved. 3 Through PubMed or alternative websites. doi:10.1371/journal.pone.0085846.g001
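The first footnote of the flow chart is a conjunction of independent criteria: a retraction counted as adequate only if every one of them held. Restated as a boolean record, with field names invented here for illustration (they are not taken from the study's extraction sheet):

```python
from dataclasses import dataclass

@dataclass
class RetractionRecord:
    # One boolean per criterion from the Figure 1 footnote;
    # field names are illustrative only.
    notice_published: bool
    notice_linked_to_article: bool
    notice_identified_as_retraction: bool
    heading_includes_title: bool
    heading_includes_authors: bool
    states_who_retracts: bool
    states_reasons: bool
    notice_freely_accessible: bool
    watermark_transparent: bool      # marks the PDF without hiding content
    article_freely_accessible: bool

    def is_adequate(self) -> bool:
        """A retraction is adequate only if every criterion holds."""
        return all(vars(self).values())
```

Under this conjunction, only five of the 88 articles in the study would satisfy `is_adequate`.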

Figure 2. Transparent watermarks. Left panel: Anesthesia and Analgesia. Right panel: Canadian Journal of Anesthesia. doi:10.1371/journal.pone.0085846.g002


The article that lacked a watermark had been published by the journal's previous publisher. Forty-four articles (50%; six journals) were freely accessible through Pubmed, but 11 of those (13%; one journal) had their content completely deleted. Fifteen full text articles (17%) were freely accessible, although through alternative weblinks only; one of them was not marked. Overall, only 34 articles (39%) were both adequately marked and freely accessible (Fig. 1).

Explanations from editors-in-chief and publishers. We contacted the editors-in-chief of the journals that had not correctly implemented some or all retractions and asked them for explanations. Two did not respond. One referred to personal health problems that prevented him from accomplishing the task. The others referred us to their publishers. One of these editors cited internal communication problems that led to the omission of one of two retraction notices. One editor explained that it had been the publisher's decision to delete the content of the retracted articles. However, he also challenged the principle that data in retracted articles should be preserved, as he considered these data were false and therefore valueless.

We were able to communicate with three publishers. One was critical of the fact that there was no mechanism within the industry that he knew of for previous publishers to be notified of retractions required by journals that were now produced by a different publisher. Two publishers mentioned legal threats from Boldt's co-authors that prevented them from retracting a total of six articles in three journals.

Figure 2. Transparent watermarks. Left panel: Anesthesia and Analgesia. Right panel: Canadian Journal of Anesthesia. doi:10.1371/journal.pone.0085846.g002

Figure 3. Opaque watermark. Journal of Cardiothoracic and Vascular Anesthesia. doi:10.1371/journal.pone.0085846.g003

Discussion

We followed up the fate of 88 articles that should have been retracted due to the absence of ethical approval. Almost two years after the need for retraction was recognised, following an official investigation, our study shows that the performance of many journals is disappointing. Applying our stringent rules, only five articles (6%; one journal) were adequately retracted. A retraction notice had been published for only 79 (90%) of the articles, and although most retraction notices were listed in the journal's table of contents, 27% of the listings were incomplete. Also, only 48 (54%) full text articles were clearly and correctly marked as retracted; the others were not marked at all, or the text was either completely deleted or covered by an opaque watermark obscuring parts of the original information. A major issue was the lack of free access to both retraction notices and full texts of retracted articles; only 76 (86%) of all retraction notices and 59 (67%) of all retracted articles were freely accessible, and not all of those were freely accessible through PubMed but had to be accessed through alternative websites.

Formal retraction from the literature is the most severe sanction for a published research article [2]. Since science must be self-correcting, retraction of unreliable articles is an essential step for rectifying the scientific knowledge-base [11], although it has been shown that it is probably insufficient to inform the scientific community [12]. Retractions appear to be "unpopular" with both editors and institutions since they may shed doubt on the integrity of science, and on the expertise of the editorial team. However, they demonstrate the determination to maintain the integrity of knowledge and to prevent readers from being misled by unreliable information [10,13]. What is certain is that retractions are, and will remain, necessary since the safeguards of science, including the process of peer review, remain vulnerable to fraud and error. So why did we find such great disparity in the way in which these 88 articles were handled? There may be several explanations.

First, nine (10%) articles have not been retracted at all. There is little agreement on what exactly requires a retraction [8], and lack of ethical approval does not fit into any of the conventional definitions of misconduct such as plagiarism, fabrication, or falsification of data. Therefore, some editors and publishers may be inclined to consider ethical concerns as a minor problem only. NLM advises that articles may be withdrawn because of "pervasive error" or "unsubstantiated or irreproducible data" due to either misconduct or honest error. COPE advises journal editors to retract an article if they have clear evidence that the findings are "unreliable", and that they should consider retracting a publication if it reports "unethical research". The 88 articles by Dr. Boldt and co-workers do not clearly fit into any of these categories, and there is not yet an agreement on how to handle a clinical research article that may not necessarily be "unethical" per se, but that was not officially approved by a competent ethics committee. It is therefore possible that retraction was delayed or prevented because the articles did not fit into one of the "usual" categories requiring retraction. However, this reason was not mentioned by any of the editors who were contacted requesting an explanation, and it does not explain why the retractions failed to adhere to published guidelines.

Second, no reliable mechanisms exist to ensure that research articles warranting retraction (e.g. following an appropriate investigation) are actually retracted [7,10,14], although the COPE guidelines on cooperation between journals and institutions give some guidance on this [4]. For example, the primary responsibility for investigating possible scientific misconduct rests with the authors' institution. Once an institution has determined that misconduct involving a research publication has occurred, journals are obliged to consider retraction of the work [11]. In the present example, suspicion of scientific misconduct was first raised by journal editors, who then turned to the authors' institution and its ethics committee, asking for an internal investigation. When misconduct was recognised by the competent ethics committee, the editors-in-chief of all journals involved decided to start a joint retraction process. However, the individual retraction processes were handled independently by each journal, and did not follow a clearly defined common procedure.

Third, it remains unclear who should be responsible for retracting an article. In cases of unintentional "honest" error, the responsibility for requiring a retraction rests with the authors. In cases of scientific misconduct, obviously, someone else has to assume this responsibility. It has been claimed that, once an institution has identified fraud or significant error, it is then up to the journals to respond promptly and properly [4,14]. However, there is no clear guidance about who, within the "journal", should take this responsibility. Surveys of retractions have shown that this varies and may be the editor-in-chief, the entire editorial board, the publisher, or the owner of the journal, which may be an academic society [6]. Interestingly, retraction notices sometimes specified that an editor-in-chief alone had ordered the retraction, and sometimes there seemed to be an agreement between a publisher and an editor-in-chief or the entire editorial board. Most editors of journals that had failed to correctly retract some or all articles referred us to their publishers, suggesting that they held them responsible. As long ago as 1990 it was recognised that authors, editors, reviewers, and librarians all needed to be involved in a multifaceted approach to address the continued use of invalid data [15]; interestingly, publishers were ignored at that time. By 2012, adequately dealing with scientific misconduct had become, according to different guidelines, a joint mission for "authors, editors, and publishers" [16–18]. The exact responsibility of each actor, however, remains ill defined.

Fourth, recommendations on how full texts of retracted articles should be labelled are also lacking. COPE, for instance, states that retracted articles should not be removed from printed copies of the journal or from electronic archives, but their retracted status should be indicated as clearly as possible. The question remains whether the original content of these articles should be preserved. Some may agree with one of the interviewed editors who argued that the data were false and therefore valueless, so why leave them viewable? An alternative argument may be that scientific data, even when fraudulent or unethical, belong to the public and may serve future research, for instance, research into fraud, and that the preservation of the historical record, including all faults, mistakes, and corrections, is essential [10]. Indeed, in 1984, when the NLM implemented a policy for identifying and indexing published retractions, they chose to link the notice of retraction to the original article rather than delete the citation to the retracted article, because they felt that removal might affect historical perspective [10]. In the present study, data from 25 retracted articles (28% of retracted articles) had been partially or completely removed; either the contents of the articles were completely deleted, or parts of the underlying text, tables or figures were hidden by opaque watermarks. How, then, can it be ensured that retracted articles remain clearly marked and freely accessible electronically, and that the original information remains visible? When studying various recommendations on how to retract articles [2], it is striking how much responsibility is given to the editors. Several guidelines indicate that editors are responsible for the final decision about retracting material, with or without cooperation of the authors, and additionally that they should ensure that retractions are labelled in such a way that they are identified by bibliographic databases. There seems to be some contradiction here since we found that some editors considered they were powerless against a publisher's decision. Also, current guidelines do not specify who should verify whether the retraction process has been implemented correctly.

Finally, journal editors or publishers may be reluctant to issue retractions and to mark an article because they may fear legal actions by discredited authors. In our example, it was sometimes a publisher that decided not to retract an article because Boldt's co-authors threatened legal action. This highlights, again, the central role of publishers in deciding whether an article is to be retracted or not. Today, the threat of legal action weighs heavily, especially on smaller journals [9], although, according to COPE, authors usually would not have grounds for taking legal action against a journal over the act of retraction if it follows a suitable investigation and proper procedures [2]. COPE also states that journal editors should consider at least issuing an expression of concern if an investigation is underway but a judgment will not be available for a considerable time [2]. None of the journals that failed to retract an article has published such an expression of concern.

Our analysis has some limitations. The main weakness of our study is that it remains descriptive and relates to a single series of papers involving one author and a single investigation into a specific case of lack of evidence of ethical approval, which may not be typical of most retractions; we have not attempted to identify potential "risk factors" for problems in the retraction process. The reason for this is that we had no strong a priori hypothesis to test, and the relatively small sample size would have prevented us from performing multivariate analyses. However, our analyses may serve as a basis for future larger studies focusing on retractions due to ethical issues and for other reasons. Indeed, this descriptive cross-sectional study is the first of its kind to describe systematically the disparity of the retraction processes in a uniform context; one common author for each article and a single type of misconduct (lack of ethical approval) was involved. Selection bias is unlikely since we included all articles by Dr. Boldt that had been identified by an official investigation [1]. It cannot be excluded that a few additional studies will eventually require retraction because of lack of ethical approval, but it is unlikely that this will change the overall picture. The risk of reporting bias was minimized by having two researchers extract the relevant data separately.

The Boldt case was a shock to the academic world [19]. The impact of this debacle was recognized even outside peri-operative medicine [20]. Perhaps the only positive and encouraging fact was that the editors-in-chief of 18 journals agreed, in a well-orchestrated, committed and overt way, to sign a strong public statement and to retract 88 articles. This organised approach against fraud, across so many journals, was probably unique in the scientific world until then, and it has been cited as a laudable example of how journal editors should play a more active role in
the retraction of fraudulent papers [8]. Since, an even larger case Fifth, a further unresolved issue refers to the ultimate control of scientific misconduct, necessitating the retraction of 183 articles, after a retraction has been initiated. Who ensures that an article is has been uncovered [21]. Our study shows that purging the adequately retracted, that a notice is published in the journal and literature of fraudulent or unethical articles remains a technically indexed in databases such as Pubmed, that the full text article is challenging process and highlights several weaknesses which must

PLOS ONE | www.plosone.org 7 January 2014 | Volume 9 | Issue 1 | e85846 Retraction of Articles for Ethical Concerns be addressed. This is probably more a confirmation than a Both retraction notices and retracted articles belong to the public revelation. We did know that retractions were imperfect, but our domain and should not be hidden behind access barriers or pay study helps to quantify this. Perhaps most fundamentally, there walls. After a reasonable time period, it should be verified whether must be clear and universally accepted definitions about what type the retraction process has been successfully implemented, of articles deserve retraction. We feel strongly that science that has although, again, it remains unclear who shall take on that not been approved by a competent ethics committee should fall responsibility, but perhaps this role could be taken by the into this category and we suggest that COPE’s wording about institution that carried out the investigation. And finally, legal ‘‘unethical research’’ should be amended to include this. The roles safeguards are needed that allow journal editors to retract and responsibilities of the different players must be unambiguously fraudulent or unethical papers even against the wishes of authors defined, and publishers must not be left out – in fact, they should and/or publishers. probably have a central role. The mechanism for retracting articles published by a previous publisher (i.e. after a journal has Acknowledgments switched publishers) needs to be resolved. It seems obvious that once a competent independent investigation has provided Special thanks go to the editors and who publishers who responded to our convincing evidence that an article should be retracted, it is the enquiry and provided additional information. journal’s responsibility to issue a retraction notice, but it remains unclear who must take on that responsibility, the editor or the Author Contributions publisher. 
We suggest that it should be the publisher’s responsi- Conceived and designed the experiments: NE MRT. Performed the bility to adequately mark the full text of the retracted article, using experiments: NE MRT. Analyzed the data: NE MRT EW. Contributed a transparent rather than opaque watermark on each page. reagents/materials/analysis tools: NE MRT EW. Wrote the paper: NE Deleting the content of the fraudulent article should be outlawed. MRT EW. Read and approved the final manuscript: NE MRT EW.

References

1. EIC Joint Statement on Retractions. Available: http://journals.lww.com/ejanaesthesiology/Documents/EIC%20Joint%20Statement%20on%20Retractions%2012Mar2011.pdf. Accessed 12 August 2013.
2. Wager E, Barbour V, Yentis S, Kleinert S, on behalf of COPE Council. Retractions: Guidance from the Committee on Publication Ethics (COPE). Available: http://publicationethics.org/files/u661/Retractions_COPE_gline_final_3_Sept_09_2_.pdf. Accessed 12 August 2013.
3. Errata, Retractions, Partial Retractions, Corrected and Republished Articles, Duplicate Publications, Comments (including Author Replies), Updates, Patient Summaries, and Republished (Reprinted) Articles Policy for MEDLINE. Available: http://www.nlm.nih.gov/pubs/factsheets/errata.html. Accessed 12 August 2013.
4. Research institutions guidelines. Available: http://publicationethics.org/files/Research_institutions_guidelines_final.pdf. Accessed 12 August 2013.
5. Sox HC, Rennie D (2006) Research misconduct, retraction, and cleansing the medical literature: Lessons from the Poehlman case. Ann Intern Med 144: 609–13.
6. Wager E, Williams P (2011) Why and how do journals retract articles? An analysis of Medline retractions 1988–2008. J Med Ethics 37: 567–70.
7. Abbott A, Schwarz J (2002) Dubious data remain in print two years after misconduct inquiry. Nature 418: 113.
8. Marchant J (2011) 'Flawed' infant death papers not retracted. Nat News 476: 263–4.
9. Newman M (2010) The rules of retraction. BMJ 341: c6985.
10. Atlas MC (2004) Retraction policies of high-impact biomedical journals. J Med Libr Assoc 92: 242–50.
11. Fang FC, Casadevall A (2011) Retracted science and the retraction index. Infect Immun 79: 3855–9.
12. Whitely WP, Rennie D, Hafner AW (1994) The scientific community's response to evidence of fraudulent publication. The Robert Slutsky case. JAMA 272: 170–3.
13. Garfield E, Welljams-Dorof A (1990) The impact of fraudulent research on the scientific literature: The Stephen E. Breuning case. JAMA 263: 1424–6.
14. Friedman PJ (1990) Correcting the literature following fraudulent publication. JAMA 263: 1416–9.
15. Pfeifer MP, Snodgrass GL (1990) The continued use of retracted, invalid scientific literature. JAMA 263: 1420–3.
16. Knottnerus JA, Tugwell P (2012) Good publishing practice. J Clin Epidemiol 65: 353–4.
17. Code of conduct for publishers. Available: http://publicationethics.org/files/Code%20of%20conduct%20for%20publishers%20FINAL_1_0.pdf. Accessed 14 August 2013.
18. Code of conduct for journal editors. Available: http://publicationethics.org/files/Code_of_conduct_for_journal_editors_Mar11.pdf. Accessed 14 August 2013.
19. Tramèr MR (2011) The Boldt debacle. Eur J Anaesthesiol 28: 393–5.
20. Dyer C (2011) Guidance on intravenous fluid therapy is to be reviewed after six quoted articles are withdrawn. BMJ 342: d1492.
21. Tramèr MR (2013) The Fujii story. A chronicle of naïve disbelief. Eur J Anaesthesiol 30: 195–8.


5.3 Susceptibility to fraud in systematic reviews: lessons from the Reuben case.

Authors: Marret E, Elia N, Dahl JB, McQuay HJ, Møiniche S, Moore RA, Straube S, Tramèr MR
Published in: Anesthesiology 2009; 111: 1279–89. IF 2009: 5.354

Summary

In this study, our aim was to investigate and quantify the impact of the "Reuben fraud" (see Box 1) on the conclusions of systematic reviews that had included one or more articles based on fabricated data in their analyses, since it had been claimed that the massive retraction of 20 articles by Scott Reuben compromised the conclusions of every systematic review that had included these trials.8 We wanted to check whether systematic reviews could be "robust" to fraud, or whether their conclusions differed widely from the conclusions they would have reached had they not considered the fraudulent studies. We re-analysed 25 systematic reviews that cited (5), considered (6) or included (14) a trial authored by Scott Reuben, with and without the (potentially) fraudulent data. Overall, we showed that carefully performed quantitative systematic reviews were mostly robust against the Reuben fraud, even though qualitative systematic reviews including fraudulent reports were more vulnerable in their conclusions, especially if the proportion of fabricated data among all included trials was higher than 30%. This study also highlighted other interesting issues, such as the unclear role of Scott Reuben's co-authors (one co-author contributed to up to 8 articles but was apparently untouched by the scandal) and the unclear role of sponsors (all reports that had been sponsored by the pharmaceutical industry were withdrawn). It also highlighted the high rate of self-citation in Scott Reuben's articles. Our conclusion was that one of the strengths of a systematic review is to shift emphasis from single studies to multiple ones, and that biased conclusions can be avoided if some basic principles of systematic reviewing are observed.
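The sensitivity re-analysis at the heart of this study can be illustrated with a short sketch: a pooled effect is computed with and without the suspect trials, and the review counts as vulnerable when statistical significance flips, mirroring the "relevant difference" criterion used in the paper. The fixed-effect inverse-variance model and all function names below are illustrative simplifications of my own, not the authors' actual RevMan workflow:

```python
import math

def pooled_estimate(trials):
    """Fixed-effect (inverse-variance) pooling of per-trial estimates.

    Each trial is a tuple (effect, standard_error), e.g. a log risk ratio.
    Returns (pooled effect, lower 95% CI bound, upper 95% CI bound).
    """
    weights = [1.0 / se ** 2 for _, se in trials]
    pooled = sum(w * effect for (effect, _), w in zip(trials, weights)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

def significance_flips(all_trials, suspect_trials):
    """True if excluding the suspect trials flips statistical significance."""
    def significant(trials):
        _, low, high = pooled_estimate(trials)
        return low > 0 or high < 0  # 95% CI excludes the null effect (0)

    kept = [t for t in all_trials if t not in suspect_trials]
    return significant(all_trials) != significant(kept)
```

With two strongly positive suspect trials and two small null trials, `significance_flips` detects that the pooled result is driven by the suspect data; when the suspect trials contribute only a small share of the evidence, the verdict is typically unchanged, which is the robustness this study set out to quantify.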

51 Anesthesiology 2009; 111:1279–89 Copyright © 2009, the American Society of Anesthesiologists, Inc. Lippincott Williams & Wilkins, Inc. Susceptibility to Fraud in Systematic Reviews Lessons from the Reuben Case Emmanuel Marret, M.D.,* Nadia Elia, M.D., M.Sc.,† Jørgen B. Dahl, M.D., D.MSc., M.B.A.,‡ Henry J. McQuay, M.A., D.M., F.R.C.A., F.R.C.P. (Edin.),§ Steen Møiniche, M.D.,࿣ R. Andrew Moore, M.A., D.Phil., C.Chem., F.R.S.C., F.R.C.A., D.Sc.,# Sebastian Straube, B.M., B.Ch., M.A., D.Phil.,** Martin R. Trame` r, M.D., D.Phil.††

Background: Dr. Scott Reuben allegedly fabricated data. The tematic reviews were vulnerable if the fraudulent data were authors of the current article examined the impact of Reuben more than 30% of the total. Qualitative systematic reviews reports on conclusions of systematic reviews. seemed at greater risk than quantitative. Methods: The authors searched in ISI Web of Knowledge sys- tematic reviews citing Reuben reports. Systematic reviews were grouped into one of three categories: I, only cited but did not SYSTEMATIC reviews of randomized controlled trials include Reuben reports; II, retrieved and considered, but eventu- (RCTs), with or without meta-analysis, are considered ally excluded Reuben reports; III, included Reuben reports. For powerful levels of evidence on which to guide clinical quantitative systematic reviews (i.e., meta-analyses), a relevant difference was defined as a significant result becoming nonsignif- practice. However, bias and fraud in original research icant (or vice versa) by excluding Reuben reports. For qualitative can threaten systematic review and meta-analysis. For systematic reviews, each author decided independently whether example, when systematic reviews include trials with noninclusion of Reuben reports would have changed conclusions. inadequate concealment of treatment allocation or inap- Results: Twenty-five systematic reviews (5 category I, 6 cat- propriate blinding, they are likely to overestimate the egory II, 14 category III) cited 27 Reuben reports (published 1,2 1994–2007). Most tested analgesics in surgical patients. One of 6 benefit of a treatment. Similarly, accidental inclusion quantitative category III reviews would have reached different of covert duplicate publication into meta-analysis can conclusions without Reuben reports. 
In all 6 (30 subgroup anal- bias the conclusion of that meta-analysis in favor of an yses involving Reuben reports), exclusion of Reuben reports experimental intervention.3 never made any difference when the number of patients from Falsification of data is a grave breach of scientific eth- Reuben reports was less than 30% of all patients included in the analysis. Of 8 qualitative category III reviews, all authors agreed ics. Fabricated data may be published in peer-reviewed that one would certainly have reached different conclusions scientific journals and may subsequently be included in without Reuben reports. For another 4, the authors’ judgment systematic reviews. Recently, routine audit uncovered was not unanimous. perhaps one of the largest research frauds ever reported.4 Conclusions: Carefully performed systematic reviews proved U.S. anesthesiologist Dr. Scott Reuben allegedly fabricated robust against the impact of Reuben reports. Quantitative sys- clinical studies.5 Most of these trials demonstrated benefits from analgesic drugs.  Supplemental digital content is available for this article. Direct Data from Reuben publications have been included in URL citations appear in the printed text and are available in systematic reviews and meta-analyses. It has been both the HTML and PDF versions of this article. Links to the digital files are provided in the HTML text of this article on the claimed that the retraction of Reuben’s articles compro- Journal’s Web site (www.anesthesiology.org). mise every systematic review that included these fabri- cated findings.6 In this context, two questions are rele- vant. First, does noninclusion of fraudulent data in * Research Associate, Division of Anesthesiology, University Hospitals of Geneva. systematic reviews change the conclusions of these sys- † Research Associate, †† Professor, Division of Anesthesiology, University Hos- pitals of Geneva, and Faculty of Medicine, University of Geneva, Geneva, Swit- tematic reviews? 
Second, are some systematic reviews zerland. ‡ Professor, Department of Anaesthesia, Centre of Head and Ortho- more robust than others against the impact of included paedics, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark. § Professor, Nuffield Department of Anaesthetics, # Professor, Pain fraudulent data, and, if so why? We set out to address Research and Nuffield Department of Anaesthetics, University of Oxford, John these issues using the Reuben case as an example. Radcliffe Hospital, Oxford, United Kingdom. ࿣ Consultant, Department of Anaesthesiology, Glostrup University Hospital, Copenhagen, Denmark. ** Phy- sician-Scientist, Department of Occupational and Social Medicine, University of Go¨ttingen, Go¨ttingen, Germany. Received from the Division of Anesthesiology, Department of Anesthesiology, Materials and Methods Critical Care and Clinical Pharmacology, Geneva University Hospitals, Geneva, Switzerland. Submitted for publication July 18, 2009. Accepted for publication August 26, 2009. Supported by the Evidence-based Critical Care, Anaesthesia and For the purpose of this analysis, we regarded all re- Pain Treatment Foundation, Geneva, Switzerland (salary of Dr. Elia), and institu- tional funds from the Department of Anesthesiology, Critical Care and Clinical ports that were coauthored by Reuben as potentially Pharmacology, Geneva University Hospitals, Geneva, Switzerland. fraudulent, including those not included on the official Address correspondence to Dr. Trame`r: Division of Anesthesiology, University retraction list.7 Hospitals of Geneva, Geneva, Switzerland. [email protected]. Information on purchasing reprints may be found at www.anesthesiology.org or on the We searched for indexed reports of any study archi- masthead page at the beginning of this issue. ANESTHESIOLOGY’s articles are made tecture that were coauthored by Dr. Scott S. Reuben. 
We freely accessible to all readers, for personal use only, 6 months from the cover date of the issue. searched ISI Web of Knowledge using the author search

Anesthesiology, V 111, No 6, Dec 2009 1279 1280 MARRET ET AL.

Retrieved review (n=28)

No systematic review* (n=2)21,34

Duplicate publication (n=1)10,11

Relevant systematic review (n=25)

Category I Category II Category III Systematic review that cited Systematic review that Systematic review that included a Reuben report in the retrieved and considered a data from a Reuben report introduction or discussion but Reuben report for inclusion either quantitatively (i.e. meta- did not consider the data for but eventually excluded it on analytically) or qualitatively inclusion (n=5)10,13,17,31,35 the basis of, for instance, quality or validity criteria (n=6)8,16,19,24,28,29

Quantitative systematic Qualitative systematic review review (meta-analysis) (n=8)12,15,20,23,26,27,30,33 (n=6)9,14,18,22,25,32

Fig. 1. Systematic reviews that cited Reuben reports. * No search strategy described. term Reuben SS.‡‡ The date of the last search was March subgroup analyses that included data from Reuben reports 18, 2009. The Create Citation Report tool was used to without Reuben reports using their original statistical soft- summarize bibliometric data of the Reuben reports. From ware package. When they did not respond to our inquiry, all reports that cited Reuben at least once, we selected we extracted the relevant data from text, Forrest plots, or those that used the term systematic review or meta-anal- tables of the meta-analysis or from original articles, and ysis in the title. When the title left doubt about the nature repeated the analyses without the Reuben reports using of the citing reference, we consulted the abstract. Review Manager (RevMan) version 5.0 (The Nordic Co- We checked whether the citing reviews fulfilled at chrane Centre, The Cochrane Collaboration, Copenhagen, least one of two criteria of a systematic review: (1) a Denmark; 2008). There was a pre hoc agreement that a methods section with a description of the search strat- change from a significant to a nonsignificant result (or vice egy or (2) explicit inclusion criteria for eligible reports. versa) by excluding Reuben data were a relevant change in All other reviews were regarded as narrative, nonsystem- the outcome of a meta-analysis. atic reviews and were not considered further. Systematic reviews were categorized into three sub- groups: Category I cited a Reuben report (e.g.,inthe Results introduction or the discussion) but did not consider the data for inclusion; category II retrieved and considered a We retrieved 96 Reuben reports. Sixty-four were cited Reuben report but eventually excluded it on the basis of, at least once (total number of citations, 1,199; without e.g., quality or validity criteria; and category III included self-citations, 682 or 57% of all citations). 
Of the 64 cited data from a Reuben report either qualitatively or quan- Reuben reports, 27 (see Supplemental Digital Content 1, titatively (i.e., meta-analytically). a document that lists all Reuben references cited in this When a Reuben report was included in a qualitative study, http://links.lww.com/ALN/A559) were cited by at category III review, each author decided independently least one of 28 reviews (fig. 1).8–35 whether noninclusion of that report would have Two of the citing reviews did not fulfill the minimal changed the overall conclusions of the review. When criteria of a systematic review and were therefore not our verdict was unanimous, we assumed that noninclu- considered further.21,34 One was published twice.10,11 sion of Reuben reports would, or would not, have Therefore, 27 Reuben reports were cited by 25 system- changed the conclusions of the review. atic reviews (fig. 1). The systematic reviews were pub- When data from Reuben reports were included in a lished between 1994 and 2007, thus all at least 2 yr quantitative category III systematic review (meta-analysis), before the Reuben case was made public. Consequently, we contacted the authors and asked them to repeat all none mistakenly included one of the retracted articles. The 27 Reuben reports were published between

‡‡ ISI Web of Knowledge. Available at: http://apps.isiknowledge.com/. Ac- 1994 and 2007 (table 1). Twenty-five were RCTs, in- cessed August 17, 2009. cluding data from 1,906 patients (median number of

Anesthesiology, V 111, No 6, Dec 2009 SUSCEPTIBILITY TO FRAUD IN SYSTEMATIC REVIEWS 1281 patients per trial, 60 [range, 40–200]). One was a all meta-analyses except for the analysis by Jirarattana- retrospective analysis including data from 434 pa- phochai and Jung,18 who did not respond to our inquiry. tients, and 1 was performed in 15 healthy volunteers. We therefore repeated their analyses with and without Ten reports were officially withdrawn.7 Six acknowl- Reuben reports. edged sponsorship by pharmaceutical companies, al- Five of those six quantitative systematic reviews though it remained unclear whether the authors were (studying the efficacy of adding NSAIDs or coxibs to funded by industry or whether the trials were spon- patient-controlled analgesia with morphine after ma- sored by industry like trials performed for registration. jor surgery,14 the impact of NSAIDs or coxibs on All reports with acknowledged sponsorship by a phar- morphine-related adverse effects,22 the efficacy of pre- maceutical company were subsequently withdrawn. emptive analgesia,25 or the antiemetic efficacy of Three reports acknowledged sponsorship by institu- droperidol9,32) reported on a total of 14 subgroup tional funds, and in 18, no sponsorship was acknowl- analyses that included Reuben reports. Exclusion of edged. Twelve different drugs, mainly analgesics, Reuben reports did not significantly change any of the given by various routes, were tested: classic nonste- results (table 2). roidal antiinflammatory drugs (NSAIDs), cyclooxygen- The sixth quantitative systematic review (studying ase 2–selective inhibitors (coxibs), morphine, fenta- NSAIDs and coxibs for analgesia after spine surgery)18 nyl, oxycodone, meperidine, bupivacaine, pregabalin, included data from 17 RCTs (789 patients); 5 (240 pa- venlafaxine, and droperidol. Seven of the 10 officially tients) were Reuben reports [7,8,16,22,25].§§ A further withdrawn reports tested a coxib. 
Types of surgery 2 Reuben reports [23,24] were cited in introduction or were mainly orthopedic. Reuben was the first author discussion but were not further analyzed. Sixteen sub- of 21 reports (78%). Thirty-eight individuals acted as group analyses included Reuben reports. For some sub- coauthors (median number per report, 2 [range, group analyses, the majority of data came from Reuben 1–5]). One individual coauthored eight times, one reports. For example, the analysis of pain intensity at coauthored six times, and one coauthored five times. 24 h with coxibs included data from 234 patients from 5 RCTs; 200 patients came from 4 Reuben reports. A total Category I Systematic Reviews of 8 subgroup analyses (50%) met our criterion for a Five systematic reviews (studying antiemetic drugs against relevant difference when the Reuben reports were ex- morphine-induced emesis,17 tramadol for osteoarthritis,10 cluded (table 2). This was mainly due to a decrease in perioperative gabapentin and pregabalin,31 perioperative power, i.e., 95% confidence intervals became wider. ,13 or postdischarge symptoms after outpatient However, for some subgroup analyses, point estimates surgery35) cited Reuben reports in the introduction or also changed. For example, 24-h morphine sparing with coxibs changed from an average of Ϫ30.2 mg to Ϫ2mg discussion but did not consider the data for inclusion (fig. 1). after exclusion of Reuben reports. Similarly, periopera- tive blood loss with NSAIDs or changed from Category II Systematic Reviews Ϫ19.7 ml to ϩ23.7 ml (table 2). Six systematic reviews (studying rofecoxib for postop- In the 6 meta-analyses, 30 subgroup analyses involved 8 28 erative pain relief, opioid sparing with coxibs, or Reuben reports. 
In 19 (63%), the ratio of the numbers of efficacy of intraarticular administration of analgesic Reuben reports over the numbers of all trials and the drugs16,19,24,29) retrieved and considered Reuben re- ratio of the numbers of patients in Reuben reports over ports for inclusion but excluded them from analysis the total numbers of patients were approximately 30% or for a variety of reasons (fig. 1). The Reuben reports lower (fig. 2); for none did the exclusion of Reuben were mainly excluded because patients received con- reports make any difference. In 8 analyses (27%), the comitant drugs that potentially interfered with the ratios of the numbers of reports and patients were efficacy of the experimental intervention or because between approximately 40% and 70%, and for 5 of specific endpoints were not reported. On no occasion those, exclusion of Reuben reports did make a rele- did the authors of these systematic reviews express vant difference. In 3 analyses (10%), the ratios of the concerns about the validity of the excluded Reuben numbers of reports and patients were 80% and higher, reports. and for all 3, exclusion of Reuben reports made a relevant difference. All analyses with significant changes Category III Systematic Reviews in results after exclusion of Reuben reports were from 1 Quantitative Systematic Reviews (Meta-analyses). single meta-analysis.18 Six quantitative systematic reviews included data from Qualitative Systematic Reviews. It was our unani- Reuben reports (fig. 1). We had access to all trial data of mous verdict for 3 of 8 qualitative systematic reviews that noninclusion of Reuben reports would not have 23,26,33 §§ For references of Reuben reports [shown throughout this article in brack- changed their conclusion. They studied the role of ets], please refer to Supplemental Digital Content 1, http://links.lww.com/ALN/A559. clonidine as an adjuvant to local anesthetics for periph-

Anesthesiology, V 111, No 6, Dec 2009 1282 MARRET ET AL.

Table 1. Reuben Reports That Were Subsequently Cited in Systematic Reviews

Reference Architecture No. of Patients No. of No. of No. Citing Systematic Authors, Journal, Year No. of Report in Report Citations Coauthors Reviews

Reuben SS, Dunn SM, Duprat KM, et al. [1] RCT 60 18 3 1 Anesthesiology 1994 Reuben SS, Steinberg RB, Kreitzer JM, et al. Anesth [4] RCT 60 51 3 1 Analg 1995 Freedman GM, Kreitzer JM, Reuben SS, et al. [3] RCT 40 10 3 2 Mount Sinai J Med 1995 Reuben SS, Connelly NR. Anesth Analg 1995 [2] RCT 80 66 1 1 Reuben SS, Connelly NR. Anesth Analg 1996 [5] RCT 80 41 1 3 Reuben SS, Duprat KM. Reg Anesth 1996 [6] RCT 60 16 1 1 Reuben SS, Connelly NR, Steinberg R. Reg Anesth [7] RCT 80 26 2 1 1997 Reuben SS, Connelly NR, Lurie S. Anesth Analg [8] RCT 70 43 4 4 1998 Steinberg RB, Reuben SS, Gardner G. Anesth Analg [9] RCT 70 20 2 1 1998 Reuben SS, Connelly NR, Maciolek H. Anesth Analg [10] RCT 60 44 2 2 1999 Reuben SS, Steinberg RB, Klatt JL, et al. [12] RCT 45 24 3 1 Anesthesiology 1999 Reuben SS, Steinberg RB, Lurie SD, et al. [11] RCT 60 19 3 1 Anesth Analg 1999 Connelly NR, Camerlenghi G, Bilodeau M, et al. Reg [13] RCT 40 4 5 2 Anesth Pain Med 1999 Reuben SS, Connelly NR. Anesth Analg 2000 [16] RCT 60 129 1 6

Joshi W, Reuben SS, Kilaru PR, et al. Anesth Analg [14] RCT 60 36 4 2 2000 Lurie SD, Reuben SS, Gibson CS, DeLuca PA, [15] RCT 15 volunteers 14 4 1 Maciolek HA. Reg Anesth Pain Med 2000 Reuben SS, Sklar J, El-Mansouri M. Anesth Analg [17] RCT 40 20 2 1 2001 Reuben SS, Bhopatkar S, Maciolek H, et al. Anesth [18] RCT 60 76 4 6 Analg 2002 Reuben SS, Fingeroth R, Krushell R, et al. [19] RCT 100 39 3 3 J Arthroplasty 2002 Joshi W, Connelly NR, Reuben SS, et al. Anesth [20] RCT 80 31 4 3 Analg 2003 Reuben SS, Makari-Judson G, Lurie SD. J Pain [21] RCT 100 36 2 1 Sympt Manag 2004 Reuben SS, Ekman EF. J Bone Joint Surg 2005 [22] RCT 80 35 1 2 Reuben SS, Ablett D, Kaye R. Can J Anesth 2005 [23] Retrospective 434 19 2 2

Reuben SS, Ekman EF, Raghunathan K, et al. Reg [24] RCT 80 7 5 1 Anesth Pain Med 2006 Reuben SS, Buvanendran A, Kroin JS, et al. Anesth [25] RCT 80 19 3 1 Analg 2006 Reuben SS, Ekman EF. Anesth Analg 2007 [26] RCT 191 12 1 1

Reuben SS, Ekman EF, Charron D. Anesth Analg [27] RCT 200 9 2 1 2007

eral nerve blockade,23 adverse effects associated with anesthesia (IVRA) included 29 trials (1,217patients); 6 opioids,33 and controlled-release oxycodone for the (325 patients) were from Reuben [4,6,9,11,12,15]. The treatment of cancer and noncancer pain.26 authors pointed out that only 1 trial (a Reuben report It was our unanimous verdict that noninclusion of [4]) looked at the potential intraoperative benefit of Reuben reports would have had an impact on the results NSAIDs added to local anesthetics: significantly fewer of 1 qualitative systematic review.12 This analysis of patients had tourniquet pain when ketorolac was added, adjuvants to local anesthetics for intravenous regional and reportedly there were a number of significant post-

Anesthesiology, V 111, No 6, Dec 2009 SUSCEPTIBILITY TO FRAUD IN SYSTEMATIC REVIEWS 1283

Table 1. Continued

No. of Patients per Group | Withdrawn | Sponsorship | Experimental Intervention | Route of Administration | Setting, Type of Surgery
10 | No | Not specified | Fentanyl | Intrathecal | Lower extremity vascular
 | No | Not specified | Ketorolac | IVRA | Hand
20 | No | Not specified | Droperidol | Morphine PCA | Orthopedic
10 | No | Not specified | Ketorolac | Intraarticular | Knee arthroscopy
20 | Yes | Not specified | Ketorolac | Intraarticular | Knee arthroscopy
20 | No | Not specified | Ketorolac | IVRA, wound infiltration | Hand
20 | No | Not specified | Ketorolac | Morphine PCA | Spine
10 | No | Not specified | Ketorolac | Intravenous | Spinal fusion
10 | No | Not specified | Ketorolac | IVRA | Carpal tunnel
20 | Yes | Not specified | Oxycodone | Oral | Anterior cruciate ligament reconstruction
15 | No | Institutional and/or departmental sources | Clonidine | IVRA | Carpal tunnel
10 | No | Not specified | Meperidine | IVRA | Carpal tunnel
20 | No | Not specified | Clonidine | Adjuvant to local anesthetic | Eye
 | Yes | Institutional and/or departmental sources | Celecoxib, rofecoxib | Oral | Spinal fusion
15 | No | Not specified | Clonidine, morphine | Intraarticular | Knee arthroscopy
15 | No | Not specified | Clonidine | IVRA | Tourniquet pain in healthy volunteers
20 | No | Not specified | Bupivacaine, morphine | Intraarticular | Knee arthroscopy
20 | No | Not specified | Rofecoxib | Oral | Knee arthroscopy
50 | Yes | Merck | Rofecoxib | Oral | Total knee arthroplasty
32, 34 | No | Not specified | Rofecoxib | Oral | Pediatric tonsillectomy
47, 48 | Yes | Rays of Hope Foundation, -Ayerst Laboratories | Venlafaxine | Oral | Mastectomy
40 | Yes | Pfizer | Celecoxib | Oral | Spinal fusion
130, 124, 60, 120 | No | Institutional and/or departmental sources | Ketorolac, celecoxib, rofecoxib | NA | Spinal fusion
40 | Yes | Pfizer | Celecoxib | Oral | Spinal fusion
20 | Yes | Not specified | Celecoxib, pregabalin | Oral | Spinal fusion
NA | Yes | Pfizer | Celecoxib | Oral | Anterior cruciate ligament reconstruction
100 | Yes | Pfizer | Celecoxib | Oral | Anterior cruciate ligament reconstruction

For references of Reuben reports [shown in brackets], please refer to Supplemental Digital Content 1, http://links.lww.com/ALN/A559. IVRA = intravenous regional anesthesia; NA = not available; PCA = patient-controlled analgesia; RCT = randomized controlled trial.

operative benefits from IVRA with ketorolac compared with systemic control. According to these authors, another Reuben report [6] found that ketorolac was equally analgesic either infiltrated into the surgical site or when given as an adjunct to IVRA. In addition, they thought, based on a Reuben report that was performed in 15 volunteers [15], that clonidine prolonged tourniquet tolerance and improved postoperative analgesia. According to the authors, these experimental findings were further supported by a clinical study by Reuben [12]. They also stressed that according to various Reuben reports [12,15], a small dose of clonidine as an adjuvant to IVRA seemed to be well tolerated. Finally, they concluded that opioids were disappointing for


Table 2. Reanalyses of Meta-analyses That Included Reuben Data

Reference No. | Experimental Intervention | Outcome | Meta-analysis (No. of trials/No. of patients included) | Reuben data (No. of trials/No. of patients included)

NSAIDs and coxibs as adjuvants to morphine after major surgery
22 | NSAID and coxib, any regimen | Incidence of nausea and vomiting | 14/1,343 | 1/20
 | NSAID and coxib, any regimen | Incidence of pruritus | 10/1,436 | 1/20
14 | Coxib, single dose† | Morphine consumption 0–24 h (mg) | 3/139 | 1/40
 | Coxib, single dose‡ | Morphine consumption 0–24 h (mg) | 5/182 | 1/40
 | NSAID, multiple dose | Morphine consumption 0–24 h (mg) | 15/893 | 1/30
 | NSAID, any regimen | Incidence of nausea and vomiting | 13/1,387 | 1/30
 | NSAID, any regimen | Incidence of pruritus | 9/1,369 | 1/30

Preemptive analgesia with local anesthetics and NSAIDs
25 | Local anesthetic infiltration | Supplemental postoperative analgesic requirements | 8/360 | 1/40
 | Local anesthetic infiltration | Time to first analgesic postoperatively | 8/362 | 1/40
 | NSAID, any regimen | Pain intensity during the first 24–48 h | 12/617 | 1/40
 | NSAID, any regimen | Supplemental postoperative analgesic requirements | 12/582 | 1/40
 | NSAID, any regimen | Time to first analgesic | 6/307 | 1/40

Antiemetic efficacy of droperidol
32 | Droperidol, added to intravenous morphine PCA | Incidence of vomiting at 24 h | 5/331 | 1/40
9 | Droperidol, any regimen, any route of administration | Incidence of vomiting, any observation period | 110/8,084 | 1/40

NSAIDs and coxibs as adjuvants to morphine after spine surgery§
18 | Coxib, any regimen | Pain intensity 0–2 h (VAS) | 5/255 | 2/120
 | Coxib, any regimen | Pain intensity 4–6 h (VAS) | 5/234 | 4/200
 | Coxib, any regimen | Pain intensity 24 h (VAS) | 5/234 | 4/200
 | Coxib, any regimen | Morphine consumption 0–2 h (mg) | 5/255 | 2/120
 | Coxib, any regimen | Morphine consumption 4–6 h (mg) | 3/114 | 2/80
 | Coxib, any regimen | Morphine consumption 0–24 h (mg) | 5/234 | 4/200
 | NSAID, any regimen | Pain intensity 4–6 h (VAS) | 6/243 | 2/60
 | NSAID, any regimen | Pain intensity 24 h (VAS) | 9/418 | 2/70
 | Any NSAID (ketorolac and ketoprofen) | Morphine consumption 4–6 h (mg) | 3/90 | 2/60
 | NSAID, any regimen | Morphine consumption 0–24 h (mg) | 8/348 | 2/60
 | NSAID and coxib, any regimen | Incidence of nausea and vomiting | 10/472 | 2/60
 | NSAID and coxib, any regimen | Incidence of sedation | 6/263 | 1/40
 | Ketorolac and celecoxib | Incidence of pruritus | 3/113 | 2/60
 | Any NSAID and celecoxib | Incidence of urinary retention | 5/205 | 2/80
 | Ketoprofen and celecoxib | Incidence of respiratory depression | 2/70 | 1/40
 | Any NSAID and celecoxib | Volume of perioperative blood loss (ml) | 5/298 | 2/120

IVRA and that, based on data from Reuben [11], only meperidine had substantial postoperative benefit but at the expense of postdeflation side effects. It was our view that noninclusion of the 6 Reuben reports would have substantially changed some of the conclusions of that qualitative systematic review.
For 4 qualitative category III systematic reviews, our verdict of whether noninclusion of Reuben reports


Table 2. Continued

Result of Original Subgroup Analysis (95% CI) | Result of Subgroup Analysis without Reuben Data (95% CI) | Verdict*
RR 0.70 (0.59 to 0.84) | RR 0.71 (0.59 to 0.85) | No change
RR 0.73 (0.52 to 1.02) | RR 0.72 (0.50 to 1.03) | No change
WMD −7.22 (−10.6 to −3.82) | WMD −6.71 (−11.1 to −2.35) | No change
WMD −27.8 (−44.3 to −11.4) | WMD −23.1 (−44.1 to −2.17) | No change
WMD −19.7 (−26.3 to −13.0) | WMD −16.8 (−20.1 to −13.5) | No change
RR 0.72 (0.61 to 0.86) | RR 0.73 (0.61 to 0.88) | No change
RR 0.78 (0.59 to 1.04) | RR 0.78 (0.58 to 1.04) | No change
SMD 0.44 (0.22 to 0.66) | SMD 0.40 (0.17 to 0.63) | No change
SMD 0.44 (0.22 to 0.65) | SMD 0.37 (0.15 to 0.59) | No change
SMD 0.14 (−0.03 to 0.31) | SMD 0.09 (−0.08 to 0.26) | No change
SMD 0.48 (0.31 to 0.66) | SMD 0.44 (0.26 to 0.62) | No change
SMD 0.68 (0.44 to 0.92) | SMD 0.60 (0.34 to 0.85) | No change
RR 1.72 (1.38 to 2.15) | RR 1.73 (1.34 to 2.23) | No change
RR 0.65 (0.61 to 0.70) | RR 0.65 (0.61 to 0.70) | No change
WMD −5.80 (−10.4 to −1.24) | WMD −6.74 (−20.5 to 7.02) | Relevant difference
WMD −10.4 (−15.2 to −5.57) | WMD −3.00 (−23.4 to 17.4) | Relevant difference
WMD −3.79 (−6.89 to −0.69) | WMD −6.30 (−23.2 to 10.6) | Relevant difference
WMD −2.80 (−5.01 to −0.59) | WMD −1.50 (−3.24 to 0.24) | Relevant difference
WMD −7.97 (−9.77 to −6.17) | WMD −3.20 (−25.7 to 19.3) | Relevant difference
WMD −30.2 (−46.2 to −14.2) | WMD −2.00 (−19.4 to 15.4) | Relevant difference
WMD −9.83 (−13.5 to −6.14) | WMD −9.53 (−15.2 to −3.88) | No change
WMD −14.5 (−19.0 to −9.90) | WMD −17.6 (−24.2 to −11.0) | No change
WMD −5.38 (−7.53 to −3.23) | WMD −2.50 (−10.6 to 5.59) | Relevant difference
WMD −14.7 (−24.7 to −4.79) | WMD −10.6 (−19.4 to −1.86) | No change
RR 0.83 (0.63 to 1.09) | RR 0.92 (0.67 to 1.26) | No change
RR 0.79 (0.41 to 1.53) | RR 0.99 (0.60 to 1.64) | No change
RR 0.42 (0.16 to 1.07) | RR 0.32 (0.04 to 3.89) | No change
RR 1.01 (0.57 to 1.80) | RR 1.40 (0.72 to 2.71) | No change
RR 0.18 (0.02 to 1.60) | RR 0.31 (0.01 to 8.28) | No change
WMD −19.7 (−37.3 to −2.02) | WMD 23.7 (−39.2 to 86.6) | Relevant difference
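The Verdict column above follows the rule stated in the footnote to Table 2: excluding Reuben data counts as a relevant difference only when a significant result becomes nonsignificant, or vice versa. A minimal sketch of this rule follows; it is our own illustration, and the function name and inputs are not taken from the original article:

```python
def verdict(ci_original, ci_without, null_value):
    """Classify a subgroup reanalysis as in Table 2.

    ci_original, ci_without: (lower, upper) bounds of the 95% CIs
    computed with and without the suspect trials.
    null_value: 1.0 for relative risks, 0.0 for WMD or SMD.
    A result counts as significant when its CI excludes the null value.
    """
    def significant(lo, hi):
        return not (lo <= null_value <= hi)

    changed = significant(*ci_original) != significant(*ci_without)
    return "Relevant difference" if changed else "No change"


# Two rows of Table 2, reproduced for illustration:
# RR 0.70 (0.59 to 0.84) vs. RR 0.71 (0.59 to 0.85): both CIs exclude 1.0
print(verdict((0.59, 0.84), (0.59, 0.85), null_value=1.0))      # No change
# WMD -19.7 (-37.3 to -2.02) vs. WMD 23.7 (-39.2 to 86.6): significance is lost
print(verdict((-37.3, -2.02), (-39.2, 86.6), null_value=0.0))   # Relevant difference
```

Note that this rule is purely dichotomous: a large shift of the point estimate that stays on the same side of the significance boundary would still be scored as "No change".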

* A change from a significant to a nonsignificant result (or vice versa) by excluding Reuben data was regarded as a relevant difference in the outcome of a subgroup analysis. † Celecoxib, 200 mg. ‡ Rofecoxib, 50 mg. § Subgroup analyses of the meta-analysis by Jirarattanaphochai and Jung (2008) were redone. CI = confidence interval; coxib = cyclooxygenase-2 selective inhibitor; NSAID = nonsteroidal antiinflammatory drug; PCA = patient-controlled analgesia; RR = relative risk; SMD = standardized mean difference; VAS = visual analog scale; WMD = weighted mean difference.

would have had an impact on the results or the conclusions of the review was not unanimous.15,20,27,30 These are discussed briefly, and the reasons for our lack of unanimity are explained.
Rømsing and Møiniche27 compared four different coxibs with NSAIDs for postoperative analgesia. They included 3 Reuben reports [16,18,20] and excluded an additional report [19] because pain intensity up to 24 h


[Figure 2 (scatter plot) appears here.]
Fig. 2. Ratio of numbers of reports and ratio of numbers of patients. Ratio of number of Reuben reports over number of all reports in a meta-analysis (x-axis) and ratio of number of patients in Reuben reports over number of all patients of all reports in a meta-analysis (y-axis). Data are from six meta-analyses (30 subgroup analyses) that included Reuben reports. Each symbol represents one subgroup analysis that included data from at least one Reuben report. Light gray circles = subgroup analyses with significant changes in results after exclusion of Reuben reports; dark squares = subgroup analyses that were unaffected by exclusion of Reuben reports. All light gray circles (n = 8) are from one single meta-analysis.18 Exclusion of Reuben reports never had a significant impact on results when ratios of numbers of reports and of patients were less than 30%.

postoperatively was not reported. Our ambiguity was due to one of the main conclusions of the review, stating that 50 mg rofecoxib provided superior analgesia compared with 200 mg celecoxib. That conclusion was based on data from 4 trials; 2 were from the same group of Merck collaborators, and 1 was from Reuben [16], and all 3 were in favor of rofecoxib.
Straube et al.30 tested the effect of coxibs on postoperative outcomes. They included 4 Reuben reports that tested celecoxib or rofecoxib [16,18–20]. The authors used a vote counting procedure. Noninclusion of Reuben reports would not have changed the ratio of positive to negative trials. However, in the discussion the authors stated that "With one exception (citing a Reuben report [18]) studies did not address the question of preemptive analgesia, where preoperative coxib was compared with postoperative coxib, though that exception found a large benefit for preoperative over postoperative rofecoxib," and "Only one small study (citing a Reuben report [18]) evaluated preoperative with postoperative coxib in a standard preemptive design, and did find significant benefit for preoperative use." Our disagreement was whether without referring to that Reuben report the authors would have been able to make a statement in favor of preemptive analgesia with rofecoxib.
Liu and Wu20 provided a summary of postoperative analgesia practices using data from systematic reviews but also from some additional, selected RCTs. The authors described a search strategy; however, selection criteria remained obscure and the reader was left in the dark as to how the conclusions from, e.g., a systematic review were weighted compared with data from a single RCT. In the results section, the authors suggested that the use of coxibs might result in a reduction in long-term complications after surgery including chronic pain. As evidence for this assumption, 2 RCTs from Reuben [26,27] were cited that both studied the analgesic efficacy of celecoxib. Although the authors did not retain this hypothesis in the conclusions of the review, our disagreement was whether these Reuben reports would have had an impact on one of the results of that review.
Finally, Fischer et al.15 published evidence-based recommendations for analgesia techniques after total knee arthroplasty. The authors acknowledged support by an educational grant from Pfizer, reimbursement by Pfizer for attending working group meetings, help and expertise in performing literature searches by a Pfizer employee, and editorial assistance by medical writers who were sponsored by Pfizer. The search strategy considered exclusively RCTs. Two Reuben reports were retrieved but were eventually excluded by the authors because surgery was not knee arthroplasty [10], or postoperative pain scores were not reported [19]. Because no Reuben reports were considered in the actual analysis, we considered whether to classify the review as category II. However, in the recommendations section, the authors unexpectedly referred to 2 further, previously not considered Reuben reports [22,23]. One was a retrospective chart review [23]. Based on those 2 reports, the authors weighted the evidence regarding bone healing against NSAIDs and in favor of coxibs: "Limited data show that conventional NSAID may have dose- and duration-dependent detrimental effects on bone healing," and "Although there is concern about impairment of bone healing with cyclooxygenase 2–selective inhibitors, limited evidence shows that they have no detrimental effects." Both Reuben reports were classified as level 1 evidence by the authors, and because both reports were performed in spinal surgery, the evidence was regarded as "transferable." Our disagreement was whether noninclusion of these Reuben reports would have changed one of the main conclusions of that review.

Discussion

The majority of quantitative systematic reviews (meta-analyses) proved to be robust against the impact of potentially fraudulent Reuben reports. This was not an unexpected result because data from a few trials, even when flawed or fabricated, should have no substantial impact on the conclusions of a meta-analysis, which often includes data from hundreds or thousands of patients from a large number of trials. However, some systematic reviews seemed to be less robust; they would probably have reported on different results or would have drawn different conclusions had they not included Reuben

reports.12,15,18,20,27,30 There were three main reasons for this lack of robustness. First, the numerical relation between the number of potentially fraudulent and the number of valid data in a systematic review seems to be crucial (although we were only able to show this empirically for meta-analyses).12,18 It may be inferred from figure 2 that meta-analyses that include mainly trials or patient data from one or only a few authors or centers need to be interpreted cautiously. Whether the cutoff ratio of approximately 30% is universally applicable depends on several factors. For example, here, trial sizes were very similar. Hence, one mega-trial would overwhelm even a large number of fabricated trials if they were small. There are no rules as to how many trials a valid meta-analysis should include. The proportion of potentially fraudulent data among valid data needs to be considered.
Our lack of unanimity in estimating the impact of Reuben reports in some qualitative systematic reviews was mainly due to two reasons. First, some systematic reviewers seemed to give undue weight to evidence emerging from particular Reuben reports.20,30 This problem is inherent to the process of qualitative systematic reviews because they cannot take into account effect size. Second, in one review that claimed to consider exclusively data from RCTs, the authors referred to observational data from Reuben that eventually had a strong impact on the overall conclusions.15 The Improving the Quality of Reports of Meta-Analyses of Randomized Controlled Trials (QUOROM) statement stipulates that the results of a systematic review should be interpreted in the "light of the totality of available evidence."36 To avoid any misinterpretation, the QUOROM statement may need to be more specific in that the results of a systematic review should be interpreted in the "light of the totality of evidence that was retrieved through the systematic literature search." Data not directly relevant should be interpreted as hypothesis generating and may be used as a basis for a rational research agenda rather than as transferable evidence.
Of the 96 Reuben reports, one third (n = 32) have never been cited; although their impact on science and clinical practice is impossible to quantify, we may assume that it remains low. Two thirds have been cited almost 1,200 times in indexed journals, and Reuben and his coauthors have actively participated in this dissemination process because almost half of all citations were due to self-citation. Thirty-seven reports (38%) were cited in articles other than systematic reviews; among those were editorials, guidelines, clinical studies, and conventional, nonsystematic review articles. A previously published opinion statement attempted to estimate the impact of Reuben reports in this literature.6 Finally, 27 reports (28%) were cited in systematic reviews. Clearly, the detrimental impact of Reuben reports on systematic reviews, if there was any, was limited for several reasons. First, Reuben reports were published over a period of 15 yr with scope for others to confirm or refute the findings. There is empirical evidence that the median survival time without substantive new evidence for meta-analyses is approximately 5 yr only, and that clinically important evidence that alters conclusions about the effectiveness and harms of treatments can accumulate rapidly.37 Second, the Reuben studies were of limited size, minimizing their quantitative impact on meta-analyses. Third, most Reuben reports echoed current knowledge and did not contradict science.
One particularity of a few Reuben reports was that they pretended to add new insights; these were often attractive, welcomed by many, and sometimes they seemed to be revolutionary and to advance science. Among those were the absence of detrimental effects of coxibs on bone healing after spine surgery, the beneficial long-term outcome after preemptive administration of coxibs including an allegedly decreased incidence of chronic pain after surgery, and the analgesic efficacy of ketorolac or clonidine when added to local anesthetics for intravenous regional anesthesia. Clinical algorithms based on this evidence need to be revised.38
Our analysis has limitations. First, we were unable to address the role of Reuben's coauthors. It remains obscure why a small number of individuals coauthored a large number of Reuben reports without having any suspicion about the nature of the data. Although the publishing role of senior members of a collaboration group on an article has been recognized for a while, the responsibilities of coauthors, and thus each author's contributions, have only been emerging recently.39 Second, we were not able either to evaluate the impact of sponsorship. Research that is sponsored by industry may draw undue conclusions in favor of the industrial product.40–42 Some Reuben reports acknowledged sponsorship by industry; however, we do not know whether industry simply funded the authors or was actively involved in design, analysis, and publication of the studies. All reports that acknowledged sponsorship from industry were retracted because the reported data were identified as having been fabricated (table 1). For the majority of Reuben reports, no information on sponsorship was provided; we do not know whether there was none or whether it was not reported. Given the concerns about financial ties in research, one would suggest that if a substantial proportion of evidence was derived from a single study sponsor, the review of the evidence should be considered with greater skepticism. Third, we assumed that all Reuben reports that were cited by systematic reviews were fabricated and that the entire data sets were flawed. We cannot exclude that some of these reports contained valid data. However, it may be argued that it is dangerous when we discover a fraudulent research article to assume that there were no problems with all previous work.43
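The robustness reasoning above (inverse-variance pooling, and the approximately 30% report and patient ratios plotted in figure 2) can be sketched in a few lines of code. The sketch below is our own illustration with invented trial data, not the authors' software; the trial names, effect sizes, and standard errors are made up for the example:

```python
import math

def pool_log_rr(pairs):
    """Fixed-effect inverse-variance pooling of log relative risks.
    pairs: list of (log_rr, se) tuples. Returns (pooled_log_rr, pooled_se)."""
    weights = [1.0 / se ** 2 for _, se in pairs]
    pooled = sum(w * lr for (lr, _), w in zip(pairs, weights)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def sensitivity(trials, flagged):
    """Pool all trials, then pool again without a flagged author's trials,
    and report the ratios of flagged reports and patients (cf. figure 2).
    trials: dict name -> (log_rr, se, n_patients); flagged: set of names."""
    def summarize(names):
        est, se = pool_log_rr([trials[t][:2] for t in names])
        return math.exp(est), (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))

    kept = [t for t in trials if t not in flagged]
    n_all = sum(trials[t][2] for t in trials)
    n_flagged = sum(trials[t][2] for t in flagged)
    return {
        "all": summarize(list(trials)),
        "without_flagged": summarize(kept),
        "report_ratio": len(flagged) / len(trials),
        "patient_ratio": n_flagged / n_all,
    }

# Invented example: five small trials of similar size, one from a flagged source.
trials = {
    "A2001": (math.log(0.70), 0.20, 60),
    "B2003": (math.log(0.75), 0.25, 40),
    "C2004": (math.log(0.65), 0.30, 50),
    "D2006": (math.log(0.80), 0.22, 80),
    "F2005": (math.log(0.60), 0.28, 40),   # trials of the flagged author
}
result = sensitivity(trials, {"F2005"})
# Interpret cautiously when either ratio exceeds roughly 30%:
caution = result["report_ratio"] > 0.3 or result["patient_ratio"] > 0.3
```

With 1 of 5 reports flagged (20% of reports, about 15% of patients), both ratios stay below the 30% mark and the pooled relative risk barely moves, mirroring the "No change" verdicts that dominate Table 2; with one dominating source the same sketch would show the pooled estimate and its confidence interval shifting markedly.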


All human activity is associated with misconduct.44 It is perhaps naive to believe that fraud can be avoided; it will probably always exist. Almost 2% of scientists admitted to have fabricated, falsified, or modified data at least once, and this estimate was considered to be conservative.45 A fraudulent article looks much the same as a nonfraudulent one; there seem to be no obvious alert signs to hint that an article is fraudulent.46 One of the strengths of systematic review is that it leads to a shift of emphasis from single studies to multiple studies. Unless a fraudulent study is very large and reports on an important number of events, it will not have much scope to change the conclusions of a systematic review. Consequently, conclusions from a single study should not be overestimated. Also, every effort must be undertaken to further improve quality and validity of systematic reviews. Well-conducted systematic reviews allow a more objective appraisal of the evidence than traditional, nonsystematic reviews, provide a more precise estimate of a treatment effect, and may explain heterogeneity between the results of individual studies beyond the play of chance. Ill-conducted systematic reviews, on the other hand, may be biased because of exclusion of relevant studies or inclusion of inadequate studies.
Our study shows clearly that meta-analysis is not an appropriate instrument to detect fraud if the fraudulent data are in line with valid data. Instances where noninclusion of Reuben reports changed the results from pooled analyses were almost always due to a decrease in the power of the analysis related to a decrease in the amount of analyzable data (table 2). Only rarely did a point estimate change significantly.
In conclusion, misleading systematic reviews can generally be avoided if a few basic principles are observed.36 However, the QUOROM statement, intended to improve the quality of reporting of systematic reviews, does not consider the possibility of fraud. Additional issues may be considered to protect systematic reviews against fraudulent data and to further improve their credibility and validity. For example, the QUOROM statement should include a word of caution regarding inappropriate ratios between data coming from a very limited number of authors or centers and the total number of data that are included in a systematic review. Also, systematic reviewers should stick to pre hoc–defined rules and should not rely on evidence that was not considered for the review itself. Although readers may expect systematic reviewers to put their findings into a wider context, it does not make sense to define explicit inclusion and noninclusion criteria for the review process but then to base the main conclusions on data that were not initially included in the review because they did not fulfill minimal quality criteria. Finally, systematic reviewers must recognize that qualitative systematic reviews are more vulnerable than quantitative systematic reviews and that they are at particular risk of giving undue weight to individual studies. Such reviews must be performed with particular care. The clinical message of our analysis is pragmatic. The overall effect of Reuben's fraud is weak in areas where there are lots of other data, but is likely to be stronger in areas of few data. Even after retraction of Reuben reports, classic NSAIDs and coxibs are still analgesic in surgical patients, they still have an opioid-sparing effect, and they still decrease pain intensity in the immediate postoperative period. However, clinical algorithms that have been heavily based on evidence from Reuben reports need to be revised. Among those are the effects of coxibs on bone healing and preemptive analgesia, and the analgesic efficacy of adjuvants to local anesthetics for IVRA.

References

1. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP: Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998; 352:609–13
2. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Empirical evidence of bias: Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273:408–12
3. Tramèr MR, Reynolds DJ, Moore RA, McQuay HJ: Impact of covert duplicate publication on meta-analysis: A case study. BMJ 1997; 315:635–40
4. Routine audit uncovered Reuben fraud: Missing IRB info led to discovery of fabricated data. Anesthesiology News 2009; 35:3
5. Winstein KJ, Armstrong D: Top pain scientist fabricated data in studies, hospital says. Wall Street Journal, March 11, 2009, p A12
6. White PF, Kehlet H, Liu S: Perioperative analgesia: What do we still know? Anesth Analg 2009; 108:1364–7
7. Shafer SL: Tattered threads. Anesth Analg 2009; 108:1361–3
8. Barden J, Edwards J, Moore RA, McQuay HJ: Single dose oral rofecoxib for postoperative pain. Cochrane Database Syst Rev 2005; 1:CD004604
9. Carlisle JB, Stevenson CA: Drugs for preventing postoperative nausea and vomiting. Cochrane Database Syst Rev 2006; 3:CD004125
10. Cepeda MS, Camargo F, Zea C, Valencia L: Tramadol for osteoarthritis. Cochrane Database Syst Rev 2006; 3:CD005522
11. Cepeda MS, Camargo F, Zea C, Valencia L: Tramadol for osteoarthritis: A systematic review and metaanalysis. J Rheumatol 2007; 34:543–55
12. Choyce A, Peng P: A systematic review of adjuncts for intravenous regional anesthesia for surgical procedures. Can J Anaesth 2002; 49:32–45
13. Desjardins PJ, Mehlisch DR, Chang DJ, Krupa D, Polis AB, Petruschke RA, Malmstrom K, Geba GP: The time to onset and overall analgesic efficacy of rofecoxib 50 mg: A meta-analysis of 13 randomized clinical trials. Clin J Pain 2005; 21:241–50
14. Elia N, Lysakowski C, Tramèr MR: Does multimodal analgesia with acetaminophen, nonsteroidal antiinflammatory drugs, or selective cyclooxygenase-2 inhibitors and patient-controlled analgesia morphine offer advantages over morphine alone? Meta-analyses of randomized trials. Anesthesiology 2005; 103:1296–304
15. Fischer HB, Simanski CJ, Sharp C, Bonnet F, Camu F, Neugebauer EA, Rawal N, Joshi GP, Schug SA, Kehlet H, PROSPECT Working Group: A procedure-specific systematic review and consensus recommendations for postoperative analgesia following total knee arthroplasty. Anaesthesia 2008; 63:1105–23
16. Gupta A, Bodin L, Holmström B, Berggren L: A systematic review of the peripheral analgesic effects of intraarticular morphine. Anesth Analg 2001; 93:761–70
17. Hirayama T, Ishii F, Yago K, Ogata H: Evaluation of the effective drugs for the prevention of nausea and vomiting induced by morphine used for postoperative pain: A quantitative systematic review. Yakugaku Zasshi 2001; 121:179–85
18. Jirarattanaphochai K, Jung S: Nonsteroidal antiinflammatory drugs for postoperative pain management after lumbar spine surgery: A meta-analysis of randomized controlled trials. J Neurosurg Spine 2008; 9:22–31
19. Kalso E, Tramèr MR, Carroll D, McQuay HJ, Moore RA: Pain relief from intra-articular morphine after knee surgery: A qualitative systematic review. Pain 1997; 71:127–34
20. Liu SS, Wu CL: The effect of analgesic technique on postoperative patient-reported outcomes including analgesia: A systematic review. Anesth Analg 2007; 105:789–808
21. Macario A, Lipman AG: Ketorolac in the era of cyclo-oxygenase-2 selective nonsteroidal anti-inflammatory drugs: A systematic review of efficacy, side effects, and regulatory issues. Pain Med 2001; 2:336–51
22. Marret E, Kurdi O, Zufferey P, Bonnet F: Effects of nonsteroidal antiinflammatory drugs on patient-controlled analgesia morphine side effects: Meta-analysis of randomized controlled trials. Anesthesiology 2005; 102:1249–60
23. McCartney CJ, Duggan E, Apatu E: Should we add clonidine to local anesthetic for peripheral nerve blockade? A qualitative systematic review of the literature. Reg Anesth Pain Med 2007; 32:330–8
24. Møiniche S, Mikkelsen S, Wetterslev J, Dahl JB: A systematic review of intra-articular local anesthesia for postoperative pain relief after arthroscopic knee surgery. Reg Anesth Pain Med 1999; 24:430–7
25. Ong CK, Lirk P, Seymour RA, Jenkins BJ: The efficacy of preemptive analgesia for acute postoperative pain management: A meta-analysis. Anesth Analg 2005; 100:757–73
26. Rischitelli DG, Karbowicz SH: Safety and efficacy of controlled-release oxycodone: A systematic literature review. Pharmacotherapy 2002; 22:898–904
27. Rømsing J, Møiniche S: A systematic review of COX-2 inhibitors compared with traditional NSAIDs, or different COX-2 inhibitors for post-operative pain. Acta Anaesthesiol Scand 2004; 48:525–46
28. Rømsing J, Møiniche S, Mathiesen O, Dahl JB: Reduction of opioid-related adverse events using opioid-sparing analgesia with COX-2 inhibitors lacks documentation: A systematic review. Acta Anaesthesiol Scand 2005; 49:133–42
29. Rosseland LA: No evidence for analgesic effect of intra-articular morphine after knee arthroscopy: A qualitative systematic review. Reg Anesth Pain Med 2005; 30:83–98
30. Straube S, Derry S, McQuay HJ, Moore RA: Effect of preoperative Cox-II-selective NSAIDs (coxibs) on postoperative outcomes: A systematic review of randomized studies. Acta Anaesthesiol Scand 2005; 49:601–13
31. Tiippana EM, Hamunen K, Kontinen VK, Kalso E: Do surgical patients benefit from perioperative gabapentin/pregabalin? A systematic review of efficacy and safety. Anesth Analg 2007; 104:1545–56
32. Tramèr MR, Walder B: Efficacy and adverse effects of prophylactic antiemetics during patient-controlled analgesia therapy: A quantitative systematic review. Anesth Analg 1999; 88:1354–61
33. Wheeler M, Oderda GM, Ashburn MA, Lipman AG: Adverse events associated with postoperative opioid analgesia: A systematic review. J Pain 2002; 3:159–80
34. Wu CL, Naqibuddin M, Fleisher LA: Measurement of patient satisfaction as an outcome of regional anesthesia and analgesia: A systematic review. Reg Anesth Pain Med 2001; 26:196–208
35. Wu CL, Berenholtz SM, Pronovost PJ, Fleisher LA: Systematic review and analysis of postdischarge symptoms after outpatient surgery. Anesthesiology 2002; 96:994–1003
36. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF: Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999; 354:1896–900
37. Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D: How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 2007; 147:224–33
38. Eisenach JC: Data fabrication and article retraction: How not to get lost in the woods. Anesthesiology 2009; 110:955–6
39. Authorship policies. Nature 2009; 458:1078
40. Als-Nielsen B, Chen W, Gluud C, Kjaergard LL: Association of funding and conclusions in randomized drug trials: A reflection of treatment effect or adverse events? JAMA 2003; 290:921–8
41. Bekelman JE, Li Y, Gross CP: Scope and impact of financial conflicts of interest in biomedical research: A systematic review. JAMA 2003; 289:454–65
42. Yank V, Rennie D, Bero LA: Financial ties and concordance between results and conclusions in meta-analyses: Retrospective cohort study. BMJ 2007; 335:1202–5
43. Smith R: Investigating the previous studies of a fraudulent author. BMJ 2005; 331:288–91
44. Smith R: Research misconduct: The poisoning of the well. J R Soc Med 2006; 99:232–7
45. Fanelli D: How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One 2009; 4:e5738
46. Trikalinos NA, Evangelou E, Ioannidis JPA: Falsified papers in high-impact journals were slow to retract and indistinguishable from nonfraudulent papers. J Clin Epidemiol 2008; 61:464–70

5.4 How do authors of systematic reviews deal with research misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors.

Authors: Elia N, von Elm E, Chatagner A, Pöpping DM, Tramèr MR
Published in: BMJ Open 2016;6:e010442
IF 2015: 2.562

Summary

In this study, our aim was to explore whether systematic reviewers were aware of the potential for research malpractice in the trials included in their analyses, and whether they sometimes suspected new cases of research misconduct during the review process. We performed a cross-sectional analysis of 118 systematic reviews published in 2013 in one of four major medical journals or the Cochrane Library, and surveyed their authors with additional questions. Our study showed that more than half of systematic reviewers searched for duplicate publications (69%), searched for unpublished trials (66%), or contacted study authors (62%). However, only 27% were concerned with the risk of sponsor bias, 4% with the risk related to conflicts of interest of study authors, and only 2.5% checked for ethical approval of the included trials. Interestingly, misconduct was suspected in seven reviews, but only two reported it in the published review.

This article was commented on by two renowned websites dedicated to research misconduct:

RetractionWatch: http://retractionwatch.com/2016/08/16/should-systematic-reviewers-report-suspected-misconduct/

Rédaction Médicale et Scientifique: http://www.h2mw.eu/redactionmedicale/revue-syst%C3%A9matique-m%C3%A9ta-analyse/

Downloaded from http://bmjopen.bmj.com/ on April 7, 2016 - Published by group.bmj.com

Open Access Research
How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors

Nadia Elia,1,2 Erik von Elm,3 Alexandra Chatagner,4 Daniel M Pöpping,5 Martin R Tramèr1,6

To cite: Elia N, von Elm E, Chatagner A, et al. How do authors of systematic reviews deal with research malpractice and misconduct in original studies? A cross-sectional analysis of systematic reviews and survey of their authors. BMJ Open 2016;6:e010442. doi:10.1136/bmjopen-2015-010442

▸ Prepublication history and additional material is available. To view please visit the journal (http://dx.doi.org/10.1136/bmjopen-2015-010442).

Received 4 November 2015; Revised 12 January 2016; Accepted 10 February 2016

For numbered affiliations see end of article.
Correspondence to: Dr Nadia Elia; [email protected]

ABSTRACT
Objectives: To study whether systematic reviewers apply procedures to counter-balance some common forms of research malpractice such as not publishing completed research, duplicate publications, or selective reporting of outcomes, and to see whether they identify and report misconduct.
Design: Cross-sectional analysis of systematic reviews and survey of their authors.
Participants: 118 systematic reviews published in four journals (Ann Int Med, BMJ, JAMA, Lancet), and the Cochrane Library, in 2013.
Main outcomes and measures: Number (%) of reviews that applied procedures to reduce the impact of: (1) publication bias (through searching of unpublished trials), (2) selective outcome reporting (by contacting the authors of the original studies), (3) duplicate publications, (4) sponsors' and (5) authors' conflicts of interest, on the conclusions of the review, and (6) looked for ethical approval of the studies. Number (%) of reviewers who suspected misconduct are reported. The procedures applied were compared across journals.
Results: 80 (68%) reviewers confirmed their data. 59 (50%) reviews applied three or more procedures; 11 (9%) applied none. Unpublished trials were searched in 79 (66%) reviews. Authors of original studies were contacted in 73 (62%). Duplicate publications were searched in 81 (69%). 27 reviews (23%) reported sponsors of the included studies; 6 (5%) analysed their impact on the conclusions of the review. Five reviews (4%) looked at conflicts of interest of study authors; none of them analysed their impact. Three reviews (2.5%) looked at ethical approval of the studies. Seven reviews (6%) suspected misconduct; only 2 (2%) reported it explicitly. Procedures applied differed across the journals.
Conclusions: Only half of the systematic reviews applied three or more of the six procedures examined. Sponsors, conflicts of interest of authors and ethical approval remain overlooked. Research misconduct is sometimes identified, but rarely reported. Guidance on when, and how, to report suspected misconduct is needed.

Strengths and limitations of this study
▪ This study combines quantitative and qualitative methods to investigate how systematic reviewers deal with research malpractice and misconduct.
▪ It proposes clear and reproducible samples of systematic reviews from four major medical journals and the Cochrane Library.
▪ The extracted data were confirmed by 70% of the authors of the systematic reviews analysed.
▪ The systematic reviewers were not asked to confirm all their data, but only the information considered ambiguous.
▪ There is currently no common definition of 'research misconduct'. This may have led to an underestimation of its real prevalence.

INTRODUCTION
Rationale
Research misconduct can have devastating consequences for public health1 and patient care.2 3 While a common definition of research misconduct is still lacking, there is an urgent need to come up with strategies to prevent it.4 5 Fifteen years ago, Smith6 proposed a 'preliminary taxonomy of research misconduct' describing 15 practices ranging from 'minor' to 'major' misconduct. Some of these practices, however, are very common and may not be regarded by all as 'misconduct'. Therefore, this research defines 'malpractice' as relatively common and minor misconduct, while the term 'misconduct' is used for data fabrication, falsification, plagiarism or any other intentional malpractices.

It has been shown that some of the 15 malpractices described by Smith threaten the conclusions of systematic reviews. Examples of such malpractices include: avoiding the publication of a completed research,7 8


duplicate publications,9 selectively reporting on outcomes or adverse effects,10 and presenting biased results that are in favour of the sponsors'11 or the authors'12 interests. The impact of other malpractices, such as gift or ghost authorship, is less clear.

Rigorous systematic review methodology includes specific procedures that can counter-balance some of the research malpractice. Unpublished studies may, for example, be identified through exhaustive literature searches,13 and statistical tests or graphical displays such as funnel plots can quantify the risk of publication bias.14 Unreported outcomes may be unearthed by contacting the authors of original articles, and multiple publications based on the same cohort of patients can be identified and excluded from analyses.15 Authors of systematic reviews (further called: systematic reviewers or reviewers) can also use sensitivity analyses to quantify the impact sponsors' and authors' personal interests have on the conclusions of a review. Finally, it has been suggested that, as part of the process of systematic reviewing, ethical approval of included studies or trials should be checked in order to identify unethical research.16 17 Systematic reviewers could hence act as whistle-blowers when reporting any suspected misconduct.18

Objectives
The aim of this study is to examine whether systematic reviewers apply the aforementioned procedures, and whether they uncover and report on cases of misconduct. The study first examines whether reviewers searched for unpublished studies or tested for publication bias, contacted authors to unearth unreported outcomes, searched for duplicate publications, analysed the impact of sponsors or possible conflicts of interest of study authors, checked on ethical approval of the studies, and reported on misconduct. The secondary objective was to examine whether four major journals and the Cochrane Library reported consistently on the issue.

METHODS
The reporting of this cross-sectional study follows the STROBE recommendation.19 The protocol is available from the authors.

Study design
We conducted a cross-sectional analysis of systematic reviews published in 2013 in four general medical journals (Annals of Internal Medicine (Ann Int Med), The BMJ (BMJ), JAMA and The Lancet (Lancet)). A random sample of new reviews was drawn from the Cochrane Database of Systematic Reviews (Cochrane Library) in 2013, as Cochrane reviews are considered the gold standard in terms of systematic reviewing.

Setting and selection of systematic reviews
Systematic reviews were identified through a PubMed search in August 2014, using the syntax 'systematic review [Title] AND journal title [Journal], limit 01.01.2013 to 31.12.2013'. A computer-generated random sequence was used to select 25 reviews published in 2013 in the Cochrane Library (http://www.cochranelibrary.com/cochrane-database-of-systematic-reviews/2013-table-of-contents.html).

Reviews were selected by one of the five authors (NE) on the basis of the review titles and abstracts. This was checked by another author (AC). To be eligible, reviews had to describe a literature search strategy and include at least one trial or study. Narrative reviews or meta-analyses without an exhaustive literature search were not considered.

Variables
From each systematic review, we extracted the following information: the first author's name and country of affiliation; the number of co-authors; the name of the journal; the title of the review; the number of databases searched; the number of studies and study designs included; the language limitations applied; whether or not a protocol was registered and freely accessible; and, finally, sources of funding and possible conflicts of interest of the reviewers.

Furthermore, we examined whether each of the selected reviews applied the following six procedures; that is, whether they: (1) searched for unpublished trials; (2) contacted authors to identify unreported outcomes; (3) searched for duplicate publications (defined as a redundant republication of an already published study, with or without a cross-reference to the original article); (4) analysed the impact of the sponsors of the original studies on the conclusions of the review; (5) analysed the impact of possible conflicts of interest of the authors on the conclusions of the review; and (6) extracted information on ethical approval of included studies. We used the following rating system: 0=procedure not applied, 1=partially applied, 2=fully applied (table 1). Finally, we collected information on whether the systematic reviewers suspected, and explicitly reported on, any misconduct in the included articles.

Bias
Data from the reviews were extracted by one author (NE) and copied into a specifically designed spreadsheet. Two of the co-authors checked the data (AC and DMP). We contacted all the corresponding authors of the reviews and asked them to confirm our interpretation of their methods of review. This included their method regarding the search for unpublished trials, their contacts with the authors, their search for duplicate publications and their identification of misconduct. When there was discrepancy between our interpretation and the reviewers' answers, we used the latter.
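The selection step described in the Methods (a fixed title/journal search string, plus a computer-generated random sequence for the Cochrane sample) can be sketched as follows. This is a minimal illustration only: the exact PubMed date-limit syntax, the placeholder Cochrane IDs and the seed are assumptions for the example, not taken from the paper.

```python
import random

# The four journals named in the study; the query shape follows the syntax
# quoted in the Methods ('systematic review [Title] AND journal title [Journal]').
JOURNALS = ["Ann Intern Med", "BMJ", "JAMA", "Lancet"]

def build_query(journal: str, year: int = 2013) -> str:
    """Assemble a PubMed-style query string (date-filter syntax is illustrative)."""
    return (f'systematic review[Title] AND "{journal}"[Journal] '
            f'AND ("{year}/01/01"[PDAT] : "{year}/12/31"[PDAT])')

queries = [build_query(j) for j in JOURNALS]

# Computer-generated random sequence to draw 25 Cochrane reviews, mirroring
# the random selection described in the Methods (IDs and seed are invented).
cochrane_2013 = [f"CD{i:06d}" for i in range(1, 401)]
sample = random.Random(42).sample(cochrane_2013, k=25)

print(queries[0])
print(len(sample), "Cochrane reviews sampled")
```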



This was done by email, and a reminder was sent to those who had not replied within 2 weeks.

Sample size
The capacity of systematic reviewers to identify misconduct is unknown. Our hypothesis was that 5% of systematic reviewers would identify misconduct. Therefore, we needed a minimum of 110 systematic reviews to allow us to detect a prevalence of 5%, if it existed, with a margin of error of 4%, assuming an α-error of 0.05.

Statistical methods
Descriptive results are reported as numbers (proportions) and medians (IQR) as required. To check whether systematic reviews were different from one journal to the other, we performed all descriptive analyses separately according to the title of the journal. χ2 or Kruskal–Wallis tests were applied to test the null hypothesis of homogeneous distribution of characteristics and outcomes. We compared reviews from reviewers who answered our inquiry with reviews from those who did not, and across journals. Since Cochrane reviews were expected to be different from those published in the journals, we performed separate analyses with and without Cochrane reviews. We did not expect missing data. Statistical significance was defined as an α-error of 0.05 or less in two-sided tests. Analyses were performed using STATA V.13.

Table 1 Rating of the six procedures examined

Search of unpublished trials and/or test for publication bias
  Score 0: Unpublished trials not searched, publication bias not tested
  Score 1: Unpublished trials searched OR publication bias tested
  Score 2: Unpublished trials searched AND publication bias tested or statistically corrected
Contact with study authors
  Score 0: Study authors not contacted
  Score 1: Study authors contacted for methodology or unspecified reasons
  Score 2: Study authors contacted to unearth unreported endpoints
Duplicate publications
  Score 0: Duplicate publications not searched or not mentioned
  Score 1: Duplicate publications searched
  Score 2: Duplicate publications referenced in the published report*
Sponsors of the studies
  Score 0: Information on study sponsors not reported
  Score 1: Information on study sponsors reported
  Score 2: Impact of study sponsor(s) on the conclusions of the review analysed
Study authors' conflicts of interest
  Score 0: Study authors' conflicts of interest not reported
  Score 1: Study authors' conflicts of interest reported
  Score 2: Impact of study authors' conflicts of interest on the conclusions of the review analysed
Ethical approval
  Score 0: Ethical approval of included studies not reported
  Score 1: Ethical approval of included studies reported
  Score 2: Lack of ethical approval of included studies reported and referenced†

Percentages may not add up to 100% because of rounding errors.
*Also includes reports that explicitly mention that no duplicates were identified.
†Also includes reports that explicitly mention that none of the included studies lacked ethical approval.

RESULTS
Selection of reviews
We identified 136 references; 18 were excluded for different reasons, leaving us with 118 systematic reviews (Ann Int Med 39, A1-39; BMJ 38, B1-38; JAMA 12, J1-12; Lancet 10, L1-10; Cochrane Library 19, C1-19) (figure 1, online supplementary appendix table 1A).

Characteristics of the reviews
The characteristics of the reviews are described in table 2 and online supplementary appendix tables 1A and 2A. Approximately 75% of the first authors were affiliated to an English-speaking institution. The protocols of all the Cochrane reviews were registered and available. However, protocols were available for only 17 reviews from the journals. Sources of funding were declared in 110 reviews. Among these 110, 24 declared that they had no funding at all. All the reviews declared presence or absence of conflicts of interest of the reviewers. The median number of databases searched was four. Additional references were searched in systematic reviews published previously and through contacting experts and/or authors of the original studies. Forty-two (36%) reviews only considered the English literature, and 6 (5%) reviews searched in Medline only. Four (3%) reviews searched for English articles in Medline only.A(35,39),J(5,9) The median number of articles included per review was 28. Half of the systematic reviews included a mix of various study designs, while 39% included only RCTs (table 2).

Outcome data
Contact with reviewers
Out of the 118 reviews, we were able to contact 111 corresponding authors. No valid email address was available for seven. Eighty reviewers (72%) responded to our inquiries.

Among the 80 reviewers who responded, 8 (10%) provided information that changed our data extraction
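The sample-size argument in the Methods can be cross-checked with the usual normal-approximation (Wald) formula for estimating a proportion. This is a sketch of the standard textbook calculation, not necessarily the authors' exact method; their software or rounding may differ slightly.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def required_n(p: float, e: float, z: float = 1.96) -> int:
    """Smallest n giving a margin of error <= e for an expected proportion p."""
    return math.ceil(z**2 * p * (1 - p) / e**2)

# Expected prevalence of suspected misconduct: 5%; target margin of error: ~4%
print(f"margin of error with n=110: {margin_of_error(0.05, 110):.3f}")
print(f"n needed for a 4% margin:   {required_n(0.05, 0.04)}")
```

With n=110 the margin works out to about 4.1%, consistent with the stated 4%; the textbook formula itself returns 115 rather than 110, a small discrepancy that presumably reflects the exact z-value or rounding the authors used.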


regarding the endpoint 'search for unpublished trials or test for publication bias'. One reviewer declared that, contrary to our assumption, unpublished trials had not been searched in their review,B28 and seven claimed that unpublished trials had been searched although this was not reported in the published reviews.A(21,25,30,31),B(2,17,27)

Eleven reviewers (14%) provided information that changed our data extraction regarding the endpoint 'contact with authors of original studies'. One declared that authors had not been contacted,B35 and 10 claimed that authors had been contacted although this was not reported in the published review.A(12,25,31,34),B(20,25,34),J(9,12),L1

Twenty-six reviewers (32%) provided information that changed our data extraction regarding the endpoint 'duplicate publication'. Terms used included 'duplicate', 'companion article', 'multiple publications', 'articles with overlapping datasets', or 'trials with identical patient population'. Three reviewers declared that, contrary to our assumption, they had not identified duplicate publications,A(25,26),J3 and 23 claimed having searched for duplicates although this was not reported in the published review.A(16,17,21,24,27,30,31,33),B(9,10,14,20,25,27),C(6,10,11,14,16,18),J6,L(6,9)

Five reviewers (6%) told us about suspected cases of misconduct that were not reported in the published review.

Characteristics of the reviews did not differ depending on whether or not we were able to contact their authors (table 2).

Figure 1 Flowchart of retrieved and analysed systematic reviews.

Search of unpublished trials and test for publication bias
Fifty-six reviewers (47%) either searched for unpublished trials or applied a statistical test to identify publication bias. Twenty-three reviewers (19%) did both. Unpublished studies were sought for in trial registries (eg, ClinicalTrials.gov or the FDA database), or by contacting experts and manufacturers. The number of unpublished studies included in these reviews was inconsistently reported. Contacting the reviewers did not help us clarify this issue. Seven reviews (6%) only discussed the risk of publication bias, and 32 reviews (27%) did not mention it at all (table 3).

Contact with authors to unearth unreported outcomes
Seventy-three reviewers (62%) had contacted the authors of the original studies. Fifty-eight reviewers (49%) had searched for unreported results from the original articles. The reviews rarely reported on the number of authors contacted and the response rate. We were not able to clarify this issue in our email exchange with the reviewers (table 3).

Duplicate publications
Duplicate publications were sought for in 81 reviews (69%). Twenty-two reviewers confirmed that duplicates were not sought for, and 15 did not answer our enquiry. The number of duplicates identified was rarely mentioned. We failed to clarify this issue in our exchange with the reviewers. Ten reviews (8.5%) published the reference of at least one identified duplicate.A(1,10,18,38),C(1,4,6,12),J(2,9)

Main results
The median number of procedures applied in each review was 2.5 (IQR, 1–3). Eleven reviews (9%) applied no procedures at all, while no review applied all six procedures.

Sponsors
Twenty-seven reviews (23%) reported on the sources of funding for the studies. Six reviewers (5%) analysed the
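As an aside, a first screening pass for duplicate publications can be automated crudely by normalising titles before comparison. This is a deliberately simplistic sketch: real reviews would also compare authors, enrolment periods and outcome data, and all records below are invented for illustration.

```python
import re

def normalise(title: str) -> str:
    """Lowercase a title and strip punctuation/whitespace for crude matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def flag_possible_duplicates(records):
    """Return pairs of record IDs whose normalised titles match exactly.
    Real duplicate screening would go much further than title matching."""
    seen = {}
    groups = []
    for rec in records:
        key = normalise(rec["title"])
        if key in seen:
            groups.append((seen[key], rec["id"]))
        else:
            seen[key] = rec["id"]
    return groups

# Invented records for illustration only
records = [
    {"id": "A1", "title": "Drug X for postoperative pain: a randomised trial"},
    {"id": "B7", "title": "Drug X for Postoperative Pain – A Randomised Trial."},
    {"id": "C3", "title": "Drug Y versus placebo in chronic pain"},
]
print(flag_possible_duplicates(records))
```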



Table 2 Characteristics of the systematic reviews analysed (values are N (per cent): All / Answer / No answer; p value)

Number of systematic reviews: 118 (100) / 80 (68) / 38 (32)
Number of co-authors, median (IQR): 6 (5–9) / 6 (4–7.5) / 6 (5–9); p=0.240
Country of affiliation of 1st author (p=0.969):
  USA: 42 (36) / 27 (34) / 15 (39)
  UK: 23 (19) / 16 (20) / 7 (18)
  Canada: 21 (18) / 14 (18) / 7 (18)
  Australia: 4 (3) / 3 (4) / 1 (3)
  Others: 28 (24) / 20 (25) / 8 (21)
Protocol registration/accessibility (p=0.513):
  No mention of protocol: 82 (69) / 53 (66) / 29 (76)
  Protocol exists/not freely accessible: 3 (3) / 2 (3) / 1 (3)
  Protocol exists/freely accessible: 33 (28) / 25 (31) / 8 (21)
Source of funding of systematic review (p=0.420):
  Not reported: 8 (7) / 7 (9) / 1 (3)
  Reported that no funding was received: 24 (20) / 15 (19) / 9 (24)
  Reported that funding was received, with details: 86 (73) / 58 (73) / 28 (74)
Conflicts of interests of systematic reviewers (p=0.130):
  Not reported: 0 (0) / 0 (0) / 0 (0)
  Reported that there were none: 42 (36) / 33 (41) / 9 (24)
  Reported and detailed in the published report: 61 (52) / 39 (49) / 22 (58)
  Reported elsewhere (eg, online): 15 (13) / 8 (10) / 7 (18)
Number of databases searched, median (IQR): 4 (3–7) / 4 (3–7) / 4 (3–6); p=0.637
Language limitations applied (p=0.569):
  None: 56 (47) / 35 (44) / 21 (55)
  English only: 42 (36) / 30 (38) / 12 (32)
  Limitations to other languages: 7 (6) / 6 (8) / 1 (3)
  Not specified: 13 (11) / 9 (11) / 4 (11)
Number of studies included, median (IQR): 28 (12–57) / 28 (12–59) / 34 (12–55); p=0.804
Study designs examined (p=0.312):
  RCT only: 46 (39) / 32 (40) / 14 (37)
  Cohort prospective only: 5 (4) / 4 (5) / 1 (3)
  Diagnostic studies only: 5 (4) / 5 (6) / 0 (0)
  Various designs: 62 (53) / 39 (49) / 23 (61)

Answer: reviews in which extracted data were confirmed by reviewers. No answer: reviews in which extracted data were not confirmed by reviewers. p value: testing the null hypothesis of equal distribution between the reviews for which the author responded to our inquiries and those who did not. Statistical tests: χ2 test or Kruskal–Wallis equality of population rank tests, as appropriate. RCT, randomised controlled trials.
impact of sponsors on the results of the review.B(4,15,26,32),J12,L8 One reviewer claimed that sponsor bias was unlikely,L8 while three were unable to identify any sponsor bias.B(4,15),J12 Finally, two reviews identified sponsor bias (see online supplementary appendix table 3A).B26,32

Conflicts of interest of authors
Five reviewers reported on conflicts of interest of the authors of the studies.A(8,14),B20,C(8,14) None of them used this information to perform subgroup analyses. One review mentioned conflicts of interest as a possible explanation for their (biased) findings.A8 In three reviews, the affiliations of the authors were summarised and potential conflicts of interest clearly identified.B20,C(8,14) Finally, the appendix table of one of the reviews showed that one study might have suffered from a 'significant conflict of interest'.A4

Ethical approval of the studies
Three reviews looked at whether or not ethical approval had been sought for (see online supplementary appendix table 3A).B(27,37),C10 Two reviews explicitly reported that all included studies had received ethical approval.B(27,37) The third review reported extensively on which studies had or had not provided any information on ethical approval or patient consent.C10

Outcomes did not differ according to whether or not we were able to contact the reviewers (table 3).



Table 3 Application of the procedures to counter-balance some common research malpractices (values are N (per cent): All / Answer / No answer; p value)

Number of systematic reviews: 118 (100) / 80 (68) / 38 (32)
Search of unpublished trials and/or test for publication bias (p=0.914):
  Publication bias discussed only or not mentioned: 39 (33) / 26 (33) / 13 (34)
  Unpublished trials searched OR publication bias tested: 56 (47) / 39 (49) / 17 (45)
  Unpublished trials searched AND publication bias tested: 23 (19) / 15 (19) / 8 (21)
Contact with authors of the studies (p=0.427):
  Study authors not contacted: 45 (38) / 28 (35) / 17 (45)
  Study authors contacted for method or unspecified reason: 15 (13) / 12 (15) / 3 (8)
  Study authors contacted for unreported outcomes: 58 (49) / 40 (50) / 18 (47)
Duplicate publications (p=0.057):
  Not searched or not mentioned: 37 (31) / 21 (26) / 16 (42)
  Searched and found, not referenced OR no mention of results: 71 (60) / 54 (68) / 17 (45)
  Searched, found and referenced: 10 (8) / 5 (6) / 5 (13)
Sponsors of the studies (p=0.809):
  Not mentioned: 91 (77) / 63 (79) / 28 (74)
  Information extracted: 21 (18) / 13 (16) / 8 (21)
  Information extracted and subgroup analyses performed: 6 (5) / 4 (5) / 2 (5)
Conflicts of interests of study authors (p=0.703):
  Not mentioned: 113 (96) / 77 (96) / 36 (95)
  Information extracted: 5 (4) / 3 (4) / 2 (5)
  Information extracted and subgroup analyses performed: 0 (0) / 0 (0) / 0 (0)
Ethical approval of included studies (p=0.481):
  Not mentioned: 115 (97) / 77 (96) / 38 (100)
  Information extracted: 3 (3) / 3 (4) / 0 (0)
  Information extracted and subgroup analyses performed: 0 (0) / 0 (0) / 0 (0)
Number of procedures applied (p=0.403):
  None: 11 (9) / 6 (8) / 5 (13)
  1 or 2 procedures: 48 (41) / 34 (43) / 14 (37)
  3 or 4 procedures: 56 (47) / 38 (48) / 18 (47)
  5 procedures: 3 (3) / 2 (3) / 1 (3)
  Median (IQR): 2.5 (1–3) / 2.5 (2–3) / 2.5 (1–3)
Explicit mention of misconduct by reviewers (p=0.296):
  No, or not mentioned: 111 (94) / 74 (93) / 37 (97)
  Yes: 7 (6) / 6 (8) / 1 (3)

Answer: reviews in which extracted data were confirmed by reviewers. No answer: reviews in which extracted data were not confirmed by reviewers. p value: testing the null hypothesis of equal distribution between the reviews for which the authors responded to our inquiry and those who did not (χ2 test). Percentages may not add up to 100% because of rounding errors.
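The p values in table 3 come from χ² tests of the responder versus non-responder distributions. As a sanity check, the last row (explicit mention of misconduct: 6 of 80 responders vs 1 of 38 non-responders) can be reproduced with a hand-rolled Pearson χ²; for df=1 the p value equals erfc(√(χ²/2)), so no statistics library is needed.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with its df=1 p value via the erfc identity."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        exp = row * col / n
        chi2 += (obs - exp) ** 2 / exp
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Explicit mention of misconduct: 6 of 80 'answer' vs 1 of 38 'no answer'
chi2, p = chi2_2x2(6, 74, 1, 37)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # matches the 0.296 reported in table 3
```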

Suspicion of misconduct
Two reviewers suspected research misconduct in the articles included in their review and reported it accordingly.B31,J12 Contacting the other reviewers allowed us to uncover five additional cases of possible misconduct. Four agreed to be cited here,A26,B33,C16,L1 while one preferred to remain anonymous.

Data falsification was suspected in three reviews.A26,C16,J12 One review, looking at the association of hydroxyethyl starch administration with mortality or acute kidney injury of critically ill patients,J12 included seven articles co-authored by Joachim Boldt. However, a survey performed in 2010, focusing on Boldt's research published between 1999 and 2010, had led to the retraction of 80 of his articles due to data fabrication and lack of ethical approval.20 The seven articles co-authored by Boldt were kept in the review as they had been published before 1999. Nonetheless, the reviewers performed sensitivity analyses excluding these seven articles, and showed a significant increase in the risk of mortality and acute kidney injury with hydroxyethyl starch solutions that was not apparent in Boldt's articles.

The second review examined different techniques of sperm selection for assisted reproduction. The reviewers suspected data manipulation in one study since its authors reported non-significant differences between the number of oocytes retrieved and embryos transferred, while the p-value, when recalculated by the reviewers, was statistically significant.C16

The third review (on management strategies for asymptomatic carotid stenosis) reported that misconduct had been suspected based on the 'differences in data between the published SAPPHIRE trial and the re-analysed data posted on the FDA website'.A26 Although this information was not provided in the published review, it was available in the full report online (http://www.ahrq.gov/research/findings/ta/carotidstenosis/carotidstenosis.pdf).
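The sensitivity analysis described for the Boldt case (pooling with and without the suspect trials) can be illustrated with a minimal inverse-variance fixed-effect pooling. All numbers below are invented for illustration; they are not data from the review in question.

```python
import math

def pooled_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (log risk ratio, SE) tuples."""
    weights = [1 / se**2 for _, se in estimates]
    pooled = sum(w * est for (est, _), w in zip(estimates, weights)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Hypothetical per-trial log risk ratios and SEs; 'flagged' marks trials by an
# author under suspicion. The flagged trial pulls the effect towards the null.
trials = [
    {"log_rr": 0.30, "se": 0.15, "flagged": False},
    {"log_rr": 0.25, "se": 0.20, "flagged": False},
    {"log_rr": -0.05, "se": 0.10, "flagged": True},
    {"log_rr": 0.35, "se": 0.25, "flagged": False},
]

all_est, _ = pooled_fixed_effect([(t["log_rr"], t["se"]) for t in trials])
clean_est, _ = pooled_fixed_effect(
    [(t["log_rr"], t["se"]) for t in trials if not t["flagged"]])
print(f"pooled RR, all trials:       {math.exp(all_est):.2f}")
print(f"pooled RR, flagged removed:  {math.exp(clean_est):.2f}")
```

Excluding the flagged trial shifts the pooled risk ratio away from the null, mirroring how the reviewers' estimate changed once Boldt's articles were set aside.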



Intentional selective reporting of outcomes was suspected in two reviews.B31 In one review that examined early interventions to prevent psychosis, the reviewers identified, discussed and referenced three articles that did not report on all the outcomes.B31 In the second review, the corresponding reviewer (who preferred to remain anonymous) revealed that he 'knew of two situations in which authors knew what the results showed if they used standard categories (of outcome) but did not publish them because there was no relationship, or not the one they had hoped to find.'

Plagiarism was identified in one review examining the epidemiology of Alzheimer's disease and other forms of dementia in China.L1 According to the reviewers, they had identified a 'copy-paste-like duplicate publication of the same paper published two or more times, with the same results and sometimes even different authors'. This information was not reported in the published review. The corresponding reviewer explained that they 'had a brief discussion…about what to do about those findings and whether to mention them in the paper…We did not think that we should be distracted from our main goal, so we felt that it was better to leave it to qualified bodies and specialised committees on research malpractice to address this problem separately.'

Finally, one reviewer told us that "there were some 'suspected misconduct' in original studies on which we performed sensitivity analysis (best case—worst case)". The review mentioned that some studies were of poor quality, but it did not specifically mention suspicion of misconduct.B33

The median number of studies included in the reviews that detected misconduct was 56 (IQR, 11–97), and was 28 (IQR, 12–57) in the reviews that did not detect misconduct. The difference did not reach statistical significance.

Secondary endpoints
The reviews published in the four medical journals differed in most characteristics examined in this paper. The reviews published in the Cochrane Library differed from all the other reviews (see online supplementary appendix table 2A). There were also differences in procedures applied across the journals (see online supplementary appendix table 4A). The only three reviews that had extracted data on ethical approval of the studies were published in the BMJ (2) and in the Cochrane Library (1). Finally, one review from the BMJ and one review from JAMA explicitly mentioned potential misconduct.

DISCUSSION
Statement of principal findings
This analysis confirms some issues and highlights new ones. The risk related to double counting of participants due to duplicate publications and the risk of selective reporting of outcomes are reasonably well recognised. More than half of the reviews applied procedures to reduce the impact of these malpractices. The problem of conflicts of interest remains underestimated, and ethical approval of the original studies is overlooked. Although systematic reviewers are in a privileged position to unearth misconduct such as copy-paste-like plagiarism, intentional selective data reporting and data fabrication or falsification, they do not systematically report them. Finally, editors have a role to play in improving and implementing rigorous procedures for the reporting of systematic reviews to counter-balance the impact of research malpractice.

Comparison with other similar analyses
Our study confirms that systematic reviewers are able to identify publications dealing with the same cohort of patients.15 However, 20% of the reviews under consideration failed to report having searched for duplicates. Only ten of them provided the references to some of the identified duplicates. It remains unclear whether reviewers do not consider duplicate publication worth disclosing or whether they are unsure on how to address the issue. Finally, there is no widely accepted definition of the term 'duplicate', which, in turn, adds to the confusion. For example, a number of reviewers used the term 'duplicate' to describe identical references identified more than once through the search process.

Selective publication of studies, and selective reporting of outcomes, have been examined previously.21–26 This led the BMJ to call for «publishing yet unpublished completed trials and/or correcting or republishing misreported trials».27 Other ways to address this issue include registration of the study protocols,28 searching for unpublished trials8 29 and contacting authors to retrieve unreported outcomes.30 31 Our analyses show that 70% of systematic reviewers are aware of these malpractices, although 10% failed to report them explicitly. As described before,32 most reviewers did not report on the number of unpublished articles included in their analyses, the number of authors contacted and the response rate.

Despite the obvious risk of research conclusions favouring a sponsor,33 subgroup analyses on funding were rarely performed. Sponsor bias may overlap with other malpractices such as selective reporting,34 35 redundant publication9 or failure to publish completed research.35 It may also overlap with conflicts of interest of the authors of the original studies, an issue that remains largely overlooked in these systematic reviews. There is a general understanding that authors with conflicts of interest are likely to present conclusions in favour of their own interests.12 36 Although most journals now ask for a complete and overt declaration of conflicts of interest from all authors, this crucial information remains unclearly reported. This may explain why we found no reviews that performed subgroup analyses on this issue.

Ten years ago, Weingarten proposed that ethical approval of studies should be checked during the

Elia N, et al. BMJ Open 2016;6:e010442. doi:10.1136/bmjopen-2015-010442 7 Downloaded from http://bmjopen.bmj.com/ on April 7, 2016 - Published by group.bmj.com

Open Access process of systematic reviewing.16 However, our study taxonomy of research misconduct’ proposed by Smith in shows that only three reviews reported having done so. 2000,6 and categorised all common minor misconduct The need to report ethical approval in original studies as ‘malpractices’. Some may disagree with our has only recently been highlighted. A case of massive classification. fraud in anaesthesiology,20 in which informed consent from patients and formal approval by ethics committee Conclusions and research agenda fi were fabricated, only shows how dif cult a task it will be. The PRISMA guideline has improved the reporting of fi The most striking nding was that although seven sys- systematic reviews.37 PRISMA-P aims to improve the tematic reviews suspected misconduct in original studies, robustness of the protocols of systematic reviews.38 The fi ve of them did not report it, one reported it without 17-items list mentions the assessment of meta-bias(es), further comment and only one reported overtly on the such as publication bias across studies, and selective suspicion. This illustrates that reviewers do not consider outcome reporting within studies. However, the list is themselves entitled to make an allegation, although they not concerned with authors’ conflicts of interest, spon- are in a privileged position to identify misconduct. The sors, ethical approval of original studies, duplicate publi- fact that one reviewer preferred to remain anonymous cations or the reporting of suspected misconduct. The further illustrates the reluctance to openly report on MECIR project defines 118 criteria classified as ‘manda- misconduct. 
tory’ or ‘highly desirable’, to ensure transparent report- ing in a Cochrane review.39 Among these criteria, Strengths and weaknesses publication and outcome reporting bias, funding We used a clear and reproducible sampling method that sources as well as conflicts of interest are highlighted as was not limited to any medical specialty. The data ana- ‘mandatory’ to report on. However, ethical approval for fi lysed had been con rmed by the reviewers. This allowed studies is not. Most importantly, neither of the two us to quantify the proportion of procedures that were recommendations explicitly describes what should be implemented but not reported by the reviewers. Finally, considered misconduct, what kind of misconduct must fi to our knowledge, this is the rst analysis of the proce- be reported, to whom, and how. dures used by systematic reviewers to deal with research We have previously shown how systematic reviewers malpractice and misconduct. Qualitative answers of the can test the impact of fraudulent data on systematic reviewers were very informative. reviews,40 identify redundant research41 and identify We selected systematic reviews from four major references that should have been retracted.42 This paper medical journals and the Cochrane Library, which is con- suggests that systematic reviewers may have additional sidered the gold standard in terms of systematic review- roles to play. They may want to apply specific procedures ing. We can reasonably assume that the problems to protect their analyses from common malpractices in fi identi ed are at least as serious in other medical jour- the original research, and they may want to identify and fi nals. Systematic reviews that were not identi ed as such report on suspected misconduct. However, they do not in their titles were not included. However, including seem to be ready to act as whistle-blowers. The need for fi these reviews would not have changed our ndings. 
explicit guidelines on what reviewers should do once Only one reminder was sent to the reviewers, since the misconduct has been suspected or identified has already response rate was reasonably high. Furthermore, the been highlighted.18 These guidelines remain to be characteristics of the systematic reviews did not differ defined and implemented. The proper procedure would between reviews for which authors responded or not. require the reviewer to request the institution where the fi We did not ask the reviewers to con rm all their data research was conducted to investigate on the suspected but focused on the information that we considered misconduct, as the institution holds the legal legitimacy. unclear. It is possible that some of the reviewers had Whether alternative procedures could be applied should ’ ’ indeed extracted information on sponsors and authors be discussed. For example, they may include contacting fl con icts of interest, as well as ethical approval of the the editor-in-chief of the journal where the suspected studies, but failed to report them. A number of proce- paper was originally published, or the editor-in-chief dures applied and instances of misconduct came to light where the systematic review will eventually be published. through our personal contacts with the reviewers. Our Future research should explore the application of add- results might therefore be underestimated. On the other itional protective procedures such as checking for the hand, it is possible that some reviewers pretended having adherence of each study to its protocol, or the handling applied some procedures although they had not. This of outlier results, and quantify the impact of these mea- would have led to an overestimation of the number of sures on the conclusions of the reviews. Finally, potential procedures applied. The major weakness of this study risks of false reporting of misconduct need to be fi ‘ lies in the lack of an accepted de nition of research studied. 
misconduct’. It is possible that some reviewers might have hesitated to disclose suspected misconduct, leading Author affiliations to an underestimation of the prevalence of misconduct 1Division of Anaesthesiology, Geneva University Hospitals, Geneva, identified. Finally, we have used the ‘preliminary Switzerland



2Institute of Global Health, University of Geneva, Geneva, Switzerland
3Cochrane Switzerland, Institute of Social and Preventive Medicine, Lausanne, Switzerland
4Institute Guillaume Belluard, Cran Gevrier, France
5Department of Anaesthesiology, Intensive Care and Pain Medicine, University Hospital Münster, Münster, Germany
6Faculty of Medicine, University of Geneva, Geneva, Switzerland

Acknowledgements The authors thank all the authors of the systematic reviews who kindly answered our inquiries, and Liz Wager for her thoughtful advice on the first draft of our manuscript.

Contributors NE, MRT and EvE designed and coordinated the investigations. NE wrote the first draft of the manuscript. AC and DMP were responsible for checking the inclusion and extraction of the data. NE performed the statistical analyses and contacted the reviewers. All the authors contributed to the revision of the manuscript and the intellectual development of the paper. NE is the study guarantor.

Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests All the authors have completed the Unified Competing Interests form at http://www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare that they have received no support from any organisation for the submitted work, that they have no financial or non-financial interests that may be relevant to the submitted work, nor do their spouses, partners, or children. EvE is the co-director of Cochrane Switzerland, a Cochrane entity involved in producing, disseminating and promoting Cochrane reviews.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement No additional data are available.

Open Access This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/

REFERENCES
1. Deer B. How the case against the MMR vaccine was fixed. BMJ 2011;342:c5347.
2. Bouri S, Shun-Shin MJ, Cole GD, et al. Meta-analysis of secure randomised controlled trials of β-blockade to prevent perioperative death in non-cardiac surgery. Heart 2014;100:456–64.
3. Krumholz HM, Ross JS, Presler AH, et al. What have we learnt from Vioxx? BMJ 2007;334:120.
4. Martinson BC, Anderson MS, de Vries R. Scientists behaving badly. Nature 2005;435:737–8.
5. Lüscher TF. The codex of science: honesty, precision, and truth—and its violations. Eur Heart J 2013;34:1018–23.
6. Smith R. The COPE Report 2000. What is research misconduct? http://publicationethics.org/files/u7141/COPE2000pdfcomplete.pdf (accessed 26 Oct 2015).
7. Eyding D, Lelgemann M, Grouven U, et al. Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 2010;341:c4737.
8. Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ 2012;344:d7202.
9. Tramèr MR, Reynolds DJ, Moore RA, et al. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 1997;315:635–40.
10. Selph SS, Ginsburg AD, Chou R. Impact of contacting study authors to obtain additional data for systematic reviews: diagnostic accuracy studies for hepatic fibrosis. Syst Rev 2014;3:107.
11. Huss A, Egger M, Hug K, et al. Source of funding and results of studies of health effects of mobile phone use: systematic review of experimental studies. Environ Health Perspect 2007;115:1–4.
12. Bes-Rastrollo M, Schulze MB, Ruiz-Canela M, et al. Financial conflicts of interest and reporting bias regarding the association between sugar-sweetened beverages and weight gain: a systematic review of systematic reviews. PLoS Med 2013;10:e1001578.
13. Chan AW. Out of sight but not out of mind: how to search for unpublished clinical trial evidence. BMJ 2012;344:d8013.
14. Egger M, Smith GD, Schneider M, et al. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997;315:629.
15. von Elm E, Poglia G, Walder B, et al. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 2004;291:974–80.
16. Weingarten MA, Paul M, Leibovici L. Assessing ethics of trials in systematic reviews. BMJ 2004;328:1013–14.
17. Vergnes JN, Marchal-Sixou C, Nabet C, et al. Ethics in systematic reviews. J Med Ethics 2010;36:771–4.
18. Vlassov V, Groves T. The role of Cochrane review authors in exposing research and publication misconduct. Cochrane Database Syst Rev 2010;2011:ED000015 (accessed 26 Oct 2015).
19. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007;370:1453–7.
20. Tramèr MR. The Boldt debacle. Eur J Anaesthesiol 2011;28:393–5.
21. Kelley GA, Kelley KS, Tran ZV. Retrieval of missing data for meta-analysis: a practical example. Int J Technol Assess Health Care 2004;20:296–9.
22. Rising K, Bacchetti P, Bero L. Reporting bias in drug trials submitted to the Food and Drug Administration: review of publication and presentation. PLoS Med 2008;5:e217; discussion e217.
23. Turner EH, Matthews AM, Linardatos E, et al. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252–60.
24. Hopewell S, Loudon K, Clarke MJ, et al. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev 2009;(1):MR000006.
25. Dwan K, Altman DG, Cresswell L, et al. Comparison of protocols and registry entries to published reports for randomised controlled trials. Cochrane Database Syst Rev 2011;(1):MR000031.
26. Scherer RW, Langenberg P, von Elm E. Full publication of results initially presented in abstracts. Cochrane Database Syst Rev 2007;(2):MR000005.
27. Doshi P, Dickersin K, Healy D, et al. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013;346:f2865.
28. Wager E, Elia N. Why should clinical trials be registered? Eur J Anaesthesiol 2014;31:397–400.
29. Cook DJ, Guyatt GH, Ryan G, et al. Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 1993;269:2749–53.
30. Chan AW, Hróbjartsson A, Haahr MT, et al. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457–65.
31. Chan AW, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ 2005;330:753.
32. Mullan RJ, Flynn DN, Carlberg B, et al. Systematic reviewers commonly contact study authors but do so with limited rigor. J Clin Epidemiol 2009;62:138–42.
33. Lundh A, Sismondo S, Lexchin J, et al. Industry sponsorship and research outcome. Cochrane Database Syst Rev 2012;12:MR000033. http://onlinelibrary.wiley.com/doi/10.1002/14651858.MR000033.pub2/abstract (accessed 26 Oct 2015).
34. Melander H, Ahlqvist-Rastad J, Meijer G, et al. Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003;326:1171–3.
35. Vedula SS, Li T, Dickersin K. Differences in reporting of analyses in internal company documents versus published trial reports: comparisons in industry-sponsored trials in off-label uses of gabapentin. PLoS Med 2013;10:e1001378.
36. Bekelman JE, Li Y, Gross CP. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA 2003;289:454–65.
37. Moher D, Liberati A, Tetzlaff J, et al, The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097.
38. Shamseer L, Moher D, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 2015;349:g7647.



39. MECIR reporting standards now finalized | Cochrane Community (beta). http://community.cochrane.org/news/tags/authors/mecir-reporting-standards-now-finalized (accessed 26 Oct 2015).
40. Marret E, Elia N, Dahl JB, et al. Susceptibility to fraud in systematic reviews: lessons from the Reuben case. Anesthesiology 2009;111:1279–89.
41. Habre C, Tramèr MR, Pöpping DM, et al. Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection. BMJ 2014;349:g5219.
42. Elia N, Wager E, Tramèr MR. Fate of articles that warranted retraction due to ethical concerns: a descriptive cross-sectional study. PLoS ONE 2014;9:e85846.

5.5 Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection

Authors: Habre C, Tramèr MR, Pöpping DM, Elia N
Published in: BMJ 2014;349:g5219
IF 2014: 17.445

Summary

In this study we aimed to assess the impact of a systematic review on the performance and design of subsequent research, using an example familiar to all anaesthesiologists: the prevention of pain on injection of propofol. Specifically, we aimed to verify whether the conclusions and recommendations of a systematic review on this topic published in 2000145 had been followed. That systematic review provided a clear research agenda: it stipulated that further trials needed to focus on children and that, since an effective intervention to prevent pain on injection of propofol had been identified, this "gold standard" intervention should be used as a comparator to test the clinical relevance of new interventions. We showed that the impact of the systematic review on the design of subsequent research was low. In particular, only 49 of 136 trials published after the systematic review used the gold standard intervention as a comparator or were performed in children, and only these could be considered clinically relevant, since only such trials could potentially lead to a change in practice. This study highlighted the profusion of clinically irrelevant studies that results from the imbalance between the strong pressure to publish and the weak barriers against useless research, and it called for a strengthening of existing control barriers such as ethics committees, funders, clinical trial registries and journal editors. Finally, the article concludes with a new idea regarding the role that systematic reviews ought to play in helping researchers improve their future research.
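The relevance criterion described above (gold standard comparator, or a paediatric population) amounts to a simple filter over trial records. The sketch below is illustrative only; the records and field names are invented and are not the study's actual data:

```python
# Hypothetical trial records; "gold_standard" marks the comparator
# recommended by the 2000 systematic review.
trials = [
    {"comparator": "gold_standard", "population": "adults"},
    {"comparator": "placebo", "population": "children"},
    {"comparator": "placebo", "population": "adults"},
]

# A trial counts as clinically relevant if it used the gold standard
# as comparator or was performed in children.
relevant = [t for t in trials
            if t["comparator"] == "gold_standard"
            or t["population"] == "children"]

print(f"{len(relevant)}/{len(trials)} trials clinically relevant")
```

With these invented records, the first trial qualifies through its comparator and the second through its population, so two of the three pass the filter.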

BMJ 2014;349:g5219 doi: 10.1136/bmj.g5219

Research

RESEARCH

Ability of a meta-analysis to prevent redundant research: systematic review of studies on pain from propofol injection

Céline Habre research fellow1, Martin R Tramèr professor in anaesthesia2 3, Daniel M Pöpping anaesthetist4, Nadia Elia public health epidemiologist2 5

1Department of Radiology, Geneva University Hospitals, 4 rue Gabrielle-Perret-Gentil, CH-1211 Geneva 14, Switzerland; 2Division of Anaesthesiology, Geneva University Hospitals, Geneva, Switzerland; 3Faculty of Medicine, University of Geneva, Geneva, Switzerland; 4Department of Anaesthesiology and Intensive Care, University Hospital Münster, Münster, Germany; 5Institute of Global Health, Faculty of Medicine, University of Geneva, Geneva, Switzerland

Abstract

Objective To examine whether, according to the conclusions of a 2000 systematic review with meta-analysis on interventions to prevent pain from propofol injection that provided a research agenda to guide further research on the topic, subsequently published trials were more often optimally blinded, reported on children, and used the most efficacious intervention as comparator; and to check whether the number of new trials published each year had decreased and whether the designs of trials that cited the review differed from those that did not.

Study design Systematic review comparing old trials (published before, and included in, the review) with new trials (published afterwards).

Data sources Medline, Cochrane, Embase, and bibliographies to January 2013.

Eligibility criteria for study selection Randomised studies testing any intervention to prevent pain from propofol injection in humans.

Results 136 new trials (19 778 patients) were retrieved. Compared with the 56 old trials (6264 patients), the proportion of optimally blinded trials had increased from 10.7% to 38.2% (difference 27.5%, 95% confidence interval 16.0% to 39.0%, P<0.001), and the proportion of trials that used the most efficacious intervention as comparator had increased from 12.5% to 27.9% (difference 15.4%, 4.0% to 26.9%, P=0.022). The proportion of paediatric trials had increased from 5.4% to 12.5%, although this was not significant (difference 7.1%, −1.0% to 15.2%, P=0.141). The number of new trials published each year was significantly higher (median number/year 12 (range 7-20) v 2.5 (0-9), P<0.001) with no obvious decreasing trend. 72.8% (n=99) of the new trials cited the review, with their designs similar to trials not citing the review. Only 36.0% (n=49) of the new trials were considered clinically relevant since they used the most efficacious intervention as comparator or included a paediatric population.

Conclusions The impact of the systematic review on the design of subsequent research was low. There was an improvement in the reporting of optimal blinding procedures and a tendency towards an increase in the proportion of paediatric trials. The most efficacious intervention was more often chosen as comparator but remained marginally used, and the number of trials published per year had not decreased. The use of systematic reviews should be encouraged to inform rational, and thus ethical, trial design and improve the relevance of new research.

Correspondence to: C Habre [email protected]

Extra material supplied by the author (see http://www.bmj.com/content/349/bmj.g5219?tab=related#datasupp): Effective reference interventions according to Picard review; Summary of analysed randomised controlled trials; References of included trials

No commercial reuse: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe

Introduction

Systematic reviews often identify gaps in knowledge and methodological flaws in the existing literature.1 They should also guide researchers in assessing the need for further investigations,2 although this is often overlooked. For example, a systematic review published in 2005 and including 64 trials found that in 1992 it could already have been shown, after the 12th trial had been published, that aprotinin reduced the risk of bleeding in patients undergoing cardiac surgery; thus a timely performed systematic analysis of the published literature could have prevented 52 further trials from being performed.3 It remains unclear, though, to what extent published systematic reviews influence the design of subsequent trials.

Designing a trial comprises, among other things, the choice of population and outcomes of interest, methods of data collection, or a comparator against which a new, potentially useful experimental intervention ought to be tested. Administering the anaesthetic propofol intravenously may be distressing for patients as it is often associated with pain at the injection site.4 In 2000, a systematic review by Picard and Tramèr (a coauthor of the present analysis), including data from 6264 patients from 56 randomised placebo controlled trials, tested the analgesic efficacy of interventions to prevent the pain from propofol injection.5 The systematic review, which was published in one of the top five anaesthesiology journals, indexed in all major medical databases, provided six main messages. Firstly, evidence showed that the most efficacious analgesic intervention was to administer a small intravenous dose of lidocaine (lignocaine) with venous occlusion before the propofol injection; with lignocaine 40 mg, the best documented regimen, the number needed to treat to prevent any pain compared with placebo was 1.8 (95% confidence interval 1.5 to 2.2). Secondly, although alternative interventions were also efficacious (for example, an intravenous bolus of lignocaine without venous occlusion, lignocaine mixed with propofol, or a variety of opioids or non-opioid analgesics administered concomitantly), none of these showed a similar degree of efficacy compared with lignocaine with venous occlusion (see supplementary file 1). Thirdly, for several experimental interventions, such as intravenous ondansetron, droperidol, or ketamine, or the dilution of propofol with homologous blood, no meaningful conclusions could be drawn owing to a lack of valid data. Fourthly, more data on children were needed to allow definite conclusions in this population. Fifthly, blinding procedures needed to be improved as only 11% of the trials were optimally blinded. And finally, the authors of the Picard review concluded their report by questioning the necessity of performing further trials to identify yet another analgesic intervention to prevent pain from propofol injection.

We examined whether the Picard review had had any impact on subsequent research on pain from propofol injection. Specifically, we checked whether the number of new trials on the subject published per year had decreased over time, and whether the Picard review had influenced the design of subsequently published trials. For example, we expected that most new trials would focus on children, that the proportion of optimally blinded trials would increase, and that the most efficacious analgesic intervention identified in the Picard review would be chosen as a comparator against which new experimental interventions were tested. We also checked whether subsequently published trials cited the Picard review, and whether the designs of trials that cited the review differed from those that did not.

Methods

This systematic review was written according to the PRISMA statement for reporting systematic reviews and meta-analyses.6 The study protocol was not registered but is available from the authors.

Eligibility criteria

We included all trials that had been analysed in the Picard review5 and added a new search to identify all trials that had since been published. As in the Picard review, we searched for full published reports of randomised trials testing the analgesic efficacy of any intervention compared with placebo or no treatment to prevent pain from propofol administered intravenously. Since the necessity (and ethical acceptability) of a placebo arm may be questioned as an efficacious intervention had been identified, we additionally searched for trials that did not include a placebo arm. We included trials in adults, children, or volunteers undergoing general anaesthesia or sedation. We considered drug interventions (for example, pretreatment with a drug, or an alternative emulsion of propofol) and non-drug interventions (for example, cooling of propofol). To be included, trials had to report on the incidence of pain on injection of propofol as the primary outcome. We did not consider letters, conference abstracts, or studies in animals.

Information sources and searches

We performed searches in Medline (via PubMed), Embase, and the Cochrane Library. We additionally identified trials from bibliographies of retrieved trials and checked references of a further relevant systematic review by Jalota and colleagues that was published 11 years after the Picard review.7 We limited the search period from January 2002 (to ensure that trialists had the scope to read the Picard review, published in 2000) to January 2013. We did not search for unpublished trials. Trials were identified using the same search strategy and key words as in the Picard review, namely "propofol", "pain", "injection", and "random", sought in the titles and abstracts, with a limit to humans but no limit to language.

Study selection and risk of bias assessment

One author (CH) assessed the eligibility of retrieved articles by screening the titles and abstracts. Queries were resolved through discussion with two other authors (NE, MRT). As in the Picard review, we scored new trials for quality of data reporting using the five point Oxford scale, which considers the three items randomisation, blinding, and flow of patients.8 Blinding was rated optimal (2 points) when drugs were matched. Since we analysed exclusively randomised trials, the minimum score of an included trial was 1.

Data collection process

One author (CH) entered all data into an Excel spreadsheet, which was developed for the purpose of this analysis. One of three other authors (NE, DMP, or MRT) independently checked the data. We contacted the authors of the original reports when we needed to clarify the nature of the data or were unable to access a report.

Data items

We extracted the characteristics of the trials, including year of publication, journal impact factor (Journals Citation Reports 2011; we analysed journals without an impact factor separately), open access status of the journal (yes/no), study population (adults, children, volunteers), number of analysed participants, and sources of funding (none, academic, industry, not declared). According to the Picard review, the most efficacious intervention was an intravenous bolus injection of lignocaine with venous occlusion (manually or with a tourniquet) about 20 seconds before administering propofol into the same vein.5 For the purpose of our analysis we classified this method as the primary reference treatment. Alternative interventions that had some proved efficacy, although less so compared with the lignocaine occlusion technique (for instance, intravenous injection of lignocaine without occlusion), were classified by us as secondary reference treatments. We classified interventions without proved efficacy according to the Picard review, and new, potentially useful interventions that had not yet been retrieved in that systematic review, as experimental interventions. We regarded no treatment controls as placebos.
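The number needed to treat quoted from the Picard review is simply the reciprocal of the absolute risk reduction. A minimal sketch of that arithmetic, using hypothetical event rates chosen only to land near the reported 1.8 (the actual pooled rates are in the Picard review itself):

```python
def number_needed_to_treat(p_control: float, p_treatment: float) -> float:
    """NNT = 1 / absolute risk reduction between control and treatment."""
    return 1.0 / (p_control - p_treatment)

# Hypothetical rates: pain on injection in 70% of placebo patients
# versus 15% with lignocaine 40 mg plus venous occlusion.
nnt = number_needed_to_treat(0.70, 0.15)
print(round(nnt, 1))  # -> 1.8
```

In practice the NNT is usually rounded up to the next whole patient; it is shown unrounded here to make the reciprocal relationship visible.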



Additionally, we classified the new trials according to their clinical relevance. We considered that for rational decision making clinicians needed to know not only how well a new experimental intervention performed compared with placebo but also how it performed compared with the primary reference treatment. Therefore we regarded study designs to be clinically relevant that compared an experimental intervention with the primary reference treatment, with or without an additional arm. We regarded all other comparisons (for example, experimental intervention versus placebo only, or experimental intervention versus a secondary reference treatment) to be not clinically relevant because these trials were unlikely to contribute importantly to existing knowledge. If a trial showed that an experimental intervention offered advantages over placebo or over a secondary reference treatment, it would still leave the question unanswered as to whether that experimental intervention should be preferred or not to the currently most efficacious intervention (that is, the primary reference treatment). Also, we regarded trials that compared the primary reference treatment with a secondary reference treatment only, or the primary or a secondary reference treatment with placebo, as redundant and therefore not clinically relevant, as these comparisons had already been analysed in the Picard review. Furthermore, we considered any trial performed in children as clinically relevant since there was a lack of evidence for the best intervention to prevent pain from propofol injection in this subgroup. Finally, when the original trials cited the Picard review, we checked the context in which the review was cited and whether it was explicitly used to design the trial.

Data synthesis

We compared the characteristics and designs of trials published before and included in the Picard review (old trials) with those published after the Picard review (new trials). The main outcomes were the proportion of trials performed in children, the proportion that were optimally blinded (Oxford quality score 2), and the proportion that included the primary reference treatment. Results are reported as numbers and percentages. Continuous variables are reported as medians and ranges. We applied the χ² test to determine statistically significant differences between categorical variables, and we reported the difference in proportions between groups, with 95% confidence intervals, for each category. For comparisons of non-Gaussian distributions we used the Mann-Whitney test.

In a further subgroup analysis we grouped trials according to whether or not they cited the Picard review and we compared the characteristics of their designs. We compared these subgroups using similar statistical tests. We also compared the characteristics of clinically relevant new trials with those that were not clinically relevant on the outcomes: published in an open access journal (yes/no), impact factor, citing the Picard review (yes/no), and sources of funding.

Results

Study selection

We identified 360 new reports (fig 1⇓). Through screening of titles and abstracts, we excluded 189 reports. The remaining 171 were studied in detail and a further 35 were subsequently excluded. We eventually included 136 new randomised trials that had been published at least two years after the publication of the Picard review (see supplementary files 2 and 3). Of these 136 new trials, 94 were placebo controlled.

Characteristics of new trials

The 136 new trials included data from 19 778 patients (table 1⇓). The trials originated from 30 countries and were published in 51 different journals, of which 29 (56.9%) had an impact factor and 14 (27.5%) were open access (see supplementary file 2). Eighty trials (58.8%) tested the efficacy of a variety of drugs administered before, or concomitantly with, propofol, 40 (29.4%) tested the analgesic efficacy of different emulsions of propofol, and 16 (11.8%) tested non-drug interventions.

Synthesis of differences between old and new trials

General characteristics

Compared with the old trials, the new trials were larger (median number of analysed participants 125.5 v 100, P<0.001), published in journals with lower impact factors (median 2.23 v 2.96, P<0.001), more often published in journals without an impact factor (30.1% v 16.1%, difference 14.1%, 95% confidence interval 1.7% to 26.4%, P=0.043), and scored higher for quality of data reporting (median 3 v 2, P<0.001, table 1). Eighty nine (65.4%) new trials scored 2 for randomisation compared with 6 (10.7%) old trials (difference 54.7%, 95% confidence interval 43.3% to 66.1%, P<0.001). The flow of patients was optimally described in 29 (21.3%) new trials compared with five (8.9%) old trials (difference 12.4%, 2.2% to 22.6%, P=0.041).

Main outcomes

Compared with the old trials, the number of new trials published per year increased (median 12 v 2.5, P<0.001, fig 2⇓, table 2⇓). The number of new trials performed in children had also increased, from three (5.4%) to 17 (12.5%), although the difference in proportions did not reach significance (P=0.141). Blinding procedures had improved (P<0.001) and a greater proportion of new trials were optimally blinded (10.7% v 38.2%; difference 27.5%, 16.0% to 39.0%). New trials used the primary reference treatment more often as a comparator (27.9% v 12.5%; difference 15.4%, 4.0% to 26.9%, P=0.022) and used a secondary reference treatment less often (35.3% v 62.5%; difference −27.2%, −42.2% to −12.2%, P<0.001).

Ninety nine of the 136 new trials (72.8%) cited the Picard review (table 3⇓). Compared with the 37 trials that did not, trials citing the review were published more often per year (median 9 v 3, P<0.001), were not more often performed in children (14.1% v 8.1%, P=0.344), the distribution of blinding scores was not statistically significantly different although optimal blinding was more common (43.4% v 24.3%, difference 19.1%, 2.2% to 36.0%), and reporting quality tended to be higher (median Oxford score 3 v 2, P=0.011). Reporting of randomisation procedures scored 2 in 70 (70.7%) of the trials citing the review compared with 19 (51.4%) of the trials not citing the review (difference 19.4%, 0.9% to 37.8%, P=0.035), and the flow of patients was optimally described in 24 (24.2%) of the trials citing the review compared with five (13.5%) of the trials not citing the review (P=0.174). The proportion of trials including the primary reference treatment did not differ (29.2% v 24.3%, P=0.565, table 4⇓). However, trials citing the review included a secondary reference treatment more often (40.4% v 21.6%; difference 18.8%, 2.4% to 35.2%, P=0.041) and a design without a primary or a secondary reference treatment less often (30.3% v 54.1%; difference −23.8%, −42.2% to −5.3%, P=0.011).
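The difference-in-proportions intervals reported throughout these results, and the sample size figure quoted later in the discussion (about 35 patients per group to detect a drop from 60% to 20% pain), can be reproduced with standard normal-approximation formulas. The sketch below is an illustrative reconstruction, not the authors' code: the article does not name its exact methods, so the Wald interval and the Fleiss continuity correction are assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist  # standard library inverse normal CDF

Z = NormalDist()  # standard normal distribution


def diff_proportions_ci(x1, n1, x2, n2, conf=0.95):
    """Wald confidence interval for a difference in two proportions
    (assumed method; the paper reports differences with 95% CIs
    but does not state which interval it used)."""
    p1, p2 = x1 / n1, x2 / n2
    z = Z.inv_cdf(0.5 + conf / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d, d - z * se, d + z * se


def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Sample size per group for comparing two proportions, normal
    approximation with Fleiss continuity correction (assumed formula;
    the paper states only the inputs: 60% v 20%, 90% power, two sided
    alpha of 0.05)."""
    z_a, z_b = Z.inv_cdf(1 - alpha / 2), Z.inv_cdf(power)
    pbar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * pbar * (1 - pbar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    # continuity correction (Fleiss et al.)
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2
    return ceil(n_cc)


# Journals without an impact factor: 41/136 new trials v 9/56 old trials
d, lo, hi = diff_proportions_ci(41, 136, 9, 56)
print(f"{d:.1%} ({lo:.1%} to {hi:.1%})")  # 14.1% (1.7% to 26.4%), as in table 1

# Baseline risk 60% pain with placebo v 20% with an experimental intervention
print(n_per_group(0.60, 0.20))  # 35 patients per group, as in the discussion
```

Using only `statistics.NormalDist` keeps the sketch dependency-free; run against the "no impact factor" row of table 1 it reproduces the published 14.1% (1.7% to 26.4%).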

No commercial reuse: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe BMJ 2014;349:g5219 doi: 10.1136/bmj.g5219 Page 4 of 13

RESEARCH

Additional findings

Clinical relevance

Forty nine (36.0%) new trials were regarded by us as clinically relevant. Of the 87 (64.0%) trials considered not clinically relevant, 47 (54.0%) compared experimental interventions with or without a placebo, 32 (36.8%) compared an experimental intervention with a secondary reference treatment with or without a placebo, four (4.6%) compared the primary reference treatment with a secondary reference treatment with or without a placebo, three (3.5%) compared secondary reference treatments with or without a placebo, and one (1.1%) compared the primary reference treatment with placebo.

There were no significant differences between clinically relevant and non-relevant trials for the proportion that cited the Picard review (79.6% v 69.0%, P=0.181), the proportion published in journals without an impact factor (28.6% v 31.0%, P=0.764), the median impact factor of those published in journals with an impact factor (2.23 v 2.19, P=0.418), or the proportion published in open access journals (22.4% v 20.7%, P=0.810, table 5⇓). Of the 49 clinically relevant trials, two (4.1%) were funded by industry; of the 87 clinically non-relevant trials, seven (8.0%) were funded by industry (difference −4.0%, −11.9% to 4.0%, table 5). Of the nine new trials funded by industry, two were optimally blinded, two were not double blinded at all, five (55.5%) compared an experimental intervention with a secondary reference treatment, three (33.3%) compared an experimental intervention with placebo, and one (11.1%) compared two experimental interventions. None used the primary reference treatment (see supplementary file 2).

Context of citation of Picard review

Since we were unable to find significant differences between the designs of trials citing and not citing the Picard review, we made further investigations regarding the context in which the Picard review was cited in the original trials. Among the 99 trials that cited the Picard review, 39 (39.4%) did so to illustrate the large variety of analgesic interventions that had been tested in this setting (five used the primary reference treatment) and 21 (21.2%) to document the underlying risk of pain (four used the primary reference treatment). Fifteen trials explicitly reported having used the review as a basis for the choice of the primary reference treatment as a comparator (of which one criticised the fact that the studies included in the review were too disparate and had methodological gaps, and therefore chose not to repeat the comparison). Seven trials explicitly reported not having chosen the primary reference treatment, for different reasons: the tourniquet was difficult or threatening in children (n=4 trials), the primary reference treatment was not often used although it was the best option available (n=1), the primary reference treatment was judged to be awkward (n=1), and the Picard review was regarded as not conclusive (n=1). Eleven further trials acknowledged the identification of the primary reference treatment by the review; 10 ignored it without any explanation and one chose the primary reference treatment, but on the basis of the authors' own previous pilot study. The authors of four trials chose the primary reference treatment without explanation but eventually compared their results with those of the Picard review. Finally, the authors of two trials stated as a limitation of their study that they did not use the primary reference treatment that was identified by the Picard review.

All analyses were performed on the subgroup of 94 trials including a placebo or no treatment group. The results were similar.

Discussion

This study, which aimed to examine the impact of a systematic review with meta-analysis on the design and relevance of subsequent research, illustrates four major problems. Firstly, although the systematic review had identified a simple, effective, and low cost intervention and strongly suggested that additional trials on this specific issue were no longer necessary, the publication of trials has not decreased. Secondly, although the systematic review provided a clear research agenda, its influence on the design of further trials has remained poor. Thirdly, the proportion of subsequently published trials that could have had an impact on clinical practice has remained low. Finally, citing the systematic review had no clear influence on the design or relevance of subsequently published research.

Comparison with other studies

There are numerous examples where systematic reviews, if performed in a timely manner, could have provided evidence of the effectiveness of an intervention and thus prevented redundant research.9 10 There is also evidence that knowledge from systematic reviews is underused to inform future research.3 11 Our study confirms these findings. They raise ethical concerns, not only because patients are unnecessarily randomised in worthless trials but also because resources are wasted.2

It remains unclear why the number of published trials on the prevention of pain from propofol injection has not decreased; in fact the number has actually increased. Individual motivations may explain this finding. Doctors are challenged to engage in research for their career progression (the publish or perish policy), but clinically relevant large studies with long follow-up periods are difficult to achieve.12 Studies focusing on pain from propofol injection are easy and quick to perform and are straightforward to publish. As long as academic promotion and funding systems are based on simplistic counts of published papers, favouring prolific authors regardless of the relevance and validity of their work, this is unlikely to change. Ethics committees are supposed to ensure the scientific soundness and relevance of clinical investigations, preventing enrolment of participants in non-ethical trials. This should include the rational choice of a comparator intervention13 and be supported by systematic reviews to check how new protocols fit in with the current state of medical knowledge. In real life, however, ethics committees seem to stand alone, with limited resources and sometimes not enough scientific credit or knowledge to identify, and stop, the performance of irrelevant research.

It also remains unclear why editors accepted these new trials for publication. The new trials were published in journals with lower impact factors, and a higher proportion was published in journals without any impact factor. This suggests that editors of higher quality journals were unwilling to publish articles on an already solved problem. It has been suggested that open access journals may apply less stringent criteria for publication.14 We cannot confirm this hypothesis based on our sample of trials; a similar proportion of relevant and non-relevant trials were published in open access journals.

The Picard review provided a clear research agenda; however, its influence on the design of further research has remained marginal. More data on children were deemed necessary, and although 17 new paediatric trials have been published, the proportional increase in paediatric trials did not reach statistical significance and may have occurred by chance. Interestingly, although a total of 20 trials performed in children is now available and may be sufficient to draw conclusions on this


population, these trials were excluded from the updated systematic review by Jalota and colleagues,7 and to our knowledge no systematic review has yet summarised the available evidence on the prevention of pain from propofol injection in children. The research agenda in the Picard review also suggested that more optimally blinded trials were needed, which was achieved in subsequent trials. Moreover, blinding was more often optimal in trials citing than not citing the Picard review. However, since new studies scored higher in all aspects of quality of data reporting, the specific impact of the Picard review may be questioned, since this improvement may actually reflect an increase in the implementation of the CONSORT statement.15 Interestingly, the source of funding was still not declared in about 65% of the trials, and only three have been registered, although these items are included in the CONSORT checklist.

The identification of the currently most effective intervention to prevent pain from injection of propofol was the main message of the Picard review. It is plausible that in some of the subsequently published trials the review had an influence on the choice of the comparator against which a new, potentially innovative, experimental intervention was tested. It has been argued that the choice of a comparator intervention should be supported by a systematic review of the relevant literature.16 However, the authors of fewer than one third of subsequently published trials chose the primary reference treatment as a comparator.

Overall, the number of clinically relevant trials (those that are likely to have an impact on clinical practice) remained low. Although the Picard review identified a primary reference treatment, researchers may have wanted to test yet another experimental intervention that may have been even more efficacious, or simpler to use. One would expect this experimental intervention to be compared with the current primary reference treatment. In our study, only about 1 in 3 subsequently published trials used the primary reference treatment as a comparator. Authors who aim for academic recognition may embark on research that does not necessarily improve existing knowledge.17 Also, they may want to reach statistically significant results to ensure publication, since journals are more prone to publish such results,18 and may therefore refrain from comparing an experimental intervention with the currently most efficacious treatment. Interestingly, nine trials were industry sponsored, and none of those used the primary reference treatment as a comparator. This suggests that in these trials a comparator was chosen to favour a priori the experimental intervention, at the cost of threatening the principle of equipoise.19

Citing the Picard review did not clearly reflect its influence on study design. Although almost 73% of the new trials cited the Picard review, their characteristics and clinical relevance were largely similar to those that did not. Only about 32% of the trials mentioning the review explicitly described having used it to inform study design; 15 cited it to justify the choice of the primary reference treatment, whereas seven cited it to explain why they had chosen not to use the primary reference treatment. Ignoring the recommendations of the Picard review may be justified as long as it is explicitly explained.

Strengths and weaknesses of this study

This study has several limitations. Firstly, we did not analyse some trials that were published in Chinese, Japanese, or Persian, because of problems with translation, and we did not include studies published between January 2000 and December 2001. It is unlikely that including those trials would have changed the results of our analysis. Secondly, we cannot exclude that some authors remained unaware of the Picard review even though we selected trials that had been published from 2002. A two year delay between the publication of the review and the inclusion of trials into our analyses may be short. Seven of 13 trials (54%) published in 2002 cited the review and it is possible that some of these trials had started enrolment of patients before the publication of the review. This would explain why their design was not influenced by the Picard review. Thirdly, the relevance of the Picard review may be questioned since it may be considered out of date. The median survival time of a systematic review has been estimated to be about six years.20 However, the biological basis underlying pain from propofol injection has not changed, and the Picard review has remained the only systematic review on this topic for almost 12 years. The relevance of the primary reference treatment identified in the review may also be questioned since it was based on about 400 patients only. However, to find an even smaller treatment effect (for example, a decrease from a conservative baseline risk of 60% pain with placebo to 20% pain with an experimental intervention, number needed to treat 2.5), only about 35 patients would be necessary in each group of a single trial to obtain 90% power for detecting such a level of analgesic efficacy (two sided test, α level 0.05). In 2011, a new systematic review on the same subject, published by different authors,7 included data from 25 260 patients from 177 trials and came to the same conclusion, namely that administering lidocaine (lignocaine) along with venous occlusion was the most efficacious intervention for the prevention of pain from propofol injection. This second review also reported that injection of propofol through a large vein provided a similar degree of pain relief, an intervention that had remained ill described at the time of the Picard review. Fourthly, we did not contact the authors of the new trials to ask them whether they knew about the Picard review, why they had chosen to cite it or not, and on what grounds they had designed their study. It may be that some authors had actually used the Picard review to inform their study design but did not overtly refer to it. It is also possible that the new trials, which were designed according to the Picard review, subsequently influenced further trials too. Indeed, some trials used a clinically relevant study design but did not explicitly justify their choice. Fifthly, our definition of what constitutes a clinically relevant study design may be challenged. For example, seven trials had chosen, based on the Picard review, not to use the primary reference treatment, for different reasons. Of these, two were classified by us as being not clinically relevant. There may be an argument to classify those as clinically relevant, as the authors used the Picard review to defend the choice of their comparator. And finally, the generalisability of our findings remains unknown. This analysis is based on a specific subject of perioperative medicine. It is possible that in research areas where studies with longer follow-up times and more complex infrastructure are required, fewer irrelevant trials are produced.

Unanswered questions and future research

Our study raises questions about the dissemination of results identified through a systematic review. It seems that dissemination of the main results of the Picard review and their implementation into practice and methodological guidelines has not been sufficient. The extraordinary degree of analgesic efficacy of intravenous lignocaine with venous occlusion, its ease of use, and its low cost may have led the authors of the Picard review to believe that the primary reference treatment would be widely accepted both in clinical practice and as a comparator for subsequent research. They were wrong. It was surprising that so many new trials had been published on a topic for which


an efficacious intervention was known. Even more disturbing was that most new trials ended up with clinically non-relevant designs, which meant that these trials were unable, regardless of the results, to generate important new knowledge and thus to considerably change clinical practice. The profusion of clinically non-relevant studies reflects the considerable imbalance between the strong pressure to publish and the weakness of the barriers that prevent useless research.

Methodological implementation strategies are needed to define an appropriate and ethical research agenda based on the best currently available evidence. Dissemination and implementation of scientific knowledge is a challenge and has been widely discussed.21 22 There is a difference, however, between implementation of clinical knowledge and implementation of methodological knowledge. While clinicians remain free to treat patients as they want, research protocols have to go through a range of barriers that could avoid waste of resources if they were adequately used. Barriers include funders (for funded research), ethics committees, and clinical trial registries. Unfortunately, clinical trial registries do not yet exert any quality control over registered protocols; this is supposed to be done by funding agencies and ethics committees. However, funders are more easily impressed by a long list of publications than by relevant content, and ethics committees are more concerned about patient protection than down-regulation of non-relevant research, although scientific soundness is a prerequisite of ethical soundness (for example, Helsinki Declaration article 21).23 We suggest that both funders and ethics committees start asking authors explicitly whether a systematic review on their research topic exists, and if yes, whether that systematic review proposes a research agenda. If this is the case, authors should explain how their proposed research project fits into that research agenda. This would require a wider acceptance of the strengths of systematic reviews and meta-analyses in defining research agendas but would avoid unnecessary, redundant, or invalid, and thus unethical, research. Finally, journal editors may inform their readers that they will stop considering for publication reports of trials that ignore such knowledge.

Conclusions and recommendations

Much effort has been put into convincing academia and policymakers that systematic reviews are vital instruments to improve the efficacy and safety of healthcare.24 These efforts should be extended to the research area. Authors should justify their trial design in the context of the current state of knowledge.11 25 Our findings highlight the role that systematic reviews should play in guiding trialists in their choice of the most appropriate study design, avoiding ill designed and clinically non-relevant trials and thus a waste of resources.

We thank P Picard (Centre d'Étude et de Traitement de la Douleur, Clermont-Ferrand, France) and M R Tramèr (Geneva University Hospitals, Geneva, Switzerland) who shared with us original data of their systematic review; T Perneger (Geneva University Hospitals) for his thoughtful advice; R Abbasivash (Department of Anaesthesiology, Faculty of Medicine, Urmia University of Medical Sciences, Urmia, Iran) who responded to our enquiry and provided a copy of his paper; and M Öztürk (Department of Surgery, Geneva University Hospitals) who translated two Turkish papers into English. We assume full responsibility for the analyses and interpretation of these data.

Presentation: Summary data of this analysis have been presented as an oral presentation at the 7th International Congress on Peer Review and Biomedical Publication, Chicago, September 8-10, 2013 (abstract No PRC-13-0043).

Contributors: CH, MRT, and NE conceived the study, contributed to study design, conducted the analyses, and prepared the manuscript. CH performed systematic literature searches. CH and DP extracted data from original trials. DMP contributed to the preparation of the manuscript. All authors read and approved the manuscript. CH and NE had full access to all of the data in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis. NE is guarantor.

Funding: This study did not receive specific funding.

Competing interests: All authors completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: MRT is a coauthor of the Picard review; no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

Ethical approval: Not required.

Data sharing: No additional data available.

Transparency: The lead author (NE) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.

References

1 Egger M, Smith GD. Meta-analysis: potentials and promise. BMJ 1997;315:1371-4.
2 Young C, Horton R. Putting clinical trials into context. Lancet 2005;366:107-8.
3 Fergusson D, Glass KC, Hutton B, Shapiro S. Randomized controlled trials of aprotinin in cardiac surgery: could clinical equipoise have stopped the bleeding? Clin Trials 2005;2:218-29; discussion 229-32.
4 Eger EI 2nd. Characteristics of anesthetic agents used for induction and maintenance of general anesthesia. Am J Health Syst Pharm 2004;61(suppl 4):3-10.
5 Picard P, Tramèr MR. Prevention of pain on injection with propofol: a quantitative systematic review. Anesth Analg 2000;90:963-9.
6 Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol 2009;62:e1-34.
7 Jalota L, Kalira V, George E, Shi YY, Hornuss C, Radke O, et al. Prevention of pain on injection of propofol: systematic review and meta-analysis. BMJ 2011;342:d1110.
8 Jadad AR, Moore RA, Carroll D, Jenkinson C, Reynolds DJ, Gavaghan DJ, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1-12.
9 Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med 1992;327:248-54.
10 Gilbert R, Salanti G, Harden M, See S. Infant sleeping position and the sudden infant death syndrome: systematic review of observational studies and historical review of recommendations from 1940 to 2002. Int J Epidemiol 2005;34:874-87.
11 Clarke M, Hopewell S, Chalmers I. Reports of clinical trials should begin and end with up-to-date systematic reviews of other relevant evidence: a status report. J R Soc Med 2007;100:187-90.
12 Yuan HF, Xu WD, Hu HY. Young Chinese doctors and the pressure of publication. Lancet 2013;381:e4.
13 Garattini S, Bertele V, Li Bassi L. How can research ethics committees protect patients better? BMJ 2003;326:1199-201.
14 Bohannon J. Who's afraid of peer review? Science 2013;342:60-5.
15 Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.
16 Mann H, Djulbegovic B. Choosing a control intervention for a randomised clinical trial. BMC Med Res Methodol 2003;3:7.
17 Krishnan V. Etiquette in scientific publishing. Am J Orthod Dentofacial Orthop 2013;144:577-82.
18 Hopewell S, Loudon K, Clarke MJ, Oxman AD, Dickersin K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst Rev 2009;1:MR000006.
19 Ashcroft R. Equipoise, knowledge and ethics in clinical research and practice. Bioethics 1999;13:314-26.
20 Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 2007;147:224-33.
21 Peters DH, Adam T, Alonge O, Agyepong IA, Tran N. Implementation research: what it is and how to do it. BMJ 2013;347:f6753.
22 Contopoulos-Ioannidis DG, Alexiou GA, Gouvias TC, Ioannidis JP. Medicine. Life cycle of translational research for medical interventions. Science 2008;321:1298-9.
23 Gandevia B, Tovell A. Declaration of Helsinki. Med J Aust 1964;2:320-1.
24 Clarke M. Doing new research? Don't forget the old. PLoS Med 2004;1:e35.
25 Chalmers I. Academia's failure to support systematic reviews. Lancet 2005;365:469.

Accepted: 3 August 2014

Cite this as: BMJ 2014;349:g5219

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works


What is already known on this topic

It is unethical to embark on new research without first analysing what can be learnt from existing literature
Systematic reviews guide researchers in assessing the need for further investigations, to avoid unnecessary and redundant research
Systematic reviews often provide a research agenda to guide future research

What this study adds

A systematic review that had identified an efficacious analgesic intervention to prevent pain on injection of propofol and provided a clear agenda for future research had little impact on the design of subsequently published trials on the same subject
Implementation strategies are needed to ensure dissemination of results and methodological issues identified through systematic reviews
New trials should be designed according to the findings and recommendations of the research agenda of previously published systematic reviews

on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/.


Tables

Table 1| Characteristics of trials published before and after the Picard review. Values are numbers (percentages) unless stated otherwise

Characteristics | Old trials* | New trials† | Difference in proportions (95% CI) | P value

Trials | 56 | 136 | — | —
Participants | 6264 | 19 778 | — | —
Median (range) year of publication | 1995 (1982-99) | 2007 (2002-12) | — | —
Median No (range) of participants per trial | 100 (28-368) | 125.5 (16-500) | — | <0.001
Published in journal without impact factor | 9 (16.1) | 41 (30.1) | 14.1 (1.7 to 26.4) | 0.043
Median (range) impact factor‡ | 2.96 (1.21-5.36) | 2.23 (0.03-5.36) | — | <0.001
Oxford quality score§ | | | | <0.001
  1 or 2 | 43 (76.8) | 45 (33.1) | −43.7 (−57.3 to −30.1) | —
  3 | 12 (21.4) | 43 (31.6) | 10.2 (−3.1 to 23.5) | —
  4 or 5 | 1 (1.8) | 48 (35.3) | 33.5 (24.8 to 42.3) | —

*Published before (and therefore included in) the Picard review.
†Published after the Picard review, from 2002 onwards.
‡Calculated excluding trials that were published in journals without an impact factor.
§Randomisation: none (score 0), mentioned (1), described and adequate (that is, computer generated list of random numbers, or sealed envelopes) (2); blinding: none or observer blinding only (score 0), double blinding described but not optimal (1), optimal (double dummy or convincing blinding procedures) (2); drop-outs: not described or incomplete (score 0), clear follow-up of each patient (1).
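The composite rubric in the footnote can be expressed as a small helper function. This is a minimal sketch of the modified Oxford (Jadad-derived) scale as described above; the function name and input validation are ours, not from the paper.

```python
def oxford_score(randomisation, blinding, dropouts):
    """Modified Oxford quality score per the table footnote:
    randomisation 0-2, blinding 0-2, drop-out reporting 0-1 (maximum 5)."""
    if randomisation not in (0, 1, 2):
        raise ValueError("randomisation must be 0, 1 or 2")
    if blinding not in (0, 1, 2):
        raise ValueError("blinding must be 0, 1 or 2")
    if dropouts not in (0, 1):
        raise ValueError("dropouts must be 0 or 1")
    return randomisation + blinding + dropouts


# Adequately randomised, optimally blinded, clear follow-up of each patient
print(oxford_score(2, 2, 1))  # 5, the maximum score
```

Under this rubric an "optimally blinded" trial is simply one whose blinding component equals 2, independent of the other two components.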


Table 2| Comparisons of outcomes in trials published before and after the Picard review. Values are numbers (percentages) unless stated otherwise

Outcomes | Old trials* | New trials† | Difference in proportions (95% CI) | P value

Publication rate:
  Median No (range) of trials published per year | 2.5 (0-9) | 12 (7-20) | — | <0.001
Population:
  Trials in children | 3 (5.4) | 17 (12.5) | 7.1 (−1.0 to 15.2) | 0.141
Blinding (Oxford quality score) | | | | <0.001
  No attempt, or single blinding only (score 0) | 19 (33.9) | 37 (27.2) | −6.7 (−21.2 to 7.8) | —
  Described but not optimal (score 1) | 31 (55.4) | 47 (34.6) | −20.8 (−36.1 to −5.5) | —
  Optimal (score 2) | 6 (10.7) | 52 (38.2) | 27.5 (16.0 to 39.0) | —
Trial design including primary reference treatment:
  PT v ST v experimental (v placebo) | 3 (5.4) | 5 (3.7) | — | —
  PT v experimental (v placebo) | 0 (0.0) | 28 (20.6) | — | —
  PT v ST (v placebo) | 4 (7.1) | 4 (2.9) | — | —
  PT v placebo | 0 (0.0) | 1 (0.7) | — | —
  Total | 7 (12.5) | 38 (27.9) | 15.4 (4.0 to 26.9) | 0.022
Trial design without PT but including ST:
  ST v experimental (v placebo) | 19 (33.9) | 45 (33.1) | — | —
  ST v ST (v placebo) | 16 (28.6) | 3 (2.2) | — | —
  Total | 35 (62.5) | 48 (35.3) | −27.2 (−42.2 to −12.2) | <0.001
Trial design without PT or ST:
  Experimental v experimental (v placebo) | 14 (25.0) | 50 (36.8) | 11.8 (−2.2 to 25.7) | 0.116

PT=primary reference treatment (intravenous lidocaine (lignocaine) with venous occlusion); ST=secondary reference treatment (that is, alternative intervention of proved efficacy, for example, lignocaine added to propofol). *Published before (and therefore included in) the Picard review. †Published after the Picard review, from 2002 onwards.
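The differences in proportions and 95% confidence intervals reported in these tables are consistent with a standard Wald interval for two independent proportions (the exact method is an assumption here, as the article does not spell it out). A minimal sketch, reproducing the "optimal blinding" row of Table 2 (6/56 old trials v 52/136 new trials):

```python
from math import sqrt

def diff_proportions(x1, n1, x2, n2, z=1.96):
    """Difference in proportions (group 2 minus group 1) with a Wald 95% CI."""
    p1, p2 = x1 / n1, x2 / n2
    d = p2 - p1
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d, d - z * se, d + z * se

# Optimal blinding (Oxford score 2): 6/56 old trials vs 52/136 new trials
d, lo, hi = diff_proportions(6, 56, 52, 136)
print(f"{d*100:.1f} ({lo*100:.1f} to {hi*100:.1f})")  # 27.5 (16.0 to 39.0)
```

The same function reproduces the other rows of the tables to rounding accuracy, which supports the assumption of a simple Wald interval.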



Table 3| Comparison of characteristics of trials according to their reference to the Picard review. Values are numbers (percentages) unless stated otherwise

Characteristics | Reference to review | No reference to review | Difference in proportions (95% CI) | P value
Trials | 99 (72.8) | 37 (27.2) | — | —
Participants | 14 590 | 5188 | — | —
Median (range) year of publication | 2007 (2002-12) | 2007 (2002-12) | — | 0.992
Median No (range) of participants per trial | 127 (22-500) | 120 (16-335) | — | 0.839
Published in journal without impact factor | 26 (26.3) | 15 (40.5) | −14.3 (−32.3 to 3.8) | 0.106
Median (range) impact factor* | 2.23 (0.03-5.36) | 2.11 (0.52-3.29) | — | 0.385
Oxford quality score†: | | | | 0.011
  1 or 2 | 26 (26.3) | 19 (51.4) | −25.1 (−43.4 to −6.8) |
  3 | 32 (32.3) | 11 (29.7) | 2.6 (−14.8 to 20.0) |
  4 or 5 | 41 (41.4) | 7 (18.9) | 22.5 (6.6 to 38.4) |

*Calculated excluding trials that were published without an impact factor. †Randomisation: none (score 0), mentioned (1), described and adequate (that is, computer generated list of random numbers, or sealed envelopes) (2); blinding: none or observer blinding only (score 0), double blinding described but not optimal (1), optimal (double dummy or convincing blinding procedures) (2); drop-outs: not described or incomplete (score 0), clear follow-up of each patient (1).



Table 4| Comparison of outcomes in trials according to their reference to the Picard review. Values are numbers (percentages) unless stated otherwise

Outcomes | Reference to review | No reference to review | Difference in proportions (95% CI) | P value
Publication rate:
  Median No (range) of trials published per year | 9 (6-15) | 3 (0-6) | — | <0.001
Population:
  Trials in children | 14 (14.1) | 3 (8.1) | 6.0 (−5.1 to 17.2) | 0.344
Blinding (Oxford quality score): | | | | 0.118
  No attempt, or single blinding only (score 0) | 24 (24.2) | 13 (35.1) | −10.9 (−28.4 to 6.6) |
  Described but not optimal (score 1) | 32 (32.3) | 15 (40.5) | −8.2 (−26.5 to 10.1) |
  Optimal (score 2) | 43 (43.4) | 9 (24.3) | 19.1 (2.2 to 36.0) |
Trial designs including PT:
  PT v ST v experimental (v placebo) | 3 (3.0) | 2 (5.4) | |
  PT v experimental (v placebo) | 23 (23.2) | 5 (13.5) | |
  PT v ST (v placebo) | 3 (3.0) | 1 (2.7) | |
  PT v placebo | 0 (0.0) | 1 (2.7) | |
  Total | 29 (29.2) | 9 (24.3) | 5.0 (−11.5 to 21.5) | 0.565
Trial designs without PT but including ST:
  ST v experimental (v placebo) | 37 (37.4) | 8 (21.6) | |
  ST v ST (v placebo) | 3 (3.0) | 0 (0.0) | |
  Total | 40 (40.4) | 8 (21.6) | 18.8 (2.4 to 35.2) | 0.041
Trial designs without PT or ST:
  Experimental v experimental (v placebo) | 30 (30.3) | 20 (54.1) | −23.8 (−42.2 to −5.3) | 0.011

PT=primary reference treatment (intravenous lidocaine (lignocaine) with venous occlusion). ST=secondary reference treatment (that is, alternative intervention of proved efficacy, for example, lignocaine added to propofol).



Table 5| Relevant versus non-relevant trial designs among 136 new trials. Values are numbers (percentages) unless stated otherwise

Characteristics | Relevant designs | Non-relevant designs | Difference in proportions (95% CI) | P value
Trials | 49 (36.0) | 87 (64.0) | — | —
Median (range) year | 2007 (2002-12) | 2007 (2002-12) | — | 0.667
Reference to Picard review | 39 (79.6) | 60 (69.0) | 10.6 (−4.3 to 25.5) | 0.181
Published in journal without impact factor | 14 (28.6) | 27 (31.0) | −2.4 (−18.4 to 13.5) | 0.764
Median (range) impact factor* | 2.23 (0.03-5.36) | 2.19 (0.32-4.24) | — | 0.418
Published in open access journal | 11 (22.4) | 18 (20.7) | 1.8 (−12.7 to 16.2) | 0.810
Funding source: | | | | 0.143
  Not declared | 32 (65.3) | 55 (63.2) | 2.1 (−14.7 to 18.8) |
  None | 10 (20.4) | 8 (9.2) | 11.2 (−1.6 to 24.0) |
  Academic | 5 (10.2) | 17 (19.5) | −9.3 (−21.2 to 2.5) |
  Industry | 2 (4.1) | 7 (8.0) | −4.0 (−11.9 to 4.0) |

*Calculated excluding trials that were published without an impact factor.



Figures

Fig 1 Flow chart of study selection. *Number does not add up as some titles were from more than one database. †Chinese (n=3 trials), Japanese (n=7), and Persian (n=1)

Fig 2 Number of published randomised controlled trials studying efficacy of interventions for prevention of pain from propofol injection. Trials published before 2000 are those included in the Picard review.5 For the present analysis, searches for trials published after the Picard review included references from 2002 onwards. White bars represent clinically relevant trials

6 Discussion

Medicine has slowly moved from an art to a science based on evidence, and most of today's clinical practices need to be justified by scientific evidence.146,147 This transition aimed to improve treatment efficacy and patient safety, and to allow physicians to inform and advise their patients in a clear and transparent way.148 The use, and the publication, of scientific evidence has become part of an academic physician's everyday work, and almost a mandatory task in academic hospitals.149 In anaesthesiology, publication of an article is now also required to obtain a specialist degree.150

This shift towards science has had several consequences, of which I will briefly discuss three. One consequence has been the emergence of a "pressure to publish" atmosphere, which has led to a sharp increase in the number of articles published in all disciplines, making it almost impossible for physicians to remain up to date.151,152 This has highlighted an urgent need for clear and transparent reviews of the existing literature, based on scientific and reproducible methods to synthesise previously performed research, now known as systematic reviews.110,147 A second consequence is that, since sound evidence requires clinical trials to follow very specific procedures, some of which remain unknown to the physicians performing the trials, a large part of the published evidence is of poor quality and may report biased findings.152,153,154 Finally, the explosive combination of pressure to publish on one side and low methodological knowledge on the other has laid the basis for an increased temptation to "cheat" or commit "fraud"15,155 in order to facilitate the publication of articles. This has further led to the publication of distorted results and conclusions,156 with potentially devastating consequences for patients' health,157 for scientific knowledge, and for the reputation of science, as described above.
All of these problems have been reasonably well recognised throughout the medical community and among editors of scientific journals.158,159,160 Although universities fear the scandal that follows the uncovering of research fraud in their institution,161 and remain largely reluctant to communicate openly on these issues once they are identified, most are aware that research malpractice and misconduct represent a major threat to their mission.

6.1 What should be done?

One way to resolve a problem is to address its major causes one by one. A first path could therefore be to decrease the incentives to malpractice and misconduct, one of which is the pressure to publish imposed on all physicians. Although encouraging research should be commended and is part of a strong academic curriculum, forcing all physicians, even those who have no interest in research, to produce and publish is most likely counter-productive for scientific knowledge and for the scientific community. According to Bastian et al, who recognise the difficulty of estimating these numbers, about 14 new RCTs were published every day in the medical field in the 1980s, against 75 RCTs and 11 systematic reviews every day in 2010, although "there are still only 24 hours in a day".151 This enormous scientific production ought to be better controlled. Academic promotion, as well as the attribution of research funds, needs to be based a little more on the quality of the articles published by the applicant and a little less on their quantity if we want to see a decrease in the amount of useless research published.153 Moreover, when academic promotion and grant attribution stop depending solely on the number of publications, especially in high impact factor journals, the temptation to add guest authors to publications,162 and, even worse, the temptation to fabricate or falsify data, may decrease as well.155 Unfortunately, research on the factors responsible for scientific misconduct remains sparse.163,164 It has been suggested that the lack of a clear definition of research misconduct, the lack of a research integrity policy within an institution,99 and cultural characteristics restraining criticism are risk factors for fraud among young researchers.165 Also, even if the number of publications is probably not a safe metric with which to evaluate a researcher, we have to admit that evaluating the scientific performance of both individuals and institutions remains a challenge. Since the year 2000, different metrics have been proposed in an attempt to quantify scientific production, such as the Science Citation Index, Web of Science, the h-index, or the more recent Mendeley and Altmetrics.
However, these metrics are too often misused: scientometricians have highlighted that reading a researcher's work to judge its scientific quality is much more appropriate than relying on any given number.166 Future challenges for academic institutions will be to honestly re-think their structures and incentives for academic promotion,167 to find new and innovative ways to evaluate their collaborators, and to replace the simple count of articles and summing of impact factors with other, more reliable metrics.149,168,169

A second path could be to implement structures that increase the quality of the research produced and published.10 This is likely to become necessary if medicine is to remain a scientific discipline based on evidence. For example, physicians need to be better trained in research methodology and epidemiological principles throughout their medical curriculum. They need to be taught how to search for scientific evidence regarding a clinical question, where to look, and how to critically appraise the published literature.165 They need to be aware of the importance of the science behind their everyday practice. All too often I hear physicians dismissing clinical trials as merely serving to confirm what they already know from their clinical practice. A shift to a real scientific culture is needed in the medical community, and this may take a long time if we consider where medical culture and knowledge come from: the Hippocratic Oath170 is an eloquent illustration of their "eminence-based" foundations. A necessary step will be to recognise that, in today's everyday practice, a physician needs to be able to search the literature and critically appraise published evidence just as much as to use a stethoscope; but also that performing good quality clinical trials requires very specific skills, and that professional epidemiologists are needed in academic hospitals to help physician-researchers design, perform, report and publish high quality research. Furthermore, if medicine aims to be a science, the philosophy underlying the scientific process should be taught and highlighted repeatedly during medical school. For example, physicians should clearly understand that although it may be quite pleasant to publish in Science or Nature, what really matters is to report what has been done during the research in such a way that readers can understand the procedures, and to highlight any risk of bias or problem that may have distorted the reported results, including any potential conflicts of interest.171 There is no shame in not being able to replicate the results described in a previously published article. On the contrary, science needs to be replicated, and because findings may arise just by chance, it is of major importance to be able to question previous findings.172

A third path could be to improve the barriers to publication of poor quality and fraudulent research. As seen previously, few such barriers exist so far, and most of them may be of limited use in preventing such publications.86 Much of the responsibility rests on the shoulders of ethics committees41 and of the reviewers and editors173 of scientific journals for spotting, correcting and sometimes denouncing wrongdoing, with very little in return for this difficult work. There are as yet no real incentives to become either a careful member of a research ethics committee77 or a rigorous peer-reviewer, the definition of a peer-reviewer itself being quite vague.174 For example, peer review is generally performed on a voluntary basis, without either pecuniary reward or academic recognition of the work performed. A thorough peer review of a research article demands time and commitment, without compensation. Open peer review is starting to develop and may improve the motivation of peer-reviewers and the quality of their reviews, as their work no longer remains hidden. Some have suggested, however, that open peer review makes no difference to quality,175 but does increase the number of peer-reviewers declining invitations.176 Unfortunately, research on peer reviewing remains scarce.177 Some web initiatives such as "Publons"178 are emerging to develop better recognition of the work of peer-reviewers by publishing their peer review history online and rewarding "top peer-reviewers" with prizes.179 Their impact on the overall quality of peer reviews remains to be demonstrated.

Finally, the last path could be to strengthen the post-publication verification process, which may identify errors or misconduct and lead to the publication of corrections, and sometimes to retractions.180 Careful readers sometimes take the time to write a letter to the editor-in-chief of the journal responsible for the publication of a flawed article after having spotted an error or potential misconduct. We have illustrated this with a recent case in anaesthesiology, and shown how it can lead to the identification of important frauds. Some web platforms like "PubPeer"181 have also been developed recently to allow readers to comment on and discuss published papers together; some discussions have led to the identification of cases of fraud. Other websites are specifically dedicated to highlighting the multiple sorts of problems arising in scientific publication, such as "Retraction Watch",182 which tracks retracted articles across all scientific journals. A system to better inform researchers about retracted articles in their area of research is still needed, since it has been shown that retracted articles continue to be cited after retraction.183 The positive side of this frightening reality is that recognition of the existence of research misconduct and malpractice forces the scientific community to develop new procedures to try to prevent misconduct.98,184 User-based initiatives are now regularly emerging, such as the "EvidenceLive Manifesto"185 and "OpenTrials",186 which will hopefully play an important role in the correction of the published literature, or the "James Lind Alliance",187 which aims to improve the relevance of research projects.

6.2 Where do systematic reviews fit into the picture?

There are several ways in which systematic reviews may be of interest in this context. One way systematic reviews may be useful is by helping to decrease the number of newly published trials. Every researcher should be asked, before starting a new research project, to check whether a systematic review on the topic has already been performed and published. Depending on the quality of the published systematic review, one could either decide that the research is no longer needed, because a definite answer to the research question has been reached, or, if that is not the case, the research agenda of the available systematic review could help the researcher formulate a research question that remains unclear and needs to be investigated further. If no systematic review is yet available on the topic, it could be a good idea to start by performing the missing systematic review before launching yet another research project. Only once a well-performed systematic review has clearly identified the need for further research should a new trial be started. This use of systematic reviews, by researchers, but also by ethics committees,41 grant providers and drug regulation agencies, could help to decrease the amount of useless research published.

A second way systematic reviews could help in this context is by improving the methodological knowledge of young physician-researchers. In fact, performing a systematic review during, or just after, the medical curriculum could be very formative.
All of the postgraduate students and physicians (anaesthesiologists, for most of them) whom I have had the chance and the opportunity to accompany through the development of a systematic review have attested to the profound impact that this experience had on their understanding of research methodology, as well as of the strengths and limitations of various study designs, including those of the systematic review itself (not discussed in the present dissertation). One needs to spend hours and days reading and critically appraising the published evidence on a given topic to really grasp the reality of the available science. Many young researchers were surprised to realise the rather poor quality of the available evidence on which their everyday practice relies. Although this may sound negative, it is actually very important that every physician realises that reading one article is not enough to change their clinical practice, and that not everything that has been published necessarily reflects the truth. Additionally, the process of systematic reviewing188,189 as well as the standards for reporting systematic reviews190 are changing and improving regularly, making this a field of continuous renewal. The implementation of "living systematic reviews", online and up-to-date summaries of the available evidence, may represent the next step forward.191

Finally, it is of the utmost importance that systematic reviews are taken for what they can offer, and not more. Although systematic reviews are vulnerable to the poor quality of the research they rely on, they remain the best available source of evidence, as long as they are well performed and acknowledge their limitations. Moreover, having read the entire scientific production answering a given question allows systematic reviewers sometimes to suspect or identify errors and misconduct. A systematic review is essentially teamwork, in which experts from different fields are involved. One of these experts should be a methodologist, epidemiologist or statistician, who usually has a good capacity to detect potential errors in the reported data.192 Tools for suspecting the non-publication of small negative studies, such as examining the plot of effect estimates in relation to the sample size of the trials, were recognised 20 years ago122 and are used regularly by systematic reviewers. Other tools are less frequently used. For example, it is quite difficult to fabricate totally "plausible" data, and some statistical tools are now emerging to identify "doubtful" data, such as a very unlikely distribution of data in a randomised design.103,156,193,194

The major issue that remains to be solved is what systematic reviewers ought to do when they come to suspect malpractice or misconduct. As discussed in chapter 5.5, most systematic reviewers do not seem to know what to do, nor do they seem willing to act as whistle-blowers. However, it should be the responsibility of all scientists to preserve the validity of published science.195,196 As the title of a recent editorial by the editors of PLoS Medicine stated, "An unbiased scientific record should be everyone's agenda".197 It may be time for systematic reviewers to re-think their role in the matter. The Cochrane Collaboration will probably need to play its role as a central actor in this context by creating a set of rules or recommendations as to what types of misconduct should be tracked, and by defining a clear code of conduct to help improve the validity of the accessible scientific record.198
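The plot of effect estimates against trial size mentioned above underlies the funnel plot, and Egger's regression intercept is one common way to quantify its asymmetry: the standardised effect is regressed on precision, and an intercept far from zero suggests small-study (publication) bias. A minimal sketch on invented trial data (all numbers below are hypothetical, not taken from this thesis):

```python
from math import sqrt

# Hypothetical trials: (log odds ratio, standard error); note the smaller
# (higher-SE) trials do not report proportionally smaller effects here.
trials = [(-0.9, 0.45), (-0.7, 0.35), (-0.5, 0.25), (-0.45, 0.18), (-0.4, 0.12)]

# Egger-style test: regress standardised effect (theta/se) on precision (1/se).
xs = [1 / se for _, se in trials]          # precision
ys = [theta / se for theta, se in trials]  # standardised effect
n = len(trials)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# An intercept well away from zero hints at funnel-plot asymmetry.
print(round(intercept, 2))
```

In practice a meta-analysis package would also report a standard error and P value for the intercept; this sketch only shows the core regression that systematic reviewers rely on.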

7 Conclusion

There is little doubt that a well-performed systematic review will always provide a higher level of evidence for the efficacy of an intervention than a single experimental trial. However, the accumulation of poor quality research and of research based on misconduct challenges its role. Although systematic reviews may not always provide definite (and correct) answers, they can be very useful in preventing the production of unnecessary trials and in guiding researchers in their choice of relevant research questions, rigorous methodology and relevant outcomes for new trials. Systematic reviews may also help to clean the published literature of errors and malpractices identified during the review process, and may help improve the methodological knowledge and appraisal skills of young researchers. Even though most of the answers to the difficult problem of research misconduct lie with academic structures, policies and regulations, and will require major transformations and paradigm shifts, systematic reviews may be seen as a valuable tool in support of better quality in the published medical literature.

8 Bibliography

1. Cochrane, A. L. Effectiveness and Efficiency: Random Reflections on Health Services. (1972). 2. Borrell, B. A Medical Madoff: Anesthesiologist Faked Data in 21 Studies. Scientific American Available at: http://www.scientificamerican.com/article/a-medical-madoff- anesthestesiologist-faked-data/. (Accessed: 19th July 2016) 3. Harris, G. Doctor’s Pain Studies Were Fabricated, Hospital Says. (2009). 4. Adam, M. Routine Audit Uncovered Reuben Fraud - Anesthesiology News. Anesthesiology News (2009). Available at: http://www.anesthesiologynews.com/Policy- Management/Article/03-09/Routine-Audit-Uncovered-Reuben-Fraud/12641/ses=ogst. (Accessed: 11th July 2016) 5. Kowalczyk, L. Doctor accused of faking studies. .com (2009). Available at: http://archive.boston.com/news/health/articles/2009/03/11/doctor_accused_of_faking_studies/. (Accessed: 6th October 2016) 6. Winstein, K. J. & Armstrong, D. Top Pain Scientist Fabricated Data in Studies, Hospital Says. Wall Street Journal (2009). 7. Shafer, S. L. Tattered Threads: Anesth. Analg. 108, 1361–1363 (2009). 8. White, P. F., Kehlet, H. & Liu, S. Perioperative analgesia: what do we still know? Anesth. Analg. 108, 1364–1367 (2009). 9. Elia, N., Lysakowski, C. & Tramèr, M. R. Does multimodal analgesia with acetaminophen, nonsteroidal antiinflammatory drugs, or selective cyclooxygenase-2 inhibitors and patient- controlled analgesia morphine offer advantages over morphine alone? Meta-analyses of randomized trials. Anesthesiology 103, 1296–1304 (2005). 10. Moher, D. & Altman, D. G. Four Proposals to Help Improve the Medical Research Literature. PLoS Med. 12, (2015). 11. Clark, G. T. & Mulligan, R. Fifteen common mistakes encountered in clinical research. J. Prosthodont. Res. 55, 1–6 (2011). 12. Pua, H. An evaluation of the quality of clinical trials in anesthesia. - PubMed - NCBI. Available at: http://www.ncbi.nlm.nih.gov/pubmed/11684972. (Accessed: 10th May 2016) 13. Ioannidis, J. P. A. Why Most Published Research Findings Are False. 
PLoS Med. 2, (2005). 14. Langford, R. A., Huang, G. H. & Leslie, K. Quality and reporting of trial design in scientific papers in Anaesthesia over 25 years. Anaesthesia 64, 60–64 (2009). 15. Tavare, A. Managing research misconduct: is anyone getting it right? BMJ 343, d8212 (2011). 16. Tramèr, M. R. The Boldt debacle. Eur. J. Anaesthesiol. 28, 393–395 (2011). 17. Fanelli, D. How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data. PLoS ONE 4, e5738 (2009). 18. Gardner, W., Lidz, C. W. & Hartwig, K. C. Authors’ reports about research integrity problems in clinical trials. Contemp. Clin. Trials 26, 244–251 (2005). 19. Krumholz, H. M., Ross, J. S., Presler, A. H. & Egilman, D. S. What have we learnt from

95 Vioxx? BMJ 334, 120 (2007). 20. Deer, B. How the case against the MMR vaccine was fixed. BMJ 342, c5347–c5347 (2011). 21. Bolsin, S., Colson, M. & Marsiglio, A. Perioperative β blockade. BMJ 347, f5640 (2013). 22. Bouri, S., Shun-Shin, M. J., Cole, G. D., Mayet, J. & Francis, D. P. Meta-analysis of secure randomised controlled trials of β-blockade to prevent perioperative death in non-cardiac surgery. Heart heartjnl-2013-304262 (2013). doi:10.1136/heartjnl-2013-304262 23. Lenzer, J. Why we can’t trust clinical guidelines. BMJ 346, f3830–f3830 (2013). 24. Martinson, B. C., Anderson, M. S. & de Vries, R. Scientists behaving badly. Nature 435, 737–738 (2005). 25. Husten, L. Medicine Or Mass Murder? Guideline Based on Discredited Research May Have Caused 800,000 Deaths In Europe Over The Last 5 Years. Forbes Available at: http://www.forbes.com/sites/larryhusten/2014/01/15/medicine-or-mass-murder-guideline-based- on-discredited-research-may-have-caused-800000-deaths-in-europe-over-the-last-5-years/. (Accessed: 9th September 2016) 26. Carlisle, J. B. A meta‐ analysis of prevention of postoperative nausea and vomiting: randomised controlled trials by Fujii et al. compared with other authors. Anaesthesia 67, 1076– 1090 (2012). 27. Egger, M. & Smith, G. D. Misleading meta-analysis. BMJ 311, 753–754 (1995). 28. Pereira, T. V. & Ioannidis, J. P. A. Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. J. Clin. Epidemiol. 64, 1060–1069 (2011). 29. Sterne, J. A. C., Egger, M. & Smith, G. D. Investigating and dealing with publication and other biases in meta-analysis. BMJ 323, 101–105 (2001). 30. Imberger, G., Gluud, C., Boylan, J. & Wetterslev, J. Systematic Reviews of Anesthesiologic Interventions Reported as Statistically Significant: Problems with Power, Precision, and Type 1 Error Protection. Anesth. Analg. 121, 1611–1622 (2015). 31. Akl, E. A. et al. 
Reporting, handling and assessing the risk of bias associated with missing participant data in systematic reviews: a methodological survey. BMJ Open 5, e009368 (2015). 32. Kehlet, H. & Joshi, G. P. Systematic Reviews and Meta-Analyses of Randomized Controlled Trials on Perioperative Outcomes: An Urgent Need for Critical Reappraisal. Anesth. Analg. 121, 1104–1107 (2015). 33. Sivakumar, H. & Peyton, P. J. Poor agreement in significant findings between meta- analyses and subsequent large randomized trials in perioperative medicine. Br. J. Anaesth. 117, 431–441 (2016). 34. Ioannidis, J. P. A. The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. Milbank Q. 94, 485–514 (2016). 35. Farrugia, P., Petrisor, B. A., Farrokhyar, F. & Bhandari, M. Research questions, hypotheses and objectives. Can. J. Surg. 53, 278–281 (2010). 36. Brian Haynes, R. Forming research questions. J. Clin. Epidemiol. 59, 881–886 (2006). 37. Freiman, J. A., Chalmers, T. C., Smith, H. & Kuebler, R. R. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 ‘negative’ trials. N. Engl. J. Med. 299, 690–694 (1978). 38. Hopewell, S., Loudon, K., Clarke, M. J., Oxman, A. D. & Dickersin, K. Publication bias in clinical trials due to statistical significance or direction of trial results. Cochrane Database Syst.

96 Rev. MR000006 (2009). doi:10.1002/14651858.MR000006.pub3 39. Davis, P. How Much of the Literature Goes Uncited? The Scholarly Kitchen (2012). 40. Remler, D. Are 90% of academic papers really never cited? Reviewing the literature on academic citations. Impact of Social Sciences (2014). 41. Savulescu, J., Chalmers, I. & Blunt, J. Are research ethics committees behaving unethically? Some suggestions for improving performance and accountability. BMJ 313, 1390– 1393 (1996). 42. Chan, A.-W., Hróbjartsson, A., Jørgensen, K. J., Gøtzsche, P. C. & Altman, D. G. Discrepancies in sample size calculations and data analyses reported in randomised trials: comparison of publications with protocols. BMJ 337, a2299 (2008). 43. Weibel, S., Elia, N. & Kranke, P. The transparent clinical trial: Why we need complete and informative prospective trial registration. Eur. J. Anaesthesiol. 33, 72–74 (2016). 44. Glasziou, P., Meats, E., Heneghan, C. & Shepperd, S. What is missing from descriptions of treatment in trials and reviews? BMJ 336, 1472–1474 (2008). 45. Chan, A.-W., Hróbjartsson, A., Haahr, M. T., Gøtzsche, P. C. & Altman, D. G. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 291, 2457–2465 (2004). 46. Chan, A.-W. & Altman, D. G. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ 330, 753 (2005). 47. Dwan, K. et al. Comparison of protocols and registry entries to published reports for randomised controlled trials. Cochrane Database Syst. Rev. MR000031 (2011). doi:10.1002/14651858.MR000031.pub2 48. Al-Marzouki, S., Roberts, I., Evans, S. & Marshall, T. Selective reporting in clinical trials: analysis of trial protocols accepted by The Lancet. Lancet Lond. Engl. 372, 201 (2008). 49. Myles, P. S., Grocott, M. P. W., Boney, O. & Moonesinghe, S. R. 
Standardizing end points in perioperative trials: towards a core and extended outcome set. Br. J. Anaesth. 116, 586–589 (2016).
50. Elia, N. & Tramèr, M. R. Adherence to guidelines for improved quality of data reporting: where are we today? Eur. J. Anaesthesiol. 28, 478–480 (2011).
51. Greenfield, M. L. V. H. et al. Improvement in the Quality of Randomized Controlled Trials Among General Anesthesiology Journals 2000 to 2006: A 6-Year Follow-Up. Anesth. Analg. 108, 1916–1921 (2009).
52. Chalmers, I. & Glasziou, P. Avoidable waste in the production and reporting of research evidence. The Lancet 374, 86–89 (2009).
53. Haahr, M. T. & Hróbjartsson, A. Who is blinded in randomized clinical trials? A study of 200 trials and a survey of authors. Clin. Trials 3, 360–365 (2006).
54. Vedula, S. S., Li, T. & Dickersin, K. Differences in Reporting of Analyses in Internal Company Documents Versus Published Trial Reports: Comparisons in Industry-Sponsored Trials in Off-Label Uses of Gabapentin. PLoS Med. 10, e1001378 (2013).
55. Dwan, K. et al. Systematic Review of the Empirical Evidence of Study Publication Bias and Outcome Reporting Bias. PLoS ONE 3, e3081 (2008).
56. Vedula, S. S., Bero, L., Scherer, R. W. & Dickersin, K. Outcome Reporting in Industry-Sponsored Trials of Gabapentin for Off-Label Use. N. Engl. J. Med. 361, 1963–1971 (2009).
57. Schandelmaier, S. et al. Planning and reporting of quality-of-life outcomes in cancer trials. Ann. Oncol. 26, 1966–1973 (2015).
58. Mathieu, S., Boutron, I., Moher, D., Altman, D. G. & Ravaud, P. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA 302, 977–984 (2009).
59. Kasenda, B. et al. Subgroup analyses in randomised controlled trials: cohort study on trial protocols and journal publications. BMJ 349, g4539 (2014).
60. O'Boyle, E. H., Banks, G. C. & Gonzalez-Mulé, E. The Chrysalis Effect: How Ugly Initial Results Metamorphosize Into Beautiful Articles. J. Manag. (2014). doi:10.1177/0149206314527133
61. Boutron, I., Dutton, S., Ravaud, P. & Altman, D. G. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 303, 2058–2064 (2010).
62. Lüscher, T. F., Gersh, B., Landmesser, U. & Ruschitzka, F. Is the panic about beta-blockers in perioperative care justified? Eur. Heart J. 35, 2442–2444 (2014).
63. Krimsky, S. & Sweet, E. An Analysis of Toxicology and Medical Journal Conflict-of-Interest Policies. Account. Res. 16, 235–253 (2009).
64. Kasenda, B. et al. Agreements between Industry and Academia on Publication Rights: A Retrospective Study of Protocols and Publications of Randomized Clinical Trials. PLoS Med. 13, e1002046 (2016).
65. Kulkarni, A. V., Aziz, B., Shams, I. & Busse, J. W. Author self-citation in the general medicine literature. PLoS ONE 6, e20885 (2011).
66. Landoni, G. et al. Self-citation in anaesthesia and critical care journals: introducing a flat tax. Br. J. Anaesth. 105, 386–387 (2010).
67. Jannot, A.-S., Agoritsas, T., Gayet-Ageron, A. & Perneger, T. V. Citation bias favoring statistically significant studies was present in medical research. J. Clin. Epidemiol. 66, 296–301 (2013).
68. Cameron, E. Z., Edwards, A. M. & White, A. M. Publishing: Halt self-citation in impact measures. Nature 505, 160 (2014).
69. Chalmers, I. Underreporting research is scientific misconduct. JAMA 263, 1405–1408 (1990).
70. Altman, D. G. & Moher, D. Declaration of transparency for each research article. BMJ 347, f4796 (2013).
71. Gibson, L. M., Brazzelli, M., Thomas, B. M. & Sandercock, P. A. A systematic review of clinical trials of pharmacological interventions for acute ischaemic stroke (1955–2008) that were completed, but not published in full. Trials 11, 43 (2010).
72. Schandelmaier, S. et al. Premature Discontinuation of Randomized Trials in Critical and Emergency Care: A Retrospective Cohort Study. Crit. Care Med. 44, 130–137 (2016).
73. Doshi, P., Dickersin, K., Healy, D., Vedula, S. S. & Jefferson, T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 346, f2865 (2013).
74. While, A. E. Ethics committees: impediments to research or guardians of ethical standards? BMJ 311, 661 (1995).
75. Hedgecoe, A., Carvalho, F., Lobmayer, P. & Raka, F. Research ethics committees in Europe: implementing the directive, respecting diversity. J. Med. Ethics 32, 483–486 (2006).
76. Salman, R. A.-S. et al. Increasing value and reducing waste in biomedical research regulation and management. Lancet 383, 176–185 (2014).

77. Goodman, N. W. & MacGowan, A. Are research ethics committees behaving unethically? If committees were sued who would be liable? BMJ 314, 676–677 (1997).
78. Hopewell, S. et al. Impact of peer review on reports of randomised trials published in open peer review journals: retrospective before and after study. BMJ 349, g4145 (2014).
79. Dyer, C. UK clinical trials must be registered to win ethics committee approval. BMJ 347, f5614 (2013).
80. Dechartres, A., Ravaud, P., Atal, I., Riveros, C. & Boutron, I. Association between trial registration and treatment effect estimates: a meta-epidemiological study. BMC Med. 14, 100 (2016).
81. Wager, E. & Elia, N. Why should clinical trials be registered? Eur. J. Anaesthesiol. 31, 397–400 (2014).
82. Eysenbach, G. Peer Review and Publication of Research Protocols and Proposals: A Role for Open Access Journals. J. Med. Internet Res. 6 (2004).
83. Shapiro, M. F. & Charrow, R. P. The role of data audits in detecting scientific misconduct. Results of the FDA program. JAMA 261, 2505–2511 (1989).
84. Schroter, S. et al. Effects of training on quality of peer review: randomised controlled trial. BMJ 328, 673 (2004).
85. Schroter, S. et al. What errors do peer reviewers detect, and does training improve their ability to detect them? J. R. Soc. Med. 101, 507–514 (2008).
86. Bohannon, J. Who's Afraid of Peer Review? Science 342, 60–65 (2013).
87. Bala, M. M. et al. Randomized trials published in higher vs. lower impact journals differ in design, conduct, and analysis. J. Clin. Epidemiol. 66, 286–295 (2013).
88. Bruce, R., Chauvin, A., Trinquart, L., Ravaud, P. & Boutron, I. Impact of interventions to improve the quality of peer review of biomedical journals: a systematic review and meta-analysis. BMC Med. 14 (2016).
89. Marusic, A., Wager, E., Utrobicic, A., Rothstein, H. R. & Sambunjak, D. in Cochrane Database of Systematic Reviews (John Wiley & Sons, Ltd, 2016).
90. Ferguson, C., Marcus, A. & Oransky, I. Publishing: The peer-review scam. Nat. News 515, 480 (2014).
91. Haug, C. J. Peer-Review Fraud — Hacking the Scientific Publication Process. N. Engl. J. Med. (2015). doi:10.1056/NEJMp1512330. (Accessed: 24th August 2016)
92. Cohen, A. et al. Organised crime against the academic peer review system. Br. J. Clin. Pharmacol. 81, 1012–1017 (2016).
93. Cross, M. Policing plagiarism. BMJ 335, 963–964 (2007).
94. Shafer, S. L. Shadow of doubt. Anesth. Analg. 112, 498–500 (2011).
95. Fang, F. C., Steen, R. G. & Casadevall, A. Misconduct accounts for the majority of retracted scientific publications. Proc. Natl. Acad. Sci. U. S. A. 109, 17028–17033 (2012).
96. Steen, R. G., Casadevall, A. & Fang, F. C. Why Has the Number of Scientific Retractions Increased? PLoS ONE 8 (2013).
97. Fang, F. C. & Casadevall, A. Retracted Science and the Retraction Index. Infect. Immun. 79, 3855–3859 (2011).

98. Nath, S. B., Marcus, S. C. & Druss, B. G. Retractions in the research literature: misconduct or mistakes? Med. J. Aust. 185, 152–154 (2006).
99. Faria, R. Scientific misconduct: how organizational culture plays its part. Tijdschr. Cult. Crim. 5, 38–54 (2015).
100. Office of Science and Technology. Fact Sheet: Research Misconduct – A New Definition and New Procedures for Federal Research Agencies (1999). Available at: http://clinton3.nara.gov/WH/EOP/OSTP/html/9910_20_2.html. (Accessed: 27th July 2016)
101. Faria, R. La Ciencia bajo presión: comportamientos inadecuados y daños sociales [Science under pressure: inappropriate behaviours and social harms]. Crítica Penal Poder (2014).
102. Smith, R. What is research misconduct? The COPE Report 2000 (2000).
103. Buyse, M. et al. The role of biostatistics in the prevention, detection and treatment of fraud in clinical trials. Stat. Med. 18, 3435–3451 (1999).
104. Hart, B., Lundh, A. & Bero, L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ 344, d7202 (2012).
105. Kelley, G. A., Kelley, K. S. & Tran, Z. V. Retrieval of missing data for meta-analysis: a practical example. Int. J. Technol. Assess. Health Care 20, 296–299 (2004).
106. Mullan, R. J. et al. Systematic reviewers commonly contact study authors but do so with limited rigor. J. Clin. Epidemiol. 62, 138–142 (2009).
107. Higgins, J. P. T. & Green, S. (eds). Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0 (updated March 2011). The Cochrane Collaboration (2011). Available from: http://handbook.cochrane.org.
108. Guyatt, G. H., Mills, E. J. & Elbourne, D. In the Era of Systematic Reviews, Does the Size of an Individual Trial Still Matter? PLoS Med. 5, e4 (2008).
109. Turner, R. M., Bird, S. M. & Higgins, J. P. T. The Impact of Study Size on Meta-analyses: Examination of Underpowered Studies in Cochrane Reviews. PLoS ONE 8, e59202 (2013).
110. Garg, A. X., Hackam, D. & Tonelli, M. Systematic Review and Meta-analysis: When One Study Is Just not Enough. Clin. J. Am. Soc. Nephrol. 3, 253–260 (2008).
111. Moher, D. et al. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 352, 609–613 (1998).
112. Jadad, A. R. et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control. Clin. Trials 17, 1–12 (1996).
113. Elia, N. & Tramèr, M. R. Ketamine and postoperative pain – a quantitative systematic review of randomised trials. Pain 113, 61–70 (2005).
114. Higgins, J. P. T. et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 343, d5928 (2011).
115. Detweiler, B. N., Kollmorgen, L. E., Umberham, B. A., Hedin, R. J. & Vassar, B. M. Risk of bias and methodological appraisal practices in systematic reviews published in anaesthetic journals: a meta-epidemiological study. Anaesthesia 71, 955–968 (2016).
116. Pöpping, D. M. et al. Impact of Epidural Analgesia on Mortality and Morbidity After Surgery: Systematic Review and Meta-analysis of Randomized Controlled Trials. Ann. Surg. 259, 1056–1067 (2014).
117. Selph, S. S., Ginsburg, A. D. & Chou, R. Impact of contacting study authors to obtain additional data for systematic reviews: diagnostic accuracy studies for hepatic fibrosis. Syst. Rev. 3, 107 (2014).
118. Turner, E. H., Matthews, A. M., Linardatos, E., Tell, R. A. & Rosenthal, R. Selective publication of antidepressant trials and its influence on apparent efficacy. N. Engl. J. Med. 358, 252–260 (2008).
119. Eyding, D. et al. Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ 341, c4737 (2010).
120. Cook, D. J. et al. Should unpublished data be included in meta-analyses? Current convictions and controversies. JAMA 269, 2749–2753 (1993).
121. Chan, A.-W. Out of sight but not out of mind: how to search for unpublished clinical trial evidence. BMJ 344, d8013 (2012).
122. Egger, M., Smith, G. D., Schneider, M. & Minder, C. Bias in meta-analysis detected by a simple, graphical test. BMJ 315, 629 (1997).
123. Tramèr, M. R., Reynolds, D. J., Moore, R. A. & McQuay, H. J. Impact of covert duplicate publication on meta-analysis: a case study. BMJ 315, 635–640 (1997).
124. von Elm, E., Poglia, G., Walder, B. & Tramèr, M. R. Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 291, 974–980 (2004).
125. Roseman, M. et al. Reporting of conflicts of interest from drug trials in Cochrane reviews: cross sectional study. BMJ 345, e5155 (2012).
126. Bes-Rastrollo, M., Schulze, M. B., Ruiz-Canela, M. & Martinez-Gonzalez, M. A. Financial Conflicts of Interest and Reporting Bias Regarding the Association between Sugar-Sweetened Beverages and Weight Gain: A Systematic Review of Systematic Reviews. PLoS Med. 10, e1001578 (2013).
127. Gøtzsche, P. C. et al. What Should Be Done To Tackle Ghostwriting in the Medical Literature? PLoS Med. 6 (2009).
128. Matheson, A. Ghostwriting: the importance of definition and its place in contemporary drug marketing. BMJ 354, i4578 (2016).
129. Gattas, D. J. M. et al. Fluid Resuscitation with 6% Hydroxyethyl Starch (130/0.4) in Acutely Ill Patients: An Updated Systematic Review and Meta-Analysis. Anesth. Analg. 114, 159–169 (2012).
130. Huss, A., Egger, M., Hug, K., Huwiler-Müntener, K. & Röösli, M. Source of Funding and Results of Studies of Health Effects of Mobile Phone Use: Systematic Review of Experimental Studies. Environ. Health Perspect. 115, 1–4 (2007).
131. Lundh, A., Sismondo, S., Lexchin, J., Busuioc, O. A. & Bero, L. Industry sponsorship and research outcome. Cochrane Database Syst. Rev. (2012).
132. Melander, H., Ahlqvist-Rastad, J., Meijer, G. & Beermann, B. Evidence b(i)ased medicine – selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 326, 1171 (2003).
133. Bekelman, J. E., Li, Y. & Gross, C. P. Scope and impact of financial conflicts of interest in biomedical research: a systematic review. JAMA 289, 454–465 (2003).
134. Marret, E. et al. Susceptibility to fraud in systematic reviews: lessons from the Reuben case. Anesthesiology 111, 1279–1289 (2009).
135. Moore, R. A., Derry, S. & McQuay, H. J. Fraud or flawed: adverse impact of fabricated or poor quality research. Anaesthesia 65, 327–330 (2010).

136. Chalmers, I. Role of systematic reviews in detecting plagiarism: case of Asim Kurjak. BMJ 333, 594–596 (2006).
137. Vergnes, J.-N., Marchal-Sixou, C., Nabet, C., Maret, D. & Hamel, O. Ethics in systematic reviews. J. Med. Ethics 36, 771–774 (2010).
138. Weingarten, M. A., Paul, M. & Leibovici, L. Assessing ethics of trials in systematic reviews. BMJ 328, 1013–1014 (2004).
139. Kranke, P., Apfel, C. C., Eberhart, L. H., Georgieff, M. & Roewer, N. The influence of a dominating centre on a quantitative systematic review of granisetron for preventing postoperative nausea and vomiting. Acta Anaesthesiol. Scand. 45, 659–670 (2001).
140. Kranke, P., Apfel, C. C., Roewer, N. & Fujii, Y. Reported data on granisetron and postoperative nausea and vomiting by Fujii et al. are incredibly nice! Anesth. Analg. 90, 1004–1007 (2000).
141. Carlisle, J. B. The analysis of 168 randomised controlled trials to test data integrity. Anaesthesia 67, 521–537 (2012).
142. Miller, D. R. Retraction of articles written by Dr. Yoshitaka Fujii. Can. J. Anaesth. 59, 1081–1088 (2012).
143. Vlassov, V. The role of Cochrane Review authors in exposing research and publication misconduct. Cochrane Database Syst. Rev. (2010).
144. Ross, J. S. et al. Pooled analysis of rofecoxib placebo-controlled clinical trial data: lessons for postmarket pharmaceutical safety surveillance. Arch. Intern. Med. 169, 1976–1985 (2009).
145. Picard, P. & Tramèr, M. R. Prevention of pain on injection with propofol: a quantitative systematic review. Anesth. Analg. 90, 963–969 (2000).
146. Panda, S. C. Medicine: Science or Art? Mens Sana Monogr. 4, 127 (2006).
147. Sharma, R., Gordon, M., Dharamsi, S. & Gibbs, T. Systematic reviews in medical education: a practical approach: AMEE Guide 94. Med. Teach. 37, 108–124 (2015).
148. Sackett, D. L., Rosenberg, W. M., Gray, J. A., Haynes, R. B. & Richardson, W. S. Evidence based medicine: what it is and what it isn't. BMJ 312, 71 (1996).
149. Barbour, V. Perverse incentives and perverse publishing practices. Sci. Bull. 60, 1225–1226 (2015).
150. Département fédéral de l'intérieur. Spécialiste en anesthésiologie: programme de formation postgraduée du 1er janvier 2013 (dernière révision: 24 septembre 2015) [Specialist in anaesthesiology: postgraduate training programme of 1 January 2013, last revised 24 September 2015]. (2011).
151. Bastian, H., Glasziou, P. & Chalmers, I. Seventy-Five Trials and Eleven Systematic Reviews a Day: How Will We Ever Keep Up? PLoS Med. 7, e1000326 (2010).
152. Sarewitz, D. The pressure to publish pushes down quality. Nat. News 533, 147 (2016).
153. Altman, D. G. The scandal of poor medical research. BMJ 308, 283–284 (1994).
154. Schulz, K. F., Chalmers, I., Hayes, R. J. & Altman, D. G. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 273, 408–412 (1995).
155. Bossi, E. Scientific integrity, misconduct in science. Swiss Med. Wkly. 140, 183–186 (2010).
156. Al-Marzouki, S., Roberts, I., Marshall, T. & Evans, S. The effect of scientific misconduct on the results of clinical trials: a Delphi survey. Contemp. Clin. Trials 26, 331–337 (2005).
157. Godlee, F., Smith, J. & Marcovitch, H. Wakefield's article linking MMR vaccine and autism was fraudulent. BMJ 342, c7452 (2011).
158. Crocker, J. & Cooper, M. L. Addressing Scientific Fraud. Science 334, 1182 (2011).
159. The Lancet. Scientific fraud: action needed in China. The Lancet 375, 94 (2010).
160. Smith, R. Statutory regulation needed to expose and stop medical fraud. BMJ 352, i293 (2016).
161. Binder, R., Friedli, A. & Fuentes-Afflick, E. The New Academic Environment and Faculty Misconduct. Acad. Med. 91, 175–179 (2016).
162. Lüscher, T. F. The codex of science: honesty, precision, and truth – and its violations. Eur. Heart J. 34, 1018–1023 (2013).
163. Davis, M. S., Riske-Morris, M. & Diaz, S. R. Causal Factors Implicated in Research Misconduct: Evidence from ORI Case Files. Sci. Eng. Ethics 13, 395–414 (2007).
164. Mutch, W. A. C. Academic fraud: perspectives from a lifelong anesthesia researcher. Can. J. Anesth. 58, 782–788 (2011).
165. Fanelli, D., Costas, R. & Larivière, V. Misconduct Policies, Academic Culture and Career Stage, Not Gender or Pressures to Publish, Affect Scientific Integrity. PLoS ONE 10, e0127556 (2015).
166. Hicks, D., Wouters, P., Waltman, L., de Rijcke, S. & Rafols, I. Bibliometrics: The Leiden Manifesto for research metrics. Nature 520, 429–431 (2015).
167. The Lancet. Retractions: the lessons for research institutions.
168. Ioannidis, J. P. A. How to Make More Published Research True. PLoS Med. 11 (2014).
169. Okonta, P. I. & Rossouw, T. Misconduct in research: a descriptive survey of attitudes, perceptions and associated factors in a developing country. BMC Med. Ethics 15, 25 (2014).
170. Pavur, C. The Hippocratic Oath in Latin with English translation. Available at: http://www.academia.edu/2638728/The_Hippocratic_Oath_in_Latin_with_English_translation. (Accessed: 19th September 2016)
171. Kuehn, B. M. Harmonizing reporting of financial conflicts. JAMA 309, 19 (2013).
172. Ioannidis, J. P. A. Why Science Is Not Necessarily Self-Correcting. Perspect. Psychol. Sci. 7, 645–654 (2012).
173. Tramèr, M. R. The Fujii story: a chronicle of naive disbelief. Eur. J. Anaesthesiol. 30, 195–198 (2013).
174. Smith, R. Peer review: a flawed process at the heart of science and journals. J. R. Soc. Med. 99, 178–182 (2006).
175. Kowalczuk, M. K. et al. Retrospective analysis of the quality of reports by author-suggested and non-author-suggested reviewers in journals operating on open or single-blind peer review models. BMJ Open 5, e008707 (2015).
176. van Rooyen, S., Godlee, F., Evans, S., Black, N. & Smith, R. Effect of open peer review on quality of reviews and on reviewers' recommendations: a randomised trial. BMJ 318, 23 (1999).
177. Rennie, D. Let's make peer review scientific. Nat. News 535, 31 (2016).
178. Publons. Available at: http://prw.publons.com/. (Accessed: 12th October 2016)
179. Van Noorden, R. The scientists who get credit for peer review. Nat. News (2014). doi:10.1038/nature.2014.16102
180. Fanelli, D. Why Growing Retractions Are (Mostly) a Good Sign. PLoS Med. 10, e1001563 (2013).
181. PubPeer - Search publications and join the conversation. Available at: https://pubpeer.com/. (Accessed: 12th October 2016)
182. Retraction Watch - Tracking retractions as a window into the scientific process. Available at: http://retractionwatch.com. (Accessed: 12th October 2016)
183. Pfeifer, M. P. & Snodgrass, G. L. The continued use of retracted, invalid scientific literature. JAMA 263, 1420–1423 (1990).
184. Claxton, L. D. Scientific authorship. Part 1. A window into scientific fraud? Mutat. Res. 589, 17–30 (2005).
185. Evidence Live: Better Evidence for Better Healthcare. Available at: http://evidencelive.org/. (Accessed: 12th October 2016)
186. OpenTrials. Available at: http://opentrials.net/. (Accessed: 12th October 2016)
187. The James Lind Alliance. Available at: http://www.jla.nihr.ac.uk/. (Accessed: 12th October 2016)
188. Shamseer, L. et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ 349, g7647 (2015).
189. MECIR reporting standards now finalized. Cochrane Community. Available at: http://community.cochrane.org/news/tags/authors/mecir-reporting-standards-now-finalized. (Accessed: 10th July 2015)
190. Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G. & The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 6, e1000097 (2009).
191. Elliott, J. H. et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med. 11, e1001603 (2014).
192. Ranstam, J. et al. Fraud in medical research: an international survey of biostatisticians. ISCB Subcommittee on Fraud. Control. Clin. Trials 21, 415–427 (2000).
193. Hein, J., Zobrist, R., Konrad, C. & Schuepfer, G. Scientific fraud in 20 falsified anesthesia papers: detection using financial auditing methods. Anaesthesist 61, 543–549 (2012).
194. Pandit, J. J. On statistical methods to test if sampling in trials is genuinely random. Anaesthesia 67, 456–462 (2012).
195. Bonnet, F. & Samama, C. M. Les cas de fraude dans les publications: de Darsee à Poldermans [Cases of fraud in publications: from Darsee to Poldermans]. Presse Médicale 41, 816–820 (2012).
196. Ioannidis, J. P. A. Evidence-based medicine has been hijacked: a report to David Sackett. J. Clin. Epidemiol. 73, 82–86 (2016).
197. The PLoS Medicine Editors. An Unbiased Scientific Record Should Be Everyone's Agenda. PLoS Med. 6 (2009).
198. Carlisle, J. et al. What should the Cochrane Collaboration do about research that is, or might be, fraudulent? Cochrane Database Syst. Rev. ED000060 (2013). doi:10.1002/14651858.ED000060
