Data Sharing and Reanalysis of Randomized Controlled Trials In
RESEARCH

Data sharing and reanalysis of randomized controlled trials in leading biomedical journals with a full data sharing policy: survey of studies published in The BMJ and PLOS Medicine

Florian Naudet,1 Charlotte Sakarovitch,2 Perrine Janiaud,1 Ioana Cristea,1,3 Daniele Fanelli,1,4 David Moher,1,5 John P A Ioannidis1,6

1 Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, USA
2 Quantitative Sciences Unit, Division of Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, CA, USA
3 Department of Clinical Psychology and Psychotherapy, Babes-Bolyai University, Romania
4 Department of Methodology, London School of Economics and Political Science, UK
5 Centre for Journalology, Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada
6 Departments of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics, Stanford University, Stanford, California, USA

Correspondence to: J P A Ioannidis [email protected]
Additional material is published online only. To view please visit the journal online.
Cite this as: BMJ 2018;360:k400
http://dx.doi.org/10.1136/bmj.k400
Accepted: 8 January 2018

ABSTRACT

OBJECTIVES
To explore the effectiveness of data sharing by randomized controlled trials (RCTs) in journals with a full data sharing policy and to describe potential difficulties encountered in the process of performing reanalyses of the primary outcomes.

DESIGN
Survey of published RCTs.

SETTING
PubMed/Medline.

ELIGIBILITY CRITERIA
RCTs that had been submitted and published by The BMJ and PLOS Medicine subsequent to the adoption of data sharing policies by these journals.

MAIN OUTCOME MEASURE
The primary outcome was data availability, defined as the eventual receipt of complete data with clear labelling. Primary outcomes were reanalyzed to assess to what extent studies were reproduced. Difficulties encountered were described.

RESULTS
37 RCTs (21 from The BMJ and 16 from PLOS Medicine) published between 2013 and 2016 met the eligibility criteria. 17/37 (46%, 95% confidence interval 30% to 62%) satisfied the definition of data availability and 14 of the 17 (82%, 59% to 94%) were fully reproduced on all their primary outcomes. Of the remaining RCTs, errors were identified in two but reached similar conclusions and one paper did not provide enough information in the Methods section to reproduce the analyses. Difficulties identified included problems in contacting corresponding authors and lack of resources on their behalf in preparing the datasets. In addition, there was a range of different data sharing practices across study groups.

CONCLUSIONS
Data availability was not optimal in two journals with a strong policy for data sharing. When investigators shared data, most reanalyses largely reproduced the original results. Data sharing practices need to become more widespread and streamlined to allow meaningful reanalyses and reuse of data.

TRIAL REGISTRATION
Open Science Framework osf.io/c4zke.

WHAT IS ALREADY KNOWN ON THIS TOPIC
The International Committee of Medical Journal Editors requires that a data sharing plan be included in each paper (and prespecified in study registration)
Two leading general medical journals, The BMJ and PLOS Medicine, already have a stronger policy, expressly requiring data sharing as a condition for publication of randomized controlled trials (RCTs)
Only a small number of reanalyses of RCTs has been published to date; of these, a minority was conducted by entirely independent authors

WHAT THIS STUDY ADDS
Data availability was not optimal in two journals with a strong policy for data sharing, but the 46% data sharing rate observed was higher than elsewhere in the biomedical literature
When reanalyses are possible, these mostly yield similar results to the original analysis; however, these reanalyses used data at a mature analytical stage
Problems in contacting corresponding authors, lack of resources in preparing the datasets, and important heterogeneity in data sharing practices are barriers to overcome

Introduction

Patients, medical practitioners, and health policy analysts are more confident when the results and conclusions of scientific studies can be verified. For a long time, however, verifying the results of clinical trials was not possible, because of the unavailability of the data on which the conclusions were based. Data sharing practices are expected to overcome this problem and to allow for optimal use of data collected in trials: the value of medical research that can inform clinical practice increases with greater transparency and the opportunity for external researchers to reanalyze, synthesize, or build on previous data.

In 2016 the International Committee of Medical Journal Editors (ICMJE) published an editorial1 stating that “it is an ethical obligation to responsibly share data generated by interventional clinical trials because participants have put themselves at risk.” The ICMJE proposed to require that deidentified individual patient data (IPD) are made publicly available no later than six months after publication of the trial results. This proposal triggered debate.2-7 In June 2017, the ICMJE stepped back from its proposal. The new requirements do not mandate data sharing but only a data sharing plan to be included in each paper (and prespecified in study registration).8

Because of this trend toward a new norm where data sharing for randomized controlled trials (RCTs) becomes a standard, it seems important to assess how accessible the data are in journals with existing data sharing policies. Two leading general medical journals, The BMJ9 10 and PLOS Medicine,11 already have a policy expressly requiring data sharing as a condition for publication of clinical trials: data sharing became a requirement after January 2013 for RCTs on drugs and devices9 and July 2015 for all therapeutics10 at The BMJ, and after March 2014 for all types of interventions at PLOS Medicine.

We explored the effectiveness of RCT data sharing in both journals in terms of data availability, feasibility, and accuracy of reanalyses and describe potential difficulties encountered in the process of performing reanalyses of the primary outcomes. We focused on RCTs because they are considered to represent high quality evidence and because availability of the data is crucial in the evaluation of health interventions. RCTs represent the most firmly codified methodology, which also allows data to be most easily analyzed. Moreover, RCTs have been the focus of transparency and data sharing initiatives,12 owing to the importance of primary data availability in the evaluation of therapeutics (eg, for IPD meta-analyses).

Methods

The methods were specified in advance. They were documented in a protocol submitted for review on 12 November 2016 and subsequently registered with the Open Science Framework on 15 November 2016 (https://osf.io/u6hcv/register/565fb3678c5e4a66b5582f67).

Eligibility criteria

We surveyed publications of RCTs, including cluster trials and crossover studies, non-inferiority designs, and superiority designs, that had been submitted and published by The BMJ and PLOS Medicine subsequent to the adoption of data sharing policies by these journals.

Search strategy and study selection

We identified eligible studies from PubMed/Medline. For The BMJ we used the search strategy: “BMJ”[jour] AND (“2013/01/01”[PDAT]: “2017/01/01”[PDAT]) AND Randomized Controlled Trial[ptyp]. For PLOS Medicine we used: “PLoS Med”[jour] AND (“2014/03/01”[PDAT]: “2017/01/01”[PDAT]) AND Randomized Controlled Trial[ptyp].

Two reviewers (FN and PJ) performed the eligibility

we sent a standardized email (https://osf.io/h9cas/). Initial emails were sent from a professional email address ([email protected]), and three additional reminders were sent to each author two or three weeks apart, in case of non-response.

Data availability

Our primary outcome was data availability, defined as the eventual receipt of data presented with sufficient information to reproduce the analysis of the primary outcomes of the included RCTs (ie, complete data with clear labelling). Additional information was collected on type of data sharing (request by email, request using a specific website, request using a specific register, available on a public register, other), time for collecting the data (in days, time between first attempt to success of getting a database), deidentification of data (concerning name, birthdate, and address), type of data shared (from case report forms to directly analyzable datasets),13 sharing of analysis code, and reasons for non-availability in case data were not shared.

Reproducibility

When data were available, a single researcher (FN) carried out a reanalysis of the trial. For each study, analyses were repeated exactly as described in the published report of the study. Whenever insufficient details about the analysis were provided in the study report, we sought clarifications from the trial investigators. We considered only analyses concerning the primary outcome (or outcomes, if multiple primary outcomes existed) of each trial. Any discrepancy between results obtained in the reanalysis and those reported in the publication was examined in consultation with a statistician (CS). This examination aimed to determine if, based on both quantitative (effect size, P values) and qualitative (clinical judgment) consideration, the discrepant results of the