TOWARDS A CULTURE OF OPEN SCIENCE AND DATA SHARING IN HEALTH AND MEDICAL RESEARCH

Anisa Rowhani-Farid Bachelor of Medical Science, Master of Public Health

Primary Supervisor: Professor Adrian G. Barnett

Associate Supervisors: Professor Nicholas Graves and Dr Katie Page

Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Public Health and Social Work
Faculty of Health
Queensland University of Technology

2018

Keywords

Badges; Code sharing; Culture; Data sharing; Health and medical research; Incentives; Institutional change; Journals; Knowledge; Meta-research; Meta-science; Open science; Paradigm shift; Replicability; Reproducibility; Research integrity; Research waste; Rewards; Scientific policy; Scientific truth; Transparency; Verification


Abstract

Background: The foundation of health and medical research is data. Sharing data and code are important components of reproducible research. Data sharing facilitates the progress of research and strengthens science. Data sharing in research is widely discussed in the literature; however, there are seemingly no evidence-based incentives that reward data sharing, nor studies that demonstrate the effectiveness of data sharing policies at increasing data sharing. Simple incentives might provide the change needed to increase data sharing in health and medical research.

Methods: Four separate studies were conducted; however, data are still being collected for Study 4.

Study 1: A systematic review of the health and medical research literature was used to uncover any evidence-based incentives, with empirical randomised or pre- and post-intervention data, that examined data sharing rates. We were also interested in quantifying and classifying: the number of opinion pieces on the importance of incentives, the number of observational studies that analysed data sharing rates and practices, and the number of strategies aimed at increasing data sharing rates.

Study 2: An observational study of the BMJ research archive, using 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews. The aim was to quantify data sharing trends and data sharing policy compliance at the BMJ by analysing the rate of data sharing practices, and to investigate attitudes towards, and barriers to, data sharing. The primary outcome measures were the percentages of research articles that indicated the availability of their raw datasets in their data sharing statements, and those that easily made their datasets available upon request.

Study 3: The reproducibility policy at the journal Biostatistics rewards articles with badges for data and code sharing. This study investigated the effect of badges on increasing reproducible research, specifically data and code sharing, at Biostatistics compared with Statistics in Medicine. The setting of this observational study was the Biostatistics and Statistics in Medicine (control journal) online research archives. The data were 240 randomly sampled articles from 2006 to 2013 (30 articles per year) per journal, a total sample of 480 articles. Data analyses included: plotting the probability of data and code sharing by article submission date, and Bayesian logistic regression modelling to test for a difference in the probability of making data and code available after the introduction of badges at Biostatistics.
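
As a sketch only (the full model specification is given in Chapter 5), a Bayesian logistic regression of this kind can be written with a single indicator for the post-badge period, an assumption made here purely for illustration:

$$y_i \sim \mathrm{Bernoulli}(p_i), \qquad \mathrm{logit}(p_i) = \beta_0 + \beta_1\,\mathrm{badge}_i,$$

where $y_i$ indicates whether article $i$ made its data (or code) available, $\mathrm{badge}_i$ is 1 for articles submitted after badges were introduced and 0 otherwise, and the posterior distribution of $\beta_1$ is used to estimate the change in sharing on both relative and absolute scales. The fitted model may additionally adjust for journal and time.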

Study 4: A randomised controlled trial examining whether Open Data Badges increase data sharing rates among health and medical researchers who publish in BMJ Open. This study is a parallel-group randomised controlled trial with two arms, control and intervention, with 83 research articles per arm, for a total of 166 research articles.
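
For illustration only, the sketch below shows how a per-arm sample size of this order can be derived for a comparison of two proportions; the baseline and badge-arm sharing rates are placeholder assumptions, not the trial's actual design parameters (those are given in Chapter 6).

    # Illustrative sample size calculation for a two-arm trial comparing proportions.
    # The sharing rates below are placeholder assumptions, not the trial's values.
    from statsmodels.stats.power import NormalIndPower
    from statsmodels.stats.proportion import proportion_effectsize

    p_control = 0.07  # assumed data sharing rate without badges (illustrative)
    p_badge = 0.17    # assumed data sharing rate with Open Data Badges (illustrative)

    effect = proportion_effectsize(p_badge, p_control)  # Cohen's h for two proportions
    n_per_arm = NormalIndPower().solve_power(
        effect_size=effect, alpha=0.05, power=0.8, ratio=1.0,
        alternative='two-sided')
    print(round(n_per_arm))  # roughly 80 articles per arm under these assumptions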

Results: Study 1 showed that only one incentive (Open Data Badges) has been tested for its effect on data sharing rates in health and medical research. The number of opinion pieces (n = 85) outweighed the number of articles testing strategies (n = 76), and the number of observational studies exceeded them both (n = 106).

Study 2 found that only 7 of 157 research articles easily shared their datasets: 4.5% (95% confidence interval: 1.8% to 9.0%). For the 21 clinical trials bound by the BMJ data sharing policy, the percent shared was 24% (95% CI: 8% to 47%).
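
The reported interval is consistent with an exact (Clopper-Pearson) binomial confidence interval; a quick check, assuming that method was used:

    # Exact (Clopper-Pearson) 95% confidence interval for 7 sharers out of 157 articles.
    from statsmodels.stats.proportion import proportion_confint

    shared, total = 7, 157
    low, high = proportion_confint(shared, total, alpha=0.05, method='beta')
    print(f"{shared / total:.1%} (95% CI: {low:.1%} to {high:.1%})")
    # -> 4.5% (95% CI: 1.8% to 9.0%), matching the values above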

Study 3 concluded that the probability of data sharing was higher at Biostatistics than at the control journal, but the probability of code sharing was comparable for both journals. The probability of data sharing increased by 3.9 times (95% credible interval: 1.5 to 8.4 times; probability that sharing increased: 0.998) after badges were introduced at Biostatistics. On an absolute scale, this difference was only a 7.6% increase in data sharing (95% credible interval: 2% to 15%; probability: 0.998). Badges did not have an impact on code sharing at the journal (mean increase: 1 time; 95% credible interval: 0.03 to 3.58 times; probability that sharing increased: 0.378).
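
Taking the point estimates at face value, the relative and absolute results are mutually consistent because the baseline rate was low: with $p_1 \approx 3.9\,p_0$ and $p_1 - p_0 \approx 0.076$,

$$p_0 \approx 0.076 / 2.9 \approx 2.6\%, \qquad p_1 \approx 10.2\%,$$

so a near-fourfold relative increase still corresponds to a modest absolute change.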

Conclusions: Given that data are the foundation of evidence-based health and medical research, it is paradoxical that there is only one evidence-based incentive to reward data sharing. More well-designed studies are needed to increase the currently low rates of data sharing.

Despite the BMJ’s strong data sharing policy, sharing rates are low. Possible explanations for the low data sharing rates include: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; that our email requests ended up in researchers’ spam folders; and that researchers are not rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.

The effect of badges at Biostatistics was a 7.6% increase in the data sharing rate, roughly one-fifth of the badge effect on data sharing reported at Psychological Science (a 37.9% increase). Although badges at Biostatistics did not increase code sharing and were associated with only a moderate effect on data sharing, they are an interesting step that journals are taking to incentivise and promote reproducible research.


Research output from this PhD

Peer-reviewed publications

Rowhani-Farid A, Allen M, Barnett AG: What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review 2017, 2(1):4. doi.org/10.1186/s41073-017-0028-9

Rowhani-Farid A, Barnett AG: Has open data arrived at the British Medical Journal (BMJ)? An observational study. BMJ Open 2016, 6(10). doi.org/10.1136/bmjopen-2016-011784

Rowhani-Farid A and Barnett AG. Badges for sharing data and code at Biostatistics: an observational study [version 2; referees: 2 approved]. F1000Research 2018, 7:90 (doi: 10.12688/f1000research.13477.2)

Data publications

Rowhani-Farid A, Allen M, Barnett AG: What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review 2017, 2(1):4. https://osf.io/dspu8/

Rowhani-Farid A, Barnett AG: Has open data arrived at the British Medical Journal (BMJ)? An observational study. BMJ Open 2016, 6(10). doi:10.5061/dryad.q8s5k.

Rowhani-Farid A and Barnett AG. Badges for sharing data and code at Biostatistics: an observational study [version 2; referees: 2 approved]. F1000Research 2018, 7:90 https://doi.org/10.6084/m9.figshare.5687548.v2

Abstract publication

Rowhani-Farid A, Barnett AG: Has open data arrived at the British Medical Journal (BMJ)? An observational study, 5th World Conference on Research Integrity, May 2017, Amsterdam, the Netherlands, http://www.wcri2017.org/images/Abstract-Book-5th-WCRI-2017.pdf

Conference presentations

• e-research Australasia conference, October 2015, Brisbane, Australia – poster presentation
• IHBI Inspires conference, November 2015, Brisbane, Australia – poster presentation
• IHBI Inspires conference, November 2016, Brisbane, Australia – oral presentation, People’s Choice Winner for best oral presentation


• 5th World Conference on Research Integrity, May 2017, Amsterdam, the Netherlands – oral presentation at the Conference and at the Doctoral Forum
• IHBI Inspires conference, September 2017, Brisbane, Australia – poster presentation

Other presentations

• Guest lecture for a graduate course, Research Support Services for Academic and Special Libraries, offered by the Library and Information Education program in the School of Information Systems at QUT in October 2016
• Guest speaker for QUT International Open Data Week 2016 event – Open Data: It's good science
• Guest speaker for QUT Open Access Week 2017 event
• Guest blogger for QUT Love Data Week 2018, http://blogs.qut.edu.au/library/2018/02/12/love-data-week-2018/
• Guest blogger for F1000blog in 2018: https://blog.f1000.com/2018/03/19/building-trust-badging/
• National finalist for FameLab: https://www.britishcouncil.org.au/famelab; https://australiascience.tv/vod/anisa-rowhani-farid-towards-a-culture-of-open-science-and-data-sharing-in-health-and-medical-research/


Table of Contents

Keywords
Abstract
Research output from this PhD
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
1.1 Background
1.2 Context
1.3 Purposes
1.4 Significance, Scope and Definitions
Chapter 2: Literature Review
2.1 An historical, philosophical, and epistemological perspective
2.2 Why share data?
2.3 Negative impacts of data withholding
2.4 Why to not share data
2.5 Barriers to sharing
2.6 The requirement of sharing data
2.7 Summary and Implications
2.8 A personal experience of accessing data
2.9 Thesis Outline
Chapter 3: What incentives increase data sharing in health and medical research? A systematic review
3.1 Overview
3.2 Abstract
3.3 Key words
3.4 Rationale
3.5 Objective
3.6 Introduction
3.7 Research design and methodology
3.8 Results
3.9 Discussion
3.10 Limitations
3.11 Conclusion
3.12 List of abbreviations
3.13 Declarations
Chapter 4: Has open data arrived at the British Medical Journal (BMJ)? An observational study
4.1 Overview
4.2 Abstract
4.3 Article summary
4.4 Introduction
4.5 Methods
4.6 Results
4.7 Discussion
4.8 Conclusion
4.9 Other information
4.10 Copyright, Open Access, and permission to reuse
4.11 Funding
4.12 Contributors
4.13 Ethical approval
4.14 Competing interests
4.15 Transparency
4.16 Open access
4.17 Data sharing
Chapter 5: Badges for data and code sharing at Biostatistics: an observational study
5.1 Overview
5.2 Abstract
5.3 Key words
5.4 Introduction
5.5 Methods
5.6 Results
5.7 Discussion
5.8 Conclusion
5.9 Data availability
5.10 Consent
5.11 Competing interests
5.12 Grant information
5.13 Acknowledgements
Chapter 6: Did awarding badges increase data sharing at British Medical Journal Open (BMJ Open)? A randomised controlled trial
6.1 Overview
6.2 Funding
6.3 Aim
6.4 Significance
6.5 Hypothesis
6.6 Background
6.7 Outcome measures
6.8 Research design and methods
6.9 Methods
6.10 Limitations
6.11 Statistical Analysis
Chapter 7: Discussion
7.1 Science and society: reflections at the nexus
7.2 Evidence-based science?
7.3 Badges?
7.4 Policies?
7.5 Delays and badges
7.6 Kitemarks as badges
7.7 The evolution and costs of data sharing policies
7.8 Alternative perspectives
7.9 Research translation
7.10 Ideas for future studies
7.11 Reflections
Chapter 8: Conclusions
8.1 The cultural shift towards more open data
8.2 The way forward
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
Bibliography


List of Figures

Figure 1-1: Conceptual framework of this research (image derived primarily from the work of Ioannidis et al. 2015) [23]
Figure 2-1: The current scientific paradigm shift
Figure 3-1: PRISMA [1] flow chart: systematic selection of studies that have tested incentives to increase data sharing rates in health and medical research from the literature
Figure 4-1: Flow chart of the randomly sampled BMJ research articles showing the availability of data
Figure 4-2: Flow chart of the randomly sampled BMJ research articles bound by the BMJ data sharing policy (randomised clinical trials), showing the availability of data
Figure 4-3: Summary of data availability and actual data received for BMJ research articles grouped by year and in relation to data sharing policy changes. BMJ, British Medical Journal; RCT, randomised controlled trial
Figure 5-1a: Flow chart of data availability. Randomly selected Biostatistics articles from 2006 to 2013
Figure 5-1b: Flow chart of data availability. Randomly selected Statistics in Medicine articles from 2006 to 2013
Figure 5-2a: Plot of data availability over time by submission date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 5-2b: Plot of data availability over time by publication date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 5-2c: Plot of data availability by submission date assuming the now broken links were working at the time. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 5-3a: Flow chart of code availability. Randomly selected Biostatistics articles from 2006 to 2013
Figure 5-3b: Flow chart of code availability. Randomly selected Statistics in Medicine articles from 2006 to 2013
Figure 5-4a: Plot of code availability over time by submission date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 5-4b: Plot of code availability over time by publication date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 5-4c: Plot of code availability by submission date assuming the now broken links were working at the time. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009
Figure 7-1: Selected tweets arising from this PhD’s contribution to the scientific discourse


List of Tables

Table 3-1: Categorisation of all screened articles into sub-fields and article type
Table 3-2: Categorisation of the 76 data sharing strategy articles
Table 4-1: Numbers of various data sharing statements for randomly selected BMJ research articles (2009–2015) that indicated the availability of their raw data


List of Abbreviations

AGB – Adrian Gerard Barnett

ARC – Australian Research Council

ARF – Anisa Rowhani-Farid

BMJ – British Medical Journal

COS – Center for Open Science

IHBI – Institute of Health and Biomedical Innovation

NHMRC – National Health and Medical Research Council

PhD – Doctor of Philosophy

PLOS – Public Library of Science

QUT – Queensland University of Technology

WHO – World Health Organization


Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signature:

Date: 29/June/2018


Acknowledgements

Serendipity. The mystery behind how this PhD was conceived is not lost on me. The very day of giving up hope, of losing faith in the system, of burying my dreams and aspirations, was the very day I was beckoned to a new horizon. “Anisa janam (meaning my dear in Farsi, my mother tongue), do not give up yet, I believe you will get accepted one day, there is someone out there who will like your idea, you must put forth effort in life to receive confirmations, knock on many doors, and one will open”, were the reassuring words my dear mother repeated on the phone while I received rejection letter upon rejection letter. That very afternoon, I was encouraged to give it my final shot. I even told myself: “this is my last effort”. To my surprise, a gentleman finally answered the phone and he was confused as to why I was calling him as he was no longer the liaison academic for future higher degree research students! Clearly the website had not been updated and this was the reason behind the lack of response to my emails and phone calls. With a glimmer of hope, I sent the new liaison academic my proposal, the paper I was about to tear up into shreds just a few minutes earlier. Shortly after, a researcher, a humble gentleman, whose name was Adrian, wrote me some kind and encouraging words, fanning my slight glow of hope. He invited me to meet him in his office the next week; a figurative and literal door opened. My mother was right.

On the seventh floor of the Institute of Health and Biomedical Innovation, a PhD was conceived. “May I be your supervisor?”, were the words he uttered. I thought to myself that I should be asking him if I could be his student, not the other way around! With some embarrassment and shock, I responded: “I would be delighted to, if you accept me as your student?”, he said, “yes”. “Yes”, a word I had not heard in months.

After the meeting, some three years ago, I sat under the globes outside the research institute (pictured), anticipating the beginning of a journey that is now slowly coming to a close. Writing this thesis has been a reflective exercise of the learning I have gained thus far, and particularly of the close accompaniment and training I have received from my professor to be a scientist, a meta-scientist. Though a world-renowned professor, he always treated me as a friend, an equal. I felt as though we were constantly walking shoulder to shoulder in the search for scientific truth. My achievements are a testament to his faith in my abilities; I am grateful and privileged to have landed as his student.

Not long ago, I went back home to Vanuatu to visit my parents. On my return journey, while walking on the runway towards the airplane, I read the words “Spirit of Vanuatu” written on the side of the aircraft. The tropical raindrops fell upon my cheeks and blended with my tears as I pondered on the spirit of that land that holds my roots, which nurtured my very being. "It takes a village to raise a child", were the words of an African proverb that reverberated in my heart as I traced my footsteps as a child, an adolescent, a youth, and as an adult, footsteps walked in the company of friends, dearest friends, each of whom had a role to play in filling me with life, in cheering me on, and in lifting me up.

With a clarified mind and a keenness of spirit, I present to you this doctorate of philosophy, a body of work which was accomplished through the guidance and support of family, teachers, and friends. From my very first educators, my parents, and my siblings, who taught me about the fundamental principles of life, to my teachers throughout my childhood, youth, and adulthood, who nurtured within me a thirst for seeking knowledge, to my friends who laughed with me, and who challenged me, all of whom always had absolute faith that I could achieve such a task. A task which seemed insurmountable some three years ago, now feels like an achievable milestone, welcoming me, beckoning me.

I thank you all for holding my hand on this journey, this bittersweet journey, filled with its paradoxes; its sorrows and joys, and its crises and victories. Such are the dialectics that have propelled me forward, motivated by the desire of contributing towards the betterment of humankind.

“The virtues of humanity are many, but science is the most noble of them all.” – Bahá'í Writings


Chapter 1: Introduction

This chapter outlines the background (section 1.1) and context (section 1.2) of the research, and its purposes (section 1.3). Section 1.4 describes the significance and scope of this research and provides definitions of terms used.

1.1 BACKGROUND

1.1.1 A personal anecdote: an idea cultured in the laboratory

My desire to pursue research in this field began when I was a junior bench scientist some 10 years ago, conducting anti-malarial drug research. I was confronted with the commercial aspect of scientific research. I learned about the institutional arrangements between industry, academia, and the community, or “consumers”. I also learned about how intellectual property, patents, and funding arrangements play a critical yet limiting role in contributing to the advancement of scientific knowledge.

It became clear to me that scientific research is driven and, more often than not, pressured by the funding available from government and industry, and that these relationships are primarily based on a conception of scientific knowledge as a commodity, generated through research and distributed at a cost to other researchers and, most importantly, to populations that might need open access to that knowledge.

As I read more, I wondered what happened to all the public money that was spent on health and medical research. I read Chalmers and Glasziou’s (2009) paper on research waste, as well as the series published in The Lancet in 2014 called ‘Research: increasing value, reducing waste’. I learned that around 85% of the world’s spending on health and medical research is wasted each year, a contributing factor being that the findings of medical studies cannot be reproduced by other researchers, so seemingly successful medical breakthroughs are unverifiable (1, 2). This reproducibility crisis in health and medical research made me think about the way in which scientific knowledge progresses. I was fascinated by the paper written by John Ioannidis in 2005, which concluded through simulations that most published findings in the scientific discourse are false and misleading (3, 4).


If most of what is claimed in the scientific literature is false, and if scientists are adopting malpractices because of the pressure to commercialise so-called ‘medical breakthroughs’, then how deep will the cultural change have to be for scientists to conduct high-quality research with integrity, and share all their findings, positive or negative? This question has motivated my doctorate of philosophy.

1.1.2 Openness in scientific research

Science is fundamentally open in its nature (5). From one perspective, the generation of scientific knowledge can be regarded as an educational process, from mathematical theorems postulated centuries ago to the development of new complex biomedical devices. It is a process of various branches of learning, not in conflict with one another, but rather whole, complex, and extensive in range. The scientific educational enterprise is one which involves the whole of humanity – from a teenager learning about the process of osmosis, to an amateur scientist using a rain gauge, to an experienced researcher in a laboratory developing a drug.

The concept of sharing scientific knowledge is intrinsic to science, none more so than to health and medical science (5, 6). Louis Pasteur’s contribution to health science in 1859, for example, that diseases came from microorganisms and that bacteria could be killed by heat and disinfectant, led doctors to wash their hands and sterilise their instruments, thereby saving millions of lives (7). The learning generated through health and medical science solves problems, cures diseases, and improves functions and processes.

Rapid technological advances and rising consciousness about the purpose and vision of science have brought science to a crossroads (8). Open science is a movement that aims to contribute to humanity’s progress by promoting the sharing of scientific ideas, concepts, data, code, methods, and results with not only researchers, but also consumers, industry, and society at large (9, 10). At the heart of open science is the principle that science belongs to humanity (11). The foundation of health and medical research is data: its collection, analysis, publication, reanalysis, critique, and reuse (12). Open science is the practice of making everything in the discovery process fully and openly available, creating transparency and reproducibility, increasing efficiency and driving further discovery by allowing new knowledge to be generated in the context of the full knowledge of earlier discoveries (8). Practising open science commits researchers to revealing and sharing the entirety of their results, data, methodologies, code, and models (13). Accordingly, there are different levels of “data” for a research study: the raw or primary data, which can be qualitative or quantitative; processed data or summary statistics, which are derived from primary data; and meta-data, which are “data about data”, and contain information about a dataset (5). There is also the study protocol (and/or methods) and the code used to analyse the raw data. The focus of my research is on raw or primary data but I do discuss meta-data and code because there is some overlap in ideas and policy. I chose to focus on the sharing of raw data as it provides the evidence behind one’s research findings.

There is a global movement to find the hidden results of research, one of the successes of which is the requirement to register clinical trials (14). The next logical step is a global effort to find hidden data. A study in 2015 estimated that 200,000 to 235,000 invisible or hidden datasets (using a broad definition of data that included genetic, chemical, biochemical, physiological and questionnaire/survey) were generated from U.S. National Institutes of Health-funded research published in 2011 alone (15). Further, a study conducted by Iqbal et al. (2016) assessed the status of transparency in the scientific literature by sampling 441 biomedical journal articles published in 2000–2014, none of which made all raw data directly available (16).

In 2012, the Royal Society defined open science as “open data (available, intelligible, assessable and useable data) combined with open access to scientific publications and effective communication of their contents”. Open data is thus an important element of the open science movement (5). Open data, or “data sharing”, facilitates transforming data into knowledge, and ensures that everyone has the opportunity to contribute to advancing knowledge (17, 18). Open data promotes better science and is ultimately of greater benefit to society, although it is understandable that not all data can be shared, due to ethical or legal reasons (12).

Data sharing has been defined by Van Tuyl and Whitmire (2016) as data that is (19):

• Discoverable: available via a direct and persistent link

• Accessible: shared via an open platform with minimal barriers to access

• Transparent: documented and described sufficiently

• Actionable: useable in analytical platforms with minimal reprocessing or reformatting (19).


The concepts of open science, open data, data sharing, and reproducibility are proliferating in the scientific discourse among researchers, funders, publishers, and universities. The recognition of the importance of data sharing by such key players is also demonstrated through the development of data sharing policies. However, barriers to openness remain, such as technical, motivational, economic, political, legal and ethical barriers (12, 20).

The focus of this research is to contribute to a culture of open science and data sharing in the health and medical research community, in the current climate of rapid scientific discovery.

In this thesis, the word ‘culture’ is defined as a set of beliefs, concepts, ideas, behaviours, thoughts, words and actions that shape a group of people and institutions, and which creates a common vision.

1.2 CONTEXT

This research is concerned with the way in which science progresses, at the heart of which is scientific research. As such, I will include in this thesis a brief analysis of the philosophy and epistemology of science to provide an historical context for the direction of the research, which is commonly referred to as ‘meta-research’ in the scientific discourse, in other words – ‘research’ on research. Ioannidis et al. (2015) describe meta-research as taking a bird’s eye view of science and they define it as ‘…an evolving scientific discipline that aims to evaluate and improve research practices.’ They divide meta-research into five major areas of: methods, reporting, reproducibility, evaluation and incentives (21).

This research focuses on research reproducibility and research incentives. The conceptual framework of this research is in Figure 1-1.


Figure 1-1: Conceptual framework of this research (image derived primarily from the work of Ioannidis et al. 2015) [23]

Research reproducibility is concerned with verifying scientific research by promoting the sharing of raw data, code, and methods, and with ensuring that research is repeatable and reproducible by others to validate findings and progress research further and faster (21). There has been some recent discussion of the nomenclature concerning the terms ‘reproducibility’ and ‘replicability’. Generally, ‘reproducibility’ refers to the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator, and ‘replicability’ refers to the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected (22, 23). However, Goodman et al. (2016) have provided some insight into specific terminology to assist with dispelling conceptual and operational confusion (22). As such, the new terms ‘methods reproducibility’, ‘results reproducibility’, and ‘inferential reproducibility’ have been added to the lexicon of reproducibility (22):


• Methods reproducibility: refers to the provision of enough detail about study procedures and data so the same procedures could, in theory or actuality, be exactly repeated

• Results reproducibility: (previously described as replicability) refers to obtaining the same results of an independent study whose procedures are as closely matched to the original experiment as possible

• Inferential reproducibility: refers to the drawing of qualitatively similar conclusions from either an independent replication of a study or a reanalysis of the original study

In this thesis, the words replicability and reproducibility refer to their traditional definitions, to use common terminology.

Research incentives are concerned with rewarding scientific research. Incentives work best when they increase the value of research, reduce research waste, and promote positive institutional change (21). My research question addresses both research reproducibility and research incentives by aiming to provide quantitative evidence on why data sharing is not common in the health and medical research community, and to provide potential solutions to increase the value of research and reduce waste through the development of evidence-based incentives that increase data sharing. Accordingly, my research question is: what interventions increase data sharing in the health and medical research community?

1.3 PURPOSES

A key premise of, and assumption behind, my research is that the adoption of a culture of open science and data sharing increases the value of health and medical research and reduces research waste. My argument is that open science and data sharing are not common practice in health and medical research because there are seemingly no evidence-based incentives for researchers; thus, the over-arching hypothesis of my research is that evidence-based incentives for researchers will increase data sharing rates in health and medical research.

To contribute to the body of knowledge on open science and data sharing, I specifically aim to:

1) systematically study the literature on incentives to share data;


2) quantify data sharing rates and policy compliance at a journal;

3) investigate the effect of an incentive at increasing reproducible research at a journal;

4) trial an incentive that can be adopted by journals to promote the culture of data sharing in health and medical research.

1.4 SIGNIFICANCE, SCOPE AND DEFINITIONS

To promote responsible data sharing, studies that evaluate the effectiveness of requirements, recommendations, and tools are required (24). There has been little evaluation of whether data sharing policies are effective for ensuring that researchers comply with recommendations or for increasing the amount of research carried out (25). Many journals now require authors to share their data with other investigators, either by depositing the data in a public repository or making it freely available upon request. These policies remain largely untested (26, 27). Given that health policy is expected to be evidence-based, it is appropriate that such policies be tested using research, which will provide the high-quality empirical evidence required to help convince one of the key stakeholders – the research community – to change practice.



Chapter 2: Literature Review

This chapter begins with an historical, philosophical and epistemological background (section 2.1) and reviews literature on the following topics: why share data? (section 2.2), which covers the practical reasons for the importance of sharing data; negative impacts of data withholding (section 2.3), which discusses the consequences of hiding data; why not to share data (section 2.4), which touches on the instances when it is best to protect data; barriers to sharing (section 2.5), which outlines the impediments to open data; and the requirement of sharing data (section 2.6), which discusses the main steps being taken to promote open data in the global research community. Section 2.7 highlights the implications from the literature and develops the conceptual framework for my PhD. Section 2.8 includes a personal experience from one of my colleagues and describes the challenges they faced while trying to obtain data for their PhD. Finally, section 2.9 includes an outline of my thesis.

2.1 AN HISTORICAL, PHILOSOPHICAL, AND EPISTEMOLOGICAL PERSPECTIVE

Science is a body of knowledge that allows humanity to understand the laws of the universe and to surpass them. The process of scientific discovery is systematic, rigorous, and dynamic (6). The science of science, or meta-science, or meta-research, uses rigorous methods to examine how scientific practices influence the validity of scientific conclusions (28). Meta-science has its roots in the philosophy of science and the study of scientific methods (28). As such, this chapter involves a brief analysis of science from an historical, philosophical, and epistemological standpoint.

2.1.1 The scientific method

For centuries scientific philosophers have debated the process of scientific discovery and the way in which scientific knowledge comes into existence. The positivist view states that science is a cumulative, progressive process. Karl Popper’s view was that science advances by disproving theories and revising them (falsificationism) (29). Thomas S. Kuhn introduced the concept of revolutionary science – paradigm shifts that occur when there is a scientific crisis. He described how scientific change is not purely a cognitive process but also a social process (30). To resolve the tension between falsificationism and revolutionary science, Imre Lakatos introduced the idea of ‘research programs’ or independent scientific disciplines (31). More recently, contemporary scientific philosophers such as Anderson (2016) and Rolin (2015) have discussed how the time is ripe to introduce collaboration and interdisciplinarity into the epistemology of science (32).

Scientific research progresses through the efforts of researchers worldwide to generate hypotheses, to test them, to falsify them, and to observe their results, and through a process of observation, induction and deduction, and collaboration, to contribute to scientific truth (33). Continuous redefinition of theories due to the generation of new knowledge is essential for the proper functioning of scientific production (33). Data sharing builds on the principle of scientific truth being ‘progressive’ – that discoveries are built upon previous ones – as Isaac Newton once said, “if I have seen further it is by standing on the shoulders of giants” (34). Though he made this statement in the year 1676, the principle is applicable today.

Robert Merton, a sociologist, who studied the conceptual framework of the sociology of science, explored how scientists are shaped and motivated by the values and incentives that the institutions of science create, which then influences the products of scientific activity (35).

Progressing scientific truth then requires transparency of practice and an unconstrained flow of information between all parts of the scientific system (32, 33). When scientists generate a set of principles and fundamental beliefs, these truths guide scientific research and propel it forward. One of these truths is the force of gravity; a recent Public Library of Science (PLOS) Medicine editorial discussed how soon most people will not be able to deny the force of data sharing or the requirement of data sharing in the same way as they cannot deny the force of gravity (36). Data sharing is an essential component of the direction in which science is moving (37).

2.1.2 A scientific paradigm shift: share scientific data

The roots of open science can be traced back to the establishment of the first scientific societies in the 17th century, which represented ‘a revolutionary organisational change from a culture based on secrecy and patronage to one of professional reputation and autonomy’ (13). In the 21st century, the demand for open access to information has become more prevalent than in previous centuries, as part of a cultural trend in which the word ‘open’ is associated with an ever-wider range of endeavours, prefixing: government, culture, archives, research, knowledge, science, source, data, democracy, journalism and more (13). Greater openness may require more than a change in practice; it may involve an evolution of philosophy (13), as not all scientists agree that data sharing is a key part of scientific discovery. Many of the technological problems of data sharing are being addressed, but what is harder to change is the culture of keeping data hidden (38).

There are some examples from history that demonstrate paradigm changes, a famous one being the emergence of Copernican astronomy in 1543, which described the sun as the centre of the solar system, a theory that emerged from the crisis of Ptolemaic astronomy, which held that the Earth was stationary at the centre of the universe (30). A modern example of a paradigm shift is Marshall and Warren’s discovery, awarded the Nobel Prize in 2005, that Helicobacter pylori is responsible for stomach ulcers, as opposed to the common belief blaming ulcers on stress and bad diet (39).

The present scientific community might be on the brink of a paradigm shift from the old closed ways of conducting scientific research into something that serves the needs of our time more efficiently and complements the recent rapid technological advances (40). In a period of transition, we can see remnants of the old ways of doing things and glimmerings of the new. Science as a body of knowledge is going through a period of transition, where signs of a new system are emerging and coming to the fore (40).

Old ways of operation in scientific research and the impact of political, social, and economic forces create challenges that need to be addressed urgently (41). These challenges have created the crisis of science, the signs of which are apparent everywhere: the lack of trust in the scientific process, the inefficiencies of the current publication system, the misuse of science to bring about destruction of the environment, politicised research funding, and the resulting hyper-competitive environment in the research community. These are signs and symptoms of an ineffective system and of the emergence of a newer, more robust system (42). Figure 2-1 demonstrates the current scientific paradigm shift.


Figure 2-1: The current scientific paradigm shift

Analysing the dichotomy between sharing and withholding data from a philosophical point of view may be a means to bring epistemological aspects into the discussion about open science (33).

What I propose is that data sharing and openness in scientific research be considered fundamental elements of the philosophy of how scientific knowledge is generated. Thomas Kuhn introduced the concept of paradigm shifts that come after a scientific crisis – the crisis before us today is the reproducibility crisis, and the paradigm shift is from closed, hidden science to open science and data sharing (30).

2.1.3 Principles and values in data sharing

There are some values embedded within the act of sharing scientific data, which include:

Integrity

The scientific enterprise is built on trust. Society trusts that scientific research results are an honest and accurate reflection of a researcher’s work (43, 44). To deserve trust, health and medical research needs to be open, honest, and transparent (32). When scientific data are replicated by other researchers in another study, it is the expected norm that the original raw data were generated with integrity through high-quality research (45, 46).

Collaboration to seek scientific truth

Data sharing is a way of collaborating in the scientific realm. In 1637 René Descartes suggested that the impact of open science is greatest when it operates as a collaborative process (33, 47):

‘The best minds would be led to contribute to further progress, each one according to his bent and ability, in the necessary experiments, and would communicate to the public whatever they learned, so that one man might begin where another left off; and thus, in the combined lifetimes and labours of many, much more progress would be made by all together than anyone could make by himself.’

David Bohm (2004), in his book, On Dialogue, discussed the process of dialogue – that in dialogue two people are making something in common, creating something new together. This concept forms the foundation of sharing scientific data – creating something new together through the process of scientific communication. Bohm argues that science is supposed to be dedicated to truth and fact (45). If this is the purpose of science, to discover truth, then it should be free of hindrances and the aim should be that of progressing knowledge (45). As such, sharing scientific data should be an integral component of the process of scientific discovery as when researchers share their data with others they are sharing different views of truth, which, together as a collective, creates a more holistic perspective of scientific truth (45).

Bohm then elaborates on the power of the collective – drawing a comparison to laser light. Ordinary light is called ‘incoherent’, meaning that the light waves are not in phase with one another so they do not build up. A laser, on the other hand, produces an intense beam which is coherent and builds up strength as all the light waves are aligned. This beam can do things that ordinary light cannot. We could compare ordinary light to scientific communication that is sporadic, inconsistent and even non-existent. But if scientists were to think and communicate and share with each other in a coherent way, they would generate manifold powers and scientific discovery would take significant leaps forward. Accordingly, the process of sharing scientific data could contribute to the coherence of scientific discovery (45).

The idea of the "fourth paradigm" of scientific research postulates that data-intensive, cross-domain research is the way of the future, and collaborative, distributed networks of researchers will work together to solve scientific research problems (37). As Nielsen (2012) stated in his book, Reinventing Discovery, “we are reinventing discovery, and the result will be a new era of networked science that speeds up discovery, not in one small corner of science, but across all of science” (48).

Creativity

Data sharing promotes creativity by providing opportunities for a wider diversity of perspectives, thereby increasing the talent pool for scientific discovery (41). As Bohm and Peat (1987) stated:

‘…It may turn out that such a form of free exchange of ideas and information is of fundamental relevance for transforming culture and freeing it of destructive misinformation, so that creativity can be liberated' (49).

2.2 WHY SHARE DATA?

In addition to the philosophical ideals, there are a number of practical and measurable reasons to share data.

2.2.1 Advance medical science and improve health

The great advances being made now in health and medical science are the combined result of numerous researchers and their willingness to share the products of their work. Sharing, however, was not always the norm (5, 41). Prior to the appearance of scientific journals in the 17th century, researchers were hesitant to share their findings with others (41). The pace of scientific advancement changed radically with the development of scientific journals in 1665, when Henry Oldenburg of the Royal Society in Britain launched the Philosophical Transactions of the Royal Society, and Denis de Sallo of France established the Journal des sçavans, for the purpose of scientists sharing their findings with others (41, 50, 51).

The potential of data sharing to advance medical science and address human health issues is undeniable. Data sharing facilitates open science within the health and medical research enterprise, improves the development of drugs and devices, and benefits the public’s health (52, 53).

It is particularly important for researchers working on outbreaks to embrace a culture of openness (54). For instance, the Severe Acute Respiratory Syndrome (SARS) disease, which emerged in February 2003, rapidly spread across three continents and resulted in a total of 7,761 cases and 623 deaths (33). The epidemic was controlled only four months after its emergence by a World Health Organization (WHO) coordinated effort based on extensive data sharing (33).

Likewise, the researchers working on the Ebola outbreak have recently committed to work openly in outbreaks to honour the memory of their colleagues who died at the forefront of the Ebola outbreak, and to ensure that no future epidemic is as devastating (54). In 2016, only seven months after the data from the largest Ebola outbreak in history were made public, researchers at the Broad Institute in Cambridge, Massachusetts, sequenced 99 more Ebola genomes (54). The data were from patients at the Kenema Government Hospital in Sierra Leone. How befitting that Kenema means 'translucent, clear like a river stream' or 'open to the public gaze' (54).

Similarly, there are several initiatives in the biomedical research area that practice an open science approach including the Open Source Malaria Project, Structural Genomics Consortium, EbolaClinicalTrials.org, Worldwide Antimalarial Resistance Network (WWARN), Ebola Virus Disease Data Sharing Platform Initiative, Global Initiative on Sharing Avian Influenza Data (GISAID), International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC), NextFlu, virological.org, and GenBank (55, 56).

Recently, Pisani and Botchway (2017) performed an in-depth case study of WWARN. They found that WWARN included standardised data from 103 molecular studies and 186 clinical trials, representing 135,000 individual patients. Analyses of the pooled data have generated new methods and influenced malaria treatment recommendations globally (57).

2.2.2 Reduce research waste

Not only can data sharing save lives, it can also reduce the current large waste in health and medical research. Glasziou and Chalmers (2009) estimated that 85% of worldwide spending on health and medical research is wasted, around USD $170 billion per year (1). Data sharing allows resources to be used more efficiently by: reducing research costs in terms of money, effort, and time; maximising the use of data; correcting errors in analyses; and reducing participant burden (41, 58).

Data sharing optimises the use of public resources in a climate of scientific openness and transparency (12, 59). An example that demonstrates reduced waste from data sharing is a systematic review of the papers that used a large dataset shared by the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) in the United States. This study found that 67 papers resulted from the NMMAPS data including substantive and methodological papers – a considerable research output from one study, further demonstrating the value in sharing data (60). Another high profile example is the data shared from the Framingham Heart Study, which has resulted in over 1,000 publications, with major findings related to heart disease (61).

Interventions to make science less wasteful and more effective could be extremely beneficial to our health and our grasp of truth, and could help scientific research more successfully pursue its goals (62).

2.2.3 Enhance climate of health and medical community

Medical literature portrays only a partial picture of the evidence about clinical strategies, including drugs and devices (63). Data sharing will enrich the climate of the health and medical community as it helps identify and prevent bad science by facilitating a systemic integrity conducive to early detection of error, misconduct and fraud (64). Complete and verifiable scientific data serve as a powerful safeguard against selective reporting (65). This practice enhances public confidence in health and medical research (5).

2.2.4 Ethical responsibility

Data sharing is an ethical duty of researchers to fulfil their obligations to research participants, by fully respecting and maximising their contributions (66). Participants in research studies have a right to know whether the data that they provide will be open or limited for commercial purposes (67). A recent study by Mello et al. (2018) surveyed 771 current and recent trial participants (79% response rate) and found that 93% were very or somewhat likely to allow their own data to be shared with university scientists (68). In Australia, data sharing beyond the original study needs to be specifically mentioned in the Participant Information and Informed Consent form (69). A perspective piece in the New England Journal of Medicine discussed the results of the patient summit on aligning incentives for data sharing (70). Most patients indicated that they had entered into a trial not to advance knowledge, but to obtain what their doctor thought was the best treatment option for them – this raises some questions about how well patients understand consent forms (70). When asked about patient privacy, Sara Riggare, a patient who is also a doctoral researcher, indicated that when one is suffering from a life-threatening disease, privacy is not the main concern; rather, the concern is how many people will die if data are not shared (70). This experience indicates that patients want their data used responsibly (70).

A recent study looked at the ethical and legal challenges for seeking consent to data sharing in dementia research as patients have lost or risk losing legal capacity (71). They found that in most countries, a substitute decision maker can consent to data sharing in the incapable adult's best interests (71). They recommended that researchers can encourage patients to communicate their research preferences before a loss of capacity, and educate substitute decision makers about their ethical and legal duties (71).

2.2.5 Research reproducibility

Research reproducibility and data quality are the pillars of the scientific method (72). In pharmaceutical research, data reproducibility is a key driver for decision-making (73, 74).

The Pons-Fleischmann (1989) experiment, which claimed to have achieved nuclear fusion in a jar of water at room temperature (cold fusion), thereby potentially solving the energy crisis, is a well-known example of irreproducibility (75). Scientists could not replicate the cold fusion claim, and so, regardless of the reputations of the investigators or the journal involved, the claim was rejected (75). More recently, a 2012 study published in Nature found that 47 out of 53 landmark cancer research papers were irreproducible (76, 77). In a survey conducted by Nature in 2016, 90% of the researchers surveyed agreed that the scientific community is in the midst of a reproducibility crisis (78).

Similarly, in the field of economics, a recent replicability study published in Nature (2018) found that out of 203 economics papers, fewer than one in seven supplied the materials required for replication (79).


2.3 NEGATIVE IMPACTS OF DATA WITHHOLDING

Naturally, data withholding has the converse effects of data sharing. It increases the time, money, and effort researchers expend on applying for identical grants to repeat studies, which in turn raises the burden on human and animal research participants (41, 80). Moher et al. (2016) estimated that more than 50% of studies in health and medicine are never published in full, and more than 50% of planned study outcomes are never reported (81). Repeating these studies then increases the opportunity costs of research waste as original studies that could have been completed or progressed are lost (2). As data sharing also facilitates replicability, it is important for researchers to resolve the tension between the under- and over-replication of studies so as to minimise research waste (82).

Withholding medical research data undermines the validity of drugs, devices, and other innovations (74). There are multiple accounts of how withheld data were subsequently exposed, reanalysed and found to be unreliable, undermining the integrity of health and medical research and damaging the public's image of medical science (83). The development of the antiviral drug Tamiflu is a stark example of the malpractice and waste arising from data withholding. Roche, the drug company behind its development, withheld vital information on its clinical trials of Tamiflu for five years (84). The Cochrane Collaboration finally obtained all the data and, after putting the evidence together, concluded that Tamiflu has little or no impact on complications of flu infection, such as pneumonia (84). The U.S. government spent around US$1.5 billion, the U.K. government around US$700 million, and the Australian government an estimated AUD$192 million on Tamiflu based on the incomplete evidence provided by Roche, which highlights the significant costs of withholding data (84).

A more controversial example of the challenges of data withholding arises when patient advocacy groups demand access to trial data for reanalysis, as in the recent dispute over chronic fatigue treatment. The original investigators of the PACE (pacing, graded activity, and cognitive behaviour therapy: a randomised evaluation) trial, White et al., published their findings in the Lancet in 2011 (85). Critics of the trial say that the researchers did not analyse the data as set out in the original trial protocol (85). After a five-year battle to gain access to the trial data, Alem Matthees, an Australian patient with chronic fatigue syndrome, finally succeeded in getting Queen Mary University of London to release the data under the UK Freedom of Information Act (85). Matthees and two academic co-authors reanalysed the data and found that previously reported recovery rates were overestimated fourfold, and that the recovery rates in the intervention groups were not significantly higher than in the group of patients who had specialist medical care alone (85, 86).

2.4 WHY TO NOT SHARE DATA

2.4.1 Data protection

There are of course legitimate boundaries of openness that must be maintained, including the personal privacy of participants, safety, and national security (5, 38, 58). In some instances, publishing a dataset may require the consent of many associated stakeholders (38). Whether participant consent must be repeatedly sought when sharing a dataset for secondary analyses is a question that is not yet universally agreed upon (56).

Despite techniques to anonymise datasets, some could still be individually re-identifiable, making data sharing problematic if it leads to stigmatisation or other harms, and such risks may vary with cultural contexts (56, 87). For instance, the Department of Health in Australia had to remove a large Medicare dataset from its open data portal after researchers at the University of Melbourne discovered that individual health practitioner details could be uncovered (87). In cases like this, for the protection and privacy of participants, the best course of action is to not openly share the data, or to use better methods to anonymise them (56).

There are some datasets whose public release could potentially create public health problems. One such example is data on children's tobacco use. The Centre for Tobacco Control Research at the University of Stirling has been disputing attempts by Philip Morris International, which makes Marlboro cigarettes, to gain access to its research into the smoking habits and attitudes of teenagers under freedom of information laws (88). Philip Morris International wants the research centre to hand over its data on thousands of children who are smokers (88). The data detail children's attitudes to smoking and their reactions to packaging and advertising (88).

The University resisted releasing the data on the grounds that it would be a breach of confidentiality and that tobacco companies are not allowed to collect the data themselves (89). The Scottish Information Commissioner concluded that the University of Stirling did not comply with the freedom of information laws and so must respond to Philip Morris International's request (89). The Commissioner also concluded that Philip Morris International's request was not harmful (89).

There are varying opinions on the integrity of the Commissioner's decision. Some argue that these data, if released to a tobacco company, could be misused and lead to an increase in smoking rates among teenagers, and are therefore best left solely with the original researchers. Yet others, such as the former editor of the BMJ, Richard Smith, argue that though it is valid to worry that the data will be misused by the tobacco company, denying it access to the data is not the right response (90). Rather, as Richard Smith stated, 'the way for the researchers to counter the tobacco company is not through hiding their data but through better analysis and better argument. That is the essence of science' (90).

Richard Smith received much criticism for his opinion, as it is difficult to trust a tobacco company given their history, especially since tobacco smoking is the leading cause of preventable mortality worldwide and is responsible for over 7 million deaths each year (91). Additionally, it is disconcerting that in late 2017 Philip Morris International established the Foundation for a Smoke Free World, whose aim is to "eliminate cigarette smoking worldwide" (92). The Foundation is led by former WHO executive director, Derek Yach, and will receive US$1 billion in funding over 12 years (91, 93). It is not the purpose of this PhD to delve into the inconsistencies in these occurrences, but rather to outline the power of tobacco companies to manipulate the public health agenda through marketing strategies. Should an industry that "sells death and disability" be given unrestricted access to data on the smoking habits of children and teenagers (91)? Professor Gerard Hastings of the University of Stirling criticised the use of freedom of information in this context (89, 90, 94), which brings to light the notion of data protection for a "moral" or "noble" cause. One might ask who determines the nobility of a cause; as the smoking example shows, there are always differing perspectives about the best course of action. There are limitations to the global frameworks and laws that provide guidance or insight into the implementation science of data sharing, particularly as we are in the early stages of the global data sharing movement and ethical issues and questions are slowly coming to the fore.


An opinion article by Kaye et al. discussed other ethical challenges with biomedical research governance frameworks, further outlining how they are not well equipped to deal with the complexities of international data sharing (95). One limitation is that these frameworks do not recognise the connectivity between data providers and those who use the data, which can lead to potentially unjust outcomes that are not in accordance with the expectations of the data providers (95). In one sense, these governance frameworks inadvertently perpetuate inequalities, as they obscure the contributions of different stakeholders along the data stream (95).

2.5 BARRIERS TO SHARING

There are numerous barriers to sharing health and medical research data; a systematic review of the evidence classified them into six categories: technical, motivational, economic, political, legal, and ethical (20, 41). The next section presents a discussion of the main barriers to data sharing.

2.5.1 Political and legal barriers

Legal frameworks, such as data protection laws, may act as a barrier to sharing within or between countries, especially during public health emergencies (56). Some specific international legal frameworks and treaties exist, such as the Nagoya Protocol of the Convention on Biological Diversity and the Bermuda Principles for rapid release of DNA sequence data; however, a recent consultation concluded that what is required is a 'global data governance or ethical framework, supplemented by local memoranda of understanding that take into account the local context' (56).

A lack of trust at a governmental level exacerbates the political barriers to sharing (20). A prime example is the Indonesian government's refusal in 2007 to share H5N1 influenza samples with the international community, owing to a lack of trust about the potential use of these samples for financial benefit (20). In the absence of a trust relationship, legal arrangements were required, which guided the development of the Pandemic Influenza Preparedness Framework (20).

Another political barrier might be the potential economic damage that data sharing could have on tourism and trade (20, 56). The worldwide SARS epidemic led to estimated economic losses of 50 billion USD between 1998 and 2004, and Foot and Mouth Disease in the UK caused losses of 30 billion USD between 1998 and 2003 (20, 56).


2.5.2 System of scientific publishing

Data underpin the published body of scientific knowledge, which is the foundation for all scientific progress (12). However, the current system of scientific publishing works against dissemination of the data underlying publications; barriers include: the inability to access all articles, let alone the raw data behind publications; restrictions on data usage applied by publishers or data providers; and publication of data that are difficult to verify (e.g. lacking a data dictionary) (12).

Scientific journals are the major link between science and society, and data provide the evidence behind researchers' work (51). As such, it is disconcerting that the sharing of data has not been part of publication practices, which has implications for research translation and risks the adoption of results from flawed papers in real-life settings. The current system discourages data sharing, as it creates incentives for investigators to maximise the publication of subsequent analyses from a given dataset without competition (96). Challenges have arisen from the outdated methods of sharing scientific research using journals. The journal was a product of the printing press, which allowed scientific knowledge to be distributed far more widely than the previous system of letters between small groups of scientists. Though the journal was a solution for the needs of the 17th century, this method of distributing and sharing scientific knowledge is now old-fashioned and is proving ill-suited to modern science (97, 98).

2.5.3 Negative impact on career

Researchers and funders have raised several concerns about data sharing. Many researchers have a sense of ownership over their data, an entitlement which could be intertwined with the fear of relinquishing their rights (20). There are cases where data have been stolen or misused by others, a legitimate concern that can significantly set back researchers' careers. A good example of the dilemma between sharing and hiding data is the case of the researcher who shared the sequence data for Giardia lamblia, a diarrhoea-causing protozoan, and then discovered that a colleague had published a paper using his sequence information before he had a chance to publish his own research (99). Many other examples demonstrate that the fear of being scooped is a real barrier to open data in health and medical research (100).


Data are often treated as the private property of investigators, who aim to increase their publications at the expense of the widest possible use of the data. This limits both the advancement of the research and its application for public health benefit (80, 101). There has been some discussion about the ownership of data, particularly when it comes to clinical trials. Do the data belong to the sponsors, investigators, doctors, or patients? Ultimately, the data belong to patients, because they provide them, and it is important to respect and honour their participation in trials by focusing on the wider societal benefits of sharing data from clinical trials, rather than hiding data for individual gain (100, 102).

2.5.4 Data sharing policies – lack of standardisation

Though journals and research councils have created data sharing policies to increase data sharing in health and medical research, these safeguards can also become obstacles to data access, owing to the lack of standardisation of data sharing policies and of data repositories, thus further delaying research (66). In other words, some data depositories promoted by institutions have specific rules and regulations for different types of datasets (summary-level data vs. individual data), and these differing standards become barriers to access across consortia (66).

2.5.5 Lack of knowledge and resources

Data could also be withheld when researchers are unaware of data management tools, or do not receive adequate training to effectively manage their scientific data, including using the appropriate metadata standards to ensure that their data are easily located and used (37). There is a critical shortage of competent data managers in public health research, which undermines the rest of the scientific enterprise and creates an obstacle to data sharing (9).

Scientists often view data sharing as a time-consuming task with no reward (103). Tenopir et al. (2011) conducted a survey of 1,329 scientific researchers to explore data sharing practices and perceptions of the barriers and enablers of data sharing (104). Around 50% of researchers said they do not share data due to time constraints, underdeveloped data sharing standards, and inadequate infrastructure. Ironically, however, 85% of these researchers expressed interest in having access to other researchers' datasets (104). The paradox of not sharing one's data while desiring to use available datasets raises important questions (66, 104).


Similarly, a survey conducted in 2018 by Springer Nature found that "organising data in a presentable and useful way" was the most commonly cited reason for not sharing data (46% of over 7,000 researchers surveyed) (105).

In some instances, as is the case with data generated in the ‘global south’, data sharing is more complicated. A qualitative study by Bezuidenhout (2018) indicated that some researchers in sub-Saharan African laboratories describe data sharing as a ‘western issue’ and discuss technological and financial barriers to data sharing such as poor internet access and lack of funds to publish in open access journals (106).

2.5.6 Commercialisation of scientific data and intellectual property rights

Scientific data have in some instances become privatised into a negotiable commodity, and the pressure to commercialise is a key contributor to data withholding (107). When science is conducted primarily to deliver profit rather than to search for truth and/or better humanity and the environment, the likelihood of conflicts of interest increases. These conflicts can lead to decreased quality of scientific research and the suppression of findings, which increases risks to the public's health and the environment (108). As such, industries may be against open access to data, as it decreases their capacity to control the conclusions of research findings (58, 63).

An example of how pharmaceutical companies sometimes hide clinical trial data for commercial purposes is the case between the European Medicines Agency (EMA) and two drug companies, AbbVie and InterMune (109, 110). The EMA was the first drugs agency to publish online details of the full clinical-trial data that it receives from pharmaceutical companies (110). In 2013, the two drug firms resisted the EMA's release of their data in response to third-party freedom-of-information requests, arguing that the release would infringe their rights to protect commercially confidential information (110). After a year-long legal battle that was taken to the General Court of the European Union, both drug companies dropped their cases after striking a deal with the regulator to limit what the public could see (111-113). This led the EU Ombudsman's office to investigate the EMA's decision to provide only partial public access to the clinical trial data (113). This example demonstrates the power drug companies hold over even regulators such as the EMA.


D'Este and Patel argue that in the 1990s a 'third role' was added to universities: to 'knowledge producer' and 'knowledge transmitter' was added 'economic developer' (114). Accordingly, university researchers can receive contradictory messages about the adoption of an open, collaborative scientific model versus a commercialisation model. On the one hand researchers are told to commercialise their research by patenting, licensing, and forming close partnerships with industry, while on the other hand they are encouraged to share data and disseminate knowledge (115). Thus, there is a tension between academically oriented science, which rewards researchers for openness, publication and priority of discovery, and commercially oriented research and development, which views most information as private (67). Jean-Claude Guedon argued in his paper 'Open access: Toward the internet of the mind' that 'in no case should economic interests be allowed to interfere with the full potential of a free communicating system destined to help humanity – the whole of humanity – grow knowledge' (116).

There is a common belief among researchers that commercial value means closed data (38), and patents are often cited as a barrier to openness even though a core purpose of the patent system is to enable information to be shared (5). The current debate among scientific researchers about the irreconcilability of the open scientific and commercialisation models shows there is still much that the scientific community can do to promote the co-existence of intellectual property and openness (5, 114).

As such, the cultural shift being called for requires rigorous and open conversation about the objectives of public and private funding in health and medical research and about academic incentives to collaborate and share data in an increasingly competitive research context (66).

2.5.7 Research incentives

This literature review does not discuss open data badges as incentives to promote reproducible practices (117), as badges were not discussed in the literature at the time this PhD was conceived. The paper by Kidwell et al. (2016) was the first to introduce the concept of an electronic open data badge as a reward for open data. Thus, the findings of Kidwell et al. (2016) influenced the design of Studies 3 and 4 of my PhD. As such, badges are discussed in more depth in subsequent chapters of this thesis.


There are many ways to institute behavioural change, but history and the psychological literature suggest that motivating change with reward is more effective than motivating change with punishment (118). Currently, researchers get little credit for data sharing and so there is a cultural reluctance to publish data openly (12). Data sharing practices will be increased not only by making it easier to share data, but also by providing incentives to researchers (37, 101, 119).

To encourage the sharing of data in health and medical research, it might be beneficial to change the criteria by which scientists and their teams are rewarded for their efforts by the agencies that fund them and the institutions that host them (72). The criteria should be aligned with the desired outcomes. An example designed to encourage good research behaviour is the PQRST criteria: productive, high-quality, reproducible, shareable, and translatable (120). The reward system could then shift to allow for diversity in advancement, rather than remaining dominated by the single current incentive of "publish or perish".

Funding agencies, universities, and research institutes must address the crucial issue of academic credit, and devise methods that recognise and reward data sharing and encourage a culture of openness (121, 122). This will include considerations of how best to reflect academic output and avenues for academic publication that encourage data acquisition and sharing as important contributions (66, 67). The system of academic credit could better align incentives for data sharing with the advancement of science and medicine (96). For those who embrace a culture of openness, there should be recognition of the positive contributions to the new research paradigm (18).

One suggestion for a possible "institutional" reward was that funders withhold a proportion of the grant award, contingent on data deposition (123, 124). Further, if researchers have a clear vision of the benefits of data sharing, such as opportunities to network, increased research outputs, and the efficiency of research that builds on previous datasets, such a vision might help increase compliance with data sharing policies and remove concerns around the reuse of data (123, 124).

Another possible reward for data sharing that has been discussed in the literature is a system of recognition where the generators of data are identified and cited (96). Similarly, Olfson et al. suggested the creation of a data sharing metric, the S-Index, which would be similar to the popular H-Index, but would measure the number of peer-reviewed publications for which investigators have shared their data (125).
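To make the proposed metric concrete, below is a minimal sketch of how an S-Index might be computed under the simplest reading of Olfson et al.'s proposal: a count of an investigator's peer-reviewed publications with shared data. The `Publication` structure is my own illustration; the authors did not specify an implementation.

```python
from dataclasses import dataclass

@dataclass
class Publication:
    title: str
    data_shared: bool  # True if the raw data underlying the paper were made available

def s_index(publications: list[Publication]) -> int:
    # The S-Index under this simple reading: the number of peer-reviewed
    # publications for which the investigator shared their data.
    return sum(1 for p in publications if p.data_shared)

papers = [
    Publication("Trial A", data_shared=True),
    Publication("Cohort study B", data_shared=False),
    Publication("Survey C", data_shared=True),
]
print(s_index(papers))  # prints 2
```

Unlike the H-Index, a metric of this form would reward openness directly rather than citation counts, which is the kind of realignment of academic credit discussed above.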

2.6 THE REQUIREMENT OF SHARING DATA

This section outlines the two main steps being taken towards more open data in health and medical research.

2.6.1 The policy shift

Increasing numbers of international bodies, research councils, journals, and funders are creating policies which require or encourage researchers to make their datasets freely accessible to all (25, 33).

International movement

On the global level, the United Nations Educational, Scientific and Cultural Organization's (UNESCO) Universal Declaration on the Human Genome & Human Rights (1997) directs states and organisations to cooperate and exchange scientific knowledge and information (Article 19) (126). UNESCO's International Declaration on Human Genetic Data (2003) fosters the international dissemination of scientific knowledge (Article 18) by guiding researchers to establish cooperative relationships and to circulate data (127). In 2007, the Organization for Economic Cooperation and Development (OECD) further indicated, in its Principles and Guidelines for Access to Research Data from Public Funding, that full and open access to scientific data should be adopted as the international norm associated with publicly-funded research, avoiding expensive and unnecessary duplication of data collection and improving the overall efficiency of research (128). The US National Institutes of Health (NIH) has mandated the sharing of all data from research projects requesting $500,000 or more, and where sharing is not possible an explanation must be given (102). However, there is no monitoring of researchers' compliance with this policy (81). The Biotechnology and Biological Sciences Research Council contributed to the data sharing policy shift by specifying that making data more readily available will reinforce open scientific enquiry and stimulate new investigations and analyses. The Council stated that efforts at doing so should be led by the scientific community to ensure the natural embedding of sharing into the scientific culture (129). In July 2017, the Wellcome Trust updated its data sharing policy, stating:


“We expect our researchers to maximise the availability of research data, software and materials with as few restrictions as possible. As a minimum, the data underpinning research papers should be made available to other researchers at the time of publication, as well as any original software that is required to view datasets or to replicate analyses. Where research data relates to public health emergencies, researchers must share quality-assured interim and final data as rapidly and widely as possible, and in advance of journal publication” (130).

In China, in April 2018, the government issued new rules decreeing that all scientific data generated in China must be submitted to government-sanctioned data centres before appearing in publications (131). What was confusing for researchers was that the regulations also called for open access and data sharing (131).

On the Australian front, in 2016 the NHMRC released a national statement on data sharing, which stated: "NHMRC encourages data sharing and providing access to data and other research outputs (metadata, analysis code, study protocols, study materials and other collected data) arising from NHMRC supported research" (132). In early 2018, the NHMRC updated its policy by adding the word "strongly" before "encourages" (133). The Australian Research Council (ARC) outlined in 2016 its support for ensuring the widest possible dissemination of the research supported by ARC funding, stating:

“The ARC’s position reflects an increased focus in Australian and international research policy and practice on open access to data generated through publicly funded research” and “The ARC’s requirement is designed to encourage researchers to consider the ways in which they can best manage, store, disseminate and reuse data. Researchers, in consultation with institutions, have a responsibility to consider the management and future potential of their research data, taking into account the particular approaches, standards and uses for data that may exist in different institutions, disciplines and research projects. Some institutions may have infrastructure and/or processes in place for storing, managing and sharing data – these are valuable resources that should be utilised. The ARC does not require full, detailed data management plans (such as those required by some funding agencies internationally) and does not mandate open access to data” (120, 134).


The data sharing policies of the two major funders in Australia (NHMRC and ARC) are relatively weak compared with other national policies, such as that of the Wellcome Trust. The Wellcome Trust mandates the sharing of data connected to a publication, whereas the NHMRC and the ARC simply encourage, rather than mandate, data sharing. The Wellcome Trust also provides the technological platform (Wellcome Open Research, powered by the publisher F1000) for its funded research to be published open access, to undergo open peer review, and to provide the data, code, and materials required for research reproducibility (135).

In 2015, the Bill and Melinda Gates Foundation instituted its Open Access Policy, which stated that:

“The foundation will require that data underlying the published research results be immediately accessible and open. This too is subject to the transition period and a 12-month embargo may be applied.”

In 2017 the Foundation also created the Gates Open Research platform (powered by F1000), which is similar to Wellcome Open Research and F1000Research; all research funded by the Foundation can undergo open peer review and have all elements of the research open to the public (136).

Similarly, the African Academy of Sciences has created AAS Open Research (powered by F1000), an open science platform where all research funded by the Academy can be deposited openly (137).

Another global initiative that promotes data sharing is The Research Data Alliance (RDA) (138). The RDA was launched as a community-driven organization in 2013 by the European Commission, the United States Government's National Science Foundation and National Institute of Standards and Technology, and the Australian Government’s Department of Innovation, with the goal of building the social and technical infrastructure to enable open sharing of data. The RDA currently has over 6,700 members from 136 countries (138).

Clinical trials

In recent years, the sharing of clinical trial data has gained considerable momentum. The ClinicalTrials.gov platform ensures that all trials are registered, which has greatly increased the transparency of trials. This platform currently contains over 270,000 research studies from 203 countries (139). However, the poor quality of registered trial data and the inaccessibility of trial protocols, results, and participant-level data undermine the potential benefits of clinical trial registration (140). In 2013, the EMA announced that it would make available some clinical trial data where there is no risk of endangering the privacy of participants (102). In 2015, the WHO published a landmark position statement requiring all trials to make their methods and results available (14). In a recent paper published in British Medical Journal (BMJ) Open in December 2017, 10 principles and 50 recommendations that represent the fundamental requirements of any framework used for the sharing of clinical trial data were outlined (141).

Pharmaceutical companies have a generally poor record on data sharing (83) but have now joined the data sharing movement, albeit to varying degrees. The large company GlaxoSmithKline (GSK) announced in 2012 its intention to release 'patient-level' raw data from clinical trials of approved drugs and failed investigational compounds (142). Since then, the drug company has created a portal for the release of these data called ClinicalStudyDataRequest.com (143). An audit that compared pharmaceutical companies' policies on access to their trials' methods, results, regulatory documents, and data found that the commitments made by companies to trial transparency were highly variable, and many companies' policies were poorly worded and internally inconsistent (144).

Journals

Journal policy on research data availability is an important part of the ongoing shift toward publishing reproducible research (145). In 2013, Stodden et al. conducted an empirical analysis of data policy adoption by journals and found that higher-impact journals are more likely to have open data policies; of the 170 journals analysed, 38% had a data policy (145). One of the world's leading medical journals, the BMJ, stated that from 1 July 2015 its requirements for data sharing extended to all submitted clinical trials, not just those that test drugs or devices (146). The large internationally recognised publisher PLOS is also at the forefront of introducing data sharing policies, with a stringent policy that "require[s] authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception" (36). Both the BMJ and PLOS promote the use of Dryad as a third-party platform for authors to deposit the raw data from their publications. There are some newer initiatives that promote an open science platform for research by offering article pre-publication, data (and code) deposition in a public repository upon submission, and open peer review, and F1000Research is amongst the leaders in that regard (97).

In 2015, Nosek and colleagues proposed the Transparency and Openness Promotion (TOP) guidelines to assist journals and funders to adopt transparency and reproducibility policies (147, 148). Over 5,000 journals and organisations, including Science and the Springer Nature group of journals, are signatories to these guidelines (148, 149). The TOP guidelines provide a graded system for data sharing:

“For Level 1, the published article states whether or not data, code, materials are available and, if available, how to access. For Level 2, data, code, materials must be posted to a trusted repository and exceptions to sharing identified at article submission. Level 3 adds independent verification of reported analyses to the Level 2 standard” (149).

Another initiative that promotes openness in research is The Peer Reviewers' Openness Initiative. The initiative was launched on 1 January 2017, and its 457 signatories pledge that, as reviewers, they will not offer comprehensive review for any manuscript that does not make its data publicly available without a clear reason (148, 150).

In early 2016, the International Committee of Medical Journal Editors (ICMJE) put together a proposal outlining requirements to help mandate clinical trial data sharing worldwide (151). After receiving criticism from researchers worldwide regarding the feasibility of the proposed requirements, in June 2017 the ICMJE released its revised requirements for data sharing statements of clinical trials (152). These requirements stipulate that all manuscripts reporting the results of clinical trials include a clear data sharing statement that describes how the data will be shared, including a clear date for when the data will become available and for how long (152). The guidelines also indicate that trials that start after 1 January 2019 must include a data sharing plan in the trial's registration, and that ICMJE member journals can choose to adopt more stringent requirements for data sharing (152). The ICMJE requirements only mandate the inclusion of data sharing statements, not data sharing itself, which is a relatively weak policy compared with those of other agencies and does not change the open research culture.

Though the health, medical, and scientific discourse contains many policies to encourage data sharing, the sharing of health and medical scientific data is far from standard practice worldwide or in Australia. There are researchers who withhold their data from others, and in so doing increase research waste (2, 80). In addition, research has found that despite widespread support for data sharing and reuse in principle, in practice data sharing policies have had little impact (37). Even when research is published in journals with clear guidelines regarding data sharing, more than half of publications do not adhere to the data sharing instructions (26). One study determined how well authors complied with data sharing policies by requesting data from authors who had published in a journal with a clear data sharing policy, but received only one of the ten datasets requested (27). Another study surveyed 100 ecological datasets in journals that have a strong public data archiving policy and found that 56% were incomplete and 64% were archived in a way that partially or entirely prevented reuse (153). This suggests that journal policies requiring data sharing are not foolproof (27).

However, a study by Vines et al. in 2013 provides some hope for the effectiveness of data sharing policies (154). This study found that mandated data archiving policies that require the inclusion of a data availability statement in the manuscript increased data sharing from below 10% to above 90%, and that archiving rates at journals with less stringent policies were only very slightly higher than those with no policy (154). This suggests that journal-based mandatory data archiving policies and mandatory data availability statements should be more widely adopted (154). However, Vines et al. found in another study in 2014 that for papers where the authors shared their data, permanent data loss occurred at about 17% per year, suggesting that better data storage is required to make research data more accessible and discoverable in the long term (155, 156).
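To put that loss rate in perspective, below is a back-of-the-envelope sketch (my illustration, not a calculation from Vines et al.) of what a constant 17% annual rate of permanent loss would imply for the survival of shared datasets:

```python
# Under a simple constant-rate model, the probability that a dataset is still
# available after t years is (1 - 0.17)^t. This assumes the annual loss rate
# applies uniformly every year, which is a simplification of the reported result.

ANNUAL_LOSS_RATE = 0.17

for years in (1, 2, 5, 10, 20):
    surviving = (1 - ANNUAL_LOSS_RATE) ** years
    print(f"after {years:2d} years: {surviving:5.1%} of datasets still available")

# Output: 83.0% after 1 year, ~39.4% after 5 years,
# ~15.5% after 10 years, and ~2.4% after 20 years.
```

On this simple model, fewer than one in six shared datasets would remain available after a decade, which underlines the point about long-term storage made above.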

There are numerous collaborative groups worldwide that have developed principles and policies to promote open data, two of which are the Data FAIRport initiative and FORCE11 (97). A significant step in this regard has been the development of the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles, which were published in 2016 (157). These principles are guiding the development of data sharing at national data organisations, such as the Australian National Data Service (ANDS), at local groups, and at university research and ethics committees (158).

2.6.2 Data depositories

When individuals store their data in repositories, their work gains the potential to be shared across regional, national, and international borders. Research, therefore, can move forward to benefit the global scientific community (37).

A researcher may make their data open by publishing them in a web repository with some form of institutional support, ensuring their long-term accessibility (159). Examples of such repositories dedicated to archiving open-source materials include GitHub (github.com), Figshare (figshare.com), Dryad (datadryad.org), and the Open Science Framework (osf.io) (159).
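As a concrete illustration of how such a deposit can work in practice, below is a minimal sketch using Figshare's public v2 REST API as the example. The endpoint and fields reflect my understanding of that API, the token and title are placeholders, and other repositories have different interfaces.

```python
import requests

BASE = "https://api.figshare.com/v2"
TOKEN = "YOUR_PERSONAL_TOKEN"  # placeholder: generated in the Figshare web interface
HEADERS = {"Authorization": f"token {TOKEN}"}

# Create a draft dataset record; data files can then be attached to it and the
# record published, at which point the repository mints a citable DOI.
payload = {
    "title": "Raw data underlying: <paper title>",  # placeholder title
    "defined_type": "dataset",
}
resp = requests.post(f"{BASE}/account/articles", headers=HEADERS, json=payload)
resp.raise_for_status()
print("Draft record created at:", resp.json()["location"])
```

The citable DOI minted on publication is what allows the dataset to be referenced in an article's data availability statement.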

Recent studies have examined the usage of some of these depositories. Thelwall and Kousha (2016) were the first researchers to analyse the uptake and use of a generic academic resource sharing repository, Figshare (160). Their findings demonstrated that successful use of Figshare is not limited to any particular discipline, resource type, or audience, and that there is no simple pattern whereby some disciplines have many users whose resources tend to attract many views (160). The authors conducted a similar study in 2017, examining two evolutionary biology journals, Evolution and Heredity, that mandate data sharing and make extensive use of another depository, Dryad (161). They found an average of 40 to 55 data downloads per article after a few years, which suggests that some re-use of shared life sciences data is occurring (161).

2.7 SUMMARY AND IMPLICATIONS

The current literature predominantly shows that data sharing is not standard practice in the health and medical research community. Not sharing data slows the advancement of health and medical science and the potential contributions it can make to improve the public’s health, and it further exacerbates the reproducibility crisis in health and medicine.

This literature review has demonstrated that despite widespread support for data sharing and reuse in principle, in practice data sharing policies have had limited impact. Many papers on data sharing have discussed the need for better incentives and reward systems to increase data sharing. However, there are no evidence-based methods to increase data sharing, and hence there is a clear gap in the literature in exploring what interventions might work to increase data sharing.

The belief in scientific openness and data sharing is slowly becoming common in the scientific culture worldwide as more people recognise the benefits and importance of data sharing (5). It is evident that there is a need for a better system, one that allows for the wide distribution of all forms of scientific knowledge, not just those endorsed by the gatekeepers at journals, with the rest hidden away (41). Open data will not come about spontaneously; it requires capacity building, both individually and collectively, at all levels of society: universities, journals, funding bodies, governments, research boards, service providers, regulators, companies, and other collaborators (36). The change in culture will be progressive and requires change in the belief system that governs current scientific practice (33). This thesis will contribute to this cultural shift towards more open data by providing empirical evidence that can assist with the strengthening of data sharing policies and by testing evidence-based incentives that increase data sharing.

2.8 A PERSONAL EXPERIENCE OF ACCESSING DATA

Here I include an anecdote from a PhD colleague who tried to access health data that were advertised as available for re-use in research projects. Their experiences raise some of the important barriers to data sharing and highlight the costs of not sharing data, or of sharing it reluctantly. My colleague's anecdote provides first-hand experience of the research waste that arises from legal barriers, a lack of trust, and inconsistent regulation of data sharing policies. The quote below is in my colleague's own words; I have removed any identifying details.

“For one dataset: I contacted them after someone sent me their annual report, which showed that they were collecting data relevant to my project, but didn’t appear to be using it... They have a webpage about how to access their data … I first contacted them in [month 1] and after a phone call where I couldn’t get a straight answer, I was told “we don’t just give the data out to anyone”, but I was sent the relevant paperwork to complete. I completed it, obtained ethics approval from [my university] to add the data to my project. I also had to confirm I had the funds to pay for the data access (a bit of a tangent, but just another hurdle). It took a bit of time for me to get all of this together. After completing and submitting all the required paperwork in [month 3], I received an email declining my application in [month 4]. I spoke to the relevant person on the phone and was told the “door wasn’t completely closed”. There was a bit of back and forth and I was told things like “the answer is no, because someone else is already using the data” or “someone else is doing the same analysis as you”, or “we already have someone doing a [type of] analysis”, or “there’s too much overlap between your project and another one”. You can see that those are not very consistent – so I was pretty confused. I also needed to know if someone was doing the same analysis as me, because that could be a problem for me too. They sent my research proposal to the other person, but refused to send me an outline of the other project. Again – it was pretty confusing. After more back and forth, it was still unclear why they were so hesitant to give me access to the data. I’d jumped through all the hoops they’d given me – perhaps they were hoping I’d give up. Eventually, it came down to who I knew, not what I knew. I had hoped it wouldn’t come to that, but it wasn’t until my supervisors stepped in and spoke to someone they knew, on the executive committee, that things started to move. Still there was quite a lot of resistance, but other than vague statements like those above, I wasn’t ever told what the hold-up was. Still there were more delays. I was asked to speak to the other researcher using the data, to see whether they thought there was too much overlap between our projects. We did that in [month 5], and reported back that there wasn’t much of an overlap and that we would be happy to work together/confer along the way. The response to my email about that was that it wasn’t up to us. (Even more confusing because I thought I was doing what I’d been asked to do!) Then there was more back and forth between supervisors and the data custodians and I had to rework and resubmit my proposal (end of [month 9]), providing supporting information that [my supervisors] had the qualifications to be able to assist me with statistical analysis and … modelling, and add one of the internal people as a co-investigator. I was also reminded at this point that they could refuse my proposal on the basis that I was not an internal collaborator. (This would have been useful information in [month 1]! Perhaps we could have avoided some of the delays if they’d made that clear from the outset.) Then the magic happened. About a month later I received provisional approval (end of [month 10]). I had to do another update to ethics to add a co-investigator and that took until mid-[month 11] because I needed information from various people. Then in [month 12], they set up access to the data. There was then some change in the documentation required so I had to update that in [month 13]. There were then a few more delays because people were on leave and the data were finally made available to me in [month 14]. (Over 12 months in total.) … I felt this was pretty poor, considering the data are collected from public and private hospitals, for the purposes of quality and safety, and research. Despite this (and their data sharing policy!), they seemed unwilling to share outside the group, even though there is a significant public investment in the collection of the data. Quality and safety cannot be improved if the data are just sitting in a repository!

As for the other dataset: The team at … are very good data-sharers. There are several people at [my team] using their data and all that was required was [my university] ethics approval for use of the de-identified data. Luckily, they had foreseen future sharing and had included future use in their consent form when they collected the data. We gave them our proposals in [month 1] and they sent a data transfer agreement for [my university] to sign off on in [month 2]. There were some delays (possibly on [my university’s] side), but … made the data available in [month 5]. The problem for me in this case was the complicated intellectual property agreement between [all the collaborators]. This took a long time to negotiate and I’m not sure who/what exactly caused all the delays, but it finally came through in late [month 11] and I was able to access the data from that time. (10 months for me, but only 4 months for the others.) So, really this was about tricky intellectual property contract arrangements (which are probably overkill for the type of work we do), not data sharing. In my view, public health research should be public. There should be no way of keeping/hiding results or data for commercial gain, particularly when the data have been collected from the public, with the promise of reporting the outcomes of research to improve health or healthcare in Australia.”

2.9 THESIS OUTLINE

Each study of my PhD had its own research design, sampling framework, recruitment method, sample size calculation, procedure, measures/assessment tools, methods of analysis, and results. Study 1 was a systematic review and was published in Research Integrity and Peer Review; the published manuscript is included in Chapter 3 of this thesis. Study 2 was an observational study and was published in BMJ Open; the published manuscript is included as Chapter 4. Study 3 was another observational study and was published in F1000Research; the published manuscript is included as Chapter 5. Lastly, Study 4 was a randomised controlled trial. The data collection for this trial is still continuing, as there were some delays in gaining ethical and legal approval to conduct a trial with BMJ Open. However, the registered study protocol is included as Chapter 6. This protocol was registered at the Open Science Framework and is currently under embargo. The embargo will be removed once data collection is complete, so as to avoid any contamination of the control group in the trial.


Chapter 3: What incentives increase data sharing in health and medical research? A systematic review

The authors listed below have certified that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;

3. there are no other authors of the publication according to these criteria;

4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and

5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

What incentives increase data sharing in health and medical research? A systematic review


Contributor: Anisa Rowhani-Farid
Statement of contribution*: ARF collected and analysed all data for the study and wrote the manuscript. ARF read and approved the final manuscript.
Signature:
Date: 29/June/2018

Contributor: Michelle Allen
Statement of contribution*: MA collected and analysed data (30%) for the study and edited the manuscript. MA read and approved the final manuscript.

Contributor: Adrian G. Barnett
Statement of contribution*: AGB provided close student mentorship for this research, which is a part of ARF's PhD under his primary supervision, and was a major contributor to the writing of this manuscript. AGB read and approved the final manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name: Adrian Barnett
Signature:
Date: 29/June/2018

3.1 OVERVIEW

This chapter contains a modified version of the study that was published in Research Integrity and Peer Review. The modifications are limited to formatting: the text, figures, tables, and references of the original publication have been formatted for the purposes of this thesis. The published version can be found at:


Rowhani-Farid A, Allen M, Barnett AG: What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review 2017, 2(1):4. https://doi.org/10.1186/s41073-017-0028-9.

3.2 ABSTRACT

Background: The foundation of health and medical research is data. Data sharing facilitates the progress of research and strengthens science. Data sharing in research is widely discussed in the literature, however there are seemingly no evidence-based incentives that promote data sharing.

Methods: A systematic review (registration: doi.org/10.17605/OSF.IO/6PZ5E) of the health and medical research literature was used to uncover any evidence-based incentives, with empirical pre- and post-intervention data, that examined data sharing rates. We were also interested in quantifying and classifying: the number of opinion pieces on the importance of incentives, the number of observational studies that analysed data sharing rates and practices, and strategies aimed at increasing data sharing rates.

Results: Only one incentive (using open data badges) has been tested in health and medical research with data sharing rates as an outcome. The number of opinion pieces (n = 85) outweighed the number of articles testing strategies (n = 76), and the number of observational studies exceeded them both (n = 106).

Conclusions: Given that data is the foundation of evidence-based health and medical research, it is paradoxical that there is only one evidence-based incentive to promote data sharing. More well-designed studies are needed in order to increase the currently low rates of data sharing.

3.3 KEY WORDS

Incentives, data sharing, open data, meta-research

3.4 RATIONALE

Despite the current shift towards more open data in health and medical research, there are seemingly no evidence-based incentives that increase data sharing. As such, a systematic review was used to verify the lack of evidence-based incentives in this area.

3.5 OBJECTIVE

This study aims to systematically review the literature to appraise and synthesise scientific research papers that concern incentives that have been tested to increase data sharing in health and medical research.

3.6 INTRODUCTION

3.6.1 Research waste: hidden data, irreproducible research

The foundation of health and medical research is data: its generation, analysis, re-analysis, verification, and sharing (12). Data sharing is a key part of the movement towards science that is open, where data are easily accessible, intelligible, reproducible, replicable, and verifiable (21). Data sharing is defined here as making raw research data available in an open data depository, and includes controlled access, where data are made available upon request, which may be required for legal or ethical reasons. Despite the wide-scale benefits of data sharing, such as addressing global public health emergencies, it is yet to become common research practice. For instance, the Severe Acute Respiratory Syndrome (SARS) disease was controlled only four months after its emergence by a World Health Organization coordinated effort based on extensive data sharing (33). Likewise, the researchers working on the Ebola outbreak have recently committed to work openly in outbreaks, to honour the memory of their colleagues who died at the forefront of the Ebola outbreak and to ensure that no future epidemic is as devastating (54). Notwithstanding these benefits, numerous studies have demonstrated low rates of data sharing in health and medical research, with the leading journal the British Medical Journal (BMJ) having a rate as low as 4.5% (162), and one study of biomedical journal articles finding a rate of 0% (16). There are of course legitimate reasons to withhold data, such as concern about patient privacy and the requirement of patient consent for sharing (20).

With 85% of the world's spending on health and medical research (an estimated $170 billion) wasted every year, it is clear that the scientific community is in crisis, leading to questions about the veracity of scientific knowledge (1). Data sharing and openness in scientific research should be fundamental to the philosophy of how scientific knowledge is generated. Thomas Kuhn introduced the concept of paradigm shifts that arise from a scientific crisis. The paradigm shift before us today is from closed, hidden science to open science and data sharing (30). Sharing scientific data allows for data verification and re-analysis, and for the testing of new hypotheses. Open data reduces research waste in terms of time, costs, and participant burden, and in turn strengthens scientific knowledge by ensuring research integrity (82, 162).

The many current problems in health and medical research have led to the emergence of a new field, meta-research, which is concerned with improving research practices (21). Meta-research has five sub-themes, with 'reproducibility' and 'incentives' being two of them (21). The reproducibility theme is concerned with the verification of research findings, which can be achieved through the sharing of data and methods (21). The incentives theme is concerned with rewarding researchers, including incentivising them to share their data and methods (21). We were interested in how researchers are incentivised to openly share their raw data, thus combining two sub-themes of meta-research.

3.6.2 Research waste: historical barriers

Historically, it has not been common practice for a research article to include access to the raw data from scientific experiments (163). This flaw, created by technological limitations among other factors, has hindered the progress of scientific knowledge (162). However, we can no longer blame technology for outdated research practices. There are many data depositories which allow researchers to easily share their data using a citable DOI. There have also been many recent policies and frameworks to encourage openness in research (20). Yet uptake in health and medicine is low, and what is lacking, it appears, are rewards that incentivise researchers to share their data (163). Incentives are defined here as rewards that are given to researchers if they share their raw scientific data (164).


3.7 RESEARCH DESIGN AND METHODOLOGY

The Queensland University of Technology (QUT) Library staff assisted in developing a rigorous and clearly documented methodology for both the search strategy and the selection of studies. The aim was to minimise bias by documenting the search process and the decisions made, to allow the review to be reproduced and updated.

The Cochrane Handbook for Systematic Reviews was used as a guide for this systematic review: http://handbook.cochrane.org/. The EQUATOR Network PRISMA (2009) Checklist (165) was used to ensure good practice as well as accurate reporting.

Three systematic review registries (Prospero, Joanna Briggs Institute, and Cochrane) were checked to ensure our proposed systematic review had not already been done. Our systematic review protocol was registered at the Open Science Framework on 1 August 2016 (doi.org/10.17605/OSF.IO/6PZ5E).

3.7.1 Inclusion criteria

Types of documents

This review considered published journal articles with empirical data that trialled any incentive to increase data sharing in health and medical research.

Types of data

Articles must have tested an incentive that could increase data sharing in health and medical research. For the purposes of this review, health and medical research data are defined as any raw data that have been generated through research from a health and medical facility, institute or organisation.

Incentives are defined here as ‘a benefit, reward, or cost that motivates an […] action’. This was based on the definition of incentives in economics, which groups incentives into four categories: financial, moral, natural and coercive (166).

Types of measures

The review included any paper with empirical data on sharing that compared an intervention and control, using a clear research design (including randomised and non-randomised designs). The types of measures included were: the percent of datasets shared, the number of datasets shared, or the relative ratio of data sharing.

3.7.2 Exclusion Criteria

This review excluded the following, but still classified these excluded papers by field:

• all editorial and opinion pieces that only discuss strategies to increase data sharing without trialling them;

• strategies that do not involve incentives, e.g., education seminars, a change in a data sharing policy or some other policy, or access to data management tools and managers;

• observational studies that describe data sharing patterns.

3.7.3 Search Strategy

The search strategy was designed to access published articles using the following search string:

( ( ( “open science” OR “open data” OR “data sharing” ) AND ( incentive* OR motivation* OR reward* OR barrier* ) ) )


The search string was applied in each database as follows:

Health/medical databases
• PubMed: ((“open science” OR “open data” OR “data sharing”) AND (incentive* OR motivation* OR reward* OR barrier*))
• EMBASE: ((“open science” OR “open data” OR “data sharing”) AND (incentive* OR motivation* OR reward* OR barrier*))
• CINAHL: ((“open science” OR “open data” OR “data sharing”) AND (incentive* OR motivation* OR reward* OR barrier*))

Multi-disciplinary databases
• Scopus: ((“open science” OR “open data” OR “data sharing”) AND (incentive* OR motivation* OR reward* OR barrier*))
• Web of Science: ((“open science” OR “open data” OR “data sharing”) AND (incentive* OR motivation* OR reward* OR barrier*))
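For illustration only, a search string like this can also be run programmatically; the following minimal sketch (not part of the review methodology) submits the PubMed search via the public NCBI E-utilities esearch endpoint and prints the number of matching records.

```python
import requests

# Illustrative only: running the review's search string against PubMed via
# the public NCBI E-utilities esearch endpoint. The review itself used the
# databases' own search interfaces.
SEARCH = ('("open science" OR "open data" OR "data sharing") AND '
          '(incentive* OR motivation* OR reward* OR barrier*)')

resp = requests.get(
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
    params={"db": "pubmed", "term": SEARCH, "retmax": 20, "retmode": "json"},
    timeout=30,
)
resp.raise_for_status()
result = resp.json()["esearchresult"]
print("Records found:", result["count"])  # total number of matching records
print("First PMIDs:", result["idlist"])   # identifiers for record retrieval
```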


Relevant articles that did not appear in the database search but were known to the reviewers were hand-picked and extracted into EndNote.

3.7.4 Process of Selecting and Evaluating Articles

Two reviewers, ARF and MA, screened the titles of the articles and, based on the inclusion and exclusion criteria, extracted them into EndNote. Duplicates were removed.

The reviewers independently screened the extracted article titles and abstracts based on the inclusion and exclusion criteria and categorised them into five groups:

1) Incentives

2) Other strategies

3) Opinion pieces

4) Observational studies

5) Irrelevant

ARF read the titles and abstracts of all extracted articles, and MA verified her findings by reading a random sample of 30%. Discrepancies between the two reviewers occurred for approximately 10% of this sample; these were relatively minor and were resolved by discussing the scope of each category. For instance, one research paper outlined the introduction of a data system: one reviewer classified it as an observational study, but after discussion it was agreed that it was a strategy article, as its objective was to increase data sharing rates rather than to observe data sharing patterns.

3.7.5 Process of Extracting Relevant Information

The two reviewers independently read eligible documents and extracted data sharing incentives in health and medical research. Both reviewers were agnostic regarding the types of incentives to look for. The final list of incentives was determined and agreed on by all authors (20).

3.7.6 Data Synthesis

Individual incentives were grouped into research fields. A qualitative description of each incentive was presented.

Based on our prior experience of the literature, the research fields and sub-fields for classification were:

a. Health and medical research
   i. Psychology
   ii. Genetics
   iii. Other (health/medical)

b. Non-health and medical research
   i. Information Technology
   ii. Ecology
   iii. Astronomy
   iv. Other (non-health/medical)

The other articles – strategies, opinion pieces, and observational studies – were also grouped into the same research fields.

3.8 RESULTS

The database searches found 1415 articles, 1039 of which met the inclusion criteria based on assessment of titles and abstracts and were exported into EndNote. After automatic removal of duplicates, 670 articles remained; after manually removing the remaining duplicates, 586 articles remained.
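EndNote performed the automatic deduplication step; for readers who wish to reproduce it without EndNote, a minimal sketch of the underlying logic is shown below. The record fields and sample data are hypothetical, and EndNote's actual matching rules are more elaborate.

```python
import re

# Illustrative sketch of automatic duplicate removal by DOI or normalised
# title; EndNote performed this step in the actual review.
def normalise(title):
    """Lower-case a title and strip punctuation and whitespace for matching."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        # Prefer the DOI as a key; fall back to the normalised title.
        key = rec.get("doi") or normalise(rec.get("title", ""))
        if key and key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "Incentives for open data", "doi": "10.1000/x1"},
    {"title": "Incentives for Open Data!", "doi": "10.1000/x1"},  # duplicate
    {"title": "Barriers to data sharing"},
]
print(len(deduplicate(records)))  # 2
```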

The titles and abstracts of these 586 articles were read and categorised based on the above inclusion and exclusion criteria. One study that met the inclusion criteria was hand-picked, bringing the total number of extracted articles to 587. After screening titles and abstracts, nine articles were classified under incentives in health and medical research. These articles were then read in full, and one of them was judged to be an incentive that satisfied the inclusion criteria.


The PRISMA (165) flow chart that outlines the journey of the articles from identification to inclusion is in Figure 3-1. The categorisation of the 570 relevant articles into sub-fields and article type is in Table 3-1.

Figure 3-1: PRISMA (165) flow chart: systematic selection of studies that have tested incentives to increase data sharing rates in health and medical research

A review of the reference list for the one included intervention was undertaken (117). The titles and abstracts of the full reference list of this study (23 papers) and those that cited the study (5 papers) were read, but none met the inclusion criteria of this systematic review.


Table 3-1: Categorisation of all screened articles into sub-fields and article type

Research field and sub-field | Incentives | Strategies | Opinion pieces | Observational studies | Total: field of studies
Health and medical research
  Psychology | 1 | 2 | 4 | 1 | 8
  Genetics | 0 | 16 | 12 | 17 | 45
  Other | 0 | 58 | 69 | 88 | 215
  Total: health and medical research | 1 | 76 | 85 | 106 | 268
Non-health and medical research
  Astronomy | 0 | 0 | 0 | 1 | 1
  Ecology | 0 | 5 | 11 | 8 | 24
  Information Technology | 0 | 38 | 26 | 28 | 92
  Other | 0 | 46 | 52 | 87 | 185
  Total: non-health and medical research | 0 | 89 | 89 | 124 | 302
Total: type of studies | 1 | 165 | 174 | 230 | 570

Seventeen articles were irrelevant, leaving a total of 570 classified articles. The distribution of articles across type of study was similar for both health and medical research and non-health and medical research (Table 3-1). Observational studies were the most common type (n = 106 for health/medical, n = 124 for non-health/medical), followed by opinion pieces (n = 85, n = 89), then articles testing strategies (n = 76, n = 89); articles testing incentives were uncommon (n = 1, n = 0).

3.8.1 Observational studies about data sharing in health and medical research

These articles did not fit the inclusion criteria, but based on the abstracts they were mostly concerned with observing data sharing patterns in the health and medical research community, using quantitative and qualitative methods. The motivation behind these studies was often to identify the barriers and benefits to data sharing in health and medical research. For instance, Federer et al. (2015) conducted a survey to investigate the differences in experiences with and perceptions about sharing data, as well as barriers to sharing among clinical and basic science researchers (167).

3.8.2 Opinion pieces about data sharing in health and medical research

These articles also did not fit the inclusion criteria, but based on the abstracts they were opinion and editorial pieces that discussed the importance and benefits of data sharing and also outlined the lack of incentives for researchers to share data.

3.8.3 Main results: incentives in health and medical research

Badges

Open data and open materials badges were created by the Center for Open Science and were tested at the journal Psychological Science (117). In January 2014, the journal adopted badges to acknowledge open data, open materials, and preregistration of research (117). A Badges Committee at the Center for Open Science outlined what it meant to have “open data” and “open materials”, and the journal editorial team awarded badges to those authors who voluntarily applied for them upon article acceptance and demonstrated that they met the criteria (117). The criteria to earn an open data or open materials badge involved making all digitally shareable data and materials available in an open data repository (117). Badges greatly increased the reported open data rate at the journal, from 1.5% in the first half of 2012 (start point) to 39.4% in the first half of 2015 (end point) (117).


Limitations

A limitation of the badge study was that it did not use a randomised parallel group design; notwithstanding, it was the only incentive tested in the health and medical research community with pre- and post-incentive empirical data (117). The pre- and post-design makes the study vulnerable to other policy changes over time, such as a change from a government funding agency like the recent Statement on Data Sharing from the Australian National Health and Medical Research Council (168). However, the Kidwell et al. study addressed this concern with contemporary control journals. A limitation of the badge scheme itself was that, even with badges, the accessibility, correctness, usability, and completeness of the shared data and materials was not 100%, which was attributable to gaps in the specifications for earning badges. In late 2015, the Center for Open Science Badges Committee considered provisions for situations in which the data or materials for which a badge was issued somehow disappear from public view. It also considered how adherence to badge specifications could be improved: by providing easy procedures for editors and journal staff to validate data and material availability before issuing a badge, and by providing community guidelines for validation and enforcement (117).

3.8.4 Incentives in non-health and medical research

Of the non-health/medical incentives, seven were categorised as information technology and nine as other. Upon reading the full text, all sixteen non-health/medical incentives were proposed incentives or strategies rather than tested incentives with comparative data.

3.8.5 Strategies to increase data sharing in health and medical research

Given that the systematic review found only one incentive, we classified the data sharing strategies tested in the health and medical research community. Seventy-six articles were classified under ‘strategies’ and Table 3-2 shows the further classification into categories based on a secondary screening of titles and abstracts. The articles are grouped by whether they presented any data, descriptive or empirical.


Table 3-2: Categorisation of the 76 data sharing strategy articles

Sub-theme of health and medical research | Category (number of articles) | Empirical or descriptive data (number [references]) | None or little empirical or descriptive data (number [references])
Psychology | Data system (2) | 2 (169, 170) | –
Genetics | Data system (14) | 12 (171-182) | 2 (183, 184)
Genetics | Collaboration and data system (1) | 1 (185) | –
Genetics | Collaboration (1) | 1 (186) | –
Other (health and medical research) | Data system (41) | 35 (187-221) | 6 (222-227)
Other (health and medical research) | Collaboration and data system (7) | 7 (228-234) | –
Other (health and medical research) | Collaboration (6) | 3 (235-237) | 3 (73, 238, 239)
Other (health and medical research) | Policy (3) | 2 (240, 241) | 1 (242)
Other (health and medical research) | Campaign (1) | 1 (243) | –

The majority of strategies (57/76) were technological, such as the introduction of a data system to manage and store scientific data. Seven of the 76 strategies concerned encouraging collaboration among research bodies to increase data sharing. Eight were a combination of collaboration across consortia and the introduction of a technological system. Three had a data sharing policy as the strategy; none tested the effectiveness of the policy, although two reported descriptive data from their experience in implementing it. One strategy was an open data campaign.

Below we give some examples of the strategies used to promote data sharing.


3.8.6 Strategies in health and medical research: data systems

Dataset linkage – attribution

Two articles discussed an incentive system for human genomic data and data from rare diseases, namely, microattribution and nanopublication—the linkage of data to their contributors. However, the articles only discussed the models and did not present empirical data (178, 193).

Another article discussed the OpenfMRI project, which aims to provide the neuroimaging community with a resource to support open sharing of fMRI data (197). In 2013, the OpenfMRI database had 18 full datasets from seven different laboratories, and in October 2016 it had 55 datasets openly available (244). The authors identified credit as a barrier to sharing data and so incorporated attribution into the OpenfMRI website, where a dataset is linked to the publication and the list of investigators involved in collecting the data (197).

Electronic laboratory notebooks

An article discussed open source drug discovery and outlined experience with two projects, the praziquantel (PZG) project and the Open Source Malaria project (194). The article did not have pre- and post-strategy data. The authors discussed the constituent elements of an open research approach to drug discovery, such as the introduction of an electronic lab notebook that allows the deposition of all primary data, and data management and coordination tools that enhance community input (194). The article described the benefits and successes of the open projects and outlined how their uptake needs to be incentivised in the scientific community (194).

3.8.7 Strategies in health and medical research: collaboration and data system

An article discussed the development of the Collaboratory for MS3D (C-MS3D), an integrated knowledge environment that unites structural biologists working in the area of mass spectrometric based methods for the analysis of tertiary and quaternary macromolecular structures (MS3D) (229). C-MS3D is a web-portal designed to provide collaborators with a shared work environment that integrates data storage and management with data analysis tools (229). The goal is not only to provide a common data sharing and archiving system, but also to assist in the building of new collaborations and to spur the development of new tools and technologies (229).

Attribution

One article outlined the collaborative efforts of the Global Alzheimer’s Association Interactive Network (GAAIN) to consolidate the efforts of independent Alzheimer’s disease data repositories around the world, with the goals of revealing more insights into the causes of Alzheimer’s disease, improving treatments, and designing preventative measures that delay the onset of physical symptoms (230). In 2016, they had registered 55 data repositories from around the world, with over 25,000 subjects using GAAIN’s search interfaces (230). The methodology employed by GAAIN to motivate participants to voluntarily join its federation is to provide incentives: the data collected by its data partners are advertised, as is the identity of the data partners, including their logos and URL links, on each GAAIN search page (230). GAAIN attributes its success in registering 55 data repositories to date to these incentives, which provide opportunities for groups to increase their public visibility while retaining control of their data, making the relationship between GAAIN and its partners mutually beneficial (230). This study did not have pre- and post-strategy empirical data, but described the importance of incentives in motivating researchers to share their data with others (230).

3.8.8 Strategies in health and medical research: collaboration

An article described how data sharing in computational neuroscience was fostered through a collaborative workshop that brought together experimental and theoretical neuroscientists, computer scientists, legal experts and governmental observers (235). This workshop guided the development of new funding to support data sharing in computational neuroscience, and considered a conceptual framework that would direct the data sharing movement in computational neuroscience (235). The workshop also unveiled the impediments to data sharing and outlined the lack of an established mechanism to provide credit for data sharing as a concern (235). A recommendation was that dataset usage statistics and other user feedback be used as important measures of credit (235).


One article addressed the need to facilitate a culture of responsible and effective sharing of cancer genome data through the establishment of the Global Alliance for Genomics and Health (GA4GH) in 2013 (185). The collaborative body unpacked the challenges of sharing cancer genomic data as well as the potential solutions (185). The GA4GH developed an ethical and legal framework for action, successfully fostering an international ‘coalition of the willing’ to deliver a powerful, globally accessible clinico-genomic platform that supports data-driven advances for patients and societies (185).

3.8.9 Strategies in health and medical research: policy

An article discussed the efforts of the Wellcome Trust Sanger Institute to develop and implement an institute-wide data sharing policy (240). The article outlined that successful policy implementation depends on: working out detailed requirements (guidance), devoting efforts and resources to alleviate disincentives (facilitation), instituting monitoring processes (oversight), and leadership (240). The topic of disincentives (facilitation) included concerns about lack of credit (240). The authors propose that cultural barriers to data sharing persist and that it is important to align the reward system to ensure that scientists who share data are acknowledged and cited, and that data sharing is credited in research assessment exercises and grant and career reviews (240).

3.8.10 Strategies in health and medical research: campaign

One strategy was an open data campaign, included in the review via an open letter sent in June 2014 by the AllTrials campaign to the director of the European Medicines Agency, asking for barriers to accessing clinical trial data to be removed (243). The AllTrials campaign is supported by more than 78,000 people and 470 organisations worldwide (243). The letter contributed to the European Medicines Agency publishing the clinical reports underpinning marketing authorisation requests for new drugs, as part of a more proactive policy on transparency that applied to all centralised marketing authorisations submitted after 1 January 2015 (243). The adoption of this policy was a significant step towards ensuring transparency of health and medical research in Europe (243).


3.9 DISCUSSION

This systematic review verified that there are few evidence-based incentives for data sharing in health and medical research. The irony is that we live in an evidence-based world, built upon the availability of raw data, yet we have hardly any evidence about what motivates researchers to share data. To date, open data badges are the only tested incentive. Badges are an effective signal and incentive for open practices, and journals can offer them to authors who are willing and able to meet the criteria to earn an open data or open materials badge (117).

It is interesting to note the large number of opinion pieces (n = 85) on the importance of developing incentives for researchers, which outnumbered the articles that tested strategies to increase data sharing rates (n = 76). ‘Opinion pieces’ are mutually exclusive from ‘strategies’: the former discuss possible strategies and incentives, while the latter test those ideas and provide evidence of what does or does not work. The strategies included: the introduction of data systems, such as electronic laboratory notebooks and databases for data deposition, that incorporated a system of credit through data linkage; collaboration across consortia combined with data systems that used data attribution as an incentive; collaboration across consortia through workshops and the development of frameworks for data sharing; the implementation of data sharing policies; and campaigns to promote data sharing. These strategy articles discussed the need to introduce rewards to increase data sharing rates, and the only form of incentive used was data attribution and advertising on websites. Studies that test the effectiveness of attribution and advertising as a form of credit are necessary.

In light of the small number of studies, we see a clear need for studies to design and test incentives that would motivate researchers to share data. Organisations are promoting the development of incentives to reduce research waste. In late 2016, the Cochrane and the REWARD alliance combined to create the annual Cochrane-REWARD prize for reducing waste in research. The monetary prize is awarded to ‘any person or organisation that has tested and implemented strategies to reduce waste in one of the five stages of research production [question selection, study design, research conduct, publication, and reporting] in the area of health’. This prize is an example of an incentive for researchers to design studies or implement policies that reduce research waste; it will be interesting to see the impact of this initiative (245).

Another endeavour in developing incentives and rewards for researchers is the convening, in early 2017, of a group of leaders from academia, government, journals, funders, and the press in the US and Europe, organised by the Meta-Research Innovation Center at Stanford (METRICS), to help develop new models for academic promotion and professional incentives that would promote the highest quality science. The focus will be on designing practical actions that embody principles this community has embraced, while recognising that the effect of any such policies will need empirical evaluation.

While the systematic barriers to widespread data sharing are being addressed through the general shift towards more openness in research, the conversation on data sharing includes an alternative view in which users of shared data are called ‘research parasites’ who ‘steal from research productivity’ and who are ‘taking over’ (246, 247). There is also some questioning of whether data sharing is worth the effort (247). These views, however, are contrary to the purpose of sharing data, which is to progress science as a body of knowledge and to make the research process more robust and verifiable (162, 247).

3.10 LIMITATIONS

A limitation of this systematic review is that we did not search the grey literature (materials and research produced by organisations outside of the traditional commercial or academic publishing and distribution channels). The review could be perceived as having a narrow design, given that we anticipated a lack of evidence-based incentives for data sharing in health and medical research, which might make its topic appear too simple. However, we could not be sure that there were no incentives, and a recent paper by Lund and colleagues (2016) emphasises the importance of conducting systematic reviews before designing interventions, to avoid adding to the already large problem of research waste (248).


3.11 CONCLUSION

The current meta-research discourse outlines the numerous benefits of openness in research: verification of research findings, progress in health and medicine, new insights from re-analyses, reduced research waste, increased research value, and greater research transparency. However, this systematic review of the literature has uncovered a lack of evidence-based incentives for researchers to share data, which is ironic in an evidence-based world. The open data badge is the only tested incentive that motivated researchers to share data (117). This low-cost incentive could be adopted by journals and added to the reward system to promote reproducible and sharable research (117, 120). Other proposed incentives, such as attribution, still require empirical testing. Instead of evidence-based incentives, the literature is full of opinion pieces that emphasise the lack of incentives for researchers to share data, outnumbering the articles testing strategies to increase data sharing rates in health and medicine. Observational studies that identify data sharing patterns and barriers are also plentiful, and whilst these studies provide useful background knowledge, they do not provide good evidence of what can be done to increase data sharing.

3.12 LIST OF ABBREVIATIONS

AGB: Adrian Gerard Barnett
ARF: Anisa Rowhani-Farid
C-MS3D: Collaboratory for MS3D
GA4GH: Global Alliance for Genomics and Health
GAAIN: Global Alzheimer’s Association Interactive Network
MA: Michelle Allen
METRICS: Meta-Research Innovation Center at Stanford
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PZG: praziquantel
QUT: Queensland University of Technology
REWARD: Reduce research Waste and Reward Diligence
WTSI: Wellcome Trust Sanger Institute

3.13 DECLARATIONS

3.13.1 Ethics approval and consent to participate

Not applicable


3.13.2 Consent for publication

Not applicable

3.13.3 Availability of data and material

The datasets generated and analysed during the current study are available at the Open Science Framework repository (DOI 10.17605/OSF.IO/DSPU8).

3.13.4 Competing interests

The authors declare that they have no competing interests.

3.13.5 Funding

No monetary assistance was provided for this systematic review; however, in-kind support was provided by the Australian Centre for Health Services Innovation at the Institute of Health and Biomedical Innovation at QUT.

3.13.6 Authors' contributions

ARF collected and analysed all data for the study and wrote the manuscript. MA collected and analysed data (a 30% sample) for the study and edited the manuscript. AGB provided close student mentorship for this research, which is part of ARF’s PhD under his primary supervision, and was a major contributor to the writing of this manuscript. All authors read and approved the final manuscript.

3.13.7 Acknowledgements

The QUT Librarians assisted in designing the search strategy for this review.



Chapter 4: Has open data arrived at the British Medical Journal (BMJ)? An observational study

The authors listed below have certified that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;

3. there are no other authors of the publication according to these criteria;

4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit, and

5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Has open data arrived at the British Medical Journal (BMJ)? An observational study


Contributor: Anisa Rowhani-Farid (ARF)
Statement of contribution: data collection, data analysis, writing of manuscript. ARF had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.
Signature. Date: 29/June/2018

Contributor: Adrian G. Barnett (AGB)
Statement of contribution: data verification, design of study and student supervision, editing of manuscript, statistical code for data analysis. AGB had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name: Adrian Barnett
Signature:
Date: 29/June/2018

4.1 OVERVIEW

This chapter contains a modified version of the study published in BMJ Open. The modifications are: the text, figures, tables, and references of the original article have been formatted for the purposes of this thesis. The published version can be found at:


Rowhani-Farid A, Barnett AG: Has open data arrived at the British Medical Journal (BMJ)? An observational study. BMJ Open 2016, 6(10). http://dx.doi.org/10.1136/bmjopen-2016-011784

4.2 ABSTRACT

Objective To quantify data sharing trends and data sharing policy compliance at the BMJ by analysing the rate of data sharing practices, and investigate attitudes and examine barriers towards data sharing.

Design Observational study.

Setting The BMJ research archive.

Participants 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews.

Main outcome measures Percentages of research articles that indicated the availability of their raw datasets in their data sharing statements, and those that easily made their datasets available upon request.

Results Three articles contained the data within the article itself. Fifty of the remaining 157 articles (32%) indicated the availability of their datasets. Twelve used publicly available data, and the remaining 38 were sent email requests to access their datasets. Only one publicly available dataset could be accessed, and only 6 of the 38 authors shared their data via email; thus only 7/157 research articles shared their datasets, 4.5% (95% confidence interval: 1.8% to 9%). For the 21 clinical trials bound by the BMJ data sharing policy, the percent shared was 24% (8% to 47%).

Conclusions Despite the BMJ’s strong data sharing policy, sharing rates are low. Possible explanations could be: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; that our email requests ended up in researchers’ spam folders; and that researchers are not rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.

4.3 ARTICLE SUMMARY

4.3.1 Strengths and limitations of this study

• Our study quantified data sharing among all types of research articles published in The BMJ from 2009 to 2015

• The BMJ data sharing policy specifically applies to clinical trial data but our study analysed data sharing among all studies that have original raw data

• The sample size was 160 articles, which is relatively small

• The BMJ data sharing policy suggests using the BMJ as a broker to negotiate data access, however we did not use this service given the amount of time and resources it required both on our part and on the BMJ’s

4.4 INTRODUCTION

Open data is defined as ‘available, intelligible, assessable and useable data’ (5). The practice of open data, or ‘data sharing’, is the exercise of making all raw data fully and openly available, creating transparency, ensuring reproducibility, and driving further discovery by allowing new knowledge to be generated in the context of earlier discoveries (8-10). Though the concept of data sharing has only recently been identified as a sub-theme of meta-research (16), it was a research topic 15 years ago, when Reidpath and Allotey conducted a prospective study to examine data sharing amongst BMJ articles. Only 1 of the 29 researchers contacted (3%) made their dataset available. The reluctance of researchers to make their data available raised questions about the validity of their findings, and suggested the researchers were potentially more concerned with not losing an advantage than with advancing science through data sharing (249).


The research climate in 2001 was significantly different to the current era, where rapid technological advances are contributing to what Bartling and Friesike (2014) refer to as the second scientific revolution with terms such as ‘data sharing’, ‘open data’, ‘open research’, and ‘Science 2.0’ proliferating in the scientific discourse (40). Open data and data sharing are now being considered as fundamental elements of the shift towards research that is verifiable, reproducible, and transparent (36).

Given the many recent changes to research publishing, it is fitting to conduct a similar study to Reidpath and Allotey’s. Our study quantified data sharing trends at the BMJ from 2009 to 2015, paying particular attention to policy changes at the journal that aimed to increase data sharing.

We selected the BMJ because it is an international health and medical research journal leading the data sharing movement. In March 2009, the BMJ introduced the idea of a data sharing statement in research articles. The purpose was to explain whether there were any additional data available and how they could be accessed (250). The BMJ was among the first medical journals to introduce such a concept; a significant milestone in the data sharing movement that is gathering momentum in health and medical research. In 2010, the BMJ crystallised the data sharing statement into a policy (251). In 2012, the BMJ introduced a stricter data sharing policy for drug and device trials: ‘From 1 January 2013, trials of drugs and medical devices will be considered for publication only if the authors commit to making the relevant anonymised patient level data available on reasonable request (252)’. From 1 July 2015 the BMJ’s requirements for data sharing extended to all submitted clinical trials, not just those that tested drugs or devices (146). A number of journals now require authors to share their data (253, 254), either via a public repository or by making it freely available upon request. The success of these policies remains largely untested (26).

Although the BMJ’s data sharing policy focuses on trials, data sharing should ideally apply to all types of research. This idea forms the basis of our study, which did not only examine clinical trials but included all types of studies with original raw data. The reasoning behind our approach is that the BMJ is “…keen to maximise the usefulness and usage of data and promote transparency, and to satisfy the requirements of the many research funders that encourage or even mandate data sharing” (255). From this statement, we deduce that the BMJ supports research reproducibility and transparency of research findings, which support high-quality research and apply to all research data.

4.5 METHODS

4.5.1 Overview

A random sample of research papers published in the BMJ was examined to observe the issues arising with data sharing, including a point raised on a recent BMJ podcast (256): that researchers indicate the availability of their data in order to pass editorial review, but fail to share when the data are requested. We contacted researchers who indicated in their data sharing statements that they were willing to make their datasets available. Our aims were to: i) estimate the rate of data sharing, and ii) examine the shared datasets by comparing them to the published findings to quantify the integrity of the data sharing process.

4.5.2 Participants

A random number generator (in Excel) was used to select the research papers. We excluded studies whose complete data were available in the article, including systematic reviews and meta-analyses. All other types of studies were included. Twenty BMJ research papers were randomly sampled per year from 2009 to 2014. In 2015 we randomly selected 20 papers published before the major policy change on 1 July 2015 and 20 papers published after it. The total sample size was 160. We did not use a formal sample size calculation because such calculations are often of limited value (257). Instead, the sample size was based on the practical considerations of reading papers, contacting authors, and examining their data.
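A stratified draw equivalent to this design could look like the following sketch. The article identifiers are hypothetical, and the study itself used Excel's random number generator rather than this code.

```python
import random

# Illustrative sketch of the stratified sampling design: 20 articles per year
# from 2009 to 2014, plus 20 before and 20 after the 1 July 2015 policy change.
random.seed(42)  # fixed seed so the draw is reproducible

strata = [str(y) for y in range(2009, 2015)] + ["2015 (1)", "2015 (2)"]
# Hypothetical archive: each stratum holds a list of candidate article IDs.
archive = {s: [f"{s}-article-{i:03d}" for i in range(1, 301)] for s in strata}

sample = {s: random.sample(ids, 20) for s, ids in archive.items()}
print(sum(len(v) for v in sample.values()))  # 160
```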

The setting of this study was the BMJ research archive. All information required for data collection was publicly available online. Data collection commenced on 12 November 2015 and ended on 31 January 2016. The first author (ARF) read the research papers and extracted the details of authors. The following variables were documented: type of study, data sharing statement, and data availability. The second author (AGB) independently assessed the data sharing statements for 20 randomly selected articles. No disagreements were found, meaning that there is a 90% probability that the agreement between the two authors is over 90%.
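The probability statement about agreement can be reproduced with a simple Bayesian calculation; the following sketch assumes a uniform Beta(1, 1) prior on the true agreement probability, an assumption not stated in the text.

```latex
% Sketch assuming a uniform Beta(1,1) prior on the agreement probability p.
% Observing 20 agreements in 20 checked articles gives the posterior
% p ~ Beta(21, 1), whose density is 21 p^{20}, so
\Pr(p > 0.9 \mid \text{20 agreements in 20 checks}) = 1 - 0.9^{21} \approx 0.89,
% i.e. roughly a 90\% probability that agreement exceeds 90\%.
```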

Authors of articles that stated a willingness to share their data were contacted via email. A de-identified copy of our approach email to authors is included as Appendix B. Three research articles had their data within the text of the article itself, as they were meta-analyses or systematic reviews; these researchers were not contacted, reducing the sample size from 160 to 157.

Email requests for data were sent from 18 November 2015 to 16 December 2015. 28 January 2016 was set as the final date for receiving datasets. A single reminder was sent to researchers who made an initial positive response but who did not send their datasets after two weeks. Alternative email addresses were only sought when our original email bounced back. A response from authors was taken as consent to participate in the study – all authors were informed about the ethical approval of the study and the procedure of consent.

Some research articles indicated that their data were available from external sources but were subject to additional applications. We did not apply for these datasets given the large amount of time it would take to apply, and because there was no guarantee we would gain access to the data.

4.5.3 Quantitative variables

We first categorised each article into:

Data not available – research articles whose data sharing statement was that ‘no additional data is available’

Data available – research articles that indicated in their data sharing statement that their data is available

And then categorised those with data available into:

Data not available – research articles that did not make their datasets available to our team upon request and research articles that had ‘publicly available data’ that we could not locate.

Data potentially available – research articles that indicated their datasets were available but subject to forms and applications, and research articles whose datasets were described as publicly available but were not easily accessible and also required forms and applications

Data easily available (received) – the research articles that made their dataset available to our team


4.5.4 Statistical methods

We reported the percent of data sharing with a 95% confidence interval. We examined the sample sizes and variables in the received data to verify that they matched the original paper. We used logistic regression to examine the change in data sharing over time, using publication date as the time variable. We used a log link in place of the logit, so our results are prevalence ratios rather than odds ratios (258).
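A log-binomial model of this kind can be fitted as in the following sketch. The data are simulated and the variable names hypothetical; this is not the study's actual analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Minimal sketch of a log-binomial regression on simulated data.
rng = np.random.default_rng(1)
years = rng.uniform(0, 7, 157)               # years since 2009 (hypothetical)
p_share = 0.02 * np.exp(0.23 * years)        # assumed true sharing probability
df = pd.DataFrame({"years_since_2009": years,
                   "shared": rng.binomial(1, p_share)})

X = sm.add_constant(df[["years_since_2009"]])
# A binomial family with a log link yields prevalence ratios, not odds ratios.
model = sm.GLM(df["shared"], X,
               family=sm.families.Binomial(link=sm.families.links.Log()))
fit = model.fit()
pr = np.exp(fit.params["years_since_2009"])
print(f"Prevalence ratio per additional year: {pr:.2f}")
```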

4.6 RESULTS

4.6.1 Participants

Out of 157 randomly sampled research articles, 50 indicated in various ways the availability of their raw data. The numbers, grouped by what was written in the data sharing statements, are in Table 4-1.

Table 4-1: Numbers of various data sharing statements for randomly selected BMJ research articles (2009–2015) that indicated the availability of their raw data

Data sharing statement | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 (1) | 2015 (2) | Total
Additional data available from author | 1 | 1 | 3 | 4 | 3 | 2 | 5 | 0 | 19
Reasonable requests for access to data can be made to the authors | 0 | 1 | 0 | 2 | 1 | 2 | 7 | 2 | 15
Data available from external sources subject to additional applications | 0 | 1 | 0 | 1 | 0 | 1 | 3 | 2 | 8
Data publicly available | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 3 | 5
Data available once all planned analyses were completed and results published | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2
Data available after 3 years, subject to a contract; authors will examine requests | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1
Total | 1 | 4 | 3 | 8 | 4 | 8 | 15 | 7 | 50

2015 (1): articles sampled before the 1 July 2015 policy change; 2015 (2): articles sampled after it.


Thirty-eight emails were sent to researchers who indicated in some way that their datasets were available. Of the 38 authors emailed, only 16 responded, leaving 22 non-responses, which were categorised as “data not available”. Six of the 16 respondents provided their datasets to our team (one of which was a randomised clinical trial whose shared data we could not verify as matching the article); these articles were categorised as “data easily available (received)”. Eight of the 16 respondents raised caveats upon request: three were categorised as “data potentially available”, as their data were subject to forms and applications, and the remaining five were categorised as “data not available”. Two respondents never followed through to make their data available and were categorised as “data not available”.

Twelve research articles had data that were available publicly or subject to forms: three provided external links that were no longer functioning, and three provided generalised links with no clear indication of the specific dataset used for the study. These six articles were categorised as “data not available”. Five of the 12 articles were subject to application forms and were categorised as “data potentially available”. Only one of the ‘publicly available’ datasets had been uploaded to a public data depository, Dryad.

Out of the 50 articles that indicated data were available, 21 were randomised clinical trials (RCTs). One RCT dataset was freely available on Dryad, leaving 20 RCTs whose authors were emailed to request their data. Thirteen of the 20 did not respond to our email and were categorised as “data not available”. Of the seven responses, four made their data available (one of which was unverifiable), giving an overall sharing rate of 5/21, 24% (8% to 47%). The remaining three responses raised caveats and did not make their data available to our team.

Flow charts of the data sharing results are in Figure 4-1 and, for the RCTs bound by the BMJ data sharing policy, in Figure 4-2. The data sharing rates in relation to BMJ policy changes are in Figure 4-3.


Figure 4-1: Flow chart of the randomly sampled BMJ research articles showing the availability of data


Figure 4-2: Flow chart of the randomly sampled BMJ research articles bound by the BMJ data sharing policy, randomised clinical trials, showing the availability of data


Figure 4-3: Summary of data availability and actual data received for BMJ research articles, grouped by year and in relation to data sharing policy changes. BMJ, British Medical Journal; RCT, randomised clinical trial


4.6.2 Main results

The total numbers were: 7/50 articles had “data easily available (received)”, 35/50 articles were “data not available”, and 8/50 articles were “data potentially available”.

Six of the seven datasets contained data that matched the article, with one dataset unverifiable as it was difficult to navigate the data and no data dictionary was provided.

The percentage of data easily available from the 157 articles was only 4.5% (95% confidence interval: 1.8% to 9%). One of the shared datasets was not verifiable, so the actual data sharing rate might be lower than 4.5%. A further 8 articles had data potentially available, so the data sharing rate could be as high as 9.6% (5.5% to 15%).
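This interval is consistent with an exact (Clopper–Pearson) binomial calculation, which can be reproduced as in the sketch below; whether the study used the exact method is an assumption.

```python
from statsmodels.stats.proportion import proportion_confint

# Sketch: exact (Clopper-Pearson) 95% confidence interval for 7 datasets
# shared out of 157 articles; method="beta" selects the exact interval.
low, high = proportion_confint(count=7, nobs=157, alpha=0.05, method="beta")
print(f"rate = {7/157:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
# Expected output is approximately: rate = 4.5%, 95% CI = (1.8%, 9.0%)
```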

For RCTs, 5/21 made their datasets easily available, a data sharing rate of 24% (95% confidence interval: 8% to 47%); 16/21 were categorised as “data not available”, and none as “data potentially available”.

Twenty-nine of the 50 articles were not bound by the policy but indicated data availability in their data sharing statements; only two made their data available. The sharing rate for articles not bound by the BMJ data sharing policy was therefore 2/29, 7% (95% confidence interval: 1% to 23%).

4.6.3 Authors’ responses to data sharing

The authors who made their datasets available did so with positive and encouraging words. Here are a few examples:

“Good luck with your project, I am a firm supporter of open access to data.”

“Thank you very much for you interest in our study. We adhere the BMJ data sharing policy indeed. Please find attached the data files.”

One researcher went so far as to offer to translate the dataset into English.

Eight of the 16 authors provided email responses that were not consistent with their data sharing statements and raised caveats, including: the requirement to enter into contracts with their institutions; writing a detailed plan indicating what we would do with their data; potentially paying for their data; that their data were no longer available as they were carrying out additional studies; and that their data were only available on their own university premises. These hidden policies, contracts, costs, and rules were not included in their BMJ data sharing statements. One researcher thought our research question was not “a reasonable research question” and so refused to share their data.

4.6.4 Change over time

A logistic regression showed a 26% increase in the rate of “data shared” for every additional year between 2009 and 2015 (95% confidence interval: 13% to 43%), and a 40% per-year increase in the rate of “data promised” over the same period (95% confidence interval: –4% to 131%).
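As a rough worked extrapolation (not a figure reported in the study), a prevalence ratio of 1.26 per year compounds multiplicatively over the study period:

```latex
% Compounding a per-year prevalence ratio of 1.26 over the six years
% from 2009 to 2015 (a rough extrapolation, not a reported result):
1.26^{6} \approx 4.0,
% i.e. roughly a four-fold increase in the rate of ``data shared''.
```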

4.7 DISCUSSION

Only 32% of the research articles indicated the availability of their raw data, and then only 14% of those approached made their data easily available, with just one dataset freely accessible on Dryad. This gives an overall data sharing rate of only 4.5% for research articles at the BMJ, with a higher rate of 24% among the clinical trials bound by the BMJ data sharing policy.

4.7.1 Interpretation

Of the 50 out of 157 articles that indicated the availability of their raw data, fewer than half (21) were clinical trials; the rest were cohort studies, cross-sectional analyses, modelling studies, case-control studies, retrospective analyses, and others. It is encouraging that the majority of research articles that offered to make their raw data available were not bound by the data sharing policy, which specifically applies to clinical trials. Assessing compliance with the BMJ data sharing policy was not the focus of our study, as we were interested in all types of research articles. The easily available data sharing rate for clinical trials was 24%, which is higher than the rate for all article types, but still low.

There are of course cases where ethical and legal constraints prevent data sharing, and we did not measure these occurrences.

Though 50 out of 157 research articles indicated the availability of their raw data, only 7 researchers easily provided their data for this study. It seems data sharing rates at the BMJ have only increased from 3% to 4.5% in 15 years, albeit with a 40% per-year increase in the rate of “data promised” since 2009 (249), demonstrating increased compliance with data sharing policies for publication purposes, but not in practice (256).

With regard to the caveats that were raised only after we requested access to “available” raw data, we recognise that researchers have the right to set their own conditions for data access, but none of these conditions were mentioned in the data sharing statements. Ideally authors should state all the conditions in their data sharing statement, so as to clearly outline the procedures for accessing their raw data. It should not take much extra time to add this information to the data sharing statement. If there are restrictions on data availability – such as, home institution restrictions or other agreements with companies – these restrictions should be clearly outlined in the data sharing statement, an example of which could read: “Our University’s Data Sharing policy is that data is only available at our institution”. Ideal data sharing is freely available, easily accessible raw data that is downloadable from an online data depository such as Dryad.

Our findings are comparable to similar studies assessing data sharing rates at Public Library of Science (PLOS) journals. A study by Savage and Vickers in 2009 received only 1/10 datasets (10%) that were requested (27), and a larger sample of 441 biomedical journal articles published from 2000 to 2014 had a data sharing rate of 0%, although these researchers only searched for freely available data and did not email authors (16). It is evident that data sharing is not common practice even among publishers with strong data sharing policies such as the BMJ and PLOS.

The cultural shift towards more open data in health and medical research is not as developed as in the discipline of genomics. An empirical study conducted by Milia et al. in 2012 demonstrated that ‘the majority of published data regarding human genetic variation are made openly available to the scientific community’ (259).

There are a few possible explanations for the low data sharing rates at the BMJ. The wording of the BMJ data sharing policy states that authors of all submitted clinical trials, not just those testing drugs and devices, must commit to making the relevant anonymised patient level data available on reasonable request (146). The BMJ’s editorial post in 2012 explains ‘reasonable request’ as:

“As for “reasonable request,” the BMJ is not in a position to adjudicate, but we will expect requesters to submit a protocol for their re-analysis to the authors and to commit to making their results public. We will encourage those requesting data to send a rapid response to thebmj.com, describing what they are looking for. If the request is refused we will ask the authors of the paper to explain why (252).”

The interpretation of ‘reasonable request’ is left to individual authors. What we considered a ‘reasonable request’ may not have been seen as such by other researchers, and this could be behind the low data sharing rate. Some authors may have thought the purpose of our study was not worth their time and resources, hence placing our request in the category of unreasonable requests. It is not the purpose of this paper to convince the audience of the reasonableness of our study, but rather to bring to the BMJ’s attention the ambiguity created by the policy wording. With regard to submitting a protocol, our email included all the procedures of our examination of the dataset for verification. We did not use the BMJ to broker access to data on our behalf, and the data sharing rate could have been higher had we used this route, although we note that this takes additional time and effort on our part and on the part of BMJ staff.

There are other potential reasons for the low data sharing rates. Given that 55% of the researchers contacted via email did not respond, we could deduce that they never received our email due to out-dated email addresses or spam filters, that they were too busy, or that our request was simply ignored. We therefore recommend that multiple contact details are given, potentially including other researchers or even Twitter accounts. Non-response problems would be overcome by having the data stored by a third party, such as Dryad, as recommended by the BMJ.

Another possible barrier to data sharing is the lack of rewards in the scientific community. Researchers who participate in the culture of sharing should be supported and rewarded by the academic and research career systems (38, 101, 260). The lack of incentives for data sharing is a key barrier, as researchers are often time poor; many do not see the value of spending time preparing their data, or may be concerned about lengthy follow-up questions. A recent study conducted by Kidwell and colleagues (2016) demonstrated that badges, developed by the Center for Open Science, were effective incentives that increased data sharing rates (117). To encourage data sharing in health and medical research, it might be beneficial to change the criteria by which scientists and their teams are rewarded for their efforts by funding agencies and institutions (120). Ioannidis and Khoury (2014) designed the “PQRST approach” for rewarding researchers, where the ‘S’ stands for sharing of data, code, and protocols (120). To contribute to the adoption of a culture of data sharing, in early 2016 the International Committee of Medical Journal Editors (ICMJE) put together a proposal outlining requirements to help mandate clinical trial data sharing worldwide (151).

4.7.2 Limitations

The sample size of 160 is relatively small. However, the confidence interval for the rate of easily shared data is quite narrow, and its upper limit is below 10%.

Our data sharing rate could have been increased by more actively chasing researchers; yet, as Iqbal et al. (2016) noted, ‘the yield would be uncertain, and personal communications should not replace the lack of transparency in the published scientific record’ (16). As such, we did not try to find alternative email addresses for researchers who did not respond (we did try to find an alternative address if an email bounced back), nor did we follow them up. Also, we did not approach the journal to help us negotiate access, given the amount of time and resources such a task would require, both on our part and on the BMJ’s, for up to 160 papers.

We did not compare the characteristics of those who did and did not share their data (e.g., by country) as this was not one of our study aims.

4.7.3 Generalisability

We used a random sample of BMJ research papers and only excluded meta-analyses and systematic reviews. Hence our results should be generalisable to the wider BMJ literature and potentially to other general medical journals.

4.8 CONCLUSION

As policies and regulations that promote and encourage data sharing become more common, our study provides a glimpse into the reality of data sharing practices among health and medical researchers, using the BMJ as a case study. Has open data arrived at the BMJ? We think not. With a data sharing rate of only 4.5% among all studies and 24% among clinical trials, there is clear room for improvement, despite the journal’s internationally leading stance on encouraging data sharing. Tighter data sharing policies and better incentives for researchers to share their data might be needed.


4.9 OTHER INFORMATION

4.10 COPYRIGHT, OPEN ACCESS, AND PERMISSION TO REUSE

The Corresponding Author has the right to grant on behalf of all authors, and does grant on behalf of all authors, a worldwide licence to the Publishers and its licensees in perpetuity, in all forms, formats and media (whether known now or created in the future), to i) publish, reproduce, distribute, display and store the Contribution, ii) translate the Contribution into other languages, create adaptations, reprints, include within collections and create summaries, extracts and/or abstracts of the Contribution, iii) create any other derivative work(s) based on the Contribution, iv) exploit all subsidiary rights in the Contribution, v) include electronic links from the Contribution to third party material wherever it may be located; and vi) licence any third party to do any or all of the above.

4.11 FUNDING

This project is supported in kind by the Australian Centre for Health Services Innovation which is based at the Institute of Health and Biomedical Innovation at Queensland University of Technology (QUT) in Brisbane, Australia. There was no specific funding for this study.

4.12 CONTRIBUTORS

ARF – data collection, data analysis, writing of manuscript

AGB – data verification, design of study and student supervision, editing of manuscript, statistical code for data analysis

Both authors had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

4.13 ETHICAL APPROVAL

This study received low-risk ethical approval from the Office of Research Ethics and Integrity at QUT.


4.14 COMPETING INTERESTS

All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

4.15 TRANSPARENCY

The lead author affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

4.16 OPEN ACCESS

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0.

4.17 DATA SHARING

The data set (with all identifiers removed) is deposited as a data supplement at the Dryad data repository at http://datadryad.org/ with the doi:10.5061/dryad.q8s5k. The data are free for re-use by all other researchers and there are no additional forms or ethics applications that need to be completed.


Chapter 5: Badges for data and code sharing at Biostatistics: an observational study

The authors listed below have certified that:

1. they meet the criteria for authorship in that they have participated in the conception, execution, or interpretation, of at least that part of the publication in their field of expertise;

2. they take public responsibility for their part of the publication, except for the responsible author who accepts overall responsibility for the publication;

3. there are no other authors of the publication according to these criteria;

4. potential conflicts of interest have been disclosed to (a) granting bodies, (b) the editor or publisher of journals or other publications, and (c) the head of the responsible academic unit; and

5. they agree to the use of the publication in the student’s thesis and its publication on the QUT’s ePrints site consistent with any limitations set by publisher requirements.

In the case of this chapter:

Badges for data and code sharing at Biostatistics: an observational study


Contributor: Anisa Rowhani-Farid
Statement of contribution*: ARF conceptualised the study design, collected the data and assisted with data analysis, and wrote the first draft of the manuscript.
Signature and date: 29/June/2018

Contributor: Adrian G. Barnett
Statement of contribution*: AGB conceptualised the study design, analysed the data and edited the manuscript.

Principal Supervisor Confirmation

I have sighted email or other correspondence from all Co-authors confirming their certifying authorship. (If the Co-authors are not able to sign the form please forward their email or other correspondence confirming the certifying authorship to the RSC).

Name: Adrian Barnett
Date: 29/June/2018

5.1 OVERVIEW

This chapter contains the modified version of the study that was published in F1000Research. The modifications are limited to formatting: the text, figures, tables, and references of the original document have been formatted for the purposes of this thesis. The published version can be found at:

Rowhani-Farid A and Barnett AG. Badges for sharing data and code at Biostatistics: an observational study [version 2; referees: 2 approved]. F1000Research 2018, 7:90 (doi: 10.12688/f1000research.13477.2)

Biostatistics was chosen as the journal of focus for this study because, at the time of this study and to the best of my knowledge, it was the only journal apart from Psychological Science that rewarded authors for their reproducible practices. Though the focus of this thesis is on reproducible practices in health and medical journals, the findings of this chapter are likely generalisable to a broader scope of journals, as journals from all fields that publish observational or experimental studies are beginning to consider the importance of reproducibility policies and incentives for sharing data and code.

5.2 ABSTRACT

Background: The reproducibility policy at the journal Biostatistics rewards articles with badges for data and code sharing. This study investigates the effect of badges at increasing reproducible research.

Methods: The setting of this observational study is the Biostatistics and Statistics in Medicine (control journal) online research archives. The data consisted of 240 randomly sampled articles from 2006 to 2013 (30 articles per year) per journal. Data analyses included: plotting probability of data and code sharing by article submission date, and Bayesian logistic regression modelling.

Results: The probability of data sharing was higher at Biostatistics than at the control journal, but the probability of code sharing was comparable for both journals. The probability of data sharing increased by 3.9 times (95% credible interval: 1.5 to 8.44 times; Bayesian p-value, i.e., probability that sharing increased: 0.998) after badges were introduced at Biostatistics. On an absolute scale, this difference was only a 7.6% increase in data sharing (95% CI: 2 to 15%; p-value: 0.998). Badges did not have an impact on code sharing at the journal (mean increase: 1 time; 95% credible interval: 0.03 to 3.58 times; p-value: 0.378). Sixty-four percent of articles at Biostatistics that provided links to data or code had broken links, as did 40% at Statistics in Medicine; assuming these links worked only slightly changed the effect of badges on data (mean increase: 6.7%; 95% CI: 0.0% to 17.0%; p-value: 0.974) and on code (mean increase: -2%; 95% CI: -10.0 to 7.0%; p-value: 0.286).

Conclusions: The effect of badges at Biostatistics was a 7.6% increase in the data sharing rate, roughly a fifth of the effect of badges at Psychological Science. Though badges at Biostatistics did not impact code sharing, and had only a moderate effect on data sharing, badges are an interesting step that journals are taking to incentivise and promote reproducible research.


5.3 KEY WORDS

Reproducibility, incentives, rewards, data sharing, code sharing, meta-research

5.4 INTRODUCTION

Historically, the replication of a scientific experiment has been the measure of its validity; however, not all experiments can be replicated in their totality (261). ‘Replicability’ is the ability of a researcher to duplicate the results of a prior study if the same procedures are followed but new data are collected (22). In 2009, Roger Peng wrote in an editorial in Biostatistics that the minimum standard that could bridge the gap between replicability and nothing is “reproducible research” (261). ‘Reproducibility’ is the ability of a researcher to duplicate the results of a prior study using the same materials as were used by the original investigator (22). Reproducibility was defined by Peng in terms of sharing the data and the computer code used to analyse the data, and he described it as the “cornerstone of the scientific method” (261). In a perspective piece in 2011, Peng likened reproducibility to a spectrum, with the gold standard of full replication at one end and publication only at the other (262). Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to reproduce published experiments, fail to maintain the standards of science (43), although in some instances highly sensitive data cannot be shared for legal or privacy reasons.

Scientific journals are critical to changing the culture of research. Many journals are introducing data sharing policies, but studies have shown that policies alone are not effective in promoting a culture of sharing and that scientists potentially need to be rewarded for good behaviour (263). Ioannidis et al. discuss changing the reward criteria to include ‘reproducible’ and ‘shareable’ using the PQRST criteria: productive, high-quality, reproducible, shareable, and translatable (120). A systematic review of incentives that motivated researchers to share their data in the health and medical research community uncovered only one evidence-based incentive, which increased data sharing at the journal Psychological Science from 1.5% pre-incentive (2012) to 39.4% post-incentive (2015) (117, 264). This incentive was an open data badge developed by the Center for Open Science (COS) and introduced at the journal in January 2014 (117).


Badges for reproducible research were not, however, an invention of COS. The journal Biostatistics introduced badges, or what it called kitemarks (named after the UK kitemark system for establishing product safety), on 1 July 2009 as part of its policy to reward reproducible research (261). The policy was introduced by Roger Peng, the then Associate Editor for reproducibility (AER) (261). Sharing was not enforced; rather, authors were encouraged to consider the reproducibility of their research (261). From here on, kitemarks will be referred to as badges, in line with common terminology.

The reproducibility policy at the journal instructed authors to indicate in their submission whether they intended to submit supplementary materials that include data, code, or both (261). The policy rewarded articles that made data available with the letter D on the front page of the published article PDF, articles that made code available with a C, and articles that made both data and code available, and whose results were tested for reproducibility by the AER, with an R for reproducibility (261). It is important to note that data here refers to raw data and not simulated data, which are commonly used in statistics.

The policy change at Biostatistics provided an ideal opportunity to replicate the findings of the Kidwell et al. badge study by examining sharing rates at another journal that offered a reward or incentive for reproducible research (117). We note that Kidwell et al. examined data and material sharing only, as badges were not offered for code.

A survey conducted by Nature in 2016 indicated that the scientific community is in the midst of a reproducibility crisis (78). The current culture in science provides strong incentives for innovation and relatively weak incentives for certainty and reproducibility (265). Within the current ‘post-truth’ era there is much public scrutiny and suspicion around the validity of science. Such a debate, compounded by the reproducibility crisis, signals a time for a cultural shift in the scientific research process (266). The sharing of data, as well as the computer code used to analyse the data, should, where possible, be an integral component of the research process; however, data sharing rates have been as low as 0% (16). Of course, not all data can be shared, due to legal and ethical constraints, but these are neither the only nor the main reasons behind low sharing rates (20). Scientists are still exploring the barriers to sharing, and a key concern is that researchers are not incentivised to share (262).


5.4.1 Aim

Our aim is to investigate the effect of badges on increasing reproducible research, specifically data and code sharing, at Biostatistics.

5.5 METHODS

5.5.1 Participants

This is an observational study with two journals, intervention and control, using a pre-post study design, with 30 randomly selected papers per year from 2006 to 2013 for each journal. We chose Statistics in Medicine as the control journal as it did not have badges or any other reproducible research reward scheme during those years, but is in the same field of research, with similar goals of publishing papers on statistical methods development in health and medicine. Additional control journals would have increased the representativeness of our study and increased the statistical power; however, no other similar journals from the field of biostatistics satisfied the inclusion criteria, as they all introduced a reproducibility policy before or during 2006 to 2013. Therefore, the study setting is the Biostatistics and Statistics in Medicine online research archives. All the information required was publicly available online; as such, participant consent was not required and an ethics exemption (exemption number: 1700001051) was granted by the Office of Research Ethics and Integrity at the Queensland University of Technology.

5.5.2 Sample size calculation and power

A sample of only 19 papers per journal would have given us 90% power to detect a difference in data sharing of 37.9%, based on the effect of badges in the Kidwell et al. study (117), using a two-sided 5% significance level. We felt this sample was unrealistically small, so we instead based our sample size on the practical considerations of reading papers and examining their data and code sharing choices, given the time constraints of the first author’s (ARF) PhD. Thirty papers per year from 2006 to 2013 for two journals gives a total sample of 480 papers, which was practically possible and provides good coverage over the time of the policy change at Biostatistics.

5.5.3 Data collection

For each year and journal, a random number generator (in Microsoft Excel 2016) was used to select the research articles. Articles were included if they:


• Generated and analysed original data (article had data and code to share), or

• Conducted secondary analyses on a pre-existing dataset from another study (article had data and code to share), or

• Generated simulated data (article did not have data to share but had code to share)

Articles were excluded if:

• They were meta-analyses, meta-regressions, or systematic reviews, as these papers usually contain the data within the paper

• They were case series, opinion pieces or some other publication type without data or code

If an article was excluded, we sampled another article from the same year and journal to maintain the sample size. ARF read the research papers and extracted the details of the articles included in the study. Each article was screened using these search terms: “data”, “code”, “package”, “available”, “https”, “www”, “figshare”, and “github”. For the included articles, the following variables were documented: submission date, data sharing statement, data availability, hyperlink to dataset, code sharing statement, code availability, hyperlink to code, and badge allocation (for Biostatistics articles).
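
As an illustration of this screening step, the following minimal R sketch flags which of the search terms appear in an article’s text. The function name and file path are hypothetical, and the sketch assumes each article has been saved as a plain-text file; it is not the study’s actual workflow.

# Minimal sketch: flag which screening terms appear in an article's text.
# Assumes the article is saved as a plain-text file (hypothetical path).
terms <- c("data", "code", "package", "available", "https", "www",
           "figshare", "github")
screen_article <- function(file) {
  text <- tolower(paste(readLines(file, warn = FALSE), collapse = " "))
  sapply(terms, function(term) grepl(term, text, fixed = TRUE))
}
# Example: screen_article("articles/biostatistics_2010_07.txt")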

The second author (AGB) independently assessed data and code sharing for 20 randomly selected articles. There were minor discrepancies between the authors, which were resolved by discussion.

Using definitions from our previous work (263), each research article was categorised for data and code sharing as:

Data sharing

• available: articles that had a functioning link to a publicly available dataset deposited at a third-party site or attached as supplementary material to the electronic version of the article

• potentially available: articles that indicated that the dataset was potentially available upon request from the authors

• not available: articles that did not indicate the availability of the dataset analysed in the article, or where the link to the data was no longer working

• none to share: articles that used simulated data and so did not have a raw dataset to share

Code sharing

• available: articles that had a functioning link to publicly available code deposited at a third-party site, attached as supplementary material to the electronic version of the article, or available within the article itself

• potentially available: articles that indicated that the code was potentially available upon request from the authors

• not available: articles that did not indicate the availability of the code used to analyse the data (raw or simulated), or where the link to the code was no longer working

5.5.4 Intervention period

We defined the intervention period based on the policy change date at Biostatistics, using each article’s submission date, as this is when authors are thinking about the journal’s requirements and perhaps becoming aware of the badge. Since the policy change was on 1 July 2009, papers submitted to Biostatistics after that date were in the intervention period. We included a six-month gap before the policy change as an interim phase, because papers submitted during this time (1 January 2009 to 1 July 2009) could experience the badge policy upon re-submission; papers submitted in this period were therefore categorised into the interim period. Any papers submitted to Biostatistics before 1 January 2009 were in the control period, and all papers submitted to Statistics in Medicine were controls.

The first analysis examined data and code availability and probability of sharing over time using submission date. As a sensitivity analysis, we used the articles’ publication dates extracted from PubMed in place of submission date. We conducted this sensitivity analysis to examine whether the policy was associated with a change based on the very latest date that authors could make changes to their papers.


5.5.5 Statistical methods

We plotted the binary data and code sharing over time and included a smooth curve to estimate the mean sharing rate over time in each journal. The smooth curves were made using a LOESS smooth with a span of 0.9, and we also plotted the 95% confidence intervals. Papers where there were no data to share (i.e., those using simulated data) were excluded from these plots.
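
A plot of this kind could be produced with ggplot2 along the following lines. This is a sketch only: the data frame papers and its columns (submitted, shared, journal) are assumed names, not the study’s actual analysis code.

# Sketch: binary sharing over time with a LOESS smooth (span 0.9) and 95% CIs.
# `papers`, with columns `submitted` (date), `shared` (0/1) and `journal`,
# is an assumed data frame, not the study's actual data.
library(ggplot2)
ggplot(papers, aes(x = submitted, y = shared, colour = journal)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess", span = 0.9, se = TRUE) +  # smooth mean + 95% CI
  geom_vline(xintercept = as.numeric(as.Date(c("2009-01-01", "2009-07-01"))),
             colour = "red") +                            # interim period bounds
  labs(x = "Submission date", y = "Probability of sharing")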

To test for a difference in the probability of making data and code available after the introduction of badges, we used logistic regression and presented the results as prevalence ratios rather than odds ratios, as prevalence ratios are generally easier to understand (258). Due to possible convergence issues with a standard logistic regression model using a log link to estimate prevalence ratios, we ran a Bayesian logistic regression model using WinBUGS (version 1.3.4). Using a Bayesian model has the added advantage of giving 95% credible intervals and Bayesian p-values, which are far easier to interpret than frequentist confidence intervals and p-values. The Bayesian p-values used here estimate the probability that sharing increased after the policy change at Biostatistics. As well as showing the change in data and code sharing probability on the relative scale of the prevalence ratio, we also show the absolute increase in sharing probability after the policy change, together with 95% credible intervals.
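
To make the prevalence ratio concrete, the sketch below shows a frequentist log-link binomial model of the kind that motivated the Bayesian approach; the data frame papers and the variables shared and period are assumed names. Exponentiating the coefficients of a log-link model gives prevalence ratios directly, but, as noted above, such models can fail to converge, which is why the analysis was run as a Bayesian model instead.

# Sketch: log-link binomial model whose exponentiated coefficients are
# prevalence ratios; `papers`, `shared` and `period` are assumed names.
# This frequentist version can fail to converge, hence the Bayesian model.
fit <- glm(shared ~ period,        # period: control / interim / post-badges
           family = binomial(link = "log"), data = papers)
exp(coef(fit))                     # prevalence ratios relative to control
exp(confint.default(fit))          # Wald-type 95% confidence intervals

In the Bayesian version, the same linear predictor is given vague priors and fitted by MCMC, which yields the credible intervals and the posterior probability that sharing increased.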

In a sensitivity analysis we used a strong control for time by including year as a random effect, assuming that each year has its own data sharing rate. This essentially matches papers from Biostatistics and Statistics in Medicine from the same year. We did this to adjust for other changes over time, for example a potential increase over time in data and code depositories such as GitHub, Figshare, and Dryad, and a potential decrease in data and code availability for papers published many years ago because of broken links (155).

The current editors of Biostatistics indicated that when the publisher (Oxford) switched to a new publishing platform in January 2017, some of the supplemental material was lost in the transfer (personal communication, J Leek, 8 November 2017). As such, we conducted a sensitivity analysis assuming these broken links worked before Oxford changed publishing platforms.

The data analysis was performed using the statistical software R (version 3.2.3).


5.6 RESULTS

5.6.1 Broken links

We often encountered broken hyperlinks at both journals. Forty-nine of the 76 (64%) articles that provided links to data and code at Biostatistics had broken links, as did 21 of the 53 (40%) such articles at Statistics in Medicine. We examine the impact of these broken links in sensitivity analyses.

5.6.2 Data availability over time

Flow charts show the frequency of data availability for each journal (Figures 5-1a and 5-1b). Biostatistics had 8 articles with no data to share, bringing the sample with possible data available to 232, of which 20 had data available, 3 had data potentially available and 209 had no data available. Statistics in Medicine had 31 articles with no data to share, bringing the sample with possible data available to 209, of which 2 had data available, 4 had data potentially available and 203 had no data available.

Figure 5-1a: Flow chart of data availability for 240 randomly selected Biostatistics articles from 2006 to 2013: none to share n = 8; available n = 20; potentially available n = 3; not available n = 209.

Figure 5-1b: Flow chart of data availability for 240 randomly selected Statistics in Medicine articles from 2006 to 2013: none to share n = 31; available n = 2; potentially available n = 4; not available n = 203.


The data availability and probability of sharing by submission date, together with a smooth mean and 95% confidence intervals, are in Figure 5-2a. The vertical red lines are at 1 July 2009, the date badges were introduced at Biostatistics, and 1 January 2009, six months prior to the policy change (interim period). It is clear that data availability and the probability of sharing were greater over time in Biostatistics than in the control journal, Statistics in Medicine, but the probability of sharing data at Biostatistics was still low, at well below 0.25. Interestingly, an increase in data sharing at Biostatistics took place before badges were introduced at the journal. The results of the sensitivity analysis using publication date are shown in Figure 5-2b. The smooth means in Figure 5-2b are similar to those in Figure 5-2a and show that data availability and the probability of sharing were increasing at Biostatistics before badges were introduced. The results of the sensitivity analysis assuming the broken links were working, using submission date as the time variable, are shown in Figure 5-2c. The smooth means in Figure 5-2c are similar to those in Figures 5-2a and 5-2b, showing that the data sharing results are not greatly influenced by these broken links.

Figure 5-2a: Plot of data availability over time by submission date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009


Figure 5-2b: Plot of data availability over time by publication date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009


Figure 5-2c: Plot of data availability by submission date assuming the now broken links were working at the time. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009.

5.6.3 Code availability over time

The frequency of code availability for each journal is shown in Figures 5-3a and 5-3b and was comparable for the two journals. Statistics in Medicine had 24 articles with code available, 27 potentially available, and 189 with no code available, while Biostatistics had 14 articles with code available, 22 potentially available, and 204 with no code available.


Figure 5-3a: Flow chart of code availability for 240 randomly selected Biostatistics articles from 2006 to 2013: available n = 14; potentially available n = 22; not available n = 204.

Figure 5-3b: Flow chart of code availability for 240 randomly selected Statistics in Medicine articles from 2006 to 2013: available n = 24; potentially available n = 27; not available n = 189.

The code availability and probability of sharing by submission date together with a smooth curve and 95% confidence intervals are in Figure 5-4a. The smooth means for Biostatistics and Statistics in Medicine are mostly on top of each other in this graph, except for a drop-off in sharing at Biostatistics in later years. This indicates no great difference in code sharing at these journals. Figure 5-4b shows the results of the sensitivity analysis, where publication date was used instead of submission date. In this graph (Figure 5-4b), the smooth curves for Biostatistics and Statistics in Medicine are again mostly on top of each other, showing an increase in code sharing over time at both journals, but around mid-2011 the two curves diverged, with Statistics in Medicine showing an increase in code sharing and Biostatistics a drop. The results of the sensitivity analysis assuming the broken links were working using submission date as the time variable are shown in Figure 5-4c. In this graph the smooth curves are again mostly overlapping, but with greater code availability over time at both journals.


Figure 5-4a: Plot of code availability over time by submission date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009.


Figure 5-4b: Plot of code availability over time by publication date. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009.


Figure 5-4c: Plot of code availability by submission date assuming the now broken links were working at the time. The dots at ‘No’ or ‘Yes’ are individual articles and the lines are a smoothed mean using a LOESS together with 95% confidence intervals (grey areas). The red lines indicate the interim period: 1 January 2009 to 1 July 2009.

5.6.4 Increase in data sharing associated with badges

The logistic regression model estimated that the probability of data sharing in the interim period of 1 January 2009 to 1 July 2009 was 5.7 times (95% CI for prevalence ratio: 0.69 to 16.43; p-value: 0.947) that of the control period. This Bayesian p-value gives an estimated 94.7% probability that the mean rate of sharing increased. After the interim period, the probability of data sharing increased by an estimated 3.9 times (95% CI: 1.5 to 8.4; p-value: 0.998) after badges were introduced. On an absolute scale, this difference was only a 7.6% increase in data sharing (95% CI: 2 to 15%). After controlling for time, badges increased the probability of data sharing at the journal by an estimated 4.9 times (95% CI: 1.5 to 13.0; p-value: 0.997). This is comparable to the prevalence ratio of 3.9 when time was not added as a random effect, which shows that controlling for time only slightly increased the estimated effect of badges on the probability of data sharing. After assuming the now broken links were working at the time of publication, the logistic regression model that controlled for time gave a slightly different estimate of the mean effect of badges, from the previous 7.6% to 6.7% (95% CI: 0.0% to 17.0%; p-value: 0.974).

5.6.5 Increase in code sharing associated with badges

During the interim period, badges did not have an effect on code sharing (prevalence ratio of 1). After the interim period there was an estimated 0.61% increase (95% CI: –5 to 8%; p-value: 0.55) in sharing. After adjusting for time, this absolute difference reduced to –1.4% (95% CI: –7 to 5%; p-value: 0.287). This suggests that badges did not have an impact on the probability of sharing code. After assuming the now broken links were working at the time of publication, the logistic regression model that controlled for time estimated a slightly changed mean effect of badges, from the previous 0.61% to –2% (95% CI: –10 to 7%; p-value: 0.286).

5.7 DISCUSSION

5.7.1 Are badges effective incentives for reproducibility?

The results of this observational study and those of the related Kidwell et al. badge study (117) cannot accurately establish the effectiveness of badges because of the non-randomised study designs. The Kidwell et al. (2016) badge study received criticism from Hilda Bastian on its study design, analyses, and claims (267). One of the criticisms was that the badge scheme was not the only intervention offered at the journal; four other co-interventions were offered in 2014, so any effect could not be attributed to badges alone (267). Bastian reasonably argued that to isolate the impact of badges, groups that had the same conditions except badges were needed (267). Our study is exposed to similar limitations with regard to confounding, as other changes may have occurred that we were not aware of. However, we can derive some insight into the effect badges had on data and code sharing from the results of both observational studies.

After the introduction of badges at Biostatistics, the probability of data sharing increased 3.9 times. This prevalence ratio might seem like a large increase, but on an absolute scale it is only a 7.6% increase in the rate of data sharing, which is much lower than the 37.9% effect of badges at Psychological Science (117). When the now broken links were assumed to indicate sharing, the badge effect reduced slightly to 6.7%. The large difference between the effect of badges at Biostatistics and Psychological Science could be related to differences in the culture of sharing between the two fields, and to the timeframes of the studies: 2006 to 2013 for our study, versus 2012 to 2015 for Kidwell et al. Our study analysed incentives for data and code sharing at an earlier time, when the reproducibility crisis was not yet widely acknowledged; hence researchers may have been more primed to change behaviour in the Kidwell et al. study. Also, since statisticians typically re-analyse existing datasets, it might be harder for them to share the data as they might not have the rights. This is contrary to research in psychological science, where original data are normally collected and analysed, making sharing a potentially simpler task.

There was an apparent increase in data sharing before badges were introduced at Biostatistics (Figure 5-2a). One possibility is that articles submitted before the policy change could still have experienced the policy because of the time needed for peer review and resubmission. We used submission date to determine whether articles were prepared before or after the policy change because sharing data often takes preparation time, and we believed that authors were therefore more likely to react to the policy when writing their first draft. However, data sharing seemed to be increasing before badges were introduced even when we used publication date in a sensitivity analysis. The reproducibility policy at Biostatistics was built on an existing framework that “allowed and encouraged authors to place supplementary materials online” (261). This option of depositing supplementary material could have contributed to the rise in data sharing before badges. Also, Roger Peng assumed the role of Associate Editor for reproducibility at Biostatistics in 2006, which might have catalysed a change in the culture of reproducibility at the journal. Another possible contributor to the increase in data sharing before the policy change is the general trend towards more open science and open data (37).

Badges did not appear to have an effect on code sharing, as the prevalence ratio was 1.1. When the now broken links were assumed to indicate code sharing, the badge effect on code changed slightly, from the previous 0.61% to –2%. This is an unexpected outcome, as code is of great importance in the field of biostatistics. A possible explanation for the lack of badge effect on code sharing could be our definition of code sharing, which might seem traditional compared with the reproducibility policy at Biostatistics. We defined code sharing as the availability of the code used to analyse the data (original or simulated) in the article. The policy at Biostatistics included referencing “…software that is widely available from central repositories (e.g. CRAN, Statlib)”. It is true that such third-party repositories of software packages typically contain some general code, for example in package vignettes, but it often takes specialised skills to work out the code at these repositories, and it might not always cover the analyses in the actual published article. This is in line with what Stodden et al. recommended in their piece on reproducibility in Science: “Data and code underlying discoveries must be discoverable from the related publication, accessible, and reusable” (268).

Badges have been promoted as a simple solution because they are low cost. However, while collecting data for our study, we noticed that articles did not always appear to be allocated badges correctly, implying that assigning badges is not always clear cut and that journal staff may need to spend more time on verification. An alternative approach is for peer reviewers to check for data and code availability and assign badges as part of the standard peer review process. Peer reviewers may prefer to have access to the data and code in order to review the article anyway, so this model might work, but it still requires additional time and effort on their part, for which they receive little recognition, and it might be unfair to expect all peer reviewers to check for data and code sharing.

5.8 CONCLUSION

Efforts are underway by the global meta-research community to strengthen the reliability of the scientific method (21). Data and code sharing is an indispensable part of the movement towards science that is open; where scientific truth is not a questionable commodity, but is easily accessible, replicable, and verifiable (269). The cultural shift towards reproducible science is complex, and it calls for a twofold change: in the attitudes of individual researchers toward reproducibility, and in the leadership provided by the systems and services that support scientific research. As such, journals, universities, government bodies, and funders are key players in promoting this culture. Transparency and reproducibility are elements central to strengthening the scientific method, and data and code provide the key to scientific truth (16). As Peng argued in Science, the culture of reproducibility will not drastically change overnight, but simply bringing the notion of reproducibility to the fore and making it routine will make a difference (262). Badges are already being used by journals and organisations including Biostatistics, Psychological Science, British Medical Journal Open Science, and the Association for Computing Machinery to encourage researchers to share the evidence behind their work (261, 270). Based on this observational study and a previous study, it appears that badges do help to increase data sharing, but a randomised trial is needed to better estimate their true effect, as well as studies of the additional time needed to implement and maintain them.

5.9 DATA AVAILABILITY

Anonymised data and the R code used in the analyses are publicly available at: https://doi.org/10.6084/m9.figshare.5687548.v2 (271).

5.10 CONSENT

An ethics exemption was granted by the Office of Research Ethics and Integrity at the Queensland University of Technology for this study (exemption number: 1700001051). No consent was needed as all data collected and analysed in this study were publicly available.

5.11 COMPETING INTERESTS

No competing interests were disclosed.

5.12 GRANT INFORMATION

This study was supported in kind by the Institute of Health and Biomedical Innovation at the Queensland University of Technology.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

5.13 ACKNOWLEDGEMENTS

Thanks to one of the current Editors of Biostatistics, Professor Jeffrey Leek, for his prompt response to our enquiry. Sincere thanks goes to my PhD colleague, Victoria McCreanor, for reading through the draft manuscript and providing feedback.


Chapter 6: Did awarding badges increase data sharing at British Medical Journal Open (BMJ Open)? A randomised controlled trial

6.1 OVERVIEW

This chapter contains the protocol (version 5, last updated 26 June 2018) of the study, as I am still collecting data. There were delays in starting the trial because of the additional legal and ethical approvals needed to collaborate with BMJ Open from a non-EU country. Data collection commenced in August 2017, and so the recruitment dates previously stated in the protocol have been delayed. This protocol was registered at the Open Science Framework and is still under embargo, which will be removed once data collection is complete. The trial stopped recruitment on 16 March 2018, and I will now finalise data collection for the existing participants and prepare to analyse the data and write up the findings. The complete data may not be ready until late 2018 because of the delay between recruiting participants when they first submit their paper and getting the follow-up data from their published paper. It has been decided that this final manuscript will be prepared after the submission of my thesis for external examination.

6.2 FUNDING

No funding is provided for this study. Anisa Rowhani-Farid is on a QUT scholarship and is receiving in-kind support from the Institute of Health and Biomedical Innovation at QUT.

6.3 AIM

To examine if Open Data Badges increase data sharing rates among health and medical researchers that publish in BMJ Open.

6.4 SIGNIFICANCE

Data sharing is not standard practice in published papers in health and medical research. Not sharing data slows the advancement of health and medical science, and hence delays the potential benefits to public health. Simple incentives to increase data sharing might provide the change needed to increase data sharing rates in health and medical research.

6.5 HYPOTHESIS

Open Data Badges increase data sharing in papers published in BMJ Open.

6.6 BACKGROUND

The foundation of health and medical research is data: its analysis, reanalysis, sharing and verification (12). Data sharing is an indispensable component of the movement towards science that is open, where scientific truth is not a negotiable commodity but is easily accessible, replicable, and verifiable (21). Despite the wide-scale benefits of data sharing, it is yet to become common research practice. Numerous studies have demonstrated low rates of data sharing in health and medical research, with leading journals such as the BMJ having rates as low as 4.5% (263), and some samples of biomedical research articles having rates of 0% (16).

With the rapid rise of technological advances, there are now many platforms to openly and easily share scientific data (e.g., Figshare, Dryad). There have also been many recent policies from funding bodies and journals that encourage openness in research. Yet the uptake of data sharing in health and medicine remains low. One barrier to uptake may be the lack of recognition for data sharing, and hence a reputational reward may be a useful incentive (263). Incentives are defined here as reputational rewards that are given to researchers if they share their raw scientific data openly in a data repository.

In the worldwide health and medical publishing setting, BMJ Open has relatively strong policies to encourage data sharing. BMJ Open was the first medical journal to link datasets from its published articles to an open repository, Dryad (http://datadryad.org/). Every BMJ Open paper must include a data sharing statement, even if it is to state that no data are available.

Badges for reproducibility were first adopted by the journal Biostatistics in 2009. In 2014, the Center for Open Science (USA) developed the Open Data Badge, which motivated researchers to share their data with others (117). In their study, Kidwell et al. (2016) used Open Data Badges to reward researchers for sharing their data, which increased the data sharing rate at the journal Psychological Science from 1.5% to 39.4% (117).

However, there are some doubts around the effectiveness of badges. Our previous observational study, which examined the effect of badges on data sharing at the journal Biostatistics, found that badges increased data sharing by only 7.6% and had no effect on code sharing. So badges worked, but with limited efficacy. Our conclusion was that the only way to truly test the power of badges was to use the gold standard design: a randomised controlled trial.

6.6.1 Treatment arms

There are two arms in this study, intervention and control. The intervention arm receives an email that informs them about data sharing at BMJ Open as well as the treatment: an Open Data Badge in exchange for publicly sharing their data at a repository of their choice. The control arm simply receives an email informing them about data sharing at BMJ Open, with no offer of an Open Data Badge or any incentive should they choose to publicly share their data at a repository of their choice.

6.7 OUTCOME MEASURES

6.7.1 Primary outcome

• Data sharing rate

Data sharing is confirmed only after the data have been:

1. Directly available: deposited at a third party site (e.g., Dryad, Figshare, Github, Kaggle)

2. Verified: confirmed by the study team as accessible data

This definition of data sharing was derived from the Open Science Framework standard for earning an Open Data Badge (272).

A categorical data sharing outcome will be made using the categories:

1. No data deposited at a public repository

2. Data deposited at a public repository but the data could not be verified, as some authors have embargoes on their datasets and/or links may be broken


3. Author(s) states that data are available only after further applications (e.g. ethics), or other conditions (e.g. payment, travelling to host institution), or upon reasonable request. Or author(s) states that data are available via a repository where access is provided only after a proposal to use data has been approved by an access committee, and within the terms of a data sharing agreement. Hence, data are available under controlled access.

4. Data deposited in public repository without restriction

Based on the above we will also tabulate the data as `Not available' (categories 1, 2), `Potentially available' (category 3) and `Directly available' (category 4).
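
A small R sketch of this tabulation step, using made-up category codes rather than trial data:

# Illustrative recoding of the protocol categories 1-4 into the three
# tabulated groups; the example vector is made up, not trial data.
category <- c(1, 4, 3, 2, 4, 1, 3)
availability <- cut(category, breaks = c(0, 2, 3, 4),
                    labels = c("Not available", "Potentially available",
                               "Directly available"))
table(availability)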

BMJ Open staff will send through authors’ data sharing decisions once their articles are accepted for publication at the journal, and the verification of data sharing will be done independently by the two QUT investigators, with disagreements about which category a paper belongs in resolved by consensus. The verification will be done blind to treatment arm; however, one investigator (Anisa Rowhani-Farid) might be somewhat unblinded, as she will be involved with both recruiting participants (30 June 2017 to 16 March 2018) and verifying data sharing (December 2017 onwards), and hence may know which arm participants are in. The 6-month delay between recruitment and data sharing verification should reduce the possible recognition of authors.

6.7.2 Badge criteria

Using the Kidwell and colleagues (2016) badge study as a precedent, the criteria for awarding papers an Open Data Badge will be:

• The paper has been accepted for publication at BMJ Open

• The paper has a clear, permanent link to an open repository where the data are stored

• The data are easily downloadable

Thus, the only category eligible for a badge is number 4, the ‘Directly available’ group.

Once data sharing verification for control and intervention groups are complete, the control group who make their data directly available will also receive a badge as a bonus.


Due to time and human resource constraints, the investigators will not apply for access to the data in the ‘Potentially available’ category, as this could involve multiple lengthy ethics forms and application charges.

BMJ Open will not act as brokers to negotiate data access for researchers who decline to share their data.

6.8 RESEARCH DESIGN AND METHODS

This study is a collaboration between BMJ Open and QUT. This study will be a parallel group randomised controlled trial with two arms, control and intervention, with 83 research articles per arm, for a total of 166 research articles. 16 March 2018 is the last date that participants will be recruited.

6.9 METHODS

6.9.1 Inclusion criteria

Papers will be eligible for inclusion if they:

• Are undergoing peer review at BMJ Open

6.9.2 Exclusion criteria

Papers will be excluded if:

• The paper's title contains the word `Protocol', as this means it is likely to be a protocol rather than a results paper.

• They are meta-analyses or systematic reviews, as these papers often contain the data within the paper.

• They are case series, opinion pieces or some other publication type where there are no data.

• Any authors on the paper have a relationship with the QUT study team, in order not to bias their response.

• They are still under review at the time we assess data sharing (see below)

• The contact author has already been approached to be part of the study

• They are rejected after peer-review


6.9.3 Recruitment methods

BMJ Open will include an opt-in button on ScholarOne so that authors who are interested in participating can opt in. The QUT research team will receive author contact details after authors have opted in to be part of this study upon submission of their paper to BMJ Open, from 30 June 2017 to 16 March 2018.

Each potential paper will be screened for eligibility and then added to a REDCap database (273). REDCap provides a secure link for data transfer from the UK to Australia. The database will record each study’s title, study design, contact author, contact author’s e-mail, BMJ Open submission number and submission date.

A randomisation list will be created by the study statistician (Adrian Barnett) in R (www.r-project.org). It will use 1:1 allocation in randomised blocks of sizes 4, 6 and 8; this blocking will ensure roughly equal allocation over time. The randomisation list will be added to REDCap, and papers will be randomised by clicking a button after they have been screened and had their basic details entered. For papers randomised to the intervention, an automatic e-mail will be created based on the template. The team will then e-mail authors randomly allocated to the intervention (recruitment emails included as Appendix C) to ask whether they would share their data in return for an Open Data Badge that would appear on the PDF version of their published paper on the BMJ Open web site (if their paper is accepted). The control arm will receive an email (recruitment email attached) requesting them to share their data, but no incentive will be offered.
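
A minimal base-R sketch of such a blocked list is given below; it is illustrative only, not the trial’s actual randomisation code, and the seed is arbitrary.

# Sketch: 1:1 allocation in randomised blocks of size 4, 6 and 8.
# Illustrative only; not the trial's actual randomisation code.
set.seed(2017)                       # arbitrary seed for a reproducible list
make_list <- function(n_target) {
  allocation <- character(0)
  while (length(allocation) < n_target) {
    size <- sample(c(4, 6, 8), 1)    # pick a block size at random
    block <- sample(rep(c("control", "intervention"), size / 2))  # balanced block
    allocation <- c(allocation, block)
  }
  allocation[seq_len(n_target)]
}
table(make_list(166))                # roughly 83 per arm

Note that truncating the final block can leave the two arms slightly unequal, which is why the allocation is only roughly, not exactly, equal.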

6.9.4 Sample size calculation and power

A sample of 171 papers per arm will give us 90% power to detect a difference in data sharing of 8% (based on five published studies) versus 20% (a conservative halving of the previous badge study effect). This uses a two-sided 5% significance level. Due to unexpectedly slow recruitment, the final sample was reduced to 83 papers per arm, giving 62% power to detect the same difference in data sharing.
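
These figures can be approximately checked with power.prop.test; the output may differ slightly from the stated 171 and 62%, depending on the approximation used.

# Approximate check of the stated sample size and power calculations.
power.prop.test(p1 = 0.08, p2 = 0.20, power = 0.90,
                sig.level = 0.05)          # n per arm, close to 171
power.prop.test(p1 = 0.08, p2 = 0.20, n = 83,
                sig.level = 0.05)$power    # power, close to 0.62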

6.10 LIMITATIONS

Due to the opt-in method of participant recruitment, the trial results may overestimate the effect of badges, as researchers who are more sympathetic to data sharing may be more likely to participate in the trial. The estimated impact of badges might also be reduced, as researchers who only make their data available after the paper is accepted will not receive a badge, because the journal staff need to confirm availability beforehand.

6.11 STATISTICAL ANALYSIS

6.11.1 Primary outcome

We will compare the data sharing rates for those who were (intervention arm) and were not (control arm) promised an Open Data Badge. We will tabulate the numbers shared (yes/no) by treatment arm (intervention/control) in a 2×2 table and use a chi-squared test, or Fisher’s exact test if there are small cell sizes. We will calculate the percent shared in each treatment arm, the difference between the arms, and a 95% confidence interval for the difference.
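
A sketch of this planned analysis on made-up counts (not trial results):

# Sketch of the planned primary analysis using made-up counts, not results.
shared <- matrix(c(20, 63,    # intervention: shared yes, shared no
                   10, 73),   # control: shared yes, shared no
                 nrow = 2, byrow = TRUE,
                 dimnames = list(arm = c("intervention", "control"),
                                 shared = c("yes", "no")))
chisq.test(shared)                           # use fisher.test(shared) for small cells
prop.test(shared[, "yes"], rowSums(shared))  # difference in proportions with 95% CI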

The peer review process will create a delay between randomisation and time to publication. The median time to first decision at BMJ Open is 55 days, and many submissions undergo two reviews. There is also a delay between final acceptance and publication. We will therefore assess the outcomes in November 2018. Any papers still under review will be excluded.

Other outcomes

To assess whether the study arms were similar we will compare them in terms of study type and publication status (accepted/rejected) using frequency tables.

We will collect the statements in every “Data sharing statement” and use a word frequency table to compare the two study arms. We will first remove common words such as: at, the, there, their, etc. We will also compare the average number of words per data sharing statement and calculate the difference and 95% confidence interval of the difference.
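
A minimal sketch of this comparison, assuming statements is a character vector of one arm’s data sharing statements (the stop-word list here is illustrative):

# Sketch: word frequencies in one arm's data sharing statements;
# `statements` and the stop-word list are illustrative.
stopwords <- c("at", "the", "there", "their", "a", "an", "and", "of",
               "to", "is", "are")
words <- unlist(strsplit(tolower(statements), "[^a-z']+"))
words <- words[nchar(words) > 0 & !(words %in% stopwords)]
head(sort(table(words), decreasing = TRUE), 10)  # ten most frequent words
mean(lengths(strsplit(statements, "\\s+")))      # average words per statement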

We will prospectively collect data on the amount of time needed by the QUT study team to contact authors and verify the datasets. These data will be entered into REDCap and presented using simple summary statistics per paper.

The results will be reported using the CONSORT guidelines for parallel group randomised controlled trials (274).

Planned interactions

We will test whether there is an interaction between the main effect of Open Data Badges (intervention or control arm) and study type: clinical trials, observational studies, longitudinal studies, surveys, other. Depending on the numbers recruited we may combine small groups.

6.11.2 Significance

This study has the potential to influence the reward system to incorporate data sharing as a measure of high-quality research. Open Data Badges could be adopted internationally, further increasing collaboration in the scientific realm and advancing health and medicine.


Chapter 7: Discussion

7.1 SCIENCE AND SOCIETY: REFLECTIONS AT THE NEXUS

7.1.1 A search for truth

Scientific knowledge is generated through the efforts of researchers worldwide testing theories and falsifying hypotheses. Through the processes of observation, induction and deduction, which are formally established elements of the epistemology of science, researchers analyse the raw data collected through their studies and use statistical methods to draw conclusions, contributing their share to the body of knowledge called ‘science’. However, within the current climate of ‘alternative facts’, also known as the ‘post-truth’ era, there is much public scrutiny and suspicion around the validity of science. Are scientific truths undeniable facts, or are they debatable opinions made by a group of biased researchers? And how are scientific claims measured? These are among the myriad questions confronting the public, policy makers, politicians, and scientists themselves.

The renowned meta-scientist John Ioannidis argued in 2005 that most published scientific findings are false (4). Ioannidis’ assertion, compounded by the symptoms of the post-truth era, catalysed a global movement by the meta-research community to reduce pseudoscientific claims and to strengthen the reliability of the scientific method (21). Progress in science depends on the reproducibility of an experiment: whether its findings are verifiable and replicable under different settings. If science is a body of knowledge that allows humanity to discover more about the universe, then reproducibility must be a key element of the scientific process. The object of most value for revealing scientific truth, because it provides the evidence to support a study’s claims and makes the study reproducible, is the raw data generated by the experiment itself (28). The act of sharing data therefore takes on great prominence. This thesis set out to investigate the interventions that increase data sharing in the health and medical research community, in order to promote a culture of open science and data sharing. The findings of the three studies of this PhD provide some insight into the factors that could contribute to this cultural change.


7.2 EVIDENCE-BASED SCIENCE?

In 2015, when I commenced the PhD, the concepts of open science, open data, data sharing, reproducibility and incentives were proliferating throughout the scientific discourse. Opinion piece after opinion piece, editorial after editorial, discussed the importance of motivating researchers to share their raw scientific data. Though exciting at first, reading new papers became a disappointing exercise: scientists were merely talking about incentives for data sharing, without presenting any empirical evidence that tested their proposed hypotheses. Given that science is supposedly evidence-based, it was paradoxical that a group of scientists had little evidence to show. The irony was noted, and it provided the perfect experiment for a team of meta-scientists like ourselves: provide data on data sharing to increase data sharing.

7.3 BADGES?

My first study systematically examined the health and medical literature to confirm that there were no evidence-based incentives that motivated researchers to share their data. The systematic review uncovered only one evidence-based incentive: open data badges, developed by the Center for Open Science. Badges were introduced at the journal Psychological Science in 2014 and increased data sharing from 1.5% pre-incentive (2012) to 39.4% post-incentive (2015). Kidwell et al. (2016) concluded that badges are a simple, effective and low-cost incentive for open practices, and that journals can offer them to authors who are willing and able to meet the criteria to earn an open data or open materials badge (117). The systematic review also confirmed my initial impression: opinion pieces on the importance of developing incentives for researchers (n = 85) outnumbered the articles that tested strategies to increase data sharing rates (n = 76) (264).

Badges as incentives for data sharing seemed somewhat frivolous for a group of esteemed scientific researchers. This finding was familiar to me: in October 2015, at my first conference, the eResearch Australasia conference in Brisbane, I had the opportunity to speak privately with the keynote speaker, Professor Tony Hey, about my research into incentives for data sharing. At that time, my study design was to test money as an incentive, and to trial it at my own university. Tony Hey disclosed that there was ongoing research on badges as incentives for researchers. I vividly remember being sceptical, sneeringly commenting on the meaninglessness of such a concept, dismissing the possible effectiveness of a badge, and afterwards having a good laugh about it with my supervisor. One year later, in October 2016, I had to put aside my preconceived notions and readjust my thinking, as I now had intriguing evidence that an electronic badge did in fact motivate researchers to share their data.

I was also moved towards badges because of the barriers and complexities of providing a financial incentive, which my original study intended to test. Along with the administrative hurdles of awarding funds within QUT, there was the difficulty of raising the money needed to fund the trial. It was also uncertain how much money should be awarded per dataset shared.

7.4 POLICIES?

But was a badge the most important change in data sharing? What happened to the whole concept of motivating change through rules and regulations? Since scientific journals are the gatekeepers of research, do their data sharing policies influence researchers’ behaviours? As my literature review revealed, few studies in health and medical research had analysed the effect of data sharing policies. This gap in the literature shaped my next study, which investigated whether data sharing policies were effective at motivating researchers to share their data. In 2015, few health and medical journals had data sharing policies. The BMJ, however, was a pioneer: it introduced a data sharing policy in 2009 that evolved to become more comprehensive, so that by 2015 all randomised controlled trials published in the journal were bound by the policy and required to share all their raw data. Other BMJ articles were welcome to share their data, but were not required to do so. All articles, trial or not, needed to include a data sharing statement in their manuscripts. A study conducted in 2001 had analysed data sharing at the BMJ (249), so it was interesting to investigate whether data sharing rates had changed since then. Unfortunately, data sharing rates at the BMJ had only increased from 3% to 4.5% in 15 years, albeit with a 40% increase in the rate of “data promised” since 2009 (249), demonstrating increased compliance with data sharing policies for publication purposes, but not in practice (256). Even clinical trials, which were bound by the policy, had a data sharing rate of only 24% (263). Our conclusion was that a two-fold approach of both tighter data sharing policies and incentives was required. A recent paper by Naudet et al. (2018), published in the BMJ, replicated my second study with an emphasis on clinical trial data sharing at the BMJ and PLOS Medicine. The primary outcome was data availability, and articles with available data were reanalysed to quantify reproducibility (275). The 46% data availability rate observed was higher than elsewhere in the biomedical literature, and 82% of the reanalysed papers were reproducible (275). The authors were unsure whether the higher data availability rate resulted directly from the journals’ policies or from a slow, positive cultural change among trialists (275).

7.5 DELAYS AND BADGES

In July 2016, my supervisor was asked to serve on the Editorial Board of BMJ Open. This created a great opportunity for my PhD: to conduct a randomised controlled trial testing the effectiveness of badges at a leading medical journal, BMJ Open. This trial is the first of its kind globally and has the potential to shape the future of incentives for data sharing in health and medical research; its findings could be transferable to the wider scientific research community. We re-designed my PhD and began ethics applications. Lawyers were involved, as we had to sign a memorandum of understanding to permit BMJ Open to share data on authors with us so that we could conduct a study on data sharing, another entertaining ironic moment in my PhD. These delays took five months and our study design changed twice. The delays had one advantage, however: I met Professor Brian Nosek, co-founder of the Center for Open Science and principal investigator of the badge study, in May 2017 in Amsterdam at the 5th World Conference on Research Integrity. I shared with Brian the challenges I was encountering in getting our trial started, and he provided timely and expert advice that shaped my trial at BMJ Open. The next day I met the BMJ Open editor, Adrian Aldcroft, to discuss and finalise our trial design in person. Finally, in July 2017, we received ethical and legal approval to start data collection, which is ongoing as of January 2018, with 135 participants to date. I received funding to go to London from February to March 2018 for a one-month internship to progress the trial and finalise the data collection phase.


7.6 KITEMARKS AS BADGES

In July 2017, while pondering the delays to my trial, I read the article by Nichols and Twidale (2017) called “Metrics for Openness”, which discussed “kitemarks” at the journal Biostatistics as an incentive for reproducibility (276). This was a light-bulb moment for us, as the reproducibility policy change at the journal provided an ideal opportunity to conduct an observational study with a pre-post design to investigate the effect of badges, thus replicating the Kidwell et al. badge study at Biostatistics.

This was an interesting time in the debate around incentives: in August 2017, Hilda Bastian critiqued the claims of Kidwell et al. about the effectiveness of badges, and her blog post received worldwide attention and praise. Given these growing doubts around badges, I was not surprised when the results of our observational study showed that badges increased data sharing rates at Biostatistics by only 7.6% and had no effect on code sharing. These results indicated that badges worked, but with limited efficacy. Our conclusion was that the only way we could truly test the power of badges was to use a gold-standard design: a randomised controlled trial (277).

On a related note, a study published in 2017 looked at data sharing at Psychological Science, the setting for the original badge study by Kidwell et al., and found that though badges rapidly increased data sharing between 2013 and 2015, in 2016 this trend stopped, confirming that badges alone are not sufficient (278).

7.7 THE EVOLUTION AND COSTS OF DATA SHARING POLICIES

My second study revealed that data sharing statements are not monitored by journal staff: researchers might state that all their data are available in order to get their papers published, but then fail to share the data upon request. There are also examples in PLOS ONE where researchers state that “all the data are available in the article” when the data are clearly not in the article; again, this was probably not checked by journal staff. These are breaches of the data sharing policies, and journals must consider the implications of non-compliance. One possible approach is to withdraw such papers from the journal. Another is for the journal to negotiate data access. The BMJ provides such broker services to negotiate access between data providers and data users; however, data negotiation would also incur additional time costs for journal staff and researchers. My research raises the question of whether journals monitor the data sharing statements and data sharing practices of researchers. Naudet et al. (2018) indicate that active auditing of data sharing policies by journal editors may facilitate the implementation of data sharing (275), but this also has time costs.

A recent study by Grant and Hrynaszkiewicz (2018) found that it took on average 6.2 minutes per manuscript for editorial staff at Nature journals to add a data sharing statement that provided a link to data in a public repository, whereas adding a data sharing statement that provided data upon request took 3.4 minutes per manuscript (279). These figures suggest that the costs may be relatively low.

There are additional costs for journals when it comes to adopting incentives for data sharing. Badges have been promoted as a simple solution because they are low cost (117). However, while collecting data for my third study, I noticed that articles did not always appear to be allocated badges correctly, implying that assigning badges is not always clear cut and that journal staff may need to spend more time on correctly awarding them. This can be difficult for journal staff, who may not be researchers and hence may not be able to judge whether the data shared match the data used in the publication. A recent commentary by Gertler et al. in Nature (2018) examined the verification of research in the field of economics (79). The authors argued that the way forward is for more journals to take on the responsibility of verifying reproducibility before publication, either using their own staff or commissioning external academic experts to conduct the verification exercise (79). Naudet et al. (2018) drew similar conclusions, suggesting that if journals ask for data and code, these should be checked for completeness and basic usability (with or without reviewers) (275).

Journals such as F1000Research mandate data and code sharing upon submission, and peer reviewers generally check the data and code before acceptance; however, there is no formal pre-publication verification process (280). Biostatistics has an Associate Editor for Reproducibility, and authors have the option of having their studies reproduced by the Editor if they submit their data and code; however, formal verification and the sharing of data and code are not mandatory (261). Springer Nature has recently launched a Research Data Support deposition and curation service that offers data curation and publication, in order to directly support researchers in sharing their data (281). This was a free service during its pilot phase, but it will become an optional paid service for researchers (281). The service provides a platform for sharing data and code, but it does not extend to verifying reproducibility (281).

It appears that journals’ data sharing policies are slowly evolving from simply mandating a data sharing statement that outlines the authors’ data sharing plans, to mandating that data and code be shared and deposited in a public repository upon article submission. It might be time for journals to evolve into institutions that not only disseminate knowledge but also ensure the reproducibility of the research they share. Journals would have to consider the financial, technical, legal, and ethical implications of implementing stringent reproducibility policies so as to contribute to the global shift towards a culture of more reproducible research. Perhaps there are lessons to draw from other areas of meta-research, particularly the field of reporting. The EQUATOR network, established in 2008, was created to tackle the problems of inadequate reporting systematically and on a global scale (282). To date, the network has worked successfully with journals to reduce research waste by implementing checklists and guidelines for reporting various types of studies, such as STROBE for observational studies, PRISMA for systematic reviews and meta-analyses, and CONSORT for randomised controlled trials, among others. The network indicated that making all reporting guidelines known and easily available is the first step in their successful use (282). Promotion, education and training form another key part of the EQUATOR Network’s core programme (282). In this light, data repositories such as Dryad and Figshare could provide free space for journals to host the data and code from the research they publish, and universities could collaborate with journals to assist with peer review of data and code.

A recent piece in the Scientific American in April 2018 discussed an alternative to submitting data to journals upon publication: uploading data upon collection using a blockchain network (283). A blockchain network would provide a timestamp on data to mark its ownership thereby preventing others from stealing it (283). The author argues that a blockchain network would open a revolutionary window to the verification of research (283).


While designing and collecting data for my fourth study, the randomised controlled trial at BMJ Open, I learned of the technical difficulties and costs of allocating badges. Articles must have their data available at the production phase in order to be eligible for a badge. Some researchers might have their datasets under embargo until the article is published, which means they cannot receive a badge.

In addition to the costs of awarding badges, there are costs associated with monitoring them, as my third study revealed. Articles might have been awarded an open data badge even though the availability of their datasets was not always certain, implying that badges must be regulated over time by journal staff. Journals could also enlist researchers to help monitor compliance, with the ability to report where data are no longer available or where the data shared do not match the data in the article. Removing a badge from a paper may also take time for journal staff, given the complexities of the production process.

There is also concern about the potential economic losses of adopting rigorous data sharing policies. An editor of a high-ranking health and medical journal told us that they did not pursue a mandated policy because they might lose good papers to rival journals. By contrast, the experience of PLOS provides evidence of how a radical move by a publisher to mandate data sharing was welcomed by authors and contributed to a culture of data sharing at its journals (284). Though PLOS was at first concerned that it would be unable to consider a significant number of submissions because authors could not share data, between the implementation of the updated policy in 2014 and 2017, PLOS estimated that less than 0.1% of its rejections were due to authors’ unwillingness or inability to share data (284).

7.8 ALTERNATIVE PERSPECTIVES

The discipline of meta-research is an emerging field. Though there are countless opinions on scientific reproducibility, the field lacks trials that test incentives for reproducible practices. A possible explanation for the lack of trials in this area is the minimal funding for meta-research, which could stem from the current tension in the discourse on data sharing, where promoters of data sharing are referred to as ‘research parasites’ who ‘steal from research productivity’ and who are apparently ‘taking over’ (246, 247). There is also some questioning of whether data sharing is worth the effort, as we do not currently know the value of sharing data: no study has quantified the average return on sharing a dataset (247).

In 2016, the New England Journal of Medicine published a perspective piece called ‘Data sharing – Is the juice worth the squeeze?’, which discussed the low number of requests for a dataset released by GlaxoSmithKline (247). What the article failed to address, however, was the difficulty and complexity of obtaining this dataset from GlaxoSmithKline (285). Researchers on Twitter critiqued this perspective piece and outlined these technical difficulties. One researcher said: “Major trials not available. Access uncertain. Data dictionaries hard to find -> can't formulate Q's. Trials spilt betw many sites”; another said, “true data sharing would be preferable to data access on a dedicated website”. These words echo my colleague’s experience, which I outlined in Chapter 2.8.

The statistician Niels Keiding argued in a 2010 editorial in Biostatistics that repeatedly sharing an existing dataset to progress future “method-driven” analyses would not bring as much value to science as promoting “problem-driven” science (286). These points run contrary to the purpose of sharing data, which is to progress science as a body of knowledge and to make the research process more robust and verifiable (162). My view is that sharing data, regardless of originality, and making it findable and useable, increases the potential for readers to learn new methods and increases the transparency and reproducibility of research, thus reducing research waste.

7.9 RESEARCH TRANSLATION

In this section I outline how this thesis might be translated into policy or practice.

Along with incentives, my research revealed that more stringent data sharing policies are also required. As I discussed earlier, some global collaboration has resulted from my PhD: a study with BMJ Open and connections with the Center for Open Science and possibly Elsevier. I also had the opportunity to participate in a round table discussion on open research with Heather Joseph of the Scholarly Publishing and Academic Resources Coalition (SPARC).

During February to March 2018 I undertook an internship at BMJ Open in London, to progress the randomised controlled trial and complete the data collection phase of the study. I held consultations at the BMJ with a few key individuals who provided expert advice about the progress of the trial: Dr Sara Schroter, a BMJ researcher; Adrian Aldcroft, the Editor of BMJ Open; and Dr Emily Sena, the Editor of BMJ Open Science. I also participated in an Editors’ Retreat, where I met Dr Fiona Godlee, the Editor in Chief of the BMJ. While based at BMJ Open, I met with the Publisher of the BMJ, Claire Rawlinson, who sought my thoughts on updating the BMJ data sharing policy and implementing badges as an incentive for open research at its journals.

Further, while in London I collaborated with researchers and strategic staff of global funders, journals, and universities, including the Wellcome Trust, the London School of Hygiene and Tropical Medicine, F1000Research, and Springer Nature. Particularly noteworthy was attending a lecture on publication ethics by Dr Richard Smith, the former Editor in Chief of the BMJ. Dr Smith shared some thoughts about the future of open research and has offered to mentor me as my career in this field progresses. I was also interviewed by a journalist who publishes with Nature and Science for a news article about my research. During my meeting with Dr Robert Kiley, the Head of Open Research at the Wellcome Trust, he noted that he and his team had read my recent paper in F1000Research, and he sought my thoughts about the future of policies and incentives for the global movement towards more open research.

My research has stimulated much discussion around the importance of open research at the Office of Research Ethics and Integrity (OREI) at my own university. The Director of Research at the Faculty of Health at QUT created a School-level discussion session about open research after participating in my final PhD seminar and further appreciating the importance of openness in research, particularly in health. In addition, my PhD findings contributed to the creation of the QUT Research Data Management Strategy for 2017 to 2020 that was developed by OREI and accepted by the University Research Innovation Committee.

My publications have also received some positive attention on Twitter (Figure 7-1) from leading meta-researchers and influential individuals in the field of open research. Particularly noteworthy were tweets from Professor Harlan Krumholz, founder of the Yale Open Data Access (YODA) Project; Dr Trish Groves, director of academic outreach and advocacy for the BMJ, editor in chief of the online-only journal BMJ Open, and honorary deputy editor of the BMJ; Associate Professor David Moher, an epidemiologist who spearheaded the development of the CONSORT and PRISMA statements; Dr Richard Morey, senior lecturer at Cardiff University; Dr Elizabeth Wager, Co-Editor-in-Chief of the journal Research Integrity and Peer Review; Dr Robert Kiley, Head of Open Research at the Wellcome Trust; and Dr Liz Allen, Director of Strategic Initiatives at F1000Research. Lastly, Dr Trish Groves presented the results of my second study on data sharing at the BMJ in a public interview with the Methods in Research on Research (MiRoR) group in February 2018. She noted that the results of my study were disappointing, with low sharing rates, but understandably so: it takes time and money to prepare datasets for sharing, and the only reason clinical trialists would have shared their data with me is that I was doing a study on data sharing (287).


Figure 7-1: Selected tweets arising from this PhD’s contribution to the scientific discourse

7.10 IDEAS FOR FUTURE STUDIES

Due to time constraints, I had to exclude two studies from my PhD: a national survey of health and medical researchers in Australia to investigate rewards for data sharing, and a case–control study to test whether a data management tool increases data sharing.

It would also be interesting to conduct a trial that involved testing money as an incentive for data sharing and to collaborate with a national funding body such as the National Health and Medical Research Council in Australia to offer funds as a prize for sharing data upon publication. Naudet et al. argued that funders are well positioned to facilitate data sharing (275). The Wellcome Trust, for instance, already allows researchers to use funds for open access fees, and this could be extended to include funding data sharing (288). Similarly, an irreproducibility report by the National Association of Scholars published in April 2018 discussed forty recommendations for increasing the reproducibility of science, one of which was:

“Government agencies should prioritize grant funding for researchers who pre-register their research protocols and who make their data and research protocols publicly available” (289).

Other ideas for future research include qualitative studies that explore researchers’ perspectives on the value and usefulness of badges (and other incentives), and related studies of journal staff and their priorities.


7.11 REFLECTIONS

7.11.1 What I would have done differently
If scientific data are the foundation or building blocks of health and medical research, then statistical methods could be likened to the grand design or architectural framework for building a robust scientific edifice. The more I learned about scientific research, the more fascinated I became by the value of statistics in health and medicine. It is through statistical methods that scientific truth can be determined, and if I could go back and do things differently, I would have spent more time learning about statistical methods and, particularly, a coding language such as R (www.r-project.org).

When I first started, I spent a great deal of time reading the scientific literature and understanding the various theoretical constructs of my PhD. In particular, I broadened my scope of subject matter, read some philosophical books, and attended a course on the history and philosophy of science. Somewhere down the line, however, I somewhat lost my philosophical vision. It is easy to lose sight of the overarching purpose of one’s research and to get distracted while collecting data. My Writing Circle instructor recommended that I keep a journal and write my reflections and thoughts about my research daily. This technique, along with frequent conversations with my supervisors, colleagues, and friends about the purpose of my research and its philosophical underpinnings, helped me remain focused and empowered to carry on with the work.

With regard to my research design, my studies naturally had their limitations. My systematic review search strategy could have included more search terms and could have covered the grey literature. My observational study at the BMJ could have had a larger sample size; I could also have used the BMJ as a broker to negotiate data access and applied for the datasets classified as “data potentially available”.

Had I possessed the statistical skills, I could have verified the datasets in my observational study at Biostatistics, in addition to verifying data sharing. This is important to ensure that researchers are sharing useful data, but verification can take considerable time, as it may require reproducing the results.


Chapter 8: Conclusions

8.1 THE CULTURAL SHIFT TOWARDS MORE OPEN DATA

To encourage a culture of openness and data sharing in the scientific community, it is important to raise consciousness among policy and decision makers. Raising consciousness involves education and training, but also time. Leading journals, funders, organisations, institutes, research councils and universities are slowly adopting this culture of data sharing; however, some key decision makers, such as the ICMJE and the two Australian governmental research funding bodies, the NHMRC and the ARC, are yet to mandate and regulate data sharing. Political and economic factors shape decision makers’ opinions, so the adoption of a culture of sharing will likely unfold progressively over time.

During this gradual cultural shift towards more open data, there is a spectrum of activity associated with the act of sharing data. At one end of the spectrum is high-quality data deposited in a third-party repository: easily accessible, findable, reusable, and publicly available. At the other end is low-quality data hidden away on someone’s computer: not accessible, not findable, not reusable, and most definitely not publicly available. There is, of course, a variety of circumstances between these two extremes; not all data can be shared, for ethical and legal reasons, and people feel differently about sharing data. Nevertheless, my research has contributed to this cultural shift towards more open data by providing empirical evidence that has assisted with the reform of data sharing policies and the development of evidence-based incentives for researchers.

The significance of my research is that it challenged the roots of the philosophical conceptions of science, questioned the current system within which health and medical research is conducted, prompted scientists to reflect on the factors that shape their choices to share or not to share their data, and promoted research reproducibility and transparency, all with the vision of the widespread generation, diffusion and application of health and medical knowledge.


8.2 THE WAY FORWARD

The global scientific community is moving towards open research. Open research may soon become the norm; however, there are no shortcuts and no formulas. Policies, incentives, educational initiatives and technological tools all play a part in the process of change. As with any global movement, effort at all levels is required. The magnitude of the change before us demands sustained advocacy, creativity, courage, and above all, action. Small steps, when sustained over long periods of time, result in great distances travelled. Effort and movement, together with an attitude of learning, will bring about significant transformations. We must trust that the global research community will become more open. In fact, at the heart of every enterprise in the generation of scientific knowledge lies trust. Patients trust that researchers use their data responsibly, scientists trust that other scientists have conducted their research with integrity, funders trust that researchers spend public funding honestly, and the public trusts that the published scientific literature is reproducible, high-quality, and translatable. When it comes to sharing data, as Milton Packer elucidated in a recent editorial in the BMJ, ‘trust is the crux of the matter’, as ‘in the absence of a framework of trust, data sharing will become a weapon for the sceptics rather than a conduit for the advancement of science and medicine’ (290). It is the advancement of health and medicine that must be our ultimate goal. Without such a vision, sharing data and open research become a negative ‘naming and shaming’ exercise, which further stimulates fear, both in the public and amongst researchers themselves (290). Accordingly, a spirit of sustained encouragement might bring about profound results.

8.2.1 Utopian?
I wanted to share some personal reflections on the cultural shift towards more open data. Could the culture of data sharing and open science be dismissed as utopian? If not, then how far are we from this reality? I believe the search for scientific truth is what motivates researchers: a quest for knowledge, seeking insight into the mysteries of the universe, understanding the interconnections of physical reality, and surpassing the laws of nature to innovate, discover, and transcend. Humanity has placed a lot of faith in science, particularly in the fruits of scientific discovery, the technological devices that are, for the most part, widely adopted by people around the world. How far have we come as a world scientific community? At this juncture, surrounded by the maladies of the post-truth era and the reproducibility crisis in health and medicine, humanity is in dire need of a more robust, verifiable and trustworthy body of knowledge it calls science. It seems the current system of generating scientific knowledge is like a crumbling structure, the very foundations of which can no longer support the entirety of the edifice. A cartoon in a commentary by Begley et al., published in Nature in 2015, illustrates the disintegrating scientific enterprise, with funders, journals, researchers, and institutions depicted as the pillars of the crumbling “robust science” edifice (291). If the foundation of health and medical research is data, and if that foundation is undermined, then our scientific edifice is most definitely collapsing. Policies, frameworks, and incentives, though well-intentioned, are like patchwork covering the cracks of a disintegrating structure. It might be time, during this paradigm shift towards more open data, to begin considering the implications of building a new, robust edifice.

The economic literature distinguishes intrinsic and extrinsic motivations: intrinsic motivation relates to the deep roots of human motivation to do good deeds without external stimuli, while extrinsic motivation concerns external factors that drive behaviour (292). Incentives naturally fall under extrinsic motivations. Studies demonstrate, however, that when extrinsic motivators are introduced, intrinsic motivation can be crowded out. These economic principles might have implications for the premise behind developing “extrinsic motivators”, or incentives, for data sharing, and I will end this thesis with some rhetorical questions in that regard.

The paradox which faces meta-researchers during this paradigm shift from closed to open science is: will incentives to share data remove the natural desire to share one’s data for the progress of scientific research? In other words, does the hypothesis that incentives will motivate researchers to share their scientific data contradict itself? Are researchers purely driven by prestige and individualism, or is there a deeper, more meaningful root to sharing scientific data? Could a longer-lasting, deeper sense of fulfilment and reward come from recognising the interconnectedness of one’s research with the world around us and of contributing to the betterment of humanity at large? As Nikola Tesla said:

“Science is but a perversion of itself unless it has as its ultimate goal the betterment of humanity.”


Appendices

Appendix A

PRISMA (2009) Checklist for Study 1


Section/topic (item #): Checklist item. (Reported on page #)

TITLE
Title (1): Identify the report as a systematic review, meta-analysis, or both. (Page 1)

ABSTRACT
Structured summary (2): Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number. (Page 2)

INTRODUCTION
Rationale (3): Describe the rationale for the review in the context of what is already known. (Page 3)
Objectives (4): Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS). (Page 3)

METHODS
Protocol and registration (5): Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number. (Page 5)
Eligibility criteria (6): Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. (Page 6)
Information sources (7): Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. (Pages 7, 8)
Search (8): Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. (Pages 7, 8)
Study selection (9): State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis). (Pages 8, 9)
Data collection process (10): Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. (Pages 9, 10)


Data items (11): List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. (Pages 9, 10)
Risk of bias in individual studies (12): Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. (Pages 13, 14)
Summary measures (13): State the principal summary measures (e.g., risk ratio, difference in means). (Page 6)
Synthesis of results (14): Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis. (Pages 9, 10)



Risk of bias across studies (15): Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). (Pages 13, 14)
Additional analyses (16): Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. (Pages 14-19)

RESULTS
Study selection (17): Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. (Pages 10-15)
Study characteristics (18): For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. (Page 13)
Risk of bias within studies (19): Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12). (Pages 13, 14)
Results of individual studies (20): For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group; (b) effect estimates and confidence intervals, ideally with a forest plot. (Pages 13, 14)
Synthesis of results (21): Present results of each meta-analysis done, including confidence intervals and measures of consistency. (Page 13)
Risk of bias across studies (22): Present results of any assessment of risk of bias across studies (see item 15). (Pages 13, 14)
Additional analysis (23): Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see item 16]). (Pages 14-19)

DISCUSSION
Summary of evidence (24): Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). (Pages 20, 21)


Limitations (25): Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias). (Page 22)
Conclusions (26): Provide a general interpretation of the results in the context of other evidence, and implications for future research. (Page 22)

FUNDING
Funding (27): Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. (Page 24)

From: Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med 6(7): e1000097. doi:10.1371/journal.pmed.1000097. For more information, visit: www.prisma-statement.org.


Appendix B

Email to authors for Study 2

Subject Title: Title of The BMJ research article that was randomly selected (changes per email)

Dear xxxx

My name is Anisa Rowhani-Farid and I am a higher degree research (Masters) student at the School of Public Health and Social Work, Faculty of Health, Queensland University of Technology (QUT) in Brisbane, Australia. The title of my research is: Promoting a culture of open science and data sharing in health and medical research. I am examining how data sharing rates have changed over time and in reaction to data sharing policy changes at the British Medical Journal (BMJ). The BMJ has made several changes to its data sharing policy since 2009.

Your research paper xxxx has been randomly sampled for this study. You mentioned in your data sharing statement that your dataset is available upon request. Would it be possible to receive a copy of your dataset or to be directed to a depository where it is stored? We are interested in re-examining your research dataset in order to quantify the integrity of the data sharing process. We will examine the data to verify that it appears consistent with the published paper (e.g., by checking the sample size), but will not perform a formal re-analysis.

You will not directly benefit from participation in this project, however it is hoped the outcomes of the study contribute to the movement toward data sharing in health and medical research. There are minimal risks associated with your participation in this project. These include discomfort associated with your duty to provide data to other researchers. All data and correspondence will be treated with strict confidence and no individual researchers or projects will be mentioned in the paper; rather, we will report aggregated statistics. The data will be deleted as soon as we have re-examined the datasets, and research data that is generated through this study will be publicly shared in an anonymised format for possible future use.

Please note that this study has been approved by the QUT Human Research Ethics Committee (approval number 1500000842). QUT is committed to research integrity and the ethical conduct of research projects. However, if you do have any concerns or complaints about the ethical conduct of the project you may contact the QUT Research Ethics Advisory Team on +61 7 3138 5123 or email [email protected]. The QUT Research Ethics Advisory Team is not connected with the research project and can facilitate a resolution to your concern in an impartial manner.

Your response to this email will be taken as consent to participate in this study. Should you have any further questions about this study, feel free to contact me.

Kind regards

Anisa Rowhani-Farid, Masters Student [email protected]

Associate Professor Adrian Barnett

Supervisor [email protected]

+61 7 3138 6010

School of Public Health and Social Work, Faculty of Health

Queensland University of Technology


Appendix C

Emails to authors for Study 4:

Intervention group email

Subject Title:

*Title of the research article that was randomly selected (changes per email)*

Dear [xxxx]

My name is Anisa Rowhani-Farid and I am a PhD student from the Faculty of Health, Queensland University of Technology (QUT) in Brisbane, Australia. I am examining data sharing rates at BMJ Open.

Your research paper [xxxx] has been randomly sampled for this study. Would you be willing to publicly share the data used in this paper by depositing it at a public data repository? In exchange, you will be awarded with an Open Data Badge (https://osf.io/tvyxz/wiki/1.%20View%20the%20Badges/) that would appear on your publication. You do not need to respond to this email. If your paper is accepted for publication at BMJ Open, we will verify whether you shared your data at a public depository based on your data sharing statement. You will earn a badge if your data sharing statement provides a link to your dataset, making your data easily accessible.

Your participation in this study is entirely voluntary. You will not directly benefit from participation in this project apart from potentially earning a badge, however it is hoped the outcomes of the study contribute to the understanding of data sharing in health and medical research. There are minimal risks associated with your participation in this project. These include discomfort associated with providing data to other researchers. All data and correspondence will be treated with strict confidence and stored on an authorised secure platform called REDCap. No individual researchers or projects will be mentioned in our analyses, rather we will report aggregated statistics. Research data that is generated through this study will be publicly shared in an anonymised format for possible future use.

Please note that this study has been approved by the QUT Human Research Ethics Committee (approval number 1600001100).


QUT is committed to research integrity and the ethical conduct of research projects. However, if you do have any concerns or complaints about the ethical conduct of the project you may contact the QUT Research Ethics Advisory Team on +61 7 3138 5123 or email [email protected]. The QUT Research Ethics Advisory Team is not connected with the research project and can facilitate a resolution to your concern in an impartial manner.

This research is independent of the peer-review process, and your decision to share or not share your data should not impact your chance of being accepted.

Should you wish to opt-out of this study, simply respond to this email expressing your desire to opt-out. Please note that such an opt-out option from this study does not exempt research articles from BMJ Open data sharing policy, nor should it affect your decision to share or not to share your data.

Should you have any further questions about this study, feel free to contact me.

Regards

Anisa Rowhani-Farid BMedSc, MPH

PhD Student

W: +61 7 3138 6429 │ E: [email protected]

Professor Adrian Barnett

Supervisor

W: +61 7 3138 6010 | E: [email protected]

School of Public Health and Social Work

Institute of Health and Biomedical Innovation

Queensland University of Technology

60 Musk Avenue, Kelvin Grove, QLD, 4059, Australia


Control group email

Subject Title:

*Title of the research article that was randomly selected (changes per email)*

Dear [xxxx]

My name is Anisa Rowhani-Farid and I am a PhD student from the Faculty of Health, Queensland University of Technology (QUT) in Brisbane, Australia. I am examining data sharing rates at BMJ Open.

Your research paper [xxxx] was randomly sampled for this study. Would you be willing to publicly share the data used in this paper by depositing it at a public data repository? You do not need to respond to this email. If your paper is accepted for publication at BMJ Open, we will verify whether you shared your data at a public depository based on your data sharing statement. Data sharing will be confirmed if your data sharing statement provides a link to your dataset, making your data easily accessible.

Your participation in this study is entirely voluntary. You will not directly benefit from participation in this project, however it is hoped the outcomes of the study contribute to the understanding of data sharing in health and medical research. There are minimal risks associated with your participation in this project. These include discomfort associated with providing data to other researchers. All data and correspondence will be treated with strict confidence and stored on an authorised secure platform called REDCap. No individual researchers or projects will be mentioned in our analyses, rather we will report aggregated statistics. Research data that is generated through this study will be publicly shared in an anonymised format for possible future use.

Please note that this study has been approved by the QUT Human Research Ethics Committee (approval number 1600001100).

QUT is committed to research integrity and the ethical conduct of research projects. However, if you do have any concerns or complaints about the ethical conduct of the project you may contact the QUT Research Ethics Advisory Team on +61 7 3138 5123 or email [email protected]. The QUT Research Ethics Advisory Team is not connected with the research project and can facilitate a resolution to your concern in an impartial manner.

This research is independent of the peer-review process, and your decision to share or not share your data should not impact your chance of being accepted.

Should you wish to opt-out of this study, simply respond to this email expressing your desire to opt-out. Please note that such an opt-out option from this study does not exempt research articles from BMJ Open data sharing policy, nor should it affect your decision to share or not to share your data.

Should you have any further questions about this study, feel free to contact me.

Regards

Anisa Rowhani-Farid BMedSc, MPH

PhD Student

W: +61 7 3138 6429 │ E: [email protected]

Professor Adrian Barnett

Supervisor

W: +61 7 3138 6010 | E: [email protected]

School of Public Health and Social Work

Institute of Health and Biomedical Innovation

Queensland University of Technology

60 Musk Avenue, Kelvin Grove, QLD, 4059, Australia


Appendix D

Email confirmations from co-authors and journals to include the three published papers in this thesis.

Email confirmation from Professor Adrian G. Barnett:

Dear Anisa

I am happy with these author statements. Regards,

Adrian

From: Anisa Rowhani-Farid Sent: Thursday, 19 April 2018 6:38 PM To: Adrian Barnett Subject: Re: Thesis by publication

Hi Adrian,

Just to confirm that you are happy with the following author contribution statements for my thesis:

• Anisa Rowhani-Farid, Adrian G. Barnett (2018). Badges for sharing data and code at Biostatistics: an observational study. F1000Research, 7, 90 doi:10.12688/f1000research.13477.2


ARF conceptualised the study design, collected the data and assisted with data analysis, and wrote the first draft of the manuscript.

AGB conceptualised the study design, analysed the data and edited the manuscript.

• Anisa Rowhani-Farid, Michelle Allen, Adrian G. Barnett (2017). What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review, 2 doi:10.1186/s41073-017-0028-9

ARF collected and analysed all data for the study and wrote the manuscript.

ARF read and approved the final manuscript.

MA collected data and analysed (30%) for the study and edited the manuscript.

MA read and approved the final manuscript.

AGB provided close student mentorship for this research, which is a part of ARF’s PhD under his primary supervision, and was a major contributor for the writing of this manuscript.

AGB read and approved the final manuscript.


• Anisa Rowhani-Farid, Adrian G Barnett (2016). Has open data arrived at the British Medical Journal (BMJ)? An observational study. BMJ Open, 6, e011784 doi:10.1136/bmjopen-2016-011784

ARF – data collection, data analysis, writing of manuscript

ARF had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

AGB – data verification, design of study and student supervision, editing of manuscript, statistical code for data analysis.

AGB had full access to all of the data (including statistical reports and tables) in the study and can take responsibility for the integrity of the data and the accuracy of the data analysis.

Cheers,

Anisa

Dear Anisa

I give my permission for the following papers to be included in your thesis by publication:

• Anisa Rowhani-Farid, Adrian G. Barnett (2018). Badges for sharing data and code at Biostatistics: an observational study. F1000Research, 7, 90 doi:10.12688/f1000research.13477.2


• Anisa Rowhani-Farid, Michelle Allen, Adrian G. Barnett (2017). What incentives increase data sharing in health and medical research? A systematic review. Research Integrity and Peer Review, 2 doi:10.1186/s41073-017-0028-9

• Anisa Rowhani-Farid, Adrian G Barnett (2016). Has open data arrived at the British Medical Journal (BMJ)? An observational study. BMJ Open, 6, e011784 doi:10.1136/bmjopen-2016-011784

Regards,

Prof Adrian Barnett BSc PhD

Senior Research Fellow

Institute of Health and Biomedical Innovation (IHBI) & School of Public Health and Social Work

Queensland University of Technology

60 Musk Avenue

Kelvin Grove, Queensland, 4059

Australia

Phone: ++61-7-3138 6010

Publications: http://eprints.qut.edu.au/view/person/Barnett,_Adrian.html https://twitter.com/aidybarnett

***

Email confirmation from Dr Michelle Allen:

Works for me

From: Anisa Rowhani-Farid [mailto:[email protected]] Sent: Thursday, 19 April 2018 4:05 PM To: Michelle Allen Subject: Re: systematic review


Hi Michelle,

Just to confirm that you are happy with this author contribution statement for my thesis:

MA collected data and analysed (30%) for the study and edited the manuscript.

MA read and approved the final manuscript.

Cheers,

Anisa

Hi Anisa,

Great to hear from you – we will have to catch up for coffee soon.

I hereby provide my permission to include our systematic review “What incentives increase data sharing in health and medical research? A systematic review”, in your PhD.

Kindest regards,

Michelle

Michelle Allen, PhD | Monitoring and Evaluation Specialist – Population Health, Evaluation and Research Unit | Abt Associates

PO Box 1874 | Milton QLD | 4064 Australia


Phone: +61 7 3891 4132 | Email: [email protected] | Web: www.abtassoc.com.au

***

Email confirmation from Springer Nature:

Dear Anisa,

Thank you for contacting Springer Nature.

The open access articles published in BioMed Central's journals are made available under the Creative Commons Attribution (CC-BY) license, which means they are accessible online without any restrictions and can be re-used in any way, subject only to proper attribution (which, in an academic context, usually means citation).

The re-use rights enshrined in our license agreement (http://www.biomedcentral.com/about/policies/license-agreement) include the right for anyone to produce printed copies themselves, without formal permission or payment of permission fees. As a courtesy, however, anyone wishing to reproduce large quantities of an open access article (250+) should inform the copyright holder and we suggest a contribution in support of open access publication (see suggested contributions at http://www.biomedcentral.com/about/policies/reprints-and-permissions/suggested-contributions).

Please note that the following journals have published a small number of articles that, while freely accessible, are not open access as outlined above: Alzheimer’s Research & Therapy, Arthritis Research & Therapy, Breast Cancer Research, Critical Care, Genome Biology, Genome Medicine, Stem Cell Research & Therapy. You will be able to find details about these articles at http://www.biomedcentral.com/about/policies/reprints-and-permissions

If you have any questions, please do not hesitate to contact me.

With kind regards,

---

Dang Abadiano Global Open Research Support Executive

Global Open Research Support


Springer Nature
T +44 (0)203 192 2009
www.springernature.com

***

Email confirmation from BioMed Central Ltd.:

RIPR-D-16-00027R1

What incentives increase data sharing in health and medical research? A systematic review

Anisa Rowhani-Farid, BMedSc, MPH; Michelle Allen, BHM, PhD; Adrian Gerard Barnett, BSc (Hons), PhD

Research Integrity and Peer Review

Dear Ms. Anisa,

Thank you for your email.

All authors of papers published by BioMed Central and SpringerOpen retain copyright of their manuscripts and can therefore reuse any part of them, as long as the original article is properly cited.

Our full copyright policy can be found here: http://www.biomedcentral.com/getpublished/copyright-and-license or http://www.springeropen.com/get-published/copyright

Please let us know if you have any further questions.

Best Wishes,

Allen Claveria

Journals Editorial Office (JEO)


______

From: Anisa Rowhani-Farid

Sent: Wednesday, April 4, 2018 2:33 AM

To: Claveria, Allen, BioMed Central Ltd.; Info@BMC

Subject: What incentives increase data sharing in health and medical research? A systematic review

Hi Elizabeth Wager and other editors of Research Integrity and Peer Review,

I hope you are well.

I am submitting my PhD in a few weeks’ time and I require copyright permission from the publisher to include my article “What incentives increase data sharing in health and medical research? A systematic review” in my thesis. A response to this email providing permission would suffice.

Kind regards,

Anisa


***

Email confirmation from BMJ Open:

Hi Anisa,

Glad to hear you made it back!

You actually don't require permission to include it in your thesis as it's published under a CC BY-NC licence (so you can reuse it for anything as long as it's attributed and not used commercially). So it's no problem at all!

Adrian

***

Email confirmation from F1000Research:

Hi Anisa,

Thanks for your email. Your article is published under a CC-BY license, with the copyright belonging to you and your co-authors. This means that it can be re-used, provided that the original article is cited correctly. The full copyright statement is: “Copyright: © 2018 Rowhani-Farid A and Barnett AG. This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.”

I hope that this has helped – if you have any further questions do let us know.


Best wishes,

Rachel
