Sharing Is Caring—Data Sharing Initiatives in Healthcare

International Journal of Environmental Research and Public Health Review Sharing Is Caring—Data Sharing Initiatives in Healthcare Tim Hulsen Department of Professional Health Solutions & Services, Philips Research, 5656AE Eindhoven, The Netherlands; [email protected] Received: 28 February 2020; Accepted: 24 April 2020; Published: 27 April 2020 Abstract: In recent years, more and more health data are being generated. These data come not only from professional health systems, but also from wearable devices. All these ‘big data’ put together can be utilized to optimize treatments for each unique patient (‘precision medicine’). For this to be possible, it is necessary that hospitals, academia and industry work together to bridge the ‘valley of death’ of translational medicine. However, hospitals and academia often are reluctant to share their data with other parties, even though the patient is actually the owner of his/her own health data. Academic hospitals usually invest a lot of time in setting up clinical trials and collecting data, and want to be the first ones to publish papers on this data. There are some publicly available datasets, but these are usually only shared after study (and publication) completion, which means a severe delay of months or even years before others can analyse the data. One solution is to incentivize the hospitals to share their data with (other) academic institutes and the industry. Here, we show an analysis of the current literature around data sharing, and we discuss five aspects of data sharing in the medical domain: publisher requirements, data ownership, growing support for data sharing, data sharing initiatives and how the use of federated data might be a solution. We also discuss some potential future developments around data sharing, such as medical crowdsourcing and data generalists. Keywords: data sharing; data management; data science; big data; healthcare 1. Introduction The past years have seen a steep rise in the amount of health data being generated. These data come not only from professional health systems (MRI scanners, pathology slides, DNA tests, etc.) but also from wearable devices. All these data combined form ‘big data’ that can be utilized to optimize treatments for each unique patient (‘precision medicine’) [1]. To achieve this precision medicine, it is necessary that hospitals, academia and industry work together to bridge the ‘valley of death’ of translational medicine [2]. However, hospitals and academia often have problems with sharing their data, even though the patient is actually the owner of his/her own health data, and data sharing is associated with increased citation rate [3,4]. Academic hospitals usually want to be the first ones to publish papers on the data, because they spent a lot of time in setting up clinical trials and collecting the data. Society benefits the most if the patient’s data are shared as soon as possible so that other researchers can work with it [5], but this idea has not settled in yet. Some datasets are publicly available (e.g., in prostate cancer [6]), but these are usually only shared after studies are finished and/or publications have been written based on the data, which means a severe delay of months or even years before others can use the data for analysis. One solution is to incentivize the hospitals [7,8] to share their data with (other) academic institutes and the industry. Besides this academic reluctance, data is also being shared less because of stricter privacy laws such as the EU General Data Protection Regulation (GDPR) [9] and the California Consumer Privacy Act (CCPA) [10]. At the moment, only around 10% of the world’s population has it personal information covered by the GDPR or similar Int. J. Environ. Res. Public Health 2020, 17, 3046; doi:10.3390/ijerph17093046 www.mdpi.com/journal/ijerph Int. J. Environ. Res. Public Health 2020, 17, x 2 of 12 data is also being shared less because of stricter privacy laws such as the EU General Data Protection Regulation (GDPR) [9] and the California Consumer Privacy Act (CCPA) [10]. At Int. J. Environ. Res. Public Health 2020, 17, 3046 2 of 12 the moment, only around 10% of the world’s population has it personal information covered by the GDPR or similar laws, but Gartner Research predicts that this will be around 50% by laws,2022 but [11]. Gartner There Research is an increasingly predicts that urgent this willneed be to around balance 50% the byopportunity 2022 [11]. big There data is anprovides increasingly for urgentimproving need to balancehealthcare, the opportunityagainst the right big data of individuals provides for to improving control their healthcare, own data against [1]. Scientists the right of individualsshould maximize to control theirtheir ownefforts data to [improve1]. Scientists healthca shouldre, maximizebut they should their eff alsoorts toonly improve use data healthcare, with butappropriate they should informed also only useconsent. data This with open appropriate science informedvs. privacy consent. balance This will open remain science an increasing vs. privacy balancechallenge will remain for the ancoming increasing years. challenge for the coming years. TheThe topic topic of data of data sharing sharing has receivedhas received more more attention attention in recent in recent years. years. In 1980, In 1980, only 46only articles 46 (0.0186%articles of (0.0186% the total) publishedof the total) in PubMedpublished contained in PubMed the keywordcontained “data the sharing”,keyword while“data insharing”, 2019 there werewhile 5960 in articles 2019 there (0.4253% were of 5960 the articles total) containing (0.4253% thisof the keyword total) containing (Figure1). this It is keyword also interesting (Figure to 1). see theIt sudden is also rise interesting of interest to insee the the subject sudden since rise 2016, of in theterest year in of the the subject approval since of 2016, the GDPR, the year and of another the peakapproval in 2018, of the the year GDPR, of its and enforcement. another peak in 2018, the year of its enforcement. Figure 1. Graph of the number of abstracts of PubMed publications containing the keyword “data sharing”Figure as 1. a percentageGraph of the of thenumber total, of per abstracts year since of PubM 1980. ed publications containing the keyword “data sharing” as a percentage of the total, per year since 1980. If we use PubMed to find terms related to “data sharing”, there are some interesting observations (Figure2).If Mostlywe use used PubMed are obviously to find terms terms related such as to “patients”, “data sharing”, “health”, there “study” are some and “information”, interesting butobservations closely behind (Figure these 2). are Mostly “use” (orused “used” are obviously/”using”), terms “treatment”, such as “patients”, “care” “analysis” “health”, and “study” “rights”. “Use”and might “information”, point to the but fact closely that behind data collection these are and“use” sharing (or “used”/”using”), is closely connected “treatment”, to the “care” usage of the“analysis” data, i.e., in and the “rights”. consent “Use” form it might should point be mentionedto the fact that in detail data collection what the healthand sharing data will is closely be used for.connected “Treatment”, to the “care” usage and of “analysis”the data, i.e., point in tothe one consent of the form main it uses should of the be data:mentioned analysis in detail in order to improvewhat the treatment health data and will care, be forused example for. “Treatment”, in clinical decision “care” and support “analysis” (CDS) point systems. to one “Rights” of the is probablymain relateduses of tothe the data: patients’ analysis privacy in order rights to whenimprove it comes treatment to data and sharing, care, for an issueexample that in is clinical discussed in detaildecision in this support manuscript. (CDS) systems. “Rights” is probably related to the patients’ privacy rights whenThere it havecomes been to data some sharing, studies onan theissue conditions that is discussed and challenges in detail for in sharing this manuscript. data. For example, for the BigData@HeartThere have platform been some of the studies Innovative on the Medicines conditions Initiative and challenges (IMI), a descriptive for sharing case data. study For into the conditionexample, forfor datathe sharingBigData@Heart was carried platform out [12 ].of Principle the Innovative investigators Medicines of the participating Initiative (IMI), databases a weredescriptive requested case to send study any into kind the of condition documentation for data that sharing possibly was specified carried the out conditions [12]. Principle for data sharing,investigators which were of the then participating qualitatively databases reviewed were for conditions requested relatedto send to any data kind sharing of documentation and data access. Thisthat review possibly revealed specified overlap the conditions on the conditions: for data (1)sharing, only towhich share were health then data qualitatively for scientific reviewed research, (2) infor anonymized conditions /relatedcoded form,to data (3) sharing after approval and data from access. a designated This review review revealed committee, overlap and on while the (4) observingconditions: all appropriate (1) only to measuresshare health for data for security scientific and research, in compliance (2) in withanonymized/coded the applicable laws form, and regulations. These challenges give thought to the design of an ethical governance framework for data sharing platforms. The conclusion of the case study was that current data sharing initiatives should concentrate on: (1) the scope of the research questions that may be addressed, (2) how to deal with Int. J. Environ. Res. Public Health 2020, 17, x 3 of 12 (3) after approval from a designated review committee, and while (4) observing all appropriate measures for data security and in compliance with the applicable laws and regulations. These challenges give thought to the design of an ethical governance framework for data sharing Int.platforms. J. Environ. Res.

Sharing Is Caring—Data Sharing Initiatives in Healthcare

Data Management and Data Sharing in Science and Technology Studies

ORION OPEN SCIENCE FACTSHEETS Brief, Informative, and Easy-To-Understand One-Page Factsheets on Open Science Topics

Standards for Design and Measurement Would Make Clinical

Biodiversa Citizen Science Toolkit P

Data Sharing and the Future of Science Who Beneﬁts from Sharing Data? the Scientists of Future Do, As Data Sharing Today Enables New Science Tomorrow

A Mixed-Methods Investigation Into Barriers for Sharing Geospatial and Resilience Flood Data in the UK

CODATA Task Group on Data Citation Standards

Digital Research Data Sharing and Management

Data Papers As a New Form of Knowledge Organization in the Field of Research Data Joachim Schöpfel, Dominic Farace, Hélène Prost, Antonella Zane

A Content Analysis on the Rhetorical Moves Used in Data Paper Abstracts K

Toward Open Data Policies in Phonetics

A Guideline for a Public-Private Partnership on Urban Big Data Sharing