
Data Science 2 (2019) 341–352 341 DOI 10.3233/DS-190020 IOS Press The ten commandments of translational research informatics Tim Hulsen Department of Professional Health Solutions & Services, Philips Research, Eindhoven, The Netherlands E-mail: [email protected]; ORCID: https://orcid.org/0000-0002-0208-8443 Editor: Manisha Desai (https://orcid.org/0000-0002-6949-2651) Solicited reviews: Lance Downing (https://orcid.org/0000-0002-2133-9244); Mark Musen (https://orcid.org/0000-0003-3325-793X); 3 anonymous reviewers Received 23 March 2019 Accepted 26 July 2019 Abstract. Translational research applies findings from basic science to enhance human health and well-being. In translational research projects, academia and industry work together to improve healthcare, often through public-private partnerships. This “translation” is often not easy, because it means that the so-called “valley of death” will need to be crossed: many interesting findings from fundamental research do not result in new treatments, diagnostics and prevention. To cross the valley of death, fundamental researchers need to collaborate with clinical researchers and with industry so that promising results can be im- plemented in a product. The success of translational research projects often does not depend only on the fundamental science and the applied science, but also on the informatics needed to connect everything: the translational research informatics. This informatics, which includes data management, data stewardship and data governance, enables researchers to store and analyze their ‘big data’ in a meaningful way, and enable application in the clinic. The author has worked on the information technology infrastructure for several translational research projects in oncology for the past nine years, and presents his lessons learned in this paper in the form of ten commandments. These commandments are not only useful for the data managers, but for all involved in a translational research project. Some of the commandments deal with topics that are currently in the spotlight, such as machine readability, the FAIR Guiding Principles and the GDPR regulations. Others are mentioned less in the literature, but are just as crucial for the success of a translational research project. Keywords: Translational research, medical informatics, data management, data curation, data science 1. Introduction Translational research applies findings from basic science to enhance human health and well-being. In a medical research context, it aims to “translate” findings in fundamental research into medical practice and meaningful health outcomes. In translational research projects, academia and industry work together to improve healthcare, often through public-private partnerships [28]. This “translation” is often not easy, because it means that the so-called “valley of death” [4] will need to be crossed: many interesting find- ings from fundamental research do not result in new treatments, diagnostics and prevention. To cross the valley of death, fundamental researchers need to collaborate with clinical researchers and with industry 2451-8484 © 2019 – IOS Press and the authors. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution License (CC BY 4.0). 342 T. Hulsen / The ten commandments of translational research informatics so that promising results can be implemented in a product. Examples of initiatives supporting trans- lational research are EATRIS [3], the European Infrastructure for Translational Medicine and NCATS [24], the National Center for Advancing Translational Sciences in the USA. The success of translational research projects often does not depend only on the fundamental science and the applied science, but also on the informatics needed to connect everything: the ‘translational research informatics’. This type of informatics was first described in 2005 by Payne et al. [36], as the in- tersection between biomedical informatics and translational research. Translational research informatics should enable the researchers to store and analyze their ‘big data’ in a meaningful way, and enable ap- plication in the clinic [19]. This translational research informatics field includes data management, data stewardship and data governance, which are receiving more attention recently. Data management (in research) is the care and maintenance of the data that are produced during the course of a research cycle. It is an integral part of the research process and helps to ensure that your data are properly organized, described, preserved, and shared [40]. Data management ensures that the story of a researcher’s data collection process is organized, understandable, and transparent [48]. According to Rosenbaum, 2010 [41], data stewardship is a concept with deep roots in the science and practice of data collection, shar- ing, and analysis. It denotes an approach to the management of data, particularly data that can identify individuals. Data governance is the process by which responsibilities of stewardship are conceptualized and carried out. Perrier et al. (2017) presents an extensive overview of 301 articles on data management, distributed over the six phases of the Research Data Lifecycle [15]: (1) Creating Data, (2) Processing Data, (3) Analysing Data, (4) Preserving Data, (5) Giving Access to Data and (6) Re-Using Data. It shows that most publications focus on phases 4 to 6 (especially phase 5), but not so much on phases 1 to 3, while these are equally important. The author has worked on the information technology (IT) infrastructure, data integration and data management for several translational research projects in oncology [20–22,27,39] for the past nine years, as well as on a large Dutch translational research informatics project [32], and presents his lessons learned in this paper in the form of ten commandments. These commandments are not only useful for the data managers, but for all involved in a translational research project, since they touch upon crucial elements such as data quality, data access and sustainability, and cover all phases of the research data lifecycle. As opposed to most publications in the field of data management, this article even puts an emphasis on the early phases of the research data lifecycle. 2. The ten commandments 2.1. Commandment 1: Create a separate data management work package When clinicians, biologists and other researchers come together in a translational research project, they often do not think about data management, data curation, data integration and the IT infrastruc- ture, except for when it is already too late: the data sit on several computers scattered over different organizations, and nobody knows how to combine them and make sense of them. The solution: create a separate work package or work stream on data management. This work package can be thought of as a sub-project, which has its own goal (or even a list of milestones and deliverables) and has full-time equivalents (FTEs) and other financial resources allocated. Within this work package, a data manage- ment plan (DMP) will be created which describes exactly how the data management will take place (such a DMP is obligated nowadays in several funding programmes such as Horizon 2020 [17], and T. Hulsen / The ten commandments of translational research informatics 343 Fig. 1. A proposed work package (WP) structure for a translational research project. with good reason). Since this data management work package will have touchpoints with the other (data generating) work packages, the data management work package leader needs to be involved in all project meetings. It is also advised to create a separate work package on data analysis, which gets its input from the data management / data integration work package (Fig. 1). 2.2. Commandment 2: Reserve time and money for data entry Investigators tasked with generating the hypotheses may not have the expertise to understand the complexities of processing and managing data that are not clean and ready for analysis. For example, they usually lack knowledge about an Electronic Data Capture (EDC) system such as OpenClinica [35], Castor EDC [6]orREDCap[16], as part of work package 1 in Fig. 1. This work is often assigned to trial nurses, who usually already have a high workload. The data entry work is on the bottom of their priority list, which can cause delays and even errors. Which is a large concern, because data quality is an important matter when it comes to data analysis [5]. Even when the data are extracted from an Electronic Health Record (EHR) and entered into the EDC system automatically, this needs to be checked by someone. The term “Garbage in, garbage out” (GIGO) comes from computer science, but applies to medical data as well [26]. The solution here is to reserve money to hire people who can do this job for a certain amount of hours per week. By spending relatively little money on data entry, one can save a lot of time and money by not having to redo analyses because of missing or erroneous data. This commandment does not apply to just clinical data but to the other data types as well: imaging and digital pathology data often need to be annotated, biobank data usually need to be exported from a Laboratory Information Management System (LIMS), and raw genomics data need to be processed first before they can be used in an integrated database. All these steps need sufficient resources to make sure that they are executed correctly. 2.3. Commandment 3: Define all data fields up front together with the help of data analysis experts Within the Prostate Cancer Molecular Medicine (PCMM [20]) project, we noticed after a few years that some information that was essential to answer certain research questions was not being collected in the electronic case report form (eCRF). A second eCRF needed to be constructed, which resulted in a lot of time being spent on going back to the patient’s entries in the hospital system and collecting the data, if they were there at all. We learned our lesson.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages12 Page
-
File Size-