Genome Medicine

Towards a European Health Research and Innovation Cloud (HRIC) --Manuscript Draft--

Manuscript Number: GMED-D-19-00086R2 Full Title: Towards a European Health Research and Innovation Cloud (HRIC) Article Type: Opinion Section/Category: I don't know, Editor will decide section

Funding Information: European Commission Not applicable (Travel grant to attend the workshop)

Abstract: The European Union (EU) initiative on the Digital Transformation of Health and Care (Digicare) aims at providing conditions for building a secure, flexible and decentralised digital health infrastructure. Creating a European Health Research and Innovation Cloud (HRIC) within this environment should enable data sharing and analysis for health research across the EU in compliance with data protection legislation while preserving the full trust of the participants. Such a HRIC should learn from and build on existing data infrastructures, integrate best practices, and focus on the concrete needs of the community in terms of technologies, governance, management, regulation and ethics requirements. Here we describe the vision and expected benefits of digital data sharing in health research activities and a roadmap fostering the opportunities while answering the challenges of implementing a HRIC. For this, we put forward five specific recommendations and action points to ensure that a European HRIC: 1) is built on established standards and guidelines, providing cloud technologies through an open and decentralised infrastructure; 2) is developed and certified to the highest standards of interoperability and data security that can be trusted by all stakeholders; 3) is supported by a robust ethical and legal framework compliant with the EU General Data Protection Regulation (GDPR); 4) establishes a proper environment for the training of new generations of data and medical scientists; 5) stimulates research and innovation in transnational collaborations through public and private initiatives and partnerships funded by the EU through Horizon 2020 and Horizon Europe. Corresponding Author: Charles Auffray European Institute for Systems Biology and Medicine Lyon, FRANCE Corresponding Author Secondary Information: Corresponding Author's Institution: European Institute for Systems Biology and Medicine Corresponding Author's Secondary Institution: First Author: Frank M Aarestrup, PhD First Author Secondary Information: Order of Authors: Frank M Aarestrup, PhD Abdullah Albeyatti, MD Willian J Armitage, PhD Charles Auffray Luca Augello, MSc Rudi Balling, PhD Nora Benhabiles, PhD Guido Bertolini, MD Jan G Bjaalie, MD, PhD Michaela M Black, PhD

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Niklas Blomberg, PhD Petronille Bogaert, MSc Marian Bubak, PhD Brecht Claerhout, MSc Laura Clarke, MSc Bertrand De Meulder, PhD Gianni D'Errico, PhD Alberto Di Meglio, PhD Nikolaus Forgo, PhD Caroline Gans-Combe, LLM Alexander E Gray, BA Ivo Gut, PhD Alexandra Gyllenberg, PhD Georg Hemmrich-Stanisak, PhD Lars Hjorth, MD Yannis Ioannidis, PhD Sonata Jarmalaite, PhD Alexander Kel Ferath Kherif, PhD Jan O Korbel, PhD Catherine Larue, PhD Mitzi László, MA Andrew Maas, MD, PhD Luis Magalhaes, MBA Isabelle Manneh-Vangramberen, RPH Edwin Morley-Fletcher, PhD Christian Ohmann, MD Per Oksvold, MSc Neil P Oxtoby, PhD Isabelle Perseil, PhD Vasileios Pezoulas, BSc Olaf Riess, MD Heleen Riper, PhD Josep Roca, MD Philip Rosenstiel, MD Philippe Sabatier, PhD Ferran Sanz, PhD Mohammed Tayeb Gard Thomassen, PhD Johan Van Bussel, PhD

Powered by Editorial Manager®Marc Van and Den ProduXion Bulcke, PhD Manager® from Aries Systems Corporation Herman Van Oyen, PhD Order of Authors Secondary Information: Response to Reviewers: Dear Genome Medicine Editors,

I have uploaded a clean copy of the revised version of our HRIC paper, and an annotated version in which the changes and additional references are visible, addressing the nine questions as requested in the mail from your colleague Romina Andrew, also introducing minimal clarifications for the other minor points mentioned by Reviewer 1.

We hope you will find this final revised version acceptable for publication in Genome Medicine.

With best wishes.

Charles Auffray and Nora Benhabiles on behalf of the authors

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Manuscript Click here to access/download;Manuscript;HRIC-manuscript- revised-12122019_clean.docx Click here to view linked References

1 Towards a European Health Research and Innovation Cloud (HRIC) 2 3 1 2 3 4* 5 6* 4 Albeyatti A. , Armitage W.J. , Augello L. , Auffray C. , Balling R. , Benhabiles N. , Bertolini 5 G.7, Bjaalie J.G.8, Black M.9, Blomberg N.10*, Bogaert P.11, Bubak M.12, Claerhout B.13, 6 Clarke L.14, De Meulder B.15, D'Errico G.16, Di Meglio A.17, Forgo N.18, Gans-Combe C.19, 7 Gray A.E.20, Gut I.21, Gyllenberg A.22, Hemmrich-Stanisak G.23, Hjorth L.24, Ioannidis Y.25, 8 26 27 28 29* 30 31 32 9 Jarmalaite S. , Kel A. , Kherif F. ., Korbel J.O. , Larue C. , Laszlo M. , Maas A. , 33 34 35 36 10 Magalhaes L. , Manneh-Vangramberen I. , Aarestrup F.M. , Morley-Fletcher E. , 11 Ohmann C.37, Oksvold P.38, Oxtoby N.P.39, Perseil I.40, Pezoulas V.41, Riess O.42, Riper H. 43, 12 Roca J.44, Rosenstiel P.45, Sabatier P.46, Sanz F.47, Tayeb M.48, Thomassen G.49, Van Bussel 13 50 51 52 14 J. , Van den Bulcke M. , Van Oyen H. 15 16 Affiliations: 17 1Medicalchain, National Health Service, London United Kingdom, ‘[email protected]’ 18 2Bristol Medical School, Translation Health Sciences, Bristol, United Kingdom ‘[email protected]’ 19 3Regional Agency for Innovation & Procurement (ARIA), Welfare Services Division, Milano, Lombardy, Italy 20 [email protected] 21 4European Institute for Systems Biology and Medicine (EISBM), Vourles, France '[email protected]' 22 5Luxembourg Centre for Systems Biomedicine, Campus Belval, University of Luxembourg '[email protected]' 23 6CEA, French Atomic Energy and alternative energy Commission, Direction de la Recherche Fondamentale, 24 Université Paris-Saclay, F-91191 Gif-sur-Yvette, France '[email protected]' 25 7 Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Bergamo, Italy '[email protected]' 26 8Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway '[email protected]' 27 9Ulster University, Belfast, United Kingdom '[email protected]' 28 10ELIXIR, Welcome Genome Campus, Hinxton, CB10 1SD, United Kingdom 'niklas.blomberg@elixir- 29 europe.org' 30 11Sciensano, Brussels, Belgium and Tilburg University, Tilburg, The Netherlands 31 '[email protected]' 32 12Akademia Gornizco Hutnizca University of Science and Technology, Department of Computer Science and 33 Academic Computing Center Cyfronet, Krakow, Poland '[email protected]' 34 13Ghent University, Ghent, Belgium ‘[email protected]'' 35 14European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, 36 Hinxton, Cambridge, CB10 1SD, UK '[email protected]' 37 15European Institute for Systems Biology and Medicine (EISBM), Vourles, France '[email protected]' 38 16Fondazione Toscana Life Sciences '[email protected]' 39 17CERN, European Organization for Nuclear Research, Meyrin, Switzerland '[email protected]' 40 18University of Vienna, Vienna, Austria '[email protected]' 41 19INSEEC School of Business & Economics, Paris, France '[email protected]' 42 20PwC, Oslo, Norway '[email protected]' 43 21Center for Genomic Regulations, Barcelona, Spain '[email protected]' 44 22Neuroimmunology Unit, The Karolinska Neuroimmunology & Multiple Sclerosis Centre, Department of 45 Clinical Neuroscience, Karolinska Institute, Stockholm, Sweden '[email protected]' 46 23Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus 47 Kiel, Kiel, Germany ‘[email protected] 48 24Lund University, Skåne University Hospital, Department of Clinical Sciences, Pediatrics, Lund, Sweden 49 '[email protected]' 50 25Athena Research & Innovation Center and University of Athens, Greece '[email protected]' 51 26 52 National Cancer Institute, Vilnius, Lithuania '[email protected]' 27geneXplain GmbH, Wolfenbüttel, Germany '[email protected]' 53 28 54 Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland '[email protected]' 29European Molecular Biology Laboratory, Unit, Heidelberg, Germany '[email protected]' 55 30 56 Integrated Biobank of Luxembourg, Luxembourg, '[email protected]' 31‘[email protected]’ 57 32 58 Antwerp University Hospital and University of Antwerp, Belgium '[email protected]' 33 59 Clinerion Ltd, Basel, Switzerland '[email protected]' 34 60 European Cancer Patient Coalition, Belgium '[email protected]' 61 62 1 63 64 65 35Technical University of Denmark, Kongens Lyngby, Denmark '[email protected]' 36 1 Lynkeus and Public Policy Consultant, Rome, Italy '[email protected]' 37 2 European Clinical Research Infrastructure Network, Duesseldorf, Germany '[email protected] 3 duesseldorf.de' 38 4 Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden 5 '[email protected]' 39 6 Centre for Medical Image Computing, Department of Computer Science, University College London, United 7 Kingdom '[email protected]' 40 8 Institut National de la Santé et de la Recherche Médicale, Information Technology Department, Paris, France 9 '[email protected]' 10 41Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and 11 Engineering, University of Ioannina, Ioannina, Greece '[email protected]' 12 42Institute of Medical Genetics and Applied Genomics, Rare Disease Center, Tuebingen, Germany 13 '[email protected]' 14 43Vrije Universiteit, Dept. of Behavioural and Movement Sciences, Section Clinical, Neuro and Developmental 15 Psychology, Amsterdam, The Netherlands'[email protected]' 16 44Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain '[email protected]' 17 45Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus 18 Kiel, Kiel, Germany '[email protected]' 19 46French National Centre for Scientific Research, Grenoble, France '[email protected]' 20 47Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain 21 '[email protected]' 22 48Medicalchain, London, United Kingdom '[email protected]' 23 49University of Oslo, Oslo, Norway '[email protected]' 24 50Scientific Institute of Public Health, Brussels, Belgium '[email protected]' 25 51Scientific Institute of Public Health, Brussels, Belgium '[email protected]' 26 52Ghent University, Ghent, Belgium and Sciensano, Brussels, Belgium 27 '[email protected]' 28 * corresponding authors 29 30 31 32 33 Abstract 34 35 36 The European Union (EU) initiative on the Digital Transformation of Health and Care (Digicare) 37 aims at providing conditions for building a secure, flexible and decentralised digital health 38 infrastructure. Creating a European Health Research and Innovation Cloud (HRIC) within this 39 40 environment should enable data sharing and analysis for health research across the EU in 41 compliance with data protection legislation while preserving the full trust of the participants. 42 Such a HRIC should learn from and build on existing data infrastructures, integrate best 43 44 practices, and focus on the concrete needs of the community in terms of technologies, 45 governance, management, regulation and ethics requirements. Here we describe the vision 46 and expected benefits of digital data sharing in health research activities and a roadmap 47 fostering the opportunities while answering the challenges of implementing a HRIC. For this, 48 49 we put forward five specific recommendations and action points to ensure that a European 50 HRIC: 1) is built on established standards and guidelines, providing cloud technologies through 51 an open and decentralised infrastructure; 2) is developed and certified to the highest 52 53 standards of interoperability and data security that can be trusted by all stakeholders; 3) is 54 supported by a robust ethical and legal framework compliant with the EU General Data 55 Protection Regulation (GDPR); 4) establishes a proper environment for the training of new 56 57 generations of data and medical scientists; 5) stimulates research and innovation in 58 transnational collaborations through public and private initiatives and partnerships funded by 59 the EU through Horizon 2020 and Horizon Europe. 60 61 62 2 63 64 65

1 Background 2 3 4 Genomics has brought life sciences into the realm of data sciences - large scale DNA and RNA 5 sequencing is now routine in life-science and biomedical research with an estimate of up to 6 7 60 million human genomes being available in the coming years [1, 2]. Recent innovations in 8 medical research and healthcare, such as high-throughput genome sequencing, 9 transcriptomics, , , single-cell omics techniques, high-resolution 10 imaging, electronic medical records (EMRs), big-data analytics and a plethora of internet- 11 12 connected health devices fundamentally change the infrastructure requirements for health 13 research. 14 Translating these new data together with clinical information into scientific insights and 15 16 actionable outcomes for improving clinical care is a major challenge. As life-science and health 17 research datasets rapidly grow larger, with an ever-increasing number of study participants 18 required to detect meaningful but weak signals that may be blurred by a myriad of 19 confounding biological, experimental or environmental factors, the computational resources 20 21 required to process and analyse this big data increasingly outgrows the capabilities of even 22 large research institutes. The various cloud technologies and services listed in Table 1 are 23 based on shared commercial and private computer and storage resources that can be 24 25 provided on demand to users from a large number of different institutions conducting or 26 participating in joint projects. They have emerged as powerful solutions to the challenges of 27 collaborating in research on genomic, biomedical and health data. Biomedical and health 28 29 research has yet to fully enter the big data and cloud computing era. The HRIC, as described 30 in this manuscript, would help facilitate this transition, providing access to larger datasets, 31 cutting-edge tools and knowledge, as envisioned in [3]. For example, the HRIC should ease the 32 incorporation of domain expert knowledge into systems disease maps in a format that can be 33 34 both understood by all stakeholders (patients and clinicians, scientists and drug developers), 35 and processed by a high-performance computers, thus supporting the development of 36 innovative medicines and diagnostics [4, 5]. Cloud technologies (through e.g. Hadoop 37 38 applications) makes it possible to collaborate, access and reuse data also in situations when 39 privacy concerns or regulation prohibits remote users from downloading data – an important 40 benefit in Europe where national regulations can differ significantly. Clouds allow bringing 41 algorithms to the data and as such can enable data sharing and joint processing without 42 43 generating unnecessary copies of the data, which comes with potential benefits for data 44 protection [6, 7]. Clouds, additionally, enable performing computational analyses at a scale 45 that individual institutions would struggle to manage [7]. Consequently, in the last few years 46 47 the large international cancer and other genomics consortia have created specialized 48 genomics and biomedical cloud environments, each supporting individual projects [2]. These 49 projects have made important advances in connecting health research data across disciplines, 50 51 organisations and national boundaries. For instance, in research on rare diseases, 52 international collaborations that integrate genomic, phenotypic, and clinical data have 53 introduced new paradigms in diagnosis and care [8]. However, a project-based, fragmented 54 landscape will not enable access and construction of large data cohorts that are required to 55 56 address novel or broader biomedical questions that were not anticipated in individual projects 57 when collecting inform consent from the participants, nor will it provide adequate data 58 governance and containable cost models. 59 60 61 62 3 63 64 65 Scaling and sustainably managing such solutions to support all European life scientists thus 1 requires a coordinated action from science policy makers, funders and other actors in this 2 complex ecosystem. Connecting Europe’s health data to advance the understanding of life and 3 4 disease requires that research data and analysis tools, standards and computational services 5 are made FAIR – i.e. findable, accessible, interoperable, and reusable for researchers across 6 scientific disciplines and national boundaries [9]. Truly enabling personalised and digital 7 8 medicine across Europe and beyond will require a connected digital infrastructure for 9 Europe’s health data that supports systematic openness and integration of research data with 10 real-world datasets (e.g. environmental monitoring data) generated inside all healthcare 11 12 systems, government agencies, foundations and private organisations that will adopt it. th 13 On the 13 of March 2018, the Health directorate of the Directorate-General for Research and 14 Innovation of the European Commission of the EU organised a workshop to explore the 15 possibility and challenges involved in establishing a cloud for health research and innovation, 16 17 accessible by researchers and health professionals throughout Europe, in line with 18 recommendations for a European Innovation Council and the Horizon Europe 2021-2027 19 framework programme [10, 11]. The cloud computing environment proposed in this 20 21 manuscript builds upon the European Open Science Cloud (EOSC) initiative developed in the 22 last few years by the European Commission [12], with a focus in the life science and medicine 23 field. The EOSC aims at developing a trusted, open environment for the scientific community 24 for storing, sharing and re-using scientific data and results. Overall, the authors feel that the 25 26 cloud described in this manuscript, providing the biomedical and health research community 27 with the technical infrastructure and services listed in Table 1 to support the development of 28 innovative diagnostics methods and medical treatments, should become an integral part of 29 30 the EOSC. The workshop gathered a broad range of experts from multiple biomedical research 31 disciplines, health care, informatics, ethics and legislation, including representatives of more 32 than 45 FP7 and Horizon 2020 (H2020) EU-funded collaborative projects. The participants 33 explored requirements and developed a set of recommendations for a European “Health 34 35 Research and Innovation Cloud” (HRIC) to connect researchers and health data sources in 36 Europe [13] such that clinical data, software, computational resources, methods, clinical 37 protocols, and publications, can be more widely and securely accessed and reused following 38 39 the FAIR principles [9] than is currently possible with existing European research infrastrucures 40 such as ELIXIR that form a network of heterogenous national nodes. For example, the HRIC 41 infrastructure would benefit from the aforementioned advantages of cloud computing for the 42 43 archiving and dissemination of health data. This paper summarises the main conclusions from 44 the workshop and highlights five recommendations and action points to the EU and national 45 stakeholders (Table 2). The recommendations are key issues that need to be addressed in 46 order to link biological, clinical, environmental and lifestyle information (from single 47 48 individuals to large cohorts) to the health and wellbeing status of patients and citizens over 49 time, while making this wealth of data and information available for European health research 50 and innovation in clinical care. 51 52 53 The HRIC should be built on established standards and guidelines, to foster European-wide 54 medical research 55 56 57 Rationale 58 59 Sharing of data, information and knowledge represents the most important functionality in 60 the context of a HRIC. High-level standardisation, common exchange mechanisms, interfaces 61 62 4 63 64 65 and protocols, and semantic interoperability form the foundation for widespread adoption of 1 the FAIR principles [9] in health research. Data shared collaboratively in such health-related 2 cloud projects are now largely standardised for the processing of genomic DNA read and 3 4 genomic variant calling files. By comparison, the sharing of highly sensitive clinical and health 5 data has thus far been much less developed hence representing a key future focus area, and 6 numerous challenges remain with regard to sharing these data in a meaningful way. 7 8 9 Existing standards and guidelines 10 11 Many individual projects, in Europe and globally, have demonstrated the opportunities and 12 the added value presented by connecting and exchanging data across countries via 13 standardised protocols. Table 3 lists recent European projects that have developed towards 14 clinical and health data exchange via cloud-based solutions. All those projects have developed 15 16 and implemented worthy ideas that should be included in the HRIC. However, we envision the 17 HRIC as a disease-agnostic environment, and on a larger scale than the platforms mentioned 18 in Table 3. Moreover, the HRIC should not be tied to a single project or consortium, but should 19 20 rather be under the governance of an independent body. International exchange of health 21 research data holds tremendous potential in disease research by facilitating better 22 investigation of disease causality and linking of genotypes and phenotypes, such as 23 24 demonstrated in the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, for example. 25 An important aspect of the cloud-based data governance is that it allows sharing of data 26 outside a consortium through a data request mechanism and a governance infrastructure that 27 track participant consent, and data access. The cloud-based audit trail capabilities of the 28 29 cloud-based research analyses that can be implemented at both data and infrastructure levels 30 is a direct benefit to the data controllers. A standardised data model (or data access model) 31 and/or standardised metadata models facilitate consolidation of different data sets and 32 33 significantly increase the findability, the semantic interoperability and, by consequence, the 34 reusability of data, and thus its ‘FAIRness’ [9, 14]. 35 Workshop participants agreed that a first minimalistic and yet effective approach to data 36 exchange should consist of a small number of initial online repositories containing references 37 38 (e.g. links to render data sets findable) along with metadata (e.g. type and scale of content, 39 specifications describing in what systems the data set may be stored and processed), and 40 indications on how to gain access (e.g. requirements and point of contact). This could be 41 42 designed as a metadata repository containing the metadata of data objects and information 43 on how to gain access to them. The data objects as such may, or indeed should, be stored 44 elsewhere. 45 Beyond their metadata, however, data sets are bound to greatly differ, as research projects 46 47 vary largely in scope. Not only would it be cumbersome to record a large number of 48 parameters irrelevant to the specific question addressed, but it would also be problematic 49 from an ethical viewpoint, considering patient personal data protection aspects [15]. It 50 51 therefore seems more promising to drive standardisation within research communities while 52 looking out for opportunities for overall standardisation. 53 Thus, the workshop envisioned the HRIC as a distributed collection of data repositories, people 54 55 and services, which together make up a framework to share and operate as a federated data 56 commons with reproducible software, standards and expertise based on joint policies and 57 guidelines to conduct health research as performed successfully in previous initiatives [2, 16- 58 19]. The need for federation is also highlighted in the proposed EU action plan for ‘Making 59 60 61 62 5 63 64 65 sense of big data in health research’ [3]. Creating such a HRIC opens new frontiers for research 1 and healthcare via the opportunities for strong international collaborations. 2 Beyond Europe the HRIC should collaborate internationally to drive the development and 3 4 widespread adoption of global standards and connectivity. Ongoing initiatives exist outside 5 Europe that are aiming at developing global standards for the secure exchange of e.g. health 6 and medical records within health information federations while tracking completeness of the 7 8 supporting data [20, 21], as well as developing guidelines for data analytics and standardised 9 workflows. They should be considered for the basis of the HRIC as this will provide the 10 scientific community with means of reproducibility, version control and documentation, which 11 12 will be an important vehicle to drive increased standardisation and connectivity. 13 14 15 16 A European HRIC should be developed and certified to the highest standards of 17 interoperability and data security 18 19 20 Rationale 21 The workshop participants enthusiastically endorsed the vision of the HRIC as a federated 22 environment. Through work on the standardisation, harmonisation and integration of 23 24 genomic data with other health-relevant information to optimise hypothesis-driven analyses, 25 the major blockers for cross-border collaborations in translational research on disease 26 prevention, treatment and management will be addressed through a federated HRIC 27 infrastructure in which data sources remain at their location of origin and are made accessible 28 29 to users through a metadata repository. Moreover, data security has to be integral to the 30 development of the HRIC, using modern cryptology and access control techniques to ensure 31 the protection of the patients’ data contained therein. 32 33 34 Standards in interoperability and data security 35 36 European health systems have different ways of managing and storing health data, making 37 the exchange of clinical data between the EU member states complex. The challenges are well 38 illustrated by the use of eHealth records and their use as secondary research material. In a 39 recent OECD report [22] ten countries reported comprehensive record sharing within one 40 41 country-wide system designed to support each patient having only one EHR (Supplementary 42 Table.). These countries are Estonia, Finland, France, Greece, Ireland, Latvia, Luxembourg, 43 Poland, Slovakia and the United Kingdom (England, Northern Ireland, Scotland and Wales). In 44 45 these countries, plans call for patient records to be shared among physician offices and 46 between physicians and hospitals regarding patient treatment, current medications, and 47 laboratory tests and medical images. Some have already implemented part or all these 48 49 functionalities, while others are progressing toward it. In other countries key aspects of record 50 sharing are managed at sub-national level only, such as within provinces, states, regions or 51 networks of health care organisations (for example, Austria, Germany, Italy, Netherlands, 52 Sweden, and Spain; Supplementary Table). Among them, all have implemented or are 53 54 planning the implementation of a national information exchange that enables key elements 55 to be shared country-wide. Based on the recent reports of the European Commission [23], 56 Belgium, Malta, Portugal, Romania and Slovenia are now developing national EHR systems, 57 58 leading to a total of 16 EU member states providing such services. 59 Within the framework of the Joint Action on Rare Cancers, an EU initiative that brings together 60 European research centres, policy makers and other stakeholders with the aim to set the 61 62 6 63 64 65 agenda at national level, an analysis was made on the status of eHealth medical records in the 1 EU member states. This work builds on the OECD study and complements this with 2 information provided by the European Commission on the national laws on EHRs in the EU 3 4 member states [24]. Thus all EU countries are investing in the development of clinical EHRs, 5 but only some countries are moving forward the possibility of data extraction for research, 6 the provision of statistics and the enablement of other uses that serve the public interest (P. 7 8 Bogaert, personal communication). Countries that develop EHR systems that combine or 9 virtually link data together to capture patient health care histories have the potential to be 10 used also for long-term follow-up of cancer patients. Figure 1 shows how data from various 11 12 sources can be integrated to provide a full picture of patients’ health status over time and 13 carry out research for patterns and anomalies in large populations using the specific 14 combinations of data and analysis resources relevant for each research project, while ensuring 15 compliance with security and data protection regulations. 16 17 The H2020-funded project European Open Science Cloud (EOSC)-Life [25] is developing 18 policies, specifications and tools for the management of data for biological and medical 19 research including aspects of eHealth data. The use of common metadata standards, 20 21 developed in EOSC-Life, as a foundation for remote data discovery and access was emphasised 22 by the workshop participants as a key enabler for the HRIC. For instance, practical and legal 23 considerations for cloud computing of patient data, which include the responsible use of 24 federated and hybrid clouds set up between academic and industrial partners have been put 25 26 forward by early EOSC pilot projects [26]. 27 The workshop participants emphasised that sustainability aspects are critical and must be 28 considered from the start. To ensure that a HRIC could respond properly to emerging needs, 29 30 innovation and technology changes, a distributed federated storage solution offering access 31 to FAIR data and services should be built according to modularity principles. In particular, due 32 to the long-term nature of a HRIC, due consideration should be given to deploying generic and 33 modular computational methods and/or data storage management systems, while the 34 35 information and communications technology (ICT) infrastructure should be flexible, portable 36 and expandable. 37 The million European genomes initiative [27] is a case in point: eighteen member states have 38 39 already signed the Genomics Declaration of Cooperation [28] to enable cross-border access 40 to genomic databases and other health information. This federation of national initiatives [28] 41 will provide secure access to such data resources in the member states to enable the discovery 42 43 of personalised therapies and diagnostics for the benefit of patients. The initiative involves 44 aligning strategies of ongoing national genomic sequencing campaigns with complementary 45 de novo genome sequencing to obtain a total cohort of 1 million Europeans, accessible in a 46 transnational framework, by 2022 [29]. The HRIC would form a basis for such large-scale, 47 48 permanent collaborations. 49 The workshop participants recognise that ensuring a maximal data security is paramount to 50 build and maintain trust with the European citizens. To address this issue, we suggest using 51 52 modern cryptology such as blockchain to ensure data security by design, and holding regular 53 data security assessment (e.g. through hackathons and/or commercial security audits). As 54 shown in recent literature, the use of blockchain in biomedical research is still in its infancy 55 [30]. However, blockchain or other advanced cryptology tools that can be used to protect data 56 57 in a cloud environment could prove useful for the secure and trustworthy implementation of 58 the HRIC [31]. 59 60 61 62 7 63 64 65 The HRIC must be supported by a robust ethics and legal framework compliant with the 1 GDPR 2 3 4 Rationale 5 6 Compliance with GDPR and other data protection laws, as well as enforcing an ethical usage 7 of data is paramount to gain the general public’s support and trust in the HRIC. 8 9 10 Existing ethics and legal framework 11 Unblocking the legal and administrative barriers for sharing human research data across 12 13 geographical and organisational boundaries will, if the trust of research participants is 14 preserved, pave the way for continent-scale cohorts in life science research. This will represent 15 a significant innovation as sharing and joint analysis of sensitive data has until now been 16 17 severely limited because of the different restrictions inherent to the different classes of 18 sensitive data. By using a federated database model, with a metadata repository within the 19 HRIC cloud encrypted environment, data security is maintained while innovative data analyses 20 can be performed by bringing the algorithms to the data rather than centralizing the data [32]. 21 22 Federation, rather than full integration of all available resources, poses an important challenge 23 for implementation and deployment of an effective HRIC. The set-up and functioning of a HRIC 24 requires a robust foundation of legal agreements and ethics rules and procedures, as well as 25 26 security and data protection compliance protocols. Importantly these elements must be 27 introduced during the conception and design phase as part of its governance. This is essential 28 in order to allow the different HRIC actors providing access to their data sources, managing 29 them within the cloud, and accessing these resources to incorporate policy requirements into 30 31 the design and manage the complexity involved in implementing simple and intuitive user 32 interfaces and project portals. This may be challenging considering the heterogeneity of health 33 systems and health market accesses across Europe and will need the sharing of a common 34 35 agreed vision across EU member states. 36 Ethical, societal and privacy considerations for (re)using health related data have been 37 outlined in the Code of Practice on Secondary Use of Medical Research Data developed in the 38 39 European Translational Information and Knowledge Management Services (eTRIKS) project 40 funded by the Innovative Medicines Initiative (IMI) [33]. In addition to clear and explicit 41 consent, explicit dissent may need to be considered for the use/re-use of data. Ultimately, 42 each citizen and patient must be able to access her/his own data and know when and where 43 44 it has been used and for what purpose. In addition, the difficult question of the business model 45 of using those data should be discussed at various levels from ethical, societal and economical 46 standpoints taking into account potential future developments of products and services using 47 48 personal medical data. Furthermore, the goal of providing the citizens with personalised 49 services requires technical advancements in the collection and analysis of data (e.g. in data 50 analytics and machine learning). For this type of usage, simple consent mechanisms might not 51 be sufficient. For example, how should a clear data-collection purpose statement be defined, 52 53 if data is collected for multiple usage scenarios across a distributed/federated cloud in which 54 actors from different geographical and legislative environments will need to interact and 55 cooperate? Would an excessive number of consent requests minimise data provision for 56 57 research or clinical applications? Another level of complexity is introduced by the 58 heterogeneity of data protection and privacy regulations when the data originate from states 59 within federated national health systems (e.g. Germany, Italy). Development of large-scale 60 61 62 8 63 64 65 European access mechanisms will require open consultation and engagement with national 1 policymakers, patient organisations and the wider society to build the trust and confidence 2 needed for widespread adoption and sustainable operations. 3 4 In addition to technical, ethical and legal specifications, a global integrated governance model 5 needs to be established for the HRIC in line with that of the EOSC, regulating roles and 6 responsibilities of all contributing institutions and users, and procedures for authentication 7 8 and access control to individual resources. Principles, with specific guidelines on 9 implementation into a HRIC environment, will need to be developed in order to manage and 10 regulate aspects such as ownership, access, transparency, sharing, integration, 11 12 standardisation of data and metadata formats, tools and frameworks, while ensuring 13 confidentiality and sustainability. All these principles need to be developed with the 14 overarching objective of providing benefit to and preserving the trust of patients and the 15 general public. 16 17 Health data mostly represents sensitive data, which need to be managed to preserve the trust 18 of patients, research participants and the general public, respect social norms and naturally 19 comply with the rules and regulations of data protection laws, notably the EU GDPR [15]. 20 21 Although the GDPR directly applies across the EU and its provisions prevail over national laws, 22 EU member states retain the ability to introduce in their own national legislation certain 23 derogations provided for by the GDPR itself. The GDPR also introduces the notions of 'Privacy 24 by Design', which means that any organisation that processes personal data, must ensure that 25 26 privacy is built into a system during the whole life cycle of the system or process; and 'Privacy 27 by Default', which means that the strictest privacy settings should apply by default, without 28 any manual input from the end user. In addition, any personal data provided by the user to 29 30 enable the optimal use of a given health dataset should only be kept for the amount of time 31 necessary to provide the intended product or service [15]. 32 Thus, successfully linking and accessing biomedical and health data across Europe will require 33 many different disciplines and specialists working together with a coordination effort that 34 35 should encompass controlled access mechanisms to ensure compliance with privacy and data 36 protection regulations. Data providers need logging and monitoring functionalities to comply 37 with the GDPR and to enable tracking of data and methods within the system, controlling 38 39 instances and routines that check for the adherence of predefined standards and formats to 40 guarantee data integrity. Access mechanisms need to be developed that support the 41 researchers, data producers and data analysts to request permissions and fulfil reporting 42 43 requirements for data use in national and international research projects, which represents a 44 significant regulatory, political and sustainability challenge [34]. These include in particular 45 considerations about the rights of patient donors and research participants, taking into 46 account the data protection aspects of various legal systems and local regulations. 47 48 Researchers have to face differences in the understanding of the right to data protection in 49 those different regional or national European ecosystems. There is an urgent need for 50 standardised, usable, data-protection-policy-compliant solutions for sensitive data sharing 51 52 capable of integrating and analysing health data coming from different sources, organisations 53 and potentially from different research disciplines. These aspects are subject to ongoing 54 discussions and debates in the EOSC initiative [35] and for instance, progress has been made 55 in the Human Brain Project through its Ethics and Society sub-project in collaboration with the 56 57 project platforms [36, 37]. Other examples of data sharing compliant with data protection 58 policies can be found in the recent literature [38-45]. Furthermore, there is the issue of 59 capacity with the amount of data starting to strain the infrastructure of any individual hospital 60 61 62 9 63 64 65 or research institute. Thus, the interplay between privacy, data security and access control on 1 one hand and access (including cost-recovery models) to storage, computational and analysis 2 resources on the other hand will be a defining element of the policy and technology 3 4 development of a decentralised digital health infrastructure. The evolution of a cloud model 5 that could be used in European health research will also have to take into account other 6 specific aspects of the GDPR [15]. For instance, the European Commission intends to facilitate 7 8 the free flow of non-personal data in the European Digital Single Market, and for health- 9 related research participants it codifies the ‘right to be forgotten’. It stipulates that patient 10 donors should be able to retain control over their data regardless of technological 11 12 developments. A European HRIC could be an important enabler for researchers to comply with 13 these requirements. For example, once certain conditions are met between European and 14 international partners, including those pertaining to data protection and use, federated and 15 hybrid clouds could facilitate the deletion of data sets once a donor exercises her/his ‘right to 16 17 be forgotten’, which thus could minimise the necessary transfer of large raw data sets across 18 borders, as the deletion can be performed in the original dataset and easily propagated to the 19 relevant federated data sources. 20 21 22 A proper training environment for HRIC developers and users should be established 23 24 25 Rationale 26 The workshop identified the lack of trained personnel, with solid skills in both medical and 27 28 data analytical fields as one of the major bottlenecks when dealing with ‘medical big data’ [3]. 29 30 31 Need for training and ideas 32 Indeed, effectively developing, operating and maintaining the HRIC will pose serious 33 challenges and require the training of a new generation of data scientists able to navigate 34 35 smoothly and efficiently between computational, security and medical disciplines. This 36 includes clinical researchers, bioinformaticians, data analysts, data managers, software 37 engineers, cloud engineers, other IT-specialists, ethics officers, and data protection specialists, 38 39 which represent an essential new field of expertise. Finding professionals that cover more 40 than one or two of the above-described disciplines is nearly impossible. Furthermore, 41 communication between this comprehensive mix of clinical researchers, data managers and 42 IT/bioinformatics specialists needs to be improved, requiring a governance structure well 43 44 beyond that of a standard research setting. The EU should take inspiration from existing large 45 and successful infrastructures that foster multidisciplinary teams, such as CERN [46, 47]. Thus, 46 it is necessary to rethink the training and education of health professionals and to update 47 48 them with the HRIC in mind, considering both international standards and practices for data 49 sharing as well as national environments and regulations. 50 51 52 Multiple funding mechanisms are required to drive the development of the HRIC and 53 support its broad use in research projects 54 55 56 Rationale 57 Delivering the HRIC requires an ambitious reshaping of the European landscape for health 58 data and research through appropriate funding schemes that enable transformation of 59 60 61 62 10 63 64 65 fragmented ICT resources and project-centric solutions for data access and governance into a 1 long-term, coherent service ecosystem that can be accessed by users transnationally. 2 3 4 Need for innovative public-private funding initiatives 5 Indeed, the HRIC needs a trusted and transparent innovation approach that recognises the 6 7 importance of a clear, long-term ambition in the programme to support the participation of 8 industry and small to medium enterprises (SME) in joint projects with a broad set of societal 9 actions. In particular, there is a need to support the EU ICT industrial/SME innovation 10 ecosystem in order to demonstrate the benefits in advancing data sharing, integration and 11 12 analysis across Europe, for the benefit of all citizens, thus creating a foundation for attractive 13 private investments. 14 For this purpose, targeted EU funding mechanisms also involving private investors need to 15 16 support the development of HRIC-compliant services for data sharing and analysis in health- 17 related research projects (i.e. through reimbursement for storage and computing costs) with 18 incentives to reuse and extend existing infrastructures that favour national HRIC participation 19 rather than rebuilding and fragmenting solutions. Moreover, the EU ICT industrial ecosystem 20 21 must be supported in order to alleviate the risks associated with storing and sharing data 22 across cloud systems operated by non-EU companies. The EU has established a strict privacy 23 and ethics policy through the implementation of the GDPR legislation, binding for all operators 24 25 active on the EU territory [48-51]. 26 European funders, science policy makers and other actors need to develop mechanisms that 27 bring together experience gained and lessons learned from a large portfolio of pathfinding 28 29 projects and build on current investments, thus leveraging existing project outcomes. This 30 requires an inclusive and integrative approach with programmes that bring together the many 31 different actors into the HRIC, as its construction needs interdisciplinary collaboration with 32 expertise from many disciplines e.g. economics, ICT, biomedical and health, social sciences 33 34 and policy. In particular, frameworks for public-private partnerships such as IMI have shown 35 a way to include industry in open transparent projects that also includes patients and other 36 public bodies, SMEs and European researchers. Many of the mechanisms proposed in the 37 38 Lamy report (prioritise research and innovation in EU and national budgets, build a true EU 39 innovation policy that creates future markets, rationalise the EU funding landscape and 40 achieve synergy with structural funds…)[52] and in the development of the Horizon Europe 41 “Missions” [53] would also be well suited to the development of the HRIC and help bring 42 43 together initiatives from the many funders, national and regional stakeholders. Other 44 opportunities such as those proposed to be included in the Horizon Europe strategic workplan 45 (e.g. the European Information Cloud, the European Institute of Technology and the European 46 47 Council for Health Research [54, 55]) should be of interest to deploy innovations together with 48 industry at the European level [56]. 49 Furthermore, future programmes need to create incentives such that developed solutions are 50 transformed into long term reusable resources and make sure that this infrastructure is 51 52 deployed across the EU with development informed by ongoing research projects. The 53 European Joint Programme on Rare Diseases (EJP-RD) [56] provides a good example of how 54 infrastructure development can be linked with research projects at national and international 55 56 levels. As in the EJP-RD, the European Strategy Forum on Research Infrastructures (ESFRI) can 57 play a role in the development of HRIC. Two further aspects of EJP-RD are worth noting: the 58 programme has a strong emphasis on the importance on the diverse workforce required with 59 60 a training programme that stretches beyond academia and research networks to reach a 61 62 11 63 64 65 broad set of individuals in health systems and the education sector. Secondly, the research 1 portfolio should fund broadly and recognise that successfully addressing many of the 2 identified challenges will require a diverse portfolio of projects that avoid any artificial 3 4 boundary between biomedical and health research. Horizon Europe should allow for linkages 5 between the HRIC, ESFRI and other themes of Horizon Europe and, importantly, between HRIC 6 and other funding sources such as the European Structural and Innovation Funds (ESIF) and 7 8 the cutting-edge basic research successfully supported by the European Research Council 9 (ERC) during the past decade [57, 58]. 10 The HRIC should enable investments in product development for future health-care solutions 11 12 as well as allow health care providers to procure such solutions. People need to be an integral 13 part of the innovation vision where the HRIC supports a highly skilled future workforce that 14 makes Europe attractive for locating R&D investments. The experience gained in IMI public- 15 private partnership infrastructure projects such as eTRIKS and the European Medical 16 17 Information Framework (EMIF) should be leveraged. Finally, the current and future EU 18 Framework Programmes for Research and Innovation (Horizon 2020 and Horizon Europe) 19 should consider mobilising funds to support novel pilot actions, pooling of data and resources 20 21 across the EU, and demonstrate the benefits in advancing data sharing, integration and 22 analysis across Europe, for the benefit of all citizens. 23 24 25 Conclusions, recommendations and action points 26 27 Clouds are increasingly becoming a key venue for enabling and hosting European and 28 29 international collaborations, benefitting from the ability to hold data securely in a single 30 location (or in few locations) and enable collaborative research on computational 31 infrastructure used for analysis. In conclusion, a cloud-based federated data storage solution 32 with interoperable data access services (see Table 1) to local repositories and modular 33 34 environments that can be configured for a given use case seems to match the data needs of 35 the EU research and medical institutions and of all other stakeholders. The choice of cloud 36 technology provides the ability to manage rapidly growing datasets and provides users with 37 38 access to the massive computational infrastructure needed for analysis. The federated, cloud 39 based research environment - HRIC- described in this paper, would represent an added value 40 to the entire biomedical and bioinformatics community, because single research institutes and 41 medical institutions lack sufficient infrastructure capacity. The establishment of a 42 43 transnational HRIC will allow the European research community at large to contribute to the 44 global international leadership required to address societal and scientific challenges through 45 transnational collaborations. In order to ensure the effective and efficient implementation of 46 47 the European HRIC, the workshop participants endorsed the five recommendations and 48 actions points presented in Table 2 to the EU and all stakeholders. 49 50 51 52 Application A set of programs running on one or more computers allowing a 53 user to perform a set of tasks 54 Cloud Application An application running in the cloud 55 56 Cloud Computing The delivery of IT services over a network (e.g. the Internet) by 57 means of a combination of infrastructure, software and data 58 hosted by one or more cloud providers using a service model 59 60 61 62 12 63 64 65 similar to that used by traditional utility companies (e. g. water or 1 electricity) 2 3 Cloud Federation The combination of infrastructure, software and services from 4 separate networks and providers, having shared access 5 mechanisms, to perform common actions, achieve load-balancing 6 or optimize availability or costs 7 8 Cloud Instance A virtual server or container of resources running on a physical 9 host computer possibly hosting several independent instances 10 (see virtualization) 11 12 Cloud Marketplace An online marketplace of cloud services and applications 13 operated by a Cloud Service Provider (CSP) 14 Cloud Service Provider A company or public entity that offers cloud services to individual 15 16 (CSP) users or other entities 17 Cloud Storage A model of data storage in which data is hosted across one or 18 more facilities by a hosting entity or CSP and remotely accessed 19 20 by users over the Internet 21 Container A type of virtualized instance running on a physical host server in 22 isolated user spaces and possibly preloaded with applications 23 24 Hybrid Cloud A cloud computing infrastructure comprised of a mix of public 25 and private cloud and on-premise instances and resources 26 Infrastructure The combination of hardware resources (network, computing, 27 storage, etc.) and virtualized instances supporting an IT 28 29 environment 30 Infrastructure as a A model of cloud computing in which a CSP provides an 31 Service (IaaS) infrastructure of virtualized resources to users as a service over a 32 33 network (e.g. the Internet) 34 On-Premise It refers to infrastructure or software run on computing resources 35 physically hosted by the entity using them 36 37 Platform A computer system where applications can run on or can be built 38 with 39 Platform as a Service A model of cloud computing in which a CSP provides the 40 41 (PaaS) infrastructure and the platforms where users can run and manage 42 their own applications as a service over a network (e.g. the 43 Internet) 44 Private Cloud A cloud infrastructure used by a single organization either on- 45 46 premise or hosted by a third-party CSP over the Internet or 47 dedicated private networks 48 Public Cloud A cloud infrastructure hosted by a CSP (or a Federation) and used 49 50 by the public or multiple organizations across the Internet 51 Software as a Service A model of cloud computing in which a CSP hosts and provides 52 (SaaS) applications (software) to users as a service over a network (e.g. 53 54 the Internet) 55 Virtualization A technology that allows to run a software simulation of a 56 physical computer where a full operating system and applications 57 58 can be installed 59 Table 1 : Glossary of cloud computing terms. 60 61 62 13 63 64 65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Recommendation Rationale Action points 19 20 The HRIC should be supported by predefined 21 standards, data formats, protocols and templates. The Suggest the adoption of data formats and Provide and foster standards, good 22 data standards and guidelines applied in the HRIC architectures in policies, grant applications 23 practices and guidelines necessary to should be designed as to facilitate interoperability and projects calls, throughout the EU and its 24 establish the HRIC. 25 between the diverse health systems and policies in member states. 26 Europe and globally. 27 28 - In future grant and project calls related to 29 The HRIC should provide computational the development of the HRIC, the EU and 30 infrastructures and services, analytical and the applicants should commit to comply Develop and certify the 31 visualisation tools to all users as a platform to share with the highest standards of security, 32 infrastructure and services required knowledge, data and guidelines. Services for data interoperability and reproducibility. 33 for operation of the HRIC. 34 sharing, security and analysis should be compliant - The EU should develop a certification 35 with an EU certification system. system to validate compliance with the 36 37 standards mentioned. 38 A robust ethical and legal framework has to be 39 developed that defines rules for privacy, security, 40 41 Enable the HRIC to operate within an ownership, access and usage of data within the HRIC. - In grants and project calls, federated and 42 ethical and legal framework A federated system architecture should be preferred GDPR-compliant data architectures 43 adequate for health systems. as it allows for comparison of data and results, while should be preferred. 44 complying with EU (GDPR) and international data 45 46 protection and sharing rules. 47 Education and training of health professionals need to 48 be updated with the HRIC in mind, considering both - Scientists and health professional in 49 50 Establish a proper environment for international standards and practices for data sharing training should be made aware of the 51 the training of a new generation of as well as national environments and regulations. The possibilities of the HRIC. 52 data and medical scientists. EU should take inspiration in existing large and - Communication in relevant professional 53 successful infrastructures that foster multidisciplinary channels should be strengthened. 54 55 teams, such as CERN. 56 57 58 59 60 61 62 14 63 64 65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 The EU and its member states should together with - The EU should invest through calls and 23 Fund public and private initiatives for private investors develop a coherent, ambitious and grants in order to build and consolidate 24 the development of the HRIC through long-term action plan supported by innovative funding the HRIC. 25 26 EU Framework Programmes mechanisms that consolidate the outcomes from the - The existing industrial ecosystem should 27 (Horizon 2010 and Horizon Europe). existing project portfolio into a long-term operational be supported, to remain competitive 28 infrastructure. against the other actors in the world. 29 Table 2: Summary of the recommendations, details on the rationale and suggestions for actions points directed to the funding agencies and the actors of the field. 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 15 63 64 65 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 Figure 1 : Proposed general architecture of the HRIC European (inter)national databases, with varying data formats and data 24 25 types are referenced in a metadata repository, following formatting rules of the Federated data commons as agreed at the 26 27 HRIC governance level. The different users, after access control to the cloud, use the HRIC interface to access the repository, 28 29 which gathers the relevant data and performs analysis, with outputs such as mathematical models, data visualisations, 30 31 statistics and patient’s profiles, according to the users’ needs. 32 33 34 35 36 37 Table 3: Relevant initiatives for the European Health Research and Innovation cloud 38 39 40 PROJECT AIMS/SUMMARY Cloud model used References 41 CORBEL project Creating a platform for Scalable cloud-based provision of [59] 42 harmonised user access to data access and compute across 43 biological and medical infrastructures. 44 technologies, biological samples 45 46 and data. The project has 47 developed the data 48 harmonisation, ethics guidance 49 and user access protocols 50 necessary for transnational access 51 to both pre-clinical and clinical 52 53 research infrastructures and is 54 piloting access to participant level 55 data from clinical trials43. 56 ELIXIR EUROPEAN RESEARCH Hybrid cloud ecosystem : [60] 57 INFRASTRUCTURE with 21 i/ Local, private clouds (e.g. 58 59 members and over 180 research EMBL-EBI Embassy); 60 organisations. ELIXIR is creating a 61 62 16 63 64 65 network of local instances of the ii/ National community clouds 1 European Genome-Phenome (e.g. cPouta, MetaCentrum cloud, 2 Archive that give users controlled de.NBI); 3 and secure access to raw data and iii/ European research and 4 44,45 5 precomputed results . innovation oriented clouds (e.g. 6 EOSC) 7 iv/ Public/commercial compliant 8 clouds (e.g. Google, Azure, AWS). 9 10 11 European IMI-funded highly scalable cloud- Scalable cloud-based platform for [61] 12 Translational based platform for translational translational research and 13 Information research, information and applications development. The 14 knowledge management providing Openstack technology is used to 15 and Knowledge Management open-source applications that can run a private cloud for eTRIKS. 16 securely host heterogeneous data 17 Services types, including multi-omics data, 18 (eTRIKS) 19 preclinical laboratory data and 20 clinical information, including 21 longitudinal data sets. The 22 platform is a robust translational 23 24 research knowledge management 25 system that is able to host other 26 data mining applications and 27 support the development of new 28 analytical tools46. 29 30 European IMI-funded project that has Research analytical service [62, 63] 31 Medical successfully improved access to approaches via HER and cohort 32 Information human health data via providing data platforms. 33 Framework tools and workflows to discover, 34 (EMIF) assess, access and (re)use human 35 health data. The efforts of this IMI 36 37 project are being extended 38 through the EHDEN one. 39 Human Brain The HBP Medical Informatics The HBP Joint Platform plans to [64] 40 Project (HBP) Platform allows researchers adopt in cloud technology and 41 Medical around the world to exploit provide those services through its 42 43 Informatics medical data to create machine- computer centers JSC-Jülich and 44 Platform learning tools that can analyse CSCS-Lugano together with BSC- 45 these data for new insights into Barcelona, CINECA-Bologna and 46 brain-related diseases. The CEA-Saclay. 47 Medical Informatics web-portal47 48 49 provides a software framework Software infrastructure : 50 based on federated and i/ The (base) infrastructure layer 51 distributed computing that allows is accessible through an 52 researchers to mine clinical data “Infrastructure as a Service” 53 stored on hospital and laboratory (IaaS) interface; 54 servers, without moving the data ii/ Tools to enable simulation and 55 56 from the servers where they reside modeling as well as data analytics 57 and without compromising patient workflows for neuroscience that 58 privacy. the HBP operates as a “Platform 59 as a Service” (PaaS); 60 61 62 17 63 64 65 iii/ Several software services for 1 data-driven brain simulations and 2 for virtual neurorobot design and 3 operation offered in form of 4 5 “Software as a Service” (SaaS). 6 HBP operates the following SaaS: 7 a/ Model driven brain 8 simulations; 9 b/ Neurorobotics simulation and 10 development tools across the 11 12 whole workflow the of 13 neurorobotics life cycle. 14 Human Brain The HPB Knowledge Graph (KG) is [65] 15 Project (HBP) an online graph database 16 Knowledge accepting submissions of 17 18 Graph Data anonymized human data, animal 19 Platform data, and models from the brain. 20 All data made discoverable and 21 accessible through KG search have 22 been curated. Data are also made 23 24 available with integrated 25 multilevel HBP Atlases, holding 26 information about the brain in 27 standard reference spaces. 28 29 30 Helix Nebula The project is a pan-European Helix Nebula Science [66] 31 public private partnership initiative Cloud (HNSciCloud): 32 led by EIROforum and leading Hybrid cloud platform linking 33 commercial cloud computing together commercial cloud 34 partners. Since 2011, it has been service providers and publicly 35 piloting the use of cloud funded research organisations’ 36 37 computing to enable complex data in-house IT resources via 38 analyses and large-scale data the GEANT network to provide 39 sharing, with life science-oriented innovative solutions supporting 40 projects ranging from complex data intensive science. 41 genome assembly to assessing These services support the 42 43 somatic variation in the context of connection of the research 44 different types of cancer. infrastructures identified in 45 the ESFRI Roadmap to the 46 nascent European Open Science 47 Cloud (EOSC) intended to create a 48 single digital research space for 49 50 Europe’s 1.8 million researchers. 51 52 European Open The pilot project has recently Cloud based services for open [25] 53 Science Cloud investigated the benefits of data sciences - integration and 54 55 (EOSC) and cloud computer sharing at a consolidation of e-infrastructure 56 pan-European level with life platforms, federation of existing 57 science-oriented projects on pan- European research 58 cancer analyses31 and imaging. infrastructures and scientific 59 This has been pursued across clouds. 60 61 62 18 63 64 65 academic cloud computing 1 environments located in Western 2 and Eastern Europe as well as 3 Canada. Similar initiatives have 4 8 5 been launched in the USA . 6 PCAWG PCAWG - PANCANCER ANALYSIS Hybrid cloud model [67, 68] 7 OF WHOLE GENOMES The data coordinating center lists 8 The study is an international collaborative agreements with 9 collaboration to identify common cloud providers AWS and an 10 11 patterns of mutation in more than academic compute cloud 12 2,800 cancer whole genomes from resource maintained at the 13 the International Cancer Genome cancer collaboratory, by the 14 Consortium. This project is Ontario Institute for Cancer 15 exploring the nature and Research and hosted at the 16 consequences of somatic and Compute Canada facility. 17 18 germline variations in both coding 19 and non-coding regions, with 20 specific emphasis on cis-regulatory 21 sites, non-coding RNAs, and large- 22 scale structural alterations. 23 24 RD-Connect RD-Connect is an integrated Online secured platform [69] 25 platform connecting databases, connecting different types of 26 patient registry data, biobanks and patients related rare disease data 27 clinical bioinformatics for rare enabling genome-phenome 28 disease research. analysis. 29 30 The RD-Connect allow the 31 integration of different data types 32 (e.g. omics, clinical information, 33 patient registries and biobanks). 34 Those integrated data can be 35 accessed and analysed by the 36 37 scientific community to speed-up 38 research, diagnosis and therapy 39 development for patients with rare 40 diseases to contribute for better 41 health. 42 43 44 COMPARE COMPARE, a network of One serve all analytical [70] [71] 45 collaborators of the Global framework and data exchange 46 Microbial Identifier initiative (GMI), platform with various data 47 aims to improve identification and integration for real time analysis 48 49 mitigation of emerging infectious and interpretation of sequence- 50 diseases and foodborne outbreaks. based pathogen. 51 52 53 54 Abbreviations 55 CERN (the European Organisation for nuclear research) 56 EMR (Electronic Health Record) 57 58 EOSC (European Open Science Cloud) 59 EU (European Union) 60 FAIR (Findable, Accessible, Interoperable, Retrievable) 61 62 19 63 64 65 FP7 (The European Union's seventh framework programme for Research, Technological 1 Development and Demonstration) 2 H2020 (The European Union's Horizon Research and Innovation programme) 3 4 HBP (Human Brain project) 5 HRIC (European Health Research and Innovation Cloud) 6 OECD (Organisation of Economic Collaboration and Development) 7 8 IaaS (Infrastructure as a Service) 9 PaaS (Platform as a Service) 10 SaaS (Software as a Service) 11 12 13 14 Declarations 15 16 17 Availability of data and materials 18 “Not applicable” 19 20 21 Funding 22 The workshop participants received funding from the European Union Seventh Programme 23 for Research, Technological Development and Demonstration (FP7) and Horizon Research and 24 Innovation Programme (H2020) and other European Union programmes under the following 25 26 grant agreements: AETIONOMY (Developing an Aetiology-based Taxonomy of Human Disease- 27 Approaches to Develop a New Taxonomy for Neurological Disorders, IMI-no115568), ANTI- 28 SUPERBUG PCP (ANTISUPERBUG Precommercial Procurement, H2020-no688878), B-CAST 29 30 (Breast CAncer Stratification understanding the determinants of risk and prognosis of 31 molecular sub-types, H2020-no633784), BRIDGE Health (Bridging information and data 32 generation for evidence-based health policy and research, H2020-no664691), CASyM 33 (Coordinating Action Systems Medicine – Implementation of Systems Medicine across Europe, 34 35 FP7-n°305033), CENTER-TBI (Collaborative European NeuroTrauma Effectiveness Research in 36 TBI, FP7-no602150), CECM (Centre for New Methods in Computational Diagnostics and 37 Personalized Therapy, H2020-no763734), COLOSSUS (Advancing a Precision Medicine 38 39 Paradigm in Metastatic Colorectal Cancer: Systems based patient stratification solutions, 40 H2020-no754923), COMPARE (COllaborative Management Platform for detection and 41 Analyses of (Re-)emerging and foodborne outbreaks in Europe, H2020-no643476), 42 o 43 CONNECARE (Personalised Connected Care for Complex Chronic Patients, H2020-n 689802), 44 CREATIVE (Collaborative REsearch on ACute Traumatic brain Injury in intensiVe care medicine 45 in Europe, FP7-no602714), CHAFEA grant-no709723), DEFORM (Define the global and financial 46 impact of research misconduct H2020-no710246), ECCTR (European Cornea and Cell 47 48 Transplant Registry, ECRIN-IA (European Clinical Research Infrastructures Network- 49 Integrating Activity, FP7-no284395), EHR4CR (Electronic Health Records Systems for Clinical 50 Research, IMI-no115189) eInfraCentral (European E-infrastructures Services Gateway, H2020- 51 o 52 n 731049), ELIXIR (European Life-science Infrastructure for Biological Information, FP7- 53 n°211601), ELIXIR-EXCELERATE (Fast track ELIXIR implementation and drive early user 54 exploitation across the life sciences, H2020-no676559), eMEN (e-mental health innovation and 55 transnational implementation platform North West Europe, H2020), EMIF (European Medical 56 o 57 Information Framework, IMI-n 115372), ERA PerMed (ERA-net Cofund in Personalised 58 Medicine, H2020-no779282), eTRIKS (Delivering European Translational Information and 59 Knowledge Management Services, IMI-1-no115446), E-COMPARED (European COMPARative 60 61 62 20 63 64 65 Effectiveness Research on online Depression, FP7- no603098), EuroPOND (Data-driven models 1 for progression of neurological diseases, H2020-n°666992), EurValve (Personalised Decision 2 Support for Heart Valve disease, H2020-no689617), HBP SGA1/SGA2 (Human Brain Project 3 4 specific grant agreements, H2020-n°720270/785907), MASTERMIND (Management of Mental 5 Disorders through Advanced Technologies, CIP-no621000), MedBioinformatics (Creating 6 medically-driven integrative bioinformatics applications focused on oncology, CNS disorders 7 8 and their comorbidities, H2020-n°634143), ICT4DEPRESSION (User-friendly ICT tools to 9 enhance self-management and effective treatment of depression in the EU, FP7- n°248778), 10 ImpleMentAll (Towards evidence-based tailored implementation strategies for eHealth, 11 o 12 H2020-n 733025), INSTRUCT-ULTRA (Releasing the full potential of instruct to expand and 13 consolidate infrastructure services for integrated structural life sciences research, H2020- 14 no731005), MeDALL (Mechanisms of the Development of ALLergy, FP7-n°261357) MIDAS 15 (Meaningful Integration of Data, Analytics and Services, H2020-no 727721), MultipleMS 16 17 (Multiple manifestations of genetic and non-genetic factors in Multiple Sclerosis disentangled 18 with a multi-omics approach to accelerate personalised medicine, H2020-no733161), myPEBS 19 (Randomized Comparison Of Risk-Stratified versus Standard Breast Cancer Screening 20 o 21 European Women Aged 40-74, H2020-n 755394), OpenAIRE-Advance (Advancing Open 22 Scholarship, H2020-no777541), OpenMedicine (OpenMedicine, H2020-no643796), 23 PanCareSurFup (PanCare Childhood and Adolescent Cancer Survival Care and Follow-up 24 Studies, FP7-no257505), PIONEER (Prostate Cancer DIagnOsis and TreatmeNt Enhancement 25 26 through the Power of Big Data in EuRope, H2020-IMI-2-n°777492), PREPARE (Platform for 27 European Preparedness Against (Re-)emerging Epidemics, FP7-n°602525), Regions4PerMed 28 (Interregional coordination for a deep and fast uptake of personalised health, H2020- 29 o 30 n 825812), RD-CONNECT (An integrated platform connecting registries, biobanks and clinical 31 bioinformatics for rare disease research, FP7-no305344), Solve-RD (Solving the unsolved Rare 32 Diseases, H2020-no779257), SPIDIA4P (SPIDIA for Personalised Medicine-Standardization of 33 generic Pre-analytical procedures for In-vitro DIAgnostics for Personalised Medicine, H2020- 34 o 35 n 733112), SYSCID (A Systems medicine approach to chronic inflammatory disease, H2020- 36 no733100), SysCLAD (Systems prediction of Chronic Allograft Dysfunction, FP7-n°305457), 37 SYSCOL (Systems Biology of Colorectal Cancer, FP7-no258236), SysMedPD (Systems Medicine 38 39 of Mitochondrial Parkinson’s Disease, H2020-n°668738), U-BIOPRED (Unbiased BIOmarkers 40 for the PREDiction of respiratory disease outcomes, IMI-n°115010), VPH-share (Virtual 41 Physiological Human: Sharing for Healthcare - A Research Environment, FP7-n°269978). 42 43 Neither the European Commission nor any other person acting on behalf of the Commission 44 is responsible for the use, which might be made of the following information. The views 45 expressed in this publication are the sole responsibility of the authors and do not necessarily 46 reflect the views of the European Commission. 47 48 49 Competing interests 50 RB is a founder and holder of shares of MEGENO S.A. and holds shares of ITTM S.A. The other 51 52 authors have declared no competing interests. 53 54 Authors' contributions 55 This publication is inspired by a workshop organized by the Health Directorate at the European 56 57 Commission. The chairs ADM, CGC, EMF, FS, IP, LC, MB, MBu, ML, MVDB, NBl, RB contributed 58 to the concept of the manuscript. GHS produced the draft workshop report, which was used 59 as a basis for the manuscript. The core editing group CA, NB, BDM, JK, NB, MVDB reviewed 60 61 62 21 63 64 65 and finalized the manuscript. All authors contributed to the content of the manuscript and 1 approved its final version. 2 3 4 Acknowledgements 5 We would like to thank the scientific policy officers at the European Commission Health 6 Directorate, who developed the workshop concept, organised it and provided input to this 7 8 publication: Sasa Jenko, Katarina Krepelkova, Christina Kyriakopoulou, Jana Makedonska, 9 Joana Namorado, Elsa Papadopoulou, Jan Van de Loo and Gregor Schaffrath (National Expert 10 in professional training). The workshop was organised by the Health Directorate of the 11 12 Directorate-General for Research and Innovation at the European Commission with 13 contributions of the Innovative Medicines Initiative, the Directorate General for 14 Communications Networks, Content and Technology and the Directorate General for Health 15 and Food Safety. 16 17 18 References 19 20 21 22 1. Birney E, Vamathevan J, Goodhand P: Genomics in healthcare: GA4GH looks to 23 2022. In: Cold Spring Harbor Laboratory. bioRxiv; 2017. 24 2. Langmead B, Nellore A: Cloud computing for genomic data analysis and 25 collaboration. Nature reviews Genetics 2018, 19(5):325. 26 3. Auffray C, Balling R, Barroso I, Bencze L, Benson M, Bergeron J, Bernal-Delgado E, 27 28 Blomberg N, Bock C, Conesa A et al: Making sense of big data in health research: 29 Towards an EU action plan. Genome medicine 2016, 8(1):71. 30 4. Mazein A, Ostaszewski M, Kuperstein I, Watterson S, Le Novere N, Lefaudeux D, De 31 Meulder B, Pellet J, Balaur I, Saqi M et al: Systems medicine disease maps: 32 33 community-driven comprehensive representation of disease mechanisms. NPJ 34 systems biology and applications 2018, 4:21. 35 5. Ostaszewski M, Gebel S, Kuperstein I, Mazein A, Zinovyev A, Dogrusoz U, Hasenauer 36 J, Fleming RMT, Le Novere N, Gawron P et al: Community-driven roadmap for 37 integrated disease maps. Briefings in bioinformatics 2019, 20(2):659-670. 38 39 6. Phillips M, Molnar-Gabor F, Korbel JO, Thorogood A, Joly Y, Chalmers D, Townend 40 D, Knoppers BM, Consortium P: Of Clouds and Genomic Data Protection. Nature 41 2019. 42 7. PCAWG C: Pan-cancer analysis of whole genome. Nature 2019. 43 44 8. Lochmuller H, Badowska DM, Thompson R, Knoers NV, Aartsma-Rus A, Gut I, Wood 45 L, Harmuth T, Durudas A, Graessner H et al: RD-Connect, NeurOmics and 46 EURenOmics: collaborative European initiative for rare diseases. European 47 journal of human genetics : EJHG 2018, 26(6):778-785. 48 9. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, 49 50 Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE et al: The FAIR Guiding 51 Principles for scientific data management and stewardship. Scientific data 2016, 52 3:160018. 53 10. Lamy P, Brudermüller M, Ferguson M, Friis L, Garmendia C, Gray I, Gulliksen J, 54 55 Kulmala H, Maher N, Plentz Fagundes M et al: LAB - FAB - APP Investing in the 56 European future we want. In. Brussels: Independent High Level Group on maximising 57 the impact of EU research & Innovation Programmes; 2017. 58 11. Hauser H, Bergman N, Bruncko M, Cosgrave P, Dwyer G, Helder M, Hinrikus T, Hoerr 59 I, Karia B, Kolar J et al: Funding - Awareness - Scale - Talent (FAST) Europen is 60 61 62 22 63 64 65 back: Accelerating breakthrough innovation. In. Luxemburg: European 1 Commission; 2018. 2 12. Implementation Roadmap for the European Open Science Cloud 3 [https://ec.europa.eu/research/openscience/pdf/swd_2018_83_f1_staff_working_paper 4 5 _en.pdf#view=fit&pagemode=none] 6 13. Mons B, Neylon C, Velterop J, Dumontier M, da Silva Santos LOB, Wilkinson MD: 7 Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the 8 European Open Science Cloud. Information Services & Use 2017, 37:49-56. 9 10 14. FAIRsharing homepage [https://fairsharing.org/] 11 15. REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF 12 THE COUNCIL [https://eur-lex.europa.eu/legal- 13 content/EN/TXT/?uri=celex%3A32016R0679] 14 16. Bouzaglo D, Chasida I, Ezra Tsur E: Distributed retrieval engine for the development 15 16 of cloud-deployed biological databases. BioData mining 2018, 11:26. 17 17. Corrie BD, Marthandan N, Zimonja B, Jaglale J, Zhou Y, Barr E, Knoetze N, Breden 18 FMW, Christley S, Scott JK et al: iReceptor: A platform for querying and analyzing 19 antibody/B-cell and T-cell receptor repertoire data across federated repositories. 20 Immunological reviews 2018, 284(1):24-41. 21 22 18. Sreenivasaiah PK, Kim DH: Current trends and new challenges of databases and 23 web applications for systems driven biological research. Frontiers in physiology 24 2010, 1:147. 25 19. Wade TD: Traits and types of health data repositories. Health information science 26 27 and systems 2014, 2:4. 28 20. Estiri H, Klann JG, Weiler SR, Alema-Mensah E, Joseph Applegate R, Lozinski G, 29 Patibandla N, Wei K, Adams WG, Natter MD et al: A federated EHR network data 30 completeness tracking system. J Am Med Inform Assoc 2019, 26(7):637-645. 31 21. Shenai S, Aramudhan M: Cloud computing framework to securely share health & 32 33 medical records among federation of healthcare information systems. Biomedical 34 research 2018; Special Issue: S133-S136. 35 22. Oderkirk J: Readiness of electronic health record systems to contribute to national 36 health information and research. In. Paris: OECD Health Working Papers; 217: 80. 37 38 23. eHealth: Digital health and care 39 [https://ec.europa.eu/health/ehealth/projects/nationallaws_electronichealthrecords_en] 40 24. Overview of the national laws on electronic health records in the EU Member 41 States and their interaction with the provision of cross-border eHealth services - 42 Final report and recommendations 43 44 [https://ec.europa.eu/health/sites/health/files/ehealth/docs/laws_report_recommendatio 45 ns_en.pdf] 46 25. EOSC homepage [https://www.eoscpilot.eu/] 47 26. Molnar-Gabor F, Lueck R, Yakneen S, Korbel JO: Computing patient data in the 48 49 cloud: practical and legal considerations for genetics and genomics research in 50 Europe and internationally. Genome medicine 2017, 9(1):58. 51 27. EU countries will cooperate in linking genimc databases across borders 52 [https://ec.europa.eu/digital-single-market/en/news/eu-countries-will-cooperate- 53 linking-genomic-databases-across-borders] 54 55 28. Declaration of cooperation towards access to at least 1 million sequenced genomes 56 in the European Union by 2022 57 [https://www.euapm.eu/pdf/EAPM_Declaration_Genome.pdf] 58 59 60 61 62 23 63 64 65 29. Saunders G, Baudis M, Becker R, Beltran S, Béroud C, Birney E, Brooksbank C, 1 Brunak S, Van den Bulcke M, Drysdale R et al: Leveraging European infrastructures 2 to access 1 million human genomes by 2022. Nature Reviews Genetics 2019. 3 30. Drosatos G, Kaldoudi E: Blockchain Applications in the Biomedical Domain: A 4 5 Scoping Review. Computational and structural biotechnology journal 2019, 17:229- 6 240. 7 31. Esposito C, De Santis A, Tortora G, Chang H, Choo K-KR: Blockchain: A Panacea 8 for Healthcare Cloud-Based Data Security and Privacy? IEEE Cloud Computing 9 10 2018, 5. 11 32. Miller JB: Big data and biomedical informatics: Preparing for the modernization 12 of clinical neuropsychology. The Clinical neuropsychologist 2019, 33(2):287-304. 13 33. The eTRIKS code of practice on secondary use of medical research data 14 [https://www.etriks.org/code-of-practice/] 15 16 34. De Hert P, Papakonstantinou V: Three Scenarios for International Governance of 17 Data Privacy: Towards an International Data Privacy Organization, Preferably a 18 UN Agency? I/S: A Journal of Law and Policy for the Information Society 2013, 19 9(2):271-324. 20 35. EU picks team of 11 to run giant European Open Science Cloud 21 22 [https://sciencebusiness.net/science-cloud/news/eu-picks-team-11-run-giant-european- 23 open-science-cloud] 24 36. Human brain project ethics and society sub-projects 25 [https://www.humanbrainproject.eu/en/social-ethical-reflective/] 26 27 37. Salles A, Bjaalie JG, Evers K, Farisco M, Fothergill BT, Guerrero M, Maslen H, Muller 28 J, Prescott T, Stahl BC et al: The Human Brain Project: Responsible Brain Research 29 for the Benefit of Society. Neuron 2019, 101(3):380-384. 30 38. 8 Benefits and Risks of Cloud Computing in Healthcare 31 [https://solutionsreview.com/cloud-platforms/8-benefits-and-risks-of-cloud- 32 33 computing-in-healthcare/] 34 39. Ali O, Shrestha A, Soar J, Wamba F: Cloud computing-enabled healthcare 35 opportunities, issues, and applications: A systematic review. International Journal 36 of Information Management 2018, 43:146-158. 37 38 40. Lian JW: Establishing a Cloud Computing Success Model for Hospitals in Taiwan. 39 Inquiry : a journal of medical care organization, provision and financing 2017, 40 54:46958016685836. 41 41. de Bruin B, Floridi L: The Ethics of Cloud Computing. Science and engineering ethics 42 2017, 23(1):21-39. 43 44 42. Council CSC: Impact of Cloud Computing on Healthcare. In.; 2017. 45 43. Sultan N: Making use of cloud computing for healthcare provision: Opportunities 46 and challenges. International Journal of Information Management 2014, 34(2):177- 47 184. 48 49 44. Sobeslav V, Maresova P, Krejcar O, Franca TC, Kuca K: Use of cloud computing in 50 biomedicine. Journal of biomolecular structure & dynamics 2016, 34(12):2688-2697. 51 45. Rodrigues JJPC, Sendra Compte S, de la Torra Diez I: Cloud Computing on e-Health. 52 In: e-Health Systems, Theory, Advances and Technical Applications. Edited by 53 IP-; 2016: 191-207. 54 55 46. CERN Openlab homepage [https://openlab.cern/] 56 47. Worldwide LHC Computing Grid homepage [http://wlcg.web.cern.ch/] 57 48. Chen J, Qian F, Yan W, Shen B: Translational biomedical informatics in the cloud: 58 present and future. BioMed research international 2013, 2013:658925. 59 60 61 62 24 63 64 65 49. Khan N, Yaqoob I, Hashem IA, Inayat Z, Ali WK, Alam M, Shiraz M, Gani A: Big 1 data: survey, technologies, opportunities, and challenges. 2 TheScientificWorldJournal 2014, 2014:712826. 3 50. Kuo AM: Opportunities and challenges of cloud computing to improve health care 4 5 services. Journal of medical Internet research 2011, 13(3):e67. 6 51. Navale V, Bourne PE: Cloud computing applications for biomedical science: A 7 perspective. PLoS computational biology 2018, 14(6):e1006144. 8 52. LAB, FAB, APP: Investing in the European future we want. In. Luxemburg: 9 10 Publication office of the European Union; 2017. 11 53. Mazzucato M: Mission-Oriented Research & Innovation in the European Union - 12 a problem-solving approach to fuel innovation-led growth. In. Luxemburg: 13 Publication office of the European Union; 2018. 14 54. EIT in Horizon Europe (2021-2027) - complementarities and synergies with the 15 16 EIC 17 [https://eit.europa.eu/sites/default/files/eit_position_paper_horizon_europe_eit_eic_0.p 18 df] 19 55. H2020 Scientiic Panel for Health: Building the future of health research - Proposal 20 for a European Council for Health Research In.; 2018. 21 22 56. European Joint Programme of Rare Diseases homepage 23 [https://www.ejprarediseases.org/index.php/about/] 24 57. European Research Council homepage [https://erc.europa.eu] 25 58. European Structural and Investment Funds [https://ec.europa.eu/info/funding- 26 27 tenders/funding-opportunities/funding-programmes/overview-funding- 28 programmes/european-structural-and-investment-funds_en] 29 59. CORBEL project homepage [http://www.corbel-project.eu] 30 60. ELIXIR infrastructure homepage [https://www.elixir-europe.org/] 31 61. The eTRIKS project homepage [https://www.etriks.org/] 32 33 62. EMIF project homepage [http://www.emif.eu/] 34 63. EHDEN project homepage [https://www.ehden.eu/] 35 64. Human Brain Project homepage [https://www.humanbrainproject.eu/en/medicine/] 36 65. Human Brain Project Knowledge Graph 37 38 [https://www.humanbrainproject.eu/en/explore-the-brain/] 39 66. HELIX nebula homepage [http://www.helix-nebula.eu/] 40 67. [https://dcc.icgc.org/icgc-in-the-cloud/collaboratory] 41 68. PCAWG project homepage [https://dcc.icgc.org/pcawg] 42 69. RD Connect homepage [https://rd-connect.eu/] 43 44 70. COMPARE project homepage [http://www.compare-europe.eu] 45 71. [https://www.globalmicrobialidentifier.org] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 25 63 64 65 Personal Cover Click here to access/download;Personal Cover;HRIC- manuscript-revised-12122019_changes.docx

Towards a European Health Research and Innovation Cloud (HRIC) Formatted: Tab stops: 2.85", Left

Albeyatti A.1, Armitage W.J.2, Augello L.3, Auffray C.4*, Balling R.5, Benhabiles N.6*, Bertolini G.7, Bjaalie J.G.8, Black M.9, Blomberg N.10*, Bogaert P.11, Bubak M.12, Claerhout B.13, Clarke L.14, De Meulder B.15, D'Errico G.16, Di Meglio A.17, Forgo N.18, Gans-Combe C.19, Gray A.E.20, Gut I.21, Gyllenberg A.22, Hemmrich-Stanisak G.23, Hjorth L.24, Ioannidis Y.25, Jarmalaite S.26, Kel A.27, Kherif F.28., Korbel J.O. 29*, Larue C.30, Laszlo M.31, Maas A.32, Magalhaes L.33, Manneh-Vangramberen I.34, Aarestrup F.M.35, Morley-Fletcher E. 36, Ohmann C.37, Oksvold P.38, Oxtoby N.P.39, Perseil I.40, Pezoulas V.41, Riess O.42, Riper H. 43, Roca J.44, Rosenstiel P.45, Sabatier P.46, Sanz F.47, Tayeb M.48, Thomassen G.49, Van Bussel J.50, Van den Bulcke M.51, Van Oyen H.52

Affiliations: 1Medicalchain, National Health Service, London United Kingdom, ‘[email protected]’ 2Bristol Medical School, Translation Health Sciences, Bristol, United Kingdom ‘[email protected]’ 3Regional Agency for Innovation & Procurement (ARIA), Welfare Services Division, Milano, Lombardy, Italy [email protected] 4European Institute for Systems Biology and Medicine (EISBM), Vourles, France '[email protected]' 5Luxembourg Centre for Systems Biomedicine, Campus Belval, University of Luxembourg '[email protected]' 6CEA, French Atomic Energy and alternative energy Commission, Direction de la Recherche Fondamentale, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France '[email protected]' 7 Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Bergamo, Italy '[email protected]' 8Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway '[email protected]' 9Ulster University, Belfast, United Kingdom '[email protected]' 10ELIXIR, Welcome Genome Campus, Hinxton, CB10 1SD, United Kingdom 'niklas.blomberg@elixir- europe.org' 11Sciensano, Brussels, Belgium and Tilburg University, Tilburg, The Netherlands '[email protected]' 12Akademia Gornizco Hutnizca University of Science and Technology, Department of Computer Science and Academic Computing Center Cyfronet, Krakow, Poland '[email protected]' 13Ghent University, Ghent, Belgium ‘[email protected]'' 14European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK '[email protected]' 15European Institute for Systems Biology and Medicine (EISBM), Vourles, France '[email protected]' 16Fondazione Toscana Life Sciences '[email protected]' 17CERN, European Organization for Nuclear Research, Meyrin, Switzerland '[email protected]' 18University of Vienna, Vienna, Austria '[email protected]' 19INSEEC School of Business & Economics, Paris, France '[email protected]' 20PwC, Oslo, Norway '[email protected]' 21Center for Genomic Regulations, Barcelona, Spain '[email protected]' 22Neuroimmunology Unit, The Karolinska Neuroimmunology & Multiple Sclerosis Centre, Department of Clinical Neuroscience, Karolinska Institute, Stockholm, Sweden '[email protected]' 23Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany ‘[email protected] 24Lund University, Skåne University Hospital, Department of Clinical Sciences, Pediatrics, Lund, Sweden '[email protected]' 25Athena Research & Innovation Center and University of Athens, Greece '[email protected]' 26National Cancer Institute, Vilnius, Lithuania '[email protected]' 27geneXplain GmbH, Wolfenbüttel, Germany '[email protected]' 28 Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland '[email protected]' Formatted: French (France) 29European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany '[email protected]' 30Integrated Biobank of Luxembourg, Luxembourg, '[email protected]' 31‘[email protected]’ 32Antwerp University Hospital and University of Antwerp, Belgium '[email protected]' 33Clinerion Ltd, Basel, Switzerland '[email protected]' 34European Cancer Patient Coalition, Belgium '[email protected]'

1 35Technical University of Denmark, Kongens Lyngby, Denmark '[email protected]' 36Lynkeus and Public Policy Consultant, Rome, Italy '[email protected]' 37European Clinical Research Infrastructure Network, Duesseldorf, Germany '[email protected] duesseldorf.de' 38Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden '[email protected]' 39Centre for Medical Image Computing, Department of Computer Science, University College London, United Kingdom '[email protected]' 40Institut National de la Santé et de la Recherche Médicale, Information Technology Department, Paris, France '[email protected]' 41Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece '[email protected]' 42Institute of Medical Genetics and Applied Genomics, Rare Disease Center, Tuebingen, Germany '[email protected]' 43Vrije Universiteit, Dept. of Behavioural and Movement Sciences, Section Clinical, Neuro and Developmental Psychology, Amsterdam, The Netherlands'[email protected]' 44Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain '[email protected]' 45Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany '[email protected]' 46French National Centre for Scientific Research, Grenoble, France '[email protected]' 47Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain '[email protected]' 48Medicalchain, London, United Kingdom '[email protected]' 49University of Oslo, Oslo, Norway '[email protected]' 50Scientific Institute of Public Health, Brussels, Belgium '[email protected]' 51Scientific Institute of Public Health, Brussels, Belgium '[email protected]' 52Ghent University, Ghent, Belgium and Sciensano, Brussels, Belgium '[email protected]' * corresponding authors

Abstract

The European Union (EU) initiative on the Digital Transformation of Health and Care (Digicare) aims at providing conditions for building a secure, flexible and decentralised digital health infrastructure. Creating a European Health Research and Innovation Cloud (HRIC) within this environment should enable data sharing and analysis for health research across the EU in compliance with data protection legislation while preserving the full trust of the participants. Such a HRIC should learn from and build on existing data infrastructures, integrate best practices, and focus on the concrete needs of the community in terms of technologies, governance, management, regulation and ethics requirements. Here we describe the vision and expected benefits of digital data sharing in health research activities and a roadmap fostering the opportunities while answering the challenges of implementing a HRIC. For this, we put forward five specific recommendations and action points to ensure that a European HRIC: 1) is built on established standards and guidelines, providing cloud technologies through an open and decentralised infrastructure; 2) is developed and certified to the highest standards of interoperability and data security that can be trusted by all stakeholders; 3) is supported by a robust ethical and legal framework compliant with the EU General Data Protection Regulation (GDPR); 4) establishes a proper environment for the training of new generations of data and medical scientists; 5) stimulates research and innovation in transnational collaborations through public and private initiatives and partnerships funded by the EU through Horizon 2020 and Horizon Europe.

2

Background

Genomics has brought life sciences into the realm of data sciences - large scale DNA and RNA sequencing is now routine in life-science and biomedical research with an estimate of up to 60 million human genomes being available in the coming years [1, 2]. Recent innovations in Field Code Changed medical research and healthcare, such as high-throughput genome sequencing, Field Code Changed transcriptomics, proteomics, metabolomics, single-cell omics techniques, high-resolution imaging, electronic medical records (EMRs), big-data analytics and a plethora of internet- connected health devices fundamentally change the infrastructure requirements for health research. Translating these new data together with clinical information into scientific insights and actionable outcomes for improving clinical care is a major challenge. As life-science and health research datasets rapidly grow larger, with an ever-increasing number of study participants required to detect meaningful but weak signals among that may be blurred by a myriad of confounding biological, experimental or environmental factors, the computational resources required to process and analyse this big data increasingly outgrows the capabilities of even large research institutes. The various cCloud technologies and services listed in Table 1 are, based on shared commercial and private computer and storage resources that can be provided on demand to users from a large number of different institutions conducting or participating in joint projects (Table 1). They, have emerged as powerful solutions to the challenges of collaborating in research on genomic, biomedical and health data. Biomedical Commented [CA1]: RA1: ‘The title of this paper and health research has yet to fully enter the bBig dData and cCloud computing era. The HRIC, invokes "Cloud" but other than using the term "federated cloud" in the article nothing much is as described in this manuscript, would help facilitate this transition, providing access to larger discussed how cloud computing will fit into the HRIC?’ datasets, cutting-edge tools and knowledge, as envisioned in [3]. For example, the HRIC should Please consider briefly clarifying this in your ease the incorporation of domain expert knowledge into systems disease maps in a format Background section. that can be both understood by all stakeholders (patients and clinicians, scientists and drug Commented [CA2]: RA2: ‘A prefix of "innovation" is developers), and processed by a high-performance computers, thus supporting the used with Cloud, no clear discussion is provided on development of innovative medicines and diagnostics [4, 5]. Cloud technologies (through e.g. what kind of innovations are conceived by using Cloud Computing to tackle biomedical research challenges.’ Hadoop applications) makes it possible to collaborate, access and reuse data also in situations when privacy concerns or regulation prohibits remote users from downloading data – an Please consider briefly clarifying this – we assume that important benefit in Europe where national regulations can differ significantly. Clouds allow the proposed HRIC aims to facilitate research and innovation, but consider briefly clarifying how and what bringing algorithms to the data and as such can enable data sharing and joint processing kind of innovation is envisioned. without generating unnecessary copies of the data, which comes with potential benefits for data protectionCloud technologies (through e.g. Hadoop applications) makes it possible to collaborate, access and reuse data also in situations when privacy concerns or regulation Field Code Changed prohibits remote users from downloading data – an important benefit in Europe where Field Code Changed national regulations can differ significantly. Clouds allow bringing algorithms to the data and Field Code Changed as such can enable data sharing and joint processing without generating unnecessary copies of the data, which comes with potential benefits for data protection [6, 7].. Clouds, Field Code Changed additionally, enable performing computational analyses at a scale that individual institutions Field Code Changed would struggle to manage Clouds, additionally, enable performing computational analyses at a scale where individual institutions would struggle [7].. Consequently, in the last few years Field Code Changed the large international cancer and other genomics consortia have created specialized genomics and biomedical cloud environments, each supporting individual projects [2]. These Field Code Changed projects have made important advances in connecting health research data across disciplines, organisations and national boundaries. For instance, in research on rare diseases,

3 international collaborations that integrate genomic, phenotypic, and clinical data have introduced new paradigms in diagnosis and care [8]. However, a project-based, fragmented Field Code Changed landscape will not enable access and construction of large data cohorts that are required to address novel or broader biomedical questions that were not anticipated in individual projects when collecting inform consent from the participants, nor will it provide adequate data governance and containable cost models. Scaling and sustainably managing such solutions to support all European life scientists thus requires a coordinated action from science policy makers, funders and other actors in this complex ecosystem. Connecting Europe’s health data to advance the understanding of life and disease requires that research data and analysis tools, standards and computational services are made FAIR – i.e. findable, accessible, interoperable, and reusable for researchers across scientific disciplines and national boundaries [9]. Truly enabling personalised and digital Field Code Changed medicine across Europe and beyond will require a connected digital infrastructure for Europe’s health data that supports systematic openness and integration of research data with real-world datasets (e.g. environmental monitoring data). generated inside all healthcare systems, government agencies, foundations and private organisations that will adopt it.(e.g. environmental monitoring data). On the 13th of March 2018, the Health directorate of the Directorate-General for Research and Innovation of the European Commission of the EU organised a workshop to explore the possibility and challenges involved in establishing a cloud for health research and innovation, accessible by researchers and health professionals throughout Europe, in line with recommendations for a European Innovation Council and the Horizon Europe 2021-2027 framework programme [10, 11]. The cloud computing environment proposed in this Field Code Changed manuscript builds upon the European Open Science Cloud (EOSC) initiative developed in the Field Code Changed last few years by the European Commission [12], with a focus in the life science and medicine Field Code Changed field. The EOSC aims at developing a trusted, open environment for the scientific community for storing, sharing and re-using scientific data and results. Overall, the authors feel that the cloud described in this manuscript, providing the biomedical and health research community Commented [BDM3]: RA4: ‘Which basic services will with the technical infrastructure and services listed in Table 1 to support the development of be provided by the HRIC? What types of applications innovative diagnostics methods and medical treatments, should become an integral part of will be used for tackling data integration?’ the EOSC. The workshop gathered a broad range of experts from multiple biomedical research ‘What types of innovation can one anticipate by disciplines, health care, informatics, ethics and legislation, including representatives of more development of HRIC?’ than 45 FP7 and Horizon 2020 (H2020) EU-funded collaborative projects. The participants explored requirements and developed a set of recommendations for a European “Health Please briefly answer those points. Research and Innovation Cloud” (HRIC) to connect researchers and health data sources in Europe [13] such that clinical data, software, computational resources, methods, clinical Commented [BDM4R3]: Niklas, Alberto protocols, and publications, can be moreare widely and securely accessed and reused Field Code Changed following the FAIR principles [9] than is currently possible with existing European research Field Code Changed infrastrucures such as ELIXIR that form a network of heterogenous national nodes. For example, tThe HRIC infrastructure would benefit from the aforementioned advantages of cloud computing for the archiving and dissemination of health data via compute clouds. This paper summarises the main conclusions from the workshop and highlights five recommendations and action points to the EU and national stakeholders (Table 2). The recommendations are key issues that need to be addressed in order to link biological, clinical, environmental and lifestyle information (from single individuals to large cohorts) to the health and wellbeing status of patients and citizens over time, while making this wealth of data and information available for European health research and innovation in clinical care.

4

The HRIC should be built on established standards and guidelines, to foster European-wide medical research

Rationale Sharing of data, information and knowledge represents the most important functionality in the context of a HRIC. High-level standardisation, common exchange mechanisms, interfaces and protocols, and semantic interoperability form the foundation for widespread adoption of the FAIR principles [9] in health research. Data shared collaboratively in such health-related Field Code Changed cloud projects are now largely standardised for the processing of genomic DNA read and genomic variant calling files. By comparison, the sharing of highly sensitive clinical and health data has thus far been much less developed hence representing a key future focus area, and numerous challenges remain with regard to sharing these data in a meaningful way.

Existing standards and guidelines Many individual projects, in Europe and globally, have demonstrated the opportunities and the added value presented by connecting and exchanging data across countries via standardised protocols. Table 3 lists recent European projects that have developed towards clinical and health data exchange via cloud-based solutions. All those projects have developed Commented [CA5]: and implemented worthy ideas that should be included in the HRIC. However, we envision the RA5: ‘On page 4, line 46-47, reference to Table 3; what is it that these platforms do not provide? Why is there a HRIC as a disease-agnostic environment, and on a larger scale than the platforms mentioned need for HRIC? What gaps will HRIC address?’ in Table 3. Moreover, the HRIC should not be tied to a single project or consortium, but should Please consider briefly clarifying this point. rather be under the governance of an independent body. International exchange of health research data holds tremendous potential in disease research by facilitating better investigation of disease causality and linking of genotypes and phenotypes, such as demonstrated in the Pan-Cancer Analysis of Whole Genomes (PCAWG) project, for example. An important aspect of the cloud-based data governance is that it allows sharing of data outside a consortium through a data request mechanism and a governance infrastructure that track participant consent, and data access. The cloud-based audit trail capabilities of the cloud-based research analyses that can be implemented at both data and infrastructure levels is a direct benefit to the data controllers. A standardised data model (or data access model) and/or standardised metadata models facilitate consolidation of different data sets and significantly increase the findability, the semantic interoperability and, by consequence, the reusability of data, and thus its ‘FAIRness’ [9, 14]. Field Code Changed Workshop participants agreed that a first minimalistic and yet effective approach to data Field Code Changed exchange should consist of a small number of initial online repositories containing references (e.g. links to render data sets findable) along with metadata (e.g. type and scale of content, specifications describing in what systems the data set may be stored and processed), and indications on how to gain access (e.g. requirements and point of contact). This could be designed as a metadata repository containing the metadata of data objects and information on how to gain access to them. The data objects as such may, or indeed should, be stored elsewhere. Beyond their metadata, however, data sets are bound to greatly differ, as research projects vary largely in scope. Not only would it be cumbersome to record a large number of parameters irrelevant to the specific question addressed, but it would also be problematic from an ethical viewpoint, considering patient personal data protection aspects [15]. It Field Code Changed

5 therefore seems more promising to drive standardisation within research communities while looking out for opportunities for overall standardisation. Thus, the workshop envisioned the HRIC as a distributed collection of data repositories, people Commented [CA6]: RA6: ’On page 5, lines 18-22 and services, which together make up a framework to share and operate as a federated data indicate that HRIC is envisioned as a "federated commons". But there is no explanation what biomedical commons with reproducible software [2, 16-19], standards and expertise based on joint research challenges, a federated data-commons is policies and guidelines to conduct health research as performed successfully in previous going to address that are not currently being met by existing infrastructures? There is no reference citation initiatives [2, 16-19]. . The need for federation is also highlighted in the proposed EU action for Data Commons, and advantages of a federated plan for ‘Making sense of big data in health research’ [3]. Creating such a HRIC opens new Data Commons over non-federated Data Commons.’ frontiers for research and healthcare via the opportunities for strong international Please consider briefly clarifying/addressing these collaborations. points. Beyond Europe the HRIC should collaborate internationally to drive the development and Answer: widespread adoption of global standards and connectivity. OngoExisting initiatives exist in the We added four references related to data commons. USA, Canada, Africa and the United Kingdomoutside Europe [20, 21] andthat are aiming at developing global standards for the secure data exchange and securityof e.g. health and medical records within health information federations while tracking completeness of the Field Code Changed supporting data [20, 21], as well as developing guidelines for data analytics and standardised Field Code Changed workflows. T, thaeyt should be considered for the basis of the HRIC as this will. This provides the scientific community with means of reproducibility, version control and documentation, Field Code Changed which will be an important vehicle to drive increased standardisation and connectivity. Commented [BDM7]: RA7: ‘What are those initiatives? There is no context and no references provided for those initiatives’. A European HRIC should be developed and certified to the highest standards of interoperability and data security Please briefly clarify what those initiatives are. Commented [BDM8R7]: We added two references Rationale related to the initiatives mentioned. The workshop participants enthusiastically endorsed the vision of the HRIC as a federated Field Code Changed environment. Through work on the standardisation, harmonisation and integration of Field Code Changed genomic data with other health-relevant information to optimise hypothesis-driven analyses, the major blockers for cross-border collaborations in translational research on disease prevention, treatment and management will be addressed through a federated HRIC infrastructure in which data sources remain at their location of origin and are made accessible to users through a metadata repository. Moreover, data security has to be integral to the development of the HRIC, using modern cryptology and access control techniques to ensure the protection of the patients’ data contained therein.

Standards in interoperability and data security European health systems have different ways of managing and storing health data, making the exchange of clinical data between the EU member states complex. The challenges are well illustrated by the use of eHealth records and their use as secondary research material. In a recent OECD report [22] ten countries reported comprehensive record sharing within one Field Code Changed country-wide system designed to support each patient having only one EHR (Supplementary Table.). These countries are Estonia, Finland, France, Greece, Ireland, Latvia, Luxembourg, Poland, Slovakia and the United Kingdom (England, Northern Ireland, Scotland and Wales). In these countries, plans call for patient records to be shared among physician offices and between physicians and hospitals regarding patient treatment, current medications, and laboratory tests and medical images. Some have already implemented part or all these functionalities, while others are progressing toward it. In other countries key aspects of record

6 sharing are managed at sub-national level only, such as within provinces, states, regions or networks of health care organisations (for example, Austria, Germany, Italy, Netherlands, Sweden, and Spain; Supplementary Table). Among them, all have implemented or are planning the implementation of a national information exchange that enables key elements to be shared country-wide. Based on the recent reports of the European Commission [23], Field Code Changed Belgium, Malta, Portugal, Romania and Slovenia are now developing national EHR systems, leading to a total of 16 EU member states providing such services. Within the framework of the Joint Action on Rare Cancers, an EU initiative that brings together European research centres, policy makers and other stakeholders with the aim to set the agenda at national level, an analysis was made on the status of eHealth medical records in the EU member states. This work builds on the OECD study and complements this with information provided by the European Commission on the national laws on EHRs in the EU member states [24]. Thus all EU countries are investing in the development of clinical EHRs, Field Code Changed but only some countries are moving forward the possibility of data extraction for research, the provision of statistics and the enablement of other uses that serve the public interest (P. Bogaert, personal communication). Countries that develop EHR systems that combine or virtually link data together to capture patient health care histories have the potential to be used also for long-term follow-up of cancer patients. Figure 1 shows how data from various sources can be integrated to provide a full picture of patients’ health status over time and carry out research for patterns and anomalies in large populations using the specific combinations of data and analysis resources relevant for each research project, while ensuring compliance with security and data protection regulations. The H2020-funded project European Open Science Cloud (EOSC)-Life [25] is developing Field Code Changed policies, specifications and tools for the management of data for biological and medical research including aspects of eHealth data. The use of common metadata standards, developed in EOSC-Life, as a foundation for remote data discovery and access was emphasised by the workshop participants as a key enabler for the HRIC. For instance, practical and legal considerations for cloud computing of patient data, which include the responsible use of federated and hybrid clouds set up between academic and industrial partners have been put forward by early EOSC pilot projects [26]. Field Code Changed The workshop participants emphasised that sustainability aspects are critical and must be considered from the start. To ensure that a HRIC could respond properly to emerging needs, innovation and technology changes, a distributed federated storage solution offering access to FAIR data and services should be built according to modularity principles. In particular, due to the long-term nature of a HRIC, due consideration should be given to deploying generic and modular computational methods and/or data storage management systems, while the information and communications technology (ICT) infrastructure should be flexible, portable and expandable. The million European genomes initiative [27] is a case in point: eighteen member states have Field Code Changed already signed the Genomics Declaration of Cooperation [28] to enable cross-border access Field Code Changed to genomic databases and other health information. This federation of national initiatives [28] Field Code Changed will provide secure access to such data resources in the member states to enable the discovery of personalised therapies and diagnostics for the benefit of patients. The initiative involves aligning strategies of ongoing national genomic sequencing campaigns with complementary de novo genome sequencing to obtain a total cohort of 1 million Europeans, accessible in a transnational framework, by 2022 [29]. The HRIC would form a basis for such large-scale, Field Code Changed permanent collaborations.

7 The workshop participants recognise that ensuring a maximal data security is paramount to build and maintain trust with the European citizens. To address this issue, we suggest using modern cryptology such as blockchain to ensure data security by design, and holding regular data security assessment (e.g. through hackathons and/or commercial security audits). As Commented [CA9]: RA8: ‘On page 7, lines 9-10, shown in recent literature, the use of blockchain in biomedical research is still in its infancy There is no citation provided to support Block Chain can be successfully deployed for large scale Cloud [30]. However, blockchain or other advanced cryptology tools that can be used to protect data based biomedical data management and access?’ in a cloud environment could prove useful for the secure and trustworthy implementation of the HRIC [31]. Field Code Changed

Field Code Changed The HRIC must be supported by a robust ethics and legal framework compliant with the GDPR

Rationale Compliance with GDPR and other data protection laws, as well as enforcing an ethical usage of data is paramount to gain the general public’s support and trust in the HRIC.

Existing ethics and legal framework Unblocking the legal and administrative barriers for sharing human research data across geographical and organisational boundaries will, if the trust of research participants is preserved, pave the way for continent-scale cohorts in life science research. This will represent a significant innovation as sharing and joint analysis of sensitive data has until now been severely limited because of the different restrictions inherent to the different classes of sensitive data. By using a federated database model, with a metadata repository within the HRIC cloud encrypted environment, data security is maintained while innovative data analyses can be performed by bringing the algorithms to the data rather than centralizing the data [32]. Field Code Changed Federation, rather than full integration of all available resources, poses an important challenge for implementation and deployment of an effective HRIC. The set-up and functioning of a HRIC requires a robust foundation of legal agreements and ethics rules and procedures, as well as security and data protection compliance protocols. Importantly these elements must be introduced during the conception and design phase as part of its governance. This is essential in order to allow the different actorsHRIC actors providing access to their data sources, managing them within the cloud, and accessing these resources to incorporate policy requirements into the design and manage the complexity involved in implementing simple and intuitive user interfaces and project portals. This may be challenging considering the heterogeneity of health systems and health market accesses across Europe and will need the sharing of a common agreed vision across EU member states. Ethical, societal and privacy considerations for (re)using health related data have been outlined in the Code of Practice on Secondary Use of Medical Research Data developed in the European Translational Information and Knowledge Management Services (eTRIKS) project funded by the Innovative Medicines Initiative (IMI) [33]. In addition to clear and explicit Field Code Changed consent, explicit dissent may need to be considered for the use/re-use of data. Ultimately, each citizen and patient must be able to access her/his own data and know when and where it has been used and for what purpose. In addition, the difficult question of the business model of using those data should be discussed at various levels from ethical, societal and economical standpoints taking into account potential future developments of products and services using personal medical data. Furthermore, the goal of providing the citizens with personalised

8 services requires technical advancements in the collection and analysis of data (e.g. in data analytics and machine learning). For this type of usage, simple consent mechanisms might not be sufficient. For example, how should a clear data-collection purpose statement be defined, if data is collected for multiple usage scenarios across a distributed/federated cloud in which actors from different geographical and legislative environments will need to interact and cooperate? Would an excessive number of consent requests minimise data provision for research or clinical applications? Another level of complexity is introduced by the heterogeneity of data protection and privacy regulations when the data originate from states within federated national health systems (e.g. Germany, Italy). Development of large-scale European access mechanisms will require open consultation and engagement with national policymakers, patient organisations and the wider society to build the trust and confidence needed for widespread adoption and sustainable operations. In addition to technical, ethical and legal specifications, a global integrated governance model needs to be established for the HRIC in line with that of the EOSC, regulating roles and, responsibilities of all contributing institutions and users, and procedures for authentication and access control to individual resources. Principles, with specific guidelines on implementation into a HRIC environment, will need to be developed in order to manage and regulate aspects such as ownership, access, transparency, sharing, integration, standardisation of data and metadata formats, tools and frameworks, while ensuring confidentiality and sustainability. All these principles need to be developed with the overarching objective of providing benefit to and preserving the trust of patients and the general public. Health data mostly represents sensitive data, which need to be managed to preserve the trust of patients, research participants and the general public, respect social norms and naturally comply with the rules and regulations of data protection laws, notably the EU GDPR [15]. Field Code Changed Although the GDPR directly applies across the EU and its provisions prevail over national laws, EU member states retain the ability to introduce in their own national legislation certain derogations provided for by the GDPR itself. The GDPR also introduces the notions of 'Privacy by Design', which means that any organisation that processes personal data, must ensure that privacy is built into a system during the whole life cycle of the system or process; and 'Privacy by Default', which means that the strictest privacy settings should apply by default, without any manual input from the end user. In addition, any personal data provided by the user to enable the optimal use of a given health dataset should only be kept for the amount of time necessary to provide the intended product or service [15]. Field Code Changed Thus, successfully linking and accessing biomedical and health data across Europe will require many different disciplines and specialists working together with a coordination effort that should encompass controlled access mechanisms to ensure compliance with privacy and data protection regulations. Data providers need logging and monitoring functionalities to comply with the GDPR and to enable tracking of data and methods within the system, controlling instances and routines that check for the adherence of predefined standards and formats to guarantee data integrity. Access mechanisms need to be developed that support the researchers, data producers and data analysts to request permissions and fulfil reporting requirements for data use in national and international research projects, which represents a significant regulatory, political and sustainability challenge [34]. These include in particular Field Code Changed considerations about the rights of patient donors and research participants, taking into account the data protection aspects of various legal systems and local regulations. Researchers have to face differences in the understanding of the right to data protection in

9 those different regional or national European ecosystems. There is an urgent need for standardised, usable, data-protection-policy-compliant solutions for sensitive data sharing capable of integrating and analysing health data coming from different sources, organisations and potentially from different research disciplines. These aspects are subject to ongoing discussions and debates in the EOSC initiative [35] and for instance, progress has been made Field Code Changed in the Human Brain Project through its Ethics and Society sub-project in collaboration with the project platforms [36, 37]. Other examples of data sharing compliant with data protection Field Code Changed policies can be found in the recent literature [38-45]. Furthermore, there is the issue of Field Code Changed capacity with the amount of data starting to strain the infrastructure of any individual hospital Field Code Changed or research institute. Thus, the interplay between privacy, data security and access control on one hand and access (including cost-recovery models) to storage, computational and analysis resources on the other hand will be a defining element of the policy and technology development of a decentralised digital health infrastructure. The evolution of a cloud model that could be used in European health research will also have to take into account other specific aspects of the GDPR [15]. For instance, the European Commission intends to facilitate Field Code Changed the free flow of non-personal data in the European Digital Single Market, and for health- related research participants it codifies the ‘right to be forgotten’. It stipulates that patient donors should be able to retain control over their data regardless of technological developments. A European HRIC could be an important enabler for researchers to comply with these requirements. For example, once certain conditions are met between European and international partners, including those pertaining to data protection and use, federated and hybrid clouds could facilitate the deletion of data sets once a donor exercises her/his ‘right to be forgotten’, which thus could minimise the necessary transfer of large raw data sets across borders, as the deletion can be performed in the original dataset and easily propagated to the relevant federated data sources.

A proper training environment for HRIC developers and users should be established

Rationale The workshop identified the lack of trained personnel, with solid skills in both medical and data analytical fields as one of the major bottlenecks when dealing with ‘medical big data’ [3]. Field Code Changed

Need for training and ideas Indeed, effectively developing, operating and maintaining the HRIC will pose serious challenges and require the training of a new generation of data scientists able to navigate smoothly and efficiently between computational, security and medical disciplines. This includes clinical researchers, bioinformaticians, data analysts, data managers, software engineers, cloud engineers, other IT-specialists, ethics officers, and data protection specialists, which represent an essential new field of expertise. Finding professionals that cover more than one or two of the above-described disciplines is nearly impossible. Furthermore, communication between this comprehensive mix of clinical researchers, data managers and IT/bioinformatics specialists needs to be improved, requiring a governance structure well beyond that of a standard research setting. The EU should take inspiration from existing large and successful infrastructures that foster multidisciplinary teams, such as CERN [46, 47]. Thus, Field Code Changed it is necessary to rethink the training and education of health professionals and to update Field Code Changed

10 them with the HRIC in mind, considering both international standards and practices for data sharing as well as national environments and regulations.

Multiple funding mechanisms are required to drive the development of the HRIC and support its broad use in research projects

Rationale Delivering the HRIC requires an ambitious reshaping of the European landscape for health data and research through appropriate funding schemes that enable transformation of fragmented ICT resources and project-centric solutions for data access and governance into a long-term, coherent service ecosystem that can be accessed by users transnationally.

Need for innovative public-private funding initiatives Indeed, the HRIC needs a trusted and transparent innovation approach that recognises the importance of a clear, long-term ambition in the programme to support the participation of industry and small to medium enterprises (SME) in joint projects with a broad set of societal actions. In particular, there is a need to support the EU ICT industrial/SME innovation ecosystem in order to demonstrate the benefits in advancing data sharing, integration and analysis across Europe, for the benefit of all citizens, thus creating a foundation for attractive private investments. For this purpose, targeted EU funding mechanisms also involving private investors need to support the development of HRIC-compliant services for data sharing and analysis in health- related research projects (i.e. through reimbursement for storage and computing costs) with incentives to reuse and extend existing infrastructures that favour national HRIC participation rather than rebuilding and fragmenting solutions. Moreover, the EU ICT industrial ecosystem must be supported in order to alleviate the risks associated with storing and sharing data across cloud systems operated by non-EU companies. The EU has established a strict privacy Commented [CA10]: RA9: ‘On page 10, lines 28-30, it and ethics policy through the implementation of the GDPR legislation, binding for all operators is not clear how risks in storing data in the Cloud will be lowered by EU ICT industrial ecosystem, when active on the EU territory [48-51]. compared to their non-EU counterparts?’ European funders, science policy makers and other actors need to develop mechanisms that Please consider briefly clarifying or addressing these bring together experience gained and lessons learned from a large portfolio of pathfinding minor points. projects and build on current investments, thus leveraging existing project outcomes. This requires an inclusive and integrative approach with programmes that bring together the many Field Code Changed different actors into the HRIC, as its construction needs interdisciplinary collaboration with expertise from many disciplines e.g. economics, ICT, biomedical and health, social sciences and policy. In particular, frameworks for public-private partnerships such as IMI have shown a way to include industry in open transparent projects that also includes patients and other public bodies, SMEs and European researchers. Many of the mechanisms proposed in the Lamy report (prioritise research and innovation in EU and national budgets, build a true EU innovation policy that creates future markets, rationalise the EU funding landscape and achieve synergy with structural funds…)[52] and in the development of the Horizon Europe Field Code Changed “Missions” [53] would also be well suited to the development of the HRIC and help bring Field Code Changed together initiatives from the many funders, national and regional stakeholders. Other opportunities such as those proposed to be included in the Horizon Europe strategic workplan (e.g. the European Information Cloud, the European Institute of Technology and the European Field Code Changed Council for Health Research [54, 55]) should be of interest to deploy innovations together with Field Code Changed industry at the European level [56]. Field Code Changed 11 Furthermore, future programmes need to create incentives such that developed solutions are transformed into long term reusable resources and make sure that this infrastructure is deployed across the EU with development informed by ongoing research projects. The European Joint Programme on Rare Diseases (EJP-RD) [56] provides a good example of how Field Code Changed infrastructure development can be linked with research projects at national and international levels. As in the EJP-RD, the European Strategy Forum on Research Infrastructures (ESFRI) can play a role in the development of HRIC. Two further aspects of EJP-RD are worth noting: the programme has a strong emphasis on the importance on the diverse workforce required with a training programme that stretches beyond academia and research networks to reach a broad set of individuals in health systems and the education sector. Secondly, the research portfolio should fund broadly and recognise that successfully addressing many of the identified challenges will require a diverse portfolio of projects that avoid any artificial boundary between biomedical and health research. Horizon Europe should allow for linkages between the HRIC, ESFRI and other themes of Horizon Europe and, importantly, between HRIC and other funding sources such as the European Structural and Innovation Funds (ESIF) and the cutting-edge basic research successfully supported by the European Research Council (ERC) during the past decade [57, 58]. Field Code Changed The HRIC should enable investments in product development for future health-care solutions Field Code Changed as well as allow health care providers to procure such solutions. People need to be an integral part of the innovation vision where the HRIC supports a highly skilled future workforce that makes Europe attractive for locating R&D investments. The experience gained in IMI public- private partnership infrastructure projects such as eTRIKS and the European Medical Information Framework (EMIF) should be leveraged. Finally, the current and future EU Framework Programmes for Research and Innovation (Horizon 2020 and Horizon Europe) should consider mobilising funds to support novel pilot actions, pooling of data and resources across the EU, and demonstrate the benefits in advancing data sharing, integration and analysis across Europe, for the benefit of all citizens.

Conclusions, recommendations and action points

Clouds are increasingly becoming a key venue for enabling and hosting European and international collaborations, benefitting from the ability to hold data securely in a single location (or in few locations) and enable collaborative research on computational infrastructure used for analysis. In conclusion, a cloud-based federated data storage solution with interoperable data access services (see Table 1) to local repositories and modular environments that can be configured for a given use case seems to match the data needs of the EU research and medical institutions and of all other stakeholders. The choice of cloud technology provides the ability to manage rapidly growing datasets and provides users with access to the massive computational infrastructure needed for analysis. The federated, cloud based research environment - HRIC- described in this paper, would represent an added value to the entire biomedical and bioinformatics community, because single research institutes and medical institutions lack sufficient infrastructure capacity. The establishment of a transnational HRIC will allow the European research community at large to contribute to the global international leadership required to address societal and scientific challenges through transnational collaborations. In order to ensure the effective and efficient implementation of the European HRIC, the workshop participants endorsed the five recommendations and actions points presented in Table 2 to the EU and all stakeholders.

12

Application A set of programs running on one or more computers allowing a user to perform a set of tasks Cloud Application An application running in the cloud Cloud Computing The delivery of IT services over a network (e.g. the Internet) by means of a combination of infrastructure, software and data hosted by one or more cloud providers using a service model similar to that used by traditional utility companies (e. g. water or electricity) Cloud Federation The combination of infrastructure, software and services from separate networks and providers, having shared access mechanisms, to perform common actions, achieve load-balancing or optimize availability or costs Cloud Instance A virtual server or container of resources running on a physical host computer possibly hosting several independent instances (see virtualization) Cloud Marketplace An online marketplace of cloud services and applications operated by a Cloud Service Provider (CSP) Cloud Service Provider A company or public entity that offers cloud services to individual (CSP) users or other entities Cloud Storage A model of data storage in which data is hosted across one or more facilities by a hosting entity or CSP and remotely accessed by users over the Internet Container A type of virtualized instance running on a physical host server in isolated user spaces and possibly preloaded with applications Hybrid Cloud A cloud computing infrastructure comprised of a mix of public and private cloud and on-premise instances and resources Infrastructure The combination of hardware resources (network, computing, storage, etc.) and virtualized instances supporting an IT environment Infrastructure as a A model of cloud computing in which a CSP provides an Service (IaaS) infrastructure of virtualized resources to users as a service over a network (e.g. the Internet) On-Premise It refers to infrastructure or software run on computing resources physically hosted by the entity using them Platform A computer system where applications can run on or can be built with Platform as a Service A model of cloud computing in which a CSP provides the (PaaS) infrastructure and the platforms where users can run and manage their own applications as a service over a network (e.g. the Internet) Private Cloud A cloud infrastructure used by a single organization either on- premise or hosted by a third-party CSP over the Internet or dedicated private networks Public Cloud A cloud infrastructure hosted by a CSP (or a Federation) and used by the public or multiple organizations across the Internet

13 Software as a Service A model of cloud computing in which a CSP hosts and provides (SaaS) applications (software) to users as a service over a network (e.g. the Internet) Virtualization A technology that allows to run a software simulation of a physical computer where a full operating system and applications can be installed Table 1 : Glossary of cloud computing terms.

14 Recommendation Rationale Action points The HRIC should be supported by predefined standards, data formats, protocols and templates. The Suggest the adoption of data formats and Provide and foster standards, good data standards and guidelines applied in the HRIC architectures in policies, grant applications practices and guidelines necessary to should be designed as to facilitate interoperability and projects calls, throughout the EU and its establish the HRIC. between the diverse health systems and policies in member states. Europe and globally. - In future grant and project calls related to The HRIC should provide computational the development of the HRIC, the EU and infrastructures and services, analytical and the applicants should commit to comply Develop and certify the visualisation tools to all users as a platform to share with the highest standards of security, infrastructure and services required knowledge, data and guidelines. Services for data interoperability and reproducibility. for operation of the HRIC. sharing, security and analysis should be compliant - The EU should develop a certification with an EU certification system. system to validate compliance with the standards mentioned. A robust ethical and legal framework has to be developed that defines rules for privacy, security, Enable the HRIC to operate within an ownership, access and usage of data within the HRIC. - In grants and project calls, federated and ethical and legal framework A federated system architecture should be preferred GDPR-compliant data architectures adequate for health systems. as it allows for comparison of data and results, while should be preferred. complying with EU (GDPR) and international data protection and sharing rules. Education and training of health professionals need to be updated with the HRIC in mind, considering both - Scientists and health professional in Establish a proper environment for international standards and practices for data sharing training should be made aware of the the training of a new generation of as well as national environments and regulations. The possibilities of the HRIC. data and medical scientists. EU should take inspiration in existing large and - Communication in relevant professional successful infrastructures that foster multidisciplinary channels should be strengthened. teams, such as CERN.

15 The EU and its member states should together with - The EU should invest through calls and Fund public and private initiatives for private investors develop a coherent, ambitious and grants in order to build and consolidate the development of the HRIC through long-term action plan supported by innovative funding the HRIC. EU Framework Programmes mechanisms that consolidate the outcomes from the - The existing industrial ecosystem should (Horizon 2010 and Horizon Europe). existing project portfolio into a long-term operational be supported, to remain competitive infrastructure. against the other actors in the world. Table 2: Summary of the recommendations, details on the rationale and suggestions for actions points directed to the funding agencies and the actors of the field.

16

Figure 1 : Proposed general architecture of the HRIC European (inter)national databases, with varying data formats and data Commented [CA11]: RA3: ‘Figure 1 legend has a statement "rules of federated data commons", but does types are referenced in a metadata repository, following formatting rules of the Federated data commons as agreed at the not discuss in the text what those rules are and how a relationship between federated Cloud and the HRIC governance level. as agreed at the HRIC governance level. The different users, after access control to the cloud, use the Commons will be established?’

HRIC interface to access the repository, which gathers the relevant data and performs analysis, with outputs such as Please consider briefly clarifying this point.

mathematical models, data visualisations, statistics and patient’s profiles, according to the users’ needs. Answer: [...] following formatting rules of the Federated data commons as agreed at the HRIC governance level.

Table 3: Relevant initiatives for the European Health Research and Innovation cloud

PROJECT AIMS/SUMMARY Cloud model used References CORBEL project Creating a platform for Scalable cloud-based provision of [59] harmonised user access to data access and compute across biological and medical infrastructures. technologies, biological samples and data. The project has developed the data harmonisation, ethics guidance and user access protocols necessary for transnational access to both pre-clinical and clinical research infrastructures and is piloting access to participant level data from clinical trials43. ELIXIR EUROPEAN RESEARCH Hybrid cloud ecosystem : [60] INFRASTRUCTURE with 21 i/ Local, private clouds (e.g. members and over 180 research EMBL-EBI Embassy); organisations. ELIXIR is creating a

17 network of local instances of the ii/ National community clouds European Genome-Phenome (e.g. cPouta, MetaCentrum cloud, Archive that give users controlled de.NBI); and secure access to raw data and iii/ European research and precomputed results44,45. innovation oriented clouds (e.g. EOSC) iv/ Public/commercial compliant clouds (e.g. Google, Azure, AWS).

European IMI-funded highly scalable cloud- Scalable cloud-based platform for [61] Translational based platform for translational translational research and Information research, information and applications development. The and Knowledge knowledge management providing Openstack technology is used to Management open-source applications that can run a private cloud for eTRIKS. securely host heterogeneous data Services types, including multi-omics data, (eTRIKS) preclinical laboratory data and clinical information, including longitudinal data sets. The platform is a robust translational research knowledge management system that is able to host other data mining applications and support the development of new analytical tools46. European IMI-funded project that has Research analytical service [62, 63] Medical successfully improved access to approaches via HER and cohort Information human health data via providing data platforms. Framework tools and workflows to discover, (EMIF) assess, access and (re)use human health data. The efforts of this IMI project are being extended through the EHDEN one. Human Brain The HBP Medical Informatics The HBP Joint Platform plans to [64] Project (HBP) Platform allows researchers adopt in cloud technology and Medical around the world to exploit provide those services through its Informatics medical data to create machine- computer centers JSC-Jülich and Platform learning tools that can analyse CSCS-Lugano together with BSC- these data for new insights into Barcelona, CINECA-Bologna and brain-related diseases. The CEA-Saclay. Medical Informatics web-portal47 provides a software framework Software infrastructure : based on federated and i/ The (base) infrastructure layer distributed computing that allows is accessible through an researchers to mine clinical data “Infrastructure as a Service” stored on hospital and laboratory (IaaS) interface; servers, without moving the data ii/ Tools to enable simulation and from the servers where they reside modeling as well as data analytics and without compromising patient workflows for neuroscience that privacy. the HBP operates as a “Platform as a Service” (PaaS);

18 iii/ Several software services for data-driven brain simulations and for virtual neurorobot design and operation offered in form of “Software as a Service” (SaaS). HBP operates the following SaaS: a/ Model driven brain simulations; b/ Neurorobotics simulation and development tools across the whole workflow the of neurorobotics life cycle. Human Brain The HPB Knowledge Graph (KG) is [65] Project (HBP) an online graph database Knowledge accepting submissions of Graph Data anonymized human data, animal Platform data, and models from the brain. All data made discoverable and accessible through KG search have been curated. Data are also made available with integrated multilevel HBP Atlases, holding information about the brain in standard reference spaces.

Helix Nebula The project is a pan-European Helix Nebula Science [66] public private partnership initiative Cloud (HNSciCloud): led by EIROforum and leading Hybrid cloud platform linking commercial cloud computing together commercial cloud partners. Since 2011, it has been service providers and publicly piloting the use of cloud funded research organisations’ computing to enable complex data in-house IT resources via analyses and large-scale data the GEANT network to provide sharing, with life science-oriented innovative solutions supporting projects ranging from complex data intensive science. genome assembly to assessing These services support the somatic variation in the context of connection of the research different types of cancer. infrastructures identified in the ESFRI Roadmap to the nascent European Open Science Cloud (EOSC) intended to create a single digital research space for Europe’s 1.8 million researchers.

European Open The pilot project has recently Cloud based services for open [25] Science Cloud investigated the benefits of data sciences - integration and (EOSC) and cloud computer sharing at a consolidation of e-infrastructure pan-European level with life platforms, federation of existing science-oriented projects on pan- European research cancer analyses31 and imaging. infrastructures and scientific This has been pursued across clouds.

19 academic cloud computing environments located in Western and Eastern Europe as well as Canada. Similar initiatives have been launched in the USA8. PCAWG PCAWG - PANCANCER ANALYSIS Hybrid cloud model [67, 68] OF WHOLE GENOMES The data coordinating center lists The study is an international collaborative agreements with collaboration to identify common cloud providers AWS and an patterns of mutation in more than academic compute cloud 2,800 cancer whole genomes from resource maintained at the the International Cancer Genome cancer collaboratory, by the Consortium. This project is Ontario Institute for Cancer exploring the nature and Research and hosted at the consequences of somatic and Compute Canada facility. germline variations in both coding and non-coding regions, with specific emphasis on cis-regulatory sites, non-coding RNAs, and large- scale structural alterations. RD-Connect RD-Connect is an integrated Online secured platform [69] platform connecting databases, connecting different types of patient registry data, biobanks and patients related rare disease data clinical bioinformatics for rare enabling genome-phenome disease research. analysis. The RD-Connect allow the integration of different data types (e.g. omics, clinical information, patient registries and biobanks). Those integrated data can be accessed and analysed by the scientific community to speed-up research, diagnosis and therapy development for patients with rare diseases to contribute for better health.

COMPARE COMPARE, a network of One serve all analytical [70] [71] collaborators of the Global framework and data exchange Microbial Identifier initiative (GMI), platform with various data aims to improve identification and integration for real time analysis mitigation of emerging infectious and interpretation of sequence- diseases and foodborne outbreaks. based pathogen.

Abbreviations CERN (the European Organisation for nuclear research) EMR (Electronic Health Record) EOSC (European Open Science Cloud) EU (European Union) FAIR (Findable, Accessible, Interoperable, Retrievable)

20 FP7 (The European Union's seventh framework programme for Research, Technological Development and Demonstration) H2020 (The European Union's Horizon Research and Innovation programme) HBP (Human Brain project) HRIC (European Health Research and Innovation Cloud) OECD (Organisation of Economic Collaboration and Development) IaaS (Infrastructure as a Service) PaaS (Platform as a Service) SaaS (Software as a Service)

Declarations

Availability of data and materials “Not applicable”

Funding The workshop participants received funding from the European Union Seventh Programme for Research, Technological Development and Demonstration (FP7) and Horizon Research and Innovation Programme (H2020) and other European Union programmes under the following grant agreements: AETIONOMY (Developing an Aetiology-based Taxonomy of Human Disease- Approaches to Develop a New Taxonomy for Neurological Disorders, IMI-no115568), ANTI- SUPERBUG PCP (ANTISUPERBUG Precommercial Procurement, H2020-no688878), B-CAST (Breast CAncer Stratification understanding the determinants of risk and prognosis of molecular sub-types, H2020-no633784), BRIDGE Health (Bridging information and data generation for evidence-based health policy and research, H2020-no664691), CASyM (Coordinating Action Systems Medicine – Implementation of Systems Medicine across Europe, FP7-n°305033), CENTER-TBI (Collaborative European NeuroTrauma Effectiveness Research in TBI, FP7-no602150), CECM (Centre for New Methods in Computational Diagnostics and Personalized Therapy, H2020-no763734), COLOSSUS (Advancing a Precision Medicine Paradigm in Metastatic Colorectal Cancer: Systems based patient stratification solutions, H2020-no754923), COMPARE (COllaborative Management Platform for detection and Analyses of (Re-)emerging and foodborne outbreaks in Europe, H2020-no643476), CONNECARE (Personalised Connected Care for Complex Chronic Patients, H2020-no689802), CREATIVE (Collaborative REsearch on ACute Traumatic brain Injury in intensiVe care medicine in Europe, FP7-no602714), CHAFEA grant-no709723), DEFORM (Define the global and financial impact of research misconduct H2020-no710246), ECCTR (European Cornea and Cell Transplant Registry, ECRIN-IA (European Clinical Research Infrastructures Network- Integrating Activity, FP7-no284395), EHR4CR (Electronic Health Records Systems for Clinical Research, IMI-no115189) eInfraCentral (European E-infrastructures Services Gateway, H2020- no731049), ELIXIR (European Life-science Infrastructure for Biological Information, FP7- n°211601), ELIXIR-EXCELERATE (Fast track ELIXIR implementation and drive early user exploitation across the life sciences, H2020-no676559), eMEN (e-mental health innovation and transnational implementation platform North West Europe, H2020), EMIF (European Medical Information Framework, IMI-no115372), ERA PerMed (ERA-net Cofund in Personalised Medicine, H2020-no779282), eTRIKS (Delivering European Translational Information and Knowledge Management Services, IMI-1-no115446), E-COMPARED (European COMPARative

21 Effectiveness Research on online Depression, FP7- no603098), EuroPOND (Data-driven models for progression of neurological diseases, H2020-n°666992), EurValve (Personalised Decision Support for Heart Valve disease, H2020-no689617), HBP SGA1/SGA2 (Human Brain Project specific grant agreements, H2020-n°720270/785907), MASTERMIND (Management of Mental Disorders through Advanced Technologies, CIP-no621000), MedBioinformatics (Creating medically-driven integrative bioinformatics applications focused on oncology, CNS disorders and their comorbidities, H2020-n°634143), ICT4DEPRESSION (User-friendly ICT tools to enhance self-management and effective treatment of depression in the EU, FP7- n°248778), ImpleMentAll (Towards evidence-based tailored implementation strategies for eHealth, H2020-no733025), INSTRUCT-ULTRA (Releasing the full potential of instruct to expand and consolidate infrastructure services for integrated structural life sciences research, H2020- no731005), MeDALL (Mechanisms of the Development of ALLergy, FP7-n°261357) MIDAS (Meaningful Integration of Data, Analytics and Services, H2020-no 727721), MultipleMS (Multiple manifestations of genetic and non-genetic factors in Multiple Sclerosis disentangled with a multi-omics approach to accelerate personalised medicine, H2020-no733161), myPEBS (Randomized Comparison Of Risk-Stratified versus Standard Breast Cancer Screening European Women Aged 40-74, H2020-no755394), OpenAIRE-Advance (Advancing Open Scholarship, H2020-no777541), OpenMedicine (OpenMedicine, H2020-no643796), PanCareSurFup (PanCare Childhood and Adolescent Cancer Survival Care and Follow-up Studies, FP7-no257505), PIONEER (Prostate Cancer DIagnOsis and TreatmeNt Enhancement through the Power of Big Data in EuRope, H2020-IMI-2-n°777492), PREPARE (Platform for European Preparedness Against (Re-)emerging Epidemics, FP7-n°602525), Regions4PerMed (Interregional coordination for a deep and fast uptake of personalised health, H2020- no825812), RD-CONNECT (An integrated platform connecting registries, biobanks and clinical bioinformatics for rare disease research, FP7-no305344), Solve-RD (Solving the unsolved Rare Diseases, H2020-no779257), SPIDIA4P (SPIDIA for Personalised Medicine-Standardization of generic Pre-analytical procedures for In-vitro DIAgnostics for Personalised Medicine, H2020- no733112), SYSCID (A Systems medicine approach to chronic inflammatory disease, H2020- no733100), SysCLAD (Systems prediction of Chronic Allograft Dysfunction, FP7-n°305457), SYSCOL (Systems Biology of Colorectal Cancer, FP7-no258236), SysMedPD (Systems Medicine of Mitochondrial Parkinson’s Disease, H2020-n°668738), U-BIOPRED (Unbiased BIOmarkers for the PREDiction of respiratory disease outcomes, IMI-n°115010), VPH-share (Virtual Physiological Human: Sharing for Healthcare - A Research Environment, FP7-n°269978). Neither the European Commission nor any other person acting on behalf of the Commission is responsible for the use, which might be made of the following information. The views expressed in this publication are the sole responsibility of the authors and do not necessarily reflect the views of the European Commission.

Competing interests RB is a founder and holder of shares of MEGENO S.A. and holds shares of ITTM S.A. The other authors have declared no competing interests.

Authors' contributions This publication is inspired by a workshop organized by the Health Directorate at the European Commission. The chairs ADM, CGC, EMF, FS, IP, LC, MB, MBu, ML, MVDB, NBl, RB contributed to the concept of the manuscript. GHS produced the draft workshop report, which was used as a basis for the manuscript. The core editing group CA, NB, BDM, JK, NB, MVDB reviewed

22 and finalized the manuscript. All authors contributed to the content of the manuscript and approved its final version.

Acknowledgements We would like to thank the scientific policy officers at the European Commission Health Directorate, who developed the workshop concept, organised it and provided input to this publication: Sasa Jenko, Katarina Krepelkova, Christina Kyriakopoulou, Jana Makedonska, Joana Namorado, Elsa Papadopoulou, Jan Van de Loo and Gregor Schaffrath (National Expert in professional training). The workshop was organised by the Health Directorate of the Directorate-General for Research and Innovation at the European Commission with contributions of the Innovative Medicines Initiative, the Directorate General for Communications Networks, Content and Technology and the Directorate General for Health and Food Safety.

References

1. Birney E, Vamathevan J, Goodhand P: Genomics in healthcare: GA4GH looks to 2022. In: Cold Spring Harbor Laboratory. bioRxiv; 2017. 2. Langmead B, Nellore A: Cloud computing for genomic data analysis and collaboration. Nature reviews Genetics 2018, 19(5):325. 3. Auffray C, Balling R, Barroso I, Bencze L, Benson M, Bergeron J, Bernal-Delgado E, Blomberg N, Bock C, Conesa A et al: Making sense of big data in health research: Towards an EU action plan. Genome medicine 2016, 8(1):71. 4. Mazein A, Ostaszewski M, Kuperstein I, Watterson S, Le Novere N, Lefaudeux D, De Meulder B, Pellet J, Balaur I, Saqi M et al: Systems medicine disease maps: community-driven comprehensive representation of disease mechanisms. NPJ systems biology and applications 2018, 4:21. 5. Ostaszewski M, Gebel S, Kuperstein I, Mazein A, Zinovyev A, Dogrusoz U, Hasenauer J, Fleming RMT, Le Novere N, Gawron P et al: Community-driven roadmap for integrated disease maps. Briefings in bioinformatics 2019, 20(2):659-670. 6. Phillips M, Molnar-Gabor F, Korbel JO, Thorogood A, Joly Y, Chalmers D, Townend D, Knoppers BM, Consortium P: Of Clouds and Genomic Data Protection. Nature 2019. 7. PCAWG C: Pan-cancer analysis of whole genome. Nature 2019. 8. Lochmuller H, Badowska DM, Thompson R, Knoers NV, Aartsma-Rus A, Gut I, Wood L, Harmuth T, Durudas A, Graessner H et al: RD-Connect, NeurOmics and EURenOmics: collaborative European initiative for rare diseases. European journal of human genetics : EJHG 2018, 26(6):778-785. 9. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE et al: The FAIR Guiding Principles for scientific data management and stewardship. Scientific data 2016, 3:160018. 10. Lamy P, Brudermüller M, Ferguson M, Friis L, Garmendia C, Gray I, Gulliksen J, Kulmala H, Maher N, Plentz Fagundes M et al: LAB - FAB - APP Investing in the European future we want. In. Brussels: Independent High Level Group on maximising the impact of EU research & Innovation Programmes; 2017. 11. Hauser H, Bergman N, Bruncko M, Cosgrave P, Dwyer G, Helder M, Hinrikus T, Hoerr I, Karia B, Kolar J et al: Funding - Awareness - Scale - Talent (FAST) Europen is

23 back: Accelerating breakthrough innovation. In. Luxemburg: European Commission; 2018. 12. Implementation Roadmap for the European Open Science Cloud [https://ec.europa.eu/research/openscience/pdf/swd_2018_83_f1_staff_working_paper _en.pdf#view=fit&pagemode=none] 13. Mons B, Neylon C, Velterop J, Dumontier M, da Silva Santos LOB, Wilkinson MD: Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services & Use 2017, 37:49-56. 14. FAIRsharing homepage [https://fairsharing.org/] 15. REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL [https://eur-lex.europa.eu/legal- content/EN/TXT/?uri=celex%3A32016R0679] 16. Bouzaglo D, Chasida I, Ezra Tsur E: Distributed retrieval engine for the development of cloud-deployed biological databases. BioData mining 2018, 11:26. 17. Corrie BD, Marthandan N, Zimonja B, Jaglale J, Zhou Y, Barr E, Knoetze N, Breden FMW, Christley S, Scott JK et al: iReceptor: A platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunological reviews 2018, 284(1):24-41. 18. Sreenivasaiah PK, Kim DH: Current trends and new challenges of databases and web applications for systems driven biological research. Frontiers in physiology 2010, 1:147. 19. Wade TD: Traits and types of health data repositories. Health information science and systems 2014, 2:4. 20. Estiri H, Klann JG, Weiler SR, Alema-Mensah E, Joseph Applegate R, Lozinski G, Patibandla N, Wei K, Adams WG, Natter MD et al: A federated EHR network data completeness tracking system. J Am Med Inform Assoc 2019, 26(7):637-645. 21. Shenai S, Aramudhan M: Cloud computing framework to securely share health & medical records among federation of healthcare information systems. Biomedical research 2018; Special Issue: S133-S1367. 22. Oderkirk J: Readiness of electronic health record systems to contribute to national health information and research. In. Paris: OECD Health Working Papers; 217: 80. 23. eHealth: Digital health and care [https://ec.europa.eu/health/ehealth/projects/nationallaws_electronichealthrecords_en] 24. Overview of the national laws on electronic health records in the EU Member States and their interaction with the provision of cross-border eHealth services - Final report and recommendations [https://ec.europa.eu/health/sites/health/files/ehealth/docs/laws_report_recommendatio ns_en.pdf] 25. EOSC homepage [https://www.eoscpilot.eu/] Field Code Changed 26. Molnar-Gabor F, Lueck R, Yakneen S, Korbel JO: Computing patient data in the cloud: practical and legal considerations for genetics and genomics research in Europe and internationally. Genome medicine 2017, 9(1):58. 27. EU countries will cooperate in linking genimc databases across borders [https://ec.europa.eu/digital-single-market/en/news/eu-countries-will-cooperate- linking-genomic-databases-across-borders] 28. Declaration of cooperation towards access to at least 1 million sequenced genomes in the European Union by 2022 [https://www.euapm.eu/pdf/EAPM_Declaration_Genome.pdf] Field Code Changed

24 29. Saunders G, Baudis M, Becker R, Beltran S, Béroud C, Birney E, Brooksbank C, Brunak S, Van den Bulcke M, Drysdale R et al: Leveraging European infrastructures to access 1 million human genomes by 2022. Nature Reviews Genetics 2019. 30. Drosatos G, Kaldoudi E: Blockchain Applications in the Biomedical Domain: A Scoping Review. Computational and structural biotechnology journal 2019, 17:229- 240. 31. Esposito C, De Santis A, Tortora G, Chang H, Choo K-KR: Blockchain: A Panacea for Healthcare Cloud-Based Data Security and Privacy? IEEE Cloud Computing 2018, 5. 32. Miller JB: Big data and biomedical informatics: Preparing for the modernization of clinical neuropsychology. The Clinical neuropsychologist 2019, 33(2):287-304. 33. The eTRIKS code of practice on secondary use of medical research data [https://www.etriks.org/code-of-practice/] Field Code Changed 34. De Hert P, Papakonstantinou V: Three Scenarios for International Governance of Data Privacy: Towards an International Data Privacy Organization, Preferably a UN Agency? I/S: A Journal of Law and Policy for the Information Society 2013, 9(2):271-324. 35. EU picks team of 11 to run giant European Open Science Cloud [https://sciencebusiness.net/science-cloud/news/eu-picks-team-11-run-giant-european- open-science-cloud] 36. Human brain project ethics and society sub-projects [https://www.humanbrainproject.eu/en/social-ethical-reflective/] Field Code Changed 37. Salles A, Bjaalie JG, Evers K, Farisco M, Fothergill BT, Guerrero M, Maslen H, Muller J, Prescott T, Stahl BC et al: The Human Brain Project: Responsible Brain Research for the Benefit of Society. Neuron 2019, 101(3):380-384. 38. 8 Benefits and Risks of Cloud Computing in Healthcare [https://solutionsreview.com/cloud-platforms/8-benefits-and-risks-of-cloud- computing-in-healthcare/] 39. Ali O, Shrestha A, Soar J, Wamba F: Cloud computing-enabled healthcare opportunities, issues, and applications: A systematic review. International Journal of Information Management 2018, 43:146-158. 40. Lian JW: Establishing a Cloud Computing Success Model for Hospitals in Taiwan. Inquiry : a journal of medical care organization, provision and financing 2017, 54:46958016685836. 41. de Bruin B, Floridi L: The Ethics of Cloud Computing. Science and engineering ethics 2017, 23(1):21-39. 42. Council CSC: Impact of Cloud Computing on Healthcare. In.; 2017. 43. Sultan N: Making use of cloud computing for healthcare provision: Opportunities and challenges. International Journal of Information Management 2014, 34(2):177- 184. 44. Sobeslav V, Maresova P, Krejcar O, Franca TC, Kuca K: Use of cloud computing in biomedicine. Journal of biomolecular structure & dynamics 2016, 34(12):2688-2697. 45. Rodrigues JJPC, Sendra Compte S, de la Torra Diez I: Cloud Computing on e-Health. In: e-Health Systems, Theory, Advances and Technical Applications. Edited by Elsevier IP-; 2016: 191-207. 46. CERN Openlab homepage [https://openlab.cern/] 47. Worldwide LHC Computing Grid homepage [http://wlcg.web.cern.ch/] Field Code Changed 48. Chen J, Qian F, Yan W, Shen B: Translational biomedical informatics in the cloud: present and future. BioMed research international 2013, 2013:658925.

25 49. Khan N, Yaqoob I, Hashem IA, Inayat Z, Ali WK, Alam M, Shiraz M, Gani A: Big data: survey, technologies, opportunities, and challenges. TheScientificWorldJournal 2014, 2014:712826. 50. Kuo AM: Opportunities and challenges of cloud computing to improve health care services. Journal of medical Internet research 2011, 13(3):e67. 51. Navale V, Bourne PE: Cloud computing applications for biomedical science: A perspective. PLoS computational biology 2018, 14(6):e1006144. 52. LAB, FAB, APP: Investing in the European future we want. In. Luxemburg: Publication office of the European Union; 2017. 53. Mazzucato M: Mission-Oriented Research & Innovation in the European Union - a problem-solving approach to fuel innovation-led growth. In. Luxemburg: Publication office of the European Union; 2018. 54. EIT in Horizon Europe (2021-2027) - complementarities and synergies with the EIC [https://eit.europa.eu/sites/default/files/eit_position_paper_horizon_europe_eit_eic_0.p df] 55. H2020 Scientiic Panel for Health: Building the future of health research - Proposal for a European Council for Health Research In.; 2018. 56. European Joint Programme of Rare Diseases homepage [https://www.ejprarediseases.org/index.php/about/] Field Code Changed 57. European Research Council homepage [https://erc.europa.eu] 58. European Structural and Investment Funds [https://ec.europa.eu/info/funding- tenders/funding-opportunities/funding-programmes/overview-funding- programmes/european-structural-and-investment-funds_en] 59. CORBEL project homepage [http://www.corbel-project.eu] Field Code Changed 60. ELIXIR infrastructure homepage [https://www.elixir-europe.org/] Field Code Changed 61. The eTRIKS project homepage [https://www.etriks.org/] 62. EMIF project homepage [http://www.emif.eu/] Field Code Changed 63. EHDEN project homepage [https://www.ehden.eu/] Field Code Changed 64. Human Brain Project homepage [https://www.humanbrainproject.eu/en/medicine/] Field Code Changed 65. Human Brain Project Knowledge Graph Field Code Changed [https://www.humanbrainproject.eu/en/explore-the-brain/] 66. HELIX nebula homepage [http://www.helix-nebula.eu/] Field Code Changed 67. [https://dcc.icgc.org/icgc-in-the-cloud/collaboratory] Field Code Changed 68. PCAWG project homepage [https://dcc.icgc.org/pcawg] 69. RD Connect homepage [https://rd-connect.eu/] 70. COMPARE project homepage [http://www.compare-europe.eu] 71. [https://www.globalmicrobialidentifier.org]

26 Supplementary Table

Click here to access/download Supplementary Material HRIC-Supplementary-Table 1-18102019.docx