ELIXIR Annual Report 2018

Contents

Foreword by Robert Gentleman, Chair of Scientific Advisory Board
Foreword by ELIXIR Director

Platforms
  Tools: Services and connectors to drive access and exploitation
  Data: Sustaining Europe’s life science data infrastructure
  Compute: Access, exchange and storage
  Interoperability: Integration of data and services
  Training: Professional skills for managing and exploiting data
  Platform leaders

Use Cases and Communities
  Human data
  Rare diseases
  Marine metagenomics
  Plant sciences
  Use Case leaders
  New ELIXIR Communities

Members

2017 Highlights

EU Grants
  ELIXIR-EXCELERATE
  Collaboration with other Research Infrastructures

Supporting activities
  Capacity building and Node development
  Industry engagement
  International collaboration
  Impact and sustainability
  Communications
  Governance
  ELIXIR Hub staff

Governance Committees and financial data
  ELIXIR Committees
  Implementation studies in 2017
  Financial data

Foreword by Robert Gentleman

The past year has seen a remarkable increase in the technical and scientific activities of ELIXIR, ranging from the selection of ELIXIR Implementation Studies by peer review for the first time, to the delivery of activities across all of the ELIXIR Platforms and Use Cases, to the results of ELIXIR collaborations with its European and international partners.

As Chair of the ELIXIR Scientific Advisory Board (SAB), I have watched the development of ELIXIR in 2017 in great detail, and – as we stated in our regular SAB reports – the rate of progress in the implementation of the ELIXIR Scientific Programme has been phenomenal.

The selection and publication of the ELIXIR Core Data Resources was not only the highlight of 2017, but of the entire programming cycle of 2014–2018. By establishing the initial list of ELIXIR Core Data Resources, ELIXIR has defined the best practices in assessing the quality of biological data provision and has gained worldwide recognition as an authority in this field. This directly feeds into the global case to sustain core data resources.

Another important milestone in 2017 was the creation of new ELIXIR Communities. Although each of the three new communities created – the Proteomics, Metabolomics and the Galaxy community – have different requirements, they provide very good opportunities to develop strong technical activities that will benefit their respective user groups.

In 2017, we also saw the beginning of the development of the ELIXIR Scientific Programme for 2019–2023, the strategic document that lays out the priorities and objectives for the next five years. I’ve been pleased to see ELIXIR reaching out to ELIXIR Nodes in the development of this programme. It will be crucial in 2018 to maintain this vibrant dialogue to ensure the Programme responds to the needs of the life science community in Europe.

I have had the privilege of engaging with ELIXIR and its members through my role on the ELIXIR Scientific Advisory Board since the early days of ELIXIR. I am proud of the impact that the SAB has had and I continue to be impressed by the energy and effort of its members. It has been enormously interesting to watch and help guide the development of ELIXIR, and I am looking forward to seeing it build and develop further.

I believe that the future of ELIXIR is very bright and that it will leverage the potential of life-science data for maximum impact on research, innovation and society at large.

Robert Gentleman
Chair of ELIXIR Scientific Advisory Board (2013-2017)

ELIXIR Annual Report 2017

Foreword by ELIXIR Director

This Annual Report illustrates the growth and maturity of our research infrastructure, highlighting the efforts of the 650 plus national experts involved in ELIXIR through our 21 ELIXIR Nodes, which collectively span over 200 institutes across Europe.

As I reflect on the major achievements of 2017, there are several events and themes that stand out: the publication of the initial list of ELIXIR Core Data Resources, the collective efforts in ELIXIR-EXCELERATE that resulted in a highly successful mid-term review, and the implementation of FAIR principles across our data resources and services.

Long-term sustainability

A key goal for ELIXIR, ever since the planning began for a distributed European data infrastructure, has been to develop a new model that ensures the long-term sustainability of the key life science databases and knowledgebases. In 2017, we made several important steps towards this goal.

In Prague, during the ISMB conference in July, we announced the initial list of ELIXIR Core Data Resources. It was the culmination of many months of committed effort by our Data Platform, Heads of Nodes, Scientific Advisory Board, our external evaluators and – not least – the representatives of all the ELIXIR data resources who embraced the process and participated in it.

Our work to establish ELIXIR Core Data Resources, and the very positive reactions we have received about these resources from the community, also strengthened our position in the dialogue with global funders. Throughout 2017, ELIXIR was an active participant in the initiative to establish a global coalition to sustain core data resources, initially facilitated by the Human Frontier Science Program Organization.

ELIXIR-EXCELERATE

In May 2017, our flagship project ELIXIR-EXCELERATE received very positive feedback on the activities and results of the first half of this project. During the project’s mid-term review meeting in May in Brussels, we had a very fruitful discussion with our external reviewer and received many useful recommendations for our future work.

The quality of our work in ELIXIR-EXCELERATE was also recognised by the European Commission; ELIXIR-EXCELERATE was cited as an early success story in the analysis of the Horizon 2020 Research Infrastructures programme published in May 2017 by the European Commission.

FAIR principles

In September, we published a position paper on FAIR Data Management in the life sciences, setting out our view on the guiding principles on FAIR Data Management. The position paper confirmed our commitment to supporting the FAIR data principles within the framework of the European Open Science Cloud (EOSC).

Good examples of how we support FAIR data are the continued success of the Bioschemas Community. In 2017, Bioschemas held a series of workshops and hackathons to engage with the life-science community and encourage their members to adopt and use the Bioschemas specifications and make their data discoverable by others. As more ELIXIR resources adopt the Bioschemas markup, we are building a key component of ELIXIR FAIR data infrastructure.

Looking ahead

The year 2018 is the last year of our current Scientific Programme and we are working hard on the next ELIXIR Scientific Programme for 2019–2023.

As a strategic document to set the future direction of ELIXIR, this new programme reflects the priorities and activities of ELIXIR Members and the needs of our user communities. Development of the ELIXIR 2019–23 Scientific Programme is a considerable undertaking that involves the large community of ELIXIR Node experts, the leaders of ELIXIR Platforms and Use Cases, and the Heads of ELIXIR Nodes.

I would like to take the opportunity to thank all of those involved in the work to develop the vision of ELIXIR in 2023, including those working on the technical roadmaps that will take us there, and on the development and delivery of services that have earned us the trust and commitment of users.

I look forward to the continued success of ELIXIR in this transformative era in the life sciences.

Niklas Blomberg

Platforms

ELIXIR Platforms

ELIXIR activities are structured around five Platforms and a growing portfolio of Use Cases (see the Use Case section).

The Platforms form the basic units of operation within ELIXIR, drawing on technical expertise and resources from ELIXIR Nodes. ELIXIR Platforms are built on the real and changing needs of established research communities. They are led by senior scientists from ELIXIR Nodes and are supported by a Platform coordinator at the ELIXIR Hub.

The ELIXIR Platforms comprise:
• Data: Sustaining Europe’s life-science data infrastructure
• Tools: Services and connectors to drive access and exploitation
• Interoperability: Supporting the discovery, integration and analysis of biological data
• Compute: Storage, computing and authentication / access services
• Training: Professional skills for managing and exploiting data

The activities of the Platforms are primarily funded by the ELIXIR-EXCELERATE project, in which each Platform is represented by a Work Package. Each Platform also manages an expanding set of ELIXIR Implementation Studies, funded through the ELIXIR Hub budget. Additional activities are funded through other EU grants (CORBEL, EOSC and others).

[Figure: ELIXIR Platforms (Data, Training, Compute, Interoperability, Tools) and Communities (Rare diseases, Human data, Marine metagenomics, Plant science, Proteomics, Metabolomics, Galaxy)]

Tools

Services and connectors for access and exploitation

The ELIXIR Tools Platform supports the discovery, quality and sustainability of software resources, initially from ELIXIR Nodes, and later from the wider life sciences community.

The main objectives of this Platform are: to help users find, access, re-use, deploy and benchmark software tools, including workflows; and to help software providers and developers to better describe and develop software tools, including workflows.

The ELIXIR Tools Platform coordinates technical activities, and engages end-users across seven technical groups: (1) Bio.tools; (2) Scientific benchmarking and technical monitoring; (3) Software deployment; (4) Workflows and workbenches; (5) Software development best practices; (6) Tools interoperability; and (7) Galaxy Workflow Management system. In 2017, the Platform focused its activities on Bio.tools, Scientific benchmarking, Software deployment, and Software development best practices, which are highlighted below.

Bio.tools: ELIXIR tools and services registry

This technical group aims to deliver a world-leading discovery portal (bio.tools) for bioinformatics software information. In 2017, the content of bio.tools expanded to over 10,000 entries and migrated to a new version of the data model (biotoolsSchema 2.0). The usability of the portal was also improved by the development of new features, such as content sharing, and the enrichment of results with literature data. In September, the Platform published the first (beta) versions of Tool Information Standards and bio.tools Curation Guidelines [1].

The bio.tools developers published two new research papers on the integration of bio.tools and workbench environments. The first paper was published in April in GigaScience and presented ReGaTE, a software utility to automate the registration of services available in a Galaxy instance in bio.tools [2]. The second paper presented ToolDog (Tool DescriptiOn Generator), which facilitates the integration of tools registered in bio.tools into workbench environments [3].

The Tools Platform also started a joint Implementation Study with the ELIXIR Training Platform to integrate ELIXIR portals from a user perspective [4]. In addition, the Galaxy Working Group was selected in October 2017 to become an ELIXIR Community and is planning its first community meeting in March 2018 (see Chapter New Communities).

Scientific benchmarking and technical monitoring

Scientific benchmarking and technical monitoring of tools and services provides an objective way to ensure that bioinformatics tools are available, stable, robust and fit for purpose. In 2017, the first OpenEBench prototype was released at http://elixir.bsc.es. This Benchmarking platform will provide guidance and software infrastructure for benchmarking and technical monitoring of bioinformatics tools, web-servers and workflows. The first workshop organised around OpenEBench was held in September 2017 in Basel.

The experience gained from establishing the ELIXIR Benchmarking Platform was published in a working paper, Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking [5].

Software deployment

The main goal of the Software deployment group is to support existing community efforts with their work on bioinformatics software deployments using traditional packages and containers. In October 2017, the ELIXIR Tools Platform organised an ELIXIR hackathon in Paris, France, on Biocontainers to accelerate and consolidate the Containers platform as a Service, and to integrate it with other groups within the ELIXIR Tools Platform. As a result of this event, a new ELIXIR Biocontainers Implementation Study was approved to begin in January 2018.

The Workflow and workbenches group was established by the ELIXIR Tools Platform to support the integration of bio.tools with integrated environments (Galaxy, Taverna, etc.) and to develop the link between discovery and execution. This group developed the ToolDog tool to facilitate the integration of tools registered in bio.tools into workbench environments [3].

[Figure: Number of entries in the bio.tools registry, January 2017 - January 2018. Vertical axis: number of entries, 3000 to 11000; horizontal axis: months from Jan 17 to Jan 18.]

Software development best practices

In July 2017, the Software Development Best Practice Working Group published a new paper to encourage developers, research institutes, and companies to adopt four best practices for open source development of life science research software [6].

The paper is the outcome of year-long discussions and deliberations driven by the Platform, together with the Software Sustainability Institute and the Netherlands eScience Center. They involved a wide range of researchers and developers, representing over 40 different institutes and organisations. As such, the recommendations present a broad consensus of the life-science research community.

The ELIXIR Tools Platform is funded through the ELIXIR-EXCELERATE project (Work Package 1: Tools Interoperability and Service Registry, and Work Package 2: Benchmarking) and Implementation Studies commissioned by the ELIXIR Hub.

1. http://biotools.readthedocs.io/en/latest/curators_guide.html
2. Doppelt-Azeroual O, Mareuil F et al. ReGaTE: Registration of Galaxy Tools in ELIXIR. GigaScience 2017, 6(6): 1-4 (doi: 10.1093/gigascience/gix022)
3. Hillion KH, Kuzmin I, Khodak A et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
4. https://www.elixir-europe.org/activities/elixir-integration-user-perspective
5. Capella-Gutierrez, S. et al. Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking. 2017. URI: http://hdl.handle.net/2117/107279
6. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)

Data

Sustaining Europe’s life-science data infrastructure

The ELIXIR Data Platform provides a framework for developing ELIXIR’s sustainability strategy for life-science data resources. The main goal of the Platform is to establish quality metrics for data resources, to identify the key data resources across all life science domains, and to make data resources easier to find and access.

In 2017, the main focus of the Platform was the selection of ELIXIR Core Data Resources and ELIXIR Recommended Deposition databases. The Data Platform also piloted, for the first time, the selection of Implementation Studies through peer review involving external reviewers.

ELIXIR Core Data Resources and Recommended Deposition Databases

ELIXIR Core Data Resources are the primary focus of the Data Platform’s activities. Core Data Resources are a set of European life-science data resources (deposition archives and knowledgebases) that are of fundamental importance to life-science research and to the long-term preservation of biological data.

In 2017, the Data Platform concluded the first round of the process to select the ELIXIR Core Data Resources and announced the initial list. In addition to this list of Core Data Resources, ELIXIR compiled a list of databases that it recommends be used for the deposition of experimental data. The goal was to provide guidance to journals and funders on the appropriate repositories in which to publish open data in the life sciences. Many funding agencies have already shown an interest in recommending that their grantees use ELIXIR’s Deposition Databases.

The Core Data Resources form the backbone of ELIXIR’s sustainability strategy. The monitoring and evaluation of their usage will provide reliable measures of their scientific and economic value and will highlight the benefits of generating a sustainable infrastructure for open biological data.

Following the selection process, the Data Platform and the Core Data Resources leads agreed on a set of metrics on the Core Data Resources to be gathered annually, as a way of demonstrating their value and managing their life cycles. The plan for collating these quality indicators of ELIXIR’s Core Data Resources was published in August 2017 [1].

In September 2017, the Platform began the second round of the selection process for both ELIXIR Core Data Resources and ELIXIR Deposition Databases. The selection and evaluation will take place regularly and further resources will be included as the ELIXIR data infrastructure evolves.

Global coalition to sustain core data resources

ELIXIR is an active member of the initiative to establish a global coalition to sustain core data resources, which brings together senior managers of key databases and leaders of major funding organisations across the world.

In March 2017, the group published a call-for-action for a global coalition to sustain core data resources [2], which identified the main shortcomings of the current funding of life-science data infrastructures and which called for more fit-for-purpose infrastructure funding models. The work done by ELIXIR in establishing Core Data Resources in Europe and in identifying indicators to assess the importance of data resources has provided this global coalition with a framework that can be developed globally.

Data Platform Implementation studies

In 2017, the Data Platform ran a peer review process to select a portfolio of Implementation Studies linked to the Platform to begin in 2018. The Platform published a Request for Proposals, along with the Criteria for Evaluation. The selection process relied on the assessment of independent experts in bioinformatics and bioinformatics service provision. Of the 17 applications received, seven studies were selected, involving 13 ELIXIR Nodes.

This is the first time ELIXIR Implementation Studies have been selected via independent peer review. The goal of this approach was to drive excellence in the proposed work, to increase the transparency of ELIXIR resource allocation, and to provide feedback to ELIXIR Nodes regarding their service provision plans and objectives.

ELIXIR Core Data Resources

ArrayExpress: Functional Genomics Data from high-throughput functional genomics experiments.

CATH: A hierarchical domain classification of protein structures in the Protein Data Bank.

ChEBI: Dictionary of molecular entities focused on ‘small’ chemical compounds.

ChEMBL: Database of bioactive drug-like small molecules, it contains 2-D structures, calculated properties and abstracted bioactivities.

EGA: A database of personally identifiable genetic and phenotypic data resulting from biomedical research projects.

ENA: A database of nucleotide sequencing information, covering raw sequencing data, sequence assembly information, and functional annotation.

Ensembl: Genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation.

Ensembl Genomes: Genome browser for non-vertebrate genomes that supports comparative analysis, data mining, and visualisation of non-vertebrate genomes.

Europe PMC: A repository that provides access to world-wide life sciences articles, books, patents and clinical guidelines.

Human Protein Atlas: A knowledgebase that contains most of all human protein-coding genes, and information about the expression and localization of the corresponding proteins based on both RNA and protein data.

The IMEx Consortium, represented by IntAct and MINT: IntAct provides a freely available, open source database system and analysis tools for molecular interaction data. MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators.

InterPro: An umbrella resource to which many collaborating databases contribute. It enables the functional analysis of protein sequences, performed by classifying sequences into families and predicting the presence of domains and important sites.

PDBe: A database of biological macromolecular structures.

PRIDE: A database of mass spectrometry-based proteomics data, including peptide and protein expression information (identifications and quantification values), and the supporting mass spectra evidence.

STRING-db: A database of known and predicted protein-protein interactions.

UniProt: A comprehensive repository and resource for protein sequence and annotation data.

Long-term funding models for bioinformatics databases

To better understand the challenges in developing and securing long-term sustainable funding for key bioinformatics resources, ELIXIR funded an Implementation Study to explore and review sustainable funding models for a specific case: the UniProt knowledgebase.

The Implementation Study ran from January to December 2017 and was carried out by ELIXIR Switzerland (SIB, Swiss Institute of Bioinformatics). The study resulted in a paper, published in the ELIXIR F1000R channel in November 2017 [3]. The article presented and reviewed twelve funding models for data resources and applied them to the specific case of UniProt. It showed that most of the models lead to inconsistencies with open access or equity policies and proposed a new Infrastructure Model whereby funding agencies would set aside a fixed percentage of their research grant volumes, which would subsequently be redistributed to core data resources.

The ELIXIR Data Platform is funded through the ELIXIR-EXCELERATE project (Work Package 2: Data Resources and Services) and through ELIXIR Implementation Studies commissioned by the ELIXIR Hub.

1. Stockinger, H., Barlow, M., Cook, C. et al. Plan for collation of metrics and quality data at the ELIXIR Hub. Zenodo 2018. (doi: 10.5281/zenodo.1194123)
2. Data management: A global coalition to sustain core data. Nature 543, 179 (09 March 2017) (doi: 10.1038/543179a)
3. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case. F1000Research 2017, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.1)

Compute

Access, exchange and storage

The ELIXIR Compute Platform is developing a robust technical infrastructure for accessing, transferring, exchanging and analysing biological data. It aims to provide cloud, computing, storage and access services for the research community.

The objective is to integrate the individual technical components provided by the ELIXIR Nodes into a seamless service provision system for the life-science research community. The Compute Platform works closely with the ELIXIR Use Cases and with the ELIXIR Training Platform to ensure that technical solutions address their specific needs.

In 2017, the development of the Compute Platform services resulted in the first release of services to end-users.

ELIXIR Authentication and Authorisation Infrastructure (AAI)

The ELIXIR AAI allows a user to create an ELIXIR identity based on a pre-existing identity (e.g. Google, ORCID or the researcher’s home university) to authenticate access to multiple infrastructure services. This allows users to use their existing accounts to access ELIXIR services and helps operators of ELIXIR resources to manage user access to their services.

In 2017, several resources and services adopted ELIXIR AAI, including the AAI Gateway of the European Grid Infrastructure (EGI) to provide access to EGI services. The relying service provider network using the ELIXIR identity that underpins ELIXIR AAI also includes two commercial cloud service providers within the Helix Nebula Science Cloud project, and integrates with EUDAT’s B2ACCESS service [1], which allows access to EUDAT services.

In addition to receiving ELIXIR-EXCELERATE funding, the AAI tasks were also supported through an ELIXIR Implementation Study in 2017, which accelerated the development and integration of the AAI services needed to support human data.

The ELIXIR AAI is currently a production service used by an increasing number of service providers within the ELIXIR community, with three hundred identity providers, more than 1000 users, and tens of services connected to date.

Transfer of large volumes of sensitive human data – demonstrators

Moving large volumes of data between sites, while maintaining confidentiality and security, is a key capability of the Compute Platform. In 2017, the Platform deployed GridFTP servers and integrated them with the ELIXIR AAI to enable regular file transfers between nine GridFTP servers, whilst providing a record of reliability and network performance. The demonstration of the transfer of sensitive data between the GridFTP servers was presented in November 2017, during an ELIXIR Webinar, which gave an overview of improved support to researchers in accessing and processing sensitive data from European Genome-Phenome Archive (EGA, http://www.ega-archive.org).

An ELIXIR Implementation Study which started in November 2017 will support a number of ELIXIR Nodes in testing and integrating this service into production.

Cloud resources integration

The goal of the cloud integration task within the Compute Platform is to integrate the cloud resources affiliated with the ELIXIR Nodes. The Platform has been evaluating the EGI Federated Cloud model, especially within the context of the emerging European Open Science Cloud (EOSC) initiative, where it may be used as the federation model. The ELIXIR Compute Platform is engaging with EOSC through an ELIXIR Competency Centre, funded as part of the EOSC-Hub project.

The work of the ELIXIR Compute Platform was funded through the ELIXIR-EXCELERATE project (Work Package 4: Compute, Data access and exchange services) and Implementation Studies commissioned by the ELIXIR Hub. The development of the ELIXIR AAI was informed by close collaboration with the AARC and AARC2 (Authentication and Authorisation for Research and Collaboration) projects.

1. https://www.eudat.eu/services/b2access

[Figure: Number of ELIXIR AAI users and number of research institutions enabled for ELIXIR AAI login. Series: ELIXIR users; Home Organisation Identity Providers enabled for login (eduGAIN). Vertical axis: 0 to 1600; horizontal axis: Jan 17, Oct 17, Apr 18.]

[Figure: Number of registered resource providers connected to ELIXIR AAI. Vertical axis: 0 to 100; horizontal axis: Jan 17, Oct 17, Apr 18; data labels: 9, 61, 79.]

Interoperability

Integration of data and services

The ELIXIR Interoperability Platform aims to support people and machines in finding, combining and reusing datasets, as well as individual data records from different sources, across institutional, geographical and scientific domains. Implementing and promoting the ‘FAIR’ principles [1] (Findable, Accessible, Interoperable and Reusable) of data stewardship, the Platform encourages the life-science community to adopt standardised formats, metadata, vocabularies and identifiers. The Platform is driven by the needs of ELIXIR’s Use Cases and collaborates globally through a number of initiatives, including the Research Data Alliance, FORCE11, U.S. NIH data commons projects, and others.

Emerging interoperability services

In 2016, the major outcome of the Platform was the publication of the Interoperability Platform Roadmap, which defines the Platform’s technical and scientific strategy and outlines requirements for ELIXIR’s Interoperability services. In 2017, the Interoperability Platform started to implement this Roadmap and presented the first components of the emerging portfolio of ELIXIR interoperability services.

The activities that fall within the Interoperability Platform are divided into seven projects: (1) Service Framework; (2) Resource Markup (Bioschemas); (3) Identifiers (including mapping services); (4) Metadata services and Standards Registry; (5) Linked Open Data; (6) Workflow and Tools Interoperability (Common Workflow Language); and (7) Interoperability Knowledge Hub.

In 2017, the Interoperability Platform focused its attention and efforts on Service Framework, Bioschemas, Common Workflow Language, and Identifiers. The Platform has also worked within the EOSCPilot project to pilot Data Catalogue interoperability, Common Workflow Language adoption, and identifier schemes within the European Open Science Cloud (see EU Grants section).

Service Framework

The Interoperability Platform Roadmap defined general components of the interoperability service stack and identified existing resources – both within and outside ELIXIR – that can deliver these components most effectively. The needs analysis was captured in the Service Framework for the Interoperability Backbone.

The first ELIXIR interoperability services selected as part of the Service Framework are as follows: Identifiers.org (https://identifiers.org); Ontology Lookup Service (https://www.ebi.ac.uk/ols); and FAIRsharing (https://fairsharing.org/, formerly BioSharing). The Platform developed a work plan to improve their interoperability with other Platform registries and to facilitate their longer-term sustainability. The Interoperability Platform also drafted criteria for FAIR Interoperability services and registries, resulting in the development of procedures and processes (SOPs) through which other candidate registries and resources can be incorporated into the Platform.

Bioschemas – universal markup for datasets and biological entities

Bioschemas is a community-driven metadata vocabulary and markup based on schema.org, which is tailored for the life sciences. In 2017, Bioschemas mobilised the life-science community to develop description profiles for data resources and data types.

Bioschemas also encouraged researchers to develop, adopt, and use the specifications through a series of workshops in Hinxton, UK, and at various international meetings, including the Open Science Fair in Athens. In October 2017, in Hinxton, representatives from over 30 different biological resources, including major international resources such as UniProt and PDBe, tested and adopted at least one of the Bioschemas specifications.

As part of this effort, Bioschemas also expanded its scope and developed specifications for more types of life-science data. By the end of 2017, twelve different specifications [2] had been developed, describing the general properties of datasets, as well as specific markup for data types, samples, proteins, and markup for other bio resources, such as laboratory protocols and tools. Tools to assist with markup, validation and indexing are currently being piloted.
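To make the idea of resource markup more concrete, the following is a minimal sketch of how a data resource might describe one dataset using the schema.org vocabulary on which Bioschemas is based. It assumes only the generic schema.org Dataset type and four of its standard properties; it does not reproduce any specific Bioschemas profile, and the record, its URL and its identifier are hypothetical placeholders.

```python
import json


def dataset_markup(name, description, url, identifier):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }


def as_script_tag(markup):
    """Wrap JSON-LD in the script element used to embed it in a web page."""
    body = json.dumps(markup, indent=2, sort_keys=True)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record, used only for illustration.
example = dataset_markup(
    name="Example expression dataset",
    description="Placeholder description of a life-science dataset.",
    url="https://example.org/datasets/42",
    identifier="example:42",
)
print(as_script_tag(example))
```

Embedding a block like this in a dataset landing page is what allows search engines and registries to discover the record without knowing the resource's internal data model.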

Common Workflow Language
The Common Workflow Language (CWL) supports the reproducibility and interoperability of workflows and analysis tools. It helps scientists and bioinformaticians to describe analysis tools and workflows, which can then be used across a variety of platforms. The Interoperability Platform works in close partnership with the CWL grassroots community and has adopted CWL’s approach to improve reproducibility and interoperability of life science data and research.

In 2017, the Platform worked with the Marine Metagenomics Use Case on the description of their analysis pipelines. This led to a funded ELIXIR Implementation Study in 2018 to support the integration of CWL with ELIXIR tools, and the interoperability of pipelines with international partners. Along with the Tools and Training Platforms, the Interoperability Platform initiated a programme of workshops, best practice guides, and community engagement efforts, notably with the Galaxy Community. Support for CWL in Galaxy is on the official project roadmap. It is currently being implemented, and a significant part of the CWL standard is already implemented in Galaxy’s backend. The next important step is to adapt its web frontend.

Identifiers
The vast majority of data collections in the life sciences are now accessible online. It is therefore crucial to have a stable and persistent means to identify and reference data. Identifiers.org, run by EMBL-EBI, is an established resolving system, which provides resolvable identifiers for life-science resources (databases). In 2017, the Interoperability Platform ran an ELIXIR Implementation Study to establish Identifiers.org as a core ELIXIR service to provide stable and resolvable identifiers for life-science data. This Implementation Study updated the registry that underpins the service, to include the majority of resources run by ELIXIR Nodes.

To address the needs of scientific journals, Identifiers.org has implemented compact identifiers, providing an easy and human-readable way of referencing data in scientific papers. This work was organised through a Force11 identifiers group and carried out in collaboration with an equivalent meta-resolver, name-2-thing (n2t), which is based at the California Digital Library, USA. This collaboration resulted in the implementation of a global resolution of compact identifiers for biomedical data, which was presented in Nature Scientific Data in 2018 [3].

Experts within ELIXIR’s Interoperability Platform lead the data interoperability task within the EOSCPilot project. In 2017, this work culminated in a first draft of the strategy and recommendations to help users and services to find and access datasets across several scientific disciplines (see more in the EU Grants section).

Looking ahead
The overall goal of the Interoperability Platform remains to enable scientists to find, access, combine, and analyse multiple datasets. In addition, it aims to enable data and service providers to deliver findable, accessible, interoperable and reusable datasets and services, and to facilitate the adoption of global standards.

In 2018, the Interoperability Platform will build on the work of 2017. It will also focus on establishing its first recommended portfolio of services, and on improving its guidelines and knowledge hub for ELIXIR members. It will propose criteria and a selection procedure for ELIXIR recommended interoperability services and it will organise a call for proposals for ELIXIR interoperability services. An ELIXIR Implementation Study will also focus on validation services for metadata and common formats for the Platform’s partners to use. Proof of concept validation services will also be implemented for the ELIXIR Plant Use Case, among others.
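
The compact identifiers described in the Identifiers section take the form prefix:accession, and a meta-resolver turns that string into a resolvable web address. The sketch below illustrates the idea in Python. It is a minimal illustration, not the Identifiers.org implementation: the helper names (parse_compact_identifier, resolver_url), the example identifiers, and the choice to split at the first colon are assumptions made for this example, and the resolver base URLs are assumed to follow the pattern resolver/prefix:accession.

```python
from typing import NamedTuple


class CompactIdentifier(NamedTuple):
    prefix: str     # namespace of the data collection, e.g. "taxonomy"
    accession: str  # local identifier inside that collection, e.g. "9606"


# Base URLs of the two meta-resolvers named in the text (assumed URL pattern).
RESOLVERS = {
    "identifiers.org": "https://identifiers.org/",
    "n2t": "https://n2t.net/",
}


def parse_compact_identifier(text: str) -> CompactIdentifier:
    """Split 'prefix:accession' at the first colon (hypothetical helper)."""
    prefix, sep, accession = text.strip().partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {text!r}")
    # Prefixes are lower-cased here for consistency; the accession is kept as given.
    return CompactIdentifier(prefix.lower(), accession)


def resolver_url(text: str, resolver: str = "identifiers.org") -> str:
    """Build the address a reader could follow to resolve the identifier."""
    cid = parse_compact_identifier(text)
    return f"{RESOLVERS[resolver]}{cid.prefix}:{cid.accession}"


if __name__ == "__main__":
    # Illustrative identifiers only; any registered prefix works the same way.
    print(resolver_url("taxonomy:9606"))
    print(resolver_url("pdb:2gc4", resolver="n2t"))
```

Because both resolvers accept the same prefix:accession string, a paper can cite the short form once and leave the choice of resolver to the reader.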

The work of the ELIXIR Interoperability Platform was funded through the ELIXIR-EXCELERATE project (Work Package 5: the ELIXIR Interoperability Backbone). The work on Identifiers.org and Bioschemas was funded by the ELIXIR Hub through ELIXIR Implementation Studies. Other sources of grant funding, including the CORBEL, EOSCPilot, and BioExcel projects, have contributed to activities within the Platform.

1. Wilkinson MD, Dumontier M et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 2016, 3:160018 (doi: 10.1038/sdata.2016.18)

2. http://bioschemas.org/specifications/

3. Sarala M. Wimalaratne et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5:180029 doi: 10.1038/sdata.2018.29 (2018).

[Figure: Interoperability Platform Services Framework. Panels: Standards and APIs; Applications; Integration; Pipelines. Elements shown: identifier resolution, versioning and provenance; identifier mapping; citation implementation; identifier authority; standards registry; tools registry; workflows registry; search and query; ontology; linked data; annotation and curation; data integration; API description; tools and workflow descriptions; dataset description; validation services; BYOD; BYOW; BYOAPI.]

Training
Professional skills for managing and exploiting data

The provision of bioinformatics training to help life-science researchers work effectively with ELIXIR resources is a key priority. The ELIXIR Training Platform is therefore building a sustainable training infrastructure that aims to provide access to training courses, tools, resources, and expertise, including quality assurance and monitoring. In 2017, with the delivery and completion of the basic elements of the Training Infrastructure, the ELIXIR Training Platform entered its second phase of the ELIXIR-EXCELERATE grant. This second phase aims to extend, strengthen and consolidate current efforts, and to apply our defined standards and descriptors to all content produced by the Platform.

The activities of the Training Platform are made up of the following components: (1) ELIXIR Courses for researchers, developers and trainers, (2) Training evaluation, and (3) Training infrastructure, including the ELIXIR Training Portal TeSS, the ELIXIR-SI e-learning platform, and the Virtual Coffee Room. These activities are implemented in the ELIXIR Nodes under the responsibility of the Training Coordinators Group (TrCG).

ELIXIR Training Courses
In 2017, the ELIXIR Training Platform organised 44 training events for researchers, trainers and developers under the umbrella of ELIXIR-EXCELERATE; several hundred training events have been organised by the Nodes themselves.

One example of such training for researchers is the Software and Data Carpentry workshops. After a successful pilot [1], ELIXIR has set up an agreement with the Carpentry Foundation to roll out the provision of Software and Data Carpentry courses within ELIXIR, to build an instructor pool, and thus to empower researchers to more easily access the ELIXIR infrastructure.

The Platform also concluded the Train-the-Trainer (TtT) Pilot programme, which consisted of seven pilot courses organised in 2016 and 2017, the development of training materials, and the systematic collection of feedback from course participants. The results and lessons learnt were published in a paper in the ELIXIR F1000R channel [2], and all materials produced are available online [3]. The TtT programme has been running at full speed since then, and will be further expanded from 2018 on with the ELIXIR TtT Exchange, which will facilitate the participation of ELIXIR members in TtT courses.

To make sure ELIXIR courses effectively exploit computing resources, the Training Platform started an Implementation Study to define the best mechanisms to request and use ELIXIR Node cloud resources for bioinformatics training. Ready-to-run virtual machines that contain an operating system and pre-installed analysis software should improve the portability and reproducibility of ELIXIR courses.

1. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040, (doi:10.12688/f1000research.11718.1)

2. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)

3. https://github.com/TrainTheTrainer/EXCELERATE-TtT

Figure 1: Overall satisfaction of participants of ELIXIR Training courses. Excellent 10%; Very good 68%; Good 16%; Satisfactory 5%; Poor 1%.

Figure 2: Concrete ways ELIXIR training courses helped their participants. It improved my ability to handle data 44%; It improved my interactions with the bioinformatician analysing my data 28%; It improved my overall efficiency 22%; It did not help as I do not use the resources covered in the course 6%.

Training evaluation
Using a consistent set of Key Performance Indicators (KPIs), the Training Platform started to systematically collect feedback from courses organised with the support of the ELIXIR-EXCELERATE project and of ELIXIR Nodes.

The data collected to date cover 144 courses that were organised between September 2016 and July 2017, from 2,200 respondents, across twelve participating ELIXIR Nodes. According to the survey, nearly 80% of respondents thought that the course they attended was ‘excellent’ to ‘very good’ (Figure 1), and 87% of them would recommend their course to colleagues. The data collected six months after participants had attended a course indicate that attending a course has improved their ability to handle data or has improved their work efficiency (Figure 2).

Following a detailed evaluation of the KPIs used and the results collected to date, the evaluation methodology has been officially adopted by ELIXIR and will be used to collect information for all ELIXIR training courses. The next iteration of ELIXIR training course feedback data collection began in August 2017 and will continue until July 2018.

Training Infrastructure

TeSS: ELIXIR Training portal
The TeSS portal allows scientists to browse, discover and organise life-science training events and materials that have been aggregated from ELIXIR Nodes and third-party providers (e.g. RI-Train, BioEXCEL, GOBLET). By the end of 2017, it included nearly 300 events and over 800 training materials from 45 providers.

In 2017, the TeSS team started an ELIXIR Implementation Study to cross-link materials in TeSS with relevant tools registered in the ELIXIR Tools and Service Registry (bio.tools). The ultimate goal is to enable researchers to discover and use ELIXIR resources across domains and platforms through an intuitive graphical interface, based on diagrams of the most commonly used bioinformatics workflows.

ELIXIR e-Learning
Throughout the year, ELIXIR courses were broadcast via a video conferencing system to allow numerous, geographically distributed users to attend training. Bioinformatics tools and services were also embedded into the ELIXIR-SI eLearning Platform, easing the access of course participants to HPC, cloud-based resources, and to containers, overcoming the technical problems that participants can experience when trying to access training resources remotely. Overall, eleven courses and over 320 course participants benefited from the ELIXIR e-learning resources.

Virtual Coffee Room
The Virtual Coffee Room (VCR), a web-based platform, was released in March 2017 to ease the exchange of information among ELIXIR developers and trainers, to share questions, tasks and issues about software development among developers, and also to more quickly identify training needs. In the future, additional uses for the VCR will be explored by other ELIXIR communities, for instance as a help-desk platform for ELIXIR services.

Outlook
In 2018 and the remainder of the first ELIXIR programming cycle (2014–2019), the Training Platform will further develop a coherent portfolio of Train-the-Developer, Train-the-Researcher, and Train-the-Trainer courses, and it will further expand the Platform’s training activities in areas identified as training needs, such as data stewardship, data management, and training in ELIXIR resources. Another area of focus will be to connect individual training courses into learning paths that are tailored to the competencies and needs of individual researchers, including in industry.

The ELIXIR Training Platform is funded through the ELIXIR-EXCELERATE project (Work Package 11: ELIXIR Training Programme). The Platform actively collaborates with partner initiatives and projects (GOBLET, Software and Data Carpentries, CORBEL, RITrain and others).

Platform leaders

Tools Data

Søren Brunak Jo McEntyre Christine Durinx

Compute

Ludek Matyska Steven Newhouse Tommi Nyrönen

Interoperability

Carole Goble Chris Evelo Helen Parkinson

Training

Patricia Palagi Celia van Gelder Gabriella Rustici

ELIXIR Use Cases and Communities

The ELIXIR Use Cases drive the work of the ELIXIR Platforms by defining their bioinformatics needs and requirements. This close collaboration ensures that the services developed by the ELIXIR Platforms are fit for purpose and serve the needs of their research communities.

The activities of the existing ELIXIR Use Cases have so far been funded principally through the ELIXIR-EXCELERATE grant. They bring together experts to develop specialised standards and services in their respective domains, and also provide feedback on the Platform services, helping to ensure that they are practical and useful.

In September 2017, the ELIXIR Heads of Nodes agreed to continue the four existing ELIXIR Use Cases, and also to establish three new ‘Communities’, each serving a specific group of researchers. The current portfolio of ELIXIR Use Cases and Communities thus now consists of:

• Human data: Developing long-term strategies for managing and accessing sensitive human data
• Rare diseases: Supporting the development of new therapies for rare diseases
• Marine metagenomics: Developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science
• Plant science: Developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species
• Proteomics: Supporting research on the expression and interaction of proteins
• Metabolomics: Providing infrastructure services for metabolite identification
• Galaxy: Integrating the Galaxy platform with ELIXIR resources and services

Use Cases as emerging Communities
With the first programme cycle coming to an end in 2018, the ELIXIR Heads of Nodes Committee reviewed in 2017 the existing model of four Use Cases: Human Data, Plant Sciences, Rare Diseases, and Marine Metagenomics. Following this evaluation, the Heads of Nodes Committee decided to change the name of Use Cases to better reflect how they are organised and how they contribute to ELIXIR’s development.

Starting in 2019, when ELIXIR’s next Scientific Programme (2019–2023) begins, Use Cases will be referred to as ‘Communities’. This will avoid ambiguity, especially in systems and software engineering, where the term ‘use case’ has a well-established meaning. The word ‘community’ also better characterises how ELIXIR coordinates its expertise and services for scientists within a particular domain.

Communities will function similarly to Use Cases, and their activities will be funded through a variety of sources, including Hub-funded Commissioned Services, project-based funding from the European Commission, and commitments from ELIXIR Nodes.

Human data Use Case

The ELIXIR Human Data Use Case is building the technical infrastructure required for researchers to discover, combine, and exchange controlled-access human data, while complying with data-privacy and data-security requirements.

The backbone of the Human Data Use Case is the European Genome-phenome Archive (EGA) [1]. The Use Case extends and generalises the EGA system of access authorisation and secure data transfer, and makes it available to researchers across the ELIXIR Nodes.

The Use Case works closely with the Global Alliance for Genomics and Health (GA4GH) in developing and establishing global standards for the sharing and exchange of genomics data.

ELIXIR Beacons
The GA4GH Beacon is a lightweight web platform that allows any genomic data centre in the world to make its data discoverable. Users can ask Beacons straightforward yes or no questions like, ‘Do any of these data resources have genomes with this allele at that position?’ The search result informs a researcher as to whether making a data access request is required for their research, saving valuable time and resources.

The collaboration between ELIXIR and GA4GH on the ELIXIR Beacon Project expanded in 2017 to develop the network of ELIXIR Beacons and to improve the discoverability of European genomics data. Additional goals of the collaboration were to develop new features and to add security measures to attract stakeholders with more sensitive data sets while minimising risks to individual privacy.

The new ELIXIR Beacon Network will allow users to query all ELIXIR Beacons simultaneously. The integration of the Beacon network with the ELIXIR Authentication and Authorisation Infrastructure (AAI) will also enable streamlined access to sensitive human data in a secure and safe manner.

In October 2017, during the GA4GH 5th Plenary Meeting in Orlando, USA, the Beacon project was named one of the GA4GH Driver projects, with the aim of driving and guiding the development of global standards for genomics data. The development of the Beacon network continues in 2018, and the first release of the ELIXIR Beacon specifications is planned for the last quarter of 2018.

The transfer of large volumes of sensitive human data
In 2017, the Use Case worked closely with the ELIXIR Compute Platform to allow the secure transfer of sensitive human data stored within the European Genome-phenome Archive, using ELIXIR AAI. A demonstration of this sensitive data transfer was presented in November 2017 to researchers via an ELIXIR Webinar, giving an overview of the improved support in accessing and processing sensitive data from the European Genome-phenome Archive (EGA).

Facilitating the re-use of human data
Following up on the results of the ELIXIR Implementation Study ‘Genomic data management for TraIT using the EGA’, the Human Data Use Case published a second research paper, presenting a technical solution for linking a Dutch data portal (TranSMART) and Galaxy with EGA, for the reuse of human translational research data [2].

TraIT (Translational research IT) is a project by the Dutch Center for Translational Molecular Medicine (CTMM) to implement an IT infrastructure for translational biomedical research. Linking TraIT’s TranSMART portal and the Galaxy platform with EGA enabled Dutch researchers to use EGA as the long-term storage solution for raw data.

1. http://www.ega-archive.org/

2. Zhang C, Bijlard J, Staiger C et al. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data. F1000Research 2017, 6:14889 (doi: 10.12688/f1000research.12168.1)
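
The yes-or-no question that a Beacon answers, and the idea of a Beacon Network that asks several Beacons at once, can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: it is not the GA4GH Beacon API, the class and function names (AlleleQuery, Beacon, query_network) are hypothetical, and the variant data are made up for the example. A real Beacon would sit behind a web service and, for sensitive data, behind ELIXIR AAI.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class AlleleQuery:
    """'Do you have genomes with this allele at that position?' (hypothetical type)."""
    chromosome: str
    position: int
    allele: str


@dataclass
class Beacon:
    """A data centre that reveals only whether an allele has been observed."""
    name: str
    # Each entry is (chromosome, position, allele); invented data for illustration.
    known_alleles: Set[Tuple[str, int, str]] = field(default_factory=set)

    def has_allele(self, query: AlleleQuery) -> bool:
        key = (query.chromosome, query.position, query.allele.upper())
        return key in self.known_alleles


def query_network(beacons: Iterable[Beacon], query: AlleleQuery) -> Dict[str, bool]:
    """Ask every Beacon the same question and collect the yes/no answers."""
    return {beacon.name: beacon.has_allele(query) for beacon in beacons}


if __name__ == "__main__":
    network = [
        Beacon("beacon-a", {("1", 12345, "T")}),
        Beacon("beacon-b", {("2", 500, "G")}),
    ]
    answers = query_network(network, AlleleQuery("1", 12345, "t"))
    for name, found in answers.items():
        print(f"{name}: {'yes' if found else 'no'}")
```

Only a boolean leaves each Beacon, so a researcher learns where a data access request is worth making without any individual-level data being exposed.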

Rare diseases Use Case

The ELIXIR Rare Diseases Use Case aims to build a portfolio of ELIXIR resources to address the needs of the rare diseases research community. The goal of this Use Case is to create a federated infrastructure that will enable researchers to discover, access, and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructures and projects, namely RD-Connect, BBMRI-ERIC and E-Rare.

Catalogue of Rare Disease data resources and tools
Based on a user survey carried out in 2016 and 2017, the Rare Diseases Use Case prioritised the first resources to include in the ELIXIR infrastructure and published a catalogue of resources, data sources and methods for the RD communities. The first version of the resulting catalogue – released in February 2017 – comprised 51 resources and analysis tools. Throughout 2017, the catalogue expanded to over 100 different resources, most of which are now registered in the ELIXIR Service and Tools registry (bio.tools) as part of a specific Rare Diseases collection [1].

The rare disease portfolio of resources will be regularly evaluated using a selection of datasets and benchmarking strategies. The development of the evaluation procedure was informed by the ELIXIR Tools Platform as part of the ELIXIR benchmarking strategy.

Rare Disease training capacity and needs survey
In the first half of 2017, the Rare Disease Use Case carried out a survey to collect data about the training capacity and needs of the rare diseases research community. The goal was to use the data to develop new training courses and workshops that will be developed in collaboration with the ELIXIR Training Platform and will address the existing skills gap within the community.

Rare Diseases Implementation Studies
An ELIXIR Implementation Study was set up to connect the RD-Connect platform for rare disease research [2] with the European Genome-phenome Archive (EGA) [3]. This Implementation Study developed a solution for visualizing the data stored in the EGA by allowing authorized users to visualize files from the EGA through a genome browser (such as Genome Maps or the Integrated Genome Browser) that is integrated with the RD-Connect platform. The results of this Implementation Study were presented to researchers in an ELIXIR webinar in early 2018 [4].

The second ELIXIR Implementation Study in 2017 aimed to map out the requirements for making sources of data FAIR, with a particular focus on the interoperability of molecular rare disease data, as well as on enabling federated queries [5]. This Implementation Study improved variant-phenotype mapping and visualization, and introduced better ontology-based annotation of the data. In addition, it identified problems and challenges in making rare disease data FAIR.

1. https://goo.gl/LSfv9k

2. https://platform.rd-connect.eu

3. Visualization of aligned genomics data for rare diseases (RD-Connect) as a driver for real-time access of controlled data at the EGA: https://www.elixir-europe.org/visualisation-aligned-rd-data

4. https://www.elixir-europe.org/events/elixir-webinar-visualisation-rare-disease-genomics-data

5. Interpretation of phenotypic and genotypic variation for rare diseases in terms of biological pathways: https://www.elixir-europe.org/about-us/implementation-studies/interpretation-phenotypic-and-genotypic-variation

Marine metagenomics Use Case

The Marine Metagenomics Use Case aims to develop a sustainable metagenomics infrastructure to enhance research and industrial innovation within the marine domain. Marine metagenomics resources range from deposition archives with research data output to highly dynamic knowledgebases that aggregate and process research data through manual curation and complex analysis pipelines.

Data standards for the marine research community
In June 2017, the Marine Metagenomics Use Case published its first paper on best practices [1]. The paper proposes best practice as a foundation for a community standard to enable reproducibility and the better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing. It outlines best practice for the reporting of metagenomics workflows throughout four essential steps: (1) sampling, (2) sequencing, (3) data analysis, and (4) data archiving, and highlights essential variable parameters and common data formats in each step.

Marine-specific data resources
In March 2017, the Use Case launched the Marine Metagenomics Portal [2], which contains three contextual and sequence reference databases: MarRef, MarDB and MarCat. MarRef [3] is a database for completely sequenced marine prokaryotic genomes, MarDB [4] is a database of sequenced marine prokaryotic genomes regardless of the level of completeness, and MarCat [5] is a catalogue of marine genes and proteins derived from metagenomics samples.

This Use Case also released two additional databases: ITSoneDB [6], a comprehensive collection of marine fungal ribosomal RNA Internal Transcribed Spacer 1 (ITS1) to support metabarcoding surveys of fungal and other microbial eukaryotic environmental communities, and the Eukaryotic gene catalogue [7], a unigene catalogue derived from the Tara Ocean samples.

Metagenomics analysis pipelines
Working under the umbrella of the ELIXIR Marine Metagenomics Use Case, EMBL-EBI significantly updated its Metagenomics resource (EMG) [8], including an overhaul of the taxonomic profiling section of the pipeline, and updating the underlying reference databases and tools. These updates facilitated higher-resolution taxonomic assignments, which helped to classify over 70% of previously unclassified 16S rRNAs.

EMG also became the first adopter of the Common Workflow Language (CWL) within the Marine Metagenomics Use Case. The adoption of CWL enabled the existing in-house pipeline to be replaced with considerably simpler CWL code. All EMG workflow descriptions are now available in a public repository [9].

The Marine Metagenomics Portal also improved its META-pipe pipelines to enhance the precision and accuracy of biodiversity and function analysis. The Metagenomics Portal also released MAR BLAST [10], a search engine for interrogating marine metagenomics datasets. This tool enables BLAST searches to be performed on all genes and protein-coding sequences from the marine databases MarRef, MarDB and MarCat.

1. ten Hoopen P, Finn RD et al. The metagenomic data life-cycle: standards and best practices, GigaScience 2017, 6(8):1–11, (doi: https://doi.org/10.1093/gigascience/gix047)
2. https://mmp.sfb.uit.no
3. https://mmp.sfb.uit.no/databases/marref/#/
4. https://mmp.sfb.uit.no/databases/mardb/
5. https://mmp.sfb.uit.no/databases/marcat/
6. http://itsonedb.cloud.ba.infn.it/
7. https://www.ebi.ac.uk/ena/data/view/ERZ480625
8. https://www.ebi.ac.uk/metagenomics/
9. https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl
10. https://mmp.sfb.uit.no/blast/

Plant science Use Case

The ELIXIR Plant science Use Case is building a common technical infrastructure and associated social practices to support plant genotype-phenotype analysis based on the widest available public datasets. The goal is to make plant genotypic and phenotypic data easier to find, integrate and analyse, by making them FAIR.

Standards for plant data and metadata
The Plant Sciences Use Case developed and proposed an extension of the Minimal Information about Plant Phenotyping Experiments (MIAPPE) v1.0 specification [1]. This extension is now out for wider community consultation, and the Use Case continues to work on the model.

During the subsequent stages of MIAPPE’s development, special attention was given to biosource attributes for non-crop species, such as forest trees, which were not well represented by the previous specification. The specification was also complemented with a list of proposed ontologies and expected data types to help users annotate their data. This revised specification is now considered for potential adoption as MIAPPE v1.1.

A workshop organised in Oeiras, Portugal, in September 2017, also produced the backbone of the MIAPPE specification integration within RDF (Resource Description Framework), which is compliant with RDA-WDI standards (Research Data Alliance – Wheat Data Interoperability).

To ensure the broad adoption of the data and the extended standards, Plant Sciences Use Case members played an active role in establishing the formal governance structure for MIAPPE, to serve as a persistent body to integrate, validate and promote efforts to develop standards in this area. Two members of the Plant Science Use Case were selected to serve on the initial steering committee. This Committee will ensure that information is exchanged between relevant projects. It will also identify future priority areas and establish working groups to address these priorities, and it will identify opportunities for outreach and funding.

The Use Case also developed new ontology lists to address gaps in existing vocabularies. The Woody Plant Ontology [2] was released in August 2017. It provides all variables used for woody plant observations, collected from various past and ongoing projects at national and international levels. The Plant Experimental Assay Ontology [3] focuses on the description of pipelines of manipulations performed from specimens to data and contains entities from three distinct realms (biological, physical and data), including experimental products, their relations, and the protocols describing the manipulation with the products.

Plant data discovery and access – Breeding API
The Plant Science Use Case participated in the Breeding API (BrAPI) project, which aims to develop and implement a Web Service API for exchanging data on plant material, and on phenotyping and genotyping, mainly for breeding purposes.

The Plant Science Use Case also organised three ‘Bring Your Own Data’ hackathons to showcase the potential of FAIR data in the context of plant research and to demonstrate how to ensure the interoperability of plant domain data, using MIAPPE and ELIXIR Interoperability Platform resources.

1. http://www.miappe.org

2. https://www.ebi.ac.uk/ols/ontologies/co_357

3. https://bitbucket.org/PlantExpAssay/ontology

Use Case leaders

Human data

Serena Scollen Thomas Keane Jordi Rambla

Rare diseases

Serena Scollen Ivo Gut Marco Roos

Marine metagenomics

Nils Peder Willassen Rob Finn

Plant sciences

Paul Kersey Celia Miguel

New ELIXIR Communities
Following the review of the existing ELIXIR Use Cases, the Heads of Nodes Committee invited research communities in ELIXIR to submit proposals to become recognised ELIXIR Communities.

From the submitted proposals, the Heads of Nodes Committee selected three new Communities, covering Proteomics, Metabolomics and Galaxy. In 2017, these three new Communities established their structure and leadership and prepared their work programme for 2018. A further seven Community proposals will undergo evaluation during 2018.

Proteomics
The Proteomics Community aims to align ELIXIR activities with the needs of scientists researching protein expression and interactions. Merging currently available and future sustainable proteomics resources into existing ELIXIR Platforms and Use Cases will help to integrate proteomics data with multi-omics data. The Proteomics Community will also improve data processing and analysis pipelines, and create guidelines for proteomics data management and annotation.

Representatives from eleven Nodes discussed ELIXIR’s activities in proteomics at a strategic meeting held in March 2017. Evidence-based recommendations arising from this meeting were published in ELIXIR’s F1000 Research channel [1].

Metabolomics
The Metabolomics Community will facilitate an ELIXIR infrastructure for metabolite identification, in order to help scientists to better understand the biochemistry of organisms. The tools used for metabolite identification produce large data sets that are more efficiently analysed, reported and stored using resources connected by ELIXIR. Representatives from ten Nodes met in April 2017 to discuss metabolomics within the scope of ELIXIR. As with the Proteomics Community, the outcomes of this meeting have been described in a paper published in ELIXIR’s F1000 Research channel [2].

Galaxy
Galaxy is a workflow management system that removes the need for users to compile and install tools, while also facilitating the sharing of data and results so that science is reproducible. The Galaxy Working Group formed within the Tools Platform. As a recognised Community, Galaxy will continue to support other Communities, such as the Proteomics and Metabolomics Communities. It will also develop a strategy with the Nodes to increase the availability of data visualisation tools, to integrate the Galaxy resources with ELIXIR AAI, and to further develop training materials and events.

1. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6:875 (doi: 10.12688/f1000research.11751.1)

2. van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)

Members
ELIXIR Nodes updates

Members: Belgium, Czech Republic, Denmark, EMBL-EBI, Estonia, Finland, France, Germany, Hungary, Ireland, Israel, Italy, Luxembourg, Netherlands, Norway, Portugal, Slovenia, Spain, Sweden, Switzerland, UK

Observers: Greece

Belgium
• ELIXIR Belgium officially launched in Ghent in February with a one-day event, including a data management workshop
• Finalised and approved the ELIXIR Collaboration Agreement
• Started the Implementation Study ‘ELIXIR Integration from a User perspective’ and is due to participate in five more Implementation Studies
• Organized a BYOD hackathon together with ELIXIR Netherlands, as well as a workshop on one of ELIXIR Belgium’s Node services (PLAZA), and two training courses on data-mining and data-processing
• Co-organized the European Galaxy developer Workshop (Strasbourg), the ELIXIR/GOBLET/GTN hackathon for Galaxy training material re-use (Cambridge), and the data hackathon of the Galaxy Community Conference (Montpellier)
• Hosted the ELIXIR Innovation and SME-forum on ‘Data-Driven Innovation in Food, Nutrition and Microbiome’ in Brussels
• Supported the Belgian Metabolomics Day and the Benelux Bioinformatics Conference, presenting ELIXIR Belgium at these events
• Implemented an automated procedure for Belgian training events to aggregate training information on TeSS

Czech Republic
• Secured additional grant from the national research infrastructure development fund
• Successfully passed an external interim evaluation of the ELIXIR Czech Republic infrastructure project
• Completed the ELIXIR Implementation Study for ELIXIR AAI Production 2017 (with ELIXIR Finland)
• Started work on one new ELIXIR Implementation Study (DataMovement – ELIXIR Proof of concept study on the availability of big datasets on remote compute infrastructure), and was involved in two more ELIXIR Implementation Studies in 2018 (AAI Production 2018, and Towards Data Stewardship in ELIXIR)
• Ran an ELIXIR Staff Exchange project with EMBL-EBI to work on the 3DPATCH project
• Organised an annual ELIXIR CZ conference for infrastructure partners and users
• Organised workshops and trainings related to ELIXIR CZ services, including: Advanced in silico drug design workshop, RNA-seq Chipster online course, and a Repeat Explorer workshop
• Presented ELIXIR CZ at a day on national Research Infrastructures organised by the Czech Ministry of Education, Youth and Sport

Denmark
• Expanded the ELIXIR registry of bioinformatics tools and data services (https://bio.tools) to over 10,000 entries in total from 681 contributors and 293 domains, representing most major European service providers
• Published a schema (https://github.com/bio-tools/biotoolsSchema/) and ontology (https://github.com/edamontology/) releases for the formalised syntactic and semantic description of tools
• Organised or participated in five events throughout 2017 around the development and use of bio.tools
• Ran a studentship scheme supporting students to work on curation-focused mini-projects with an impact on bio.tools content growth and quality
• Co-initiated ELIXIR Implementation Studies for “Architecture for Software Containers” and “ELIXIR Integration from a User perspective”
• Helped create a community to integrate proteomics activities throughout ELIXIR
• Organised the third annual Danish Bioinformatics Conference (August 2017)
• Co-authored PLOS Biology paper on the design, provision and re-use of identifiers in the life sciences [1]
• Co-authored GigaScience paper to present the ReGaTE utility for registration of Galaxy tools in bio.tools (see also the Tools Platform section) [2]
• Co-authored F1000R paper on encouraging best practices in research software [3]
• Co-authored F1000R paper to present the ToolDog (Tool DescriptiOn Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section) [4]

1. Julie A. McMurry et al., Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLOS Biology 2017 (doi: 10.1371/journal.pbio.2001414)

EMBL-EBI

• 13 EMBL-EBI resources were named as ELIXIR Core Data Resources; 12 EMBL-EBI resources were named as ELIXIR Deposition Databases
• Implemented system for replicating data sets between ELIXIR Nodes
• Integrated the ELIXIR Authentication and Authorization Infrastructure at EMBL-EBI
• Registered EMBL-EBI tools in bio.tools
• Commenced Implementation Study on the development of architecture for software containers at ELIXIR and their use by EXCELERATE Use Cases
• Led (with ELIXIR Netherlands) the creation of a Community to integrate metabolomics activities throughout ELIXIR
• Contributed to development of the MIAPPE data standards for plant phenotyping data, and BrAPI, the plant breeding API, and the integration of phenotypic data with EMBL-EBI data resources
• Implemented Bioschemas in a number of ELIXIR-affiliated EMBL-EBI data resources
• Led the creation of a Community to integrate proteomics activities throughout ELIXIR
• Started two ELIXIR Implementation Studies: (1) on plant phenotyping data validation in collaboration with the University of Ghent Center for Plant Systems Biology (VIB); and (2) to create open proteomics analysis pipelines for Data Dependent Acquisition approaches (in collaboration with ELIXIR-DE)
• Deployed ELIXIR AAI for the BioSamples database
• Introduced the Human Cell Atlas project into the ELIXIR Human Data Use Case
• Collaborated with the Ontology Lookup Service (OLS) to resolve CURIEs used for ontology terms
• Implemented the ELIXIR Beacon network for relevant EMBL-EBI data resources
• Began implementation of the htsget secure streaming API with RD-Connect
• Organised four training courses in Europe, five courses at EMBL-EBI, and two webinars in data analysis and data management

Estonia

• Started national actions (service development and maintenance, and training) supported through EU Structural Funds (€1.3M, until 2022)
• Co-hosted ELIXIR Innovation and SME Forum in Helsinki, Finland, in February 2017 (with ELIXIR Finland)
• Initiated work on the Implementation Study ‘ELIXIR Integration from a User perspective’ together with ELIXIR UK and ELIXIR Belgium as main co-contributors
• Deployed ELIXIR AAI for the Virtual Coffee Room (https://cafe.elixir.ut.ee)
• Launched funcExplorer, https://biit.cs.ut.ee/funcexplorer/, a web tool for fast data clustering coupled with enrichment analysis
• Co-authored ELIXIR F1000R paper to present the ToolDog (Tool DescriptiOn Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section)⁵

Finland

• Secured €1.6 million grant from national research infrastructure roadmap to expand secure cloud e.g. for cancer research
• Organized a metagenomics course with trainers from ELIXIR Norway, EMBL-EBI, and ELIXIR Finland. Developed a training course on RNA-seq data analysis with Chipster via the eLearning platform with the ELIXIR Czech Republic and ELIXIR Slovenia Nodes
• Developed a protocol to launch a copy of the META-pipe metagenomics annotation pipeline in OpenStack clouds and in the EGI-federated cloud (in collaboration with ELIXIR Norway and ELIXIR Czech Republic). The annotation service based on this protocol is now being used in ELIXIR Finland
• Integrated tools for single-cell RNA-seq data analysis in the Chipster platform and organized a training course on it
• Started the Implementation Study on ‘Using clouds and VMs for training’
• Continued developing and operating the ELIXIR AAI together with ELIXIR Czech Republic; represented ELIXIR in the AARC and AARC2 project

Finland (continued)

• Co-hosted ELIXIR Innovation and SME Forum in Helsinki, Finland, in February 2017 (with ELIXIR Estonia)
• Contributed to the development of training on national data management planning and to the development of guidance on national data management planning for sensitive data to be added to DMPTuuli (the Finnish instance of DMPonline, an online platform to help researchers create, review, and share data management plans that meet institutional and funder requirements).
• Ran an ELIXIR Staff Exchange project with ELIXIR Spain to develop Local EGA technologies, and to develop secure human data discovery and transfer

France

• Formally launched the new National Research Infrastructure Roadmap for ensuring the long-term sustainability of the national Node
• Held the first European Galaxy Administrator Workshop
• Held the ELIXIR Innovation and SME Forum on ‘Data Driven Innovation in Rare Diseases and Personalised Medicine’
• Organised and hosted the Galaxy Community Conference 2017 (June)
• Organised the first ELIXIR BioContainers Hackathon (October 2017)
• Hosted the ELIXIR Board meeting (November 2017)
• Launched and offered nearly 80 applications and around 2,000 containers in the Cloud Federation Biosphere
• Trained over 1,200 people through more than 110 training courses

Germany

• Received further support from the Federal Ministry of Education and Research (BMBF) to run the national ELIXIR Node for another two years until February 2020
• Organized the strategic workshop on ‘The Future of proteomics in ELIXIR’ (March 2017) and published a white paper in the ELIXIR F1000R channel⁶
• Organised the strategic workshop on ‘The Future of Metabolomics in ELIXIR’ (April 2017) and published a white paper in the ELIXIR F1000R channel⁷
• Organised the second International de.NBI Symposium “The Future Development of Bioinformatics in Germany and Europe” (October 2017)
• Organised 69 training courses with a total of 1,489 participants
• Established a de.NBI cloud at five universities in Bielefeld, Freiburg, Gießen, Heidelberg, and Tübingen, and integrated it with ELIXIR’s AAI system

Israel

• Developed an ELIXIR Staff Exchange project with EMBL-EBI in Structural Biology (to start in January 2018)
• Hosted a visit of the ELIXIR Director to the ELIXIR Israel Node
• Announced the physical location of the Node as being at the Nancy and Stephen Grand Israel National Center for Personalized Medicine

2. Doppelt-Azeroual O, Mareuil F et al. ReGaTE: Registration of Galaxy Tools in ELIXIR, GigaScience 2017, 6(6): 1-4, (doi: 10.1093/gigascience/gix022)
3. Rafael C. Jiménez et al., (2017), Four simple recommendations to encourage best practices in research software, (doi: 10.12688/f1000research.11407.1)
4. Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
5. Hillion KH, Kuzmin I, Khodak A, Rasche E, Crusoe M, Peterson H, Ison J, Ménager H et al. Using bio.tools to generate and annotate workbench tool descriptions. F1000Research 2017, 6(ELIXIR):2074 (doi: 10.12688/f1000research.12974.1)
6. Vizcaíno JA, Walzer M, Jiménez RC et al. A community proposal to integrate proteomics activities in ELIXIR. F1000Research 2017, 6:875 (doi: 10.12688/f1000research.11751.1)
7. van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)

Ireland

• Submitted ELIXIR Node application, which was approved by the ELIXIR SAB in September 2017
• Developed the ELIXIR Service Delivery Plan with a selection of ELIXIR Ireland services to be offered through ELIXIR, including: BioOpener (resource to access and work with fragmented biomedical repositories); CancerGD.org (resource for analyzing and interpreting genetic dependencies in cancer); Clustal Omega (Multiple Sequence Alignment package); Riboseq.Org (for ribosome profiling (RiboSeq) data analysis); SLiMs (suite of tools and repositories for the analysis, visualisation and dissemination of short linear motifs in proteins); and Whatizit (text-processing system linking biomedical terms to publicly available databases)

Italy

• Launched or significantly updated 11 databases: DisProt, RepeatsDB, ITSoneDB, REDIportal, eDGAR, Galactosemia Proteins Database, MobiDB (for which we obtained a persistent Identifiers.org prefix), MINT, Signor 2.0, PeachVarDB, and HmtDB
• Launched or updated 12 tools and pieces of software: MetaShot, A-GAME, RNentropy, CoVaCS, ISPRED 4, SChloro, BAR 3.0, Disnor, MToolBox, BEAM, MobiDB-lite, and SODA
• Included MobiDB (part of InterPro) and MINT (within IMEx Consortium) in the list of ELIXIR Core Data Resources
• Re-organised the internal structure of ELIXIR Italy to mirror ELIXIR’s overall structure. ELIXIR Italy now has five platforms with leads and deputies to better integrate the national activities within ELIXIR Platforms and Use Cases
• Developed and approved a quality management policy for ELIXIR Italy database and tool services
• Registered or updated ELIXIR Italy bioinformatics services into the ELIXIR Tools and Services Registry: a total of 174 ELIXIR Italy tools and databases are registered in bio.tools
• Secured Horizon 2020 funding as part of the MSCA-RISE European grant
• Organised 10 courses and workshops all over Italy, including in Rome, Naples, Padua, Bari, Milan, Cagliari, Trento, Salerno, and Palermo. Ran five training courses, three workshops, one tutorial and one ELIXIR-EXCELERATE Train the Trainer course
• Held the first Summer School in advanced computational metagenomics (June 2017)
• Co-organised a nine-day EMBO Practical course on Population genomics (May 2017)
• Developed and made available in an open repository training course materials: https://github.com/ELIXIR-IIB-training
• Launched a new web page for the ELIXIR-IT Training Platform at: https://elixir-iib-training.github.io/website
• Provided access to ownCloud services hosted at INFN Cloud infrastructure
• Provided HPC resources to 20 projects through the ELIXIR-IT HPC@CINECA initiative
• Worked on the “ELIXIR-IT integration” ELIXIR Implementation study
• Secured a national grant with a total budget of €400,000
• Co-authored over 20 publications about released or updated services and best practices

Luxembourg

• Finalised and approved the ELIXIR Collaboration Agreement
• Officially launched ELIXIR Luxembourg on 7 September 2017, with a half-day symposium
• Started the Implementation Study on ‘Integrating ELIXIR-Luxembourg into ELIXIR Activities’
• Engaged in three additional Implementation Studies with the Training Platform that were approved for funding
• Organised and hosted a Training course in Luxembourg: Data processing with R tidyverse (four days in November 2017)
• Deployed OpenStack-based IT infrastructure for Data and Compute services
• Launched a new high performance database cluster for data hosting at the Node

Luxembourg (continued)

• Developed the Translational Medicine Data Catalogue⁸ in collaboration with eTRIKS
• Established a sustainability solution for the Innovative Medicine Initiative project eTRIKS, which is transferable to other projects (data hosting agreements in progress)
• Co-authored ‘The Future of metabolomics in ELIXIR’⁷ as a member of the new ELIXIR Community on Metabolomics
• Co-organised the international conference Impact of Big Data Analytics on Healthcare⁹ with the Luxembourg Centre for Systems Biomedicine (LCSB) on 4–5 October, 2017

Norway

• Secured national funding for ELIXIR Norway for 2017–2021 (€15 m)
• Launched the Marine Metagenomics Portal¹⁰ (March 2017), which contains the MAR databases; MarRef, MarDB and MarCat, and META-pipe.
• Contributed to the GigaScience paper ‘The metagenomic data life-cycle: standards and best practices’¹¹
• Upgraded the Norwegian e-infrastructure for Life Science (NeLS)
• Set in production integrated service provision with NorSeq sequencing consortium, providing end-users’ data through NeLS
• Secured continued Nordic funding for handling sensitive bioinformatics data (Tryggve2, 2017–2020)
• Organised 11 training workshops on the NeLS platform, on NGS data analysis, meta analysis, and data storage
• Worked with FAIRDOM UK to integrate their SEEK system with the NeLS platform for handling project meta-data and data, initially for Digital Life Norway projects
• Contributed to data management hands-on courses organized by Digital Life Norway
• Co-organised the “Metagenomics data analysis” workshop, Helsinki, Finland (April, 2017)

Netherlands

• Organised Health-RI conference 2017, together with other Dutch research infrastructures, presenting a business plan for the Health-RI initiative. Received commitment of ~20 organisations, including funders, towards this initiative
• Organised an ELIXIR track at the Dutch Bioinformatics conference BioSB2017 in April
• Organised two Data Carpentry Genomics workshops and one Software Carpentry / Data Carpentry Instructor training
• Organised three BYOD hackathons on ELIXIR-BrAPI (with ELIXIR Belgium), Cancer Genomics, and at the Rare Disease Summer school (with ELIXIR Italy), as well as a FAIR/BYOD workshop at Bio IT World 2017 in Boston, USA
• Co-authored two F1000R papers in the ELIXIR channel, one about Software and Data Carpentry¹², and the other about Linking EGA, Galaxy & tranSMART¹³
• Organised a workshop about FAIR Data and Data Stewardship in ELIXIR and a workshop to receive feedback on the Data Management Plan wizard during the ELIXIR All Hands Meeting in Rome, March 2017
• Organised ELIXIR Netherlands Roadmap partner meeting in Utrecht, January 2017

8. http://datacatalog.elixir-luxembourg.org
9. https://bigdata.uni.lu
10. https://mmp.sfb.uit.no
11. Petra ten Hoopen, Robert D. Finn et al. The metagenomic data life-cycle: standards and best practices, GigaScience, Volume 6, Issue 8, 1 August 2017, Pages 1–11, https://doi.org/10.1093/gigascience/gix047
12. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040 (doi: 10.12688/f1000research.11718.1)
13. Zhang C, Bijlard J, Staiger C et al. Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data. F1000Research 2017, 6:1488 (doi: 10.12688/f1000research.12168.1)

Netherlands (continued)

• Organised ELIXIR Netherlands Kick-off Meeting in Utrecht, October 2017. This meeting brought together important parties for a FAIR infrastructure for data and services (e.g. DRE and euroCAT), leading to collaborations on the Personal Health Train
• Participated as an ELIXIR representative in RDA meetings in Barcelona and Montreal
• Presented an ELIXIR webinar on FAIR data tooling
• Published a Briefings in Bioinformatics paper entitled: ‘Bioinformatics in the Netherlands: the value of a nationwide community’¹⁴
• Organised two workshops with the Health Research Board in Ireland to teach FAIR data stewardship to their internal organisation and to their researchers
• Organised a three-day meeting of the ELIXIR Interoperability Platform at the Vrije Universiteit Amsterdam, January, 2017
• Expanded ELIXIR Netherlands to include 55 partners, of which 10 are currently active in ELIXIR projects
• Participated in an ELIXIR Implementation Study on Data Management Planning (together with other ELIXIR Nodes)
• Participated in three new Use Case proposals (Metabolomics, Nutrition and Toxicology)
• Ran an ELIXIR Staff Exchange project in Plant Breeding (BrAPI), with EMBL-EBI

Portugal

• Organised an ELIXIR-EXCELERATE Plant Sciences Use Case workshop to build a domain ontology for plant phenotyping data (September 2017)
• Participated in ELIXIR Staff Exchange project with ELIXIR Netherlands
• Contributed to the development of the MIAPPE specifications
• Presented ELIXIR Portugal at Fórum de Gestão de Dados de Investigação (Research Data Management Forum) and at the Bioinformatics Open Days in Braga
• Launched a Community Service Registration (as part of bio.tools) to help Portuguese community to share and promote their bioinformatics services
• Started the execution of the BioData project, the national project that supports the ELIXIR Portuguese node
• Organized 10 training courses, integrated into the (proposed) GTPB Node Service
• Further improved the (proposed) Yeastract Node Service with: updated Gene Ontology terms, updated Gene information from Saccharomyces Genome Database, and new curated regulatory information
• Launched ELIXIR Portugal Node Computing Service (proposal) providing virtual machines to researchers in bioinformatics using a cloud-based OpenStack platform (available images include: Genomics Virtual Lab, Galaxy Docker, IGC Galaxy)

Slovenia

• The ELIXIR-SI entity (BEE) established new collaborations in 2017 with: The Agricultural Institute of Slovenia; the Biotechnical Faculty and Faculty for Computer and Information Science in Ljubljana; and with the University Medical Centre Ljubljana, to build on pre-existing collaborations with the National Institute of Biology, Arnes (Geant, EGI and PRACE members) and with the Institute Jožef Stefan
• Used the ELIXIR-SI eLearning Platform (EeLP) to support bioinformatics courses and webinars that were run across Europe https://elixir.mf.uni-lj.si (available also via http://elearning.elixir-slovenia.org)
• (Co-)organised eleven training courses and workshops (courses are in EeLP), in collaboration with several ELIXIR Nodes, including Czech, German, Finnish, Spanish, French, Italian, Portuguese and Swedish
• Co-authored an ELIXIR F1000R paper on ‘Ten steps to get started in genome assembly and annotation’ (as part of ELIXIR Capacity Building and Training/eLearning activity)¹⁵
• Co-authored F1000R paper to present the ToolDog (Tool Description Generator) to facilitate the integration of tools registered in the ELIXIR tools registry (see also the Tools Platform section).
• Became new partners for training in two ELIXIR Implementation Studies (Beacons and CWL)
• Partnered with ELIXIR-ES in Staff Exchange project approved for 2018 (Enhanced Cloud Computing with Resource Auto-Scaling for Educational Software)
• Received €88,000 research infrastructure grant from the Slovenian Research Agency

Spain

• Formally joined ELIXIR in October 2017; Spain had previously been a Provisional Member
• Secured national funding for 2018–2020. Spanish National Institute of Bioinformatics (ELIXIR Spain) expanded from 10 to 19 groups from 13 different institutions
• Launched a national network of bioinformatics groups working at Health Research Institutes associated with Hospitals (TransBioNet)

Spain (continued)

• The European Genome-phenome Archive (EGA) – jointly managed and developed by EMBL-EBI and ELIXIR Spain – named ELIXIR Core Data Resource and included in the recommended ELIXIR Deposition Databases
• Hosted an ELIXIR Innovation and SME Forum event in June 2017 in Barcelona around data driven innovation in health and personalised genomics
• Contributed to three implementation studies concerning Human Data and Rare diseases Use Cases: ELIXIR Beacon 2017, Implementation of phenotypic and genotypic variation for rare-diseases in terms of biological pathways, and Remote real-time visualization of human rare disease genomics data (RD-Connect) stored at EGA
• Co-organised in collaboration with the Centre of Excellence for Molecular Biology (BioExcel), a Bring Your Own Workflow (BYOF) hackathon/meeting in Amsterdam, November 2017.
• Co-organised a training course on High Performance Computing together with ELIXIR Slovenia (April 2017 in Malaga, Spain)
• Participated in the Staff Exchange programme by hosting colleagues from Sweden, Finland and Italy, working on EGA, ELIXIR AAI, and OpenEBench
• Co-authored and published a position paper on scientific benchmarking activities and how benchmarking should be taken into account when designing infrastructures to support life science research¹⁶

Sweden

• Secured funding from the Swedish Research Council of €5.7 m (57 MSEK) for national and European activities, from 2018–2020; a substantial increase compared to previous years
• Secured Nordic funding from NordForsk for sensitive data as part of the Tryggve project, with €600,000 (6 MSEK) allocated to Sweden for 2018–2020
• Organised two Capacity Building Workshops on Genome Annotation and Assembly (in Slovenia and Portugal)
• Participated in three Implementation Studies: Data movement, ELIXIR Beacons, and Local Ensembl
• The Human Protein Atlas was named as an ELIXIR Core Data Resource
• Launched the new Pathology Atlas with an analysis of all human genes in all major cancers, which showed the consequence of protein levels for overall patient survival (August 2017)
• Contributed to the development of Local/Federated EGA, in collaboration with other Nordic ELIXIR nodes and with ELIXIR Spain
• Organised 20 national advanced training events in bioinformatics
• Organised a distributed course in Linux using the e-learning system from the Slovenian ELIXIR Node
• Ran weekly drop-in sessions at all major sites in Sweden where researchers could meet bioinformaticians to discuss projects

Switzerland

• Two core resources of ELIXIR Switzerland (SIB – Swiss Institute for Bioinformatics) were named as ELIXIR Core Data Resources: UniProt, the world reference resource for protein sequence and functional information; and STRING, the knowledgebase of in-depth information on protein-protein interactions
• Launched two new core resources as recommended by SIB Scientific Advisory Board: V-Pipe, an emerging tool to help research on virus genomics, and SwissLipids, a comprehensive knowledgebase of lipids
• Conducted a study, supported by ELIXIR and published on the F1000R channel, that identified a sustainable funding model for core data resources in the life sciences¹⁷
• Initiated the implementation of a national secure and interoperable infrastructure, as part of the Swiss Personalized Health Network (SPHN)¹⁸
• Launched the first Swiss Certificate of Advanced Studies in personalized molecular oncology, together with the University Hospital of Basel and the University Hospital of Lausanne

14. Celia W. G. van Gelder, Rob W. W. Hooft, Merlijn N. van Rijswijk, Linda van den Berg, Ruben G. Kok, Marcel Reinders, Barend Mons, Jaap Heringa; Bioinformatics in the Netherlands: the value of a nationwide community, Briefings in Bioinformatics, (doi: https://doi.org/10.1093/bib/bbx087)
15. Dominguez del Angel V, Hjerde E, Sterck L, Capella-Gutierrez S, Notredame C, Vinnere Pettersson O, Amselem J, Bouri L, Bocs S, Leskošek B, et al. Ten steps to get started in genome assembly and annotation. F1000Research, 2018, vol. 7, https://f1000research.com/articles/7-148/v1
16. Capella-Gutierrez S, De la Iglesia D, Haas J, Lourenco A, Fernandez Gonzalez JM, Repchevsky D, Dessimoz C, Schwede T, Notredame C, Gelpi JL, Valencia A. Lessons Learned: Recommendations for Establishing Critical Periodic Scientific Benchmarking. bioRxiv 181677; https://doi.org/10.1101/181677
17. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 1; referees: 3 approved]. F1000Research 2017, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.1)

Switzerland (continued)

• Organised 53 courses in bioinformatics-related topics, spanning 94 days of teaching and training nearly 1,200 researchers
• Hosted an ELIXIR Train the Trainer workshop in Lausanne (January 2017) and contributed to developing training materials
• Contributed to the ELIXIR paper to present the ELIXIR-EXCELERATE Train-the-Trainer pilot programme¹⁹
• Contributed to the ELIXIR Training platform work to define metrics for measuring quality/impact of training
• Contributed to the development of SourceData²⁰ by EMBO, an open-access platform to bring to the surface information buried in the figures of scientific papers
• Organized the first Swiss bioinformatics public hackathon
• Co-organised, together with the University of Basel, the 13th Basel Computational Biology Conference [BC]², which had more than 500 participants from 24 countries

United Kingdom

• Finalised the ELIXIR-UK Consortium Agreement
• The Pathogen-Host Interactions database joined as an ELIXIR-UK Node Resource, and the CATH database was awarded ELIXIR Core Data Resource status
• Led two 2017 ELIXIR Implementation Studies (Bioschemas and Learning Paths) and will lead the 2018 Data Implementation Study (CATH, Swiss-Model, PDBe, and InterPro)
• Partnered in three 2018 ELIXIR Implementation Studies in Interoperability (Validation, Workflow interoperability) and Data (FAIRmetrics)
• Ran ELIXIR Portals including: TeSS, which saw a 32% increase in users; and FAIRsharing, which now includes over 1000 standards, 1000 databases, and 100 data policies
• Published three ELIXIR F1000R publications²², and co-authored an additional three papers²³
• Partnered with Software Sustainability Institute and BBSRC to develop guidance on software outputs in funding awards, and ran a national workshop on licensing
• Trained 2,470 researchers through 108 face-to-face training courses, and 2,936 active learners in four online courses (run by the two ELIXIR-UK training Node resources, the Bioinformatics Training Programme of the University of Cambridge, and The Birmingham Metabolomics Training Centre)
• Organised the ELIXIR/GOBLET/GTN hackathon for Galaxy training material re-use (May 2017)
• Secured funding for three Node resources: PHI-base (from Smart Crop Protection Institute); InterMine and ISA Tools (from The Wellcome Trust); and CATH/Gene3D (from the BBSRC)

Greece (Observer)

• Secured national funding for ELIXIR Greece for 2018–2020, with an overall budget of €4 million
• Funds are for a shared national compute resource, and for: 11 databases, 23 tools, a middleware, and for five community pilot actions (Marine bioinformatics, Computational Metabolomics, Protein interactomics, NcRNA biomarker identification, and Pathogen metagenomics)
• Funded partners are from six universities, eight research centers, and a national IT infrastructure provider
• Designed a website for ELIXIR Greece, which includes a national registry of bioinformatics/computational biology resources (to be launched in the first quarter of 2018)
• Installed a monitoring system for resource usage
• Conducted a survey on computing demands for the design of the ELIXIR Greece compute resources

18. https://www.sphn.ch/en.html
19. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training [version 1; referees: 2 approved]. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
20. http://sourcedata.embo.org/
21. https://www.bc2.ch/2017
22. Pawlik A, van Gelder CWG, Nenadic A et al. Developing a strategy for computational lab skills training through Software and Data Carpentry: Experiences from the ELIXIR Pilot action. F1000Research 2017, 6:1040 (doi: 10.12688/f1000research.11718.1)
Larcombe L, Hendricusdottir R, Attwood TK et al. ELIXIR-UK role in bioinformatics training at the national level and across ELIXIR. F1000Research 2017, 6:952 (doi: 10.12688/f1000research.11837.1)
Hancock JM, Game A, Ponting CP and Goble CA. An open and transparent process to select ELIXIR Node Services as implemented by ELIXIR-UK. F1000Research 2017, 5(ELIXIR):2894 (doi: 10.12688/f1000research.10473.2)
23. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)
Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)
van Rijswijk M, Beirnaert C, Caron C et al. The future of metabolomics in ELIXIR. F1000Research 2017, 6(ELIXIR):1649 (doi: 10.12688/f1000research.12342.2)

2017 highlights

Hungary becomes a Member of ELIXIR
Hungary became the 21st Member to join ELIXIR! The ELIXIR Node in Hungary is led by the MTA Research Centre for Natural Sciences and is coordinated by Professor Laszlo Patthy of the Institute of Enzymology within the Research Centre for Natural Sciences of the Hungarian Academy of Sciences. The Node will focus on novel tools, services and databases in the fields of protein sequence and structure investigation, DNA sequence analysis, and translational medicine.

ELIXIR takes part in a Call for action for global coalition to sustain core data resources
The leaders of major international life-science data resources issued a call for a global coalition to support biological data resources that are essential for the work of life-science researchers, educators, and innovators. ELIXIR has actively participated in the development of this call, using the experience gained in developing the selection procedure for ELIXIR Core Data Resources.

Jan Feb Mar

ELIXIR and GA4GH Beacon Team Up to Advance Genomic Data Sharing
The Beacon Project of the Global Alliance for Genomics and Health (GA4GH) and ELIXIR expanded their partnership to develop the Beacon project and to improve the discoverability of European genomic data. The partnership aims to establish a European network of Beacons that allows users to query all ELIXIR Beacons simultaneously and to introduce new security measures. In October, the Beacon project was named one of the GA4GH driver projects, an initiative to help guide GA4GH development efforts and to establish standard tools for genomic data sharing.

ELIXIR 2017 All Hands community meeting in Rome
The third ELIXIR All Hands meeting was held in Rome, Italy on 21–23 March, with nearly 300 attendees. The meeting presented the full breadth of EXCELERATE activities; the keynotes by R. Gentleman and F. Ouellette presented the research activities of 23andMe, and the Training Programme of Genome Canada.

“ELIXIR is helping data stewards make their data discoverable for re-use. Working with the GA4GH, ELIXIR is helping to scale up existing standards and good practices in data discovery.” Peter Goodhand, GA4GH Executive Director

ELIXIR Annual Report 2017 39

New ELIXIR website design
ELIXIR Hub launched a new design for the ELIXIR website at www.elixir-europe.org. The new design improves the navigation and organisational structure of the site and updates its overall layout.

ELIXIR launches staff exchange programme
ELIXIR launched the first Call for Staff Exchange projects between ELIXIR Nodes. The purpose of the ELIXIR Staff Exchange programme is to support capacity building in ELIXIR Nodes and to exchange best practices in bioinformatics service provision. The programme will also strengthen the links between ELIXIR Nodes and support the interoperability and sustainability of ELIXIR services and data resources.

Apr May Jun

ELIXIR-EXCELERATE receives a very positive mid-term review
The ELIXIR-EXCELERATE mid-term review assessed the activities and results of the first half of the project. According to the assessment report, ELIXIR-EXCELERATE “... fully achieved its objectives and milestones for the period and has delivered exceptional results with significant immediate or potential impact.”

ELIXIR-EXCELERATE was also recognised in the analysis of the Horizon 2020 Research Infrastructures programme published in May by the European Commission. In the document, the European Commission presented ELIXIR-EXCELERATE as a success story and an illustration of the added value provided by the Horizon 2020 research infrastructure programme.

Building Galaxy training capacity
ELIXIR-EXCELERATE, GOBLET and the Galaxy Training Network organised a joint hackathon on Galaxy training material re-use in Cambridge, UK. The main objective was to extend the existing collection of Galaxy training materials, which was initiated at the 2016 Galaxy Community Conference. ELIXIR also supported the Galaxy community by sponsoring the Galaxy Community Conference in Montpellier, France, in June 2017.

ELIXIR at ISMB/ECCB, Prague, Czech Republic, 21–25 July
ELIXIR presented its activities in a dedicated programme track at the ISMB-ECCB Conference, the biggest and most important event in bioinformatics and computational biology. Hosted in Prague, Czech Republic, the conference presented the activities of the ELIXIR Platforms and Use Cases, as well as its Industry Programme and Capacity Building Programme. The event also saw the official launch of the ELIXIR Core Data Resources.

ELIXIR publishes position paper on FAIR Data Management in life sciences
This position paper highlights the key role of research infrastructure in helping researchers to make published life-science data FAIR (Findable, Accessible, Interoperable and Reusable). This statement also voices ELIXIR’s commitment to enabling the availability of FAIR data within the framework of the European Open Science Cloud (EOSC), an initiative of the European Commission and Member States to connect big data across Europe. In support of the EOSC Declaration from July 2017, ELIXIR published a set of guiding principles on FAIR Data Management.

Jul Aug Sep

ELIXIR establishes new Communities for Proteomics, Metabolomics and Galaxy
Following agreement with the Heads of Nodes committee, ELIXIR establishes three new Communities, covering Proteomics, Metabolomics and Galaxy, while continuing to operate its four current Use Cases on Marine Metagenomics, Plant Sciences, Rare Diseases and Human Data.

ELIXIR announces initial list of Core Data Resources and Deposition Databases
ELIXIR published the initial list of ELIXIR Core Data Resources – data resources considered to be of fundamental importance to the life-science community and to the long-term preservation of biological data. These resources provide a benchmark for high quality concerning the infrastructure of service provision and they drive ELIXIR’s discussions with funders and policy-makers on the sustainability of life-science data resources.

“We are already seeing the first benefits of the process to identify the Core Data Resources in terms of improving ELIXIR’s capacity to deliver data resources that meet the scientific need. For example, as a result of the evaluation process, we have seen data resources change their license to align with ELIXIR’s Open Access principles, allowing more extensive data reuse not only for basic research but for industry too.”
Jo McEntyre (EMBL-EBI), co-Leader of the ELIXIR Data Platform.

ELIXIR-EXCELERATE Train-the-Trainer programme helps close skills gap in bioinformatics training
The ELIXIR Training Platform published a strategic paper on the delivery and outcomes of seven ELIXIR-EXCELERATE Train-the-Trainer courses organised in 2016 and 2017. Suggestions from this paper have already contributed to the development of an ELIXIR Train-the-Trainer programme that can be hosted by any ELIXIR Node.

“Scientists at all career stages need ‘point of need’ training to help them make the best use of bioinformatics tools and resources in their research. ELIXIR is contributing to expand the provision of courses and also of new instructors able to deliver high-quality courses, and cope with the high demand for training.”
Patricia Palagi, ELIXIR Training Platform Co-Lead.

ELIXIR joins new initiative to improve security in drug development
ELIXIR becomes a partner in eTRANSAFE, a new €40 million project funded by the Innovative Medicines Initiative (IMI), which began in September. This five-year project aims to develop an advanced data integration infrastructure and new computational methods to improve security in the drug development process. ELIXIR joins the project consortium, which consists of eight academic institutions, six SMEs and twelve pharmaceutical companies, coordinated by the Fundació Institut Mar d'Investigacions Mèdiques (IMIM, part of ELIXIR Spain) and led by the pharmaceutical company Novartis.

Oct Nov Dec

Bioschemas meeting
The Bioschemas workshop was organised in October at the Wellcome Genome Campus in Hinxton, UK, to engage with the life-science community and encourage its members to adopt and use the Bioschemas specifications. The event brought together representatives from 33 different biological resources, including major international resources like UniProt and PDBe. Each of them tested and adopted at least one of the Bioschemas specifications.

“The life-science community is beginning to see the benefits of the Bioschemas markup and we will expand our reach to new data types and new data sets. By getting Core and Node-supported data resources to embed markup in their sites, ELIXIR will build a crucial component of our FAIR metadata infrastructure; enough to be indexable by search engines.”
Carole Goble, ELIXIR UK Head of Node and one of the Bioschemas leaders

ELIXIR finished its 2017 Innovation and SME programme in Paris
Paris, France, saw the last event in the ELIXIR Innovation and SME programme in 2017. The programme involved four thematic events in Helsinki, Barcelona, Brussels and Paris, each of which presented a range of ELIXIR bioinformatics resources to R&D companies and SMEs. These four events attracted a total of 382 participants, the largest number of attendees we have so far supported in a single year. Furthermore, the average level of satisfaction of the participants exceeded 95%, as demonstrated in the post-event feedback survey.

EU Grants

ELIXIR-EXCELERATE

ELIXIR-EXCELERATE is a €19 million Horizon 2020 project to fast-track the implementation of ELIXIR by coordinating national data infrastructures and by ensuring the delivery of life-science data services through its Platforms and Use Cases.

Basic facts about this project
• €19.8 million in funding
• Project term of four years (2015–2019)
• Involving 48 partners in 18 countries

Overall goals
ELIXIR-EXCELERATE fast-tracks the implementation of key scientific and organisational aspects of ELIXIR and facilitates the integration of Europe’s bioinformatics resources. It aims to deliver ‘excellence’ to ELIXIR’s users by ‘accelerating’ the implementation of one of Europe’s three priority Research Infrastructures, as considered by ESFRI and the European Council.

The goals of ELIXIR-EXCELERATE are to:
• Deliver world-leading data services for academia and industry
• Increase bioinformatics capacity and competence across Europe
• Complete the management and organisational processes for an efficient, distributed infrastructure

The ELIXIR-EXCELERATE project is fully embedded into ELIXIR’s operations, meaning that all EXCELERATE activities and objectives reflect and complement the objectives of ELIXIR’s Scientific Programme, 2014–2018.

The ELIXIR Platforms and Use Cases are represented in EXCELERATE as Work Packages (WPs), complemented by dedicated Work Packages on Capacity Development, Operations, Communications and Ethics.

The ELIXIR-EXCELERATE Work Packages are as follows:
• WP1: Tools Platform: Tools Interoperability and Service Registry
• WP2: Tools Platform: Benchmarking
• WP3: Data Platform: Data Resources and Services
• WP4: Compute Platform: Compute, Data access and exchange services
• WP5: Interoperability Platform: The ELIXIR Interoperability Backbone
• WP6: Marine Metagenomics Use Case: Marine metagenomics infrastructure as a driver for research and industrial innovation
• WP7: Plant Sciences Use Case: Integrating Genomic and Phenotypic Data for Crop and Forest Plants
• WP8: Rare Disease Use Case: ELIXIR infrastructure for Rare Disease research
• WP9: Human Data Use Case: Secure archiving, dissemination and analysis of human access-controlled data
• WP10: ELIXIR Node Capacity Building Programme Training
• WP11: Training Platform: ELIXIR Training Programme
• WP12: Excellence in ELIXIR Management and Operations
• WP13: Communications, Industry and Community Engagement
• WP14: Ethics requirements

In 2017, the EXCELERATE project held its second Annual General Meeting in Rome on 21–23 March 2017; the three-day event was held in conjunction with the ELIXIR All Hands meeting.

On 10 May 2017 – halfway through the duration of the grant – the mid-term review took place to assess the activities and results of the first half of the ELIXIR-EXCELERATE project. During the mid-term review meeting in Brussels, each Work Package presented its activities and achievements to date and their impact on life-science user communities.

The feedback from the external review was very positive, and according to the assessment report submitted to the European Commission, ELIXIR-EXCELERATE “... fully achieved its objectives and milestones for the period and has delivered exceptional results with significant immediate or potential impact.”

ELIXIR-EXCELERATE was also recognised in the analysis of the Horizon 2020 Research Infrastructures programme1 published in May by the European Commission. In the document, the European Commission presented ELIXIR-EXCELERATE as an early success story and as an illustration of added value benefits provided by Horizon 2020 research infrastructure interventions.

The main ELIXIR-EXCELERATE outputs in 2017
• Published the initial list of ELIXIR Core Data Resources as fundamental resources for life-science research and for the long-term preservation of biological data (WP3)
• Organised four ELIXIR Innovation and SMEs Forums (in Helsinki, January; Barcelona, June; Brussels, October; Paris, November) (WP13)
• Launched ELIXIR Scientific Benchmarking and Technical Monitoring platform (OpenEBench) (WP2)
• Developed technical demonstrator for the secure transfer of large volumes of sensitive human data from the European Genome-phenome Archive (WP9)
• Published version 2 of the ELIXIR Handbook of Operations, which is the main source of information on ELIXIR procedures, recommendations and guidelines, and which is released annually (WP12)
• ELIXIR-EXCELERATE Marine Metagenomics Backbone (WP6) developed new tools, pipeline and reference databases that helped to classify over 70% of previously unclassified 16S rRNAs
• Extended previously proposed standard MIAPPE (Minimal information about a Plant Phenotyping Experiment) (WP7)
• Developed and presented Demonstrator for the secure transfer of large volumes of sensitive human data (WP4)

1. https://ec.europa.eu/research/evaluations/pdf/archive/h2020_evaluations/swd(2017)221-annex-2-interim_evaluation-h2020.pdf

Collaboration with other Research Infrastructures

AARC / AARC2
The AARC (Authentication and Authorisation for Research and Collaboration) project and its successor, AARC2, are e-infrastructure projects that focus on the authentication of researchers and on managing their access rights to services. The AARC projects develop reference architectures to enable research infrastructures and e-infrastructures to take similar approaches to user authentication and authorization (AAI), thereby removing obstacles to cross-infrastructure interoperability.

In 2017, ELIXIR participated in the AARC and AARC2 projects through ELIXIR Finland (CSC – IT Center for Science) and ELIXIR Czech Republic (CESNET). The work focused on developing a level of assurance framework for authenticating users that meets the needs of the life-science community. ELIXIR also worked on a pilot project within the AARC on the integration of CILogon (an integrated open source identity and access management platform for research collaborations) and VOMS (Virtual Organization Membership Service) into the ELIXIR AAI.

Together with CORBEL, which forms part of WP5, the AARC2 project pulled together the requirements on a Life Science AAI and ran a pilot on this AAI using the e-infrastructures. In April 2017, the AARC project also organised an AAI training event for the ELIXIR community. (See also ELIXIR Compute Platform section)

eTRANSAFE
eTRANSAFE is a €40 million project funded by the Innovative Medicines Initiative (IMI), which began in September 2017. The five-year project aims to develop an advanced data integration infrastructure and new computational methods to improve security in the drug development process.

ELIXIR is part of a project consortium that consists of eight academic institutions, six SMEs and twelve pharmaceutical companies, coordinated by the Fundació Institut Mar d'Investigacions Mèdiques (IMIM, part of ELIXIR Spain) and led by the pharmaceutical company Novartis.

ELIXIR leads two main tasks within this project: (1) creating a policy framework that allows industry and other organisations to share drug safety data and to adhere to consistent guidelines for predictive toxicology models; and (2) data interoperability and integration. The technical and scientific work involved in these tasks is being carried out by three ELIXIR Nodes: EMBL-EBI, ELIXIR Denmark (Technical University of Denmark), and ELIXIR Spain (through the Barcelona Supercomputing Centre and IMIM).

EOSCpilot (2017–2018)
The European Open Science Cloud for Research pilot project (EOSCpilot) is supporting the first phase of the development of the European Open Science Cloud (EOSC). It is a consortium of 33 pan-European organisations and 15 third parties that aims to reduce the fragmentation of, and improve the interoperability between, European data infrastructures.

The objectives of the project are to: (1) develop and trial the governance framework for the EOSC and contribute to the development of European open science policy and best practice; and (2) launch a number of demonstrators that will function as high-profile pilots that integrate services and infrastructures. These pilots aim to demonstrate interoperability and its benefits in a number of scientific domains.

ELIXIR is a partner in the Data Interoperability and in the Governance Work Packages of the EOSCpilot.

In 2017, the project’s Data Interoperability Work Package produced a first draft of the strategy and recommendations to help users and services to find and access datasets across several scientific disciplines. This work involves ELIXIR interoperability services such as Bioschemas, DATS1, FAIRsharing, OmicsDI and ELIXIR Core Data Resources, such as PRIDE.

The two major outcomes of the EOSCpilot data interoperability task are: (1) EDMI (EOSC Dataset Minimum Information), a set of crosswalk metadata guidelines on minimum information for finding and accessing datasets; and (2) recommendations on how to support an ecosystem of metadata catalogues in EOSC. These recommendations aim to promote the reuse of existing domain-specific registries (such as ELIXIR registries in the life sciences); the goal is to use existing standards and Bioschemas to show how ELIXIR resources are compliant with EDMI. This can be used to evaluate the FAIRness of datasets and data resources, with a special emphasis on Findability and Accessibility.

In the EOSC Governance Work Package, ELIXIR has co-led the task to develop a first set of Principles of Engagement – a set of basic principles that would apply to service providers and users of the EOSC. The first phase of this task involved mapping the principles and rules that current domain-specific infrastructures and e-Infrastructures have when providing access to users. ELIXIR led several panel discussions at EOSC meetings and workshops to get feedback from service providers and to test concepts with potential users.

1. Sansone SA, Gonzalez-Beltran A, DATS, the data tag suite to enable discoverability of datasets. Sci Data. 2017 Jun 6;4:170059. doi: 10.1038/sdata.2017.59.

ENVRIplus (2015–2019)
ENVRIplus is a Horizon 2020 project linking Environmental and Earth System Research Infrastructures, projects and networks together with technical specialist partners to create a more coherent, interdisciplinary and interoperable cluster of Environmental Research Infrastructures. ELIXIR, represented by EMBL-EBI, provides expertise and resources in the Biodiversity and Ecosystem field and in the ‘Data for science’ theme.

CORBEL (2015–2019)
CORBEL is a collaboration project between 13 ESFRI Biological and Medical Research Infrastructures funded through the EU’s Horizon 2020 programme. The goal of CORBEL is to establish a framework of shared services between the participating infrastructures (BMS RI), which enhances the efficiency, productivity and impact of European biomedical research and its translation into medicine. The CORBEL consortium is led by ELIXIR as the coordinator and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) as co-coordinator.

In the first half of 2017 the project finalised the first periodic report, the contractual financial and technical report submitted to the European Commission, which was the basis of the project's mid-term review. The mid-term review was then held in Brussels in June 2017. According to the evaluation review prepared by the external evaluator appointed by the European Commission, "the project has delivered exceptional results with significant immediate or potential impact (...). Each of the eight work packages has made substantial progress in achieving its stated goals."

In the second half of 2017 CORBEL published its Catalogue of Services, providing an overview of services that can be accessed via the participating research infrastructures (access to samples and technologies, data, tools, expertise and others).

At the 2nd Annual General Meeting in October, the consortium decided to publish a second Open Call that additionally includes services from medical infrastructures. The coordinator has therefore started a formal contract amendment process, asking for a cost-neutral prolongation of the project for 9 months.

EMBRIC (2015–2019)
EMBRIC – initiated in 2015 and financed with 9 million euros from the Horizon 2020 programme – connects marine biotechnology initiatives that focus on science, industry and regional growth. In EMBRIC, ELIXIR partners with research infrastructures such as the European Marine Biological Resource Centre (EMBRC), Microbial Resource Research Infrastructure (MIRRI) and EU-OPENSCREEN, to drive stronger connections between science and industry through a number of “workflows” including bioproduct discovery, leverage of microbiological culture collections and aquaculture breeding strategies informed by genomics.

In 2017, the project further developed its 'Configurator Service'1 and made it available to the marine biotechnology community. The configurator assists marine scientists in planning their data management, sharing, analysis and publication needs.

1. http://www.embric.eu/node/1371

Supporting activities

Capacity Building and Node development

Staff Exchange Programme
The purpose of the ELIXIR Staff Exchange programme is to support capacity building in ELIXIR Nodes and the exchange of best practice in bioinformatics service provision. The programme also strengthens the links between ELIXIR Nodes and supports the interoperability and sustainability of ELIXIR services and data resources.

The first set of Staff Exchange projects was selected in August 2017 through an internal peer-review process. The seven selected staff exchanges started in October 2017 and included a diverse set of projects, for example, to integrate services, tools or workflows from Nodes, to develop ontologies, and to improve computing resources for training.

Based on the initial feedback and experience from this first Call, staff exchange appears to be a powerful mechanism by which to bring Nodes of the distributed ELIXIR infrastructure together. Given this initial success, ELIXIR has allocated funding for two new calls in 2018.

Building an annotation infrastructure in ELIXIR Nodes
The ELIXIR Staff Exchange Programme builds on the experience gained during a pilot Staff Exchange scheme between EMBL-EBI and Masaryk University (ELIXIR Czech Republic). The project trained six graduate students from the Masaryk University (MU) in the annotation of data in the Protein Data Bank in Europe (PDBe), with the goals of building local capacity for the annotation of PDBe data in the Czech Republic and of establishing a longer-term partnership between EMBL-EBI and MU.

In addition to the immediate benefits of building annotation expertise in ELIXIR Czech Republic and of improving the annotation data in PDBe (close to 300 macromolecular complexes were annotated as part of this project), the project also served as a successful proof-of-concept for establishing annotation capacity at, and for strengthening links between, ELIXIR Nodes.

The collaboration between PDBe and ELIXIR Czech Republic now continues with support from an EU Regional Development Funds grant and has expanded to other institutes within ELIXIR Czech Republic.

Figure: ELIXIR Nodes participating in ELIXIR Staff Exchange projects (Sweden, Netherlands, Spain, Slovenia, France, Portugal, Italy, Czech Republic, Finland and EMBL-EBI, with between one and four projects per Node).

ELIXIR Industry Engagement

As the costs of generating '-omics' data decrease, the bioinformatics industry will continue to grow. This will, in turn, increase the demand for a robust and sustainable infrastructure for public life-science data. ELIXIR’s industry engagement supports the use of public bioinformatics resources by research-intensive companies and by Small to Medium-Sized Enterprises (SMEs).

The ELIXIR Industry Strategy has set five objectives to increase the awareness of public bioinformatics resources and to promote open innovation:
• Increase industry usage of ELIXIR resources and ensure the name is synonymous with quality
• Enable Open innovation by Europe’s SMEs
• Build effective partnerships with key industry stakeholders and initiatives
• Ensure effective communication between industry and ELIXIR
• Support the bioinformatics training needs of industry

ELIXIR SME and Innovation Programme
The main focus of ELIXIR’s industry engagement activities in 2017 was the Innovation and SME programme of four events:
• Helsinki, Finland, in February
• Barcelona, Spain, in June
• Brussels, Belgium, in October
• Paris, France, in November.

ELIXIR Finland and ELIXIR Estonia, together with the Global Alliance for Genomics and Health, hosted the ELIXIR Innovation and SME Forum in Helsinki. This forum was aimed at companies in the genomics and health domains that use public bioinformatics resources and are looking to further streamline this process by using global data resources available through ELIXIR.

The forum held in Barcelona, hosted by ELIXIR Spain, focused on public-private partnerships in genomics, bioinformatics and health. The programme featured presentations on some of the key resources in ELIXIR and ELIXIR Spain. The programme also included a technical seminar that introduced funding opportunities in the pre-competitive space, such as those offered by the Innovative Medicines Initiative, as well as a hands-on demonstration of resources that are available through ELIXIR and Bioinformatics Barcelona.

Panel discussion at the ELIXIR Innovation and SME Forum in Barcelona, June 2017.

The Brussels event, hosted by ELIXIR Belgium, brought together companies that are active in the probiotics, food and health sectors, to stimulate interactions between companies and academic partners. It showcased ELIXIR resources in the microbial research space, such as ENA, PRIDE and Metagenomics Use Case tools. The programme also featured talks by Unilever and DSM, by a selection of SMEs, as well as by the European Commission. The event was also co-located with a workshop titled ‘Personalised nutrition for better health – targeting the microbiome’, which was co-organised by OECD and the Department of Economy, Science and Innovation of the Flemish government.

The final ELIXIR Innovation and SME Forum of 2017 was held in Paris, France, and focused on Rare Diseases and Personalised Medicine. In this lunch-to-lunch event, attendees were immersed in the world of data-driven innovation, illustrated through talks by innovative companies and by presentations of ELIXIR’s open data resources and services, such as Orphanet, and of resources developed by the ELIXIR Rare Diseases Use Case.

These four events attracted a total of 382 participants, the largest number of attendees we have so far supported in a single year. The average level of satisfaction of the participants exceeded 95%, as demonstrated in the post-event feedback survey.

Furthermore, four collaborative projects and proposals involving eight different companies (all SMEs) and ELIXIR Nodes were initiated as a result of the Innovation and SME Forums organised in 2017.

ELIXIR’s Industry Advisory Committee
The ELIXIR Industry Advisory Committee (IAC) met for the third time in early 2017. Based on their discussion, the IAC presented a set of high-level recommendations for ELIXIR, including the need to keep developing good links with other successful industry–academia initiatives, such as the Innovative Medicines Initiative. The IAC also recommended continuing to work with regional SME associations, to maximize the number of companies that can find out about and access ELIXIR’s services.

Members of the ELIXIR Industry Advisory Committee. From left to right: Filip Pattyn (ONTOFORCE), Andreas Kremer (ITTM, Luxembourg, appointed in November 2017), Abel Ureta-Vidal (Eagle Genomics, appointed in November 2017), Ian Barrett (AstraZeneca UK), Elizabeth Reynolds (General Bioinformatics, Vice Chair) and Iain Hrynaszkiewicz (Springer Nature, Chair). Members of ELIXIR Hub: Niklas Blomberg (ELIXIR Director), Andrew Smith (ELIXIR External Relations Manager) and Pablo Roman (ELIXIR Industry Officer). Not pictured: Martin Ebeling, Anita Eliasson, Natalia Jiménez Lozano, Christian Paulitz, Angel Pizarro, Philippe Sanseau, Sándor Szalma, Sara Paulina de Oliveira Monteiro, Claus Stie Kallesøe, and Belinda Clarke.

International collaboration

Global collaboration is a cornerstone of ELIXIR’s implementation; users come from all over the world, many databases are run as part of global collaborations, and to ensure an effective, integrated data infrastructure, relevant initiatives must work in close cooperation. ELIXIR’s global collaborations are mapped out in the International Strategy1, which underwent a revision and update in 2017. The International Strategy is aimed at a broad range of stakeholders, including: users and communities across the world; global informatics and data initiatives; policy makers and funders; and ELIXIR partners. It acts as a roadmap that showcases how ELIXIR engages with key global initiatives and countries.

The objectives of the International Strategy are to:
• Ensure that ELIXIR serves life-science users and communities across the globe
• Support collaboration between ELIXIR and relevant global bioinformatics and data initiatives
• Shape global science policy discussions on data and research infrastructures
• Develop formal collaborations with those countries outside of Europe where there is mutual benefit

In 2017, activities took place to support the implementation of all of the above objectives. Collaboration Strategies were concluded with key partners, including with the Global Alliance for Genomics and Health (GA4GH) and the BioExcel project2. ELIXIR’s Training Platform continued to collaborate closely with the GOBLET training initiative, with which ELIXIR already has a pre-established Joint Training Strategy.

In April 2017, ELIXIR’s Board approved a formal document that set out the principles that ELIXIR should follow when considering Membership applications from countries outside of ESFRI3. Whilst any country in the world can use the services run by ELIXIR, the agreed strategy when considering a formal application to become a Member from outside of ESFRI is to consider the relative maturity of that country’s bioinformatics landscape and the benefits to existing Members of that country joining ELIXIR. This document now acts as a guide in ELIXIR’s dialogue with countries outside of Europe.

Other key work globally in 2017 included the effort – led by the Data Platform – to support the long-term sustainability of Core Data Resources through engagement in the global coalition to support life-science data resources (see Data Platform section for further details).

1. https://www.elixir-europe.org/elixir-international-strategy

2. https://bioexcel.eu/

3. See ESFRI Member States: http://www.esfri.eu/delegates

Impact and sustainability

Throughout 2017, ELIXIR undertook a range of activities in order to understand and to demonstrate the impact of public life-science data. By showcasing the value of investments in ELIXIR and of a public bioinformatics infrastructure generally, important steps can be taken towards ensuring the sustainability of these publicly-funded resources.

ELIXIR was interviewed as part of the OECD’s work to develop a methodology for assessing the socio-economic impact of research infrastructures across all disciplines. ELIXIR was also invited to present at a number of OECD workshops on the theme of business models for databases and international data infrastructures, both of which ensured that ELIXIR’s visibility remained high with policy makers.

A proposal to Horizon 2020 on impact assessment was also prepared and submitted in early 2017. The project proposal, RI Impact Pathways, brought together economists and impact-assessment experts to study four existing infrastructures – ELIXIR, CERN, DESY and ALBA – and to attempt to develop a model for impact assessment. The proposal was selected for funding in autumn 2017 and began in early 2018.

Desk-based research and interviews with many Small to Medium-Sized Enterprises (SMEs) were carried out in late 2017. These interviews were performed to describe the role, and to understand the importance, of public life-science databases to industry. This work culminated in the report, published in early 2018, entitled ‘Public Data Resources as a Business Model for SMEs’1.

In order to shape the science policy and funding landscape, ELIXIR continued to produce a range of Position Papers and responses to relevant consultations. This included formal submissions to the Interim Evaluation of Horizon 2020, where ELIXIR presented a case for the need for appropriate, long-term funding for data infrastructures, and a Position Paper on ‘FAIR Data Management in the Life Sciences’, which also doubled as ELIXIR’s endorsement of the European Open Science Cloud (EOSC) Declaration2.

1. Roman Garcia P, Smith A and Blomberg N. Public data resources as a business model for SMEs. The Role of Public Bioinformatics Infrastructure in supporting innovation in the life sciences. F1000Research 2018, 7(ELIXIR):590 (document) (doi: 10.7490/f1000research.1115445.1)

2. Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document) (doi: 10.7490/f1000research.1114985.1)

Communications

Communications strategy review

The ELIXIR Communications strategy was first released in May 2016. In 2017, the Communications strategy was reviewed against: (1) the mission and objectives of ELIXIR; and (2) the communications activities of other ESFRI research infrastructures and of other organisations similar to ELIXIR. The resulting report presented qualitative and quantitative data to measure the effectiveness of ELIXIR Communications and provided a series of recommendations for specific communications channels and audiences.

The available data showed that ELIXIR’s communication presence in general outperforms that of other ESFRI research infrastructures. In particular, ELIXIR has a strong presence in social media and has built a strong audience. This digital communication is complemented by a significantly improved ELIXIR website, both in terms of content and design (see below).

Another strong communications asset is ELIXIR’s Newsletter, which is considered to be the most useful communications channel, particularly for ELIXIR’s internal audience.

New ELIXIR website

The layout, design and content structure of the ELIXIR website were redesigned in Spring 2017. The main goal of this redesign was to improve the user experience of the site, by simplifying the navigation structure throughout the website, by updating its content, and by improving the site’s layout and design.

The feedback on the redesigned ELIXIR website, and the data collected during the Communications review (see above), suggest that the website’s new content is well-aligned with the needs of the ELIXIR stakeholders. Overall, the website was rated as good or very good by over 63% of the survey respondents.

ELIXIR videos

Throughout 2017, the ELIXIR Hub released two more promotional videos on the ELIXIR YouTube channel, which complement the ELIXIR profile video published in October 2017. The first video in 2017 was released in March and focused on the impact of ELIXIR. It presents how life-science researchers use three particular resources that are available through ELIXIR (Human Protein Atlas, SwissProt/UniProt, and Europe PMC), and also how industry can benefit from open bioinformatics data. The second video, published in November, features ELIXIR activities in Human genomics and translational data. Through interviews with senior scientists from ELIXIR Nodes and members of the ELIXIR leadership team, the video presents a portfolio of services to facilitate the sharing, exchange and re-use of genetics data.

The three videos will be complemented by one additional video to be released in 2018 to present the ELIXIR Training Programme.

In 2017, the ELIXIR Hub continued with the ELIXIR webinar series and organised 16 webinars to present the work of ELIXIR Implementation studies, ELIXIR Platforms and Use Cases.

[Figure: ELIXIR video on Human Genomics and Translational Data. All ELIXIR videos are available on the ELIXIR YouTube channel: https://youtu.be/stTY6fxwonY]

ELIXIR at ISMB/ECCB 2017

In 2017, the major conference for ELIXIR was the joint ISMB-ECCB Conference (Intelligent Systems for Molecular Biology – European Conference for Computational Biology), which was held on 21–25 July in Prague, the Czech Republic.

The programme of the conference featured a dedicated track to showcase ELIXIR activities and services. The ELIXIR Special Track presented ELIXIR Platforms and Use Cases, the ELIXIR Capacity building programme, and ELIXIR Industry support. Participants could also learn about ELIXIR services at the ELIXIR demonstration stand, where they could talk directly to developers and operators of selected ELIXIR resources.

The ISMB-ECCB 2017 was also organised with the support of ELIXIR Czech Republic.

ELIXIR Gateway on F1000Research

The ELIXIR Gateway on F1000Research was launched in December 2015, as a platform to collect and capture ELIXIR’s research and technical outputs. In 2017, the ELIXIR channel published 12 articles, all transparently peer-reviewed through the F1000Research invited post-publication peer review process.

The most successful article published in the ELIXIR Gateway in 2017 was by Jiménez, Kuzak et al.¹, which presented recommendations to encourage best practices in research software development. This has been viewed by over 2,200 readers. Other popular publications include articles by Morgan, Palagi et al. presenting the ELIXIR Train-the-Trainer programme² and by Gabella, Durinx and Appel exploring a sustainable funding model for life science data resources³.

The editorial oversight of the ELIXIR Gateway is provided by an Advisory Board, who review all papers submitted to the Gateway to ensure all materials are relevant to the ELIXIR community.

The members of the Advisory Board of the ELIXIR Gateway in 2017 were:
• Niklas Blomberg, ELIXIR Director
• Inge Jonassen, University of Bergen, Head of ELIXIR Norway
• Arlindo Oliveira, Instituto Superior Técnico, Head of ELIXIR Portugal
• Bengt Persson, Uppsala University, Sweden, Head of ELIXIR Sweden
• Graziano Pesole, The University of Bari Aldo Moro, Head of ELIXIR Italy

1. Jiménez RC, Kuzak M, Alhamdoosh M et al. Four simple recommendations to encourage best practices in research software [version 1; referees: 3 approved]. F1000Research 2017, 6:876 (doi: 10.12688/f1000research.11407.1)

2. Morgan SL, Palagi PM, Fernandes PL et al. The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training [version 1; referees: 2 approved]. F1000Research 2017, 6:1557 (doi: 10.12688/f1000research.12332.1)

3. Gabella C, Durinx C and Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case [version 2; referees: 3 approved]. F1000Research 2018, 6(ELIXIR):2051 (doi: 10.12688/f1000research.12989.2)

Governance

To ensure the integration of bioinformatics services into a coherent distributed infrastructure, it is critical to create effective links between the national institutes that make up the ELIXIR Nodes, and between those institutes and the ELIXIR Hub. This is done through the signing of Collaboration Agreements, which allow the ELIXIR Nodes to receive funding from the ELIXIR Hub for Commissioned Services.

In 2017, the ELIXIR Hub worked closely with the ELIXIR Nodes on the development of Collaboration Agreements. During this period, four Collaboration Agreements (between the ELIXIR Hub and the ELIXIR Nodes for Belgium, Italy, Luxembourg and the UK) were signed. By the end of 2017, a total of fifteen ELIXIR Nodes had their Collaboration Agreements in place. Negotiations will continue with the remaining Nodes in 2018.

ELIXIR Node Service Delivery Plans (SDPs) describe the scientific and service provision content that each ELIXIR Node provides through ELIXIR. In 2017, the Hub prepared an SDP with ELIXIR Luxembourg and submitted it for review and approval to the ELIXIR Board during its 2017 Spring meeting. The portfolio of ELIXIR services, as put forward by ELIXIR Nodes, is available on the ELIXIR website (http://elixir-europe.org/services).

[Figure: Number of completed governance agreements per year, 2013–2017 (vertical axis 0–20). Series shown: ELIXIR Consortium Agreements, Service Delivery Plans, Collaboration Agreements, First Commissioned Service Contracts.]

The growth of ELIXIR in its first four years (2014–2017) is demonstrated by the number of ELIXIR members (ELIXIR Consortium Agreements). The growing number of completed Collaboration Agreements, Service Delivery Plans and Commissioned Service Contracts illustrates the successful implementation of the ELIXIR Governance structure. These contracts allow ELIXIR Nodes to collaborate and the ELIXIR Hub to commission technical services.

ELIXIR Scientific Processes Working Group

The ELIXIR Heads of Nodes established a Working Group to develop scientific processes within ELIXIR, with the specific aim for 2017 of establishing a process for the selection of new Communities. The Working Group also developed recommendations for processes concerning the Programme, annual Work Plans, and Implementation Studies, and will continue its work through to 2018.

Handbook of Operations

The ELIXIR Handbook of Operations, which underwent an update in 2017, continues to be the authoritative source of information on ELIXIR procedures, recommendations and guidelines, strategies and reference documents. It is aimed at the whole ELIXIR community, including all staff in ELIXIR Nodes, ELIXIR Hub staff, ELIXIR Board members and national funders. The topics covered by the Handbook include Governance, Nodes and Service provision, ELIXIR Programme cycle, Project management, Communications and External relations, and Technical operations.

The ELIXIR 2019–2023 Programme

In 2017, we also initiated the development of the next ELIXIR Scientific Programme. ELIXIR operates through quinquennial Programmes, the first of which started in 2014. The next Programme will span the years 2019–2023. In order to be ready for approval by the ELIXIR Board in November 2018, the first draft of the next Programme was prepared in 2017 and reviewed by the Heads of Nodes. This Programme will also be reviewed by the ELIXIR Scientific Advisory Board in early 2018. The Programme is developed along with the 2019–2023 Financial Plan, which lays out how the ELIXIR Budget is planned to be used, both for coordination and Commissioned Services activities.

ELIXIR Hub staff

In 2017, the ELIXIR Hub significantly expanded its team and strengthened its coordination capacity.

The recruitment of the new ELIXIR Chief Technical Officer (CTO) and the appointment of ELIXIR Platform Coordinators was a major step in terms of establishing technical support and coordination, both within and between ELIXIR Platforms and Use Cases. This structural change allowed the technical and domain experts in the Nodes to focus on their technical work, and provided greater capacity to drive the development of technical strategy across ELIXIR Platforms and Use Cases.

Jerry Lanfear joined the ELIXIR Hub in September 2017 as its new Chief Technical Officer (CTO). As CTO, Jerry’s main task is to lead the design and implementation of an ELIXIR-wide technical strategy and to oversee the developments of ELIXIR Platforms and the interaction between them. Jerry joined ELIXIR from Pfizer where he was a Senior Director within the IT group with responsibility for IT service delivery across R&D within the UK and for data management globally.

Jerry took over the CTO position from Rafael Jiménez, who had served as ELIXIR CTO from 2014. Following Jerry’s appointment, Rafael moved to the post of Chief Data Architect, working on key projects within ELIXIR, such as Bioschemas and the European Open Science Cloud.

In the ELIXIR Hub Project Management Unit, Juan Arenas replaced Steffi Suhr as the EXCELERATE project manager in March 2017, and became the Head of the Project Management Unit in July. Juan came to ELIXIR with over 15 years of experience in project management. Before joining ELIXIR, Juan worked as Portfolio Manager & Technology Officer at the University of Sheffield, UK. Prior to that, Juan led ICT projects for global companies in a variety of sectors and technologies in Spain.

Rachel Drysdale was appointed ELIXIR Data Platform Coordinator in March, working closely with the Data Platform leadership team, coordinating the activities of the Platform, notably the process of identifying ELIXIR Core Data Resources. Rachel joined ELIXIR from PLOS, where she had worked as Manager of Taxonomy Systems and Analysis, and before that as Consulting Editor for PLOS ONE.

Members of the ELIXIR Hub (from left to right): Juan Arenas, Rafael Jiménez, Niklas Blomberg, Laura Mangan, Jerry Lanfear, Kayla Wiles, Phyllida Hallidie, Rachel Drysdale, Andrew Smith, Susanna Repo, John Hancock, Sheena Lee, Martin Cook, Premysl Velek, Dana Černošková, Pablo Roman, Friederike Schmidt-Tremmel, David Lloyd and Pascal Kahlem (external consultant).

In June, Susheel Varma joined the ELIXIR Hub as Technical Coordinator for Human Genomics and Translational Data, and worked to develop and coordinate services within the Human Genomics and Translational Data portfolio, headed by Serena Scollen, Head of Human Genomics and Translational Data in ELIXIR. Susheel left the ELIXIR Hub at the end of 2017 and took up a new position at EMBL-EBI, where he works as ELIXIR Competency Centre Project Manager. Norman Morrison left the position of ELIXIR Interoperability Platform Coordinator in September 2017, to take up a new post in the Human Cell Atlas project.

The Technical Team also appointed David Lloyd as a Project Coordinator and John Hancock as Communities and Services Coordinator. David Lloyd works with the ELIXIR Nodes in developing their ELIXIR Service Delivery Plans and also supports the ELIXIR Beacons project. David previously worked at the Sanger Institute where he was a coordinator for the Global Alliance for Genomics and Health.

John Hancock leads the development of research communities associated with ELIXIR’s Use Cases and their integration with ELIXIR Platforms. John has broad experience in various research and management positions in computational biology. His previous posts include being a Group Leader at the MRC Clinical Sciences Centre, a Reader in Computational Biology at Royal Holloway University of London, and Head of Bioinformatics at the MRC Mammalian Genetics Unit. In 2015, he moved to the Earlham Institute in Norwich to work as ELIXIR UK Node Coordinator.

Dana Černošková joined the Hub in January as ELIXIR Events Officer, covering for Melissa Balzano, who went on maternity leave. To support the overall administration of the ELIXIR Hub office, the ELIXIR Hub recruited Laura Mangan as Administrative Assistant.

The ELIXIR Hub also hosted two interns. Guillermo Calderon Mantilla spent six months (March – August) supporting the Bioschemas initiative; Kayla Wiles joined the External Relations team in June for six months to support the communications activities of the Hub, namely the evaluation of the Communications Strategy and development of the ELIXIR Node Toolkit.

Governance committees and financial data

ELIXIR Committees

ELIXIR Board

Chair: Prof Rein Aasland, Norway

Vice Chairs: Dr Ruben Kok, Netherlands; Prof , Italy

Country Scientific delegate Administrative delegate

Belgium Laurence Lenoir Michele Oleo, Didier Flagothier

Czech Republic Jaroslav Koča Jan Buriánek

Denmark Troels Tvedegaard Rasmussen

Estonia Pärt Peterson Toivo Räim, Priit Tamm

Finland Per Öster Riina Vuorento, Jarmo Wahlfors

France Claudine Medigue Eric Guittet

Germany Roland Eils, Alexander Goesmann Johannes Mohr

Hungary László Patthy Gábor Tóth

Ireland Marion Boland, Dara Dunican TBA

Israel Yossi Kalifa Ilana Lowi

Italy Rita Casadio (from April 2017) Salvatore La Rosa

Luxembourg Rudi Balling, Regina Becker Lynn Wenandy, Pierre Misteri (both from December 2017)

Netherlands Ruben Kok Bea Pauw

Norway Rein Aasland, Stig Omholt

Portugal Ana Teresa Freitas Andreia Feijão, Tiago Saborida

Slovenia Damjana Rozman Albin Kralj

Spain Ferran Sanz Cristina Bauluz, Dr Rafael de Andres-Medina

Sweden Björn Andersson Karl Gertow (from October 2016) Anna Wetterbom (stepped down end of 2016)

Switzerland Christian von Mering Isabella Beretta (from August 2017)

UK Chris Rawlings Mark Palmer, Amanda Collis

EMBL Iain Mattaj, Silke Schumacher

Heads of Node Committee

Chair Niklas Blomberg, ELIXIR Director

Country Head of Node

Belgium Yves Van de Peer

Czech Republic Jiří Vondrášek

Denmark Søren Brunak

Estonia Jaak Vilo

Finland Tommi Nyrönen

France Claudine Médigue, Jacques van Helden (from March 2017)

Germany Alfred Pühler

Hungary Balázs Gyorffy (from January 2018)

Ireland Walter Koch

Israel

Italy Graziano Pesole

Luxembourg Reinhard Schneider

Netherlands Jaap Heringa

Norway Inge Jonassen

Portugal Arlindo Oliveira

Slovenia Brane Leskošek

Spain Alfonso Valencia

Sweden Bengt Persson

Switzerland Ron Appel

UK Carole Goble

EMBL-EBI ,

Scientific Advisory Committee

Chair
Robert Gentleman, 23andMe, USA

Vice Chair
Dr Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Germany

Members
Prof Pascal Borry, University of Leuven, Belgium
Prof Elina Ikonen, University of Helsinki, Finland
Prof Larry Hunter, University of Colorado, USA
Prof Nicola Mulder, UCT Computational Biology Group (NBN), South Africa
Dr Francis Ouellette, Ontario Institute for Cancer Research, Canada
Prof Juni Palmgren, Karolinska Institutet, Sweden
Dr Susan E. Wallace, University of Leicester, UK
Dr Jérôme Wojcik, Quartz Bio, Switzerland

Industry Advisory Committee

Chair
Iain Hrynaszkiewicz, Springer Nature

Vice Chair
Elizabeth Reynolds, General Bioinformatics, UK

Members
Martin Ebeling, Hoffmann-La Roche, Switzerland
Anita Eliasson, Biocomputing Platforms Ltd, Finland
Natalia Jiménez Lozano, Atos, Spain
Christian Paulitz, Bayer CropScience, Germany
Angel Pizarro, Amazon Web Services, USA
Philippe Sanseau, GlaxoSmithKline, UK
Sándor Szalma, Takeda Pharmaceuticals, USA
Sara Paulina de Oliveira Monteiro, P-BIO, Portugal
Claus Stie Kallesøe, Gritsystems A/S, Denmark
Belinda Clarke, Agri-Tech East, UK

Members of the SAB (from left to right): Janet Kelso, Larry Hunter, Francis Ouellette, Robert Gentleman (Chair), Nicola Mulder, Elina Ikonen and Juni Palmgren (not pictured: Jérôme Wojcik, Susan E. Wallace, Pascal Borry and Alan Archibald).

Implementation studies in 2017

Implementation Studies are short-term projects carried out by ELIXIR Nodes that address key scientific and technical issues within ELIXIR. The outcome of an Implementation Study might be a description of service requirements, a piece of software, or a technical deliverable with an accompanying report.

Implementation Studies are funded through the budget of the ELIXIR Hub and form part of ELIXIR’s ongoing activities in a particular Platform or Use Case. They are proposed by Platforms, agreed with the ELIXIR Heads of Nodes Committee, and the contracts are approved by the ELIXIR Board.

In 2017, ELIXIR ran 13 Implementation Studies, with one additional study approved in 2017 with a start date of early 2018. All ongoing and completed Implementation Studies are listed at https://www.elixir-europe.org/aboutus/implementation-studies

In total, the work put into these studies came to 250 Person Months and required an approximate budget of €2.6 m, demonstrating the rapid maturity of ELIXIR and the increasing focus within the ELIXIR budget on supporting technical activities that are carried out by the ELIXIR Nodes. On average, 2.5 Nodes participated in each study and a total of nine Nodes were involved across all studies.

In addition, during 2017, the ELIXIR Platforms laid the groundwork for a large portfolio of Implementation Studies to be carried out in 2018. They did this by planning and developing project ideas for a further eight projects, which have been approved to start on 1 January 2018.

Completed in 2017

Name | Sector | Nodes | Leads
Solutions for IMI data management | Human Data | EMBL-EBI, ELIXIR Hub | Dylan Spalding, EMBL-EBI; Susanna Repo, ELIXIR Hub; David Henderson, IMI OncoTrack
Data Resource Implementations for the Global Alliance for Genomics and Health Data Schema | Human Data | Switzerland, France, EMBL-EBI | Michael Baudis, CH

Started and finished in 2017

Name | Sector | Nodes | Leads
ELIXIR Beacons | Human Data | Belgium, EMBL-EBI, Spain, Switzerland, Netherlands, Finland, Sweden | Serena Scollen, ELIXIR Hub; Susheel Varma, ELIXIR Hub; Ilkka Lappalainen, FI; Jordi Rambla, ES; Michael Baudis, CH; Dylan Spalding, EMBL-EBI
Assessment of the operation of the ELIXIR AAI | Compute | Czech Republic, Finland | Michal Prochazka, CZ; Mikael Linden, FI
Interpretation of phenotypic and genotypic variation for rare diseases in terms of biological pathways | Rare Disease | Netherlands, Spain, Italy | Friederike Ehrhart, NL; Chris Evelo, NL; Marco Roos, NL

Ongoing in 2017

Name | Sector | Nodes | Leads
Implementation Study on Data Identification and Interoperability | Interoperability | EMBL-EBI | Sarala Wimalaratne, EMBL-EBI
The scientific and economic impact of ELIXIR Data Resources – Towards a sustainable funding model for the UniProt-SwissProt use case | Data | Switzerland | Christine Durinx, CH
Integrating distributed resources in Ensembl Genomes | Data | EMBL-EBI, Sweden, Norway | Paul Kersey, EMBL-EBI
Microbial metabolism resource for Systems Biology | Data | France, Switzerland, EMBL-EBI | Claudine Medigue, FR
Proteomics infrastructure service | Data | EMBL-EBI, Germany | Juan Antonio Vizcaino, EMBL-EBI
Bioschemas | Interoperability | UK, EMBL-EBI, NL | Alasdair Gray, UK; Carole Goble, UK; Rafael Jimenez, ELIXIR Hub
Visualization of aligned genomics data for rare diseases (RD-Connect) as a driver for real-time access of controlled data at the EGA | Rare Diseases | Spain, EMBL-EBI | Sergi Beltran, Jordi Rambla, Joaquín Dopazo, Alfonso Valencia and Salvador Capella, ES
Integrating ELIXIR Luxembourg into ELIXIR activities | Human Data | Luxembourg | Reinhard Schneider, LU
Integrating ELIXIR Italy into ELIXIR activities | Data, Tools, Interoperability, Human Data, Rare Disease, Marine | Italy | Graziano Pesole, IT
Using clouds and VMs for bioinformatics training (Workshop as a Service) | Training, Compute | Finland, Netherlands, Switzerland, France, UK, Belgium, Spain, Slovenia, Germany | Eija Korpelainen, FI
ELIXIR integration from a user perspective | Training & Tools | UK, Estonia, Belgium, Denmark, Switzerland, EMBL-EBI, Norway, France | Frederik Coppens, BE
ELIXIR Proof of concept study on the availability of big datasets on remote compute infrastructure | Compute | Sweden, Germany, Czech Republic, EMBL-EBI, Finland | Mikael Borg, SE

Timeline of ELIXIR Implementation studies in 2017

[Figure: timeline running from Q3 2016 to Q2 2018. Studies shown: OncoTrack feasibility study; Data Identification and Interoperability; Data Resource Implementation for GA4GH schema; Sustainability of data resources; Towards distributed Ensembl; Mining the proteome; Systems biology; ELIXIR AAI; ELIXIR Beacons; Bioschemas; Visualisation of Rare Disease data; Making RD data sources FAIR; Integration of ELIXIR Luxembourg; Integration of ELIXIR Italy; Using Clouds in Training; Integration from User Perspective; Big Data in remote compute infrastructure. Legend (implementation studies led by): Human Data Use Case, Rare Disease Use Case, Training Platform, Compute Platform, Data Platform, Interoperability Platform, Node driven study.]

Financial data

Appendix 1. ELIXIR Income and Expenditure for 2017 (Note 30 to EMBL annual accounts)

In its 2014 Summer meeting, EMBL Council unanimously approved ELIXIR’s legal framework, including its status within EMBL as a “Special Project” as well as EMBL’s membership of ELIXIR (EMBL/2013/16/Rev 1).

As of 31 December 2015, the total number of signatories to the ECA stood at 15 (with France and Spain as Provisional Members) with Slovenia additionally making contributions to its financing.

As of 31 December 2016, the total number of signatories to the ECA stood at 20 (with France and Spain as Provisional Members) with Greece additionally making contributions to its financing as an Observer.

The budget of ELIXIR is set annually by the ELIXIR Board and all funds related to its activities, including its surplus, are ring-fenced within EMBL’s accounts.

All amounts in €000 | 2017 Actual | 2017 Budget (Revised) | 2017 Budget (Original) | 2016 Actual

Income
ELIXIR Member state contributions
Ordinary contributions (a) | 5,035 | 5,059 | 5,033 | 3,710
Foreign exchange (loss)/gain on sterling contributions (b) | (3) | - | - | 50
Grant income (c) | 984 | 1,301 | 1,244 | 899
Other income | - | - | - | 1
Net income | 6,016 | 6,360 | 6,277 | 4,660

Expenditure
Technological activities
Salaries | 405 | 680 | 610 | 195
Running costs | 58 | 259 | 217 | 151
Equipment and depreciation | 2 | 9 | 9 | -
Commissioned services | 1,311 | 3,493 | 3,671 | 435
Total expenditure Technological Activities | 1,776 | 4,441 | 4,507 | 781
Directorate and Administrative expenditure
Salaries | 686 | 1,085 | 882 | 640
Running costs | 372 | 564 | 497 | 260
Equipment and depreciation | 14 | 9 | 9 | -
Total expenditure Directorate and Administration | 1,072 | 1,658 | 1,388 | 900
Support and Admin Infrastructure costs | 626 | 1,439 | 1,119 | 342
Grant expenditure incurred | 987 | 1,301 | 1,244 | 899
Total Expenditure | 4,461 | 8,839 | 8,258 | 2,922
Surplus/(Deficit) | 1,555 | (2,479) | (1,981) | 1,738
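As a reading aid, the subtotals in the income and expenditure table above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the audited accounts. It assumes that the figures are in €000, that the full stop in the extracted figures is a thousands separator (so 5.035 means 5,035), and that a dash means zero. The variable and function names are hypothetical.

```python
# Figures in EUR thousands, transcribed from the income and expenditure table.
# Column order: 2017 actual, 2017 revised budget, 2017 original budget, 2016 actual.
# Assumption: "-" in the table is treated as 0.
COLUMNS = ["2017 actual", "2017 revised", "2017 original", "2016 actual"]

income = {
    "ordinary_contributions": [5035, 5059, 5033, 3710],
    "fx_gain_or_loss": [-3, 0, 0, 50],
    "grant_income": [984, 1301, 1244, 899],
    "other_income": [0, 0, 0, 1],
}

expenditure = {
    "technological_activities": [1776, 4441, 4507, 781],
    "directorate_and_administration": [1072, 1658, 1388, 900],
    "support_and_admin_infrastructure": [626, 1439, 1119, 342],
    "grant_expenditure": [987, 1301, 1244, 899],
}


def column_totals(rows):
    """Sum each column across all rows of a section."""
    return [sum(values) for values in zip(*rows.values())]


net_income = column_totals(income)
total_expenditure = column_totals(expenditure)
# Surplus is positive, deficit is negative.
surplus = [i - e for i, e in zip(net_income, total_expenditure)]

for name, inc, exp, bal in zip(COLUMNS, net_income, total_expenditure, surplus):
    print(f"{name}: income {inc}, expenditure {exp}, surplus/(deficit) {bal}")
```

Under these assumptions the computed totals agree with the Net income, Total Expenditure and Surplus/(Deficit) rows as printed in the table.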

a) ELIXIR Member state contributions

Country | 2017 Actual (€000) | 2016 Actual (€000)
Belgium | 145 | 123
Czech Republic | 54 | 46
Denmark | 93 | 79
Estonia | 6 | 5
Finland | 72 | 61
France | 846 | 719
Germany | 1,061 | 376
Greece | 26 | 22
Hungary | 34 | -
Ireland | 56 | 20
Israel | 66 | 56
Italy | 625 | 531
Luxembourg | 10 | 4
Netherlands | 241 | 205
Norway | 133 | 113
Portugal | 63 | 54
Slovenia | 14 | 10
Spain | 438 | 372
Sweden | 147 | 125
Switzerland | 188 | 159
United Kingdom | 717 | 630
Total | 5,035 | 3,710

b) The ELIXIR Board approved that, from January 2016, the UK will pay its member state contributions in Sterling (Elixir/2015128). The difference between the value of these contributions in Euros at the date of payment and at the date of the approval of the 2017 budget was a loss of €3k (2016: gain of €50k).

c) Grant income

Item | 2017 Actual (€000) | 2016 Actual (€000)
Grant funding awarded | 4,096 | 5,832
Grant income earned in the current year | 984 | 899
Grant expenditure incurred in the current year | (987) | (899)
Unutilised grant income | 2,060 | 4,892

(d) The following countries have amounts due or prepaid at 31 December 2016

Values in €000 | Contribution 2017 | Contribution 2016/2015/2014 | Total | Prepayments for 2018
Greece | 26 | 45 (e) | 71 | -
Israel | - | - | - | 91
Total | 26 | 45 | 71 | 91

(e) A provision has been raised against the 2013/2014 contributions, refer note 12.3.

Credits and acknowledgements

Produced on the direction of the ELIXIR Board in May 2018.

With special thanks to all of those who contributed to the development of ELIXIR infrastructure in 2017, most notably Heads of Nodes, Platform and Use Case leads, Technical and Training Coordinators and members of the various Working Groups.

© 2018 ELIXIR

This publication was produced by the External Relations team at the ELIXIR Hub

For more information about ELIXIR please contact [email protected]

Art direction and design: Design Science
Cover and divider pages: Keith Peters

ELIXIR is building a sustainable European Infrastructure for biological information, supporting life science research and its translation to:
Medicine
Environment
Bioindustries
Society

Contact
Niklas Blomberg, Director
ELIXIR
Wellcome Genome Campus
Hinxton, Cambridgeshire CB10 1SD, United Kingdom
+44 (0)1223 492 670
+44 (0)1223 494 468
[email protected]
www.elixir-europe.org