ELIXIR

Annual Report 2016


Contents

Foreword by Torsten Schwede, Chair of ELIXIR Board (2015–2016)
Foreword by ELIXIR Director

ELIXIR Platforms and Use Cases
• Structure of ELIXIR activities
• Tools: Services and connectors to drive access and exploitation
• Data: Sustaining Europe’s life science data infrastructure
• Compute: Access, exchange and storage
• Interoperability: Integration of data and services
• Training: Professional skills for managing and exploiting data
• Human data: Use case
• Rare diseases: Use case
• Marine metagenomics: Use case
• Plant sciences: Use case
• Platform and Use Case leaders

ELIXIR members
• Introducing ELIXIR UK
• Introducing ELIXIR Germany
• ELIXIR Nodes updates

2016 highlights

EU projects
• ELIXIR-EXCELERATE
• Collaboration with other Research Infrastructures

Supporting activities
• Capacity building and Node development
• Governance
• Industry engagement
• International collaboration
• ELIXIR Working Groups

ELIXIR Hub activities
• ELIXIR Communications
• ELIXIR Hub staff

Governance Committees and financial data
• ELIXIR Committees
• Financial data
• Commissioned Services: ELIXIR Implementation studies

Foreword by Torsten Schwede

I have watched ELIXIR develop since its conception and have been delighted to see its continued growth, year on year, since the Preparatory Phase.

Yet 2016 was in many ways transformative for ELIXIR. The number of activities taking place in ELIXIR Nodes and across ELIXIR Platforms and Use Cases simply exploded. Four new Implementation Studies were initiated, five ELIXIR Node applications were approved and three Collaboration Agreements concluded. ELIXIR’s first ELSI (Ethical, Legal and Societal Issues) policy was developed in 2016, and the work to define indicators for ELIXIR Core Data Resources concluded. ELIXIR became the largest ESFRI research infrastructure in terms of membership, with 20 members in 2016. This rapid expansion of activities and membership is a clear indication of ELIXIR’s success, and shows the increasingly widespread recognition of as a driving force in the life sciences.

2016 also marked the maturity of ELIXIR: the European Strategy Forum on Research Infrastructures (ESFRI) recognised ELIXIR as a ‘landmark’ infrastructure in its 2016 Roadmap.

I have had the privilege of sitting on the ELIXIR Board since its formation in 2014, serving the past two years as Chair. ELIXIR has achieved a great deal during that time, and I am looking forward to seeing it build and develop further. To realise the potential of to address the most pressing challenges we face, we must continue to develop ELIXIR at pace so that it may support new and emerging communities. Looking at the energy and commitment of our members, I continue to be enthusiastic about the achievements we can make together in 2017 and beyond.

Torsten Schwede Chair of ELIXIR Board, 2015–2016

Foreword by ELIXIR Director

The year 2016 marked the midway point in ELIXIR’s first Scientific Programme (2014–2018), and was indeed transformational. Our two flagship projects, ELIXIR-EXCELERATE and CORBEL, were in full swing, allowing us to concentrate fully on infrastructure delivery and integration. This Annual Report highlights major achievements that illustrate the growth and maturity of our research infrastructure, which represents the combined efforts of 20 members.

Integrated services

Our goal is to establish a coordinated, integrated infrastructure for life-science data, and in 2016 we took major steps towards achieving just that. Our work on ELIXIR Core Data Resources was strategically important. These resources are essential to the wider life-science community, and to the long-term preservation of biological data. The ELIXIR Data Platform defined criteria for identifying Core Data Resources as well as the process by which they will be selected in 2017. These Core Data Resources will drive ELIXIR’s sustainability strategy, and serve as a basis for policy-level discussions.

We launched the ELIXIR Authentication and Authorisation Infrastructure (AAI), a critical component of the ELIXIR Compute Platform for both users and service providers. AAI facilitates seamless access to ELIXIR services for researchers, and it gives ELIXIR service providers better control and management of access rights. Our Intranet was among the first services to use the ELIXIR AAI, so partners from over 180 institutes can now share information more easily with one another. ELIXIR Beacons also integrated the ELIXIR AAI. Beacons are discovery services that make human genomic datasets and cohorts discoverable and accessible to the global biomedical research community. Developed in collaboration with the Global Alliance for Genomics and Health, the Beacons define a global standard in data discovery. By the end of 2016, ELIXIR Beacons facilitated access to genomics data from six ELIXIR Nodes. In 2017 our members will engage more deeply in the Beacons project, new Nodes will participate and the ELIXIR-GA4GH collaboration will be expanded.

Expanding membership

Five new members joined ELIXIR in 2016: Italy, Slovenia, Ireland, Luxembourg and Germany. With these additions, we became the largest ESFRI Research Infrastructure in terms of membership. In the three years since our December 2013 launch, our membership has surged from the six founding members to 20.

Looking ahead

The second half of our current Scientific Programme focuses on expanding our efforts to accelerate the exploitation of open biological data. In 2017 we will further consolidate ELIXIR’s portfolio of services and initiate discussions about the next programming period (2019–2023). There are over 600 people actively contributing to ELIXIR, and each one has played a part in its success. We owe a lot to those whose hard work and dedication contributed to the development of ELIXIR in 2016: the ELIXIR Heads of Node, Technical and Training Coordinators, Platform and Use Case Leaders and everyone in ELIXIR Nodes. The achievements of 2016 have laid the foundations for accelerated progress in 2017 and beyond, and I look forward to continued success in this transformative era in the life sciences.

Niklas Blomberg ELIXIR Director

ELIXIR Platforms and Use Cases
Structure of ELIXIR activities

ELIXIR activities are structured according to five Platforms and four Use Cases. These form the basic units of operation within ELIXIR, and draw on expertise and resources from ELIXIR Nodes. Senior scientists at the ELIXIR Nodes lead each of the Platforms and Use Cases, which are primarily funded by the ELIXIR-EXCELERATE project, in which each Platform and Use Case is represented by a Work Package. Additional activities are funded through other grants and through the ELIXIR Hub as Implementation Studies.

ELIXIR Platforms are built on the real and changing needs of established research communities. The four Use Cases drive the work of the ELIXIR Platforms by defining their bioinformatics needs and requirements. This close collaboration ensures that the services developed by the ELIXIR Platforms are fit for purpose.

The ELIXIR Platforms comprise:
• Data: Sustaining Europe’s life-science data infrastructure
• Tools: Services and connectors to drive access and exploitation
• Interoperability: Supporting the discovery, integration and analysis of biological data
• Compute: Storage, compute and authentication/access services
• Training: Professional skills for managing and exploiting data

The four Use Cases serve domain-specific research communities:
• Human data: Developing long-term strategies for managing and accessing sensitive human data
• Rare diseases: Supporting the development of new therapies for rare diseases
• Marine metagenomics: Developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science
• Plant science: Developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species

Figure: The five ELIXIR Platforms (Tools, Data, Compute, Interoperability, Training) and the four Use Cases (Human data, Rare diseases, Marine metagenomics, Plant science)

Tools
Services and connectors for access and exploitation

The Tools Platform is working to deliver a discovery portal that collects and curates computational tools and data services for the life sciences. It is also developing a framework for benchmarking and evaluating bioinformatics tools from a scientific and technical perspective.

Bio.tools: ELIXIR tools and services registry
In 2016 the Tools Platform focused on developing content, the underlying ontology (EDAM), portal software, sustainability strategy and applications for the bio.tools portal (launched in 2015). By the end of 2016, the portal had 6,205 entries and 45,357 EDAM annotations, including contributions from over 30 countries. The data is freely available under an open licence (CC BY 4.0). The Tools Platform organised 15 events in 2016 to collect content, anchor the portal development within the broad community, and raise awareness of bio.tools. It also engaged other research infrastructures and infrastructure projects, many of which now use or evaluate the data from bio.tools. These include the NIH Big Data to Knowledge (BD2K) programme and AZTEC.bio in the US; Instruct, EuroBioimaging, BioEXCEL, Common Workflow Language and EU-ToxRisk in Europe; and the EMBL-Australia BioResource.

The Tools Platform also collaborated with the ELIXIR Interoperability Platform on the Common Workflow Language. The collaboration focused on enabling interoperability between different workflows, and on the use of the EDAM ontology and bio.tools for tool description.

Benchmarking
Bioinformatics methods must be assessed for quantitative performance and user friendliness to ensure they are fit for purpose, so benchmarking is a basic element of the ELIXIR infrastructure. These methods are so diverse in their design and purpose that it is necessary to abstract the scientific benchmark and technical monitoring processes in order to design an effective approach. This is an important part of the ELIXIR Tools Platform work programme, which is developing guidance and infrastructure for benchmarking and monitoring bioinformatics tools, web servers and algorithms.

In 2016, the Tools Platform mapped the needs of relevant stakeholders and selected several user scenarios on which to base the development of the ELIXIR Benchmarking Platform. The group also started a stepwise implementation of the ELIXIR Benchmarking Platform, releasing the first demonstrators in early 2017.

ELIXIR supports community-driven benchmarking efforts, and in 2016 the Tools Platform provided such support by engaging with bioinformatics initiatives such as CAMEO¹, Quest for Orthologs² and CAPRI³. This will provide a strong connection between the ELIXIR infrastructure and the communities carrying out benchmarking exercises within their expert knowledge domains. In the long term, it can lead to community agreements on data standards, submission formats and evaluation methods for quality assessment of bioinformatics tools.

The ELIXIR Tools Platform is funded through the ELIXIR-EXCELERATE project (Work Package 1: Tools Interoperability and Service Registry, and Work Package 2: Benchmarking).

1. Continuous Automated Model EvaluatiOn (CAMEO), https://cameo3d.org
2. The Quest for Orthologs is a joint initiative to benchmark, improve and standardize orthology predictions, http://questfororthologs.org/
3. Critical Assessment of PRediction of Interactions (CAPRI), http://www.ebi.ac.uk/msd-srv/capri/capri.html

Figure 1: Growth of contributors to bio.tools over time (monthly number of contributors, on a scale of 0 to 500, from November 2014 to April 2017)

Data
Sustaining Europe’s life-science data infrastructure

The ELIXIR Data Platform provides a framework for developing ELIXIR’s life-science data resources in a sustainable manner. Operated by ELIXIR Nodes, these resources serve myriad purposes, ranging from databases that archive research outputs (e.g. DNA sequences) to dynamic, value-added knowledge bases, manually curated by experts, that aggregate, process and visualise data.

Establishing ELIXIR Core Data Resources
In 2016 the Data Platform focused on establishing the ELIXIR Core Data Resources portfolio. A landmark paper published in September 2016 describes the structures, governance and processes required for identifying and evaluating ELIXIR Core Data Resources, and presents key indicators that formalise the scientific, technical and governance requirements¹ (Figure 2). In October 2016, the process of selecting the ELIXIR Core Data Resources started with the collection of nominations from ELIXIR Nodes. The final portfolio of Core Data Resources will be published in Summer 2017, following extensive review.

ELIXIR Core Data Resources are central to Data Platform activities. This select set of European data resources is of fundamental importance to the broader life-science research community, and to the long-term preservation of biological data. One of the goals of ELIXIR is to ensure that these resources are available over the long term, and that they continue to support biomedical and biological research.

Figure 2: Indicators reflecting the multiple facets of Core Data Resources (scientific focus and quality of science; community served; quality of service; legal and funding infrastructure, and governance; impact and translational stories)

Linking literature with data
Another objective of the Data Platform is to explore how text mining and other literature-based disciplines can contribute to the sustainable curation of data resources. Many bioinformatics resources are useful precisely because they are manually curated by experts who review published experimental data and extract relevant information. The Data Platform is exploring how this work can be further enhanced and supported by text mining approaches, which can generate more granular links between articles and the data underlying them. This is essential for removing bottlenecks to quality assurance as high-throughput data generation increases the rate of research publication. This work resulted in the launch of SciLite, a platform that overlays text-mined annotations on research articles to link articles to related resources or tools².

Long-term sustainability of life-science knowledge bases
The long-term funding of data resources is a constant concern for those who manage them: only a very small minority of databases have secured funding for five years or more. In September 2016, the Data Platform launched an ELIXIR Implementation Study to evaluate the different funding models of knowledge bases and propose a funding model that ensures their long-term sustainability. UniProt, the Universal Protein Resource, a key resource for protein sequence and functional information, serves as a use case in this evaluation. The final recommendations are planned for the second half of 2017.

The work on ELIXIR Core Data Resources and on linking data with literature was funded through the ELIXIR-EXCELERATE project (Work Package 3: Data Resources and Services). The Implementation Study on the long-term sustainability of knowledge bases is supported through the ELIXIR Hub budget.

1. Durinx C, McEntyre J, et al. (2016) Identifying ELIXIR Core Data Resources. F1000Research 5. pii:ELIXIR-2422; (doi: 10.12688/f1000research.9656.2)
2. Venkatesan A, Kim JH, et al. (2016) SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data. Wellcome Open Res 1:25; (doi: 10.12688/wellcomeopenres.10210.1)

Compute
Access, exchange and storage

The Compute Platform is building a robust technical infrastructure for accessing, transferring, exchanging and analysing biological data. It aims to provide cloud, compute, storage and access services for the research community. The objective is to integrate the individual technical components provided by ELIXIR Nodes into a seamless service provision system for the life-science research community. The Compute Platform works closely with the four scientific Use Cases and the ELIXIR Training Platform to ensure the technical solutions fit their specific needs.

Compute Demonstrator with ELIXIR Marine Metagenomics Use Case
The first ELIXIR Technical Service Demonstrator, a package of compute-related services, was applied to the ELIXIR Marine Metagenomics Use Case in 2016. The demonstrator packages scientific-software pipelines for analysing marine metagenomics data into containers and images, which increases the portability and usage of these services across ELIXIR’s distributed infrastructure. The Demonstrator used ELIXIR’s AAI to identify and authenticate researchers accessing the marine metagenomics analysis pipeline (META-pipe, run by ELIXIR Norway), and improved the output of the data analysis.

ELIXIR Authentication and Authorisation Infrastructure (AAI)
A reliable system of user identification is an absolute necessity for controlled-access services and resources: people must be identified accurately when accessing sensitive data, even when they move between organisations. In 2016 the Compute Platform developed the ELIXIR AAI, which provides an ELIXIR ID for accessing ELIXIR services in a federated manner. With ELIXIR AAI, researchers can use their existing, verified academic, corporate and social media identities. Service providers connected to the ELIXIR AAI benefit from centralised user identity and access management services. In its first six months of operation, ELIXIR AAI users created 584 ELIXIR IDs (in 101 ‘groups’), and the service linked 292 identity providers and provided access to 16 relying ELIXIR services (e.g. the ELIXIR Beacon in Finland). ELIXIR AAI credentials are accepted by the European Grid Infrastructure (via CheckIn) and the EUDAT e-infrastructure (via B2ACCESS).

The work of the ELIXIR Compute Platform was funded through the ELIXIR-EXCELERATE project (Work Package 4: Compute, Data access and exchange services). The development of the ELIXIR AAI was informed by close collaboration with the AARC (Authentication and Authorisation for Research and Collaboration) project.
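The central idea of the AAI, several existing logins mapped to one ELIXIR ID with group memberships that relying services check, can be illustrated with a toy, in-memory model. This is a hypothetical sketch: the class and method names (`IdentityProxy`, `sign_in`, `link`, `authorise`) and the `user-0001` ID format are invented for illustration and are not part of ELIXIR AAI, and the federated authentication protocols the real service relies on are not modelled at all.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """One federated account (hypothetical stand-in for an ELIXIR ID)."""
    account_id: str
    # External logins as (identity provider, subject) pairs
    logins: set[tuple[str, str]] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)


class IdentityProxy:
    """Toy proxy: maps several external logins to one account; services see only the account."""

    def __init__(self) -> None:
        self._by_login: dict[tuple[str, str], Account] = {}
        self._next = 1

    def sign_in(self, idp: str, subject: str) -> Account:
        """Return the account linked to this login, creating one on first sign-in."""
        key = (idp, subject)
        if key not in self._by_login:
            account = Account(account_id=f"user-{self._next:04d}", logins={key})
            self._next += 1
            self._by_login[key] = account
        return self._by_login[key]

    def link(self, account: Account, idp: str, subject: str) -> None:
        """Attach another external login (e.g. from a new institution) to an existing account."""
        key = (idp, subject)
        owner = self._by_login.get(key)
        if owner is not None and owner is not account:
            raise ValueError("login is already linked to a different account")
        account.logins.add(key)
        self._by_login[key] = account

    def authorise(self, idp: str, subject: str, required_group: str) -> bool:
        """Relying-service check: is the signed-in user a member of the required group?"""
        account = self._by_login.get((idp, subject))
        return account is not None and required_group in account.groups
```

The design point the sketch tries to capture is that a researcher who moves between organisations keeps the same account and group memberships: a new institutional login is linked to the existing ID rather than creating a fresh one.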

ELIXIR Technical Services Roadmap
The Compute Platform published the ELIXIR Technical Services Roadmap¹, which sets out the needs of the ELIXIR Use Cases. The Roadmap defines ‘Technical Use Cases’ (TUCs), which serve as generic templates for technical activities within the Compute Platform. The Roadmap also identifies technologies that will be used to provide consistent compute infrastructure services.

1. Lars AB, Borg M, Cornelis A, Gonzalez M, et al. (2016) A Technical Services Roadmap for supporting Life Science Research in Europe. Zenodo (doi: 10.5281/zenodo.60291)

Interoperability
Integration of data and services

The ELIXIR Interoperability Platform (EIP) is developing and implementing a framework to support people and machines in the discovery, access, integration and analysis of biological data. The Platform implements the FAIR principles of Findable, Accessible, Interoperable and Reusable data stewardship, and extends these to include APIs, ontology, standards and workflow stewardship. The Platform is driven by biological Use Cases including rare genetic diseases, marine metagenomics, plant biology and others.

ELIXIR Interoperability Platform Roadmap
A key output of the Platform in 2016 was the development of the Interoperability Roadmap. The Roadmap defines both the technical and scientific strategy and outlines requirements for ELIXIR’s Interoperability services and related activities:

Standards and services
• Bioschemas: a new initiative delivering lightweight ‘findability’ of ELIXIR resources and their content
• Identifiers: core services and common standards for unique identification of data
• Metadata: core services and common standards for dataset- and sample-level metadata for interoperability and reusability, available from Biosharing
• Ontology services enabling semantic interoperability, available from Ontology Lookup Service

Approaches to interoperability
• Identification and integration of core services supporting interoperability
• Interoperability through connecting tools and resources with workflows: API standards and metadata to describe and compare workflows using Common Workflow Language (CWL)

Capacity building and support
• Knowledge Hub for know-how, BYOD (Bring Your Own Data) events, and best practices

Core services: Biosharing, Identifiers, Ontology Lookup Service
The ELIXIR Interoperability Roadmap has identified the first components of the interoperability service framework: Identifiers.org, Biosharing.org and Ontology Lookup Service.

Biosharing is the ELIXIR curated metadata and standards registry. It is run by the ELIXIR UK Node (University of Oxford) and endorsed by a community of 68 organisations, including publishers, standardisation groups, and data management support initiatives. In 2016 the Platform conducted a community user survey that showed that Biosharing is a core and critical resource for standards, databases and policies.

Identifiers.org is an identifier resolution service run by the EMBL-EBI Node that provides unique, stable, resolvable and location-independent compact URIs to identify and locate scientific data. In 2016, the Platform started an ELIXIR Implementation Study to establish Identifiers.org as a stable ELIXIR system for uniform identifier resolution to major bioinformatics resources.

Ontology Lookup Service, developed by the EMBL-EBI Node, is a repository that gives ELIXIR partners access to biomedical ontologies. In 2016, its list of over 130 ontologies was extended to include key ontologies used by the ELIXIR Plant Science and Human Data Use Cases.
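To make the idea of a location-independent compact URI concrete, the sketch below splits an identifier of the form `prefix:accession` (for example `taxonomy:9606`), validates the accession against a per-prefix pattern, and builds a resolver URL. It is a minimal sketch under stated assumptions: the two-entry `REGISTRY` and its patterns are simplified and hypothetical (the real Identifiers.org registry curates its own patterns), and the URL form `https://identifiers.org/{prefix}:{accession}` is an assumption about the resolver, not a description of its 2016 interface.

```python
from __future__ import annotations

import re

# Hypothetical, simplified registry: prefix -> accession pattern.
# The real Identifiers.org registry curates its own patterns per namespace.
REGISTRY: dict[str, re.Pattern[str]] = {
    "taxonomy": re.compile(r"\d+"),
    "uniprot": re.compile(r"[A-Z0-9]{6,10}"),
}

# Assumed resolver base; compact URIs are appended as "prefix:accession".
RESOLVER = "https://identifiers.org/"


def parse_compact(curie: str) -> tuple[str, str]:
    """Split 'prefix:accession' and validate the accession against the prefix's pattern."""
    prefix, sep, accession = curie.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {curie!r}")
    pattern = REGISTRY.get(prefix)
    if pattern is None:
        raise KeyError(f"unknown prefix: {prefix!r}")
    if pattern.fullmatch(accession) is None:
        raise ValueError(f"{accession!r} does not match the pattern for {prefix!r}")
    return prefix, accession


def resolver_url(curie: str) -> str:
    """Build a location-independent URL; the resolver, not the caller, picks the provider."""
    prefix, accession = parse_compact(curie)
    return f"{RESOLVER}{prefix}:{accession}"
```

The point of the indirection is that data citing `taxonomy:9606` never hard-codes a provider’s URL; if the provider moves, only the resolver’s mapping has to change.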

Bioschemas
Bioschemas is a flagship project of the ELIXIR Interoperability Platform. It supports the use of schema.org markup in the life sciences and streamlines the discovery, curation and analysis of distributed data. Bioschemas enables ELIXIR registries as well as standard web search engines to harvest and exploit metadata provided by content providers and data generators.

After a successful pilot with training materials and the ELIXIR training portal (TeSS) in 2016, the Interoperability Platform has expanded the approach to datasets, running four hackathons in 2016 and launching an Implementation Study in 2017. The study brings together datasets and registries across ELIXIR and focuses on datasets for protein annotation, samples, plant phenotyping, and sensitive human data (Beacons).

In 2017–2018, the Platform will continue to advance the Interoperability Roadmap. This includes work on the community development and adoption of Bioschemas, and on pilot implementations in metadata validation and workflow interoperability through CWL. The vision is to continue to work closely with international interoperability services and efforts, and to embed Core Interoperability Services across all of the ELIXIR Platforms.

The work of the ELIXIR Interoperability Platform was funded through the ELIXIR-EXCELERATE project (Work Package 5: the ELIXIR Interoperability Backbone). The work on Identifiers and Ontology Lookup Service was funded by CORBEL. Work on identifiers was funded by CORBEL and ELIXIR-EXCELERATE. The Bioschemas initiative was jointly funded by the ELIXIR Hub (through an ELIXIR Implementation Study), ELIXIR-EXCELERATE and CORBEL.
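The Bioschemas approach rests on two steps: a content provider embeds schema.org metadata in its web pages as JSON-LD, and a registry or search engine harvests it. The sketch below shows both steps for a training event, roughly the pattern by which TeSS aggregates events. It is illustrative only: the function names are hypothetical, it uses only core schema.org `Event` properties (`name`, `startDate`, `endDate`, `url`, `location`), and it does not reproduce the required and recommended properties of the actual Bioschemas event profile.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


def event_jsonld(name: str, start_date: str, end_date: str, url: str, city: str) -> str:
    """Describe a training event with core schema.org Event properties, as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date
        "endDate": end_date,
        "url": url,
        "location": {
            "@type": "Place",
            "address": {"@type": "PostalAddress", "addressLocality": city},
        },
    }
    return json.dumps(doc, indent=2)


def embed(jsonld: str) -> str:
    """Wrap JSON-LD in the script element that crawlers look for in an HTML page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


class JsonLdHarvester(HTMLParser):
    """Collect every JSON-LD block from an HTML page, as a registry crawler might."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []
        self.items: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.items.append(json.loads("".join(self._chunks)))
            self._inside = False


def harvest(html: str) -> list[dict]:
    """Return the parsed JSON-LD objects found in an HTML document."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.items
```

Because the markup lives in the provider’s own page, the same metadata can serve both a general web search engine and a specialised registry without a separate submission step.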

Figure 3: Interoperability Platform Services Framework

Components shown: Standards and APIs, Applications, Integrated services and Pipelines, including identifier resolution, versioning and provenance; identifier mapping; citation implementation; identifier authorities; Prefix Commons; standards registry; tools registry (bio.tools); ontology services (Ontology Lookup Service, OLS); linked data, search, annotation and validation services; API description; tools and workflow descriptions; dataset description (DATS: DatA Tag Suite); and BYOD, BYOW and BYOAPI.

Training
Professional skills for managing and exploiting data

The ELIXIR Training Platform is building a sustainable training infrastructure for the wider life-science community. The scope of the Platform is broad, covering training tools and resources, training expertise, quality assurance and monitoring. The ELIXIR training infrastructure comprises four components: (1) training evaluation methods and metrics, (2) the ELIXIR Training Portal, (3) the ELIXIR eLearning Platform and (4) the Virtual Coffee Room. A crucial part of the training infrastructure is the combined expertise of the ELIXIR Training Platform members, notably the ELIXIR Training Coordinators representing each of the ELIXIR Nodes.

ELIXIR Training programme
In 2016, the ELIXIR Training Platform organised 36 training events in five streams: e-Learning, Train the Researcher, Train the Trainer, Train the Developer, and TeSS.

In 2016 the Training Platform carried out a training gap analysis to identify training needs in ELIXIR Nodes. The results showed (1) the need to prioritise ELIXIR Use Cases in planning and delivering bioinformatics training, and (2) the need to improve training capacity in ELIXIR Nodes in order to deliver this training effectively.

The Train the Trainer programme in 2016 trained 53 instructors from 12 ELIXIR Nodes; the plan for 2017 is to extend the pool of trainers, also in collaboration with the Software and Data Carpentry communities.

Impact and quality
To measure the impact and quality of ELIXIR training, the ELIXIR Training Platform defined what constitutes an ELIXIR training event, and agreed a set of Key Performance Indicators (KPIs). This work provided a solid basis for continuous monitoring of the quality of all ELIXIR training events. To further promote standardisation and reuse of bioinformatics course material, the Platform established a common framework for the curation of training materials (in collaboration with GOBLET, the Global Organisation for Bioinformatics Learning, Education and Training).

ELIXIR Training portal
The Training eSupport System, TeSS, allows scientists to browse, discover and organise life-science training events and materials that have been aggregated from ELIXIR Nodes and third-party providers (e.g. RI-Train, BioEXCEL). The production version of the portal was released in June 2016. Training events from ELIXIR Nodes and third-party training providers are now aggregated automatically in the TeSS portal, using the Bioschemas event specification (in collaboration with the ELIXIR Interoperability Platform). The TeSS portal team joined forces with the ELIXIR Tools and Interoperability Platforms to facilitate integration with the bio.tools and Biosharing.org registries, which will allow curators to link tools, databases and standards to relevant training resources.

ELIXIR e-Learning
The ELIXIR Training Platform released the ELIXIR e-Learning platform with six newly developed e-learning courses¹. In collaboration with GOBLET, the Platform published a white paper² which presents a consensus on what the ELIXIR and GOBLET communities mean by ‘e-learning’. This is a necessary first step towards formulating e-learning strategies for ELIXIR and GOBLET.

The ELIXIR Training Platform is funded through the ELIXIR-EXCELERATE project (Work Package 11: ELIXIR Training Programme). The Platform also actively collaborates with partner initiatives and projects (GOBLET, BD2K, Software and Data Carpentries, CORBEL, RITrain and others).

1. https://elixir.mf.uni-lj.si
2. Attwood TK, Leskosek BL, Dimec J, Morgan S, Mulder N, van Gelder CWG, Palagi PM (2016) Defining a lingua franca for the ELIXIR/GOBLET e-learning ecosystem. Zenodo (doi: 10.5281/zenodo.166378)

Figure: Timeline of ELIXIR Training Platform events, 2015 to 2016, grouped by stream (e-Learning, Train the Researcher, Train the Trainer, Train the Developer, TeSS)

Human data
Use case

The ELIXIR Human Data Use Case is tasked with building the technical infrastructure required for researchers to discover, combine and exchange controlled-access human data, while complying with data-privacy and data-security requirements. The backbone of the Human Data Use Case is the European Genome-phenome Archive (EGA). The Use Case extends and generalises the EGA system of access authorisation and secure data transfer, and makes it available to researchers across the ELIXIR Nodes. The Use Case works closely with the ELIXIR Compute Platform, particularly on systems for authenticating and authorising access to sensitive data (i.e. the ELIXIR AAI). It is also informed by the security framework, tools and APIs developed by the Global Alliance for Genomics and Health (GA4GH). In 2017, ELIXIR and GA4GH will expand their partnership to add new security measures and create a network of ELIXIR Beacons. The goal is to make sure biomedical research data generated in Europe can be found by any researcher worldwide.

Local EGA

Local instances of the EGA will enable the storage of sensitive, controlled-access data that may not leave their country of origin for a variety of reasons. Integration of the associated metadata in EGA will ensure that such data are discoverable by researchers everywhere. The Human Data Use Case developed a functional demonstrator of the Local EGA in 2016, including successful integration of ELIXIR AAI with the corresponding AAI services in EGA. The demonstrator was tested by ELIXIR Nodes on their local infrastructures. Further development of the demonstrator will focus on the case of biobank data from Finland's National Institute for Health and Welfare (THL).

Facilitating re-use of human data

The Human Data Use Case collaborated with the Dutch Center for Translational Medicine (TraIT) on a scoping study to develop IT infrastructure for translational research. This collaboration connected the EGA with TraIT's data analysis platform (Galaxy) and data portal (tranSMART), and enabled Dutch researchers to use EGA as the long-term storage solution for raw data¹. ELIXIR also collaborated on the Innovative Medicines Initiative (IMI) project OncoTrack, exploring options for long-term storage and discovery services for cancer research data (e.g. from human, animal, cell lines) generated by the project.

ELIXIR Beacons

The Beacon Project develops an open sharing platform that helps genomic data centres make their data more discoverable. Beacons allow researchers to query individual datasets, without disclosing any sensitive information, to determine whether they contain a specific genetic variant of interest. ELIXIR's fruitful collaboration on Beacons with GA4GH resulted in six ELIXIR Nodes making their genomics data available for discovery during 2016. Beacons were 'lit' in Sweden, Finland, France, Switzerland and Belgium, and in the EGA. Each ELIXIR Beacon makes one or more genomics datasets discoverable to the international research community.

The work of the ELIXIR Human Data Use Case was funded through the ELIXIR-EXCELERATE project (Work Package 9: Secure archiving, dissemination and analysis of human access-controlled data). The ELIXIR Beacon project and collaborations with TraIT and OncoTrack were supported by the ELIXIR Hub budget through three ELIXIR Implementation Studies (ELIXIR Beacons, Genomic data management for TraIT using the EGA, and ELIXIR – IMI OncoTrack scoping study on long-term data handling).

1. Hoogstrate Y, Zhang C, et al. (2016) Integration of EGA secure data access into Galaxy. F1000Research 5(ELIXIR):2841 (doi: 10.12688/f1000research.10221.1)
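To make the Beacon idea concrete, here is a minimal Python sketch, assuming a simplified variant description (genome assembly, chromosome, position, reference and alternate allele). The names `Variant`, `ToyBeacon` and `query_network` and the example data are hypothetical; this is not the GA4GH Beacon API or ELIXIR's implementation. It only illustrates the two properties described above: a Beacon answers yes or no and never returns the underlying records, and a network of Beacons can be asked the same question at once.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Variant:
    """A simplified variant description (illustrative, not the GA4GH Beacon schema)."""
    assembly: str    # reference genome build, e.g. "GRCh37"
    chromosome: str  # chromosome name, e.g. "1"
    position: int    # position on the chromosome
    ref: str         # reference allele
    alt: str         # alternate allele


class ToyBeacon:
    """Hypothetical in-memory Beacon that only ever answers yes or no."""

    def __init__(self, variants: Iterable[Variant]) -> None:
        # Keep only the set of distinct variants: no sample identifiers,
        # genotypes or phenotypes are stored, so no query can reveal them.
        self._variants = frozenset(variants)

    def query(self, variant: Variant) -> bool:
        """Return True if this dataset contains the variant, else False."""
        return variant in self._variants


def query_network(beacons: Dict[str, ToyBeacon], variant: Variant) -> Dict[str, bool]:
    """Ask the same question of every Beacon in a (hypothetical) network."""
    return {name: beacon.query(variant) for name, beacon in beacons.items()}


# Made-up example data; these are not real Beacon contents.
network = {
    "node-a": ToyBeacon([Variant("GRCh37", "1", 12345, "A", "G")]),
    "node-b": ToyBeacon([Variant("GRCh37", "2", 67890, "C", "T")]),
}
answers = query_network(network, Variant("GRCh37", "1", 12345, "A", "G"))
print(answers)  # {'node-a': True, 'node-b': False}
```

Because each Beacon returns only a boolean, a researcher learns where a variant of interest has been observed and can follow up with the data holder, while the Beacon itself exposes no individual-level records.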

Rare diseases Use case

The ELIXIR Rare Diseases Use Case is building a portfolio of ELIXIR resources to address the needs of the rare diseases research community. During 2016 the Use Case worked in close collaboration with the ELIXIR Tools, Data, Compute and Training Platforms as well as several rare-disease initiatives (RD-Connect, BBMRI, FAIR-dICT and ODEX4ALL) to map and connect existing resources relevant to the rare-disease research community.

Portfolio of resources and tools for rare diseases research

Following a survey of current tools and data resources for rare diseases research, the Use Case identified 51 ELIXIR services and tools that are important to rare disease research, and added them to the ELIXIR Tools and Services Registry (bio.tools). In collaboration with the ELIXIR Tools Platform, the Use Case started to develop benchmarks and quality-reporting standards for these resources.

Linking rare diseases data

Because each rare disease field is limited in size, the single most important way to gain new insights is to combine data across registries, biobanks and -omics databases. Achieving treatment of, and access to care for, all rare diseases will require exponential growth in the efficiency of combining data for analysis, and an ecosystem of FAIR resources can bring this goal closer. In 2016 the Use Case established a comprehensive 'rare disease data linkage plan' for selected biobanks and registries within the rare disease community. The plan emphasises data interoperability as the critical factor to boost research in the rare disease domain, and its purpose is to make the data in these biobanks and registries more Findable, Accessible, Interoperable and Reusable (FAIR). The development of this technical framework will continue throughout 2017.

The work of the Rare Disease Use Case was funded through the ELIXIR-EXCELERATE project (Work Package 8: ELIXIR infrastructure for Rare Disease research).

Marine metagenomics Use case

The ELIXIR Marine Metagenomics Use Case is building a stable, sustainable infrastructure for the marine science community. Marine metagenomics resources range from deposition archives with research data outputs (e.g. DNA sequences), to highly dynamic knowledge bases that aggregate and process research data through manual curation and complex, scalable metagenomics analysis pipelines.

Metagenomics data standards environment for the marine domain

In 2016 the Use Case, with input from the wider metagenomics community, proposed standards for provenance data throughout the data life cycle, covering the description of processes around sampling, sequencing, analysis and archiving of analysis results.¹

Establishment of marine-specific data resources

The Use Case released three new reference databases: MarRef, MarDB and MarCat. These were made public in early 2017 through the Marine Metagenomics Portal². These databases provide robust, richly annotated datasets with highly accurate, detailed information. MarRef is a manually curated marine microbial reference genome database that contains completely sequenced genomes. MarDB includes all sequenced marine microbial genomes regardless of level of completeness. MarCat is a catalogue of uncultivable and cultivable marine genes and proteins derived from metagenomics samples.

'Gold standards' for metagenomics analysis

To enrich the analysis output and improve computational performance, the Use Case developed new versions of the two existing metagenomics analysis pipelines: EMG (EMBL-EBI's Metagenomics resource) and META-pipe (ELIXIR Norway), based on the new standards and databases developed by the Use Case. The EMG and META-pipe analysis pipelines were deployed within a cloud infrastructure, so they can be offered via ELIXIR cloud resources using the ELIXIR AAI. The distribution and scheduling of computing jobs for META-pipe and EMG was developed in collaboration with the ELIXIR Compute Platform as a proof of concept, and can be run on cloud providers including EMBL-EBI's Embassy Cloud, Google Cloud Platform, Amazon Web Services, cPouta (ELIXIR Finland) and CESNET (ELIXIR Czech Republic).

The activities of the ELIXIR Marine Metagenomics Use Case in 2016 were funded through the ELIXIR-EXCELERATE project (Work Package 6: Marine Metagenomics Infrastructure as Driver for Research and Industrial Innovation).

1. This work is captured in a paper, 'The Metagenomics Data Life-cycle: Standards and Best Practices', accepted for publication in GigaScience in early 2017.
2. http://mmp.sfb.uit.no

Plant science Use case

The ELIXIR Plant science Use Case is building a common infrastructure for systematic publication and programmatic access to genotypic and phenotypic data from crop and forest species. The major goal is to facilitate the discovery of plant data and streamline their analysis. The range of bioinformatics services for the plant science community covers standards for presentation of genotypic and phenotypic plant data, plant data management, and data discovery and access.

Data description standards and annotation of key datasets

Drawing on the outcomes of a major workshop¹, the Use Case developed a first set of recommendation guidelines with agreed vocabularies, based on the extension and revision of the MIAPPE standard (Minimum Information about Plant Phenotyping Experiment). The evolution of the MIAPPE standard will be proposed to the MIAPPE governing body in May 2017. Following the development of the initial set of data standards, the Use Case began annotating ten exemplar datasets from ELIXIR Nodes in Belgium, France, the Netherlands, Slovenia, Portugal and the UK. Six of those exemplar datasets were fully annotated and submitted to the BioSamples database; the remaining annotated dataset was submitted to Gene Expression Omnibus.

Methods for plant data discovery and access

To support access to reference databases, the Use Case proposed and began implementing an architecture for access to distributed datasets. The goal is to develop a common API for programmatic access to data, and implement it at each partner genotypic, phenotypic and sample repository. This will allow users to query multiple databases from a single endpoint by integrating results from distributed resources. The common API is an extension of the Breeding API (BrAPI), which is being developed independently by many groups worldwide to enable access to plant-breeding data. The Use Case contributed its experience with exemplar datasets and specific user needs to help adjust the BrAPI specification. The first proof of concept of the common API was developed during a joint BrAPI-ELIXIR workshop organised in Montpellier, France, in December 2016.

The activities of the ELIXIR Plant Science Use Case in 2016 were funded through the ELIXIR-EXCELERATE project (Work Package 7: Integrating Genomic and Phenotypic Data for Crop and Forest Plants).
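The single-endpoint architecture for plant data discovery follows a common fan-out-and-merge pattern: send one query to every partner repository, then combine the answers. The Python sketch below illustrates only that pattern, under stated assumptions: `federated_search`, the toy repositories and their records are hypothetical, each repository is modelled as a local function rather than a real web service, and nothing here reproduces actual BrAPI endpoints or the Use Case's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumption: each partner repository is modelled as a function that takes a
# search term and returns matching records. In practice each would wrap an
# HTTP call to that partner's implementation of the shared API.
Repository = Callable[[str], List[dict]]


def federated_search(repositories: Dict[str, Repository], term: str) -> dict:
    """Send one query to every repository and merge the answers (hypothetical helper)."""

    def ask(item: Tuple[str, Repository]):
        name, repo = item
        try:
            return name, repo(term), None
        except Exception as exc:  # an unreachable repository should not sink the whole search
            return name, [], exc

    results: List[dict] = []
    failed: List[str] = []
    with ThreadPoolExecutor() as pool:
        # pool.map runs the queries concurrently but yields answers in input order.
        for name, records, error in pool.map(ask, repositories.items()):
            if error is not None:
                failed.append(name)
            for record in records:
                results.append({**record, "source": name})  # keep provenance
    return {"results": results, "failed": failed}


# Toy repositories with made-up records, for illustration only.
def genotype_repo(term: str) -> List[dict]:
    records = [{"germplasm": "line-A", "markers": 5000}]
    return [r for r in records if r["germplasm"] == term]


def phenotype_repo(term: str) -> List[dict]:
    records = [{"germplasm": "line-A", "trait": "plant height"},
               {"germplasm": "line-B", "trait": "yield"}]
    return [r for r in records if r["germplasm"] == term]


def offline_repo(term: str) -> List[dict]:
    raise ConnectionError("repository unavailable")


answer = federated_search(
    {"genotypes": genotype_repo, "phenotypes": phenotype_repo, "offline": offline_repo},
    "line-A",
)
print([r["source"] for r in answer["results"]])  # ['genotypes', 'phenotypes']
print(answer["failed"])  # ['offline']
```

Tagging every record with its source preserves provenance. Collecting failures separately lets the endpoint return partial results when one repository is down, so a single outage does not cause the whole query to fail.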

1. Five-day workshop Semantics for Harmonization and Integration of Phenotypic and Agronomic Data (PhenoHarmonis) in May 2016 in Montpellier, France, which brought together around 80 specialists in data standardisation, data managers and data producers. The Use Case contributed to the workshop organisation.

Platform leaders

Tools

Søren Brunak

Data

Jo McEntyre, Christine Durinx

Compute

Tommi Nyrönen, Ludek Matyska, Steven Newhouse

Interoperability

Carole Goble, Chris Evelo, Helen Parkinson

Training

Patricia Palagi, Chris Ponting, Rita Hendricusdottir, Celia van Gelder

Use case leaders

Human data

Serena Scollen, Tom Keane, Jordi Rambla

Rare diseases

Ivo Gut, Marco Roos

Marine metagenomics

Nils Peder Willassen, Rob Finn

Plant sciences

Paul Kersey, Celia Miguel


ELIXIR members

Introducing ELIXIR UK

Since its formation in September 2013, ELIXIR UK had focused its service provision on bioinformatics training. In 2016, ELIXIR UK expanded the consortium and its portfolio of services to better represent the activities of the UK bioinformatics community. The remit of ELIXIR UK services has been extended from its original training focus to include Data, Tools and Interoperability. It has added ten new services and three specialist training centres (University of Birmingham, Cambridge University and the Roslin Institute in Edinburgh) to its portfolio. In addition, two discovery portals developed by ELIXIR UK (the ELIXIR TeSS portal¹ and the Biosharing portal for data standards and related policies²) have become official ELIXIR services. The ELIXIR UK structure is now divided into six categories: Interoperability and standards, Protein Structure and Function, Expression Atlases, Human Health and Disease, Agri Data, and Training. The new structure is the result of an open and transparent review process carried out throughout 2015–2016. The process brought in external experts to evaluate the submission, which was then reviewed by ELIXIR's Scientific Advisory Board. The experience gained during the expansion was summarised in an article published on the ELIXIR F1000Research channel in December 2016³.

www.elixir-uk.org

1. https://tess.elixir-europe.org
2. https://biosharing.org
3. Hancock JM, Game A, Ponting CP and Goble CA. (2016) An open and transparent process to select ELIXIR Node Services as implemented by ELIXIR-UK. F1000Research 5(ELIXIR):2894 (doi: 10.12688/f1000research.10473.2)

Introducing ELIXIR Germany

Germany became a full member of ELIXIR on 2 August 2016, when the Federal Minister of Education and Research signed the ELIXIR Consortium Agreement. The ELIXIR Germany Node was established by the German Network for Bioinformatics Infrastructure (de.NBI). de.NBI coordinates the management and provision of bioinformatics services across Germany, bringing together expertise and resources through dedicated service centres in areas such as microbes, integrative bioinformatics, data management, crop bioinformatics and human genomics. It is led by Bielefeld University and includes over 20 partner institutes across the country. As an ELIXIR Member, de.NBI focused on developing the German Node Application, which specifies which bioinformatics services will become part of ELIXIR and so ensures the integration of de.NBI into ELIXIR. To celebrate Germany's membership of ELIXIR, de.NBI organised a special session dedicated to ELIXIR at the International de.NBI Symposium in Heidelberg, where members of the ELIXIR Germany leadership and the ELIXIR Director, Niklas Blomberg, presented ELIXIR in general and the plans and activities of ELIXIR Germany.

www.denbi.de

ELIXIR Nodes updates

Members: Belgium, Czech Republic, Denmark, EMBL-EBI, Estonia, Finland, France, Germany, Ireland, Israel, Italy, Luxembourg, Netherlands, Norway, Portugal, Slovenia, Spain, Sweden, Switzerland, United Kingdom

Observers: Greece

Belgium

• Launched the ELIXIR Belgium website: www.elixir-belgium.org
• Launched the ELIXIR Belgium Training platform, following a community meeting with Belgian universities and research centres
• Organised a workshop on how to handle genomics data in a more efficient way
• Launched the ELIXIR Belgium Beacon to provide access to exome variant frequencies from patients with rare genetic disorders
• Secured a regional grant of €1 million to further develop the ELIXIR Belgium Node services; expand the portfolio of existing Node services; provide training for the life-science community; and develop a strategy for establishing a state-of-the-art data-storage and analysis infrastructure for life science research in Flanders

Denmark

• Further developed the ELIXIR registry of bioinformatics tools and data services (https://bio.tools), with addition or improvement of over 6500 entries in total from 486 contributors and 244 institutes
• Developed comprehensive documentation for bio.tools (http://biotools.readthedocs.io/en/latest/), including details of how to get involved in the community-driven process behind the registry development
• Organised, led or participated in an intensive series of 14 events around the development and use of bio.tools
• Launched a studentship scheme to support students to work on curation-focused 'mini-projects' that improve the quality and growth of the bio.tools content
• Organised the second annual Danish Bioinformatics Conference

EMBL-EBI

• Organised five training courses as part of CORBEL and ELIXIR-EXCELERATE
• Contributed to three knowledge-exchange workshops organised by ELIXIR
• Trained six postgraduate students from ELIXIR Czech Republic on protein annotation and curation as part of the ELIXIR Staff Exchange pilot scheme
• Launched a new Implementation Study on Data Identification and Interoperability (https://www.elixir-europe.org/activities/identifiers)

Czech Republic

• Developed and launched the ELIXIR Authentication and Authorisation Infrastructure (AAI) together with ELIXIR Finland
• Launched Galaxy and Chipster (see ELIXIR Finland) instances for the Czech life science community
• Organised the ELIXIR-EXCELERATE Structural Funds workshop
• Launched the ELIXIR-EXCELERATE Data Nodes Network (in collaboration with ELIXIR Sweden)
• Organised a workshop and an e-learning course on RNA-Seq in Chipster
• Organised a workshop on Genome Assembly and Annotation
• Presented ELIXIR CZ at the Czech National Bioinformatics Conference 2016 (ENBIK), a meeting supported by the Node
• Presented ELIXIR CZ at an information day on national Research Infrastructures organised by the Czech Ministry of Education, Youth and Sport
• Organised a staff exchange programme with EMBL-EBI to train ELIXIR Czech Republic staff in annotation of protein data

Estonia

• Launched the Virtual Coffee Room (https://cafe.elixir.ut.ee), a tool for sharing knowledge across the ELIXIR community
• Launched ClustVis, a simple web tool for statistical analysis (PCA, clustering, visualisation) of tabular data
• Contributed to three training courses on high-throughput data analysis (in Paris, Cambridge and Oxford)
• Secured national funding for ELIXIR Estonia for 2017–2022, with an overall budget of €1.3 million

Finland

• Represented ELIXIR in the EU-funded AARC (Authentication and Authorisation for Research and Collaboration) project
• Launched an ELIXIR-FI Beacon to make available the Finnish samples sequenced in the 1000 Genomes Project; this was the first Beacon connected to the ELIXIR AAI developed by the ELIXIR Compute Platform
• Launched Chipster (http://chipster.csc.fi), a versatile, scalable data-analysis and visualisation platform with a selection of popular bioinformatics tools
• Organised one Compute Platform workshop and two training events
• Successfully demonstrated cross-border extension of secure clouds as part of the Tryggve project, a collaboration between ELIXIR Nodes in Finland, Denmark, Norway and Sweden to establish a Nordic platform for sharing sensitive data
• Co-developed the ELIXIR AAI with ELIXIR Czech Republic as part of the ELIXIR Compute Platform

France

• Launched the ELIXIR France e-learning service for computational and experimental researchers. At the time of printing, six training courses are available on the platform
• Launched Cloud Marketplaces, a catalogue of bioinformatics cloud appliances (https://biosphere.france-bioinformatique.fr/catalogue)
• Launched more than 15 public Galaxy servers
• Launched BioShadock, a curated bioinformatics container registry (http://bioshadock.genouest.org)
• Organised two hackathons to register ELIXIR France tools in the ELIXIR Tools and Services Registry, and to support collaboration and technical developments between bio.tools and the French bioinformatics community
• Held the first Summer School in metagenomics

Germany

• Signed the ELIXIR Consortium Agreement and became a full Member of ELIXIR in August 2016
• Hosted an ELIXIR Tools and Services Registry Hackathon and registered ELIXIR Germany tools in the Registry
• Organised the first International de.NBI Symposium on 'Bioinformatics for Human Health and Disease', with a special session dedicated to de.NBI / ELIXIR Germany

Ireland

• Became a member of ELIXIR in July 2016
• Started to develop the ELIXIR Ireland Node

Israel

• Continued to develop the national Node in Israel with a focus on genomics, disease-oriented bioinformatics services, proteomics and storage and retrieval of data

Italy

• Launched an HPC resource for researchers in ELIXIR Italy and other ELIXIR Nodes (ELIXIR-IIB HPC@Cineca)
• Registered ELIXIR Italy bioinformatics services in the ELIXIR Tools and Services Registry: Pathway Inspector, micca (MICrobial Community Analysis), VESPUCCI, COLOMBOS, ASSIsT, DisProt, VHLdb, RepeatsDB 2.0, RING, FELLS, MobiDB lite, Argot 2.5, Funtaxis
• Organised nine bioinformatics training courses
• Formally launched ELIXIR Italy at a launch meeting in Bologna in November
• Secured a national grant with a total budget of €400,000

Luxembourg

• Became a member of ELIXIR in July 2016
• Organised a data curation training course in Luxembourg, in collaboration with the eTRIKS research infrastructure for translational data sharing

Netherlands

• Expanded to include 125 research partners in 45 partner institutions
• Co-organised a conference with other Dutch research infrastructures to kick-start the national initiative 'Health Research Infrastructure (Health-RI)'
• Organised the ELIXIR track at the Dutch Bioinformatics conference BioSB2016
• Co-organised the European Conference on Computational Biology 2016 in Den Haag (with the Netherlands bioinformatics and research school, BioSB)
• Organised six bioinformatics training courses within BioSB
• Co-organised a joint hackathon in Montpellier, France, with the ELIXIR Plant Use Case and the Breeding API consortium
• Organised three 'Bring Your Own Data' (BYOD) meetings
• Placed on the National Roadmap for Large-Scale Research Facilities of the Netherlands Organisation for Scientific Research (NWO)
• Co-launched the European Metabolomics Training Coordination Group (EmTraG) to implement a training strategy in this area
• Successfully completed two Implementation Studies: ELIXIR–TraIT–EGA and the Rare Disease Implementation Study

Norway

• Submitted a Service Delivery Plan with four services (LiceBase, META-pipe, Genomic HyperBrowser and BioXSD / GTrack)
• Organised eight training workshops on the Norwegian e-Infrastructure for Life Sciences (NeLS) platform, NGS data analysis, meta-analysis and data storage
• Organised two developer workshops on NeLS
• Organised a course on tools for NGS data analysis, including ELIXIR resources
• Hosted the ELIXIR Innovation and SME Forum, "Data-driven innovation in the aquaculture and marine industries"
• Co-organised the "Technologies for Digital Life" meeting in Bergen and the Data management workshop in Trondheim
• Established a close collaboration with Digital Life Norway for coordination of research infrastructures in biological and medical science, data management and innovation in biotechnology

Portugal

• Launched ELIXIR Portugal Compute services, based on OpenStack, made available to researchers in Portugal
• Organised a number of training events, including 13 hands-on bioinformatics training courses (attended by 167 researchers) and one Train-the-Trainer event
• Contributed to the development of plant data standards and data management best practices, namely the Plant Experimental Assay Ontology and the MIAPPE and BrAPI specifications
• Initiated a project on precision medicine, which aims at improving the diagnosis and treatment of stroke and some rare diseases

Slovenia

• Completed preparations for signing the ELIXIR Consortium Agreement and became a full ELIXIR Member in February 2016
• Launched the ELIXIR-SI eLearning Platform to offer bioinformatics courses and webinars (http://elearning.elixir-slovenia.org)
• Established Bioinformatics ELIXIR-SI Entity (BEE) as a core national group in Slovenia that will provide bioinformatics services to the wider life science research community in Slovenia
• Formally launched the ELIXIR Slovenia Node at a major event for national policymakers, funders and scientific leaders
• Co-organised three training courses and two hackathons
• Placed in the 2016 Revision of the National Research Infrastructure Roadmap (originally published in 2011)
• Received an €80,000 research grant from the Slovenian Research Agency

Spain

• Broadened the ELIXIR Spain portfolio, including specific resources aligned with ELIXIR Use Cases
• Developed and submitted a proposal for national funding of the INB/ELIXIR-ES to the National Institute of Health Carlos III with the double objective of consolidating the alignment with ELIXIR and increasing the translational impact on the Spanish National Health System
• Launched ELIXIR Beacon servers including the whole-genome variant frequencies of 790 unrelated Spanish individuals as part of the CSVS project
• Supported different events and activities, including the INB/ELIXIR-ES Annual Meeting, and the joint workshop between ELIXIR and BioCreative on text mining
• Organised a training workshop on High-Performance Computing for the ELIXIR community

Sweden

• Organised three Capacity Building Workshops, two of which were on genome annotation and assembly (held in Barcelona and Prague)
• Launched the ELIXIR-EXCELERATE Data Nodes Network (June 2016) in collaboration with ELIXIR Czech Republic
• Launched the Cell Atlas: part of the open-access Human Protein Atlas, featuring an interactive database with unparalleled high-resolution images that visualise, for the first time, the location of over 12 000 proteins in cells (December 2016)
• Launched an ELIXIR Beacon, giving access to the whole-genome variant frequencies for 1000 Swedish individuals generated within the SweGen project
• Received a national grant from the Swedish Research Council for the Human Protein Atlas contribution to ELIXIR (~€200,000 per year, 2016–2017)

Switzerland

• Awarded two Innovative Medicines Initiative (IMI) grants: "BEAt-DKD - Biomarker Enterprise to Attack Diabetic Kidney Disease" (2016–2021) and RHAPSODY - Risk Assessment and ProgreSsion of Diabetes (2016–2020)
• Organised 57 courses on bioinformatics-related topics, spanning 113 days of teaching and training nearly 1350 researchers
• Hosted the ELIXIR Software Carpentry/Data Carpentry instructors' training in Lausanne (January 2016) and three Software Carpentry/Data Carpentry workshops
• Led and contributed to the ELIXIR white paper "Defining a lingua franca for the ELIXIR/GOBLET e-learning ecosystem" and to the report on the training needs identified across the ELIXIR community
• Organised a series of conferences to celebrate the 30th anniversary of UniProtKB/Swiss-Prot and the 30th anniversary of The Eukaryotic Promoter Database
• Launched an ELIXIR Beacon for the arrayMap cancer genome data repository

United Kingdom

• Extended the scope of the Node to include Data, Tools and Interoperability
• Added ten new services and three specialist training centres (University of Birmingham, Cambridge University and the Roslin Institute in Edinburgh) to the portfolio
• Launched the ELIXIR TeSS portal (tess.elixir-europe.org) in production; the Biosharing portal for data standards and related policies (biosharing.org) became an official ELIXIR service
• Launched a website for developing training activity in statistics (www.statschoices.org.uk), offering a growing set of resources and information about the wider Statistics Training Signposting project
• Received a national grant to support the ELIXIR-UK Coordination Office (total budget of £767,420 over four years)
• Organised four training workshops and two meetings for the ELIXIR UK community

Greece (Observer)

• Organised the symposium "Infrastructures for Life Science's Big Data – and the role of ELIXIR" in Athens
• Developed and submitted the ELIXIR Greece Node Application to the ELIXIR Scientific Advisory Board (SAB), and received a positive review
• Developed and submitted a proposal for national funding of the Greek ELIXIR Node to the Ministry of Education, Research and Religion

2016 highlights

ELIXIR Industry Strategy
The ELIXIR Industry Strategy provides a comprehensive overview of the current and future actions to stimulate innovation and ensure industry uptake of ELIXIR's services in companies of all sizes.

ELIXIR OncoTrack collaboration
ELIXIR started a collaboration with OncoTrack, an Innovative Medicines Initiative project to improve the diagnosis and treatment of colon cancer. They will explore options for providing long-term storage of, and discovery services for, data generated within the project.

Slovenia joins ELIXIR

Open Data in Action: Life Science Data Infrastructure for Innovation conference
Researchers, policy makers, research infrastructure operators and industry representatives gathered on 4 February in Brussels for the ELIXIR Conference 'Open Data in Action' to discuss the impact of open data in the life sciences on innovation. Robert-Jan Smits, Director-General of the Research and Innovation DG of the European Commission (EC), opened the conference by highlighting the Open Data and Open Access policies in the Horizon 2020 funding programme. He spoke about the EC's goal to make data management plans a requirement for all Horizon 2020 grants.

ELIXIR Innovation and SME Forum: Data-driven innovation in the aquaculture and marine industries
Hosted by ELIXIR Norway in Oslo on 12-13 May, the event showcased free tools and services offered through ELIXIR and highlighted research-intensive companies already making use of 'big data' in their respective life-science fields.


Second ELIXIR All Hands meeting
The second ELIXIR All Hands meeting was held in Barcelona, Spain on 7-10 March, with over 200 attendees. It presented the full breadth of EXCELERATE activities, with keynotes from the Centre for Genomic Regulation in Barcelona, the Malaspina Expedition and the NIH Big Data to Knowledge initiative.

ELIXIR Training Platform kicks off the 'Train the Trainer' programme
ELIXIR's 'Train the Trainer' programme, designed for ELIXIR trainers, was kicked off at the ELIXIR-EXCELERATE workshop in Hinxton, UK. The programme takes inspiration from the 2010 EMBL-EBI programme and the Software and Data Carpentry initiatives.

ELIXIR launches its Data Nodes Network
A Data Nodes Workshop in Prague on 13-14 April 2016 marked the launch of ELIXIR's Data Nodes Network. The Network brings together ELIXIR Nodes with large data collections and databases to develop guidelines and good practices to facilitate collection and linking of data to international deposition archives run by ELIXIR Nodes such as EMBL-EBI.

Italy joins ELIXIR

ELIXIR: A Landmark in the 2016 ESFRI roadmap
The 2016 ESFRI Roadmap classified ELIXIR as a 'Landmark', as it had reached its implementation phase by 2015. The Roadmap describes the role and impact of ELIXIR as "essential for European life science research, as the enhanced technical architecture will facilitate access to well-curated data, international collaboration and ultimately play an integral role in the transformation of bio-industries".

ECCB 2016 in The Hague, Netherlands
ELIXIR was the main organising sponsor of the 15th European Conference on Computational Biology (ECCB), held on 3-7 September in The Hague, Netherlands. The programme featured a track dedicated to ELIXIR Applications, which showcased 12 projects in ELIXIR Use Cases and Platforms. The ELIXIR poster session featured over 40 posters.

ELIXIR Policy for Ethical, Legal and Societal Issues (ELSI)
ELIXIR adopted an overarching policy for ethical and legal aspects of accessing and re-using sensitive life-science data. Covering Ethical, Legal and Societal Issues (ELSI) around the provision of sensitive human data in research, the policy provides guidance on key principles and clarifies how these will be handled in the context of ELIXIR services.

EMBL-EBI teams up with Czech Node to build annotation capacity
EMBL-EBI and Masaryk University (ELIXIR Czech Republic) started a collaboration to train researchers from Masaryk University in the annotation of data in the Protein Data Bank in Europe (PDBe). The project was part of ELIXIR's capacity-building activities and served as a pilot for the development of the ELIXIR Staff Exchange programme.

Three new Implementation Studies
ELIXIR kicked off three new Implementation Studies, covering the Interoperability and Data Platforms as well as the Human Data Use Case. The three new projects complemented four ongoing Implementation Studies in 2016, bringing the 2016 portfolio of ELIXIR Implementation Studies to seven projects involving nine ELIXIR Nodes.

ELIXIR AAI launches
The ELIXIR Compute Platform launched the ELIXIR Authentication and Authorisation Infrastructure (AAI). AAI introduced a common ELIXIR identity that allows researchers to sign on to multiple services across ELIXIR using their institution or university accounts, or using their ORCID ID, Google or LinkedIn accounts.

Germany joins ELIXIR


Luxembourg and Ireland join ELIXIR

The ELIXIR International Strategy
ELIXIR published its International Strategy, which defines ELIXIR's objectives in collaboration beyond Europe and ensures that ELIXIR engages with the relevant major international initiatives. The Strategy was presented at the International Conference on Research Infrastructures (ICRI), hosted in Cape Town, South Africa on 3-5 October.

Five new ELIXIR Beacons
The first stage of the ELIXIR Beacon project (2015-2016) resulted in the 'lighting' of Beacons in five ELIXIR Nodes (Sweden, Finland, France, Switzerland and Belgium) and in the European Genome-phenome Archive, run jointly by EMBL-EBI and ELIXIR Spain. Each ELIXIR Beacon makes one or more genomics datasets discoverable to the international research community.

The ELIXIR TeSS portal
ELIXIR UK launched ELIXIR's Training e-Support System, TeSS. The portal collects, disseminates and organises information about bioinformatics training from ELIXIR Nodes and third-party providers. By the end of 2016, the portal contained over 1000 resources (training events and materials) from more than 30 providers.
tess.elixir-europe.org

Nominations for Core Data Resources
ELIXIR opened the process for Nodes to propose databases as ELIXIR Core Data Resources: those considered to be of fundamental importance to the wider life-science community and the long-term preservation of biological data. Core Data Resources are exemplars both within ELIXIR and the global bioinformatics community. They represent the community consensus on data resources that are of utmost importance to research. They drive ELIXIR's sustainability strategy, provide policymakers with insights into quality and impact, and serve as a quality benchmark in ELIXIR capacity building.


EU Projects

ELIXIR-EXCELERATE

ELIXIR-EXCELERATE is a €19 million Horizon 2020 project to help implement ELIXIR by coordinating national data infrastructures and ensuring the delivery of life-science data services through its ELIXIR Platforms and Use Cases.

Basic facts
• €19.8 million
• four years (2015-2019)
• 41 partners in 17 countries

Overall goals
• Implement the ELIXIR Scientific Programme
• Develop and connect resources and services across ELIXIR Nodes
• Build bioinformatics capacity in Europe

The ELIXIR-EXCELERATE project is fully embedded into ELIXIR's operations. This means that all EXCELERATE activities and objectives reflect and complement the objectives of ELIXIR's Scientific Programme 2014–2018. The five ELIXIR Platforms and four Use Cases are represented in EXCELERATE as Work Packages (WPs), complemented by Work Packages on Capacity Development, Operations, Communications and Ethics:

Platforms
• WP1: Tools Platform: Tools Interoperability and Service Registry
• WP2: Tools Platform: Benchmarking
• WP3: Data Platform: Data Resources and Services
• WP4: Compute Platform: Compute, Data access and exchange services
• WP5: Interoperability Platform: The ELIXIR Interoperability Backbone

Use Cases
• WP6: Marine Metagenomics Use Case: Marine metagenomics infrastructure as driver for research and industrial innovation
• WP7: Plant Sciences Use Case: Integrating Genomic and Phenotypic Data for Crop and Forest Plants
• WP8: Rare Disease Use Case: Infrastructure for Rare Disease research
• WP9: Human Data Use Case: Secure archiving, dissemination and analysis of human access-controlled data

Node Capacity
• WP10: ELIXIR Node Capacity Building Programme

Training
• WP11: Training Platform: ELIXIR Training Programme

Operations and outreach
• WP12: Excellence in ELIXIR Management and Operations
• WP13: Communications, Industry and Community Engagement
• WP14: Ethics requirements

In 2016, the EXCELERATE project held its first annual meeting in Barcelona on 7–10 March, as part of the ELIXIR All Hands meeting. The theme of the meeting was 'User communities', with the programme focusing on activities in ELIXIR's four Use Cases: Marine metagenomics (WP6), Plant Sciences (WP7), Rare diseases (WP8) and Human data (WP9).

The main EXCELERATE outputs in 2016
• Extended the ELIXIR Tools and Services Registry (https://bio.tools) with benchmarking capabilities, ensuring it is a gateway to databases and tools for life-science data analysis (WP1)
• Developed indicators to assess the quality and impact of data resources4. These indicators will be used to identify ELIXIR Core Data Resources and support excellence in resource development and operation (WP3)
• Developed a reference implementation of descriptors, service management and API for datasets (WP5)
• Use Cases: specific marine databases launched (WP6); released first set of ontologies for annotation of phenotype in crop and forest plants (WP7); organised rare-disease community workshop (WP8); defined requirements for human access-controlled data (WP9)
• Launched the ELIXIR Authentication and Authorisation Infrastructure to enable individual researchers to access different services across ELIXIR through a single sign-on process1 (WP4)
• Established the ELIXIR Data Nodes Network, a network of Nodes with key data resources, developing guidelines and good practices to facilitate linking to international archives (WP10)
• Released the ELIXIR Training Portal to disseminate training resources from ELIXIR Nodes and third-party providers2 (WP11)
• Published the ELIXIR Technical Services Roadmap3, which maps specific scientific needs of different communities to technical and infrastructure solutions (WP4)
• Developed and adopted the ELIXIR ELSI Policy (WP14)

Re-using life-science data ethically

Clear governance of Ethical, Legal and Societal Issues (ELSI) is an integral part of the provision of ELIXIR Services, so the adoption of ELIXIR's ELSI policy was a major milestone within both EXCELERATE and ELIXIR generally. The policy is centred around the provision of sensitive human data for use in research, and provides guidance on key principles in the context of ELIXIR-EXCELERATE Services.

ELIXIR Services offer data for secondary use: they do not produce data and are not involved in obtaining informed consent from research participants. Their role centres around data stewardship as a service to both data generators and data users. ELIXIR's ELSI Policy supports the process of making scientific data available in an ethically and legally sound manner by consolidating requirements shared across all ELIXIR Members and Services.

As a distributed infrastructure, ELIXIR delivers its Services through ELIXIR Nodes. This means that each ELIXIR Service must have its own ethics and regulatory oversight in place, as the complexity and diversity of regulations around data can vary significantly between or even within individual countries.

Developed in consultation with Heads of Nodes and technical personnel, ethics experts on the ELIXIR Scientific Advisory Board and external experts, the ELSI Policy addresses the concerns of key stakeholders such as ethics committees and ethics advisory boards (e.g. of large projects), funders, data providers, data users and patients or patient groups.

As it defines a model that clarifies specific responsibilities of databases and other services offering human data for secondary use, the ELIXIR ELSI Policy is also of use to other research infrastructures and database providers currently going through the process of assessing or developing their own policies.

1. www.elixir-europe.org/aai
2. http://tess.elixir-europe.org
3. Bongo LA, Borg M, Cornelis A et al. A Technical Services Roadmap for supporting Life Science Research in Europe. Zenodo 2016 (doi: 10.5281/zenodo.60291)
4. Durinx C, McEntyre J, Appel R et al. Identifying ELIXIR Core Data Resources. F1000Research 2016, 5(ELIXIR):2422 (doi: 10.12688/f1000research.9656.2)

Figure 4: Scope of the ELIXIR ELSI Policy

[Diagram elements: Patient; Consent; Data generator (researcher, consortium); Submission; ELIXIR Service within an ELIXIR Node; Access; Data user (researcher, consortium); ELIXIR ELSI Policy: Data transfer agreement, Data access agreement, Terms of use]

Scope of the ELIXIR ELSI Policy: an ELIXIR Service (e.g. an archive) providing data for research. When implemented, the ELSI Policy only influences the DTA (e.g. as "Terms of Use" or "Terms of Service") at the "data in" (submission) and "data out" (use) ends. The Data Transfer Agreement ensures that data providers and users are aware of their obligations. The responsibility for implementing the obligations is cascaded down to the data provider and/or user. The Node retains full responsibility for having a sustainable ELSI framework in place for the ELIXIR Service in question.

Collaboration with other Research Infrastructures

ELIXIR both collaborates in and runs major EU projects, which in turn help to fund specific activities in ELIXIR Platforms and Use Cases. These projects also enable ELIXIR to collaborate with key international initiatives in the life sciences, other ESFRI Research Infrastructures and e-Infrastructures.

European Marine Biological Research Infrastructure Cluster (EMBRIC)
Initiated in 2015 and financed with €9 million from the Horizon 2020 programme, EMBRIC connects marine biotechnology initiatives that focus on science, industry and regional growth. In EMBRIC, ELIXIR partners with research infrastructures such as the European Marine Biological Resource Centre (EMBRC), the Microbial Resource Research Infrastructure (MIRRI) and EU-OPENSCREEN to drive stronger connections between science and industry through a number of "workflows", including bioproduct discovery, leverage of microbiological culture collections and aquaculture breeding strategies informed by genomics.

Achievements in 2016 include:
• Launch of the EMBRIC "Configurator" service, a structured consultancy service that translates incoming proposed blue biotechnology studies into effective configurations of data infrastructure from the portfolio of ELIXIR resources, setup and operation guides and data management plans
• Checklist reporting standards, deployed within ELIXIR, for biomolecular data sets derived from strains in microbiological collections and for shellfish aquaculture applications
• Mapping of services and requirements around chemical biology, including profiling of reference-level and other ELIXIR resources, unmet requirements (typically around laboratory management), future training needs and data integration across small molecules, culture collections and genomics databases

Authentication and Authorisation for Research and Collaboration (AARC)
The AARC project brings together 20 partners from across Europe with the goal to harmonise approaches to authentication and authorisation in different e-infrastructures and research collaborations.

Represented by ELIXIR Finland (CSC – IT Center for Science), ELIXIR focuses on developing training materials on AAI (Authentication and Authorisation Infrastructure) integration and leads the work on Level of Assurance. In March, it organised a training workshop on AAI integration for the ELIXIR community; furthermore, ELIXIR Finland led the working group to develop a Level of Assurance framework for research services.

The AARC and CORBEL projects worked together towards a common AAI for life sciences. An initial workshop was organised in May and hosted by ECRIN to discuss the possibilities of having a common AAI for the participating research infrastructures. The different use cases for a common AAI were gathered and compared in autumn.

ENVRIplus
ENVRIplus is a Horizon 2020 project linking Environmental and Earth System Research Infrastructures, projects and networks together with technical specialist partners to create a more coherent, interdisciplinary and interoperable cluster of Environmental Research Infrastructures.

ENVRIplus covers all domains of Earth system science (Atmospheric domain, Marine domain, Biosphere and Solid Earth domains), and it is organised into six themes: (1) Technical innovation, (2) Data for science, (3) Access to research infrastructures, (4) Societal relevance and understanding, (5) Knowledge transfer and (6) Communication and dissemination.

ELIXIR, represented by EMBL-EBI, provides expertise and resources in the Biodiversity and Ecosystem field and in the 'Data for science' theme. The project started in May 2015 and is coordinated by the Integrated Carbon Observation System (ICOS) Research Infrastructure.

To date, ELIXIR work has included:
• Support for a mapping exercise between the established ENVRI Reference Model and resources within ELIXIR, including the provision of a number of ELIXIR use cases from ELIXIR Greece and EMBL-EBI
• Presentation of ELIXIR outreach and dissemination practices

CORBEL
CORBEL is a collaboration between 11 ESFRI Biological and Medical Research Infrastructures funded through the EU's Horizon 2020 programme. The goal of CORBEL is to establish a framework of shared services between the participating infrastructures (BMS RI), which enhances the efficiency, productivity and impact of European biomedical research and its translation into medicine. The CORBEL consortium is led by ELIXIR as the coordinator and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI) as co-coordinator.

In the first half of 2016 the project set up its governance and administrative structures, established user and stakeholder forums, and carried out technology and gap analysis to set the direction for the project's technical and scientific development.

The project also established the BMS RI Innovation Office to help the European Biological and Medical Research Infrastructures with industry collaboration and technology transfer, providing legal support and partnering advice.

In the second half of the year, the project started the technical and research effort to develop the CORBEL platform for shared infrastructure services. In October CORBEL launched the first Open Call for research projects to access integrated research infrastructures. The call offered four different Access Tracks: (1) Genotype to phenotype analysis, (2) Predictive systems pharmacology for safer drugs and chemical products, (3) Structure-function analysis of large protein complexes and (4) Marine Metazoan developmental models. The project received 80 proposals, which were reviewed by external experts and by the service providers requested by the applicants.

Supporting activities

Capacity building and Node development

The ELIXIR Capacity Building Programme (1) supports ELIXIR Nodes in their organisation and governance, including sharing best practices in accessing European Structural and Investment Funds (ESIF), and (2) develops and coordinates ELIXIR-wide 'communities of practice' for management, administrative and technical services in ELIXIR Nodes.

The ELIXIR Node Handbook
In 2016 the Capacity Building Programme developed the ELIXIR Node Handbook, which provides a basic reference for organisational, legal and governance aspects of ELIXIR Nodes' functioning. Building on and extending the ELIXIR Handbook of Operations, the Node Handbook will be regularly updated to reflect the experience from individual ELIXIR Nodes.

Data Nodes Network
The ELIXIR Data Nodes Network focuses on establishing guidelines and good practices to facilitate efficient data collection from national data resources and channelling these into international archives. The goal is to create routes for uniform data publishing across ELIXIR, including repositories for replication of reference data and secure storage of sensitive data.

The network was launched at the Capacity Building Workshop held in April in Prague, Czech Republic. The first component of the Data Nodes Network was the Local EGA developed by the ELIXIR Human Data Use Case. The release and full implementation of the Local EGA in the Data Nodes Network is planned for 2017.

Genome assembly and annotation
Genome assembly and genome annotation are two areas of bioinformatics analysis that require considerable time, resources, and knowledge. These resources are unevenly distributed throughout Europe, with only a few centres dealing with these analyses regularly. The ELIXIR Capacity Building Programme supports the transfer of genome annotation and assembly know-how from those centres across ELIXIR Nodes. One training course on genome annotation and assembly was organised in October 2016 in Prague, Czech Republic, attended by 24 participants. The course content is now available on the ELIXIR eLearning platform. Two more courses are planned in 2017. ELIXIR Sweden and ELIXIR Czech Republic also started a pilot project to investigate how the expert centres could supply genome assembly and genome annotation as a service in Europe. The project is scheduled to finish in 2017.

Governance

To ensure the integration of bioinformatics services into a coherent distributed infrastructure, it is critical to create effective links between the national institutes that make up ELIXIR Nodes, and between those institutes and the ELIXIR Hub. This is done through the signature of Collaboration Agreements, which allow the ELIXIR Nodes to receive funding from the ELIXIR Hub for Commissioned Services.

In 2016, the ELIXIR Hub worked closely with ELIXIR Nodes on the development of Collaboration Agreements; three Collaboration Agreements (France, Portugal and Spain) were signed during the period. By the end of 2016, a total of twelve ELIXIR Nodes had their Collaboration Agreements in place. Negotiations will continue with the remaining Nodes in 2017.

In parallel with the development of Collaboration Agreements, the ELIXIR Hub began working on ELIXIR Node Service Delivery Plans (SDPs), which describe the scientific and service provision content that each ELIXIR Node provides through ELIXIR. In 2016 the Hub prepared ten SDPs (Czech Republic, Denmark, Estonia, EMBL-EBI, Finland, France, Netherlands, Norway, Sweden and Switzerland) and submitted them for review and approval to the ELIXIR Board during its 2016 autumn session (21–22 November) in Hinxton, UK. The services included in the Nodes' Service Delivery Plans now form the initial portfolio of ELIXIR services.

Commissioned services
Following the signature of Collaboration Agreements, the ELIXIR Hub can conclude Commissioned Services Contracts with ELIXIR Nodes. These are short-term projects funded through the ELIXIR budget (currently carried out through Implementation Studies) run by ELIXIR Nodes. The Commissioned Service Contract is signed between the ELIXIR Hub and the entity that represents the ELIXIR Node in each country.

In 2016, a total of 37 contracts were approved across nine ELIXIR Nodes. The goal for 2017 is to increase the number of Nodes with Commissioned Service Contracts and perform the first pilot for internal tendering of long-term Infrastructure Services, which will go beyond the current short-term duration of Implementation Studies.

Handbook of Operations
To formalise existing procedures and policies and support governance and administrative practices throughout ELIXIR, the ELIXIR Hub in 2016 developed the ELIXIR Handbook of Operations. The document is the authoritative source of information on ELIXIR procedures, recommendations and guidelines, strategies and reference documents. It is aimed at the whole ELIXIR community, including all staff in ELIXIR Nodes, ELIXIR Hub staff, ELIXIR Board members and national funders. The topics covered by the Handbook include Governance, Nodes and Service provision, ELIXIR Programme cycle, Project management, Communications and External relations, and Technical operations.

Industry engagement

In 2016, ELIXIR appointed a dedicated Industry and International Officer, Pablo Román García. Pablo leads the implementation of ELIXIR's Industry Strategy and runs the ELIXIR Innovation and SME programme.

The ELIXIR Industry Strategy, published in May 2016, sets five objectives in order to increase awareness and promote open innovation:
• Increase industry usage of ELIXIR resources and ensure the name is synonymous with quality
• Enable open innovation by Europe's SMEs
• Build effective partnerships with key industry stakeholders and initiatives
• Ensure effective communication between industry and ELIXIR
• Support the bioinformatics training needs of industry

Each objective is met by a set of actions, carried out by ELIXIR partners. The strategy itself will be reviewed annually against metrics and updated regularly to include new activities. An industry brochure1 was also produced, featuring some of the major activities and relevant initiatives within the five ELIXIR Platforms.

Innovation and SME events
In May 2016, an ELIXIR Innovation and SME forum was held in Norway, focusing on metagenomics and marine aquaculture. Organised in collaboration with ELIXIR Norway and the ELIXIR Marine Metagenomics Use Case, it featured talks from companies such as Marine Harvest, AquaGen and ArcticZymes, who rely on access to public data to run their analysis pipelines. The programme's technical tracks presented some of the free resources offered through ELIXIR, such as Ensembl, the EBI Metagenomics portal, LiceBase, SalmoBase and genomic resources for Atlantic cod.

During 2016, ELIXIR prepared a calendar of four upcoming Innovation and SME forums to take place in 2017. Starting with Helsinki in February, followed by Barcelona (June), Brussels (October) and Paris (November), the events will focus on health, microbiome and rare diseases respectively.

ELIXIR's Industry Advisory Committee
The ELIXIR Industry Advisory Committee (IAC) met for the second time in early 2016, issuing a set of high-level recommendations for ELIXIR, including the need to keep developing good links with other successful industry-academia initiatives such as the Innovative Medicines Initiative. The IAC also recommended continuing to increase awareness through presence at conferences and the promotion of ELIXIR Node services (see page 54 for IAC members).

ELIXIR at industry networking events
ELIXIR was presented at several industry-focused events: the IMI Stakeholder Meeting in Brussels, Belgium, BioEurope in Cologne, Germany, and the Target Validation Using Bioinformatics Conference in Heidelberg, Germany. During the last part of 2016, ELIXIR further expanded its engagement with local biotechnology clusters in Spain, Finland and the UK, and with the Council of European BioRegions.

Industry projects
Throughout 2016, ELIXIR built up collaborations with key industry initiatives. ELIXIR and the IMI-funded OncoTrack project developed their collaboration around using the European Genome-phenome Archive as a solution for long-term storage of data. OncoTrack uses large-scale genomics to generate new data to power research that will improve the early diagnosis of colon cancer. The final report, published at the end of the collaboration, also explains the benefits of ELIXIR services to industry.

ELIXIR kept on developing Bioschemas as a flagship project. This community-driven initiative encourages people in the life sciences, including industry users (e.g. Google and repositive.io2), to use schema.org markup, so that their websites and services contain consistently structured and machine-readable information.
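As a rough illustration of what schema.org markup can look like, the sketch below uses Python's standard json module to generate a JSON-LD description of a hypothetical dataset page, which a website would embed in a script element of type application/ld+json. It is an assumption-based example, not taken from the Bioschemas specifications: the function names, dataset name, URL and keywords are invented, and only a few core schema.org Dataset properties are shown, whereas Bioschemas profiles specify further expected properties that are not modelled here.

```python
import json


def dataset_markup(name, description, url, keywords):
    """Return a schema.org Dataset description serialised as JSON-LD.

    Hypothetical helper: only core schema.org properties are used;
    Bioschemas profile requirements are not modelled.
    """
    record = {
        "@context": "https://schema.org",  # vocabulary the keys refer to
        "@type": "Dataset",                # schema.org type being described
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,              # a JSON-LD array holds several values
    }
    return json.dumps(record, indent=2)


def as_script_tag(json_ld):
    """Wrap JSON-LD in the HTML element commonly used to embed it in a page."""
    return '<script type="application/ld+json">\n' + json_ld + "\n</script>"


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    markup = dataset_markup(
        name="Example marine metagenomics dataset",
        description="Hypothetical dataset used to illustrate schema.org markup.",
        url="https://example.org/datasets/1",
        keywords=["metagenomics", "marine"],
    )
    print(as_script_tag(markup))
```

Because the markup is plain JSON embedded in the page, search engines and other crawlers can read the same structured fields from every site that uses it, which is the consistency the Bioschemas initiative aims for.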

1. Available via http://www.elixir-europe.org/industry
2. https://repositive.io

International collaboration

ELIXIR International Strategy
In October 2016, ELIXIR published its International Strategy, which sets out ELIXIR's role on the global stage. The strategy defines ELIXIR's objectives in international collaboration and will ensure that ELIXIR engages with major international initiatives in bioinformatics and key countries outside Europe.

The four objectives of the ELIXIR International Strategy are to:
• Ensure ELIXIR serves life-science users and communities across the globe
• Support collaboration between ELIXIR and relevant global bioinformatics and data initiatives
• Shape global science policy discussions around data and research infrastructures
• Develop formal collaboration with those countries outside Europe where there is a mutual benefit

ELIXIR is recognised by the G7's Group of Senior Officials to be a Global Research Infrastructure with a potential for membership by countries outside of Europe. At the heart of ELIXIR's International Strategy is a commitment to realise this potential and drive bioinformatics collaboration at the global level.

ELIXIR on the global stage
Collaboration with the US Big Data to Knowledge (BD2K) initiative stepped up in 2016. Several ELIXIR representatives gave talks at the BD2K Annual Meeting in Bethesda, USA, presenting ELIXIR's International Strategy, Training and Interoperability initiatives.

ELIXIR Platforms and groups continued their collaboration with global initiatives such as GOBLET (Global Organisation for Bioinformatics Learning, Education and Training), the Global Alliance for Genomics and Health, and the Research Data Alliance.

At the policy level, ELIXIR began global discussions around the sustainability of life-science databases with representatives from North American, European and Asian funding bodies, facilitated by the Human Frontier Science Program.

The International Conference on Research Infrastructures
ELIXIR was presented at the International Conference on Research Infrastructures (ICRI) in Cape Town, South Africa, on 3-5 October 2016. ICRI is a global forum for research infrastructure operators and managers to discuss collaboration on research infrastructures across countries and scientific domains. ELIXIR presented its International Strategy, and ELIXIR's External Relations Manager, Andrew Smith, gave a talk on Inclusive Research Infrastructures for Development and Capacity building.

ELIXIR Working Groups

The ELIXIR Working Groups gather experts around a specific theme or problem and actively engage with ELIXIR Nodes, ELIXIR Platforms and Use Cases. Their mandate and scope are defined by the ELIXIR Heads of Node Committee; their outcomes range from policy recommendations and white papers to review articles and scoping studies.

In 2016 ELIXIR had three working groups in operation: the Galaxy Recommendations Working Group, the Software Development Best Practices Working Group and the Data Management Plans Working Group.

Galaxy Recommendations Working Group
The Galaxy Recommendations Working Group was established in 2015 to evaluate the use of the online platform Galaxy in the ELIXIR Nodes, and its applications in ELIXIR Platforms and Use Cases. In 2016 the Working Group presented its work at the Galaxy Community Conference in Bloomington, USA1, and organised a Galaxy DevOps Workshop in Heidelberg, Germany.

ELIXIR Working Group on Data Management Plans
The ELIXIR Data Management Plans (DMP) Working Group received a 6-month mandate in December 2015 to explore potential actions in developing a strategy for supporting good data management practices in the life sciences. Following a survey and a discussion held at the ELIXIR All-Hands meeting in March 2016, the Working Group became part of the ELIXIR Training Platform, tasked to develop the ELIXIR Data Management Portal3. The development was led by ELIXIR Netherlands and ELIXIR Czech Republic; the first testing version of the portal was released in December 2016.

The goal of the Data Management Portal is to provide guidance to life-science researchers on data stewardship and help them to create Data Management Plans for their research projects.

Software Development Best Practices Working Group
The Software Development Best Practices Working Group was launched in December 2015 to deliver guidelines on best practices and quality assessment in software development. Following the kick-off workshop in Amsterdam, the group proposed and published a set of metrics to monitor and assess good practices in software development2. The set represents a consensus of experts from different organisations and has the potential to be adopted by the communities they represent. The group also worked on recommendations for Open Source software development. The research paper presenting these recommendations and a corresponding website were published in summer 2017.

1. Coppens F. ELIXIR: a distributed infrastructure for life-science information. F1000Research 2016, 5:1569 (poster) (doi: 10.7490/f1000research.1112469.1)
2. Artaza H, Chue Hong N, Corpas M et al. Top 10 metrics for life science software good practices. F1000Research 2016, 5(ELIXIR):2000 (doi: 10.12688/f1000research.9206.1)
3. http://dmp.fairdata.solutions


ELIXIR Hub activities

ELIXIR Communications

Mathias Uhlén (ELIXIR Sweden) in the ELIXIR profile video

Following an ELIXIR Communications expert workshop held in December 2015, the ELIXIR Hub (with input from partners) developed ELIXIR's first Communications Strategy. The purpose of the Communications Strategy is to act as a reference point for all aspects of ELIXIR's communications work. It provides a comprehensive overview of existing communications channels and policies, including main messages and primary target groups. Annexed to the strategy are more detailed guidance documents on specific aspects of ELIXIR communications (ELIXIR social media strategy, ELIXIR writing style guide and others).

The ELIXIR Intranet was formally launched in December 2015; in 2016 the focus was on expanding functionality and attracting more active users from within ELIXIR. By the end of 2016, the ELIXIR Intranet had nearly 400 users. The ELIXIR Intranet was also the first service to use the newly developed ELIXIR Authentication and Authorisation Infrastructure (AAI): this enabled intranet users to log in with their institutional account as well as their ORCID ID or LinkedIn account.

In October, the ELIXIR Hub released the ELIXIR profile video to present ELIXIR's structure, mission and objectives. The video includes interviews with partners and presents many of ELIXIR's current activities. It will be followed by three more videos presenting specific aspects of ELIXIR. The video is available on the ELIXIR YouTube channel1.

In 2016 the ELIXIR Hub continued the ELIXIR Webinar Series and organised 15 webinars to present the work of ELIXIR Implementation Studies, ELIXIR Platforms and Use Cases.

ELIXIR at ECCB 2016
ELIXIR was the main organising partner of the European Conference on Computational Biology (ECCB), the main European event for bioinformatics and computational biology in 2016, held from 3–7 September in The Hague, Netherlands.

As part of the ELIXIR-ECCB 2016 partnership, the programme of the conference featured a dedicated track to showcase ELIXIR activities and services. The ELIXIR Applications Track session presented twelve projects from across the ELIXIR infrastructure, which were selected from over 40 submissions from ELIXIR Nodes, Platforms and Use Cases.

The full breadth of ELIXIR activities was presented in the ELIXIR Poster session, which showcased nearly 50 posters.

"As one of the major initiatives in Europe and globally, we were delighted to give ELIXIR a venue to present its work to the international and multidisciplinary audience at ECCB. It was the first time that we included an Applications track in the programme and it proved to be a success! It was great to see the interest the ELIXIR Track created and the feedback we received from the delegates was overwhelmingly positive."
Jaap Heringa, Chairman of the ECCB 2016 Organising Committee and Head of ELIXIR Netherlands

ECCB 2016 was organised by the Dutch Techcentre for Life Sciences (ELIXIR Netherlands) and the BioSB research school.

ELIXIR Gateway on F1000Research
The ELIXIR Gateway on F1000Research was launched in December 2015 as a platform to collect and capture ELIXIR's research and technical outputs. In 2016, the ELIXIR channel published nine articles, all transparently peer-reviewed through the F1000Research invited post-publication peer review process.

The most successful article published in the ELIXIR Gateway was by Durinx, McEntyre, Appel et al.2, which presented the criteria and process for identifying and selecting ELIXIR Core Data Resources. It has been viewed by over 16,000 readers. Other important publications include articles by Bousfield, McEntyre, Velankar et al. exploring patterns of database citation in articles and patents3 and by Artaza, Chue Hong, Corpas et al. presenting top ten metrics for life science software development good practices4.

The editorial oversight of the ELIXIR Gateway is provided by an Advisory Board who review all papers submitted to the Gateway and ensure all materials are relevant to the ELIXIR community. The members of the Advisory Board also worked as a review panel for the ELIXIR Applications Track organised as part of ECCB 2016 (see above).

The members of the Advisory Board of the ELIXIR Gateway are:
• Niklas Blomberg, ELIXIR Director
• Inge Jonassen, University of Bergen, Head of ELIXIR Norway
• Barend Mons, Leiden University Medical Center, Netherlands
• Arlindo Oliveira, Instituto Superior Técnico, Head of ELIXIR Portugal
• Bengt Persson, Uppsala University, Sweden, Head of ELIXIR Sweden

1. https://youtu.be/stTY6fxwonY
2. Durinx C, McEntyre J, Appel R et al. Identifying ELIXIR Core Data Resources. F1000Research 2017, 5(ELIXIR):2422 (doi: 10.12688/f1000research.9656.2)
3. Bousfield D, McEntyre J, Velankar S et al. Patterns of database citation in articles and patents indicate long-term scientific and industry value of biological data resources. F1000Research 2016, 5(ELIXIR):160 (doi: 10.12688/f1000research.7911.1)
4. Artaza H, Chue Hong N, Corpas M et al. Top 10 metrics for life science software good practices. F1000Research 2016, 5(ELIXIR):2000 (doi: 10.12688/f1000research.9206.1)

ELIXIR Hub staff

In 2016, the ELIXIR Hub expanded its team and recruited experts for technical, coordination and leadership positions.

Serena Scollen joined the Hub in June as Head of Human Genomics and Translational Data. Her role is to lead the development and implementation of an ELIXIR-wide Human Data strategy. Serena joined ELIXIR from Pfizer, where she was Director and Head of UK . In this role, she led and implemented a genetic and precision medicine strategy to support drug target selection and clinical programmes for the Pain and Sensory Disorders Research Unit.

Two new Technical Coordinators strengthened the Hub’s technical expertise and coordination capacity.
• Montserrat González joined the Hub team in March as ELIXIR Technical Officer. She implemented the quality assurance processes for ELIXIR services and supported the coordination of the ELIXIR Compute Platform. Montserrat left the ELIXIR Hub at the end of 2016 and took up a new position at EMBL-EBI.
• Norman Morrison joined the ELIXIR Hub in July as Technical Lead for the ELIXIR Interoperability Platform. Norman works closely with the Interoperability Platform leadership team, driving the development of a long-term interoperability strategy and architecture for ELIXIR. He took over responsibilities from Andy Jenkinson, who left his temporary assignment at ELIXIR at the end of June.

The ELIXIR Hub External Relations team expanded with the arrival of Pablo Román García as ELIXIR Industry and International Officer. Pablo leads the implementation of ELIXIR’s Industry strategy, including running the ELIXIR Innovation and SME programme. He also supports ELIXIR’s collaboration with countries outside Europe and with relevant global initiatives.

Melissa Balzano joined the Hub in March as ELIXIR Events Officer, replacing Joy Friesner.

To support the administration of EU grants, the Project Management team recruited Elenko Anastasov as ELIXIR Grants Administrator. Elenko supports the financial monitoring and administration of the ELIXIR-EXCELERATE and CORBEL projects.

The ELIXIR Hub also hosted three interns.
• Roberto Preste spent six months (March–August) supporting the Bioschemas initiative.
• Federico López Gómez joined the Hub in October and worked on several short-term projects for the ELIXIR Events portal and the Bioschemas initiative.
• Marine Gabory joined the External Relations team to support ELIXIR’s work on long-term funding strategies and consultation responses.

ELIXIR Hub staff in 2016. From left to right: Federico López Gómez, Nicola Kay, Přemysl Velek, Steffi Suhr, Melissa Balzano, Pablo Román García, Phyllida Hallidie, Norman Morrison, Serena Scollen, Andy Smith, Niklas Blomberg, Marine Gabory and Martin Cook. (Not pictured: Susanna Repo, Rafael Jimenez, Friederike Schmidt-Tremmel, Mikael Linden, Montserrat González, Elenko Anastasov, Roberto Preste)

Governance committees and financial data

ELIXIR Committees

ELIXIR Board

Chair: Torsten Schwede, Switzerland
Vice Chairs: Anna Wetterbom, Sweden; Prof Rein Aasland, Norway

Country: scientific and administrative delegates
Belgium: Laurence Lenoir (from July 2016), Michele Oleo, François Guissart (until June 2016), Didier Flagothier
Czech Republic: Jaroslav Koča, Jan Burianek
Denmark: Troels Tvedegaard Rasmussen
Estonia: Pärt Peterson, Toivo Räim, Priit Tamm
Finland: Per Öster, Riina Vuorento, Jarmo Wahlfors
France: Claudine Medigue, Eric Guittet
Germany (from August 2016): Roland Eils, Johannes Mohr, Alexander Goesmann
Ireland (from July 2016): TBA, Marion Boland, Dara Dunican
Israel: Yossi Kalifa, Ilana Lowi
Italy: Salvatore La Rosa
Luxembourg: TBA, Lynn Wenandy (from June 2016), Pierre Misteri
Netherlands: Ruben Kok, Bea Pauw
Norway: Rein Aasland, Jacob E Wang, Stig Omholt (until March 2016)
Portugal: Ana Teresa Freitas, Andreia Feijão, Tiago Saborida
Slovenia: Damjana Rozman, Albin Kralj
Spain: Ferran Sanz, Cristina Bauluz, Dr Rafael de Andres-Medina
Sweden: Ulf Gyllensten (until October 2016), Karl Gertow (from October 2016), Björn Andersson (from October 2016), Anna Wetterbom (stepped down end of 2016)
Switzerland: Torsten Schwede, Isabella Beretta
UK: Chris Rawlings, Mark Palmer, Amanda Collis
EMBL: Iain Mattaj, Silke Schumacher

Heads of Node Committee

Chair: Niklas Blomberg, ELIXIR Director

Country: Head of Node
Belgium: Yves Van de Peer
Czech Republic: Jiří Vondrášek
Denmark: Søren Brunak
Estonia: Jaak Vilo
Finland: Tommi Nyrönen
France: Jean-François Gibrat
Germany (from August 2016): Alfred Pühler
Ireland (from July 2016): Walter Koch
Israel
Italy: Graziano Pesole
Luxembourg (from June 2016): Reinhard Schneider
Netherlands: Barend Mons (until September 2016); Jaap Heringa (from September 2016)
Norway: Inge Jonassen
Portugal: Arlindo Oliveira
Slovenia: Brane Leskošek
Spain: Alfonso Valencia
Sweden: Bengt Persson
Switzerland: Ron Appel
UK: Carole Goble
EMBL-EBI

Scientific Advisory Committee

Chair: Robert Gentleman, 23andMe, USA
Vice Chair: Janet Kelso, Max Planck Institute for Evolutionary Anthropology, Germany
Members:
Alan Archibald, University of Edinburgh, UK
Pascal Borry, University of Leuven, Belgium
Kate Bushby, Newcastle University, UK
Elina Ikonen, University of Helsinki, Finland
Larry Hunter, University of Colorado, USA (from November 2016)
Edward Marcotte, University of Texas at Austin, USA (from April 2016)
Nicola Mulder, UCT Computational Biology Group (NBN), South Africa (from November 2016)
Francis Ouellette, Ontario Institute for Cancer Research, Canada (from April 2016)
Juni Palmgren, Karolinska Institutet, Sweden
Susan E. Wallace, University of Leicester, UK
Jérôme Wojcik, Quartz Bio, Switzerland

Industry Advisory Committee

Chair: Angel Pizarro, Amazon Web Services, USA
Vice Chair: Philippe Sanseau, GlaxoSmithKline, UK
Members:
Belinda Clarke, Agri-Tech East, UK (from April 2016)
Martin Ebeling, Hoffmann-La Roche, Switzerland
Anita Eliasson, Biocomputing Platforms Ltd, Finland
Iain Hrynaszkiewicz, Springer Nature, UK
Natalia Jiménez Lozano, Atos, Spain
Claus Stie Kallesøe, Grit42, Denmark
Sara Paulina de Oliveira Monteiro, P-BIO, Portugal
Christian Paulitz, Bayer CropScience, Germany
Angel Pizarro, Amazon Web Services, USA
Liz Reynolds, General Bioinformatics, UK
Sándor Szalma, Takeda Pharmaceuticals, USA (from November 2016)
Jerry Lanfear, Pfizer, UK (until June 2016)
Mark Foster, Syngenta, UK (until July 2016)

ELIXIR Scientific Advisory Board, from left to right: Francis Ouellette, Alan Archibald, Nicola Mulder, Robert Gentleman, Juni Palmgren, Susan E. Wallace, Larry Hunter (not pictured: Janet Kelso, Pascal Borry, Kate Bushby, Elina Ikonen, Edward Marcotte and Jérôme Wojcik).

ELIXIR Industry Advisory Committee, from left to right: Pablo Román García (ELIXIR Industry and International Officer), Natalia Jiménez Lozano, Philippe Sanseau, Martin Ebeling, Niklas Blomberg (ELIXIR Director), Anita Eliasson, Christian Paulitz, Liz Reynolds, Angel Pizarro, Iain Hrynaszkiewicz (not pictured: Belinda Clarke, Claus Stie Kallesøe, Sara Paulina de Oliveira Monteiro, Sándor Szalma, Jerry Lanfear and Mark Foster).

Financial data

At its 2014 Summer meeting, EMBL Council unanimously approved ELIXIR’s legal framework. This included ELIXIR’s status within EMBL as a “Special Project” and EMBL’s membership of ELIXIR (EMBL/2013/16/Rev 1).

The budget of ELIXIR is set annually by the ELIXIR Board. All funds related to its activities, including its surplus, are ring-fenced within EMBL’s accounts.

As of 31 December 2016, the total number of signatories to the ECA stood at 20, with France and Spain as Provisional Members. Greece additionally contributed to ELIXIR’s financing as an Observer.

As of 31 December 2015, the total number of signatories to the ECA stood at 15, with France and Spain as Provisional Members. Slovenia additionally contributed to ELIXIR’s financing.

                                                            2016    2016    2015
                                                          Actual  Budget  Actual
Income                                                      €000    €000    €000
ELIXIR Member state contributions
  Ordinary contributions (a)                               3.710   2.759   2.064
  Foreign exchange gain on sterling contributions (b)         50
Grant income (c)                                             899   1.042     150
Other income                                                   1       -       3
Provision for unpaid member state contributions                -       -       -
Net Income                                                 4.660   3.801   2.217

Expenditure
Technological activities
  Salaries                                                   195     440     112
  Running costs                                              151     260      79
  Equipment and depreciation                                   -       8       -
  Commissioned services (see Implementation studies)         435     940
Total expenditure Technological Activities                   781   1.648     191

Directorate and Administrative expenditure
  Salaries                                                   640   1.161     800
  Running costs                                              260     452     463
  Equipment and depreciation                                   -       4       -
Total expenditure Directorate and Administration             900   1.617   1.263

Support and Admin Infrastructure costs                       342     534     405
Grant expenditure incurred                                   899       -     150

Total expenditure                                          2.922   3.799   2.009
Surplus/(Deficit)                                          1.738       2     208

ELIXIR Member state contributions

                                                          2016    2015
                                                        Actual  Actual
(a) ELIXIR Member state contributions                     €000    €000
Belgium                                                    123     116
Czech Republic                                              46      43
Denmark                                                     79      74
Estonia                                                      5       4
Finland                                                     61      58
France                                                     719     226
Germany                                                    376
Greece                                                      22       -
Ireland                                                     20
Israel                                                      56      53
Italy                                                      531
Luxembourg                                                   4
Netherlands                                                205     193
Norway                                                     113     107
Portugal                                                    54      51
Slovenia                                                    10      11
Spain                                                      372     265
Sweden                                                     125     118
Switzerland                                                159     151
United Kingdom                                             630     594
Total                                                    3.710   2.064

Less provision for unpaid member state contributions         -       -
Net ELIXIR member state contributions                    3.710   2.064

(b) The ELIXIR Board approved that, from January 2016, the UK would pay its member state contributions in sterling (ELIXIR/2015/28). The difference between the euro value of these contributions at the date of payment and their euro value at the date of approval of the 2016 budget was €50k.

                                2016    2015
                              Actual  Actual
(c) Grant income                €000    €000
Grant funding awarded          5.832   5.941
Grant income earned              899     150
Expenditure incurred           (899)   (150)
Unutilised grant income        4.892   5.791

(d) The following countries have amounts due or prepaid at 31 December 2016:

                                  Contribution  Contribution
Values in €000                            2016     2013–2015         Total
Greece                                       -            45            45
Germany                                    376             -           376
Israel                                       -            53            53
ELIXIR member state receivables            376            98           474

Commissioned Services: ELIXIR Implementation studies

Implementation Studies are short-term projects carried out by ELIXIR Nodes to address key scientific and technical issues within ELIXIR. The outcome of an Implementation Study may be a description of service requirements, a piece of software, or a technical deliverable with an accompanying report.

Implementation Studies are funded through the budget of the ELIXIR Hub and form part of ELIXIR’s ongoing activities in a particular Platform or Use Case. They are proposed by Platforms, agreed with the ELIXIR Heads of Nodes Committee and approved by the ELIXIR Board.

In 2016 ELIXIR ran nine Implementation Studies. Three additional studies were approved in 2016 with a start date in early 2017. Ongoing and completed Implementation Studies are listed at https://www.elixir-europe.org/about-us/implementation-studies

Finished in 2016
• Metrics discovery and implementation in life-sciences. Sector: Tools. Nodes: United Kingdom. Leads: Manuel Corpas, UK; John Hancock, UK
• Genomic data management for translational and biomarker research using the European Genome-phenome Archive (EGA). Sector: Human Data. Nodes: Netherlands, EMBL-EBI, Spain. Leads: Sanne Abeln, NL; Dylan Spalding, EMBL-EBI
• Rare disease interoperability backbone. Sector: Rare Disease. Nodes: Netherlands, France, EMBL-EBI. Lead: Marco Roos, NL
• Bringing European data forward for international data discovery. Sector: Human Data. Nodes: France, Switzerland, Spain, Belgium, EMBL-EBI, The Netherlands, Sweden. Leads: Ilkka Lappalainen, FI; Jordi Rambla, ES
• Building the foundation for Data Carpentry and Software Carpentry within ELIXIR. Sector: Training. Nodes: United Kingdom, Norway, Finland, Italy, Belgium, Switzerland, The Netherlands, Slovenia, Estonia, Czech Republic, France, Israel, Portugal. Lead: Alexandra Pawlik, UK

Ongoing in 2016
• Implementation Study on Identification and Interoperability. Sector: Data. Nodes: EMBL-EBI. Lead: Henning Hermjakob, EMBL-EBI
• Data Resource Implementations for the Global Alliance for Genomics and Health Data Schema. Sector: Human Data. Nodes: Switzerland, France, EMBL-EBI. Lead: Michael Baudis, CH
• The scientific and economic impact of ELIXIR Data Resources – Towards a sustainable funding model for the UniProt-SwissProt use case. Sector: Data. Nodes: Switzerland. Lead: Christine Durinx, CH
• Solutions for IMI data management. Sector: Human Data. Nodes: EMBL-EBI, ELIXIR Hub, IMI OncoTrack, Spain. Leads: Dylan Spalding, EMBL-EBI; Susanna Repo, ELIXIR Hub; David Henderson, IMI OncoTrack

Approved in 2016 (to start early 2017)
• Integrating distributed resources in Ensembl Genomes. Sector: Data. Nodes: EMBL-EBI, Sweden, Norway. Lead: Paul Kersey, EMBL-EBI
• Microbial metabolism resource for Systems Biology. Sector: Data. Nodes: France, Switzerland, EMBL-EBI. Lead: Claudine Medigue, FR
• Proteomics infrastructure service. Sector: Data. Nodes: EMBL-EBI, Germany. Lead: Juan Antonio Vizcaino, EMBL-EBI

Credits and acknowledgements

Produced at the direction of the ELIXIR Board in May 2017.

With special thanks to all of those who contributed to the development of ELIXIR infrastructure in 2016, most notably Heads of Nodes, Platform and Use Case leads, Technical and Training Coordinators and members of the various Working Groups.

© 2017 ELIXIR

This publication was produced by the External Relations team at the ELIXIR Hub

For more information about ELIXIR please contact [email protected]

Art direction and design: Design Science
Illustration: Jai Wilson

ELIXIR is building a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, the environment, the bioindustries and society.

Contact
Niklas Blomberg, Director
ELIXIR
Wellcome Genome Campus
Hinxton, Cambridgeshire CB10 1SD, United Kingdom
+44 (0)1223 492 670
+44 (0)1223 494 468
[email protected]
www.elixir-europe.org