
Infrastructures for research and innovation

Professor Ewan Birney FRS
Director, EMBL-EBI
www.ebi.ac.uk

Outline of talk

• Who am I? What is EMBL?
• The change in
• The need to stratify patients in clinical care and drug discovery
• Europe’s assets
• A path to releasing Europe’s strengths

The European Molecular Biology Laboratory

• 80+ nationalities
• >1600 personnel
• 6 sites in Europe

• Heidelberg, Germany: Main Laboratory
• Hinxton, Cambridge, UK
• Grenoble, France: Structural Biology
• Barcelona, Spain: Tissue Biology, Disease Modeling
• Rome, Italy: Neuroscience
• Hamburg, Germany: Structural Biology

Ewan Birney

• Led the original team that analysed the human genome (gene sets)
• Research in genomic information
• Set up many key databases in genomics (e.g., Ensembl)

• Director of EMBL-EBI
• Non-executive director (NHS clinical genomics)
• Formal advice to the UK, Finnish, Danish and US governments; informal advice to other governments
• Advisor to both large (GSK) and small (Oxford Nanopore) companies
• Chair of the Global Alliance for Genomics and Health (GA4GH)

We have been living through a revolution

One genome: 2003 to 2018

[Figure: the cost of sequencing a genome in 2003 vs. the cost of sequencing a genome in 2018]

Imaging: new technologies change the game

• EM tomography; atomic-scale models from EM
• Super-resolution light microscopy
• High-resolution MRI and CT
• Light-sheet microscopy

Genomics: from research to healthcare

• Research: English language; light-weight legal framework; similar systems; open data; publications; grant funding
• Practicing medicine: national language; heavy legal framework; different systems; closed data; not published; contract funding

Big numbers!

Stratification of patients

[Figure: stratification divides patients into Class A, Class B and Class C]

Benefits of stratification

• In clinical practice:
  • Better diagnosis and prognosis
  • Better use of (expensive) medicines (“personalised” medicine)
  • Specific care pathways optimised for each class of patient
• In drug discovery:
  • More clarity on the therapeutic goals in early development
  • Phase II and Phase III trials that are cheaper and more likely to succeed

4 pillars of stratification

• Very large cohorts at population scale
• Harmonised representation of EHRs
• Appropriate genomic assays at scale
• Clear legal basis to access data and approach patients
→ Virtual cohorts, ideally with key aspects of ascertainment

Europe’s assets

Well-regulated, often state-run healthcare

• Total population size of >200 million
• The largest coherent EHR records in the world (Denmark: 6 million Danish citizens)
• Sweden, Norway and Finland all have good record keeping
• Large, predominantly state-run systems in France and the UK
• Historical as well as future health data

The most advanced clinical and population genomics programs globally

• Finland: >10% of the population sequenced in 5 years
• Estonia: aiming for all 1 million biobanked
• Denmark: 5 million EHRs, 100,000 sequenced
• UK: goal of 5 million with genomic assays within 5 years
• France: clinical and population-scale assays for ~1 million within 5 years
• Spain: a variety of regional programs with scale to millions

A European framework: MEGA

Genomic infrastructure
• EMBL-EBI
  • World leader in genome information and analysis
  • The most comprehensive life science datasets globally
• ELIXIR
  • European-wide network with national nodes to connect local research and healthcare

[Figure: ELIXIR node map of national nodes and associated institutes, e.g. ELIXIR-BE and ELIXIR-CZ]

The need for infrastructure

[Figure: clinical record + diagnosis; reference infrastructure; national genome database]

A vibrant commercial research sector

• Many large European pharmaceutical companies
  • Sanofi, GSK, Roche, AstraZeneca, Novartis
• Balance of US vs. European research intensity
• Vibrant SME community
  • Based around clusters: Heidelberg-Stuttgart-Munich-Basel, Paris-Brussels-Amsterdam, Oxford-Cambridge, Barcelona, Stockholm-Helsinki
• Public-private partnerships
  • IMI
  • OpenTargets @EMBL-EBI

A path for European stratified populations

Alignment of European programs

• Million genomes declaration
• EMBL-EBI and ELIXIR (ESFRI) as genomic infrastructure
• IMI programs as an instrument to foster cross-institutional, trans-national, public-private partnerships

Engagement with nation-state health strategy
• Practical, “on the ground” implementation is in the hands of Europe’s healthcare systems, through their operations and regulation
  • Source of EHR information
  • Source of genomic information
• The fundamental need for cohorts of >100 million people will drive trans-national work
  • It is clear that smaller countries need federation between countries
  • It is clear for rare disease in all countries, and will become relevant to more diseases

Engagement with global structures

• Europe has to tackle trans-national coordination far earlier than the US or Chinese systems
• A similar opportunity to the GSM mobile phone standards: the need for ultimately trans-national access places Europe as the leader in how to solve this
  • Legal and ethical components (GDPR)
  • Technical components
• Leadership in global bodies, such as GA4GH (Global Alliance for Genomics and Health)

Thank you!

EMBL-EBI
Follow me on @ewanbirney

1/11/2019

Our mission

• Deliver excellent research
• Deliver scientific services
• Train the next generation of scientists
• Engage with European industry
• Coordinate bioinformatics in Europe

Life: many data types

Genes, genomes & variation

Gene, protein & metabolite expression

Protein sequences, families & motifs

Phenotypes

Macromolecular structures

Interactions, reactions & pathways

Chemogenomics & metabolomics

Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, RNAcentral
• Gene, protein & metabolite expression: Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEBI, ChEMBL, SureChEMBL
• Systems: BioModels, Enzyme Portal, IntAct, Reactome
• Literature & ontologies: Experimental Factor Ontology, Gene Ontology, Europe PMC
• Molecular archives: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, ArrayExpress, BioSamples, BioStudies

~410 people; worldwide collaborations; global reference data

See the live map at www.ebi.ac.uk/about/our-impact

Big data, big demand
• Scientists at over 3.2 million unique IP addresses use EMBL-EBI websites
• ~27 million requests to EMBL-EBI websites every day

EMBL-EBI delivered US$1-5 billion in efficiency savings worldwide

Sustainable funding
• Over 40 different funding agencies worldwide
• Forward commitment of over £100 million

Research groups at EMBL-EBI

Alex Bateman, Ewan Birney, Pedro Beltrao, Alvis Brazma, Rob Finn, Paul Flicek, Andrew Leach, Moritz Gerstung

Nick Goldman, Zamin Iqbal, John Marioni, Evangelia Petsalaki, Oliver Stegle, Janet Thornton, Virginie Uhlmann, Daniel Zerbino

Research data at EMBL-EBI

• Mutations affecting proteins implicated in rare diseases
• Evolution of phosphorylation sites
• Proteomic & RNA comparison
• Genomics of infectious disease
• Modeling unwanted variation in single-cell transcriptome studies
• Single-cell genomics
• Translational bioinformatics

EMBL Research Community

[Photo: research group]

• ~170 people
• ~50 visitors/year

Medical genomics: serious efforts under way

• Genomics England
  • 100,000 Genomes by end of 2019 (35,000 done now)
  • Long term: 60K-100K from “routine healthcare” across the NHS
• Plan France Génomique
  • ~100,000 genomes/year by 2025; first sites selected
• Iceland
  • 40% of the population genotyped/sequenced + imputed
• Switzerland
  • SPRT program to promote genomic medicine
• Finland
  • At least ~10% (0.5 million) of the population with sequence data by 2020
• US: complex payer/insurance-led market
  • Mixture of HMO (Geisinger) and NIH (All of Us: mainly a cohort)

Genomics: from research to healthcare

Bridges need at least two anchors

Long-term goals

• Ideal: an “Institute for Biomedical Informatics” in each country
• Large nations/populations: a distributed network with a clear centre of gravity
• EMBL-EBI & ELIXIR handle research data: reference collections and sharing amongst researchers (including clinical)
• Institute for Biomedical Informatics:
  • Responsible for exploiting molecular reference data
  • Provides the national link and point of reference (e.g., around legislation)
  • Broker for research data (back to EMBL-EBI, NCBI & ELIXIR)

France and EMBL-EBI

Basic research

• Working collaboratively with ELIXIR-France
  • Orphanet, CAZy
  • Support training in bioinformatics
• Ensuring French scientists and institutes exploit EMBL-EBI
  • Seamless APIs to allow institute-driven data submission (less complexity for the user/scientist, with EMBL-EBI as backup)
  • GDR Mediatec for Chemical Ecology -> MetaboLights
  • Genoscope DNA data -> ENA
• Research work with French research scientists
  • Institut Pasteur and Institut Curie links
  • French Embassy internships at EMBL-EBI

Applied research: medicine

• Ensuring transfer of skills and expertise to the French medical system
  • France’s medical genomics must be run and delivered in France (obviously!)
  • Technical aspects, e.g., archiving DNA data at scale nationally
• Reference human biology resource
  • Orphanet
  • Infectious epidemiology/bacterial genome sequencing?
• Working with ELIXIR-France and others for international standards
  • ELIXIR’s role in GA4GH standards

Global standards: the GA4GH

• GA4GH is THE standards-setting body for genomics and healthcare
• Embraces a federated approach
• Setting community standards early
• Cloud: analysis carried out where the data ‘lives’
• “You’re already using it!”: SAM/BAM/CRAM/VCF formats
• Tools: htsget, the first step away from file-based access
• Rare disease diagnoses: Matchmaker Exchange
• Federated discovery: GA4GH Beacons

Federation

• Open research data: aggregate data globally, then download and analyse locally
• Healthcare data with research use: analyse data locally (via VMs), then collate analyses
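The second federation mode (analyse locally, collate analyses) can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not a GA4GH API: it assumes the shared analysis is a pooled alternate-allele frequency at a single variant, and the names `SiteSummary`, `analyse_locally` and `collate` are hypothetical. Each site reduces its individual-level genotypes to two counts inside its own environment, and only those counts travel to the coordinator. Real deployments would also need access control and disclosure checks (for example, suppressing very small counts), which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts returned by one site (hypothetical); no genotypes leave it."""
    alt_alleles: int    # alternate alleles observed at the site
    total_alleles: int  # alleles examined (2 per diploid individual)


def analyse_locally(genotypes: list[int]) -> SiteSummary:
    """Runs inside a site's own environment (e.g. a VM next to the data).

    Each genotype is the number of alternate alleles (0, 1 or 2) carried by
    one diploid individual at a single variant.
    """
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1 or 2 alternate alleles")
    return SiteSummary(alt_alleles=sum(genotypes), total_alleles=2 * len(genotypes))


def collate(summaries: list[SiteSummary]) -> float:
    """Pools per-site counts into one alternate-allele frequency."""
    alt = sum(s.alt_alleles for s in summaries)
    total = sum(s.total_alleles for s in summaries)
    if total == 0:
        raise ValueError("no alleles to collate")
    return alt / total


if __name__ == "__main__":
    # Hypothetical genotypes held at two separate sites.
    site_a = analyse_locally([0, 1, 2, 0])  # 3 alternate alleles out of 8
    site_b = analyse_locally([1, 1, 0])     # 2 alternate alleles out of 6
    print(collate([site_a, site_b]))        # equals 5 / 14
```

Pooling raw counts, rather than averaging each site's frequency, gives the same result as if all the data had been centralised for this statistic. Averaging per-site frequencies (3/8 and 2/6) would give small sites too much weight.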