Bioinformatics Challenges for the Life Sciences

Rolf Apweiler, Associate Director, EMBL-EBI
www.ebi.ac.uk

What is EMBL-EBI?

• Part of the European Molecular Biology Laboratory
• International, non-profit research institute
• Europe’s hub for biological data services and research
• 550 members of staff from 55 nations

What we do

Bioinformatics
• Services
• Research
• Training
• Industry engagement
• European coordination

EMBL-EBI Service Mission

To enable life science research and its translation to medicine, agriculture, the bioindustries and society by providing biological data, information and knowledge.

ELIXIR connects national bioinformatics centres and EMBL-EBI into a sustainable European infrastructure for biological research data.

ELIXIR underpins life science research across academia and industry
• Medicine • Agriculture • Environment • Bioindustries

www.elixir-europe.org

ELIXIR: a distributed infrastructure

• Data
• Tools
• Compute
• Standards
• Training

What services do we provide?

Labs around the world send us their data and we…
• archive it
• classify it
• share it with other data providers
• analyse, add value and integrate it
• provide tools to help researchers use it

A collaborative enterprise

Data resources at EMBL-EBI

Genes, genomes & variation
European Nucleotide Archive, Ensembl, European Genome-phenome Archive, Ensembl Genomes, Metagenomics portal, 1000 Genomes

Gene, protein & metabolite expression
ArrayExpress, MetaboLights, Expression Atlas, PRIDE

Proteins & protein families
InterPro, Pfam, UniProt

Literature & ontologies
Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

Molecular structures
Protein Data Bank in Europe, Electron Microscopy Data Bank

Chemical biology
ChEMBL, ChEBI

Reactions, interactions & pathways
IntAct, Reactome, Enzyme Portal

Systems
BioModels, BioSamples

Looking ahead: challenges and opportunities

Bigger and bigger data

Life science: many data types

Genes, genomes & variation

Gene, protein & metabolite expression

Protein sequences, families & motifs

Macromolecular structures

Interactions, reactions & pathways

Chemogenomics & metabolomics

Phenotypes

Challenges

• Data volume
• Data type diversity
• User diversity
• Data interoperability, especially semantics and shared controlled vocabularies
• Added-value resources through curation
• Tailoring data delivery (portals) to the needs of specific scientific communities
• Integrating articles, big data and the long tail of (unstructured) data
• Integrating user data with large data sets: the Embassy cloud model
• Images
• Medical data: controlled-access data is very expensive
• User training

EMBL-EBI now and in 2020

• Now:
  • 10 million web hits a day
  • 4 million unique IP addresses a year
  • 30 000 cores
  • 50 PB of disc

• 2020:
  • 20 million web hits a day
  • 7 million unique IP addresses a year
  • 300 000 cores
  • 4 EB (exabytes) of disc

Thank you
www.ebi.ac.uk

Twitter: @emblebi Facebook: EMBLEBI Weibo: emblebi YouTube: EMBLMedia