Bioinformatics Challenges for the Life Sciences
Rolf Apweiler
Associate Director, EMBL-EBI
www.ebi.ac.uk

What is EMBL-EBI?
• Part of the European Molecular Biology Laboratory
• International, non-profit research institute
• Europe’s hub for biological data services and research
• 550 members of staff from 55 nations

What we do
Bioinformatics
• Services
• Research
• Training
• Industry engagement
• European Coordination

EMBL-EBI Service Mission
To enable life science research and its translation to medicine, agriculture, the bioindustries and society by providing biological data, information and knowledge

ELIXIR connects national bioinformatics centres and EMBL-EBI into a sustainable European infrastructure for biological research data
ELIXIR underpins life science research across academia and industry
• medicine
• agriculture
• bioindustries
• environment
www.elixir-europe.org

ELIXIR: a distributed infrastructure
• Data
• Tools
• Compute
• Standards
• Training

What services do we provide?
Labs around the world send us their data and we…
• Archive it
• Classify it
• Share it with other data providers
• Analyse, add value and integrate it
• …provide tools to help researchers use it
A collaborative enterprise

Data resources at EMBL-EBI
• Genes, genomes & variation: European Nucleotide Archive, Ensembl, European Genome-phenome Archive, Ensembl Genomes, Metagenomics portal, 1000 Genomes
• Gene, protein & metabolite expression: ArrayExpress, MetaboLights, Expression Atlas, PRIDE
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
• Proteins & protein families: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome
• Systems: BioModels, BioSamples, Enzyme Portal

Looking ahead
Challenges and opportunities

Bigger and bigger data

Life science: many data types
Genes, genomes & variation
Gene, protein & metabolite expression
Protein sequences, families & motifs
Macromolecular structures
Interactions, reactions & pathways
Chemogenomics & metabolomics
Phenotypes

Challenges
• Data volume
• Data type diversity
• User diversity
• Data interoperability, especially semantics and shared controlled vocabularies
• Added-value resources through curation
• Tailoring data delivery (portals) to the needs of specific scientific communities
• Integrating articles, big data and the long tail of data (unstructured data)
• Integrating user data with large data sets: Embassy cloud model
• Images
• Medical data: controlled-access data is very expensive
• User training

EMBL-EBI now and 2020
• Now:
  • 10 million web hits a day
  • 4 million unique IP addresses a year
  • 30 000 cores
  • 50 PB of disc
• 2020:
  • 20 million web hits a day
  • 7 million unique IP addresses a year
  • 300 000 cores
  • 4 EB of disc

Thank you
www.ebi.ac.uk
Twitter: @emblebi
Facebook: EMBLEBI
Weibo: emblebi
YouTube: EMBLMedia