Alfonso Valencia. Ph.D. ICREA Professor Director Life Sciences Dept. Director Spanish Bioinformatics Institute INB-ISCIII ELIXIR-ES
Big data en investigación biomedica y en práctica clínica.
5/March/2019 XII Conferencia Anual de las Plataformas Tecnol6gicas de lnvestigaci6n Biomedica: Medicamentos lnnovadores, Nanomedicina Tecnologfa Sanitaria y Mercados Biotecnol6gicos Biology and Medicine as Big Data science
~100PB High Energy Physics
Data Size Biology & Astronomy Medicine ~10PB Climate Models
Model predictability
From Ewan Birney EBI … Biology and Medicine …in 10 years
~100PB High Energy Physics
Data Size Biology & Astronomy Medicine ~10PB Climate Models
Model predictability Digital Twins view of Future Medicine
.. Datos muy especiales
- Complejidad asociada a experimentos y muestras (meta-información) - Heterogeneidad propia de la biología - Confidencialidad - Propiedad y gestión de los datos
Literature and ontologies CitExplore, GO Genomes Ensembl, Ensembl Large Scale Genomes, EGA Nucleotide sequence EMBL-Bank Genomics Proteomes Gene expression UniProt, PRIDE ArrayExpress Protein structure PDBe Protein families, motifs and domains InterPro Chemical entities ChEBI, ChEMBL Protein interactions IntAct
Pathways Reactome
Systems BioModels
BSC CNAG EBI-EMBL
Personalized Medicine schema
Gomez-Lopez (2017 ) Briefings in Bioinformatics, ELIXIR: European Bioinformatics Infrastructure
Data
Interoperability
Tools
Compute
Training Marine metagenomics
Crop and forest plants
Human data
Rare diseases ELIXIR Human genomics platform
Generate Archive, discover, manage Diverse data from Store, access and share data of multiple Analyse diverse providers types and origin Transbionet network (INB hosted) General Purpose
11.15 Pflops/s
3,456 nodes of Intel Xeon v5 processors
14PB storage
22th in the World 6th in Europe If you want to know more, see… Emerging Technologies, for evaluation of 2020 Exascale systems
3 systems, each of more than 0,5 Pflops/s with ARMv8. KNH, & Power9+NVIDIA,
1.5 Pflops/s
Life Sciences 7 research groups Understanding living 5 Support Units (including INB) organisms by theoretical and computational methods 120 scientists/engineers by Machine Learning 2020
Computational Protein and genomics drug modeling
Personalized 0 Medicine
Text Evaluation of Mining social impact
Bio- Infrastructure BSC technical strategy for Personalized Medicine Personalized Medicine schema
Gomez-Lopez (2017 ) Briefings in Bioinformatics, NLP and Text Mining Sanidad
Justicia Infraestructura computacional
Infraestructura Vigilancia Sectorial* PlataformaTAy NLP Recursos dominio Open Link Data Link Open lingü í
stica Recursos generales
*Inteligencia competitiva
Nuevos verticales
Plataforma TL componentes
Datos Recursos
Casos de Uso, Plataforma Aplicaciones
Digital Twins view of Future Medicine
IMI Lesson I am learning (1/2)
Symmetry of the problems between companies and academia - information silos in departments or administrations - difficult to adhere to long term projects - limited capacity to adapt to new technologies
Data Management is Key - Federation for searches and execution (beacons and containers) - Encryption (i.e. searches in encrypted data) - Trusted partners as mediators - FAIR / standards to facilitate the internal operations
Thrust (has a lot to do with exploration versus stability)
IMI Lesson I am learning (2/2)
New calls - Data management issues - Analysis platforms integrating OMICS and EHR (others) (there are many call - projects on-going) - AI/ML applications on diverse data sets and applications
New projects should be more challenging and transformative.
Darío Garcia-Gasulla, Ferran Pares, Armand Vilalta
BSC (High Performance AI) Roland Mathis, Matteo Manica Computational biology group (BSC)
IBM (Computational systems biology) Vera Pancaldi CRCT, Toulouse Ignacio Martin-Subero IDIBAPS, Barcelona Enrique Carrillo de Santa Pau IMDEA, Madrid David Juan Sopeña IBE, Barcelona Felipe Were CNIC, Madrid Biola Javierre IJC, Barcelona www.bsc.es IBM-BSC Deep Learning Center Eric Soriano CFIS UPC, Barcelona
Davide Cirillo, Miguel Ponce de León, Alba Lepore, Miguel Vazquez, Martin Krallinger, Jon SánchezSpanish ,MINECO “Comorbidity networks” eTRANSAFE UlisesBSC, Cortés Barcelona María OPENMINDTED High Rodríguez Performance Computationa Excellerate- Artificial l systems ELIXIR Intelligence biology fellows Alfons_Valenci a
Predicting protein structure from the sequence is one of the fundamental problems in molecular biology.
It is the key to the prediction of the consequences of mutations in human diseases and to drug design
07/03/2019 All CASP results are available (each target / all predictors)
07/03/2019 Data mining: Complex Networks
16M abstracts
● Bio-entities associations ● Functional interpretation ● Translational applications
07/03/2019 1 Data mining: Complex Networks MelanomaMine
AUC = 0.95 (5-fold cross- validation)
Martin Krallinger (Text Mining Unit, BSC) Data mining: Complex Networks
07/03/2019 Data mining: Complex Networks Melanoma Content Analytics graph
myasthenia azathioprine cheilitis nephrectomy elastosis sarcoidosis aids appendectomy breast cancer
lung cancer appendicitis tyrosine kinase inhibitor LOX lobectomy neurosurgery tetanus apc potency dysphagia leukoderma sequela hypophysectomy chest pain SMAD3 diplopia SKI CTGF TGFB1 CAV1 autoimmunity pancreatic cancer dilation and curettage MMP1 MIF TF NOTCH1 ITK IRF8 ascites IGF1 IL10 splenectomy pancreatitis IL6 MST1 CD81 irritation lymphadenitis anemia endometriosis vulvectomy hyperactivity XIAP lymphadenopathy toxoplasmosis gastrectomy hepatoma FAS TP53 teratoma BRCA1 uveitis fertility CD55 vitiligo CDKN2 CCND neurofibroma colon cancer BRCA2 MSH2 melanosis amaurosis enucleation MLH1 CDKN2B hypercalcemia exenteration atypicality LTBP2 retinal scleritis carcinoid PTEN detachment psoriasis iritis hemosiderosis goiter thyroidectomy photocoagulation amputation 1 Data mining: Complex Networks Improving the interpretation of standard pathway analysis by linking ontological annotations