<<

Alfonso Valencia. Ph.D. ICREA Professor Director Life Sciences Dept. Director Spanish Institute INB-ISCIII ELIXIR-ES

Big data en investigación biomedica y en práctica clínica.

5/March/2019 XII Conferencia Anual de las Plataformas Tecnol6gicas de lnvestigaci6n Biomedica: Medicamentos lnnovadores, Nanomedicina Tecnologfa Sanitaria y Mercados Biotecnol6gicos Biology and Medicine as Big Data science

~100PB High Energy Physics

Data Size Biology & Astronomy Medicine ~10PB Climate Models

Model predictability

From EBI … Biology and Medicine …in 10 years

~100PB High Energy Physics

Data Size Biology & Astronomy Medicine ~10PB Climate Models

Model predictability Digital Twins view of Future Medicine

.. Datos muy especiales

- Complejidad asociada a experimentos y muestras (meta-información) - Heterogeneidad propia de la biología - Confidencialidad - Propiedad y gestión de los datos

Literature and ontologies CitExplore, GO Genomes Ensembl, Ensembl Large Scale Genomes, EGA Nucleotide sequence EMBL-Bank Proteomes Gene expression UniProt, PRIDE ArrayExpress structure PDBe Protein families, motifs and domains InterPro Chemical entities ChEBI, ChEMBL Protein interactions IntAct

Pathways Reactome

Systems BioModels

BSC CNAG EBI-EMBL

Personalized Medicine schema

Gomez-Lopez (2017 ) Briefings in Bioinformatics, ELIXIR: European Bioinformatics Infrastructure

Data

Interoperability

Tools

Compute

Training Marine metagenomics

Crop and forest plants

Human data

Rare diseases ELIXIR Human genomics platform

Generate Archive, discover, manage Diverse data from Store, access and share data of multiple Analyse diverse providers types and origin Transbionet network (INB hosted) General Purpose

11.15 Pflops/s

3,456 nodes of Intel Xeon v5 processors

14PB storage

22th in the World 6th in Europe If you want to know more, see… Emerging Technologies, for evaluation of 2020 Exascale systems

3 systems, each of more than 0,5 Pflops/s with ARMv8. KNH, & Power9+NVIDIA,

1.5 Pflops/s

Life Sciences 7 research groups Understanding living 5 Support Units (including INB) organisms by theoretical and computational methods 120 scientists/engineers by Machine Learning 2020

Computational Protein and genomics drug modeling

Personalized 0 Medicine

Text Evaluation of Mining social impact

Bio- Infrastructure BSC technical strategy for Personalized Medicine Personalized Medicine schema

Gomez-Lopez (2017 ) Briefings in Bioinformatics, NLP and Sanidad

Justicia Infraestructura computacional

Infraestructura Vigilancia Sectorial* PlataformaTAy NLP Recursos dominio Open Link Data Link Open lingü í

stica Recursos generales

*Inteligencia competitiva

Nuevos verticales

Plataforma TL componentes

Datos Recursos

Casos de Uso, Plataforma Aplicaciones

Digital Twins view of Future Medicine

IMI Lesson I am learning (1/2)

Symmetry of the problems between companies and academia - information silos in departments or administrations - difficult to adhere to long term projects - limited capacity to adapt to new technologies

Data Management is Key - Federation for searches and execution (beacons and containers) - Encryption (i.e. searches in encrypted data) - Trusted partners as mediators - FAIR / standards to facilitate the internal operations

Thrust (has a lot to do with exploration versus stability)

IMI Lesson I am learning (2/2)

New calls - Data management issues - Analysis platforms integrating OMICS and EHR (others) (there are many call - projects on-going) - AI/ML applications on diverse data sets and applications

New projects should be more challenging and transformative.

Darío Garcia-Gasulla, Ferran Pares, Armand Vilalta

BSC (High Performance AI) Roland Mathis, Matteo Manica group (BSC)

IBM (Computational systems biology) Vera Pancaldi CRCT, Toulouse Ignacio Martin-Subero IDIBAPS, Barcelona Enrique Carrillo de Santa Pau IMDEA, Madrid David Juan Sopeña IBE, Barcelona Felipe Were CNIC, Madrid Biola Javierre IJC, Barcelona www.bsc.es IBM-BSC Deep Learning Center Eric Soriano CFIS UPC, Barcelona

Davide Cirillo, Miguel Ponce de León, Alba Lepore, Miguel Vazquez, Martin Krallinger, Jon SánchezSpanish ,MINECO “Comorbidity networks” eTRANSAFE UlisesBSC, Cortés Barcelona María OPENMINDTED High Rodríguez Performance Computationa Excellerate- Artificial l systems ELIXIR Intelligence biology fellows Alfons_Valenci a

Predicting from the sequence is one of the fundamental problems in .

It is the key to the prediction of the consequences of mutations in human diseases and to drug design

07/03/2019 All CASP results are available (each target / all predictors)

07/03/2019 Data mining: Complex Networks

16M abstracts

● Bio-entities associations ● Functional interpretation ● Translational applications

07/03/2019 1 Data mining: Complex Networks MelanomaMine

AUC = 0.95 (5-fold cross- validation)

Martin Krallinger (Text Mining Unit, BSC) Data mining: Complex Networks

07/03/2019 Data mining: Complex Networks Melanoma Content Analytics graph

myasthenia azathioprine cheilitis nephrectomy elastosis sarcoidosis aids appendectomy breast

lung cancer appendicitis tyrosine kinase inhibitor LOX lobectomy neurosurgery tetanus apc potency dysphagia leukoderma sequela hypophysectomy chest pain SMAD3 diplopia SKI CTGF TGFB1 CAV1 autoimmunity pancreatic cancer dilation and curettage MMP1 MIF TF NOTCH1 ITK IRF8 ascites IGF1 IL10 splenectomy pancreatitis IL6 MST1 CD81 irritation lymphadenitis anemia endometriosis vulvectomy hyperactivity XIAP lymphadenopathy toxoplasmosis gastrectomy hepatoma FAS TP53 teratoma BRCA1 uveitis fertility CD55 vitiligo CDKN2 CCND neurofibroma colon cancer BRCA2 MSH2 melanosis amaurosis enucleation MLH1 CDKN2B hypercalcemia exenteration atypicality LTBP2 retinal scleritis carcinoid PTEN detachment psoriasis iritis hemosiderosis goiter thyroidectomy photocoagulation amputation 1 Data mining: Complex Networks Improving the interpretation of standard pathway analysis by linking ontological annotations