Modeling Biological Systems and Analyzing Large-Scale Data Sets

Ilya Shmulevich

TCGA Data Types (TCGA Research Network)

Heterogeneous data

Clinical variables contributing to tumor aggressiveness

Clinical variable: less aggressive vs. more aggressive
• Distant metastasis: M0 (no) vs. M1 (yes)
• Tumor stage: early (I-II) vs. late (III-IV)
• Fraction of lymph nodes positive by H&E: 0% vs. 100%
• Lymphatic invasion present: no vs. yes
• Vascular invasion present: no vs. yes
• Histological type: mucinous vs. non-mucinous

Vesteinn Thorsson. Nature 487, 330-337, 2012.

FBXW7

Vesteinn Thorsson, Dick Kreisberg

Web-based Apps

http://explorer.cancerregulome.org

The Regulome Explorer is an interactive web application that lets the user explore multivariate relationships in data.


Richard Kreisberg, Jake Lin, Timo Erkkilä, Sheila Reynolds

RF-ACE is a multivariate statistical inference method based on ensembles of decision trees. It seeks to uncover significant associations between features in the input data matrix.

Timo Erkkilä, Sheila Reynolds, Kari Torkkola

RF-ACE has high predictive power and is resistant to overfitting.

http://code.google.com/p/rf-ace/

Computational challenges:
• mixed data types: continuous, discrete, and categorical
• tens of thousands of features x tens or hundreds of samples
• non-linear, noisy, and multivariate relationships
• correlated features
• missing data

RF-ACE features:
• handles mixed variable types
• does not require imputation of missing values
• random subsampling rather than combinatorial search
• statistical testing removes redundant features
• p-value for each candidate predictor
• fast, portable implementation in C++
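Two of the points above, a p-value for each candidate predictor and no imputation of missing values, can be illustrated with a plain permutation test. The sketch below is not RF-ACE: it swaps the tree-ensemble importance for an absolute Pearson correlation, and the function names, the synthetic data, and the parameter values are all assumptions made for illustration.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation over pairs where neither value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 3:
        return 0.0
    px = [p[0] for p in pairs]
    py = [p[1] for p in pairs]
    mx, my = statistics.fmean(px), statistics.fmean(py)
    sxx = sum((x - mx) ** 2 for x in px)
    syy = sum((y - my) ** 2 for y in py)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(feature, target, n_permutations=500, seed=0):
    """Empirical p-value: how often a shuffled copy of the feature scores
    at least as high as the real feature. Missing values are skipped, not imputed."""
    rng = random.Random(seed)
    observed = abs_correlation(feature, target)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs_correlation(shuffled, target) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic data (hypothetical): one informative feature, one noise feature.
data_rng = random.Random(1)
signal = [data_rng.gauss(0, 1) for _ in range(60)]
noise = [data_rng.gauss(0, 1) for _ in range(60)]
target = [2 * s + data_rng.gauss(0, 0.5) for s in signal]
signal[5] = None   # missing values are tolerated
signal[17] = None

p_signal = permutation_p_value(signal, target)
p_noise = permutation_p_value(noise, target)
print(f"signal p={p_signal:.4f}, noise p={p_noise:.4f}")
```

Candidate predictors whose p-value stays above a chosen threshold would be dropped, which is one simple way to read "statistical testing removes redundant features".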

Timo Erkkilä

Google I/O keynote presentation, June 27, 2012: 600,000 cores

A multilevel pan-cancer view: from genes to hallmarks

Theo Knijnenburg

Mutational investment

Billions of Associations!

Motivating questions

• Repurposing
  – Which existing cancer drugs may be therapeutic in which other cancers?
  – Which inhibitors with no current cancer indications may be therapeutic in certain cancers?

• Opportunity
  – TCGA primary tumor data may serve as the basis for guided investigation of these open questions

Guiding principle

• The direct protein target for most inhibitors is not the sensitizing aberrated protein itself
  – e.g., AKT1 inhibitors are most effective against cell lines with PTEN mutations

Song et al. (2012)

Proof of concept: Associations between drug targets (e.g., AKT1) and sensitizing aberrations (e.g., PTEN) are also evident in TCGA

PTEN mutations in UCEC

AKT1 protein expression related to PTEN mutation in UCEC

[Figure axis: PTEN mutation status]

Association: drug target : sensitizing aberration pairs

Approach

• Create large heterogeneous graph of associations from TCGA data, literature, databases, …
  – [Billions of edges, Terabytes of data]

• Query on Cray YarcData uRiKA graph analytics appliance
  – No locality of reference, graphs hard to partition
  – [Minutes rather than hours per query]

• Identify aberrated gene → target → drug relationships for drugs with and without known efficacy in cancer

[Figure: genomic aberration (TP53 mutation) → synthetic lethal protein targets (ATR, CHEK1, PAK3, PLK1, SGK2, WEE1, …) → candidate compounds]
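The aberrated gene → target → drug chain can be sketched as a two-hop walk over adjacency lists, with the resulting paths split by whether the drug already has a cancer indication. This is a toy illustration, not the appliance-scale query: the edge lists are assumptions assembled from gene and drug names that appear on the slides, and the set of approved cancer drugs is a hypothetical stand-in for a curated list.

```python
from collections import defaultdict

# Illustrative edges only; which pairs are linked is an assumption.
ABERRATION_TO_TARGETS = {
    "PTEN mutation": ["PIK3R1", "ESR1"],
    "TP53 mutation": ["ABCG2"],
}
TARGET_TO_DRUGS = {
    "PIK3R1": ["Wortmannin"],
    "ESR1": ["Tamoxifen"],
    "ABCG2": ["Nelfinavir"],
}
# Hypothetical stand-in for a curated list of approved cancer drugs.
APPROVED_CANCER_DRUGS = {"Tamoxifen"}


def candidate_paths(aberration):
    """Yield (aberration, target, drug) paths two hops from the aberration."""
    for target in ABERRATION_TO_TARGETS.get(aberration, []):
        for drug in TARGET_TO_DRUGS.get(target, []):
            yield aberration, target, drug


def split_by_indication(aberration):
    """Group candidate paths by whether the drug is already a cancer drug."""
    groups = defaultdict(list)
    for path in candidate_paths(aberration):
        drug = path[2]
        if drug in APPROVED_CANCER_DRUGS:
            groups["existing cancer drug"].append(path)
        else:
            groups["no current cancer indication"].append(path)
    return dict(groups)


pten_groups = split_by_indication("PTEN mutation")
tp53_groups = split_by_indication("TP53 mutation")
for label, paths in {**pten_groups, **tp53_groups}.items():
    print(label, paths)
```

The two groups correspond to the two repurposing questions: existing cancer drugs that might work in other cancers, and inhibitors with no current cancer indication.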

Integrating multiple data sources into a (big) graph

[Figure labels: Genomic aberrations, Therapeutic targets, Candidate inhibitors, RNAi]

Graph Data Model: Resource Description Framework (RDF)

[Figure: example RDF graph. Nodes: <http://www.systemsbiology.net/brca2>, <http://www.systemsbiology.net/tp53>, <http://www.systemsbiology.net/biotin>, <http://www.systemsbiology.net/Drug>, and the blank nodes _:blankGeneNMD, _:blankDrugGeneNMD, _:blankPairwise. Edge labels and values: gnab, gexp, 1, 0.2515, 1.1628, -0.511]

Example SPARQL Query
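To make the idea of querying an RDF graph concrete, here is a minimal sketch of how a SPARQL-style basic graph pattern is matched against a set of triples. It is written in plain Python rather than SPARQL, and it is not the query from the talk: the predicate names (`associatedWith`, `binds`), the lowercase gene and drug URIs, and the tiny triple set are all assumptions made for illustration. Only the `http://www.systemsbiology.net/` namespace and the `Drug` class appear on the slide.

```python
SB = "http://www.systemsbiology.net/"

# Hypothetical triples (subject, predicate, object).
TRIPLES = [
    (SB + "abcg2", "associatedWith", SB + "tp53"),
    (SB + "nelfinavir", "binds", SB + "abcg2"),
    (SB + "nelfinavir", "rdf:type", SB + "Drug"),
    (SB + "pik3r1", "associatedWith", SB + "pten"),
    (SB + "wortmannin", "binds", SB + "pik3r1"),
    (SB + "wortmannin", "rdf:type", SB + "Drug"),
]


def match_pattern(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with '?' are variables. Returns the extended bindings,
    or None if the triple does not fit the pattern."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Join the patterns left to right, like a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# "Which drugs bind a target that is associated with TP53?"
results = query(
    [
        ("?target", "associatedWith", SB + "tp53"),
        ("?drug", "binds", "?target"),
        ("?drug", "rdf:type", SB + "Drug"),
    ],
    TRIPLES,
)
for row in results:
    print(row["?drug"], "via", row["?target"])
```

The nested loop over all triples is fine for a handful of facts; at billions of edges the same join has no locality of reference, which is the motivation for the dedicated graph appliance mentioned above.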

[Figure: query schematic with the labels Literature, Seed Gene List, TCGA Associated Genes, Small Molecules, Cancer Type, Database, cancer.gov approved drugs]

Example Result: PTEN associations in UCEC

[Figure: network linking genomic aberrations, candidate targets, and candidate inhibitors]

Genomic aberrations: PTEN

Candidate targets: ASRGL1, ESR1, GLYATL2, PLIN3, HADH, NT5E, PIK3R3, GABRE, PGR, FBP1, SMPD3, GRIN1, PIK3R1, RARG, AADAT, CACNA2D2, SST, SRD5A1, B4GALT1, ADRA1B, KCNJ12, RYR1, SLC6A14, RETSAT, FAAH, SRR, NQO1, CEACAM1, KCNK6, ACADS, CRAT, ELOVL4, FOLH1, ALDH1A3, SORD, ASS1, NADSYN1, PRNP, NDUFA11, KCNH2, CPS1, SLC22A5, HMGCR, ALDH18A1, PARS2, GLS, B4GALT4, ACACB, SLC38A3, GSR, OAZ3, TCN1, SLC1A1, SMPD4, BHMT2, HSD17B4, GRIK5, GLDC, PPIB, PIPOX, ADA, SCN3B, S100A1, PLG, SLC1A4, CBS, GLRB, ACVR1B, SLC6A2

Candidate inhibitors: Acepromazine, Acitretin, Adapalene, Adenine, Adenosine monophosphate, Adenosine triphosphate, Adinazolam, Alitretinoin, Allylestrenol, Alpha-Linolenic Acid, Alprazolam, Alteplase, Aminocaproic Acid, Amsacrine, Anistreplase, Aprotinin, Arcitumomab, Atorvastatin, Bepridil, Biotin, Bromazepam, Capromab, Carglumic acid, Carmustine, Chlordiazepoxide, Chlorotrianisene, Chlorpheniramine, Cinolazepam, Clobazam, Clomifene, Clonazepam, Clorazepate, Clotiazepam, Conjugated Estrogens, Cysteamine, Danazol, Dantrolene, Debrisoquin, Desogestrel, Desvenlafaxine, Diazepam, Dicumarol, Dienestrol, Diethylpropion, Diethylstilbestrol, Dipyridamole, Drospirenone, Droxidopa, Duloxetine, Dutasteride, Dydrogesterone, Ephedra, Epinephrine, Escitalopram, Estazolam, Estradiol, Estramustine, Estriol, Estrone, Estropipate, Ethinyl Estradiol, Ethynodiol Diacetate, Etonogestrel, Finasteride, Fludiazepam, Fluoxymesterone, Flurazepam, Fluticasone Propionate, Fluvastatin, Fulvestrant, Galsulfase, Ginkgo biloba, Glutathione, Glycine, Guanadrel Sulfate, Guanethidine, Halazepam, Halofantrine, Hydroxocobalamin, Idursulfase, Isoproterenol, Ketazolam, L-Alanine, L-Arginine, L-Asparagine, L-Aspartic Acid, L-Carnitine, L-Citrulline, L-Cysteine, Levonordefrin, Levonorgestrel, L-Glutamic Acid, L-Histidine, Lindane, L-Methionine, L-Ornithine, L-Proline, L-Serine, Medroxyprogesterone, Megestrol, Melatonin, Menadione, Meperidine, Mestranol, Methotrimeprazine, Miconazole, Midazolam, Milnacipran, N-Acetyl-D-glucosamine, NADH, Naloxone, Nitrazepam, Norethindrone, Norgestimate, Olopatadine, Oxazepam, Paroxetine, Pentostatin, Pentoxifylline, Phosphatidylserine, Pravastatin, Prazepam, Promazine, Propericiazine, Propiomazine, Protriptyline, Pyridoxine, Quazepam, Quinestrol, Raloxifene, Reteplase, Rosuvastatin, S-Adenosylmethionine, Streptokinase, Tamoxifen, Tazarotene, Temazepam, Tenecteplase, Thiopental, Thioproperazine, Toremifene, Tranexamic Acid, Tretinoin, Triazolam, Trilostane, Urokinase, Venlafaxine, Vitamin A

Example Result: PTEN associations in UCEC

Genomic aberration: PTEN
Candidate targets: PIK3R1/PIK3CA
Candidate inhibitor: Wortmannin

[Figure labels: PTEN mutation status; PDB id 3hhm]

Repurposing existing cancer drugs in other cancers

[Figure: genomic aberrations, candidate targets, candidate inhibitors. Labels: Existing cancer indication, Target, Cancer, Drug A, New cancer indication]

Example Result

• TP53 is frequently mutated in most tumor types
• ABCG2, also known as Breast Cancer Resistance Protein (BCRP), is associated with TP53 mutation in TCGA breast cancer data
• Nelfinavir, an HIV protease inhibitor, also binds ABCG2 and many other proteins
• High-throughput cell line screening of breast cancer cells recently identified Nelfinavir as a selective inhibitor. "It can be brought to HER2-breast cancer treatment trials with the same dosage regimen as that used among HIV patients." [Shim et al. JNCI 2012]

Understanding behavior of massive multicellular systems: BioCellion

Source: http://www.sjrcd.org/soilhealth/soilagg.html
Source: http://www.theregister.co.uk
Source: EMBO Rep. 2004 May; 5(5): 470–476.

Ductal Carcinoma model: Nicholas Flann, Utah State Univ.

Source: http://www.webmd.com

Acknowledgments

Brady Bernard, Ryan Bressler, Andrea Eakin, Timo Erkkilä, Lisa Iype, Seunghwa Kang, Theo Knijnenburg, Roger Kramer, Richard Kreisberg, Kalle Leinonen, Jake Lin, Yuexin Liu, Michael Miller, Sheila Reynolds, Hector Rovira, Vesteinn Thorsson, Da Yang, Wei Zhang