Computational Integration of Genome-Wide Observational And

Computational integration of genome-wide observational and functional data in cancer Felix Sanchez Garcia Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2015 © 2015 Felix Sanchez Garcia All rights reserved ABSTRACT Computational integration of genome-wide observational and functional data in cancer Felix Sanchez Garcia The emergence of high throughput technologies is enabling the characterization of cancer genomes at unprecedented resolution and scale. However, such data suffer from the typical limitations of observational studies, which are frequently challenged by their inability to differentiate between causality and correlation. Recently, several datasets of genome-wide functional assays performed on tumor cell lines have become available. Given the ability of these assays to interrogate cancer genomes for the function of each individual gene, these data can provide vital cues to identify causal events and, with them, novel drug targets. Unfortunately, current analytical methods have been unable to overcome the challenges posed by these assays, which include poor signal to noise ratio and wide-spread off-target effects. Given the largely orthogonal strengths and weaknesses of descriptive analysis of genetic and genomic observational data from cancer genomes and genome-wide functional screening, I hypothesized that integrating the two data types into unified computational models would significantly increase the power of the biological analysis. In this dissertation I use integrative approaches to tackle two crucial problems in cancer research: the identification of driver genes and the discovery of tumor lethalities. I use the resulting methods to study breast cancer, the second most common form of this disease. The first part of the dissertation focuses on the analysis of regions of copy number alteration for the identification of driver genes. I first describe how a simple integrative method enabled the identification of BIN3, a novel driver of metastasis in breast cancer. I then describe Helios, an unsupervised method for the identification of driver genes in regions of SCNA that integrates different data sources into a single probabilistic score. Applying Helios to breast cancer data identified a set of candidate drivers highly enriched with known drivers (p-value < e-14). In vitro validation of 12 novel candidates predicted by Helios found 10 conferred enhanced anchorage independent growth, demonstrating Helios’s exquisite sensitivity and specificity. I further provide an extensive characterization of RSF-1, a driver identified by Helios whose amplification correlates with poor prognosis, which displayed increased tumorigenesis and metastasis in mouse models. The second part of this dissertation addresses the problem of identifying tumor vulnerabilities using genome-wide shRNA screens across tumor cell lines. I approach this endeavor using a novel integrative method that employs different biomarkers of cellular state to facilitate the identification of clusters of hairpins with similar phenotype. When applied to breast cancer data, the method not only recapitulates the main subtypes and lethalities associated to this malignancy, but also identifies several novel putative lethalities. Taken together, this research demonstrates the importance of the computational integration of genome-wide functional and observational data in cancer research, providing novel approaches that yield important insights into the biology of the disease. Table of Contents List of Figures ........................................................................................................................................................... iii List of Tables ............................................................................................................................................................. vi List of Supplementary Tables ........................................................................................................................... vii Chapter I - Introduction ........................................................................................................................ 1 The omics era ............................................................................................................................................................. 3 The RNAi revolution ................................................................................................................................................ 4 Integrative approaches .......................................................................................................................................... 8 Identifying driver alterations .............................................................................................................................. 9 Non-oncogenic addictions.................................................................................................................................. 15 Chapter II – Detection of regions of copy number alterations ............................................. 18 Introduction ............................................................................................................................................................. 18 Overview of computational approaches ...................................................................................................... 18 ISAR ............................................................................................................................................................................. 22 ISAR-DEL ................................................................................................................................................................... 27 Chapter III – Identification of driver genes in regions of copy number alteration ....... 31 Introduction ............................................................................................................................................................. 31 Methods ..................................................................................................................................................................... 35 Modeling copy number........................................................................................................................................ 36 Modeling additional sources of information .............................................................................................. 39 Features used in the Helios algorithm .......................................................................................................... 39 Combining the features into a unified framework ................................................................................... 43 Model learning ........................................................................................................................................................ 44 Initializing a starting point ................................................................................................................................ 45 Chapter IV – Computational analysis of the landscape of amplifications in Breast Cancer ....................................................................................................................................................... 47 Introduction ............................................................................................................................................................. 47 Results ........................................................................................................................................................................ 48 Methods ..................................................................................................................................................................... 59 Chapter V – RSF-1, a novel driver of Invasion and Metastasis in Breast Cancer ............ 67 RSF-1 Promotes Colony Growth In Vitro ..................................................................................................... 67 RSF-1 Promotes Growth in Xenograft Models ........................................................................................... 68 RSF-1 Promotes invasion in Xenograft Models ......................................................................................... 70 RSF-1 and CCND1 significantly co-occur and are independent predictors of survival ............ 72 Methods ..................................................................................................................................................................... 75 Chapter VI – Computational analysis of the landscape of deletions of Breast Cancer for the identification of drivers of resistance to anoikis .............................................................. 79 ISAR-DEL analysis ................................................................................................................................................. 80 i Integration with Anoikis screen ...................................................................................................................... 83 Chapter VII – BIN3 is a novel 8p21 tumor suppressor gene that regulates the attachment checkpoint in epithelial cells ................................................................................... 87 Silencing of BIN3 promotes tumorigenesis ...............................................................................................

Computational Integration of Genome-Wide Observational And

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

9, 2015 Glasgow, Scotland, United Kingdom Abstracts

Computational and Experimental Approaches for Evaluating the Genetic Basis of Mitochondrial Disorders

Literature Mining Sustains and Enhances Knowledge Discovery from Omic Studies

Complete Transcriptome Profiling of Normal and Age-Related Macular

Insights Into MYC Biology Through Investigation of Synthetic Lethal Interactions with MYC Deregulation

Identification of Genomic Targets of Krüppel-Like Factor 9 in Mouse Hippocampal

Agricultural University of Athens

WO 2013/095793 Al 27 June 2013 (27.06.2013) W P O P C T

Supplementary Table 3. Genes Specifically Regulated by Zol (Non-Significant for Fluva)

Coexpression Networks Based on Natural Variation in Human Gene Expression at Baseline and Under Stress

Ryan D. Hernandez