A Graph-Theoretic Approach to Model Genomic Data and Identify Biological Modules Associated with Cancer Outcomes

Deanna Petrochilos

A dissertation presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Washington
2013

Reading Committee:
Neil Abernethy, Chair
John Gennari
Ali Shojaie

Program Authorized to Offer Degree: Biomedical Informatics and Health Education

© Copyright 2013 Deanna Petrochilos

University of Washington

Abstract

Using Graph-Based Methods to Integrate and Analyze Cancer Genomic Data

Deanna Petrochilos

Chair of the Supervisory Committee:
Assistant Professor Neil Abernethy
Biomedical Informatics and Health Education

Studies of the genetic basis of complex disease present statistical and methodological challenges in the discovery of reliable and high-confidence genes that reveal biological phenomena underlying the etiology of disease or gene signatures prognostic of disease outcomes. This dissertation examines the capacity of graph-theoretical methods to model and analyze genomic information and thus facilitate using prior knowledge to create a more discrete and functionally relevant feature space. To assess the statistical and computational value of graph-based algorithms in genomic studies of cancer onset and progression, I apply a random walk graph algorithm in a weighted interaction network. I merge high-throughput co-expression and curated interaction data to search for biological modules associated with key cancer processes and evaluate significant modules in terms of both their predictive value and functional relevance. This approach identifies interactions among genes involved in proliferation, apoptosis, angiogenesis, immune evasion, metastasis, and energy metabolism pathways, and generates hypotheses for future cancer biology studies.
Based on the results of this work, I conclude that graph-based approaches are powerful tools for the integration and analysis of complex molecular relationships that reveal significant coordinated activity of genomic features where previous statistical and analytical methods have been limited.

TABLE OF CONTENTS

Table of Figures
Table of Tables
Glossary
Acknowledgements

Chapter 1: Introduction
  1.1: Challenges in Large Scale Genomic Studies
  1.2: Research Objectives
    1.2.1: Assessing Network Characteristics of Cancer-Associated Genes in Metabolic and Signaling Networks
    1.2.2: Using Weighted Random Walks to Identify Cancer-Associated Modules in Expression Data
    1.2.3: Evaluation of the Use of Weighted Random Walks and Expression Data to Identify Cancer-Associated Modules
    1.2.4: Analysis of microRNA Data in Random Walk-Generated Expression Modules
    1.2.5: Evaluation of Analyzing miRNA Data in Random Walk-Generated Expression Modules
  1.3: Contributions
  1.4: Dissertation Overview

Chapter 2: Network Biology and the Cancer Genome
  2.1: Introduction
  2.2: Overview of Biological Pathways and Interaction Networks
  2.3: Network Features and Definitions
  2.4: Graph and Pathway-Based Approaches Using Prior Evidence in Genome Studies
  2.5: Graph-based Random Walks in Gene Prioritization and Module Discovery

Chapter 3: Assessing Network Characteristics of Cancer Associated Genes in Metabolic and Signaling Networks
  3.1: Introduction
  3.2: Methods
    3.2.1: Overview
    3.2.2: Network Construction
    3.2.3: Definition of Cancer Genes
    3.2.4: Network Features
    3.2.5: Statistical Analysis
    3.2.6: Community Analysis
  3.3: Results and Discussion
    3.3.1: Global Network Statistics
    3.3.2: Feature Prediction
    3.3.3: Community Analysis
  3.4: Conclusion

Chapter 4: Using Random Walks to Identify Cancer-Associated Modules in Expression Data
  4.1: Introduction
  4.2: Methods
    4.2.1: Overview
    4.2.2: Gene Expression Data
    4.2.3: Network Construction
    4.2.4: Weights and Significance Scoring
    4.2.5: Definition of Cancer Genes
    4.2.6: Community Analysis
  4.3: Results and Discussion
    4.3.1: Functional Annotation
    4.3.2: Breast Cancer
    4.3.3: Hepatocellular Carcinoma
    4.3.4: Colorectal Cancer
    4.3.5: Evaluation: Overlap with GSEA
    4.3.6: Evaluation: Comparison with jActiveModules and Matisse
  4.4: Conclusion

Chapter 5: Analysis of miRNA Data in Random Walk-Generated Expression Modules
  5.1: Introduction
  5.2: Methods
    5.2.1: Overview
    5.2.2: Gene Expression Data
    5.2.3: MiRNA-mRNA Matching
    5.2.4: Network Construction
    5.2.5: Weighting Scheme
    5.2.6: Community Analysis

