Development and Application of an Integrative Genomics Approach to Lung Cancer

by RAJAGOPAL CHARI
B.Sc., University of British Columbia, 2001
B.Sc., University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES (Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
June 2010
© Rajagopal Chari, 2010

Abstract

Lung cancer has the highest mortality rate amongst all diagnosed malignancies, with adenocarcinoma (AC) being the most commonly diagnosed subtype of this disease in North America. The dismal survival statistics of lung cancer patients are largely due to the detection of the disease at an advanced stage and, to a lesser extent, the limited efficacy of current front-line treatments. Genomic approaches, namely gene expression analysis, have provided tremendous insight into lung cancer. While many gene expression changes have been identified, most are likely reactive to changes that have a primary role in cancer development. One feature that can distinguish primary from reactive changes is the presence of a concordant DNA-level alteration. Many well-known genes involved in cancer, such as TP53 and CDKN2A, have been shown to be affected by multiple mechanisms of alteration, such as somatic mutation in or loss of DNA sequence. For a given gene, one tumor may be affected by one mechanism while another tumor may be affected by a different mechanism. Although this level of multi-dimensional analysis has been performed for specific genes, such analysis has not been done at the genome-wide level. This thesis highlights the development and application of a multi-dimensional genetic and epigenetic approach to identify frequently aberrant genes and pathways in lung AC.
I present, first, the design and implementation of the system for integrative genomic multi-dimensional analysis of cancer genomes, epigenomes and transcriptomes (SIGMA2). Next, analyzing a multi-dimensional dataset generated from ten lung AC specimens with non-malignant controls, I identified novel genes and pathways that would have been missed if a non-integrative approach were used. Finally, examining genes involved with EGFR signaling, I identified a gene, signal regulatory protein alpha (SIRPA), which had not been previously shown to be associated with lung cancer. Taken together, these findings demonstrate the power of a multi-dimensional approach to identify important genes and pathways in lung cancer. Moreover, identifying key genes using a multi-dimensional approach on a small sample set suggests that the need for large datasets may be circumvented by using a more comprehensive approach on a smaller set of samples.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Co-Authorship Statement
Chapter 1: Introduction
1.1 Lung cancer
1.2 Genomic profiling of lung cancer
1.2.1 Gene expression analysis
1.2.2 DNA copy number analysis
1.2.3 Loss of heterozygosity (LOH) and allelic imbalance
1.3 Somatic mutations in lung cancer
1.4 Epigenetic alterations in lung cancer
1.4.1 DNA methylation
1.5 Current level of integrative analysis
1.6 Need for an integrative approach to study lung cancer
1.7 Bioinformatic tools for genomic analysis
1.8 Thesis theme
1.9 Objectives and hypothesis
1.10 Specific aims and outline of thesis
1.11 Description of high throughput data in this thesis
1.12 Other relevant contributions not included as chapters in this thesis
1.12.1 Development of tools for genomic analysis
1.12.2 Baseline gene expression in non-malignant lung tissue
1.12.3 Differential gene expression analysis in lung cancer
1.12.4 Integration of gene dosage and gene expression in lung cancer
1.13 References
Chapter 2: SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes
2.1 Introduction
2.2 Implementation
2.3 Results and discussion
2.3.1 Look and feel of SIGMA2
2.3.2 Description of application scope and functionality
2.3.3 Approach to integration between array platforms and assays
2.3.4 Format requirements of input data
2.3.5 Description of user interface
2.3.6 Analysis of data from a single assay type
2.3.7 Analysis of data from multiple assays in a given 'omics dimension
2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene expression
2.3.9 Group comparison analysis - single 'omics dimension
2.3.10 Group comparison analysis - integrating multiple 'omics dimensions
2.3.11 Multi-dimensional analysis of a breast cancer genome
2.3.12 Exporting data and results
2.4 Conclusions
2.5 Availability and requirements
2.6 References
Chapter 3: An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer
3.1 Background
3.2 Methods
