Development and Application of an Integrative Genomics Approach to Lung Cancer
DEVELOPMENT AND APPLICATION OF AN INTEGRATIVE GENOMICS APPROACH TO LUNG CANCER

by

RAJAGOPAL CHARI

B.Sc., University of British Columbia, 2001
B.Sc., University of British Columbia, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES
(Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

June 2010

© Rajagopal Chari, 2010

Abstract

Lung cancer has the highest mortality rate amongst all diagnosed malignancies, with adenocarcinoma (AC) being the most commonly diagnosed subtype of this disease in North America. The dismal survival statistics of lung cancer patients are largely due to the detection of the disease at an advanced stage and, to a lesser extent, the limited efficacy of current front-line treatments. Genomic approaches, namely gene expression analysis, have provided tremendous insight into lung cancer. While many gene expression changes have been identified, most are likely reactive to the changes that have a primary role in cancer development. One feature that can distinguish primary from reactive changes is the presence of a concordant DNA-level alteration. Many well-known genes involved in cancer, such as TP53 and CDKN2A, have been shown to be affected by multiple mechanisms of alteration, such as somatic mutation in, or loss of, DNA sequence. For a given gene, one tumor may be affected by one mechanism while another tumor may be affected by a different mechanism. Although this level of multi-dimensional analysis has been performed for specific genes, such analysis has not been done at the genome-wide level. This thesis highlights the development and application of a multi-dimensional genetic and epigenetic approach to identify frequently aberrant genes and pathways in lung AC.
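The point that different tumors may disrupt the same gene through different mechanisms can be illustrated with a minimal sketch. It assumes hypothetical toy data for a single gene in four tumors; the dimension names (`copy_loss`, `hypermethylation`, `mutation`) and the function `disruption_frequency` are illustrative only and are not taken from the thesis or from SIGMA2. The sketch compares how often the gene appears disrupted when each dimension is examined alone with how often it appears disrupted when any dimension counts.

```python
from typing import Dict, List

# One tumor's alteration status for a single gene, keyed by dimension name.
# All names and values here are hypothetical and purely illustrative.
Sample = Dict[str, bool]


def disruption_frequency(samples: List[Sample], dimensions: List[str]) -> float:
    """Return the fraction of samples altered in at least one given dimension."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if any(s.get(d, False) for d in dimensions))
    return hits / len(samples)


# Toy data: three tumors disrupt the gene, each by a different mechanism.
tumors: List[Sample] = [
    {"copy_loss": True, "hypermethylation": False, "mutation": False},
    {"copy_loss": False, "hypermethylation": True, "mutation": False},
    {"copy_loss": False, "hypermethylation": False, "mutation": True},
    {"copy_loss": False, "hypermethylation": False, "mutation": False},
]

dimensions = ["copy_loss", "hypermethylation", "mutation"]

# Frequency seen when each dimension is analyzed on its own.
single = {d: disruption_frequency(tumors, [d]) for d in dimensions}

# Frequency seen when an alteration in any dimension counts as disruption.
combined = disruption_frequency(tumors, dimensions)

print(single)
print(combined)
```

In this toy case each single-dimension view reports the gene as altered in one of four tumors, whereas the combined view reports three of four, which is the kind of difference a multi-dimensional analysis is intended to expose.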
I present, first, the design and implementation of the system for integrative genomic multi-dimensional analysis of cancer genomes, epigenomes and transcriptomes (SIGMA2). Next, analyzing a multi-dimensional dataset generated from ten lung AC specimens with non-malignant controls, I identified novel genes and pathways that would have been missed if a non-integrative approach were used. Finally, examining genes involved with EGFR signaling, I identified a gene, signal regulatory protein alpha (SIRPA), which had not previously been shown to be associated with lung cancer. Taken together, these findings demonstrate the power of a multi-dimensional approach to identify important genes and pathways in lung cancer. Moreover, identifying key genes using a multi-dimensional approach on a small sample set suggests that the need for large datasets may be circumvented by using a more comprehensive approach on a smaller set of samples.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Co-Authorship Statement

Chapter 1: Introduction
  1.1 Lung cancer
  1.2 Genomic profiling of lung cancer
    1.2.1 Gene expression analysis
    1.2.2 DNA copy number analysis
    1.2.3 Loss of heterozygosity (LOH) and allelic imbalance
  1.3 Somatic mutations in lung cancer
  1.4 Epigenetic alterations in lung cancer
    1.4.1 DNA methylation
  1.5 Current level of integrative analysis
  1.6 Need for an integrative approach to study lung cancer
  1.7 Bioinformatic tools for genomic analysis
  1.8 Thesis theme
  1.9 Objectives and hypothesis
  1.10 Specific aims and outline of thesis
  1.11 Description of high throughput data in this thesis
  1.12 Other relevant contributions not included as chapters in this thesis
    1.12.1 Development of tools for genomic analysis
    1.12.2 Baseline gene expression in non-malignant lung tissue
    1.12.3 Differential gene expression analysis in lung cancer
    1.12.4 Integration of gene dosage and gene expression in lung cancer
  1.13 References

Chapter 2: SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes
  2.1 Introduction
  2.2 Implementation
  2.3 Results and discussion
    2.3.1 Look and feel of SIGMA2
    2.3.2 Description of application scope and functionality
    2.3.3 Approach to integration between array platforms and assays
    2.3.4 Format requirements of input data
    2.3.5 Description of user interface
    2.3.6 Analysis of data from a single assay type
    2.3.7 Analysis of data from multiple assays in a given 'omics dimension
    2.3.8 Combinatorial analysis of multiple 'omics dimensions - gene dosage and gene expression
    2.3.9 Group comparison analysis - single 'omics dimension
    2.3.10 Group comparison analysis - integrating multiple 'omics dimensions
    2.3.11 Multi-dimensional analysis of a breast cancer genome
    2.3.12 Exporting data and results
  2.4 Conclusions
  2.5 Availability and requirements
  2.6 References

Chapter 3: An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer
  3.1 Background
  3.2 Methods