Data Mining the Serous Ovarian Tumor Transcriptome
Total Page:16
File Type:pdf, Size:1020Kb
DATA MINING THE SEROUS OVARIAN TUMOR TRANSCRIPTOME by Timothy Lee Tickle A dissertation submitted to the faculty of The University of North Carolina at Charlotte in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Computational Biology Charlotte 2011 Approved by: Dr. M. Taghi Mostafavi Dr. Christine Richardson Dr. Susan Sell Dr. David L. Tait Dr. Jennifer Weller ii © 2011 Timothy Lee Tickle ALL RIGHTS RESERVED iii ABSTRACT TIMOTHY LEE TICKLE. Data mining the serous ovarian tumor transcriptome. (Under the direction of DR. M. TAGHI MOSTAFAVI AND DR. JENNIFER WALSH WELLER) Ovarian cancer is the most lethal gynecologic cancer in the United States. If caught in early stages, patient survival rate is 94%, late stage survival rates drop to 28%. It is because most cases are caught in late stages that high mortality is seen. Correct diagnosis is dependent on the presence of symptoms: ∼90% of diagnosed ovar- ian cancers are symptomatic. These symptoms tend to be unfocused and not acute. The goal of this project is to develop a transcript-level data set measuring ovarian tumor expression and associated paracrine signaling for later biomarker research. To this end, laser capture microdissection was used with exon based oligonucleotide ar- rays to measure the transcriptome of benign and malignant (Type II) serous ovarian surface epithelial-stromal tumors. In addition to profiling tumor, surrounding stro- mal tissue expression was measured to examine potential paracrine signaling. In total, ∼270 million measurements were performed using 50 microarrays. An initial analysis was performed to measure quality, and to compare our measurements against known ovarian cancer properties as established in the molecular genetics literature. Using ontological annotation and de novo pathway generation methods, major trends were defined in the data set including the following: apical surface and tight junction ac- tivity, mitotic activity, tumor suppression in benign tumors, epithelial-mesenchymal transitioning, known ovarian tumor oncogene activity, and evidence of paracrine sig- naling. A list of differentially expressed transcripts was defined which may be explored as biomarkers. The potential for meaningful future analysis is diverse. This data set will contribute to the capacity of the cancer genetics community to perform high resolution exploration of serous ovarian epithelial-stromal surface tumors, aiding in developing better diagnostics and therapeutics. iv DEDICATION This is dedicated to all the lives touched by cancer and those who fight for their cures. v ACKNOWLEDGEMENTS I would like to thank the Blumenthal Cancer Center, the Department of Gyne- cologic Oncology, the Sonya Hanko Wyatt Ovarian Cancer Research Endowment at Carolinas Medical Center, the Hemby Cancer Research Endowment, and the Health Services Foundation (Carolinas HealthCare System) for their continued support and passion for ovarian cancer research. Thank you to the Microarray Core Facility at the Cannon Research Center (Carolinas Medical Center) for hybridizing the exon ç samples and for access to Partek® Genomics Suite . Thank you to the Bioinformat- ics Research Center at the North Carolina Research Campus for access to software and expertise. Thank you to the GAANN fellowship program, for their support and interest in my academic experience. Thank you to the College of Computing and Informatics and the Department of Bioinformatics and Genomics for their guidance and untold support in navigating my PhD candidacy. vi TABLE OF CONTENTS LIST OF TABLES ix LIST OF FIGURES xii CHAPTER 1: INTRODUCTION 1 1.1 Ovarian Tumor Clinicopathologic Features 1 1.2 Ovarian Tumor Histology 3 1.3 Serous Ovarian Surface Epithelial-Stromal Tumorigenesis 5 1.4 TumorsandtheSurroundingStroma 8 1.5 Ovarian Tumor Epidemiology 10 1.6 Ovarian Cancer Biomarkers 12 1.7 Alternative Splicing and Tumorigenesis 14 1.8 Microarray Technology 15 1.9 Prior Studies 18 1.10 Experimental Design 21 1.11 Summary 22 CHAPTER 2: MATERIALS AND METHODS 24 2.1 Rationale for the Key Features of the Wet Lab Methods 24 2.1.1 Laser Capture Microdissection 24 2.1.2 Microdissection Schema 25 2.2 Wet Lab Methods 25 2.2.1 Patient Sample Harvesting 27 2.2.2 Hemotoxylin and Eosin staining (Histopathology Confirmation) 28 2.2.3 HistoGene Staining 29 2.2.4 Laser Capture Microdissection 29 2.2.5 RNA Extraction 29 2.2.6 Tissue Scraped RNA Extraction 30 2.2.7 RNA Purification 30 vii 2.2.8 RNA Integrity Measurements 31 2.2.9 Complementary DNA Generation 31 2.2.10 SPIA Amplification 31 2.2.11 Nanodrop Amplified Complementary DNA Quantification 32 2.2.12 ST-Complementary DNA Conversion 32 2.2.13 Fragmentation and Labeling 33 2.2.14 Exon Microarray Hybridization Recipe 33 2.3 Analysis Pipeline 33 2.3.1 In Silico Quality Control 33 2.3.2 Initial Data Generation 34 2.3.3 Inferential Statistics: Gene-level 35 2.3.4 Inferential Statistics: Transcript-level 37 2.3.5 Biological Interpretation 38 CHAPTER 3: RESULTS 42 3.1 Study Sample Details 42 3.2 Laser Capture Microdissection 45 3.3 In Silico Quality Control 46 3.3.1 Rudimentary Microarray Processing Quality Control 46 3.3.2 73 Samples 49 3.3.3 Outlier Experimental Samples 53 3.3.4 Sample Factors: Random and Experimental 53 3.4 Gene-level Analysis 59 3.5 Transcript-level Analysis 62 3.6 Biological Contextualization 65 3.6.1 Chromosomal Location 65 3.6.2 Gene Ontology Analysis 68 3.6.3 Pathway Analysis 68 viii 3.6.4 Gene Interaction Analysis 69 CHAPTER 4: DISCUSSION 74 4.1 Notable Comparisons 74 4.2 Control Sample Evaluations 74 4.3 Ontology Enrichment 75 4.3.1 Cell Junctions 77 4.3.2 Spindle and Condensed Chromosomes 87 4.3.3 Terms Related to Cell Projection 88 4.3.4 Terms Related to Plasma Membranes 91 4.3.5 Cytoskeleton 101 4.3.6 Extracellular Region 101 4.4 Major Trends in Ontologies 105 4.5 NrCAM Expression 108 4.6 Chromosomal Location 109 4.7 Deriving a Gene Network 109 4.8 Future Directions 113 BIBLIOGRAPHY 115 APPENDIX A: LCM IMAGES 136 APPENDIX B: TRANSCRIPT / TRANSCRIPT CLUSTER LISTS 147 APPENDIX C: BIOLOGICAL CONTEXT RESULTS DETAIL 168 ix LIST OF TABLES TABLE 2.1: Planned a posteriori contrasts performed after the ANOVA. 37 TABLE 3.1: Sampling model design. 43 TABLE 3.2: 75 samples counts detail. 44 TABLE 3.3: Basic patient demographic and pathology details. 44 TABLE 3.4: Experiment sample count after all levels of quality control. 54 TABLE 3.5: A posteriori contrasts (transcript cluster level). 61 TABLE 3.6: Transcript level a posteriori comparisons and contrasts. 64 TABLE 3.7: Summary of benigh biological annotation. 66 TABLE 3.8: Summary of malignant biological annotation. 67 TABLE 4.1: Summary of unique Cellular Component term counts. 76 TABLE 4.2: Summary of transcripts related to cell junctions (1 of 3). 84 TABLE 4.3: (continued). 85 TABLE 4.4: (continued). 86 TABLE 4.5: Summary of transcripts related to spindles. 89 TABLE 4.6: Summary of transcripts related to cell projections. 92 TABLE 4.7: Summary of transcripts related to plasma membrane (1 of 6). 95 TABLE 4.8: (continued). 96 TABLE 4.9: (continued). 97 TABLE 4.10: (continued). 98 TABLE 4.11: (continued). 99 TABLE 4.12: (continued). 100 TABLE 4.13: Summary of transcripts related to the cytoskeleton (1 of 2). 102 TABLE 4.14: (continued). 103 TABLE 4.15: Summary of transcripts related to the extracellular region. 106 TABLE B.1: Complete benign tumor transcript list (1 of 4). 147 TABLE B.2: (continued). 148 x TABLE B.3: (continued). 149 TABLE B.4: (continued). 150 TABLE B.5: Complete benign tumor transcript-cluster list (1 of 3). 151 TABLE B.6: (continued). 152 TABLE B.7: (continued). 153 TABLE B.8: Complete malignant tumor transcript list (1 of 3). 154 TABLE B.9: (continued). 155 TABLE B.10:(continued). 156 TABLE B.11:Complete malignant tumor transcript-cluster list (1 of 3). 157 TABLE B.12:(continued). 158 TABLE B.13:(continued). 159 TABLE B.14:RefSeq numbers and fold-changes for benign T transcripts (1 of 3).160 TABLE B.15:(continued). 161 TABLE B.16:(continued). 162 TABLE B.17:(continued). 163 TABLE B.18:RefSeq numbers and fold-changes (malignant, tumor) (1 of 3). 164 TABLE B.19:(continued). 165 TABLE B.20:(continued). 166 TABLE B.21:RefSeq numbers and fold-changes for the benign TS transcript. 167 TABLE C.1: DAVID results for the benign transcript lists (1 of 4). 168 TABLE C.2: (continued). 169 TABLE C.3: (continued). 170 TABLE C.4: (continued). 171 TABLE C.5: DAVID results for the malignant transcript lists (1 of 6). 172 TABLE C.6: (continued). 173 TABLE C.7: (continued). 174 TABLE C.8: (continued). 175 xi TABLE C.9: (continued). 176 TABLE C.10:(continued). 177 TABLE C.11:DAVID results for the benign transcript cluster lists (1 of 2). 178 TABLE C.12:(continued). 179 TABLE C.13:DAVID results for the malignant transcript cluster lists (1 of 6). 180 TABLE C.14:(continued). 181 TABLE C.15:(continued). 182 TABLE C.16:(continued). 183 TABLE C.17:(continued). 184 TABLE C.18:(continued). 185 xii LIST OF FIGURES FIGURE 1.1: Human Exon 1.0 ST microarray chip synthesis. 19 FIGURE 2.1: Microdissection schema of benign sample. 26 FIGURE 2.2: Microdissection schema of malignant sample. 27 FIGURE 2.3: Flow chart of wet lab protocols used in sample preparation. 40 FIGURE 2.4: Sample preparation protocol quality checks. 41 FIGURE 3.1: Percentages of sample pairing schemes. 43 FIGURE 3.2: An image progression, capturing normal epithelium.