High-Throughput Bioinformatics Approaches to Understand Gene Expression Regulation in Head and Neck Tumors
Total Page:16
File Type:pdf, Size:1020Kb
High-throughput bioinformatics approaches to understand gene expression regulation in head and neck tumors by Yanxiao Zhang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Bioinformatics) in The University of Michigan 2016 Doctoral Committee: Associate Professor Maureen A. Sartor, Chair Professor Thomas E. Carey Assistant Professor Hui Jiang Professor Ronald J. Koenig Associate Professor Laura M. Rozek Professor Kerby A. Shedden c Yanxiao Zhang 2016 All Rights Reserved I dedicate this thesis to my family. For their unfailing love, understanding and support. ii ACKNOWLEDGEMENTS I would like to express my gratitude to Dr. Maureen Sartor for her guidance in my research and career development. She is a great mentor. She patiently taught me when I started new in this field, granted me freedom to explore and helped me out when I got lost. Her dedication to work, enthusiasm in teaching, mentoring and communicating science have inspired me to feel the excite- ment of research beyond novel scientific discoveries. I’m also grateful to have an interdisciplinary committee. Their feedback on my research progress and presentation skills is very valuable. In particular, I would like to thank Dr. Thomas Carey and Dr. Laura Rozek for insightful discussions on the biology of head and neck cancers and human papillomavirus, Dr. Ronald Koenig for expert knowledge on thyroid cancers, Dr. Hui Jiang and Dr. Kerby Shedden for feedback on the statistics part of my thesis. I would like to thank all the past and current members of Sartor lab for making the lab such a lovely place to stay and work in. And for enormous help I received from them for my projects. In particular, I’d like to thank Yu-Hsuan Lin for developing the prototype pipeline for PePr, Lada Koneva, Shama Virani and Pelle Hall for identifying viral integration and RNA-seq mutation in the head and neck cancer project, and Chee Lee for the pathway analysis in the thyroid cancer project. I would also like to thank all of my friends here at University of Michigan for their company, support and inspirations. Ann Arbor is like my second hometown with a lot of cherishable memories. Finally, I’m very grateful to my parents for encouraging and supporting me to chase my own dreams, to become who I am today. And to my dear Yuqi, who has always been a truthful friend and a loving company. Thank you so much for always having faith in me. iii TABLE OF CONTENTS DEDICATION ::::::::::::::::::::::::::::::::::::::: ii ACKNOWLEDGEMENTS :::::::::::::::::::::::::::::::: iii LIST OF FIGURES :::::::::::::::::::::::::::::::::::: vii LIST OF TABLES ::::::::::::::::::::::::::::::::::::: ix LIST OF ABBREVIATIONS ::::::::::::::::::::::::::::::: x ABSTRACT ::::::::::::::::::::::::::::::::::::::::: xii CHAPTER I. Introduction .....................................1 1.1 Introduction.................................1 1.2 Background.................................3 1.2.1 High-throughput technologies..................3 1.2.2 Genomic aberrations and instability in cancer..........5 1.2.3 Epigenetic dysregulation in cancer: DNA methylation and his- tone modifications.........................8 1.2.4 Discovery of molecular cancer subtypes: distinct etiology and prognosis..............................9 1.2.5 The biology of head and neck tumors.............. 10 1.3 Dissertation overview............................ 12 II. PePr: a peak-calling prioritization pipeline to identify consistent or differen- tial peaks from replicated ChIP-Seq data ..................... 24 2.1 Introduction................................. 24 2.2 Methods................................... 27 2.2.1 Datasets.............................. 27 2.2.2 PePr algorithm.......................... 28 2.2.3 Motif analysis.......................... 35 2.2.4 Unique peak analysis....................... 35 2.3 Results.................................... 35 iv 2.3.1 Overview of the PePr method.................. 35 2.3.2 Comparison to other methods.................. 36 2.4 Discussion.................................. 42 III. Subtypes of HPV-positive head and neck cancers are associated with HPV characteristics, copy number alterations, PIK3CA mutation, and pathway signatures ...................................... 68 3.1 Introduction................................. 68 3.2 Methods................................... 71 3.2.1 Tumor tissue acquisition, DNA and RNA extraction....... 71 3.2.2 RNA-seq and SNP-array protocol................ 71 3.2.3 RNA-seq analysis of the host gene expression.......... 72 3.2.4 Measuring HPV gene expression, and detection of HPV sub- types and genic integration.................... 72 3.2.5 Computing full-length E6 percentages.............. 73 3.2.6 Finding unique pathways in each HPV(+) cluster........ 74 3.2.7 Unsupervised clustering of gene expression values....... 74 3.2.8 Pathway scores.......................... 75 3.2.9 RNA-seq mutation calling.................... 75 3.2.10 TCGA RNA-seq analysis..................... 76 3.2.11 CNA analysis........................... 76 3.3 Results.................................... 76 3.3.1 Overview of differential expression results from HPV(+) and HPV(-) tumors.......................... 76 3.3.2 Unsupervised clustering revealed two HPV(+) subgroups.... 77 3.3.3 Differentially regulated genes and pathways between HPV(+) subgroups............................. 78 3.3.4 HPV(+) subgroups correlate with HPV integration, E2/E4/E5 expression levels, full-length E6 percent and E6 activity..... 79 3.3.5 Correlation of subgroups with copy number alterations and PIK3CA mutation............................. 80 3.3.6 Characteristics associated with HPV(+) subgroups and patient survival.............................. 82 3.4 Discussion.................................. 83 IV. Genomic binding and regulation of gene expression by the thyroid carcinoma- associated PAX8-PPARG fusion protein ...................... 106 4.1 Introduction................................. 106 4.2 Materials and methods............................ 108 4.2.1 Cell culture............................ 108 4.2.2 Antibodies............................ 108 4.2.3 Flow cytometry.......................... 108 v 4.2.4 ChIP-seq assay.......................... 109 4.2.5 RNA-seq assay.......................... 109 4.2.6 ChIP-seq data analysis...................... 110 4.2.7 RNA-seq data analysis...................... 111 4.2.8 Gene set enrichment testing................... 111 4.3 Results.................................... 112 4.3.1 Overview of genes regulated by PPFP in the absence and pres- ence of pioglitazone....................... 112 4.3.2 PPFP regulates processes related to oncogenesis........ 112 4.3.3 PPFP can induce or repress PAX8-regulated genes....... 113 4.3.4 Overview of the PPFP cistrome................. 114 4.3.5 Overview of genes and gene sets containing PPFP peaks.... 116 4.3.6 Why is pioglitazone adipogenic in PPFP-expressing cells?... 117 4.3.7 Why is pioglitazone therapeutic in the mouse model of PPFP thyroid carcinoma?........................ 118 4.3.8 Similarity of gene regulation by PPFP in PCCL3 cells and hu- man thyroid carcinomas..................... 119 4.4 Discussion.................................. 120 V. Conclusions and future directions ......................... 143 5.1 Conclusions................................. 143 5.2 Future directions............................... 145 5.2.1 Chapter II............................. 145 5.2.2 Chapter III............................ 146 5.2.3 Chapter IV............................ 147 APPENDIX ::::::::::::::::::::::::::::::::::::::::: 149 vi LIST OF FIGURES Figure 2.1 Motif logos for all TFs used in our comparisons.................. 46 2.2 Workflow of PePr.................................. 47 2.3 Extra-variance beyond that of the Poisson distribution is observed in ChIP-seq data 48 2.4 H3K27me3 data show a high autocorrelation of the dispersion parameters esti- mated for nearby windows............................. 49 2.5 Comparison of PePr to other approaches on NRSF data.............. 50 2.6 Comparison of PePr to ZINBA-CR (A) ZINBA-SA (B) and edgeR-basic (C) on NRSF data...................................... 51 2.7 Rank comparisons between PePr and the alternative approaches on NRSF data.. 52 2.8 Rank comparisons between PePr and the alternative approaches on ATF4 data.. 53 2.9 Comparison of PePr to other approaches on ATF4 data.............. 54 2.10 Comparison of PePr to ZINBA-CR (A), ZINBA-SA (B) and edgeR-basic (C) on ATF4 data...................................... 55 2.11 Example of an H3K27me3 enriched region showing high variation of ChIP-seq signals across samples................................ 56 2.12 A scaling FDR analysis of the H3k27me3 dataset shows PePr was most robust to differences in read coverage level.......................... 57 2.13 Comparison of PePr to other approaches for H3K27me3 data............ 58 3.1 Identification of two HPV(+) subgroups and pathway differences between them. 88 3.2 HPV(+) subgroups correlate with several HPV characteristics........... 89 3.3 HPV(+) subgroups differ by copy number alterations............... 90 3.4 The HPV-KRT subgroup had more PIK3CA mutations and amplifications..... 91 3.5 Summary of characteristics that differ by HPV(+) subgroup, and prognosis.... 91 3.6 Focal amplification of chr11q13 and chr11q22 in HPV(-) tumors and far-end deletion of chr11q in HPV(+) tumors........................ 92 3.7 ConsensusCluster Plus output for (A) UM and (B) UM+TCGA HPV(+) samples. 93 3.8 Correlation