The Pennsylvania State University
The Graduate School
Eberly College of Science

ELUCIDATING BIOLOGICAL FUNCTION OF GENOMIC DNA WITH ROBUST SIGNALS OF BIOCHEMICAL ACTIVITY: INTEGRATIVE GENOME-WIDE STUDIES OF ENHANCERS

A Dissertation in Biochemistry, Microbiology and Molecular Biology
by Nergiz Dogan

© 2014 Nergiz Dogan

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

August 2014

The dissertation of Nergiz Dogan was reviewed and approved* by the following:

Ross C. Hardison
T. Ming Chu Professor of Biochemistry and Molecular Biology
Dissertation Advisor
Chair of Committee

David S. Gilmour
Professor of Molecular and Cell Biology

Anton Nekrutenko
Professor of Biochemistry and Molecular Biology

Robert F. Paulson
Professor of Veterinary and Biomedical Sciences

Philip Reno
Assistant Professor of Anthropology

Scott B. Selleck
Professor and Head of the Department of Biochemistry and Molecular Biology

*Signatures are on file in the Graduate School

ABSTRACT

Genome-wide measurements of epigenetic features, such as histone modifications and occupancy by transcription factors and coactivators, provide the opportunity to understand more globally how genes are regulated. While much effort is being put into integrating the marks from various combinations of features, the contribution of each feature to the accuracy of enhancer prediction is not known. We began with predictions of 4,915 candidate erythroid enhancers based on genomic occupancy by TAL1, a key hematopoietic transcription factor that is strongly associated with gene induction in erythroid cells. Seventy of these DNA segments occupied by TAL1 (TAL1 OSs) were tested by transient transfections of cultured hematopoietic cells, and 56% of these were active as enhancers. Sixty-six TAL1 OSs were evaluated in transgenic mouse embryos, and 65% of these were active enhancers in various tissues.
Inclusion of additional epigenetic features improved the prediction accuracy. Combinations of TAL1, GATA1, EP300, H3K4me1, and H3K27ac gave high accuracy of enhancer prediction (70%-75% success, depending on the method of clustering) while retaining strong discriminatory power, with good sensitivity (Sn, up to 84%) and specificity (Sp, up to 80%). Importantly, activating histone marks in the absence of key transcription factors or an open chromatin profile were weak predictors of enhancer activity and had no discriminatory power to predict enhancers. Motifs that distinguish active from inactive TAL1 OSs implicate IRFs, STATs, and FOX protein families as candidate positive co-factors with TAL1, while REST (NRSF) and HOX family proteins are implicated in inactivity. While signals for evolutionary constraint were weak over the entire TAL1-bound DNA segments regardless of activity in either assay, phylogenetic preservation of a TF-binding site motif was associated with enhancer activity. Additionally, we reported that conservation of GATA1 occupancy is linked to pleiotropic function, meaning that the conserved sites are enhancers in multiple tissues, including non-hematopoietic tissues. Furthermore, the TAL1-bound enhancers validated in enhancer assays were assigned to their target genes, which include not only erythroid but also non-erythroid genes.

TABLE OF CONTENTS

LIST OF FIGURES ..... vii
LIST OF TABLES ..... ix
ACKNOWLEDGEMENTS ..... x

Chapter-1 Introduction to Regulation of Transcription by Cis-Regulatory Modules (CRMs) ..... 1
  1.1 Regulation of transcription ..... 1
  1.2 Cis-regulatory modules (CRMs) ..... 2
    1.2.1 Enhancers ..... 4
    1.2.2 Identify enhancers ..... 5
    1.2.3 How do enhancers influence human disease? ..... 7
    1.2.4 How important are enhancers for evolution ..... 10
  1.3 Regulation of erythropoiesis via transcriptional enhancers and histone modifications ..... 12
    1.3.1 Transcription factors ..... 13
    1.3.2 Histone modifications ..... 14
  1.4 Statement of Thesis ..... 17

Chapter-2 Epigenetic and Genetic Features that Lead To Discovery of Enhancer Function ..... 19
  2.1 Introduction ..... 20
  2.2 Results ..... 22
    2.2.1 Genome-wide prediction of regulated erythroid CRMs by TAL1 occupancy ..... 22
    2.2.2 Occupancy by TAL1 is a strong predictor of enhancer activity ..... 26
    2.2.3 Impact of additional epigenetic features in predicting enhancer activity in the presence of TAL1 binding ..... 33
    2.2.4 Confirmation of predictive power of EP300 occupancy, H3K4monomethylation and H3K27 acetylation in enhancer prediction ..... 41
    2.2.5 Effective combinations of epigenetic features for prediction of enhancers ..... 42
    2.2.6 Motifs that distinguish TAL1-bound enhancers from inactive TAL1 OSs ..... 49
    2.2.7 Conservation as an illuminator, not a predictor ..... 52
  2.3 Discussion ..... 56
  2.4 Methods ..... 59
    2.4.1 ChIP-seq data for epigenetic features ..... 59
    2.4.2 Enhancer assays by transient transfection for K562 cells ..... 59
    2.4.3 Transgenic mouse assays (VISTA Enhancer Browser) ..... 60
    2.4.4 Clustering algorithms ..... 61
    2.4.5 Measuring discriminatory power of transcription factors and histone modifications to identify enhancers ..... 62
    2.4.6 Identification of significantly enriched motifs by employing the computer program, Discriminating Matrix Enumerator ..... 62
    2.4.7 Analyses of sequence conservation and motif preservation ..... 63
  2.5 Data access ..... 64

Chapter-3 Developing transfection methods for cell lines that allow interrogation of different aspects of erythroid differentiation ..... 65
  3.1 Transient transfections to test CRMs ..... 66
  3.2 Transcription factors present in different cell line models of erythroid cells ..... 68
  3.3 Erythroid enhancement with HBG1 promoter in K562 cells ..... 69
  3.4 GATA1 responsive expression from HBG1 promoter in G1E-ER4 system ..... 71
  3.5 GATA1-dependent enhancement with Vav2 promoter in G1E-ER4 system ..... 73
  3.6 Potential repressor effect of GATA1-ER on expression in G1E-ER4 system ..... 74
  3.7 Discussion ..... 74
  3.8 Methods ..... 74
    3.8.1 Enhancer assays by transient transfection for G1E-ER4 cell system ..... 75

Chapter-4 Conserved Transcription Factor Occupancy and Enhancer Usage in Different Cell Systems and Multiple Tissues ..... 77
  4.1 Introduction ..... 78
  4.2 Contribution of conserved GATA1 occupancy to enhancer activity in different cell systems ..... 80
  4.3 Conserved occupancy is associated with enhancer activity in multiple tissues ..... 84
  4.4 Discussion ..... 88
  4.5 Methods .....