The Spatial and Temporal Regulatory Code of Transcription Initiation In

The Spatial and Temporal Regulatory Code of Transcription Initiation in Drosophila melanogaster by Elizabeth Ann Rach Computational Biology and Bioinformatics Duke University Date:_______________________ Approved: ___________________________ Uwe Ohler, PhD, Supervisor ___________________________ Fred S. Dietrich, PhD ___________________________ Jack Keene, PhD ___________________________ Sayan Mukherjee, PhD ___________________________ Gregory Crawford, PhD Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computational Biology and Bioinformatics in the Graduate School of Duke University 2010 ABSTRACT The Spatial and Temporal Regulatory Code of Transcription Initiation in Drosophila melanogaster by Elizabeth Ann Rach Computational Biology and Bioinformatics Duke University Date:_______________________ Approved: ___________________________ Uwe Ohler, PhD, Supervisor ___________________________ Fred S. Dietrich, PhD ___________________________ Jack Keene, PhD ___________________________ Sayan Mukherjee, PhD ___________________________ Gregory Crawford, PhD An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computational Biology and Bioinformatics in the Graduate School of Duke University 2010 Copyright by Elizabeth Ann Rach 2010 Abstract Transcription initiation is a key component in the regulation of gene expression. Recent high‐throughput sequencing techniques have enhanced our understanding of mammalian transcription by revealing narrow and broad patterns of transcription start sites (TSSs). Transcription initiation is central to the determination of condition specificity, as distinct repertoires of transcription factors (TFs) that assist in the recruitment of the RNA polymerase II to the DNA are present under different conditions. However, our understanding of the presence and spatiotemporal architecture of the promoter patterns in the fruit fly remains in its infancy. Nucleosome organization and transcription initiation have been considered hallmarks of gene expression, but their cooperative regulation is also not yet understood. In this work, we applied a hierarchical clustering strategy on available 5’ expressed sequence tags (ESTs), and developed an improved paired‐end sequencing strategy to explore the transcription initiation landscape of the D.melanogaster genome. We distinguished three initiation patterns: “peaked or Narrow Peak TSSs”, “Broad Peak TSSs”, and “broad TSS cluster groups or Weak Peak TSSs”. The promoters of peaked TSSs contained the location specific sequence elements, and were bound by TATA Binding Protein (TBP), while the promoters of broad TSS cluster groups were associated iv with non‐location‐specific elements, and were bound by the TATA‐box related Factor 2 (TRF2). Available ESTs and a tiling array time series enabled us to show that TSSs had distinct associations to conditions, and temporal patterns of embryonic activity differed across the majority of alternative promoters. Peaked promoters had an association to maternally inherited transcripts, and broad TSS cluster group promoters were more highly associated to zygotic utilization. The paired‐end sequencing strategy identified a large number of 5’ capped transcripts originating from coding exons that were unlikely the result of alternative TSSs, but rather the product of post‐transcriptional modifications. We applied an innovative search program called FREE to embryo, head, and testes specific core promoter sequences and identified 123 motifs: 16 novel and 107 supported by other motif sources. Motifs in the embryo specific core promoters were found at location hotspots from the TSS. A family of oligos was discovered that matched the Pause Button motif that is associated with RNA pol II stalling. Lastly, we analyzed nucleosome organization, chromatin structure, and insulators across the three promoter patterns in the fruit fly and human genomes. The WP promoters showed higher associations with H2A.Z, DNase Hypersensitivity Sites (DHS), H3K4 methylations, and Class I insulators CTCF/BEAF32/CP190. Conversely, NP v promoters had higher associations with polII and GAF binding. BP promoters exhibited a combination of features from both promoter patterns. Our study provides a comprehensive map of initiation sites and the conditions under which they are utilized in D. melanogaster. The presence of promoter specific histone replacements, chromatin modifications, and insulator elements support the existence of two divergent strategies of transcriptional regulation in higher eukaryotes. Together, these data illustrate the complex regulatory code of transcription initiation. vi Dedication To my wonderfully strong and loving parents, Kathy and Herb. To the world you might be one person, but to one person, you just might be the world. – unknown author vii Contents ABSTRACT .................................................................................................................................... IV LIST OF TABLES ........................................................................................................................ XIV LIST OF FIGURES ...................................................................................................................... XVI LIST OF ABBREVIATIONS ......................................................................................................... XX ACKNOWLEDGEMENTS ........................................................................................................... XXV 1. THE DNA CODE FOR GENE REGULATION ............................................................................. 1 1.1 FROM SEQUENCING TO REGULATION....................................................................................... 1 1.2 DROSOPHILA MELANOGASTER ................................................................................................. 3 1.2.1 An Ideal Model Organism ............................................................................................... 3 1.2.2 Embryonic Development ................................................................................................. 5 1.2.3 Availability of 12 Genomes ............................................................................................. 7 1.3 GENETICS OF TRANSCRIPTION ................................................................................................. 9 1.3.1 DNA is Tightly Compacted Within a Cell ........................................................................ 9 1.3.2 Epigenetic Modifications Increase DNA Accessibility .................................................. 11 1.3.3 Initiation ........................................................................................................................ 14 1.3.4 Elongation and Termination.......................................................................................... 15 1.3.5 Post Transcriptional Processing ................................................................................... 16 1.4 TRANSCRIPTIONAL REGULATORS .......................................................................................... 17 1.4.1 Enhancers and Repressors ............................................................................................ 17 1.4.2 Operons and Insulators ................................................................................................. 21 1.5 CONDITION SPECIFIC TRANSCRIPTIONAL PROGRAMS ............................................................ 24 1.6 ASSOCIATIONS OF TRANSCRIPTION TO DISEASE .................................................................... 27 2. EXPERIMENTAL AND COMPUTATIONAL STRATEGIES FOR MODELING 5’ ENDS ......... 29 2.1 LOW THROUGHPUT EXPERIMENTAL METHODS ..................................................................... 29 2.1.1 S1 Mapping ................................................................................................................... 29 viii 2.1.2 RNase Protection ........................................................................................................... 31 2.1.3 Primer Extension ........................................................................................................... 32 2.1.4 5’ RACE ........................................................................................................................ 34 2.2 HIGH THROUGHPUT EXPERIMENTAL METHODS .................................................................... 37 2.2.1 Expressed Sequence Tags (ESTs) .................................................................................. 37 2.2.2 Serial Analysis of Gene Expression (SAGE) ................................................................. 39 2.2.3 Cap Analysis of Gene Expression (CAGE) ................................................................... 43 2.2.4 Tiling Arrays ................................................................................................................. 46 2.3 COMPUTATIONAL IDENTIFICATION OF TSS FEATURES .......................................................... 49 2.3.1 Classifying and Parsing ................................................................................................ 49 2.3.2 Sequence Classification ................................................................................................. 51 2.3.3 Markov Models (MM) ................................................................................................... 53

The Spatial and Temporal Regulatory Code of Transcription Initiation In

Functional Aspects and Genomic Analysis

Gene Prediction Using Deep Learning

Complete Genome Sequence of the Hyperthermophilic Bacteria- Thermotoga Sp

Cep-2020-00633.Pdf

Bioinformatics: a Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D

Genes & Gene Finding

Computational Gene Finding

Scheme and Syllabus of B. Tech. Biotechnology (BT) Batch 2011 Onwards

Basics of Molecular Biology

Computational Gene Prediction

Yersinia Pestis Genes Essential Across a Narrow Or a Broad Range of Environmental Conditions Nicola J

Introduction to Ab Initio and Evidence-Based Gene Finding