Abstract Genome-Wide Computational Analysis Of

ABSTRACT

GENOME-WIDE COMPUTATIONAL ANALYSIS OF CHLAMYDOMONAS REINHARDTII PROMOTERS

by Kokulapalan Wimalanathan

As a model organism, Chlamydomonas is used not only in biological experiments to understand chloroplasts and flagella but also in biodiesel production. Chlamydomonas promoter regions were extracted based on available RNA-Seq data and the community genome annotation, and these promoters were analyzed to detect core and proximal promoter elements. The evidence suggests that the TATA box (canonical and non-canonical) is the only core promoter element, and it also indicates that the Chlamydomonas TATA box differs from Arabidopsis and human TATA boxes. While some of the proximal promoter elements discovered show weak similarity to known promoter elements from other species, most are novel elements present only in Chlamydomonas. Most of the proximal promoter elements detected show significant similarity to each other. This study indicates that the promoter architecture of Chlamydomonas appears to be simpler than that of animals and plants.

GENOME-WIDE COMPUTATIONAL ANALYSIS OF CHLAMYDOMONAS REINHARDTII PROMOTERS

A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science, Department of Botany

by Kokulapalan Wimalanathan

Miami University
Oxford, Ohio
2011

Advisor: Dr. Chun Liang
Reader: Dr. Roger Meicenheimer
Reader: Dr. Mufit Ozden

Table of Contents

List of Tables
List of Figures
Acknowledgments
1 Chapter 1: Review of Core Promoter Analysis
  1.1 Gene expression
    1.1.1 Eukaryotic gene structure
    1.1.2 Transcription factors and cis-regulatory elements
    1.1.3 Transcription initiation
    1.1.4 Different types of promoters based on the distribution of TSS
    1.1.5 Core promoter elements
    1.1.6 Proximal promoter elements
    1.1.7 Alternative promoters
  1.2 Computational methods to identify cis-regulatory elements
    1.2.1 Enumerative method
    1.2.2 Deterministic optimization method
    1.2.3 Probabilistic optimization method
    1.2.4 Notations to represent cis-regulatory elements
  1.3 References
  1.4 Figures
  1.5 Tables
2 Chapter 2
  2.1 Abstract
  2.2 Introduction
  2.3 Results
    2.3.1 Obtaining promoter regions for analysis
    2.3.2 LDSS analysis of core promoters of human, Arabidopsis and Chlamydomonas
    2.3.3 Comparing and combining octamer clusters to produce putative core promoter elements
    2.3.4 Analysis of proximal promoter elements
  2.4 Discussion
  2.5 Materials and Methods
    2.5.1 Obtaining valid promoter sequences
    2.5.2 LDSS analysis of core promoters of three species
    2.5.3 Comparing and combining octamer clusters to form octamer groups
    2.5.4 Analysis of proximal promoter elements
  2.6 Acknowledgments
  2.7 References
  2.8 Figures
  2.9 Tables

List of Tables

Table 1.1: The position and consensus of the core promoter elements in plants and animals
Table 1.2: IUPAC single-character codes for nucleic acid sequences
Table 2.1: Basic statistics of upstream promoter analysis
Table 2.2: The 14 KEGG motif groups and their component KEGG motifs
Table 2.3: Functional annotation of GO motifs detected in GO gene groups
Table 2.4: Functional annotation of KEGG motifs detected in KEGG gene groups

List of Figures

Figure 1.1: Gene expression, structural gene components and pre-initiation complex
Figure 1.2: Different types of promoters based on TSS distribution
Figure 1.3: Core promoter elements commonly found in plants and animals
Figure 1.4: A simple example of how to use the LDSS method
Figure 1.5: Creating sequence logos
Figure 1.6: Information represented in a sequence logo
Figure 1.7: Several ways commonly used to represent consensus motif models
Figure 2.1: The major core promoter elements present in animals and plants
Figure 2.2: Examples of LDSS-positive and LDSS-negative octamers in Chlamydomonas
Figure 2.3: LDSS heatmap graphs for Arabidopsis, human, and Chlamydomonas
Figure 2.4: Sequence logos from Arabidopsis LDSS octamer clusters
Figure 2.5: Sequence logos from human LDSS octamer clusters
Figure 2.6: Sequence logos from Chlamydomonas LDSS octamer clusters
Figure 2.7: Sequence logos from combined putative core promoter elements
Figure 2.8: GO motifs
Figure 2.9: Representative motifs from each KEGG motif group
Figure 2.10: Overall method used to analyze promoters
Figure 2.11: LDSS parameters
Figure 2.12: Valid promoters
Figure 2.13: MEME minimal motif format

Acknowledgments

I heartily thank my advisor, Dr. Chun Liang, whose encouragement, guidance, and support have helped me conduct this research in a meaningful
