Douglas Hoen Phd Thesis FINAL CORRECTED

Coevolution of transposable elements and plant genomes by DNA sequence exchanges Douglas R. Hoen Department of Biology McGill University, Montreal December, 2011 A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy. © Douglas R. Hoen, 2011 2 Express yourself completely, then keep quiet. Be like the forces of nature: when it blows, there is only wind; when it rains, there is only rain; when the clouds pass, the sun shines through. LAO-TZU 3 Table of Contents Abstract7 Résumé8 Acknowledgements9 Contributions10 Chapter One: Literature review12 Introduction13 Discovery15 Frequent Birth Model17 Plant DTEs24 FAR1/FHY3/FRS24 DAYSLEEPER30 MUSTANG31 GARY34 Additional Examples35 Other Types of Protein-Coding Exaptation36 Regulatory Exaptation, Coding and Noncoding40 Conclusions45 References46 Bridge to Chapter Two58 Chapter Two: The evolutionary fate of MULE-mediated duplications of host gene fragments in rice59 Abstract60 Introduction60 Results and Discussion62 4 Methods72 MULE detection72 Gene prediction73 cDNA mapping73 Transduplicate detection74 Conserved domain detection74 Coding-sequence disablement and truncation75 Synonymous and nonsynonymous substitution rate75 Acknowledgments75 Supplemental Material76 References81 Bridge to Chapter Three84 Chapter Three: Transposon-mediated expansion and diversification of a family of ULP-like genes85 Abstract86 Introduction87 Materials and Methods89 Sequences89 Identification of MULEs, Transduplicates, and Peptidase C48 89 Alignment, Phylogeny, and Conservation92 Expression94 Mobility94 Results96 Detection of MULEs and Transduplicates96 Identification of KI98 5 Phylogeny and Age100 Conservation103 Expression104 Mobility107 Discussion110 Transduplication110 KI Sequence Evolution111 Expression113 Mobility115 Is KI Selfish?118 Function120 Broader Evolutionary Implications122 Acknowledgments123 Supplemental Material124 References128 Bridge to Chapter Four133 Chapter Four: Discovery of domesticated transposable element genes in Arabidopsis thaliana by integrative genomic data mining134 Abstract135 Introduction135 Methods138 Data Sources138 Identification of TE-like Genes140 Storage, Attribution, and Analysis of Genomic Features144 Determination of Shared Microsynteny145 6 Clustering of Domain Sequences146 Calculation of DNA Repetitiveness146 Calculation of Pericentromere Shoulder Coordinates146 Controls147 Calculation of Confidence Levels and Curation147 Additional Analysis of Complex Loci150 Results151 TE Signature Domains151 Domain Scans152 Cross-Genome Conservation155 Pericentromere Shoulders160 Domesticated Transposable Elements160 Discussion198 References210 Conclusion213 Appendix One: The map-based sequence of the rice genome215 Appendix Two: MUSTANG is a novel family of domesticated transposase genes found in diverse angiosperms216 7 Abstract Transposable elements (TEs) are self-replicating genetic elements that comprise a large portion of all characterized nuclear genomes. Self- replication, which is catalyzed by proteins encoded by autonomous TEs, permits TEs to persist without necessarily providing immediate adaptive benefit to the organism; therefore, TEs are sometimes characterized as selfish, parasitic, or junk DNA. Nevertheless, over the course of evolution, TEs have produced diverse and vital eukaryotic adaptations. One way in which TEs coevolve with ordinary genes is by direct sequence exchange: TEs can duplicate and mobilize ordinary genes; conversely, TE-derived sequences can become conserved as ordinary genes. In this thesis, I use genome-scale bioinformatic analyses to identify direct sequence exchanges from plant genomes to TEs, and vice versa, and to characterize their functional and evolutionary consequences. After reviewing the literature, I first examine Mutator-like elements (MULEs) in rice that have duplicated and mobilized thousands of ordinary coding gene fragments, a process we term transduplication. Contrary to a previous report, these sequences do not appear to produce functional proteins, although they may have regulatory functions. Second, I examine a gene family that appears to have originated through transduplication in Arabidopsis thaliana MULEs, which is conserved within TEs, called Kaonashi (KI). KI shows that transduplication does occasionally produce functional gene duplications; however, at least in this case, the result is a not a new ordinary gene, but a new TE gene. Finally, I examine ordinary genes in A. thaliana derived from TE genes, a process termed molecular domestication. In addition to 3 previously known A. thaliana domesticated transposable elements (DTEs) families, I identify 23 candidate novel families. Together, these results support the view that, despite persisting by self-replication, TEs are not molecular parasites but are integral components of eukaryotic genomes. 8 Résumé Les éléments transposables (ET) sont des séquences d’ADN capables de se déplacer et de s’autoreproduire dans un génome, un mécanisme appelé transposition. Ces éléments représentent l’une des composantes les plus importantes des génomes nucléaires eucaryotes. Cette capacité à s’autoreproduire, grâce aux protéines codées par les ET autonomes, a permis aux ET de persister et de peupler les génomes sans nécessairement apporter un avantage adaptatif immédiat à l’organisme hôte. À cet égard, les ET sont parfois considérés comme des éléments égoïstes ou parasites, ou de l’ADN «poubelle». Néanmoins, les ET ont joué un rôle important au cours de l’évolution en générant diverses adaptations essentielles aux eucaryotes. Ainsi, les ET peuvent coévoluer avec les gènes du génome hôte par l’échange direct de séquence d’ADN. Les ET peuvent se dupliquer et mobiliser des gènes hôtes ; à l’inverse, des séquences d’ADN dérivées de ET peuvent avoir le même niveau de conservation que des gènes hôtes. Dans le cadre de ma thèse, j’ai utilisé des analyses bio-informatiques à l’échelle du génome afin d’identifier des échanges directs de brins de séquence d’ADN à partir de génomes de plantes vers les ET, et vice-versa, et de caractériser leurs fonctions et leurs effets évolutifs. Ma thèse débutera par une recension des diverses publications scientifiques dans le domaine. Je dresserai ensuite un portrait des éléments mobiles Mutator-like (MULE) dans le génome du riz qui ont entraîné la duplication et la mobilisation de milliers de fragments de gènes codants normaux, un procédé appelé transduplication. Contrairement à ce qui avait été rapporté dans des publications antérieures, ces séquences transdupliquées ne semblent pas produire des protéines fonctionnelles malgré le fait qu’elles puissent avoir des fonctions régulatrices. En second lieu, j’examinerai une famille de gènes, appelée Kaonashi (KI), qui proviendrait d’un événement de transduplication présent dans les MULE de l’Arabidopsis thaliana, mais également conservé dans les ET. La 9 présence de la famille KI nous montre que le procédé de transduplication permet à l’occasion des duplications fonctionnelles de gènes. Cependant, du moins dans le cas de la KI, le procédé n’entraîne pas la création d’un nouveau gène normal, mais bien d’un nouvel élément transposable. En troisième lieu, j’examinerai les gènes hôtes présents dans le génome de la plante A. thaliana qui proviendrait de ET, un procédé appelé domestication moléculaire. En plus des trois cas de familles d’éléments transposables domestiquées (ETD) déjà connues dans l’espèce A. thaliana, j’ai identifié 23 nouvelles familles potentielles. L’ensemble de ces résultats tend à démontrer que, malgré le fait qu’ils persistent dans les génomes grâce à leur capacité d’autoreproduction, les ET ne sont pas des parasites moléculaires, mais bien des éléments clés faisant partie intégrale des génomes eucaryotes. Acknowledgements I appreciate greatly the help, support, and patience that many have given to me as I have walked this path. I especially thank my thesis supervisor, Thomas Bureau, who has been generous with his time, ideas, and most importantly understanding. Thanks also to my supervisory committee, Joseph Dent, Daniel Schoen, and Paul Harrison, as well as to Mathieu Blanchette, for their attention and suggestions. Thanks to my lab mates, co-authors, and other colleagues for fruitful (and fruitless) discussions. Thanks most of all to my family and friends, for their love. My funding was in part provided by the Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship Program and the E & M Wilson Bursary administered by McGill University. 10 Contributions Chapter One has been submitted as an invited contribution to an upcoming Topics in Current Genetics book on Plant Transposons. I researched and wrote this chapter with guidance from my thesis supervisor, Prof. Thomas Bureau. Chapter Two was published in Genome Research: Hoen DR, Juretic N, Huynh ML, Harrison PM, Bureau TE. 2005. The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15(9):1292-7. Co-first authorship is shared by N.J., who contributed to MULE discovery, experimental design, quality control, analysis of specific cases, and writing. M.H. was involved in preliminary MULE discovery. P.H. contributed to analyzing pseudogenic features. T.B. contributed to experimental design and writing. I designed, implemented, and analyzed all large-scale bioinformatic experiments, including MULE 1 & TSD detection, expression mapping, transduplicate detection, conserved domain detection, analyses of coding

Douglas Hoen Phd Thesis FINAL CORRECTED

Transcriptional Modulator ZBED6 Affects Cell Cycle and Growth of Human Colorectal Cancer Cells

Cellular and Molecular Signatures in the Disease Tissue of Early

Dissection of the Cellular Function of the ZBED6 Transcription Factor in Mouse

Atlas Journal

Transcriptional Modulator ZBED6 Affects Cell Cycle and Growth of Human Colorectal Cancer Cells

Functional Characterization of the Biological Significance of the ZBED6/ZC3H11A Locus in Placental Mammals

SMARCA4 Regulates Spatially Restricted Metabolic Plasticity in 3D Multicellular Tissue

Human Social Genomics in the Multi-Ethnic Study of Atherosclerosis

Downloaded, Each with Over 20 Samples for AD-Specific Pathways, Biological Processes, and Driver Each Specific Brain Region in Each Condition

Genomic Rearrangements Near Genes Leading to Upregulation Across a Diverse Subset of Human Cancers

Imprint Stability and Plasticity During Development

Identificação De Genes E O Problema Do Alinhamento Spliced Múltiplo