The Evolutionary History and Genomic Impact Of
Total Page:16
File Type:pdf, Size:1020Kb
THE EVOLUTIONARY HISTORY AND GENOMIC IMPACT OF MAMMALIAN DNA TRANSPOSONS by JOHN KELLY PACE, II Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY THE UNIVERSITY OF TEXAS AT ARLINGTON December 2008 Copyright © by John Kelly Pace, II 2008 All Rights Reserved ACKNOWLEDGEMENTS I would like to first and foremost thank my wonderful wife, Deedra, and my incredible children Jana, Julie and Jill, for their tireless support during graduate school. They have spent many hours without me while I was working on the computer, but their faith in me and willingness to allow me to follow my dream has kept me going when I was ready to stop. I love you all and cannot say thank you enough! Next, I want to thank my advisor, Dr. Cedric Feschotte. Cedric has not only been a great advisor, but has also been a good friend. He has provided me with opportunities to grow intellectually and given me chances to learn from others in places I would never have been able to go without his help such as France. He allowed me to be a Research Assistant in his lab, thereby giving me more time to focus on my research and finish school in a timely manner. I am forever grateful for all he has done for me. I would like to thank the Biology Department at the University of Texas at Arlington. My committee members, Drs. Esther Betran, Shawn Christensen, Andre Pires da Silva and Ellen Pritham, who challenged me to learn more than I thought was possible, knowing full well it would make me a better scientist. The members of the Feschotte Lab have been a tremendous support, particularly Don Hucks, Marcel Guibotsy, Claudio Casola, Qi Wang and Clement Gilbert. It has been a privilege to work alongside each of them. I would like to thank the members of the Genome Biology Group for their insights and comments on my work and for allowing me to learn from their work. My work on human polymorphism of repair-mediated duplications would not have been possible without the assistance of Mark Batzer and Shurjo Sen at Louisiana State University. Last, I would like to thank the Biology Office staff, Gloria Burlingham, Linda Taylor and Peggy Fisher for all of their help. October 15, 2008 iii ABSTRACT THE EVOLUTIONARY HISTORY AND GENOMIC IMPACT OF MAMMALIAN DNA TRANSPOSONS John Kelly Pace, PhD. The University of Texas at Arlington, 2008 Supervising Professor: Cedric Feschotte Transposable elements (TEs) are mobile repetitive sequences that make up large fractions of mammalian genomes, including at least 45% of the human genome. Class 2, or DNA intermediate transposons, which make up approximately 3% of human nuclear DNA, have typically been overlooked in analyses of mammalian genomes and therefore remain poorly understood. Here, we carried out the first large-scale analysis of the evolutionary history of DNA transposons throughout eutherian mammal evolution. The analysis included 18 mammals from all major branches of the eutherian tree including Afrotheria, Laurasiatheria, primates, rodents and Xenarthra. We combined three different computational methods: average divergence of the TE family from the ancestral consensus sequence, nested insertion analysis and cross-species genomic analysis of orthologous loci to determine the average age of each TE family, when the family was active and in which species. The combination of these methods, the latter two of which do not rely upon calibration of a molecular clock, allowed us to trace the evolutionary history of the 249 currently recognizable eutherian DNA transposon families for which at least 100 copies of the TE could be identified. iv Our analysis revealed that, contrary to previous assumptions, the horizontal transfer (HT) of DNA transposons is a widespread and common phenomenon in eutherians. We report strong evidence for the HT of 13 different autonomous TE families that were horizontally transferred into 13 of the 18 eutherian species surveyed, typically invading multiple species lineages. In each case, the infiltration of the TE family was an independent event that resulted in lineage-specific activity, producing distinct bursts of transposition. Together, these bursts were responsible for the insertion of between 2,300 and 222,000 copies of new elements and the addition of up to 39 megabases of nuclear DNA per species. We discovered a general, eutherian-wide slowdown in the number of horizontal transfers and lineage-specific activity of DNA transposons over the past 100 million years, with only 2 different families of TEs invading 4 of the 18 species within the past 40 million years. v TABLE OF CONTENTS ACKNOWLEDGEMENTS…………………………….………………………………………… iii ABSTRACT………………………………………………………………………………………. iv LIST OF ILLUSTRATIONS…………………………………………………………………….. xii LIST OF TABLES……………………………………………………………………………….. xvi Chapter Page 1. THE EVOLUTIONARY HISTORY OF HUMAN DNA TRANSPOSONS: EVIDENCE FOR INTENSE ACTIVITY IN THE PRIMATE LINEAGE..........…………….…………………………………… 1 1.1 Introduction........................................…………………..……………. 1 1.2 Results..............................................………………………………… 4 1.2.1 Census of DNA transposons in the human genome….......................................……………………. 4 1.2.2 Analysis of DNA transposons nested into other elements……………………………………………….….. 5 1.2.3 Cross-species genomic analysis of orthologous insertions….………………………………………..…. 9 1.2.4 Age of DNA transposons based on sequence divergence….………………………………………..…. 13 1.3 Discussion..........................................………………………………… 20 1.3.1 Combined methods provide a detailed estimate of the age of human DNA transposons...……………. 20 1.3.2 A general extinction of DNA transposon activity in the anthropoid lineage...........................……………. 23 1.3.3 Contribution of DNA transposons to primate genome evolution......................................……………. 25 1.4 Methods..........................................………………………………… 26 1.4.1 Calculation of average percent divergence from RepeatMasker output..................................……………. 26 vi 1.4.2 Nested insertions........................................……………. 27 1.4.3 Cross-species genomic analysis of orthologous insertions..........................................……………. 27 1.4.4 Calculation of substitution rates and dating according to sequence divergence.............……………. 28 2. REPEATED HORIZONTAL TRANSFER OF A DNA TRANSPOSON IN MAMMALS AND OTHER TETRAPODS.....………………… 29 2.1 Introduction..........................................…………….………………….. 29 2.2 Results..............................................………………………………… 30 2.2.1 Discovery of SPIN transposons...........……………………. 30 2.2.2 Lineage-specific amplification of non-autonomous SPIN elements.............................................……………………. 32 2.2.3 Evidence for the horizontal introduction of SPIN elements.............................................……………………. 36 2.2.4 Timing of SPIN colonizations................……………………. 42 2.3 Discussion..............................................………………………………… 43 2.4 Materials and Methods..........................………………………………… 46 2.4.1 Identification of SPIN elements..............……………………. 46 2.4.2 Tests for purifying selection..................……………………. 47 2.4.3 Calculation of neutral substitution rates in mammalian lineages colonized by SPIN....……………………. 47 2.4.4 PCR and sequencing of non-autonomous SPIN elements..............................................……………………. 49 3. CAPTURE OF DNA VIA DOUBLE-STRAND BREAK REPAIR IS A COMMON MECHANISM OF GENOMIC DUPLICATION IN VERTEBRATES........................…………………………………………………. 50 3.1 Introduction.......................................…………..…………………….. 50 3.2 Results.............................................………..……………………..…. 53 3.2.1 Identification of potential RDs……………..…...........…… 53 3.2.2 Analysis of RD breakpoints……………………………….… 56 vii 3.2.3 Rate and timing of repair-mediated duplication during primate evolution............……………………………….… 59 3.2.4 RD polymorphism and human genetic variation....….… 61 3.2.5 Duplication of potentially functional sequences…….… 63 3.2.6 Genomic distribution of RDs……………………………….… 64 3.2.7 RDs in other vertebrate genomes.....…………………….… 67 3.3 Discussion.........................................………….………………………. 70 3.3.1 Validation of experimental method.....……..…...........…… 70 3.3.2 Genomic impact of RDs.....................……..…...........…… 71 3.3.3 General slowdown in RD formation during the primate radiation....................................……..…...........…… 72 3.3.4 Genome size and number of RDs.....……..…...........…… 72 3.3.5 RDs and segmental duplications........……..…...........…… 74 3.4 Methods.........................................………….………………………. 75 3.4.1 Retrieval of genome sequences and RepeatMasker rmsk files....................................................……..…...........…… 75 3.4.2 Identification of RDs..........................……..…...........…… 75 3.4.3 PCR verification of computationally detected RD loci........................................................……..…...........…… 76 4. WIDESPREAD HORIZONTAL TRANSFER OF DNA TRANSPOSONS IN MAMMALS..................................………………………………………………. 78 4.1 Introduction.......................................…………………………………. 78 4.2 Results.............................................……………….………………… 81 4.2.1 Discovery of new DNA TE families.......…….……………. 81 viii 4.2.2 Horizontal transfer of hAT elements..……….…………… 82 4.2.2.1 Charlie12 family.................……….……………