Phylogenetic Incongruence as a Signature of Past Events: Uncovering the Genomic Watermarks of Introgression and Gene Duplication
Item Type: text; Electronic Dissertation
Authors: Forsythe, Evan Sullivan
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Link to Item: http://hdl.handle.net/10150/628008

PHYLOGENETIC INCONGRUENCE AS A SIGNATURE OF PAST EVENTS: UNCOVERING THE GENOMIC WATERMARKS OF INTROGRESSION AND GENE DUPLICATION

by

Evan Sullivan Forsythe

Copyright © Evan Sullivan Forsythe 2018

A Dissertation Submitted to the Faculty of the
SCHOOL OF PLANT SCIENCES
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2018

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that an accurate acknowledgement of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED: Evan Sullivan Forsythe

ACKNOWLEDGEMENTS

I would first like to thank my parents, Maureen and John, for their unending support.
I would not be writing this dissertation without the countless opportunities they have afforded me throughout my life. I would also like to thank my undergraduate mentor, Charles (Chuck) Rodell, for his infectious passion for genetics and evolution and for often pointing out to his class the beautiful fact that the same core biological concepts are at play ‘in carrots and frogs and everything in between’, an idea I extended to mustards and Neanderthals. I am extremely grateful for the mentorship I’ve received during my time at the University of Arizona. I would like to extend my appreciation to Mark Beilstein, Eric Lyons, Rebecca Mosher, Ravishankar Palanivelu, Michelle McMahon, Michael Sanderson, Andrew Nelson, David Baltrus, A. Elizabeth (Betsy) Arnold, Noah Whiteman, Jeremiah Hackett, Mike Barker, Ted Weinert, Michael Nachman, Frans Tax, Mike Frank, Jennifer Wisecaver, and Georgina Lambert. Thank you for the respect and guidance and for making academia a friendly place. I also extend thanks to the many undergraduate researchers in the Beilstein Lab who have made the lab a positive, welcoming, and fun environment. I thank The Integrative Graduate Education and Research Traineeship and The Boynton Graduate Fellowship in Plant Molecular Biology for financial support. I would also like to acknowledge my fellow graduate students, many of whom have become some of my very best friends. Thank you Kelly Dew-Budd, Stacy Jorgensen, Kyle Palos, Jennifer Nobel, Josh Trujillo, Kelvin Pond, Justin Shaffer, Aaron Ragsdale, Andrew Gloss, Florence Durney, Brianna McTeague, Noelle Bittner, Gleb Zhelezov, Jordan Brock, Ashlee Christine, and Nick Beauregard. Thank you Pierce Edmiston for your friendship and for encouraging me to pursue grad school. I would especially like to thank Shea Lambert and Anthony Baniaga for being the best colleagues, friends, and roommates anyone could ever ask for.
Thanks for the good times, the Perl scripts, and the late night discussions of species concepts. Finally, I extend heartfelt thanks to my siblings, Grant and Sarah, and all my beloved friends in the ‘real world’ who know that there is more to life than publications and p-values. Thank you Gwen Clarke, Caleb Johansson, Clint Baker, Craig Baker, Troy Schlicht, Tyler Peterson, Grady Sloan, Natalia Guzman, Francis Bacon, Bryan Jazzperson, Leo Flynn, Rachel Wehr, Ben Cahill, Bubble Baker, Alex Ralston, Bill Baron, and all the fine folks at Caffe Luce.

DEDICATION

To my mentor Mark Beilstein, a wonderful scientist and an even better human being.

TABLE OF CONTENTS

ABSTRACT
INTRODUCTION
1.1 LITERATURE REVIEW
1.2 EXPLANATION OF DISSERTATION FORMAT
PRESENT STUDY
2.1 EPISTATIC INTERACTIONS DRIVE BIASED GENE RETENTION IN THE FACE OF MASSIVE NUCLEAR INTROGRESSION
2.2 POLARIZATION OF INTROGRESSION WITH GENOME-WIDE DIVERGENCE PATTERNS
2.3 A GENOMIC ANALYSIS OF FACTORS DRIVING LINCRNA DIVERSIFICATION: LESSONS FROM PLANTS
REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C

ABSTRACT

Eukaryotic genomes are ever-changing mosaics of past evolutionary events. Each site in the genome provides information about the evolutionary processes that have shaped its identity. Phylogenetics has long used the nucleotide identities at genomic sites as characters with which to reconstruct the ‘tree of life’ that unites living organisms. However, it was soon recognized that different sites in the genome often evolve independently, and thus each site tells a story of its own evolution, which may conflict with the stories told by other sites in the same genome. For example, in contradiction to the tidy concept of bifurcating tree-like evolution, we recognize that genetic material can be transferred between species, causing independent branches to reunite (i.e. reticulate) to form a shape that is more web-like than tree-like, referred to as a phylogenetic network. Further, when this type of reticulation occurs, it may only affect a portion of the genome, meaning phylogenies from some sites in the genome will reflect a history of reticulation while others will reflect a history of simple speciation. Another evolutionary process that can affect portions of the genome is gene duplication, which can result in a locus in the genome sharing homology with other loci in the same genome.
When such multi-copy loci are used to infer the branching order of several species, it can become very difficult to distinguish orthologs from out-paralogs, thus obscuring which nodes on the gene tree correspond to speciation events and which correspond to duplication events. Both reticulation and gene duplication can result in discordance between phylogenies inferred from different regions of the genome, a condition termed phylogenetic incongruence. In plant evolution, reticulation and gene duplication are ubiquitous, and a large field of study is devoted to the phylogenetic challenges caused by these processes. However, in addition to complicating species tree inference, these two processes are known to have considerable effects on the functional evolution of species; both can underlie the acquisition of novel adaptive traits. Therefore, it is important to identify and characterize these processes in model and crop systems. Toward this goal, the ‘challenge’ of phylogenetic incongruence can be turned on its head and instead used as a signature that illuminates the genomic watermarks of historical reticulation and duplication. In this dissertation, I present three studies in which I used genome-wide patterns of phylogenetic incongruence to infer evolutionary processes. In Appendix A, my coauthors and I describe a reticulation event that occurred more than nine million years ago in Brassicaceae, and I explore the genomic and functional consequences that are still detectable in extant genomes. In Appendix B, I develop a method for determining which species received foreign genetic material during introgression events. I explore the versatility of this method by applying it to simulated whole-genome sequences, as well as to empirical genome data from extant and extinct hominins and from mosquitoes.
Finally, in Appendix C, I infer the complex duplication histories of more than 1,000 non-coding RNAs across the Arabidopsis genome, thereby revealing a mechanism underlying diversification in plant transcriptomes.
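The difficulty of telling speciation nodes from duplication nodes in a multi-copy gene tree can be made concrete with the species-overlap rule, a widely used heuristic that is not necessarily the approach taken in Appendix C: an internal node is labeled a duplication if its two child clades share at least one species, and a speciation otherwise. The minimal Python sketch below assumes a rooted, fully resolved gene tree written as nested tuples, with leaf names following a hypothetical `species_geneID` convention (e.g. `spA_g1`); the function names are illustrative only. It ignores gene loss and gene tree error, both of which can mislead the rule on real data.

```python
from typing import List, Tuple, Union

# A gene tree is either a leaf name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    """Hypothetical naming convention: 'spA_g1' -> 'spA'."""
    return leaf.split("_", 1)[0]


def label_nodes(tree: Tree, events: List[Tuple[str, frozenset]]) -> frozenset:
    """Post-order traversal of the gene tree.

    Returns the set of species found below `tree` and appends
    ('duplication' | 'speciation', species_set) to `events` for
    every internal node visited.
    """
    if isinstance(tree, str):
        return frozenset({species_of(tree)})
    left, right = tree
    left_sp = label_nodes(left, events)
    right_sp = label_nodes(right, events)
    # Species-overlap rule: if both child clades contain genes from the
    # same species, the two copies must have coexisted in one genome,
    # so this node is inferred to be a duplication.
    event = "duplication" if left_sp & right_sp else "speciation"
    all_sp = left_sp | right_sp
    events.append((event, all_sp))
    return all_sp


if __name__ == "__main__":
    # Hypothetical two-species gene family with one duplication at the root.
    gene_tree = (("spA_g1", "spB_g1"), ("spA_g2", "spB_g2"))
    events: List[Tuple[str, frozenset]] = []
    label_nodes(gene_tree, events)
    for event, species in events:
        print(event, sorted(species))
```

In this toy tree the root is labeled a duplication because both child clades contain genes from spA and spB, so `spA_g1` and `spA_g2` are paralogs, while `spA_g1` and `spB_g1` are orthologs joined by a speciation node. Reconciling the gene tree against a known species tree is the usual way to account for gene loss, which this sketch does not model.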