Chapter 1 - a Brief Overview of the Complexities of Repetitive DNA
Total Page:16
File Type:pdf, Size:1020Kb
UNCOVERING VARIATION IN THE REPETITIVE PORTIONS OF GENOMES TO ELUCIDATE TRANSPOSABLE ELEMENT AND SATELLITE EVOLUTION A Dissertation Presented to the Faculty of the Graduate School of Cornell University In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Michael Peter McGurk August 2019 © 2019 Michael Peter McGurk UNCOVERING VARIATION IN THE REPETITIVE PORTIONS OF GENOMES TO ELUCIDATE TRANSPOSABLE ELEMENT AND SATELLITE EVOLUTION Michael Peter McGurk, Ph. D. Cornell University 2019 Eukaryotic genomes are replete with repeated sequence, in the form of transposable elements (TEs) dispersed throughout genomes and as large stretches of tandem repeats (satellite arrays). Neutral and selfish evolution likely explain their prevalence, but repeat variation can impact function by altering gene expression, influencing chromosome segregation, and even creating reproductive barriers between species. Yet, while population genomic analyses have illuminated the function and evolution of much of the genome, our understanding of repeat evolution lags behind. Tools that uncover population variation in non-repetitive portions of genomes often fail when applied to repetitive sequence. To extend structural variant discovery to the repetitive component of genomes we developed ConTExt, employing mixture modelling to discover structural variation in repetitive sequence from the short read data that commonly comprises available population genomic data. We first applied ConTExt to investigate how mobile genetic parasites can transform into megabase-sized tandem arrays, as some satellites clearly originated as TEs. Making use of the Global Diversity Lines, a panel of Drosophila melanogaster strains from five populations, this study revealed an unappreciated consequence of transposition: an abundance of TE tandem dimers resulting from TEs inserting multiple times at the same locus. Thus, the defining characteristic of TEs—transposition—regularly generates structures from which new satellite arrays can arise, and we further captured multiple stages in the emergence of satellite arrays ongoing in a single species. We then investigated the complex array of processes which shape TE evolution, focusing on the putatively domesticated HeT-A, TAHRE, and TART (HTT) elements that maintain the telomeres of Drosophila. To provide context, we compared HTT variation to that of other TE families with known properties. Our results suggest that differences between HTT variation and other TE families largely result from the rapid sequence turnover at telomeres. We further suggest that the localization of the HTTs to the telomere reflects a successful evolutionary strategy rather than pure domestication. However, we find evidence that susceptibility to host regulation varies among HTTs and across populations, suggesting that despite constituting the mechanism of telomere maintenance, the HTTs remain in conflict with the genome like any other TE. BIOGRAPHICAL SKETCH The author grew up in Randolph, Massachusetts. He attended both Bridgewater State College and Bridgewater State University—as somebody saw fit to change the name partway through—and earned his Bachelor of Science in Biology with a minor in Biochemistry from Bridgewater State University in 2013. He began his doctoral studies in the field of Biochemistry, Molecular, and Cell Biology at Cornell University in 2013, and was supported during his first year by Cornell’s Presidential Life Sciences Fellowship. He joined the lab of Professor Daniel Barbash in 2014 who introduced him to the repetitive stuff in genomes and under whose mentorship he has developed and applied computational approaches to understand the evolution of repetitive elements. ACKNOWLEDGMENTS The years over which this work was carried out were challenging, stimulating, and, more than occasionally, fun. They tested and honed my critical thinking, my creativity, and my perseverance. At the end of this, I can say that this time has been immensely rewarding and I am incredibly grateful to have had this opportunity. The subjects of this gratitude, I think, can be found in considering that events could have played out otherwise, that if someone rebooted the Universe in Safe Mode and performed a System Restore, after rerunning the past five years, or ten years, or twenty years it might become clear that this outcome was not necessarily guaranteed. So in reflecting upon the people and circumstances that led me to this Present, rather some other conceivable present where this dissertation does not exist, where I did not get to experience the challenges, the frustrations, and the joys of these past six years, I find myself wanting to thank quite a few people. To Dan Barbash, my mentor over these years, I am immensely indebted. He not only gave me the opportunity to carry out this work but also the guidance I needed to develop as a scientist. I cannot help but think he took a risk in letting a first-year PhD student, with a background in molecular biology rather than computer science, pursue this heavily computational line of research as a dissertation project. But few things perhaps cultivate the confidence necessary for independent research better than having the space to fail, figure out why the ideas didn’t work, and then to try again. Dan not only provided the freedom to do so, but when I found myself lost helped me step back and consider the way forward. I have greatly valued his careful and meticulous approach to science, and the extent to which he cares about doing and communicating science well. This certainly improved the research I carried out, my writing and speaking abilities, and my approach to science. And when personal crises arose outside of my work, he always supported me and gave me the time and space I needed to resolve or make peace with them. I want to also thank my committee members, Andy Clark and Paul Soloway, whose thoughtful feedback both challenged me and improved this work. They were always willing to meet and talk and share their expertise, and always thought deeply about my projects. The members of their labs, too, were generous with their time and perspective. It was, for example, a conversation about a bioinformatic tool that David Taylor from Paul’s lab was employing in his own work that made me aware of the Expectation-Maximization acceleration procedure which I ultimately employed in this project. Members of the Clark and Feschotte labs also provided regular perspective and feedback during our monthly joint lab meetings. Dan Zinshteyn, Kevin Wei, Tawny Cuykendall, Shuqing Ji, Sarah Lower, Dean Castillo, and Anne-Marie Dion-Côté were my lab mates over the course of this work and made the environment a truly joyful place to work and do research. They provided support during the hard times, offered willing ears to bounce ideas off of, and were quite simply great fun to be around. I have many wonderful memories from the years I spent working alongside them. My time as an undergraduate at Bridgewater State University is where I discovered the wonderful complexity of biology and my love for it. The faculty care greatly about education and my decision to pursue graduate school was in no small part due to them, especially Drs. Boriana Marintcheva, Jennifer Mendell, and Steven Haefner. I felt at home there, in no small part to the emphasis placed on undergraduate education. It was also early during my time at Bridgewater that I struggled with and briefly dropped out of college. The months which followed were critical moments where had they gone differently, I would not be writing this. During this time, the support of my parents, Linnea and Peter McGurk, and my brother, Daniel McGurk, was crucial. My subsequent successes also owe much to Dr. Arunas Kuncaitis who strongly advocated that I continue at Bridgewater, helped me plan out the first few semesters of my return, and develop strategies to achieve what I was capable of. That was not the first time I struggled academically, rather it was the turning point. Up to this point, however, my mother had put considerable time and effort into navigating the conflicting bureaucracies of special education statutes and school administrators and sat through countless frustrating, probably embarrassing parent- teacher conferences. At the time it must have seemed like optimism verging on delusion, but she labored under the conviction that somehow these efforts would pay off and a time would come when I would thrive. I think this work would not have happened if she had not been there to help keep me afloat during those years. I am finally grateful for the researchers, the biologists, the mathematicians, computer scientists, philosophers whose own years of thought and effort laid down the foundations for this work, without whose contributions to the body of human knowledge this dissertation would not be possible. And also to any future readers, who perhaps (or perhaps not) will find the ideas expressed herein useful in their own work. Science, as in many human endeavors, is not the work of individuals laboring independently, but is built-up upon prior efforts and is in turn built upon. Any individual contribution is likely to be only a small part of the body that emerges, but it is both humbling and inspiring to be a part of it. TABLE OF CONTENTS Chapter 1 - A brief overview of the complexities of repetitive DNA ............................. 1 Transposable element evolution ....................................................................... 3 The principles