Transposable Elements in Teleost Fish
Total Page:16
File Type:pdf, Size:1020Kb
Transposable elements in teleost fish In silico explorations of TE activity, diversity and abundance across 74 teleost fish genomes William Brynildsen Reinar Department of Biosciences Faculty of Mathematics and Natural Sciences UNIVERSITY OF OSLO June 2016 2 Acknowledgments First, a great thanks goes to my co-supervisor Ole K. Tørresen who provided me with invaluable guidance throughout the project. Thanks to Associate Prof. Alexander J. Nederbragt and Prof. Kjetill S. Jakobsen for accepting me in the genomics group and allowing me to write the thesis in a highly stimulating environment at CEES. My appreciation also goes to Srinidhi Varadharajan for fruitful discussions, as well as Dr. Michael Matschiner and Dr. Martin Malmstrøm for valuable input and for providing the genomes and the phylogeny used in the project. Thanks to Dr. Ning Jiang at Michigan State University for responding to my enquiries and providing me with some of the scripts used in the project. I am also grateful to Siv Hoff, Håkon Høgset and Caroline S. Mjønes for great input and support. Finally, thanks to my family for always being there for me. Oslo, May 2016 William Brynildsen Reinar 3 Copyright William Brynildsen Reinar 2016 Transposable elements in teleost fish: in silico explorations of TE activity, diversity and abundance across 74 teleost genomes William Brynildsen Reinar http://www.duo.uio.no Print: Reprosentralen, Universitetet i Oslo 4 5 Summary Transposable elements (TEs) comprise a major proportion of most large eukaryotic genomes. Through different mechanisms of transposition, TEs can move within genomes and cause genomic rearrangements. Large-scale studies of TEs have recently become possible through the advent of high-throughput sequencing. Thus, unprecedented opportunities for uncovering the TE compositions in genomes from a vast range of species have presented itself. Teleost fish are an infraclass within ray-finned fishes, with a long evolutionary history, that display vast diversity in morphology and a variety of life history strategies. Compared with other vertebrates, the genomes of teleost fish have been shown to inhabit a highly diverse composition of TEs, many of which show signs of recent activity. In addition, the link between genome size and TE abundances has been indicated to be particularly strong. However, less than 20 different teleost fish have so far been analyzed with regard to TE content. Using nine available and 65 newly sequenced teleost genome assemblies, this project aimed to explore TE diversity, abundance and activities within teleost fish genomes. A custom computational pipeline was devised to annotate the genomes of 74 different teleostean species, including a more extensive investigation of a recently improved version of the Atlantic cod genome. Results were coupled to a time-calibrated phylogenetic tree for an investigation of the TE data in a proper evolutionary context. The study revealed a substantial variation in TE content and that the variation mostly follows phylogenetic patterns. Unique in scale, this study uncovers that i) DNA transposons are predominant TEs, ii) the diversity of TE superfamilies/families is high and iii) that recent TE activity is common in teleost fish. In addition, an investigation of TE activity over time revealed the presence of a strong phylogenetic signal present in the repeat landscapes of genomes that diverged less than 20 million years ago. Last, a strong correlation between genome size and TE abundance are indicated. 6 Abbreviations TE Transposable element TR Tandem repeat LTR Long terminal repeat LTR-RT LTR retrotransposon TRIM Terminal repeat-retrotransposons in miniature TIR Terminal inverted repeat TSD Target site duplication LINE Long interspersed nuclear element SINE Short interspersed nuclear element CDA Copy divergence analysis HMM Hidden Markov model PBS Primer binding site PPT Polypurine tract GFF Generic feature format IGV Integrative Genome Viewer LCA Last common ancestor Mya Million years ago nt Nucleotides bp Base pairs cDNA Complementary DNA tRNA Transfer RNA 7 Table of contents TABLE OF CONTENTS ................................................................................................................................... 8 INTRODUCTION .............................................................................................................................................. 9 TRANSPOSABLE ELEMENT BIOLOGY ................................................................................................................................ 9 INFLUENCES ON HOSTS .................................................................................................................................................... 12 TRANSPOSABLE ELEMENTS IN TELEOSTS .................................................................................................................... 14 IN SILICO DETECTION ....................................................................................................................................................... 15 AIMS OF THE WORK .................................................................................................................................... 17 METHODS ....................................................................................................................................................... 18 CHOICE OF DE NOVO DETECTION TOOLS ....................................................................................................................... 18 BUILDING SPECIES-SPECIFIC LIBRARIES ....................................................................................................................... 21 COPY DIVERGENCE ANALYSIS ......................................................................................................................................... 25 GENOME ASSEMBLY DATA .............................................................................................................................................. 25 DATA ANALYSIS ................................................................................................................................................................ 26 RESULTS .......................................................................................................................................................... 27 VALIDATION OF THE ANNOTATION PIPELINE .............................................................................................................. 27 ANNOTATION OF THE ATLANTIC COD GENOME .......................................................................................................... 29 ABUNDANCE, DIVERSITY AND ACTIVITY ACROSS TELEOSTEAN LINEAGES ............................................................. 33 RELATION BETWEEN ABUNDANCE AND GENOME SIZE .............................................................................................. 41 DISCUSSION ................................................................................................................................................... 44 DEVELOPING A COMPUTATIONAL ANNOTATION PIPELINE ....................................................................................... 44 EVALUATING THE ATLANTIC COD ANNOTATION EFFORTS ....................................................................................... 46 IN LIGHT OF CURRENT KNOWLEDGE ............................................................................................................................. 47 PHYLOGENETIC PERSPECTIVES ...................................................................................................................................... 49 CONCLUSION AND FUTURE PERSPECTIVES ........................................................................................ 51 REFERENCES .................................................................................................................................................. 52 APPENDIX ....................................................................................................................................................... 58 8 Introduction Transposable elements (TEs) are a diverse group of genetic elements that share the capability of movement within a genome. Thus, they are often called ‘jumping genes’. Like viruses, TEs take advantage of the DNA replication and transcription processes of their hosts to facilitate self-propagation. However, unlike viruses, TEs rarely leave host cells. The unique ability of TEs to make copies of themselves appears to be a effective strategy for self-conservation, evident by their presence in genomes across the tree of life (Kazazian 2004). In fact, the transposase gene, encoded by a certain class of TEs, is the most abundant gene in nature (Aziz et al. 2010). TEs were first discovered in the 1940’s by Barbara McClintock (McClintock 1950). Through studies of seed color variation in maize, McClintock demonstrated that certain genetic elements have the capability to ‘jump’ within genomes and disrupt the function of genes, in her case the genes deciding seed color. This Nobel prize-winning discovery has had a profound impact of our understanding of genome architecture and dynamics, as DNA was so far considered to be rather static. Studies of TEs have since determined the different mechanisms of movement, which is a process that usually results in an increase of TE copies. Such replicative movement has led TEs to be a major fraction of many eukaryotic genomes, such as that of maize, in which