Bioinformatic and Phylogenetic Analyses of Retroelements in Bacteria

University of Calgary PRISM: University of Calgary's Digital Repository Graduate Studies The Vault: Electronic Theses and Dissertations 2018-11-29 Bioinformatic and phylogenetic analyses of retroelements in bacteria Wu, Li Wu, L. (2018). Bioinformatic and phylogenetic analyses of retroelements in bacteria (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/34667 http://hdl.handle.net/1880/109215 doctoral thesis University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca UNIVERSITY OF CALGARY Bioinformatic and phylogenetic analyses of retroelements in bacteria by Li Wu A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY GRADUATE PROGRAM IN BIOLOGICAL SCIENCES CALGARY, ALBERTA NOVEMBER, 2018 © Li Wu 2018 Abstract Retroelements are mobile elements that are capable of transposing into new loci within genomes via an RNA intermediate. Various types of retroelements have been identified from both eukaryotic and prokaryotic organisms. This dissertation includes four individual projects that focus on using bioinformatic tools to analyse retroelements in bacteria, especially group II introns and diversity-generating retroelements (DGRs). The introductory Chapter I gives an overview of several newly identified retroelements in eukaryotes and prokaryotes. In Chapter II, a general search for bacterial RTs from the GenBank DNA sequenced database was performed using automated methods. It not only enlarged the collection of bacterial reverse transcriptases (RTs), but also revealed several new classes of RTs. In Chapter III, another automated search was performed to identify group II introns. All predicted introns were automatically folded and then manually refined. Other information, such as multiple copies and non-standard intron organisations, were also identified. Next, all introns were subjected in several analyses in order to depict common properties for each class, such as preferences in target sites and DNA strands. Using this enlarged dataset, Chapter IV aimed to resolve the phylogeny of group II introns and investigate whether the intron-encoded protein (IEP) and RNA portions coevolved. Among trees constructed from various datasets, such as using different sequence masks, smaller sampled subsets or morphological features, the hypothesis that the IEP and RNA coevolved was supported by comparisons among most trees, even though it seemed to be rejected by formal topology tests. Finally, Chapter V compiled and systematically classified the most recent set of DGRs, which can be used as a reliable reference to direct future DGR-related studies and experimental designs. ii Preface Chapter II of this thesis has been published as Zimmerly S and Wu L. "An Unexplored Diversity of Reverse Transcriptases in Bacteria". Microbiol Spectr. 2015;3(2):MDNA3- 0058-2014. (As stated in "ASM Journals Statement of Authors’ Rights", an ASM author retains the right to reuse the full article in his/her dissertation or thesis.) Chapter V of this thesis has been published as Wu L, Gingery M, Abebe M, Arambula D, Czornyj E, Handa S, Khan H, Liu M, Pohlschroder M, Shaw KL, Du A, Guo H, Ghosh P, Miller JF, Zimmerly S. "Diversity-generating retroelements: natural variation, classification and evolution inferred from a large-scale genomic survey". Nucleic Acids Res. 2018;46(1):11-24. (This article is available under the Creative Commons CC-BY-NC license and permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly cited.) iii Table of Contents Abstract ........................................................................................................................... ii Preface ........................................................................................................................... iii Table of Contents ........................................................................................................... iv List of Tables ................................................................................................................ viii List of Figures ................................................................................................................. ix List of Abbreviations ....................................................................................................... xi Chapter I. Introduction .................................................................................................... 1 The diversity of reverse transcriptases ........................................................................ 1 Eukaryotic RTs ........................................................................................................... 2 Retroviruses and LTR retrotransposons .................................................................. 2 Non-LTR retrotransposons – LINE and SINE elements ........................................... 4 PLEs ....................................................................................................................... 6 DIRS ....................................................................................................................... 7 Pararetroviruses – hepadna- and caulimoviruses .................................................... 8 TERTs ..................................................................................................................... 9 rvt elements ............................................................................................................10 Prokaryotic RTs .........................................................................................................11 Group II introns .......................................................................................................11 Group II like RTs and CRISPR/Cas association .....................................................13 Retrons...................................................................................................................14 DGRs .....................................................................................................................15 Abi-related RTs ......................................................................................................17 Uncharacterised RTs ..............................................................................................17 Evolutionary relationship among RT-associated elements .........................................17 Contributions of bioinformatics approaches ................................................................20 Sequence-related analyses ....................................................................................20 Structure-related analyses ......................................................................................21 Phylogenetic analyses ............................................................................................22 Data handling and database management .............................................................24 Aims of this dissertation .............................................................................................24 Chapter II. Discovery of new reverse transcriptases in bacteria .....................................26 Abstract .....................................................................................................................26 Introduction ................................................................................................................26 Materials and Methods ...............................................................................................27 Initial BLAST searches for putative full-length RTs .................................................27 iv RT classification and class-wide alignments ...........................................................27 Domain composition analysis .................................................................................28 Results and Discussion ..............................................................................................30 The expanded collection of bacterial RTs ...............................................................30 Domain composition and functional prediction of RTs ............................................32 Conclusion .................................................................................................................34 Chapter III. Learning about group II introns through bioinformatic analyses ...................36 Abstract .....................................................................................................................36 Introduction ................................................................................................................37 Materials and Methods ...............................................................................................38 Intron collection from GenBank ..............................................................................38 Detection and classification of putative group II RTs ...........................................38 Filtering for incomplete group II RTs ...................................................................39

Bioinformatic and Phylogenetic Analyses of Retroelements in Bacteria

Discrete-Length Repeated Sequences in Eukaryotic Genomes (DNA Homology/Nuclease SI/Silk Moth/Sea Urchin/Transposable Element) WILLIAM R

Chromosomal Locations of Highly Repeated Dna

Inverted Repeats in Chloroplast DNA from Higher Plants* (Circular DNA/Electron Microscopy/Denaturation Mapping/Circular Dimers) RICHARD Kolodnert and K

Simian Virus 40 Tandem Repeated Sequences As an Element of The

Genome of an Acetabularia Mediterranea Strain (Southern Blot/Molecular Cloning/Dasycladaceae/Evolution) MARTIN J

Centromeric Satellite Dnas: Hidden Sequence Variation in the Human Population

Tandem Repeats Dispersed Repeats

Repeated Sequences in Linear Genetic Programming Genomes

Sequence Relationship Between Long and Short Repetitive DNA Of

Computational Method for Finding Repeated Elements

Structural Features of the Genome Can Lead to DNA Rearrangements And

Acinonyx Jubatus)