Evolutionary Inference from Endogenous Retrovirus Distribution and Diversity
Robert James Moncreiff Gifford

University of London
Imperial College of Science, Technology and Medicine
Department of Biology
Silwood Park

A thesis submitted for the degree of Doctor of Philosophy in the year 2002

PREFACE

The work in this thesis was carried out between October 1998 and August 2002. My research was supported by a studentship from the Natural Environment Research Council (NERC) and supervised by Dr M. Tristem. This thesis is the result of my own work except where explicitly stated in the text. The contents have not been previously submitted for any degree, diploma, or any other qualification at Imperial College or at any other university.

ACKNOWLEDGEMENTS

I would like to thank my friends and colleagues at Silwood Park for their support and understanding. Equally I would like to thank my friends elsewhere, and my family, for offering some respite from retroviruses and from science in general, and for their encouragement and unaccountable faith in me. Special thanks to Paul-Michael Agapow for his invaluable guidance and supervision in the realm of programming and bioinformatics, and to Vicki and her family for their wonderful hospitality during my stint in post-grant purgatory. Above all I would like to thank my supervisor, Dr Mike Tristem, for his support, guidance, and his remarkable patience and generosity.

ABSTRACT

Endogenous retroviruses (ERVs) are the relics of germline infections by ancient retroviruses. ERVs are widespread elements within the genomic DNA of vertebrates, and show great potential as markers of evolutionary processes. The work reported here is an exploration of the distribution and diversity of ERVs throughout vertebrate genomes, and of the kinds of evolutionary inference that can be drawn from them.
PCR screening and automated sequencing were used to amplify and characterise novel ERV fragments, and phylogenetic reconstruction was used to infer the relationships between them. A computer simulation model was developed and used to explore how ERV distribution and diversity are generated in response to varying ecological and evolutionary parameters. Simulation provided an experimental environment in which to model the relationship between the evolutionary history of host/ERV lineages and patterns of ERV distribution. Investigations using simulation suggested a general pattern for ERV evolution and indicated how events in the evolution of the host/virus lineage might shape ERV distribution and diversity.

TABLE OF CONTENTS

Preface 2
Acknowledgements 3
Abstract 4
Contents 5
Index of Figures 9
Index of Tables 11
Abbreviations 12
Retrovirus nomenclature 13
Aims 15

1) Introduction
1.1 Retroviruses 16
1.1.1 The Retroviridae 16
1.1.2 Retroviruses and medical science 16
1.1.3 Endogenous retroviruses 17
1.2 Reverse transcription - a unique genetic strategy 19
1.3 Retrovirus structure and genomic organisation 21
1.4 The retrovirus life cycle 25
1.4.1 Attachment and penetration 25
1.4.2 Reverse transcription 25
1.4.3 Nuclear entry and integration 30
1.4.4 Expression: Transcription 33
1.4.5 Expression: Translation 34
1.4.6 Assembly and budding 35
1.5 Retrovirus evolution 38
1.5.1 Rapid evolution of retroviral sequences 38
1.5.2 Mechanisms for gene exchange 39
1.5.3 Endogenous retroviruses 40
1.5.4 Reconstructing retrovirus relationships 43
1.6 Retrovirus distribution and diversity 49
1.6.1 Exogenous retrovirus diversity 49
1.6.2 Human endogenous retrovirus (HERV) diversity 49
1.6.3 ERV diversity throughout vertebrates 54
1.7 Analysis of ERV distribution and diversity 58
1.7.1 ERV host range 58
1.7.2 Cospeciation and horizontal transmission of retroviruses 58
1.7.3 ERV copy number 60
1.7.4 Age distribution of ERV insertions 60
1.7.5 ERV distribution within genomes 61
1.7.6 ERVs as clade markers 63
1.9 The aims of this study 64

2) The Distribution and Diversity of Class II Retroviruses
2.1 Introduction - A review of Class II diversity 65
2.1.1 Class II diversity: Alpharetroviruses 65
2.1.2 Class II diversity: Betaretroviruses 70
2.1.3 Class II diversity: Lentiviruses 73
2.1.4 Class II diversity: Deltaretroviruses 78
2.1.5 Class II diversity: IAP Elements 81
2.1.6 Class II diversity: Class II HERVs 82
2.1.7 Class II diversity: Divergent Class II ERVs 85
2.1.8 Using PCR screening to investigate the diversity of Class II ERVs 87
2.2 Materials 88
2.2.1 Media, plates and buffers 88
2.2.2 Vectors and bacterial strains 88
2.2.3 Enzymes 88
2.2.4 Gels, running buffers, and molecular weight markers 89
2.2.5 Oligonucleotide primers 89
2.2.6 Other reagents, kits and consumables 90
2.2.7 Equipment 90
2.3 Methods 91
2.3.1 DNA extraction 91
2.3.2 Polymerase chain reaction 91
2.3.3 Cloning - Ligation 93
2.3.4 Cloning - Transformations 94
2.3.5 Cloning - Plasmid DNA Preparation 95
2.3.6 Sequencing 96
2.3.7 Sequence identification and alignment 97
2.3.8 Phylogenetic analysis 98
2.3.9 Confirming sequence origin using PCR 99
2.4 Results 100
2.4.1 Design of novel primer pairs 100
2.4.2 Isolation of viruses 104
2.4.3 Confirmation of fragment origin 105
2.4.4 Sequence alignment 110
2.4.5 G-patch domain 112
2.4.6 Nonsense mutations (stop codons and frameshifts) 114
2.4.7 Phylogenetic analysis 119
2.4.8 The status of recognised Class II groups 124
2.4.9 Novel divergent groups 131
2.4.10 Distribution of nonsense mutations across phylogeny 133
2.4.11 Distribution of env types across mammalian Class II retroviruses 135
2.4.12 Avian class II retroviruses and host geographic range 139
2.5 Discussion 141
2.5.1 Phylogenetic analysis 141
2.5.2 Novel retrovirus groups 144
2.5.3 Horizontal transfer between host classes 145
2.5.4 Horizontal transfer within host classes 146

3) Simulation Modelling of ERV Evolution
3.1 Introduction 148
3.1.1 ERV distribution and diversity within species 148
3.1.2 Computer simulation of ERV evolution using an individual-based model 150
3.2 Approach 151
3.2.1 Model components 152
3.2.2 Input data 154
3.2.3 Output data 155
3.2.4 Model structure 155
3.3 Implementation 160
3.3.1 Materials - Software development environment 160
3.3.2 Random number generator 160
3.3.3 General features of the design and implementation process 161
3.3.4 Simulation components 161
3.4 Demonstration 166
3.4.1 TEST 1 - Population size and fixation frequency 166
3.4.2 TEST 2 - Gene density and fixation frequency 167
3.4.3 TEST 3 - Transposition rate and element population size 167
3.4.4 TEST 4 - Recombination 170
3.5 Application 172
3.5.1 Fixation and persistence of TE lineages 172
3.5.2 ERV clade growth 172
3.5.3 The effect of incomplete sampling on evolutionary inference 174
3.5.4 Consequences of sharing gene products 175

4) The Generation of ERV Distribution and Diversity
4.1 Introduction 177
4.1.1 The 'lifecycle' of ERV lineages 177
4.1.2 Colonisation, ERV diversity and host/virus ecology 177
4.1.3 Post-colonisation ERV evolution 178
4.2 Methods 180
4.2.1 The Passengers simulation 180
4.3 Results 180
4.3.1 Simulating colonisation 180
4.3.2 Persistence of ERV activity following fixation 185
4.3.3 Frequency of fixation 189
4.4 Discussion 192
4.4.1 Colonisation, the pace of amplification, and loss versus persistence of ERV lineages 192
4.4.2 The ERV lineage 'lifecycle' and the dynamics of ERV clade growth 195
4.4.3 Fixation frequency 198

5) Conclusions
5.1 The distribution and diversity of retroviruses 202
5.1.1 ERVs as evolutionary markers 202
5.1.2 ERVs as markers of exogenous retrovirus evolution 202
5.1.3 Class II retrovirus distribution and diversity 204
5.1.4 The generation of ERV diversity 190

6) References
References 207

7) Appendices
Appendix 1 Tissue and DNA sources 235
Appendix 2 Nucleotide alignment of Class II ERV pol fragments 239
Appendix 3 Characteristics of novel class II ERVs identified in this study 275

INDEX OF FIGURES

Nomenclature
A1) Retrovirus classification 14

1) Introduction
1.1 The retrovirus replication cycle 19
1.2 Reverse transcription and the central dogma 20
1.3 Schematic cross-section through a retroviral particle 21
1.4 Genome structure of a generalised retrovirus 23
1.5 The retrovirus life cycle 26
1.6 Reverse transcription 27-29
1.7 Integration 31
1.8 Fixation of an ERV insertion 41
1.9 DNA recombination events involving ERVs 41
1.10 An evolutionary tree of the retroelements 48
1.11 Taxonomy and sequence relationships of retroviruses 50
1.12 The relationships between exogenous retrovirus genera, HERV families, and some non-human ERVs 56
1.13 Tanglegram showing host/virus relationships 62
1.14 Fixed ERVs track host phylogeny 62

2) The Distribution and Diversity of Class II Retroviruses
2.1 A phylogeny of the class II retroviruses 66
2.2 Rous sarcoma virus (RSV) genetic map 67
2.3 Mason-Pfizer monkey virus (MPMV) genetic map 71
2.4 Human immunodeficiency virus type-1 (HIV-1) genetic map 74
2.5 Human T-cell leukemia virus type-1 (HTLV-1) genetic map 79
2.6 Deltaretrovirus relationships 79
2.7 Novel Class II ERVs identified by PCR screening 86
2.8 Positions of target motifs for primers within PRO-RT coding domain 101
2.9 An alignment showing the conserved motif 'DIG/KDAY' in the lentivirus genome 101
2.10 Comparison of primer efficiencies 103
2.11 PCR products and marker 104
2.12 Alignment of retroviral G-patch domains with other G-patch domains 113
2.13 Distribution of G-patch domain across Class II taxa 113
2.14 Comparison of nucleotide composition