Evolutionary Inference from Endogenous Distribution and Diversity

Robert James Moncreiff Gifford

University of London

Imperial College of Science, Technology and Medicine Department of Biology Silwood Park

A thesis submitted for the degree of Doctor of Philosophy in the year 2002


PREFACE

The work in this thesis was carried out between October 1998 and August 2002. My research was supported by a studentship from the Natural Environment Research Council (NERC) and supervised by Dr M. Tristem.

This thesis is the result of my own work except where explicitly stated in the text. The contents have not been previously submitted for any degree, diploma, or any other qualification at Imperial College or at any other university.


ACKNOWLEDGEMENTS

I would like to thank my friends and colleagues at Silwood Park for their support and understanding. Equally I would like to thank my friends elsewhere, and my family, for offering some respite from and from science in general, and for their encouragement and unaccountable faith in me.

Special thanks to Paul-Michael Agapow for his invaluable guidance and supervision in the realm of programming and bioinformatics, and to Vicki and her family for their wonderful hospitality during my stint in post-grant purgatory.

Above all I would like to thank my supervisor, Dr Mike Tristem, for his support, guidance, and his remarkable patience and generosity.


ABSTRACT

Endogenous retroviruses (ERVs) are the relics of germline infections by ancient retroviruses. ERVs are widespread elements within the genomic DNA of vertebrates, and show great potential as markers of evolutionary processes. The work reported here is an exploration of the distribution and diversity of ERVs throughout vertebrate genomes, and of the kind of evolutionary inference that can be made from it. PCR screening and automated sequencing were used to amplify and characterise novel ERV fragments, and phylogenetic reconstruction was used to infer the relationships between them. A computer simulation model was developed and used to explore how ERV distribution and diversity are generated in response to varying ecological and evolutionary parameters. Simulation provided an experimental environment in which to model the relationship between the evolutionary history of host/ERV lineages and patterns of ERV distribution. Investigations using simulation suggested a general pattern for ERV evolution and indicated how events in the evolution of the host/ERV lineage might shape ERV distribution and diversity.


TABLE OF CONTENTS

Preface
Acknowledgements
Abstract
Contents
Index of Figures
Index of Tables
Abbreviations
Retrovirus nomenclature
Aims

1) Introduction

1.1 Retroviruses
1.1.1 The Retroviridae
1.1.2 Retroviruses and medical science
1.1.3 Endogenous retroviruses
1.2 Reverse transcription - a unique genetic strategy
1.3 Retrovirus structure and genomic organisation
1.4 The retrovirus life cycle
1.4.1 Attachment and penetration
1.4.2 Reverse transcription
1.4.3 Nuclear entry and integration
1.4.4 Expression: Transcription
1.4.5 Expression: Translation
1.4.6 Assembly and budding
1.5 Retrovirus evolution
1.5.1 Rapid evolution of retroviral sequences
1.5.2 Mechanisms for gene exchange
1.5.3 Endogenous retroviruses
1.5.4 Reconstructing retrovirus relationships
1.6 Retrovirus distribution and diversity
1.6.1 Exogenous retrovirus diversity
1.6.2 (HERV) diversity
1.6.3 ERV diversity throughout vertebrates
1.7 Analysis of ERV distribution and diversity
1.7.1 ERV host range
1.7.2 Cospeciation and horizontal transmission of retroviruses
1.7.3 ERV copy number
1.7.4 Age distribution of ERV insertions
1.7.5 ERV distribution within genomes
1.7.6 ERVs as clade markers
1.9 The aims of this study


2) The Distribution and Diversity of Class II Retroviruses

2.1 Introduction - A review of Class II diversity
2.1.1 Class II diversity: Alpharetroviruses
2.1.2 Class II diversity:
2.1.3 Class II diversity:
2.1.4 Class II diversity:
2.1.5 Class II diversity: IAP Elements
2.1.6 Class II diversity: Class II HERVs
2.1.7 Class II diversity: Divergent Class II ERVs
2.1.8 Using PCR screening to investigate the diversity of Class II ERVs
2.2 Materials
2.2.1 Media, plates and buffers
2.2.2 Vectors and bacterial strains
2.2.3 Enzymes
2.2.4 Gels, running buffers, and molecular weight markers
2.2.5 Oligonucleotide primers
2.2.6 Other reagents, kits and consumables
2.2.7 Equipment
2.3 Methods
2.3.1 DNA extraction
2.3.2 Polymerase chain reaction
2.3.3 Cloning - Ligation
2.3.4 Cloning - Transformations
2.3.5 Cloning - Plasmid DNA Preparation
2.3.6 Sequencing
2.3.7 Sequence identification and alignment
2.3.8 Phylogenetic analysis
2.3.9 Confirming sequence origin using PCR
2.4 Results
2.4.1 Design of novel primer pairs
2.4.2 Isolation of
2.4.3 Confirmation of fragment origin
2.4.4 Sequence alignment
2.4.5 G-patch domain
2.4.6 Nonsense mutations (stop codons and frameshifts)
2.4.7 Phylogenetic analysis
2.4.8 The status of recognised Class II groups
2.4.9 Novel divergent groups
2.4.10 Distribution of nonsense mutations across phylogeny
2.4.11 Distribution of env types across mammalian Class II retroviruses
2.4.12 Avian class II retroviruses and host geographic range


2.5 Discussion
2.5.1 Phylogenetic analysis
2.5.2 Novel retrovirus groups
2.5.3 Horizontal transfer between host classes
2.5.4 Horizontal transfer within host classes

3) Simulation Modelling of ERV Evolution

3.1 Introduction
3.1.1 ERV distribution and diversity within species
3.1.2 Computer simulation of ERV evolution using an individual-based model
3.2 Approach
3.2.1 Model components
3.2.2 Input data
3.2.3 Output data
3.2.4 Model structure
3.3 Implementation
3.3.1 Materials - Software development environment
3.3.2 Random number generator
3.3.3 General features of the design and implementation process
3.3.4 Simulation components
3.4 Demonstration
3.4.1 TEST 1 - Population size and fixation frequency
3.4.2 TEST 2 - Gene density and fixation frequency
3.4.3 TEST 3 - Transposition rate and element population size
3.4.4 TEST 4 - Recombination
3.5 Application
3.5.1 Fixation and persistence of TE lineages
3.5.2 ERV clade growth
3.5.3 The effect of incomplete sampling on evolutionary inference
3.5.4 Consequences of sharing gene products

4) The Generation of ERV Distribution and Diversity

4.1 Introduction
4.1.1 The 'lifecycle' of ERV lineages
4.1.2 Colonisation, ERV diversity and host/virus ecology
4.1.3 Post-colonisation ERV evolution
4.2 Methods
4.2.1 The Passengers simulation


4.3 Results
4.3.1 Simulating colonisation
4.3.2 Persistence of ERV activity following fixation
4.3.3 Frequency of fixation
4.4 Discussion
4.4.1 Colonisation, the pace of amplification, and loss versus persistence of ERV lineages
4.4.2 The ERV lineage 'lifecycle' and the dynamics of ERV clade growth
4.4.3 Fixation frequency

5) Conclusions

5.1 The distribution and diversity of retroviruses
5.1.1 ERVs as evolutionary markers
5.1.2 ERVs as markers of exogenous retrovirus evolution
5.1.3 Class II retrovirus distribution and diversity
5.1.4 The generation of ERV diversity

6) References

References

7) Appendices

Appendix 1 Tissue and DNA sources
Appendix 2 Nucleotide alignment of Class II ERV pol fragments
Appendix 3 Characteristics of novel class II ERVs identified in this study


INDEX OF FIGURES

Nomenclature

A1) Retrovirus classification

1) Introduction

1.1 The retrovirus replication cycle
1.2 Reverse transcription and the central dogma
1.3 Schematic cross-section through a retroviral particle
1.4 Genome structure of a generalised retrovirus
1.5 The retrovirus life cycle
1.6 Reverse transcription
1.7 Integration
1.8 Fixation of an ERV insertion
1.9 DNA recombination events involving ERVs
1.10 An evolutionary tree of the retroelements
1.11 and sequence relationships of retroviruses
1.12 The relationships between exogenous retrovirus genera, HERV families, and some non-human ERVs
1.13 Tanglegram showing host/virus relationships
1.14 Fixed ERVs track host phylogeny

2) The Distribution and Diversity of Class II Retroviruses

2.1 A phylogeny of the class II retroviruses
2.2 (RSV) genetic map
2.3 Mason-Pfizer monkey virus (MPMV) genetic map
2.4 Human immunodeficiency virus type-1 (HIV-1) genetic map
2.5 Human T-cell leukemia virus type-1 (HTLV-1) genetic map
2.6 relationships
2.7 Novel Class II ERVs identified by PCR screening
2.8 Positions of target motifs for primers within PRO-RT coding domain
2.9 An alignment showing the conserved motif 'DIG/KDAY' in the genome
2.10 Comparison of primer efficiencies
2.11 PCR products and marker
2.12 Alignment of retroviral G-patch domains with other G-patch domains
2.13 Distribution of G-patch domain across Class II taxa
2.14 Comparison of nucleotide composition across aligned region in different class II groups
2.15 Average number of stop codons and frameshifts in Class I and Class II ERV fragments obtained by PCR screening
2.16 Plot comparing the distribution of nonsense mutations in Class I and Class II ERV fragments obtained by PCR screening
2.17 Plot comparing the distribution of nonsense mutations within Class II ERV subgroups
2.18 Average number of stop codons and frameshifts in Class I and Class II subgroups


2.19 Plot comparing the distribution of nonsense mutations in Class I and Class II ERV subgroups
2.20 Neighbour joining phylogram showing Class II retrovirus relationships
2.21 Bootstrapped neighbour joining tree showing Class II retrovirus relationships
2.22 Bootstrapped strict consensus of six maximum parsimony trees showing Class II retrovirus relationships
2.23 Strict consensus of six maximum parsimony trees showing viral host class origin
2.24 An NJ phylogeny of Class II retroviruses constructed using 2847bp of pol
2.25 Alignment of paired LTRs from a HERV.K.HML-9 insertion
2.26 Strict consensus of six maximum parsimony trees showing stop codon/frameshift data
2.27 Distribution of Env types in mammalian Class II ERVs
2.28 Predicted distribution of Env types in mammalian Class II ERVs
2.29 Detail of MP strict consensus showing avian retroviruses and host geographic range
2.30 Model of Class II retrovirus evolution

3) Simulation Modelling of ERV Evolution

3.1 Conceptual representation of the simulation model
3.2 Syntax structure of a sample Passengers infile
3.3 Internal mechanics of the simulation model
3.4 Number of failed colonisations per 100 fixations over a range of population sizes
3.5 Number of failed colonisations per 100 fixations over a range of gene densities
3.6 Rate of element population size increase under varying transposition rates
3.7 Decay in linkage disequilibrium (D) over time at varying chiasmata frequencies

4) The Generation of ERV Distribution and Diversity

4.1 - 4.5 Plots showing element population size increase in 100 simulations carried out over a range of five transposition rates
4.6 Effect of population size on simulation outcome
4.7 Effect of gene density on simulation outcome
4.8 - 4.10 Persistence plots (host mutation only)
4.11 - 4.13 Persistence plots (host and element mutation)
4.14 Master/slave and multiple source models of ERV amplification
4.15 Effect of host population bottlenecks on element fixation frequency
4.16 The lifecycle of an ERV lineage


INDEX OF TABLES

1) Introduction

1.1 Retroelement nomenclature
1.2 Taxonomic characters of the seven retrovirus genera
1.3 Estimates of the fraction of the human genome for retroviral classes
1.4 General properties of HERV families

2) The Distribution and Diversity of Class II Retroviruses

2.1 The Alpharetroviruses
2.2 Lentivirus diseases
2.3 Primer sequences
2.4 Taxa screened for ERV insertions and viral fragments identified
2.5 Previously described Class II retroviruses included in this analysis
2.6 Novel ERV groups identified by phylogenetic analysis
2.7 Env homology in Betaretroviruses

3) Simulation Modelling of ERV Evolution

3.1 Components and classes


ABBREVIATIONS

A      adenine
T      thymine
G      guanine
C      cytosine
AIDS   acquired immune deficiency syndrome
BLAST  basic local alignment search tool
CA     capsid protein
ERV    endogenous retrovirus
HERV   human endogenous retrovirus
HGP    human genome sequencing project
IAP    intracisternal A-type particle
IN     integrase
LTR    long terminal repeat
MA     matrix protein
MaLR   mammalian apparent LTR retrotransposons
NC     nucleocapsid protein
PBS    primer binding site
PCR    polymerase chain reaction
PPT    polypurine tract
PR     protease
RH     ribonuclease H (RNase-H)
RT     reverse transcriptase
SU     surface glycoprotein
TE     transposable element
TM     transmembrane glycoprotein
VLP    virus-like particle


RETROVIRUS NOMENCLATURE

This thesis distinguishes three retroviral classes, Class I, Class II and Class III, in a classification system that integrates endogenous and exogenous retrovirus species (Figure A1). Although a phylogenetic relationship exists between endogenous and exogenous retroviruses, classification systems have developed separately for each. Seven exogenous retrovirus genera are recognised by the International Committee on the Taxonomy of Viruses (ICTV): Alpharetrovirus, Betaretrovirus, Gammaretrovirus, Deltaretrovirus, Epsilonretrovirus, Lentivirus and Spumavirus (Pringle, 1999). Amongst endogenous retroviruses (ERVs), separate classifications exist for human and non-human species. The 27 'families' of human ERVs (HERVs) so far identified are grouped into three 'Classes', based on sequence similarity to known exogenous genera. Class I HERVs are most similar to Gammaretroviruses, Class II HERVs are similar to Alpha- and Betaretroviruses, whilst Class III retroviruses show greatest similarity to Spumaviruses.

In phylogenies based on alignments of diverse retroviral RT genes from both exogenous and endogenous species, three clades emerge with robust bootstrap support. The relationships among the clades vary according to whether maximum parsimony or neighbour joining methods are used to construct phylogenies. However, the three clades are generally retained, suggesting three major divergences in the evolution of RT (see Figure A1 for references). These three clades give rise to the three retroviral Classes recognised in this thesis.


Figure A1. Retrovirus Classification

[Tree diagram not reproduced. Groups labelled on the tree: Class III (Spumaviruses; Class III HERVs, including HERV.L; diverse vertebrate ERVs); Class I (Gammaretroviruses; Epsilonretroviruses; Class I HERVs, including HERV.E, HERV.F, HERV.H, HERV.I, HERV.P, HERV.R, HERV.T, HERV.W and ERV.9; diverse lower vertebrate, mammalian and avian ERVs); Class II (Alpharetroviruses; Betaretroviruses; Deltaretroviruses; Lentiviruses; Class II HERVs, including HERV.K.HML-2, HERV.K.HML-5 and HERV.K.HML-6; IAP elements; rare lower vertebrate ERVs; diverse higher vertebrate ERVs).]

Figure A1. The tree above summarises known relationships between diverse retroviruses, based on phylogenies derived from alignments of retroviral RT genes. The data suggest there have been at least three major divergences during RT evolution. The terms Class I, Class II and Class III have come into common usage to differentiate between HERVs, according to which of the three major lineages their RT gene shows homology with. In this thesis, the same system is applied to all retroviruses, as illustrated above. Most endogenous retroviruses (ERVs) described so far belong to Class I, in which they show a tendency to cluster according to host class. The majority of known exogenous retroviruses fall into Class II. For references see Chiu et al, 1984; Callahan, 1988; Wilkinson et al, 1994; Tristem et al, 1996; Martin et al, 1997; Boeke and Stoye, 1997; Herniou et al, 1998; Benit et al, 1999; Andersson et al, 1999; Tristem, 2000; Katzourakis and Tristem, in press.


AIMS

Endogenous retroviruses (ERVs) are widespread elements within the genomic DNA of vertebrates. ERVs arise through germline infections by ancient retroviruses, and their distribution throughout vertebrate genomes is a reflection of numerous ecological and evolutionary processes. Recently, research has begun to explore the utility of retrovirus-derived sequences in vertebrate genomes as widespread markers of evolutionary processes. The primary aim of this study is to describe the distribution and diversity of a subset of ERVs (Class II ERVs) in a broad range of higher vertebrates. Polymerase chain reaction (PCR) and automated sequencing are used to amplify and characterise ERV fragments, and phylogenetic reconstruction is used to infer the relationships between them, and to investigate the evolutionary history of the Class II retroviruses as a whole. The second aim of this study is to investigate the kind of inference that can be made from ERV distribution data. Computer simulation is used to create an environment in which to study the effect of varying ecological parameters on ERV evolution. The development of the simulation model is described, and the model is used to investigate the effect of varying specific parameters on the generation of ERV distribution and diversity.


1. Introduction

1.1 Retroviruses

1.1.1 The Retroviridae

The retroviruses (Retroviridae) are a large family of viruses, distinguished by their mechanism of replication. In recent decades retroviruses have become increasingly significant to modern science, not only to medicine, as pathogens, and as tools for gene therapy, but also to the emerging science of genomics. The progress of genome sequencing in recent years has revealed that retrovirus-derived sequences are remarkably widespread features in vertebrate genomes. These endogenous retroviruses (ERVs) are apparently the relics of ancient infections. Recently, research has begun to explore the potential of ERVs as markers of evolutionary processes. This thesis aims to further explore that potential.

1.1.2 Retroviruses and medical science

Over the last twenty years, retroviruses have risen from relative obscurity to occupy a prominent position within medical science. This change has accompanied the emergence of retroviral infections as public health priorities, and the discovery of applications for retroviruses as molecular tools.

Retroviruses first came to widespread prominence in the early 1980s, when two related retroviruses, HIV-1 and HIV-2, were identified as the cause of acquired immunodeficiency syndrome (AIDS) (Gallo et al, 1984; Levy et al, 1984). Over the past two decades, the emergence of pandemic AIDS has stimulated a considerable amount of research into the biology of HIV infection (Joag et al, 1996). Despite this, there is still no cheap and reliable treatment for the disease. A World Health Organisation (WHO) global summary of the HIV/AIDS epidemic in 2000 estimated that a total of 21.8 million people have died from the disease since the beginning of the epidemic, and approximately 40 million people are currently living with HIV/AIDS (UNAIDS/WHO, 2000).

Prior to the emergence of HIV, the retroviruses were a relatively obscure family (Vogt 1997a). However, the ability of some retroviruses to induce tumours in their hosts had attracted scientific attention. Rous sarcoma virus (RSV) was identified in 1911 as the cause of chicken leukosis, a neoplastic disease of chickens (Rous, 1911), and subsequently, retroviruses capable of inducing tumours in mammalian hosts were identified. These retroviruses provided the first model systems for studies of carcinogenesis, and have delivered valuable insights into the general mechanisms of cancer, for example the role of oncogenes (Rosenberg and Jolicoeur, 1997; Vogt 1997a).

The use and study of retroviruses in the laboratory led to recognition of their potential as vectors for gene delivery. Recent attempts to treat children born with the genetic disease severe combined immunodeficiency (SCID) using retroviral vectors were initially hailed as technological breakthroughs (Cavazzana-Calvo et al, 2000). However, these efforts have since faltered, after two patients in trials developed leukemia (Check, 2003a). This development has cast some uncertainty over the future of gene therapy in general, and the applications of retroviruses therein (Check, 2003b).

1.1.3 Endogenous retroviruses

Studies throughout the 1960s and 1970s led to the surprising discovery that retroviruses occur naturally within the genomes of diverse vertebrates (Dougherty et al, 1967; Vogt, 1967; Weiss, 1967; Benveniste and Todaro, 1975; Benveniste and Todaro, 1977). These endogenous retroviruses (ERVs) are apparently derived from ancient retroviruses that infected the germline cells of vertebrate ancestors. Most ERVs have lost the ability to replicate through accumulated mutations and are not expressed, but a few are capable of expression. The ERV genes are sometimes expressed in certain tumour cell lines, causing them to spontaneously generate retroviral particles (Vogt and Friis, 1971; Levy, 1978; Kuff and Leuders, 1988).

ERVs have been found in the genomes of all vertebrates in which they have been searched for, with the exception of the most basal vertebrate lineage, Agnatha (hagfish and lampreys) (Herniou et al, 1998). The complete sequencing of the human genome has revealed thousands of human endogenous retroviruses (HERVs) (Tristem, 2000; Bock and Stoye, 2000). The extensive distribution of ERVs throughout the genomes of vertebrates indicates a longstanding and intimate relationship between retroviruses and their hosts, with genetic consequences that may have been important in vertebrate evolution.

Over the past decade, the potential utility of ancient ERV sequences as widespread 'markers' of evolutionary processes has increasingly been recognised (Herniou et al, 1998; Sverdlov, 2000; Hughes and Coffin, 2001). The interactions of retroviruses and their hosts throughout their evolution have shaped ERV distribution and diversity, potentially creating informative patterns. The aim of this investigation was to describe the distribution and diversity of ERVs, and to explore how patterns of ERV distribution and diversity are generated in response to varying ecological and evolutionary parameters.


1.2 Reverse transcription - a unique genetic strategy

All retroviruses undergo a remarkable replication cycle during which the normal direction of biological information flow is reversed. Extracellular retroviruses have genomes composed of RNA, but when a retrovirus infects a host cell, a viral enzyme called reverse transcriptase (RT) makes a DNA copy of the viral genome. The DNA copy is translocated to the nucleus where it is integrated into the chromosomal DNA of the host cell by a second virally encoded enzyme, called integrase (IN). The integrated DNA form of the retrovirus is referred to as a provirus. The provirus genes are expressed by cellular mechanisms, and direct the synthesis of progeny retrovirus virions.

Figure 1.1 The retrovirus replication cycle

[Diagram not reproduced. It shows an extracellular phase (the extracellular RNA retrovirus, with infection and release across the host cell membrane) and an intracellular phase (reverse transcription of RNA into DNA; integration into the nuclear DNA of the host cell, giving the integrated DNA provirus; expression of retroviral genes by host cell mechanisms; synthesis of new viral progeny).]

This model of retroviral replication met with some scepticism when it was initially proposed by Howard Temin in 1964, particularly since it necessitates a step during which RNA is transcribed into DNA (Temin, 1964). This step violates biology's 'central dogma' - that molecular information flows unidirectionally in biological systems, from DNA, through RNA, to protein (Crick, 1958). However, in time retroviral virions were demonstrated to contain an enzyme capable of channelling information from RNA back to DNA, providing conclusive support for Temin's hypothesis. This enzyme was called reverse transcriptase (Baltimore, 1970; Temin and Mizutani, 1970).


Figure 1.2 Reverse Transcription and the Central Dogma

(a) The Central Dogma states that biological information flows unidirectionally

[Diagram not reproduced: DNA (replication), transcription to RNA, translation to protein.]

(b) Reverse transcription allows information to flow from RNA back to DNA

[Diagram not reproduced: as in (a), with reverse transcription from RNA back to DNA.]

The central dogma states that the flow of information in biological systems is from DNA to RNA and then to protein, as shown above in (a). DNA is the repository of information. Genes are stretches of DNA that are transcribed into RNA copies (RNA transcripts). RNA acts as an intermediary to protein synthesis. RNA transcripts deliver the information contained within genes to the locations in the cell where protein synthesis occurs. There, transcripts are translated into polypeptides. Whenever the information contained within the DNA genome is copied (replicated), the process is tightly regulated to ensure that identical copies are created. Reverse transcription (b) contravenes the central dogma, by allowing the flow of information from RNA back to DNA. Reverse transcribed sequences may be integrated into genomic DNA.
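The information flow summarised in Figure 1.2 can be illustrated with a short sketch. The Python below is a toy model only: the function names, the four-codon table and the example sequence are assumptions introduced here for illustration, and real reverse transcription involves priming, strand transfers and RNase H activity that are not modelled.

```python
# Toy model of biological information flow (illustrative assumption, not thesis code).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Deliberately tiny codon table; the full genetic code has 64 codons.
CODONS = {"AUG": "M", "UUU": "F", "GGC": "G", "UAA": "*"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the transcript has the coding-strand sequence with U for T."""
    return coding_dna.replace("T", "U")


def translate(rna: str) -> str:
    """RNA -> protein: read codons from the start until a stop codon ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        residue = CODONS[rna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def reverse_transcribe(rna: str) -> tuple[str, str]:
    """RNA -> DNA: return (plus strand, minus strand) of the DNA copy."""
    minus = reverse_complement(rna.replace("U", "T"))  # DNA complementary to the RNA
    plus = reverse_complement(minus)                   # second strand, same sense as the RNA
    return plus, minus


gene = "ATGTTTGGCTAA"
rna = transcribe(gene)
plus_strand, minus_strand = reverse_transcribe(rna)
print(rna, translate(rna), plus_strand, minus_strand)
```

In this sketch the plus strand returned by `reverse_transcribe` is identical to the original coding sequence, which is the sense in which reverse transcription returns information from RNA to DNA.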

Retroviruses are not the only animal viruses that propagate via reverse transcription; this also occurs, for example, in Hepadnavirus replication. They are also not the only viruses known to integrate their DNA into that of the host cell. Another example of a virus exhibiting this behaviour is bacteriophage lambda (Lewin, 1997). However, retroviruses are the only viruses known to undergo both reverse transcription and integration steps as part of their life cycle. Most of the unique properties of retroviruses derive from their mechanism of replication. Integration sets up a uniquely intimate relationship between a retrovirus and its host. It underlies the ability of retroviruses to produce persistent infections and to establish chronically infected cells, to transform infected cells rapidly and efficiently, and to colonise the host germline and generate ERVs that persist as host alleles (Brown, 1997).


1.3 Retrovirus structure and genomic organisation

All retroviruses share a similar basic virion structure and genome organisation (Figures 1.3 and 1.4 respectively). Retroviral virions range from 80 to 100 nm in diameter. They have an internal protein core that contains the viral genomic RNA and is surrounded by a lipid envelope. The core structure is composed of matrix (MA), capsid (CA) and nucleocapsid (NC) structural proteins and contains reverse transcriptase (RT), integrase (IN), and protease (PR) enzymes. The viral envelope is derived from the host cell, and is formed when the virus particle buds from the host cell plasma membrane, in which viral glycoproteins are inserted. The glycoproteins are composed of transmembrane (TM) and surface (SU) subunits linked together by disulphide bonds (Vogt, 1997b).

Figure 1.3 Schematic cross section through a retroviral particle

[Diagram not reproduced. Labelled components: surface glycoprotein (SU), transmembrane glycoprotein (TM), lipid envelope, matrix protein (MA), capsid protein (CA), nucleocapsid protein (NC), genomic RNA, reverse transcriptase (RT), integrase (IN) and protease (PR). Scale bar: 100 nm.]


Retrovirus genomes typically range from 7 to 12 kb in size. The genome of an extracellular retrovirus is composed of linear, single-stranded RNA of positive polarity. Uniquely amongst viruses, retroviruses have diploid genomes - the genome is a dimer composed of two identical single-stranded RNA molecules. A single molecule of cellular tRNA is attached to a primer-binding site (PBS) located 1000 to 500 bases from the 5' end of the RNA genome. The tRNA has a role in initiating reverse transcription. Different species of retrovirus vary in the tRNA species that they recognise (Harada et al, 1979; Peters and Glover, 1980; Jiang et al, 1993; Holzschu et al, 1995; Vogt, 1997b). All retrovirus genomes are 5' capped and 3' polyadenylated, though this may simply reflect the synthesis of genomic RNA by cellular mechanisms (Vogt, 1997b).

The DNA form of the retroviral genome differs from the RNA in that it is flanked by characteristic long terminal repeat (LTR) sequences. The LTRs are generated by reverse transcription, and each LTR is composed of distinct U3, R and U5 regions that are derived from the non-coding ends of the RNA genome. U3 is derived from sequences unique to the 3' end of the RNA genome, R is derived from a sequence repeated at both ends, and U5 is derived from sequences unique to the 5' end of the RNA.
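The way the LTRs arise from the ends of the RNA genome can be sketched as a simple rearrangement of region labels. The Python below is an illustrative assumption: the function name and the list representation are hypothetical, and it records only the region order described above, not the steps of reverse transcription that produce it.

```python
# Illustrative sketch only: region labels stand in for sequence.
LTR = ["U3", "R", "U5"]


def provirus_layout(rna_regions: list[str]) -> list[str]:
    """Map the region order of the RNA genome to that of the DNA provirus.

    The RNA genome is expected to run R, U5, <internal regions>, U3, R.
    """
    if (
        len(rna_regions) < 4
        or rna_regions[:2] != ["R", "U5"]
        or rna_regions[-2:] != ["U3", "R"]
    ):
        raise ValueError("expected an RNA genome of the form R, U5, ..., U3, R")
    internal = rna_regions[2:-2]
    # Each LTR combines U3 (unique to the 3' end), R (repeated at both ends)
    # and U5 (unique to the 5' end) of the RNA genome.
    return LTR + internal + LTR


rna_genome = ["R", "U5", "gag", "pro", "pol", "env", "U3", "R"]
provirus = provirus_layout(rna_genome)
print(" - ".join(provirus))
```

In this representation the provirus gains one copy of U3 and one copy of U5 relative to the RNA genome, so that both ends carry a complete U3-R-U5 repeat.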

The retroviral genome contains three essential coding domains; gag, pol, and env. These domains are found in the same relative positions in all seven retrovirus genera. The gag domain encodes three non-glycosylated proteins, the matrix (MA) protein, capsid (CA) protein, and nucleocapsid (NC) protein that constitute the structural components of the viral core. The pol domain encodes the reverse transcriptase (RT) and integrase (IN) enzymes responsible for carrying out the reverse transcription and integration steps of replication, respectively. The env reading frame encodes two envelope glycoproteins, both of which are translated from a spliced subgenomic RNA. The larger of the two, the surface (SU) protein, protrudes from the virion envelope and is responsible for viral recognition of cell-surface receptors. The smaller transmembrane (TM) protein serves to anchor SU proteins in the viral envelope and contains domains responsible for the fusion of viral and cellular membranes that occurs when a retrovirus enters a host cell. A fourth coding domain, called pro, is found between gag and pol. The pro coding domain

22

Chapter 1 Introduction

a) Integrated DNA provirus

Accessory Accessory Accessory genes genes genes/

U3 R U5 gag pol env U3 R U5

--; -2 MA CA NC' PR RT RH ' IN — SU 1 TM ____...___.__.__I LTR LTR

Host genomic DNA

b) Full-length genomic RNA and messenger RNA for Gag and Gag-Pol PBS FS IV PPT R U5 ■ gag pro pol env U3 R 5' c'- 0 e' - t SD SA

Leader Region

c) Spliced subgenomic SD/SA

messenger RNA for 5, CAI' AAA 3, Env R U5 I U3 R PBS PPT

Figure 1.4 Genome structure of a generalised retrovirus

Regions of the genome are not shown to scale. The genome structure of a generalised DNA provirus, with long terminal repeats (LTRs), is shown in (a). The LTRs are composed of U3, R and U5 elements, and contain promoters and enhancers that mediate transcription. The gag, pro, poi, and env coding domains are located in the same relative positions in all retroviruses. In complex retroviruses, accessory genes can be located at either or both of the marked positions. The primary transcriptional product of transcription is the full length RNA shown in (b). Sequences important for replication are marked on the full length RNA in the approximate positions in which they are found; (PBS) primer-binding site; (w) encapsidation sequences; (SD) splice donor site; (SA) splice acceptor site; (FS) frameshift site; (PPT) polypurine tract; (PA) polyadenylation signal; (AAA) poly(A) tail. The spliced subgenomic mRNA for the Env protein is shown in (c). Complex retroviruses generate accessory gene products from a further subset of spliced mRNAs.


encodes the viral protease (PR) that carries out proteolytic processing of retroviral polyproteins. The arrangement of the pro coding domain with respect to gag and pol varies between retroviral genera and species; it can be translated as part of gag, part of pol or translated in a separate reading frame. Retroviruses are designated simple or complex according to whether one or more additional or 'accessory' genes are present in the genome. Simple retroviruses carry only the elementary information, whereas complex retroviruses code for additional regulatory proteins from multiply spliced messages (Vogt, 1997b).


1.4 The retrovirus life cycle

The retrovirus life cycle can be characterised as a series of six distinct stages: (1) attachment and penetration, (2) reverse transcription, (3) nuclear entry and integration, (4) transcription, (5) translation, (6) assembly and release from the host cell. These six stages of the retrovirus life cycle are discussed in order below. A summary of retrovirus replication is shown in Figure 1.5.

1.4.1 Attachment and penetration

A new cycle of infection begins when the glycoproteins protruding from an extracellular retrovirus attach to specific plasma membrane receptors on the host cell. Retroviral envelope glycoproteins are composed of two peptides, an external glycosylated hydrophilic polypeptide (SU) and a viral membrane-spanning protein (TM). The highly specific interaction of the SU domain with cellular receptors mediates fusion and determines the host range of the virus, defining susceptible host species and target cell types within the host animal. Numerous retroviral receptors have been identified and characterised at the molecular level (see Hunter, 1997, for review). In the case of HIV and other lentiviruses, the three-dimensional structure of the receptor has been determined and the viral glycoprotein-binding domain pinpointed. Most retroviruses utilize a pH-independent fusion mechanism; however, recent work has indicated that avian leukosis virus (ALV) utilizes a novel entry mechanism that combines aspects of both pH-independent and pH-dependent entry (Barnard and Young, 2003).

Following the attachment of retroviral Env glycoproteins to specific receptors on the host cell, fusion of virus and cell membranes propels the viral core inwards into the cytoplasm. The processes involved in mediating fusion are poorly understood. Studies of HIV-1 indicate that the viral core is a ribonucleoprotein complex that incorporates the genomic RNA and the RT enzyme (Wilk et al, 2001). Penetration into the cytoplasm apparently activates RT and initiates reverse transcription of the RNA genome.


Figure 1.5 The Retrovirus Life Cycle

[Diagram labels, in order of the infection cycle: extracellular virus; attachment to receptor molecule on the plasma membrane; penetration; reverse transcription; preintegration complex; nuclear entry; integration into host genomic DNA; integrated DNA provirus (LTR ... LTR); transcription; viral genomic RNA and RNA for Gag and Gag-Pol (5' cap, poly(A) tail); spliced subgenomic RNA for Env; translation on ribosomes (Gag, Gag-Pol); Env secreted into lumen of endoplasmic reticulum; genomic RNA packaged into virion; C-type/lentivirus assembly pathway; B/D-type assembly pathway; premature virion; assembly and release.]

Figure 1.5 All retroviruses have infection cycles with common steps. The infection cycle begins with the attachment of retroviral surface glycoproteins to specific receptors in the host cell, and the penetration of the cell membrane. Inside the host cell, retroviral RT catalyses reverse transcription of the retroviral genomic RNA, generating a DNA provirus in a preintegration complex. This complex moves to the cell nucleus, where retroviral integrase (IN) catalyses the integration of the DNA provirus into the nuclear DNA of the cell. Provirus genes are expressed by host cellular mechanisms, generating full length and spliced retroviral RNAs. Env glycoproteins are translated from subgenomic RNAs, and are threaded into the lumen of the endoplasmic reticulum. They are subsequently transported to the plasma membrane of the cell. Processing of Gag-Pol and Gag polyproteins generates the structural and enzymatic components of the retroviral particle. Particles assemble at the cell surface and bud from the plasma membrane into which Env glycoproteins are secreted.


Figure 1.6 (part 1) Reverse transcription of the retroviral genome

1) Minus strand synthesis is primed by a tRNA bound to PBS in genomic RNA

2) RT mediated extension generates minus strand strong stop DNA (-sssDNA)

3) RNase H activity of RT degrades RNA template

4) 1st strand transfer translocates -sssDNA to 3' end of the genomic RNA template (R sequences anneal)

[Diagram labels: viral genomic RNA (5' R U5 PBS PPT U3 R AAA 3'); tRNA primer; reverse transcriptase enzyme; DNA copy of RNA template; -sssDNA.]


Figure 1.6 (part 2) Reverse transcription of the retroviral genome

5) Minus strand synthesis continues following 1st strand transfer

6) PPT region resists RNase H degradation (RNase H activity degrades the rest of the template)

7) Plus strand synthesis primed by PPT region of RNA template that resists RNase H degradation

8) PBS is generated by copying from tRNA

9) RNase H degrades tRNA primer, leaving +sssDNA

[Diagram labels: tRNA; PBS; PPT; U3; R; U5; direction of DNA synthesis; +sssDNA.]


Figure 1.6 (part 3) Reverse transcription of the retroviral genome

10) 2nd strand transfer (PBS sequences anneal)

11) Plus strand synthesis completed; PPT primer removed

12) Minus strand synthesis completed

13) Generation of DNA provirus with identical long terminal repeat (LTR) termini

[Diagram labels: +sssDNA; PBS; PPT; direction of DNA synthesis; completed provirus with the segment order U3 R U5 PBS PPT U3 R U5, each U3 R U5 block forming one LTR.]


1.4.2 Reverse transcription

Reverse transcription (shown in Figure 1.6) involves two distinct activities of the RT enzyme - (1) a DNA polymerase that can use either RNA or DNA as a template, and (2) a nuclease termed ribonuclease H (RNase-H) that specifically degrades the RNA strand of RNA:DNA duplexes. However, a role for other retroviral proteins (possibly those derived from gag) cannot be ruled out. For example, the nucleoprotein complex of the core is apparently adapted to provide a particularly favourable environment for reverse transcription (Darlix, 1991; Nagy et al, 1994). The reverse transcription process involves two intramolecular 'strand transfers'. During each strand transfer, the growing DNA chain detaches from the RNA template and reattaches at a different position. The strand transfers generate LTR sequences at either end of the DNA provirus (Telesnitsky and Goff, 1997).

Synthesis of the first DNA strand by reverse transcription is primed by the 3' end of a partially unwound transfer RNA molecule attached by complementary base pairing to the viral PBS. As RT extends the DNA chain from the primer tRNA, the RNase-H activity of RT degrades the template RNA strand of the RNA/-sssDNA duplex. RT-mediated extension of minus-strand DNA synthesis proceeds until the 5' end of genomic RNA is reached, generating minus strand strong-stop DNA (-sssDNA). Next, the first strand transfer occurs, causing -sssDNA to be transferred from the 5' to the 3' end of a viral genomic RNA. This transfer is mediated by identical repeat (R) sequences found at either end of the retroviral genomic RNA. Once -sssDNA has been transferred to the 3' R segment on the viral RNA, minus strand DNA synthesis resumes, accompanied by RNase-H digestion of the template strand. The polypurine tract (PPT), however, is relatively resistant to RNase-H degradation, and a defined segment derived from the PPT persists to serve as a primer for plus strand synthesis. Plus strand synthesis is halted after a portion of the primer tRNA is reverse transcribed, yielding a segment of DNA called plus strand strong-stop DNA (+sssDNA). RNase-H removes the primer tRNA, exposing sequences in +sssDNA that are complementary to sequences in the PBS near the 3' end of the plus strand DNA. Following this, the second strand transfer


Figure 1.7 Integration

[Diagram, three stages: (1) provirus in pre-integration complex, with viral proteins bound at the LTR termini (U3 R U5), followed by 3' end cleavage; cleavage exposes -OH groups on the adenine of a conserved TG/AC dinucleotide (shown in blue) at the 3' ends; (2) target site in host DNA; (3) provirus integrated into target site.]

Target site in host DNA:

-TGGGTCCCTCTCCACTCTTCCGTCTTTATTTCTGCATG-
-ACCCAGGGAGAGGTGAGAAGGCAGAAATAAAGACGTAC-

Provirus integrated into target site (LTR: U3 R U5 at each end):

TCCACTCTTCCTGGAAGGCTAA- -AAATCTTAGCACTTCCGTCTTT
AGGTGAGAAGGACCTTCCGATT- -TTTAGAATCGTGAAGGCAGAAA

The viral DNA molecule at the completion of synthesis is a blunt ended linear molecule with termini corresponding to the boundaries of the long terminal repeats, as specified by the primers of plus- and minus-strand DNA synthesis during reverse transcription (see Figure 1.6). The viral DNA is contained in a preintegration complex that includes the integrase enzyme. Integrase cleaves the 3' termini of the viral DNA, eliminating 2-3 bases from each 3' end, exposing recessed 3' -OH groups on a phylogenetically conserved CA/TG dinucleotide (shown in blue) that defines the ends of the integrated provirus. Following cleavage, the preintegration complex enters the cell nucleus, where the -OH groups are used to attack host DNA at the target site (sequence shown in red) at positions staggered four to six bases in the 5' direction. The staggering of the attachment sites, combined with DNA repair in subsequent integration steps, leads to the duplication of four to six bases from the target site at either side of the integrated provirus. These features facilitate the identification of the ends of ancient ERVs.

The integration reaction duplicates 4-6 nucleotides from the target site (shown in red) as direct repeats either side of the integrated provirus. The conserved dinucleotide (shown in blue) defines the ends of the integrated provirus.

causes +sssDNA to detach from the template and reattach via annealing of the complementary PBS segments in +sssDNA and the minus strand DNA. Plus- and minus-strand syntheses are then completed, with the plus and minus strands of DNA each serving as a template for the other strand. The resulting DNA provirus is flanked by identical long terminal repeat (LTR) sequences (Telesnitsky and Goff, 1997).
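The way the two strand transfers turn an RNA with R at both ends into a provirus with a complete LTR at both ends can be sketched as simple bookkeeping on named segments. The following Python sketch is illustrative only and rests on stated assumptions: the function name `reverse_transcribe` and the segment list `GENOMIC_RNA` are invented for this example, segments are tracked as labels in plus-sense order rather than as nucleotides, and priming and RNase-H digestion are implied rather than modelled.

```python
# Segment-level sketch of LTR formation during reverse transcription.
# All names here are hypothetical; segments are labels, not sequences.

GENOMIC_RNA = ["R", "U5", "PBS", "gag", "pro", "pol", "env", "PPT", "U3", "R"]


def reverse_transcribe(rna_segments):
    """Return the provirus segment order (plus sense, 5' to 3')."""
    if rna_segments[0] != "R" or rna_segments[-1] != "R":
        raise ValueError("genomic RNA must begin and end with an R segment")
    pbs = rna_segments.index("PBS")
    ppt = rna_segments.index("PPT")

    # Minus-strand strong-stop DNA: copied from the PBS to the 5' end (R, U5).
    minus_sss = rna_segments[:pbs]

    # First strand transfer (via R): synthesis resumes at the 3' R and copies
    # the remaining template back to the PBS, so the finished minus strand
    # covers PBS ... U3 followed by the R, U5 already made.
    minus_strand = rna_segments[pbs:-1] + minus_sss

    # Plus-strand strong-stop DNA: primed at the PPT, it covers U3, R and U5.
    u3 = rna_segments[ppt + 1:-1]
    plus_sss = u3 + minus_sss

    # Second strand transfer (via PBS): completing both strands places a full
    # U3-R-U5 block upstream of the PBS as well as downstream of the PPT.
    return plus_sss + minus_strand


if __name__ == "__main__":
    print(" ".join(reverse_transcribe(GENOMIC_RNA)))
    # U3 R U5 PBS gag pro pol env PPT U3 R U5
```

The point of the sketch is only that U3 (present once, near the 3' end of the RNA) and U5 (present once, near the 5' end) each end up duplicated, which is why the two LTRs of a newly formed provirus are identical.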

1.4.3 Nuclear Entry and Integration

Reverse transcription of the viral genome forms a preintegration complex (PIC), consisting of the viral RNA/DNA and proteins that facilitate nuclear entry and integration (Bowerman et al, 1989). The PIC is first translocated from the cytoplasm to the nucleus. In HIV-1, studies using green fluorescent protein indicate that migration of the PIC to the nucleus involves dynein-mediated transport along microtubules (McDonald et al, 2002). The process of entry to the nucleus appears to vary between retroviruses. Murine leukemia virus (MLV) appears to be incapable of traversing the interphase nuclear membrane and must wait instead for the disassembly of the nuclear membrane at mitosis, and it is likely that similar restrictions apply to all simple retroviruses. In contrast, HIV-1 (and possibly other complex retroviruses) can enter non-dividing nuclei, apparently via signal-mediated, energy-dependent transport through nuclear pores (Lewis and Emerman, 1994). The HIV-1 PIC is able to traverse nuclear pores despite the fact that it measures more than twice the size of the nuclear pore central channel (Sherman and Greene, 2002). Studies indicate that viral proteins involved in this process include the HIV integrase (Gallay et al, 1997), MA protein (Bukrinsky et al, 1993), and the accessory protein Vpr (Connor et al, 1995). All three of these proteins carry nuclear localisation signals, and mutation of these has been shown to reduce the efficiency of HIV replication in non-dividing cells. However, mutation of the Vpr and MA nuclear localisation signals caused only partial and dose-dependent restriction of HIV import, suggesting that integrase may have the key role in facilitating nuclear import (Gallay et al, 1997). Additionally, an intermediate product of reverse transcription consisting of a triple-stranded 'DNA flap' has been demonstrated to play an important role in HIV-1 nuclear import (Zennou et al, 2000). Despite many recent advances in understanding of HIV-1 entry into the nucleus, the precise mechanism by which nuclear import is directed remains unknown (Sherman and Greene, 2002).

Within the nucleus, the viral integrase catalyses the integration of the retroviral genome into chromosomal DNA to form a stable provirus. The integration reaction (Figure 1.7) is initiated by the cleavage of both the 3' termini of the viral DNA by integrase, causing the loss of two to three base pairs from each 3' end. In all retroviruses, loss of the terminal bases exposes 3' -OH groups on a phylogenetically conserved CA/TG dinucleotide. The -OH groups are used to attack phosphodiester bonds on opposite strands of the target DNA, at positions staggered by four to six bases in the 5' direction. DNA synthesis extends the host DNA 3' -OH groups that flank the junction between virus and host DNA, filling in the gaps that flank the viral DNA and displacing the mismatched viral ends. Extension duplicates a 4-6 bp sequence from the target site, with the duplicated sequence flanking the integrated provirus as a direct repeat (Majors and Varmus, 1981).
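The origin of the flanking direct repeat can be illustrated with a toy string model. This is a minimal sketch under stated assumptions, not a description of the enzymology: the function `integrate`, the host sequence, and the placeholder provirus (beginning TG and ending CA, with N standing in for internal sequence) are all hypothetical, and only the top strand is represented. The staggered cuts are modelled by letting the host bases between the two cut positions appear on both sides of the insert, which is the net result of gap repair.

```python
# Toy model of target site duplication on integration (top strand only).
# Function name and sequences are hypothetical, chosen for illustration.


def integrate(host, provirus, site, stagger=5):
    """Insert `provirus` into `host`, duplicating `stagger` host bases.

    `site` is the index of the first base of the target site. The bases
    host[site:site + stagger] lie between the two staggered cuts, so after
    gap repair they flank the provirus on both sides as a direct repeat.
    """
    if not 4 <= stagger <= 6:
        raise ValueError("the text describes a stagger of four to six bases")
    if site < 0 or site + stagger > len(host):
        raise ValueError("target site must lie within the host sequence")
    target = host[site:site + stagger]
    integrated = host[:site + stagger] + provirus + host[site:]
    return integrated, target


if __name__ == "__main__":
    host = "AAACTTCCGGG"        # hypothetical host DNA
    provirus = "TGNNNNCA"       # placeholder provirus with TG ... CA ends
    integrated, repeat = integrate(host, provirus, site=3, stagger=5)
    print(integrated)  # AAACTTCCTGNNNNCACTTCCGGG
    print(repeat)      # CTTCC
```

In practice this pattern is read in the other direction: a candidate insertion bounded by TG ... CA and flanked by a short direct repeat of host sequence is consistent with integrase-mediated insertion.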

The number of potential sites for integration in the cellular genome is very large, and may include all points in the genome (Withers-Ward et al, 1994). However, all retroviruses exhibit preferential patterns of integration, and these patterns have been shown to vary amongst retroviral species. These preferences are generally influenced by the pattern of folding in nucleosomal DNA, which determines the accessibility of integration sites (Pryciak and Varmus, 1992). Sites where strand distortion widens the major grooves in DNA, such as are found in nucleosomes, are strongly preferred. Studies of integration patterns over whole genomes also suggest a preference for transcriptionally active regions (Shih et al, 1988). Host cell DNA-binding proteins may occlude certain sites, preventing their use. Integration of the retroviral genes is an inherently mutagenic process; it can potentially activate or inactivate cellular genes, and is one of the mechanisms by which retroviruses can induce tumours (Brown, 1997).

The integrated provirus behaves like a cellular gene; it is expressed by cellular mechanisms and replicated along with chromosomal DNA. Integration is effectively irreversible, although a homologous recombination between the two long terminal repeats (LTRs) can lead to partial deletion that removes the internal coding sequence and one of the LTRs, leaving behind a solo LTR (Figure 1.9a). Gene conversion can potentially lead to the complete elimination of proviral alleles in cells hemizygous for the provirus (Figure 1.9d), but is probably rare (Stoye, 2001).

1.4.4 Expression: Transcription

Transcription of the proviral genes is carried out by cellular RNA polymerase II, and is directed by a diverse array of cis-acting elements in the viral LTRs. The majority of cis-acting elements involved in the initiation of cellular transcription are located in the U3 region of the LTR. Although the promoter in the viral 5' LTR initiates transcription downstream, such that promoter sequences are not present in the full-length genomic transcripts used to form progeny virions, the LTR mechanism of retrotransposition regenerates these sequences during reverse transcription (see Figure 1.6). Simple retroviruses rely solely on the interaction of cellular factors with regulatory elements in the viral LTR for control of transcription. Complex retroviruses, in contrast, are able to regulate gene expression more precisely through virus-encoded trans-activating factors that affect the extent of transcription and control the relative amounts of the products of various genes. Retroviruses have also been shown to employ a wide range of host-cell transcription factors, including both ubiquitous and tissue-specific or ligand-dependent activators (Rabson and Graves, 1997).

Transcription of the provirus generates spliced and unspliced mRNAs. Unspliced mRNAs serve as the full-length progeny RNA genomes, and as mRNA for gag and pol. In spliced mRNAs, the splicing event fuses the 5' portion of the genomic RNA to downstream genes, most commonly env, but the accessory genes of complex retroviruses are also translated from a subset of spliced mRNAs. The intron between the splice donor and splice acceptor sites that is removed by splicing contains the gag, pro, and pol genes (Figure 1.4). All newly synthesised viral RNA products are 5' capped and 3' polyadenylated by the cellular RNA processing machinery (Rabson and Graves, 1997). The proper ratio of spliced to unspliced mRNA must be maintained for efficient replication. For simple retroviruses this ratio is determined by several cis-acting sequences that have only been partially determined. In many complex retroviruses splicing is regulated through the interaction of sequences on the RNA with proteins encoded by accessory genes (Bakker et al, 1996; Cullen, 1998; Maury, 1998).
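The relationship between the full-length transcript and the subgenomic env message can be sketched with the same kind of segment bookkeeping used for Figure 1.4. The sketch below is a simplification under stated assumptions: the function `splice` and the segment list are hypothetical, the splice donor (SD) and acceptor (SA) are represented as marker entries in the list, and the exact positions of SD and SA relative to neighbouring elements are not modelled.

```python
# Segment-level sketch of splicing the full-length retroviral transcript.
# Names are hypothetical; "SD" and "SA" are marker entries, not sequence.

FULL_LENGTH_RNA = ["R", "U5", "PBS", "SD", "gag", "pro", "pol",
                   "SA", "env", "PPT", "U3", "R"]


def splice(transcript):
    """Remove the intron between the SD and SA markers (inclusive)."""
    sd = transcript.index("SD")
    sa = transcript.index("SA")
    if sd >= sa:
        raise ValueError("splice donor must lie upstream of splice acceptor")
    # The 5' portion is joined directly to the sequence downstream of SA.
    return transcript[:sd] + transcript[sa + 1:]


if __name__ == "__main__":
    print(" ".join(splice(FULL_LENGTH_RNA)))
    # R U5 PBS env PPT U3 R
```

The spliced product retains both R segments and the 5' leader but lacks gag, pro, and pol, matching panel (c) of Figure 1.4.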

1.4.5 Expression: Translation

Viral transcripts are translated on ribosomes and processed by cellular mechanisms. Proteins derived from gag, pro and pol genes are translated from full length RNAs. Translation of full length viral RNA is usually terminated by a stop codon at the end of the gag gene, generating Gag polyproteins which are subsequently cleaved to yield the structural components of the viral core. However, a proportion of the ribosomes translating gag continue to translate the downstream genes, thereby generating a Gag-Pro-Pol fusion protein with Gag at its amino terminus. In MLV and related retroviruses, the gag stop codon is bypassed by readthrough suppression (Yoshimaka et al, 1985a; Yoshimaka et al, 1985b). In most retroviruses, however, it is bypassed by ribosomal frameshifting, during which ribosomes stall and shift into a new reading frame before continuing into the downstream gene. Ribosomal frameshifting is mediated by essential consensus secondary structures located downstream from the frameshifting site (Chen et al, 1995). In retroviruses in which the pro gene is in a reading frame by itself, there are two frameshift signals, one before pro and the other before pol. Frameshifting has apparently evolved as a simple strategy to provide the proper ratios of Gag, Gag-Pro, and Gag-Pro-Pol polypeptides in the infected cell (Atkins and Gesteland, 1999). A consequence of this strategy is that the enzymatic proteins required by the virus are fused to the structural proteins in the Gag polyprotein. This provides an indirect way to incorporate enzymes into the virion during assembly.
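How two frameshift signals set the relative amounts of the three polyproteins can be made concrete with a short calculation. This is a sketch under stated assumptions: the function name `polyprotein_fractions` is hypothetical, the two frameshift events are treated as independent with fixed per-ribosome probabilities, and the efficiencies used in the example (0.1 at each site) are illustrative placeholders rather than measured values.

```python
# Expected product fractions when pro occupies its own reading frame and
# two frameshift signals are traversed in series. Hypothetical names/values.


def polyprotein_fractions(p_gag_pro, p_pro_pol):
    """Return expected fractions of Gag, Gag-Pro and Gag-Pro-Pol.

    p_gag_pro: probability that a ribosome frameshifts from gag into pro.
    p_pro_pol: probability that a ribosome already in pro shifts into pol.
    """
    for p in (p_gag_pro, p_pro_pol):
        if not 0.0 <= p <= 1.0:
            raise ValueError("frameshift probabilities must lie in [0, 1]")
    return {
        "Gag": 1.0 - p_gag_pro,                    # terminates at gag stop
        "Gag-Pro": p_gag_pro * (1.0 - p_pro_pol),  # shifts once only
        "Gag-Pro-Pol": p_gag_pro * p_pro_pol,      # shifts at both signals
    }


if __name__ == "__main__":
    # Illustrative efficiencies only; not taken from the text.
    for name, fraction in polyprotein_fractions(0.1, 0.1).items():
        print(f"{name}: {fraction:.2f}")
```

With these placeholder values, roughly 90% of ribosomes would yield Gag, 9% Gag-Pro, and 1% Gag-Pro-Pol, which illustrates how low-efficiency recoding keeps the enzymatic products scarce relative to the structural ones.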

Env polyproteins composed of SU and TM subunits are synthesised from a spliced subgenomic RNA. Post-translational processing of the SU and TM proteins occurs within the lumen of the endoplasmic reticulum (ER). Env polyproteins are anchored in the lipid bilayer of the ER by a highly hydrophobic signal peptide.

1.4.6 Assembly and Budding

The final stage of the viral life cycle involves the assembly of translation products and progeny RNA into viral particles, and the release of particles from the cell in a process referred to as budding. At the cell periphery, progeny virus particles are enclosed by the lipid membrane of the host cell, which is modified by insertion of viral Env glycoproteins. Budding of the particle pinches off the region of the host cell plasma membrane that surrounds it, thereby forming the viral envelope in which Env glycoproteins are embedded (Hunter, 1997).

Retrovirus particles can be observed in electron micrographs, and show varying morphologies. A classification of retrovirus particle morphologies has been developed that distinguishes four particle types - A, B, C and D. Type A particles have one or two concentric rings, and appear to represent immature B and D particles (Teich, 1984). Type B particles are large with a dense round core that can appear lopsided in the mature virion, and have characteristically spiky surface glycoproteins. Type D particles have a bar-shaped core. Type B and type D particle morphologies are found in mature Betaretrovirus particles. Type C particles are observed in alpha- and gammaretroviruses and have a condensed, round, and slightly angular core centred towards the middle of the particle. Other retroviral morphologies have been observed, such as the 'cone-shaped' core of lentiviruses, but these have not been given a letter name (Swanstrom and Wills, 1997).

The pathways by which the translation products and progeny RNA of diverse retroviruses are assembled into viral particles exhibit many subtle variations, but there are essentially two major patterns (Teich, 1984). In the first, exhibited by betaretroviruses and spumaviruses, particles assemble in the cytoplasm and are subsequently transported to the plasma membrane. Betaretroviruses differ from spumaviruses in that cytoplasmic particles are immature (A-type), and mature as they bud from the cell. In the second (C-type) assembly pathway, translation products aggregate at the cell periphery and formation of particles occurs concurrently with budding. This pathway has been observed in alpharetroviruses (avian leukosis virus (ALV)), gammaretroviruses (murine leukemia virus (MLV)) and lentiviruses (HIV-1) (Swanstrom and Wills, 1997). In HIV-1, the protein components of the virion undergo extensive modification as they combine together to form the virion, such that the proteins that constitute the mature virion are not the same as those from which the virion is initially formed (Gelderblom, 1990). Maturation of HIV-1 virus particles is a complex and dynamic process, involving an unknown number of intermediate structures, most of which have not been elucidated (Vogt, 1996).

Retroviral assembly involves the packaging of two copies of the viral genomic RNA into virions, and dimerisation of the genomic RNA (Duesberg, 1968; Kung et al, 1975; Darlix et al, 1992). Both processes appear to be mediated by cis-acting sequence elements in genomic RNA (Beasley and Hu, 2002; Greatorex and Lever, 1998). Elements responsible for mediating packaging (packaging signals) have been identified in several retroviruses, primarily by deletion studies (Beasley and Hu, 2002). In some viruses (spleen necrosis virus, reticuloendotheliosis virus) they have been termed encapsidation sequences (E) (Watanabe and Temin, 1982), whereas in others they have been termed ψ (Mann et al, 1983; Adam and Miller, 1988). They are generally located in the 5' untranslated region of the genome, though some extend into the gag coding domain (Berkowitz et al, 1996). Packaging signals exhibit little sequence homology in different viruses, and packaging specificity is generally restricted to viruses of the same or closely related species (Swanstrom and Wills, 1997). However, similarities in RNA secondary structure have been observed between different viral packaging signals, and it has been proposed that hairpin loop structures may interact with Gag polyproteins in a mechanistically similar way in different viral species (Beasley and Hu, 2002).

The formation of dimers of genomic RNA is mediated by a dimer linkage sequence (DLS). DLS sequences have been mapped in several viruses, and are generally located in the 5' leader region of the genome (Greatorex and Lever, 1998). In HIV-1, the proximity of DLS sequences to the packaging signal has led to speculation that dimerisation and encapsidation may be linked (Berkout and van Wamel, 1996). However, studies with Rous sarcoma virus (RSV) indicate that monomers are packaged initially, and dimerisation occurs subsequent to packaging of a second monomer (Lear et al, 1995).

1.5 Retrovirus Evolution

Retroviral evolution is characterised by:

(1) The potential for extremely rapid sequence evolution.

(2) Mechanisms for gene exchange between retroviruses and their hosts, and between divergent retroviruses coinfecting the same host cell.

(3) The capacity of retroviruses to enter the host germline where they can replicate autonomously, effectively behaving as transposable elements.

1.5.1 Rapid evolution of retroviral sequences

Retroviruses have the potential to evolve very quickly indeed. Extremely high rates of sequence evolution have been calculated for some exogenous retroviruses, such as HIV-1. By way of illustration, it has been pointed out that the distance between the protease peptides of HIV-1 and HIV-2 is roughly equivalent to the distance observed between homologous proteins from eubacteria and eukaryotes, which diverged approximately two billion years ago. One of the most recent estimates of the rate of evolution of members of the HIV-1 M group is ~0.0024 substitutions per base pair per year in the env coding domain and ~0.0019 substitutions per base pair per year in the gag coding domain. At this rate, the diversity between the proteases of HIV-1 and HIV-2 could have accumulated in closer to two thousand years than to two billion years (Sala and Wain-Hobson, 2000).
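The scale of these rates can be illustrated with a back-of-envelope calculation. The sketch below is a rough illustration under stated assumptions: the function name `years_to_divergence` is hypothetical, the rate is assumed constant along both lineages, no correction is made for multiple substitutions at the same site, and the pairwise distance of 1.0 substitutions per site is a placeholder chosen for the example, not a figure from the text.

```python
# Back-of-envelope divergence time under a constant substitution rate.
# The function name and the example distance are assumptions.


def years_to_divergence(distance, rate_per_site_per_year):
    """Years for two lineages to accumulate a given pairwise distance.

    Both lineages change independently after they split, so pairwise
    distance grows at twice the per-lineage rate: t = d / (2 * r).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Rates quoted in the text for HIV-1 group M env and gag.
    for region, rate in (("env", 0.0024), ("gag", 0.0019)):
        years = years_to_divergence(1.0, rate)
        print(f"{region}: about {years:.0f} years per substitution per site")
    # env: about 208 years per substitution per site
    # gag: about 263 years per substitution per site
```

Even allowing generously for saturation and for constraint on amino acid change, figures of this order are consistent with the point made above: divergence that would take cellular genes billions of years could accumulate in retroviral sequences over hundreds to thousands of years.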


Variation is generated rapidly in actively replicating retrovirus populations because reverse transcription is highly error-prone. Numerous in vitro experiments have demonstrated the poor fidelity of RT compared with host DNA polymerases (Preston et al, 1988; Williams and Loeb, 1992; Bebenek and Kunkel, 1993). The RT enzyme has no proofreading function - that is, it has no mechanism by which it can locate and correct mismatched nucleotides after they have been added to the polynucleotide chain. The lack of proofreading activity probably accounts for the high error rate in RT, although the enzyme's tendency to extend mismatched primer termini may also be a factor (Pulsinelli and Temin, 1994). High error rates give rise to dynamic distributions of genetic variants, or quasispecies (Domingo, 2002). The rapid generation of variation enables retrovirus populations to adapt relatively quickly to changing selection pressures in the replication environment; expanding or altering their tropism, evading host immune defences, and resisting drugs designed to block their replication (Coffin, 1993; Wain-Hobson, 1993; Coffin, 1995).

Errors occurring during transcription may generate genetic variants that do not encode functional enzymes. These defective progeny may nonetheless be capable of active replication providing that the required enzymes are supplied in trans by other proviral insertions. There are many examples of replication defective retroviruses that require the presence of a 'helper' virus to replicate (Shimotohno and Temin, 1981; Wei et al, 1981; Tabin et al, 1982). Replication can continue even where the retroviral coding sequence has been replaced completely, providing that the promoter and the other essential regulatory sequences (PBS, PPT, ψ, and LTR regions) remain intact.

Not only do retroviruses mutate faster than eukaryotes, they generally also have vastly higher numbers of generations per unit time, resulting in a far higher evolutionary 'clock speed' for retroviral sequences compared to cellular genes. In some retroviral infections, the rate of replication is extremely high relative to host genomic DNA replication. HIV infection is marked by active virus replication (Wei et al, 1995), and this may account for the very high rates of sequence evolution observed for the HIV retroviruses as compared to other retroviruses (Sala and Wain-Hobson, 2000).


1.5.2 Mechanisms for gene exchange

Gene exchange between retroviruses and the host cell, and between distinct retroviruses, is facilitated by integration during the retroviral life cycle and by the diploid nature of the retrovirus genome. The retroviral virion contains two copies of the viral genome linked by regions near their 5' termini. If a cell is producing two different kinds of retroviral genomic RNA, copackaging of heterogeneous RNAs into viral particles can lead to recombination and the formation, in the next cycle of infection, of stable genetic recombinants (Hu and Temin, 1990). Recombination between genetic variants within a retroviral quasispecies probably facilitates genetic repair, and may have a beneficial role in generating variation, similar to the role that sex is thought to have in cellular organisms (Temin, 1991).

The exchange of sequence information via recombination can range from short intragenic sequences to entire coding domains, and can generate phenotypic variants with altered biological properties (Telesnitsky and Goff, 1997). For example, since Env SU glycoproteins constitute the main determinant of viral host range, exchange of env genes between divergent viruses could potentially give rise to recombinant retroviruses with radically altered tropism, and thereby precipitate horizontal transmission into new host species.

Retroviruses can also acquire cellular genes during integration. Sometimes during transcription of full-length genomic RNA, the 3' polyadenylation signal may be suppressed, such that transcription continues into cellular genes downstream of the insertion site. The cellular genes may be attached 3' to viral genomic DNA and packaged into infectious virions. This mechanism of acquiring new coding sequences has led to the capture of cellular proto-oncogenes by some retroviruses, generating acutely-transforming retroviruses - retroviruses with the capacity to rapidly induce tumours and transform cells in culture (Duesberg and Vogt, 1970; Stehelin et al, 1976).


Figure 1.8 Fixation of an ERV insertion (adapted from Page and Holmes, 1998)

[Diagram: a small host population shown over five successive generations (Generation 1 at the bottom to Generation 5 at the top), with a key distinguishing genomes with the ERV insertion from genomes without the ERV insertion.]

Retroviruses that integrate into germline cells can be vertically inherited, and providing they do not unduly harm the host, may reach fixation in the host population. The figure shows the fixation of an ERV insertion in a small population. The original insertion event occurs in generation one and is fixed in the population by generation five.
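The process shown in Figure 1.8 can be mimicked with a very small drift simulation. The sketch below uses a neutral Wright-Fisher model, which is a standard textbook simplification substituted here for illustration and is not the simulation model developed in this thesis. The function names, the population size, and the number of trials are all assumptions; the insertion is treated as selectively neutral and starts as a single copy among 2N chromosomes.

```python
# Neutral Wright-Fisher sketch of loss or fixation of a new ERV insertion.
# A textbook simplification with hypothetical names and parameter values.
import random


def simulate_insertion(n_diploid, rng):
    """Follow one new insertion until it is lost or fixed.

    Returns (fixed, generations), where `fixed` is True if the insertion
    reached all 2N chromosomes.
    """
    total = 2 * n_diploid
    copies = 1  # the original germline insertion
    generations = 0
    while 0 < copies < total:
        freq = copies / total
        # Each chromosome in the next generation samples a parent at random.
        copies = sum(rng.random() < freq for _ in range(total))
        generations += 1
    return copies == total, generations


def fixation_fraction(n_diploid, trials, seed=0):
    """Fraction of independent new insertions that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_insertion(n_diploid, rng)[0] for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    n = 10  # small population, in the spirit of Figure 1.8
    print(f"simulated: {fixation_fraction(n, 2000):.3f}")
    print(f"neutral expectation 1/(2N): {1 / (2 * n):.3f}")
```

Under neutrality most new insertions are lost within a few generations, and the fraction that fix is close to 1/(2N), so fixation of any single insertion is far more likely in a small population than in a large one.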

Figure 1.9 DNA recombination events involving ERVs (after Stoye, 2001)

[Diagram: four panels, (a) to (d), each showing proviruses and their LTRs before and after a recombination event.]

Figure 1.9 DNA recombination events involving ERVs (a) Recombination between the two LTRs of a single provirus resulting in the loss of one LTR and the viral coding sequences, and leaving behind a solo LTR. (b) Homologous recombination between two proviruses in the same chromosome resulting in a microdeletion and loss of the intervening sequences. (c) Recombination between the 3' and 5' LTRs of a given provirus leading to a tandemly duplicated provirus. (d) Gene conversion leading to gene exchange with no proviral loss.


1.5.3 Endogenous retroviruses

Retroviruses can infect most somatic cell types, and occasionally a retrovirus may infect the oocytes or early embryo of its host, thereby entering the germline. Provided that integration does not unduly harm the host, the provirus can subsequently persist within the germline as a host allele. Retroviruses that enter the germline in this way are referred to as endogenous retroviruses (ERVs).

The genomes of most vertebrates contain thousands of ERVs (Boeke and Stoye, 1997; Herniou et al, 1998). Detailed study of ERVs in the genomes of humans (Tristem, 2000), mice (Lueders and Kuff, 1983; Kozak et al, 1987) and chickens (Frisby et al, 1979) has revealed that ERV insertions in widely dispersed genomic locations may be very closely related, emerging as distinct monophyletic lineages in phylogenetic trees. The high degree of relatedness between insertions in many ERV lineages suggests that initial germline integration events have been followed by expression of ERV genes and an increase in copy number. This amplification presumably occurs through either reinfection of, or retrotransposition within, germline cells.

Some ERVs represent endogenised variants of extant exogenous virus strains. For example, mouse mammary tumour virus (MMTV) and Jaagsiekte sheep retrovirus (JSRV) are found both as ERV insertions and as infectious viruses in their host species (Nandi and McGrath, 1973; Palmarini et al, 1996; Cousens et al, 1999). However, the majority of ERV insertions are estimated to be millions of years old, and do not appear to have infectious counterparts. These ERVs are apparently derived from ancient infectious retroviruses that colonised the ancestral host germline.

Often, ERV insertions are found in the same genomic location in all members of a species; in other words, they are genetically 'fixed' (see Figure 1.8). Fixed insertions can sometimes be identified at equivalent genomic locations in related species, indicating that colonisation (and probably fixation) occurred prior to the divergence of the host taxa. Most insertions are estimated to be many millions of years old, and have lost their capacity to replicate through mutational decay. ERV insertions typically show signs of their age: the viral open reading frames are usually interrupted by stop codons and frameshifts, and contain numerous indels. Some ERV insertions have lost their env coding domains, while others have lost their entire coding sequence through homologous recombination between the two proviral LTRs, leaving behind solo LTRs (Bock and Stoye, 2000) (see Figure 1.9).
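The idea of a 'fixed' insertion can be illustrated with a minimal sketch of neutral genetic drift. The model below is an assumption made purely for illustration (a Wright-Fisher population of constant size, a single selectively neutral insertion, no further transposition); it is not the simulation model developed in this thesis, and the function names are hypothetical.

```python
import random


def simulate_insertion(pop_size, max_generations, seed=None):
    """Follow one neutral ERV insertion in a diploid population.

    The insertion starts as a single copy among 2 * pop_size chromosomes.
    Returns the number of chromosomes carrying it in each generation.
    """
    rng = random.Random(seed)
    n_chrom = 2 * pop_size
    copies = 1
    trajectory = [copies]
    for _ in range(max_generations):
        if copies in (0, n_chrom):
            break  # insertion has been lost or has reached fixation
        freq = copies / n_chrom
        # Each chromosome of the next generation is drawn independently
        # from the current gene pool (binomial sampling).
        copies = sum(1 for _ in range(n_chrom) if rng.random() < freq)
        trajectory.append(copies)
    return trajectory


def fixation_fraction(pop_size, replicates, seed=0):
    """Fraction of replicate runs in which the insertion reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        trajectory = simulate_insertion(
            pop_size, 10_000, seed=rng.randrange(2**32)
        )
        if trajectory[-1] == 2 * pop_size:
            fixed += 1
    return fixed / replicates


if __name__ == "__main__":
    print(simulate_insertion(pop_size=5, max_generations=1000, seed=1))
    print(fixation_fraction(pop_size=5, replicates=2000))
```

Under these assumptions most new insertions are lost within a few generations, and the fraction that reach fixation should be close to the initial frequency of the insertion, 1/(2N). This is one reason why fixation is more readily observed in small populations such as that of Figure 1.8.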

Potentially, sequence information could be exchanged (via recombination) between diverse ERV lineages, or between ERVs and infectious exogenous retroviruses, generating new viral phenotypes. Recombination could occur between copackaged genomic RNAs from two different ERV lineages or between ERV genomic DNA and genomic RNA derived from exogenous retroviruses. Thus ancient ERV sequences could potentially be recirculated in infectious virus populations after remaining dormant as endogenous sequences for thousands or perhaps even millions of years. ERV sequences in host genomes may thus act as a reservoir of viral sequence information from which, potentially, new viral phenotypes can arise.

The pathogenic potential of ERV sequences in animal organs destined for use as transplants in human recipients (xenotransplants) is an area of current medical concern. Pigs, which are considered potential donors for human xenotransplants, harbour porcine ERVs (PERVs) with unknown pathogenic potential. This has caused serious concern about the possible transmission of novel viruses to humans via a transplanted organ. Transmission of PERV to human cells has been documented under in vitro conditions, but not in vivo. The possible consequences of introducing PERV into immunocompromised humans are not known and require further research (Blusch et al, 2000; Platt, 2000; van der Laan et al, 2000).


1.5.4 Reconstructing retrovirus relationships

Taxonomic characters that are distinctive for various species and genera within the retrovirus family include the presence or absence of accessory genes, presence or absence of oncogenes, site of virion assembly, host range, and shape of the viral core (see Table 1.2). However, these characters have largely been superseded by phylogenetic reconstruction based on sequence comparison. Table 1.2 gives the species demarcation criteria for each of the seven retroviral genera. Phylogenetic reconstruction is usually based on the alignment of retroviral polymerase (RT) genes, since these tend to be the most conserved in the viral genome.

Phylogenetic inference based on genome sequence data is almost certainly the key to future advances in viral taxonomy. However, in many respects, its use with regard to retroviruses is potentially problematic. One problem is that, as discussed in section 1.5.1, retroviral sequences can evolve very rapidly. In theory, the rapid evolution of retroviral sequences might be expected to obscure homology and/or phylogenetic signal, making it difficult or impossible to resolve evolutionary relationships with confidence.

It is perhaps surprising then that marginal sequence similarities have been identified not only between the RT genes of diverse exogenous retroviruses, but also between the RT genes of retroviruses and RT sequences derived from a wide range of RT-encoding non-viral elements (see Box 1). Many of these elements diverged from one another millions upon millions of years ago. Nevertheless, most students of retroviral evolution now accept that diverse RT genes share several regions of homology and may share a common origin. Alignment of RT genes from diverse sources has identified six conserved domains within the enzyme. RT alignments have been used to explore the evolutionary relationships of retroviruses to one another, and to other RT-encoding elements, using a variety of tree-building algorithms (Doolittle et al, 1989; Xiong and Eickbush, 1990; Herniou et al, 1998; Martin et al, 1999a). This analysis has helped define an overarching assemblage of retroelements, of which the retroviruses form part (Xiong and Eickbush, 1990; Boeke and Stoye, 1997).


Phylogenies based on RT have been instrumental in the development of both retrovirus and retroelement taxonomy. Trees constructed using RT genes of diverse retroelements lend some support to the idea that retroviruses evolved from non-viral elements (see Box 1, Figure 1.10). Consequently, retroviral RT phylogenies are sometimes rooted on the RT genes of LTR retrotransposons.

A second problem for phylogenetic reconstruction of retrovirus relationships is the potential for recombination between divergent retroviruses. Failure to identify recombination events could result in misleading phylogenies. If the rate of sequence exchange between diverse retroviruses is high, phylogenetic reconstruction could rapidly become a meaningless exercise; the evolutionary history of the retroviruses would resemble a tangled network rather than a tree. However, if recombination between divergent species is relatively rare, recombination events can serve as informative directional markers for evolution (Doolittle et al, 1989). So far, recombination between distantly related retroviruses appears to be relatively rare. Independent trees based on retroviral sequences (RNase H, IN, TM) are generally congruent with those based on RT. In some cases, however, incongruent trees strongly indicate that gene exchange between diverse retroviruses has occurred during evolution (Benit et al, 2001).


Box 1. The Retroelement Assemblage

Retroviruses are not the only parasites to encode reverse transcriptase. It is encoded by viruses of two other families (the Hepadnaviridae and Caulimoviridae, collectively referred to as pararetroviruses) and by a wide range of retrotransposable elements, or 'retrotransposons', inhabiting the genomes of prokaryotes and eukaryotes. Retrotransposons are genes, or modules of genes, that replicate autonomously within genomes. Like infectious viruses, retrotransposons are parasitic nucleic acids that do not encode organisms. However, they differ in that they spread through populations primarily via vertical inheritance as proviruses within germ line cells, rather than by horizontal transfer between individuals. Typically, cellular mechanisms express the retrotransposon genes, which are then reverse transcribed back into DNA and reintegrated at a new genomic location. This 'copying and pasting' process is referred to as retrotransposition. The RT enzyme used in retrotransposition is usually encoded by the element itself, but may also be 'borrowed' from (supplied in trans by) another element. Together retroviruses, pararetroviruses and retrotransposable elements form an assemblage of 'retroelements' that share the common feature of replication by reverse transcription. The retroelements are a remarkably diverse and successful assemblage, and have been detected in the genomes of organisms ranging from protozoa to human beings (Eickbush, 1994). Table 1.1 summarises the various retroelement types.

Table 1.1 Retroelement nomenclature (after Boeke and Stoye, 1997)

Name used here | Common synonyms | Description | Example (host)
Endogenous retrovirus | | Retrovirus found as germline provirus; often not infectious | HERV.H (human); IAP (mouse)
LTR retrotransposon | Type-I retrotransposon | LTR-containing retrotransposon; not infectious as part of normal life cycle | Ty1 (yeast); Gypsy (fruit fly)
Non-LTR retrotransposon | Poly(A) retrotransposon; type II retrotransposon; non-viral retrotransposon; LINE | Retrotransposon lacking terminal repeats; usually has poly(A) or similar structure at 3' end | L1 (human)
Pararetrovirus | | Infectious DNA viruses that replicate via RNA intermediate | HHBV (human)
Retroplasmid | | Plasmid encoding RT | Mauriceville (Neurospora)
Retrointron | Mobile group II intron | Intron encoding RT | aI2 (yeast)
Retrotranscript | Retroposon; SINE; processed pseudogene | Elements that do not encode RT that apparently transpose via an RNA intermediate | Alu (human)
Retron | msDNA | Unusual branched nucleic acid with RNA and DNA component. DNA segment encoded by adjacent DNA. | msDNA (bacteria)


The best studied of the non-infectious, RT-encoding retroelements are the LTR and non-LTR retrotransposons. These two groups are distinguished by genomic structure and replication strategy.

LTR-retrotransposons

LTR retrotransposons have been studied extensively in Saccharomyces cerevisiae and Drosophila melanogaster, where they were first discovered (Eickbush, 1994; Marlor et al, 1986). Many more LTR-retrotransposons have since been characterised and they appear to be remarkably widespread and abundant in eukaryotes (Eickbush, 1994). Most have a similar genetic organisation, encoding two main open reading frames, gag and pol, and having characteristic LTR sequences at the 3' and 5' termini of their DNA form. As in retroviruses, the pol open reading frame encodes catalytic proteins involved in reverse transcription while the gag open reading frame encodes structural proteins. During replication the element genes are transcribed and translated by cellular mechanisms. Structural proteins encoded by gag assemble into intracellular virus-like particles (VLPs). Genomic RNA is packaged into VLPs where it associates with the element polymerase and is reverse transcribed. Reverse transcription generates a DNA copy of the genomic transcript with complete LTR sequences at either end. This DNA intermediate migrates to the nucleus where integrase bound to the LTR sequences generates a staggered cut in the chromosomal DNA and catalyses the integration of retrotransposon genes.

Non-LTR retrotransposons

The non-LTR retrotransposons comprise a diverse assortment of simple retroelements (Boeke and Stoye, 1997; Eickbush, 1994). As their name indicates, these elements lack LTRs. Replication of non-LTR retroelements is not well understood, but Eickbush (1994) proposes a model in which translation of the retrotransposon genes generates a protein that has both reverse transcriptase and endonuclease activities. This protein associates with a full-length, polyadenylated RNA transcript of the retrotransposon genome, and the complex of RNA and protein migrates to the nucleus where it generates a chromosomal break (which may be a single stranded nick or a double stranded break). The 3' end of one of the nicked strands then acts as a primer for reverse transcription of the first DNA strand (negative strand). Second strand synthesis might be completed by element-encoded enzymes, or by host DNA repair mechanisms. Non-LTR retrotransposons rely on cellular RNA polymerase II for transcription, but eukaryotic RNA polymerase promoter sequences are typically located upstream from the transcription initiation site, posing the problem that any promoter sequence encoded by the retrotransposon itself would not be present in the resulting transcript. The LTRs of LTR-retrotransposons provide an elegant solution to this problem, because they are regenerated, along with the promoter sequences they contain, during replication of the element. In non-LTR retrotransposons, the problem must be overcome by different means, such as the use of 5' promoters capable of initiating transcription upstream (Mizrokhi et al, 1988; Minchotti and Di Nocera, 1991; Swergold, 1990), or by site-specific insertion downstream from host promoter sequences such that transcription is guaranteed (Eickbush, 1994).

Retroelement Phylogeny

Alignment of RT genes from diverse RT-encoding retroelements has identified at least six regions of homology. Xiong and Eickbush (1990) aligned the RT genes of various retroelement groups and used phylogenetic reconstruction to estimate the evolutionary relationships between them. Assuming that the RT genes of the entire retroelement assemblage share a common origin, the tree might be rooted on an ancient ancestral reverse transcriptase. Although this view remains contentious (Temin, 1989), it is thought by some commentators that RT may be a very ancient molecule that served an important archiving function during the transition from RNA to DNA genomes (Boeke and Stoye, 1997).


Figure 1.10 An evolutionary tree of the retroelements (adapted from Boeke and Stoye, 1997)

[Tree diagram. Labels: Ancient RT; Non-infectious retroelements; Non-LTR retrotransposons; Retrointrons; Retrons; Retroplasmids; LTR-retrotransposons; Hepadnaviruses; Pararetroviruses; Retroviruses.]

The rooted tree (Figure 1.10) suggests a straightforward hypothesis for the evolution of the different retroelement types: that they arose by becoming progressively more complex. This could have occurred by stepwise addition of new modules of genetic information, including cis-acting regulatory sequences and coding sequences. This general hypothesis is widely, though not universally, accepted among commentators on retroviral evolution (Boeke and Stoye, 1997).

Furthermore, there are differing views on the evolutionary relationship between the infectious retroviruses and pararetroviruses and the LTR-retrotransposons (see Figure 1.10). Phylogenies based on RT seem to suggest a single origin of the vertebrate retroviral branch elements (Mike Tristem, personal communication). However, it is by no means easy to chart an evolutionary course using contemporary sequence data, and it remains possible that numerous independent acquisitions of infectivity have generated multiple independent viral lineages (Coffin, 1993). Moreover, loss as well as acquisition of sequence modules can occur in evolution, so it is presumptuous to assume unidirectional evolution from simple to complex forms. For example, the colonisation of the host germ line by endogenous retroviruses may be followed by adaptation to a retrotransposon-like replication strategy, with ERV insertions spreading through the host population by retrotransposition within germ line cells. Such adaptation is indicated in some ERV lineages by the widespread distribution within some genomes of closely related ERV insertions that lack functional env genes (Hirose et al, 1993). Some commentators have argued that infectious retroviruses are the progenitors of at least a proportion of the eukaryotic LTR-retrotransposons (Coffin, 1993).


1.6 Retrovirus distribution and diversity

1.6.1 Exogenous retrovirus diversity

The International Committee on the Taxonomy of Viruses (ICTV) currently distinguishes seven genera within the retrovirus family, in a classification that is almost exclusively concerned with exogenous isolates (Pringle, 1999). The seven genera are the Alpharetroviruses, Betaretroviruses, Deltaretroviruses, Gammaretroviruses, Epsilonretroviruses, Spumaviruses and Lentiviruses (Table 1.2 and Figure 1.11).

The RT genes of exogenous retroviruses can be aligned with the ERV pseudogenes. Phylogenies based on these alignments allow us to explore the relationships between ancient ERVs in diverse host genomes, and modern exogenous retroviruses. Numerous diverse ERV sequences have been identified in recent years, using a variety of techniques. Initially, high and low stringency hybridisation techniques that relied on homology to known exogenous retroviruses were used to identify and characterise ERVs (Lueders and Kuff, 1980; Dunwiddie et al, 1986). More recently, efforts to identify and characterise novel ERVs have taken advantage of the vast amount of data provided by genome sequencing projects (section 1.6.3 below). PCR-based methods have been used to explore ERV diversity across a wide range of taxa (see section 1.6.4).

1.6.3 Human endogenous retrovirus (HERV) diversity

The human genome sequencing project (HGP) has generated a huge resource of retroviral sequence data. At the time of writing, the most significant challenge in researching human endogenous retroviruses (HERVs) is simply managing the data, and the focus is on developing ERV-specific data mining algorithms to exploit it efficiently. A variety of bioinformatics tools, such as BLAST search (Altschul et al, 1997), have been used to explore the diversity of HERV sequences in the human genome (Tristem, 2000; Benit et al, 2001).


Figure 1.11 Taxonomy and sequence relationships of exogenous retroviruses

[Tree diagram with a host range key. Clade labels: Outgroup retroelements (Gypsy, Ty3); Spumaviruses; Gammaretroviruses; Epsilonretroviruses; Deltaretroviruses; Lentiviruses; Alpharetroviruses; Betaretroviruses.]

Figure 1.11 A phylogeny showing the relationships of the exogenous retroviral genera to one another. The phylogeny was based on an alignment of RT genes and constructed using the NJ clustering method. Host range information is shown for each of the genera. Type C retrovirus virions have been observed in snakes, and may be derived from exogenous Gammaretroviruses, but no sequence data is available to confirm this (Lunger et al, 1974). Other retrovirus particles identified in snakes have recently been shown to be derived from endogenous Class II ERVs (Huder et al, 2002).


Studies of HGP data indicate that as much as 5% of the human genome consists of ancient retroviral insertions, although solo LTRs comprise 85% of ERV-derived sequences¹ (Lander et al, 2001) (Table 1.3). It seems probable that the genomes of other vertebrates contain similar numbers of such sequences. In the not too distant future, data from the mouse genome sequencing project may permit a detailed comparison. Inevitably, due to the abundance of data provided by the HGP, far more is known about the diversity of ERVs in the human genome than in that of any other host species.

Table 1.3 Estimates of the fraction of the human genome for retroviral classes

ERV class | Number of copies (x 1000) | Total no. bases in draft HGP data (Mb) | Fraction of draft genome sequence (%)
ERV-class I | 112 | 79.2 | 2.89
ERV-class II | 8 | 8.5 | 0.31
ERV-class III | 83 | 39.5 | 1.44
MaLR | 240 | 99.6 | 3.65
Totals | 443 | 226.8 | 8.29

Table 1.3. (after consortium 2001). Data extracted from a RepeatMasker (A.F.A. Smit & P. Green, unpublished data) analysis of the draft human genome sequence. MaLR = mammalian apparent LTR-retrotransposons: LTR elements with internal sequences lacking any detectable homology to RT (Smit, 1993).
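The arithmetic behind Table 1.3 can be checked with a short sketch. The values are transcribed from the table; the dictionary layout and function names are illustrative assumptions only.

```python
# Rows of Table 1.3: (copies x 1000, Mb in draft HGP data, % of draft genome)
TABLE_1_3 = {
    "ERV-class I": (112, 79.2, 2.89),
    "ERV-class II": (8, 8.5, 0.31),
    "ERV-class III": (83, 39.5, 1.44),
    "MaLR": (240, 99.6, 3.65),
}


def column_totals(rows):
    """Sum each column of the table, rounded to the precision of the table."""
    copies = sum(r[0] for r in rows.values())
    megabases = round(sum(r[1] for r in rows.values()), 1)
    percent = round(sum(r[2] for r in rows.values()), 2)
    return copies, megabases, percent


def erv_class_fraction(rows):
    """Genome fraction (%) of ERV classes I to III only, excluding MaLR."""
    return round(
        sum(r[2] for name, r in rows.items() if name.startswith("ERV-class")),
        2,
    )


if __name__ == "__main__":
    print(column_totals(TABLE_1_3))       # (443, 226.8, 8.29)
    print(erv_class_fraction(TABLE_1_3))  # 4.64
```

The column sums reproduce the 'Totals' row. Classes I to III alone account for 4.64% of the draft sequence, which is in line with the figure of up to 5% quoted in the text, while the MaLR elements bring the total to 8.29%.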

¹ These figures vary according to the assumptions made during their calculation (see Paces et al, 2002 for a different estimate).

Work in the Retroviral Evolution group at Imperial College has involved searching for novel HERVs in genome project sequence data by BLAST searching with diverse retroviral pol fragments. Phylogenetic inference methods were used to reconstruct the evolutionary relationships among the RT pseudogenes identified using this searching technique. An initial analysis carried out when the draft human genome was 7% complete identified 22 HERV families (Tristem, 2000). Subsequently, six additional HERV families have been identified (Benit et al, 2001; Katzourakis and Tristem, in press). The 28 HERV families are summarised in Table 1.4.

Table 1.4 General properties of HERV families

HERV Family | Alternative Name | Primer | Copy no.
Class I | | |
HERV.A* | HERV.Z69907 | tRNAAla | ND
HERV.ADP | ADP-pol | tRNAThr (?) | 60
HERV.E | | tRNAGlu | 85
HERV.F | | tRNAPhe | 15
HERV.F (type b) | | tRNAPhe | 15
HERV.F (type c)* | | tRNAPhe | ND
HERV.FRD | | tRNAHis | 15
HERV.H | RTLV-H | tRNAHis | 660
HERV.H49C23 | | No LTRs | 70
HERV.I | RTLV-I | tRNAIle | 85
RRHERV.I | | tRNAIle | 15
HERV.K* (type b) | HERV.Z69907 | tRNALys | ND
HERV.L* (type b) | | tRNALeu | ND
HERV.P | HuRRS-P | tRNAPro | 70
HERV.R | ERV.R | tRNAArg | 15
HERV.R (type b) | | tRNAArg | 15
HERV.R (type c)* | | tRNAArg | ND
HERV.T | HERV.S71 | tRNAThr | 15
HERV.W | MSRV | tRNATrp | 115
HERV.XA | | tRNAPhe | 15
ERV-9 | | tRNAArg | 70
Class II | | |
HERV.K.HML2 | HERVK.10 | tRNALys | 170
HERV.K.HML5 | | tRNAIle | 45
HERV.K.HML6 | | tRNALys | 70
Class III | | |
HERV.L | | tRNALeu | 575
HERV.S | | tRNASer | 70
U2** | | Unknown | ND
U3** | | Unknown | ND

Table 1.4. The 28 HERV families, after Tristem (2000), Benit et al (2001)** and Katzourakis and Tristem (in press)*. ND = not done. The majority of HERV families fall into Class I, although Classes II and III contribute two of the largest families, HERV.K.HML2 and HERV.L respectively.

The majority of HERV families show little sign of recent activity. Most are estimated to be fixed in the human genome, and are probably very old (up to ~80 million years old (Tristem, 2000)). The age of HERV insertions is reflected in the degeneracy of their reading frames, most of which have numerous stop codons and frameshifts interrupting them. However, insertions with completely intact reading frames have been identified in the HERV.K.HML2 lineage (Turner et al, 2001). HERV.K.HML2 is thought to be the most recently acquired ERV lineage in the human genome, and the identification of polymorphic insertions in this lineage suggests it may still be active (Stoye, 2001).
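The degeneracy of a reading frame can be illustrated by counting in-frame stop codons. The sketch below makes several assumptions: the sequence is DNA in the sense orientation, the standard genetic code applies, and the function name and example sequence are hypothetical. It does not detect frameshifts, which require comparison with an intact reference sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def count_inframe_stops(seq, frame=0):
    """Count stop codons when reading `seq` in the given frame (0, 1 or 2)."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    stops = 0
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            stops += 1
    return stops


if __name__ == "__main__":
    fragment = "ATGAAATAGGGGTGACCC"  # made-up sequence for illustration
    for frame in range(3):
        print(frame, count_inframe_stops(fragment, frame))
```

For the made-up fragment above, frame 0 contains two stop codons (TAG and TGA), frame 1 contains one and frame 2 contains none. An intact retroviral reading frame would be expected to show no internal stops across its full length in the correct frame.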

1.6.4 ERV diversity throughout vertebrates

Southern hybridisation, polymerase chain reaction (PCR) and automated sequencing are techniques commonly used to identify and characterise novel ERVs in species for which extensive genome sequence data is not available. Southern hybridisation with a retrovirus derived probe is often applied as a test for the presence of ERVs in host genomes (Lueders and Kuff, 1981; Hecht et al, 1996). On its own, however, hybridisation data can give only limited information about ERV distribution and diversity. More thorough analyses can be performed when ERV sequence data is obtained for phylogenetic analysis. One approach involves the initial use of Southern hybridisation to identify ERV-containing clones in a genomic library, and subsequent sequencing of the positive clones (Kabat et al, 1996; Benit et al, 1997; Martin et al, 2002).

An alternative method uses PCR with degenerate primers directed against conserved motifs in the retroviral genome to amplify ~800 bp fragments of the retroviral polymerase gene (Tristem, 1996) (see Section 2.3.2 for further details). Amplified fragments can be sequenced and analysed phylogenetically. The advantages of this protocol are twofold. Firstly, the degeneracy of the oligonucleotide primers reduces their specificity, which suits the amplification of divergent sequences. Secondly, the fragment of the retroviral polymerase gene amplified by these primers constitutes probably the most conserved region of the retroviral genome, and contains several regions of relatively unambiguous homology (Doolittle et al, 1989; Xiong and Eickbush, 1990).
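The notion of primer degeneracy can be made concrete with a brief sketch. It uses the standard IUPAC nucleotide ambiguity codes; the primer shown is a made-up example and not one of the primers used in this work (those are described in Section 2.3.2), and the function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    options = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    primer = "ATRYN"  # hypothetical primer: R = A/G, Y = C/T, N = any base
    print(degeneracy(primer))  # 2 * 2 * 4 = 16
    print(expand(primer)[:4])
```

A degenerate primer is thus a pool of related oligonucleotides. Higher degeneracy broadens the range of divergent templates that can be amplified, at the cost of a lower effective concentration of any single sequence in the pool.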

Previous work within the Retroviral Evolution Group at Imperial College has focussed on using this method to describe ERV diversity across a wide range of vertebrate groups. Many of the ERV sequences identified by PCR screening have been carefully aligned, and the evolutionary relationships between them estimated using a variety of tree-building algorithms. Phylogenetic analysis identified numerous divergent ERV lineages, indicating that the diversity of retroviral forms is far greater than currently recognised (Herniou et al, 1998).

At present, there is no agreement on how to incorporate ERV sequences into the existing retroviral taxonomic system. HERV lineages have been classified according to the tRNA recognised by their primer-binding site (PBS) (e.g. HERV-I for a PBS that recognises an isoleucine tRNA). This nomenclature is problematic for two reasons. Firstly, phylogenetic analysis has demonstrated that diverse HERV lineages may utilise the same tRNA primer (Tristem, 2000), and secondly, this system refers to HERV lineages as families, but the Retroviridae as a whole have been assigned family status (Bock and Stoye, 2000). Non-human ERVs have sometimes been classified according to their host (e.g. Trichosurus vulpecula endogenous retrovirus (TvERV); painted frog endogenous retrovirus (RV-painted frog)). This system of classification is also problematic: firstly because most host species probably contain multiple distinct ERV lineages, and secondly because highly similar ERV insertions may be found in different host species (Lueders and Kuff, 1983; Martin et al, 1999a).

This thesis uses a broad classification that distinguishes three classes of endogenous retrovirus, and reflects what has been shown by the data. Retroviral RT phylogenies generally reveal three major branches, defining three retroviral 'classes'. Class I contains the gamma- and epsilonretroviruses, Class II contains the lentiviruses and the alpha-, beta- and deltaretroviruses, and Class III contains the spumaviruses. Although the labels Class I, II and III are usually only applied to HERV sequences, it is the convention of this thesis to apply them to all retroviruses, irrespective of their host origin, and whether exogenous or endogenous (see Retrovirus Nomenclature, p13). It seems pragmatic to apply these labels in this way, since it has an underlying basis in phylogenetic studies and greatly facilitates discussion of retrovirus diversity.

Figure 1.12 The relationships between exogenous retrovirus genera, HERV families and some non-human ERVs

[Tree diagram. Clade labels: Class I (Gammaretroviruses, Epsilonretroviruses); Class II (Deltaretroviruses, Lentiviruses, Betaretroviruses, Alpharetroviruses); Class III (Spumaviruses). Key: Human ERVs; Non-human mammalian ERVs; Avian ERVs; Reptile ERVs; Amphibian ERVs; Piscine ERVs.]

Figure 1.12 A phylogeny showing the relationships between HERV families, some non-human ERVs, and the exogenous genera. The phylogeny is based on an alignment of RT genes and constructed using the NJ clustering method. Note that the majority of ERVs fall into Class I.

As shown in Figure 1.12, the majority of ERVs so far identified cluster together with the gammaretrovirus and epsilonretrovirus genera in Class I (Herniou et al, 1998). Many of the novel Class I ERV sequences identified in diverse vertebrate genomes group together to form well-supported, divergent clades, which may represent novel retroviral genera. Relatively few Class II ERV sequences have been identified. The Class II ERVs that have been identified show homology to the betaretroviruses and alpharetroviruses, but are clearly distinct from the complex deltaretroviruses and lentiviruses. No ERV sequences have been identified that show significant sequence homology to either of these two genera.


1.7 Analysis of ERV distribution and diversity

1.7.1 Retroviral host range

The distribution of ERVs throughout animal genomes can be explored using PCR or hybridisation techniques, and provides a good indication of retroviral host range. PCR screening in a wide range of vertebrate hosts suggests that the distribution range of ERVs includes the genomes of all jawed vertebrates (Gnathostomata) (Tristem et al, 1996; Martin et al, 1997; Herniou et al, 1998; Martin et al, 1999a). ERVs have not been identified in the basal vertebrate lineage Agnatha (the jawless hagfish and lampreys), or in any invertebrate species. ERV-like insertions (retroelements with apparent env coding domains) that have been found in fruit flies (Kim et al, 1994; Song et al, 1994) and plants (Peterson-Burch et al, 2000) are more closely related to LTR-retrotransposons (see Box 1) than to retroviruses in RT phylogenies.

1.7.2 Cospeciation and horizontal transmission of retroviruses

Several studies have attempted to determine the extent to which the distribution of ERVs across host taxa has been generated by horizontal transmission across host taxa as opposed to cospeciation of retroviruses with their hosts. Retroviruses that cospeciate along with host taxa should have a phylogeny that mirrors that of the host lineage, whereas incongruent trees indicate host switching (or loss).
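The logic of comparing host and virus phylogenies can be sketched as follows. This is a deliberately simplified illustration and not the method used in the studies cited in this section: it assumes rooted trees written as nested tuples, exactly one virus per host, and hypothetical taxon names. It simply counts the clades that the two trees do not share once virus tips are relabelled by host.

```python
def leaves(tree):
    """Return the set of tip labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the set of clades (tip sets), one per internal node."""
    if isinstance(tree, str):
        return set()
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result


def relabel(tree, mapping):
    """Replace each virus tip with the name of the host it was found in."""
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def incongruence(host_tree, virus_tree, virus_to_host):
    """Number of clades present in one tree but not the other (0 = mirror)."""
    host_clades = clades(host_tree)
    virus_clades = clades(relabel(virus_tree, virus_to_host))
    return len(host_clades ^ virus_clades)


if __name__ == "__main__":
    hosts = (("hostA", "hostB"), "hostC")               # hypothetical taxa
    found_in = {"ervA": "hostA", "ervB": "hostB", "ervC": "hostC"}
    mirrored = (("ervA", "ervB"), "ervC")
    switched = (("ervA", "ervC"), "ervB")
    print(incongruence(hosts, mirrored, found_in))  # 0
    print(incongruence(hosts, switched, found_in))  # 2
```

A score of zero is the pattern expected under strict cospeciation, whereas a non-zero score is compatible with host switching or lineage loss. Real analyses must also allow for multiple ERV lineages per host and for uncertainty in both trees.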

The distribution of baboon endogenous virus (BaEV) sequences throughout primate genomes has been investigated using PCR and hybridisation techniques. The distribution of BaEV strains reflects host habitat rather than host phylogeny, indicating that horizontal transmission of the exogenous progenitor of BaEV has occurred between primate species in shared habitats. There are two distinct BaEV strains: baboons, geladas and African green monkeys, which share a savanna environment, harbour the BaEVsav strain, while mandrills and mangabeys, which live in forest areas, harbour the BaEVfor strain (van der Kuyl et al, 1995; van der Kuyl et al, 1996).


Studies within the Retroviral Evolution group at Imperial College have estimated the levels of horizontal transmission and cospeciation between the MLV-related Class-I retroviruses and their hosts. Phylogenies of MLV-related ERVs isolated from mammalian, avian and reptilian hosts were compared to phylogenies of the host taxa. Results showed that viruses from a particular host class tend to cluster together, indicating that horizontal transmission between host classes is rare (see Figure 1.13). Within host classes, however, there were indications that horizontal transfer between species was more frequent.

Rates of horizontal transmission were compared across different host and virus groups, in an effort to identify factors influencing the rate of horizontal transmission between species. The host class from which ERVs were derived was not by itself associated with altered levels of cospeciation. Rather, the evidence suggested that the elevated levels of interspecies horizontal transmission were associated with viral genotype, being significantly higher in mammalian type I murine leukemia virus (MLV)-related ERVs than in other groups of Class I retroviruses examined in the study (Martin and Tristem, 2000). There were indications of recent horizontal transfer amongst mammalian type I MLV-related ERVs. Gibbon leukemia virus (GaLV), an exogenous virus, is very closely related to an ERV fragment identified in the koala (Phascolarctos cinereus) genome (~85% nucleotide identity across 900 bp of pol). The level of sequence divergence is comparable to that observed between different strains of GaLV, suggesting that horizontal transfer of these viruses between placental and marsupial mammals occurred relatively recently. Since the two host taxa are geographically isolated, it is considered likely that a vector species mediated the horizontal transfer. The widespread occurrence of closely related ERVs in rodent genomes suggests that rodents may have been the vectoring species (Martin and Tristem, 2000).
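A figure such as the nucleotide identity quoted above is obtained by comparing two aligned sequences column by column. The following is a minimal sketch of that calculation, assuming the two fragments have already been aligned to equal length and that gapped columns are simply ignored. The function name and the toy sequences are hypothetical and are not taken from the study cited.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical; not real GaLV or koala ERV sequence).
toy_a = "ATGGCC-TAGGATTC"
toy_b = "ATGACCGTAGG-TTC"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Excluding gapped columns is only one of several conventions; counting gaps as mismatches would give a lower figure for the same alignment.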

Even though there appears to be a relatively high level of intra-class horizontal transmission amongst mammalian type I MLV-related retroviruses, the sheer number of both the potential host taxa and viruses means that it has been extremely difficult to determine the exact pattern and timing of individual horizontal transmission events.


Furthermore, it has not yet proved possible to discern relationships between host life-history factors and levels of retroviral horizontal transmission. Such relationships, and informative biogeographic patterns of ERV distribution, may well be elucidated through further sampling of ERV diversity.

1.7.3 ERV copy number

Bioinformatics tools have been used to estimate the copy number of diverse HERV lineages in the human genome (see Table 1.2 and Table 1.3). Estimates vary according to the method used, but most HERV lineages appear to have approximately 15-30 members. Three HERV lineages (HERV.H, HERV.L and HERV.K.HML-2) stand out from the rest, with copy numbers ranging from ~150 to ~700. Comparison with copy numbers for related ERV lineages in other primate genomes may prove informative in the future. The copy number of specific groups of ERVs has been estimated in other species using hybridisation techniques (Kuff and Leuders, 1988; Hecht et al, 1996), although this method may be prone to error under certain circumstances.

1.7.4 Age distributions of ERV insertions

Complete genome sequence data provide us with the opportunity to reconstruct detailed relationships within ERV lineages, and to estimate the relative age of individual insertions. For ERV insertions that retain LTR sequences at both termini, a second source of phylogenetic signal is available. Since the LTRs are identical immediately after integration (see sections 1.4.2 and 1.4.3), the percentage divergence between pairs of LTRs can be used to estimate the approximate date of integration, assuming a given rate of mutation (Johnson and Coffin, 1999). LTR divergence data can be used in conjunction with phylogenetic reconstruction to infer patterns of amplification in diverse HERV lineages.
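The dating argument above can be sketched as a short calculation. Because the two LTRs of a provirus accumulate substitutions independently after integration, a common simplification is to divide the observed divergence by twice the substitution rate. The sketch below assumes an uncorrected divergence (no allowance for multiple substitutions at a site) and uses a placeholder rate chosen purely for illustration; the function names and the rate value are assumptions, not figures taken from the cited study.

```python
def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTRs, ignoring gapped columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either LTR
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def integration_age_years(divergence: float, rate_per_site_per_year: float) -> float:
    """Approximate time since integration: divergence / (2 * rate).

    The factor of two reflects both LTRs diverging from the same starting sequence.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return divergence / (2.0 * rate_per_site_per_year)


# ASSUMED_RATE is an illustrative placeholder, not a measured value.
ASSUMED_RATE = 2.5e-9  # substitutions per site per year
example_age = integration_age_years(0.05, ASSUMED_RATE)
print(f"Estimated age: {example_age:.3g} years")
```

The estimate scales inversely with the assumed rate, so any uncertainty in the rate carries over directly into the inferred date of integration.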


Preliminary studies indicate that patterns of amplification may vary between HERV families. For example, whereas HERV.L shows an apparent burst of activity followed by little or none, the HERV.H lineage seems to have retained a relatively constant rate of activity over time (Mike Tristem, personal communication). However, further research is required before any definitive statements can be made. Ongoing work (carried out by Aris Katzourakis) aims to plot lineage-through-time data for diverse HERV families. Inferences about past retrotranspositional activity could then be made from lineage-through-time plots using birth-death models (Harvey et al, 1994; Nee et al, 1994; Purvis, 1996).
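The data behind a lineage-through-time plot are simple to derive once the ages of the internal nodes of a dated tree are known. The sketch below assumes a fully bifurcating, ultrametric tree, so that each internal node adds exactly one lineage. The function name and the node ages are hypothetical and serve only to illustrate the idea.

```python
def lineages_through_time(node_ages):
    """Return (age, lineage_count) steps, oldest node first.

    node_ages holds the ages of the internal nodes (time before present) of a
    bifurcating tree. One lineage exists before the root split, and every
    internal node increases the count by one.
    """
    steps = []
    lineages = 1
    for age in sorted(node_ages, reverse=True):
        lineages += 1
        steps.append((age, lineages))
    return steps


# Hypothetical node ages, in arbitrary time units before the present.
toy_ages = [2, 10, 3, 8, 2.5]
for age, count in lineages_through_time(toy_ages):
    print(f"{age:>5} time units ago: {count} lineages")
```

Plotting the logarithm of the lineage count against time gives the usual form of the plot. A roughly straight line suggests a constant rate of amplification, whereas a steep early rise followed by a plateau is the kind of pattern a burst of activity would be expected to leave.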

1.7.5 ERV distribution within genomes

Access to mapped and annotated genome sequence data is steadily enabling researchers to study the distribution of ERVs in relation to genes and other chromosomal regions and features. Although ERVs are known to preferentially integrate into GC- and Alu-rich, actively transcribed and early replicated areas of chromosomes (Page and Holmes, 1998), HERVs within the human genome show diverse patterns of distribution (Sverdlov, 2000; Mamedov, Batrak et al, 2002). The distribution of ERVs in relation to coding sequences is an area of considerable interest, since it is thought likely by many that LTRs and other retroelement promoter sequences might have a role in regulating the expression of nearby genes (Bennetzen, 1996; Kidwell and Lisch, 1997; Kidwell and Lisch, 2000). As such, ERV insertion near to coding sequences may have been an important factor in genome evolution. Numerous studies have shown that LTRs and HERVs are located near or within gene loci (Meisler and Ting, 1993; Kulski et al, 1997; Kidwell and Lisch, 2000).

Preliminary analyses suggest a relationship between retroviral distribution, gene density and recombination rate (Kurdyukov et al, 2001; Katzourakis and Tristem, in press). Chromosomal load on the Y chromosome appears to be significantly higher than on other chromosomes. This may reflect the fact that the Y chromosome does not recombine (except in a relatively small region known as the pseudoautosomal region), enabling slightly deleterious retroviral insertions to persist for longer. It may also reflect the low gene density on the Y chromosome, in the sense that retroviral insertions into non-essential, non-genic regions are less likely to be harmful to the host and therefore less likely to be removed by selection.

Figure 1.13 A tanglegram showing the relationship between virus and host phylogeny for a group of MLV-related retroviruses (after Martin et al, 1999). The viruses of a particular host class generally cluster together, indicating that horizontal transmission between host classes is rare. Because horizontal transmission across host classes is rare, cases where it has occurred stand out in the data. In this case, SNV (spleen necrosis virus), a highly pathogenic virus of ducks, chickens and turkeys, clusters with the mammalian viruses, indicating that it has probably undergone horizontal transmission from mammals to birds. Host groups labelled: Mammals, Squamata, Crocodylia, Birds, Amphibians.

Figure 1.14 Fixed ERVs track host phylogeny. Diagram labels: germline colonisation by retrovirus; fixation of ERV insertion; insertion found in all descendant species; lack of insertion in a species indicates that it diverged from the others prior to the fixation event; element phylogeny; host phylogeny.

The non-random distribution of HERVs with respect to genes and chromosomes is of particular interest in relation to primate genomes. Many of the differences between the human genome and the genomes of great apes are due to differences in the number and distribution of transposable elements. Analysis of the differences in integration sites of HERVs in the human genome and the genomes of great apes might provide insight into the role of retroviruses and other transposable elements in hominid speciation (Sverdlov, 2000).

1.7.6 ERVs as clade markers

Often, ERV insertions are found in the same genomic location in all members of a species; in other words, they are genetically 'fixed'. Fixed insertions can sometimes be identified at equivalent genomic locations in related species, indicating that colonisation (and probably fixation) occurred prior to the divergence of the host taxa. Once fixed, an insertion cannot be lost from a host population except under very unusual circumstances. ERV insertions that reach fixation will subsequently track host phylogeny (Figure 1.14). The independent, random nature of insertion is such that the insertion of the same retrovirus at the same genomic locus in different host species is extremely unlikely. Consequently, ERV insertions can be used as high-weight, homoplasy-free markers for the diagnosis of common ancestry (Johnson and Coffin, 1999). Other retroelements, such as SINEs, can also be used in this way and have contributed to a number of important systematic revisions, most notably with respect to mammalian orders (Shimamura et al, 1997).
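The logic of using fixed insertions as clade markers can be illustrated with a small presence/absence example. Each insertion shared by two or more species proposes a group of common descent, and if the markers are free of homoplasy, any two such groups should be either nested or disjoint. The sketch below checks that condition. The locus and species names are hypothetical, and the pairwise compatibility test is a generic illustration rather than a method described in this thesis.

```python
from itertools import combinations


def compatible(group_a: frozenset, group_b: frozenset) -> bool:
    """Two candidate clades agree with a single tree if they are nested or disjoint."""
    return group_a.isdisjoint(group_b) or group_a <= group_b or group_b <= group_a


def clades_from_insertions(insertions):
    """Return (candidate_clades, conflicts) from a locus -> species mapping.

    Insertions found in a single species are uninformative about grouping
    and are left out of the candidate clades.
    """
    groups = {
        locus: frozenset(species)
        for locus, species in insertions.items()
        if len(species) >= 2
    }
    conflicts = [
        (x, y)
        for x, y in combinations(sorted(groups), 2)
        if not compatible(groups[x], groups[y])
    ]
    return groups, conflicts


# Hypothetical presence data: locus name -> species carrying the insertion.
toy_insertions = {
    "locus_A": {"sp1", "sp2", "sp3"},
    "locus_B": {"sp1", "sp2"},
    "locus_C": {"sp4"},
    "locus_D": {"sp2", "sp3"},
}
toy_groups, toy_conflicts = clades_from_insertions(toy_insertions)
print("Conflicting loci:", toy_conflicts)
```

In this toy data set locus_B and locus_D overlap without being nested, so they cannot both mark clades on one tree. With real data, a conflict of this kind would prompt re-examination of how the loci were scored before any systematic conclusion was drawn.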


1.10 The aims of this study

This thesis involved the use of two separate but related approaches to study ERV distribution and diversity, and the evolutionary inference that can be made from it. Firstly, PCR screening was used to investigate the distribution and diversity of Class II ERVs in a broad range of higher vertebrates. Although the distribution of Class I ERVs in vertebrates has been studied in some detail (Tristem et al, 1996; Martin, Herniou et al, 1997; Herniou, Martin et al, 1998; Martin, Herniou et al, 1999a), Class II retroviruses remain relatively poorly investigated. PCR screening and automated sequencing were used to amplify and characterise Class II ERV pol fragments, and phylogenetic reconstruction was used to infer the relationships between them. The data presented in Chapter two of this thesis significantly advance understanding of Class II distribution and diversity.

The second aim of this study was to use computer simulation to explore how ERV distribution and diversity are generated in response to varying ecological and evolutionary parameters. Simulation provided an experimental environment in which to model the relationship between the evolutionary history of host/ERV lineages and patterns of ERV distribution. Chapter three describes the design and implementation of the computer simulation, and Chapter four describes a series of experiments performed with it.

64 Chapter 2 Class II Retrovirus Diversity

2. The Distribution and Diversity of Class II Retroviruses

2.1 INTRODUCTION

Data suggest there have been at least three major divergences during RT evolution, giving rise to three retroviral 'classes' (Class I, Class II and Class III) in phylogenies based on alignments of retroviral RT genes (see p13) (Chiu et al, 1984; Callahan, 1988; Doolittle et al, 1989; Wilkinson et al, 1994; Tristem et al, 1996; Martin et al, 1997; Boeke and Stoye, 1997; Herniou et al, 1998; Benit et al, 1999; Andersson et al, 1999; Tristem, 2000; Katzourakis & Tristem, in press). The Class II retroviruses include four of the exogenous retrovirus genera (the Lentivirus, Alpharetrovirus, Betaretrovirus and Deltaretrovirus), three HERV families, and diverse ERV sequences isolated from vertebrate genomes. Exogenous Class II retroviruses have not been described in lower vertebrates (fish, amphibians, reptiles), and PCR screening suggests that Class II ERVs are also rare in these host classes, although they have been detected in a few species (Martin, 1999b, 175-177; Huder et al, 2002). The following sections review the Class II genera and examine what is known about the diversity of Class II ERVs. Figure 2.1 shows a phylogenetic tree of the exogenous Class II retroviruses.

2.1.1 Alpharetroviruses (avian sarcoma and leukosis viruses)

Alpharetroviruses are found only in birds, in which they occur as both exogenous and endogenous viruses. Alpharetroviruses have a simple genome organisation and the viral particles have type-C morphology. Numerous oncogene-containing alpharetroviruses have been described.

The prototype member of the Alpharetrovirus genus is avian leukosis virus (ALV), which occurs as an exogenous and endogenous virus of the domestic chicken. ALV was the first retrovirus identified (Ellerman & Bang, 1908) and has had a central role in the development of the field of retrovirology as a whole.


Figure 2.1 A phylogeny of the exogenous Class II retroviruses showing host class (mammalian or avian). The phylogeny shows the relationships of the Class II retroviruses to one another. It was based on an alignment of RT genes (see Appendix 2.1) and constructed using the NJ clustering method. Gammaretroviruses were used as an outgroup (not shown). Bootstrap support values over 50% (from 1000 bootstrap replicates) are indicated. At present, the lymphoproliferative disease virus (LDV) of turkeys is not placed within any genus. Note that the Class II clade as a whole has strong bootstrap support (93%). Groups labelled on the tree: Deltaretroviruses, Lentiviruses, Alpharetroviruses, HERV.K superfamily and IAP elements.


Figure 2.2 Rous sarcoma virus (RSV) genetic map. Panels: proviral DNA (flanked by LTRs, each U3-R-U5); ORFs by reading frame (including gag, pro and v-src); genomic RNA (5' cap, gag, pro, pol, env, v-src, 3' poly(A), with the splice donor (SD) and -1 frameshift (FS) sites marked).

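Table 2.1 The Alpharetroviruses

| Group | Species | Acronym | Oncogene |
|---|---|---|---|
| Avian leukosis virus | Avian leukosis virus-RSA | ALV-A | |
| Avian leukosis virus | Avian leukosis virus-HPRS103 | ALV-J | |
| Oncogene-containing, replication competent (Rous sarcoma virus) | Rous sarcoma virus (Prague C) | RSV-Pr-C | src |
| Oncogene-containing, replication competent (Rous sarcoma virus) | Rous sarcoma virus (Schmidt-Ruppin B) | RSV-SR-B | src |
| Oncogene-containing, replication competent (Rous sarcoma virus) | Rous sarcoma virus (Schmidt-Ruppin D) | RSV-SR-D | src |
| Oncogene-containing, replication defective | Avian carcinoma Mill Hill virus 2 | ACMHV-2 | myc |
| Oncogene-containing, replication defective | Avian myeloblastosis virus | AMV | myb |
| Oncogene-containing, replication defective | Avian myelocytomatosis virus 29 | AMCV-29 | myc |
| Oncogene-containing, replication defective | Avian sarcoma virus CT10 | ASV-CT10 | crk |
| Oncogene-containing, replication defective | Fujinami sarcoma virus | FuSV | fps |
| Oncogene-containing, replication defective | UR2 sarcoma virus | UR2SV | ros |
| Oncogene-containing, replication defective | Y73 sarcoma virus | Y73SV | yes |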


A second alpharetrovirus infecting chickens is Rous sarcoma virus (RSV), which was discovered shortly after ALV (Rous, 1911). RSV is very closely related to ALV, but is distinguished by the presence of an oncogene between its env coding domain and its 3' LTR. The capacity of RSV to induce connective tissue tumours and other malignancies in infected birds led to its adoption as an early model system for studies of oncogenesis (Vogt, 1997a). RSV is unique amongst oncogene-containing alpharetroviruses in possessing its oncogene (src) outside the genes required for replication (see Figure 2.2). All other oncogene-containing alpharetroviruses (see Table 2.1) have deletions in one or more of their coding domains, which have been replaced by the oncogene sequence (Payne, 1992). These viruses are replication-defective, and require the presence of an intact 'helper virus' that supplies essential functions in trans, to replicate.

Within some alpharetrovirus species, strains are distinguished by differences in host cell receptor specificity that reflect variations in envelope glycoproteins. Exogenous and endogenous ALV isolates have been classed into subgroups (e.g. A, J) on this basis (van Regenmortel et al, 2000; Benson et al, 1998). The oncogene-containing replication-defective viruses are identified by the presence of a unique oncogene in each species. Additionally, variant strains of the replication-defective alpharetroviruses can be distinguished by differences in the regions of gag, pol, and env that are deleted.

Endogenous elements closely related to ALV and RSV have been identified in the genome of the domestic chicken (Gallus gallus) (Coffin et al, 1978). In some lines of chickens, these endogenous sequences are spontaneously expressed and give rise to a virus called Rous-associated virus (RAV-0) (Vogt and Fris, 1971). RAV-0 insertions have also been identified in the genome of the red jungle fowl, the domestic chicken's wild ancestor, but not in more distantly related galliform birds, leading to the suggestion that they represent a relatively recent germline infection (Frisby et al, 1979).

A number of partially and fully characterised avian retroviruses have not been allocated to any genus. Some of these viruses show indications of relatedness to the alpharetroviruses. Previous classification systems included the lymphoproliferative disease virus (LDV) of turkeys along with the alpharetroviruses in a single genus, the 'Avian sarcoma and leukosis viruses' (Payne, 1992). However, the relationship of LDV to the alpharetroviruses is poorly resolved in phylogenies constructed using RT, and the virus has not been allocated to any genus since the retroviral genera were reclassified (van Regenmortel et al, 2000). LDV has not been well studied, largely due to the lack of an in vitro cell culture system. It appears to be an exogenous retrovirus of turkeys (McDougall et al, 1978), but almost nothing is known of the natural biology of the infection.

Retroviruses exhibiting type-C virion morphology have been isolated from golden and Lady Amherst pheasants (Hanafusa et al, 1976). These viruses are thought to represent distinct species, but their nucleotide sequence has not been determined and their relationship to alpharetroviruses is unknown.

Class II retroviruses called endogenous avian retrovirus-0 (EAV-0) have been identified within the genomes of the Gallus genus of game birds (order Galliformes), in which their phylogeny closely reflects host phylogeny, suggesting an ancient origin (Sacco et al, 2001; Boyce-Jacino et al, 1992; Resnick et al, 1990). Complete sequence data are not available for EAV-0 insertions, and they are not usually included as part of the Alpharetrovirus genus.

Recently, PCR screening in galliform birds identified 19 gag genes showing relatedness to previously characterised alpharetrovirus gag genes (Dimcheff et al, 2000). This work led to the characterisation of a complete proviral sequence of a novel avian ERV from ruffed grouse (Bonasa umbellus) (Dimcheff et al, 2001). The virus, which was defective in pol, was designated tetraonine endogenous retrovirus (TERV).


2.1.2 Betaretroviruses (mammalian B- and Type-D retroviruses)

The Betaretrovirus genus includes exogenous viruses and ERVs of primates, rodents and marsupials. The exogenous species include mouse mammary tumour virus (MMTV), Mason-Pfizer monkey virus (MPMV), Jaagsiekte sheep retrovirus (JSRV), and simian retrovirus types 1 and 2 (SRV-1 and SRV-2). Some members of the Betaretrovirus genus exist in both exogenous and endogenous forms. Betaretroviruses have simple genomic organisation, and no oncogene-containing members of the genus have been identified (van Regenmortel et al, 2000). The Betaretrovirus genus was created by grouping together viruses previously classified as separate genera (Pringle, 1999). The type-D and type-B labels will occasionally be used here to distinguish between MMTV and other betaretroviruses.

MMTV was the first mammalian retrovirus isolated (Bittner, 1936), and occurs as an endogenous and exogenous virus of laboratory mice (Mus musculus). The MMTV virus has many unusual features, such as its transmission vertically via milk, the type-B morphology of the virion, and its capacity to induce tumours via insertional activation of oncogenes, that led to it being classified as the prototype member of a separate genus for many years. The type-B virions of MMTV have prominent surface spikes and an eccentric condensed core, and MMTV was originally classified as the prototype member of the 'Mammalian type-B retroviruses'. The other betaretroviruses, which have type-D virions that lack prominent surface spikes, were grouped together in the 'Mammalian type-D retrovirus' genus (see Petropoulos, 1997).

More than 50 endogenous MMTV sequences have been identified in the mouse genome (Kozak et al, 1987; Tomonari et al, 1993). At least two loci, Mtv1 and Mtv2, can express infectious virus (van Nie et al, 1977; van Nie and Verstraeten, 1977). Additional polymorphic MMTV proviruses are present in some, but not all, wild mice (Callahan et al, 1982; Imai et al, 1994). However, thorough and systematic studies of MMTV distribution in different rodent species have not been performed.

Figure 2.3 Mason-Pfizer monkey virus (MPMV) genetic map. Panels: proviral DNA (flanked by LTRs, each U3-R-U5); ORFs in three reading frames; genomic RNA (5' cap, gag, pro, pol, env, 3' poly(A), with the splice donor (SD), splice acceptor (SA) and two -1 frameshift (FS) sites marked); protein products (Gag, Gag-Pro, Gag-Pro-Pol and Env). Key: cleavage by viral protease; cleavage by host cell protease; myristylation.

The original type-D retrovirus was isolated from a rhesus monkey mammary tumour in 1970 and was called Mason-Pfizer monkey virus (MPMV) (Chopra and Mason, 1970). Figure 2.3 shows a genetic map of MPMV, and the products of MPMV expression. Subsequent investigation has revealed the existence of exogenous simian retroviruses, or SRVs, in captive macaques from primate centres around the world. All SRV isolates are closely related, having similar core but distinct envelope antigens, and fall into five neutralisation serotypes (SRV 1-5). Infection with SRVs has only been demonstrated in captive macaques, and these probably represent zoonotic infections since infection has not been demonstrated in wild caught individuals (Gardner et al, 1994). Serological surveys suggest that West African talapoin monkeys may represent a natural reservoir of SRVs (Ilyinskii et al, 1991). Recently, a novel SRV naturally infecting the common Hanuman langur was identified (Nandi et al, 2000). Comparison of retrovirus phylogenies constructed using RT with phylogenies constructed using sequences derived from env suggests that the ancestor of the primate betaretroviruses may have acquired a new env gene through recombination with a gammaretrovirus (Benit et al, 2001).

Jaagsiekte sheep retrovirus (JSRV) occurs as both an exogenous and endogenous virus of domestic sheep, in which it causes a contagious, slow progressive lung disease which pathologically appears as an adenomatosis (Jones, 1985). Investigations using probes derived from JSRV CA and SU have identified sequences showing weak homology to JSRV in diverse ungulate species, suggesting endogenous Betaretroviruses may have a widespread distribution across ungulate hosts (Hecht et al, 1996).

Endogenous type-D viruses were first identified in langurs³ (Benveniste and Todaro, 1977). The existence of type-D endogenous retroviruses (Simian endogenous retroviruses, or SERVs) has been demonstrated in African cercopithecine monkeys, and the complete sequence of a type-D SERV from a Savanna baboon (Papio cynocephalus) has been determined (van der Kuyl et al, 1997). These studies concluded that type-D SERVs are found in monkeys of the subfamily Cercopithecinae but not in those of the Colobinae and Hominoidea. Noting that the split between the Cercopithecinae and the Colobinae is estimated to have occurred around 9 million years ago (Martin, 1993), van der Kuyl and colleagues (1997) suggested that the exogenous ancestor of type-D SERVs entered the cercopithecine germline at some point within the last 9 million years. Endogenous betaretroviruses have also been characterised in the genomes of laboratory mice (Mager and Freeman, 2000), and brush-tailed possums (Baillie and Wilkins, 2001).

³ The Po-1-Lu langur isolate has never been confirmed and no sequence data are available.

2.1.3 Lentiviruses

Lentiviral virions are ~80-100 nm in diameter with dense conical cores. Only exogenous lentiviruses have been described, and these include viruses of humans, primates, domestic and wild felids, and a variety of domestic ungulates (goats, sheep, cattle and horses). No oncogene-containing members have been reported and integration is not known to activate cellular oncogenes (Joag et al, 1996). Figure 2.4 shows the genomic organisation of HIV-1. Table 2.2 details features of the various lentivirus species and their pathogenic properties.

Lentiviral genomes have complex organisation, encoding at least two regulatory proteins (Tat, Rev) in addition to Gag, Pol and Env proteins. The primate lentiviruses encode other accessory proteins (Vif, Vpr, Vpu/Vpx and Nef); counterparts of these proteins have not been definitively identified in other lentivirus groups (Petropoulos, 1997). The pol genes of all non-primate lentiviruses (except the BIV/JDV lineage) encode a protein with dUTPase activity. Recent investigations claimed to identify a 'hidden' dUTPase sequence within the env gene of HIV-1 (Abergel et al, 1999), but the sequence similarity is weak.


Figure 2.4 Human immunodeficiency virus type 1 (HIV-1) genetic map. Panels: proviral DNA (flanked by LTRs, each U3-R-U5); ORFs in three reading frames (gag, pro, vif, vpr, tat, rev, vpu, nef); genomic RNA (5' cap, gag, pro, pol, env, 3' poly(A), with the splice donor (SD), splice acceptor (SA) and -1 frameshift (FS) sites marked).

Table 2.2 Lentivirus diseases.

| Host | Virus | Disease |
|---|---|---|
| Horse | Equine infectious anaemia virus (EIAV) | Anaemia, wasting |
| Sheep | Visna maedi retrovirus (VMV) | Pneumonia, wasting, arthritis, mastitis |
| Goat | Caprine arthritis encephalitis virus (CAEV) | Arthritis, mastitis, encephalitis |
| Cattle | Bovine immunodeficiency virus (BIV) | None |
| Indonesian cattle | Jembrana disease virus (JDV) | Fever, lymphadenopathy and lymphopenia |
| Various domestic and wild felids | Feline immunodeficiency virus (FIV) | Immunodeficiency, wasting, encephalitis |
| Various African primates | Simian immunodeficiency virus | None |
| Humans | Human immunodeficiency virus | Immunodeficiency, wasting, encephalopathy, pneumonia |


The first lentiviral disease was described in the 1950's, when ovine maedi-visna virus (OMVV) was identified as the cause of maedi-visna, a persistent neurological disease of sheep (Sigurdsson, 1954). Lentiviruses have now been identified as the causative agents of a variety of diseases, including immunodeficiencies, neurological degeneration, and arthritis. Although the diseases caused by the various lentiviral species differ, lentiviral infections are typically distinguished by the slow, smouldering progress of disease. The incubation period preceding the onset of disease is unusually long, often from months to years, and the diversity of organ systems affected may be very great (Joag et al, 1996).

Lentiviral disease has become a public health priority since the emergence of acquired immunodeficiency syndrome (AIDS) in the early 1980s. AIDS is caused by two closely related retroviruses, HIV-1 and HIV-2 (Gallo et al, 1984; Levy et al, 1984). The growth of the HIV/AIDS pandemic over the past two decades has stimulated a considerable amount of research into the biology of HIV infection. Attempts to develop treatments for the disease have been confounded by the complex interactions of HIV with its host, and by the ability of the virus to efficiently evade immune elimination and resist drug treatment. HIV infection is characterised by the slow, persistent etiology of disease. There is typically a long incubation period, during which persistent viral replication leads to a steady decline in the number of host CD4+ lymphocytes. The onset of AIDS occurs when the depletion of CD4+ cells exposes the infected individual to opportunistic infections and neoplasms. In all except a few rare cases, HIV infection progresses to AIDS and results in the death of the infected individual. Although research into HIV/AIDS has generated an enormous body of knowledge about HIV and other retroviruses, there is still no cheap and reliable treatment for AIDS. In monotherapy with antiretroviral agents, initial declines in the numbers of circulating virus are followed by the rapid appearance of mutant viruses resistant to the drug (Fauci and Desrosiers, 1997). Therapies using combinations of drugs have demonstrated greater success in suppressing viral replication (Emini and Fan, 1997), but such therapies are expensive and are unlikely to reach HIV-infected populations in developing countries, where the rate of HIV infection is highest and medical help is most required. A World Health Organisation (WHO) global summary of the HIV/AIDS epidemic in 2000 estimated that a total of 21.8 million people have died from the disease since the beginning of the epidemic, 36.1 million people are currently living with HIV/AIDS, and the majority of those people live in the developing world (UNAIDS/WHO, 2000).

A large number of simian immunodeficiency viruses (SIVs) related to HIV-1 and HIV-2 have been isolated from African primates (see Hahn et al, 2000). The first SIV was isolated at the New England Regional Primate Research Centre, from an Asian macaque that developed symptoms of immunodeficiency (Daniel et al, 1985). The discovery of this virus was largely fortuitous, since SIVs do not typically cause disease in their African hosts, and the disease symptoms seen in this case were the result of transmission from African to Asian primates while both were in captivity. SIVs have since been isolated from a variety of African primates including monkeys, mandrills, mangabeys and (all of which harbour SIVs in the wild) (Desrosiers, 1990).

Since their discovery, SIVs have been implicated in the origin of the HIV/AIDS epidemic (Hahn et al, 2000). Phylogenetic analysis suggests that HIV-1 and HIV-2 viruses arose through separate cross-species transmissions of African SIVs to humans (see Holmes, 2001). Within the primate lentivirus group, most species and subspecies of African primate seem to carry their own monophyletic lineage of lentivirus. In contrast, humans seem not to have their own distinct lineage, but instead have apparently acquired two divergent lineages in the HIV-1 and HIV-2 viruses, from chimpanzees (Gao et al, 1999) and sooty mangabeys (Gao et al, 1992) respectively. Baboons and macaques, like humans, seem not to have their own lineage and have acquired lentiviruses from other species (Hahn et al, 2000). The intraspecies diversity of SIVs is not well understood, because only one or a small number of isolates from each primate species have so far been sequenced (Hahn et al, 2000).

HIV-1 strains are divided into main (M), outlier (O) and N (non-M, non-O) groups, each of which is thought to have originated from a separate cross-species transmission event. The epidemiology of the AIDS pandemic tends to agree with a 20th century origin of at least the HIV-1 M group (Korber et al, 2000). The emergence of HIV-1 apparently went unnoticed in sub-Saharan Africa for several decades (Zhu et al, 1998). The diverse HIV-2 lineages are thought to have arisen in sooty mangabeys with several cross-species transfers from sooty mangabeys into humans, one for each subtype of HIV-2. Diverse subtypes of HIV-2 are thus analogous to the groups (M, N, and O) of HIV-1 in terms of the transfer events thought to have created them.

Feline immunodeficiency viruses have been studied from wild African lions (Brown et al, 1994), North American wildcats (Carpenter et al, 1996), the Kazakhstan Pallas cat (Barr et al, 1997), and housecats from around the world (Carpenter et al, 1998). The global diversity of housecat FIVs is roughly twice as great as the diversity of the HIV-1 M group in humans, indicating that FIV has probably been spreading through domestic cats for a longer period of time than the HIV-1 M group has in humans, assuming both lineages evolve at close to equal rates (Carpenter et al, 1998).
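The reasoning behind this inference can be written out as a small calculation. Under a strict molecular clock, sequence diversity accumulates in proportion to time, so the ratio of two diversities (each scaled by its evolutionary rate) approximates the ratio of the periods over which the two viruses have been spreading. The sketch below is illustrative only: the function name and the numerical values are hypothetical placeholders, not figures taken from Carpenter et al (1998).

```python
def relative_spread_time(diversity_a, diversity_b, rate_a=1.0, rate_b=1.0):
    """Ratio of the time lineage A has been diversifying to that of lineage B.

    Assumes a strict molecular clock, so that diversity = rate * time
    (up to a constant shared by both lineages).
    """
    if diversity_b <= 0 or rate_a <= 0 or rate_b <= 0:
        raise ValueError("diversity_b and both rates must be positive")
    return (diversity_a / rate_a) / (diversity_b / rate_b)


# Hypothetical diversities chosen so that FIV is twice as diverse as HIV-1 M.
fiv_diversity = 0.30
hiv1_m_diversity = 0.15

# With equal rates (the assumption stated in the text) the time ratio
# equals the diversity ratio.
print(relative_spread_time(fiv_diversity, hiv1_m_diversity))

# If FIV evolved twice as fast, the same diversities would imply equal times.
print(relative_spread_time(fiv_diversity, hiv1_m_diversity, rate_a=2.0))
```

The second call shows why the equal-rates assumption matters: the conclusion that FIV has been circulating for longer depends entirely on it.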

In both feline and primate lineages, the divergence between isolates is generally correlated with geographic distance, with geographically distant isolates appearing more divergent. Lentiviruses have been estimated to mutate at a rate as much as 10^7 times as fast as eukaryotic sequences, and show extreme diversity in their DNA sequences (Sala and Wain-Hobson, 2000). Despite this rapid evolutionary rate, many elements of the lentiviral genome have been conserved over time. The gag and pol genes are conserved well enough that multiple sequence alignments of these regions can be constructed with relative confidence. However, the sequences of the accessory genes and the env gene are much more variable, making sequence alignment difficult. One of the most conserved regions of the lentiviral genome is the lysine tRNA primer-binding site (PBS). Although the PBS is short (15 bases), it is almost perfectly conserved in all lentiviruses (Berkhout, 1996).


2.1.4 Deltaretroviruses (HTLV/BLV retroviruses)

Only infectious viruses of primates and cattle are known within this genus. The species recognised are bovine leukemia virus (BLV), and primate T-lymphotropic retroviruses types 1, 2 and 3 (PTLV-1, PTLV-2, and PTLV-3). Infection is often asymptomatic, but is associated with B- and T-cell leukemias and lymphomas, and neurological disease. Deltaretroviruses are similar to lentiviruses in that infection typically results in a slow 'smouldering' or chronic pattern of disease (Cann and Chen, 1996). The virion morphology resembles that of type-C retroviruses. Deltaretroviruses have complex genomic organisation, encoding two regulatory proteins, Tax and Rex, in addition to the standard retroviral proteins. No oncogene-containing members of the genus are known (see Figure 2.5).

The primate T-lymphotropic viruses form three distinct lineages (PTLV-1, PTLV-2, and PTLV-3), each comprising geographically distinct isolates, often from both human and simian hosts (Meertens et al, 2002). Most of the viruses belonging to the PTLV-1 lineage, which comprises human T-cell lymphotropic virus type 1 (HTLV-1) and simian T-cell lymphotropic virus type 1 (STLV-1), cannot be separated into distinct phylogenetic lineages according to their species of origin (see Figure 2.6). The interspersion of simian and human isolates in phylogenetic trees has been interpreted as evidence of past and recent interspecies transmission events (Crandall, 1996; Kelsey et al, 1999), and a popular hypothesis is that HTLV-1 was originally transmitted to humans from monkeys (Cann and Chen, 1996).

Figure 2.5 Human T-cell leukemia virus type 1 (HTLV-1) genetic map

[Figure 2.5: genetic map of HTLV-1 showing the proviral DNA flanked by LTRs (U3-R-U5), the open reading frames in each of the three reading frames (including gag, pro, tax and rex), and the genomic RNA with its 5' cap, splice donor (SD) and splice acceptor (SA) sites, -1 frameshift (FS) sites and 3' poly-A tail.]

Figure 2.6 Primate Deltaretrovirus relationships

[Figure 2.6: tree diagram showing the PTLV-1, PTLV-2 and PTLV-3 lineages and their HTLV and STLV subtypes.]

Figure 2.6 An unrooted phylogenetic tree of the deltaretroviruses (after Meertens et al, 2002). Simian and human isolates within the PTLV-2 lineage form discrete monophyletic clusters. In contrast, simian and human isolates within the PTLV-1 lineage form interspersed clusters, possibly indicating recent horizontal transmission events between humans and simian species. Only simian viruses have been identified within the PTLV-3 lineage.

In contrast with the human immunodeficiency viruses HIV-1 and HIV-2, HTLV-1 and HTLV-2 have low rates of replication in their host, and have remarkably stable genomes. The rate of nucleotide substitution in HTLV-2 has been estimated to be around 10^-4 to 10^-5 nucleotide substitutions per site per year, one of the lowest evolutionary rates reported for a retrovirus so far (Salemi et al, 2000). In addition, the nucleotide composition of deltaretroviruses differs markedly from that of lentiviruses. Generally speaking, lentiviruses are A-rich and C-poor across the entire genome (36% A, 18% C, 24% G and 22% T), whereas deltaretroviruses are C-rich and G-poor across the entire genome (23% A, 33% C, 18% G and 23% T). No biological explanation for this difference is apparent at this time (Foley, 2000).
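Base composition figures of this kind can be obtained by simple counting. The following is a minimal sketch of such a calculation, not the method used by Foley (2000); the function name and the toy sequence are hypothetical, and ambiguous or gap characters are simply ignored.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    RNA sequences are accepted (U is counted as T); any other character,
    such as N or a gap, is skipped.
    """
    seq = sequence.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence for illustration only (not a real viral genome).
toy = "AAAACCGGGT"
print(base_composition(toy))  # {'A': 40.0, 'C': 20.0, 'G': 30.0, 'T': 10.0}
```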

HTLV-1 associated disease was first described in Japan in 1977 (Uchiyama et al, 1977), and has since been described in most parts of the world. The vast majority of HTLV-1 infected individuals are asymptomatic carriers of the virus. However, an infected individual has an approximately 5-10% chance of developing HTLV-associated conditions during their lifetime. About 30% of patients with clinical manifestations of HTLV-1 infection have chronic/smouldering adult T-cell leukemia (ATL), characterised by skin lesions. HTLV-1 infection may also lead to acute ATL, a severe form of HTLV-related disease. The median survival in acute ATL is approximately 6 months. The pathology of HTLV-1 infection is complex and can affect a variety of organ systems. Other diseases associated with HTLV-1 include HTLV-1 associated myelopathy (also known as tropical spastic paraparesis), immunosuppression, B-cell chronic lymphocytic leukemia, and uveitis (Cann and Chen, 1996).

Since the virus was first isolated in 1981 (Poiesz et al, 1981), several different HTLV-1 subtypes have been described, some of which cluster according to geographic region and ethnic group (Salemi et al, 2000). Recently, HTLV-1 proviruses were amplified from genomic DNA extracted from Andean mummies about 1500 years old. The nucleotide sequences of these proviruses were similar to HTLV subtypes found in Japanese and contemporary Andean populations, indicating that HTLV-1 was carried to South America by ancient Mongoloids, rather than by European colonists (Sonoda et al, 2000).

The PTLV-2 lineage comprises isolates of human T-cell lymphotropic virus type 2 (HTLV-2) and simian T-cell lymphotropic virus type 2 (STLV-2). In contrast to the PTLV-1 lineage, PTLV-2 isolates from humans and other primates form discrete clusters, with no evidence of recent interspecies transmission. HTLV-2 was first identified in a T-cell line established from a patient with hairy cell leukemia. This cell line was shown to harbour a retrovirus, and based on serological cross-reactivity, this virus was shown to be related to, but distinct from, HTLV-1, and was termed HTLV-2. Although HTLV-2 has a similar genomic organisation to HTLV-1, it shows only 60% homology at the nucleotide level. Certain population groups show a high incidence of HTLV-2 infection. This is particularly true of people of Native American origin. However, the absolute incidence of HTLV-2 infection is low compared with the worldwide incidence of HTLV-1, making it difficult to draw conclusions about the origins and spread of the virus (Cann and Chen, 1996).

The PTLV-3 lineage has only recently been identified. The prototype strain was isolated from an Eritrean baboon kept in a captive colony in Leuven, Belgium. This strain exhibits 40% and 38% divergence at the nucleotide level from HTLV-1 and HTLV-2 respectively (van Brussel et al, 1998). Very recently, a new PTLV-3 subtype was isolated from wild-caught red-capped mangabeys (Meertens et al, 2002).

BLV occurs worldwide, especially in dairy cattle. BLV causes enzootic bovine leukosis, a B-cell lymphoma, in some infected animals, but persistent asymptomatic infection is common (Egberink and Horzinek, 1994).

2.1.5 Intracisternal A-type particle (IAP) elements

Intracisternal A-type particle (IAP) elements are endogenous elements that have been identified in several rodent species and are sometimes expressed. Expression of IAP elements generates retroviral particles with 'type A' morphology that are assembled on the endoplasmic reticulum (ER). Rather than being released from the cell, the particles bud into intracellular cisternae (Dalton et al, 1961). Purified genomic RNA extracted from A-particles can be used as a probe for hybridisation. Using this method it has been estimated that approximately 1000-2000 endogenous IAP elements are present in the Mus musculus genome (Kuff and Lueders, 1988; Kuff et al, 1981; Kuff et al, 1983). Investigations have shown that many of the endogenous IAP elements are deleted or truncated in some way; elements lacking complete gag and/or pol genes are referred to as type II, whereas "full length" forms are referred to as type I (Shen-Ong and Cole, 1982). Subsequent studies have subdivided type I elements into four subclasses, and type II elements into three subclasses (Kuff and Lueders, 1988). The full-length forms contain obvious gag and pol genes, but most seem to lack intact env genes (Mietz et al, 1987). Recently, a subset of full-length IAP-related proviruses encoding an env-like protein has been identified; these elements are referred to as IAPE elements (Reuss and Schaller, 1991; Fennelly et al, 1996).

Hybridisation data suggest that repetitive sequences closely related to IAP elements are widely distributed throughout rodent genomes, although they may be absent from the genomes of certain rodent species (Kuff and Lueders, 1988). Large-scale amplification of IAP sequences has apparently occurred independently in the mouse and the Syrian hamster (Lueders and Kuff, 1983).

2.1.6 Class II HERVs

Several distinct lineages of HERVs exhibiting homology to mammalian Betaretrovirus strains have been identified and are referred to as Class II HERVs (Callahan, 1988). The first Class II HERVs to be identified used a lysine (K) tRNA to prime reverse transcription (Ono, 1986), and the Class II HERVs are often referred to as the HERV-K superfamily. The HERV-K superfamily has been subdivided into up to ten 'groups' (HML-1 to HML-10) based on sequence identities within the pol gene (Andersson et al, 1999). However, recent phylogenetic analysis of HERV diversity suggests that only some of these groups (HML-2, HML-5 and HML-6) should be considered distinct Class II families (Tristem, 2000).

Members of the HERV-K.HML-2 (or HERVK-10) family were first discovered when they showed homology to type-B MMTV and SH-IAP (Syrian hamster IAP) elements. There are estimated to be 30-50 relatively intact HERV-K.HML-2 proviruses in the human genome, and around 1000 solo LTRs (Stoye, 2001). The HERV-K.HML-2 family is unique amongst HERV lineages in that it has been shown to include insertions encoding functional enzymatic proteins (Berkhout, 1996; Tonjes et al, 1997) and viral particles (Simpson et al, 1996). This fact, coupled with the observation that the degree of divergence between paired LTRs of HERV-K.HML-2 insertions is smaller than for any other HERV lineage, suggests that HERV-K.HML-2 is the most recently acquired HERV family, and has retained activity since the human and chimpanzee lineages diverged (Barbulescu et al, 1999; Medstrand and Mager, 1998). Indeed, it is possible that the HERV-K.HML-2 lineage is still active within humans. This proposal is supported by the identification of polymorphic loci in humans that contain either full-length HERV-K.HML-2 alleles or pre-integration alleles. One of the polymorphic insertions (HERV-K113) has full-length open reading frames for all its viral proteins and lacks any nonsynonymous substitutions in amino acid motifs that are well conserved among retroviruses (Turner et al, 2001).
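The use of paired LTR divergence as an indicator of insertion age rests on the fact that the two LTRs of a provirus are identical at the time of integration and accumulate substitutions independently thereafter. A rough age estimate is therefore t = d / (2r), where d is the divergence between the LTRs and r is the host neutral substitution rate. The sketch below illustrates this under stated assumptions: the function names, the toy sequences and the rate value are hypothetical, the distance is an uncorrected proportion of differing sites, and the sequences are assumed to be already aligned.

```python
def ltr_divergence(ltr5, ltr3):
    """Uncorrected proportion of differing sites between two aligned LTRs.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned LTR sequences must have equal length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def insertion_age_years(divergence, rate_per_site_per_year):
    """Approximate insertion age; both LTRs diverge, hence the factor of 2."""
    return divergence / (2.0 * rate_per_site_per_year)


# Toy 20-base 'LTRs' differing at a single site (illustration only).
five_prime = "ACGTACGTACGTACGTACGT"
three_prime = "ACGTACGTACTTACGTACGT"
d = ltr_divergence(five_prime, three_prime)

# The substitution rate below is an assumed placeholder value.
assumed_rate = 2.5e-9
print(d, insertion_age_years(d, assumed_rate))
```

Because the distance is uncorrected for multiple substitutions at the same site, an estimate of this kind becomes less reliable for older, more divergent insertions.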

Although the HERV-K.HML-2 family still shows signs of activity, the presence of proviruses at identical loci in humans and Old World monkeys suggests that it first colonised the primate lineage over 35 million years ago. Sequences homologous to HERV-K.HML-2 have been found in all Old World primates, but it is not clear whether the lineage has remained active in primates other than humans (Mayer et al, 1998).

The HERV-K.HML-2 family may be the only example of a human endogenous retrovirus with complex genome organisation. Viruses of the HERV-K.HML-2 family have been shown to encode a sequence-specific nuclear RNA export factor, termed K-Rev, that is functionally analogous to the HIV-1 Rev protein. Like HIV-1 Rev, K-Rev binds to the cellular factor CRM1 and to a cis-acting viral RNA target to mediate nuclear export of unspliced RNAs. Surprisingly, this HERV-K RNA sequence is also recognised by HIV-1 Rev. However, HIV-1 Rev and its HERV-K.HML-2 counterpart show very little sequence homology (Yang et al, 1999).

The HERV-K.HML-2 lineage has been implicated in the autoimmune disease type I diabetes through expression of a superantigen domain within the viral Env protein (Conrad et al, 1996). This autoantigen was detected in insulin-dependent diabetes mellitus (IDDM) patients and raises the possibility that a genetic susceptibility for developing IDDM could be linked to the expression or presence of HERV-K.HML-2 elements in humans. However, a recent study could find no evidence for activation of specific T-cell subsets, one of the key features of superantigens, apparently ruling out superantigen involvement in type I diabetes, and any role for HERV-K.HML-2 as a causative factor (Lapatschek et al, 2000).

It has also been suggested that the protease (PR) of a human HERV-K.HML-2 might complement HIV-1 PR, and thereby confer resistance to antiretroviral drugs targeted at the HIV-1 PR (Towler et al, 1998). However, HERV-K PR failed to correctly process HIV-1 Gag and Pol polyproteins in experimental assays (Padow et al, 2000), and a role for HERV-K PR in HIV-1 resistance to protease inhibitors is now considered unlikely.

The structure, genomic organisation and distribution of the HERV-K.HML-6 lineage have been investigated in some detail (Medstrand et al, 1997). It is estimated that the haploid human genome contains about 30-40 HERV-K.HML-6 proviruses, in addition to about 50 HERV-K.HML-6 solo LTRs. The reading frames of all HERV-K.HML-6 proviruses so far identified show numerous deletions and stop codons. The HERV-K.HML-6 lineage is phylogenetically distinct from HERV-K.HML-2. The HERV-K.HML-5 lineage is more closely related to HERV-K.HML-6 than to the ERVs of the HERV-K.HML-2 lineage (Tristem, 2000). However, the HERV-K.HML-5 lineage has not been investigated in detail.

It has been proposed that a sequence within the human genome, called HRES-1, is viral in origin, being derived from regions of the HIV and HTLV genomes (Banki et al, 1992). This was rather a dubious hypothesis at the time, and recent sequence comparisons provide little support for the idea that the HRES-1 element is retroviral in origin (Tristem, 2000).


2.1.7 Divergent Class II ERVs Identified by PCR Screening

Prior to the initiation of this work, preliminary investigation of the distribution of Class II elements across a wide range of vertebrate taxa had been carried out by Imperial College's Retroviral Evolution Group. PCR screening of 40 lower vertebrate taxa with Class II specific primers determined that Class II ERVs are apparently rare in lower vertebrates. Nevertheless, sequences derived from three caecilian taxa showed homology to alpha- and betaretroviruses (Lynch, 2000). It is not clear why caecilians are exceptional amongst lower vertebrates in harbouring Class II ERVs.

Screening in higher vertebrate taxa identified several novel Class II sequences that showed homology to alpha- and betaretroviruses during alignment to computer sequence databases, and clustered with known B/D type sequences in trees of Class II elements (Figure 2.7). Unfortunately, the relationships between the novel ERV sequences and established Class II taxa were poorly resolved in phylogenetic analysis; although relationships between some of the isolates were suggested, bootstrap analysis failed to support many of the relationships and collapsed the clades, such that the group was represented as a well-supported polytomy. Alignments based on the entire region amplified in PCR screening (from protease to domain 5 of RT) failed to resolve the polytomy, indicating that further data would be necessary to resolve relationships within the group (Martin, 1999).

Recent investigations demonstrated that Class I ERVs isolated from a particular host class tended to cluster together, suggesting that horizontal transfer of these retroviruses across vertebrate classes occurs only rarely. However, there were indications that the situation was different in Class II, where viral host range did not appear to be constrained by host family or grouping. Although relatively few Class II ERVs had been isolated (12 in total), the limited data available suggested that horizontal transfer might be frequent in this group (Martin et al, 1999; Martin, 1999).

Figure 2.7 Novel Class II ERVs detected by PCR screening

[Figure 2.7: phylogenetic tree with bootstrap values. Novel sequences from Regent's bowerbird, three caecilians, stripe-faced dunnart, ostrich, duck-billed platypus, partridge, wren, mistle thrush and wood pigeon are shown alongside reference retroviruses; the legend distinguishes Deltaretrovirus, Alpharetrovirus, Lentivirus, Betaretrovirus and novel sequences.]

Figure 2.7 After J. Martin (Ph.D. thesis, 1999). Strict consensus of 3 maximum parsimony trees generated from a 160 amino acid alignment of the RT gene, using the PROTPARS matrix. The tree is rooted on 3 spumaviruses (HSRV, SFV-1, SFV-3). Bootstrap support values are shown (upper figure: maximum parsimony, 100 replicates with 100 random additions; lower figure: neighbour joining, 1000 replicates).

2.1.8 Investigating Class II Distribution and Diversity

The Class II clade is of particular interest since it includes perhaps the most medically important and biologically interesting retroviruses, foremost amongst these being the human immunodeficiency viruses (HIV-1 and HIV-2) and the human T-cell leukaemia viruses (HTLV types 1 and 2). The genera to which these viruses belong are apparently highly divergent within the Retroviridae, and many questions about their origin and evolution remain open to speculation. Both the lentiviruses and the deltaretroviruses appear to be highly divergent lineages within Class II, and no endogenous sequences showing close similarity to either group have ever been identified. Consequently, most authors regard these viruses as 'true' exogenous retroviruses. This conclusion may well be premature, since sampling for endogenous viruses has been relatively restricted. Discovery of ancient ERV insertions related to these groups might clarify the relationships between them and other Class II retroviruses. Such a placement is desirable since it might reveal interesting information about the evolution of these biomedically prominent genera.

The aim of the work described here was to investigate the evolution of Class II retroviruses. PCR screening of 96 vertebrate taxa led to the characterisation of pol sequences from over 90 novel Class II ERVs. The relationships of novel ERV fragments to known Class II retroviruses were analysed using both neighbour-joining and maximum parsimony methods of phylogenetic inference, and the distribution of viral traits across the Class II phylogeny was analysed.


2.2 MATERIALS

2.2.1 Media, Plates and Buffers

Liquid media, buffers, and agarose plates used during the course of this work were as follows. Transformed JM109 bacterial cells were grown in 2xYT medium (16g bacto tryptone, 10g bacto yeast extract and 5g NaCl per litre) supplemented with ampicillin (100µg/ml final concentration). Cells were grown in NZCYM medium supplemented with 0.2% maltose and 10mM MgSO4. For plates, 15g of agar was added to each litre of medium. Transformed bacteria were selected using a blue/white colour screening process on 2xYT/ampicillin/IPTG/X-Gal plates (see section 2.3). The X-Gal and IPTG components of these plates were spread across the surface of the solid media: 20µl of 50mg/ml X-Gal and 100µl of 100mM IPTG were used on each individual plate. All liquid media, plates, and buffers were prepared according to manufacturer instructions or as described in Maniatis et al (1989).

2.2.2 Vectors and Bacterial Strains

The pGEM-T Easy vector system (Promega, Madison, USA) was used for cloning of PCR products. This kit provided aliquots of the pGEM-T Easy cloning vector, and of the JM109 strain of E. coli. The pGEM-T Easy cloning vector is provided as a linearised plasmid. It is cut with EcoRV to generate an insertion site, and 3' terminal thymidine (T) residues are added to the cut ends. The 3'-T overhangs at the insertion site prevent recircularisation of the vector and provide a compatible overhang for ligation of PCR products generated by Taq polymerase (Taq polymerase adds a single deoxyadenosine (A) to the 3' end of PCR products via a non-template dependent activity).

2.2.3 Enzymes

Taq (Thermus aquaticus) DNA polymerase was purchased from Roche Molecular Biochemicals (Indianapolis, IN, USA). Restriction endonucleases were purchased from New England Biolabs (Beverly, MA, USA) and Promega. BigDye Terminator ready reaction cycle sequencing kits were purchased from ABI Prism Applied Biosystems (Warrington, UK); Proteinase K was purchased from QIAGEN (Crawley, West Sussex, UK); RNase A (bovine pancreas) was purchased from Sigma (Poole, Dorset, UK).

2.2.4 Gels, Running Buffers, and Molecular Weight Markers

Gel electrophoresis of DNA used both tris-acetate EDTA (TAE) and tris-borate EDTA (TBE) gels (50X TAE: 243g Tris, 20.5g anhydrous NaAc, 18.6g EDTA per litre; 10X TBE: 108g Trizma base, 55g boric acid, 9.3g EDTA per litre). Multipurpose powdered agarose purchased from Roche Molecular Biochemicals was added to either 1x TAE or 1x TBE running buffer to prepare agarose gels of concentration 0.8%-1.3%. Electrophoresis was carried out at 80-100 volts, in a running buffer containing ethidium bromide at a concentration of 0.5-1µg/ml, such that DNA could be visualised by viewing agarose gels under UV light.
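The quantities involved in preparing working buffers and gels from these stocks follow from two simple relations: the mass of agarose is the gel volume multiplied by the percentage (w/v), and the volume of stock buffer follows from C1 x V1 = C2 x V2. The sketch below is an illustration of that arithmetic only; the function names and the example volumes are hypothetical and are not taken from the protocols used in this work.

```python
def agarose_mass_g(gel_volume_ml, percent_w_v):
    """Grams of agarose needed for a gel of the given volume and % (w/v)."""
    return gel_volume_ml * percent_w_v / 100.0


def stock_volume_ml(final_volume_ml, stock_strength_x, working_strength_x=1.0):
    """Millilitres of concentrated stock needed (C1 * V1 = C2 * V2)."""
    if stock_strength_x <= 0:
        raise ValueError("stock strength must be positive")
    return final_volume_ml * working_strength_x / stock_strength_x


# Example: a hypothetical 100 ml gel at 1.2% agarose.
print(agarose_mass_g(100, 1.2))

# Example: one litre of 1x running buffer from 50X TAE or 10X TBE stock.
tae_stock = stock_volume_ml(1000, 50)
tbe_stock = stock_volume_ml(1000, 10)
print(tae_stock, 1000 - tae_stock)  # stock volume, then water to add
print(tbe_stock, 1000 - tbe_stock)
```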

The following molecular weight markers and gel loading dye were purchased from Promega and used during the course of this work: λ DNA and HindIII-digested λ DNA; 100bp ladder; and Blue/Orange 6X loading dye (10% Ficoll 400, 0.25% bromophenol blue, 0.25% xylene cyanol FF, 0.4% orange G, 10mM Tris-HCl (pH 7.5) and 50mM EDTA), used at a ratio of 1 part dye to 5 parts DNA solution.

2.2.5 Oligonucleotide Primers

Oligonucleotide primers were purchased from MWG Biotech (Milton Keynes, UK). Primers were supplied as a desiccated residue and were resuspended in sterile distilled H2O to a concentration of 100 pmol/µl.
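The volume of water required to reach this concentration depends on the amount of primer supplied. As a minimal sketch of the arithmetic (the function name and the 25 nmol yield are hypothetical examples, not values from the supplier's data sheets):

```python
def resuspension_volume_ul(primer_nmol, target_pmol_per_ul=100.0):
    """Microlitres of water needed to resuspend a dried primer.

    1 nmol = 1000 pmol, so volume = amount (pmol) / concentration (pmol/ul).
    """
    if target_pmol_per_ul <= 0:
        raise ValueError("target concentration must be positive")
    return primer_nmol * 1000.0 / target_pmol_per_ul


# A hypothetical synthesis yield of 25 nmol resuspended to 100 pmol/ul.
print(resuspension_volume_ul(25))  # 250.0
```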


2.2.6 Other Reagents, Kits and Consumables

DNeasy tissue kits were obtained from QIAGEN. Sephaglass band prep kit, GFX Microplasmid Prep Kit, and GFX DNA and gel band purification kit were purchased from Amersham Pharmacia Biotech. Orthoboric acid, trisodium citrate, EDTA, NaOH, NaCl, 10% SDS, chloroform and ampicillin were purchased from BDH. Trizma Tris-base, X-gal, IPTG, BSA (bovine serum albumin Fraction V), phenol chloroform isoamyl alcohol (25:24:1 by volume), bromophenol blue, sodium acetate, mineral oil and sterile distilled water were all purchased from Sigma. All Eppendorf tubes, pipette tips and other laboratory plasticware were obtained from Appleton Woods (Birmingham, UK). Filter paper was purchased from Whatman International (Maidstone, Kent, UK).

2.2.7 Equipment

Equipment manufactured by Perkin Elmer Biosystems was used for PCR and sequencing. PCRs were carried out using a Perkin Elmer GeneAmp 2400 thermal cycler. Sequencing was performed using an ABI Prism 3700 DNA Analyzer. A Heraeus Biofuge 'pico' centrifuge was used for all centrifugation, and a Vortex Genie 2 vortex mixer (Scientific Industries Inc., Bohemia, USA) was used for mixing and resuspending samples. Gel electrophoresis tanks and power packs were purchased from Anachem (Luton, UK). DNA was visualised using a UVP (Cambridge, UK) high performance transilluminator, and gel photographs were manipulated and prepared for printing using UVP's Labworks software package.


2.3 METHODS

2.3.1 DNA Extraction

Genomic DNA for PCR and Southern hybridisation was extracted from tissue samples using the DNeasy Tissue Kit (QIAGEN). A variety of animal tissues were used as starting material for DNA extraction (see Appendices 2.2 and 2.3 for tissue sources and tissue types). The DNeasy Tissue Kit contains specialised mini-columns fitted with a silica-gel membrane that efficiently binds DNA. The mini-columns enable rapid purification of DNA from solution without the need for organic extraction or ethanol precipitation. For each tissue sample, the extraction procedure was as follows. Approximately 25mg of tissue was weighed out and chopped or ground into small pieces using a sterile scalpel or glass rod. The tissue sample was then placed in a buffered solution containing proteinase K and incubated at 55°C until the tissue was completely lysed. Following lysis, the buffering conditions were adjusted to provide ideal conditions for DNA binding and the lysate was loaded onto the mini-column. DNA was selectively bound to the silica-gel membrane within the mini-column during a brief centrifugation step, while contaminants were washed through. Two further washes were applied to remove any remaining contaminants, and the DNA was eluted in 50µl of sterile, nuclease-free water.

2.3.2 Polymerase Chain Reaction

Genomic DNA was screened for retroviral inserts using polymerase chain reaction (PCR). PCR is a technique that has revolutionised molecular biology by making it possible to generate billions of copies of specific DNA segments quickly and accurately. The PCR technique emulates the processes used to replicate DNA in bacterial cells, allowing researchers to generate numerous copies of any defined DNA fragment in the laboratory. The only information needed to replicate a particular target region is the sequence of two short regions flanking it. Two oligonucleotide primers complementary to the sequences on opposite strands of these flanking regions can then be synthesised, and these serve as the starting point for copying during the reaction.

The key components of the PCR reaction mixture are template DNA, oligonucleotide primers, free nucleotides, and a thermostable DNA polymerase. The reaction is a three-step process that is carried out in repeated cycles controlled by a heating block (see section 2.2.7). Firstly, template DNA is denatured (separated into single strands) by heating to ~95°C. The temperature is then reduced to ~40-50°C, at which oligonucleotide primers anneal to their complementary regions on the template DNA. In the third step the temperature is adjusted so that the DNA polymerase becomes optimally active and synthesises a new single strand from the 3'-OH end of each primer by sequentially adding free nucleotides that are complementary to the template sequence. DNA synthesis from one primer is directed toward the other, resulting in the replication of the desired intervening sequence. These three steps are repeated and the number of copies of the target sequence grows exponentially with each cycle.
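The exponential growth described above can be expressed as N = N0 x (1 + E)^n, where N0 is the starting number of template copies, n is the number of cycles and E is the per-cycle efficiency (1.0 for perfect doubling). The following sketch illustrates this relation; the function name and the efficiency and cycle values are illustrative assumptions rather than measured properties of the reactions used in this work.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after a given number of PCR cycles.

    efficiency is the fraction of templates copied per cycle (0 to 1);
    1.0 corresponds to ideal doubling at every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ideal amplification of a single template molecule over 30 cycles.
print(pcr_copies(1, 30))  # 1073741824.0, i.e. over a billion copies

# A less efficient reaction (assumed 80% per cycle) yields far fewer copies.
print(pcr_copies(1, 30, efficiency=0.8))
```

In practice amplification plateaus in later cycles as reagents are depleted, so this relation holds only for the early, exponential phase.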

Genomic DNA samples were screened for retrovirus insertions using degenerate primers and an appropriately modified PCR protocol. Most PCR procedures aim to generate multiple copies of a specific DNA sequence present as a single copy in the original template DNA. When sampling for novel endogenous retroviruses, however, the aim is usually to amplify a heterogeneous range of sequences from the many retroviral inserts present in the template DNA. Primers are targeted against regions of the retroviral genome that are conserved across the retroviral groups or genera of interest. However, the nucleotide sequences of endogenous retroviruses are highly variable, and even the most conserved motifs may vary to some degree. To cope with this variation, a mixture of oligonucleotide primers is used, in which the overall form of the primer sequence is conserved but varies to a given degree at certain positions within the sequence. 'Degenerate' primers of this type allow us to develop PCR protocols capable of coping with variation at particular positions within the chosen target motif. At positions where it was deemed equally probable that any of the four bases might be present, inosine residues were inserted into the primer; inosine is capable of binding to any of the four bases.
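The composition of a degenerate primer mixture can be made concrete by expanding the standard IUPAC ambiguity codes into the individual oligonucleotides they represent. The sketch below does this for an invented primer string, which is not one of the primers used in this work. Inosine (written here as 'I') is treated as a single residue: it adds no degeneracy to the synthesised pool because the same base is incorporated in every molecule.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes; 'I' (inosine) is kept as-is.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides in a degenerate primer pool."""
    total = 1
    for code in primer.upper():
        total *= len(IUPAC[code])
    return total


def expand(primer):
    """List every individual sequence represented by a degenerate primer."""
    pools = [IUPAC[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*pools)]


# Hypothetical primer: Y = C/T, I = inosine, N = any base.
example = "GAYACIGGN"
print(degeneracy(example))  # 8
for oligo in expand(example):
    print(oligo)
```

Replacing the final N with inosine would reduce the pool from eight species to two, which is the trade-off the text describes between covering variation and keeping the mixture simple.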

The aim when designing degenerate primers for ERV screening is to achieve a balance between degeneracy and conservation such that primers are specific enough to amplify only the retroelement family or the retroviral genus in which we are interested, but variable enough that divergent sequences within the target family or genus will also be amplified. The active site motifs of the retroviral protease gene and domain 5 of RT (LXDT/SGA/S and YV/MDDI/LL/Y, respectively) are highly conserved and previous experience within our group has shown that degenerate primer pairs directed against these motifs are highly effective for screening procedures (Tristem, 1996). The majority of novel primers designed and tested during the course of this work were directed against these regions. The intervening region between the two motifs is typically 800-1200bp in length and includes domains 1-5 of the RT gene and most of the protease gene. Amplification of this region has the advantage that it contains much of the sequence information used previously in phylogenetic analysis of distantly related retroelements (Xiong and Eickbush, 1990) and allows the construction of phylogenies based on more than one retroviral gene. A universal protease primer specific to all retroelements was used throughout the work discussed here; specificity was achieved by modification of the RT-directed primer of the PRO/RT primer pair.
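The two motifs can be written as regular expressions and used to locate the corresponding region in a translated sequence. The sketch below assumes one reading of the motif notation, in which a slash separates alternative residues at a single position (so LXDT/SGA/S becomes L, any residue, D, T or S, G, A or S). The pattern names, the function name and the toy protein sequence are hypothetical and serve only to illustrate the idea.

```python
import re

# Protease active site: L-X-D-(T/S)-G-(A/S)
PROTEASE_MOTIF = re.compile(r"L.D[TS]G[AS]")
# RT domain 5: Y-(V/M)-D-D-(I/L)-(L/Y)
RT_MOTIF = re.compile(r"Y[VM]DD[IL][LY]")


def find_flanked_region(protein):
    """Return (start, end) of the span from the protease motif to the RT motif.

    Coordinates are 0-based and end-exclusive, in amino acids.
    Returns None if either motif is missing or they occur in the wrong order.
    """
    pro = PROTEASE_MOTIF.search(protein)
    if pro is None:
        return None
    rt = RT_MOTIF.search(protein, pro.end())
    if rt is None:
        return None
    return pro.start(), rt.end()


# Toy sequence: two leading residues, a protease motif, filler, an RT motif.
toy_protein = "MK" + "LLDTGA" + "Q" * 10 + "YMDDIL" + "KP"
print(find_flanked_region(toy_protein))  # (2, 24)
```

Multiplying the length of the returned span by three gives the approximate size of the expected PCR product in base pairs.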

2.3.3 Cloning - Ligation

Prior to sequencing, PCR products were ligated into a plasmid vector (pGEM-T Easy) and transformed into competent JM109 E. coli cells. A cloning step was necessary since the PCR screening procedure was designed to amplify a heterogeneous mixture of viral polymerase gene fragments. The cloning step allowed the separation of different viral isolates so that they could be used as template for sequencing reactions. To avoid cloning of non-viral products, PCR products were electrophoresed through a 1.2% agarose gel (refer to section 2.2.4 for gel, buffer and marker details), and bands of appropriate molecular weight were excised. A band prep kit (Amersham Pharmacia Biotech) was used according to the manufacturer's instructions to dissolve the agarose slice and to purify and elute the nucleic acids. Ligations were carried out at 4°C overnight, and were then either used immediately in transformations or frozen at -20°C.

2.3.4 Cloning - Transformations

Transformations were carried out as follows: JM109 competent cells were removed from -80°C storage and defrosted in an ice bath. Ligation reactions were briefly centrifuged and ~2 µl of each ligation was transferred to a fresh, labelled tube. 50 µl of competent cells were carefully transferred into the tubes containing the ligation mixtures using a wide-bore pipette, and gently stirred. Cells were allowed to stand on ice for 20 minutes and then heat shocked in a water bath at exactly 42°C for 1 minute. Heat shocking allows ligated vectors to be taken up by the bacteria, where vector replication will occur. Transformed cells were allowed to recover on ice for 2 minutes, before the addition of 950 µl of SOC medium (section 2.2.1) and incubation for 1 hour at 37°C with shaking (~150 rpm).

A 100 µl aliquot of each transformation culture was spread onto 2xYT/Ampicillin/X-Gal/IPTG plates. 'Indicator' plates of this type facilitate the identification of bacterial colonies that contain recombinant plasmids by exploiting the properties of the plasmid vector. Only transformed bacteria will grow on the ampicillin-supplemented plates, since they require the ampicillin resistance (Ampr) gene the vector provides. However, a proportion of the growing colonies will contain non-recombinant copies of the vector. The pGEM-T Easy vector is designed to facilitate a blue/white colour screening process that distinguishes colonies containing recombinant plasmids. The polycloning site of the vector is embedded within the peptide-coding region of the enzyme β-galactosidase (lacZ). The presence of this coding region causes transformed colonies of JM109 cells containing non-recombinant plasmid to appear blue on indicator plates containing chromogenic substrate (X-gal). However, insertion of a fragment of foreign DNA into the polycloning site interrupts the coding sequence of β-galactosidase, and consequently cells carrying recombinant plasmids form white colonies.
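The selection logic of the indicator plates can be summarised in a few lines. This is an illustrative sketch of the idealised case described above, not a model of real plates, where small in-frame inserts or satellite colonies can blur the distinction. The function name is hypothetical.

```python
def colony_phenotype(has_plasmid: bool, has_insert: bool) -> str:
    """Idealised outcome for one cell plated on ampicillin/X-Gal/IPTG medium."""
    if not has_plasmid:
        # No ampicillin resistance gene, so the cell cannot grow.
        return "no growth"
    # An insert in the polycloning site interrupts the lacZ coding sequence.
    return "white" if has_insert else "blue"


if __name__ == "__main__":
    for plasmid, insert in [(False, False), (True, False), (True, True)]:
        print(plasmid, insert, colony_phenotype(plasmid, insert))
```

Only the white class is carried forward to plasmid preparation.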


2.3.5 Cloning - Plasmid DNA Preparation

White colonies containing recombinant plasmids were picked into ampicillin-selective YT broth (section 2.2.1) and grown overnight at 37°C with shaking (~200 rpm). Cloned vectors were recovered from 3 ml volumes of E. coli culture using the GFX Micro Plasmid Prep kit (Amersham Pharmacia Biotech), according to the manufacturer's instructions. The GFX kit employs a modified alkaline cell lysis procedure, and guarantees 3-6 µg of plasmid DNA per ml of culture. Bacterial cells were pelleted by centrifugation, the supernatant was aspirated and discarded, and the pellet was resuspended in 300 µl of an isotonic solution (100 mM Tris-HCl (pH 7.5), 10 mM EDTA and 400 µg/ml RNase I). Chromosomal DNA and bacterial proteins were denatured with 300 µl of an alkaline lysis solution (1 M NaOH and 5.3% [w/v] SDS), and the lysate was neutralised using 300 µl of 3 M NaAc (pH 4.8). At neutral pH, plasmid DNA renatures while bacterial chromosomal DNA remains denatured and precipitates in a DNA-SDS complex along with denatured bacterial membrane proteins. Plasmid DNA was bound to a column-based glass-fibre matrix, and washed with 80% ethanol. Plasmid DNA was eluted in sterile, nuclease-free water and stored at -20°C.

To check whether the colonies picked and propagated contained recombinant plasmids, an aliquot of plasmid DNA was digested with EcoRI. The vector has EcoRI restriction sites flanking the polycloning site, such that digestion with EcoRI will excise insert DNA. A 10 µl aliquot of plasmid DNA was mixed with 1 unit EcoRI (0.5 µl), 2 µl EcoRI 10X buffer, 0.5 µl BSA and 6.5 µl sterile water, and the digestion mixture was incubated at 37°C for 30 minutes. The digested plasmid DNA was electrophoresed through a 1.3% agarose gel with HindIII-digested λ DNA as a marker, and plasmids found to contain inserts of the appropriate size (800-1000 base pairs) were selected for sequencing. The presence of multiple insert bands in some lanes indicated inserts with internal EcoRI restriction sites. In these cases it was possible to calculate the size of the insert DNA by totalling the sizes of all the insert-derived bands present (that is, all bands other than the linearised vector).
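The insert-size calculation for digests with internal EcoRI sites amounts to simple arithmetic, sketched below. The vector backbone size of 3015 bp is an assumption (the commonly quoted size of pGEM-T Easy), the band sizes in the example are hypothetical, and the few base pairs of polylinker excised along with the insert are ignored.

```python
VECTOR_BP = 3015  # assumed size of the pGEM-T Easy backbone, in base pairs


def insert_size(bands_bp, vector_bp=VECTOR_BP, tolerance=0.10):
    """Estimate insert size by summing every band except the vector band.

    The band closest to the expected vector size is taken to be the vector;
    it must lie within `tolerance` (as a fraction) of that size.
    """
    bands = list(bands_bp)
    vector_band = min(bands, key=lambda b: abs(b - vector_bp))
    if abs(vector_band - vector_bp) > tolerance * vector_bp:
        raise ValueError("no band matches the expected vector size")
    bands.remove(vector_band)
    return sum(bands)


def is_expected_size(size_bp, low=800, high=1000):
    """True if the insert falls within the size range selected for sequencing."""
    return low <= size_bp <= high


if __name__ == "__main__":
    # Hypothetical lane: vector band plus two bands from a cut insert.
    size = insert_size([3000, 550, 400])
    print(size, is_expected_size(size))
```

Band sizes read from a gel are approximate, so the tolerance on the vector band is deliberately loose.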


2.3.6 Sequencing

The products of sequencing reactions were analysed using an ABI Prism 3700 DNA Analyzer (Perkin Elmer Biosystems). During sample preparation, the DNA fragments in a sample are chemically labelled with fluorescent dyes, which facilitate the detection and identification of the DNA. Each DNA molecule is usually labelled with only one dye molecule, but up to four dyes can be used to label the DNA in a sample. Both the type of fluorescent labelling and the sample composition vary with the sample preparation method used. Cycle sequencing reactions were set up as follows: 0.5-1 µg (usually 1-4 µl) of template DNA was combined with 3.2 pmol of primer and 4 µl of AmpliTaq FS.

Reactions were then placed in the Perkin Elmer 2400 PCR block and underwent 25 cycles of the following three reaction steps: 96°C for 20 seconds, 49°C for 15 seconds, and 60°C for 4 minutes. Reaction products were then alcohol precipitated. For each reaction a tube was prepared containing 4 µl of 3 M NaAc (pH 5.2) and 50 µl of 95% ethanol; the reaction products were transferred to that tube and left on ice for 20 minutes. The solution was spun at 13,000 rpm for 30 minutes, and the ethanol was aspirated and discarded. 250 µl of fresh 70% ethanol was then added and the tubes were spun again at 13,000 rpm for 5 minutes. All the ethanol was removed following centrifugation and the tubes were left to air dry for 30 minutes. Cleaned, precipitated reaction products were stored at -80°C.

Sequence reaction products were analysed on an ABI Prism 3700 DNA Analyzer. Samples are injected into thin glass capillaries in the analyser that are filled with a stationary polymer, with one sample placed in each capillary. A voltage is then applied across the capillaries, causing the DNA fragments to migrate through the polymer, with the shorter fragments moving faster than the longer fragments. When the fragments reach the ends of their capillaries, they are carried across a transparent cuvette by a moving polymer that flows over the ends of the capillaries in a process called sheath flow. The moving polymer carries the DNA fragments through the path of a laser beam that makes their dye molecules fluoresce. The fluorescence is captured by a detector, which includes a charge-coupled device (CCD) camera. The CCD camera converts the fluorescence information into electronic information, which is then transferred to a computer workstation for processing. After the data are processed they are stored in the instrument database and displayed as an electropherogram. An electropherogram plots relative dye concentration (y-axis) against time (x-axis) for each of the dyes used to label the DNA fragments. Each peak in the electropherogram represents a single fragment. The analysed data are viewed with DNA sequencing analysis software.

2.3.7 Sequence Identification and Alignment

First, the ends of each sequence (the sequences of the primers) were identified within each electropherogram. The two fragments of the sequence were then assembled into a contig using AssemblyLIGN®. Mismatches were identified and electropherograms were re-examined to determine the correct sequence of the contig.

Through previous work a database containing a diverse array of retroviral sequence data had been established. This database contained a range of retroelement pol gene fragments obtained through PCR screening of diverse vertebrate taxa; a range of complete endogenous retroviral sequences identified through phylogenetic screening of the human genome project database; and the complete sequences of exogenous and endogenous viruses and retroelements obtained by other workers. Novel sequences obtained by PCR were aligned to this database using MacVector®. Alignment to the database returned a list of the best matches to the query sequence in the database, and a score for each of the sequences. Using this approach it was usually possible to determine immediately whether a sequence was viral in origin, and approximately which retroviral group or genus it was likely to be a member of. Sequences that showed little or no homology to the viral sequence data already present in the database were further investigated by alignment to public nucleotide sequence databases using the National Center for Biotechnology Information's BLAST (basic local alignment search tool) (Altschul et al, 1997).

The pro-RT sequences from novel ERVs were aligned with previously described Class II ERVs and exogenous retroviruses of Group II. Clustal W (Thompson et al, 1994) was used to generate an initial alignment of nucleotide sequences. This alignment provided a template for conceptual translation of ERV pseudogenes and the generation of an amino acid alignment. Amino acid and nucleotide alignments were then viewed together as a single data matrix and adjusted by eye to minimise mismatches. At regions of ambiguity, conceptual translation in all three frames, and comparison between all three potential gene products assisted alignment and the confident identification of insertions and deletions in ERV pseudogenes. Inserted nucleotides were excluded from the final alignment. In a few sequences regions were encountered that proved exceptionally difficult to align and these regions were represented as 'missing data' in the construction of phylogenies.
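Conceptual translation in all three forward frames, as used above to resolve ambiguous regions, can be sketched as follows. This is an illustration only: the alignments in this work were adjusted by eye, the function names are hypothetical, and the standard genetic code is assumed. Codons containing gaps or ambiguous bases are reported as 'X'.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> dict[int, str]:
    """Return the conceptual translation of `seq` in each forward frame."""
    return {frame: translate(seq, frame) for frame in range(3)}


if __name__ == "__main__":
    # Short example encoding the MDDI core of the RT domain 5 motif in frame 0.
    for frame, protein in three_frame_translation("ATGGATGATATT").items():
        print(frame, protein)
```

In an ERV pseudogene, the frame that yields a recognisable motif typically changes after an indel, which is how comparison between the three potential products helps to locate insertions and deletions.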

2.3.8 Phylogenetic Analysis

Phylogenetic analyses were performed using the maximum parsimony (MP) and neighbour-joining (NJ) methods implemented in PAUP4 (Swofford, 1998). A modified method of maximum parsimony analysis was used to accommodate the large number of taxa in the dataset. The method consisted of carrying out 30,000 random addition replicates with tree-bisection-reconnection (TBR) branch swapping, using an unordered matrix and holding only one tree in memory during each replicate. This generated minimum trees, which were then used as the starting point for a full heuristic search saving all optimal trees. The optimal trees from this search were then used to re-weight the data matrix (using the rescaled consistency index) and a heuristic search was conducted using this re-weighted matrix. The minimum tree obtained was then used as an input tree for a further round of searching in which all the characters were again weighted to unity. This process continued until there was no further reduction in tree length. Bootstrap values for this tree were generated using 500 random additions per replicate, again using an unordered matrix and holding only one tree in memory.
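The character weights used in the re-weighting step can be illustrated with the standard definition of the rescaled consistency index, RC = CI x RI, where CI = m/s and RI = (g - s)/(g - m) for a character with minimum possible steps m, observed steps s on the tree, and maximum possible steps g. The sketch below is not PAUP's implementation; the function name is hypothetical, and returning zero for characters with g = m (for which RI is undefined) is an assumption made here for convenience.

```python
def rescaled_consistency_index(min_steps: int, observed_steps: int, max_steps: int) -> float:
    """Rescaled consistency index (CI x RI) for a single character."""
    if not (0 < min_steps <= observed_steps <= max_steps):
        raise ValueError("expected 0 < min_steps <= observed_steps <= max_steps")
    consistency = min_steps / observed_steps
    if max_steps == min_steps:
        # RI is undefined for such characters; treated as zero in this sketch.
        return 0.0
    retention = (max_steps - observed_steps) / (max_steps - min_steps)
    return consistency * retention


if __name__ == "__main__":
    # A character that fits the tree perfectly keeps full weight.
    print(rescaled_consistency_index(1, 1, 4))
    # A homoplastic character is down-weighted.
    print(rescaled_consistency_index(1, 2, 3))
```

Characters that fit the current trees well therefore dominate the next heuristic search, while characters that change as often as they possibly could contribute nothing.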


2.3.9 Confirmation of ERV Amplification Products

PCR using specifically designed internal primers was used to confirm the origin of a given ERV fragment within a given DNA sample as opposed to another, contaminating DNA source. PCR was repeated using the specific primers and a separate source of the DNA in question. Successful amplification indicated that the ERV in question almost certainly originated from the sample. At least two separate DNA samples were available for each host taxon, although these samples sometimes originated from the same source of tissue. This meant that the only way in which a false confirmation could be achieved was if both DNA samples were contaminated, or the tissue sample itself was contaminated, which was considered highly unlikely.


2.4 RESULTS

2.4.1 Design of novel primer pairs

A Class II-specific primer pair had been designed previously and was available for use at the initiation of this work. The two primers making up the pair were PRO-universal, a 'universal' primer directed at a motif in the protease gene conserved across all retroelements, and Class II-RT1, a primer directed at a conserved motif in the RT gene specific to Class II retroviruses. However, this primer pair showed a tendency to cross-amplify Class I sequences, particularly in mammalian host taxa. Consequently, attempts were made to design a primer that amplified Class II sequences with significantly greater frequency.

In addition, since there was a particular interest in amplifying lentivirus and deltaretrovirus related sequences, attempts were made to design primers specific to each of these two genera. Primers were targeted at (1) the unique lentivirus motif 'DIG/KDAY' (see Figures 2.8 and 2.9), which is located within domain 3 of RT (Xiong and Eickbush, 1990), and (2) a lentivirus/deltaretrovirus specific variation of the 'Q/HYMDDI' motif in domain 5 of the RT gene. Five novel primers were designed in total. Figure 2.8 shows the motifs against which these primers were directed and their relative locations within the pol gene; the primer sequences are listed in Table 2.3. In all PCR reactions, a specific primer designed against the RT region of the chosen target genera was used in conjunction with the PRO-universal primer.

Each of the specific primers was used in conjunction with the universal PRO primer on a selection of four mammalian (sheep, goat, domestic cat, and horse) and four avian (peregrine falcon, king penguin, hermit thrush, Eastern screech owl) genomic DNA samples. The four mammalian DNAs were all extracted from lentivirus (Visna, CAEV, FIV, EIAV) infected tissue samples (see Appendix 2.2 for details). Amplification products were resolved on 1.7% agarose gels. Gel-resolved products in the expected size range were excised from agarose gels using a sterile scalpel blade, and purified for cloning. Following ligation and cloning of excised bands, transformed bacteria were cultivated on indicator plates and up to five colonies containing recombinant plasmids were selected for sequencing for each individually excised product. Following sequencing, alignment against a database of known sequences (see Methods, section 2.3.7) was used to determine the Class/Genus to which the amplified sequences belonged.

Figure 2.8 Positions of target motifs for primers within the PRO-RT region. [Schematic of the retroviral genome (gag, pol, env), with pol divided into PRO, RT, RH and IN; the marked intervals are ~400 bp and 800-1000 bp.]

Motif: LVDTGA; DIG/KDAY; Q/HYMDDI
Specificity: Universal retrovirus; Lentiviruses; Class II
ERV groups amplified (labels as they appear in the figure): All ERV groups; Class I; Non-viral sequences; Class II (not lenti/delta); Class III

Figure 2.9 An alignment showing the conserved motif DIG/KDAY in the lentivirus genome.

EIAV        LNKTVQVGTEISRGLPHPGGLIKCKHMTVLDIGDAYFTIPLDPEFRPYTAFTIPSINHQE
HIV-1_BH5   LNRRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNET
BIV         LNKITVKGQEFSTGLPYPPGIKECEHLTAIDIKDAYFTIPLHEDFRPFTAFSVVPVNREG
CaEV        LNKQTEDLTEAQLGLPHPGGLQKKKHVTILDIGDAYFTIPLYEPYREYTCFTLLSPNNLG
SIVmanGB1   LNKLTQDFHELQLGIPHPAGIKKCKRITVLDIGDAYFSIPLDPDYRPYTAFTVPSVNNQA

Table 2.3 Sequences of primers tested in the analysis. Brackets indicate degeneracy (i.e. (GT) = G or T)

Primer             Sequence
PRO                5'-GT(GT) TTI (GT)TI GA(CT) ACI GGI (GT)C-3'
CLASS II-RT 1      5'-ATI AGI A(GT)(AG) TC(AG) TCC AT(AG) TA-3'
CLASS II-RT 2      5'-AGI A(GT)(AG) TC(AG) TCC AT(AG) TA-3'
CLASS II-RT 3      5'-GAT GTC GTC CAT (AG)TA ITG-3'
Lenti/Delta-RT 1   5'-GTC GTC CAT GTA (CT)TG (AG)TA-3'
Lenti-Digday 1     5'-GAA GTA IGC GTC ICC (AGT)AT (AG)TC-3'

After the RT3 primer pair was shown to be significantly more efficient at amplifying Class II-related sequences than the others (see Figure 2.10), it was used exclusively for subsequent screening procedures. None of the primer pairs successfully amplified lentiviral or deltaretroviral sequences, even after considerable effort had been expended on modifying PCR parameters and primer sequences. Although viral sequences were amplified, these sequences all showed greater homology to beta- or alpharetroviruses than to either of the two complex Class II genera. The preference of lentivirus-directed primers for other Class II sequences may reflect the higher copy number of other Class II-related ERVs relative to lentivirus/deltaretrovirus proviruses in infected genomic DNA. This may even have been the case for genomic DNA samples obtained from cells infected in culture, in which lentiviral proviruses would presumably occur at relatively high copy number. Attempts to develop primers that amplified a region spanning from PR to IN were also unsuccessful.


Figure 2.10 Comparison of primer efficiency. [Bar graph. x-axis: primer (Class II-RT1, Class II-RT2, Class II-RT3, Lenti/Delta-RT1, Lenti-Digday1); y-axis: number of sequences amplified; series: Class I, Class II, Class III, non-viral, lenti/delta-related.] A bar graph showing the number of Class I, II, and III ERV, lentivirus and non-viral sequences amplified in a control experiment (the y-axis shows the number of sequences amplified for each class/group). Each of the specific primers was used in conjunction with the universal PRO primer on a selection of four mammalian and four avian genomic DNA samples (see above for details). The four mammalian DNAs were all extracted from lentivirus-infected mammalian tissue samples. Where amplification products in the correct size range were obtained, bands were excised and cloned. Five colonies were selected for each amplification product, and sequenced. Following sequencing, alignment against a database of known sequences (see Methods, section 2.3.7) was used to determine the Class/Genus to which the amplified sequences belonged. The total number of Class I, II, III, lentivirus and non-viral sequences was scored for each primer, and plotted as shown above. None of the primers successfully amplified lentivirus-related sequences. However, the Class II-RT3 primer was the most reliable Class II-specific primer, since the others showed a tendency to cross-amplify Class I and Class III ERV sequences.


2.4.2 Isolation of viruses

Ninety-six higher vertebrate taxa (50 mammalian and 46 avian) were screened for Class II ERVs using the PRO/RT3 primer pair (see Appendix 1 for sources of tissue/DNA). Prior to use, genomic DNA samples were electrophoresed through 0.5% agarose gels to assess the quality of the DNA. Some of the genomic DNA samples were in a degraded condition when they were received, as indicated by smeared or weak bands on agarose gels. Unsuccessful PCR amplification of retroviral inserts from these samples may have been due to their poor condition.

Figure 2.11 A photograph of an agarose gel showing PCR products and marker. Lane 1 - negative control; lane 2 - Ferruginous hawk; lane 3 - Goshawk; lane 4 - Eastern screech owl; lane 5 - Marsh harrier; lane 6 - 100 base pair ladder.

Amplification products for each PCR reaction were resolved on 1.7% agarose gels (Figure 2.11). Occasionally, two distinct bands were seen on the agarose gel. In these cases, both bands were excised and cloned separately. For successful PCR reactions, products of the expected size were cloned into bacteria and cultivated on indicator plates. At least five clones were picked from each plate and selected for sequencing. Initially up to 15 clones were selected, but after sequencing and preliminary phylogenetic analysis revealed that the vast majority of clones were members of a single viral lineage, the number was reduced. In total, 400 clones were sequenced, and 266 of these contained viral inserts. 186 of these showed homology to Class II retroviruses. In addition, 69 inserts showing homology to Class I ERVs were sequenced, along with 11 inserts showing homology to Class III ERVs.

In cases where more than one Class II-related ERV sequence was identified within a single host taxon, the sequences were usually more than 95% similar at the nucleotide level, and were not considered distinct viral taxa. However, 16 host taxa (13 avian and 3 mammalian - see Table 2.4) were found to harbour two or more Class II related sequences which were less than 75% similar. In these cases each of the ERV sequences was included in subsequent sequence alignments and phylogenetic analyses.
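The similarity criterion used to decide whether two sequences from the same host represent distinct viral taxa can be sketched as a pairwise identity calculation. The thesis does not state exactly how similarity was computed, so the sketch below makes several assumptions: sequences are already aligned to equal length, columns in which either sequence has a gap are skipped, and near-identical sequences are removed greedily. The function names and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical bases over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def distinct_sequences(seqs: dict[str, str], threshold: float = 95.0) -> list[str]:
    """Keep a sequence only if it is below `threshold` identity to all kept so far."""
    kept: list[str] = []
    for name, seq in seqs.items():
        if all(percent_identity(seq, seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    clones = {
        "clone_1": "ACGTACGTACGTACGTACGT",
        "clone_2": "ACGTACGTACGTACGTACGT",  # identical to clone_1
        "clone_3": "ACGTACGTACTTGGCCAATT",  # divergent
    }
    print(distinct_sequences(clones))
```

The result of a greedy pass depends on the order in which clones are considered, which matters little when, as here, sequences fall into clearly separated groups.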

Removal of closely related sequences from the dataset left 91 novel Class II ERV fragments. The 91 novel ERVs included 58 sequences derived from avian hosts, 32 from mammalian hosts, and a Class II ERV isolated from an amphibian (by Clare Lynch). Class II-related sequences were amplified from all avian orders investigated, and from all mammalian orders investigated except Chiroptera (bats), Pinnipedia (seals), and Scandentia (tree shrews). No Class II ERVs were isolated from these orders despite repeated attempts. Within the mammalian order Carnivora, neither canids (dogs) nor ursids (bears) yielded any Class II ERV sequences. Negative PCR results may simply have been the result of sub-optimal PCR conditions or genomic DNA extraction, and cannot confirm the absence of Class II sequences. Two additional sequences of rodent origin (IAP brown rat and MERV-1) were obtained by screening of online databanks. In total, PR-RT regions were identified from 93 novel Class II ERVs. Table 2.4 summarises the novel ERVs.

2.4.3 Confirmation of Fragment Origin

PCR with internal primers directed against specific internal motifs (see Section 2.3.9) confirmed the origin of the 93 amplified fragments discussed above. Two ERV fragments identified by PCR screening could not be confirmed and were excluded from subsequent analysis.

Table 2.4 Taxa screened for ERV insertions and viral fragments identified

Order / Family / Species: Retroviral isolate(s)

Class Aves

Anseriformes
  Anatidae (Swans, geese and ducks)
    White-fronted goose (Anser albifrons): RV-White-fronted goose I
    Redhead (Aythya americana): RV-Redhead I
    Brent goose (Branta bernicla): PCR Unsuccessful
    Wandering whistling duck (Dendrocygna arcuata): RV-Wandering whistling duck I
    North American black duck (Anas rubripes): RV-Black duck I, RV-Black duck II, RV-Black duck III, RV-Black duck IV
    Baikal teal (Anas formosa): PCR Unsuccessful
Apterygiformes
  Apterygidae (Kiwis)
    Brown kiwi (Apteryx australis): RV-Brown kiwi I
    Great spotted kiwi (Apteryx haasti): RV-Great spotted kiwi I
    Little spotted kiwi (Apteryx owenii): RV-Little spotted kiwi I, RV-Little spotted kiwi II
Casuariformes
  Casuariidae (Cassowaries)
    Cassowary (Casuarius casuarius): RV-Cassowary I*
  Dromaiidae (Emu)
    Emu (Dromaius novaehollandiae): RV-Emu I*
Charadriiformes
  Jacanidae (Jacanas)
    Bronze-winged jacana (Metopidius indicus): PCR Unsuccessful
Ciconiiformes
  Ciconiidae (Storks)
    Wood stork (Mycteria americana): PCR Unsuccessful
  Phoenicopteridae (Flamingos)
    Chilean flamingo (Phoenicopterus ruber chilensis): RV-Flamingo I, RV-Flamingo II
Columbiformes
  Columbidae (Pigeons)
    Wood pigeon (Columba palumbus): RV-Wood pigeon I
Falconiformes
  Accipitridae
    Goshawk (Accipiter gentilis): RV-Goshawk I
    Marsh harrier (Circus aeruginosus): RV-Marsh harrier I, RV-Marsh harrier II
    Red kite (Milvus milvus): PCR Unsuccessful
    Ferruginous hawk (Buteo regalis): RV-Ferruginous hawk I, RV-Ferruginous hawk II
  Cathartidae (New World vultures)
    Turkey vulture (Cathartes aura): RV-Turkey vulture I
  Falconidae (Typical falcons)
    Peregrine falcon (Falco peregrinus): RV-Peregrine falcon I
Galliformes
  Numididae
    Vulturine guineafowl (Acryllium vulturinum): RV-Guineafowl I, RV-Guineafowl II
  Phasianidae (Pheasants and quails)
    Gambel's quail (Callipepla gambelii): RV-Gambel's quail I, RV-Gambel's quail II
    Golden pheasant (Chrysolophus pictus): RV-Golden pheasant I
    Bobwhite quail (Colinus virginianus): PCR Unsuccessful
    Japanese quail (Coturnix japonica): RV-Japanese quail I
    Ring-necked pheasant (Phasianus colchicus): RV-Ring-necked pheasant I, RV-Ring-necked pheasant II
    Blue peacock (Pavo cristatus): RV-Blue peacock I, RV-Blue peacock II, RV-Blue peacock III
    Grey partridge (Perdix perdix): RV-Partridge W*
    Cabot's tragopan (Tragopan caboti): RV-Tragopan I
  Tetraonidae (Grouse)
    Black grouse (Lyrurus tetrix): RV-Black grouse I
Gaviiformes
  Gaviidae (Loons)
    Common loon (Gavia immer): RV-Common loon I, RV-Common loon II
Gruiformes
  Rallidae (Rails)
    Gray moorhen (Gallinula chloropus): RV-Gray moorhen I, RV-Gray moorhen II
    European coot (Fulica atra): PCR Unsuccessful
Passeriformes
  Muscicapidae (Thrushes and allies)
    Hermit thrush (Catharus guttatus): RV-Hermit thrush I, RV-Hermit thrush II, RV-Hermit thrush III, RV-Hermit thrush IV
    Mistle thrush (Turdus viscivorus): RV-Mistle thrush I
  Paridae (True tits)
    Blue tit (Parus caeruleus): RV-Blue tit I, RV-Blue tit II, RV-Blue tit III
  Corvidae (Crows)
    Common magpie (Pica pica): RV-Common magpie I, RV-Common magpie II, RV-Common magpie III
    Azure-winged magpie (Cyanopica cyana): RV-Azure-winged magpie I, RV-Azure-winged magpie II
  Ptilonorhynchidae (Bowerbirds)
    Regent bowerbird (Sericulus chrysocephalus): RV-Regent bowerbird III*
Piciformes
  Picidae (Woodpeckers)
    Green woodpecker (Picus viridis): RV-Green woodpecker I
  Ramphastidae (Toucans)
    Golden-collared toucanet (Selenidera reinwardtii): RV-Toucanette I, RV-Toucanette II
Rheiformes
  Rheidae (Rhea)
    Greater rhea (Rhea americana): RV-Greater rhea I
    Darwin's rhea (Pterocnemia pennata): RV-Darwin's rhea I, RV-Darwin's rhea II
Sphenisciformes
  Spheniscidae (Penguins)
    King penguin (Aptenodytes patagonicus): RV-King penguin I
Strigiformes
  Strigidae (Typical owls)
    Long-eared owl (Asio otus): PCR Unsuccessful
    Eastern screech-owl (Otus asio): RV-Eastern screech owl I, RV-Eastern screech owl II, RV-Eastern screech owl III
Struthioniformes
  Struthionidae (Ostriches)
    North African ostrich (Struthio camelus): RV-Ostrich I, RV-Ostrich II*
Tinamiformes
  Tinamidae (Tinamous)
    Elegant-crested tinamou (Eudromia elegans): RV-Elegant-crested tinamou I*

Class Mammalia

Artiodactyla
  Bovidae
    Bohor reedbuck (Redunca redunca): PCR Unsuccessful
    American bison (Bison bison): RV-Bison I
    Gemsbok (Oryx gazella): PCR Unsuccessful
    Oryx (Oryx oryx): PCR Unsuccessful
    Mountain goat (Oreamnos americanus): PCR Unsuccessful
    Musk ox (Ovibos moschatus): RV-Musk ox I
    Thinhorn sheep (Ovis dalli): PCR Unsuccessful
    Impala (Aepyceros melampus): PCR Unsuccessful
    Domestic sheep (Ovis aries): RV-Sheep I
    Domestic goat (Capra hircus): RV-Goat I
  Cervidae (Deer)
    Caribou (Rangifer tarandus): RV-Caribou I
    White-tailed deer (Odocoileus virginianus): RV-White-tailed deer I
  Camelidae (Camels and Llamas)
    Llama (Lama glama): PCR Unsuccessful
  Giraffidae (Giraffe and Okapi)
    Giraffe (Giraffa camelopardalis): RV-Giraffe I
Carnivora
  Canidae (Dogs)
    Coyote (Canis latrans): PCR Unsuccessful
  Felidae (Cats)
    Cougar (Felis concolor): RV-Cougar I
    Domestic cat (Felis catus): RV-Domestic cat I
  Ursidae (Bears)
    American black bear (Ursus americanus): PCR Unsuccessful
  Mustelidae (Weasels and relatives)
    Pine marten (Martes martes): PCR Unsuccessful
    Small Indian mongoose (Herpestes javanicus): RV-Small Indian mongoose I, RV-Small Indian mongoose II, RV-Small Indian mongoose III
    Chinese ferret badger (Melogale moschata): RV-Chinese ferret badger I, RV-Chinese ferret badger II
Cetacea
  Delphinidae (Dolphins)
    Risso's dolphin (Grampus griseus): RV-Risso's dolphin I, RV-Risso's dolphin II
    Common dolphin (Delphinus delphis): RV-Common dolphin I
    White-beaked dolphin (Lagenorhynchus albirostris): RV-White-beaked dolphin I
    Atlantic white-sided dolphin (Lagenorhynchus acutus): RV-Atlantic white-sided dolphin I
    Striped dolphin (Stenella coeruleoalba): RV-Striped dolphin I
    Bottle-nosed dolphin (Tursiops truncatus): RV-Bottle-nosed dolphin I
Chiroptera
  Pteropodidae (Flying foxes)
    Hairy-legged vampire bat (Diphylla ecaudata): RV-Hairy-legged vampire bat I, RV-Hairy-legged vampire bat II
Edentata
  Dasypodidae (Armadillos)
    Three-banded armadillo (Tolypeutes matacus): RV-Three-banded armadillo I, RV-Three-banded armadillo II
Insectivora
  Erinaceidae (Hedgehogs and moonrats)
    European hedgehog (Erinaceus europaeus): RV-Hedgehog I
Lagomorpha
  Leporidae (Rabbits and hares)
    European rabbit (Oryctolagus cuniculus): RV-Rabbit I
Marsupialia
  Macropodidae (Kangaroos and wallabies)
    Red kangaroo (Macropus rufus): RV-Red kangaroo I
  Dasyuridae (Marsupial carnivores)
    Stripe-faced dunnart (Sminthopsis macroura): RV-Stripe-faced dunnart I*
Monotremata
  Ornithorhynchidae (Platypus)
    Duck-billed platypus (Ornithorhynchus anatinus): RV-Duck-billed platypus I*
    Short-beaked echidna (Tachyglossus aculeatus): RV-Echidna I
Pinnipedia
  Odobenidae (Walrus)
    Walrus (Odobenus rosmarus): PCR Unsuccessful
  Phocidae (True or hair seals)
    Grey seal (Halichoerus grypus): RV-Grey seal I
  Otariidae (Eared seals)
    Northern fur seal (Callorhinus ursinus): RV-Northern fur seal I
Primates
  Lorisidae (Bush babies, lorises and pottos)
    Slow loris (Nycticebus coucang): RV-Slow loris I
  Cercopithecidae
    White-epauletted black colobus (Colobus angolensis): RV-Colobus I
  Pongidae
    Orang-utan (Pongo pygmaeus): RV-Orang-utan I
Rodentia
  Muridae
    African grass rat (Arvicanthis ansorgei): RV-Grass rat I, RV-Grass rat II
    House mouse (Mus musculus): MERV-1 (AC026385)
    Shrew mouse (Mus pahari): RV-Shrew mouse I
    Rice rat (Oryzomys intermedius): RV-Rice rat I
    Multimammate rat (Mastomys huberti): RV-Multimammate rat I
    Bismark giant rat (Uromys neobritannicus): RV-Bismark giant rat I
    Yemeni mouse (Myomys yemeni): RV-Yemeni mouse I
    Brown rat (Rattus norvegicus): IAP Brown rat (AC094407)
  Sciuridae
    Indian provost squirrel (Callosciurus prevosti): RV-Indian provost squirrel I
    Prairie dog (Cynomys ludovicianus): RV-Prairie dog I
Scandentia
    Tree shrew (Tupaia belangeri): PCR Unsuccessful

Class Amphibia

Gymnophiona (Caecilians)
  Caeciliidae
    Boulengerula boulengeri: RV-Caecilian (Bou)+

Class II viral fragments shown in blue, Class I viral fragments shown in pink, Class III viral fragments shown in green.
* Amplified by Jo Martin.
+ Amplified by Clare Lynch.

2.4.4 Sequence Alignment

Novel ERV sequences identified by PCR screening were aligned with previously characterised Class II ERVs. The previously characterised sequences included two alpharetroviruses, nine betaretroviruses, two deltaretroviruses, six lentiviruses, and several Class II ERV sequences. Initially, ten HERV.K subgroups (HML-1 to HML-10) were included in the alignment, but this was reduced to four after seven of the subgroups were shown to represent a single monophyletic lineage in bootstrapped NJ trees. Details of the previously described Class II sequences included in the alignment are shown in Table 2.5.

Novel ERV sequences were conceptually translated using MacVector (see Methods, section 2.3.7) and an alignment of the inferred amino acid sequences was prepared. Regions where there was no clear homology between sequences, or where homology could not be unambiguously identified, were excluded from the alignment. The amino acid alignment was subsequently used as a template to generate an alignment of ERV nucleotide sequences (see Appendix 2). The final alignment contained 122 Class II retrovirus taxa, aligned across 907 base pairs.
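The use of an amino acid alignment as a template for a nucleotide alignment can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure used in this study: the function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes an intact reading frame, which will not hold for ERV copies that carry frameshifts.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Map an ungapped coding sequence onto its aligned protein sequence.

    Each residue in the aligned protein consumes one codon (3 nt) from the
    nucleotide sequence; each gap in the protein becomes a 3-nt gap.
    Assumption: the reading frame is intact (no frameshifts).
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) != 3 * residues:
        raise ValueError("nucleotide length must be 3 x number of residues")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(out)


# Hypothetical toy data (not sequences from this study).
protein_alignment = {"seqA": "MG-YK", "seqB": "MGWY-"}
coding = {"seqA": "ATGGGTTATAAA", "seqB": "ATGGGATGGTAT"}
codon_alignment = {
    name: thread_codons(protein_alignment[name], coding[name])
    for name in protein_alignment
}
```

Because gaps are inserted in multiples of three, the resulting nucleotide alignment keeps homologous codons in the same columns.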

Alignment of amplified sequences revealed indels of up to 23 bp in several sequences. Regions that could not be unambiguously aligned were omitted from the final alignment. Sequences tended to be most variable at positions between the pro and pol coding domains. Appendix 3 summarises the characteristics of the novel Class II ERVs identified in this study. The 5' end of some of the Class II HERV sequences (HERV.K.HML-7 and HERV.K.HML-8) appeared to have been deleted, their sequence having been replaced by sequences that showed no apparent homology to viral sequences.


All exogenous Class II retroviruses have a -1 frameshift between the pro and pol coding domains. In all but 8 of the novel Class II ERV sequences, alignment and conceptual translation suggested that a -1 frameshift was required for translation of pol.

Table 2.5 Previously described Class II retroviruses included in this analysis

Retrovirus | Fully sequenced? | Accession number | Reference

Alpharetrovirus
Avian leukosis virus subgroup J (ALV-J) | Yes | NC001408 | Bieth and Darlix (1992)
Rous sarcoma virus (RSV) | Yes | NC001407 | Petropoulos (1997)

Betaretrovirus
Simian retrovirus 1 (SRV-1) | Yes | M11841 | Power et al (1986)
Simian retrovirus 2 (SRV-2) | Yes | M16605 | Grant et al (1995)
Mason-Pfizer monkey virus (MPMV) | Yes | M12349 | Sonigo et al (1986)
Squirrel monkey retrovirus type H (SMRV-H) | Yes | M23385 | Oda et al (1988)
Simian endogenous retrovirus (SERV baboon) | | U85505 | van der Kuyl et al (1997)
Mus musculus type D-like endogenous retrovirus (MusD) | | AF246633 | Mager and Freeman (2000)
Trichosurus vulpecula endogenous retrovirus (TvERV) | Yes | AF224725 | Baillie and Wilkins (2001)
Jaagsiekte sheep retrovirus (JSRV) | Yes | M80216 | York et al (1992)
Mouse mammary tumour virus (MMTV) | Yes | M15122 | Moore et al (1987)

Deltaretrovirus
Bovine leukemia virus (BLV) | Yes | K02120 | Sagata et al (1985)
Human T-cell leukemia virus type 1 (HTLV-1) | Yes | AF033817 | Petropoulos (1997)

Lentivirus
Human immunodeficiency virus type 1 (HIV-1) | Yes | M38431 | Petropoulos (1997)
Simian immunodeficiency virus (African green monkey strain) (SIVagm) | Yes | SIU58991 | Petropoulos (1997)
Feline immunodeficiency virus (FIV) | Yes | M25381 | Talbott et al (1989)
Caprine arthritis/encephalitis virus (CAEV) | Yes | CEAVCG | Saltarelli et al (1990)
Ovine maedi-visna virus (OMVV) | Yes | M10608 | Sonigo et al (1985)
Equine anemia virus (EIAV) | Yes | M16575 | Rushlow et al (1986)

HERV.K superfamily
HERV.K.HML-2 | Yes | M14123 | Ono (1986)
HERV.K.HML-5 (in HGP contig) | Yes | AP000870 | Andersson et al (1999)
HERV.K.HML-6 | Yes | AF069508 | Yin et al (1999)
HERV.K.HML-9 (in HGP contig) | Yes | AC025569 | Andersson et al (1999)

IAP Elements
IAP mouse | Yes | M17551 | Mietz et al (1987)
IAP Chinese hamster | No | M34951 | Anderson et al (1990)
IAP Syrian hamster | Yes | M10134 | Ono and Ohishi (1983)

Unclassified viruses
Lymphoproliferative disease virus (LDV) | Yes | U09568 | Chajut et al (1992)
Murine ERV U1 (MuERV-U1) | Yes | AC005817 | Benit et al (2001)
PERV yl (referred to here as PigD) | No | AF274705 | Patience et al (2001)

RNA secondary structure immediately downstream of the frameshifting site was investigated using prediction programs (Zuker et al, 2000); however, no conservation in secondary structure could be identified. There appeared to be as many diverse secondary structures as there were sequences.

The nucleotide composition across the entire aligned region of the genome was computed in PAUP (Swofford, 1998). As discussed in section 2.1.4, lentiviruses are A-rich and C-poor across the entire genome, whereas deltaretroviruses are C-rich and G-poor across the entire genome, and across the region aligned in this study. In alpharetroviruses and betaretroviruses, the relative proportion of A, T, G, and C is more even across the aligned region. The average nucleotide compositions of the novel ERVs identified by PCR screening showed a similarly even distribution of the four nucleotides. This remained the case when the novel ERVs were split into groups of related viruses (see section 2.4.7, Phylogenetic Analysis), or divided according to host class, and the groups analysed separately (Figure 2.14). None of the groups showed nucleotide compositions similar to deltaretroviruses or lentiviruses. It is not clear whether this observation has any biological significance.
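The kind of calculation involved can be sketched in a few lines of Python. This is a hedged illustration rather than a description of how PAUP computes base frequencies: the function name `base_composition` and the toy sequences are hypothetical, and the sketch pools bases across all sequences in a group, ignoring gaps and ambiguity codes, which is only one of several reasonable ways to average.

```python
from collections import Counter


def base_composition(sequences):
    """Pooled proportions of A, C, G and T across a group of aligned sequences.

    Gap characters and ambiguity codes (anything other than A, C, G, T)
    are skipped. Assumption: pooling bases over the whole group is an
    acceptable stand-in for an 'average' composition.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: counts[base] / total for base in "ACGT"}


# Hypothetical toy group (not sequences from this study).
group = ["AAAC-GT", "AANTTGC"]
composition = base_composition(group)
```

An A-rich, C-poor group such as the lentiviruses would show a markedly higher value for A than for C, whereas an even composition gives values close to 0.25 for all four bases.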

2.4.5 g-patch domain

Alignment revealed that length differences between sequences were mainly due to the presence or absence of an approximately 43 amino acid region immediately prior to the site of ribosomal frameshifting. Comparison of this sequence with the Pfam database revealed highly significant matches to the characteristic g-patch domains found in many RNA-binding proteins and thought to have an RNA-binding function (Figure 2.12). The g-patch domain has previously been identified in the primate betaretroviruses (Aravind and Koonin, 1999). In these viruses, the domain is cleaved from Pro to generate a discrete RNA-binding protein called p5 (see Figure 2.3): the viral aspartyl proteinase is reported to cleave and release the portion of the polyprotein that consists primarily of the g-patch domain as a small, stand-alone protein. The function of the protein is unknown, although it might function in splicing or transport of the intron-containing mRNAs of these retroviruses (Hruskova-Heidingsfeldova et al, 1995).


Figure 2.12 Alignment of retroviral g-patch domain with other g-patch domains

Pfam Access# Q9TYS1       TGGIGRLMLEKMGWRPGEGLGKDATGNLEPLMLDVKSDRKGLI
Pfam Access# Q9VHBO       TGGMGMALLQKMGWKPGEGLGRCKTGSLQPLLLDVKLDKRGLV
Pfam Access# SON_HUMAN    TGGMGAVLMRKMGWREGEGLGKNKEGNKEPILVDFKTDRKGLV
Pfam Access# LU15_HUMAN   HSNIGNKMLQAMGWREGSGLGRKCQGITAPIEAQVRLKGAGLG
Pfam Access# Q14136       SDNIGSRMLQAMGWKEGSGLGRKKQGIVTPIEAQTRVRGSGLG
Pfam Access# 001691       ESNIGNRLLKSMGWKEGQGVGKHAQGIVNPIEAERFVQGAGLG
Pfam Access# Q9VPY9       SSNVGSRLLQKMGWSEGQGLGRKNQGRTQIIEADGRSNYVGLG
Pfam Access# 094585       NNGKGKQLLEMMGWSRGKGLGSENQGMVDPVVAVVKNNKQGLH
Pfam Access# Q9XWG6       SGNVGFKLLKSMGWSEGQGLGKEKQGHVEPVATEVKNNRKGLG
Pfam Access# Q22705       DQKLSKKLMEKMGWSEGDGLGRNRQGNADSVKLKANTSGRGLG
Pfam Access# Q9SH99       KDSAAFKLMKSMGWEEGEGLGKDKQGIKGYVRVTNKQDTSGVG
HERVK family HERV.K.HML2  TSQ--KIMT-KMGYIPGKGLGKNEDGIKVPVEAKINQEREGIG
HML-6 family HERV.K.HML6  *NVK-ENGIQSGKGLGKPLQGNPDPISITGQTER-GVI
Type-A genus IAPSHamster  QAS AIMA-KMGYTNGRGLGRQEQGRIKPITQHGNRGRKGLG
Type-A genus IAPCHamster  TSQ GIMK-RMGYSPRPGLGKHLQGRTSPINSQLRPKNLGLG
Type-A genus IAPNRat      GASOKIM--IPGKGIGKSLQGRTSPITSPTERRESGLG
Type-A genus IAPMouse     KAK NIMA-KMGYKEGKGLGHQEQGRIEPISPNGNQDRQGLG
Type-A genus B.G.RatI     QAH KMMT-RMGYEEGQGLGSKEQGRLQPIPQIKHEGRRGLG
Type-A genus MRatI        SAQ HMMQ-DMGYAPGEGIGKYLQGRKSPIPVK*RQKRQGLG
Type-A genus Yemenimouse  SAQ HIMR-DMGYVPGFGIGKYLQGWRSPISAQQRQKRQGLG
Type-A genus PrairiedogI  VAQ KIMQ-EMGYRPGLGLGKTLQGLKHPLEPQQKFNRSGLG
Type-A genus Grass.rat.I  QSQ NMMQ-GMGYRPGKGLGKNLQGSPDVITTLPKHDRTGLG
Type-A genus RabbitI      -VQ SMLQ-KMGYVAGKGLGVQLQGRSSPIELKQKPDRTGLG
Type-D genus WTDeerI      NDAVLQMLL-N-GLLPNQGLGKNGETNLSPVQTKTLPLRSGLR
Type-D genus CaribouI     NDAVSQMLL-NQGLLPNQGLGKNGEGNLSPTQNKTLAFRSGLG
Type-D genus DolphinI     NSQVSNMML-DQGFLPTKGLGTNQQGTVSPIDVKIKNDRQGLG
Type-D genus Jaagsiekte   SPTVTDLML-DQGLLPNQGLGKQHQGIILPLDLKPNQDRKGLG
Type-D genus TvERV        SSVVTEQML-SQGFLPRQGLGKNKQGITQPLHIQSHPDRSGLG
Type-D genus SMRVH        NDTVMTQML-SQGYLPGQGLGKNNQGITQPITITPKKDKTGLG
Type-D genus Goat II      NALVTQQML-CQGFIPGKGLGRDKQGTIQPINLSPKTDRSGLG
Type-D genus LorisI       NETITHQML-KQGFCAGQGLGKYSQGIKEPIQITSNLNRAGLG
Type-D genus MongooseII   NEVITCQML-NQGFRPRQGLGKYSQGIKEPIKLKNNDNASA-G
Type-D genus Ostrich-D    NEVITQQML-NLCFLLGQGLGKSNQGIKQPLPVTPKNNRSGVG
Type-D genus MusD         KEMVTEQTF-RQGPLPDHGLIKKGQEIKTFKGLKPHSNVRGLK
Type-D genus MPMV         NDIVTAQML-AQGYSPGKGLGKKENGILHPIPNQGQSNKKGFG
Type-D genus SRV-1        SDIVTAQML-AQGYSPGKGLGKNENGILHPIPNQGQFDKKGFG
Type-D genus SRV-2        NDIVTAQML-AQGYSPGKGLGKREDGILQPIPNSGOLDRKGEG
Type-D genus BabboonSERV  NDIVIAQML-TOGYTPGKGLGKRENGIPQPILVSGQFDKKGEG
Type-D genus ColobusI     NDIVTAQML-AQGYHPGKGLGKREDGILQPIPAIGKLNKRGLG
Type-D genus MongooseI    NDLVAAQML-TQGYKPDKGIEINKDSITQPVEVLNNHHTGGIE
Type-B genus BisonI       DEKVTSQML-HMGYDPSKGLGKQQEGIIEPICPTPRQLHTGLG
Type-B genus MuskoxII     DDKVTSQML-QMGYDPSKGLGKQQTGIIEPICPTPRKLRAGLG

Figure 2.13 Distribution of g-patch domain across class II taxa

[Schematic not reproduced. For each viral lineage, the figure shows the genomic organisation of the protease/RT region (PRO, g-patch sequence, RT; 5' to 3'). Lineages shown: Deltaretroviruses, Lentiviruses, HERV.K.HML-2, HERV.K.HML-6, Betaretroviruses*, IAP Elements, MMTV, Alpharetroviruses, HERV.K.HML-5, HERV.K.HML-9.]

* excluding MMTV


The g-patch domain was present in all betaretroviruses, except MMTV. MMTV contains a short region of sequence in the appropriate location that aligns with the 5' end of the g-patch sequence found in other betaretroviruses. The g-patch domain was also present in all IAP-related retroviruses, and all HERV.K-related sequences except the HML-9 and HML-5 groups (Figure 2.13). Although regions of approximately similar length and position to the g-patch domain were observed in some other mammalian retroviruses (RV-Hedgehog I, RV-Grass rat II, RV-Chinese ferret badger I, RV-Shrew mouse I, RV-Rice rat I, RV-Three banded armadillo II, MuERV-U1, and MERV-1), the region in these sequences proved too difficult to align and was represented as missing data in the final alignment.

2.4.6 Nonsense mutations (stop codons and frameshifts)

ERV pseudogenes that are not under selection accumulate nonsense mutations (stop codons and frameshifts) that interrupt the viral reading frame. Assuming that rates of mutation are constant and relatively even in different host lineages, the number of nonsense mutations across the reading frame of an ERV insertion can give an approximate indication of its age. Alignment of Class II ERV sequences revealed that many were intact across the entire amplified region. The nucleotide alignment (Appendix 3) was used to calculate the number of nonsense mutations (stop codons and frameshifts) in the most intact insertion of each novel ERV identified by PCR screening. Of the 99 novel ERVs, 33 had insertions that were completely intact, with no stop codons or frameshifts. In 26, the most intact insertion had only one interruption across the amplified region, and in 17, only two. Only ~25% of the novel ERVs identified lacked an insertion with two or fewer mutations interrupting the viral reading frame. This suggests that a large proportion of the Class II ERVs identified have been active relatively recently.
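The counting procedure can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the method used to produce the figures reported here: the names `count_nonsense` and `most_intact` are hypothetical, the aligned sequence is assumed to start in frame with codons occupying consecutive column triplets, and a frameshift is approximated as an internal gap run whose length is not a multiple of three.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_nonsense(aligned_seq, gap="-"):
    """Return (stop_codons, frameshifts) for one sequence in a codon-based alignment.

    Assumptions: column 0 is the first codon position, and insertions
    relative to the frame appear as gaps in the other sequences.
    """
    seq = aligned_seq.upper()
    # Stop codons: complete, ungapped triplets read in the alignment frame.
    stops = sum(
        1 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS
    )
    # Frameshifts: internal gap runs whose length is not a multiple of three.
    # Leading and trailing gaps are treated as missing data and ignored.
    core = seq.strip(gap)
    shifts = sum(
        1 for run in re.findall(re.escape(gap) + "+", core) if len(run) % 3
    )
    return stops, shifts


def most_intact(insertions):
    """Pick the copy of a virus with the fewest nonsense mutations."""
    return min(insertions, key=lambda s: sum(count_nonsense(s)))


# Hypothetical toy copies of one virus (not sequences from this study).
copies = ["ATGTAAGG-CCC", "ATGGGTTATAAA"]
best = most_intact(copies)
```

Taking the most intact copy, as in the text, means that a virus is scored by its youngest-looking insertion rather than by an average over all copies recovered.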

Ongoing work (carried out by Jo Martin) within the Retrovirus Evolution Group at Silwood Park has involved extensive PCR screening of mammalian taxa for MLV-related ERV fragments. As a result of this work, 318 MLV-related ERV fragments have been isolated, and the number of stop codons and frameshifts across the amplified region has been calculated from a nucleotide alignment (again taking the most intact copy of a particular virus where multiple copies had been obtained). The region of the viral genome amplified in that work (PR to RT, see Figure 2.8) was the same as that amplified here from Class II ERVs.

Figure 2.15 shows the average number of stop codons, and the average number of frameshifts, across the entire amplified region of (1) 101 novel Class II ERVs, and (2) 318 MLV-related ERVs amplified using the same basic screening technique, but primers specific for MLV-related sequences. There are fewer stop codons and frameshifts, on average, in the Class II ERV dataset. The plot in Figure 2.16 compares the relative proportion of ERVs with a given number of nonsense mutations in the Class I and Class II datasets (see also Figure 2.17, Figure 2.18 and Figure 2.19, and Section 2.4.10). A noticeably greater proportion of the Class II ERVs were completely intact or nearly intact across the amplified region, whereas a larger proportion of Class I ERVs had 3-11 interruptions in the viral reading frame.
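The distributions compared in these plots are simple proportions, as the following Python sketch shows. The function name `proportion_by_count` and the input values are hypothetical and chosen only for illustration; they are not the counts from either dataset.

```python
from collections import Counter


def proportion_by_count(nonsense_counts):
    """Proportion of viruses having each number of nonsense mutations.

    `nonsense_counts` holds one integer per virus (stop codons plus
    frameshifts in its most intact copy). Every value from 0 up to the
    observed maximum is reported, so two datasets can be plotted on a
    shared x-axis.
    """
    if not nonsense_counts:
        raise ValueError("at least one virus is required")
    tally = Counter(nonsense_counts)
    n = len(nonsense_counts)
    return {k: tally[k] / n for k in range(max(nonsense_counts) + 1)}


# Hypothetical counts for four viruses (not data from this study).
example = proportion_by_count([0, 0, 1, 3])
```

Expressing each dataset as proportions rather than raw counts is what allows datasets of different size (here 101 and 318 sequences) to be compared directly.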

The distribution of nonsense mutations in pol sequences identified by PCR screening suggested that a greater proportion of Class II ERV diversity has been established as the result of recent activity. However, stop codon and frameshift counts derived from PCR screening can provide only a very approximate measure of recent retroviral activity. It is not possible to quantify the sampling error associated with the PCR-based screening method. Mutation rates are likely to vary across different host taxa and may not be constant over time (Purvis, 1995). Furthermore, the shorter generation time of some taxa may contribute to a significantly higher 'clock speed' in some host lineages.

Figure 2.14 Comparison of nucleotide composition across aligned region in different class II groups

[Plot not reproduced. x-axis: nucleotide (A, C, G, T); y-axis scale: 0 to 0.45; series: Deltaretroviruses, Lentiviruses, Mammal ERVs, Avian ERVs.]

Figure 2.14 Average nucleotide compositions across the region of the retroviral genome aligned in this study. Nucleotide compositions are plotted for deltaretroviruses, lentiviruses and for class II ERVs isolated from mammals and from birds. Lentiviruses are A-rich and C-poor across the aligned region, whereas deltaretroviruses are C-rich and G-poor. The average nucleotide compositions of the novel ERVs identified by PCR screening show a more even distribution of the four nucleotides. The biological significance of these differences is unclear.

Figure 2.15 Average number of stop codons and frameshifts in Class I and Class II ERV fragments obtained by PCR screening

[Bar graph not reproduced. x-axis: Class I, Class II; y-axis scale: 0 to 2.5; series: Stop codons, Frameshifts.]

Figure 2.15 Bar graphs showing the average number of stop codon and frameshift mutations in class I and class II ERVs amplified by PCR screening. On average, class II retroviruses had fewer stop codons and fewer frameshifts, suggesting that they may have radiated more recently than their class I counterparts.


Figure 2.16 Plot comparing the distribution of nonsense mutations in Class I and Class II ERV fragments obtained by PCR screening

[Plot not reproduced. x-axis: Number of nonsense mutations (0 to 13); y-axis scale: 0 to 0.4; series: Class I, Class II.]

Figure 2.16 PCR screening has been used to sample for both Class I and Class II ERVs. For each of the novel ERVs identified by this means, the number of nonsense mutations (stop codons and frameshifts) has been calculated. The plot above shows the proportion of viruses with a given number of nonsense mutations, for both the Class I and Class II lineages. A markedly higher proportion of the Class II ERVs obtained by PCR screening had no nonsense mutations, or only one, suggesting that a higher proportion of Class II ERVs are relatively recent insertions.

Figure 2.17 Plot comparing the distribution of nonsense mutations within Class II ERV subgroups

[Plot not reproduced. x-axis: Number of nonsense mutations (0 to 13); y-axis scale: 0 to 0.45; series: Avian Class II supergroup, Beta/IAP mammalian Class IIs, Basal mammalian Class IIs.]

Figure 2.17 The plot above shows the proportion of viruses with a given number of nonsense mutations within Class II. Intact sequences are more common amongst Betaretrovirus/IAP-related elements, suggesting that these groups may have been active more recently than mammalian Class II viruses that do not fall within the Betaretrovirus or IAP element clades.


Figure 2.18 Stop codons and frameshifts in subgroups of Class I and Class II retrovirus

[Bar graph not reproduced. x-axis: Average # of nonsense mutations (0 to 3); series: Stop codons, Frameshifts. Class I subgroups: TYPE II Mammalian MLVs, TYPE I Mammalian MLVs, Basal non-MLVs, TYPE I Avian MLVs. Class II subgroups: Basal Mammalian Class IIs, Betaretroviruses and IAP-related Class IIs, Avian Class IIs.]

Figure 2.18 Bar graph showing the average number of stop codons and frameshifts within subgroups of Class I (MLV-related) and Class II retrovirus.

Figure 2.19 Plot comparing the distribution of nonsense mutations in Class I and Class II ERV subgroups

[Plot not reproduced. x-axis: Number of nonsense mutations (0 to 12); y-axis: proportion of viruses (0 to 0.45); series: Class I (TYPE I Mammalian MLVs, TYPE II Mammalian MLVs) and Class II (Avian Class II ERVs, Betaretrovirus and IAP-related ERVs).]

2.4.7 Phylogenetic Analysis

The nucleotide alignment described in section 2.4.4 provided the basis for the construction of NJ and MP phylogenies. Figure 2.20 shows an NJ phylogram, Figure 2.21 shows a bootstrapped NJ tree, and Figure 2.22 shows a bootstrapped strict consensus of six MP trees of equal length. Bootstrap support over 50% is indicated on the trees.
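The resampling step that underlies bootstrap support values can be sketched as follows. This is a generic illustration of the non-parametric bootstrap, not a description of the PAUP settings used here; the function name `bootstrap_alignment` and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudoreplicate of an alignment.

    Columns are sampled with replacement until the replicate has the same
    length as the original; every taxon receives the same sampled columns.
    `alignment` maps taxon name to aligned sequence.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment (not data from this study).
toy = {"a": "ACGT", "b": "AGGT", "c": "TCGA"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

A tree is then built from each replicate, and the support for a clade is the percentage of replicate trees in which that clade appears; clades recovered in fewer than 50% of replicates are collapsed in a bootstrapped consensus.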

Phylogenetic analysis supported most of the established groupings (the four exogenous Class II genera, and the lineages of Class II ERVs discussed in the introduction). Bootstrapping supported the inclusion of 29 of the newly identified ERV fragments into established genera in NJ trees, and 26 in a bootstrapped consensus of six MP trees, although in some cases bootstrap support was relatively weak (~50-65%). The HERV.K superfamily resolved into four distinct subgroups, comprising the HERV.K.HML-2, HERV.K.HML-5, HERV.K.HML-6, and HERV.K.HML-9 lineages. No novel ERV sequences emerged that split the lentiviruses or deltaretroviruses, although parsimony trees incorporated divergent hedgehog and Nile rat sequences into the lentivirus clade in a basal position. In all trees, the deltaretroviruses were placed basal to the other groups. Previous reconstructions of retroviral and retroelement relationships have also placed the deltaretroviruses in this position (Doolittle et al, 1989; Herniou et al, 1998; Tristem, 2000; Katzourakis and Tristem, in press). Phylogenetic reconstruction thus suggests that the deltaretroviruses are the basal or root taxon within the Class II retrovirus lineage.

In both NJ and MP trees, Class II ERVs showed a strong tendency to cluster together according to host class. Figure 2.23 shows viral host class mapped onto a bootstrapped consensus of the six shortest trees obtained in MP analysis. Although many of the deeper relationships within the tree are not supported by bootstrapping, the tendency to cluster according to host class is striking, and seems to suggest that interclass horizontal transmission of Class II retroviruses between mammals and birds is relatively rare. In NJ trees, mammalian retroviruses form a polyphyletic group with the avian retroviruses (which form a paraphyletic group as a consequence), and mammalian retroviruses tend to constitute the most basal lineages within the tree. The two complex genera of

Figure 2.20 NJ Phylogram

[Tree not reproduced. Scale bar: 0.05 changes.]

Figure 2.21 Bootstrapped Neighbour Joining Tree Showing Relationships of Novel ERVs

[Tree not reproduced. Legend (Class II subgroups): Deltaretrovirus, Lentivirus, Betaretrovirus or IAP-related, HERV.K superfamily, Avian Class II A, LDV-related viruses, Alpharetrovirus, Avian Class II B, Novel divergent ERVs.]

Figure 2.22 Strict consensus of 6 maximum parsimony trees

[Tree not reproduced. Legend (Class II subgroups): Lentivirus, Betaretrovirus or IAP-related, HERV.K superfamily, Avian Class II B, Avian Class II A, LDV-related viruses, Alpharetrovirus, Deltaretrovirus, Ungrouped ERVs.]

Figure 2.23 Strict consensus of 6 maximum parsimony trees showing viral host class origin

[Tree not reproduced. Legend (host class): Mammalian, Avian, Amphibian; exogenous viruses are marked separately.]

exogenous Class II retroviruses (deltaretroviruses and lentiviruses) appear as divergent lineages towards the presumed root of the tree. In MP trees the deltaretroviruses are also the most basal lineage. The other mammalian retroviruses tend to cluster together to form a monophyletic clade.

Bootstrapping collapsed the deep structure of NJ and MP trees, leaving a large polytomy within which several smaller clades were retained, including both recognised Class II groups and groups comprised entirely of novel ERVs. Bootstrap support within many of these clades was high, though some contained only two or three taxa. The emergence of well-supported clades comprised entirely of novel ERVs suggested that the diversity of Class II ERVs might be greater than previously recognised. Resolving the ancestral relationships between Class II groups may require further sampling of Class II diversity. In addition, the alignment of a longer region of the retroviral genome (e.g. PR-IN or PR-RH) may lead to a greater degree of resolution in phylogenies. It remains possible, however, that Class II retroviruses have undergone an extensive radiation relatively recently, generating evolutionary relationships which are close to polytomy, and will be extremely difficult to resolve.

2.4.8 The status of recognised Class II groups

Alpharetroviruses

Screening led to the identification of several ERV fragments that clustered together with alpharetroviruses (ALV-J and RSV) in phylogenies. In NJ trees seven novel sequences were retained in a monophyletic clade with the alpharetroviruses after bootstrapping. Three of these seven grouped together with the alpharetrovirus sequences in MP trees. ALV and RSV are most closely related to two ERV sequences found infecting Galliform birds (RV-Guineafowl II and RV-Cobot's Tragopan I). In NJ trees, a Class II ERV fragment isolated from an amphibian (RV-caecilian (Bou)) clustered together with the alpharetroviruses. Divergent strains of ALV/RSV-related retroviruses have been identified and classified into subgroups (Section 2.1.1). A representative of the most divergent of the ALV-related retroviruses (the ALV-J subgroup of ALV) was included in phylogenies to investigate whether RSV and ALV would be split by novel sequences. However, ALV-J and RSV are more closely related to one another (97% identical across the aligned region) than to any of the novel alpharetrovirus-related ERV fragments.
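A figure such as "97% identical across the aligned region" corresponds to a pairwise identity calculation of the following kind. The sketch is hedged: the text does not state how gapped positions were treated, so skipping any column in which either sequence has a gap is an assumption, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Assumption: columns in which either sequence has a gap are excluded
    from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared


# Hypothetical toy pair (not sequences from this study):
# nine comparable columns, eight of them identical.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

Other conventions (for example counting gaps as mismatches) give lower values for the same pair, so the gap treatment should be stated whenever identities are compared across studies.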

Betaretroviruses and IAP-related Elements

The betaretroviruses comprised the largest individual clade retained after bootstrapping. Eleven novel ERVs clustered together with betaretroviruses in bootstrapped NJ trees, and ten in a bootstrapped consensus of six MP trees. In both NJ and MP trees, an ERV sequence isolated from an ostrich (Jo Martin, unpublished data) clustered with the betaretroviruses with strong (>90%) bootstrap support. Since the host range of betaretroviruses is otherwise restricted to mammals, the discovery of a betaretroviral sequence in an avian genome suggests that horizontal transmission of betaretroviruses from mammals to birds has occurred, perhaps relatively recently. In addition, a betaretroviral sequence was isolated from a colobus monkey, indicating that, contrary to previous reports (van der Kuyl et al, 1997), simian ERVs related to betaretroviruses (SERVs) have colonised the genomes of the Colobinidae.

As discussed in Section 2.1.2, the Betaretrovirus genus was created by the grouping together of viruses previously classified as separate genera (Pringle, 1999). For many years, MMTV had separate genus status as the prototype member of the 'mammalian type-B retroviruses'. The other betaretroviruses made up a second genus: the 'mammalian type-D retroviruses'. Monophyly of betaretroviruses was supported in NJ but not MP analyses. In both NJ and MP analyses MMTV clusters together with three ERV sequences isolated from artiodactyls (RV-Giraffe I, RV-Bison I, and RV-Musk ox I). In MP analyses, the clade containing MMTV and the clade containing the rest of the betaretroviruses formed part of a large unresolved polytomy. NJ trees placed divergent monotreme ERVs (RV-Duck-billed platypus I and RV-Short-beaked echidna I) as basal taxa to the betaretrovirus clade.


Seven ERV sequences were identified that clustered together with sequences derived from rodent IAP elements. The IAP-related elements formed a distinct clade with robust bootstrap support (95% in NJ trees, 90% in MP trees). All of the IAP-related sequences were isolated from rodent genomes, with the exception of one, which was isolated from a lagomorph (European rabbit (Oryctolagus cuniculus)).

Three lines of evidence suggest an evolutionary relationship between the betaretroviruses and the IAP-related elements: (1) Firstly, the groups emerge as sister clades in both MP and NJ trees. (2) Secondly, both groups are distinguished by the presence of a g-patch domain between the pro and pol reading frames. Assuming that g-patch has only been acquired once in the evolution of Class II retroviruses, the two groups must be monophyletic. Although the entire g-patch domain is not present in MMTV, what appears to be the 5' part of the sequence can be identified in alignments. This suggests that g-patch was present ancestrally in MMTV, and has been lost, rather than gained independently in all the other betaretroviruses. Further indication of the relationship between betaretroviruses and IAP-related elements comes from the fact that (3) the A-type particles observed in cells expressing IAP elements are widely believed to represent immature B/D-type virions (Teich, 1984; Gelderblom, 1990).

In order to investigate further the relationship between betaretroviruses and IAP-related elements, taxa for which complete sequence data were available were aligned across 2847 base pairs of pol and used to construct bootstrapped NJ trees. Five betaretroviruses and the IAP mouse sequence were aligned with one HERV.K.HML-2 insertion and three HERV.K.HML-9 insertions. Trees were rooted on the three HERV.K.HML-9 sequences, and were constrained so that the outgroup was placed as a monophyletic sister group to the ingroup. These constraints were based on the positions of taxa in previous NJ and MP analyses, and on the distribution of g-patch domains. The g-patch domain is found in all the ingroup sequences (except MMTV, which has only part of the sequence), but not in the outgroup sequences. Provided that g-patch has been acquired only once in Class II retrovirus evolution, the assumption of monophyly of ingroup and outgroup sequences is consistent.


Phylogenies based on the larger alignment provided further support for the monophyly of betaretroviruses and IAP-related elements. The IAP mouse sequence was placed as a sister taxon to MMTV, and these two taxa were grouped together with the other Betaretroviruses with 97% bootstrap support. HERV.K.HML-2 was placed basal to the Betaretrovirus/IAP clade (see Figure 2.24).

Class II HERV superfamily

The diversity of Class II HERVs has been investigated previously. Andersson and colleagues (Andersson et al, 1999) suggested that the Class II HERVs constituted a single family, within which there were 10 'subgroups', which they labelled HML1 - HML10. HERVs that were less than 80% similar were considered distinct subgroups. More recent analyses indicate that at least two of the ten groups (HERV.K.HML-5 and HERV.K.HML-6) are phylogenetically distinct Class II HERV families (Tristem, 2000), whilst the others are closely related insertions of the HERV.K.HML-2 family.
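The 80% similarity criterion can be illustrated with a short sketch. The function names are hypothetical, and the treatment of gaps (columns with a gap in either sequence are ignored) is an assumption made here for illustration; it is not necessarily how similarity was scored in the original study.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical bases between two aligned sequences.

    Assumption: columns with a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable (ungapped) columns")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def same_subgroup(seq_a, seq_b, threshold=80.0):
    """Sequences less than `threshold` percent similar count as distinct subgroups."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Under this sketch, two aligned fragments that differ at one position in ten would be assigned to the same subgroup, whereas fragments that differ at five positions in ten would not.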

To investigate the relationships between the ten Class II HERV subgroups and the novel Class II ERVs identified by PCR screening, representatives of each HERV group were included in an alignment with other Class II ERVs. The sequences were aligned across the region of the retroviral genome amplified by PCR screening. The original analysis by Andersson et al (1999) was based on an alignment of 244 base pairs of RT. The 244 base pair segments identified in the original study were retrieved from GenBank, and then used as query sequences to BLAST the human genome. This allowed contigs containing the complete insertions of the relevant HERV subgroups to be identified (see Table 2.4), and the viral sequences were subsequently extracted from the contigs using tools for sequence data manipulation. Alignment of the region from protease to RT revealed that two of the Class II subgroups, HERV.K.HML-7 and HERV.K.HML-8, consisted of retroviral sequences interrupted in pol by non-viral sequences. Trees constructed using these sequences (with the non-viral sequence represented as missing data) suggested that the insertions were, in any case, very closely related to the HERV.K.HML-2 family (data not shown). These two groups were subsequently excluded from the alignment.


Working on the assumption that HERVs that were paraphyletic with respect to non-human viruses were independently derived, it was concluded that the Class II HERVs comprise at least four distinct families. The HERV.K.HML1, HERV.K.HML3, HERV.K.HML4 and HERV.K.HML10 groups formed a robust, monophyletic clade with HERV.K.HML2 in NJ trees, and appear to represent divergent lineages within the HERV.K.HML2 family. Sequences from each of these HERV.K subgroups were more than 80% similar across the aligned region, suggesting that even by Andersson and Medstrand's original criteria, they comprise a single group. Only one representative of the HERV.K.HML2 family was included in the final alignment used for MP analyses. NJ trees supported the contention that the HERV.K.HML5 and HERV.K.HML6 groups constitute independent Class II HERV families (Tristem, 2000), and distinguished a fourth family: HERV.K.HML9. HERV.K.HML9 also emerged as an independent family in MP analyses, though the relationship between HERV.K.HML2, HERV.K.HML5 and HERV.K.HML6 was unresolved using this method (Figures 2.21 and 2.22).

The copy number of HERV.K.HML-9 in the human genome appears to be very low: screening by BLAST search identified only three insertions within the non-redundant (nr) database of GenBank. LTRs were identified in two of these insertions (by Aris Katzourakis). An alignment of the paired LTRs from one of the insertions is shown in Figure 2.25. The divergence between LTRs was used to calculate the integration date of each HML-9 insertion as follows. The divergence between the paired LTRs was calculated, and these divergence figures were then corrected to account for multiple mutations at the same site, back mutations, and convergent substitutions, using the two-parameter model (Kimura, 1980). Two estimates of the rate of change of the host genome were calculated (2.1 x 10^-9 and 1.3 x 10^-9 substitutions per site per year, as described in Tristem, 2000), and hence two estimates of the integration date were derived. The estimated integration dates for the two insertions for which LTRs could be identified were as follows: AC011503, 16 to 26 million years old; AC025569, 22 to 25 million years old. The HERV.K.HML-9 family thus appears to be younger than the


Figure 2.24 An NJ phylogeny of Class II retroviruses constructed using 2847bp of pol

[Tree not reproduced. Taxa shown: HML.9 AC011503, HML.9 AC025569 and HML.9 AC011612 (outgroup, g-patch -ve); MPMV, SRV-I, SRV-II, Jaagsiekte, MMTV, IAP mouse and HERV.K.HML-2 (ingroup, g-patch +ve).]

Figure 2.24 A bootstrapped NJ tree based on an alignment of 2847 base pairs (spanning the greater part of the retroviral pro and pol coding domains). Trees were rooted on three insertions of the HERV.K.HML-9 family, and were constrained so that the outgroup was placed as a monophyletic sister group to the ingroup. These constraints were based on the positions of the sequences in previous NJ and MP analyses, and on the distribution of g-patch coding domains across the taxa. The domain is found in all the ingroup sequences (except MMTV, which has only part of the sequence), but not in the outgroup. Provided that g-patch has been acquired only once in Class II retrovirus evolution, the assumption of monophyly is consistent. Robust bootstrap support was obtained for the inclusion of the IAP mouse sequence within the Betaretrovirus clade.

5' LTR TGTGTGGGGGTGTACGACAATACATTCAAGCTTATATACAAGGCATTTGA
3' LTR TATGGGGGGGTGTACAACAATACAGTCAAGCTTATGTACAAGGCATTTGA

5' LTR GGTTGAGGCATGGAAAAATACTAAGGCACTGTGTGTATGTTGTTTGTGCA
3' LTR GATCGAAGCATGGAAAAATACTGAGGCACTGTGTGTATGTTGTTTGTGCA

5' LTR TGATACTGTAACTCCTTGACTCTGAAAACAGGACAAGGAACAGGATGTGT
3' LTR TGAGACTGTAACTCCTTGACCCTGAAAACAAGACAAGGAACAGGATGTGT

5' LTR GATAAGGAGTGCTGAAGACAGCATCCTAAGAATGTGGTTTGAGTGCTTTC
3' LTR GATAAGGAGTGCTGAACACAGCCTCCTAAGAATGTGGTTTGAGTGCTTTC

5' LTR AGATGTAATAAATA-AGGCCATATGTACCTCATGACCTGACCCCCAAATA
3' LTR AGACGTAATAAACAGAGGCCATATGCACCTCATGACCCGA-CCCCAAATA

5' LTR GCCACCTGGTGGATGTTTCTTGTTTGTCTAAATTGTAGTTTAACAAGCCT
3' LTR GCCACCTGGTGGATGTTTCTTATTTGTCTAAACTGTAGTTTAACAAGCCT

5' LTR TCTCAACAAATACTTGACAGACAGATCTTAAAGCAGCAATCCCTTGAAGG
3' LTR TCTCAGTAAATACCTGGCAGACAGATCTTAAAGCAGCACTCCCTTGGAGG

5' LTR AGCTGCTCCCCACCCCGTTCA??TGTAATTGTATGAATACTTATTTCTCG
3' LTR AGCTGCTCCCCGCCCTGTTCAGCTGTAACTGTCTGAATACTTATTTATTG

5' LTR GCGTTCACTGCAGAGCAACCTGCA
3' LTR GTGTTCACTGAAGAGCAAGCTGCA

Figure 2.25 An alignment of paired LTRs from a HERV.K.HML-9 insertion in the human genome (accession number AC025569). There are 36 mismatches between the 5' and 3' LTRs (mismatches are highlighted in red).


HERV.K.HML-5 family, for which integration dates of 33 to 53 million years ago have been calculated (Tristem, 2000).
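The LTR dating procedure described above can be sketched in a few lines. This is a minimal illustration rather than the code used for the estimates reported here: the function names are hypothetical, gapped and ambiguous columns are simply skipped, and the age is obtained on the assumption that the two LTRs were identical at integration and have since diverged independently at the host rate, so that age = corrected divergence / (2 x rate).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura (1980) two-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous characters
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def integration_age(distance, rate_per_site_per_year):
    """Years since integration, assuming both LTRs diverge independently."""
    return distance / (2 * rate_per_site_per_year)
```

Applying `integration_age` to the same corrected divergence with each of the two host rates (2.1e-9 and 1.3e-9) gives a lower and an upper age estimate for an insertion.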

The large number of stop codons and frameshifts in these ERVs (relative to HERV.K.HML2) supported the inference that they represent relatively ancient germline colonisations. Insertions of the HERV.K.HML2 family have a clear g-patch domain between the pro and pol reading frames. HERV.K.HML-6 insertions also have a g-patch domain in the same location, although it is quite divergent in this family. In contrast, this characteristic sequence is absent in the HERV.K.HML5 and HERV.K.HML9 families (Figure 2.13). These families are older, suggesting that g-patch may not have been present ancestrally in mammalian retroviruses and instead represents a derived character. MP trees (see Figure 2.22) placed HML-9 basal to the rest of the mammalian Class II ERVs, providing further indication that the HML-9 group represents an ancient Class II lineage, although bootstrapping did not support this relationship.

LDV-related ERVs

Three novel ERVs (RV-Emu I, RV-Marsh harrier II, and RV-Little spotted kiwi I) clustered together with the exogenous avian retrovirus LDV in NJ and MP trees. Bootstrap support for these relationships was moderate to strong. The LDV-related retroviruses appear to represent a distinct genus of Class II retroviruses.

Lentiviruses and Deltaretroviruses

None of the amplified ERV sequences showed significant homology to lentiviral or deltaretroviral sequences. Endogenous counterparts of these complex retroviruses remain to be identified. However, an effect of adding more novel taxa to the alignment of Class II sequences was that the lentiviruses moved from a basal position in MP trees (between the deltaretroviruses and the rest of the Class II clade), to form part of an unresolved polytomy with betaretroviruses and novel Class II ERVs. Although relationships between the lentiviruses and other Class II groups were not supported by bootstrapping, the movement of the lentivirus clade from a strongly supported position at the base of the Class II tree suggested that the relationship between the lentiviruses and other mammalian retroviruses may be resolved by more widespread sampling of Class II retrovirus diversity and further phylogenetic analysis.

2.4.9 Novel Divergent Groups

Avian Class II A and Class II B ERVs

In addition to the established genera, several well-supported clades comprised entirely of novel ERVs emerged in phylogenetic analysis. Amongst avian Class II ERVs, two clades emerged which retained more than three members after bootstrapping; these were named 'Avian Class II A' and 'Avian Class II B' ERVs (see Figure 2.21, Figure 2.22 and Table 2.5 for details). The Avian Class II A group has seven members, whereas the Class II B ERVs have nine. Bootstrap support within these groups was, for the most part, very high (often over 90%). The two novel groups may constitute at least one entirely new genus of avian retroviruses.

Divergent Mammalian Class II ERVs

Several of the mammalian ERVs identified in this study were not clearly related to betaretroviruses or IAP elements in NJ and MP trees. Relationships between these viruses were poorly resolved. Clades comprised of divergent mammalian ERVs and retained after bootstrapping contained, at the most, four members (Table 2.5). These 'groups' were considered too small to be given provisional names. A clade containing the RV-Grass rat II and RV-Hedgehog sequences also contained the MuERV-U1 sequence described recently by Benit et al (2001). In NJ trees there was a weak suggestion of a relationship between these ERVs and the lentiviruses; however, this relationship was not supported by bootstrapping. Divergent viruses were isolated from feline hosts (cougar and domestic cat). NJ and MP trees grouped HML.9 together with these divergent feline ERVs, but again the relationship was not supported by


Table 2.6 Novel ERV groups identified by phylogenetic analysis

ERV Group/Species | # Mutations

Avian Class II A ERVs
RV-Vulture I | 0
RV-Peregrine falcon I | 0
RV-Marsh harrier I | 0
RV-Moorhen I | 0
RV-Tinamou I | 1
RV-Common magpie I (NJ only) | 1
RV-H.thrush II |
RV-Screech owl I (MP only) |

Avian Class II B ERVs
RV-Black duck IV | 0
RV-Goshawk I | 0
RV-Pigeon I | 0
RV-Black grouse I | 0
RV-Ostrich I | 1
RV-Moorhen II | 1
RV-Blue tit III |
RV-Ferruginous hawk I |
RV-Peacock II |

Divergent clades of mammalian Class II ERVs
RV-Cougar I | 6
RV-Domestic cat I | 5

MuERV-U1 | ND
RV-Grass rat II | 4
RV-Hedgehog I | 6

RV-Rice rat I | 1
RV-Armadillo II |
MERV-1 | 20
RV-Shrew mouse I | 7

Table 2.6. Novel ERV groups. The # mutations column shows the combined number of stop codons and frameshifts: 0 - blue; 1 - green; 2 - pink; 3 or more - red. The divergent mammalian clades were small (2-4 viruses) and have not been named.

bootstrapping. Marsupial and monotreme Class II ERVs showed a tendency to cluster together in some trees, although the only relationship retained in bootstrapped trees was that between the duck-billed platypus and short-beaked echidna sequences.

2.4.10 Distribution of nonsense mutations across Class II phylogeny

Figure 2.26 shows the distribution of nonsense mutations (stop codons and frameshifts - see Section 2.4.6) across the aligned region mapped onto a bootstrapped consensus of six equally parsimonious MP trees. Amongst mammalian viruses, novel ERV sequences that do not form part of the betaretrovirus/IAP group are generally quite degraded, most having four or more interruptions in the viral reading frame. The novel groups of divergent mammalian ERVs described above (see Table 2.5) are comprised entirely of degenerate sequences with numerous stop codons and frameshifts. I would suggest that these groups represent ancient mammalian Class II lineages, and that exogenous counterparts to these sequences are unlikely to be identified.

In contrast, the novel avian ERV groups (Class II A and Class II B) described above (Table 2.5) contain a high proportion of sequences that are intact or nearly intact across the aligned region. This suggests that these groups of viruses have been active relatively recently and that exogenous counterparts to these sequences may currently be circulating in avian populations. I anticipate that exogenous viruses belonging to these groups will eventually be isolated.
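The way in which interruptions to the reading frame can be tallied for an aligned fragment is sketched below. This is a simplified illustration and not the procedure defined in Section 2.4.6: it assumes that the fragment has been aligned to an intact reference whose reading frame starts at the first column, counts stop codons in that frame, and treats any run of gaps whose length is not a multiple of three as a frameshift. Insertions relative to the reference are not handled, and the function name is hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_interruptions(aligned_seq):
    """Combined number of in-frame stop codons and frameshifting gaps.

    Assumptions: the reading frame starts at column 0 of the alignment,
    and '-' marks bases missing relative to an intact reference.
    """
    seq = aligned_seq.upper()
    # Stop codons read codon by codon in the reference frame.
    stops = sum(
        1 for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS
    )
    # A gap run that is not a multiple of three disrupts the frame.
    frameshifts = sum(
        1 for run in re.findall(r"-+", seq) if len(run) % 3 != 0
    )
    return stops + frameshifts
```

A fragment scoring zero under such a count would be described as intact across the aligned region, and one scoring four or more as degraded.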


Figure 2.26 Strict consensus of 6 maximum parsimony trees showing stop codon/frameshift data

[Tree not reproduced. Key: number of nonsense mutations (stop codons and frameshifts) per sequence, shown as none, 1, 2, 3, or 4 or more; • exogenous virus.]


2.4.11 Distribution of env types in mammalian Class II retroviruses

A recent study by Benit and colleagues (Benit et al, 2001) examined the relationships between diverse retroviral env genes by constructing phylogenies based on an alignment of TM domains. This study identified a relatively well-conserved motif in TM, called CKS17 because seventeen-mer peptides derived from it have been shown to have immunosuppressive properties in in vitro assays (Cianciolo et al, 1986; Sonigo et al, 1986). A consensus of the CKS17 motif was used to screen genome sequence databases for diverse retroviral TM domains, which were subsequently aligned.

Some Class II retrovirus groups lack the CKS17 domain and were not identified by the CKS17 screen. Nevertheless, it was possible to align TM domains from these CKS17-negative (CKS17-) retroviruses with the CKS17-positive (CKS17+) ones, by taking advantage of hydrophobicity plots, coiled-coil structure predictions, and conserved residues and domains identified in the TM domain by structural and biochemical approaches. The resulting alignment of 269 amino acid residues of the TM domain was used to construct bootstrapped NJ trees. The tree disclosed two groups corresponding to the CKS17- and CKS17+ sequences. All Class I sequences belonged to the CKS17+ group, while the CKS17- group was composed entirely of Class II sequences. Some Class II retroviruses, however, have CKS17+ TM domains, and clustered with Class I retroviruses in NJ trees based on TM. These included the deltaretrovirus HTLV, the primate betaretrovirus MPMV, and the alpharetrovirus RSV, as well as the endogenous sequence MuERV-U1. The incongruence between phylogenies based on RT and phylogenies based on TM suggested that exchange of env coding sequence between Class I and Class II retroviruses has occurred at least once in retroviral evolution.

The investigation described above analysed only representative members of each Class II group. The distribution of env types within certain groups (betaretroviruses and HERV.K groups) was investigated more thoroughly here by using alignment tools to analyse taxa for which env sequence data were available. To determine whether a given


Figure 2.27 Detail of MP strict consensus showing mammalian retroviruses and distribution of env types

[Tree not reproduced; the alpharetroviruses, deltaretroviruses and gammaretroviruses are labelled as groups, and • marks exogenous viruses.]

Figure 2.27 The distribution of env gene types across Class II taxa, based on homology detected using BLAST to BLAST pairwise alignment, and on previous investigations (Benit et al, 2001). Assuming the tree can be rooted on Gammaretroviruses and the rest of the Class I retroviruses, it would appear that the CKS17+ type I env is ancestral, and that the CKS17- env found in MMTV, HERV.K.HML-2 and lentiviruses was acquired later in Class II evolution. This interpretation of the distribution of env types requires a second recombination event that replaced the MMTV-like env with a Class I/CKS17+ env in some Betaretroviruses (MPMV, SRV-1, SRV-2, Baboon SERV, and TvERV).


Figure 2.28 Detail of MP strict consensus showing mammalian retroviruses and predicted distribution of env types

[Tree not reproduced; the alpharetroviruses, deltaretroviruses and gammaretroviruses are labelled as groups, and • marks exogenous viruses.]

Figure 2.28 A prediction of how env types might be distributed amongst Class II retroviruses, assuming recombination events are rare. Work is currently being carried out (by Peter Kabat) to obtain the complete sequence of the avian Betaretrovirus Ostrich-D. Based on the distribution of env types amongst Betaretroviruses I would predict that this work will reveal that Ostrich-D has a CKS17+/Class I env.

virus had an MMTV-like (CKS17-) env, the virus sequences and a portion of MMTV env were aligned using the online BLAST to BLAST tool (Tatusova et al, 1999), which searches for homology between two query sequences. Results are shown in Table 2.6 below. The env genes of the lentiviruses have been investigated previously and appear to be monophyletic (Mike Tristem, unpublished data). Figure 2.27 maps the distribution of env types, based on the conclusions of Benit et al (2001) and other results discussed above, onto the portion of the MP tree (a bootstrapped, strict consensus of six equally parsimonious trees) obtained from the alignment of the PR-RT region reported here. The alpharetroviruses, deltaretroviruses, and gammaretroviruses are shown in the positions in which they are found relative to the mammalian viruses in this study and in others (Tristem, 2000; Katzourakis and Tristem, in press). Figure 2.28 shows a prediction of how env types might be distributed across the mammalian Class II retroviruses analysed in this study, based on the assumption that acquisition of novel env genes via recombination occurs only rarely.
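If recombination is rare, the preferred mapping of env types onto a tree is the one that requires the fewest changes of env type. The sketch below illustrates this reasoning with Fitch parsimony on a toy tree. Fitch parsimony is used here only as an illustration and is not claimed to be the method by which Figures 2.27 and 2.28 were drawn; the topology is a deliberately simplified, hypothetical one, not one of the trees obtained in this chapter, and the function name is an assumption.

```python
def fitch(tree, tip_states):
    """Return (possible states at this node, minimum number of changes).

    `tree` is either a tip name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, tip_states)
    right_states, right_changes = fitch(right, tip_states)
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:
        return shared, changes                      # no change needed here
    return left_states | right_states, changes + 1  # one change required


# Toy topology for illustration only (hypothetical, heavily simplified).
toy_tree = ((((("MPMV", "SRV-1"), "JSRV"), "MMTV"), "HERV.K.HML-2"),
            "Gammaretrovirus")
# env type at each tip: "+" is CKS17+, "-" is CKS17- (MMTV-like).
toy_env = {
    "MPMV": "+", "SRV-1": "+", "JSRV": "-", "MMTV": "-",
    "HERV.K.HML-2": "-", "Gammaretrovirus": "+",
}

root_states, min_changes = fitch(toy_tree, toy_env)
```

On this toy tree at least two changes of env type are needed, which corresponds to the pattern of one acquisition of the CKS17- env followed by a second replacement in some betaretroviruses.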

Table 2.7 Env homology in Betaretroviruses

Betaretrovirus Species | Homology to SRV-1 Env | Homology to MMTV Env
SRV-1 | + | -
SRV-2 | + | -
MPMV | + | -
SERV | + | -
SMRV-H | + | -
TvERV | + | -
MusD | No Env | No Env
JSRV | - | +
MMTV | +
IAPE mouse | +
IAP Syrian hamster | No Env | No Env
IAP Chinese hamster | No Env | No Env
IAP Brown rat | No Env? | No Env?
HERV.K.HML-5 | - | +
HERV.K.HML-6 | +
HERV.K.HML-9 | - | +
MERV-1 | No Env? | No Env?

Table 2.7 Distribution of env types in Class II retroviruses. The SRV-1 env clusters with the CKS17+ env genes of Class I retroviruses in trees based on alignments of TM domains (Benit et al, 2001). MMTV env is CKS17- and is distinct from SRV-1 env (+: homology detected; -: no homology detected).


2.4.12 Avian Class II retroviruses and host geographic range

In order to investigate whether there was any relationship between host geographic range and virus phylogeny amongst avian Class II ERVs, species were classified according to their geographic range. Species restricted to Southern hemisphere continents (Australasia, South America, Antarctica) were considered Southern, whereas species restricted to more Northern continents (Africa, North America, Eurasia) were considered Northern. Species whose range spanned both of these sets of continents were considered global. Figure 2.29 shows these traits mapped onto the region of the MP tree that contained the majority of avian viruses (the complete MP tree is shown in Figure 2.22). Geographic range data was taken from Perrins and Middleton (1998).
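The classification rule can be written out explicitly as follows. This is a minimal sketch: the function name is hypothetical, and the host ranges are assumed to be supplied as lists of the continent names used above.

```python
SOUTHERN = {"Australasia", "South America", "Antarctica"}
NORTHERN = {"Africa", "North America", "Eurasia"}


def classify_range(continents):
    """Classify a host species as 'Southern', 'Northern' or 'Global'."""
    continents = set(continents)
    if not continents or continents - SOUTHERN - NORTHERN:
        raise ValueError("range must be a non-empty set of known continents")
    if continents <= SOUTHERN:
        return "Southern"
    if continents <= NORTHERN:
        return "Northern"
    return "Global"  # range spans both sets of continents
```

For example, a species restricted to Australasia is classed as Southern, whereas one found in both Eurasia and Australasia is classed as Global.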

Viruses showed a weak tendency to group together according to their distribution in these macrogeographic terms, suggesting that horizontal transmission may have been restricted between isolated continents. Wider sampling of ERV diversity in a greater range of avian taxa is needed to demonstrate that divergent ERV subgroups circulated ancestrally in distinct geographic regions. Nonetheless, there were indications within this dataset that horizontal transmission of viruses has occurred between divergent species with isolated distributions in the Southern hemisphere: viruses isolated from three species of Southern hemisphere flightless birds (king penguin, little spotted kiwi and great spotted kiwi) clustered together with 100% bootstrap support. These ERV sequences were intact, or nearly intact, across the entire amplified region, suggesting that they were inserted into the germline relatively recently. The integration events almost certainly occurred after the divergence of the avian orders in question (Sphenisciformes (penguins) and Apterygiformes (kiwis)); kiwis and other ratites are thought to have diverged relatively early in the evolution of birds (Cooper, 2001).


Figure 2.29 Detail of MP strict consensus showing avian retroviruses and host geographic range

[Tree not reproduced. Key: Southern hemisphere; Global; Northern hemisphere. The branch leading out of the detail is marked 'To mammalian viruses (see Fig 2.23)'.]

Figure 2.29 Species were classified according to their geographic range. Species restricted to Southern hemisphere continents (Australasia, South America, Antarctica) were considered Southern, whereas species native to North America, Africa or Eurasia were considered Northern. Species indigenous to continents either side of this divide were considered global. Viruses showed some tendency to group according to the geographic range of the host. All Avian Class II A ERVs were isolated from Northern species. (Geographic range data derived from Perrins and Middleton, 1998.)


2.5 DISCUSSION

2.5.1 Phylogenetic analysis

The data presented here suggest that the Class II retroviruses are a relatively modern group, and have radiated through higher vertebrates relatively recently. Several lines of evidence indicate that Class II retroviruses evolved from Class I retroviruses, rather than the other way round. Firstly, the overall diversity, at least of human ERVs, appears to be greater in Class I than in Class II, which may indicate that Class I retroviruses have been radiating for longer. The data presented here suggest that by phylogenetic criteria, there are only four distinct families of Class II HERV, compared to 21 families in Class I. Furthermore, estimates of the age of HERV families suggest that in humans at least, Class I HERV lineages are older, on the whole, than Class II lineages (Tristem, 2000). Notably, Class II includes the HERV.K.HML2 lineage, which appears to be the most modern and recently established group of HERVs (Stoye, 2001, Turner et al, 2001).

The host range of Class I retroviruses appears to be much wider than that of Class II retroviruses. Again, this may indicate that Class I retroviruses are older and have had more time to radiate across a diversity of vertebrate taxa. Although Class II ERV sequences are found in some amphibians, they appear to be rare in lower vertebrates, whereas Class I and Class III ERVs have been isolated from all vertebrate classes except Agnatha (Herniou et al, 1998; Martin, 1999). However, further sampling is required to establish that Class II ERVs are genuinely rare in lower vertebrates.

Comparison of the number of stop codons and frameshifts interrupting the reading frames of ERV sequences amplified by PCR suggests that a greater proportion of Class II ERV diversity has been established as the result of recent activity (see Figures 2.16 and 2.19). A high proportion of the Class II ERV fragments obtained in this investigation were intact, or nearly intact, across the amplified region. The most likely explanation for this is that these insertions are relatively young. This finding thus supports the idea that relatively recent horizontal transfer of infectious viruses, and multiple, independent germline colonisation events, have generated the distribution of Class II ERVs throughout higher vertebrate genomes. The data suggest that the radiation of Class II retroviruses has continued in recent times; both the Lentivirus and Deltaretrovirus genera contain viruses (HIV-1, HIV-2, PTLV-1) thought to have crossed species boundaries recently (Crandall, 1996; Kelsey et al, 1999; Holmes, 2001).

Trees rooted on LTR retrotransposon sequences suggest that the deltaretrovirus genus is the most basal lineage of the Class II clade. The presence of a C-type virion in basal Class II lineages and in all Class I retroviruses so far described suggests that the C-type virion morphology is an ancestral character (see Figure 2.30). However, this inference must be treated with some caution, since it has previously been demonstrated that MPMV can be converted from a D-type to a C-type assembly pattern by changing a single amino acid in its MA sequence (Rhee and Hunter, 1990).

An RNA-binding 'g-patch' domain is inserted between pro and pol in some, but not all, mammalian retroviruses. Assuming that the g-patch domain has been acquired only once in retrovirus evolution, monophyly of betaretroviruses, IAP-related elements, and the HERV.K10 lineage is supported (see Figure 2.30). Betaretroviruses and IAP-related elements are monophyletic in NJ trees, although this relationship is not retained after bootstrapping. For g-patch to have been acquired just once during the evolution of Class II retroviruses, it would have to have been lost in MMTV. The presence of a short, apparently truncated region of g-patch sequence in the correct insertion position in the MMTV genome supports the inference that the domain was present there ancestrally. Some divergent mammalian retroviruses lack discernible g-patch domains but contain an insertion of equivalent length in the correct position between pro and pol. It is possible that the g-patch sequence has degenerated in these viruses.


Figure 2.30 Model of Class II Retrovirus Evolution

[Diagram not reproduced. Lineages shown: Betaretroviruses; IAP elements; HERV.K10; divergent mammalian ERVs containing a degenerate g-patch domain; Lentiviruses; divergent mammalian ERVs lacking a g-patch domain; Alpharetroviruses; LDV-related retroviruses; Type I Avian ERVs; Type II Avian ERVs; numerous divergent, well supported clades of avian ERVs with only 2-3 members; Deltaretroviruses; Ancestral Class II. Key: host class (mammals, avian); virion morphology (C-type; cone-shaped/lentivirus; A-, B-, D-type; unknown); g-patch acquisition; exogenous members; inter-class transmission.]

Figure 2.30 ERV distribution and stop codon/frameshift data (see discussion 2.5.1) suggest that Class II retroviruses are a more modern group than Class I retroviruses. Based on this assumption, the Class II retrovirus phylogeny can be rooted on Gamma- or Epsilonretroviruses, which have C-type virion morphology. The rooted tree suggests that the C-type virion morphology is


2.5.2 Novel retrovirus groups

Class II diversity appears to be greater than previously recognised. PCR screening identified ERVs related to alpha- and betaretroviruses, to LDV, and to recognised groups of endogenous elements, but also led to the identification of numerous novel Class II ERV subgroups that emerge as well-supported clades in phylogenetic analysis. The well-supported clades retained after bootstrapping include not only the previously recognised genera, but also several novel groups composed entirely of novel ERVs. Amongst avian Class II ERVs, two novel clades emerge: the avian Class II A ERVs (with eight ERV species) and the avian Class II B ERVs (with nine ERV species). While strong bootstrap support is obtained for many of the relationships within clades, relationships among clades are not well supported. In both MP and NJ analyses, bootstrapping collapses the deeper structure of phylogenies. This may simply reflect the need for more sequence data to resolve relationships. Alternatively, Class II retroviruses may have undergone a period of 'explosive' radiation ancestrally, with many divergent lineages arising within a relatively short space of time. In this scenario, the actual basal relationships between lineages would be close to a hard polytomy.

Many of the PR-RT fragments amplified from the two novel avian groups, Class II A and Class II B, were intact, or almost intact. This suggests that exogenous members of these clades exist, and remain to be isolated. It is possible that retroviruses previously described, but for which pol sequence data was unavailable (Golden and Lady Amherst pheasant viruses (Hanafusa et al, 1976), TERV (Dimcheff et al, 2001)), may prove to be members of these clades. In mammals, viruses that did not belong in the established genera were usually old, suggesting that the majority of genera containing exogenous members may have been identified.

Despite the wide range of taxa surveyed, no ERVs showing relatedness to lentiviruses or deltaretroviruses were identified. Although comparisons of primer sequences with the target motifs in these genera suggested that amplification was not prohibited in theory, it was not possible to demonstrate experimentally that the primer pairs used for PCR screening were capable of amplifying lentivirus- or deltaretrovirus-derived sequences. Thus it cannot be concluded that lentivirus- or deltaretrovirus-related ERVs are absent from the taxa screened in this study. Lentiviruses moved from a basal to a derived position in MP trees with the addition of more ERV sequences, perhaps indicating that more data will pull them unambiguously into one or another clade.

2.5.3 Horizontal transfer between host classes

A strict consensus of the six shortest MP trees identified by heuristic search suggests that Class II ERVs cluster together according to host class (Figure 2.23), and that inter-class transmission events have been rare in the evolution of the Class II retroviruses. Although the deep structure of MP trees is not supported after bootstrapping, it is unlikely that such a strong pattern would be obtained by chance. There appears to be relatively little switching between host classes, reflecting results obtained for the Class I retroviruses (Martin, 1999).

MP trees suggest that there are only two well-supported examples of inter-class horizontal transmission (Figure 2.23). The first of these involves an ERV fragment isolated from an amphibian, a caecilian (Boulengerula boulengeri). This rare example of a lower vertebrate Class II ERV clustered with avian viruses in NJ and MP trees. Previous screening of 40 lower vertebrate taxa failed to identify any other Class-II related sequences (Martin, 1999). The rarity of Class II retroviruses in lower vertebrates may reflect the recent emergence of the group within higher vertebrates from Class I or Class III retroviruses. However, very recently, Class II ERVs were identified in snakes (Huder et al, 2002), suggesting further screening is required to investigate the distribution of Class II retroviruses in lower vertebrates more thoroughly.

The second confirmed horizontal transfer event involves a betaretrovirus isolated from an ostrich. This is the first example of an avian betaretrovirus. Interestingly, a recent report suggested that a newly recognised disease, ostrich fading syndrome (OFS), is associated with a type-D retrovirus (Kabay and Ellis, 1996). The symptoms of OFS are wasting in chicks up to six months of age, usually resulting in death (up to 80% morbidity on some farms). As part of a previous intensive investigation to determine the cause of the syndrome, a retrovirus was isolated from affected birds. An experiment was designed to reproduce OFS using this retrovirus. Degenerate retrovirus polymerase primers were used to amplify a PCR product from provirus incorporated into the genome of infected cells. The DNA sequences were compared with sequences in the various genomic databases and showed similarities to the reverse transcriptase gene of Betaretroviruses. Further investigation of the Ostrich type-D ERV sequence is planned. It will be of particular interest to determine whether this sequence has a recombinant env domain, as is found in the primate Betaretroviruses.

2.5.4 Horizontal transfer within host classes

Comparisons of host and virus phylogeny can be carried out to estimate the extent of horizontal transfer occurring within host classes, and can potentially indicate the direction of spread through host taxa. There were indications that the distribution of Class II ERVs across species reflects horizontal transmission rather than host phylogeny. For example, viruses from phylogenetically distinct flightless birds on isolated continents (penguins and kiwis) clustered together with high bootstrap support, suggesting that horizontal transfer of avian viruses has occurred between species in distinct Southern hemisphere continents (see Figure 2.29).

A second interesting finding concerns the isolation of a betaretrovirus from Risso's dolphin (RV-Risso's dolphin I). This ERV has only one stop codon along the amplified region, and no frameshifts. The lack of degeneracy detected in this ERV suggests it was inserted into the germline relatively recently, almost certainly after the terrestrial ancestors of the Cetacea entered the aquatic environment. It thus appears likely that horizontal transmission of betaretroviruses between terrestrial and fully aquatic marine mammals has occurred. This is an intriguing finding, particularly since remarkably little is known about the transmissibility of viruses in aquatic media.


The majority of exogenous retroviruses that have been identified infect terrestrial species; these retroviruses are transmitted via body fluids¹, and are not thought to survive for significant lengths of time in the external environment. In aquatic media, however, the situation may be different. A recent study demonstrated that spumavirus, a primate retrovirus, retained infectivity for up to five days in distilled, estuarine and marine water (Lotlikar and Lipson, 2002). Some retroviruses may even be adapted to transmission in the aquatic environment; it has been suggested that unusual features of the Env proteins of exogenous retroviruses infecting fish (walleye dermal sarcoma virus (WDSV) and walleye epidermal hyperplasia virus (WEHV)) may act to stabilise these viruses in aquatic media (LaPierre et al, 1999).

Restrictions on retroviral transmission within aquatic media, and between aquatic and terrestrial media, may be reflected in the distribution of ERVs in the genomes of marine mammals. A promising line of enquiry concerns the distribution and diversity of ERVs in cetaceans and pinnipeds (seals) in comparison to ERVs in terrestrial species. Although only one cetacean Class II ERV was identified here, several Class I ERVs were isolated from cetaceans (see Table 2.3). Class I ERVs have been more thoroughly sampled than Class II ERVs, and it is more likely that variations in ERV distribution reflecting viral ecology will be detected for this class of ERVs. Further sampling of Class I ERVs in marine mammals and comparison with the large and growing set of Class I ERV sequences is a future objective.

¹ Equine infectious anaemia virus (EIAV) is the one known exception to the rule that retroviruses are transmitted via body fluids; it is transmitted by blood-feeding insect vectors.

Chapter 3 Modelling ERV Evolution

3. Simulation modelling of ERV evolution

3.1 INTRODUCTION

3.1.1 ERV distribution and diversity within species

The previous chapter described the use of PCR screening to investigate the distribution and diversity of Class II ERVs across a wide range of vertebrate species. The data obtained by PCR screening allow an approximate analysis of ERV distribution and diversity throughout vertebrates as a whole, or within specific subgroups of vertebrates. However, PCR screening provides only limited information about ERV diversity within a given host genome. The technique yields fragments of ERV insertions rather than complete ERV sequences (see Chapter two, section 2.3.2). PCR results for a given host species may not accurately reflect the full range of ERV diversity in that host's genome; negative results cannot confirm the absence of an ERV lineage; and the technique gives no information about the location of ERV insertions within the genome. Consequently, PCR screening can only provide incomplete and imprecise knowledge of ERV distribution and diversity within the genomes of sampled taxa.

However, recent years have seen the progress of numerous projects aiming to sequence completely and annotate the genomes of certain organisms, including humans (Homo sapiens) and mice (Mus musculus). A draft version of the human genome has been completed, and is publicly accessible online. The draft human genome sequence was generated over a relatively short period by the coordinated efforts of 20 groups from laboratories around the world. The sequence is a composite of several individuals. It was generated from a physical map covering more than 96% of the euchromatic part of the human genome, and covers about 94% of the genome (Lander et al, 2001). Using search tools such as BLAST (Altschul et al, 1997), sequences derived from HERV insertions can be identified within and retrieved from online databases with relative ease (Tristem, 2000). However, exhaustive cataloguing of ERV diversity within genome sequence data is challenging, and current focus is on developing ERV-specific data mining algorithms to exploit it efficiently (Smit, 1993; R. Belshaw, personal communication). Recent developments have seen the emergence of online databases dealing specifically with HERV data derived from the draft human genome sequence (Paces et al, 2002).

When the complete genome sequence of an organism is available, it becomes possible to investigate the distribution and diversity of ERV sequences within that genome in precise detail. Since the complete sequences of ERV insertions can be obtained using genome project data, there is far greater scope for phylogenetic analysis. Phylogenies can be based on various ERV genes, or on entire ERV genomes. Recombination between ERVs may be detected by comparing phylogenies based on different regions of the genome. Sequence divergence between LTR sequences provides a means of estimating insertion dates (Johnson and Coffin, 1999). Genome project data allow us to map precisely the positions of ERVs, and look for patterns in ERV distribution within and across chromosomes.

Preliminary analysis of the human genome has led to the identification of distinct HERV lineages, each of which is assumed to have arisen from an independent colonisation event (Tristem, 2000) (see Section 1.6.3, p50). HERV data from the human genome project opens up possibilities for a wide range of evolutionary and ecological investigations. How are diverse HERVs distributed throughout the genome? Are some HERVs more common than others? How has the activity of various HERV lineages varied over time? What do variations in HERV distribution (both in terms of integration dates and the intragenomic distribution of insertions) between humans and their closest relatives reveal about the evolution of the human genome itself?

Interpreting ERV data requires a model of ERV evolution. A successful model would link patterns of ERV distribution and diversity with evolutionary processes, so that, given ERV data from the human genome project, for example, we might be able to make inferences about the evolution of humans and their ERVs. Deriving a simple model that predicts the effect of ecological parameters on ERV evolution is one of the aims of this thesis.

3.1.2 Computer simulation of ERV evolution using an individual based model

Computer simulation can be used to model ERV evolution, and explore how ERV distribution and diversity is shaped under varying parameters. The activities of ERV insertions within a population of host organisms (amplification, fixation and loss), and the selection pressures that govern them, constitute a system that is naturally described by the interaction of individual units (ERVs, chromosomes, and hosts) with individual strategies. Models of systems like this are called individual-based, entity-based or agent-based models (Holland, 1995). In an agent-based model, the characteristics of each individual are tracked through time. This stands in contrast to modelling techniques where the characteristics of the population are averaged together and predictions are based on averaged characteristics for the whole population.

This chapter describes an agent-based model of ERV evolution and the implementation of a computer simulation (called 'Passengers') based on the model. The Passengers simulation can be used as a tool to study the evolution of ERVs and other TEs under varying parameters. The results of a series of tests to demonstrate the integrity of the application are reported and some of its potential applications are discussed. Chapter four describes the use of simulation to investigate the effect of parameters such as transposition rate, mutation rate, host population size and gene density on ERV evolution.


3.2 APPROACH

Simulations occupy an intermediate position between theory and experiment. They are not experiments in the traditional sense of the word because they do not directly manipulate the world being modelled; rather, they are more like analytical tools that allow us to follow and study change in dynamic systems. As the simulation is executed, patterns and symmetries will typically show up in the ongoing action.

Simulations are based on underlying models, which should reflect key features of the system to be studied. Deriving a model involves a series of choices about which processes and parameters to include, and how they will affect one another. These choices reflect the objective of our investigation; which parameters we are most interested in, which we think are most important, the kind of experiments we plan to perform, and how we plan to analyse the data.

At the outset of this project, it was clear that a wide range of studies of ERVs, and of transposable elements in general, could make use of computer simulation, and that a wide range of variables and relationships could be worked into the model. Consequently it was decided to begin with a relatively simple model, but one that could accommodate subsequent embellishments. Most of the components and processes of the model are designed so that modifications (novel features and algorithms) can easily be accommodated.


3.2.1 Model components

Figure 3.1 shows a conceptualised representation of the organisation of components in the simulation model. The components are described individually below.

Transposable Elements

Transposable elements have a transposition rate that represents the probability that transposition of this element occurs in each generation, and a mutation rate that represents the probability of a mutation event occurring at each transposition event. Mutation of elements leads to a loss of transpositional activity (i.e. it reduces transposition rate to zero). Mutation can occur either at transposition (due to copying errors), or as a result of background mutation in the host.

Chromosomes

Elements are located in discrete genomic units within the host: chromosomes. Since hosts are diploid, two copies of each chromosome are present, and these copies recombine at meiosis. A chiasmata frequency variable determines the likelihood of crossovers during meiosis. The proportion of vital coding sequences relative to junk DNA may vary from one chromosome to another. The proportion of junk DNA is described by a 'proportion junk' variable.

Hosts

Hosts have a fitness value between 0 and 1, and genomes consisting of a specified number of chromosomes. Insertion of an element into the coding sequences of a host chromosome has a damaging effect on host fitness. Insertions in non-coding DNA have no direct effect on fitness; however, they can affect it indirectly by acting as a source of novel elements, some of which may then insert into coding DNA. The host genome has a mutation rate that determines the probability of background mutation in the host genome operating on elements within it.


Figure 3.1 Conceptual representation of the simulation model


Figure 3.1 The components of the model include elements, chromosomes, hosts and an external environment. The figure above shows a conceptual representation of the model. Hosts are diploid, sexual and replicate in discrete generations. Within the host population, the fitness of an individual host determines its probability of mating successfully, and selection is implemented through differential reproductive success over time. The only variables in the model that affect fitness are directly or indirectly related to the presence or absence of selfish elements. Host reproduction involves the generation of haploid recombinant gametes. During meiosis, chiasmata frequencies specific to each chromosome in the genome determine the probability of crossover occurring between any two elements.

Environment

The host population is contained by an external environment. Environment parameters determine the number of hosts in a population at any given time, and the number of generations to be simulated. The default scenario is for host population size to remain constant throughout the lifetime of the simulation. Simulations can incorporate host population bottlenecks or booms, though this requires manipulation of source code.


3.2.2 Input data

Input data required for simulation are stored in an input file. The input file is loaded at the outset of a simulation, so that the same parameters can be used repeatedly. The input file carries information in three blocks. Figure 3.2 shows the syntax structure of an input text file.

Environment Block

The first block in an input file is the environment block, which includes the following information:

• Whether or not to build trees of elements during simulation.
• The rate of background mutation in the host genome.
• The number of hosts in the population.
• The number of chromosomes in each host.
• The chiasmata frequency for each chromosome.
• The gene density for each chromosome (expressed as the proportion of the genome consisting of non-coding (junk) DNA).

Seed Block

The second block, the 'seed' block, includes information describing the type and distribution of elements in the initial population. Elements present in the starting population are referred to as 'seeds'. The seed block contains the following information:

• The number of different seed elements.
• The name of each seed.
• The mutation rate of each seed.
• The transposition rate of each seed.
• The way in which the seed is to be distributed in the population (i.e. via random insertion, or insertion at a specific locus).
• The details of the locus (chromosome and position) if the seed is to be inserted at a specific locus.
• The number of copies of the seed to be inserted in the starting population. The maximum number of copies that can be seeded at a specific locus is twice the number of hosts in the population; at this maximum the seeded element is fixed in the host population.

Preferences Block

The preferences block specifies the limits and endpoints of the simulation. Depending on the type of investigation being carried out, different endpoints may be appropriate. Limits on the number of generations, elements or fixed elements are specified in this block. The 'simulation type' is a number that specifies when the simulation should terminate and what data will be written to output files. For example, type 3 simulations record the linkage disequilibrium between two specified loci each generation, whereas type 4 simulations stop when a target number of fixed elements has been obtained, and only record data for simulations that successfully reach the target.

3.2.4 Output data

As the simulation proceeds, the application records parameters associated with elements, such as their frequency in the host population, and their relationships to one another. This information is logged to an output file.

3.2.5 Model structure

Overview of the model

The action within the model takes place in the context of a replicating population of diploid, sexual hosts, in which the germline has been colonised by at least one element lineage. Hosts replicate through discrete generations. Replication of the parental generation creates a discrete daughter generation, and there is no interbreeding between individuals in distinct generations. Within the host population, the fitness of an individual host determines its probability of mating successfully, and the differential reproductive success that occurs as a result of differences in fitness means that selection


Figure 3.2 Syntax structure of a sample Passengers infile

(Text to the right of each "<-" is an explanatory label from the figure and is not part of the file.)

#PASSENGERS IN                              <- Header identifying this file as a Passengers infile

BEGIN LINEAGE;
Build trees = OFF                           <- Instructs program whether or not to build trees of elements as it simulates
Mutation Rate = 0                           <- Rate of mutation in the host germ line
Initial Size = 1000                         <- Size of host population
C Number = 1
Chromosome 1: Chiasmata Frequency = 0       <- Chiasmata frequency for chromosome 1
Chromosome 1: Proportion junk = 1.0         <- Proportion junk DNA for chromosome 1
ENDBLOCK;

BEGIN SEED;
Number of seeds = 1                         <- Number of distinct elements to be introduced at the start of the simulation
SEED 1                                      <- Details of first seed
Name = ERV                                  <- Name of this seed
Mutation Rate = 0.0                         <- Rate of mutation at transposition
Transposition Rate = 0.01
TYPE = Random                               <- Seed at random locus in host
Number of copies = 1
ENDBLOCK;

BEGIN PREFS;
Simtype = 4                                 <- Type of simulation
Fixtarget = 10                              <- Target number of fixed elements for simulation
NumRepeats = 100                            <- Number of times to repeat this simulation
MaxElement = 100000                         <- Maximum limit for elements population size
MaxGens = 10000                             <- Maximum limit for simulation length
File Name = Pop1000t0.01                    <- Name of file to log output to
ENDBLOCK;
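The block structure of the infile lends itself to a simple line-based reader. The following is a minimal sketch only, assuming that each setting occupies one line of the form "key = value" between a "BEGIN name;" line and an "ENDBLOCK;" line; the function names parseInfile and trim are hypothetical and do not describe how Passengers itself reads its input.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// One block of settings: key -> value, both stored as text.
using Block = std::map<std::string, std::string>;

// Remove leading and trailing whitespace (hypothetical helper).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Read an infile laid out as in Figure 3.2 (assumed layout, see lead-in).
// Returns block name -> settings. Lines without '=' (e.g. "SEED 1") are
// skipped, so this sketch only handles a single seed per file.
std::map<std::string, Block> parseInfile(std::istream& in) {
    std::map<std::string, Block> blocks;
    std::string line;
    std::string current;  // name of the block being read, empty if none
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;  // header or blank line
        if (line.rfind("BEGIN ", 0) == 0) {
            current = trim(line.substr(6));
            if (!current.empty() && current.back() == ';') current.pop_back();
            current = trim(current);
        } else if (line.rfind("ENDBLOCK", 0) == 0) {
            current.clear();
        } else {
            const std::string::size_type eq = line.find('=');
            if (eq != std::string::npos && !current.empty()) {
                blocks[current][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
            }
        }
    }
    return blocks;
}
```

Given the sample file in Figure 3.2 wrapped in a std::istringstream, the value of Simtype would be retrieved as blocks["PREFS"]["Simtype"]. Values are kept as text so that the caller decides how each one is converted.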


Figure 3.3 Internal mechanics of the simulation model

[Flowchart. Boxes, in order: population of size n; select hosts for mating; recombination: create haploid gametes; fuse gametes: form diploid progeny; transposition: mutation; host germline mutation; calculate progeny fitness; export to the daughter population; decision: does daughter population size = n? Loop-back box: replace parental population with daughter population.]

Figure 3.3 The internal mechanics of the simulation model. Hosts replicate through discrete generations. Replication of the parental generation creates a discrete daughter generation, and there is no interbreeding between individuals in distinct generations. Within the host population, the fitness of an individual host determines its probability of mating successfully, and the differential reproductive success that occurs as a result of differences in fitness means that selection is implemented over time. The only variables in the model that affect fitness are directly or indirectly related to the activities of selfish elements.

is implemented over time. The only variables in the model that affect fitness are directly or indirectly related to the activities of selfish elements. Figure 3.3 illustrates the underlying mechanics of the simulation model. The following sections review the details of key processes.

Host Mating and Selection

Selection is implemented through differential reproductive success. Individuals in the population are randomly selected and provided with an opportunity to mate. To determine whether the selected individual is fit enough to mate, a random number is generated within the possible range of fitness values (0 to 1). If that random number is larger than the selected individual's fitness, then the individual is rejected, and loses the opportunity to mate. However, if the random number is less than the individual's fitness, it is selected to make up one member of a mating pair. Thus, individuals of high fitness stand a better chance of exploiting a mating opportunity than do individuals of low fitness (note that the same individual can be selected for mating more than once in a single generation). Host parental and daughter generations are always discrete, so that progeny are never able to breed with members of the parental generation. Two mating partners are selected as described above, and meiosis is simulated to generate a single, recombinant, haploid gamete for each mating partner. Haploid gametes are then fused to form the diploid progeny. The daughter generation does not begin breeding until its population size equals n. When this point is reached, all of the parental generation are discarded, and the daughter generation becomes the new parental generation. The cycle is repeated as specified by the input seed.
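The acceptance test described above is a form of rejection sampling. The sketch below illustrates it under stated assumptions: fitness values are held in a plain vector, the standard library generator stands in for the RNG class of the application, and the name selectParent is hypothetical rather than taken from the Passengers source.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// Draw one parent by rejection sampling (illustrative sketch).
// fitness[i] is the fitness of host i, in the range 0 to 1.
// Returns the index of the accepted host.
template <class Rng>
std::size_t selectParent(const std::vector<double>& fitness, Rng& rng) {
    // Guard against an endless loop when no host can ever be accepted.
    bool anyFit = false;
    for (double w : fitness) {
        if (w > 0.0) anyFit = true;
    }
    if (!anyFit) throw std::runtime_error("no host has fitness above zero");

    std::uniform_int_distribution<std::size_t> pickHost(0, fitness.size() - 1);
    std::uniform_real_distribution<double> draw(0.0, 1.0);
    for (;;) {
        // Offer a randomly chosen host the opportunity to mate.
        const std::size_t candidate = pickHost(rng);
        // Accept only if the random number falls below the host's fitness.
        if (draw(rng) < fitness[candidate]) return candidate;
    }
}
```

Calling the function twice yields a mating pair. Because hosts are drawn with replacement, the same individual can be chosen repeatedly within a generation, as noted in the text, and the chance that a given draw yields a particular host is proportional to its fitness.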

Meiosis and recombination

Meiosis generates haploid gametes from the parental diploid genome. Recombination can occur during meiosis, so that gametes are a combination of paternal and maternal genes. The probability of a recombination causing a switch between the paternal and maternal chromosomes between any two elements in the parental genome is calculated by the following equation.

$$P = \frac{1 - (1 - d)^{f}}{2} \qquad (1)$$

Where d is the distance between the current element and the last, and f is the average number of chiasmata (as specified in input data) on the chromosome involved. The top line of the equation describes the probability that at least one recombination event has occurred between the two loci separated by distance d. Where recombination occurs, however, there is only a 50% chance that it will produce a switch from the maternal to the paternal chromosome sets, so the top line of the equation is divided by two.
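As a worked illustration of equation 1, the sketch below computes the switch probability for a pair of loci. The function name switchProbability and the argument checks are assumptions made for the example; d is taken to be a fraction of chromosome length between 0 and 1.

```cpp
#include <cmath>
#include <stdexcept>

// Probability that the gamete switches between the maternal and paternal
// chromosome copies between two consecutive elements (equation 1).
//   d : distance between the two elements, as a fraction of chromosome length
//   f : average number of chiasmata on the chromosome (from the input file)
// The name and the range checks are illustrative, not from Passengers.
double switchProbability(double d, double f) {
    if (d < 0.0 || d > 1.0 || f < 0.0) {
        throw std::invalid_argument("expected 0 <= d <= 1 and f >= 0");
    }
    // Chance that at least one recombination event falls between the loci.
    const double atLeastOne = 1.0 - std::pow(1.0 - d, f);
    // Only half of those events switch the chromosome set being copied.
    return atLeastOne / 2.0;
}
```

For example, two elements half a chromosome apart (d = 0.5) on a chromosome with one chiasma on average (f = 1) give P = (1 - 0.5) / 2 = 0.25. With f = 0, as in the sample infile of Figure 3.2, the probability is zero for every pair of loci, so each chromosome is inherited without recombination.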

Transposition

Transposition occurs after the union of gametes to create a new diploid host. The probability of any element transposing is determined by its transposition rate parameter.

Mutation

Mutation of elements can occur at transposition, or as background mutation during the lifetime of an individual host. Mutation reduces transposition rate incrementally. Mutation does not directly reduce selection coefficient. Consequently the effect of reduced transposition on selection coefficient is not represented. This may lead to a more stringent purging of insertions than might be natural. The elements cannot adapt to a particular location other than to insert initially in a location that is not harmful to the host.
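The transposition and mutation steps can be pictured as a pair of independent random trials per element. The sketch below is illustrative only and rests on several assumptions that the text does not settle: the new copy is placed at a uniformly random position, a mutated copy is given a transposition rate of zero (following the description in Section 3.2.1), and background mutation in the host is left out. The names Element and transpose are hypothetical.

```cpp
#include <random>
#include <vector>

// Minimal stand-in for an element (illustrative field names).
struct Element {
    double location;           // position on the chromosome, 0 to 1
    double transpositionRate;  // probability of transposing in a generation
    double mutationRate;       // probability that a new copy is mutated
};

// Return the new copies produced in one generation by the given elements.
template <class Rng>
std::vector<Element> transpose(const std::vector<Element>& elements, Rng& rng) {
    std::uniform_real_distribution<double> draw(0.0, 1.0);
    std::vector<Element> copies;
    for (const Element& parent : elements) {
        // Trial 1: does this element transpose in this generation?
        if (draw(rng) >= parent.transpositionRate) continue;
        Element copy = parent;
        copy.location = draw(rng);  // assumption: uniformly random locus
        // Trial 2: is the new copy mutated during copying?
        if (draw(rng) < copy.mutationRate) {
            copy.transpositionRate = 0.0;  // assumption: mutation inactivates
        }
        copies.push_back(copy);
    }
    return copies;
}
```

Returning the new copies separately, rather than appending them to the input, leaves the caller free to decide which host chromosome receives each copy and to reject insertions at occupied loci.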


3.3 IMPLEMENTATION

3.3.1 Materials - software development environment

The simulation software was developed in the C++ programming language, which supports object-orientated programming (OOP). Object orientated languages are ideal for the development of applications like simulations that require flexibility and compartmentalisation of data (McConnell, 1993). In an object-orientated language, data can be linked together with the routines that act on that data in a process called encapsulation. Encapsulation enables data to be linked explicitly to the processes that manipulate it, and also keeps data safe from outside interference and misuse (by other parts of the program). Thus, encapsulation allows code and data to be combined in such a way that a self-contained "black box" is created. When code and data are linked together in this fashion, an object is created. In other words, an object is simply the device that supports encapsulation. To all intents and purposes, an object is simply a variable of user-defined type. Each time a new object is defined, a new data type is created. Each specific instance of this data-type is a new variable (Schildt, 1998).

3.3.2 Random Number Generator

Random number generators (RNGs) are deterministic algorithms that produce numbers with certain distribution properties. Roughly speaking, these numbers should behave similarly to realisations of independent, identically distributed random variables. Every RNG has its deficiencies, and no RNG is appropriate for all tasks. For example, several good RNGs from the toolbox of stochastic simulation are unsuited for cryptographic applications, because they produce predictable output streams. In stochastic simulation, in order to verify our simulation results, we should be able to choose from a whole arsenal of widely different RNGs. The reason behind this argument is the possibility that the intrinsic structure of our RNG might interfere with our simulation problem and yield wrong results. There are two big families of RNGs: linear generators and nonlinear ones. In stochastic simulations, linear RNGs are the best known and most widely available.


The RNG class used in this application is a pseudo-random number generator intended to provide various random types, in various distributions over various ranges. It is implemented as a linear congruential generator, where f(z) = 16807 z mod (2 ** 31 - 1) (James, 1990).
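The recurrence quoted above is the 'minimal standard' multiplicative generator, which corresponds to std::minstd_rand0 in the modern C++ standard library. A minimal sketch of it is given below; the class name MinimalStandardRng and its interface are illustrative and are not those of the RNG class used in the application.

```cpp
#include <cstdint>

// Constants of the recurrence z' = 16807 * z mod (2^31 - 1).
constexpr std::uint64_t kMultiplier = 16807;
constexpr std::uint64_t kModulus = 2147483647;  // 2^31 - 1

// Illustrative linear congruential generator (not the Passengers RNG class).
class MinimalStandardRng {
public:
    explicit MinimalStandardRng(std::uint32_t seed = 1)
        : state_(static_cast<std::uint32_t>(seed % kModulus)) {
        if (state_ == 0) state_ = 1;  // zero is a fixed point of the recurrence
    }

    // Advance the generator and return the next integer in 1 .. 2^31 - 2.
    std::uint32_t next() {
        // 64-bit arithmetic avoids overflow in the product.
        state_ = static_cast<std::uint32_t>((state_ * kMultiplier) % kModulus);
        return state_;
    }

    // Next value scaled to the open interval (0, 1).
    double nextUniform() {
        return static_cast<double>(next()) / static_cast<double>(kModulus);
    }

private:
    std::uint32_t state_;
};
```

Starting from a seed of 1, the first value returned is 16807. Because the sequence is fully determined by the seed, recording the seed with each run is enough to reproduce a simulation exactly.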

3.3.3 General features of the design and implementation process

In order to implement the simulation model within a software application, it is necessary to determine how to represent the model as code, which in turn requires an understanding of the data that the application must receive from the user, and the data that will be recorded during the lifetime of the simulation.

Early on in the investigation, it was clear that a wide range of studies of ERVs, and of transposable elements in general, could make use of computer simulation. Since (a) we were not sure which parameters would prove to be the most important or interesting, (b) we would like to create increasingly complex simulation environments as we progress, (c) we would like to explore a wide variety of similar systems such as retrotransposons, it was decided at the outset to aim for a flexible implementation. This necessitates an underlying implementation that is inherently flexible and open to further development.

3.3.4 Simulation Components

The representation of the simulation model as object-orientated code derives directly from the hierarchical organisation of components in the model, as described in Section 3.2. All of the various entities within the simulation model (elements, chromosomes, gametes, hosts), as well as input and output data structures, are represented by discrete 'data types'. Each data type encapsulates the parameters and behaviours specific to one of the components of the model. Data types are organised hierarchically, as shown in Figure 3.1. Component data types are named after the components they represent, as shown in the table below.

Table 3.1 Components and Classes

Simulation Component    Data Type
Environment             Environ.Ob
Host                    Host.Ob
Gamete                  Gamete.Ob
Chromosome              Chromosome.Ob
Element                 Element.Ob
Seed                    Seed.Ob
Element Tree            ElementTree.Ob

Element.Ob

Two element-associated parameters, location and chromosome assignment, describe the location of the element in the host genome. Elements can be located at any point between the start and end of the genome, but distinct elements may not occupy the exact same locus on the same chromosome set simultaneously. Elements also contain the values of h and s that determine their selection coefficient in the host (see Host.Ob below).

Chromosome.Ob

Elements are inserted into the chromosome in order of their location, so that they form a linear series with respect to their specified locus. The total number of chromosome pairs in the genome can be varied. Each chromosome in the host genome has an associated chiasmata frequency; this value defines the average number of chiasmata that occur during recombination of the chromosome. During simulation of recombination, the simulation iterates along each chromosome pair in the genome, copying elements from either the maternal chromosome set, or the paternal chromosome set. Whenever a new element is encountered, the application calculates the probability that a crossover has occurred between this element, and the last element that was encountered.
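The iteration described above can be sketched as follows. This is an illustration under stated assumptions, not the application's algorithm: the function names are hypothetical, and because the formula used to obtain the crossover probability is not given in this section, Haldane's map function is swapped in as a stand-in, with the expected number of crossovers between two elements taken to be the chiasmata frequency multiplied by the distance between them.

```python
import math
import random


def crossover_probability(distance, chiasmata_frequency):
    """Probability of an odd number of crossovers between two loci.

    ASSUMPTION: Haldane's map function, with the expected number of
    crossovers equal to chiasmata_frequency * distance. The thesis may
    use a different formula.
    """
    expected_crossovers = chiasmata_frequency * distance
    return 0.5 * (1.0 - math.exp(-2.0 * expected_crossovers))


def make_gamete(elements, chiasmata_frequency, rng=random):
    """Copy elements from one parental chromosome set, switching sets
    whenever a crossover occurs between successive elements.

    elements: list of (location, parent) tuples sorted by location,
              where parent is "M" (maternal) or "P" (paternal).
    Returns the list of locations transmitted to the haploid gamete.
    """
    copying_from = rng.choice("MP")
    previous_location = None
    gamete = []
    for location, parent in elements:
        if previous_location is not None:
            distance = location - previous_location
            if rng.random() < crossover_probability(distance, chiasmata_frequency):
                copying_from = "P" if copying_from == "M" else "M"
        if parent == copying_from:
            gamete.append(location)
        previous_location = location
    return gamete
```

Only the gaps between successive elements are examined, so the cost of recombining a chromosome pair scales with the number of elements it carries and not with the number of possible loci.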

Host.Ob

The host data type defines an organism with an associated fitness and a genome that acts as a container for elements. Elements define the host genome, since the genome contains no information until elements are incorporated into it. Each chromosome has 10,000,000 discrete locations numbered from 0.0000001 to 1.0000000, each a conceptual locus that may or may not be occupied by a genetic element. An element located at any of these loci has an attached tag, which assigns it to either the maternal or paternal copy of the chromosome pair. Thus, chromosome pairs are represented as single linear arrays of element data types, not by the instantiation of two separate chromosome objects.
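As a rough illustration of this representation, the sketch below stores a chromosome pair as one sorted list of tagged elements. The names `Element`, `ChromosomePair` and `LOCI_PER_CHROMOSOME` are hypothetical, and storing the locus as an integer index (rather than as a decimal fraction) is a choice made here for exact comparison, not something stated in the text.

```python
import bisect
from dataclasses import dataclass, field

LOCI_PER_CHROMOSOME = 10_000_000


@dataclass(order=True, frozen=True)
class Element:
    locus: int    # 1..10,000,000; location = locus / 10,000,000
    parent: str   # "M" (maternal copy) or "P" (paternal copy)

    @property
    def location(self):
        return self.locus / LOCI_PER_CHROMOSOME


@dataclass
class ChromosomePair:
    # A single linear array holds the elements of both copies,
    # kept sorted by locus (and then by parental tag).
    elements: list = field(default_factory=list)

    def insert(self, element):
        """Insert in order of location; refuse an already occupied locus."""
        if not 1 <= element.locus <= LOCI_PER_CHROMOSOME:
            raise ValueError("locus out of range")
        index = bisect.bisect_left(self.elements, element)
        if index < len(self.elements) and self.elements[index] == element:
            return False
        self.elements.insert(index, element)
        return True
```

A homozygous insertion then appears as two adjacent entries with the same locus and different parental tags.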

The host data type controls and implements the insertion of elements into the genome. The model assumes that all detrimental effects of elements stem from damaging insertions. The extent to which an insertion damages the host (its selection coefficient) is set at insertion. Two values, h and s, combine to determine the effect of selection. Where an insertion is homozygous, the fitness of the host is reduced by a factor of (1 - s); in heterozygotes, however, the factor is (1 - hs).
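The two cases can be written out as a short sketch. The function names are hypothetical, and the way in which the effects of several insertions combine is not stated in this section; multiplying the factors together is an assumption made here purely for illustration.

```python
def insertion_fitness(s, h, homozygous):
    """Fitness factor for one insertion: 1 - s if homozygous, 1 - h*s otherwise."""
    return 1.0 - s if homozygous else 1.0 - h * s


def host_fitness(insertions):
    """Combine the factors of several insertions.

    insertions: iterable of (s, h, homozygous) tuples.
    ASSUMPTION: effects are multiplicative across insertions.
    """
    fitness = 1.0
    for s, h, homozygous in insertions:
        fitness *= insertion_fitness(s, h, homozygous)
    return fitness
```

With s = 1 and h = 1 an insertion is lethal even in heterozygotes, while s = 0 gives a selectively neutral insertion regardless of h.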

In each generation, every element in the host genome is tested to see whether it undergoes transposition, as defined by its transposition rate parameter. During transposition, the probability of a mutation event occurring as the element transposes is calculated according to its mutation rate parameter.

Gamete.Ob

The gamete data type is identical to the host data type except that the genome is haploid (all elements in the genome are either paternal, or maternal). When haploid gametes are combined to create a diploid host, the elements contained within the genome of each gamete are inserted into the diploid genome in the correct order of location.
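Combining two gametes amounts to merging two sorted sequences. A minimal sketch, assuming each gamete is held as a sorted list of loci (the function name and the "M"/"P" tags are hypothetical):

```python
import heapq


def fuse_gametes(maternal_loci, paternal_loci):
    """Merge two haploid genomes into one diploid genome.

    Each argument is a list of loci sorted in ascending order.
    Returns a list of (locus, parent) tuples in order of location.
    """
    maternal = ((locus, "M") for locus in maternal_loci)
    paternal = ((locus, "P") for locus in paternal_loci)
    return list(heapq.merge(maternal, paternal))
```

Since both inputs are already ordered, the merge visits each element once and no re-sorting of the diploid genome is needed.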

Environ.Ob

During the simulation, data types are manipulated according to the specifications of Environ.Ob, which forms the algorithmic nerve centre of the simulation. This data type is also the repository of preferences and variables relating to the way in which the simulation proceeds, and the type of data that is recorded. Environ.Ob encapsulates all the data describing the initial structure of host/parasite complexes, their organisation into populations, their relationships to one another and their subsequent evolution. It controls host breeding, turnover of host generations, and the logging of data as specified according to user preferences.

The data generated over the lifetime of a simulation are recorded in several member variables of the Environ.Ob data type. Essentially, all the data can be captured within a single tree data structure, and it is our intention to use such a structure in the future. However, although a tree is perhaps the most natural data structure within which to record simulation data, retrieving specific types of data from such a structure can be a complicated procedure in itself, and throughout the work described here (in Section 3.4 and Chapter four) alternative methods were often used. For example, rather than calculate the total number of transpositions that occurred during a simulation by counting the nodes in an output data tree, this information was simply recorded in a separate Environ.Ob member variable designed specifically for that purpose. Furthermore, trees were not always constructed during simulation. The application allows tree-building functions to be deactivated, decreasing the time taken for simulations to execute. If tree-building functions are deactivated, simulation data is logged directly to an output file.

An important member variable in Environ.Ob is the 'genome map'. This is essentially a consensus map of element insertion loci and frequencies for the entire host population. The genome map is updated with every generation, and allows extinct and fixed element loci to be rapidly identified without the need to iterate through each individual host genome separately. Following each host generation, it is important to identify whether element insertions at specific loci have been lost or fixed, so that data can be updated accordingly.
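The idea of the genome map can be sketched as a table of insertion counts per locus. The function names and the data layout are hypothetical, and for brevity the sketch assumes a single chromosome pair per host, so that a locus alone identifies an insertion site.

```python
from collections import Counter


def build_genome_map(host_genomes):
    """Count, for each locus, how many chromosome copies carry an insertion.

    host_genomes: iterable of genomes, each an iterable of (locus, parent) tuples.
    """
    genome_map = Counter()
    for genome in host_genomes:
        for locus, _parent in genome:
            genome_map[locus] += 1
    return genome_map


def classify_loci(previous_map, current_map, population_size):
    """Return (lost, fixed) sets of loci after a generation of diploid hosts."""
    total_copies = 2 * population_size
    lost = set(previous_map) - set(current_map)
    fixed = {locus for locus, count in current_map.items() if count == total_copies}
    return lost, fixed
```

Comparing the map from one generation with the next identifies lost and fixed loci in a single pass over the map, without revisiting each host.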

Seed.Ob

This class is simply a container to receive and store all the input data for simulation.

3.4 DEMONSTRATION

The simulation has numerous internal validation functions that ensure that variables fall within expected bounds, and that relationships between variables and within data structures such as trees are internally consistent. The integrity of tree output, host mating, element mutation, transposition and other processes was also assessed manually by matching output to a record of events archived at application runtime, and by using the compiler to follow algorithms step by step to ensure that they were functioning appropriately. Additionally, the application was tested by checking whether simulations behaved as expected under different sets of conditions.

3.4.1 TEST 1 - Population size and fixation frequency

A single inactive (non-transposing) element was inserted randomly into a population of sexual, diploid hosts. Hosts had a single, non-recombining chromosome, which consisted entirely of non-coding DNA, so that all insertions were selectively neutral. Population size was varied at intervals ranging from 10 to 600. Simulations set up in this way had two potential outcomes. Frequently, the introduced element was lost from the population by random drift, in which case, a 'failed event' was recorded and the simulation restarted. Occasionally, however, the element would reach fixation via random drift, and a 'fixation event' was recorded. For each host population size, simulations were repeated until 100 fixation events had occurred.

In the absence of recombination, selection and transposition, the probability of fixation of an individual element introduced as a single copy into the host population is dependent on population size alone. For example, if there are 50 hosts, an element inserted randomly at the outset of the simulation will occupy one of the 100 chromosomes. The probability of that element (or rather, the chromosome containing that element) eventually reaching fixation is equal to its initial frequency in the population, which in this case is 1/100. Therefore we expect approximately 99 elements to be lost for every one that is fixed, and the number of loss events per fixation event should increase linearly as the size of the host population increases. The number of loss events per 100 fixation events was plotted against population size (Figure 3.4), and was seen to closely match the predicted outcome.
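The prediction follows directly from the fixation probability of 1/(2N) for a single neutral copy in N diploid hosts. A small sketch of the arithmetic (the function name is hypothetical):

```python
from fractions import Fraction


def expected_losses_per_fixation(population_size):
    """Expected number of loss events for every fixation event.

    A single neutral copy in N diploid hosts fixes with probability 1/(2N),
    so the ratio of losses to fixations is (1 - p) / p = 2N - 1.
    """
    p_fix = Fraction(1, 2 * population_size)
    return (1 - p_fix) / p_fix


if __name__ == "__main__":
    for n in (10, 50, 100, 600):
        print(n, 100 * expected_losses_per_fixation(n))
```

For 50 hosts this gives the 99 losses per fixation quoted above, and the expected number of losses per 100 fixations grows linearly with population size.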

3.4.2 TEST 2 - Gene density and fixation frequency

Simulations identical to those described above were run, again with inactive elements. However, in this case population size was held constant (100 hosts) and the proportion of coding DNA was varied. Insertion of an element into coding DNA was lethal, while elements in non-coding DNA were neutral. Under these circumstances, the probability of the introduced element (or rather, the chromosome containing that element) eventually reaching fixation is equal to its initial frequency in the population, multiplied by the proportion of non-coding DNA in the genome. Consequently, the number of failed events per fixation event is expected to increase approximately in inverse proportion to the proportion of non-coding DNA, rising steeply as that proportion approaches zero. The results of simulations set up in this way were plotted (the number of loss events per fixation event against the proportion of non-coding DNA in the host chromosome, see Figure 3.5), and were seen to closely match the predicted outcome.
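The expected ratio can be derived from the fixation probability stated above, q/(2N), where q is the proportion of non-coding DNA. A brief sketch of the calculation (function and parameter names are hypothetical):

```python
from fractions import Fraction


def expected_failures_per_fixation(population_size, noncoding_proportion):
    """Expected failed events per fixation when coding insertions are lethal.

    Fixation probability is p = q / (2N), with q the non-coding proportion,
    so failures per fixation are (1 - p) / p = 2N / q - 1.
    """
    q = Fraction(noncoding_proportion)
    if not 0 < q <= 1:
        raise ValueError("non-coding proportion must be in (0, 1]")
    p_fix = q / (2 * population_size)
    return (1 - p_fix) / p_fix


if __name__ == "__main__":
    for percent_coding in (10, 50, 90):
        q = Fraction(100 - percent_coding, 100)
        print(percent_coding, 100 * expected_failures_per_fixation(100, q))
```

With 100 hosts, the expected number of failures per fixation rises from 199 when the genome is entirely non-coding to 1999 when only one tenth of it is.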

3.4.3 TEST 3 - Transposition rate and element population size

A fixed, active element insertion was introduced into a population of 100 hosts. The transposition rates of the element were varied, and the increase in element population size from generation to generation was followed until a maximum number of elements was reached or surpassed, at which point the results were logged to output files, and the simulation was repeated. For each transposition rate, the simulation was repeated 50 times and the results averaged. The averaged results for each transposition rate are shown in Figure 3.6.

Figure 3.4 Number of failed colonisations per 100 fixations over a range of population sizes

[Plot: number of failed colonisations per 100 fixations against population size, showing predicted values and simulation results.]

Figure 3.5 Number of failed colonisations per 100 fixations over a range of gene densities

[Plot: number of failed colonisations per 100 fixations against percentage coding DNA, showing predicted values and simulation results.]

Figure 3.6 Rate of element population size increase under varying transposition rates

[Plot: number of elements against generation (0 to 100), showing expected values and simulation results for transposition rates from 0.1 to 0.5.]

Figure 3.7 Decay in linkage disequilibrium (D) over time at varying chiasmata frequencies

[Plot: D against generation (0 to 100), showing expected values and simulation results for chiasmata frequencies of 0.01, 0.05 and 0.1.]

Figures 3.6 and 3.7 The results of tests performed using the Passengers simulation. Simulations with predictable outcomes were set up (see sections 3.4.3 and 3.4.4 in the main text), and the results obtained plotted against those predicted. In Figure 3.6 the rate of increase in element population size varies approximately as expected as transposition rate increases. In Figure 3.7 the decay in linkage disequilibrium (D) between two loci that are initially present on the same chromosome set in all hosts, and absent on the other, is tracked over time for a range of three different chiasmata frequencies. The plot shows that decay in D closely matches that predicted for each chiasmata frequency.

In these simulations, an active element is fixed in the host population. Since the active element cannot be lost, and there is no mutation to reduce the rate of transposition, the number of elements in the population is expected to increase exponentially at a rate determined by the transposition rate of the element. In addition to showing the results of simulations for five different transposition rates, Figure 3.6 shows the predicted rate of increase in element population size if no insertions are lost from the population by drift. The plot shows a close match between the averaged results obtained from 50 test simulations and those predicted. Simulation results would not be expected to match these predictions exactly, because stochastic effects and genetic drift can influence element population sizes in simulations. Nevertheless, the congruity between obtained and predicted results indicates that elements are behaving at least approximately as expected under the conditions.
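One way to obtain a predicted curve of this kind is sketched below. It assumes that each element produces a new copy with probability equal to the transposition rate in every generation and that no copies are lost, so that the expected count is multiplied by (1 + rate) per generation. The function name is hypothetical, and the starting count of 200 in the usage example (one fixed insertion on both chromosome copies of 100 diploid hosts) is an assumption rather than a value given in the text.

```python
def expected_element_count(initial_count, transposition_rate, generations):
    """Expected element numbers per generation, assuming no insertions are lost.

    Returns a list of length generations + 1, starting with initial_count.
    """
    return [initial_count * (1 + transposition_rate) ** t
            for t in range(generations + 1)]


if __name__ == "__main__":
    for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
        curve = expected_element_count(200, rate, 10)
        print(rate, round(curve[-1]))
```

Because losses through drift are ignored, a curve of this kind is best read as an upper guide to the averaged simulation results, not as an exact expectation.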

3.4.4 TEST 4 - Recombination

Recombination was tested as follows. Host populations of 20000 individuals were created, each host having a single chromosome. Two inactive elements were seeded onto the maternal copy of the chromosome for each host in the initial population. One element was inserted at position 0.2, the second was inserted at position 0.8, so that the distance between the two element loci was 0.6. Simulations were then carried out, varying the chiasmata frequency of the single chromosome from 0.01 to 0.1. The linkage disequilibrium (D) between the two loci was calculated in each generation. Each simulation had a lifespan of 100 generations. Ten repeat simulations were carried out for each chiasmata frequency.

Seeding elements in the way described above effectively created a population in which there were two loci with two alleles each (if we regard presence or absence of an element at a specific locus as two separate allele states). Let us call the presence of an element at position 0.2 allele A, and absence of an element at that position allele a. Similarly let the presence of an element at position 0.8 be allele B, and the absence of an element at that position allele b. There are four possible haplotypes: AB, ab, Ab and aB. D is calculated by adding the frequencies of haplotypes Ab and aB in the population, and subtracting that total from the sum of the frequencies of haplotypes AB and ab.
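A minimal sketch of this calculation is given below, with the sign chosen so that a population containing only haplotypes AB and ab gives D = 1, as in the initial population of this test. The function name and the string encoding of haplotypes are hypothetical. Note that this measure differs from the conventional D = f(AB)f(ab) - f(Ab)f(aB); when both allele frequencies are 0.5 it is four times the conventional value.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return (f_AB + f_ab) - (f_Ab + f_aB) for a sample of haplotypes.

    haplotypes: iterable of the strings "AB", "ab", "Ab" and "aB".
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no haplotypes supplied")
    coupling = counts["AB"] + counts["ab"]    # elements together or both absent
    repulsion = counts["Ab"] + counts["aB"]   # one element without the other
    return (coupling - repulsion) / total
```

The value falls to zero when all four haplotypes are equally frequent, which is the state that recombination is expected to approach.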

Since, in the initial population, both elements are seeded onto the maternal chromosome, only haplotypes AB and ab are present initially, and D = 1. Recombination between loci breaks down linkage disequilibrium. In an infinite population, linkage disequilibrium decays at an exponential rate equal to the recombination rate between loci. The rate of decay is calculated using the equation

D' = (1 - r)D        (2)

where D' is the linkage disequilibrium in the next generation, D is the linkage disequilibrium in the current generation, and r is the rate of recombination between the two loci under examination. Using equation (1) to calculate the rate of recombination between the two element loci, the expected decay in D over time could be plotted and matched to the results of the simulations described above. As shown in Figure 3.7, results closely match predicted outcomes.
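Applying the recurrence D' = (1 - r)D repeatedly gives the expected curve against which simulation output can be compared. A short sketch (the function name is hypothetical, and the recombination rate r is taken as given, since the equation used to obtain it is not reproduced here):

```python
def expected_ld_decay(initial_d, recombination_rate, generations):
    """Iterate D' = (1 - r) D and return D for generations 0..generations."""
    values = [initial_d]
    for _ in range(generations):
        values.append((1 - recombination_rate) * values[-1])
    return values


if __name__ == "__main__":
    # Hypothetical recombination rate, chosen only to illustrate the curve.
    curve = expected_ld_decay(1.0, 0.03, 100)
    print(curve[0], curve[50], curve[100])
```

After t generations the expected value is (1 - r)^t times the initial value, so the decay is geometric and approximately exponential for small r.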

3.5 APPLICATION

Simulation can provide the basis for a broad range of investigations of transposable element evolution. The application described here has been developed with flexibility in mind to facilitate future elaborations with regard to the complexity and detail of simulation processes and algorithms. We expect that simulation will prove a useful tool for the derivation of models that make both qualitative and quantitative predictions about transposable element evolution. Some of the potential uses of the simulation are discussed below. Further applications are also discussed in Chapters four and five.

3.5.1 Fixation and persistence of TE lineages

The use of simulation to investigate the effect of ecological parameters on the lifespan and distribution of ERV lineages is described in Chapter four of this thesis (p177).

3.5.2 ERV clade growth

Present understanding of how natural selection shapes the evolution of ERVs and other retroelements in the genome is relatively poor. The intermittent activities of retroelements seem to provide a somewhat nebulous target for selection. The presence of active retroelements in the genome is likely to disadvantage the host, since transposition events have the potential to seriously damage host fitness. Despite this, retroelements can increase in frequency, because selection against the element is thought to be a weak force compared to selection in favour of it through disproportionate inheritance (Orgel and Crick, 1980). It is not clear to what extent selection favouring coadaptation to the host through reduced transposition can be counteracted by selection favouring aggressive and virulent elements that transpose actively at the expense of long term persistence.

Investigations of the balance between the effects of transposon self-propagation and selective elimination due to negative effects on host fitness have previously been carried out in vivo, usually using Drosophila (Montgomery and Langley, 1983; Langley, Brookfield and Kaplan, 1983; Charlesworth and Charlesworth, 1983; Charlesworth and Langley, 1986; Charlesworth, 1987). While the value of in vivo investigations is not in question, the speed and flexibility of computer simulations offer considerable advantages for a broad-based approach to transposable element evolution.

For example, simulation can be used to explore optimal parameters for 'persistence' of transposable element lineages. Here, persistence refers to the active lifespan of an element lineage. Element lineages containing only fixed insertions incapable of expression are considered extinct. A variety of factors in the environment, such as mutation rate, transposition rate, gene density and chiasmata frequency, may influence persistence. Host and element population dynamics may have unpredictable outcomes, particularly where insertions are capable of utilising one another's gene products (see Section 3.5.4).

Simulation provides a rapid means to explore the effect of environmental parameters on the active lifespan of element lineages. It may prove instructive to analyse the distribution of element-linked traits throughout element phylogenies, and compare the distribution of traits across phylogenies. How does the distribution of traits vary between fixed, active and extinct elements in complete, reconstructed phylogenies? Do these distributions differ in response to different environmental conditions?

Analysis of the distribution of nodes throughout simulated element phylogenies may help elucidate typical patterns of amplification, fixation, and extinction for transposable element lineages. Evolutionary inference from the distribution of ERVs and other transposable elements should ideally be based on reliable models of clade growth. Our capacity to apply one or another model of clade growth to empirically derived ERV data will influence the kind of inferences we are able to make from it.

Comparisons of clade growth require a measure of the relative positions of internal nodes within a phylogeny. An example of a useful statistic for this purpose is the γ statistic described by Pybus and Harvey (2000). γ has a useful property: under a pure birth process (see Pybus and Harvey, 2000), γ-values of complete, reconstructed phylogenies follow a standard normal distribution. If γ > 0 then a phylogeny's internal nodes are closer to the tips than expected, and if γ < 0 then the internal nodes are closer to the root than expected under the pure birth model.
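As a hedged illustration, the sketch below computes the statistic from internode intervals using the formula as published by Pybus and Harvey (2000), where g_k is the length of time during which the reconstructed phylogeny has k lineages and n is the number of tips. The function name and the input layout are hypothetical.

```python
import math


def gamma_statistic(intervals):
    """Pybus and Harvey's gamma for a reconstructed phylogeny.

    intervals: [g_2, g_3, ..., g_n], where g_k is the time during which
               the tree has k lineages and n is the number of tips.
    """
    n = len(intervals) + 1
    if n < 3:
        raise ValueError("at least three tips are required")
    weighted = [k * g for k, g in zip(range(2, n + 1), intervals)]
    total = sum(weighted)                      # T = sum of k * g_k for k = 2..n
    running = 0.0
    sum_of_partial_sums = 0.0
    for value in weighted[:-1]:                # partial sums for i = 2..n-1
        running += value
        sum_of_partial_sums += running
    mean_partial_sum = sum_of_partial_sums / (n - 2)
    return (mean_partial_sum - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))
```

Long early intervals followed by short late ones place the internal nodes near the tips and give a positive value, while the reverse pattern gives a negative value.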

3.5.3 The effect of incomplete sampling on evolutionary inference

Several studies have suggested that amplification rates for ERVs are not constant over time. For example, assuming a constant mutation rate of ~0.13% per million years for full sized LTRs, LTR divergence data suggest that a significant burst of integration/amplification of HERVs occurred about 30 Mya (Sverdlov, 2000). Preliminary analysis carried out on two distinct families of HERVs suggests remarkably different patterns of activity over time (Mike Tristem, unpublished data). The HERV-L elements show an apparent early burst of retrotranspositional activity, followed by little or no retrotransposition. In contrast, HERV-H elements appear to have had a more constant rate of activity over time.

With regard to this data, it is important to remember that sampling of ERV diversity is almost inevitably incomplete, and that the effect of incomplete sampling on transposable element phylogenies could create a misleading impression of their evolution. The majority of insertions created over the lifespan of an ERV lineage are likely to be lost, either through selection or drift, and most probably only a small proportion of elements ever reach fixation. The majority of ERVs in the human genome, for example, are 30 million years old or more, and are assumed to be at fixation in the human population (Tristem, 2000).

The effect of incomplete sampling on macroevolutionary inference can be tested by comparing complete reconstructed phylogenies generated using simulation with phylogenies in which sampling is incomplete (Pybus and Harvey, 2000). Comparisons of complete 'transposition histories' with pruned trees containing only extant insertions can be used to assess the effect of incomplete sampling on evolutionary inference. A similar approach can be used to assess the effect of incomplete sampling on attempts to distinguish between multiple source and master/slave patterns of ERV clade growth using empirical data.

Since, in most cases, only fixed insertions are available for study, the distribution of nodes in element phylogenies may simply reflect a bias in the way in which insertions are fixed. For example, if the host population undergoes a population bottleneck, proportionally more elements are likely to be fixed, and what appears in phylogenies to represent a burst of transpositional activity may simply reflect the higher frequency of fixation. It is possible that a distortion of this type might be detected through a corresponding behaviour in other genes or transposable element lineages.

3.5.4 Consequences of sharing gene products

Population dynamics models of transposable element evolution are complicated in situations where insertions are capable of using one another's gene products. The supply of enzymes in trans may allow some element insertions to remain active long after their coding sequences degenerate, providing sequences controlling transcription and RNA packaging remain functional. This may have implications for the long-term evolution of LINEs versus ERVs. Although trans supply of enzymes might be expected to prolong the activity of an element lineage, proliferation of degenerate templates at the expense of functional insertions might instead have the overall effect of terminating clade growth more rapidly.

Where the supply of enzymes in trans is an important regulating factor in ERV activity, germline colonisation may influence the activity of established ERV lineages, providing that the established ERV lineage can utilise enzymes provided by the novel one. Potentially, germline colonisation could reactivate long-dormant ERV lineages. A similar effect might be engendered by the spread of active alleles encoding functional proteins throughout the population. Sexual reproduction mixes alleles, and may bring together combinations that create genotypes in which trans complementation leads to active transposition. Nonetheless, levels of enzyme available for transcription will decrease as expression decreases, and this might lead to a reduction in the overall level of transposition.

It may be instructive to compare patterns of clade growth in lineages capable of using gene products in trans with lineages restricted to cis-activity. Data obtained by simulation can be compared with empirically derived data, for example data derived from the HGP. As evidenced by the complementation of defective retroviruses (Shimotohno and Temin, 1981; Wei et al., 1981; Tabin et al., 1982), retrovirus insertions may use one another's gene products. In contrast, cis-activity of LINE proteins with regard to LINE RNA is predominant. Simulation can be used to try to predict the effect of sharing gene products on patterns of clade growth for ERV lineages as compared to LINE lineages. Results obtained using simulation models can be compared with LINE and ERV data derived from the human genome.

176 Chapter 4 The Generation of ERV Distribution and Diversity

4.2 METHODS

4.2.1 The Passengers Simulation

The application 'Passengers', a computer simulation designed to simulate the evolution of transposable elements, was used for all the experiments described here. See Chapter 3 for information regarding the internal algorithmic structure of the simulation.

4.3 RESULTS

4.3.1 Simulating Colonisation

The aim of the first set of experiments performed here was to simulate the events following germline colonisation by a retrovirus. Simulations were set up as follows. There was a population of 100 hosts. Each host had a single pair of homologous, non-recombining chromosomes. The entire chromosome consisted of non-coding DNA, so that all element insertions would be selectively neutral. A host was selected at random and a single copy of an ERV element was inserted into its genome. Both element and host mutation rates were set to zero, so that elements were impervious to mutation. Following the introduction of the colonising ERV element into the population, simulations proceeded until either (1) no elements remained in the population (having been eliminated by genetic drift), (2) an element reached fixation, or (3) the number of elements in the host population reached or exceeded 100,000. Simulations were carried out using elements with a range of transposition rates (0.01, 0.025, 0.05, 0.075 and 0.1) and with host population sizes of 100, 1000, 10,000, and 100,000. For each set of parameters, 100 repeat simulations were carried out. The variation in element population size over time was plotted on the same graph for all 100 simulations (see Figures 4.1-4.5).

Figure 4.1 Element population growth with a transposition rate of 0.01

Figure 4.2 Element population growth with a transposition rate of 0.025

[Plots: element population size against generations for each of 100 simulations; fixation events are marked.]

Figures 4.1-4.5 Plots showing the change in element population size over time for simulations of germline colonisation in a population of 100 hosts. A range of five different element transposition rates (0.01, 0.025, 0.05, 0.075 and 0.1) were used and the results of 100 simulations at each transposition rate are shown. At low transposition rates most simulations result in elimination of the colonising element via genetic drift. Occasionally, however, element populations expand and simulations result in the fixation of an insertion, or the element population exceeding the maximum limit (100,000 elements). As transposition rate increases, elements are more likely to avoid elimination, but less likely to achieve fixation before the element population size limit is reached.

Figure 4.3 Element population growth with a transposition rate of 0.05

Figure 4.4 Element population growth with a transposition rate of 0.075

Figure 4.5 Element population growth with a transposition rate of 0.1

[Plots: element population size against generations for each of 100 simulations.]

Figure 4.6 Simulation outcomes when varying population size and transposition rate

[Bar graph: number of loss events and number of maximum-limited trials for each host population size, grouped by transposition rate (T).]

Figure 4.7 Simulation outcomes when varying gene density and transposition rate

[Bar graph: number of loss events and number of maximum-limited trials (number of trials, 0 to 1000) for each gene density, grouped by transposition rate (T = 0.01, 0.025, 0.05, 0.075 and 0.1).]

Figures 4.6 and 4.7 Bar graphs showing the proportion of simulations resulting in loss of the colonising element, and the proportion in which the number of elements expanded to maximum limits (100,000 elements) over a range of five transposition rates (T) whilst varying host population size (Figure 4.6, 100 simulations per set of parameters) and the proportion of coding to non-coding (junk) DNA (Figure 4.7, 1000 simulations per set of parameters). The plots indicate that the outcome of simulations is independent of population size and gene density, but dependent on transposition rate.

Since only a single element is introduced at the outset of each simulation, and only one transposition event can occur per element per generation, the ERV population remains small in the generations immediately following colonisation. As long as the ERV population remains small it is vulnerable to elimination via random drift. Consequently, the majority of germline colonisation events are followed by loss of the ERV element within a few generations. Nonetheless, colonising ERVs occasionally evade immediate elimination, and the number of elements in the population increases over time. Element population sizes may increase initially but subsequently fall back to zero. However, plots showing element population size against time for each set of 100 simulations suggest that once the number and distribution of elements in the population reaches some threshold configuration, an increase in the element population size becomes more likely than a decrease, and from this point on, element populations increase in size exponentially (see Figures 4.1-4.5). Plots suggest that the threshold is characterised by an element population size roughly equivalent to the host population size (Figures 4.1-4.5). Unfortunately, time was restricted and I was unable to explore this relationship further.

The exponential growth of element populations that surpass this threshold reflects the fact that in these simulations, there was no mutation to attenuate transposition rate as simulation proceeded (elements had a mutation rate of zero, and the rate of element mutation in the host germline was also zero). The exponential nature of element population growth means that after reaching threshold levels element populations reach the maximum size limit in a small number of generations. However, it takes on average 4N generations for a neutral insertion that is destined for fixation to become fixed (where N = the size of the host population). Consequently, although some simulations (those involving elements with relatively low transposition rates) do occasionally result in fixation, element populations that increase beyond threshold levels generally hit the maximum limit for element population size before any individual insertion is fixed. This is despite the fact that the maximum is relatively high (100,000 elements in a population of 100 hosts). As would be expected, the probability of hitting element population size limits before any insertion reaches fixation increases as transposition rate increases (data not shown). Varying host population size has little discernible effect on the probability of elements being lost as opposed to their probability of expanding beyond the maximum limit of 100,000 elements (see Figure 4.6).
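The race between exponential growth and neutral fixation can be made concrete with a short calculation. The sketch below assumes purely deterministic growth, n(g) = start × (1 + T)^g, beginning at a threshold of 100 elements (the host population size); the transposition rates used in the note that follows are illustrative values, not necessarily the five rates used in the simulations.

```python
import math

def generations_to_limit(start, limit, t):
    """Generations for expected copy number to grow from start to limit.

    Assumes deterministic growth n(g) = start * (1 + t)**g, i.e. every
    element transposes at rate t and no copies are lost or inactivated.
    """
    return math.ceil(math.log(limit / start) / math.log(1 + t))

def mean_neutral_fixation_time(n_hosts):
    """Approximate mean time to fixation of a neutral insertion (4N)."""
    return 4 * n_hosts
```

With 100 hosts the mean fixation time is 400 generations. Under the stated assumptions, an illustrative rate of T = 0.1 reaches 100,000 elements in 73 generations, well before any insertion would be expected to fix, whereas T = 0.01 needs 695 generations, which leaves room for occasional fixation at low transposition rates.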

In order to investigate the effect of varying the proportion of coding to non-coding DNA in the host genome on the outcome of colonisation, simulations were carried out as described above, with populations of 100 hosts. For each set of parameters, 1000 simulations were carried out, with the proportion of non-coding DNA in the genome varied between sets. Element insertions into non-coding DNA were selectively neutral; insertions into coding DNA were heterozygous lethal. The general pattern of these simulations was similar to that described above regardless of the proportion of coding DNA in the host genome (Figure 4.7).

4.3.2 Persistence of ERV activity following fixation

Transposition rate must be tempered by mutation rate to prevent exponential growth of the element population. To compare the effects of host and element mutation rate on the growth dynamics and persistence of element populations, a second set of simulations was carried out. Simulations were set up similarly to those described in Section 4.3.1 (a population of 100 hosts, each with a single pair of homologous, non-recombining chromosomes consisting entirely of non-coding DNA), except that rather than introducing a single ERV element randomly into the host population, simulations began with a fixed, active element inserted at a specific locus. Initially, the mutation rate of the element was set to zero (mutation never occurred at transposition), and host mutation rate was set to 0.01. The transposition rate of the fixed element insertion was varied (0.001, 0.005, and 0.01), and simulation proceeded until all the elements in the population had been inactivated, or until the number of elements in the simulation exceeded the maximum limit (100,000 elements). For each set of parameters, 50 repeat simulations were carried out. The generation in which the lineage was inactivated was plotted against the number of elements in the element population at the point of inactivation on the same graph for all 50 simulations (see Figures 4.8-4.10).


Figure 4.8 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, element transposition rate = 0.001

Figure 4.9 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, element transposition rate = 0.005

Figure 4.10 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, and element transposition rate = 0.01

Figures 4.8 - 4.10 Plots showing the generation of inactivation against the number of elements present in the population at the point of inactivation at three different transposition rates, in simulations where there was no mutation at transposition.


Figure 4.11 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, element mutation rate = 0, element transposition rate = 0.01

Figure 4.12 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, element mutation rate = 0.5, element transposition rate = 0.01

Figure 4.13 Time to inactivation plotted against element population size at inactivation, where host mutation rate = 0.01, element mutation rate = 1, element transposition rate = 0.01

Figures 4.11 - 4.13 Plots showing the generation of inactivation against the number of elements present in the population at the point of inactivation at three different element mutation rates, in simulations where there was mutation at transposition and in the host germline.


In all of these simulations, the element lineage was inactivated before element population size reached the maximum limit (although that outcome would inevitably have been achieved with higher transposition rates). The lower the transposition rate, the more rapidly, on average, the element lineage was inactivated. Often, only the fixed insertion was present in the population when the lineage was inactivated. This outcome was more frequent when the transposition rate of the fixed insertion (0.001, 0.005) was lower than the host mutation rate. Plots suggested that the number of elements present in the population at inactivation was roughly proportional to the number of generations for which element activity persisted.

A further set of simulations was carried out as described above, but this time element mutation rate was varied. For each set of parameters, 50 repeat simulations were carried out. The generation in which the lineage was inactivated and the number of elements in the element population at the point of inactivation were plotted on the same graph for all 50 simulations (see Figures 4.11-4.13). Introducing element mutation led to more rapid inactivation of ERV lineages; the higher the mutation rate, the more rapidly the lineage was inactivated.
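The interplay of transposition rate, element mutation rate and host mutation rate can be sketched as a simple branching process. The following is a deliberately stripped-down illustration rather than the simulation model described in Chapter 3: it tracks only counts of active copies, ignores drift, selection and host population structure, and all names and parameters are hypothetical.

```python
import random

def time_to_inactivation(t, m, u, rng, max_elements=100_000):
    """Follow one element lineage until no active copy remains.

    t: probability that an active element transposes in a generation
    m: probability that a new daughter copy is inactive (mutation at transposition)
    u: probability that an active element is inactivated in a generation
       (a stand-in for mutation in the host germline)
    Returns (generation, total_copies); generation is None if the copy
    limit is exceeded first. Drift and selection are ignored.
    """
    if t <= 0 and u <= 0:
        raise ValueError("with t = 0 and u = 0 the lineage never changes")
    active, total, generation = 1, 1, 0
    while active > 0:
        generation += 1
        # Each active element may produce one daughter copy this generation.
        daughters = sum(1 for _ in range(active) if rng.random() < t)
        # A daughter stays active only if it escapes mutation at transposition.
        active_daughters = sum(1 for _ in range(daughters) if rng.random() >= m)
        # Existing active elements survive unless inactivated in the germline.
        survivors = sum(1 for _ in range(active) if rng.random() >= u)
        active = survivors + active_daughters
        total += daughters
        if total > max_elements:
            return None, total
    return generation, total

def mean_time(t, m, u, replicates=50, seed=1):
    """Mean inactivation time over replicates that did not hit the limit."""
    rng = random.Random(seed)
    times = [time_to_inactivation(t, m, u, rng)[0] for _ in range(replicates)]
    finished = [g for g in times if g is not None]
    return sum(finished) / len(finished) if finished else None
```

In this sketch, setting m = 1 leaves only the original element active, so the lineage persists for 1/u generations on average; lowering m allows active daughters to prolong activity, in line with the qualitative pattern described above.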

Models distinguish two basic patterns of ERV clade growth (Figure 4.14). In the first, the master gene model (also called master template model), only one or perhaps a few highly active 'master' ERV loci are capable of amplification. Master genes give rise to daughter copies that are incapable of transcription themselves. The alternative is a multiple source model of ERV amplification, in which daughter copies are capable of independent amplification. Simulations in which the rate of element mutation at transposition was one conform to the master template model. As mutation rate decreases towards zero, the dynamics of the system tend increasingly towards multiple source behaviour.

Simulations showed that the rate at which clade growth proceeds under the master gene model is tightly coupled to the activity of the master gene. This model also predicts a functional role for the 'master gene' in the host, since the persistence of the lineage requires that the master gene is maintained intact over long periods of evolutionary time. In other words, a master gene would probably need to be under selective constraints or in some other way protected so that it could reach fixation and remain retrotranspositionally active. In the multiple source model the rate of amplification is tightly coupled to overall copy number. Unlike the master gene model, it is not necessary to invoke a functional role to support the long-term persistence of individual source genes.
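The contrast between the two models can be expressed as expected copy number over time. The sketch below assumes a single founder copy, a constant transposition rate t, and no loss or inactivation of copies; these are simplifications introduced for illustration, and the function names are hypothetical.

```python
def master_gene_copies(t, generations):
    """Expected copy number when only the master locus transposes.

    Growth is linear: the master adds t copies per generation on average.
    """
    return 1 + t * generations

def multiple_source_copies(t, generations):
    """Expected copy number when every copy can transpose.

    Growth is exponential: the whole population grows by a factor (1 + t).
    """
    return (1 + t) ** generations
```

For example, with t = 0.01 over 1000 generations the master gene model yields about 11 copies, whereas the multiple source model yields roughly 21,000, which illustrates why amplification under the master gene model depends on the master locus alone while amplification under the multiple source model depends on overall copy number.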

4.3.3 Frequency of fixation

Since most ERV insertions are likely to be either selectively disadvantageous, or selectively neutral at best, the majority of insertions created during the lifespan of an ERV lineage are likely to be eliminated either by genetic drift or by selection. Only those insertions that, for one reason or another, are fixed in the host genome, remain to be studied today.

The rate of fixation, k, is determined by the rate at which new insertions are generated and the probability of an insertion reaching fixation. The rate at which new insertions are generated is equal to the sum of the transposition rates of all the elements in the population (Σt). Assuming all insertions are selectively neutral, the probability of any new insertion reaching fixation in a population of N diploid hosts is equal to its initial frequency, that is, the reciprocal of the number of chromosome copies in the population (1/(2N)). Thus:

\[ k = \Sigma t \cdot \frac{1}{2N} \qquad (3) \]

It can be seen from this equation that the rate of fixation increases as transposition rate increases and as population size decreases. Consequently, host population bottlenecks may influence the fate of element insertions. The effect of a bottleneck will depend on its extent and duration, and on the amount of element polymorphism present in the population at the point at which the bottleneck occurs.
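The dependence of fixation rate on population size can be illustrated numerically. The following sketch computes the rate of fixation as the summed transposition rate multiplied by 1/(2N); the scenario of 50 active elements, each with a transposition rate of 0.01, is a hypothetical example rather than a result from the simulations.

```python
def fixation_rate(transposition_rates, n_hosts):
    """Expected neutral fixations per generation.

    Sum of the transposition rates of all elements, multiplied by the
    fixation probability 1/(2N) of a new neutral insertion.
    """
    return sum(transposition_rates) / (2 * n_hosts)

# Hypothetical element population: 50 active elements, each with t = 0.01.
rates = [0.01] * 50
k_large = fixation_rate(rates, 1000)  # population of 1000 hosts
k_small = fixation_rate(rates, 10)    # the same elements in 10 hosts
```

Here the rate rises from 0.00025 to 0.025 fixations per generation, a hundredfold increase for a hundredfold reduction in host population size.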


To investigate the effect of host population bottlenecks on fixation frequency, simulations were set up as follows. A fixed insertion was introduced into a population of 1000 hosts; each host had a single pair of homologous, non-recombining chromosomes consisting entirely of non-coding DNA. The rate of mutation in the host germline was 0.003; elements had a transposition rate of 0.01 and a mutation rate of 0.5 (these parameters were chosen because they were found to produce a steady rate of element population growth). Simulations ran for 2000 generations. The number of fixed elements present in the population after 2000 generations was compared for populations that underwent a bottleneck after 500 generations and populations that did not. Two bottleneck events were simulated, both occurring between 500 and 505 generations. One bottleneck reduced the host population size to 100 hosts for five generations; the second reduced the host population to 10 hosts for five generations. A two-tailed t-test (assuming unequal variances) indicated that a bottleneck that reduced the host population size to 100 significantly increased the number of fixed elements present in the 2000th generation as compared to when there were no bottlenecks (p = 6.9454E-09). The more drastic the bottleneck, the greater the effect on fixation frequency: a bottleneck that reduced the host population to ten for five generations resulted in a significant increase as compared to a bottleneck that reduced it to 100 (p = 2.3268E-07). Figure 4.15 shows the results of bottleneck simulations.
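For readers unfamiliar with the unequal-variance (Welch) form of the t-test, the statistic and its approximate degrees of freedom can be computed as in the sketch below. The function name is hypothetical and the input lists shown are arbitrary illustrative numbers, not counts from the bottleneck simulations.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each sample mean.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Arbitrary illustrative inputs (not thesis data).
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

The two-tailed p-value is then read from a t distribution with the computed degrees of freedom; a library routine such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the whole calculation in one call.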


Figure 4.14 Models of ERV clade growth. [Diagram labels: master gene model (also called master template model), Parent A = Master, non-propagating copies; source gene model, successive parent copies; inactivating mutation.]

Figure 4.15 Effect of host population bottlenecks on fixation frequency in a population of 1000 hosts. The plot shows the average number of fixed elements from 50 simulations in which there were (a) no bottlenecks, (b) a bottleneck to 100 hosts between the 500th and 505th generations, and (c) a bottleneck to 10 hosts between the 500th and 505th generations.


4.4 DISCUSSION

As stated previously, the generation of ERV distribution and diversity comprises three distinct processes: (1) colonisation, (2) amplification, and (3) fixation. A novel ERV lineage is established by colonisation of the host germline by a 'founder' ERV insertion. Following colonisation, the evolution of an ERV lineage is shaped by the retrotranspositional activity of insertions (amplification), and by factors influencing the frequency of insertions at specific loci in the host population (with some insertions reaching fixation and thereby establishing a more or less permanent presence in the germline).

4.4.1 Colonisation, the pace of amplification, and loss versus persistence of ERV lineages

The initial fate of founder insertions depends on the effect they have on their host. Although active selection of ERV insertions is thought to have occurred in some cases (see below, Section 4.4.3), it is generally assumed that ERV insertions will either be neutral or negatively selected, and susceptible to rapid elimination via selection or drift. Consequently, it can be expected that most germline colonisation events are followed by rapid loss of the colonising insertion from the host population. However, ERVs that escape immediate elimination may undergo amplification in their copy number. Persistence of the ERV lineage becomes increasingly likely as ERV copy number increases.

ERV amplification may involve reinfection of germline cells, or possibly, intracellular retrotransposition, and it is not clear which is more important in ERV evolution. Intracellular retrotransposition is precluded by the typical pathway of viral assembly, which directs ERV sequences towards budding and cellular export. Reinfection of virus-producing cells may also be hindered to some extent by secreted Env molecules binding to viral receptors (receptor interference). Both effects together could explain the limitations in copy number of HERV proviruses relative to other retroelements (Lower et al, 1996). Nevertheless, it is clear that amplification of ERV copy number has occurred in the past, whatever the mechanism of amplification (see Tristem, 2000).

Simulation demonstrated that, as expected, most germline colonisation events are followed closely by loss of ERV sequences from the host germline by drift when the founder insertion is selectively neutral. These results suggest that ancient retroviruses may only have established ERV lineages infrequently. Consequently, only a small subset of ancient retrovirus diversity may be represented today as ERVs in vertebrate genomes.

However, ERVs do occasionally evade elimination, and subsequent amplification of the colonising insert can establish a population of related insertions in the host germline. Simulations indicated that once the distribution and number of elements in the host population reaches some threshold configuration, elements are no longer susceptible to elimination via drift, and (in the absence of mutation) establish a permanent presence in the germline. The probability of an ERV element successfully establishing an ERV lineage and avoiding elimination increased as the transposition rate of the element increased. The success of highly active elements was independent of host population size and the gene density of the host genome. Thus, simulations indicated that virulent, aggressively transposing ERVs are more likely to establish persistent lineages than 'docile', inactive ones. To some extent, this may be a reflection of the assumptions made about the effects of element insertion. In all simulations, insertions into coding DNA were lethal only when homozygous, whereas heterozygous insertions in coding DNA were neutral. There were no intermediate effects of insertion on host fitness, and no detrimental effects related to carrying large numbers of elements. Thus, elements were only selected against in hosts homozygous for a particular insertion. Highly active elements create novel element polymorphism rapidly, but homozygotes arise only infrequently while the frequency of individual insertions remains low (especially if the host population is very large). Each novel insertion is initially present as a single copy, and although it may subsequently increase in frequency via genetic drift, the increase generally occurs slowly. Consequently, selection against elements is inefficient, and element lineages that avoid elimination via drift in the early stages of their development can spread rapidly through the population.
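The inefficiency of selection acting only on homozygotes can be illustrated with a small calculation. The sketch assumes random mating, so that an insertion at frequency p is homozygous in a fraction p² of hosts; the function name is hypothetical.

```python
def expected_homozygotes(n_hosts, copies=1):
    """Expected number of hosts homozygous for one insertion.

    Assumes random mating: an insertion present on `copies` of the 2N
    chromosomes has frequency p and is homozygous in a fraction p**2 of hosts.
    """
    p = copies / (2 * n_hosts)
    return n_hosts * p ** 2
```

In a population of 100 hosts, a new insertion present as a single copy is expected to be homozygous in only 0.0025 hosts per generation, and an insertion must drift up to 20 copies before even one homozygous host is expected.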

Are the assumptions that were made about the effect of element insertion reasonable? It remains possible that they are not. Insertion of an ERV might have a dominant lethal effect (e.g. if the presence of the novel ERV in the germline causes congenital disease by expressing infectious virus, or by lethally unbalancing the regulation of crucial gene products). Providing, however, that the presence of an ERV is not in itself pathogenic, then the assumption that insertions are generally neutral, even when they are heterozygous in coding DNA, does not seem wildly unreasonable. Certainly, transposable elements have been known to spread through host populations with incredible speed, suggesting that they can thrive despite a proclivity to insert into coding sequences. For example, since apparently undergoing horizontal transmission from Drosophila willistoni to Drosophila melanogaster sometime around 1950, transposable elements called 'P elements' have spread like wildfire, so that most D. melanogaster have P elements, though not those collected in the wild before 1950 and kept in isolation since (Biemont, 1992).

Although a highly active element might spread very successfully through the host population, a high rate of transposition cannot be maintained indefinitely. As element populations expand, they inevitably exert a negative effect on their host, if only through sheer force of numbers. The success of all TE lineages in the long term depends on their capacity to limit transposition to levels tolerable to the host, and this in turn depends on their ability to regulate their expression, and thereby control the timing, location and extent of transposition. It is unsurprising, then, that the diversity of TEs so far identified shows a trend towards increased sophistication in the regulation of expression (Eickbush, 1994). In addition, the presence of active elements in the host germline may select for resistant hosts in which transposition is suppressed by host mechanisms. For example, the activity of P elements in D. melanogaster has apparently decreased since invasion, suggesting adaptation of either the D. melanogaster genome, or the P elements themselves (Lozovskaya et al, 1995).


ERV lineages may be established through aggressive transposition following colonisation, and subsequent reduction in amplification rate. Amplification rate might be reduced by a number of means. Transcription of retrovirus genes is generally thought to be repressed or inhibited in areas of DNA methylation (Lorincz et al, 2001, Svoboda, 2000). Methylation may represent a host mechanism for suppressing the activity of ERVs and other transposable elements (Yoder and Bestor, 1996), and recent experiments have supported this conclusion with specific regard to ERVs (O'Neill et al, 1998). Another mechanism by which selection on hosts could reduce ERV transposition is by favouring the spread of transcription-repressing 'silencer' sequences (Feuer et al, 1989). Finally, ERV activity may be curtailed by the effects of mutation (see Section 4.4.2 below).

4.4.2 The ERV lineage 'lifecycle' and the dynamics of ERV clade growth

Simulation indicated that once a population of active ERVs achieves a certain breadth of distribution in the host population, amplification of ERV copy number through retrotransposition exceeds loss of elements via drift or selection, and the ERV population subsequently increases in size. Since the transposition rates of all elements remained constant in these simulations, ERV populations grew exponentially. However, in real life, ERV transposition rate is unlikely to be constant for all insertions, and is unlikely to remain constant over time. Firstly, since retrotransposition is highly error-prone (Telesnitsky and Goff, 1997), daughter ERVs may often be non-functional, or have reduced transposition efficiency. Secondly, assuming ERV insertions are neutral or negatively selected, there is no pressure to maintain the sequence integrity of active ERVs, and consequently these insertions will steadily accumulate mutations. Over time, mutation of ERV coding and promoter sequences will decrease the efficiency of retrotransposition in active ERVs (e.g. by disrupting mRNA expression, or reducing packaging efficiency). In the absence of recombination, one insertion after another will be inactivated and the retrotranspositional activity of the lineage will gradually peter out. The irreversible accumulation of mutations in clonal lineages is sometimes referred to as 'Muller's ratchet' (Muller, 1964). Without selection to maintain functional ERVs, all ERV lineages are destined to succumb to Muller's ratchet and eventually cease activity. Once every element has been inactivated, the activity of the lineage comes to an end. Over millennia, the fixed ERV insertions of inactive lineages will gradually decay into unrecognisable 'junk DNA' as they are overwritten by accumulated mutations.

The activity of ERV lineages can be thought of as a 'lifecycle' (Figure 4.16). The lifecycle begins with colonisation by a founder ERV. It then proceeds through amplification of the founder ERV, generation of ERV polymorphism, and expansion of the ERV population. The rate of amplification subsequently declines, however, either in response to host mechanisms suppressing transposition, or due to the accumulation of mutations in active ERVs. The slowdown and gradual cessation of transposition may lead to the loss of all elements from the population, and the lifecycle of the lineage may thus end. However, if element insertions remain in the population (for example if they have reached fixation), the lifecycle of the lineage ends when all of these elements have been inactivated by one means or another.

Simulations that involved both host and element mutation illustrated the curtailing effect that copying errors at transposition have on the activity of element lineages. The effect of mutation was particularly strong when amplification dynamics tended towards a master gene (master/slave) model, in which mutation at retrotransposition tends to inactivate daughter insertions. Since reverse transcription is known to be highly error-prone, it is possible that errors during retrotransposition exert a significant dampening effect on the growth of ERV lineages. However, it must be emphasized that, in real life, even degenerate insertions that do not encode functional enzymes may retain transpositional activity if enzymatic functions are supplied in trans by insertions elsewhere in the genome. This type of process is illustrated by the well-described occurrence of defective retroviruses that require a helper virus for successful replication (Shimotohno and Temin, 1981; Wei et al, 1981; Tabin et al, 1982). A future aim is to incorporate the sharing of gene products into the simulation model and examine the effects on the persistence of element activity (see Section 3.5.4, p175). Unfortunately time did not permit these simulations to be included in the work described here.


Figure 4.16 The lifecycle of an ERV lineage

[Diagram stages: exogenous retrovirus; colonisation; loss from population (via selection or drift); amplification; loss from population (via selection or drift); fixation of insertions; inactivation and decay (in absence of selection); maintenance of coding capacity under host selection (ERV sequences co-opted by host); recombination with exogenous retroviruses infecting the host species, and release of ERV sequences back into the exogenous virus population.]

Figure 4.16 ERV lineages can be thought of as having a 'lifecycle'. The lifecycle begins with colonisation by a founder ERV. Colonisation may be immediately followed by loss of the ERV sequences by drift or selection. Alternatively, the ERV may undergo amplification via retrotransposition or reinfection of host cells. The rate of amplification subsequently declines, however, either in response to host mechanisms suppressing transposition, or due to the accumulation of mutations in active ERVs. The slowdown and gradual cessation of transposition may lead to the loss of all elements from the population, and the lifecycle of the lineage may thus end. However, element insertions that reach fixation remain in the population. ERVs that reach fixation may be under host selection (for examples see Schulte and Wellstein, 1998; Meisler and Ting, 1993; Kulski et al, 1997), in which case their coding capacity may be to some degree retained. ERVs that reach fixation but are not under host selection are destined to decay into junk at approximately the same rate as a host pseudogene. Potentially, recombination could occur between copackaged genomic RNAs from ERV lineages or between ERV genomic DNA and genomic RNA derived from exogenous retroviruses. Thus ancient ERV sequences could potentially be recirculated in infectious virus populations after remaining dormant as endogenous sequences for thousands or perhaps even millions of years. This process is the only way in which the fitness of


Preliminary reconstructions of HERV phylogenies have indicated that patterns of activity may vary between HERV families (Tristem, unpublished data). For example, the HERV-L family apparently underwent an early burst of activity and has been inactive since, whereas activity within the HERV-H family appears to have been retained throughout its evolution. It is important to stress that these trees are based on ERV insertions that are almost certainly at fixation in the human genome, and consequently may not reflect the true extent of amplification over time (see Section 4.4.3 below). Nonetheless, these trees seem to suggest that patterns of activity can vary, and that activity of an ERV lineage may persist for millions, perhaps even tens of millions, of years.

4.4.3 Fixation Frequency

For the most part, only ERV insertions that, for one reason or another, are fixed in the host genome remain to be studied today. It is therefore important to consider the factors that contribute to fixation. Many papers use the term 'amplification' to talk about an increase in the copy number of ERV insertions around a specified time (calculated, for example, by using LTR divergence data, or inferred from molecular phylogenies). The term is unfortunate because it is potentially misleading; an increase in ERV copy number may not reflect any increase in the rate of ERV amplification around the time in question. Since the majority of ERVs, in humans at least, are estimated to be at fixation, an apparent amplification of ERV copy number reflects an increase in fixation frequency. Thus amplification in copy number around a specific time in reconstructed phylogenies based on fixed ERVs may represent a burst of fixation, rather than a burst of amplification (the two might correspond, but need not).

Some ERVs may have been fixed directly by positive selection. In general, active selection of ERV insertions is thought to be unlikely, particularly since ERVs may retain pathogenic potential, and may damage or inactivate host genes through insertion nearby or within host open reading frames with potentially lethal consequences for the host organism (Kidwell and Lisch, 1997). Occasionally however, ERV insertions may confer advantages on the host, and in these cases positive selection and subsequent fixation of retroelement insertions can take place. Although rare, the incorporation of retrovirus-derived sequences into certain host genes indicates advantageous traits have been conferred by ERV insertion in the past (Schulte and Wellstein, 1998; Kulski et al, 1997). For example, a member of the HERV-E family is inserted into the promoter region of the amylase gene cluster, generating an unusual salivary-specific promoter (Meisler and Ting, 1993). An unsubstantiated but plausible idea is that favourable LTR effects on host gene expression may lead to positive selection of ERV insertions (Sverdlov, 2000). Alternatively, it has been suggested that ERV insertions may benefit the host organism by providing immune protection against related exogenous retroviruses via receptor interference (Weiss, 1993). In this scenario, selection maintains a source of new insertions in the host germline. Although there is no conclusive evidence to support this theory, receptor interference has been demonstrated for two loci in mice (Fv1 and Fv4) which appear to be derived from ancient proviruses (Best et al, 1997). More speculative roles for ERVs in host evolution have also been proposed. For example, the placental expression of ERVs has prompted the suggestion that acquisition of ERVs was essential for the evolution of the placental mode of reproduction (Villareal, 1997).

Generally, however, it is considered unlikely that ERV insertions will be positively selected. Most insertions are probably negatively selected or neutral at best. Neutral, or even weakly negatively selected insertions may nevertheless reach fixation via processes such as genetic 'hitchhiking'. When a gene at a particular locus is changing frequency over time, it can cause related changes at linked loci. Suppose, for instance, that directional selection is substituting one allele, A, for another, A', at a specific locus, and an ERV insertion occurs at a linked locus nearby on a chromosome carrying A. Providing that selection against the novel insertion does not outweigh selection in favour of A, then the frequencies of A and of the insertion will increase in tandem, and the novel insertion will eventually be fixed along with A. The insertion is said to have 'hitchhiked' to fixation (Maynard Smith and Haigh, 1974).
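The condition for hitchhiking can be sketched with a deterministic haploid recursion. The model below is an illustrative simplification and not part of the simulation work reported here: it assumes complete linkage, that every copy of the favoured allele A carries the insertion (as when the insertion arises while A is still a single copy), a selective advantage s for A, and a cost c for the insertion. All names are hypothetical.

```python
def sweep_frequencies(s, c, p0, generations):
    """Frequency trajectory of the haplotype carrying A plus the insertion.

    Haploid, deterministic, complete linkage: the A+insertion haplotype has
    relative fitness (1 + s) * (1 - c); all other haplotypes have fitness 1.
    """
    w = (1 + s) * (1 - c)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Standard selection recursion: weight by fitness, then renormalise.
        p = p * w / (p * w + (1 - p))
        trajectory.append(p)
    return trajectory
```

Under these assumptions the insertion rises towards fixation whenever (1 + s)(1 - c) > 1, for example with s = 0.05 and c = 0.01, and declines when the cost outweighs the advantage, for example with s = 0.05 and c = 0.10.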


In small populations, neutral insertions may reach fixation without even the benefit of hitchhiking effects, but simply through random genetic drift. It has been speculated that 'plagues' of extremely virulent retroviral infection may have been responsible for causing host population bottlenecks in the past, with ERV fixation being associated with host survival (Sverdlov, 2000). While this theory is difficult to substantiate, simulations in which host populations underwent sudden bottlenecks demonstrated the striking effect that constricting host population size can have on the frequency of ERV fixation. This observation has led us to consider the possibility that retroviral fixation is associated with population bottlenecks. If population size is the overriding factor in retrovirus fixation, ERV fixations may be genetic markers of ancient bottleneck events. Since the integration dates of ERV insertions can be estimated using LTR divergence data, it is possible that ERVs might provide a means to identify and approximately date ancient host bottlenecks. The potential of ERVs in this regard is intriguing. Potentially, ERVs could provide a means of inferring bottlenecks in the very distant past (up to 80 million years ago). It would be particularly interesting if these bottlenecks correlated with estimated speciation dates in vertebrate lineages. With regard to this, it may be significant that insertions of the HERV-H family are reported to be rare in New World primates but present at high copy number in Old World primates (Mager and Freeman, 1995). Perhaps this distribution reflects an ancient bottleneck in the evolution of Old World primates.
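Dating an insertion from LTR divergence rests on the fact that the two LTRs of a provirus are identical at integration and diverge independently thereafter, so that age is approximately K/(2r), where K is the divergence between the LTRs and r is the substitution rate per site per year. The sketch below assumes pre-aligned sequences and applies a Jukes-Cantor correction; the function names are hypothetical, and the rate supplied by the caller is a placeholder that must be chosen for the host lineage in question.

```python
import math

def p_distance(ltr5, ltr3):
    """Proportion of differing sites between two aligned LTR sequences."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to equal length")
    # Ignore alignment columns that contain a gap in either sequence.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at a site."""
    return -0.75 * math.log(1 - 4 * p / 3)

def integration_age(ltr5, ltr3, rate_per_site_per_year):
    """Approximate age in years: K / (2r), since both LTRs diverge."""
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2 * rate_per_site_per_year)
```

As an illustration, two LTRs differing at 5% of sites, with an assumed rate of 2.5 × 10^-9 substitutions per site per year, give an estimated age of a little over 10 million years. The estimate is only as good as the assumed rate and ignores processes such as gene conversion between LTRs.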

If speciation were associated, at least in some instances, with host population bottlenecks and fixation of ERV insertions, we might expect to find distinctive ERV 'signatures' in divergent host taxa. The ERV signature of a particular species might be reflected in the presence of fixed ERVs at distinct loci in different taxa, or in the presence of distinct ERV lineages in different taxa. One line of enquiry for future simulation work might involve investigating the effect of lineage sorting events on the ERV composition of sister taxa.

Simulations showed that when insertions are neutral with regard to host fitness, very few of the insertions that are created during the amplification of an ERV lineage actually reach fixation. Consequently, cooption and positive selection of ERV sequences by the host may be a significant factor in determining the composition of fixed ERVs, even though it is only thought to occur rarely. By the same token, genetic hitchhiking may also exert a significant influence.

5. CONCLUSIONS

5.1.1 ERVs as evolutionary markers

ERVs are widespread features within the genomes of vertebrate species, and represent an important archive of evolutionary information. Patterns of ERV distribution and diversity within and between vertebrate genomes have been generated in response to numerous ecological and evolutionary pressures, and are therefore potentially informative. These patterns reflect not only the interactions between exogenous retroviruses and their hosts over evolutionary time, but also the processes that have shaped vertebrate genomes. In addition, the irreversible, independent nature of ERV insertion allows ERVs to be used as extremely powerful genetic markers for identifying common ancestry between taxa. As such they represent a new tool for systematic biology that can be strategically integrated with other phylogenetic characters.

5.1.2 ERVs as markers of exogenous retrovirus evolution

Until relatively recently, most studies of viruses have focussed on their role as agents of human disease, or their potential as microbiological tools (Roizman et al 1997). These investigations typically consist of studies of biochemical and molecular properties, or of interactions with host cellular and immune systems. Traditionally, viruses have not been well studied in the broader context of ecology and evolution.

Gradually, however, this is beginning to change. The emergence of novel viral pathogens (particularly the HIV viruses) has been a catalyst in the recognition that viruses also need to be studied in an ecological and evolutionary context (Morse and Schluederberg, 1990). Numerous questions remain to be answered with regard to the role of viruses as widespread and dynamic components of ecosystems: How many different types of virus are there? How are diverse virus families related to one another? How variable are viruses within and between groups of host species? How frequently do viruses cross species barriers and transfer between host lineages? Are we more likely to acquire new viruses from animals that we are closely related to phylogenetically, or from those that we are exposed to most frequently in our environment?

Ecological and evolutionary investigations are frequently assisted by the acquisition of informative and reliable phylogenies (Page and Holmes, 1998). Unfortunately, many attempts to reconstruct viral phylogeny have been relatively unsuccessful. Viruses typically provide little in the way of reliable taxonomic characters, and most modern viral taxonomy is based almost exclusively on gene sequence data. However, even when sequence data are available, phylogenetic reconstruction has often proved problematic, with trees very poorly resolved. The potential for extremely rapid sequence evolution in viral genes, and for the efficient exchange of genes via recombination, are frequently cited as obstacles to the reconstruction of viral phylogeny (Roizman et al 1997).

However, another factor that may account for the poor resolution often seen in viral phylogenies is the extremely patchy sampling of viral diversity. There is a very strong bias within virology towards research on viruses that cause socially and economically significant disease. As a result, the majority of viruses that have been characterised are those of man, domestic and laboratory animals, and domestic crops. This point is illustrated by the range of lentivirus species identified to date (Table 2.2).

In the case of animal viruses, sampling of diversity is further hindered by the fundamental difficulty of isolating viruses from wild animals. Novel viruses can usually only be isolated from infected organisms, but there is no straightforward method for targeting and accessing infected animals in wild populations; viral infections are often short in duration, and only a small proportion of the host population may be infected at any one time. Even given access to infected tissue, isolation and characterisation of viral agents can be challenging. As a consequence, sampling of viral biodiversity has been limited, and the overall diversity is likely to be greatly underrepresented. The first challenge to the study of viral ecology and evolution may therefore simply be to gain a realistic impression of distribution and diversity within at least one group of viruses.

ERVs embedded in genomic DNA provide a tractable source of data for the exploration of retroviral distribution and diversity. Not only are ERV sequences relatively easy to obtain, but they also provide a unique opportunity to compare ancient and modern retrovirus taxa. For this reason, the retroviruses are uniquely suited to ecological and evolutionary studies. Of course, many of the features that make retroviruses easier to study than other viruses may also make them different, and there can be no guarantee that findings for retroviruses will be applicable to other virus groups. Nevertheless, the relative ease with which retrovirus diversity can be explored makes this family particularly amenable to study.

5.1.3 Class II retrovirus distribution and diversity

Sampling of Class II ERV distribution and diversity using PCR screening indicates that these retroviruses are, on the whole, restricted to higher vertebrates (Martin, 1999). However, Class II ERVs have been identified in caecilians and snakes, and it remains possible that further sampling will reveal additional examples of lower vertebrate Class II retroviruses.

The distribution of Class II ERVs across host taxa, and the condition of viral reading frames across the amplified region, suggest that Class II retroviruses have undergone a relatively recent radiation. Given that horizontal transmission of the HIV viruses is widely thought to have occurred recently, it could be argued that active radiation of Class II retroviruses continues to occur. Comparison with data obtained for Class I retroviruses suggests that, in general, recent activity has been greater amongst Class II retroviruses than those of Class I (see Figures 2.15 to 2.19). Class I HERVs appear to be more common, more diverse, and in general, more ancient than Class II HERVs. The HERV.K.HML2 family, at least, appears to have been established very recently, relatively speaking. However, this study has also distinguished three other Class II HERV families (HML.5, HML.6 and HML.9) that appear to represent more ancient lineages of Class II HERV.

Phylogenetic analysis of Class II ERVs suggested that Class II diversity is far greater than previously recognised, especially in avian host taxa. In addition to identifying novel Alpharetroviruses and Betaretroviruses, this study has identified two entirely novel groups of avian retrovirus (Avian Class II type 1 ERVs and Avian Class II type 2 ERVs), a group of novel ERVs closely related to the unclassified avian retrovirus LDV, and numerous additional Class II ERV lineages with one to three members.

The relationships between most Class II groups were poorly resolved. I believe this is at least partly a reflection of the need for further sampling of Class II diversity. The data obtained in this study would not support an analysis of the pattern and timing of horizontal transmission events within host classes; that would require far more data. Although, within host classes, Class II retroviruses appear to be radiating via horizontal transmission, horizontal transmission of Class II retroviruses from one host class to another appears to be relatively rare, reflecting results obtained for the MLV-related Class I retroviruses (Martin and Tristem, 2000).

The study of Class II retroviruses presented here suggests directions for further research. Of particular interest is a Betaretrovirus sequence isolated from an ostrich. This is the first avian Betaretrovirus identified and appears to represent a relatively recent inter-class transmission event. Future work in the Retroviral Evolution Group at Silwood Park will involve the cloning and complete sequencing of this sequence.

5.1.4 The generation of ERV diversity

The progress of genome sequencing projects in recent years has led to an increasing recognition of the potential of non-coding, repetitive and parasitic or 'selfish' DNA sequences such as ERVs to inform our understanding of genome evolution. The sheer quantity of ERVs in most vertebrate genomes is a clear indication of the fundamental role they have played in shaping genomic diversity. Models that predict how ERV distribution and diversity are generated in response to ecological and evolutionary pressures will aid the interpretation of genomic data, which are now accumulating with great rapidity. Computer simulation using individual-based models provides a convenient basis to explore the effects of specific ecological and evolutionary parameters on ERV evolution.

Simulation demonstrated some of the potential of ERV distributions to reveal information about ancient host ecology. Fixation of ERV sequences was uncommon in simulations, but was greatly increased when host populations harbouring active elements and a sufficient degree of element polymorphism underwent a population bottleneck. Therefore, ERV insertions (which can be dated by using phylogenetic data, by their presence or absence in related species, and by using LTR divergence data) might provide a means of identifying ancient host population bottlenecks.
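Dating by LTR divergence rests on the fact that the two LTRs of a provirus are identical at the time of integration and then accumulate substitutions independently. A minimal sketch of the calculation follows. It assumes the two LTRs are already aligned, uses a Jukes-Cantor correction as one possible choice of distance, and takes the host neutral substitution rate as a user-supplied parameter. The sequences and the rate shown are invented for illustration, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns carry no substitution information
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def jc69_distance(p):
    """Jukes-Cantor correction for multiple substitutions at the same site."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def integration_age(divergence, rate_per_site_per_year):
    """Age in years; both LTRs accumulate changes, hence the factor of 2."""
    return divergence / (2.0 * rate_per_site_per_year)


ltr_5prime = "ACGTACGTACGTACGTACGT"  # made-up aligned LTR fragments
ltr_3prime = "ACGTACGAACGTACGTAC-T"
p = p_distance(ltr_5prime, ltr_3prime)
# The rate below is a placeholder, not an estimate for any particular host lineage.
age = integration_age(jc69_distance(p), rate_per_site_per_year=1e-9)
print(f"p-distance = {p:.4f}, estimated age = {age:.3g} years")
```

The estimate is only as good as the assumed substitution rate, and short LTRs give wide uncertainty, so in practice such dates are best cross-checked against the presence or absence of the insertion in related species.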

Simulation also illustrated some of the limitations of ERV data, particularly with regard to inference about exogenous retrovirus ecology and evolution. From simulations it appeared that perhaps the majority of germline colonisation events are followed by loss of the colonising ERV sequences from the population, even in the absence of serious detrimental effects of germline insertion. Furthermore, simulation showed that host and element mutation can rapidly deactivate an ERV lineage, and it is likely that retrotransposition is a highly error-prone process, given the error rate of RT measured in vitro (Preston et al, 1988; Bebenek and Kunkel, 1993; Williams and Loeb, 1992). Lastly, in large populations, the vast majority of neutral ERV insertions are lost from the population. Taken together, these data indicate that the ERV composition of modern vertebrate genomes may be only a poor reflection of ancient exogenous retrovirus diversity. An exogenous retrovirus may have a longstanding association with a particular host, and nevertheless fail to establish any ERV insertions. Germline integration might be precluded by the cell tropism of the virus, or perhaps by the pathogenic potential of ERVs in certain lineages. Consequently we should be cautious when interpreting ERV distribution data with regard to the ecology and evolution of ancient retroviruses; the intricacies of retrovirus evolution may not be fully and accurately reflected in the distribution of fixed ERVs.

6. REFERENCES

ABERGEL, C., ROBERTSON, D. L. & CLAVERIE, J. M. (1999). "Hidden" dUTPase sequence in human immunodeficiency virus type 1 gp120. J. Virol. 73, 751-753.

ADAM, M. A. & MILLER, A. D. (1988). Identification of a signal in a murine retrovirus that is sufficient for packaging of nonretroviral RNA into virions. J Virol 62, 3802-6.

ALTSCHUL, S. F., MADDEN, T. L., SCHAFFER, A. A., ZHANG, J., ZHANG, Z., MILLER, W. & LIPMAN, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nuc. Acids Res. 25, 3389-3402.

ANDERSON, K. P., LIE, Y. S., LOW, M. A., WILLIAMS, S. R., FENNIE, E. H., NGUYEN, T. P. & WURM, F. M. (1990). Presence and transcription of intracisternal A-particle-related sequences in CHO cells. J Virol 64, 2021-32.

ANDERSSON, M.-L., LINDESKOG, M., MEDSTRAND, P., WESTLEY, B., MAY, F. & BLOMBERG, J. (1999). Diversity of human endogenous retrovirus class II-like sequences. J. Gen. Virol. 80, 255-260.

ARAVIND, L. & KOONIN, E. V. (1999). G patch: a new conserved domain in eukaryotic RNA-processing proteins and type D retroviral polyproteins. TIBS 24, 342-344.

ATKINS, J. F. & GESTELAND, R. F. (1999). Intricacies of ribosomal frameshifting. Nature Structural Biology 6, 206-207.

BAILLIE, G. J. & WILKINS, R. J. (2001). Endogenous type D retrovirus in a marsupial, the common brushtail possum (Trichosurus vulpecula). Journal of Virology 75, 2499-2507.

BAKKER, A., LI, X. Q., RULAND, C. T., STEPHENS, D. W., BLACK, A. C. & ROSENBLATT, J. D. (1996). Human T-cell leukemia virus type 2 Rex inhibits pre-mRNA splicing in vitro at an early stage of spliceosome formation. Journal of Virology 70, 5511-5518.

BALTIMORE, D. (1970). Viral RNA-dependent DNA polymerase. Nature 226, 1209-1211.

BANKI, K., MACEDA, J., HURLEY, E., ABLONCZY, E., MATTSON, D. H., SZEGEDY, L., HUNG, C. & PERL, A. (1992). Human T-Cell Lymphotropic Virus (HTLV)-Related Endogenous Sequence, HRES-1, Encodes a 28-kDa Protein - a Possible Autoantigen For HTLV-I Gag-Reactive Autoantibodies. Proceedings of the National Academy of Sciences of the United States of America 89, 1939-1943.

BARBULESCU, M., TURNER, G., SEAMAN, M. I., DEINARD, A. S., KIDD, K. K. & LENZ, J. (1999). Many human endogenous retrovirus K (HERV-K) proviruses are unique to humans. Current Biology 9, 861-868.

BARR, M. C., ZOU, L. L., LONG, F., HOOSE, W. A. & AVERY, R. J. (1997). Proviral organization and sequence analysis of feline immunodeficiency virus isolated from a Pallas' cat. Virology 228, 84-91.

BARTOLOME, C., MASIDE, X. & CHARLESWORTH, B. (2002). On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol. Biol. Evol. 19, 926-937.

BEASLEY, B. E. & HU, W. S. (2002). cis-Acting elements important for retroviral RNA packaging specificity. J Virol 76, 4950-60.

BEBENEK, K. & KUNKEL, T. A. (1993). The fidelity of reverse transcriptases. In Reverse transcriptase (ed. A. M. Skalka and S. P. Goff), pp. 85-102. Cold Spring Harbour Laboratory Press, New York.

BENIT, L., DEPARSEVAL, N., CASELLA, J. F., CALLEBAUT, I., CORDONNIER, A. & HEIDMANN, T. (1997). Cloning of a new murine endogenous retrovirus, MuERV-L, with strong similarity to the human HERV-L element and with a gag coding sequence closely related to the Fv1 restriction gene. Journal of Virology 71, 5652-5657.

BENIT, L., DESSEN, P. & HEIDMANN, T. (2001). Identification, phylogeny, and evolution of retroviral elements based on their envelope genes. Journal of Virology 75, 11709-11719.

BENIT, L., LALLEMAND, J. B., CASELLA, J. F., PHILIPPE, H. & HEIDMANN, T. (1999). ERV-L elements: a family of endogenous retrovirus-like elements active throughout the evolution of mammals. Journal of Virology 73, 3301-3308.

BENNETZEN, J. L. (1996). The Contributions of Retroelements to Plant Genome Organization Function and Evolution. Trends in Microbiology 4, 347-353.

BENSON, S. J., RUIS, B. L., FADLY, A. M. & CONKLIN, K. F. (1998). The unique envelope gene of the subgroup J avian leukosis virus derives from ev/J proviruses, a novel family of avian endogenous viruses. Journal of Virology 72, 10157-10164.

BENVENISTE, R. & TODARO, G. (1975). Evolution of type C viral genes; preservation of ancestral murine type C viral sequences in pig cellular DNA. Proc. Natl. Acad. Sci. U.S.A. 72, 4090-4094.

BENVENISTE, R. E. & TODARO, G. J. (1977). Evolution of primate oncornaviruses: An endogenous virus from langur (Presbytis spp.) with related virogene sequence in other Old World monkeys. Proc. Natl. Acad. Sci. USA 74, 4557.

BERKHOUT, B. & VAN WAMEL, J. L. (1996). Role of the DIS hairpin in replication of human immunodeficiency virus type 1. J Virol 70, 6723-32.

BERKHOUT, B. (1996). Structure and function of the human immunodeficiency virus leader RNA. Nucleic Acid Res. Mol. Biol. 54, 1-34.

BERKOWITZ, R., FISHER, J. & GOFF, S. P. (1996). RNA packaging. Curr Top Microbiol Immunol 214, 177-218.

BEST, S., LE TISSIER, P. R. & STOYE, J. P. (1997). Endogenous retroviruses and the evolution of resistance to retroviral infection. Trends in Microbiology 5, 313-318.

BIEMONT, C. (1992). Population genetics of transposable DNA elements - A Drosophila point of view. Genetica 86, 67-84.

BIETH, E. & DARLIX, J. L. (1992). Complete Nucleotide-Sequence of a Highly Infectious Avian-Leukosis Virus. Nucleic Acids Research 20, 367-367.

BITTNER, J. J. (1936). Some possible effects of nursing on the mammary gland tumour incidence in mice. Science 84, 162-162.

BLUSCH, J. H., PATIENCE, C., TAKEUCHI, Y., TEMPLIN, C., ROOS, C., VON DER HELM, K., STEINHOFF, G. & MARTIN, U. (2000). Infection of nonhuman primate cells by pig endogenous retrovirus. Journal of Virology 74, 7687-7690.

BOCK, M. & STOYE, J. P. (2000). Endogenous retroviruses and the human germline. Curr. Opinion. Gen. Dev. 10, 651-655.

BOEKE, J. D. & STOYE, J. P. (1997). Retrotransposons, Endogenous retroviruses, and the evolution of retroelements. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 343-435. CSHL Press, New York.

BOWERMAN, B., BROWN, P. O., BISHOP, J. M. & VARMUS, H. E. (1989). A nucleoprotein complex mediates the integration of retroviral DNA. Genes Dev 3, 469-78.

BOYCE-JACINO, M. T., O'DONOGHUE, K. & FARAS, A. J. (1992). Multiple Complex Families of Endogenous Retroviruses Are Highly Conserved in the Genus Gallus. Journal of Virology 66, 4919-4929.

BROWN, E. W., YUHKI, N., PACKER, C. & O'BRIEN, S. J. (1994). A lion lentivirus related to feline immunodeficiency virus: epidemiologic and phylogenetic aspects. J Virol 68, 5953-68.

BROWN, P. O. (1997). Integration. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 161-204. Cold Spring Harbour Laboratory Press, New York.

BUKRINSKY, M. I., HAGGERTY, S., DEMPSEY, M. P., SHAROVA, N., ADZHUBEL, A., SPITZ, L., LEWIS, P., GOLDFARB, D., EMERMAN, M. & STEVENSON, M. (1993). A nuclear localization signal within HIV-1 matrix protein that governs infection of non-dividing cells. Nature 365, 666-9.

CALLAHAN, R. (1988). Two families of endogenous human retroviral genomes. Banbury Rep. 30, 91-100.

CANN, A. J. & CHEN, I. S. Y. (1996). Human T-Cell Leukemia virus Types I and II. In Fields Virology (ed. B. N. Fields, D. M. Knipe and P. M. Howley). Lippincott-Raven Publishers, Philadelphia.

CARPENTER, M. A., BROWN, E. W., CULVER, M., JOHNSON, W. E., PECON-SLATTERY, J., BROUSSET, D. & O'BRIEN, S. J. (1996). Genetic and phylogenetic divergence of feline immunodeficiency virus in the puma (Puma concolor). Journal of Virology 70, 6682-6693.

CARPENTER, M. A., BROWN, E. W., MACDONALD, D. W. & O'BRIEN, S. J. (1998). Phylogeographic patterns of feline immunodeficiency virus genetic diversity in the domestic cat. Virology 251, 234-243.

CAVAZZANA-CALVO, M., HACEIN-BEY, S., BASILE, C. D., GROSS, F., YVON, E., NUSBAUM, P., SELZ, F., HUE, C., CERTAIN, S., CASANOVA, J. L., BOUSSO, P., LE DEIST, F. & FISCHER, A. (2000). Gene therapy of human severe combined immunodeficiency (SCID)-X1 disease. Science 288, 669-672.

CHAJUT, A., SARID, R., YANIV, A., SMYTHERS, G. W., TRONICK, S. R. & GAZIT, A. (1992). The Lymphoproliferative Disease Virus of Turkeys Represents a Distinct Class of Avian Type-C Retrovirus. Gene 122, 349-354.

CHARLESWORTH, B. (1987). The Population Biology of Transposable Elements. Trends in Ecology & Evolution 2, 21-23.

CHARLESWORTH, B. & CHARLESWORTH, D. (1983). The population dynamics of transposable elements. Genet. Res. 42, 1-27.

CHARLESWORTH, B. & LANGLEY, C. H. (1986). The Evolution of Self-Regulated Transposition of Transposable Elements. Genetics 112, 359-383.

CHECK, E. (2003a). Second cancer case halts gene therapy trials. Nature 421, 305.

CHECK, E. (2003b). Cancer fears cast doubts on future of gene therapy. Nature 421, 678.

CHEN, X. Y., CHAMORRO, M., LEE, S. I., SHEN, L. X., HINES, J. V., TINOCO, I. & VARMUS, H. E. (1995). Structural-Studies and Functional-Studies of Retroviral RNA Pseudoknots Involved in Ribosomal Frameshifting - Nucleotides At the Junction of the 2 Stems Are Important For Efficient Ribosomal Frameshifting. EMBO Journal 14, 842-852.

CHIU, I. M., CALLAHAN, R., TRONICK, S. R., SCHLOM, J. & AARONSON, S. A. (1984). Major Pol Gene Progenitors in the Evolution of Oncoviruses. Science 223, 364-370.

CHOPRA, H. C. & MASON, M. M. (1970). A new virus in a spontaneous mammary tumour of a rhesus monkey. Cancer Res. 30, 2081.

CIANCIOLO, G. J., BOGERD, H. & SNYDERMAN, R. (1986). Human T-Cell Lymphotropic Virus (HTLV) Envelope-Related Peptides Inhibit Human-Lymphocyte Proliferative Responses. Clinical Research 34, A492-A492.

COFFIN, J. M. (1993). Reverse transcription and evolution. In Reverse Transcriptase (ed. A. M. Skalka and S. P. Goff), pp. 445-479. Cold Spring Harbour Laboratory Press, New York.

COFFIN, J. M. (1995). HIV population dynamics in vivo: implications for genetic variation, pathogenesis and therapy. Science 267, 483-489.

COFFIN, J. M., CHAMPION, M. A. & CHABOT, F. (1978). Nucleotide sequence relationships between the genomes of an endogenous and exogenous avian tumour virus. J. Virol. 28, 972-991.

CONNOR, R. I., CHEN, B. K., CHOE, S. & LANDAU, N. R. (1995). Vpr is required for efficient replication of human immunodeficiency virus type-1 in mononuclear phagocytes. Virology 206, 935-44.

CONRAD, B., WEISSMAHR, R. N., BONI, J., ARCARI, R., SCHUPBACH, J. & MACH, B. (1997). A human endogenous retroviral superantigen as candidate autoimmune gene in type I diabetes. Cell 90, 303-313.

CONSORTIUM, I. H. G. S. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.

CONSORTIUM, M. G. S. (2002). Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-562.

COOK, J. M. & TRISTEM, M. (1997). 'SINEs of the times' - Transposable Elements as Clade Markers for Their Hosts. TREE 12, 295-297.

COOPER, A., LALUEZA-FOX, C., ANDERSON, S., RAMBAUT, A., AUSTIN, J. & WARD, R. (2001). Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution. Nature 409, 704-7.

COUSENS, C., MINGUIJON, E., DALZIEL, R. G., ORTIN, A., GARCIA, M., PARK, J., GONZALEZ, L., SHARP, J. M. & DE LAS HERAS, M. (1999). Complete sequence of enzootic nasal tumor virus, a retrovirus associated with transmissible intranasal tumors of sheep. Journal of Virology 73, 3986-3993.

CRANDALL, K. A. (1996). Multiple interspecies transmissions of human and simian T-cell leukemia/lymphoma virus type I sequences. Mol. Biol. Evol., 115-131.

CRICK, F. H. C. (1958). On Protein Synthesis. Symp. Soc. Exp. Biol. 12, 137.

CULLEN, B. R. (1998). Posttranscriptional regulation by the HIV-1 Rev protein. Seminars in Virology 8, 327-334.

DALTON, A. J., POTTER, M. & MERWIN, R. M. (1961). Some ultrastructural characteristics of a series of primary and transplanted plasma-cell tumours of the mouse. J. Natl. Cancer Inst. 26, 1221-1267.

DANIEL, M. D., LETVIN, N. L. & KING, N. W. (1985). Isolation of T-cell tropic HTLV-III-like retroviruses from macaques. Science 228, 1201-1204.

DARLIX, J. L. (1991). Structure and Variability of Human Retrovirus HIV-1. Bulletin de l'Institut Pasteur 89, 211-242.

DESROSIERS, R. C. (1990). The simian immunodeficiency viruses. Annu. Rev. Immunol. 8, 557-578.

DIMCHEFF, D. E., DROVETSKI, S. V., KRISHNAN, M. & MINDELL, D. P. (2000). Cospeciation and horizontal transmission of avian sarcoma and leukosis virus gag genes in galliform birds. Journal of Virology 74, 3984-3995.

DIMCHEFF, D. E., KRISHNAN, M. & MINDELL, D. P. (2001). Evolution and characterization of tetraonine endogenous retrovirus: A new virus related to avian sarcoma and leukosis viruses. Journal of Virology 75, 2002-2009.

DOMINGO, E. (2002). Quasispecies theory in virology. Journal of Virology 76, 463-465.

DOOLITTLE, R. F., FENG, D. F., JOHNSON, M. S. & MCCLURE, M. A. (1989). Origins and evolutionary relationships of retroviruses. Q. Rev. Biol. 64, 1-30.

DOUGHERTY, R. M., DI STEFANO, H. S. & ROTH, F. K. (1967). Virus particles and viral antigens in chicken tissues free of infectious avian leukosis virus. Proc. Natl. Acad. Sci. 58, 808-817.

DUESBERG, P. H. & VOGT, P. K. (1970). Differences between the ribonucleic acids of transforming and non-transforming avian tumour viruses. Proc. Natl. Acad. Sci. U.S.A. 67, 1673-1680.

DUNWIDDIE, C. T., RESNICK, R., BOYCE-JACINO, M., ALEGRE, J. N. & FARAS, A. J. (1986). Molecular cloning and characterisation of gag-, pol, and env-related sequences in ev- chicken. J. Virol. 59, 669-675.

EGBERINK, H. & HORZINEK, M. C. (1992). Animal immunodeficiency viruses. Vet Microbiol 33, 311-31.

EICKBUSH, T. H. (1994). Origin and evolutionary relationships of retroelements. In The evolutionary biology of viruses (ed. S. S. Morse). Raven Press, New York.

ELLERMAN, R. N. & BANG, O. (1908). Experimentelle Leukämie bei Hühnern. Zentralbl. Bakteriol. Parasitenkd. Infectionskr. Hyg. Abt. Orig. 46, 595-609.

EMINI, E. A. & FAN, H. Y. (1997). Immunological and Pharmacological Approaches to the Control of Retroviral Infections. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus). Cold Spring Harbour Laboratory Press, New York.

FAUCI, A. S. & DESROSIERS, R. C. (1997). Pathogenesis of HIV and SIV. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus). Cold Spring Harbour Laboratory Press, New York.

FENNELLY, J., HARPER, K., LAVAL, S., WRIGHT, E. & PLUMB, M. (1996). Co-amplification of tail-to-tail copies of MuRVY and TAPE retroviral genomes on the Mus musculus Y chromosome. Mammalian Genome 7, 31-36.

FEUER, G., TAKETO, M., HANECAK, R. C. & FAN, H. (1989). Two blocks in Moloney murine leukemia virus expression in undifferentiated F9 embryonal carcinoma cells as determined by transient expression assays. J. Virol. 63, 2317-2324.

FOLEY, B. T. (2000). An overview of the molecular phylogeny of lentiviruses. In HIV Sequence Compendium (ed. C. Kuiken, F. McCutchan, B. Foley, J. W. Mellors, B. Hahn, J. Mullins, P. Marx and S. Wolinsky). Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, U.S.A.

FRISBY, D. P., WEISS, R. A., ROUSELL, M. & STEHELIN, D. (1979). The distribution of endogenous retroviral sequences in the DNA of galliforme birds does not coincide with avian phylogenetic relationships. Cell 17.

GALLAY, P., HOPE, T., CHIN, D. & TRONO, D. (1997). HIV-1 infection of nondividing cells through the recognition of integrase by the importin/karyopherin pathway. Proc Natl Acad Sci USA 94, 9825-30.

GALLO, R. C., SALAHUDDIN, S. Z., POPOVIC, M., SHEARER, G. M., KAPLAN, M., HAYNES, B. F., PALKER, T. J., REDFIELD, R., OLESKE, J., SAFAI, B., WHITE, G., FOSTER, P. & MARKHAM, P. D. (1984). Frequent Detection and Isolation of Cytopathic Retroviruses (HTLV-III) From Patients With AIDS and At Risk For AIDS. Science 224, 500-503.

GAO, F., BAILES, E., ROBERTSON, D. L., CHEN, Y. L., RODENBURG, C. M., MICHAEL, S. F., CUMMINS, L. B., ARTHUR, L. O., PEETERS, M., SHAW, G. M., SHARP, P. M. & HAHN, B. H. (1999). Origin of HIV-1 in the chimpanzee Pan troglodytes troglodytes. Nature 397, 436-441.

GAO, F., YUE, L., WHITE, A. T., PAPPAS, P. G., BARCHUE, J., HANSON, A. P., GREENE, B. M., SHARP, P. M., SHAW, G. M. & HAHN, B. H. (1992). Human Infection By Genetically Diverse SIVsm-Related HIV-2 in West Africa. Nature 358, 495-499.

GARDNER, M. B., ENDRES, M. & BARRY, P. (1994). The Simian Retroviruses: SIV and SRV. In The Retroviridae, vol. 3 (ed. J. A. Levy), pp. 133-236. Plenum Press, New York.

GELDERBLOM, H. (1990). Morphogenesis, maturation and fine structure of lentiviruses. In Retroviral proteases: Maturation and morphogenesis (ed. L. H. Pearl). Stockton Press, New York.

GRANT, R. F., WINDSOR, S. K., MALINAK, C. J., BARTZ, C. R., SABO, A., BENVENISTE, R. E. & TSAI, C. C. (1995). Characterization of Infectious Type-D Retrovirus From Baboons. Virology 207, 292-296.

GREATOREX, J. & LEVER, A. (1998). Retroviral RNA dimer linkage. J Gen Virol 79 (Pt 12), 2877-82.

HANAFUSA, T., HANAFUSA, H., METROKA, C. E., HAYWARD, W. S., RETTENMIER, C. W., SAWYER, R. C., DOUGHERTY, R. M. & DI STEFANO, H. S. (1976). Pheasant virus: New class of ribodeoxyvirus. Proceedings of the National Academy of Sciences, USA 58.

HARADA, F., PETERS, G. G. & DAHLBERG, J. E. (1979). The primer tRNA for Moloney murine leukemia virus DNA synthesis: Nucleotide sequence and aminoacylation of tRNAPro. J. Biol. Chem. 254, 10979-10985.

HARADA, F., SAWYER, R. C. & DAHLBERG, J. E. (1975). A primer ribonucleic acid for initiation of in vitro Rous sarcoma virus deoxyribonucleic acid synthesis. J. Biol. Chem. 250, 3487-3497.

HARRISON, G. P., MIELE, G., HUNTER, E. & LEVER, A. M. (1998). Functional analysis of the core human immunodeficiency virus type 1 packaging signal in a permissive cell line. J Virol 72, 5886-96.

HARVEY, P. H., MAY, R. M. & NEE, S. (1994). Phylogenies Without Fossils. Evolution 48, 523-529.

HECHT, S. J., STEDMAN, K. E., CARLSON, J. O. & DEMARTINI, J. C. (1996). Distribution of endogenous type B and type D sheep retrovirus sequences in ungulates and other mammals. Proceedings of the National Academy of Sciences of the United States of America 93, 3297-3302.

HEDGES, S. B., PARKER, P. H., SIBLEY, C. G. & KUMAR, S. (1996). Continental breakup and the ordinal diversification of birds and mammals. Nature 381, 226-229.

HERNIOU, E., MARTIN, J., MILLER, K., COOK, J., WILKINSON, M. & TRISTEM, M. (1998). Retroviral diversity and distribution in vertebrates. J. Virol. 72, 5955-5966.

HICKEY, D. A. (1993). Evolutionary dynamics of transposable elements in prokaryotes and eukaryotes. In Transposable elements and evolution (ed. J. F. McDonald). Kluwer Academic Publishers, Rotterdam.

HIROSE, Y., TAKAMATSU, M. & HARADA, F. (1993). Presence of Env Genes in Members of the RTVL-H Family of Human Endogenous Retrovirus-Like Elements. Virology 192, 52-61.

HOLLAND, J. (1995). Hidden order: how adaptation builds complexity. Addison-Wesley, Reading, Mass.

HOLMES, E. C. (2001). On the origin and evolution of the human immunodeficiency virus (HIV). Biological Reviews 76, 239-254.

HOLZSCHU, D. L., MARTINEAU, D., FODOR, S. K., VOGT, V. M., BOWSER, P. R. & CASEY, J. W. (1995). Nucleotide sequence and protein analysis of a complex piscine retrovirus, walleye dermal sarcoma virus. J. Virol. 69.

HRUSKOVA-HEIDINGSFELDOVA, O., ANDREANSKY, M., FABRY, M., BLAHA, I., STROP, P. & HUNTER, E. (1995). Cloning, bacterial expression and characterization of the Mason-Pfizer monkey virus proteinase. J. Biol. Chem. 270, 15053-15058.

HU, W.-S. & TEMIN, H. M. (1990). Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination. Proc. Natl. Acad. Sci. U.S.A. 87, 1556.

HUDER, J. B., BONI, J., HATT, J.-M., SOLDATI, G., LUTZ, H. & SCHUPBACH, J. (2002). Identification and characterisation of two closely related unclassifiable endogenous retroviruses in pythons (Python molurus and Python curtus). J. Virol. 76, 7607-7615.

HUGHES, J. F. & COFFIN, J. M. (2001). Evidence for genomic rearrangements mediated by human endogenous retroviruses during primate evolution. Nature Genetics 29, 487-489.
HUNTER, E. (1997). Viral Entry and Receptors. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 71-120. Cold Spring Harbour Laboratory Press, New York.
ILYINSKII, P. O., DANIEL, M., LERCHE, N. & DESROSIERS, R. C. (1992). Antibodies to type D retrovirus in talapoin monkeys. J. Gen. Virol. 72, 453.
IMAI, S., OKUMOTO, M., IWAI, S., HAGA, N., MORI, N., MIYASHITA, N., MORIWAKI, K., HILGERS, J. & SARKER, N. H. (1994). Distribution of mouse mammary tumour virus in Asian wild mice. J. Virol. 68, 3437-3442.
JAMES, F. (1990). A Review of Pseudorandom Number Generators. Computer Physics Communications 60, 329-344.
JIANG, M., MAK, J., LADHA, A., COHEN, E., KLEIN, M., ROVINSKI, B. & KLEIMAN, L. (1993). Identification of Transfer-RNAs Incorporated Into Wild-Type and Mutant Human-Immunodeficiency-Virus Type-1. Journal of Virology 67, 3246-3253.
JOAG, S. V., STEPHENS, E. B. & NARAYAN, O. (1996). Lentiviruses. In Fields Virology (ed. B. N. Fields, D. M. Knipe and P. M. Howley). Lippincott-Raven Publishers, Philadelphia.
JOHNSON, W. E. & COFFIN, J. M. (1999). Constructing primate phylogenies from ancient retrovirus sequences. Proceedings of the National Academy of Sciences of the United States of America 96, 10254-10260.
JONES, T. W. H. (1985). Sheep Pulmonary Adenomatosis (Jaagsiekte). Veterinary Record 117, 210-210.
KABAT, P., TRISTEM, M., OPAVSKY, R. & PASTOREK, J. (1996). Human endogenous retrovirus HC2 is a new member of the S71 retroviral subgroup with a full-length pol gene. Virology 226, 83-94.
KABAY, M. & ELLIS, D. (1996). Reproducing Ostrich Fading Syndrome. Rural Industries Research and Development Corporation, Perth.

KATZOURAKIS, A. & TRISTEM, M. (in press). Phylogeny of human endogenous and exogenous retroviruses. In Retroviruses and Primate Evolution (ed. E. D. Sverdlov).
KELSEY, C. R., CRANDALL, K. A. & VOEVODIN, A. F. (1999). Different models, different trees: the geographic origin of PTLV-I. Mol. Phylogenet. Evol. 13, 336-347.
KIDWELL, M. G. & LISCH, D. (1997). Transposable elements as sources of variation in animals and plants. Proc. Natl. Acad. Sci. USA 94, 7704-7711.
KIDWELL, M. G. & LISCH, D. R. (2000). Transposable elements and host genome evolution. TREE 15, 95-99.
KIM, A., TERZIAN, C., SANTAMARIA, P., PELISSON, A., PRUDHOMME, N. & BUCHETON, A. (1994). Retroviruses in Invertebrates - the Gypsy Retrotransposon Is Apparently an Infectious Retrovirus of Drosophila-Melanogaster. Proceedings of the National Academy of Sciences of the United States of America 91, 1285-1289.
KIMURA, M. (1980). The neutral theory of molecular evolution. Scientific American 241, 624-626.
KORBER, B., MULDOON, M., THEILER, J., GAO, F., GUPTA, R., LAPEDES, A., HAHN, B. H., WOLINSKY, S. & BHATTACHARYA, T. (2000). Timing the ancestor of the HIV-1 pandemic strains. Science 288, 1789-1796.
KOZAK, C., PETERS, G., PAULEY, R., MORRIS, V., MICHALIDES, R., DUDLEY, J., GREEN, M., DAVISSON, M., PRAKASH, O., VAIDYA, A., HILGERS, J., VERSTRAETEN, A., HYNES, N., DIGGELMANN, H., PETERSON, D., COHEN, J. C., DICKSON, C., SARKAR, N., NUSSE, R., VARMUS, H. & CALLAHAN, R. (1987). A Standardized Nomenclature For Endogenous Mouse Mammary-Tumor Viruses. Journal of Virology 61, 1651-1654.
KUFF, E. L., FEENSTRA, A., LUEDERS, K., SMITH, L., HAWLEY, R., HOZUMI, N. & SHULMAN, M. (1983). Intracisternal A-Particle Genes As Movable Elements in the Mouse Genome. Proceedings of the National Academy of Sciences of the United States of America-Biological Sciences 80, 1992-1996.
KUFF, E. L. & LUEDERS, K. K. (1988). The intracisternal A-particle family: Structure and functional aspects. Adv. Cancer Res. 51, 184-276.

KUFF, E. L., SMITH, L. A. & LUEDERS, K. K. (1981). Intracisternal A-Particle Genes in Mus-Musculus - a Conserved Family of Retrovirus-Like Elements. Molecular and Cellular Biology 1, 216-227.
KUFF, E. L., WIVEL, N. A. & LUEDERS, K. K. (1968). The extraction of intracisternal A-particles from a mouse plasma-cell tumour. Cancer Res. 28.
KULSKI, J. K., GAUDIERI, S., BELLGARD, M., BALMER, L., GILES, K., INOKO, H. & DAWKINS, R. L. (1997). The evolution of MHC diversity by segmental duplication and transposition of retroelements. Journal of Molecular Evolution 45, 599-609.
KURDYUKOV, S. G., LEBEDEV, Y. B., ARTAMONOVA, I. I., GORODENTSEVA, T. N., BATRAK, A. V., MAMEDOV, I. Z., AZHIKINA, T. L., LEGCHILINA, S. P., EFIMENKO, I. G., GARDINER, K. & SVERDLOV, E. D. (2001). Full-sized HERV-K (HML-2) human endogenous retroviral LTR sequences on human chromosome 21: map locations and evolutionary history. Gene 273, 51-61.
LANDER, E. S., LINTON, L. M., BIRREN, B., NUSBAUM, C., ZODY, M. C., BALDWIN, J., DEVON, K., DEWAR, K., DOYLE, M., FITZHUGH, W., FUNKE, R., GAGE, D., HARRIS, K., HEAFORD, A., HOWLAND, J., KANN, L., LEHOCZKY, J., LEVINE, R., MCEWAN, P., MCKERNAN, K., MELDRIM, J., MESIROV, J. P., MIRANDA, C., MORRIS, W., NAYLOR, J., RAYMOND, C., ROSETTI, M., SANTOS, R., SHERIDAN, A., SOUGNEZ, C., STANGE-THOMANN, N., STOJANOVIC, N., SUBRAMANIAN, A., WYMAN, D., ROGERS, J., SULSTON, J., AINSCOUGH, R., BECK, S., BENTLEY, D., BURTON, J., CLEE, C., CARTER, N., COULSON, A., DEADMAN, R., DELOUKAS, P., DUNHAM, A., DUNHAM, I., DURBIN, R., FRENCH, L., GRAFHAM, D., GREGORY, S., HUBBARD, T., HUMPHRAY, S., HUNT, A., JONES, M., LLOYD, C., MCMURRAY, A., MATTHEWS, L., MERCER, S., MILNE, S., MULLIKIN, J. C., MUNGALL, A., PLUMB, R., ROSS, M., SHOWNKEEN, R., SIMS, S., WATERSTON, R. H., WILSON, R. K., HILLIER, L. W., MCPHERSON, J. D., MARRA, M. A., MARDIS, E. R., FULTON, L. A., CHINWALLA, A. T., PEPIN, K. H., GISH, W. R., CHISSOE, S. L., WENDL, M. C., DELEHAUNTY, K. D., MINER, T. L., DELEHAUNTY, A., KRAMER, J. B., COOK, L. L., FULTON, R. S., JOHNSON, D. L., MINX, P. J., CLIFTON, S. W., HAWKINS, T., BRANSCOMB, E., PREDKI, P., RICHARDSON, P., WENNING, S., SLEZAK, T., DOGGETT, N., CHENG, J. F., OLSEN, A., LUCAS, S., ELKIN, C., UBERBACHER, E., FRAZIER, M., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.
LANGLEY, C. H., BROOKFIELD, J. F. Y. & KAPLAN, N. (1983). Transposable elements in Mendelian populations. I. A theory. Genetics 104, 457-471.
LAPATSCHEK, M., DURR, S., LOWER, R., MAGIN, C., WAGNER, H. & MIETHKE, T. (2000). Functional analysis of the env open reading frame in human endogenous retrovirus IDDMK(1,2)22 encoding superantigen activity. Journal of Virology 74, 6386-6393.
LAPIERRE, L. A., HOLZSCHU, D. L., BOWSER, P. R. & CASEY, J. W. (1999). Sequence and transcriptional analyses of fish retroviruses Walleye Epidermal Hyperplasia Virus types 1 and 2: Evidence for a gene duplication. J. Virol. 73, 9393-9403.
LEAR, A. L., HADDRICK, M. & HEAPHY, S. (1995). A study of the dimerization of Rous sarcoma virus RNA in vitro and in vivo. Virology 212, 47-57.
LEVY, J. A. (1978). Xenotropic type C viruses. Curr. Top. Microbiol. Immunol. 79, 109-213.
LEVY, J. A., HOFFMAN, A. D., KRAMER, S. M., LANDIS, J. A. & SHIMABUKURO, J. M. (1984). Isolation of Lymphocytopathic Retroviruses From San-Francisco Patients With Aids. Science 225, 840-842.
LEWIS, P. & EMERMAN, M. (1994). Passage through mitosis is required for oncoretroviruses but not for human immunodeficiency virus. J. Virol., 510-516.
LORINCZ, M. C., SCHUBELER, D. & GROUDINE, M. (2001). Methylation-mediated proviral silencing is associated with MeCP2 recruitment and localised histone H3 deacetylation. Mol. Cell. Biol. 21, 7913-7922.
LOTLIKAR, M. S. & LIPSON, S. M. (2002). Survival of spumavirus, a primate retrovirus, in laboratory media and water. FEMS Microbiol. Lett. 211, 207-211.
LOWER, R., LOWER, J. & KURTH, R. (1996). The viruses in all of us: Characteristics and biological significance of human endogenous retrovirus sequences. Proceedings of the National Academy of Sciences of the United States of America 93, 5177-5184.
LOZOVSKAYA, E. R., HARTL, D. L. & PETROV, D. A. (1995). Genomic regulation of transposable elements in Drosophila. Curr. Opin. Genet. Dev. 5, 768-73.

LUEDERS, K. K. & KUFF, E. L. (1980). Intracisternal A particle genes: Identification in the genome of Mus musculus and comparison of multiple isolates from a mouse gene library. Proc. Natl. Acad. Sci. U.S.A. 77, 3571-3575.
LUEDERS, K. K. & KUFF, E. L. (1981). Sequences Homologous to Retrovirus-Like Genes of the Mouse Are Present in Multiple Copies in the Syrian-Hamster Genome. Nucleic Acids Research 9, 5917-5930.
LUEDERS, K. K., FRANKEL, W. N., MIETZ, J. A. & KUFF, E. L. (1993). Genomic Mapping of Intracisternal A-Particle Proviral Elements. Mammalian Genome 4, 69-77.
LUEDERS, K. K. & KUFF, E. L. (1983). Comparison of the Sequence Organization of Related Retrovirus-Like Multigene Families in 3 Evolutionarily Distant Rodent Genomes. Nucleic Acids Research 11, 4391-4408.
LUNGER, P. D., HARDY, W. D. J. & CLARK, H. F. (1974). C-type viral particles in a reptilian tumour. J. Natl. Cancer. Inst. 52, 1231.
LYNCH, C. (2000). Evolution of the vertebrate LTR-retrotransposons, Imperial College of Science Technology and Medicine.
MAGER, D. L. & FREEMAN, J. D. (1995). HERV-H endogenous retroviruses - presence in the New World branch but amplification in the Old World primate lineage. Virology 213, 395-404.
MAGER, D. L. & FREEMAN, J. D. (2000). Novel mouse type D endogenous proviruses and ETn elements share long terminal repeat and internal sequences. J. Virol. 74, 7221-7229.
MAJORS, J. E. & VARMUS, H. E. (1981). Nucleotide sequences at host-proviral junctions for mouse mammary tumour virus. Nature 289.
MAMEDOV, I., BATRAK, A., BUZDIN, A., ARZUMANYAN, E., LEBEDEV, Y. & SVERDLOV, E. D. (2002). Genome-wide comparison of differences in the integration sites of interspersed repeats between closely related genomes. Nucleic Acids Research 30, art. no.-e71.
MANIATIS, T., SAMBROOK, J. & FRITSCH, E. (1989). Molecular Cloning: Laboratory Manual. Cold Spring Harbour Laboratory Press, New York.

MANN, R., MULLIGAN, R. C. & BALTIMORE, D. (1983). Construction of a retrovirus packaging mutant and its use to produce helper-free defective retrovirus. Cell 33, 153-9.
MARLOR, R., PARKHURST, S. & CORCES, V. (1986). The Drosophila melanogaster gypsy transposable element encodes putative gene products homologous to retroviral proteins. Mol. Cell Biol. 6, 1129-1134.
MARTIN, J. (1999b). The phylogeny and evolution of murine-leukemia-related and other retroviruses, Imperial College of Science Technology and Medicine.
MARTIN, J., HERNIOU, E., COOK, J., WAUGH O'NEILL, R. & TRISTEM, M. (1997). Human endogenous retrovirus type I-related viruses have an apparently widespread distribution within vertebrates. J. Virol. 71, 437-443.
MARTIN, J., HERNIOU, E., COOK, J., WAUGH O'NEILL, R. & TRISTEM, M. (1999a). Interclass transmission and phyletic host tracking in murine leukaemia virus related retroviruses. J. Virol. 73, 2442-2449.
MARTIN, J., KABAT, P., HERNIOU, E. & TRISTEM, M. (2002). Characterization and complete nucleotide sequence of an unusual reptilian retrovirus recovered from the order Crocodylia. Journal of Virology 76, 4651-4654.
MARTIN, J. & TRISTEM, M. (2000). Cospeciation and horizontal transmission rates in the murine leukaemia-related retroviruses. In Phylogeny, cospeciation and coevolution (ed. R. D. M. Page). University of Chicago, Chicago.
MARTIN, R. D. (1993). Primate origins: plugging the gaps. Nature 363, 223-234.
MAURY, W. (1998). Regulation of equine infectious anemia virus expression. Journal of Biomedical Science 5, 11-23.
MAYER, J., MEESE, E. & MUELLER-LANTZSCH, N. (1998). Human endogenous retrovirus K homologous sequences and their coding capacity in old world primates. Journal of Virology 72, 1870-1875.
MAYNARD-SMITH, J. & HAIGH, J. (1976). The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23-35.
MCCONNELL, S. (1993). Code Complete. Microsoft Press, Washington.

MCDOUGALL, J. S., BIGGS, P. M., SHILLETO, R. W. & MILNE, B. S. (1978). Lymphoproliferative disease virus of turkeys. Experimental transmission and aetiology. Avian Pathology 7, 141.
MEDSTRAND, P. & MAGER, D. L. (1998). Human-specific integrations of the HERV-K endogenous retrovirus family. Journal of Virology 72, 9782-9787.
MEDSTRAND, P., MAGER, D. L., YIN, H., DIETRICH, U. & BLOMBERG, J. (1997). Structure and genomic organization of a novel human endogenous retrovirus family: HERV-K (HML-6). Journal of General Virology 78, 1731-1744.
MEERTENS, L., MAHIEUX, R., MAUCLERE, P., LEWIS, J. & GESSAIN, A. (2002). Complete sequence of a novel highly divergent simian T-cell lymphotropic virus from wild-caught red-capped mangabeys (Cercocebus torquatus) from Cameroon: A new primate T-lymphotropic virus type 3 subtype. Journal of Virology 76, 259-268.
MEISLER, M. H. & TING, C. N. (1993). The remarkable evolutionary history of the human salivary amylase genes. Crit. Rev. Oral. Biol. Med. 4, 503-509.
MIETZ, J. A., GROSSMAN, Z., LUEDERS, K. K. & KUFF, E. L. (1987). Nucleotide sequence of a complete mouse intracisternal A-particle genome: relationship to known aspects of particle assembly and function. J Virol 61, 3020-9.
MILLER, A. D. (1997). Development and Applications of Retroviral Vectors. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 437-474. Cold Spring Harbour Laboratory Press, New York.
MINCHIOTTI, G. & DI NOCERA, P. P. (1991). Convergent transcription initiates from oppositely oriented promoters within the 5' end region of Drosophila melanogaster F elements. Mol. Cell. Biol. 11, 5171-5180.
MIZROKHI, L. J., GEORGIEVA, S. G. & ILYIN, Y. V. (1988). Jockey, a mobile element similar to mammalian LINEs, is transcribed from the internal promoter by RNA polymerase II. Cell 54, 685-691.
MONTGOMERY, E. A. & LANGLEY, C. H. (1983). Transposable elements in Mendelian populations. II. Distribution of three copia-like elements in a natural population of Drosophila melanogaster. Genetics 104, 473-483.

MOORE, G. E. (1987). Confirmation of a Retrovirus in a B95-8 Cell-Line. In Vitro Cellular & Developmental Biology 23, 153-153.
MOORE, R., DIXON, M., SMITH, R. E., PETERS, G. & DICKSON, C. (1987). Complete nucleotide sequence of a milk-transmitted mammary tumour virus: Two frameshift suppression events are required for translation of gag and pol. J. Virol. 61, 480-490.
MORSE, S. S. & SCHLUEDERBERG, A. (1990). Emerging viruses: the evolution of viruses and viral disease. J. NIH Res. 6, 52-56.
MULLER, H. J. (1964). The relation of recombination to mutational advance. Mutat. Res. 1, 2-9.
NAGY, K., YOUNG, M., BABOONIAN, C., MERSON, J., WHITTLE, P. & OROSZLAN, S. (1994). Antiviral Activity of Human-Immunodeficiency-Virus Type-1 Protease Inhibitors in a Single-Cycle of Infection - Evidence For a Role of Protease in the Early Phase. Journal of Virology 68, 757-765.
NANDI, J. S., BHAVALKAR-POTDAR, V., TIKUTE, S. & RAUT, C. G. (2000). A novel type D simian retrovirus naturally infecting the Indian Hanuman Langur (Semnopithecus entellus). Virology 277, 6-13.
NANDI, S. & MCGRATH, C. M. (1973). Mammary neoplasia in mice. Adv. Cancer Res. 17, 353-414.
NEE, S., HOLMES, E. C., MAY, R. M. & HARVEY, P. H. (1994). Extinction Rates Can Be Estimated From Molecular Phylogenies. Philosophical Transactions of the Royal Society of London Series B-Biological Sciences 344, 77-82.
ODA, T., IKEDA, S., WATANABE, S., HATSUSHIKA, M., AKIYAMA, K. & MITSUNOBU, F. (1988). Molecular-Cloning, Complete Nucleotide-Sequence, and Gene Structure of the Provirus Genome of a Retrovirus Produced in a Human-Lymphoblastoid Cell-Line. Virology 167, 468-476.
O'NEILL, R. J. W., O'NEILL, M. J. & GRAVES, M. J. (1998). Undermethylation associated with retroelement activation and chromosome remodelling in an interspecific mammalian hybrid. Nature 393, 68-72.

ONO, M. (1986). Molecular-Cloning and Long Terminal Repeat Sequences of Human Endogenous Retrovirus Genes Related to Type-A and Type-B Retrovirus Genes. Journal of Virology 58, 937-944.
ONO, M. & OHISHI, H. (1983). Long terminal repeat sequences of intracisternal A particle genes in the Syrian hamster genome: identification of tRNAPhe as a putative primer tRNA. Nucleic Acids Res 11, 7169-79.
ONO, M., TOH, H., MIYATA, T. & AWAYA, T. (1985). Nucleotide sequence of the Syrian hamster intracisternal A-particle gene: close evolutionary relationship of type A particle gene to types B and D oncovirus genes. J Virol 55, 387-94.
ORGEL, L. E. & CRICK, F. H. C. (1980). Selfish DNA: the ultimate parasite. Nature 284, 604-607.
PACES, J., PAVLICEK, A. & PACES, V. (2002). HERVd: database of human endogenous retroviruses. Nucleic Acids Res. 30, 205-206.
PADOW, M., LAI, L. L., FISHER, R. J., ZHOU, Y. C., WU, X. Y., KAPPES, J. C. & TOWLER, E. M. (2000). Analysis of human immunodeficiency virus type 1 containing HERV-K protease. Aids Research and Human Retroviruses 16, 1973-1980.
PAGE, R. D. M. & HOLMES, E. C. (1998). Molecular evolution: A phylogenetic approach. Blackwell, Oxford.
PALMARINI, M., COUSENS, C., DALZIEL, R. G., BAI, J., STEDMAN, K., DEMARTINI, J. C. & SHARP, J. M. (1996). The exogenous form of jaagsiekte retrovirus is specifically associated with a contagious lung cancer of sheep. Journal of Virology 70, 1618-1623.
PATIENCE, C., SWITZER, W. M., TAKEUCHI, Y., GRIFFITHS, D. J., GOWARD, M. E., HENEINE, W., STOYE, J. P. & WEISS, R. A. (2001). Multiple groups of novel retroviral genomes in pigs and related species. Journal of Virology 75, 2771-2775.
PAYNE, L. N. (1992). Biology of Avian Retroviruses. In The Retroviridae, vol. 1 (ed. J. A. Levy), pp. 299-376. Plenum Press, New York.
PERRINS, C. M. & MIDDLETON, A. L. A. (1998). The Encyclopedia of Birds. Facts on File Inc., New York.

PETERS, G. G. & GLOVER, C. (1980). tRNAs and priming of RNA directed DNA synthesis in mouse mammary tumour virus. J. Virol. 35, 31-40.
PETERSON-BURCH, B. D., WRIGHT, D. A., LATEN, H. M. & VOYTAS, D. F. (2000). Retroviruses in plants? TIG 16, 151-152.
PETROPOULOS, C. (1997). Retroviral Taxonomy, Protein Structures, Sequences, and Genetic Maps (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus).
PLATT, J. L. (2000). Xenotransplantation - New risks, new gains. Nature 407, 27-30.
POIESZ, B. J., RUSCETTI, F. W., REITZ, M. S. & GALLO, R. C. (1981). Isolation of a new type C retrovirus (HTLV) in primary uncultured cells of a patient with Sezary T-cell Leukemia. Nature 294, 268-271.
POWER, M. D., MARX, P. A., BRYANT, M. L., GARDNER, M. B., BARR, P. J. & LUCIW, P. A. (1986). Nucleotide-Sequence of SRV-1, a Type-D Simian Acquired-Immune-Deficiency-Syndrome Retrovirus. Science 231, 1567-1572.
PRESTON, B. D., POIESZ, B. J. & LOEB, L. A. (1988). Fidelity of HIV-1 Reverse Transcriptase. Science 242, 1168-1173.
PRINGLE, C. R. (1999). Virus taxonomy - 1999 - The Universal System of Virus Taxonomy, updated to include the new proposals ratified by the International Committee on Taxonomy of Viruses during 1998. Archives of Virology 144, 421-429.
PRYCIAK, P. M. & VARMUS, H. E. (1992). Nucleosomes, DNA-binding proteins, and DNA sequence modulate retroviral integration target site selection. Cell 69, 769-780.
PULSINELLI, G. A. & TEMIN, H. M. (1994). High-Rate of Mismatch Extension During Reverse Transcription in a Single Round of Retrovirus Replication. Proceedings of the National Academy of Sciences of the United States of America 91, 9490-9494.
PURVIS, A. (1995). A composite estimate of primate phylogeny. Phil. Trans. R. Soc. Lond. B 348, 405-421.
PURVIS, A. (1996). Using interspecific phylogenies to test macroevolutionary hypotheses. In New uses for new phylogenies (ed. P. H. Harvey, A. H. Leigh-Brown, J. Maynard-Smith and S. Nee). Oxford University Press, Oxford.

PYBUS, O. G. & HARVEY, P. H. (2000). Testing macro-evolutionary models using incomplete molecular phylogenies. Proceedings of the Royal Society of London Series B-Biological Sciences 267, 2267-2272.
RABSON, A. B. & GRAVES, B. J. (1997). Synthesis and processing of viral proteins. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 205-262. Cold Spring Harbour Laboratory Press, New York.
RESNICK, R. M., BOYCEJACINO, M. T., FU, Q. & FARAS, A. J. (1990). Phylogenetic Distribution of the Novel Avian Endogenous Provirus Family EAV-0. Journal of Virology 64, 4640-4653.
REUSS, F. U. & SCHALLER, H. C. (1991). cDNA Sequence and Genomic Characterization of Intracisternal A-Particle-Related Retroviral Elements Containing an Envelope Gene. Journal of Virology 65, 5702-5709.
RHEE, S. S. & HUNTER, E. (1990). A Single Amino-Acid Substitution Within the Matrix Protein of a Type-D Retrovirus Converts Its Morphogenesis to That of a Type-C Retrovirus. Cell 63, 77-86.
ROIZMAN, B., HOWLEY, P. M., STRAUS, S. E., MARTIN, M. A., GRIFFIN, D. E., LAMB, R. A. & KNIPE, D. M. (1997). Fields Virology. Plenum Press, New York.
ROSENBERG, N. & JOLICOEUR, P. (1997). Retroviral Pathogenesis. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus). Cold Spring Harbour Laboratory Press, New York.
ROUS, P. (1911). A sarcoma of the fowl transmissible by an agent separable from the tumour cells. J. Exp. Med. 13, 397-411.
RUSHLOW, K., OLSEN, K., STIEGLER, G., PAYNE, S. L., MONTELARO, R. C. & ISSEL, C. J. (1986). Lentivirus genomic organization: the complete nucleotide sequence of the env gene region of equine infectious anemia virus. Virology 155, 309-21.
SACCO, M. A., HOWES, K. & VENUGOPAL, K. (2001). Intact EAV-HP endogenous retrovirus in Sonnerat's jungle fowl. Journal of Virology 75, 2029-2032.
SAGATA, N., YASUNAGA, T., TSUZUKU-KAWAMURA, J., OHISHI, K., OGAWA, Y. & IKAWA, Y. (1985). Complete nucleotide sequence of the genome of bovine leukemia virus: Its evolutionary relationship to other retroviruses. Proc. Natl. Acad. Sci. 82, 677-681.

SALA, M. & WAIN-HOBSON, S. (2000). Are RNA viruses adapting or merely changing? J. Mol. Evol. 51, 12-20.
SALEMI, M., DESMYTER, J. & VANDAMME, A. M. (2000). Tempo and mode of human and simian T-lymphotropic Virus (HTLV/STLV) evolution revealed by analyses of full-genome sequences. Molecular Biology and Evolution 17, 374-386.
SALTARELLI, M., QUERAT, G., KONINGS, D. A. M., VIGNE, R. & CLEMENTS, J. E. (1990). Nucleotide-Sequence and Transcriptional Analysis of Molecular Clones of CAEV Which Generate Infectious Virus. Virology 179, 347-364.
SCHILDT, J. M. (1998). C++ The complete reference. Plenum Press, New York.
SCHULTE, A. M. & WELLSTEIN, A. (1998). Structure and Phylogenetic Analysis of an Endogenous Retrovirus Inserted into the Human Growth Factor Pleiotrophin. J. Virol. 72, 6065-6072.
SERVENAY, M., KUPIEC, J. J., PERIES, J. & EMANOIL-RAVIER, R. (1990). Molecular cloning and characterization of retrovirus-like intracisternal type A particle genes (IAP) present in the Chinese hamster genome. Virus Genes 4, 351-8.
SHERMAN, M. P. & GREENE, W. C. (2002). Slipping through the door: HIV entry into the nucleus. Microbes Infect 4, 67-73.
SHIH, C.-C., STOYE, J. P. & COFFIN, J. M. (1988). Highly preferred targets for retrovirus integration. Cell 53, 531-537.
SHIMAMURA, M., YASUE, H., OHSHIMA, K., ABE, H., KATO, H., KISHIRO, T., GOTO, M., MUNECHIKA, I. & OKADA, N. (1997). Molecular evidence from retroposons that whales form a clade within even-toed ungulates. Nature 388, 666-670.
SHIMOTOHNO, K. & TEMIN, H. M. (1981). Formation of infectious progeny virus after insertion of herpes simplex thymidine kinase gene into DNA of an avian retrovirus. Cell 26, 67-77.
SIGURDSSON, B. (1954). Observations on three slow infections of sheep. Maedi, paratuberculosis, rida, a slow encephalitis of sheep with general remarks on infections which develop slowly and some of their special characteristics. Br. Vet. J. 110, 255-270.
SIMPSON, G. R., PATIENCE, C., LOWER, R., TONJES, R. R., MOORE, H. D. M., WEISS, R. A. & BOYD, M. T. (1996). Endogenous D-Type (HERV-K) Related Sequences Are Packaged into Retroviral Particles in the Placenta and Possess Open Reading Frames for Reverse Transcriptase. Virology 222, 451-456.
SMIT, A. F. A. (1993). Identification of a New, Abundant Superfamily of Mammalian LTR-Transposons. Nucleic Acids Research 21, 1863-1872.
SONG, S. U., GERASIMOVA, T., KURKULOS, M., BOEKE, J. D. & CORCES, V. G. (1994). An Env-like Protein Encoded by a Drosophila Retroelement: Evidence that gypsy is an Infectious Retrovirus. Genes & Dev. 8, 2046-2057.
SONIGO, P., ALIZON, M., STASKUS, K., KLATZMANN, D., COLE, S., DANOS, O., RETZEL, E., TIOLLAIS, P., HAASE, A. & WAINHOBSON, S. (1985). Nucleotide-Sequence of the Visna Lentivirus - Relationship to the AIDS Virus. Cell 42, 369-382.
SONIGO, P., BARKER, C., HUNTER, E. & WAINHOBSON, S. (1986). Nucleotide-Sequence of Mason-Pfizer Monkey Virus - an Immunosuppressive D-Type Retrovirus. Cell 45, 375-385.
SONODA, S., LI, H. C., CARTIER, L., NUNEZ, L. & TAJIMA, K. (2000). Ancient HTLV type 1 provirus DNA of Andean mummy. Aids Research and Human Retroviruses 16, 1753-1756.
STEHELIN, D., VARMUS, H. E., BISHOP, J. M. & VOGT, P. K. (1976). DNA relating to the transforming gene(s) of avian sarcoma viruses present in normal avian DNA. Nature 260, 170-173.
STOYE, J. P. (2001). Endogenous retroviruses: Still active after all these years? Curr. Biol. 11, R914-R916.
SVERDLOV, E. D. (2000). Retroviruses and primate evolution. Bioessays 22, 161-171.
SVOBODA, J., HEJNAR, J., GERYK, J., ELLEDER, D. & VERNEROVA, Z. (2000). Retroviruses in foreign species and the problem of provirus silencing. Gene 261, 181-188.
SWANSTROM, R. & WILLS, J. W. (1997). Synthesis, assembly, and processing of viral proteins. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 205-262. Cold Spring Harbour Laboratory Press, New York.
SWERGOLD, G. D. (1990). Identification, characterisation, and cell specificity of a human LINE-1 promoter. Mol. Cell Biol. 10, 6718-6729.

SWOFFORD, D. L. (1998). PAUP*. Phylogenetic analysis using parsimony (* and other methods). Sinauer Associates, Sunderland, MA.
TABIN, C. J., HOFFMANN, J. W., GOFF, S. P. & WEINBERG, R. A. (1982). Adaptation of a Retrovirus As a Eukaryotic Vector Transmitting the Herpes-Simplex Virus Thymidine Kinase Gene. Molecular and Cellular Biology 2, 426-436.
TALBOTT, R. L., SPARGER, E. E., LOVELACE, K. M., FITCH, W. M., PEDERSEN, N. C., LUCIW, P. A. & ELDER, J. H. (1989). Nucleotide sequence and genomic organization of feline immunodeficiency virus. Proc Natl Acad Sci U S A 86, 5743-7.
TATUSOVA, T. A. & MADDEN, T. L. (1999). BLAST 2 Sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett 174, 247-50.
TEICH, N. (1984). Taxonomy of retroviruses. In Molecular biology of tumour viruses (ed. R. Weiss et al.), pp. 25-207. Cold Spring Harbour Laboratory Press, New York.
TELESNITSKY, A. & GOFF, S. P. (1997). Reverse Transcriptase and the Generation of Retroviral DNA. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 121-160. Cold Spring Harbour Laboratory Press, New York.
TEMIN, H. M. (1964). Nature of the provirus of Rous sarcoma virus. Natl. Cancer Inst. Monogr. 17, 557-570.
TEMIN, H. M. (1989). Retrovirus Variation and Evolution. Genome 31, 17-22.
TEMIN, H. M. (1991). Sex and recombination in retroviruses. Trends Genet. 7.
TEMIN, H. M. & MIZUTANI, S. (1970). RNA-dependent DNA polymerase in virions of Rous Sarcoma Virus. Nature 226, 1211-1213.
THOMPSON, J. D., HIGGINS, D. G. & GIBSON, T. J. (1994). Clustal-W - Improving the Sensitivity of Progressive Multiple Sequence Alignment Through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research 22, 4673-4680.
TOMONARI, K., FAIRCHILD, S. & ROSENWASSER, O. A. (1993). Influence of Viral Superantigens On V-Beta-Specific and V-Alpha-Specific Positive and Negative Selection. Immunological Reviews 131, 131-168.

TONJES, R. R., LIMBACH, C., LOWER, R. & KURTH, R. (1997). Expression of Human Endogenous Retrovirus Type K Envelope Glycoprotein in Insect and Mammalian Cells. J. Virol. 71, 2747-2756.
TOWLER, E. M., GULNIK, S. V., BHAT, T. N., XIE, D., GUSTSCHINA, E., SUMPTER, T. R., ROBERTSON, N., JONES, C., SAUTER, M., MUELLER-LANTZSCH, N., DEBOUCK, C. & ERICKSON, J. W. (1998). Functional characterization of the protease of human endogenous retrovirus, K10: Can it complement HIV-1 protease? Biochemistry 37, 17137-17144.
TRISTEM, M. (1996). Amplification of divergent retroelements by PCR. Biotechniques 20, 608-612.
TRISTEM, M. (2000). Identification and characterisation of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J. Virol. 74, 3715-3730.
TRISTEM, M., HERNIOU, E., SUMMERS, K. & COOK, J. (1996). Three Retroviral Sequences in Amphibians Are Distinct from Those in Mammals and Birds. J. Virol. 70, 4864-4870.
TURNER, G., BARBULESCU, M., SU, M., JENSEN-SEAMAN, M. I., KIDD, K. K. & LENZ, J. (2001). Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr. Biol. 11, 1531-1535.
UCHIYAMA, T., YODOI, J., SAGAWA, K., TAKATSUKI, K. & UCHINO, H. (1977). Adult T-cell leukemia: clinical and hematologic features of 16 cases. Blood 50, 481-492.
UNAIDS/WHO. (2000). AIDS epidemic update. Joint United Nations Programme on HIV/AIDS (UNAIDS), Geneva.
VAN BRUSSEL, M., SALEMI, M., LIU, H. F., GABRIELS, J., GOUBAU, P., DESMYTER, J. & VANDAMME, A. M. (1998). The simian T-lymphotropic virus STLV-PP1664 from Pan paniscus is distinctly related to HTLV-2 but differs in genomic organization. Virology 243, 366-379.
VAN DER KUYL, A. C., DEKKER, J. T. & GOUDSMIT, J. (1995). Distribution of Baboon Endogenous Virus Among Species of African Monkeys Suggests Multiple Ancient Cross-Species Transmissions in Shared Habitats. Journal of Virology 69, 7877-7887.

VAN DER KUYL, A. C., DEKKER, J. T. & GOUDSMIT, J. (1996). Baboon endogenous virus evolution and ecology. Trends Microbiol. 4, 455-459.
VAN DER KUYL, A. C., MANG, R., DECKER, J. C. & GOUDSMIT, J. (1997). Complete nucleotide sequence of simian endogenous type D retrovirus with intact genome organisation: Evidence for ancestry to simian retrovirus and baboon endogenous virus. J. Virol. 71, 3666-3676.
VAN DER LAAN, L. J. W., LOCKEY, C., GRIFFETH, B. C., FRASIER, F. S., WILSON, C. A., ONIONS, D. E., HERING, B. J., LONG, Z. F., OTTO, E., TORBETT, B. E. & SALOMON, D. R. (2000). Infection by porcine endogenous retrovirus after islet xenotransplantation in SCID mice. Nature 407, 90-94.
VAN NIE, R. & VERSTRAETEN, A. A. (1977). Studies of genetic transmission of mammary tumour virus of C3Hf mice. Int. J. Cancer. 16, 922-931.
VAN NIE, R., VERSTRAETEN, A. A. & DEMOES, J. (1977). Genetic transmission of mammary tumour virus by Gr mice. 19.
VAN REGENMORTEL, M. H. V., FAUQUET, C. M., BISHOP, D. H. L., CARSTENS, E. B., ESTES, M. K., LEMON, S. M., MANILOFF, J., MAYO, M. A., MCGEOCH, D. J., PRINGLE, C. R. & WICKNER, R. B. (2000). Seventh Report of the International Committee on Taxonomy of Viruses. Academic Press, San Diego.
VILLAREAL, L. P. (1997). On viruses, sex and motherhood. J. Virol. 71, 859-865.
VOGT, P. K. (1967). A virus released by "non-producing" Rous sarcoma cells. Proc. Natl. Acad. Sci. 58, 801-808.
VOGT, P. K. (1997a). Historical Introduction to the General Properties of Retroviruses. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus). Cold Spring Harbour Laboratory Press, New York.
VOGT, P. K. (1997b). Retroviral virions and genomes. In Retroviruses (ed. J. M. Coffin, S. H. Hughes and H. E. Varmus), pp. 27-70. Cold Spring Harbour Laboratory Press, New York.
VOGT, P. K. & FRIIS, R. R. (1971). An avian leukosis related to RSV(0). Properties and evidence for helper activity. Virology 43, 223-234.
VOGT, V. M. (1996). Proteolytic processing and particle maturation. Curr. Top. Microbiol. Immunol. 214, 95-131.


WAIN-HOBSON, S. (1993). The fastest genome evolution ever described: HIV variation in situ. Curr. Opin. Genet. Dev. 3, 878-883.
WATANABE, S. & TEMIN, H. M. (1982). Encapsidation sequences for spleen necrosis virus, an avian retrovirus, are between the 5' long terminal repeat and the start of the gag gene. Proc Natl Acad Sci USA 79, 5986-90.
WEI, C. M., GIBSON, M., SPEAR, P. G. & SCOLNICK, E. M. (1981). Construction and Isolation of a Transmissible Retrovirus Containing the Src Gene of Harvey Murine Sarcoma-Virus and the Thymidine Kinase Gene of Herpes-Simplex Virus Type-1. Journal of Virology 39, 935-944.
WEI, X., GHOSH, S. K., TAYLOR, M. E., JOHNSON, V. A., EMINI, E. A., DEUTSCH, P., LIFSON, J. D., BONHOEFFER, S., NOWAK, M. A., HAHN, B. H., SAAG, M. S. & SHAW, G. M. (1995). Viral dynamics in human immunodeficiency virus type 1 infection. Nature 373, 117-122.
WEISS, R. A. (1967). Spontaneous virus production from "non-virus producing" Rous sarcoma cells. Virology 32, 719-723.
WEISS, R. A. (1993). Cellular receptors and viral glycoproteins involved in retrovirus entry. In The Retroviridae, vol. 2 (ed. J. A. Levy). Plenum Press, New York.
WILK, T., GROSS, I., GOWEN, B. E., RUTTEN, T., DE HAAS, F., WELKER, R., KRAUSSLICH, H. G., BOULANGER, P. & FULLER, S. D. (2001). Organization of immature human immunodeficiency virus type 1. Journal of Virology 75, 759-771.
WILKINSON, D. A., MAGER, D. L. & LEONG, J. A. (1994). Endogenous human retroviruses. In The Retroviridae, vol. 3 (ed. J. A. Levy). Plenum Press, New York.
WILLIAMS, K. J. & LOEB, L. A. (1992). Retroviral reverse transcriptases: Error frequencies and mutagenesis. Curr. Top. Microbiol. Immunol. 176, 165-180.
WITHERS-WARD, E. S., KITAMURA, Y., BARNES, J. P. & COFFIN, J. M. (1994). Distribution of targets for avian retrovirus DNA integration in vivo. Genes Dev. 8, 1473-1487.
XIONG, Y. & EICKBUSH, T. H. (1990). Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9, 3353-3362.


YANG, J., BOGERD, H. P., PENG, S., WIEGAND, H., TRUANT, R. & CULLEN, B. R. (1999). An ancient family of human endogenous retroviruses encodes a functional homolog of the HIV-1 Rev protein. Proc. Natl. Acad. Sci. U.S.A. 96, 13404-13408.
YIN, H., MEDSTRAND, P., KRISTOFFERSON, A., DIETRICH, U., AMAN, P. & BLOMBERG, J. (1999). Characterization of human MMTV-like (HML) elements similar to a sequence that was highly expressed in a human breast cancer: Further definition of the HML-6 group. Virology 256, 22-35.
YODER, J. A. & BESTOR, T. H. (1996). Genetic analysis of genomic methylation patterns in plants and animals. Biol. Chem. 377, 605-610.
YORK, D. F., VIGNE, R., VERWOERD, D. W. & QUERAT, G. (1992). Nucleotide-Sequence of the Jaagsiekte Retrovirus, an Exogenous and Endogenous Type-D and Type-B Retrovirus of Sheep and Goats. Journal of Virology 66, 4930-4939.
YOSHINAKA, Y., KATOH, I., COPELAND, T. D. & OROSZLAN, S. (1985a). Murine leukemia virus protease is encoded by the gag-pol gene and is synthesised through suppression of an amber termination codon. Proc. Natl. Acad. Sci. 82, 1618-1622.
YOSHINAKA, Y., KATOH, I., COPELAND, T. D. & OROSZLAN, S. (1985b). Translational readthrough of an amber termination codon during synthesis of the protease. J. Virol. 55, 870-873.
ZENNOU, V., PETIT, C., GUETARD, D., NERHBASS, U., MONTAGNIER, L. & CHARNEAU, P. (2000). HIV-1 genome nuclear import is mediated by a central DNA flap. Cell 101, 173-85.
ZHU, T., KORBER, B. T., NAHMIAS, A. J., HOOPER, E., SHARP, P. M. & HO, D. D. (1998). An African HIV-1 Sequence from 1959 and Implications for the Origin of the Epidemic. Nature 391, 594-597.
ZUKER, M. (2000). Calculating nucleic acid secondary structure. Current Opinion in Structural Biology 10, 303-310.


Appendix 1. Tissue and DNA sources

Order/Family Species Source

Class Aves

Anseriformes
  Anatidae (Swans, geese and ducks)
    White-fronted goose (Anser albifrons): Slimbridge Reserve, UK
    Redhead (Aythya americana): David P. Mindell, University of Michigan, USA
    Brent goose (Branta bernicla): Slimbridge Bird Reserve, UK
    Wandering whistling duck (Dendrocygna arcuata): David P. Mindell, University of Michigan, USA
    North American black duck (Anas rubripes): David P. Mindell, University of Michigan, USA
    Baikal teal (Anas formosa): Slimbridge Bird Reserve, UK
Apterygiformes
  Apterygidae (Kiwis)
    Brown kiwi (Apteryx australis): Alan Cooper, Oxford University, UK
    Great spotted kiwi (Apteryx haastii): Alan Cooper, Oxford University, UK
    Little spotted kiwi (Apteryx owenii): Alan Cooper, Oxford University, UK
Casuariiformes
  Dromaiidae (Emu)
    Emu (Dromaius novaehollandiae): Alan Cooper, Oxford University, UK
  Casuariidae (Cassowaries)
    Cassowary ( ): Alan Cooper, Oxford University, UK
Charadriiformes
  Jacanidae (Jacanas)
    Bronze-winged jacana (Metopidius indicus): Paul Johnson, University of Cambridge, Cambridge, UK
Ciconiiformes
  Ciconiidae (Storks)
    Wood stork (Mycteria americana): David P. Mindell, University of Michigan, USA
  Phoenicopteridae (Flamingos)
    Chilean flamingo (Phoenicopterus ruber chilensis): Slimbridge Bird Reserve, UK
Columbiformes
  Columbidae (Pigeons)
    Wood pigeon (Columba palumbus): Bart Donato, Imperial College, UK
Falconiformes
  Accipitridae
    Goshawk (Accipiter gentilis): Koon Wah Fok, University of Nottingham, UK
    Marsh harrier (Circus aeruginosus): Caroline Metcalf, University of Nottingham, UK
    Red kite (Milvus milvus): Koon Wah Fok, University of Nottingham, UK
    Ferruginous hawk (Buteo regalis): Koon Wah Fok, University of Nottingham, UK
  Cathartidae (New World vultures)
    Turkey vulture (Cathartes aura): David P. Mindell, University of Michigan, USA
  Falconidae (Typical falcons)
    Peregrine falcon (Falco peregrinus): David P. Mindell, University of Michigan, USA
Galliformes
  Numididae
    Vulturine guineafowl (Acryllium vulturinum): Koon Wah Fok, University of Nottingham, UK
  Phasianidae (Pheasants and quails)
    Gambel's Quail (Callipepla gambelii): David P. Mindell, University of Michigan, USA
    Golden pheasant (Chrysolophus pictus): Koon Wah Fok, University of Nottingham, UK
    Bobwhite quail (Colinus virginianus): Koon Wah Fok, University of Nottingham, UK
    Japanese quail (Coturnix japonica): Koon Wah Fok, University of Nottingham, UK
    Ring-necked pheasant (Phasianus colchicus): Koon Wah Fok, University of Nottingham, UK
    Blue peacock (Pavo cristatus): Koon Wah Fok, University of Nottingham, UK
    Grey partridge (Perdix perdix): Bart Donato, Imperial College, UK
    Cabot's Tragopan (Tragopan caboti): Koon Wah Fok, University of Nottingham, UK
  Tetraonidae (Grouse)
    Black grouse (Lyrurus tetrix): Koon Wah Fok, University of Nottingham, UK
Gaviiformes
  Gaviidae (Loons)
    Common loon (Gavia immer): David P. Mindell, University of Michigan, USA
Gruiformes
  Rallidae (Rails)
    Gray moorhen (Gallinula chloropus): Slimbridge Bird Reserve, UK
    European coot (Fulica atra): Slimbridge Bird Reserve, UK
Passeriformes
  Muscicapidae (Thrushes and allies)
    Hermit thrush (Catharus guttatus): David P. Mindell, University of Michigan, USA
    Mistle thrush (Turdus viscivorus): Koon Wah Fok, University of Nottingham, UK
  Paridae (True tits)
    Blue tit (Parus caeruleus): Koon Wah Fok, University of Nottingham, UK
  Corvidae (Crows)
    Common magpie (Pica pica): Koon Wah Fok, University of Nottingham, UK
    Azure-winged magpie (Cyanopica cyana): Koon Wah Fok, University of Nottingham, UK
Piciformes
  Picidae (Woodpeckers)
    Green woodpecker (Picus viridis): Bart Donato, Imperial College, UK
  Ramphastidae (Toucans)
    Golden-collared toucanet (Selenidera reinwardtii): David P. Mindell, University of Michigan, USA
Rheiformes
  Rheidae (Rhea)
    Greater rhea (Rhea americana): David P. Mindell, University of Michigan, USA
    Darwin's rhea (Pterocnemia pennata): Alan Cooper, Oxford University, UK
Sphenisciformes
  Spheniscidae (Penguins)
    King penguin (Aptenodytes patagonicus): David P. Mindell, University of Michigan, USA
Strigiformes
  Strigidae (Typical owls)
    Long-eared owl (Asio otus): The Hawk Conservancy, Andover, UK
    Eastern screech-owl (Otus asio): David P. Mindell, University of Michigan, USA
Struthioniformes
  Struthionidae (Ostriches)
    North African Ostrich (Struthio camelus): Ostrich meat purchased at Waitrose, Bracknell, UK
Tinamiformes
  Tinamidae (Tinamous)
    Elegant-crested tinamou (Eudromia elegans): Alan Cooper, Oxford University, UK

Class Mammalia

Artiodactyla
  Bovidae
    Bohor reedbuck (Redunca redunca): Peter Arctander, Zoological Institute, University of Copenhagen, Denmark
    American bison (Bison bison): Peter Arctander, Zoological Institute, University of Copenhagen, Denmark
    Gemsbok (Oryx gazella): Peter Arctander, Zoological Institute, University of Copenhagen, Denmark
    Oryx (Oryx oryx): Peter Arctander, Zoological Institute, University of Copenhagen, Denmark
    Mountain goat (Oreamnos americanus): Gordon Jarrell, Alaska Frozen Tissue Collection, USA
    Musk ox (Ovibos moschatus): Gordon Jarrell, Alaska Frozen Tissue Collection, USA
    Thinhorn sheep (Ovis dalli): Gordon Jarrell, Alaska Frozen Tissue Collection, USA
    Impala (Aepyceros melampus): Peter Arctander, Zoological Institute, University of Copenhagen, Denmark
    Visna-infected domestic sheep (Ovis aries): Division of Comparative Medicine, Johns Hopkins University, Baltimore, USA
    CAEV-infected domestic goat (Capra hircus): Maria Suzan, Laboratory of lentivirus pathology, Marseille, France
  Cervidae (Deer)
    Caribou: Gordon Jarrell, Alaska Frozen Tissue Collection, USA
    White-tailed deer (Odocoileus virginianus): J. Martin, Imperial College, Silwood Park, UK
  Camelidae (Camels and Llamas)
    Llama (Lama glama): J. Martin, Imperial College, Silwood Park, UK
Carnivora
  Canidae (Dogs)
    Coyote (Canis latrans): J. Martin, Imperial College, Silwood Park, UK
  Felidae (Cats)
    Cougar (Felis concolor): J. Martin, Imperial College, Silwood Park, UK
    Domestic cat (Felis catus): J. Martin, Imperial College, Silwood Park, UK
  Ursidae (Bears)
    American black bear (Ursus americanus): J. Martin, Imperial College, Silwood Park, UK
  Mustelidae (Weasels and relatives)
    Pine marten (Martes martes): J. Martin, Imperial College, Silwood Park, UK
    Small Indian mongoose (Herpestes javanicus): J. Martin, Imperial College, Silwood Park, UK
    Chinese ferret badger (Melogale moschata): Leona Chemnick, San Diego Zoo/CRES, San Diego,
Cetacea
  Delphinidae (Dolphins)
    Risso's dolphin (Grampus griseus): Rob Deaville, Institute of Zoology, London, UK
    Common dolphin (Delphinus delphis): Rob Deaville, Institute of Zoology, London, UK
    White-beaked dolphin (Lagenorhynchus albirostris): Rob Deaville, Institute of Zoology, London, UK
    Atlantic white-sided dolphin (Lagenorhynchus acutus): Rob Deaville, Institute of Zoology, London, UK
    Striped dolphin (Stenella coeruleoalba): Rob Deaville, Institute of Zoology, London, UK
    Bottle-nosed dolphin (Tursiops truncatus): Rob Deaville, Institute of Zoology, London, UK
Chiroptera
  Pteropodidae (Flying foxes)
    Hairy-legged vampire bat (Diphylla ecaudata): J. Martin, Imperial College, Silwood Park, UK
Edentata
  Dasypodidae (Armadillos)
    Three-banded armadillo (Tolypeutes matacus): J. Martin, Imperial College, Silwood Park, UK

Insectivora
  Erinaceidae (Hedgehogs and moonrats)
    European hedgehog (Erinaceus europaeus): J. Martin, Imperial College, Silwood Park, UK
Lagomorpha
  Leporidae (Rabbits and hares)
    European rabbit (Oryctolagus cuniculus): J. Martin, Imperial College, Silwood Park, UK
Marsupialia
  Macropodidae (Kangaroos and wallabies)
    Red kangaroo (Macropus rufus): Rachel Waugh O'Neill, La Trobe University, Australia
  Dasyuridae (Marsupial carnivores)
    Stripe-faced dunnart (Sminthopsis macroura): Rachel Waugh O'Neill, La Trobe University, Australia
Monotremata
  Ornithorhynchidae (Platypus)
    Duck-billed platypus (Ornithorhynchus anatinus): Rachel Waugh O'Neill, La Trobe University, Australia
    Short-beaked echidna (Tachyglossus aculeatus): Rachel Waugh O'Neill, La Trobe University, Australia
Perissodactyla
    Horse (EIAV infected): Wendy Maury, University of South Dakota, USA
Pinnipedia
  Odobenidae (Walrus)
    Walrus (Odobenus rosmarus): J. Martin, Imperial College, Silwood Park, UK
  Phocidae (True or Hair seals)
    Grey seal (Halichoerus grypus): Rob Deaville, Institute of Zoology, London, UK
  Otariidae (Eared seals)
    Northern fur seal (Callorhinus ursinus): Rob Deaville, Institute of Zoology, London, UK
Primates
  Lorisidae (Bush babies, lorises and pottos)
    Slow loris (Nycticebus coucang): J. Martin, Imperial College, Silwood Park, UK
  Cercopithecidae
    White-epauletted black colobus (Colobus angolensis): D. Gotelli, Institute of Zoology, London, UK
  Pongidae
    Orang-utan (Pongo pygmaeus): D. Gotelli, Institute of Zoology, London, UK
Rodentia
  Muridae
    Grass rat (Arvicanthis -): J. Martin, Imperial College, Silwood Park, UK
    Mus pahari: J. Martin, Imperial College, Silwood Park, UK
    Oryzomys intermedius: J. Martin, Imperial College, Silwood Park, UK
    Mastomys huberti: Laurent Granjon, Paris Museum of Natural History, France
    Bismarck giant rat (Uromys neobritannicus): J. Martin, Imperial College, Silwood Park, UK
    Myomys yemeni (Multimammate rat): Laurent Granjon, Paris Museum of Natural History, France
  Sciuridae
    Indian provost squirrel (Callosciurus prevosti): Leona Chemnick, San Diego Zoo/CRES, San Diego,
    Prairie dog (Cynomys ludovicianus): J. Martin, Imperial College, Silwood Park, UK
Scandentia
    Tree shrew (Tupaia belangeri): Jim Patton, University of California, USA

Appendix 2. Nucleotide Alignment of Class II ERV pol fragments

Appendix 2. Nucleotide alignment

BLV [CTTGTGGACA CCGGGGCT]GA AAATACGGTT CTCCCACAAA ATTGGCTGGT TCGAGATTAC
HTLV1 [CTACTAGATA CAGGAGCA]GA CATGACAGTC CTTCCGATAG CCTTGTTCTC AAGTAATACT
HTLV2 [CTACTTGACA CAGGAGCC]GA CCTTACGGTT ATACCCCAGA CACTCGTGCC CGGGCCGGTA
HIV1 [CTATTAGATA CAGGAGCA]GA TGATACAGTA TTAGAAGAAA TGAATTTGCC AGGAAGATGG
HIV2 [TTGCTAGACA CAGGGGCT]GA CGACTCAATA GTAGCAGGCG TAGAGTTAGG GAGCAATTAT
SIVagm [TTATTAGATA CGGGGGCA]GA TGATACCATT ATAAAAGAAG CAGATTTACA ATTATCAGGA
FIV [TTATTAGACA CAGGAGCA]GA TATAACAATT TTAAATAGGA GAGATTTTCA AGTAAAAAAT
BIV [TTAGTAGACA CTGGAGCA]GA TGAGGTAGTG CTTAAGAACA TACATTGGGA TAGGATAAAA
Jembrana [TTGATAGATA CCGGGGCT]GA TGAGGTAGTG CTGAAAGACA TTCATTGGGA TAGAATCAAG
CAEV [TTATTTGATA CCGGGGCG]GA CCGAACTATA GTTAGATGGC ATGAGGGCTC GGGAAACCCA
Visna [TTAGTAGATA CGGGGGCA]GA TAAAACTATA GTAACATCCC ATGATATGTC AGGGATACCA
EIAV [CTGTTAGACA CAGGAGCA]GA TACTTCAGTG TTGACTACTG CACATTATAA TAGGTTAAAA
HERVK10 [TTGGTAGACA CTGGAGCA]GA TGTCTCTATT ATTGCTTTAA ATCAGTGGCC AAAAAACTGG
HML1Z70280 [CTAGTAGATA CTGGAGCT]GA TGTCTCTATT ATTGCTATAA ATCAATGACC CCGGCACTGG
HML3AC025577 [-TGATAGATA CAGGAGTG]GA CATTTCAATC ATTTCCCTAC AGCACATGCC ATCCATGTGG
HML3AC092364 [TTGGCAGATA CAGGAGCA]GA CATTTCAATC ATTTCTCTAC AGCACTGGCT GTCCACGTGG
HML4AC093517 [ATGATTGACA CGGGCGCT]GG TGTGTCCATT ATCGCTTTAC ATCGATGGCC CCGACACTGG
HML5AP000870 [TTATTGGACA CGGGAGCA]GA TATTCCAATC ATTAGTGATC AAAACTGGCC AGAAACTTGG
HML6AF069508 [CTTATGGATA CGGGAGCT]GA TGTGTTAGTA ATATCCAAAC ACTATTGGCC CACCATCTTG
HML7AC013722 [?????????? ????????]?? ?????????? ?????????? ?????????? ??????????
HML8AL596245 [?????????? ????????]?? ?????????? ?????????? ?????????? ??????????
HML9AC025569 [ATTATTGACA CAAGAGCT]GA TCAAACTATT ATAGCTAAAA AACACTGGCC TTCAGATTGG
HML9AC068700 [ATTATTGACA CA-GGGCC]AA TCAAACTATT ATAGCTAAAA AACAGTGGCC TTCAGATTGG
OryzomysI [??????GACA CGGGGGCC]AA TTGTTCGATA ATCACCCCAG AATCCTGGCA CCCCAGATGG
ArmadilloII [TTGGTGGATA CGGGGGCT]GA TGTTTCTATT ATTGCTAAAC GATATTGGCC ACAAGCTTGG
ReedbuckI [TTGGTGGATA CGGGGGCT]GA TGTGTCTATT ATTGCTAAAC AATATTGGCC ACAAGCTTGG
MuspahariII [TTGTTGGATA CGGGGGCT]GA TGTTAGTACA CGTTCCCAAA AGTCTTGGAA TCCAGACTGG
NileRatII [TTGGTGGATA CGGGGTCA]GA TAGGACAATT TTAAGACAGC AAGCGGTCCC ACAAAACTGG
HedgehogI [TTGTTGGACA CGGGGGCA]GA AAAGATTATT TTAAGGCA-- --GAGGTTTC CCAGAATTGG
CougarI [TTGGTGGATA CGGGGGCT]AA TGTAAATGTT ATATCCTCCT CATAATGGAC TTCTCCCTGG
DomesticCatI [TTGGTGGATA CGGGGGCT]GA TGTAACCGTT ATATCCTCCT CCCAATGACC TTCTCACTGG
T22 [TTGGTGGATA CGGGGGCA]GA TGTCTCGGTA ATCAGCGCTA CGCAGTGGCC CCTTCAGTGG
SheepI [TTGTTGGATA CGGGGGCA]AA TGTTTCCATC ATAAGAACAA AAGACTGGCC TTCGGACTGG
IPSquirrelI [TTGTTGGACA CGGGGGCG]GA TATTCCCATT ATAAGAATAG AGGAATGGCC TTTGGATTGG
SIMongooseI [TTGTTGGATA CGGGGGCT]AA TGTTACCATC CTCACTTCAA AAGACTGGCC CCAAGCATGG
MPMV [TTAATAGATA CGGGGGCT]GA TGTCACAATT ATCAAGCTGG AGGACTGGCC TCCTAATTGG
SRVI [TTAATCGATA CGGGAGCT]GA TGTCACTATC ATCAAGCTAG AGGACTGGCC TCCTAATTGG
SRVII [TTAATAGATA CTGGAGCA]GA TGTTACTATT ATAAAACAAG AAGACTGGCC ATCCCATTGG
BabboonSERV [CTAATAGACA CAGGGGCC]GA TGTAACTATC ATTAAACAAG AAGATTGGCC CTCTCATTGG
MusD [GTAATAGATA CCGGGGCT]GA TGTAACGATT ATAAGAGGGC AGGACTGGCC CTCAAACTGG
TvERV [CTACTCGATT CCGGGGCT]GA TTCAACTGTT ATTTCTGAGG CACACTGGCC ACCTGCCTGG
SMRVH [ATTCTTGATA CAGGGGCC]GA TGCCACCGTT ATATCTTACA CTCACTGGCC GAGGAACTGG
ColobusI [TTGTTGGACA CGGGGGCT]GA TGTAACAATT ATTAAACAAC AAGACTGGCC CGCCAGTTGG
SImongooseI [TTGGTGGACA CGGGGGCA]GA CGTCACCATT ATTAAAAAGG AGGCCTGGCC TTCACAATGG
GoatII [TTGTTGGACA CGGGGGCT]GA CTCAACGGTC ATCTCTAAAA TTCATTGGCC CTCCGGTTGG
OstrichD [TTGGTGGATA CGGGGGCA]GA TGCTACCGTA ATATCAGCTA GATATTGGCC TTCCAACTGG
LorisI [TTGGTGGATA CGGGGTCA]GA CGCCACAGTA ATCGCCAGTA AATACTGGCC CCCTTCCTGG
SIMongooseII [TTGGTGGATA CGGGGGCT]GA TGCTACCATA ATTTCAGATA AGTACTGGCC CTCTTCCCGG
Jaagsiekte [GTACTAGATA CAGGGGCC]GA TATTAGTGTC ATTTCTGATA AATATTGGCC TACCACATGG
RDolphinI [TTGTTGGATA CGGGGGCT]GA TGTCTCTGTT ATGACGCAGA TACACTGGCC TAAACGTTGG
WFDeerI [TTGGTGGATA CGGGGGCT]GA TGTTTCTGTA AAATCT-CTA AA---TGGCC AAAAAACTGG
CaribouI [TTGGTGGATA CGGGGGCT]GA TGTTTCTGTA ATGTCTCTAA GTGATTGGCC AAAAAACTGG
GiraffeI [TTGTTGGATA CGGGGGCA]GA TGTCTCTTGC ATTGCTGGAA AAGATTGGCC CTCATCCTGG
BisonI [TTGTTGGATA CGGGGGCA]GA TGTTTCCTGC ATTGCTGGGA AAGACTAGCC TAGTTCCTGG
MuskoxI [TTGTTGGATA CGGGGGCA]GA CGTTTCTTGC ATTTCAGGAA AAGACTGGCC CAGTTCCTGG
CFBadgerI [TTGGTGGATA CGGGGTCA]GA TGTTTCAGTA ATCGCCTCCC AGCACTGGCC TAAAAACTGG
HRV5 [?????????? ????????]?? ?????????? ?????????? ?????????? ??????????
GoatI [TTGGTGGACA CGGGGGCT]GA CCTT-CCATT ATTACCCAAC AAGACTGGCC AAGAAAGTGG
IAPSHamster [ATAATGGATT CGGGAGCA]GA CAAAAGCATT ATTTCACTCC ATTGGTGGCC GAAGTCTTGG
IAPCHamster [TTACTAGATT CTGGAGCT]GA TAAGAGCATC ATAGCCACTA AAGATTGGCC CTCTGGCTGG
IAPMouse [ATCCTTGATA CCGGAGCA]GA TAAAAGTATA ATTTCTACAC ATTGGTGGCC CAAAGCATGG
MastomysI [TTGTTGGATA CGGGGGCA]GA CAGGAGTATT ATAGCAAAAA AGGACTGGCC TTCAGGTTGG
MyomysI [TTGGTGGATA CGGGGGCA]GA CAGGAGTATA ATAGCTAAGA AAGATTGATC TTCAGGCTGG
UromysI [TTGGTGGACA CGGGGTCA]GA TAAAAACATA ATCTCCACCA GCTGGTGGCC AAAGAGTTGG
PrairiedogI [TTGGTGGATA CGGGGGCA]GA TAGTAGCATA GTCACTCTGA AGGACTGGCC AAAGGGATGG
RabbitI [TTGTTGGATA CGGGGGCC]GA CAAGTCCATC ATAGCAACAA AGGACTGGCC TAGTTCATGG
NileRatI [TTGGTGGACA CGGGGGCA]GA TCGTAGCATC ATATCTGCCC ATGACTGGCC TGCCAAATGG
MMTVBR6 [CTCTTGGATA CCGGGGCA]GA TAAAACTTGC ATAGCAGGCA GAGACTGGCC AGCTAATTGG
SDunnartI [?????????? ??GGGGCT]GA TATATCTGTT ATAACAAAAA TTGATTGGCC TCCTGAATGG
RKangaI [TTGGTGGATA CGGGGGCA]GA CATCTCAGTA ATCCGAGAAG ATGACTGGGC ATCCCACTGG


PlatypusI [TCGGAGGACA CGGGGGCT]GA CAAGTCGATT ATCGCCAAAA GGCACTGGCC CCTTTCTTGG
EchidnaI [TTGGTGGACA CGGGGGCT]GA TCGTTCGGTC CTTACAAGAA AGGATTGGCC CCCGGCATGG
ArmadilloI [TTGGTCGACC TGCAGGCG]?? ?????????? ?????????? ?????????? ??????????
PFalconI [TTGGTGGATA CGGGGGCC]GA TGTTACGGTA ATATCTCAAG CTAAATGGCC GCCACTGTGG
PrairiedogII [CTAGTGATTA CGGGGGCC]GA TGTTACGGTA ATATCTCAAG CTAAATGGCC GCCACTGTGG
MHarrierI [TGTTGGATAC GGGTGGCT]GA TGTTACAGTA CTATCTTTGG CTAAGTGGCC CCCACAGTGG
VultureI [???GTGGATA CGGGGGCT]GA TGTTACAGTA ATATCCCAGG CCAAGTGGCC TCCCCGGTGG
ETinamouI [TTGGTGGACA CGGGGGCT]GA CGTCACAGTA ATATCCCAAG CCAAGTGGCC TCCCCGGTGG
MoorhenI [TTGGTGGATA CGGGGGCT]GA TGTTACAGTC ATCTCGCAGG CAAAATGGCC TCCACATTGG
HThrushI [TTGGTGGATA CGGGGGCT]GA TGTGACTGTC ATTTCTCAAG ATAAATGGCC TTCTAACTGG
ESOwlI [TTGGTGGATA CGGGGGCG]GA GGTGACCGTA ATATCTGCGA TCAAATGGCT ACAGCATTGG
RNPheasantI [TTGGTGGACA CGGGGGCA]GA TGTAACAGTA ATTGCCCAAA GGACTTTGCC ATCACACTGG
PeacockI [TTGGTGGATA CGGGGTCA]GA TGTAACAGTA ATTGCCCAAA GTGCTTGGCC ATCGCATTTG
BluetitII [TTGGTGGATA CGGGGGCA]GA TGTGACCATC ATACCGGAAA AGGTGTGGCC ATCACACTGG
CMagpieII [TTGGTGGACA CGGGGGCA]GG CGTGACGGTC ATTGCCTCCA ACAACTGGCT GTCACATTGG
PenguinI [TTGTTGGATA CGGGGGCA]GA CGTTACTATC ATCAGTACTG CAAAGTGGCC TGGATCATGG
GSKiwiI [TTGTTGGATA CGGGGGCA]GA TGTCACTGTT ATCTCCCTAG CGCGATGGCC TCGCGGGTGG
LKiwiI [TTGGTGGACA CTGGGGCA]GA TGTCACTGTT ATCTCCCTAG CGCGATGGCC TCGCGGGTGG
HThrushI [TTGGTGGATA CGGGGGCA]GA TGGGACAGTG ATTTCTCTC- CTGCCTGGCC TCCCACACGG
ESOwlII [TTGTTGGATA CGGGGGCT]GA TGTAACTATT ATATCTCAGT ACTCGTGGCC ACCCACCTGG
WFgooseI [TTGTTGGATA CGGGGGCA]GA TGTAACAGTA ATAGCCCTAA AGGACTGTCC CAGTCGATGG
NABDuckIII [TTGGTGGATA CGGGGTCT]GA TGTCACAGT- ATAGCCATGA AGGACTGGCC GAAGGACTGG
FlamingoI [TTGGTGGATA CGGGGGCA]GA TGTTACAGTA ATTGCTTTTA AGGACTGGCC CAACAATTGG
GRheaII [TGGTGGGATA CGGGGTCG]GA TCTTACCATA ATACCACAAA CCTCTTGGCC ATTGAGCTGG
DRheaI [TTGGTGGATA CGGGGTCG]GA TCTTACCATA ATACCACAAA CCTCTTGGCC ATTGAGCTGG
BrownKiwiI [TTGGTGGATA CGGGGTCC]GA CGTCACCATA GTACCTCGCG ACTCCTGGCC TGCCGCTTGG
RBowerIII [?????????? ????????]?? ?????????? ?????????? ?????????? ??????????
ToucanetteII [TTGGTGGATA CGGGGGCT]GA TGTTACTATC ATTCCTTGCT ACGTTTGGCC ACCTGCTTGG
GWoodpeckerI [?????????? ?GGGGGCT]GA TGTCACCATC ATACCTCGCC ATAGGTGGAA TAGTCAGTGG
ESOwlII [TTGGTGGATA CGGGGGCA]GA TGTAACCATT ATAGCACAAA TGGAATGGCC TGGTACATGG
GRheaI [TTGGTGGATA CGGGGGCT]GA TGTAAGTATT GTTCCATGAA AAATTTGGCC TGTTTCTTGG
JQuailI [TTGGTGGATA CGGGGGCT]GA CATAACTATT ATAGCAAAAG CAGAATGGAT AAAGGACTGG
CassuaryI [TTGGTGGATA CGGGGGCG]GA TGTGTCAATC ATCATGCAAT CAGACTGGCC CCAAGATTGG
AZMagpieII [TTGGTGGATA CGGGGGCT]GA TGTTTCCATT ATCAAAGCCA GTGACTGGCC TAGTGGTTGG
CMagpieIII [TTGGTGGATA CGGGGGCT]GA TGTTTCCATT ATCAAAGCCA GTGACTGGCC TAGTGATTGG
MThrushI [TTGGTGGATA CGGGGGCT]AA CGTTTCCATT ATrAAAAGCA GTGACTGACT GATGGTTGAC
CMagpieI [TTGTTGGATA CGGGGTCA]GA TGTTTCTGTT ATCTCTTATT ATCGTTGGCC AACAGATTGG
GuineafowlI [TTGGTGGACA CGGGGGCT]GA TGTCACAGTT GTACCAAAAG AGGTGTGGCC CTCTCACTGG
PartidgeIV [TTGGTGGATA CGGGGGCG]GA TATTACTGTC ATCAGTTAGC ATGATTGGCC TCCACATCGG
HThrushII [TTGTTGGATA CGGGGGCT]GA TGTTACTATT GTCCCCATTG ATTTTTGGCC TAGTCAGTGG
GPheasantI [TTGGTGGATA CGGGGGCG]GA TGTGACTGCC ATGCCCGTGG GAAGGTGGCC GGTGCGGTGG
GPheasantII [TTGGTGGATA CGGGGTCG]GA CATAACAATT ---TCCGAGA AGTTATGGCA GCACAAATGG
NABDuckI [TTGGTGGATA CGGGGGCG]GA TGTTACAGTG ATCCCCAGGA CGCAATGGCC CAGGAGCTGG
LoonI [TTGGTGGATA CGGGGTCT]GA TGTTACAATT ATAAACCGAA CTATATGGCC TACAAGTTGG
OstrichI [TTGGTGGATA CGGGGGCA]GA TTCTAGTATA ATTACTCCAT CACAGTGGCC CCAAGGATGG
MoorhenII [TTGGTGGATA CGGGGGCC]GA CACTAGCATA GTGTCCTCAG AAAGTTGGCC GTCAGCATGG
GoshawkI [TTGGTGGATA CGGGGGCT]GA CATTTCTATC ATTAGCAGAC ATTCTTGGCC AAAATCATGG
PigeonI [TTGGTGGATA CGGGGGCT]GA TTTGACCATC GTAGATGAAG GCGCCTGGCC TCAGGATTGG
FHawkI [TTGGTGGATA CGGGGTCT]GA CATTTCTATC GTCAGCGAAC ATTTTTGGCC AGAATCACGG
BGrouseI [TTGGTGGATA CGGGGGCA]GA TATTACTATT GTAGCAACGA GCGCCTGGCC CCGACAGTGG
BGrouseII [TTGGTGGATA CGGGGGCA]GA CACTTCTATA ATAGCTCCAG AAAGTTGGCC GGCCGAGCGG
PeacockII [TTGTTGGATA CGGGGGCA]GA TATCACTATT GTATCTCAGC AGATGTGGCC CTCCCAATGG
NABDuckIV [TTGGTGGATA CGGGGGCA]GA TGTGACAATA ATCAGTGCGC AAAGTTGGCC ACCGCATTGG
BluetitIII [?????????? ????????]?? ????????TT GTTGCTCCAC AATATTGGCC CCAGGAATGG
ToucanetteI [TTGGTGGATA CGGGGGCA]GA TGTTACAGTT ATCCCAGAAT CCAGCTGGCC TCCAAGCTGG
NABDuckII [TTGGTGGATA CGGGGGCA]GA CATTACAGTT ATCCCTGAGA TGAACTGGCC TCTGAGTTGG
HThrushIII [TTGGTGGATA CGGGGTCA]GA CGTCACCATC ATCCCTGACA TACACTGGCC TCTTCCATGG
BlueTitI [TTGTTGGATA CGGGGGCA]GA TGTCACAGTC ATTCCAGAGA CAAATTGGCC TCCGCTCTGG
T7 [TTGGTGGATA CGGGGGCA]GA TGTCACAGTC ATT-CTCAGA CAAATTGGCC TCCTCTCTGG
MHarrierII [TTGGTGGATA CGGGGTCC]GA TGTCTCATGT ATACCCCCAT GTTTGTGGCC TGATTCCTGG
EmuI [TTGGTGGATA CGGGGGCA]GA TGTTACATGT GTGCCCTCGT GGTTGTGGCC ATCGAGCTGG
LKiwiII [?????GGATA CGGGGTCT]GA TGTCACGTGC ATCCCTCTCC GCATGTGGCC GAAAAACTGG
LDV [ATGCTGGATA CGGGAGCC]GA TGTAACGGTG ATAAATGAAC CATCTTGGCC CTGCACTTGG
RSV [CTGTTGGACT CTGGAGCG]GA CATCACTATT ATTTCAGAGG AGGATTGGCC CACCGATTGG
ALVsubgrpJ [CTGTTGGACT CCGGAGCG]GA CATCACTATT ATTTCGGAGG AGGACTGGCC TACTGATTGG
TragopanI [TTGGTGGATA CGGGGTCG]GA CATTACAGTT TTCGCGGATA GCGATTGGCC ATCTACATGG
GuineafowlII [TTGGTGGATA CGGGGTCG]GA TGTGACAGTC ATCTCTAAGT CAGATTGGCC AGAGGATTGG
LoonII [TTGGTGGATA CGGGGGCT]GA TGTTACTATC ATTCCTTGCT ACGTTTGGCC ACCTGCTTGG


Appendix 2. Nucleotide alignment (continued)

BLV CCACGGATCC CCGCCGCA-- ----GTGCTC GGAGCAGGGG GAGTCTCCCG GAACAGATAC
HTLV1 CCCCTCAAAA ACACATCC-- ----GTGTTA GGGGCAGGGG GCCAAACCCA AGATCACTTT
HTLV2 AAGCTCCACG ACACCCTG-- ----ATCCTA GGCGCCAGTG GGCAAACCAA CACCCAGTTC
HIV1 AAACCAAAAA TG-------- ----ATAGGG GGAATTGGAG GTTTTATCAA AGTAAGACAG
HIV2 AGTCCAAAGA TA-------- ----GTAGGG GGAATAGGGG GATTCATAAA TACCAAAGAA
SIVagm ACATGGAAAC CAAAAATA-- ----ATAGGG GGCATTGGAG GGGGACTCAA TGTAAAAGAG
FIV TCTATAGAAA ATGGAAGGCA AAATATGATT GGAGTAGGAG GAGGAAAGAG AGGAACAAAT
BIV GGGTATCCAG GGACACCAAT TAAACAAATT GGGGTAAATG GAGTAAATGT GGCCAAAAGG
Jembrana GGTGTGCCCG CTGCCTCAGT AGTGCAGGTT GGAGTAACAG GCAGAAATAT AGCAAGGAGG
CAEV GCCGGAAGGA TAAAA----- ----CTGCAA GGAATAGGAG GAATAGTAGA AGGAGAAAAA
Visna AAGGGAAGGA TAATA----- ----TTACAG GGCATAGGGG GAATAATAGA AGGAGAAAAA
EIAV TATAGAGGGA GAAAA----- ----TATCAA GGGACGGGAA TAATAGGAGT GGGAGGAAAT
HERVK10 CCTAAACAAA AGGCTGTTAC AGGACTTGTC GGCATAGGCA CAGCCTCAGA AGTGTATCAA
HML1Z70280 CCTAAGCAAA AGGCATCCAT TGTTATTGTT GGAGTAGGAG CTGCGTCAGA AGTTTTTCAA
HML3AC025577 CCAGTTCAAC CCACTCAATT TAACATAGTT GGAGCTGGTA AAGCCCCTGA AGTGTATCAA
HML3AC092364 CCAATTCAAC CTGCTCAATT TAACATAGTT GGAGTTGGTA AAGCCACTGA AGTATATCAA
HML4AC093517 CCAAAGGAGC ATGCCTCCAC AGCATTCGTG GGTGTTGGTC AGGCTTCACA GGTATATGAA
HML5AP000870 CCTTGGGTCA CTCAGAAA-- -CAAAAAATT GTCAACATCG GGGAAGTGCG CACAGCCAAG
HML6AF069508 ACCCCTGCAA TTAACTTCTA CATCCTTAGT GGACTAGGAA AGCTCAAATT TCATGGAGTG
HML7AC013722 ?????????? ?????????? ?????????? ?????????? ?????????? ??????????
HML8AL596245 ?????????? ?????????? ?????????? ?????????? ?????????? ??????????
HML9AC025569 GCCTCCTCTT CTGCCTTATG TTCTATCATG GGTGTTGGAG GTATCAGCAG CCATTAA-AA
HML9AC068700 TCTTCCTCTC CTGCCTTA-- -----TCATG TGTGTTGGAG GTATCAACAG CAGTTAA-AA
OryzomysI CCTCTTCAAG AGGTAAATGT TCAGTTTTTG GGAATTGGGT CTGTATCCCA GGTAAAACAG
ArmadilloII ACTTTGTAGA AGGTGCCTAC AGTTTTTACT GGCGTTGGCA CAATGACGGA TGTTTACCAG
ReedbuckI ACTTTGCAAA AGGTGCCTAC AGTTTTTACA GGCATTGGAA CAATGACAGG TGTTTACCAG
MuspahariII CCACTTCAGA ACGTTTACAC GCAGTTTATA GAAATTGGTA AATTATTGTG AATAAGACAA
NileRatII CAGCTG---A CCCCTGGGCG CCAGCTCCTA GGCATAGGGG GTACCTCTCA CACATTCATC
HedgehogI ---GAACTCC TTCCAGGACC CCATTTACAC GGAGTTAGAG GGTTGACCCA AGGCTTTCTC
CougarI GCTACC---G CCCCTACAGC CACCTTGGTG GGGGTTGGGG GCACCCAGCC TTCCTGACAA
DomesticCatI GCCACC---G CCCCTACAGC CACCCTGGTA GGAGGTGGGG GCTCCCAGCC TTCCCAACAA
T22 CCAACAGATA CGGCCCCAAA A---ATTTGG GGGGTAGGGG GGATGCAAAC TTCTCGGCAA
SheepI CCTACAATTT TAACCTCACA CCAGTTGGTG GGAATAAGCA CTGCAGATGC AGCTCAAACT
IPSquirrelI CCTGCAGTTT TAGCCTCACA CCAGTTGGTG GGAATAGGAA CTGCAGATGC AGCTCAAACT
SIMongooseI CCTGCTCAGA CTTCTCCTTA CATGGTCTCT GGAGTTGGAG CTAATAATTT TGCCAACCGC
MPMV CCTATAACAG ATACCTTAAC CAATTTAAGA GGAATAGGAC AAAGTAACAA CCCTAAACAA
SRVI CCTATAACAG ATACCTTAAC CAATTTAAGA GGTATAGGAC AAAGCAACAA CCCTAAGCAA
SRVII CCTACCACCG AAACCTTAAC TAATTTGAGA GGAATAGGAC AAAGTAATAA CCCCAGGCAA
BabboonSERV CCTACCACAG AGACTTTAAC TCACTTGAGA GGAATTGGGC AAAGCAGTAA TCCTAAACAA
MusD CCCCTATCTG TTTCCTTGAC TCACCTTCAA GAAATTGGTT ACGCCAGTAA CCCAAAACGT
TvERV CCGTTACAGC CCTCCCTGAC CCACTTACAA GGTATAGGGC AGAGTTCCAA CACTATGCAA
SMRVH CCGTTAACAA CCGTTGCTAC TCACCTGCGC GGTATTGGCC AGGCCACCAA CCCCCAACAA
ColobusI CCCACTTCAG AGACCTTAAC AAATTTACGA GGAATCGGGC AAAGTAATAA CCCTAGACAA
SImongooseI CCCTTATCCC CTGCCCTAGC CAACCTTAAA GGCATAGGAC GGAGTGTAAA CCCAGTGAAG
GoatII CCCCCACGCA TATCGGCTAC TCATTTACAA GGGATAGGTC CGTCAAAGAA TACTCTACAG
OstrichD CCCTGTGAGG CCTCAGTAAC TCATTTGAA- GGCATTGGCC AGTCTGTCAA CCCTAGACAA
LorisI CCCCTCACTA CTTCCCTAAC CCATCTCTGG GGTATTGGCC AAACCTCCAA ACCCCAG---
SIMongooseII CCCCTTATGG CATCTTTCAC CCATCTTAAG GGCATAGGGC AAACATCCAA TCCTCAACAG
Jaagsiekte CCAAAACAGA TGGCTATTTC CACTCTCCAG GGTATTGGCC AAACTACCAA TCCAGAACAG
RDolphinI CCCATTAGTC CTACTATTAC AGAACTTCAC AGAATAGGAC AAAGTTCTTC TCCTATGCAA
WFDeerI AATAAGCAAG TAGCCATCTC CACCTTACGG GAAATTGGCC AGTCACATAA CCCCGAACAG
CaribouI AATAAGCAAG TAGCCATCTC CACCTTATGG GGAATTGGCC AGTCTCATAA CCCTGAACAG
GiraffeI CCCACACGTC TGACCAATGC TGCCTTGGTA GGAATAGGGT CAGTGCCCTC GGTTGCTAAG
BisonI CCAATGCATA CAACTGAAAA TGACTTGGTG GGAATAGGGA GAGCCCAGGC GGTAGCTAAA
MuskoxI CCCACACATA CTACTGAAAA TAACTTGGTG GGGATAGGAA GAGCCCCCAC AGTAGCTAAA
CFBadgerI CCTCTCGTAG ATTCAGATAC TTCTATTAGA GGTGTAGGAC AAGCTTGTGC TCCTAAGTGC
HRV5 ?????????? ?????????? ?????????? ?????????? ?????????? ??????????
GoatI CCAGTTCAAA GATCAGATAC TTGCCTGAGA GGCCTTGGTT ACTCTGAAGG CCCAAATAAA
IAPSHamster CCCACTGTTG TTTCATCTCA TTCTCTACAA GGTCTTGGAT ACCAGTCCTC TCCTGCCATC
IAPCHamster CCTATACAGG TTTCTTCTCA AAGTTTACAA GGTTTAGGCT ATGCTAAGGC TCCTGATATG
IAPMouse CCCACCACAG AGTCATCTCA TTCATTACAG GGCCTAGGAT ATCAATCATG TCCCACTATA
MastomysI CCTGTACAAG CCTCTTCTCA AACACTCCAA GGCCTAGGCT ATGCAAAAAC TCCTGATATG
MyomysI CCCATACAAG CTTCTTCTTA GATGTTGCAA GGACTTGGCT ATGCAAAAAC TCCTGATATG
UromysI CCAACTACTG TTTCCTCTCA TTCCTTACAA GGACTGGGAT ATGAAGCTAG CCCTGCTGTT
PrairiedogI CCGGTGCAGC TTTCAGACCA GTCTTTAAGA GGTTTAGGAT ATGATCAAGC TCCACAAGTC
RabbitI CCTACAAACG TGGCAGAACA AACTTTACAA GGACTTGGTT TTGCCCACTT CCCTAGGGTC
NileRatI CCGACCCAGA GGTCTGCCCA AAACCTCCTG GGATTAGGCT ATGAATCCTC CCCCATGATG
MMTVBR6 CCCATTCACC AAACTGAGAG TTCTCTTCAA GGTTTAGGCA TGGCCTGTGG GGTGGCGCGT
SDunnartI TCACTTCAAA CGCAATCAG- -TAAGTTAAT GGAGTGGGTG GCATGCAGCT TGCTTCCCAG
RKangaI AAGGTACAAA AGGTCAGTCA CTCTGTTGCG GGGGTAGGAG GTAACCAACT CGCCCATAAG


PlatypusI CACCTACAGC AAGGGAGTGT CCCGTTGGCA GGAATAGGAA CAGTAGAAAC ACCCGCGCAA
EchidnaI CCACTGCAAA CCGGGGCTGC CCCGTTAGAA GGCATCGGAG AGGTTACCCT CCCCTCTCAA
ArmadilloI ?????????? ?????????? ?????????? ?????????? ?????????? ??????????
PFalconI CCGCTAGCCA ATATTCCCCA AGCGTTAGCT GGAATTGGCG GAACTGGTAA AAGCCACCAG
PrairiedogII CCGCTAGCCA ATATTCCCCA AGCGTTAGCT GGAATTGGCG GAACTGGTAA AAGCCACCAG
MHarrierI CCTCTGGCCA ATGTTTCCCA AACATTGGCA GGAATCGGCG GAACTGGCAG CAGCCACCAA
VultureI CCCTTAGCCA ACGTCTCACA AGTGTTGGCC GGCATAGGGG GAAACGGCTC CAGCCATCAA
ETinamouI CCCCTAGCCA ACGTCTTACA AGTATTGGCC GGCATTGGAG GAAATGGCTC CAGCCATCAA
MoorhenI CCTTTGACTT CTACCCCTCA GGCCCTTGCA GGGGTGGGAG GAACCGGCAC AAGTCGGCAA
HThrushI CCCCTTACCG AGGTACCCCA TGCCCTCACC GGAATTGGGG GGGTCAGCAA AAGCTTTCAA
ESOwlI CCATTAGCTC AGGTAACTCA AGTGCTTTTC GGGGTTGGTG GTCATGCTGT AACTTGTCAT
RNPheasantI CCCACCTCAC AGGTTCATAC CACATCAGTG GGTATAGGAG GCCAGACTGT CCGTGTGCAG
PeacockI CCCACCTCAC AGGTTCGTAC CGCACTAATG GGTATAGGAG GCCAGACTGT CCCTGTACAA
BluetitII GAACTACAAC CAGTGGCAGG AAAAATACAA GGTGTTGGAG GGATGAAACT GGCAAAAATT
CMagpieII GGGTTTCAAC CAGCGGAAGG TATTGTTACA GGTGTGGGAG GATCTGTGGC TGCCCAGAGA
PenguinI GAAACCATCC CGGTCAACAC GGGGCTAGTG GGAATCGGAG GCCTGTCGAC GTCCAGGCAG
GSKiwiI GCAACAACGC AGGCGAGCAG TGGCCTCGCC GGAATCGGCG GGGTATCCAT CCCTCGGCAG
LKiwiI GCAACAACAC AGGCGAGCAG TGGCCTCGCC GGAATCGGCG GGGTATCCAT CCCTCGGCAG
HThrushI CCGATGGCCT CACTTGGCCA GGCCATTGCA GGAGCGGGGG GTACAGCACA GACCTTTGTT
ESOwlII CCAATTGTAA ATATGGGAAT GGGAGTGGTA GGACTGGGAG GGCCAACACA AGTAGAAGTA
WFgooseI CTGTTGGATT CAACTGCAAA AGGCCTCATG GGGGTGGGTG GTGCTTCCAA TACTTGTCAA
NABDuckIII CCTTTAGAAT CAACCACGAA GGGACTTGTA GGAGTAGGAG GATCATCGCA AACCTATCCG
FlamingoI CCGCTGGACC CTACTACGGG GGGCCTCGTA GGAGTGGGAG GTGTCTCTAA CACCACTTAT
GRheaII AGTCTACAGG AAGTGTCAAC ACCTGTAACA GGTATTGGAG GGCAGTCAGC AACCAAGGTT
DRheaI AGTCTACAGG AAGTGTCAAC ACCTGTAACA GGTATTGGAG GGCAGTCAGC AACCAAGGTT
BrownKiwiI GCGCTAGTGG ATAGCACCAC TGCAGTACAG GGGATTGGTG GTTCAGCCCC CACCTGGAGA
RBowerIII ??????A?AC ACCAGTAATG GGGGTGGGGG GCGTGCGAAT GACCCGAATT
ToucanetteII CCCCTTGTGC CAGCGCCAAC AAGCATTATG GGCGTAGGTG GATCGCAAAG CACAATGATA
GWoodpeckerI CCATTGCTGA TAGCAAATTC TGTAGTCATG GGTGTAGGAG GGGTGCAAAC AACTTGGGTC
ESOwlII CCATTGAATA ATCCTGCTCA TGGGCTAATG GGGGTGGGCA GAGTGTCAAC CACAATGCAG
GRheaI CCACTGCAGG TGTTACAAAT GTCTATCACG GGGATTGGCG GGTCTCAGCC CACTTGTGTA
JQuailI CCTTTGTTGG CAACCACCTG TGGGTTATTG GGGATTGGAG GTTCATCACC CTCTGTCTCA
CassuaryI CCTTTAAAGA ATCCAACTGC GGCTATTGTA GGTGTTGGTG GAATGCAATT CGCCAAACAA
AZMagpieII CCCACAGTTG ACCCTGCCTC AACCCTGGTT GGGGTAGGAG GGCTTCAACG CCCTCGCCAA
CMagpieIII CCCACAGTTG GCCCTGCCTC GATCCTGATT GGGGTAGGAG GGCTTCAGTG CCCTCACCAA
MThrushI CCAACC---- --------TC AAATCTGATT GGGTTAGGAG GGCTGCAACA CCCTCAGCAA
CMagpieI AAATTAATAA CCCCCCCAGG CGCTCTCACA GGCATTGGAG GTGTCACTCC ATGATTGCAA
GuineafowlI CCTTTGGAAG CTTGTCCCTC TGGCCTTTCG GGCATGGGTG GAGTACAGGT AGCTAAGCGT
PartidgeIV CCTACCAATG CTTCCCCATT GGCTATCTCT GGGGTTGGAA GTCGGCAGTT TGCGTCCAGG
HThrushII CCTTTGCGAC AATTGAATGA AAGTCTTTTG GGAGTTGCGG GTGTTACATC AACATGGCAA
GPheasantI CCTGTGACAG CATCTACAGC CATGGTTAGA GGGGTTGGCG GGATTTTGGT GGGACAGAGT
GPheasantII CCTTTAGAGA CCACCACTCG CATGGTAAGT GGGGTGGGTG GCTCAGCCTA CGGACAGCAG
NABDuckI CCGCTCTGGC CTTTTCTGAA ATCTGTACAA GGTGTGGGTG GGAAGGAGTA TGGCAGTTGG
LoonI TCTGTAGAAC TGCCAATGTC ACCCATTACA GGCGTGGGAG GACAATCCAC ACCTTATATC
OstrichI CCCCTGTTTC CATCAGGTAC TACGGTTTCT GGAGTTGGGG GTATGACATT TGCTCAGAGA
MoorhenII CCGTCGTATG CTTCCTCTCA CACGGTTATG GGGGTTGGAG GACTCACGTT AGCGAGAAGA
GoshawkI CCAACAAGGA CTGTTGCTGG AGGCGTGGAA GGAGTTGGAG GAACAGTTAG TATTCAACAA
PigeonI CCAACAAAAA CTATGGATAG AGGCGTTGAA GGAGTAGGAG GTCATTCATT CAGTACCGTC
FHawkI CCAACAAGGA CTGTTGCTGG AGGCGTAGAA GGAGTTGGGG GAACAGTTAG TATTCAACAA
BGrouseI CCGACGGAGT CGGTCGGAAA GGGGGTGGAA GGAGTTGGTG GAACTGTTCC GGTGTCTCGC
BGrouseII CCACTGCAAC CTTCTACTAC GACGGTCTCG GGCATCGGGG GTATGACGCT GGCTAGTGAG
PeacockII CCCCTTGAAA GGATGGTGAA AGGGGTGGAG GGAGTGGGTG GAGCGGTTCC AGTAACTCGA
NABDuckIV CCTACATTCG AATCCAGTAC CACTGTAGCA GGAGTAGGAG GTATAACTAT TGCTCGTCGA
BluetitIII CCTATGTTGC 
CTAGTATGGT TACTGTTACA GGGGTGGAAG GTTTAACTTT AGCAAAAAGG Toucanettel GAATTA---G TAGAAGCCTC CTCAGTGGGC GGAATCGGAG GACTGGTTCG AGCCCGAAAG NABDuckil AAATTG---G AAGACGCCCC TATGGTGGGT GGAGCTGGGG GATTAACCCA AGCTCGGAAG HThrushIII AAGCTA---G AAGAAGCCCC CATGGTGGGC GGAGTCAGAG GACTGTCCCG AGCCCGCAAA BlueTitI AAACTA---G AGGATGCCCT CATGGTAGGC GGGGTTGGAG GGCTGTCATG GGCTTGGAGA T7 AAACTA---G AGGATGCCCC CATGGTGGGC GGGGTTGGAG GGTTGTCATG AGCTCGGAGA MHarrierII CCTACT------GAGCCGGG ATCCCTGGAA GGCTTAGGGG GGCAGACGGG GTCACAGCGT Emul CCAGCA---A AAGTGGCAAG CTCGTTAGAT GGAGTGGGAG GCAGGGCCGG AGGCTATGAT LKiwiII CCGACG---G ACCAGGGAGG AAGTCTCACC GGGGTGGGAG GCAACGCGAA GGCGTACCGC LDV CCTGCT---A TTCCGACCGT TGGAGTGGAA GGAGTAGGGG GACTAACTAA TGCATCCCGC RSV CCAGTGGAGG CCGCGAACCC GCAGATCCAT GGGATAGGAG GGGGAATTCC CATGCGAAAA ALVsubgrpJ CCGGTGGACA CCGCGAACCC ACAGATCCAT GGCATAGGAG GGGGAATTCC CATGCGAAAA Tragopanl CCTACA---G GAGCAACTCA AATGATAGCG GGAGTGGGGG GAACCATCCC TACTCGAAAA GuineafowlII CCAACA---G GCCCACTTCA AATGATCAGA GGGGTGGGTG GAGTTATAAC AGCGTGGAAA LoonII CCCCTTGTGC CAGCGGCAAC AAGCATTATG GGCGTAGGTG GATCGCAAAG CACAATGATA

Appendix 2. Nucleotide alignment (continued) BLV AATTGGCTAC AAGGCCCTCT GACC[CTGGCT CTA]AAACCAG AGGGTCCCTT TATCACC[--- HTLV1 AAGCTCACCT CCCTTCCTGT GCTA[ATACGC CTC]CCTTTCC GGACGACGCC TATTGTT[--- HTLV2 AAACTCCTCC AAACCCCCCT ACAC[ATATTC TTG]CCCTTCC GAAGGTCCCC CGTTATC[--- HIV1 TATGATCAGA TACTCATAGA AATC[---TGC GGA]CATAAAG CTATA GGTACA[--- HIV2 TATAAAAATG TAGAAATAAG AGTA[ TTA]AATAAAA GAGTA---AG AGCCACC[--- SIVagm TATAGTGATA GGGAAGTAAG ATTG[ GAA]GACAAAA TTTTG---AG AGGGACC[--- FIV TATATTAATG TACATTTAGA GATT[ AGA]GATGAAA ATTATAAGAC ACAATGT[??? BIV AAGACCCACG TAGAGTGGAG ATTT[ AAG]GATAAGA CT GGGATA[--- Jembrana AAAAGTAATG TAGAGTGGAG ATTC[ AAA]AACAGAT AT GGCATA[--- CAEV TGGAATAATG TAGAATTAGA ATAT[ AAA]GGAGAAA CAAGAAAG-- -GGAACA[--- Visna TGGGAACAAG TACACTTGCA ATAT[ AAA]GATAAAA TGATCAAA-- -GGTACC[--- EIAV GTGGAAACAT TTTCTACGCC TGTG[---ACT ATA]AAGAAAA AGGGTAGACA CATTAAG[--- HERVK10 AGTATGGAGA TTTTACATTG CTTA[ ]GGGCCAG ATAATCAAGA AAGTACT[--- HML1Z70280 AGTTCTTTGA TTTTACCATG TCAA[ ]GGGCCGG ATGGTTAGGA AGGGATA[--- HML3ACO25577 AGTAGTTATA TTTTGCATTG TGAA[ ]GGGCTCA ATGGACAACC TGGGACT[--- HML3AC092364 AGTAGTTAAA TTTTGCATTG TGAG[ ]GGGCCCG ATGGACAATC TGGGACT[--- HML4AC093517 AGTTCCACAA TTTTACATTG CACA[ ]GG-CCTG AAGGACAGAT TGGAACT[--- HML5AP000870 CAGAGCACAC GCCCCCCA-A CCTG[ TTG]GATTCAG AGGGAAGAAA GGCAGTT[--- HML6AF069508 CTGAGACTTA TCCTGTCTCA GACC[ ]CACATGG ACAGTCATGT ACTTTCG[ACC HML7AC013722 7777777777 9777777777 7777[777777 777]7777777 7777777777 /777777[777 HML8AL596245 ?????????? ?????????? ????[?????? 977]7777777 77777777" 77777[79.) 
HML9ACO25569 AGTAGGCATA ACCTTCCACT TTTG[ ]GGCCTGG AAGGCAAGGT TGCCCAG[--- HML9AC068700 AGTAGGCATA TCCTTCCACT TTTG[ ]GGTTGGG AAGGCAAGGT CGCCCAT[--- Oryzomysl AGTTCAAGAT GGCTTGAATG TATT[ ]GGACCAG AAGGACAGAG AGGAATA[--- AxmadilloII AGTACTCAGA TATTTAAGGT AGAA[ ]GGTCCGG AAGGACAAGA GGCCGCT[--- Reedbuckl AGTACTCAGA TATTTAAGGT AGAA[ ]AGTCCGG AAGGACAAGA GGCTGCT[--- Muspaharill AGTGTCCAAT GAATTACCTG TGTG[ ]GGACCAG AAGGTCATAC AGGGAAG[--- NileRatII ACCAAACAGA CCTACAGGTA GGAA[ ]GACCCAG ATGGAGCTAC CGGTCTC[--- HedgehogI ACATGAGATT CGCTCATGTG GGAA[ ]GACCTGG AAGGAACTAG TGGACAC[--- Cougar' AGCCTCACCT TATACAAATG TTTA[ ]AGGCCTG ATGACCAAAT AGCCTAC[--- DomesticCatl AGCCTCAGCT TATGCAGATG TTTA[ ]GGGCCTG ATGACCAAAT AGCCTAC[--- T22 AGCTCTGCTG TATTAAGGGT TACT[ ]GGGATTG ACGCTACAGT AAGTGCG[TTT Sheepl TATGTTAGTT CATCTTACTT ACAG[---GCC CTG]GGCCCTA ATCAATTAGT CGCTTAC[--- IPSquirrell TATGTTAGTT CATCTTACTT ACAA[---GCC CTG]TGCCCTG ATCAATTAGT CGCTTAC[--- SIMongooseI CAAGCATTGC AGTTACTGAA GGTT[ ATT]GGGCCTA ATGGTGAGGA AGCT---[--- MPMV AGTTCTAAAT ATCTTACTTG GAGA[ ]GATAAAG AAAACAATTC TGGTCTC[--- SRVI AGTTCTAAAT ATCTTACTTG GAGA[ ]GATAAAG AAAATAATTC TGGTCTC[--- SRVII AGTTCAAAAT ATCTCACTTG GAAA[ ]GACAAAG AAAATAACTC AGGCCTT[--- BabboonSERV AGTTCTAAAT ACCTAACATG GACA[ ]GATAAAG AAAACAATTC AGGCCTC[--- MusD AGTTCCAAAT TGCTAACCTG GAGA[ ]GATGAAG ATGAAAAATC AGGAAAG[--- TvERV TCAACACAGC TGTTGCAATG GGAA[ ]GATCGTG AGGGTAATAG AGGAACA[--- SMRVH AGTGCTCAAA TGCTTAAGTG GGAGI ]GACTCTG AAGGCAATAA TGGTCAC[--- ColobusI AGTTCCAAAT ATCTCACCTG GAGA[ ]GATCAAG AAAGTAATTC CGGTCTC[--- Slmongoosel AGTGCAAGAT TACTTGCGCT GGCA[ ]GGATGCA GAGGGAAATC AGGGGAC[--- GoatIl AGTTCCAGAC TGCTCAAATG GGAG[ ]GACTCAG AAGGTCATTC AGGCACT[--- OstrichD TGCTCAAAAA TTCTTACTTG GAAG[ ]GACTCTG AGGGTAACTC AGGACAG[--- Lorisl --TGCTGAAA GCCTCAAATG GACA[ ]GATGAGG AAGGCAACAC TGTTACT[--- SIMongooseII AGTTCTACGA TTCTTATCTG GTCA[ ]GACGAGG AAGGCAATAC AGGTTCT[--- Jaagsiekte AGTTCATCCC TTCTTACTTG GAAG[ ]GATAAAG ATGGACATAC AGGCCAA[--- RDolphinI AGTAGTCAGT TTTTTTTATG GCAA[ 
]GATAGTG AGGGCCATTC AGGATAT[--- WFDeerI AGTTCTGAAT TGCTTAGGTG GAAA[ ]GATGCTG AGGGTCATGA AGGATAT[--- Cariboul AGTTCTGAAT TGATTAGGTG GAAG[ ]GATGTT- --GAGGGTCA TAAGGAT[--- Giraffel AGCTCACAAA TTTTGACATG GTCA[ ]GATGAGA AGGGGGCACA AGGCACT[--- Bisonl AGTGCAAAAA TATTAGATTG GCAG[ ]TTTGAGA ATACCTGT-- -GGAACT[--- Muskoxl AGTGCAAAAA TTTTAGATTG GCAA[ ]TTTGAGG ATACCCGT-- -GGAACT[--- CFBadgerl TTTTTTTTTT TTTTAAATTG GCCC[ ]ACTGCTG AGGGTCATCA AGGAGTA[--- HRV5 7777777777 7777777777 7777[777777 777]7777777 7777777777 7777777[777 GoatI AGTTCCCAAG TATTAAAATG GAAA[ ]GATGAAG AGGGTCATTC GGGTACT[--- IAPSHamster AGTGCCTCAG CCTTAACCTG GCGG[ ]GATGCTG AAGGCAAACA GGGATGT[--- IAPCHamster AGTGCTAGAC AATTGCCTTG GAAA[ ]GATCAGG AAGGGCATTC AGGGACC[--- IAPMouse AGCTCCGTTG CCTTGACGTG GGAA[ ]TCCTCTG AAGGACAGCA AGGGAAA[--- Mastomysl AGTGCAAGAC AATTAAAGTG GCAG[ ]GACGAGG AGGGTCACTC AGGACAT[--- MyomysI AGTGTGAGGC AATTAAGGTG GCAA[ ]GATCAGG AAGGGCACTC TGGAGTG[--- UromysI AGTTCTCGCC TATTACGCTG GCAA[ ]GCCCCTG AAGGCCAAGT AGGACAG[--- Prairiedogl AGTTGCAGAC ATTTATCATG GAAA[ ]GATTCTG AAGGACATTC TGGCACT[--- Rabbitl AGTGCAGCTC AGTTACCGTG GCAG[ ]GATAGCG AGGGTCATAA AGGACAG[--- NileRatI AGCTCTAAGG AGCTGGTGTG GAAA[ ]AACCAGG AGGGGAAAAC AGGAAGG[--- MMTVBR6 AGTAGTCAGC CACTCCGTTG GCAA[ ]CATGAGG ATAAATCA-- -GGAATT[--- SDunnartl TCTAGTAGAG TACTTCATTG GGAA[ ]TTTGAAA ATCAGAAA-- -GGTATT[--- RKangal GCCTTAAGAA GGATGAAGTG GCAG[ ]CACGAAG ATGACGAG-- -GGCTTG[---

Platypus' AGCAGTAAGT TATTAAAATG GGAA[ ]-GCAAGG AGCAACAA-- -GGATAT[--- Echidnal AGTGCCGCTT ACCTACACTG GGAG[ ]CTTGGTA GCCGATCA-- -GGATAC[--- Armadillol vv,v[o7v7w, '7.,]??????? TV77"›,,??? 7')'>77"P>p*), PFalconI AGCCTGGAGC TTATCCAAGT TCAA[ ]GGCCCAG AAGGGCACAT TGCTTAT[--- PrairiedogII AGCCTGGAGC TTATCCAAGT TCAA[ ]GGCCCAG AAGGGCACAT TGCTTCT[--- MHarrierl AGTGTGAATT TAATCCAAAT TCAA[ IGGTCCAG AGGGGCATAT AGCCTCC[--- Vulturel AGTCTGGAAA TAATCCAAAT AAAA[ ]GGCCCGG AGGGACATGT TGCCTCT[--- ETinamoul AGCTTGGAAA TGATCCAAAT AAAGI ]GGCCCGG AGGGACATGT TGCCTCT[--- Moorhenl AGCCTCCACT CCGTCACTAT TGAA[ IGGCCCAG AGGGACGCAT AGCCACA[--- HThrushl AGCCGAAATC TCATTCAAGT CACA [ ]GGACCAG AAGGTCTCAT TGCTTTT[--- ESOw1I TCACGTAAGG GGCCCAGAAA ACCG[ IGGTAGTG AATATTCGTC CATTTAT[--- RNPheasant I AGTGCACACG TGATTCAAGT AGTG[ ]GGCCCCA AAGGCATAAG AGCCTGT[--- Peacockl AGTGCACACA TGACTCAAAT AGTT[ IGGCTCCA AAGGCAGAAG AGCCTGT[--- Bluetit II GCTAGAGATA TCGTCCAGAT CGAA[ ]GGTCCGG ATGGTAGGAT TGCTAGT[--- CMagpieII AGCAGGACTC TTGTCCAGAT TGAG[ IGGCCCAG AGCGGCGGAT TGCAACA[--- Penguinl AGTGCCAACT TAATACAGAT TGTAI IAGCCCGG AAAGACAGGT TGCAACT[--- GSKiwiI AGCGCCCAGA TTATACAGGT GATC[ IGGGCCGG AGGGGCAAGT TGCGAAC[--- LKiwiI AGCGCCCAGA TTATACAGGT GATT[ IGGGCCGG AGGGGCAAGT TGCGAAC[--- HThrushl AGTCAAGGGC CTGTGTTAGT CAGG[ ]AATCTTC AAGGGCAGAC GGCTACT[--- ES0w1II GCAGCTACGC CTATTCTGAT CACC[ ]AATCCTG AGGGTCAACA GGCAACC[--- WFgoosel AGTAAATACA TGCTTACCAT TAAA[ IACTCACA AAGGTTCGGC ATACCAC[--- NABDuckIII TCCAAATACA AGCTAACAAT CAAA[ ]ACACAAG AAGGCTCTAC CTTTTCC[--- Flamingol CAAAGTAAAC ATATGCTTTC CATC[ATT--- ]ACTCGCG AAGGTTCTAC GTATACC[--- GRheaII AGTCAATCAT TTGTAACAAT TACT[ ]GACACAG AAGGGAACAT GGCACAGI--- DRheaI AGTCAATCAT TTGTAACAAT TACT[ IGACACAG AAGGGAACAT GGCACAGI--- BrownKiwiI AGTGCGAGAC CCGTGAAGAT CTGC[ ]GACGAAG AAGGTCATTT GGGCGTG[--- RBowerIII AGTTCAACCT TTTTAACAGT GTGG[ -GTTG AAGGAGTTTG TGCTCAG[--- Toucanettell AGTAAACTTC CAGTTTCTCT GAGG[ ITTTCAAG ACAGAGCAAT AGTAGCG[--- GWoodpeckerl AGCAAAGAGC CACGTGCAAC TCACI ITTTTCAA 
GATGGCAAGT CAATCTC[--- ESOw1II AGTGCGTTTA CAGTGCTGTG TGAA[ ]AACCTTG ATGGCCTTGT CTGCTCA[--- GRheaI AGTGAGAACT TCATAACAAT ACAA[ ]GATAATA AGGACTTAGC TCTTACT[GCA JQuailI CGCAGTGAAG CGCTTTTAGT CTGC[ ]AATGGAC CAGGCTGAAC CCTTTGC[ACA Cassuaryl AGTGCCCAGC CGTTATTGGT GTTG[ ]GGCCCCG AAGGACAGCG GGGCCAT[--- AZMagpieII AGTGCCCACC TTCGCCTTGT TCAT[ ]GGCCCTG ATGGGCAGAC TACTCGC[--- CMagpieIII AGTGCTCACC TTCACCTTGT TCAT[ ]GGCCCCG ATGGGCAGAC TGCCCGC[--- MThrushl AGTGCCCACC TTCCCCTTCT TCTC[ IAGCCCCC ATGGGCAAAC TGCCCAC[--- CMagpiel AGCGAATCTG TAATCAGTAT TACA[ ]GGACCGC AAGGAAAAAA AGCATTG[--- Guineafowll AGTCACCTGC CGGTTACAGT GCAC[ ]CTTATTG GGGAAAATGA ACAAGCG[CAG PartidgelV AGTGTTAATG TTATTCAGGT CAGGI IATCAGGG GGGAGATGGA GGACGCT[ACT HThrushII AGCCTAGAGC CCATCTTAGT CAGA[ IGACAAGG AGGGGAATGA GGCTGTA[--- GPheasantl AGTGTATATC CCGCTATTCT TAGT[TATCAA ]GATGATG ATGGCAGCTG GAAGGAG[TTG GPheasantll AGCCAGCATC CCATGGTGGT GGAA[TGGGTC ]GATGACC ATGAGCCTTG CCGATCT[GGC NABDuckI AGCAAACACC CCATCATGTT CTCC[TGGGTG ]GACAATA GTTTGACCGT GGAGGCT[GGA LoonI AGTAAACAAC CAGTACAGAT AAAA[ ]TTTCCAG AAGGCCAGCA AACAAGC[--- Ostrichl ACACCTACGT TAAAAGTAAA GCTG[ AGGGTTGTCA AGTTTCT[--- MoorhenII ACACCTCCCG TAACCCTCTG CCTC[ ATGAGCAACA AGTGACA[--- Goshawkl AGTCAGGATA GGATTGTAGT TTAT[ ] -CTGG ATGGAAGAAA AACCAAT[--- Pigeonl AAGTATTTGA CGGTAATTCT GGTC[ ]ATCATAG ACGGGAGAAA AGCCAAT[--- FHawkI AGTCAGGATA GGATTGGAGT TTAT[ ] -CTAG ATGGACGAAG AACCAAT[--- BGrousel AGTTTGGATC CAATTATGGT GACA[ I -TTAG AAAACCACTC TGCGCAA[--- BGrousell ACCCCCCTCA TTACTGTGGA TATT[ ATGGTAGAAG AGCTGCA[--- Peacock'' AGTATACATC CTATCCAGGT CACC[ ] -ATTG AGGGACGATT TGCCCAT[--- NABDuckIV TCACCAGTAC TGCAATGGAC AATA[ GTGACAAAGT TGTAAAA[--- BluetitIII TCTCCACAAA TCCAGATTCA ATTA[ ATGGGAAGGT CATCAGC[--- Toucanettel AGTGCTCAAT TGGTAGCAAT TACA[ ]CTTCACA CCGAAAAGGG ACCAGAG[--- NABDuckII AGCGCCCAAT TGGTTGCAAT CACA[ ]CTCCATA CTGAAAAGGA ACCAGAG[--- HThrushIII AGCACCCAGC TGATAGCAAT AACA[ ]CTCTGCA CCGAGAGAGG ACCAGGA[--- BlueTitI AGCACACAGC TAATAGCCAT TACA[ ]CTCCACA CAGAAAAAGG 
GCCAGAA[--- T7 AGCGCACAGC TAATAGCCAT TACA[ ICTTCATA CAGAAAAAGG CCCAGAG[--- MHarrierll TCACTGAGTG CCCTTTGGGC TTCT[ ]ATAGTAG ATAGTGACGG TACAGTT[--- Emul TTCGAAACCG ATATCACAAT TGTAI ITTGGAGG ACCCGGATAG GCCTGCT[--- LKiwiII TCGCGGTTCG CATTGAAACT CACC[ CTGCATG ATCCTGACGC AAAACAA[--- LDV AGCGAATTTT TGTTGCAAGG CCGT[ I ATAAGAA GGGAGAAGGA AGAA---[--- RSV TCTCGTGACA TGATAGAGTT GGGG[ ]GTTATTA ACCGAGACGG GTCTTTG[--- ALVsubgrpJ TCTCGTGACA TGATAGAGTT AGGGI IGTTATTA ACCGAGACGG GTCTTTG[--- Tragopanl TCAGCCAAGC CGGTTGAGAT CGCC[ IGTGATTA ATAGGGACGG GTCATTG[--- GuineafowlII TCCCTAGCAA AGGTAGAGAT CGTT[ ]GCAGTTG CCCGCGATGG GACTTTG[--- LoonII AGTAAACTTC CAGTTTCTCT GAGG[ ]TTTCAAG ACAGAGCAAT AGTAGCG[---

Appendix 2. Nucleotide alignment (continued) BLV ] ATCCCAAAA ATTTTAGTT[- ]GACAC TTCCGACAAA HTLV1 ] TTAACATCT TGCCTAGTT[- ]GATAC CAAAAACAAC HTLV2 ] CTTTCCTCC TGCCTCTTA[- ]GACAC CCACAACAAA HIV1 ] GTATTA GTAGGACCT[- ]ACACC T HIV2 ] ATAATG ACAGGTGAT[- ]ACCCC A SIVagm ] ATATTG ATAGGAAGC[- ]ACTCC C FIV 777779]77" 77797777" ""GATAAC TCATTAATA[- ]CAACC A BIV ] ATTGAT GTCTTGTTC[T CAGAT]ACTCC T Jembrana ] GTGGAC GTCCTGTTC[T CCAAC]ACTCC A CAEV ] ATAGTA GTGTTACCA[C AA -]AGTCC A Visna ] ATAGTG GTGTTAGCT[A CG- -]AGTCC G EIAV ] ?? ????ATGCTA GTGGCAGAT[- ]ATTCC A HERVK10 ] GT TCAGCCAATG ATTACTTCA[- ]ATTCC T HML1Z70280 ] AT CCAACCTATC ATTACACCT[- ]ATTCC T HML3ACO25577 ] AT TCAACCAATT ATAATTTCT[- ]GTACC T HML3AC092364 ] AT TCAACCAATT ATAACTTCT[- ]GTACT T HML4AC093517 ] AT TCGCCCCCTC ATTACACCC[- ]ATTCC T HML5AP000870 ] AT ACAACCTCTA ATCATGCCT[- ]ATCCC T HML6AF069508 T ] TATGTTGCA[A AT- -]ATAGC T HML7AC013722 ??????]???? ?????????? ????????77 777777777[7 77777]77777 7777777777 HML8AL596245 77777717777 7777772777 7777777777 7777777??[7 ?????]????? 
?????????7 HML9ACO25569 ] GT TCAGCCCTTT ATCTTAGAA[- ]ATTCC A HML9AC068700 ] CT TCAGCCCTTT ATCTTAGAA[- ]ATTCC A Oryzomysl ] TT AAAGCCATAT GTGGCTAAC[- ]ATTGC T ArmadilloII ] CT GCAACCTTAT GTTGCAGAT[- ]GTAGG A Reedbuckl ] CT GCAACCTTAT GTTGCAGAT[- ]GTAGG A Muspaharill ] TT GAGACCTTAT GTGGCTGAC[- ]ATATA C NileRatil ] AC ACACCCCCTG GTGGCTCAA[- ]GTAGC C Hedgehogl ] TT CCACCCTTTG GTAGCTGAT[- ]ATTAG C Cougarl ] AT AGAGCCCTAC GTGATGAAT[- ]ATCCC T DomesticCatl ] AT ACAGCCCTAC ATGACGAAT[- ]ATCCC T T22 ] AT CCAGCCCTAT ATATTAGAC[- ]ATTGA A Sheepl ] AT TAAACCGTAC GTTGCCCCA[- ]TTACC G IPSquirrell ] AT TAAACTGTAC ATTGCCCCA[- ]TTACC A SlMongoosel ] TT CATGCAATCA TCGATCTTG[G AG---]ATTCC C MPMV ] AT CAAACCGTTT GTTATTCCT[A AC---]TTACC T SRVI ] AT TAAACCGTTT GTTATTCCT[A AT---]TTACC T SRVII ] AT CAAGCCATTT GTCATCCCT[A AC---]TTACC G BabboonSERV ] AT TAAGCCATTT GTCATCCCT[T AC- -]CTACC T MusD ] AT TCAGCCATAT GTTATGTCA[A AT---]TTGCC T TvERV ] AT CCGTCCCTTT GTAGTCCCA[T GC---]CTCCC C SMRVH ] AT TACCCCTTAT GTCCTCCCC[A AT- -]CTGCC A Colobusl ] AT TAAGCCATTT GTTATCCCT[G AT---]TTGCC T SImongooseI ] AT TCAGCCTTCT GTTATCCAA[G GT---]TTACC T GoatII ] GT TCAGCCCTAT GTTGTTGAA[A AC---]CTCCC T OstrichD ] GT TCAACCTTTC ATTGTTCTG[G GT---]TTACC T Lorisl ] GT TCAACCTTAT GTTGTGCCT[G GC---]CTCCC A SIMongooseII ] GT ACAGCCATAT GTAGTCCCT[A AT---]TTGCC T Jaagsiekte ] TT TAAACCTTAT ATTCTGCCC[T AT- -]CTTCC A RDolphinI ] TT CCAGCCTTAT GTTTTGCCT[G GG- -]CTGCC T WFDeerI ] AT CCAGCCTTAC ATTTTACCT[A AC- -]ATTCC T Cariboul ] AT ATCCAGCTTT ACATTTCCT[A AC---]ATTCC T Giraffel ] TT TTGCCCATAT GTGATTCCT[T CA- -]CTGCC T Bisonl ] TT CCAGCCTTAT GTAGTTCCT[T CA---]CTCCC C Muskoxl ] TT CCAACCTTAT GTCGTTCAT[T CG---]CTCCC C CFBadgerl ] TT CCAAACATTT GTGCTACAA[- ]ATTGA T HRV5 777777] T GCAACCTTAT GTTAGTGCA[- ]CTCCC C Goatl ] TT TCAACCATAT ATCTTGACA[T CT---]CTTCC A IAPSHamster ] TT TACCCCCTAT GTGTTGCCA[- ]CTCCC T IAPCHamster ] AT GCAACCTTAT GTGTTAGAC[- ]TTACC A IAPMouse ] TT CATACCTTAT GTGCTCCCA[- ]CTCCC G Mastomysl ] AT GCAACCATAT 
GTTCTTGAG[- ]CTCCC T Myomysl ] AT GAAACCCTAT GTACTTGAG[- ]TTGCC A Uromysl ] TT TCTCCCGTAC GTTCTACCG[- ]CTTCC A Prairiedogl ] TT TCAACCATAT GTGCTTGAT[- ]CTTCC A Rabbitl ] TT TACTCCTTTC GTCTTGCCT[- ]CTTCC C NileRatI ] TT TACTCCGTAT GCTCTGGAC[- ]CTCCC G MMTVBR6 ] AT ACATCCTTTT GTGATCCCT[A CA -]CTGCC T SDunnartl ] TT TAGGCCTCTT GTCATTGCT[G GG- -]CTCAG G RKangal ] TT CCAACCCCTG AGAGTCAAA[G GA---]TTGCC T

PlatypusI           ] TT CCAACCATAT ATAGTTGAT[G GA- -]CTCCT G
EchidnaI            ] AT TCAACCATAT GTGTTAGAT[C GC- -]CTCCC C
ArmadilloI    ??????]???? ?????????? ?????????? ??,77"77 P "7"P"" 7"",""
PFalconI            ] GT TAAACCTTTT GTGTTACCT[- ]GTTCC C
PrairiedogII        ] GT TAAACCTTTT GTGTTACCT[- ]GTTCC C
MHarrierI           ] GT TAAGCCTTTT GTACTGCCT[- ]GTTCC C
VultureI            ] GT TAAGCCTTTG GTATTGCCT[- ]GTGCC T
ETinamouI           ] GT TAAGCCTTTT GTATTGCCT[- ]GTACC T
MoorhenI            ] GT TAAGCCCTTT GTCTTACCA[- ]ATACC A
HThrushI            ] GT TAAGCCATTT GTTTTGCCT[- ]GTACC T
ESOwlI              ] AT TGGAAACCCC TATTACATT[- ]ATGGG G
RNPheasantI         ] GT TCGT [- ]
PeacockI            ] GT TCGT [- ]
BluetitII           ] GT CCGACCTTTT GTAATTGAC[- ]TACAA G
CMagpieII           ] GT CCGTCCATTT GTTATTGAC[- ]TCTGG G
PenguinI            ] AT TCGACCTTTT GTGGTTCCT[- ]GTGCC G
GSKiwiI             ] AC CCGTCCCTAC ATAGTCAAC[- ]GTACC A
LKiwiI              ] AC CCGTCCCTAT ATAGTTAAC[- ]GTACC A
HThrushI            ] GT CCGGCCCTAC ATGACTGCA[- ]TCCCC G
ESOwlII             ] AT CAAGCCCTAT GTGATGACA[- ]GCCCC C
WFgooseI            ] GT TCGGCCATAT GCTGCCCCA[- ]ATACC T
NABDuckIII          ] AT ACGTCTGTAT GCCACTCTT[- ]ATCCC A
FlamingoI           ] GT TCGGCCGTAC ATCGCCCCA[- ]ATACC C
GRheaII             ] GT ACGGCCTTAT GTAATGTGT[- ]ACACC A
DRheaI              ] GT ACGGCCTTAT GTAATGTGT[- ]ACACC A
BrownKiwiI          ] GT AACACCATAC GTAATGAAT[- ]ACCCC T
RBowerIII           ] GT GAAGCCCTAT GTAATGAAC[- ]ACAAC A
ToucanetteII        ] AC TAGACCTTAT GTCTTAAAT[- ]TTGCC A
GWoodpeckerI        ] GT AACACCATAT GTGCTAACC[- ]CTTCC A
ESOwlII             ] GT TCGGCCCTAT GTTGCTCCT[- ]ATACC A
GRheaI        AAA---] AT AAAGCCATAC ATAGTGGAC[- ]ACCCC T
JQuailI             ] GT ACAGCCATAT GTAGCTAAA[- ]GTGCC A
CassuaryI           ] AT TACCCCCTAT GTTGTGCAA[- ]GCCCC T
AZMagpieII          ] AT CTCACCATTC ATTGCTCCA[- ]GTACT G
CMagpieIII          ] AT CGCACCATTC ATTGCTCCA[- ]GTACC A
MThrushI            ] AT CGCACCTTTC ATTGCTTCT[- ]GTCCC A
CMagpieI            ] AT CCGTCCCTAT GTAGTGCAA[- ]AAACC C
GuineafowlI         ] AC ACGCCCCTAT GTGGTGCCT[G GT---]ATACC A
PartidgeIV          ] GC TCGTCCTTAT ATTCTACGT[G GT- -]CTCCC A
HThrushII           ] AC ACGCCCTTAT GTTTTAAAC[- ]ACTCC G
GPheasantI    AGCCCA] AC GCGTCCATAT GTCAATGAT[- ]ATTCC G
GPheasantII   CCC- -] AT CCGGCTTTTC GCATAGAAG[- ]TGCCC A
NABDuckI      TGC- -] TT GTCTCCATAT GTGTTGGAC[- ]ATTCC C
LoonI               ] CT TAAGGTTTAT GTATTACCT[- ]TTGCC CGGAGTGCTG
OstrichI            ] GC TGTGTTTTCT GTGGTACAA[- ]CTACC ACCAACTGTG
MoorhenII           ] GT AGTCCTTTCC ATTGTGTTG[- ]CTGCC GCCCACAGTC
GoshawkI            ] GC CACAATCACT GTAATGTCC[- ]TTGCC TCATGGAGTT
PigeonI             ] AC ATTTGTTACA GTAATGACT[- ]CTTCC TGCAGGTGTA
FHawkI              ] GC CACAATTACT GTAATGCCT[- ]TTGCC TTATGGAGTT
BGrouseI            ] TG CAAAATTACA GCAATGCCC[- ]CTCCC AGTGGGTGTT
BGrouseII           ] GC GATATTTTCA GTAACGACG[- ]CTTCC TCCCACCGTC
PeacockII           ] TG CCGTATCACA GTGATGCCA[- ]CTTCC TGTTGGAGTA
NABDuckIV           ] TG CAGTATATCT GTCTTACCG[- ]TTGCC AGATGGAGTA
BluetitIII          ] TC TATTTTGTCA ATTGTTCCA[- ]TTACC AGACAAGGTT
ToucanetteI         ] AA AACCATCGCC CTTTTTCCA[T AT---]GTCAC ATCAGGGGTC
NABDuckII           ] AA AACCATCACT CTCTTCCTA[T AT---]GTCAT GTCAGGAGTC
HThrushIII          ] AG GACTATCACC CTTATTATA[T AC---]GTTAT GTCGGATTGC
BlueTitI            ] AA AACTATTGCC CTCTTCCTG[T AT---]GTCAT GACAGGAATC
T7                  ] AA AACTATTGCC CTCTTCCCA[T AT---]GTCAT AACAGGAATC
MHarrierII          ] CT TGCGGCTCTT GTAACTCCC[C AT---]CAGGC ACCCGTAGCA
EmuI                ] CA GATCATCCGT GTGCGGCCG[T AC---]ATAAC TGCCATAGGG
LKiwiII             ] GC GACAATCACA CCT[T AT---]GTCAC CGAAATTGCT
LDV                 ] AT CCGTGTGAGC CCGTATATA[- ]---GC AGCCATTGGG
RSV                 ] GA GCGACCCCTG CTCCTCTTC[C CCGCA]GTAGC TATGGTTAGA
ALVsubgrpJ          ] GA GCGACCCCTG CTCCTCTTC[C CCGCA]GTAGC TATGGTTAGA
TragopanI           ] GA AAGGCCCGTC ACACTCACC[C GT-TA]GTGGC GCAGGTACCG
GuineafowlII        ] GA AAAGCCAGCC CTGCTCGTC[C CACTG]GTCGC AGAGATTCCG
LoonII              ] AC TAGACCTTAT GTCTTAAAT[- ]TTGCC A

Appendix 2. Nucleotide alignment (continued) BLV TGGCAAATTT TAGGACGGGA CGTCCCTCCC GCCTAC---A GGCTTCTATC T[CTGAGGAAG HTLV1 TGGGCCATCA TAGGTCGTGA TGCCTTACAA CAATGCCAAG GCGTCCTGTA C[ HTLV2 TGGACCATCA TTGGAAGGGA CGCCCTACAA CAATGCCAGG GGCTTCTATA C[ HIV1 GTCAACATAA TTGGAAGAAA TCTGTTGACT CAGATT---G GCTGCACTTT A[ HIV2 ATCAACATTT TTGGCAGAAA CATTCTGACA GCCTTA---G GCATGTCATT A[ SIVagm ATAAACATAA TTGGAAGAAA TATATTAGCA CCAGCA---G GAGCCAAATT A[ FIV TTAT TAGGGAGAGA TAATATGATT AAATTC A ATATTAGGTT A[ BIV GTAAACCTTT TTGGGAGATC TCTTCTACGT AGCATA---G TGACTTGC- -[ Jembrana GTAAATTTGC TAGGCCGATC AGTACTGCAA AGTATA---G TGACAAAA-- -[ CAEV GTAGAAGTAT TAGGACGAGA TAACATGGCC CGATTT---G GAATAAAGAT A[ Visna GTAGAAGTAT TAGGAAGAGA TAATATGAGA GAATTG---G GAATAGGATT A[ EIAV GTGACTATTT TGGGACGAGA TATTCTTCAG GACTTA---G GTGCAAAATT G[ HERVK10 CTTAATCTGT GGGGTCGAGA TTTATTACAA CAATGG---G GTGCGGAAAT C[---ACCATG HML1Z70280 GTTAATTTAT GGGGTAGAGA CTTATTGCAA CAATGG---G ATGCTGAAAT A[---TCTATT HML3ACO25577 ATAAATTTAT GGGGAAGAGA TTTATTACAA CAATGG---G GAGCACAAGT T[---CTAATT HML3AC092364 ATAAATTTAT GGGGAAGAGA TTTATTACAA CAATGG---G GAGCACAATT T[---CTAATT HML4AC093517 GTTAATTTGT GGGGAAGAGA TCTTTTATAT CAATGG---G GGGCTCAGAT T[---ACTTTT HML5AP000870 GTTAATCTTT GGGGACAGGA CCTATTAGCC CAATGT---G GGGGGTCACT C[ HML6AF069508 GGTAATCTAT GGGGCCGAGA TTTACCGACA GcATGG___, 7777777977 HML7AC013722 7777777777 7777777777 777?777777 7777777777 777?77.7777 7[ HML8AL596245 7777797777 7977?????? ?????????? ?????????? ?????????? 
?[---AATATT HML9ACO25569 TTTTCTCTGT GGGGACAAGA TGCCCTAGAA CAGTGG---G GCTTAACTGT CI HML9AC068700 TTTTCTCTGT GGGGACAAGA TGCCCTAGAA AAGGGG---G GCTTAACTGT C[ Oryzomysl ATAAATTTAT GGGGACGAGA TTTATTACAG CAGTGG---A ATACACAGAT T[---AATATT Armadilloll CTTAATTTCT GGGGAAGAGA CTTATTATAC CAATGG---G GAGCTTATTT G[---AATATT Reedbuckl CTTAATTTAT GGTGAAGAGA CTTATTATAC CAATGG---G GAGCTTATTT G[---AATGTT Muspaharill ATAAACTTGT GGGGAAAGGA TCTTTTACAA CAATGG---A GTACTCAAAT T[---AATATT NileRatII ACCAACCTTT TAGGGAGAGA TATCTTACAA GCCTTA---G ATGTGGTTCT T[ Hedgehogl ACCAACCTAC TGGGCAGGGA TCTTCTGTAA TGTCTT---G ATGTTAGGAT Al Cougarl ATCAATTTGT GGGGAGGAGA CTTGTTGAGT CAGTGC------AATATTCA GI DomesticCatl ATCAATTTGT GGGGAAGAGA CTTACTGAGT CAATGG---A ATATTCAGCT T[ T22 CTAAATTTGT GGGGACGGGA CCTGTTGGCT CAACTG---G GGGCGACACT G[ Sheepl TTAAATTTGC GGGGAAGAGA CTTTCTACAA GCTCAA---G TGACTATACA A[ IPSquirrelI TTAAGTTTGT GGGGAAGAGA TTTTCTACAA CAAGCTCAAG CGACTATACA A[ SIMongooseI TATACATTAT GGGGGAGGGA TTTCTTAGGA CAATGG---A CAGCTAAACT G[ MPMV GTCAATCTTT GGGGCCGAGA TTTACTTTCT CAAATG---A AAATTATGAT G[ SRVI GTCAACCTTT GGGGCAGAGA TCTCCTTTCT CAAATG---A AAATTATGAT G[ SRVII GTTAATCTTT GGGGCCGAGA TTTACTTTCC CAAATG---A AAATCATGAT GI BabboonSERV GTTAACCTGT GGGGGCGAGA TCTGCTCGCT CAAATG---A AAATTATAAT G[ MusD GTAACCCTGT GGGGAAGAGA TCTGTTGTCA CAGATG---G GCGTTATCCT GI TvERV GTTAATCTGT GGGGAAGAGA TATCCTCTCA CAAATG---G GAGTTATCAT G[ SMRVH GTCAATCTCT GGGGAAGGGA CATCCTCTCT CAAATG---A AACTTGTCAT G[ Colobusl GTTAACCTTT GGGGTAGAGA TCTTCTCTCC CAAATG---A GGGTCATGAT GE SImongooseI GTTAGTCTGT GGGGACGAGA CCTCTTATGT CAAATG---G GTATTATTCT Al Goat II GTTAATCTTT GGGGTCATGA TATTCTTAGT CAACTA---G GGGTTATCAT GI OstrichD GTCAACCTAT GGGGTCGTGA TGTCTTATCT CAACTA---G GGGTCATCAT G[ LorisI GTCAATCTCT GGGGCAGAGA CATCCTTGCC CAACTA---A GACTAATTAT G[ SIMongooseII GTTAATCTCT GGGGGAGAGA CATCCTTGCC CAACTA---A AACTAATTAT GI Jaagsiekte GTTAATCTAT GGGGGCGTGA TATATTAAGC AAAATG---G GTGTTTATTT A[ RDolphinl TTAAACCTCT GGGGCCAAGA TATATTAAAA GAAATG---G GAGTATTACT T[ 
WFDeerI GCTAATCTGT GGGGAAGAGA TGTTATGAAA CAAACG---G GAGTTTACAT A[ Cariboul GTTAATCTGT GGGGAAGAGA TGTTGTGAAA CAAATG---G GGGTCTACAT Al GiraffeI TTTTCTTTAT GGGGGAGACA TATATTATCT CAGATG---G GAATGCTTTT A[ Bisonl TTTACTCTAT GGGGGAGGGA TGTGTTATCT CAAATA---G GAGTGCTACG T[ Muskoxl CTTACCCTGT GGGGGAGGGA CGTGCTATCT CAAATG---G GAGGGTTACT T[ CFBadgerl CTAAATTTAT GGGGACGTGA TGTCTTAACA GATATG---G GTGTTTCTTT G[ HRV5 ATCACATTAT TGGGAAGAGA CATTTTGGAA CAAATG---G GACTGACATT Al Goatl ATTACATTGC GGGGTCAAGA TATATTGGCT CAGCTA---A ATCTGAAATT G[ IAPSHamster GTAAATTTAT GGGGACGAGA TGTGTTACAA GCCATG---G GCATGACCCT Al IAPCHamster ATTTCATTAT GGGGAAGAGA TTTGTTAAAG GATATG---G GTTTTAAACT C[ IAPMouse GTTAACCTCT GGGGAAGGGA TATTATGCAG CATTTG---G GCCTTATTTT G[ Mastomysl ATCTCCCTTT GGGGAAGAGA CCTTTTAAAA GACATG---G GATTCAAATT A[ MyomysI ATCTCTCTTT GGGGAAGAGA TTTGTTAAAG GATATG---G GATTTAAGTT G[ UromysI GTTAATCTCT GGGGGCGAGA CATCCTTCAG AAGCTA---G ACCTGAGATT GE Prairiedogl ATCTCTCTAT GGGGACGTGA TCTTATGAAG GACATG---G GGTTTCAACT T[ RabbitI ATTTCCCTTT GGGGGAGAGA CATCCTGGCT GCCATG---G ATGTCGCCTT G[ NileRatI GTCACCTTGT GGGGAAGAGA TGTACTAGTA CAATTA---G GCATGAAACT C[ MMTVBR6 TTCACCTTAT GGGGAAGAGA TATTATGAAA GATATA---A AGGTCAGATT G[ SDunnartI ACCTCCTTAT GGGGAAGAGA TTTATTGAAA ACCCTT---T GGACAACACT A[ RKangal TTGACCCTGT GGGGATGGGA CATCCTTACC GGGATG---G GCACCACCCT A[

Platypusl ATCAATCTGT GGGGAAGAGA TATACTCCAG CAAATG---A AAGTGATGTT A[ EchidnaI ATTAATTTGT GGGGACGTGA TCTTTTAGGA TAAATG---G GGGCCGTAAT TI ArmadilloI w777,7.,??? ?????????? ?????????? 9999999977 aaaaaaaaaa ,[97,77,77, PFalconI ATGGTTTTAT GGGGACGGGA TGTGCTATCA CAGTGG---G GAATGACAAT T[ PrairiedogII ATGGTTTTAT GGGGACGGGA TGTGCTATCA CAGTGG---G GAATGACAAT T[ MHarrierl ATGATTTTAT GGGGGCGTGA TGTACTGTCC CAGTGG---G GAATGTCCAT T[ Vulturel ATGGTCTTGT GGGGACGCGA CGTATTGTCA CAGTGG---G GAATGTCCTT A[ ETinamoul ATGGTCTTGT GGGGACGCGA TGTGTTGTCG CAGTGG---G GAATGTCTTT A[ Moorhenl ATGGTTTTAT GGGGACGGGA TGTACTAACA CAATGG---G GAATGACATT G[ HThrushl ATGGTGTTAT GGGGATGTGA TGTACTTTCT CAATGG---G GTTTCAAAAT T[ ESOw1I TTGTGACTGT ATGACTCAAT GGGGAACTCA GATTGG---A TCAGA RNPheasantl --GTCTCTTT GGGGACGGGA CTGCCTCTCT CAGTGG---G GACTGAATTT G[ Peacockl --GACTCTTT GGGGACAGGA CTGGCTCTCT CAGTGG---G GACTAAACTA G[ BluetitII TGCCCCCTGT GGGGGAGAGA TACCATGTTG CAGTGG---G GGGTAAAACT C[ CMagpiell TTTACATTGT GGGGCAGGGA TCTCATGTCC CAGTGG---G GAGCCCGGGT T[ Penguinl TTAAACCTGT GGGGACGAGA TGTCTTGAGC ACTTGG---G GACTAGTGAT T[ GSKiwiI CTAACTTTGT GGGGGAGGGA TGTGCTTGGA ACTTGG---G GGGTTACGGT G[ LKiwiI CTAACTTTGT GGGGGAGGGA TGTGCTTGGA ACTTGG---G GGGTTACAGT G[ HThrushl ATCAACCTCT GGGGGCAGGA TGTTTTGGCT GTTTGG---G AGGTGCGCAT 7. 
1 ES0w1II CTAAACCTAT GGAGGAGGGA CTGTTTGTCA CAATGG---G GAGTGAAGTT T[ WFgooseI GTCAGCCTCC TGGGGAGAGA TGTGATGGGG CAGGGC---A ATTTCATCCT G[ NABDuckIII ATTAGCTTTC TTGGCAGAGA CGTGATGGCA CATGGA---A ACTTCATCAT G[ Flamingol ATCAGCCTCT TGGGAAGGGA CGTGATGGGA CAATGT---A ACTTTACTCT G[ GRheaII TTATGGATTT TGGGGAGAGA TGTCCTTAGT CAATGG---G GCGTGGTGAT G[ DRheaI CTATGGATTT TGGGGAGAGA TGTCCTTAGT CAATGG---G GCGTGGTGAT G[ BrownKiwiI CTCTGGATCT TAGGCCGGGA CCTTCTGTCT CAGTGG---G GCCTAGTTCT C[ RBowerIII ATCTGGCTAT TGGGACGAGA TGCACTGAGC CAGATG---G GTTTTTGTCT CI Toucanettell CTTGCTCTTA TCGGCCGTGA TGTTTTGTCA CAGCTG---G GTGCCCGACT A[ GWoodpeckerl GTTGTTTTAT TGGGCAAAGA TGCCTTAAGT CAGCTA---G GAGCAGAACT T[ ESOw1II ATCAGCCTTC TGAGCCGGGA TGTCTTGGGA CAGATG---A ATTATACCAT T[ GRheaI ATATTACTTC TGGGGCGAGA CTGTTTATCG CAATGG---G GTTTAACATT A[ JQuailI TTGAATCTGT TGAGGAAAGT TGTGCTATCC CAGTGG---G GTGTGTTCAT C[ Cassuaryl TGCACCCTTT GGGGCCGCGA TTTATTGAGT CAATGG---G GAGTTGTTAT T[ AZMagpieII TGTACATTGT GGGGACGTGA TGTCCTGGGA CAGTTT---G GAACTACTGT G[ CMagpieIII TGTACATTGT GGGGACATGA TGTCCTGGGA CAGTTT---G GAACTACTGT G[ MThrushl TGTACATTAT GGGGACGGGA TGCCCTGGGA CAGTTT---G GAACCACTGT G[ CMagpiel ATCACAGTGT GGGGAAGAGA TTTGCTTTCT GAATGG---A GAGCCAAAAT T[ Guineafowll TCTACCCTCT TGGGAAGATG TCCTGGCGCA GATTCG---A TCAGCACTGG T[ PartidgelV GGGAGCCTGC TTGGACGTGA TGTGCTTCAG CAATTT---G GAGCGCTATT G[ HThrushII CTCTGGCTAC TAGGGAGAGA TGTTTTAAGT CAGTGG---A ATGTTGTGTT G[ GPheasantl ATTCTGCTGA TTGGGCGCGA TGCACTTGCT ACTGTC---A GTGCACACAT T[TCCATTGAA GPheasantll TTGT GAGGGCGAGA TGTTCTCGCT GGTGCG---G GGGTGACAAT C[---CATCTT NABDuckI ATCGCGCTCA TAGGGCAGGA CATTTTACAG ATATTG---G GGGCTAGCCT A[GTA Loonl GAAGCCTTAG TCGGCCGAGA TGTCTTAAGC CAAATT---G GTGCTGTTTT A[ Ostrichl CAGTGTCTAA TTGGAAGGGA TATACTCTCA CAGTTG---G CGGTTGTACT T[ MoorhenII ACCTGTTTAC TCAGGCAAGA CGTATTGGCA CAATTG---G GAGTGGTGTT G[ Goshawkl TCAGCCCTGA TTGGGCGAGA TATCCTTCTC CAGTTA---G GAGTAAGATT G[ Pigeonl AATGGACTAA TAGGTCGAGA TATACTGGAT CAGCTA---G GAGTCATTTT G[ FHawkI TCAGCCTTGA 
TTGGGCAAGA TATTCTTCTC CAGTTA---G GAGTAAGATT G[ BGrousel TCGGCGCTAG TCGGCCAAGA CGTACTGGAC CAGCTG---G GTGTTATCCT C[ BGrousell GCTTGCCTTA TTGGCCGGGA TATATTGTCT CAACTT---G GTCTTGTACT G[ PeacockII AGCGCATTAA TAGGCCGTGA TGTTTTAGAT CAGCTT---G GCGTGGTTCT T[ NABDuckIV CAAGCACTGA TAGGGAGAGA CATTCTTGCT CAGATG---G GAATGGTTCT T[ Bluetitill CAATGTCTAA TTGGACGAGA TATTTTGTCA CAACTT---G GTGTTGTTTT G[ Toucanettel CCACCCCTGC TGGGAAGGGA TGCTTTGGCC CTGTTA---G GCGCTAGGGT G[ NABDuckII CCTCCCCTGT TGGGAAGGAA CACTTTAGCC CTATTAAAAA AAGCCAGGGT G[ HThrushIII CCACCCTTGT TAGGGAGAGA TGCCCTAGCC CTGCTG---G ACGTTAGGGT G[ BlueTitI CCAGCCCTGC TAGGGAGCAA CACCCTAGCC TTGCTA---A AAGCCAGGGT G[ T7 CCAGCCCTAT TAGGGAGAGA TGCCCTAGCT TTGCTA---A AAGCCAGGGT A[ MHarrierll GAACCTTTGG TAGGAAGAGA TGTACTAAAC CAATGG---G GCATTCGCCT T[ Emul GAACCATTGT TAGGGCGAGA TGCCTTATTG CAAAGC---G GCTTTCACCT G[ LKiwiII GACCCGGTAC TGGGCCGGGA CGCGCTCAGC CAACTC---A ACATCAAGAT C[ LDV TTCAATATCC TCGGCCGGGA GGCGTTAGCG CAACTG---C ACTGTGTGGT G[ RSV GGGAGTATCC TAGGAAGAGA TTGTCTGCAG GGCCTA---G GGCTCCGCTT G[ ALVsubgrpJ GGGAGTATCC TGGGAAGAGA TTGTCTGCAG GGCCTA---G GGCTCCGCTT G[ Tragopanl GGAACCTTAT TGGGGCGGGA CTACCTGCGG AGCGTG---T CCGCTCGGAT A[ GuineafowlII AGAACTTTGC TAGGTCGAGA TTATTTGACA GCAATT---G GCACACGGAT C[ LoonII CTTGCTCTTA TCGGCCGTGA TGTTTTGTCA CAGCTG---G GTGCCCGACT A[

Appendix 2. Nucleotide alignment (continued) BLV TAC]GCCCCCC TGTGG HTLV1 -] HTLV2 --1 HIV1 HIV2 SIVagm FIV BIV Jembrana CAEV -] Visna -] EIAV -] HERVK10 CCCiGCTCCAT TATATAGCCC CACGAGTCAA AAAA TCATGACCAA GATGGGATAT HML1Z70280 CCT]ACGGATC AATATAGTAA TAATAGTAGA CAAA TGATGAAAAA TATGGGATAT HML3ACO25577 CCA]GAACAAT TGTATAACCC TCAAAGTCAA CATA TGATGCATGA AATAGGGTAC HML3AC092364 CCAIGAACAAT TATATAGCCC TCAAAGTCAA CATA CAGTGCATGA AATGGGGTAT HML4AC093517 CCTjAAAGGCA GTTACAGCCA GCAGAGTAAA GATA TGATGACCAA AATGGGATTT HML5AP000870 ---]TGCAGAC CC HML6AF069508 777p077077 7777797770 9779 TAAA ATGTTAAAGA AAATGGGATT HML7AC013722 ___p777777 7277777777 7797779777 .7.797.????? 7??77777 7797777777 HML8AL596245 CCA]CATAACT CTTATAGTGC TCCCAGTCAA CATA TGATGGAAAA CATGAGGTTT HML9ACO25569 ---]CTGCCAC CT HML9AC068700 ---]CCACCGC CC Oryzomysl CCC]??????? ?????????? ????????T, """"" """"" """"" Armadilloll CC117977977 7,77799777 77797797,7 777777.7??? ???????777 7,977777)9 Reedbuckl CCV7777777 .7777777777 77777?7T7, T777777777 7, 9 7.777,79.,7) MuspahariII CCT]7777077 777777)777 777777777 77777777.7? ?????????7 7777979 NileRatII --inn??? ?????????? 
???79777 7777?77777 7777777777 7777777777 HedgehogI ---]TCCACAG AT CougarI ---ICTTAAAA CA DomesticCatl ---]AAAACAA GT T22 ---]ACTCTTA AT Sheepl ---]TTGAATG AA IPSquirrell ---]TTGAATG AA SIMongooseI -] MPMV -] TGTAGCCC CAATGACATA GTAACTGCTC AAATGTTAGC CCAGGGCTAC SRVI -] TGTAGTCC TAGTGACATA GTCACTGCCC AAATGTTAGC CCAAGGCTAC SRVII -] TGTAGTCC TAATGATATT GTCACTGCCC AAATGTTAGC TCAAGGATAC BabboonSERV -] TGTAGTCC AAATGATATA GTTATTGCAC AAATGTTAAC TCAAGGATAC MusD -] TGCAGTTC TAAGGAGATG GTGACTGAGC AGACATTCAG GCAGGGACCC TvERV -] TGCAGCCC TAGCTCAGTT GTTACAGAGC AAATGCTAAG TCAAGGCTTT SMRVH -] TGCAGTCC CAACGATACT GTCATGACCC AAATGCTAAG CCAGGGGTAT ColobusI -] TGTAGTCC AAATGATATA GTCACTGCAC AAATGTTGGC TCAGGGCTAC SImongooseI -] TATAAATC TAATGATTTG GTTGCAGCAC AAATGTTAAC TCAGGGATAT Goat II -] TGCAGCCC CAATGCACTT GTTACTCAAC AAATGCTCTG TCAAGGATTT OstrichD -] TGTAGTCC AAATGAAGTA ATTACTCAAC AGATGCTTAA CCTATGTTTC LorisI -] AGCAGTCC AAACGAAACC ATCACCCACC AAATGTTAAA ACAAGGGTTT SIMongooseII -] TGTGGTCC AAATGAGGTT ATCACTTGTC AGATGTTAAA TCAGGGTTTT Jaagsiekte -] TATAGTCC TTCACCCACT GTGACAGATT TGATGTTAGA TCAGGGCTTA RDolphinl -] TATAGTCC AAACTCTCAA GTCTCTAATA TGATGTTGGA TCAAGGATTT WFDeerI -] TTCGCCCC AAATGATGCC GTTAAGCAGA TGCTTTTGAA T---GGGCTA Cariboul -] TTCACCCC AAATGATGCT GTTAGTCAGA TGCTTTTGAA TCAGGGGTTA Giraffel -] TATAGGCC AGATGAAAAG GTTACTAACC AAATGCTGCA AATGAGGTAT Bisonl -] TTCAGTCC CGATGAAAAA GTGACCTCTC AAATGCTCCA TATGGGCTAT Muskoxl -] TTTAGCCC TGATGATAAA GTGACGTCTC AGATGTTGCA GATGGGCTAT CFBadgerI ---]ACAACAG CTCCTATCCC CAGATCTAAA CAGTGTTCAG TTGCAATGGC TCTATGGAAA HRV5 ---]ACCAATG AAGACCAGTT GGAGAGTTCT CCAGGATGGC GTATTATGTA TAAAAAGGGA GoatI ---]GTCACTT TGCCCGACCC CCCACCGTCA CATT GGATGAAACA GCAGGGCTAT IAPSHamster ---]ACTAATG AATACTCCCC CCAGGCATCA GCCA TTATGGCAAA GATGGGGTAT IAPCHamster ---1ACAAATG AATACTCAGA AACATCTCAA GGTA TCATGAAACG AATGGGATAC IAPMouse ---]TCCAATG AATATTCAGC TAAAGCAAAA AATA TCATGGCAAA GATGGGTTAT Mastomysl ---1AGTAATG AGTATTCTGA TTCGGCCCAA CACA TGATGCAAGA TATGGGATAT MyomysI 
---]AGTAATG AGTATTCTGA CTCCGCTCAA CATA TAATGCGAGA TATGGGATAT UromysI ---]ACTAACC TCTACTCCCC CCAAGCACAC AAGA TGATGACTAG GATGGGATAT Prairiedogl ---]AGTAATA AATATTCAGC AGTTGCCCAA AAGA TAATGCAAGA AATGGGATAT Rabbitl ---]GTTACCA CTTCCTCTGC AGTACAA TCTA TGCTACAGAA AATGGGTTAT NileRatI ---1ACAAATG ATTACTCTGC TCAATCACAA AATA TGATGCAGGG CATGGGCTAT MMTVBR6 ---]ATGACAG ACTCACCAGA TGATTCACAG SDunnartI ---JAACATTG AG RKangal ---]TGCACAG AA

Appendix 2. Nucleotide Alignment of Class II ERV pol fragments

Platypus' ---]ACTACCG AA Echidnal ---]ACGACCG AC???????? ?????????? Armadillol ???] PFalconI ---]CGTACCC AT Prairiedogil ---]CGTACCC AT MHarrierI ---]CGGACCC AT Vulturel ---]CAGACCC AT ETinamouI ---iCAGACCC AT Moorhenl ---]AAAACAC AT HThrushl ---]CAGTCTG AT ESOw1I -] RNPheasantl ---]GGGACAG T Peacockl ---]GGGACAG A Bluetitil ---]ACTATTC CT CMagpieII ---]GAGATCC CA Penguinl ---]GGCACTG CT GSKiwiI ---]CAAACTG CG LKiwiI ---]CAAACTG CG HThrushI ---]AAGTCAA AT ESOw1II ---]ACTATGG AT WFgoosel -] NABDuckIII -] Flamingol -]. GRheaII ---]CAGTCAC AT DRheaI ---]CAGTCAC AT BrownKiwil ---]TCGTGCC CT RBowerIII ---]ACCAATG AG Toucanettell ---]GTCACTT CG GWoodpeckerI ---]GTTACTG AT ES0w1II ---]GCAAGCA AC GRheaI ATT]ACATCTC CT JQuailI ---]AGTAATG GA Cassuaryl ---]GGGACAA AT AZMagpieII ---]AGCATAA CA CMagpieIII ---]AGCATAA AC MThrushl ---]AGCATAG AC CMagpiel ---]AAACTGG AT GuineafowlI ---]--CATCG GG PartidgelV ---]GTTGTAG GT HThrushll ---]ACCATTC AA GPheasantl GGC]AAGTTAG AT GPheasantll GGG]AAGCCTC AG NABDuckI ---]ATCCCTG AC Loonl ---]TCTACAA AC Ostrichl ---]ACGAATG AG MoorhenII ---]ACAAATG AG Goshawkl ---]ACAACTG AG Pigeonl ---]ACAACTG AA FHawkI ---]ACAACTG AA BGrouseI ---]ACCAATT CA BGrouseII ---]ACCAACG AT PeacockII ---]ACTACAA GA NABDuckIV ---]ACGTCAG AC BluetitIII ---]ACCAATG AG Toucanettel ---]ACAAATT TA NABDuckII ---]AGAAATC TA HThrushIII ---]ACAAATT TA BlueTitI ---]ACAAATT TA T7 ---]ACAAATT TA MHarrierll ---]ACAAATT TA Emul ---]ACAAATT TA LKiwiII ---]AGAAAT- LDV ---]TCAAATT TA RSV ---]ACAAATT TA ALVsubgrpJ ---]ACAAATT TA Tragopanl ---]ACAAATT TG GuineafowlII ---]ACAAATT TA LoonII ---]GTCACTT CG


Appendix 2. Nucleotide alignment (continued) BLV HTLV1 HTLV2 HIV1 HIV2 SIVagm FIV BIV Jembrana CAEV Visna EIAV HERVK10 ATACCAGGAA AGGGACTAGG GAAAAATGAA GATGGCATTA AAGTTCCAGT TGAGGCTAAA HML1Z70280 TGCCTGGGAA AAGGACTAGA AAAAGATAAA ATTGGCCAAT CAGAAACTTT AGAATTAAAA HML3ACO25577 GTCCCTGGTA TGGGACTAGA AAAAAATTTG CAAGGTTTGA AAAAACCGCT TCAAGTGGAA HML3AC092364 GTCCCTGGTA TGGGACTAGA AAAAAATTTG CAAGGTTTGA AAGAACCACT TCAAGTGGAA HML4AC093517 GTTCAAGGTC TGGGTTTAGA AAAATCAGCA CAAGGCATCA CTGAGCATAT CATACCTACT HML5AP000870 HML6AF069508 CAGAGTGGAA AAGGTTTAGG AAAGCCCCTG CAGGGAAACC CTGATCCAAT ATCAATAACT HML7AC013722 7777777777 7/77977777 7777777777 7777777??? W???977777 7)77)77777 HML8AL596245 GTTCCTGGGC TGGGTCTCAC TCCAAAGCAT GAAGGGATTG TTAAACCCCT CCCAGTTACT HML9ACO25569 HML9AC068700 Oryzomysl 9777779.777 7779.779779 7777777 ??????9777 777777777.7 7777777)" Armadilloll 9777777777 7777777777 7777777977 77?9?????? 99?77777 Reedbuckl 7777777x77 7777777777 7777777777 7977????72 7777777777 7777777777 Muspaharill 7977.779777 7977977777 7777777777 7777779.772 777777777? ??77777777 NileRatII 7777777777 7777977777 ????777777 7777777777 Hedgehogl ???? 
?????9797 777777)777 Cougarl DomesticCatl T22 SheepI IPSquirrell SIMongooseI MPMV AGCCCAGGAA AAGGGTTAGG AAAAAAGGAA AATGGCATTC TACATCCTAT CCCAAATCAA SRVI AGCCCCGGAA AAGGATTAGG AAAAAACGAA AATGGCATTC TACATCCTAT CCCAAATCAA SRVII AGCCCAGGAA AAGGACTAGG AAAAAGAGAA GATGGAATCT TACAACCTAT CCCAAATTCA BabboonSERV ACCCCTGGTA AAGGTCTTGG AAAAAGAGAA AACGGTATCC CACAGCCTAT ACTAGTTTCA MusD CTGCCTGATC ATGGACTAAT AAAGAAGGGA CAGGAAATTA AGACTTTTAA GGGTCTTAAA TvERV TTACCCCGCC AAGGACTAGG AAAAAATAAA CAAGGCATCA CTCAACCTTT ACATATACAA SMRVH CTCCCCGGCC AAGGGTTGGG AAAAAATAAT CAAGGAATCA CCCAGCCCAT TACTATTACC Colobusl CACCCTGGGA AAGGGTTAGG AAAAAGAGAA GATGGCATTT TACAGCCAAT CCCAGCCATA SImongooseI AAGCCAGATA AGGGAATTGA AATAAATAAA GATAGTATTA CCCAACCTGT TGAGGTTCTA Goat II ATACCGGGAA AGGGGCTTGG ACGAGACAAA CAAGGCACTA TACAACCTAT AAACCTATCC OstrichD CTACTGGGAC AAGGACTGGG AAAAAGCAAT CAGGGTATAA AACAGCCTCT TCCTGTAACG Lorisl TGTGCTGGTC AGGGATTAGG GAAATATTCC CAAGGAATAA AAGAGCCCAT TCAAATAACC SIMongooseII CGTCCCAGGC AAGGCTTAGG AAAGTACTCT CAAGGCATAA AGGAACCTAT CAAATTAAAA Jaagsiekte CTTCCAAATC AAGGTTTAGG TAAACAACAT CAAGGCATCA TTTTGCCCCT TGATTTAAAA RDolphinl CTTCCCACAA AAGGGCTGGG AACAAATCAA CAAGGAACTG TCTCTCCTAT TGATGTGAAA WFDeerI TTGCCTAATC AGGGATTAGG AAAAAATGGA GAAACAAATT TGTCACCTGT TCAAACTAAA Cariboul TTGCCTAATC AGGGATTAGG AAAAAATGGA GAAGGAAATT TGTCACCCAC TCAAACTAAA Giraffel AATCCCGACA AGGGGCTTGG TAACGATCAG TAGGGAATTC TTTCTCCATT AGAAATGGTT Bisonl GATCCATCCA AAGGATTAGG TAAACAACAG GAAGGAATAA TTGAGCCAAT TTGCCCAACC Muskoxl GATCCATCAA AAGGATTGGG TAAACAACAG ACAGGAATAA TTGAGCCAAT TTGCCCAACT CFBadgerl AAAATGGGAT ATATTTCAGT TAAGGGCTTA AGA-GAAGAT TTACAGGAAA GACCCTACCT HRV5 TATCAGGAGA GAGGATTAGG TTCCAGAGGA GAAGGCAGAA GGGACCTAGC CCAGCCCAAA GoatI GTTCCTGGGA AGGGGATAGG AAAGGATTTG CAAGGTAGAA CAGAACCCAT TG-AATTTAC IAPSHamster ACAAATGGTA GAGGCTTGGG TAGGCAAGAA CAAGGCAGAA TAAAACCCAT TACACAACAC IAPCHamster AGTCCCAGGC CAGGCCTCGG GAAACATCTG CAGGGTCGTA CCAGTCCTAT TAATTCCACA IAPMouse AAAGAAGGAA AAGGGTTAGG ACATCAAGAA CAGGGAAGGA 
TAGAGCCCAT CTCACCTAAT Mastomysl GCTCCAGGCT TTGGAATAGG AAAGTATTTG CAAGGGAGGA AAAGTCCAAT ACCTGTGAAG MyomysI GTTCCAGGCT TTGGAATAGG GAAATACTTG CAAGGGTGGA GAAGTCCCAT ATCTGCTCAA Uromysl GAGGAAGGGC AAGGCCTAGG AAGCAAAGAA CAGGGCAGAT TACAACCTAT TCCCCAAATA Prairiedogl AGACCAGGTC TTGGATTAGG AAAAACTTTA CAAGGACTTA AACACCCCTT AGAGCCGCAA Rabbitl GTCGCCGGCA AAGGACTCGG AGTTCAACTC CAGGGTCGTT CTTCCCCCAT TGAGCTCAAA NileRatI AGGCCAGGCA AGGGACTGGG AAAGAATCTG CAAGGCAGCC CAGATGTGAT AACTACACTT MMTVBR6 SDunnartl RKangal


PlatypusI EchidnaI ArmadilloI PFalconI PrairiedogII MHarrierI VultureI ETinamouI MoorhenI HThrushI ESOwlI RNPheasantI PeacockI BluetitII CMagpieII PenguinI GSKiwiI LKiwiI HThrushI ESOwlII WFgooseI NABDuckIII FlamingoI GRheaII DRheaI BrownKiwiI RBowerIII ToucanetteII GWoodpeckerI ESOwlII GRheaI JQuailI CassuaryI AZMagpieII CMagpieIII MThrushI CMagpieI GuineafowlI PartidgeIV HThrushII GPheasantI GPheasantII NABDuckI LoonI OstrichI MoorhenII GoshawkI PigeonI FHawkI BGrouseI BGrouseII PeacockII NABDuckIV BluetitIII ToucanetteI NABDuckII HThrushIII BlueTitI T7 MHarrierII EmuI LKiwiII LDV RSV ALVsubgrpJ TragopanI GuineafowlII LoonII


Appendix 2. Nucleotide alignment (continued) BLV [ ] HTLV1 [ ] ??? HTLV2 [ ] ??? HIV1 [ A AT]TTTCCC-- -ATT---AGT HIV2 [ A AT]CTACCA-- -GTC---GCC SIVagm [ G TA] ATG---GGT FIV [ G TA] ATG---GCT BIV [ ]TTCACC-- -CTACTTGTT Jembrana [ ]TTTACT-- -CTAGCTGCA CAEV [ ] ATA---ATG Visna [ ] ATT---ATG EIAV [ ] GTT---TTG HERVK10 ATAAATCAAG AAAGAGAAGG AATAGGG[ TATC CT]TTT TA---GGG HML1Z70280 GGAAAAACAG ATTGGACCGG ATTGGGT[ GTCA TT]TTT A- -GGA HML3ACO25577 AGACAAAGTT CCCGCCAAAG ATTAGGA[ AACA AT]TTT TG---ATG HML3AC092364 AAACAAAGTT CCCAACAAAG ATTAGGA[ A AT]AATATT-- --TGATGGTG HML4AC093517 CCTAAAGCAG ACTCTACAGG ACTTGGT[ TATT CT]TTT TA---GAA HML5AP000870 [ C]TTT CTAATAATG HML6AF069508 GGACAAACAG AAAGGGGCTA GGTCATC[ AGGA TT]TTT GGT---GGG HML7AC013722 57,5777,77 ","7"79 7,'')'7'[TCC CCCCGGTCCA TA]TCC CAG---AAC HMLBAL596245 GTAAAAGAAA ACAGGGCGGG TTTAGGT[ TATC CT]TTT TA---ATG HML9ACO25569 [ ]TTT TGGTAGGG HML9AC068700 [ ]TTT TGGTAGGG Oryzomysl ?????????? ?????????? ???????[ ] ATT .7779.7 97.)7799[ Armadilloll ?7,7,777',7 77W,9 ] GGG Reedbuckl """7977 7????????? ???????[ ] GGG Muspaharill ?7,7W)7W,7 77777",77")2 97777,7[ ]TTT ATAAAAGGG NileRatII .")9.)7.777., WPV)*),7'),7 ')7')7( ]ATT CTA---GGG Hedgehogl ""9"7,9 ,????????? ???????[ CC]CCAATA-- -CTA---ACA Cougarl [ A GT]TTT TC---ATA DomesticCatl [ ]TTT TC- -CTA T22 [ ]??? ???? 
Sheepl [ A CT]TTTTCT-- --TA---GGG IPSquirrelI [ A CT]TTTTCT-- --TA---GGG SIMongooseI [ A GT]TTTTCC-- --TA---GGG MPMV GGACAATCTA ACAAAAAAGG TTTTGGA[ A AT]TTT TAACTGCG SRVI GGACAATTTG ACAAAAAGGG ATTTGGA[ A AT]TTT TAACTGCG SRVII GGACAACTTG ACAGAAAAGG ATTTGGA[ A AT]TTT TAGCTACG BabboonSERV GGACAATTTG ATAAAAAGGG GTTTGGA[ A AT]TTT TAGCTCAG MusD CCCCACTCTA ACGTGAGAGG TTTAAAG[ T AT]TTTCAG-- --TA---TCG TvERV TCTCACCCTG ACCGCTCCGG CCTTGGT[--- TTCCAGACAC AT]TTTTCA-- --TA---AGG SMRVH CCCAAAAAAG ACAAAACAGG CCTAGGA[--- TTCCACCAAA AT]TTACC--- --GTAGTCGT Colobusl GGAAAATTGA ATAAAAGGGG GCTTGGA[ A AT]TTT TAACCATG Simongoosel AATAACCATC ATACAGGGGG TATTGAA[ A TA]TTTAAC-- -CTA---GTA GoatII CCCAAAACAG ACCGTTCAGG CTTAGGA[------TATAAAA GG]TTTTCC-- --TA---GGG OstrichD CCTAAAAATA ATCGTTCAGG CGTGGGA[------AATGAAA AT]TTTCCT-- --TA---GCG Lorisl TCAAATCTTA ACCGCGCTGG ACTAGGC[------ATAGAAA AT]TTACCC-- --TG---ATG SIMongooseII AACAATGATA ACGCCTCGGC T-TAGGC[------TCT--AA AT]TTACCC-- --TA---ATG Jaagsiekte CCTAATCAAG ATCGAAAAGG CTTGGGG[ T GT]TTTCCC-- --TA---GGG RDolphinl ATAAAAAATG ATAGACAAGG TTTGGGT[ ]TTTTCT-- --TA---GGG WFDeerI ACCTTGCCTC TTCGATCAGG ATTAAGA[ T AT]TTT TA---GGG Cariboul ACCTTGCCTC TTCGATCAGG ATTAGGA[ T AT]TTT TA---GGG Giraffel CCTAACAAGA ATAGAAAAGG TTTAGGA[ TATT CA]AAT TTATCCTA Bisonl CCTCGGCAGC TACATACTGG ATTGGGA[ TATC CA]AAT TTATAATG Muskoxl CCTCGCAAGC TACGTGCTGG ATTAGGA[ TATC CA]AAT TTATAATT CFBadgerl GCTTCACTTA CAATCACTAC CTTCCGA[--- AGCGGGGTCC GG]TTTTCT-- -TCA---GGG HRV5 GGCTACTTTA AAAGACAGGA CACAGGT[ ]TTTTCG-- --TG---GGG Goatl ACCAATGCAA AGCTGATCCG ACTAGGT[ TATC CT]TTTGTA-- --TG---AGG IAPSHamster GGAAATCGGG GTAGAAAAGG ACTGGGT( ]TTTATT-- --TG---GGG IAPCHamster ATTGAGACCA AAGAATCTAG GTCTGGG[ ]TTTTTC-- --CT---AGG IAPMouse GGAAACCAAG ACAGACAGGG TCTGGGT[ ]TTTCCT-- --TA---GCG Mastomysl TAGAGACAAA AGAGACAGGG CCTGGGT[ ]TTTTCC-- --TA---GGG Myomysl CAAAGACAAA AGAGACAAGG CCTGGGT[ ]TTTTCC-- --TA---GGG Uromysl AAACATGAAG GACGAAGAGG ACTGGGT[ ]TTTTCA-- --TA---AAG Prairiedogl CAAAAATTTA 
ATAGATCTGG ATTGGGT[ ]TTTTCC-- --TA---GAG Rabbitl CAAAAACCAG ATAGAACTGG CTTGGGT[ ]TTTTCC-- --TA---GGG NileRatI CCCAAACATG ATAGGACTGG GCTGGGT[ ]TTTTCT-- --TA---GGG MMTVBR6 [ G AT]TTA-TG-- -ATA---GGG SDunnartI [- GAAAACTATT AG]GGATCA-- --CT---GAG RKangal [- CACCACCAGG TA]AACTCA-- --TA---GGG


Platypus' [ ACCC CT]TTATCA-- --TA---ATG Echidnal [- -TCACTGCGC CC]TTTTCC-- -TTC---GCG Armadillol [ ]??? ???????? PFalconI [ ]TTT TAGGCGGG Prairiedogli [ ]TTT TAGGCGGG MHarrierl [ ]TTT TAGGTGGG Vulturel [ ]TTT TAGAAGGG ETinamoul [ ]TTT TAGAGGGG Moorhenl [ ]TTT TAGGGGGG HThrushI [ ]TTT CAGTTGAG ESOw1I [ ]TTT GTCTTAAGG RNPheasantl [ ]TTT ATCATAGGG Peacockl [ ]TTT ATA---GGG Bluetitil [GAG ACACCTCAGG AT]TTT TGATAGAG CMagpiell [TCT CAACCCCGGG AT]TTT TAGTAGGG Penguinl [GAT CCGCATCAAC AT]TTT TAGCGGGG GSKiwiI [TCC TTGAAGCGGC CT]TTT CAGAAGGG LKiwiI [TCC TTGAAGCGGC CT]TTT CAGAAGGG HThrushl [ ]TTA TAATTGGG ESOw1II [ ]TTT TGACT--G WFgoosel [AGC TCTCCGGAAA AT]TTT CGATAGCG NABDuckIII [AGA TGCCCTGAAA AT]TTT CGGTAGCG Flamingol [AGC TCCCCAGAGA AT]TTT CAATAGCG GRheaII [ ]TTC TGATAGCG DRheaI [ ]TTC TGATAGCG BrownKiwil [ ]TTT TAGTAATG RBowerIII [ G GT]TTT TTAGCGGCA Toucanettell [ C CT]TTT CTGGAGCG GWoodpeckerI [ C CT]TTT TCATAGCG ESOw1II [- -CCGGAAC AT]TTT GCGGAGCG GRheaI [ ]TTT TAGTTGCG JQuailI T CAGTGGAGGA TCAGCAT[- TGGAATCCAC CT]TTT TTGGGGTG Cassuaryl [ T TA]TCC TA---AGG AZMagpieII CT GACTCTC[- GTCAAACCCT AT]TTC ATAGGC CMagpieIII AC TGACCCT[- CGTCAAACCC TA]TTT CATAGGG MThrushl CCTGATT[- CACATCACAC TC]TTT TTCATAAAG CMagpiel [ ]TTT TCATAGGG Guineafowll [CGA CATATGCAGG AT]TTT CGGGAGGG PartidgelV [ -CATTTGG AT]TTT CCCAGAGG HThrushll [ C CT]TTT CCGGAGCG GPheasantl [ GGTGGTCCGG AT]TTT TAGTTGGG GPheasantll [ G GT]TTT TACTGAGG NABDuckI AGGCATAT TAAACAG[GAA CAAGACCAGG GT]TTT TAGCAGTG Loonl [ C AT]TTT TAATTATG Ostrichl [ C AC]CCT TTGGCCTA MoorhenII [ C GC]CCTTTG-- --GGGTAGTC Goshawkl [ TCAC CT]TTT TAGGAGCG Pigeonl [ AGGG CT]TTTCGC-- --TGACGGTC FHawkI [ TCGC CT]TTT TAGGAGCG BGrousel [ GAGTCTTCGG GT]TTT CCCTAATG BGrousell [ CACC CT]TTA CCTTAATG PeacockII [ -GACACTC CT]TTT TCTTAACG NABDuckIV [ CACC AT]TTA TAGGTATG BluetitIII [ CACC CT]TTA GAATAGAG Toucanettel [ ]CCC TA---AGG NABDuCkII [ ]CCG TA---AGG HThrushIII [ ]TTC TA---AGG BlueTitI [ ]CCC TA---AGG T7 [ ]TCC TA---AGG MHarrierll [ ] TAGGGAAG 
Emul I ] GGGTAAGG LKiwiII [ ]TTT TGATGGGG LDV I IGTC TG---AGG RSV I TAGGGAGG ALVsubgrpJ I TAGGGAGG Tragopanl [ ] TAGAAAGG GuineafowlII [ ] TAGTTAGG LoonII [ C CT]TTT CTGGA-GC


Appendix 2. Nucleotide alignment (continued) BLV [? ??? ]?? ?????????? HTLV1 777777777[7 7777777777 7777777 177 7777777777 HTLV2 ???????7777777777 777777 177 9797797777 HIV1 [- ]CC TATTGAGACT HIV2 [- ]AA GATAGAACCA SIVagm CAACTGTCA[- ]GA ACAAATTCCC FIV CAAATTTCT[- ]GA TAAGATTCCA BIV CACACAGAA[- ]AA AATCGAACCC Jembrana CACACCAAA[- ICA GATTCAACCA CAEV GCAAATTTA[G AGGAA ]AA AAGAATCCCA Visna GCAAATTTA[G AAGAA ]AA GAAAATTCCC EIAV GCACAGCTC[T CC ]AA GGAAATAAAA HERVK10 GCGGTCACT[G TAGAGCCTCC T ]AA ACCCATACCA HML1Z70280 GCGGCCATT[G TTGAGCCTCT G ]GC TCCCATTCCT HML3ACO25577 GTGGCCATT[G TTAAGCCTCC A ]GA ACCTATGCCT HML3AC092364 GCCATTGTT[A AGCCTCCA ]GA GCCTATACCT HML4AC093517 GTGGTCACT[A TCAAGCCTCC A ]GA ACCCATCCCT HML5AP000870 GCCACTGTT[A TTATC ]CT TCCCCTACCC HML6AF069508 GTCATGGTA[T TTCTCCTCCA CCC ]AC TGCTTTGCCG HML7AC013722 CCTCACTCT[C CTTCTCTTCC T -CTGCCCAT HML8AL596245 GCGGCCGCT[G CCATGCCTTC T ]GA TCCTATCCCT HML9ACO25569 GCCACTGCT[C CTTTG ]AA AATTATCAAA HML9AC068700 GCCACGGCT[C CTTTG ]AA AATTATCAAA Oryzomysl TCAAAAATG[C CC ]AC GGCATTACCT ArmadilloII GCTACTGTT[A CAAAGGGC ITT CGCATTGCCT Reedbuckl GCTACTGTT[A CAAAGGCC ITT CTCATTACCT Muspaharill GCCACTGCT[G ACATACCA )AC AGCCTTGCCC NileRatII GCCAGTGTT[A GATTA IAA TACTCCCTGC HedgehogI GCCAATGTT[T GTAAC ]CA AACTCCCTGC Cougarl GCCACTGCT[C AAATT ]GA GACCCTCCCA DomesticCatl GCCACTGCT[C AAACT ]GA GCCCCTCCCA T22 977777777[T CTATTCCT ]AA GGCCCTTCCA Sheepl GTCACTGGG[A TA IAA GCCACTGAAG IPSquirrell GTCACTGAG[A TA ]AA GCCACTGAAG SIMongooseI GCCATGGAT[C ATTGGGTG ]CC ACCATTGAAA MPMV GCCATTGAC[A TACTTGCACC CCAACAG ]TG CGCTGAACCC SRVI GCCATTGAC[A TGCTTGCACC CCAACAG ]TG TGCTGAACCC SRVII GCCGTTGAC[A TACTTGCACC CCAGAGA ]TA TGCTGACCCC BabboonSERV GCCACTGAC[A TACCTGCACC CCAAAGG ]TG CGCTGACCCC MusD GCCACTGTC[T TGCCTGCATC C ]CA CGCCGAAAAA TvERV GCCACT---[G GTCCCCTCAG CCTA ]CA GGCGGATAAG SMRVH GCCATTGAC[A TTCCTGTACC C ICA CGCTGACAAA Colobusl GCCATAGAC[A TGCCTGCACC CCAAAAA ]TT TGCTGATCCA SImongooseI GCCATCAAC[A GTCCTGTTCC C ]TA TGCTGATAAG Goatli 
GTCACTGTC[T CTCCTGCAAC A ]CA GGCAGACAAA OstrichD GTCATTGAC[T CCCCTGCACC C ICA CACGGATAAA LorisI GCTATTGAC[C TTCCTGTACC C ]CA TGCTGATAAG SIMongooseII GTCATTGAC[A TGCCTGTACC C ]CA TGCAGATAAG Jaagsiekte ACCTCTGAT[T CTCCTGTGAC G ]CA TGCCGATCCT RDolphinl GCCTTGGAT[T CTCCTCCACT C ]AC TGCTGATCCA WFDeerI GAGGCCATT[G TATCTCCTGT GGCT ]CA TGCGGACGCG Cariboul GAGGCCATT[G TATCTCCTGA GGCT ]CA TGCGGTCACA Giraffel GAGGCTGTT[G CTCTT ]GC TGCCGACCCT Bisonl GAGGCCATT[G TTTTT ]GC TGCTGATCCC Muskoxl GAGGCCATT[G TTTTT ]GC TGCCGACCCC CFBadgerI CCACTGAAC[C ACCCCCCCCG C ]CA CGCGCTGCCT HRV5 GCCACTGAG[A TG ]CA GCCGCTGCCA GoatI GCCACTGAG[A CGAAA ]CC ATTGTTAAAA IAPSHamster GCCGTTGAG[G CT -TCACGACCC IAPCHamster GCCACTGAG[G A -GGTATTCCT IAPMouse GCCATTGGG[G CA -GCACGACCC Mastomysl GCCACTGAG[G AG -GGCATCCCC Myomysl GCCACTGAG[- -GGCATTCCC Uromysl GCCGCTGAG[- ]GG TATCCTGCCC Prairiedogl GCTGCTGAG[G AA -CAGGTGCCC Rabbitl GCCACTGTA[T CGCCTCCTAA GCCAATA IAA ACCTTTACCC NileRatI GCCACTGAG[G GA -AGTACACCT MMTVBR6 GCCATTGAG[A GCAATCTC ITT TGCAGACCAA SDunnartI AAAATCCTG[G CAGTGCCA CCC RKangal GTCGCTGGA[G GGGCG ]AG ACCGATTCCA


PlatypusI CCCACTGCC[C CACAACAT ]TA TGCAGAAAAA Echidnal GCCACTGCA[A ATGTCGGGGT CAACCGCCCA ]AC CGCCCAACGA Armadillol 7.p.,7.,..,[c CT ]AC GCCCATTCCT PFalconI GCCATTGAA[A TGCGC ]GA CACCCTAAAA PrairiedogII GCCATTGAA[A TGCGC ]GA CACCCTAAAA MHarrierl GCCATTGAA[G TGCGC ]GA CACCCTAAGA Vulturel GCCATTGGA[G TGCGC ]GA CACCCTGAAA ETinamoul GCCATTGAA[A AGCGC ]GA CACCCTGAAA Moorhenl GCCATTGAG[G AGCGC ]GA CACTCTCAAA HThrushl GCCACTGAG[G TGCGC ]AA CACCCTGAAA ESOw1I GCCACTGCG[G CGCAA ]CC CACCCTTAAA RNPheasantl GCCTTTGCG[G TGCAG ]CG TACATTAGTA Peacockl GCCGCTACA[G TACAG ]TG TACATTACCA Bluetitli GCCACTGTA[C AGCGT ]CC TGTCCAAAAA CMagpiell GCCACTGCG[G AGCGC ]CC CACCCAGAAA Penguinl GTCACTGGC[G CTCGG ]CC CACCGTTCGA GSKiwiI GTCACTGTG[A AGCGG ]CC TGTTCCCCGT LKiwiI GTCACTTGT[G AGCGG ]CC TGTTCCCCGT HThrushl GTCACCGAA[G TGAAGGGCAG GAGTTAT ]CC TATGATGCCT ES0w1II CAGGGTGTA[A AACAA ]CC CACACTTGCG WFgooseI GCCATTGAC[A AGCGA ]CC ACTCATGAAA NABDuckIII GCCATTGAA[A AGTGA ]CC GATAATGAAA Flamingol GCCATTGAC[G AGTGA ]CC TCTCACAAAG GRheaII GCCATGGTC[G GCAGCTCTCG C ]CC GACGCTAAAA DRheaI GCCATGGTC[G GCAGCTCTCG C ]CC GACGCTAAAA BrownKiwil GCCGCTGCT[G ACACTTCTCC A ]CC CACGCTGAAA RBowerIII GCCATTGAT[G GGCGA ]CC AATCCTAAAA Toucanettell GCCACGATC[A TGCAG ]CC GGTGCTAAAA GWoodpeckerI GCCGCTGGT[C AGCAG ]CC AGCCCTCAAA ESOw1II GCCATTGAA[A AGTGA ]CC AATACTTAAG GRheaI GTCACTGAT[G AGGGATAACC A ]CC CTTAATAAAA JQuailI GCCATTGAT[G AAAGG ]AT GACCTTAAAA Cassuaryl GTCACTGAA[G AGCAG ]CC TTGTGTAAAA AZMagpieII GCCATTGAA[C AGCTGTTCCA AACACAAGTA CTG ]AA CATATGTAAG CMagpieIII GCCATTGAA[C AGCCGTTCTA AACACAAGTA CTG ]AA CATATGTAAG MThrushl GCCATTGAA[G AGCCATTCTG AACACAAATA CTG ]AA CATATGTAAG CMagpiel GCCACTAAA[G CACTC ]AG TACTTCAAAA Guineafowll GCTGCTGGG[G GTAAA ]CC CTTGGTCCGC PartidgelV GGTCCTGGG[T GGTTG ]AT GCCGGTAAAA HThrushll GCCACTGAG[C AGGTAGATGC TCGC ]CC ATGCATTAAG GPheasantl GCTGCTGGG[G AAGTAAGCTG CTGGAAA ]AT GATCTGTAAA GPheasantll GCCACTGGG[G AAGTTCCATG CTGGCAT ]CT TAATTGGCTG NABDuckI GCCGCTGAT[G AGCCT ]CT CACCTGGCCA Loonl 
GGCACTGGA[G TGCAGTCG ]CC CAACCCACCG Ostrichl TTAGCCATT[G CTTGGACT ]TT CCCAATTCCC MoorhenII GCCATTGCT[T GGACT ]TT CCCACTTCCA Goshawkl GTCACTGAT[T GGACT ]TT CCCGATTCCT Pigeonl ACTGCAAAA[T GGGCT ]TT CCCGATCCCT FHawkI GTCACTGAT[T GGACT ]TT CCCGATTCCT BGrousel GCCACTGCA[T GGGCT ]TT CCCGATCCGA BGrouseII GCCATTGCT[T GGACT ]TT CCCACTGCCA PeacockII GCTACTGCT[T GGGCT ]TT CCCGATCGCA NABDuckIV GCCATTGCT[T GGACT ]TT CCCAATCCCA BluetitIII GCCATTGTT[C GGACT ]TT CCCAATCCCA Toucanettel GCCACTGCC[G CATACCCACT C ]CC ACCAATCAAA NABDuckII GCCACTACT[G CATACCCACT T ]CC ACCGATTAGA HThrushIII GCCACTGTC[G CATACCCACC A ]CT GCCCATCAAG BlueTitI GCCACTGCC[A CGTACCCGCT G ]CC ACCAATCAGG T7 GCCACTGCC[G TGTACCCGCT G ]CA ACCAATCAGG MHarrierII GCCACTGTT[T TACTCATGGA CAGCTCT ]GT AGCTGTGCAC Emul GCCACTGCC[T ACCTCCTCGG GGCTCAT ]TT CGCTGTCCCG LKiwiII GCCACTGGG[A CCGCCACTAG TGAGATC ]AG GGCTATACCC LDV GCCACTGTT[C TGTTAATGGA GAGATCT ]TG GGCAATCCCT RSV GCCACTGTT[C TCACTGTTGC GCTACAT ]CT GGCTATTCCG ALVsubgrpJ GCCACTGTT[C TTACTGTTGC GCTACAT ]CT GGCTATTCCG Tragopanl GCCACTGTC[T GCCAACTATC AGCGCAG ]TA TTCGATCCCA GuineafowlII GCTACTGTC[C ACCAGCTGTC CATCGTG ]TA TGCGATTCCA LoonlI GGCCACGAT[C ATGCAG ]CC GGTGCTAAAA


Appendix 2. Nucleotide alignment (continued) BLV ??????[???? ?????]????? ?????????? 77777777" 77777'77"''''''' ,w)7CCC HTLV1 777777[7777 77777'77777 7777777777 7777777777 77???????? ???????CCC HTLV2 ??????[???? ?????]????? ???????777 "77'777" 7777777777 777)777TCC HIV1 GTACCA[GTAA AA---]77777 77777777" ????77777- --???GTTAA ACAATGGCCA HIV2 ATAAAA[ATAA TG---]77777 7777777777 777777777- --???CTGAG ACAATGGCCC SIVagm ATTACC[CCTG TGAAA]????? ?????????? 777777777- --???TTAAA ACAATGGCCC FIV GTAGTA[AAAG TAAAA]77777 7?777777" 777777777- --???ATAAA ACAATGGCCA BIV CTACCC[ ]????? ?????????? 7777777" --???GTACC CCAGTGGCCC Jembrana CTTCCG[ ] 77777 7779777777 777777777- --???GTGCC TCAGTGGCCC CAEV ATTACA[AAAG TAAAA]????? ??????7777 777777777- --???GTCCC ACAATGGCCA Visna AGTACA[AGAG TAAGA]77777 7777777777 77777777?- --???ATAGC GCAATGGCCT EIAV TTTAGA[AAAA TAGAG]w>wv' 777777'777 777?"777- --???ATTCC TCAATGGCCA HERVK10 CTAACT[ ]TGGAA AACAGAAAAA CCGG TGTGGGTAAA TCAGTGGCCG HML1Z70280 CTTGTT[ ]TGGCT AACTGCCAAA CCGG TTTGGGTGCA GCAATGGCCA HML3ACO25577 TTAAAA[ ]TGGTT AACAGATAAG CCAA TTTGGATGGA ACAATGTCCG HML3AC092364 TTAAAA[ ]TGGTT AACAGATAAG CCAA TTTGGATAGA ATGATGGCTG HML4AC093517 TTGACC[ ]TGGAG AACTTAAAAA CCTG TCTAGGTAGA TGAGTGGCTG HML5AP000870 CGGCG-[ ]TGGCT CTCTCGAGAT CCAC TTTGGGTAGA ACAGTGGCCC HML6AF069508 CTAGAG[ ]TGGCT GACTGACAAA CCTG TGTGGGTGGA TCAATGGCCC HML7AC013722 GCACAA[ ]TGGAT GTGTGAGTCA CCTG TTTGGGTAGA GCAGTGGCCG HML8AL596245 TTACAA[ ]TGGAA ATCTGACACA CCTA TTTGGATTCA GCAGTGGCCG HML9ACO25569 ATTAAG[ ]AGGAA AACTACTAGC CCAG TATGGGTGGA GCAGTGGCCC HML9AC068700 ATCAAG[ ]TGGAA AACTACCAGT CCAG TATGGGTGGA GTAGGGGTCC Oryzomysl CTGAAA[ ]TGGCT GACAGAGAAG CCTA TATGGGTTCC TCAATGGCCT Armadilloll TTAAAA[ ]TGGCT GACAAACAAG CCTA TCTGGGTAGG ACAGTGGCCC Reedbuckl TTAAAA[ ]TGGCT GACAAACAAG CCTA TCTGGGTACA ACAGTGGCCC Muspaharill CTAAAG[ ]TGGTT ATCAAACCAG CCTC TTTGGCAAGA GCAGTAGCCC NileRatil CTTGAT[------]TGGCT CTCTGACGAA GCCA TCTGGGTCCC ACAGTGGCCT Hedgehogl TTAGAT[ ]TAGCT TTCTAATGAG CCTG TCTGGGTGAA 
ACAGTGTCCT Cougarl TTGAAA[ ]TGGAA AACTAACAAA CCTG TATGGGTTGA GCAATGGCCA DomesticCatl TTGAAA[ ]TGGAA AACTAACAAA CCTG TATGGGTGAG CAGTGGCCAA T22 CTTCAG[ ]TGGCT TCATAATAAC CCTG TGTGGGTTAA TCAGTGGCCA Sheepl TTAGAA[ ]TGGAA GTCTGATAAA CCTA TCTGGACAGC TCAGTGGCCC IPSquirrell TTAGAA[ ]TGGAA GTCTGATAAA CCTA TCTGGACAGC TCAGTGGCCC SIMongooseI ATCCAA[ ]TGGAA ATCAGATGAA CCAG TATGGGTCAA TCCATGGCCT MPMV ATCACG[ ]TGGAA ATCAGACGAA CCTG TCTGGGTTGA TCAGTGGCCA SRVI ATCACG[ ]TGGAA ATCAGACGAA CCTG TCTGGGTTGA TCAGTGGCCA SRVII ATTACA[ ]TGGAA GTCAGATGAG CCTG TCTGGGTTGA TCAATGGCCT BabboonSERV ATTACT[ ]TGGAA GTCAGATGAG CCCG TTTGGGTTGA TCAGTGGCCT MusD ATTCAA[ ]TGGCG TAATGATATT CCCG TGTGGGTAGA TCAGTGGTCT TvERV ATTACA[ ]TGGAG ATCTGAGACT CCCG TCTGGATTGA CCAGTGGCCC SMRVH ATTTCC[ ]TGGAA AATTACAGAC CCTG TGTGGGTTGA TCAGTGGCCA Colobusl ATCACT[ ]TGGAA GTCAGATGAG CCCG TTTGGGTTGA TCAATGGCCA Simongoosel ATTACT[ ]TGGAA GTCTGATGAC CCTG TGTGGGTTGA TCAATGGCAT Goatil ATTATT[ ]TGGAA AAGTGATGAT CCTG TCTGGGTTGA TCAGTGGCCC OstrichD ATTAAG[ ]TGGAA ATTTGAGTCT CCAG TGTGGGTAGA TCAATGACCC Lorisl ATTACT[ ]TGGAA ATCCAATGAA CCGG TGTGGGTCAA TCAATGGCTG SIMongooseII ATCTCT[ ]TGGAA ATCCCATGAT CCTG TTTGGGTTGA TCAATGGCCT Jaagsiekte ATCGAT[ ]TGGAA ATCTGAGGAA CCGG TATGGGTCGA TCAGTGGCCC RDolphinl ATTACT[ ]TGGCT GACTGATGAC CCTG TATGGGTGGA CCAATGGCCC WFDeerI ATTGCC[ ]TGGAA AAGTGACGCG CCGG TCTGGGTCGA TCAATGGCCT Cariboul ATTACC[ ]TGGAA AAGTGACGCA CTGG TCTGGGACGA TCAATGGCCT GiraffeI ATTACC[ ]TAGAA ATCTGATGAT CTGG TATGGGTGGA GCAATGACCC Bison' ATTACA[ ]TGGAA ATCTCAAGAC CCAG TGTGGGTAGA ACAATGGCCT Muskoxl ATCACA[ ]TGGAA ATCTCAAGAT CCGG TGTGGGTAGA ACAATGGCCT CFBadgerl ATCGAG[ ]TGGAA ATCTGATGAC CCTG CATGGGTGGA TCAGTGGCCC HRVS TTGTCA[ ]TGGCT GGACAACAAG CCAA AGTGGATACC ACAGTGGCCC Goatl ATTACT[ ]TGGAA AAACAATGAT GCTG TTTGGGTACC CCAGTGGCCC IAPSHamster ATACCA[ ]TGGAA AACAGAGGAG CCGT TATGGGTCTC TCAATGGCCT IAPCHamster ATTACC[ ]TGGAA AACAGAGGAG CCGG TATGGGTTCC TCAGTGGCCA IAPMouse ATACCAI ]TGGAA AACAGGGGAC CCAG TGTGGGTTCC TCAATGGCAC Mastomysl ATAACT[ 
]TGGAA GATGGAGGAT TCAG TTTGGGTTCC CCAGTGGCCA Myomysl ATAACT[ ]TGGAA GATGGAGGAT TCG- TGTGGGTTCC TCAGTGGCCA UromysI ATACCT[ ]TGGCT AACAGAGGAG CCGG TATGGGTTCC TCAGTGGCCT Prairiedogl ATCACC[ ]TGGAG ATCAGAGGAG CCAG TATGGGTGTC TCAGTGGCTG Rabbitl ATCTCT[ ]TGGCT AACAAACACA CCCG TCTGGGTACC ACAGTGGCCC NileRatI ATCACT[ ]TGGAA AACGGGGGAC CCAG TGTGGGTCCC TCAGTGGCCA MMTVBR6 ATATCT[ ]TGGAA GTCAGACCAG CCTG TATGGCTTAA TCAATGGCCC SDunnartl TTAAGA[ ]TGAAA ACATGACTGT CCAT TATGGGGTAG AATCAGTGGT RKangaI TTGCAA[ ]TGGAA GTCGACAGAC CCTA TCTGGGTCGA TCAGTGGCCC


Appendix 2. Nucleotide alignment (continued) Platypusl ATTACA [ ]TGGCT TTCAGACAAG CCTA TCTGGGTGGA GCAGTGGCTG Echidnal ATAACG [ ]TGGGC TTCTGATACA CCAG TATGGGTGGA GCAGTGGCCG Armadillol CTTATC[ ]TGGGA CACAGAGGAA CCTG TTTGGGTTGA GCAATGGCCC PFalconI TTAACC[ ]TGGAA AACTGATACT CCTA TTTGGGTAGA TCAATGGCCC PrairiedogII TTAACC[ ]TGGAA AACTGATACT CCTA TTTGGGTAGA TCAATGGCCC MHarrierl TTAACC[ ITGGAA AACCAATACC CCTA TTTGGGTAGA TCAATGGCCC Vulturel TTGATA[ ]TGGAA AACCACTGCC CCTA TTTGGGTAGA TCAATGGCCC ETinamouI TTGACA[ ]TGGAA AACTAATGCC CCTA TTTGGGTAGA TCAATGGCCC Moorhenl ATCACA[ ]TGGAA AACAGATACC CCTA TTTGGGTGGA TCAATGGCCC HThrushI CTAACC[ ]TGGAA AACCACAACA CCCA TCTGGGTGGA TCAATGGCCC ESOw1I CTAACA[ ]TGGAA AATGGAATCA CCTG TGTGGGTGGA TCAGTGGCCC RNPheasantl TTCATA[ ]TGGAA GTCATCAGAT CCTG CGTGGGTGGG TCAGTGGCCC Peacockl CTCATA[ ]TGGAA GACATCAGAT CCTG CGTGGGTGGA TCAGTGGCCC BluetitII TTATCT[ ]TGGCT AGATAATGAT CCAA TTTGGGTGAA CCAGTGGCCT CMagpiell TTGAAT[ ]TGGCT CACTAATAAA AGGG TCTGGGTGGA TCAGTGGCCT Penguinl CTGACA[ ]TGGCT CACCGATCAG CCAG TGTGGGTGGA TCAGTGGCCC GSKiwiI TTGACC[ ]TGGCT TACAAATAAG CCGC TCTGGGTGGA TCAGTGGCCC LKiwiI TTGACC[ ]TGGCT TACAAATAAG CCTG TCTGGGTGGA TCAGTGGCCC HThrushI TTGCAG[ ]TGGTT GACTAATGCC CCCA TCTGGCAAAA TCAGTGGCCT ES0w1II TTGACC[ ]TGGTT AACAAATAAC CCTG TGTGGGTGGA TCAGTGGCCG WFgoosel CTCACC[ ITGGAA AACTGAGAAG CCAG TGTGGGTGGA TCAATGGCCG NABDuckIII CTCACC[ ]TGGAA AACTGAAAAA CCGG TGTGGGTGGA TCAGTGGCCT FlamingoI CTCACC[ ]TGGAA GATGGACAAT CCAG TGTGGGTAGA TCAATGGCCG GRheaII TTAAAA[ ITGGAA AACTAAAAAT CCGG TATGGGTTGA CCAATGGCCG DRheaI TTAAAA[ ]TGGAA AACTAAAAAT CCGG TATGGGTTGA CCAATGGCCG BrownKiwiI ATTACA[ ]TGGAA AACTGACGAA CCAG TGTGGGTCGA GCAGTGGCCG RBowerIII TTAAAA[ ]TGGCT GACAGAGACA CCAA TATGGATTGA TCAATGGCCG Toucanettell CTCACAI ]TGGAC CACATCAAGT CCAG TGTGGGTAGA CCAGTGGCTG GWoodpeckerl TTAAAA[ ]TGGAA AACAGATGAA CCGG TATGGGTGGA GCAGTGGCCG ES0w1II CTGAAG[ ITGGAA AACTGGCACT CTGG TTTGGGTGGA TCAATGGCCG GRheaI TTGACA[ ]TGGTT AATTGACAAA TCTG TTTGGGTAGA TCAGTGGCCG JQuailI 
CTCTCA[ ]TGAAA AACAGATAAG CCCA TGTGGATGGA TTGGTGGCTG Cassuaryl CTCGCC[ ]TGGAA GACTGATGAA CCAG TATGGGTGAA TCAGTGGCCC AZMagpieII CTCACG[ ]TGGAG GATTGATGAC CCGA TCTGG-TCTC TCAATGGCCC CMagpieIII CTCACG[ ]TGGAG GAC CCTA TATGGGTCTC TCAATGGCCC MThrushl CTGACA[ ]TGGAG GGCAGATGAC CCTG TCTGGGTTTC TCAATGGCCC CMagpieI CTGACC[ ]TGGAA AACTGACACC CCTG TCTGGGTCAA CCAGTGGTCC Guineafowll CTAAAG[ ]TGGAA GTCGGACACC CCCG TGTGGGTCGC CCAGTGGCCC PartidgelV TTGTCA[ ]TGGCG TTCTGACAAG CTGG TGTGGTTCCC CCAGTGGCCC HThrushII TTATCT[ ]TGGAC CACCGAAGAG CCTG TGTGGGTTGC TCAGTGGCCC GPheasantl ] AGTGATAAC CCAG TATGGATTTC CCAGTGGCCC GPheasantII TCAAACGAC CCGA TATGGGTACC TCAGTGGCCC NABDuckI TTGTCT[ ]TGGAA ATCTGATGGT CCTA TCTGGGTGAA TCAGTGGCTG Loonl TTGACT[ ]TGGTC CTCCAATGAA CCTG TGTGGGTTGA CCAGTGGCCC Ostrichl ATTACC[ ]TGAAC TACCAACACA CCGG TATGGGTTAA GCAATGGCCA MoorhenII CTCACA[ ]TGGAA CACTGATGAA CCTG TGAGGGTTAA GCAATGGCCC Goshawkl TTAGTT[ ]TGGAC AAATGATGAA CCTG TTTGGATCGA ACAGTGGCCG Pigeonl CTTAAA[ ]TGGAC TTCAGATGAC CCTG TATGGATCGA GCAGTGGCCG FHawkI TTAGTT[ ]TGGAC TAATGATGGA CCTG TCTGGATCGA ACAGTGGCCG BGrouseI TTGCGA[ ]TGGAT TTCTGATCAT CCCA TTTGGATTGA TCAGTGGCCG BGrouseII TTGACT[ ---- - ]TGGAC AACAAACACC CCAG TCGTGGTTAA GCAATGGCCA PeacockII CTGACT[ ]TGGCT GACTGACAAG CCCG TATGGATCGA GCAGTGGCCG NABDuckIV CTAAAA[ ]TGGAA AACTGATGAA CCTG TGTGGGTTGA GCAGTGGCCA BluetitIII CTCAAA[ ]TGGAT AATCCAGGAA CCAG TATGGGTTGA ACAGTGGCCT Toucanettel CTAACC[ ]TGGAA ATCGTCTGAC CCGG TGTGGGTCGA GCAGTGGCCC NABDuckII CTGACA[ ITGGAA ATCGTCCGAC CCAG TGTGGGTTGA GCAGTGGCCC HThrushIII CTAACC[ ]TGGAA ATCAGTCGAT CCGG TGTGGATCGA GCAGTGGCCC BlueTitI CTTTCT[ ]TGGAA ATCATCCGAC CCAG TGTGGGTCGA GCAGTGGCCC T7 CCTTCT[ ]TAGAA ATCATCCAGC CCGG TGTGGGTCGA GCAGTGGCCC MHarrierll TTGACGI ITGGAA ATCCACTGAA CCTA TATGGGTGGA ACAGTGGCCC Emul CTGACA[ ITGGTG CTCCAATGAG CCTG TGTGGGTAGA GCAGTGGCCA LKiwiII TTAAAA[ ]TGGAA AAGTGACGAC CCGA TATGGACTGA CCAGTGGCCC LDV CTTGAA[ ITGGCA TACTGACGTA CCTG TATGGATAGA GCAGTGGCCC RSV CTCAAA[ ITGGAA GCCAGACCAC ACG CCTG 
TGTGGATTGA CCAGTGGCCC ALVsubgrpJ CTCAAAI ITGGAA GCCAGACCAC ACG CCTG TGTGGATTGA CCAGTGGCCC Tragopanl CTGCGC[ ]TGGAG ACAGGACGTG CGG CCTG TCTGGGTAGA TCAGTGGCCC GuineafowlII CTGCAA[ ]TGGAG ACAGGACGCG CAC CCTG TTTGGGTGGA TCAGTGGCCC LoonII CTGACA[ ]TGGAC CACATCAAGT CCAG TGTGGGTAGA CCAGTGGCCG


Appendix 2. Nucleotide alignment (continued) BLV TTTAAACTAG AACGCCTCCA GGCCCTTCAA GACCTGGTCC ATCGCTCTCT GGAGGCAGGT HTLV1 TTTAAACCAG AACGCCTCCA GGCCTTGCAA CACTTGGTCC GGAAGGCCCT GGAGGCAGGC HTLV2 TTTAAACCTG AGCGCCTCCA GGCCTTAAAT GACCTGGTCT CCAAGGCCCT GGAGGCTGGT HIV1 TTGACAGAAG AAAAAATAAA AGCATTAGTA GAAATTTGTA CAGAAATGGA AAAGGAAGGA HIV2 TTAACAAAAG AAAAAATAGA GGCACTAAAA GAGATCTGTG AGAAAATGGA AAGAGAGGGC SIVagm CTCTCCAAAG AAAAAATAAA AGCCTTACAG GAAATATGTG ACCAATTAGA GAAAGAAGGA FIV TTAACAAATG AAAAAATTGA AGCCTTAACA GAAATAGTAG AAAGACTAGA AAGAGAAGGG BIV TTGACAAAAG AAAAGTATCA GGCTCTTAAG CAAATTGTGA AAGATCTTTT AGCAGAAGGA Jembrana CTCACATTAG AAAAATATAA AGCCCTTAAG GAAATTGTTG AGGAACTACT AAAAGATGGA CAEV TTAACAGAAG AGAAATTAAA AGGTCTAACA GAAATCATAG ATAAATTAGT GGAAGAAGGA Visna TTGACGCAAG AAAAATTAGA GGGATTAAAA GAAATAGTAG ACAGATTAGA GAAGGAAGGG EIAV CTCACTAAGG AGAAACTAGA AGGGGCTAAA GAGATAGTCC AAAGACTATT GTCAGAGGGA HERVK10 CTACCAAAAC AAAAACTGGA GGCTTTACAT TTATTAGCAA ATGAACAGTT AGAAAAGGGT HML1Z70280 CTGAAACAGG AAAAACTGGA GGCTTTAAAA GAACTGGTGC AGGAACCATT GCAAAAGGGA HML3ACO25577 TTAAGTAAAG AGAAACTGGA GGCTTTAGAG AAATTAGTTA CTGAACAATT AGAAAATGGG HML3AC092364 CTAAATAAAG AAAAACTGGA GGCTTTAGAG ATTTTAATTA CTAAACAATT AGAAAATGTG HML4AC093517 CTCCCAAAAA ATAAGCTGGA GGCACTTCAT ATTTTGGTTC TTGAACAGTT AAAATTGGGA HML5AP000870 TTAAAGGGAG AG--ATTACA AAGACCCCAT GAATTAGTTG AGGAGCAATT AAAAGCCAGC HML6AF069508 CTATCACAGG AGAAAGCTGA TGCAACTCCA TCAGCTAGTG AGAGAGCAAT CTGGAAGCAG HML7AC013722 CTTTCCAAAC ACAAGTTGGA GGCTTTAACT GAAATTGTTA ATGATTTACT ACAAGCAAAC HML8AL596245 CTTTCTAAAG AAAAACTGGA GGATTTAACT CAATTGGTTT CTGAACAGTT ACAACTTGGA HML9ACO25569 ATTAAGAAGG AAAAATTGGA ACATATTCAA CGTCTAGTAC AAGAACAA-- -CATGCTGGC HML9AC068700 ATTAAGAAGG AAAAACTGGA GCATATTCAA TGTCTAGTAC AAGAACAACT AGATGCTGGC Oryzomysl CTAACAAAAG AGAAATTGCA GGCCTTAGAG CAGCTGGTGC AGGAGCAGTT AAATGCTCAG Armadilloll CTTCCGAAAG AAAAATTGGA AGCCCTGAAA GAATTAGTGC AGGAGCAGTT AATGGCAGGG Reedbuckl CTTCCAAAAG AAAAATTGGA AGCCCTGAAA GAATTAGTGC AGGAGCAATT AATGGCAGGG 
Muspaharill ATAACAAAAG AAAAATTGCA GGTATTTGAA CAACTGGTGC AAGAGCAGTT GGAGGCTCAG NileRatli CTATCAAGGG AGAAGCTTAG AGCCCTCAGA GAACTGGTAG AGGAACTGCT GTCTCTACAA Hedgehogl TTGCCTAGGG ATAAGCTAGA AATTTTAAAA GAGCTCGTAT GAGAGCAGTT GTCCTTGGGA Cougars GCAAAAGGGG ACTGAATGGC TCATCTCCAA CAATGAGTGG ATGAACAATT AAAAGGCAGG DomesticCatl TTAAAGGGG- ACCGATTGGT TCATCTCCAA CAATTAGTGG ATGAACAATT AAAAAGCGGG T22 CTGTCCACAG AAAAATTAAA TGCTTTAAAA GCCTTAGTGA ATGATCAACT AACCGCAGGT Sheepl CTATCAAAAG AGAAACTGTC CGCTTTGCAT ACTTTGGTGG CTGAACTACT ACAACAAAAT IPSquirrell CTATCAAAAG AGAAACTGTC CGCTTTGCAT ACTTTGGTGG CTGAACTATT ACAACAAAAT SIMongooseI CTTACTTCTG AGAAATTGCA GGCGGCCAAA GAGATTATTT CACAACAATT AAAAGATGGC MPMV TTAACCAATG ACAAACTTGC TGCTGCCCAA CAGTTAGTGC AAGAACAGTT AGAGGCAGGA SRVI TTAACCAGTG AAAAACTTGC TGCTGCCCAA CAGTTAGTGC AAGAACAGTT AGAGGCAGGA SRVII CTAACTCAGG AAAAACTTGC TGCTGCCCAA CAGTTAGTGC AAGAACAATT GCAGGCAGGG BabboonSERV TTACTCAATG ATAAATTAAG TGCCGCCCAA CAGTTACTGC AGGAACAACT GGAAGCAGGA MusD TTACCTAAAG AGAAAATAGA GGCACCTTCT TTGCTAGTGC AGGAGCAGTT AGAAGCAGGA TvERV CTCCCTAAAG AAAAACTAGA GGCTGCAAAT ATGTTAGTTC AACAACAATT GACTGCGGGT SMRVH CTTACATATG AGAAAACCCT CGCTGCCATT GCGTTAGTAC AGGAACAGCT CGCAGCAGGA Colobusl TTAACCAATG AGAAGCTTGC CGCTGCCCAA CAGTTAGTGC AGGAACAATT AGAGGCAGGA SImongooseI TTATCTGATG AAAAGATACT GGCTGCACAG CAGCTGGTAA AAGAGCAATT GGATGCTGGA GoatII CTCCCAGAAG TCAAAGTGAA CGCGGCTATG GAGCTTGTGC AGGAGCAACT TGCTGCGGGG OstrichD TTAACTGAAC AAAAACTCGC AGCTGTCACG GCGTTGGTGC AGGAACAGCT AGCTGCTGGA LorisI TTAACTACAG AAAAATTAAC TGCAGCAGCA ACGTTAGTAC AGGAACAGCT TGCTGCCGGC SIMongooseII TTAACTACTG AAAAACTGGC TGCAGCTACA GAGTTAGTAC AGGAACAACT TGCTGCAGGA Jaagsiekte CTAACACAAG AAAAACTTTC TGCCGCACAA CAGCTGGTGC AGGAACAGCT GAGACTTGGT RDolphinl CTCACAAAGG AGAAATGAGA AGCTGCAGAA CAATTAGTAC TGGAACAATT A GGG WFDeerI CTTACAAAAG AAAAAATATT AGCCACAGAA CAATTGGCAC AGGAACAATT GGCGCTTGGT Cariboul CTCACAAAAG GAAAAATATT AGCTGCAGAA CAATTGGTGC AGGAACAAAT GGCAATTGGC Giraffel TTAACAGCTG AAAAGCTGCA GGCAGCTGAG GATTTAGTTA 
TGGAGCAACG GGTGGCCGGC Bisonl CTGCCTCAGG AAAAATTACT GACAGCTAAA ATGTTAATTT CTGAACAACT GCAACTGGGA Muskoxl CTGCATAAGG AAAAATTACT GGCAGCTAAA GCATTAATCT TTGAACAATT AGAATTGGGA CFBadgerl TTGCCTTCAG AAAAGCTGGC AGCAGCGGCT ATCCTAACAG AAGAGTAGCT GCTTTGGGGT HRVS CTTACCCAGG AAAAGCTGGC TGCGGTAAAT GATATAGTGT TACAACAATT AGAGGCAGGC GoatI TTATCCAAAG AAAAGATCCA AGCAGCCCAA CAATTGGTTG AAGAACAATT AAAAGCAGGA IAPSHamster CTATCCTCTG AAAAATTAGA GGCTGTCACA AGATTAGTGC AAGAACAGGA ACGGCTGGGG IAPCHamster CTTTCCTCTG AGAAACTGGA AGCTGCTAAG ACTCTAGTGC GGGAGCAGCT GGATCTGGGG IAPMouse CTATCCTCTG AAAAACTAGA AGCTGTGATT CAACTGGTAG AGGAACAATT AAAATTAGGC Mastomysl CTATCTTCTG AAAAATGGTT GCAGCAAAAT AGTTGGTGG- CTGAACAGAT GTCCTTAGGA MyomysI CTTTCCTCTG AAAAGTTGAG TGCTGCCAGA GAGTTGGTGG CTGAAGAGAT GTCCTTAAAG Uromysl CTTTTCTCTG AAAAACTTGA GGCCGCGACA AAGTTAGTTC AGGAACAACT TGCCCAAGGC Prairiedogl CTCTCCTCTG AAAAATTGAC TATGGCTCGT ACCCTAGTAC AAGAACAATT ACAGTTGGGA Rabbitl CTATCTCAGG AAAAACTGGA GGCAGTCAAC GCGCTGGTCT TAGAGCAAGT CAACGCTGGC NileRatI TTGCCTATAG AAAAGTTACA GGTGGCAAGA GCCTTAGTGC AAGAACAATT AGAGGCAGGA MMTVBR6 CTTAAACAAG AAAAGTTACA GGCTTTACAA CAGTTAGTGA CAGAACAATT ACAACTGGGC SDunnartl TCCTTTCAAA TGAAAAACTC ACTACCTTAA AATCCATCAT TATGGAACAG GAGAAACTGG RKangal CTAACCCATG AAAAACTTCT CGCCCTCCGA GAAATAGTAT CGCAACTCTT AAAGGAGGAC

259 Appendix 2. Nucleotide Alignment of Class II ERV pol fragments

Platypus' CTAACAAAAG AAAAGCTACA GGCTCTAGAA CATATTGTGA AAGAACAGTT ACATGCAGGA Echidnal CTACCAACCC CGAAATTGGC TGCTTTACAA GTGATAGTCG CGGAACAATT AGCGGCGGGG Armadillo' CTACCGTCCC ATAAATTAAC AGCTTTACAT GAATTAGTAA AAGAACAGTT ACAACTTGGG PFalconI CTACCTTTAG AAAAGCTTCG CGCCCTTCAG GAGCTGGTCA CAGAACAGCT CGAAAAAGGA PrairiedogII CTACCTTTAG AAAAGCTTCG CGCCCTTCAG GAGCTGGTCA CAGAACAGCT CGAAAAAGGA MHarrierI CTACCACTGG AAAAGCTTCG CGCCCTCCAA GAATTAGTCA CGGAACAGTT AATGAAAGGA Vulture' CTACCTCTTG AAAAGCTTCG CGCTCTCCAA AACCTGGTTG CGGAACAAGT AACGAAAGGG ETinamoul CTGCCTCTTG AAAAGCTTCG CGCTCTCCAA GACCTGGTTA CGGAACAGTT AACAAAAGGG Moorhenl CTTCCTATTG AGAAATTGCG TGCTTTAAAT AACCTAGTAG AGGAACAATT CCAAAAGGGT HThrushl CTGCCATTAC TTAAATTGCA TGCACTCACT GATCTTGTCC AAGAACAACT TCAAAAGGGA ESOwlI TTGCCAATGG AAAAGTTGCA CCACCTTAAT GATTTAGTTC AGAAACAATT AAGGCAGGAT RNPheasantl CTCAGTAGTG AAAAGCTGCA TCACTTGCAT GAACTCATTC AGGAACAGTT C Peacockl TTAGTAGT-G AAAAGCTGCA CCACTTGCAT GAGCTGCTTC AGAAATGGTT AGCAGCTGGC Bluetitli CTCAGCAAAC AAAAACTTGG GGCGCTAGAA AAATTAGTAG AAGAAGAATT GGCTAAGGGA CMagpiell TTAAAAACAC ACTAATTGAG AGCACTCACT GCACTCGTGG AGGAGCAGCT GGGGAAGGGG Penguinl CTATCACAGG AAAAGGCCGA CGCGTTGCGA ACGTTGGTGG ACGAGCAAGT GAGTCAAGGA GSKiwiI CTGACAGTAG AAAAGGCCGA GGCATTACAA TTGCTAGTGC AGGAGCAACT TCGTCAGGGA LKiwiI CTGACAGTAG AAAAGGCTGA GGCATTACAA TTGCTAGTGC AGGAGCTACT TCGTCAGGGA HThrushl TTATCCAAAT CCAAATTAAT TGCCCTTCGT CAATTAGTTC AAGAACAATT GCAACAGGGA ESOw1II CTGTCAATTG AGAAATTAAA GGCCCTGCAG GAGTTGGTGA TGGAACAGCT AGCTGTTGGA WFgoosel CTTCCAGTGG AAAAGGTCGC TGTGCTCCAA GAACTTATTA AAGAGCAATT GAAATTAGGA NABDuckIII CTCCCCGCAG AACAGGTCGC AGTGCTATAA GAACTTGTTC AGGAGCAATT GAAAGCAGGA FlamingoI CTCACGGCAG AGAAGGTCGC AGCCCTCCAT GAACTTGTTC AAGAACAACT GCAATTAGGA GRheaII CTGACAGAAG AGAAGCTGCG ACACGCCGAG GAGCTGGTAA AGGAGCAGTT AAGGGAGGGT DRheaI CTGACAGAAG AGAAGCTGCG ACATGCCGAG GAGCTGGTAA AGGAGCAGTT AAGGGAGGGT BrownKiwil CTGCAAAAGG AGAAGCTAAG GGAACTTAAC AAATTGGTAG AAGAACAGCT ACAATTGGGA RBowerIII CTGTCTAAAG AAAAGCTCGC CCATGTTGAA 
GAATTGGTTG AGGATCAGTT GCGAAGGGGA Toucanettell CTGAAAGAGG ACTGGCTGCA GATCATTCAA ACATTAGTTC AAGAGCAGTT AGAGGCAGGT GWoodpeckerI CTCAGACAGG ATAGGCTGTA AATTGTTAAG ACTCTAGTGC AGGAACAATT GCAATTAGGG ES0w1II CTAACCCATG AAAAAGTCGC TAAATTGCAG GAATTGGTAC AAGAGCAATT GGCTAAGGGA GRheaI CTTACCCGAG AAAAGTTGTC CCATCTGCAT GTGCTAGTTC AGGAACAACT TCAGTTAGGG JQuailI CTAACAGAGG AAAAGGTCGT CATTCTTAAC ACCTTAGTTC AAGAATAACT AGATAAGGGA Cassuaryl TTAAAAGCAG AAAGGCTGCA AAAGTTACAC GGGCTGGTGC AGGAGCAGCT CTGTGCAGGG AZMagpieII CTATGTCAGG AATGGCTACA GAACCTACAG CAGCTTGTAG ATGAACAGCT GTCTCTTGGC CMagpieIII CTATGTCAGG AACATCAGGA ACAGCTACAG AACCTACAGG ATGAACAGCT GTCTCTTGGA MThrushl CTGTATAAGG GACAGCTTCA AAACCTACAG CAACTCACAG ATGAACAGCT TTCTCTTGGG CMagpiel CTACCAGATC ATAAGCTGAG TGCCCTCAAA AAACTAGTGG CGGAACAACT GCAAAAAGGC Guineafowll CTGAGTACTG AAAAGTTGAC CCACCTACAA GAACTGATCG AAGAGGAAGT GACGTGTGGT PartidgelV CTAAGTCGCG AGAAATTGAC CCACTTGCAG GAATTCGTGA ACCCGGAGGT TGCAGCAGGC HThrushll ATCTCCCAAG AAAAAATAGA GCATCTACAA GAATTGGTAA ATCAACAATT GAGTCAAGGT GPheasantI CTTGCTCAGG AAAAGGTGAA AGCTGTTAAC GAATTAGTCA ATCGGGAGGT CGCTATGGGA GPheasantll CTTACCCAAG AAAAATTACA AGCCTTGTAT GACTTGGTAG CCCAACAGGT GCAGGAGGGG NABDuckI CTATCTCAAG AAAAATTAGA GGCGCTAGAG GAACTGGTTA AGCGAGAGGT TGCATTAGGA Loonl ATGACCGAGG AGTGACTGCA AATCGCTAAA CAATTAGTTG CCGAACAACT TGCAGCTGGC Ostrichl TTAAAACGGG AAAGTCTATT TCATGCTCAT CAATTGGTCA TAGAGCAATT TCAACAAGGT MoorhenII CTAAAAAGGG AAAGCCTGAT TCAAGCACAC CACCTAGTAG AGGAGCAATT CCAACAAGGA Goshawkl CTAAAGAAGG AAAGTCTGGA ACAAGCAGAA ATGTTGGTAG ATGAACAATT AGCATTGGGA Pigeonl TTGAAAATGG AAAGTCTTGC AGCGGCACAT CAGTTGGTAC AAGAACAATT TGATCAAGGA FHawkI CTAAGGCAGG AAAGTCTGAA ACAAGCAGAA ATGTTGGTAG ATGTACAATT AGCACTGGGA BGrousel TTGAAACAGG AAAGCCTGCA GCATGCCAGT CAGTTGGTAC AAGAGCAATT GCAGCAAGGG BGrouseII CTAAAATGGG AAAGTCTACA ACAGGCGCAC CTGCTGGTTC AAACTCAGTT CAATCAGGGA PeacockII TTGAAAAAGG AAAGCTTGCA TCAGGCAACC CTATTGGTTG CGGAGCAGTT GGATCAAGGG NABDuckIV CTAAAACGGG AAAGTCTAAT ACATGCTCAC GGACTAGTAC AAGAGCAGTG 
TCAACAAGGG BluetitIII TTAAAAAAGG AAAGTCTGCA ACAAGCTCAT CAATTAGTGC TGGAACAATA TCAACAAGGA Toucanettel CTATCAAAGC CTCGAATGGA TGCTCTGATA GAGCTACTTG ACTGCGAACT ACAGCAAGGT NABDuckII CTGTCGAAGC CTCGAATGGA TGCTCTTCTA GAGCTAGTCG ACCGCAAACT ACAACGAGGT HThrushIII TTGTCCAAGC CTCGAATGAC TGCCCTCCTT GAGCTAGTTA GCCGTGAGCT GCACAAGAAT BlueTitI CTGTCCAAGC CTCGATTAGA TGCCCTTCTT GAGTTAGTTA ACCGTGAATT ACAACAAGGT T7 CTGTCCAAGC CTCGATTAGA TGCCCTTCTT GAGTTAGTTA AGCGCGAATT ACAACAACGT MHarrierII CTACCTAGAG AAAAGCTGGA AGCGGCACAC AAAATCGTAC AAGAAGTAGT ACAGAAAGGG Emul TTGACAGGGG ACAAATTGGC GGCTGCCCGA CAAATAGTAA GTAGAGAATT AGAGCAGGGA LKiwiII CTGACTTCAG AAAAATTGGA AGCGGCGCAG AAGATAGTGG CTCGCGAACT GGAGCAAGGG LDV TTAACTGCCC AAAAACTTGA CGCTGTGCAA AACATTATTC AAGACCTGCT AAAAGATGGT RSV CTCCCTGAAG GTAAACTTGT AGCGCTAACG CAATTAGTGG AAAAAGAATT ACAGTTAGGA ALVsubgrpJ CTTCCTGAGG GTAAACTTGT AGCGCTAACG CAATTAGTGG AAAAAGAATT ACAGTTAGGA Tragopanl TTACCGACGG AAAAGCTTGC AGCGCTTAGA ATGCTTATTA GACAAGAGCT TCGCTTAGGA GuineafowlII CTAACCCAAG GAAAGCTTGC CGCGCTTCAA CAGCTAGTTA CTCAGGAACT TTGGCTAGGG LoonII CTGAAAGAGG ACTGGCTGCA GATCATTCAA ACATTAGTTC AAGAACAGTT AGAGACAAGT


Appendix 2. Nucleotide alignment (continued) BLV TATATCTCC[- ]CCCTG GGACGGGCCA GGCAATAATC CAGTCTTCCC GGTACGGAAA HTLV1 CATATCGAA[- ]CCCTA CACCGGGCCA GGAAATAACC CAGTATTCCC AGTTAAAAAA HTLV2 CACATTGAA[- ]CCATA CTCAGGACCA GGCAATAACC CCGTCTTCCC CGTTAAAAAA HIV1 AAAATTTCA[A AAATT]GGGCC TGAAAATCCA TACAATACTC CAGTATTTGC CATAAAGAAA HIV2 CAGCTAGAG[G AGGCAJCCTCC AACTAATCCT TATAATACCC CCACATTTGC AATCAAGAAA SIVagm AAAATTAGC[A AGATA]GGAGG AGAGAATGCA TACAACACTC CAGTGTTTTG CATAAAGAAA FIV AAAGTAAAA[A GAGCA]GATCC AAATAATCCA TGGAATACAC CAGTATTTGC TATAAAAAAG BIV AAAATTTCC[G AAGCT]GCTTG GGATAACCCA TATAATACCC CAGTTTTTGT TATAAAGAAA Jembrana AAAATCTCT[A GAACC]CCTTG GGATAATCCT TTCAATACCC CTGTTTTTGT AATAAAAAAG CAEV AAACTAGGA[A AGGCA]CCCCC ACATTGGACA TGTAATACTC CAATCTTTTG CATAAAAAAG Visna AAAGTAGGA[A GAGCG]CCCCC ACACTGGACT TGTAATACCC CTATATTTTG TATTAAGAAG EIAV AAAATATCA[G AAGCT]AGTGA CAATAATCCT TATAATTCAC CCATATTTGT AATAAAAAAG HERVK10 CACATTGAGH ]CCTTC GTTCTCACCT TGGAATTCTC CTGTGTTTGT AATTCAGAAG HML1Z70280 CATATAGAG[- ]CCTAC TTTCTCCCCT TGGAATTCTC CTGTATTTGT CATTAAGAAA HML3ACO25577 CACATAGCT[- ]CCAAA ATTTTCTCCT TGGAATTCTC CAGTTTTCAT AATTAAGAAA HML3AC092364 CACATAGCT[- ]CCAAT ATTTTCCCCT TGGAATTCTC CAGTTTTTGT AATTAAGAAA HML4AC093517 CACATTGAA[- ]CCTTC TTTTTCTCCT TGGAATTCAC CTGTTTTTGT TATTCAAAAG HML5AP000870 CATATAGAA[- ]CCATC AAACAGCCCT TGGAATTCAC CCATTTTCGT CATTCCCAAA HML6AF069508 AACATACAG[- ]AATAG TCAGCCACCT GTATTTGTGA TCCAAAAATG TCAGGA---- HML7AC013722 ACTACCGAG[- ]CCCTC CTGGTCTCCA TGGAACTCAC CTGTGTTTGT TGTACAAAAG HML8AL596245 AATGTGAAAN ]CCTTC TCTTTCCCCC TGGAATTCTC CTGTGTTTCT AGTAAAAAAG HML9ACO25569 CACATTGAG[- ]CCTCC TGCTAGTCCC TGGAACACTC CTATTTTTAC TATTCCAAAG HML9AC068700 CACATTGAG[- ]CCTAC TACCAGTCCC TGGAACACTC CTATTTTTAC TATTCCAAAA OryzomysI CATATAGAA[- JGAATC AACCAGCCCT TGGAATTCTC CTGTATTTGT TATTAAAAAG ArmadilloII CATATTGAGH ]CCCCC TACTAGTCCG TGGAATTCTC CAGTTTTTGT AATTAAAAAG Reedbuckl CATTTTGAGH ]CCCTC TACTAGTCCA TGGAATTCTC CAGATTTTGT AAATAAAAAG Muspaharill CAGATAGAA[- ]GAGTC TTACCAGCCA 
TGGCATCCCC CTGAGTTTGT TGTAAAAAAG NileRatil CACATCTGCH ]CCTTC CCACAGTCCA TGGAACACCC TCGAGTTTGC TATAAAAAAA HedgehogI CACATTCGT[- ]CATTC TCAAAGCCCA TGG-ATACTC CTGTCTTTGT GATTAAAAAG CougarI CATAGAGAGH ]CCTAG TAGGTGCCCT TGGAATACTC CTATCTTTTG TGTTCCTAAA DomesticCatI CATACTTGGI- IAATAC TCCTTGGCCT TGGAATACTA TGTTTTTGTG TTCCTAAAAA T22 CATTTAGAA[- ]GTTAG TACCAGTCCA TACAACACTC CAGTTTTTGT CATTAAGAAA Sheepl AGAATAGAAN ]GGTAC TTAATCACCA TGGAATTCAC CAATTTTTGT CATTAAAAAG IPSquirrell AGAATAGAA[- ]ACTAC TCAATCACCA TGGAACTCAC CAGTTTTTGT CATTAAAAAG SIMongooseI CATATTGAG[- ]CCTTC ACAATCCCCT TGGAATTCTC CTATTTTTGT CATAAAAAAG MPMV CATATTACT[- ]GAAAG TAGTTCTCCC TGGAACACTC CCATATTTGT TATAAAAAAG SRVI CATATTACT[- ]GAAAG TAATTCCCCT TGGAACACTC CCATATTTGT TATAAAAAAG SRVII CATATTATA[- ]GAAAG TAATTCCCCC TGGAATACAC CTATATTTGT CATAAAAAAG BabboonSERV CATATTATA[- ]GAAAG TAATTCTCCT TGGAATACAC CTATTTTTGT TATTAAAAAG MusD CATTTGGTG[- ]GAGTC TCATTCTCCC TGGAATACGC CCATTTTCAT TATCAGGAAG TvERV CACATAGAA[- JCCCTC CAACTCTCCT TGGAATACTC CAATTTTTGT TATTAAAAAG SMRVH CATATTGAG[- ICCCAC AAATTCTCCA TGGAATACTC CTATATTCAT CATTAAGAAA Colobusl CACATTACA[- IGAAAG CAACTCTCCT TGGAACACCC CCATATTTGT CATAAAAAAG SImongooseI CACTTAGTA[- ]GTAAG CAAGTCCCCT TGGAATACAC CTATATTTGT CATAAAAAGG GoatII CACATTGAA[- ]CCTTC CACCTCCCCA TGGAATACTC CTATTTTTGT TATAAAAAAG OstrichD CATTTGGAA[- ]CCAAC CACCTCTCCC TGGAATACTC CTATATTTGT AATTAGAAAG Lorisl CATATTGAA[- ]CCTAT TACTTCCCCT TGGAATACCC CTATTTTTGT CATAAAAAAG SIMongooseII CATATTGAG[- ]CCCAC TACTTCCCCC TGGAACACCC CTATCTTCGT CATTAAGAAA Jaagsiekte CATATTGAAP ]CCTTC TACTTCTGCT TGGAATTCTC CAATTTTTGT TATAAAAAAG RDolphinI CATTTAGAA[- ]ATTTC CAGTAGTCCT TGGAATACTC CTATTTTTAT TATCAAGAAA WFDeerI CATATTGAA[- ]CACTC TAATTCGCCA TGGAACTCTC CTTTTTTTGT AATTAAAAAG Caribous CATATTGAA[- ]CACTC TAATTTGCCA TGGAACTCTC CTATTTTTGT AATTAAAAAG Giraffel CATATAGAGN ]CCCTC TAATAGCCCC TGGAATACTC CTATTTTTGT TATTAAGAAG Bisonl CATATTGAG[- ]CCCTC CACTAGTCCC TGGAACACTC CTATTTTTGT TATTAAGAAA Muskoxl CATATTGAGP ]CCCTC CGATAGCCCC 
TGGAACACTC CTATTTTTGT CATTAAGAAA CFBadgerl CATTTAAAG[- ]GAGTC CTGTAGCCCT TGGAATACTC CTATATTTGT GATAAAGAAA HRV5 CATTTGCAA[- ]CCATC AACCTCCCCA TGGAATACGC CAATATTTGG GATAAAGAAA Goatl CATATCACT[- ]CCTTC CACCTCCCCT TGGAATTCTC CTATTTTTGT AGTAAAAAAG IAPSHamster CATCTAGAG[- ]CCTTC TACCTCCCCA TGGAATACAC CAATTTTTGT TATTAAAAAG IAPCHamster CATATAAAA[- ]TCCTC TGTATCTCCA TGGAATACTC CTATTTTTGT CATTAAGAAA IAPMouse CATATTGAC[- ]CCTTC TACCTCACCT TGGAATACTC CAATTTTTGT AATTAAGAAA MastomysI CACGTAAAA[- ]CCATC TGTGTCACCC TGGAATACCC CTATCTTTGT CATTAAAAAG Myomysl CATATAAAA[- ]CCATC TGTGTCTCCC TGGAATACTC CTATTTTTGT CATTAAGAAG Uromysl CATTTAGAA[- ]CCCTC GACCTCGCCT TGGAATACAC CCATTTTCGT CATTAAAAAG Prairiedogl CATATTAAG[- ]CCTTC TACTTCCCCA TGGAATACGC CCATATTTGT TATCAAGAAA Rabbitl CATTTGATT[- ]CCATC TACCTCTCCA TGGAATACGC CTATTTTTGC CATACGCAAA NileRatI CATATAGAA[- ]CCATC CCAATCCCCC TGGAACACTC CAATCTTTGT AATAAGAAAG MMTVBR6 CACTTAGAA[- ]GAGAG CAATAGCCCT TGGAATACGC CTGTTTTTGT CATTAAAAAG SDunnartl GCCATA---[T T ]GAGCA TTCCTTTAGT GCTTATAACT CACCTGGTTT TGTAATTAGA RKangal CATATTGAA[- ]CCTTC AGTCAGCCCA TATAACTCCC CTGTCTTTGT GATCAAAAAG


Platypusl CATATTGAG[- ]GAGCC CACCAGTCGG GGGAATTCTC CAGTATTGGT AATAAAAAAG EchidnaI CATATTGAA[- ]CCCTC CGATAGCCCT TGGAATTCCC CTGTCTTTGT GATAAAAAAG Armadillol CACATAGAA[- ]AAAAG TTACAGCCCT TGGAACTCCC CTGTTTTTGT AATTCAAAAA PFalconI CACATTGTG[- ]CCTTC TACCAGCCCA TGGAATTCTC CTGTTTTTGT TATCAGAAAA PrairiedogII CACATTGTG[- ]CCTTC TACCAGCCCA TGGAATTCTC CTGTTTTTGT TATCAGAAAA MHarrierl CACATTGTG[- ]CCTTC TACTAAACCC TGGAATTCAC CTATTTTTGT TATTAAGAAA VultureI CACATAGTT[- ]CCCTC AACCAGTCCA TGGAACTCTC CTGTGTTTGT TATTAAAAAA ETinamoul CACATAGTC[- ]CCCTC GACCAGCCCA TGGAACTCCC CTGTATTTGT TATTAAGAAA Moorhenl CATATCGTA[- ]CCCTC TGTCAGTCCC TGGAATTCTC CTGTTTTTGT TATCAAAAAA HThrushl CACATAGTC[- ]CTGTC CACCAGTCCC TGGAATTCCC CTGTGTTTGC AATTAGAAAA ESOwlI GTATTGTAC[- ]CTACC AC-CAGTCCT TGGAACTCCC CAGTGTTTGT TATTTTAAAG RNPheasantl CACGTAATT[- ]CCTAC TGCCAGCCCA TGGAGCTCTG CAGTGTTTGT GATTAGGAAG Peacockl CACATAATT[- ]CCTAC TACCAGCCCA TGGAACTCTC CACTGTTTGT GATTAGGAAG Bluetitil CACATAGTA[- ]GAAAC AACCAGCCCT TGGAATTCCC CGGTGTTTGT AATAAAAAAG CMagpiell AACACTGAG[- ]CCTTC CAACAGCGCT TGGAACTCCC CGGTCTTTGT GGTCCGGAAG Penguinl CATCTGGTT[- ]CCCAC CACCAGCGCC TGGAACACCC CAGTGTTTGT TATAGAAAAG GSKiwiI CACCTGCAG[- ICCAAC CACCAGCCCG TGGAATTCAC CCGTCTTTGT GATCAAGAAA LKiwiI CACCTGCAG[- ]CCAAC CACCAGCCTG TGGAATACAC CTGCCTTTGT GATCAAGAAA HThrushl CACATTGAG[- ]CCTTC TAACAGTCCC TGGAATACCC CTGTCTTTGT AATACAGAAG ESOw1II CTTATTGAG[- ]CCCTC TCACAGCCCA TGGAATAATC CTGTGTTTGT AATTAAAAAG WFgoosel CATATCACCP ICCCAC AACTAGTCCT TGGAACTCAC CAATATTTGT GATCAAAAAG NABDuckIII CACATAGTC[- ]CCAAC CACCAGTCCA TGGAACTCGC CCGTGTTCGT AATTAAAAAA FlamingoI CACATTACC[- ]CCTAC GAACAGCCCC TGGAATTCAC CTGTGTTTGT CATAAAAAAG GRheaII CATATTAAA[- ]CCTTC TACTAGCCCT TGGAATACAC CTATTTTTAC TATTCAGAAA DRheaI CATATTAAA[- ]CCTTC TACTAGCCCT TGGAATACAC CTATTTTTAC TATTCAGAAA BrownKiwil CATATTCAA[- ]CCTTC AACCAGCCCA TGGAACACAC CTATCTTTAC CATACAGAAA RBowerIII CATGTCATT[- ]CCTTC TACAAGTCCC TGGAATACCC CTATTTTTGC AATCCCTAAG Toucanettell CGCATTGTT[- ]CCCTC AACCAGTCCA 
TGGAATACTC CTATTTTTAC CATTCCGAAG GWoodpeckerl CATATAATT[- ICCATC CACCAGTCCA TGGAATACTC CAATCTTCAC CATTCCAAAA ESOw1II CATATTAAGF- IGCCAC TACAAGTCCA TGGAACACTC CTGTATTTGT TATCCCAAAA GRheaI CATTTAGAGI- ]GAATC TTTCAGCCCT TGGAATACCC CCGTATTCGT AATTCCTAAA JQuailI TATATTATG[- ICCAAT AACTAGCCCC TGGAACACTC CAGTATTTGT AATACCTAAG Cassuaryl CATATAGTC[- ]CCCTC CACCAGTCCG TGGAACACAC CTGTATTTGT TATTCCAAAG AZMagpieII CACATACAG[- ICCATC CTCTAGCCCT TGGAACACTC CTGTATTCTG CATTCCCAAG CMagpieIII CACATACAG[- ICCATC CTCTAGCCCT TGGAACACTC CTGTATCCTG CATTCCCAAG MThrushl CACATCCAG[- ]CCATC CTCCAGCCCT CTGAAAATTC CTGTATACTG CATGCAAGAA CMagpiel CACATTACA[- ]CCTAC CAACAGTCCC TGGAACTCAC CTGTGTTTGT AATCCATAAG Guineafowll CACCTAGTC[- ]CCATC CTTTAGTCCC TGGAATAGTC CGGTATTTGT AGTTCAGAAA PartidgelV CACTTGGTT[- ICCAAC TACCAGTCCC TGGAATTCTC CAGTTTTTGT GATTCAGAAG HThrushll CATATACGA[- ]CCTTC TAATAGTCCT TGGAATACAC CTGTTTTTGT TATTCAGAAA GPheasantl CATTTGGTA[- ICCATC CACCAGCCCA TGGAATTTT- TGT CATTAAAAAG GPheasantll CATCTGGTG[- ]CCTTC AGTTAGTTCG TGGAATAGCC CCGTGTTTGT CATAAAAAAG NABDuckI CATCTACGC[- ]CCTTC AACTAGCCCC TGGAATAGCC CGGTATTTCT AATCAAGAAG LoonI CACATAAAA[- ICCATC TGTTAGTCCC TGGAATACTC CCATATTTAT AATCCCAAAA Ostrichl CATTTATGT[- ITTGTC CACTAGTCCA TGGAATTCTC CAATATTTGT CATAAAAAGG Moorhenll CACCTGAAA[- ]CTATC AACAAGCCCA TGGAATACTC TTATTTTTGT GATAAAAAAG Goshawkl CATATTAAM- ]CCTTC AACTAGCCCG TGGAATACAC CCATTTTTGT TATCAAGAAG Pigeonl CATTTGCAA[- ]CTGTC AACAAGCCCA TGGAATACGC CTATTTTTGT GATTAAGAAA FHawkI CATATTAAAN ]CCTTC AACTAGCCCC TGGAATACAC CCATTTTTGT TATCAAGAAA BGrousel CATATCGTG[- ]CCTTC TACCAGCCCT TGGAATACTC CTATTTTTGT GATTAAGAAA BGrousell CATTTAAAG[- ]CTCTC AACAAGTCCT TGGAACACCC CAATCTTTGT CATTAAGAAG Peacockll CATATTCAA[- ICCATC TACTAGCCCC TGGAATACTC CAATTTTTGT AATTCAAAAG NABDuckIV CACCTAAGA[- ]CTGTC AACAAGCCCT TGGAATACTC CAATTTTTGT GATTCCCAAG BluetitIll CATTTAAAG[- ]TTTTC AACAAGTCCT TGGAATACTC CTATTTTTGT TATAAAAAAG Toucanettel CACATAGAG[- ]CCTTC CACCAGCCCA TGGAACACTC CAGTTTTTGT AATACCCAAA NABDuckII 
CATATTGAG[- ]CCTTC TGCTAGCCCA TGGAACACCC CAGTTTTTGT AATACCTAAG HThrushIII CATATAGAA[- ]CCCTC CACAAGCCCA TGGAACACCC CAATCTTCGT AATACCTAAA BlueTitI CACACAGAG[- ICCATC TACTAGCCCG TAGAATACCC CTGTTTTCAT AATCCCTAAA T7 CACATAGAG[- ICCATC TACTAGCCCG TGGAACACTC CTGTTTTTGT AATCCCTAAA MHarrierll CATCTAGTA[- ]GAAAG TACAAGCCCT TGGAATACCC CCATTTTTGT CATTCAGAAA EmuI CACCTGGAG[- ]GAAAG CCACAGTCCC TGGAACACTC CTATATTTGT GATACACAAA LKiwiII CACCTAGAG[- ]CCCAG CAATAGCCCT TGGAACTCTC CGATTTTCGT GATCCTAAAA LDV CGAATAATCF- ]CCCTC CCGAAGCCAA TGGAATTCGC CAATTTTTGT GATCCAAAAG RSV CATATAGAM- ]CCTTC ACTTAGTTGT TGGAACACAC CTGTCTTCGT GATCCGGAAG ALVsubgrpJ CATATAGAA[- ]CCTTC ACTTAGTTGT TGGAACACAC CTGTCTTTGT GATCCGGAAG Tragopanl CATATAGAA[- ]CCCTC GCTTAGTGCG TGGAACACTC CAGTGTTCGT CATAAAGAAA Guineafowlll CATATAGAA[- ]CCCTC CCTTAGCCGA TGGAACACCC CGGTGTTTGT AATCAAGAAA LoonII CACATTGTT[- ]CCCTC AACCAGTCCA TGGAATAGTC CTATTTTTAC CATTCCGAAG


Appendix 2. Nucleotide alignment (continued) BLV CCAAATGGC- --GCCTGGAG GTTTGTGCAT GACCTACGAG CTACAAATGC TCTTACAAAG HTLV1 GCCAATGGA- --ACCTGGCG ATTCATCCAC GACCTGCGGG CCACTAACTC TCTAACCATA HTLV2 CCAAATGGT- --AAATGGAG GTTCATTCAT GACCTAAGAG CCACCAATGC CATTACTACC HIV1 AAAGACAGTA CTAAATGGAG AAAATTAGTA GATTTCAGAG AACTTAATAA GAGAACTCAA HIV2 AAGGACAAAA ACAAATGGAG AATGCTAATA GATTTTAGAG AACTAAACAA GGTAACTCAA SIVagm AAAGACAAGT CACAATGGAG AATGTTAGTA GATTTTAGGG AACTAAACAA AGCAACACAA FIV AAAAGTGGA- --AAATGGAG AATGCTCATA GATTTTAGAG AATTAAACAA ACTAACTGAG BIV AAGGGAACGG GAAGATGGAG GATGCTAATG GATTTTAGGG AATTAAATAA GATAACAGTT Jembrana AAAGGGGGAA GTAAGTGGAG AATGCTAATG GATTTCAGGG CCTTAAATAA AGTGACAAAC CAEV AAATCAGGG- --AAGTGGAG AATGTTAATA GATTTCAGAG AATTGAACAA ACAGACAGAA Visna AAATCAGGA- --AAATGGAG GATGTTAATA GATTTTAGAG AATTAAATAA GCAAACAGAA EIAV AGGTCTGGC- --AAATGGAG GTTATTACAA GATCTGAGAG AATTAAACAA AACAGTACAA HERVK10 AAATCAGGC- --AAATGGCA TACGTTAACT GACTTAAGGG CTGTAAACGC CGTAATTCAA HML1Z70280 AAATCAGGG- --AAATGGAG AATGTTAACA GATTTAAGGG CTGTTAATGC TCTGAGTCAA HML3ACO25577 AAATCAGGT- --AAATGGAG AATGTTAACT GACTTAAGAG CCATCAATTC AGTTATACAA HML3AC092364 AAATCAGGT- --AAATGGAG AATGTTAACT GACTTAAGAG CCATCAATTC AGTTATGCAA HML4AC093517 AAATCTGGT- --AAGTGGAG AATGCTTACT GATCTTAGGG CAGTAAATGC TGTCCTTCAG HML5AP000870 AAGTCTGGT- --ATATGGAG ACTTTTGCAT GACTTACGTG CTATCAATGC TAATTTGCAA HML6AF069508 AAATGGCA ACTGCTACAT GATTTGAGAG CTATTAAAGC ACAGATTAAA HML7AC013722 AAGTCAGGA- AAATGGAG GACGGTAACA GACTTAAGAG CTGTTAATAC AGTTATTAAA HML8AL596245 AAATCAGGC- AAGTGGCG GATGGTAACC AATTTAATGG CCATTAATGC TGTAATTAAA HML9ACO25569 AGGTCAGGA- AAATGGGG GTTATTGCAT GACCTCTGTG CTATTAATGC TGTGATGCTT HML9AC068700 AGGTCAGGA- AAATGGAG GTTATTGTAT GACCTCCATG CTATTAATGC TGTGATGTTT Oryzomysl AAATCTGGA- AAGTGGAG AATGGTGACA GATCTAAGGG CAATAAACAG GGTGATTCAG ArmadilloII AAATCTGGC- AAATGGCG CATGTTAACT GATTTGCGTG AAATTAACAA AGTTATTCAA Reedbuckl AAATCTGGC- AAATGGCG CATGTTAACT GATTTGTGTG AAATTAACAA AGTTATTCAA MuspahariII AAATCAGGT- AAATGGAG 
GATAATAAGA GACTTAAGAA CAATTAATAA GGCAATTCAA NileRatli AGCTCAGGG- GGATGGAG AGTATTGCAA GATTTGAGGG CAATTAACAA AACCATGCAA Hedgehogl CGCTTGGGA- AAATGGCG CCTTCTCCAG ATCTCCCGTG TATTAAATAA AACCATGCAG Cougarl AAAACTGGC- AAATGGCG ACTTCATCAT TATCTCAGAG CTGTTAATGC AGTAATTGAG DomesticCatI AAAAAAAAAC CCAAAACAAT AAATGCAAAA ACAAAAACAA AAAAAACTGG GCAAGTGGCG T22 AAATCTGGA- AAGTGGAG GCTATTGCAT GACCTTCGAG CAATTAATAA AATATTGATG Sheepl AAATCAGGT- AAATGGAG AATGCTAACA GGA ATATTAACAC TATAATGATT IPSquirrell AAATCAGGT- --AAATGGAG AATGCTAACA GATTTAAGGA ATATTAACAC TCTAATGATT SIMongooseI AAATCTGGT- --GAATGGCG ACTTCTTATT GACTATAGGA AAGTGAATGA CACTATGATC MPMV AAATCTGGT- --AAATGGAG GCTCTTACAA GATTTACGAG CCGTTAATGC CACTATGGTA SRVI AAATCTGGT- --AAATGGAG GCTCTTACAA GATTTACGAG CCGTTAATGC CACTATGGTA SRVII AAGTCTGGT- --AAATGGAG ACTTTTGCAA GATTTAAGGG CGGTAAATGC CACCATGGTG BabboonSERV AAGTCTGGT- --AAATGGAG ACTCTTACAA GATTTAAGAG CAGTAAATAT CACTATGGTC MusD AAATCGGGA- --AAATGGAG ACTGTTGCAA GATTTAAGAA AGGTTAATGA AACCATGGTA TvERV AAGTCAGGT- --GCCTGGAG ATTGCTCCAT GATTTAAGGG CGGTCAATAA GACCATGATC SMRVH AAATCAGGT- --AGCTGGCG TCTTTTACAG GATCTAAGAG CCGTTAATAA GGTAATGGTC Colobusl AAATCTGGA- --AAATGGAG GCTCCTACAA GATCTAAGAG CAATAAATAC TACCATGATA Slmongoosel AAATCAGGG- --AAATGGAG ATTGTTGCAG GACCTTAGGG CAGTTAACAG TACTATGGTT GoatII AAATCAGGA- --AAGTGGAG GCTTTTACAG GACCTTAGAG AAGTTAACAA GACTATGGTC OstrichD AAAAATGGG- --TCATGGCG ACTCCTTCAG GACCTCAGAG AAATTAATAA AACCATATTT LorisI AAATCTGGC- --AAATGGAG AATACTGCAA GATTTAAGAG AAATAAGTAA AACAATGCTT SIMongooseII AAATCTGGT- --AAATGGAG ACTTTTACAG GACTTAAGAG AAATCAATAA AACCATGTTT Jaagsiekte AAGTCTGGT- --AAATGGAG GTTGTTACAA GATCTCCGTA AGGTAAATGA GACGATGATG RDolphinl AAATCTGGA- --AAATATAG GTTATTACAG GATCTAAGAG CTGTTAATAA AACTATGCTA WFDeerI AAATCTGGT- --AAATGGAG ATTATTACAA GATCTCAGAA AGGTTAATGA GACCATGGTT Cariboul AAGTCTCGT- --AAATGGAG ATTGCTGCAA GATCTTAGAA AGGTTAATGA GACCATGGTT Giraffel AAATCAGGA- --AAATGGAG ATTGTTACAA GATTTGAGAG CTATAAACGC AACTATGGAA BisonI AAATCTGGC- --AAGTGGCG 
ACTTTTACAA GATTTAAGAG CTATTAATGC CACTATGGAA Muskoxl AAATCTGGT- --AAATGGCG ACTTTTACAG GGTTTGAGAG CTATTAATGC CACCATGGAG CFBadgerl AAATCAGGA- --AACTGGAG GTTGTTACAG GATCTGAGAG CTGTAAATGC TACCATGAAA HRV5 AAATCGGGG- --AAATACTG GTTGTTGCAT GATTTACGGG CTGTTAATCA GCAGATGCAA GoatI AAATCAGGA- --AAGTGTAG GCTCTTATAA GATTTGAGAG AAGTAAATAA AACTATGATT IAPSHamster AAATCTGGG- --AAGTGGAG ATTACTCCAC GACCTGCGGG CCATTAACAA TCAGATGCAT IAPCHamster AAATCTGGT- --AAATGGAG ACTGCTTCAC GATCTTAGAG CTATTAATCA ACAGATGCAA IAPMouse AAGTCAGGA- --AAGTGGAG ACTGCTCCAT GACCTCAGAC CCATTAATGA GCAAATGAAC MastomysI AAATCTGGG- --AAATGGAG ATTGTTGCAT GATCTGCATG CCATAAACCA ACAAATGCAG Myomysl AAATCTGGA- --AAATGGAG ATTGTTGCAT GATCTGCATG CCATAAATCA ACAAATGCAG Uromysl AAATCTGGA- --AAATGGCG ATTGTTACAT GATTTAAGAG CTATTAATGC CCAGATGCAA Prairiedogl AAATCTGGA- --AAGTGGCG GTTGCTACAT GATCTGCAGG CTATTAATGC TCAAATGCAG Rabbitl AAATCAGGC- --CAATGGAG ACTCTTACAT GATCTTAGGG CGGTCAACGC ACAAATGCAA NileRatI AAATCGGGT- --AAATGGAG ATTGCTACAT GATTTGCGGG CCATCAACGC CCAGATGCAA MMTVBR6 AAGTCAGGA- --AAATGGAG ACTGTTACAA GACCTACGTG CAGTTAATGC TACAATGCAC SDunnartl AAAAAAAAC- --AAATGGCG CATGTTGATT ----TGAGAG CTGTGAATGC CTCCATGCAA RKangal AAATCGGGA- --AAATGGCG CCTCCTAATA GACCTGCGGC AAGTTAAACG CCACCGTGCG


Platypusl CAATCAGGA- --AAATGGAG ATGCTTA-CA GATTTAAGAG CAGTAAATAA CACCATGGAA Echidnal CGCTTGGGG- --GCCTGGAG GATGTTGACT GATCTCAGAG AAATCAATAA GACGATGCAG Armadillol AGGGGTAAAA ATACCTTCCG TCTTCTCCAT GACTTGAGAG CAATAAATGC CCACATCAAA PFalconI CAAACTGGC- --AAGTGGCG CTTGCTCCAT GATCTCAGAA AAATTAATGA TGCTATGGAA PrairiedogII CAAACTGGC- --AAGTGGCG CTTGCTCCAC GATCTCAGAA AAATTAATGA TGCTATGGAA MHarrierI AAAACTGGC- --AAGTGGCG CTTGCTCCAT GACCTTAGAA AAATTAATGA CGCTATGGAA Vulturel CAATCTGGC- --AAGTGGCG CTTGCTCCAT GATCTCAGAA AAATAAATGA CGTAATGGAA ETinamoul CAATCTGGC- --AAGTGGCG CTTGCTCCAT GATCTCAGGA AAATAAATGA CACAATGGAA Moorhenl CCAAATGGC- --AAGTGGCG CCTCCTGCAT GATCTTAGAA AAATTAATGA TGCTATGGAA HThrushl CAATCTGGC- --AAGTGGCG TTTGCTCCAG GACCTCCGTA AAATAAAT-T CTTGATGGGG ESOw1I ACAAGTGGC- --AAATGGTG ACTTTTACAT GATCTCTGAC ACATCAATGA TGCCATGGAG RNPheasantl AAGAATGGG- --AAATGGCG TCTGTTGCAA GATTTACGGC AAGCTAATGC TGTCGTGGAA Peacockl AAGAATGGG- --AAATGGCG TCTGTTGCAA AACTTACAAT AAATTAATGC TGTCATGGAA BluetitII CCAGGGAAAG ACAAATGGCG TCTCCTTCAA GATTTAAGAG AAATAAACAA AAAAATACAA CMagpiell CCGGGGACAG ACAAGTGGAG ACTCCTTCAC GACCTGCAGA AAATTAACGA GGTCATTGAG Penguinl AAAAGTGGC- --AAATGGAG ACTTCTACAT GATTTACGGC AAGTAAATGC CGTGATTCAA GSKiwiI AAGAGCGGA- --CAGTGGCG ACCGCTACAC GACCTGCGAC GGGTGAATGC AGCTATAGAG LKiwiI AAGAGCGGA- --CAGTGGCG ACTGCTACAC GACCTGCGAC AGGTGAATGC AGCTATAGAG HThrushl AAGTCAGGA- --AAATGGAG GTTGTTACAG GACCTCCGAA AAGTTAATGT AGTCATGCAG ESOw1II AAATCAGGA- --AAACAGTG ATTTTTGCAT GACCTACGGC AAATAAATGC TGTTATGACC WFgoosel AAAAATGGG- --AAATGGTG ACTGCTCCAT GACCTGCGTC AGATAAACAA CGTCATGGAA NABDuckIII AAGAGTGGG- --AAATGGAG ATTACTACAT GACTTAAGAA AAATAAATGA GGTCATGAAA Flamingol AAGAGTGGA- --AAATAGCG ACTCCTTCAT GACTTACGTC AAATAAATAA TGCAATGGAG GRheaII AAATCAGGG- --AAGTGGAG ATTATTGCAT GACCTTAGGG CGGTGAATGC CTGTATGGAG DRheaI AAATCAGGG- --AAGTGGAG ATTATTGCAT GACCTTAGGG CGGTGAATGC CTGTATGGAG BrownKiwil AGGTCAGGA- --AAATGGAG ACTGCTACAC GATCTGCGAG CTGTGAACGC AACAATGGAA RBowerIII AAATCAGGG- --AAATGGAG 
ATTATTACAT GACCTTCGAG CTATCAATTC TGTGATGCAG Toucanettell AAGTCTGGG- --AAGTGGCG TTTGTTACAT GATTTGTGTG CTATAAACGC AGTTATGCAA GWoodpeckerl AAATCTGGT- --AAATGGCA TTTATTACAT GATCTGAGAG CAATCAACTC GGTCATGGAG ESOw1II AAAAGTGGT- --AAATGGAG ACTCCTGCAT GACTTAAGAA AAGTAAACGA GGTCATTGAC GRheaI AAGTCAGGA- --AGATGGTG TTTGCTACAA GATTTGTGAG CTGTAAATGC AGTAATGCAA JQuailI AAAAACAGA- --AAGTGGCA TTTATTACAT GATCGTAACA GGGTAAATGA AGTTACAGAG CassuaryI AAAAATGGC- --AAATGGCG ATTGCTGCAC GACCTACGTG CAGTAAATGC ATTGATAGAA AZMagpieII AAAAGTAGT- --AAATGGCA ACTGTTGTAC AACTTGCGTG TGGTAAATGT GGTCACTGAA CMagpieIII AAAAGTGGT- --AAATGGCG ACTGTTGCAT CACTTGTGTG CGGTAAATGC AGTCATTGAA MThrushl AAG--TGGT- --AAATGGAG ACTCTGACAT GACTTGGAGG TGGTAAATGC AATCATTGAG CMagpiel AAAACTTCTG ACACCTGGCG ATTATTACAA GATCTCAGAA AGATCAATGC AGTAATTGAA GuineafowlI AAGTCCGGG- --AAGTGGCG ATTTGTCTAT GATCTCCAGG CTGTGAACAA TACCGTGGAA PartidgelV AAGTCGGGG- --AAATGGCA CTTCATATAT GATCTTAGAG CAGTCAGTGA CACCATGGAA HThrushll AAATCTGGG- --AAATGGAG ATTTTTGCAT GATCTTAGGG CAGTGAATGA ACAAATGTTG GPheasantl AAGTCAGGT- --CAATGGCG ACTGCTGCAT GATTTATGAG GAGTCAATGC AGTTTTGCAG GPheasantll AAGTCAGGA- --AAATGGTG TTTCATACAT GATCTTCGTC GAGTGAATGA TTCCATGGAG NABDuckI ---ACAGAG- --AAGTGGTG TTTTCTCCGT GATCTGCGTA AAGTGAATGC AAGCATGCAG Loonl AAGAGTGGA- --AAATGGAG ACTTTTACAT GACCTACGCA GAGTTAATGC CCAAATGCAA Ostrichl AAGTCAGGT- --AAATATCG ATTGTTACAT GATCTGAGAG AGATTAATAA TCAGATGGAA MoorhenII AAATCTGGG- --AAGTACCG GCTGCTACAC GACCTCAGGG CCGTTAATAA CCAAATGGAA Goshawkl AAATCGGGC- --AAATATCG GTTATTACAT GATCTCAGAG CAGTCAATGA TCAGATGCAA Pigeonl AAATCAGGG- --AAGTTTCG ACTGCTTCAT GATTTGAGAG CTGTAAATGC ACAAATGCAA FHawkI AAATCAGGC- --AAATATCA GCTATTACAT GATCTTAGAG CAGTCAATGA TCAGATGCAA BGrousel AAATCGGGG- --AAATATCG GCTGCTACAT GATCTGCGTG CTATTAATAA TCAAATGCAA BGrouseII AAGACAGGA- --AAATATCG CTTACTGCAT GACCTGCGAG CAGTCAATGC TCAAATGGAA PeacockII AAATCAGGA- --AAATATCG TTTGTTGCAT GATCTCCG-G CTGTAAACAA TCAAATGCAA NABDuckIV AAATCGGGG- --AAATATCG TTTACTTCAC GACCTGCGAG 
CGGTTAATAA TCAAATGTGT Bluetitill AAATCTGGG- --AAATATCG ATTGCTGCAT GACTTACGTG CAGTAAATGA AAAAATGGAG Toucanettel CGAACCGGTG AGGGATTTCG CCTCCTTCAT GATCTACGTG AAATAAATAA AATGATTCAA NABDuckII CGAACCGGTG AAGGGTTTCG CCTCCTCCAT GACCTGCGTG AAGTAAACAA GATAATTCAA HThrushIII CGGTCCGGCG AAGGCTTTCG TCTCTTGCAC GACCTGCGAG AAGTGAACAA GAGAATCCAG BlueTitI CGATCCAGAG AGGGGTTTCA CCTCCTCCAC GATTTACGTG AAAGAAATAA GAAAATTCAA T7 CAATCCGGAG AAGGGTTTTG CCTCCTCCAC GACTTGCATG AAGTAAATAA GAAAATTCAA MHarrierll AAGGATCGTA GCAAGTTTAG ACTTTTGCAT GACCTGCGTG CAGTTAATGA ACGCATGGAG Emul AAGGATAAGT CTAAATTTCG ATTACTGCAC GACCTACGAG CGGTAAATCA GCAGATGGAA LKiwiII AAGACTAAGG ACAGTTACCG GCTATTACAT GACCTGAGAC GGATCAATCA ACAGATGTGT LDV AAAGATAAGA GCAAATTTCG CATGCTGCAT GACTTACGGG CAGTAAATGC CTTGATAAAA RSV GCTTCCGGG- --TCTTACCG CTTACTGCAT GATTTGCGCG CTGTTAACGC CAAGCTTGTT ALVsubgrpJ GCTTCTGGG- --TCTTATCG TTTATTGCAT GACTTGCGCG CTGTTAACGC CAAGCTTGTT Tragopanl CGATCGGGT- --CAATATCG ATTACTACAC GATCTGCGCG CAGTCAACTC GCAACTCATA GuineafowlII CGCTCAGGA- --GCCTTCCG TCTCCTACAT GATTTACACG CAGTGAATTC CCAGTTAATA LoonII AAGTCGGGG- --AAGCGGCA TTTGTTACAT GATTTGCGTG CTATAAACGC AGTTATGCAA


Appendix 2. Nucleotide alignment (continued) BLV CCCATTCCGG CACTCTCTCC CGGACCGCCA GACCTTACCG CTATCCCTAC GCACCCTCCA HTLV1 GATCTCTCAT CATCTTCCCC CGGGCCCCCT GACTTGTCCA GCCTGCCAAC TACACTAGCC HTLV2 ACCCTCACCT CTCCTTCCCC AGGGCCCCCC GATCTCACTA GCCTACCGAC AGCCTTACCC HIV1 GATTTCTGGG AAGTTCAATT AGGAATACCA CATCCTGCAG GGTTAAAACA GAAAAAATCA HIV2 GACTTCACAG AAATCCAGTT AGGAATTCCA CACCCAGCAG GACTAGCCAA GAAGAAACGA SIVagm GATTTTTTCG AAGTACAGTT AGGCATACCT CATCCATCAG GGTTCGAAAA GATGACGGAA FIV AAAGGAGCAG AGGTCCAGTT GGGACTACCT CATCCTGCTG GTTTACAAAT AAAAAAACAA BIV AAAGGACAAG AATTCTCTAC AGGCTTACCT TACCCTCCAG GAATTAAGGA ATGTGAACAC Jembrana AAAGGACAAG AATTCCAAAT CGGGCTGCCA TACCCCCCAG GAATTCAGCA ATGTGAACAT CAEV GATTTAACAG AAGCGCAGTT AGGACTCCCG CATCCGGGAG GACTACAAAA GAAAAAACAT Visna GATTTAGCAG AAGCACAGTT AGGGTTACCT CATCCAGGAG GATTACAGAG AAAGAAACAT EIAV GTAGGAACGG AAATATCCAG AGGATTGCCT CACCCGGGAG GATTAATTAA ATGTAAACAC HERVK10 CCCATGGGGC CTCTCCAACC CGGGTTGCCC TCTCCGGCCA TGATCCCAAA AGATTGGCCT HML1Z70280 CCCATGGGTG CACTACAACC AGGGCTGCCC TCCCCAACAA TGATCCCAAA ATACTGGCCT HML3ACO25577 CCTATGGGAG CATTACAGCC AGGATTGCCT TCTCCTGCTA TAATTCCAAA AAATTGGCCT HML3AC092364 CCTATGGGAA CATTACAGCC AGGATTGCCT TCTCCTGCTA TAATTCCAAA AAATTGGCCT HML4AC093517 CCTATGGGGA CATTACAACC CAGTTTGCCC TCCCCCAGTA TGATTACTGA GTATTGGCCA HML5AP000870 CCTATGGGGC CCCTTCAACA GGGGCTCCCC TCCCCCACAG CGATTCCTCA AGATTGGCCT HML6AF069508 CCAGTGGGTG CATTACAGCA AGGTCTGTCA TCCTCAGCAG CCATTCCAGA GATTGGACTC HML7AC013722 CCTATGGGGG CATTACAACC CAGTATGCCC TCCCCCTCCG TGATTCCTGA GGCATGGCCT HML8AL596245 CCTGTGGGGG CAGTCCAACC TGGCATGCCT GCCCCTGCTT CAATACCTAA AAATTGGCCT HML9ACO25569 CCTATGGGAC CTTTGCAACC AAGATTGCCT TCTCCGGGTA TGATCCCCAA AGACTGGCTT HML9AC068700 CCTATGGGAC CTTTGTAACC AGGATTGCCT TCTCCGGTTA TGATCCCCAA AGACTGGCCT Oryzomysl CCAATGGGCT CTCTACAGTC TGGGATTCCT TTGCCTTCTT TGTTACCGAA AGAATGGCCT Armadilloll CCTATGGGAT CTTTACAGCC TGGTTTGCCA TCCCCAAGTA TGATCCCACA AAATTGGCAT Reedbuckl CCTATGGGAT CTTTACAGCC TGGTTTGCCA TCCCCAACTA TGATCCCACA AAATTGGCCT 
Muspaharill CCAATGGGTC CTGTACAGTC TGGGCTTCCT TTACCATCTT TGTTATCAAA ATCATGGCCT NileRatli ATCTGCCCCC CACCCCCAAG AGGGCTACCC CACATCTCAG CCATTTCTGC CAATGTTCCC Hedgehogl GTCTGGGGCT CCCCCCAAAG TGGTTTGCCT CTTGCTTCTG CAATACCAAT GGGAATTCCA Cougarl CCTTTTGGC- CCTTGCAGCT AGGACCACCC TCTCCCTCAG TGCAGCTTCA AAATTGGCTG DomesticCatl ACTTCTTTCA TAATCTCAGA GCTATTAATG CAGTATTTGA GCCTTTTGGC CCCTTGCAGC T22 CCCATGGGAC CCCTTCAGTG TGGTCTCCCC AATCCCAATC TGATTCCCTC TACATATGAA Sheep' CCTATGGGAG CATTATAACC AGGACTCCCA AGCCTTGCTA TGGTCCCTAA GGACTGGGCT IPSquirrell CCTAAGGGAG CATTACAACC AGGACTCCCA AGCCCTGCTA TGGTCCCTAA GGACTGGGCT SIMongooseI CCCCTGGGAG CTCTACAGCC CGGGCTCCAC AGTCCTAACA TGATACCCAA AGACTGGTAT MPMV TTAATGGGAG CTTTACAACC TGGATTACCC TCCCCGGTGG CTATCCCACA AGGGTATCTT SRVI TTAATGGGAG CTTTACAACC TGGATTGCCC TCTCCGGTGG CTATCCCACA AGGGTATCTT SRVII TTAATGGGGG CTCTCCAACC TGGGCTGCCC TCGCCAGTGG CTATCCCTCA GGGATATTTT BabboonSERV CTTATGGGTG CCTTACAACC AGGATTGCCT TCACCGGTTG CGATTCCTCA AAAATATTTT MusD CTTATGGGAA CCTTACAGCC GGGGCTCCCC TCCCCAGTAG CCATTCCTAA GGGATATTAT TvERV CCCATGGGAT CGCTGCAACC GGGTCTTCCT GCACCCGTAG CAATCCCGGC AGGCTTCCAA SMRVH CCCATGGGAG CCCTTCAGCC TGGTCTTCCC TCTCCTGTAG CCATCCCCCT AAACTATCAC Colobusl CTTATGGGTG CTCTACAGCC CGGACTACCC TCTCCGGTAG CAATCCCTCA AGGATATTTT SImongooseI CTGATGGGAA AACTCCAGCC GGGGCTTCCC TCTCCTGTGG CTATACCTTT AGGGTATTAT GoatII CCCATGGGGG CGTTACAGCC TGGCCTGCCC TCCCCTGTGG CTATACCTAA GGGATTTTAT OstrichD GCAATGGGGG CTCTTCAACC TGGACTTCCC TCCCCCGTTG CCGTTCCAGC TAACTATTTT LorisI TCCATGGGGG CTTTACAACC AGGTCTGCCT ACTCCCGTGG CTATCCCTGC TGGATTTTAT SlMongoosell ACCTTGGGGG CATTACAACC AGGGCTGCCC ACCCCAGTGG CCATCCTGAA GGGCTTTTTC Jaagsiekte CATATGGGAG CCTTACAACC TGGGTTGCCC ACTCCTTCTG CTATACCTGA TAAATCCTAT RDolphinl ATAATGGGAG CTCTTCAACC AGGCCTACCC TCTCCAACAG CCATACCACA CAATTATCAT WFDeerI GTTATGGGGC CTTTGCAGCC TGGTCTCCCT TCTCCTATAG CTATTCCAAA AGGGACTTAT Cariboul ATCATGGGAC CCTACAGCCC AGTATTCCTT TTCCTATAGC TATTCCAAAA G-AAACTTAT GiraffeI GACATGGGGG CCCTTCAGCC GGATTTCCCT TCCCCAGTAG 
CTGTGCCCTT TCAGTATAAT BisonI GATATCAGGG CCTTACAACC TGGGCTTCCT TCCCCAGTAG CCATCCCAGA AGGATATAAT Muskoxl GACATGGGGG CCTTACAGCC TGGGCTCCCT TCCCCAGTAG CCATTCCAGA GGGATATAAT CFBadgerl CCTATGGGGA CCCTTCAGCC TGGTTTGCCT ACACCATCAG TTATTCCTCT TCAATATAAA HRV5 CCCATGGGGG CATTACAACC CGGACTCCCG GTTCCTACTA TGATCCCAAA GCATTGGCCA Goatl GTTATGGGGC CAACTCAGCC GGGATTACCA AATCCTGTAG CCATTCCTAA AGATTATCAT IAPSHamster CTTTTTGGCC CTGTTCAAAG AGGCCTTCCT TTGCTTTCTG CACTTCCTCA AGATTGGAAG IAPCHamster ATTATGGGCC CTGTACAACG TGGTCTTCCA CTTTTAACTT CTTTACCTGC ATCATGGCCT IAPMouse TTATTTGGCC CAGTACAGAG GGGTCTCCCT GTACTTTCCG CCTTACCACG TGGCTGGAAT Mastomysl ATCATGGGCC CGGTGCAACA TGGCTTACCG CTTTTGTCAT CTTTACCAGC CTCATGGCCC Myomysl ATCAGGGGTC CAGTGCAACA AGTCTTGCCA CTTTTATCAT CTTTACCTGC CTCATGGCCT Uromysl GTCTTCGGGC CCCTTCAGAG GGGCCTGCCA TTGCTTTCAG CGCTTCCAAA ACACTGGGAG Prairiedogl CTCATGGGAC TGGTACAGAG AGGATTGCCT TTGCTTTCAA CAATTCCAAC AAATTGGCCA Rabbitl CCTATGGGCC CTGTGCAACG TGGCCTTCCC CTACTCTCGA CACTCCCTGA CAAGTGGCCC NileRatI CCTATGGGCC CCATTCAACG AGGCCTTCCT TTACTTTCTA GTCTGCCAAA GGAATGGTCT MMTVBR6 GATATGGGAG CATTACAACC CGGCTTGCCG TCCCCTGTAG CAGTCCCTAA AGGATGGGAA SDunnartI CCCATGGGAG CTTTACAACC AGGCCTCCTA TCTCCTAATA TGGTTCCTAA AGAATACCAT RKangaI GCCTTTTGGA GCCCTATACC CGGCCTCCCG ACACCCACAG CCATTCCGAA AGAGTGGCAA


Platypusl CCGATGGGTC CTTTACAACC AGGGTTGCCT TCTCCAGCCA TGATCCCGCA GGAATGGCCG Echidnal CCAATGGGAG CGCTGCAACC TGGGCTCCCG AACCCGGCTA TGATCCCTAG GAATTGGCCG ArmadilloI CCCATGGGTT CACCCCAGCC AGGCCTCCCT CATCCATCCG CAGTGCCCTT GGATTACTCC PFalconI GATATGGGAG CCCTTCAACC CGGGCTCCCG TCCCCGACCA TGATCCCTCG ACACTGGCAT PrairiedogII GATATGGGAG CCCTTCAACC CGGGCTCCCG TCCCCGACTA TGATCCCTCG ACACTGGCAT MHarrierl GACATGGGAG CCCTACAACC AGGACTCCCG TCCCCCACTA TGATTCCTCG AGATTGGCAT Vulturel GACATGGGAG CCCTCCAGCC AGGGCTCCCA TCCCCTACTA TGATACCTCG AAATTGGCAT ETinamouI GATATTGGGC CCCTTCAGCC AGGGCTCCCA CCCCCTACCA TGATACCCCG AAACTGGCAT Moorhenl GATATGGGAC CCCTCCAACC GGGGCTCCCC TCACCAACCA TGTTACCTCG AGATTGGCAT HThrushl CAAATGGGG- CTTTACAGCC CGGCCTTTCC TCCCCTACCA TGATCGCTCA AAATTGGCAT ESOw1I GATATGGGTG CGTTACAACC AGGAATGCCT TCCCCAACAA CGATTCCCCG AAACCGGCAT RNPheasantl GAAATGGGAC CACTGCAGTC TGGACTACCA TCCCCTACAA TGATACCAAA AGACTGGTGT Peacockl GAAACGGGAC CATTGCAGTC TGGAGTACCA TCCCTTACAA TGATACCAAA AGACTGGCGC BluetitII AACATGGGAT CTCTCCAACC AGGAATGCCC TCCCCAACCA TGCTACCACA AGATTGGCTA CMagpieII GACATGGGAC CCCTGCAGCC CGGAATGCCA TCACCCTCTA TGCTCCCTAG GCAGTGGAAG Penguinl GATATGGGGA CATTGCAACC AGGTATGCCT TCACCGACAT TGATACCTCG TCAATGGGAC GSKiwiI GACATGGGTG CTCTGCAGCC CGGCATGCCC TCTCCGACTA TGCTGCCGCG GAACTGGAGC LKiwiI GACATGGGTG CTCTGCAGCC CGGCATGCCC TCTCCAACTA TGCTGCCCCG GAACTGGAAC HThrushl AGCATGGGAG CGTTGTAGCC TGGTATGCCC TCTCCCACCA TGCTCCCCAC TGGATGGGAC ESOw1II ACAATGGGGG CCCTAGAACT GGGCCTACCC TCACCTGCCA TGATTCCTAT GGATTGGGAA WFgoosel GACATGGGAG CTCTCCAGCC CAGATTGCCT TCCCCTACTA TGCTCCCTCG AAGTTGGAAT NABDuckIII GAAATGGGAT CTCTCCAAAC AGGTTTGCCT TCTCCCACCA TGATTCCGCA TAACTGGAAT rlamingoI GACATGGGGG CGCTTCAGCC AGGACTGCCC TCCCCTACCA TGCTTCCTCG AGGCTGGAAC GRheaII GATATGGGTC CATTACAGCC TGGTTTACCG TCACCGGTGA TGTTACCACG AAATTGGAAT DRheaI GATATGGGTC CATTACAGCC TGGTTTACCG TCACCGGTGA TGTTACCACG AAATTGGAAT BrownKiwil GATATGGGAG CGTTACAACC AGGACTGCCA TCACCTACCA TGCTCCCGCG GGAATGGAGT RBowerIII GAGATGGGAC ATCTGCAGCG 
TGGCCTGCCC TCTCAATCAA TGCTGCCTGT TAATTGGCAA Toucanettell GATATGGGAG CATTGCAGCC AGGCTTACCT TCTCTGGTAA TGATTCCCCA AGATTGGGAT GWoodpeckerI GAGATGGGTG CCTTACAACC TGGCCTCCCA ACTCCTACTA TGATCCCTCA AAATTGGGAT ESOw1II GACATGGGAG CATTGCAACC TGGCCTACCA TCTCCCAGCA TCATTCCCTG AAATTGGGAC GRheaI ACAATGGGAG CACTTCAGCC AAGTTTGCCA AACCCTACTA TGATTCCTAG TCACTGGGAG JQuailI GACATGGGGG CACTACAGCC AGGTTTACCA TCACCTACAA TGATTCTGCA AGTCTGGGAT Cassuaryl CCTATGGGAA CACTCCAACC TGGGATTCCG TCACCAGCTA TGTTACCTGA ATCATGGCCT AZMagpieII CCCATGGGTA CCTTGCAACC TGGACTTCCC TCTCCTGCGA TGATACCTCT CGATTGGCCC CMagpieIII CCCATGGGGT GCTTGCAACC TGGACTTCCC TCTCCTGCGA TGATACCTCT CAATTGACCC MThrushl CCAATGGGTT CCTTACAACC TGGACTTCCC TCTCCAGTGG TCATTGGCCC TTCATAGTCA CMagpiel GACATGGGCC CTCTCCAACC TGGCCTGCCC AACCTTTCCA TGATCCCAAG AAATTGGCCA GuineafowlI GACATGGGAC CGTTGCAGCC AGGCATGCCC TTCCCCACCA TGATTCCTTG AGACTGGTCT PartidgelV GACATGGGAG CCTTACAACT AGGCCTTCCC ATGCTTTCTA TGTTACCGCG CAACTGGCAG HThrushll CCTTTAGGTG CTCTACAACC GGGTTTACCT TCACCTTCAG CTTTACCAAG AAATTATCAT GPheasantl GATATGGGGC CTTTACAGCC AGGACTGTCT TTACTATCCA TGGTACCTGA CTCCTGGCAA GPheasantll GCAATGGAGG CCTTACAACC TGGGTTACCC TTTCCCTCCA TGGTTCCCCG GGACTGGCAA NABDuckI GACATGGGAG CCCTTCAACC TGGCCTCCTG GTGCCTTCTA TGATTCCTCA GGATTGGCAA Loonl TCAATGGGTG CTTTGCAACC TGGAATGCCT TCGCCTAATG TGTTACCAGA TGGGTGGCAT Ostrichl CCGACGGGTG CTTTACAGCC AGGGCTACCT AATCCAGCCA TGATTCCAGA GTGTTGGTCT MoorhenII GCAACGGGGG CGGTACAGCC GGGGCTCCCC AACCCCGCGA TGGTACCACA AGATTGGCCA Goshawkl GCCATGGGAG CACTTCAACC GGGAATGCCT AGCCCAGCTA TGTTACCGAT AGGCTGGCAC Pigeonl CCGATGGGAG CGTTACAGCC TGGTCTTCCG AATCCGGCAA TGCTTCCTGA ACACTGGAAG FHawkI GCCATGGGAG CACTTCAACC GGGAATGCCT AGCCCAGCTA TGTTACCGAT AGGCTGGCAT BGrousel GCAATGGGGG CATTGCAACC AGGATTGCCA AACCCATCGA TGGTCCCGGA GTCATGGCAT BGrouseII CCCATGGGGG CGCTACAACC TGGACTTCCA AACCCTGCAA TGATCCCTGA GGGATGGTCA PeacockII GCCATGGGAG CGTTGTAGCC CGGACTCCCG AATCCTGCCA TGATCCCTGA TACATGGCAC NABDuckIV GCCATGGGAG CACCGCAACC TGGTCTACCT AATCCTGCTA 
TGATACCAGA GGGTTGGCAT BluetitIII CCAATGGGGG TGCTACAACC TGGTTTGCCC AATCCTGCAA TGTTACCAAA AGATTGGCCC Toucanettel CCAATGGGTC CTGTCCAAAC ATTGCTGCCC ATGAACTCTA TGATACCAGA AGGGCAACCT NABDuckII CCAATGGGTC CTGTCCAAAC GTTGTTGCCT ATGAACTCTA TGATACCGGA AGGGCAACCT HThrushIII CCTATGGGCC CCGTGCAAAC CCTGCTGCCC ATGAACTCGA TGATACCGAA AGGACAACCC BlueTitI CCCATGGGAC CTGTTCAAAC GTTG---CCT ATGAACTCTA TGGTGCCAGA GGGCCAACCC T7 CCCATGGGGC TTGTCCAAAC ACTGTTA-CT ATGAACTCTA TGGTGCCAGA AGGCCAACCC MHarrierll GACATGGGTC CTTTGCAACC AGGCTTACCT ATTCCTTCTG CCATTCCGGA CGGATGGCCA Emul GCGATGGGTG CCCTCCAGCC GGGGTTGCCA CTACCATCAG TAATACCAAA GAATTGGCCA LKiwiII GACTTTGGAC AGTTACAACC AGGACTGCCG GTACTGACGG CACTTCCTGA GAATTGGCCG LDV GACTGGGGAG CGCTGCAACC AGGCACCCCT TGGCCTGGAG CTATACCGTC AGAATGGCCT RSV CCTTTTGGGG CCGTCCAACA GGGGGCGCCA GTTCTCTCCG CGCTCCCGCG TGGCTGGCCC ALVsubgrpJ CCTTTTGGGG CCGTCCAACA GGGGGCGCCA GTTCTCTCCG CGCTCCCGGC TGGCTGGCCC Tragopanl CCTTTCGGGC CGGTTCAGCA AGGGGCCCCT ATTTTGTCAG CAATCCCCGA AGAATGGGGG Guineafowlll CCTTTTGGCC CAGTGCAACA AGGAGGTCCT GTGCTCTCCG CTGTGCCCAG AAATTGGCCT LoonIl GATATGGGAG CATTACAGCC AGGCTTACCT TCTCCGGTAA CGATTCCCCA AGATTGGGAT


Appendix 2. Nucleotide alignment (continued) BLV CATATTTGCC TAGATCTCAA AGATGCCTTC TTCCAGATTC CAGTCGAAGA CCGCTTCCGC HTLV1 CACTTACAAA TAGACCTTAA AGACGCCTTT TTCCAAATCC CCCTACCTAA ACAGTTCCAG HTLV2 CACCTACAGA TAGATCTTAC TGACGCCTTT TTCCAAATCC CCCTCCCCAA GCAGTACCAG HIV1 GTAACAGTAC TGGATGTGGG TGATGCATAT TTTTCAGTTC CCTTAGATAA AGACTTCAGG HIV2 ATTACTGTCC TAGATGTAGG GGATGCTTAC TTTTCCATAC CACTACATGA GGATTTTAGA SIVagm ATAACAGTAT TAGACATAGG GGATGCCTAT TATTCAATAC CATTAGACCC AGAGTTTAGA FIV GTAACAGTAT TAGATATAGG GGATGCATAT TTCACCATTC CTCTTGATCC AGATTATGCT BIV TTAACTGCAA TAGATATAAA AGATGCCTAC TTTACTATCC CTTTACATGA GGACTTTAGA Jembrana ATAACAGCTA TAGACATAAA AGATGCCTAC TTTACCATCC CTTTAGATGA GAATTTTAGA CAEV GTTACAATAT TGGACATAGG AGATGCATAT TTTACTATAC CCCTATATGA ACCATATCGA Visna GTAACAATAT TAGATATAGG AGATGCATAT TTTACAATAC CATTATATGA GCCATATAGA EIAV ATGACTGTAT TAGATATTGG AGATGCATAT TTCACTATAC CCTTAGATCC AGAGTTTAGA HERVK10 TTAATTATAA TTGATCTAAA GGATTGCTTT TTTACCATCC CTCTGGCAGA GCAGGATTGT HML1Z70280 CTCATAGTGA TAGATCTAAA GGATTGCTTT TTTACCATTC CTTTAGCTGC CCAAGATTAT HML3ACO25577 TTAATAGTCG TAGATTTTAA AGACTGTTTC TTTACTATCC CTTTAGCTGA GCAAGACTGT HML3AC092364 TTAGTAGTCA TAGATTTAAA AGACTGTATC TTTACTATCC CTTTAGCTGA ACAAGACTGT HML4AC093517 CTTATCATCA TTGACCTTAA AGATTGCTTT TTTACCATTC CTCTGGCCCC TCAGGACTTT HML5AP000870 ATAATCGTTA TTGACTGAAA AGACTGTTTT TATACTATTC CCCTTGCAAA ACAGGACAGA HML6AF069508 TTGTAGTAAT AGGTCTTAAA GAGTTTTTTT TTTAATATAC CATTACACAA AAAGGATAAG HML7AC013722 TTAATTATCA TTGACCTTAA GGACTGCTTT TTTCATATTC CTTTAGACAA GTCAGACTGT HML8AL596245 CTCATAGTTA TTGATTTTAA AGAT---TTT TTTCATATAC CTTTACATAA ATCAGATTGT HML9ACO25569 CTTATTATCA TTAATCTAAG AAATTGTTTT TTTACCATAC CCTTGCATCC CGAGGATCGA HML9AC068700 CTTATTGTAA TTGATCTAAA GGATTGTCTT TTTACCATAC CCTTGCATCC TGAGGATTGA Oryzomysl ATCATAGTTA TTGATTTAAA AGACTGTTTT TTTACAATAC CCTTACAAGA ACAGGATAGA Armadilloll ATTATTATTA TTGATTTAAA AGATTGTTTC TTTAATATTC CTCTTAATCC AAATGATAGA Reedbuckl ATTATTATTA TTGATCTAAA AGCTTGTTT- ---ACTATTC CTCTTAATCC AAATGATAGA 
Muspaharill ATCATAATAA TTGATTTGAA AGATTGTTTT TTTCTCAATA CCCTCACATG AGAAAGACAG NileRatII CTATTAGCAA TAGACATAAA AGACTGCTTT TTTTCCATCT CTCTCCACCC CTGGGACTGT Hedgehogl ATCATAGCTA TTGTTATACA AGATTGTTTT TTCTCTATTC CCTTGCATCT GCAAGATTGT Cougarl CTTCTTATCA TTGATTTGAA GACTGCTTTT TCACAATCCC CTTAACTGAA CAAGATAAAA DomesticCatl CAGGAACTCC CTCTCCCTTC AATGTTGCCT CAAAATTGGC CGATTCTTAT CATTGATTTG T22 TTAGCGGTCG TGGATTTGCA GGACTGTTTT TTTTCCATTC CACTGCAGGA CCAAGATCGG Sheepl GTTATGATTA TAGATTTACA AGATTGCTTT TTCACTATAC CTTTACATCC AGATGACAGG IPSquirrelI ATTATGATTA TTGATTTACA AGATTGC-TT TTCACTATGC CTTTACATCC AGATGACAGG SIMongooseI CATATTATCA TTGACCTGAA AGATTGCTTT TACAACATTT CTTTGCACCC GGATGACTGA MPMV AAAATAATTA TTGATCTCAA GGATTGTTTC TTTTCTATTC CCCTTCATCC TAGTGACCAA SRVI AAAATAATTA TTGATCTCAA AGATTGTTTC TTTTCTATTC CCCTTCATCC TAGTGATCAA SRVII AAAATAGTCA TTGATCTTAA AGATTGTTTT TTTACTATCC CCCTTCAGCC CGTTGACCAA BabboonSERV AAAATCATTA TTGATCTTAA AGATTGCTTT TTTACAATTC CCCTTCACCC TGCTGACCAA MusD AAGATTGTTA TAGATTTGAA AGATTGTTTC TTTACCATCC CTTTGCATCC AAAGGATTGT TvERV AAAATTGTAA TAGACCTCAA GGACTGCTTC TTTTCCATTC CTCTCCACCC CGACGACTCC SMRVH AAAATTGTTA TTGACCTTAA GGATTGTTTC TTTACCATCC CCTTACACCC TGAAGACAGA Colobusl AAAATTATCA TAGACCTTAA GGATTGCTTT TTTACAATTC CCCTTCATCC TACCGACCAA SImongooseI AAAGTTGTCA TAGACCTTAA AGACTGTTTT TTTTTTACCC CCCTCCACCC TGATGACCAA Goat II AAGATTGTCA TAGACATAAA GGACTGTTTC TTTTCTATTC CCTTACATCC GGATGATTGT OstrichD AAAATCATTA TTGACCTCAA AGATTGCTTC TTTACCATCC CTCTTCATCC ACTCGATAGA LorisI AAGATTATAA TTGACCTCAA AGATTGCTTC TTTACCATTC CCTTACATCC CGAAGACAAG SIMongooseII AAAATAGTCA TTGACCTCAA AGACTGTTTT TTCACCATCC CCTACATCCC CAAAATAGAC Jaagsiekte ATTATTGTTA TAGATTTAAA AGATTGTTTT TACACTATTC CTCTTGCACC TCAAGATTGC RDolphinl CTTTTAGTTA TAGATCTAAA AGATTGTTTT TTCACTGTTC CTTTGTTTCC TGAAGATAGA WFDeerI AAGTTAATTA TTGATTTAAA AGATTGTTTT TATACTATAC CACTGGCATC ACAAGATTGT Cariboul AAATTAATCA TTGATTTAAA AGATTATTTT TATACTATAC CACTGGCACC ACAAGATTGT Giraffel GTGTTAGTCA TAGATCTGCA CGATTGTTTC TTTACCATCC 
CCCTGGCTGT TCAAGATTGT Bisonl ATTATTGTAA TTGATTTGCA AGACTGTTTT TTCACCATCC CCTTGAATGC TGAGGATAAA Muskoxl ATAACTGTT- A AGATTGTTTT TTCACTATTC CCTTAAATCC TGAGGATAAA CFBadgerl TTGATAATAT TAGATTTAAA AGATTGTTTT TTTACTATTC CTCTAGCCCC TCAGGATTGT HRV5 TTAATAGTAC TTGATCTGAA GGACTGCTTT TTTAGCATAC CTCTACATGA ACAAGACACT Goatl ATCCTTGCCA TTGACATACA AGACTGTTTC TTTAGTATTC TACTCCATCC TGAGGATGCC IAPSHamster CTTATTATTA TAGATATTAA GGATTGTTTC TTCTCTATTC CACTTTACCC ACGGGATAGA IAPCHamster ATCATCTCTA TAGATATTAA AGATTGCTTC TTTTCCATAC CTTTGTGTGC CAAGGATTCA IAPMouse TTAATTATTA TAGATATTAA AGATTGTTTC TTTTCTATAC CTTTGTGTCC AAGAGATAGG Mastomysl ATAATAGTTA TAGACATAAG AGATTGTTTT TTCTCCATTC CTCTGTGTGC CAAAGATAGT Myomysl ATTATAGTTA TAGATATAAA AGATTGTTTT TTCTCCATTC CTCTGTGTGC CAAAGATAAT Uromysl ATCATTATCA TTGATATAAA AGATTGTTTT TTCTCCATTC CACTGCTGCC TAAGGACAGG Prairiedogl GTGATTTGTA TAGACGTTAA AGATTGTTCC TTTTCTATTC CATTGAATTC TCAGGATACT RabbitI ATCATTGTTA TAGATCTTAA GGATTGCTTT TTTTCCATTC CATTGGATAA AAAGGATACC NileRatI ATCTTTATAA TTGACATTAA GGATTGTTTC TTTTCTATAC CATTGGCCCC CGCCGATTGT MMTVBR6 ATAATCATAA TAGATCTACA AGATTGCTTT TTTAATATAA AACTGCATCC TGAAGATTGT SDunnartI ATGGTGATTA TTGATATAAA GGATTGCTTT TATTCTATTC CCTTACATTC TGCTGACAGG RKangal GTCGTAGTAA TAGACATCAA AGATTGCTTC TATTCAATAC CCCTTCACCC AAAGGATAAG


PlatypusI TTAATTATAA TAGACCTTAA AGATTGCTTT TATACGATCC CACTTCAAAC AGAGGAGCGA Echidnal ATATTAGTAG TAGATATTCA GGATTGTTTT TCCTCTATTC CGCTCCACCC GGGAGATCAT Armadillol CTGGTCGTAC TGGATATAAA AGATTGCTTC TTCTCCATTC CCTTGCATGA CACAGACAAG PFalconI TTAACTGTCA TAGATCTTAA AGATTGTTTT TTCAGTATCC CGTTACACCC AGATGATGCC Prairiedogli TTAACTGTCA TAGATCTTAA AGATTGTTTT TTCAGTATCC CGTTACACCC AGATGATGCC MHarrierl TTAACTGTTA TTGACCTCAA AGACTGTTTT TTTAATATCC CCCTGCATCC AAATGATGCT Vulturel TTAACCGTAA TTGACCTTAA AGACTGTTTC TTTAATATTC CTTTGCATCC GGATGATGCC ETinamoul TTGACCGTTA TTGATCTCAA AGACTGTTTC TTTAATGTCC CTTTGCATCC GGATGATGCT Moorhenl CTTACAGTGA TTGATTTAAA AGATTGTTTC TTTAACATAC CTTTGCACCC TCAGGATGCA HThrushl TTAATAGTCA TCAATTTAAA AGATTGCTTT TTTGATATTC CTTTGCACCC TGATGATGCT ESOw1I CTTGTAATAA TTGATCTAAA GGATTGGTTT TTTATTATTC CCTTGCATCT GGATGATGCA RNPheasantl TTGACTGCTA TTGATTTAAG GGACTGCTTC TTCACCATTC CGCTGGATCC TCAAGATGCA Peacockl TGGATCC TCATGATGCG BluetitII CTAGCAGTAC TGGATATTAA AGATTGCTTT TTTCAAATTC CCCTGCATCC AGAGGATGCA CMagpiell CTAGCAGTCA TAGACATCAA GGACTGCTTC TTCCATATCC CCCTGCATCC GGACGATGCA Penguinl ATTGTGATTA TCGATTTGAA AGACTGCTTT TTCACGATTC CCTTAGCGCC GGAAGATGCT GSKiwiI ATTGTTATAG TGGACCTGAA GGATTGCTTC TTTACCATTC CCCTGGTGCC AGAAGACGCG LKiwiI ATTGTTATAG TGGACCTGAA GGACTGCTTC TTTACCATTC CCCTGGCGCC AGAAGACGCG HThrushl ATTTTGGTGA TCAATTTGAA GGACCGTTTT TTCATAATCC CCTTGTGTCC TGAAGACAGG ES0w1II ATCATTGTGA TAGATTTAAA GGATTGCTTC TTTACTATTC CATTGGCATC CCAAGATAAA WFgoosel ATACTAATTA TTGACTTGAA GGATTGTTTT TTTACTATAC CACTTCACTC TGAAGATGCT NABDuckIII ATTATTATTG TTGACCTCAA GGACTGTTAT TTTACCATTC CACTCCACCC TAAGGATGCA Flamingol ATATTAATTA TTGATTTAAA GGATTGCTTT TTCACTATAC CACTTCACCC TGAAGACGCT GRheaII TTACTCGTGG TTGATCTTAA AGATTGTTTC TTTACAATAC CGCTACATCC TGAGGACAGT DRheaI TTACTCGTGG TTGATCTTAA AGATTGTTTC TTTACAATAC CGCTACATCC TGAGGACAGT BrownKiwil CTGCTAGTAA TTGATCTGAA AGACTGTTTC TTTAGTATCC CCTTACATCA GGAGGACTCC RBowerIII CTGTTAGTTG TCGACTTGAG GGATTGCTTT TTCACAATTC CTCTGCATGA AGATGATAGC 
Toucanettell TTGTTTATAG TTGATTATTT GGATTGTTAT TTGACTATTC TGTTACACCC CGAAGACGCT GWoodpeckerl TTGATCATTA ATGATTTAAA GGATTGTTTC TTTACAATTT ACTTACATCC GGAGGATTCA ESOw1II ATATTGATAA TAGATTTAAA AGGTTGGGTT TTTACAATAC CTTTGCACAA GCAGGATGCA GRheaI ATTGTGGTAA TTGATTTGAA AGATTGTTTC TTTACAATCC CACTTCATCC TGCTGATAAG JQuailI ATCTTAGTAA TTCATCTTAA AGACCATTTT TTCACAATTT CATTGCACCT AGCAGACGCA Cassuaryl CTTGTTGTTA TTGACCTTAA GGATTGCTTT TTTACTATAC CATTACATCC AGCGGATGCA AZMagpieII CTTGTTGTCA TAGACTTAAA AGACTGCTTT TTTACCATTC CCTTGCACCC TGATGATGCG CMagpieIII CTCCTTGTCA TAGTCTTAAA AGACTGCTTT TTTACCATTC CTTTGCACCC TGATGATGCA MThrushl TACATTTAAA AGACTGTTTT TTTACCATTC CCTTGCACA- C TGATGATGCA CMagpiel CTTGTAATTA TCGATCTAAA AGACTGTTTC TTTAACATCC CACTGCATCC AGACGATGCT Guineafowll GTTTTTACAA TTGACCTCAA AGATTGCTTT TCCAATATCC CACTGCACCC TGATGACCAG PartidgelV TTGCTAGTGA TAGACCTGAA AGACTGTTTT TTCACCATTG CTTTGCACCG TGATGATCAG HThrushll CTATTCATTA TTGACATTAA AGATTGTTTC TTTAGTATAC CTCTGCACCC AGATGACACC GPheasantl TGTCTGCTAA TTGACCTGAA GGACTGCTTC ---CTTTTTC CTCTACATTC AGATGATTCA GPheasantll TGCATTATTG TGGACCTGAA GGACTGTTTT TTCAGCATTC CCCTTCACCC CGATGACTGC NABDuckI TGCCTTGTCA TAGATCTCAA GGACTGTTTT TTTTCAATCA CATTGGTGGA ACAGGACAAG Loonl ATCTTGATTG TAGTTTTAAA AGACTGTTTC TTTACCATTC CCTTGCATCC TCAGGATACA Ostrichl CTGCTAAGTA TTGATTTAAA GGATTGCTTT TTTACAATTG ATTTACATCC TCAGGATCAA MoorhenII CTATTAATCA TTGATCTAAA GGATGGCTTC TTTACAATTG CCCTACATCC AAATGACACA Goshawkl TTATTAATAA TTGACCTAAA AGACTGTTTT TTCACCATCT TCTTACATCC CAGAGACACT PigeonI TTGTTAATTG TGGATTTGAA GGATTGCTTC TTTACTATTT CTTTGCACCA TCTTGATACA FHawkI CTACTGATTA TTGATTTAAA GGACTGTTTT TTCACCATCT TTCTACATCC CAAAGACACG BGrousel CTGTTAATAA TTGATCTGAA GGATTGCTTT TTTACAATCA AAATTCACCC CAATGATTCA BGrouseII TTATTGGTCA TTGATCTGAA AGATTGTTTC TTCACAATTC CTCTCCATCC TCAAGATACC PeacockII CTGTTGATAA TAGATTTGAA GGACTGCTTC TTTACTATTC GCATTCATCC AAATGATACC NABDuckIV TTATTGATTG TGGATCTGAA AGACTGTTTT TTCACAATAG CCCTCCATGA AAAAGACAAA BluetitIII TTGTTAATAA TTGATTTAAA 
GGACTGTTTT TTCACTATCT CACTGCATGA ACAGGATACT Toucanettel TGCGCCGTCC TTGATATTAA GGACTGCTTC TTCTCCATAC CTTTGCATGA AGAAGACAAA NABDuckII TGTGCCGTCC TTGATATTAA GGACTGTTTT TTCTCGATAC CTCTACATGA AGAAAACAAG HThrushIII TGTGCAGTGC TCGACATCAA AGACTGTTTC TTTTCTATCC CCCTGCACGA GGACGACAAG BlueTitI TGTGCTGTTC TTGATATCAA AGACTGTTTC TTTTCTATAC CCTTTCATGA AGAAGACAGA T7 TTTGCTGTTC TTGATATCAA AGACTGTTTC TTTTCTATAC CCTTACATGA AGAAGACAAA MHarrierII GTCGTAGTGA TGGACATCAA AGATTGTTTC TTTTCAATTC CTCTAGCTAA AGAGGATCAG Emul GTGGTGGTGA TGGATATCCA AGATTGCTTT TTCTCTATAC CATTAGCCCC TCAGGACAAA LKiwiII ACTATGGCAG TAGACATCAA AGATTGCTTC TTTTCTATCC CGCTGCACTC ACGGGATAGG LDV GTAATAGCCA TGGACATCTC CGACTGCTTT TTCTCAATCC CGCTGGCAGA GCGGGACTCT RSV CTGATGGTCT TAGACCTCAA GGATTGCTTC TTTTCTATCC CTCTTGCGGA ACAAGATCGC ALVsubgrpJ CTGATGGTCC TAGACCTCAA GGATTGCTTT TTTTCTATTC CTCTTGCGGA ACAAGATCGC Tragopanl GTGGTGGTCA TTGACCTGAA GGATTGCTTC TTCTCTATAC CCCTCGCGGA AAGGGATAGG GuineafowlII CTAGTGGTCA TAGATCTTAA GGATTGTTTT TTCTCAATCC CTTTGGCGGA GCAAGACCGA LoonlI TTGTTTATAG TTGATTTGAA AGATTGTTTC TTCACTATTC CGTTACACCC CGAAGACGCT


Appendix 2. Nucleotide alignment (continued) BLV TTCTACTTGT CTTTTACCCT CCCATCCCCC GGGGGACTCC AACCTCATAG ACGCTTTGCC HTLV1 CCCTACTTTG CTTTCACTGT CCCACAGCAG TGTAACTACG GCCCCGGCAC TAGATACGCC HTLV2 CCATACTTCG CCTTCACCAT TCCCCAGCCA TGTAACTATG GCCCCGGGAC CAGATATGCA HIV1 AAGTATACTG CATTTACCAT ACCTAGTATA AACAATGAGA CACCAGGGAT TAGATATCAG HIV2 CAGTATACTG CATTTACTCT ACCATCAATA AACAATGCTG AACCAGGAAA AAGATACATA SIVagm AAGTATACCG CTTTTACCAT TCCATCAGTA AATAATCAAG GGCCAGGTAC TAGATATCAG FIV CCTTATACAG CATTTACTTT ACCTAGAAAA AATAATGCGG GACCAGGAAG GAGATTTGTG BIV CCCTTTACAG CCTTCTCTGT AGTCCCTGTA AATCGAGAAG GACCTATAGA GAGGTTCCAG Jembrana CAATACACTG CTTTCTCGGT GGTCCCGGTG AATAGAGAAG GGCCTCTAGA AAGATATCAT CAEV GAGTACACAT GTTTTACTCT ATTAAGTCCT AATAATCTAG GACCATGTAA AAGATACTAT Visna CAATATACAT GCTTTACCAT GTTAAGTCCA AATAATTTAG GACCATGTGT AAGATATTAT EIAV CCATATACAG CTTTCACTAT TCCCTCCATT AATCATCAAG AACCAGATAA AAGATATGTG HERVK10 GAAAAATTTG CCTTTACTAT ACCAGCCATA AATAATAAAG AACCAGCCAC CAGGTTTCAG HML1Z70280 GAAAAATTTG CTTTTATTGT TCCTGCCATA AATAATAAAG AACCAGCGGA CAAATACCAT HML3ACO25577 GAATGGTTTG CATTTACAAT TCCTGCAGTA AACAACCTGC AGCCTGCTAA GCGTTATCAT HML3AC092364 GAAGGGTTTG CATTTGCAAT TCCTGCAGTA AACAACCTGC AGCCTGCTGA GCGTTATCAT HML4AC093517 GAAAAATTTG CTTTCACTGT TCCAGCCCTT AACAACGTTG CTCCTGCAGC ACGTTACCAT HML5AP000870 GAAAAAATTG CATTTACAAT ACCAGCTATC AATAATGAAA GGCCAGCTTG CCAATTTCAT HML6AF069508 CCTCAATTTG CCTTCTCTGT GCCTTCTATT AATCAAAGAG AGCCTGTCTC TCGCTCTCAG HML7AC013722 GAAAAATTTG CTTTCACTAC ACCTTCCATT AACAATTCAG CTTCTGCAGC TAGATATCAA HML8AL596245 GAAAAATTTG CTTTTACTGT ACCGTCTATC AATAATCAGG ATCCTGCAGC TCATTATCAA HML9ACO25569 GAAAAATTTA CTTTTACCAT TCCCACTTAT AACAATCAGC AACCAGTTCA ACGTTATCAA HML9AC068700 GACAAATTTG CCTTTACCAT TCCCACTTAT AATAATCAGC AGCCAGTTGA ACGCTATCAA Oryzomysl GAAAAATTTG CTTTTACAGT ACCTACTTAT AATAATTCTC AGCCTCAAAG GAGGTACCAG Armadilloll GAAAAATTTG CGTTTACTAT TCCTTCTATT AATCATTCAG CTCCTACTGA ACGATATCAG Reedbuckl GAAAAATTTG CATTTACTAT TCCTTCTATT AATCATTCAC CTCCTACTGA ACCATATCAG 
Muspaharill GGAAAGGTTT GCTTTCCCGT GCCTACTGTT AGTAACAGTC ATCCACTAAA GAGATACCAC NileRatII GAACCCTTTG CATTCTCAGT GCCCTCCATT AACAACTCTG TCCCAGCCTC ACAATATGAG Hedgehogl AAACGTTTTG CCTTCTCTGT TCCTTCTATT AATAATGTCA GGCCCTGCTG ATAGATTTGA Cougarl AAAAATTTGT CTTTACTGTT CCTGTTTTTC AATCACTGTT GGCCAACACA CTGTTACCAT DomesticCatl AAAGATTGCT TTTTCACAAT CCCCTTAGCT GAACAAGATA AGAGAAAAAT TTGCCTTTAC T22 GCAAAATTTG CCTTTACTGT TCCGGTACTT AATAACAGCC AACCAACGGC GAGGTATCAA Sheepl CAACATTTTA CCTTCTCAAT ACCTTCCATT AATAATCAAA CCCCTGTTCA ACGGTATCAA IPSquirrell CAACGTTTTG CCTTCTCAAT ACCTTCTATT AATAATCAAT CCCCTGCTCA ATGGTATCAA SlMongoosel CAACGTTTTG CTTTTTCTTT GCCTGCTCTT AATAATCAAG CTCCTAGAGA GCGCTTTCAT MPMV AAAAGATTTG CCTTCAGCCT ACCATCCACA AATTTTAAAG AACCTATGCA ACGTTTTCAG SRVI AAAAGGTTTG CATTCAGCCT ACCTTCCACA AATTTTAAGG AACCTATGCA ACGTTTTCAG SRVII AAGCGATTTG CTTTTAGTCT TCCGTCTACC AACTTTAAAC AACCAATGAA ACGTTATCAA BabboonSERV AAAAGATTTG CCTTTAGTCT TCCATCTACA AATTTTAGAC AACCAATGAA GCGCTATCAA MusD GAGAGATTTG CTTTTAGTGT TCCTTCTGTA AATTTCAAGG AACCCATGAA AAGATATCAA TvERV AAAAGATTCG CCTTCACTGT CCCAGTTACC AATTGCGTAG GACCCTCTCC TCGCTTTCAA SMRVH CCTTACTTTG CCTTTAGCGT CCCTCAAATC AACTTCCAAA GTCCTATGCC TCGTTATCAG Colobusl AAACGATTCG CCTTTAGCTT ACCATCTATA AATTTTAGAG AACCAATGAA GCGATACCAA SImongooseI GAGCTATTCG CTTTTAGCGT GCCTTCTGCT AATCTAAAAA GACCCATGAT AAGATATCAA Goat II GAACATTTTG CCTTCAGCAT CCCTATTGTT AATTTTGCTG GACCCATGCC TCGCTTTCAA OstrichD GAGCATTTTG CGTTTAGCCT CCCAGTTGTT AATTTTAAAG GACCGATGCA GCGCTATCAG Lorisl GCACGCTTTG CATTTAGCTT ACCAGTAACT AATTTCAAAG GACCCATGCC TCATTTCCAT SIMongooseII AACGCTTTGC CTTCAGGCTG CCAGGTTGTC AATTTCAAAG GACCTATGCC GCGCTTTCAG Jaagsiekte AAAAGATTCG CTTTCAGTTT ACCCTCTGTT AATTTTAAAG AGCCTATGCA ACGCTATCAA RDolphinl AAACATTTTG CTTTTAGCTT GCCTGCCTTA AATTTTAAGG AACCTATGAG GAGATTTCAA WFDeerI CCTCGTTTTG CTTTTAGCGT CCCTGCTAAT AACTTTCATC AACCCATGAA GCGTTATCAA Cariboul CCTCGTTTTG CTTTTAGTGT CCCTGATAAT AACTTTCATC AACCCATGAA GCATTATCAA Giraffel AAGAGGTTTG CTTTCAGTCT CCCTTCAGTT AATTTTAAAC 
AGCCCTATAG AAGGTTTCAA Bisonl AAGCGATTTG CCTTTAGTGT ACCAGCAGAA AATTTTAAAC AGCCTCATTT AAGGTTTCAA Muskoxl AAGCGATTTG CCTTTAGTCT GCCATCAGAA AATCTTAAAC AGCCTTACTT ACGGTTTCAG CFBadgerI GAACGGTTCG CTTTTACTAT TCCTAGTACT AATCATAAGG AGCCTGCTAA GCGATATCAG HRV5 CAGAGGTTTG CTTTCACCAT ACCTTCCATT AATCATCAAG GTCCTGACAA AAGATATGGA GoatI CCTAGATTTG CATTTTCAGT ACCCAGTATA AATTGTCATG AGCCGAGTCA ACGGTATCAC IAPSHamster CCAAGGTTTG CCTTCACTAT CCCTTCTCTT AATCATATGG AACCAGACAA GAGATTTCAG IAPCHamster GGGCGTTTTG CGTTTACGCT GCCCTCTTGT AATCATGAAC AACCTGATTT AAGGTATGAA IAPMouse CCCAGATTTG CCTTTACCAT CCCCTCTATT AACTCAGATG AACCTGATAA CAGGTATCAA Mastomysl GAGAGATTTG CATTTACTAT TCCATCCTGT AACCATGAAG AACCTGATCA GAGGTTTGAA Myomysl GAGAGATTTG CATTTACTGT TCCATCTTGT AATCATGAAG AACCTGATCA AAGGTTTGAA Uromysl ATCCGTTTCG CTTTCACCTT ACCAGCAACC AACCATCAAG AACCTGATAA ACGGTTTCAA Prairiedogl AAACACTTCG CTTTTACTCT GCCTTCATGT AATCATGAGG AGCCTGATCA AAGGTTTGAA Rabbitl CCGCGTTTTG CTTTCACCGT CCCCACCCTA AATCAGGAAC AACCGGACAA ACGTTGGGAA NileRatI GAGCGGTTTG CTTTTACTGT CCCCGCAATA AACAATGAAG AACCTGATGC CCACTACCAA MMTVBR6 AAAAGATTTG CTTTTAGTGT GCCCTCCCCT AATTTTAAGA GACCTTATCA AAGATTCCAA SDunnartI GAGAAATTTG CCTTTTCTGT TCCAGCTGTT AATTTACAAG CTCCTGCCCC TAGGTGGCAA RKangal GAGAAATTTG CCTTTTCCGT ACCATCAACA AATTTGAATG GCCCCTATGA TCGACATCAA


Platypusl AAAAGATTTG CATTCTCAAT CCCTTATATC AATGTTAGAG CCCCCACCCA ACGATATCAA Echidnal ATTCGATTGT GGCCAACAGG ACTCAACCTA CAGTGTGAT- ATCAG Armadillol GAAGAATTCG CCTTCTCAGT TCCCCAAATT AACCACCAAG GGGCCAATGA ACGCTATCAG PFalconI CTGAAGTTTA CTTTTTCAGT TCCAAGCATC AACATGCAAG CTCCATTGCA GAGATATCAG Prairiedogil CTGAAATTTG CTTTTTCAGT TCCAAGCATC AACATGCAAG CTCCATTGCA GAGATATCAG MHarrierl CCTAAATTTG CTTTTTCAGT TCCAAGCGTC AACATGCAAG CCCCGTTACA AAGATACCAG Vulturel CCCCGATTCG CTTTTTCGGT CCCAAGCACC AATATGCAGG CGCCTTTACA AAGATACCAG ETinamoul CCCAAATTCG CTTTCTCGGT CCCAAGCGTC AATATGCAAG AGCCTCTACA AAGATACCAG Moorhenl CCTAAATTTG CCTTTTCAGT TCCAAGCATC AATGTGCAAG AGCCTTTACG CAGATTCCAT HThrushl CCTAAGTTTG CTTTTTCTGT ACCTAGCACT AACATGCAAG CTCCCTTGCA GCGATACCAC ESOw1I CCTAAATTTG CCTTTTCTGT TCCCAGTATT AATGTGAGTA AACCTGCAAG ACGGTACCAT RNPheasantl CCTAAGTCTG CATTCTCAGT WCCCTCCAGT AATGCGTCTG AACCGGCTAA ACGATATCAT Peacockl CCTAAATTTG CATTCTCAGT TCCCTCCATT AATGTGTCTG CACTGGCTAA ACAATACCAT BluetitII CCAAGGTTTG CCTTTTCCAT TCCTACCATT AACAGGGAAG CCCCAATGAA ACGGTACCAC CMagpiell CCCCGGTTTG CCTTCTTGGT GCCTTCCCTC AACCGAGAGG CTCCCATGCA GCGCTACCAG Penguinl CCCAGATTTG CCTTCTCTCT CCCAGTTGTA AACCATCAAG GGCCTAGGCT GCGATATCAT GSKiwiI CCAAGATTTG CTTTTTCGGT ACCATCCAGA AACCATCAAT CGCCTATGCA AAGATATCAT LKiwiI CCACGATTTG CTTTTTCGGT ACCATCCAGA AACCATCAAT CACCGATGCA AAGATATCAT HThrushl CCAAAATTTG CTTTCACAGT ACCTGCAGTG AACAATTCTG AGCCAGGACA GAGATATCAA ESOw1II TTTG CGTTCTCTGT GCCTTCTCTA AATCATGCAG AGCCTTCGAA AAGGTACCAT WFgooseI CCCCGATTTG CCTTTTCAGT CCCAAAAATC AACAGGGGTG AACCCATAGA CAGATATCAC NABDuckIII CTGTGATTTG CATTTTCGGT ACCAAAAATA AACAAAAGTG AACCCATGGA CAGATATCAC Flamingol CCTCGTTTTG CCTTTTCAAT TCCGAAACTT AACAGAAGTG AGCCTATGGA CCGATATCAT GRheaII GAGAAGTTTG CTTTTACAGT ACCATCCATT AATAGAGCCG CCCCAGCTCA ACGCTTTGAA DRheaI GAGAAGTTTG CTTTTACAGT ACCATCCATT AATAGAGCCG CCCCCGCTCA ACGCTTTGAA BrownKiwil AAACGCTTTG CATTCTCAGT GCCAACGATA AATCGATCTG GTCCTTCGAG ACGCTATCAG RBowerIII GAGAAGTTTG CCTTCTCGGT ACCATCCATA AATAAGTCTG 
AACCAACAAA GAGGTATCAA Toucanettell GAAAAATTTG CCCTTTCTGT CCCATCCGTA AATAAAAAGG AACCGGCAAA ACGGTACCAC GWoodpeckerl GAAAGATTTG CATTAACTGT GCCATCAATT AATAAGGCAG AACCTGCCCA ACTATACCAA ES0w1II GAGCCCTTCG CCTTTTCTGT GCCGACAATT AATAAGGCTG AGCCGATGAA AAGATATCAT GRheaI GAGAAATTTG CCTTTAGTGT ACCATCAATT AATAAACAGG AACCTTATCA TCGATGTCAG JQuailI TCTAACTATG CTTTCTCAAT ACCATCTATC AACAAAGCTG GGCCAATGTG ACAGTATTAT Cassuaryl CCACGATTCG CATTTACAGT TCCAGCGGTC AACAATCAGG AGTCTACACT TCGCTTTCAC AZMagpieII CCTCGTTTTG CTTTTTCAGT GCCCACCCTG AACCATGCCG AACCAATGAA AGGGTATCAC CMagpieIII CCTCATTTTG CTTTTTCAGT GCCCACCCTG AACCACGCAG AGCCAATGAA AAGGTATCAC MThrushl CCTTGCTTTG CCTTTTCAGT GCCCCACTCC AAACCACACA GGCCAATGAA AAGATATCAC CMagpiel CCTCGTTTTG CCTTTTCCGT TCCAAGTACA AACCTGCAAG AACCACTTCA AAGGTACCAT Guineafowll GAACGTTTCG CGTTAATGGT CCCGAGGCTG AATAATGAGG GCCCTGCTGC TCGGTACCAC PartidgelV AAGCGTTTTG CTTTAACTAT TCCCAGTGTG AATAATGATG CTCCTGCCCG CCGGTATCAT HThrushll TGTAGGTTTG CTTTTAGCAT CCCTTCTATC AATAACCATG CTCCAATGAA AAGGTATGAA GPheasantl CCTAGGTTTG CCATGACGGT CCTAGCTGTT AACAATGCAG AACCAGCAAG GTGGTACCAA GPheasantll AAGCGTTTTG CCCTCACTGT ACCAGCCCTT AACAACGCAG AACCGGCCAA AAGGTTTGAA NABDuckI CAGCGTTTTG TGCAGACTGT TCCATCTGTT AATAATCAGG CACCTGCTAA ATGGTATGAA Loonl CAGTGATTTG CATTTTCAAT ACCAGCTATA AACAAAGCCT CACCGGCAGA CTGCTATGAA Ostrichl AAACGGTTTG CTTTTACCTT ACCCTCCCTT AACAGAGAGG GCCCAGATCA ACGATTTGAA MoorhenII AAACGTTTTG CCTTCACGCT CCCAGCAATT AACAGAGGAG AACCGGACAA GAGATTTGAA GoshawkI GAGCGCTTTG CTTTTACGTT ACCATCGGTT AATCGAGCTG CCCCAGCTCG CAGATTTGAA Pigeonl CAGCGACTTG CCTTCACTTT GCCAGCGATA AATAGGGAAG CTCCTACTCA GCGGTTTGAG FHawkI GAACGTTTTG CTTTTACATT ACCATCGGTT AATTGAGCTG GCCCAGCTCG CAGATTTGAG BGrousel GAACGCTTTG CTTTTACATT GCCAGCAATC AATAAGAGCA TGCCAGAGGC GCACTATCAA BGrousell AAACGTTTTG CCTTTACTCT GCCAGCTATA AATAGGGGAG AGCCTGACAA ACGCTTCGAA Peacockil CCCTGTTTTG CTTTTACGCT GCCATCGATT AATCGCAGCT CTCCTGCGAT GCGCTATGAG NABDuckIV CAGAGGTTTG CCTTTACTCT CCCGGCAATA AATCGTGAAG GTCCAGATCA AAGATTTGAA 
BluetitIII TGCAGATTTG CTTTTACCTT GCCTTCATTA AATCAAGCAG AACCAGACAA GAGGTTTGAA Toucanettel GAGCGATTTG CTTTTTCTGT CGTGTTCCCA AACAGCGAAC GTCCCAATCT ACGCTTCCAG NABDuckII GAGCGGTTTG CTTTCTCTGT TGTGTTTCCA AACAGCCAAT GCCCCAACTT ATGCTTCCAG HThrushIII GAGCGGTTCG CCTTTTCCAT CGTCTTTCCG AACAGCCAAC GGCCCAACCT ACGCTTCCAG BlueTitI GAGGCGGTTT GCTTTTCAGT GGTGTTTCCA AGGCAGCCAG CGCCCCAACT TCCCAGTGGA T7 GAGCAGTTTG CTTTCTCAGT GGTGTTTCCA AACAGCCAGA GCCCCAACCT ATGCTTCCAG MHarrierll CACAAATTTG CTTTCACGTT ACCTGCTGTG AACCTGCAAG GTCCCAGTAA GCGATATCAG Emul GCTCGATTTG CCTTCACACT GCCCTCAGAG AATCTGCAGG AGCCAGCAGC GCGATATCAA LKiwiII GAGAGATTCG CATTCACAGT TCCATCGGTG AATAAGCAGG AGCCGGCGCG GCGATACCAA LDV GAGCGGTTTG CCTTTACGAT TCCCTCTCCG AACCTCCGCG AGCCTGCCAA AAGATACCAG RSV GAAGCTTTTG CATTTACGCT CCCCTCTGTG AATAACCAGG CCCCCGCTCG AAGATTCCAA ALVsubgrpJ GAAGCTTTTG CATTTACGCT CCCCTCTGTG AATAACCAGG CCCCCGCTCG AAGATTCCAA Tragopanl GAGGCTTTTG CATTTTCTGT CCCAGTTCAG AACAATCAGG GACCGGTCCA GCGATACCAA GuineafowlII GAAGCATTTG CCTTTACCGT GCCAGCTCCC AGTAATCAAA GCCACACTGA TAGGTATCAG LoonII GGAAAATTTG CCTTTTCTGT CCCATCCGTA AATAAAAAGG AACCGGCAAA ACGGTACCAC


Appendix 2. Nucleotide alignment (continued) BLV TGGCGGGTCC TACCTCAAGG CTTCATTAAC AGCCCAGCTC TTTTCGAACG AGCACTACAG HTLV1 TGGAGAGTAC TACCCCAAGG GTTTAAAAAT AGTCCCACCC TGTTCGAAAT GCAGCTGGCC HTLV2 TGGACTGTCC TTCCACAGGG GTTTAAAAAC AGCCCCACCC TCTTCGAACA ACAATTAGCA HIV1 TACAATGTGC TTCCACAGGG ATGGAAAGGA TCACCAGCAA TATTCCAGTG TAGCATGACA HIV2 TATAAAGTCT CACCACAGGG ATGGAAGGGA TCACCAGCAA TTTTTCAGTA CACAATGAGG SIVagm TTCAACTGTC TTCCACAAGG ATGGAAGGGA TCCCCAACTA TTTTTCAGAA CACAGCAGCT FIV TGGTGTAGTC TACCACAAGG CTGGATTTTA AGTCCATTGA TATATCAAAG TACATTAGAT BIV TGGAATGTTC TACCACAAGG ATGGGTATGT AGCCCTGCCA TTTATCAGAC TACCACCCAG Jembrana TGGAATGTGT TACCTCAAGG GTGGGTGTGT AGCCCTGCCA TATATCAAAC AACCACTCAG CAEV TGGAAAGTGC TGCCACAAGG TTGGAAATTG AGTCCATCTG TATATCAATT TACTATGCAG Visna TGGAAAGTGT TACCACAAGG ATGGAAATTA AGTCCTGCAG TGTATCAATT TACAATGCAA EIAV TGGAATTGTT TACCACAAGG ATTCGTGTTG AGCCCATATA TATATCAGAA AACATTACAG HERVK10 TGGAAAGTGT TACCTCAGGG AATGCTTAAT AGTCCAACTA TTTGTCAGAC TTTTGTAGGT HML1Z70280 TGGAAAGTAC TGCCACAAGG CATGCTAAAT AGCCCAACTG TTTGTTAAAC TTATGTCAGG HML3ACO25577 TGGGAAGTGT TGCCATAGGG CATGTTAAAC AGTCCCACAA TTTGCCAGAT GTATGTGGGG HML3AC092364 GGGAAAGTGT TGCCAAAGGG CATGTTAAAC AGTGCAACAA TTTGCCAGAT GTATGTTGGG HML4AC093517 TGGAAAGTCC TACCTCAAGG TCTGCTCAAT AGCCCCACTA TTTGTCAATA TTATGTGGGA HML5AP000870 TGGAACGTGC TTCCTCAAAG AATGCTGAAC AGTCCTACCA TGTGTCAGTA TCATATAAAT HML6AF069508 TGGAAAGTTT TAGCCCAAGG CATGCTCAAC AGTCCTATGT TATGTCAGCA TTTTGTAGGA HML7AC013722 TGGAAAGTTT TACCTCAAGG AATAATTAAC AGTCCTACTA TTTGTCAGTT GTTTGTCAGT HML8AL596245 TGGAAAGTAC TTCCTCAGGG AACACTGAAT AGCCCTACAA TCTGCCAGCT TTATGTTGAA HML9ACO25569 TGGAGAGTTC TACCCCAAGA GATGATAAAT AGCCCA-CTA CATGTCAGCT TTACGTACTT HML9AC068700 TGGACAGTTC TGCCCCAAGG GATGATGAAT AGCCCTACTA TATGTCAGCT TTACGTACAT Oryzomysl TGGAGGGTCC TCCCTCAAGG AATGCTAAAT AGCCCTACTC TGTGTCAATA TTTTGTACAA Armadilloll TGGAAAGTTT TACCTCAGGG AATGATGAAC AGTCCTACAT TATGTCAATC CTTTGTTCAC Reedbuckl TGGAAGGTT- TACTTCAGGG AATGATGAAC AGTCCTACAT TATGTCAATC CTTTGTTCAC 
Muspaharill TGGAAGGTTC TTCCTCAGGG AGTGTTAAAC AGTCCAACTT TATGTCAGTA TTTTGTGTAA NileRatII TGGATCATTC CTCCCCAGGG AATGGCTAAC AGTCCTATCA TTTGTCAGGA GGCTGTAGCC Hedgehogl ATAGTGTTGC TGCCCCAGGC ATGGCCAGTA GTCCTACAAT TTGACAAGAG GCAGGTTAAA Cougarl TGGAAAGTCC TCCCACAAGG GATGCTTAAT AGCCCACCTA TGTACCAATT TTATGTTAAT DomesticCatl TGGGTTCCTG TTCTCAATCA CTGCGGGCCA CCACCCTGTT ACCATTGGAA GGGT T22 TGGAAGGCGT TGCCACAGGG AATGTTAAAT AGTCCAACAT TATGTCAACT GTATGTTCAT Sheepl TGGAAGGTCC TGCCCTAAGG TATGATGAAC TCTGTTATGG TCTGTCAATT CGTTGTTGAT IPSquirrell TGGAAGGTCC TGCCCCAAGG TATGATGAAC TCCCCTACGG TCTGTCAATT CATTGTTGAT SIMongooseI TGGAAGGTCC TCCCCCAAGG TATGATGAAT TCTCCAACAA TGTGTCAAAT GTTTGTGCAT MPMV TGGAAGGTTT TACCACAAGG TATGGCCAAC AGTCCTACCT TATGTCAAAA ATATGTGGCC SRVI TGGAAAGTTT TACCGCAACG TATGGCCAAC AGCCCTACCT TATGCCAAAA ATATGTGGCC SRVII TGGAAAGTGT TGCCTCAAGG CATGGCCAAT AGTCCTACCT TGTGTCAAAA ATATGTAGCT BabboonSERV TGGAAAGTCT TACCTCAGGG TATGGCCAAT AGTCCTACCT TGTGTCAAAA ATATGTAGCT MusD TGGACAGTTC TCCCGCAGGG CATAGCTAAT AGTCCCATCT TATGTCAAAG GTTTGTGGCA TvERV TGGAAAGTTC TCCCCCAAGG CATGACCAAC AGCCCTACCC TCTGCCAGAA GTATGTTGCC SMRVH TGGAAGGTTC TGCCACAGGG CATGGCCAAC AGTCCCACAC TGTGCCAAAA ATTTGTTGCT Colobusl TGGAAAGTGT TACCTCAAGG TATGGCCAAC AGTCCCACCT TGTGTCAAAA ATATGTGGCT SImongooseI TGGAAGGTTC TACCTCAGGG CATGGCTAAT AGCCCTACCT TATGTCAAAG CTTTGTGGCA GoatII TGGCGGGTTT TACCACAAGG CATGGCAAAC AGCCCTACCC TCTGCCAGAG GTTTGTTGCT OstrichD TGGAAAGTTC TCCCTCAGGG CATGGCCAAT AGTCCCACTC TGTGTCAAAA ATTTGTTGCT Lorisl TGGAAAGTTC TACCTCAAGG CATGGCCAAT GGCCCAACAC TGTGCCAAAA GTTTATAGCC SIMongooseII GTGGCGTGTA CTACCTCAGG GTATGGCCAA CAGCCCCACT CTCTGTCAAA AATTTTTGCT Jaagsiekte TGGAGAGTTC TCCCGCAAGG AATGACTAAT AGCCCTACGC TGTGCCAAAA ATTTGTTGCT RDolphinl TGGAAAGTAT TGCCTCAGGG CATGGCAAAT AGTCCTACAT TATGCCAAAA ATATGTGGCT WFDeerI TGGAAAGTAC TCCCTCAAGG CATGGCTAAC AGCCCCACTT TATGTCAAAA GTTTGTTGCT Cariboul TAGAAAGTAC TCCCTCAAGG CATGGCTAAC AGCCCCACTT TATGTCAAAA GTTTGTTGCT Giraffel TGGAAGGTTT TGCCTCAGGG AATGAAAAAT
AGCCCCACCT TGTGTCAGAA ATTTGTGGCT Bisonl TGGAAAGTGT TGCCCCAGGG AATGAAAAAT AGCCCAACTT TGTGTCAAAA ATTTGTTAAT Muskoxl TGGAAGGTTT TGCCTCAGGG AATGAAAAAT AGCCCTACCT TTTGCCAAAA ATTTGTTAAT CFBadgerl TGGCAAGTTT TGCCCCAGGG TATGGCTAAT AGCCCCACCT TGTGTCAAGA GTTTGTGTCT HRV5 TGGAAAGTGC TTCCCCAAGG AATGACTAAT AGTCCTGCCA TATGCCAGCT ATATGTTGAC GoatI TGGGTGGTTT TGCCTCAAGG CGTGGCTAAC AGTCCAACCA TATGTCAAGT ATATGTTGCT IAPSHamster TGGAAGGTAC TGCCGCAAGG CATGGCCAAT AGCCCAACAA TATGTCAGCT ATATGTGCAG IAPCHamster TGGGAGTGTT GGCCACAGGG GATGGCCAAT AGTCCTACTA TGTGTCAGTT GTTTGTAGCA IAPMouse TGGAAGGTCT TACCACAGGG AATGTCCAAT AGTCCTACAA TGTGCCAACT TTATGTGCAA MastomysI TGGATAGTCT TACCTCAGGG CATGGCAAAT AGCCCCACTA TGTGTCAACC GTATGTGGGA Myomysl TGGGTAGTCT TGCCTTAGGG CATGGCAAAT AGCCCCACTA TGTGTCAACT GTATGTGGGT Uromysl TAGAAGGTCC TACCTCAGGG AATGGCTAAT AGCCCCACCA TGTGCCAACA TTTTGTGCAG Prairiedogl TGGGTGGTGT TGCCTCAAGG CATGGCCAAC AGTCCTACGA TGTGTCAGAT GTTTGTTGGG RabbitI TGGACAGTCT TGCCTCAGGG AATGACAAAT AGCCCCACGA TGTGTCAAAT TTATGTGGCT NileRatI TGGCGGGTAC TACCACAAGG AATGGCCAAT AGTCCTACTA CGTGTCAGCT GTATGTAGGG MMTVBR6 TGGAAAGTTT TGCCCCAGGG TATGAAAAAT AGCCCTACTT TATGTCAAAA ATTTGTGGAC SDunnartl TGGAAAGTAT TACCACAAGA AATGGCTAAC AGTCCCATCC TTTGTCAATG TAGAC RKangal TTCAGAGTCT TACCTCAAGG CATGGCCAAT AGCCCCACAA TGTGCCAAGC CTATGTCGCA


Platypusl TGGGAGGTAT TACCCCAAGG CATGAAATAT AGCCCTACTC TGTGTCAAAA ATTTGTTAGC Echidnal TGGACGGTCC TCCCCCAGGG AGTGAAAAAT AGCCCCACAA TGTGCCAGAC TTACATTGCG Armadillol TGGAAAGTAC TCCCACAAGG CATGAAAAAC AGCCCTGCTA TTTGCCAGAT CTATGTCAAC PFalconI TGGGTTGTAC TGCCACAAGG CATGAAAAAT AGCCCTACAA TGTGTCAATG GTATGTTGCA PrairiedogII TGGGTTGTAC TGCCACAAGG TATGAAAAAT AGCCCTACAA TGTGTCAATG GTATGTTGCA MHarrierI TGGGTTGTAC TGCCACAAGG AATGAAAAAT AGCCCCACGA TATGTCAATG GTATGTAGCT VultureI TGGGTTGTGC TGCCACAAGG AATGAAAAAT AGTCCCACGA TATGTCAATG GTATGTAGCT ETinamoul TGGGTTGTGC TGCCACAAGG AATGAAGAAT AGTCCCACGA TATGTCAATG GTATGTAGCT Moorhenl TGGGTAGTGC TGCCGCAAGG AATGAAAAAT AGTCCAACAA TTTGCCAATG GTACGTAGCA HThrushl TGGGTTGTGC TTCCACAAGG AATGACAAAC AGCCCTACTA TTTGCCAGTG GTTCATTGCA ESOwlI TGGGTTGTTT TGCCACAAGG CATGAAAAAC AGTCGCACAA TGTGTCAGTG GTTTGTTGCA RNPheasantl TGGACGTTGT TACCCCAGGG AATGAAAAAT AGTCCCACTA CAGGTCAGTG GTTTGTAGCA Peacockl TGGACAGTGT TATCCCAGGG AATGAAAAAT AGTCCCACTA TATGTCAGTG GTTTGTAGCA BluetitII TGGACAGTAT TACCTCAGGG GATGAAAGCC AGCCCCTTCA TCTGCCAGTG GTATGTGGGG CMagpiell TGGCAAGGTC TGCCACATGG CATGAAAAAC TCGCCCACCA TTTGCCAGTG GTATGTCGCT Penguinl TGGACGGTGT TGCCCCAGGG CATGAAAAAC AGTCCCACAA TATGTCAAAT GTATGTTGCG GSKiwiI CGGACCGTAT TGCCACAGGG AATGAAAAAT AGTCCTACTA TATGTCAAAT GTATGTGGCA LKiwiI TGGACTGTAC TGCCACAGGG AATGAAAAAT AGTCCTACTA TATGTCAAAC GTATGTGGCC HThrushl TGGGAGGTTT TGCCACAGGG TTGTCATAAC AGCCCGACTA TTTGTCAGTG GTATGTGGCT ES0w1II TGGAAAGTTT TGCCACAAGG CATGAAAAAT TCACCAAAAA TTTGTCAATG GTTTGTAGCC WFgoosel TGGACCGTAC TATCCCAAGG GATGAAAAAT TCCCCAACTA TATGCCAGAC ATACGTTGCA NABDuckIII TGGACTGTTC TTCCACTAGG GATGAAGAAT TCTCCAACCG TATGCCAAAT GTATGTTGCT Flamingol TGGACAGTAT TACCCCAAGG AACAAAAAAT TCTCCGACCA TATGTCAAGC ATTTGTTGCA GRheaII TGGGTAGTTT TACCTCAGGG GATGAAGAAC TCCCCTACCA TCTGCCAATG GTTTGTAGAC DRheaI TGGGTAGTTT TACCTCAGGG GATGAAAAAC TCCCCTACCA TCTGCCAATG GTTTGTAGAC BrownKiwil TGGGTCGTTC TACCCCAAGG TATGAAAAAC TCCCCCACGA TATGTCAATG GTATGTAGAC RBowerIII TGGGTTGTGT TGCCTCAGGG
GATGCATAAT TCCCCTATAA TGTGCCAGCT GTACGTTGCC Toucanettell TGGGTGGTGC TACCATAAGG CATGAAAAAT TCCCCAGCAT TATGTCAAAT GTATGTTTCT GWoodpeckerl TGGGTAACCT TGCCTCAAGG CATAAAGAAT TCTCCCACCC TGTGCCAAAT GTATGTGGCT ESOw1II TGGACAGTGT TGCCACAAGG CATGCAAAAT TCACCTACGA TGTGCCAAAT TTATGTGGAT GRheaI TGGACGGTAC TTCCCCAAGG TATGAAAAAT TCTCCAACCA TTTGTCAATA TTATGTGGCT JQuailI TGGAGTGTGC TGCCTCAACG TATGAAAAAC TCTCCTACAG TGTATCAGTT GTTTGTAGCT Cassuaryl TGGACGGTAT TGCCTCAGGG GATGCTAAAC AGTCCGACAA TATGCCAAAT GTTTGTGGCT AZMagpiell TGGACAGTTT TGCCACAAGG CATGTGCAAC AGCCCAACTA TATGCCAGAG GGTGATGGAC CMagpieIII TGGACAGTTT TGCTGCAAGG CATGTGCAAC AGCACAACTA TATGCCAGAG GGTGGTGGAG MThrushl TGGACAGCTT TGCCACCAGG CATGTGCAAC AGCCCAATGA TATGCCAGAG AGTGGTAGAC CMagpiel TGGCTAGTTT TGCCTCAAGG CATGAAGAAC TCACCGACTA TTTGCCAATA TTTTGTAGCC Guineafowll TGGACAGTCC TGCCGCAAGG CATGAAGAAC TCCCCGACTA TCTGTCAGTG GTTTGTTGAC PartidgeIV TGGAATGTTC TGCCACAGGG TATGAAGAAT TCCAGCACAC TGGCGCCCTT ACTAGTGGAC HThrushII TGGATAGTGC TTCCACAAGG AATGAAAAAT AGCCCAACCA TTTGTCAATG GTATGTTGAT GPheasantl TGGACTGTCC TACCACAAGG TATGAAAAAT CCCCCCACCA TTTTTCAGAT GTATGTTGCC GPheasantII TGGGTAGTCC TTCCCCAGGG CATGAAGAAT TCCCCTACGA TTTGCCAGAT GTTCGTACAG NABDuckI TGGACAGTCT TGCCGCAGGG TATGAAGAAT TCACCCACCA TTTGTCAGAT GTTTGTGGCT Loonl TGGGTCGTCT TACCACAAGG CATGAAAAAC TCACCTACAC TCTGTCAGCT ATATGTAGCC Ostrichl TGGACAGTAT TACCACAGGG TATGCCTAAT AGCCCCACCT TGTGTCAATT ATATGTAGCT MoorhenII TGGACGGTGT TGCCACAGGG CATGTGCAAC TCGCCAACAC TATGTCAACT GTATGTCGAC Goshawkl TGGATTGTAC TTCCACAAGG CATGAAAAAC AGTCCTACTT TATGTCAACT TTTTGTGGAC Pigeonl TGGACTGTGT TACCCCAAGG AATGAAAAAT AGTCCGACAT TGTGTCAATT GTTTGTGGAC FHawkI TGGATTGTGC TCCCACAAGG CATGAAAAAC AGCCCTACTT TATGTCAACT TTTTGTGGAC BGrousel TGGGTCGTGT TTCCGCAGGG GATGAAAAAT AGCCCTACCT TATGCCAGTT GTTTGTTGAT BGrousell TGGACGGTAC TGCCACAGGG TATGCGGAAC TCCCCAACAA TATGTCAATT GTACGTGGAT PeacockII TGGACTGTTT TGCCTCAAGG CATGAAGAAC AGCCCCACTC TCTGTCAACT GTTCGTTGAC NABDuckIV TGGACTGTAT TACCCCAGGG AATGCGTAAT TCACCTATGT
TATGTCAGCT TTATGTTGAC BluetitIII TGGACAGTTT TACCACAGGG TATGCGTAAT TCTCCTACCT TGTGTCAACT TTATGTTGAT Toucanettel TGGAAGGTGC TGCCTCAGGG CATGATCAAC TCACCCACCA TCTGCCAGAT CACAGTGGAT NABDuckli TGGAAAGTGC TGCCTCAAGG AATGATCAAC TCACCTACTA TCTGCCAGAT CATGGTAGAT HThrushIII TGGAGAATGT TACCCCAAGG AATGATCAAC TCCCCTACCA TCTGCCAGAT CACAGTAGAC BlueTitI AAGGTGCTGC TTCAAGGGGA TGGTCAGGCT CACCTACAAA TCTGCCAAAT CACAGTCGAT T7 TGGAAAGTGC TGCCTCAGGG GATGGTCAAC TCACATACAA TCTGTCAAAC CACAGTTGAT MHarrierll TGGACGGTGT TACCACAGGG CATGAAAAAC TCTCCTACTA TTTGTCAGCG AGCCGTAGAC Emul TGGACCGTCC TCCCGCAGGG CATAAAAATT TCACCTACGA TTTGTCAGGC GGCAGTAGCT LKiwiII TGGACGGTGC TCCCACAGGG CATGAAGAAC TCACCTGCGA TTTGCCAACA CGTTGTAGCC LDV TGGACTGTGC TACCGCAAGG CATGAAGAAT TCCCCGTACA TTTGCCAGCA GGTGGTAGCT RSV TGGAAGGTCT TGCCCCAAGG GATGACCTGT TCTCCCACTA TCTGTCAGTT GGTAGTGGGT ALVsubgrpJ TGGAAGGTCT TGCCCCAAGG GATGACCTGT TCTCCCACTA TCTGTCAGTT GATAGTGGGT Tragopanl TGGAAGGTGC TCCCGCAGGG AATGGCTTGC TCTCCCACGA TCTGCCAGTT GGTAGTAGAC Guineafowlil TGGTGCATTT TGCCGCAAGG AATGGCATGT TCCCCCACCA TCTGCCAGCT TGTGGTCGGC LoonII TGGGTGGTGC TACCACAGGG AATGAAAAAT TCCCCAACAT TATGTCAAAT GTGTTTC--T

Appendix 2. Nucleotide Alignment of Class II ERV pol fragments

Appendix 2. Nucleotide alignment (continued)

BLV           GAACCTCTTC GCCAAGTTTC CGCCGCCTTT TCCCAGTCTC TTCTGG[TGTC CTATATGGACGAT]
HTLV1         CATATCCTGC AGCCCATTCG GCAAGCCTTC CCCCAATGCA CTATTC[TTCA GTACATGGATGAC]
HTLV2         GCCGTCCTCA ACCCCATGAG GAAAATGTTT CCCACATCGA CCATTG[TCCA ATACATGGATGAC]
HIV1          AAAATCTTGG AGCCTTTTAG AAAACAAAAT CCAGACATAG TCATCT[ATCA ATACATGGATGAT]
HIV2          CAGGTCTTAG AACCATTCAG AAAAGCAAAC CCGGATATCA TTCTCA[TTCA GTACATGGATGAT]
SIVagm        TCCATTCTAG AAGAAATAAA AAAGGAGTTA AAACCCCTAA CCATTG[TGCA ATACATGGATGAC]
FIV           AATATAATAC AACCTTTTAT TAGACAAAAT CCTCAATTAG ATATTT[ACCA ATATATGGATGAC]
BIV           AAGATTATAG AAAACATTAA AAAGAGTCAC CCAGATGTCA TGTTGT[ATCA ATATATGGATGAT]
Jembrana      GAGATAATTG CAGAGATAAA AGATAGATTT CCTGACATTG TGCTCT[ATCA ATATATGGATGAC]
CAEV          GAGATCTTAG AGGATTGGAT ACAGCAGCAT CCAGAAATTC AATTTG[GCAT ATATATGGATGAT]
Visna         AAAATATTAA GAGGATGGAT AGAAGAACAC CCTATGATAC AATTTG[GAAT ATACATGGATGAT]
EIAV          GAAATTTTAC AACCTTTTAG GGAAAGATAT CCTGAAGTAC AATTGT[ATCA ATATATGGATGAT]
HERVK10       CGAGCTCTTC AACCAGTGAG AGAAAAGTTT TCAGACTGTT ATATTA[TTCA TTATATTGATGAT]
HML1Z70280    AAAGCTATTA AGCCAGTTAG AGAACAGTTT AAAAAATGTT ATATCA[TCCA TTACATGGATGAT]
HML3AC025577  CAAGCAATTG AACCTACTCG TAAAAAATTT TCACAGTGTT ACAT-A[TTCA CTATATGGATGAT]
HML3AC092364  CAAGCAATTG AACATACTCA TAAAAAATTT CCACAGTGTT ACATTA[TTCA CTATGTGGATGGT]
HML4AC093517  TGCATATTAA AGCCAGTTAG AGATAAATTT CCCCAATGTT ATATTA[TTCA TTACACAGATGAT]
HML5AP000870  CAGGCTTTGC TCCTCAGTGG AAAATAATTT CCTAATTGCA AGATTA[TTCA TTTTATGGATGAT]
HML6AF069508  AAAGCTTTAA AGGAGCCTCG CAATATATTT CCCAATGCCT ACATCA[TTCA TTATATGGATGAT]
HML7AC013722  ACTGTGTTAC AACCTATCCA ACAGACTTTT AAAAATAATT ACATTC[TTCA TTATACGGATGAT]
HML8AL596245  CAAGTGCTTT CACCAGTTCG AGCCCAATTT CCCCAGGCCT ATGTTC[TTCA TTATATTGATGAT]
HML9AC025569  GAAGCCTTAC TTTCTGTGCG TCCATCCTTC CCCCAGGCAA AAATCT[TCCA CTATGTGGATGAT]
HML9AC068700  GAAGCTTTAC TTCCTGTGTG TCAATCCTTC CCCCAGGCAA AAATCT[TCCA CTATATGGATGAT]
OryzomysI     AAGCCACTAG AACAAATACG TAAGAAATTC CCACAATCTA TAATTT[ATCA CTATATGGACGAC]
ArmadilloII   ATTCCATTAC AAATTTTACG AAAACAATTT CCTAAGGCTA TGATTA[TTTA TTACATGGGCGAC]
ReedbuckI     ATTCTGTTAC AAATTTTATG AAAACAATTT CCTAAGGCTA TAATTA[TTCA CTATATGGACGAC]
MuspahariII   CAACCATTAG AAATTATTCA TAAGCAATTT CCTTGATCTA TCATTT[ATCA CTATATGGACGAC]
NileRatII     ACTGCATTAA AACCCTATGT AGACAAA--- ---GGACTCA ACATCT[ATCA CTATATGGATGAC]
HedgehogI     TCTGCCCTTT TCTCATATAT TCATAAG--- ---GGCTTTA ATGTTT[TCCA CTACATGGACGAC]
CougarI       GAAGACTTAA AGCCAATGAG GGCGCAGTTT CATTCTTGGA TTGTTT[ACCA CTACATGGACGAC]
DomesticCatI  ?????????? ?????????? ?????????? ?????????? ??????[???? ?????????????]
T22           CAGGCGCTGG CTGGTTTTCG GGATCAGTTT CCCCAGTTAT TAGTGT[ATCA TTACATGGACGAC]
SheepI        AAAATTTTGC AGCCCATCAG ACAGCAATTC CCTGAGGCAT ATCTCA[TTCA CTACATGGATGAC]
IPSquirrelI   AAAATTTTGC AGCCCATCAG ACAGCAATTC CCTGAGGCGT ATCTCA[TTCA CTACATGGATGAC]
SIMongooseI   CATGTTTTGC TCCCTATTAG AACTCAATTT TCTACACTGT ATATTA[-TCA CTACATGGACGAC]
MPMV          ACAGCCATAC ATAAGGTTAG ACATGCCTGG AAACAAATGT ATATTA[TACA TTACATGGATGAC]
SRVI          ACAGCCATAC ATAAAGTTAG ACATGCCTGG AAACAAATGT ATATTA[TACA TTACATGGATGAT]
SRVII         GCTGCTATAG AGCCAGTCAG AAAATCTTGG GCACAAATGT ACATTA[TACA CTATATGGATGAC]
BabboonSERV   GCTGCTATAG AGCCAGTCAG AAAAACATGG ACACAAATGT ATATTA[TACA TTATATGGATAAT]
MusD          AAGGCAATTC AGCCTGTTAG ACAACAATGG CCAAATATTT ACATCA[TTCA TTTCACAGATGAT]
TvERV         CAGACAATCG ACCCCTTTCG CTTACAATTT CCACAACTTT ATATCA[TTCA CTATATGGATGAC]
SMRVH         GCCGCCATTG CCCCAGTAAG ATCCCAGTGG CCAGAGGCCT ATATCC[TCCA TTATATGGATGAC]
ColobusI      ACAGCCATAC ACTCAGTTAG GGAGACATGG AAACAAATGT ATATTA[TACA CTACATGGATGAC]
SImongooseI   AATGCCATAC AAAAGGTACG CGATGATTGG AAGGATATGT ATATTA[TACA CTATATGGATGAC]
GoatII        AAGATCGTTG ACCCTTTCTG CCTCCAGTTC CTTTCTCTTT ATATAA[TCCA CTATATGGATGAT]
OstrichD      CAGGCTGTAG ACCCTCTTAG GCGTCTGTGG TCTTCCATTT ATATCA[TTCA TTATATGGACGAC]
LorisI        CAAATTATTG ACCCTTTTCG AACATTATGG CCTACTAGCT ATATAA[TTCA CTACATGGACGAC]
SIMongooseII  CAGGTAATTG ATTCCCTGCG AAACTCTTGG CCTTCCTTCT ATATCA[TTCA CTACATGGATGAC]
Jaagsiekte    ACAGCAATAG CTCCGGTTCG TCAACGTTTT CCGCAGCTAT ATTTAG[TTCA TTATATGGATGAT]
RDolphinI     CAGGCCTTGA CTCCTGTGAG AAAAAGGTTT CCATCGTTAT ATCTCA[TACA CTACATGGACGAC]
WFDeerI       AAAGCCTTGG AACCTACTCG TCAAAAATAT TCTTCTCTAT ATATGA[TTCA CTATATGGATGAC]
CaribouI      GAAGCTCTGG AACCTACTCG CCAAAAATAC CCTTCTCTAT ATATGA[TTCA CTATATGGACGAC]
GiraffeI      CAAGCTATGC AGAATGTTAG AGGGAAGTAT AAAGATTTAT ATTTGA[TACA CTACATGGATGAC]
BisonI        GCAGCTATAG AAGATATTAG GGCTAAATAT GAACAACTGT ATATGA[TTCA CTATATGGATGAT]
MuskoxI       GCCGCTCTAG AAGATACTAG GGCTAAATAT GAACACGTAT ATATGA[TTCA CTACATGGACGAC]
CFBadgerI     CATGCAATTG CCCCATTTCG AGTACTATTT CCAATGGTTT ATTGCA[TACA CTATATGGACGAC]
HRV5          CAGGCAGTA? ?????????? ?????????? ?????????? ??????[???? ?????????????]
GoatI         GCAGCTCTTC TCCCTATTAG AAAACAATTT CCAAAATGGG TGTTAG[TACA CTACATGGACGAC]
IAPSHamster   GAGGCTTTGG AGCCAATTAG GAAGCAATTT ACATCTTTAA TCGTTA[TCCA TTATATGGACGAT]
IAPCHamster   GAAGCAATTG CTCCTTTTGA GGTGGACTTT CCCAAAATTA GATGTG[TTCA TTATATGGATGAT]
IAPMouse      GAAGCTCTTT TGCCAGTGAG GGAACAATTC CCCTCTTTAA TTTTGC[TCCT TTACATGGATGAC]
MastomysI     GAAGCCATTG CTCCTTTGAG AAAGAATTCT CCAGACTTGA GATGCC[TTCA CTACATGGATGAC]
MyomysI       GAAGCCATTG CTCCTTTGAG AAAGAATTTT CCAGACTTGG GATGCC[TTCA CTACATGGACGAC]
UromysI       CTGGCGCTTA ATCCAATAAG AAAACAGTTT CCTTCGCTGG TTCTGC[TGCA CTACATGGATGAC]
PrairiedogI   GCGACCCTTG CTCCGCTCAG ACAAAAGTAT CCATTGTTGA AATGCA[TACA CTATATGGACGAC]
RabbitI       CAGGTACTGG AGTCTCTTAA GGCTCAGCAT TCCGATTGCA GGTACC[TACA CTACATGGACGAC]
NileRatI      CGAGCTTTGC TGCCAATTAG GCAGACCTTC CCAGAGATTA GAGTTT[GTCA CTACATGGACGAC]
MMTVBR6       AAAGCTATAT TGACTGTAAG GGATAAATAC CAAGACTCAT ATATTG[TGCA TTACATGGATGAC]
SDunnartI     AAGGTTCTTG CCCCAGTTAG AAATTTATAT CCAAATGTTT ATATGC[TTCA TTACATGGACGAC]
RKangaI       GCCATCATAC AACCAGTCCG CGATTACTCC CCACAGACCT TGATTA[TTCA CTACATGGACGAC]


PlatypusI     CTAGCCCTAA CTCCAATTAG ACAAAAATAT CTGAGTGCTT ATATTA[TGCA GTACATGGACGAC]
EchidnaI      CTTCTGATAG TGCCGCTCCA CGTCAAGCAT CCTGAGGCAT ATATTA[TTCA CTATATGGACGAC]
ArmadilloI    TTCGCAATTC AGCCTCTCAG ACACAGGTTC C          TTGTCC[TTCA TTACATGGACGAC]
PFalconI      AAAATACTCA GTCCTGTCCG AAATGCTATG CCTGCTGTGT TACTGT[ATCA TTATATGGATGAC]
PrairiedogII  AAAATACTCA GTCCTGTCCG AAATGCTATG CTTGCTGTGT TACTGT[ATCA CTACATGGACGAC]
MHarrierI     AAAGTGCTTA GTCCTGTTAG GACCACAATG TCCAGTGTCT TACTGT[ACCA CTATATGGACGAC]
VultureI      AAAATACTTA GCCCTGTCAG AACTAAAATG CCTGACGTCC TATTGT[ATCA TTACATGGACGAC]
ETinamouI     AAGATACTCA GCCCTGTCAG AAGCAAAATG TCTGATGCCT TATTGT[ATCA CTATATGGACGAC]
MoorhenI      AAAATTTTAA GCCCAATAAG GAAGTCCATG CCTGCAATTT TAATTT[ACCA CTATATGGACGAC]
HThrushI      AAAGTCCTCA GTCCTGTTCA CCACAAAATG CCTGCTGCTT TGTTGT[ATCA CTACATGGACGAC]
ESOwlI        AAGGCTTTGA GACCTGTTAG AATGCAGGTC CCCCATGCTG TTATAT[ATCA TTACATGGACGAC]
RNPheasantI   AAGGCTCTAT TCCCTGCGAA GGAAAGGTTG CCCCACACGA CTATTT[ATCA CTATATGGACGAC]
PeacockI      AAATCTCTAT CCCCTGCGAG GGAAAAGTTG CCCCACGCGA CTATT[ATCA CTACATGGACGAC]
BluetitII     TCATTGCTGT CCTCAGTGCG TGCTGAAAAG AGAGAGGCTA TCATTT[TGCA CTATATGGACGAT]
CMagpieII     CGCATTTTGT CCCCTGTCCG GAAAAAGGCT ATTAGCGCTG TGATCC[TACA CTACATGGACGAC]
PenguinI      CGTGTGTTGT CGCCCATAAG AAACAAATAT CCTGACTTAA TTTGTT[ATCA CTATATGGACGAT]
GSKiwiI       GAGGCCCTGT CTGAGGTTCG ACGAAAGCAT CCACACATTT TTTGTT[ATCA CTACATGGATGAC]
LKiwiI        TGTCTGAGGC CCTGTCTGAG GCGAAAGCAT CCACAGATTT TTTGTT[ATCA CTACATGGATGAT]
HThrushI      CAGGCCTTCT CTGAAGTTCG CGAGCAGTTT CTTGACGCGC ATTTTT[ACAA CTACATGGACGAC]
ESOwlII       CAGACTTTGT CACCTGTGAG AGAAAAGTTC CCTACTAGCT ATTGTT[ACCA TTACATGGATGAC]
WFgooseI      GAAGCTTTGC GCCCTGTGCG AAGGCAATTC CCTCACGTGT ACATTT[ATCA CTATATGGACGAC]
NABDuckIII    AAAGCTCTGG CCCCTATTAG AAAACAGTAT CCCGAGACCT ATATTT[ATCA CTACATGGACGAC]
FlamingoI     GAGGCTCTGC GCCCAGTGCG AAAGCGTTTT CCCCATGTGT ATATGT[ATCA CTATATGGACGAC]
GRheaII       CTAGCTCTGC AACCATGGCG GAAGAGGCAC CCAGAACTGC TCACAT[ATCA TTATATGGACGAC]
DRheaI        CTAGCTCTGC AACCGTGGCG GAAGAGGCAC CCAGAACTGC TCACAT[ATCA CTACATGGACGAC]
BrownKiwiI    CGAGCCTTAG AACCCTGGAG GCAGAGACAT CCCGAGGCTA TAGTAT[ACCA CTACATGGACGAC]
RBowerIII     TGGGCACTAG CGCCACTGCG AAAGCAGTAC CCCCAGTACC TTATCT[ATCA TTACATGGACGAC]
ToucanetteII  CGGGCCCTTC AACCATTTCG CGAGCAGAAT CCTTTCCTGC TTGTGT[ATCA CTATATGGACGAC]
GWoodpeckerI  TGGCCTTTGT CCCCATTTAG ACAGACTCAT CGTGACTTTC TTGTCT[ACCA CTATATGGACGAC]
ESOwlII       TGGGCATTGG GACCAATTAG GAGACAATGG CCTCAATACC TCATCT[ATCA TTACATGGACGAC]
GRheaI        TGGGCTCTCA CCCCTGTGCG CCAACAGTAT CCTCAGTGGT TGATCT[ATCA TTATATGGACGAT]
JQuailI       GCAGCTCTGC CTCCGTTCTG AGCATGCTGG GTAAACTGTT TAATT-[AGCA CTATATGGACGAT]
CassuaryI     CGAGCTATCC TCCCTATTCG CAGAGCATAT CCCAATGCTT TAATTT[ATCA CTACATGGACGAT]
AZMagpieII    CTTACTCTGC AACCTGTCCT CCGCCAATTT CCTGAAGCAA CTGTGT[ACCA CTATATGGACGAC]
CMagpieIII    CTTACTCTGC AACCTGTCCG CTGACAATTT CCTGAAGCGA CTGTGT[ACCA CTACATGGATGAC]
MThrushI      CTTACTCTGC AACCTGTGTG CTGCCAATTT CCTGAGGCAA CCATAT[CCCA CTACATGGACGAC]
CMagpieI      CGTGCATTGT CCCCAGTCCG TGAGCAATTC CCGCAATCAG TTATTC[TCCA CTACATGGACGAC]
GuineafowlI   CTAGCTCTAA GTGCCTTCCG CCGACGATAT CCGAGCCTTA TCGTCT[ATCA CTACATGGACGAC]
PartidgeIV    CGAGCT???? ?????????? ?????????? ?????????? ??????[???? ?????????????]
HThrushII     AAGGCACTTG AAGAGTGGAG AAAAGAGAAC AAAGCCTTCT TAACGT[ACCA CTATATGGACGAC]
GPheasantI    TTGGCCCTGA GACCCTTTTG AGGGAAGTAC CCACAGCTCA TTGTTT[ATCA CTATATGGACGAC]
GPheasantII   GAAGCATTGC AGCCCATTCG GGAACAATGG CCCTCCCTGT TGATCA[TCCA CTACATGGACGAC]
NABDuckI      CGTGCATTGC AGCCAGTCAG GAAGCAATAC CCGCAGCTGA TGATGC[TACA CTACATGGACGAC]
LoonI         TGGGCATTAC AACCACTCCG GGACCAATGG CCGGACACAA TAATCT[ATCA TTACATGGATGAC]
OstrichI      GCTGCACTAC AGCCCTTACG AGAACAATGG AAACATGTTA TAATTT[ATCA CTACATGGATGAC]
MoorhenII     GCTGCCTTAC AGCCGCTGCG GCGACGATGG CCAAAGACCA TGATTT[ATCA CTACATGGACGAC]
GoshawkI      AATGCCCTGA AAGAAATCCG ATCAACCTGG GACAAGACAA TAATAT[ATCA CTATATGGATGAC]
PigeonI       AATGCGTTGC GTCCTATTCG TGATGCCTGG CCTACAGCGA TGGTGT[ATCA CTACATGGACGAC]
FHawkI        AATGCCCTGA GAGAAATCCA ATCAACCTGG GACAACACGA TAATAT[ATCA CTATATGGACGAC]
BGrouseI      TACGCGTTAG CACTGGTCAG AAAAGCTTGG TCTCATGCTA TCATTT[ATCA CTACATGGACGAC]
BGrouseII     GCAGCTTTGC AGCCCCTGCA TCACGAAATG CCCGACACTA TTATTT[ACCA CTATATGGACGAC]
PeacockII     ATGGCCCTCC AACCAGTACG TGCTGCGTGG CCACATGCCA TCACTT[ACCA CTACATGGACGAC]
NABDuckIV     GCAGCTCTGC AACCGATTCG ACGAAAATGG CCAGAGACAA TAATCT[ATCA CTACATGGACGAC]
BluetitIII    AATGCTTTGC AGCCACTACG TGCTCGATGG CTACA-ACTA TTATTT[ATCA CTACATGGATGAC]
ToucanetteI   CGGGCGCTGG CACCGGTCCG GCACAGCGAC CCAACTGCGA CCATCA[TTCA ATATATGGACGAC]
NABDuckII     CGGGCGCTGG CACCAGTTCG GCGCAGTGAC CCGACTGCGA CTATCA[TCCA CTACATGGACGAC]
HThrushIII    CAAGCACTAG CACCAGTCCG GCAGAGCAAC CCGACCGCTA CCATCA[TGCA GTACATGGACGAC]
BlueTitI      CGAGTGCTAG CATCGATCCG TCATAGCAAC TTGACTGCGA CCATCA[TCCA CTACATGGACGAC]
T7            CGAGCGCTAG CACCGATCTG TCATAGCAAC TCGACTGCTA CCATCA[TCCA GTACATGGACGAC]
MHarrierII    ATAGCGCTAC AGCCGGTTAG GCAGCAACGA CCGCACCTTT TGATTC[TCCA CTATATGGACGAC]
EmuI          AAGGTATTGC AGCCGGTAAG AGCGCAGCTC CCGCGCGCGC TTATCA[TCCA TTACATGGACGAC]
LKiwiII       GACGCAATAG CCCCGGTACG AACACAACAT GCGGCGGGGG TCATGA[TCCA CTACATGGACGAC]
LDV           GAGGTTATCC GTCCTATTAG GGAACGGTTT CGTGATGCAG TGATCA[TACA TTACATGGATGAC]
RSV           CAGGTACTTG AGCCCTTGCG ACTCAAGCAC CCATCTCTGT GCATGT[TGCA TTATATGGATGAT]
ALVsubgrpJ    CAGGTACTTG AGCCCTTGCG ACTCAAGCAC CCATCTCTGC GCATGT[TGCA TTACATGGATGAT]
TragopanI     AGGGTCTTAG GCCCTGTCAG GGAGAAGATG GGACAGTCTT TTCTGG[TGCA CTACATGGACGAC]
GuineafowlII  AGGATCCTAG AGCCGTTGAG AGGTCAATTC CCTGCCTGTC AGATAC[TGCA CTACATGGACGAC]
LoonII        TGGGCCCTTC AACCGTTTCG CGAGCAGAAT CCTTTCCTGC TTGTGT[ATCA TTATATGGACGAC]
