View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Virology 441 (2013) 66–69

Contents lists available at SciVerse ScienceDirect

Virology

journal homepage: www.elsevier.com/locate/yviro

Ancient invasion of an extinct in cetaceans

Lina Wang a, Qiuyuan Yin a, Guimei He a, Stephen J. Rossiter b, Edward C. Holmes c,d, Jie Cui c,n

a Institute of Molecular Ecology and Evolution, Institutes for Advanced Interdisciplinary Research, East China Normal University, Shanghai 200062, China b School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, UK c Sydney Emerging Infections and Biosecurity Institute, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia d Fogarty International Center, National Institutes of Health, Bethesda, Maryland 20892, USA

article info abstract

Article history: Endogenous (EGVs) have been widely studied in terrestrial mammals but seldom so Received 11 January 2013 in marine . A genomic mining of the (Tursiops truncatus) genome revealed a Returned to author for revisions new EGV, termed Tursiops truncatus endogenous (TTEV), which is divergent from extant 8 February 2013 mammalian EGVs. Molecular clock dating estimated the invasion time of TTEV into the host genome to Accepted 9 March 2013 be approximately 10–19 million years ago (MYA), while a previously identified killer whale endogenous Available online 29 March 2013 gammaretrovirus (KWERV) was estimated to have invaded the host genome approximately 3–5MYA. Keywords: Using a PCR-based technique, we then verified that similar endogenous exist in nine cetacean Endogenous gammaretrovirus genomes. Phylogenetic analysis revealed that these cetacean EGVs are highly divergent from their Cetaceans counterparts in other mammals, including KWERV from the killer whale. In sum, we conclude that there Molecular dating have been at least two invasion episodes of EGVs into cetaceans during their evolutionary history. Phylogeny & 2013 Elsevier Inc. All rights reserved.

Introduction Results and discussion

Retroviruses (family Retroviridae) are a diverse family of infec- Our genomic mining revealed a novel tious agents that possess a positive-sense RNA genome of 7–12 kb with a complete genome located at positions 5,310–13,561 of in length and an outer envelope protein. They employ a unique Contig248472 of the bottlenose dolphin genome (18,463 bp, Gen- replication strategy, which involves reverse transcription of the Bank accession number ABRN02237111). Notably, this was virion RNA into double-strand (ds) DNA which is subsequently divergent to all known extant gammaretroviruses, with less than integrated into the host chromosome. Infection of host germline 70% sequence similarity to other gammaretroviruses during the cells can lead to vertical transmission from parent to offspring in tBLASTn analysis. In addition, this endogenous virus has only one the form of Mendelian alleles named endogenous copy in the dolphin genome, an observation which was verified by (ERVs) (Weiss, 2006). gene cloning (see below). A nucleotide BLAST analysis also ERVs have been discovered in many animals (mainly mam- revealed the presence of two solo LTRs in Contig528084 and mals), of which endogenous gammaretroviruses (EGVs) are parti- Contig235866 of the dolphin genome. We denote this novel cularly widely distributed. To date, the major mammalian hosts element as Tursiops truncatus endogenous retrovirus, or TTEV. reported for EGVs are terrestrial species (Gifford and Tristem, TTEV has a typical gammaretroviral structure with a genome of 2003) and flying bats (Cui et al., 2012a, 2012b). The first doc- 8,252 nucleotides (Fig. 1). This EGV also employs the tRNAPro (5′- umentation of an EGV in a marine species was in the killer whale GGG CTC GTC CGG GAT-3′) at the primer binding site (PBS) which (Orcinus orca; Lamere et al., 2009). A similar virus was also been is conserved among all gammaretroviruses, but possesses a variant detected in another nine cetacean species (Lamere et al., 2009). To polypurine tract (PPT) (5′-AAG AAA AGG GGG GAA-3′). increase our understanding of the extent and genetic diversity of To establish the phylogenetic relationship of TTEV to the extant ERVs in marine species, we performed a genomic mining and gammaretroviruses, we performed a phylogenetic analysis using the phylogenetic analysis of those ERVs present in the dolphin complete sequences of the Gag, Pol and proteins. Strikingly, and (Tursiops truncatus) and other cetacean genomes. as suggested by our BLAST analysis, the phylogenetic analysis of Pol revealed that TTEV is divergent to all extant mammalian EGV, and more closely related to the avian Reticuloendotheliosis gammaretro- virus (REV) (Fig. 2). In marked contrast, the EGV previously identified n – Corresponding author. in cetaceans KWERV from the killer whale (Orcinus orca)(Lamere E-mail address: [email protected] (J. Cui). et al., 2009) – was contained within the genetic diversity of

0042-6822/$ - see front matter & 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.virol.2013.03.006 L. Wang et al. / Virology 441 (2013) 66–69 67 mammalian gammaretroviruses and hence distinct from TTEV into the O. orca genome, tentatively suggests that these represent two (Fig. 2). The Gag and Env phylogenetic trees showed the same independent endogenization events. pattern; TTEV was divergent from both KWERV and the other To infer the phylogenetic positions of these newly acquired mammalian gammaretroviruses (Fig. 3). cetacean EGVs, we performed another phylogenetic analysis using We next estimated the invasion time of TTEV using an LTR- Pro (protease)-Pol (polymerase) amino acid sequences. This resulted in divergence dating method (Johnson and Coffin, 1999; Cui et al., a phylogeny in which all the nine cetacean EGVs sampled in this study 2012b).Accordingly,ourestimateforthetimeofTTEV(D¼0.018) were phylogenetically distinct from the known mammalian gammar- invasion was in the range 10–19 MYA, while that of KWERV was etroviruses and KWERV (Fig. 4), and hence in accordance with the estimated to have occurred approximately 3–5MYA(D¼0.005). phylogenetic position of TTEV determined above (Fig. 1). Given the low Although KWERV has also been documented in the genome of T. level of sequence diversity (mean nucleotide p-distance of 0.016) truncatus (Lamere et al., 2009), that the separation time of T. truncatus within the clade containing the nine new cetacean EGVs, it is clear that and O. orca (at approximately 9 MYA; Vilstrup et al., 2011)isolder these EGVs share a relatively recent common ancestor. Interestingly, than our molecular clock estimate for the invasion time of KWERV these new cetacean EGVs contained a member (OOEV2) also identified in the killer whale, such that there have been at least two independent invasions (represented by OOEV2 and KWERV) of gammaretroviruses 0 2000 4000 6000 8000 into cetacean genomes. Given the distinct phylogenetic position of these nine new cetacean EGVs, we propose that they represent a novel species within the Gammaretrovirus.

LTR gag pol env LTR PBS pol-env PPT Materials and methods overlap Genomic mining Fig. 1. Genomic structure of a cetacean endogenous retrovirus, TTEV. The complete viral genome is 8,252 nucleotides in length. The major structural genes, gag, pol and env, are shaded accordingly. The positions of the two LTRs, PBS (primer-binding We performed a genomic mining of one publicly available dolphin site), PPT (polypurine tract), and a pol-env overlap region are also shown. genome (Tursiops truncatus, http://www.ncbi.nlm.nih.gov/genome/),

PERV-B 100 PERV-C (Pig) PERV-A 91 MDEV (Mouse) KoRV 91 (Koala) 99 GALV (Primate) KWERV (Whale) F-MuLV 77 R-MuLV 73 M-MuLV (Mouse) 100 M-CRV 85 96 XMRV FeLV (Cat) 93 100 BaEV (Primate) RD114 90 (Cat) RfRV (Bat) 95 * TTEV (Dolphin) REV (Chicken) HERV-E

83 HERV-H-RGH

HERV-Fc1 (Primate)

86 HERV-FRD HERV-W

100 HERV-I HERV-ADP WDSV

0.3 subs/site

Fig. 2. Phylogenetic positions of TTEV in Class I retroviruses. The tree was inferred using complete Pro-Pol amino acid sequences and was rooted using an as outgroup (WDSV). The retroviruses from cetaceans are shaded. The Class I ERVs used in this phylogenetic analysis were taken from Jern et al. (2005). The viral lineage marked with an asterisk represents the new cetacean viruses reported in this study. Only bootstrap values 470% are shown. Branch lengths are drawn to a scale of amino acid substitutions (subs) per site. All abbreviations can be found in Table 1. 68 L. Wang et al. / Virology 441 (2013) 66–69

KoRV R-MuLV 100 100 GALV 100 F-MuLV M-CRV 96 MDEV 73 M-MuLV 77 PERV-A 72 FeLV 99 100 BaEV 95 100 PERV-C RD114 PERV-B KWERV KWERV 76 100 KoRV 100 GALV F-MuLV MDEV 76 R-MuLV 100 PERV-B 89 93 100 PERV-A XMRV PERV-C 100 M-MuLV RfRV 90 95 M-CRV 100 REV * TTEV FeLV 0.2 subs/site 73 100 BaEV R-MuLV 97 RD114 96 F-MuLV RfRV 85 M-MuLV 96 LAcEV M-CRV 100 99 LAlEV XMRV OOEV2 FeLV KWERV PPEV RfRV GMEV MEDV DDEV

100 PERV-C SCEV 100 100 PERV-A * GGEV PERV-B 73 100 TTEV KoRV REV 84 GALV 0.2 subs/site 100 REV * TTEV Fig. 4. Phylogenetic relationships among cetacean and other gammaretroviruses 0.2 subs/site (endogenous and exogenous) inferred using partial Pro-Pol amino acid sequences. The retroviruses from cetaceans are shaded. The retroviruses from cetaceans are Fig. 3. Gag (A) and Env (B) amino acid phylogenies. XMRV was excluded from the shaded and the viral clade marked with an asterisk represents the nine new Gag phylogeny because it is highly truncated, while RD114 and BaEV were not cetacean viruses reported here. Only bootstrap values 470% are shown. Branch included in the Env phylogeny because this gene was involved in a recombination lengths are drawn to a scale of amino acid substitutions (subs) per site. The tree is event with (divergent) . The retroviruses from cetaceans are midpoint rooted for purposes of clarity only. All abbreviations can be found in shaded and the viral lineage marked with an asterisk represents the new cetacean Table 1. viruses reported in this study. Only bootstrap values 470% are shown. Branch lengths are drawn to a scale of amino acid substitutions (subs) per site. The tree is midpoint rooted for purposes of clarity only. All abbreviations can be found in ′ Table 1. TAG GTG ACA CTA TAG-3 ) primers were used to sequence all positive clones on an ABI 3730 DNA sequencer (Applied Biosys- tems). All nine gammaretroviruses obtained in this study were using genomic nucleotide tBLASTn, with the extant gammaretro- submitted to GenBank and assigned the accession numbers: viruses used as reference sequences (Table 1). For this analysis we KC128847, KC128848, KC128849, KC128850, KC128851, KC128852, employed a cut-off E value of 1e−10 and a sequence similarity of 60%. A KC128853, KC128854 and KC128855 (Table 1). genomic mining for LTR sequences of the newly discovered EGV (see Results) was also undertaken using tBLASTn, this time with a cut-off E Phylogenetic analyses value of 1e-10 and a sequence similarity of 80%. Several phylogenetic analyses were performed to determine PCR amplification the phylogenetic position and evolutionary history of the novel EGVs. All reference gammaretroviral sequences (Table 1) were To identify and characterize those EGVs present, we designed a downloaded from GenBank (http://www.ncbi.nlm.nih.gov/Gen pair of degenerated primers covering the conserved region (∼950- Bank/) and aligned with TTEV in MUSCLE 3.7 (Edgar, 2004), and bp) of the viral protease (Pro)/polymerase (Pol) – forward primer then checked manually in Se-Al (http://tree.bio.ed.ac.uk/software/ (5′-TTC CTS ATA GAC ACA GGA GCC-3′) and reverse primer (5′-ATC seal/). For phylogenetic inference using the Pol gene, we included ATC CAC GTA TTG CAR WAG AGT-3′) – based on TTEV and the seven primate EGVs (Jern et al., 2005) which have complete Pol publicly available gammaretroviruses (Table 1). Muscle tissue sequences, as well as an epsilonretrovirus virus – Walleye dermal samples of nine cetacean species were kept at the School of Life sarcoma virus (WDSV) – as an outgroup to root the phylogenetic Sciences, East China Normal University, Shanghai, China. DNA was tree. Phylogenetic relationships were then inferred using the extracted using the Tissue DNA Kit (Omega) according to the maximum-likelihood (ML) method available in PhyML 3.0 manufacturer's protocols. Polymerase chain reactions (PCRs) were (Guindon et al., 2010), employing 1,000 bootstrap replicates to performed using Ex Taq TM polymerase (TaKaRa) with the reaction determine phylogenetic robustness. The best-fit amino acid sub- conditions as follows: 94 1C for 5 min followed by 30 cycles stitution models were determined using ProtTest 3.0 (Darriba consisting of 94 1C for 30 s, 54.5 1C for 30 s, 72 1C for 1 min, and et al., 2011) and found to be LGþIþΓ for Pol, LGþΓ for both Gag a final extension of 72 1C for 10 min. All PCR products were ligated and Env. To infer the phylogenetic positions of the nine newly into pGEM-T Easy vectors (Promega) and transformed. The uni- amplified cetacean EGVs obtained here, we performed another versal T7 (5′-TAA TAC GAC TCA CTA TAG GG-3′) and SP6 (5′-ATT phylogenetic analysis using a conserved region of Pro (protease)- L. Wang et al. / Virology 441 (2013) 66–69 69

Table 1 List of sequences used for phylogenetic analysis in this study.

Virus Abbreviation Retrovirus GenBank Accession no. Host

Retroviridae Gammaretrovirus REV Reticuloendotheliosis virus NC_006934 Chicken XMRV Pre Xenotropic MuLV-related virus-1 NC_007815 Mouse FeLV NC_001940 Cat GALV Gibbon ape leukemia virus NC_001885 Primate F-MuLV Friend NC_001362 Mouse M-MuLV Moloney murine leukemia virus NC_001501 Mouse R-MuLV Rauscher murine leukemia virus NC_001819 Mouse M-CRV Murine type C retrovirus NC_001702 Mouse PERV-A Porcine endogenous retrovirus A AJ293656 Pig PERV-B Porcine endogenous retrovirus B AY099324 Pig PERV-C Porcine endogenous retrovirus C EF133960 Pig RD114 RD114 retrovirus NC_009889 Cat MDEV Mus dunni endogenous virus AF053745 Mouse KoRV AF151794 Koala KWERV Killer whale endogenous retrovirus GQ222416 Killer whale BaEV Baboon endogenous virus D10032 Primate RfRV Rhinolophus ferrumequinum retrovirus JQ303225 Bat LAcEV Lagenorhynchus acutus endogenous retrovirus KC128847 Atlantic white-sided dolphin GMEV Globicephala melas endogenous retrovirus KC128848 Long-finned pilot whale SCEV Stenella coeruleoalba endogenous retrovirus KC128849 Striped dolphin TTEV Tursiops truncatus endogenous retrovirus KC128850 Bottlenose dolphin DDEV Delphinus delphis endogenous retrovirus KC128851 Short-beaked common dolphin OOEV2 Orcinus orca endogenous retrovirus type 2 KC128852 Killer whale GGEV Grampus griseus endogenous retrovirus KC128853 Risso′s dolphin PPEV Phocoena phocoena endogenous retrovirus KC128854 Harbour porpoise LAlEV Lagenorhynchus albirostris endogenous retrovirus KC128855 White-beaked dolphin Epsilonretrovirus WDSV Walleye dermal sarcoma virus NC_001867 Fish

Pol (polymerase) sequences (323 amino acids in length). All Cui, J., Tachedjian, M., Wang, L., Tachedjian, G., Wang, L.F., Zhang, S., 2012a. reference sequences (Table 1) were combined and aligned with Discovery of retroviral homologs in bats: implications for the origin of – the nine new cetacean EGVs (translated into amino acid mammalian gammaretroviruses. J. Virol. 86, 4288 4293. Cui, J., Tachedjian, G., Tachedjian, M., Holmes, E.C., Zhang, S., Wang, L.F., 2012b. sequences, with indels removed) using MUSCLE 3.7 (Edgar, Identification of diverse groups of endogenous gammaretroviruses in mega- 2004). A ML phylogenetic tree for these viruses was then inferred, and microbats. J. Gen. Virol. 93, 2037–2045. again employing the best-fit JTTþIþΓ amino acid substitution Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165. model with 1,000 bootstrap replicates. Dornburg, A., Brandley, M.C., McGowen, M.R., Near, T.J., 2011. Relaxed clocks and inferences of heterogeneous patterns of nucleotide substitution and divergence Molecular dating time estimates across whales and dolphins (Mammalia: Cetacea). Mol. Biol. Evol. 29, 721–736. Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and WeusedtheLTR-divergencemethodtoestimatetheinvasion high throughput. Nucleic. Acids. Res. 32, 1792–1797. time of TTEV into the host genome (Johnson and Coffin, 1999; Cui Gifford, R., Tristem, M., 2003. The evolution, distribution and diversity of endo- et al., 2012b). Briefly, this analysis employed the relation T¼(D/R)/2, genous retroviruses. Virus Genes 26, 291–315. in which T is the invasion time (million years, Myr), D is the number Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W., Gascuel, O., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: of nucleotide differences per site between the two LTRs, which is assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321. estimated to be 0.018 (LTR length 433-bp) using LTR_FINDER (Xu and Jern, P., Sperber, G.O., Blomberg, J., 2005. Use of endogenous retroviral sequences Wang, 2007), and R is the genomic substitution rate (nucleotide (ERVs) and structural markers for retroviral phylogenetic inference and substitutions (subs) per site, per year). The latter is estimated to be in taxonomy. Retrovirology 2, 50. Johnson, W.E., Coffin, J.M., 1999. Constructing primate phylogenies from ancient – −10 the range of 4.8 9.1 10 subs/site/year based on studies of a retrovirus sequences. Proc. Natl. Acad. Sci. USA 96, 10254–10260. series of nuclear genes and introns from odontocetes which contain Lamere St, S.A., Leger, J.A., Schrenzel, M.D., Anthony, S.J., Rideout, B.A., Salomon, D. the cetacean families (Delphinidae and Phocoenidae) studied here R., 2009. Molecular characterization of a novel gammaretrovirus in killer – (Alter et al., 2007; Dornburg et al., 2011). Notably, there is growing whales (Orcinus orca). J. Virol. 83, 12956 12967. McGowen, M.R., Grossman, L.I., Wildman, D.E., 2012. Dolphin genome provides evidence that the substitution rates of cetaceans are lower than the evidence for adaptive evolution of nervous system genes and a molecular rate averageseeninothermammalianspecies(McGowen et al., 2012). slowdown. Proc. Biol. Sci 279, 3643–3651. Vilstrup, J.T., Ho, S.Y., Foote, A.D., Morin, P.A., Kreb, D., Krützen, M., Parra, G.J., Robertson, K.M., de Stephanis, R., Verborgh, P., Willerslev, E., Orlando, L., References Gilbert, M.T., 2011. Mitogenomic phylogenetic analyses of the Delphinidae with an emphasis on the Globicephalinae. BMC Evol. Biol 11, 65. Alter, S.E., Rynes, E., Palumbi, S.R., 2007. DNA evidence for historic population size Weiss, R.A., 2006. The discovery of endogenous retroviruses. Retrovirology 3, 67. fi and past ecosystem impacts of gray whales. Proc. Natl. Acad. Sci. USA 104, Xu, Z., Wang, H., 2007. LTR_FINDER: an ef cient tool for the prediction of full-length 15162–15167. LTR retrotransposons. Nucleic. Acids. Res. 35, W265–W268, Web Server issue.