The Genomes of Many Yam Species Contain Transcriptionally Active Endogenous Geminiviral Sequences That May Be Functionally Expressed Denis Filloux, S
Total Page:16
File Type:pdf, Size:1020Kb
The genomes of many yam species contain transcriptionally active endogenous geminiviral sequences that may be functionally expressed Denis Filloux, S. Murrell, M. Koohapitagtam, M. Golden, Charlotte Julian, Serge Galzi, Marilyne Uzest, Marguerite Rodier-Goud, Angélique d’Hont, Marie-Stéphanie Vernerey, et al. To cite this version: Denis Filloux, S. Murrell, M. Koohapitagtam, M. Golden, Charlotte Julian, et al.. The genomes of many yam species contain transcriptionally active endogenous geminiviral sequences that may be functionally expressed. Virus Evolution, Oxford University Press, 2015, 1 (1), pp.1-17. 10.1093/ve/vev002. hal-02635090 HAL Id: hal-02635090 https://hal.inrae.fr/hal-02635090 Submitted on 27 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Virus Evolution, 2015, 1–17 doi: 10.1093/ve/vev002 Research Article The genomes of many yam species contain transcriptionally active endogenous geminiviral sequences that may be functionally expressed Denis Filloux,1 Sasha Murrell,2,3 Maneerat Koohapitagtam,1,4 Michael Golden,2 Charlotte Julian,1 Serge Galzi,1 Marilyne Uzest,1 Marguerite Rodier-Goud,5 Ange´lique D’Hont,5 Marie Stephanie Vernerey,1 Paul Wilkin,6 Michel Peterschmitt,1 Stephan Winter,7 Ben Murrell,2,8 Darren P. Martin,2,† and Philippe Roumagnac1,* 1CIRAD-INRA-SupAgro, UMR BGPI, Campus International de Montferrier-Baillarguet, 34398 Montpellier Cedex- 5, France, 2Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town 4579, South Africa, 3Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA, 4Department of Pest Management, Faculty of Natural Resources, Prince of Songkla University, Hat Yai campus, Thailand 90120, 5CIRAD, UMR AGAP, TA A- 108/03, Avenue Agropolis, F-34398 Montpellier Cedex 5, France, 6Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB, UK, 7DSMZ Plant Virus Department, Messeweg 11/12, 38102, Braunschweig, Germany and 8Department of Medicine, University of California, San Diego, La Jolla, CA *Corresponding author: E-mail: [email protected] †Darren P. Martin: http://orcid.org/0000-0002-8785-0870 Abstract Endogenous viral sequences are essentially ‘fossil records’ that can sometimes reveal the genomic features of long extinct virus species. Although numerous known instances exist of single-stranded DNA (ssDNA) genomes becoming stably inte- grated within the genomes of bacteria and animals, there remain very few examples of such integration events in plants. The best studied of these events are those which yielded the geminivirus-related DNA elements found within the nuclear genomes of various Nicotiana species. Although other ssDNA virus-like sequences are included within the draft genomes of various plant species, it is not entirely certain that these are not contaminants. The Nicotiana geminivirus-related DNA ele- ments therefore remain the only definitively proven instances of endogenous plant ssDNA virus sequences. Here, we char- acterize two new classes of endogenous plant virus sequence that are also apparently derived from ancient geminiviruses in the genus Begomovirus. These two endogenous geminivirus-like elements (EGV1 and EGV2) are present in the Dioscorea spp. of the Enantiophyllum clade. We used fluorescence in situ hybridization to confirm that the EGV1 sequences are inte- grated in the D. alata genome and showed that one or two ancestral EGV sequences likely became integrated more than 1.4 million years ago during or before the diversification of the Asian and African Enantiophyllum Dioscorea spp. Unexpectedly, we found evidence of natural selection actively favouring the maintenance of EGV-expressed replication-associated protein (Rep) amino acid sequences, which clearly indicates that functional EGV Rep proteins were probably expressed for pro- longed periods following endogenization. Further, the detection in D. alata of EGV gene transcripts, small 21–24 nt RNAs that VC The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 1 2|Virus Evolution, 2015, Vol. 1, No. 1 are apparently derived from these transcripts, and expressed Rep proteins, provides evidence that some EGV genes are pos- sibly still functionally expressed in at least some of the Enantiophyllum clade species. Key words: endogenous viral sequences; geminivirus; yam; functional protein expression; selection analysis. 1 Introduction CDP18021, CDP18667, CDP20185, and CDP20985), and Populus tri- chocarpa (black cottonwood tree, GenBank accession: The geminiviruses (Family Geminiviridae) are a diverse group of PRJNA17973) (Liu et al. 2011; Martin et al. 2011), the best studied viruses with circular single-stranded DNA (ssDNA) genomes of these integrated geminivirus sequences are the so-called composed of one or two components, varying in size from 2.5 to ‘geminivirus-related DNA’ (GRD) elements within the genomes 3.0 kb, which are characteristically encapsidated within twinned of various Nicotiana species (Kenton et al. 1995; Bejarano et al. incomplete icosohedral (or geminate) particles (Jeske 2009). All 1996; Ashby et al. 1997; Murad et al. 2004). geminivirus genomes contain between four and eight genes and GRD elements have been classified into three groups: GRD2, have a hairpin structure at their virion strand origins of replica- GRD3, and GRD5. GRD2 is represented by five to fifteen copies tion (v-ori) that consists of a GC-rich stem and a loop containing on chromosome 4 of N.tomentosa. The other GRD elements are a highly conserved AT-rich nonanucleotide motif (usually with found as three related repeat classes, GRD5, GRD53, and GRD3, the sequence TAATATTAC) (Jeske 2009). This hairpin is located clustered as multiple direct repeats on homologous group 4 within an intergenic region that also contains the bidirectional chromosomes (GRD5 and GRD53) or on chromosome 2 (GRD3) in promoter elements and transcription start sites of diverging vi- several Nicotiana species. Each GRD element contains a degener- rion and complementary sense genes (Jeske 2009). The only two ate and truncated begomoviral rep gene and an intergenic re- genes that are obviously conserved among all currently de- gion fragment carrying a geminivirus virion-sense origin of scribed geminiviruses are those encoding the coat protein (cp) replication (v-ori) -like sequence (Kenton et al. 1995; Bejarano and the replication associated protein (rep)(Bernardo et al. et al. 1996; Ashby et al. 1997; Murad et al. 2004). 2013). Although all known geminiviruses also express one or Endogenous viral sequences such as the GRD elements are es- more proteins that are involved in virus movement, there is sentially ‘fossil records’ that have the potential to reveal the ge- no detectable homology between the movement proteins of nomic features of long extinct virus species (Katzourakis 2013). viruses in different genera (Jeske 2009). Upon integration, such sequences, if non-functional, would have Based on host ranges, vector specificities, genome organiza- begun evolving under neutral genetic drift at approximately 10–9 tion, and genome-wide sequence similarities, the family substitutions per site per year (Huang et al. 2012), respectively, Geminiviridae has been split into seven genera by the approximately 10 (Lefeuvre et al. 2011)to10,000(Duffy and International Committee on the Taxonomy of Viruses: Holmes 2008) times slower than the long- and short-term substi- Begomovirus, Curtovirus, Topocuvirus, Becurtovirus, Turncurtovirus, tution rates of ‘free living’ geminiviruses. Given the approximate Eragrovirus, and Mastrevirus (Adams, King, and Carstens 2013). timing of such integration events, the comparison of endogenous However, recent discoveries of various highly divergent gemini- sequences with their distant contemporary relatives can be used virus-like ssDNA viruses that cannot reasonably be classified to both determine long-term viral genomic substitution rates within these seven genera suggest that there likely exists far (Gilbert et al. 2009; Gilbert and Feschotte 2010; Lefeuvre et al. more diversity within this family than is currently represented 2011) and to date events in the deep evolutionary history of vi- by the established taxonomy (Krenz et al. 2012; Loconsole et al. ruses (Emerman and Malik 2010; Katzourakis and Gifford 2010). 2012; Bernardo et al. 2013). Although the begomoviruses, curto- Analyses of the multiple geminivirus rep fragments within the viruses, and topocuviruses share relatively similar genome genomes of various Nicotiana species have indicated that they structures and are known to naturally infect only dicotyledon- have likely been evolving under neutral genetic drift (Murad et al. ous plants, viruses in the other genera have a variety