Virus Evolution, 2020, 6(1): veaa008
doi: 10.1093/ve/veaa008 Rapid Communication
Molecular fossils reveal ancient associations of dsDNA viruses with several phyla of fungi Zhen Gong,† Yu Zhang,† and Guan-Zhu Han*
Jiangsu Key Laboratory for Microbes and Functional Genomics, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, 210023, China
*Corresponding author: E-mail: [email protected] †These authors contributed equally to this work.
Abstract Little is known about the infections of double-stranded DNA (dsDNA) viruses in fungi. Here, we use a paleovirological method to systematically identify the footprints of past dsDNA virus infections within the fungal genomes. We uncover two distinct groups of endogenous nucleocytoplasmic large DNA viruses (NCLDVs) in at least seven fungal phyla (accounting for about a third of known fungal phyla), revealing an unprecedented diversity of dsDNA viruses in fungi. Interestingly, one fungal dsDNA virus lineage infecting six fungal phyla is closely related to the giant virus Pithovirus, suggesting giant virus relatives might widely infect fungi. Co-speciation analyses indicate fungal NCLDVs mainly evolved through cross-species transmission. Taken together, our findings provide novel insights into the diversity and evolution of NCLDVs in fungi.
Key words: paleovirology; fungi; endogenous viral elements; nucleocytoplasmic large DNA viruses; phylogenetics.
1. Introduction some unclassified viruses (Koonin and Yutin 2018). NCLDVs As a paradigm, viruses have long been thought to be smaller possess highly diverse genomes ranging from 100 kb (iridovi- than cellular organisms. The discovery of Acanthamoeba poly- ruses) to > 2.5 Mb (pandoraviruses) (Philippe et al. 2013; phaga mimivirus, a giant virus with a double-stranded DNA Legendre et al. 2018). Some giant viruses can even encode pro- linear genome of 1.2 Mb, challenged this ‘well-established’ tein translation components, further blurring the boundary be- concept (La Scola 2003; Raoult et al. 2004). Since then, many tween viruses and cellular world (Arslan et al. 2011). NCLDVs giant viruses have been identified in eukaryotes (primarily infect a remarkably wide range of eukaryotes, from protists to amoebae) and diverse environments across the globe (Koonin green algae and animals (Delaroque and Boland 2008; Koonin 2009; Koonin and Yutin 2010; Philippe et al. 2013; Maumus et al. and Yutin 2010; Colson et al. 2012; Gallot-Lavalle´e and Blanc 2014). Phylogenetic analyses suggest that giant viruses fall 2017; Koonin and Yutin 2018). However, only few NCLDVs within the diversity of nucleocytoplasmic large DNA viruses have been described in land plants and fungi (Maumus et al. (NCLDVs), a group of double-stranded DNA (dsDNA) viruses 2014; Gallot-Lavalle´e and Blanc 2017). (Colson et al. 2012; Koonin and Yutin 2018; Guglielmini et al. To date, double-stranded RNA (dsRNA) viruses (e.g. 2019). Quadriviridae, Megabirnaviridae, Partitiviridae, Reoviridae, and Despite their various gene repertoires and genome sizes, Totiviridae), positive-sense single-stranded RNA [(þ)ssRNA] vi- NCLDVs form a monophyletic group. Currently, NCLDVs com- ruses (e.g. Alphaflexiviridae, Gammaflexiviridae,andHypoviridae), prise at least seven families (Iridoviridae, Ascoviridae, Asfarviridae, negative-sense single-stranded RNA [(-)ssRNA] viruses Phycodnaviridae, Poxviridae, Mimiviridae, and Marseilleviridae) and (e.g. Bunyaviridae), and single-stranded DNA (ssDNA) viruses
VC The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/ licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
1 2|Virus Evolution, 2020, Vol. 6, No. 1
(a)(b) 100 Ascoviridae Fundsalphavirus Fundsbetavirus 100 99 Rozellomycota 99 Iridoviridae 100 Marseilleviridae Aphelidiomycota 91 99 95 Blastocladiomycota 100 P. patens 95 ‘viruses’ Fundsalphaviruses 99 100 Chytridiomycota
Conidiobolus coronatus fundsalphavirus Monoblepharomycota 99 97 Pithovirus sibericum Neocallimastigomycota Mimiviridae & Phycodnaviridae Olpidiomycota 77 Pandoraviruses & Phycodnaviridae 100 Eukaryotic delta Basidiobolomycota 100 Herpesviridae Zoopagomycota 79 100 Eukaryotic zeta Kickxellomycota 95 100 Fundsbetaviruses 100 Entomophthoromycota 100 Asfarviridae
100 Faustovirus Calcarisporiellomycota 100 Kaumoebavirus Mucoromycota 97 Heterocapsa circularisquama DNA virus 01 Mortierellomycota 100 Phages 100 Baculoviridae Glomeromycota 97 91 100 Poxviridae Entorrhizomycota 100 Archaeal B3 82 Basidiomycota 97 100 Eukaryotic alpha Ascomycota 100 Archaeal B1
0.3
Figure 1. Phylogeny and host distribution of fungus NCLDV-like sequences. (a) Phylogenetic tree of fundsalphaviruses and fundsbetaviruses based on DNAP. Fundsalphaviruses, fundsbetaviruses, other dsDNA viruses, and P. patens ‘virus’ were labeled in green, orange, blue, and brown, respectively. This unrooted tree was reconstructed using a maximum likelihood method. Ultrafast bootstrap (UFBoot) values (>75) were shown on selected nodes. (b) Host distribution of fundsalphaviruses and fundsbetaviruses. The fungi phylogeny at the level of phylum was inferred from Tedersoo et al. (2018). The presence of fundsalphaviruses and fundsbetaviruses in fungus phylum was labeled with green solid circle and orange solid circle, respectively.
(e.g. Genomoviridae) have been known to infect fungi (Ghabrial 2. Results and discussion et al. 2015; Krupovic et al. 2016; Roossinck 2019). Among them, dsRNA viruses are most commonly observed in fungi (Ghabrial To explore the diversity of dsDNA viruses in fungi, we used a et al. 2015; Roossinck 2019). similarity search and phylogenetic analysis combined approach Virus can adventitiously integrate into host germline to screen the presence of NCLDV-like elements in 1,006 fungal genomes and become vertically inherited to next generation, genomes (see Methods). We identified a total of 87 NCLDV-like forming the so called endogenous viral elements (EVEs). sequences within 15 fungal genomes (Supplementary Table S1). Notably, the replication of phaeoviruses requires integration Phylogenetic analysis based on the family B DNA polymerase into the host genomes using integrase encoded by their (DNAP), a core gene conserved among NCLDVs and cellular genomes (Delaroque et al. 1999; Meints et al. 2008). EVEs record organisms, shows that the fungus NCLDV-like sequences past viral infections and thus represent ‘molecular fossils’ for identified here cluster into two distinct lineages, designated as studying the deep history and ancient ecology of viruses (Patel, fungus dsDNA alpha virus (fundsalphavirus) and fungus dsDNA Emerman, and Malik 2011; Feschotte and Gilbert 2012). EVEs beta virus (fundsbetavirus) (Fig. 1a and Supplementary Fig. S1). have facilitated the rise of an emerging field, Paleovirology. Premature stop codons and frameshift mutations exist in some With the recent development of next generation sequencing DNAP sequences, suggesting these viruses are endogenous viral (NGS), hundreds of eukaryote genomes have been sequenced, elements and are not generated by laboratory contamination. providing a rich resource for uncovering the footprints of past Interestingly, fundsalphaviruses appear to be closely related to viral infections. Recently, NCLDV-like sequences have been Pithovirus sibericum, a giant virus isolated from Siberian perma- identified in three fungus species (Gallot-Lavalle´e and Blanc frost (Fig. 1a and Supplementary Fig. S5)(Legendre et al. 2014). 2017). However, much remains unknown about the distribution Surprisingly, endogenous fundsalphaviruses were identified in and evolution of dsDNA viruses in fungi (Colson et al. 2013). six fungus phyla (accounting for a third of known fungal phyla) In this study, we used a paleovirological approach to systemat- (Fig. 1b)(Tedersoo et al. 2018). Endogenous fundsbetaviruses ically identify endogenous dsDNA virus elements in the genomes were only identified within Rhizophagus irregularis. Phylogenetic of fungi. We performed phylogenetic analyses to investigate the analysis shows that fundsbetaviruses are closely related with relationship between the newly identified viral sequences in African swine fever virus (the Asfarviridae family). Taken fungal genomes and NCLDVs. We also performed co-speciation together, our results suggest NCLDVs might infect/have infected analyses to explore the evolutionary mode of fungal viruses. a wide range of fungi. Z. Gong et al. | 3
To further characterize fundsalphaviruses and fundsbetavi- lineages of dsDNA viruses infecting fungi, revealing a poten- ruses, we investigated the gene contents flanking DNAP. We tially unprecedented diversity of dsDNA virus families in fungi found that at least six genes might be shared among fundsal- and raising the possibility that other dsDNA viruses might be phaviruses, namely D5 helicase-primase (D5), D6/D11-like SNF2 still circulating in fungi. Further work is needed to characterize helicase (DEXDc-HELICc), S-adenosylmethionine-dependent the diversity of dsDNA viruses in fungi. Our findings come with methyltransferase (SAM), DNA-dependent RNA polymerase two caveats: (1) the diversity of dsDNA viruses might be under- (RNAP) subunit 2 and 5 (RPB2 and RPB5), and RNAP subunit 1 (N- estimated in fungi, because we only used DNAP as a probe to terminal [RP1n] and C-terminal [RP1c]) (Supplementary Fig. S2). screen dsDNA virus insertions; (2) the fungus NCLDV sequences In some cases, retrotransposon-related sequences were found might be generated by laboratory contamination or sequencing near the endogenous fundsalphavirus elements, suggesting error. However, many fungus NCLDV sequences are in long ge- retrotransposons might potentially play a role in the integration nomic contigs. Fundsalphaviruses have been identified in as and proliferation of fundsalphaviruses (Supplementary Fig. S2; many as fourteen species across six phyla. There are premature Aswad and Katzourakis 2017). The variation in gene contents stop codons and frameshift mutations in fungus NCLDV among different endogenous fundsalphavirus elements sequences. All these lines make the possibility of contamina- may reflect the variation of fundsalphaviruses infecting differ- tion highly unlikely (Naccache et al. 2013; Kjartansdo´ ttir et al. ent fungi and/or be due to the long-term evolution after the 2015). Nevertheless, our findings reveal dsDNA viruses, espe- integrations of ancestral viruses. Phylogenetic analyses based cially giant virus relatives, could infect a previously neglected on these six genes show that fundsalphaviruses consistently major eukaryotic kingdom, Fungi. cluster together with NCLDVs (Supplementary Figs S3–S8). In particular, our phylogenetic analyses of RNAP1 and RNAP2 clearly distinguish NCLDVs from cellular species, which is 3. Methods and materials consistent with a previous study by Guglielmini et al. (2019), 3.1 Identification of NCLDV-like sequences in fungal and show that fundsalphaviruses cluster together with genomes NCLDVs (Supplementary Figs S4 and S6). But fundsalphavi- ruses do not share similar gene orders with other NCLDVs We screened a total of 1,006 whole genome shotgun (WGS) (Supplementary Fig. S2). Taken together, these results indicate sequences of fungi for the NCLDV-like sequences through a that fundsalphaviruses might be derived from a distinct similarity search and phylogenetic analysis combined ap- NCLDV family. proach. First, we used the tBLASTn algorithm to search against Moreover, we performed similarity search of major capsid the fungal genomes with an e-cut-off value of 10 5 and DNAP of protein (MCP) and A32 packaging ATPase, proteins involved in representative dsDNA viruses as queries. DNAP has been used virion morphogenesis, against fungal genomes. The evolution- widely to identify and classify NCLDVs due to strong sequence ary history of MCP and A32 proteins are complex with sequen- conservation, clear evolutionary history, and few horizontal ces from prokaryotes (perhaps phages) dispersed across the gene transfer (Monier, Claverie, and Ogata 2008; Colson et al. trees, precluding meaningful evolutionary interpretation and 2013). Next, the significant hits together with DNAP sequences classification (Supplementary Figs S9 and S10). However, we did of representative NCLDVs, herpesviruses, baculoviruses, eukar- find that some significant hits from fungal genomes cluster to- yotes, and archaea were aligned using MAFFT (Koonin and gether with NCLDVs, further confirming the presence of dsDNA Yutin 2010; Katoh and Standley 2013) (accession numbers are viruses in fungal genomes (Supplementary Figs S9 and S10). available in Supplementary Fig. S1). Initial phylogenetic Endogenous fundsalphavirus elements were identified in analyses were performed through an approximate maximum fourteen fungus species that belong to six fungal phyla (Fig. 1b). likelihood method implemented in FastTree (Price, Dehal, and One might argue that they may arise through an ancestral inte- Arkin 2010). The fungal sequences that group with NCLDVs gration or horizontal gene transfer event in the most recent were extracted for further studies. Moreover, we performed sim- common ancestor of these fungi. To test this possibility, we per- ilarity searches using MCP and A32 protein as queries with e- formed co-speciation analyses using an event-based approach cut-off value of 0.01. Phylogenetic analyses were also performed at the level of host phylum. There is no significant congruence through an approximate maximum likelihood method imple- between host and fundsalphavirus phylogenies (Fig. 2; Table 1), mented in FastTree (Price, Dehal, and Arkin 2010). revealing that these NCLDV-like sequences within fungal To explore the gene contents of fundsalphaviruses and genomes may not derive from an ancestral integration event. fundsbetaviruses, we extracted the sequences containing DNAP Moreover, we did not find shared host genes among endoge- within their genomes, covering eight flanking protein domains nous fundsalphaviruses of different fungi (Supplementary Fig. for each side of DNAP. The conserved domains were detected S2). These results suggest these endogenous fundsalphaviruses through the BLASTx algorithm against non-redundant protein might arise through multiple integration events, and fundsal- database in National Center for Biotechnology Information phaviruses evolved in fungi mainly via cross-species (NCBI) website, and then confirmed by the Conserved Domain transmission. search. Interestingly, the NCLDV-like sequences previously identi- fied in the genome of the moss Physcomitrella patens (File´e 2014; Maumus et al. 2014) fall within the diversity of fundsalphavi- 3.2 Phylogenetic analyses ruses (Fig. 1a), raising the possibility that land plant NCLDVs To explore the phylogenetic relationship among fungal NCLDV- might originate through cross-species transmission from like sequences, other dsDNA viruses, bacteria, archaea, and fungi. Indeed, many plant viruses are closely related to fungal eukaryotes, we performed phylogenetic analyses of seven viruses (Roossinck 2019). proteins (DNAP, SAM, RPB2, RPB5, RPB1, D5 helicase-primase, Much remains unknown about the distribution and evolu- and D6/D11-like SNF2 helicase). We used the BLASTp algorithm tion of dsDNA viruses in fungi. In this study, we found two to search against non-redundant protein database with an 4|Virus Evolution, 2020, Vol. 6, No. 1
Neo_cal MCOG01001603.1-6842-6093 Neo_cal MCOG01000005.1-727220-727735 Neo_cal MCOG01001608.1-6791-6055 Neo_cal MCOG01000891.1-3820-4932 Neo_cal MCOG01001593.1-1675-599 Neocallimastigomycota Neo_cal MCOG01000549.1-40523-41719 Neo_cal MCOG01000869.1-1652-471 Neo_cal MCOG01000648.1-20085-19729 Neo_cal MCOG01001255.1-12283-11163 Neo_cal MCOG01001412.1-10066-9453 Neo_cal MCOG01000524.1-43418-42306 Neo_cal MCOG01000618.1-15743-16924 Neo_cal MCOG01001348.1-5060-6172 Neo_cal MCOG01001761.1-2825-1662 Neo_cal MCOG01000937.1-5753-6934 Neo_cal MCOG01000566.1-40848-39946 Neo_cal MCOG01000665.1-4185-5024 Basidiobolomycota Neo_cal MCOG01001403.1-10005-9517 Neo_cal MCOG01001719.1-4122-4502 Neocallimastigomycota 100 Neo_cal MCOG01000764.1-5534-6112 Neo_cal MCOG01000753.1-4977-6113 98 Neo_cal MCOG01000452.1-67957-69018 Ana_rob MCFG01000208.1-45763-44228 Ana_rob MCFG01000387.1-13696-12551 Pir_fin MCFH01000019.1-631779-631555 99 Pir_fin MCFH01000069.1-221525-220380 Pir_fin MCFH01000083.1-112059-110575 Kickxellomycota Pir_fin MCFH01000089.1-61193_62587 Pir_fin MCFH01000089.1-115070-116563 Pir_E2 KZ151082.1-44419-43901 100 Pir_E2 KZ151426.1-6775-7554 Pir_fin MCFH01000063.1-79043-77322 Pir_E2 KZ151806.1-3433-4230 Neo_cal MCOG01000149.1-384862-384638 100 Neo_cal MCOG01000331.1-171535-170315 Neo_cal MCOG01000572.1-150-1328 100 Neo_cal MCOG01000112.1-393-1206 100 Pec_rum ASRE01014393.1-1211-3166 Pir_E2 KZ150781.1-166785-168551 100 87 Ana_rob MCFG01000243.1-83698-81851 Ana_rob MCFG01000494.1-10398-8269 Entomophthoromycota Cun_ele JNDR01000766.1-3307-4212 91 100 Cun_ele JNDR01000766.1-801-3227 Mucoromycota Cun_ele JNDR01000766.1-2-700 93 Kickxellomycota 95 Zan_cul LSSK01000604.1-8801-6927 Con_cor KQ964610.1-47151-47588 Entomophthoromycota Mucoromycota 100 P. patens ‘viruses’
Bas_mer MCFE01000191.1-77124-76513 100 99 Bas_mer MCFE01000333.1-22315-21848 99 Bas_mer MCFE01000822.1-1945-4329 Basidiobolomycota 100 Bas_het JNET01021252.1-104-685 90 Bas_het JNET01029572.1-8763-8128 99 98 Bif_ade MVBO01000368.1-2183-339 Mucoromycota 99 Mor_elo KV442340.1-1451-111 Mortierellomycota 100 100 Mor_ver KN042439.1-59829-58065 Mortierellomycota 100 Mor_elo KV442190.1-1-882 Cas_mol KN233205.1-1706-43 (plant) Mortierellomycota Sak_obl JNEV01001195.1-180347-182671 Mucoromycota Con_cor KQ964598.1-37719-39284 Entomophthoromycota YP_009000951.1 Pithovirus sibericum
Figure 2. Evolutionary mode of fundsalphaviruses. The fundsalphavirus tree (left) is compared with its fungus host tree (right) at the level of phylum. Gray lines con- nect the fundsalphaviruses to their corresponding host phyla. Fungus name abbreviations: Neo_cal, Neocallimastix californiae; Pir_fin, Piromyces finnis; Ana_rob, Anaeromyces robustus; Pir_ E2, Piromyces sp. E2; Pec_rum, Pecoramyces ruminatium; Bas_mer, Basidiobolus meristosporus; Bas_het, Basidiobolus heterosporus; Cun_ele, Cunninghamella elegans; Bif_ade, Bifiguratus adelaidae; Sak_obl, Saksenaea oblongispora; Con_cor, Conidiobolus coronatus; Mor_elo, Mortierella elongate; Mor_ver, Mortierella verticillata; Zan_cul, Zancudomyces culisetae.
Table 1. Number of events experienced by fundsalphaviruses
Event costsa Total cost Co-speciationb Duplicationb Duplication and host switchb Lossb Failure to divergeb P-valuec P-valued