A New Lineage of Segmented RNA Viruses Infecting Animals
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A new lineage of segmented RNA viruses infecting animals Darren J. Obbard*1 Mang Shi2+ Katherine E. Roberts3+ Ben Longdon3+ Alice B. Dennis4+ *Author for correspondence + Contributed equally 1. Institute of Evolutionary Biology, University of Edinburgh, Charlotte Auerbach Road, Edin- burgh, United Kingdom 2. Charles Perkins Center, The University of Sydney, NSW, Australia 3. Biosciences, College of Life & Environmental Sciences, University of Exeter, Penryn Campus, Penryn, Cornwall, United Kingdom 4 Department of Evolutionary Biology & Systematic Zoology, Institute of Biochemistry and Biol- ogy, University of Potsdam, 4476 Potsdam, Germany Page | 1 bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Metagenomic sequencing has revolutionised our knowledge of virus diversity, with new virus sequences being reported faster than ever before. However, virus discovery from meta- genomic sequencing usually depends on detectable homology: without a sufficiently close relative, so-called ‘dark’ virus sequences remain unrecognisable. An alternative approach is to use virus-identification methods that do not depend on detecting homology, such as virus recognition by host antiviral immunity. For example, virus-derived small RNAs have previously been used to propose ‘dark’ virus sequences associated with the Drosophilidae (Diptera). Here we combine published Drosophila data with a comprehensive search of transcriptomic sequences and selected meta-transcriptomic datasets to identify a completely new lineage of segmented positive-sense single-stranded RNA viruses that we provisionally refer to as the Quenyaviruses. Each of the five segments contains a single open reading frame, with most encoding proteins showing no detectable similarity to characterised viruses, and one sharing a small number of residues with the RNA-dependent RNA polymerases of single- and double- stranded RNA viruses. Using these sequences, we identify close relatives in approximately 20 arthropods, including insects, crustaceans, spiders and a myriapod. Using a more conserved sequence from the putative polymerase, we further identify relatives in meta-transcriptomic datasets from gut, gill, and lung tissues of vertebrates, reflecting infections of vertebrates or of their associated parasites. Our data illustrate the utility of small RNAs to detect viruses with limited sequence conservation, and provide robust evidence for a new deeply divergent and phylogenetically distinct RNA virus lineage. Key Words Metagenome, RNA virus, dark virus, arthropod, RNA interference 1 Introduction 2018). Together, these two sources of (meta- )genomic data have ‘filled in’ the tree of vi- Pioneered by studies of oceanic phage ruses at many levels. They have expanded (Breitbart et al. 2002), since the mid-2000s the host range of known viruses (e.g. metagenomic studies have identified thou- Galbraith et al. 2018), identified vast num- sands of new viruses (or virus-like sequences) bers of likely new species and genera—con- associated with bacteria, plants, animals, sequently provoking considerable debate on fungi, and single-celled eukaryotes (reviewed how we should go about virus taxonomy in Greninger 2018, Obbard 2018, Shi et al. (Simmonds et al. 2017, King et al. 2018, 2018, Zhang et al. 2018). At the same time, Simmonds and Aiewsakun 2018)—and iden- routine high-throughput sequencing has pro- tified new lineages that may warrant recog- vided a rich resource for virus discovery nition at family level, including Chuviruses, among eukaryotic host genomes and tran- Yueviruses, Qinviruses, Zhaoviruses, scriptomes (e.g. Bekal et al. 2011, Longdon et Yanviruses and Weiviruses (Li et al. 2015, Shi al. 2015, Webster et al. 2015, François et al. et al. 2016). More importantly, these discov- 2016, Mushegian et al. 2016, Gilbert et al. eries have also started to impact upon our 2019). Indeed, a recent survey suggested understanding of virus evolution (Wolf et al. that, as of 2018, around 10% of the available 2018), emphasising the importance of ‘mod- picornavirus-like polymerase sequences ex- ular’ exchange (Koonin et al. 2015, Dolja and isted only as un-annotated transcripts within Koonin 2018) and suggesting surprisingly the transcriptomes of their hosts (Obbard long-term fidelity to host lineages, at least at Page | 2 bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. higher taxonomic levels (Geoghegan et al. RNA virus sequences are absent from DNA. 2017, Shi et al. 2018). The presence and absence of contigs across datasets can also provide useful clues as to Despite the successes of metagenomic virus the origin of a sequence. Specifically, se- discovery, there are clear limitations to the quences that are present in all individuals, or approach. First, ‘virus-like sequences’ from a in all populations, are more likely to repre- putative host need not equate to an active vi- sent genome integrations, sequences that al- ral infection of that species. They may repre- ways co-occur with recognisable viral frag- sent integrations into the host genome, in- ments may be segments that are not detect- fections of cellular parasites or other micro- able by homology, and sequences that co-oc- biota, infections of gut contents, or simply cur with non-host sequences are candidates contaminating nucleic acid (reviewed in to be viruses of the microbiota. Obbard 2018). Second, most metagenomic methods rely on similarity searches to iden- One of the most powerful ways to identify vi- tify virus sequences through inferred homol- ruses is to capitalise on the host’s own ability ogy. This limits the new discoveries to the rel- to recognise pathogens, for example by se- atives of known viruses. In the future, as sim- quencing the copious virus-derived small ilarity search algorithms become more sensi- RNAs generated by the antiviral RNAi re- tive (e.g. Kuchibhatla et al. 2014, Yutin et al. sponses of plants, fungi, nematodes and ar- 2018), this approach may be able to uncover thropods (Aguiar et al. 2015, Webster et al. all viruses—at least those that have common 2015). This not only demonstrates host ancestry with the references. However, this recognition of the sequences as viral in approach will probably still struggle to iden- origin, but also (if both strands of ssRNA vi- tify less conserved parts of the genome, es- ruses are present) demonstrates viral replica- pecially for segmented viruses and incom- tion, and can even identify the true host of plete assemblies. As a consequence, there the virus based on the length distribution and may be many viruses and virus fragments base composition of the small RNAs that cannot be seen through the lens of ho- (compare Webster et al. 2016, with Coyle et mology-based metagenomics, the so-called al. 2018). ‘dark’ viruses (Rinke et al. 2013, Using ribosome-depleted RNA and small- Krishnamurthy and Wang 2017, Knox et al. RNA metagenomic sequencing, Webster et al 2018). (2015) previously proposed approximately The ultimate solution to the shortcomings of 60 ‘dark’ virus sequences associated with metagenomic discovery is to isolate and ex- Drosophila. These comprised contigs of at perimentally characterise viruses. However, least one 1kbp that were present as RNA but the large number of uncharacterised virus- not DNA, contained a long open reading like sequences means that this is unlikely to frame, lacked identifiable homology with be an option in the foreseeable future. In- known viruses or cellular organisms, and stead, we can use other aspects of meta- were substantial sources of the 21nt small genomic data to corroborate evidence of a vi- RNAs that characterise Drosophila antiviral ral infection (reviewed in Obbard 2018). For RNAi. They included ‘Galbut virus’ example, metagenomic reads are more con- (KP714100, KP714099), which has since been sistent with an active infection if RNA is very shown to constitute two divergent segments abundant (several percent of the total), if of an insect-infecting Partitivirus (KP757930; strand biases reflect active replication (such Shi et al. 2018) and is the most common virus as the presence of the coding strand for neg- associated with Drosophila melanogaster in ative sense RNA viruses or DNA viruses), or if the wild (Webster et al. 2015); ‘Chaq virus’ Page | 3 bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. (KP714088), which may be a satellite or an (pool