A New Lineage of Segmented RNA Viruses Infecting Animals
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted August 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Title: A new lineage of segmented RNA viruses infecting animals Authors: Darren J. Obbard*1 Mang Shi2+ Katherine E. Roberts3+ Ben Longdon3+ Alice B. Dennis4+ *Author for correspondence + Contributed equally Affiliation: 1. Institute of Evolutionary Biology, University of Edinburgh, Charlotte Auerbach Road, Edinburgh, United Kingdom 2. Charles Perkins Center, The University of Sydney, NSW, Australia 3. Biosciences, College of Life & Environmental Sciences, University of Exeter, Penryn Campus, Penryn, Cornwall, United Kingdom 4 Department of Evolutionary Biology & Systematic Zoology, Institute of Biochemistry and Biology, University of Potsdam, 4476 Potsdam, Germany Email & ORCID: DJO [email protected] 0000-0001-5392-8142 ABD [email protected] 0000-0003-0948-9845 BL [email protected] 0000-0001-6936-1697 KER [email protected] 0000-0002-8567-3743 MS [email protected] 0000-0002-6154-4437 Page | 1 bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted August 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Metagenomic sequencing has revolutionised our knowledge of virus diversity, with new virus se- quences being reported at a higher rate than ever before. However, virus discovery from meta- genomic sequencing usually depends on detectable homology: without a sufficiently close relative, so-called ‘dark’ virus sequences remain unrecognisable. An alternative approach is to use virus-iden- tification methods that do not depend on detecting homology, such as virus recognition by host antiviral immune mechanisms. For example, the sequencing of virus-derived small RNAs has previ- ously been used to propose ‘dark’ virus sequences associated with the Drosophilidae (Diptera). Here we combine published Drosophila data with a comprehensive search of arthropod transcriptomic sequences and selected meta-transcriptomic datasets to identify a completely new lineage of seg- mented positive-sense single-stranded RNA viruses that we provisionally refer to as the Quenya- viruses. Each of the five segments contains a single open reading frame, with most encoding proteins showing no detectable similarity to characterised viruses, and one sharing a small number of key residues with the RNA-dependent RNA polymerases of single- and double-stranded RNA viruses. Using these sequences, we identify close relatives in approximately 20 arthropods, including insects, crustaceans, spiders and a myriapod. Using a more conserved sequence from the putative polymer- ase, we further identify relatives in meta-transcriptomic datasets from gut, gill, and lung tissues of vertebrates, reflecting infections of vertebrates or of their associated parasites. Our data illustrate the utility of small RNAs to detect viruses with limited sequence conservation, and provide robust evidence for a new deeply divergent and phylogenetically distinct RNA virus lineage. Key Words Metagenome, RNA virus, dark virus, arthropod, RNA interference 1 Introduction Galbraith et al. 2018), identified vast numbers of likely new species and genera—consequently Initially pioneered by studies of oceanic phage provoking considerable debate on how we (Breitbart et al. 2002), since the mid-2000s an should go about virus taxonomy (Simmonds et ever-increasing number of metagenomic studies al. 2017, King et al. 2018, Simmonds and have identified thousands of new viruses (or vi- Aiewsakun 2018)—and identified new lineages rus-like sequences) associated with bacteria, that may warrant recognition at family level, in- plants, animals, fungi, and single-celled eukary- cluding Chuviruses, Yueviruses, Qinviruses, otes (reviewed in Greninger 2018, Obbard 2018, Zhaoviruses, Yanviruses and Weiviruses (Li et al. Shi et al. 2018, Zhang et al. 2018). At the same 2015, Shi et al. 2016). Perhaps even more im- time, routine high-throughput sequencing has portantly, these discoveries have also started to provided a rich resource for virus discovery impact our understanding of virus evolution, among eukaryotic host genomes and transcrip- emphasising the importance of ‘modular’ ex- tomes (e.g. Bekal et al. 2011, Longdon et al. change (Koonin et al. 2015, Dolja and Koonin 2015, Webster et al. 2015, François et al. 2016, 2018) and suggesting surprisingly long-term fi- Mushegian et al. 2016, Gilbert et al. 2019). In- delity to host lineages, at least at higher taxo- deed, a recent survey suggested that, as of 2018, nomic levels (Geoghegan et al. 2017, Shi et al. around 10% of the available picornavirus-like 2018). polymerase sequences exist only as un-anno- tated transcripts within the transcriptomes of Nevertheless, despite the successes of meta- their hosts (Obbard 2018). Together, these two genomic virus discovery, there are clear limita- sources of (meta-)genomic data have ‘filled in’ tions to the approach. First, ‘virus-like se- the tree of viruses at many levels. They have ex- quences’ identified from a putative host need panded the host range of known viruses (e.g. not equate to an active viral infection of that Page | 2 bioRxiv preprint doi: https://doi.org/10.1101/741645; this version posted August 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. species. They may instead represent integra- microbiota. tions into the host genome, infections of cellular One of the most powerful ways to identify vi- parasites or other microbiota, infections of gut ruses is to capitalise on the host’s own ability to contents, or simply contaminating nucleic acid recognise pathogens, for example by sequenc- (reviewed in Obbard 2018). Second, most meta- ing the copious virus-derived small RNAs gener- genomic methods rely on sequence similarity ated by the antiviral RNAi responses of plants, searches to identify virus sequences through in- fungi, nematodes and arthropods (Aguiar et al. ferred homology. This necessarily limits the new 2015, Webster et al. 2015). This not only demon- viruses that can be discovered to the relatives of strates host recognition of the sequences as viral known viruses. In the future, as similarity search in origin, but also (if both strands of ssRNA vi- algorithms become more sensitive (e.g. ruses are present) demonstrates viral replica- Kuchibhatla et al. 2014, Yutin et al. 2018), this tion, and can even identify the true host of the approach may be able to uncover all viruses—at virus based on the length distribution and base least those that do have common ancestry with composition of the small RNAs (compare the references. However, this approach will al- Webster et al. 2016, with Coyle et al. 2018). most certainly continue to struggle to identify Using ribosome-depleted RNA and small-RNA less conserved parts of the genome, especially metagenomic sequencing, Webster et al (2015) for segmented viruses and incomplete assem- previously proposed approximately 60 ‘dark’ vi- blies. As a consequence, there may be many vi- rus sequences associated with Drosophila. ruses and virus fragments that cannot be seen These comprised contigs of at least one 1kbp through the lens of homology-based meta- that were present as RNA but not DNA, con- genomics, the so-called ‘dark’ viruses (Rinke et tained a long open reading frame, lacked identi- al. 2013, Krishnamurthy and Wang 2017, Knox et fiable homology with known viruses or cellular al. 2018). organisms, and were substantial sources of the The ultimate solution to any shortcomings of 21nt small RNAs that characterise antiviral RNAi metagenomic discovery is to isolate and experi- in Drosophila. They included ‘Galbut virus’ mentally characterise viruses. However, the (KP714100, KP714099), which has since been sheer number of uncharacterised virus-like se- shown to constitute two divergent segments of quences means that this is unlikely to be an op- an insect-infecting Partitivirus (KP757930; Shi et tion in the foreseeable future. Instead, we can al. 2018) and is the most common virus associ- use other aspects of metagenomic data to cor- ated with Drosophila melanogaster in the wild roborate evidence of a viral infection (reviewed (Webster et al. 2015); ‘Chaq virus’ (KP714088), in Obbard 2018). For example, metagenomic which may be a satellite or an optional segment reads are more consistent with an active infec- of Galbut virus (Shi et al. 2018); and 56 unnamed tion if RNA is very abundant (several percent of ‘dark’ virus fragments (KP757937-KP757993). the total), if strand biases reflect active replica- Subsequent discoveries have since allowed 26 of tion (such as the presence of the coding strand these previously dark sequences to be identified for negative sense RNA viruses or DNA viruses), as segments or fragments of viruses that do dis- or if RNA virus sequences are not present as play detectable homology in other regions, in- DNA. The presence and absence of contigs cluding several pieces