Download As Adobe
Total Page:16
File Type:pdf, Size:1020Kb
Edinburgh Research Explorer Expansion of the metazoan virosphere Citation for published version: Obbard, DJ 2018, 'Expansion of the metazoan virosphere: progress, pitfalls, and prospects', Current Opinion in Virology, vol. 31, pp. 17-23. https://doi.org/10.1016/j.coviro.2018.08.008 Digital Object Identifier (DOI): 10.1016/j.coviro.2018.08.008 Link: Link to publication record in Edinburgh Research Explorer Document Version: Peer reviewed version Published In: Current Opinion in Virology Publisher Rights Statement: Allows users to copy, distribute and transmit an article as long as the author is attributed, the article is not used for commercial purposes, and the work is not modified or adapted in any way. General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Download date: 29. Sep. 2021 Expansion of the Metazoan Virosphere: Progress, Pitfalls, and Prospects Darren J Obbard1* 1Institute of Evolutionary Biology, and Centre for Immunity, Infection and Evolution The University of Edinburgh Charlotte Auerbach Road Edinburgh, EH9 3FL *[email protected] Keywords: Virus; Metagenomics; RNAi; Host-range; Picornavirales; Abstract: Metagenomic sequencing has led to a recent and rapid expansion of the animal virome. It has uncov- ered a multitude of new virus lineages from under-sampled host groups, including many that break up long branches in the virus tree, and many that display unexpected genome sizes and structures. Although there are challenges to inferring the existence of a virus from a ‘virus-like sequence’, in the absence of an isolate the analysis of nucleic acid (including small RNAs) and sequence data can provide considerable confidence. As a consequence, this period of molecular natural history is helping to re- shape our views of deep virus evolution. Highlights: Metagenomic discovery now contributes substantially to our view of virus evolution Serendipitous virus sequences from transcriptomes and genomes are under-utilised Small-RNA and strand-specific sequencing aid interpretation of viral metagenomes Barcode-switching and cryptic/unintended host material hamper robust host assignment Systematic host sampling is required to estimate the total number of animal viruses 1 Explosive metagenomic growth Metagenomic discovery has already had a huge impact on our knowledge of virus diversity. It is 120 years since the word ‘virus’ was first It has ‘filled in’ shallower parts of the tree, finding applied specifically to a viral pathogen [1], but the close relatives of iconic human pathogens, such as number of known viruses is growing faster than new influenzas in toads and eels [14]. It has also ever (figure 1A; [2]). Much of this growth is discovered new deep branches, such as clades of through metagenomic discovery: the undirected insect-infecting Partitiviruses [10,11] and Lu- large-scale sequencing of nucleic acids sampled teo/Sobemo-like viruses [10,24], and whole new from potential hosts or their environment [2-4]. families, such as the Chuviruses [25]. This in turn Pioneered by studies of bacteriophage in the ma- has led to renewed interest in inferring deep viral rine environment [5], recent years have witnessed phylogenies [11,26], and has prompted proposals an explosion in metagenomic sampling of the met- for large-scale updates of higher-level virus taxon- azoan virosphere. This boom has focussed first on omy [27]. More importantly, metagenomics now viruses likely to infect us and our livestock, partic- contributes to our thinking on virus evolution. It ularly the virome of mammalian faeces [e.g. 6], on has provided a better perspective on host-associ- putative disease reservoirs such as bats [e.g. 7,8], ation and host-switching [14,28,29], found famil- and on arbovirus vectors [e.g. 9]. Subsequently, iar virus lineages with unexpected genome sizes the focus has expanded to include neglected ani- and structures [11,25,30], and uncovered an unex- mal lineages, identifying hundreds of new RNA vi- pectedly dynamic history of ‘modular’ protein ruses in arthropods and other invertebrates [10- swapping [11,26]. Finally, merely having a PCR 13], and recently in divergent and under-sampled product from a metagenomic sample can provide chordates [14,15]. an experimental route to the functional biology of Compared to the isolation of new virus cul- an uncultured virus [31]. tures, metagenomic discovery seems (relatively) Potential pitfalls cheap, easy, and (virtually) guaranteed—se- quences often appear ‘for free’ when sequencing The recent viral bonanza partly reflects ad- genomes and transcriptomes (Figure 1B-E) [10,16- vances in nucleic acid sequencing, a technology 18]. Nevertheless, there are clearly limitations to that has left Moore’s Law—that computational metagenomic discovery—especially for important power doubles every 2 years—far behind [32]. But applied questions such as “Where is the pandemic sequencing is just one of the challenges to explor- coming from?” [2]. With an isolate in hand we ing the virosphere. The lack of a viable meta-bar- would have more than just a ‘virus-like sequence’: coding sequence means that virus discovery often we could unambiguously confirm the host, be con- takes a full metagenomic approach, sequencing fident we hadn’t been misled by a computational total (or virus-enriched) nucleic acid, and subse- artefact, and study viral replication, host range quently assigning sequences through inferred ho- and immunity [19-21]. However, our catalogue of mology [e.g. 3,33,34]. This is challenging because the virosphere is in its infancy, and there are still high divergence means that only the most con- great gains to be made from simple ‘molecular served sequences are recognisable (e.g. RNA virus natural history’. Fewer than 5 thousand viruses polymerases), and even then, only at the protein have received formal taxonomic recognition [22] level. Sensitive surveys therefore benefit from as- and only around 15 thousand have even been sembled contigs rather than raw reads (so that di- named informally (Figure 1A). This is less compre- vergent genes are linked to recognisable ones) hensive than the 17th century view of plant diver- and protein rather than nucleic-acid similarity sity, even in absolute terms [ca. 18 thousand searches (because divergence is high). This can be species, 23], but few biologists today would claim done using off-the-shelf assemblers and search al- the naturalists of subsequent centuries wasted gorithms such as SPADes [35] or Trinity [36], and their effort when making herbarium collections. Diamond [37], but there is also a growing ecosys- And a modern evolutionary virologist can proba- tem of virus-specific metagenomic packages and bly learn more from a virus genome than a 17th pipelines available [34]. century botanist could from a dried specimen. 2 As with any field in rapid development, best mina platforms can misattribute reads among li- practice is uncertain and fluid, and there are pit- braries at rates of up to 1% [44], and while this is falls for the unwary [3]. For example, although the reduced by incorporating barcodes in both adap- assembly of virus (especially RNA virus) genomes tors (‘dual indexing’), it is not always completely is facilitated by their small size and largely unre- mitigated. Multi-host samples are often explicitly petitive nature, the high complexity of meta- recognised as such, for example those from ‘holo- genomic pools tends to promote artefactual and bionts’ such as anemones [45]. However, the chimeric contigs [4,38]. These can unite viral se- multi-host nature of other samples is sometimes quences with non-viral ones, especially high-copy- downplayed. For example, faecal samples are of- number host sequences such as those from mito- ten dominated by viruses infecting the gut micro- chondria and ribosomes. Such ‘wide’ chimeras are biota and/or organisms in the host’s diet [46,47], partly mitigated by the use of paired-end and but virus-like sequences are sometimes reported strand-specific reads, ensuring effective adaptor (at least in the headline) as if they were viruses of removal, and (when possible) removing host reads the faecal donor itself. And, if nucleic acids or viri- before assembly (although this can introduce ons are prepared from whole host individuals, vi- problems if virus reads can cross-map to the host). ruses in faecal matter and viral infections of para- Chimeric mis-assemblies among divergent viruses sites (notably nematodes, platyhelminthes, and or viral segments are also possible, especially microscopic arthropods) and pathogens (fungi, when they share near identical stretches of se- trypanosomatids, apicomplexans, amoebae, and quence, such as structural RNA motifs or terminal many others) will also be represented among the repeats. These are harder to diagnose, and may ul- sequences. Pre-screening of samples for specific timately require PCR verification, but can often be parasites by PCR [such as nematodes, e.g. 13]— flagged by comparison with close relatives (if can mitigate against this, as can tissue dissection available), unexpected local variation in read- [14] (although at the potential risk of biasing dis- depth, and comparison across metagenomic sam- covery toward viruses with a strong tissue tro- ples. pisms). However, the potential for viral infections of eukaryotic parasites within the metazoan host These challenges aside, discovering a ‘virus- means that even dissected tissue may be crypti- like sequence’ remains easier than confirming its cally multi-host. For example, the only dimar- status as an infectious agent of the targeted host.