New insight into the RNA aquatic virosphere via viromics

Alexander Culley a,b,c,d
a Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Québec, G1V 0A6, Canada
b Centre d’études nordiques (CEN), Université Laval, Québec, Québec, G1V 0A6, Canada
c Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Québec, G1V 0A6, Canada
d Takuvik, Unité Mixte Internationale (UMI 3376), Université Laval (Canada) & Centre National de la Recherche Scientifique (France), Québec, QC, G1V 0A6, Canada

Accepted for publication in Research 244 (2018) 84–89

Keywords: RNA virus, Viromics, Metatranscriptomics, Viral ecology, Aquatic

Abstract

RNA viruses that infect microbes are now recognized as an active, persistent and important component of the aquatic microbial community. While some information about the diversity and dynamics of the RNA virioplankton has been derived from culture-based and single-gene approaches, research based on viromic and metatranscriptomic methods has generated unprecedented insight into this relatively understudied class of microbes. Here, the relevant literature is summarized and discussed, including viromic studies of extracellular aquatic RNA viral assemblages and transcriptomic studies of active and associated RNA viruses from aquatic environments, followed by commentary on the present challenges and future directions of this field of research.

1. Introduction

1.1. The aquatic virioplankton

The most abundant and diverse group of biological entities in aquatic environments are the protists, prokaryotes, and viruses that comprise the microbial community. Microbes play a critical role in the cycling of nutrients and energy, and therefore understanding the dynamics and interactions of this group is vital to understanding the ecology of aquatic ecosystems as a whole. An important, but relatively understudied, component of the microbial community is the viruses.

The study of aquatic viruses that infect microbes has focused almost exclusively on viruses with double-stranded (ds)DNA genomes, and for good reason. Viruses with dsDNA genomes are the most abundant (Wigington et al., 2016) and diverse (Suttle, 2016) microbes in aquatic environments. They regulate abundance and influence population structure, catalyze evolution through the mediation of gene exchange (reviewed in Suttle, 2007) and ultimately modify biogeochemical cycles of fundamental importance (reviewed in Weitz and Wilhelm, 2012). Double-stranded DNA phages are undoubtedly major contributors to the virioplankton (the community of extracellular aquatic viruses); however, it is now apparent that this assemblage is more diverse than previously appreciated. Communities of largely uncharacterized viruses with single-stranded (ss)DNA (Labonté and Suttle, 2013), and ssRNA and double-stranded (ds)RNA genomes appear to be ubiquitous and widespread in a range of aquatic environments (see below). In fact, there are limited data that suggest that at times the abundance of RNA viruses can exceed that of dsDNA phages (Steward et al., 2013). However, a comprehensive survey of the morphology of viruses from the world's oceans indicates that this may not be the status quo (Brum et al., 2013). Moreover, sequences closely related to a group of viruses with massive dsDNA genomes have been identified in marine microbial metagenomic libraries (Monier et al., 2008), suggesting that these viruses are also persistent constituents of the virus community. Integrating the diverse and complex groups that comprise the virioplankton into a cohesive ecological picture presents a substantial challenge, and yet it is essential if we are to attain a better understanding of aquatic microbial ecology.

The observation that viruses with a range of genome types exist in aquatic environments raises the question of whether it matters, from an ecological perspective, if a virus has an RNA or dsDNA genome. Although the ecological ramifications of a virus’s genome composition have yet to be elucidated, the most parsimonious answer would seem to be yes. The activity of a virus is inextricably linked to the composition of its genome. If we summarize the general characteristics of the RNA and dsDNA viruses currently in the ICTV database (https://talk.ictvonline.org/taxonomy/), RNA viruses generally differ from dsDNA viruses in the following ways: the replication of the RNA viral genome is prone to orders of magnitude more error; replication takes place in the cytoplasm; the average genome size is significantly smaller; there are fewer species of RNA viruses that infect prokaryotes; and, with the important exception of retroviruses, RNA viruses do not expressly integrate their genome into the chromosome of their host. These fundamental differences between RNA and DNA viruses impact the dynamics of the host-virus interaction and ultimately the overall functioning of the ecosystem.

1.2. Microbial RNA viruses in aquatic habitats: viral isolates

The first data on RNA viruses in the ocean were based on the isolation of RNA viruses that infect aquatic animals of economic importance. These viruses span all possible genome configurations and infect a suite of invertebrates, fish, seabirds and marine mammals (Lang et al., 2009). Although informative and important, these studies did not contribute to an understanding of the ecology of aquatic RNA viruses: that is their abundance, diversity, interactions and role in key energy, nutrient, and evolutionary dynamics.

Early indications that RNA viruses may be important contributors from an ecological perspective resulted from the isolation of viruses that infected various species of phytoplankton, i.e. photosynthetic single-celled eukaryotes (protists) that form the base of the aquatic food web. RNA viruses have now been isolated that infect most major taxa of photosynthetic protists. These include diatoms (Kimura and Tomaru, 2015; Nagasaki et al., 2004a; Shirai et al., 2008; Tomaru et al., 2009; Tomaru et al., 2012), dinoflagellates (Nagasaki et al., 2004b), raphidophytes (Tai et al., 2003), prasinophytes (Brussaard et al., 2004) and thraustochytrids (Takao et al., 2006). Surprisingly, there has only been one report of an RNA phage isolated from an aquatic environment (Hidaka and Ichida, 1976). With some exceptions, these viruses are lytic, have small , large burst sizes and have positive-sense, single-stranded genomes (Fig. 1).

1.3. Microbial RNA viruses in aquatic habitats: single-gene surveys

Studies based on cultivation-independent approaches indicate that the diversity among the marine RNA viruses far exceeds, and is not well represented by, the limited number of isolates currently in hand. In an early study, an RT-PCR assay for RNA-dependent RNA polymerase (RdRP) genes was used with RNA extracted from seawater to survey the diversity within a “superfamily” of RNA viruses known as the picorna-like viruses (Culley et al., 2003). None of the environmental sequences detected fell within previously established virus families, suggesting that the four clades formed by these sequences may represent four new families of RNA viruses (Culley et al., 2003). A subsequent study again assessed the diversity of picorna-like viruses in seawater. This work included the direct filtration of relatively small volumes of water and RT-PCR amplification of an RdRP gene fragment using improved primer sets (Culley and Steward, 2007). Viral polymerase sequences were amplified from several distinct aquatic environments, including an estuarine urban canal, a subtropical bay, and a temperate bay. Amplification occurred in samples from the same site in different seasons and at different depths, showing that RNA viruses were widespread and persistent. Sequencing of the amplified gene fragments revealed novel sequences that are highly divergent from any known isolates. The sequences formed a loose cluster together with other RdRP gene fragments derived from temperate seawater, but none of the sequences from the subtropical environment were identical to sequences from temperate waters. Phylogenetic analysis suggested that the virotypes from the study represented five new genera and 24 new species of RNA viruses. If it is assumed that all or most of the distinct environmental virotypes have different hosts, the data imply that a diverse assemblage of marine protists may be infected with viruses at any given time. In a more recent study, Gustavsen et al. (2014) examined the temporal and spatial dynamics of picorna-like RNA viruses in the coastal waters of British Columbia, based on RT-PCR amplification and deep sequencing of the RdRP molecular marker described above. While greatly expanding the catalogue of picornavirad RdRPs from the marine environment, they found that the community composition of these viruses was extremely variable over time and space. This suggests that the hosts of these viruses, likely protists, are subjected only to intermittent attack by this subset of RNA viruses. Single-gene surveys based on the amplification of a molecular marker are a useful, sensitive and straightforward means of characterizing the diversity of a particular virus group. However, unlike cellular organisms, viruses do not share a universal gene, and thus this method is not appropriate for a global assessment of diversity.

2. Purpose and scope of the review

A number of recent reviews have discussed the application of metagenomic techniques to the investigation of viral ecology in aquatic environments (see Hayes et al., 2017 and Perez-Sepulveda et al., 2016 for example). However none of these have focused exclusively on RNA viromics. The timing of this overview seems appropriate because RNA viruses are starting to be accepted as a ubiquitous and persistent component of the aquatic virioplankton and more sensitive and powerful sequencing and bioinformatic techniques are available that now allow a more comprehensive characterization of RNA from environmental samples. Nevertheless, as will become apparent below, aquatic RNA viromics is a nascent field and there are many fundamental questions about the RNA virus community that remain unanswered. In order to maintain cohesion, I have restricted this review to RNA viruses that infect primarily naturally occurring planktonic microbes in aquatic environments (Table 1). This has resulted in the omission of several studies associated with aquatic environments that employed RNA viromics or detected RNA viruses in a metatranscriptomic-based study, including the characterization of RNA viruses in waste water (Rosario et al., 2009), sewage (Cantalupo et al., 2011; Ng et al., 2012), the seaweed (Lachnit et al., 2015) and coral holobiont (reviewed in Thurber et al., 2017) and in deep subseafloor sediments (Engelhardt et al., 2015).

3. Microbial RNA viromics

3.1. Marine viromics

The application of metagenomics to the study of RNA viruses has expanded the known RNA virosphere exponentially in the past decade (Shi et al., 2016; Krishnamurthy and Wang, 2017). In contrast to cultivation-based or barcoding approaches, metagenomic methods provide a more global assessment of community sequence diversity and thus a more comprehensive view of the complete genetic landscape of an RNA virus assemblage. The first study that used RNA viromics to investigate the diversity of an extracellular aquatic RNA virus assemblage is described in Culley et al. (2006). Here, reverse-transcribed whole-genome shotgun libraries were produced with a modified linker amplified shotgun library (LASL) approach (Breitbart et al., 2002) in order to characterize two marine RNA virus communities from coastal British Columbia. For both samples, the majority of sequences had no significant similarity to known sequences. Of the sequences with homology to viruses in the database, one RNA virus library (JP) was characterized by diverse picorna-like viruses, while the second library (SOG) was dominated by viruses related to members of the family (Culley et al., 2006). In the latter library, 59% of the sequence reads that formed overlapping contiguous sequences (contigs) fell into one segment. Similarly, 66% of JP sequence fragments contributed to only four contigs (Culley et al., 2006). A PCR-based approach to increase the amount of sequence for each dominant contig resulted in the assembly of three complete viral genomes (Culley et al., 2007). These presumably represented the most abundant viruses in the two marine RNA virus communities sampled. Based on cloning and Sanger sequencing, these results were derived from what would now be considered a woefully insufficient number of sequence reads. Nevertheless, the data generated the first hypothesis about the nature of the in situ aquatic RNA virosphere: that positive-sense (+)ssRNA viruses that infect eukaryotes are dominant.

A subsequent study used a viromic approach with more powerful sequencing techniques to characterize the RNA virioplankton in a coastal tropical bay in Hawaii (Culley et al., 2014; Steward et al., 2013). Consistent with the data from coastal British Columbia, a majority of sequences were unknown. When only the classified sequences were considered, the tropical RNA viromes were once again dominated by sequences from +ssRNA viruses that shared characteristics with viruses classified in the order Picornavirales. Further supporting the conclusion that picornavirads were dominant in these samples, the distribution of picornavirad molecular markers in buoyant density gradient fractions directly corresponded with the distribution of total viral RNA. Assembly resulted in the reconstruction of six complete or near-complete genomes that shared synteny and had homology with viruses from the taxa , , and Bacillarnavirus. Although the direct quantification of RNA viruses from aquatic samples may be possible in the near future, current virus quantification techniques have proven inadequate (Brussaard et al., 2000; Tomaru and Nagasaki, 2007). In order to estimate the relative contribution of RNA viruses in the tropical seawater samples described above, Steward et al. (2013) coupled measurements of the relative masses of total viral RNA and DNA with estimates of the mass of nucleic acid per RNA or DNA virus as determined by the characterization of the RNA virome. The results were surprising because they suggested that in some instances the number of RNA virus genomes could exceed the number of DNA genomes, seemingly contradicting the assumed numerical dominance of dsDNA viruses in seawater. It must be stated, however, that this approach is based on a number of poorly supported assumptions, the violation of any of which would result in markedly different conclusions.
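
The arithmetic behind a mass-based estimate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculation performed by Steward et al. (2013): the function names, the per-nucleotide and per-base-pair molar masses (rounded textbook approximations), and the example genome lengths and masses are all hypothetical and chosen only to show how a mass ratio is converted into a genome-count ratio.

```python
# Hypothetical sketch: convert nucleic acid masses into genome counts.
# All constants and inputs are illustrative assumptions, not values from the source.

AVOGADRO = 6.022e23          # molecules per mole
G_PER_MOL_SSRNA_NT = 340.0   # assumed approximate molar mass of one ssRNA nucleotide
G_PER_MOL_DSDNA_BP = 650.0   # assumed approximate molar mass of one dsDNA base pair


def genome_copies(mass_ng, genome_length, g_per_mol_per_unit):
    """Number of genome copies represented by a given nucleic acid mass."""
    grams = mass_ng * 1e-9
    moles = grams / (genome_length * g_per_mol_per_unit)
    return moles * AVOGADRO


def rna_virus_fraction(rna_mass_ng, dna_mass_ng, rna_len_nt, dna_len_bp):
    """Fraction of all virus genomes that are RNA genomes.

    Assumes one genome per virion and a single average genome length
    for each nucleic acid type.
    """
    rna = genome_copies(rna_mass_ng, rna_len_nt, G_PER_MOL_SSRNA_NT)
    dna = genome_copies(dna_mass_ng, dna_len_bp, G_PER_MOL_DSDNA_BP)
    return rna / (rna + dna)


if __name__ == "__main__":
    # Equal masses, but a much smaller assumed average RNA genome.
    fraction = rna_virus_fraction(1.0, 1.0, rna_len_nt=9000, dna_len_bp=50000)
    print(f"Estimated RNA virus fraction: {fraction:.2f}")
```

Because RNA virus genomes are typically much smaller than dsDNA phage genomes, a modest mass of viral RNA can correspond to a large number of genomes. The result is highly sensitive to the assumed average genome lengths, which is one reason such estimates should be treated cautiously.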

In an effort to expand our understanding of RNA virus biogeography and temporal dynamics, a study was conducted over one summer at Palmer Station on the Western Antarctic Peninsula (Miranda et al., 2016). Once again the trend continued. Although a majority of sequences were unknown, among the identified reads, a group of +ssRNA viruses related to picornavirads were dominant throughout the austral summer. Further, precipitous declines in chlorophyll were concomitant with increases in the total mass of virus-like RNA, a relationship that suggests that the most likely hosts of these viruses were diatoms, the most abundant photosynthetic organism in these samples. Several essentially complete RNA genomes were assembled which were most similar to the genomes of bacillarnaviruses, a taxon of viruses that infect diatoms. Additionally, the relative contribution of RNA viruses over the summer was determined in the same manner as Steward et al. (2013) and resulted in percent estimates ranging from 2 to 79% of the total. In summary, three sets of viromes were collected from three distinct marine regions (temperate, tropical and polar), and in all three cases picornavirad-like sequences were dominant. Moreover, an estimate of relative RNA virus abundance from tropical and polar coastal waters suggests that RNA viruses are more abundant than anticipated, and at times can be the majority virotype in a particular sample. Finally, none of the viromes from the three regions had sequences with significant homology to RNA phages. These data raise the following possibilities, one or more of which may apply: that most marine RNA viruses do not infect bacteria; that the analytical tools used in these studies were unable to identify the RNA phages in the sample; and/or that the sequencing depth was not sufficient to uncover the RNA phages in these samples. Whatever the explanation, these results merit further consideration and verification.

3.2. Freshwater viromics

The first RNA viromes from a freshwater environment were produced by Djikeng et al. (2009) to characterize the viral community associated with a highly eutrophic, man-made temperate lake in two seasons, winter and summer. Libraries were constructed with a sensitive, relatively simple and unbiased metagenomic library construction technique based on a modified version of the sequence-independent single primer amplification (SISPA) method (Djikeng et al., 2008). Their analysis indicated that, as with the marine RNA virioplankton, most of the sequences had no discernible similarity to known sequences. An analysis of the sequences with significant similarity to characterized RNA viruses indicated that the samples included approximately 30 viral families, including viruses that infect plants, insects, vertebrates and humans, with sequences from the order Picornavirales being the most abundant. These results demonstrate how a lake in a catchment with a diversity of land uses can serve as a repository for RNA viruses, including pathogens of crops, livestock and humans. For example, a segmented dsRNA virus that is a close relative of the Banna virus, a pathogen that causes encephalitis, was present in the library. It should also be noted that, as with the marine data, none of the sequences from these RNA viromes were identified as being related to RNA phages.

In a follow-up to a seminal seasonal study of DNA virus diversity in an Antarctic lake (López-Bueno et al., 2009), Lopez-Bueno et al. (2015) explored RNA viral diversity, including quasispecies dynamics, in the same location as well as in microbial mats in the surrounding water basin. As with Djikeng et al. (2009), Culley et al. (2014) and Miranda et al. (2016), libraries were produced with the SISPA method. Both the Antarctic lake and microbial mat RNA viromes had a high percentage of sequences with no known homology. Of the sequences that were identified, picorna-like virus sequences were the most common, and no RNA phage-like sequences were detected. Assembly yielded four essentially complete RNA virus genomes. The authors postulated that differences in the distribution and seasonality of these genomes are reflected in differences in quasispecies complexity. Relating the viral mutation rate of assembled RNA genomes to the ecology and hydrology of the lake environment adds a novel perspective to this study (Cavicchioli and Erdmann, 2015), and it should be considered in the analysis of RNA viromes in the future.

An innovative use of viromics was employed by Bolduc et al. (2012) to identify putative RNA viruses that infect Archaea in high-temperature acidic hot springs populated primarily with Archaea. After assaying several hot springs for detectable viral RNA, viromes were produced from a selection of sites, assembled, and screened for the presence of an RdRP, a signature gene of +ssRNA viruses. In a further effort to confirm that these sequences originated from RNA viruses, the putative RNA sequences were compared to concomitantly collected viral DNA metagenomes. Sequences that had significant similarity to the DNA viromes were removed from downstream analysis. Of the remaining contigs, one was over 5 kb in length and contained regions that were homologous to an RdRP and an RNA virus gene. Phylogenetic and statistical analyses of these genes support the idea that this putative virus does not belong in any of the recognized taxa of eukaryotic or bacterial viruses. Further evidence of a link between the RNA virome-assembled genome and an archaeal host resulted from an analysis of the infection history of the cellular community captured in the spacer content of the CRISPR cassette generated through metagenomics. Here, CRISPR modules from the archaeal genus Sulfolobus were shown to have acquired sequences from the RNA viromes described above, a realistic possibility as apparently some archaeal CRISPR systems are capable of targeting RNA. However, none of the spacer sequences were definitively identified, and thus it is unclear whether these sequences are viral in origin. It was subsequently pointed out that there is a lack of congruence in codon usage between Archaea and the putative RNA viruses discovered in this study, calling into question whether the hosts of these viruses are indeed Archaea (Stedman et al., 2013). 
However, in addition to the fact that codon usage in viruses does not necessarily correspond to that of their hosts (Young et al., 2013), multiple independent lines of evidence all point to the conclusions of the paper being justified. This research is an exciting and effective demonstration of how a viromics-based approach can be directed towards the discovery of a specific target (in this case an archaeal phage). It also is a demonstration of the added breadth of complementary information and analyses available when DNA virus, RNA virus and cellular metagenomes are generated concurrently.
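
The screening logic described for the hot spring viromes (retain assembled contigs that carry an RdRP-like region and discard those that also match the paired DNA viromes) can be sketched in a few lines. This is only an illustrative outline, not the pipeline of Bolduc et al. (2012): the `Contig` record, its field names and the `min_length` parameter are hypothetical, and the similarity searches that would populate the boolean flags are assumed to have been run beforehand.

```python
# Hypothetical sketch of a contig screen for putative RNA virus sequences.
# The record fields are assumptions; upstream similarity searches are not shown.
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int                 # contig length in nucleotides
    has_rdrp_hit: bool          # significant similarity to a known RdRP
    matches_dna_virome: bool    # significant similarity to the paired DNA virome


def candidate_rna_virus_contigs(contigs, min_length=0):
    """Keep contigs with an RdRP hit and no match in the DNA virome."""
    return [
        c for c in contigs
        if c.has_rdrp_hit and not c.matches_dna_virome and c.length >= min_length
    ]


if __name__ == "__main__":
    example = [
        Contig("contig_A", 5600, has_rdrp_hit=True, matches_dna_virome=False),
        Contig("contig_B", 3100, has_rdrp_hit=True, matches_dna_virome=True),
        Contig("contig_C", 7200, has_rdrp_hit=False, matches_dna_virome=False),
    ]
    for contig in candidate_rna_virus_contigs(example, min_length=1000):
        print(contig.name, contig.length)
```

Removing contigs that match the DNA virome is a conservative choice: it guards against DNA-derived sequences being mistaken for RNA genomes, at the cost of possibly discarding genuine RNA virus sequences that share similarity with DNA viruses.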

3.3. Viral metatranscriptomics

Increased sequencing power, better bioinformatic tools and greater computing resources have resulted in a deeper exploration of the metatranscriptome of aquatic microbial communities (discussed in Moran et al., 2012). Metatranscriptomic data are now being used to detect active viruses, and viruses with RNA genomes, within communities of host cells. Although the depth of sequencing and library construction methodology can limit the groups of RNA viruses detected with this approach, the concomitant evaluation of virus and host activity promises to produce more information about the interplay between virus and host communities. This section is divided into two parts: a discussion of the analysis of aquatic microbe-infecting RNA virus communities from metatranscriptomic data alone; and a review of research based on the analysis of paired RNA and DNA viromes with the accompanying metatranscriptome.

A recent study used metatranscriptomics to follow the dynamics of active viruses during a brown tide bloom in two temperate bays (Moniruzzaman et al., 2017). The initial step in library construction involved the production of a cDNA copy of the RNA transcript primed from the poly-A tail. This step limited the data to viruses that commandeer the host eukaryotic cellular machinery to produce RNA transcripts and to intra- or extracellular RNA viruses with genomes with poly-A tails that were associated with the host cell size fraction. As well as detecting an abundance of DNA giant virus transcripts, a high percentage of the RNA virus sequences identified were categorized as unclassified picornavirads. Few dsRNA sequences were detected, and no sequences from negative-sense (−)ssRNA viruses or, as expected, RNA phages were identified. Although these data correspond with the community composition observed in the viromes of extracellular RNA viruses (see above), these data are undoubtedly a biased portrait of RNA virus diversity. Over the initiation, evolution and demise of the bloom, the peak of RNA viral sequence abundance coincided with the decline of the bloom. This suggests that these viruses are directly involved in the lysis of the brown tide forming phytoplankter, or that they infect the species of protists that succeed the bloom former as its abundance declines (or both). Eighteen genomes were assembled from the metatranscriptomes and all eighteen shared common characteristics with picornavirads. An estimate of the relative abundance of these genomes over the sampling period indicated that they were all most active during the later phase of the bloom. The direct assignment of each assembled genome to a particular host was not possible. 
Nevertheless, networking analyses based on the co-occurrence of host and RNA virus marker genes suggested that some hosts may be susceptible to infection by both RNA and DNA viruses, and that many other groups of protists are infected by RNA viruses than are currently in culture. This study serves as a demonstration of a new strategy to investigate the activity of RNA viruses through non-cultivation approaches. Although somewhat complicated by the fact that adsorbed virions encapsidating genomes with poly-A tails are included in the final dataset, the fact that metatranscriptomics produces a direct assessment of only the active component of the virus community offers the clearest picture yet of in situ virus and host interactions on a community level.
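
To make the idea of a co-occurrence analysis concrete, the sketch below links virus and host markers whose abundance profiles across samples are strongly correlated. It is a deliberately simplified illustration and not the network method used by Moniruzzaman et al. (2017): the use of Pearson correlation, the threshold of 0.9, the function names and the example profiles are all assumptions made for this example.

```python
# Hypothetical sketch: virus-host co-occurrence edges from abundance profiles.
# Pearson correlation and the 0.9 threshold are illustrative assumptions.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return cov / sqrt(var_x * var_y)


def cooccurrence_edges(virus_profiles, host_profiles, threshold=0.9):
    """Return (virus, host, r) for every pair with correlation >= threshold."""
    edges = []
    for virus, v_abund in virus_profiles.items():
        for host, h_abund in host_profiles.items():
            r = pearson(v_abund, h_abund)
            if r >= threshold:
                edges.append((virus, host, r))
    return edges


if __name__ == "__main__":
    viruses = {"virus_1": [1, 2, 3, 4]}
    hosts = {"host_A": [2, 4, 6, 8], "host_B": [4, 3, 2, 1]}
    for virus, host, r in cooccurrence_edges(viruses, hosts):
        print(virus, host, round(r, 3))
```

A correlation of this kind only indicates that a virus marker and a host marker rise and fall together. It cannot by itself establish infection, which is consistent with the caveat above that direct host assignment was not possible.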

The most multifaceted approach to the study of aquatic RNA viral ecology to date was produced by Zeigler-Allen et al. (2017). In this work, samples for metatranscriptomics and RNA and DNA viromics were collected from size fractions of different classes of microbes from sites in and around the Baltic Sea and from a Swedish Arctic lake. As well as being a critical commercial waterway, the Baltic Sea is an ecosystem of particular interest because it experiences dramatic gradients of salinity, nutrients and oxygen and is heavily affected by anthropogenic activities. Unlike the brown tide study (Moniruzzaman et al., 2017), the construction of metatranscriptomes did not involve poly-A priming, therefore avoiding the introduction of a bias towards RNA virus genomes with poly-A tails. Although RNA virus sequences were lower in abundance than DNA viruses, they were detected at every station sampled in both viromes and metatranscriptomes (including the Arctic lake). In contrast to DNA viruses, the highest relative number of RNA viruses was associated with the 0.8–3.0 μm size-class, a size fraction that typically includes protists, a result that reaffirms the culture-based and non-culture-based results of previous studies from other aquatic environments. Unlike previous studies, however, the most highly represented group of RNA viruses was dsRNA. For the first time, sequences were also detected with similarity to viruses in the order , a taxon of viruses with −ssRNA genomes that include an assortment of important human and animal pathogens. In addition to the usual aquatic picornavirad viruses detected elsewhere, groups of +ssRNA viruses with similarity to taxa such as the genus Ourmiavirus, an unassigned group of plant viruses, were present as well. Additionally, many sequences were similar to retroviruses or retroviral elements. It is difficult to pinpoint exactly why this study detected a significantly greater diversity of RNA virus groups. It may be that the Baltic Sea region harbors a more diverse community of RNA viruses than other regions, or that the combination of viromics and metatranscriptomics effectively captures a wider range of viruses, or that a greater depth of viral sequence space was characterized in this study.

4. Future directions

It should be evident from this review that our understanding of the ecology of wild RNA virus communities in aquatic environments remains rudimentary. Fundamental questions on the total abundance, residence times, host ranges, reservoirs and replication dynamics remain unanswered. Although certain trends can be distilled from the research that has been completed, for example the apparent, but not exclusive, dominance of picornavirads, these data have been generated from a small number of unlinked studies based on a variety of methods and different generations of sequencing technologies (Table 1). The field of study can be advanced by synchronizing measurements of the extra- and intracellular virome as demonstrated by Zeigler-Allen et al. (2017); the development of new methods, for example the separation of viral communities by genome type prior to sequencing (Andrews-Pfannkoch et al., 2010), the isolation and characterization of the viral dsRNA intermediary from infected cells (Decker and Parker, 2014; Urayama et al., 2016) and the direct sequencing of RNA (https://nanoporetech.com/rna); the reconfiguration of bioinformatics tools developed for phage ecology (Bolduc et al., 2017; Páez-Espino et al., 2017; Roux et al., 2015; Watkins and Putonti, 2017) to the study of RNA viruses; and a focus on uniform, integrated and ideally global sampling programs (for example Brum et al., 2015; Roux et al., 2016). Significant challenges remain, including the characterization of the large percentage of unknown, presumably RNA viral, reads and the lack of a rigorous evaluation of the standard methods used to generate RNA viromes (Karlsson et al., 2013; Miranda and Steward, 2017). Nevertheless, given the important progress that has been made in such a relatively short period of time, the future of aquatic RNA virus ecology is promising.

Acknowledgements

Research in the Culley laboratory is supported primarily by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Fonds de recherche du Québec - Nature et technologies (FRQNT) and the program Sentinel North financed by the Canada First Research Excellence Fund (CFREF). I would also like to thank Amanda Toperoff for her assistance with Fig. 1.

References

Andrews-Pfannkoch, C., Fadrosh, D.W., Thorpe, J., Williamson, S.J., 2010. Hydroxyapatite mediated separation of dsDNA, ssDNA and RNA genotypes from natural viral assemblages. Appl. Environ. Microbiol. 76 (15), 5039–5045. Bolduc, B., Shaughnessy, D.P., Wolf, Y.I., Koonin, E.V., Roberto, F.F., Young, M., 2012. Identification of novel positive-strand RNA viruses by metagenomic analysis of ar- chaea-dominated Yellowstone hot springs. J. Virol. 86 (10), 5562–5573. Bolduc, B., Jang, H.B., Doulcier, G., You, Z.-Q., Roux, S., Sullivan, M.B., 2017. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243. Breitbart, M., Salamon, P., Andresen, B., Mahaffy, J.M., Segall, A.M., Mead, D., Azam, F., Rohwer, F., 2002. Genomic analysis of uncultured marine viral communities. Proc. Natl. Acad. Sci. U. S. A. 99 (22), 14250–14255. Brum, J.R., Schenck, R.O., Sullivan, M.B., 2013. Global morphological analysis of shows minimal regional variation and dominance of non-tailed viruses. ISME J. 7 (9), 1738–1751. Brum, J.R., Ignacio-Espinoza, J.C., Roux, S., Doulcier, G., Acinas, S.G., Alberti, A., Chaffron, S., Cruaud, C., De Vargas, C., Gasol, J.M., Gorsky, G., Gregory, A.C., Guidi, L., Hingamp, P., Iudicone, D., Not, F., Ogata, H., Pesant, S., Poulos, B.T., Schwenck, S.M., Speich, S., Dimier, C., Kandels-Lewis, S., Picheral, M., Searson, S., Tara Oceans Coordinators, Bork, P., Bowler, C., Sunagawa, S., Wincker, P., Karsenti, E., Sullivan, M.B., 2015. Ocean plankton. Patterns and ecological drivers of ocean viral commu- nities. Science 348 (6237), 1261498. Brussaard, C.P.D., Marie, D., Bratbak, G., 2000. Flow cytometric detection of viruses. J. Virol. Methods 85 (1–2), 175–182. Brussaard, C.P.D., Noordeloos, A.A.M., Sandaa, R.-A., Heldal, M., Bratbak, G., 2004. Discovery of a dsRNA virus infecting the marine photosynthetic protist Micromonas pusilla. Virology 319 (2), 280–291. 
Cantalupo, P.G., Calgua, B., Zhao, G., Hundesa, A., Wier, A.D., Katz, J.P., Grabe, M., Hendrix, R.W., Girones, R., Wang, D., Pipas, J.M., 2011. Raw sewage harbors diverse viral populations. mBio 2 (5), e00180–00111. Cavicchioli, R., Erdmann, S., 2015. The discovery of Antarctic RNA viruses: a new game changer. Mol. Ecol. 24 (19), 4809–4811. Culley, A.I., Steward, G.F., 2007. New genera of RNA viruses in subtropical seawater, inferred from polymerase gene sequences. Appl. Environ. Microbiol. 73 (18), 5937–5944. Culley, A.I., Lang, A.S., Suttle, C.A., 2003. High diversity of unknown picorna-like viruses in the sea. Nature 424 (6952), 1054–1057.

Culley, A., Lang, A., Suttle, C., 2006. Metagenomic analysis of coastal RNA virus communities. Science 312 (5781), 1795–1798.
Culley, A.I., Lang, A.S., Suttle, C.A., 2007. The complete genomes of three viruses assembled from shotgun libraries of marine RNA virus communities. Virol. J. 4, 69.
Culley, A.I., Mueller, J.A., Belcaid, M., Wood-Charlson, E.M., Poisson, G., Steward, G.F., 2014. The characterization of RNA viruses in tropical seawater using targeted PCR and metagenomics. mBio 5 (3), e01210–01214.
Decker, C.J., Parker, R., 2014. Analysis of double-stranded RNA from microbial communities identifies double-stranded RNA virus-like elements. Cell Rep. 7 (3), 898–906.
Djikeng, A., Halpin, R., Kuzmickas, R., Depasse, J., Feldblyum, J., Sengamalay, N., Afonso, C., Zhang, X., Anderson, N.G., Ghedin, E., Spiro, D.J., 2008. Viral genome sequencing by random priming methods. BMC Genomics 9, 5.
Djikeng, A., Kuzmickas, R., Anderson, N.G., Spiro, D.J., 2009. Metagenomic analysis of RNA viruses in a fresh water lake. PLoS One 4 (9), e7264.
Engelhardt, T., Orsi, W.D., Jørgensen, B.B., 2015. Viral activities and life cycles in deep subseafloor sediments. Environ. Microbiol. Rep. 7 (6), 868–873.
Gustavsen, J.A., Winget, D.M., Tian, X., Suttle, C.A., 2014. High temporal and spatial diversity in marine RNA viruses (Picornavirales) revealed by high-throughput sequencing implies an important role in mortality and structuring plankton communities. Front. Microbiol. 5, 1–13.
Hayes, S., Mahony, J., Nauta, A., van Sinderen, D., 2017. Metagenomic approaches to assess bacteriophages in various environmental niches. Viruses 9 (6) 127–122.
Hidaka, T., Ichida, K., 1976. Properties of a marine RNA-containing bacteriophage. Mem. Faculty Fish. Kagoshima Univ. 25, 77–89.
Karlsson, O.E., Belák, S., Granberg, F., 2013. The effect of preprocessing by sequence-independent, single-primer amplification (SISPA) on metagenomic detection of viruses. Biosecur. Bioterror. 11 (1), S227–34.
Kimura, K., Tomaru, Y., 2015. Discovery of two novel viruses expands the diversity of single-stranded DNA and single-stranded RNA viruses infecting a cosmopolitan marine diatom. Appl. Environ. Microbiol. 81 (3), 1120–1131.
Krishnamurthy, S.R., Wang, D., 2017. Origins and challenges of viral dark matter. Virus Res. 239 (July), 136–142. http://dx.doi.org/10.1016/j.virusres.2017.02.002.
López-Bueno, A., Tamames, J., Velázquez, D., Moya, A., Quesada, A., Alcamí, A., 2009. High diversity of the viral community from an Antarctic lake. Science 326 (5954), 858–861.
Labonté, J.M., Suttle, C.A., 2013. Previously unknown and highly divergent ssDNA viruses populate the oceans. ISME J. 7 (11), 2169–2177.
Lachnit, T., Thomas, T., Steinberg, P., 2015. Expanding our understanding of the seaweed holobiont: RNA viruses of the red alga Delisea pulchra. Front. Microbiol. 6, 1489.
Lang, A.S., Rise, M.L., Culley, A.I., Steward, G.F., 2009. RNA viruses in the sea. FEMS Microbiol. Rev. 33 (2), 295–323.
Lopez-Bueno, A., Rastrojo, A., Peiró, R., Arenas, M., Alcami, A., 2015. Ecological connectivity shapes quasispecies structure of RNA viruses in an Antarctic lake. Mol. Ecol. 24 (19), 4812–4825.
Miranda, J.A., Steward, G.F., 2017. Variables influencing the efficiency and interpretation of reverse transcription quantitative PCR (RT-qPCR): An empirical study using Bacteriophage MS2. J. Virol. Methods 241, 1–10.
Miranda, J.A., Culley, A.I., Schvarcz, C.R., Steward, G.F., 2016. RNA viruses as major contributors to Antarctic virioplankton. Environ. Microbiol. 18 (11), 3714–3727.
Monier, A., Claverie, J.-M., Ogata, H., 2008. Taxonomic distribution of large DNA viruses in the sea. Genome Biol. 9 (7), R106.
Moniruzzaman, M., Wurch, L.L., Alexander, H., Dyhrman, S.T., Gobler, C.J., Wilhelm, S.W., 2017. Virus-host relationships of marine single-celled eukaryotes resolved from metatranscriptomics. Nat. Commun. 8, 16054.
Moran, M.A., Satinsky, B., Gifford, S.M., Luo, H., Rivers, A., Chan, L.-K., Meng, J., Durham, B.P., Shen, C., Varaljay, V.A., Smith, C.B., Yager, P.L., Hopkinson, B.M., 2012. Sizing up metatranscriptomics. ISME J. 7 (2), 237–243.
Nagasaki, K., Tomaru, Y., Katanozaka, N., Shirai, Y., Nishida, K., Itakura, S., Yamaguchi, M., 2004a. Isolation and characterization of a novel single-stranded RNA virus infecting the bloom-forming diatom Rhizosolenia setigera. Appl. Environ. Microbiol. 70 (2), 704–711.
Nagasaki, K., Tomaru, Y., Nakanishi, K., Hata, N., Katanozaka, N., Yamaguchi, M., 2004b. Dynamics of Heterocapsa circularisquama (Dinophyceae) and its viruses in Ago Bay, Japan. Aquat. Microb. Ecol. 34 (3), 219–226.
Ng, T.F.F., Marine, R., Wang, C., Simmonds, P., Kapusinszky, B., Bodhidatta, L., Oderinde, B.S., Wommack, K.E., Delwart, E., 2012. High variety of known and new RNA and DNA viruses of diverse origins in untreated sewage. J. Virol. 86 (22), 12161–12175.
Páez-Espino, D., Chen, I.-M.A., Palaniappan, K., Ratner, A., Chu, K., Szeto, E., Pillay, M., Huang, J., Markowitz, V.M., Nielsen, T., Huntemann, M.K., Reddy, T.B., Pavlopoulos, G.A., Sullivan, M.B., Campbell, B.J., Chen, F., McMahon, K., Hallam, S.J., Denef, V., Cavicchioli, R., Caffrey, S.M., Streit, W.R., Webster, J., Handley, K.M., Salekdeh, G.H., Tsesmetzis, N., Setubal, J.C., Pope, P.B., Liu, W.-T., Rivers, A.R., Ivanova, N.N., Kyrpides, N.C., 2017. IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses. Nucleic Acids Res. 45 (D1), D457–D465.
Perez-Sepulveda, B., Redgwell, T., Rihtman, B., Pitt, F., Scanlan, D.J., Millard, A., 2016. Marine phage genomics: the tip of the iceberg. FEMS Microbiol. Lett. 363 (15), 1–8.
Rosario, K., Nilsson, C., Lim, Y., Ruan, Y., Breitbart, M., 2009. Metagenomic analysis of viruses in reclaimed water. Environ. Microbiol. 11 (11), 2806–2820.

Roux, S., Enault, F., Hurwitz, B.L., Sullivan, M.B., 2015. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985.
Roux, S., Brum, J.R., Dutilh, B.E., Sunagawa, S., Duhaime, M.B., Loy, A., Poulos, B.T., Solonenko, N., Lara, E., Poulain, J., Pesant, S., Kandels-Lewis, S., Dimier, C., Picheral, M., Searson, S., Cruaud, C., Alberti, A., Duarte, C.M., Gasol, J.M., Vaqué, D., Tara Oceans Coordinators, Bork, P., Acinas, S.G., Wincker, P., Sullivan, M.B., 2016. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537 (7622), 689–693.
Shi, M., Lin, X.-D., Tian, J.-H., Chen, L.-J., Chen, X., Li, C.-X., Qin, X.-C., Li, J., Cao, J.-P., Eden, J.-S., Buchmann, J., Wang, W., Xu, J., Holmes, E.C., Zhang, Y.-Z., 2016. Redefining the invertebrate RNA virosphere. Nature 1–12.
Shirai, Y., Tomaru, Y., Takao, Y., Suzuki, H., Nagumo, T., Nagasaki, K., 2008. Isolation and characterization of a single-stranded RNA virus infecting the marine planktonic diatom Chaetoceros tenuissimus Meunier. Appl. Environ. Microbiol. 74 (13), 4022–4027.
Stedman, K.M., Kosmicki, N.R., Diemer, G.S., 2013. Codon usage frequency of RNA virus genomes from high-temperature acidic-environment metagenomes. J. Virol. 87 (3), 1919.
Steward, G.F., Culley, A.I., Mueller, J.A., Wood-Charlson, E.M., Belcaid, M., Poisson, G., 2013. Are we missing half of the viruses in the ocean? ISME J. 7 (3), 672–679.
Suttle, C.A., 2007. Marine viruses - major players in the global ecosystem. Nat. Rev. Microbiol. 5 (10), 801–812.
Suttle, C.A., 2016. Environmental microbiology: viral diversity on the global stage. Nat. Microbiol. 1 (11), 16205.
Tai, V., Lawrence, J.E., Lang, A.S., Chan, A.M., Culley, A.I., Suttle, C.A., 2003. Characterization of HaRNAV, a single-stranded RNA virus causing lysis of Heterosigma akashiwo (Raphidophyceae). J. Phycol. 39 (2), 343–352.
Takao, Y., Mise, K., Nagasaki, K., Okuno, T., Honda, D., 2006. Complete nucleotide sequence and genome organization of a single-stranded RNA virus infecting the marine fungoid protist Schizochytrium sp. J. Gen. Virol. 87 (Pt 3), 723–733.
Thurber, R.V., Payet, J.P., Thurber, A.R., Correa, A.M.S., 2017. Virus-host interactions and their roles in coral reef health and disease. Nat. Rev. Microbiol. 15 (4), 205–216.
Tomaru, Y., Nagasaki, K., 2007. Flow cytometric detection and enumeration of DNA and RNA viruses infecting marine eukaryotic microalgae. J. Oceanogr. 63 (2), 215–221.
Tomaru, Y., Takao, Y., Suzuki, H., Nagumo, T., Nagasaki, K., 2009. Isolation and characterization of a single-stranded RNA virus infecting the bloom-forming diatom Chaetoceros socialis. Appl. Environ. Microbiol. 75 (8), 2375–2381.
Tomaru, Y., Toyoda, K., Kimura, K., Hata, N., Yoshida, M., Nagasaki, K., 2012. First evidence for the existence of pennate diatom viruses. ISME J. 6 (7), 1445–1448.
Urayama, S., Takaki, Y., Nunoura, T., 2016. FLDS: a comprehensive dsRNA sequencing method for intracellular RNA virus surveillance. Microb. Environ. 31 (1), 33–40.
Watkins, S.C., Putonti, C., 2017. The use of informativity in the development of robust viromics-based examinations. PeerJ 5, e3281.
Weitz, J.S., Wilhelm, S.W., 2012. Ocean viruses and their effects on microbial communities and biogeochemical cycles. F1000 Biol. Rep. 4, 17. http://dx.doi.org/10.3410/B4-17.
Wigington, C.H., Sonderegger, D., Brussaard, C.P.D., Buchan, A., Finke, J.F., Fuhrman, J.A., Lennon, J.T., Middelboe, M., Suttle, C.A., Stock, C., Wilson, W.H., Wommack, K.E., Wilhelm, S.W., Weitz, J.S., 2016. Re-examination of the relationship between marine virus and microbial cell abundances. Nat. Microbiol. 1 (3), 15024. http://dx.doi.org/10.1038/nmicrobiol.2015.24.
Young, M., Bolduc, B., Shaughnessy, D.P., Roberto, F.F., Wolf, Y.I., Koonin, E.V., 2013. Reply to codon usage frequency of RNA virus genomes from high-temperature acidic-environment metagenomes. J. Virol. 87 (3), 1920–1921.
Zeigler-Allen, L., McCrow, J.P., Ininbergs, K., Dupont, C.L., Badger, J.H., Hoffman, J.M., Ekman, M., Allen, A.E., Bergman, B., Venter, J.C., 2017. The Baltic Sea virome: diversity and transcriptional activity of DNA and RNA viruses. mSystems 2 (1), e00125–16.
