bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 Meta-transcriptomic detection of diverse and divergent
2 RNA viruses in green and chlorarachniophyte algae
3
4
5 Justine Charon1, Vanessa Rossetto Marcelino1,2, Richard Wetherbee3, Heroen Verbruggen3,
6 Edward C. Holmes1*
7
8
9 1Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and
10 Environmental Sciences and School of Medical Sciences, The University of Sydney,
11 Sydney, Australia.
12 2Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical
13 Research, Westmead, NSW 2145, Australia.
14 3School of BioSciences, University of Melbourne, VIC 3010, Australia.
15
16
17 *Corresponding author:
18 Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and
19 Environmental Sciences and School of Medical Sciences,
20 The University of Sydney,
21 Sydney, NSW 2006, Australia.
22 Tel: +61 2 9351 5591
23 Email: [email protected]
1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
24 Abstract
25 Our knowledge of the diversity and evolution of the virosphere will likely increase
26 dramatically with the study of microbial eukaryotes, including the microalgae in few RNA
27 viruses have been documented to date. By combining meta-transcriptomic approaches
28 with sequence and structural-based homology detection, followed by PCR confirmation, we
29 identified 18 novel RNA viruses in two major groups of microbial algae - the chlorophytes
30 and the chlorarachniophytes. Most of the RNA viruses identified in the green algae class
31 Ulvophyceae were related to those from the families Tombusviridae and Amalgaviridae that
32 have previously been associated with plants, suggesting that these viruses have an
33 evolutionary history that extends to when their host groups shared a common ancestor. In
34 contrast, seven ulvophyte associated viruses exhibited clear similarity with the mitoviruses
35 that are most commonly found in fungi. This is compatible with horizontal virus transfer
36 between algae and fungi, although mitoviruses have recently been documented in plants.
37 We also document, for the first time, RNA viruses in the chlorarachniophytes, including the
38 first observation of a negative-sense (bunya-like) RNA virus in microalgae. The other virus-
39 like sequence detected in chlorarachniophytes is distantly related to those from the plant
40 virus family Virgaviridae, suggesting that they may have been inherited from the secondary
41 chloroplast endosymbiosis event that marked the origin of the chlorarachniophytes. More
42 broadly, this work suggests that the scarcity of RNA viruses in algae most likely results from
43 limited investigation rather than their absence. Greater effort is needed to characterize the
44 RNA viromes of unicellular eukaryotes, including through structure-based methods that are
45 able to detect distant homologies, and with the inclusion of a wider range of eukaryotic
46 microorganisms.
47
48 Author summary
2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
49 RNA viruses are expected to infect all living organisms on Earth. Despite recent
50 developments in and the deployment of large-scale sequencing technologies, our
51 understanding of the RNA virosphere remains anthropocentric and largely restricted to
52 human, livestock, cultivated plants and vectors for viral disease. However, a broader
53 investigation of the diversity of RNA viruses, especially in protists, is expected to answer
54 fundamental questions about their origin and long-term evolution. This study first
55 investigates the RNA virus diversity in unicellular algae taxa from the phylogenetically
56 distinct ulvophytes and chlorarachniophytes taxa. Despite very high levels of sequence
57 divergence, we were able to identify 18 new RNA viruses, largely related to plant and fungi
58 viruses, and likely illustrating a past history of horizontal transfer events that have occurred
59 during RNA virus evolution. We also hypothesise that the sequence similarity between a
60 chlorarachniophyte-associated virga-like virus and members of Virgaviridae associated with
61 plants may represent inheritance from a secondary endosymbiosis event. A promising
62 approach to detect the signals of distant virus homologies through the analysis of protein
63 structures was also utilised, enabling us to identify potential highly divergent algal RNA
64 viruses.
65
66 INTRODUCTION
67 Viruses are likely to infect every cellular organism and play fundamental roles in biosphere
68 diversity, evolution, and ecology. Studies of the global virosphere performed to date have
69 revealed marked heterogeneities in virus composition. For example, while RNA viruses are
70 commonplace in eukaryotes, they are less often found in bacteria, with only two families
71 described to date, and are yet to be conclusively identified in Archaea. Rather, both the
72 bacteria and Archaea are dominated by DNA viruses1,2. It is unclear, however, whether the
73 highly skewed distribution of viruses reflects fundamental biological, cellular or ecological
74 factors of the hosts in question, or because RNA viruses in bacteria and Archaea are often
3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
75 so divergent in sequence that they are difficult to detect using primary sequence
76 comparisons alone.
77
78 Despite greatly increased virus sampling following the advent of metagenomic next-
79 generation sequencing, our picture of the virosphere remains largely restricted to bacteria,
80 vertebrates and plants3,4. Clearly, such a sampling bias will also impact our knowledge of
81 the fundamental patterns and processes of virus origins and evolution. A good example of
82 this major knowledge bias are the unicellular eukaryotes, grouped under the term “protists”,
83 and particularly the microalgae: only 61 distinct viruses have been formally recognized in
84 these taxa5, with only 82 viral sequences classified as infecting eukaryotic microalgae6. This
85 represents only 0.6% of the total 14,679 viral sequences listed on the Viral-Host database
86 (release April 2020), although the true number of microalgal species is estimated to exceed
87 300,0007.
88
89 Despite early attempts, and the first algal virus cultivation in 19798,9, the isolation and
90 characterization of phycoviruses (i.e. algal viruses) has been constrained by the difficulty in
91 cultivating both the algae and their viruses, as well as the inherent challenges in identifying
92 RNA viruses that are highly divergent in primary sequence. Indeed, because RNA viruses
93 are the fastest evolving entities described, phylogenetic signal is rapidly lost over
94 evolutionary time. Hence, it is possible that the low number of algal RNA viruses detected
95 to date simply reflects the fact that they are highly divergent in sequence, even in the
96 canonical RNA-dependent RNA polymerase (RdRp), and hence refractory to detection
97 using primary sequence similarity. Importantly, protein structures are expected to be an
98 order of magnitude more conserved than amino acid sequences10. As a consequence, the
99 study of conserved secondary or tertiary structures could help identify distant homologies
100 among RNA viruses over extended evolutionary time-scales11,12. Accordingly, the analysis of
4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
101 conserved protein structures may constitute a promising approach to identify novel viruses
102 within the microalgae.
103
104 Microalgae are a polyphyletic group of microscopic unicellular photosynthetic organisms
105 distributed across diverse branches of the eukaryotic phylogeny. With 72,500 species
106 discovered to date and organised into the TSAR (Telonemia, Stramenopiles, Alveolates,
107 Rhizaria), Archaeplastida, Haptista, Cryptista and “excavates” supergroups13–15, microalgae
108 constitute a huge source of genetic diversity16. In addition to their wide range of habitats,
109 their ubiquity and the diversity of genomic features characterised to date (with linear,
110 circular, segmented, non-segmented, single-strand and double-strand genomes) suggest
111 that microalgae will harbour an enormous untapped source of viral diversity. Due to the
112 ancient nature of eukaryotic algae (ca. 1.8 billion years), their involvement in secondary
113 plastid endosymbiosis events involving many branches of the eukaryotic phylogeny, and
114 that microalgae constitute a primary food source for marine and freshwater food chain, it is
115 also possible that algal viruses played a crucial role in the early events of eukaryote virus
116 evolution5,17.
117
118 The algal viruses documented to date are dominated by those with DNA genomes: 55 of
119 the 82 algal virus sequences available at VirusHostdb. These include the well-known giant
120 viruses, the majority of which (53%) have been described in the green algae (Chlorophyta).
121 This DNA-dominated virome of green algae contrasts with those of their sister-group, the
122 land plants18, for which 60% of the 3590 viral sequences are RNA viruses (i.e. the
123 “Riboviria”; VirusHostdb, April 2020 release). In marked contrast, 85% of the 27 algae RNA
124 viruses described to date have been identified in diatoms19. Even the few algal RNA viruses
125 characterised to date display impressive diversity, belonging to the families Totiviridae,
126 Reoviridae, Marnaviridae, Endornaviridae, Flaviviridae, Narnaviridae and Alvernaviridae.
5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
127 However, it is currently unclear whether these seemingly differing distributions of DNA and
128 RNA viruses reflect a major switch in virus composition that occurred during the expansion
129 of land plants or is simply indicative of the inherent limitations in sampling and investigation
130 described above20. As such, revealing the pattern and extent of RNA virus diversity in algae
131 may have profound consequences for understanding the long-term processes that have
132 shaped virus diversity and evolution.
133
134 Herein, we characterised more of the RNA virosphere in cultured samples of two clades of
135 microalgae: (i) the green algae (Chlorophyta), that are part of the Archaeplastida eukaryotic
136 supergroup, and (ii) the chlorarachniophytes, a lineage of Rhizaria that obtained a
137 chloroplast through secondary endosymbiosis of a green alga21. To this end we performed
138 an unbiased (i.e. bulk RNA-sequencing) meta-transcriptomic analysis with an emphasis on
139 detecting remote signals of homology in the RdRp, the hallmark of RNA viruses, using
140 protein-profile based approaches. The comparison of the viromes of these two distant
141 groups will help answer the following key questions: (i) is the long-term divergence between
142 the two algae taxa also reflected in their RNA virome compositions? (ii) does their RNA
143 virome provide evidence for complex evolutionary histories, including horizontal transfer
144 events? (iii) is the RNA/DNA virus bias observed between algae and green plants an artefact
145 of sampling or reflect a more fundamental biological division? More generally, we aimed to
146 broaden our understanding of the biodiversity of algal viruses, with the expectation that the
147 characterization of the algal virosphere will have important implications for understanding
148 and managing the roles they play in global element cycling, climate forcing and
149 biotechnology. Indeed, viruses are important in regulating microalgae populations and may
150 provide a key reservoir for genetic novelty22,23.
151
152 RESULTS
6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
153
154 The RNA viromes of two divergent groups of microalgae
155 Our aim was to determine the RNA viromes of six microalgae cultures from six different
156 algal species classified into two highly phylogenetically distinct algal clades: the
157 chlorarachniophytes (Rhizaria) and the green algae (Chlorophyta, Archaeplastida) (Fig 1A).
158 To the best of our knowledge, this is the first identification of viruses in the
159 Chlorarachniophyceae and Ulvophyceae (Chlorophyta) classes (Fig 1C) 24.
160
161 While our previous limited understanding of viruses infecting chlorarachniophytes can be
162 explained by the small number of species from this clade characterized to date (only 15 in
163 Algaebase), the Ulvophyceae is an abundant and diverse algal lineage in existence since
164 the late Proterozoic and comprises at least 1,933 species25. It contains a wide range of
165 morphologies from unicellular benthic algae to large seaweeds26 and its representatives
166 commonly occur in marine, terrestrial and freshwater habitats18. We therefore performed
167 RNA sequencing (meta-transcriptomics) on six microalgal species belonging to both the
168 green algae (classes Ulvophyceae and Mamiellophyceae) and chlorarachniophytes (Table
169 1).
170
171 Table 1. Sample and library description.
Library Species Class
Kraftionema allantoideum Ulvophyceae/Kraftionemaceae
ALG_1 Microrhizoidea pickettheapsiorum Mamiellophyceae/Dolichomastigaceae
Dolichomastix tenuilepis Mamiellophyceae/Dolichomastigaceae
ALG_2 Ostreobium sp. HV05007bc Ulvophyceae/Bryopsidales
7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Chlorarachnion reptans Chlorarachniophyceae/Chlorarachnion ALG_3 Lotharella sp Chlorarachniophyceae/Lotharella
172
173 Because of variable levels of RNA quality and quantity, fewer non-rRNA reads were
174 obtained for the ALG_1 and ALG_3 libraries, which also likely explains the reduced average
175 length of contigs in these libraries compared to ALG_2.
176
177 In total, we identified 18 new putative viral sequences using a standard sequence similarity
178 search among the three libraries, largely comprising those with double-stranded (ds) RNA
179 or single-stranded positive-sense (ssRNA(+)) genomes (Table 2). A divergent bunya-like
180 partial sequence was also retrieved from Chlorarachnion reptans and may constitute the
181 first negative-sense (ssRNA(-)) virus identified in microalgae. Importantly, the presence of
182 these viruses was validated by RT-PCR on each total extracted RNA (Fig S1). In each case
183 these viruses exhibited very low levels of sequence similarity to existing RdRp amino acid
184 sequences, with sequence identities ranging from only 27 to 38%. Most of the viral
185 sequences discovered in this study likely correspond to full-length genomes, given that the
186 length and genomic organization (ORF numbers, length, etc) is similar to the well-annotated
187 reference homologs. Thus, these sequences are very likely true viruses rather than
188 endogenous viral elements (EVEs) inserted into host genomes.
189
190 Table 2. BLASTx results of the newly described virus-like sequences against nr database.
New virus Length Count BLASTx hit % e-value BLASTx Hit Taxonomy
(algal host species) nt (%read) ID
Amalga-like boulavirus 1440 1427 BAA25883Rd 31 4.6E-23 BDRM Partitiviridae
(K. allantoideum)) (0.16%) Rp (dsRNA)
8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Amalga-like chassivirus 3399 1471 BAA25883Rd 28 1.8E-38 BDRM Partitiviridae
(Ostreobium sp.) (0.02%) Rp (dsRNA)
Amalga-like chaucrivirus 4036 16239 BAA25883Rd 33 3.3E- BDRM Partitiviridae
(Ostreobium sp.) (0.20%) Rp 103 (dsRNA)
Amalga-like dominovirus 4011 1974 BAA25883Rd 33 5.1E-88 BDRM Partitiviridae
(Ostreobium sp.) (0.04%) Rp (dsRNA)
Amalga-like 3254 875 BAA25883Rd 27 3.5E-39 BDRM Partitiviridae
lacheneauvirus (0.01%) Rp (dsRNA)
(Ostreobium sp.)
Partiti-like alassinovirus 3,658 3459 BAB63954 29 1.6E-48 BDRC Bryopsis
(Ostreobium so.) (0.14%) cinicola*
(dsRNA)
Partiti-like lacotivirus 3273 3074 BAB63954 29 2.5E-45 BDRC Bryopsis
(Ostreobium so.) (2.60%) cinicola*
(dsRNA)
Partiti-like adriusvirus 4252 4053 BAB63954 23 5.7E-18 BDRC Bryopsis
(Ostreobium so.) (0.14%) cinicola*
(dsRNA)
Mito-like babylonusvirus 2942 8552 APG77166Rd 38 9E-42 Shahe narna- Unclassified
(Ostreobium sp.) (0.11%) Rp like virus 6 RNA virus
(ssRNA)
Mito-like albercanusvirus 2791 5115 APG77166Rd 39 7.2E-41 Shahe narna- Unclassified
(Ostreobium sp.) (0.06%) Rp like virus 6 RNA virus
(ssRNA)
Mito-like spartanusvirus 2684 14494 ASM94099R 38 2E-32 Barns Ness Narnaviridae
(Ostreobium sp.) (0.18%) dRp serrated (ss+RNA)
wrack narna-
like virus 3
Mito-like laruketanusvirus 2928 11430 APG77166.1 36 5.8E-33 Shahe narna- Unclassified
(Ostreobium sp.) (0.17%) RdRp like virus 6 RNA virus
(ssRNA)
Mito-like 2773 1939 APG77166Rd 34 6.1E-40 Shahe narna- Unclassified
boobanusbagrillonusvirus (0.03%) Rp like virus 6 RNA virus
(Ostreobium sp.)
9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Mito-like picolinusvirus 2714 13040 YP 34 4.1E-33 Botrytis Narnaviridae
(Ostreobium sp.) (0.10%) 00228433Rd cinerea (ss+RNA)
Rp mitovirus 1
Mito-like bionusvirus 3260 7333 AXY40442Rd 27 5.7E-13 Rhizophagus Narnaviridae
(Ostreobium sp.) (0.09%) Rp diaphanum (ss+RNA)
mitovirus 1
Tombus-like 3835 6272 YP 36 5.1E-45 Hubei Unclassified
chagrupourvirus (0.08%) 009336735hy tombus-like RNA virus
(Ostreobium sp.) pot. protein 2 virus 12
Virga-like bellevillovirus 2313 172.66 AMO03254 29 4.4E-44 Boutonnet Unclassified
(C. reptans) (0.01%) polyprotein virus ssRNA virus
(ssRNA)
Bunya-like bridouvirus (C. 2818 192 APG79310Rd 30 9.1E-68 Shahe Unclassified
reptans) (0.01%) Rp bunya-like RNA virus
virus 1
191 BDRM: Bryopsis mitochondria-associated dsRNA; BDRC: Bryopsis cinicola chloroplast-
192 associated dsRNAs; *likely viral RdRp mis-annotated as a host protein.
193
194 Eight of the viruses identified in Ostreobium sp. fell within the Narnaviridae or
195 Tombusviridae. In contrast, five viral sequences from Ostreobium sp. and K. allantoideum
196 do not fit into defined taxonomic groups, and were instead related to the broad set of
197 ‘partiti-like’ viruses that comprise the Partitiviridae, Totiviridae and Amalgaviridae. Finally,
198 more divergent but detectable sequence similarities to the Virgaviridae (+ssRNA) were
199 obtained for samples from the chlorarachniophyte library.
200
201 Detection of divergent viruses using protein structural data
202 An additional attempt to detect even more divergent RNA viruses was conducted was using
203 protein structure. In particular, it is possible that highly divergent viruses are part of the
204 unknown sequences (i.e. contigs with no match in nt/nr databases, or the ‘dark matter’) that
205 comprise between 50-60% of total contigs obtained in this study (Fig 2A). Accordingly, we
10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
206 attempted to detect evolutionary-conserved features of protein structural and functional
207 motifs in orphan sequences that encode unknown ORFs with a minimum of 200 amino
208 acids (600 nucleotides) (Fig 2A). The corresponding translated ORFs were submitted to the
209 protein-profile based program HMMer3 and compared with profiles from the PFAM RdRp
210 clan and the VOG databases. To exclude false-positives, all positive hits were compared to
211 the entire PFAM database. This resulted in the identification of three non-phage contigs
212 that displayed RdRp homology: ALG_2_DN19089, ALG_2_DN594 and ALG_3_DN34624
213 (Table 3).
11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
214 Table 3. Results of the VOGdb and PFAM HMM analysis. Light purple: phage-like sequences. Grey: Non-viral sequences; Light blue: DNA
215 virus-like sequences; Orange: RNA virus-like sequences.
Contig name ORF Abund. Viral hit E-value Viral hit description Viral-like hit taxonomy PFAM hit ID E-value2 PFAM hit description ALG_2_DN19089_c0_g1_i1_len4252 ORF_1 4717 VOG03062 1.00E-12 REFSEQ_hypothetical_protein Bacteriophage - - - ALG_2_DN19089_c0_g1_i1_len4252 ORF_1 4717 PF00680.20 2.70E-05 RNA dependent RNA polymerase RdRP-1 - - - ALG_2_DN19250_c2_g3_i5_len1869 ORF_1 743.91 VOG23558 1.00E-04 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - ALG_2_DN18568_c0_g1_i1_len2977 ORF_1 511 VOG10478 1.30E-06 sp|Q05224|VG18_BPML5_Gene_18_protein Bacteriophage - - - ALG_2_DN19013_c0_g1_i2_len1689 ORF_2 224.31 VOG22975 1.40E-04 REFSEQ_carboxylesterase Caudovirales;Siphoviridae - - - ALG_2_DN19410_c0_g2_i6_len1950 ORF_1 183.94 VOG12013 5.20E-04 sp|P03778|Y06_BPT7_Protein_0.6B Viruses PF16752.5 1.50E-04 Tubulin-specific chaperone C ALG_2_DN18226_c0_g1_i1_len1259 ORF_1 157.61 VOG09820 2.90E-04 REFSEQ_hypothetical_protein Phycodnaviridae;Chlorovirus - - - ALG_2_DN18993_c2_g2_i2_len2532 ORF_1 151.17 VOG08344 8.50E-08 REFSEQ_hypothetical_protein Bacteriophage PF13424.6 3.80E-132 Tetratricopeptide repeat ALG_3_DN34624_c0_g1_i1_len2077 ORF_2 146 PF02123.16 8.50E-05 Viral RNA-directed RNA-polymerase RdRP-4 - - - ALG_2_DN18744_c0_g1_i3_len2080 ORF_1 134.99 VOG10472 4.10E-04 REFSEQ_hypothetical_protein Poxviridae - - - ALG_2_DN19214_c2_g1_i7_len1432 ORF_2 118 VOG06927 6.90E-04 REFSEQ_hypothetical_protein Bacteriophage - - - ALG_3_DN25592_c0_g1_i1_len1043 ORF_1 115 VOG01256 4.40E-04 sp|Q9QU29|ORF3_TTVB1_Uncharacterized_ORF3_protein dsDNA_viruses - - - ALG_2_DN18451_c0_g1_i4_len2061 ORF_4 109.48 VOG17696 7.10E-04 REFSEQ_hypothetical_protein Bacteriophage PF16058.5 1.80E-17 Mucin-like ALG_2_DN18451_c0_g1_i4_len2061 ORF_4 109.48 VOG17696 7.10E-04 REFSEQ_hypothetical_protein Bacteriophage PF16058.5 1.10E-07 Mucin-like ALG_2_DN18732_c0_g2_i2_len2454 ORF_2 99.65 VOG09815 1.70E-15 REFSEQ_hypothetical_protein Phycodnaviridae;Chlorovirus - - - ALG_2_DN18957_c0_g1_i1_len1408 ORF_2 97.98 VOG02199 8.40E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 Ortervirales PF06156.13 4.70E-04 Initiation control protein YabA ALG_2_DN18957_c0_g1_i2_len1657 ORF_2 89.04 VOG02199 3.50E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 Ortervirales PF06156.13 2.20E-04 Initiation control protein YabA ALG_2_DN19250_c2_g3_i3_len2203 ORF_1 71.48 VOG23558 7.80E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - ALG_2_DN19410_c0_g2_i7_len1634 ORF_1 49.51 VOG12013 9.90E-04 sp|P03778|Y06_BPT7_Protein_0.6B Bacteriophage - - - ALG_2_DN11543_c0_g1_i1_len842 ORF_1 42 VOG20356 3.80E-04 REFSEQ_hypothetical_protein Caudovirales;Myoviridae - - - ALG_2_DN19463_c5_g2_i1_len765 ORF_1 37.7 VOG24589 2.60E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - ALG_2_DN19174_c0_g2_i18_len938 ORF_1 36.39 VOG10625 9.70E-04 sp|Q05293|VG78_BPML5_Gene_78_protein Bacteriophage - - - ALG_2_DN41289_c0_g1_i1_len750 ORF_1 36 VOG06662 9.50E-05 REFSEQ_Cupin Bacteriophage - - - ALG_2_DN22182_c0_g1_i1_len820 ORF_1 35 VOG02199 1.50E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 Ortervirales PF06156.13 7.20E-04 Initiation control protein YabA ALG_2_DN44027_c0_g1_i1_len815 ORF_1 33.98 VOG21678 3.70E-04 REFSEQ_hypothetical_protein Caudovirales;Myoviridae PF08614.11 6.00E-05 Autophagy protein 1(ATG16) ALG_2_DN594_c0_g2_i1_len711 ORF_1 28 PF17501.2 2.80E-04 Viral RNA-directed RNA polymerase Viral_RdRp_C - - - ALG_2_DN14271_c0_g1_i1_len772 ORF_1 25.12 VOG18617 2.20E-04 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae PF13855.6 1.60E-21 Leucine rich repeat ALG_2_DN18993_c2_g2_i1_len2178 ORF_1 21.18 VOG08344 4.70E-07 REFSEQ_hypothetical_protein Bacteriophage PF13374.6 1.40E-126 1Tetratricopeptide repeat ALG_2_DN44027_c0_g1_i2_len783 ORF_1 19.02 VOG21678 3.60E-04 REFSEQ_hypothetical_protein Caudovirales;Myoviridae PF08614.11 7.70E-05 Autophagy protein 1(ATG16) ALG_2_DN19463_c5_g2_i6_len928 ORF_1 14.34 VOG24589 6.40E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - 216 ALG_2_DN19463_c5_g2_i4_len863 ORF_1 4.36 VOG24589 5.30E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - -
12 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
217 To manually assess the level of confidence of the RdRp signal detected in the HMM
218 comparisons, the protein sequences of the three RdRp candidates were aligned to amino
219 acid sequences retrieved from the RdRp_C, RdRp_4 and RdRp_1 PFAM profiles. The
220 RdRp_C profile represents the C-terminal of the RdRp (Protein A) found in
221 Alphanodaviruses. Unfortunately, this region lacks the functional motifs usually associated
222 with the RdRp, preventing us from definitively establishing the ALG_2_DN594 contig as
223 representing a true RdRp-encoding virus. Similarly, the ALG_3_DN34624 alignment with the
224 viral RdRp sequences that comprise the PFAM RdRp_4 profile (PF02123) does not display
225 a clear conservation of the crucial functional residues and motifs within the RdRp,
226 particularly the canonical GDD motif that is GFD in ALG_3 contig (Fig S2): strikingly, the
227 GFD motif is absent from an alignment of 4,627 viral RdRp sequences27. Whether this
228 reflects a newly identified functional motif of the viral RdRp or an artefact remains to be
229 determined, but at this point we cannot safely conclude that ALG_3_DN34624 encodes a
230 viral RdRp. In contrast, the ALG_2_DN19089-encoded ORF shared multiple motifs with
231 RdRp_1 profile (PF00680) (Fig S3), including motif A at positions 437-442 and GDD at
232 positions 557-559. Because of the presence of these functional motifs, ALG_2_DN19089
233 can be confidently considered as a RdRp-encoding contig and will be referred to here as
234 ‘Partiti-like adriusvirus’. Interestingly, this contig also revealed significant similarity to some
235 eukaryotic chloroplast-associated double-stranded RNA replicons (BDRC) obtained from
236 the green algae species, Bryopsis cinicola28 (Table 2). It is therefore likely that these BDRC
237 dsRNA Bryopsis-replicons in fact represent viral RdRp sequences20, and we treat them as
238 such in this study.
239
240 An additional BLASTx comparison using this divergent Partiti-like RdRp as a reference
241 identified two other BDRC-like contigs in the Ostreobium sp. data set - ALG_2_DN19300
242 and ALG_2_DN19436 (Table 2, Fig 5). Along with the Partiti-like adriusvirus, these two
13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
243 additional sequences were both validated by RT-PCR (Fig S1) and are listed in the viral
244 contig table as ‘Partiti-like lacotivirus’ and ‘Partiti-like alassinovirus’, respectively (Table 2).
245
246 Relative abundance and prevalence of RNA viruses in the samples
247 Relative virus abundance varied between libraries and viruses of the same family in the
248 Ostreobium sp. culture, with viral-like sequences constituting between 0.01 and 0.2% of
249 the total non rRNA reads (Table 2, Fig 2). Each virus described was identified in only one of
250 the cultures sequenced (Fig S1). However, inter-sample BLAST-comparisons revealed
251 similarity between the partiti-like sequence identified in the K. allantoideum and Ostreobium
252 samples. A non-annotated ORF from a K. allantoideum contig, ALG_1_DN2506, aligned
253 with the N-terminal ORF of Ostreobium sp. amalga-like virus contigs that potentially encode
254 the virus coat protein. Because of their co-occurrence in K. allantoideum (1.1 and 1.2 PCR,
255 Fig S1) and their similar abundance levels, it is likely that ALG1 DN2506 and ALG 1 DN2691
256 (referred as ‘Amalga-like boulavirus’, Table 2) contigs are in fact part of the same genome.
257 Unfortunately, the poor quality of RNAs and the resulting high degree of fragmentation
258 obtained in the ALG_1 RNAseq library did not allow us to resolve this question.
259
260 Notably, a large majority of the new RNA viruses reported here come from the ulvophyte
261 Ostreobium sp., although this may in part result from differences in RNA quality and
262 sequencing rather than a true biological difference in RNA virome composition as only a
263 limited number of non-rRNA reads were obtained for the ALG_1 and ALG_3 libraries (Fig
264 S4). Difficulties in detecting highly divergent sequences, especially in poorly characterized
265 and distant clades like the chlorarachniophytes, may also contribute to the different
266 numbers of viruses observed between libraries.
267
268 Secondary host detection
14 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
269 As the algal cultures analysed here were not axenic, we evaluated the potential presence of
270 other eukaryotes in the samples using CCMetagen29. According to the Krona plots obtained
271 for each library, cultivated algae were the major organism found in the samples (Fig S5).
272 Nevertheless, 2-8% of ALG_1 and ALG_3 contigs were assigned to dinoflagellates and
273 Cyanophora algae (Fig S5). Although sequences assigned to Lingulodinium polyedrum
274 potentially result from GenBank mis-annotation and were likely of bacterial origin, the
275 Coolia malayensis (Dinophyceae), Amphidinium sp (Dinophyceae) and Cyanophora
276 paradoxa (Glaucophyta) associated sequences likely constitute true assignments. We
277 therefore suspect that these additional micro-eukaryotic transcripts arose from cross-
278 contamination with additional algae samples that were extracted and sequenced at the
279 same time. Importantly, none of the viruses identified in the three libraries studied here
280 could be detected in the transcriptomes of the co-processed Dinophyceae and
281 Glaucophyta cultures, suggesting that these low-abundance contaminants are not the
282 hosts of the viruses reported here.
283
284 Phylogenetic analysis of the newly identified viruses
285
286 dsRNA viruses
287 Partiti-like viruses
288 Eight of the newly-described viral sequences exhibited RdRp amino acid sequence
289 similarity with Partiti-like viruses (i.e. close to members of the Partitiviridae). Based on
290 phylogenetic studies, five of the viral-like sequences are close to those from the
291 Amalgaviridae (named after their mosaic status comprising both a Partitiviridae-like RdRp
292 and the dicistronic and monopartite genomic organization of the Totiviridae30) and form a
293 clade with the Bryopsis mitochondria-associated dsRNA (BDRM), although they share only
294 28-32% sequence identity (Table 2). BDRM was first described as a dsRNA associated with
15 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
295 mitochondria in Bryopsis cinicola macroalgae31 and later classified as a virus by the
296 International Committee on Taxonomy of Viruses (ICTV). Like Ostreobium and Kraftionema,
297 Bryopsis belongs to the class Ulvophyceae, and it seems likely that all these five newly
298 identified Amalga-like viral sequences (Fig 5) form an Ulvophyceae-infecting viral clade.
299
300 Considering the levels of RdRp pairwise identity between these sequences (Table A, Top)
301 we assumed that each constituted a new species. Although a clear taxonomic classification
302 needs to be established at the family or genus levels, we performed a preliminary
303 taxonomic assessment using the PAirwise Sequence Comparison (PASC) tool from the
304 NCBI32. Each of these five new BDRM-like viral genome sequences were compared with
305 the Amalgaviridae full-length genomes available at NCBI. For each newly-discovered virus,
306 the closest matches to existing Amalgaviridae sequences were retrieved, and the resulting
307 pairwise identity distributions compared with those within and between Amalgaviridae
308 genera (Fig 3A). While this analysis indicates that these newly identified virus sequences are
309 not part of any existing Amalgaviridae genus (Fig 3A), whether they can be considered as a
310 new genus within the Amalgaviridae, or a new family, is not yet clear and will require a
311 formal taxonomic assessment by the ICTV.
312
313 Table 4. RdRp-based pairwise identity of newly discovered viruses. Top: Amalga-like
314 RdRp; Middle: BDRC-like RdRp; Bottom: Mitovirus-like RdRp.
Amalga-like Amalga-like Amalga-like Amalga-like Amalga-like RdRp chassivirus lacheneauvirus chaucrivirus dominovirus
Amalga-like boulavirus 19 21 16 16
Amalga-like chassivirus - 47 22 20
Amalga-like lacheneauvirus - 22 21
Amalga-like chaucrivirus - 36
315
BDRC-like RdRp Partiti-like alassinovirus Partiti-like lacotivirus
Partiti-like adriusvirus 20 20
16 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Partiti-like alassinovirus - 32
316
Mito-like Mito-like Mito-like Mito-like Mito-like Mito-like Mitovirus-like RdRp picolinus babylonus albercanus boobanusbagrillonus laruketanus virus bionusvirus virus virus virus virus
Mito-like spartanusvirus 46 19 21 21 20 15
Mito-like picolinusvirus - 17 20 19 19 13
Mito-like babylonusvirus - 22 22 20 14
Mito-like albercanusvirus - 26 25 16
Mito-like laruketanusvirus - 37 14
Mito-like - 12 boobanusbagrillonusvirus
317
318 Interestingly, if these Amalga-like sequences are translated into amino acids using the
319 protozoan mitochondrial code, they display the same genomic organization as BDRM,
320 encoding two overlapping ORFs: the 5’ one encoding a hypothetical protein and the other
321 coding for a replicase through a -1 ribosomal frameshift33 (Fig S6). However, it is unclear if
322 these sequences should be translated through the mitochondrial genetic code or the
323 standard cytoplasmic one code, and such sub-cellular localization still remains to be
324 validated. It is also notable that the two closest homologs of BDRM, the Amalga-like
325 dominovirus and Amalga-like chaucrivirus, contain the GGAUUUU ribosomal -1 frameshift
326 motif and could plausibly be translated in this manner in the mitochondria (Fig S6).
327
328 The length and two-ORF encoding genomic structure of the BDRM-like sequences
329 generally correspond to genomic features of the amalgaviruses (Fig 5, right). Despite a lack
330 of detectable sequence similarity at both the sequence (BLASTx) and structural levels
331 (Phyre2), the second ORFs predicted in these amalgavirus-like sequences are expected to
332 encode a CP-like protein, even if the involvement of this potential CP in encapsidation
333 remains unclear34. A divergent virus similar to the BDRM was also recently identified as
334 infecting the phytopathogenic fungi Ustilaginoidea virens35.
335
17 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
336 The three BDRC-like sequences identified from Ostreobium sp. also cluster with the Partiti-
337 like viruses (Fig 5, Table 2) and can be classified as three different species after applying
338 the 90% RdRp percentage identity species demarcation criteria (Table 4, Middle). Notably,
339 the genomic organization of these BDRC-like contigs seems “inverted” compared to
340 members of the Totiviridae and Amalgaviridae; that is, a first ORF encoding a CP protein is
341 followed by a second that represents the RdRp. Indeed, the RdRp encoded by the Partiti-
342 like ALG_2 contigs is close to the 5’ extremity, followed by a second ORF. This second
343 ORF could potentially encode a CP, although functional annotation could not be achieved
344 due to the high level of sequence divergence.
345
346 Structural-based homology analysis
347 The remaining hits from the RdRp-profile analysis - ALG_2_DN594_c0_g2_i1_len711 and
348 ALG_3_DN34624_c0_g1_i1_len2077 - were used in a Phyre2 analysis. However, this
349 revealed no confident identification of viral RdRp signal (i.e. the confidence levels of models
350 were < 90%).
351
352 Despite the great diversity of host and genomic organizations, the detectable sequence
353 similarity observed between Partitiviridae and Amalgaviridae strongly suggests that they
354 share common ancestry34. As such, it is important to determine whether amalgaviruses are
355 restricted to plant hosts36–38, and hence differ from the Partitiviridae that have been
356 characterized in many divergent eukaryotic hosts such as fungi, plants and protists39,40.
357
358 ssRNA(+) viruses
359 Mitovirus-like viruses
360 Seven viral sequences from Ostreobium sp. clustered in the Narnaviridae, forming a clade
361 within the genus Mitovirus (Fig 6). Their single ORF encodes an RdRp, and a genome length
18 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
362 of ~3000 nt is typical of the Narnaviridae (Fig 6, right). These viral sequences could
363 constitute a new clade of protist-associated mitoviruses potentially restricted to green
364 microalgae. Indeed the closest relative identified here, Shahe Narna-like virus 6, was
365 isolated from freshwater small planktonic crustaceans (Daphnia magna, Daphnia carinata
366 and Moina macrocopa) belonging to the order Cladocera41 (commonly referred to as water
367 fleas). These animals feed on microalgae and it is therefore possible that Shahe Narna-like
368 virus 6 in fact infects algae rather than arthropods, in a similar manner to other members of
369 the Narnaviridae. It is therefore possible that these RNA viruses could be major constituents
370 in habitats occupied by protists and by consequence in larger predatory invertebrates.
371
372 Based on the 50% RdRp sequence identity used as a species demarcation criterion in the
373 Narnaviridae (ICTV report 2009), we identified seven new mito-like viral species (Table 4,
374 bottom). To place of these new species among the Narnaviridae, we performed a PASC
375 analysis using full-length genomes from the seven new viral species. This revealed that the
376 identity levels of the new sequences fell in the range expected of intra-genus diversity (Fig
377 3B). We therefore propose the existence of a new sub-clade of mitoviruses, including these
378 seven new species as well as the Shahe Narna-like virus 6 previously described41 (Fig 6).
379 Whether this proposed clade is associated with mitochondria is currently unclear, and
380 predicted ORFs were obtained using both standard and mitochondrial genetic codes.
381
382 Tombusviridae-like viruses
383 One sequence from Ostreobium sp., the Tombus-like chagrupourvirus, exhibited similarity
384 to members of the Tombusviridae family of ssRNA(+) viruses (Fig 7), grouping with viruses
385 previously identified as infecting plant or plant pathogenic fungi. This again illustrates the
386 likely shared evolutionary history between green algae and land plants, and that horizontal
387 virus transfers can occur between plants and their pathogenic fungi. Of note is that the
19 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
388 closest relative of Tombus-like chagrupourvirus documented to date, Hubei-Tombus like
389 virus 12, was isolated from freshwater animals (mollusca Nodularia douglasiae)41. According
390 to the lack of distinguishable animal-related contigs in the Ostreobium sp. sample (Fig S5),
391 this Hubei-Tombus like virus 12 may, together with the newly tombus-like sequence
392 discovered here, constitute a new clade of green algae-infecting viruses.
393
394 The 3.8kb genome length of the Tombus-like chagrupourvirus is similar to those commonly
395 observed in Tombusviridae and their relatives (Fig 7, right), suggesting that it comprises a
396 full-length genome for this virus. This putative full-length genome sequence was compared
397 to Tombusviridae reference genomic sequences using PASC to assess its taxonomic
398 position (Fig 3C). Accordingly, the Ostreobium-associated tombus-like sequence could
399 constitute a new Tombusviridae genus.
400
401 Virgaviridae-like viruses
402 One sequence identified in Chlorarachnion reptans displayed detectable sequence similarity
403 to the Virgaviridae-like RdRp supergroup (Fig 8). The family Virgaviridae comprise ssRNA(+)
404 viruses traditionally associated with plants and display diverse genomic organizations. The
405 short length of the Virga-like bellevillovirus associated with chlorarachniophytes indicates
406 that this sequence likely reflects a partial genome sequence only. Moreover, the multi-
407 segment structure of the closest relatives suggests that the partial genome recovered in
408 ALG_3 could also contain additional segments not yet identified. Although further host
409 confirmation is required, this newly described RNA virus-like sequence would constitute the
410 first algae virus from the Hepe-Virga group.
411
412 ssRNA(-) virus
413 Bunyavirales-like viruses
20 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
414 A partial viral genome, the Bunya-like bridouvirus, encoding a RdRp-like signal was
415 identified in the Chlorarachnion reptans sample, although this sequence is highly divergent
416 and cannot be formally assigned to any previously described viral family. Strikingly,
417 however, the sequence clusters with a Bunya-Arena like virus previous identified in diverse
418 Freshwater small planktonic crustaceans (Daphnia magna (N/A), Daphnia carinata (N/A) and
419 Moina macrocopa (N/A))41 that typically feed on algae. Our phylogenetic analysis places this
420 sequence within the diversity of the order Bunyavirales (Fig 9). In addition to the freshwater
421 organism-associated viruses identified in Shi et al. 2016 (ref.41), this contig clusters with
422 several bunya-like unclassified negative-strand viruses isolated from the fungi
423 Cladosporium cladosporioides and the oomycete Plasmopara viticola (Fig 9) that are both
424 plant pathogens. The multi-segment structure of the closest classified family, the
425 Phenuiviridae, suggests that additional segments associated with the partial Bunya-like
426 bridouvirus genome may exist in C. reptans.
427
428 Discussion
429 The central aim of this study was to better characterise the RNA virus diversity in two major
430 algal lineages, the chlorarachniophytes and the ulvophytes, for which no RNA viruses had
431 previously been reported. Our investigation of the RNA virus diversity in six microalgae
432 species led to the identification of 18 new divergent RNA viruses, although they exhibited
433 clear homology with five established viral families. This clearly supports the idea that are an
434 important reservoir of untapped RNA virus diversity and that the apparent domination of
435 DNA viruses likely originates, at least partially, from sampling biases. The notion of a
436 potentially large dark matter of algal viruses is further reinforced by the high proportion of
437 unassigned contigs obtained, that likely contain a non-neglectable proportion of highly
438 divergent viral reads.
439
21 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
440 RNA virome similarities between green algae and land plants
441 Among the 18 novel RNA virus species described here, seven of those detected in the
442 green algae Ostreobium sp. and K. allantoideum were seemingly related to the
443 Tombusviridae and Amalgaviridae families of plant RNA viruses. Such similarities in RNA
444 virome between green algae and land plants are consistent with previous analyses based
445 on the Plant Genome project transcriptomic data that identified Partitivirus-like signals in
446 Chlorophyte algae20. However, the very limited sequence similarity among these viruses
447 strongly suggests an ancient divergence among these viral families, perhaps even before
448 the chlorophyte-streptophyte split some 850-1,100 million years ago (Ma)25. The close link
449 between land plant and green algae RNA virosphere is reinforced by the recent observation
450 that plant viruses are able to infect non-vascular plants such as mosses and algae42,43.
451
452 Divergent fungal mitoviruses in Ostreobium sp.
453 The presence of a potential new clade of protist-associated mitoviruses is of importance as
454 mitoviruses have traditionally been viewed as restricted to fungal hosts and were only very
455 recently identified in plants44. Similar to virus transfer between fungi and land plants 5, it is
456 possible that the symbiosis and co-evolution between green algae and fungi46 explains the
457 close phylogenetic relationships in their virome, perhaps facilitating horizontal gene transfer
458 events. Indeed, coral holobionts are the place for frequent interactions between endolithic
459 algae, such as Ostreobium sp., and fungi47–49. While we cannot formally identify host
460 associations from our RNA-seq data alone, no fungal-assigned contigs were detected in
461 our data, arguing against these being fungal viruses. Considering the high levels of
462 sequence divergence between our viral sequences and those associated with fungi and
463 within the clade formed by Ostreobium sp.-associated viruses, it seems likely that any such
464 horizontal gene transfer events are not recent and may have occurred in Ulvophyceae or
465 even Chlorophyta ancestors. It will be of considerable interest to examine this new
22 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
466 mitovirus-clade across a larger set of green microalgae species. Further characterisation of
467 such green algae-infecting mitovirus diversity and their lifecycle could be pivotal for our
468 understanding of the Narnaviridae and their evolution in eukaryotic cells. For example, it is
469 of interest to determine whether this putative mitochondrial subcellular location is the result
470 of an escape from the cytoplasmic dsRNA silencing (as suggested for newly characterized
471 plant mitoviruses50) or if these viruses are relics of the eukaryotic endosymbiosis event,
472 particularly as mitoviruses have bacterial counterparts, the Leviviridae27. More broadly,
473 these newly-reported mitovirus-like sequences further illustrate the enormous diversity of
474 hosts infected by Narnaviridae, including such eukaryotic microorganisms as Apicomplexa,
475 Excavates and Oomycetes hosts51–54.
476
477 Detection of plant viruses in the chlorarachniophytes
478 The apparent similarity between a Rhizarian (C. reptans) associated viral sequence and
479 land-plant infecting viruses was also striking. The Rhizaria and Archaeplastida are assumed
480 to have diverged before the cyanobacteria primary endosymbiosis event, ca. 1.5 billion
481 years ago55,56. Thus, a detectable sequence similarity between Rhizaria-associated viruses
482 and those infecting land plants cannot be reasonably attributed to such ancient
483 evolutionary events. Instead, the presence of such land plant-like viruses in
484 Chlorarachniophyte would more likely reflect more recent acquisition of these viruses in
485 chlorarachniophytes through either horizontal transfer by common vector/symbiont/parasite
486 ancestors or secondary endosymbiosis (eukaryote-to-eukaryote) events. Indeed, a
487 secondary endosymbiosis event of a green alga in the core Chlorophyta, possibly related to
488 Bryopsidales, led to the origin of the plastid of chlorarachniophytes between 578–318 Ma21.
489 Whether this virus (i) constitutes a relic of viruses that infected this engulfed green algal
490 endosymbiont, (ii) is part of the chlorarachniophyte cytoplasm or still associated with the
491 periplastidial compartment (i.e. remnant cytoplasm of the endosymbiont), or (iii) interacts
23 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
492 with the nucleomorph (remnants of the green algal endosymbiont nucleus) are key
493 questions in the evolution of eukaryotic RNA viruses. While our data cannot provide
494 answers, it will be of major interest to extend these analyses to euglenophytes and the
495 dinoflagellate genus Lepidodinium where distinct secondary endosymbiosis with green
496 algae have also occurred, as well as to cryptophytes that also contain the remnant nucleus
497 of its red algae endosymbiont57,58.
498
499 First report of a negative-sense RNA virus in algae
500 Our detection of Bunyavirales-like sequence in C. reptans is of importance as it would
501 constitute the first evidence of a negative-sense RNA virus in a microalgae and only the
502 second discovered in eukaryotic microorganisms. Considering the extensive host range
503 attributed to Bunyavirales (from land plants to invertebrates, vertebrates and humans), their
504 association with chlorarachniophyte hosts is plausible. However, considering the poor level
505 of abundance in C. reptans sample, additional work is clearly needed to retrieve full-
506 genome information and to confirm the association of such bunya-like viruses with
507 chlorarachniophytes.
508
509 One of the greatest challenges in viral meta-transcriptomics is our capacity to detect
510 distant homologies, especially in rapidly evolving RNA viruses. As a first attempt to retrieve
511 such ephemeral evolutionary signals, we scanned orphan contigs using RdRp protein
512 structure information in addition to the standard primary amino acid sequence. Notably,
513 both RdRp profile and 3D protein structure comparison led to the identification of highly
514 divergent RNA virus candidates, although these remain difficult to annotate. Efforts to
515 better describe the repertoire of sequence and structure of viral RdRps are central to
516 unveiling the RNA virosphere in overlooked eukaryotic organisms.
517
24 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
518 MATERIAL AND METHODS
519 Algal cultures
520 Algal strains were isolated from marine sand (Microrhizoidea pickettheapsiorum59;
521 Kraftionema allantoideum60; Chlorarachnion reptans and Lotharella sp. from the Wye River,
522 Victoria, Australia) or coral skeletons (Ostreobium sp. HV05007, Kavieng, Papua New
523 Guinea), or obtained from the NIVA Culture Collection of Algae (Dolichomastix tenuilepis,
524 SCCPA strain K-0587). Cultures were maintained in K-enriched seawater medium
525 (transferring every other week) at either 26ºC (Ostreobium) or 16ºC (all others) under cool
526 white LED lights at 1 Photosynthetic active radiation (PAR) (Ostreobium) or 15 PAR (all
527 others). Cultures were pelleted by centrifugation in falcon tubes and stored in RNAlater at -
528 80ºC until RNA extraction.
529
530 Total RNA extractions
531 For total RNA extractions, RNAlater was removed by low centrifugation, algal cells were
532 disrupted using thaw/freezing cycles and bead beating (0.1-0.5mm), and total RNA was
533 further extracted using the Qiagen® RNeasy Plant mini kit following the manufacturer's
534 instructions.
535
536 For the initial using meta-transcriptomic screening, RNAs were pooled into three groups: (i)
537 the chlorophyta Dolichomastix tenuilepis and Microrhizoidea pickettheapsiorum
538 (Mamiellophyceae) and Kraftionema allantoideum (Ulvophyceae) were pooled into
539 metatranscriptome ‘ALG_1’; (ii) the ulvophyte Ostreobium sp. comprised ‘ALG_2’; and (iii)
540 the two chlorarachniophytes Chlorarachnion reptans and Lotharella sp. were pooled into
541 ‘ALG_3’ (Table 1).
542
543 Total RNA Sequencing
25 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
544 RNA quality was assessed and TruSeq stranded libraries were synthetized by the Australian
545 Genome Research Facility (AGRF), using either (i) TruSeq stranded with a eukaryotic rRNA
546 depletion step (RiboZero Gold kit, Illumina) for ALG_2, or (ii) the SMARTer Stranded Total
547 RNA-Seq Kit v2 - Pico Input Mammalian libraries (Takara Bio USA) for ALG_1 and ALG_3
548 (Table S1), due to the low amount of input RNAs in these libraries. The resulting libraries
549 were sequenced on an Illumina HiSeq2500 (paired-end, 100bp) at the AGRF. Library
550 descriptions and RNA-seq statistics are summarized in Table S1.
551
552 In silico processing of meta-transcriptomic data
553 Read depletion and contig assembly
554 The RNA-seq data were first subjected to low-quality read and Illumina adapter filtering
555 using the Trimmomatic program61. Ribosomal RNA was depleted using the SILVA database
556 v3262, which removed between 86 and 94% of the total unfiltered reads (Fig S1A/2). Read-
557 depleted libraries were then de novo assembled using the Trinity program63 and contigs
558 shorter than 200nt were removed (the average length of contig assembly is shown in Fig
559 S1B/2). Contig abundances were calculated from the RNA-seq data using the Expectation-
560 Maximization (RSEM) software64.
561
562 RNA virus detection using BLASTx and BLASTn
563 The similarity of contigs to the current NCBI nucleotide (nt) and protein (nr) databases was
564 determined using the BLASTn and Diamond BLASTx programs65, respectively, employing
565 1e-10 and 1e-05 as e-value cut-offs. RNA virus-like sequences were identified using
566 BLASTx against all RdRp protein sequences available on NCBI/GenBank. False-positive
567 signals for RNA viruses were removed by BLASTing RdRp-like sequences against the nr
568 database and discarding sequences displaying a non-viral sequence as the best hit.
569
26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
570 RNA virus profile-based homology detection
571 To detect especially diverse RdRp-based sequences, orphan contigs (i.e. those with no
572 match in either the nr and nt databases) were compared to the PFAM RdRp-protein profiles
573 ‘MitoVir_RdRp’ (PF05919), ‘Birna_RdRp’ (PF04197), ‘Viral_RdRp_C’ (PF17501), ‘RdRP_1’
574 (PF00680), ‘RdRP_2’ (PF00978), ‘RdRP_3’ (PF00998) and ‘RdRP_4’ (PF02123), as well as
575 to the entire VOG profile database (http://vogdb.org)66 using the HMMer3 program67. To
576 check for false-positive signals, these orphan sequences were submitted to the entire
577 PFAM database.
578
579 3D protein structure prediction of RdRp-like contigs
580 To infer a structural model for the distant RdRp signals detected using profiles, sequences
581 displaying a RdRp-like signal were subjected to the normal mode search of the Protein
582 Homology/analogY Recognition Engine v 2.0 (Phyre2) web portal68. Briefly, this program
583 first compares the submitted amino acid sequence to a curated non-redundant nr20 data
584 set using HHblits69. It then converts the conserved secondary structure information as a
585 query against known 3D-structures using HHsearch70. A final structural modelling step
586 based on identified structural homologies is then performed as described previously68.
587
588 RNA virus sequence analysis and annotation
589 Total non-rRNA reads were mapped onto RNA virus-like contigs using Bowtie2 and
590 heterogeneous coverage and potential mis-assemblies were manually resolved using
591 Geneious v11.1.471.
592
593 Open reading frames (ORFs) were first predicted using getORF from EMBOSS, in which
594 ORFs were defined as regions that are free of stop codons, although partial sequences (i.e.
595 missing start or stop codons) were retained for analysis. Protein domains were then
27 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
596 annotated using the InterProscan software package from EMBL-EBI, comprising ProSite,
597 PFAM, TIGR, SuperFamily, and CDD among others (https://github.com/ebi-pf-
598 team/interproscan).
599
600 Revealing host-virus associations
601 A challenge faced by all metagenomic studies is confidently assigning each viral sequence
602 to a particular host in a given sample. We used algal cultures to minimize the number of
603 potential additional cellular hosts. These cultures were, however, non-axenic, with mainly
604 bacteria present. To evaluate the possibility of additional microeukaryotic cells in the
605 sample, we obtained taxonomic identification for contigs in the metatranscriptome by
606 aligning them to the NCBI nt database using the KMA aligner and the CCMetagen
607 program29,72. Contigs matching an entry in the nt database were displayed as Krona plots
608 and classified based on their taxonomy (utilising high taxonomic levels for clarity).
609
610 Phylogenetic analysis
611 RdRp amino acid sequences were aligned using the L-INS-I algorithm in the MAFFT
612 program73 and trimmed with TrimAI (automated mode). Maximum likelihood phylogenetic
613 trees were then estimated using IQ-TREE, employing ModelFinder to obtain the best-fit
614 model of amino acid substitution in each case, with nodal support assessed using 1000
615 bootstrap replicates and 1000 replicates of SH-like approximate likelihood ratio test (SH-
616 aLRT)74,75. For each tree, reference genomes and corresponding RdRp sequences were
617 retrieved from the NCBI Viral genome resource
618 (https://www.ncbi.nlm.nih.gov/genome/viruses/). To depict the evolutionary relationships of
619 the newly discovered viruses as meaningfully as possible, the closest unclassified BLASTx
620 homologs were used in the phylogenetic analysis. This resulted in alignments of 237, 76,
28 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
621 198, 558 and 325 RdRp protein sequences for Amalga-like, Mito-like, Tombus-like, Virga-
622 like and Bunya-like virus groups, respectively.
623
624 RT-PCR validation
625 Viral contigs were validated experimentally and associated to individual algal sample using
626 Reverse Transcription (SuperScript™ IV reverse transcriptase - Invitrogen™) followed by
627 PCR (Platinum™ SuperFi™ DNA polymerase - Invitrogen™), with specific primer sets for
628 each contig. The rbcL and tufA marker genes were used as PCR positive controls using
629 sets of primers designed in ref.76. All primers used in this study are described in Table S2.
630
631 ACKNOWLEDGMENTS
632 We acknowledge the SIH Bioinformatics Hub and the University of Sydney’s high-
633 performance computing cluster Artemis for providing the computing resources used for this
634 study. We also thank Fabien Burki for providing us source files to build Figure 1.
29 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
635 REFERENCES
636 1. Krupovic M, Prangishvili D, Hendrix RW, Bamford DH. Genomics of bacterial and
637 archaeal Viruses: dynamics within the prokaryotic virosphere. Microbiol Mol Biol Rev
638 2011; 75: 610–635.
639 2. Lang AS, Rise ML, Culley AI, Steward GF. RNA viruses in the sea. FEMS Microbiol
640 Rev 2009; 33: 295–323.
641 3. Krishnamurthy SR, Wang D. Origins and challenges of viral dark matter. Virus Res
642 2017; 239: 136–142.
643 4. Zhang Y-Z, Shi M, Holmes EC. Using metagenomics to characterize an expanding
644 virosphere. Cell 2018; 172: 1168–1172.
645 5. Short SM, Staniewski MA, Chaban Y V., Long AM, Wang D. Diversity of viruses
646 infecting eukaryotic algae. Curr Issues Mol Biol 2020; 39: 29–62.
647 6. Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H et al.
648 Linking virus genomes with host taxonomy. Viruses 2016; 8: 66.
649 7. Richmond A, Hu Q. Handbook of Microalgal Culture: Applied Phycology and
650 Biotechnology: Second Edition. In: Handbook of Microalgal Culture: Applied
651 Phycology and Biotechnology: Second Edition. 2013, pp 1–719.
652 8. Mayer JA, Taylor FJR. A virus which lyses the marine nanoflagellate Micromonas
653 pusilla. Nature 1979; 281: 299–301.
654 9. Brown RM. Algal viruses. Adv Virus Res 1972; 17: 243–277.
655 10. Illergard K, Ardell DH, Elofsson A. Structure is three to ten times more conserved
656 than sequence—A study of structural response in protein cores. Proteins Struct
657 Funct Bioinforma 2009; 77: 499–508.
658 11. Bamford DH, Grimes JM, Stuart DI. What does structure tell us about virus
659 evolution ? Curr Opin Struct Biol 2005; 15: 655–663.
660 12. Chen J, Guo M, Wang X, Liu B. A comprehensive review and comparison of different
30 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
661 computational methods for protein remote homology detection. Brief Bioinform 2018;
662 19: 231–244.
663 13. Singh J, Saxena RC. An Introduction to Microalgae: Diversity and Significance. In:
664 Handbook of Marine Microalgae. Academic Press, 2015, pp 11–24.
665 14. Archibald JM. The evolution of algae by secondary and tertiary endosymbiosis. Adv
666 Bot Res 2012; 64: 87–118.
667 15. Burki F, Roger AJ, Brown MW, Simpson AGB. The new tree of eukaryotes. Trends
668 Ecol Evol 2020; 35: 43–55.
669 16. Guiry MD. How many species of algae are there? J. Phycol. 2012; 48: 1057–1063.
670 17. Chapman RL. Algae: the world’s most important “plants”—an introduction. Mitig
671 Adapt Strateg Glob Chang 2013; 18: 5–12.
672 18. Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF et al.
673 Phylogeny and molecular evolution of the green algae. CRC Crit Rev Plant Sci 2012;
674 31: 1–46.
675 19. Urayama SI, Takaki Y, Nunoura T. FLDS: A comprehensive DSRNA sequencing
676 method for intracellular RNA virus surveillance. Microbes Environ 2016; 31: 33–40.
677 20. Mushegian A, Shipunov A, Elena SF. Changes in the composition of the RNA virome
678 mark evolutionary transitions in green plants. BMC Biol 2016; 14: 68.
679 21. Jackson C, Knoll AH, Chan CX, Verbruggen H. Plastid phylogenomics with broad
680 taxon sampling further elucidates the distinct evolutionary origins and timing of
681 secondary green plastids. Sci Rep 2018; 8: 1–10.
682 22. Fuhrman JA. Marine viruses and their biogeochemical and ecological effects. Nature
683 1999; 399: 541–548.
684 23. Forterre P, Prangishvili D. The major role of viruses in cellular evolution: Facts and
685 hypotheses. Curr Opin Virol 2013; 3: 558–565.
686 24. Short SM, Staniewski MA, Chaban Y V, Long AM, Wang D. Diversity of viruses
31 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
687 infecting eukaryotic algae. Curr Issues Mol Biol 2020; 39: 29–62.
688 25. Cortona A Del, Jackson CJ, Bucchini F, Van Bel M, D’hondt S, Skaloud P et al.
689 Neoproterozoic origin and multiple transitions to macroscopic growth in green
690 seaweeds. Proc Natl Acad Sci USA 2020; 117: 2551–2559.
691 26. Cocquyt E, Verbruggen H, Leliaert F, Clerck O De. Evolution and cytological
692 diversification of the green seaweeds (Ulvophyceae). Mol Biol Evol 2010; 27: 2052–
693 2061.
694 27. Wolf YI, Kazlauskas D, Iranzo J, Lucía-Sanz A, Kuhn JH, Krupovic M et al. Origins
695 and evolution of the global RNA virome. mBio 2018; 9: e02329-18.
696 28. Koga R, Horiuchi H, Fukuhara T. Double-stranded RNA replicons associated with
697 chloroplasts of a green alga, Bryopsis cinicola. Plant Mol Biol 2003; 51: 991–999.
698 29. Marcelino VR, Clausen PTLC, Buchmann JP, Wille M, Iredell JR, Meyer W et al.
699 CCMetagen: comprehensive and accurate identification of eukaryotes and
700 prokaryotes in metagenomic data. Genome Biol 2020; 21: 103.
701 30. Sabanadzovic S, Valverde RA, Brown JK, Martin RR, Tzanetakis IE. Southern tomato
702 virus: The link between the families Totiviridae and Partitiviridae. Virus Res 2009; 140:
703 130–137.
704 31. Koga R, Fukuhara T, Nitta T. Molecular characterization of a single mitochondria-
705 associated double-stranded RNA in the green alga Bryopsis. Plant Mol Biol 1998; 36:
706 717–724.
707 32. Bao Y, Chetvernin V, Tatusova T. Improvements to pairwise sequence comparison
708 (PASC): a genome-based web tool for virus classification. Arch Virol 2014; 159:
709 3293–3304.
710 33. Atkins JF, Loughran G, Bhatt PR, Firth AE, Baranov P V. Ribosomal frameshifting and
711 transcriptional slippage: From genetic steganography and cryptography to
712 adventitious use. Nucleic Acids Res 2016; 44: 7007–7078.
32 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
713 34. Krupovic M, Dolja V V, Koonin E V. Plant viruses of the Amalgaviridae family evolved
714 via recombination between viruses with double-stranded and negative-strand RNA
715 genomes. Biol Direct 2015; 10: 12.
716 35. Zhang T, Jiang Y, Dong W. A novel monopartite dsRNA virus isolated from the
717 phytopathogenic fungus Ustilaginoidea virens and ancestrally related to a
718 mitochondria-associated dsRNA in the green alga Bryopsis. Virology 2014; 462–463:
719 227–235.
720 36. Goh CJ, Park D, Lee JS, Sebastiani F, Hahn Y. identification of a novel plant
721 amalgavirus (Amalgavirus, Amalgaviridae) genome sequence in Cistus incanus. Acta
722 Virol 2018; 62: 122–128.
723 37. Zhan B, Cao M, Wang K, Wang X, Zhou X. Detection and characterization of cucumis
724 melo cryptic virus, cucumis melo amalgavirus 1, and melon necrotic spot virus in
725 cucumis melo. Viruses 2019; 11: 1–15.
726 38. Lee JS, Goh CJ, Park D, Hahn Y. Identification of a novel plant RNA virus species of
727 the genus Amalgavirus in the family Amalgaviridae from chia (Salvia hispanica).
728 Genes and Genomics 2019; 41: 507–514.
729 39. Roossinck MJ. Lifestyles of plant viruses. Philos Trans R Soc B Biol Sci 2010; 365:
730 1899–1905.
731 40. Nibert ML, Ghabrial SA, Maiss E, Lesker T, Vainio EJ, Jiang D et al. Taxonomic
732 reorganization of family Partitiviridae and other recent progress in partitivirus
733 research. Virus Res 2014; 188: 128–141.
734 41. Shi M, Lin X-D, Tian J-H, Chen L-J, Chen X, Li C-X et al. Redefining the invertebrate
735 RNA virosphere. Nature 2016; 540: 539–543.
736 42. Polischuk V, Budzanivska I, Shevchenko T, Oliynik S. Evidence for plant viruses in
737 the region of Argentina Islands, Antarctica. FEMS Microbiol Ecol 2007; 59: 409–417.
738 43. Petrzik K, Vondrák J, Kvíderová J, Lukavský J. Platinum anniversary: virus and lichen
33 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
739 alga together more than 70 years. PLoS One 2015; 10: e0120768.
740 44. Nibert ML, Vong M, Fugate KK, Debat HJ. Evidence for contemporary plant
741 mitoviruses. Virology 2018; 518: 14–24.
742 45. Roossinck MJ. Evolutionary and ecological links between plant and fungal viruses.
743 New Phytol 2019; 221: 86–92.
744 46. Bonfante P. Algae and fungi move from the past to the future. eLife 2019; 8: e49448.
745 47. Ricci F, Rossetto Marcelino V, Blackall LL, Kühl M, Medina M, Verbruggen H.
746 Beneath the surface: Community assembly and functions of the coral skeleton
747 microbiome. Microbiome 2019; 7: 1–10.
748 48. C. R, J. R. Fungi and Their Role in Corals and Coral Reef Ecosystems. In:
749 Raghukumar C. (ed). Biology of Marine Fungi. Springer, Berlin, Heidelberg, 2012, pp
750 89–113.
751 49. Marcelino VR, Verbruggen H. Multi-marker metabarcoding of coral skeletons reveals
752 a rich microbiome and diverse evolutionary origins of endolithic algae. Sci Rep 2016;
753 6: 31508.
754 50. Nerva L, Vigani G, Di Silvestre D, Ciuffo M, Forgia M, Chitarra W et al. Biological and
755 molecular characterization of Chenopodium quinoa mitovirus 1 reveals a distinct
756 small RNA response compared to those of cytoplasmic RNA viruses. J Virol 2019;
757 93: e01998-18.
758 51. Charon J, Grigg MJ, Eden JS, Piera KA, Rana H, William T et al. Novel RNA viruses
759 associated with Plasmodium vivax in human malaria and Leucocytozoon parasites in
760 avian disease. PLoS Pathog 2019; 15: e1008216.
761 52. Zangger H, Ronet C, Desponds C, Kuhlmann FM, Robinson J, Hartley M-A et al.
762 Detection of Leishmania RNA virus in Leishmania parasites. PLoS Negl Trop Dis
763 2013; 7: e2006.
764 53. Grybchuk D, Akopyants NS, Kostygov AY, Konovalovas A, Lye L-F, Dobson DE et al.
34 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
765 Viral discovery and diversity in trypanosomatid protozoa with a focus on relatives of
766 the human parasite Leishmania. Proc Natl Acad Sci 2018; 115: E506–E515.
767 54. Cai G, Myers K, Fry WE, Hillman BI. A member of the virus family Narnaviridae from
768 the plant pathogenic oomycete Phytophthora infestans. Arch Virol 2012; 157: 165–
769 169.
770 55. Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. A molecular timeline for
771 the origin of photosynthetic eukaryotes. Mol Biol Evol 2004; 21: 809–818.
772 56. Parfrey LW, Lahr DJG, Knoll AH, Katz LA. Estimating the timing of early eukaryotic
773 diversification with multigene molecular clocks. Proc Natl Acad Sci U S A 2011; 108:
774 13624–13629.
775 57. Burki F, Keeling PJ. Rhizaria. Curr Biol 2014; 24: R103–R107.
776 58. Gould SB. Algae’s complex origins. Nature 2012; 492: 46–48.
777 59. Wetherbee R, Rossetto Marcelino V, Costa JF, Grant B, Crawford S, Waller RF et al.
778 A new marine prasinophyte genus alternates between a flagellate and a dominant
779 benthic stage with microrhizoids for adhesion. J Phycol 2019; 55: 1210–1225.
780 60. Wetherbee R, Verbruggen H. Kraftionema allantoideum, a new genus and family of
781 Ulotrichales (Chlorophyta) adapted for survival in high intertidal pools. J Phycol 2016;
782 52: 704–715.
783 61. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina
784 sequence data. Bioinformatics 2014; 30: 2114–2120.
785 62. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al. The SILVA
786 ribosomal RNA gene database project: improved data processing and web-based
787 tools. Nucleic Acids Res 2013; 41: D590-6.
788 63. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I et al. Full-length
789 transcriptome assembly from RNA-Seq data without a reference genome. Nat
790 Biotechnol 2011; 29: 644–652.
35 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
791 64. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with
792 or without a reference genome. BMC Bioinformatics 2011; 12: 323.
793 65. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND.
794 Nat Methods 2015; 12: 59–60.
795 66. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC et al. The Pfam
796 protein families database in 2019. Nucleic Acids Res 2019; 47: D427–D432.
797 67. Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol 2011; 7: e1002195.
798 68. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal
799 for protein modeling, prediction and analysis. Nat Protoc 2015; 10: 845–858.
800 69. Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein
801 sequence searching by HMM-HMM alignment. Nat Methods 2012; 9: 173–175.
802 70. Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics
803 2005; 21: 951–960.
804 71. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S et al. Geneious
805 Basic: An integrated and extendable desktop software platform for the organization
806 and analysis of sequence data. Bioinformatics 2012; 28: 1647–1649.
807 72. Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads
808 against redundant databases with KMA. BMC Bioinformatics 2018; 19: 307.
809 73. Katoh K, Standley DM. MAFFT Multiple sequence alignment software version 7:
810 Improvements in performance and usability. Mol Biol Evol 2013; 30: 772–780.
811 74. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: A fast and effective
812 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol
813 2015; 32: 268–274.
814 75. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder:
815 fast model selection for accurate phylogenetic estimates. Nat Methods 2017; 14:
816 587–589.
36 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
817 76. Vieira HH, Bagatini IL, Guinart CM, Vieira AAH, Vieira HH, Bagatini IL et al. tufA gene
818 as molecular marker for freshwater Chlorophyceae. Algae 2016; 31: 155–165.
819 77. Strassert JFH, Jamy M, Mylnikov AP, Tikhonenkov D V, Burki F. New phylogenomic
820 analysis of the enigmatic phylum telonemia further resolves the eukaryote tree of
821 Llfe. Mol Biol Evol 2019; 36: 757–765.
822
37 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
823 FIGURE LEGENDS
824 Fig 1. The enormous diversity of algae contrasts with its poorly-characterized
825 virosphere. (A) Representation of algae supergroups among the diversity of eukaryotes
826 (latest eukaryotic classification retrieved from15). The phylogenetic tree was adapted from77.
827 Pictures illustrate some of the samples used in this study and corresponding clades are
828 marked with “*”. (B) Pictures of algae cultures used in this study. (C) The current extent of
829 the microalgae virosphere. The viral sequence counts for each virus class (DNA or RNA,
830 single-stranded or double-stranded) were retrieved from VirusHostdb6 according to 11
831 major eukaryotic algae lineages. Microalgal lineages investigated in this study are
832 highlighted in bold.
833
834 Fig 2. Abundance of unknown and RNA virus-like contigs detected in the algal
835 libraries. (A) Percentage of non-assigned contigs. For clarity, numbers are normalized as
836 the percentage of total contigs (actual contig numbers are indicated in bold). Blue: number
837 of contigs showing a strong sequence similarity with the nr database (e-value < 1e-05); light
838 grey: contigs showing a weak sequence similarity to the nr database (e-values 1e-05 to 1e-
839 03); middle-dark grey: contigs with no sequence similarity detected by BLASTx/BLASTp
840 but encoding one or more ORFs of more than 200 amino acids (600 nt); dark grey: genomic
841 ‘dark matter’ - contigs without any signal detected or any long ORFs encoded. (B) Total
842 number of RNA virus reads per total number of non-rRNA reads in each library. (C)
843 Distribution of RNA virus diversity in the three libraries and percentage of RNA virus reads
844 associated with each viral super-clade. The host range is represented for each viral clade.
845
846 Fig 3. Genome pairwise identity distributions of the new algal viral sequences. The
847 level of pairwise identity between the viruses newly identified viruses and existing members
848 of each viral family are represented in red. (A) Inter-genus and intra-genus identity levels
38 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
849 within the Amalgaviridae. Inter-genus and intra-genus identity levels are displayed in grey
850 and purple, respectively. (B) Inter-genus and intra-genus identity levels within the
851 Narnaviridae. Inter-genus and intra-genus identity levels are displayed in grey and yellow,
852 respectively. C) Inter-genus and intra-genus identity levels within Tombusviridae. Inter-
853 genus and intra-genus identity levels are displayed in grey and green, respectively.
854
855 Fig 4. Genomic organization of the Partitiviridae, Totiviridae and Amalgaviridae.
856 Possible evolutionary scenarios for the BDRC-like contigs observed in Ostreobium sp.
857
858 Fig 5. RdRp phylogeny of newly identified chlorophyte viruses among the
859 Amalgaviridae, Partitiviridae, Picobirnaviridae and Hypoviridae. Sequences identified in
860 this study are labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For
861 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated
862 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in
863 parenthesis). Left, genomic organizations of new viruses as well as the closest identified
864 relatives representing each family/genus as follows: Cryphonectria hypovirus 2
865 (Hypoviridae); Chicken picobirnavirus (Picobirnaviridae); Southern tomato virus
866 (Amalgaviridae); Cryptosporidium parvum virus 1 (Cryspovirus); Discula destructiva virus 1
867 (Gammapartitivirus); Fig cryptic virus (Deltapartitivirus); Ceratocystis resinifera partitivirus
868 (Betapartitivirus); White clover cryptic virus 1 (Alphapartitivirus). The tree is mid-pointed
869 rooted and branch lengths are scaled according to the number of amino acid substitutions
870 per site.
871
872 Fig 6. Phylogeny of the Narnaviridae-Botourmiaviridae group based on the RdRp.
873 Newly discovered viruses from Ostreobium sp. are highlighted in red. RdRp sequences
874 from unassigned RNA virus retrieved from ref.41 are marked in grey. Right, phylogenetic tree
39 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
875 estimated using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values
876 indicated in parenthesis). Left, genomic organization of both viral genomes identified in this
877 study (red) and representatives of each major family and genus used in the phylogeny
878 (Cassava virus C - Botourmiaviridae; Saccharomyces 23S RNA narnavirus; Chenopodium
879 quinoa mitovirus 1). Annotations of Cassava virus C coding sequences: RdRp (Segment I);
880 Putative movement protein (Segment II); Coat protein (Segment III). Branch lengths are
881 scaled according to the number of amino acid substitutions per site.
882
883 Fig 7. Phylogeny of the Tombusviridae RdRp. The tombus-like sequence identified in this
884 study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For
885 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated
886 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in
887 parenthesis). Left, genomic organizations of new viruses as well as their closest homologs
888 and representatives from each family/genus as follows: Black beetle virus (Nodaviridae);
889 Carrot mottle virus (genus Dianthovirus); Carnation ringspot virus (Dianthovirus); Beet black
890 scorch virus (Betanecrovirus); Cucumber leaf spot virus (Aureusvirus); Maize necrotic streak
891 virus (Tombusvirus); Olive latent virus 1 (Alphanecrovirus); Panicum mosaic virus
892 (Panicovirus); Hibiscus chlorotic ringspot virus (Betacarmovirus); Clematis chlorotic mottle
893 virus (Pelarspovirus); Carnation mottle virus (Alphacarmovirus). The tree is mid-pointed
894 rooted and branch lengths are scaled according to the number of amino acid substitutions
895 per site.
896
897 Fig 8. Phylogeny of the Hepe-Virga group RdRp. The hepe-virga-like sequence identified
898 in this study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For
899 clarity, some families or genera have been. Right, RdRp-based phylogenetic tree obtained
900 using IQ-tree with bootstrap replicates and SH-aLTR set to 1000 (resulting values are
40 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
901 indicated in parenthesis). Left, genomic organizations of new viruses as well as closest
902 homologs and representative from each family/genus as follows: Orthohepevirus A
903 (Hepeviridae); Poinsettia mosaic virus (order Tymovirales); Wheat stripe mosaic virus
904 (Benyviridae); Diatom colony associated dsRNA virus 15 (Endornaviridae); Cabassou virus
905 (Togaviridae); Apple mosaic virus (Bromoviridae); Mint virus 1 (Closteroviridae); Cucumber
906 mottle virus (Virgaviridae). The tree is mid-pointed rooted and branch lengths are scaled
907 according to the number of amino acid substitutions per site.
908
909 Fig 9. Phylogeny of the Bunyavirales RdRp. The bunya-like sequence identified in this
910 study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For
911 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated
912 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in
913 parenthesis). Left, genomic organizations of new viruses as well as their closest homologs
914 and representatives from each family/genus as follows: Melon chlorotic spot virus
915 (Phenuiviridae); Yogue virus (Nairoviridae); Latino mammarenavirus (Arenaviridae); Seattle
916 orthophasmavirus (Phasmaviridae); Melon yellow spot virus (Tospoviridae); Fig mosaic
917 emaravirus (Fimoviridae); Tataguine orthobunyavirus (Peribunyaviridae). The tree is mid-
918 pointed rooted and branch lengths are scaled according to the number of amino acid
919 substitutions per site.
920
41 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
921 FIGURES
A)
B)
Microrhizoidea Dolichomastix tenuilepis Kraftionema Chlorarachnion reptans pickettheapsiorum allantoideum C) ChChlolorroopphhyytta Rhodophyta Glaucophyta ChChlolorarararacchnhnioiopphhyycceeae Dinophyceae dsDNA Bacillariophyta ssDNA Pelagophyceae dsRNA Phaeophyceae ssRNA Raphidophyceae Haptophyta Cryptophyta Euglenoids 30 20 10 0 10 20 30 922
923 Figure 1
924
42 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
A) 100% 90% 80% 963 25001 70% 62709 Dark matter contigs contigs 60% f 8 ORFans (ORF > 200aa) 50% 50 40% 1432 Distant homology (e-value < 1e-03) 1966 1560 3267 centage o Strong homology (e-value < 1e-05)
r 30% e
P 871 20% 31867 14860 10% 0% ALG_1 ALG_2 ALG_3 B) 8
7 Millions
6
5
4 RNA virus reads non RNA virus reads
Read count 3
2
1
0 ALG_1 ALG_2 ALG_3 C) 100% 90%
eads 80% r 70%
virus 60% Bunya-like A 50% Virga-like RN f 40% Amalga-like
30% Mito-like
centage o 20% Tombus-like r e
P 10% 0% ALG_1 ALG_2 ALG_3 925
926 Figure 2
927
43 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
A) 16
14
12
10
8
6
4 2
0 % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 B) 160
140
120
100 80
60
40
20
0 % % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 C) 500 450 400 350 300 250 200 150 100 50 0 % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 928 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96
929 Figure 3
930
44 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
931
Totiviridae
CP RdRp Partitiviridae Partiti-like contigs CP from this study
RdRp Amalgaviridae RdRp CP ?
932 CP RdRp
933 Figure 4
934
45 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
935
Genome length (nt)
99.9/100 45.3/22 100/100 Hypoviridae ALG_2_DN19089_c0_g1_i1_len4252_ORF_1_frame_2_translation_2Partiti-like adriusvirus (ORF1) 99.7/100 ALG_2_DN19300_c0_g1_i1_len3273_ORF_1_frame_1_translation_2Partiti-like lacotivirus (ORF1) 97.6/95 ALG_2_DN19436_consensus_reversed_ORF_1_frame_1_translation_2Partiti-like alassinovirus (ORF1) 8.8/43 74.9/33 BAB63954.1_2_RdRp-like_protein__Bryopsis_cinicola_
100/100
81.8/56 97.9/100
99.7/100 88.9/88
96.7/95 YP_009361996.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-green-monkey_KNA_2015_vertebrates
99.9/100YP_009351841.1_Picobirnaviridae_Picobirnavirus_Otarine-picobirnavirus_segment-2_vertebrates JMYJCC46554_Jingmen_picobirna-like_virus_1_len1632 Picobirnaviridae 83.2/7195.7/93 YP_239361.1_Picobirnaviridae_Picobirnavirus_Human-picobirnavirus_segment-2_human 95.1/78 YP_009553306.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-sp._segment-2_human_vertebrates 90.6/43 89.8/75YP_009551574.1_Picobirnaviridae_Picobirnavirus_Chicken-picobirnavirus_segment-RNA-2_vertebrates 40/41 YP_009241386.1_Picobirnaviridae_Picobirnavirus_Porcine-picobirnavirus_segment-S_vertebrates 88.7/78 YP_009389484.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-dog_KNA_2015_vertebrates YP_009342446.1_Partitiviridae_Botryosphaeria-dothidea-virus-1_segment-1_fungi BAA25883.1_5_RdRp_partial__Bryopsis_mitochondria-associated_dsRNA_ 82.5/74 92.1/90 ALG_2_DN19090_c0_g1_i2_len4036_ORF_2_frame_2_translation_2Amalga-like chaucrivirus (ORF2) 97.8/99 98/99 ALG_2_DN19428_c0_g1_i6_len4011_ORF_2_frame_3_translation_2Amalga-like dominovirus (ORF2) ALG_1_DN2691_c0_g1_i2_len1440_ORF_1_frame_1_translation_2Amalga-like boulavirus, partial (ORF1) 25.4/48 99/83 ALG_2_DN18594_c0_g1_i1_len3399_ORF_2_frame_2_translation_2Amalga-like chassivirus (ORF2) 99.8/100 ALG_2_DN19448_c0_g6_i1_len3254_ORF_2_frame_3_translation_2Amalga-like lacheneauvirus (ORF2) NP_624325.1_Amalgaviridae_Zybavirus_Zygosaccharomyces-bailii-virus-Z_fungi SCM33193_Hubei_partiti-like_virus_59_len3660 84.2/83 99.1/99 YP_009389417.1_Amalgaviridae_Antonospora-locustae-virus-1_fungi
95.5/94 YP_003934623.1_Amalgaviridae_Amalgavirus_Blueberry-latent-virus_land-plants YP_009362302.1_Amalgaviridae_Zostera-marina-amalgavirus-1_land-plants 100/100 100/100 YP_009362304.1_Amalgaviridae_Zostera-marina-amalgavirus-2_land-plants YP_009553344.1_Amalgaviridae_Festuca-pratensis-amalgavirus-2 55.2/57 YP_009447919.1_Amalgaviridae_Amalgavirus_Allium-cepa-amalgavirus-1_land-plants 100/100 YP_009447921.1_Amalgaviridae_Amalgavirus_Allium-cepa-amalgavirus-2_land-plants 45.3/22 32.3/4090.2/87 YP_009552083.1_Amalgaviridae_Phalaenopsis-equestris-amalgavirus-1 98.7/100 YP_009553030.1_Amalgaviridae_Salvia-hispanica-RNA-virus-1_land-plants Amalgaviridae YP_002321509.1_Amalgaviridae_Amalgavirus_Southern-tomato-virus_land-plants 99.5/9999.3/100 YP_009552797.1_Amalgaviridae_Capsicum-annuum-amalgavirus-1 94.4/97 YP_009551562.1_Amalgaviridae_Anthoxanthum-odoratum-amalgavirus-1_land-plants 100/100 YP_009552799.1_Amalgaviridae_Festuca-pratensis-amalgavirus-1 85.7/74 YP_003868436.1_Amalgaviridae_Amalgavirus_Rhododendron-virus-A_land-plants YP_009552085.1_Amalgaviridae_Medicago-sativa-amalgavirus-1 94.8/8794.4/65 YP_009553342.1_Amalgaviridae_Cleome-droserifolia-amalgavirus-1 65.3/43 YP_009551564.1_Amalgaviridae_Camellia-oleifera-amalgavirus-1 61.2/45 YP_009388304.1_Amalgaviridae_Amalgavirus_Spinach-amalgavirus-1_land-plants 90.9/70YP_009552087.1_Amalgaviridae_Erigeron-breviscapus-amalgavirus-1 55.2/71 YP_009552089.1_Amalgaviridae_Erigeron-breviscapus-amalgavirus-2 YP_009553311.1_Partitiviridae_Aspergillus-fumigatus-partitivirus-2_segment-1_fungi 81.9/50 YP_009508065.1_Partitiviridae_Cryspovirus_Cryptosporidium-parvum-virus-1_segment-RNA-1_protozoa 88.5/76 Partitiviridae / Cryspovirus 99.4/100
100/100 YP_008327312.1_Partitiviridae_Ustilaginoidea-virens-partitivirus-2_segment-RNA-1_fungi 84.5/49 93.9/93 81.5/78 Partitiviridae / Gammapartitivirus 66.5/44 97.4/88 96/98 WHZHC73278_Wuhan_large_pig_roundworm_virus_1_RdRp_len1560 22.2/59 100/100 XZSJSC65291_Xinzhou_partiti-like_virus_1_RdRp_len1523 99.4/100 QTM22037_Hubei_partiti-like_virus_57_len1618 95.2/96 9.5/45 QTM26145_Hubei_partiti-like_virus_56_len1642 85.8/49 100/100 Partitiviridae / Deltapartitivirus 100/100 Partitiviridae / Betapartitivirus YP_007419077.1_Partitiviridae_Alphapartitivirus_Rosellinia-necatrix-partitivirus-2_segment-RNA-1_fungi 46.7/59 YP_009362092.1_Partitiviridae_Bipolaris-maydis-partitivirus-1_segment-RNA-1_fungi 100/100 99.4/100 YP_009273018.1_Partitiviridae_Arabidopsis-halleri-partitivirus-1_segment-1_land-plants 97.6/100 YP_009551597.1_Partitiviridae_Alphapartitivirus_Medicago-sativa-alphapartitivirus-1_segment-RNA-1_land-plants 99.4/98 QTM26091_Hubei_partiti-like_virus_27_len1703 92.9/84 99.7/100 QTM20591_Hubei_partiti-like_virus_25_len1855 63.2/69 91.9/84 QTM27118_Hubei_partiti-like_virus_26_len1754 91.1/83 Partitiviridae / Alphapartitivirus 80.3/64
936 0.4
937 Figure 5
938
46 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Mito-like bionusvirus (ORF1) Mito-like spartanusvirus (ORF2) Mito-like picolinusvirus (ORF1) Mito-like babylonusvirus (ORF1) Mito-like albercanusvirus (ORF1) Mito-like laruketanusvirus (ORF1) Mito-like boobanusbagrillonusvirus (ORF1)
939
940 Figure 6
941
47 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Genome length (nt) BHZC37000_Beihai_tombus-like_virus_14_len5160 100/100 97.8/97 WZFSL78689_Wenzhou_tombus-like_virus_16_len4467 87.2/81
BHWZXX15284_Beihai_tombus-like_virus_16_len4822
26.7/55 98.8/90
14.9/33 BHBJDX18752_Beihai_tombus-like_virus_15_len3325
83.4/56 WHBL72297_Hubei_tombus-like_virus_37_len7304 61.3/62 99.8/100 97.8/97 25.9/55 Nodaviridae 98.2/97
JMYJCC68055_Jingmen_tombus-like_virus_1_len4971
WHBL72828_Hubei_tombus-like_virus_31_len4264 59.2/74 100/100
21/54 QTM27120_Hubei_tombus-like_virus_26_len2286
99.5/99 spider124630_Hubei_tombus-like_virus_29_len2550
45.6/59 YP_009552019.1_Tombusviridae_Culex-associated-Tombus-like-virus 100/100 99.9/100 insectZJ94961_Shangao_tombus-like_virus_1_len2299 100/100 40.8/47 SSZZ1762_Hubei_tombus-like_virus_28_RdRp_len2539
82/60 18.2/46 94/87
spider118341_Hubei_tombus-like_virus_14_len4303 100/100 WHYY18977_Hubei_tombus-like_virus_13_len5904 40.8/26 WHYY21098_Hubei_tombus-like_virus_15_len4000 93.5/80 100/100 YP_009553261.1_Tombusviridae_Yongsan-tombus-like-virus-1 90.4/48 99.9/100 mosZJ33874_Wenzhou_tombus-like_virus_11_len4958
90.9/46 83.1/57 BHZY58393_Beihai_tombus-like_virus_9_len4382 100/100 BHZY59908_Beihai_tombus-like_virus_10_len4836 96.4/76 TALG_2_DN18828_c0_g1_i1_len3835_ORF2_translationombus-like chagrupourvirus (ORF2) CJLX30686_Changjiang_tombus-like_virus_17_len3791 17.2/34 YP_009253998.1_Tombusviridae_Umbravirus_Sclerotinia-sclerotiorum-umbra-like-virus-1 94/80 85.1/51 65.2/12 WHSFII11317_Hubei_tombus-like_virus_11_len2724 86.9/29
17.2/18 89.9/89
100/100 100/98 88.7/78
90.6/41 beimix73523_Wenzhou_tombus-like_virus_8_len4683 98.4/90 WHCC112361_Hubei_tombus-like_virus_10_len3851
76.3/34 YP_009551334.1_RdRp__Citrus_yellow_vein-associated_virus_ 82.7/30 100/100 98.7/100 AWS06679.1_RdRP__Ethiopia_maize-associated_virus_ 100/100 Umbravirus
WZFSL85189_Wenzhou_tombus-like_virus_6_len5659 91.1/36 100/100 100/100 Dianthovirus 99/99
83/596.1/27 WHSFII19265_Hubei_tombus-like_virus_2_len5510 Tombusviridae 77.2/5063.3/38 100/100 WHWN54407_Hubei_tombus-like_virus_1_len5430 95.7/83 WGML136364_Hubei_tombus-like_virus_5_len3085 99.2/100 NP_619751.1_Tombusviridae_Avenavirus_Oat-chlorotic-stunt-virus_land-plants 88.5/86 WGML12775_Hubei_tombus-like_virus_6_len3770 83/56 87.6/55
89.3/4770.6/62
86.2/53 100/100 Betanecrovirus 90.6/87 100/100 Aureusvirus 96.2/94 100/100 Tombusvirus 94.7/58 NP_044732.1_Tombusviridae_Gallantivirus_Galinsoga-mosaic-virus 100/100 YP_007517174.1_Tombusviridae_Macanavirus_Furcraea-necrotic-streak-virus_land-plants 90.6/85 100/100 Alphanecrovirus NP_619718.1_Tombusviridae_Machlomovirus_Maize-chlorotic-mottle-virus_land-plants 97.6/90 100/100 100/100 Panicovirus NP_619521.1_Tombusviridae_Gammacarmovirus_Cowpea-mottle-virus_land-plants 100/100 YP_002333476.1_Tombusviridae_Gammacarmovirus_Soybean-yellow-mottle-mosaic-virus_land-plants 99.6/99 93.8/86 SCJXSX39078_Beihai_tombus-like_virus_1_len4145 99.5/100 NP_041226.1_Tombusviridae_Gammacarmovirus_Melon-necrotic-spot-virus_land-plants 98/99 17.3/44 NP_862835.2_Tombusviridae_Gammacarmovirus_Pea-stem-necrosis-virus_land-plants 88.4/86 99.4/98 Betacarmovirus 100/100 Pelarspovirus
92.2/91 YP_004300263.3_Tombusviridae_Trailing-lespedeza-virus-1_land-plants 98.9/99 93.3/91 YP_009270620.1_Tombusviridae_Gompholobium-virus-A_land-plants 96.8/98 Alphacarmovirus
942 0.5
943 Figure 7
944
48 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
82.5/48
YP_009553650.1_Hepeviridae_Rana-hepevirus_vertebrates
YP_009553211.1_Hepeviridae_Sogatella-furcifera-hepe-like-virus 84.5/64 45.8/15 98.2/100 WHCC116187_Hubei_hepe-like_virus_1_len7594 22.5/11 79.7/48 WHCC101555_Hubei_hepe-like_virus_2_len5933 98/96 83.9/66 WZFSL85851_Wenzhou_hepe-like_virus_2_len7652 Genome length (nt)
95/57 99.2/100 Hepeviridae 75.2/52 100/100 Tymovirales NP_062883.2_Matonaviridae_Rubivirus_Rubella-virus_human Segment I 99.7/100 83.9/66 Benyviridae Segment II 100/100 Endornaviridae 84/66 QHD64722.1_RdRp__Plasmopara_viticola_associated_virga-like_virus_1_ 97.3/100 ALG_9_DN7625_c0_g1_i2_len2313_reversed_ORF_2_frame_2_translationVirga-like bellevillovirus (ORF2) 93.6/92 85/77 WLJQ101932_Wenling_toga-like_virus_len9247 96.1/98 100/100 Togaviridae 98.5/98 Segment I 99/100 Bromoviridae Segment II 99.8/100 Closteroviridae Segment III 88.6/82 93.9/54 Virgaviridae mosHB236471_Hubei_virga-like_virus_1_len9141 84.4/62 100/100 mosHB235832_Hubei_virga-like_virus_2_len10650 96.7/100 QHA33736.1_polyprotein__Atrato_Virga-like_virus_2_ 88.2/73 SCM49209_Hubei_virga-like_virus_12_len8318_FS
100/100AMO03256.1_putative_polyprotein__Bofa_virus_ 85.1/87 AWA82276.1_polyprotein__Beult_virus_ 18.6/53 AMO03221.1_putative_polyprotein__Buckhurst_virus_ 87.6/8478.6/73 AMO03254.1_putative_polyprotein__Boutonnet_virus_ 15.8/59 AVK59469.1_replicase_protein__Hubei_virga-like_virus_11_ 100/100 QHA33754.1_polyprotein__Atrato_Virga-like_virus_5_ 99.6/100 YP_009553255.1_replicase__Culex_pipiens_associated_Tunisia_virus_
QDZ71189.1_hypothetical_protein__Megastigmus_ssRNA_virus_ 74.5/51 95.5/73 WHYY22294_Hubei_virga-like_virus_10_len9644 100/100 SCM51642_Hubei_virga-like_viurs_9_len10976
mosHB236537_Hubei_virga-like_virus_21_len10072
100/100QGA87337.1_hypothetical_protein__Hammarskog_virga-like_virus_ 3.5/34 QHA33793.1_polyprotein__Atrato_Virga-like_virus_7_ 99.9/100 QHA33782.1_polyprotein__Atrato_Virga-like_virus_7_ 74.3/52 86.5/59 QCM140901_Hubei_virga-like_virus_22_len4933 100/100 AWK77912.1_nonstructural_polyprotein_partial__Robinvale_plant_virus_1_ 93/83 mosHB236486_Hubei_virga-like_virus_23_len14662 90.5/89 BHHK129576_Beihai_anemone_virus_1_len11540 98/100 SXXX35475_Sanxia_atyid_shrimp_virus_1_len11157
YP_004928118.1_Kitaviridae_Higrevirus_Hibiscus-green-spot-virus-2_segment-RNA-1_land-plants 92.3/72 99.7/100 ATW76030.1_replicase__Hibiscus-infecting_cilevirus_ 100/100 YP_654538.1_Kitaviridae_Cilevirus_Citrus-leprosis-virus-C_segment-RNA-1_land-plants 21.9/49 QFU28437.1_RdRp__Passion_fruit_green_spot_virus_ 83.8/61 YP_009551530.1_Kitaviridae_Blunervirus_Tea-plant-necrotic-ring-blotch-virus_segment-RNA-2_land-plants 100/100 AGI44298.1_RdRp__Blueberry_necrotic_ring_blotch_virus_ 18.3/47 100/100 45.5/73 94.1/77 QGA69815.1_polyprotein__Varroa_destructor_virus_4_ 92/90 88.6/68
BHTH16123_Beihai_barnacle_virus_2_len9154
WGML111093_Hubei_virga-like_viurs_8_len9503
75.4/45 WHYY23932_Wuhan_house_centipede_virus_1_len10322 98.6/99 WHCCII10272_Wuhan_insect_virus_8_len10644 66.7/3724/60 YP_009552459.1_Virgaviridae_Nephila-clavipes-virus-3 85.3/80 WHCCII12598_Hubei_virga-like_virus_4_len7920 50.9/64 91.7/74 WHCCII13328_Wuhan_insect_virus_9_len9305 QTM162107_Hubei_negev-like_virus_1_len2937
AQM55309.1_hypothetical_protein_1__Cordoba_virus_ 100/100 75.1/36 YP_009351833.1_hypothetical_protein_1__Cordoba_virus_ 64.4/76 AQM55490.1_hypothetical_protein_1__Cordoba_virus_ 79.7/29 QTM27262_Hubei_virga-like_virus_7_len9481
CC64601_Hubei_negev-like_virus_2_len6640 71.6/435.2/32 YP_009351835.1_hypothetical_protein_1__Loreto_virus_ 64.7/68 AQM55272.1_hypothetical_protein_1__Big_Cypress_virus_
70.7/40YP_009552739.1_putative_RdRp__Ying_Kou_virus_
97.8/92AQZ55393.1_ORF1__Castlerea_virus_
93/95QBR99594.1_hypothetical_protein__Manglie_virus_
95.4/98AQM55317.1_hypothetical_protein_1__Ngewotan_negevirus_
92.5/91ASO75595.1_ORF1__Negev_virus_ 75.5/74 BBN20799.1_hypothetical_protein_1__West_Accra_virus_ 10/46 QDC23193.1_ORF1__Ngewotan_virus_
945 0.5
946 Figure 8
947
49 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
BHTH41702_Bunya-Arena_Beihai_barnacle_virus_6_len5878 Genome length (nt) 100/100 YYSZX17850_Bunya-Arena_Beihai_blue_swimmer_crab_virus_2_len6364
80.4/88 BALG_9_DN7730_c0_g1_i1_len2818_ORF_2_frame_2_translationunya-like bridouvirus (ORF2) 99/100 APG79310.1_2_RdRp__Shahe_bunya-like_virus_1_ 99.8/100 QDB75018.1_RdRp__Cladosporium_cladosporioides_negative-stranded_RNA_virus_2_
100/100 QGY72640.1_RdRp__Plasmopara_associated_mycobunyavirales-like_virus_3_
97.3/100 QDH86925.1_RdRp_partial__Arenavirus_sp._ 96.3/89 27.1/67 QGY72638.1_RdRp__Plasmopara_associated_mycobunyavirales-like_virus_1_
QGY72642.1_RdRp_partial__Plasmopara_associated_mycobunyavirales-like_virus_5_
92.7/93 BHTH16146_Bunya-Arena_Beihai_barnacle_virus_5_len6887 100/100 QDW81309.1_RdRp__Rhizoctonia_solani_bunya_phlebo-like_virus_1_
97.9/99 SCM50476_Bunya-Arena_Hubei_bunya-like_virus_5_len6161 100/100 SZmix21338_Bunya-Arena_Hubei_bunya-like_virus_6_len5634
98.9/99 WGML135976_Bunya-Arena_Hubei_bunya-like_virus_4_len6492
96.9/98 YP_009422199.1_Negarnaviricota_Coguvirus_Citrus-coguvirus_segment-RNA-1_land-plants 100/100 YP_009667028.1_Negarnaviricota_Phenuiviridae_Laulavirus_Laurel-Lake-laulavirus_segment-L_invertebrates 98/96 85.9/84 Phenuiviridae
99.7/99
13.5/3962/28
90.4/32 94.4/45
WP_076272294.1_pentapeptide_repeat-containing_protein__Paenibacillus_odorifer_ 32.7/18 37.6/54 SHWC0209c12636_Bunya-Arena_Shahe_bunya-like_virus_3_len9961 100/100 99.6/86 94.8/79 SHWC0209c12661_Bunya-Arena_Shahe_bunya-like_virus_4_len6195
YP_009666319.1_Negarnaviricota_Wupedeviridae_Wumivirus_Millipede-wumivirus 93.1/90 100/100 Nairoviridae
BHNXC41490_Bunya-Arena_Beihai_sipunculid_worm_virus_7_RdRp_len11709
96.7/98 WGML140132_Bunya-Arena_Hubei_myriapoda_virus_5_RdRp_len9874 96.3/89 80.8/84 99.3/100 Arenaviridae
100/100
BHJJX25757_Bunya-Arena_Beihai_hermit_crab_virus_2_RdRp_len9038 95.6/99 WGML134260_Bunya-Arena_Hubei_bunya-like_virus_11_len7824 89.3/94 99.7/100 100/100 Phasmaviridae
ZCM20815_Bunya-Arena_Hubei_orthoptera_virus_2_RdRp_len8763
53.3/75 WGML112345_Bunya-Arena_Hubei_bunya-like_virus_7_RdRp_len7402 98.1/98 100/100 WGML140341_Bunya-Arena_Hubei_myriapoda_virus_6_RdRp_len7470 18.3/39
BHJP59844_Bunya-Arena_Beihai_bunya-like_virus_5_len7842
9/28 SCM26604_Bunya-Arena_Hubei_bunya-like_virus_12_len4484 98.7/97 75.1/74 100/100 Hantaviridae
YP_009553307.1_Negarnaviricota_Athtab-bunya-like-virus_segment-L_invertebrates
99.1/94 100/100 Tospoviridae
93.5/98 WLJQ100911_Bunya-Arena_Wenling_crustacean_virus_9_RdRp_len6691 93.8/93 100/100 Fimoviridae 95.1/89 100/100 Peribunyaviridae
0.5 948
949 Figure 9
950
951
50 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
952 SUPPLEMENTARY FIGURE LEGENDS
953 Fig S1. Gel electrophoresis of RT-PCR to validate the putative new RNA viruses in
954 each sample used for library construction and RNA-seq. RT-PCR validation in (A)
955 ALG_1; (B) ALG_2; (C) ALG_3. rbcL: ribulose-bisphosphate carboxylase gene, tufA:
956 elongation factor Tu, both used as positive PCR-controls. K.a: Kraftionema allantoideum;
957 M.p: Microrhizoidea pickettheapsiorum ; D.t :Dolichomastix tenuilepsis.
958
959 Fig S2. MAFFT-alignment of the ALG_3_DN34624-encoded ORF with the viral RdRp
960 sequences that compose PFAM RdRp_4 profile.
961
962 Fig S3. MAFFT-alignment of the ALG_2_DN19089-encoded ORF with the PFAM
963 RdRp_1 profile entries.
964
965 Fig S4. rRNA depletion and contig assembly results. (A) Read counts before and after
966 trimming and SortmeRNA rRNA depletion. (B) Trinity assembly quality expressed as the
967 average length of contig assembled for each library.
968
969 Fig S5. Relative abundance of taxonomically-assigned contigs. Krona graphs obtained
970 using KMA and CCMetagen methods for (A) ALG_1, (B) ALG_2 and (C) ALG_3 libraries. The
971 ORFans contigs (i.e. those without any match in nt database) are not shown here. Each
972 circle represents a taxonomic level which is grouped and coloured based on their
973 taxonomic classification at lower ranks.
974
975 Fig S6. Genomic organization of the ALG_2 Partiti-like sequences and Bryopsis
976 mitochondria-associated dsRNA. ORFs are predicted using the mold protozoan
51 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
977 mitochondrial genetic code (translation table 4). Ribosomal -1 frameshifting motifs
978 “GGAUUUU” are indicated in red boxes.
979
980 SUPPLEMENTARY TABLES
981 Table S1. RNAseq library preparation and sequencing results.
Library ID Library prep protocol Nb paired-end reads Data yield (bp)
ALG_1 SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian 21,014,207 5.30 Gb
ALG_2 TruSeq stranded/RiboZero gold 25,963,002 6.54 Gb
ALG_3 SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian 18,448,371 4.65 Gb
982
983 Table S2. List of primers used in this study. tufA primers have been retrieved from 76.
Sequence ID Primer ID Sequence Direction Product Size
1.1F CATTGCGTGAGTGGTATGCG forward Hypothetical amalga-like 867 (ALG_1_DN2506) 1.1R GTGCGCCTCTTAGCTATTGT reverse
1.2F CGCTCATTGCACTAAAGATGACA forward Amalga-like boulavirus 1112 1.2R TCAACCGTGTAAGGAGTTTGA reverse
2.1F GCTGCCGCTCACAATAGAGA forward Amalga-like chassivirus 2616 2.1R GGGTAACGGGGTCCTGTATG reverse
2.2F CCAGCCGATTGATCTCGACA forward Tombus-like chagrupourvirus 2360 2.2R ATGGACTTGTAATGGGCGCA reverse
2.3F GATTGACGCAGAACACGACG forward Amalga-like chaucrivirus 2105 2.3R ACAGACTCCGCCTCGAGTAA reverse
2.4F CCTGAGGGCACTTCTGTGTT forward 2000
52 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Mito-like babylonusvirus 2.4R CCATCATACGTTGGCGGAGA reverse
2.5F AGCAGAAGGAGGTCTCGCTA forward Mito-like bionusvirus 2311 2.5R TTTCGCTAAGGGGAACGACC reverse
2.6F GATCGCCTCCGTCTACATGG forward Mito-like spartanusvirus 2032 2.6R TTTGGACAGTCGGTACCGTC reverse
2.7R TGCAAGCTTCCCACCTTGAT reverse Mito-like laruketanusvirus 2070 2.7F GGAAAATCGCCAAAGCAGGG forward
2.8R GGCCTTTTAGAAAGCCAGCG reverse Mito-like boobanusbagrillonusvirus 2154 2.8F CCGGATGCCTCTTACGTGAA forward
2.9F GTTGCTGAGGGTCTTGTCGA forward Amalga-like dominovirus 2809 2.9R ACGATTTTCGAGACGCCTGT reverse
2.10F CTGGCAGCTACACACGAGAA forward Amalga-like lacheneauvirus 2675 2.10R GCGAATCAGTTGTTCCCGGA reverse
2.11F GGATGCGACCGTTGAATGC forward Partiti-like alassinovirus 2867 2.11R TCACTAGGGCACTCCACGA reverse
2.12F GGTTTGAACGCGGGGTAGA forward Partiti-like adriusvirus 3357 2.12R TCGAATTCATCAGTGGGCA reverse
2.13F TTTCAATGGCGGCGAGTCA forward Partiti-like lacotivirus 2503 2.13R CGGCCATATCTTCATCGTCCA reverse
2.14F CAGAACGTTGTTCCGGCTTG forward Mito-like picolinusvirus 2114 2.14R TCCCCGAACGTAAGGATCCT reverse
2.15F CCGCGGCCCAATCAAATATG forward Mito-like albercanusvirus 1973 2.15R CTGGGTAAGCACGCATTTCG reverse
53 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
9.1F AAGTTGTGTGATAGGGGGCG forward Virga-like bellevillovirus 1741 9.1R TAAACATCCACACGCCGTCA reverse
9.2F TGGCTGCACTAACCCTCATG forward Bunya-like bridouvirus 2049 9.2R GCTGTGTTGCTTGCATCCAA reverse
rbcL_F CGTGCDATGCAYGCNGTWAT forward rbcL 275 rbcL_R TGCCAWACRTGAATACCACC reverse
tufA_F GGNGCNGCNCAAATGGAYGG forward tufA 758-901 tufA_R CCTTCNCGAATMGCRAAWCGC reverse
984
54