<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Meta-transcriptomic detection of diverse and divergent

2 RNA in green and chlorarachniophyte

3

4

5 Justine Charon1, Vanessa Rossetto Marcelino1,2, Richard Wetherbee3, Heroen Verbruggen3,

6 Edward C. Holmes1*

7

8

9 1Marie Bashir Institute for Infectious Diseases and Biosecurity, School of and

10 Environmental Sciences and School of Medical Sciences, The University of Sydney,

11 Sydney, .

12 2Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical

13 Research, Westmead, NSW 2145, Australia.

14 3School of BioSciences, University of Melbourne, VIC 3010, Australia.

15

16

17 *Corresponding author:

18 Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and

19 Environmental Sciences and School of Medical Sciences,

20 The University of Sydney,

21 Sydney, NSW 2006, Australia.

22 Tel: +61 2 9351 5591

23 Email: [email protected]

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

24 Abstract

25 Our knowledge of the diversity and evolution of the virosphere will likely increase

26 dramatically with the study of microbial , including the microalgae in few RNA

27 viruses have been documented to date. By combining meta-transcriptomic approaches

28 with sequence and structural-based homology detection, followed by PCR confirmation, we

29 identified 18 novel RNA viruses in two major groups of microbial algae - the chlorophytes

30 and the chlorarachniophytes. Most of the RNA viruses identified in the green algae class

31 were related to those from the families and that

32 have previously been associated with , suggesting that these viruses have an

33 evolutionary history that extends to when their host groups shared a common ancestor. In

34 contrast, seven ulvophyte associated viruses exhibited clear similarity with the

35 that are most commonly found in fungi. This is compatible with horizontal transfer

36 between algae and fungi, although mitoviruses have recently been documented in plants.

37 We also document, for the first time, RNA viruses in the chlorarachniophytes, including the

38 first observation of a negative-sense (bunya-like) RNA virus in microalgae. The other virus-

39 like sequence detected in chlorarachniophytes is distantly related to those from the

40 virus family , suggesting that they may have been inherited from the secondary

41 endosymbiosis event that marked the origin of the chlorarachniophytes. More

42 broadly, this work suggests that the scarcity of RNA viruses in algae most likely results from

43 limited investigation rather than their absence. Greater effort is needed to characterize the

44 RNA viromes of unicellular eukaryotes, including through structure-based methods that are

45 able to detect distant homologies, and with the inclusion of a wider range of eukaryotic

46 microorganisms.

47

48 Author summary

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

49 RNA viruses are expected to infect all living on Earth. Despite recent

50 developments in and the deployment of large-scale sequencing technologies, our

51 understanding of the RNA virosphere remains anthropocentric and largely restricted to

52 human, livestock, cultivated plants and vectors for . However, a broader

53 investigation of the diversity of RNA viruses, especially in , is expected to answer

54 fundamental questions about their origin and long-term evolution. This study first

55 investigates the RNA virus diversity in unicellular algae taxa from the phylogenetically

56 distinct ulvophytes and chlorarachniophytes taxa. Despite very high levels of sequence

57 divergence, we were able to identify 18 new RNA viruses, largely related to plant and fungi

58 viruses, and likely illustrating a past history of horizontal transfer events that have occurred

59 during RNA virus evolution. We also hypothesise that the sequence similarity between a

60 chlorarachniophyte-associated virga-like virus and members of Virgaviridae associated with

61 plants may represent inheritance from a secondary endosymbiosis event. A promising

62 approach to detect the signals of distant virus homologies through the analysis of protein

63 structures was also utilised, enabling us to identify potential highly divergent algal RNA

64 viruses.

65

66 INTRODUCTION

67 Viruses are likely to infect every cellular and play fundamental roles in biosphere

68 diversity, evolution, and ecology. Studies of the global virosphere performed to date have

69 revealed marked heterogeneities in virus composition. For example, while RNA viruses are

70 commonplace in eukaryotes, they are less often found in , with only two families

71 described to date, and are yet to be conclusively identified in . Rather, both the

72 bacteria and Archaea are dominated by DNA viruses1,2. It is unclear, however, whether the

73 highly skewed distribution of viruses reflects fundamental biological, cellular or ecological

74 factors of the hosts in question, or because RNA viruses in bacteria and Archaea are often

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

75 so divergent in sequence that they are difficult to detect using primary sequence

76 comparisons alone.

77

78 Despite greatly increased virus sampling following the advent of metagenomic next-

79 generation sequencing, our picture of the virosphere remains largely restricted to bacteria,

80 vertebrates and plants3,4. Clearly, such a sampling bias will also impact our knowledge of

81 the fundamental patterns and processes of virus origins and evolution. A good example of

82 this major knowledge bias are the unicellular eukaryotes, grouped under the term “protists”,

83 and particularly the microalgae: only 61 distinct viruses have been formally recognized in

84 these taxa5, with only 82 viral sequences classified as infecting eukaryotic microalgae6. This

85 represents only 0.6% of the total 14,679 viral sequences listed on the Viral-Host database

86 (release April 2020), although the true number of microalgal species is estimated to exceed

87 300,0007.

88

89 Despite early attempts, and the first algal virus cultivation in 19798,9, the isolation and

90 characterization of phycoviruses (i.e. algal viruses) has been constrained by the difficulty in

91 cultivating both the algae and their viruses, as well as the inherent challenges in identifying

92 RNA viruses that are highly divergent in primary sequence. Indeed, because RNA viruses

93 are the fastest evolving entities described, phylogenetic signal is rapidly lost over

94 evolutionary time. Hence, it is possible that the low number of algal RNA viruses detected

95 to date simply reflects the fact that they are highly divergent in sequence, even in the

96 canonical RNA-dependent RNA polymerase (RdRp), and hence refractory to detection

97 using primary sequence similarity. Importantly, protein structures are expected to be an

98 order of magnitude more conserved than amino acid sequences10. As a consequence, the

99 study of conserved secondary or tertiary structures could help identify distant homologies

100 among RNA viruses over extended evolutionary time-scales11,12. Accordingly, the analysis of

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

101 conserved protein structures may constitute a promising approach to identify novel viruses

102 within the microalgae.

103

104 Microalgae are a polyphyletic group of microscopic unicellular photosynthetic organisms

105 distributed across diverse branches of the eukaryotic phylogeny. With 72,500 species

106 discovered to date and organised into the TSAR (Telonemia, Stramenopiles, Alveolates,

107 Rhizaria), Archaeplastida, Haptista, Cryptista and “excavates” supergroups13–15, microalgae

108 constitute a huge source of genetic diversity16. In addition to their wide range of habitats,

109 their ubiquity and the diversity of genomic features characterised to date (with linear,

110 circular, segmented, non-segmented, single-strand and double-strand ) suggest

111 that microalgae will harbour an enormous untapped source of viral diversity. Due to the

112 ancient nature of eukaryotic algae (ca. 1.8 billion years), their involvement in secondary

113 endosymbiosis events involving many branches of the eukaryotic phylogeny, and

114 that microalgae constitute a primary food source for marine and freshwater food chain, it is

115 also possible that algal viruses played a crucial role in the early events of virus

116 evolution5,17.

117

118 The algal viruses documented to date are dominated by those with DNA genomes: 55 of

119 the 82 algal virus sequences available at VirusHostdb. These include the well-known giant

120 viruses, the majority of which (53%) have been described in the green algae ().

121 This DNA-dominated virome of green algae contrasts with those of their sister-group, the

122 land plants18, for which 60% of the 3590 viral sequences are RNA viruses (i.e. the

123 “”; VirusHostdb, April 2020 release). In marked contrast, 85% of the 27 algae RNA

124 viruses described to date have been identified in diatoms19. Even the few algal RNA viruses

125 characterised to date display impressive diversity, belonging to the families ,

126 , , , , Narnaviridae and .

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

127 However, it is currently unclear whether these seemingly differing distributions of DNA and

128 RNA viruses reflect a major switch in virus composition that occurred during the expansion

129 of land plants or is simply indicative of the inherent limitations in sampling and investigation

130 described above20. As such, revealing the pattern and extent of RNA virus diversity in algae

131 may have profound consequences for understanding the long-term processes that have

132 shaped virus diversity and evolution.

133

134 Herein, we characterised more of the RNA virosphere in cultured samples of two clades of

135 microalgae: (i) the green algae (Chlorophyta), that are part of the Archaeplastida eukaryotic

136 supergroup, and (ii) the chlorarachniophytes, a lineage of Rhizaria that obtained a

137 chloroplast through secondary endosymbiosis of a green alga21. To this end we performed

138 an unbiased (i.e. bulk RNA-sequencing) meta-transcriptomic analysis with an emphasis on

139 detecting remote signals of homology in the RdRp, the hallmark of RNA viruses, using

140 protein-profile based approaches. The comparison of the viromes of these two distant

141 groups will help answer the following key questions: (i) is the long-term divergence between

142 the two algae taxa also reflected in their RNA virome compositions? (ii) does their RNA

143 virome provide evidence for complex evolutionary histories, including horizontal transfer

144 events? (iii) is the RNA/DNA virus bias observed between algae and green plants an artefact

145 of sampling or reflect a more fundamental biological division? More generally, we aimed to

146 broaden our understanding of the biodiversity of algal viruses, with the expectation that the

147 characterization of the algal virosphere will have important implications for understanding

148 and managing the roles they play in global element cycling, climate forcing and

149 biotechnology. Indeed, viruses are important in regulating microalgae populations and may

150 provide a key reservoir for genetic novelty22,23.

151

152 RESULTS

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

153

154 The RNA viromes of two divergent groups of microalgae

155 Our aim was to determine the RNA viromes of six microalgae cultures from six different

156 algal species classified into two highly phylogenetically distinct algal clades: the

157 chlorarachniophytes (Rhizaria) and the green algae (Chlorophyta, Archaeplastida) (Fig 1A).

158 To the best of our knowledge, this is the first identification of viruses in the

159 Chlorarachniophyceae and Ulvophyceae (Chlorophyta) classes (Fig 1C) 24.

160

161 While our previous limited understanding of viruses infecting chlorarachniophytes can be

162 explained by the small number of species from this clade characterized to date (only 15 in

163 Algaebase), the Ulvophyceae is an abundant and diverse algal lineage in existence since

164 the late Proterozoic and comprises at least 1,933 species25. It contains a wide range of

165 morphologies from unicellular benthic algae to large seaweeds26 and its representatives

166 commonly occur in marine, terrestrial and freshwater habitats18. We therefore performed

167 RNA sequencing (meta-transcriptomics) on six microalgal species belonging to both the

168 green algae (classes Ulvophyceae and Mamiellophyceae) and chlorarachniophytes (Table

169 1).

170

171 Table 1. Sample and library description.

Library Species Class

Kraftionema allantoideum Ulvophyceae/Kraftionemaceae

ALG_1 Microrhizoidea pickettheapsiorum Mamiellophyceae/Dolichomastigaceae

Dolichomastix tenuilepis Mamiellophyceae/Dolichomastigaceae

ALG_2 Ostreobium sp. HV05007bc Ulvophyceae/

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Chlorarachnion reptans Chlorarachniophyceae/Chlorarachnion ALG_3 Lotharella sp Chlorarachniophyceae/Lotharella

172

173 Because of variable levels of RNA quality and quantity, fewer non-rRNA reads were

174 obtained for the ALG_1 and ALG_3 libraries, which also likely explains the reduced average

175 length of contigs in these libraries compared to ALG_2.

176

177 In total, we identified 18 new putative viral sequences using a standard sequence similarity

178 search among the three libraries, largely comprising those with double-stranded (ds) RNA

179 or single-stranded positive-sense (ssRNA(+)) genomes (Table 2). A divergent bunya-like

180 partial sequence was also retrieved from Chlorarachnion reptans and may constitute the

181 first negative-sense (ssRNA(-)) virus identified in microalgae. Importantly, the presence of

182 these viruses was validated by RT-PCR on each total extracted RNA (Fig S1). In each case

183 these viruses exhibited very low levels of sequence similarity to existing RdRp amino acid

184 sequences, with sequence identities ranging from only 27 to 38%. Most of the viral

185 sequences discovered in this study likely correspond to full-length genomes, given that the

186 length and genomic organization ( numbers, length, etc) is similar to the well-annotated

187 reference homologs. Thus, these sequences are very likely true viruses rather than

188 endogenous viral elements (EVEs) inserted into host genomes.

189

190 Table 2. BLASTx results of the newly described virus-like sequences against nr database.

New virus Length Count BLASTx hit % e-value BLASTx Hit

(algal host species) nt (%read) ID

Amalga-like boulavirus 1440 1427 BAA25883Rd 31 4.6E-23 BDRM

(K. allantoideum)) (0.16%) Rp (dsRNA)

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Amalga-like chassivirus 3399 1471 BAA25883Rd 28 1.8E-38 BDRM Partitiviridae

(Ostreobium sp.) (0.02%) Rp (dsRNA)

Amalga-like chaucrivirus 4036 16239 BAA25883Rd 33 3.3E- BDRM Partitiviridae

(Ostreobium sp.) (0.20%) Rp 103 (dsRNA)

Amalga-like dominovirus 4011 1974 BAA25883Rd 33 5.1E-88 BDRM Partitiviridae

(Ostreobium sp.) (0.04%) Rp (dsRNA)

Amalga-like 3254 875 BAA25883Rd 27 3.5E-39 BDRM Partitiviridae

lacheneauvirus (0.01%) Rp (dsRNA)

(Ostreobium sp.)

Partiti-like alassinovirus 3,658 3459 BAB63954 29 1.6E-48 BDRC Bryopsis

(Ostreobium so.) (0.14%) cinicola*

(dsRNA)

Partiti-like lacotivirus 3273 3074 BAB63954 29 2.5E-45 BDRC Bryopsis

(Ostreobium so.) (2.60%) cinicola*

(dsRNA)

Partiti-like adriusvirus 4252 4053 BAB63954 23 5.7E-18 BDRC Bryopsis

(Ostreobium so.) (0.14%) cinicola*

(dsRNA)

Mito-like babylonusvirus 2942 8552 APG77166Rd 38 9E-42 Shahe narna- Unclassified

(Ostreobium sp.) (0.11%) Rp like virus 6 RNA virus

(ssRNA)

Mito-like albercanusvirus 2791 5115 APG77166Rd 39 7.2E-41 Shahe narna- Unclassified

(Ostreobium sp.) (0.06%) Rp like virus 6 RNA virus

(ssRNA)

Mito-like spartanusvirus 2684 14494 ASM94099R 38 2E-32 Barns Ness Narnaviridae

(Ostreobium sp.) (0.18%) dRp serrated (ss+RNA)

wrack narna-

like virus 3

Mito-like laruketanusvirus 2928 11430 APG77166.1 36 5.8E-33 Shahe narna- Unclassified

(Ostreobium sp.) (0.17%) RdRp like virus 6 RNA virus

(ssRNA)

Mito-like 2773 1939 APG77166Rd 34 6.1E-40 Shahe narna- Unclassified

boobanusbagrillonusvirus (0.03%) Rp like virus 6 RNA virus

(Ostreobium sp.)

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Mito-like picolinusvirus 2714 13040 YP 34 4.1E-33 Botrytis Narnaviridae

(Ostreobium sp.) (0.10%) 00228433Rd cinerea (ss+RNA)

Rp 1

Mito-like bionusvirus 3260 7333 AXY40442Rd 27 5.7E-13 Rhizophagus Narnaviridae

(Ostreobium sp.) (0.09%) Rp diaphanum (ss+RNA)

mitovirus 1

Tombus-like 3835 6272 YP 36 5.1E-45 Hubei Unclassified

chagrupourvirus (0.08%) 009336735hy tombus-like RNA virus

(Ostreobium sp.) pot. protein 2 virus 12

Virga-like bellevillovirus 2313 172.66 AMO03254 29 4.4E-44 Boutonnet Unclassified

(C. reptans) (0.01%) polyprotein virus ssRNA virus

(ssRNA)

Bunya-like bridouvirus (C. 2818 192 APG79310Rd 30 9.1E-68 Shahe Unclassified

reptans) (0.01%) Rp bunya-like RNA virus

virus 1

191 BDRM: Bryopsis mitochondria-associated dsRNA; BDRC: Bryopsis cinicola chloroplast-

192 associated dsRNAs; *likely viral RdRp mis-annotated as a host protein.

193

194 Eight of the viruses identified in Ostreobium sp. fell within the Narnaviridae or

195 Tombusviridae. In contrast, five viral sequences from Ostreobium sp. and K. allantoideum

196 do not fit into defined taxonomic groups, and were instead related to the broad set of

197 ‘partiti-like’ viruses that comprise the Partitiviridae, Totiviridae and Amalgaviridae. Finally,

198 more divergent but detectable sequence similarities to the Virgaviridae (+ssRNA) were

199 obtained for samples from the chlorarachniophyte library.

200

201 Detection of divergent viruses using protein structural data

202 An additional attempt to detect even more divergent RNA viruses was conducted was using

203 protein structure. In particular, it is possible that highly divergent viruses are part of the

204 unknown sequences (i.e. contigs with no match in nt/nr databases, or the ‘dark matter’) that

205 comprise between 50-60% of total contigs obtained in this study (Fig 2A). Accordingly, we

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

206 attempted to detect evolutionary-conserved features of protein structural and functional

207 motifs in orphan sequences that encode unknown ORFs with a minimum of 200 amino

208 acids (600 nucleotides) (Fig 2A). The corresponding translated ORFs were submitted to the

209 protein-profile based program HMMer3 and compared with profiles from the PFAM RdRp

210 clan and the VOG databases. To exclude false-positives, all positive hits were compared to

211 the entire PFAM database. This resulted in the identification of three non-phage contigs

212 that displayed RdRp homology: ALG_2_DN19089, ALG_2_DN594 and ALG_3_DN34624

213 (Table 3).

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

214 Table 3. Results of the VOGdb and PFAM HMM analysis. Light purple: phage-like sequences. Grey: Non-viral sequences; Light blue: DNA

215 virus-like sequences; Orange: RNA virus-like sequences.

Contig name ORF Abund. Viral hit E-value Viral hit description Viral-like hit taxonomy PFAM hit ID E-value2 PFAM hit description ALG_2_DN19089_c0_g1_i1_len4252 ORF_1 4717 VOG03062 1.00E-12 REFSEQ_hypothetical_protein - - - ALG_2_DN19089_c0_g1_i1_len4252 ORF_1 4717 PF00680.20 2.70E-05 RNA dependent RNA polymerase RdRP-1 - - - ALG_2_DN19250_c2_g3_i5_len1869 ORF_1 743.91 VOG23558 1.00E-04 REFSEQ_hypothetical_protein ; - - - ALG_2_DN18568_c0_g1_i1_len2977 ORF_1 511 VOG10478 1.30E-06 sp|Q05224|VG18_BPML5_Gene_18_protein Bacteriophage - - - ALG_2_DN19013_c0_g1_i2_len1689 ORF_2 224.31 VOG22975 1.40E-04 REFSEQ_carboxylesterase Caudovirales;Siphoviridae - - - ALG_2_DN19410_c0_g2_i6_len1950 ORF_1 183.94 VOG12013 5.20E-04 sp|P03778|Y06_BPT7_Protein_0.6B Viruses PF16752.5 1.50E-04 Tubulin-specific chaperone C ALG_2_DN18226_c0_g1_i1_len1259 ORF_1 157.61 VOG09820 2.90E-04 REFSEQ_hypothetical_protein ; - - - ALG_2_DN18993_c2_g2_i2_len2532 ORF_1 151.17 VOG08344 8.50E-08 REFSEQ_hypothetical_protein Bacteriophage PF13424.6 3.80E-132 Tetratricopeptide repeat ALG_3_DN34624_c0_g1_i1_len2077 ORF_2 146 PF02123.16 8.50E-05 Viral RNA-directed RNA-polymerase RdRP-4 - - - ALG_2_DN18744_c0_g1_i3_len2080 ORF_1 134.99 VOG10472 4.10E-04 REFSEQ_hypothetical_protein - - - ALG_2_DN19214_c2_g1_i7_len1432 ORF_2 118 VOG06927 6.90E-04 REFSEQ_hypothetical_protein Bacteriophage - - - ALG_3_DN25592_c0_g1_i1_len1043 ORF_1 115 VOG01256 4.40E-04 sp|Q9QU29|ORF3_TTVB1_Uncharacterized_ORF3_protein dsDNA_viruses - - - ALG_2_DN18451_c0_g1_i4_len2061 ORF_4 109.48 VOG17696 7.10E-04 REFSEQ_hypothetical_protein Bacteriophage PF16058.5 1.80E-17 Mucin-like ALG_2_DN18451_c0_g1_i4_len2061 ORF_4 109.48 VOG17696 7.10E-04 REFSEQ_hypothetical_protein Bacteriophage PF16058.5 1.10E-07 Mucin-like ALG_2_DN18732_c0_g2_i2_len2454 ORF_2 99.65 VOG09815 1.70E-15 REFSEQ_hypothetical_protein Phycodnaviridae;Chlorovirus - - - ALG_2_DN18957_c0_g1_i1_len1408 ORF_2 97.98 VOG02199 8.40E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 PF06156.13 4.70E-04 Initiation control protein YabA ALG_2_DN18957_c0_g1_i2_len1657 ORF_2 89.04 VOG02199 3.50E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 Ortervirales PF06156.13 2.20E-04 Initiation control protein YabA ALG_2_DN19250_c2_g3_i3_len2203 ORF_1 71.48 VOG23558 7.80E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - ALG_2_DN19410_c0_g2_i7_len1634 ORF_1 49.51 VOG12013 9.90E-04 sp|P03778|Y06_BPT7_Protein_0.6B Bacteriophage - - - ALG_2_DN11543_c0_g1_i1_len842 ORF_1 42 VOG20356 3.80E-04 REFSEQ_hypothetical_protein Caudovirales; - - - ALG_2_DN19463_c5_g2_i1_len765 ORF_1 37.7 VOG24589 2.60E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - ALG_2_DN19174_c0_g2_i18_len938 ORF_1 36.39 VOG10625 9.70E-04 sp|Q05293|VG78_BPML5_Gene_78_protein Bacteriophage - - - ALG_2_DN41289_c0_g1_i1_len750 ORF_1 36 VOG06662 9.50E-05 REFSEQ_Cupin Bacteriophage - - - ALG_2_DN22182_c0_g1_i1_len820 ORF_1 35 VOG02199 1.50E-04 sp|Q5UR09|YR648_MIMIV_Uncharacterized_protein_R648 Ortervirales PF06156.13 7.20E-04 Initiation control protein YabA ALG_2_DN44027_c0_g1_i1_len815 ORF_1 33.98 VOG21678 3.70E-04 REFSEQ_hypothetical_protein Caudovirales;Myoviridae PF08614.11 6.00E-05 Autophagy protein 1(ATG16) ALG_2_DN594_c0_g2_i1_len711 ORF_1 28 PF17501.2 2.80E-04 Viral RNA-directed RNA polymerase Viral_RdRp_C - - - ALG_2_DN14271_c0_g1_i1_len772 ORF_1 25.12 VOG18617 2.20E-04 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae PF13855.6 1.60E-21 Leucine rich repeat ALG_2_DN18993_c2_g2_i1_len2178 ORF_1 21.18 VOG08344 4.70E-07 REFSEQ_hypothetical_protein Bacteriophage PF13374.6 1.40E-126 1Tetratricopeptide repeat ALG_2_DN44027_c0_g1_i2_len783 ORF_1 19.02 VOG21678 3.60E-04 REFSEQ_hypothetical_protein Caudovirales;Myoviridae PF08614.11 7.70E-05 Autophagy protein 1(ATG16) ALG_2_DN19463_c5_g2_i6_len928 ORF_1 14.34 VOG24589 6.40E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - - 216 ALG_2_DN19463_c5_g2_i4_len863 ORF_1 4.36 VOG24589 5.30E-05 REFSEQ_hypothetical_protein Caudovirales;Siphoviridae - - -

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

217 To manually assess the level of confidence of the RdRp signal detected in the HMM

218 comparisons, the protein sequences of the three RdRp candidates were aligned to amino

219 acid sequences retrieved from the RdRp_C, RdRp_4 and RdRp_1 PFAM profiles. The

220 RdRp_C profile represents the C-terminal of the RdRp (Protein A) found in

221 Alphanodaviruses. Unfortunately, this region lacks the functional motifs usually associated

222 with the RdRp, preventing us from definitively establishing the ALG_2_DN594 contig as

223 representing a true RdRp-encoding virus. Similarly, the ALG_3_DN34624 alignment with the

224 viral RdRp sequences that comprise the PFAM RdRp_4 profile (PF02123) does not display

225 a clear conservation of the crucial functional residues and motifs within the RdRp,

226 particularly the canonical GDD motif that is GFD in ALG_3 contig (Fig S2): strikingly, the

227 GFD motif is absent from an alignment of 4,627 viral RdRp sequences27. Whether this

228 reflects a newly identified functional motif of the viral RdRp or an artefact remains to be

229 determined, but at this point we cannot safely conclude that ALG_3_DN34624 encodes a

230 viral RdRp. In contrast, the ALG_2_DN19089-encoded ORF shared multiple motifs with

231 RdRp_1 profile (PF00680) (Fig S3), including motif A at positions 437-442 and GDD at

232 positions 557-559. Because of the presence of these functional motifs, ALG_2_DN19089

233 can be confidently considered as a RdRp-encoding contig and will be referred to here as

234 ‘Partiti-like adriusvirus’. Interestingly, this contig also revealed significant similarity to some

235 eukaryotic chloroplast-associated double-stranded RNA replicons (BDRC) obtained from

236 the green algae species, Bryopsis cinicola28 (Table 2). It is therefore likely that these BDRC

237 dsRNA Bryopsis-replicons in fact represent viral RdRp sequences20, and we treat them as

238 such in this study.

239

240 An additional BLASTx comparison using this divergent Partiti-like RdRp as a reference

241 identified two other BDRC-like contigs in the Ostreobium sp. data set - ALG_2_DN19300

242 and ALG_2_DN19436 (Table 2, Fig 5). Along with the Partiti-like adriusvirus, these two

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

243 additional sequences were both validated by RT-PCR (Fig S1) and are listed in the viral

244 contig table as ‘Partiti-like lacotivirus’ and ‘Partiti-like alassinovirus’, respectively (Table 2).

245

246 Relative abundance and prevalence of RNA viruses in the samples

247 Relative virus abundance varied between libraries and viruses of the same family in the

248 Ostreobium sp. culture, with viral-like sequences constituting between 0.01 and 0.2% of

249 the total non rRNA reads (Table 2, Fig 2). Each virus described was identified in only one of

250 the cultures sequenced (Fig S1). However, inter-sample BLAST-comparisons revealed

251 similarity between the partiti-like sequence identified in the K. allantoideum and Ostreobium

252 samples. A non-annotated ORF from a K. allantoideum contig, ALG_1_DN2506, aligned

253 with the N-terminal ORF of Ostreobium sp. amalga-like virus contigs that potentially encode

254 the virus coat protein. Because of their co-occurrence in K. allantoideum (1.1 and 1.2 PCR,

255 Fig S1) and their similar abundance levels, it is likely that ALG1 DN2506 and ALG 1 DN2691

256 (referred as ‘Amalga-like boulavirus’, Table 2) contigs are in fact part of the same .

257 Unfortunately, the poor quality of and the resulting high degree of fragmentation

258 obtained in the ALG_1 RNAseq library did not allow us to resolve this question.

259

260 Notably, a large majority of the new RNA viruses reported here come from the ulvophyte

261 Ostreobium sp., although this may in part result from differences in RNA quality and

262 sequencing rather than a true biological difference in RNA virome composition as only a

263 limited number of non-rRNA reads were obtained for the ALG_1 and ALG_3 libraries (Fig

264 S4). Difficulties in detecting highly divergent sequences, especially in poorly characterized

265 and distant clades like the chlorarachniophytes, may also contribute to the different

266 numbers of viruses observed between libraries.

267

268 Secondary host detection

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

269 As the algal cultures analysed here were not axenic, we evaluated the potential presence of

270 other eukaryotes in the samples using CCMetagen29. According to the Krona plots obtained

271 for each library, cultivated algae were the major organism found in the samples (Fig S5).

272 Nevertheless, 2-8% of ALG_1 and ALG_3 contigs were assigned to and

273 Cyanophora algae (Fig S5). Although sequences assigned to Lingulodinium polyedrum

274 potentially result from GenBank mis-annotation and were likely of bacterial origin, the

275 Coolia malayensis (Dinophyceae), Amphidinium sp (Dinophyceae) and Cyanophora

276 paradoxa (Glaucophyta) associated sequences likely constitute true assignments. We

277 therefore suspect that these additional micro-eukaryotic transcripts arose from cross-

278 contamination with additional algae samples that were extracted and sequenced at the

279 same time. Importantly, none of the viruses identified in the three libraries studied here

280 could be detected in the transcriptomes of the co-processed Dinophyceae and

281 Glaucophyta cultures, suggesting that these low-abundance contaminants are not the

282 hosts of the viruses reported here.

283

284 Phylogenetic analysis of the newly identified viruses

285

286 dsRNA viruses

287 Partiti-like viruses

288 Eight of the newly-described viral sequences exhibited RdRp amino acid sequence

289 similarity with Partiti-like viruses (i.e. close to members of the Partitiviridae). Based on

290 phylogenetic studies, five of the viral-like sequences are close to those from the

291 Amalgaviridae (named after their mosaic status comprising both a Partitiviridae-like RdRp

292 and the dicistronic and monopartite genomic organization of the Totiviridae30) and form a

293 clade with the Bryopsis mitochondria-associated dsRNA (BDRM), although they share only

294 28-32% sequence identity (Table 2). BDRM was first described as a dsRNA associated with

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

295 mitochondria in Bryopsis cinicola macroalgae31 and later classified as a virus by the

296 International Committee on Taxonomy of Viruses (ICTV). Like Ostreobium and Kraftionema,

297 Bryopsis belongs to the class Ulvophyceae, and it seems likely that all these five newly

298 identified Amalga-like viral sequences (Fig 5) form an Ulvophyceae-infecting viral clade.

299

300 Considering the levels of RdRp pairwise identity between these sequences (Table A, Top)

301 we assumed that each constituted a new species. Although a clear taxonomic classification

302 needs to be established at the family or genus levels, we performed a preliminary

303 taxonomic assessment using the PAirwise Sequence Comparison (PASC) tool from the

304 NCBI32. Each of these five new BDRM-like viral genome sequences were compared with

305 the Amalgaviridae full-length genomes available at NCBI. For each newly-discovered virus,

306 the closest matches to existing Amalgaviridae sequences were retrieved, and the resulting

307 pairwise identity distributions compared with those within and between Amalgaviridae

308 genera (Fig 3A). While this analysis indicates that these newly identified virus sequences are

309 not part of any existing Amalgaviridae genus (Fig 3A), whether they can be considered as a

310 new genus within the Amalgaviridae, or a new family, is not yet clear and will require a

311 formal taxonomic assessment by the ICTV.

312

313 Table 4. RdRp-based pairwise identity of newly discovered viruses. Top: Amalga-like

314 RdRp; Middle: BDRC-like RdRp; Bottom: Mitovirus-like RdRp.

Amalga-like Amalga-like Amalga-like Amalga-like Amalga-like RdRp chassivirus lacheneauvirus chaucrivirus dominovirus

Amalga-like boulavirus 19 21 16 16

Amalga-like chassivirus - 47 22 20

Amalga-like lacheneauvirus - 22 21

Amalga-like chaucrivirus - 36

315

BDRC-like RdRp Partiti-like alassinovirus Partiti-like lacotivirus

Partiti-like adriusvirus 20 20

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Partiti-like alassinovirus - 32

316

Mito-like Mito-like Mito-like Mito-like Mito-like Mito-like Mitovirus-like RdRp picolinus babylonus albercanus boobanusbagrillonus laruketanus virus bionusvirus virus virus virus virus

Mito-like spartanusvirus 46 19 21 21 20 15

Mito-like picolinusvirus - 17 20 19 19 13

Mito-like babylonusvirus - 22 22 20 14

Mito-like albercanusvirus - 26 25 16

Mito-like laruketanusvirus - 37 14

Mito-like - 12 boobanusbagrillonusvirus

317

318 Interestingly, if these Amalga-like sequences are translated into amino acids using the

319 protozoan mitochondrial code, they display the same genomic organization as BDRM,

320 encoding two overlapping ORFs: the 5’ one encoding a hypothetical protein and the other

321 coding for a replicase through a -1 ribosomal frameshift33 (Fig S6). However, it is unclear if

322 these sequences should be translated through the mitochondrial genetic code or the

323 standard cytoplasmic one code, and such sub-cellular localization still remains to be

324 validated. It is also notable that the two closest homologs of BDRM, the Amalga-like

325 dominovirus and Amalga-like chaucrivirus, contain the GGAUUUU ribosomal -1 frameshift

326 motif and could plausibly be translated in this manner in the mitochondria (Fig S6).

327

328 The length and two-ORF encoding genomic structure of the BDRM-like sequences

329 generally correspond to genomic features of the amalgaviruses (Fig 5, right). Despite a lack

330 of detectable sequence similarity at both the sequence (BLASTx) and structural levels

331 (Phyre2), the second ORFs predicted in these amalgavirus-like sequences are expected to

332 encode a CP-like protein, even if the involvement of this potential CP in encapsidation

333 remains unclear34. A divergent virus similar to the BDRM was also recently identified as

334 infecting the phytopathogenic fungi Ustilaginoidea virens35.

335

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

336 The three BDRC-like sequences identified from Ostreobium sp. also cluster with the Partiti-

337 like viruses (Fig 5, Table 2) and can be classified as three different species after applying

338 the 90% RdRp percentage identity species demarcation criteria (Table 4, Middle). Notably,

339 the genomic organization of these BDRC-like contigs seems “inverted” compared to

340 members of the Totiviridae and Amalgaviridae; that is, a first ORF encoding a CP protein is

341 followed by a second that represents the RdRp. Indeed, the RdRp encoded by the Partiti-

342 like ALG_2 contigs is close to the 5’ extremity, followed by a second ORF. This second

343 ORF could potentially encode a CP, although functional annotation could not be achieved

344 due to the high level of sequence divergence.

345

346 Structural-based homology analysis

347 The remaining hits from the RdRp-profile analysis - ALG_2_DN594_c0_g2_i1_len711 and

348 ALG_3_DN34624_c0_g1_i1_len2077 - were used in a Phyre2 analysis. However, this

349 revealed no confident identification of viral RdRp signal (i.e. the confidence levels of models

350 were < 90%).

351

352 Despite the great diversity of host and genomic organizations, the detectable sequence

353 similarity observed between Partitiviridae and Amalgaviridae strongly suggests that they

354 share common ancestry34. As such, it is important to determine whether amalgaviruses are

355 restricted to plant hosts36–38, and hence differ from the Partitiviridae that have been

356 characterized in many divergent eukaryotic hosts such as fungi, plants and protists39,40.

357

358 ssRNA(+) viruses

359 Mitovirus-like viruses

360 Seven viral sequences from Ostreobium sp. clustered in the Narnaviridae, forming a clade

361 within the genus Mitovirus (Fig 6). Their single ORF encodes an RdRp, and a genome length

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

362 of ~3000 nt is typical of the Narnaviridae (Fig 6, right). These viral sequences could

363 constitute a new clade of -associated mitoviruses potentially restricted to green

364 microalgae. Indeed the closest relative identified here, Shahe Narna-like virus 6, was

365 isolated from freshwater small planktonic crustaceans (Daphnia magna, Daphnia carinata

366 and Moina macrocopa) belonging to the order Cladocera41 (commonly referred to as water

367 fleas). These feed on microalgae and it is therefore possible that Shahe Narna-like

368 virus 6 in fact infects algae rather than arthropods, in a similar manner to other members of

369 the Narnaviridae. It is therefore possible that these RNA viruses could be major constituents

370 in habitats occupied by protists and by consequence in larger predatory invertebrates.

371

372 Based on the 50% RdRp sequence identity used as a species demarcation criterion in the

373 Narnaviridae (ICTV report 2009), we identified seven new mito-like viral species (Table 4,

374 bottom). To place of these new species among the Narnaviridae, we performed a PASC

375 analysis using full-length genomes from the seven new viral species. This revealed that the

376 identity levels of the new sequences fell in the range expected of intra-genus diversity (Fig

377 3B). We therefore propose the existence of a new sub-clade of mitoviruses, including these

378 seven new species as well as the Shahe Narna-like virus 6 previously described41 (Fig 6).

379 Whether this proposed clade is associated with mitochondria is currently unclear, and

380 predicted ORFs were obtained using both standard and mitochondrial genetic codes.

381

382 Tombusviridae-like viruses

383 One sequence from Ostreobium sp., the Tombus-like chagrupourvirus, exhibited similarity

384 to members of the Tombusviridae family of ssRNA(+) viruses (Fig 7), grouping with viruses

385 previously identified as infecting plant or plant pathogenic fungi. This again illustrates the

386 likely shared evolutionary history between green algae and land plants, and that horizontal

387 virus transfers can occur between plants and their pathogenic fungi. Of note is that the

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

388 closest relative of Tombus-like chagrupourvirus documented to date, Hubei-Tombus like

389 virus 12, was isolated from freshwater animals (mollusca Nodularia douglasiae)41. According

390 to the lack of distinguishable -related contigs in the Ostreobium sp. sample (Fig S5),

391 this Hubei-Tombus like virus 12 may, together with the newly tombus-like sequence

392 discovered here, constitute a new clade of green algae-infecting viruses.

393

394 The 3.8kb genome length of the Tombus-like chagrupourvirus is similar to those commonly

395 observed in Tombusviridae and their relatives (Fig 7, right), suggesting that it comprises a

396 full-length genome for this virus. This putative full-length genome sequence was compared

397 to Tombusviridae reference genomic sequences using PASC to assess its taxonomic

398 position (Fig 3C). Accordingly, the Ostreobium-associated tombus-like sequence could

399 constitute a new Tombusviridae genus.

400

401 Virgaviridae-like viruses

402 One sequence identified in Chlorarachnion reptans displayed detectable sequence similarity

403 to the Virgaviridae-like RdRp supergroup (Fig 8). The family Virgaviridae comprise ssRNA(+)

404 viruses traditionally associated with plants and display diverse genomic organizations. The

405 short length of the Virga-like bellevillovirus associated with chlorarachniophytes indicates

406 that this sequence likely reflects a partial genome sequence only. Moreover, the multi-

407 segment structure of the closest relatives suggests that the partial genome recovered in

408 ALG_3 could also contain additional segments not yet identified. Although further host

409 confirmation is required, this newly described RNA virus-like sequence would constitute the

410 first algae virus from the Hepe-Virga group.

411

412 ssRNA(-) virus

413 -like viruses

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

414 A partial viral genome, the Bunya-like bridouvirus, encoding a RdRp-like signal was

415 identified in the Chlorarachnion reptans sample, although this sequence is highly divergent

416 and cannot be formally assigned to any previously described viral family. Strikingly,

417 however, the sequence clusters with a Bunya-Arena like virus previous identified in diverse

418 Freshwater small planktonic crustaceans (Daphnia magna (N/A), Daphnia carinata (N/A) and

419 Moina macrocopa (N/A))41 that typically feed on algae. Our phylogenetic analysis places this

420 sequence within the diversity of the order Bunyavirales (Fig 9). In addition to the freshwater

421 organism-associated viruses identified in Shi et al. 2016 (ref.41), this contig clusters with

422 several bunya-like unclassified negative-strand viruses isolated from the fungi

423 Cladosporium cladosporioides and the oomycete Plasmopara viticola (Fig 9) that are both

424 plant pathogens. The multi-segment structure of the closest classified family, the

425 , suggests that additional segments associated with the partial Bunya-like

426 bridouvirus genome may exist in C. reptans.

427

428 Discussion

429 The central aim of this study was to better characterise the RNA virus diversity in two major

430 algal lineages, the chlorarachniophytes and the ulvophytes, for which no RNA viruses had

431 previously been reported. Our investigation of the RNA virus diversity in six microalgae

432 species led to the identification of 18 new divergent RNA viruses, although they exhibited

433 clear homology with five established viral families. This clearly supports the idea that are an

434 important reservoir of untapped RNA virus diversity and that the apparent domination of

435 DNA viruses likely originates, at least partially, from sampling biases. The notion of a

436 potentially large dark matter of algal viruses is further reinforced by the high proportion of

437 unassigned contigs obtained, that likely contain a non-neglectable proportion of highly

438 divergent viral reads.

439

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

440 RNA virome similarities between green algae and land plants

441 Among the 18 novel RNA virus species described here, seven of those detected in the

442 green algae Ostreobium sp. and K. allantoideum were seemingly related to the

443 Tombusviridae and Amalgaviridae families of plant RNA viruses. Such similarities in RNA

444 virome between green algae and land plants are consistent with previous analyses based

445 on the Plant Genome project transcriptomic data that identified Partitivirus-like signals in

446 Chlorophyte algae20. However, the very limited sequence similarity among these viruses

447 strongly suggests an ancient divergence among these viral families, perhaps even before

448 the chlorophyte-streptophyte split some 850-1,100 million years ago (Ma)25. The close link

449 between land plant and green algae RNA virosphere is reinforced by the recent observation

450 that plant viruses are able to infect non-vascular plants such as mosses and algae42,43.

451

452 Divergent fungal mitoviruses in Ostreobium sp.

453 The presence of a potential new clade of protist-associated mitoviruses is of importance as

454 mitoviruses have traditionally been viewed as restricted to fungal hosts and were only very

455 recently identified in plants44. Similar to virus transfer between fungi and land plants 5, it is

456 possible that the symbiosis and co-evolution between green algae and fungi46 explains the

457 close phylogenetic relationships in their virome, perhaps facilitating horizontal transfer

458 events. Indeed, coral holobionts are the place for frequent interactions between endolithic

459 algae, such as Ostreobium sp., and fungi47–49. While we cannot formally identify host

460 associations from our RNA-seq data alone, no fungal-assigned contigs were detected in

461 our data, arguing against these being fungal viruses. Considering the high levels of

462 sequence divergence between our viral sequences and those associated with fungi and

463 within the clade formed by Ostreobium sp.-associated viruses, it seems likely that any such

464 events are not recent and may have occurred in Ulvophyceae or

465 even Chlorophyta ancestors. It will be of considerable interest to examine this new

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

466 mitovirus-clade across a larger set of green microalgae species. Further characterisation of

467 such green algae-infecting mitovirus diversity and their lifecycle could be pivotal for our

468 understanding of the Narnaviridae and their evolution in eukaryotic cells. For example, it is

469 of interest to determine whether this putative mitochondrial subcellular location is the result

470 of an escape from the cytoplasmic dsRNA silencing (as suggested for newly characterized

471 plant mitoviruses50) or if these viruses are relics of the eukaryotic endosymbiosis event,

472 particularly as mitoviruses have bacterial counterparts, the Leviviridae27. More broadly,

473 these newly-reported mitovirus-like sequences further illustrate the enormous diversity of

474 hosts infected by Narnaviridae, including such eukaryotic microorganisms as Apicomplexa,

475 Excavates and Oomycetes hosts51–54.

476

477 Detection of plant viruses in the chlorarachniophytes

478 The apparent similarity between a Rhizarian (C. reptans) associated viral sequence and

479 land-plant infecting viruses was also striking. The Rhizaria and Archaeplastida are assumed

480 to have diverged before the cyanobacteria primary endosymbiosis event, ca. 1.5 billion

481 years ago55,56. Thus, a detectable sequence similarity between Rhizaria-associated viruses

482 and those infecting land plants cannot be reasonably attributed to such ancient

483 evolutionary events. Instead, the presence of such land plant-like viruses in

484 Chlorarachniophyte would more likely reflect more recent acquisition of these viruses in

485 chlorarachniophytes through either horizontal transfer by common vector/symbiont/parasite

486 ancestors or secondary endosymbiosis (eukaryote-to-eukaryote) events. Indeed, a

487 secondary endosymbiosis event of a green alga in the core Chlorophyta, possibly related to

488 Bryopsidales, led to the origin of the plastid of chlorarachniophytes between 578–318 Ma21.

489 Whether this virus (i) constitutes a relic of viruses that infected this engulfed green algal

490 , (ii) is part of the chlorarachniophyte cytoplasm or still associated with the

491 periplastidial compartment (i.e. remnant cytoplasm of the endosymbiont), or (iii) interacts

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

492 with the nucleomorph (remnants of the green algal endosymbiont nucleus) are key

493 questions in the evolution of eukaryotic RNA viruses. While our data cannot provide

494 answers, it will be of major interest to extend these analyses to euglenophytes and the

495 genus Lepidodinium where distinct secondary endosymbiosis with green

496 algae have also occurred, as well as to cryptophytes that also contain the remnant nucleus

497 of its red algae endosymbiont57,58.

498

499 First report of a negative-sense RNA virus in algae

500 Our detection of Bunyavirales-like sequence in C. reptans is of importance as it would

501 constitute the first evidence of a negative-sense RNA virus in a microalgae and only the

502 second discovered in eukaryotic microorganisms. Considering the extensive host range

503 attributed to Bunyavirales (from land plants to invertebrates, vertebrates and humans), their

504 association with chlorarachniophyte hosts is plausible. However, considering the poor level

505 of abundance in C. reptans sample, additional work is clearly needed to retrieve full-

506 genome information and to confirm the association of such bunya-like viruses with

507 chlorarachniophytes.

508

509 One of the greatest challenges in viral meta-transcriptomics is our capacity to detect

510 distant homologies, especially in rapidly evolving RNA viruses. As a first attempt to retrieve

511 such ephemeral evolutionary signals, we scanned orphan contigs using RdRp protein

512 structure information in addition to the standard primary amino acid sequence. Notably,

513 both RdRp profile and 3D protein structure comparison led to the identification of highly

514 divergent RNA virus candidates, although these remain difficult to annotate. Efforts to

515 better describe the repertoire of sequence and structure of viral RdRps are central to

516 unveiling the RNA virosphere in overlooked eukaryotic organisms.

517

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

518 MATERIAL AND METHODS

519 Algal cultures

520 Algal strains were isolated from marine sand (Microrhizoidea pickettheapsiorum59;

521 Kraftionema allantoideum60; Chlorarachnion reptans and Lotharella sp. from the Wye River,

522 Victoria, Australia) or coral skeletons (Ostreobium sp. HV05007, Kavieng, Papua New

523 Guinea), or obtained from the NIVA Culture Collection of Algae (Dolichomastix tenuilepis,

524 SCCPA strain K-0587). Cultures were maintained in K-enriched seawater medium

525 (transferring every other week) at either 26ºC (Ostreobium) or 16ºC (all others) under cool

526 white LED lights at 1 Photosynthetic active radiation (PAR) (Ostreobium) or 15 PAR (all

527 others). Cultures were pelleted by centrifugation in falcon tubes and stored in RNAlater at -

528 80ºC until RNA extraction.

529

530 Total RNA extractions

531 For total RNA extractions, RNAlater was removed by low centrifugation, algal cells were

532 disrupted using thaw/freezing cycles and bead beating (0.1-0.5mm), and total RNA was

533 further extracted using the Qiagen® RNeasy Plant mini kit following the manufacturer's

534 instructions.

535

536 For the initial using meta-transcriptomic screening, RNAs were pooled into three groups: (i)

537 the chlorophyta Dolichomastix tenuilepis and Microrhizoidea pickettheapsiorum

538 (Mamiellophyceae) and Kraftionema allantoideum (Ulvophyceae) were pooled into

539 metatranscriptome ‘ALG_1’; (ii) the ulvophyte Ostreobium sp. comprised ‘ALG_2’; and (iii)

540 the two chlorarachniophytes Chlorarachnion reptans and Lotharella sp. were pooled into

541 ‘ALG_3’ (Table 1).

542

543 Total RNA Sequencing

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

544 RNA quality was assessed and TruSeq stranded libraries were synthetized by the Australian

545 Genome Research Facility (AGRF), using either (i) TruSeq stranded with a eukaryotic rRNA

546 depletion step (RiboZero Gold kit, Illumina) for ALG_2, or (ii) the SMARTer Stranded Total

547 RNA-Seq Kit v2 - Pico Input Mammalian libraries (Takara Bio USA) for ALG_1 and ALG_3

548 (Table S1), due to the low amount of input RNAs in these libraries. The resulting libraries

549 were sequenced on an Illumina HiSeq2500 (paired-end, 100bp) at the AGRF. Library

550 descriptions and RNA-seq statistics are summarized in Table S1.

551

552 In silico processing of meta-transcriptomic data

553 Read depletion and contig assembly

554 The RNA-seq data were first subjected to low-quality read and Illumina adapter filtering

555 using the Trimmomatic program61. Ribosomal RNA was depleted using the SILVA database

556 v3262, which removed between 86 and 94% of the total unfiltered reads (Fig S1A/2). Read-

557 depleted libraries were then de novo assembled using the Trinity program63 and contigs

558 shorter than 200nt were removed (the average length of contig assembly is shown in Fig

559 S1B/2). Contig abundances were calculated from the RNA-seq data using the Expectation-

560 Maximization (RSEM) software64.

561

562 RNA virus detection using BLASTx and BLASTn

563 The similarity of contigs to the current NCBI nucleotide (nt) and protein (nr) databases was

564 determined using the BLASTn and Diamond BLASTx programs65, respectively, employing

565 1e-10 and 1e-05 as e-value cut-offs. RNA virus-like sequences were identified using

566 BLASTx against all RdRp protein sequences available on NCBI/GenBank. False-positive

567 signals for RNA viruses were removed by BLASTing RdRp-like sequences against the nr

568 database and discarding sequences displaying a non-viral sequence as the best hit.

569

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

570 RNA virus profile-based homology detection

571 To detect especially diverse RdRp-based sequences, orphan contigs (i.e. those with no

572 match in either the nr and nt databases) were compared to the PFAM RdRp-protein profiles

573 ‘MitoVir_RdRp’ (PF05919), ‘Birna_RdRp’ (PF04197), ‘Viral_RdRp_C’ (PF17501), ‘RdRP_1’

574 (PF00680), ‘RdRP_2’ (PF00978), ‘RdRP_3’ (PF00998) and ‘RdRP_4’ (PF02123), as well as

575 to the entire VOG profile database (http://vogdb.org)66 using the HMMer3 program67. To

576 check for false-positive signals, these orphan sequences were submitted to the entire

577 PFAM database.

578

579 3D protein structure prediction of RdRp-like contigs

580 To infer a structural model for the distant RdRp signals detected using profiles, sequences

581 displaying a RdRp-like signal were subjected to the normal mode search of the Protein

582 Homology/analogY Recognition Engine v 2.0 (Phyre2) web portal68. Briefly, this program

583 first compares the submitted amino acid sequence to a curated non-redundant nr20 data

584 set using HHblits69. It then converts the conserved secondary structure information as a

585 query against known 3D-structures using HHsearch70. A final structural modelling step

586 based on identified structural homologies is then performed as described previously68.

587

588 RNA virus sequence analysis and annotation

589 Total non-rRNA reads were mapped onto RNA virus-like contigs using Bowtie2 and

590 heterogeneous coverage and potential mis-assemblies were manually resolved using

591 Geneious v11.1.471.

592

593 Open reading frames (ORFs) were first predicted using getORF from EMBOSS, in which

594 ORFs were defined as regions that are free of stop codons, although partial sequences (i.e.

595 missing start or stop codons) were retained for analysis. Protein domains were then

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

596 annotated using the InterProscan software package from EMBL-EBI, comprising ProSite,

597 PFAM, TIGR, SuperFamily, and CDD among others (https://github.com/ebi-pf-

598 team/interproscan).

599

600 Revealing host-virus associations

601 A challenge faced by all metagenomic studies is confidently assigning each viral sequence

602 to a particular host in a given sample. We used algal cultures to minimize the number of

603 potential additional cellular hosts. These cultures were, however, non-axenic, with mainly

604 bacteria present. To evaluate the possibility of additional microeukaryotic cells in the

605 sample, we obtained taxonomic identification for contigs in the metatranscriptome by

606 aligning them to the NCBI nt database using the KMA aligner and the CCMetagen

607 program29,72. Contigs matching an entry in the nt database were displayed as Krona plots

608 and classified based on their taxonomy (utilising high taxonomic levels for clarity).

609

610 Phylogenetic analysis

611 RdRp amino acid sequences were aligned using the L-INS-I algorithm in the MAFFT

612 program73 and trimmed with TrimAI (automated mode). Maximum likelihood phylogenetic

613 trees were then estimated using IQ-TREE, employing ModelFinder to obtain the best-fit

614 model of amino acid substitution in each case, with nodal support assessed using 1000

615 bootstrap replicates and 1000 replicates of SH-like approximate likelihood ratio test (SH-

616 aLRT)74,75. For each tree, reference genomes and corresponding RdRp sequences were

617 retrieved from the NCBI Viral genome resource

618 (https://www.ncbi.nlm.nih.gov/genome/viruses/). To depict the evolutionary relationships of

619 the newly discovered viruses as meaningfully as possible, the closest unclassified BLASTx

620 homologs were used in the phylogenetic analysis. This resulted in alignments of 237, 76,

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

621 198, 558 and 325 RdRp protein sequences for Amalga-like, Mito-like, Tombus-like, Virga-

622 like and Bunya-like virus groups, respectively.

623

624 RT-PCR validation

625 Viral contigs were validated experimentally and associated to individual algal sample using

626 Reverse (SuperScript™ IV - Invitrogen™) followed by

627 PCR (Platinum™ SuperFi™ DNA polymerase - Invitrogen™), with specific primer sets for

628 each contig. The rbcL and tufA marker were used as PCR positive controls using

629 sets of primers designed in ref.76. All primers used in this study are described in Table S2.

630

631 ACKNOWLEDGMENTS

632 We acknowledge the SIH Bioinformatics Hub and the University of Sydney’s high-

633 performance computing cluster Artemis for providing the computing resources used for this

634 study. We also thank Fabien Burki for providing us source files to build Figure 1.

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

635 REFERENCES

636 1. Krupovic M, Prangishvili D, Hendrix RW, Bamford DH. Genomics of bacterial and

637 archaeal Viruses: dynamics within the prokaryotic virosphere. Microbiol Mol Biol Rev

638 2011; 75: 610–635.

639 2. Lang AS, Rise ML, Culley AI, Steward GF. RNA viruses in the sea. FEMS Microbiol

640 Rev 2009; 33: 295–323.

641 3. Krishnamurthy SR, Wang D. Origins and challenges of viral dark matter. Virus Res

642 2017; 239: 136–142.

643 4. Zhang Y-Z, Shi M, Holmes EC. Using metagenomics to characterize an expanding

644 virosphere. 2018; 172: 1168–1172.

645 5. Short SM, Staniewski MA, Chaban Y V., Long AM, Wang D. Diversity of viruses

646 infecting eukaryotic algae. Curr Issues Mol Biol 2020; 39: 29–62.

647 6. Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H et al.

648 Linking virus genomes with host taxonomy. Viruses 2016; 8: 66.

649 7. Richmond A, Hu Q. Handbook of Microalgal Culture: Applied Phycology and

650 Biotechnology: Second Edition. In: Handbook of Microalgal Culture: Applied

651 Phycology and Biotechnology: Second Edition. 2013, pp 1–719.

652 8. Mayer JA, Taylor FJR. A virus which lyses the marine nanoflagellate

653 pusilla. Nature 1979; 281: 299–301.

654 9. Brown RM. Algal viruses. Adv Virus Res 1972; 17: 243–277.

655 10. Illergard K, Ardell DH, Elofsson A. Structure is three to ten times more conserved

656 than sequence—A study of structural response in protein cores. Proteins Struct

657 Funct Bioinforma 2009; 77: 499–508.

658 11. Bamford DH, Grimes JM, Stuart DI. What does structure tell us about virus

659 evolution ? Curr Opin Struct Biol 2005; 15: 655–663.

660 12. Chen J, Guo M, Wang X, Liu B. A comprehensive review and comparison of different

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

661 computational methods for protein remote homology detection. Brief Bioinform 2018;

662 19: 231–244.

663 13. Singh J, Saxena RC. An Introduction to Microalgae: Diversity and Significance. In:

664 Handbook of Marine Microalgae. Academic Press, 2015, pp 11–24.

665 14. Archibald JM. The evolution of algae by secondary and tertiary endosymbiosis. Adv

666 Bot Res 2012; 64: 87–118.

667 15. Burki F, Roger AJ, Brown MW, Simpson AGB. The new tree of eukaryotes. Trends

668 Ecol Evol 2020; 35: 43–55.

669 16. Guiry MD. How many species of algae are there? J. Phycol. 2012; 48: 1057–1063.

670 17. Chapman RL. Algae: the world’s most important “plants”—an introduction. Mitig

671 Adapt Strateg Glob Chang 2013; 18: 5–12.

672 18. Leliaert F, Smith DR, Moreau H, Herron MD, Verbruggen H, Delwiche CF et al.

673 Phylogeny and molecular evolution of the green algae. CRC Crit Rev Plant Sci 2012;

674 31: 1–46.

675 19. Urayama SI, Takaki Y, Nunoura T. FLDS: A comprehensive DSRNA sequencing

676 method for intracellular RNA virus surveillance. Microbes Environ 2016; 31: 33–40.

677 20. Mushegian A, Shipunov A, Elena SF. Changes in the composition of the RNA virome

678 mark evolutionary transitions in green plants. BMC Biol 2016; 14: 68.

679 21. Jackson C, Knoll AH, Chan CX, Verbruggen H. Plastid phylogenomics with broad

680 taxon sampling further elucidates the distinct evolutionary origins and timing of

681 secondary green . Sci Rep 2018; 8: 1–10.

682 22. Fuhrman JA. and their biogeochemical and ecological effects. Nature

683 1999; 399: 541–548.

684 23. Forterre P, Prangishvili D. The major role of viruses in cellular evolution: Facts and

685 hypotheses. Curr Opin Virol 2013; 3: 558–565.

686 24. Short SM, Staniewski MA, Chaban Y V, Long AM, Wang D. Diversity of viruses

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

687 infecting eukaryotic algae. Curr Issues Mol Biol 2020; 39: 29–62.

688 25. Cortona A Del, Jackson CJ, Bucchini F, Van Bel M, D’hondt S, Skaloud P et al.

689 Neoproterozoic origin and multiple transitions to macroscopic growth in green

690 seaweeds. Proc Natl Acad Sci USA 2020; 117: 2551–2559.

691 26. Cocquyt E, Verbruggen H, Leliaert F, Clerck O De. Evolution and cytological

692 diversification of the green seaweeds (Ulvophyceae). Mol Biol Evol 2010; 27: 2052–

693 2061.

694 27. Wolf YI, Kazlauskas D, Iranzo J, Lucía-Sanz A, Kuhn JH, Krupovic M et al. Origins

695 and evolution of the global RNA virome. mBio 2018; 9: e02329-18.

696 28. Koga R, Horiuchi H, Fukuhara T. Double-stranded RNA replicons associated with

697 of a green alga, Bryopsis cinicola. Plant Mol Biol 2003; 51: 991–999.

698 29. Marcelino VR, Clausen PTLC, Buchmann JP, Wille M, Iredell JR, Meyer W et al.

699 CCMetagen: comprehensive and accurate identification of eukaryotes and

700 in metagenomic data. Genome Biol 2020; 21: 103.

701 30. Sabanadzovic S, Valverde RA, Brown JK, Martin RR, Tzanetakis IE. Southern tomato

702 virus: The link between the families Totiviridae and Partitiviridae. Virus Res 2009; 140:

703 130–137.

704 31. Koga R, Fukuhara T, Nitta T. Molecular characterization of a single mitochondria-

705 associated double-stranded RNA in the green alga Bryopsis. Plant Mol Biol 1998; 36:

706 717–724.

707 32. Bao Y, Chetvernin V, Tatusova T. Improvements to pairwise sequence comparison

708 (PASC): a genome-based web tool for . Arch Virol 2014; 159:

709 3293–3304.

710 33. Atkins JF, Loughran G, Bhatt PR, Firth AE, Baranov P V. Ribosomal frameshifting and

711 transcriptional slippage: From genetic steganography and cryptography to

712 adventitious use. Nucleic Acids Res 2016; 44: 7007–7078.

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

713 34. Krupovic M, Dolja V V, Koonin E V. Plant viruses of the Amalgaviridae family evolved

714 via recombination between viruses with double-stranded and negative-strand RNA

715 genomes. Biol Direct 2015; 10: 12.

716 35. Zhang T, Jiang Y, Dong W. A novel monopartite dsRNA virus isolated from the

717 phytopathogenic Ustilaginoidea virens and ancestrally related to a

718 mitochondria-associated dsRNA in the green alga Bryopsis. 2014; 462–463:

719 227–235.

720 36. Goh CJ, Park D, Lee JS, Sebastiani F, Hahn Y. identification of a novel plant

721 amalgavirus (Amalgavirus, Amalgaviridae) genome sequence in Cistus incanus. Acta

722 Virol 2018; 62: 122–128.

723 37. Zhan B, Cao M, Wang K, Wang X, Zhou X. Detection and characterization of cucumis

724 melo cryptic virus, cucumis melo amalgavirus 1, and melon necrotic spot virus in

725 cucumis melo. Viruses 2019; 11: 1–15.

726 38. Lee JS, Goh CJ, Park D, Hahn Y. Identification of a novel plant RNA virus species of

727 the genus Amalgavirus in the family Amalgaviridae from chia (Salvia hispanica).

728 Genes and Genomics 2019; 41: 507–514.

729 39. Roossinck MJ. Lifestyles of plant viruses. Philos Trans R Soc B Biol Sci 2010; 365:

730 1899–1905.

731 40. Nibert ML, Ghabrial SA, Maiss E, Lesker T, Vainio EJ, Jiang D et al. Taxonomic

732 reorganization of family Partitiviridae and other recent progress in partitivirus

733 research. Virus Res 2014; 188: 128–141.

734 41. Shi M, Lin X-D, Tian J-H, Chen L-J, Chen X, Li C-X et al. Redefining the invertebrate

735 RNA virosphere. Nature 2016; 540: 539–543.

736 42. Polischuk V, Budzanivska I, Shevchenko T, Oliynik S. Evidence for plant viruses in

737 the region of Argentina Islands, Antarctica. FEMS Microbiol Ecol 2007; 59: 409–417.

738 43. Petrzik K, Vondrák J, Kvíderová J, Lukavský J. Platinum anniversary: virus and lichen

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

739 alga together more than 70 years. PLoS One 2015; 10: e0120768.

740 44. Nibert ML, Vong M, Fugate KK, Debat HJ. Evidence for contemporary plant

741 mitoviruses. Virology 2018; 518: 14–24.

742 45. Roossinck MJ. Evolutionary and ecological links between plant and fungal viruses.

743 New Phytol 2019; 221: 86–92.

744 46. Bonfante P. Algae and fungi move from the past to the future. eLife 2019; 8: e49448.

745 47. Ricci F, Rossetto Marcelino V, Blackall LL, Kühl M, Medina M, Verbruggen H.

746 Beneath the surface: Community assembly and functions of the coral skeleton

747 microbiome. Microbiome 2019; 7: 1–10.

748 48. C. R, J. R. Fungi and Their Role in Corals and Coral Reef Ecosystems. In:

749 Raghukumar C. (ed). Biology of Marine Fungi. Springer, Berlin, Heidelberg, 2012, pp

750 89–113.

751 49. Marcelino VR, Verbruggen H. Multi-marker metabarcoding of coral skeletons reveals

752 a rich microbiome and diverse evolutionary origins of endolithic algae. Sci Rep 2016;

753 6: 31508.

754 50. Nerva L, Vigani G, Di Silvestre D, Ciuffo M, Forgia M, Chitarra W et al. Biological and

755 molecular characterization of Chenopodium quinoa mitovirus 1 reveals a distinct

756 small RNA response compared to those of cytoplasmic RNA viruses. J Virol 2019;

757 93: e01998-18.

758 51. Charon J, Grigg MJ, Eden JS, Piera KA, Rana H, William T et al. Novel RNA viruses

759 associated with Plasmodium vivax in human malaria and Leucocytozoon parasites in

760 avian disease. PLoS Pathog 2019; 15: e1008216.

761 52. Zangger H, Ronet C, Desponds C, Kuhlmann FM, Robinson J, Hartley M-A et al.

762 Detection of Leishmania RNA virus in Leishmania parasites. PLoS Negl Trop Dis

763 2013; 7: e2006.

764 53. Grybchuk D, Akopyants NS, Kostygov AY, Konovalovas A, Lye L-F, Dobson DE et al.

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

765 Viral discovery and diversity in trypanosomatid protozoa with a focus on relatives of

766 the human parasite Leishmania. Proc Natl Acad Sci 2018; 115: E506–E515.

767 54. Cai G, Myers K, Fry WE, Hillman BI. A member of the virus family Narnaviridae from

768 the plant pathogenic oomycete Phytophthora infestans. Arch Virol 2012; 157: 165–

769 169.

770 55. Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. A molecular timeline for

771 the origin of photosynthetic eukaryotes. Mol Biol Evol 2004; 21: 809–818.

772 56. Parfrey LW, Lahr DJG, Knoll AH, Katz LA. Estimating the timing of early eukaryotic

773 diversification with multigene molecular clocks. Proc Natl Acad Sci U S A 2011; 108:

774 13624–13629.

775 57. Burki F, Keeling PJ. Rhizaria. Curr Biol 2014; 24: R103–R107.

776 58. Gould SB. Algae’s complex origins. Nature 2012; 492: 46–48.

777 59. Wetherbee R, Rossetto Marcelino V, Costa JF, Grant B, Crawford S, Waller RF et al.

778 A new marine prasinophyte genus alternates between a flagellate and a dominant

779 benthic stage with microrhizoids for adhesion. J Phycol 2019; 55: 1210–1225.

780 60. Wetherbee R, Verbruggen H. Kraftionema allantoideum, a new genus and family of

781 Ulotrichales (Chlorophyta) adapted for survival in high intertidal pools. J Phycol 2016;

782 52: 704–715.

783 61. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina

784 sequence data. Bioinformatics 2014; 30: 2114–2120.

785 62. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P et al. The SILVA

786 ribosomal RNA gene database project: improved data processing and web-based

787 tools. Nucleic Acids Res 2013; 41: D590-6.

788 63. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I et al. Full-length

789 transcriptome assembly from RNA-Seq data without a reference genome. Nat

790 Biotechnol 2011; 29: 644–652.

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

791 64. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with

792 or without a reference genome. BMC Bioinformatics 2011; 12: 323.

793 65. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND.

794 Nat Methods 2015; 12: 59–60.

795 66. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC et al. The Pfam

796 protein families database in 2019. Nucleic Acids Res 2019; 47: D427–D432.

797 67. Eddy SR. Accelerated profile HMM searches. PLoS Comput Biol 2011; 7: e1002195.

798 68. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJE. The Phyre2 web portal

799 for protein modeling, prediction and analysis. Nat Protoc 2015; 10: 845–858.

800 69. Remmert M, Biegert A, Hauser A, Söding J. HHblits: lightning-fast iterative protein

801 sequence searching by HMM-HMM alignment. Nat Methods 2012; 9: 173–175.

802 70. Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics

803 2005; 21: 951–960.

804 71. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S et al. Geneious

805 Basic: An integrated and extendable desktop software platform for the organization

806 and analysis of sequence data. Bioinformatics 2012; 28: 1647–1649.

807 72. Clausen PTLC, Aarestrup FM, Lund O. Rapid and precise alignment of raw reads

808 against redundant databases with KMA. BMC Bioinformatics 2018; 19: 307.

809 73. Katoh K, Standley DM. MAFFT Multiple sequence alignment software version 7:

810 Improvements in performance and usability. Mol Biol Evol 2013; 30: 772–780.

811 74. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: A fast and effective

812 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol

813 2015; 32: 268–274.

814 75. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, Jermiin LS. ModelFinder:

815 fast model selection for accurate phylogenetic estimates. Nat Methods 2017; 14:

816 587–589.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

817 76. Vieira HH, Bagatini IL, Guinart CM, Vieira AAH, Vieira HH, Bagatini IL et al. tufA gene

818 as molecular marker for freshwater Chlorophyceae. Algae 2016; 31: 155–165.

819 77. Strassert JFH, Jamy M, Mylnikov AP, Tikhonenkov D V, Burki F. New phylogenomic

820 analysis of the enigmatic phylum telonemia further resolves the eukaryote tree of

821 Llfe. Mol Biol Evol 2019; 36: 757–765.

822

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

823 FIGURE LEGENDS

824 Fig 1. The enormous diversity of algae contrasts with its poorly-characterized

825 virosphere. (A) Representation of algae supergroups among the diversity of eukaryotes

826 (latest eukaryotic classification retrieved from15). The phylogenetic tree was adapted from77.

827 Pictures illustrate some of the samples used in this study and corresponding clades are

828 marked with “*”. (B) Pictures of algae cultures used in this study. (C) The current extent of

829 the microalgae virosphere. The viral sequence counts for each virus class (DNA or RNA,

830 single-stranded or double-stranded) were retrieved from VirusHostdb6 according to 11

831 major eukaryotic algae lineages. Microalgal lineages investigated in this study are

832 highlighted in bold.

833

834 Fig 2. Abundance of unknown and RNA virus-like contigs detected in the algal

835 libraries. (A) Percentage of non-assigned contigs. For clarity, numbers are normalized as

836 the percentage of total contigs (actual contig numbers are indicated in bold). Blue: number

837 of contigs showing a strong sequence similarity with the nr database (e-value < 1e-05); light

838 grey: contigs showing a weak sequence similarity to the nr database (e-values 1e-05 to 1e-

839 03); middle-dark grey: contigs with no sequence similarity detected by BLASTx/BLASTp

840 but encoding one or more ORFs of more than 200 amino acids (600 nt); dark grey: genomic

841 ‘dark matter’ - contigs without any signal detected or any long ORFs encoded. (B) Total

842 number of RNA virus reads per total number of non-rRNA reads in each library. (C)

843 Distribution of RNA virus diversity in the three libraries and percentage of RNA virus reads

844 associated with each viral super-clade. The host range is represented for each viral clade.

845

846 Fig 3. Genome pairwise identity distributions of the new algal viral sequences. The

847 level of pairwise identity between the viruses newly identified viruses and existing members

848 of each viral family are represented in red. (A) Inter-genus and intra-genus identity levels

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

849 within the Amalgaviridae. Inter-genus and intra-genus identity levels are displayed in grey

850 and purple, respectively. (B) Inter-genus and intra-genus identity levels within the

851 Narnaviridae. Inter-genus and intra-genus identity levels are displayed in grey and yellow,

852 respectively. C) Inter-genus and intra-genus identity levels within Tombusviridae. Inter-

853 genus and intra-genus identity levels are displayed in grey and green, respectively.

854

855 Fig 4. Genomic organization of the Partitiviridae, Totiviridae and Amalgaviridae.

856 Possible evolutionary scenarios for the BDRC-like contigs observed in Ostreobium sp.

857

858 Fig 5. RdRp phylogeny of newly identified chlorophyte viruses among the

859 Amalgaviridae, Partitiviridae, Picobirnaviridae and Hypoviridae. Sequences identified in

860 this study are labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For

861 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated

862 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in

863 parenthesis). Left, genomic organizations of new viruses as well as the closest identified

864 relatives representing each family/genus as follows: Cryphonectria 2

865 (Hypoviridae); Chicken (Picobirnaviridae); Southern tomato virus

866 (Amalgaviridae); Cryptosporidium parvum virus 1 (); Discula destructiva virus 1

867 (Gammapartitivirus); Fig cryptic virus (Deltapartitivirus); Ceratocystis resinifera partitivirus

868 (Betapartitivirus); White clover cryptic virus 1 (Alphapartitivirus). The tree is mid-pointed

869 rooted and branch lengths are scaled according to the number of amino acid substitutions

870 per site.

871

872 Fig 6. Phylogeny of the Narnaviridae-Botourmiaviridae group based on the RdRp.

873 Newly discovered viruses from Ostreobium sp. are highlighted in red. RdRp sequences

874 from unassigned RNA virus retrieved from ref.41 are marked in grey. Right, phylogenetic tree

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

875 estimated using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values

876 indicated in parenthesis). Left, genomic organization of both viral genomes identified in this

877 study (red) and representatives of each major family and genus used in the phylogeny

878 (Cassava virus C - Botourmiaviridae; Saccharomyces 23S RNA ; Chenopodium

879 quinoa mitovirus 1). Annotations of Cassava virus C coding sequences: RdRp (Segment I);

880 Putative movement protein (Segment II); Coat protein (Segment III). Branch lengths are

881 scaled according to the number of amino acid substitutions per site.

882

883 Fig 7. Phylogeny of the Tombusviridae RdRp. The tombus-like sequence identified in this

884 study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For

885 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated

886 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in

887 parenthesis). Left, genomic organizations of new viruses as well as their closest homologs

888 and representatives from each family/genus as follows: Black beetle virus ();

889 Carrot mottle virus (genus ); Carnation ringspot virus (Dianthovirus); Beet black

890 scorch virus (); Cucumber leaf spot virus (); Maize necrotic streak

891 virus (); Olive latent virus 1 (); Panicum

892 (); Hibiscus chlorotic ringspot virus (Betacarmovirus); Clematis chlorotic mottle

893 virus (Pelarspovirus); Carnation mottle virus (Alphacarmovirus). The tree is mid-pointed

894 rooted and branch lengths are scaled according to the number of amino acid substitutions

895 per site.

896

897 Fig 8. Phylogeny of the Hepe-Virga group RdRp. The hepe-virga-like sequence identified

898 in this study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For

899 clarity, some families or genera have been. Right, RdRp-based phylogenetic tree obtained

900 using IQ-tree with bootstrap replicates and SH-aLTR set to 1000 (resulting values are

40 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

901 indicated in parenthesis). Left, genomic organizations of new viruses as well as closest

902 homologs and representative from each family/genus as follows:

903 (); Poinsettia mosaic virus (order ); Wheat stripe mosaic virus

904 (Benyviridae); Diatom colony associated dsRNA virus 15 (Endornaviridae); Cabassou virus

905 (Togaviridae); Apple mosaic virus (); Mint virus 1 (); Cucumber

906 mottle virus (Virgaviridae). The tree is mid-pointed rooted and branch lengths are scaled

907 according to the number of amino acid substitutions per site.

908

909 Fig 9. Phylogeny of the Bunyavirales RdRp. The bunya-like sequence identified in this

910 study is labelled in red. Unclassified sequences from ref.41 are highlighted in grey. For

911 clarity, some families and genera have been collapsed. Right, phylogenetic tree estimated

912 using IQ-Tree with bootstrap replicates and SH-aLTR set to 1000 (values indicated in

913 parenthesis). Left, genomic organizations of new viruses as well as their closest homologs

914 and representatives from each family/genus as follows: Melon chlorotic spot virus

915 (Phenuiviridae); Yogue virus (); Latino mammarenavirus (Arenaviridae); Seattle

916 orthophasmavirus (); Melon yellow spot virus (Tospoviridae); Fig mosaic

917 (Fimoviridae); Tataguine (). The tree is mid-

918 pointed rooted and branch lengths are scaled according to the number of amino acid

919 substitutions per site.

920

41 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

921 FIGURES

A)

B)

Microrhizoidea Dolichomastix tenuilepis Kraftionema Chlorarachnion reptans pickettheapsiorum allantoideum C) ChChlolorroopphhyytta Rhodophyta Glaucophyta ChChlolorarararacchnhnioiopphhyycceeae Dinophyceae dsDNA Bacillariophyta ssDNA Pelagophyceae dsRNA Phaeophyceae ssRNA Raphidophyceae Haptophyta Cryptophyta Euglenoids 30 20 10 0 10 20 30 922

923 Figure 1

924

42 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A) 100% 90% 80% 963 25001 70% 62709 Dark matter contigs contigs 60% f 8 ORFans (ORF > 200aa) 50% 50 40% 1432 Distant homology (e-value < 1e-03) 1966 1560 3267 centage o Strong homology (e-value < 1e-05)

r 30% e

P 871 20% 31867 14860 10% 0% ALG_1 ALG_2 ALG_3 B) 8

7 Millions

6

5

4 RNA virus reads non RNA virus reads

Read count 3

2

1

0 ALG_1 ALG_2 ALG_3 C) 100% 90%

eads 80% r 70%

virus 60% Bunya-like A 50% Virga-like RN f 40% Amalga-like

30% Mito-like

centage o 20% Tombus-like r e

P 10% 0% ALG_1 ALG_2 ALG_3 925

926 Figure 2

927

43 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A) 16

14

12

10

8

6

4 2

0 % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 B) 160

140

120

100 80

60

40

20

0 % % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 C) 500 450 400 350 300 250 200 150 100 50 0 % % % % % % % % % % % % % % % % % % % % % % % % % 0 4 8 928 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96

929 Figure 3

930

44 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

931

Totiviridae

CP RdRp Partitiviridae Partiti-like contigs CP from this study

RdRp Amalgaviridae RdRp CP ?

932 CP RdRp

933 Figure 4

934

45 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

935

Genome length (nt)

99.9/100 45.3/22 100/100 Hypoviridae ALG_2_DN19089_c0_g1_i1_len4252_ORF_1_frame_2_translation_2Partiti-like adriusvirus (ORF1) 99.7/100 ALG_2_DN19300_c0_g1_i1_len3273_ORF_1_frame_1_translation_2Partiti-like lacotivirus (ORF1) 97.6/95 ALG_2_DN19436_consensus_reversed_ORF_1_frame_1_translation_2Partiti-like alassinovirus (ORF1) 8.8/43 74.9/33 BAB63954.1_2_RdRp-like_protein__Bryopsis_cinicola_

100/100

81.8/56 97.9/100

99.7/100 88.9/88

96.7/95 YP_009361996.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-green-monkey_KNA_2015_vertebrates

99.9/100YP_009351841.1_Picobirnaviridae_Picobirnavirus_Otarine-picobirnavirus_segment-2_vertebrates JMYJCC46554_Jingmen_picobirna-like_virus_1_len1632 Picobirnaviridae 83.2/7195.7/93 YP_239361.1_Picobirnaviridae_Picobirnavirus_Human-picobirnavirus_segment-2_human 95.1/78 YP_009553306.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-sp._segment-2_human_vertebrates 90.6/43 89.8/75YP_009551574.1_Picobirnaviridae_Picobirnavirus_Chicken-picobirnavirus_segment-RNA-2_vertebrates 40/41 YP_009241386.1_Picobirnaviridae_Picobirnavirus_Porcine-picobirnavirus_segment-S_vertebrates 88.7/78 YP_009389484.1_Picobirnaviridae_Picobirnavirus_Picobirnavirus-dog_KNA_2015_vertebrates YP_009342446.1_Partitiviridae_Botryosphaeria-dothidea-virus-1_segment-1_fungi BAA25883.1_5_RdRp_partial__Bryopsis_mitochondria-associated_dsRNA_ 82.5/74 92.1/90 ALG_2_DN19090_c0_g1_i2_len4036_ORF_2_frame_2_translation_2Amalga-like chaucrivirus (ORF2) 97.8/99 98/99 ALG_2_DN19428_c0_g1_i6_len4011_ORF_2_frame_3_translation_2Amalga-like dominovirus (ORF2) ALG_1_DN2691_c0_g1_i2_len1440_ORF_1_frame_1_translation_2Amalga-like boulavirus, partial (ORF1) 25.4/48 99/83 ALG_2_DN18594_c0_g1_i1_len3399_ORF_2_frame_2_translation_2Amalga-like chassivirus (ORF2) 99.8/100 ALG_2_DN19448_c0_g6_i1_len3254_ORF_2_frame_3_translation_2Amalga-like lacheneauvirus (ORF2) NP_624325.1_Amalgaviridae_Zybavirus_Zygosaccharomyces-bailii-virus-Z_fungi SCM33193_Hubei_partiti-like_virus_59_len3660 84.2/83 99.1/99 YP_009389417.1_Amalgaviridae_Antonospora-locustae-virus-1_fungi

95.5/94 YP_003934623.1_Amalgaviridae_Amalgavirus_Blueberry-latent-virus_land-plants YP_009362302.1_Amalgaviridae_Zostera-marina-amalgavirus-1_land-plants 100/100 100/100 YP_009362304.1_Amalgaviridae_Zostera-marina-amalgavirus-2_land-plants YP_009553344.1_Amalgaviridae_Festuca-pratensis-amalgavirus-2 55.2/57 YP_009447919.1_Amalgaviridae_Amalgavirus_Allium-cepa-amalgavirus-1_land-plants 100/100 YP_009447921.1_Amalgaviridae_Amalgavirus_Allium-cepa-amalgavirus-2_land-plants 45.3/22 32.3/4090.2/87 YP_009552083.1_Amalgaviridae_Phalaenopsis-equestris-amalgavirus-1 98.7/100 YP_009553030.1_Amalgaviridae_Salvia-hispanica-RNA-virus-1_land-plants Amalgaviridae YP_002321509.1_Amalgaviridae_Amalgavirus_Southern-tomato-virus_land-plants 99.5/9999.3/100 YP_009552797.1_Amalgaviridae_Capsicum-annuum-amalgavirus-1 94.4/97 YP_009551562.1_Amalgaviridae_Anthoxanthum-odoratum-amalgavirus-1_land-plants 100/100 YP_009552799.1_Amalgaviridae_Festuca-pratensis-amalgavirus-1 85.7/74 YP_003868436.1_Amalgaviridae_Amalgavirus_Rhododendron-virus-A_land-plants YP_009552085.1_Amalgaviridae_Medicago-sativa-amalgavirus-1 94.8/8794.4/65 YP_009553342.1_Amalgaviridae_Cleome-droserifolia-amalgavirus-1 65.3/43 YP_009551564.1_Amalgaviridae_Camellia-oleifera-amalgavirus-1 61.2/45 YP_009388304.1_Amalgaviridae_Amalgavirus_Spinach-amalgavirus-1_land-plants 90.9/70YP_009552087.1_Amalgaviridae_Erigeron-breviscapus-amalgavirus-1 55.2/71 YP_009552089.1_Amalgaviridae_Erigeron-breviscapus-amalgavirus-2 YP_009553311.1_Partitiviridae_Aspergillus-fumigatus-partitivirus-2_segment-1_fungi 81.9/50 YP_009508065.1_Partitiviridae_Cryspovirus_Cryptosporidium-parvum-virus-1_segment-RNA-1_protozoa 88.5/76 Partitiviridae / Cryspovirus 99.4/100

100/100 YP_008327312.1_Partitiviridae_Ustilaginoidea-virens-partitivirus-2_segment-RNA-1_fungi 84.5/49 93.9/93 81.5/78 Partitiviridae / Gammapartitivirus 66.5/44 97.4/88 96/98 WHZHC73278_Wuhan_large_pig_roundworm_virus_1_RdRp_len1560 22.2/59 100/100 XZSJSC65291_Xinzhou_partiti-like_virus_1_RdRp_len1523 99.4/100 QTM22037_Hubei_partiti-like_virus_57_len1618 95.2/96 9.5/45 QTM26145_Hubei_partiti-like_virus_56_len1642 85.8/49 100/100 Partitiviridae / Deltapartitivirus 100/100 Partitiviridae / Betapartitivirus YP_007419077.1_Partitiviridae_Alphapartitivirus_Rosellinia-necatrix-partitivirus-2_segment-RNA-1_fungi 46.7/59 YP_009362092.1_Partitiviridae_Bipolaris-maydis-partitivirus-1_segment-RNA-1_fungi 100/100 99.4/100 YP_009273018.1_Partitiviridae_Arabidopsis-halleri-partitivirus-1_segment-1_land-plants 97.6/100 YP_009551597.1_Partitiviridae_Alphapartitivirus_Medicago-sativa-alphapartitivirus-1_segment-RNA-1_land-plants 99.4/98 QTM26091_Hubei_partiti-like_virus_27_len1703 92.9/84 99.7/100 QTM20591_Hubei_partiti-like_virus_25_len1855 63.2/69 91.9/84 QTM27118_Hubei_partiti-like_virus_26_len1754 91.1/83 Partitiviridae / Alphapartitivirus 80.3/64

936 0.4

937 Figure 5

938

46 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Mito-like bionusvirus (ORF1) Mito-like spartanusvirus (ORF2) Mito-like picolinusvirus (ORF1) Mito-like babylonusvirus (ORF1) Mito-like albercanusvirus (ORF1) Mito-like laruketanusvirus (ORF1) Mito-like boobanusbagrillonusvirus (ORF1)

939

940 Figure 6

941

47 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Genome length (nt) BHZC37000_Beihai_tombus-like_virus_14_len5160 100/100 97.8/97 WZFSL78689_Wenzhou_tombus-like_virus_16_len4467 87.2/81

BHWZXX15284_Beihai_tombus-like_virus_16_len4822

26.7/55 98.8/90

14.9/33 BHBJDX18752_Beihai_tombus-like_virus_15_len3325

83.4/56 WHBL72297_Hubei_tombus-like_virus_37_len7304 61.3/62 99.8/100 97.8/97 25.9/55 Nodaviridae 98.2/97

JMYJCC68055_Jingmen_tombus-like_virus_1_len4971

WHBL72828_Hubei_tombus-like_virus_31_len4264 59.2/74 100/100

21/54 QTM27120_Hubei_tombus-like_virus_26_len2286

99.5/99 spider124630_Hubei_tombus-like_virus_29_len2550

45.6/59 YP_009552019.1_Tombusviridae_Culex-associated-Tombus-like-virus 100/100 99.9/100 insectZJ94961_Shangao_tombus-like_virus_1_len2299 100/100 40.8/47 SSZZ1762_Hubei_tombus-like_virus_28_RdRp_len2539

82/60 18.2/46 94/87

spider118341_Hubei_tombus-like_virus_14_len4303 100/100 WHYY18977_Hubei_tombus-like_virus_13_len5904 40.8/26 WHYY21098_Hubei_tombus-like_virus_15_len4000 93.5/80 100/100 YP_009553261.1_Tombusviridae_Yongsan-tombus-like-virus-1 90.4/48 99.9/100 mosZJ33874_Wenzhou_tombus-like_virus_11_len4958

90.9/46 83.1/57 BHZY58393_Beihai_tombus-like_virus_9_len4382 100/100 BHZY59908_Beihai_tombus-like_virus_10_len4836 96.4/76 TALG_2_DN18828_c0_g1_i1_len3835_ORF2_translationombus-like chagrupourvirus (ORF2) CJLX30686_Changjiang_tombus-like_virus_17_len3791 17.2/34 YP_009253998.1_Tombusviridae_Umbravirus_Sclerotinia-sclerotiorum-umbra-like-virus-1 94/80 85.1/51 65.2/12 WHSFII11317_Hubei_tombus-like_virus_11_len2724 86.9/29

17.2/18 89.9/89

100/100 100/98 88.7/78

90.6/41 beimix73523_Wenzhou_tombus-like_virus_8_len4683 98.4/90 WHCC112361_Hubei_tombus-like_virus_10_len3851

76.3/34 YP_009551334.1_RdRp__Citrus_yellow_vein-associated_virus_ 82.7/30 100/100 98.7/100 AWS06679.1_RdRP__Ethiopia_maize-associated_virus_ 100/100 Umbravirus

WZFSL85189_Wenzhou_tombus-like_virus_6_len5659 91.1/36 100/100 100/100 Dianthovirus 99/99

83/596.1/27 WHSFII19265_Hubei_tombus-like_virus_2_len5510 Tombusviridae 77.2/5063.3/38 100/100 WHWN54407_Hubei_tombus-like_virus_1_len5430 95.7/83 WGML136364_Hubei_tombus-like_virus_5_len3085 99.2/100 NP_619751.1_Tombusviridae_Avenavirus_Oat-chlorotic-stunt-virus_land-plants 88.5/86 WGML12775_Hubei_tombus-like_virus_6_len3770 83/56 87.6/55

89.3/4770.6/62

86.2/53 100/100 Betanecrovirus 90.6/87 100/100 Aureusvirus 96.2/94 100/100 Tombusvirus 94.7/58 NP_044732.1_Tombusviridae_Gallantivirus_Galinsoga-mosaic-virus 100/100 YP_007517174.1_Tombusviridae_Macanavirus_Furcraea-necrotic-streak-virus_land-plants 90.6/85 100/100 Alphanecrovirus NP_619718.1_Tombusviridae_Machlomovirus_Maize-chlorotic-mottle-virus_land-plants 97.6/90 100/100 100/100 Panicovirus NP_619521.1_Tombusviridae_Gammacarmovirus_Cowpea-mottle-virus_land-plants 100/100 YP_002333476.1_Tombusviridae_Gammacarmovirus_Soybean-yellow-mottle-mosaic-virus_land-plants 99.6/99 93.8/86 SCJXSX39078_Beihai_tombus-like_virus_1_len4145 99.5/100 NP_041226.1_Tombusviridae_Gammacarmovirus_Melon-necrotic-spot-virus_land-plants 98/99 17.3/44 NP_862835.2_Tombusviridae_Gammacarmovirus_Pea-stem-necrosis-virus_land-plants 88.4/86 99.4/98 Betacarmovirus 100/100 Pelarspovirus

92.2/91 YP_004300263.3_Tombusviridae_Trailing-lespedeza-virus-1_land-plants 98.9/99 93.3/91 YP_009270620.1_Tombusviridae_Gompholobium-virus-A_land-plants 96.8/98 Alphacarmovirus

942 0.5

943 Figure 7

944

48 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

82.5/48

YP_009553650.1_Hepeviridae_Rana-hepevirus_vertebrates

YP_009553211.1_Hepeviridae_Sogatella-furcifera-hepe-like-virus 84.5/64 45.8/15 98.2/100 WHCC116187_Hubei_hepe-like_virus_1_len7594 22.5/11 79.7/48 WHCC101555_Hubei_hepe-like_virus_2_len5933 98/96 83.9/66 WZFSL85851_Wenzhou_hepe-like_virus_2_len7652 Genome length (nt)

95/57 99.2/100 Hepeviridae 75.2/52 100/100 Tymovirales NP_062883.2_Matonaviridae_Rubivirus_Rubella-virus_human Segment I 99.7/100 83.9/66 Benyviridae Segment II 100/100 Endornaviridae 84/66 QHD64722.1_RdRp__Plasmopara_viticola_associated_virga-like_virus_1_ 97.3/100 ALG_9_DN7625_c0_g1_i2_len2313_reversed_ORF_2_frame_2_translationVirga-like bellevillovirus (ORF2) 93.6/92 85/77 WLJQ101932_Wenling_toga-like_virus_len9247 96.1/98 100/100 Togaviridae 98.5/98 Segment I 99/100 Bromoviridae Segment II 99.8/100 Closteroviridae Segment III 88.6/82 93.9/54 Virgaviridae mosHB236471_Hubei_virga-like_virus_1_len9141 84.4/62 100/100 mosHB235832_Hubei_virga-like_virus_2_len10650 96.7/100 QHA33736.1_polyprotein__Atrato_Virga-like_virus_2_ 88.2/73 SCM49209_Hubei_virga-like_virus_12_len8318_FS

100/100AMO03256.1_putative_polyprotein__Bofa_virus_ 85.1/87 AWA82276.1_polyprotein__Beult_virus_ 18.6/53 AMO03221.1_putative_polyprotein__Buckhurst_virus_ 87.6/8478.6/73 AMO03254.1_putative_polyprotein__Boutonnet_virus_ 15.8/59 AVK59469.1_replicase_protein__Hubei_virga-like_virus_11_ 100/100 QHA33754.1_polyprotein__Atrato_Virga-like_virus_5_ 99.6/100 YP_009553255.1_replicase__Culex_pipiens_associated_Tunisia_virus_

QDZ71189.1_hypothetical_protein__Megastigmus_ssRNA_virus_ 74.5/51 95.5/73 WHYY22294_Hubei_virga-like_virus_10_len9644 100/100 SCM51642_Hubei_virga-like_viurs_9_len10976

mosHB236537_Hubei_virga-like_virus_21_len10072

100/100QGA87337.1_hypothetical_protein__Hammarskog_virga-like_virus_ 3.5/34 QHA33793.1_polyprotein__Atrato_Virga-like_virus_7_ 99.9/100 QHA33782.1_polyprotein__Atrato_Virga-like_virus_7_ 74.3/52 86.5/59 QCM140901_Hubei_virga-like_virus_22_len4933 100/100 AWK77912.1_nonstructural_polyprotein_partial__Robinvale_plant_virus_1_ 93/83 mosHB236486_Hubei_virga-like_virus_23_len14662 90.5/89 BHHK129576_Beihai_anemone_virus_1_len11540 98/100 SXXX35475_Sanxia_atyid_shrimp_virus_1_len11157

YP_004928118.1_Kitaviridae_Higrevirus_Hibiscus-green-spot-virus-2_segment-RNA-1_land-plants 92.3/72 99.7/100 ATW76030.1_replicase__Hibiscus-infecting_cilevirus_ 100/100 YP_654538.1_Kitaviridae_Cilevirus_Citrus-leprosis-virus-C_segment-RNA-1_land-plants 21.9/49 QFU28437.1_RdRp__Passion_fruit_green_spot_virus_ 83.8/61 YP_009551530.1_Kitaviridae_Blunervirus_Tea-plant-necrotic-ring-blotch-virus_segment-RNA-2_land-plants 100/100 AGI44298.1_RdRp__Blueberry_necrotic_ring_blotch_virus_ 18.3/47 100/100 45.5/73 94.1/77 QGA69815.1_polyprotein__Varroa_destructor_virus_4_ 92/90 88.6/68

BHTH16123_Beihai_barnacle_virus_2_len9154

WGML111093_Hubei_virga-like_viurs_8_len9503

75.4/45 WHYY23932_Wuhan_house_centipede_virus_1_len10322 98.6/99 WHCCII10272_Wuhan_insect_virus_8_len10644 66.7/3724/60 YP_009552459.1_Virgaviridae_Nephila-clavipes-virus-3 85.3/80 WHCCII12598_Hubei_virga-like_virus_4_len7920 50.9/64 91.7/74 WHCCII13328_Wuhan_insect_virus_9_len9305 QTM162107_Hubei_negev-like_virus_1_len2937

AQM55309.1_hypothetical_protein_1__Cordoba_virus_ 100/100 75.1/36 YP_009351833.1_hypothetical_protein_1__Cordoba_virus_ 64.4/76 AQM55490.1_hypothetical_protein_1__Cordoba_virus_ 79.7/29 QTM27262_Hubei_virga-like_virus_7_len9481

CC64601_Hubei_negev-like_virus_2_len6640 71.6/435.2/32 YP_009351835.1_hypothetical_protein_1__Loreto_virus_ 64.7/68 AQM55272.1_hypothetical_protein_1__Big_Cypress_virus_

70.7/40YP_009552739.1_putative_RdRp__Ying_Kou_virus_

97.8/92AQZ55393.1_ORF1__Castlerea_virus_

93/95QBR99594.1_hypothetical_protein__Manglie_virus_

95.4/98AQM55317.1_hypothetical_protein_1__Ngewotan_negevirus_

92.5/91ASO75595.1_ORF1__Negev_virus_ 75.5/74 BBN20799.1_hypothetical_protein_1__West_Accra_virus_ 10/46 QDC23193.1_ORF1__Ngewotan_virus_

945 0.5

946 Figure 8

947

49 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

BHTH41702_Bunya-Arena_Beihai_barnacle_virus_6_len5878 Genome length (nt) 100/100 YYSZX17850_Bunya-Arena_Beihai_blue_swimmer_crab_virus_2_len6364

80.4/88 BALG_9_DN7730_c0_g1_i1_len2818_ORF_2_frame_2_translationunya-like bridouvirus (ORF2) 99/100 APG79310.1_2_RdRp__Shahe_bunya-like_virus_1_ 99.8/100 QDB75018.1_RdRp__Cladosporium_cladosporioides_negative-stranded_RNA_virus_2_

100/100 QGY72640.1_RdRp__Plasmopara_associated_mycobunyavirales-like_virus_3_

97.3/100 QDH86925.1_RdRp_partial__Arenavirus_sp._ 96.3/89 27.1/67 QGY72638.1_RdRp__Plasmopara_associated_mycobunyavirales-like_virus_1_

QGY72642.1_RdRp_partial__Plasmopara_associated_mycobunyavirales-like_virus_5_

92.7/93 BHTH16146_Bunya-Arena_Beihai_barnacle_virus_5_len6887 100/100 QDW81309.1_RdRp__Rhizoctonia_solani_bunya_phlebo-like_virus_1_

97.9/99 SCM50476_Bunya-Arena_Hubei_bunya-like_virus_5_len6161 100/100 SZmix21338_Bunya-Arena_Hubei_bunya-like_virus_6_len5634

98.9/99 WGML135976_Bunya-Arena_Hubei_bunya-like_virus_4_len6492

96.9/98 YP_009422199.1_Negarnaviricota_Coguvirus_Citrus-coguvirus_segment-RNA-1_land-plants 100/100 YP_009667028.1_Negarnaviricota_Phenuiviridae_Laulavirus_Laurel-Lake-laulavirus_segment-L_invertebrates 98/96 85.9/84 Phenuiviridae

99.7/99

13.5/3962/28

90.4/32 94.4/45

WP_076272294.1_pentapeptide_repeat-containing_protein__Paenibacillus_odorifer_ 32.7/18 37.6/54 SHWC0209c12636_Bunya-Arena_Shahe_bunya-like_virus_3_len9961 100/100 99.6/86 94.8/79 SHWC0209c12661_Bunya-Arena_Shahe_bunya-like_virus_4_len6195

YP_009666319.1_Negarnaviricota_Wupedeviridae_Wumivirus_Millipede-wumivirus 93.1/90 100/100 Nairoviridae

BHNXC41490_Bunya-Arena_Beihai_sipunculid_worm_virus_7_RdRp_len11709

96.7/98 WGML140132_Bunya-Arena_Hubei_myriapoda_virus_5_RdRp_len9874 96.3/89 80.8/84 99.3/100 Arenaviridae

100/100

BHJJX25757_Bunya-Arena_Beihai_hermit_crab_virus_2_RdRp_len9038 95.6/99 WGML134260_Bunya-Arena_Hubei_bunya-like_virus_11_len7824 89.3/94 99.7/100 100/100 Phasmaviridae

ZCM20815_Bunya-Arena_Hubei_orthoptera_virus_2_RdRp_len8763

53.3/75 WGML112345_Bunya-Arena_Hubei_bunya-like_virus_7_RdRp_len7402 98.1/98 100/100 WGML140341_Bunya-Arena_Hubei_myriapoda_virus_6_RdRp_len7470 18.3/39

BHJP59844_Bunya-Arena_Beihai_bunya-like_virus_5_len7842

9/28 SCM26604_Bunya-Arena_Hubei_bunya-like_virus_12_len4484 98.7/97 75.1/74 100/100

YP_009553307.1_Negarnaviricota_Athtab-bunya-like-virus_segment-L_invertebrates

99.1/94 100/100 Tospoviridae

93.5/98 WLJQ100911_Bunya-Arena_Wenling_crustacean_virus_9_RdRp_len6691 93.8/93 100/100 Fimoviridae 95.1/89 100/100 Peribunyaviridae

0.5 948

949 Figure 9

950

951

50 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

952 SUPPLEMENTARY FIGURE LEGENDS

953 Fig S1. Gel electrophoresis of RT-PCR to validate the putative new RNA viruses in

954 each sample used for library construction and RNA-seq. RT-PCR validation in (A)

955 ALG_1; (B) ALG_2; (C) ALG_3. rbcL: ribulose-bisphosphate carboxylase gene, tufA:

956 elongation factor Tu, both used as positive PCR-controls. K.a: Kraftionema allantoideum;

957 M.p: Microrhizoidea pickettheapsiorum ; D.t :Dolichomastix tenuilepsis.

958

959 Fig S2. MAFFT-alignment of the ALG_3_DN34624-encoded ORF with the viral RdRp

960 sequences that compose PFAM RdRp_4 profile.

961

962 Fig S3. MAFFT-alignment of the ALG_2_DN19089-encoded ORF with the PFAM

963 RdRp_1 profile entries.

964

965 Fig S4. rRNA depletion and contig assembly results. (A) Read counts before and after

966 trimming and SortmeRNA rRNA depletion. (B) Trinity assembly quality expressed as the

967 average length of contig assembled for each library.

968

969 Fig S5. Relative abundance of taxonomically-assigned contigs. Krona graphs obtained

970 using KMA and CCMetagen methods for (A) ALG_1, (B) ALG_2 and (C) ALG_3 libraries. The

971 ORFans contigs (i.e. those without any match in nt database) are not shown here. Each

972 circle represents a taxonomic level which is grouped and coloured based on their

973 taxonomic classification at lower ranks.

974

975 Fig S6. Genomic organization of the ALG_2 Partiti-like sequences and Bryopsis

976 mitochondria-associated dsRNA. ORFs are predicted using the mold protozoan

51 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

977 mitochondrial genetic code ( table 4). Ribosomal -1 frameshifting motifs

978 “GGAUUUU” are indicated in red boxes.

979

980 SUPPLEMENTARY TABLES

981 Table S1. RNAseq library preparation and sequencing results.

Library ID Library prep protocol Nb paired-end reads Data yield (bp)

ALG_1 SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian 21,014,207 5.30 Gb

ALG_2 TruSeq stranded/RiboZero gold 25,963,002 6.54 Gb

ALG_3 SMARTer Stranded Total RNA-Seq Kit v2 Pico Input Mammalian 18,448,371 4.65 Gb

982

983 Table S2. List of primers used in this study. tufA primers have been retrieved from 76.

Sequence ID Primer ID Sequence Direction Product Size

1.1F CATTGCGTGAGTGGTATGCG forward Hypothetical amalga-like 867 (ALG_1_DN2506) 1.1R GTGCGCCTCTTAGCTATTGT reverse

1.2F CGCTCATTGCACTAAAGATGACA forward Amalga-like boulavirus 1112 1.2R TCAACCGTGTAAGGAGTTTGA reverse

2.1F GCTGCCGCTCACAATAGAGA forward Amalga-like chassivirus 2616 2.1R GGGTAACGGGGTCCTGTATG reverse

2.2F CCAGCCGATTGATCTCGACA forward Tombus-like chagrupourvirus 2360 2.2R ATGGACTTGTAATGGGCGCA reverse

2.3F GATTGACGCAGAACACGACG forward Amalga-like chaucrivirus 2105 2.3R ACAGACTCCGCCTCGAGTAA reverse

2.4F CCTGAGGGCACTTCTGTGTT forward 2000

52 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Mito-like babylonusvirus 2.4R CCATCATACGTTGGCGGAGA reverse

2.5F AGCAGAAGGAGGTCTCGCTA forward Mito-like bionusvirus 2311 2.5R TTTCGCTAAGGGGAACGACC reverse

2.6F GATCGCCTCCGTCTACATGG forward Mito-like spartanusvirus 2032 2.6R TTTGGACAGTCGGTACCGTC reverse

2.7R TGCAAGCTTCCCACCTTGAT reverse Mito-like laruketanusvirus 2070 2.7F GGAAAATCGCCAAAGCAGGG forward

2.8R GGCCTTTTAGAAAGCCAGCG reverse Mito-like boobanusbagrillonusvirus 2154 2.8F CCGGATGCCTCTTACGTGAA forward

2.9F GTTGCTGAGGGTCTTGTCGA forward Amalga-like dominovirus 2809 2.9R ACGATTTTCGAGACGCCTGT reverse

2.10F CTGGCAGCTACACACGAGAA forward Amalga-like lacheneauvirus 2675 2.10R GCGAATCAGTTGTTCCCGGA reverse

2.11F GGATGCGACCGTTGAATGC forward Partiti-like alassinovirus 2867 2.11R TCACTAGGGCACTCCACGA reverse

2.12F GGTTTGAACGCGGGGTAGA forward Partiti-like adriusvirus 3357 2.12R TCGAATTCATCAGTGGGCA reverse

2.13F TTTCAATGGCGGCGAGTCA forward Partiti-like lacotivirus 2503 2.13R CGGCCATATCTTCATCGTCCA reverse

2.14F CAGAACGTTGTTCCGGCTTG forward Mito-like picolinusvirus 2114 2.14R TCCCCGAACGTAAGGATCCT reverse

2.15F CCGCGGCCCAATCAAATATG forward Mito-like albercanusvirus 1973 2.15R CTGGGTAAGCACGCATTTCG reverse

53 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.08.141184; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

9.1F AAGTTGTGTGATAGGGGGCG forward Virga-like bellevillovirus 1741 9.1R TAAACATCCACACGCCGTCA reverse

9.2F TGGCTGCACTAACCCTCATG forward Bunya-like bridouvirus 2049 9.2R GCTGTGTTGCTTGCATCCAA reverse

rbcL_F CGTGCDATGCAYGCNGTWAT forward rbcL 275 rbcL_R TGCCAWACRTGAATACCACC reverse

tufA_F GGNGCNGCNCAAATGGAYGG forward tufA 758-901 tufA_R CCTTCNCGAATMGCRAAWCGC reverse

984

54