bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Ever-increasing viral diversity associated with the red imported fire ant Solenopsis invicta

2 (Formicidae: Hymenoptera)

3

4 César A.D. Xavier1, Margaret L. Allen2*, and Anna E. Whitfield1*

5

6 1Department of Entomology and Plant Pathology, North Carolina State University, 840 Main Campus

7 Drive, Raleigh, NC 27606, USA

8 2U. S. Department of Agriculture, Agricultural Research Service, Biological Control of Pests Research

9 Unit, 59 Lee Road, Stoneville, MS 38776, USA

10 *Corresponding author: Margaret L. Allen

11 Phone: 1(662) 686-3647

12 E-mail: [email protected]

13 *Corresponding author: Anna E. Whitfield

14 Phone: 1(919) 515-5861

15 E-mail: [email protected]

16 Abstract

17 Background: Advances in sequencing and analysis tools have facilitated discovery of many new viruses

18 from invertebrates, including ants. Solenopsis invicta is an invasive ant that has quickly

19 spread around the world, causing significant ecological and economic impacts. Its virome has begun

20 to be characterized pertaining to potential use of viruses as natural enemies. Although the S. invicta

21 virome is best characterized among ants, most studies have been performed in its native range,

22 with little information from invaded areas.

23 Methods: Using a metatranscriptome approach, we further characterized viruses associated with

24 S. invicta in two introduced areas, the U.S. and Taiwan. The data set used here was obtained from

25 different stages (larvae, pupae, and adults) of the S. invicta life cycle. Publicly available RNA sequences

26 from GenBank’s Sequence Read Archive were downloaded and de novo assembled using CLC

27 Genomics Workbench 20.0.1. Contigs were compared against the non-redundant protein

28 sequences and those showing similarity to viral sequences were further analyzed.

29 Results: We characterized five putative new viruses associated with S. invicta transcriptomes.

30 Sequence comparisons revealed extensive divergence across ORFs and genomic regions with most

31 of them sharing less than 40% amino acid identity with the closest homologous sequences

32 previously characterized. The first negative-sense single-stranded RNA viruses included in the

33 orders Bunyavirales and Mononegavirales are reported. In addition, two positive-sense single-strand RNA

34 viruses and one single-strand DNA virus were also characterized. While the presence of a putative tenuivirus

35 associated with S. invicta was previously suggested to be a contamination, here we

36 characterized and present strong evidence that Solenopsis invicta virus 14 (SINV-14) is a tenui-

37 like virus that has a long-term association with the ant. Furthermore, based on virus abundance

38 compared to housekeeping genes, phylogenetic relationships, and completeness of viral coding

39 sequences, our results suggest that four of five viruses reported, those being SINV-14, SINV-15,

40 SINV-16 and SINV-17, replicate in the ant S. invicta.

41 Conclusions: The present study expands our knowledge about viral diversity associated with S.

42 invicta in introduced areas with potential to be used as biological control agents, which will require

43 further biological characterization.

44 Keywords: Red imported fire ant, S. invicta, Virus diversity, Virome, Metatranscriptome,

45 Tenuivirus, Mononegavirales, Bunyavirales.

46 Background

47 Insects are the most abundant and diverse group of animals on earth (1). High throughput

48 sequencing has led to huge advances in revealing previously unknown diversity of insect viruses,

49 significantly contributing to filling deep phylogenetic gaps along evolutionary history within the

50 most diverse viral lineages (2-4). Nonetheless, like insect diversity, the diversity of viruses

51 associated with insects is far from clear (1, 3). While many studies have focused on viromes of

52 arbovirus-transmitting insects, especially those involved in transmission of medically important

53 viruses, such as mosquitoes (5-8), other groups that are important pests either impacting

54 agricultural or natural ecosystems, such as invasive ants, have been barely studied (9-11). In

55 addition to contributing to better understanding of basic aspects of virus ecology and evolution,

56 these studies may contribute to new opportunities to use viruses as tools to develop more

57 sustainable insect control methods (12-14).

58 The red imported fire ant, Solenopsis invicta, is an invasive pest causing significant

59 ecological impact and economic loss in invaded areas (15, 16). Originating from South America,

60 S. invicta was accidentally introduced into the southern region of the United States (U.S.) almost

61 a century ago, becoming a serious problem (17). Since then, it has spread throughout the

62 southeastern U.S. and more recently into Oklahoma, New Mexico, Arizona, and California (18).

63 Limited introduction events, likely associated with small founder populations, led to a significant

64 reduction in natural enemies and enemy diversity associated with S. invicta in introduced areas

65 (19-21). Therefore, S. invicta populations may reach sizes even greater than those observed in its

66 native range, making control difficult and even more costly (22). High densities observed in S.

67 invicta populations in the U.S. have facilitated its dispersal across the world, contributing to

68 repeated introduction events in several countries, such as China, Taiwan, and Australia (18).

69 Morrison et al. (23) demonstrated, based on predictive models, that most tropical and subtropical

70 regions worldwide are potentially appropriate for S. invicta infestation. Highly competitive ability,

71 generalist feeding habits and high population densities make S. invicta a successful invasive

72 species causing huge disturbance in biodiversity by displacing native ants and other in

73 introduced regions (15). Currently, chemical insecticides are the most common control strategy

74 used against S. invicta (24). Low efficacy due to temporary effects of chemical applications, high

75 cost in extensive areas, and off-target effects harmful to beneficial and other native species are

76 substantial impairments to addressing invasive ant damage and expansion. Therefore,

77 establishment of management strategies that are both environmentally friendly and self-sustainable

78 is necessary.

79 Classical biological control has been one strategy used in an attempt to control this pest in

80 the U.S., with viruses considered a promising resource to be used as biopesticides (24-26). Over

81 the last decade, a great effort has been made in characterizing the S. invicta virome pertaining to

82 potential use in biological control (11, 27-30). To date, the S. invicta virome is composed of mainly

83 positive-sense single-strand RNA (+ssRNA) viruses in the order . These include

84 eleven viruses from families , Polycipiviridae, , Soliniviridae and two

85 unclassified viruses (11, 31). Additionally, one double-strand RNA (dsRNA) and one single-strand

86 DNA (ssDNA) of the families and , respectively, have been characterized

87 (11, 32). While most of these viruses have been reported associated with S. invicta in its native

88 range, only the species Solenopsis invicta virus 1 (SINV-1), SINV-2, SINV-3, SINV-6 and

89 Nylanderia fulva virus 1 (NfV-1) have been reported in the United States (11, 31). Although great

90 progress has been made in identification and molecular characterization of the S. invicta virome, the

91 effect of these viruses and their potential use in biological control still need to be elucidated.

92 Interactions between S. invicta and SINV-1, SINV-2 and SINV-3 are the only ones

93 previously characterized (25, 33, 34). SINV-1 was shown to affect claustral queen weight making

94 them lighter than uninfected ones, whereas SINV-2 directly affected fitness of queens by reducing

95 their reproductive output (33). SINV-3 was shown to be the most aggressive virus, causing

96 significant mortality in S. invicta colonies, with greatest potential as a biological control agent (25,

97 35).

98 Here, using a metatranscriptome approach, we further characterized the viruses associated

99 with S. invicta in two introduced areas, the U.S. and Taiwan. These investigations utilized existing

100 publicly available RNA sequences deposited in NCBI GenBank as Sequence Read Archive (SRA)

101 data files (Table 1). Five new viruses were found associated with S. invicta, with the first negative-

102 sense single-stranded RNA (-ssRNA) viruses included in the orders Bunyavirales and

103 Mononegavirales reported. In addition, two +ssRNA viruses, included in the family Iflaviridae

104 and an unassigned species, and a partial ssDNA virus, were also characterized.

105 Methods

106 Data set selection and sequencing

107 Contigs corresponding to a putative new virus genome were initially obtained from a

108 transcriptome project studying differential gene expression between larvae and pupae stages of S.

109 invicta collected in Mississippi, U.S. (36). To further investigate and characterize viral diversity

110 associated with S. invicta, six libraries (36) were downloaded from GenBank’s Sequence Read

111 Archive (SRA) and analyzed. In addition, to determine whether those viruses were present in other

112 locations in the U.S. and abroad, several transcriptome libraries deposited at SRA were compared

113 against those previous putative viral contigs using BLASTn. Libraries that presented high

114 abundance of reads mapping to putative viral contigs with an E-value cutoff lower than 1e-2 were

115 downloaded and further analyzed. The final data set analyzed here consisted of 13 libraries, all

116 from S. invicta transcriptomes. Detailed information on the samples is presented in Table 1. All

117 information about sample collection, RNA extraction, library preparation and high throughput

118 sequencing has been previously described (36-38).

119 De novo assembly and virus genome characterizations

120 Sequencing reads were trimmed for quality (Suppl. Table S1) and de novo assembled using

121 CLC Genomics Workbench 20.0.1 with default settings. Reads associated with S. invicta were first

122 filtered by mapping them back to the S. invicta genome (GenBank accession: GCA_000188075.2)

123 and only unmapped reads were used for de novo assembly (Suppl. Table S1). Contigs were

124 compared against the NCBI non-redundant protein sequence (nr) database using BLASTx and those predicted

125 to contain near full-length and intact coding sequences homologous to viral sequences previously

126 described were confirmed by mapping reads back to obtain consensus sequences for each library.

127 Potential open reading frames (ORFs) of the putative new viruses were predicted using ORF finder

128 (NCBI) and by comparative analysis with related viruses. For identification of conserved

129 functional domains, a domain-based BLAST search was performed against the Conserved Domain

130 Database (CDD). ORFs that did not present any conserved domain were predicted based on

131 comparative analysis with other known closely related viruses (5).

132 Multiple sequence alignment and phylogenetic analysis

133 For phylogenetic analyses, one representative sequence for each taxon most closely related

134 to viruses described here was retrieved from GenBank according to BLASTx analysis. ORFs were

135 checked for integrity and extracted using ORF finder, and the presence of conserved domains

136 characteristic of each group was checked at CDD before proceeding to phylogenetic analysis.

137 For highly divergent data sets, regions comprising protein conserved domains were extracted using

138 a script written in R software according to coordinates obtained from CDD for each sequence, and

139 only the conserved domain was used (as indicated below in the figure legends). Multiple sequence

140 alignments of deduced amino acid sequences were prepared using the MUSCLE option in MEGA7

141 (39). Alignments were manually checked and adjusted when necessary. Phylogenetic trees were

142 constructed using Bayesian inference performed with MrBayes 3.2.6 (40). Estimates of the amino

143 acid model were automatically conducted by setting the prior for the amino acid model to mixed,

144 as implemented in MrBayes. The analyses were carried out running two independent runs of

145 20,000,000 generations with sampling at every 1000 generations and a burn-in of 25%.

146 Convergence between runs was accepted when the average standard deviation of split frequencies

147 was lower than 0.01. Trees were visualized and edited using FigTree and Corel Draw, respectively.
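The domain-extraction step described above (trimming each sequence to its conserved domain using coordinates reported by CDD) can be sketched as follows. This is a minimal Python illustration, not the authors' R script: the function name, the toy sequences, and the assumption that coordinates are 1-based and inclusive are all hypothetical.

```python
def extract_domains(sequences, coords):
    """Slice each protein sequence down to its conserved-domain region.

    sequences: dict mapping a sequence ID to its amino acid string.
    coords: dict mapping a sequence ID to (start, end), assumed here to be
            1-based and inclusive (an assumption about the coordinate format).
    Sequences without reported coordinates are left out of the result.
    """
    domains = {}
    for seq_id, seq in sequences.items():
        if seq_id not in coords:
            continue  # no conserved domain reported for this sequence
        start, end = coords[seq_id]
        if not (1 <= start <= end <= len(seq)):
            raise ValueError(f"coordinates out of range for {seq_id}")
        # Convert 1-based inclusive coordinates to a Python slice.
        domains[seq_id] = seq[start - 1:end]
    return domains


# Toy input for illustration only (not real viral sequences).
toy_sequences = {
    "virus_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "virus_B": "MSLLTEVETPIRNEWGCR",
}
toy_coords = {"virus_A": (3, 10)}
toy_domains = extract_domains(toy_sequences, toy_coords)
```

Only the sliced regions would then be passed to the alignment step; sequences lacking a detectable domain drop out, which mirrors the exclusion of highly divergent proteins from the trees.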

148 Inferring viruses-S. invicta association

149 Most samples used here were prepared from whole body ants, and to differentiate truly

150 replicating viruses from those that might be from food contamination or commensal organisms,

151 viral abundance was compared with abundance of three housekeeping genes: cytochrome c

152 oxidase subunit I (cox1), that presented high expression levels, ribosomal protein L18 (rpl18) and

153 translation elongation factor 1 (eif1-beta), both with lower expression levels. Viral and housekeeping

154 gene abundances were calculated as the percentage of viral or housekeeping reads per total number of

155 reads in each library (Table 2). Differences among groups were assessed using the non-parametric

156 Kruskal-Wallis test followed by post hoc multiple comparison test using Fisher’s least significant

157 difference (41) using Agricolae package (42) in R software. The data were log-transformed before

158 statistical testing. Thus, a virus was suggested to be hosted by S. invicta if it met at least

159 two of the following criteria, as previously suggested (8) with slight modifications: (i) virus abundance

160 was within the range or higher abundance than housekeeping genes; (ii) viral reads per total

161 number of reads in a library was higher than 0.01%; (iii) they were phylogenetically close to

162 another insect virus; and (iv) complete viral coding sequence regions were recovered.
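The abundance measure and the four criteria above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, the numbers in the usage lines are invented, and criterion (i) is approximated as "at least as abundant as the least abundant housekeeping gene", whereas the study assessed it with statistical tests.

```python
def percent_of_library(mapped_reads, total_reads):
    """Abundance as the percentage of mapped reads per total reads in a library."""
    return 100.0 * mapped_reads / total_reads


def suggested_host_association(virus_pct, housekeeping_pcts,
                               close_to_insect_virus, complete_cds,
                               min_criteria=2):
    """Return True if at least `min_criteria` of the four criteria are met.

    Criterion (i) is simplified here (assumption): the virus percentage must
    reach the lowest housekeeping-gene percentage.
    """
    criteria = [
        virus_pct >= min(housekeeping_pcts),  # (i) within or above housekeeping range
        virus_pct > 0.01,                     # (ii) more than 0.01% of library reads
        close_to_insect_virus,                # (iii) phylogenetically close to an insect virus
        complete_cds,                         # (iv) complete coding sequences recovered
    ]
    return sum(criteria) >= min_criteria


# Invented numbers for illustration only.
low_pct = percent_of_library(50, 1_000_000)
weak_case = suggested_host_association(low_pct, [0.02, 0.3, 1.5], False, False)
strong_case = suggested_host_association(0.5, [0.02, 0.3, 1.5], True, True)
```

Requiring two of four criteria rather than all four allows a low-abundance virus to qualify on phylogenetic placement and genome completeness alone.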

163 Principal component analysis (PCA) of compositional bias

164 Synonymous codon usage (SCU) and dinucleotide composition bias have been used to infer

165 putative host-virus association and taxonomic placement for a given virus (43, 44). To gain further

166 insight about the host of the tenui-like sequences reported here, we used principal component

167 analysis (PCA) to compare the compositional bias between plant tenuiviruses and other viruses

168 infecting plants and animals (vertebrates and invertebrates). SCU counts were determined for the

169 full-length nucleotide sequence of RNA-dependent RNA polymerase (RdRp), glycoprotein,

170 nucleocapsid and non-structural protein 4 (NS4) protein, using the coRdon package (45) in R

171 software. Dinucleotide bias was calculated as the frequency of dinucleotide XY divided by the

172 product of frequencies of nucleotide X and nucleotide Y using the SeqinR package (46) in R

173 software. Sequences highly divergent with no conserved domain according to CDD analysis were

174 not included. Principal components analysis was performed on codon usage counts and

175 dinucleotide bias using the function prcomp and plotted using the factoextra package (47), both

176 implemented in R software.
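The dinucleotide bias defined above, the frequency of dinucleotide XY divided by the product of the frequencies of X and Y, can be written out in a few lines. The sketch below is plain Python rather than the SeqinR call used in the study; counting overlapping dinucleotides and converting U to T are assumptions made for this illustration, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_bias(seq):
    """Return f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Overlapping dinucleotides are counted (assumption). A value near 1 means
    the dinucleotide occurs about as often as expected from base composition.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    bias = {}
    for x, y in product("ACGT", repeat=2):
        fx = mono[x] / n
        fy = mono[y] / n
        fxy = di[x + y] / (n - 1)
        # Undefined when either nucleotide is absent from the sequence.
        bias[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return bias


# Toy sequence for illustration only.
toy_bias = dinucleotide_bias("ACGTACGT")
```

The resulting 16 values per sequence form one row of the matrix on which a principal component analysis can be run.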

177 Results

178 New viruses found associated with S. invicta transcriptome

179 Virus sequences were found associated with S. invicta transcriptomes collected from three

180 geographic locations in the U.S. and one location from Taiwan, representing different stages of the

181 S. invicta life cycle (Table 1). De novo assembling of non-host reads (Suppl. Table S1) from 13

182 libraries followed by BLAST analyses revealed the presence of complete or near-complete

183 genomes of 4 putative new single-strand RNA (ssRNA) viruses, tentatively named S. invicta virus

184 14 (SINV-14), SINV-15, SINV-16 and SINV-17, and one partial genome encompassing the almost

185 full-length coding sequence of a ssDNA virus, named S. invicta-associated densovirus (SINaDNV;

186 Figure 1). Sequence comparisons revealed extensive divergence across ORFs and genomic regions

187 with most of them sharing less than 40% amino acid identity compared to those closest

188 homologous sequences previously characterized (Suppl. Table S2). In addition, typical domains

189 associated with RdRp and other proteins involved in virus replication, encapsidation, and entry

190 into the cell were found that are characteristic of known virus groups, again confirming the viral

191 identity of those sequences (Suppl. Table S3). Further analyses demonstrated that SINV-14 and

192 SINV-15 fit within the orders Bunyavirales and Mononegavirales, respectively, with SINV-14

193 having a multisegmented genome (Figure 1). The two other viruses with non-segmented +ssRNA

194 genomes, SINV-16 and SINV-17, were included within the family Iflaviridae and into an

195 unclassified group closely related to a proposed new genus of insect-infecting viruses “Negevirus”

196 (48), respectively. Finally, the partial ssDNA virus SINaDNV was included into family

197 Parvoviridae subfamily Densovirinae.

198 Molecular and phylogenetic characterization of new viruses

199 Negative-sense RNA viruses

200 The SINV-14 genome consists of a linear, negative-sense single-strand RNA composed of

201 four segments, RNA1, RNA2, RNA3 and RNA4 with approximately 8900, 2862, 2086 and 2501

202 nucleotides (nt), respectively (Figure 1A). Identity analysis of deduced amino acid sequences of

203 coding regions across all segments demonstrated extensive divergence from the most closely

204 related viruses (Suppl. Table S2). The segments RNA1 and RNA2 encode only one ORF, located

205 on the negative strand (Figure 1A). The deduced amino acid sequence of RNA1 ORF was the most

206 conserved, showing highest identity with the RdRp of Fitzroy Crossing tenui-like virus 1

207 (FCTenV1), a new tenui-like virus described associated with Culex annulirostris from western

208 Australia (Suppl. Table S2) (49), followed by European wheat striate mosaic virus (EWSMV), a

209 typical plant tenuivirus (50). Likewise, even though the RNA2 ORF was the most divergent, it

210 showed highest identity with the glycoprotein of FCTenV1 (Suppl. Table S2). In contrast with

211 RNA1 and RNA2, the segments RNA3 and RNA4 encode two ORFs in ambisense orientation

212 (Figure 1A). Deduced amino acid sequence of RNA3 ORF, encoded on the negative strand,

213 showed highest identity with nucleocapsid protein of otter fecal bunyavirus, an unclassified

214 (51), while no significant similarity was found to a small putative ORF located on the

215 positive strand (Suppl Table S2). The BLASTx analyses indicated that ORFs encoded by RNA4

216 in the positive and negative strands were related to those of non-nucleocapsid proteins from typical

217 plant , and nonstructural protein 4 (NS4) of FCTenV1 followed by other plant

218 tenuiviruses, respectively (Suppl. Table S2). Interestingly, the ORF encoded on the negative strand

219 contained the specific domain of movement proteins (pfam03300) of plant tenuiviruses (Suppl.

220 Table S3). A domain-based BLAST search was performed against the CDD, confirming the

221 presence of conserved domains from typical members of family for RdRp,

222 glycoprotein, putative NS4 and non-nucleocapsid protein (Suppl. Table S3). Also interestingly,

223 further sequence analysis revealed the presence of conserved sequences at the ends of all four

224 segments identical to those found in tenuivirus genomes (3’-ACUUUGUGU and ACACAAAGU-5’;

225 Figure 1B) (52), indicating that full length genome segments were likely assembled from the

226 RNA-Seq datasets.
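The two terminal motifs quoted above are reverse complements of each other, which is why the segment ends can base-pair. A small Python check is sketched below; it assumes both motifs are written in the same orientation, as the notation in the text suggests, and the helper name is hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


# Terminal motifs as quoted in the text, both read in the same direction.
motif_a = "ACUUUGUGU"
motif_b = "ACACAAAGU"

termini_can_pair = reverse_complement(motif_b) == motif_a
```

The same comparison could be run on the ends of each assembled segment to check that both conserved termini were recovered.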

227 To further determine the taxonomic placement of SINV-14 within the order Bunyavirales,

228 Bayesian phylogenetic trees were inferred based on deduced amino acid sequence of RdRp,

229 glycoprotein, nucleocapsid and NS4, representative of all four segments found (Figure 2). In

230 accordance with identity analysis, SINV-14 RdRp was most closely related to that of FCTenV1 (Figure

231 2A). These two viruses clustered closely to Horsefly horwuvirus (virus Wuhan horsefly virus,

232 WhHV), a tenui-like virus described from a pool of horseflies (family Tabanidae) (2), the only

233 species within the genus Horwuvirus (Figure 2A). Moreover, while the WhHV putative

234 glycoprotein and NS4 were very divergent and no conserved domains were detected based on CDD

235 analyses (thus they were not included into phylogenetic analyses) SINV-14 glycoprotein and NS4

236 phylogenetic trees were congruent with RdRp, clustering in a well-supported clade closest to

237 FCTenV1 and composing a sister clade with typical plant tenuiviruses (Figure 2B and C).

238 Interestingly, the phylogenetic tree of the nucleocapsid protein was incongruent compared to those

239 of other viral proteins (Figure 2D). It was more closely related to that of otter fecal bunyavirus and

240 other representative viruses of different genera within family Phenuiviridae rather than with plant

241 tenuiviruses (Figure 2D). The identity analysis supports the phylogenetic placement in which

242 nucleocapsid protein was significantly more closely related to otter fecal bunyavirus than to tenui-like and

243 plant tenuiviruses (Suppl. Table S2).

244 The SINV-15 genome comprises a linear negative-sense, single strand RNA varying from

245 10113 to 10135 nt (Figure 1C). The genome organization is typical of viruses within order

246 Mononegavirales, which contain the conserved core predicted to encode five ORFs: nucleocapsid

247 (N), phosphoprotein (P), matrix protein (M), glycoprotein (G) and RdRp (L) on the negative strand

248 (Figure 1C). Domain-based BLAST search confirmed that ORFs 1 and 5 carry nucleocapsid and

249 RdRp conserved domains, respectively, whereas no conserved domains were identified for ORFs

250 2, 3 and 4 (Suppl. Table S3). Thus, these ORFs were inferred based on comparative analysis with

251 other known related viruses. Identity analysis indicated SINV-15 is most closely related to Formica

252 fusca virus 1 (FfusV-1; Suppl. Table S2), another virus within order Mononegavirales recently

253 described associated with the ant Formica fusca (9). Indeed, phylogenetic analysis based on

254 deduced amino acid sequence of RdRp clustered SINV-15 closest to FfusV-1 and Formica exsecta

255 virus 4 (FeV-4), neither of which is an accepted species yet. As these three viruses composed a clade

256 closely related to Orinoco virus (ONCV), the only accepted species of genus Orinovirus, it

257 suggests they are putative new members of this genus within the family . This is the

258 first report of an orinovirus associated with S. invicta.

259 Positive-sense RNA viruses

260 In addition to SINV-14 and SINV-15, which have negative sense genomes, we also

261 characterized two positive sense RNA viruses (Figure 1D and E). SINV-16 has a positive sense

262 single-stranded RNA genome varying from 10,355 to 10,369 nt (Figure 1D). Genome organization

263 is typical of viruses included in the order Picornavirales. The single ORF predicted encodes a

264 large polyprotein (2910 amino acids) including RdRp, helicase, and the canonical

265 protein domain (Figure 1D and Suppl. Table S3). Sequence comparison based on amino

266 acid sequence of the polyprotein revealed significant identity with two unaccepted iflavirus-related

267 species, pink bollworm virus 4, isolated from the pink bollworm moth Pectinophora gossypiella

268 (53) and Hubei myriapoda virus 1, isolated from a pool of individuals of the family Scutigeridae (3).

269 Phylogenetic analysis of RdRp domain clearly clustered SINV-16 within genus Iflavirus (Figure

270 3B). In accordance with identity analysis, it was most closely related to Hubei myriapoda virus 1,

271 in a well-supported clade also comprising pink bollworm virus 4 and Dinocampus coccinellae

272 paralysis virus (DcPV), an accepted iflavirus species isolated from the parasitic wasp Dinocampus

273 coccinellae. These results indicate that SINV-16 is a new iflavirus reported from S. invicta.

274 The SINV-17 genome consists of a linear, positive-sense single-stranded RNA with

275 approximately 10,341 nt predicted to encode three putative ORFs (Figure 1E). ORF1 was predicted

276 to encode the longest protein containing the viral methyltransferase, helicase and RdRp domains

277 (Figure 1E and Suppl. Table S3). ORF2 and ORF3 were predicted to encode smaller structural

278 proteins, which contain domains associated with virion glycoproteins of insect viruses and virion

279 membrane proteins of plant and insect viruses, respectively (Suppl. Table S3). Interestingly, while

280 ORFs 1 and 2 were most similar to Hubei virga-like virus 8, an unclassified virus isolated from a

281 pool of individuals of the family Scutigeridae (3), ORF 3 showed significant identity to Loreto

282 virus isolated from Anopheles albimanus, a member of the proposed genus Negevirus (48) (Suppl.

283 Table S2). To further determine the taxonomic placement of SINV-17, Bayesian phylogenetic

284 trees were inferred based on the deduced amino acid sequence of ORF1 RdRp domain. In

285 accordance with identity analysis, SINV-17 was most closely related to Hubei virga-like virus 8,

286 appearing to represent a sister clade to the proposed genus Negevirus, and possibly representing a

287 new genus (Figure 3C).

288 Other virus sequences

289 Two small contigs were found showing significant similarity with RNA virus sequences.

290 The first one is 430 nt in length and exhibited highest identity with a hypothetical protein encoded

291 by Beihai tombus-like virus 6 (41.13% sequence identity; 97% coverage; E-value = 4e-17;

292 GenBank accession: APG76145.1), isolated from a pool of marine crustaceans within the family

293 Penaeidae (3). The second contig is 330 nt in length and showed highest identity with NfV-1 polyprotein

294 (59.49% sequence identity; 93% coverage; E-value = 2e-24; GenBank accession: AOC55078.1) from

295 the tawny crazy ant Nylanderia fulva (54).

296 In addition to the RNA viruses, we also characterized a partial genome encompassing the

297 almost full-length coding sequence of a single-strand DNA virus. The SINaDNV partial genome

298 assembled from RNA-Seq reads was predicted to encode three ORFs on the positive strand,

299 showing a typical organization of viruses within the family Parvoviridae (Figure 1F). The ORFs

300 1 and 2 overlapped and were most similar to NS1 and NS2 of mosquito-infecting

301 densoviruses, respectively (Suppl. Table S2). In addition, a third ORF was predicted to encode the

302 capsid protein showing significant identity to Aedes albopictus C6/36 cell densovirus (Suppl.

303 Table S2). Phylogenetic analysis based on the deduced amino acid sequence of NS1 clustered

304 SINaDNV in a well-supported branch most closely related to viruses within subfamily

305 Densovirinae and genus Brevidensovirus (Figure 3D). While these results strongly suggest

306 SINaDNV is a new densovirus species, further investigation will be needed to reveal its whole

307 genome organization.

308 Inferring host association

309 Although the transcriptomic approach has been shown to efficiently reveal viral diversity

310 associated with a diverse range of samples, inference about host-virus association is not always

311 a trivial task. S. invicta is a generalist predator/scavenger feeding primarily on other insects and

312 small arthropods, both alive and dead. Thus, it is possible that viruses characterized here could be

313 associated with undigested food or even with S. invicta-associated microorganisms, since most

314 samples involved the whole ant body (Table 1). To infer host-virus association and differentiate

315 viruses truly replicating from those that may be a contamination, we calculated the percentage of

316 viral reads for each library and compared with three housekeeping genes, cox1, rpl18 and eif1-

317 beta (Figure 4 and Table 2). While the abundance of all three housekeeping genes were relatively

318 stable across different libraries, it differed among them, with cox1 most abundant, followed by

319 rpl18 and eif1-beta (Figure 4A). Likewise, virus abundance was relatively stable within species,

320 except for SINV-14, which showed a great variation between libraries, varying between 0.130 to

321 8.394% of viral reads compared to the total reads, considering the virus complete genome (Table

15

bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

322 2). For SINV-14 and SINV-16, read abundance was significantly higher than eif1-beta and rpl18,

323 with no difference compared to cox1 (Figure 4B). Moreover, SINV-17 abundance was

324 significantly lower than cox1 and higher than eif1-beta and rpl18. In contrast, SINV-15 abundance

325 was the lowest one among them, being significantly lower than coxI, with no difference compared

326 to rpl18 and eif1-beta (Figure 4B). In addition, SINV-14, SINV-15, SINV-16 and SINV-17

327 abundances were higher than 0.01% of reads compared to entire library (Table 2). The only

328 exception was SINaDNV, that had a very low abundance compared to housekeeping genes and

329 other viruses (Table 2).
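The abundance measure used in this comparison, reads mapped to a target expressed as a percentage of all reads in a library, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Library` class, the function names and all read counts are hypothetical, and only the 0.01% threshold is taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Library:
    """Read counts for one RNA-Seq library (all values here are hypothetical)."""
    name: str
    total_reads: int
    mapped_reads: Dict[str, int]  # target name -> reads mapped to that target


def percent_of_reads(mapped: int, total: int) -> float:
    """Reads mapped to a target as a percentage of all reads in the library."""
    if total <= 0:
        raise ValueError("total read count must be positive")
    return 100.0 * mapped / total


def abundance_table(lib: Library, threshold_pct: float = 0.01) -> Dict[str, Tuple[float, bool]]:
    """Return {target: (percent of reads, exceeds threshold?)} for one library."""
    table = {}
    for target, count in lib.mapped_reads.items():
        pct = percent_of_reads(count, lib.total_reads)
        table[target] = (pct, pct > threshold_pct)
    return table


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = Library(
    name="hypothetical_library",
    total_reads=1_000_000,
    mapped_reads={"virus_A": 83_940, "virus_B": 50, "cox1": 5_000},
)
table = abundance_table(example)
for target, (pct, above) in table.items():
    print(f"{target}: {pct:.3f}% of reads, above threshold: {above}")
```

Expressing viruses and housekeeping genes on the same percentage scale is what makes them directly comparable within a library; the statistical comparison across libraries is a separate step not shown here.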

Interestingly, SINV-14 abundance was much higher in a sample prepared from a dissected brain than in those from the whole body (Tables 1 and 2). Considering the abundance of all segments, SINV-14 in the brain sample was 17-fold more abundant than the cox1 gene, whereas in samples prepared from the whole body its abundance ranged from 0.135- to 2.69-fold that of cox1 (Table 2). Moreover, the abundance of segments was asymmetric (Figure 4C). Reads mapping to RNA1 were significantly more abundant, followed by RNA3 and RNA4, with RNA2 the least abundant (Figure 4C). These results suggest that SINV-14 is unlikely to be a contaminant and may exhibit tissue tropism, as we observed evidence that it accumulates more in the brain than in the whole body.
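The two quantities discussed in this paragraph, fold abundance relative to a housekeeping gene and the relative share of reads per genome segment, can be illustrated with the sketch below. It assumes raw mapped-read counts with no normalization for segment length (the text does not state which was used), and the function names and all numbers are hypothetical.

```python
from typing import Dict, List


def fold_relative_to(target_pct: float, reference_pct: float) -> float:
    """Abundance of a target expressed as a multiple of a reference gene's abundance."""
    if reference_pct <= 0:
        raise ValueError("reference abundance must be positive")
    return target_pct / reference_pct


def segment_shares(segment_reads: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the virus-mapped reads contributed by each genome segment."""
    total = sum(segment_reads.values())
    if total <= 0:
        raise ValueError("at least one read must map to the virus")
    return {segment: count / total for segment, count in segment_reads.items()}


def rank_segments(segment_reads: Dict[str, int]) -> List[str]:
    """Segments ordered from most to least abundant."""
    shares = segment_shares(segment_reads)
    return sorted(shares, key=shares.get, reverse=True)


# Hypothetical values: a virus at 8.5% of reads versus a reference gene at 0.5%.
print(fold_relative_to(8.5, 0.5))

# Hypothetical per-segment read counts for a four-segment genome.
hypothetical_segment_reads = {"RNA1": 500, "RNA2": 50, "RNA3": 300, "RNA4": 150}
shares = segment_shares(hypothetical_segment_reads)
ranking = rank_segments(hypothetical_segment_reads)
print(ranking)
```

A fold value below 1 means the virus is less abundant than the reference gene, which is why a range such as 0.135- to 2.69-fold spans both sides of the reference.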

To further verify the relationship between SINV-14 and S. invicta, we also analyzed codon usage and dinucleotide bias (Figure 5). Compositional biases in virus genomes are likely to mimic those observed in their host species (43, 44), and therefore have been used to assign a given virus to a host and to assist in taxonomic placement. A data set similar to that used for the phylogenetic analyses was used here (Suppl. Table S4). Principal component analysis of codon usage based on RdRp, glycoprotein, nucleocapsid and NS4 protein clearly separated SINV-14 from plant viruses (tenuivirus and ), and non-plant viruses (phenuivirus, orthobunyavirus and tenui-like viruses; Figure 5A-D). Based on the RdRp, three very clear clusters were observed, representing plant virus, non-plant virus and SINV-14 (Figure 5A). Interestingly, the two tenui-like viruses (WhHV and FCTenV1) were closer to plant tenuiviruses than to SINV-14, based on RdRp (Figure 5A). Moreover, while SINV-14 clearly clustered separately from the other groups based on glycoprotein, nucleocapsid and NS4, the separation among plant viruses, non-plant viruses and tenui-like viruses was not as clear as that observed for RdRp (Figure 5A-D). We also performed principal component analysis on dinucleotide bias, obtaining results similar to those of the codon usage analysis (Figure 5E-H). Whereas the RdRp-based clusters of plant and non-plant viruses were not very clear, for all other genes SINV-14 clearly separated from any other group (Figure 5E-H), and again, the tenui-like viruses (WhHV and FCTenV1) were closer to the other groups than to SINV-14.
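To make the compositional measures concrete, the sketch below computes two feature vectors of the kind a principal component analysis like this could take as input: relative synonymous codon usage (RSCU) and dinucleotide odds ratios. It is an illustration in plain Python and not the authors' workflow, which cites R packages (45-47). It assumes the standard genetic code and in-frame coding sequences, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds: str) -> Dict[str, float]:
    """RSCU per codon: observed count divided by the mean count of its synonymous family.

    Stop codons and the single-codon amino acids (Met, Trp) are skipped because
    they carry no information about synonymous choice.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codon, aa in CODON_TO_AA.items():
        if aa in "*MW":
            continue
        family = [c for c, a in CODON_TO_AA.items() if a == aa]
        family_total = sum(counts[c] for c in family)
        values[codon] = counts[codon] * len(family) / family_total if family_total else 0.0
    return values


def dinucleotide_odds_ratios(seq: str) -> Dict[str, float]:
    """Odds ratio f(XY) / (f(X) * f(Y)) for each of the 16 dinucleotides."""
    s = seq.upper().replace("U", "T")
    if len(s) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(len(s) - 1))  # overlapping dinucleotides
    ratios = {}
    for x, y in product("ACGT", repeat=2):
        expected = (mono[x] / len(s)) * (mono[y] / len(s))
        observed = di[x + y] / (len(s) - 1)
        ratios[x + y] = observed / expected if expected else 0.0
    return ratios


# Toy sequences, for illustration only.
toy_rscu = rscu("GCTGCTGCCGCA")               # four alanine codons
toy_ratios = dinucleotide_odds_ratios("ACAC")
print(toy_rscu["GCT"], toy_ratios["AC"])
```

Each virus gene would contribute one such vector (59 RSCU values or 16 odds ratios), and the stacked vectors form the matrix on which principal components are computed.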

Discussion

The advancement of sequencing tools, the collection of numerous genetic samples of high-impact invasive ants, and the continuing discovery of novel insect-related (and other) viruses have combined to facilitate identification of novel viruses like those reported here. While virus identification for the purpose of addressing pathogenic relationships will certainly continue, it is now possible to identify virome entities that are not necessarily pathogens in the organism samples from which they are identified.

If a species of ant is invasive, are the viruses that persist with that species in its invasive range also invasive species? Definitions of “invasive viruses” would argue that only when a virus causes a disease of obvious impact is the virus itself considered invasive (55). Yet it is easy to speculate that a virus arriving with an invasive ant could prove pathogenic to native ants, insects, and insectivores in the invaded region; perhaps even plants and the human food chain could be affected. More thorough characterization of the virosphere will enhance our understanding of the roles microbes play and how viruses and hosts interact.

S. invicta is a ground-dwelling ant with a broad diet that may include plant and insect exudates, prey, and decaying matter. Thus, the opportunities for exchange of virus particles from the environment, whether native or novel, are many. The social nature of ant colonies, in which exchanges of biological fluids among colony members of all castes and life stages must occur, also facilitates distribution of viruses within the colony. Distinguishing clearly between different host/microbe relationships and forms of virus transmission (horizontal and vertical) will be challenging in invasive ant systems.

Using a transcriptome approach, we searched for RNA viruses in the SRA data collected from S. invicta. Curiously, while a great diversity of positive- and negative-sense viruses has been reported from ants (9, 10) and arthropods (2), the S. invicta virome has so far been composed mainly of +ssRNA viruses in the order Picornavirales, with no negative-sense RNA viruses previously reported. One factor that could be responsible for this bias is the selection of polyadenylated RNA for library preparation in previous studies (11, 30). Besides detecting +ssRNA viruses, unbiased library preparation using ribosomal RNA depletion methods has enabled the discovery of many viruses with non-polyadenylated genomes, especially within the order Bunyavirales (2). The fact that some of the libraries used here were prepared using this approach (36) allowed us to characterize SINV-14, the first bunyavirus reported from ants. In addition, we report for the first time another negative-sense virus, within the order Mononegavirales, associated with S. invicta. Although we did not perform any amplification step for the virus genomes characterized here, the presence of untruncated ORFs carrying intact functional domains (Suppl. Table S3), the high abundance of viral reads, and the similar organization and genome sizes compared to other closely related viruses strongly suggest that we obtained the correct full-length or near full-length genomes (Figure 1). The only exception is SINaDNV, owing to its linear single-stranded DNA (ssDNA) genome. Only active transcriptional units were sampled using our transcriptome approach, and further investigation is needed to reveal its full-length genome structure.

Our analysis of RNA obtained from geographically and temporally different samples suggests shifts in the virome of native and invasive fire ants (11). In established exotic populations of S. invicta, Yang et al. (21) identified SINV-1 and SINV-2 and hypothesized that while these two viruses may persist, the more virulent virus, SINV-3, arrived with founders but caused high host mortality, resulting in individual carriers of the virus being rapidly eliminated. Nine additional viruses were identified in S. invicta RNA samples from the native South American range of the species (11). The viruses described here show distribution and composition that varied according to geographic location and S. invicta stage (Tables 1 and 2). While SINV-16, SINV-17 and SINaDNV were found exclusively in Mississippi and specifically associated with larvae, being detected in all three libraries, these viruses were not detected in pupae from the same sampling site, suggesting a stage-dependent association, as previously reported for some other S. invicta viruses (28). Conversely, SINV-14 and SINV-15 were detected in all stages analyzed (Tables 1 and 2). In addition, SINV-14 and SINV-15 were found in Mississippi and Texas, with SINV-14 also detected in Florida (Tables 1 and 2). Interestingly, while SINV-15, SINV-16, SINV-17 and SINaDNV were restricted to the U.S., SINV-14 was also detected in Taiwan (Tables 1 and 2). To date, four of the five viruses previously reported in association with S. invicta in introduced areas have also been described in its native range (11, 21). Although viral diversity associated with S. invicta has been extensively studied in Argentina, the viruses reported in this study have not yet been detected there (11, 30). To gain further insight into the distribution of the new viruses in the S. invicta native range, eight transcriptome libraries publicly available at SRA were analyzed. These libraries were prepared from 182 colonies collected from eastern Formosa, Argentina, suggested to be the original source of the S. invicta population in the U.S. (11, 18). Based on BLASTn analysis, our search failed to detect any reads significantly matching the virus genomes obtained here (data not shown), suggesting that these viruses may not be present in the sampled area in Argentina, and even that new host-virus associations may have arisen in introduced areas.

While prospecting for viral enemies in the center of origin of S. invicta offers the opportunity to discover a higher viral diversity with potential for use in biocontrol, investigating viruses circulating in introduced populations may reveal new virus-ant interactions with even greater potential. Due to the short-term virus-host association, viruses acquired in introduced areas may be even more harmful to the host. Furthermore, selecting natural enemies already present in the invaded area is likely to pose less risk to the natural ecosystem. Thus, the viruses characterized here expand the diversity of viruses infecting S. invicta in invaded areas with potential to be used as biological control agents, which will require further biological characterization.

The use of metatranscriptomic approaches has allowed the discovery of divergent groups of RNA viruses, contributing to a better understanding of the origin and evolutionary history of viruses associated with the most diverse host taxa (56). While the origins of viruses that replicate in both plants and insect vectors, such as those within the orders Bunyavirales and Mononegavirales, remain unknown, possible intermediate forms have been suggested (2, 49). Tenuiviruses are plant viruses that also replicate in their insect vector. They have a multipartite negative/ambisense ssRNA genome packaged in a filamentous, nonenveloped virion and are horizontally transmitted to the host plant in a circulative propagative manner by planthoppers (Order Hemiptera), in which vertical transmission can also occur (57, 58). Li et al. (2) reported the first species of a tenui-like virus, Horsefly horwuvirus (virus WhHV), associated with a pool of horseflies (family Tabanidae), and proposed that it may represent a transitional form between plant-infecting viruses and insect-specific viruses. In addition, a partial genome of a new tenui-like virus, FCTenV1, has recently been reported in association with Culex annulirostris (49). While these two tenui-like viruses have been reported associated with insects (Order Diptera), we characterized a new tenui-like virus closely related to FCTenV1 and associated with S. invicta (Order Hymenoptera). Interestingly, in contrast with WhHV, which lacks an ambisense coding strategy, the SINV-14 genome exactly mirrors the genomic structure of a typical plant tenuivirus, encoding proteins using an ambisense strategy, and also has conserved sequences at the ends of all four segments identical to those found in tenuivirus genomes (Figure 1A and B). While these viruses may represent different steps in the transition between plant-infecting tenuiviruses and insect-specific viruses, the direction of the process, whether from plant to insect or vice versa, remains unknown.

Whereas phylogenetic trees based on RdRp, glycoprotein and NS4 protein were highly congruent and related to those from tenui-like viruses, the nucleocapsid protein was closely related to that of otter fecal phlebovirus, strongly suggesting that SINV-14 may have a recombinant/reassortant origin, in which RNA3 or part of it was acquired from a divergent phenuivirus. While it could be argued that this might be an artefact of assembling a segmented genome using a transcriptome approach, the fact that we did not find any other contigs related to phenuiviruses, the high read abundance, the constant association between these four segments across different libraries and the presence of the conserved tenuivirus sequence at the ends of the genome segments (Figure 1B) suggest that they are part of a single genome rather than being an artefact. In addition, phylogenetic congruence among different segments indicates that they are in an intimate codivergence process (Suppl. Figure S1).

Valles et al. (59), using an expressed sequence tag (EST) library from S. invicta, detected a short sequence (approximately 750 nt; GenBank accession: EF409991) related to plant tenuiviruses. Further identity analysis showed that this sequence is 99.8% identical to SINV-14 RNA4. They suggested that the sequence was likely a contaminant originating from the ant's diet of plants or infected insects. Tenuiviruses are typical plant viruses that replicate in the insect vector (57, 58), and the high abundance of SINV-14 compared to housekeeping genes is strong evidence of active replication in S. invicta. Furthermore, the highest virus abundance was found in a sample prepared from a dissected ant brain, which rules out the possibility of contamination due to feeding on plants or association with insect vectors infected with a tenuivirus. Maize stripe virus (MSpV), a typical plant tenuivirus, was detected in the brain of its vector Peregrinus maidis, providing evidence of replication of tenuiviruses in this tissue (60). Tenuiviruses replicate in diverse tissues of their insect vectors and are transovarially transmitted between generations, suggesting that SINV-14 and other tenui-like viruses could be sustained through vertical transmission in their insect hosts (58). Additionally, the significant asymmetric abundance of different components of SINV-14 suggests a very specific and active interaction. Asymmetric accumulation in multipartite viruses has been shown and seems to be a common trait, shared by RNA and DNA multipartite viruses infecting plants and animals (61, 62); this has been suggested to be involved in control of gene expression, allowing fast virus adaptation (63). Although this has not yet been shown for any phenuivirus and may be host dependent (61), our results suggest that this may occur, at least for SINV-14, and more experiments will be necessary to confirm it.

While SINV-14 is evolutionarily closely related to insect tenui-like viruses and plant tenuiviruses, the synonymous codon usage and dinucleotide analyses demonstrate a distinct compositional bias compared to FCTenV1 and WhHV, and all other viruses analyzed here, indicating that the virus may be actively replicating in ants rather than in plants or other insects. Active replication may have driven the virus genome to a distinct compositional bias at the nucleotide level, while maintaining protein integrity at the amino acid level and a close relationship with other tenui-like viruses, as observed through phylogenetic analysis. The fact that most viruses examined here replicate in a plant or vertebrate host and also in an insect vector could explain the difference between them and SINV-14, which is most likely associated only with ants. Furthermore, phylogenetic congruence across different segments that mirrors the genetic structure of invasive S. invicta suggests an intimate and long-term codivergence process between virus and host. Ascunce et al. (18) demonstrated that S. invicta subpopulations from Texas comprise a distinct genetic cluster compared to those from Mississippi and Florida, a pattern that can also be observed in the phylogenies reconstructed for the distinct segments of SINV-14 (Suppl. Figure S1). Interestingly, while Ascunce et al. (18) suggested an introduction event from California into Taiwan, the virus recovered from Taiwan clustered with those from Mississippi and Florida, suggesting that other, more recent introductions may have occurred (64, 65).

We presented strong evidence that SINV-14 has a long-term association with ants and may actively replicate in them. However, whether this virus and other tenui-like viruses can replicate in plants is unknown. The presence in insect viruses of a protein carrying an NS4 domain, which is related to cell-to-cell and long-distance movement in plants (66), is puzzling. Solenopsis invicta virus 14 NS4 is highly divergent, showing 19.85 to 24.4% identity to those of other tenuiviruses (Suppl. Figure S2). In addition, mutations at most sites associated with cell-to-cell and long-distance movement (Suppl. Figure S2) suggest that this protein might have lost the capacity to move the viral genome in plants, whereas its maintenance in insect viruses may be related to another role acquired through functional diversification. The possible function of a plant virus movement protein in insect viruses is of significant interest, and its role in insects and plants remains to be addressed.
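Identity percentages such as those quoted for NS4 depend on how aligned positions are counted. The sketch below shows one common convention, assumed here for illustration because the text does not state which was used: two sequences that are already aligned are compared column by column, and columns with a gap in either sequence are ignored. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumes both strings come from the same alignment and so have equal length.
    Columns where either sequence has a gap are skipped (one of several conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides: five ungapped columns, four of them identical.
toy_identity = percent_identity("MKT-LV", "MRTALV")
print(toy_identity)
```

Counting gapped columns as mismatches instead would lower the value, so identities are only comparable when computed under the same convention.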

Altogether, based on virus abundance compared to housekeeping genes, viral read abundance higher than 0.01%, phylogenetic relationships, the recovery of complete viral coding regions, and compositional bias for SINV-14, our results suggest that four out of the five viruses reported here (SINV-14, SINV-15, SINV-16 and SINV-17) are truly replicating in S. invicta. Our results suggest fluid shifts in the virome of this invasive species. Further research describing this virome in native and invasive regions and ecosystems could provide insight into virus evolution and invasion mechanics.

Conclusions

The present study expands our knowledge of viral diversity associated with S. invicta in introduced areas. In addition to revealing new virus-host interactions and contributing to a better understanding of viral evolutionary history and emergence, the discovery of new viruses expands the range of agents with potential to be used in biocontrol programs, which will require further biological characterization. By understanding these interactions, we may be better equipped to cope with ongoing global changes and introductions of invasive organisms and their associated viruses.

Availability of data and materials

Consensus virus genome sequences were deposited in GenBank under accession numbers MT860232 to MT860267.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by USDA ARS In-House Appropriated Research Project: Biology and Control of Invasive Ants, Project Number: 6066-22320-010-00-D.

Authors' contributions

C.A.D.X., M.L.A. and A.E.W. conceived and designed the study. C.A.D.X. and M.L.A. performed the analysis. C.A.D.X., M.L.A. and A.E.W. discussed the results and wrote the manuscript.

Acknowledgements

Trade, firm, or corporation names in this publication are for the information and convenience of the reader. Such use does not constitute an official endorsement or approval by the U.S. Department of Agriculture or the Agricultural Research Service (USDA-ARS).

References

1. Stork NE. How many species of insects and other terrestrial arthropods are there on Earth? Annu Rev Entomol. 2018;63:31-45.

2. Li CX, Shi M, Tian JH, Lin XD, Kang YJ, Chen LJ, et al. Unprecedented genomic diversity of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses. Elife. 2015;29(4):05378.

3. Shi M, Lin X-D, Tian J-H, Chen L-J, Chen X, Li C-X, et al. Redefining the invertebrate RNA virosphere. Nature. 2016;540(7634):539-43.

4. Käfer S, Paraskevopoulou S, Zirkel F, Wieseke N, Donath A, Petersen M, et al. Re-assessing the diversity of negative strand RNA viruses in insects. PLoS Pathog. 2019;15(12).

5. Shi M, Neville P, Nicholson J, Eden JS, Imrie A, Holmes EC. High-resolution metatranscriptomics reveals the ecological dynamics of mosquito-associated RNA viruses in Western Australia. J Virol. 2017;91(17):00680-17.

6. Atoni E, Zhao L, Karungu S, Obanda V, Agwanda B, Xia H, et al. The discovery and global distribution of novel mosquito-associated viruses in the last decade (2007-2017). Rev Med Virol. 2019;29(6):13.

7. Faizah AN, Kobayashi D, Isawa H, Amoa-Bosompem M, Murota K, Higa Y, et al. Deciphering the virome of Culex vishnui subgroup mosquitoes, the major vectors of Japanese encephalitis, in Japan. Viruses. 2020;12(3).

8. Pettersson JH, Shi M, Eden JS, Holmes EC, Hesson JC. Meta-transcriptomic comparison of the RNA viromes of the mosquito vectors Culex pipiens and Culex torrentium in Northern Europe. Viruses. 2019;11(11).

9. Kleanthous E, Olendraite I, Lukhovitskaya NI, Firth AE. Discovery of three RNA viruses using ant transcriptomic datasets. Archives of virology. 2019;164(2):643-7. Epub 2018/11/10.

10. Dhaygude K, Johansson H, Kulmuni J, Sundström L. Genome organization and molecular characterization of the three Formica exsecta viruses-FeV1, FeV2 and FeV4. PeerJ. 2019;20(6).

11. Valles SM, Rivers AR. Nine new RNA viruses associated with the fire ant Solenopsis invicta from its native range. Virus Genes. 2019;55(3):368-80.

12. Lacey LA, Grzywacz D, Shapiro-Ilan DI, Frutos R, Brownbridge M, Goettel MS. Insect pathogens as biological control agents: back to the future. Journal of Invertebrate Pathology. 2015;132:1-41.

13. Chen X, Gonçalves MA. Engineered viruses as genome editing devices. Mol Ther. 2016;24(3):447-57.

14. Rode NO, Estoup A, Bourguet D, Courtier-Orgogozo V, Débarre F. Population management using gene drive: molecular design, models of spread dynamics and assessment of ecological risks. Conservation Genetics. 2019;20(4):671-90.

15. Porter SD, Savignano DA. Invasion of polygyne fire ants decimates native ants and disrupts arthropod community. Ecology. 1990;71(6):2095-106.

16. Pimentel D, Zuniga R, Morrison D. Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecological Economics. 2005;52(3):273-88.

17. Callcott A-MA, Collins HL. Invasion and range expansion of imported fire ants (Hymenoptera: Formicidae) in North America from 1918-1995. The Florida Entomologist. 1996;79(2):240-51.

18. Ascunce MS, Yang C-C, Oakey J, Calcaterra L, Wu W-J, Shih C-J, et al. Global invasion history of the fire ant Solenopsis invicta. Science. 2011;331(6020):1066-8.

19. Shoemaker DD, Ahrens M, Sheill L, Mescher M, Keller L, Ross KG. Distribution and prevalence of Wolbachia infections in native populations of the fire ant Solenopsis invicta (Hymenoptera: Formicidae). Environmental Entomology. 2003;32(6):1329-36.

20. Bouwma AM, Ahrens ME, DeHeer CJ, DeWayne Shoemaker D. Distribution and prevalence of Wolbachia in introduced populations of the fire ant Solenopsis invicta. Insect Mol Biol. 2006;15(1):89-93.

21. Yang C-C, Yu Y-C, Valles SM, Oi DH, Chen Y-C, Shoemaker D, et al. Loss of microbial (pathogen) infections associated with recent invasions of the red imported fire ant Solenopsis invicta. Biological Invasions. 2010;12(9):3307-18.

22. Porter SD, Fowler HG, Mackay WP. Fire ant mound densities in the United States and Brazil (Hymenoptera: Formicidae). Journal of Economic Entomology. 1992;85(4):1154-61.

23. Morrison LW, Porter SD, Daniels E, Korzukhin MD. Potential global range expansion of the invasive fire ant, Solenopsis invicta. Biological Invasions. 2004;6(2):183-91.

24. Drees BM, Calixto AA, Nester PR. Integrated pest management concepts for red imported fire ants Solenopsis invicta (Hymenoptera: Formicidae). Insect Science. 2013;20(4):429-38.

25. Valles SM, Porter SD, Choi M-Y, Oi DH. Successful transmission of Solenopsis invicta virus 3 to Solenopsis invicta fire ant colonies in oil, sugar, and cricket bait formulations. Journal of Invertebrate Pathology. 2013;113(3):198-204.

26. Oi D, Valles S, Porter S, Cavanaugh C, White G, Henke J. Introduction of fire ant biological control agents into the Coachella Valley of California. Florida Entomologist. 2019;102(1):284-6, 3.

27. Valles SM, Strong CA, Dang PM, Hunter WB, Pereira RM, Oi DH, et al. A picorna-like virus from the red imported fire ant, Solenopsis invicta: initial discovery, genome sequence, and characterization. Virology. 2004;328(1):151-7.

28. Valles SM, Hashimoto Y. Isolation and characterization of Solenopsis invicta virus 3, a new positive-strand RNA virus infecting the red imported fire ant, Solenopsis invicta. Virology. 2009;388(2):354-61.

29. Valles SM, Strong CA, Hashimoto Y. A new positive-strand RNA virus with unique genome characteristics from the red imported fire ant, Solenopsis invicta. Virology. 2007;365(2):457-63.

30. Valles SM, Porter SD, Calcaterra LA. Prospecting for viral natural enemies of the fire ant Solenopsis invicta in Argentina. PLoS One. 2018;13(2).

31. Allen ML. Near-complete genome sequences of new strain of Nylanderia fulva virus 1 from Solenopsis invicta. Microbiol Resour Announc. 2020;9(15):00798-19.

32. Valles SM, Shoemaker D, Wurm Y, Strong CA, Varone L, Becnel JJ, et al. Discovery and molecular characterization of an ambisense densovirus from South American populations of Solenopsis invicta. Biological Control. 2013;67(3):431-9.

33. Manfredini F, Shoemaker D, Grozinger CM. Dynamic changes in host-virus interactions associated with colony founding and social environment in fire ant queens (Solenopsis invicta). Ecol Evol. 2015;6(1):233-44.

34. Hsu H-W, Chiu M-C, Shoemaker D, Yang C-CS. Viral infections in fire ants lead to reduced foraging activity and dietary changes. Scientific Reports. 2018;8(1):13498.

35. Valles SM. Positive-strand RNA viruses infecting the red imported fire ant, Solenopsis invicta. Psyche. 2012;2012:821591.

36. Allen ML, Rhoades JH, Sparks ME, Grodowitz MJ. Differential gene expression in red imported fire ant (Solenopsis invicta) (Hymenoptera: Formicidae) larval and pupal stages. Insects. 2018;9(4).

37. Morandin C, Tin MMY, Abril S, Gómez C, Pontieri L, Schiøtt M, et al. Comparative transcriptomics reveals the conserved building blocks involved in parallel evolution of diverse phenotypic traits in ants. Genome Biology. 2016;17(1):43.

38. Fontana S, Chang N-C, Chang T, Lee C-C, Dang V-D, Wang J. The fire ant social supergene is characterized by extensive gene and transposable element copy number variation. Molecular Ecology. 2020;29(1):105-20.

39. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular Biology and Evolution. 2016;33(7):1870-4.

40. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology. 2012;61(3):539-42. Epub 2012/02/22.

41. Bewick V, Cheek L, Ball J. Statistics review 10: further nonparametric methods. Critical Care (London, England). 2004;8(3):196-9. Epub 2004/04/16.

42. De Mendiburu F, Yaseen M. agricolae: statistical procedures for agricultural research. R package version 1.4.0. https://myaseen208.github.io/agricolae/. 2020.

43. Di Giallonardo F, Schlub TE, Shi M, Holmes EC. Dinucleotide composition in animal RNA viruses is shaped more by virus family than by host species. Journal of Virology. 2017;91(8):e02381-16.

44. Kapoor A, Simmonds P, Lipkin WI, Zaidi S, Delwart E. Use of nucleotide composition analysis to infer hosts for three novel picorna-like viruses. Journal of Virology. 2010;84(19):10322-8.

45. Elek A, Kuzman M, Vlahovicek K. coRdon: codon usage analysis and prediction of gene expressivity. R package version 1.7.0. https://github.com/BioinfoHR/coRdon. 2020.

46. Charif D, Lobry JR. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural Approaches to Sequence Evolution: Molecules, Networks, Populations. Berlin, Heidelberg: Springer Berlin Heidelberg; 2007. p. 207-32.

47. Kassambara A, Mundt F. factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.7. http://www.sthda.com/english/rpkgs/factoextra. 2020.

48. Vasilakis N, Forrester NL, Palacios G, Nasar F, Savji N, Rossi SL, et al. Negevirus: a proposed new taxon of insect-specific viruses with wide geographic distribution. Journal of Virology. 2013;87(5):2475-88. Epub 2012/12/19.

49. Williams SH, Levy A, Yates RA, Somaweera N, Neville PJ, Nicholson J, et al. The diversity and distribution of viruses associated with Culex annulirostris mosquitoes from the Kimberley region of western Australia. Viruses. 2020;12(7):717.

50. Sõmera M, Kvarnheden A, Desbiez C, Blystad DR, Sooväli P, Kundu JK, et al. Sixty years after the first description: genome sequence and biological characterization of European wheat striate mosaic virus infecting cereal crops. Phytopathology. 2020;110(1):68-79.

51. Bodewes R, Ruiz-Gonzalez A, Schapendonk CME, van den Brand JMA, Osterhaus ADME, Smits SL. Viral metagenomic analysis of feces of wild small carnivores. Virology Journal. 2014;11:89.

52. Toriyama S, Kimishima T, Takahashi M, Shimizu T, Minaka N, Akutsu K. The complete nucleotide sequence of the genome and genomic comparisons with viruses of the genus Tenuivirus. Journal of General Virology. 1998;79(8):2051-8.

53. Dou X. Transcriptome analysis and identification of genes involved in moth sex pheromone biosynthetic pathways. Graduate Theses and Dissertations. 2019.

54. Valles SM, Oi DH, Becnel JJ, Wetterer JK, LaPolla JS, Firth AE. Isolation and characterization of Nylanderia fulva virus 1, a positive-sense, single-stranded RNA virus infecting the tawny crazy ant, Nylanderia fulva. Virology. 2016;496:244-54. Epub 2016/06/30.

55. Invasive.org: Center for Invasive Species and Ecosystem Health. https://www.invasive.org/species/diseases.cfm. Accessed 4 August 2020.

56. Wolf YI, Kazlauskas D, Iranzo J, Lucía-Sanz A, Kuhn JH, Krupovic M, et al. Origins and evolution of the global RNA virome. mBio. 2018;9(6):e02329-18.

57. Falk BW, Tsai JH. Biology and molecular biology of viruses in the genus Tenuivirus. Annual Review of Phytopathology. 1998;36:139-63.

58. Liu W, Hajano JU, Wang X. New insights on the transmission mechanism of tenuiviruses by their vector insects. Curr Opin Virol. 2018;33:13-7.

59. Valles SM, Strong CA, Hunter WB, Dang PM, Pereira RM, Oi DH, et al. Expressed sequence tags from the red imported fire ant, Solenopsis invicta: annotation and utilization for discovery of viruses. J Invertebr Pathol. 2008;99(1):74-81.

60. Nault LR, Gordon DT. Multiplication of Maize Stripe Virus in Peregrinus maidis. Phytopathology. 1988;78(7):991-5.

61. Sicard A, Yvon M, Timchenko T, Gronenborn B, Michalakis Y, Gutierrez S, et al. Gene copy number is differentially regulated in a multipartite virus. Nat Commun. 2013;4:2248.

62. Wu B, Zwart MP, Sánchez-Navarro JA, Elena SF. Within-host evolution of segments ratio for the tripartite genome of Alfalfa mosaic virus. Scientific Reports. 2017;7(1):5004.

63. Zwart MP, Elena SF. Modeling multipartite virus evolution: the genome formula facilitates rapid adaptation to heterogeneous environments. Virus Evol. 2020;6(1).

64. Schoebel CN, Botella L, Lygis V, Rigling D. Population genetic analysis of a parasitic mycovirus to infer the invasion history of its fungal host. Molecular Ecology. 2017;26(9):2482-97.

65. Carver S, Lunn T. When are pathogen dynamics likely to reflect host population genetic structure? Molecular Ecology. 2020;29(5):859-61.

66. Zhang C, Pei X, Wang Z, Jia S, Guo S, Zhang Y, et al. The Rice stripe virus pc4 functions in movement and foliar necrosis expression in Nicotiana benthamiana. Virology. 2012;425(2):113-21.

Table 1. Information summary of S. invicta transcriptome dataset analyzed in this study.

Sample code  SRA access #  Stage   Part of insect body  Origin            Date of collection  Other information                        Reference
A            SRX3035962    Larvae  Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
B            SRX3035961    Larvae  Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
C            SRX3035960    Larvae  Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
X            SRX3035959    Pupa    Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
Y            SRX3035964    Pupa    Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
Z            SRX3035963    Pupa    Whole body           Mississippi, USA  March 2016          Polygynous                               Allen et al. (2018)
Q2           DRX037806     Adults  Whole body           Texas, USA        September 2012      Polygynous, queen                        Morandin et al. (2016)
W2           DRX037809     Adults  Whole body           Texas, USA        September 2012      Polygynous, worker                       Morandin et al. (2016)
W3           DRX037810     Adults  Whole body           Texas, USA        September 2012      Polygynous, worker                       Morandin et al. (2016)
Y05          SRX5464977    Adults  Brain                Florida, USA      May 2014            Queen                                    -*
K05          SRX5464984    Adults  Brain                Florida, USA      May 2014            Queen                                    -
CA01         SRX5464990    Adults  Brain                Florida, USA      May 2014            Queen                                    -
2-small      SRX5822389    Adults  Whole body           Taiwan            May 2014            A pool of two virgin queens, lab colony  Fontana et al. (2020)

* Reference is not available yet.

Table 2. Abundance of viruses and housekeeping genes, cytochrome c oxidase subunit I (cox1), ribosomal protein L18 (rpl18) and translation elongation factor 1 (eif1-beta), expressed as a percentage of viral reads per total number of reads in each library.

                       Virus abundance (%)*                                                    Housekeeping genes (%)
                       SINV-14
Sample   SRA access #  RNA1   RNA2   RNA3   RNA4   Total&  SINV-15  SINV-16  SINV-17  SINaDNV  cox1    rpl18   eif1-beta
A        SRX3035962    -      -      -      -      -       0.008    0.516    0.128    0.002    0.3910  0.0238  0.0163
B        SRX3035961    0.047  0.019  0.042  0.072  0.180   0.013    0.327    0.235    -        0.7102  0.0263  0.0267
C        SRX3035960    0.046  0.005  0.051  0.048  0.150   0.019    0.120    0.281    0.002    0.4324  0.0246  0.0305
X        SRX3035959    -      -      -      -      -       0.009    -        -        -        0.2860  0.0161  0.0080
Y        SRX3035964    -      -      -      -      -       0.023    -        -        -        0.3236  0.0153  0.0089
Z        SRX3035963    0.374  0.032  0.131  0.089  0.626   0.039    -        -        -        0.2687  0.0135  0.0090
Q2       DRX037806     1.469  0.055  0.240  0.944  2.708   0.211    -        -        -        1.0036  0.0530  0.0469
W2       DRX037809     1.679  0.006  0.088  0.601  2.374   0.020    -        -        -        2.1720  0.0342  0.0113
W3       DRX037810     1.658  0.006  0.089  0.619  2.372   0.020    -        -        -        1.6576  0.0351  0.0117
Y05      SRX5464977    6.803  0.941  0.650  0.001  8.394   -        -        -        -        0.4903  0.0119  0.0048
K05      SRX5464984    -      -      -      -      -       -        -        -        -        0.4957  0.0141  0.0046
CA01     SRX5464990    -      -      -      -      -       -        -        -        -        0.4639  0.0158  0.0064
2-small  SRX5822389    0.084  0.009  0.017  0.019  0.130   -        -        -        -        0.9619  0.0712  0.0340

* Viruses with abundance equal to or higher than that of at least one of the housekeeping genes are highlighted in bold, and those with a full-length genome are underlined.

& Total: all segments together.
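The criterion in the first footnote of Table 2 can be illustrated with a short check. The Python sketch below is not the authors' code; it assumes that "equal or higher than at least one housekeeping gene" means at or above the least abundant of cox1, rpl18 and eif1-beta in the same library. The values are transcribed from three rows of Table 2, and the names HOUSEKEEPING, VIRUSES and at_or_above_housekeeping are hypothetical.

```python
# Percent of reads per library, transcribed from Table 2 (libraries B, Y05, 2-small).
# Viruses not detected in a library ("-" in the table) are simply omitted.
HOUSEKEEPING = {  # library -> (cox1, rpl18, eif1-beta)
    "B": (0.7102, 0.0263, 0.0267),
    "Y05": (0.4903, 0.0119, 0.0048),
    "2-small": (0.9619, 0.0712, 0.0340),
}
VIRUSES = {  # library -> {virus: abundance}; SINV-14 uses the "Total" column
    "B": {"SINV-14": 0.180, "SINV-15": 0.013, "SINV-16": 0.327, "SINV-17": 0.235},
    "Y05": {"SINV-14": 8.394},
    "2-small": {"SINV-14": 0.130},
}


def at_or_above_housekeeping(library):
    """Return viruses at least as abundant as the least abundant housekeeping gene.

    Assumption: being >= any one housekeeping gene is equivalent to being
    >= the minimum of the three housekeeping abundances in that library.
    """
    threshold = min(HOUSEKEEPING[library])
    return sorted(
        virus for virus, pct in VIRUSES[library].items() if pct >= threshold
    )


flagged_in_b = at_or_above_housekeeping("B")
```

Under this reading, SINV-15 in library B (0.013%) falls below the least abundant housekeeping gene (rpl18, 0.0263%) and is not flagged, whereas the other three viruses in that library are.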

Figure legends

Figure 1. Genome organization of five putative new viruses associated with S. invicta transcriptomes. Predicted open reading frames (ORFs) encoding proteins homologous to known viral proteins, or the conserved domains that they encode, are represented and colored according to their orientation in the genome (dark blue: encoded on the positive strand; light blue: encoded on the negative strand). Graphic representation of (A) the segmented negative-sense single-strand RNA genome of S. invicta virus 14 (SINV-14) and (B) the conserved consensus sequences located at the ends of tenuivirus genome segments (5’-ACACAAAGU and ACUUUGAGU-3’) detected in the SINV-14 genome. Dashed lines indicate incomplete sequence. Nucleotides highlighted in red differ from the consensus sequence. (C) The negative-sense single-strand RNA genome of S. invicta virus 15 (SINV-15) and the positive-sense single-strand RNA genomes of (D) S. invicta virus 16 (SINV-16) and (E) S. invicta virus 17 (SINV-17) are shown. (F) The partial single-strand DNA genome of S. invicta-associated densovirus (SINaDNV) is represented. RdRp and L: RNA-dependent RNA polymerase; G: glycoprotein; N: nucleocapsid protein; p3: unknown; NS4: non-structural protein 4; NCP: non-nucleocapsid protein; P: phosphoprotein; M: matrix protein; CP: capsid protein; H: helicase; V: viral methyltransferase; F: FtsJ-like methyltransferase; NS1 and NS2: non-structural proteins 1 and 2, respectively.

Figure 2. Evolutionary history inferred for different genomic segments of S. invicta virus 14 (SINV-14). Midpoint-rooted Bayesian phylogenetic trees based on (A) full-length amino-acid sequences of RNA-dependent RNA polymerase (RdRp), (B) amino-acid sequence of the conserved glycoprotein domain, (C) full-length amino-acid sequence of the movement protein and (D) full-length amino-acid sequence of the nucleocapsid protein. Bayesian posterior probabilities are shown at the nodes. Nodes with posterior probability values >0.50 and ≤0.85 are indicated by empty circles, those with values >0.85 and <0.95 by half-filled circles and those with values ≥0.95 by filled circles. The scale bar at the bottom indicates the number of amino acid substitutions per site. Branches highlighted in red indicate the virus characterized in this study. Genera are indicated on the right side. Proposed species not yet accepted by the International Committee on Taxonomy of Viruses (ICTV) are followed by an asterisk. Protein sequences encoded by cognate components of a given virus that lack a conserved domain according to the Conserved Domain Database (CDD) were not included.

Figure 3. Evolutionary history of new viruses associated with S. invicta transcriptomes based on Bayesian inference. Midpoint-rooted phylogenetic trees constructed based on (A) the conserved domain (pfam00946) of RNA-dependent RNA polymerase (RdRp) of S. invicta virus 15 (SINV-15) and representative species of the order Mononegavirales, (B) the conserved domain (pfam00680) of RdRp of S. invicta virus 16 (SINV-16) and representative species of the families Iflaviridae and Dicistroviridae and (C) full-length amino-acid sequences of RdRp of S. invicta virus 17 (SINV-17) and representative species of the proposed new genus Negevirus (unassigned to any family) and families and Kitaviridae are shown. (D) Bayesian phylogenetic tree based on full-length amino-acid sequences of non-structural gene 1 (NS1) of S. invicta-associated densovirus (SINaDNV) and representative species of the family Parvoviridae. Bayesian posterior probabilities are shown at the nodes. Nodes with posterior probability values >0.65 and ≤0.85 are indicated by empty circles, those with values >0.85 and <0.95 by half-filled circles and those with values ≥0.95 by filled circles. The scale bar at the bottom indicates the number of amino acid substitutions per site. Branches highlighted in red indicate viruses characterized in this study. Genera/Families/Subfamilies are shown on the right side.

Figure 4. Boxplot of housekeeping gene and virus abundance. (A) Abundance was compared among housekeeping genes from all libraries. Abundance was calculated as the percentage of housekeeping gene or virus reads per total number of reads in each library. (B) Abundance was compared between viruses and housekeeping genes. The statistical test was performed on virus and housekeeping gene abundances from the same libraries. (C) Normalized abundance differs significantly among S. invicta virus 14 (SINV-14) segments. Normalized abundance was calculated by dividing the abundance of each segment by the total abundance of the entire genome. Box plots with different letters indicate significant differences between groups according to the non-parametric Kruskal-Wallis test followed by a post hoc multiple comparison test using Fisher’s least significant difference (p<0.05). Box plots show the first and third quartiles as a box, the horizontal line corresponds to the median, and whiskers extend up to 1.5 times the interquartile range (IQR, the difference between the first and third quartiles) from the box.
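The normalization and the Kruskal-Wallis comparison described for panel C can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' analysis, which cites the R package agricolae: the segment abundances are transcribed from Table 2, the function names normalize and kruskal_wallis_h are hypothetical, the H statistic is computed with the standard tie correction, and the Fisher's least significant difference post hoc step is not reproduced.

```python
# SINV-14 segment abundances (% of reads; RNA1..RNA4), transcribed from Table 2
# for the eight libraries in which the virus was detected.
SEGMENTS = {
    "B": (0.047, 0.019, 0.042, 0.072),
    "C": (0.046, 0.005, 0.051, 0.048),
    "Z": (0.374, 0.032, 0.131, 0.089),
    "Q2": (1.469, 0.055, 0.240, 0.944),
    "W2": (1.679, 0.006, 0.088, 0.601),
    "W3": (1.658, 0.006, 0.089, 0.619),
    "Y05": (6.803, 0.941, 0.650, 0.001),
    "2-small": (0.084, 0.009, 0.017, 0.019),
}


def normalize(segments):
    """Divide each segment abundance by the total abundance of the genome."""
    total = sum(segments)
    return [s / total for s in segments]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of samples."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


# One group per segment (RNA1..RNA4), each holding one value per library.
groups = list(zip(*(normalize(s) for s in SEGMENTS.values())))
h_statistic = kruskal_wallis_h(groups)
# 7.815 is the chi-square critical value for 3 degrees of freedom at alpha = 0.05.
segments_differ = h_statistic > 7.815
```

Comparing H against the chi-square critical value is the usual large-sample approximation; a library routine that returns an exact p-value would be preferable in practice.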

Figure 5. Compositional biases of Solenopsis invicta virus 14 (SINV-14) differ from those of plant tenuiviruses and tenui-like viruses associated with flies. Principal component analysis (PCA) of synonymous codon usage (A-D) and dinucleotide bias (E-H) of SINV-14 and other viruses that infect plants and animals (vertebrate and invertebrate) is shown. Scatterplots show the first two principal components, which account for the largest variance in the data set (shown in parentheses), with ellipses representing each group and dots representing individual viruses. Viruses were divided into two main groups: plant viruses, which replicate in plants and insect vectors and belong to the genera Tenuivirus and Orthotospovirus, and non-plant viruses, which do not replicate in plants and belong to the genus Orthobunyavirus or are representatives of the family Phenuiviridae. For detailed information on the data set, see Supplementary Table S4. For the NS4 protein, comparison was performed only for SINV-14, Fitzroy Crossing tenui-like virus 1 (FCTenV1) and tenuiviruses, as this gene is not shared by other groups. WhHV, Wuhan horsefly virus.
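As a rough illustration of the kind of input such a PCA takes, dinucleotide bias is commonly summarized as the ratio of the observed dinucleotide frequency to the frequency expected from the component bases. The Python sketch below assumes that definition, which may differ in detail from the measure the authors computed with their cited R packages; the function name dinucleotide_odds_ratios is hypothetical, and the PCA step itself is not shown.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios for the 16 RNA dinucleotides.

    Assumed definition: rho_xy = f_xy / (f_x * f_y), where f_xy is the
    frequency of dinucleotide xy and f_x, f_y are base frequencies.
    Ratios involving a base absent from the sequence are reported as NaN.
    The sequence must contain at least two bases.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    n = len(seq)
    base_counts = Counter(seq)
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x, y in product("ACGU", repeat=2):
        f_x = base_counts[x] / n
        f_y = base_counts[y] / n
        f_xy = pair_counts[x + y] / (n - 1)
        ratios[x + y] = f_xy / (f_x * f_y) if f_x and f_y else float("nan")
    return ratios


# Toy input only: the 5' terminal consensus quoted in the Figure 1 legend.
example = dinucleotide_odds_ratios("ACACAAAGU")
```

Each genome or gene would yield one such 16-value vector, and the set of vectors across viruses would then form the matrix passed to the PCA.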

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5