bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 Ever-increasing viral diversity associated with the red imported fire ant Solenopsis invicta 2 (Formicidae: Hymenoptera)
3
4 César A.D. Xavier1, Margaret L. Allen2*, and Anna E. Whitfield1*
5
6 1Department of Entomology and Plant Pathology, North Carolina State University, 840 Main Campus
7 Drive, Raleigh, NC 27606, USA
8 2U. S. Department of Agriculture, Agricultural Research Service, Biological Control of Pests Research
9 Unit, 59 Lee Road, Stoneville, MS 38776, USA
10 *Corresponding author: Margaret L. Allen
11 Phone: 1(662) 686-3647
12 E-mail: [email protected]
13 *Corresponding author: Anna E. Whitfield
14 Phone: 1(919) 515-5861
15 E-mail: [email protected]
1
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
16 Abstract
17 Background: Advances in sequencing and analysis tools have facilitated discovery of many new
18 viruses from invertebrates, including ants. Solenopsis invicta is an invasive ant that has quickly
19 spread around world causing significant ecological and economic impacts. Its virome has begun
20 to be characterized pertaining to potential use of viruses as natural enemies. Although the S. invicta
21 virome is best characterized among ants, most studies have been performed in its native range,
22 with little information from invaded areas.
23 Methods: Using a metatranscriptome approach, we further characterized viruses associated with
24 S. invicta, in two introduced areas, U.S and Taiwan. The data set used here was obtained from
25 different stages (larvae, pupa, and adults) of S. invicta life cycle. Publicly available RNA sequences
26 from GenBank’s Sequence Read Archive were downloaded and de novo assembled using CLC
27 Genomics Workbench 20.0.1. Contigs were compared against the non-redundant protein
28 sequences and those showing similarity to viral sequences were further analyzed.
29 Results: We characterized five putative new viruses associated with S. invicta transcriptomes.
30 Sequence comparisons revealed extensive divergence across ORFs and genomic regions with most
31 of them sharing less than 40% amino acid identity with those closest homologous sequences
32 previously characterized. The first negative-sense single-stranded RNA viruses included in the
33 orders Bunyavirales and Mononegavirales are reported. In addition, two positive single-strand
34 viruses and one single strand DNA virus were also characterized. While the presence of a putative
35 tenuivirus associated with S. invicta was previously suggested to be a contamination, here we
36 characterized and present strong evidence that Solenopsis invicta virus 14 (SINV-14) is a tenui-
37 like virus that has a long-term association with the ant. Furthermore, based on virus abundance
38 compared to housekeeping genes, phylogenetic relationships, and completeness of viral coding
2
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
39 sequences, our results suggest that four of five viruses reported, those being SINV-14, SINV-15,
40 SINV-16 and SINV-17, replicate in the ant S. invicta.
41 Conclusions: The present study expands our knowledge about viral diversity associated with S.
42 invicta in introduced areas with potential to be used as biological control agents, which will require
43 further biological characterization.
44 Keywords: Red imported fire ant, S. invicta, Virus diversity, Virome, Metatranscriptome,
45 Tenuivirus, Mononegavirales, Bunyavirales.
3
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
46 Background
47 Insects are the most abundant and diverse group of animals on earth (1). High throughput
48 sequencing has led to huge advances in revealing previously unknown diversity of insect viruses,
49 significantly contributing to filling deep phylogenetic gaps along evolutionary history within the
50 most diverse viral lineages (2-4). Nonetheless, like insect diversity, the diversity of viruses
51 associated with insects is far from clear (1, 3). While many studies have focused on viromes of
52 arbovirus-transmitting insects, especially those involved in transmission of medically important
53 viruses, such as mosquitoes (5-8), other groups that are important pests either impacting
54 agricultural or natural ecosystems, such as invasive ants, have been barely studied (9-11). In
55 addition to contributing to better understanding of basic aspects of virus ecology and evolution,
56 these studies may contribute to new opportunities to use viruses as tools to develop more
57 sustainable insect control methods (12-14).
58 The red imported fire ant, Solenopsis invicta, is an invasive pest causing significant
59 ecological impact and economic loss in invaded areas (15, 16). Originating from South America,
60 S. invicta was accidentally introduced into the southern region of the United States (U.S.) almost
61 a century ago, becoming a serious problem (17). Since then, it has spread throughout the
62 southeastern U.S. and more recently into Oklahoma, New Mexico, Arizona, and California (18).
63 Limited introduction events, likely associated with small founder populations, led to a significant
64 reduction in natural enemies and enemy diversity associated with S. invicta in introduced areas
65 (19-21). Therefore, S. invicta populations may reach sizes even greater than those observed in its
66 native range, making control difficult and even more costly (22). High densities observed in S.
67 invicta populations in the U.S. have facilitated its dispersal across the world, contributing to
68 repeated introduction events in several countries, such as China, Taiwan, and Australia (18).
4
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
69 Morrison, Porter (23) demonstrated, based on predictive models, that most tropical and subtropical
70 regions worldwide are potentially appropriate for S. invicta infestation. Highly competitive ability,
71 generalist feeding habits and high populations densities make S. invicta a successful invasive
72 species causing huge disturbance in biodiversity by displacing native ants and other arthropods in
73 introduced regions (15). Currently, chemical insecticides are the most common control strategy
74 used against S. invicta (24). Low efficacy due to temporary effects of chemical applications, high
75 cost in extensive areas, and off-target effects harmful to beneficial and other native species are
76 substantial impairments to addressing invasive ant damage and expansion. Therefore,
77 establishment of management strategies that are both environmentally friendly and self-sustainable
78 are necessary.
79 Classical biological control has been one strategy used in an attempt to control this pest in
80 the U.S., with viruses considered a promising resource to be used as biopesticides (24-26). Over
81 the last decade, a great effort has been made in characterizing the S. invicta virome pertaining to
82 potential use in biological control (11, 27-30). To date, the S. invicta virome is composed of mainly
83 positive-sense single-strand RNA (+ssRNA) viruses in the order Picornavirales. These include
84 eleven viruses from families Dicistroviridae, Polycipiviridae, Iflaviridae, Soliniviridae and two
85 unclassified viruses (11, 31). Additionally, one double-strand RNA (dsRNA) and one single-strand
86 DNA (ssDNA) of the families Totiviridae and Parvoviridae, respectively, have been characterized
87 (11, 32). While most of these viruses have been reported associated with S. invicta in its native
88 range, only the species Solenopsis invicta virus 1 (SINV-1), SINV-2, SINV-3, SINV-6 and
89 Nylanderia fulva virus 1 (NfV-1) have been reported in United States (11, 31). Although great
90 progress has been reached in identification and molecular characterization of S. invicta virome, the
91 effect of these viruses and their potential use in biological control still need to be elucidated.
5
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
92 Interactions among S. invicta and SINV-1, SINV-2 and SINV-3, have been the only ones
93 previously characterized (25, 33, 34). SINV-1 was shown to affect claustral queen weight making
94 them lighter than uninfected ones, whereas SINV-2 directly affected fitness of queens by reducing
95 their reproductive output (33). SINV-3 was shown to be the most aggressive virus, causing
96 significant mortality in S. invicta colonies, with greatest potential as a biological control agent (25,
97 35).
98 Here using a metatranscriptome approach, we further characterized the viruses associated
99 with S. invicta in two introduced areas, U.S and Taiwan. These investigations utilized existing
100 publicly available RNA sequences deposited in NCBI GenBank as Sequence Read Archive (SRA)
101 data files (Table 1). Five new viruses were found associated with S. invicta, with the first negative-
102 sense single-stranded RNA (-ssRNA) viruses included in the orders Bunyavirales and
103 Mononegavirales reported. In addition, two +ssRNA viruses, included in the family Iflaviridae
104 and an unassigned species, and a partial ssDNA virus, were also characterized.
105 Methods
106 Data set selection and sequencing
107 Contigs corresponding to a putative new virus genome were initially obtained from a
108 transcriptome project studying differential gene expression between larvae and pupae stages of S.
109 invicta collected in Mississippi, U.S. (36). To further investigate and characterize viral diversity
110 associated with S. invicta, six libraries (36) were downloaded from GenBank’s Sequence Read
111 Archive (SRA) and analyzed. In addition, to determine whether those viruses were present in other
112 locations in U.S. and abroad, several transcriptome libraries deposited at SRA were compared
113 against those previous putative viral contigs using BLASTn. Libraries that presented high
114 abundance of reads mapping to putative viral contigs with an E-value cutoff lower than 1e-2 were
6
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
115 downloaded and further analyzed. The final data set analyzed here consisted of 13 libraries, all
116 from S. invicta transcriptomes. Detailed information on the samples are presented in Table 1. All
117 information about sample collection, RNA extraction, library preparation and high throughput
118 sequencing has been previously described (36-38).
119 De novo assembly and virus genome characterizations
120 Sequencing reads were trimmed for quality (Suppl. Table S1) and De novo assembled using
121 CLC Genomics Workbench 20.0.1 with default settings. Reads associated to S. invicta were first
122 filtered by mapping them back to the S. invicta genome (GenBank accession: GCA_000188075.2)
123 and only unmapped reads were used for De novo assembling (Suppl. Table S1). Contigs were
124 compared against NCBI non-redundant protein sequence (nr) using BLASTx and those predicted
125 to contain near full-length and intact coding sequences homologous to viral sequences previously
126 described were confirmed by mapping reads back to obtain consensus sequences for each library.
127 Potential open reading frames (ORFs) of the putative new viruses were predicted using ORF finder
128 (NCBI) and by comparative analysis with those related viruses. For identification of conserved
129 functional domains, a domain-based BLAST search was performed against the Conserved Domain
130 Database (CDD). ORFs that did not present any conserved domain were predicted based on
131 comparative analysis with other known closely related viruses (5).
132 Multiple sequence alignment and phylogenetic analysis
133 For phylogenetic analyses, one representative sequence for each taxon most closely related
134 to viruses described here was retrieved from GenBank according to BLASTx analysis. ORF
135 integrity was checked and extracted using ORF finder and the presence of conserved domain
136 characteristics for each group was checked at CDD before proceeding to phylogenetic analysis.
137 For highly divergent data sets, regions comprising protein conserved domains were extracted using
7
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
138 a script written in R software according to coordinates obtained from CDD for each sequence, and
139 only the conserved domain was used (as indicated below in the figure legends). Multiple sequence
140 alignments of deduced amino acid sequences were prepared using the MUSCLE option in MEGA7
141 (39). Alignments were manually checked and adjusted when necessary. Phylogenetic trees were
142 constructed using Bayesian inference performed with MrBayes 3.2.6 (40). Estimates of the amino
143 acid model were automatically conducted by setting the prior for the amino acid model to mixed,
144 as implemented in MrBayes. The analyses were carried out running two independent runs of
145 20,000,000 generations with sampling at every 1000 generations and a burn-in of 25%.
146 Convergence between runs were accepted when average standard deviations of split frequencies
147 was lower than 0.01. Trees were visualized and edited using FigTree and Corel Draw, respectively.
148 Inferring viruses-S. invicta association
149 Most samples used here were prepared from whole body ants, and to differentiate truly
150 replicating viruses from those that might be from food contamination or commensal organisms,
151 viral abundance was compared with abundance of three housekeeping genes: cytochrome c
152 oxidase subunit I (cox1), that presented high expression levels, ribosomal protein L18 (rpl18) and
153 translation elongation factor 1 (eif1-beta), both with lower expression levels. Viral and internal
154 genes abundance were calculated as percentage of viral or housekeeping reads per total number of
155 reads in each library (Table 2). Differences among groups was assessed using non-parametric
156 Kruskal-Wallis test followed by post hoc multiple comparison test using Fisher’s least significant
157 difference (41) using Agricolae package (42) in R software. The data were log-transformed before
158 statistical testing. Thus, a virus was suggested to be hosted by S. invicta if they qualified in at least
159 two of the criteria as previously suggested by (8) with slight modifications: (i) virus abundance
160 was within the range or higher abundance than housekeeping genes; (ii) viral reads per total
8
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
161 number of reads in a library was higher than 0.01%; (iii) they were phylogenetically close to
162 another insect virus; and (iv) complete viral coding sequence regions were recovered.
163 Principal component analysis (PCA) of compositional bias
164 Synonymous codon usage (SCU) and dinucleotide composition bias has been used to infer
165 putative host-virus association and taxonomic placement for a given virus (43, 44). To gain further
166 insight about the host of the tenui-like sequences reported here, we used principal component
167 analysis (PCA) to compare the compositional bias between plant tenuivirus and other viruses
168 infecting plant and animals (vertebrates and invertebrates). SCU counts were determined for the
169 full-length nucleotide sequence of RNA-dependent RNA polymerase (RdRp), glycoprotein,
170 nucleocapsid and non-structural protein 4 (NS4) protein, using the coRdon package (45) in R
171 software. Dinucleotide bias was calculated as the frequency of dinucleotide XY divided by the
172 product of frequencies of nucleotide X and nucleotide Y using the SeqinR package (46) in R
173 software. Sequences highly divergent with no conserved domain according to CDD analysis were
174 not included. Principal components analysis was performed on codon usage counts and
175 dinucleotide bias using the function prcomp and plotted using the factoextra package (47), both
176 implemented in R software.
177 Results
178 New viruses found associated with S. invicta transcriptome
179 Virus sequences were found associated with S. invicta transcriptomes collected from three
180 geographic locations in the U.S. and one location from Taiwan, representing different stages of the
181 S. invicta life cycle (Table 1). De novo assembling of non-host reads (Suppl. Table S1) from 13
182 libraries followed by BLAST analyses revealed the presence of complete or near-complete
183 genomes of 4 putative new single-strand RNA (ssRNA) viruses, tentatively named S. invicta virus
9
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
184 14 (SINV-14), SINV-15, SINV-16 and SINV-17, and one partial genome encompassing the almost
185 full-length coding sequence of a ssDNA virus, named S. invicta-associated densovirus (SINaDNV;
186 Figure 1). Sequence comparisons revealed extensive divergence across ORFs and genomic regions
187 with most of them sharing less than 40% amino acid identity compared to those closest
188 homologous sequences previously characterized (Suppl. Table S2). In addition, typical domains
189 associated with RdRp and other proteins involved in virus replication, encapsidation, and entry
190 into the cell were found and characteristic of known virus groups, again confirming the viral
191 identity of those sequences (Suppl. Table S3). Further analyses demonstrated that SINV-14 and
192 SINV-15 fit within the order Bunyavirales and Mononegavirales, respectively, with SINV-14
193 having a multisegmented genome (Figure 1). The two other viruses with non-segmented +ssRNA
194 genomes, SINV-16 and SINV-17, were included within the family Iflaviridae and into an
195 unclassified group closely related to a proposed new genus of insect-infecting viruses “Negevirus”
196 (48), respectively. Finally, the partial ssDNA virus SINaDNV was included into family
197 Parvoviridae subfamily Densovirinae.
198 Molecular and phylogenetic characterization of new viruses
199 Negative-sense RNA viruses
200 The SINV-14 genome consists of a linear, negative-sense single-strand RNA composed of
201 four segments, RNA1, RNA2, RNA3 and RNA4 with approximately 8900, 2862, 2086 and 2501
202 nucleotides (nt), respectively (Figure 1A). Identity analysis of deduced amino acid sequences of
203 coding regions across all segments demonstrated an extensive divergence with those most closely
204 related viruses (Suppl. Table S2). The segments RNA1 and RNA2 encode only one ORF, located
205 on the negative strand (Figure 1A). The deduced amino acid sequence of RNA1 ORF was the most
206 conserved, showing highest identity with the RdRp of Fitzroy Crossing tenui-like virus 1
10
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
207 (FCTenV1), a new tenui-like virus described associated with Culex annulirostris from western
208 Australia (Suppl. Table S2) (49), followed by European wheat striate mosaic virus (EWSMV), a
209 typical plant tenuivirus (50). Likewise, even though the RNA2 ORF was the most divergent, it
210 showed highest identity with the glycoprotein of FCTenV1 (Suppl. Table S2). In contrast with
211 RNA1 and RNA2, the segments RNA3 and RNA4 encode two ORFs in ambisense orientation
212 (Figure 1A). Deduced amino acid sequence of RNA3 ORF, encoded on the negative strand,
213 showed highest identity with nucleocapsid protein of otter fecal bunyavirus, an unclassified
214 Phlebovirus (51), while no significant similarity was found to a small putative ORF located on the
215 positive strand (Suppl Table S2). The BLASTx analyses indicated that ORFs encoded by RNA4
216 in the positive and negative strands were related to those of non-nucleocapsid proteins from typical
217 plant tenuiviruses, and nonstructural protein 4 (NS4) of FCTenV1 followed by other plant
218 tenuiviruses, respectively (Suppl. Table S2). Interestingly, the ORF encoded on the negative strand
219 contained the specific domain of movement proteins (pfam03300) of plant tenuiviruses (Suppl.
220 Table S3). A domain-based BLAST search was performed against the CDD, confirming the
221 presence of conserved domains from typical members of family Phenuiviridae for RdRp,
222 glycoprotein, putative NS4 and non-nucleocapsid protein (Suppl. Table S3). Also interestingly,
223 further sequence analysis revealed the presence of conserved sequences at the ends of all four
224 segments identical to those found at tenuivirus genomes (3’-ACUUUGUGU and ACACAAAGU-
225 5’; Figure 1B) (52), indicating that full length genome segments were likely assembled from the
226 RNA-Seq datasets.
227 To further determine the taxonomic placement of SINV-14 within the order Bunyavirales,
228 Bayesian phylogenetic trees were inferred based on deduced amino acid sequence of RdRp,
229 glycoprotein, nucleocapsid and NS4, representative of all four segments found (Figure 2). In
11
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
230 accordance with identity analysis, SINV-14 RdRp was closest related to that of FCTenV1 (Figure
231 2A). These two viruses clustered closely to Horsefly horwuvirus (virus Wuhan horsefly virus,
232 WhHV), a tenui-like virus described from a poll of horseflies (family Tabanidae) (2), the only
233 species within the genus Horwuvirus (Figure 2A). Moreover, while the WhHV putative
234 glycoprotein and NS4 were very divergent and no conserved domains were detected based on CDD
235 analyses (thus they were not included into phylogenetic analyses) SINV-14 glycoprotein and NS4
236 phylogenetic trees were congruent with RdRp, clustering in a well-supported clade closest to
237 FCTenV1 and composing a sister clade with typical plant tenuiviruses (Figure 2B and C).
238 Interestingly, the phylogenetic tree of the nucleocapsid protein was incongruent compared to those
239 of other viral proteins (Figure 2D). It was more closely related to that of otter fecal bunyavirus and
240 other representative viruses of different genus within family Phenuiviridae rather than with plant
241 tenuiviruses (Figure 2D). The identity analysis supports the phylogenetic placement in which
242 nucleocapsid protein was significantly closer related to otter fecal bunyavirus than tenui-like and
243 plant tenuiviruses (Suppl. Table S2).
244 The SINV-15 genome comprises a linear negative-sense, single strand RNA varying from
245 10113 to 10135 nt (Figure 1C). The genome organization is typical of viruses within order
246 Mononegavirales, which contain the conserved core predicted to encode five ORFs: nucleocapsid
247 (N), phosphoprotein (P), matrix protein (M), glycoprotein (G) and RdRp (L) on the negative strand
248 (Figure 1C). Domain-based BLAST search confirmed that ORFs 1 and 5 carry nucleocapsid and
249 RdRp conserved domains, respectively, whereas no conserved domains were identified for ORFs
250 2, 3 and 4 (Suppl. Table S3). Thus, these ORFs were inferred based on comparative analysis with
251 other known related viruses. Identity analysis indicated SINV-15 is most closely related to Formica
252 fusca virus 1 (FfusV-1; Suppl. Table S2), another virus within order Mononegavirales recently
12
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
253 described associated with the ant Formica fusca (9). Indeed, phylogenetic analysis based on
254 deduced amino acid sequence of RdRp clustered SINV-15 closest to FfusV-1 and Formica exsecta
255 virus 4 (FeV-4), both currently unaccepted species yet. As these three viruses composed a clade
256 closely related to Orinoco virus (ONCV), the only accepted species of genus Orinovirus, it
257 suggests they are putative new members of this genus within the family Nyamiviridae. This is the
258 first report of an orinovirus associated with S. invicta.
259 Positive-sense RNA viruses
260 In addition to SINV-14 and SINV-15, which have negative sense genomes, we also
261 characterized two positive sense RNA viruses (Figure 1D and E). SINV-16 has a positive sense
262 single-stranded RNA genome varying from 10,355 to 10,369 nt (Figure 1D). Genome organization
263 is typical of viruses included in the order Picornavirales. The single ORF predicted encodes a
264 large polyprotein (2910 amino acids) including RdRp, helicase, and the canonical picornavirus
265 capsid protein domain (Figure 1D and Suppl. Table S3). Sequence comparison based on amino
266 acid sequence of the polyprotein revealed significant identity with two unaccepted iflavirus-related
267 species, pink bollworm virus 4, isolated from the pink bollworm moth Pectinophora gossypiella
268 (53) and Hubei myriapoda virus 1, isolated from a pool of individuals of genus Scutigeridae (3).
269 Phylogenetic analysis of RdRp domain clearly clustered SINV-16 within genus Iflavirus (Figure
270 3B). In accordance with identity analysis, it was most closely related to Hubei myriapoda virus 1,
271 in a well-supported clade composed also by pink bollworm virus 4 and Dinocampus coccinellae
272 paralysis virus (DcPV), an accepted iflavirus species isolated from the parasitic wasp Dinocampus
273 coccinellae. These results indicate that SINV-16 is a new iflavirus reported from S. invicta.
274 The SINV-17 genome consists of a linear, positive-sense single-stranded RNA with
275 approximately 10,341 nt predicted to encode three putative ORFs (Figure 1E). ORF1 was predicted
13
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
276 to encode the longest protein containing the viral methyltransferase, helicase and RdRp domains
277 (Figure 1E and Suppl. Table S3). ORF2 and ORF3 were predicted to encode smaller structural
278 proteins, which contain domains associated with virion glycoproteins of insect viruses and virion
279 membrane proteins of plant and insect viruses, respectively (Suppl. Table S3). Interestingly, while
280 ORFs 1 and 2 were most similar to Hubei virga-like virus 8, an unclassified virus isolated from a
281 pool of individuals of the genus Scutigeridae (3), ORF 3 showed significant identity to Loreto
282 virus isolated from Anopheles albimanus, a member of the proposed genus Negevirus (48) (Suppl.
283 Table S2). To further determine the taxonomic placement of SINV-17, Bayesian phylogenetic
284 trees were inferred based on the deduced amino acid sequence of ORF1 RdRp domain. In
285 accordance with identity analysis, SINV-17 was most closely related to Hubei virga-like virus 8,
286 appearing to represent a sister clade to the proposed genus Negevirus, and possibly representing a
287 new genus (Figure 3C).
288 Other virus sequences
289 Two small contigs were found showing significant similarity with RNA virus sequences.
290 The first one has 430 nt in length and exhibited highest identity with a hypothetical protein encoded
291 by Beihai tombus-like virus 6 (41.13% sequence identity; 97% coverage; E-value = 4e-17;
292 GenBank access: APG76145.1), isolated from a pool of marine crustaceans within the genus
293 Penaeidae (3). The second contig has 330 nt and showed highest identity with NfV-1 polyprotein
294 (59.49% sequence identity; 93% coverage; E-value = 2e-24; GenBank access: AOC55078.1) from
295 tawny crazy ant Nylanderia fulva (54).
296 In addition to the RNA viruses, we also characterized a partial genome encompassing the
297 almost full-length coding sequence of a single-strand DNA virus. The SINaDNV partial genome
298 assembled from RNA-Seq reads was predicted to encode three ORFs on the positive strand,
14
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
299 showing a typical organization of viruses within the family Parvoviridae (Figure 1F). The ORFs
300 1 and 2 were overlapped and were most similar to NS1 and NS2 of mosquito-infecting
301 densoviruses, respectively (Suppl. Table 2). In addition, a third ORF was predicted to encode the
302 capsid protein showing significant identity to Aedes albopictus C6/36 cell densovirus (Suppl.
303 Table S2). Phylogenetic analysis based on deduced amino acid sequence of the NS1, clustered
304 SINaDNV in a well-supported branch most closely related to viruses within subfamily
305 Densovirinae and genus Brevidensovirus (Figure 3D). While these results strongly suggest
306 SINaDNV is a new densovirus species, further investigation will be needed to reveal its whole
307 genome organization.
308 Inferring host association
309 Although the transcriptomic approach has been shown to efficiently reveal viral diversity
310 associated with a diverse range of samples, inference about host-virus association it is not always
311 a trivial task. S. invicta is a generalist predator/scavenger feeding primarily on other insects and
312 small arthropods, both alive and dead. Thus, it is possible that viruses characterized here could be
313 associated with undigested food or even with S. invicta-associated microorganisms, since most
314 samples involved the whole ant body (Table 1). To infer host-virus association and differentiate
315 viruses truly replicating from those that may be a contamination, we calculated the percentage of
316 viral reads for each library and compared with three housekeeping genes, cox1, rpl18 and eif1-
317 beta (Figure 4 and Table 2). While the abundance of all three housekeeping genes were relatively
318 stable across different libraries, it differed among them, with cox1 most abundant, followed by
319 rpl18 and eif1-beta (Figure 4A). Likewise, virus abundance was relatively stable within species,
320 except for SINV-14, which showed a great variation between libraries, varying between 0.130 to
321 8.394% of viral reads compared to the total reads, considering the virus complete genome (Table
15
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
322 2). For SINV-14 and SINV-16, read abundance was significantly higher than eif1-beta and rpl18,
323 with no difference compared to cox1 (Figure 4B). Moreover, SINV-17 abundance was
324 significantly lower than cox1 and higher than eif1-beta and rpl18. In contrast, SINV-15 abundance
325 was the lowest one among them, being significantly lower than coxI, with no difference compared
326 to rpl18 and eif1-beta (Figure 4B). In addition, SINV-14, SINV-15, SINV-16 and SINV-17
327 abundances were higher than 0.01% of reads compared to entire library (Table 2). The only
328 exception was SINaDNV, that had a very low abundance compared to housekeeping genes and
329 other viruses (Table 2).
330 Interestingly, SINV-14 abundance was much higher in a sample prepared from a dissected
331 brain compared to those from the whole body (Table 1 and 2). Considering the abundance of all
332 segments, SINV-14 from the brain sample was 17-fold more abundant compared with cox1 gene,
333 whereas abundance of those samples prepared from whole body varied 0.135 to 2.69-fold higher
334 than cox1 (Table 2). Moreover, the abundance of segments was asymmetric (Figure 4C). The reads
335 mapping on RNA1 were significantly more abundant, followed by RNA3 and RNA4, with RNA2
336 the least abundant (Figure 4C). These results suggest that SINV-14 is unlikely to be a
337 contamination, and may exhibit a tissue tropism, as we observed evidence that it accumulates more
338 in the brain compared with the whole body.
339 To further verify the relationship between SINV-14 and S. invicta, we also analyzed the
340 codon usage and dinucleotide bias (Figures 5). Compositional biases in virus genomes are likely
341 to mimic those observed in their host species (43, 44), and therefore has been used to assign a
342 given virus to a host and assist in taxonomic placement. Similar data set used for phylogenetic
343 analyses was used here (Suppl. Table S4). Principal component analysis of codon usage based on
344 RdRp, glycoprotein, nucleocapsid and NS4 protein, clearly separate SINV-14 from plant viruses
16
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
345 (tenuivirus and orthotospovirus), and non-plant viruses (phenuivirus, orthobunyavirus and tenui-
346 like viruses; Figure 5A-D). Based on the RdRp, three very clear clusters were observed,
347 representing plant virus, non-plant virus and SINV-14 (Figure 5A). Interestingly, the two tenui-
348 like viruses (WhHV and FCTenv1) were closer to plant tenuiviruses than SINV-14, based on RdRp
349 (Figure 5A). Moreover, while SINV-14 clearly clustered separately from other groups, based on
350 glycoprotein, nucleocapsid and NS4, this separation was not very clear between plant and non-
351 plant viruses and tenui-like viruses, as observed for RdRp (Figure 5A-D). We also performed
352 principal component analysis on nucleotide bias obtaining similar results to the codon usage
353 analysis (Figure 5E-H). Whereas RdRp clusters was not very clear, based on plant and non-plant
354 viruses, for all other genes SINV-14 clearly separated from any other group (Figure 5E-H), and
355 again, tenui-like viruses (WhHV and FCTenv1) were closer to other groups than SINV-14.
356 Discussion
357 The advancement of sequencing tools, the collection of numerous genetic samples of high-
358 impact invasive ants, and the continuing discovery of novel insect-related (and other) viruses have
359 been combined to facilitate identification of novel viruses like those reported here. While virus
360 identification for the purpose of addressing pathogenic relationships will certainly continue, it is
361 now possible to identify virome entities that are not necessarily pathogens in the organism samples
362 from which they are identified.
363 If a species of ant is invasive, are the viruses that persist with that species in its invasive
364 range also invasive species? Definitions of “invasive viruses” would argue that only when a virus
365 causes a disease of obvious impact is the virus itself considered invasive (55). Yet it is easy to
366 speculate that a virus arriving with an invasive ant could prove pathogenic to native ants, insects,
367 and insectivores in the invasive region; perhaps even plants and the human food chain could
17
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
368 experience impact. More thorough characterization of the virosphere will enhance our
369 understanding of the roles microbes play and how viruses and hosts interact.
370 S. invicta is a ground-dwelling ant that feeds on a broad diet that may include plant and
371 insect exudates, prey, and decaying matter. Thus, the opportunities for exchange of virus particles
372 from the environment, whether native or novel, are many. The social nature of ant colonies where
373 exchanges of biological fluids among colony members of all castes and life stages must occur also
374 facilitates distribution of virus within the colony. Distinguishing clearly between different
375 host/microbe relationships and forms of virus transmission (horizontal and vertical) will be
376 challenging with invasive ant systems.
377 Using a transcriptome approach, we searched for RNA viruses in the SRA data collected
378 from S. invicta. Curiously, while a great diversity of positive and negative sense viruses has been
379 reported from ants (9, 10) and arthropods (2), S. invicta virome has been composed mainly of
380 +ssRNA viruses in the order Picornavirales, with no negative sense RNA viruses previously
381 reported. One factor that could be responsible for this bias is the selection of polyadenylated RNA
382 for library preparation, in previous studies (11, 30). The utilization of unbiased library preparation,
383 using ribosomal depletion methods, besides the detection of +ssRNA viruses, has enabled the
384 discovery of many viruses with non-polyadenylated genome, especially within order Bunyavirales
385 (2). The fact of some of the libraries used here were prepared using this approach (36) allowed us
386 to characterize SINV-14, the first bunyavirus reported from ants. In addition, we also reported for
387 the first time another negative sense virus within the order Mononegavirales associated to S.
388 invicta. Although we did not perform any amplification step of viruses genomes characterized
389 here, the presence of untruncated ORFs carrying intact functional domains (Suppl. Table S3), the
390 high abundance of viral reads, and the similar organization and genomes sizes compared to other
18
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
391 closely related viruses strongly suggest that we obtained the correct full-length or near full-length
392 genomes (Figure 1). The only exception is SINaDNV, due to the linear single strand DNA
393 (ssDNA) genome. Only active transcriptional units were sampled using our transcriptome
394 approach and further investigation is needed to reveal its full-length genome structure.
395 Our analysis of RNA obtained from geographically and temporally different samples
396 suggests shifts in the virome of native and invasive fire ants (11). In established exotic populations
397 of the S. invicta Yang, Yu (21) identified SINV-1 and SINV-2 and hypothesized that while these
398 two viruses may persist, the more virulent virus, SINV-3 arrived with founders but caused high
399 host mortality resulting in individual carriers of the virus being rapidly eliminated. Nine additional
400 viruses were identified in S. invicta RNA samples from the native South American range of the
401 species (11). The viruses described here show distribution and composition that varied according
402 to geographic location and S. invicta stage (Table 1 and 2). While SINV-16, SINV-17 and
403 SINaDNV were exclusively found in Mississippi and specifically associated with larvae
404 individuals, detected in all three libraries, these viruses were not detected in pupae from the same
405 sampled place, suggesting a stage-dependent association, as previously reported for some other S.
406 invicta viruses (28). Conversely, SINV-14 and SINV-15 were detected associated with all stages
407 analyzed (Table 1 and 2). In addition, SINV-14 and SINV-15 were found in Mississippi and Texas,
408 with SINV-14 also detected in Florida (Table 1 and 2). Interestingly, while SINV-15, SINV-16,
409 SINV-17 and SINaDNV were restricted to the U.S., SINV-14 was also detected in Taiwan (Table
410 1 and 2). To date, four of five viruses previously reported associated with S. invicta in introduced
411 areas have also been previously described to occur in its native origin (11, 21). Although viral
412 diversity associated with S. invicta has been deeply studied in Argentina, the viruses reported in
413 this study have not been detected there yet (11, 30). To gain further insight about the new viruses’
19
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
414 distribution in S. invicta native range, eight transcriptome libraries publicly available at SRA, were
415 analyzed. These libraries were prepared from 182 colonies collected from eastern Formosa,
416 Argentina, suggested to be the original source to S. invicta population in U.S. (11, 18). Based on
417 BLASTn analysis, our search failed to detect any significant reads matching to virus genomes
418 obtained here (data not shown), suggesting that these viruses may not be present in the sampled
419 area in Argentina, and even that new host-virus associations may have occurred in introduced
420 areas.
421 While prospecting viral enemies in the origin center of S. invicta offers opportunity for
422 discovering a higher viral diversity with potential to be used in biocontrol, investigating viruses
423 circulating in introduced populations may reveal new virus-ant interactions with even greater
424 potential. Due to the short-term virus-host association, those viruses acquired in introduced areas
425 may be even more harmful to the host. Furthermore, selecting natural enemies already present in
426 the invaded area is likely to offer less risk to the natural ecosystem. Thus, the viruses characterized
427 here expand the diversity of viruses infecting S. invicta in invaded areas with potential to be used
428 as biological control agents, which will require further biological characterization.
429 The use of metatranscriptome approach has allowed the discovery of divergent groups of
430 RNA viruses contributing to a better understanding about origin and evolutionary history of viruses
431 associated with the most diverse host taxa (56). While the origins of viruses that replicate in plant
432 and insect vectors remains still unknown, such as those within order Bunyavirales and
433 Mononegavirales, the discovery of possible intermediate forms has been suggested (2, 49).
434 Tenuiviruses are known to be plant viruses that replicate in the insect vector with a multipartite
435 negative/ambisense, ssRNA genome, packaged in a filamentous nonenveloped virion and
436 horizontally transmitted to the host plant in a circulative propagative manner by planthoppers
20
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
437 (Order Hemiptera), where vertical transmission can also occur (57, 58). Li, Shi (2) reported the
438 first species of a tenui-like virus, Horsefly horwuvirus (virus WhHV), associated with a pool of
439 horseflies (family Tabanidae), and proposed that it may represent an transitional form between
440 plant-infecting virus and arthropod-specific viruses. In addition, a partial genome of a new tenui-
441 like, FCTenV1, has been recently reported associated to Culex annulirostris (49). While these two
442 tenui-like viruses have been reported associated with flies (Order Diptera), we characterized a new
443 tenui-like virus closely related to the FCTenV1, associated to S. invicta (Order Hymenoptera).
444 Interestingly, in contrast with WhHV, which lacks an ambisense coding strategy, the SINV-14
445 genome exactly mirrors the genomic structure of typical plant tenuivirus, encoding proteins using
446 ambisense strategy, and also has the conserved sequences at the ends of all four segments identical
447 to those found in tenuivirus genomes (Figure 1A and B). While these viruses may represent
448 different steps of transitional forms between tenui-infecting plant and those insects-specific
449 viruses, the direction of the process, if they come from plant to insect or otherwise, remains
450 unknown.
451 Whereas phylogenetic trees based on RdRp, glycoprotein and NS4 protein were highly
452 congruent and related to those from tenui-like viruses, the nucleocapsid protein was closely related
453 to that of otter fecal phlebovirus, strongly suggesting that SINV-14 may have a
454 recombinant/reassortment origin, where the RNA3 or part of it was acquired from a divergent
455 phenuivirus. While it could be argued that this might be an artefact due to assembling a segmented
456 genome using a transcriptome approach, the fact that we did not find any other contigs related to
457 phenuiviruses, the high read abundance, the constant association between these four segments
458 across different libraries and the presence of the conserved tenuivirus sequence located at the ends
459 of genome segments (Figure 1B), suggest that they are part of a unique genome, rather than being
21
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
460 an artefact. In addition, phylogenetic congruence among different segments indicate that they are
461 in an intimate codivergence process (Suppl. Figure S1).
462 Valles, Strong (59) using an expressed sequence tag (ETS) library from S. invicta, detected
463 a short sequence (approximately 750 nt, GenBank access: EF409991) related to plant tenuiviruses.
464 Further identity analysis showed that sequence is 99.8% identical to SINV14 RNA4. They
465 suggested that the sequence would be likely a contamination due to the ant diet feeding either plant
466 or infected insect. Tenuiviruses are typical plant viruses that replicate in the insect vector (57, 58),
467 and the high abundance of SINV-14 compared to housekeeping genes is strong evidence of active
468 replication in S. invicta. Furthermore, the highest virus abundance was found in a sample prepared
469 from a dissected ant brain, which rules out the possibility of contamination due to feeding on plants
470 or association with insect vectors infected with a tenuivirus. The Maize stripe virus (MSpV), a
471 typical plant tenuivirus, was detected in the brain of its vector Peregrinus maidis, providing
472 evidence of replication of tenuiviruses in this tissue (60). Tenuiviruses replicate in diverse tissues
473 of their insect vectors and are transovarially transmitted between generations suggesting that
474 SINV-14, and other tenui-like, could be sustained through vertical transmission in their insect host
475 (58). Additionally, significant asymmetric abundance of different components of SINV-14 suggest
476 a very specific and active interaction. Asymmetric accumulation in multipartite viruses has been
477 shown and seems to be common trait, shared by RNA and DNA multipartite viruses infecting
478 plants and animals (61, 62); this has been suggested to be involved in control of gene expression
479 allowing fast virus adaptation (63). Although this has not yet been shown for any phenuivirus and
480 may be host dependent (61), our results suggest that this may occur, at least for SINV-14, and
481 more experiments will be necessary to confirm.
22
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
482 While SINV-14 is evolutionarily closely related to insect tenui-like viruses and plant
483 tenuivirus, the synonymous codon usage and dinucleotide analyses demonstrate a distinct
484 compositional bias compared to FCTenV1 and WhHV, and all other viruses analyzed here,
485 indicating that the virus may be actively replicating in ants rather than plants and other insects.
486 The active replication may have driven the virus genome to distinct compositional bias at the
487 nucleotide level, while maintaining protein integrity at the amino acid level and close relationship
488 with those of other tenui-like viruses, as observed through phylogenetic analysis. The fact that
489 most viruses examined here replicate in plant or vertebrate host and also in the insect vector, could
490 be the reason driving such difference between SINV-14, most likely associated only with ant,
491 compared to other viruses. Furthermore, phylogenetic congruence across different segments that
492 mirror the genetic structure of invasive S. invicta, suggest an intimate and long-term codivergence
493 process between virus-host. Ascunce, Yang (18) demonstrated that S. invicta subpopulations from
494 Texas comprise a distinct genetic cluster compared to Mississippi and Florida, that can also be
495 observed when we reconstructed the phylogeny for the distinct segments of SINV-14 (Suppl.
496 Figure S1). Interestingly, while Ascunce, Yang (18) suggested an introduction event from
497 California into Taiwan, the virus recovery from Taiwan clustered with those from Mississippi and
498 Florida, suggesting that other more recent introductions may have occurred (64, 65).
499 We presented strong evidence that SINV-14 is long-term association and may actively
500 replicate in ants. However, the possibility of this virus, and other tenui-like to replicate in plants is
501 unknown. The presence of the protein carrying an NS4 domain, that is related to cell-to-cell and
502 long-distance movement in plants (66) in insect viruses is puzzling. Solenopsis invicta virus 14
503 NS4 is highly divergent showing 19.85 to 24.4% of identity compared to other tenuivirus (Suppl.
504 Figure S2). In addition, mutation in most sites associated with cell-to-cell and long-distance
23
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
505 movement (Suppl. Figure S2), suggests that this protein might have lost the capacity to move viral
506 genome in plant, whereas its maintenance in insect viruses may be related to another role acquired
507 through functional diversification. The possible function of the plant virus movement protein from
508 insect viruses is of significant interest, and its role in insects and plants remains to be addressed.
509 Altogether, based on virus abundance compared to housekeeping genes, abundant viruses
510 with viral reads higher than 0.01%, phylogenetic relationship, complete viral coding sequence
511 regions recovered, and compositional bias for SINV-14, our results suggest that four out five
512 viruses reported here, those being SINV-14, SINV-15, SINV-16 and SINV-17 are truly replicating
513 in S. invicta. Our results suggest fluid shifts in the virome of this invasive species. Further research
514 describing this virome in native and invasive regions and ecosystems could provide insight on
515 virus evolution and invasion mechanics.
516 Conclusions
517 The present study expands our knowledge about viral diversity associated with S. invicta in
518 introduced areas. In addition to revealing new virus-host interactions, contributing to better
519 understanding of viral evolutionary history and emergence, the discovery of new viruses expands
520 the range of agents with potential to be used in biocontrol programs, which will require further
521 biological characterization. By understanding these interactions, we may be better equipped to
522 cope with ongoing global changes and introductions of invasive organisms and their associated
523 viruses.
524 Availability of data and materials
525 Consensus virus genome sequences were deposited in GenBank under accession numbers:
526 MT860232 to MT860267.
24
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
527 Ethics approval and consent to participate
528 Not applicable.
529 Consent for publication
530 Not applicable.
531 Competing interests
532 The authors declare that they have no competing interests.
533 Funding
534 This work was supported by USDA ARS In-House Appropriated Research Project: Biology and
535 Control of Invasive Ants, Project Number: 6066-22320-010-00-D
536 Authors' contributions
537 C.A.D.X, M.L.A. and A.E.W. conceived and designed the study. C.A.D.X and M.L.A. performed
538 the analysis. C.A.D.X, M.L.A. and A.E.W. discussed the results and wrote the manuscript.
539 Acknowledgements
540 Trade, firm, or corporation names in this publication are for the information and convenience of
541 the reader. Such use does not constitute an official endorsement or approval by the U.S.
542 Department of Agriculture of the Agricultural Research Service (USDA-ARS).
543 References
544 1. Stork NE. How many species of insects and other terrestrial arthropods are there on Earth? 545 Annu Rev Entomol. 2018;63:31-45.
546 2. Li CX, Shi M, Tian JH, Lin XD, Kang YJ, Chen LJ, et al. Unprecedented genomic diversity 547 of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses. Elife. 548 2015;29(4):05378.
25
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
549 3. Shi M, Lin X-D, Tian J-H, Chen L-J, Chen X, Li C-X, et al. Redefining the invertebrate 550 RNA virosphere. Nature. 2016;540(7634):539-43.
551 4. Käfer S, Paraskevopoulou S, Zirkel F, Wieseke N, Donath A, Petersen M, et al. Re- 552 assessing the diversity of negative strand RNA viruses in insects. PLoS Pathog. 553 2019;15(12).
554 5. Shi M, Neville P, Nicholson J, Eden JS, Imrie A, Holmes EC. High-resolution 555 metatranscriptomics reveals the ecological dynamics of mosquito-associated RNA viruses 556 in western australia. J Virol. 2017;91(17):00680-17.
557 6. Atoni E, Zhao L, Karungu S, Obanda V, Agwanda B, Xia H, et al. The discovery and global 558 distribution of novel mosquito-associated viruses in the last decade (2007-2017). Rev Med 559 Virol. 2019;29(6):13.
560 7. Faizah AN, Kobayashi D, Isawa H, Amoa-Bosompem M, Murota K, Higa Y, et al. 561 Deciphering the virome of Culex vishnui subgroup mosquitoes, the major vectors of 562 japanese encephalitis, in Japan. Viruses. 2020;12(3).
563 8. Pettersson JH, Shi M, Eden JS, Holmes EC, Hesson JC. Meta-transcriptomic comparison 564 of the RNA viromes of the mosquito vectors Culex pipiens and Culex torrentium in 565 Northern Europe. Viruses. 2019;11(11).
566 9. Kleanthous E, Olendraite I, Lukhovitskaya NI, Firth AE. Discovery of three RNA viruses 567 using ant transcriptomic datasets. Archives of virology. 2019;164(2):643-7. Epub 568 2018/11/10.
569 10. Dhaygude K, Johansson H, Kulmuni J, Sundström L. Genome organization and molecular 570 characterization of the three Formica exsecta viruses-FeV1, FeV2 and FeV4. PeerJ. 571 2019;20(6).
572 11. Valles SM, Rivers AR. Nine new RNA viruses associated with the fire ant Solenopsis 573 invicta from its native range. Virus Genes. 2019;55(3):368-80.
574 12. Lacey LA, Grzywacz D, Shapiro-Ilan DI, Frutos R, Brownbridge M, Goettel MS. Insect 575 pathogens as biological control agents: back to the future. Journal of Invertebrate 576 Pathology. 2015;132:1-41.
577 13. Chen X, Gonçalves MA. Engineered viruses as genome editing devices. Mol Ther. 578 2016;24(3):447-57.
579 14. Rode NO, Estoup A, Bourguet D, Courtier-Orgogozo V, Débarre F. Population 580 management using gene drive: molecular design, models of spread dynamics and 581 assessment of ecological risks. Conservation Genetics. 2019;20(4):671-90.
582 15. Porter SD, Savignano DA. Invasion of polygyne fire ants decimates native ants and disrupts 583 arthropod community. Ecology. 1990;71(6):2095-106.
26
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
584 16. Pimentel D, Zuniga R, Morrison D. Update on the environmental and economic costs 585 associated with alien-invasive species in the United States. Ecological Economics. 586 2005;52(3):273-88.
587 17. Callcott A-MA, Collins HL. Invasion and range expansion of imported fire ants 588 (Hymenoptera: Formicidae) in North America from 1918-1995. The Florida Entomologist. 589 1996;79(2):240-51.
590 18. Ascunce MS, Yang C-C, Oakey J, Calcaterra L, Wu W-J, Shih C-J, et al. Global invasion 591 history of the fire ant Solenopsis invicta. Science. 2011;331(6020):1066-8.
592 19. Shoemaker DD, Ahrens M, Sheill L, Mescher M, Keller L, Ross KG. Distribution and 593 prevalence of Wolbachia Infections in native populations of the fire ant Solenopsis invicta 594 (Hymenoptera: Formicidae). Environmental Entomology. 2003;32(6):1329-36.
595 20. Bouwma AM, Ahrens ME, DeHeer CJ, DeWayne Shoemaker D. Distribution and 596 prevalence of Wolbachia in introduced populations of the fire ant Solenopsis invicta. Insect 597 Mol Biol. 2006;15(1):89-93.
598 21. Yang C-C, Yu Y-C, Valles SM, Oi DH, Chen Y-C, Shoemaker D, et al. Loss of microbial 599 (pathogen) infections associated with recent invasions of the red imported fire ant 600 Solenopsis invicta. Biological Invasions. 2010;12(9):3307-18.
601 22. Porter SD, Fowler HG, Mackay WP. Fire ant mound densities in the United States and 602 Brazil (Hymenoptera: Formicidae). Journal of Economic Entomology. 1992;85(4):1154- 603 61.
604 23. Morrison LW, Porter SD, Daniels E, Korzukhin MD. Potential global range expansion of 605 the invasive fire ant, Solenopsis invicta. Biological Invasions. 2004;6(2):183-91.
606 24. Drees BM, Calixto AA, Nester PR. Integrated pest management concepts for red imported 607 fire ants Solenopsis invicta (Hymenoptera: Formicidae). Insect Science. 2013;20(4):429- 608 38.
609 25. Valles SM, Porter SD, Choi M-Y, Oi DH. Successful transmission of Solenopsis invicta 610 virus 3 to Solenopsis invicta fire ant colonies in oil, sugar, and cricket bait formulations. 611 Journal of Invertebrate Pathology. 2013;113(3):198-204.
612 26. Oi D, Valles S, Porter S, Cavanaugh C, White G, Henke J. Introduction of fire ant 613 biological control agents into the Coachella Valley of California. Florida Entomologist. 614 2019;102(1):284-6, 3.
615 27. Valles SM, Strong CA, Dang PM, Hunter WB, Pereira RM, Oi DH, et al. A picorna-like 616 virus from the red imported fire ant, Solenopsis invicta: initial discovery, genome sequence, 617 and characterization. Virology. 2004;328(1):151-7.
27
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
618 28. Valles SM, Hashimoto Y. Isolation and characterization of Solenopsis invicta virus 3, a 619 new positive-strand RNA virus infecting the red imported fire ant, Solenopsis invicta. 620 Virology. 2009;388(2):354-61.
621 29. Valles SM, Strong CA, Hashimoto Y. A new positive-strand RNA virus with unique 622 genome characteristics from the red imported fire ant, Solenopsis invicta. Virology. 623 2007;365(2):457-63.
624 30. Valles SM, Porter SD, Calcaterra LA. Prospecting for viral natural enemies of the fire ant 625 Solenopsis invicta in Argentina. PLoS One. 2018;13(2).
626 31. Allen ML. Near-complete genome sequences of new strain of Nylanderia Fulva Virus 1 627 from Solenopsis invicta. Microbiol Resour Announc. 2020;9(15):00798-19.
628 32. Valles SM, Shoemaker D, Wurm Y, Strong CA, Varone L, Becnel JJ, et al. Discovery and 629 molecular characterization of an ambisense densovirus from South American populations 630 of Solenopsis invicta. Biological Control. 2013;67(3):431-9.
631 33. Manfredini F, Shoemaker D, Grozinger CM. Dynamic changes in host-virus interactions 632 associated with colony founding and social environment in fire ant queens (Solenopsis 633 invicta). Ecol Evol. 2015;6(1):233-44.
634 34. Hsu H-W, Chiu M-C, Shoemaker D, Yang C-CS. Viral infections in fire ants lead to 635 reduced foraging activity and dietary changes. Scientific Reports. 2018;8(1):13498.
636 35. Valles SM. Positive-strand RNA viruses infecting the red imported fire ant, Solenopsis 637 invicta. Psyche. 2012;2012:821591.
638 36. Allen ML, Rhoades JH, Sparks ME, Grodowitz MJ. Differential gene expression in red 639 imported fire ant (Solenopsis invicta) (Hymenoptera: Formicidae) larval and pupal stages. 640 Insects. 2018;9(4).
641 37. Morandin C, Tin MMY, Abril S, Gómez C, Pontieri L, Schiøtt M, et al. Comparative 642 transcriptomics reveals the conserved building blocks involved in parallel evolution of 643 diverse phenotypic traits in ants. Genome Biology. 2016;17(1):43.
644 38. Fontana S, Chang N-C, Chang T, Lee C-C, Dang V-D, Wang J. The fire ant social 645 supergene is characterized by extensive gene and transposable element copy number 646 variation. Molecular Ecology. 2020;29(1):105-20.
647 39. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 648 7.0 for bigger datasets. Molecular Biology and Evolution. 2016;33(7):1870-4.
649 40. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, et al. MrBayes 650 3.2: efficient Bayesian phylogenetic inference and model choice across a large model 651 space. Systematic biology. 2012;61(3):539-42. Epub 2012/02/22.
28
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
652 41. Bewick V, Cheek L, Ball J. Statistics review 10: further nonparametric methods. Critical 653 care (London, England). 2004;8(3):196-9. Epub 2004/04/16.
654 42. De Mendiburu F, Yaseen M. Statistical procedures for agricultural research. R package 655 version 140. https://myaseen208.github.io/agricolae/2020.
656 43. Di Giallonardo F, Schlub TE, Shi M, Holmes EC. Dinucleotide composition in animal 657 RNA viruses is shaped more by virus family than by host species. Journal of virology. 658 2017;91(8):e02381-16.
659 44. Kapoor A, Simmonds P, Lipkin WI, Zaidi S, Delwart E. Use of nucleotide composition 660 analysis to infer hosts for three novel picorna-like viruses. J Virol. 2010;84(19):10322-8.
661 45. Elek A, Kuzman M, Vlahovicek K. coRdon: codon usage analysis and prediction of gene 662 expressivity. R package version 170. https://github.com/BioinfoHR/coRdon2020.
663 46. Charif D, Lobry JR. SeqinR 1.0-2: a contributed package to the R project for statistical 664 computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, 665 Roman HE, Vendruscolo M, editors. Structural Approaches to Sequence Evolution: 666 Molecules, Networks, Populations. Berlin, Heidelberg: Springer Berlin Heidelberg; 2007. 667 p. 207-32.
668 47. Kassambara A, Mundt F. factoextra: extract and visualize the results of multivariate data 669 analyses. R package version 107. http://www.sthda.com/english/rpkgs/factoextra2020. 670 48. Vasilakis N, Forrester NL, Palacios G, Nasar F, Savji N, Rossi SL, et al. Negevirus: a 671 proposed new taxon of insect-specific viruses with wide geographic distribution. Journal 672 of virology. 2013;87(5):2475-88. Epub 2012/12/19.
673 49. Williams SH, Levy A, Yates RA, Somaweera N, Neville PJ, Nicholson J, et al. The 674 diversity and distribution of viruses associated with Culex annulirostris mosquitoes from 675 the Kimberley region of western Australia. Viruses. 2020;12(7):717.
676 50. Sõmera M, Kvarnheden A, Desbiez C, Blystad DR, Sooväli P, Kundu JK, et al. Sixty years 677 after the first description: genome sequence and biological characterization of European 678 wheat striate mosaic virus infecting cereal crops. Phytopathology. 2020;110(1):68-79.
679 51. Bodewes R, Ruiz-Gonzalez A, Schapendonk CME, van den Brand JMA, Osterhaus 680 ADME, Smits SL. Viral metagenomic analysis of feces of wild small carnivores. Virology 681 journal. 2014;11:89-.
682 52. Toriyama S, Kimishima T, Takahashi M, Shimizu T, Minaka N, Akutsu K. The complete 683 nucleotide sequence of the rice grassy stunt virus genome and genomic comparisons with 684 viruses of the genus Tenuivirus. Journal of General Virology. 1998;79(8):2051-8.
685 53. Dou X. Transcriptome analysis and identification of genes involved in moth sex 686 pheromone biosynthetic pathways. Graduate Theses and Dissertations2019.
29
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
687 54. Valles SM, Oi DH, Becnel JJ, Wetterer JK, LaPolla JS, Firth AE. Isolation and 688 characterization of Nylanderia fulva virus 1, a positive-sense, single-stranded RNA virus 689 infecting the tawny crazy ant, Nylanderia fulva. Virology. 2016;496:244-54. Epub 690 2016/06/30.
691 55. Invasive.org: center for invasive species and ecossystem healthy. 692 https://www.invasive.org/species/diseases.cfm. Accessed 4 August 2020.
693 56. Wolf YI, Kazlauskas D, Iranzo J, Lucía-Sanz A, Kuhn JH, Krupovic M, et al. Origins and 694 evolution of the global RNA virome. mBio. 2018;9(6):e02329-18.
695 57. Falk BW, Tsai JH. Biology and molecular biology of viruses in the genus Tenuivirus. 696 Annual review of phytopathology. 1998;36:139-63.
697 58. Liu W, Hajano JU, Wang X. New insights on the transmission mechanism of tenuiviruses 698 by their vector insects. Curr Opin Virol. 2018;33:13-7.
699 59. Valles SM, Strong CA, Hunter WB, Dang PM, Pereira RM, Oi DH, et al. Expressed 700 sequence tags from the red imported fire ant, Solenopsis invicta: annotation and utilization 701 for discovery of viruses. J Invertebr Pathol. 2008;99(1):74-81.
702 60. Nault LR, Gordon DT. Multiplication of Maize Stripe Virus in Peregrinus maidis. 703 Phytopathology. 1988;78(7):991-5.
704 61. Sicard A, Yvon M, Timchenko T, Gronenborn B, Michalakis Y, Gutierrez S, et al. Gene 705 copy number is differentially regulated in a multipartite virus. Nat Commun. 2013;4(2248).
706 62. Wu B, Zwart MP, Sánchez-Navarro JA, Elena SF. Within-host evolution of segments ratio 707 for the tripartite genome of Alfalfa mosaic virus. Scientific Reports. 2017;7(1):5004.
708 63. Zwart MP, Elena SF. Modeling multipartite virus evolution: the genome formula facilitates 709 rapid adaptation to heterogeneous environments. Virus Evol. 2020;6(1).
710 64. Schoebel CN, Botella L, Lygis V, Rigling D. Population genetic analysis of a parasitic 711 mycovirus to infer the invasion history of its fungal host. Molecular Ecology. 712 2017;26(9):2482-97.
713 65. Carver S, Lunn T. When are pathogen dynamics likely to reflect host population genetic 714 structure? Molecular Ecology. 2020;29(5):859-61.
715 66. Zhang C, Pei X, Wang Z, Jia S, Guo S, Zhang Y, et al. The Rice stripe virus pc4 functions 716 in movement and foliar necrosis expression in Nicotiana benthamiana. Virology. 717 2012;425(2):113-21.
30
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
718 Table 1. Information summary of S. invicta transcriptome dataset analyzed in this study.
Sample Part of insect SRA access # Stage Origin Date of collection Other information Refence code body A SRX3035962 Larvae Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) B SRX3035961 Larvae Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) C SRX3035960 Larvae Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) X SRX3035959 Pupa Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) Y SRX3035964 Pupa Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) Z SRX3035963 Pupa Whole body Mississippi, USA March 2016 Polygynous Allen et al., (2018) Q2 DRX037806 Adults Whole body Texas, USA September 2012 Polygynous, queen Morandin et al., (2016) W2 DRX037809 Adults Whole body Texas, USA September 2012 Polygynous, worker Morandin et al., (2016) W3 DRX037810 Adults Whole body Texas, USA September 2012 Polygynous, worker Morandin et al., (2016) Y05 SRX5464977 Adults Brain Florida, USA May 2014 Queen -* K05 SRX5464984 Adults Brain Florida, USA May 2014 Queen - CA01 SRX5464990 Adults Brain Florida, USA May 2014 Queen - A pool of two virgin 2-small SRX5822389 Adults Whole body Taiwan May 2014 Fontana et al., (2020) queens, lab colony 719 * Reference is not available yet.
31
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
720 Table 2. Abundance of viruses and housekeeping genes, cytochrome c oxidase subunit I (cox1), ribosomal protein L18 (rpl18) and
721 translation elongation factor 1 (eif1-beta), expressed in percentage of viral reads per total number of reads in each library.
Virus abundance (%)* Housekeeping genes (%) Sample SRA access # SINV-14 SINV-15 SINV-16 SINV-17 SINaDNV cox1 rpl18 eif1-beta RNA1 RNA2 RNA3 RNA4 Total& A SRX3035962 - - - - - 0.008 0.516 0.128 0.002 0.3910 0.0238 0.0163 B SRX3035961 0.047 0.019 0.042 0.072 0.180 0.013 0.327 0.235 - 0.7102 0.0263 0.0267 C SRX3035960 0.046 0.005 0.051 0.048 0.150 0.019 0.120 0.281 0.002 0.4324 0.0246 0.0305 X SRX3035959 - - - - - 0.009 - - - 0.2860 0.0161 0.0080 Y SRX3035964 - - - - - 0.023 - - - 0.3236 0.0153 0.0089 Z SRX3035963 0.374 0.032 0.131 0.089 0.626 0.039 - - - 0.2687 0.0135 0.0090 Q2 DRX037806 1.469 0.055 0.240 0.944 2.708 0.211 - - - 1.0036 0.0530 0.0469 W2 DRX037809 1.679 0.006 0.088 0.601 2.374 0.020 - - - 2.1720 0.0342 0.0113 W3 DRX037810 1.658 0.006 0.089 0.619 2.372 0.020 - - - 1.6576 0.0351 0.0117 Y05 SRX5464977 6.803 0.941 0.650 0.001 8.394 - - - - 0.4903 0.0119 0.0048 K05 SRX5464984 ------0.4957 0.0141 0.0046 CA01 SRX5464990 ------0.4639 0.0158 0.0064 2-small SRX5822389 0.084 0.009 0.017 0.019 0.130 - - - - 0.9619 0.0712 0.0340 722 * Viruses with abundance equal or higher than those shown by at least one of the housekeeping genes are highlighted in bold and those
723 with full-length genome underlined.
724 & Total: means all segments together
725
32
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
726 Figure legends
727 Figure 1. Genome organization of five putative new viruses associated with S. invicta
728 transcriptomes. Predicted open reading frames (ORFs) encoding proteins homologous to known
729 viral proteins or conserved domain that they encode are represented and colored according to their
730 orientation in the genome (dark blue: those encoded on positive strand and light blue: those
731 encoded on negative strand). Graphic representation of (A) segmented negative single-strand RNA
732 genome S. invicta virus 14 (SINV-14) and (B) conserved consensus sequence located at the ends
733 of tenuivirus genome segment (5’-ACACAAAGU and ACUUUGAGU-3’) detected in SINV-14
734 genome. Dashed lines indicate incomplete sequence. Nucleotides highlighted in red differ from
735 consensus sequence. (C) Negative-sense single-strand RNA genome S. invicta virus 15 (SINV-
736 15), and positive-sense single-strand RNA genomes (D) S. invicta virus 16 (SINV-16) and (E) S.
737 invicta virus 17 (SINV-17) are shown. (F) Partial double-strand DNA genome of S. invicta-
738 associated densovirus (SINaDNV) is represented. RdRp and L: RNA-dependent RNA polymerase;
739 G: glycoprotein; N: nucleocapsid protein; p3: unknown; NS4: non-structural protein 4; NCP: non-
740 nucleocapsid protein; P: phosphoprotein; M: matrix protein; CP: capsid protein; H: Helicase; V:
741 viral methyltransferase; F: Ftsj-like methyltransferase; NS1and NS2: non-structural protein 1 and
742 2, respectively.
743 Figure 2. Evolutionary history inferred for different genomic segments of S. invicta virus 14
744 (SINV-14). Midpoint-rooted Bayesian phylogenetic trees based on (A) full-length amino-acid
745 sequences of RNA-dependent RNA polymerase (RdRp), (B) amino-acid sequence of conserved
746 glycoprotein domain, (C) full-length amino-acid sequence of movement protein and (D) full-
747 length amino-acid sequence of nucleocapsid protein. Bayesian posterior probabilities are shown at
748 the nodes. Nodes with posterior probability values between >0.50 and ≤85 are indicated by empty
33
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
749 circles, >0.85 and <0.95 are indicated by half-filled circles and those with values ≥0.95 are
750 indicated by filled circles. The scale bar at the bottom indicates the number of amino acid
751 substitutions per site. Branches highlighted in red indicate virus characterized in this study. Genera
752 are indicated at the right side. Proposed species unaccepted yet by International Committee on
753 Taxonomy of Viruses (ICTV) are followed by an asterisk. Proteins sequences encoded by cognates
754 components for a given virus that do not have conserved domain according to Conserved Domain
755 Database (CDD) were not included.
756 Figure 3. Evolutionary history of new viruses associated to S. invicta transcriptome based on
757 Bayesian inference. Midpoint-rooted phylogenetic trees were constructed based on (A) conserved
758 domain (pfam00946) of RNA-dependent RNA polymerase (RdRp) of S. invicta virus 15 (SINV-
759 15) and representative species of the order Mononegavirales, (B) conserved domain (pfam00680)
760 of RdRp of S. invicta virus 16 (SINV-16) and representative species of the families Iflaviridae and
761 Dicistroviridae and (C) full-length amino-acid sequences of RdRp of S. invicta virus 17 (SINV-
762 17) and representative species of the proposed new genus Negevirus (unassigned to any family)
763 and families Virgaviridae and Kitaviridae, are shown. (D) Bayesian phylogenetic tree based on
764 full-length amino-acid sequences of non-structural gene 1 (NS1) of S. invicta-associated
765 densovirus (SINaDNV) and representative species of the family Parvoviridae. Bayesian posterior
766 probabilities are shown at the nodes. Nodes with posterior probability values between >0.65 and
767 ≤85 are indicated by empty circle, >0.85 and <0.95 are indicated by half-filled circles and those
768 with values ≥0.95 are indicated by filled circles. The scale bar at the bottom indicates the number
769 of amino acid substitutions per site. Branches highlighted in red indicates viruses characterized in
770 this study. Genera/Families/Subfamilies are shown at right side.
34
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
771 Figure 4. Boxplot of housekeeping genes and viruses abundance. (A) Abundance among
772 housekeeping genes from all libraries were compared. Abundance was calculated as percentage of
773 housekeeping or viruses reads per total number of reads in each library. (B) Abundance between
774 viruses and housekeeping genes were compared. Statistical test was performed among virus and
775 housekeeping genes abundance from the same libraries. (C) Normalized abundance of S. invicta
776 virus 14 (SINV-14) segments significantly differ among them. Normalized abundance was
777 calculated by dividing the abundance of each segment by the total abundance of the entire genome.
778 Box plots with different letters indicate significant differences between groups according to non-
779 parametric Kruskal-Wallis test followed by post hoc multiple comparison test using Fisher’s least
780 significant difference (p<0.05). Box plots show the first and third quartiles as a box, horizontal
781 line corresponds to the median and whiskers correspond to 1.5 times the inter-quartile distance
782 (IQR) from the difference between first and third quartiles.
783 Figure 5. Compositional biases of Solenopsis invicta virus 14 (SINV-14) differ from plant
784 tenuivirus and tenui-like viruses associated with flies. Principal Components Analysis (PCA) of
785 synonymous codon usage (A-D) and dinucleotide bias (E-H) of SINV-14 and other viruses that
786 infect plants and animals (vertebrate and invertebrate) are shown. Scatterplots show the first two
787 principal components that account for the largest variance (shown in parenthesis) in the data set
788 with ellipses representing each group and dots individuals. Viruses were divided into two main
789 groups: plant viruses include those that replicate in plants and insect vectors belonging to the
790 genera Tenuivirus and Orthotospovirus and non-plant viruses including those that do not replicate
791 in plants belonging to the genera Orthobunyavirus and representatives within family
792 Phenuiviridae. For detailed information of the data set see Supplementary Table S4. For NS4
793 protein, comparison was performed only for SINV-14, Fitzroy Crossing tenui-like virus1
35
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
794 (FCTenV1) and tenuiviruses, as this gene is not shared by other groups. WhHV, Wuhan horsefly
795 virus.
36
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
796 Figure 1
797
37
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
798 Figure 2
799 38
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
800 Figure 3
801
39
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
802 Figure 4
803
40
bioRxiv preprint doi: https://doi.org/10.1101/2020.08.11.245274; this version posted August 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
804 Figure 5
805
41