bioRxiv preprint doi: https://doi.org/10.1101/2020.04.07.031120; this version posted April 10, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
1 Revisiting Orthotospovirus Phylogeny Using Whole
2 Genomic Data and a Hypothesis for their Geographic
3 Origin and Diversification
4
5 Anamarija Butković a, Rubén González a, Santiago F. Elena a,b,*
6
7 a Instituto de Biología Integrativa de Sistemas (I2SysBio), Consejo Superior de
8 Investigaciones Científicas-Universitat de València, 46182 Paterna, Spain
9 b The Santa Fe Institute, Santa Fe, NM 87501, USA
10
11 *Address correspondence to Santiago F. Elena, [email protected].
12
13
14
15 Running title: Orthotospovirus origin and phylogenomics
16
17 ABSTRACT
18 The family Tospoviridae, a member of the Bunyavirales order, comprises tri-
19 segmented negative-sense single-stranded RNA viruses that infect plants and are also able
20 to replicate in their insect vectors in a persistent manner. The family is composed of a
21 single genus, the Orthotospovirus, whose type species is Tomato spotted wilt virus
22 (TSWV). Previous studies assessing the phylogenetic relationships within this genus
23 were based upon partial genomic sequences, thus resulting in unresolved clades and a
24 poor assessment of the roles of recombination and genome shuffling during mixed
25 infections. Complete genomic data for most Orthotospovirus species are now available
26 in the NCBI Genome database. In this study we have used 62 complete genomes from 20
27 species. Our study confirms the existence of four phylogroups (A to D), grouped in two
28 major clades (A-B and C-D), within the genus. We have estimated the split between the
29 two major clades ~3,100 years ago shortly followed by the split between the A and B
30 phylogroups ~2,860 years ago. The split between the C and D phylogroups happened
31 more recently, ~1,465 years ago. Segment reassortment has been shown to be important
32 in the generation of novel viruses. Likewise, within-segment recombination events have
33 been involved in the origin of new viral species. Finally, phylogeographic analyses of
34 representative viruses suggest the Australasian ecozone as the possible origin of the
35 genus, followed by complex patterns of migration, with rapid global spread and numerous
36 reintroduction events.
37
38 IMPORTANCE
39 Members of the Orthotospovirus genus infect a large number of plant families, including
40 food crops and ornamentals, resulting in economic losses in the millions. Despite this
41 importance, phylogenetic relationships within the genus were established years ago based
42 on partial genomic sequences. A peculiarity of orthotospoviruses is their tri-segmented
43 negative-sense genomes, which makes segment reassortment and within-segment
44 recombination, two forms of viral sex, potential evolutionary forces. Using full genomes
45 from all described orthotospovirus species, we revisited their phylogeny and confirmed
46 the existence of four major phylogroups with uneven geographic distribution. We have
47 also shown a pervasive role of sex in the origin of new viral species. Finally, using
48 Bayesian phylogeographic methods, we assessed the possible geographic origin and
49 historical dispersal of representative viruses from the different phylogroups.
50
51 KEYWORDS
52 Bunyavirales, phylogenomics, phylogeography, recombination, segmented viruses,
53 segment reassortment, selection, Tospoviridae, virus taxonomy
54
55 The order Bunyavirales is full of emerging pathogens that cause severe diseases in
56 humans (e.g., the Crimean-Congo and hantavirus hemorrhagic fevers, the La Crosse
57 encephalitis or the Rift Valley fever), farm animals (e.g., Nairobi sheep disease, ovine
58 and bovine Schmallenberg diseases, or Akabane ruminants’ disease) and agronomically
59 important crops (e.g., tomato spotted wilt, melon yellow spot or peanut bud necrosis
60 diseases). Bunyavirales are all arboviruses, and are transmitted by hemato- or
61 phytophagous arthropods. After the last International Committee on Taxonomy of
62 Viruses (ICTV) reclassification (Adams et al., 2017), two families of plant-only infecting
63 bunyaviruses have been designated: the Fimoviridae and the Tospoviridae. Fimoviridae
64 contain a single genus, the Emaravirus, and nine recognized species. Likewise, all
65 Tospoviridae species were classified within the Orthotospovirus genus.
66 Orthotospoviruses have spread around the world causing important losses in vegetable,
67 legume and ornamental crops (Gilbertson et al., 2015). Along with crops, they infect a
68 large variety of weed species; each virus species usually has a wide range of plant
69 hosts. For example, for Tomato spotted wilt virus (TSWV) alone, the type member of
70 the genus, more than 900 species belonging to over 90 monocotyledonous and
71 dicotyledonous plant families have been described as susceptible hosts (Pappu et al., 2009).
72 The genome structure of orthotospoviruses consists of three negative-sense single-
73 stranded RNAs (Baltimore’s class V; Fig. 1) that differ in size: segments L (large; ~8.9
74 kb), M (medium; ~4.8 kb) and S (small; ~2.9 kb). Only orthotospoviruses and
75 tenuiviruses (a plant-infecting genus of the Phenuiviridae family within the
76 Bunyavirales) use the ambisense strategy to produce their mRNAs (Hull, 2014). Segment
77 S encodes for the nucleoprotein (N) in negative-sense and for the protein NSS in positive-
78 sense. Protein NSS is involved in the suppression of plant RNA silencing (Hedil and
79 Kormelink, 2016). The segment M encodes for a precursor polyprotein of the
80 glycoproteins GN and GC in negative-sense. In positive-sense it encodes for the protein
81 NSM, which is responsible for the modification of plasmodesmata that enables the
82 infectious nucleocapsid units to pass through them, allowing intercellular movement
83 (Storms et al., 1995). Singh and Savithri (2015) showed that NSM also remodels the
84 endoplasmic reticulum networks to form vesicles to facilitate viral replication and
85 movement. The segment L encodes for the RNA-dependent RNA polymerase (RdRp) in
86 negative-sense.
87 Orthotospovirus particles are spherical with a diameter of 80 - 110 nm. The particles
88 are made of a lipid envelope that is spiked by two glycoproteins, GN and GC (Fig. 1).
89 These glycoproteins form heterodimers that compose oligomeric structures; the exact
90 conformation they adopt on the outside of the envelope has not yet been determined
91 experimentally. The cytoplasmic tails of the glycoproteins interact with the
92 nucleoproteins that are inside the envelope (Ribeiro et al., 2009). The nucleoproteins
93 encapsidate the three viral RNA segments forming the ribonucleoprotein complexes.
94 These complexes are associated with the RdRp. The number of ribonucleoprotein
95 complexes packed into each virion remains unknown, but to be viable, virions must
96 include at least one copy of each of the three genomic segments.
97 Orthotospovirus particles are transmitted between susceptible plants through thrips
98 vectors (Rotenberg et al., 2015). Up to fifteen species of common thrips have been
99 reported as facultative vectors (Rotenberg et al., 2015). All these vectors belong to the
100 Frankliniella and Thrips genera (Terebrantia suborder, Thysanoptera order, Thripinae
101 subfamily, Thripidae family). The transmission of orthotospoviruses by thrips has
102 several unusual characteristics. Only the first and second larval instars can acquire the
103 virus and this competence decreases with age. Although thrips can retain lifelong
104 infectivity, their ability to transmit the virus can be erratic (Hull, 2009). The ability to
105 retain the infectivity is due to the mode of transmission which is propagative, meaning
106 that the virus is able to replicate in its vector (Rotenberg et al., 2015; Fletcher et al., 2016).
107 As the virus replicates in the thrips, with some of the thrips (e.g., Frankliniella
108 occidentalis) being extremely polyphagous, there are chances that mixed infections of
109 orthotospoviruses occur within the insect, the plant, or both. That could lead to
110 recombination, segment reassortment, and component capture events. The global spread
111 of thrips and orthotospoviruses could lead to global disease outbreaks (Gilbertson et al.,
112 2015). Recent studies have shown that plant hosts infected with orthotospoviruses can
113 affect the feeding behavior and even the survival of the vector (Sundaraj et al., 2014).
114 Specifically, Shalileh et al. (2016) showed how TSWV manipulated F. occidentalis via
115 the host plant nutrients to enhance virus transmission and spread.
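To make the combinatorics of segment reassortment concrete, the following minimal Python sketch enumerates the segment constellations that could in principle arise when two viruses co-infect the same cell. The function names and the placeholder labels virus_1 and virus_2 are illustrative assumptions and are not part of the analyses reported here; the sketch ignores within-segment recombination and any packaging or compatibility constraints.

```python
from itertools import product

# The three genomic segments described in the text.
SEGMENTS = ("L", "M", "S")


def segment_constellations(parents):
    """Return every way to draw segments L, M and S from the given parents."""
    return [
        dict(zip(SEGMENTS, combo))
        for combo in product(parents, repeat=len(SEGMENTS))
    ]


def reassortants(parents):
    """Keep only constellations whose segments come from more than one parent."""
    return [
        constellation
        for constellation in segment_constellations(parents)
        if len(set(constellation.values())) > 1
    ]


if __name__ == "__main__":
    # Hypothetical labels for two co-infecting viruses.
    for constellation in reassortants(["virus_1", "virus_2"]):
        print(constellation)
```

With two parents and three segments there are 2^3 = 8 possible constellations, of which six mix segments from both parents.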
116 In this work, we have reevaluated the phylogenetic status within the Orthotospovirus
117 genus based on available full genomes. Furthermore, to identify events of heterologous
118 genome reassortment during mixed infections as well as within-segment recombination
119 events, phylogenies were done for each one of the three segments and for each one of the
120 five proteins and incongruences among trees were evaluated. All the phylogenies were
121 done using Bayesian methods. Along with the phylogenies there were other evolutionary
122 characteristics of the viruses that were analyzed: selection pressures, recombination
123 events and phylogeographic distribution and dynamics. Based on these results, we have
124 drawn a plausible scenario for the geographic origin and subsequent spread and
125 diversification of Orthotospovirus.
126
127 RESULTS AND DISCUSSION
128 Pervasive evidence of recombination and segment reassortment. Prior
129 to any other study, we evaluated the role of recombination during the phylogenetic
130 diversification of the orthotospoviruses (File S1). The PHI test for the protein alignment
131 found evidence of recombination between the GNGC (P = 0.0053) and NSM (P = 0.0017)
132 genes in segment M; consistently giving also significant values for segment M as a whole
133 (P < 0.0001). One possible mechanism could be the physiological role of NSM,
134 responsible for viral movement and regulation of apoptosis. In different plant hosts there
135 are different proteins involved in binding to NSM (Knipe and Howley, 2013). Differences
136 in NSM structure could have a major impact in cell-to-cell transportation. It has also been
137 suggested that the NSM protein is an evolutionary adaptation of an insect-only infecting
138 bunyavirus to a plant host, where it determines symptomatology and host range
139 (Tsompana et al., 2004). Therefore, when new viruses spill into new hosts, often in mixed
140 infections with other well-adapted viruses, within-segment recombination or segment
141 reassortment events may give rise to new viruses with enhanced movement in the novel
142 host.
143 Furthermore, GNGC encodes for the precursor of the two glycoproteins GN and GC,
144 which is consistent with the GARD algorithm detecting a recombination breakpoint
145 between them. This suggests independent evolutionary histories for the two glycoproteins.
146 It is known that GN is responsible for attachment to epithelial cells in the midgut of thrips
147 as a mode of more effective transmission (Knipe and Howley, 2013). Recombination
148 between these two proteins can readily generate new viruses that would explore their
149 potential of transmission by novel thrips species. The exact breakpoint between GNGC
150 was estimated using GARD to be between amino acid positions 768 and 769. This
151 partition will be later used in all the BEAST analyses described in the following sections.
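As an illustration of how such a partition can be applied, the sketch below splits each aligned GNGC sequence after position 768. It assumes that the breakpoint is expressed in 1-based alignment coordinates and that the alignment is held as a dictionary of equal-length strings; the data layout and the function names are hypothetical and do not reproduce the actual BEAST configuration.

```python
def split_at_breakpoint(aligned_seq, last_left_position):
    """Split a sequence after a 1-based position.

    With last_left_position=768, positions 1-768 go to the left part and
    positions 769 onward go to the right part.
    """
    if not 0 < last_left_position < len(aligned_seq):
        raise ValueError("breakpoint must fall inside the sequence")
    return aligned_seq[:last_left_position], aligned_seq[last_left_position:]


def partition_alignment(alignment, last_left_position=768):
    """Apply the same breakpoint to every sequence in an alignment.

    `alignment` maps a sequence name to its aligned amino acid string
    (an assumed layout). Returns two dictionaries, one per partition.
    """
    left, right = {}, {}
    for name, seq in alignment.items():
        left[name], right[name] = split_at_breakpoint(seq, last_left_position)
    return left, right
```

Each of the two resulting partitions can then be given its own substitution model and tree prior settings in downstream analyses.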
152 Further supporting the recombinogenic nature of orthotospoviruses, the pairwise
153 comparison of the MCC trees obtained for the three segments (Fig. S1) using tanglegram
154 in Dendroscope revealed incongruence between all segments. The incongruence in tree
155 topologies was driven mainly by TSWV.
156 In conclusion, recombination within or reassortment of the M segment during mixed
157 infections may create new viruses with improved movement in more hosts or
158 transmission ability using new vectors.
159 Full genome phylogenetic reconstructions confirm the existence of four
160 phylogroups. Maximum clade credibility (MCC) Bayesian trees were generated from
161 62 full genomic sequences belonging to 20 different orthotospovirus species (Fig. 2). The
162 trees were built with the heterogeneous model configuration described in the Materials and
163 Methods section, which introduces a partition between segments and, in the case of segment
164 M, accounts for the existence of a recombination breakpoint within the glycoprotein
165 gene. The MCC tree obtained from the full genome shows a significant clustering into four
166 different phylogroups A, B, C, and D. Pappu et al. (2009) defined phylogroups in terms
167 of geographical origin of the viruses. However, our full-genome analysis of a larger and
168 updated dataset does not support the notion of independent geographic origins for each
169 phylogroup. Instead, our phylogroups contain viruses from diverse origins. For instance,
170 the so-called American phylogroup of Pappu et al. (2009) is part of what we have defined here
171 as phylogroup A, which contains a mixture of viruses isolated from Asia, Europe, Australia,
172 and America. Likewise, the Eurasian phylogroup of Pappu et al. (2009) contains viruses from
173 what we have defined here as phylogroups C and D. Phylogroup D includes an isolate of
174 Capsicum chlorosis virus (CaCV) from Australia. We incorporated only one full-genome
175 sequence from Africa since other sequences from this region failed to meet the criteria
176 mentioned in the Materials and Methods section. It is worth mentioning that most
177 orthotospoviruses have a broad worldwide distribution, though many genomes have not
178 been fully sequenced and assembled after identification, thus limiting the number of full-
179 genomes available for our study. We expect that future accumulation of additional full-
180 genome sequences for some species would modify the number of clades in the phylogeny
181 shown in Fig. 2. In fact, it has been suggested using partial sequences that Groundnut
182 chlorotic fan-spot virus (GCFSV) and Groundnut yellow spot virus (GYSV) would likely
183 establish a different clade (Oliver and Whitfield, 2016).
184 In general, most nodes in the MCC tree had high statistical support (0.95 < BPP ≤
185 1). However, a few nodes had lower support. The node with the lowest support (BPP =
186 0.52) corresponded to the divergence of TSWV strains 22 and 23, while other nodes
187 within the TSWV clade also had BPP < 0.70 support. Two other clades had intermediate
188 support values (BPP ≈ 0.70): the basal node in which phylogroups A and B split apart,
189 and the node leading to the speciation events between Groundnut ringspot virus (GRSV),
190 Tomato chlorotic spot virus (TCSV), TSWV, and Zucchini lethal chlorosis virus (ZLCV).
191 Rates of molecular evolution are highly heterogeneous among genes and
192 lineages. Estimated rates of molecular evolution ranged between 10⁻⁴ and 2×10⁻³
193 substitutions per site per year for segment L, between 6×10⁻⁴ and 10⁻³ for segment M
194 and between 9×10⁻⁴ and 1.7×10⁻³ for segment S (color legends in Fig. S2).
195 Heterogeneity in rates of molecular evolution among viruses (and strains of the same
196 virus) has been observed for the different genes (Fig. 3). For the RdRp gene, the slowest
197 estimated rates of evolution corresponded to all the nodes, while the fastest corresponded
198 to branches separating the phylogroups A-B and C-D. RdRp showed the slowest rate of
199 evolution among the five genes.
200 For gene NSM the fastest rates of molecular evolution corresponded to the basal
201 branches leading to phylogroups A-B and C-D, with branches leading to phylogroups A
202 and B. By contrast, the slowest evolutionary rates for NSM were estimated for the basal
203 branch leading to the phylogroup C, and to terminal branches leading to CaCV and
204 Mulberry vein banding virus (MVBaV).
205 Regarding the glycoprotein genes, for gene GN, we observed fast rates of molecular
206 evolution in the basal branches leading to clades A-B, C-D and phylogroups A, B, C and
207 D. The fastest evolutionary rates were for ZLCV, Melon severe mosaic virus (MSMV)
208 and Melon yellow spot virus (MYSV). The slowest evolving branches were for TSWV
209 and MVBaV. For gene GC, the fastest rate of molecular evolution corresponds to the branch
210 leading to the phylogroups C-D. By contrast, the slowest rates of molecular evolution
211 correspond to the basal branch leading to the yet to be named tospovirus isolated from
212 kiwi plants (hereafter referred to as Tospokiwi) and TSWV.
213 Overall, the NSS gene showed slow rates of evolution in most basal branches, except
214 those leading to the phylogroups A and B and also phylogroups C-D. MYSV showed a
215 faster evolutionary rate compared to the other viruses.
216 High rates of molecular evolution for the N gene were found in the basal branches leading to
217 phylogroups A and D. The slowest rates observed for the N gene were in the majority of
218 basal branches except the tip branches leading to Bean necrotic mosaic virus (BeNMV)
219 and Impatiens necrotic spot virus (INSV).
220 These results suggest a pattern of variable rates of evolution along the diversification
221 of orthotospoviruses. Overall, accelerated rates of evolution have been observed in all
222 genes in the origin of phylogroups A-B and C-D, in some internal branches within
223 phylogroups leading to speciation events and in some terminal branches giving rise to
224 novel strains of a particular virus. Even for fast evolving lineages, differences among
225 genes exist. For example, MYSV showed very fast rates of molecular evolution in the
226 GN and NSS genes but a rather slow one in the other genes.
227 Estimation of divergence times. As shown in Fig. 2 for the full genome,
228 the time of the diversification between clades A-B and C-D was estimated to be 3099
229 years ago (ya) (95% HPD interval: 5192 - 1758 ya) (BPP = 1). Soon after,
230 ~2860 ya (95% HPD interval: 4772 - 1605 ya), the A-B clade diverged into
231 phylogroups A and B (BPP = 1). The C-D clade diverged into phylogroups C and D
232 (BPP = 1) around 1465 ya (95% HPD: 2421 - 854 ya).
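For readers unfamiliar with the 95% HPD intervals quoted above, the following sketch shows one common way to compute a highest posterior density interval from a set of posterior samples, namely as the shortest interval that contains the requested fraction of the sorted samples. The function name and the sample-based approach are assumptions made for illustration; the intervals reported here come from the BEAST output, not from this code.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval covering `mass` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the small tolerance guards
    # against floating-point noise in mass * n.
    window = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of that size over the sorted samples and keep the
    # narrowest one (the first one in case of ties).
    starts = range(n - window + 1)
    best = min(starts, key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[best], ordered[best + window - 1]
```

Because the HPD interval is the narrowest region with the requested posterior mass, it need not be symmetric around the point estimate, which is consistent with the skewed intervals reported for the divergence times.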
233 In addition, MCC trees were also generated for the individual segments (Fig. S2) and
234 individual genes (Fig. 3), with estimates of divergence times generated from these partial
235 trees being consistent with those obtained for the full genomes. For example, in the case
236 of the three segments (Fig. S2), the estimated divergence of phylogroups A-B and C-D
237 from each other was in the range of 2786 - 875 ya, whereas the divergence between clades
238 A-B and C-D took place between 2621 - 616 ya. Genes N and NSM have more recent
239 divergence times, estimated about 121 (95% HPD: 548 - 36 ya) and 141 (95% HPD: 366
240 - 45 ya) between phylogroups A-B and C-D, respectively. The divergence time between the
241 four major phylogroups estimated for gene NSS is 538 ya (95% HPD: 2519 - 81 ya).
242 However, estimates obtained using the G gene place the divergence between A-B and C-
243 D clades much earlier in time, 2911 ya (95% HPD: 7010 - 1122 ya).
244 A phylogeographic analysis of the origin and dispersal of Orthotospovirus. For
245 these analyses, instead of using countries, or even continents, as geographic units, we
246 opted to use more biologically relevant units: the eight biogeographic realms, or
247 ecozones, into which the World Wide Fund for Nature (WWF) divides the world (Schulz,
248 2005). These ecozones consider the geological barriers that kept plants and animals
249 separated from other areas, having a shared evolutionary history. Seventy-three
250 sequences from a reduced dataset of all N proteins available have been used for these
251 analyses. The results are shown in Fig. 4. Australasia (Australia, New Guinea and
252 neighboring islands) has been identified as the most likely origin of orthotospoviruses
253 (RSPP = 0.4961), with the second highest supported origin being the Western Palearctic
254 (Western Europe and North Africa) (RSPP = 0.3929). However, this conclusion has to
255 be taken with some caution, since Pappu et al. (2009) and Oliver and Whitfield (2016)
256 had suggested that TSWV isolates were actually imported into Australia from Europe ca.
257 1915. Eurasia, as part of the
258 Palearctic ecozone, contains the largest number of different Orthotospovirus species,
259 giving more plausibility to the hypothesis of the Palearctic ecozone as the possible origin
260 of the genus.
261 From Australasia, viruses were introduced into the Western Palearctic and from there
262 into Indomalaya (South Asian Subcontinent and Southeast Asia). Afterwards these three
263 ecozones have been established as dispersal centers to colonize the rest of the ecozones:
264 Neotropic (South America and the Caribbean), Nearctic (North America), Eastern
265 Palearctic (Eurasia) and Afrotropic (Sub-Saharan Africa) in that order, with numerous
266 reintroductions. It is interesting that the sequences that were introduced numerous times
267 in Eastern Palearctic and Afrotropic became later isolated. The BSSVS analysis shows
268 the strongest support for the dispersal route from the Indomalaya ecozone into the Eastern
269 Palearctic one (BF = 236.2).
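The Bayes factors (BF) reported for dispersal routes can be read as the ratio of posterior odds to prior odds that a given route is included in the diffusion model. The sketch below computes that ratio from two probabilities. The function name is hypothetical, and the prior inclusion probability depends on how the BSSVS prior was configured, which is not restated here, so the values in the usage comment are purely illustrative.

```python
def bssvs_bayes_factor(posterior_prob, prior_prob):
    """Posterior odds divided by prior odds that a dispersal route is included.

    posterior_prob: fraction of posterior samples in which the route is "on".
    prior_prob: prior probability that the route is "on" (model-dependent).
    """
    if not 0 < prior_prob < 1:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    if not 0 <= posterior_prob < 1:
        raise ValueError("posterior_prob must be in [0, 1)")
    posterior_odds = posterior_prob / (1 - posterior_prob)
    prior_odds = prior_prob / (1 - prior_prob)
    return posterior_odds / prior_odds


# Illustrative values only: a route present in 90% of posterior samples
# under a prior inclusion probability of 50% gives a BF of about 9.
```

A BF well above 1 indicates that the data increased the support for a route relative to the prior expectation.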
270 We also tested the null hypothesis of a random distribution of geographic origins on
271 the MCC tree generated with the full genome (Fig. S3) using Treebreaker (Ansari and
272 Didelot, 2016). The original definition of phylogroups was based on the geographic
273 origin. Therefore, the alternative hypothesis would be that geographic origins must be
274 differentially distributed among the four phylogroups. The branches on the MCC tree
275 that successfully reject the hypothesis of random distribution are for phylogroups B, C,
276 D, and for viruses TSWV and INSV. Phylogroup B has sequences from America,
277 phylogroup C has sequences from Asia and Europe while the phylogroup D has sequences
278 mostly from Asia. TSWV is found all around the globe but the sequences we used in this
279 study predominantly came from Asia. In the case of INSV we have sequences from
280 Europe. We can see that all the sequences coming from Asia and Europe show a
281 nonrandom distribution of geographic origin in the MCC tree. Only two viruses coming
282 from America, BeNMV and Soybean vein necrosis virus (SVNV) show significant
283 support for a geographic correlation. Interestingly, this association must be entirely due
284 to the limited geographic distribution of the vector species (Rotenberg et al., 2015).
285 BeNMV and SVNV are transmitted by the thrips Neohydatothrips variabilis, which is
286 restricted to Central and North America. INSV came from Europe and is transmitted by
287 Frankliniella spp., which are found all around the world. Phylogroup C shows strong
288 correlation for viruses Polygonum ringspot virus (PolRSV) and Hippeastrum chlorotic
289 ringspot virus (HCRV), which come from the Palearctic ecozone and are transmitted by
290 Thrips spp. and Dictyothrips spp., mostly distributed across Eurasia. Phylogroup D
291 mostly contains viruses isolated from Asia, which are transmitted by Thrips spp., which are
292 found everywhere.
293 TSWV geographic origin and pandemic dispersal. TSWV is the most represented
294 virus in our dataset and hence provides the richest information to perform
295 phylogeographic analyses for an individual virus. As in the previous section, we have
296 used the WWF ecozones. We have performed two independent phylogeographic analyses
297 of the N protein sequences from TSWV, one based on the entire 229-isolate dataset and
298 a second one based on a reduced dataset of 87 isolates that minimizes overrepresentation
299 of sequences from certain countries (see Materials and Methods). For both datasets, the
300 most likely origin of TSWV was within the Australasia ecozone (RSPP = 0.9426 for the
301 reduced dataset and RSPP = 0.9016 for the full dataset). However, this conclusion has to
302 be taken with some caution for the reasons discussed in the previous section. In any case,
303 the two data sets differ in the spread of isolates out of Australasia. The results for the
304 reduced dataset are shown in Fig. 5a. In the case of the reduced dataset TSWV isolates
305 emigrated from Australasia first into the Neotropic and then into the Nearctic and the
306 Western and Eastern Palearctic ecozones as recently as 1994 - 1999. Even more recently,
307 around 2007, a second introduction from the Nearctic into the Eastern Palearctic took
308 place. In 2015 several possible parallel introductions into the Afrotropic ecozones from
309 the Neotropic and Eastern Palearctic ones are inferred from the dataset, though the
310 strongest supported route was from the Palearctic (BF = 215.4). The sequences from the
311 Western Palearctic and Afrotropic after the introductions became isolated.
312 When looking into the phylogeographic diffusion patterns inferred from the full
313 dataset (Fig. 5b), Australasia still appears as the origin of the TSWV pandemic as previously
314 mentioned. From there, the virus spread out into the Nearctic, Western Palearctic,
315 Neotropic, and Eastern Palearctic ecozones within less than ten years (between 1988 and
316 1996). From the Eastern Palearctic, Western Palearctic and Neotropic, isolates were
317 introduced back into the Australasia, Indomalaya (South Asian Subcontinent and
318 Southeast Asia) and Afrotropic ecozones in the early 2000s (Fig. 5b). Subsequent
319 reintroductions into Afrotropic originated from the Eastern Palearctic and Neotropic
320 ecozones (Fig. 5b). In the last decade, isolates from the Afrotropic have diffused into the
321 Neotropic and Indomalaya ecozones (Fig. 5b).
322 In both sequence datasets the Nearctic, Western Palearctic and Eastern Palearctic
323 ecozones show the largest number of viral lineages. The strongest supported routes in
324 this case were between the Neotropic and Australasia and the Eastern Palearctic and
325 Australasia ecozones (in both cases, BF = 226.4). Sequences introduced into the
326 Indomalaya ecozone became isolated after the introduction.
327 Variations of population sizes over time. Population sizes over time plots are a
328 useful tool to illustrate overall patterns of genetic diversification throughout time
329 (Rambaut et al., 2018), i.e. the number of lineages observed on a tree with respect to time.
330 If diversification happens in a constant steady-state manner through time, a straight line
331 is expected when plotting the number of lineages in a logarithmic scale as a function of
332 time. On the one hand, if the observed line lies above the null hypothesis straight line, the
333 diversification rate increases with time. On the other hand, if the observed line goes below
334 the straight line, it can be concluded that diversification rate is decreasing with time. In
335 our case, lineages will be considered as different virus species.
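As a minimal illustration of the straight-line expectation described above, the following Python sketch builds lineages-through-time points from a list of branching times and fits a least-squares line to the logarithm of the lineage count. The function names and the example branching times are hypothetical; they are not the data or the software used in this study.

```python
import math


def ltt_points(branching_times):
    """Return (time, lineage count) pairs; the first split yields two lineages."""
    times = sorted(branching_times)
    return [(t, k + 2) for k, t in enumerate(times)]


def fit_log_linear(points):
    """Least-squares fit of log(lineages) against time, with residuals."""
    xs = [t for t, _ in points]
    ys = [math.log(n) for _, n in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


# Hypothetical branching times chosen so that log(lineages) is exactly linear.
times = [math.log(k) for k in range(2, 8)]
slope, intercept, residuals = fit_log_linear(ltt_points(times))
print(slope, intercept, max(abs(r) for r in residuals))
```

Residuals close to zero indicate agreement with the constant-rate expectation; systematic positive or negative residuals indicate the departures discussed in the text.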
In the Skyride plot from the TSWV analysis of the reduced dataset (87 sequences; Fig. 6a), population size rises from 1995 until 2002, declines slightly from 2003 until 2005, rises slightly from 2005 until 2012.5, and then remains steady until 2018 (Fig. 6a). The Skyride plot of the full TSWV sequence set (229 sequences) shows a similar pattern, with the exception of a decline in population size between 2005 and 2012.5 and a further decline after that until 2018 (Fig. 6b). For both datasets, the estimated median effective population size was 600 individuals.
Positive selection analysis. Selection analyses were performed with the MEME and aBSREL algorithms. Codons under significant positive selection were found in all the protein-coding sequences except N (File S2 and Fig. S4): position 26 (P = 0.05) of NSM; positions 468 (P = 0.01), 476 (P = 0.05), 479 (P = 0.05), and 490 (P = 0.01) of NSS; positions 4 (P < 0.01) and 813 (P < 0.01) of GNGC; and positions 1066 (P = 0.03) and 1728 (P = 0.04) of L. The aBSREL algorithm indicated nodes under positive selection for all genes except N (File S2). These findings suggest that beneficial mutations arose in the four proteins and rose to fixation in the viral population, resulting in adaptation to new hosts and/or new vectors.
These genetic changes, together with the worldwide spread of the thrips vectors, especially F. occidentalis (Knipe and Howley, 2013), could have led to the spread of some lineages carrying mutations beneficial for adaptation to new hosts, vectors and environments. This adaptive evolution could explain the fast, worldwide spread of orthotospoviruses during the 1960s and 1970s by the vector F. occidentalis (Knipe and Howley, 2013), reflecting adaptation of the orthotospoviruses and subsequent diversification into the new plant hosts used by this very polyphagous vector.
The N protein does not show evidence of positive selection; instead, it shows many instances of significant negative selection (File S2). This could be due to very stringent structural constraints given its multifunctional nature. N is involved in transcription and translation of viral RNA. It has been shown that single point mutants of Bunyamwera virus (BUNV) were defective for transcription but not translation, or vice versa (Eifan and Elliott, 2009; Walter et al., 2011). This observation suggests that N may have two functional domains, one involved in transcription and the other in translation, each interacting with different cellular cofactors. Since the small N protein forms the ribonucleocapsid complex with viral RNAs and tightly interacts with the RdRp to initiate RNA synthesis, protecting the nascent RNAs from degradation (Knipe and Howley, 2013), even a small structural change might disrupt its function, jeopardizing completion of the viral replication cycle. However, considering the essential roles of the other proteins that do show signatures of positive selection, the absence of such signatures in N could also be due to a lack of statistical power of the methods used.

CONCLUSIONS
The taxonomic status of the plant-infecting bunyaviruses was recently revisited by the ICTV, which raised the tospoviruses from the genus level to the family level, Tospoviridae. So far, the new family contains a single genus, Orthotospovirus. However, Oliver and Whitfield (2016) have suggested that the family could be classified into five distinct phylogenetic clades whose type members would be GYSV, IYSV, SVNV, TSWV, and WSMoV. This classification is fully consistent with our finding of four phylogroups: TSWV would be the representative species of phylogroup A, SVNV of phylogroup B, IYSV of phylogroup C, and WSMoV of phylogroup D. No full genome sequences for the two viruses from the GYSV clade were available at NCBI, and hence this clade is missing from our analyses. Additional full genome data for the two GYSV clade members would be necessary to confirm whether they form an additional phylogroup E, as well as to explore their phylogeographic history in relation to the other phylogroups.
Based on partial genome sequences and a smaller number of species, Pappu et al. (2009) classified orthotospoviruses according to their geographic origin into two geogroups: Asian and American. Our analyses suggest a more complex picture, in which orthotospovirus species cannot be classified based on geographic criteria. Comparing this geographic classification with our genome-based phylogroups, we observe that phylogroups A and B would be mainly of Asian origin while phylogroups C and D would be mainly of American origin, though the geographic distinction is diffuse. Genome reassortment, recombination and selection all played a role in the origin and diversification of orthotospoviruses.
Based on the phylogeographic distribution of the TSWV nucleocapsid sequences and of the 73 orthotospovirus nucleocapsid sequences, we suggest that orthotospoviruses originated in the Australasia ecozone and afterwards radiated into other ecozones around the globe. The geographic distribution of vectors correlates well with the geographic distribution of the different Orthotospovirus species. Phylogroups B, C and D, as well as TSWV and INSV, are transmitted by thrips that are specific to the Nearctic/Neotropic and Palearctic regions and are the predominant vectors for each group.
This work is the first phylogenetic analysis of the family Tospoviridae to use well-annotated full genome sequences. In line with the results from other studies, we propose the hypothesis that the genus Orthotospovirus could reasonably be split into at least four new genera.

MATERIAL AND METHODS
Viral species, sequence extraction and database curation. All available full Orthotospovirus genomes were obtained from the NCBI Genome Database (www.ncbi.nlm.nih.gov/genome; Brister et al., 2015) on May 10th, 2018. The genome collection was further expanded for all orthotospoviruses that were defined as a species, using the same criterion used to search the NCBI Genome Database: the complete genomes had to come from the same research paper and had to be sequenced from the same extraction sample. Our working dataset is thus composed of the amino acid sequences derived from 62 full viral genomes (the entire proteome) with well-defined segments and genes belonging to 20 different Orthotospovirus species (File S1). All nucleotide sequences were extracted by their accession number with the help of E-utilities from Entrez Direct. Further inspection with Geneious Prime version 2019.0.3 (www.geneious.com) confirmed that each of the 62 genomes corresponded, indeed, to a different Orthotospovirus.
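Retrieval by accession number, as described above, can be sketched as follows. The source states only that E-utilities from Entrez Direct were used; the snippet below instead calls the NCBI E-utilities efetch endpoint directly from Python, which is a substitution made for illustration. The function names and the placeholder accession strings are hypothetical, and the network call is defined but deliberately not executed here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build an efetch URL that requests the given records as plain text."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions):
    """Download the records as FASTA text (requires network access)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


# Placeholder identifiers; replace them with real accession numbers.
print(efetch_url(["ACCESSION_1", "ACCESSION_2"]))
```

Building the URL separately from the download makes it easy to log exactly which records were requested for each genome.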
To examine the phylogeographic origins of TSWV and all orthotospoviruses, we collected 229 sequences for TSWV (including partial sequences) and 73 for all orthotospoviruses after filtering for duplicates. In the case of all orthotospovirus N proteins, we did not use the expanded set because the analysis did not converge, which could be caused by problems with the priors or by a low temporal signal (R² = 0.13). We also created a subset of TSWV sequences in which we tried to minimize the bias of overrepresentation from certain countries and from specific dates; we also removed partial sequences. The subset was guided by CD-HIT (Fu et al., 2012): we clustered the sequences coming from the same ecozone with a sequence identity cut-off of 95%, taking into consideration only the ecozones that had the largest number of sequences. After removing the bias and partial sequences from the TSWV set, 87 sequences were retained.
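The subsampling idea described above (cluster within each ecozone at 95% identity and keep one representative per cluster) can be sketched with a simplified greedy procedure. This is not CD-HIT itself: it assumes the sequences are already aligned to equal length, it uses a plain column-wise identity, and all names and example records are hypothetical.

```python
from collections import defaultdict


def identity(a, b):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(records, threshold=0.95):
    """Cluster (name, aligned_seq) records; the first name is the representative."""
    ordered = sorted(records, key=lambda r: len(r[1].replace("-", "")),
                     reverse=True)
    clusters = []  # each item: (representative sequence, member names)
    for name, seq in ordered:
        for rep_seq, members in clusters:
            if identity(seq, rep_seq) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


def representatives_by_ecozone(records, threshold=0.95):
    """records: (name, ecozone, aligned_seq); returns ecozone -> representatives."""
    by_zone = defaultdict(list)
    for name, zone, seq in records:
        by_zone[zone].append((name, seq))
    return {zone: [c[0] for c in greedy_cluster(items, threshold)]
            for zone, items in by_zone.items()}


# Invented toy records for illustration only.
toy = [("a", "Nearctic", "A" * 20), ("b", "Nearctic", "A" * 19 + "C"),
       ("c", "Nearctic", "A" * 10 + "C" * 10), ("d", "Neotropic", "A" * 20)]
print(representatives_by_ecozone(toy))
```

Processing longer sequences first means that each cluster is represented by its most complete member, which mirrors the general strategy of greedy incremental clustering.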
Alignment and assembly of segments and full genomes. All alignments used in this study were generated using MAFFT version 7 (Katoh and Standley, 2013) and visually inspected. We generated the following alignments. Firstly, the protein sequences of the five viral genes (GNGC, N, NSM, NSS, and RdRp; Fig. 1) were aligned individually. Secondly, to build the alignment of segments, we respected the order of genes in NCBI records and combined the NSM and GNGC genes into segment M and the NSS and N genes into segment S (Fig. 1). Thirdly, we generated the concatenated full viral genomes by combining the three segments in the arbitrary L-M-S order. Segment and full genome joining were performed with the FaBox alignment joiner available online (Villesen, 2007). The resulting globally aligned proteome was 5,401 amino acids long, had 966 identical sites and a pairwise identity of 62.3%.
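The joining step and the summary statistics reported above can be illustrated with a short Python sketch. It concatenates per-gene alignments in a chosen order, records the coordinates of each gene in the joined alignment, and computes the number of identical sites and a mean pairwise identity. The function names, the toy sequences and the exact identity definition (columns where both sequences have a gap are ignored) are assumptions; FaBox and Geneious may define these quantities differently.

```python
from itertools import combinations


def concatenate(alignments, order):
    """Join gene alignments (gene -> taxon -> sequence) in the given order."""
    taxa = sorted(set.intersection(*(set(alignments[g]) for g in order)))
    joined = {t: "" for t in taxa}
    partitions = []  # (gene, first column, last column), 1-based inclusive
    start = 1
    for gene in order:
        lengths = {len(alignments[gene][t]) for t in taxa}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} differ in length")
        length = lengths.pop()
        for t in taxa:
            joined[t] += alignments[gene][t]
        partitions.append((gene, start, start + length - 1))
        start += length
    return joined, partitions


def identical_sites(aln):
    """Count gap-free columns in which all sequences share one residue."""
    return sum(1 for col in zip(*aln.values())
               if len(set(col)) == 1 and col[0] != "-")


def mean_pairwise_identity(aln):
    """Average identity over all sequence pairs."""
    scores = []
    for a, b in combinations(aln.values(), 2):
        cols = [(x, y) for x, y in zip(a, b) if x != "-" or y != "-"]
        if cols:
            scores.append(sum(x == y for x, y in cols) / len(cols))
    if not scores:
        raise ValueError("no comparable sequence pairs")
    return sum(scores) / len(scores)


# Invented toy alignments for two hypothetical isolates.
toy = {"L": {"v1": "MK", "v2": "MR"}, "M": {"v1": "A-", "v2": "AG"}}
joined, parts = concatenate(toy, ["L", "M"])
print(joined, parts, identical_sites(joined), mean_pairwise_identity(joined))
```

The recorded partition coordinates are the same kind of information needed later to assign a separate substitution model to each gene in a concatenated alignment.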
Tests of phylogenetic congruence and detection of segment reassortment and recombination events. To look for heterologous segregation of segments in the origin of novel orthotospoviruses, segment trees were compared to one another (M-L, M-S and S-L) using the tanglegram function in Dendroscope version 3 (Huson and Scornavacca, 2012). Recombination analysis within genes and segments was performed using the PHI test for recombination implemented in SplitsTree4 (Huson and Bryant, 2006) on the protein and codon sequences encoded by each gene and concatenated segment. The genetic algorithm for recombination detection, GARD (Kosakovsky Pond et al., 2006), as implemented in the www.datamonkey.org server (Weaver et al., 2018), was also used to detect recombination breakpoints in the codon-aligned nucleotide sequences of the five Orthotospovirus genes.
Selection of the best models of amino acid substitutions. ProtTest 3 (Darriba et al., 2011) was used to assign the best-fitting models of protein evolution to the individually aligned protein sequences, based on the lowest Bayesian Information Criterion (BIC) score. For the aligned TSWV N sequences used for the phylogeographic analyses, we used the second best-scoring model, FLU+G, since the best-scoring one, HIVb (Korber, 2000), was not implemented in BEAST version 1.10.4 (Suchard et al., 2018).
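The selection rule described above (take the model with the lowest BIC, and fall back to the next one when the best is not available in the downstream program) can be sketched as follows, using BIC = -2 ln L + k ln n. The log-likelihoods, parameter counts, alignment length and the set of "supported" models below are invented for illustration and are not the values obtained in this study, nor an actual list of models implemented in BEAST.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion; lower values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def rank_models(fits, n_sites):
    """Sort model names by increasing BIC; fits maps name -> (lnL, k)."""
    return sorted(fits, key=lambda m: bic(fits[m][0], fits[m][1], n_sites))


def best_supported(fits, n_sites, supported):
    """Return the best-ranked model that the downstream program supports."""
    for model in rank_models(fits, n_sites):
        if model in supported:
            return model
    raise ValueError("none of the fitted models is supported")


# Hypothetical fits: model -> (log-likelihood, number of free parameters).
fits = {"HIVb+G": (-5000.0, 101), "FLU+G": (-5010.0, 101),
        "JTT+G": (-5100.0, 101)}
print(rank_models(fits, n_sites=258))
print(best_supported(fits, n_sites=258, supported={"FLU+G", "JTT+G"}))
```

Here the sample size n is taken to be the alignment length, which is a common but not universal convention.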
Phylogenetic and phylogeographic trees reconstruction. The maximum likelihood trees of the protein alignments for genes, segments and full genome sequences were generated with IQ-TREE (Nguyen et al., 2015). In the maximum likelihood tree reconstruction, we used the model assigned by ProtTest 3 and 1000 bootstrap replicates. These trees were later used as input for the program TempEst (Rambaut et al., 2016), where we examined the temporal signal and clock-likeness of our data before proceeding to the Bayesian analysis. All the trees for genes, segments, full genome and the phylogeographic analysis had a positive correlation coefficient and significant temporal signal. Values of R² ranged between 0.031 and 0.36, which again corresponded to the full genome and gene NSS, respectively. The R² values for the two datasets of TSWV and the 73 orthotospovirus N genes were in the range 0.094 to 0.22. BEAST version 1.10.4 (Suchard et al., 2018) on the CIPRES Science Gateway (www.phylo.org; Miller et al., 2010) and its associated program BEAUti were used for Bayesian analyses. All the genes, segments and full genome sequences were analyzed under the best-fitting substitution model, with site heterogeneity modeled with a Gamma distribution (with four categories) and a fraction of invariable sites, if needed. We used the uncorrelated relaxed clock, in which each branch in the tree has its own evolutionary rate. We also used the time-aware GMRF Skyride as a tree prior, because we did not have any data on the expansion or population history and it was highly unlikely that a constant population size was maintained, a chain length of 10^9, and the dates of isolation assigned to each sequence. In the case of the GNGC gene, we introduced a partition in the alignment, the first part containing the N-terminal 768 amino acids and the second consisting of the C-terminal 821 amino acids. This was done because we found a breakpoint at this position using the program GARD. Each partition had its own amino acid substitution model assigned by ProtTest 3. Segments M and S and the full genome sequences were also partitioned following the end and the beginning of each respective gene inside the segment. Thus, segment M was split into three fragments corresponding to genes NSM, GN and GC, segment S was split into genes NSS and N, and the full genomes were split into genes N, NSS, NSM, GN, GC, and RdRp. The respective models selected by ProtTest 3 for each gene were assigned to each partition since the substitution models were unlinked. We used the same parameters as stated above but with unlinked clock models for each partition, to account for different rates among branches, and with linked trees, to obtain a single tree estimated for all the data.
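The temporal-signal check mentioned above is, in essence, a regression of root-to-tip genetic distance on sampling date. The following Python sketch shows that calculation in its simplest form, for a fixed root; TempEst additionally searches for the best-fitting root, which is not reproduced here. The function name and the example dates and distances are hypothetical.

```python
import math


def root_to_tip_regression(dates, distances):
    """Regress root-to-tip distance on sampling date by least squares."""
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx                      # substitutions per site per year
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    # The x-intercept estimates the date of the root when the slope is nonzero.
    root_date = -intercept / slope if slope != 0 else float("nan")
    return {"rate": slope, "root_date": root_date, "r": r, "r_squared": r * r}


# Invented example: three tips sampled in different years.
result = root_to_tip_regression([2000, 2005, 2010], [0.01, 0.02, 0.03])
print(result)
```

A positive slope with a reasonably high R² suggests that the data are suitable for molecular clock analysis, whereas a low R² points to weak clock-like behavior or to insufficient spread in sampling dates.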
In the phylogeographic analysis of TSWV and the 73 orthotospoviruses we focused only on the N gene sequences because they greatly outnumbered the rest of the gene sequences available in NCBI. The larger number of sequences gave us higher resolution and more information from different biogeographic realms or ecozones. Assigning ecozones (Schultz, 2005) instead of countries to sequences proved to be a better choice since some countries were highly underrepresented. Furthermore, countries define political entities, while ecozones (Fig. 4 and Fig. 5) delimit areas where plants and animals were relatively isolated for long periods of time and therefore share a common evolutionary history. We modeled the sequences with phylogeographic diffusion in discrete space.
All the outputs were analyzed with Tracer 1.7.1 (Rambaut et al., 2018), where convergence and adequate effective sample size (ESS) were assessed. To generate the maximum clade credibility (MCC) tree and remove the 10% burn-in we used TreeAnnotator 1.10.4. Bayesian posterior probabilities (BPP) were used to assess the significance of each node in the MCC tree. FigTree 1.4.3 (tree.bio.ed.ac.uk/software/figtree) was used to visualize all trees and make them publication ready. SpreaD3 was used to prepare the spatiotemporal history and visualize it on a map (Bielejec et al., 2016).
Assessing the distribution of geographic origins along the MCC phylogenetic tree. Phylogeny-trait association was performed with the software package TreeBreaker (Ansari and Didelot, 2016). The traits assigned were the countries of origin from File S1 for the complete genome MCC tree. The analysis was performed on a set of 90,001,000 trees generated with BEAST after removing the 10% burn-in.
dN/dS-based selection analyses. For the analyses of selective constraints operating on the different genes, nucleotide sequences were aligned with MUSCLE (Edgar, 2004) as implemented in MEGA7 (Kumar et al., 2016) using the codon alignment option that preserves open reading frames. Two different methods were used to analyze each codon alignment and infer differences in selective pressures. The first one was the mixed effects model of evolution (MEME), which uses a mixed-effects maximum likelihood approach to test for individual sites that have been subject to episodic positive or diversifying selection (Murrell et al., 2012). The second one was the adaptive branch-site random effects likelihood (aBSREL) model, which uses branch-site models to test for positive selection on a selected proportion of branches (Smith et al., 2015). In both cases, a threshold of P = 0.05 was set for each analysis. These selection analyses were conducted in the www.datamonkey.org server (Weaver et al., 2018).
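A codon alignment that preserves open reading frames can be obtained by threading each coding sequence onto its aligned protein sequence, so that every alignment gap becomes a three-nucleotide gap. The sketch below illustrates that general idea in Python; it is an assumption that this mirrors the codon option used in MEGA7, and the function name and toy sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto an aligned protein sequence."""
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for this protein")
    # Split the coding sequence into codons; a trailing stop codon is ignored.
    codons = iter(cds[i:i + 3] for i in range(0, 3 * n_residues, 3))
    out = []
    for residue in aligned_protein:
        out.append("---" if residue == "-" else next(codons))
    return "".join(out)


# Toy example: a gap in the protein alignment becomes a codon-sized gap.
print(thread_codons("M-K", "ATGAAATAA"))
```

Because gaps are always inserted in multiples of three, every column triplet in the resulting alignment corresponds to a single codon, which is what site-level methods such as MEME require as input.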

SUPPLEMENTARY MATERIAL
Supplementary File S1 (Excel). Orthotospovirus genomic sequence and phylogeographic data.
Supplementary File S2 (Excel). Results of per-site and branch-site selection analyses.
Supplementary Fig. S1 (PDF). Dendrograms of the genomic segments generated with Dendroscope.
Supplementary Fig. S2 (PDF). MCC trees obtained for each genomic segment.
Supplementary Fig. S3 (PDF). Results from the analyses performed with TreeBreaker.
Supplementary Fig. S4 (PDF). Graphical representation of per-site selection analyses. The abscissa shows the nucleotide site and the ordinate shows the likelihood ratio test (LRT) statistic for episodic diversification. Significant sites are those with an LRT value ≥ 4.

AUTHOR CONTRIBUTIONS
AB and RG collected the data, performed all the analyses and drafted the manuscript. SFE conceived the study, supervised its execution and wrote the manuscript.

ACKNOWLEDGEMENTS
This work was supported by Spain Agencia Estatal de Investigación – FEDER grant BFU2015-65037-P to SFE. AB and RG were supported by Generalitat Valenciana grant GRISOLIAP/2018/005 and by Spain Agencia Estatal de Investigación – FEDER contract BES-2016-077078, respectively.

REFERENCES
Adams, M.J., Lefkowitz, E.J., King, A.M.Q., Harrach, B., Harrison, R.L., Knowles, N.J., Kropinski, A.M., Krupovic, M., Kuhn, J.H., Mushegian, A.R., Nibert, M., Sabanadzovic, S., Sanfaçon, H., Siddell, S.G., Simmonds, P., Varsani, A., Zerbini, F.M., Gorbalenya, A.E., Davison, A.J., 2017. Changes to taxonomy and the International Code of Virus Classification and Nomenclature ratified by the International Committee on Taxonomy of Viruses (2017). Arch. Virol. 162, 2505-2538.
Ansari, A.M., Didelot, X., 2016. Bayesian inference of the evolution of a phenotype distribution on a phylogenetic tree. Genetics 204, 89-98.
Bielejec, F., Baele, G., Vrancken, B., Suchard, M.A., Rambaut, A., Lemey, P., 2016. SpreaD3: interactive visualisation of spatiotemporal history and trait evolutionary processes. Mol. Biol. Evol. 33, 2167-2169.
Brister, J.R., Ako-Adjei, D., Bao, Y., Blinkova, O., 2015. NCBI viral genomes resource. Nucleic Acids Res. 43, D571-D577.
Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164-1165.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797.
Eifan, S., Elliott, R.M., 2009. Mutational analysis of the Bunyamwera orthobunyavirus nucleocapsid protein gene. J. Virol. 83, 11307-11317.
Fletcher, S.J., Shrestha, A., Peters, J.R., Carroll, B.J., Srinivasan, R., Pappu, H.R., Mitter, N., 2016. The Tomato spotted wilt virus genome is processed differentially in its plant host Arachis hypogaea and its thrips vector Frankliniella fusca. Front. Plant Sci. 7, 1349.
Fu, L., Niu, B., Zhu, Z., Wu, S., Li, W., 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-3152.
Gilbertson, R.L., Batuman, O., Webster, C.G., Adkins, S., 2015. Role of the insect supervectors Bemisia tabaci and Frankliniella occidentalis in the emergence and global spread of plant viruses. Annu. Rev. Virol. 2, 67-93.
Hedil, M., Kormelink, R., 2016. Viral RNA silencing suppression: the enigma of Bunyavirus NSS proteins. Viruses 8, 208.
Hull, R., 2009. Comparative Plant Virology, second edition. Elsevier Academic Press, San Diego CA, USA.
Huson, D.H., Bryant, D., 2006. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254-267.
Huson, D.H., Scornavacca, C., 2012. Dendroscope 3: An interactive tool for rooted phylogenetic trees and networks. Syst. Biol. 61, 1061-1067.
Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772-780.
Knipe, D.M., Howley, P.M., 2013. Fields Virology, sixth edition. Wolters Kluwer/Lippincott Williams & Wilkins Health, Philadelphia PA, USA.
Korber, B., 2000. HIV signature and sequence variation analysis, in: Rodrigo, G.A., Learn, G.H. (Eds.), Computational Analysis of HIV Molecular Sequences. Kluwer Academic Publishers, Dordrecht, Netherlands, pp. 55-72.
Kosakovsky Pond, S.L., Posada, D., Gravenor, M.B., Woelk, C.H., Frost, S.D., 2006. Automated phylogenetic detection of recombination using a genetic algorithm. Mol. Biol. Evol. 23, 1891-1901.
Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874.
Miller, M.A., Pfeiffer, W., Schwartz, T., 2010. Creating the CIPRES science gateway for inference of large phylogenetic trees. Proceedings of the Gateway Computing Environments Workshop (GCE), IEEE, New Orleans LA, USA, pp. 1-8.
Murrell, B., Wertheim, J.O., Moola, S., Weighill, T., Scheffler, K., Kosakovsky Pond, S.L., 2012. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8, e1002764.
Nguyen, L.-T., Schmidt, H.A., von Haeseler, A., Minh, B.Q., 2015. IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268-274.
Oliver, J.E., Whitfield, A.E., 2016. The genus Tospovirus: emerging bunyaviruses that threaten food security. Annu. Rev. Virol. 3, 101-124.
Pappu, H.R., Jones, R.A., Jain, R.K., 2009. Global status of tospovirus epidemics in diverse cropping systems: successes achieved and challenges ahead. Virus Res. 141, 219-236.
Rambaut, A., Lam, T.T., Max Carvalho, L., Pybus, O.G., 2016. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2, vew007.
Rambaut, A., Drummond, A.J., Xie, D., Baele, G., Suchard, M.A., 2018. Posterior summarisation in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901-904.
Ribeiro, D., Borst, J.W., Goldbach, R., Kormelink, R., 2009. Tomato spotted wilt virus nucleocapsid protein interacts with both viral glycoproteins GN and GC in planta. Virology 383, 121-130.
Rotenberg, D., Jacobson, A.L., Schneweis, D.J., Whitfield, A.E., 2015. Thrips transmission of tospoviruses. Curr. Opin. Virol. 15, 80-89.
Schultz, J., 2005. The Ecozones of the World, 2nd ed. Springer, New York, NY.
Shalileh, S., Ogada, P.A., Moualeu, D.P., Poehling, H.M., 2016. Manipulation of Frankliniella occidentalis (Thysanoptera: Thripidae) by Tomato spotted wilt virus (tospovirus) via the host plant nutrients to enhance its transmission and spread. Environ. Entomol. 45, 1235-1242.
Singh, P., Savithri, H., 2015. GBNV encoded movement protein (NSM) remodels ER network via C-terminal coiled coil domain. Virology 482, 133-146.
Smith, M.D., Wertheim, J.O., Weaver, S., Murrell, B., Scheffler, K., Kosakovsky Pond, S.L., 2015. Less is more: an adaptive branch-site random effects model for efficient detection of episodic diversifying selection. Mol. Biol. Evol. 32, 1342-1353.
Storms, M.M., Kormelink, R., Peters, D., Van Lent, J.W., Goldbach, R.W., 1995. The nonstructural NSM protein of Tomato spotted wilt virus induces tubular structures in plant and insect cells. Virology 214, 485-493.
Suchard, M.A., Lemey, P., Baele, G., Ayres, D.L., Drummond, A.J., Rambaut, A., 2018. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016.
Sundaraj, S., Srinivasan, R., Culbreath, A.K., Riley, D.G., Pappu, H.R., 2014. Host plant resistance against Tomato spotted wilt virus in peanut (Arachis hypogaea) and its impact on susceptibility to the virus, virus population genetics, and vector feeding behavior and survival. Phytopathology 104, 202-210.
Tsompana, M., Abad, J., Purugganan, M., Moyer, J.W., 2004. The molecular population genetics of the Tomato spotted wilt virus (TSWV) genome: Evolutionary genetics of TSWV. Mol. Ecol. 14, 53-66.
Villesen, P., 2007. FaBox: an online toolbox for FASTA sequences. Mol. Ecol. Notes 7, 965-968.
Walter, C.T., Costa Bento, D.F., Guerrero Alonso, A., Barr, J.N., 2011. Amino acid changes within the Bunyamwera virus nucleocapsid protein differentially affect the mRNA transcription and RNA replication activities of assembled ribonucleoprotein templates. J. Gen. Virol. 92, 82-84.
Weaver, S., Shank, S.D., Spielman, S.J., Li, M., Muse, S.V., Kosakovsky Pond, S.L., 2018. Datamonkey 2.0: a modern web application for characterizing selective and other evolutionary processes. Mol. Biol. Evol. 35, 773-777.

FIG 1. Schematic representation of the Orthotospovirus tri-segmented ambisense genome. The main function and length of each protein are indicated. The translation sense of each ORF is indicated by arrows.
[Figure: virion schematic (lipid envelope, GN/GC glycoproteins, RdRp, (-)RNA fragments L, M and S) and genome map. Fragment L (∼8.9 kb): RdRp, ∼3050 aas, replication. Fragment M (∼4.8 kb): NSM, ∼337 aas, cell-to-cell movement; GN/GC, ∼1245 aas, glycoproteins. Fragment S (∼2.9 kb): NSS, ∼494 aas, silencing suppressor; N, ∼300 aas, nucleoprotein.]

FIG 2. MCC tree obtained from the concatenated full genome. Phylogroups A, B, C, and D are indicated with different color boxes. Branch labels indicate BPPs.
[Figure: MCC tree of the 62 full genomes with BPP branch labels and phylogroup boxes A to D; time axis from -3500 to 0.]

FIG 3. Six MCC trees of the 20 Orthotospovirus species used in this study. The trees
were obtained from the protein sequence of each individual gene. Phylogroups A, B, C,
and D are indicated with different color boxes. Numbers above branches represent BPPs.
Viral species with more than one isolate have been collapsed (triangles). Branches of
different color indicate differences in rates of molecular evolution according to the
scale on the left. The scale below the trees represents years.
[Figure 3 image. Panels: RdRp, NSM, GN, GC, NSS, and N. Each panel carries a color scale for the median substitution rate and a time axis: RdRp, -900 to 0; NSM, -150 to 0; GN and GC, -3000 to 0; NSS, -600 to 0; N, -125 to 0. The in-figure note reads: Phylogenetic MCC trees for each of the Tospovirus genes. Branches are colored according to the substitution rate of the gene and labeled with the posterior probability. Tospovirus species are colored according to the Tospovirus group to which they belong. Axes represent years from 2017.]
FIG 4. Phylogeographic representation of the migration pattern of the 73
orthotospovirus species based on the analysis of the N gene. The colored regions
represent different ecozones: Afrotropic in dark grey, Australasia in orange,
Indomalayan in blue, Nearctic in green, Neotropical in purple, Eastern Palearctic in
red, and Western Palearctic in yellow. Arrows show the direction of the
movement: black continuous lines represent movements in the current
period and blue dashed lines represent past movements.
FIG 5. Phylogeographic representation of the migration pattern inferred for TSWV
based on the N gene. (A) Reduced dataset consisting of 87 sequences. (B) Large dataset
composed of 229 sequences. Colors represent different ecozones and lines
represent movements, as described in the legend of Fig. 4.
FIG 6. Plots of population size over time generated for TSWV. (A) Reduced dataset
consisting of 87 sequences. (B) Large dataset composed of 229 sequences. The
ordinate represents the number of lineages (on a logarithmic scale), while the
abscissa represents time from the origin. Light blue lines represent the 95% HPD
interval. The 95% HPD of the tree model root height is 26-32 for the full sequence
set and 26-39 for the reduced set.
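For readers unfamiliar with HPD intervals, the following is a minimal sketch of how a 95% highest posterior density interval can be estimated from a set of posterior samples, such as sampled root heights. It is not the authors' code or the routine of any particular phylogenetics package; the function name `hpd_interval` and the toy values are hypothetical, and a single-mode posterior is assumed.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples.

    Hypothetical helper for illustration only; assumes a unimodal posterior.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    xs = sorted(samples)
    n = len(xs)
    # Number of consecutive sorted samples the interval must cover.
    k = max(1, math.ceil(mass * n))
    # Slide a window of k consecutive sorted samples; keep the narrowest one.
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Toy posterior samples (made-up values, not data from this study).
toy_samples = [10, 1, 2, 50]
print(hpd_interval(toy_samples, mass=0.5))
```

Unlike an equal-tailed interval, the HPD interval shifts toward the region where samples are densest, which matters for skewed posteriors such as root heights.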