<<

Accepted Manuscript

Title: Identification of a - complex in Sophora alopecuroides

Author: Jahangir Heydarnejad Mehdi Kamali Hossain Massumi Anders Kvarnheden Maketalena F. Male Simona Kraberger Daisy Stainton Darren P. Martin Arvind Varsani

PII: S0168-1702(17)30009-6 DOI: http://dx.doi.org/doi:10.1016/j.virusres.2017.03.023 Reference: 97113

To appear in: Virus Research

Received date: 5-1-2017 Revised date: 15-3-2017 Accepted date: 18-3-2017

Please cite this article as: Heydarnejad, J., Kamali, M., Massumi, H., Kvarnheden, A., Male, M.F., Kraberger, S., Stainton, D., Martin, D.P., Varsani, A.,Identification of a nanovirus-alphasatellite complex in Sophora alopecuroides, Virus Research (2017), http://dx.doi.org/10.1016/j.virusres.2017.03.023

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. 1 Identification of a nanovirus-alphasatellite complex in Sophora alopecuroides

2 Jahangir Heydarnejad 1* , Mehdi Kamali 1, Hossain Massumi 1, Anders Kvarnheden 2, Maketalena 3 F. Male 3, Simona Kraberger 3,4 , Daisy Stainton 3,5 , Darren P. Martin 6 and Arvind Varsani 3,7,8*

4

5 1Department of Protection, College of Agriculture, Shahid Bahonar University of Kerman, 6 Kerman, Iran 7 2Department of Plant Biology, Uppsala BioCenter, Linnean Center of Plant Biology in Uppsala, 8 Swedish University of Agricultural Sciences, Box 7080, SE-750 07 Uppsala, Sweden 9 3School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch, 8140, 10 New Zealand 11 4Department of Microbiology, Immunology and Pathology, Colorado State University, Fort 12 Collins, CO 80523, USA 13 5School of Biological Sciences, The University of Queensland, Brisbane, Queensland 4072, 14 Australia 15 6Computational Biology Group, Institute of Infectious Disease and Molecular Medicine, 16 University of Cape Town Faculty of Health Sciences, Observatory 7925. South Africa 17 7Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of 18 Cape Town, Rondebosch, 7701, Cape Town, South Africa 19 8The Biodesign Center for Fundamental and Applied Microbiomics, Center for Evolution and 20 Medicine, School of Sciences, Arizona State University, Tempe, AZ 85287-5001, USA 21 22 *Corresponding authors 23 Dr Jahangir Heydarnejad, Department of Plant Protection, College of Agriculture, Shahid 24 Bahonar University of Kerman, Kerman, Iran, Fax: 0098 341 3222043, E-mail: 25 [email protected] Manuscript 26 Assoc Prof. Arvind Varsani, The Biodesign Center for Fundamental and Applied Microbiomics, 27 Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, 28 AZ 85287-5001, USA, Tel: +1 480 410 9366. E-mail: [email protected] 29

1

Page 1 of 23 30 Key words: Nanovirus; Sophora alopecuroides ; Iran; alphasatellite; high throughput sequencing 31

Accepted Manuscript

2

Page 2 of 23 31 Abstract

32 in the genus Nanovirus of the family generally have eight individually 33 encapsidated circular genome components and have been predominantly found infecting 34 Fabaceae in Europe, Australia, Africa and Asia. For over a decade Sophora alopecuroides 35 L. (Fabaceae) plants have been observed across Iran displaying dwarfing, yellowing, stunted 36 leaves and yellow vein banding. Using a high-throughput sequencing approach, sequences were 37 identified within one such plant that had similarities to nanovirus genome components. From this 38 plant, the nanovirus-like molecules DNA-R (n=4), DNA-C (n=2), DNA-S (n=1), DNA-M (n=1), 39 DNA-N (n=1), DNA-U1 (n=1), DNA-U2 (n=1) and DNA-U4 (n=1) were amplified, cloned and 40 sequenced. Other than for the DNA-R, these components share less than 71% identity with those 41 of other known . The four DNA-R molecules were highly diverse, sharing only 65- 42 71% identity with each other and 64-86% identity with those of other nanoviruses. In the S. 43 alopecuroides plant 14 molecules sharing 57.7-84.6% identity with previously determined 44 sequences of nanovirus-associated alphasatellites were also identified. Given the research 45 activity in the nanovirus field during the last five years coupled with high-throughput sequence 46 technologies, many more diverse nanoviruses and nanovirus-associated satellites are likely to be 47 identified.

48

Accepted Manuscript

3

Page 3 of 23 48 1. Introduction

49 The family Nanoviridae contains plant-infecting viruses with multi-component single-stranded 50 DNA (ssDNA) genomes that are individually encapsidated within isometric 17-20 nm virions. 51 The two genera within the family, Nanovirus and Babuvirus, are differentiated based on their 52 biological and genomic properties. Viruses in both genera are transmitted by aphids with 53 members of the genus Nanovirus infecting dicotyledonous host plants and those of the genus 54 Babuvirus infecting monocotyledonous plants. Also, whereas members of the genus Nanovirus 55 have eight genome components (named DNA-R, DNA-S, DNA-M, DNA-C, DNA-N, DNA-U1, 56 DNA-U2 and DNA-U4) that are between 970 and 1021 nucleotides (nts) in length, members of 57 the genus Babuvirus have six components (DNA-R, DNA-S, DNA-M, DNA-C, DNA-N, and 58 DNA-U3) that are between 1013 and 1116 nts in length (Table 1) (Vetten et al. , 2012).

59

60 There are presently eight recognized species in the genus Nanovirus (Table 1) (Abraham et al. , 61 2012; Boevink et al. , 1995; Chu and Helms, 1988; Grigoras et al. , 2014; Grigoras et al. , 2010; 62 Grigoras et al. , 2009; Katul et al. , 1998; Sano et al. , 1998), which have so far been found 63 infecting various predominantly leguminous species (Table 1). The host symptoms include 64 stunting, necrosis, leaf yellowing or reddening and leaf curling (Vetten et al. , 2012).

65

66 The DNA-R, DNA-S, DNA-M, DNA-C and DNA-N components of nanoviruses and 67 babuviruses are homologous and all encode a protein of known function. DNA-R encodes a 68 replication-associated protein (Rep) (Burns et al. , 1995; Hafner et al. , 1997; Harding et al. , 1993) 69 which is involved in replicating all canonical components. DNA-S encodes the capsid protein 70 (CP); (Wanitchakorn et al. , 1997). DNA-C encodes a cell-cycle link protein (Clink) which is 71 involved in switchingAccepted the plant host into DNA repli Manuscriptcation or S-phase to increase replication of 72 the other components (Aronson et al. , 2000; Lageix et al. , 2007; Wanitchakorn et al. , 2000). 73 DNA-M encodes a movement protein (MP) and based on cellular localisation studies DNA-N 74 encodes a putative nuclear shuttle protein (NSP) (Wanitchakorn et al. , 2000). The DNA-U1, 75 DNA-U2, and DNA-U4 components have unknown functions.

4

Page 4 of 23 76

77 The components of a nanovirus genome share two homologous regions: the common region 78 stem-loop (CR-I) and the common region II (CR-II). In addition to the canonical genome 79 components, nanoviruses have also been found associated with molecules known as 80 alphasatellites. Like the canonical DNA-R component, these molecules contain a rep -like gene, 81 however, unlike DNA-R, they are unable to trans-replicate the canonical genome components 82 with which they are associated (Horser et al. , 2001; Timchenko et al. , 1999; Timchenko et al. , 83 2000). Alphasatellites that are related to those of nanoviruses are also found associated with 84 viruses in the genera and Mastrevirus of the family (Kumar et al. , 85 2014; Zhou, 2013).

86

87 Sophora alopecuroides L. (Fabaceae) is a wild perennial herb that is widely distributed across 88 the arid and semi-arid regions of Iran and other parts of Asia (Bisby et al. , 1994). It is primarily 89 used as livestock feed but in traditional Chinese medicine it is also used to treat fever and 90 diarrhea (Song et al. , 1999; Zhao et al. , 2013). Over the past decade in Iran S. alopecuroides 91 plants have been observed throughout most parts of the country with apparent disease symptoms 92 including severe yellowing and stunting, shrunken leaves and yellow vein banding (Figure 1). In 93 this report a putative nanovirus in the genus Nanovirus recovered from S. alopecuroides is 94 described.

95

96 2. Materials and methods

97 2.1 DNA isolation, amplification of circular molecules and Illumina sequencing

98 A symptomatic Accepted sample of Sophora alopecuroides Manuscript L. (Fabaceae; Figure 1) was collected at 99 Shahid Bahonar University of Kerman (Kerman, Iran) in 2014 and total DNA was extracted 100 according to Zhang et al. (1998). Circular DNA sequences were amplified using rolling circle 101 amplification with Phi29 DNA polymerase as previously described (Shepherd et al. , 2008). The

5

Page 5 of 23 102 amplified circular DNA was sequenced on an Illumina HiSeq 2500 sequencer at Noveogene 103 (Hong Kong).

104

105 2.2 De novo assembly of Illumina sequencing reads and analysis of resulting contigs

106 The Illumina sequenced paired-end reads were de novo assembled using ABySS v1.9 (Simpson 107 et al. , 2009) assembler. Contigs of >500 nts were analyzed using BLASTn and BLASTx 108 (Altschul et al. , 1990) against a viral database to identify viral-like contigs. In the 3366 contigs 109 that were > 500 nts viral sequences were identified that shared similarities to nanovirus 110 components and alphasatellite sequences.

111

112 2.3 Recovery and characterization of circular molecules with viral and alphasatellite 113 sequences

114 Abutting primers were designed for all de novo assembled contigs that had similarities to 115 nanovirus and alphasatellite sequences in order to recover, verify and archive the DNA 116 molecules. In all but one case the overlapping primers were designed to contain a restriction 117 enzyme site (Supplementary Table 1).

118

119 These abutting primer pairs were used to PCR amplify the circular molecules from the original 120 DNA extracts with KAPA Hifi Hotstart DNA polymerase (Kapa Biosystems, USA) using the 121 following protocol: initial denaturation at 95°C for 3 min followed by 25 cycles at 98°C for 20 s, 122 60°C for 15 s, 72°C for 45 s and a final extension at 72°C for 1 min. The amplicons of ~1.0 kb 123 were resolved onAccepted a 0.7% agarose gel, gel purifiedManuscript and cloned into pJET1.2 124 (ThermoFisher, USA). The inserts of the resulting recombinant were Sanger sequenced 125 by primer walking at Macrogen Inc. (South Korea) and the contigs assembled using DNAbaser 126 v.4 (Heracle BioSoft S.R.L., Romania). The putative open reading frames were identified in the 127 circular DNA molecule sequences using ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder )

6

Page 6 of 23 128 (Sayers et al. , 2012) and all pairwise nucleotide identities were determined using SDT v1.2 129 (Muhire et al. , 2014).

130

131 The recovered sequences determined by Sanger sequencing were aligned with representative 132 nanovirus components and alphasatellite sequences from GenBank (downloaded on the 17th Oct 133 2016) using MUSCLE (Edgar, 2004). Phylogenetic trees were inferred using FastTree (Price et 134 al. , 2010) with GTR+CAT substitution model. Branches with < 80% SH-like support were 135 collapsed using TreeGraph2 (Stover and Muller, 2010). The phylogenetic trees were rooted with 136 sequences of babuviruses for DNA-R, DNA-S, DNA-C, DNA-M and DNA-N datasets and 137 midpoint rooted for DNA-U1, DNA-U2, DNA-U4 and the alphasatellite datasets. The CR-SL of 138 all nanovirus components available in GenBank along with those from this study were aligned 139 using MUSCLE (Edgar, 2004).

140

141 3. Results and discussion

142 3.1 Identification and recovery of nanovirus and associated alphasatellite molecules from 143 Sophora alopecuroides.

144 In an effort to determine the etiological agent of a severe disease affecting S. alopecuroides in 145 Iran (Figure 1), total DNA was purified from an affected plant, circular molecules were 146 amplified and the resulting DNA was analyzed by high-throughput sequencing (HTS). Among 147 de novo assembled contigs of the HTS reads nanovirus-like and nanovirus-associated 148 alphasatellite-like molecules were identified. Using abutting primers designed from these contigs 149 to amplify specific circular replicons, the amplicons for 14 alphasatellite molecules, four DNA-R 150 molecules, two DNA-CAccepted molecules and one each of DNA-ManuscriptS, DNA-M, DNA-N, DNA-U1, DNA- 151 U2 and DNA-U4 were cloned and Sanger sequenced (Figure 1). The sequences of these 26 152 circular molecules (Figure 1; Supplementary Table 1) have been deposited in GenBank as 153 accessions KX534385 to KX534410. In addition, what appear to be six defective molecules 154 (Supplementary Data 1) with an alphasatellite-like backbone and no detectable coding region 155 were cloned and sequenced. 7

Page 7 of 23 156 3.2 Sequence analysis of nanovirus-like genome components from S. alopecuroides

157 Analysis of the pairwise identities of the recovered nanovirus-like sequences from S. 158 alopecuroides indicated that in general these share <72% pairwise identities with all other known 159 nanovirus genome components, except DNA-R which shared up to 88% identity (Table 2, Figure 160 2). Nanoviruses whose CP amino acid sequence diversity is >15% and / or their overall genomes 161 share <75% pairwise identity could be classified as new species (Vetten et al. , 2012). The 162 putative genome of the new nanovirus shares 66-69% pairwise identity with other nanovirus 163 genomes and the predicted amino acid sequence of the CP of the virus share 44-54% identity 164 with CP sequences of other nanoviruses. This implies that the newly determined components are 165 likely derived from one or more novel nanovirus species. Based on the symptoms of the plant 166 from which the virus was obtained (Figure 1), we refer to this virus as Sophora yellow stunt- 167 associated virus (SYSaV) in the rest of the manuscript. It is noteworthy that the four recovered 168 DNA-R molecules share less than 76.5% identity with each other and that they therefore likely 169 each represents a different nanovirus species. DNA-R3 (KX534390) shares a higher degree of 170 identity (78 - 88% pairwise identity) with the DNA-Rs of other known nanoviruses (Table 2; 171 Figure 2) than with the other three DNA-R sequences isolated from the S. alopecuroides plant.

172 Similarly, the two DNA-C molecules from S. alopecuroides (KX534386 and KX534396) share 173 only 71% pairwise identity with one another and therefore are also likely derived from different 174 nanovirus species (Table 2; Figure 2). These molecules share 60.5-68.1 and 61.3-69.3% pairwise 175 identities with DNA-C molecules of other nanoviruses.

176 The deduced amino acid sequence of the protein encoded by the two DNA-Cs of SYSaV 177 contains the LXCXE motif that is conserved in nearly all other known nanoviruses. This protein 178 is responsible for interactions with plant retinoblastoma-like proteins and is involved in cell 179 cycle regulation (AronsonAccepted et al. , 2000; Wanitchakorn Manuscript et al. , 2000). 180 The DNA-S component (KX534385) shares between 61.7 and 67.7% pairwise identity with the 181 DNA-S components of other known nanoviruses (Table 2; Figure 3). The DNA-M (KX534387), 182 DNA-N (KX534393) and DNA-U1 (KX534394) components, respectively, share 61.2-68.0%, 183 63.9-70.4% and 59.5-70.2% pairwise identities with those of other nanoviruses, and are most 184 closely related to their homologous counterparts in subterranean clover stunt virus (SCSV) 8

Page 8 of 23 185 (Table 2; Figure 4). SYSaV-U2 (KX534395) and SYSaV-U4 (KX534392), respectively, share 186 61.5-66.4% and 58.7-66.5% pairwise identities with their counterparts in other nanoviruses 187 (Table 2; Figure 4).

188

189 3.3 Analysis of the CR-I, CR-II and identification of putative Rep recognition sequences

190 The CR-I which is conserved across all components of an individual nanovirus genome contains 191 both the origin of virion strand replication and iterated direct and inverted repeat sequences that 192 function as Rep recognition sites during the initiation and resolution of rolling circle replication 193 (Hafner et al. , 1997; Londono et al. , 2010). All of the SYSaV components contain the 194 “TAGTATTAC” nonanucleotide within the loop sequence of a likely hairpin structure that is 195 highly conserved in all nanoviruses. Rep initiates rolling circle replication by nicking this 196 nonanucleotide between the final T and A nucleotides. An alignment of the entire CR-I sequence 197 of all nanoviruses indicated that they predominantly contain iterated sequences containing the 198 trimer “TGA” (in the SYSaV components the full iterated sequence is TGACG) (Figure 5).

199

200 Pairwise identity analysis of the CR-I of all the nanovirus components reveals a diversity of 201 46.4% (Supplementary Data 2). Pairwise comparisons of the CR-I of DNA-R molecules of black 202 medic leaf roll virus (BMLV), faba bean necrotic stunt virus (FBNSV), faba bean necrotic 203 yellows virus (FBNYV), faba bean yellow leaf virus (FBYLV), milk vetch dwarf virus (MDV), 204 pea necrotic yellow dwarf virus (PNYDV), pea yellow stunt virus (PYSV) and SCSV and that of 205 their canonical DNA-C, -M, -N, -S, -U1, -U2 and U4 molecules indicate that there can be up to 206 31% diversity within a species (Supplementary Figure 1). It is noted that the CR-I of the four 207 SYSaV DNA-Rs when compared with their canonical molecules shows diversity of ~15 - 40% 208 (Supplementary FigureAccepted 1). Manuscript

209

210 The CR-II of nanoviruses (~50 nts) is relatively smaller than that found in babuviruses. A highly 211 conserved motif (CTCTGCGAAGCTATATG) was identified in the CR-II region (Figure 5). The

9

Page 9 of 23 212 CR-II of three DNA-Rs (KX534388, KX534389, KX534391) shares >95% identity and the 213 fourth one (KX534390) shares ~87% (Supplementary Data 3).

214

215 3.4 Nanovirus-associated alphasatellites and alphasatellites-like circular molecules

216 Alphasatellites are frequently associated with nanovirus infections (Vetten et al. , 2012). Unlike 217 begomovirus-alphasatellites which are ~1350 nts, the nanoviruses-associated alphastaellites are 218 ~1000 nts and have an AT rich region domain downstream of the rep gene. The nanovirus- 219 associated alphasatellites that have been so far identified all cluster together and their Reps are 220 more closely related to Reps of begomovirus-associated alphasatellites. Fourteen of these 221 molecules were identified in the examined S. alopecuroides plant (KX534397-KX534410) 222 (Figure 1, Supplementary Table 1). All but two of these molecules had the consensus 223 nonanucleotide that is generally found at alphasatellite virion-strand origins of replication 224 (TAGTATTAC). The two exceptions, KX534397 and KX534398, had a CAGTATTAC 225 sequence. The alphasatellites associated with SYSaV share between 57.7 and 99.7% pairwise 226 identities with each other and 57.7-84.6% with other nanovirus-associated alphasatellites (Figure 227 6). Specifically, molecule KX534397 shares 84% pairwise identity with alphasatellites 228 associated with FBNYV (AJ005964 and AJ132185). Analysis of the phylogeny of the 14 229 Sophora yellow stunt-associated alphasatellites revealed three well supported clades that 230 accommodate all the currently identified nanovirus-associated alphasatellites except that of 231 coconut foliar decay virus (CFDV; M29963; Figure 6). Both the pairwise identity and 232 phylogenetic analysis of the available alphasatellite molecules indicated that there are no clear 233 associations between specific groups of closely related molecules and particular nanovirus 234 species. 235 Accepted Manuscript 236 Defective molecules which are similar to the canonical genome components but with insertions 237 or deletions which likely render them as non-functional components have previously been found 238 associated with nanoviruses (Stainton et al. , 2016; Su et al. , 2003) and geminiviruses (Al 239 Rwahnih et al. , 2016; Bach and Jeske, 2014; Casado et al. , 2004; Frischmuth and Stanley, 1992; 240 Hadfield et al. , 2012; Horn et al. , 2011; Paprotka et al. , 2010; Stanley and Townsend, 1985; 10

Page 10 of 23 241 Stenger et al. , 1992; van der Walt et al. , 2009; Zaffalon et al. , 2012). Six alphasatellite-like 242 circular molecules sharing >85% nucleotide identity and with no detectable coding region 243 (Supplementary Data 1) were recovered. A blast analysis of these shows that they share 78-90% 244 identity with 24-33% coverage (mainly in the intergenic and the 3’ and 5’ termini of the rep gene 245 with milk vetch dwarf alphasatellite molecules).

246

247 4. Conclusion

248 Here, using a high-throughput sequencing approach, at least one putative novel nanovirus is 249 identified infecting S. alopecuroides , a legume that can be found growing wild throughout most 250 parts of Iran. In a single diseased plant displaying yellowing and stunting symptoms 12 distinct 251 molecules that appeared to be components of one or more nanovirus genomes and a further 14 252 distinct molecules which appear to be nanovirus-associated alphasatellites were identified. Given 253 that such a diverse set of DNA-R and alphasatellite molecules have been recovered from the 254 single S. alopecuroides plant, it is plausible that that this plant harboured a mixed infection. It 255 would be very interesting to determine whether similarly symptomatic S. alopecuroides plants 256 from elsewhere in Iran contain similar complements of canonical genome components and 257 alphasatellite molecules to those described here. Based on all the analysis of the CR-I and CR-II 258 of the DNA-Rs with other nanovirus-like components from S. alopecuroides plants, we are 259 unable to provide a high confidence assemblage of what would be a novel nanovirus genome. It 260 is highly likely, based on the four diverse DNA-Rs identified in this study and the two DNA-Cs, 261 that these may represent four novel nanoviruses adding to the other nanovirus species that have 262 so far been identified globally. 263 Accepted Manuscript 264 Acknowledgement

265 This study was supported by a grant from Shahid Bahonar University of Kerman, Kerman, Iran. 266 AV and DPM are supported by the National Science Foundation of South Africa.

267

11

Page 11 of 23 267 Figure legends and table text

268 Figure 1: a) Plant of Sophora alopecuroides (left) displaying symptoms associated with 269 infection by Sophora yellow stunt-associated virus, including severe yellowing and stunting and 270 small sized leaves, in comparison to a healthy plant (right). b) Early symptoms of yellow veins in 271 a plant of the same species. c) Linearised illustration of the DNA molecules recovered from the 272 infected Sophora alopecuroides.

273 Figure 2: Maximum likelihood phylogenetic tree and pairwise identity matrix of nanovirus 274 DNA-R molecule nucleotide sequences. The phylogenetic tree is rooted with babuvirus DNA-R 275 molecule sequences. DNA-R molecules from Sophora alopecuroides are in red bold font .

276 Figure 3: Maximum likelihood phylogenetic tree and pairwise identity matrix of nanovirus 277 DNA-S molecule nucleotide sequences. The phylogenetic tree is rooted with babuvirus DNA-S 278 molecule sequences. DNA-S molecules from Sophora alopecuroides are in red bold font.

279 Figure 4: Maximum likelihood phylogenetic trees of DNA-C, DNA-M, DNA-N, DNA-U1, 280 DNA-U2 and DNA-U4 molecule nucleotide sequences of nanoviruses. DNA-C, DNA-M and 281 DNA-N phylogenetic trees are rooted with corresponding babuvirus molecule sequences whereas 282 DNA-U1, DNA-U2 and DNA-U4 phylogenetic trees are mid-point rooted. DNA molecules from 283 Sophora alopecuroides are in red bold font.

284 Figure 5: Alignment of the CR-I (A) and CR-II (B) regions identified in the SYSaV sequences. 285 The iterons and the nonanucleotide motif in CR-I and a highly conserved motif in CR-II are 286 highlighted in grey boxes.

287 Figure 6: Maximum likelihood phylogenetic tree and pairwise identity matrix of nanovirus- 288 associated alphasatelliteAccepted molecule nucleotide seque Manuscriptnces. Sophora yellow stunt-associated 289 alphasatellites molecules from Sophora alopecuroides are in red bold font.

290 Table 1: Overview of all babuvirus and nanovirus species, including the size (nt) and 291 presence/absence of components.

12

Page 12 of 23 292 Table 2: Pairwise identities of SYSaV DNA-R, DNA-S, DNA-C, DNA-M, DNA-N, DNA-U1, 293 DNA-U2 and DNA-U4 with those of other nanoviruses.

294 Supplementary Table 1: Details of primer pairs used to recover the DNA-R, DNA-S, DNA-C, 295 DNA-M, DNA-N, DNA-U1, DNA-U2 and DNA-U4 molecules as well as alphasatellite 296 molecules from Sophora alopecuroides. The underlined regions within the primer pairs are the 297 regions which correspond to a restriction enzyme site. The GenBank accession numbers of the 298 recovered components are included.

299 Supplementary Figure 1: Analysis of the percentage pairwise diversity of the CR-I of the 300 DNA-R of BMLV, FBNSV, FBNYV, FBYLV, MDV, PNYDV, PYSV, SCSV and SYSaV with 301 that of their canonical molecules.

302 Supplementary Data 1: Nucleotide sequence file (fasta format) of defective molecules 303 recovered in this study.

304 Supplementary Data 2: Pairwise comparisons and alignment of the CR-I of all nanoviruses.

305 Supplementary Data 3: Pairwise comparisons and alignment of the CR-II of all nanoviruses.

306

307 References

308

309 Abraham, A.D., Varrelmann, M., Vetten, H.J. (2012). Three distinct nanoviruses, one of which represents 310 a new species, infect faba bean in Ethiopia. Plant Dis. 96, 1045–1053. 311 Al Rwahnih, M., Alabi, O.J., Westrick, N.M., Golino, D., Rowhani, A. (2017). Description of a novel 312 monopartite geminivirus and its defective subviral genome in grapevine. Phytopathology 103, 313 240-251. Accepted Manuscript 314 Alavinejad, E., Behjatnia, S.A.A., Izadpanah, K., Masoumi, M. (2011). Molecular detection of Faba bean 315 necrotic yellows virus in legume fields of some North, North West and South provinces of Iran. 316 In The 7th National Biotechnology Congress of I R Iran, p. 6. 317 Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990). Basic Local Alignment Search 318 Tool. J. Mol. Biol. 215, 403-410. 319 Aronson, M.N., Meyer, A.D., Györgyey, J., Katul, L., Vetten, H.J., Gronenborn, B., Timchenko, T. 320 (2000). Clink, a nanovirus-encoded protein, binds both pRB and SKP1. J. Virology 74, 2967- 321 2972.

13

Page 13 of 23 322 Bach, J., Jeske, H. (2014). Defective of beet curly top virus from long-term survivor sugar beet 323 plants. Virus Res. 183, 89-94. 324 Bisby, F.A., Buckingham, J., Harborne, J.B. (1994). Phytochemical dictionary of the Leguminosae. Vol. 325 1: Plants and their constituents: Chapman and Hall, London. 326 Boevink, P., Chu, P.W.G., Kees, P. (1995). Sequence of subterranean clover stunt virus DNA: Affinities 327 with the geminiviruses. Virology 207, 354-361. 328 Burns, T.M., Harding R.M., Dale, J.L. (1995). The genome organisation of banana bunchy top virus: 329 analysis of six ssDNA components. J. Gen. Virol. 76, 1471-1482. 330 Casado, C.G., Javier Ortiz, G., Padron, E., Bean, S.J., McKenna, R., Agbandje-McKenna, M., Boulton, 331 M.I. (2004). Isolation and characterization of subgenomic DNAs encapsidated in "single" T = 1 332 isometric particles of Maize streak virus. Virology 323, 164-171. 333 Chu, P.W.G., Helms, K. (1988). Novel virus-like particles containing circular single-stranded DNAs 334 associated with subterranean clover stunt disease. Virology 167, 38-49. 335 Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. 336 Nucleic acids Res. 32, 1792-1797. 337 Frischmuth, T., Stanley, J. (1992). Characterization of beet curly top virus subgenomic DNA localizes 338 sequences required for replication. Virology 189, 808-811. 339 Grigoras, I., Ginzo, A.I., Martin, D.P., Varsani, A., Romero, J., Mammadov, A., Huseynova, I.M., Aliyev, 340 J.A., Kheyr-Pour, A., Huss, H., Ziebell, H., Timchenko, T., Vetten, H.J., Gronenborn, B. (2014). 341 Genome diversity and evidence of recombination and reassortment in nanoviruses from Europe. 342 J. Gen. Virol. 95, 1178-1191. 343 Grigoras, I., Gronenborn, B., Vetten, H.J. (2010). First report of a nanovirus disease of pea in Germany. 344 Plant Dis. 94, 642-642. 345 Grigoras, I., Timchenko, T., Katul, L., Grande-Perez, A., Vetten, H.J., Gronenborn, B. (2009). 346 Reconstitution of authentic nanovirus from multiple cloned DNAs. J. Virol. 83, 10778-10787. 347 Hadfield, J., Thomas, J.E., Schwinghamer, M.W., Kraberger, S., Stainton, D., Dayaram, A., Parry, J.N., 348 Pande, D., Martin, D.P., Varsani, A. (2012). Molecular characterisation of dicot-infecting 349 mastreviruses from Australia. Virus Res. 166, 13-22. 350 Hafner, G.J., Stafford, M.R., Wolter, L.C., Harding, R.M., Dale, J.L. (1997). Nicking and joining activity 351 of banana bunchy top virus replication protein in vitro. J. Gen. Virol. 78, 1795-1799. 352 Harding, R.M., Burns, T.M., Hafner, G., Dietzgen, R.G., Dale, J.L. (1993). Nucleotide sequence of one 353 component of the banana bunchy top virus genome contains a putative replicase gene. J. Gen. 354 Virol. 74, 323-328. 355 Horn, J., Lauster, S., Krenz, B., Kraus, J., Frischmuth, T., Jeske, H. (2011). Ambivalent effects of 356 defective DNA in beet curly top virus-infected transgenic sugarbeet plants. Virus Res. 158, 169- 357 178. 358 Horser, C.L., Harding, R.M., Dale, J.L. (2001). Banana bunchy top nanovirus DNA-1 encodes the 'master' 359 replication initiation protein. J. Gen. Virol. 82, 459-464. 360 Katul, L., Timchenko, T., Gronenborn, B., Vetten, H.J. (1998). Ten distinct circular ssDNA components, 361 four of which encode putative replication-associated proteins, are associated with the faba bean 362 necrotic yellows virus genome. J. Gen. Virol. 79, 3101-3109. 363 Kumar, J., Kumar,Accepted J., Singh, S.P., Tuli, R. (2014). Association Manuscript of satellites with a mastrevirus in natural 364 infection: Complexity of Wheat dwarf India virus disease. J. Virol. 88, 7093-7104. 365 Lageix, S., Catrice, O., Deragon, J.M., Gronenborn, B., Pélissier, T., Ramirez, B.C. (2007). The 366 nanovirus-encoded clink protein affects plant cell cycle regulation through interaction with the 367 retinoblastoma-related protein. J. Virol. 81, 4177-4185. 368 Londono, A., Riego-Ruiz, L., Arguello-Astorga, G.R. (2010). DNA-binding specificity determinants of 369 replication proteins encoded by eukaryotic ssDNA viruses are adjacent to widely separated RCR 370 conserved motifs. Arch. Virol. 155, 1033-1046.

14

Page 14 of 23 371 Lotfipour, M., Izadpanah, K., Behjatnia, S.A.A. (2016). Identification and molecular characterization of 372 Faba bean necrotic stunt virus, a new nanovirus in legume fields in Iran. in The 22nd Iranian 373 Plant Prot Cong Karaj. Iran. 374 Makkouk, K.M., Fazlali, Y., Kumari, S.G., Farzadfar, S. (2002). First record of Beet western yellows 375 virus, Chickpea chlorotic dwarf virus, Faba bean necrotic yellows virus and Soybean dwarf virus 376 infecting chickpea and lentil crops in Iran. Plant Pathol. 51, 387. 377 Muhire, B.M., Varsani, A., Martin, D.P. (2014). SDT: a tool based on pairwise 378 sequence alignment and identity calculation. PloS one 9, e108277. 379 Paprotka, T., Boiteux, L.S., Fonseca, M.E., Resende, R.O., Jeske, H., Faria, J.C., Ribeiro, S.G. (2010). 380 Genomic diversity of sweet potato geminiviruses in a Brazilian germplasm bank. Virus Res. 149, 381 224-233. 382 Price, M.N., Dehal, P.S., Arkin, A.P. (2010). FastTree 2--approximately maximum-likelihood trees for 383 large alignments. PloS one 5, e9490. 384 Sano, Y., Wada, M., Hashimoto, Y., Matsumoto, T., Kojima, M. (1998). Sequences of ten circular 385 ssDNA components associated with the milk vetch dwarf virus genome. J. Gen. Virol. 79, 3111- 386 3118. 387 Sayers, E.W., Barrett, T., Benson, D.A., Bolton, E., Bryant, S.H., Canese, K., Chetvernin, V., Church, 388 D.M., Dicuccio, M., Federhen, S., Feolo, M., Fingerman, I.M., Geer, L.Y., Helmberg, W., 389 Kapustin, Y., Krasnov, S., Landsman, D., Lipman, D.J., Lu, Z., Madden, T.L., Madej, T., 390 Maglott, D.R., Marchler-Bauer, A., Miller, V., Karsch-Mizrachi, I., Ostell, J., Panchenko, A., 391 Phan, L., Pruitt, K.D., Schuler, G.D., Sequeira, E., Sherry, S.T., Shumway, M., Sirotkin, K., 392 Slotta, D., Souvorov, A., Starchenko, G., Tatusova, T.A., Wagner, L., Wang, Y., Wilbur, W.J., 393 Yaschenko, E., Ye, J. (2012). Database resources of the National Center for Biotechnology 394 Information. Nucleic acids Res. 40, D13-25. 395 Shepherd, D.N., Martin, D.P., Lefeuvre, P., Monjane, A.L., Owor, B.E., Rybicki, E.P., Varsani, A. 396 (2008). A protocol for the rapid isolation of full geminivirus genomes from dried plant tissue. J. 397 Virol. Methods 149, 97-102. 398 Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I. (2009). ABySS: a parallel 399 assembler for short read sequence data. Genome Res. 19, 1117-1123. 400 Song, J.Z., Xu, H.X., Tian, S.J., But, P.P. (1999). Determination of quinolizidine alkaloids in traditional 401 Chinese herbal drugs by nonaqueous capillary electrophoresis. J. Chromatogr. A 857, 303-311. 402 Stainton, D., Martin, D.P., Collings, D.A., Thomas, J.E., Varsani, A. (2016). Identification and in silico 403 characterisation of defective molecules associated with isolates of banana bunchy top virus. Arch. 404 Virol. 161, 1019-1026. 405 Stanley, J., Townsend, R. (1985). Characterisation of DNA forms associated with cassava latent virus 406 infection. Nucleic acids Res. 13, 2189-2206. 407 Stenger, D.C., Stevenson, M.C., Hormuzdi, S.G., Bisaro, D.M. (1992). A number of subgenomic DNAs 408 are produced following agroinoculation of plants with beet curly top virus. J. Gen. Virol. 73, 237- 409 242. 410 Stover, B.C., Muller, K.F. (2010). TreeGraph 2: combining and visualizing evidence from different 411 phylogenetic analyses. BMC Bioinformatics 11, 7. 412 Su, H.J., Tsao, L.Y.,Accepted Wu, M.L., Hung, T.H. (2003). Biological Manuscript and molecular categorization of strains of 413 Banana bunchy top virus. J. Phytopathol .151, 290-296. 414 Timchenko, T., De Kouchkovsky, F., Katul, L., David, C., Vetten, H.J., Gronenborn, B. (1999). A single 415 Rep protein initiates replication of multiple genome components of faba bean necrotic yellows 416 virus, a single-stranded DNA virus of plants. J. Virol. 73, 10173-10182. 417 Timchenko, T., Katul, L., Sano, Y., de Kouchkovsky, F., Vetten, H.J., Gronenborn, B. (2000). The master 418 Rep concept in nanovirus replication: Identification of missing genome components and potential 419 for natural genetic reassortment. Virology 274, 189-195.

15

Page 15 of 23 420 van der Walt, E., Rybicki, E.P., Varsani, A., Polston, J.E., Billharz, R., Donaldson, L., Monjane, A.L., 421 Martin, D.P. (2009). Rapid host adaptation by extensive recombination. J. Gen. Virol. 90, 734- 422 746. 423 Vetten, H.J., Dale, J.L., Grigoras, I., Gronenborn, B., Harding, R., Randles, J.W., Sano, Y., Thomas, J.E., 424 Timchenko, T., Yeh, H.H. (2012). Nanoviridae. In Virus Taxonomy: Ninth Report of the 425 International Committee on Taxonomy of Viruse, pp. 395-404. Edited by A.M.Q. King, M.J. 426 Adams, E.B. Carstens, E.J. Lefkowitz. Amsterdam: Elsevier, Academic Press. 427 Wanitchakorn, R., Hafner, G.J., Harding, R.M., Dale, J.L. (2000). Functional analysis of proteins encoded 428 by banana bunchy top virus DNA-4 to -6. J. Gen. Virol. 81, 299-306. 429 Wanitchakorn, R., Harding, R.M., Dale, J.L. (1997). Banana bunchy top virus DNA-3 encodes the viral 430 coat protein. Arch. Virol. 142, 1673-1680. 431 Zaffalon, V., Mukherjee, S.K., Reddy, V.S., Thompson, J.R., Tepfer, M. (2012). A survey of 432 geminiviruses and associated satellite DNAs in the cotton-growing areas of northwestern India. 433 Arch. Virol. 157, 483-495. 434 Zhang, Y.P., Uyemoto, J.K, Kirkpatrick, B.C. (1998). A small-scale procedure for extracting nucleic 435 acids from woody plants infected with various phytopathogens for PCR assay. J. Virol. Methods 436 71, 45-50. 437 Zhao, L.F., Xu, Y.J., Ma, Z.Q., Deng, Z.S. (2013). Colonization and plant growth promoting 438 characterization of endophytic Pseudomonas chlororaphis strain Zong1 isolated from Sophora 439 alopecuroides root nodules. Braz. J. Microbiol. 44, 623-631. 440 Zhou, X. (2013). Advances in understanding begomovirus satellites. Ann. Rev. Phytopathol. 51, 357-381.

441

442

443 Table 1:

Components Genus Species Acronym Host species Country Rep CP MP Clink NSP Unknown Unknown Unknown Nanovirus Subterranean clover stunt SCSV Trifolium AU 1005 998 1001 991 1002 988 - - virus subterraneum Faba bean necrotic yellows FBNYV Vicia faba ES, AZ, ET, 1003 1006 992 999 986 996 1020 - virus IR, SY, EG, MA Faba bean necrotic stunt FBNSV Phaseolus ET, IR, MA, 1003 992 980 994 981 986 984 - virus vulgaris AZ Lens culinaris Vicia sativa Pea necrotic yellow dwarf PNYDV Pisum sativum AT, GE 1002 981 988 988 993 978 985 - virus Milk vetch dwarf virus MDV Astragalus BD, CH, JP 1001 997 985 990 977 989 981 - sinicus Catharanthus roseus Dolichos lablab Glycine max Nicotina Acceptedtabacum Manuscript Pisum sativum Vicia faba Faba bean yellow leaf virus FBYLV Vicia faba ET 1002 1001 980 995 1000 990 995 - Pea yellow stunt virus PYSV Pisum sativum AT, AZ, SE 1002 976 975 971 977 970 971 - Black medic leaf roll virus BMLRV Medicago AT, AZ, SE 1008 1017 1010 1017 1021 1011 1001 - lupulina Pisum sativum Sophora yellow stunt- SYSaV Sophora IR 997 990 982 985 983 974 982 associatedvirus alopecuroides Babuvirus Banana bunchy top virus BBTV Musa spp Banana 1110 1076 1047 1017 1090 - - 1064 worldwide † Abaca bunchy top virus ABTV Musa spp PH, MY 1099 1078 1076 1013 1073 - - 1059 16

Page 16 of 23 Cardamom bushy dwarf CBDV Amomum IN 1102 1086 1083 1027 1116 - - 1088 virus subulatum 444 †Banana regions worldwide (except the Americas) 445

446 Table 2:

Percentage pairwise identity SYSaV BMLRV FBNSV FBNYV FBYLV MVDV PNYDV PYSV SCSV DNA R1 R2 R3 R4 C1 C2 R1 ------74.6 - 76.1 74.0 - 75.6 74.8 - 75.4 75.7 73.9 - 74.4 74.7 - 74.9 76.7 75.7 R2 68.9 - - - - - 68.5 - 70.2 67.6 - 68.2 65.1 - 68.8 66.5 66.0 - 67.0 65.3 - 65.7 65.9 66.3 R3 76.5 67.7 - - - - 85.8 - 88.2 83.7 - 84.4 84.2 - 85.0 83.8 84.4 - 84.9 81.8 - 81.9 81.9 78.4 R4 70.7 66.9 64.8 - - - 64.2 - 66.3 64.1 - 64.7 64.3 - 65.4 62.1 63.5 - 64.2 61.7 - 62.2 62.1 65.4 S ------62.7 - 64.6 63.0 - 65.2 62.8 - 64.5 66.9 64.3 - 66.1 63.8 - 64.2 61.7 63.0 - 67.7 N ------66.7 - 69.1 68.0 - 70.4 66.9 - 69.9 68.9 68.2 - 69.8 63.9 - 64.4 65.7 71.2 C1 - - - - - 62.5 - 65.4 65.5 - 67.7 65.8 - 68.1 60.5 65.6 - 67.5 62.0 - 62.9 62.7 64.0 C2 - - - - 71.0 - 64.4 - 65.4 67.3 - 69.3 65.3 - 67.1 63.5 64.3 - 66.9 61.3 - 61.4 64.5 68.0 M ------62.3 - 65.6 63.0 - 66.3 60.9 - 65.0 66.7 61.6 - 66.7 65.8 - 66.1 64.4 68.0 U1 ------59.5 - 61.0 65.7 - 68.0 66.4 - 70.2 59.5 68.0 - 68.9 60.0 - 61.6 64.7 63.1 U2 ------63.5 - 65.3 63.3 - 66.4 61.5 - 66.1 61.8 63.8 - 65.1 63.3 - 63.5 62.3 - 62.7 - U4 ------62.7 - 65.6 59.6 - 62.7 61.0 - 64.3 66.5 63.3 - 63.8 61.3 - 61.4 63.5 - 447

448

449

450• Use of high throughput sequencing to identify 451 novel virus and satellite molecules. 452• Detection of 12 nanovirus-like molecules associated with 453 Sophora alopecuroides. 454• 12 molecules shared less than 71% identity with those of 455 other known nanoviruses. 456• 14 alphasatellites molecules were recovered and 457 characterized. 458

459 Accepted Manuscript

17

Page 17 of 23 c

Nucleotides 1 100 200 300 400 500 600 700 800 900 1000 1046

SYSaV-R1 [KX534389] SYSaV-R2 [KX534388] SYSaV-R3 [KX534390] SYSaV-R4 [KX534391] SYSaV-S [KX534385] SYSaV-C2 [KX534396] SYSaV-C1 [KX534386] SYSaV-M [KX534387] SYSaV-N [KX534393] SYSaV-U1 [KX534394] SYSaV-U2 [KX534395]

Nanovirus components SYSaV-U4 [KX534392]

SYS α 1a [KX534399] SYS α 1b [KX534407] SYS α 2 [KX534402] SYS α 3 [KX534406] SYS α 4 [KX534408]Accepted Manuscript SYS α 5 [KX534397] SYS α 6 [KX534410] SYS α 7a [KX534400] SYS α 7b [KX534405] SYS α 7c [KX534403] SYS α 7d [KX534401] SYS α 7e [KX534404] SYS α 8 [KX534398] Nanovirus associated satellites SYS α 9 [KX534409]

Page 18 of 23 BMLRV FBNSV FBNYV R [KC979000] 100 FBNYV FBNYV R [KC979009] FBYLV FBNYV R [HE663168] 96 MDV FBNYV R [KC979035] 92 PNYDV FBNYV R [KC979019] PYSV FBNYV R [KC979027] 89 SCSV FBNYV R [GQ274025] 85 SYSaV FBNYV R [AJ132180] FBNYV R [Y11405] 81 MDV R [KX710110] Pairwise identity (%) 77 SH-like MDV R [AB027511] branch MDV R [LC094159] 73 support MDV R [LC010103] 70 FBYLV R [HE654123] >95% FBNSV R [AJ749894] 66 >90-95% FBNSV R [GU983866] 62 >80-90% FBNSV R [GQ150778] FBNSV R [KC978992] FBNSV R [GQ274033] FBNSV R [KC978982] FBNSV R [KC978974] FBNSV R [KC978966] BMLRV R [KC978949] BMLRV R [KC978939] BMLRV R [KC978958] SYSaV R [KX534390] PYSV R [KC979054] PNYDV R [GU553134] PNYDV R K[C979043] SCSV R [AJ290434] SYSaV R [KX534389] SYSaV R [KX534388] SYSaV R [KX534391]

0.1 FBNYV R [Y11405] MDV R [LC094159] MDV R [LC010103] MDV R [KX710110] MDV R [AB027511] SCSV R [AJ290434] PYSV R [KC979054] FBNYV R [AJ132180] FBNSV R [AJ749894] FBYLV R [HE654123] FBNSV R [KC978992] FBNSV R [KC978982] FBNSV R [KC978974] FBNSV R [KC978966] SYSaV R [KX534390] SYSaV R [KX534389] SYSaV R [KX534388] SYSaV R [KX534391] FBNYV R [KC979000] FBNYV R [KC979009] FBNYV R [HE663168] FBNYV R [KC979035] FBNYV R [KC979019] FBNYV R [KC979027] PNYDV R K[C979043] BMLRV R [KC978949] BMLRV R [KC978939] BMLRV R [KC978958] FBNSV R [GU983866] FBNYV R [GQ274025] FBNSV R [GQ150778] FBNSV R [GQ274033] Accepted Manuscript PNYDV R [GU553134]

Page 19 of 23 BMLRV FBNSV FBNYV S [KC979020] 100 FBNYV FBNYV S [KC979028] FBYLV FBNYV S [KC979036] 96 MDV FBNYV S [GQ274028 92 PNYDV FBNYV S [HE663169] PYSV FBNYV S [KC979010] 88 SCSV FBNYV S [AM493900] 84 SYSaV FBNYV S [AJ132183] FBNYV S [Y11408] 80 FBNYV S [AM493899] Pairwise identity (%) 77 SH-like FBNYV S [KC979001] branch FBYLV S [HE654124] 73 support FBNSV S [AJ749897] 69 FBNSV S [GQ150779] >95% FBNSV S [GU983867] 65 >90-95% FBNSV S [KC978993] 61 >80-90% FBNSV S [GQ274036] FBNSV S [KC978983] FBNSV S [KC978975] FBNSV S [KC978967] MDV S [LN890467] MDV S [AB044387] MDV S [LC094153] MDV S [LC094165] MDV S [AB009046] MDV S [LC010102] PYSV S [KC979055] PNYDV S [JN133279] PNYDV S [KC979044] SYSaV S [KX534385] SCSV S [L47332] SCSV S [U16734] BMLRV S [KC978950] BMLRV S [KC978940] 0.05 BMLRV S [KC978959] SCSV S [L47332] SCSV S [U16734] FBNYV S [Y11408] MDV S [LN890467] MDV S [LC094153] MDV S [LC094165] MDV S [LC010102] MDV S [AB044387] MDV S [AB009046] PYSV S [KC979055] FBNYV S [AJ132183] FBNSV S [AJ749897] FBYLV S [HE654124] FBNYV S [GQ274028 PNYDV S [JN133279] FBNYV S [KC979020] FBNYV S [KC979028] FBNYV S [KC979036] FBNYV S [HE663169] FBNYV S [KC979010] FBNSV S [KC978993] FBNSV S [KC978983] FBNSV S [KC978975] FBNSV S [KC978967] SYSaV S [KX534385] FBNYV S [KC979001] FBNYV S [AM493900] FBNSV S [GU983867] PNYDV S [KC979044] FBNYV S [AM493899] BMLRV S [KC978950] BMLRV S [KC978940] BMLRV S [KC978959] FBNSV S [GQ274036] Accepted Manuscript FBNSV S [GQ150779]

Page 20 of 23 DNA-C FBNYV C [KC979021] DNA-U1 FBNYV U1 [KC979024] FBNYV C [KC979029] FBNYV U1 [KC979032] FBNYV C [KC979037] FBNYV U1 [KC979040] FBNYV U1 [GQ274026] FBNYV C [GQ274023] BMLRV FBNYV C [AJ132179] FBNYV U1 [AJ132181] FBNYV C [KC979011] FBNYV U1 [HE663170] FBNSV FBNYV C [KC979002] 98 FBNYV U1 [KC979014 FBNYV FBNYV U1 [KC979005] MDV C [LN890461] FBYLV MDV C [LN890462] MDV U1 [LC094149] MDV C [AB000923] MDV U1 [LC094161] MDV MDV C [LC094148] MDV U1 LC010098] PNYDV MDV U1 [AB000924] MDV C [LC010097] PYSV MDV C [LC094160] FBYLV U1 [HE654128] FBNYV C [HE654126] FBNSV U1 [AJ749895] SCSV FBNSV C [KC978994] FBNSV U1 [GU983871] SYSaV FBNSV C [GQ150780] FBNSV U1 [GQ150783] FBNSV C [GU983868] FBNSV U1 [KC978997] FBNSV C [AJ749900] FBNSV U1 [GQ274034] FBNSV C [GQ274031] FBNSV U1 [KC978971] FBNSV C [KC978984] FBNSV U1 [KC978987] FBNSV C [KC978968] FBNSV U1 [KC978979] SH-like FBNSV C [KC978976] FBNYV U1 [AJ132184] SCSV C [U16732] PYSV U1 [KC979062] branch SYSaV C1 [KX534386] PNYDV U1 [JN133283] SYSaV C2 [KX534396] PNYDV U1 [KC979048] support BMLRV U1 [KC978954] PYSV C [KC979056] >95% PNYDV C [JN133280] BMLRV U1 [KC978944] PNYDV C [KC979045] BMLRV U1 [KC978963] >90-95% BMLRV C [KC978951] SCSV U1 [U16736] BMLRV C [KC978941] SYSaV U1 [KX534394] >80-90% BMLRV C [KC978960] 0.1 0.1

DNA-M FBNYV M [KC979022] DNA-U2 FBNSV U2 [KC978980] FBNYV M [KC979030] FBNSV U2 [KC978998] FBNYV M [KC979038] FBNSV U2 [KC978972] FBNYV M [GQ274027] FBNSV U2 [GU983872] FBNYV M [KC979018] FBNSV U2 [AJ749898] FBNYV M [KC979003] FBNSV U2 [GQ150784] FBNYV M [KC979012] FBNSV U2 [GQ274037] FBNYV M [AJ132182] FBNSV U2 [KC978988] FBNYV M [Y11407] FBNYV U2 [Y11409] FBYLV M [HE654125] FBNYV U2 [GQ274029] PYSV M [KC979057] FBNYV U2 [HE663171] PNYDV M [JN133281] FBNYV U2 [KC979006] PNYDV M [KC979046] FBNYV U2 [KC979015] FBNSV M [KC978969] FBNYV U2 [KC979025] FBNSV M [KC978977] FBNYV U2 [KC979041] FBNSV M [GQ274035] FBNYV U2 [KC979033] FBNSV M [KC978985] PNYDV U2 [JN133284] FBNSV M [KC978995] PNYDV U2 [KC979049] FBNSV M [GQ150781] PYSV U2 [KC979059] FBNSV M [GU983869] PYSV U2 [KC979060] FBNSV M [AJ749896] FBYLV U2 [HE654129] FBNSV M [AF159705] PYSV U2 [KC986815] MDV M [LN890465] FBNYV U2 [Y11406] MDV M [LN890466] MDV U2 [LC094163] MDV M [AB000927] MDV U2 [LC094151] MDV M [LC010101] MDV U2 [LC010100] MDV M [LC094152] MDV U2 [AB000926] MDV M [LC094164] BMLRV U2 [KC978964] BMLRV M [KC978952] BMLRV U2 [KC978946] BMLRV M [KC978942] BMLRV U2 [KC978955] BMLRV M [KC978961] BMLRV U2 [KC978945] SCSV [M U16730] SYSaV U2 [KX534395] SYSaV M [KX534387] 0.1 0.1

DNA-N DNA-U4 FBNYV N [KC979023] FBNSV U4 [AJ749901] FBNYV N [KC979031] FBNSV U4 [GU983873] FBNYV N [KC979039] FBNSV U4 [GQ150785] FBNYV N [AJ132186] FBNYV U4 [AF159704] FBNYV N [GQ274030] FBNSV U4 [GQ274032] FBNYV N [AM493901] FBNSV U4 [KC978973] FBNYV N [KC979013] FBNSV U4 [KC978981] Accepted FBNYV N K[C979004] Manuscript FBNSV U4 [KC978989] FBNYV N [AM493898] FBNSV U4 [KC978999] MDV N [LN890463] BMLRV U4 [KC978947] MDV N [LN890464] BMLRV U4 [KC978965] MDV N [AB000925] BMLRV U4 [KC978956] MDV N [LC094150] MDV U4 [AB255373] MDV N [LC094162] MDV U4 [LC099944] MDV N [LC010099] MDV U4 [LC010104] FBNSV N [GQ274038] MDV U4 [LC094154] FBNSV N [KC978978] PNYDV U4 [JN133285] FBNSV N [KC978970] PNYDV U4 [KC979050] FBNSV N [KC978986] PYSV U4 [KC979061] FBNSV N [KC978996] FBYLV U4 [HE654130] FBNSV N [GQ150782] FBNYV U4 AJ749902] FBNSV N [AJ749899] FBNYV U4 [AJ749903] FBNSV N [GU983870] FBNYV U4 [GQ274024] FBYLV N [HE654127] FBNYV U4 [KC979007] PYSV N [KC979058] FBNYV U4 [KC979017] PNYDV N [JN133282] FBNYV U4 [KC979016] 100 PNYDV N [KC979047] FBNYV U4 [KC979034] BMLRV N [KC978953] FBNYV U4 [KC979026] BMLRV N [KC978943] FBNYV U4 [KC979042]Page 21 of 23 BMLRV N [KC978962] SYSaV U4 [KX534392] SYSaV N [KX534393] SCSV N [U16733] 0.1 0.1 Iteron Iteron Nonanucleotide motif A SYSaV-C [KX534386] TGAC-GTCACT--ATGTATCCGGGCTGACGTG---ATCGGGGACTAGTATTACCCC-CGATCACGGGTACCAGATACA--TC SYSaV-C [KX534396] TGAC-GCGTG---ATGTATCCGGGATGACGTG---ATCGGGGTCTAGTATTACCCC-CGATCACGGGACG--GGTACA--TG CR-I SYSaV-M [KX534387] TGAC-GCGTG---ATGTATCCGGGATGACGTG---ATCGGGGTCTAGTATTACCCC-CGATCACGGGACG--GGTACA--TG SYSaV-N [KX534393] TGAGTGAAGCGTGATGTATCCGGGCTGACGTC---ACCGGGGAGTAGTATTACCCC-CGGGACGGATACATGGATAC----- SYSaV-R [KX534388] TGAC-GTC-----ATGTACCCAGGCTGACGTC---CCGGGGGCTTAGTATTACCCC-CGGGACGGGTACACGGGTAC----- SYSaV-R [KX534389] TGAC-GTCAT---CTGTTCCCGTGATGAGCTGGCGC---GGGACTAGTATTACCCCGCGCCAGAG-GACG--GGTACAATTT SYSaV-R [KX534390] TGAC-GTCAG---GTGATCCCTTGATGACGTGGGGC--GGGGCTTAGTATTACCCC-CGCCCCGGGATCA- GAGTCACC-- SYSaV-R [KX534391] TGAC------CGGGCCCTGGGG----GCT---GC-GGGGC-TAGTATTACCCC-GCAGCCCGGGGCTTTGGGGCTTTTT SYSaV-S [KX534385] TGAC-GCGTG---ATGTATCCGGGATGACGTG---ATCGGGGTCTAGTATTACCCC-CGATCACGGGACG--GGTACATG-- SYSaV-U1 [KX534394] TGAC-GTCACT--ATGTATCCGGGCTGACGTG---ATCGGGGACTAGTATTACCCC-CGATCACGGGTACCAGATACATC-- SYSaV-U2 [KX534395] TGAC-GCGTG---ATGTATCCGGGATGACGTG---ATCGGGGTCTAGTATTACCCC-CGATCACGGGACG--GGTACATG-- SYSaV-U4 [KX534392] TGAGTGACGCGTGATGTATCCGGGATGACGTG---ATCGGGGTCTAGTATTACCCC-CGATCACGGGACG--GGTACATC--

Conserved motif B SYSaV-C [KX534386] CTCTGCGAAGCTATATGTTTACA------TAGTGACTTGCA SYSaV-C [KX534396] CTCTGCGAAGCTATATGAAGAAGAAATATGTTGGGC-TTGGCCCAATAA CR-II SYSaV-M [KX534387] CTCTGCGAAGCCATATGGAGAAGGATTATATTGGGCCTTGGCCCAATAA SYSaV-N [KX534393] CTCTGCGAAGCCATATGGAGAAGGATTATATTGGGCCTTGGCCCAATAA SYSaV-R [KX534388] CTCCGCGTAGCGGTATGCTTTCCGAGGA------TAGGATATGTAA SYSaV-R [KX534389] CTCCGCGAAGCGGTATGCTTTCGGAGGA------TAGGATTGTAAT SYSaV-R [KX534390] CTCCGCGAAGCGGTATGTTTAGTGAGGA------TAGGATTGTAAC SYSaV-R [KX534391] CTCCGCGTAGCGGTATGCTTTCTGAGGA------TAGGATTGTAAT SYSaV-S [KX534385] CTCTGCGAAGCTATATGAAGAAGAAATATGTTGGGC-TTGGCCCAATAA SYSaV-U1 [KX534394] CTCTGCGAAGCTATATGTTTACA------TAGTGACTTGCA SYSaV-U2 [KX534395] CTCTGCGAAGCCATATGGAGAAGGATTATATTGGGC-TTGGCCCAATAA SYSaV-U4 [KX534392] CTCTGCGAAGCCATATGGAGAAGGATTATATTGGGC-TTGGCCCAATAA

Accepted Manuscript

Page 22 of 23 BMLRV-alphasat MDV-alphasat [LC094155] FBNSV-alphasat 100 MDV-alphasat [LC094158] FBNYV-alphasat MDV-alphasat [LC010785] 96 MDV-alphasat MDV-alphasat [AB009047] 92 PNYDV-alphasat FBNYV-alphasat [AJ005964] SCSV-alphasat FBNYV-alphasat [AJ132185] 87 SYS-alphasat SYS-alphasat [KX534397] 83 SYS-alphasat [KX534401] Pairwise identity (%) 79 SH-like SYS-alphasat [KX534404] SYS-alphasat [KX534403] 75 branch SYS-alphasat [KX534400] support SYS-alphasat [KX534405] 71 >95% MDV-alphasat [AB000922] 66 MDV-alphasat [LC010784] >90-95% MDV-alphasat [LC094147] 62 >80-90% MDV-alphasat [LC094157] 58 PNYDV-alphasat [KC979052] SYS-alphasat [KX534409] SYS-alphasat [KX534398] SCSV-alphasat [U16735] SYS-alphasat [KX534410] MDV-alphasat [AB000920] BMLRV-alphasat [KC978957] FBNYV-alphasat [KC979051] FBNSV-alphasat [KC978990] SYS-alphasat [KX534408] FBNYV-alphasat [AJ005966] FBNYV-alphasat [AJ132187] SYS-alphasat [KX534406] FBNSV-alphasat [KC978991] FBNYV-alphasat [KC979008] FBNYV-alphasat [X80879] BMLRV-alphasat [KC978948] SYS-alphasat [KX534402] SYS-alphasat [KX534399] SYS-alphasat [KX534407] SCSV-alphasat [U16731] FBNYV-alphasat [AJ005968] MDV-alphasat [AB000921] MDV-alphasat [LC010783] MDV-alphasat [LC094156] MDV-alphasat [LC094146] CFDV-alphasatAccepted [M29963] Manuscript Geminivirus alphasatellites KC978957] KC978948]

Nanovirus DNA-R SCSV -alphasat [U16735] SCSV -alphasat [U16731] FBNYV -alphasat [X80879] SYS-alphasat [KX534404] MDV -alphasat [LC010783] MDV -alphasat [LC094156] MDV -alphasat [LC094146] MDV -alphasat [LC094155] MDV -alphasat [LC094158] MDV -alphasat [LC010785] MDV -alphasat [LC010784] MDV -alphasat [LC094147] MDV -alphasat [LC094157] MDV -alphasat [AB000920] MDV -alphasat [AB000921] MDV -alphasat [AB009047] MDV -alphasat [AB000922] SYS-alphasat [KX534397] SYS-alphasat [KX534401] SYS-alphasat [KX534403] SYS-alphasat [KX534400] SYS-alphasat [KX534405] SYS-alphasat [KX534409] SYS-alphasat [KX534398] SYS-alphasat [KX534410] SYS-alphasat [KX534408] SYS-alphasat [KX534406] SYS-alphasat [KX534402] SYS-alphasat [KX534399] SYS-alphasat [KX534407] FBNYV -alphasat [AJ005966] FBNYV -alphasat [AJ132187] FBNYV -alphasat [AJ005968] FBNYV -alphasat [AJ005964] FBNYV -alphasat [AJ132185] BMLRV-alphasat [ BMLRV-alphasat [ FBNYV -alphasat [KC979051] FBNSV -alphasat [KC978990] FBNSV -alphasat [KC978991] FBNYV -alphasat [KC979008]

PNYDV - alphasat [KC979052]

0.2 Page 23 of 23