Appearances Can Be Deceptive: Revealing a Hidden Viral Infection with Deep Sequencing in a Plant Quarantine Context

Thierry Candresse1,2, Denis Filloux3, Brejnev Muhire4, Charlotte Julian3, Serge Galzi3, Guillaume Fort3, Pauline Bernardo3, Jean-Heindrich Daugrois3, Emmanuel Fernandez3, Darren P. Martin4, Arvind Varsani5,6,7, Philippe Roumagnac3* 1 INRA, UMR 1332 Biologie du Fruit et Pathologie, CS 20032, 33882 Villenave d’Ornon Cedex, France, 2 Universite´ de Bordeaux, UMR 1332 Biologie du Fruit et Pathologie, CS 20032, 33882 Villenave d’Ornon Cedex, France, 3 CIRAD, UMR BGPI, Campus International de Montferrier-Baillarguet, 34398 Montpellier Cedex-5, France, 4 Computational Biology Group, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa, 5 School of Biological Sciences and Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand, 6 Department of Plant Pathology and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America, 7 Electron Microscope Unit, Division of Medical Biochemistry, Department of Clinical Laboratory Sciences, University of Cape Town, Observatory, South Africa

Abstract Comprehensive inventories of plant viral diversity are essential for effective quarantine and sanitation efforts. The safety of regulated plant material exchanges presently relies heavily on techniques such as PCR or nucleic acid hybridisation, which are only suited to the detection and characterisation of specific, well characterised pathogens. Here, we demonstrate the utility of sequence-independent next generation sequencing (NGS) of both -derived small interfering RNAs (siRNAs) and virion-associated nucleic acids (VANA) for the detailed identification and characterisation of infecting two quarantined sugarcane plants. Both plants originated from Egypt and were known to be infected with Sugarcane streak Egypt Virus (SSEV; Mastrevirus, Family ), but were revealed by the NGS approaches to also be infected by a second highly divergent mastrevirus, here named Sugarcane white streak Virus (SWSV). This novel virus had escaped detection by all routine quarantine detection assays and was found to also be present in sugarcane plants originating from Sudan. Complete SWSV genomes were cloned and sequenced from six plants and all were found to share .91% genome- wide identity. With the exception of two SWSV variants, which potentially express unusually large RepA proteins, the SWSV isolates display genome characteristics very typical to those of all other previously described mastreviruses. An analysis of virus-derived siRNAs for SWSV and SSEV showed them to be strongly influenced by secondary structures within both genomic single stranded DNA and mRNA transcripts. In addition, the distribution of siRNA size frequencies indicates that these mastreviruses are likely subject to both transcriptional and post-transcriptional gene silencing. Our study stresses the potential advantages of NGS-based virus metagenomic screening in a plant quarantine setting and indicates that such techniques could dramatically reduce the numbers of non-intercepted virus pathogens passing through plant quarantine stations.

Citation: Candresse T, Filloux D, Muhire B, Julian C, Galzi S, et al. (2014) Appearances Can Be Deceptive: Revealing a Hidden Viral Infection with Deep Sequencing in a Plant Quarantine Context. PLoS ONE 9(7): e102945. doi:10.1371/journal.pone.0102945 Editor: Hanu Pappu, Washington State University, United States of America Received January 31, 2014; Accepted June 24, 2014; Published July 25, 2014 Copyright: ß 2014 Candresse et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by the ANR French National Research Agency (Netbiome Program, SafePGR Project). AV and DPM are supported by the National Research Foundation of South Africa. BM is funded by the University of Cape Town. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * Email: [email protected]

Introduction The major challenge of using classical nucleic acid sequence- informed detection tools such as polymerase chain reaction (PCR) When attempting to prevent the spread of plant diseases, or Southern hybridisation assays, is that despite being highly comprehensive inventories of viral diversity are fundamental both sensitive, these techniques are generally either species or, at best, for effective quarantine and sanitation efforts, and to ensure that genus-specific. In addition, such tools lack the capacity to detect, plant materials within biological resource centres (BRCs) can be let alone identify, pathogens that are unknown, poorly character- safely distributed [1,2]. Detection of pathogens is one of the most ized or highly variable. Although it might be argued that the most critical quarantine and BRC operations. Ideally, the tools used for economically important pathogens tend to be well characterized this purpose must be both sensitive enough to accurately detect the and that it is therefore not a serious issue that many of the more presence of even extremely low amounts of pathogen nucleic acids obscure pathogens go undetected, it is becoming better appreci- or proteins, and provide sufficient specific information to identify ated that the ‘‘importance’’ of any particular pathogenic microbe the genetic variants/strains of whatever pathogens are present. is very difficult to define. Specifically, the environmental and

PLOS ONE | www.plosone.org 1 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context economic impacts of a particular pathogen can vary widely with Materials and Methods varying climatic and ecological conditions and there are large numbers of microbes that are presently not classified as pathogens Plant material and sugarcane quarantine DNAs collection (or at least which are not noticeably pathogenic to humans or to Leaves presenting typical symptoms of sugarcane streak disease domesticated plants and animals), which will eventually emerge as were sampled from two sugarcane plants that had previously been important future pathogens [3]. Also, since non-domesticated found to be infected with Sugarcane streak Egypt virus (SSEV) plant and animal species and countless numbers of microbes which and had been kept in a quarantine greenhouse at the CIRAD contribute to natural terrestrial ecosystems [4–6] can also Sugarcane Quarantine Station, in Montpellier, France. The two potentially be threatened by exotic pathogens, the unconstrained sugarcane plants, VARX and USDA (which was initially global dissemination of apparently harmless fungi, viruses and maintained at USDA-APHIS Plant Germplasm Quarantine bacteria could have serious environmental and economic impacts. before being transferred to CIRAD in 2007), were both initially Whereas ‘‘sequence-dependent’’ microbial detection methods, collected in Egypt during two independent sampling surveys in the which are generally based on PCR or nucleic acid hybridisation, late 1990s [25,26]. These sampling surveys were both carried out can only be used to target known pathogens, sequence-indepen- on experimental stations and commercial lands in close collabo- dent next generation sequencing (NGS) based approaches can ration with Egyptian authorities (Sugar Crop Research Institute potentially provide an ideal platform for identifying almost all (SCRI), Dr Abdel Wahab I. Allam (Director of SCRI) regarding known and unknown microbes present in any particular host VARX; and Agricultural Genetic Engineering Research Institute, organism [5,7–9]. Coupled with innovative sample processing Dr N.A. Abdallah, and Dr M.A. Madkour regarding USDA). In procedures, ‘‘metagenomics’’ applications of NGS [10] have addition, leaves from six sugarcane plants originating from Sudan already enabled the identification of novel pathogens through (B0065, B0067, B0069, D0002, D0003 and D0005, Table S1) and the rapid and comprehensive characterization of microbial strains maintained at the CIRAD Sugarcane Quarantine Station were also used (Material Transfer Agreements between CIRAD and and isolates within environmental and host tissue samples [9,11]. Kenana Sugar Co. Ltd). DNAs from these six plants were In addition to numerous applications in the study of animal extracted using the DNeasy Plant Mini Kit (Qiagen). In addition, infecting viruses, NGS-based metagenomics approaches have also DNA was extracted from an additional 18 frozen leaf samples (2 been used to detect plant infecting viruses [12]. Three main classes 20uC), including 17 samples originating from Sudanese sugarcane of nucleic-acids have been targeted by such analyses: (1) virion- plants, which had passed through the Montpellier Quarantine associated nucleic acids (VANA) purified from viral particles station between 2000 and 2009 and one which had been obtained [13,14]; (2) double-stranded RNAs (dsRNA) [15]; and (3) virus- from a sugarcane seedling grown from sugarcane true seeds [fuzz] derived small interfering RNAs (siRNAs) [16]. Large numbers of developed in Guadeloupe from a biparental cross involving plants both known and new plant and fungus infecting DNA and RNA H70-6957 and B86-049 using the DNeasy Plant Mini Kit (Table viruses and viroids have been detected using these approaches S1). [12,17–21]. A major shortcoming of these metagenomic approaches is, VANA extraction from viral particles, cDNA amplification however, that they remain technically cumbersome and too expensive for routine diagnostic applications on collections of and sequencing eukaryotic hosts - even if barcoded primers are used to bulk- One gram of leaf material from the VARX and USDA plants sequence pooled samples from multiple sources [15]. Although were ground in Hanks’ buffered salt solution (HBSS) (1:10) with prohibitive for high throughput diagnostics, the costs of NGS in four ceramic beads (MP Biomedicals, USA) using a tissue the context of viral diversity research are often offset by the vast homogeniser (MP biomedicals, USA). The homogenised plant 6 volumes of useful data that can be generated on viral population extracts were centrifuged at 3,200 g for 5 min and 6 ml of the 6 dynamics, co-infections, mutation frequencies and genetic recom- supernatants were further centrifuged at 8,228 g for 3 min. The resulting supernatants were then filtered through a 0.45 mm sterile bination [22–24]. syringe filter. The filtrate was then centrifuged at 148,0006g for Here we describe the application of siRNA- and VANA- 2.5 hrs at 4uC to concentrate viral particles. The resulting pellet targeted NGS approaches to the analyses of two Egyptian was resuspended overnight at 4uC in 200 ml of HBSS. Non- sugarcane plants maintained for a number of years at the CIRAD encapsidated nucleic acids were eliminated by adding 15 U of Sugarcane Quarantine Station in Montpellier, France. These bovine pancreas DNase I (Euromedex) and 1.9 U of bovine plants were both known to be infected with Sugarcane streak pancreas RNase A (Euromedex, France) followed by incubation at Egypt Virus (SSEV; Family Geminiviridae, Genus Mastrevirus) 37uC for 90 min. Total nucleic acids were finally extracted from and were maintained for use as positive controls during the virions using a NucleoSpin 96 Virus Core Kit (Macherey-Nagel, application of diagnostic tools for SSEV detection in sugarcane Germany) following the manufacturer’s protocol. The amplifica- plants passing through the quarantine station. Using the siRNA- tion of extracted nucleic acids was performed as described by and VANA-targeted NGS approaches, we discovered and Victoria et al. [14] and aimed at detecting both RNA and DNA characterized a novel highly divergent mastrevirus from these viruses. Reverse transcriptase priming and amplification of nucleic two plants. This novel virus was also identified in other sugarcane acids were used for detecting RNA viruses. A Klenow Fragment plants originating from Sudan that exhibited white spots on the step was included in the protocol in order to detect DNA viruses as base of their leaf blades that become fused laterally, so as to appear demonstrated by Froussard [27]. Briefly, viral cDNA synthesis was as chlorotic stripes. Accordingly, we have proposed naming this performed by incubation of 10 ml of extracted viral nucleic acids virus Sugarcane white streak Virus (SWSV). In addition, we with 100 pmol of primer DoDec (59-CCT TCG GAT CCT CCN present a detailed analysis of siRNAs derived from the SWSV and NNN NNN NNN NN-39)at85uC for 2 min. The mixture was SSEV variants infecting the two analysed sugarcane plants. immediately placed on ice. Subsequently, 10 mM dithiothreitol, 1 mM of each deoxynucloside triphosphate (dNTP), 4 mlof56 Superscript buffer, and 5 U of SuperScript III (Invitrogen, USA) were added to the mixture (final volume of 20 ml), which was then

PLOS ONE | www.plosone.org 2 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context incubated at 25uC for 10 min, followed by 42uC incubation for 59-GCT GAA ACC TAT GGC AAA GA-39 and SWSV-R1 60 min and 70uC incubation for 5 min before being placed on ice reverse primer 59-AGC CTC TCT ACA TCC TTT GC-39; and for 2 min. cDNAs were purified using the QiaQuick PCR cleanup pair2 ECORI-1F forward primer 59-GAA TTC CCA GAG CGT kit (Qiagen). Priming and extension was then performed using GGT A-39 and ECORI-2R reverse primer 59-GAG TTG AAT Large (Klenow) Fragment DNA polymerase (Promega). First, TCC GGT ACC AAG GAC-39) were complementary to 20 ml of cDNA in the presence of 4.8 mM of primer DoDec were sequences within the rep gene of SWSV. Total DNAs from the heated to 95uC for 2 min and then cooled to 4uC. 2.5 U of two sugarcane plants described above (VARX and USDA) were Klenow Fragment, 10X Klenow reaction buffer and 0.4 mM of extracted using the DNeasy Plant Mini Kit (Qiagen) and screened each dNTP (final volume of 25 ml) were added. The mixture was for SWSV using the two pairs of primers and GoTaq Hot Start incubated at 37uC for 60 min followed by 75uC enzyme heat Master Mix (Promega) following the manufacturer’s protocol. inactivation for 10 min. PCR amplification was carried out using Amplification conditions consisted of an initial denaturation at 5 ml of the reaction described above in a 20 ml reaction containing 95uC for 2 min, 35 cycles at 94uC for 10 sec, 55uC for 30 sec, 2 mM primer (LinkerMid50 primer for VARX: 59-ATC GTA 68uC for 3 min, and a final extension step at 68uC for 10 min. GCA GCC TTC GGA TCC TCC-39 and LinkerMid52 primer Amplification products of ,2.8 Kbp were gel purified, ligated to for USDA: 59-ATG TGT CTA GCC TTC GGA TCC TCC-39), pGEM-T (Promega) and sequenced by standard Sanger sequenc- and 10 ml of HotStarTaq Plus Master Mix Kit (Qiagen). The ing using a primer walking approach. following cycling conditions were used: one cycle of 95uC Reverse transcriptase priming and amplification of nucleic acids for 5 min, five cycles of 95uC for 1 min, 50uC for 1 min, 72uC were carried out in order to detect the intron of the rep gene. Total for 1.5 min, 35 cycles of 95uC for 30 sec, 50uC for 30 sec, 72uC for RNAs from VARX were extracted using the RNeasy Plant Mini 1.5 min +2 sec at each cycle. An additional final extension for Kit (Qiagen). DNase treatment of extracted RNAs was carried out 10 min at 72uC was then performed. DNA products were pooled using RQ1 RNase-Free DNase (Promega) following the manufac- (VARX and USDA products and 94 additional products obtained turer’s protocol. Viral cDNA synthesis was performed by from other quarantine samples), cleaned up using the Wizard SV incubation of 1 ml of DNase treated RNAs with 15 ml of RNase Gel and PCR Clean-Up System (Promega) and sequenced on 1/ free water, 0.6 mM of each primers (SWSV_F2: 59-ACC ATG 8th of a 454 pyrosequencing plate using GS FLX Titanium TGC TGC CAG TAA TT-39 and ECORI-2R: 59-GAG TTG reagents (Beckman Coulter Cogenics, USA). AAT TCC GGT ACC AAG GAC-39), and 0.4 mM of mixed deoxynucloside triphosphate (dNTPs), 5 ml of 5X Qiagen OneStep siRNA extraction and sequencing RT-PCR Buffer and 1 ml of Qiagen OneStep RT-PCR Enzyme The nucleic acid extraction and sequencing approach of Kreuze Mix. Tubes were first placed at 50uC for 30 min for cDNA et al. [16] was used with slight modifications. Total RNAs were synthesis. PCR amplification was then carried out using the extracted from 100mg of VARX fresh leaf material using Trizol following cycling conditions: One cycle of 95uC for 15 min, 35 (Invitrogen) following the manufacturer’s instructions. Small RNA cycles of 94uC for 1 min, 55uC for 1 min, 72uC for 1 min. An libraries were directly generated from total RNAs. Small RNAs additional final extension for 10 min at 72uC was then performed. ligated with 39 and 59 adapters were reverse transcribed and PCR Amplification products were gel purified, ligated to pGEM-T amplified (30 sec at 98uC; [10 sec at 98uC, 30 sec at 60uC, 15 sec (Promega) and sequenced by standard Sanger sequencing. at 72uC] 613 cycles; 10 min at 72uC) to create cDNA libraries selectively enriched in fragments having adapter molecules at both ends. The last step was an acrylamide gel purification of the 140– PCR detection tests 150 nt amplified cDNA constructs (corresponding to cDNA inserts DNAs extracted from 17 sugarcane plants originating from from siRNAs +120 nt from the adapters). Small RNA libraries Sudan kept at 220uC or six freshly extracted from plants were checked for quality and quantified using a 2100 Bioanalyzer maintained at the CIRAD Sugarcane Quarantine Station were (Agilent). The library was then sequenced on one lane of a HiSeq screened for SWSV. DNA extracted from one sugarcane seedling Illumina as single-end 50 base reads. grown from true seeds (fuzz) was also screened for SWSV. PCR amplification was carried out using the two pairs of primers Sequence assembly described above (SWSV_F1 and SWSV_R1; ECORI-1F and Analyses of reads produced by either Illumina (siRNA ECORI-2R) using GoTaq Hot Start Master Mix (Promega) sequencing) or 454 GS FLX Titanium (Amplified-VANA following the manufacturer’s protocol. Amplification products of sequencing) were performed using CLC Genomics Workbench ,2.8 Kbp were gel purified, ligated to pGEM-T (Promega) and 5.15. De novo assemblies of contigs were performed with a sequenced as described above. Plants infected with SWSV were minimal contig size set at 100 bp and 200 bp for Illumina and 454 also screened for all known sugarcane-infecting mastreviruses:Su- GS FLX Titanium reads, respectively. A posteriori mapping of garcane streak Egypt Virus, Sugarcane streak virus, Maize streak reads against the complete genomes of SWSV (once the full virus, Sugarcane streak Reunion virus, Eragrostis streak virus and genome had been cloned and sequenced) or SSEV or against parts Saccharum streak virus. PCR amplification was carried out using of these genomes were also performed using CLC Genomics 1 ml of DNA template in a 25 ml reaction containing 0.2 mMof Workbench 5.15. Primary sequence outputs have been deposited each broad spectrum primer (SSV_1732F: 59-CAR TCV ACR in the sequence read archive of GenBank (accession numbers: TTR TTY TGC CAG TA-39 and SSV_2176R: 59-GAR TAC VANA_USDA dataset: SRR1207274; VANA_VARX dataset: CTY TCH ATG MTH CAG A-39) and GoTaq Hot Start Master SRR1207275; siRNA_VARX dataset: SRR1207277). Mix (Promega) following the manufacturer’s protocol. The following cycling conditions were used: One cycle of 95uC for SWSV genome amplification, cloning and sequencing 2 min, 35 cycles of 94uC for 1 min, 53uC for 1 min, 72uC for Two partially overlapping SWSV specific PCR primer pairs 1 min. An additional final extension for 10 min at 72uC was then were designed so as to avoid any potential cross-hybridization to performed. 63 representative species of the family Geminiviridae, including SSEV. These two primer pairs (pair1: SWSV_F1 forward primer

PLOS ONE | www.plosone.org 3 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Sequence analyses positive (siRNAs tended to occur at structured sites) or negative Six complete genomes of the novel mastrevirus were recovered (siRNAs tended to occur outside of structured sites). from plants VARX, USDA, A0037, B0069, D0005 and E0144 (Table S1) and were aligned with the genomes of representative Results mastreviruses using MUSCLE (with default settings) [28]. Similarly, the predicted replication associated protein (Rep) and 454-based sequencing of VANA from the VARX and protein (CP) amino acid sequences encoded by the viruses USDA sugarcane samples within the full-genome dataset were also aligned using MUSCLE. This approach was used in an attempt to detect both RNA and Maximum likelihood phylogenetic trees were inferred for the full DNA viruses that may be present in the two sugarcane plants [27]. genomes (TN93+G+I nucleotide substitution model chosen as the A total of 2612 and 1635 reads were respectively obtained from best-fit using jModelTest [29]), Rep (WAG+G+F amino acid the VARX and USDA plants following length and quality substitution model chosen as the best-fit using ProtTest [30]) and filtering. One hundred and eight and 18 contigs were produced CP (rtREV+G+F amino acid substitution model chosen as the by de novo assembly from the VARX- and USDA-derived reads, best-fit using ProtTest) datasets with PHYML [31]. Approximate respectively. Two contigs from the VARX plant (2706 nt and likelihood ratio tests (aLRT) were used to infer relative supports for 412 nt) and two from the USDA plant (2706 nt and 649 nt), branches (with branches having ,80% support being collapsed). encoded proteins with between 91 and 100% sequence identity All pairwise identity analysis of the full genome nucleotide with previously described SSEV proteins (Table1). BLASTx sequences, capsid protein (CP) amino acid sequences, replication analysis revealed that an additional two contigs from the VARX associated protein (Rep) amino acid sequences and movement plant (2122 and 127 nt) and three contigs from the USDA plant protein (MP) amino acid sequences were carried out using the (1836, 196 and 312 nt) were homologous with known mastre- MUSCLE-based pairwise alignment and identity calculation viruses but were nevertheless only distantly related to mastrevirus approach implemented in SDT v1.0 [32]. The full genome sequences currently deposited in GenBank (Table1). sequence alignment of representative mastrevirus genome se- A posteriori mapping of VANA 454 reads obtained from the quences together with SWSV was used to detect evidence of VARX and USDA plants against the complete SWSV genome recombination in SWSV using RDP 4.24 with default settings (see below), revealed that 23.9% (625/2612) and 16.1% (264/ [33]. Sequences are deposited in GenBank under accession 1635) of the total reads were derived from this genome and that numbers (SWSV-A [SD-VARX-2013] - KJ187746; SWSV-A these yielded complete genome coverage at an average depth of [SD -USDA-2013] - KJ187745; SWSV-B [SD -B0069-2013] – 81X and 29X, respectively. Interestingly, a ,120 nt long region of KJ210622; SWSV-B [SD -D0005-2013] - KJ187747; SWSV-B very low coverage (,4X) was identified, which mapped to the [SD -E0144-2013] - KJ187748 and SWSV-C [SD -A0037-2013] - large intergenic region (LIR) of the SWSV genome (Figure 1). KJ187749). A mapping analysis performed with the genome of SSEV indicated that the corresponding values were 53.5% of reads Test for associations between siRNAs and SWSV/SSEV (1398/2612, 159X average coverage depth) and 75% of reads genomic and transcript secondary structures (1227/1635, 138X coverage) for the VARX and USDA plants, The SWSV/SSEV full genome sequences and predicted respectively (Figure S1). unspliced complementary and virion strand transcripts were separately folded using Nucleic Acid Secondary Structure siRNA Illumina sequencing from the VARX sugarcane Predictor [34], with the sequence conformation set as circular plant DNA, at a temperature of 25uC. NASP generates a list of all A total of 15,275,640 raw reads were generated from the secondary structures detectable within given DNA or RNA VARX sugarcane sample, which were then filtered down to sequences and through simulations it demarcates a set of structures 3,945,108 high quality reads in the 21 to 24 nt size range of referred to as a ‘‘high confidence structure set’’ (HCSS), that siRNAs. From these reads, 226 contigs were obtained by de novo confers a higher degree of thermodynamic stability (lower free assembly, six of which showed significant degrees of similarity to energy) to the sequences than what would be expected to be mastreviruses based on BLASTx [35] searches (Table2). Of these achievable by randomly generated sequences with the same base six contigs, two (contigs #121 and #176) had a high degree of composition (with a p, = 0.05). identity to SSEV while the remaining four were more distantly Given the genomic coordinates of pairing nucleotides within the related to known mastreviruses. Three of these four contigs HCSS, we investigated whether there was any significant trend for (contigs #44, #86 and #101) apparently corresponded with a more reads (looking both at all reads collectively and at the 21 nt, mastrevirus capsid protein (CP) gene and the other one (contigs 22 nt, 23 nt and 24 nt long reads separately) occurring within #79) with a movement protein (MP) gene, while the cumulative secondary structures predicted to occur within (i) the full genomes, contig length of 761 bp corresponded to slightly more than a (ii) the virion-strand transcripts and (iii) the complementary-strand quarter of a typical mastrevirus genome (Table2). transcripts. The reads were mapped to the secondary structures Following the cloning and sequencing of the full genome of the and we counted how many nucleotides were located at paired and new mastrevirus (SWSV; see below) it was determined that 0.59% unpaired sites. While Kolmogorov-Smirnov tests (implemented in of the Illumina reads obtained from the VARX plant could be R; www.r-project.org) were used to determine whether the mapped to this genome (Figure 1) to generate contigs that covered distribution of reads between paired and unpaired sites were 96.3% of the genome at an average depth of 185X with only seven different, Wilcoxon rank-sum tests (also implemented in R, www.r- gaps of between three and 40 nucleotides. These gaps were located project.org) indicated whether there were significantly more reads within the large intergenic region (three gaps) and within the at paired sites compared to unpaired sites and vice versa. Whereas probable replication associated protein (Rep) gene (four gaps; the Kolmogorov-Smirnov tests were used to indicate whether any Figure 1) encoded by the C1 ORF. It is noteworthy that the associations existed between siRNA locations and base pairing ,120 nt long region of very low coverage (,4X) identified using within nucleic acid secondary structures, the Wilcoxon rank-sum the VANA approach mapped to the same part of the LIR region tests were used to determine whether detected associations were of the SWSV genome that remained uncovered during the

PLOS ONE | www.plosone.org 4 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Illumina-based siRNA sequencing (Figure 1). As has been previously observed for other viruses, genome coverage was highly heterogeneous (Figure 1). However, a clear general trend could be observed, with the region corresponding to the virion sense V1 and V2 ORFs (encoding CP and MP proteins, respectively), showing an average coverage depth of ,436X and the comple- mentary sense C1 ORF showing an average coverage of only ,38X. Coverage of the non-coding large and small intergenic regions and the presumed C1 ORF intron were even lower at 17.5X and 6.8X, respectively. It is also noteworthy that besides differences in coverage depth, these various genomic regions of SWSV also showed differences in the siRNA size classes that they yielded. While there was an enrichment of the 21 and 22 nt siRNA size classes amongst the total siRNA reads mapping to the V1 and V2 ORFs, there was a depletion of the 21 nt siRNA size classes and an enrichment of the 24 nt size class amongst total siRNA reads mapping to the C1 ORF (Figure 2). The LIR and, to a lesser extent, the SIR showed a pattern similar to the C1 ORF region (data not shown). The C1 intron, however, had an extreme over-representation of the 24 nt size class with the other size classes being either nearly (22 nt) or

iate ), MSV (). totally (21 and 23 nt) absent (Figure 2). Since the VARX plant was also infected with SSEV, a similar analysis of SSEV-derived siRNAs was performed. Mapping against the genome of SSEV (NC_001868) demonstrated that 0.17% of total reads (6572) were derived from it and that these reads covered 98.6% of the SSEV genome at an average depth of 55X, leaving only 4 gaps of between 5 and 15 nucleotides (Figure S1). Although showing some high degrees of local heterogeneity, genome coverage of SSEV was less biased when comparing the different genomic regions. Nevertheless a similar trend to that associated with SWSV was observed with a higher depth of contigs from sugarcane plants VARX and USDA with detectable homology to mastreviral coverage for the virion sense V1–V2 ORFs (76.5X) than for both the complementary sense C1 ORF (37.6X) and the non-coding regions (46X). Also, as for SWSV, the 21–22 nt siRNA size classes de novo were enriched amongst those mapping to the virion sense ORFs and the 24 nt, siRNA size class was enriched amongst those mapping to the complementary sense C1 ORF (Figure S1). However, unlike for SWSV, no strong siRNA size-class biases were observed for the non-coding regions (data not shown). By collectively using the Illumina siRNA reads and the 454 VANA reads it was possible to assemble a single genome of the novel mastrevirus from both the VARX and USDA plants. SWSV

Associations between siRNAs and SWSV/SSEV genomic and transcript secondary structures It has been previously determined that nucleic acid structures can have an appreciable impact on both the distribution of siRNA targets [36,37], and the operational efficiency of small RNA mediated anti-viral and anti-viroid defences [37,38]. We detected strong evidence for the presence of ssDNA secondary structures in both the SWSV (30 high confidence structure set (HCSS) identified) and SSEV (29 HCSS structures identified) genomes (Table 3). The distributions of the HCSS structural elements were, however, different in the predicted virion and complementary strand transcripts of the two viruses, with only two HCCS 1237 2706 2122 412 127 1387 470 11 1 SSEV (NP_045945) DDSMV (YP_003915158) SSEV (AAC98076) BCSMV (YP_004089628) CP RepA RepA MP 0.00 3.84E–56 1.72e–10 2.07E–8 70% 100% 71% 95,2% 123413 2706 1836 649 196 312 1128 82structures 12 37 83 detected SSEV (NP_04945) in DDSMV (YP_003915158) the SSEV (NP_04945) SWSV MSV (CAA10092) SSEV (AAF76868) complementary CP RepA strand RepA RepA RepA transcript 9.20E–177 1.04E–56 2.80E–66 99.2% 48.8% 1.34E–8 1.65E–30 91.1% 56.2% 84.3% # # # # # # # # # and none being detected in the SSEV virion strand transcript (so

Lengths, numbers of reads and BlastX analysis results for VANA 454 that this particular transcript was not analysed further). We detected a strong association between the absence of predicted secondary structures within the ssDNA SWSV genome and increased frequencies of corresponding 22, 23 and 24 nt long sequences. doi:10.1371/journal.pone.0102945.t001 Table 1. SampleVARX Contig Contig length (bp) Number of reads BlastX Virus BlastXLocus BlastX e-value Percent identity USDA Acronyms used are as follows: SSEV (Sugarcane streak Egypt virus), DDSMV ( striate mosaic virus), BCSMV (Bromus catharticus str siRNAs (p-values ,0.008; Table 3). Curiously, we found a

PLOS ONE | www.plosone.org 5 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Figure 1. SWSV genome coverage following NGS. The genomic organization of SWSV is schematically shown above the graph. While relative degrees of coverage achieved after a posteriori mapping of reads produced by Illumina-based siRNA sequencing against the SWSV genome is indicated in green, the coverage achieved after mapping reads produced by 454 GS FLX Titanium-based VANA sequencing is indicated in blue. doi:10.1371/journal.pone.0102945.g001 different association when considering the predicted SWSV RNA not so far been reported. The novel mastrevirus was, however, transcripts with 21 nt siRNA reads displaying a strong tendency to detected in five sugarcane plants originating from Sudan (A0037, correspond with nucleotide sites that were predicted to be base B0065, B0069, D0005 and E0144) out of the 23 screened (Table paired in both the virion and complementary strand transcripts (p- S1). values ,2.49610-6) and the 22, 23 and 24 siRNA size classes Complete SWSV genomes from four sugarcane plants (A0037, displaying a similar tendency with respect to the virion strand B0069, D0005 and E0144) were cloned and sequenced. The transcript (p-values ,6.07610-13). genomes of these isolates have .91% genome-wide identity with Similar to SWSV, for the SSEV full genome there was an those recovered from the VARX and USDA sugarcane plants. association between the absence of ssDNA structural elements and Phylogenetic analyses of the full genomes (Figure 3) and of the increased frequencies of 22 nt siRNAs. Also similar to SWSV the amino acid sequences that they likely encode (Figure 4) confirmed 22, 23 and 24 nt long siRNAs display a significant tendency to that the isolates obtained from the Sudanese sugarcane plants also correspond with transcript nucleotides that are base-paired within belong to the SWSV species. The six isolates can be further secondary structural elements. classified into 3 strains, SWSV-A (VARX, USDA), -B (B0069, D0005, E0144) and -C (A0037) (Figure S3) based on the proposed A novel sugarcane-infecting mastrevirus originating from classification of mastrevirus strains outlined by Muhire et al. [32]. the Nile region Using primers that allow the amplification of all sugarcane- The complete genome of SWSV, as recovered from the VARX infecting mastreviruses, including Sugarcane streak Egypt Virus, and USDA plants, is most similar to that of Wheat dwarf India Sugarcane streak virus, Maize streak virus, Sugarcane streak virus (WDIV, Accession number NC_017828), with which it Reunion virus, Eragrostis streak virus and Saccharum streak virus, shares 61% genome-wide identity. Whereas the Rep and MP the five sugarcane plants originating from Sudan were shown to be amino acid sequences of SWSV are also most similar to those of free of co-infection with other known mastreviruses. Three of them WDIV (54.4% and 44.8% identity, respectively), the CP is most are still maintained at the CIRAD sugarcane quarantine station similar to that of Panicum streak virus (PanSV, NC_001647, (B0065, B0069 and D0005) and exhibit white spots on the base of 51.4–53.9%). Based on the 78% species demarcation threshold set their leaf blades, around the blade joint where the two wedge by the Geminivirus study group of the ICTV [32], it is clear that shaped areas called ‘‘dewlaps’’ are located (Figure S4). These spots the novel mastrevirus should be considered a new species within can become fused laterally, so as to appear as chlorotic stripes the genus Mastrevirus of the Family Geminiviridae (Figure S2). (Figure S4). It is noteworthy that the SWSV infected D0005 plant This is further confirmed by phylogenetic analyses performed on displayed very little evidence of these spots (only one leaf out of both the full genome (Figure 3) and on the amino acid sequences eight exhibited tiny white spots that resembled thrip damage) and of its encoded proteins (Figure 4). The new virus clearly clusters it is therefore very likely that SWSV infections could escape visual with mastreviruses on a branch that is not closely associated with inspection (Figure S4). Given that three of the infected sugarcane any other species classified within this genus. Whereas the CP of varieties exhibited mild foliar symptoms, i.e. white spots on the SWSV clusters within the virus clade including the various African base of their leaf blades that become fused laterally, so as to appear streak viruses, Australasian striate mosaic viruses, Digitaria streak as chlorotic stripes, we propose naming the new species Sugarcane virus (DSV) and WDIV, the Reps cluster with the African streak white streak Virus. viruses and WDIV (Figure 4). SWSV was not detected in sugarcane seedlings derived from Genome analysis of SWSV sugarcane true seeds under sterile insect-proof conditions, in The SWSV genomes recovered from the various sugarcane agreement with the fact that seed transmission of geminiviruses has plants were between 2828 and 2836 nt and are, in almost all

PLOS ONE | www.plosone.org 6 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

respects, very similar to those of all other previously described mastreviruses. The one exceptional feature of the SWSV genomes is that in case of the VARX and USDA isolates alternative splicing of complementary sense transcripts likely results in the expression of both a standard Rep (which is predicted to be 396 amino acids long), and a rather unusual RepA of 418 amino acids long. This is the only known occurrence in any geminivirus of a RepA that is larger than Rep. Given the uniqueness of this apparent genome organisation in the USDA and VARX isolates the correct identification of the intron within the complementary sense transcript was verified. RT-PCR reactions targeting the complementary sense transcript clearly indicated the presence of a mixture of spliced and non- spliced complementary sense mRNA transcripts, and confirmed that the correct locations of the acceptor and donor sites of the 66 nt long SWSV intron had been identified (Figure S5).

Analysis of recombination All the SWSV genome sequences determined here share evidence of the same ancestral recombination event in the short dwarf India virus), SSRV (Sugarcane streak Reunion virus). intergenic region - corresponding to genomic coordinates 1419– 1468 in the USDA isolate (p = 3.821610-7 for the GENECONV, MAXCHI and RDP methods implemented in RDP4.24). Corresponding coordinates are known to be very common sites of recombination in mastreviruses [39] and the fragment that they delimit in SWSV has apparently been derived from something resembling an African streak virus.

Discussion We have performed NGS-based analyses of both siRNA and VANA isolated from sugarcane plants originating from Egypt. Both sequence-independent NGS approaches revealed the pres- contigs from sugarcane plant VARX with detectable homology to mastreviral sequences. ence of a novel mastrevirus, SWSV, which had so far escaped routine quarantine detection assays, possibly because it was present in mixed infection with SSEV. The procedures used for de novo the discovery of SWSV pave the way towards the application of NGS-based quarantine detection procedures. Such procedures would likely be hierarchical with a first stage sequence-indepen- dent NGS step followed by sequence-dependent secondary assays. Whereas the first step would be to identify novel viruses within a single plant (perhaps one displaying apparent disease symptoms), the second step would be to use sequence dependent approaches to both confirm the presence of any novel virus(es) identified in the original host, and identify the presence of this(ese) virus(es) in larger plant collections. A major strength of such an approach is that it would also yield complete genome sequences. The present study also confirms that both VANA [13] and siRNA [16] can be successfully targeted by metagenomics approaches for the discovery and characterization of plant- infecting DNA viruses. The VANA-based 454 pyrosequencing approach has several advantages as it initially combines reverse- transcriptase priming and a Klenow Fragment step, which potentially enables the detection of both RNA and DNA viruses. Additionally, up to 96-tagged amplified DNAs (cDNA and DNAs amplified using the Klenow Fragment step) can be pooled and

1211764479 10186 133101 117 275 258 111 270 520 914 7640 1649 1284 SSEV (AAF76871) SSEV (AAC98080)sequenced SacSV (YP_003288767) WDIV (YP_006273068) WDIV (YP_006273069) SSRV (ABZ03975) in multiplex CP MP CP format MP CP [15] CP making this 2.60E–9 7.22E–15 7,16E–06 approach 3,29E–11 6,78E–20 100% very 100% 1,94E–05 68% 70% 50% 64% # # # # # # useful for routine diagnostic screening of plants within BRCs and quarantine stations. However, validation using plants infected or

Lengths, numbers of reads and BlastX analysis results for siRNA co-infected with RNA and DNA viruses needs to be carried out in order to determine the sensitivity and specificity levels of this 454- based VANA sequencing approach. Virus-derived siRNAs naturally accumulate in virus-infected Table 2. VirusSSEV Contig Contig length (bp) Number of reads BlastX Virus BlastXLocus BlastX e-value Percent identity SWSV Acronyms used are as follows: SSEV (Sugarcane streak Egypt virus), SWSV (Sugarcane white streak virus), SacSV (Saccharum streak virus), WDIV (Wheat doi:10.1371/journal.pone.0102945.t002 plants as a consequence of the action of Dicer enzymes as part of

PLOS ONE | www.plosone.org 7 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Figure 2. Size distribution of sequenced siRNAs obtained from the VARX plant. The histograms represent the numbers of siRNA reads in each size class. (A) The size distributions of total reads, (B) The size distributions of reads mapping to the rep gene C-sense intronic region of SWSV, (C) The size distributions of reads mapping to the V1–V2 ORFs region of SWSV and (D) The size distributions of reads mapping to the C1 ORF region of SWSV. doi:10.1371/journal.pone.0102945.g002 the RNA silencing-based plant antiviral defences [40]. Adopting a predicted secondary structures within both the single stranded metagenomic approach and randomly sequencing these siRNAs is DNA (ssDNA) genomes of these viruses and their predicted single therefore an extremely powerful way to discover and characterise stranded RNA (ssRNA) complementary and virion strand previously unknown plant viruses and viroids [16,41]. In addition transcripts. However, whereas significantly more siRNAs corre- to providing evidence for the presence of the two mastreviruses co- sponded with unstructured regions of the ssDNA genome, for the infecting the VARX sugarcane plant, this approach provided transcripts significantly more siRNAs corresponded with struc- information on the interaction of the plant antiviral silencing tured regions of ssRNA. It is plausible that base-paired nucleotides machinery and these two viruses. Although these aspects have within transcript RNA molecules are protected from siRNA been studied previously in geminiviruses in the Begomovirus genus binding and that the secondary structures evident both in (Blevins et al., 2006; Akbergenov et al., 2006; Rodrı´guez-Negrete transcripts produced by SWSV, SSEV and in mastrevirus et al., 2009; Yang et al., 2011; Aregger et al., 2012), very little genomes in general [43] may represent an evolutionary adaptation comparable information has previously been available for for viral persistence. In mammalian RNA viruses there is an mastreviruses. association between degrees of genomic secondary structure and The siRNA distributions observed here for SWSV and SSEV, infection duration with viruses having highly structured genomes perhaps unsurprisingly, seem to largely parallel those previously tending to cause chronic infections and viruses with unstructured reported for begomoviruses. In particular, the differences in size genomes tending to cause acute infections [44,45]. classes observed between different genome regions suggest that In both analysed Egyptian sugarcane accessions, VARX and mastreviruses are subject both to transcriptional gene silencing, USDA, SWSV was found to be present in co-infections with based on 24 nt long siRNAs produced through the action of SSEV. Both sugarcane plants were independently collected in DCL3, and to post-transcriptional gene silencing (PTGS) medi- Egypt which suggests that SWSV infection of Egyptian sugarcane ated by the 21–22 nt long siRNAs produced through the action of plants may not be a rare phenomenon. SWSV was also detected in the antiviral Dicers DCL4 and DCL2 (Rodrı´guez-Negrete et al., SSEV-free plants that originated from Sudan. It is noteworthy that 2009; Aregger et al., 2012). The action of the former mechanism is one of the Sudanese plants from which SWSV was isolated, particularly evident in the siRNAs mapping to the SWSV intron E0144, was initially grown in Sudan in 1992 before being but is also, to a lesser extent, evident in the siRNAs mapping to transferred to Barbados in 1998 and subsequently sent back to the both the non-coding regions and the complementary sense ORFs CIRAD Sugarcane Quarantine Station in 2009 (unpublished data, of SWSV and SSEV. On the other hand, the 21–22 nt siRNA size CIRAD Sugarcane Quarantine Station). Assuming that SWSV classes associated with PTGS are particularly evident in the did not infect this plant in Barbados between 1998 and 2009, it is siRNAs mapping to the two virion sense ORFs which are known plausible that SWSV was present along the Nile basin at least from to be more actively transcribed in mastreviruses than their the late 1980s. Interestingly, as a consequence of indel polymor- complementary sense counterparts [42]. phisms in the 66 nt long SWSV intron, the Egyptian SWSV For both SWSV and SSEV we detected a significant association isolates VARX and USDA have a highly unusual genome between the frequencies of siRNAs and the presence/absence of organization and likely express a RepA protein that, while having

PLOS ONE | www.plosone.org 8 July 2014 | Volume 9 | Issue 7 | e102945 LSOE|wwpooeog9Jl 04|Vlm su e102945 | 7 Issue | 9 Volume | 2014 July 9 www.plosone.org | ONE PLOS Table 3. Associations between siRNAs and SWSV/SSEV genomic and transcript secondary structures in the HCSS.

Probability if association Probability of no Probability of no between siRNAs and association between association between siRNAs secondary structure (KS siRNAs and base-paired and unpaired nucleotides Sequence name Component Length Number of structures siRNA type Test) nucleotides (WRS test) (WRS test)

SWSV Full genome 2830 30 All 5.6961025 0.999 6.2061024 21 0.205 0.590 0.410 22 3.3561026 0.992 0.008 23 0.0005 0.999 6.8161024 24 3.4861029 0.999 2.6761026 V-strand transcript 1222 16 All 0 1.80610215 1 21 0.022 4.93610217 1 22 04.1610217 1 23 8.23610214 1.10610216 1 24 0 6.07610213 1 C-strand transcript 1446 2 All 2.0061024 0.100 0.899 21 1.7861027 2.4961026 0.999 22 0.409 0.122 0.877 23 0.008 0.925 0.075 24 0.003 0.970 0.029 SSEV Full genome 2706 29 All 0.007 0.889 0.111 21 0.016 0.737 0.263 22 0.0001 0.999 0.001 23 0.140 0.861 0.139 24 0.209 0.499 0.501 ia eaeoisi ln urnieContext Quarantine Plant a in Metagenomics Viral V-strand transcript 1131 0 NA NA NA NA C-strand transcript 1406 13 All 0.002 0.019 0.981 21 0.670 0.465 0.535 22 0.006 0.025 0.975 23 0.002 0.001 0.999 24 0.041 0.023 0.977

doi:10.1371/journal.pone.0102945.t003 Viral Metagenomics in a Plant Quarantine Context

Figure 3. Maximum-likelihood phylogenetic tree of 63 virus isolates representing each known mastrevirus species (including major strains) and the 6 SWSV isolates determined in this study. Tree branches are coloured according to the geographical origins of the viruses. Branches marked with filled and open circles respectively have .95% and 80–94% approximate likelihood ratio test support; branches having ,80% support were collapsed. The phylogenetic tree is rooted using the full genome sequence of Dicot-infecting mastreviruses. doi:10.1371/journal.pone.0102945.g003 the same N- and C-terminus sequences as Rep, is 22 amino acids that there may have been other undetected recent introductions of longer than Rep. mastreviruses to the Americas. Although insect transmission of The recent discoveries of SWSV and other highly divergent mastreviruses in the New World remains to be reported, it is mastreviruses [46,47] suggest that this geminivirus genus likely noteworthy that one of the three mastrevirus species that has so far encompasses a far greater diversity and has a greater global been detected in the Americas was isolated from a dragonfly which distribution than has been previously appreciated. The SWSV had possibly eaten a plant feeding insect that was carrying the isolate from the Sudanese sugarcane plant that had been virus [48]. The presence of SWSV in Barbados offers an propagated in Barbados represents only the third instance of opportunity to investigate possible natural transmission of the discovery of mastreviruses in the New World [16,48], and suggests virus by screening sugarcane planted near the SWSV infected

PLOS ONE | www.plosone.org 10 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Figure 4. Maximum-likelihood phylogenetic tree of Rep (A) and CP (B) proteins. Tree branches are coloured according to the geographical origins of the viruses. Branches marked with filled and open circles are respectively have .95% and 80–94% approximate likelihood ratio test support; branches having ,80% support were collapsed. doi:10.1371/journal.pone.0102945.g004

E0144 accessions. Phylogenetic analyses of any SWSV genomes situation with SWSV might closely match that of Sugarcane yellow sampled from such plants should reveal their likely recent leaf virus (SCYLV), which remained unnoticed for at least 30 transmission histories. years during its spread throughout the world [49]. In order to Given the relatively high degrees of sequence divergence accurately determine the potential economic impacts of the observed between the different SWSV isolates described here dissemination of SWSV, additional studies assessing the pathoge- (,9%), it is plausible that the natural geographical range of SWSV nicity of this virus are certainly warranted. is broader than just the Nile basin. Also, the global dissemination Our study stresses both the potential advantages of NGS-based of sugarcane cuttings, the absence of SWSV diagnostic tools, and virus metagenomic screening in a plant quarantine setting, and the the fact that SWSV induces, at least in one case, extremely mild need to better assess viral diversity within plants that are destined symptoms in sugarcane imply that SWSV may have already been for exotic habitats. It indicates that a combination of sequence- unknowingly distributed throughout the sugarcane growing independent NGS-based partial viral genome sequencing coupled regions of the world. The failure of established sugarcane with sequence-dependent Sanger-based full genome cloning and quarantine diagnostics in this regard provides a dramatic example sequencing is likely to reduce the number of non-intercepted virus of how potentially pathogenic viruses can evade the screening pathogens passing through plant quarantine stations, while at the procedures of quarantine facilities and may spread worldwide same time alerting authorities to the presence and potential spread through international plant material exchanges. In this regard the of viruses with unknown pathogenic potentials.

PLOS ONE | www.plosone.org 11 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

Supporting Information Figure S5 Reverse transcriptase priming and amplifi- cation of nucleic acids were carried out in order to Figure S1 (A) Genome coverages obtained after a posteriori detect the rep gene C-sense intronic region. (A) Agarose gel mapping against the complete genome of SSEV of reads produced detection of presence of a mixture of spliced and non-spliced by Illumina (siRNA sequencing). The genomic organization of complementary sense mRNA transcripts. 1: 1 Kb ladder; 2: SSEV is schematically shown at the top of the figure. (B) Size Reverse transcriptase priming and amplification of nucleic acids distribution of sequenced siRNAs obtained from the VARX plant without DNase treatment of extracted RNAs; 3: Reverse mapping on the V1–V2 ORFs region of SSEV. Histograms transcriptase priming and amplification of nucleic acids with represent the number of siRNA reads in each size class. and (C) DNase treatment of extracted RNAs. (B) 66 nt long SWSV intron Size distribution reads mapping on C1–C2 ORFs region of SSEV. nucleotidic sequence and splice donor and acceptor sites. The (TIF) sequence of the intron (in lower case) and its flanking exons (upper Figure S2 Two-dimensional genome-wide percentage case) are shown. The 59 (donor) and 39 (acceptor) splice sites are pairwise nucleotide identity plot of monocot-infecting underlined (lower case). mastreviruses including the six novel SWSV isolates (TIF) from this study. Table S1 List of the sugarcane varieties from the (TIF) CIRAD Sugarcane Quarantine Station (SQS) that were Figure S3 (A) Maximum-likelihood phylogenetic tree of six screened for the presence of all known sugarcane- SWSV isolates. The six isolates can be classified into 3 strains, infecting mastreviruses and SWSV. SWSV-A (VARX, USDA), -B (B0069, D0005, E0144) and -C (DOC) (A0037). (B) Genome-wide pairwise nucleotide similarity score matrix, the 94% strain demarcation threshold set by the Acknowledgments Geminivirus study group of the ICTV (Muhire et al. 2013) is The authors are grateful to Se´bastien Theil for depositing NGS datasets in indicated (green coloured below 94% and pink-red coloured above GenBank. 94%). (TIF) Author Contributions Figure S4 Symptoms caused by SWSV on B0065, B0069 Conceived and designed the experiments: TC DF PR. Performed the and D0005 plants. experiments: BM CJ SG GF JHD EF PB. Analyzed the data: TC DF BM (TIF) DPM AV PR. Wrote the paper: TC DPM AV PR.

References 1. Anderson PK, Cunningham AA, Patel NG, Morales FJ, Epstein PR, et al. (2004) 17. Al Rwahnih M, Daubert S, Golino D, Rowhani A (2009) Deep sequencing Emerging infectious diseases of plants: pathogen pollution, climate change and analysis of RNAs from a grapevine showing Syrah decline symptoms reveals a agrotechnology drivers. Trends Ecol Evol 19: 535–544. multiple virus infection that includes a novel virus. Virology 387: 395–401. 2. Jones KE, Patel NG, Levy MA, Storeygard A, Balk D, et al. (2008) Global trends 18. Kraberger S, Stainton D, Dayaram A, Zawar-Reza P, Gomez C, et al. (2013) in emerging infectious diseases. Nature 451: 990–993. Discovery of Sclerotinia sclerotiorum Hypovirulence-Associated Virus-1 in 3. Jones RAC (2009) Plant virus emergence and evolution: Origins, new encounter Urban River Sediments of Heathcote and Styx Rivers in Christchurch City, scenarios, factors driving emergence, effects of changing world conditions, and New Zealand. Genome Announc 1. prospects for control. Virus Research 141: 113–130. 19. Li R, Gao S, Hernandez AG, Wechter WP, Fei Z, et al. (2012) Deep sequencing 4. Roossinck MJ (2011) The good viruses: viral mutualistic symbioses. Nat Rev of small RNAs in tomato for virus and viroid identification and strain Microbiol 9: 99–108. differentiation. PLoS One 7: e37127. 5. Rosario K, Breitbart M (2011) Exploring the viral world through metagenomics. 20. Martinez G, Donaire L, Llave C, Pallas V, Gomez G (2010) High-throughput Curr Opin Virol 1: 1–9. sequencing of Hop stunt viroid-derived small RNAs from cucumber leaves and 6. van der Heijden MG, Bardgett RD, van Straalen NM (2008) The unseen phloem. Mol Plant Pathol 11: 347–359. majority: soil microbes as drivers of plant diversity and productivity in terrestrial 21. Sikorski A, Massaro M, Kraberger S, Young LM, Smalley D, et al. (2013) Novel ecosystems. Ecol Lett 11: 296–310. myco-like DNA viruses discovered in the faecal matter of various animals. Virus 7. Li L, Delwart E (2011) From orphan virus to pathogen: the path to the clinical Res 177: 209–216. lab. Curr Opin Virol 1: 282–288. 22. Borucki MK, Chen-Harris H, Lao V, Vanier G, Wadford DA, et al. (2013) 8. Mokili JL, Rohwer F, Dutilh BE (2012) Metagenomics and future perspectives in Ultra-Deep Sequencing of Intra-host Rabies Virus Populations during Cross- virus discovery. Curr Opin Virol 2: 63–67. species Transmission. PLoS Negl Trop Dis 7: e2555. 9. Willner D, Hugenholtz P (2013) From deep sequencing to viral tagging: recent 23. Simmons HE, Dunham JP, Stack JC, Dickins BJ, Pagan I, et al. (2012) Deep advances in viral metagenomics. Bioessays 35: 436–442. sequencing reveals persistence of intra- and inter-host genetic diversity in natural 10. Thurber RV, Haynes M, Breitbart M, Wegley L, Rohwer F (2009) Laboratory and greenhouse populations of zucchini yellow mosaic virus. J Gen Virol 93: procedures to generate viral metagenomes. NatProtoc 4: 470–483. 1831–1840. 11. Lipkin WI (2010) Microbe hunting. Microbiol Mol Biol Rev 74: 363–377. 24. Wang H, Xie J, Shreeve TG, Ma J, Pallett DW, et al. (2013) Sequence 12. Roy A, Shao J, Hartung JS, Schneider W, Brlansky RH (2013) A case study on recombination and conservation of Varroa destructor virus-1 and deformed discovery of novel Citrus Leprosis virus cytoplasmic type 2 utilizing small RNA wing virus in field collected honey bees (Apis mellifera). PLoS One 8: e74508. libraries by Next Generation Sequencing andbioinformatic analyses. Journal of 25. Bigarre L, Salah M, Granier M, Frutos R, Thouvenel J, et al. (1999) Nucleotide Data Mining in Genomics & Proteomics 4: 1000129. sequence evidence for three distinct sugarcane streak mastreviruses. Arch Virol 13. Melcher U, Muthukumar V, Wiley GB, Min BE, Palmer MW, et al. (2008) 144: 2331–2344. Evidence for novel viruses by analysis of nucleic acids in virus-like particle 26. Shamloul AM, Abdallah NA, Madkour MA, Hadidi A (2001) Sensitive detection fractions from Ambrosia psilostachya. Journal of Virological Methods 152: 49– of the Egyptian species of sugarcane streak virus by PCR-probe capture 55. hybridization (PCR-ELISA) and its complete nucleotide sequence. J Virol 14. Victoria JG, Kapoor A, Dupuis K, Schnurr DP, Delwart EL (2008) Rapid Methods 92: 45–54. identification of known and new RNA viruses from animal tissues. PLoS Pathog 27. Froussard P (1993) rPCR: a powerful tool for random amplification of whole 4: e1000163. RNA sequences. PCR Methods Appl 2: 185–190. 15. Roossinck MJ, Saha P, Wiley G, Quan J, White J, et al. (2010) Ecogenomics: 28. Edgar RC (2004) MUSCLE: a multiple sequence alignment method with using massively parallel pyrosequencing to understand virus ecology. Mol Ecol reduced time and space complexity. BMC Bioinformatics 5: 113. 19 81–88. 29. Posada D (2008) jModelTest: Phylogenetic model averaging. Molecular Biology 16. Kreuze JF, Perez A, Untiveros M, Quispe D, Fuentes S, et al. (2009) Complete and Evolution 25: 1253–1256. viral genome sequence and discovery of novel viruses by deep sequencing of 30. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of small RNAs: a generic method for diagnosis, discovery and sequencing of protein evolution. Bioinformatics 21: 2104–2105. viruses. Virology 388: 1–7.

PLOS ONE | www.plosone.org 12 July 2014 | Volume 9 | Issue 7 | e102945 Viral Metagenomics in a Plant Quarantine Context

31. Guindon S, Delsuc F, Dufayard JF, Gascuel O (2009) Estimating maximum 41. Kashif M, Pietila S, Artola K, Jones RAC, Tugume AK, et al. (2012) Detection likelihood phylogenies with PhyML. Methods Mol Biol 537: 113–137. of Viruses in Sweetpotato from Honduras and Guatemala Augmented by Deep- 32. Muhire B, Martin DP, Brown JK, Navas-Castillo J, Moriones E, et al. (2013) A Sequencing of Small-RNAs. Plant Disease 96: 1430–1437. genome-wide pairwise-identity-based proposal for the classification of viruses in 42. Morris-Krsinich BA, Mullineaux PM, Donson J, Boulton MI, Markham PG, et the genus Mastrevirus (family Geminiviridae). Archives of Virology 158: 1411– al. (1985) Bidirectional transcription of maize streak virus DNA and 1424. identification of the coat protein gene. Nucleic Acids Res 13: 7237–7256. 33. Martin DP, Lemey P, Lott M, Moulton V, Posada D, et al. (2010) RDP3: a 43. Muhire BM, Golden M, Murrell B, Lefeuvre P, Lett JM, et al. (2013) Evidence flexible and fast computer program for analyzing recombination. Bioinformatics of pervasive biologically functional secondary-structures within the genomes of 26: 2462–2463. eukaryotic single-stranded DNA viruses. J Virol. 34. Semegni JY, Wamalwa M, Gaujoux R, Harkins GW, Gray A, et al. (2011) 44. Davis M, Sagan SM, Pezacki JP, Evans DJ, Simmonds P (2008) Bioinformatic NASP: a parallel program for identifying evolutionarily conserved nucleic acid and physical characterizations of genome-scale ordered RNA structure in secondary structures from nucleotide sequence alignments. Bioinformatics 27: mammalian RNA viruses. J Virol 82: 11824–11836. 2443–2445. 45. Simmonds P, Tuplin A, Evans DJ (2004) Detection of genome-scale ordered 35. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic Local RNA structure (GORS) in genomes of positive-stranded RNA viruses: Alignment Search Tool. Journal of Molecular Biology 215: 403–410. Implications for virus evolution and host persistence. RNA 10: 1337–1351. 36. Donaire L, Wang Y, Gonzalez-Ibeas D, Mayer KF, Aranda MA, et al. (2009) 46. Kraberger S, Harkins GW, Kumari SG, Thomas JE, Schwinghamer MW, et al. Deep-sequencing of plant viral small RNAs reveals effective and widespread (2013) Evidence that dicot-infecting mastreviruses are particularly prone to inter- targeting of viral genomes. Virology 392: 203–214. 37. Itaya A, Zhong X, Bundschuh R, Qi Y, Wang Y, et al. (2007) A structured viroid species recombination and have likely been circulating in Australia for longer RNA serves as a substrate for dicer-like cleavage to produce biologically active than in Africa and the Middle East. Virology 444: 282–291. small RNAs but is resistant to RNA-induced silencing complex-mediated 47. Kraberger S, Thomas JE, Geering AD, Dayaram A, Stainton D, et al. (2012) degradation. J Virol 81: 2980–2994. Australian monocot-infecting mastrevirus diversity rivals that in Africa. Virus 38. Westerhout EM, Ooms M, Vink M, Das AT, Berkhout B (2005) HIV-1 can Res 169: 127–136. escape from RNA interference by evolving an alternative structure in its RNA 48. Rosario K, Padilla-Rodriguez M, Kraberger S, Stainton D, Martin DP, et al. genome. Nucleic Acids Res 33: 796–804. (2013) Discovery of a novel mastrevirus and alphasatellite-like circular DNA in 39. Varsani A, Monjane AL, Donaldson L, Oluwafemi S, Zinga I, et al. (2009) dragonflies (Epiprocta) from Puerto Rico. Virus Res 171: 231–237. Comparative analysis of Panicum streak virus and Maize streak virus diversity, 49. Komor E, ElSayed A, Lehrer AT (2010) Sugarcane yellow leaf virus recombination patterns and phylogeography. Virol J 6: 194. introduction and spread in Hawaiian sugarcane industry: Retrospective 40. Voinnet O (2005) Induction and suppression of RNA silencing: insights from epidemiological study of an unnoticed, mostly asymptomatic plant disease. viral infections. Nat Rev Genet 6: 206–220. European Journal of Plant Pathology 127: 207–217.

PLOS ONE | www.plosone.org 13 July 2014 | Volume 9 | Issue 7 | e102945