Jiang et al. Virology Journal (2019) 16:153
https://doi.org/10.1186/s12985-019-1257-y

RESEARCH Open Access

Identification of emerging viral genomes in transcriptomic datasets of alfalfa (Medicago sativa L.)

Peng Jiang, Jonathan Shao and Lev G. Nemchinov*

* Correspondence: [email protected]
USDA/ARS, Beltsville Agricultural Research Center, Molecular Plant Pathology Laboratory, Beltsville, MD 20705, USA

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: Publicly available transcriptomic datasets have become a valuable tool for the discovery of new pathogens, particularly viruses. In this study, several coding-complete viral genomes previously not found or experimentally confirmed in alfalfa were identified in the plant datasets retrieved from the NCBI Sequence Read Archive.

Methods: Publicly available Medicago spp. transcriptomic datasets were retrieved from the NCBI SRA database. The raw reads were first mapped to the reference genomes of Medicago sativa and Medicago truncatula, followed by the alignment of the unmapped reads to the NCBI viral genome database and de novo assembly using the SPAdes tool. When possible, assemblies were experimentally confirmed using 5′/3′ RACE and RT-PCRs.

Results: Twenty-three different viruses were identified in the analyzed datasets, of which several represented emerging viruses not reported in alfalfa prior to this study. Among them were two strains of cnidium vein yellowing virus, lychnis mottle virus and Cactus virus X, for which coding-complete genomic sequences were obtained by de novo assembly.

Conclusions: The results improve our knowledge of the diversity and host range of viruses infecting alfalfa, provide essential tools for their diagnostics and characterization and demonstrate the utility of transcriptomic datasets for the discovery of new pathogens.

Keywords: Alfalfa (Medicago sativa L.), Transcriptomic discovery, Plant viruses

Background

Alfalfa (Medicago sativa L.) is one of the most extensively cultivated forage legumes in the world [1]. It has recently become the third most valuable field crop in the United States with an estimated worth of over $9.3 billion, which is $1.2 billion more than that of wheat, according to the National Alfalfa and Forage Alliance [2]. Alfalfa productivity is often limited by various biotic and abiotic components in the ecosystem [1, 3, 4]. With the increase in the production of monocropped alfalfa, infectious diseases, including viruses, have become more common. Traditionally, viral infections of alfalfa have been considered by producers, breeders, growers and research communities as diseases of limited importance. Nonetheless, they are widespread in major alfalfa cultivation areas and their contribution to the severity of complex infections involving multiple pathogens is poorly understood.

Recently, many emerging viral diseases of alfalfa have been described that have the potential to cause serious yield losses. These include a rhabdovirus that was diagnosed in alfalfa plants displaying multiple abnormalities [5]; a new enamovirus from Argentina, Alfalfa enamovirus-1 (AEV-1) [6], which was detected in alfalfa plants showing dwarfism symptoms; an AEV isolate from Sudan, designated AEV-2 [7]; Alfalfa virus S, a new species of the family Alphaflexiviridae discovered in alfalfa samples exhibiting chlorosis and stunting [8]; alfalfa virus F, a new member of the genus Marafivirus [9]; and Alfalfa leaf curl virus found in plants displaying leaf curling symptoms [10]. Alfalfa appears to be widely infected with seed-transmitted partitiviruses, and the biological significance of these in alfalfa is currently unknown and requires further investigation [11, 12].

Publicly available transcriptomic datasets have become a valuable tool for the discovery of new pathogens, particularly viral sequences [11, 13–16]. The retrieval of complete or nearly complete viral genomes from transcriptomic data improves our knowledge of the diversity and host range of these pathogens and provides essential tools for their diagnostics and characterization. In this study, we performed a systematic survey of alfalfa transcriptomic datasets publicly available at NCBI. The survey indicated that approximately 90% of Medicago sativa samples employed in the generation of the deposited datasets contained viruses. Several emerging viruses were identified that had not been reported to infect alfalfa prior to this study or had not been experimentally confirmed in the plant.

Methods

Viral sequences were identified in alfalfa datasets retrieved from the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra). The raw sequencing reads were first mapped to the reference genomes of Medicago sativa (http://www.medicagohapmap.org/downloads/cadl) and Medicago truncatula (http://www.medicagogenome.org/) with the Bowtie2 software described below. Those reads that did not map to the reference genomes were assembled into contigs using SPAdes (v.3.12.0) and searched (BLASTn with default settings and the following parameters: 20 threads, two alignments and two descriptions) against other plant genomes (rice, apple, beet, arabidopsis and tomato) to check for possible cross-run contamination. Reads that did not map to any plant species were aligned to the NCBI viral genome database (https://www.ncbi.nlm.nih.gov/genome/viruses/).

The alignments were performed using BBMap (v. 37.66), SeqMan Ngen (version 15.2.0, build number 130; mer size 17–21, minimum match percentage 85%) and Bowtie 2 (v.2.3.4) with the following very sensitive settings: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 and -p 20 (threads). In general, the very sensitive settings increase the search effort of Bowtie2 (D, R) by raising the cutoffs at which Bowtie2 stops searching. The length of the seed substring (L) was shortened to 20 from 22 and the interval (i) between substrings was decreased to 0.50. The reads that were mapped to the reference viral sequences related to the viruses of interest and to the assembled viral contigs from the datasets were sequestered and assembled de novo using SPAdes (v.3.12.0) [17] with the following parameters: k-mer sizes 21, 31, 41, 51, 81, 91 and 95, -m 800 and -t 20.

Wet lab experiments to confirm the identified viral sequences were performed only with Medicago sativa amalgavirus 1 (MsAV1). Plant materials infected with other viruses were not available for experimental confirmation of the transcriptomic data. For the MsAV1 experiments, the same RNA samples employed in Nemchinov et al. [18] were used. Total RNA was extracted as previously described [18]. The 5′ end and 3′ end of MsAV1 were amplified using the SMARTer RACE 5′/3′ Kit (Takara Bio USA, Inc., Madison, WI). The primers used for 5′/3′ RACE and the specific RT-PCR detection of the virus are listed in Additional file 1.

Phylogenetic trees were constructed using MEGA 7 software [19] by applying the maximum likelihood method (with 1000 replicates) based on the JTT matrix-based model. The protein sequences were aligned by ClustalW incorporated into MEGA 7. The sequence pairwise identity analysis of MsAV1 RdRp sequences was performed using the MAFFT program of the Sequence Demarcation Tool (http://web.cbio.uct.ac.za/~brejnev/). The SIAS tool, which calculates pairwise sequence identity and similarity from multiple sequence alignments (http://imed.med.ucm.es/Tools/sias.html), was used to predict amino acid identities between the Pro-Pol regions of cnidium vein yellowing virus (CnVYV) and lychnis mottle virus (LycMoV).

Results

We performed a systematic survey of 655 alfalfa transcriptomic datasets publicly available from the NCBI. The survey indicated that approximately 90% of Medicago sativa samples employed in the generation of the deposited datasets contained viruses. Twenty-three different viruses were found in these transcriptomes, including several viruses not previously reported in alfalfa. The heterogeneity of the viral contigs identified in this study is summarized in Fig. 1 (see Additional file 2 for details). Essentially complete viral genomes were assembled for three emerging alfalfa viruses that had not been identified in alfalfa prior to this study (CnVYV, LycMoV and Cactus virus X) and for one virus that had not been experimentally confirmed in the plant (MsAV1).

Fig. 1 Virome heterogeneity found in 655 publicly available alfalfa transcriptomic datasets. * indicates viruses not previously reported in alfalfa. Highlighted are viruses that were characterized in this study

Medicago sativa amalgavirus 1 (MsAV1)

The Medicago sativa amalgavirus 1 sequence was originally reported by Wang and Zhang in 2013 under the GenBank accession GAFF01077243.1 and later dubbed MsAV1 by Nibert et al. [20]. Prior to this study, the virus was not found in the US and its sequence had not been experimentally confirmed.

The RNA-seq data in which the virus reads were discovered were derived from the publicly available datasets SRR6050922 to SRR6050957 generated in our laboratory from the US alfalfa cultivars Maverick and ZG9830 [18]. The identified raw viral reads were mapped to the reference genome of MsAV1 (GAFF01077243.1; NC_040591.1) [20, 21] by Bowtie2 [22]. Each assembled transcriptome was individually searched for amalgavirus sequences using the BLAST tool [23]. The numbers of amalgavirus reads found in the datasets are listed in Fig.

addition, the assembled contigs were searched with BLASTX against a database of the RNA-dependent RNA polymerase (RdRp) motif sequences of known RNA vi-