The De Novo Transcriptome and Its Analysis in the Worldwide Vegetable Pest, Delia Antiqua (Diptera: Anthomyiidae)
Total Page:16
File Type:pdf, Size:1020Kb
INVESTIGATION The de novo Transcriptome and Its Analysis in the Worldwide Vegetable Pest, Delia antiqua (Diptera: Anthomyiidae) Yu-Juan Zhang, Youjin Hao, Fengling Si, Shuang Ren, Ganyu Hu, Li Shen, and Bin Chen1 Institute of Entomology and Molecular Biology, College of Life Sciences, Chongqing Normal University, Chongqing, People’s Republic of China ABSTRACT The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the KEYWORDS onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the onion maggot biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced high-throughput using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, RNA and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among sequencing these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared de novo against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple assembly sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive codon usage bias transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, simple sequence diapause, and further pest control of D. antiqua. repeat The onion maggot Delia antiqua is a major insect pest of cultivated in a selected tissue and species of interest and generates quantitative vegetables, especially onions, and is widely distributed in the northern expression scores for each transcript (Wilhelm and Landry 2009). hemisphere. It can be induced into summer and winter diapauses, Such transcriptome analysis will likely replace large-scale microarray both happening at the pupal stage and just after head evagination approaches (Marioni et al. 2008; Simon et al. 2009) because of its (Ishikawa et al. 2000). These characteristics make it a good model lower cost and greater sequence yield. This allows the measurement for investigating the molecular mechanisms of pupal diapause (Hao of transcriptome composition and quantitatively surveys RNA expres- et al. 2012). To date, there have been no efforts to sequence the sion patterns (Wilhelm and Landry 2009). It also addresses comparative complete transcriptome of D. antiqua. The research on this species genomic-level questions and allows the development of molecular is limited by a very small amount of genomic data. As of February markers (Parra-Gonzalez et al. 2012). In the past several years, RNA- 2014, only 245 expressed sequence tags (ESTs) have been deposited in seq has been used on many genomes, from bacteria, archaea, and lower GenBank. To better understand the biological basis of diapause and D. eukaryotes to higher eukaryotes. It is particularly effective when a refer- antiqua, there is a need to explore genomic biology in D. antiqua. ence genome is not available (Martin and Wang 2011). Advances in next-generation sequencing (NGS) and assembly The main objective of our study was to construct a reference algorithms have rapidly promoted the development of next-generation transcriptome of D. antiqua for future genetic and genomic studies of RNA sequencing (RNA-seq). RNA-seq can reveal the entire transcriptome this species. Here, we present a comprehensive analysis of the de novo transcriptome sequencing results for cDNA samples derived from Copyright © 2014 Zhang et al. eggs, larvae, pupae, and adults of D. antiqua (covering every stage doi: 10.1534/g3.113.009779 of development) by using the Illumina Hiseq2000 sequencing plat- Manuscript received December 3, 2013; accepted for publication March 5, 2014; published Early Online March 10, 2014. form. As a result, a total of 29,659 unigenes were assembled and This is an open-access article distributed under the terms of the Creative identified; among them, 21,050 were annotated. Based on the tran- Commons Attribution Unported License (http://creativecommons.org/licenses/ scriptome, codon usage bias was analyzed and simple sequence by/3.0/), which permits unrestricted use, distribution, and reproduction in any repeats (SSRs) were detected. To our knowledge, this is the first report medium, provided the original work is properly cited. 1Corresponding author: College of Life Sciences, Chongqing Normal University, of the complete transcriptome and transcriptome characteristics of University Town, Chongqing 401331, People’s Republic of China. D. antiqua. This new dataset will provide a useful resource for future E-mail: [email protected] studies of the genetics and genomics of this species. Volume 4 | May 2014 | 851 MATERIALS AND METHODS classincludesclustersinwhichseveral unigenes with a similarity higher than 70% are classified into one cluster with the prefixCL.Theother Insect rearing and RNA extraction classincludessingletonswiththeprefix Unigene. FPKM for samples The non-diapausing colony (ND) of D. antiqua wasmaintainedinthe was calculated to show expression quantity, thus avoiding the influence Institute of Entomology and Molecular Biology, Chongqing Normal of sequencing length and difference (Mortazavi et al. 2008). Each FPKM University, China at 20 6 0.2° under 50–70% relative humidity (RH) was log transformed. with a 16L:8D photocycle as previously described (Chen et al. 2005). 10 Larvae were raised at 25 6 0.5° under 50–70% RH with 16L:8D to Functional annotation induce summer diapause (SD). Newly formed puparia were used for The generated unigenes larger than 200 nt were searched against NCBI experiments. They were kept under the same conditions as the larvae non-redundant protein database (Nr), Swiss-Prot, Kyoto Encyclopedia until day 15 after pupariation and then were transferred to 16° and of Genes and Genomes (KEGG), Cluster of Orthologous Groups 16L:8D to trigger diapause termination. For winter diapause (WD) in- (COG), and Gene Ontology (GO) databases using BLASTX alignment duction, the conditions were 15 6 0.5° under 50–70% RH with (E-value #1e25), and against NCBI non-redundant nucleotide database a 12L:12D through larvae and pupae. Ten samples were taken and then (Nt) by BLASTN (E-value #1e25). The best-aligned results were used to immediately frozen in liquid nitrogen for later RNA extraction. These 10 decide sequence direction and the coding sequence (CDS) of unigenes. If samples include ND eggs (from every incubation day), ND larvae (from results of different databases conflicted with each other, then apriority every larval instar), ND pupae (female and male, from the stage equal to order of Nr, Nt, Swiss-Prot, KEGG, and COG was followed when de- diapause development), ND adults (female and male, from different ciding sequence direction of unigenes. ESTScan (Iseli et al. 1999) was used developmental stages), three SD pupae samples (female and male, sam- to predict the sequence direction and CDS when unigenes were unaligned pled separately at diapause initiation, maintenance, and termination), to any of the databases. GO annotations of the unigenes were determined and three WD pupae samples (female and male, sampled separately at by the Blast2GO program (Conesa et al. 2005). After obtaining GO diapause initiation, maintenance, and termination) of D. antiqua. annotation for each unigene, WEGO software (Ye et al. 2006) was used Total RNA was separately extracted from the 10 samples using TRIzol to display GO functional classification. Reagent (Invitrogen, Carlsbad, CA) following the manufacturer’sprotocol. To eliminate genomic DNA, the RNA samples were treated with RNase- Characterization of open reading frames and Free DNase I according to the manufacturer’s protocol (Qiagen, USA). codon usage The RNA integrity was confirmed using the Agilent 2100 Bioanalyzer with The open reading frames (ORFs) in each unigene sequence were a minimum integrity number value of 7; 13% of total RNA each from ND predicted by searching against protein databases using BLASTX eggs, ND larvae, and ND adult samples and 8.7% of total RNA from each (E-value #1e25) in the following order: Nr, SwissProt, KEGG, and of the other seven samples were pooled together for cDNA preparation. COG. Sequences having hits in the former database did not go to the next round of searching against a later database. The coding regions mRNA purification and cDNA synthesis and Illumina were then extracted according to the best BLASTX match with a cus- sequencing preparation and sequencing tom Perl script. CodonW (http://codonw.sourceforge.net/)(Peden $ Beads with Oligo(dT) were used to isolate poly(A) mRNA after total 1999) was used to analyze RSCU and GC3 of ORFs ( 150 nt). Perl fi RNA extraction. Fragmentation buffer was added for cutting mRNA into scripts were used to process the output les of CodonW. We used SPSS short fragments. Taking these short fragments as templates, random 13.0 statistics software (SPSS Inc., Chicago, IL) for correlation analysis. hexamer-primer was used to synthesize the first-strand cDNA. The NC and CAI were calculated by DAMBE (Xia 2013) using the second-strand cDNA was synthesized using buffer, dNTPs, RNaseH, and Delia antiqua ribosomal proteins codon usage table as the reference DNA polymerase I. Short fragments were purified with QiaQuick PCR when calculated CAI. extraction kit (Qiagen, USA) and resolved with EB buffer for end SSR detection reparation and tailing A. After that, the short fragments were connected SSRs Identification Tool (SSRIT; http://www.gramene.org/db/markers/