Genomics 100 (2012) 35–41

Contents lists available at SciVerse ScienceDirect

Genomics

journal homepage: www.elsevier.com/locate/ygeno

Next-generation sequencing-based transcriptome analysis of Cryptolaemus montrouzieri under insecticide stress reveals resistance-relevant genes in ladybirds

Yuhong Zhang, Ruixin Jiang, Hongsheng Wu, Ping Liu, Jiaqin Xie, Yunyu He, Hong Pang ⁎

State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, Guangdong, China article info abstract

Article history: As the most efficient natural enemy of mealybugs, the ladybird Cryptolaemus montrouzieri Mulsant plays an Received 28 March 2012 important role in integrated pest management. We report here a profiling analysis of C. montrouzieri under Accepted 4 May 2012 insecticide stress to gain a deeper view of insecticide resistance in ladybirds. For transcriptome sequencing, Available online 11 May 2012 more than 26 million sequencing reads were produced. These reads were assembled into 38,369 non- redundant transcripts (mean size=453 nt). 23,248 transcripts were annotated with their gene description. Keywords: Using a tag-based DGE (Digital gene expression) system, over 5.7 million tags were sequenced in both the in- Biological control agents Transcriptome analysis secticide stress group and the control group, and mapped to 38,369 transcripts. We obtained 993 genes that High-throughput RNA sequencing were significantly up- or down-regulated under insecticide stress in the ladybird transcriptome. These results Gene expression profiling can contribute to in-depth research into the molecular mechanisms of resistance and enhance our current Insecticide resistance understanding of the effects of insecticides on natural enemies. © 2012 Elsevier Inc. All rights reserved.

1. Introduction is essential for deciphering the functional elements of the genome and determining when they are expressed and how they are regulat- In the last few years, natural enemies as a means of biological ed. In the few years since its initial application, next-generation se- control have been more frequently applied to plant protection, quencing (NGS, also called high-throughput sequencing or deep which raises certain concerns for humans. However, biological con- sequencing) has provided a new method for mapping and quantify- trol is usually accompanied by chemical control in practical applica- ing transcriptomes. In contrast to microarray methods, NGS has tion in the field. The influence of insecticides on natural predators clear advantages and is expected to revolutionize the manner in should not be ignored in integrated pest management (IPM). The which eukaryotic transcriptomes are analyzed [7–10]. Currently, predator ladybird Cryptolaemus montrouzieri Mulsant (Coleoptera: this method, developed by Illumina/Solexa, is able to generate over ) is one of the most efficient natural enemies of mealy- one billion bases of high quality sequence per run at less than 1% of bugs (Pseudococcidae) and has been used in many biological control the cost of capillary-based methods [11]. This method has already programs to suppress mealybug outbreaks around the world [1,2]. been applied to a number of model organisms, such as yeast, Recently, the role of C. montrouzieri as a biological control agent of Arabidopsis, Drosophila, mouse, human [12–18], and some other mealybugs is expected to gain further importance due to concerns non-model organisms [19–21]. Recently developed Solexa/Illumina about insecticide use and the effect of insecticide residue on food RNA-seq and Digital gene expression (DGE) based NGS technologies [1]. Because many insecticides were used in the fight against mealy- have dramatically changed the way that resistance-relevant genes bugs, C. montrouzieri is becoming more and more threatened by in the are identified because these technologies facilitate inves- these chemical insecticides. To date, most published studies on in- tigations into the functional complexity of transcriptomes [9,22]. secticide resistance of the C. montrouzieri focused on the survival In light of these advantages, we generated the first ladybird trans- rate of adults, fertility, and other variables [3–6].However,themo- criptome profile and analyzed the ladybird response induced by in- lecular mechanisms of the ladybird response to insecticides have secticide using NGS. In a single run, more than 2 billion raw reads not been analyzed at the transcriptome level. were generated for de novo assembly and annotation without a ge- The transcriptome is the complete set of expressed RNA tran- nome reference sequence. A comprehensive analysis of the global re- scripts in a cell during a specific developmental stage or in response sponse to insecticide stress in the ladybird can aid in the in-depth to a particular physiological condition. Research on the transcriptome investigation of candidate genes involved in resistance. As these data provided a rich resource based on direct sequencing, our results will be useful for the screening and analysis of ladybird functional ⁎ Corresponding author. Fax: +86 20 84110436. genes and also provide a novel method for understanding the molec- E-mail address: [email protected] (H. Pang). ular mechanisms underlying resistance for non-model species.

0888-7543/$ – see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2012.05.002 36 Y. Zhang et al. / Genomics 100 (2012) 35–41

2. Results 23,106 of them could be translated into proteins of greater than 100 aa (99.39%). Owing to the absence of genome information in ladybird, 2.1. RNA-Seq and read assembly 15,121 out of 38,369 non-redundant transcripts could not be matched to any database (39.4%). Approximately 26.67 million 90 bp paired-end raw reads (submit- Further, gene ontology (GO) annotation was used for finding GO ted to NCBI Short Read Archive (SRA) database, Accession No. terms [27]. Out of 23,060 Nr-annotated transcripts, a total of 6673 SRR343064) from ladybird larvae and adults were generated by Sol- transcripts were assigned to 46 functional groups in each of the exa/Illumina NGS. Raw reads that only had 3′ adaptor fragments three main categories (biological processes, cellular component and were removed before de novo assembly of sequence reads. These molecular function) of the GO classification (Table A.2; Fig. 1). We ob- clean reads were randomly clipped into 45-mer lengths for sequence served a large number of genes from the categories of ‘Cellular pro- assembly using the Velvet program [23]. These short 45-mers were cess’ (3721 members), ‘Cell part’ (4103 members), ‘Binding’ (4025 assembled, resulting in 88,717 contigs (Table A.1). The mean size of members) and ‘Catalytic activity’ (3096 members), whereas we ob- the contigs was 246 bp with lengths ranging from as small as 89 bp served fewer than 10 transcripts from the terms ‘Viral reproduction’, to as large as 5114 bp. It has been suggested that assembly with Vel- ‘Virion part’, ‘Auxiliary transport protein activity’ and ‘Electron carrier vet followed by Oases yields better contigs/transcripts [24]. Using the activity’ (Fig. 1). To analyze putative protein function, a COG database Oases program, the contigs were further assembled into 42,409 tran- was used for annotation [28]. A total of 6360 transcripts have a COG script isoforms with a mean size of 502 bp including 998 sequences classification (Table A.2). Among the 25 COG categories, the cluster larger than 2000 bp and 4830 sequences larger than 1000 bp (Table for ‘General function prediction’ represents the largest group (2050 A.1). Removal of short sequences (b100 bp in length) and partial members) followed by ‘Replication, recombination and repair’ overlapping sequences yielded a total of 38,369 non-redundant tran- (1003 members) and ‘Translation, ribosomal structure and biogene- scripts (including only the largest transcript isoform) with a mean sis’ (926 members). The categories ‘Nuclear structure’ and ‘Extracel- size of 453 bp (Table A.2). These non-redundant transcripts were pre- lular structures’ represent the smallest groups (fewer than 10 pared as a transcriptome database for finding insecticide stress- members) (Fig. 2). KEGG analysis identified the potential involve- relevant genes by DGE. ment of transcripts in biological pathways in the ladybird [29].In total, 13,455 transcripts were grouped into 219 known metabolic or 2.2. Annotation of all non-redundant transcripts signaling KEGG pathways (Table A.2). The most enriched pathways included purine metabolism, spliceosome, regulation of actin cyto- To identify putative functions, 38,369 non-redundant transcripts skeleton, and disease-relevant pathways. These annotations provide −5 were first aligned by BLASTx (E-value≤10 ) to protein databases in a valuable resource for investigating specific processes, functions the priority order of the NCBI non-redundant (Nr) database, Swissprot and pathways in ladybird research. database, Kyoto Encyclopedia of Genes and Genomes (KEGG) database and Cluster of Orthologous Groups (COG) database. Based on the most significant BLASTx hits, the coding sequences (CDs) were deduced. 2.3. Searching for insecticide stress-relevant transcripts The CDs of transcripts were translated into amino acid sequences using the standard codon table. Using this approach, approximately To gain insight into the molecular biology of the insecticide stress 23,248 transcripts (60.6% of all non-redundant transcripts) had reliable response in C. montrouzieri, transcripts related to insecticide targets CDs [25,26] (Table A.3). The size distribution of CD-containing tran- and metabolism were analyzed. From CD-containing non-redundant scripts was similar to Nr-annotated transcripts (Table A.1). These tran- transcripts, we noticed many genes linked to the insecticide stress re- scripts had a high potential for translation into functional proteins and sponse. Some transcripts were related to resistance mediated by

Fig. 1. Gene ontology classification of non-redundant transcripts. By alignment to GO terms, 6673 transcripts were assigned into 46 functional groups in each of the three main categories: biological process, cellular component, and molecular function. The left y-axis indicates the percentage of a specific category of genes that existed in the main category. Y. Zhang et al. / Genomics 100 (2012) 35–41 37

Fig. 2. Cluster of orthologous groups (COG) classification of putative proteins. Out of 23,248 transcripts, all 6360 putative proteins were classified functionally into at least 25 mo- lecular families in the COG database. complex multi-gene enzyme systems, such as esterases (including 25% of the tags were present between 10 and 100 copies, and more carboxylesterase, amidohydrolase, and phosphodiesterase); cyto- than 69% of the tags were present at between 2 and 10 copies (Fig. A.1). chrome P450s and glutathione S-transferases; insecticide targets, The two DGE libraries were annotated by tag mapping analysis such as ligand-gated ion channels (including nicotinic acetylcholine using 38,369 non-redundant transcripts from a transcriptome refer- receptors, glycine receptors, glutamate receptors, and GABA-gated ence database. From the distinct tags of the AS and NC libraries, ion channels) and voltage-gated ion channels (including sodium 29,524 (46.71%) and 23,180 (45.08%) of all distinct tags were mapped channels); and other enzymes involved in insecticide stress response, to the entire reference database (sense or anti-sense). For the tags such as catalase, NADH dehydrogenase, NADH oxidoreductase, tryp- that could be mapped to both DNA strands of the database, bidirec- sin and superoxide dismutase (Table 1). However, we did not find tional transcription, in which the ratio of the sense to anti-sense any acetylcholinesterase genes in our transcriptome database. As strand of the transcript was approximately 2:1, was found in both li- shown in Table 1, we found that the number of known sequences re- braries (Table A.4). 11,965 tag-mapped genes were matched in the lated to the insecticide response was smaller in ladybirds. Only the transcriptome reference database. The distribution of tags and gene carboxylesterase and NADH dehydrogenase genes in ladybirds were expression in the AS and NC groups is shown in Fig. A.2, which indi- found in the NCBI Nr database, which shows that much remains un- cated that only a few genes are highly expressed and that the expres- known in insecticide resistant gene research in ladybirds. sion level of a large number of genes is very low. To confirm whether the depth of transcriptome sequencing in the DGE libraries was suffi- 2.4. Gene expression profile analysis of responses to insecticide cient, a sequencing saturation analysis was performed. The number of genes mapped by tags begins to level off when the number of tags To obtain a global view of transcriptome responses after treatment reaches 2 million, indicating that the sequencing approached satura- with the insecticide acetamiprid, we analyzed variations in the gene ex- tion (Fig. A.3). pression profile between acetamiprid-treated and -untreated ladybirds To compare differentially expressed (DE) genes between libraries (as a negative control group) using DGE, a high-throughput tag- (AS/NC), the level of gene expression was determined by normalizing sequencing (Tag-seq) method. A total of 5.78 and 6.13 million raw the number of unambiguous tags in each library to Tags Per Million tags were generated, respectively, (submitted to NCBI SRA database, Ac- (TPM). To determine if insecticide stress resulted in significant cession No. SRR346079) from acetamiprid stress (AS) and negative con- changes in gene expression, differential DGE analysis between two trol (NC) groups using massively parallel sequencing on the Illumina groups was performed using a strict Bayesian algorithm [30]. The platform. After removing low quality reads, the total number of clean False Discovery Rate (FDR)≤0.001 and absolute value of log2Ratio≥1 tags per library ranged from 5.69 to 6.06 million, and 63,211 and were used as thresholds to judge significant differences in gene ex- 51,420 tag entities were obtained from the AS and NC groups (Table pression. Between the AS and NC libraries, 993 genes mapped by a A.4). The distribution of total tags and distinct tags over different tag total of 2,686 significantly changed tag entities were detected. Com- abundance categories exhibited very similar patterns. Among the dis- pared to the NC data set, 795 genes were up-regulated and 198 tinct tags, less than 6% had a copy number higher than 100, whereas genes were down-regulated in expression from the AS data set (Fig. 38 Y. Zhang et al. / Genomics 100 (2012) 35–41

Table 1 up-regulated genes (rab6 GTPase activating protein (RABGAP1), IplI- Genes related to the insecticide targets and metabolism. All homologs in ladybirds aurora-like kinase (AURKB), cytochrome P450 (P450), sulfate transporter (Coccinellidae) were from NCBI Nr database that show the number of the sequences (ST), AGAP003111-PA (AP) and glucosyl/glucuronosyl transferases with corresponding proteins (as of March 2012). (GGT)) from these 9 validated genes and another 2 down-regulated CD-containing Number of known sequences in genes (cytochrome P450 4V3 (CYP4V3) and chitinase 16 (CHI16)) for transcripts number ladybirds (Coccinellidae) qRT-PCR confirmation.ThevaluesfortheASsamplesarepresentedas Mixed-function oxidases thefoldchangeingeneexpressionnormalizedtothebeta-tubulingene Cytochrome P450 111 0 relative to the NC samples. The expression of these genes by qRT-PCR fi Glutathione S-transferase con rmed the direction of change detected by DGE analysis (Fig. 3). Glutathione 13 0 S-transferase 3. Discussion

Esterases Although NGS technologies have been developed for the rapid se- Carboxylesterase 11 3 Amidohydrolase 1 0 quencing and characterization of many transcriptomes, RNA-Seq and Phosphodiesterase 36 0 DGE have not been used in many organisms, including the ladybird. In this study, we applied transcriptome profiling to study insecticide stress Ligand-gated ion channels in the ladybird C. montrouzieri. We sequenced the C. montrouzieri trans- Nicotinic 11 0 acetylcholine criptome using Solexa/Illumina NGS, and obtained more than 2.4 billion receptor raw base sequences. The raw data were assembled de novo using the GABA-gated ion 60 Velvet and Oases programs. For the ladybird, as a non-model organism channel with an absence of a reference genome sequence, the assembly by Vel- Glycine receptor 1 0 vet followed by the Oases program was better than that of Velvet alone Glutamate receptor 18 0 Ligand-gated ion 70 or other programs based on assessment parameters such as N50 length, channel average contig length and sequence similarity with closely related spe- cies [24]. Velvet was developed as a novel method of performing de Voltage-gated ion channels Bruijn graph-based sequence assembly methods for very short reads Sodium channel 5 0 Voltage-gated ion 30 that can both remove errors and, in the presence of read pair informa- channel tion, resolve a large number of repeats [23]. This program resulted in the highest number of total contigs, but only 9% (7938 out of 88,717) Other enzymes of these were larger than 500 bp. Next, Oases which is a de novo trans- Catalase 2 0 criptome assembler designed to produce transcripts from short read se- NADH 27 63 dehydrogenase quencing technologies was used for uploading a preliminary assembly NADH 90 produced by Velvet, and then clustering these contigs into small groups. oxidoreductase Approximately 30% (12,483 out of 42,409) of transcript isoforms identi- Trypsin 26 0 fied by Oases were larger than 500 bp. Finally, 38,369 non-redundant Superoxide 50 dismutase transcripts were generated with lengths between 100 bp and 10,457 bp. We obtained 23,248 CD-containing non-redundant tran- scripts, which matched the Nr, Swissprot, KEGG, and COG databases. A.4). We found 829 (83.48%) genes could be annotated with CDs, The other unannotated transcripts represent novel genes whose func- whereas the other 164 (16.52%) genes did not have homologs in the tion has not yet been identified. Specifically, the Nr database had Nr database (by BLASTx with an E-value≤10− 5) (Table A.5). 23,060 (99.19%) BLAST results, which was the highest number of anno- Functional annotation of DE genes was performed to assign all the tated transcripts from these four databases (Table A.2). We noticed that genes to terms in the KEGG database. We compared DE genes with most of the Nr-annotated transcripts had top matches (first hit) with whole transcriptome background to search for genes involved in meta- sequences from the red flour (Tribolium) (Table A.3). These re- bolic or signal transduction pathways that were significantly enriched. sults can be explained by the fact that T. castaneum is the only beetle Among the differentially expressed genes between the AS and NC data species with a completely sequenced genome [31]. We matched more sets, 144 genes were involved in 310 biological process GO terms that than 69% of ladybird transcripts to the Swissprot database, 27% to the might be linked to the insecticide response and 591 genes were mapped COG database, 57% to the known pathways in the KEGG database and to 187 terms in the KEGG pathway database. Notably, specific enrich- ment of genes was observed for pathways (P-value≤0.05) involved in insecticide response metabolic or signal transduction pathways, such as antigen processing and presentation, ribosome, retinol metabolism, drug metabolism—other enzymes, oxidative phosphorylation, amino sugar and nucleotide sugar metabolism, metabolic pathways, protein processing in endoplasmic reticulum.

2.5. Experimental validation

To demonstrate the quality of the transcriptome data, RT-PCR ampli- fication was performed. We selected 10 CDs-annotated non-redundant transcripts from the top 20 up-regulated expression genes between AS and NC, and designed 10 pairs of primers for RT-PCR. In this analysis, all 10 primer pairs resulted in a band of the expected size and the identities Fig. 3. qRT‐PCR validation of DGE results. 6 up-regulated genes and 2 down-regulated of 9 of the 10 PCR products were confirmed by Sanger sequencing (one genes have been identified by qRT-PCR, including RABGAP1, AURKB, P450, ST, AP, GGT, fi PCRproductwasnotconcentratedenoughtosequence).Tocon rm the CYP4V3 and CHI16. The left y-axis indicates the relative expression level by qRT-PCR, differential DGE results by Solexa/Illumina sequencing, we selected 6 and the right y-axis indicates the log2Ratio of AS/NC by DGE. Y. Zhang et al. / Genomics 100 (2012) 35–41 39

28% to GO terms. These annotations showed that 292 transcripts are ho- broad range of biological functions that might be impacted in the mologous to known insecticide resistance genes, whereas 252 tran- ladybird to confer better tolerance of chemical insecticides [38].Justas scripts are enriched in various metabolic or signaling pathways. some studies demonstrated that may protect themselves from in- Further, many other genes related to insecticide metabolism and degra- secticides by cuticular protein thickening, leading to a reduction in insec- dation were matched to KEGG pathways, such as drug metabolism- ticide penetration [39,40], the cuticular protein 67B was found to be over other enzymes (206 genes), drug metabolism-cytochrome P450 (179 transcribed in ladybirds treated with insecticide. Otherwise, the response genes), metabolism of xenobiotics by cytochrome P450 (175 genes), to environmental stresses was a synergetic process depending on a series retinol metabolism (158 genes), steroid hormone biosynthesis (148 of proteins. We found that numerous genes associated with material and genes), limonene and pinene degradation (102 genes) and linoleic energy catabolism were over transcribed, including heat shock proteins acid metabolism (86 genes). (9 genes), ATP-binding cassette transporters (3 genes), ATP synthase To find more insecticide responsive genes, we used the acetamiprid- (1 gene). In addition to these annotated genes, 164 genes could not be treated ladybird group. Acetamiprid is a recently developed neonicotinoid matched with any existing genes. Novel genes involved in the insecticide insecticide that is frequently used against mealybugs and is widely used response might be identified from this group in the future. in the world due to its merits, such as lower toxicity and high activity against pests and insects [32,33].Asasignificant natural enemy of mealy- 4. Materials and methods bugs, C. montrouzieri is potentially threatened by the use of insecticides, especially acetamiprid. A global analysis of transcriptome variations asso- 4.1. Experimental insect and sample preparation ciated with a 2‐hour stressed adult ladybird to acetamiprid revealed their ability to adjust to modifications in their chemical environment. We se- Stock cultures of C. montrouzieri were obtained from the entomol- quenced 2 distinct cDNA libraries obtained from an acetamiprid- ogy institute of Sun Yet-sen University Guangzhou, China. They were stressed group and a negative control group, using the DGE method. By reared on mealybugs (Planococcus citri Risso) under normal condi- targeting a defined cDNA library, the DGE method can generate broader tions (25±1 °C; 14:10, light:dark; and 70±10% relative humidity) transcriptome coverage and a highernumberofcDNAtagspergene, in climate chambers. From these ladybirds, three larvae and three which can lead to more precise gene transcription data. Provided that a adults were collected as one sample for transcriptome sequencing. reference genome or transcriptome database is available and that the aim is to quantify transcript levels between different biological samples, this method is perfectly suited for deep transcriptome analysis [34].We Table 2 obtained 11,965 tag-mapped genes out of 38,369 transcripts (31.18%) Representatives of putative insecticide resistant genes as predicted by DGE. Limitations of all significantly different expressed genes between AS and NC are based on by mapping the tags to a transcriptome reference database. Many tran- FDR≤0.001 and the absolute value of log2Ratio ≥1. The log2Ratio (AS/NC) indicates scripts did not map to tags, most likely as a result of differences in gene the change of gene expression; a positive number means up-regulation and a expression at different development ladybird stages, distinct genes negative one means down-regulation. might be matched by the same tags and were possibly removed from Gene Length TPM- TPM- log2Ratio Annotation the data sets, and/or a CATG site does not exist in all genes. In our trans- ID NC AS (AS/NC) criptome database, 14,595 genes (38.04%) without CATG sites could not 6459 253 0.01 3.51 8.46 PREDICTED: similar to cytochrome generate 49 bp tags to sequence on the Solexa/Illumina platform. In addi- P450, partial tion, more than 53% of the distinct clean tags could not be mapped to the 3298 1467 1.15 5.62 2.29 PREDICTED: similar to NADPH transcriptome reference database. The occurrence of unknown tags was cytochrome P450 most likely due to the lack of ladybird genome sequences, the incomplete 1981 630 2.97 14.4 2.28 PREDICTED: similar to 2-phosphodiesterase NlaIII digestion during library preparation, many tags matching noncod- 1506 512 1.32 5.97 2.18 PREDICTED: similar to cytochrome ing RNAs and the usage of alternative polyadenylation and/or splicing P450 sites [35,36]. 4215 1121 6.6 27.22 2.04 PREDICTED: similar to NADPH Stress by acetamiprid treatment resulted in large alterations of the cytochrome P450 ladybird transcriptome profile, including significant up- or down- 3963 1800 1.48 5.62 1.92 Cytochrome P450 9Z4 1167 500 4.95 16.68 1.75 PREDICTED: similar to NADH- regulation of 993 transcripts. Among them, we found 24 known genes ubiquinone oxidoreductase ashi encoding detoxification enzymes (14 Cytochrome P450s; 1 Glutathione subunit S-transferase; 2 NADH oxidoreductases; 5 Trypsins; 1 Superoxide 1116 550 2.47 8.25 1.74 Cytochrome P450-like protein dismutase and 1 phosphodiesterase), which are involved in insecticide 929 1652 50.79 150.66 1.57 Cytochrome P450 6BQ13 2726 1585 4.29 11.24 1.39 Cytochrome P450 9Z4 response KEGG pathways, including limonene and pinene degradation, 2918 1454 7.92 19.84 1.32 Cytochrome P450-like protein retinol metabolism, drug metabolism-other enzymes, metabolic 1685 371 19.95 48.99 1.3 Cytochrome P450 monooxygenase pathways, metabolism of xenobiotics by cytochrome P450, drug 329 1624 41.89 98.68 1.24 PREDICTED: similar to cytochrome metabolism-cytochrome P450, steroid hormone biosynthesis, linoleic P450 monooxygenase acid metabolism, insect hormone biosynthesis, arachidonic acid metabo- 3421 855 9.4 21.95 1.22 PREDICTED: similar to Copper chaperone for superoxide dismutase lism, spliceosome regulation, glutathione metabolism, tyrosine metabo- (Superoxide dismutase copper lism, oxidative phosphorylation, complement and coagulation cascades, chaperone) neuroactive ligand–receptor interaction and diseases (Table 2). Howev- 2025 1113 55.74 120.1 1.11 PREDICTED: similar to glutathione er, these significantly differentially transcribed detoxification enzymes S-transferase 2697 1512 17.81 36.52 1.04 PREDICTED: similar to NADH- represented only a small proportion of the total transcripts that were ubiquinone oxidoreductase 42 kDa found in our transcriptome database. Although genes encoding the pri- subunit mary targets of acetamiprid (nicotinic acetylcholinesterase receptors) 3686 1730 7.42 14.93 1.01 Cytochrome P450 9Z4 were not found by DGE, similar transcriptome responses to this insecti- 3359 1639 30.84 14.75 −1.06 Cytochrome P450 monooxygenase cide may be partly related to similar effects generated by the alteration CYP4Q2 2703 1668 43.53 19.67 −1.15 Cytochrome P450 4V3 of cholinergic neuron function [32,37]. We found that 161 out of 993 4634 814 62 27.57 −1.17 Trypsin-7 DE genes mapped to 91 molecular functional GO terms, which were sim- 2512 1036 23.42 9.66 −1.28 Trypsin-7 flags: precursor ilarly affected by insecticide. Among them, members of the cytochrome 3889 904 21.11 8.08 −1.39 Trypsin P450 families and some detoxification enzymes that are frequently in- 39 426 569.91 156.98 −1.86 Trypsin inhibitor − volved in resistance to insecticide, and several other genes with a 1068 845 12.7 1.23 3.37 Trypsinogen precursor of ANTRYP7 40 Y. Zhang et al. / Genomics 100 (2012) 35–41

For DGE analysis, the experimental group was three adults treated their 5′ ends. The junction of Illumina adaptor 1 and CATG site is the rec- with 5 ng of the insecticide acetamiprid (obtained from Henan Planck ognition site of MmeI, which is a type of endonuclease with separated Ltd, China) per insect in pronotum for 2 h. In parallel, three ladybird recognition sites and digestion sites. It cuts at 17 bp downstream of adults that were acetamiprid-untreated were pooled as the negative the CATG site, producing tags with adaptor 1. After removing 3′ frag- control group. Finally, these samples were harvested and immediate- ments with magnetic bead precipitation, the Illumina adaptor 2 is ligat- ly frozen in liquid nitrogen. ed to the 3′ ends of tags, generating tags with different adaptors at both ends to form a tag library. After 15 cycles of linear PCR amplification, 4.2. RNA preparation for RNA-Seq 95 bp fragments are purified by 6% TBE PAGE Gel electrophoresis. After denaturation, the single-chain molecules are fixed onto the Total RNA was isolated using the UNlQ-10 Column Trizol Total RNA Illumina Sequencing Chip (flow cell). Each molecule grows into a Isolation Kit (Sangon) according to the manufacturer's protocol. RNA in- single-molecule cluster sequencing template through in situ amplifica- tegrity was confirmed using a 2100 Bioanalyzer (Agilent Technologies) tion. Four types of nucleotides, which are labeled by four colors, are with a minimum RNA integrity number (RIN) value of 7.0. The samples added and sequencing is performed using the sequencing by synthesis for transcriptome analysis were prepared using Illumina's kit following (SBS) method. Each tunnel generates millions of raw data with a se- the manufacturer's recommendations. Briefly, mRNA was purified from quencing length of 49 bp. 6 μg of total RNA (a mixture of RNA from larvae and adults at an equal ratio) using oligo (dT) magnetic beads. Fragmentation buffer was 4.5. Mapping DGE tags to reference transcriptome databases added for cutting mRNA into short fragments. Taking these short frag- ments as templates, random hexamer-primers were used to synthesize Before mapping reads to transcriptome databases, we filtered all first-strand cDNA. Second-strand cDNA was synthesized using buffer, raw sequence reads using the Illumina pipeline. All low quality tags, dNTPs, RNaseH and DNA polymerase I. Short fragments were purified such as tags with unknown sequences ‘N’, empty tags (sequence with the QiaQuick PCR extraction kit and eluted with EB buffer for end with only adaptor sequences), low complexity tags, and tags with repair and the addition of poly(A). Short fragments were ligated with only one copy (most likely resulting from sequencing errors) were re- sequencing adapters, and resolved by agarose gel electrophoresis. Suit- moved. A virtual library containing all of the possible CATG+17 able fragments were selected and purified and subsequently PCR ampli- bases length sequences was created using our transcriptome data- fied to create the final cDNA library template. base. All clean tags were mapped to the reference sequences and only 1 nucleotide mismatch was acceptable. Clean tags, which could 4.3. Analysis of the transcriptome results map to reference sequences were filtered from multiple genes. The remainder of the clean tags was designed unambiguous clean tags. The transcriptome was sequenced using Illumina HiSeq™ 2000. For gene expression analysis, the number of unambiguous clean Four fluorescently labeled nucleotides and a specialized polymerase tags for each gene was calculated and then normalized to TPM (the were used to determine the clusters base by base in parallel. The size number of transcripts per million clean tags) [41,42]; and the differ- of the library was approximately 200 bp and both ends of the libraries entially expressed tags were used for mapping and annotation. The were sequenced. The 90 bp raw PE reads were generated on the complete list of differentially expressed genes is shown in Table A.5. Illumina sequencing platform. Image deconvolution and quality value calculations were performed using the Illumina GA pipeline v1.6. The 4.6. Identification of differentially expressed genes raw reads were cleaned by removing adaptor sequences, empty reads and low quality sequences (reads with unknown sequences ‘N’). The A statistical analysis of the frequency of each tag in the different clean reads were assembled into non-redundant transcripts using the cDNA libraries was performed to compare gene expression in both in- de Bruijn graph algorithm and Velvet (http://www.ebi.ac.uk/~zerbino/ secticide stress and negative control conditions. A statistical compar- velvet/) [23]. After assessing different K-mer sizes, we found that the ison was performed with custom written scripts using the method 45-mers provided the best results for transcriptome assembly. Small described by Audic et al. [30]. The P value corresponds to the differen- K-mers make the graph very complex, whereas large K-mers can have tial gene expression test. FDR (False Discovery Rate) is a method to poor overlap in regions with low sequencing depth. The resulting con- determine the threshold of the P value in multiple tests and analyses tigs were joined into transcript isoforms using Oases (http://www.ebi. and is obtained by manipulating the FDR value. We used "FDR≤0.001 ac.uk/~zerbino/oases/), which has been developed specifically for the and the absolute value of log2Ratio≥1" as the threshold to judge the de novo assembly of transcriptomes using short reads. To obtain non- significance of gene expression differences. We performed cluster redundant transcripts, we removed short sequences (b100 bp in analysis of gene expression patterns with "cluster" [43] software length) and partially overlapping sequences. The resulting sequences and "Java Treeview" [44] software for clustering analysis of differen- were used for BLAST searches and annotation against the Nr protein da- tial gene expression patterns. In the gene expression profiling analy- tabase, the Swissprot database, the COG database and the KEGG data- sis, Gene Ontology enrichment analysis of functional significance was base using an E-value cut-off of 10−5. Functional annotation by GO applied using the hypergeometric test to map all differentially terms (www.geneontology.org) was analyzed by the Blast2go software. expressed genes to terms in the GO database, looking for significantly The COG and KEGG pathways annotations were performed using enriched GO terms in differentially expressed genes, and comparing Blastall software against the COG database and the KEGG database. them to the transcriptome database. For the pathway enrichment analysis, we mapped all differentially expressed genes to terms in 4.4. DGE library construction and sequencing the KEGG database and looked for significantly enriched KEGG terms compared to the transcriptome database. Tag library construction for the two ladybird samples (acetamiprid- treated group and negative control group) was performed in parallel 4.7. Experimental validation using the Illumina Gene Expression Sample Prep Kit. Briefly, mRNA was isolated from total RNA with magnetic oligo(dT) beads. First and To confirm non-redundant transcripts, the cDNA of 10 annotated second strand cDNA were synthesized and bead-bound cDNA was sub- non-redundant transcripts was amplified by RT-PCR. All PCR products sequently digested with the restriction enzyme NlaIII, which recognizes were purified using a Gel Extraction Kit (Qiagen), and validated by and cuts off the CATG sites. The cDNA fragments with 3′ ends were then Sanger sequencing. To confirm the DGE results, we designed 8 pairs purified with magnetic beads and the Illumina adapter 1 was added to of primer to perform qPCR analysis on 6 up-regulated genes and 2 Y. Zhang et al. / Genomics 100 (2012) 35–41 41 down-regulated genes. The primers are shown in Table A.6. The RNA [16] J. Marioni, C. Mason, S. Mane, M. Stephens, Y. Gilad, RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays, Genome samples used for the qPCR assays were the same as for the DGE ex- Res. 18 (2008) 1509–1517. periments and we also isolated RNA independently from biological [17] R.D. Morin, et al., Profiling the HeLa S3 transcriptome using randomly primed cDNA replicates. The first cDNA strand was synthesized from 1.0 μg total and massively parallel short-read sequencing, Biotechniques 45 (2008) 81–94. [18] B.R. Graveley, et al., The developmental transcriptome of Drosophila melanogaster, RNA by the PrimeScript RT reagent Kit (TaKaRa). The qPCR was per- Nature 471 (2011) 473–479. formed using a BIO-Rad IQ5 Real-Time PCR system (Bio-Rad) with [19] B. Feldmeyer, C.W. Wheat, N.K. Rezdorn, B. Rotter, M. Pfenninger, Short read SYBR-Green detection (SYBR Premix Ex Taq, TaKaRa) according to Illumina data for the de novo assembly of a non-model snail species trans- the manufacturer's instructions. The forward and reverse primers criptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of as- sembler performance, BMC Genomics 12 (2011) 317. used for qPCR are shown in Table A.6. Each reaction was run in tripli- [20] L.X. Xiang, D. He, W.R. Dong, Y.W. Zhang, J.Z. Shao, Deep sequencing-based trans- cate, after which the average threshold cycle (Ct) was calculated per criptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals fi sample. The beta-tubulin gene was used to normalize expression insight into the immune-relevant genes in marine sh, BMC Genomics 11 (2010) 472. levels, and the relative expression of genes was calculated using the [21] X.W. Wang, J.B. Luan, J.M. Li, Y.Y. Bao, C.X. Zhang, S.S. Liu, De novo characteriza- − △△ 2 Ct method. tion of a whitefly transcriptome and analysis of its gene expression during devel- Supplementary materials related to this article can be found on- opment, BMC Genomics 11 (2010) 400. [22] S.V. Anisimov, Serial analysis of gene expression (SAGE): 13 years of application line at http://dx.doi.org/10.1016/j.ygeno.2012.05.002. in research, Curr. Pharm. Biotechnol. 9 (2008) 338–350. [23] D.R. Zerbino, E. Birney, Velvet: algorithms for de novo short read assembly using de Bruijn graphs, Genome Res. 18 (2008) 821–829. [24] R. Garg, R.K. Patel, A.K. Tyagi, M. Jain, De novo assembly of chickpea trans- Acknowledgments criptome using short reads for gene discovery and marker identification, DNA Res. 18 (2011) 53–63. We thank the Beijing Genomics Institute (BGI) Shenzhen and Prof. [25] J. Ye, S. McGinnis, T.L. Madden, BLAST: improvements for better sequence analy- sis, Nucleic Acids Res. 34 (2006) W6–W9. Suhua Shi in Sun Yat-sen University for providing us with technical [26] C. Iseli, C.V. Jongeneel, P. Bucher, ESTScan: a program for detecting, evaluating, assistance in high throughout sequencing and bioinformatics analy- and reconstructing potential coding regions in EST sequences, Proc. Int. Conf. sis. This research was supported by a grant from the National Natural Intell. Syst. Mol. Biol. (1999) 138–148. Science Foundation of China (Grant No. 31171899), and the Knowl- [27] J. Ye, et al., WEGO: a web tool for plotting GO annotations, Nucleic Acids Res. 34 (2006) W293–W297. edge Innovation Program of Chinese Academy of Sciences (Grant [28] R.L. Tatusov, M.Y. Galperin, D.A. Natale, E.V. Koonin, The COG database: a tool for No. KSXC2-EW-B-02). genome-scale analysis of protein functions and evolution, Nucleic Acids Res. 28 (2000) 33–36. [29] M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, M. Hattori, The KEGG resource for deciphering the genome, Nucleic Acids Res. 32 (2004) D277–D280. References [30] S. Audic, J.M. Claverie, The significance of digital gene expression profiles, Ge- nome Res. 7 (1997) 986–995. [1] R.X. Jiang, S. Li, Z.P. Guo, H. Pang, Research status of Cryptolaemus montrouzieri [31] S. Richards, et al., The genome of the model beetle and pest Tribolium castaneum, Mulsant and establishing its description criteria, J. Environ. Entomol. 31 (2009) Nature 452 (2008) 949–955. 238–247. [32] M. Tomizawa, J.E. Casida, Neonicotinoid insecticide toxicology: mechanisms of [2] L.Y. Li, The research and application prospects of Cryptolaemus montrouzieri in selective action, Annu. Rev. Pharmacol. Toxicol. 45 (2005) 247–268. China, Nat. Enemies Insects 15 (1993) 142–152. [33] Q.X. Zhou, Y.J. Ding, J.P. Xiao, Sensitive determination of thiamethoxam, imidacloprid [3] T.R. Babu, G. Ramanamurthy, Residual toxicity of pesticides to the adults of and acetamiprid in environmental water samples with solid-phase extraction Cryptolaemus montrouzieri Mulsant (Coccinellidae: Coleoptera), Int. Pest Control packed with multiwalled carbon nanotubes prior to high-performance liquid chro- 41 (1999) 137–138. matography, Anal. Bioanal. Chem. 385 (2006) 1520–1525. [4] R.A. Cloyd, A. Dickinson, Effect of insecticides on mealybug destroyer (Coleoptera: [34] L. Hanriot, et al., A combination of LongSAGE with Solexa sequencing is well suited to Coccinellidae) and parasitoid Leptomastix dactylopii (Hymenoptera: Encyrtidae), explore the depth and the complexity of transcriptome, BMC Genomics 9 (2008) 418. natural enemies of citrus mealybug (Homoptera: Pseudococcidae), J. Econ. [35] Q. Pan, O. Shai, L.J. Lee, B.J. Frey, B.J. Blencowe, Deep surveying of alternative splic- Entomol. 99 (2006) 1596–1604. ing complexity in the human transcriptome by high throughput sequencing, Nat. [5] A. Urbaneja, et al., Efficacy of five selected acaricides against Tetranychus urticae Genet. 40 (2008) 1413–1415. (Acari: Tetranychidae) and their side effects on relevant natural enemies occur- [36] J.S. Mattick, The genetic signatures of noncoding RNAs, PLoS Genet. 5 (2009) ring in citrus orchards, Pest Manage. Sci. 64 (2008) 834–842. e1000459. [6] M. Mani, V.J. Lakshmi, A. Krishnamoorthy, Side effects of some pesticides on the [37] V. Raymond-Delpech, K. Matsuda, B.M. Sattelle, J.J. Rauh, D.B. Sattelle, Ion chan- adult longevity, progeny production and prey consumption of Cryptolaemus nels: molecular targets of neuroactive insecticides, Invert. Neurosci. 5 (2005) montrouzieri Mulsant (Coccinellidae, Coleoptera), Indian J. Plant Prot. 25 (1997) 119–133. 48–51. [38] J. David, et al., Transcriptome response to pollutants and insecticides in the den- [7] N. Cloonan, S.M. Grimmond, Transcriptome content and dynamics at single- gue vector Aedes aegypti using next-generation sequencing technology, BMC Ge- nucleotide resolution, Genome Biol. 9 (2008) 234. nomics 11 (2010) 216. [8] O. Morozova, M.A. Marra, Applications of next-generation sequencing technolo- [39] R.F. Djouaka, et al., Expression of the cytochrome P450s, CYP6P3 and CYP6M2 are gies in functional genomics, Genomics 92 (2008) 255–264. significantly elevated in multiple pyrethroid resistant populations of Anopheles [9] Z. Wang, M. Gerstein, M. Snyder, RNA-Seq: a revolutionary tool for trans- gambiae s.s. from Southern Benin and Nigeria, BMC Genomics 9 (2008) 538. criptomics, Nat. Rev. Genet. 10 (2009) 57–63. [40] J. Vontas, et al., Transcriptional analysis of insecticide resistance in Anopheles [10] F. Ozsolak, P.M. Milos, RNA sequencing: advances, challenges and opportunities, stephensi using cross-species microarray hybridization, Insect Mol. Biol. 16 Nat. Rev. Genet. 12 (2011) 87–98. (2007) 315–324. [11] W. Huang, G. Marth, EagleView: a genome assembly viewer for next-generation [41] P.A. ‘t Hoen, et al., Deep sequencing-based expression analysis shows major ad- sequencing technologies, Genome Res. 18 (2008) 1538–1543. vances in robustness, resolution and inter-lab portability over five microarray [12] U. Nagalakshmi, et al., The transcriptional landscape of the yeast genome defined platforms, Nucleic Acids Res. 36 (2008) e141. by RNA sequencing, Science 320 (2008) 1344–1349. [42] A.S. Morrissy, et al., Next-generation tag sequencing for cancer gene expression [13] B.T. Wilhelm, et al., Dynamic repertoire of a eukaryotic transcriptome surveyed at profiling, Genome Res. 19 (2009) 1825–1835. single-nucleotide resolution, Nature 453 (2008) 1239–1243. [43] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of [14] A. Mortazavi, B.A. Williams, K. McCue, L. Schaeffer, B. Wold, Mapping and quanti- genome-wide expression patterns, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) fying mammalian transcriptomes by RNA-Seq, Nat. Methods 5 (2008) 621–628. 14863–14868. [15] R. Lister, et al., Highly integrated single-base resolution maps of the epigenome in [44] A.J. Saldanha, Java Treeview—extensible visualization of microarray data, Bioin- Arabidopsis, Cell 133 (2008) 523–536. formatics 20 (2004) 3246–3248.