Pyrosequencing of the midgut transcriptome of the poplar leaf tremulae reveals new families in Coleoptera Yannick Pauchet, Paul Wilkinson, Manuella van Munster, Sylvie Augustin, David Pauron, Richard Ffrench-Constant

To cite this version:

Yannick Pauchet, Paul Wilkinson, Manuella van Munster, Sylvie Augustin, David Pauron, et al.. Pyrosequencing of the midgut transcriptome of the poplar Chrysomela tremulae reveals new gene families in Coleoptera. Biochemistry and Molecular Biology, Elsevier, 2009, 39 (5-6), pp.403-413. ￿10.1016/j.ibmb.2009.04.001￿. ￿hal-02668241￿

HAL Id: hal-02668241 https://hal.inrae.fr/hal-02668241 Submitted on 31 May 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ARTICLE IN PRESS IB2039_proof 16 April 2009 1/11

Insect Biochemistry and Molecular Biology xxx (2009) 1–11

Contents lists available at ScienceDirect

Insect Biochemistry and Molecular Biology

journal homepage: www.elsevier.com/locate/ibmb

55 1 56 2 Pyrosequencing of the midgut transcriptome of the poplar leaf beetle 57 3 58 4 Chrysomela tremulae reveals new gene families in Coleoptera 59 5 60 6 Yannick Pauchet a,*, Paul Wilkinson a, Manuella van Munster b, Sylvie Augustin c, 61 62 7 David Pauron b, Richard H. ffrench-Constant a 8 63 9 a School of Biosciences, University of Exeter in Cornwall, Tremough Campus, Penryn TR10 9EZ, UK 64 b 10 INRA, UMR 1301 Interactions Biotiques et Sante´ Ve´ge´tale, 06903 Sophia Antipolis cedex, France 65 c INRA, UR 633 Zoologie Forestie`re, F-45000 Orle´ans, France 11 66 12 67 13 68 article info abstract 14 69 15 70 Article history: The insect midgut is the primary target site for Bt-derived insecticides and Bt alternatives. However, 16 71 Received 10 February 2009 despite extensive recent study, the precise role and nature of different Bt receptors remains a subject of 17 72 Received in revised form considerable debate. This problem is fuelledPROOF by a lack of understanding of the expressed in the 18 2 April 2009 73 insect midgut and their physiological roles. The poplar leaf beetle, Chrysomela tremulae, is an important Accepted 2 April 2009 74 19 model for understanding the mode of action of, and resistance to, coleopteran-specific Bt toxins and 20 currently shows the only known naturally occurring case of resistance to Cry3A toxins. Moreover it 75 Keywords: 21 belongs to the same family as the corn rootworm, Diabrotica virgifera, an economically important beetle 76 Insect midgut 77 22 Chitin deacetylase pest and target of hybrid corn expressing Cry3 toxins. Pyrosequencing is a fast and efficient way of 23 Cellulase defining the transcriptome of specific insect tissues such as the larval midgut. Here we use 454 based 78 24 Endopolygalacturonase pyrosequencing to sample the larval midgut transcriptome of C. tremulae. We identify candidate genes of 79 25 Chrysomela putative Bt receptors including transcripts encoding cadherin-like , N and 80 Bt resistance 26 alkaline phosphatase. We also describe a wealth of new transcripts predicting rapidly evolving gene 81 families involved in plant tissue digestion, which have no homologs in the genome of the stored product 27 82 pest the Red Flour beetle, Tribolium castaneum. 28 83 Ó 2009 Published by Elsevier Ltd. 29 84 30 85 31 86 32 87 33 1. Introduction (Bowen et al.,1998; ffrench-Constant et al., 2007). However, despite 88 34 extensive recent study, the precise role and nature of different Bt 89 35 Traditionally the major target site for small molecule insecticides receptors remains a subject of considerable debate (Pigott and Ellar, 90 36 has been the insect nervous system (Buckingham et al., 2005; Dong, 2007; Soberon et al., 2009). This problem is fuelled by a lack of 91 37 2007; ffrench-Constant et al., 2004; Millar and Denholm, 2007; understanding of the genes expressed in the insect midgut and their 92 38 Raymond-Delpech et al., 2005). However with the advent of Bt respective roles in digestion, response to invading micro-organ- 93 39 toxins, and Bt transformed insect resistant crops, the midgut has isms, homeostatic regulation, and the production and remodeling 94 40 become the primary focus for the development of novel insecticidal of the peritrophic membrane. We aim to address this problem by 95 41 proteins (Pigott and Ellar, 2007). Interest in insect midgut physi- describing the midgut transcriptome of representatives of key 96 42 ology has been strengthened by concerns over emerging Bt resis- insect pest groups such as leaf feeding and Lepidoptera. 97 43 tance and by ongoing research into a growing number of candidate Beetles represent approximately half of all insect species recor- 98 44 Bt receptors and how these may become altered in Bt resistant ded to date and within this group the chrysomelids are one of the 99 45 (Pigott and Ellar, 2007). Identifying and understanding largest families (Hunt et al., 2007), many of which are important 100 46 midgut receptors will also be important in the study of new toxins defoliating and/or root damaging pests such as the corn rootworm 101 47 proposed as alternatives to toxinsUNCORRECTED from Bacillus, such as the Toxin Diabrotica virgifera (Gray et al., 2009). With the recent sequencing of 102 48 complexes (Tc’s) from Photorhabdus and Xenorhabdus bacteria the complete genome of the Red Flour Beetle, Tribolium castaneum, 103 49 we are now uniquely poised to understand more about the geno- 104 50 mics and transcriptomics of beetles. Moreover, given the rather 105 Abbreviations: EST, expressed sequence tag; PM, peritrophic membrane; unusual lifestyle of T. castaneum in feeding exclusively on stored 106 51 Bt, Bacillus thuringiensis toxin; SNP, single nucleotide polymorphism. 52 * Corresponding author. Tel.: þ44 1326 371 852; fax: þ44 1326 253 638. products, comparisons with the commoner lifestyle of leaf-, root- or 107 53 E-mail address: [email protected] (Y. Pauchet). bark-feeding are a priority. The poplar leaf beetle, Chrysomela 108 54 109 0965-1748/$ – see front matter Ó 2009 Published by Elsevier Ltd. doi:10.1016/j.ibmb.2009.04.001

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 2/11

2 Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11

110 tremulae, is an important model for understanding Bt receptors and was further purified by using the RNeasy MinElute Clean up Kit 175 111 Bt resistance, and this is currently the only known beetle for which (Qiagen) following the manufacturer’s protocol and eluted in 20 ml 176 112 Cry3A resistance alleles have been found in the field (Genissel et al., of RNA storage solution (Ambion). 177 113 2003a). A Cry3A resistant C. tremulae strain was selected on Bt 178 114 transformed poplar trees expressing the Cry3A toxin (Genissel et al., 2.2. Midgut cDNA library preparation 179 115 2003b). This strain was derived from an isofemale line established 180 116 from field-caught insects that generated F2 offspring that survived Full-length, enriched, cDNAs were generated from 2 mg of total 181 117 on this Bt poplar clone (Augustin et al., 2004). Resistance to Cry3A in RNA using the SMART PCR cDNA synthesis (BD Clontech) 182 118 C. tremulae is under control of a single, completely recessive, following the manufacturer’s protocol. Reverse transcription was 183 119 autosomal trait (Augustin et al., 2004), suggesting that changes in performed with the PrimeScript reverse transcription 184 120 a single receptor, or other gene product, may be involved in resis- (Takara) for 60 min at 42 C and 90 min at 50 C. In order to prevent 185 121 tance. Resistance to Cry3A in the related can be over-representation of the most common transcripts, the resulting 186 122 overcome using Cyt1Aa (Federici and Bauer, 1998), a from double-stranded cDNAs were normalized using the Kamchatka 187 123 a different class of Bt toxins, suggesting some potential specificity in crab duplex-specific nuclease method (Trimmer cDNA normaliza- 188 124 the nature of the resistance mechanism. In the Lepidoptera, Bt tion kit, Evrogen) (Zhulidov et al., 2004). The resulting normalized 189 125 resistance has been associated with changes in midgut associated cDNA midgut library was used for 454 pyrosequencing (Margulies 190 126 aminopeptidase N, cadherin-like proteins, alkaline phosphatase et al., 2005). 191 127 and, via work in model nematodes, midgut glycolipids (Pigott and 192 128 Ellar, 2007). However the molecular basis of Cry3A resistance in 2.3. Pyrosequencing, sequence preprocessing and assembly 193 129 beetles has not been reported to date. 194 130 Pyrosequencing is a fast and efficient way of defining the For 454 pyrosequencing, a cDNA aliquot was sent to the Advanced 195 131 transcriptome of specific insect tissues or cells and has recently Genomics facility at the University of Liverpool (http://www.liv.ac. 196 132 been used with great effect to define the transcriptome of immune uk/agf). A single full plate run was performed on the 454 GS-FLX 197 133 related tissues in Manduca sexta larvae (Zou et al., 2008). 454 based pyrosequencer (Roche Applied Science) using 3 mg of normalized 198 134 sequencing of Expressed Sequence Tags (ESTs) greatly increases cDNAs processed by the ‘‘shotgun’’ method. The reads obtained were 199 135 the number and depth of sequence contigs that can be achieved preprocessedPROOF by removing PolyA tails and SMART adapters using 200 136 using capillary based sequencing. Further, normalization of cDNA custom written Perl scripts kindly provided by Mark Blaxter of the 201 137 pools prior to sequencing reduces the frequency of highly abun- University of Edinburgh. The processed reads were then clustered 202 138 dant transcripts and ensures that a more even representation of using the MIRA v2.9.26x3 assembler with the ’’de novo, normal, 203 139 transcripts is sampled during sequencing. Thus 454 sequencing EST, 45400 parameters. Specifying a minimum read length of 40 nt, 204 140 looks poised to geometrically increase the number of known a minimum sequence overlap of 40 nt and a minimum percentage 205 141 transcripts in many insect tissues. Here we use 454 based pyro- overlap identity of 80%. 206 142 sequencing to sample the larval midgut transcriptome of C. trem- 207 143 ulae. We identify several classes of putative Bt receptors such as 2.4. Blast homology searches and sequence annotation 208 144 transcripts encoding putative cadherin-like proteins, aminopep- 209 145 tidase N and alkaline phosphatase. We also describe a wealth of Homology searches (BLASTx and BLASTn) of unique sequences 210 146 putative new gene products which are part of rapidly evolving and functional annotation by terms (GO; www. 211 147 gene families involved in plant cell wall digestion. These new gene geneontology.org), InterPro terms (InterProScan, EBI), enzyme 212 148 families are not present in the genome of T. castaneum but are classification codes (EC), and metabolic pathways (KEGG, Kyoto 213 149 common in EST collection from other plant feeding beetles, Encyclopedia of Genes and Genomes) were determined using the 214 150 suggesting that additional leaf feeding genomic models, such as BLAST2GO software suite v2.3.1 (www.blast2go.de)(Conesa and 215 151 C. tremulae, will be needed in order to fully understand the Go¨tz, 2008). Homology searches were performed remotely on the 216 152 physiology of the Coleoptera. NCBI server through QBLAST, and followed a sequential strategy. 217 153 First, sequences were searched against an NCBI non-redundant (nr) 218 154 2. Methods and materials protein database via BLASTx using an E-value cut-off of 103 and 219 155 selecting predicted polypeptides of a minimum length of 10 amino 220 156 2.1. Insect rearing and midgut RNA preparation acids. Second, sequences for which no BLASTx hit were searched 221 157 again by BLASTn, against an NCBI nr nucleotide database using an 222 158 C. tremulae larvae for midgut dissection were obtained from E-value cut-off of 1010. 223 159 a strain established from a single mated pair collected in Vatan, For gene ontology mapping, the program extracts the GO terms 224 160 France, in 1999 (Augustin et al., 2004). The strain (Vatan125), origi- associated with homologies identified with NCBI’s QBLAST and 225 161 nated from the offspring of an isofemale line lacking alleles confer- returns a list of GO annotations represented as hierarchical cate- 226 162 ring resistance to the Cry3Aa toxin, was maintained in standard gories of increasing specificity. BLAST2GO allows the selection of 227 163 rearing conditions, in a growth chamber at 20 C with a photoperiod a significance level for the False Discovery Rate (FDR) which was 228 164 of 16:8 (L:D) at URZF-INRA Orle´ans.UNCORRECTED Larvae and adults were reared on used as a cut-off at the 0.05% probability level. Then, GO terms were 229 165 fresh mature leaves detached from greenhouse-grown poplar hybrid modulated using the annotation augmentation tool ANNEX (Myhre 230 166 clone (Populus tremula Populus tremuloides, INRA #353-38). Three et al., 2006) followed by GOSlim. GOSlim consists of a subset of the 231 167 day old third-instar larvae were used for dissection. gene ontology vocabulary encompassing key ontological terms and 232 168 For RNA isolation, midguts were dissected from 10 larvae into a mapping function between the full GO and the GOSlim. Here, we 233 169 PBS buffer. Midguts were cleared of food, peritrophic membrane used the ‘‘generic’’ GOSlim mapping (goslim_generic.obo) available 234 170 and Malpighian tubules, before being flash frozen in liquid nitrogen in BLAST2GO. The data presented represent the level 2 analysis, 235 171 and stored at 80 C prior to use. RNA was subsequently isolated illustrating general functional categories. Enzyme classification 236 172 using TRIzol reagent (Invitrogen) according to the manufacturer’s codes and KEGG metabolic pathway annotations are generated 237 173 protocol. Genomic DNA contamination was removed by DNAse from the direct mapping of GO terms to their enzyme code equiv- 238 174 treatment (TURBO DNAse, Ambion) for 30 min at 37 C. Midgut RNA alents. Finally, InterPro searches were performed remotely from 239

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 3/11

Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11 3

240 BLAST2GO to the InterProEBI web server. The default settings of 2.6. Phylogenetic analysis 305 241 BLAST2GO were used in every annotation step. For more details 306 242 on BLAST2GO functionalities see Conesa and Go¨tz (2008). MAFFT software (Multiple Alignment using Fast Fourier Trans- 307 243 form, www.ebi.ac.uk/mafft/) was used to perform multiple sequence 308 244 309 2.5. Capillary based sequencing of highly abundant messages alignment of the chitin deacetylase catalytic domains prior to 245 subsequent phylogenetic analysis. Predicted protein sequences 310 246 311 The over-abundant w1.3 kb band observed in the non- corresponding to cellulases, deduced from C. tremulae 454 contigs 247 312 normalized cDNA lane (Fig. 1) was gel-extracted and purified using and coleopteran-specific ESTs collected from Genbank db_est data- 248 313 the Illustra GFX PCR DNA and Gel Band Purification kit (GE base, were screened for the presence of a putative signal peptide 249 314 healthcare). The resulting DNA was re-amplified by 25 PCR cycles using the SignalP software. After removal of the predicted signal 250 315 using the 50 PCR Primer II A provided in the SMART PCR cDNA peptide, sequences were aligned using MAFFT software prior to 251 316 synthesis kit (BD Clontech). After checking on a 1.5% agarose gel Phylogenetic analysis. The MEGA4.0 software (Tamura et al., 2007) 252 317 that only one band of w1300 bp was re-amplified, the PCR product was used to construct the consensus phylogenetic tree using the 253 318 was cloned into the pCR4 TOPO vector (Invitrogen) and ligations UPGMA method (Unweighted Pair Method with Arithmetic Mean). 254 319 transformed into chemically competent Escherichia coli TOP10 To evaluate the branch strength of the phylogenetic tree, bootstrap 255 320 (Invitrogen). For capillary sequencing, DNA template was prepared analysis of 1000 replications was performed. 256 321 on a Nucleoplex automated system using the Nucleoplex Plasmid 257 322 Mini Prep kit (Tepnel Life Sciences). Capillary sequencing was 3. Results 258 323 carried out on an ABI 3130xl automatic DNA sequencer (Applied 259 324 Biosystems). Unique cDNAs found in at least two clones were 3.1. Pyrosequencing, assembly and annotation 260 325 fully sequenced by primer walking. The unique cDNA sequences 261 326 obtained were searched by BLAST and fully annotated using Normalization of the midgut cDNA resulted in reduction of the 262 327 BLAST2GO. Sequences were submitted to Genbank with accession over-abundant 0.8–3.0 kb transcripts and production of an even 263 328 numbers FJ654706–FJ654727. distribution of transcripts ranging from 0.5 to 3.0 kb in size (Fig. 1). 264 454-mediated pyrosequencing of the normalized library, using 329 265 a full sequencingPROOF plate of a Roche GS-FLX sequencer, generated 330 266 a total of 264,698 reads of raw nucleotide sequence data with an 331 267 average read length of 226 bp (Table 1). After quality scoring using 332 268 MIRA v2.9.26x3, 210,885 of these reads were taken forward for 333 269 assembly. Assembly resulted in 23,238 contigs and a residual 87 334 270 singletons, defined as sequences that did not assemble into a contig 335 271 using the defined assembly parameters. The mean contig size was 336 272 418 bp, with contigs ranging from as small as 40 bp to as large as 337 273 2959 bp (Table 1). 338 274 For BLAST annotation, contigs were first searched using BLASTx 339 275 against the non-redundant NCBI protein database using a cut-off 340 276 E-value of 103. Contigs returning below cut-off BLASTx results 341 277 were then re-searched using BLASTn against the non-redundant 342 278 NCBI nucleotide database using a cut-off E-value of 1010, using this 343 279 sequential approach, just over half (56.1%) of all contigs returned an 344 280 above cut-off BLAST result (Table 1). 345 281 346 282 3.2. Functional classifications of predicted proteins 347 283 348 284 The E-value and similarity distributions of the C. tremulae BLAST 349 285 hits against the non-redundant database are shown in Fig. 2. Most 350 286 (67%) of the BLAST hits are to the T. castaneum genome (Fig. 2C), as to 351 287 date this is the only fully sequenced beetle genome and therefore T. 352 288 castaneum sequences currently represent the vast majority of beetle 353 289 sequences in current databases. Gene ontology (GO) assignments 354 290 355 291 356 292 Table 1 357 Q2 293 Summary statistics for C. tremulae midgut EST assembly and annotation. 358 294 UNCORRECTEDAssembly 359 295 Total number of reads 264,698 360 296 Average read length 226 bp 361 Number of reads that entered in the assembly1 210,885 297 362 Total number of contigs 23,238 298 Total number of singlets 87 363 299 Average contig size (range) 418 bp (40–2959 bp) 364 300 Fig. 1. Agarose gel of C. tremulae midgut cDNA before (non-norm) and after (norm) 365 Annotation normalization. Note the high relative abundance of several discrete bands in the non- 301 % contigs with a blast hit against nr2 56.08% 366 normalized sample which disappear following normalization. The over-abundant 302 % contigs with at least 1 GO term 28.73% 367 transcripts (w1.3 kb) were excised from the gel, cloned and sequenced via capillary % contigs with an EC number 6.65% 303 sequencing to characterize highly abundant larval midgut transcripts (see text). Size 368 % contigs with at least one IPR 26.81% 304 markers are shown at left. 369

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 4/11

4 Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11

370 and enzyme classifications (EC) were used to classify the functions of 3.3. Identification of microsatellite repeats 435 371 the predicted midgut proteins (Fig. 3). A dominance of metabolic, 436 372 catalytic and binding activities is predicted (Fig. 3B and C), as Microsatellite loci with 5 di-, tri-, tetra- and pentanucleotide 437 373 expected for midgut proteins. Within the range of enzyme activities repeat units were identified from 67 C. tremulae 454 contigs. Thanks 438 374 dominate accounting for over half (51%) of midgut to the high coverage enabled by 454 pyrosequencing, variation in 439 375 (Fig. 3D). The InterPro database was also used to classify the the number of tandem repeats could be assessed directly from the 440 376 likely functions of predicted midgut proteins and a summary of the assembly of 3 contigs (Fig. 5). These 3 loci were found in the 50-UTR 441 377 20 most frequent classifications is shown in Table 2. Of these, of a transcript similar to a T. castaneum sodium-dependent phos- 442 378 enzymes involved in protein digestion (including a range of cysteine phate transporter XP_974340.1 (Fig. 5A), the 30-UTR of a homolog of 443 379 and serine proteinases) or xenobiotic metabolism (cytochrome a Nasonia vitripennis predicted protein XP_001607853.1 (Fig. 5B), 444 380 P450s, glutathione-S- and carboxylesterases) are domi- and the CDS of a homolog of a Bombyx mori fumarylacetoacetate 445 381 nant, as again predicted for the role of the insect gut in digestion and NP_001103763.1, matching codons for Tyr (TAT) and Ile 446 382 xenobiotic metabolism. In Table 3, we list candidate genes of interest (ATA) amino acids (Fig. 5C). 447 383 for different functional roles discussed below, including cell wall 448 384 degradation, putative Bt receptors and peritrophic membrane 3.4. Capillary sequencing of abundant midgut transcripts 449 385 associated proteins. Six contigs predict a C. tremulae cadherin-like 450 386 protein showing 40–64% amino acid identity to other coleopteran Following the excision of abundant 1.2–1.5 kb transcripts from 451 387 and lepidopteran cadherin-like proteins. Contig 2921 shows 64% the agarose gel of non-normalized midgut cDNA, the resulting 452 388 identity and 80% similarity to the cadherin-like protein from D. vir- product was cloned and individual cDNAs sequenced via conven- 453 389 gifera (Sayed et al., 2007) and is therefore a candidate Bt receptor. An tional capillary sequencing. A representative sample of the most 454 390 alignment of contig 2921 with coleopteran cadherin-like proteins is abundant cDNAs recovered is listed in Table 4, whilst a complete list 455 391 shown in Fig. 4. The predicted protein sequence of contig 2921 of transcripts and their associated InterPro predictions are given in 456 392 corresponds to part of the transmembrane spanning repeats and to the Supplementary materials (Table S1 and S2). The most abundant 457 393 the whole cytosolic region of cadherin-like proteins. transcripts, with 40 clones recovered, were two isoforms of a chitin 458 394 459 395 PROOF 460 396 461 397 462 398 463 399 464 400 465 401 466 402 467 403 468 404 469 405 470 406 471 407 472 408 473 409 474 410 475 411 476 412 477 413 478 414 479 415 480 416 481 417 482 418 483 419 484 420 485 421 486 422 487 423 488 424 UNCORRECTED 489 425 490 426 491 427 492 428 493 429 494 430 495 431 496 432 Fig. 2. Summary of homology searches (BLASTx and BLASTn) of C. tremulae larval midgut 454 data against the non-redundant database at NCBI. (A) E-value distribution of the top 497 BLAST hit for each unique sequence. Cut-offs used in each case were 103 for BLASTx and 1010 for BLASTn. (B) Similarity distribution of the top BLAST hit for each unique sequence. 433 (C) Species distribution of the top BLAST hit for each unique sequence. Note that nearly 67% of top hits are to the beetle Tribolium whose complete genome has recently been 498 434 sequenced. 499

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 5/11

Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11 5

500 565 501 566 502 567 503 568 504 569 505 570 506 571 507 572 508 573 509 574 510 575 511 576 512 577 513 578 514 579 515 580 516 581 517 582 518 583 519 584 520 585 521 586 522 587 523 588 524 589 525 PROOF 590 526 591 527 592 528 593 529 594 530 595 531 596 532 597 533 598 534 599 535 Fig. 3. Gene Ontology (GO) assignments and Enzyme Classifications (EC) for the C. tremulae larval midgut transcriptome. (A) Cellular component GO terms at level 2. (B) Biological 600 536 process GO terms at level 2. (C) Molecular function GO terms at level 2. (C) General EC terms: (EC:1.x.x), Transferases (EC:2.x.x), Hydrolases (EC: 3.x.x), 601 (EC:4.x.x), (EC:5.x.x), and (EC:6.x.x). Note that one sequence can be associated with more than one GO term (see text for discussion). 537 602 538 603 539 604 540 605 541 606 542 607 543 608 Table 2 544 Summary of TOP20 of InterPro families and domains represented in the C. tremulae larval midgut transcriptome. 609 545 610 546 Family Domain 611 547 InterPro Frequency Description InterPro Frequency Description 612 548 IPR013128 51 Peptidase C1A, papain IPR016040 66 NAD(P)-binding 613 549 IPR002198 39 Short-chain dehydrogenase/reductase IPR002018 52 Carboxylesterase, type B 614 IPR013753 33 Ras IPR000668 45 Peptidase C1A, papain C-terminal 550 615 IPR001806 30 Ras GTPase IPR017853 44 Glycoside hydrolase, catalytic core 551 IPR001128 25 Cytochrome P450 IPR002048 41 Calcium-binding EF-hand 616 552 IPR001360 24 Glycoside hydrolase, family 1 IPR011009 41 Protein kinase-like 617 553 IPR001993 24 Mitochondrial substrate carrier IPR000719 38 Protein kinase, core 618 554 IPR002347 24 Glucose/ribitol dehydrogenase IPR009003 37 Peptidase, trypsin-like serine and cysteine 619 IPR001353 23 20S proteasomeUNCORRECTED IPR001254 35 Peptidase S1/S6, chymotrypsin/Hap 555 620 IPR001395 21 Aldo/keto reductase IPR004046 34 Glutathione-S-, C-term 556 IPR004119 21 Protein of unknown function DUF227 IPR007087 33 Zinc finger, C2H2-type 621 557 IPR004000 19 Actin/actin-like IPR011046 33 WD40 repeat-like 622 558 IPR015609 18 Heat shock protein HSP40 IPR016196 31 MFS general substrate transporter 623 559 IPR015643 18 Peptidase C1A, cathepsin B IPR015880 29 Zinc finger, C2H2-like 624 IPR000734 17 Lipase IPR009072 25 Histone-fold 560 IPR007114 17 Major facilitator family IPR000504 24 RNA recognition motif, RNP-1 625 561 IPR006649 16 Like-Sm ribonucleoprotein IPR002557 24 Chitin-binding, peritrophin A 626 562 IPR000743 15 Glycoside hydrolase, family 28 IPR016135 22 Ubiquitin-conjugating enzyme/RWD-like 627 563 IPR001316 15 Peptidase S1A, streptogrisin IPR017442 22 Serine/threonine protein kinase-related 628 IPR001930 15 Peptidase M1, membrane alanine aminopeptidase IPR016024 21 Armadillo-type fold 564 629

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 6/11

6 Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11

630 Table 3 deacetylase differing by eight SNPs resulting in only two amino acid 695 631 Selection of genes of interest related to the C. tremulae larval midgut physiological changes. These two isoforms are putatively two alleles of the same 696 functions. 632 chitin deacetylase gene. Phylogenetic comparison with chitin 697 633 EC number Total number deacetylases from a range of different insects (Fig. 6), shows that 698 634 of contigs these belong to the midgut-specific Group V, according to the 699 635 Plant/fungi cell wall degradation classification described by Dixit et al. (2008), with the closest 700 636 Cellulase 3.2.1.4 2 relative being chitin deacetylase 6 from T. castaneum. A full amino 701 Cellulose 1,4-b-cellobiosidase 3.2.1.91 6 637 702 Endopolygalacturonase (pectinase) 3.2.1.15 37 acid alignment of the putative catalytic domains used to derive the 638 Endo-b-1,3-glucanase 3.2.1.39 2 phylogeny is shown in the Supplementary materials (Fig. S1). 703 639 A total of 13 contigs, corresponding to the two alleles described 704 Bt Cry3 toxin putative binding partners 640 Cadherin-like protein – 6 above, and also to homologs of T. castaneum chitin deacetylase 705 641 Aminopeptidase N 3.4.11.2 11 2 and 5 were found in the 454 data (Table 3). 706 642 Alkaline phosphatase 3.1.3.1 1 707 643 Peritrophic membrane biosynthesis, degradation and remodeling 3.5. Identification of new beetle gene families 708 644 Chitinase 3.2.1.14 36 709 Chitin synthase 2.4.1.16 7 645 After searching C. tremulae contigs using BLASTx against the 710 646 Chitin deacetylase 3.1.5.41 13 711 Peritrophin (mucin) – 7 T. castaneum gene objects (16,404 entries) (Richards et al., 2008), 647 12,897 positive hits were returned corresponding to 5489 (33.5%) 712 648 General digestion 713 Cysteine proteinase all types – 129 unique T. castaneum gene objects. Contigs returning below cut-off 649 714 Serine proteinase all types – 69 BLASTx results were re-searched against a non-redundant 650 Aminopeptidase all types – 33 database returning a further 428 positive hits. A large 715 651 all types – 25 portion of these contigs that did not hit any Tribolium predicted 716 Dipeptidylpeptidase all types – 3 652 gene objects were identified as putative plant tissue degrading 717 653 Lipase 3.1.1.3 17 718 a-amylase 3.2.1.1 3 enzymes such as cellulases (EC:3.2.1.4) and endopolygalactur- 654 719 a-glucosidase (maltase) 3.2.1.20 10 onases (EC:3.2.1.15). 655 b-glucosidase 3.2.1.21 22 To determinePROOF if such genes are common in other Coleoptera, 720 656 although absent from T. castaneum, we used the predicted proteins 721 657 722 658 723 659 724 660 725 661 726 662 727 663 728 664 729 665 730 666 731 667 732 668 733 669 734 670 735 671 736 672 Fig. 4. Amino acid alignment of known coleopteran cadherin-like genes with C. tremulae contig 2921, showing part of the predicted transmembrane spanning repeats. Sequences 737 were aligned using the MAFFT multiple alignment program and identical residues are boxed with dark shading, and conserved residues boxed with shading. Sequences are 673 738 from Diabrotica virgifera (Dvir, AAV88529.2), Tribolium castaneum (Tcas, XP_971388.2), Tenebrio molitor (Tmol, ABL86001.1) and C. tremulae (Ctre, contig_2921). 674 739 675 740 676 741 677 742 678 743 679 744 680 745 681 746 682 747 683 748 684 UNCORRECTED 749 685 750 686 751 687 752 688 753 689 754 690 755 691 756 692 757

693 0 758 Fig. 5. Polymorphic microsatellite repeats found in the C. tremulae 454 dataset. (A) (GAA)n found in the 5 -UTR of a putative sodium-dependent phosphate transporter transcript. 0 694 (B) (TA)n from the 3 -UTR of a predicted protein transcript. (C) (TA)n from the CDS of a putative fumarylacetoacetate hydrolase transcript. 759

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 7/11

Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11 7

760 Table 4 825 761 Capillary sequencing of over-abundant transcripts (w1.3 kb) in the non-normalized midgut cDNA. 826 762 # clones % total Name cDNA (bp) Protein (aa) Blastp vs nr Species E-Value 827 763 40 41.7 Chitin deacetylase 1260 376 Chitin deacetylase 6 T. castaneum 3e102 828 764 10 10.42 Aspartic proteinase-1 1245 386 Cathepsin D T. castaneum 3e134 829 765 10 10.42 16S rRNA 1306 – – – – 830 766 3 3.13 Chitinase-1 1278 389 Chitinase P. cochleariae 1e118 831 3 3.13 Cysteine proteinase-1 1163 323 Cathepsin L-like P. cochleariae 4e138 767 832 3 3.13 Endopolygalacturonase-1 1205 370 Polygalacturonase S. oryzae 5e97 768 2 2.1 Tetraspanin-1 1136 305 Tetraspanin 29fb T. castaneum 3e67 833 769 2 2.1 Unknown protein-1 1275 397 Hypothetical protein T. castaneum 8e69 834 770 835 771 836 772 837 773 838 774 839 775 840 776 841 777 842 778 843 779 844 780 845 781 846 782 847 783 848 784 849 785 PROOF 850 786 851 787 852 788 853 789 854 790 855 791 856 792 857 793 858 794 859 795 860 796 861 797 862 798 863 799 864 800 865 801 866 802 867 803 868 804 869 805 870 806 871 807 872 808 873 809 874 810 875 811 876 812 877 813 878 814 UNCORRECTED 879 815 880 816 881 817 882 818 883 819 884 820 Fig. 6. Phylogenetic analysis of putative chitin deacetylases from different insects. Phylogenetic analysis was conducted in MEGA4.0 using the UPGMA method and the percentage of 885 821 replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. Positions containing alignment gaps and missing 886 822 data were eliminated only in pairwise sequence comparisons (Pairwise deletion option). ChBD: Chitin-binding domain; LDL: Low density lipoprotein domain; CDA: Chitin 887 deacetylase domain. Tribolium castaneum (Tcas), Apis mellifera (Amel), Anopheles gambiae (Agam), Helicoverpa armigera (Harm), Bombyx mori (Bmor), Drosophila melanogaster 823 (Dmel), Trichoplusia ni (Tni), Mamestra configurata (Mcon), C. tremulae (Ctre). Note that the alignment of the CDA domains used in this phylogeny is provided as a Supplementary 888 824 material (Fig. S1). 889

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 8/11

8 Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11

890 of the corresponding C. tremulae contigs to search public coleop- pattern for GH45 ([STA]-T-R-Y-[FYW]-D-x(5)-[CA]) is also 955 891 teran EST collections present in the Genbank db_est database. We conserved in all sequences (Fig. 7A). As many as six different 956 892 retrieved 130 ESTs corresponding to cellulases that clustered into cellulase sequences could be found in some species such as I. pini, 957 893 20 unique contigs, and 29 ESTs corresponding to pectinases but only two contigs were present in the C. tremulae 454 data 958 894 that clustered into 18 unique contigs. A further three coleopteran (Fig. 7A). A phylogenetic analysis suggested the presence of at least 959 895 cellulase sequences were retrieved from Genbank together with five sub-classes in this gene family (Fig. 7B). 960 896 two endopolygalacturonase sequences. All these genes were found Thirty-seven contigs corresponding to putative endopolyga- 961 897 in coleopteran species feeding on fresh plant tissues such as the lacturonase genes were found in the C. tremulae dataset suggesting 962 898 pine engraver Ips pini, the western corn rootworm D. virgifera, the a highly diverse gene family. An alignment of the predicted proteins 963 899 Colorado potato beetle Leptinotarsa decemlineata, the coffee berry deduced from the 11 longest contigs together with the sequence 964 900 borer Hypothenemus hampei, the mustard beetle Phaedon cochle- found in the abundant 1.3 kb band (Pect-1, Table 4) is shown in 965 901 ariae and the citrus root weevil Diaprepes abbreviatus. These Fig. 8A. All the C. tremulae sequences as well as the other coleopteran 966 902 sequence data are summarized in Table S3. ones share a glycoside hydrolase family 28 (GH28) conserved 967 903 The SignalP software predicted the presence of a putative signal domain (IPR000743). The InterProScan search did not hit the PRO- 968 904 peptide in all the cellulase protein sequences that appeared to be SITE consensus pattern of GH28 proteins (PS00502) which is 969 905 full-length CDS. An alignment of all these sequences is provided as [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]- 970 906 Supplementary material (Fig. S2). In addition, an InterProScan G-S. This is because the consensus pattern of the cole- 971 907 search predicted a glycoside hydrolase family 45 (GH45) conserved opteran endopolygalacturonases is slightly different, we deduced it 972 908 domain (IPR000334) in every sequence. The active site consensus to be [SDENKRQT]-x(3)-C-x-G-[GS]-H-G-[LF]-x(3)-[IVTS]-G-x-S. 973 909 974 910 975 911 976 912 977 913 978 914 979 915 PROOF 980 916 981 917 982 918 983 919 984 920 985 921 986 922 987 923 988 924 989 925 990 926 991 927 992 928 993 929 994 930 995 931 996 932 997 933 998 934 999 935 1000 936 1001 937 1002 938 1003 939 1004 940 1005 941 1006 942 1007 943 1008 944 UNCORRECTED 1009 945 1010 946 1011 947 1012 948 1013 949 1014 950 1015 951 Fig. 7. Amino acid alignment and Phylogenetic analysis of the coleopteran cellulose gene family. (A) Predicted protein sequences deduced from coleopteran cellulase transcripts 1016 952 without their predicted signal peptide were aligned using MAFFT multiple alignment program and identical residues are boxed with dark shading, and conserved residues boxed with 1017 light shading. The predicted active site consensus pattern is underlined. (B) Phylogenetic analysis was conducted in MEGA4.0 using the UPGMA method and the percentage of replicate 953 trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. Positions containing alignment gaps and missing data were 1018 954 eliminated only in pairwise sequence comparisons (Pairwise deletion option). Details on how these sequences were obtained are given as Supplementary materials (Table S3). 1019

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 9/11

Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11 9

1020 1085 1021 1086 1022 1087 1023 1088 1024 1089 1025 1090 1026 1091 1027 1092 1028 1093 1029 1094 1030 1095 1031 1096 1032 1097 1033 1098 1034 1099 1035 1100 1036 1101 1037 1102 1038 1103 1039 1104 1040 1105 1041 1106 1042 1107 1043 1108 1044 1109 1045 PROOF 1110 1046 1111 1047 1112 1048 1113 1049 1114 1050 1115 1051 1116 1052 1117 1053 1118 1054 1119 1055 1120 1056 1121 1057 1122 1058 1123 1059 1124 1060 1125 1061 1126 1062 1127 1063 1128 1064 1129 1065 1130 1066 1131 1067 1132 1068 1133 1069 1134 1070 1135 1071 1136 1072 1137 1073 1138 1074 Fig. 8. Amino acid alignment of putative C.UNCORRECTED tremulae and other coleopteran endopolygalacturonases. Sequences were aligned using the MAFFT multiple alignment program and 1139 1075 identical residues are boxed with dark shading, and conserved residues boxed with light shading. The predicted active site consensus pattern is underlined. Details on how these 1140 1076 sequences were obtained are given as Supplementary materials (Table S3). 1141 1077 1142 1078 1143 1079 4. Discussion but also provide a plethora of genetic markers, such as micro- 1144 1080 satellites (Fig. 5), that can be used to derive high density linkage 1145 1081 Pyrosequencing looks set to dramatically increase the frequency maps for mapping traits of biological relevance. In this fashion 454 1146 1082 and depth of EST collections for specific insect tissues and non- sequencing can rapidly promote both sequence data and genetic 1147 1083 Drosophilid insects as a whole. Such datasets not only provide markers in species for which no previous sequence data is available. 1148 1084 extensive nucleotide sequence data for insects lacking genomic data For example, 454-mediated sequencing of cDNA from immune 1149

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 10/11

10 Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11

1150 related M. sexta tissues has already dramatically increased the group V chitin deacetylase was the most abundant transcript 1215 1151 number of ESTs available (Zou et al., 2008) and sequencing of other recovered from the C. tremulae non-normalized midgut cDNAs. This 1216 1152 Manduca tissues, such as the midgut, will give us the first overview group of putative enzymes has attracted significant recent atten- 1217 1153 of a complete insect transcriptome from this physiological model tion in several lepidopteran species (Campbell et al., 2008; Guo 1218 1154 which currently lacks any genomic framework. To our knowledge, et al., 2005; Pauchet et al., 2008; Toprak et al., 2008), however their 1219 1155 the data described here is the first publication of a 454 derived role in midgut physiology remains unclear. 1220 1156 insect midgut EST collection, and this dataset will also dramatically As well as forming the major organ for processing of insect food, 1221 1157 increase the number of midgut ESTs from collections previously the midgut also forms the first line of defense against ingested 1222 1158 derived via conventional capillary sequencing, such as those from xenobiotics and micro-organisms, the midgut therefore also plays 1223 1159 the corn rootworm D. virgifera (Siegfried et al., 2005), the grass grub a critical role in xenobiotic metabolism and insect immunity. With 1224 1160 beetle zealandica (Marshall et al., 2008) or the pine respect to xenobiotic metabolism, cytochrome P450s (25 hits) and 1225 1161 engraver I. pini (Eigenheer et al., 2003). Thus pyrosequencing of glutathione-S-transferases (34 hits) are highly represented in the 1226 1162 non-model insects is a dramatic and rapid way of providing InterPro families detected (Table 2) and as full sequences for these 1227 1163 genomic and transcriptomic resources in an organism where none gene families appear it will be interesting to compare the full 1228 1164 are previously available. For example, with this publication, for complement of P450 and GSTgenes present in this leaf feeding beetle 1229 1165 C. tremulae the number of nucleotide sequences (reported contigs) with that found in the grain feeding T. castaneum. For example, given 1230 1166 has risen instantaneously from 8 (Genbank, 1st of February 2009) to that C. tremulae specializes in a single species of plant we might 1231 1167 over 23,000 (Table 1). When systematically applied to different expect the range of enzymes required to metabolize secondary plant 1232 1168 tissues of insect models pyrosequencing will similarly dramatically compounds to be reduced and somewhat specialized in their role. 1233 1169 increase resources available for all non-Drosophilids. In conclusion, here we have shown that 454-mediated pyrose- 1234 1170 With specific reference to the midgut ESTs for C. tremulae quencing is a rapid and dramatic way of increasing the size and 1235 1171 described here, many of these predict proteins with functions depth of insect midgut EST collections. Many of the predicted 1236 1172 expected for the midgut with likely roles in digestion, xenobiotic enzymes are currently only reported in the leaf beetle C. tremulae, 1237 1173 metabolism or immunity, whereas many either show unexpected sequenced here. Indeed some of the enzymes recovered are the first 1238 1174 functions or no similarity to predicted proteins in current databases. insect representatives. For example, the cellulose cellobiosidases 1239 1175 Expected roles in the midgut physiology of a leaf eating beetle involve (Table 3)PROOF appear to only have representatives from yeast, fungi and 1240 1176 destruction of the plant cell wall, thus the dominance of cellulases, bacteria in current sequence databases. The approach taken here, of 1241 1177 pectinases and glucanases. However, what is perhaps more inter- normalizing cDNAs from a specific insect tissue, attempts to define 1242 1178 esting is the absence of these predicted proteins from the T. casta- all of the unique transcripts associated with a tissue, often termed an 1243 1179 neum genome. This predicted difference in the midgut enzyme ‘unigene set’. We note, however, that cDNA shearing associated with 1244 1180 composition of beetles feeding on stored products (T. castaneum)and preparation of the sample for 454 sequencing may introduce an 1245 1181 fresh leaves (C. tremulae), begins to illustrate the potential power of element of bias resulting in uneven sequence coverage of repre- 1246 1182 comparing such datasets between insects with different lifestyles. sentative cDNAs. However, the depth of EST coverage generated by 1247 1183 We also note that enzymes degrading plant cell walls are of consid- 454 sequencing far outweighs that generated in conventional 1248 1184 erable current interest in biotechnology for the degradation of plant capillary led projects. 454 sequencing of other insect tissues will 1249 1185 material for use in biofuels. 454-mediated sequencing of insect therefore not only increase the depth of existing capillary generated 1250 1186 midguts, or salivary glands, may therefore be a powerful new way of EST collections but will also identify new insect representatives 1251 1187 finding enzymes important for plant biotechnology. for different enzymes and receptors. Further comparison of such 1252 1188 One of the central aims of this EST project was to identify contigs datasets in insects with different lifestyles and food sources will 1253 1189 encoding candidate Bt receptors to use as probes in a dissection of begin to tell us how different insects exploit their different niches. 1254 1190 the molecular basis of Cry3A resistance in C. tremulae. Contigs 1255 1191 encoding three clear candidate Bt receptors were identified. First, Acknowledgments 1256 1192 a cadherin-like protein showing strong similarity to cadherin-like 1257 1193 proteins from three other beetles, T. castaneum, Tenebrio molitor The authors thank Marcel Amichot for fruitful discussion, 1258 1194 and D. virgifera. Second, eleven contigs encoding proteins similar to Claudine Courtin for insect rearing and Philippe Lorme for green- 1259 1195 aminopeptidase N. Third, a contig encoding an alkaline phospha- house plant production. Supported by a BBSRC grant to R. ff-C. 1260 1196 tase. However no contigs encoding ADAM10-like metalloproteases, There is no conflict of interest. 1261 1197 recently reported as involved in the processing of Cry3A in Colo- 1262 1198 rado potato beetle (Ochoa-Campuzano et al., 2007), were found. Appendix. Supplementary material 1263 1199 Another expected function for proteins in the insect midgut is 1264 1200 remodeling of peritrophic membrane (Table 3). The peritrophic Supplementary data associated with this article can be found, in 1265 1201 membrane (PM) separates the food bolus in the insect midgut, in the online version, at doi:10.1016/j.ibmb.2009.04.001. 1266 1202 this case poplar leaves, from the midgut epithelium and associated 1267 1203 brush border membrane. This membrane is composed of chitin and References 1268 1204 proteins and forms a permeable barrierUNCORRECTED that protects the midgut 1269 1205 epithelium from erosion by the food bolus whilst allowing for the Augustin, S., Courtin, C., Rejasse, A., Lorme, P., Genissel, A., Bourguet, D., 2004. 1270 1206 passage of enzymes and solutes involved in extracellular digestion Genetics of resistance to transgenic Bacillus thuringiensis poplars in Chrysomela 1271 tremulae (Coleoptera: Chrysomelidae). J. Econ. Entomol. 97, 1058–1064. 1207 in the gut (Caldeira et al., 2007). The PM requires continuous repair Bowen, D., Rocheleau, T.A., Blackburn, M., Andreev, O., Golubeva, E., Bhartia, R., 1272 1208 and replacement and it is therefore interesting to note dominance ffrench-Constant, R.H., 1998. Insecticidal toxins from the bacterium Photo- 1273 1209 of enzymes involved in the binding (peritrophins, seven contigs), rhabdus luminescens. Science 280, 2129–2132. 1274 Buckingham, S.D., Biggin, P.C., Sattelle, B.M., Brown, L.A., Sattelle, D.B., 2005. Insect 1210 breakdown (chitinases, 36 contigs) and remodeling (chitin syn- GABA receptors: splicing, editing, and targeting by antiparasitics and insecti- 1275 1211 thase, seven contigs, and chitin deacetylase, 13 contigs) of chitin cides. Mol. Pharmacol. 68, 942–951. 1276 1212 (Table 3), suggesting that maintenance and remodeling of chitin in Caldeira, W., Dias, A.B., Terra, W.R., Ribeiro, A.F., 2007. Digestive enzyme compart- 1277 mentalization and recycling and sites of absorption and secretion along the 1213 1278 the PM occupies a major part of the C. tremulae transcriptional midgut of Dermestes maculatus (Coleoptera) larvae. Arch. Insect Biochem. 1214 midgut machinery. It is interesting to note that a member of the Physiol. 64, 1–18. 1279

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001 ARTICLE IN PRESS IB2039_proof 16 April 2009 11/11

Y. Pauchet et al. / Insect Biochemistry and Molecular Biology xxx (2009) 1–11 11

1280 Campbell, P.M., Cao, A.T., Hines, E.R., East, P.D., Gordon, K.H., 2008. Proteomic Raymond-Delpech, V., Matsuda, K., Sattelle, B.M., Rauh, J.J., Sattelle, D.B., 2005. Ion channels: 1333 1281 analysis of the peritrophic matrix from the gut of the caterpillar, Helicoverpa molecular targets of neuroactive insecticides. Invert. Neurosci. 5, 119–133. 1334 armigera. Insect Biochem. Mol. Biol. 38, 950–958. Richards, S., Gibbs, R.A., Weinstock, G.M., Brown, S.J., Denell, R., Beeman, R.W., 1282 Conesa, A., Go¨tz, S., 2008. Blast2GO: a comprehensive suite for functional analysis in Gibbs, R., Beeman, R.W., Brown, S.J., Bucher, G., Friedrich, M., 1335 1283 plant genomics. Int. J. Plant Genomics 2008, 619832. Grimmelikhuijzen, C.J., Klingler, M., Lorenzen, M., Richards, S., Roth, S., 1336 1284 Dixit, R., Arakane, Y., Specht, C.A., Richard, C., Kramer, K.J., Beeman, R.W., Schroder, R., Tautz, D., Zdobnov, E.M., Muzny, D., Gibbs, R.A., Weinstock, G.M., 1337 Muthukrishnan, S., 2008. Domain organization and phylogenetic analysis of Attaway, T., Bell, S., Buhay, C.J., Chandrabose, M.N., Chavez, D., Clerk- 1285 proteins from the chitin deacetylase gene family of Tribolium castaneum and Blankenburg, K.P., Cree, A., Dao, M., Davis, C., Chacko, J., Dinh, H., Dugan- 1338 1286 three other species of insects. Insect Biochem. Mol. Biol. 38, 440–451. Rocha, S., Fowler, G., Garner, T.T., Garnes, J., Gnirke, A., Hawes, A., Hernandez, J., 1339 1287 Dong, K., 2007. Insect sodium channels and insecticide resistance. Invert. Neurosci. Hines, S., Holder, M., Hume, J., Jhangiani, S.N., Joshi, V., Khan, Z.M., Jackson, L., 1340 7, 17–30. Kovar, C., Kowis, A., Lee, S., Lewis, L.R., Margolis, J., Morgan, M., Nazareth, L.V., 1288 Eigenheer, A.L., Keeling, C.I., Young, S., Tittiger, C., 2003. Comparison of gene Nguyen, N., Okwuonu, G., Parker, D., Richards, S., Ruiz, S.J., Santibanez, J., 1341 1289 representation in midguts from two phytophagous insects, Bombyx mori and Ips Savard, J., Scherer, S.E., Schneider, B., Sodergren, E., Tautz, D., Vattahil, S., 1342 1290 pini, using expressed sequence tags. Gene 316, 127–136. Villasana, D., White, C.S., Wright, R., Park, Y., Beeman, R.W., Lord, J., Oppert, B., 1343 1291 Federici, B.A., Bauer, L.S., 1998. Cyt1Aa protein of Bacillus thuringiensis is toxic to the Lorenzen, M., Brown, S., Wang, L., Savard, J., Tautz, D., Richards, S., Weinstock, G., 1344 cottonwood leaf beetle, Chrysomela scripta, and suppresses high levels of Gibbs, R.A., Liu, Y., Worley, K., Weinstock, G., Elsik, C.G., Reese, J.T., Elhaik, E., 1292 resistance to Cry3Aa. Appl. Environ. Microbiol. 64, 4368–4371. Landan, G., Graur, D., Arensburger, P., Atkinson, P., Beeman, R.W., Beidler, J., 1345 1293 ffrench-Constant, R.H., Daborn, P.J., Le Goff, G., 2004. The genetics and genomics of Brown, S.J., Demuth, J.P., Drury, D.W., Du, Y.Z., Fujiwara, H., Lorenzen, M., 1346 1294 insecticide resistance. Trends Genet. 20, 163–170. Maselli, V., Osanai, M., Park, Y., Robertson, H.M., Tu, Z., Wang, J.J., Wang, S., 1347 ffrench-Constant, R.H., Dowling, A., Waterfield, N.R., 2007. Insecticidal toxins from Richards, S., Song, H., Zhang, L., Sodergren, E., Werner, D., Stanke, M., 1295 Photorhabdus bacteria and their potential use in agriculture. Toxicon 49, 436–451. Morgenstern, B., Solovyev, V., Kosarev, P., Brown, G., Chen, H.C., Ermolaeva, O., 1348 1296 Genissel, A., Augustin, S., Courtin, C., Pilate, G., Lorme, P., Bourguet, D., 2003a. Initial Hlavina, W., Kapustin, Y., Kiryutin, B., Kitts, P., Maglott, D., Pruitt, K., 1349 1297 frequency of alleles conferring resistance to Bacillus thuringiensis poplar in Sapojnikov, V., Souvorov, A., Mackey, A.J., Waterhouse, R.M., Wyder, S., 1350 a field population of Chrysomela tremulae. Proc. R. Soc. Lond. B Biol. Sci. 270, Zdobnov, E.M., Zdobnov, E.M., Wyder, S., Kriventseva, E.V., Kadowaki, T., Bork, P., 1298 791–797. Aranda, M., Bao, R., Beermann, A., Berns, N., Bolognesi, R., Bonneton, F., Bopp, D., 1351 1299 Genissel, A., Leple, J.C., Millet, N., Augustin, S., Jouanin, L., Pilate, G., 2003b. High Brown, S.J., Bucher, G., Butts, T., Chaumot, A., Denell, R.E., Ferrier, D.E., 1352 1300 tolerance against Chrysomela tremulae of transgenic poplar plants expressing Friedrich, M., Gordon, C.M., Jindra, M., Klingler, M., Lan, Q., Lattorff, H.M., 1353 a synthetic cry3Aa gene from Bacillus thuringiensis ssp tenebrionis. Mol. Breed. Laudet, V., von Levetsow, C., Liu, Z., Lutz, R., Lynch, J.A., da Fonseca, R.N., 1301 11, 103–110. Posnien, N., Reuter, R., Roth, S., Savard, J., Schinko, J.B., Schmitt, C., 1354 1302 Gray, M.E., Sappington, T.W., Miller, N.J., Moeser, J., Bohn, M.O., 2009. Adaptation Schoppmeier, M., Schroder, R., Shippy, T.D., Simonnet, F., Marques-Souza, H., 1355 1303 and invasiveness of western corn rootworm: intensifying research on a wors- Tautz, D., Tomoyasu, Y., Trauner, J., Van der Zee, M., Vervoort, M., Wittkopp, N., 1356 ening pest. Annu. Rev. Entomol. 54, 303–321. Wimmer, E.A., Yang, X., Jones, A.K., Sattelle, D.B., Ebert, P.R., Nelson, D., 1304 Guo, W., Li, G., Pang, Y., Wang, P., 2005. A novel chitin-binding protein identified Scott, J.G., Beeman, R.W., Muthukrishnan, S., Kramer, K.J., Arakane, Y., 1357 1305 from the peritrophic membrane of the cabbage looper, Trichoplusia ni. Insect Beeman,PROOF R.W., Zhu, Q., Hogenkamp, D., Dixit, R., Oppert, B., Jiang, H., Zou, Z., 1358 1306 Biochem. Mol. Biol. 35, 1224–1234. Marshall, J., Elpidina, E., Vinokurov, K., Oppert, C., Zou, Z., Evans, J., Lu, Z., 1359 1307 Hunt, T., Bergsten, J., Levkanicova, Z., Papadopoulou, A., John, O.S., Wild, R., Zhao, P., Sumathipala, N., Altincicek, B., Vilcinskas, A., Williams, M., 1360 Hammond, P.M., Ahrens, D., Balke, M., Caterino, M.S., Gomez-Zurita, J., Ribera, I., Hultmark, D., Hetru, C., Jiang, H., Grimmelikhuijzen, C.J., Hauser, F., 1308 Barraclough, T.G., Bocakova, M., Bocak, L., Vogler, A.P., 2007. A comprehensive Cazzamali, G., Williamson, M., Park, Y., Li, B., Tanaka, Y., Predel, R., Neupert, S., 1361 1309 phylogeny of beetles reveals the evolutionary origins of a superradiation. Schachtner, J., Verleyen, P., Raible, F., Bork, P., Friedrich, M., Walden, K.K., 1362 1310 Science 318, 1913–1916. Robertson, H.M., Angeli, S., Foret, S., Bucher, G., Schuetz, S., Maleszka, R., 1363 Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Wimmer, E.A., Beeman, R.W., Lorenzen, M., Tomoyasu, Y., Miller, S.C., 1311 Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Grossmann, D., Bucher, G., 2008. The genome of the model beetle and pest 1364 1312 Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Tribolium castaneum. Nature 452, 949–955. 1365 1313 Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Sayed, A., Nekl, E.R., Siqueira, H.A., Wang, H.C., ffrench-Constant, R.H., Bagley, M., 1366 Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., Siegfried, B.D., 2007. A novel cadherin-like gene from western corn rootworm, 1314 McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Diabrotica virgifera virgifera (Coleoptera: Chrysomelidae), larval midgut tissue. 1367 1315 Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Insect Mol. Biol. 16, 591–600. 1368 1316 Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Siegfried, B.D., Waterfield, N., ffrench-Constant, R.H., 2005. Expressed sequence tags 1369 Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in from Diabrotica virgifera virgifera midgut identify a coleopteran cadherin and 1317 microfabricated high-density picolitre reactors. Nature 437, 376–380. a diversity of cathepsins. Insect Mol. Biol. 14, 137–143. 1370 1318 Marshall, S.D., Gatehouse, L.N., Becher, S.A., Christeller, J.T., Gatehouse, H.S., Soberon, M., Gill, S.S., Bravo, A., 2009. Signaling versus punching hole: how do 1371 1319 Hurst, M.R., Boucias, D.G., Jackson, T.A., 2008. Serine identified from Bacillus thuringiensis toxins kill insect midgut cells? Cell. Mol. Life Sci., 1372 a Costelytra zealandica (White) (Coleoptera: ) midgut EST library doi:10.1007/s00018-008-8330-9. 1320 and their expression through insect development. Insect Mol. Biol. 17, 247–259. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: molecular evolutionary 1373 1321 Millar, N.S., Denholm, I., 2007. Nicotinic acetylcholine receptors: targets for genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596–1599. 1374 1322 commercially important insecticides. Invert. Neurosci. 7, 53–66. Toprak, U., Baldwin, D., Erlandson, M., Gillott, C., Hou, X., Coutu, C., Hegedus, D.D., 1375 1323 Myhre, S., Tveit, H., Mollestad, T., Laegreid, A., 2006. Additional gene ontology 2008. A chitin deacetylase and putative insect intestinal lipases are components 1376 structure for improved biological reasoning. Bioinformatics 22, 2020–2027. of the Mamestra configurata (Lepidoptera: Noctuidae) peritrophic matrix. Insect 1324 Ochoa-Campuzano, C., Real, M.D., Martinez-Ramirez, A.C., Bravo, A., Rausell, C., Mol. Biol. 17, 573–585. 1377 1325 2007. An ADAM metalloprotease is a Cry3Aa Bacillus thuringiensis toxin Zhulidov, P.A., Bogdanova, E.A., Shcheglov, A.S., Vagner, L.L., Khaspekov, G.L., 1378 1326 receptor. Biochem. Biophys. Res. Commun. 362, 437–442. Kozhemyako, V.B., Matz, M.V., Meleshkevitch, E., Moroz, L.L., Lukyanov, S.A., 1379 Pauchet, Y., Muck, A., Svatos, A., Heckel, D.G., Preiss, S., 2008. Mapping the larval Shagin, D.A., 2004. Simple cDNA normalization using kamchatka crab duplex- 1327 midgut lumen proteome of Helicoverpa armigera, a generalist herbivorous specific nuclease. Nucleic Acids Res. 32, e37. 1380 1328 insect. J. Proteome Res. 7, 1629–1639. Zou, Z., Najar, F., Wang, Y., Roe, B., Jiang, H., 2008. Pyrosequence analysis of 1381 1329 Pigott, C.R., Ellar, D.J., 2007. Role of receptors in Bacillus thuringiensis crystal toxin expressed sequence tags for Manduca sexta hemolymph proteins involved in 1382 activity. Microbiol. Mol. Biol. Rev. 71, 255–281. immune responses. Insect Biochem. Mol. Biol. 38, 677–682. 1330 1383 1331 1384 1332 1385 UNCORRECTED

Please cite this article in press as: Pauchet, Y., et al., Pyrosequencing of the midgut transcriptome of the poplar leaf beetle Chrysomela..., Insect Biochemistry and Molecular Biology (2009), doi:10.1016/j.ibmb.2009.04.001