bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 Title 2 De novo construction of a transcriptome for the stink bug crop

3 pest impicticornis during late development 4 5 Authors 6 Bruno C. Genevcius*, Tatiana T. Torres 7 8 Affiliation 9 Department of Genetics and Evolutionary Biology, University of São Paulo, São Paulo, SP, 10 Brazil 11 *correspondence: [email protected] 12 13 Abstract 14 Chinavia impicticornis is a Neotropical stink-bug (: ) with economic 15 importance for different crops. Little is known about the development of the species, as 16 well as the genetic mechanisms that may favor the establishment of populations in 17 cultivated plants. Here we conduct the first large-scale molecular study with C. 18 impicticornis. We generated RNA-seq data for males and females, at two immature 19 stages, for the genitalia separately and for the rest of the body. We assembled the 20 transcriptome and conduct a functional annotation. De novo assembled transcriptome 21 based on whole bodies and genitalia of males and females contained around 400,000 22 contigs with an average length of 688 bp. After pruning duplicated sequences and 23 conducting a functional annotation, the final annotated transcriptome comprised 39,478 24 transcripts of which 12,665 had GO terms assigned. These novel datasets will provide 25 invaluable data for the discovery of molecular processes related to morphogenesis and 26 immature biology. We hope to contribute to the growing research on stink bug evo-devo as 27 well as the development of bio-rational solutions for pest management. 28 29 Key-words: genes, , larvae, mRNA, nextGen, nymphs, sequencing 30 31

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

32 DATA DESCRIPTION 33 Context 34 The stink bug Chinavia impicticornis (Hemiptera, Pentatomidae) is a Neotropical species 35 with wide distribution across the South America. The species is an economically important 36 polyphagous pest reported to feed on fourteen plant species [1]. Most of its damage has 37 been attributed to soybean and closely related crops. However, the capability of switching 38 to less preferred hosts in anomalous conditions may help explain the successful 39 establishment of its populations in different crops [2,3]. 40 Not only C. impicticornis but many other species of stink bugs are responsible for 41 millions of dollars’ worth of damage worldwide every year. Taken as an example, the brown 42 marmorated stink bug ( halys) has caused a 37 million dollar loss only to 43 apple growers in the USA during 2011 [4]. Towards mitigating this damage, there is 44 growing applied and basic research which has certainly contributed to our knowledge of 45 the biological aspects and management strategies for stink bug pests [5]. Most efforts 46 have focused on sampling methods, taxonomic identification, insecticide effectiveness, 47 and population monitoring [6,7]. However, traditional pest control techniques have the 48 potential of inflicting serious ecological disturbances as well as the selection of resistant 49 lineages of crop pests. The development of bio-rational solutions is largely dependent on a 50 detailed comprehension of the underlying biology of these pests [8]. In this context, studies 51 on the “omics” fields offer promising species-specific and environmental-friendly tools [9]. 52 More specifically, transcriptomic approaches provide the foundation for identifying gene 53 targets associated with pheromones, pesticide resistance, and other features with potential 54 for pest managing. Conversely, only five species of stink have had full body transcriptomes 55 sequenced to date: the brown marmorated stink bug [10], the harlequin bug [9], the 56 southern green stink bug [11], the brown stink bug [12], and the predatory stink bug Arma 57 chinensis [13]. Genomic data are even more scarce, comprising four genomes assembled 58 to date: Halyomorpha halys [14], Euschistus heros, guildinii, and Stiretrus 59 anchorago. 60 Here we target this gap by documenting and characterizing the first transcriptome 61 for the Neotropical stink bug C. impicticornis (Hemiptera, Pentatomidae). We conduct our 62 study at the nymphal stage for two reasons. First, because management strategies are 63 known to affect nymphs and adults in different ways, and nymph transcriptomes are yet 64 scarce for stink bugs, e.g. [10]. Second, developmental transcriptomes provide invaluable 65 data for the discovery of molecular processes underlying morphogenesis. We focused on

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

66 the fifth instar because this is the stage where morphological sexual differentiation takes 67 place [15]. These resources may be helpful for the growing research on genital evo-devo 68 and stink bugs development as a whole [15,16]. 69 70 Methods 71 Samples 72 Our colony of C. impicticornis was fed on green bean pods (Phaseolus vulgaris) and 73 reared under the following conditions: 26±1 °C, 65±10% RH, and 14L:10D hours of 74 photophase. We isolated RNA from male and female immatures at two developmental 75 stages: the beginning and the end of the fifth nymphal instar (two hours after and seven 76 days after molting from fourth to fifth instar, respectively). The fifth nymphal instar in our 77 controlled conditions has an average of eight days. We included three individuals per sex 78 per stage, totaling 12 specimens. We chose to remove the genitalia of each specimen to 79 extract RNA and construct the libraries of bodies and genitalia separately. Genitalia are the 80 most important traits for both species delimitation and sex identification in true bugs. 81 Therefore, with this approach, we expect to generate data to be particularly useful for 82 studying the mechanisms of sex determination and speciation. For RNA sequencing, each 83 genital sample was sequenced separately (n = 3 genitalia per sex per sample), while 84 bodies of the same sex and developmental stage were pooled to generate a single library 85 (n = 1 body per sex per stage). 86 87 RNA extraction & sequencing 88 Frozen specimens (at -80 °C) were sexed and had their genitalia separated under a 89 stereoscope. RNA extraction of genitalia and bodies were conducted with the Trizol 90 reagent following the manufacturer’s protocol (Invitrogen, Life Technologies). RNA was 91 quantified via Qubit 2.0 (Thermo Fisher Scientific, USA) and then sent for sequencing at 92 the Centro de Genômica Funcional (ESALQ-USP) using Illumina HiSeq 2500. Samples 93 were paired-end sequenced with 100 bp and a target depth of 20 million read pairs per 94 sample. 95 96 Transcriptome assembly & annotation 97 We removed redundant sequences to decrease computational usage for transcriptome 98 assembly using a custom Perl script [17] (supplementary material). De novo assembly was

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

99 conducted in Trinity v. 2.4.0 [18] with concatenated samples, a minimum contig length of 100 199, and other parameters set to default values. 101 Due to the lack of a reference genome, transcriptome annotation was conducted 102 using FunctionAnnotator [19]. This is a web-based tool that blasts sequences against the 103 NCBI NR protein database. FunctionAnnotator was also used for the functional 104 characterization, which uses the B2G4PIPE engine for GO terms assignment. 105 106 Quality control 107 The quality of raw reads was assessed using the program FastQC v. 0.11.9. A quality 108 control plot was constructed using MultiQC v. 1.9 [20] and is represented in Fig. 1. Raw 109 reads of all samples had good quality. We removed low quality sequences and adapters 110 using Trimmomatic v. 0.36 with the following parameters: LEADING:3 TRAILING:3 111 SLIDINGWINDOW:4:20 MINLEN:36. An average of 11.5 ± 0.006 % of raw reads was 112 removed during the trimming process. 113 To avoid redundant transcripts in the transcriptome, we removed the shortest 114 isoforms with a build-in Trinity script. We then assessed transcriptome completeness using 115 the Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 4.1.4 [21] against the 116 Hemiptera ortholog database with default parameters. BUSCO analysis revealed 117 appropriate levels of completeness for the assembled transcriptome (Complete: 90.1%, 118 Fragmented: 1.5%, Missing: 8.4%; Fig. 1). 119 120 Transcriptome characterization 121 RNA sequencing generated approximately 5 million raw reads (Table 1). The de novo 122 assembled transcriptome had almost 400k contigs, with a 35.04% GC content and a mean 123 contig length of 688 bp. After removing the shortest isoforms, 268,000 contigs remained. 124 Out of these, 39,478 had successful matches in the NCBI database according to 125 FunctionAnnotator, of which 36,329 aligned with metazoans and 33,871 with . 126 The vast majority of the contigs were annotated against the Hemiptera sequences, more 127 specifically, against the genome of Halyomorpha halys (Fig. 2). All metrics had comparable 128 values among previous studies with stink bugs (Table 1). 129 For the 33,871 transcripts that aligned with arthropods, 12,665 transcripts had at 130 least one GO term assigned. A total of 5,087 GO terms were identified combining the three 131 categories: biological processes, molecular functions and cellular components. For the 132 function “biological process”, the most represented GO term among the annotated

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

133 transcripts was RNA-dependent DNA biosynthetic process (Fig. 3). For the molecular 134 functions, RNA-binding was the most common GO term, while the cellular component with 135 higher representativeness was the nucleus (Fig. 3). 136 137 Re-use potential 138 Genomic and transcriptomic resources are paramount for advances in pest control and for 139 the understanding of the origins of morphological innovations. Surprisingly, despite the 140 high species diversity and the economic importance of stink bugs, large-scale comparative 141 molecular studies are still scarce for this group of . Here, we present the first large- 142 scale molecular study with the Neotropical pest C. impicticornis. From an economic 143 perspective, we encourage further use of our data to determine the genetic bases of 144 adaptation to new hosts and insecticide resistance. In particular, comparative 145 transcriptomic studies may help explain why some stink bug species are more or less 146 adaptable to different crops in comparison to C. impicticornis. From an evolutionary 147 perspective, our data may be useful to investigate both the mechanisms of species 148 separation and sexual differentiation. As genitalia tend to be the most complex and rapidly 149 evolving traits in insects, studying the genetic/developmental architecture of these 150 structures may help understand the emergence of morphological barriers during 151 speciation. Our study is the first to make available a transcriptome of male and female 152 immatures separately for stink bugs. We also provide, for the first time, transcriptome data 153 of two different moments of a single immature stage, which will potentially inform about 154 developmental heterochrony on a finer scale. Lastly, we also hope to contribute with useful 155 data for phylogenomic studies of hemipterans. 156 157 Availability of Supporting Data 158 Raw data are associated with the NCBI BioProject PRJNA666218 and are deposited at 159 the Sequence Reads Archive (SRA) under the accession codes: SRR12762704, 160 SRR12762703, SRR12762702, SRR12762696, SRR12762692, SRR12762689, 161 SRR12762690, SRR12762691, SRR12762693, SRR12762694, SRR12762695, 162 SRR12762697, SRR12762698, SRR12762699, SRR12762700 and SRR12762701. 163 Transcriptome assembly is deposited at the NCBI Transcriptome Shotgun Assembly 164 Sequence Database (TSA) under the accession code GIVF00000000. 165 166

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

167 Acknowledgments 168 We thank Gisele Antoniazzi and Denis Callio for the assistance with establishing and 169 maintaining the colony, as well as Luiza Saad and Federico Brown for their support 170 with the RNA extractions. We also thank the reviewers Peter Thrope and Guillem Ylla and 171 the editor Scott Edmunds for their constructive criticism during the review. BCG was 172 supported by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) with a 173 Post-doctoral fellowship (proc. n. 18/18184-4). This work was supported by grants to TTT 174 from FAPESP (grant 2016/09659-3). 175 176 Authors’ contributions 177 BCG handled the insect colony, conducted the experiments and RNA extraction. BCG and 178 TTT analyzed the data and wrote the manuscript. 179 180 Competing interests 181 The authors declare no competing interests. 182

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

183 Figures 184

185 Figure 1. Quality control of raw reads using FastQC (left) and analysis of transcriptome 186 completeness using BUSCO (right). 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 Figure 2. Diversity of insect orders from which the contigs of the C. impicticornis 204 transcriptome were annotated. 205 206 207

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 Figure 3. Top ten GO terms for each category with higher abundance in the transcriptome 224 of C. impicticornis annotated with FunctionAnnotator. 225 226

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

227 Tables 228 229 Table 1. Descriptive statistics of raw RNA-seq data and transcriptome assembly of the 230 stink bug C. impicticornis compared with all other transcriptomic studies of pentatomids.

Chinavia Arma Euschistus Halyomorpha Murgantia impicticornis chinensis heros halys histrionica viridula Platform Illumina Illumina Illumina Illumina Illumina Illumina HiSeq. 2500 HiSeq. 2000 HiSeq. 2000 HiSeq. 1000 HiSeq. HiSeq. 2000 Raw reads 522,644,434 nr 126,455,838 439,615,225 nr 284,400,000 Contigs 392,739 112,029 83,114 248,569 526,403 299,148 assembled GC content 35.04% nr 37.12% nr nr nr Mean contig 688 bp 250 bp 1,000 bp nr 812 bp nr length Pipeline Trinity Trinity Trinity Trinity Trinity Trinity GOs 5,087 nr 143,806 1,099 1,348 771 annotated Reference This study [13] [12] [10] [9] [11] BioProject PRJNA666218 PRJNA180995 PRJNA488833 PRJNA24284 PRJNA302154 PRJNA472074 accession 9 231 232 233 References 234 235 1. Schwertner F, Grazia J. O gênero Chinavia Orian (Hemiptera, Pentatomidae, 236 ) no Brasil, com chave pictórica para os adultos. Rev Bras Entomol. 51:416– 237 352007; 238 2. Panizzi AR, Grazia J. Stink bugs (Heteroptera, Pentatomidae) and an unique host plant 239 in the Brazilian subtropics. Iheringia - Ser Zool. 2001; doi: 10.1590/s0073- 240 47212001000100003. 241 3. Silva CCA, Blassioli-Moraes MC, Borges M, Laumann RA. Food diversification with 242 associated plants increases the performance of the Neotropical stink bug, Chinavia 243 impicticornis (Hemiptera: Pentatomidae). Plant Interact. Springer Netherlands; 244 2019; doi: 10.1007/s11829-018-9637-6. 245 4. Leskey TC, Short BD, Butler BR, Wright SE. Impact of the invasive brown marmorated 246 stink bug, Halyomorpha halys (Stål), in mid-Atlantic tree fruit orchards in the United States: 247 Case studies of commercial management. Psyche (London). 2012; doi: 248 10.1155/2012/535062. 249 5. Koch RL, Pezzini DT, Michel AP, Hunt TE. Identification, biology, impacts, and 250 management of stink bugs (Hemiptera: Heteroptera: Pentatomidae) of soybean and corn 251 in the midwestern United States. J Integr Pest Manag. 2017; doi: 10.1093/jipm/pmx004.

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

252 6. Kamminga KL, Koppel AL, Herbert DA, Kuhar TP. Biology and management of the 253 green stink bug. J Integr Pest Manag. 2012; doi: 10.1603/IPM12006. 254 7. Palumbo JC, Perring TM, Millar JG, Reed DA. Biology, Ecology, and Management of an 255 Invasive Stink Bug, hilaris, in North America. Annu Rev Entomol. 2016; doi: 256 10.1146/annurev-ento-010715-023843. 257 8. Tassone EE, Geib SM, Hall B, Fabrick JA, Brent CS, Hull JJ. De novo construction of an 258 expanded transcriptome assembly for the western tarnished plant bug, Lygus hesperus. 259 Gigascience. 2016; doi: 10.1186/s13742-016-0109-6. 260 9. Sparks ME, Rhoades JH, Nelson DR, Kuhar D, Lancaster J, Lehner B, et al.. A 261 transcriptome survey spanning life stages and sexes of the harlequin bug, Murgantia 262 histrionica. Insects. 2017; doi: 10.3390/insects8020055. 263 10. Sparks ME, Shelby KS, Kuhar D, Gundersen-Rindal DE. Transcriptome of the invasive 264 brown marmorated stink bug, Halyomorpha halys (Stal) (Heteroptera: Pentatomidae). 265 PLoS One. 2014; doi: 10.1371/journal.pone.0111646. 266 11. Lavore A, Perez-Gianmarco L, Esponda-Behrens N, Palacio V, Catalano MI, Rivera- 267 Pomar R, et al.. (Hemiptera: Pentatomidae) transcriptomic analysis and 268 neuropeptidomics. Sci Rep. 2018; doi: 10.1038/s41598-018-35386-4. 269 12. Cagliari D, Dias NP, dos Santos EÁ, Rickes LN, Kremer FS, Farias JR, et al.. First 270 transcriptome of the Neotropical pest Euschistus heros (Hemiptera: Pentatomidae) with 271 dissection of its siRNA machinery. Sci Rep. 2020; doi: 10.1038/s41598-020-60078-3. 272 13. Zou D, Coudron TA, Liu C, Zhang L, Wang M, Chen H. Nutrigenomics in Arma 273 chinensis: transcriptome analysis of Arma chinensis fed on artificial diet and chinese oak 274 silk moth Antheraea pernyi pupae. PLoS One. 2013; doi: 10.1371/journal.pone.0060881. 275 14. Sparks ME, Bansal R, Benoit JB, Blackburn MB, Chao H, Chen M, et al.. Brown 276 marmorated stink bug, Halyomorpha halys (Stål), genome: Putative underpinnings of 277 polyphagy, insecticide resistance potential and biology of a top worldwide pest. BMC 278 Genomics. BMC Genomics; 2020; doi: 10.1186/s12864-020-6510-7. 279 15. Genevcius BC, Simon MN, Moraes T, Schwertner CF. Copulatory function and 280 development shape modular architecture of genitalia differently in males and females. 281 Evolution (N Y). 2020; doi: 10.1111/evo.13977. 282 16. Lu Y, Chen M, Reding K, Pick L. Establishment of molecular genetic approaches to 283 study gene expression and function in an invasive hemipteran, Halyomorpha halys. 284 Evodevo. BioMed Central; 2017; doi: 10.1186/s13227-017-0078-6. 285 17. Tandonnet S, Torres TT. Traditional versus 3′ RNA-seq in a non-model species. 286 Genomics Data. The Author(s); 2017; doi: 10.1016/j.gdata.2016.11.002.

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.12.04.412270; this version posted December 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

287 18. Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al.. De 288 novo transcript sequence reconstruction from RNA-seq using the Trinity platform for 289 reference generation and analysis. Nat Protoc. 2013; doi: 10.1038/nprot.2013.084. 290 19. Chen TW, Gan RC, Fang YK, Chien KY, Liao WC, Chen CC, et al.. FunctionAnnotator, 291 a versatile and efficient web tool for non-model organism annotation. Sci Rep. 2017; doi: 292 10.1038/s41598-017-10952-4. 293 20. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: Summarize analysis results for 294 multiple tools and samples in a single report. Bioinformatics. 2016; doi: 295 10.1093/bioinformatics/btw354. 296 21. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V., Zdobnov EM. BUSCO: 297 Assessing genome assembly and annotation completeness with single-copy orthologs. 298 Bioinformatics. 2015; doi: 10.1093/bioinformatics/btv351. 299

11