Genome of prolixus, an vector of , reveals unique adaptations to hematophagy and parasite infection Rafael D. Mesquita, Claudio R. Lazzari

To cite this version:

Rafael D. Mesquita, Claudio R. Lazzari. Genome of , an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection. Proceedings of the National Academy of Sciences of the United States of America , National Academy of Sciences, 2015, 112 (48), pp.14936-14941. ￿10.1073/pnas.1506226112￿. ￿hal-01323805￿

HAL Id: hal-01323805 https://hal.archives-ouvertes.fr/hal-01323805 Submitted on 27 May 2020

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés.

Distributed under a Creative Commons Attribution - NonCommercial| 4.0 International License Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection

Rafael D. Mesquitaa,b,1,2, Raquel J. Vionette-Amaralc,1, Carl Lowenbergerd, Rolando Rivera-Pomare,f, Fernando A. Monteirob,g, Patrick Minxh, John Spiethh, A. Bernardo Carvalhob,i, Francisco Panzeraj, Daniel Lawsonk, André Q. Torresa,g, Jose M. C. Ribeirol, Marcos H. F. Sorgineb,c, Robert M. Waterhousem,n,o,p, Michael J. Montagueh, Fernando Abad-Franchq,3, Michele Alves-Bezerrac, Laurence R. Amaralr, Helena M. Araujob,s, Ricardo N. Araujob,t, L. Aravindu, Georgia C. Atellab,c, Patricia Azambujab,g, Mateus Bernis, Paula R. Bittencourt-Cunhac, Gloria R. C. Braza,b, Gustavo Calderón-Fernándezv, Claudia M. A. Cararetow, Mikkel B. Christensenk, Igor R. Costac, Samara G. Costag, Marilvia Dansax, Carlos R. O. Daumas-Filhoc, Iron F. De-Paulac, Felipe A. Diasb,c, George Dimopoulosy, Scott J. Emrichz, Natalia Esponda-Behrense, Patricia Fampaaa, Rita D. Fernandez-Medinabb, Rodrigo N. da Fonsecab,cc, Marcio Fonteneleb,s, Catrina Fronickh, Lucinda A. Fultonh, Ana Caroline Gandarab,c, Eloi S. Garciab,g, Fernando A. Gentab,g, Gloria I. Giraldo-Calderóndd, Bruno Gomesb,g, Katia C. Gondimb,c, Adriana Granzottow, Alessandra A. Guarnerib,ee, Roderic Guigóff,gg, Myriam Harryhh,ii, Daniel S. T. Hughesk, Willy Jablonkac, Emmanuelle Jacquin-Jolyjj, M. Patricia Juárezv, Leonardo B. Koerichb,i, José Manuel Latorre-Estivalisb,ee, Andrés Lavoree, Gena G. Lawrencekk, Cristiano Lazoskib,i, Claudio R. Lazzarill, Raphael R. Lopesc, Marcelo G. Lorenzob,ee, Magda D. Lugonx, David Majerowiczc,mm, Paula L. Marcetkk, Marco Mariottiff,gg, Hatisaburo Masudab,c, Karine Megyk, Ana C. A. Meloa,b, Fanis Missirlisnn, Theo Motaoo, Fernando G. Noriegapp, Marcela Nouzovapp, Rodrigo D. Nunesb,c, Raquel L. L. Oliveiraa, Gilbert Oliveira-Silveirac, Sheila Onse, Lucia Pagolae, Gabriela O. Paiva-Silvab,c, Agustina Pascuale, Marcio G. Pavang, Nicolás Pedriniv, Alexandre A. Peixotob,g, Marcos H. Pereirab,t, Andrew Pikey, Carla Polycarpob,c, Francisco Prosdocimic, Rodrigo Ribeiro-Rodriguesqq, Hugh M. Robertsonrr, Ana Paula Salernoss, Didier Salmonc, Didac Santesmassesff,gg, Renata Schamab,g, Eloy S. Seabra-Juniorss, Livia Silva-Cardosoc, Mario A. C. Silva-Netob,c, Matheus Souza-Gomesr, Marcos Sterkelc, Mabel L. Taracenac, Marta Tojott, Zhijian Jake Tuuu, Jose M. C. Tubiovv, Raul Ursic-Bedoyad, Thiago M. Venanciob,x, Ana Beatriz Walter-Nunoc, Derek Wilsonk, Wesley C. Warrenh, Richard K. Wilsonh, Erwin Huebnerww, Ellen M. Dotsonkk,2,4, and Pedro L. Oliveirab,c,2,4

Edited by Alberto Carlos Frasch, Universidad de San Martin and National Research Council (Consejo Nacional de Investigaciones Científicas y Técnicas de Argentina), San Martin-C.P., Argentina, and approved October 6, 2015 (received for review June 3, 2015)

Rhodnius prolixus not only has served as a model organism for the gametogenesis, the role of hemolymph proteins and lipids in oo- studyofinsectphysiology,butalsoisamajorvectorofChagasdis- genesis, and ion and water transport mechanisms. As an obligate ease, an illness that affects approximately seven million people world- blood-feeding hemipteran, this insect has adapted remarkably well to wide. We sequenced the genome of R. prolixus, generated assembled digesting and eliminating the potentially toxic by-products of blood sequences covering 95% of the genome (∼702 Mb), including 15,456 digestion. R. prolixus is also a major vector of ,the putative protein-coding genes, and completed comprehensive geno- parasitic protozoan that causes Chagas disease in humans. This mic analyses of this obligate blood-feeding insect. Although immune- disease, commonly considered a disease of the poor, causes pre- deficiency (IMD)-mediated immune responses were observed, R. pro- mature heart failure in humans and is responsible for high economic lixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Al- Author contributions: E.M.D. and P.L.O. conducted project coordination; R.D.M., C. Lowenberger, though both Toll and IMD effectors controlled intestinal microbiota, R.R.-P., F.A.M., A.B.C., F. Panzera, E.H., E.M.D., and P.L.O. were on the Steering Committee; neither affected Trypanosoma cruzi, the causal agent of Chagas dis- F.A.M., G.G.L., C. Lazoski, P.L.M., M.G.P., and E.M.D. conducted colony selection; G.G.L., ease, implying the existence of evasion or tolerance mechanisms. P.L.M., and E.M.D. supplied sequencing samples; P.M., J.S., C.F., L.A.F., W.C.W., and R.K.W. R prolixus conducted sequencing and genome assembly; R.J.V.-A., M.H.F.S., and D. Salmon conducted . has experienced an extensive loss of selenoprotein genes, experimental analyses; R.D.M., D.L., A.Q.T., J.M.C.R., R.M.W., M.B.C., S.J.E., G.I.G.-C., D.S.T.H., with its repertoire reduced to only two proteins, one of which is a K.M., E.S.S.-J., and D.W. conducted bioinformatics; all authors conducted gene anno- selenocysteine-based glutathione peroxidase, thefirstfoundininsects. tation and supplementary material writing; R.D.M., R.J.V.-A., C. Lowenberger, R.R.-P., A.Q.T., J.M.C.R., M.H.F.S., R.M.W., M.J.M., and P.L.O. wrote the paper; and R.D.M., The genome contained actively transcribed, horizontally transferred C. Lowenberger, R.R.-P., F.A.M., P.M., A.B.C., F. Panzera, M.H.F.S., R.M.W., M.J.M., E.M.D., genes from Wolbachia sp., which showed evidence of codon use evo- and P.L.O. conducted paper revision. lution toward the insect use pattern. Comparative protein analyses The authors declare no conflict of interest. revealed many lineage-specific expansions and putative gene absences This article is a PNAS Direct Submission. R prolixus in . , including tandem expansions of genes related to che- Data deposition: The cloned rpRelish sequence reported in this paper has been deposited moreception, feeding, and digestion that possibly contributed to the in the GenBank database (accession no. KP129556). The genomic assembly (RproC1) and evolution of a blood-feeding lifestyle. The genome assembly and these the consensus gene prediction (version 1.3) have been deposited in the VectorBase re- associated analyses provide critical information on the physiology and pository, https://www.vectorbase.org. The raw genomic reads have been deposited in the Sequence Read Archive (BioSample accession nos. SAMN02953759 and SAMN00009535– evolution of this important vector species and should be instrumental SAMN00009542). See also Genbank Assembly accession no. GCA_000181055.2. for the development of innovative disease control methods. 1R.D.M. and R.J.V.-A. contributed equally to this work. 2To whom correspondence may be addressed. Email: [email protected], edotson@cdc. Rhodnius prolixus | genome | hematophagy | immunity | Chagas disease gov, or [email protected]. 3Present address: Centro de Pesquisa René Rachou, Minas Gerais 30190-002, Brazil. ating back to Wigglesworth’s pioneering work in the 1930s 4E.M.D. and P.L.O. contributed equally to this work. D(1), Rhodnius prolixus (Fig. 1A) has served as a model organism This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. for the study of cellular and physiological processes in , such as 1073/pnas.1506226112/-/DCSupplemental.

14936–14941 | PNAS | December 1, 2015 | vol. 112 | no. 48 www.pnas.org/cgi/doi/10.1073/pnas.1506226112 recombination/repair enzymes, including one Holliday junction Significance helicase (RPRC004559), two DNA mismatch repair MutL pro- teins (RPRC010818 and RPRC011745), and one DNA polymerase Rhodnius prolixus is a major vector of Chagas disease, an illness I (RPRC006597). These findings suggest that extensive machinery caused by Trypanosoma cruzi which affects approximately to transpose, recombine, and repair the host DNA were likely in- 7 million people worldwide. This report describes the first ge- strumental for the success of Wolbachia gene transfer to R. prolixus. nome sequence of a nondipteran insect vector of an important human parasitic disease. This insect has a gene repertoire sub- RNAi Machinery and Target Genes. RNAi is a posttranscriptional stantially distinct from dipteran disease vectors, including im- gene-silencing mechanism triggered by double-stranded RNAs mune signaling pathways that display major departures from (dsRNAs) that degrade a target messenger RNA (mRNA) in a the canonical network. Large gene expansions related to che- sequence-specific manner (11). The RNAi mechanism can also be moreception, feeding, and digestion have facilitated triatomine triggered by microRNAs (miRNAs), which are small noncoding adaptation to a blood-feeding lifestyle. This study provides in- RNAs (12). We identified RNAi machinery (Table D5 in Dataset SI formation about the physiology and evolution of an important S1) as well as precursors and 87 mature miRNA sequences ( Appendix, Figs. A3 and A4 and Tables D6–D9 in Dataset S1) that disease vector that will boost understanding of transmission of a comprised a complete gene-silencing pathway for 804 potential life-threatening parasite and may lead to the development of target genes (SI Appendix, Fig. A3C and Tables D10 and D11 in innovative control methods. Dataset S1). Some miRNA clusters were similar to those de- scribed for Caenorhabditis elegans (13), such as the cluster rpr- miR-71/2a-2/13a/13b/2c/2a-1 (SI Appendix,Fig.A3B)thatwas and social costs. Approximately 10,000 people die from the disease identified in one of the introns of the phosphatase-four-like pro- annually and 100 million people are at risk for infection (2). tein gene (RPRC004331). The miRNA-target gene pairs also Results and Discussion showed conservation; for example, rpr-miR-124-3p targeting the ROCK1 protein kinase gene (RPRC007732-RA) (14) and rpr-miR- Genome Landscape. We assembled 702.6 Mb (v3.0.1, designated 10-3p targeting the Huntington-like gene (RPRC006430-RA) (15). RproC1) of the predicted 733-Mb genome size (3, 4), with mean × depth coverage of 8 . The current assembly includes 27,872 Immune Pathways and Their Effects on Intestinal Microbiota and scaffolds (SI Appendix, Fig. A1), with a measured GC content of T. cruzi. We identified most of the genes from canonical immune GENETICS 27.1% that was slightly lower than that of another hemipteran, pathways, including the Toll, immune deficiency (IMD), and Jak/ the pea aphid, Acyrthosiphon pisum (29.2%), although compa- STAT pathways, and several immune effectors, including a rable GC content was found in protein-coding regions (5). The substantial LSE of defensins (SI Appendix,Fig.A5and Table consensus gene prediction (VectorBase 1.3) includes 15,456 D12 in Dataset S1). However, some canonical components of the protein-coding genes and 738 RNA genes (Figs. 1A and SI Ap- IMD pathway were not detected: IMD,Fas-associatedprotein pendix, Fig. A1). We also found 25 Y-chromosome linked scaf- with death domain (Fadd), death-related ced-3/Nedd2-like cas- folds, including nine Y-linked genes. pase (Dredd), and Caspar. A similar observation was reported for Orthologous gene clustering analyses (6) classified R. prolixus the pea aphid, where a more extensive loss of the IMD pathway genes with orthologs in other insects and outgroup species (Fig. genes purportedly allowed the development of its obligate endo- 1B), and highlighted 630 lineage-specific expansions (LSE) in many symbiont (16). In R. prolixus, however, some members of the IMD R. prolixus gene groups related to defense mechanisms, behavior, pathway were found, such as the peptidoglycan recognition pro- development, and physiology. A comparison of gene families based teins (PGRPs) and the NF-κB/Rel homolog, rpRelish (KP129556) on protein domain annotations revealed a few putative lineage- (Fig. 2A). The expression of the latter was up-regulated in both specific reductions (LSR), where R. prolixus possessed substantially the intestinal epithelium and fat body 24 and 72 h after a blood fewer genes. These LSEs and LSRs will be specifically discussed meal (Fig. 2 B and C), thereby indicating an active IMD pathway below in relation to their biological roles. in R. prolixus. This possibility was investigated by rpRelish knock- down (SI Appendix,Fig.A6B), which decreased expression of Transposable Elements and Horizontal Gene Transfers. Approximately midgut defensin A (AAO74624), whereas expression of lysozyme B 1,400 transposable elements (TEs), with members from most of (ABX11554) remained unchanged and lysozyme A expression the known TE superfamilies, comprised 5.6% of the genome; (ABX11553) increased (Fig. 2D).Thesedatasuggestthat this percentage was much lower relative to Aedes aegypti (∼50%) rpRelish directly controls defensin A expression, but not the and Anopheles gambiae (∼20%) but similar to other anophelines lysozymes. Silencing of rpRelish also increased the population of (2–11%) (7). Nearly 70% of the TEs belonged to class II (mariner the symbiotic bacteria Rhodococcus rhodnii (Fig. 2 E–G), thus TEs) and a single superfamily (DTTRP1-7), composed of seven providing further support for an active IMD pathway despite the new families, represents almost 3% of the genome (Fig. 1 A and C lack of several canonical proteins. This could be explained by and SI Appendix, Tables A1 and A2). Many of these sequences either the existence of unknown alternative components linking have full-length transposases and ORFs suggesting a recent period the PGRP receptors to rpRelish or, alternatively, to a novel of increased transposition (8). rewiring of the immune network (Fig. 2I). The role of the IMD Horizontal gene transfers from Wolbachia were likely the or- pathway in the control of the gut microbiota led us to investigate igin of 27 genes (Tables D1–D3 in Dataset S1) located in some its role in the control of T. cruzi. Unexpectedly, rpRelish-silenced of the 85 Wolbachia-like genomic regions. These regions span- insects infected with T. cruzi (strain DM28C) did not change ned 100–2,500 bp, which were smaller and more dispersed than parasite loads after 7 (SI Appendix, Fig. A6C) or 14 days post- those reported in Glossina morsitans (two segments of ∼500 Kb) infection (Fig. 2H). Even more surprisingly, silencing the tran- (9). A previous transcriptome analysis showed that eight of these scription factor rpDorsal of the Toll pathway (17) (SI Appendix, genes appear to be transcribed in the midgut (10). Codon use Fig. A6A), did not change parasite levels (Figs. 2H and SI Ap- analysis of the horizontally transferred genes (HTG) revealed that, pendix, Fig. A6C). These findings strongly indicate that either T. of the 61 amino acid codons, 38 codons showed a shift from a cruzi infection does not activate the insect’simmunesystemorthat Wolbachia spp. use bias toward the Rhodnius codon frequency (SI the parasite is not affected by antimicrobial peptides that are pro- Appendix,Fig.A2and Table D4 in Dataset S1). This codon fre- duced in response to parasitism or the ingestion of a blood meal. A quency shift suggests an ongoing adaptive process whereby the HTG plausible hypothesis for these data is that T. cruzi developed active coevolved with the insect tRNA profile for more efficient trans- evasion or tolerance mechanisms, because activation of these im- lation. The HTG also revealed two transposases (RPRC000742 and mune pathways did not control T. cruzi populations, as previously RPRC000770), one reverse transcriptase (RPRC000723), and DNA suggested (18, 19).

Mesquita et al. PNAS | December 1, 2015 | vol. 112 | no. 48 | 14937 ABIV C III II Hemimetabola Holometabola Pogo Piggyback I Helitrons P R. prolixus N. vitripennis Maverick Sola C. lectularius A. mellifera hAT A. pisum T. castaneum TC1 P. humanus B. mori Gypsy 607 686 Diptera Outgroups Pao Copia D. melanogaster C. elegans G. morsitans I. scapularis 628 921 422 A. gambiae S. maritima A. aegypti D. pulex MITE LTR 532 769 787 2034 Jockey 16K NLTR LoA 3017115 234 [2253] 12K CLASS II Rose * 168 112 Totopi 8K 8K Felyant CR1-like s 123 4KNo homology 4K RTE 16 species4Multi-copy set Self-homology Mariner I Single-copy Metazoan homology R1

Rp Cl ApGenes Widespread orthology No orthology Genes Rp Cl Ap Ingi

Fig. 1. Rhodnius genome. (A) Genome overview. All scaffolds are represented in layer I and are organized clockwise from the longest to the shortest, starting at the arrowhead. The genic (layer II, red) and TEs (layer III, blue) showed opposite densities until the asterisk (*) and were similarly low from this point until the end (shorter scaffolds). The Wolbachia sp. insertions (layer IV, orange) were observed throughout the genome without a trend. The kernel picture illustrates an adult R. prolixus.(B) Gene clustering. The Venn diagram partitions 15,439 OrthoMCL gene clusters according to their species compo- sitions for R. prolixus and three other Hemimetabola (blue), four Diptera (yellow), four other Holometabola (green), and four noninsect outgroup species (pink). The 6,993 R. prolixus genes show widespread orthology (white circle, and bars, Bottom Left): these are part of the 7,115 clusters that have repre- sentatives from each of the four species sets, of which a conserved core of 2,253 clusters have orthologs in all 16 species. The 5,498 R. prolixus genes show no confident orthology (bars, Bottom Right), but most of these are homologous (e-value < 1e-05) to genes from other animals or to genes in its own genome. (C) TE distribution. The inner chart represents the three main classes of TEs (LTRs, non-LTRs, and class II), and the outer shows the distribution of TE superfamilies within each class. The charts are based on the total base pairs occupied by TE-related sequences longer than 0.5 Kb, as shown in SI Appendix, Table A1.

Selenocysteine Machinery, Selenoproteins, and Detoxification Enzymes. (RPRC014349) of the two R. prolixus thioredoxin reductases (Fig. Selenocysteine (Sec) is a proteinogenic amino acid that is in- 3A). These results, together with the finding of a Sec-GPx in corporated into proteins during translation by the reassignment of R. prolixus, indicate that insects originally had a more extensive specific UGA codons. We identified enzymes involved in Sec syn- repertoire of selenoproteins, and that Sec loss and replacement thesis and insertion (Table D13 in Dataset S1), but only two Sec by cysteine occurred independently in different species. proteins were found: selenophosphate synthase 2 (RPRC009014), A substantial LSE of enzymes involved in drug and detoxifica- which is involved in Sec synthesis, and glutathione peroxidase (GPx; tion occurred, such as in carboxylesterases and cytochrome P450 RPRC011108), a major antioxidant protein. In all insect genomes (Tables D14 and D15 in Dataset S1) (see also ref. 21). These sequenced to date, the known GPx genes were nonselenium, cys- enzymes, together with glutathione S-transferases, can drive in- teine-based enzymes. It was previously hypothesized that Sec was secticide resistance. Their identification is particularly relevant for replaced by cysteine in an ancient common ancestor of all insects. public health surveillance, as recent reports indicated that natural The discovery of a Sec-based GPx in an insect brings new insight triatomine populations have responded with increasingly higher into the evolution of this gene family, indicating that the Sec-GPx levels of insecticide resistance and could threaten the successful genes are more widespread and conserved than previously thought vector control initiatives that have occurred over the last two and (20). We also observed a recent Sec-to-cysteine conversion in one half decades in Latin America (22, 23).

A EFGI200 * 70 * 15 DAP-type 60 peptidoglycan PGRP 150 50 5 47 228 493 623 5 5 10 40 1 715 100 IMD RHD IPT Ank 30 5 FADD x Pirk 235 334 50 20 x x CFU x 10 CFU x 10 10 CFU x 10 T Dredd Tab Caspar 0 0 0 Tak x x BCD Mal Rel Mal Rel Mal Rel ? 8 5 40 * H Ankyrin * * Ird5 Kenny * 4 7 10 27 27 20 25 28 2127 27 20 Mal x 6 30 6 x 10 Rel 5 Relish 3 * * 10 Dor 4 20 104 2 103 2 Nucleus Caudal T. cruzi T. 2 Relish 2 10 1 101 Relative expression 100 0 -1 0 0 10 Defensin AM PM R Def Fas 24h 72h Fas 24h 72h LysA LysB

Fig. 2. Characterization of IMD pathway and control of T. cruzi replication. Although several proteins from the IMD pathway are missing from the R. prolixus genome, the pathway is still active. (A) The rpRelish protein contains a Rel homology domain (RHD), an Ig-like fold, Plexins, a transcription factor domain (IPT), and several ankyrin (Ank) domains. (B and C) In response to the growth of the native microbiota following a blood meal, rpRelish expression is increased at 24 and 72 h postblood meal in both the midgut (B) and fat body (C). Fas: fasting. When rpRelish expression is knocked down (pink) in the gut using RNAi (SI Appendix,Fig.A6B), the expression of defensin A (Def) is greatly reduced, and the expression of lysozyme A (LizA) is increased (D). Upon knockdown of rpRelish expression (pink), the bacteria load in the anterior midgut (AM) (E) and posterior midgut (PM) (F) is increased, with the same trend observed in the rectum (R) (G). dsMal control was shown in gray, *P < 0.05. Upon knockdown of rpDorsal or rpRelish expressions (SI Appendix,Fig.A6), T. cruzi levels are not altered in any section of the digestive tract 14 d after infection (H). dsMal control (gray), dsRelish (pink) and dsDorsal (blue). Median values are at the graphic top. (I) Representation of the IMD R. prolixus immune pathways depicting the absence of key IMD pathway components. In R. prolixus, although the Toll and Jak/STAT pathways are usually highly conserved, members of IMD pathway are missing (dashed lines shapes with red “X”), including the IMD, Fadd,andDredd genes and the negative modulators Pirk (poor IMD response upon knock-in) and Caspar. However, the pathway receptors, PGRPs and a homolog of the transcription factor Relish (rpRelish) are present, suggesting that the activity of the pathway is exerted through a noncanonical mechanism.

14938 | www.pnas.org/cgi/doi/10.1073/pnas.1506226112 Mesquita et al. Signaling and Development. We identified and manually curated represent a more recent loss as it was found in the pea aphid (SI most of the major metazoan cell signaling pathways related to Appendix, Fig. A12). Rhodnius saliva was previously studied for development and metabolism (Table D16 in Dataset S1). Tyrosine its capacity to interfere with host blood clotting, platelet aggre- kinases are a class of protein kinases found exclusively in meta- gation, and vasoconstriction (41), and also for its immunomod- zoans that regulate functions related to multicellularity, including ulatory activity. Lipocalin genes [n = 51 genes, some previously intercellular communication, growth, differentiation, adhesion, described (10, 41)], which transcribe the most abundant salivary and cell death. The tyrosine kinome (TK) of R. prolixus contained proteins, formed large LSEs, and occurred as tandem clusters (SI only 17 genes (Table D17 in Dataset S1), which were confined to Appendix, Fig. A15 and Table D28 in Dataset S1). We discovered 13 subfamilies (10 genes encode receptor TKs and 7 encode sol- 12 previously unidentified members of the nitrophorin clade (Fig. uble TKs). This represents the smallest TK described to date; for 3B), which encode lipocalins that carry nitric oxide in R. prolixus comparison, A. gambiae has 32 TKs and H. sapiens has 90 TK saliva. We also found LSEs occurring in tandem clusters for members distributed among 30 subfamilies (24, 25). aspartic peptidases (cathepsin-D like aspartic proteases), probably Using orthologous relations to other insects (26), we identified reflecting a replacement of digestive serine proteinases found in genes involved in oogenesis, anteroposterior (AP) and dorso- most insects by lysosomal proteinases for blood digestion in he- ventral (DV) axes determination (Tables D18 and D19 in mipterans (Table D29 in Dataset S1). Among the putative LSRs Dataset S1). Of note, the germ plasm genes osk and valois (vls) (Table D30 in Dataset S1), a reduced set of only seven amiloride- were absent from R. prolixus. The absence of osk likely reflects a sensitive sodium channels (SI Appendix,Fig.A16) may also be gene loss in insect evolution, because it was present in the linked to hematophagy because amiloride-sensitive sodium chan- transcriptome of the basally branching cricket Gryllus bimacu- nels are involved in the malpighian tubules filtration. This paucity latus (27), where it was incorporated during neural development. could have evolved to limit sodium transport for the production of Among the embryonic AP genes, Kruppel (RPRC000102) and hypoosmotic urine and can be related to the unusually weak re- giant (RPRC001027) were functionally confirmed as gap genes sponse of R. prolixus sodium channels to amiloride (42). (28, 29). We identified single-copy orthologs for the DV pat- Moreover, we manually annotated other gene families related to terning genes, including most Toll and bone morphogenetic hematophagy but without displaying signals of LSE. These families protein (BMP) pathway elements. The BMP type I receptor Sax, included the following pathways/functions: iron/heme binding, however, was not found in R. prolixus or in previous analyses transport and metabolism (43, 44) (Table D31 in Dataset S1); be- (30). Our functional analysis showed that one of the rpToll genes havioral control of the blood-seeking habits (45), including loco- (RPRC009262), similarly to that reported for hymenopterans motory and visual activities, memory formation, temperature and GENETICS (31, 32), performs both DV and AP embryonic patterning roles humidity detection, mechanoreception, and circadian clock control (30), pointing to a potentially novel role that this pathway ele- (Tables D32 and D33 in Dataset S1); energy metabolism pathways ment might display during embryogenesis. The role of hypoxia in [glycolysis, gluconeogenesis, and pentose-phosphate pathways (Table the development of the tracheal system—originally reported in D34 in Dataset S1)]; and regulatory peptides and receptors, as well as R. prolixus by Wigglesworth (33)—involves a set of remarkably con- 37 neuropeptide precursors or hormone genes (46). served genes (Table D20 in Dataset S1) (34). The role of the corpora As reported for A. pisum (47), most urea cycle enzymes were allata hormone, later named (JH), in the control not observed, with the exception of argininosuccinate synthetase of insect metamorphosis and reproduction was also another of Wigglesworth’s landmarks in the history of biology using R. prolixus as the experimental model (35). Although eight different forms of JH have been identified in other insects, ironically, the identity of the JH A U Caenorhabditis elegans U Pediculus humanus forms present in triatomines are still unknown. Nevertheless, we 100 100 100 U Pachypsylla venusta identified R. prolixus genes coding for the JH biosynthetic pathway/ U Diaphorina citri 100 95 Cimex lectularius signaling system (36) (Table D21 in Dataset S1). 99 U Cimex lectularius 100 99 C Rhodnius prolixus (RPRC014349) Among developmental genes, marked LSEs have occurred in U Cimex lectularius several transcription factor (TF) families. Recent independent C Caenorhabditis elegans C Pediculus humanus expansions of the Pipsqueak TF family in R. prolixus, as well as 100 C Drosophila melanogaster 100 100 C Drosophila melanogaster A. pisum, were driven in part by transponsons, with the potential C Acyrthosiphon pisum 99 Pachypsylla venusta reuse of DNA-binding domains of transposases as new TFs. 99 100 C Diaphorina citri 96 C Frankliniella occidentalis Given the developmental roles of Pipsqueak TFs in Drosophila, C Cimex lectularius 99 C these LSEs might have played some role in the morphological 0.5 100 83 Oncopeltus fasciatus C Rhodnius prolixus (RPRC006341) and behavioral diversification of hemipterans. Other predicted 100 C Halyomorpha halys TF expansions were also found as resulting from proliferation of RPRC000250−PA B 95 RPRC000537−PA transposable elements (37). Other expansions related to devel- 83 TRIDI 270381619 87 TRIMA 307094890 opment were found in cuticle genes, including those that tran- RPRC000999−PA scribe cuticular proteins and sclerotization enzymes (SI Appendix, 100 RPRC000114−PA NP1 33518667 RPRC000313−PA Figs. A7–A10 and Tables D22 and D23 in Dataset S1). 88 99 RPRC000077−PA RPRC000034−PA 94 100 RPRC000498−PA RPRC000058−PA Hematophagy-Related Genes. All developmental stages of R. pro- 96 RPRC000023−PA NP4 3219833 lixus, from first instar nymphs to adults of both sexes, feed ex- 71 RPRC000380−PA 93 100 RPRC000367−PA clusively on blood. Chemoreception is essential to host finding RPRC000072−PA NP2 3219826 1 95 100 RPRC000284−PA (38), and several LSEs that we identified included odorant and RPRC000061−PA gustatory receptors as well as odorant and chemosensory binding proteins (SI Appendix, Figs. A11–A14 and Tables D24–D27 in Fig. 3. Thioredoxin reductases and nitrophorins trees. (A) The tree shows the Dataset S1). Some of the recent LSEs, such as OR58-87 (SI phylogenetic relationship of thioredoxin reductases (TR) in nine Paraneoptera Appendix, Fig. A11) (named “recent Rhodnius expansion”) were genomes. Colored balls indicate the amino acid aligned to the Sec position: green (U) selenocysteine, red (C) cysteine, and gray (–) unknown/unaligned. found in tandem arrays (Table D24 in Dataset S1). As in other One of the two Cys-TR in R. prolixus (RPRC014349) clusters with the Sec-TR blood-feeding insects, detection of carbon dioxide contributes proteins, implying a Sec to Cys conversion. (B) Nitrophorins branch of lipocalins to host finding in triatomines (39). Nevertheless, the carbon di- tree exemplify a Rhodnius enriched cluster. Sequences clustered by OrthoMCL oxide receptor subfamily was absent in R. prolixus, as previously on group 691 together with unclustered sequences similar to lipocalins were reported in the body louse and the pea aphid, despite being aligned and a tree constructed (SI Appendix,Fig.A15). Black dots mark new highly conserved in holometabolous insects (40). The sugar re- Rhodnius sequences. TRIDI, Triatoma dimidiata;TRIMA,Triatoma matog- ceptor subfamily was also missing in R. prolixus, but this may rossensis. Support values based on bootstrap are included at nodes.

Mesquita et al. PNAS | December 1, 2015 | vol. 112 | no. 48 | 14939 (RPRC003927) (Table D35 in Dataset S1). The uricolysis path- third method, used to discover MITEs, relied upon Findmite (55) and a way, which is present in mosquitoes (48), is putatively absent in repeat library produced with RepeatScout (56). We assessed the genomic R. prolixus. Absence of both pathways explains the lack of coverage of all TE using BLAT (57). We identified genomic duplicated re- elimination of nitrogen as urea, also originally described by gions using BLASTN (52) of all scaffolds against each other with 5-Kb cut- Wigglesworth, even before the discovery of the urea cycle (1). As off to avoid TE detection. is the case for most eukaryotes, the R. prolixus genome does not miRNA precursors and mature sequences were retrieved using Einverted encode several pathways for essential amino acid synthesis (47) (58) and filtered. Their target genes were predicted using miRanda (59). (Table D36 in Dataset S1), but the lack of almost all urea cycle Selenoprotein genes were identified using Selenoprofiles (60) and Seblastian enzymes (Table D35 in Dataset S1) also precludes the de novo (61). We also identified SECIS elements using SECISearch3 (61). We used tRNAscan- synthesis of arginine. This is noteworthy because arginine is the SE (62) to detect tRNA-Sec. Selenoproteins multiple sequence alignments used precursor for nitric oxide, an important salivary vasodilator for T-Coffee (63) and then performed phylogenetic analyses by reconstructing trees blood-feeding insects. We postulate that adaptation to a diet rich with maximum likelihood using the best-fitting evolutionary model (64). in amino acids, such as blood, allowed for relaxation of the Wolbachia similar regions were searched using BLASTN (52) to compare constraints on the arginine synthesis pathway. the Rhodnius genome with 16 Wolbachia genomes. The gene models This first analysis of the R. prolixus genome generated novel in- present in these regions were selected and grouped with those having a sights regarding multiple adaptive mechanisms that likely contrib- Wolbachia protein as best hit in a BLASTP (52). This gene group had their uted to a strict blood-feeding lifestyle, and thus provided new codon-use compared with Rhodnius and Wolbachia backgrounds by Ex- working hypotheses that will drive future research in the biology pander Tool (65). of triatomines and Chagas disease. Some of the unique features of Immunity genes were searched in expressed sequence tags, gene pre- R. prolixus, including its numerous lineage-specific gene family ex- diction, assembled genome, and genomic unassembled raw reads to assure pansions, the peculiar immune network, and a “silent” relationship reliability of the absences; all searches used BLAST (52) and homologous genes between the insect host and the trypanosome parasite, also provide from closely related taxa. The rpRelish gene was manually assembled using a new starting point for further understanding its evolutionary ad- contigs of a previously published transcriptome (10). For confirmation, rpRelish aptations. In summary, our data illustrate the important role of was cloned and sequenced. Insects were immunologically challenged through R. prolixus as an additional study model to that of dipteran species in blood meal as it increases endosymbiont (R. rhodnii) population in the gut. Total RNA was extracted using the TRIZOL reagent (Invitrogen) following the the vector biology arena and in the broader insect biology landscape ’ as well. This work also reinforces the extraordinarily rich legacy of manufacturer s instructions. Real-time PCR (quantitative RT-PCR) was used to assess the transcript abundance and silencing efficiency of the genes of in- Wigglesworth, whose experimental studies on R. prolixus illumi- terest. T7 Megascript kit (Ambion) was used to generate and purify dsRNA nated the basic concepts of insect physiology still used today. We from PCR-amplified genes following the manufacturer’s instructions. The hope the findings described here will allow the development of dsRNA in sterile water was introduced into the thorax of adult females by novel strategies to control this important Chagas disease vector. injection. The microbiota of R. prolixus digestive tract was analyzed by counting colony forming units after plating homogenates of the different sections of Materials and Methods the midgut. All candidate colonies were genotyped to assure R. prolixus identity before extracting genetic material from ovaries or testis. Whole-genome shotgun ACKNOWLEDGMENTS. We thank Charles B. Beard for the Rhodnius pro- × sequencing using Sanger and 454 technologies produced 8 genome cov- lixus picture. Genome sequencing was funded by National Institutes of erage. After end-sequencing a BAC library, we assembled all data by Health Grant NHGRI-HG003079; VectorBase was supported by National implementing the default parameters in CABOG (49). Institutes of Health National Institute of Allergy and Infectious Diseases We predicted gene sequences using ab initio and similarity-based approaches Grant HHSN272200900039C; Brazilian groups were funded by Conselho that were merged to construct the final set of gene models. We performed Nacional de Desenvolvimento Científico e Tecnológico, Fundação de Amparo automatic annotation and paralog clustering using AnoXcel (50). We performed à Pesquisa do Estado do Rio de Janeiro (FAPERJ), Fundaçao de Amparo à Pesquisa do Estado de Sao Paulo, and Fundação de Amparo à Pesquisa do orthology-based clustering using OrthoMCL (6) and gene family counts based Estado de Minas Gerais; R.R.-P. was supported by Consejo Nacional de Inves- on conserved protein domain annotations from InterProScan (51) for 16 species. tigaciones Científicas y Técnicas de Argentina, Agencia Nacional de Promoción We used three different methods for identifying TEs. First, we identified de Ciencia y Tecnología, and Fundación Bunge y Born; R.D.F.-M. was funded by class II and nonlong-terminal repeats (non-LTRs) with RPS-blast (52). Sec- Coordenação de Aperfeiçoamento de Pessoal de Nível Superior/FAPERJ; and ond, we identified LTRs using homology-based approaches (53, 54). The R.M.W. was supported by Marie Curie PIOF-GA-2011-303312.

aDepartamento de Bioquímica, Instituto de Química, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941-909, Brazil; bInstituto Nacional de Ciência e Tecnologia em Entomologia Molecular, Rio de Janeiro 21941-591, Brazil; cPrograma de Biologia Molecular e Biotecnologia, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941-591, Brazil; dBiological Sciences, Simon Fraser University, Burnaby, BC, Canada V5A 1S6; eCentro Regional de Estudios Genomicos, Universidad Nacional de La Plata, La Plata 1900, Argentina; fCentro de Bioinvestigaciones, Universidad Nacional del Noroeste de Buenos Aires, 2700 Pergamino, Argentina; gInstituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro 21040-900, Brazil; hMcDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108; iDepartamento de Genética, Instituto de Biologia, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941-590, Brazil; jSección Genética Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay; kEuropean Molecular Biology Laboratory, European Bioinformatics Institute, Welcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom; lSection of Vector Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, MD 20852; mDepartment of Genetic Medicine and Development, University of Geneva Medical School, Geneva 1211, Switzerland; nSwiss Institute of Bioinformatics, Geneva 1211, Switzerland; oComputer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139; pThe Broad Institute of MIT and Harvard, Cambridge, MA 02142; qGrupo de Pesquisa em Ecologia de Doenças Transmissíveis na Amazônia, Instituto Leônidas e Maria Deane, Fundação Oswaldo Cruz, Amazonas 69057-070, Brazil; rLaboratório de Bioinformática e Análises Moleculares, Instituto de Genética e Bioquímica, Faculdade de Computação, Universidade Federal de Uberlândia, Minas Gerais 38700-002, Brazil; sInstituto de Ciências Biomédicas, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941-591, Brazil; tDepartamento de Parasitologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Minas Gerais 31270-901, Brazil; uNational Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Rockville, MD 20894; vInstituto de Investigaciones Bioquímicas de La Plata, Facultad de Ciencias Médicas, Universidad Nacional de La Plata-CONICET, La Plata 1900, Argentina; wDepartamento de Biologia, Universidade Estadual Paulista, São José do Rio Preto, São Paulo 15054-000, Brazil; xLaboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Rio de Janeiro 28013-602, Brazil; yDepartment of Molecular Microbiology and Immunology, Bloomberg School of Public Health, The Johns Hopkins University, Baltimore, MD 21205-2179; zDepartment of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556; aaDepartamento de Biologia , Instituto de Ciências Biológicas e da Saúde, Universidade Federal Rural do Rio de Janeiro, Rio de Janeiro 23897-000, Brazil; bbEscola Nacional de Saúde Pública, Fundação Oswaldo Cruz, Rio de Janeiro 21040-360, Brazil; ccNúcleo de Pesquisas Ecológicas de Macaé, Universidade Federal do Rio de Janeiro, Rio de Janeiro 27910-970, Brazil; ddDepartment of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556; eeCentro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Minas Gerais 30190-002, Brazil; ffCentre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona 08003, Catalonia, Spain; ggUniversitat Pompeu Fabra, Barcelona 08003, Catalonia, Spain; hhLaboratoire d’Evolution, Génome et Spéciation, Centre National de la Recherche Scientifique UPR9034/UR 072, Institut de Recherche pour le Développement, 91198 Gif sur Yvette, France; iiUniversité Paris-Sud, 91400 Orsay, France; jjFrench National Institute For Agricultural Research, Institute of Ecology and Environmental Sciences of Paris, 78000 Versailles, France; kkDivision of

14940 | www.pnas.org/cgi/doi/10.1073/pnas.1506226112 Mesquita et al. Parasitic Diseases and Malaria, Entomology Branch, Centers for Disease Control and Prevention, Atlanta, GA 30329; llInstitut de Recherche sur la Biologie de l’Insecte, UMR7261, Centre National de la Recherche Scientifique, Université François Rabelais, 37200 Tours, France; mmDepartamento de Biotecnologia Farmacêutica, Faculdade de Farmácia, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941-902, Brazil; nnDepartment of Physiology, Biophysics and Neuroscience, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Zacatenco, Mexico City 03760, Mexico; ooDepartamento de Fisiologia e Biofísica, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Minas Gerais 31270-901, Brazil; ppDepartment of Biological Sciences, Florida International University, Miami, FL 11200; qqNúcleo de Doenças Infecciosas, Universidade Federal do Espirito Santo, Espirito Santo 29043-900, Brazil; rrDepartment of Entomology, University of Illinois at Urbana–Champaign, Urbana, IL 61801; ssInstituto Federal de Educação Ciência e Tecnologia do Rio de Janeiro, Rio de Janeiro 20270-021, Brazil; ttDepartment of Physiology, School of Medicine–Center for Resesarch in Molecular Medicine and Chronic Diseases, Instituto de Investigaciones Sanitarias, University of Santiago de Compostela, 15782 Santiago de Compostela, Spain; uuDepartment of Biochemistry, Virginia Polytechnic Institute, Blacksburg, VA 24061; vvDepartment of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, United Kingdom; and wwDepartment of Biological Sciences, University of Manitoba, Winnipeg, MB, Canada R3T 2N2

1. Wigglesworth VB (1931) The physiology of excretion in a blood-sucking insect, Rhodnius 33. Wigglesworth VB (1954) Growth and regeneration in the tracheal system of an insect, prolixus (, ) I. Composition of the urine. J Exp Biol 8(4):411–427. Rhodnius prolixus (Hemiptera). Q J Microsc Sci 95:115–137. 2. WHO (2015) Chagas Disease Fact Sheet No 340 (WHO, Geneva). 34. Lavore A, et al. (2015) Comparative analysis of zygotic developmental genes in 3. Panzera F, et al. (2010) Cytogenetics and genome evolution in the subfamily Tri- Rhodnius prolixus genome shows conserved features on the tracheal developmental atominae (Hemiptera, Reduviidae). Cytogenet Genome Res 128(1-3):77–87. pathway. Insect Biochem Mol Biol 64:32–43. 4. DeSalle R, Gregory TR, Johnston JS (2005) Preparation of samples for comparative 35. Wigglesworth VB (1934) The physiology of ecdysis in Rhodnius prolixus (Hemiptera). studies of chromosomes: Visualization, in situ hybridization, and genome II. Factors controlling molting and metamorphosis. Q J Microsc Sci 2(77):191–222. size estimation. Methods Enzymol 395:460–488. 36. Noriega FG (2014) Juvenile hormone biosynthesis in insects: What is new, what do we know, 5. Anonymous; International Aphid Genomics Consortium (2010) Genome sequence of and what questions remain? International Scholarly Research Notices, 10.1155/2014/967361. the pea aphid Acyrthosiphon pisum. PLoS Biol 8(2):e1000313. 37. Vidal NM, Grazziotin AL, Iyer LM, Aravind L, Venancio TM (2015) Transcription fac- 6. Fischer S, et al. (2011) Using OrthoMCL to assign proteins to OrthoMCL-DB groups or tors, chromatin proteins and the diversification of Hemiptera. Insect Biochem Mol to cluster proteomes into new ortholog groups. Curr Protoc Biokinformatics Chapter Biol, 10.1016/j.ibmb.2015.07.001. 6, Unit 6.12:11–19. 38. Carey AF, Carlson JR (2011) Insect olfaction from model systems to disease control. 7. Neafsey DE, et al. (2015) Mosquito genomics. Highly evolvable malaria vectors: The Proc Natl Acad Sci USA 108(32):12987–12995. genomes of 16 Anopheles mosquitoes. Science 347(6217):1258522. 39. Botto-Mahan C, Cattan PE, Canals M (2002) Field tests of carbon dioxide and conspecifics 8. Fernandez-Medina RD, Granzotto A, Ribeiro JM, Carareto CM (2015) Transposition as baits for Mepraia spinolai, wild vector of Chagas disease. Acta Trop 82(3):377–380. burst of mariner-like elements in the sequenced genome of Rhodnius prolixus. Insect 40. Robertson HM, Kent LB (2009) Evolution of the gene lineage encoding the carbon Biochem Mol Biol, 10.1016/j.ibmb.2015.09.003. dioxide receptor in insects. J Insect Sci 9:19. 9. Brelsfoard C, et al. (2014) Presence of extensive Wolbachia symbiont insertions discovered 41. Ribeiro JMC, Francischetti IMB (2003) Role of arthropod saliva in blood feeding: in the genome of its host Glossina morsitans morsitans. PLoS Negl Trop Dis 8(4):e2728. Sialome and post-sialome perspectives. Annu Rev Entomol 48:73–88. 10. Ribeiro JMC, et al. (2014) An insight into the transcriptome of the digestive tract of 42. Paluzzi JP, Yeung C, O’Donnell MJ (2013) Investigations of the signaling cascade in- GENETICS the bloodsucking bug, Rhodnius prolixus. PLoS Negl Trop Dis 8(1):e2594. volved in diuretic hormone stimulation of Malpighian tubule fluid secretion in 11. Fire A, et al. (1998) Potent and specific genetic interference by double-stranded RNA Rhodnius prolixus. J Insect Physiol 59(12):1179–1185. – in Caenorhabditis elegans. Nature 391(6669):806 811. 43. Dansa-Petretski M, Ribeiro JMC, Atella GC, Masuda H, Oliveira PL (1995) Antioxidant 12. Bartel DP (2004) MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell role of Rhodnius prolixus heme-binding protein. Protection against heme-induced 116(2):281–297. lipid peroxidation. J Biol Chem 270(18):10893–10896. 13. Lim LP, et al. (2003) The microRNAs of Caenorhabditis elegans. Genes Dev 17(8): 44. Oliveira PL, et al. (1995) A heme-binding protein from hemolymph and oocytes of the 991–1008. blood-sucking insect, Rhodnius prolixus. Isolation and characterization. J Biol Chem 14. An L, Liu Y, Wu A, Guan Y (2013) microRNA-124 inhibits migration and invasion by 270(18):10897–10901. down-regulating ROCK1 in glioma. PLoS One 8(7):e69478. 45. Guerenstein PG, Lazzari CR (2009) Host-seeking: How triatomines acquire and make 15. Hoss AG, et al. (2014) MicroRNAs located in the Hox gene clusters are implicated in use of information to find blood. Acta Trop 110(2-3):148–158. Huntington’s disease pathogenesis. PLoS Genet 10(2):e1004188. 46. Ons S, Richter F, Urlaub H, Pomar RR (2009) The neuropeptidome of Rhodnius prolixus 16. Gerardo NM, et al. (2010) Immunity and other defenses in pea aphids, Acyrthosiphon brain. Proteomics 9(3):788–792. pisum. Genome Biol 11(2):R21. 47. Guedes RLM, et al. (2011) Amino acids biosynthesis and nitrogen assimilation pathways: 17. Ursic-Bedoya R, Buchhop J, Lowenberger C (2009) Cloning and characterization of Dorsal A great genomic deletion during eukaryotes evolution. BMC Genomics 12(Suppl 4):S2. homologues in the hemipteran Rhodnius prolixus. Insect Mol Biol 18(5):681–689. 48. Scaraffia PY, et al. (2008) Discovery of an alternate metabolic pathway for urea 18. Eichler S, Schaub GA (2002) Development of symbionts in triatomine bugs and the synthesis in adult Aedes aegypti mosquitoes. Proc Natl Acad Sci USA 105(2):518–523. effects of infections with trypanosomatids. Exp Parasitol 100(1):17–27. 49. Miller JR, et al. (2008) Aggressive assembly of pyrosequencing reads with mates. 19. Castro DP, et al. (2012) Trypanosoma cruzi immune response modulation decreases Bioinformatics 24(24):2818–2824. microbiota in Rhodnius prolixus gut and is crucial for parasite survival and develop- 50. Ribeiro JM, Topalis P, Louis C (2004) AnoXcel: An Anopheles gambiae protein data- ment. PLoS One 7(5):e36591. base. Insect Mol Biol 13(5):449–457. 20. Dias FA, et al. (2015) Identification of a selenium-dependent glutathione peroxidase 51. Jones P, et al. (2014) InterProScan 5: Genome-scale protein function classification. in the blood-sucking insect Rhodnius prolixus. Insect Biochem Mol Biol, 10.1016/ – j.ibmb.2015.08.007. Bioinformatics 30(9):1236 1240. 21. Schama R, et al. (2015) Rhodnius prolixus supergene families of enzymes potentially 52. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein – associated with insecticide resistance. Insect Biochem Mol Biol, 10.1016/j.ibmb.2015.06.005. database search programs. Nucleic Acids Res 25(17):3389 3402. 22. Capriotti N, Mougabure-Cueto G, Rivera-Pomar R, Ons S (2014) L925I mutation in the 53. Tubío JMC, Naveira H, Costas J (2005) Structural and evolutionary analyses of the Ty3/ Para-type sodium channel is associated with pyrethroid resistance in Triatoma in- gypsy group of LTR retrotransposons in the genome of Anopheles gambiae. Mol Biol – festans from the Gran Chaco region. PLoS Negl Trop Dis 8(1):e2659. Evol 22(1):29 39. 23. Germano MD, et al. (2010) New findings of insecticide resistance in Triatoma infestans 54. Tubío JMC, et al. (2011) Evolutionary dynamics of the Ty3/gypsy LTR retrotransposons (Heteroptera: Reduviidae) from the Gran Chaco. J Med Entomol 47(6):1077–1081. in the genome of Anopheles gambiae. PLoS One 6(1):e16328. 24. Shiu S-H, Li W-H (2004) Origins, lineage-specific expansions, and multiple losses of 55. Tu Z (2001) Eight novel families of miniature inverted repeat transposable elements in the – tyrosine kinases in eukaryotes. Mol Biol Evol 21(5):828–840. African malaria mosquito, Anopheles gambiae. Proc Natl Acad Sci USA 98(4):1699 1704. 25. Blume-Jensen P, Hunter T (2001) Oncogenic kinase signalling. Nature 411(6835):355–365. 56. Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in – 26. Lynch JA, El-Sherif E, Brown SJ (2012) Comparisons of the embryonic development of large genomes. Bioinformatics 21(Suppl 1):i351 i358. — – Drosophila, Nasonia, and Tribolium. Wiley Interdiscip Rev Dev Biol 1(1):16–39. 57. Kent WJ (2002) BLAT The BLAST-like alignment tool. Genome Res 12(4):656 664. 27. Ewen-Campen B, Srouji JR, Schwager EE, Extavour CG (2012) Oskar predates the 58. Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open evolution of germ plasm in insects. Curr Biol 22(23):2278–2283. Software Suite. Trends Genet 16(6):276–277. 28. Lavore A, Esponda-Behrens N, Pagola L, Rivera-Pomar R (2014) The gap gene Krüppel 59. John B, et al. (2004) Human microRNA targets. PLoS Biol 2(11):e363. of Rhodnius prolixus is required for segmentation and for repression of the homeotic 60. Mariotti M, Guigó R (2010) Selenoprofiles: Profile-based scanning of eukaryotic ge- gene sex comb-reduced. Dev Biol 387(1):121–129. nome sequences for selenoprotein genes. Bioinformatics 26(21):2656–2663. 29. Lavore A, Pagola L, Esponda-Behrens N, Rivera-Pomar R (2012) The gap gene giant of 61. Mariotti M, Lobanov AV, Guigo R, Gladyshev VN (2013) SECISearch3 and Seblastian: New Rhodnius prolixus is maternally expressed and required for proper head and abdo- tools for prediction of SECIS elements and selenoproteins. Nucleic Acids Res 41(15):e149. men formation. Dev Biol 361(1):147–155. 62. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer 30. Berni M, et al. (2014) Toll signals regulate dorsal-ventral patterning and anterior-posterior RNA genes in genomic sequence. Nucleic Acids Res 25(5):955–964. placement of the embryo in the hemipteran Rhodnius prolixus. Evodevo 5(1):38. 63. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: A novel method for fast and 31. Özüak O, Buchta T, Roth S, Lynch JA (2014) Dorsoventral polarity of the Nasonia accurate multiple sequence alignment. J Mol Biol 302(1):205–217. embryo primarily relies on a BMP gradient formed without input from Toll. Curr Biol 64. Huerta-Cepas J, et al. (2011) PhylomeDB v3.0: An expanding repository of genome- 24(20):2393–2398. wide collections of trees, alignments and phylogeny-based orthology and paralogy 32. Wilson MJ, Kenny NJ, Dearden PK (2014) Components of the dorsal-ventral pathway predictions. Nucleic Acids Res 39(Database issue, D1):D556–D560. also contribute to anterior-posterior patterning in honeybee embryos (Apis mellifera). 65. Ulitsky I, et al. (2010) Expander: From expression microarrays to networks and Evodevo 5(1):11. functions. Nat Protoc 5(2):303–322.

Mesquita et al. PNAS | December 1, 2015 | vol. 112 | no. 48 | 14941