1 Whole genome analysis of a schistosomiasis-transmitting freshwater snail

2 Adema, Coen M.; Hillier LaDeana W.; Jones, Catherine S.; Loker, Eric S.; Knight, Matty; Minx,

3 Patrick; Oliveira, Guilherme; Raghavan, Nithya; Shedlock, Andrew; Amaral, Laurence Rodrigues

4 do; Arican-Goktas, Halime D.; Assis, Juliana; Baba, Elio Hideo; Baron, Olga Lucia; Bayne,

5 Christopher J.; Bickham-Wright, Utibe; Biggar, Kyle K.; Blouin, Michael; Bonning, Bryony C.;

6 Botka, Chris; Bridger, Joanna M.; Buckley, Katherine M.; Buddenborg, Sarah K.; Caldeira, Roberta

7 Lima; Carleton, Julia; Carvalho, Omar S.; Castillo, Maria G.; Chalmers, Iain W.; Christensens,

8 Mikkel; Clifton, Sandy; Cosseau, Celine; Coustau, Christine; Cripps, Richard M.; Cuesta-Astroz,

9 Yesid; Cummins, Scott; di Stephano, Leon; Dinguirard, Nathalie; Duval, David; Emrich, Scott;

10 Feschotte, Cédric; Feyereisen, Rene; FitzGerald, Peter; Fronick, Catrina; Fulton, Lucinda; Galinier,

11 Richard; Geusz, Michael; Geyer, Kathrin K.; Giraldo-Calderon, Gloria; Gomes, Matheus de Souza;

12 Gordy, Michelle A.; Gourbal, Benjamin; Grossi, Sandra; Grunau, Christoph; Hanington, Patrick C.;

13 Hoffmann, Karl F.; Hughes, Daniel; Humphries, Judith; Izinara, Rosse; Jackson, Daniel J.; Jannotti-

14 Passos, Liana K.; Jeremias, Wander de Jesus; Jobling, Susan; Kamel, Bishoy; Kapusta, Aurélie;

15 Kaur, Satwant; Koene, Joris M.; Kohn, Andrea B; Lawson, Dan; Lawton, Scott P.; Liang, Di;

16 Limpanont, Yanin; Lockyer, Anne E.; Lovato, TyAnna L.; Liu, Sijun; Magrini, Vince; McManus,

17 Donald P.; Medina, Monica; Misra, Milind; Mitta, Guillaume; Mkoji, Gerald M.; Montague,

18 Michael J.; Montelongo, Cesar; Moroz, Leonid L; Munoz-Torres, Monica C.; Niazi, Umar; Noble,

19 Leslie R.; Pais, Fabiano; Papenfuss, Anthony T.; Peace, Rob; Pena, Janeth J: Pila, Emmanuel A.;

20 Quelais, T; Raney, Brian J.; Rast, Jonathan P.; Ribeiro, Fernanda; Rollinson, David; Rotgans,

21 Bronwyn; Routledge, Edwin J.; Ryan, Kathryn M.; Scholte, Larissa; Silva, Francislon; Storey,

22 Kenneth B; Swain, Martin; Tennessen, Jacob A.; Tomlinson, Chad; Trujillo, Damian L.; Volpi,

23 Emanuela V.; Walker, Anthony J.; Wang, Tianfang; Wannaporn, Ittiprasert; Warren, Wesley C.; 24 Wu, Xiao-Jun; Yoshino, Timothy P.; Yusuf, Mohammed; Zhang, Si-Ming; Zhao, Min; Wilson,

25 Richard K.

26 27 Summary

28 Biomphalaria snails are instrumental in transmission of the human blood fluke .

29 With the World Health Organization's goal to eliminate schistosomiasis as a global health problem

30 by 2025, there is now renewed emphasis on snail control. Our characterization of the genome of

31 , a lophotrochozoan protostome, provides timely and important information

32 on snail biology. We describe phero-perception, stress resistance, immune function and multi-level

33 regulation of gene expression that support the persistence of B. glabrata in aquatic habitats and

34 define this species as a suitable snail host for S. mansoni. We identify several potential targets for

35 developing novel control measures aimed at reducing snail-mediated transmission of

36 schistosomiasis.

37 38 The fresh water snail Biomphalaria glabrata (Lophotrochozoa, Mollusca) is of medical relevance as

39 this Neotropical gastropod contributes as obligate intermediate host of Schistosoma mansoni

40 (Lophotrochozoa, Platyhelminthes) to transmission of the neglected tropical disease (NTD) human

41 intestinal schistosomiasis1. An S. mansoni miracidium initiates a chronic parasite-snail infection that

42 alters B. glabrata immunity and metabolism, causing parasitic castration such that the snail does not

43 reproduce but instead supports generation of cercariae, the human-infective stage of S. mansoni.

44 Patently infected snails continuously release free-swimming cercariae that penetrate the skin of

45 humans that they encounter in their aquatic environment. Inside the human definitive host, S.

46 mansoni matures to adult worms that reproduce sexually in the venous system surrounding the

47 intestines, releasing eggs, many of which pass through the intestinal wall and are deposited in water

48 with the feces. Miracidia hatch from the eggs and complete the life cycle by infecting another B.

49 glabrata. Related Biomphalaria species transmit S. mansoni in Africa. Schistosomiasis is chronically

50 debilitating; estimates of disease burden indicate that disability-adjusted life years (DALYs) lost due

51 to morbidity rank schistosomiasis second only to malaria among parasitic diseases in impact on

52 global human health2.

53 In the absence of a vaccine, current control measures emphasize mass drug administration (MDA) of

54 praziquantel (PZQ), the only drug available for large-scale treatment of schistosomiasis3.

55 Schistosomes, however, may develop resistance and reduce the effectiveness of PZQ4. Importantly,

56 PZQ treatment does not protect humans against rapid re-infection following exposure to water-borne

57 cercariae released from infected snails. Snail-mediated parasite transmission must be interrupted to

58 achieve long-term sustainable control of human schistosomiasis5. The World Health Organization

59 has set a strategy for the global elimination of schistosomiasis as a public health threat by the year

60 2025 that recognizes both MDA approaches and targeting of the snail intermediate host as crucial 61 toward achieving this goal6. This significant undertaking provides added impetus for detailed study

62 of B. glabrata.

63 Genome analyses of B. glabrata also provide evolutionary insights by increasing the relatively few

64 taxa of the Lophotrochozoa that have been characterized. To date a majority of lophotrochozoan

65 genomes have been derived from parasitic, sessile, or marine (i.e. platyhelminths, leech,

66 bivalve, cephalopod, polychaete)7-11. Thus, B. glabrata likely possesses novel genomic features that

67 characterize a free-living lophotrochozoan from a freshwater environment. Here we characterize the

68 B. glabrata genome and describe biological properties that enable the snail’s persistence in aquatic

69 environments and define this snail as a suitable host for S. mansoni, including aspects of immunity

70 and gene regulation. These efforts, we anticipate, will foster developments to interrupt snail-

71 mediated parasite transmission in support of schistosomiasis elimination.

72 The B. glabrata genome comprises eighteen chromosomes (Supplementary Text 1 and

73 Supplementary Fig. 1-3) and has an estimated size of 916Mb12. We assembled the genome of BB02

74 strain B. glabrata13 from short reads, Sanger sequences, and 454 sequence data (~27.5X coverage)

75 and used a linkage map to assign genomic contigs to linkage groups (Supplementary Text 2 and

76 Supplementary Table 1). We captured transcriptomes from twelve different tissues (Illumina

77 RNAseq) and mapped these reads to the assembly to aid annotation (Supplementary Text 3,

78 Supplementary Figs. 4-8 and Supplementary Tables 2-7).

79 Olfaction in an aquatic environment

80 Aquatic molluscs employ proteins for communication; e.g. Aplysia attracts conspecifics using water-

81 soluble peptide pheromones14. We collected B. glabrata proteins from snail conditioned water

82 (SCW) and following electrostimulation (ES), which induces rapid release of proteins. NanoHPLC-

83 MS/MS identified 177 secreted proteins shared among 533 proteins from SCW and 232 proteins 84 from ES (Supplementary Table 8). Detection of an ortholog of temptin, which plays a key role in

85 pheromone attraction of Aplysia15, suggests an operational pheromone sensory system in B.

86 glabrata. To gain insight into the mechanisms for pheromone perception, we analyzed the B.

87 glabrata genome for olfactory G-protein coupled receptor (GPCR)-like supergene families and

88 identified 242 seven transmembrane domain GPCR-like genes belonging to fourteen subfamilies that

89 are clustered in the genome (Supplementary Text 4, Supplementary Figs. 9,10 and Supplementary

90 Table 9). RT-PCR and in situ hybridization confirmed expression of a GPCR-like gene within the

91 mollusc’s tentacles, known to be involved in chemosensation (Figure 1). This communication

92 system allows B. glabrata to detect conspecifics in aquatic environments but may have a tradeoff

93 effect by potentially exposing the snail as a suitable target for pathogens. 94

95 Fig 1. Candidate chemosensory receptors of B. glabrata. (A) Scaffold 39 contains fourteen 96 candidate chemosensory receptor genes (BgCRa-n). Most encode 7-transmembrane domain proteins, 97 BgCRm and BgCRn are truncated to six-transmembrane domains. Genes occur in both directions. 98 (B) Phylogenetic analysis of scaffold 39 chemosensory receptors. (C) Schematic of receptor showing 99 conserved and invariable amino acids, transmembrane domains I-VII; and location of glycosylation 100 sites. (D) Scanning electron micrograph showing anterior tentacle, with cilia covering the surface. 101 (B) RT-PCR gel showing amplicon for BgCR509a and actin from B. glabrata tentacle. (F, G) In situ 102 hybridization showing sense (negative control) and antisense localization of BgCR509a in anterior 103 tentacle section (purple). 104 105 Stress responses and immunity

106 Biomphalaria glabrata endures environmental insults, including biotic and abiotic stress from heat,

107 cold, xenobiotics, pollutants, injury and infection. Additional to previous reports of Capsaspora16 as

108 single-cell endosymbiont, we noted from the sequenced material an unclassified

109 mycoplasma or related mollicute bacteria and viruses (Supplementary Text 5,6, Supplementary

110 Figs. 11,12 and Supplementary Table 10). For anti-stress responses, B. glabrata genome has five

111 families of heat shock proteins (HSP), including Hsp20, Hsp40, Hsp60, and Hsp 90. The Hsp70 gene

112 family is the largest with six multi-exon genes, five single exon genes, and over ten pseudogenes

113 (Supplementary Text 7, Supplementary Figs. 13-16, Supplementary Table 11). These genes are

114 retained in B. glabrata embryonic (Bge) cells, the only available molluscan cell line17, such that anti-

115 stress function in B. glabrata can be investigated in vitro using cell cultures (Supplementary Text 8,

116 Supplementary Figs. 17-21, and Supplementary Table 12,13).

117 The B. glabrata genome contains about 99 genes encoding heme-thiolate enzymes (CYP

118 superfamily) for detoxifying xenobiotics. All major cytochrome P450 clans are represented,

119 including ~33 CYP2 genes, ~35 CYP3 genes, and ~7 CYP4 genes. The presence of 18 genes of the

120 mitochondrial clan suggests that molluscs, like arthropods, but unlike vertebrates, utilize

121 mitochondrial P450s for detoxification18. Tissue-specific expression (e.g. four uniquely in ovotestis)

122 suggests that fifteen P450 genes serve specific biological processes. Cataloguing the CYPome sets

123 the groundwork for rational design of selective molluscicides, e.g. by inhibiting unique P450s or

124 using B. glabrata-specific P450s for bioactivation of pro-molluscicides (Supplementary Text 9).

125 Biomphalaria glabrata possesses a diversity of pattern recognition receptors (PRRs) for immune

126 recognition of pathogens19. This includes 56 Toll-like receptor (TLR) genes, of which 27 encode

127 complete TLRs (Fig 2, Supplementary Text 10 and Supplementary Table 14), associated with a TLR 128 signaling network for transcriptional regulation through NF-B transcription factors (Supplementary

129 Text 11, Supplementary Fig. 22 and Supplementary Table 16). Like other lophotrochozoans, B.

130 glabrata shows a moderate expansion of TLR genes relative to mammals and insects which have

131 ~10 TLRs20. Other PRRs include eight peptidoglycan recognition-binding proteins (PGRPs), a single

132 Gram-negative binding protein (GNBP) and seventeen genes encoding for single scavenger receptor

133 cysteine-rich (SRCR) domains (Supplementary Table 15). A prominent category of B. glabrata

134 PRRs consists of fibrinogen-related proteins (FREPs), plasma lectins that are somatically mutated to

135 yield uniquely diversified FREP repertoires in individual snails21. The B. glabrata germline includes

136 four FREP genes comprising one immunoglobulin (IgSF) domain and one C-terminal fibrinogen

137 (FBG)-like domain, including one gene with an N-terminal PAN_AP domain, and twenty FREP

138 genes with two upstream IgSF domains preceding an FBG domain. FREP genes cluster in the

139 genome, often accompanied by partial FREP-like sequences (Supplementary Text 12 and

140 Supplementary Figs. 23-26). A proteomics level study of anti-parasite immunity showed that some

141 FREPs exhibiting binding reactivity to S. mansoni derive from different gene families between

142 susceptible and resistant B. glabrata strains. Moreover, a FREP-like lectin that contains a galectin

143 instead of a C-terminal fibrinogen domain (designated galectin-related protein; GREP22) is uniquely

144 associated with the resistant snail phenotype (Supplementary Text 8).

145 We identified several cytokines, including twelve homologs of IL-17, four MIF homologs, and

146 eleven TNF molecules (Supplementary Text 10, Supplementary Table 15). Orthologs of complement

147 factors indicate that B. glabrata can mount complement-like responses to pathogens (Supplementary

148 Text 13, Supplementary Figs. 27,28 and Supplementary Tables 17-28). We discovered an 149 150 Fig 2. TLR genes in B. glabrata. (A) Analysis of the (only complete) TIR domains from BgTLRs 151 identified seven classes (Neighbor-joining tree shown). Bootstrap values shown for 1,000 replicates. 152 Comparisons included TLRs from A. californica (Ac), L. gigantea (Lg), Mytilus galloprovincialis 153 (Mg), and C. gigas (Cg). The presence or absence of orthologs of each class in each molluscan 154 species is indicated. A representative of the B. glabrata class 1/2/3 clade is present within A. 155 californica, but is independent of the B. glabrata TLR classes (indicated by the large pink box). 156 Grey font indicates pseudogenes or partial genes. (B) B. glabrata has both single cysteine cluster 157 (scc; blue line)- and multiple cysteine cluster (mcc; orange line) TLRs. Domain structures are 158 shown for each of BgTLR class. BgTLRs consist of an LRRNT (orange hexagon), a series of LRRs 159 (ovals), a variable region (curvy line), LRRCT (yellow box), and transmembrane domain, and an 160 intracellular TIR domain (red hexagon). The dark blue ovals indicate well defined LRRs (predicted 161 by LRRfinder23); light blue ovals are less confident predictions. Each of the two class 7 BgTLRs has 162 a distinct ectodomain structure. The numbers of complete, pseudogenes () and partial and genes 163 are indicated for each class. 164 165 extensive toolkit for apoptosis, a response that can regulate invertebrate immune defense24. This

166 includes at least 50 genes encoding for Baculovirus IAP Repeat (BIR) domain-containing caspase

167 inhibitors, localized among 26 genomic scaffolds. The expansion of this gene family in molluscs (17

168 genes in Lottia gigantean, 48 in Crassostrea gigas), relative to other animal clades, suggests

169 important regulatory roles in apoptosis and innate immune responses25 (Supplementary text 14,

170 Supplementary Figs. 29-31 and Supplementary Table 29). Finally, B. glabrata possesses a gene

171 complement to metabolize reactive oxygen species (ROS) and nitric oxide (NO), that are generated

172 by hemocytes to exert cell-mediated cytotoxicity toward pathogens, including schistosomes

173 (Supplementary Text 15, Supplementary Fig. 32 and Supplementary Table 30).

174 Searches for antimicrobial peptide (AMP) sequences of B. glabrata indicated only a single macin-

175 type gene family, comprising six biomphamacin genes. While the AMP arsenal is reduced compared

176 to other invertebrate species (e.g., bivalve molluscs have multiple AMP gene families26), B. glabrata

177 possesses several multigenic families of antibacterial proteins including two achacins, five

178 LBP/BPIs, and 21 biomphalysins (Supplementary Text 16,17, Supplementary Figs. 33,34 and

179 Supplementary Tables 31,32). Gaps in functional annotation limit our interpretation of immune

180 function in B. glabrata (Supplementary Text 18 and Supplementary Table 33,34). Computational

181 analysis indicated that 15% of the predicted proteome consists of (583) secreted proteins. Immune-

182 relevant tissues showed high expression of 100 secreted proteins, including FREP3, other lectins and

183 novel proteins that potentially represent candidate immune factors for future study. (Supplementary

184 text 3, Supplementary Fig. 7, Supplementary Tables 5-6).

185 Regulation of biological processes

186 Epigenetic regulation allows dynamic use of genes contained in the B. glabrata genome27-29. The

187 chromatin-modifying enzymes include class I and II histone methyltransferases, LSD-class and 188 Jumonji-class histone demethylases, class I – IV histone deacetylases, and GNAT, Myst and CBP

189 superfamilies of histone acetyltransferases. We identified homologs of DNA (cytosine-5)-

190 methyltransferases 1 and 2 (no homolog of DNMT3), as well as putative methyl-CpG binding

191 domain proteins 2/3. In silico analyses predicted a mosaic type of DNA methylation, as is typical for

192 invertebrates (Supplementary Text 19, Supplementary Figs. 35-39, Supplementary Table 35). The

193 potential role of DNA methylation in B. glabrata reproduction and S. mansoni interactions is

194 reported elsewhere (K.G.G., U.H.N., D.D., Ce.C., C.T., I.W.C., M.T.S., U.B.-W., Sabrina E.

195 Munshi, C.G., T.P.Y. and K.F.H., in preparation).

196 The B. glabrata genome also provides the protein machinery for biogenesis of microRNA (miRNAs)

197 to regulate gene expression. Moreover, two computational methods independently predicted the

198 same 95 pre-miRNAs, encoding 102 mature miRNAs. Of these, 36 predicted miRNAs were

199 observed within our transcriptome data, while 53 displayed ≥ 90% nucleotide identity (with 25 at

200 100%) to L. gigantea miRNAs. One bioinformatics pipeline predicted 107 additional pre-miRNAs

201 unique to B. glabrata. Several B. glabrata miRNAs are predicted to regulate transcripts for

202 processes unique to snail biology, including secretory mucosal proteins and shell formation that may

203 present possible targets for control of B. glabrata (Supplementary Text 20,21, Supplementary Figs.

204 40-67, Supplementary Tables 36-45).

205 Aspects of B. glabrata biology are periodic30, likely due to control of circadian timing mechanisms.

206 We identified seven candidate clock genes in silico, including a gene with strong similarity to the

207 period gene of A. californica. Modification of expression of these putative clock genes may interrupt

208 circadian rhythms and affect feeding and egg-laying of B. glabrata (Supplementary Text 22).

209 Neuropeptides expressed within the nervous system coordinate the complex physiology of B.

210 glabrata, including regulation of male and female functions in this simultaneous hermaphrodite 211 snail. In silico searches identified 43 neuropeptide precursors in B. glabrata that were predicted to

212 yield over 250 mature signaling products. Neuropeptide transcripts occurred in multiple tissues, yet

213 some were most prominent within terminal genitalia (49%) and the CNS (56%), or even specific to

214 the CNS, including gonadotropin-releasing hormone (GnRH) and insulin-like peptides 2 and 3

215 (Supplementary Text 23, Supplementary Fig. 68, Supplementary Tables 46-48).

216 A role of steroid hormones in reproduction of hermaphrodite snails with male and female

217 reproductive organs remains speculative. Biomphalaria glabrata has a CYP51 gene to biosynthesize

218 sterols de novo, yet we found no orthologs of genes involved in either vertebrate steroid or arthropod

219 ecdysteroid biosynthesis. The lack of CYP11A1 suggests that B. glabrata cannot process cholesterol

220 to make vertebrate-like steroids. The absence of aromatase (CYP19), required for the formation of

221 estrogens, is particularly enigmatic as molluscs possess homologs of mammalian estrogen receptors.

222 Characterization of snail-specific aspects of steroidogenesis may identify targets to disrupt

223 reproduction toward control of snails. (Supplementary Text 24, Supplementary Fig. 69,

224 Supplementary Table 49).

225 The reproductive physiology of hermaphroditic snails is also modulated by male accessory gland

226 proteins (MAGPs), which are delivered with spermatozoa to augment fertilization success31. The B.

227 glabrata genome has sequences matching one such protein, Ovipostatin (LyAcp10), but none of the

228 other MAGPs identified in Lymnaea stagnalis32. Perhaps MAGPs evolved rapidly and are putatively

229 taxon-specific (Supplementary Text 25, Supplementary Fig. 70 and Supplementary Table 50).

230 KEGG Enrichment analyses33 identified metabolic and biochemical pathways in B. glabrata and

231 revealed distinct organ specific patterns of gene expression (Supplementary Text 3,26, Figs. 8, 71-

232 80, Supplementary Tables 3,51). Half of 5,000 annotated sequences were assigned to 200

233 biochemical pathways (Supplementary Text 3 and Supplementary Tables 2,3). Mapping biological 234 processes onto the snail anatomy helps interpreting B. glabrata’s responses to environmental insults

235 and pathogens.

236 For survival in B. glabrata, S. mansoni alters the snail host physiology, possibly by interfering with

237 the extracellular signal-regulated kinase (ERK) pathway34 and other signaling pathways. Eukaryotic

238 protein kinases (ePKs) mediate signal transduction through phosphorylation in complex networks,

239 and protein phosphatases counteract these effects towards effective signaling. Similarity searches for

240 conserved domains identified 240 B. glabrata ePKs encompassing all main types of animal ePKs

241 (Supplementary Text 3 and Supplementary Fig. 6). We also identified 60 putative protein

242 phosphatases comprising ~36 protein Tyr phosphatases (PTPs) and ~24 protein Ser/Thr

243 phosphatases (PSPs) (Supplementary Text 27, Supplementary Figs. 81-83). These sequences can be

244 studied for understanding control of homeostasis, particularly in the face of environmental and

245 pathogenic insults encountered by B. glabrata (Figure 3).

246 247

248 Fig 3. MAPK signaling pathway in B. glabrata. Blue shaded blocks correspond to proteins identified 249 in the B. glabrata predicted proteome, white blocks represent mammalian proteins without homologs 250 in the snail proteome. The +p signal represents phosphorylation and the –p signal represents protein 251 dephosphorylation. 252 253 Bilaterian evolution

254 We searched the B. glabrata genome for core cardiac-specification and -differentiation genes. A

255 previously characterized short cDNA sequence from snail heart RNA led to identification of

256 BGLB012592 as the Biomphalaria ortholog of tin/Nkx2.5 35. Similarity searches with Drosophila

257 orthologs identified most of the core cardiac regulatory factors and structural genes in the B.

258 glabrata genome (Supplementary Text 28, Supplementary Table 52), with enriched expression of

259 these genes in cardiac tissues (Figure 4). These results from a lophotrochozoan, in conjunction with

260 ecdysozoans and deuterostomes, further support the hypothesis that a primitive heart-like structure,

261 which developed through the actions of a core heart toolkit, was present in the urbilaterian ancestor.

262 Actins are conserved proteins that function in cell motility (cytoplasmic actins) and muscle

263 contraction (sarcomeric actins)36. The clustering across seven scaffolds suggests that some of the ten

264 B. glabrata actin genes arose through tandem duplication. Expression across all tissues indicates that

265 four genes encode cytoplasmic actins (Figure 4). Protein sequence comparisons placed all B.

266 glabrata actins as most closely related to mammalian cytoplasmic rather than sarcomeric actins

267 (Supplementary Text 29 and Supplementary Table 53), a pattern also observed for all six actin genes

268 of D. melanogaster 37. The actin genes of B. glabrata and other molluscs were most similar to

269 paralogs within their own genomes, rather than to other animal orthologs (Figure 4). One

270 interpretation is that actin genes diverged independently multiple times in molluscs, similar to an

271 earlier hypothesis for independent actin diversification in arthropods and chordates38. Alternatively,

272 a stronger appearance of monophyly than really exists may result if selective pressures due to

273 functional constraints keep actin sequences similar within a genome, for example if the encoded

274 proteins have overlapping functions. 275 A D 276 277 278 279 280 281 282 283 284 285 286 B 287 288 289 290 291 292 293 294 295 296 297 C 298 299 300 301 302 303 304 305 306 307 Fig 4. Expression of cardiac genes and actin genes in B. glabrata tissues determined through RNA- 308 seq analysis. (A) cardiac regulatory genes. (B) cardiac structural genes. (C) Relative expression of 309 actin genes in B. glabrata tissues. (D) Phylogenetic relationships of actin genes. 310 311 AG - Albumen gland; BUC - buccal mass; CNS - central nervous system; DG - digestive gland; FOOT – headfoot; HAPO - 312 heart/APO; KID – kidney; MAN - mantle edge; OVO – ovotestes; SAL - salivary glands; STO - stomach; TRG - terminal genitalia 313 Amphimedon – marine sponge; Haliotis – abalone; Ciona – sea squirt; Hirudo – leech; Crassostrea – oyster; Biomphalaria – pond 314 snail; Homo – human; Drosophila – fruit fly. 315 316 We analyzed the transcriptomic data for B. glabrata genes involved in biomineralization. Of 1,211

317 transcripts that were >2-fold up-regulated in the mantle relative to other tissues, 34 shared similarity

318 with molluscan sequences known to be involved in shell formation and biomineralization. Another

319 177 candidate sequences putatively involved in shell formation including eighteen genes (10.2%)

320 with similarity to sequences of shell forming secretomes of other marine and terrestrial molluscs

321 were identified from the entire mantle transcriptome (Figure 5). Like previous studies, our results

322 show both diverse and conserved aspects of molluscan shell-forming strategies. Highly conserved

323 components of the molluscan shell forming toolkit include carbonic anhydrases and tyrosinases9.

324 (Supplementary Text 30, Supplementary Fig. 84, Supplementary Tables 54). 325

326 Fig 5. Circos diagram of 177 mantle specific, secreted Biomphalaria gene products compared 327 against six other molluscan shell forming proteomes (BLASTp threshold ≤10e-6). Protein pairs that 328 share sequence similarity in the top quartile are linked in green, the second quartile is linked in blue, 329 third quartile has orange links and lowest quartile of similarity has red links. Marine species are 330 circled in blue, freshwater species in green, terrestrial species in red. Percentages (proportions in 331 brackets) indicate the number of proteins that shared similarity with a Biomphalaria shell forming 332 candidate gene. 333 334 Repetitive landscape

335 Repeat content analysis showed that 44.8% of the B. glabrata assembly consists of transposable

336 elements (TEs; Fig 6, Supplementary Text 31, Supplementary Figs. 87-87 and Supplementary Table

337 55), higher than observed from other molluscs: Pacific oyster, C. gigas (36%)9, owl limpet, L.

338 gigantea (21%)8, sea hare, A. californica (30%)39, and comparable to Octopus bimaculoides (43%)11.

339 The fraction of unclassified elements in B. glabrata was high (17.6%). Most abundant classified

340 repeats were LINEs, including Nimbus40 (27% of TEs, 12.1% of the genome), and DNA TEs (17.7%

341 of TEs, 8% of the genome), long terminal repeats (LTRs) were a small percentage of the repetitive

342 elements (6% of TEs, 1.7% of the genome). Non-mobile simple repeats comprised 2.6% of the

343 genome with an abundance of short dinucleotide satellite motifs. Divergence analyses of element

344 copy and consensus sequences indicated that DNA TEs were not recent invaders of the B. glabrata

345 genome, no intact transposases were detected. A hAT DNA transposon of B. glabrata (~1000

346 copies) has significant identity with SPACE INVADERS (SPIN) which horizontally infiltrated a

347 range of animal species, possibly through host-parasite interactions41. Overall, our results reinforce a

348 model in which diverse repeats comprise a large fraction of molluscan genomes.

349 350

351

352 Fig 6. Transposable element (TE) landscape of B. glabrata. (A) Left: proportion of DNA annotated 353 as TE (in black) or not annotated (white). Right: TE composition by class. Numbers represent the 354 percentage of the genome corresponding to each class. (B) Evolutionary view of TE landscape. For 355 each class, cumulative amounts of DNA (in Mb) are shown in function of the percentage of 356 divergence to the consensus (by bins of 1%, first one being ≥0 and <1; see Methods). Percentage of 357 divergence to the consensus is used as a proxy for age: the older the invasion of the TE is, the more 358 copies will have accumulated mutations (higher percentage of divergence, right of the graph). 359 Conversely, sequences corresponding to youngest elements show little divergence to consensus (left 360 of the graph). RC: rolling circle. 361 362 Concluding remarks

363 The genome of the Neotropical freshwater snail B. glabrata expands insights into animal biology by

364 further defining the lineage of the Lophotrochozoa relative to Ecdysozoa and Deuterostomia among

365 higher animals. An important rationale for analysis of the genome of B. glabrata pertains to its role

366 in transmission of S. mansoni in the New World. Moreover, most of the world’s cases of S. mansoni

367 infection occur in sub-Saharan Africa where other Biomphalaria species are responsible for

368 transmission, most notably Biomphalaria pfeifferi. Due to a shared common ancestor, B. glabrata

369 likely provides a good representation of the genomes of African Biomphalaria species42,43 . This

370 notion is supported by at least 90% sequence identity shared among 196 assembled transcripts

371 collected from B. pfeifferi (Illumina RNAseq) with the transcriptome of B. glabrata (Supplementary

372 Text 32 and Supplementary Tables 56-58). This report provides novel details on the biological

373 properties of B. glabrata and points to potential strategies for more effective surveillance and control

374 efforts against Biomphalaria to limit the transmission of schistosomiasis. 375 Methods

376 The genetic material used for sequencing the hermaphroditic freshwater snail B. glabrata was

377 derived from multiple snails of the BB02 strain, established at the University of New Mexico, USA

378 from a filed isolate collected from Minas Gerais, Brazil. Using a genome size estimate of 0.9-1Gb12,

379 we sequenced fragments (15X coverage), 3kb long inserts (10X), and 8kb long inserts (3X) with

380 reads generated on Roche 454 instrumentation, plus 1X coverage from plasmids and 0.1X from

381 bacterial artificial chromosome (BAC) ends13 on the ABI3730xl. Reads were assembled using

382 Newbler (v2.6)44. Additional sequencing reads were collected using Illumina instrumentation,

383 including 200bp short inserts (45X), 3kb long inserts (15X), and 8kb long inserts (10X), and

384 assembled de novo using SOAP de novo (v1.0.5)45. Finally, the Newbler assembly was merged with

385 the SOAP assembly using GAA46.

386 Redundant contigs in the merged assembly were collapsed and gaps between contigs were closed

387 through iterative rounds of Illumina mate-pair read alignment and extension using custom scripts.

388 We removed from the assembly all contaminating sequences, trimmed vectors (X), and ambiguous

389 bases (N). Shorter contigs (≤200bp) were removed prior to public release.

390 In the creation of the linkage group AGP files, we identified all scaffolds (145Mb total) that were

391 uniquely placed on a single linkage group (Supplementary Text 2 and Supplementary Table 1).

392 Because of low marker density and scaffolds could not be ordered and oriented within linkage

393 groups. The final draft assembly, (NCBI: ASM45736v1) is comprised of 331,400 scaffolds with an

394 N50 scaffold length of 48kb and an N50 contig length of 7.3kb. The assembled coverage (Newbler)

395 is 27.5X, and the assembly spans over 916Mb (with a coverage of 98%, 899Mb of sequence with

396 ~17Mb of estimated gaps). The draft genome sequence of Biomphalaria glabrata was aligned with

397 assemblies of Lottia and Aplysia (http://genome-test.cse.ucsc.edu/cgi- 398 bin/hgGateway?hgsid=389472876_vEhDXpybetKHBbwuzZFov0KE6Qhl&clade=other&org=Snail

399 &db=0) and also deposited in the DDBJ/EMBL/GenBank database (Accession Number

400 APKA00000000.1). It includes the genomes of an unclassified mollicute and viruses

401 (Supplementary 5 and 6: Accession Numbers CP013128, KT728710-12). Total RNA was extracted

402 from 12 different tissues dissected from multiple adult BB02 snails. Illumina RNAseq was used to

403 generate tissue-specific transcriptomes for albumen gland (AG); buccal mass (BUC); central nervous

404 system (CNS); digestive gland/hepatopancreas (DG/HP); muscular part of the headfoot (FOOT);

405 heart including amebocyte producing organ (HAPO); kidney (KID); mantle edge (MAN); ovotestis

406 (OVO); salivary gland (SAL); stomach (STO); terminal genitalia (TRG). The genome assembly was

407 also deposited in Vectorbase47, (https://www.vectorbase.org/organisms/biomphalaria-glabrata).

408 Computational annotation using Maker248 yielded 14,423 predicted gene models, including 96.5% of

409 the 458 sequences from the CEGMA core set of eukaryotic genes49. RNAseq data were mapped to

410 the genome assembly and Web Apollo50 was applied to aid manual annotation of genes of interest by

411 contributors. Methods and results are described in the Supplementary Information. 412 References

413 1. Paraense, W. L. The schistosome vectors in the Americas. Mem. Inst. Oswaldo Cruz 96 Suppl,

414 7-16 (2001).

415 2. King, C. H. Parasites and poverty: The case of schistosomiasis. Acta Trop. 113, 95-104 (2010).

416 3. Doenhoff, M. J. et al. Praziquantel: Its use in control of schistosomiasis in sub-Saharan Africa and

417 current research needs. Parasitology 136, 1825-1835 (2009).

418 4. Melman, S. D. et al. Reduced susceptibility to praziquantel among naturally occurring Kenyan

419 isolates of Schistosoma mansoni. PLoS Negl. Trop. Dis. 3, e504-e504 (2009).

420 5. Rollinson, D. et al. Time to set the agenda for schistosomiasis elimination. Acta Trop. 128, 423-

421 440 (2013).

422 6. World Health Organization. Accelerating work to overcome the global impact of neglected

423 tropical diseases: A roadmap for implementation, (Geneva: Department of Control of Neglected

424 Tropical Diseases. Available:

425 http://www.who.int/entity/neglected_diseases/NTD_RoadMap_2012_Fullversion.pdf, 2012).

426 7. Berriman, M. et al. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352-358

427 (2009).

428 8. Simakov, O. et al. Insights into bilaterian evolution from three spiralian genomes. Nature 493,

429 526-531 (2013).

430 9. Zhang, G. et al. The oyster genome reveals stress adaptation and complexity of shell formation.

431 Nature 490, 49-54 (2012). 432 10. Tsai, I. J. et al. The genomes of four tapeworm species reveal adaptations to parasitism. Nature

433 496, 57-63 (2013).

434 11. Albertin, C. B. et al. The octopus genome and the evolution of cephalopod neural and

435 morphological novelties. Nature 524, 220-224 (2015).

436 12. Gregory, T. R. Genome size estimates for two important freshwater molluscs, the zebra mussel

437 (Dreissena polymorpha) and the schistosomiasis vector snail (Biomphalaria glabrata). Genome

438 46, 841-844 (2003).

439 13. Adema, C. M. et al. A bacterial artificial chromosome library for Biomphalaria glabrata,

440 intermediate snail host of Schistosoma mansoni. Mem. Inst.Oswaldo Cruz 101, 167-177 (2006).

441 14. Cummins, S. F. & Degnan, B. Sensory sea slugs: Towards decoding the molecular toolkit

442 required for a mollusc to smell. Commun. Integr. Biol. 3, 423-426 (2010).

443 15. Cummins, S. F. et al. Aplysia temptin - the 'glue' in the water-borne attractin pheromone

444 complex. FEBS J. 274, 5425-5437 (2007).

445 16. Hertel, L. A., Bayne, C. J. & Loker, E. S. The symbiont Capsaspora owczarzaki, nov. gen. nov.

446 sp., isolated from three strains of the pulmonate snail Biomphalaria glabrata is related to

447 members of the Mesomycetozoea. Int. J. Parasitol. 32, 1183-1191 (2002).

448 17. Odoemelam, E., Raghavan, N., Miller, A., Bridger, J. M. & Knight, M. Revised karyotyping and

449 gene mapping of the Biomphalaria glabrata embryonic (Bge) cell line. Int. J. Parasitol. 39, 675-

450 681 (2009).

451 18. Feyereisen, R. Arthropod CYPomes illustrate the tempo and mode in P450 evolution. Biochim

452 Biophys Acta 1814, 19-28 (2011). 453 19. Janeway, C. A. & Medzhitov, R. Innate immune recognition. Immunol. 20, 197-216 (2002).

454 20. Buckley, K. M. & Rast, J. P. Diversity of animal immune receptors and the origins of recognition

455 complexity in the deuterostomes. Dev. Comp. Immunol. 49, 179-189 (2015).

456 21. Zhang, S.-M., Adema, C. M., Kepler, T. B. & Loker, E. S. Diversification of Ig superfamily

457 genes in an invertebrate. Science 305, 251-254 (2004).

458 22. Dheilly, N. M. et al. A family of variable immunoglobulin and lectin domain containing

459 molecules in the snail Biomphalaria glabrata. Dev. Comp. Immunol. 48, 234-243 (2015).

460 23. Offord, V. & Werling, D. LRRfinder2.0: A webserver for the prediction of leucine-rich repeats.

461 Innate Immun. 19, 398-402 (2013).

462 24. Falschlehner, C. & Boutros, M. Innate immunity: Regulation of caspases by IAP-dependent

463 ubiquitylation. EMBO J. 31, 2750-2752 (2012).

464 25. Vandenabeele, P. & Bertrand, M. J. M. The role of the IAP E3 ubiquitin ligases in regulating

465 pattern-recognition receptor signalling. Nat. Rev. Immunol. 12, 833-844 (2012).

466 26. Mitta, G., Vandenbulcke, F. & Roch, P. Original involvement of antimicrobial peptides in

467 mussel innate immunity. FEBS L 486, 185-190 (2000).

468 27. Fneich, S. et al. 5-methyl-cytosine and 5-hydroxy-methyl-cytosine in the genome of

469 Biomphalaria glabrata, a snail intermediate host of Schistosoma mansoni. Parasit. Vectors 6,

470 167-167 (2013).

471 28. Geyer, K. K. et al. Cytosine methylation regulates oviposition in the pathogenic blood fluke

472 Schistosoma mansoni. Nat. Commun. 2, 424-424 (2011). 473 29. Knight, M. et al. Polyethyleneimine (PEI) mediated siRNA gene silencing in the Schistosoma

474 mansoni snail host, Biomphalaria glabrata. PLoS Negl. Trop. Dis. 5, e1212-e1212 (2011).

475 30. Rotenberg, L. et al. Oviposition rhythms in Biomphalaria spp. (Mollusca: Gastropoda). 1.

476 Studies in B. glabrata, B. straminea and B. tenagophila under outdoor conditions. Prog. Clin.

477 Biol. Res. 341B, 763-761 (1990).

478 31. Koene, J. M. et al. Male accessory gland protein reduces egg laying in a simultaneous

479 hermaphrodite. PLoS ONE 5, e10117-e10117 (2010).

480 32. Nakadera, Y. et al. Receipt of seminal fluid proteins causes reduction of male investment in a

481 simultaneous hermaphrodite. Curr. Biol. 24, 859-862 (2014).

482 33. Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and

483 interpretation of large-scale molecular data sets. Nucleic Acids Res. 40, D109-D114 (2012).

484 34. Zahoor, Z., Davies, A. J., Kirk, R. S., Rollinson, D. & Walker, A. J. Nitric oxide production by

485 Biomphalaria glabrata haemocytes: effects of Schistosoma mansoni ESPs and regulation

486 through the extracellular signal-regulated kinase pathway. Parasit. Vectors 2, 18-18 (2009).

487 35. Holland, N. D., Venkatesh, T. V., Holland, L. Z., Jacobs, D. K. & Bodmer, R. AmphiNk2-tin, an

488 amphioxus homeobox gene expressed in myocardial progenitors: Insights into evolution of the

489 vertebrate heart. Dev. Biol. 255, 128-137 (2003).

490 36. Fyrberg, E. A., Mahaffey, J. W., Bond, B. J. & Davidson, N. Transcripts of the six Drosophila

491 actin genes accumulate in a stage- and tissue-specific manner. Cell 33, 115-123 (1983). 492 37. Fryberg, E. A., Bond, B. J., Hershey, N. D., Mixter, K. S. & Davidson, N. The actin genes of

493 Drosophila: Protein coding regions are highly conserved but intron positions are not. Cell 24,

494 107-116 (1981).

495 38. Mounier, N., Gouy, M., Mouchiroud, D. & Prudhomme, J. C. Insect muscle actins differ

496 distinctly from invertebrate and vertebrate cytoplasmic actins. J. Mol. Evol. 34, 406-415 (1992).

497 39. Panchin, Y. & Moroz, L. L. Molluscan mobile elements similar to the vertebrate Recombination-

498 Activating Genes. Biochem. Biophys. Res. Commun. 369, 818-823 (2008).

499 40 Raghavan, N., et al. Nimbus (BgI): An active non-LTR retrotransposon of the Schistosoma

500 mansoni snail host Biomphalaria glabrata. Int. J. Parasitol. 37,1307-1318 (2007).

501 41. Pace, J. K., Gilbert, C., Clark, M.S. & Feschotte, C. Repeated horizontal transfer of a DNA

502 transposon in mammals and other tetrapods. Proc. Nat. Acad. Sci. USA 105, 17023-17028

503 (2008).

504 42. DeJong, R. J. et al. Evolutionary relationships and biogeography of Biomphalaria (Gastropoda:

505 Planorbidae) with implications regarding its role as host of the human bloodfluke, Schistosoma

506 mansoni. Mol. Biol. Evol.18, 2225-2239 (2001).

507 43. Campbell, G. et al. Molecular evidence supports an african affinity of the Neotropical freshwater

508 gastropod, Biomphalaria glabrata, Say 1818, an intermediate host for Schistosoma mansoni.

509 Proc. Biol. Sci. 267, 2351-2358 (2000).

510 44. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors.

511 Nature 437, 376-380 (2005).

512 45. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing.

513 Genome Res. 20, 265-272 (2010). 514 46. Yao, G. et al. Graph accordance of next-generation sequence assemblies. Bioinformatics 28, 13-

515 16 (2012).

516 47. Giraldo-Calderón, G. I. et al. VectorBase: An updated bioinformatics resource for invertebrate

517 vectors and other organisms related with human diseases. Nucleic Acids Res. 43, D707-13

518 (2015).

519 48. Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management

520 tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).

521 49. Parra, G., Bradnam, K. & Korf, I. CEGMA: A pipeline to accurately annotate core genes in

522 eukaryotic genomes. Bioinformatics 23, 1061-1067 (2007).

523 50. Lee, E. et al. Web Apollo: A web-based genomic annotation editing platform. Genome Biol. 14,

524 R93-R93 (2013).

525 Supplementary Information is linked to the online version of the paper at www.nature.com/nature

526 Acknowledgments

527 We thank S. Newfeld for discussion of actin evolution. Sequence characterization of the

528 Biomphalaria glabrata genome was funded by National Institutes of Health (NIH) grant HG003079

529 to R.K.W. and McDonnell Genome Institute, Washington University School of Medicine. Further

530 work was supported by FAPEMIG (RED-00014-14) and CNPq (304138/2014-2, 309312/2012-4) to

531 G.O.. K.F.H. and M.S. acknowledge the support of the UK BBSRC (BB/K005448/1) K.F.H. and

532 M.S. acknowledge the support of BBSRC (BB/K005448/1) and CNPq (503275/2011-5) to R.L.C.

533 C.B. acknowledges the support of NIH grant AI016137. J.M.K. acknowledges support from the

534 Research Council for Earth and Life Sciences (ALW; 819.01.007) with financial aid from the

535 Netherlands Organization for Scientific Research (NWO). S.E. acknowledges NIAID contract 536 HHSN272201400029C. C.M.A. was supported by NIH grant P30GM110907from the National

537 Institute of General Medical Sciences (NIGMS). R.M.C. acknowledges NIH grant GM061738 and

538 support from the American Heart Association, Southwest Affiliate (14GRNT20490250). D.T. was

539 supported by NIH grant R25 GM075149. M.K. acknowledges NIH-NIAID grant R01-AI063480.

540 C.F. acknowledges NIH grant R01-GM077582. D.J.J. acknowledges D.F.G grant JA2108/1-2.

541 E.S.L. acknowledges NIH grant P30GM110907 and ROI AI101438. BG acknowledges ANR JCJC

542 INVIMORY (ANR-13-JSV7-0009). K.M.B and J.P.R acknowledge Natural Sciences and

543 Engineering Research Council (NSERC 312221) and the Canadian Health Institutes for Research

544 (CIHR MOP74667). C.S.J, L.R.N, S.J, E.J.R, S.K and A.E.L acknowledge funding from NC3R’s

545 Grant (ref GO900802/1).

546

547 Author Contributions

548 C.M.A., E.S.L., M.K., N.R. conceived the study, scientific objectives. C.M.A. led the project and

549 manuscript preparation with input from steering committee members M.K., C.S.J., G.O., P.M.,

550 L.W.H., A.S., E.S.L., assisted by S.E. O.C. provided field collected snails. C.M.A. and E.S.L

551 cultured snails and provided materials. P.M., L.W.H, S.C., L.F., W.C.W, R.K.W., V.M., C.T

552 developed the sequencing strategy, managed the project, conducted assembly and evaluation. B.R.

553 performed genome alignments. M.C., D.H., S.E., G.G-C. and D.L. perfomed the genebuild, managed

554 metadata, performed genome annotation and data analysis, and facilitated Community Annotation

555 with M.C.M-T. H.D. A-G., M.Y., E.V.V., M.K. and J.M.B. performed Karyotyping and FISH

556 analysis. J.A.T and M.B. performed linkage mapping. G.O., F.P., F.R., J.A., I.R., Y.C., S.G., F.S.

557 and L.S. performed computational analyses of genomic, proteomic, and transcriptomic data, SNP

558 content, secretome, metabolic pathways and annotation of eukaryote protein kinases (ePKs). S.F.C.,

559 L.Y., D.L., M.Z. and D.McM conducted pheroreception studies. M.T.S., K.K.G., U.N. and K.F.H 560 conducted bacterial symbiont analysis. S.L., S-M.Z., E.S.L. and B.C.B. performed virus analyses.

561 M.K., P.F., W.I. and N.R. performed annotation of HSP. T.P.Y., X-J.W., U.B-W. and N.D.

562 conducted proteogenomic studies of parasite-reactive snail host proteins and data analysis. R.F., A.

563 E.L. and C.S.J. performed annotation of CYP. J.H. performed annotation of NFkB. K.M.B. and

564 J.P.R. performed annotation of conserved immune factors. C.M.A. and J.J.P performed annotation of

565 FREPs. M.C. and C.M. performed annotation of complement. D.D. performed annotation of

566 apoptosis. B.G. and C.J.B. performed annotation of REDOX balance. O.L.B., D.D., R.G., Ch.C. and

567 G.M. performed annotation of antibacterial defenses. L.d.S. and A.T.P performed search for

568 antibacterial defense genes. P.C.H., M.A.G. and E.A.P. performed annotation of unknown novel

569 sequences. K.K.G., I.W.C., U.N., K.F.H., Ce.C., T.Q. and C.G. performed annotation and analysis of

570 epigenetic sequences. E.H.B., L.R. doA., M.deS.G., R.L.C, and W.deJ.J. performed annotation of

571 miRNA (Brazil), K.K.B., R.P. and K.B.S. performed annotation of miRNA (Canada). M.G.

572 performed annotation of periodicity. S.F.C., B.R., T.W., A.E.L and S.K.conducted neuropeptide

573 studies and data analysis. A.E.L., R.F., S.K., E.J.R., S.J., D.R., C.S.J. and L.R.N.performed

574 annotation neuroendocrinology (CYP). J.M.K., B.R. and S.F.C.performed annotation of ovipostatin.

575 A.B.K and L.L.M performed tissue location of transcripts analysis. A.J.W. and S.P.L. performed

576 annotation of phosphatases. T.L.L., K.M.R., M.Mi., and R.C. performed annotation of actins and

577 annotation of cardiac transcriptional program with D.L.T. D.J.J., B.K. and M.Me. performed

578 annotation of biomineralization genes. J.C., A.K. and C.F. performed annotation of DNA

579 transposons, global analysis of transposable element landscape, and horizontal transfer events. A.S

580 and C.B. performed repeat/TE analysis. E.S.L, S.M.Z., G.M.M and SKB conducted comparative B.

581 pfeifferi transcriptome studies and data analysis. C.M.A, P.M., M.L.M did most of the writing with

582 contributions from all authors.

583 584 Author Information

585 The Biomphalaria glabrata genome project has been deposited at DDBJ/EMBL/

586 GenBank under the accession number APKA00000000.1. All short-read data have been

587 deposited into the Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/sra)

588 as follows: Illumina HiSeq 2000 WGS reads:SRX648260-71; 454 GS FLX WGS: SRX005828,

589 SRX008161,2; 454 GS FLX RNA reads: SRX014813, SRX014894-7

590 Genome, transcriptome and predicted proteome data are also available at Vectorbase (ref. 47).

591 Reprints and permissions information is available at www.nature.com/reprints.

592 The authors declare no competing financial interests. Readers are welcome to comment on the online

593 version of the paper. Correspondence and requests for materials should be addressed to C.M.A.

594 ([email protected]).

595 596 597 List of Supplementary Files 598

599 PDF file 600 601 1. Supplementary information (SIZE) 602 This file contains Supplementary Text and Data sections 1-32 (see Contents list for details), 603 Supplementary Figures 1-87 604

605 Zip files 606 1. Supplementary Figure 8 (JPG, 8500 KB). 607 608 2 Supplementary Tables 1-56 (Excel 1919048 KB).