Articles PUBLISHED: 3 April 2017 | VOLUME: 1 | ARTICLE NUMBER: 0121

Adaptation to deep-sea chemosynthetic environments as revealed by genomes

Jin Sun1,​2, Yu Zhang3, Ting Xu2, Yang Zhang4, Huawei Mu2, Yanjie Zhang2, Yi Lan1, Christopher J. Fields5, Jerome Ho Lam Hui6, Weipeng Zhang1, Runsheng Li2, Wenyan Nong6, Fiona Ka Man Cheung6, Jian-Wen Qiu2* and Pei-Yuan Qian1,​7*

Hydrothermal vents and methane seeps are extreme deep-sea ecosystems that support dense populations of specialized macrobenthos such as mussels. However, the lack of genome information hinders the understanding of how these animals have adapted to such inhospitable environments. Here we report the genomes of a deep-sea vent/seep mussel (Bathymodiolus platifrons) and a shallow-water mussel (Modiolus philippinarum). Phylogenetic analysis shows that these mussels diverged approximately 110.4 million years ago. Many gene families, especially those for stabilizing protein structures and removing toxic substances from cells, are highly expanded in B. platifrons, indicating adaptation to extreme environmental conditions. The innate immune system of B. platifrons is considerably more complex than that of other lophotrochozoan species, including M. philippinarum, with substantial expansion and high expression levels of gene families that are related to immune recognition, endocytosis and caspase-mediated apoptosis in the gill, revealing presumed genetic adaptation of the deep-sea mussel to the presence of its chemoautotrophic endosymbionts. A follow-up metaproteomic analysis of the gill of B. platifrons shows methanotrophy, assimilatory sulfate reduction and ammonia metabolic pathways in the symbionts, providing energy and nutrients, which allow the host to thrive. Our study of the genomic composition allowing symbiosis in extremophile molluscs gives wider insights into the mechanisms of symbiosis in other organisms such as deep-sea tubeworms and giant clams.

The environment of deep-sea hydrothermal vents and methane seeps is characterized by darkness, lack of photosynthesis-derived nutrients, high hydrostatic pressure, variable temperatures and high concentrations of heavy metals and other toxic substances1,2. Despite this hostile environment, these ecosystems support dense populations of macrobenthos which, with the help of chemoautotrophic endosymbionts, are fuelled by simple reduced molecules such as methane and hydrogen sulfide1–3. Studying vent and seep organisms may answer fundamental questions about their origin and adaptation to the extreme environments1,4. Deep-sea mussels (Mytilidae, Bathymodiolinae) often dominate at hydrothermal vents and cold seeps around the world. Owing to their ecological importance and remarkable biological characteristics, including their ability to survive an extended period under atmospheric pressure5, they are a recognized model for studies of symbiosis6, immunity7, adaptation to abiotic stress8, ecotoxicology9 and biogeography10.

Results and discussion

We sequenced the genomes of both B. platifrons Hashimoto and Okutani, 1994 (a deep-sea mussel, Fig. 1a) and M. philippinarum (Hanley, 1843) (a shallow-water mussel, Fig. 1b) in the family Mytilidae using a whole-genome shotgun approach and compared their features (Supplementary Note 1). Both assembled genomes were highly repetitive (Supplementary Fig. 2). The genome of B. platifrons (1.64 Gb) was smaller than that of M. philippinarum (2.38 Gb), mainly because it has fewer repeats, particularly the Helitron transposable elements and unclassified repeats that together contributed a 707 Mb difference between the two genomes (Supplementary Note 9). The N50 value (the shortest sequence length at 50% of the genome size) of the B. platifrons scaffolds and contigs was 343.4 Kb and 13.2 Kb, respectively; and the N50 value of the M. philippinarum scaffolds and contigs was 100.2 Kb and 19.7 Kb, respectively. The deep-sea and shallow-water mussel genomes contained 96.3% and 93.7% complete and partial universal single-copy metazoan orthologous genes, respectively, indicating the completeness of the assembly and gene models (Supplementary Note 6). As with other bilaterians, the two mussel genomes contained all the expected Hox and ParaHox genes, as well as similar numbers of microRNAs, again suggesting that the assemblies encapsulate a nearly complete representation of genic information (Supplementary Notes 13 and 14). Although the gills of B. platifrons contain endosymbiotic methane-oxidizing bacteria (MOB)11, the host scaffolds did not contain bacterial nucleotide sequences, showing a lack of horizontal gene transfer from the symbiont. B. platifrons has a lower rate of heterozygosity compared to the other three bivalves with a sequenced genome (Supplementary Fig. 3a), potentially owing to recurrent population bottlenecks as a result of population extinction and re-colonization12.

The B. platifrons genome had 33,584 gene models, 86.2% of which had transcriptome support, based on transcriptome data from three

1Division of Life Science, Hong Kong University of Science and Technology, Hong Kong, China.
2Department of Biology, Hong Kong Baptist University, Hong Kong, China.
3Shenzhen Key Laboratory of Marine Bioresource and Eco-Environmental Science, College of Life Science, Shenzhen University, Shenzhen, China.
4Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China.
5High Performance Computing in Biology, Roy J. Carver Biotechnology Centre, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA.
6Simon F. S. Li Marine Science Laboratory, School of Life Sciences, Centre for Soybean Research, Partner State Key Laboratory of Agrobiotechnology, the Chinese University of Hong Kong, Hong Kong, China.
7HKUST-CAS Joint Laboratory, Sanya Institute of Deep Sea Science and Engineering, Sanya 572000, China.
*e-mail: [email protected]; [email protected]

Nature Ecology & Evolution 1, 0121 (2017) | DOI: 10.1038/s41559-017-0121 | www.nature.com/natecolevol
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

[Figure 1 image. Panels a and b: shell photographs. Panel c: phylogenetic tree of Helobdella robusta, Capitella teleta, Lingula anatina, Octopus bimaculoides (Cephalopoda), Lottia gigantea, Biomphalaria glabrata and Aplysia californica (Gastropoda), Modiolus philippinarum, Bathymodiolus platifrons, Crassostrea gigas and Pinctada fucata; bootstrap support values at the nodes range from 94 to 100; scale bar, 0.05. Panel d: phylogenetic tree of Pinctada fucata, Crassostrea gigas and the Mytilidae (Modiolus philippinarum, Bathymodiolus platifrons, Bathymodiolus azoricus, Perna viridis, Limnoperna fortunei, Mytilus coruscus, Mytilus edulis, Mytilus galloprovincialis); bootstrap support values at the nodes range from 98 to 100; scale bar, 0.01.]

Figure 1 | Shell morphology and phylogenetic positions of B. platifrons and M. philippinarum. a,b, Shell of the deep-sea mussel B. platifrons (a, 10.7 cm) and shallow-water mussel M. philippinarum (b, 6.5 cm). c, Phylogenetic tree of 11 lophotrochozoan species based on genome data; the value of bootstrap support is shown at each node. The scale bar indicates 0.05 expected substitutions per site. d, Phylogenetic tree of Mytilidae based on transcriptome data from representative genera/species. The scale bar indicates 0.01 changes per site.

tissues in a previous study13, and our new transcriptome data from three additional tissues; whereas the M. philippinarum genome had 36,549 gene models, 86.0% of which had transcriptome support, based on our new transcriptome data from five tissues (Supplementary Note 5). A total of 5,527 B. platifrons gene models were validated by shotgun proteomics using gill samples (false discovery rate (FDR) < 0.01), and the gene-expression level correlated well (P < 1 × 10−5) with their corresponding protein abundance (Supplementary Note 20). Comparisons among five molluscan genomes revealed 6,883 shared protein domains and 474 protein domains unique to B. platifrons (Fig. 2a and Supplementary Table 14).

A phylogenetic tree constructed using 375 single-copy orthologues from 11 lophotrochozoan species (Fig. 1c) and another tree constructed using transcriptome sequences of representative members of Mytilidae (Fig. 1d) showed that, within the genera used, Modiolus is the closest shallow-water relative of Bathymodiolus. Using the lophotrochozoan tree as a reference, the time of divergence between B. platifrons and M. philippinarum was estimated to be around 110.4 million years ago (Ma), with a 95% confidence interval of 52.4–209.7 Ma (Fig. 2b), which is close to the upper age limit of deep-sea symbiotic mussels (102 Ma) previously estimated using five genes14. This result supports the hypothesis that the ancestors of modern deep-sea mussels might have experienced an extinction event during the global anoxia period that has been associated with the Palaeocene–Eocene thermal maximum at around 57 Ma (ref. 15).

Gene-family analysis among the 11 lophotrochozoans revealed the expansion of 111 protein domains and contraction of 39 domains in B. platifrons (Fig. 2c and Supplementary Table 10). The number of contracted domains in the deep-sea mussel is the smallest among all the species examined. This indicates that adaptation of B. platifrons to the deep-sea chemosynthetic environment was mainly mediated by gene family/domain expansion, and that the deep-sea mussel has retained most genes of its shallow-water ancestors.

Among the most numerously expanded gene families in B. platifrons are the HSP70 family (179 proteins) and the ABC transporters (393 proteins). HSP70 is important for protein folding, and thus high expression of these genes in the gill (Supplementary Fig. 15) could indicate their active response to stresses such as changing temperature, pH and hypoxia. However, because it took approximately three hours from the sample collection to preservation, the high expression of some of the stress-related genes could also be caused by the stress experienced during the sampling and transportation. Nevertheless, the expression level of HSP70 has been reported to positively correlate with the level of DNA-strand breakage in a deep-sea mussel16. Therefore, expansion of this gene family may provide additional genetic resources allowing deep-sea mussels to cope with abiotic stressors. ABC transporters are known to move toxic chemicals outside mussel gill epithelial cells17, forming the first line of defence against toxic chemicals in the vent/seep fluid. Many of the ABC transporters were highly expressed in the gill (Supplementary Fig. 16), indicating their active role in protecting the deep-sea mussel through detoxification.

Deep-sea mussels of the genus Bathymodiolus obtain their symbionts from the environment, and house them in the vacuoles of gill epithelial cells18. Therefore, they must recognize the bacteria in the water column, incorporate them into the host cell, and control their population growth (Fig. 3). The target microbes carry signature molecules on their surface, which are recognized by the transmembrane pattern-recognition receptors (PRRs) of the host19. Among the PRRs in B. platifrons, there was a significant expansion of genes with an immunoglobulin (Ig) domain and genes with a fibrinogen domain, both of which are critical in allowing the sea anemone Aiptasia to recognize its dinoflagellate symbiont20. In addition, peptidoglycan-recognition proteins (PGRPs) that can bind with peptidoglycan21 on the bacterial cell wall were also expanded in the genome of B. platifrons. Many of these PRR genes were also highly


[Figure 2 image. Panel a: Venn diagram for Modiolus, Bathymodiolus, Octopus, Aplysia and Crassostrea; 6,883 Pfam domains are shared by all five species and 474 are unique to Bathymodiolus. Panel b: time-calibrated tree of the 11 species (axis: 600 to 0 Ma); the split between Modiolus philippinarum and Bathymodiolus platifrons is placed at 110.43 Ma, with +111/−39 domain expansions/contractions on the B. platifrons branch. Panel c: heat map (colour scale: row Z score, −2 to 3; columns: Bp, Mp, Pf, Cg, Lg, Bg, Ac, Ob, La, Ct, Hr) with rows annotated as: Immunoglobulin domain; AAA domain; Ankyrin repeat; C1q domain; ABC transporter; Ras family; Protein-tyrosine phosphatase; Fibronectin type III domain; EGF-like domain; TIR domain; Fibrinogen C; Hsp70 protein; Dual specificity phosphatase, catalytic domain; Death domain; Syndecan domain; Mitochondria-eating protein; Caspase-recruitment domain; Inhibitor of apoptosis domain; NTPase; Galactosyltransferase; Inositol hexakisphosphate; Tyrosine phosphatase family; THAP domain; Zinc knuckle; Protocadherin; Natural killer cell receptor 2B4; Homeodomain-like domain; Methyltransferase FkbM domain; Winged helix-turn helix; Macro domain; Caspase domain; Tissue factor; Myb/SANT-like DNA-binding domain; Transcriptional coactivator p15 (PC4); Alcohol dehydrogenase transcription factor Myb/SANT-like; CENP-B N-terminal DNA-binding domain; TNF (tumour necrosis factor) family; Interferon-inducible GTPase (IIGP); 'Paired box' domain; Lecithin retinol acyltransferase; Cell-cycle sustaining, positive selection; REJ domain; PPR repeat; Pancreatic hormone peptide; Integrin-α cytoplasmic region; Protein QIL1; VASP tetramerization domain; Dimerization domain of male-specific-lethal 1; Iron-sulfur cluster-binding domain.]

Figure 2 | Gene family representation analysis and divergence time of 11 lophotrochozoan species. a, Venn diagram showing the number of shared and unique Pfam functional domains in five molluscan species. b, Divergence times of 11 lophotrochozoan species with error bars (purple lines) indicating the 95% confidence interval, and the numbers of gene-family expansions (red) and contractions (blue) determined by counting Pfam domains. c, Heat map of major annotated Pfam domains that are expanded in B. platifrons, with multiple domains in a given gene being counted as one. Ac, Aplysia californica; Bp, Bathymodiolus platifrons; Bg, Biomphalaria glabrata; Cg, Crassostrea gigas; Ct, Capitella teleta; Hr, Helobdella robusta; La, Lingula anatina; Lg, Lottia gigantea; Mp, Modiolus philippinarum; Ob, Octopus bimaculoides; Pf, Pinctada fucata.
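The two conventions named in this caption, counting a domain once per gene and scaling each heat-map row to a Z score, can be illustrated with a minimal sketch. The function names and the example hits are hypothetical, and the use of the population standard deviation is an assumption (the caption does not say which estimator was used); only the caspase counts are taken from Table 1.

```python
from collections import defaultdict
from statistics import mean, pstdev


def proteins_per_domain(hits):
    """Count proteins per Pfam domain, counting repeated domains in one protein once.

    hits: iterable of (protein_id, pfam_id) pairs, one pair per domain occurrence.
    """
    proteins = defaultdict(set)
    for protein_id, pfam_id in hits:
        proteins[pfam_id].add(protein_id)
    return {pfam_id: len(ids) for pfam_id, ids in proteins.items()}


def row_z_scores(counts):
    """Scale one heat-map row to zero mean and unit (population) standard deviation."""
    mu = mean(counts)
    sigma = pstdev(counts)  # assumption: population SD; a sample SD would rescale slightly
    if sigma == 0:
        return [0.0] * len(counts)
    return [(count - mu) / sigma for count in counts]


# Hypothetical domain hits: prot1 carries three IAP repeats but is counted once.
hits = [
    ("prot1", "PF00653"), ("prot1", "PF00653"), ("prot1", "PF00653"),
    ("prot2", "PF00653"), ("prot2", "PF00656"),
]
domain_counts = proteins_per_domain(hits)

# Caspase (PF00656) counts from Table 1, in the order Bp Mp Pf Cg Lg Bg Ac Ob La Ct Hr.
caspase_counts = [75, 42, 45, 38, 18, 17, 21, 38, 71, 42, 15]
caspase_z = row_z_scores(caspase_counts)
```

Row-wise scaling makes domains with very different absolute counts comparable across species, which is why B. platifrons stands out in a row even when another domain family is far larger overall.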

expressed in the gill (Supplementary Note 18), consistent with the gill containing most of the MOB in the deep-sea mussels.

Endocytosis of exogenous bacteria is a critical step in establishing the host–symbiont relationship. Several observations showed molecular mechanisms of endocytosis of MOB by B. platifrons. (1) The Toll-like receptor 13 (TLR13) was found to be expanded in the genome of B. platifrons, and these receptors were highly expressed in the gill (Supplementary Fig. 20). TLR13 functions as an endosomal receptor in various groups of , which can sense microbes only after their internalization by phagocytosis22. (2) Two adhesion gene families (syndecan and protocadherin) were expanded in the genome of B. platifrons, and these families have been reported to mediate endocytosis in other species23,24 (Supplementary Figs 21 and 22). (3) Positive selection of six genes in the endocytosis pathway was shown by a branch-site evolutionary analysis (Supplementary Fig. 14 and Supplementary Note 17). Among the positively selected genes, a Wiskott–Aldrich syndrome protein and SCAR homologue (WASH) functions as a nucleation-promoting factor (NPF) to stimulate actin polymerization on endosomal membranes and consequently regulate vesicular trafficking25.

Apoptosis plays multiple roles in host–symbiont interactions such as post-phagocytic winnowing in coral–dinoflagellate symbiosis26 and symbiont recycling in insect–bacteria symbiosis27. In B. platifrons, several families of the classic apoptosis system including TNF, DEATH, CARD and caspases were remarkably expanded and were highly expressed in the gill (Table 1 and Supplementary Note 18). As the central controllers of programmed cell death, caspases initiate the transduction of apoptotic signals by activation of members of the TNF superfamily and their DEATH receptor28, which can eventually result in the release of lysozymes to digest the symbionts29. A TUNEL assay showed the presence of numerous apoptotic cells in deep-sea mussels30 and in late-juvenile stages of vent tubeworms when they have acquired symbionts31, further substantiating the evolutionarily conserved role of apoptosis in establishing the host–symbiont relationship in various groups of invertebrates. In addition, as required to maintain homeostasis in symbiosis, B. platifrons has a complicated anti-apoptosis system with 130 inhibitors of apoptosis proteins (IAPs), which exceeds the number of IAPs in any other lophotrochozoan species available for comparison (Fig. 4). The high expression of many IAP genes is consistent with their role in maintaining the symbiont population in the gill of B. platifrons (Fig. 4b). Therefore, the expansion of the apoptosis- and anti-apoptosis-related gene families could contribute to the regulation of the symbiont population in Bathymodiolus.

To better understand the relationship between the host and its symbionts, 220 MOB proteins were identified through metaproteomics of the B. platifrons gill (Supplementary Note 20), leading to the discovery of three key metabolic pathways in the symbionts (Fig. 3). The methane-oxidation pathway, indicated by high expression of two signature enzymes of methanotrophy (that is, methane monooxygenase and methanol dehydrogenase)11, fuels the host–symbiont relationship (Supplementary Fig. 28). The assimilatory sulfate reduction pathway, identified by several adenylyl-sulfate kinases, produces sulfur-containing amino acids32. The ammonia-assimilation pathway, identified by glutamate-ammonia ligase and glutamate synthase, detoxifies ammonia and provides the symbionts and host with the much needed organic nitrogen in the deep sea11.
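For readers who wish to recompute the assembly statistics reported earlier in this section (for example, the scaffold N50 of 343.4 Kb for B. platifrons), the N50 definition given there can be sketched as follows. This is a minimal illustration, not the authors' script: the function name is hypothetical, and it assumes that "genome size" means the summed length of the assembled sequences rather than an independently estimated genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; N50 is the length of the
    sequence at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2  # assumption: total assembly length, not estimated genome size
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (arbitrary units), not data from the study.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

In the toy example the lengths sum to 54, and the three longest sequences (10, 9 and 8) already reach the halfway mark of 27, so the N50 is 8.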


[Figure 3 image. Diagram labels: Recognition; MOB; Ig; Fibrinogen; CH4; NH4+; Amt; PGRP; Gill epithelial cell; Syndecan; TLR; Protocadherin; AP-1; Endocytosis; WASH; VPS; Glyceraldehyde 3-phosphate; L-Glutamate; Glycolysis; Apoptosis; TNF; Caspase; CARD; Lysozyme; IAP; DEATH. Tissue diagram labels: VM and O, AM, F, M, G.]

Figure 3 | A model of symbiosis between B. platifrons and methane-oxidizing bacteria. A model of symbiosis between B. platifrons and MOB. Top left, the diagram of the six tissues used for transcriptome sequencing, including the major MOB-containing gill. AM, adductor muscle; F, foot; G, gill; M, mantle; O, gonad; VM, visceral mass. Bottom, the host has mechanisms for MOB recognition, endocytosis and population control through apoptosis. The symbiont has various metabolic pathways (that is, methanotrophy and ammonia metabolism) that are beneficial to the host. Amt, ammonia transporter; AP-1, adaptor protein complex 1; CARD, caspase-recruitment-domain protein; IAP, inhibitor of apoptosis; Ig, immunoglobulin; LRP, low-density lipoprotein-receptor-related protein; PGRP, peptidoglycan recognition protein; TLR, Toll-like receptor; VPS, vacuolar protein-sorting protein; WASH, Wiskott–Aldrich syndrome protein and SCAR homologue.

Our study has provided genomic resources for understanding how the deep-sea mussel has adapted to extreme abiotic stresses and absence of photosynthesis-derived energy and nutrients in the chemosynthetic ecosystems. The general mechanisms of symbiosis revealed in our study are of relevance to other symbiotic organisms such as deep-sea tubeworms and clams, as well as shallow-water corals.

Methods

Mussel collection and DNA extraction. Individuals of B. platifrons were collected using the manned submersible Jiaolong from a methane seep (22° 06.921′ N, 119° 17.131′ E, 1,122 m deep; Supplementary Fig. 1) in the South China Sea in July 2013 (ref. 13). The mussels were kept in an enclosed sample chamber placed in the sample basket of the submersible, and it took approximately three hours from sampling to sample preservation. Once the samples were brought to the upper deck of the mothership, the adductor muscle of an individual was dissected, cut into small pieces and immediately fixed in RNAlater. The sample was then transported to Hong Kong Baptist University (HKBU) on dry ice and stored at −80 °C until use. DNA was extracted using the CTAB method, and its quality and quantity were checked with standard agarose gel electrophoresis and a Qubit Fluorometer, respectively.

The shallow-water horse mussels M. philippinarum were collected from a sandy shore at Ting Kok, Hong Kong (22° 46.989′ N, 114° 21.389′ E; Supplementary Fig. 1) during low tide in April 2014. The mussels were transported to HKBU and maintained in a seawater aquarium until use. The species identity was confirmed based on a morphological description33. DNA from the adductor muscle was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany).

Genome sequencing and reads cleaning. For B. platifrons, nine libraries with insert sizes ranging from 180 bp to 16 kb were constructed using different preparation methods (see Supplementary Table 1a for details) and sequenced by Illumina HiSeq 2000 and HiSeq 2500 platforms. For M. philippinarum, six libraries with insert sizes ranging from 180 bp to 5 kb were constructed and sequenced by Illumina HiSeq 2000 and HiSeq 2500 platforms. Low-quality reads and sequencing-adaptor-contaminated reads were trimmed and error corrected. Reads of mate-pair libraries were de-duplicated and sorted.

Transcriptome sequencing. For B. platifrons, the mantle, foot and gill transcriptomes were generated from a single B. platifrons as previously reported13. The adductor muscle, visceral mass and ovary tissues from another B. platifrons collected from a at the Noho site in November 2015 were dissected on board and fixed in RNAlater, and their RNA was extracted using TRIzol and sent to BGI (Shenzhen) for sequencing using a eukaryotic mRNA enrichment and library preparation protocol. For M. philippinarum, RNA was extracted from the mantle, foot, gill, adductor muscle and visceral mass and sequenced using the same method as for B. platifrons. The RNA-sequencing (RNA-seq) data are summarized in Supplementary Table 5. Raw reads were trimmed and assembled using Trinity version 2.0.6 (ref. 34).

Genome assembly. The genomes were assembled using Platanus version 1.21 (ref. 35). Our genome-assembly pipeline involved assembling DNA sequences using different parameters, merging results from different assemblies, removing redundant scaffolds, further scaffolding using transcripts, filling the gaps and correcting the errors (Supplementary Fig. 4).

Identification of repeats and transposable elements. Repeats and transposable elements were screened using the RepeatModeler pipeline version 1.0.8. RMblast was used to search and construct a species-specific repeats library of the genome, and RepeatMasker36 was used to identify, classify and mask the repeats with the species-specific repeats library and repeats in the RepBase with the default settings.

Genome annotation. Trinity was used to generate another version of transcripts using the genome-based mode. The transcriptome data from the de novo and genome-assisted approaches were further assembled using PASA pipeline version 2.0.2 with BLAT37 as the aligner. The two mussel genomes were annotated using MAKER version 3.0 (ref. 38). MAKER was run twice, in order to train the gene models and increase the gene-prediction accuracy. The gene models were further filtered by running OrthoMCL39 with the protein models from six species of molluscs (Crassostrea gigas, Pinctada fucata, Lottia gigantea, Aplysia californica, Biomphalaria glabrata and Octopus bimaculoides), and sequences that could be clustered with these molluscs were retained. Transcriptome reads were then mapped to the remaining 'orphan' genes, and only those supported by at least 10 parts per million (ppm) reads were retained. BLASTp was applied to search the predicted mussel protein models against the whole non-redundant database with an E value of 1 × 10−5. The .xml format files were uploaded to BLAST2GO to obtain the GO and InterProScan annotation. Pfam annotation was performed by searching the 16,295 Pfam entries (version 29.0) using a hidden Markov model (HMM)40. The clean RNA-seq reads were quantified using kallisto41. Gene-expression levels were quantified based on transcripts per million (TPM).

Phylogenomics. Protein sequences of the nine published or public lophotrochozoan genomes (Supplementary Table 3) were downloaded and analysed together with data from the two mussels used in the phylogenomic analysis. OrthoMCL39 was applied to determine and cluster gene families, and the BBH approach in BLASTp was used with a threshold value of 1 × 10−5. This analysis resulted in 35,057 gene families among the 11 species. Only single-copy orthologues in each gene cluster were concatenated without partition information for the phylogenetic analysis using RAxML42 with the GTR + Γ model and slow bootstrapping.

Divergence time among the molluscs was estimated using MCMCTree, which applied the Bayesian estimation method for species divergence43. The maximum-likelihood tree obtained using the 11 lophotrochozoan species was used to provide a reference tree topology, and the WAG + Γ protein substitution model was employed on each site. The tree was calibrated with the following time frames to constrain the age of the nodes between the species: minimum = 305.5 Ma and soft maximum = 581 Ma, for C. teleta and H. robusta44; minimum = 168.6 Ma and soft


Table 1 | Distribution of protein Pfam domains associated with apoptosis among 11 lophotrochozoans.

Pfam     Domain      Bp    Mp    Pf    Cg    Lg    Bg    Ac    Ob    La    Ct    Hr
PF00656  Caspase     75*   42    45    38    18    17    21    38    71    42    15
PF00653  IAP repeat  130*  95    68    49    26    43    52    16    37    26    6
PF00452  Bcl-2       12    10    12    8     11    9     8     14    16    15    12
PF00020  TNFR_c6     37    40    27    28    31    5     18    12    49    28    12
PF00229  TNF         50*   33    25    29    11    4     9     8     24    22    5
PF05729  NACHT       245   204   352*  127   105   53    170   102   194   246   84
PF00931  NB-ARC      78    69    95    52    104*  27    55    47    88    77    37
PF00531  DEATH       155*  98    125   55    40    8     53    186*  126   37    23
PF01335  DED         26    18    63*   22    10    2     7     6     47    8     2
PF00619  CARD        133*  94    66    57    32    6     36    29    212*  197*  16
Total                941   703   878   465   388   174   429   458   864   698   212

Each number shows the number of proteins containing the specific domain. Protein family expansion is indicated by an asterisk, with a significant corrected P value (P < 0.05). In B. platifrons, five gene families are significantly expanded when compared to other species. In any of the other 10 species, at most two gene families are significantly expanded. Ac, Aplysia californica; Bg, Biomphalaria glabrata; Bp, B. platifrons; Cg, Crassostrea gigas; Ct, Capitella teleta; Hr, Helobdella robusta; La, Lingula anatina; Lg, Lottia gigantea; Mp, Modiolus philippinarum; Ob, Octopus bimaculoides; Pf, Pinctada fucata. Domain abbreviations: IAP, inhibitor of apoptosis; Bcl-2, apoptosis regulator proteins, Bcl-2 family; TNFR, tumour necrosis factor receptor; TNF, tumour necrosis factor; NACHT, a domain that can be found in NAIP, CIITA, HET-E and TEP1 proteins; NB-ARC, a nucleotide-binding adaptor shared by APAF-1, certain R gene products and CED-4; DED, death effector domain; and CARD, caspase recruitment domain.
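The asterisks in Table 1 come from the authors' two-tailed Fisher's exact test against a background of non-mussel species, whose totals are not given here. As a rough illustration of the same kind of test, the sketch below compares only the two mussels, using the IAP counts from Table 1 and the gene-model totals reported in the Results as denominators. That pairing, the choice of denominators and the function names are assumptions made for illustration, so the resulting P value is not the one behind the asterisk.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same margins)
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        return (log_choose(row1, x) + log_choose(row2, col1 - x)
                - log_choose(total, col1))

    observed = log_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_prob(x)
        if lp <= observed + 1e-7:  # small tolerance for floating-point ties
            p_value += exp(lp)
    return min(1.0, p_value)


# Counts from the text and Table 1; using gene models as the denominator is an assumption.
BP_GENE_MODELS, MP_GENE_MODELS = 33584, 36549
IAP_BP, IAP_MP = 130, 95

p_iap = fisher_exact_two_sided(
    IAP_BP, BP_GENE_MODELS - IAP_BP,
    IAP_MP, MP_GENE_MODELS - IAP_MP,
)
```

Working in log space with `lgamma` avoids the very large binomial coefficients that arise with tens of thousands of gene models. A real analysis would also correct the P values for the number of domains tested, as the table legend indicates.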

maximum = 473.4 Ma, for A. californica and B. glabrata44; minimum = 470.2 Ma and soft maximum = 531.5 Ma, for A. californica (or B. glabrata) and L. gigantea44; minimum = 532 Ma and soft maximum = 549 Ma, for the first appearance of molluscs45; and minimum = 550.25 Ma and soft maximum = 636.1 Ma, for the first appearance of Lophotrochozoa45.

To better understand the phylogenetic positions of these two mussels in the family Mytilidae, transcriptome sequences from six additional mytilid species were downloaded from NCBI and re-assembled where appropriate, with details shown in the Supplementary Information. Protein sequences from each species were grouped into gene families using OrthoMCL39 as described above. Only the one-to-one orthologues were aligned, trimmed and concatenated. Maximum-likelihood trees were built using RAxML42 with the GTR + Γ model and slow bootstrapping.

Gene-family evolution. We applied a sensitive HMM scanning method on known Pfam functional protein domains to classify the gene families46. Annotation of Pfam domains was performed on all 11 lophotrochozoan species after excluding protein domains of various transposable elements. A domain with multiple copies in a protein was counted only once. Thereafter, in order to identify gene-family expansion/contraction events in B. platifrons, a two-tailed Fisher's exact test was conducted. The Pfam domain counts in B. platifrons were compared against the background average domain counts of the nine non-mussel species (that is, species other than B. platifrons and M. philippinarum). In addition, OrthoMCL39 was applied to cluster gene families and CAFÉ version 3.1 was applied to determine the significance of gene family expansion/contraction47. For each gene family, CAFÉ generated a family-wide P value, with a significant P value indicating a possible gene-family expansion or contraction event. A branch-specific P value was also generated for each branch/node using the Viterbi method. A family-wide P value less than 0.01 and a branch/node Viterbi P value less than 0.001 were considered as a signature of gene family expansion/contraction for a specific gene family and specific species, respectively, as recommended for CAFÉ version 3.1 in ref. 47.

Proteomics characterization. Gill samples from three individuals of B. platifrons fixed separately in RNAlater were used in proteomic analysis48. In brief, the samples were homogenized in a lysis buffer containing 8 M urea and 40 mM HEPES (pH = 7.5), sonicated to break the cells up, and centrifuged at 4 °C at 15,000g for 15 min. The supernatant that contained the proteins was collected, purified with a methanol/chloroform protein-precipitation method, and re-constituted in lysis buffer. Three samples were then pooled in equal amounts, reduced using dithiothreitol, alkylated using iodoacetamide, and digested using sequencing-grade trypsin. The digested sample, containing approximately 1 mg peptides, was fractionated using a strong cation-exchange (SCX) column following the procedure described previously48.

Each SCX fraction was re-dissolved in 0.1% formic acid and analysed on an LTQ-Orbitrap Elite mass spectrometer coupled to an Easy-nLC (Thermo Fisher, Bremen, Germany) equipped with an Acclaim PepMap RSLC C18 column (Thermo Fisher Scientific, Sunnyvale, California, USA).

The MS/MS data were uploaded into Mascot version 2.3.2 (Matrix Science, London, UK) to search against the deduced B. platifrons protein sequences, with their reversed sequences used to calculate the FDR. The searching parameters were as follows: peptide charge: +2, +3 and +4; fixed modification: carbamidomethyl (C); variable modification: oxidation (M) and deamidated (NQ); maximum missed cleavage: 2; peptide mass tolerance: 4 ppm; and fragment mass tolerance: 0.05 Da. The FDR was dynamically controlled at 0.01.

Because the gills of B. platifrons contain MOB11,49 that belong to the family Methylococcaceae13, we also identified and quantified the bacterial proteins in the gill samples to gain insights into the symbiotic relationship between the bacteria and their host. Protein sequences from 24 published bacterial genomes in Methylococcaceae were downloaded and appended with the 1,402 bacterial proteins that were identified in our previous meta-transcriptome analysis13. This created a database with 86,023 bacterial proteins. The database search criteria were identical to those described previously for the mussel proteins. The identified/matched protein sequences were further annotated by BLASTp against the KEGG database using an E value of 1 × 10−7.

Determination of positively selected genes. Single-copy orthologues were selected among B. platifrons, M. philippinarum, C. gigas and P. fucata after running the OrthoMCL pipeline39 as mentioned above. The sequences in each single-copy orthologue group were aligned, trimmed and concatenated. The phylogenetic tree was constructed using RAxML42 with the GTR + Γ model and rapid bootstrapping.

Signatures of positively selected genes along a specific branch can be detected by branch-site models implemented in the codeml module of the phylogenetic analysis by maximum likelihood (PAML) package version 4.8 (ref. 50). The branch-site model requires a lineage designated as 'foreground' phylogeny (that is, B. platifrons in the former phylogenetic tree) and the rest of the lineages as 'background' phylogeny (that is, M. philippinarum, C. gigas and P. fucata in the former phylogenetic tree). Alternative model (model = 2, NSsites = 2, fix_omega = 0) and null hypothesis (model = 2, NSsites = 2, fix_omega = 1 and omega = 1) likelihood values were tested using χ2 analysis. Only genes that had Bayes Empirical Bayes (BEB) sites with a posterior probability > 90% and a corrected P value less than 0.1 were considered to have undergone positive selection. A KEGG-pathway enrichment was performed with a χ2 test (if any of the components are over 5) or the Fisher's exact test (all other analyses).

Data availability. Raw reads, assembled genome sequences and annotation are accessible from NCBI under BioProject numbers PRJNA328542 and PRJNA328544, Sequence Read Archive accession numbers SRP078287 and SRP078294, and Whole Genome Shotgun project numbers MJUT00000000 and MJUU00000000. The genome assemblies and annotations are also available from the Dryad Digital Repository at http://dx.doi.org/10.5061/dryad.h9942.

Received 3 August 2016; accepted 16 February 2017; published 3 April 2017

References

1. Van Dover, C. L. The Ecology of Deep-Sea Hydrothermal Vents (Princeton Univ. Press, 2000).
2. Levin, L. A. Ecology of cold seep sediments: interactions of fauna with flow, chemistry and microbes. Oceanogr. Mar. Biol. 43, 1–46 (2005).
3. Sibuet, M. & Olu, K. Biogeography, biodiversity and fluid dependence of deep-sea cold-seep communities at active and passive margins. Deep Sea Res. II 45, 517–567 (1998).

Nature Ecology & Evolution 1, 0121 (2017) | DOI: 10.1038/s41559-017-0121 | www.nature.com/natecolevol
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

[Figure 4 image not reproduced. Panel a: gene tree with tip labels prefixed Bpl, Cgi, Mph and Pfu. Panel b: heat map with tissue columns AM, F, O, G, M and VM and a colour scale labelled "Row Z score" (−2 to 2). Panel c: scaffolds Bpl_scaf_16610, Bpl_scaf_38141 and Bpl_scaf_64881, with a 25 kb scale bar.]

Figure 4 | Expansion of the IAP family in B. platifrons. a, Unrooted genealogy of IAP genes in Bathymodiolus platifrons (Bpl, green), Crassostrea gigas (Cgi, red), Modiolus philippinarum (Mph, yellow) and Pinctada fucata (Pfu, blue). b, Heat map of expression of IAP genes in six tissues. AM, adductor muscle; F, foot; G, gill; O, ovary; M, mantle; VM, visceral mass. c, Genomic arrangement of three IAP clusters. IAP genes are shown in red, whereas other genes are labelled in yellow. Scaffolds Bpl_scaf_16610 and Bpl_scaf_38141 each contain 6 IAP genes that are transcribed in the same direction, whereas scaffold Bpl_scaf_64881 contains 8 IAP genes that are transcribed in opposite directions.
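The colour scale of the heat map in panel b is labelled as a row Z score, meaning that each gene's expression is standardized across the six tissues. A minimal Python sketch of that scaling follows. The gene names and expression values are hypothetical, and the use of the sample standard deviation is an assumption, because the caption does not state how the scores were computed.

```python
from statistics import mean, stdev

# Tissue abbreviations as defined in the Figure 4 caption.
TISSUES = ["AM", "F", "G", "O", "M", "VM"]


def row_z_scores(values):
    """Standardize one gene's expression values across tissues.

    Assumption: sample standard deviation (n - 1 denominator).
    A gene with identical values in all tissues gets zeros.
    """
    m = mean(values)
    s = stdev(values)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


# Hypothetical expression values, one list per gene, ordered as TISSUES.
expression = {
    "IAP_example_1": [1.2, 0.8, 15.6, 2.1, 1.9, 3.3],
    "IAP_example_2": [4.0, 4.4, 3.9, 9.7, 4.1, 4.2],
}

z_scores = {gene: row_z_scores(vals) for gene, vals in expression.items()}

for gene, scores in z_scores.items():
    row = ", ".join(f"{t}={z:+.2f}" for t, z in zip(TISSUES, scores))
    print(f"{gene}: {row}")
```

Scaling by row makes tissues comparable within a gene, so a gene with low overall expression can still show a clear tissue preference.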

4. Little, C. T. S. & Vrijenhoek, R. C. Are hydrothermal vent animals living fossils? Trends Ecol. Evol. 18, 582–588 (2003).
5. Colaço, A. et al. LabHorta: a controlled aquarium system for monitoring physiological characteristics of the hydrothermal vent mussel Bathymodiolus azoricus. ICES J. Mar. Sci. 68, 349–356 (2011).
6. Dubilier, N., Bergin, C. & Lott, C. Symbiotic diversity in marine animals: the art of harnessing chemosynthesis. Nat. Rev. Microbiol. 6, 725–740 (2008).
7. Bettencourt, R. et al. High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus. BMC Genomics 11, 559 (2010).
8. Boutet, I., Jollivet, D., Shillito, B., Moraga, D. & Tanguy, A. Molecular identification of differentially regulated genes in the hydrothermal-vent species Bathymodiolus thermophilus and Paralvinella pandorae in response to temperature. BMC Genomics 10, 222 (2009).
9. Bougerol, M., Boutet, I., LeGuen, D., Jollivet, D. & Tanguy, A. Transcriptomic response of the hydrothermal mussel Bathymodiolus azoricus in experimental exposure to heavy metals is modulated by the Pgm genotype and symbiont content. Mar. Genomics 21, 63–73 (2015).
10. Johnson, S., Won, Y.-J., Harvey, J. & Vrijenhoek, R. A hybrid zone between Bathymodiolus mussel lineages from eastern Pacific hydrothermal vents. BMC Evol. Biol. 13, 21 (2013).
11. DeChaine, E. G. & Cavanaugh, C. M. in Molecular Basis of Symbiosis: Symbioses of Methanotrophs and Deep-Sea Mussels (Mytilidae: Bathymodiolinae) (ed. Overmann, J.) 227–249 (Springer Berlin Heidelberg, 2006).
12. Faure, B., Schaeffer, S. W. & Fisher, C. R. Species distribution and population connectivity of deep-sea mussels at hydrocarbon seeps in the Gulf of Mexico. PLoS ONE 10, e0118460 (2015).
13. Wong, Y. H. et al. High-throughput transcriptome sequencing of the cold seep mussel Bathymodiolus platifrons. Sci. Rep. 5, 16597 (2015).
14. Lorion, J. et al. Adaptive radiation of chemosymbiotic deep-sea mussels. Proc. Biol. Sci. 280, 20131243 (2013).
15. Olu-Le Roy, K., von Cosel, R., Hourdez, S., Carney, S. L. & Jollivet, D. Amphi-Atlantic cold-seep Bathymodiolus species complexes across the equatorial belt. Deep Sea Res. I 54, 1890–1911 (2007).
16. Pruski, A. M. & Dixon, D. R. Heat shock protein expression pattern (HSP70) in the hydrothermal vent mussel Bathymodiolus azoricus. Mar. Environ. Res. 64, 209–224 (2007).
17. Luckenbach, T. & Epel, D. ABCB- and ABCC-type transporters confer multixenobiotic resistance and form an environment-tissue barrier in bivalve gills. Am. J. Physiol. Regul. Integr. Comp. Physiol. 294, R1919–R1929 (2008).
18. Won, Y.-J. et al. Environmental acquisition of thiotrophic endosymbionts by deep-sea mussels of the genus Bathymodiolus. Appl. Environ. Microbiol. 69, 6785–6792 (2003).
19. Hoffmann, J. A., Kafatos, F. C., Janeway, C. A. & Ezekowitz, R. Phylogenetic perspectives in innate immunity. Science 284, 1313–1318 (1999).
20. Baumgarten, S. et al. The genome of Aiptasia, a sea anemone model for coral symbiosis. Proc. Natl Acad. Sci. USA 112, 11893–11898 (2015).


21. Dziarski, R. & Gupta, D. The peptidoglycan recognition proteins (PGRPs). Genome Biol. 7, 1–13 (2006).
22. Biondo, C. et al. The role of endosomal Toll-like receptors in bacterial recognition. Eur. Rev. Med. Pharmacol. Sci. 16, 1506–1512 (2012).
23. Chen, K. & Williams, K. J. Molecular mediators for raft-dependent endocytosis of syndecan-1, a highly conserved, multifunctional receptor. J. Biol. Chem. 288, 13988–13999 (2013).
24. de Beco, S., Gueudry, C., Amblard, F. & Coscoy, S. Endocytosis is required for E-cadherin redistribution at mature adherens junctions. Proc. Natl Acad. Sci. USA 106, 7010–7015 (2009).
25. Derivery, E. et al. The Arp2/3 activator WASH controls the fission of endosomes through a large multiprotein complex. Dev. Cell 17, 712–723 (2009).
26. Dunn, S. R. & Weis, V. M. Apoptosis as a post-phagocytic winnowing mechanism in a coral–dinoflagellate mutualism. Environ. Microbiol. 11, 268–276 (2009).
27. Vigneron, A. et al. Insects recycle endosymbionts when the benefit is over. Curr. Biol. 24, 2267–2273 (2014).
28. Romero, A., Novoa, B. & Figueras, A. The complexity of apoptotic cell death in mollusks: an update. Fish Shellfish Immunol. 46, 79–87 (2015).
29. Detree, C. et al. Multiple I-type lysozymes in the hydrothermal vent mussel Bathymodiolus azoricus and their role in symbiotic plasticity. PLoS ONE 11, e0148988 (2016).
30. Guezi, H. et al. The potential implication of apoptosis in the control of chemosynthetic symbionts in Bathymodiolus thermophilus. Fish Shellfish Immunol. 34, 1709 (2013).
31. Nussbaumer, A. D., Fisher, C. R. & Bright, M. Horizontal endosymbiont transmission in hydrothermal vent tubeworms. Nature 441, 345–348 (2006).
32. Canfield, D. & Des Marais, D. Aerobic sulfate reduction in microbial mats. Science 251, 1471–1473 (1991).
33. Lee, S. & Morton, B. in The Malacofauna of Hong Kong and Southern China II: The Hong Kong Mytilidae (eds Morton, B. & Dudgeon, D.) 49–76 (Hong Kong Univ. Press, 1985).
34. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
35. Kajitani, R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24, 1384–1395 (2014).
36. Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. (1996–2010).
37. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
38. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
39. Li, L., Stoeckert, C. J. & Roos, D. S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189 (2003).
40. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
41. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
42. Stamatakis, A., Ludwig, T. & Meier, H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463 (2005).
43. dos Reis, M. & Yang, Z. Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol. Biol. Evol. 28, 2161–2172 (2011).
44. Benton, M. J., Donoghue, P. C. J. & Asher, R. J. in The Timetree of Life: Calibrating and Constraining Molecular Clocks (eds Hedges, S. B. & Kumar, S.) 35–86 (Oxford Univ. Press, 2009).
45. Benton, M. J. et al. Constraints on the timescale of evolutionary history. Palaeontol. Electron. 18, 18.1.1FC (2015).
46. Albertin, C. B. et al. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524, 220–224 (2015).
47. Han, M. V., Thomas, G. W. C., Lugo-Martinez, J. & Hahn, M. W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 30, 1987–1997 (2013).
48. Sun, J. et al. First proteome of the egg perivitelline fluid of a freshwater gastropod with aerial oviposition. J. Proteome Res. 11, 4240–4248 (2012).
49. Barry, J. P. et al. Methane-based symbiosis in a mussel, Bathymodiolus platifrons, from cold seeps in Sagami Bay, Japan. Invertebr. Biol. 121, 47–54 (2002).
50. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

Acknowledgements
We thank the crew of research vessel (R/V) Xiangyanghong 9 and the operation team of the Jiaolong, and the crew of R/V Kairei, the operation team of remotely operated vehicle (ROV) Kaiko Mk-IV, and the on-board scientists during the cruise of KR15-17 for facilitating sampling of B. platifrons. We also thank M. Sun for her help with the figures and her continuous encouragement to the first author. This study was supported by the Strategic Priority Research Program of Chinese Academy of Sciences (grant XDB06010102 to P.-Y.Q.), Hong Kong Baptist University (grant FRG2/14-15/002 to J.-W.Q.), and Scientific and Technical Innovation Council of Shenzhen and Guangdong Natural Science Foundation (grant 827000012, JCYJ20150625102622556, 2014A030310230 to Yu Z.). Genome sequencing was conducted by BGI (Shenzhen), Macrogen (Seoul), and the Roy J. Carver Biotechnology Center of University of Illinois Urbana–Champaign (UIUC). The mass spectrometry analysis was performed at the Instrumental Analysis & Research Center, Sun Yat-Sen University.

Author contributions
J.S., J.-W.Q. and P.-Y.Q. conceived and led the study. J.-W.Q. and J.S. collected the samples. T.X. extracted the DNA and RNA, constructed the phylogenetic tree, and analysed the apoptosis-related gene families. J.S. and C.J.F. assembled the genome. J.S., C.J.F. and R.L. annotated the genomes. J.S. and Yang Z. analysed the immune-related gene families. Yanj Z. analysed the HSP70 gene family. J.H.L.H., W.N., and F.K.M.C. analysed microRNA and Hox genes. Y.L. analysed the positively selected genes. H.M., Yu Z., and W.Z. analysed the proteome. J.S. performed the other bioinformatic analysis. All of the authors read, wrote and approved the manuscript.

Additional information
Supplementary information is available for this paper.
Reprints and permissions information is available at www.nature.com/reprints.
Correspondence and requests for materials should be addressed to J.-W.Q. and P.-Y.Q.
How to cite this article: Sun, J. et al. Adaptation to deep-sea chemosynthetic environments as revealed by mussel genomes. Nat. Ecol. Evol. 1, 0121 (2017).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Competing interests
The authors declare no competing financial interests.

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
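As an illustration of the branch-site comparison described in the Methods (alternative versus null codeml models, tested with a χ² analysis, followed by a corrected P value cut-off of 0.1), the following minimal Python sketch computes a likelihood-ratio test from two log-likelihoods. It assumes one degree of freedom, which the text does not state, and uses Benjamini–Hochberg adjustment only as a stand-in for the unspecified P value correction. The function names, group names and log-likelihood values are hypothetical, and the BEB site criterion is not modelled.

```python
import math


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 d.f. (assumed)."""
    if stat <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Return (2 * delta lnL, P value) for the alternative vs the null model."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf_df1(stat)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Adjusted P values; a stand-in for the paper's unspecified correction."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical codeml log-likelihoods (null, alternative) per orthologue group.
log_likelihoods = {
    "group_A": (-5230.41, -5224.10),
    "group_B": (-1876.92, -1876.80),
    "group_C": (-3310.05, -3306.92),
}

genes = list(log_likelihoods)
raw_p = [branch_site_lrt(*log_likelihoods[g])[1] for g in genes]
corrected = benjamini_hochberg(raw_p)

# Keep groups whose corrected P value is below the 0.1 cut-off from the text.
candidates = [g for g, q in zip(genes, corrected) if q < 0.1]
print(candidates)
```

In practice the two log-likelihoods would be read from the codeml output of each run; parsing those files is omitted here.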
