Supplementary Information for:

Tripartite holobiont system in a vent snail broadens the concept of chemosymbiosis

Yi Yang1, Jin Sun1, Chong Chen2, Yadong Zhou3, Yi Lan1, Cindy Lee Van Dover4, Chunsheng Wang3,5, Jian-Wen Qiu6, Pei-Yuan Qian1*

* Corresponding author: Pei-Yuan Qian
Email: [email protected]

This PDF file includes:

Figures S1 to S13
Tables S1 to S3
Supplementary Note 1 to Note 9
Legends for Datasets S1 to S4
SI References

Other supplementary materials for this manuscript include the following:

Datasets S1 to S4


Table of Contents

Supplementary Figures
  Figure S1. AED score distribution deduced from all of the gene models.
  Figure S2. Unique orthologous genes of the Wocan Alviniconcha marisindica based on genomic comparisons with the other three Lophotrochozoa
  Figure S3. Gill endosymbionts and intestinal ectosymbionts of Alviniconcha marisindica from the Wocan vent field
  Figure S4. Genomic comparisons and gene family analyses of the Wocan Alviniconcha marisindica endosymbiont and the other four campylobacterotal close relatives
  Figure S5. DNA-repair genes in representative Proteobacteria
  Figure S6. The abundance of the symbionts in gill and intestine of snail Alviniconcha marisindica
  Figure S7. Pan-genome of 23 Alviniconcha marisindica endosymbionts from the Wocan vent field
  Figure S8. Genome-based comparisons of 23 endosymbiotic isolates of Alviniconcha marisindica from the Wocan vent field
  Figure S9. Transcriptional activity of genes participating in tripartite syntrophic interactions in the Wocan Alviniconcha marisindica holobiont
  Figure S10. Nutrient biosynthesis capability of the Wocan Alviniconcha marisindica holobiont
  Figure S11. Transcriptional activity of genes participating in the nutrient digestion and absorption of the Wocan Alviniconcha marisindica host and bacterial immune evasion and defence
  Figure S12. Gene Ontology (GO) enrichment network of differentially expressed genes (DEGs)
  Figure S13. Bathymetric map of the hydrothermal vents over Central Indian Ridge and Carlsberg Ridge.
Supplementary Tables
  Table S1. The BUSCO score for different versions of the assembly.
  Table S2. Quast genome assembly assessment report of Alviniconcha marisindica.
  Table S3. The annotation and expression levels of genes involved in encoding representative exo-hydrolases in intestinal flora of three Alviniconcha marisindica individuals.
Supplementary Notes
  Supplementary Note 1
    Figure S14. A photograph of snail Alviniconcha marisindica collected from the WHF and stored in absolute ethanol. Scale bar = 10 cm.
    Figure S15. SEM images of radula.
    Figure S16. Scatter plot of shell width (diameter) vs. shell height across a size range of 19 specimens from Alviniconcha snails
    Table S4. Shell parameters of Alviniconcha marisindica
    Figure S17. Molecular of the population of Alviniconcha snails from Wocan vent site on Carlsberg Ridge
  Supplementary Note 2
    Table S5. The nucleic acid preparation and Illumina paired-end sequencing information of different tissues/organs from 16 Alviniconcha marisindica individuals in this study.
    Table S6. The PacBio and ONT sequencing information of DNA from one Alviniconcha marisindica individual for holobiont genome assembly.
  Supplementary Note 3
    Figure S18. A 17-mer histogram for the Alviniconcha marisindica genome.
    Table S7. The statistics of the genome assembly. The number in bold indicates the best among all of the assemblers.
    Table S8. Information of the bacterial sequences removed from the assembled host contigs.
  Supplementary Note 4
    Table S9. The classified repeat content in the genome.
    Table S10. Lophotrochozoan genomes that were used for the phylogeny and gene family analyses.
    Figure S19. Genome-based phylogeny of selected taxa showing the position of the Alviniconcha marisindica among lophotrochozoans.
    Table S11. The time constraint that was applied to calibrate the divergence time in the MCMCTree analysis.
  Supplementary Note 5
    Figure S20. The maximum likelihood (ML) phylogenetic tree of endosymbionts of Alviniconcha snails and other marine bacteria based on 16S rRNA gene.
    Table S12. The general genomic features of A. marisindica symbionts.
    Figure S21. The overview of genome of Mollicutes symbiont in the gill of A. marisindica constructed using GView Server.
  Supplementary Note 6
  Supplementary Note 7
    Figure S22. Genomic comparison analysis across representatives of Campylobacterota.
  Supplementary Note 8
  Supplementary Note 9
    Figure S23. Gene Ontology (GO) functional annotations of highly expressed genes in the digestive gland, foot, gill, and intestine.
    Figure S24. Differentially expressed genes (DEGs) in the intestine and gill of Alviniconcha marisindica.
    Figure S25. Transcriptional activity of genes participating in the globin of Wocan Alviniconcha marisindica for substrates transportation.
Dataset S1 (separate file). The annotation and expression levels of genes involved in encoding hydrolases in intestinal flora of three A. marisindica individuals.
Dataset S2 (separate file). The annotation of genes predicted from the metagenome of intestinal flora from three snail individuals.
Dataset S3 (separate file). The loss-of-function orthologous genes from the endosymbionts of Alviniconcha marisindica and four campylobacterotal references through their pair-wise genomic comparisons based on a profile-based method.
Dataset S4 (separate file). The annotation of highly expressed genes (Fold Change>2) in different tissues (gill, intestine, digestive gland, and mantle) of A. marisindica.
SI References

Supplementary Figures

Figure S1. AED score distribution deduced from all of the gene models.


Figure S2. Unique orthologous genes of the Wocan Alviniconcha marisindica based on genomic comparisons with the other three Lophotrochozoa. Distribution of Alviniconcha-unique orthologous genes in different functional categories based on (A) KOG and (B) KEGG annotations. (C) Expression levels of Alviniconcha-unique genes in the foot (Ft), neck (Nk), mantle (Man), digestive gland (DG), gill (Gi), and intestine (Int) tissues are shown in the heat map.

[Figure S3 panels: (A) scatter plot of GC content against sequencing coverage (logarithmic axis, 1 to 10000); (B) stacked bar charts of percentage per individual, titled "Intestinal microbes (Phylum > 0.3%)" and "Intestinal microbes ( > 1%)".]

Figure S3. Gill endosymbionts and intestinal ectosymbionts of Alviniconcha marisindica from the Wocan vent field. (A) Differential coverage binning of Alviniconcha and its gill endosymbiont (one dominant phylotype, Campylobacterota). Each circle represents a contig and its area indicates the contig length. The colour represents the systematic affinity of the contig. The contigs of A. marisindica and its endosymbiont are grouped based on their different levels of GC content and sequencing coverage. Red circles represent the single dominant Campylobacterota phylotype, and purple circles represent one less abundant Mollicutes phylotype. (B) Abundance and community structure of the ectosymbionts in the intestine. Microbial taxonomic structures are deduced from the intestinal metagenomes. The community compositions are displayed at the phylum and genus levels based on Kaiju classification. Phyla totalling >0.3% and genera totalling >1% of the samples are shown, respectively.

[Figure S4 panels: (A) circular genome comparison with rings labelled Sulfurovum sp. NBC37-1, Sulfurovum riftiae, Sulfurovum lithotrophicum, L. satsuma endosymbiont, A. marisindica endosymbiont, GC content, and GC skew, coloured by COG category; (B) and (C) bar charts of frequency by function class.]

KEGG classification legend: A: Oxidoreductases; B: Transferases; C: Hydrolases; D: Membrane trafficking; E: DNA repair and recombination proteins; F: Chaperones and folding catalysts; G: Transcription factors; H: Secretion system; I: Lipopolysaccharide biosynthesis proteins.

COG category legend: C: Energy production and conversion; D: Cell cycle control, cell division; E: Amino acid transport and metabolism; H: Coenzyme transport and metabolism; I: Lipid transport and metabolism; J: Translation, ribosomal structure and biogenesis; K: Transcription; L: Replication, recombination and repair; M: Cell wall/membrane/envelope biogenesis; O: Posttranslational modification, protein turnover; P: Inorganic ion transport and metabolism; R: General function prediction only; S: Function unknown; T: Signal transduction mechanisms; U: Intracellular trafficking, secretion, and vesicular; V: Defense mechanisms; Z: Cytoskeleton.

Figure S4. Genomic comparisons and gene family analyses of the Wocan Alviniconcha marisindica endosymbiont and the other four campylobacterotal close relatives. (A) Genome-similarity comparisons of the endosymbiont of A. marisindica and the other four Campylobacterota constructed using GView. From outside to the centre: genes of four Campylobacterota on forward strand, the innermost circular part represents the endosymbiont of A. marisindica, including genes on forward strand, GC content (%), GC skew. Colours indicate categories of clusters of orthologous groups (COGs) and the scale is in kbp. The identifiers of outside-to-inside rings are listed on the right. The unique orthologous genes of the A. marisindica endosymbiont are classified into different functional categories based on (B) KEGG and (C) COG annotations.

[Figure S5 panel: presence/absence matrix. Columns: Sal, Lsa, Sli, Sri, SNBC37-1, Hpy, Cje, Wsu, Eco, Hin, Cvi, Gsu. Rows: LexA, MutH, MutL, MutM, MutS1, MutY, RecA, RecB, RecC, RecD, RecF, RecG, RecJ, RecN, RecO, RecQ, RecR, RecX, RuvA, RuvB, RuvC.]

Figure S5. DNA-repair genes in representative Proteobacteria. Blue indicates absence, and yellow indicates presence. Sal, Sulfurovum alviniconcha CR; Lsa, the endosymbiont of Lamellibrachia satsuma; Sli, Sulfurovum lithotrophicum; Sri, Sulfurovum riftiae; SNBC37-1, Sulfurovum sp. NBC37-1; Hpy, Helicobacter pylori 26695; Cje, Campylobacter jejuni NCTC11168; Wsu, Wolinella succinogenes DSM1740; Eco, Escherichia coli K12 (γ-proteobacterium); Hin, Haemophilus influenzae Rd KW20 (γ-proteobacterium); Cvi, Chromobacterium violaceum ATCC12472 (β-proteobacterium); Gsu, Geobacter sulfurreducens PCA (δ-proteobacterium).

Figure S6. The abundance of the symbionts in gill and intestine of snail Alviniconcha marisindica. (A) The abundance of campylobacterotal endosymbionts in the gill metagenomes. (B) The abundance of ectosymbionts in the intestinal metagenomes.

Figure S7. Pan-genome of 23 Alviniconcha marisindica endosymbionts from the Wocan vent field. Pan-genes are classified into different functional categories based on COG annotations. The number of genes in each category is shown in the histogram. In particular, genes involved in energy production and conversion, nutrient biosynthesis, and metabolism are classified into different metabolic pathways based on KEGG annotations. The percentage of genes involved in each type of metabolic pathway is shown in the pie chart.

[Figure S8 panels: (A) three-dimensional principal component plot (axes x, y, z) of isolates labelled 1 to 3 and 4-A to 13-B; (B) phylogenomic tree of the same isolates (tree scale: 0.001; bootstrap support classes: 100, > 90, > 70).]

Figure S8. Genome-based comparisons of 23 endosymbiotic isolates of Alviniconcha marisindica from the Wocan vent field. (A) Principal component analysis under the BLOSUM62 model and (B) phylogenomic analysis on the orthologous proteins in 941 single-copy shared orthologues of 23 endosymbionts from 13 A. marisindica individuals.

Figure S9. Transcriptional activity of genes participating in tripartite syntrophic interactions in the Wocan Alviniconcha marisindica holobiont. Expression levels of genes that participate in (A) chemoautotrophy, nutrient biosynthesis and metabolism, and nitrogen and carbon metabolism in the endosymbiont of A. marisindica, (B) nutrient biosynthesis and food digestion in the A. marisindica host, and (C) gut bacterial exo-hydrolase biosynthesis for intestinal food digestion in the A. marisindica host, shown for three A. marisindica individuals.

Figure S10. Nutrient biosynthesis capability of the Wocan Alviniconcha marisindica holobiont. Venn diagram showing the nutrients (A) with or (B) without complete biosynthesis pathways in the genomes of A. marisindica and its symbionts. Amino acids (black colour): A – Alanine, bA – β-Alanine, C – Cysteine, D – Aspartate (aspartic acid), E – Glutamic acid, F – Phenylalanine, G – Glycine, H – Histidine, hypoTa – Hypotaurine, I – Isoleucine, K – Lysine, L – Leucine, M – Methionine, N – Asparagine, Orn – Ornithine, P – Proline, Q – Glutamine, R – Arginine, S – Serine, T – Threonine, Ta – Taurine, V – Valine, W – Tryptophan, Y – Tyrosine; Vitamins/cofactors (red colour): B1 – Thiamin, B2 – Riboflavin, B3 – Nicotinate and nicotinamide, B5 – Pantothenate, B6 – Pyridoxine, B7 – Biotin, B9 – Folate, B12 – Cobalamin, K1 – Phylloquinone, K2 – Menaquinone, CoA – Coenzyme A, CoQ – Coenzyme Q.

Figure S11. Transcriptional activity of genes participating in the nutrient digestion and absorption of the Wocan Alviniconcha marisindica host and bacterial immune evasion and defence. Heat maps of transcriptional activity of genes that (A) encode various hydrolases, including proteases, glycoside hydrolases, and peptidoglycan recognition proteins (PGRPs), in the foot, neck, mantle, digestive gland (DG), intestine, and gill tissues, and (B) participate in bacterial surface-associated virulence factors, surface modification, protease synthesis, and secretion in the gill endosymbiont and the intestinal ectosymbionts. Each grid cell in the heat map represents a gene identified in the respective sample. The colour represents the gene expression level (based on normalised TPM values). The annotated gene names and their functional classifications are listed on the sides.

Figure S12. Gene Ontology (GO) enrichment network of differentially expressed genes (DEGs). The significantly (p-value < 0.01) enriched GO terms of selected highly expressed genes in the (A) intestine and (B) gills of Alviniconcha marisindica are clustered according to their functional category. Each node represents an enriched term. Lines connecting pairs of nodes show the intra-cluster and inter-cluster similarities of enriched terms. The colour code represents different cluster annotations.

Figure S13. Bathymetric map of the hydrothermal vents over Central Indian Ridge and Carlsberg Ridge. The Wocan Hydrothermal Field (WHF) site sampled in this study is on the Carlsberg Ridge and is marked as one point, ‘Wocan-3’. ‘Wocan-1’ and ‘Wocan-2’ were found by the DY 24th cruise in 2012 and are marked as one point. The three vent sites ‘Solitaire’, ‘Edmond’, and ‘Kairei’, which are located on the Central Indian Ridge, are marked as three points.

Supplementary Tables

Table S1. The BUSCO score for different versions of the assembly.

Version | BUSCO score
After 1 round of Pilon correction | C:90.1% [S:88.7%,D:1.4%], F:1.2%, M:8.7%, n:954
After 2 rounds of Pilon correction | C:95.6% [S:94.2%,D:1.4%], F:0.8%, M:3.6%, n:954
After Redundans | C:96.5% [S:95.9%,D:0.6%], F:1.2%, M:2.3%, n:954

Table S2. Quast genome assembly assessment report of Alviniconcha marisindica.

Assembly parameters | Assessment results
No. of contigs (>= 5000 bp) | 3,926
No. of contigs (>= 10000 bp) | 2,814
No. of contigs (>= 25000 bp) | 2,097
No. of contigs (>= 50000 bp) | 1,762
Total length (>= 5000 bp) | 829,612,046
Total length (>= 10000 bp) | 821,908,196
Total length (>= 25000 bp) | 810,769,599
Total length (>= 50000 bp) | 798,818,352
No. of contigs | 3,926
Largest contig (bp) | 4,734,527
Total length (bp) | 829,612,046
GC (%) | 45.47
N50 | 727,552
N75 | 366,972
L50 | 336
L75 | 743
Total reads | 17,708,417
Mapped (%) | 98.72
Avg. coverage depth | 81
Coverage >= 1X (%) | 99.40
Coverage >= 5X (%) | 99.01
Coverage >= 10X (%) | 99.18
N's per 100 kbp | 0.02
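The N50, N75, L50, and L75 entries in Table S2 follow the usual contiguity definitions. As a minimal sketch of those definitions only (not Quast's own code), the function below computes Nx and Lx from a list of contig lengths; the function name `nx_and_lx` and the toy lengths are hypothetical and are not data from this study.

```python
def nx_and_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths in bp.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `fraction` of the total assembly
    length; Lx is the number of contigs in that set.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running_total = 0
    for rank, length in enumerate(ordered, start=1):
        running_total += length
        if running_total >= threshold:
            return length, rank
    # Only reached if `fraction` is greater than 1.
    return ordered[-1], len(ordered)


# Hypothetical toy assembly (lengths in bp), not data from this study.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = nx_and_lx(toy_lengths, 0.5)
n75, l75 = nx_and_lx(toy_lengths, 0.75)
print(n50, l50, n75, l75)
```

Under these definitions, the values in Table S2 would mean that the 336 longest contigs, each at least 727,552 bp long, together cover at least half of the 829,612,046 bp assembly.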

Table S3. The annotation and expression levels of genes involved in encoding representative exo-hydrolases in intestinal flora of three Alviniconcha marisindica individuals.

Gene_ID | Annotation | Gene | COG | Best Tax-Level | TPM

Individual 38I-DV129-5
k141_240474_2 | sialate O-acetylesterase-like | | G | Cytophagia | 283.14
k141_338559_2 | sialate O-acetylesterase | | G | Leeuwenhoekiella | 122.07
k141_402846_1 | sialate O-acetylesterase-like | | G | Sphingobacteriia | 61.69
k141_216833_3 | Alpha amylase, catalytic domain | treY | G | Microbacteriaceae | 490.45
k141_34583_1 | Alpha-amylase domain | | G | Sphingomonadales | 304.57
k141_698499_2 | maltase A2-like | | G | Lactobacillaceae | 328.87
k141_698499_2 | maltase A2-like | | G | Lactobacillaceae | 702.71
k141_240474_2 | sialate O-acetylesterase-like | | G | Cytophagia | 262.29
k141_338559_2 | sialate O-acetylesterase | | G | Leeuwenhoekiella | 112.56
k141_402846_1 | sialate O-acetylesterase-like | | G | Sphingobacteriia | 4050.85
k141_240474_2 | sialate O-acetylesterase-like | | G | Cytophagia | 262.29
k141_216833_3 | Alpha amylase, catalytic domain | treY | G | Microbacteriaceae | 30.38
k141_295070_4 | alpha-glucosidase | aglA | G | Myxococcales | 46.35
k141_34583_1 | Alpha-amylase domain | | G | Sphingomonadales | 232.23
k141_188836_2 | GDSL-like Lipase/Acylhydrolase family | | E | Cytophagia | 18.91
k141_283355_1 | GDSL-like Lipase/Acylhydrolase family | | E | Cytophagia | 1.65
k141_513053_1 | GDSL-like Lipase/Acylhydrolase family | tesA | E | Leeuwenhoekiella | 36.49
k141_60153_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 45.59
k141_479102_1 | PFAM GDSL-like Lipase Acylhydrolase | | E | Nostocales | 1.55
k141_299134_1 | GDSL-like Lipase/Acylhydrolase family | | E | Proteobacteria | 10.65
k141_747374_1 | G-D-S-L family lipolytic protein | | E | Proteobacteria | 6.50
k141_435642_1 | GDSL-like Lipase/Acylhydrolase family | | E | Sphingomonadales | 156.20
k141_572192_2 | GDSL-like Lipase/Acylhydrolase family | | E | Sphingomonadales | 43.11
k141_72843_1 | GDSL-like Lipase/Acylhydrolase family | | E | Streptosporangiales | 4.95
k141_499802_1 | carboxylesterase 3B | lipT | I | Gordoniaceae | 383.44
k141_344000_3 | para-nitrobenzyl esterase | pnbA | I | Paenibacillaceae | 1.55
k141_219741_3 | esterase E4-like | | I | Sphingomonadales | 114.20
k141_175913_1 | Exodeoxyribonuclease III | | L | Bacteria | 98.89
k141_113311_7 | Exodeoxyribonuclease III | | L | Bacteria | 47.97
k141_8172_1 | Exodeoxyribonuclease III | | L | Bacteria | 4.49
k141_40481_1 | Exodeoxyribonuclease III | | S | Proteobacteria | 7.73
k141_262990_1 | Exodeoxyribonuclease III | holA | L | Tenericutes | 608.82
k141_175913_1 | Exodeoxyribonuclease III | | L | Bacteria | 98.89
k141_39157_1 | Trypsin | | E | Corynebacteriaceae | 2420.31
k141_791929_1 | Trypsin-like serine protease | | M | Vibrionales | 33.11
k141_39157_1 | Trypsin | | E | Corynebacteriaceae | 2420.31
k141_562146_2 | neutral protease-like | lasB | Q | Colwelliaceae | 2.96
k141_541590_1 | neutral protease-like | lasB | E | Vibrionales | 75.77
k141_562146_2 | neutral protease-like | lasB | Q | Colwelliaceae | 2.96
k141_46506_1 | Papain-like cysteine protease AvrRpt2 | | H | Bacillus | 1557.63
k141_46506_1 | Papain-like cysteine protease AvrRpt2 | | H | Bacillus | 1557.63
k141_824392_1 | hemagglutinin | lasB | E | Shewanellaceae | 1.56
k141_776239_3 | Protease prsW family | | S | Dermatophilaceae | 305.60

Individual 38I-DV129-15
k141_688074_2 | sialate O-acetylesterase-like | | G | Cytophagia | 1.91
k141_543608_2 | sialate O-acetylesterase | | G | Leeuwenhoekiella | 47.14
k141_42238_1 | SMART alpha amylase catalytic sub domain | | G | Chloroflexi | 324.57
k141_352388_1 | SMART alpha amylase catalytic sub domain | | G | Chloroflexi | 115.94
k141_819448_3 | Maltogenic Amylase, C-terminal domain | treS | G | Rhizobiaceae | 534.59
k141_236157_3 | 1,4-alpha-glucan-branching enzyme | GLC3 | G | Chaetomiaceae | 178.97
k141_908504_2 | GDSL-like Lipase/Acylhydrolase | | E | Alteromonadaceae | 16.04
k141_660256_1 | GDSL-like Lipase/Acylhydrolase family | ypmR | E | Carnobacteriaceae | 51.37
k141_318019_1 | GDSL-like Lipase/Acylhydrolase family | | E | Cytophagia | 0.76
k141_672619_1 | GDSL-like Lipase/Acylhydrolase family | | E | Flavobacteriia | 6.04
k141_366024_1 | GDSL-like Lipase/Acylhydrolase family | tesA | E | Leeuwenhoekiella | 0.97
k141_637991_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 372.32
k141_519721_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 66.99
k141_781927_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 59.82
k141_39007_1 | PFAM GDSL-like Lipase Acylhydrolase | | E | Nostocales | 8.19
k141_640620_1 | PFAM GDSL-like Lipase Acylhydrolase | | E | Nostocales | 2.14
k141_353962_1 | GDSL-like Lipase/Acylhydrolase family | | S | Pleosporales | 15.63
k141_302376_2 | GDSL-like Lipase/Acylhydrolase | | E | Porphyromonadaceae | 3.02
k141_319940_1 | GDSL-like Lipase/Acylhydrolase family | | E | Sphingomonadales | 39.71
k141_368234_1 | GDSL-like Lipase/Acylhydrolase family | | E | Streptosporangiales | 1.30
k141_908504_2 | GDSL-like Lipase/Acylhydrolase | | E | Alteromonadaceae | 16.04
k141_463823_1 | Esterase FE4 | | G | Ascomycota | 39.42
k141_542556_1 | Exodeoxyribonuclease III | | G | Ascomycota | 2.70
k141_664301_1 | Belongs to the type-B carboxylesterase lipase family | | I | Eurotiales | 48.53
k141_238923_1 | carboxylesterase 3B | lipT | I | Gordoniaceae | 83.59
k141_712297_1 | Exodeoxyribonuclease III | | | Agaricomycetes incertae sedis | 0.75
k141_542556_1 | Exodeoxyribonuclease III | | G | Ascomycota | 2.70
k141_621930_1 | Exodeoxyribonuclease III | | L | Bacteria | 110.90
k141_611192_1 | Exodeoxyribonuclease III | | L | Bacteria | 13.32
k141_49768_1 | Exodeoxyribonuclease III | | L | Bacteria | 6.64
k141_489964_2 | Exodeoxyribonuclease III | | L | Bacteria | 4.03
k141_322147_1 | Exodeoxyribonuclease III | ypmS | S | Listeriaceae | 19.94
k141_448732_3 | Exodeoxyribonuclease III | exoA | L | Neisseriales | 3.38
k141_560323_2 | Exodeoxyribonuclease III | | S | Proteobacteria | 405.28
k141_711647_1 | Exodeoxyribonuclease III | NAR1 | Y | Taphrinomycotina | 22.07
k141_396224_1 | fibrinolytic enzyme, isozyme C-like | | O | Actinobacteria | 139.91
k141_246661_1 | chymotrypsin-like serine proteinase | | O | Actinobacteria | 3.24
k141_2041_1 | Trypsin | | E | Corynebacteriaceae | 613.73
k141_396224_1 | fibrinolytic enzyme, isozyme C-like | | O | Actinobacteria | 139.91
k141_246661_1 | chymotrypsin-like serine proteinase | | O | Actinobacteria | 3.24
k141_686364_2 | neutral protease-like | lasB | Q | Colwelliaceae | 0.40
k141_462614_1 | neutral protease-like | lasB | E | Gammaproteobacteria | 8.32
k141_148422_1 | neutral protease-like | lasB | E | Vibrionales | 15.26
k141_462614_1 | neutral protease-like | lasB | E | Gammaproteobacteria | 8.32
k141_825267_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 4.42
k141_605305_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 2.49
k141_534720_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 1.73
k141_470447_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 1.10
k141_726180_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 1.06
k141_686583_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 0.87
k141_417641_4 | Protease prsW family | | S | Dermatophilaceae | 64.06
k141_890620_1 | matrilysin family metalloendoprotease | | O | Proteobacteria | 35.72
k141_847292_1 | OTU-like cysteine protease | | OT | Eurotiales | 5.79

Individual 38I-DV131-6
k141_623736_1 | sialate O-acetylesterase | | G | Leeuwenhoekiella | 27.19
k141_469403_1 | sialate O-acetylesterase-like | | G | Sphingobacteriia | 2004.59
k141_366488_2 | SMART alpha amylase catalytic sub domain | | G | Chloroflexi | 85.31
k141_758050_2 | maltase A2-like | | G | Lactobacillaceae | 226.31
k141_513093_1 | maltase A3-like | | G | Leuconostocaceae | 1.54
k141_469403_1 | sialate O-acetylesterase-like | | G | Sphingobacteriia | 2004.59
k141_461237_1 | alpha-glucosidase | malZ | G | Oceanospirillales | 14.50
k141_474653_5 | 1,4-alpha-glucan-branching enzyme | GLC3 | G | Chaetomiaceae | 466.84
k141_366488_2 | SMART alpha amylase catalytic sub domain | | G | Chloroflexi | 85.31
k141_511492_1 | GDSL-like Lipase/Acylhydrolase | | E | Alteromonadaceae | 12.12
k141_290837_1 | GDSL-like Lipase/Acylhydrolase | | E | Clostridia | 2.77
k141_454472_2 | Lysophospholipase L1 | | E | Cytophagia | 11.80
k141_652332_1 | hydrolase GDSL | | E | Cytophagia | 3.40
k141_566963_1 | GDSL-like Lipase/Acylhydrolase family | | E | Flavobacteriia | 41.46
k141_23211_1 | GDSL-like Lipase/Acylhydrolase family | | E | Hyphomonadaceae | 484.00
k141_483011_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 152.28
k141_445749_1 | GDSL-like Lipase/Acylhydrolase | | E | Micromonosporales | 40.19
k141_599789_1 | GDSL-like Lipase/Acylhydrolase | estA | E | Sphingobacteriia | 12.80
k141_553159_2 | GDSL-like Lipase/Acylhydrolase family | | E | Sphingomonadales | 37.03
k141_546352_1 | G-D-S-L family lipolytic protein | | E | Sphingomonadales | 4.12
k141_511492_1 | GDSL-like Lipase/Acylhydrolase | | E | Alteromonadaceae | 12.12
k141_378516_2 | Belongs to the type-B carboxylesterase lipase family | | I | Actinobacteria | 32.96
k141_776825_1 | para-nitrobenzyl esterase | pnbA | I | Bacillus | 45.32
k141_564754_1 | Belongs to the type-B carboxylesterase lipase family | | I | Eurotiales | 107.07
k141_568994_1 | carboxylesterase 3B | lipT | I | Gordoniaceae | 360.87
k141_785481_1 | Belongs to the type-B carboxylesterase lipase family | lipT | I | Mycobacteriaceae | 209.83
k141_166509_1 | Belongs to the type-B carboxylesterase lipase family | | T | Nectriaceae | 26.71
k141_378516_2 | Belongs to the type-B carboxylesterase lipase family | | I | Actinobacteria | 32.96
k141_62324_7 | Exodeoxyribonuclease III | exoA | L | Alteromonadaceae | 12.69
k141_603633_1 | Exodeoxyribonuclease III | | L | Bacteria | 9.79
k141_354178_2 | Exodeoxyribonuclease III | | L | Bacteria | 6.34
k141_199648_2 | Exodeoxyribonuclease III | | L | Bacteria | 5.53
k141_25149_1 | Exodeoxyribonuclease III | ypmS | S | Listeriaceae | 80.29
k141_46566_1 | Exodeoxyribonuclease III | | S | Proteobacteria | 3.20
k141_283492_1 | Exodeoxyribonuclease III | NAR1 | Y | Taphrinomycotina | 32.33
k141_281926_1 | Exodeoxyribonuclease III | holA | L | Tenericutes | 108.10
k141_62324_7 | Exodeoxyribonuclease III | exoA | L | Alteromonadaceae | 12.69
k141_31145_1 | Trypsin | | E | Corynebacteriaceae | 303.71
k141_111587_4 | neutral protease-like | lasB | Q | Colwelliaceae | 2.54
k141_385692_1 | neutral protease-like | lasB | E | Gammaproteobacteria | 1.76
k141_588201_1 | neutral protease-like | lasB | E | Vibrionales | 10.02
k141_111587_4 | neutral protease-like | lasB | Q | Colwelliaceae | 2.54
k141_46789_4 | Papain-like cysteine protease AvrRpt2 | | H | Bacillus | 8.81
k141_445402_1 | peptidase domain protein | | S | Clostridia | 1.71
k141_660310_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 0.74
k141_534044_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 0.68
k141_488656_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 0.65
k141_774423_1 | Papain-like cysteine protease AvrRpt2 | | S | Clostridia | 0.44
k141_46789_4 | Papain-like cysteine protease AvrRpt2 | | H | Bacillus | 8.81
k141_759841_8 | Protease prsW family | | S | Dermatophilaceae | 113.86
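The expression levels in Table S3 are reported as TPM (transcripts per million). The source does not describe how these values were computed, so the following is only a sketch of the standard TPM definition, in which read counts are first normalised by gene length and then rescaled to sum to one million. The function name `tpm` and the toy counts and lengths are hypothetical and are not taken from this study.

```python
def tpm(read_counts, gene_lengths_bp):
    """Return TPM values for genes given read counts and gene lengths (bp).

    Standard definition: divide each count by its gene length, then scale
    the resulting rates so that they sum to 1,000,000 within the sample.
    """
    if len(read_counts) != len(gene_lengths_bp):
        raise ValueError("read_counts and gene_lengths_bp differ in length")
    rates = [count / length
             for count, length in zip(read_counts, gene_lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0] * len(rates)
    return [rate / total_rate * 1_000_000 for rate in rates]


# Hypothetical toy sample: three genes with the same reads-per-base rate.
toy_counts = [10, 20, 30]
toy_gene_lengths = [1000, 2000, 3000]
toy_tpm = tpm(toy_counts, toy_gene_lengths)
print(toy_tpm)
```

Because TPM values are rescaled within each sample, they describe relative abundance inside one individual's intestinal metatranscriptome; comparing the same gene family across the three individuals in Table S3 assumes that the samples were processed in the same way.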

20 231 Supplementary Notes 232 Supplementary Note 1 233 Sample Dissection. The frozen snails were thawed in RNAlater® (Invitrogen, USA) on ice, 234 dissected with different tissues fixed separately in RNAlater®, and then prepared for nucleic 235 acid extraction. A single specimen of Alviniconcha marisindica from Wocan was used for the 236 holobiont genome assembly. A total of 16 individuals were dissected, and tissues/organs 237 were collected for nucleic acid preparation and sequencing. DNA extracted from muscle of 238 foot and neck from one male individual was used for host genome sequencing and assembly. 239 DNA extractied from the endosymbiont-harbouring gills were used for endosymbiont 240 genome sequencing and assembly. Three snail individuals, including the one used for 241 identifying the host genome, were dissected into 7–10 tissue types each with RNA extraction 242 performed on the different tissues, namely cephalic tentacle (Ct), digestive gland (DG), 243 gonad (testis or ovary, Go), foot (Ft), endosymbiont-containing ctenidium (Gi), 244 ectosymbiont-containing intestine (Int), mantle (Man), mantle edge (ME), neck furrow (NF), 245 nephridium (Ne), and ventricle heart (VH), following the previously published anatomy of 246 Alviniconcha marisindica from the Kairei deep-sea hydrothermal field (see Fig. 1 in Suzuki 247 et al., 2005). Moreover, endosymbiont-containing ctenidium (gill, Gi) of several individuals 248 were divided into different parts, including gill base (GB), gill distal (GD), gill posterior (GP) 249 and gill anterior (GA). The gills of 10 individuals were divided into anterior and posterior 250 parts, and the DNA of these 20 parts were extracted separately for metagenome sequencing. 251 The intestines of the three individuals were dissected for total DNA and RNA extraction for 252 the gut microbiome, and the gills of these individuals were also dissected for total RNA 253 extraction. 254 255 Nucleic Acid Preparation. 
Genomic DNA (gDNA) was extracted using the E.Z.N.A.® 256 Mollusc DNA Kit (Omega Bio-tek, Georgia, USA) and then purified using Genomic DNA 257 Clean & ConcentratorTM-10 Kit (Zymo Research, CA, USA) according to the 258 manufacturer’s protocol. Total DNA of the gills and that of the intestines were extracted 259 using the same protocol. Total RNA was extracted using Trizol (Invitrogen, USA) from 260 different tissues following the manufacturer’s protocol and prepared for RNA-Seq. Nucleic 261 acid quality was evaluated using agarose gel electrophoresis and a BioDrop µLITE (BioDrop, 262 Holliston, MA, US), and nucleic acid concentrations were quantified using a Qubit 263 fluorometer v3.0 (Thermo Fisher Scientific, Singapore). 264 265 Morphological Observation. The external morphology of a complete Alviniconcha 266 individual preserved in absolute ethanol was shown in Figure S14, the internal morphology 267 was observed under a Leica MZ9.5 stereozoom microscope, and the snails were dissected to 268 isolate their radula. Radular morphologies of 2 specimens were imaged using a scanning 269 electron microscopy (SEM). Radular sacs dissected from the body cavities and stored in pure 270 ethanol, then treated with half-strength commercial bleach, leaving the clean radular teeth. 271 Subsequently, the radula for SEM was rinsed in MilliQ water and dehydrated by increasing 272 concentration of ethanol solution (20, 40, 60, 75, 100%). The dehydrated radula was dried 273 completely using hexamethyldisilazane and then brought to the next step of SEM observation 274 uncoated at 15 kV using a Hitachi TM-3000SEM. Radular characteristics of the Alviniconcha 275 snail showed this snail has a broad radula with well-developed anterior supporting on the 276 central tooth (Figure S15), indicating its ability of grazing on flat surface. 
The following linear measurements of 19 specimens were taken with digital vernier calipers: shell height (H), shell width (W), shell depth (D), shell aperture height (AH), and shell aperture width (AW). These measurements followed the methodology of Chen et al. (2015). The following ratios were calculated from the linear measurements: shell height/shell width (H/W); shell depth/shell width (D/W); shell aperture height/shell width (AH/W); and shell aperture width/shell width (AW/W). The shell parameters of the 19 specimens are shown in Table S4, and a scatter plot of shell width against shell height is shown in Figure S16. The five measured parameters of the 19 Alviniconcha individuals indicated that these snails might be at different growth stages and that the parameters scaled linearly across all life stages. Morphologically, all Alviniconcha populations are known to be extremely similar regardless of species (Johnson et al., 2015), and this population was no exception; molecular taxonomy can therefore be used to provide reliable identification of these cryptic species.
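The ratio calculation described above can be illustrated with a minimal Python sketch. It computes the H/W proportion for each specimen from the shell width and shell height values in Table S4 and summarises it by mean and standard deviation, for comparison with the proportion rows of that table. The function name `proportion_stats` is hypothetical, and the use of the sample standard deviation is an assumption, since the source does not state which form was used.

```python
from statistics import mean, stdev

# (shell width, shell height) in mm for the 19 specimens listed in Table S4.
WIDTH_HEIGHT_MM = [
    (14.23, 17.12), (22.00, 27.16), (21.57, 26.33), (26.99, 34.97),
    (31.35, 44.62), (12.94, 14.41), (25.80, 30.65), (26.63, 28.50),
    (31.94, 42.73), (30.87, 37.23), (32.48, 39.57), (26.60, 32.01),
    (24.04, 29.34), (19.22, 25.08), (22.67, 27.55), (33.54, 42.59),
    (30.87, 34.55), (32.38, 44.85), (13.90, 18.01),
]


def proportion_stats(pairs):
    """Return (mean, sample SD) of measurement/width ratios.

    `pairs` holds one (width, measurement) tuple per specimen.
    Sample SD (n - 1 denominator) is an assumption of this sketch.
    """
    ratios = [measurement / width for width, measurement in pairs]
    return mean(ratios), stdev(ratios)


hw_mean, hw_sd = proportion_stats(WIDTH_HEIGHT_MM)
print(f"H/W: mean = {hw_mean:.3f}, SD = {hw_sd:.3f}")
```

The same function can be applied to the depth and aperture columns by substituting the corresponding measurement for shell height.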

Figure S14. A photograph of an Alviniconcha marisindica snail collected from the WHF and stored in absolute ethanol. Scale bar = 10 cm.

Figure S15. SEM images of radula. Overview: (A) Alviniconcha marisindica (individual 01), scale bar = 300 μm; (B) A. marisindica (individual 01), scale bar = 200 μm. Central and lateral teeth close-up: (C) A. marisindica (individual 01), scale bar = 200 μm; (D) A. marisindica (individual 02), scale bar = 200 μm. Marginal teeth close-up: (E) A. marisindica (individual 02), scale bar = 30 μm.

Figure S16. Scatter plot of shell width (diameter; x-axis, mm) vs. shell height (y-axis, mm) across a size range of 19 Alviniconcha specimens (line of best fit: y = 1.3327x - 2.2346, R² = 0.93).
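A line of best fit such as the one reported for Figure S16 is commonly obtained by ordinary least squares. The sketch below shows that calculation in plain Python and is an assumption about the fitting method, which the source does not state. The function names `fit_line` and `predicted_height` are hypothetical; the default slope and intercept in `predicted_height` are the values given in the figure legend.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Assumes xs and ys each
    contain at least two distinct values.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def predicted_height(width_mm, slope=1.3327, intercept=-2.2346):
    """Shell height (mm) predicted from shell width using the Figure S16 line."""
    return slope * width_mm + intercept


# Toy data lying exactly on y = 2x + 1, used only to exercise the function.
slope, intercept, r2 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(slope, intercept, r2)
```

Passing the shell width and shell height columns of Table S4 to `fit_line` would allow the reported coefficients to be checked.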

Table S4. Shell parameters of Alviniconcha marisindica. Range and proportion to shell width (diameter) are calculated from 19 specimens across a size range in this snail.

Specimen (parameters in mm)           Shell width  Shell height  Shell depth  Aperture width  Aperture height
38I-DV129-1-N2                        14.23        17.12         13.27        9.48            14.66
38I-DV129-14                          22.00        27.16         19.77        15.34           19.90
38I-DV129-15                          21.57        26.33         20.69        15.57           22.65
38I-DV129-19-1                        26.99        34.97         23.08        22.62           19.58
38I-DV129-19-2                        31.35        44.62         31.40        25.69           31.33
38I-DV129-2-N2                        12.94        14.41         10.13        7.69            12.24
38I-DV129-20                          25.80        30.65         21.57        16.85           25.31
38I-DV129-3-N2                        26.63        28.50         24.22        18.48           22.45
38I-DV129-4-N2                        31.94        42.73         31.42        21.75           27.39
38I-DV129-5-N2                        30.87        37.23         30.50        21.92           34.07
38I-DV129-5                           32.48        39.57         30.45        23.49           31.41
38I-DV129-8                           26.60        32.01         26.31        18.08           27.18
38I-DV129-9                           24.04        29.34         21.71        21.99           18.75
38I-DV131-1                           19.22        25.08         18.66        16.38           20.99
38I-DV131-3                           22.67        27.55         21.19        15.73           22.77
38I-DV131-5                           33.54        42.59         31.10        25.52           34.11
38I-DV131-6                           30.87        34.55         26.53        22.58           29.10
38I-DV131-7                           32.38        44.85         32.39        24.55           35.39
38I-DV131-9                           13.90        18.01         12.99        10.43           15.00
Range                                 12.94–33.54  14.41–44.85   10.13–32.39  7.69–25.69      12.24–35.39
Proportion of shell diameter (width)  1            1.238         0.927        0.734           0.970
SD of proportion                      –            0.089         0.060        0.078           0.107

Molecular Taxonomy. Fragments of the mitochondrial cytochrome c oxidase subunit I (COI) gene were amplified by polymerase chain reaction (PCR) from 20 specimens using universal primers (Folmer et al., 1994). Each 40 µL reaction consisted of 1 µL template DNA, 17 µL deionized sterilized water, 20 µL 2× PCR MasterMix (Tiangen Biotech Co., Beijing, China), and 1 µL of each primer (GenScript Biotech Co., Nanjing, China). PCR products were purified using the TIANGEN Universal DNA Purification Kit (Tiangen Biotech Co., Beijing, China) and sequenced on Sanger sequencing platforms at the Beijing Genomics Institute (BGI)-Shenzhen. Bidirectional sequences were trimmed and assembled using DNASTAR EditSeq version 7.1.0. The genetic distance estimated from the COI sequences was about 3.4% between this new population and the CIR populations of A. marisindica, and a phylogenetic tree based on COI gene sequences of Alviniconcha snails showed that our samples are closest to A. marisindica from the CIR (Figure S17A). Trimmomatic v0.39 (Bolger et al., 2014) and FastUniq (Xu et al., 2012) were used to trim the Illumina reads and remove duplicates. The clean reads were assembled using SPAdes v3.13.1 (Bankevich et al., 2012) with k-mer sizes of 21, 33, 55, 77, 99, and 127 bp, and the products were pooled. Sequences of the 12S, 16S, 18S, 28S-D1, 28S-D6, COI and H3 genes were then extracted from the assembled contigs. A phylogenetic tree constructed from the concatenated sequences of these seven genes indicated that this population was closest to A. marisindica from the CIR (Figure S17B). These results showed that the Alviniconcha population from the CR is Alviniconcha marisindica, the same species as that from the CIR.
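The COI genetic distance mentioned above can be illustrated with an uncorrected pairwise distance (p-distance), the proportion of aligned sites at which two sequences differ. This is a minimal sketch only: the source does not state which distance model was used, so the choice of p-distance is an assumption, and the function name `p_distance` and the short sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical 10-bp aligned fragments differing at one site.
distance = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(f"p-distance = {distance:.1%}")
```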

Figure S17. Molecular taxonomy of the population of Alviniconcha snails from the Wocan vent site on the Carlsberg Ridge. Phylogenetic trees of Alviniconcha snails based on (A) COI genes and (B) concatenated 12S, 16S, 18S, 28S-D1, 28S-D6, COI and H3 genes. The Alviniconcha marisindica populations from the Carlsberg Ridge (CR) and Central Indian Ridge (CIR) in the Indian Ocean are shown in red. Numbers on the nodes indicate bootstrap support.

Supplementary Note 2
Library Construction and Sequencing. Genomic DNA was aliquoted and submitted to three sequencing platforms: Illumina, PacBio Sequel, and Oxford Nanopore Technologies (ONT). A library with a 350 bp insert size was constructed from gDNA following the standard protocol provided by Illumina (San Diego, CA, USA). After paired-end sequencing of the library at Novogene (Beijing, China), approximately 50 Gb of Illumina NovaSeq reads with a read length of 150 bp were generated. Illumina sequencing of total DNA from the gills and from the intestines was conducted similarly, with approximately 50 Gb of reads generated from each gill sample for endosymbiont genome assembly, approximately 6–8 Gb of reads generated from each of 20 gill filaments for symbiont genetic diversity analysis, and approximately 12 Gb of reads generated from each of three intestine specimens for metagenome analysis (Table S5).

For preparation of the single-molecule real-time (SMRT) DNA template for PacBio sequencing, the gDNA was sheared into large fragments (10 kb on average) using a Covaris® g-TUBE® device and then concentrated using AMPure® PB beads. DNA repair and purification were carried out according to the manufacturer's instructions (Pacific Biosciences). Blunt adapter ligation was performed on the purified end-repaired DNA, and after purification the sequencing polymerase was bound to the SMRTbell templates. Finally, the library was quantified using a Qubit fluorometer v3.0. After sequencing with the PacBio Sequel System at the Hong Kong University of Science and Technology (HKUST) and Novogene, approximately 72 Gb of long reads were generated, with reads shorter than 4 kb discarded.

For ONT sequencing, a total of 3–5 μg of gDNA was used for the construction of each library following the '1D gDNA selecting for long reads (SQK-LSK109)' protocol from ONT. Briefly, gDNA was repaired and end-prepped as per the standard protocol before being cleaned up with a 0.4× volume of AMPure XP beads. Adapter ligation and clean-up of the end-prepped DNA were performed as per the standard protocol, and the purified ligated DNA was eluted using elution buffer. The DNA library was mixed with sequencing buffer and loading beads before being loaded onto the SpotON sample port.
Finally, sequencing was performed following the manufacturer's guidelines using the FLO-MIN106 R9.4 flow cell coupled to the MinION™ platform (ONT, Oxford, UK). Raw reads were base-called according to the protocol in MinKNOW and written into fastq files, and 9.8 Gb of long reads were generated, with reads shorter than 4 kb discarded. MinION sequencing of total DNA from one gill specimen was conducted using the same procedures, generating 3.5 Gb of reads for endosymbiont genome scaffolding. Illumina reads from gDNA were used for the genome survey of the Wocan Alviniconcha marisindica, and PacBio and ONT reads were used for the genome assembly (Table S6).

For eukaryotic transcriptome sequencing of different tissues, a cDNA library with a 250–300 bp insert was constructed for each tissue after removing the prokaryotic RNA and sequenced on the Illumina NovaSeq platform at Novogene to produce 150 bp paired-end reads. Since the RNA of the gills includes sequences from both the host and the symbiont, an additional strand-specific library with a 250–300 bp insert was constructed for each gill specimen using the Ribo-Zero™ Magnetic Kit to sequence both eukaryotic and microbial RNA. Therefore, two sets of transcriptome sequencing data were produced for the gills, one for both the host and the symbiont and the other for the host only. The meta-transcriptome sequencing of the intestine was conducted using the same methods. Approximately 5–10 Gb of reads were generated from each tissue (Table S5).

Table S5. Nucleic acid preparation and Illumina paired-end sequencing information for different tissues/organs from 16 Alviniconcha marisindica individuals in this study.

Tissues/organs (Illumina sequencing reads) A. marisindica Individuals Gill Ct DG Go Ft Int Man ME NF Ne VH Sampl Nucleic GB GD GP GA e No. acid

18,79 20,55 9,982 0,443 19,39 16,88 20,37 20,89 20,24 23,29 20,62 19,50 RNA – – – – 5,743 1,965 6,253 1,223 7,288 1,531 0,721 4,431 38I- 31,379,093 DV129 (meta) -2-N2 260,1 DNA – – – 64,97 132,877,148 (meta) – – – – – – 6

24,055,311 26,27 22,59 26,18 31,12 27,11 23,61 RNA – – – – 9,803 7,795 4,458 6,320 9,141 2,722 38I- 28,067,797 (meta) DV129 -3-N2 193,0 DNA – – – 71,39 208,807,531 (meta) – – – – – – 7

19,80 21,25 5,564 1,951 19,36 20,37 20,23 22,46 20,88 27,39 20,40 21,58 RNA – – – – 38I- 9,344 4,319 8,317 4,985 4,683 7,183 4,036 7,150 DV129 27,801,163 -16 (meta)

DNA – – – – 245,337,981 (meta) – – – – – –

38,477,522 RNA – – – – 29,936,901 (meta) – – – – – 38I- (meta) DV129 -5-N2 45,356,320 DNA – – – – – – – – – – (meta)

30,937,560 RNA – – – – 29,480,256 (meta) – – – – – 38I- (meta) DV129 -15 43,407,952 DNA – – – – – – – – – – (meta)


39,406,904 RNA – – – – 27,747,516 (meta) – – – – – 38I- (meta) DV131 -6 39,646,956 DNA – – – – – – – – – – (meta)

38I- 22,194, 26,009 DV129 DNA – – – – – – 485 ,046 – – – – – – -14-1 (meta) (meta)

38I- 21,577, 20,902 DV129 DNA – – – – – – 279 ,985 – – – – – – -14-2 (meta) (meta)

38I- 24,755, 30,199 DV129 DNA – – – – – – 940 ,263 – – – – – – -19-1 (meta) (meta)

38I- 22,972, 20,232 DV129 DNA – – – – – – 456 ,613 – – – – – – -19-2 (meta) (meta)

38I- 24,269, 26,623 DV129 DNA – – – – – – 919 ,681 – – – – – – -20 (meta) (meta)

38I- 26,890, 27,436 DV131 DNA – – – – – – 577 ,916 – – – – – – -1 (meta) (meta)

38I- 26,916, 25,686 DV131 DNA – – – – – – 916 ,865 – – – – – – -3 (meta) (meta)

38I- 23,250, 19,848 DV131 DNA – – – – – – 612 ,394 – – – – – – -8 (meta) (meta)


38I- 25,868, 20,261 DV131 DNA – – – – – – 238 ,863 – – – – – – -9-1 (meta) (meta)

38I- 27,342, 19,909 DV131 DNA – – – – – – 826 ,893 – – – – – – -9-2 (meta) (meta)


Table S6. PacBio and ONT sequencing information of DNA from one Alviniconcha marisindica individual for holobiont genome assembly.

Tissues/organs (long-read sequencing reads)
Individual      Platform  Foot/Neck   Gill
38I-DV129-2-N2  PacBio    11,232,982  –
38I-DV129-2-N2  ONT       2,758,946   833,727

Supplementary Note 3
de novo Hybrid Assembly of the Host Genome. Trimmomatic v0.39 (Bolger et al., 2014) was used to trim the Illumina adapters and low-quality bases (base quality ≤ 20). The genome size was estimated from a 17-mer histogram (Figure S18). The k-mer histogram was generated by Platanus v1.2.4 (Kajitani et al., 2014) with the settings -k 17 -s 10 -u 0.2 -t 20 -m 320. The genome heterozygosity was estimated as 0.88% using GenomeScope (Vurture et al., 2017). Several genome assembly pipelines were applied to assemble the genome with PacBio and ONT reads, including PacBio-only approaches (e.g. minimap2 + miniasm (Li, 2016) and wtdbg2 (Ruan and Li, 2019)) and PacBio-ONT hybrid approaches (e.g. MaSuRCA version 3.2.8 (Zimin et al., 2013), FMLRC (Wang et al., 2018) + smartdenovo (Ruan, 2018), and FMLRC (Wang et al., 2018) + wtdbg2 (Ruan and Li, 2019)). The detailed settings of each assembly pipeline are shown below.
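As background to the k-mer based genome size estimate mentioned above, the basic idea can be sketched in Python: the genome size is approximated by the total number of k-mers divided by the depth of the main histogram peak. This is a simplified illustration only and is not the procedure implemented in Platanus or GenomeScope, which model sequencing errors and heterozygosity explicitly. The function name `estimate_genome_size`, the `min_depth` cutoff for discarding error k-mers, and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mers / main-peak depth.

    `histogram` maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below `min_depth` are treated as sequencing errors and ignored
    (a simplifying assumption of this sketch).
    Returns (estimated size, peak depth).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram: low-depth error k-mers plus a main peak at depth 20.
toy_histogram = {1: 1000, 2: 100, 19: 50, 20: 100, 21: 50}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```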

Figure S18. A 17-mer histogram for the Alviniconcha marisindica genome.

Genome Assembly.
1) minimap2 + miniasm on all of the PacBio reads (Li, 2016).
minimap2 -X -t 64 -x ava-pb AlviPacBio_4K.fasta AlviPacBio_4K.fasta > reads.paf
miniasm -f AlviPacBio_4K.fasta reads.paf > Amar_default.gfa
awk '/^S/{print ">"$2"\n"$3}' Amar_default.gfa | fold > Amar_default.fa

2) wtdbg2 pipeline on all of the PacBio reads (Ruan and Li, 2019).
wtdbg2 -t 12 -i AlviPacBio_4K.fasta -fo Amar_Pb -x sq -L 5000 -g 0.8g
wtpoa-cns -t 12 -i Amar_Pb.ctg.lay.gz -fo Amar_Pb.ctg.lay.fa

3) MaSuRCA v3.2.8 pipeline (Zimin et al., 2013).
The ONT reads were concatenated with the PacBio reads to serve as the input of 'NANOPORE=' in the configuration file, as suggested by the developer. The remaining settings were left at their defaults.

4) FMLRC (Wang et al., 2018) hybrid correction followed by assembly with a third-party assembler. The commands were as follows:
gunzip -c All_trim.fq.gz | awk 'NR % 4 == 2' | sort | tr NT TN | ropebwt2 -LR | tr NT TN | fmlrc-convert Amar_comp_msbwt.npy
awk 'NR % 4 == 2' Illumina_trim.fq | sort | tr NT TN | ropebwt2 -LR | tr NT TN | fmlrc-convert Amar_comp_msbwt.npy
fmlrc -p 12 Amar_comp_msbwt.npy Amar_Pb_ONT.fa Amar_Pb_ONT_fmlrc_EC.fasta
The corrected long reads were assembled by either smartdenovo (https://github.com/ruanjue/smartdenovo) or wtdbg2 (Ruan and Li, 2019).

A comparison of the assembly statistics of the different pipelines (Table S7) showed that the FMLRC+wtdbg2 assembly was the best, and this assembly was therefore used in the downstream analyses. The assembly was carried out as follows: the ONT reads were concatenated with the PacBio reads and error-corrected with Illumina reads using FMLRC (Wang et al., 2018). This hybrid error correction method was selected based on a previous benchmarking analysis of the available tools that use Illumina reads to correct PacBio/ONT long reads (Fu et al., 2019). The corrected long reads were then assembled using wtdbg2 with the setting "-x preset2" (Ruan and Li, 2019).
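The contiguity statistics used to compare the assemblies (N50 and NG50 in Table S7) follow standard definitions, which can be sketched in Python as below. N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size; NG50 uses half of an estimated genome size instead. The function name `n50` and the toy contig lengths are hypothetical, and this sketch is not the assessment tool used by the authors.

```python
def n50(lengths, genome_size=None):
    """Return N50 of the contig lengths, or NG50 if `genome_size` is given."""
    reference = genome_size if genome_size is not None else sum(lengths)
    target = reference / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Only reachable for NG50 when the assembly is shorter than half the genome.
    return 0


# Toy contig lengths (arbitrary units), total = 32.
contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50 =", n50(contigs))
print("NG50 =", n50(contigs, genome_size=40))
```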

Table S7. The statistics of the genome assembly. The number in bold indicates the best among all of the assemblers.

                        minimap2+miniasm*  wtdbg2*  MaSuRCA  FMLRC+smartdenovo  FMLRC+wtdbg2
Total size              934.7Mb            824.2Mb  917.9Mb  846.3Mb            853.2Mb
Number of contig        15,233             6,526    30,250   5,220              6,704
N50                     106.8Kb            346.2Kb  71.5Kb   320.6Kb            708.1Kb
NG50                    122.9Kb            343.7Kb  104.1Kb  325.9Kb            728.0Kb
Longest contig          0.89Mb             2.38Mb   0.83Mb   2.68Mb             4.75Mb
Mean length             61.4Kb             126.3Kb  37.8Kb   162.1Kb            127.3Kb
No. of contig over 1Mb  0                  56       0        59                 204
* using PacBio sequences only.

Bacterial Contamination Removal. MetaBAT 2 (Kang et al., 2015) and MaxBin 2.0 (Wu et al., 2016) were used to bin the assembled contigs in order to check for microbial contamination. The resulting microbial genome bins were assessed with CheckM v1.1.2 (Parks et al., 2015) based on the presence of particular marker genes. Open reading frames (ORFs) of the binned microbial genome were predicted using Prodigal v2.6.3 (Hyatt et al., 2010). BLASTp was then used to align all predicted protein sequences from the binned genome against the NCBI NR protein database with an e-value cutoff of 1e-5 and the 20 best hits retained, and the taxonomic assignment of each protein was imported into MEGAN v5.7.0 (Huson et al., 2011) using the lowest common ancestor (LCA) method. These sequences were confirmed as contamination and removed from the downstream analysis (Table S8).
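The core idea of the lowest common ancestor (LCA) assignment mentioned above can be sketched in Python: a protein is assigned to the deepest taxon shared by the lineages of all of its retained BLAST hits. This is a bare illustration, not MEGAN's implementation, which additionally filters hits by score thresholds before computing the LCA. The function name `lowest_common_ancestor` and the example lineages are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by every lineage, or None.

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    shared = []
    # zip stops at the shortest lineage, so ranks are compared level by level.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else None


# Hypothetical lineages of three BLAST hits for one predicted protein.
hits = [
    ["Bacteria", "Tenericutes", "Mollicutes", "Mycoplasmatales"],
    ["Bacteria", "Tenericutes", "Mollicutes", "Entomoplasmatales"],
    ["Bacteria", "Tenericutes", "Mollicutes"],
]
print(lowest_common_ancestor(hits))
```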

Table S8. Information on the bacterial sequences removed from the assembled host contigs.

Marker lineage           Total contig clusters (Mb)  Total ORFs  GC content (%)  Completeness (%)  Contamination (%)
k__Bacteria (UID3060)    9.44                        9,452       41.00           83.65             0
c__Mollicutes (UID2395)  9.44                        9,452       41.00           31.56             3.26

Supplementary Note 4
Annotation of the Host Genome. The proportion of repeat content in the Alviniconcha genome was estimated as 20.25%; the classified repeat content of the genome is shown in Table S9.

Table S9. The classified repeat content in the genome.

Class                 Count    bpMasked   %masked
DNA transposons       130115   6754060    0.82%
  Academ              759      40214      0.00%
  Academ2             3        454        0.00%
  CMC-Chapaev         541      29410      0.00%
  CMC-Chapaev-3       106      4290       0.00%
  CMC-EnSpm           60985    4146368    0.50%
  CMC-Transib         923      48612      0.01%
  Crypton             546      32220      0.00%
  Crypton-H           39       23800      0.00%
  Crypton-V           754      33472      0.00%
  Dada                3988     218202     0.03%
  Ginger              30422    2183414    0.27%
  Harbinger           4        253        0.00%
  IS3EU               3370     170165     0.02%
  Kolobok             68       3191       0.00%
  Kolobok-Hydra       18710    1454083    0.18%
  Kolobok-T2          2497     131087     0.02%
  MULE-F              1        105        0.00%
  MULE-MuDR           3833     244134     0.03%
  MULE-NOF            22       6202       0.00%
  Maverick            21062    1112659    0.14%
  Merlin              1196     52982      0.01%
  Novosib             41391    5173020    0.63%
  P                   751      34659      0.00%
  PIF-Harbinger       8296     532246     0.06%
  PIF-ISL2EU          2246     105459     0.01%
  PiggyBac            816      41911      0.01%
  Sola                60298    6225659    0.76%
  TcMar               951      44699      0.01%
  TcMar-Ant1          4        198        0.00%
  TcMar-Fot1          980      51365      0.01%
  TcMar-ISRm11        54       2772       0.00%
  TcMar-Mariner       36       4471       0.00%
  TcMar-Pogo          2        113        0.00%
  TcMar-Sagan         1        94         0.00%
  TcMar-Stowaway      60       3541       0.00%
  TcMar-Tc1           1807     90403      0.01%
  TcMar-Tc2           93       11042      0.00%
  TcMar-Tc4           7        456        0.00%
  TcMar-Tigger        62       28309      0.00%
  TcMar-m44           2        144        0.00%
  Zator               4        286        0.00%
  Zisupton            856      55699      0.01%
  hAT                 37068    2087935    0.25%
  hAT-Ac              26183    1603319    0.20%
  hAT-Blackjack       404      18786      0.00%
  hAT-Charlie         8142     661251     0.08%
  hAT-Pegasus         2596     134949     0.02%
  hAT-Tag1            2889     151116     0.02%
  hAT-Tip100          4325     219024     0.03%
  hAT-Tol2            10       324        0.00%
  hAT-hAT5            75       15348      0.00%
  hAT-hATm            162      8305       0.00%
  hAT-hATw            81       18227      0.00%
  hAT-hATx            1        125        0.00%
  hAT-hobo            3        210        0.00%
LINE                  2857     233083     0.03%
  Ambal               165      11493      0.00%
  CR1                 854      114328     0.01%
  CR1-Zenon           144      36221      0.00%
  CRE                 5        498        0.00%
  DRE                 68       4503       0.00%
  Dong-R4             8        350        0.00%
  I                   478      80933      0.01%
  Jockey              2741     346028     0.04%
  L1                  7507     541190     0.07%
  L1-Tx1              6572     412478     0.05%
  L2                  8569     861778     0.10%
  LOA                 6        1229       0.00%
  Penelope            6485     384678     0.05%
  Proto1              23       1187       0.00%
  R1                  3362     372621     0.05%
  R2                  244      25636      0.00%
  RTE                 5        428        0.00%
  RTE-BovB            395      91147      0.01%
  RTE-RTE             1        48         0.00%
  RTE-X               675      104048     0.01%
  Rex-Babar           2063     164985     0.02%
  Tad1                33       4512       0.00%
LTR                   4210     235229     0.03%
  Caulimovirus        1        39         0.00%
  Copia               6077     331270     0.04%
  DIRS                552      40187      0.00%
  ERV                 8929     598448     0.07%
  ERV-Foamy           1        164        0.00%
  ERV1                26664    1251321    0.15%
  ERV4                45       2232       0.00%
  ERVK                14030    620405     0.08%
  ERVL                138      8245       0.00%
  ERVL-MaLR           1        64         0.00%
  Gypsy               31901    3007753    0.37%
  Ngaro               3173     245231     0.03%
  Pao                 819      88223      0.01%
  Viper               17       1082       0.00%
Other                 9        3237       0.00%
  DNA_virus           11       541        0.00%
RC                    --       --         --
  Helitron            19982    1423511    0.17%
Retroposon            1        111        0.00%
  SVA                 5        1767       0.00%
SINE                  54       6752       0.00%
  5S                  3        233        0.00%
  5S-Deu-L2           57       4163       0.00%
  5S-Sauria-RTE       2        153        0.00%
  7SL                 3        688        0.00%
  Alu                 1        46         0.00%
  B4                  117      5651       0.00%
  ID                  16       778        0.00%
  MIR                 128      7151       0.00%
  RTE-BovB            3        524        0.00%
  U                   8        466        0.00%
  tRNA                5209     239468     0.03%
  tRNA-7SL            4        207        0.00%
  tRNA-C              31       1306       0.00%
  tRNA-CR1            11       489        0.00%
  tRNA-Core           8        354        0.00%
  tRNA-Deu            2        187        0.00%
  tRNA-Deu-L2         379      21699      0.00%
  tRNA-L2             833      45173      0.01%
  tRNA-Mermaid        385      19608      0.00%
  tRNA-RTE            5        173        0.00%
  tRNA-Sauria-RTE     1        0          0.00%
  tRNA-V              5        188        0.00%
Segmental             1        54         0.00%
Unknown               25302    1390838    0.17%
  centromeric         3        364        0.00%
total interspersed    672997   47414017   5.77%
Low_complexity        120231   7742961    0.94%
RNA                   147      13262      0.00%
Satellite             46972    5890412    0.72%
  5S                  112      13331      0.00%
  W-chromosome        1        155        0.00%
  centr               7        1762       0.00%
  macro               12       821        0.00%
  telo                3        1409       0.00%
Simple_repeat         1277876  105294431  12.81%
rRNA                  65       25179      0.00%
scRNA                 2        74         0.00%
snRNA                 59       7422       0.00%
tRNA                  503      29244      0.00%
Total                 2118987  166434480  20.25%

Host Gene Family Identification and Phylogenomic Analysis. A total of 26 lophotrochozoan genomes were analysed for clues to gene family evolution (Table S10). The time constraints used to calibrate the tree topology generated by RAxML (Figure S19) are shown in Table S11.

Table S10. Lophotrochozoan genomes that were used for the phylogeny and gene family analyses.

Phylum       Species                   Reference
Annelida     Capitella teleta          Simakov, Marletaz et al., 2013 (88)
Nemertea     Notospermus geniculatus   Luo, Kanda et al., 2018 (89)
Phoronida    Phoronis australis        Luo, Kanda et al., 2018 (89)
Brachiopoda  Lingula anatina           Luo, Kanda et al., 2018 (89)
Mollusca     Euprymna scolopes         Belcaid, Casaburi et al., 2019 (90)
Mollusca     Octopus bimaculoides      Albertin, Simakov et al., 2015 (91)
Mollusca     Sinonovacula constricta   Dong, Zeng et al., 2019 (92)
Mollusca     Ruditapes philippinarum   Yan, Nie et al., 2019 (93)
Mollusca     Mizuhopecten yessoensis   Wang, Zhang et al., 2017 (94)
Mollusca     Bathymodiolus platifrons  Sun, Zhang et al., 2017 (95)
Mollusca     Modiolus philippinarum    Sun, Zhang et al., 2017 (95)
Mollusca     Pinctada fucata           Takeuchi, Koyanagi et al., 2016 (96)
Mollusca     Crassostrea gigas         Zhang, Fang et al., 2012 (97)
Mollusca     Haliotis rufescens        Masonbrink, Purcell et al., 2019 (98)
Mollusca     Haliotis rubra            Gan, Tan et al., 2019 (99)
Mollusca     Chrysomallon squamiferum  Sun et al., 2020 (100)
Mollusca     Lottia gigantea           Simakov, Marletaz et al., 2013 (88)
Mollusca     Alviniconcha marisindica  This study
Mollusca     Lanistes nyassanus        Sun, Mu et al., 2019 (101)
Mollusca     Marisa cornuarietis       Sun, Mu et al., 2019 (101)
Mollusca     Pomacea canaliculata      Sun, Mu et al., 2019 (101)
Mollusca     Aplysia californica       NCBI AplCal3.0
Mollusca     Achatina fulica           Guo, Zhang et al., 2019 (102)
Mollusca     Biomphalaria glabrata     Adema, Hillier et al., 2017 (103)
Mollusca     Radix auricularia         Schell, Feldmeyer et al., 2017 (104)
Mollusca     Elysia chlorotica         Cai, Li et al., 2019 (105)

Figure S19. Genome-based phylogeny of selected taxa showing the position of Alviniconcha marisindica among lophotrochozoans.

Table S11. The time constraints that were applied to calibrate the species divergence times in the MCMCTree analysis.

Calibration node                                                 Date range                                       Reference
The first appearance of Abyssochrysidae                          hard minimum bound = 168 Ma                      Kaim and Conti, 2010 (106)
L. nyassanus and P. canaliculata                                 hard max time-point of 150 Ma                    Hayes, Cowie et al., 2009; Sun, Mu et al., 2019 (101, 107)
A. californica and B. glabrata                                   minimum = 168.6 Ma and soft maximum = 473.4 Ma   Benton, Donoghue et al., 2009 (108)
The first appearance of both the Stylommatophora and Hygrophila  hard minimum bound = 130 Ma                      Tillier, 1996; Wade, Mordan et al., 2001 (109, 110)
Heterobranchia                                                   hard minimum bound = 390 Ma                      Jörger, Stöger et al., 2010 (111)
A. californica (or B. glabrata) and L. gigantea                  minimum = 470.2 Ma and soft maximum = 531.5 Ma   Benton, Donoghue et al., 2015 (112)
The first appearance of molluscs                                 minimum = 532 Ma and soft maximum = 549 Ma       Benton, Donoghue et al., 2015 (112)
The first appearance of Pteriomorpha                             hard minimum = 465.0 Ma                          Stoger, Sigwart et al., 2013 (113)
The first appearance of Lophotrochozoan                          minimum = 550.25 Ma and soft maximum = 636.1 Ma  Benton, Donoghue et al., 2015 (112)

Supplementary Note 5
Microbial Community Composition of Gill Symbionts. To determine the microbial ribotype and community composition in the gill of our specimen, an approximately 1,500-bp 16S rRNA gene fragment was amplified by PCR from the gill using the universal bacterial primers 8f and 1492r (Paster et al., 1998). A 16S rRNA gene clone library was constructed, and 50 clones were randomly picked from the library and sequenced. The resulting sequences were analyzed using the RDP Naive Bayesian rRNA Classifier v2.11 (Wang et al., 2007) with an 80% confidence threshold to reveal the bacterial species composition in the gill; the result indicated a single bacterial ribotype in the gill of A. marisindica. To determine the phylogenetic position of the Alviniconcha endosymbiont, a phylogenetic tree was constructed based on 16S rRNA gene sequences. The 16S rRNA gene sequence of our specimen was obtained from the clone library above, and the 16S rRNA gene sequences of other marine bacteria, including endosymbionts of other Alviniconcha species, were obtained from GenBank. The sequences were aligned using MUSCLE and trimmed using trimAl v1.4 (Capella-Gutiérrez et al., 2009). The phylogenetic tree was constructed by the Maximum Likelihood (ML) method using RAxML v8.2.11 (Stamatakis et al., 2005) under the GTR + CAT model with 1,000 bootstrap replicates. The endosymbiont of A. marisindica from the CR in this study clustered with the endosymbiont of A. marisindica from the CIR and free-living Campylobacterota from deep-sea vents (Figure S20).

Figure S20. Maximum likelihood (ML) phylogenetic tree of endosymbionts of Alviniconcha snails and other marine bacteria based on the 16S rRNA gene. Numbers above branches represent ML bootstrap values based on 1,000 iterations, with 100 as the highest value. δ-Proteobacteria symbionts of the genus Olavius are used as the outgroup; the endosymbionts of Alviniconcha marisindica from the CR (this study) and the CIR are shown in red.

Genome Binning, Annotation, and Functional Analysis. Contigs potentially belonging to the campylobacterotal endosymbiont genome were separated from the host genome using three binning methods. The first method was modified from Albertsen et al. (2013), and the modified binning process followed our previous study (Yang et al., 2020). First, the clean Illumina reads were mapped to the assembled contigs using Bowtie2 v2.3.4.3 (Langmead and Salzberg, 2012), and the coverage of each contig was calculated using SAMtools v1.9 (Li et al., 2009). Prodigal v2.6.3 (Hyatt et al., 2010) was used to predict open reading frames (ORFs), and protein functional domains were predicted using HMMER 3.1b2 (Eddy et al., 2009) under the 100 + HMM model. The taxonomic affiliation of all HMM-positive ORFs was determined using BLASTp (Altschul et al., 1990) against the NCBI non-redundant (NR) protein database, and the taxonomic assignment of each protein was imported into MEGAN v5.7.0 (Huson et al., 2011) using the lowest common ancestor (LCA) method with the parameters Min Score 50, Max Expected 0.01, Top Percent 5 and LCA Percent 100. The results were subsequently analyzed in RStudio (https://www.rstudio.com/) with the libraries vegan, plyr, RColorBrewer and alphahull. Sequences representing the draft symbiont genome were then extracted from the assembled contigs of both the host and the symbiont based on a combination of sequencing coverage, GC content and taxonomic classification (Figure S3A).

Coding sequences (CDS) in the genomes of the Alviniconcha symbionts were predicted and translated using Prodigal v2.6.3 (Hyatt et al., 2010). Gene functions were determined by using BLASTp to align the candidate sequences to the NCBI non-redundant (NR) and SwissProt protein databases with the settings "-evalue 1e-5 -word_size 3 -num_alignments 20 -max_hsps 20".
Blast2GO® (Götz et al., 2008) together with EggNOG mapper (Huerta-Cepas et al., 2017) was applied to assign Gene Ontology (GO) terms and Clusters of Orthologous Groups (COG) categories to the protein sequences via the GO and EggNOG databases. The Kyoto Encyclopedia of Genes and Genomes (KEGG) Automatic Annotation Server (KAAS) (Kanehisa and Goto, 2000) was used to conduct the KEGG pathway annotation with the bi-directional best hit (BBH) method. The RAST Server (Overbeek et al., 2014) was used to annotate the genome based on the SEED database and to build a metabolic model from the KEGG annotation.

In addition to the campylobacterotal symbiont, which was confirmed as the dominant endosymbiont in the gill by the clone library results above, a Mollicutes symbiont was found to coexist in the metagenome binning (Figure S3A) at a much lower abundance than the campylobacterotal symbiont. The general genomic features of these two symbionts are shown in Table S12, and a genomic overview of the Mollicutes symbiont is shown in Figure S21. A large proportion of the coding sequences in the Mollicutes genome were annotated as hypothetical proteins; only 448 protein sequences were assigned functions based on the GO, COG, KEGG and SEED databases. Metabolic pathways that sustain basic life activities, such as glycolysis/gluconeogenesis and DNA replication and repair, were found in the Mollicutes; however, many genes involved in carbon fixation, amino acid biosynthesis and chemoautotrophy (e.g. sulfur oxidation, methane oxidation, etc.) were missing, indicating that the Mollicutes in the gill of A. marisindica cannot offer the host benefits for surviving in deep-sea vents. This is in contrast with mutualism, in which both the campylobacterotal endosymbiont and the A. marisindica snail benefit from each other; we therefore concluded that the Mollicutes have a commensal relationship with Alviniconcha marisindica.
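The coverage and GC content criteria used for binning above can be illustrated with a small Python sketch that computes the GC content of each contig and keeps contigs falling within a GC window and above a coverage cutoff. This is only a schematic of the filtering idea: the actual workflow (modified from Albertsen et al., 2013) also used taxonomic classification and inspection in R, and the function names, thresholds, and toy contigs below are hypothetical.

```python
def gc_content(sequence):
    """GC content (%) of a nucleotide sequence, ignoring non-ACGT characters."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def select_contigs(contigs, gc_range, min_coverage):
    """Return names of contigs within `gc_range` (%) and at or above `min_coverage`.

    `contigs` maps contig name -> (sequence, mean read coverage).
    """
    low, high = gc_range
    return [
        name
        for name, (sequence, coverage) in contigs.items()
        if low <= gc_content(sequence) <= high and coverage >= min_coverage
    ]


# Toy contigs with hypothetical coverage values.
toy_contigs = {
    "contig_1": ("ATATATGC", 120.0),  # 25% GC, high coverage
    "contig_2": ("GCGCGCAT", 15.0),   # 75% GC, low coverage
    "contig_3": ("ATTTAAGC", 8.0),    # 25% GC, low coverage
}
selected = select_contigs(toy_contigs, gc_range=(20.0, 45.0), min_coverage=50.0)
print(selected)
```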

Table S12. The general genomic features of A. marisindica symbionts.

Symbiont                   Campylobacterota  Mollicutes
Genome size (Mb)           1.47              0.79
No. of scaffolds           2                 15
N50 (Kb)                   1458.35           175.95
GC%                        37.09             24.78
Completeness%              98.16             82.33
Contamination%             0.82              1.88
No. of Genes               1,429             1,358
No. of CDS                 1,386             1,332
No. of functions assigned  1,324             448

Figure S21. Overview of the genome of the Mollicutes symbiont in the gill of A. marisindica, constructed using GView Server. The innermost ring represents the COG categories of the symbiont. The second and third rings show GC skew and GC content. The outermost ring represents the symbiont genome sequence. The identifiers of the rings, from outside to inside, are listed on the right.

Supplementary Note 6
Gut Microbial Community and Relative Abundance. The bacterial abundance in the metagenomic sequences was estimated using Kaiju (Menzel et al., 2016) based on a subset of the NCBI BLAST nr database containing all proteins belonging to Archaea, Bacteria and Viruses. In the gill metagenomes, bacterial sequences accounted for 12.08–21.44% (Figure S6A), much higher than in the intestinal metagenomes (2.43–5.41%, Figure S6B). After metagenome assembly of the intestinal contents, the assembled contigs were taxonomically classified using Kaiju (Menzel et al., 2016); the microbial composition included 38.74–49.30% Proteobacteria, 7.44–11.15% Actinobacteria, 8.98–10.79% Firmicutes, 1.89–9.75% Tenericutes and 3.20–4.42% viruses, among others (Figure 3B).

Gene Prediction and Annotation of Gut Microbiome. The gene set of the intestinal flora was estimated at about 3,881–5,389 annotated genes harboring an extensive metabolic repertoire that is distinct from, but complements, the activity of host enzymes in the digestive gland and intestine, including functions essential for food digestion and nutrient absorption. In addition, the assimilation of ammonium (the major nitrogenous waste of the host) in the meta-pathway of the intestinal flora indicates their ability to recycle the host's metabolic waste, and other metabolites generated by the intestinal microbiota, such as bacteriocins, short-chain fatty acids, and quorum-sensing autoinducers, are essential for intestinal homeostasis (Gao et al., 2009). Here, we discuss the main intestinal flora, particularly bacteria, and genes involved in microbial pathways associated with the metabolism of host-ingested substances, including carbohydrates, proteins, vitamins and minerals, which could expand the nutrient acquisition capacities of the Alviniconcha intestine.
Genes encoding intestinal microbial hydrolases were extracted from the metagenome; their annotations and expression levels are summarized in Dataset S1, and the full annotation of the intestinal flora is given in Dataset S2.

Potential Cross-feeding of Gut Microbiome. Nondigestible carbohydrates can be converted into critical metabolites such as vitamins and short chain fatty acids (SCFAs) by saccharolytic bacteria. SCFAs produced by the intestinal microbiota affect lipid, glucose, and cholesterol metabolism in rodents and humans (Pranal et al., 1996). Moreover, this fermentation yields SCFAs together with the gases CO2 and H2, which can be utilised by acetogens or archaea in the gut of A. marisindica. For example, acetogens convert CO2 into acetate, methanogenic archaea can generate CH4 using H2 and CO2 from polysaccharide processing, and sulfate-reducing bacteria use H2 for sulfate reduction to produce H2S. Nitrogen is essential for bacterial survival in the intestine, and dissimilatory nitrate reduction and ammonium assimilation are found in the intestinal microbial meta-pathway (Figure 5). Genes encoding ureases are found in the metagenome of the intestinal microbes, and urea produced by the host can be hydrolysed to ammonia by these urease-producing bacteria. The ammonia can in turn be used for protein metabolism. The occurrence of urea-nitrogen recycling suggests that the gut microbiome plays an important role in the nitrogen balance of A. marisindica (Figure 5). Co-abundance and meta-pathway analyses indicate that the gut microbiome likely plays critical ecological roles via bacterial interdependencies and mutual cooperation, maintaining intestinal homeostasis and providing the host with symbiont metabolites that serve as host nutrients.

Gut Microbial Hydrolases. There are three main types of hydrolases secreted by the intestinal flora: glycoside hydrolases (GHs), which can deconstruct complex carbohydrates (e.g. cellulose, chitin), proteases (including aminopeptidases) and esterases (mainly lipases). In Alviniconcha marisindica, digestive exoenzymes produced by the intestinal microbes aid the enzymatic digestion of food in the intestine (Table S3). α-Amylase, which breaks down long-chain saccharides, was abundant among the digestive exoenzymes of the intestinal flora; trypsin and zinc-dependent metalloprotease were the major exoproteases, which are necessary for protein absorption; and GDSL esterases/lipases were the major lipolytic enzymes for carbon source provision. In particular, GHs were abundant in the intestine of Alviniconcha marisindica and have important roles in aiding the digestion of dietary carbohydrates. Cellulases, xylanase/chitin deacetylase, glucanases, phosphorylase and some other GHs were found extensively in Proteobacteria (Dickeya, Xanthomonadales, Cellvibrio), Actinobacteria, Firmicutes, Tenericutes, Chloroflexi, Bacteroidetes (Cytophagia), and some fungi (Eurotiales, Debaryomycetaceae, Chaetomiaceae), forming multi-enzyme complexes that help the host digest ingested carbohydrates. Alviniconcha marisindica lacks the enzymes to degrade the bulk of dietary fibers; these nondigestible carbohydrates are converted by saccharolytic bacteria into utilizable and important metabolites, such as vitamins and SCFAs. SCFAs such as acetate, propionate, and butyrate are the key products of intestinal microbial digestion of dietary carbohydrates for nutrition provision (den Besten et al., 2013) and have beneficial effects.
Previous studies of rats, mice and humans showed that SCFAs produced by the intestinal microbiota affect lipid, glucose, and cholesterol metabolism in various tissues (Gao et al., 2009; Todesco et al., 1991). For example, SCFAs regulate fatty acid metabolism in the body, including the balance between fatty acid synthesis, fatty acid oxidation, and lipolysis (den Besten et al., 2013), and they beneficially affect glucose metabolism by normalizing plasma glucose levels and increasing glucose handling (den Besten et al., 2013). A large part of the SCFAs is used as a source of energy in various species; for example, SCFAs contribute ~10% of the daily caloric requirements of humans and ~20–30% for several other omnivorous or herbivorous animals (Bergman, 1990).

The gene encoding sialate O-acetylesterase was highly expressed in the meta-transcriptome of the intestinal flora, which indicates that sialic acid degradation is active in the intestine of Alviniconcha marisindica. Sialic acids (Sia) are prominent outermost carbohydrates of the intestine and important components of the mucus layer (Li et al., 2015). A pathway for the transport and catabolism of Sia was encoded in the metagenome of the intestinal flora, which indicates that sialic acid can be utilized as a carbon source.

Lactic Acid Bacteria in Gut. Various lactic acid bacteria were found to co-exist in the intestine, namely the families Lactobacillaceae (1.29–1.55%), Streptococcaceae (0.20–0.43%), Enterococcaceae (0.25–0.34%), Leuconostocaceae (Leuconostoc; 0.14%), Carnobacteriaceae (0.14%) and Aerococcaceae (0.12%). Mixed lactic acid bacteria have the potential to prevent pathogens from causing intestinal infection and play an important role in the stability of intestinal microbes for maintaining host health (Ren et al., 2018; Pessione, 2012), and thus exhibit a mutualistic relationship with the host.
For example, Lactobacillus species found in the gastrointestinal tract are commonly used as probiotics that confer a health benefit on the host (Walter, 2008). At the family level, Lactobacillaceae and Streptococcaceae became prevalent after probiotic treatment, and microbial community composition was more stable during the period of probiotic treatment (Hemarajata and Versalovic, 2013).

Summary of Gut Microbial Functional Analysis. The functional analysis of the intestinal flora described above indicates that the microbiota has a great impact on intestinal nutrient degradation and absorption, stimulates the host immune system and helps defend against pathogens. The degradative processes in the intestine following the action of bile acid and of pancreatic and intestinal enzymes are essential for nutritional homeostasis in the Alviniconcha holobiont. Hence, even though the intestinal flora is at a lower abundance than the endosymbionts in the gill, it is significant for host health and survival. In this study, intestinal microbial ecology and its interplay with the host metabolome are critical to the nutritional and metabolic demands of the Alviniconcha holobiont and help to shape the unique microenvironments of the intestine.

Supplementary Note 7

Genomic Comparison of the Endosymbiont. The gill endosymbiont of the Wocan Alviniconcha marisindica was compared with the endosymbiont of the tubeworm Lamellibrachia (Patra et al., 2016), the epibiont of the giant tubeworm Riftia pachyptila (Giovannelli et al., 2016), and two free-living Campylobacterota (Inagaki et al., 2004; Nakagawa et al., 2007) from deep-sea hot vents. Whole-genome ANI of orthologous gene pairs shared between two microbial genomes was calculated using fastANI (Jain et al., 2018). The orthologous groups (OGs) of the above five campylobacterotal genomes were detected using Proteinortho v5.16b (Lechner et al., 2011) (BLAST threshold E = 1 × 10⁻¹⁰); a total of 1,053 shared single-copy orthologs were identified. PCA on the shared orthologous proteins was performed using Jalview (Waterhouse et al., 2009): the BLOSUM62 model was used to calculate similarity scores between each pair of sequences and form a matrix, and the components were then generated and visualized using BioVinci (Bioturing, San Diego, CA, US) (Figure S22).
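The selection of shared single-copy orthologs described above can be illustrated with a minimal sketch. It assumes a simplified tab-separated table modelled on Proteinortho-style output (three leading summary columns, then one column per genome, `*` for an absent gene and commas between co-orthologs). The genome names, gene identifiers and the function `shared_single_copy` are hypothetical and are not taken from the original analysis.

```python
import csv
import io

# Hypothetical miniature of a Proteinortho-style table: one row per
# orthologous group, one column per genome. All identifiers are invented.
TABLE = """\
# Species\tGenes\tAlg.-Conn.\tgenomeA\tgenomeB\tgenomeC
3\t3\t1.0\ta1\tb1\tc1
3\t4\t0.8\ta2,a3\tb2\tc2
2\t2\t1.0\ta4\t*\tc4
3\t3\t1.0\ta5\tb5\tc5
"""


def shared_single_copy(table_text, n_meta_cols=3):
    """Return genome names and the groups with exactly one gene per genome."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    genomes = header[n_meta_cols:]
    kept = []
    for row in reader:
        cells = row[n_meta_cols:]
        if len(cells) != len(genomes):
            continue  # skip malformed or empty rows
        # '*' marks a missing gene; a comma marks more than one copy.
        if all(cell != "*" and "," not in cell for cell in cells):
            kept.append(cells)
    return genomes, kept


genomes, groups = shared_single_copy(TABLE)
print(len(groups), "shared single-copy groups across", len(genomes), "genomes")
```

In this toy table only the first and last groups qualify, because the second contains two copies in one genome and the third is missing from one genome.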

[Figure S22 image. Panel A: PCA plot with points labelled Sulfurovum lithotrophicum, Sulfurovum sp. NBC37-1, Sulfurovum riftiae, Alviniconcha endosymbiont and Lamellibrachia endosymbiont. Panels B and C: bar charts of Frequency (y-axis) by function class, with legends listing COG categories (B) and KEGG categories (C).]
Figure S22. Genomic comparison analysis across representatives of Campylobacterota. (A) PCA of the orthologous proteins in 1,053 OGs of five Campylobacterota under the BLOSUM62 model. The reduced orthologous genes of the endosymbiont of Alviniconcha marisindica are classified into different functional categories based on (B) COG and (C) KEGG annotations.

Reduced Genes of A. marisindica Endosymbiont. Based on the PCA, the five Campylobacterota were separated according to habitat type (Figure S22A). The free-living Campylobacterota clustered together, whereas the endosymbiont of the shallow-water seep vestimentiferan Lamellibrachia satsuma, the endosymbiont of the deep-sea vent snail Alviniconcha marisindica, and the epibiont of the deep-sea vent vestimentiferan Riftia pachyptila were well separated from each other. The horizontally transmitted endosymbionts and epibionts of seep or vent animals likely come from the environmental free-living population. The animal hosts have the potential to adopt endosymbionts that are optimally adapted to both ambient seawater and the host intracellular environment. Therefore, different habitats may have driven molecular adaptation in the above symbionts. Comparison of complete genome sequences within the clade revealed that the endosymbiont of A. marisindica has a small genome with fewer coding sequences; 352 orthologous genes were found to be lost in this endosymbiont. The lost genes were mainly involved in [M] cell wall/membrane/envelope biogenesis (e.g. capsular polysaccharides), [O] posttranslational modification, protein turnover, chaperones, [P] inorganic ion transport and metabolism, [C] energy production and conversion, [T] signal transduction mechanisms and [V] defense mechanisms. Of these, 174 were annotated via KEGG and were mainly distributed in the two-component system (e.g. chemotaxis proteins and response regulators), DNA repair and recombination (e.g. DNA polymerase polX), carbohydrate metabolism (e.g. part of pyruvate metabolism), biosynthesis of amino acids, ABC transporters and other element transporters (Figure S22B and C). The endosymbiont lacks many non-essential metabolic genes and part of central carbohydrate metabolism; for example, part of the citrate cycle, one of the optional routes from pyruvate to acetyl-CoA (genes ace and DLAT), was missing.

Loss-of-Function Mutations. The effects of variation in shared orthologous proteins of the endosymbiont of Alviniconcha marisindica and the four campylobacterotal references were revealed by the DBS method. Functional genetic changes in conserved domains within shared orthologous protein sequences showed that the number of genes with loss-of-function mutations, classified by function, was much lower in the endosymbiont of Alviniconcha marisindica than in the other four campylobacterotal references; only 10–13 orthologous proteins with loss-of-function mutations were found (Figure 4A). Compared with the two free-living and one epibiotic Campylobacterota, several bacterial surface proteins were found to have lost their functions in the endosymbiont of Alviniconcha marisindica, including outer membrane protein/protective bacterial surface antigen OMA87 (PF03279.13), bacterial lipid A biosynthesis acyltransferase (PF03279.13), putative membrane protein insertion efficiency factor (PF01809.18) and Sel1 repeats (SLRs) (PF08238.12), which are regarded as bacterial virulence determinants and may also be recognized by host cells (Newton et al., 2007; Emiola et al., 2015; Tabatabai, 2008). Furthermore, NnrS protein (PF05940.12) was found to have lost its function in the endosymbiont of Alviniconcha marisindica compared to the endosymbiont of Lamellibrachia satsuma.
NnrS is particularly important for bacterial resistance to nitrosative stress under anaerobic conditions (Stern et al., 2013). Although the protein sequence of NnrS was found to be mutated and may have lost its function in the endosymbiont of Alviniconcha marisindica, genes encoding nitric oxide synthase-interacting protein and NADPH--cytochrome P450 reductase (POR), which participate in the nitric oxide catabolic process, were expressed in Alviniconcha host cells to alleviate this stress. The loss-of-function orthologous genes from the endosymbiont of Alviniconcha marisindica and the other four campylobacterotal species are listed in Dataset S3.

Supplementary Note 8

Heterogeneous Genomes of the Endosymbiont Populations. A total of 20 endosymbiont isolates were obtained from the posterior and anterior parts of the gill of 10 A. marisindica individuals. The BactSNP pipeline (Yoshimura et al., 2019) was used to identify single-nucleotide polymorphisms (SNPs) between isolates; this pipeline is capable of highly accurate and sensitive SNP calling in a single step even when the target isolates are closely related. We used the genome of isolate '7-B' as the reference genome against which SNPs were called among isolates. A pseudo-genome was obtained for each isolate by concatenating all of its contigs into one sequence, and these pseudo-genomes were used as input for phylogenetic analysis. In addition to the genomic ANI results, a phylogenetic tree based on SNPs was constructed to represent the evolutionary relationships among the 20 endosymbiont isolates. The distance between two isolates is the sum of the lengths of all branches connecting them, and we defined isolates clustered in the same clade of the phylogenetic tree with a distance of less than 0.00198 as the same strain. On this basis, five strains were identified among the 20 endosymbiont isolates. Increased genetic variation can bestow a selective advantage on endosymbionts under stressful conditions. In addition, genetic variation allows the endosymbionts to establish themselves in their chosen host and to resist the host's subsequent innate immunity, as in many pathogenic bacteria (Robertson and Meyer, 1992). Interestingly, we found that the endosymbiont isolates did not cluster according to their host and that the strains were randomly distributed among host individuals.
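The distance rule for delimiting strains can be sketched as follows. This is only an illustration: the criterion in the text is clade-based, whereas the sketch substitutes a simpler single-linkage grouping in which any two isolates closer than the 0.00198 cut-off end up in the same group. The isolate names, the pairwise distances and the function `group_strains` are invented for the example.

```python
THRESHOLD = 0.00198  # cut-off stated in the text

# Invented pairwise tree distances (sum of connecting branch lengths)
# between four hypothetical isolates.
DIST = {
    ("iso1", "iso2"): 0.0004,
    ("iso1", "iso3"): 0.0150,
    ("iso1", "iso4"): 0.0160,
    ("iso2", "iso3"): 0.0152,
    ("iso2", "iso4"): 0.0161,
    ("iso3", "iso4"): 0.0010,
}


def group_strains(dist, threshold):
    """Single-linkage grouping: join isolates whose distance is below threshold."""
    names = sorted({name for pair in dist for name in pair})
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


strains = group_strains(DIST, THRESHOLD)
print(strains)
```

With these toy distances the four isolates fall into two groups, each of which would be treated as one strain under the simplified rule.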
Based on the semi-endosymbiotic model of A. marisindica, we suppose that A. marisindica has the ability to exchange or re-acquire gill endosymbionts from the environment, and has the potential to obtain more flexible and diverse endosymbiotic bacteria than many other holobionts with true endosymbiosis.

Supplementary Note 9

DEGs of Different Tissues/Organs. The Alviniconcha genome encodes complete metabolic pathways within lysosomes, endosomes and phagosomes, which serve as the main digestive compartments of host cells. In particular, genes encoding a large variety of proteases, bile salt-activated lipase and glycosyl hydrolases were highly expressed in the intestine, and genes encoding gastric intrinsic factor and bile acid transporter were highly expressed in the digestive gland, which strongly supports the active digestion of macromolecular nutrients in the intestine. The annotations of highly expressed genes in the gill, intestine, digestive gland, and mantle of A. marisindica are shown in Dataset S4 and Figure S23. The large number of highly expressed genes in the intestine shows that it has functional specializations responsible for effective and regulated nutrient transport. The numbers of differentially expressed genes in the gill and intestine, with their biological coefficients of variation (BCV), are shown in Figure S24. Based on the DEGs (selectively highly expressed genes), functional enrichments of GO terms/KEGG pathways were also identified in the gill and intestine of A. marisindica. In the intestine, genes involved in vacuole, hydrolase activity, organic hydroxy compound metabolic process, and transmembrane transport were enriched, indicating active nutrient digestion and absorption (Figure S12A).
In the gill, leukocyte differentiation, myeloid leukocyte activation, the MyD88-independent toll-like receptor signaling pathway, and TNF signaling, which contribute to the host immune system, were enriched, indicating the gill's ability to resist invasion and control symbionts. Enrichment of lipid biosynthetic process, regulation of protein synthesis and intracellular transport in the gill shows its active nutrient metabolism (Figure S12B).
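The thresholds used to call DEGs (a fold change of at least two with an FDR-corrected p-value below 0.05, as given in the legend of Figure S24 and in Dataset S4) can be written as a small classification rule. The sketch below is an illustration only: the symmetric cut-off for down-regulated genes is an assumption, since the legend's criterion for down-regulation is not clearly stated, and the gene names, values and the function `classify` are hypothetical.

```python
import math


def classify(fold_change, fdr, min_fold=2.0, max_fdr=0.05):
    """Label a gene from its linear-scale fold change and FDR-corrected p-value."""
    log2_fc = math.log2(fold_change)
    cutoff = math.log2(min_fold)  # a twofold change equals 1 on the log2 scale
    if fdr < max_fdr and log2_fc >= cutoff:
        return "up"
    # Assumption: down-regulation mirrors up-regulation (fold change <= 1/2).
    if fdr < max_fdr and log2_fc <= -cutoff:
        return "down"
    return "ns"


# Hypothetical genes: (fold change, FDR-corrected p-value).
GENES = {
    "geneA": (4.0, 0.001),
    "geneB": (0.25, 0.01),
    "geneC": (3.0, 0.20),
    "geneD": (1.2, 0.001),
}

labels = {gene: classify(fc, fdr) for gene, (fc, fdr) in GENES.items()}
print(labels)
```

Here geneC fails the FDR cut-off and geneD fails the fold-change cut-off, so both are labelled as not significant.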


Figure S23. Gene Ontology (GO) functional annotations of highly expressed genes in the digestive gland, mantle, gill, and intestine. The x-axis represents the default GO terms selected automatically by the WEGO online tool, the colour of the bars represents the different tissues of A. marisindica (red: digestive gland; grey: gill; blue: intestine; orange: mantle), and the length of the bars represents the percentage and number of transcripts in the different GO functional classes. GO annotation provides three main functional categories (cellular component, molecular function, and biological process).

Figure S24. Differentially expressed genes (DEGs) in the intestine and gill of Alviniconcha marisindica. Volcano plots and biological coefficient of variation (BCV) plots of differentially expressed genes in the (A) gill and (B) intestine, as identified by DESeq2 analysis. The log10 (FDR-corrected p-values) are plotted against the log2 (FC) in gene expression. Genes upregulated twofold or more with an FDR-corrected p-value < 0.05 are marked as blue dots, while down-regulated genes (FRD <= 2 with P < 0.05) are in red.

Substrates Transport. A previous study showed that the substrates for chemosynthesis, including sulfide and oxygen, are obtained from ambient seawater through hemoglobin and hemocyanin in the gill of Alviniconcha hessleri (Wittenberg and Stein, 1995). However, the genome of A. marisindica has no genes encoding hemoglobin subunits; instead, it contains genes encoding hemocyanin, globin-like (glob1, NGB), myoglobin-like (IDO2) and globin C, coelomic-like proteins that are responsible for substrate transport. Importantly, the hemocyanin genes showed extremely high expression in the intestine, ranking among the top 10 genes (Figure S25). Accordingly, a large volume of blue hemolymph was observed. We suppose that hemocyanin is not only an oxygen transporter, as previous studies suggested (Wittenberg and Stein, 1995), but is also able to bind and transport sulfur compounds to the symbionts.

Figure S25. Transcriptional activity of globin genes of the Wocan Alviniconcha marisindica involved in substrate transport. Heat map of the transcriptional activity of genes that encode neuroglobin, myoglobin, globin-like proteins, and hemocyanin in different tissues, including foot, neck, mantle, digestive gland (DG), gill and intestine. Each grid cell in the heat map represents an identified gene in the respective sample. The colour represents the gene expression level (based on normalized TPM values). The annotated gene names are listed on the sides.

Digestion of Symbionts and Nutrients Transportation. The functional enzyme distribution of the A. marisindica genome shows that the host contains numerous genes encoding key hydrolases, and the expression of these genes is higher in the gill and intestine than in other tissues/organs (Figure S11A), especially in the intestine, indicating a highly active digestive system. The lack of specific transporters of essential nutrients in the gill endosymbiont suggests that nutrients are released via leakage or digestion by the host (Yang et al., 2020; Newton et al., 2007). For example, lysosomal cathepsins, non-lysosomal calpain and caspases are highly expressed in the gill. However, except for membrane-lytic enzymes such as aminopeptidases, which show high expression, proteinases and glycosyl hydrolases are not more active in the gill than in other tissues. The active T2SS of the endosymbiont also has the ability to translocate intracellular proteins to the outside. Thus, the endosymbiont releases nutrients not only by being digested but also by export through the T2SS. Genes involved in apoptosis regulation are highly expressed in the gill, which also shows high expression of caspase genes (Figure S11A) responsible for programmed cell death.
Since the cell initiates intracellular apoptotic signaling in response to stress, such as chemical cues in the environment, nutrient deprivation, viral infection or other immune stimuli (Kemp, 2017), it is reasonable to conclude that when endosymbionts proliferate quickly, the cytosolic concentration in host cells will induce the regulation of cell apoptosis. Hence, the density of endosymbionts is likely controlled by host bactericidal activity and apoptosis, which are triggered by limited space and nutrient deprivation in the host cell. Apoptosis in the gill may also help host cells obtain nutrients.

Dataset S1 (separate file). The annotation and expression levels of genes encoding hydrolases in the intestinal flora of three A. marisindica individuals.

Dataset S2 (separate file). The annotation of genes predicted from the metagenome of the intestinal flora of three snail individuals.

Dataset S3 (separate file). The loss-of-function orthologous genes from the endosymbionts of Alviniconcha marisindica and four campylobacterotal references, identified through pair-wise genomic comparisons based on a profile-based method.

Dataset S4 (separate file). The annotation of highly expressed genes (Fold Change>2) in different tissues (gill, intestine, digestive gland, and mantle) of A. marisindica.

SI References

1. Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, Nielsen PH (2013). Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nature Biotechnology 31: 533. https://doi.org/10.1038/nbt.2579

2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal of Molecular Biology 215: 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2

3. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19: 455–477. https://doi.org/10.1089/cmb.2012.0021

4. Bergman EN (1990). Energy contributions of volatile fatty acids from the gastrointestinal tract in various species. Physiological Reviews 70: 567–590. https://doi.org/10.1152/physrev.1990.70.2.567

5. Bolger AM, Lohse M, Usadel B (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120. https://doi.org/10.1093/bioinformatics/btu170

6. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25: 1972–1973. https://doi.org/10.1093/bioinformatics/btp348

7. Chen C, Linse K, Roterman CN, Copley JT, Rogers AD (2015). A new genus of large hydrothermal vent-endemic gastropod (Neomphalina: Peltospiridae). Zoological Journal of the Linnean Society 175: 319–335. https://doi.org/10.1111/zoj.12279

8. Den Besten G, van Eunen K, Groen AK, Venema K, Reijngoud DJ, Bakker BM (2013). The role of short-chain fatty acids in the interplay between diet, gut microbiota, and host energy metabolism. The Journal of Lipid Research 54: 2325–2340. https://doi.org/10.1194/jlr.R036012

9. Eddy SR (2009). A new generation of homology search tools based on probabilistic inference. Genome Informatics 23: 205–211. https://doi.org/10.1142/9781848165632_0019

10. Emiola A, George J, Andrews SS (2015). A complete pathway model for lipid A biosynthesis in Escherichia coli. PLoS One 10: e0121216. https://doi.org/10.1371/journal.pone.0121216

11. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol 3: 294–299.

12. Fu S, Wang A, Au KF (2019). A comparative evaluation of hybrid error correction methods for error-prone long reads. Genome Biology 20: 26. https://doi.org/10.1186/s13059-018-1605-z

13. Gao Z, Yin J, Zhang J, Ward RE, Martin RJ, Lefevre M, Cefalu WT, Ye J (2009). Butyrate improves insulin sensitivity and increases energy expenditure in mice. Diabetes 58: 1509–1517. https://doi.org/10.2337/db08-1637

14. Giovannelli D, Chung M, Staley J, Starovoytov V, Le Bris N, Vetriani C (2016). Sulfurovum riftiae sp. nov., a mesophilic, thiosulfate-oxidizing, nitrate-reducing chemolithoautotrophic epsilonproteobacterium isolated from the tube of the deep-sea hydrothermal vent polychaete Riftia pachyptila. International Journal of Systematic and Evolutionary Microbiology 66: 2697–2701. https://doi.org/10.1099/ijsem.0.001106

15. Götz S, García-Gómez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talón M, Dopazo J, Conesa A (2008). High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research 36: 3420–3435. https://doi.org/10.1093/nar/gkn176

16. Hemarajata P, Versalovic J (2013). Effects of probiotics on gut microbiota: mechanisms of intestinal immunomodulation and neuromodulation. Therapeutic Advances in Gastroenterology 6: 39–51. https://doi.org/10.1177%2F1756283X12459294

17. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, Bork P (2017). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular Biology and Evolution 34: 2115–2122. https://doi.org/10.1093/molbev/msx148

18. Huson DH, Mitra S, Ruscheweyh HJ, Weber N, Schuster SC (2011). Integrative analysis of environmental sequences using MEGAN4. Genome Research 21: 1552–1560. http://www.genome.org/cgi/doi/10.1101/gr.120618.111

19. Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119. https://doi.org/10.1186/1471-2105-11-119

20. Inagaki F, Takai K, Nealson KH, Horikoshi K (2004). Sulfurovum lithotrophicum gen. nov., sp. nov., a novel sulfur-oxidizing chemolithoautotroph within the epsilon-Proteobacteria isolated from Okinawa Trough hydrothermal sediments. International Journal of Systematic and Evolutionary Microbiology 54: 1477–1482. https://doi.org/10.1099/ijs.0.03042-0

21. Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9: 1–8. https://doi.org/10.1038/s41467-018-07641-9

22. Johnson SB, Warén A, Tunnicliffe V, Dover CV, Wheat CG, Schultz TF, Vrijenhoek RC (2015). Molecular taxonomy and naming of five cryptic species of Alviniconcha snails (Gastropoda: Abyssochrysoidea) from hydrothermal vents. Systematics and Biodiversity 13: 278–295. https://doi.org/10.1080/14772000.2014.970673

23. Kanehisa M, Goto S (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28: 27–30. https://doi.org/10.1093/nar/28.1.27

24. Kajitani R, Toshimoto K, Noguchi H, Toyoda A, Ogura Y, Okuno M, Yabana M, Harada M, Nagayasu E, Maruyama H, Kohara Y (2014). Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research 24: 1384–1395. https://doi.org/10.1101/gr.170720.113

25. Kang DD, Froula J, Egan R, Wang Z (2015). MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3: e1165. https://doi.org/10.7717/peerj.1165

26. Kemp MG (2017). Crosstalk between apoptosis and autophagy: environmental genotoxins, infection, and innate immunity. Journal of Cell Death 9: 1179670716685085. https://doi.org/10.1177%2F1179670716685085

27. Langmead B, Salzberg SL (2012). Fast gapped-read alignment with Bowtie 2. Nature Methods 9: 357. https://doi.org/10.1038/nmeth.1923

28. Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ (2011). Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinformatics 12: 124. https://doi.org/10.1186/1471-2105-12-124

29. Li H (2016). Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32: 2103–2110. https://doi.org/10.1093/bioinformatics/btw152

30. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079. https://doi.org/10.1093/bioinformatics/btp352

31. Li H, Limenitakis JP, Fuhrer T, Geuking MB, Lawson MA, Wyss M, Brugiroux S, Keller I, Macpherson JA, Rupp S, Stolp B, Stein JV, Stecher B, Sauer U, McCoy KD, Macpherson AJ (2015). The outer mucus layer hosts a distinct intestinal microbial niche. Nature Communications 6: 8292. https://doi.org/10.1038/ncomms9292

32. Li Z, Quan G, Jiang X, Yang Y, Ding X, Zhang D, Wang X, Hardwidge PR, Ren W, Zhu G (2018). Effects of metabolites derived from gut microbiota and hosts on pathogens. Frontiers in Cellular and Infection Microbiology 8: 314. https://doi.org/10.3389/fcimb.2018.00314

33. Menzel P, Ng KL, Krogh A (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Communications 7: 11257. https://doi.org/10.1038/ncomms11257

34. Nakagawa S, Takaki Y, Shimamura S, Reysenbach AL, Takai K, Horikoshi K (2007). Deep-sea vent ε-proteobacterial genomes provide insights into emergence of pathogens. Proceedings of the National Academy of Sciences 104: 12146–12150. https://doi.org/10.1073/pnas.0700687104

35. Newton HJ et al. (2007). Sel1 Repeat Protein LpnE Is a Legionella pneumophila virulence determinant that influences vacuolar trafficking. Infection and Immunity 75: 5575–5585. https://doi.org/10.1128/IAI.00443-07

36. Newton HJ, Sansom FM, Dao J, McAlister AD, Sloan J, Cianciotto NP, Hartland EL (2007). The Calyptogena magnifica chemoautotrophic symbiont genome. Science 315: 998–1000. https://doi.org/10.1126/science.1138438

37. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, Edwards RA, Gerdes S, Parrello B, Shukla M, Vonstein V (2014). The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Research 42: D206–D214. https://doi.org/10.1093/nar/gkt1226

38. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25: 1043–1055. http://www.genome.org/cgi/doi/10.1101/gr.186072.114

39. Paster BJ, Bartoszyk IM, Dewhirst FE (1998). Identification of oral streptococci using PCR-based, reverse-capture, checkerboard hybridization. Methods in Cell Science 20: 223–231. https://doi.org/10.1023/A:1009715710555

40. Patra AK, Cho HH, Kwon YM, Kwon KK, Sato T, Kato C, Kang SG, Kim SJ (2016). Phylogenetic relationship between symbionts of tubeworm Lamellibrachia satsuma and the sediment microbial community in Kagoshima Bay. Ocean Science Journal 51: 317–332. https://doi.org/10.1007/s12601-016-0028-6

41. Pessione E (2012). Lactic acid bacteria contribution to gut microbiota complexity: lights and shadows. Frontiers in Cellular and Infection Microbiology 2: 86. https://doi.org/10.3389/fcimb.2012.00086

42. Pranal V, Fiala-Médioni A, Guezennec J (1996). Fatty acid characteristics in two symbiotic gastropods from a deep hydrothermal vent of the West Pacific. Marine Ecology Progress Series 142: 175–184. https://doi.org/10.3354/meps142175

43. Robertson BD, Meyer TF (1992). Genetic variation in pathogenic bacteria. Trends in Genetics 8: 422–427. https://doi.org/10.1016/0168-9525(92)90325-X

44. Ren D, Gong S, Shu J, Zhu J, Liu H, Chen P (2018). Effects of mixed lactic acid bacteria on intestinal microbiota of mice infected with Staphylococcus aureus. BMC Microbiology 18: 109. https://doi.org/10.1186/s12866-018-1245-1

45. Ruan J (2018). SMARTdenovo: Ultra-fast de novo assembler using long noisy reads. GitHub. Available at: https://github.com/ruanjue/smartdenovo [Accessed January 10, 2019].

46. Ruan J, Li H (2019). Fast and accurate long-read assembly with wtdbg2. Nature Methods: 1–4. https://doi.org/10.1038/s41592-019-0669-3

47. Stamatakis A, Ludwig T, Meier H (2005). RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21: 456–463. https://doi.org/10.1093/bioinformatics/bti191

48. Stern A, Liu B, Bakken L, Shapleigh J, Zhu J (2013). A novel protein protects bacterial iron-dependent metabolism from nitric oxide. Journal of Bacteriology 195: 4702–4708. https://doi.org/10.1128/JB.00836-13

49. Suzuki Y, Sasaki T, Suzuki M, Nogi Y, Miwa T, Takai K, Nealson KH, Horikoshi K (2005). Novel chemoautotrophic endosymbiosis between a member of the Epsilonproteobacteria and the hydrothermal-vent gastropod Alviniconcha aff. hessleri (Gastropoda: Provannidae) from the Indian Ocean. Applied and Environmental Microbiology 71: 5440–5450. https://doi.org/10.1128/AEM.71.9.5440-5450.2005

50. Tabatabai LB (2008). Identification of Pasteurella multocida CHAPS-soluble outer membrane proteins. Avian Diseases 52: 147–149. https://doi.org/10.1637/7892-012807-ResNote

51. Todesco T, Rao AV, Bosello O, Jenkins DJ (1991). Propionate lowers blood glucose and alters lipid metabolism in healthy subjects. The American Journal of Clinical Nutrition 54: 860–865. https://doi.org/10.1093/ajcn/54.5.860

52. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC (2017). GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33: 2202–2204. https://doi.org/10.1093/bioinformatics/btx153

53. Walter J (2008). Ecological role of Lactobacilli in the gastrointestinal tract: implications for fundamental and biomedical research. Applied and Environmental Microbiology 74: 4985–4996. https://doi.org/10.1128/AEM.00753-08

54. Wang JR, Holt J, McMillan L, Jones CD (2018). FMLRC: Hybrid long read error correction using an FM-index. BMC Bioinformatics 19: 50. https://doi.org/10.1186/s12859-018-2051-3

55. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ (2009). Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189–1191. https://doi.org/10.1093/bioinformatics/btp033

56. Wittenberg JB, Stein JL (1995). Hemoglobin in the symbiont-harboring gill of the marine gastropod Alviniconcha hessleri. The Biological Bulletin 188: 5–7. https://doi.org/10.1086/699326

57. Wu YW, Simmons BA, Singer SW (2016). MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32: 605–607. https://doi.org/10.1093/bioinformatics/btv638

58. Xu H, Luo X, Qian J, Pang X, Song J, Qian G, Chen J, Chen S (2012). FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS One 7: e52249. https://doi.org/10.1371/journal.pone.0052249

59. Yang Y, Sun J, Sun Y, Kwan YH, Wong WC, Zhang Y, Xu T, Feng D, Zhang Y, Qiu JW, Qian PY (2020). Genomic, transcriptomic, and proteomic insights into the symbiosis of deep-sea tubeworm holobionts. The ISME Journal 14: 135–150. https://doi.org/10.1038/s41396-019-0520-y

60. Yoshimura D, Kajitani R, Gotoh Y, Katahira K, Okuno M, Ogura Y, Hayashi T, Itoh T (2019). Evaluation of SNP calling methods for closely related bacterial isolates and a novel high-accuracy pipeline: BactSNP. Microbial Genomics 5: e000261. https://doi.org/10.1099/mgen.0.000261

61. Zimin AV, Marçais G, Puiu D, Roberts M, Salzberg SL, Yorke JA (2013). The MaSuRCA genome assembler. Bioinformatics 29: 2669–2677. https://doi.org/10.1093/bioinformatics/btt476
