Supplementary materials for:

Fidelity varies in the symbiosis between a gutless marine worm and its microbial consortium

Yui Sato*¹, Juliane Wippler¹, Cecilia Wentrup², Rebecca Ansorge¹, Miriam Sadowski¹, Harald Gruber-Vodicka¹, Nicole Dubilier*¹, Manuel Kleiner*³

¹Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany

²University of Vienna, Department of Microbiology and Ecosystem Science, Althanstr. 14, A-1090 Vienna, Austria

³Department of Plant and Microbial Biology, North Carolina State University, Raleigh, North Carolina, USA

Contents:

1. Supplementary text
1.1. Detection limit of symbionts based on single-copy marker genes
1.2. Assessment of symbiont community compositions based on 16S ribosomal RNA genes
1.3. Symbiont 16S ribosomal RNA gene sequences indicated a linkage between haplotypes of Candidatus Thiosymbion and host mitochondria
1.4. Phylogenetic reconstruction of mitochondria and symbionts based on SNPs identified using a deterministic genotyping approach
1.5. SNP-identification based on genotype probabilities enhances capabilities of population-level metagenomic analyses on host-associated microbiota
1.6. Estimation of the effective population size of symbionts within an Olavius algarvensis individual based on genome-wide SNP abundance
References for supplementary text

2. Supplementary figures
Supplementary Figure S1 Phylogenomic tree of symbionts in Olavius algarvensis in relation to reference bacterial genomes
Supplementary Figure S2 Phylogeny of 16S ribosomal RNA gene sequences for symbionts in Olavius algarvensis in relation to reference bacterial sequences
Supplementary Figure S3 Phylogenies of mitochondria and Candidatus Thiosymbion within the major mitochondrial lineages A and B
Supplementary Figure S4 Core SNP-trees based on called genotypes using a deterministic approach
Supplementary Figure S5 Correlation of mitochondrial pairwise genetic distances with corresponding genetic distances of symbionts in Olavius algarvensis
Supplementary Figure S6 Effective population size estimates of the symbiont per Olavius algarvensis individual
Supplementary Figure S7 Sequence alignments of 16S ribosomal RNA genes of Olavius algarvensis symbionts
Supplementary Figure S8 Distribution of mean relative read coverage among single-copy genes within a single host per symbiont
Supplementary Figure S9 Symbiont composition of individual Olavius algarvensis samples of two COI-haplotypes (A and B) from two locations (Sant’ Andrea and Cavoli)

3. Supplementary tables
Supplementary Table S1 Reference genome statistics of Olavius algarvensis symbionts
Supplementary Table S2 Assessment of strain variability of symbionts within Olavius algarvensis individuals
Supplementary Table S3 Comparison of SNP-identification methods

1. Supplementary text

1.1. Detection limit of symbionts based on single-copy marker genes

In the 80 metagenomes of Olavius algarvensis, we assessed (i) symbiont community composition and (ii) symbiont prevalence (the number of host individuals in which the respective symbiont species was detected; n = 20 hosts per host group). Both were assessed by quantifying sequences of single-copy genes (SCGs) that are specific to each symbiont species. Between 162 and 431 SCGs per symbiont species were extracted from their reference genomes. Means and deviations of SCG read coverages showed that a small subset of SCGs, especially in the Candidatus (Ca.) Thiosymbion symbiont, contain short repeat sequences that attract a high abundance of erroneously mapped reads (Supplementary Figure S8). These reference SCGs were excluded from symbiont abundance calculations to avoid overestimation. To this end, the relative abundance of each symbiont was calculated from the mean read coverage of SCGs in the interquartile depth range, i.e. genes whose depths ranked between the 25th and 75th percentiles, with all zero-coverage genes included in the ranking. Consequently, a symbiont is detected by this method only when more than 25% of the SCGs in its reference are covered by reads. This method provides a robust assessment of symbionts, although its detection limit is at times conservative. For example, the method by itself indicated the presence of the spirochete symbiont in all host individuals but one. In that one individual, however, sequences matching 29 of the 162 SCGs (18%) as well as the 16S rRNA gene were detected, and we therefore deemed the spirochete to be present. We thus concluded that the spirochete symbiont is present in all host individuals (Figure 2).
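The interquartile-range calculation described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function name `interquartile_mean_coverage`, the rounding of the quartile boundaries, and the example coverage values are all assumptions introduced here.

```python
def interquartile_mean_coverage(gene_coverages):
    """Mean coverage of genes ranked between the 25th and 75th percentiles.

    Zero-coverage genes stay in the ranking, so the result is 0 unless
    more than a quarter of the genes are covered by reads.
    """
    ranked = sorted(gene_coverages)
    n = len(ranked)
    if n == 0:
        raise ValueError("no single-copy genes supplied")
    quarter = n // 4                       # genes trimmed from each end (assumed rounding)
    middle = ranked[quarter:n - quarter]   # genes in the interquartile depth range
    return sum(middle) / len(middle)

# Hypothetical coverages: one repeat-containing gene with inflated coverage
# falls outside the interquartile range and does not affect the estimate.
inflated = interquartile_mean_coverage([10, 11, 9, 10, 12, 10, 11, 1000])

# 29 of 162 genes covered (hypothetical depth of 3 each): fewer than 25% of
# the genes are covered, so the interquartile mean is zero.
sparse = interquartile_mean_coverage([0] * 133 + [3] * 29)
```

Under these assumptions, a symbiont with 29 of 162 SCGs covered falls below the detection limit of this estimator even though reads are present, which is why additional evidence such as the 16S rRNA gene is informative in that case.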

1.2. Assessment of symbiont community compositions based on 16S ribosomal RNA genes

In addition to the SCG-based approach above, relative abundances of O. algarvensis symbionts were estimated by mapping metagenome reads to reference 16S ribosomal RNA gene (SSU) sequences of the symbionts. For this estimation, quality-filtered reads matching symbiont SSU sequences were quantified with Kallisto using representative SSU reference sequences (NCBI accession numbers: spirochete, AJ620502; Delta4, AJ620497; Gamma3, AJ620496; Delta3, AM493254; Delta1, AF328857; Ca. Thiosymbion, AF328856; the Delta1a and Delta1b symbionts, which share highly similar SSU sequences¹, were represented by AF328857). The results showed community structures nearly identical to those obtained from SCG sequences in all samples (Supplementary Figure S9, Figure 2). The only difference was that relative abundances of the spirochete symbiont appeared slightly greater in the SSU-based estimation than in the SCG-based estimation, which is likely due to a difference in SSU copy number between the spirochete symbiont and the other symbionts.
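The copy-number explanation can be illustrated with a small Python sketch. The cell fractions and SSU copy numbers below are hypothetical values chosen for illustration, not measurements from this study, and `ssu_based_fractions` is a name introduced here.

```python
def ssu_based_fractions(cell_fractions, ssu_copies):
    """Relative abundances as they would appear from SSU read counts.

    Each symbiont contributes SSU reads in proportion to
    (fraction of cells) x (SSU copies per genome).
    """
    weighted = {name: cell_fractions[name] * ssu_copies[name]
                for name in cell_fractions}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}

# Hypothetical community: cell fractions, which single-copy genes reflect.
cells = {"symbiont_X": 0.9, "symbiont_Y": 0.1}
# Hypothetical SSU copy numbers per genome.
copies = {"symbiont_X": 1, "symbiont_Y": 2}

apparent = ssu_based_fractions(cells, copies)
```

In this toy case the symbiont with two SSU copies appears more abundant in the SSU-based estimate than its cell fraction, while the SCG-based estimate is unaffected by SSU copy number.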

1.3. Symbiont 16S ribosomal RNA gene sequences indicated a linkage between haplotypes of Candidatus Thiosymbion and host mitochondria

As an initial assessment of partner fidelity, we assembled SSU sequences of the symbionts in each O. algarvensis sample and searched for SNPs that are characteristic of certain host groups (COI-haplotypes and locations). Only the Ca. Thiosymbion symbiont showed a SNP site in its SSU sequence that was linked to the host COI-haplotype but not to location; no other symbiont showed SNPs in SSU sequences that could be linked to any host group (Supplementary Figure S7).

1.4. Phylogenetic reconstruction of mitochondria and symbionts based on SNPs identified using a deterministic genotyping approach

In addition to the probabilistic approach to SNP identification, we identified SNPs for mtDNA and symbionts using a deterministic genotyping approach for phylogenetic reconstruction, in order to compare the results of the two methods. For SNP identification by deterministic genotyping, the same symbiont and mitochondrial reads were analyzed with the SNIPPY pipeline v3.2 (https://github.com/tseemann/snippy), using the same reference genomes of mtDNA and symbionts as described in the main document. Genotypes were first called at all reference nucleotide positions with a minimum coverage of 5× in each sample. Core SNP sites were subsequently identified among genotype-called sites that were covered ≥5× in all samples. As in the probabilistic approach described in the main document, when no core SNP site was found, samples with insufficient genotype data were excluded using a cut-off on lateral coverage (% of reference sites with coverage ≥5×) for SNP identification of a given symbiont. Phylogenetic trees with bootstrap support were computed from the resulting core SNP nucleotide alignment with IQ-TREE v1.5.5² using the best-model finder implemented within IQ-TREE. Phylogenetic trees of the symbionts and mitochondria were visualized in the iTOL web tool³.
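The logic of core SNP identification and the lateral-coverage cut-off can be sketched in Python as follows. This is an illustrative re-implementation under simplifying assumptions (one called base and one depth value per reference position), not the code of the SNIPPY pipeline; the function names, the toy data, and the 80% cut-off used in the example are hypothetical.

```python
def lateral_coverage(depths, min_depth=5):
    """Fraction of reference sites covered at least min_depth times."""
    return sum(d >= min_depth for d in depths) / len(depths)

def core_snp_sites(genotypes, depths, min_depth=5):
    """Positions covered >= min_depth in every sample that carry more than one allele.

    genotypes: {sample: string of called bases, one per reference position}
    depths:    {sample: list of read depths, one per reference position}
    """
    samples = list(genotypes)
    n_sites = len(genotypes[samples[0]])
    core = []
    for pos in range(n_sites):
        # A site is "core" only if every sample is sufficiently covered there.
        if all(depths[s][pos] >= min_depth for s in samples):
            alleles = {genotypes[s][pos] for s in samples}
            if len(alleles) > 1:
                core.append(pos)
    return core

# Hypothetical toy data: three samples, four reference positions.
genotypes = {"s1": "ACGT", "s2": "ATGT", "s3": "ACGA"}
depths = {"s1": [6, 7, 5, 2], "s2": [9, 5, 8, 9], "s3": [5, 5, 5, 5]}

sites = core_snp_sites(genotypes, depths)
# Samples passing a hypothetical 80% lateral-coverage cut-off.
kept = [s for s in depths if lateral_coverage(depths[s]) >= 0.8]
```

In the toy data, the last position is variable but is covered only 2× in sample s1, so it is not a core site. This is the mechanism by which low-coverage samples reduce the number of core SNPs, and excluding such samples with a lateral-coverage cut-off can recover sites.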

Because the deterministic genotype-calling method (i) relies on deeply sequenced genomic sites to account for sequencing errors and (ii) identifies core SNP sites only at loci where genotypes were called in all samples, it yields more conservative and more robust SNP identification. However, it also limits the number of detectable SNP sites when low-coverage sequences are studied. In our study, this approach required us to exclude many samples in which no SNP site could be detected because of low sequence coverage or because certain symbionts were absent. Consequently, the numbers of samples and SNPs that we could include in downstream analyses were substantially reduced with the deterministic genotyping approach compared with the genotype-probability-based method (Suppl. Table S3). An exception was the spirochete symbiont, for which slightly more SNPs were captured by the genotype-calling method (99 sites) than by the genotype-probability-based approach (88 sites). This was likely due to the exceptionally high genetic variability of the spirochete symbionts within and between host groups (see Figure 4g), which caused many variable sites to be removed when the probabilistic approach filtered out statistically insignificant SNP sites. Nevertheless, phylogenies based on the limited SNPs and samples from the deterministic genotyping method showed overall clustering patterns comparable to those obtained with the probabilistic approach (Suppl. Figure S4). Specifically, the core-SNP trees reproduced the distinct divergence of mitochondria and Ca. Thiosymbion by host COI-haplotype (Suppl. Figure S4a and S4b), the location-based clustering of Delta4 (Suppl. Figure S4f), and even the clustering of Gamma3 from one B-host from Cavoli within a clade of A-hosts from Cavoli (Suppl. Figure S4c; magnified panel). As with the probabilistic SNP identification, phylogenetic relationships of mitochondria and Ca. Thiosymbion within the same mtDNA lineage were not resolved using the deterministic genotyping method (Suppl. Figure S3c and S3d).

1.5. SNP-identification based on genotype probabilities enhances capabilities of population-level metagenomic analyses on host-associated microbiota

Metagenomic studies of infra-specific microbial variation at the genome-wide SNP level often rely on deeply sequenced reads from abundant species⁴. In this study, the phylogenetic reconstruction of symbionts based on genome-wide SNPs was inferred directly from posterior genotype probabilities. This probabilistic approach to SNP identification enabled us to identify SNPs from highly replicated metagenomes with moderate sequencing depths. In most cases, we were able to analyze more individual samples using more SNP sites than with a conventional genotype-calling approach, as highlighted in section 1.4. Furthermore, phylogenetic patterns inferred with the probabilistic method were comparable with those obtained using the deterministic genotyping method, which is a more conservative approach with a higher requirement for sequencing depth. The present study thus showcases the potential of probabilistic SNP-identification methods for population genetic studies of complex host-associated microbiota, for which it is often difficult to obtain sufficient coverage of the target microbial species.

1.6. Estimation of the effective population size of symbionts within an O. algarvensis individual based on genome-wide SNP abundance

We estimated the effective population size (Ne) of each symbiont species based on genome-wide SNP abundance within a sample. Ne is the hypothetical number of individuals in an idealized population for which a certain quantity of interest equals that observed in the actual population. In our case, the quantity of interest is the nucleotide diversity of a symbiotic bacterial population, and the idealized population is one in which all genetic mutations are neutral. Calculation of Ne for haploid species such as Bacteria and Archaea is challenging⁵, but for simplicity we followed the method of Bobay and Ochman⁵, using Watterson’s estimator θ to calculate Ne from the equation θ = 2 Ne µ, where µ is the mutation rate⁶. We assumed that the mutation rates of all symbionts in our study are similar, based on a phylogenomic tree that places them among other representative bacterial genomes (Suppl. Figure S1). Specifically, we generated a phylogenomic tree of O. algarvensis symbiont genomes together with reference bacterial genomes, including ones for which mutation rates are available⁷, and used the genome of the archaeon Nitrososphaera viennensis as the outgroup. The phylogeny was estimated from 25 single-copy genes universally distributed in Bacteria and Archaea, as implemented in GToTree v1.5.38⁸. The resulting tree showed that (i) the published mutation rates for the previously studied bacterial isolates are all of a similar order of magnitude, around 10⁻¹⁰ mutations per site per generation, (ii) the mutation rates and the branch lengths from the outgroup to these isolate genomes do not show a positive correlation within this range, and (iii) the branch lengths to all symbionts are within the range of those for the bacterial isolates with published mutation rates. Based on these observations, we estimated the mutation rate µ of all symbionts as the average of the published mutation rates, 2.73×10⁻¹⁰ per site per generation.
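As a worked check of the arithmetic, the following Python sketch averages the mutation rates printed on the tip labels of Supplementary Figure S1 and rearranges θ = 2 Ne µ for Ne. The per-site θ value in the example is hypothetical and serves only to illustrate the order of magnitude; the function names are introduced here and are not from the original analysis.

```python
# Published mutation rates (per nucleotide site per generation) as labelled
# on the reference genomes in Supplementary Figure S1.
PUBLISHED_RATES = {
    "Bacillus subtilis": 3.35e-10,
    "Staphylococcus epidermidis": 7.40e-10,
    "Mycobacterium tuberculosis": 1.95e-10,
    "Mycolicibacterium smegmatis": 5.27e-10,
    "Agrobacterium tumefaciens": 2.92e-10,
    "Burkholderia cenocepacia": 1.33e-10,
    "Pseudomonas aeruginosa": 0.79e-10,
    "Salmonella enterica": 1.74e-10,
    "Escherichia coli": 2.00e-10,
    "Vibrio cholerae": 1.15e-10,
    "Aliivibrio fischeri": 2.08e-10,
}

def mean_mutation_rate(rates):
    """Arithmetic mean of the published per-site mutation rates."""
    return sum(rates.values()) / len(rates)

def effective_population_size(theta, mu):
    """Solve theta = 2 * Ne * mu for Ne."""
    return theta / (2.0 * mu)

mu = mean_mutation_rate(PUBLISHED_RATES)

# Hypothetical per-site Watterson's theta, for illustration only.
theta_example = 1.0e-5
ne_example = effective_population_size(theta_example, mu)
```

The mean of the eleven rates rounds to 2.73×10⁻¹⁰. With this µ, a per-site θ of 10⁻⁵ corresponds to an Ne between 10⁴ and 10⁵.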

Ne estimates of O. algarvensis symbionts were on the order of 10⁴ to 10⁵ (Suppl. Figure S6), which is, as expected, much lower than the Ne values on the order of 10⁸ to 10⁹ reported for the census populations of most bacterial species by Bobay and Ochman⁵. Ne reflects both the actual population size and fluctuations of population size over time. In the case of endosymbionts such as those associated with O. algarvensis, we hypothesise that the Ne per host individual is driven by (i) the population size at the original infection, whether through vertical or horizontal transmission, (ii) the genetic diversity of the source population(s), (iii) bacterial growth within the host after the symbiosis is established, and/or (iv) ongoing horizontal symbiont acquisition from an environmental source. Although it is not feasible to disentangle the individual drivers of Ne from simple comparisons of Ne between symbionts, Ne can be linked to the presence and magnitude of certain drivers when combined with knowledge about them. For example, the relatively large Ne estimates of Ca. Thiosymbion may stem from its numerical dominance in an adult host individual and in the inoculum during vertical transmission, as our study provides strong support for vertical transmission of this symbiont (see discussion in the main document). Other explanations of the estimated Ne values are possible, however, and elucidating the drivers of Ne will require identification of the symbiont transmission mode, cell counts at the onset of symbiosis and in an adult host, and the genetic diversity of free-living symbionts if they are taken up by the host from the environment.

References for supplementary text

1 Sato, Y., Wippler, J., Wentrup, C., Dubilier, N. & Kleiner, M. High-Quality Draft Genome Sequences of Two Deltaproteobacterial Endosymbionts, Delta1a and Delta1b, from the Uncultured Sva0081 Clade, Assembled from Metagenomes of the Gutless Marine Worm Olavius algarvensis. Microbiology Resource Announcements 9, e00276-00220, doi:10.1128/mra.00276-20 (2020).

2 Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution 32, 268-274, doi:10.1093/molbev/msu300 (2015).

3 Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47, W256-W259, doi:10.1093/nar/gkz239 (2019).

4 Van Rossum, T., Ferretti, P., Maistrenko, O. M. & Bork, P. Diversity within species: interpreting strains in microbiomes. Nature Reviews Microbiology 18, 491-506, doi:10.1038/s41579-020-0368-1 (2020).

5 Bobay, L.-M. & Ochman, H. Factors driving effective population size and pan-genome evolution in bacteria. BMC evolutionary biology 18, 153-153, doi:10.1186/s12862-018-1272-4 (2018).

6 Watterson, G. A. On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7, 256-276, doi:10.1016/0040-5809(75)90020-9 (1975).

7 Lynch, M. et al. Genetic drift, selection and the evolution of the mutation rate. Nat Rev Genet 17, 704-714, doi:10.1038/nrg.2016.104 (2016).

8 Lee, M. D. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics 35, 4162-4164, doi:10.1093/bioinformatics/btz188 (2019).

100 Borrelia recurrentis (GCF_000019705) Borrelia turcica (GCF_003606285) 100 Spirochaeta thermophila (GCF_000147075) 100 Treponema primitia (GCF_000214375) 100 100 Treponema caldarium (GCF_000219725) 100 Treponema medium (GCF_000413035)

78 Treponema phagedenis (GCF_008153345) Spirochaetes 100 Spirochaeta africana (GCF_000242595)

72 Salinispira pacifica (GCF_000507245) Sediminispirochaeta smaragdinae (GCF_000143985) 77 97 Spirochaeta perfilievii (GCF_008329945) 100 Oceanispirochaeta sp. K2 (GCF_008329965) Spirochete symbiont of Olavius algarvensis

100 Lactiplantibacillus plantarum (GCA_000023085) 100 Bacillus subtilis (GCA_000009045) [3.35E-10]* Firmicutes 98 Staphylococcus epidermidis (GCF_006094375) [7.40E-10]*

100 Streptomyces griseus (GCA_000010605) 100 Mycobacterium tuberculosis (GCF_000195955) [1.95E-10]* Actinobacteria Mycolicibacterium smegmatis (GCF_001457595) [5.27E-10]* Desulfatiglans anilini (GCF_000422285) Desulfatibacillum aliphaticivorans (GCF_000021905) 100 80 Desulforegula conservatrix (GCF_000426225)

100 Desulfobotulus mexicanus (GCF_006175995) 100 97 Desulfoluna spongiiphila (GCF_902498735) 75 100 Desulfobacula phenolica (GCF_900105645) 100 Desulfobacterium autotrophicum (GCF_000020365) 91 Desulfamplus magnetovallimortis (GCF_900170035) 82 Desulfococcus oleovorans (GCF_000018405) Desulfosalsimonas propionicica (GCF_013761005) Desulfatirhabdium butyrativorans (GCF_000429925) 87 100 Desulfatitalea tepidiphila (GCF_001293685) 100 Desulfosarcina widdelii (GCF_009688965) 100 97 Desulfosarcina ovata (GCF_009689005) 50 68 Delta3 symbiont of Olavius algarvensis Delta4 symbiont of Olavius algarvensis 100 Desulfococcus multivorans (GCF_001854245) 58 Desulfonema ishimotonii (GCF_003851005) 50 100 Delta1a symbiont of Olavius algarvensis Delta1b symbiont of Olavius algarvensis

100 Rhodopseudomonas palustris (GCA_000020445) 100 Rhizobium leguminosarum (GCA_000009265) Alphaproteobacteria Agrobacterium tumefaciens (GCA_007002865) [2.92E-10]* 100 Burkholderia cenocepacia (GCA_001992495) [1.33E-10]* Betaproteobacteria Pleionea sediminis (GCF_007570825) 78 Leucothrix mucor (GCF_000419525) 100 100 Pseudomonas aeruginosa (GCF_000006765) [0.79E-10]* 94 100 Salmonella enterica (GCF_000006945) [1.74E-10]* 100 Escherichia coli (GCF_000008865) [2.00E-10]* 100 Vibrio cholerae (GCF_000829215) [1.15E-10]* 96 Aliivibrio fischeri (GCF_004359415) [2.08E-10]*

71 Gamma3 symbiont of Olavius algarvensis 100 Thioalkalivibrio sulfidiphilus (GCF_000021985) Thioalkalivibrio thiocyanodenitrificans (GCF_000378965) 100 94 Thioalkalivibrio denitrificans (GCF_002000365) 66 Ca. Thioglobus singularis (GCF_001682155) Methyloprofundus sedimenti (GCF_002072955)

58 Sulfuriflexus mobilis (GCF_003967195) 51 99 Thiohalophilus thiocyanatoxydans (GCF_004366735) Thiohalomonas denitrificans (GCF_900102855) Thiogranum longum (GCF_004339085) 57 100 Symbiont of Solemya velesiana gill (GCF_002020805) 100 Sedimenticola thiotaurini (GCF_001007875) Gammaproteobacteria 89 Sedimenticola selenatireducens (GCF_007625115) 100 99 Thiolapillus brandeum (GCF_000828615) Symbiont of Solemya velum gill (GCF_002019535) 100 Symbiont of Riftia pachyptila (GCF_000224455) 50 Ca. Thiodiazotropha endoloripes (GCF_001708985) 100 Ca. Thiosymbion symbiont of Olavius algarvensis 91 Ca. Thiosymbion oneisti (GCF_900092655) 100 100 Thiorhodovibrio sp. 970 (GCF_000228725) Thiohalocapsa marina (GCF_008632335)

48 Thioflavicoccus mobilis (GCF_000327045) Lamprocystis purpurea (GCF_000379525) 99 Thiocapsa marina (GCF_000223985) 100 Marichromatium purpuratum (GCF_000224005) 88 100 Thiorhodococcus minor (GCF_010820565) 47 Thiorhodococcus mannitoliphagus (GCF_010915725) 100 Allochromatium vinosum (GCA_000025485)

95 Thiorhodococcus drewsii (GCF_000224065) 100 Imhoffiella purpurea (GCF_000585215)

Range B

Range A

0.5

Supplementary Figure S1 Phylogenomic tree of symbionts in Olavius algarvensis in relation to reference bacterial genomes. The phylogeny is calculated based on amino acid sequences of 25 protein coding genes that are universally present in Bacteria and Archaea. Assembly accession numbers are indicated in brackets. The outgroup is the archaeon Nitrososphaera viennensis (GCA_000698785), indicated with the arrow. The error bar indicates 0.5 amino acid replacement per amino acid site. The genomes of O. algarvensis symbionts are highlighted with red bold fonts and red dots at the branch terminal. Asterisks and blue dots denote reference bacteria for which mutation rates, as indicated in square brackets [mutation per nucleotide site per generation], are available (Lynch et al. 2016). Note that the range of branch lengths for these reference bacteria (Range A) encompasses the range of branch lengths for the O. algarvensis symbiont genomes (Range B). Branch support values indicate IQ-TREE Ultrafast Bootstrap estimates with 1,000 times replication. 100 Spirochaeta asiatica NR_026300 Spirochaeta dissipatitropha NR_043329 98 Spirochaeta halophila NR_044756

68 100 Spirochaeta isovalerica NR_104798 Spirochaeta psychrophila NR_134185 70 100 Spirochete symbiont of Olavius crassitunicatus AJ620512 Spirochete symbiont of Olavius loisae AF104475 99 59 Spirochete symbiont of Olavius algarvensis AJ620502 Spirochaetes Uncl. brackish water floc AB491844 Uncl. marine sediment JQ580008 60 96 100 Cytophaga sp. BHI80-3 associated with Alvinella pompejana tubes AJ431238

63 Uncl. diffuse vent surface KT257857 Uncl. marine sediment KP091163

46 Uncl. marine sediment JF268344 47 Uncl. submarine mud volcano sediment HQ588391

95 Uncl. marine sediment GQ246372 Uncl. mangrove sediment DQ811815 Uncl. marine sediment JF344682 96 100 Uncl. marine sediment JF344334 100 Uncl. marine sediment JQ580141

100 Uncl. marine sediment JQ580306 Delta1b symbiont of Olavius algarvensis MW411191 100 Uncl. marine sediment JQ580044 100 Delta1a symbiont of Olavius algarvensis AF328857 99 Delta1 symbiont of Olavius ilvae AJ620500 100 40 Uncl. seawater AY907763 100 Uncl. marine sediment JF344679 57 Uncl. marine sediment JF344572 56 60 Uncl. marine sediment KX097286 Uncl. marine sediment KC470928 97 100 Delta1 symbiont of Olavius crassitunicatus AJ620511 Uncl. marine sediment JF344410 Uncl. marine sediment JQ580056 100 Delta3 symbiont of Olavius algarvensis AM493254 100 Delta3 symbiont of Olavius ilvae AJ620501 Deltaproteobacteria 100 Delta3 symbiont of Inanidrilus exumae FM202060 47 Uncl. hypersaline microbial mat EU246034 Uncl. marine sediment JF344320 38 Uncl. marine sediment JF344673 Desulfosarcina alkanivorans NR_157797 72 100 Desulfosarcina widdelii NR_157796

66 95 Desulfosarcina ovata NR_037125 100 Desulfosarcina cetonica NR_028896 97 Desulfosarcina variabilis NR_044680 65 97 100 Uncl. marine sediment EU290686 Uncl. marine sediment FR823374 83 100 Uncl. marine sediment JQ580521 Desulfobacterium indolicum NR_028897 86 Uncl. marine sediment KX097497 100 Uncl. submarine mud volcano sediment FJ712498 73 Delta4 symbiont of Olavius algarvensis AJ620497 95 Uncl. marine sediment KP009614 83 Uncl. submarine mud volcano sediment FJ712404 90 Uncl. marine sediment KF268892

Thiolapillus brandeum NR_148757 94 Thioprofundum hispidum NR_112620 98 Sedimenticola selenatireducens NR_041877 100 87 Thioalkalivibrio sulfidiphilus NR_074692 Endothiovibrio diazotrophicus NR_148570 AJ620496 90 Gamma3 symbiont of Olavius algarvensis 100 Uncl. marine sediment JF344151 85 Uncl. marine sediment KT906693

53 Uncl. marine sediment AM039958 45 Uncl. marine sediment DQ351743 99 45 Uncl. marine sediment AM039959 48 53 Uncl. marine sediment EU734971

82 Uncl. marine sediment KR825167 Uncl. marine sediment KX088580 100 Thiococcus pfennigii NR_036977

98 Thioflavicoccus mobilis NR_102479

95 Halochromatium glycolicum NR_044896 Thiorhodovibrio winogradskyi NR_037050 99 Thiohalocapsa halophila NR_115076 100 Ca. Thiosymbion ectosymbiont of Catanema sp. ‘crete 1’ LR746258 Gammaproteobacteria Ectosymbiont of Catanema sp. ‘belize 1’ KP943972 74 100 Ca. Thiosymbion oneisti KF278591 Gamma1 symbiont of Olavius ilvae AJ620498 100 98 Bacterium associated with Laxus cosmopolitus FM955323 83 Ectosymbiont of Catanema sp. ‘st andrea’ KP943973 66 Ectosymbiont of Robbea sp. 2 EU711426 76 Symbiont of Inanidrilus makropetalos AJ890094 46 100 Symbiont of Inanidrilus leukodermatus AJ890100 AF328856 84 Ca. Thiosymbion symbiont of Olavius algarvensis Ectosymbiont of Robbea sp. 3 EU711428 100 Ectosymbiont of Robbea hypermnestra KP943980 Ca. Thiosymbion ectosymbiont of Paralaxus cocos LR746254 84 100 Ca. Thiosymbion ectosymbiont of Paralaxus bermudensis LR746249 56 Ca. Thiosymbion ectosymbiont of Paralaxus sp. ‘heron 1’ LR746251 86 Ectosymbiont of Leptonemella cf. juliae KP943977 Ectosymbiont of Leptonemella cf. vestari KP943976 100 98 Ca. Thiosymbion ectosymbiont of Leptonemella aphanothecae LR746248 100 Ectosymbiont of Leptonemella vicina KU921521

0.1

Supplementary Figure S2 Phylogeny of 16S rRNA gene sequences for symbionts in Olavius algarvensis in relation to reference bacterial sequences. NCBI sequence accession numbers are indicated in the labels. The outgroup is the archaeon Nitrososphaera viennensis (FR773157), indicated with the arrow. The error bar indicates 0.1 base pair replacement per nucleotide. The sequences of O. algarvensis symbionts are highlighted with red bold fonts. Sequences of bacteria associated with host animals are highlighted with yellow labels. “Uncl.” denotes an uncultured bacterium. Branch support values indicate IQ-TREE Ultrafast Bootstrap estimates with 1,000 replications. (a) Mitochondrial cladograms based on SNPs identified from genotype probabilities (b) Ca. Thiosymbion cladograms based on SNPs identified from genotype probabilities

Lineage A OalgCAVL_A11 Lineage A OalgSANT_A13 OalgCAVL_A30 100 OalgCAVL_A20

100 OalgCAVL_A21 100 OalgSANT_A23 26 35 OalgCAVL_A26 OalgCAVL_A19 OalgCAVL_A22 85 65 26 OalgCAVL_A17

79 OalgCAVL_A16 OalgCAVL_A18 91 OalgCAVL_A12 53 4 34 OalgSANT_A17

OalgCAVL_A23 5 OalgSANT_A20 63 OalgSANT_A11 OalgSANT_A19 98 14 OalgSANT_A16 10 OalgSANT_A18 69 42 4 OalgSANT_A19 32 OalgSANT_A14 63 OalgSANT_A13 OalgSANT_A21 50 OalgSANT_A22 73 OalgCAVL_A21 OalgSANT_A04 OalgCAVL_A26 63 52 79 21 54 OalgSANT_A24 47 OalgCAVL_A22 OalgSANT_A25 32 43 OalgCAVL_A30 OalgSANT_A21 27 OalgCAVL_A12 53 OalgSANT_A14 8 OalgCAVL_A23 45 OalgSANT_A23 70 OalgCAVL_A11 OalgSANT_A09 OalgCAVL_A16 41 44 OalgSANT_A15 OalgCAVL_A29 9 26 OalgSANT_A18 3 44 OalgCAVL_A14 OalgSANT_A20 OalgCAVL_A25 55 2 OalgSANT_A12 79 OalgCAVL_A13 OalgSANT_A06 0 28 60 20 OalgCAVL_A27 73 OalgSANT_A08 OalgCAVL_A28 0 61 OalgSANT_A05 OalgCAVL_A24 OalgSANT_A17 1 OalgCAVL_A15 OalgCAVL_A18 0 OalgSANT_A09 78 OalgCAVL_A17 OalgSANT_A12 2 27 65 19 OalgCAVL_A20 70 OalgSANT_A06 OalgCAVL_A27 35 OalgSANT_A05 0 1 26 OalgCAVL_A25 OalgSANT_A08 OalgCAVL_A29 28 OalgSANT_A15 OalgCAVL_A14 5 50 OalgSANT_A04 13 OalgCAVL_A24 16 OalgSANT_A24 60 45 OalgCAVL_A13 OalgSANT_A16 5 26 OalgCAVL_A28 10 OalgSANT_A22 91 OalgCAVL_A15 36 OalgSANT_A11 OalgCAVL_A19 OalgSANT_A25

Lineage B Lineage B OalgCAVL_B25 OalgCAVL_B15 100 OalgCAVL_B15 OalgSANT_B11 100 OalgCAVL_B11 99 OalgCAVL_B20

0 OalgCAVL_B24 51 OalgCAVL_B29

78 OalgCAVL_B27 41 OalgCAVL_B28 45 OalgCAVL_B22 60 OalgCAVL_B17 60 OalgCAVL_B19 OalgCAVL_B13 10 0 OalgCAVL_B23 52 83 OalgCAVL_B23 60 OalgCAVL_B29 53 OalgCAVL_B18 0 OalgCAVL_B21 0 OalgCAVL_B20 0 16 50 OalgCAVL_B12 27 23 OalgCAVL_B27 OalgCAVL_B16 43 OalgCAVL_B24

OalgCAVL_B26 11 OalgCAVL_B19 0 OalgCAVL_B21 34 OalgCAVL_B16 0 4 OalgCAVL_B25 45 OalgCAVL_B22 26 21 OalgCAVL_B28 10 OalgCAVL_B14 26 17 OalgCAVL_B11 39 OalgCAVL_B18 79 OalgCAVL_B14 OalgCAVL_B26 OalgCAVL_B13 35 OalgCAVL_B12 OalgCAVL_B31 OalgCAVL_B17 15 OalgSANT_B16 OalgCAVL_B31 76 OalgSANT_B05 11 OalgSANT_B13 43 76 14 OalgSANT_B06 37 OalgSANT_B21 OalgSANT_B14 26 26 OalgSANT_B08 54 OalgSANT_B09 73 OalgSANT_B11 12 31 OalgSANT_B18 0 OalgSANT_B14 97 OalgSANT_B23 OalgSANT_B17 6 13 7 OalgSANT_B16 0 OalgSANT_B09 OalgSANT_B17 6 OalgSANT_B20 28 OalgSANT_B12 OalgSANT_B23 0 OalgSANT_B21 OalgSANT_B18 76 OalgSANT_B24 OalgSANT_B24 0 9 OalgSANT_B04 76 OalgSANT_B22 8 22 OalgSANT_B08 76 OalgSANT_B13 0 0 OalgSANT_B07 0 OalgSANT_B15 OalgSANT_B05 0 OalgSANT_B04 2 OalgSANT_B20 OalgSANT_B06 2 0 OalgSANT_B19 OalgSANT_B07 15 0 35 OalgSANT_B15 44 OalgSANT_B12 OalgSANT_B22 OalgSANT_B19

(c) Mitochondrial cladograms based on SNPs identified from deterministic genotype-calling (d) Ca. Thiosymbion cladograms based on SNPs identified from deterministic genotype-calling

Lineage A OalgCAVL_A21 OalgCAVL_A18 96 Lineage A OalgCAVL_A26 100 OalgSANT_A11 34 OalgCAVL_A22 49 OalgCAVL_A16 45 46 OalgCAVL_A16 OalgSANT_A05 90 OalgCAVL_A30 47 44 OalgSANT_A20 100 94 OalgCAVL_A12 55 OalgSANT_A08 OalgCAVL_A23 48 OalgSANT_A09 OalgSANT_A21 45 OalgSANT_A14 OalgSANT_A23 OalgSANT_A23 80 20 20 OalgSANT_A24 63 OalgCAVL_A14 OalgSANT_A25 40 85 OalgCAVL_A17 85 35 OalgSANT_A11 50 OalgCAVL_A29 24 23 OalgSANT_A04 OalgCAVL_A13 47 20 OalgSANT_A22 OalgCAVL_A15 44 27 25 OalgSANT_A13 OalgCAVL_A25 OalgSANT_A16 65 69 OalgCAVL_A20 43 OalgSANT_A14 72 OalgSANT_A15

64 OalgSANT_A18 91 OalgCAVL_A27 89 OalgSANT_A09 97 OalgCAVL_A24 OalgSANT_A20 OalgCAVL_A28 21 OalgSANT_A12 39 OalgCAVL_A21 99 OalgSANT_A18 96 OalgSANT_A05 77 OalgSANT_A06 27 OalgSANT_A12 26 OalgSANT_A22 OalgSANT_A08 59 28 OalgSANT_A15 45 OalgSANT_A24 36 OalgCAVL_A18 62 OalgSANT_A13 81 OalgCAVL_A28 90 OalgSANT_A04 68 48 OalgCAVL_A29 OalgSANT_A06 59 OalgCAVL_A20 49 OalgCAVL_A30 38 OalgCAVL_A14 OalgSANT_A16 52 92 OalgCAVL_A25 79 OalgCAVL_A12 48 OalgCAVL_A19 94 38 OalgCAVL_A13 OalgCAVL_A22 94 OalgCAVL_A15 57 34 OalgCAVL_A26 OalgCAVL_A19 62 58 OalgSANT_A21 61 OalgCAVL_A24 90 OalgCAVL_A17 OalgCAVL_A23 OalgCAVL_A27 OalgSANT_A25

Lineage B Lineage B OalgCAVL_B25 OalgSANT_B14 OalgCAVL_B21 100 100 OalgSANT_B15 12 OalgCAVL_B15 91 OalgSANT_B17 OalgCAVL_B26 90 OalgSANT_B11 98 24 OalgCAVL_B12 88 OalgSANT_B16 14 OalgCAVL_B31 99 OalgSANT_B13 84 OalgCAVL_B19 46 OalgSANT_B23 OalgCAVL_B23 71 5 OalgSANT_B21 OalgCAVL_B13 71 13 OalgSANT_B22 OalgCAVL_B16 18 45 44 51 OalgSANT_B18 21 OalgCAVL_B14 21 OalgSANT_B24 OalgCAVL_B17 85 OalgSANT_B06 OalgCAVL_B20 90 13 OalgSANT_B12 OalgCAVL_B18 50 51 OalgSANT_B04 OalgCAVL_B29 73 63 OalgSANT_B05 OalgCAVL_B11 69 OalgSANT_B07 OalgCAVL_B22 71 OalgSANT_B08 OalgCAVL_B28 13 OalgSANT_B09 OalgCAVL_B27 OalgSANT_B19 11 OalgSANT_B20 32 OalgSANT_B20 OalgSANT_B13 44 98 OalgCAVL_B31 OalgSANT_B12 93 10 OalgCAVL_B27 28 22 OalgSANT_B06 88 OalgCAVL_B26 OalgSANT_B15 27 OalgCAVL_B11 19 OalgSANT_B05 16 38 OalgCAVL_B13 OalgSANT_B09 70 17 67 OalgCAVL_B25 OalgSANT_B18 12 OalgCAVL_B29 21 33 OalgSANT_B11 37 91 OalgCAVL_B18 OalgSANT_B14 98 OalgCAVL_B21 OalgSANT_B07 67 OalgCAVL_B23 OalgSANT_B19 OalgCAVL_B12 OalgSANT_B17 37 43 OalgCAVL_B14 OalgSANT_B21 OalgCAVL_B19 OalgSANT_B23 76 41 OalgCAVL_B15 OalgSANT_B16 80 OalgCAVL_B20 OalgSANT_B24 25 OalgCAVL_B28 OalgSANT_B08 61 OalgCAVL_B22 OalgSANT_B04 65 OalgCAVL_B16 OalgSANT_B22 OalgCAVL_B17

Supplementary Figure S3 Sub-clades within the major mitochondrial lineages A and B were poorly supported in host mitochondrial phylogenies based on a probabilistic approach to SNP identification (a) and on deterministic genotype-calling (c), as well as in Ca. Thiosymbion phylogenies based on the probabilistic approach to SNP identification (b) and on deterministic genotype-calling (d). Bootstrap support values above 95% are highlighted in bold. Branches without clade support values result from polytomies. Branch lengths are transformed to uniform total lengths, proportionally to the branch lengths in the phylogenies shown in Figure 3 and Suppl. Figure S4, to improve the visibility of bootstrap support values.

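[Figure S4 image not reproduced. Panels: a; Mitochondria, b; Ca. Thiosymbion, c; Gamma3, d; Delta1a, e; Delta1b, f; Delta4, g; Spirochete. Colour legend: COI-haplotype (A, B) and location (Sant’ Andrea, Cavoli).]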

Supplementary Figure S4 Core SNP-trees based on called genotypes from bases with >5× coverage (SNIPPY v3.2) for a; mitochondria (121 SNPs, >80% cut-off*, 76 samples), b; Ca. Thiosymbion (391 SNPs, >29% cut-off*, 76 samples), c; Gamma3 (73 SNPs, >37% cut-off*, 72 samples), d; Delta1a (121 SNPs, >30% cut-off*, 16 samples), e; Delta1b (57 SNPs, >50% cut-off*, 25 samples), f; Delta4 (177 SNPs, >60% cut-off*, 45 samples), and g; Spirochete (99 SNPs, >5% cut-off*, 41 samples). Bootstrap support values above 95 (up to 100) are shown in black, and values above 85 but below 95 in grey. Internal branch support values are omitted for visibility. (*The cut-off is based on the % of mapped reference bases and filters out samples with low-abundance symbionts and insufficient data for SNP-detection.) Note: In the magnified panel in (c), the Gamma3 symbionts from five A-worms (pale green) fell in the same clade; four of them share identical core SNPs and thus cannot be seen separately in the graph.

[Figure S5 image not reproduced. Panels: a; Ca. Thiosymbion, b; Gamma3, c; Delta1a, d; Delta1b, e; Delta4, f; Spirochete. x-axis: Pairwise genetic distance of mitochondria (0.0 to 0.4); y-axis: Pairwise genetic distance of symbionts (scale differs between panels).]

Supplementary Figure S5 Correlation of mitochondrial pairwise genetic distances (x-axis) with the corresponding genetic distances of symbionts in Olavius algarvensis samples (y-axis). Genetic distances were measured using NGSdist based on posterior genotype-probabilities calculated with ANGSD. Plot densities are indicated with pale blue contours, and linear regression lines and 95% confidence intervals are shown with a black line and grey ribbon, respectively. Note the different patterns in the regressions: Candidatus Thiosymbion shows the strongest positive correlation, whereas the other symbionts show small or no positive correlations between mitochondrial distances and symbiont distances. Also note that Gamma3 displayed three centres of plot density in the regression, and Delta4 displayed four, indicating that factor(s) other than mitochondrial divergence explain the patterns of symbiont genetic divergence.
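As a rough illustration of the comparison described in this caption, the sketch below pairs up mitochondrial and symbiont distances for every pair of samples and computes a Pearson correlation and a least-squares slope. The sample names and distance values are hypothetical (they are not NGSdist output), and the helper names are introduced here for illustration only. Because pairwise distances are not independent observations, this sketch gives only a descriptive correlation and no significance test.

```python
from itertools import combinations
from math import sqrt


def paired_distances(samples, dist_x, dist_y):
    """Return two lists with one distance per unordered sample pair."""
    xs, ys = [], []
    for pair in combinations(samples, 2):
        key = frozenset(pair)
        xs.append(dist_x[key])
        ys.append(dist_y[key])
    return xs, ys


def pearson_and_slope(xs, ys):
    """Pearson correlation and least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy), sxy / sxx


# Hypothetical toy data: three worms, distances keyed by unordered pair.
samples = ["worm1", "worm2", "worm3"]
mito = {
    frozenset(("worm1", "worm2")): 0.0,
    frozenset(("worm1", "worm3")): 0.3,
    frozenset(("worm2", "worm3")): 0.3,
}
symbiont = {
    frozenset(("worm1", "worm2")): 0.1,
    frozenset(("worm1", "worm3")): 0.7,
    frozenset(("worm2", "worm3")): 0.7,
}

xs, ys = paired_distances(samples, mito, symbiont)
r, slope = pearson_and_slope(xs, ys)
print(f"r = {r:.2f}, slope = {slope:.2f}")
```

In this toy case the symbiont distances rise linearly with the mitochondrial distances, which is the pattern the caption describes for Candidatus Thiosymbion; a flat or scattered relationship would give a correlation near zero.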

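[Figure S6 image not reproduced. Box plots of effective population size (y-axis, log scale, ticks at 1e+04, 1e+05 and 1e+06) per symbiont (x-axis), with numbers of replicates in brackets: Ca. Thiosymbion (74), Gamma3 (62), Delta1a (11), Delta1b (19), Delta4 (43), Spirochete (10). Letters above the boxes, from left to right: a, a, b, c, c, c.]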

Supplementary Figure S6 Effective population size estimates of each symbiont per Olavius algarvensis individual. Estimation was based on the number of genome-wide segregating sites (SNPs), the average sequence coverage and an assumed mutation rate (see Supplementary Text 1.6). Numbers in brackets indicate the numbers of replicates for each symbiont. Note that the y-axis is in log-scale. Thick horizontal lines and grey boxes indicate the median and interquartile range (IQR) of observations, respectively. Vertical lines show the IQR ± 1.5 × IQR range, and outliers outside this range are shown as circles. Letters at the top summarise pairwise statistical comparisons based on a p-value cut-off of 0.05 (post-hoc Dunn multiple comparison tests with the Benjamini-Hochberg p-value adjustment; the preceding Kruskal-Wallis rank sum test indicated p-value = 2.2e-16, χ2 = 172.89, df = 5).
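The Benjamini-Hochberg adjustment named in this caption can be sketched in a few lines. This is a minimal, generic version of the step-up procedure, not the code used for the figure; the function name and the example p-values are hypothetical. With six symbionts there would be 15 pairwise comparisons, but only four made-up p-values are used here to keep the example short.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

Groups whose adjusted p-value stays above the 0.05 cut-off would share a letter in a summary such as the one shown at the top of the figure.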

[Figure S7 image not reproduced. Alignment panels, from top to bottom: Ca. Thiosymbion, Gamma3, Delta1* (Delta1a*, Delta1b*), Delta4, Spirochete. Rows within each panel are grouped by host COI-haplotype (A, B) and sampling location (Sant’ Andrea, Cavoli).]

Supplementary Figure S7 Sequence alignments of 16S ribosomal RNA genes of O. algarvensis symbionts (from top to bottom: Candidatus Thiosymbion, Gamma3, Delta1 (Delta1a and Delta1b), Delta4 and spirochetes). Within each symbiont species, black notches denote single nucleotide polymorphisms among non-variable sites, which are indicated by grey boxes. COI-haplotypes (A and B) and sampling locations (Sant’ Andrea and Cavoli) of the host O. algarvensis worms are indicated with coloured boxes on the left. SSU sequences were successfully assembled from many host individuals for Ca. Thiosymbion, Gamma3 and Delta4, while assembly of SSU sequences for the Delta1a, Delta1b and spirochete symbionts was unsuccessful in many metagenomes, due to their lower relative abundances (or absence) and lower sequence-coverages. Note that only the Candidatus Thiosymbion symbiont shows a distinctive SNP site that clearly separates COI-haplotypes A and B of the host, whereas no other symbiont shows SNPs that can be linked to host COI-haplotype or location.

*Delta1 sequences were classified into Delta1a and Delta1b symbionts by cross-checking with the presence/absence of Delta1a and Delta1b symbionts in each host, as assessed by quantification of species-specific single-copy gene sequences (Figure 2 in the main body). Note that Delta1a and Delta1b sequences share most of their non-variable sites.

[Figure S8 image not reproduced. Panels: Delta1a, Delta1b, Delta3, Delta4, Ca. Thiosymbion, Gamma3, Spirochete. Axis labels: single-copy genes (ordered by mean relative coverage); relative coverage (mean +/− s.d.). Panel annotations, in the order extracted: (n=75 worms; 162 SCGs), (n=79 worms; 431 SCGs), (n=80 worms; 380 SCGs), (n=67 worms; 208 SCGs), (n=3 worms; 201 SCGs), (n=49 worms; 201 SCGs), (n=51 worms; 200 SCGs).]

Supplementary Figure S8 Distribution of mean relative read coverage among single-copy genes (SCGs) within a single host per symbiont species. Samples with sufficient SCG reads (mean TPM > 5, in Kallisto output) were analyzed. The number of replicate samples and SCG sequences included are shown in brackets. Plots are shown as the mean of relative read coverage (pink bars) ± standard deviation (s.d.; grey bars). Note that, in each symbiont, a small number of SCGs indicated particularly high mean read coverage. Further inspection of the alignments indicated that short repeat sequences matching a partial SCG sequence are present in the corresponding symbiont genome, which likely resulted in the over-estimation of pseudo-aligned reads in this small number of SCGs.

[Figure S9 image not reproduced. Relative abundance (y-axis, 0.00 to 1.00) per sample (x-axis: Sample (COI-haplotype, Location, n = 20 each)), grouped as COI-haplotype A and COI-haplotype B, each from Sant’ Andrea and Cavoli. Legend (Symbiont): Ca.Thiosym., Gamma3, Delta1 (a or b), Delta3, Delta4, Spiro.]

Supplementary Figure S9 Symbiont composition of individual O. algarvensis samples of two COI-haplotypes (A and B) from two locations (Sant’ Andrea and Cavoli). Relative abundance was estimated based on small sub-unit (SSU) rRNA gene sequences mapped to a collection of reference sequences derived from O. algarvensis endosymbionts. Note that Delta1a and Delta1b symbionts cannot be distinguished based on SSU rRNA genes due to high similarity, and thus they are pooled as “Delta1”. “Ca.Thiosym.” refers to Candidatus Thiosymbion.

Supplementary Table S1 Reference genome statistics

| Reference genome | Ca. Thiosymbion | Gamma3 | Delta1a | Delta1b | Delta3 | Delta4 | Spirochete |
|---|---|---|---|---|---|---|---|
| Specimen ID | OalgB6SA | OalgB6SA | OalgA4SA | OalgB6SA | OalgB2SA | OalgB6SA | OalgB6SA |
| MAG accession number | GCA_905176695 | GCA_905176675 | GCA_902749705 | GCA_902749685 | GCA_903231395 | GCA_905176665 | GCA_905176685 |
| COI-haplotype | B | B | A | B | B | B | B |
| Collection site | Sant’ Andrea | Sant’ Andrea | Sant’ Andrea | Sant’ Andrea | Sant’ Andrea | Sant’ Andrea | Sant’ Andrea |
| Genome size (bp) | 3,819,172 | 4,272,394 | 10,617,271 | 9,888,100 | 5,583,787 | 5,465,468 | 2,393,688 |
| #contigs | 8,736 | 1,048 | 4,867 | 1,617 | 401 | 286 | 515 |
| #contigs >=1kbp | 927 | 268 | 1,427 | 523 | 131 | 98 | 135 |
| #contigs >=5kbp | 68 | 179 | 705 | 308 | 109 | 71 | 68 |
| #contigs >=10kbp | 2 | 143 | 311 | 227 | 92 | 61 | 52 |
| #contigs >=50kbp | 0 | 10 | 1 | 49 | 38 | 38 | 13 |
| Largest contig | 11,494 | 166,547 | 64,030 | 206,304 | 424,721 | 333,300 | 174,819 |
| N50 | 1,978 | 23,651 | 10,152 | 44,270 | 76,958 | 121,536 | 45,935 |
| N75 | 344 | 13,833 | 5,349 | 20,618 | 46,596 | 64,281 | 15,534 |
| L50 | 551 | 50 | 307 | 66 | 21 | 13 | 15 |
| L75 | 1,616 | 108 | 668 | 148 | 44 | 28 | 36 |
| GC (%) | 55.99 | 55.62 | 49.45 | 48.10 | 54.22 | 53.99 | 47.40 |
| Completeness (%) | 82.20 | 98.24 | 95.81 | 96.13 | 92.90 | 99.35 | 97.60 |
| Contamination (%) | 0.43 | 3.14 | 3.55 | 2.58 | 0.00 | 1.29 | 8.00 |
| Strain heterogeneity (%) | 0 | 0 | 16.7 | 0 | 0 | 0 | 4.35 |
| Genome coverage (×) | 80 | 185 | 91 | 29 | 36 | 36 | 16 |

*N50/75 is the length (bp) for which the collection of all contigs of that length or longer covers at least 50/75% of the genome.

L50/75 is the minimal number of contigs that cover 50/75% of the genome.

MAG; metagenome-assembled genome.
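The N50/N75 and L50/L75 definitions above can be made concrete with a short sketch. It assumes that "the genome" in the footnote means the total length of the assembled contigs; the function name and the contig lengths are hypothetical and are not taken from the assemblies in this table.

```python
def nx_and_lx(contig_lengths, fraction=0.5):
    """Return (Nx, Lx) for the given fraction of the total assembly length.

    Nx is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least `fraction` of the assembly;
    Lx is the number of contigs in that set.
    """
    total = sum(contig_lengths)
    covered = 0
    ordered = sorted(contig_lengths, reverse=True)
    for count, length in enumerate(ordered, start=1):
        covered += length
        if covered >= fraction * total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Hypothetical assembly of five contigs (300 bp in total).
lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_and_lx(lengths, 0.5)
n75, l75 = nx_and_lx(lengths, 0.75)
print(n50, l50, n75, l75)
```

Larger N50 and smaller L50 values indicate a more contiguous assembly, which is how the rows for the different symbiont genomes in the table can be compared.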

Supplementary Table S2 Symbionts in O. algarvensis do not show high strain variability within a host individual: (a) metagenomes used for the analysis with read-coverage of each symbiont genome, and (b) SNP-site density per 1Kbp of each symbiont.

(a) Coverage

| ENA sequence accession (Specimen ID) | COI-haplotype | Location | Total data (Gb) | Delta1a | Delta1b | Delta3 | Delta4 | Ca. Thiosymbion | Gamma3 | Spirochete |
|---|---|---|---|---|---|---|---|---|---|---|
| ERR3773751 (OalgA4SA) | A | Sant’ Andrea | 5.7 | 91.1 | 0.0 | 0.0 | 56.2 | 62.0 | 115.6 | 5.5 |
| SRR5248183 (OalgA2SA) | A | Sant’ Andrea | 11.8 | 14.4 | 0.0 | 0.0 | 24.8 | 72.4 | 35.9 | 7.7 |
| SRR5251647 (OalgA1SA) | A | Sant’ Andrea | 17.2 | 17.1 | 0.0 | 0.0 | 22.4 | 44.6 | 32.4 | 14.3 |
| SRR6213993 (OalgA3SA) | A | Sant’ Andrea | 25.4 | 41.7 | 0.0 | 0.0 | 74.0 | 157.8 | 119.0 | 34.3 |
| SRR5421031 (OalgA1CA) | A | Cavoli | 11.2 | 12.2 | 19.8 | 55.5 | 0.0 | 84.0 | 50.9 | 9.2 |
| SRR5421034 (OalgA2CA) | A | Cavoli | 25.9 | 31.7 | 0.0 | 0.0 | 79.2 | 161.4 | 151.7 | 25.5 |
| SRR5421607 (OalgA3CA) | A | Cavoli | 11.7 | 20.2 | 0.0 | 0.0 | 52.8 | 108.6 | 68.0 | 14.7 |
| ERR3773750 (OalgB6SA) | B | Sant’ Andrea | 6.8 | 0.0 | 28.9 | 0.0 | 35.6 | 80.0 | 185.0 | 15.8 |
| SRR5248184 (OalgB1SA) | B | Sant’ Andrea | 13.0 | 0.0 | 0.0 | 0.0 | 46.1 | 73.4 | 59.2 | 10.3 |
| SRR5248185 (OalgB2SA) | B | Sant’ Andrea | 16.5 | 35.1 | 35.1 | 35.6 | 0.0 | 89.2 | 86.6 | 14.9 |
| SRR5248194 (OalgB3SA) | B | Sant’ Andrea | 13.8 | 0.0 | 14.0 | 0.0 | 23.8 | 56.2 | 52.1 | 6.5 |

(b) SNP-density (/kbp)

| ENA sequence accession (Specimen ID) | COI-haplotype | Location | Delta1a | Delta1b | Delta3 | Delta4 | Ca. Thiosymbion | Gamma3 | Spirochete |
|---|---|---|---|---|---|---|---|---|---|
| ERR3773751 (OalgA4SA) | A | Sant’ Andrea | 0.04 | n.a. | n.a. | 0.03 | 0.46 | 0.03 | 0.50 |
| SRR5248183 (OalgA2SA) | A | Sant’ Andrea | 0.09 | n.a. | n.a. | 0.02 | 0.50 | 0.05 | 0.31 |
| SRR5251647 (OalgA1SA) | A | Sant’ Andrea | 0.07 | n.a. | n.a. | 0.02 | 0.41 | 0.05 | 0.40 |
| SRR6213993 (OalgA3SA) | A | Sant’ Andrea | 0.12 | n.a. | n.a. | 0.04 | 0.59 | 0.15 | 0.58 |
| SRR5421031 (OalgA1CA) | A | Cavoli | 0.19 | 0.15 | 0.03 | n.a. | 0.73 | 0.08 | 0.48 |
| SRR5421034 (OalgA2CA) | A | Cavoli | 0.10 | n.a. | n.a. | 0.03 | 0.55 | 0.13 | 1.00 |
| SRR5421607 (OalgA3CA) | A | Cavoli | 0.08 | n.a. | n.a. | 0.03 | 0.55 | 0.16 | 0.55 |
| ERR3773750 (OalgB6SA) | B | Sant’ Andrea | n.a. | 0.06 | n.a. | 0.01 | 0.29 | 0.07 | 0.45 |
| SRR5248184 (OalgB1SA) | B | Sant’ Andrea | n.a. | n.a. | n.a. | 0.03 | 0.46 | 0.08 | 0.41 |
| SRR5248185 (OalgB2SA) | B | Sant’ Andrea | 0.09 | 0.09 | 0.02 | n.a. | 0.35 | 0.08 | 0.38 |
| SRR5248194 (OalgB3SA) | B | Sant’ Andrea | n.a. | 0.09 | n.a. | 0.02 | 0.46 | 0.07 | 0.13 |
| average | | | 0.10 | 0.10 | 0.02 | 0.03 | 0.49 | 0.09 | 0.47 |
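As a small worked example of the quantities in panel (b), the sketch below computes a SNP-site density per kbp and a column average that skips "n.a." entries. The Delta1a densities are copied from the table, with "n.a." encoded as None; the SNP count used in the first call is hypothetical, and the function names are introduced here for illustration. Averaging only over samples with a value is an assumption, although it is consistent with the reported Delta1a average of 0.10.

```python
def snp_density_per_kbp(n_snp_sites, genome_length_bp):
    """Number of SNP sites per 1,000 bp of reference genome."""
    return n_snp_sites / (genome_length_bp / 1000)


def mean_ignoring_missing(values):
    """Arithmetic mean over entries that are not None ("n.a." in the table)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values to average")
    return sum(present) / len(present)


# Hypothetical count: 1,000 SNP sites in a 2 Mbp genome.
example_density = snp_density_per_kbp(1000, 2_000_000)

# Delta1a column of panel (b), in row order; None stands for "n.a.".
delta1a = [0.04, 0.09, 0.07, 0.12, 0.19, 0.10, 0.08, None, None, 0.09, None]
delta1a_mean = mean_ignoring_missing(delta1a)
print(example_density, round(delta1a_mean, 2))
```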

Supplementary Table S3 A probabilistic approach to SNP-identification based on posterior genotype probabilities in general enabled analysis of an increased number of SNP-sites (a) and samples (b), compared to the deterministic approach to genotype-calling and SNP-identification.

a; Number of SNP sites

| Target | Probabilistic approach* | Deterministic approach** |
|---|---|---|
| Mitochondria | 166 | 121 |
| Ca. Thiosymbion | 2872 | 391 |
| Gamma3 | 618 | 73 |
| Delta1a | 375 | 121 |
| Delta1b | 624 | 57 |
| Delta4 | 675 | 177 |
| Spirochete | 88 | 99 |

b; Number of samples

| Target | Probabilistic approach | Deterministic approach |
|---|---|---|
| Mitochondria | 80 | 76 |
| Ca. Thiosymbion | 80 | 76 |
| Gamma3 | 80 | 72 |
| Delta1a | 37 | 16 |
| Delta1b | 46 | 25 |
| Delta4 | 67 | 45 |
| Spirochete | 41 | 41 |

*SNP sites were filtered with a SNP p-value cut-off of 0.01, minimum and maximum total read-coverage (5th to 95th percentile range of the coverage distribution), and a minimum minor allele frequency of 0.01. **SNP sites were filtered with a minimum coverage of 5× in each sample.
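The three filters listed in the first footnote can be sketched as a simple site filter. This is an illustrative stand-in, not the software pipeline used for the table: the Site record, its field names and the nearest-rank percentile are assumptions, and whether the cut-offs are inclusive is not stated in the footnote (they are treated as inclusive here).

```python
from dataclasses import dataclass


@dataclass
class Site:
    position: int
    snp_p_value: float        # p-value for the site being polymorphic
    total_depth: int          # read coverage summed over all samples
    minor_allele_freq: float


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def filter_sites(sites, max_p=0.01, min_maf=0.01, low_pct=5, high_pct=95):
    """Keep sites passing the p-value, coverage-range and MAF filters."""
    depths = [s.total_depth for s in sites]
    low = nearest_rank_percentile(depths, low_pct)
    high = nearest_rank_percentile(depths, high_pct)
    return [
        s for s in sites
        if s.snp_p_value <= max_p
        and low <= s.total_depth <= high
        and s.minor_allele_freq >= min_maf
    ]


# Hypothetical candidate sites with total depths of 10, 20, ..., 200.
candidates = [Site(i, 0.001, 10 * i, 0.2) for i in range(1, 21)]
candidates[4] = Site(5, 0.5, 50, 0.2)        # fails the p-value cut-off
candidates[9] = Site(10, 0.001, 100, 0.001)  # fails the MAF cut-off
kept_positions = [s.position for s in filter_sites(candidates)]
print(kept_positions)
```

The coverage-range filter removes sites with unusually high total depth, such as the last candidate above, which is one way to guard against repeats or mis-mapped reads inflating SNP counts.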