Supplementary Information for:

Marine sediments illuminate Chlamydiae diversity and evolution

Jennah E. Dharamshi1, Daniel Tamarit1†, Laura Eme1†, Courtney Stairs1, Joran Martijn1, Felix Homa1, Steffen L. Jørgensen2, Anja Spang1,3, Thijs J. G. Ettema1,4*

1 Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123 Uppsala, Sweden
2 Department of Earth Science, Centre for Deep Sea Research, University of Bergen, N-5020 Bergen, Norway
3 Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, and Utrecht University, NL-1790 AB Den Burg, The Netherlands
4 Laboratory of Microbiology, Department of Agrotechnology and Food Sciences, Wageningen University, 6708 WE Wageningen, The Netherlands

† These authors contributed equally
* Correspondence to: Thijs J. G. Ettema, Email: [email protected]

Supplementary Information

Supplementary Discussions
1. Evolutionary relationships within the Chlamydiae
2. Insights into the evolution of pathogenicity in Chlamydiaceae
3. Secretion systems and flagella in Chlamydiae
4. Phylogenetic diversity of chlamydial nucleotide transporters
5. Genomic potential for de novo biosynthesis of nucleotides and amino acids across Chlamydiae
6. [...] in Loki’s Castle marine sediments
7. Abundance and diversity of chlamydial lineages in Loki’s Castle marine sediments
8. Underestimation of environmental abundance and diversity of Chlamydiae

Supplementary Figures

Supplementary Tables

Supplementary Data Descriptions

Supplementary References

Supplementary Discussions

1. Evolutionary relationships within the Chlamydiae phylum

We performed several in-depth phylogenomic analyses to reconstruct interspecies relationships within the Chlamydiae phylum. To build upon previous work [1-3], we increased taxon sampling and placed particular emphasis on state-of-the-art approaches that aim to detect and alleviate potential phylogenetic artifacts caused by long-branching taxa and sequence composition heterogeneity (see Methods).

Our phylogenomic analyses in maximum likelihood and Bayesian frameworks allowed us to resolve seven well-supported Chlamydiae clades (CC) of putatively high taxonomic rank. These include five newly identified clades, CC-I through CC-IV and Anoxychlamydiales, which are primarily composed of uncultured chlamydial lineages represented by metagenome-assembled genomes (MAGs). The phylogenetic placement of most lineages and the deep-branching relationships between clades were well resolved and consistent across phylogenomic reconstructions (Fig. 2, Supplementary Figs. 3 and 16), with the exception of a few long-branching lineages (see below).

1.1 Resolving deep evolutionary relationships between chlamydial clades

Overall, within previously identified clades, our analyses recovered shallow evolutionary relationships that were consistent with recent work [3]. However, there are notable differences with regard to the inferred deeper evolutionary relationships. In particular, previous work has suggested that the Chlamydiaceae (denoted as the order Chlamydiales [3]) are deeply branching [1-4] and comprise a sister group of all other chlamydial lineages (corresponding to CC-I, CC-II, CC-III, Anoxychlamydiales and environmental chlamydiae members) [2,3], which were tentatively classified as the order Parachlamydiales [3].

In contrast, all our phylogenomic reconstructions strongly support a sister relationship of the Chlamydiaceae with CC-IV, which together form a sister group to the environmental chlamydiae. Altogether, this group is sister to the second major radiation in the Chlamydiae, which comprises the CC-I, CC-II, CC-III and Anoxychlamydiales lineages (Fig. 2, Supplementary Figs. 3 and 16).

Our results differ from prior analyses due to the inclusion of CC-IV, which is composed of three newly identified metagenome-assembled genomes (MAGs) from Loki’s Castle marine sediments, and the use of phylogenetic inference methods aimed at minimizing artifacts such as long-branch attraction (LBA). For instance, the branch leading to the Chlamydiaceae family is relatively long, which may in part be due to the evolutionary transition to a parasitic lifestyle with a restricted host range [6,7]. The inclusion of CC-IV in our analyses shortens the long branch to the Chlamydiaceae and may thus alleviate phylogenetic reconstruction artifacts that previously attracted the latter to the base of the phylum.

Investigations of the evolution of the Chlamydiae and inferences on the nature of the chlamydial ancestor have been based on the assumption that the Chlamydiaceae represent the earliest diverging lineage within this phylum [1,2]. Thus, conclusions from these analyses will need to be re-examined in light of the updated phylogeny of the Chlamydiae presented here.

1.2 Phylogenetic placement of long-branching chlamydial lineages

In a recent study, the long-branching orphan lineage Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 was inferred as the second deepest-branching lineage within Chlamydiae (after the divergence of Ca. Similichlamydia epinephelii) and was proposed to form the new order Candidatus Novochlamydiales [3].

In agreement with this, our initial maximum-likelihood (ML) phylogenies suggested the placement of Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11, followed by K940_chlam_8, at the base of Chlamydiae, although support for the early divergence of these representatives was weak (bootstrap values (BV) of 42 and 61, respectively) (Fig. 2, Supplementary Data 4). When the 25% most heterogeneous sites were removed, both lineages became nested inside a larger clade composed of CC-I, II, III and Anoxychlamydiales, although with poor support (Fig. 2, Supplementary Data 4). However, in our Bayesian phylogenetic inference based on the CAT-GTR model, a complex model of protein evolution that minimizes the effects of LBA [5], the placement of the two lineages within the larger clade of CC-I, II, III and Anoxychlamydiales was highly supported (posterior probability (PP) = 0.97, Fig. 2). In particular, Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 was placed within a well-supported clade with CC-I (PP = 0.99, Fig. 2), suggesting that the early divergence of this representative may indeed have been the result of LBA. Thus, our analyses indicate that Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 [2] does not represent a deep-branching Chlamydiae order but may instead be closely related to the Simkaniaceae family.
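
The site-removal step described above can be illustrated with a minimal sketch. It assumes that a per-site heterogeneity score has already been computed (the scoring itself is described in Methods and is not reproduced here); the function name, the toy alignment and the scores are hypothetical and serve only to show how a fixed fraction of the highest-scoring alignment columns can be dropped.

```python
from typing import Dict, Sequence


def drop_top_sites(alignment: Dict[str, str],
                   site_scores: Sequence[float],
                   fraction: float = 0.25) -> Dict[str, str]:
    """Return a copy of the alignment without the highest-scoring columns.

    alignment   -- mapping of taxon name to aligned sequence (equal lengths)
    site_scores -- one heterogeneity score per column (assumed precomputed)
    fraction    -- share of columns to remove, e.g. 0.25 for 25%
    """
    length = len(site_scores)
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("every sequence must have one score per column")
    n_drop = int(length * fraction)
    # Rank column indices from most to least heterogeneous.
    ranked = sorted(range(length), key=lambda i: site_scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    keep = [i for i in range(length) if i not in dropped]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy example: 8 columns, so a fraction of 0.25 removes the 2 highest-scoring ones.
toy = {"taxon_a": "ABCDEFGH", "taxon_b": "abcdefgh"}
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.4, 0.5, 0.6]
trimmed = drop_top_sites(toy, scores, fraction=0.25)
```

In practice the trimmed alignment would be passed back to tree inference, and the resulting topology compared against the untrimmed analysis.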

During the process of our analyses, several other chlamydial MAGs and Single-cell Assembled Genomes (SAGs) were publicly released (Supplementary Table 3). We reconstructed an ML phylogeny including these lineages, which was congruent with our prior analyses (Supplementary Fig. 3). One of these MAGs, representing Candidatus Similichlamydia epinephelii, was placed as a sister lineage to all other members of the Chlamydiae with high support (Supplementary Fig. 3). This position is consistent with other recent phylogenomic analyses of the Chlamydiae [2-4]. Ca. S. epinephelii is a member of the candidate chlamydial family Candidatus Parilichlamydiaceae, which is composed of chlamydial fish pathogens that cause epitheliocystis [6]. This taxon emerges on a long branch, which is not surprising given the accelerated rate of evolution observed in many pathogens. Future phylogenetic analyses with improved taxonomic sampling might better resolve the phylogenetic placement of Ca. Parilichlamydiaceae by alleviating potential phylogenetic artifacts.

1.3 Genome characteristics and gene content variation across the Chlamydiae phylum

Genome characteristics (e.g., genome size and GC content) and gene content vary widely between different clades of the Chlamydiae (Fig. 2, Supplementary Fig. 3). Nearly all genomic information available for CC-I, CC-II, CC-III and Anoxychlamydiales is represented by MAGs from Loki’s Castle marine sediments (Supplementary Table 2) or other recent metagenomic surveys (Supplementary Table 3), which together represent over half of all chlamydial diversity. With the exception of Simkania negevensis, all characterized chlamydial lineages obtained through co-cultivation are part of the environmental chlamydiae and Chlamydiaceae (Supplementary Table 3).

CC-II has an unusually high GC content among chlamydiae, and branches as a sister group to the CC-I clade, which contains S. negevensis [7] (Fig. 2, Supplementary Fig. 3). The estimated genome sizes of CC-II MAGs derived from this metagenomic study and others (1.6-2.0 Mbp), and of CC-I lineages and Chlamydiae bacterium K1060_chlam_2 (1.7 Mbp), are all distinctly smaller than the S. negevensis genome (2.6 Mbp), indicating differences in their cell biology and lifestyle. The genomes of CC-I and CC-II lineages appear to have experienced reductive genome evolution, as they display smaller median intergenic space than all other chlamydiae, particularly in comparison with the genomes of many environmental chlamydiae (Supplementary Fig. 3).

CC-III is composed solely of MAGs derived from recent metagenomic studies (Supplementary Table 3).

Anoxychlamydiales is dominated by marine sediment chlamydiae (Fig. 2, Supplementary Fig. 3), including nearly half of the MAGs obtained from Loki’s Castle marine sediments, Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_27_8 (derived from Rifle, Colorado, aquifer groundwater [8]) and Chlamydiae bacterium SM23_39 (derived from the sulfate-methane transition zone of White Oak River sediments [9]). The Anoxychlamydiales are characterized by an exceptionally low GC content (26-31%) and have the largest median intergenic spaces within the radiation comprising CC-I, CC-II, CC-III and Anoxychlamydiales. Gene content within this clade is highly conserved in comparison with other newly identified chlamydial lineages (Supplementary Fig. 3).

Only two marine sediment chlamydiae MAGs appeared to belong to the environmental chlamydiae clade of well-characterized chlamydial symbionts of protists [10,11] (Fig. 2, Supplementary Fig. 3). Chlamydiae bacterium K940_chlam_3 represents the deepest branching lineage of the environmental chlamydiae, and Chlamydiae bacterium K940_chlam_7 represents a sister taxon to Waddlia chondrophila [12]. These representatives have estimated genome sizes of 2.0 and 2.6 Mbp, respectively, which is consistent with the range of genome sizes represented by other members of the environmental chlamydiae (2.1-3.4 Mbp). The environmental chlamydiae clade displays the most varied patterns in gene content and the largest median intergenic space across their genomes (mean of 84 bp) (Supplementary Fig. 3). These factors point to gene acquisition events, which could be a result of the amoeba-associated lifestyles of members of this clade [13].

CC-IV is composed solely of three marine sediment chlamydiae MAGs, which have a higher GC content (47%) than other chlamydiae (mean of 39%) and form a well-supported sister clade of the Chlamydiaceae (Fig. 2, Supplementary Fig. 3). The gene content within CC-IV representatives is less conserved than within members of the Chlamydiaceae (Supplementary Fig. 3). Furthermore, CC-IV members have larger estimated genome sizes (1.3-2.1 Mbp) than the Chlamydiaceae (1-1.2 Mbp), with Chlamydiae bacterium K940_chlam_9 having a particularly large genome size in comparison (2.1 Mbp).

2. Insights into the evolution of pathogenicity in Chlamydiaceae

The Chlamydiaceae family (recently reviewed in [14,15]) includes important animal and human pathogens. The well-known human pathogen Chlamydia trachomatis is the causative agent of sexually transmitted genital tract infections and trachoma (i.e., preventable blindness), while Chlamydia pneumoniae can cause acute respiratory infections. In addition, C. psittaci, C. abortus, and C. felis all have zoonotic potential and can also cause disease in humans [14]. Based on our revised Chlamydiae phylogeny and the expanded genomic sampling of members of this phylum, here we provide insights into the emergence and evolution of the Chlamydiaceae family.

2.1 Chlamydiaceae evolved later in Chlamydiae evolution, and through genome reduction

The Chlamydiaceae were thought to be an early-diverging group within the Chlamydiae phylum [1-4]. The expanded genomic sampling of chlamydial diversity and the use of sophisticated phylogenomic methods have allowed us to propose a new phylogeny for the Chlamydiae, including the Chlamydiaceae (Supplementary Discussion 1). Specifically, we consistently recover a strongly supported sister relationship between Chlamydiaceae and CC-IV, which together form a clade sister to the amoeba-associated environmental chlamydiae (Fig. 2, Supplementary Fig. 3, Supplementary Fig. 16).

The CC-IV clade is comprised solely of uncultured lineages identified in Loki’s Castle marine sediments. Compared with Chlamydiaceae, CC-IV lineages have larger genomes (1.3-2.1 Mbp versus 1.1-1.2 Mbp) and higher GC content (47% versus 37–41%). These two clades also differ significantly in their conservation of gene content (Supplementary Fig. 3): while the gene content of Chlamydiaceae is highly conserved, it is highly variable between the three obtained CC-IV lineages.

Our analyses of the presence and absence patterns of Non-supervised Orthologous Groups (NOGs) support previous reports indicating that the evolution of the Chlamydiaceae family was characterized by massive gene loss [14,16-18], consistent with the observed genome size reduction in the branch leading to this family (Fig. 2, Fig. 3a-b, Supplementary Data 3). When comparing gene content between environmental chlamydiae, CC-IV, and Chlamydiaceae, we found 576, 248 and 36 NOGs, respectively, conserved uniquely within each clade (Fig. 3b). When considering NOGs found exclusively in the Chlamydiaceae and not in other chlamydiae, the set of protein families uniquely conserved in this group dropped further to 15 (Supplementary Fig. 6). We also identified 13 protein family (PF) domains conserved across Chlamydiaceae that are not present in the genomes of other chlamydial lineages. The acquisition of this small set of proteins, conserved in the Chlamydiaceae but not found in other chlamydiae, may have played a role in the evolution of the clade. In addition, the proportion of NOGs assigned to each Cluster of Orthologous Groups of proteins (COG) functional category was generally smaller in Chlamydiaceae relative to environmental chlamydiae and CC-IV lineages (Supplementary Fig. 5). This underrepresentation was most pronounced in functional categories related to metabolism (e.g., energy production and conversion, carbohydrate transport and metabolism, and inorganic ion transport and metabolism; Supplementary Fig. 5). This indicates that, in particular, a loss of functions related to metabolism may have contributed to Chlamydiaceae evolution.
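
The notion of NOGs "conserved uniquely within each clade" can be made concrete with a short sketch based on set operations. It assumes one possible definition (present in every genome of the clade and absent from every genome of the clades it is compared with), which may differ from the thresholds actually used; the function names, genome labels and NOG identifiers below are hypothetical.

```python
from typing import Dict, Set

# clade name -> genome name -> set of NOG identifiers found in that genome
CladeTable = Dict[str, Dict[str, Set[str]]]


def core_nogs(genomes: Dict[str, Set[str]]) -> Set[str]:
    """NOGs present in every genome of a clade."""
    return set.intersection(*genomes.values()) if genomes else set()


def uniquely_conserved(clades: CladeTable) -> Dict[str, Set[str]]:
    """For each clade, NOGs found in all of its genomes and in no genome
    of any other clade in the comparison (assumed definition)."""
    result = {}
    for name, genomes in clades.items():
        elsewhere: Set[str] = set()
        for other, other_genomes in clades.items():
            if other != name:
                for nogs in other_genomes.values():
                    elsewhere |= nogs
        result[name] = core_nogs(genomes) - elsewhere
    return result


# Toy data with made-up identifiers.
toy_clades = {
    "clade_A": {"g1": {"n1", "n2", "n3"}, "g2": {"n1", "n2"}},
    "clade_B": {"g3": {"n2", "n4"}, "g4": {"n4", "n5"}},
}
unique = uniquely_conserved(toy_clades)
```

Widening the comparison from three clades to all other chlamydiae only enlarges the `elsewhere` set, which is why the uniquely conserved set can shrink, as reported for the drop from 36 to 15 NOGs.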


2.2 Chlamydiaceae display reduced metabolic capacities relative to CC-IV

Several metabolic pathways appear to have been lost specifically in Chlamydiaceae relative to CC-IV and environmental chlamydiae (Supplementary Fig. 4, Supplementary Data 3). These pathways include proline biosynthesis and the UMP biosynthesis pathway necessary for pyrimidine biosynthesis (KEGG module: M00051). Furthermore, Chlamydiaceae members lack genes for a hexokinase (or any glucokinase) and for the first three enzymes of the tricarboxylic acid (TCA) cycle (i.e., citrate synthase, aconitase and isocitrate dehydrogenase) [19]. Consequently, they depend on their host for metabolic exchange of TCA cycle intermediates and glucose-6-phosphate [19]. In contrast, a glucokinase and a complete TCA cycle are found in virtually all genomes of environmental [19] and CC-IV chlamydiae. These patterns suggest that all pathways mentioned above were present in the common ancestor of environmental chlamydiae, CC-IV and Chlamydiaceae, and subsequently lost in Chlamydiaceae.

Many flagellar components are present in CC-IV and in individual chlamydial lineages branching at the base of the environmental chlamydiae and CC-IV/Chlamydiaceae clades, while only a few components were identified in Chlamydiaceae (Supplementary Discussion 3). Phylogenetic analyses of these flagellar components indicate that they were already present in the last common ancestor of Chlamydiaceae, environmental and CC-IV chlamydiae (Supplementary Fig. 8, Supplementary Data 4, Supplementary Discussion 3). Yet, the few subunits present in the Chlamydiaceae seem to have been co-opted to act alongside their non-flagellar type III secretion system (NF-T3SS) [20].

2.3 Gain of virulence and host interaction factors in Chlamydiaceae evolution

In comparison to other lineages of the Chlamydiae, Chlamydiaceae genomes are characterized by a set of unique and functionally annotated core genes (Fig. 3b, Supplementary Fig. 6) that encode proteins associated with host interaction and virulence.

In particular, and in agreement with previous studies [17], we observed an expansion of Polymorphic Outer Membrane Protein families (POMPs) uniquely in members of the Chlamydiaceae [14]. POMPs allow for niche-specific adhesion of chlamydial cells to their animal hosts, and also aid in immune system evasion through their antigenic diversity [1]. Several additional outer membrane proteins are also conserved across Chlamydiaceae (Supplementary Fig. 6).

Another striking example of a protein uniquely conserved among members of the Chlamydiaceae is the carbohydrate-selective porin (OprB) protein (Supplementary Fig. 6), which is a component of the outer membrane complex of membrane proteins that are surface-exposed in Chlamydiaceae elementary bodies (EBs) [21]. In addition, all Chlamydiaceae encode one arginine decarboxylase, which most likely functions in the reduction of arginine reserves during host cell infection but could also protect from nitrosative stress [22]. Furthermore, most Chlamydiaceae uniquely encode a Membrane Attack Complex/Perforin (MACPF) protein [14,23]. While the exact function of the MACPF in Chlamydiaceae is unclear, it may assist in the acquisition and processing of lipids derived from the host, play a role in host immune system avoidance or facilitate host entry through pore formation [14].

Finally, Chlamydiaceae are characterized by a large number of highly conserved NOGs and protein family (PF) [24] domains with unknown function (Supplementary Fig. 6), some of which could play a role in the pathogenic lifestyle of members of this family.

2.4 Gene acquisition events unique to CC-IV and Chlamydiaceae

Members of the CC-IV and Chlamydiaceae (Supplementary Fig. 7a) appeared to encode seven gene families (by NOG or PF domain) absent in all other Chlamydiae lineages, which were likely gained prior to the divergence of the two former clades. Despite their presence in all representative Chlamydiaceae genomes investigated here, the function of most of these proteins is unknown (with the exception of COG0400 in the genome of Chlamydia sp. 2742-308). Their maintenance across Chlamydiaceae, despite massive gene loss during evolution of this family (Fig. 3a-b), suggests that these proteins play important roles in their pathogenic lifestyles. To further investigate their evolutionary history, we inferred single-gene phylogenies for two of these protein families (PF04518 and PF05302), which are thus far taxonomically restricted to CC-IV and Chlamydiaceae (see Methods).

While only one protein with the PF domain PF04518 is found in CC-IV member Chlamydiae bacterium K940_chlam_9, genomes of Chlamydiaceae encode four or five proteins with this domain (Supplementary Fig. 7a). A phylogenetic analysis of proteins assigned to PF04518 (Supplementary Fig. 7a) revealed that this gene family appears to have undergone several gene duplication events after the divergence of CC-IV and Chlamydiaceae and prior to the diversification of the latter, resulting in four distinct gene copies (Supplementary Fig. 7b). Each of these copies belongs to one of four different highly supported clades (BV > 98) (Supplementary Fig. 7b). Genes encoding proteins from clades 1 and 2, and those encoding proteins from clades 3 and 4, are each located together in a gene cluster in the genomes of Chlamydiaceae. A subset of proteins assigned to clade 4 experienced an additional gene duplication event in Chlamydia trachomatis, [...] and [...], and form a distinct sub-clade (clade 5) within clade 4 (Supplementary Fig. 7b). Previous investigation of this protein family in Chlamydiaceae has shown that its members contain a Non-Flagellar Type III Secretion System (NF-T3SS) signal [25] and appear to be secreted by the NF-T3SS as effectors [26]. They may act by targeting nuclear functions, since they are found in the nucleus of infected host cells [25,26]. Proteins with the PF04518 domain in Chlamydiaceae were likely neo-functionalized following the duplication events described above. Understanding the function of the single-copy protein in Chlamydiae bacterium K940_chlam_9 could help determine the ancestral function of the protein and how it impacted the evolution of Chlamydiaceae pathogenicity.

We also observed conserved gene duplications between the CC-IV and Chlamydiaceae in the case of proteins with the domain PF05302 (Supplementary Fig. 7a). In this case, a phylogenetic analysis revealed three distinct clades, with one copy from Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae members found in each (Supplementary Fig. 7c). All three copies are organized together in the genomes of both Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae members (Supplementary Fig. 7c). Together, these results indicate that this gene family underwent several gene duplication events prior to the divergence of CC-IV and Chlamydiaceae. Chlamydia trachomatis PF05302 domain-containing homologs CT847 (clade 3) and CT849 (clade 1) (Supplementary Fig. 7c, Supplementary Data 4) have both been characterized as T3SS substrates, and likely effectors [27,28]. Chlamydia trachomatis homolog CT847 appears to interact with mammalian Grap2 Cyclin D-Interacting Protein (GCIP), a protein involved in the eukaryotic cell cycle [27]. Examining the role of the three proteins with the domain PF05302 in Chlamydiae bacterium K940_chlam_9 could help elucidate their ancestral functions, thereby aiding in understanding the contributions of this protein family to Chlamydiaceae evolution.

Future, more fine-grained investigations of gene content evolution in members of the Chlamydiae, with the inclusion of CC-IV members, will be crucial to better understand the evolutionary trajectories that led to the ecological success of Chlamydiaceae as animal pathogens.

3. Secretion systems and flagella in Chlamydiae

3.1 Detection of secretion systems, flagella and effectors

The secretion of proteins and other molecules by secretion systems is important for host association, microbial interactions and interactions with the environment. We screened all available Chlamydiae genomes for type I to VI secretion systems (T1SS to T6SS), flagella and related genes with MacSyFinder [29] (see Supplementary Data 3). Most chlamydiae genomes were found to contain genes for T1SS, T2SS, T3SS and T5SS, while only a few encoded T4SS and flagella genes. We discuss each of these systems below, following the gene nomenclature proposed by Abby et al. [30].

T1SS. T1SSs are simple one-step protein secretion systems that are formed by three components: an inner membrane ABC transporter, an outer membrane component and a bridging membrane fusion protein. All three components were found in most of the surveyed chlamydiae. However, the membrane fusion protein was not detected in most CC-IV and CC-III lineages, and all three components were absent in most Chlamydiaceae (Supplementary Data 3).

T2SS. T2SSs are complex protein secretion systems formed by outer membrane, inner membrane and pseudopilus apparatuses, and a cytoplasmic ATPase [31,32]. The most commonly detected homologs for this system in Chlamydiae were GspD, GspF and GspG, which represent the central core proteins of the outer membrane, inner membrane and pseudopilus apparatuses, respectively. We also detected the cytoplasmic ATPase GspE in a few Chlamydiae genomes, and PilB (a homolog of GspE in type 4 pili) was often detected in those genomes where GspE was missing (Supplementary Data 3). The core genes gspDEFG were generally co-located in tandem (Supplementary Fig. 10). Other T2SS components, such as the minor pseudopilins GspHIJK (labeled as 'mandatory' by MacSyFinder) and other non-essential proteins (labeled as 'accessory' by MacSyFinder), were often absent. However, situated immediately upstream of gspDEFG, we detected either the minor pseudopilin genes gspHIJK, or genes of similar length patterns with a significant E-value using BLAST (Supplementary Fig. 10). Taken together, these results suggest that chlamydiae harbour a variant of the classical T2SS. Furthermore, our observations agree in part with Peabody et al. [31], who described the presence of the genes gspCDEFG in Chlamydia and Chlamydophila genomes: we were unable to detect GspC, while Peabody et al. were unable to detect the minor pseudopilins GspHIJK.
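
The observation that the core genes are co-located in tandem can be expressed as a simple check on gene order. The sketch below assumes that the genes on a contig are available as an ordered list of names; the function name, the `max_intervening` parameter and the toy gene orders are hypothetical and do not reproduce the MacSyFinder co-localization logic.

```python
from typing import Iterable, Sequence


def is_colocated(gene_order: Sequence[str],
                 wanted: Iterable[str],
                 max_intervening: int = 0) -> bool:
    """Check whether all wanted genes lie together on one contig.

    gene_order      -- gene names in their order along the contig
    wanted          -- gene names that must all be present
    max_intervening -- number of other genes tolerated inside the block
    """
    wanted_set = set(wanted)
    positions = [i for i, gene in enumerate(gene_order) if gene in wanted_set]
    # Every wanted gene has to be found at least once.
    if {gene_order[i] for i in positions} != wanted_set:
        return False
    # Count genes inside the first-to-last span that are not wanted genes.
    span = positions[-1] - positions[0] + 1
    return span - len(positions) <= max_intervening


core = ["gspD", "gspE", "gspF", "gspG"]
tandem = is_colocated(["x", "gspD", "gspE", "gspF", "gspG", "y"], core)
interrupted = is_colocated(["gspD", "x", "gspE", "gspF", "gspG"], core)
tolerated = is_colocated(["gspD", "x", "gspE", "gspF", "gspG"], core,
                         max_intervening=1)
incomplete = is_colocated(["gspD", "gspF", "gspG"], core)
```

The check is order-agnostic by design, so rearranged but contiguous core genes would still count as co-located.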


Non-flagellar T3SS (NF-T3SS), flagellum and T3SS-secreted effectors. NF-T3SSs are complex protein secretion systems generally associated with eukaryotic host interactions, and have previously been shown to be essential for virulence in Chlamydiaceae (reviewed in e.g., [33,34]). NF-T3SS components evolved through exaptation of proteins constituting the bacterial flagellum [35], which complicates their unambiguous detection and annotation. The components screened in the present study include the outer membrane ring secretin (SctC), the inner membrane ring (SctJ), the secretion apparatus (SctRSTUV), the sorting platform (SctQ) and the cytoplasmic ATPase (SctN). SctC is unique to the NF-T3SS, while the other components share homology with flagellar proteins. To evaluate whether the chlamydial homologs were NF-T3SS or flagellar genes, we performed phylogenetic analyses of various individual genes, as well as of concatenated alignments, using as reference the SctN sequences published by Abby and Rocha [35] and the other discussed NF-T3SS sequences used in Abby et al. [30]. Similar to previous studies [35], our phylogenetic analyses (Supplementary Fig. 8, Supplementary Data 4) place a myxococcal NF-T3SS as sister to all other bacterial NF-T3SS sequences. The latter then diverge into all of the chlamydial sequences on the one hand and, on the other, the rest of [...]. Our analyses retrieve the monophyly of the main chlamydial clades, which is in overall agreement with the species tree. These results confirm that the NF-T3SS is found across Chlamydiae.

The NF-T3SS genes are distributed over three gene clusters, one containing sctN, sctQ and sctC, one containing sctJ, sctR, sctS and sctT, and one containing sctU and sctV (Supplementary Fig. 9). The gene order and neighborhood of these clusters are highly conserved in all Chlamydiae genomes, as has been shown previously for environmental chlamydiae and Chlamydiaceae [8,9]. The gene order conservation allowed us to detect sctC homologs in many genomes where this gene was not detected by MacSyFinder: we identified significant BLAST hits to known SctC sequences in their expected position near the sctN and sctQ genes. The three gene clusters were interspersed and flanked with other conserved genes on the same strand. While their function could not be determined in most cases, a gene situated between sctQ and sctC encoded a serine/threonine protein kinase that has been suggested to participate in NF-T3SS protein secretion [10]. Altogether, we hypothesize that the NF-T3SS genes were acquired by the common ancestor of Chlamydiae and have since been inherited vertically.

In contrast to the ubiquity of the NF-T3SS, we found flagellar genes in only a handful of genomes, including CC-IV and four marine chlamydiae related to the environmental chlamydiae and Chlamydiaceae clades [36,37]. Although many chlamydiae genomes were found to contain putative flagellar homologs of the sctN and sctQ genes, most turned out not to be associated with flagellar function. Most proteins detected as flagellar SctQ homologs branched instead with NF-T3SS genes in phylogenetic analyses (ufBV = 94%; SH-aLRT = 92%) or within a clade extremely distantly related to flagellar homologs (Supplementary Data 4). Phylogenetic analyses of chlamydial SctN homologs similarly revealed that these often branched with non-flagellar ATPases (Supplementary Data 4). However, phylogenetic analyses were inconclusive regarding the putative function of a group of proteins annotated as flagellar SctN homologs by MacSyFinder in Chlamydiaceae: while they are more closely related to flagellar sequences than to other homologs, they are not nested within them (Supplementary Data 4). Putative flagellar homologs of SctV in Chlamydiaceae also form long-branching clades related to known flagellar sequences, indicating that these proteins could represent divergent flagellar homologs (Supplementary Data 4). Remarkably, flagellar homologs of SctN (FliI) and SctV (FlhA) in Chlamydiaceae have been shown to interact with the NF-T3SS protein complex, suggesting they have been co-opted to a new function in protein secretion [20].

355 In contrast, the above-mentioned marine chlamydiae and the CC-IV Chlamydiae

356 K940_chlam_9 and KR12_chlam_2 were found to contain a large array of flagellar genes

357 (Supplementary Figs. 4 and 8; Supplementary Data 3). Even though CC-IV Chlamydiae

16

358 bacterium K1000_chlam_4 contained only a copy of the flagellar homolog of sctR (fliP), it is

359 possible that this genome encodes a full flagellar gene set, since this MAG is relatively

360 incomplete (Fig. 2, Supplementary Table 2) and the contig containing fliP ends right after this

361 gene, and thus before the expected location of the flagellar homologs of sctS (fliQ) and sctT

362 (fliR) genes. The flagellar genes of CC-IV and the marine chlamydiae listed above form a well-

363 supported clade in the phylogeny of concatenated NF-T3SS genes and in single-gene trees of

364 SctJ, SctR (Supplementary Fig. 8), SctN and others (Supplementary Data 4). Therefore, given

365 the species phylogeny obtained in the present study (Fig. 2, Supplementary Fig. 3), these results

366 suggest that gene sets for the flagellum were present in the common ancestor of the

367 environmental chlamydiae, Chlamydiaceae and CC-IV clades, and were subsequently lost in the

368 former two groups but retained in their CC-IV and unclassified relatives. However, further

369 phylogenetic analyses that adequately model the extreme divergence of the Chlamydiaceae

370 genes of putative flagellar origin will be required to verify these inferences.

371 Since we were able to predict the existence of a NF-T3SS in several chlamydiae, we used

372 EffectiveDB38 to predict T3SS-secreted proteins, eukaryotic-like domains (ELD), and putative

373 subcellular targeting signals to eukaryotic cellular compartments. None of the non-chlamydial

374 PVC bacteria were predicted to have any T3SS-secreted proteins. In contrast, 9% to 28% of

375 chlamydial proteomes were predicted to possess a T3SS-associated signal peptide. However,

376 we could not identify major differences between various subclades of chlamydiae regarding

377 most features predicted by EffectiveDB (Supplementary Data 3). A notable exception concerns

378 the prediction of CCBD (conserved chaperone-binding domain) motifs, which are usually

379 found in the N-terminal region of T3SS-secreted proteins and have been shown to serve as

380 binding sites for chaperones that facilitate the correct selection and unfolding of T3SS-dependent

381 effector proteins. We found that Anoxychlamydiales members were enriched in proteins

382 predicted to have a CCBD motif (average 4.58 ± 0.78%) compared to other chlamydiae (2.52

383 ± 1.20%). However, the significance of this result is difficult to assess given that these lineages

384 do not appear to be enriched in predicted T3SS-secreted proteins. Intriguingly, five chlamydiae

385 were not predicted to have any T3SS-secreted proteins, although all of these are

386 predicted to have a NF-T3SS. In addition, Verrucomicrobium spinosum, a verrucomicrobium

387 recently described to have a NF-T3SS39, was also not predicted to have any T3SS-secreted

388 proteins or CCBD motifs. This suggests that predictive tools such as EffectiveDB are currently

389 unable to model the entire diversity of protein motifs that are recognized by the T3SS and its

390 chaperones, and that differences between chlamydial lineages in terms of their T3SS-secreted

391 proteins will have to be revisited when more sensitive predictive tools become available.
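The comparison of CCBD-motif frequencies between groups of genomes can be sketched as follows. This is a minimal illustration only, assuming that the reported "average ± value" denotes a mean and sample standard deviation of per-genome percentages; the function names and the input counts are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def percent_with_motif(n_motif_proteins, n_total_proteins):
    """Percentage of a proteome predicted to carry a given motif."""
    if n_total_proteins <= 0:
        raise ValueError("proteome must contain at least one protein")
    return 100.0 * n_motif_proteins / n_total_proteins


def summarize(percentages):
    """Return (mean, sample standard deviation) of per-genome percentages."""
    return mean(percentages), stdev(percentages)


# Hypothetical (motif-positive proteins, total proteins) per genome.
# These values are placeholders, NOT results from the study.
group_a = [(45, 1000), (50, 1000), (40, 1000)]
group_b = [(20, 1000), (30, 1000), (25, 1000), (35, 1000)]

for label, genomes in (("group A", group_a), ("group B", group_b)):
    pcts = [percent_with_motif(m, t) for m, t in genomes]
    avg, sd = summarize(pcts)
    print(f"{label}: {avg:.2f} ± {sd:.2f}% (n={len(pcts)})")
```

Whether a difference between two such summaries is meaningful still depends on the number of genomes per group and on the sensitivity of the underlying predictions.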

392 T4SS. T4SSs are versatile systems generally involved in contact-dependent translocation

393 of proteins and DNA. We detected most of the Type F T4SS genes in a subset of Chlamydiae

394 genomes: Waddliaceae bacterium SP13, Chlamydiae bacterium K1060_chlam_2 (CC-I),

395 Chlamydiae bacterium K1000_chlam_3 (CC-II), R. massiliensis, Parachlamydia spp. and

396 Protochlamydia spp. (Supplementary Fig. 11). This patchy distribution of T4SS genes in

397 chlamydiae is consistent with the idea that these genes were recently acquired via horizontal

398 gene transfer (HGT).

399 T5SS. T5SS are two-step protein secretion systems, generally substrate-specific and

400 containing one to three components. T5SS classical autotransporters (type 5a secretion systems,

401 T5aSS) and translocators (type 5b secretion system, T5bSS) were identified in most CC-I, II,

402 III and Anoxychlamydiales members, while most environmental chlamydiae and

403 Chlamydiaceae only contained T5aSS autotransporters.

404 Other systems. Finally, while a few homologs were detected for other systems such as

405 T6SS, Tad pili and type IV pili (T4P), these systems remained largely incomplete in all genomes,

406 indicating that these detections likely represent false positives.

407

408 3.2 Putative functions of secretion systems in marine sediment Chlamydiae

409

410 The observations made above largely corroborate findings made in previous studies, which

411 indicate that most Chlamydiae contain T1SS, T3SS and T5aSS, and provide new evidence for

412 the presence of T2SS and the sparse presence of T4SS and flagella3,17,30,40. The presence of

413 secretion systems is commonly interpreted in the light of host-symbiont dynamics. However,

414 the lack of identified eukaryotes in the presented samples (see Supplementary Discussion 6)

415 raises the possibility that these perform alternative functions in at least some of the newly

416 discovered chlamydial lineages.

417 In Chlamydiaceae, T2SS, T3SS and T5aSS are typically linked to host adhesion, invasion

418 and manipulation15,41,42. Similarly, environmental chlamydiae have also been shown to express

419 secretion systems during infection of microbial eukaryotes43,44. However, whether their

420 function is conserved throughout the Chlamydiae phylum remains unclear. For example, a

421 recent transcriptomic study found that the expression of T3SS is higher in reticulate bodies than

422 in elementary bodies in C. abortus, but found the opposite pattern in W. chondrophila44. The

423 exact nature of the cell cycle and the function of secretion systems in these lineages are yet to

424 be elucidated. Furthermore, the extracellular stage of the chlamydial cell cycle is considerably

425 understudied45, and little is known about potential chlamydial interactions with other microbes.

426 Despite the traditional link between secretion systems and host-association, some of these

427 systems have been described to target bacteria. For example, T6SS are typically used for

428 bacteria-bacteria interactions46, and Gram-positive T7SS have been described to target bacteria

429 under certain conditions47. T1SS, T4SS and T5bSS, all present in chlamydial genomes, have

430 also been shown to target bacterial cells48-51. In Legionella pneumophila, T2SS and T4P

431 facilitate biofilm formation and retention52,53, and the former is involved in sliding motility54

432 and extracellular survival in freshwater. Taken together, this suggests that the presence of

433 various secretion systems does not necessarily imply interactions with eukaryotic hosts.

434 In line with this, NF-T3SS have also been described in bacteria that are not known to

435 interact with eukaryotes55. For example, a NF-T3SS has been identified in Verrucomicrobium

436 spinosum, a generally free-living bacterium (though it has been shown to have detrimental

437 effects when experimentally inoculated in fruit flies and Caenorhabditis elegans39). NF-T3SS

438 has also been found alongside T6SS, chemotaxis and flagellar genes, in strains of and

439 Aeromonas associated with microbial biofilms, but which are not known to associate with

440 eukaryotes39. The presence of NF-T3SS in Myxococcales is particularly interesting, given that

441 they are not associated with a host and contain a highly divergent version of the NF-T3SS,

442 which represents a sister to all other NF-T3SS and lacks various genes generally associated

443 with this system35. The chlamydial version of the NF-T3SS, which is placed as sister clade to

444 all non-myxococcal NF-T3SS (Supplementary Fig. 8), may hold functional similarities with the

445 more divergent Myxococcales NF-T3SS35.

446 In conclusion, these observations indicate that the presence of various secretion systems

447 does not necessarily imply a host-associated lifestyle. Alternatively, proteins secreted by

448 secretion systems could play a role in growth in biofilms, in the interaction with other microbial

449 groups or in the modification of the environment. Studies about the biology of chlamydial

450 elementary bodies, as well as visualisation of representatives of the newly discovered lineages

451 will be instrumental to answer this question.

452

453 4. Phylogenetic diversity of chlamydial nucleotide transporters.

454 Nucleotide transporters (NTTs) belong to the ‘ATP:ADP Antiporter’ (AAA) family of

455 proteins56 and can transport a range of metabolites, including ATP,

456 the cofactor nicotinamide adenine dinucleotide (NAD+), ribonucleotides and

457 deoxyribonucleotides across a membrane. NTTs are found in diverse lineages in the tree of life.

458 In plastid-bearing eukaryotes, the ATP/ADP NTT proteins are essential for the import of ATP

459 into the organelle from the cytosol57,58. Some obligate intracellular pathogenic eukaryotes (e.g.,

460 Microsporidia16,56) and obligately symbiotic bacteria (e.g., members of the Chlamydiaceae and

461 Rickettsia59) use NTTs to import ATP and other nucleotides from their eukaryotic hosts. A

462 recent investigation of NTT phylogenetic diversity found that NTT homologs are also found in

463 a diverse set of free-living organisms60.

464 All chlamydial genomes investigated to date, including those from marine sediment

465 chlamydiae, encode multiple NTT homologs (Supplementary Fig. 4, Supplementary Data 3),

466 although the number of homologs varies across different representatives of this phylum.

467 Depending on the clade, we observed two (CC-IV and Chlamydiaceae), four (CC-I, CC-II and

468 environmental chlamydiae) or five (Anoxychlamydiales) distinct NTT paralogs

469 (Supplementary Data 3). To classify the NTTs of marine sediment chlamydiae relative to those

470 from characterized lineages, we reanalysed the phylogenetic diversity of NTT proteins across

471 the tree of life. For this, we followed a previous analysis that resolved the NTT superfamily60 into

472 “canonical NTTs”, “other NTTs” and, in addition, proteins with NTT-HEAT domains.

473

474 4.1 Phylogenetic diversity of “canonical NTTs”

475

476 “Canonical NTTs” have a single TLC (PF03219) protein domain architecture, and include most

477 functionally characterized NTTs, such as the ATP/ADP transporters from plastids,

478 Microsporidia, Rickettsia and Chlamydiae (Supplementary Fig. 12a). Phylogenetic analyses of

479 these “canonical NTTs” resolved nine distinct groups of chlamydial sequences (Supplementary

480 Fig. 12a). Most ATP/ADP transporters of primary and secondary plastid-bearing lineages (e.g.,

481 archaeplastids, diatoms, , and brown algae) formed a strongly supported

482 clade (ufBV = 95, Supplementary Data 4). Notably, in spite of the cyanobacterial origin of

483 plastids61, the closest prokaryotic homologs to plastid-derived ATP/ADP transporters are

484 represented by homologs of Chlamydiae (Supplementary Fig. 12a). As previously

485 hypothesized59,60,62,63, this suggests that plastid-bearing lineages acquired the NTT gene from a

486 chlamydial-like donor early in the evolution of this organelle. These chlamydial ATP/ADP

487 translocases formed a clade (ufBV = 91, Supplementary Data 4), referred to as cluster 9

488 (Supplementary Fig. 12a). Cluster 9 contains only one representative sequence from each major

489 chlamydial clade, except for Anoxychlamydiales members, each of which has two paralogs.

490 The topology within cluster 9 (Supplementary Data 4) is consistent with the organismal

491 phylogeny (Fig. 2), suggesting that this protein has evolved vertically within the phylum. The

492 substrate specificities of several proteins in cluster 9 have been experimentally characterized.

493 Chlamydia trachomatis (Ct)56, Candidatus Protochlamydia amoebophila (Pam)59 and

494 Simkania negevensis (Sn)59,64 encode ATP/ADP-specific NTT1 antiporters that participate in

495 energy parasitism during infection of their hosts. Interestingly, CtNTT165 can also act as an

496 NAD+/ADP antiporter, providing a mechanism through which members of the Chlamydiaceae

497 can acquire NAD+, which they are unable to synthesize66,67.

498 Clusters 6 and 7 were each composed of several representatives of the environmental

499 chlamydiae, and branched closely to alphaproteobacterial lineages, including ADP/ATP

500 transporters from Rickettsia59, suggesting they may have similar functional roles in chlamydiae.

501 Cluster 8 includes CC-I, one representative from Anoxychlamydiales, and several

502 representatives from the environmental chlamydiae (Supplementary Fig. 12a). SnNTT3, found

503 in cluster 8, has been characterized and is a proton-independent general NTP transporter, which

504 is also capable of transporting the deoxyribonucleotide triphosphate dCTP64.

505 Five different chlamydial clusters, clusters 1-5, together formed a well-supported group

506 (ufBV = 100). Cluster 5 includes CC-II and environmental chlamydiae (Supplementary Fig.

507 12a) and PamNTT2, a proton-independent transporter of all four canonical ribonucleoside

508 triphosphates68. Cluster 4 is composed solely of sequences from environmental chlamydiae

509 (Supplementary Fig. 12a) and includes PamNTT3, a proton-energized symporter that transports

510 UTP68. Cluster 3 is composed of members of CC-I, CC-II, CC-III and Anoxychlamydiales, and

511 includes SnNTT2, a proton-dependent symporter of GTP and ATP64. Cluster 2 comprises

512 sequences from CC-IV and Chlamydiaceae, and includes CtNTT2, a proton-driven symporter

513 of all four NTPs56 (Supplementary Fig. 12a). Cluster 1 contains homologs from environmental

514 chlamydiae, Anoxychlamydiales and some members of CC-II including PamNTT5, a proton-

515 energized symporter that transports both GTP and ATP68 (Supplementary Fig. 12a).

516 Finally, we observe a clear functional conservation within the experimentally

517 characterized H+-driven symporters, all of which group together in clusters 1, 2, 3 and 4 (ufBV

518 = 88; Supplementary Fig. 12a, Supplementary Data 4). This suggests that phylogenetic

519 reconstructions of NTT homologues possess a predictive power in terms of mode of transport,

520 but not substrate specificity.

521

522 4.2 Phylogenetic diversity of “other NTTs” with a single-domain architecture

523

524 A bacterial-dominated group of “other NTTs” has been described to have a single TLC

525 (PF03219) protein domain architecture, and most proteins in this group have yet to be

526 functionally characterized (Supplementary Fig. 12b). In the phylogenetic analysis of those

527 “other NTTs”, we recovered two groups of chlamydial sequences (Supplementary Fig. 12b).

528 Cluster 11 includes representatives from environmental chlamydiae and Anoxychlamydiales

529 (Supplementary Fig. 12b) and branches as a sister clade to a clade comprising sequences from

530 the Candidatus Dependentiae (formerly TM6) phylum (Supplementary Fig. 12b). Ca.

531 Dependentiae have reduced genomes and are thought to lead host-associated lifestyles with

532 eukaryotic hosts69. Cluster 10 includes members of CC-I (including SnNTT4, whose substrate

533 specificity could not be determined in a prior study64), CC-II, and Anoxychlamydiales and

534 forms a maximally supported clade composed largely of , including several

535 Bdellovibrio-and-like-organisms (BALOs). BALOs have a predatory lifestyle whereby they

536 invade the periplasm of other Gram-negative bacteria to harvest nutrients70. However, this

537 invasive lifestyle is not obligate, as BALOs can grow axenically under nutrient rich conditions.

538

539 4.3 Phylogenetic diversity of “NTT-HEAT” proteins

540

541 “NTT-HEAT” family NTTs60 have an additional C-terminal HEAT domain (PF13646,

542 PF02985) and are found across a wide range of free-living bacteria, though none have been

543 functionally characterized thus far (Supplementary Fig. 12c). HEAT domains are involved in

544 protein-protein interactions71, and thus may alter the function of the NTT domain in NTT-

545 HEAT proteins60. These putative NTTs are hypothesized to facilitate inter-microbial nutrient

546 exchange during bacteria-bacteria interactions60. For example, these NTTs could be involved

547 in multicellular development in , which have proteins with this domain

548 architecture60. A ML phylogeny of proteins with this domain architecture (Supplementary Fig.

549 12c) recovered one chlamydial clade, cluster 12, which includes the Chlamydiaceae, CC-IV,

550 CC-III, some environmental chlamydiae and several members of Anoxychlamydiales.

551

552 4.4 NTTs in marine sediment chlamydiae

553

554 NTTs identified in marine sediment chlamydiae from this study clustered together with other

555 chlamydial homologs (Supplementary Fig. 12). Despite distinct phyletic distribution patterns

556 of the NTTs found in different chlamydiae clades, the ubiquity of NTT homologues in different

557 chlamydial lineages gives weight to the proposed ancient origin of NTTs within the Chlamydiae

558 phylum59,60. Marine sediment chlamydiae appear to host a set of NTTs similar to that of other

559 members of the phylum, including homologs closely related to functionally characterized

560 NTTs. However, due to the promiscuous functions of NTTs59,67, we cannot predict the substrate

561 specificity of homologs found in marine sediment chlamydiae. Although they have homologs

562 of the canonical ATP/ADP transporter (Supplementary Fig. 12a), functional characterization is

563 necessary to determine their substrate specificity.

564

565 5. Genomic potential for de novo biosynthesis of nucleotides and amino acids across

566 Chlamydiae

567 Many host-associated bacteria, and particularly obligate intracellular bacteria, are able to

568 acquire essential amino acids and nucleotides from their hosts. Obligate symbionts often

569 undergo genome reduction and lose the ability to produce these compounds de novo72. Here,

570 we discuss the potential of the marine sediment chlamydiae genomes for de novo amino acid and

571 nucleotide biosynthesis (Supplementary Fig. 4, Supplementary Data 3).

572

573 5.1 De novo biosynthesis of amino acids

574

575 Similar to characterized chlamydiae19, the investigated MAGs seem to be generally auxotrophic

576 for many amino acids. In fact, no chlamydiae representative with the capacity to synthesize all

577 amino acids has been identified thus far. Below, we discuss some examples of more extensive

578 amino acid and nucleotide biosynthesis capabilities in specific Chlamydiae lineages.

579 Proline. Environmental chlamydiae do not have the coding potential to synthesize all

580 amino acids de novo19, but generally encode a larger set of amino acid biosynthetic capabilities

581 than Chlamydiaceae (e.g., proline biosynthesis). We identified all genes for proline de novo

582 biosynthesis in more than half of environmental chlamydiae genomes (9/15; including marine

583 sediment lineage Chlamydiae bacterium K940_chlam_7), and in one member of CC-IV

584 (Chlamydiae bacterium K940_chlam_9). This observation is in line with a scenario in which

585 proline biosynthesis was present in the common ancestor of the CC-IV, Chlamydiaceae and

586 environmental chlamydiae, and was subsequently lost in the Chlamydiaceae.

587 Aromatic amino acids. The ability to synthesize aromatic amino acids (i.e., tryptophan,

588 phenylalanine, and tyrosine) displays a punctuated distribution among the analysed chlamydiae.

589 Seemingly, only S. negevensis is capable of synthesizing all three amino acids19. The genomes

590 of Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 and its close relatives

591 (Supplementary Fig. 13, Supplementary Data 3) encode the potential for phenylalanine and

592 tyrosine biosynthesis and a near-complete pathway for tryptophan biosynthesis. While

593 Waddliaceae bacterium SP13, bacterium SCGC AG-110-P3 and Parachlamydia

594 sp. C2 each encode a complete pathway for the biosynthesis of tryptophan, some

595 Chlamydiaceae19 were found to encode a near-complete pathway (Supplementary Fig. 4).

596 Other amino acids. Chlamydiae generally do not encode pathways for the biosynthesis

597 of arginine, methionine, histidine, leucine, isoleucine and valine. We identified some notable

598 exceptions, including a complete leucine biosynthesis pathway in Parachlamydia sp. BC.030

599 and a near-complete histidine biosynthesis pathway (7/8 components; Supplementary Data 3)

600 in Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 and close relatives

601 (Supplementary Fig. 13, Supplementary Data 3).
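The distinction between complete, near-complete and partial pathways used in this section can be sketched as a simple presence/absence check. This is an illustrative sketch only: the step labels are hypothetical placeholders, and the rule that "near-complete" means at most one missing component is an assumption inferred from the 7/8 example above, not a stated criterion of the study.

```python
def classify_pathway(required_steps, detected_genes, max_missing_for_near=1):
    """Classify a pathway from the set of required steps that were detected.

    Returns (status, n_found, n_required, missing_steps).
    The threshold for "near-complete" is an assumption of this sketch.
    """
    required = set(required_steps)
    if not required:
        raise ValueError("a pathway needs at least one required step")
    found = required & set(detected_genes)
    missing = sorted(required - found)
    if not missing:
        status = "complete"
    elif len(missing) <= max_missing_for_near:
        status = "near-complete"
    elif found:
        status = "partial"
    else:
        status = "absent"
    return status, len(found), len(required), missing


# Hypothetical eight-step pathway; labels are placeholders, not real gene names.
steps = [f"step{i}" for i in range(1, 9)]

print(classify_pathway(steps, steps))        # all eight steps detected
print(classify_pathway(steps, steps[:7]))    # seven of eight steps detected
print(classify_pathway(steps, steps[:6]))    # two steps missing
```

For incomplete MAGs or SAGs, a missing step in such a check cannot be distinguished from a gene that was simply not recovered during assembly or binning.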

602

603 5.2 De novo biosynthesis of nucleotides

604

605 Pyrimidine. Chlamydiaceae and most members of the environmental chlamydiae are

606 auxotrophic for both purine and pyrimidine nucleotides19. Previous

607 work has identified pyrimidine biosynthesis (i.e., uridine monophosphate (UMP) biosynthesis

608 from glutamine; KEGG module M00051)73,74 in W. chondrophila WSU 86-104425 and

609 Criblamydia sequanensis CRIB-1873,74. We identified a complete pyrimidine biosynthesis

610 pathway in members of CC-II (Chlamydiae bacterium Ga0074140), CC-III (Chlamydiae

611 bacterium CG10_big_fil_rev_8_21_14_0_10_42_34,

612 CG10_big_fil_rev_8_21_14_0_10_35_9), CC-IV (Chlamydiae bacterium K940_chlam_9) and

613 environmental chlamydiae (Chlamydiae bacterium K940_chlam_7, K940_chlam_3,

614 Chlamydiales bacterium SCGC AG-110-M15 and Waddliaceae bacterium SP13). Further, we

615 identified near-complete pathways in additional chlamydiae members of CC-II

616 ( helvetica T3358 and Chlamydiae bacterium K940_chlam_2), other CC-IV

617 lineages, and environmental chlamydiae (Chlamydiae bacterium K940_chlam_3).

618 Purine. Compared to pyrimidine biosynthesis, purine biosynthesis is more sparsely distributed

619 among the Chlamydiae. The first evidence for a near-complete de novo purine biosynthesis

620 pathway (i.e., inosine monophosphate (IMP) biosynthesis from glutamine; KEGG module

621 M00048) in Chlamydiae was recently described in Chlamydiales bacterium SCGC AG-110-

622 M1536. We additionally identified the complete pathway for IMP biosynthesis in Chlamydiae

623 bacterium CG10_big_fil_rev_8_21_14_0_10_35_9 and Waddliaceae bacterium SP13, and

624 partial biosynthesis pathways (i.e., all but the PurE and PurK encoding genes; Supplementary

625 Data 3) in Chlamydiales bacterium SCGC AG-110-M15 and Chlamydiae bacterium

626 K940_chlam_3 (environmental chlamydiae). The former represents a relatively incomplete

627 SAG (Supplementary Table 3), such that the presence of these genes cannot be ruled out.

628 Interestingly, all genomes which encode a complete or near-complete purine de novo

629 biosynthesis pathway also encode a complete or near-complete pathway for de novo pyrimidine

630 biosynthesis. This observation suggests that some chlamydiae might not rely on a host for these

631 essential metabolites. Future analyses aimed at inferring the evolutionary histories of these

632 pathways will help to determine whether nucleotide biosynthesis was ancestrally present in

633 Chlamydiae or rather represents a derived trait acquired by horizontal gene transfer.

634

635 6. Eukaryotes in Loki’s Castle marine sediments

636 All chlamydiae characterized to date represent obligate symbionts with eukaryotic hosts and

637 have a characteristic biphasic lifecycle (intracellular host-associated phase and extracellular

638 elementary body phase). Our analyses revealed that the marine sediment chlamydiae encode

639 key host-association features (e.g., NF-T3SS; Supplementary Data 3, Supplementary

640 Discussion 3) and elementary body factors (e.g., early upstream reading frame transcription

641 factor and histone-like development protein; Supplementary Fig. 4, Supplementary Data 3) and

642 are predicted to be auxotrophic for some nucleotides and amino acids (Supplementary Fig. 4,

643 Supplementary Data 3, Supplementary Discussion 5). These observations would suggest that

644 marine sediment chlamydiae might be host-associated and prompted a thorough search for

645 eukaryotes in these sediments.

646 Indeed, active populations of fungi, and macrofauna have been observed in

647 marine sediments75-78. Using universal eukaryotic primer sets, we failed to amplify 18S rRNA

648 gene sequences from the marine sediment samples (Supplementary Table 5), in line with

649 previous analyses of Loki’s Castle marine sediments79. However, we were able to identify

650 several 18S rRNA gene sequences in the obtained metagenomic data (see below;

651 Supplementary Table 4), suggesting that eukaryotes might represent low-abundance community

652 members of these anaerobic marine sediments. Yet, it has been shown that eukaryotic DNA

653 from overlying water columns can be deposited and well-preserved in marine sediments under

654 anoxic conditions80,81. In the present study, we were unable to determine whether the observed

655 eukaryotic DNA sequences were derived from live cells capable of hosting chlamydiae. Below

656 we expand on the identified 18S rRNA gene sequences and discuss their potential sources.

657 Several 18S rRNA gene sequences from samples GS10_PC15_940 (contig-124_471961

658 and contig-124_482067) and GS10_PC15_1000 (contig-124_27583) were classified as

659 mammalian (Supplementary Table 4). These sequences most likely represent human

660 contamination introduced during sampling, DNA extraction or during sequencing.

661 In the GS10_PC15_1060 sample we identified an 18S rRNA gene sequence that likely

662 derives from a flatworm (order Rhabdocoela, contig-124_364989). Chlamydiae bacterium

663 K1060_chlam_2, which corresponds to the only Simkaniaceae-like MAG derived from these

664 marine sediments, was also obtained from this sample. Since several of the previously described

665 Simkaniaceae are known symbionts of marine worms82,83, it is possible that Chlamydiae

666 bacterium K1060_chlam_2 might be a symbiont of the Rhabdocoela-related flatworm observed

667 in this sediment layer.

668 In the GS10_PC15_940 sample we uncovered 18S rRNA gene sequences that likely

669 derive from an ichthyosporean (contig-124_299197 and contig-124_207553), a green alga

670 (Micromonas contig-124_372972) and a genome (contig-124_152295). This

671 sample was shown to also contain the actively replicating (Fig. 4a) and highly abundant

672 Chlamydiae bacterium K940_chlam_7 (Supplementary Discussion 7), which is most closely

673 related to the protist-associated environmental chlamydiae10,11. This raises the possibility that

674 one of the aforementioned eukaryotes could represent a host for Chlamydiae bacterium

675 K940_chlam_7. However, given that Micromonas is a phototroph it is unlikely that these cells

676 are active in dark marine sediments. Moreover, there have been no reported cases of chlamydiae

677 capable of infecting Archaeplastida (algae and land plants). Alternatively, it is possible that the

678 observed ichthyosporean might represent the host organism of Chlamydiae bacterium

679 K940_chlam_7. Yet, little is known about the ecology of ichthyosporeans in marine sediments,

680 and, to our knowledge, there have been no reported cases of chlamydiae capable of infecting

681 ichthyosporeans so far.

682 Eukaryotes present at low abundances in the samples could have been missed by

683 our sequencing efforts. However, in general, the eukaryotic sequences identified in the samples

684 appear insufficient to account for overall patterns in chlamydial diversity and abundance across

685 all samples. No eukaryotic sequences were identified in sample GS08_GC12_126, where

686 Anoxychlamydiales lineages were found to be exceptionally abundant (Supplementary

687 Discussion 7), suggesting that these particular chlamydial lineages may not depend on

688 a eukaryotic host.

689

690 7. Abundance and diversity of chlamydial lineages in Loki’s Castle marine sediments

691 In a previous study of Loki’s Castle marine sediments108, we detected the presence of

692 Chlamydiae. We further investigated the relative abundance and diversity of Chlamydiae in

693 these sediments using amplicon sequencing of samples taken from four different sediment cores

694 at various depths (Supplementary Table 1, Supplementary Data 2). All of the samples with high

695 chlamydial abundances were isolated from sediment depths found below (but within 1.2 m of)

696 the oxic/anoxic transition zone, which is found at various depths below the seafloor in sediment

697 cores GS08_GC12 (0.38 mbsf)74, GS10_PC15 (1.0 mbsf)75, and GS10_GC1475 (0.4 mbsf). The

698 highest diversity of chlamydial OTUs (over 0.1% relative abundance) was observed in anoxic

699 sediment layers (Fig. 1b). When considering individual OTUs found across the marine sediment

700 amplicons, 30 were found to be present in at least five samples (Supplementary Data 2),

701 indicating that a large fraction of the observed chlamydial lineages are commonly found in this

702 environment.
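The two quantities used in this paragraph, relative abundance per sample and the number of samples in which an OTU is found, can be sketched from an OTU count table as follows. This is a minimal sketch under stated assumptions: the table layout, function names and read counts are hypothetical, and an OTU is treated as "present" whenever its relative abundance exceeds the chosen cut-off (zero by default).

```python
def relative_abundance(counts_by_sample):
    """Convert read counts {sample: {otu: count}} to percentages per sample."""
    rel = {}
    for sample, counts in counts_by_sample.items():
        total = sum(counts.values())
        rel[sample] = {
            otu: (100.0 * c / total if total else 0.0)
            for otu, c in counts.items()
        }
    return rel


def prevalent_otus(rel, min_samples, min_percent=0.0):
    """OTUs exceeding min_percent relative abundance in at least min_samples samples."""
    hits = {}
    for abundances in rel.values():
        for otu, pct in abundances.items():
            if pct > min_percent:
                hits[otu] = hits.get(otu, 0) + 1
    return sorted(otu for otu, n in hits.items() if n >= min_samples)


# Hypothetical read counts; placeholders only, NOT data from this study.
toy_counts = {
    "sample_1": {"otu_a": 400, "otu_b": 0, "otu_c": 600},
    "sample_2": {"otu_a": 10, "otu_b": 5, "otu_c": 985},
    "sample_3": {"otu_a": 0, "otu_b": 1, "otu_c": 999},
}

rel = relative_abundance(toy_counts)
print(prevalent_otus(rel, min_samples=2))
print(prevalent_otus(rel, min_samples=2, min_percent=1.0))
```

Raising `min_percent` restricts the count to OTUs that are both widespread and abundant, which mirrors the two thresholds discussed in the text.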

703

704 7.1 Anoxychlamydiales lineages are abundant microbial community members in Loki’s

705 Castle marine sediments

706

707 We were unable to link 16S rRNA gene fragments to most of the Anoxychlamydiales MAGs

708 reconstructed in this study (a problem often encountered in genome-resolved metagenomic

709 studies112,113). However, in the phylogenetic analysis of the obtained 16S rRNA amplicon

710 sequences (Supplementary Data 4), we identified 17 OTUs that formed a highly supported clade

711 (ufBV = 98) with Anoxychlamydiales member Chlamydiae bacterium SM23_39. The OTU

712 abundance of this group mirrors the presence of Anoxychlamydiales bins in the metagenomes

713 from the same samples, indicating that these OTUs represent Anoxychlamydiales 16S rRNA

714 gene sequences. Two of these OTUs (OTU_5_19291 and OTU_255_442) were highly

715 abundant and widespread across sediment samples. OTU_5_19291 is found in 18 samples, with

716 highest relative abundance in all four sediment cores past the oxic/anoxic transition zone. The

717 relative abundance of OTU_5_19291 is above 1% in 7 samples. In one exceptional case it was

718 the most abundant OTU in the GS08_GC12_126 sample, representing 40% of bacterial relative

719 abundance. OTU_255_442, like OTU_5_19291, was most abundant (1.3%) in sample

720 GS08_GC12_126.

721 7.2 Environmental chlamydiae lineages are abundant in sample GS10_PC15_940

722

723 In our amplicon survey of GS10_PC15_940, we identified an abundant OTU (OTU_64_1912;

724 4.8% abundance), which likely corresponds to the Chlamydiae bacterium K940_chlam_7

725 MAG, as they both affiliate with the Waddliaceae family in phylogenetic analyses (Fig. 1b,

726 Fig. 2, Supplementary Fig. 3, Supplementary Data 2). Waddlia chondrophila is a known animal

727 pathogen12,84, and Waddliaceae family members have been identified both in animal-associated

728 and environmental samples85. The wide distribution of these organisms in diverse environments

729 suggests these species could naturally infect protists like other environmental chlamydiae10,11.

730 If so, this raises the possibility that the Waddliaceae-related Chlamydiae bacterium

731 K940_chlam_7 (Supplementary Discussion 6) might be a symbiont of the eukaryotes detected

732 in sample GS10_PC15_940.

733

734 8. Underestimation of environmental abundance and diversity of Chlamydiae

735 8.1 The environmental distribution of Chlamydiae

736

737 To assess if the high environmental relative abundance and diversity of Chlamydiae identified

738 here (Supplementary Table 1, Supplementary Discussion 7) is unique to Loki’s Castle marine

739 sediments, we surveyed chlamydial abundance and diversity in other environments using the

740 Integrated Microbial NGS (IMNGS) platform86. IMNGS allows for large-scale taxonomic

741 analysis of 16S rRNA gene amplicon datasets deposited in the Sequence Read Archive (SRA).

742 Using this platform, we identified 13 environments that were enriched for chlamydial diversity

743 (>50 OTUs) and/or abundance (>0.1% relative abundance; Fig. 4b, Supplementary Data 3).
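The two enrichment criteria named above (more than 50 chlamydial OTUs, or more than 0.1% chlamydial relative abundance) can be applied per environment as in the following sketch. The input tuples, function name and values are hypothetical illustrations; they do not represent the IMNGS output format or any result from this survey.

```python
from collections import defaultdict


def summarize_environments(samples, min_otus=50, min_rel_abundance=0.1):
    """Summarize, per environment, the share of samples passing each criterion.

    samples: iterable of (environment, n_chlamydial_otus, rel_abundance_percent).
    Returns {environment: {"n_samples", "pct_diverse", "pct_abundant"}}.
    """
    totals = defaultdict(lambda: {"n": 0, "diverse": 0, "abundant": 0})
    for env, n_otus, rel in samples:
        entry = totals[env]
        entry["n"] += 1
        if n_otus > min_otus:
            entry["diverse"] += 1
        if rel > min_rel_abundance:
            entry["abundant"] += 1
    return {
        env: {
            "n_samples": t["n"],
            "pct_diverse": 100.0 * t["diverse"] / t["n"],
            "pct_abundant": 100.0 * t["abundant"] / t["n"],
        }
        for env, t in totals.items()
    }


# Hypothetical samples; placeholders only, NOT data from this study.
toy_samples = [
    ("soil", 60, 0.20),
    ("soil", 10, 0.05),
    ("soil", 5, 0.30),
    ("soil", 70, 0.01),
    ("biofilm", 3, 0.15),
]

for env, stats in summarize_environments(toy_samples).items():
    print(env, stats)
```

Keeping the two criteria separate makes it possible to see environments in which Chlamydiae are abundant but not diverse, or the reverse.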

744 A large proportion of rhizosphere samples (831, corresponding to 62% of samples) and

745 soil samples (2295, corresponding to 14% of samples) harbour relative abundances of

746 Chlamydiae above 0.1%, though comparatively fewer had a high taxonomic richness as based

747 on OTU numbers. Several salt marsh samples were found to contain large numbers of

748 chlamydial OTUs, indicating that this environment represents an unexplored reservoir for

749 uncultured Chlamydiae diversity. Approximately 16% of groundwater samples also appear to

750 harbor a large relative chlamydial abundance, which is congruent with a recent study in which

751 17 MAGs were assembled from groundwater that resolved five distinct chlamydial lineages8.

752 Other environments, including wastewater, activated sludge and bioreactor samples, also

753 contain chlamydial relative abundances above 0.1% of the total microbial community.

754 Interestingly, previous studies have retrieved several chlamydial MAGs affiliated with both

755 CC-II and environmental chlamydiae from such environments (Supplementary Discussion 1,

756 Supplementary Table 3)87-89. In addition, some biofilm samples were found to harbour higher

757 (>0,1% relative abundance) chlamydial abundances (15% of samples) and could be an

758 additional environment of interest for studying uncultured chlamydial lineages.

759 Furthermore, we found that 24% and 14% of freshwater samples contained relative

760 abundances above 0,1% and included more than 50 OTUs, respectively. Samples from

761 freshwater sediments generally also contain high relative abundances of Chlamydiae, but do

762 not necessarily harbour high chlamydial diversity, which is similar to observations made for

763 seawater and marine sediment samples (Fig. 4b, Supplementary Data 3). These findings are in

764 line with a study that revealed a broad taxonomic and phylogenetic diversity of chlamydiae in

765 various environments, particularly from , soil and freshwater environments90. Altogether,

32

766 our analyses underline that several environments harbor high diversity and relative abundances

767 of uncultured Chlamydiae, even though primer sets used in environmental surveys were not

768 optimal for detection of chlamydiae (see 8.2).
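The two enrichment criteria used in 8.1 (>50 chlamydial OTUs; >0.1% chlamydial relative abundance) can be illustrated with a minimal Python sketch. Only the two thresholds come from the text; the `Sample` record, the function names and the toy values are hypothetical and do not reproduce the IMNGS output format or the data in Supplementary Data 3.

```python
from dataclasses import dataclass

# Thresholds taken from the text of section 8.1.
OTU_THRESHOLD = 50          # "diverse": more than 50 chlamydial OTUs
ABUNDANCE_THRESHOLD = 0.1   # "abundant": more than 0.1% relative abundance


@dataclass
class Sample:
    """Hypothetical per-sample summary (not the IMNGS file format)."""
    environment: str
    chlamydial_otus: int
    chlamydial_rel_abundance: float  # percent of the total community


def summarise(samples):
    """Return, per environment, the percentage of samples above each threshold."""
    counts = {}
    for s in samples:
        c = counts.setdefault(s.environment, {"n": 0, "abundant": 0, "diverse": 0})
        c["n"] += 1
        c["abundant"] += s.chlamydial_rel_abundance > ABUNDANCE_THRESHOLD
        c["diverse"] += s.chlamydial_otus > OTU_THRESHOLD
    return {
        env: {
            "pct_abundant": 100.0 * c["abundant"] / c["n"],
            "pct_diverse": 100.0 * c["diverse"] / c["n"],
        }
        for env, c in counts.items()
    }


# Toy values for illustration only.
toy_samples = [
    Sample("freshwater", 60, 0.5),
    Sample("freshwater", 10, 0.05),
    Sample("freshwater", 3, 0.2),
    Sample("freshwater", 0, 0.0),
    Sample("soil", 51, 0.1),
]
summary = summarise(toy_samples)
```

Both comparisons are strict, so a sample at exactly 0.1% relative abundance is not counted as abundant; whether the original analysis treated boundary values this way is an assumption.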

8.2 Underestimation of chlamydial diversity and abundance in environmental surveys

Schulz et al.90 recently reported that taxonomic diversity estimates differ significantly between amplicon and metagenomic surveys. In particular, they observed that the taxonomic richness and diversity of Chlamydiae were more pronounced in metagenomic data than in amplicon studies. This may result from the common use of primers that do not amplify a large fraction of representatives of the Chlamydiae phylum91. For example, the widely used 16S rRNA gene primer set 515FB/806RB from the Earth Microbiome Project92 captures only 0.7% of the characterized chlamydial diversity without any mismatches (though it does capture 95% when a single mismatch is allowed; Supplementary Table 5). Similarly, the universal primer pair A519F/Uni1391R captures less than 1% of chlamydial diversity (Supplementary Table 5). In the present study we therefore used a bacterial-specific primer pair (S-D-0564-a-S-15/SD-Bact-1061-a-A-17) that is predicted to capture ~94% of the presently known chlamydial diversity without mismatches (Supplementary Table 5). Indeed, when comparing the relative abundances of OTUs generated by 16S rRNA amplicon sequencing using the S-D-0564-a-S-15/SD-Bact-1061-a-A-17 and A519F/U1391R primer pairs on sediment core GS08_GC1293, we found that chlamydial OTUs represented 43% relative abundance using the former primer pair, and less than 1% relative abundance when using the latter. Similarly, while no chlamydial sequences were detected previously in GS10_GC14_75 using the A519F/U1391R primer pair79, a similar analysis with the S-D-0564-a-S-15/SD-Bact-1061-a-A-17 primer pair recovered 8.9% relative chlamydial abundance.
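The notion of primer coverage "without mismatches" versus "allowing a single mismatch" can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the procedure used to produce Supplementary Table 5: the primer and target sequences are invented toy strings, matching is ungapped, only the forward orientation is scanned, and the function names are hypothetical.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it can pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the target base is not allowed by the primer symbol."""
    return sum(base not in IUPAC[p] for p, base in zip(primer, site))


def best_mismatch_count(primer, target):
    """Smallest mismatch count over all ungapped placements of the primer.

    Returns None when the target is shorter than the primer.
    """
    n = len(primer)
    if len(target) < n:
        return None
    return min(mismatches(primer, target[i:i + n])
               for i in range(len(target) - n + 1))


def coverage(primer, targets, max_mismatches):
    """Percentage of targets with a binding site within the mismatch budget."""
    if not targets:
        raise ValueError("no target sequences supplied")
    hits = 0
    for target in targets:
        best = best_mismatch_count(primer, target)
        if best is not None and best <= max_mismatches:
            hits += 1
    return 100.0 * hits / len(targets)


# Toy data for illustration only (not real primers or 16S rRNA sequences).
toy_primer = "ACGYT"
toy_targets = ["TTACGCTGG", "TTACGTTGG", "TTACGATGG", "GGGGGGGGG"]
strict = coverage(toy_primer, toy_targets, max_mismatches=0)
relaxed = coverage(toy_primer, toy_targets, max_mismatches=1)
```

In this toy set the coverage rises when one mismatch is tolerated, mirroring in miniature the contrast reported above for the 515FB/806RB primer set. A reverse primer would need to be reverse-complemented before scanning, which this sketch omits.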

8.3 Using culture-independent methods to explore chlamydial genomic diversity

As evidenced by the present study, culture-independent methods have great potential for expanding genomic representation within the Chlamydiae phylum94. The majority of the chlamydiae characterized so far have been isolated by means of co-cultivation (Supplementary Table 3), thus selecting for representatives that can replicate in the respective eukaryotic host. However, most newly identified chlamydial lineages are represented only by genome data derived from cultivation-independent studies (Supplementary Table 3). Chlamydiae-targeted studies using cultivation-independent approaches have resulted in the first chlamydial SAGs36 and the first chlamydial MAGs from metagenomic-based projects targeting animal host-associated populations2,4. A number of chlamydial MAGs have also been recently retrieved from whole microbial community metagenomic sequencing efforts of diverse environments, including drinking water treatment plants88,89, a bioreactor87, aquifer groundwater8, a cold-water geyser95, oceanic waters37 and river estuary sediment9. Altogether, the Chlamydiae phylum is severely understudied at the genomic level, and the future exploration of the microbial communities in additional environments in which Chlamydiae are represented (e.g., see 8.1) will likely yield genomic data from diverse and abundant chlamydial lineages.

Supplementary Figures

[Figure: workflow diagram. Amplicon branch: extract DNA; amplify and sequence a region of the 16S rRNA gene from Bacteria; cluster 16S rRNA gene sequences into OTUs. Metagenome branch: extract DNA; sequence metagenome; assemble metagenomic reads into contigs; group contigs into metagenomic bins.]

Supplementary Figure 1. Overview of sequencing methods. For amplicon sequencing, DNA was extracted from 69 marine sediment samples taken near Loki’s Castle hydrothermal vent field. These were used as a template for bacterial-specific amplification of an approximately 500 bp region of the 16S rRNA gene and sequenced on an Illumina MiSeq sequencer. These sequences were clustered at the 97% level to generate operational taxonomic units (OTUs). For metagenomic sequencing, DNA was extracted from 4 samples with a high abundance and diversity of chlamydiae. Sequence libraries were prepared and sequenced with Illumina HiSeq and resulting reads of each metagenome were assembled into contigs using IDBA-UD. A differential coverage genome binning approach (using CONCOCT), followed by manual curation, was used to obtain metagenome assembled genomes (MAGs).

[Figure panel a: sequencing statistics]

Sample ID        Metagenome ID   Gbp sequenced   Gbp assembled (≥ 1 kb)
GS08_GC12_126    KR126           16.4            1.10
GS10_PC15_940    K940            63              1.30
GS10_PC15_1000   K1000           53.8            1.07
GS10_PC15_1060   K1060           116.4           2.38

[Figure panels b and c: phylogenetic trees. Legend: ufBV ≥ 95; ufBV ≥ 80; metagenome-assembled genome. Scale bars: 0.5 and 0.09 substitutions per site.]

Supplementary Figure 2. Marine sediment metagenome sequencing statistics and chlamydiae diversity. a, Sequencing statistics and sample identifiers for the four sediment samples used for metagenomic sequencing, including the number of Gbp assembled for each metagenome. b, Maximum likelihood (ML) tree estimated using an alignment of fifteen ribosomal proteins (at least five of which had to be present) from reference taxa (black, collapsed clades in grey) and marine sediment chlamydiae (orange), under the LG+C60+G model of evolution implemented with IQ-TREE (180 taxa, 2308 sites). Black and white circles represent bipartition values greater than 95 and 80 percent, respectively, from 1000 ultrafast bootstraps (ufBV). Dotted lines indicate the ufBV for all branches in the indicated clade. Sequences corresponding to metagenome-assembled genomes retrieved in this study are indicated with a blue star. c, ML phylogeny of chlamydial 16S rRNA gene fragments identified in Loki’s Castle metagenomes (orange) in the context of a reference chlamydial dataset (black, collapsed clades in grey), inferred using IQ-TREE with the GTR+R7 model of evolution (344 taxa, 1554 sites).


[Figure panels: a, species phylogeny with Chlamydiae clades (CC) I, II, III and V, Anoxychlamydiales, environmental chlamydiae and Chlamydiaceae labelled; stars mark other MAG and SAG species representatives; BV ≥ 90 and BV ≥ 70; scale bar 0.4 substitutions per site. b, presence and absence of bacterial NOGs. c, median intergenic space (axis 0 to 100).]

Supplementary Figure 3. Species phylogeny and gene content variation across the Chlamydiae phylum. a, The species phylogeny was estimated using a concatenated alignment of 38 single-copy marker proteins, using IQ-TREE under the PMSF approximation of LG+C60 (8072 sites). Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively. The phylogeny includes other metagenome assembled genome (MAG) and single-cell assembled genome (SAG) chlamydiae species representatives (stars, see Methods, Supplementary Table 3). b, Presence (in dark grey) and absence (in light grey) of NOGs found across all chlamydial lineages, with delineated Chlamydiae clades indicated. c, Median intergenic space in bp across chlamydial genomes.


Supplementary Figure 4. Overview of selected protein content across Chlamydiae. Presence of selected proteins and pathways including traits associated with the chlamydiae biphasic lifecycle, components of central carbon metabolism and nucleotide and amino acid biosynthesis, across Chlamydiae species representatives color-coded according to Chlamydiae clades. Where relevant the corresponding KEGG pathway module is indicated in brackets.


[Figure: bar charts of the number of NOGs per COG category for environmental chlamydiae, Chlamydiae bacterium K940_chlam_9, Chlamydiae bacterium KR126_chlam_2 and Chlamydiae bacterium K1000_chlam_4 (CC-IV), and Chlamydiaceae. Categories on the x-axis, in order: J K L D M N O T U V C E F G H I P Q.]

Information storage and processing
J: Translation, ribosomal structure and biogenesis
K: Transcription
L: Replication, recombination and repair

Cellular processes and signalling
D: Cell cycle control, cell division, chromosome partitioning
M: Cell wall/membrane/envelope biogenesis
N: Cell motility
O: Posttranslational modification, protein turnover, chaperones
T: Signal transduction mechanisms
U: Intracellular trafficking, secretion, and vesicular transport
V: Defense mechanisms

Metabolism
C: Energy production and conversion
E: Amino acid transport and metabolism
F: Nucleotide transport and metabolism
G: Carbohydrate transport and metabolism
H: Coenzyme transport and metabolism
I: Lipid transport and metabolism
P: Inorganic ion transport and metabolism
Q: Secondary metabolites biosynthesis, transport and catabolism

Supplementary Figure 5. COG category distribution patterns. Distributions of the number of NOGs assigned across COG categories for environmental chlamydiae (mean and standard deviation), CC-IV, and Chlamydiaceae (mean and standard deviation).


[Figure: presence/absence matrix. Legend: NOG or PF present; NOG or PF absent. Only the gene counts of cells where the NOG or PF domain is present were recovered, listed per row in column order.]

Columns: Chlamydia trachomatis, Chlamydia muridarum, Chlamydia suis, Chlamydophila pecorum, Chlamydia sp. 2742-308, Ca. Chlamydia corallus, Chlamydophila pneumoniae, Chlamydia ibidis, Chlamydia avium, Chlamydia gallinacea, Chlamydia felis, Chlamydophila caviae, Chlamydia abortus, Chlamydia psittaci

NOG or PF Domain | Description | Counts

Host Interaction and Adhesion
0ZM85 | Polymorphic membrane protein - family A | 1 1 1 1 1 1 1 1 1 1 1 1 1
0Y0KC | Polymorphic membrane protein - family B/C | 2 2 2 1 1 1 2 1 1 1 1 1 1 1
0XV92 | Polymorphic outer membrane protein - family D/E/F/G/H | 5 5 6 11 14 19 13 22 4 5 21 13 10 13
0Y3IR | Polymorphic membrane protein - family G | 1 1 1 2 1 1 1 1 1 1 1 1 1 1
PF03503 | Chlamydia cysteine-rich outer membrane protein 3 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PF05745 | Chlamydia 15 kDa cysteine-rich outer membrane protein (CRPA) | 1 1 1 1 1 1 1 1 1 1 1 1 1
PF04156 | IncA protein | 4 5 3 1 2 9 1 1 1 8 10 8 10
PF17628 | Inclusion membrane protein D | 1 1 1 1 1

Virulence Factors
0Y2RT | Porin AaxA/Carbohydrate-selective porin OprB | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COG1945 | Arginine decarboxylase | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0Y3HQ | Membrane attack complex (MAC) perforin | 1 1 1 1 1 1 2 1 2 1 2
0ZUEX | Adherence factor/cytotoxin | 4 3 2 2 2 1 1 1 1
PF05475 | Pgp3 C-terminal domain | 1 1 1 1 1 1 1 1

Vitamin Biosynthesis (Folate)
COG1478 | Alternate folylglutamate synthase FolC2 | 1 1 1 1 1 1 1 1 1 1 1 1 1
0ZGA4 | Dihydroneopterin aldolase FolB | 1 1 1 1 1 1 1 1 1 1 1 1

Metabolism
COG1218 | 3'(2'),5'bisphosphate nucleotidase | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0Z2QW | Adenosine AMP deaminase | 1 1 1 1 1 1 1
COG0352 | Thiamine monophosphate synthase | 1 1 1 1 1

Gene Expression
PF07382 | Histone H1-like nucleoprotein HC2 | 1 1 2 1 1 1 1 1 1 1 1 1 1 1
PF17455 | Late transcription unit B protein | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PF17446 | Late transcription unit A protein | 1 1 1 1 1 1 1 1 1 1 1

Unknown Function
11VHY | Conserved hypothetical protein | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0Y23D | DUF5398 - Domain of unknown function | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PF06587 | DUF1137 - Domain of unknown function | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PF16802 | DUF5070 - Domain of unknown function | 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PF07577 | DUF1547 - Domain of unknown function | 1 1 1 1 1 1 1 1 1 1 1
PF07146 | DUF1389 - Domain of unknown function | 2 3 4 5 4 3 3 3 7 3 7
PF07560; PF07579 | DUF1539 and DUF1548 - Domains of unknown function | 2 1 1 1 1 1 1 3 3 3 3

Supplementary Figure 6. Conserved gene content restricted to the Chlamydiaceae family. Presence (dark grey), absence (light grey), and number of genes assigned to NOGs or with PF domains found uniquely within Chlamydiaceae lineages among Chlamydiae, and which is conserved across the family (in a third of representative genomes).


[Figure panel a: presence/absence matrix. Legend: NOG or PF present; NOG or PF absent. Only the gene counts of cells where the NOG or PF domain is present were recovered, listed per row in column order.]

Columns: Chlamydiae bacterium K940_chlam_9, Chlamydiae bacterium K1000_chlam_4, Chlamydiae bacterium KR126_chlam_2, Chlamydia trachomatis, Chlamydia muridarum, Chlamydia suis, Chlamydophila pecorum, Chlamydia sp. 2742-308, Ca. Chlamydia corallus, Chlamydophila pneumoniae, Chlamydia ibidis, Chlamydia avium, Chlamydia gallinacea, Chlamydia felis, Chlamydophila caviae, Chlamydia abortus, Chlamydia psittaci

NOG | PF Domain | Description | Counts
-- | PF04518 | Effector from type III secretion system | 1 5 5 5 4 4 4 4 4 4 4 4 4 4 4
-- | PF05302 | Domain of unknown function (DUF720) | 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
10UQK | PF07079 | Domain of unknown function (DUF1347) | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
COG0400 | PF02230 | Phospholipase/Carboxylesterase | 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
-- | PF17458 | Domain of unknown function (DUF5421) | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
-- | PF17459 | Domain of unknown function (DUF5422) | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
-- | PF17461 | Domain of unknown function (DUF5423) | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

[Figure panels b and c: phylogenetic trees and genomic organization of the PF04518 (b, gene copies 1 to 5) and PF05302 (c, gene copies 1 to 3) gene families in Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae. Legend: BV ≥ 90; BV ≥ 70. Scale bars: 1 substitution per site.]

Supplementary Figure 7. Evolutionary insights into gene content shared between Chlamydiaceae and CC-IV. a, Presence (dark grey), absence (light grey), and number of genes assigned to NOGs or with PF domains found conserved uniquely in CC-IV and Chlamydiaceae lineages among Chlamydiae. Phylogenetic tree and typical genomic organization of gene families containing PF domains b, PF04518 and c, PF05302. Phylogenies were inferred with IQ-TREE under the PMSF approximation of LG+C20+G+F (PF04518: 395 sites, PF05302: 126 sites). Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively.


[Figure panels: a, synteny plot of flagellar genes in Waddliaceae bacterium SP13, Chlamydiales bacterium SCGC AG-110-P3, Chlamydiae bacterium K1000_chlam_4, Chlamydiae bacterium KR126_chlam_2, Chlamydiae bacterium K940_chlam_9, Chlamydiales bacterium SCGC AG-110-M15 and Chlamydiales bacterium SCGC AB-751-O23; gene legend: sctJ, sctN, sctU, sctV, sctR, sctS, sctT, flgB, flgC, fliE, sctQ; scale bar 50 kb. b, concatenated SctJNRSTUV phylogeny. c, SctJ phylogeny. d, SctR phylogeny. NF-T3SS and flagellum clades are labelled in the trees; scale bars 0.5 substitutions per site.]

Supplementary Figure 8. Synteny of flagellar components found in chlamydial lineages and phylogenetic analyses of homologous NF-T3SS and flagellar components. a, Synteny of flagellar genes in Chlamydiae, following the gene nomenclature by Abby et al.713030. All genomes with at least one homolog of flagellar genes detected by MacSyFinder are included, except for those genomes with flagellar homologs of sctN and sctV, which have been reported to be co-opted by the NF-T3SS machinery (see Supplementary Discussion). Genes (arrows) are colored according to the legend next to the synteny plot. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001. Phylogenies of b, a concatenated dataset of the SctJNRSTUV proteins (PMSF approximation of LG+F+C50+R4, 626 sequences, 1635 sites), c, the SctJ protein (LG+F+C40+R4, 630 sequences, 126 sites) and d, the SctR protein (LG+F+C40+R4, 651 sequences, 171 sites). All phylogenies were reconstructed with IQ-TREE and were rooted with the respective paralogues. Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively.
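The comparison lines in the synteny plots are described as best reciprocal BLASTP hits with an e-value of at most 0.001. A minimal Python sketch of that selection is given below. The e-value cutoff is taken from the caption; ranking hits by bit score, the tuple layout, the function names and the toy hits are assumptions made for illustration and are not taken from the study.

```python
E_VALUE_CUTOFF = 1e-3  # "e-value less than or equal to 0.001" (from the caption)


def best_hits(hits):
    """Map each query to its best subject among hits passing the cutoff.

    `hits` is an iterable of (query, subject, evalue, bitscore) tuples.
    Ranking by highest bit score is an assumption of this sketch.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > E_VALUE_CUTOFF:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy hits for illustration only (hypothetical protein identifiers).
genome_a_vs_b = [
    ("a1", "b1", 1e-50, 200.0),
    ("a1", "b2", 1e-10, 60.0),
    ("a2", "b2", 1e-20, 90.0),
    ("a3", "b3", 0.5, 20.0),   # fails the e-value cutoff
]
genome_b_vs_a = [
    ("b1", "a1", 1e-48, 195.0),
    ("b2", "a1", 1e-9, 58.0),
    ("b2", "a2", 1e-21, 92.0),
    ("b3", "a3", 1e-5, 40.0),
]
pairs = reciprocal_best_hits(genome_a_vs_b, genome_b_vs_a)
```

Here `a3` and `b3` are not paired because the forward hit exceeds the e-value cutoff, even though the reverse hit passes it; a pair is only drawn when both directions agree.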


[Figure: synteny plot of NF-T3SS genes across chlamydial genomes, grouped as environmental chlamydiae, Chlamydiaceae, CC-IV, Anoxychlamydiales, CC-III, CC-II and CC-I. Gene legend: sctN, sctQ, sctC, sctU, sctV, sctJ, sctR, sctS, sctT. Scale bar: 20 kb.]

Supplementary Figure 9. Conserved synteny of NF-T3SS components across Chlamydiae. Synteny plot including genomes with at least one homolog of NF-T3SS genes detected by MacSyFinder. Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.


[Figure: synteny plot of T2SS genes across chlamydial genomes, grouped as environmental chlamydiae, Chlamydiaceae, CC-IV, Anoxychlamydiales, CC-III, CC-II and CC-I. Gene legend: gspD, gspE, gspF, gspG, gspH, gspI, gspJ, gspL, pilAE, pilB, pilC, pilM, pilQ, tadZ. Scale bar: 10 kb.]

Supplementary Figure 10. Conserved synteny of T2SS components across Chlamydiae. Synteny plot including Chlamydiae genomes with at least one T2SS detected by MacSyFinder (Supplementary Data 3). Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.


[Figure: synteny plot of T4SS genes. Environmental chlamydiae: Parachlamydia sp. C2, Parachlamydia sp. BC.030, Candidatus Protochlamydia amoebophila UWE25, Candidatus Protochlamydia naegleriophila KNiC, Candidatus Protochlamydia sp. R18, Candidatus Rubidus massiliensis, Waddliaceae bacterium SP13. CC-I: Chlamydiae bacterium K1060chlam2, Simkania negevensis Z. CC-II: Chlamydiae bacterium K1000chlam3. Gene legend: tra genes, trbC, MOBQ, t4cp1, t4cp2, virb4. Scale bar: 10 kb.]

Supplementary Figure 11. Conserved synteny of T4SS components in Chlamydiae. Synteny plot including Chlamydiae genomes with at least one T4SS detected by MacSyFinder (Supplementary Data 3). Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.


[Figure panels a, b and c: phylogenetic trees of nucleotide transporters with chlamydial NTT clusters numbered 1 to 12 and the clade affiliation within each cluster shown as circles (CC-IV, Anoxy, CC-I, CC-II, CC-III, Chl, Env). Functionally characterized NTTs labelled in the trees: PamNTT5 (GTP, ATP/H+ symporter); CtNTT2 (NTP/H+ symporter); SnNTT2 (GTP, ATP/H+ symporter); PamNTT3 (UTP/H+ symporter); PamNTT2 (NTP counter exchange transporter); SnNTT3 (NTP, dCTP counter exchange transporter); CtNTT1/PamNTT1/SnNTT1 (ATP/ADP antiporter, NAD+/ADP antiporter); SnNTT4 (unknown substrate). Lineage legend: Chlamydiae; Diatoms/Haptophytes/Brown algae; Alphaproteobacteria; Deltaproteobacteria; Cyanobacteria; Archaeplastida/Green algae/Red algae; Microsporidia; Ca. Dependentiae; Bacteroides. Scale bars: 0.5 substitutions per site.]

Supplementary Figure 12. Phylogenetic inference of nucleotide transporters. Phylogenetic trees of nucleotide transporter (NTT) proteins found in both prokaryotes and eukaryotes. Chlamydiae are shown in orange with clade affiliation of the chlamydiae within each chlamydial NTT cluster indicated by the coloured circles. Functionally characterized NTTs from each cluster are indicated. Species and clade name abbreviations: environmental chlamydiae (Env), Chlamydiaceae (Chl), Anoxychlamydiales (Anoxy), Chlamydia trachomatis (Ct), Ca. Protochlamydia acanthamoebae (Pam), Simkania negevensis (Sn). See legend for color scheme of additional lineages. a, ML phylogeny inferred using IQ-TREE with the LG+F+R8 model of “canonical NTTs” (400 taxa, 357 sites). b, ML phylogeny of “other NTTs” (that form a sister clade to the “canonical NTTs”), inferred using IQ-TREE with the LG+F+R8 model of evolution (302 taxa, 348 sites). c, ML phylogeny of “NTT-HEAT NTTs”, inferred using IQ-TREE with the LG+F+R6 model of evolution (157 taxa, 329 sites).


Supplementary Figure 13. Selection of representative PVC genomes. ML phylogeny, inferred from an alignment of 438 taxa and 2301 sites of concatenated orthologous ribosomal proteins from ribocontigs using RAxML under the PROTCATLG model of evolution. Branch support was estimated with 100 rapid bootstrap replicates. PVC phyla are coloured: Planctomycetes in pink, Candidatus Omnitrophica in orange, Verrucomicrobia in blue, Lentisphaerae in green and Chlamydiae in purple. Representative bacterial lineages are in black and the archaeal outgroup in grey. Branches leading to clades from which to select a representative and selected representatives (Supplementary Table 3 and 6) are in red.


[Heatmap: PVC genomes against marker gene NOGs; colour key: copy number, 0 to 10.]

Supplementary Figure 14. Heatmap of copy-number for potential marker gene NOGs. Presence, absence and copy number of single-copy marker gene NOGs from complete and near complete PVC genomes, used for reconstructing species phylogenies.


[Scatter plot: discordance score (y-axis) against ranked NOGs (x-axis).]

Supplementary Figure 15. Discordance filtering of single-copy marker genes. Discordance scores across single-copy marker protein NOGs. The 20% most discordant markers are left of the red dotted line.
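The 20% cut-off described for Supplementary Figure 15 amounts to ranking markers by discordance score and setting aside the top fifth. The sketch below shows one way to express that split. It assumes one score per NOG is already available in a dictionary; the function name, the tie-breaking by NOG identifier and the rounding down of the cut-off are illustrative assumptions, not details given in the legend.

```python
import math
from typing import Dict, List, Tuple


def split_by_discordance(
    scores: Dict[str, float], fraction: float = 0.20
) -> Tuple[List[str], List[str]]:
    """Rank NOGs from most to least discordant and split off the top fraction.

    Returns (dropped, kept). Ties are broken by NOG identifier so that the
    ranking is deterministic (an assumption made for this sketch).
    """
    ranked = sorted(scores, key=lambda nog: (-scores[nog], nog))
    n_drop = math.floor(len(ranked) * fraction)  # rounding down is assumed
    return ranked[:n_drop], ranked[n_drop:]
```

With five markers and the default fraction, exactly one marker (the one with the highest score) would be set aside.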


[Panels a, b and c. Legend: ufBV ≥ 95. Scale bars: 0.1 substitutions per site.]

Supplementary Figure 16. ML concatenated species phylogenetic trees of Chlamydiae. ML phylogenetic trees inferred using IQ-TREE under the LG+C60+R4+F model of evolution with (a) 98 (28,286 sites), (b) 55 (14,212 sites), and (c) 38 (7,894 sites) concatenated single-copy marker genes. Datasets of 55 and 38 single-copy marker genes are subsets of the 98 based on best representation among genomes. Phylogenies include an extensive outgroup with representatives from across the PVC phyla: Kiritimatiellaeota, Lentisphaerae, Verrucomicrobia, Candidatus Omnitrophica and Planctomycetes. Ultrafast bootstrap (ufBV) support is indicated at branches following the legend.


[Figure: χ² test statistic score (y-axis, 0 to 300) against percentage of alignment χ²-pruned (x-axis, 0 to 95%), plotted for the PVC outgroup and Chlamydiae.]

Supplementary Figure 17. Step-wise removal of the most compositionally heterogeneous sites from a concatenated alignment of single-copy marker proteins, based on χ² statistics. The plot shows the χ² test statistic across Chlamydiae and outgroup PVC taxa after pruning 5% to 95% of the sites with the highest compositional heterogeneity.
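The pruning procedure summarized in Supplementary Figure 17 can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes a simple χ² statistic that compares each taxon's amino-acid counts with the counts expected from the pooled alignment composition, scores every column by how much the statistic drops when that column is left out, and removes the highest-scoring fraction. The function names `chi2_statistic` and `prune_heterogeneous_sites` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def chi2_statistic(seqs):
    """Sum over taxa and residues of (observed - expected)^2 / expected.

    Expected counts assume every taxon shares the pooled composition.
    Gaps and ambiguous characters are ignored.
    """
    counts = [Counter(c for c in s if c in AMINO_ACIDS) for s in seqs]
    pooled = Counter()
    for c in counts:
        pooled.update(c)
    grand_total = sum(pooled.values())
    if grand_total == 0:
        return 0.0
    stat = 0.0
    for c in counts:
        n_taxon = sum(c.values())
        for aa, pooled_count in pooled.items():
            expected = n_taxon * pooled_count / grand_total
            if expected > 0:
                stat += (c[aa] - expected) ** 2 / expected
    return stat


def prune_heterogeneous_sites(seqs, fraction):
    """Remove the given fraction of columns that contribute most to chi2.

    Assumption of this sketch: a column's contribution is the drop in the
    statistic when that single column is removed from the full alignment.
    """
    length = len(seqs[0])
    base = chi2_statistic(seqs)
    scores = []
    for i in range(length):
        reduced = [s[:i] + s[i + 1:] for s in seqs]
        scores.append((base - chi2_statistic(reduced), i))
    n_drop = int(length * fraction)
    drop = {i for _, i in sorted(scores, reverse=True)[:n_drop]}
    return ["".join(ch for i, ch in enumerate(s) if i not in drop) for s in seqs]
```

Calling `prune_heterogeneous_sites(alignment, 0.05)`, then `0.10`, and so on up to `0.95`, and recomputing `chi2_statistic` on each result, would produce a curve of the kind shown in the figure. This brute-force scoring is quadratic in alignment length and is only meant for small examples.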


Supplementary Tables

Supplementary Table 1. Loki's Castle sediment sample information and summary of amplicon sequencing results.

Sample ID | Sediment Core | Depth (mbsf) | Chlamydiae Relative Abundance (%) | Number of Chlamydiae OTUs | Chlamydiae OTUs Over 0.1% Relative Abundance | Chlamydiae OTUs Over 1% Relative Abundance
GS10_PC15_10 | GS10_PC15 | 11.58 | –a | – | – | –
GS10_PC15_40 | GS10_PC15 | 11.23 | – | – | – | –
GS10_PC15_70 | GS10_PC15 | 10.93 | – | – | – | –
GS10_PC15_100 | GS10_PC15 | 10.63 | – | – | – | –
GS10_PC15_130 | GS10_PC15 | 10.33 | – | – | – | –
GS10_PC15_160 | GS10_PC15 | 10.03 | – | – | – | –
GS10_PC15_190 | GS10_PC15 | 9.73 | – | – | – | –
GS10_PC15_220 | GS10_PC15 | 9.43 | 0 | 0 | 0 | 0
GS10_PC15_250 | GS10_PC15 | 9.13 | 0.068 | 1 | 0 | 0
GS10_PC15_280 | GS10_PC15 | 8.83 | – | – | – | –
GS10_PC15_310 | GS10_PC15 | 8.53 | – | – | – | –
GS10_PC15_340 | GS10_PC15 | 8.23 | – | – | – | –
GS10_PC15_370 | GS10_PC15 | 7.93 | – | – | – | –
GS10_PC15_400 | GS10_PC15 | 7.63 | – | – | – | –
GS10_PC15_430 | GS10_PC15 | 7.33 | – | – | – | –
GS10_PC15_460 | GS10_PC15 | 7.03 | – | – | – | –
GS10_PC15_490 | GS10_PC15 | 6.73 | – | – | – | –
GS10_PC15_520 | GS10_PC15 | 6.43 | 0.177 | 1 | 1 | 0
GS10_PC15_550 | GS10_PC15 | 6.13 | 0 | 0 | 0 | 0
GS10_PC15_580 | GS10_PC15 | 5.83 | – | – | – | –
GS10_PC15_610 | GS10_PC15 | 5.53 | 0 | 0 | 0 | 0
GS10_PC15_640 | GS10_PC15 | 5.23 | – | – | – | –
GS10_PC15_670 | GS10_PC15 | 4.93 | – | – | – | –
GS10_PC15_700 | GS10_PC15 | 4.36 | – | – | – | –
GS10_PC15_730 | GS10_PC15 | 4.33 | – | – | – | –
GS10_PC15_760 | GS10_PC15 | 4.03 | – | – | – | –
GS10_PC15_790 | GS10_PC15 | 3.73 | – | – | – | –
GS10_PC15_820 | GS10_PC15 | 3.43 | – | – | – | –
GS10_PC15_850 | GS10_PC15 | 3.13 | – | – | – | –
GS10_PC15_880 | GS10_PC15 | 2.83 | 3.808 | 14 | 6 | 1
GS10_PC15_910 | GS10_PC15 | 2.53 | – | – | – | –
GS10_PC15_940 | GS10_PC15 | 2.23 | 11.148 | 26 | 10 | 2
GS10_PC15_1000 | GS10_PC15 | 1.63 | 5.666 | 82 | 16 | 0
GS10_PC15_1060 | GS10_PC15 | 1.03 | 12.43 | 163 | 29 | 1
GS10_PC15_1090 | GS10_PC15 | 0.73 | – | – | – | –
GS10_PC15_1120 | GS10_PC15 | 0.43 | 1.239 | 25 | 2 | 0
GS10_GC14_5 | GS10_GC14 | 0.05 | – | – | – | –
GS10_GC14_10 | GS10_GC14 | 0.10 | – | – | – | –
GS10_GC14_40 | GS10_GC14 | 0.40 | – | – | – | –
GS10_GC14_75 | GS10_GC14 | 0.75 | 8.929 | 10 | 4 | 1
GS10_GC14_100 | GS10_GC14 | 1.00 | – | – | – | –
GS10_GC14_115 | GS10_GC14 | 1.15 | – | – | – | –
GS10_GC14_130 | GS10_GC14 | 1.30 | – | – | – | –
GS10_GC14_150 | GS10_GC14 | 1.50 | – | – | – | –
GS10_GC14_176 | GS10_GC14 | 1.76 | – | – | – | –
GS10_GC14_180 | GS10_GC14 | 1.80 | – | – | – | –
GS10_GC14_200 | GS10_GC14 | 2.00 | – | – | – | –
GS14_GC12_10 | GS14_GC12 | 0.10 | 1.024 | 25 | 2 | 0
GS14_GC12_20 | GS14_GC12 | 0.20 | 1.028 | 19 | 1 | 0
GS14_GC12_30 | GS14_GC12 | 0.30 | 1.079 | 94 | 2 | 0
GS14_GC12_40 | GS14_GC12 | 0.40 | 0.467 | 25 | 1 | 0
GS14_GC12_50 | GS14_GC12 | 0.50 | 1.107 | 17 | 3 | 0
GS14_GC12_75 | GS14_GC12 | 0.75 | 2.101 | 45 | 3 | 0
GS14_GC12_100 | GS14_GC12 | 1.00 | 1.648 | 25 | 6 | 0
GS14_GC12_130 | GS14_GC12 | 1.30 | 1.393 | 6 | 3 | 1
GS14_GC12_160 | GS14_GC12 | 1.60 | 3.829 | 4 | 1 | 1
GS14_GC12_175 | GS14_GC12 | 1.75 | 0.848 | 5 | 2 | 0
GS14_GC12_190 | GS14_GC12 | 1.90 | 0.098 | 2 | 0 | 0
GS14_GC12_220 | GS14_GC12 | 2.20 | 0 | 0 | 0 | 0
GS14_GC12_250 | GS14_GC12 | 2.50 | 0 | 0 | 0 | 0
GS14_GC12_280 | GS14_GC12 | 2.80 | 0 | 0 | 0 | 0
GS14_GC12_310 | GS14_GC12 | 3.10 | 0 | 0 | 0 | 0
GS14_GC12_340 | GS14_GC12 | 3.40 | 0.218 | 6 | 0 | 0
GS14_GC12_357 | GS14_GC12 | 3.57 | 0 | 0 | 0 | 0
GS14_GC12_360 | GS14_GC12 | 3.60 | 0.027 | 5 | 0 | 0
GS08_GC12_38 | GS08_GC12 | 0.38 | – | – | – | –
GS08_GC12_80 | GS08_GC12 | 0.80 | – | – | – | –
GS08_GC12_126 | GS08_GC12 | 1.26 | 43.063 | 37 | 8 | 2
GS08_GC12_310 | GS08_GC12 | 3.10 | – | – | – | –

a Not applicable (–): PCR screened, but amplicon sequencing not performed for the sample.
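The four summary columns of Supplementary Table 1 could be derived from a per-sample OTU count table along the following lines. This is a minimal sketch and not the pipeline used in this study. It assumes that relative abundance is computed against all reads in the sample, and that "over" means strictly greater than the threshold. The function name, dictionary keys and input layout are hypothetical.

```python
def summarize_chlamydiae_otus(otu_counts, chlamydiae_otus):
    """Summarize chlamydial OTUs in one sample.

    otu_counts: dict mapping OTU id -> read count for every OTU in the sample.
    chlamydiae_otus: iterable of OTU ids classified as Chlamydiae.
    """
    total_reads = sum(otu_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # Relative abundance (%) of each chlamydial OTU that was observed.
    rel = {
        otu: 100.0 * otu_counts.get(otu, 0) / total_reads
        for otu in chlamydiae_otus
        if otu_counts.get(otu, 0) > 0
    }
    return {
        "relative_abundance": sum(rel.values()),
        "otus": len(rel),
        "otus_over_0_1": sum(r > 0.1 for r in rel.values()),
        "otus_over_1": sum(r > 1.0 for r in rel.values()),
    }
```

Applied to each sequenced sample in turn, the returned dictionary corresponds to one row of the table.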


Supplementary Table 2. Characteristics of marine sediment chlamydiae MAGs.

Genome | Chlamydiae Clade Affiliation | Metagenome Sample | Completeness (%)a | Redundancya | GC (%) | Number of Contigs | Bin Size (Mbp) | Estimated Genome Size (Mbp)b | Median Intergenic Space (bp) | iRepc | N50 | RP15 Contig | 16S rRNA Gene
Chlamydiae bacterium K940_chlam_8 | Unresolved | GS10_PC15_940 | 98 | 1 | 37.48 | 89 | 1.4 | 1.43 | 38 | – | 26189 | contig-124_1042 | full (contig-124_2389)
Chlamydiae bacterium K1060_chlam_2 | CC-I | GS10_PC15_1060 | 97 | 1 | 46.89 | 143 | 1.63 | 1.68 | 28 | – | 16440 | contig-124_2150 | partial (contig-124_100491)
Chlamydiae bacterium K940_chlam_2 | CC-II | GS10_PC15_940 | 94 | 1.09 | 48.88 | 285 | 1.66 | 1.61 | 18 | – | 7020 | contig-124_2839 | none
Chlamydiae bacterium KR126_chlam_1 | CC-II | GS08_GC12_126 | 99 | 1.01 | 46.8 | 25 | 1.61 | 1.61 | 32 | 1.19 | 116054 | contig-100_216 | partial (contig-100_165)
Chlamydiae bacterium K1000_chlam_2 | CC-II | GS10_PC15_1000 | 88 | 1.01 | 44.48 | 252 | 1.74 | 1.96 | 20 | – | 8520 | contig-124_9157 | partial (contig-124_16903)
Chlamydiae bacterium KR126_chlam_3 | CC-II | GS08_GC12_126 | 94 | 1 | 42.98 | 185 | 1.68 | 1.79 | 40 | 1.4 | 11258 | contig-100_1304 | partial (contig-100_50271)
Chlamydiae bacterium K940_chlam_6 | CC-II | GS10_PC15_940 | 91 | 1 | 42.93 | 432 | 1.51 | 1.66 | 24 | – | 4668 | contig-124_54975/contig-124_65424 | none
Chlamydiae bacterium K1000_chlam_3 | CC-II | GS10_PC15_1000 | 86 | 1 | 41.77 | 270 | 1.66 | 1.93 | 31 | 1.9 | 7870 | contig-124_1375 | none
Chlamydiae bacterium K1060_chlam_5 | CC-IV | GS10_PC15_1060 | 98 | 1.01 | 26.37 | 57 | 1.39 | 1.40 | 39 | – | 39842 | contig-124_32386 | none
Chlamydiae bacterium K940_chlam_1 | CC-IV | GS10_PC15_940 | 100 | 1 | 30.93 | 156 | 1.33 | 1.33 | 50 | 1.54 | 14062 | contig-124_2150 | none
Chlamydiae bacterium K1000_chlam_1 | CC-IV | GS10_PC15_1000 | 96 | 1 | 30.85 | 252 | 1.6 | 1.67 | 57 | 1.51 | 9399 | contig-124_1902 | partial (contig-124_139984)
Chlamydiae bacterium KR126_chlam_6 | CC-IV | GS08_GC12_126 | 94 | 1.02 | 29.65 | 162 | 1.53 | 1.60 | 50 | 1.36 | 12844 | contig-100_3930 | none
Chlamydiae bacterium K1060_chlam_1 | CC-IV | GS10_PC15_1060 | 96 | 1 | 29.32 | 120 | 1.59 | 1.66 | 68 | 1.48 | 18228 | contig-124_9382 | none
Chlamydiae bacterium KR126_chlam_4 | CC-IV | GS08_GC12_126 | 100 | 1.01 | 29.43 | 107 | 1.58 | 1.56 | 63 | 1.12 | 21756 | contig-100_2916 | none
Chlamydiae bacterium K940_chlam_4 | CC-IV | GS10_PC15_940 | 94 | 1 | 29.51 | 235 | 1.42 | 1.51 | 64 | 1.58 | 8098 | contig-124_7225 | none
Chlamydiae bacterium K940_chlam_5 | CC-IV | GS10_PC15_940 | 99 | 1.01 | 30.16 | 191 | 1.76 | 1.76 | 65 | 1.58 | 12088 | contig-124_5410 | none
Chlamydiae bacterium K1060_chlam_4 | CC-IV | GS10_PC15_1060 | 67 | 1.04 | 30.3 | 514 | 1.38 | 1.98 | 75 | – | 3239 | contig-124_114082 | none
Chlamydiae bacterium KR126_chlam_5 | CC-IV | GS08_GC12_126 | 94 | 1.01 | 30.25 | 264 | 1.57 | 1.65 | 54 | 1.19 | 7741 | contig-100_14168 | none
Chlamydiae bacterium K1060_chlam_3 | CC-IV | GS10_PC15_1060 | 71 | 1.03 | 30.34 | 167 | 0.98 | 1.34 | 60.5 | – | 8308 | contig-124_29952 | none
Chlamydiae bacterium K940_chlam_3 | Environmental chlamydiae | GS10_PC15_940 | 96 | 1 | 41.7 | 221 | 1.91 | 1.99 | 50 | 1.73 | 10099 | contig-124_6068 | none
Chlamydiae bacterium K940_chlam_7 | Environmental chlamydiae | GS10_PC15_940 | 96 | 1 | 43.37 | 408 | 2.49 | 2.59 | 53 | 1.4 | 7595 | contig-124_4023 | none
Chlamydiae bacterium K940_chlam_9 | CC-V | GS10_PC15_940 | 99 | 1 | 47.21 | 240 | 2.07 | 2.09 | 35.5 | 1.79 | 16710 | contig-124_6236 | partial (contig-124_2865)
Chlamydiae bacterium KR126_chlam_2 | CC-V | GS08_GC12_126 | 93 | 1.04 | 47.37 | 141 | 1.37 | 1.41 | 24 | 1.4 | 12860 | contig-100_6141/contig-100_8385 | none
Chlamydiae bacterium K1000_chlam_4 | CC-V | GS10_PC15_1000 | 71 | 1.04 | 46.82 | 229 | 0.94 | 1.27 | 38 | – | 4244 | contig-124_70302 | none

a Estimated using micomplete (see Methods) with the Chlamydiae-specific single-copy gene set (Supplementary Table 7).
b Based on estimated completeness, corrected by the estimated proportion of genome redundancy.
c Not applicable (–): genome did not meet the coverage (5X) and completeness (70%) requirements for inferring replication rate (iRep) (Brown et al., 2016).
d Percentage of reads in respective metagenome mapped to contigs in each genome (out of all reads mapped to contigs in the complete metagenome assembly).
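Footnote b describes the estimated genome size as the bin size corrected for completeness and redundancy. The sketch below assumes the simplest reading of that footnote, namely dividing the bin size by the completeness fraction and by the redundancy. This formula is an inference from the tabulated values rather than a stated method, and the function name is hypothetical.

```python
def estimated_genome_size(bin_size_mbp, completeness_pct, redundancy):
    """Scale a bin size (Mbp) to an estimated full genome size (Mbp).

    Assumed formula: bin size / (completeness / 100) / redundancy.
    """
    if completeness_pct <= 0 or redundancy <= 0:
        raise ValueError("completeness and redundancy must be positive")
    return bin_size_mbp / (completeness_pct / 100.0) / redundancy
```

For example, the values listed for K1060_chlam_4 (bin size 1.38 Mbp, completeness 67%, redundancy 1.04) give about 1.98 Mbp under this assumption, in line with the tabulated estimate. Because the table reports rounded inputs, small deviations in the last digit are possible for other rows.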


Supplementary Table 3. Characteristics of Chlamydiae reference genomes.

Organism Name | Chlamydiae Clade Affiliation | GenBank Accession | Assembly Level | Genome Source | Number of Contigs | Completeness (%)b | Redundancyb | GC (%) | Genome/Bin Size (Mbp) | Estimated Genome Size (Mbp) | Number of ORFsc | Median Intergenic Space (bp) | 16S rRNA Gene

Chlamydiae Species Representatives (Available Prior to February 2017)
Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 | Unresolved | GCA_001794905.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 134 | 91 | 1 | 48.45 | 1.26 | 1.38 | 1065 | 22.5 | no
Simkania negevensis Z | CC-I | GCA_000237205.1 | Complete Genome | co-culture | – | 100 | 1 | 41.60 | 2.63 | – | 2518 | 36 | yes
Chlamydiae bacterium RIFCSPLOWO2_02_FULL_49_12 | CC-II | GCA_001796275.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 156 | 95 | 1 | 48.99 | 1.41 | 1.49 | 1175 | 34 | yes
Chlamydiae bacterium Ga0074140 | CC-II | GCA_001464115.1 | Contig | water treatment plant metagenome (ref. 88) | 6 | 99 | 1 | 47.82 | 1.72 | 1.74 | 1648 | 35 | yes
Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_9 | CC-III | GCA_001794935.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 211 | 65 | 1 | 48.59 | 1.32 | 2.02 | 1166 | 27 | no
Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 | CC-III | GCA_001796255.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 31 | 99 | 1 | 44.70 | 1.58 | 1.59 | 1475 | 23.5 | yes
Chlamydiae bacterium SM23_39 | CC-IV | GCA_001303765.1 | Contig | aquifer groundwater metagenome (ref. 96) | 67 | 93 | 1 | 26.23 | 1.13 | 1.21 | 986 | 35 | yes
Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_27_8 | CC-IV | GCA_001796155.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 222 | 71 | 1 | 27.43 | 0.97 | 1.38 | 817 | 29 | no
Criblamydia sequanensis CRIB-18 | Environmental chlamydiae | GCA_000750955.1 | Contig | co-culture | 23 | 99 | 1 | 38.24 | 2.97 | 3.00 | 2418 | 64 | yes
Estrella lausannensis CRIB-30 | Environmental chlamydiae | GCA_900000175.1 | Scaffold | co-culture | 34 | 99 | 1 | 48.22 | 2.83 | 2.85 | 2217 | 92 | yes
Waddlia chondrophila WSU 86-1044 | Environmental chlamydiae | GCA_000092785.1 | Complete Genome | co-culture | – | 100 | 1 | 43.74 | 2.13 | – | 1956 | 22 | yes
Parachlamydia acanthamoebae UV-7 | Environmental chlamydiae | GCA_000253035.1 | Complete Genome | co-culture | – | 99 | 1 | 39.04 | 3.07 | – | 2788 | 56 | yes
Ca. Rubidus massiliensis (ex. Chlamydia sp. Rubis) | Environmental chlamydiae | GCA_000756735.1 | Contig | co-culture | 5 | 100 | 1 | 32.64 | 2.82 | 2.82 | 2446 | 53 | yes
Chlamydiales bacterium 38-26 | Environmental chlamydiae | GCA_001897225.1 | Scaffold | thiocyanate bioreactor metagenome (ref. 87) | 10 | 100 | 1 | 38.12 | 2.83 | 2.83 | 2327 | 88 | no
Neochlamydia sp. EPS4 | Environmental chlamydiae | GCA_000813665.1 | Contig | co-culture | 112 | 99 | 1 | 38.09 | 2.53 | 2.55 | 2173 | 142 | yes
Parachlamydiaceae bacterium HS-T3 | Environmental chlamydiae | GCA_000829755.1 | Contig | co-culture | 34 | 99 | 1 | 38.71 | 2.31 | 2.33 | 2025 | 44.5 | yes
Parachlamydia sp. C2 (ex. Protochlamydia greubae) | Environmental chlamydiae | GCA_001545115.1 | Scaffold | co-culture | 33 | 100 | 1 | 42.05 | 3.42 | 3.42 | 2766 | 117 | yes
Ca. Protochlamydia amoebophila UWE25 | Environmental chlamydiae | GCA_000011565.1 | Chromosome | co-culture | 2 | 100 | 1 | 34.72 | 2.41 | 2.41 | 2031 | 120 | yes
Ca. Protochlamydia naegleriophila KNic | Environmental chlamydiae | GCA_001499655.1 | Complete Genome | co-culture | –a | 100 | 1 | 42.44 | 3.03 | – | 2575 | 113 | yes
Chlamydia trachomatis D/UW-3/CX | Chlamydiaceae | GCA_000008725.1 | Complete Genome | co-culture | – | 100 | 1 | 41.31 | 1.04 | – | 894 | 53 | yes
Chlamydia muridarum str. Nigg | Chlamydiaceae | GCA_000006685.1 | Complete Genome | co-culture | – | 100 | 1 | 40.31 | 1.08 | – | 911 | 45.5 | yes
Chlamydia suis MD56 | Chlamydiaceae | GCA_000493885.1 | Scaffold | co-culture | 47 | 100 | 1 | 42.01 | 1.08 | 1.08 | 931 | 51 | yes
Chlamydophila pecorum E58 | Chlamydiaceae | GCA_000204135.1 | Complete Genome | co-culture | – | 100 | 1 | 41.08 | 1.11 | – | 988 | 28 | yes
Chlamydia sp. 2742-308 | Chlamydiaceae | GCA_001653975.1 | Chromosome | co-culture | 2 | 100 | 1 | 38.50 | 1.12 | 1.12 | 1004 | 44 | yes
Chlamydophila pneumoniae CWL029 | Chlamydiaceae | GCA_000008745.1 | Complete Genome | co-culture | – | 100 | 1 | 40.58 | 1.23 | – | 1052 | 55 | yes
Chlamydia ibidis 10-1398/6 | Chlamydiaceae | GCA_000454725.1 | Contig | co-culture | 4 | 100 | 1 | 38.32 | 1.15 | 1.15 | 1018 | 50 | yes
Chlamydia avium 10DC88 | Chlamydiaceae | GCA_000583875.1 | Complete Genome | co-culture | – | 100 | 1 | 36.88 | 1.05 | – | 947 | 31 | yes
Chlamydia gallinacea 08-1274/3 | Chlamydiaceae | GCA_000471025.2 | Complete Genome | co-culture | – | 99 | 1 | 37.90 | 1.07 | – | 900 | 40.5 | yes
Chlamydia felis Fe/C-56 | Chlamydiaceae | GCA_000009945.1 | Complete Genome | co-culture | – | 100 | 1 | 39.34 | 1.17 | – | 1013 | 39.5 | yes
Chlamydophila caviae GPIC | Chlamydiaceae | GCA_000007605.1 | Complete Genome | co-culture | – | 100 | 1 | 39.19 | 1.18 | – | 1005 | 48 | yes
Chlamydia abortus S26/3 | Chlamydiaceae | GCA_000026025.1 | Complete Genome | co-culture | – | 100 | 1 | 39.87 | 1.14 | – | 932 | 45 | yes
Chlamydia psittaci 6BC | Chlamydiaceae | GCA_000204255.1 | Complete Genome | co-culture | – | 100 | 1 | 39.02 | 1.18 | – | 1009 | 42 | yes

Other Chlamydiae Species Representatives (Released Between February 2017 and April 2018)
Ca. Similichlamydia epinephelii GCCT14 | Unresolved | GCA_003056015.1 | Scaffold | infected gill tissue metagenome (ref. 2) | 170 | 80 | 1.4 | 39.54 | 0.98 | 0.71 | 940 | 36 | yes
Rhabdochlamydia helvetica T3358 | CC-II | Pillonel et al., 2018 | Contig | tick metagenome (ref. 3) | 38 | 99 | 1 | 36.18 | 1.83 | 1.85 | 1692 | 43 | yes
Chlamydiae bacterium CG10_big_fil_rev_8_21_14_0_10_42_34 | CC-III | GCA_002773795.1 | Scaffold | cold CO2 driven geyser metagenome (ref. 95) | 34 | 97 | 1 | 42.36 | 1.68 | 1.73 | 1581 | 21.5 | no
Chlamydiae bacterium CG10_big_fil_rev_8_21_14_0_10_35_9 | Unresolved | GCA_002773835.1 | Scaffold | cold CO2 driven geyser metagenome (ref. 95) | 108 | 85 | 1.04 | 35.13 | 1.74 | 1.96 | 1720 | 24 | no
Chlamydiales bacterium SCGC AB-751-O23 | Unresolved | GCA_900093645.1 | Scaffold | single-cell from marine water (ref. 36) | 89 | 42 | 1 | 35.45 | 0.99 | 2.37 | 876 | 30.5 | yes
Waddliaceae bacterium SP13 | Unresolved | GCA_002709385.1 | Contig | marine water metagenome (ref. 37) | 49 | 97 | 1 | 38.48 | 3.15 | 3.24 | 2460 | 65 | yes
Chlamydiales bacterium SCGC AG-110-P3 | Environmental chlamydiae | GCA_900093655.1 | Scaffold | single-cell from marine water (ref. 36) | 102 | 50 | 1 | 46.83 | 1.30 | 2.58 | 1235 | 84 | yes
Chlamydiales bacterium SCGC AG-110-M15 | Unresolved | GCA_900093625.1 | Scaffold | single-cell from marine water (ref. 36) | 59 | 50 | 1 | 41.80 | 0.93 | 1.84 | 851 | 53 | no
Parachlamydia sp. BC.030 | Environmental chlamydiae | GCA_002786175.1 | Contig | urban drinking water system metagenome (ref. 89) | 39 | 100 | 1 | 41.53 | 3.04 | 3.04 | 2540 | 68 | no
Ca. Chlamydia corallus G3/2742-324 | Chlamydiaceae | GCA_002817655.1 | Contig | snake choana metagenome (ref. 97) | 7 | 100 | 1.01 | 39.28 | 1.20 | 1.19 | 996 | 48.5 | yes

Non-representative Chlamydiae Included in Select Analyses
Chlamydiae bacterium GWA2_50_15 | CC-II | GCA_001796065.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 56 | 94 | 1 | 49.34 | 1.18 | 1.26 | 993 | 29.5 | yes
Chlamydiae bacterium GWC2_50_10 | CC-II | GCA_001796095.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 135 | 83 | 1 | 48.94 | 1.17 | 1.41 | 966 | 31 | yes
Chlamydiae bacterium GWF2_49_8 | CC-II | GCA_001796105.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 280 | 71 | 1 | 49.23 | 1.02 | 1.44 | 760 | 26 | no
Chlamydiae bacterium RIFCSPHIGHO2_02_FULL_49_29 | CC-II | GCA_001796185.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 85 | 91 | 1 | 49.08 | 1.39 | 1.52 | 1187 | 31 | no
Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_32 | CC-II | GCA_001796175.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 94 | 89 | 1 | 48.91 | 1.40 | 1.56 | 1190 | 29 | yes
Chlamydiae bacterium RIFCSPLOWO2_12_FULL_49_12 | CC-II | GCA_001796315.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 70 | 90 | 1 | 49.16 | 1.42 | 1.57 | 1224 | 34.5 | yes
Chlamydiae bacterium RIFCSPHIGHO2_01_FULL_44_39 | CC-III | GCA_001794865.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 32 | 99 | 1 | 44.72 | 1.57 | 1.58 | 1466 | 23 | yes
Chlamydiae bacterium RIFCSPHIGHO2_02_FULL_45_9 | CC-III | GCA_001796125.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 222 | 80 | 1 | 44.66 | 1.34 | 1.69 | 1156 | 26 | no
Chlamydiae bacterium RIFCSPLOWO2_01_FULL_44_52 | CC-III | GCA_001796235.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 30 | 99 | 1 | 44.74 | 1.54 | 1.55 | 1438 | 24 | yes
Chlamydiae bacterium RIFCSPLOWO2_12_FULL_45_20 | CC-III | GCA_001796285.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 29 | 99 | 1 | 44.75 | 1.54 | 1.56 | 1443 | 23 | yes
Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_44_59 | CC-III | GCA_001794895.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 30 | 99 | 1 | 44.72 | 1.57 | 1.58 | 1470 | 23.5 | yes
Chlamydiae bacterium RIFCSPLOWO2_01_FULL_28_7 | CC-IV | GCA_001796195.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 193 | 62 | 1 | 28.07 | 0.71 | 1.14 | 572 | 31 | no
Chlamydia sp. 32-24 | Environmental chlamydiae | GCA_001897185.1 | Scaffold | thiocyanate bioreactor metagenome (ref. 87) | 100 | 99 | 1 | 32.42 | 2.53 | 2.55 | 2075 | 56 | no
Neochlamydia sp. S13 | Environmental chlamydiae | GCA_000648235.1 | Contig | co-culture | 1342 | 99 | 1.07 | 38.03 | 3.19 | 2.99 | 2232 | 161 | yes
Neochlamydia sp. TUME1 | Environmental chlamydiae | GCA_000813645.1 | Contig | co-culture | 254 | 100 | 1 | 38.02 | 2.55 | 2.55 | 2344 | 121 | yes
Parachlamydia acanthamoebae OEW1 | Environmental chlamydiae | GCA_000812225.1 | Contig | co-culture | 162 | 97 | 1 | 39.04 | 3.01 | 3.09 | 2755 | 53 | yes
Parachlamydia acanthamoebae Bn9 | Environmental chlamydiae | GCA_000875975.1 | Contig | co-culture | 72 | 99 | 1 | 38.94 | 3.00 | 3.03 | 2409 | 60 | yes
Parachlamydia acanthamoebae Hall's coccus | Environmental chlamydiae | GCA_000176075.1 | Contig | co-culture | 95 | 98 | 1 | 38.97 | 2.97 | 3.02 | 2809 | 54 | no
Ca. Protochlamydia amoebophila EI2 | Environmental chlamydiae | GCA_000813625.1 | Contig | co-culture | 178 | 96 | 1 | 34.82 | 2.40 | 2.51 | 2149 | 99 | yes
Ca. Protochlamydia massiliensis (ex. Chlamydia sp. 'Diamant') | Environmental chlamydiae | GCA_000751535.1 | Contig | co-culture | 5 | 100 | 1 | 42.75 | 2.96 | 2.96 | 2451 | 110 | yes
Ca. Protochlamydia sp. R18 str. S13 | Environmental chlamydiae | GCA_000648255.1 | Contig | co-culture | 795 | 100 | 1.02 | 34.74 | 2.72 | 2.67 | 2017 | 110 | yes
Ca. Protochlamydia sp. W-9 | Environmental chlamydiae | GCA_001950615.1 | Contig | co-culture | 402 | 100 | 1 | 34.43 | 2.48 | 2.48 | 1817 | 109 | yes

a Not applicable (–): complete genome.
b Estimated using micomplete (see Methods) with the Chlamydiae-specific single-copy gene set (Supplementary Table 7).
c Open Reading Frames (ORFs).


Supplementary Table 4. Number of 16S/18S rRNA gene fragments identified per phylum in marine sediment sample metagenomes.

Phylum | Domain | GS10_PC15_1060 | GS10_PC15_1000 | GS10_PC15_940 | GS08_GC12_126
Chloroplast | – | 0 | 0 | 1 | 0
Platyhelminthes | Eukaryota | 1 | 0 | 0 | 0
Chordata | Eukaryota | 0 | 1 | 2 | 0
Abeoformidae | Eukaryota | 0 | 0 | 2 | 0
| Eukaryota | 0 | 0 | 1 | 0
| Bacteria | 8 | 6 | 9 | 3
(SAR406 clade) | Bacteria | 4 | 3 | 3 | 3
| Bacteria | 3 | 2 | 3 | 2
| Bacteria | 3 | 2 | 1 | 1
phylum | Bacteria | 2 | 3 | 0 | 1
Ca. Latescibacteria (WS3) | Bacteria | 4 | 4 | 2 | 2
Cloacimonetes | Bacteria | 1 | 0 | 0 | 0
Ca. (RBG-1) | Bacteria | 5 | 3 | 0 | 5
GN01 | Bacteria | 0 | 2 | 0 | 0
Ignavibacteriae | Bacteria | 0 | 0 | 1 | 0
Proteobacteria | Bacteria | 51 | 22 | 30 | 14
Firmicutes | Bacteria | 0 | 1 | 1 | 0
Actinobacteria | Bacteria | 17 | 18 | 21 | 8
Chloroflexi | Bacteria | 63 | 50 | 22 | 92
- | Bacteria | 1 | 2 | 0 | 0
| Bacteria | 1 | 1 | 1 | 0
WS1 | Bacteria | 3 | 2 | 1 | 5
| Bacteria | 9 | 8 | 7 | 3
Chlamydiae | Bacteria | 16 | 17 | 12 | 6
Planctomycetes | Bacteria | 69 | 45 | 55 | 31
Lentisphaerae | Bacteria | 1 | 0 | 0 | 0
Ca. Omnitrophica (OP3) | Bacteria | 7 | 1 | 0 | 11
Epsilonbacteraeota | Bacteria | 3 | 0 | 3 | 0
Acidobacteria | Bacteria | 6 | 5 | 5 | 0
Ca. Acetothermia (OP1) | Bacteria | 0 | 4 | 0 | 1
| Bacteria | 0 | 0 | 0 | 1
| Bacteria | 2 | 1 | 0 | 1
Ca. Hydrogenedentes (NKB19) | Bacteria | 1 | 0 | 2 | 1
Ca. | Bacteria | 0 | 0 | 2 | 0
Ca. (TM7) | Bacteria | 1 | 1 | 0 | 0
Ca. Parcubacteria | Bacteria | 49 | 22 | 13 | 11
CPR2 | Bacteria | 2 | 0 | 0 | 0
Ca. (OP11) | Bacteria | 9 | 6 | 4 | 9
Ca. | Bacteria | 2 | 2 | 1 | 1
Ca. | Bacteria | 2 | 0 | 0 | 0
Ca. Peregrinibacteria | Bacteria | 4 | 0 | 0 | 1
Ca. (WWE3) | Bacteria | 1 | 0 | 1 | 0
Ca. Absconditabacteria (SR1) | Bacteria | 0 | 1 | 0 | 0
MD2896-B216 | Bacteria | 2 | 1 | 0 | 0
Unknown CPR phylum 1 | Bacteria | 1 | 1 | 0 | 0
Unknown CPR phylum 2 | Bacteria | 1 | 0 | 0 | 0
Unknown CPR phylum 3 | Bacteria | 1 | 0 | 0 | 0
| Bacteria | 4 | 1 | 0 | 3
phylum incertae sedis | Bacteria | 4 | 5 | 0 | 2
Ca. Aminicenantes (OP8) | Bacteria | 3 | 3 | 1 | 1
Ca. Dependentiae (TM6) | Bacteria | 42 | 17 | 18 | 6
Ca. Aerophobetes (CD12) | Bacteria | 6 | 6 | 1 | 3
NC10 | Bacteria | 1 | 0 | 0 | 1
| Bacteria | 2 | 1 | 5 | 0
BRC1 | Bacteria | 1 | 1 | 5 | 0
| | 1 | 1 | 0 | 4
Unknown Euryarchaeota (superphylum) phylum 1 | Archaea | 0 | 0 | 1 | 0
Thaumarchaeota | Archaea | 2 | 8 | 4 | 1
Ca. Bathyarchaeota | Archaea | 2 | 2 | 0 | 1
Group C3 | Archaea | 2 | 4 | 0 | 0
Ca. Woesearchaeota | Archaea | 20 | 11 | 30 | 8
Ca. Diapherotrites | Archaea | 3 | 0 | 0 | 0
Ca. Parvarchaeota | Archaea | 1 | 0 | 0 | 0
Unknown DPANN phylum 1 | Archaea | 1 | 0 | 0 | 0
Ca. Aenigmarchaeota | Archaea | 0 | 0 | 0 | 1
Ca. Altiarchaeota | Archaea | 0 | 0 | 0 | 1
Ca. Lokiarchaeota | Archaea | 5 | 10 | 1 | 3
TOTAL | ALL | 456 | 307 | 271 | 248


Supplementary Table 5. PCR primer pairs, taxonomic coverage and reaction conditions.

Primer Pair Taxonomic Target Taxonomic Coverage (No Mismatches)a Taxonomic Coverage (One Mismatch)a PCR Amplication Reaction Conditionsb Chla-310-a-20 · 0% of Eukaryota · 0% of Eukaryota (CGCCAACAYTGGGACTGAGA) · 0% of Archaea · 0% of Archaea · 15 min of polymerase heat activation at 95 °C and · 0% of Bacteria · 0% of Bacteria · 35 cycles of 94 °C (60 s), 60 °C (60 s) and 72 °C (60 S-*-Univ-1100-a-A-15 Chlamydiae · 99 % of characterized Chlamydiae · 96% of characterized Chlamydiae (0.1% s) · final 98 (4.5 % of Marinimicrobia and small (GGGTYKCGCTCGTTR) of Armatimonadetes, no additional bacterial extension at 72 °C (10 min) percentages (0.01-0.81 %) of additional phyla) ) 574*f · 15 min of polymerase heat activation at 95 °C (CGGTAAYTCCAGCTCYV)99 and · 88 % of Eukaryota · 94% of Eukaryota · 35 cycles of 94 °C (60 s), a step-down to 70 °C (1 1132 Eukarya · 0% of Archaea · 10% of Archaea s), followed by a ramping rate of 0.4 °C/s to 50 °C (60 (CCGTCAATTHCTTYAART)99 · 0% of Bacteria · 0% of Bacteria s), and a ramping rate of 0.8 °C/s to 72 °C (60 s) · final extension at 72 °C (10 min) S-D-0564-a-S-15 · 15 min of polymerase heat activation at 95 °C · 0% of Eukaryota · 0% of Eukaryota 98 · 28 cycles of 94 °C (60 s), a step-down to 70 °C (1 (AYTGGGYDTAAAGNG) and S- · 0.2% of Archaea · 4.6% of Archaea Bacteria s), followed by a ramping rate of 0.4 °C/s to 50 °C (60 D-Bact-1061-a-A-17 · 92% of Bacteria · 97% of Bacteria 98 s), and a ramping rate of 0.8 °C/s to 72 °C (60 s) (CRRCACGAGCTGACGAC) · 94% of characterized Chlamydiae · 99% of characterized Chlamydiae · final extension at 72 °C (10 min) A519F · 86% of Eukaryota · 93% of Eukaryota (CAGCMGCCGCGGTAA)100 and Eukarya, Archaea · 70% of Archaea · 74% of Archaea not used in this study Uni1391R and Bacteria · 86% of Bacteria · 91% of Bacteria (ACGGGCGGTGWGTRC)93 · 0.7% of characterized Chlamydiae · 93% of characterized Chlamydiae Earth Microbiome Project92 · 0% of Eukaryota · 0% of Eukaryota primers: 515F Archaea and · 0% of 
Archaea · 0% of Archaea (GTGYCAGCMGCCGCGGTAA) not used in this study Bacteria · 92% of Bacteria · 92% of Bacteria and 806R · 0.7% of characterized Chlamydiae · 95% of characterized Chlamydiae (GGACTACNVGGGTWTCTAAT) aUsing SILVA TestPrime (Klindworth et al., 2013) with the SSU r132 database and RefNR sequence collection bUsing HotStarTaq DNA Polymerase (QIAGEN)
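The coverage values in Supplementary Table 5 depend on matching degenerate primers against target sequences with zero or one mismatch. The sketch below is a simplified illustration of that idea and not a reimplementation of SILVA TestPrime. It expands IUPAC ambiguity codes, slides the primer along a template in the forward orientation only, and reports start positions within a mismatch budget. The function names are hypothetical, and reverse primers would need to be reverse-complemented before use.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_binding_sites(primer, template, max_mismatches=0):
    """Return 0-based start positions where the primer matches the template."""
    primer = primer.upper()
    template = template.upper()
    k = len(primer)
    hits = []
    for start in range(len(template) - k + 1):
        window = template[start:start + k]
        if count_mismatches(primer, window) <= max_mismatches:
            hits.append(start)
    return hits
```

For instance, `find_binding_sites("CGCCAACAYTGGGACTGAGA", sequence, max_mismatches=1)` would list candidate binding positions of the Chla-310-a-20 forward primer in `sequence`. Coverage for a taxon could then be approximated as the fraction of its reference sequences with at least one hit for both primers of a pair.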


Supplementary Table 6. PVC outgroup genomes used in phylogenomic analyses.

Phylum | Species Name | GenBank Accession
Kiritimatiellaeota | Kiritimatiella glycovorans L21-Fru-AB | GCA_001017655.1
Lentisphaerae | Lentisphaerae bacterium GWF2_57_35 | GCA_001804865.1
Lentisphaerae | Lentisphaerae bacterium RIFOXYC12_FULL_60_16 | GCA_001803315.1
Lentisphaerae | Lentisphaera araneosa HTCC215 | GCA_000170755.1
Lentisphaerae | Lentisphaerae bacterium GWF2_50_93 | GCA_001804815.1
Verrucomicrobia | Coraliomargarita akajimensis DSM 45221 | GCA_000025905.1
Verrucomicrobia | Opitutus terrae PB90-1 | GCA_000019965.1
Verrucomicrobia | Pedosphaera parvula Ellin514 | GCA_000172555.1
Verrucomicrobia | Methylacidiphilum infernorum V4 | GCA_000019665.1
Verrucomicrobia | Terrimicrobium sacchariphilum NM-5 | GCA_001613545.1
Verrucomicrobia | Verrucomicrobium sp. BvORR106 | GCA_000739655.1
Verrucomicrobia | Akkermansia muciniphila ATCC BAA-835 | GCA_000020225.1
Unclassified PVC | PVC group bacterium (ex. Bugula neritina AB1) AB1-3 | GCA_001730085.1
Candidatus Omnitrophica | Ca. Omnitrophica bacterium CG1_02_41_171 | GCA_001871865.1
Candidatus Omnitrophica | Ca. Omnitrophus fodinae SCGC AAA011-A17 | GCA_000405945.1
Candidatus Omnitrophica | Ca. Omnitrophus magneticus SKK-01 | GCA_000954095.1
Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFCSPHIGHO2_02_FULL_63_39 | GCA_001805685.1
Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFCSPLOWO2_02_FULL_50_19 | GCA_001805805.1
Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFOXYB2_FULL_38_16 | GCA_001805995.1
Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RBG_13_41_10 | GCA_001805465.1
Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium GWF2_43_52 | GCA_001805445.1
Planctomycetes | Planctomycetes bacterium DG_23 | GCA_001302825.1
Planctomycetes | Planctomycetes bacterium RIFCSPLOWO2_02_FULL_50_16 | GCA_001828565.1
Planctomycetes | Ca. Scalindua brodae RU1 | GCA_000786775.1
Planctomycetes | Ca. JPN1 | GCA_000949635.1
Planctomycetes | Phycisphaera mikurensis NBRC 102666 | GCA_000284115.1
Planctomycetes | Isosphaera pallida ATCC 43644 | GCA_000186345.1
Planctomycetes | Pirellula staleyi DSM 6068 | GCA_000025185.1


Supplementary Table 7. Single-copy marker genes used to assess chlamydial MAG completeness and redundancy.

Chlamydiae Micomplete Gene/Domain Names
Aminoacyl tRNA synthetase II, N-terminal domain | Ribosomal protein S10p/S20e
Arginyl tRNA synthetase N terminal domain | Ribosomal protein S11
Bacterial RNA polymerase, alpha chain C terminal domain | Ribosomal protein S12/S23
Bacterial trigger factor protein (TF) | Ribosomal protein S13/S18
Bacterial trigger factor protein (TF) C-terminus | Ribosomal protein S15
ClpX C4-type zinc finger | Ribosomal protein S16
Conserved hypothetical protein 95 | Ribosomal protein S17
CTP synthase N-terminus | Ribosomal protein S18
Cytidylate kinase | Ribosomal protein S19
Dephospho-CoA kinase | Ribosomal protein S2
DNA polymerase III beta subunit, C-terminal domain | Ribosomal protein S20
DNA polymerase III beta subunit, central domain | Ribosomal protein S3, C-terminal domain
DNA primase catalytic core, N-terminal domain | Ribosomal protein S4/S9 N-terminal domain
Double-stranded RNA binding motif | Ribosomal protein S5, C-terminal domain
Elongation factor TS | Ribosomal protein S5, N-terminal domain
Enolase, C-terminal TIM barrel domain | Ribosomal protein S6
Enolase, N-terminal domain | Ribosomal protein S7p/S5e
FAD synthetase | Ribosomal protein S8
Ferredoxin-fold anticodon binding domain | Ribosomal protein S9/S16
GAD domain | Ribosomal Proteins L2, C-terminal domain
GrpE | Ribosomal Proteins L2, RNA binding domain
GTP-binding protein LepA C-terminus | recycling factor
GTP1/OBG | RNA polymerase beta subunit
Holliday junction DNA helicase ruvB C-terminus | RNA polymerase beta subunit external 1 domain
IPP transferase | RNA polymerase Rpb1, domain 1
MraW methylase family | RNA polymerase Rpb1, domain 2
NusA N-terminal domain | RNA polymerase Rpb1, domain 3
Oligomerisation domain | RNA polymerase Rpb1, domain 4
Peptidyl-tRNA hydrolase | RNA polymerase Rpb1, domain 5
Phosphoglycerate kinase | RNA polymerase Rpb2, domain 2
Protein of unknown function (DUF933) | RNA polymerase Rpb2, domain 3
recA bacterial DNA recombination protein | RNA polymerase Rpb2, domain 6
Ribosomal L18p/L5e family | RNA polymerase Rpb2, domain 7
Ribosomal L27 protein | RNA polymerase Rpb3/Rpb11 dimerisation domain
Ribosomal L28 family | RuvA N terminal domain
Ribosomal L29 protein | SecY translocase
ribosomal L5P family C-terminus | Seryl-tRNA synthetase N-terminal domain
Ribosomal prokaryotic L21 protein | Signal peptidase (SPase) II
Ribosomal protein L10 | Signal peptide binding domain
Ribosomal protein L11, N-terminal domain | SmpB protein
Ribosomal protein L11, RNA binding domain | Tetrahydrofolate dehydrogenase/cyclohydrolase, NAD(P)-binding domain
Ribosomal protein L13 | Translation initiation factor 1A / IF-1
Ribosomal protein L14p/L23e | Translation initiation factor IF-3, C-terminal domain
Ribosomal protein L16p/L10e | Translation initiation factor IF-3, N-terminal domain
Ribosomal protein L17 | Translation-initiation factor 2
Ribosomal protein L18e/L15 | TRCF domain
Ribosomal protein L19 | tRNA (Guanine-1)-methyltransferase
Ribosomal protein L1p/L10e family | tRNA synthetase B5 domain
Ribosomal protein L20 | tRNA synthetases class I (R)
Ribosomal protein L22p/L17e | tRNA synthetases class II core domain (F)
Ribosomal protein L23 | UDP-N-acetylenolpyruvoylglucosamine reductase, C-terminal domain
Ribosomal protein L3 | Ultra-violet resistance protein B
Ribosomal protein L35 | Uncharacterised P-loop hydrolase UPF0079
Ribosomal protein L5 | Uncharacterised protein family (UPF0081)
Ribosomal protein L6 | Uncharacterized protein family UPF0054
Ribosomal protein L9, C-terminal domain | UvrC Helix-hairpin-helix N-terminal
Ribosomal protein L9, N-terminal domain |


Supplementary Table 8. Single-copy marker genes used for concatenation-based species tree inference.

Used in 38, 55 an d 98 NO G marker protein sets Used in 55 an d 98 NO G marker protein sets Used in 98 NO G marker protein sets NO G CO G description NO G CO G description NO G CO G description COG 0049 Ribosomal protein S7 0XPXW COG 0012 Ribosome-binding ATPase YchF, GTP1/OBG family COG 0051 Ribosomal protein S10 COG 0013 Alanyl-tRNA synthetase COG 0064 Asp-tRNAAsn/Glu-tRNAGln amidotransferase B subunit COG 0057 Glyceraldehyde-3-phosphate dehydrogenase/erythrose-4- COG 0016 Phenylalanyl-tRNA synthetase alpha subunit COG 0125 Thymidylate kinase phosphate dehydrogenase COG 0081 Ribosomal protein L1 COG 0050 Translation elongation factor EF-Tu, a GTPase COG 0148 Enolase COG 0087 Ribosomal protein L3 COG 0052 Ribosomal protein S2 COG 0149 Triosephosphate isomerase COG 0088 Ribosomal protein L4 COG 0172 Seryl-tRNA synthetase COG 0173 Aspartyl-tRNA synthetase COG 0090 Ribosomal protein L2 COG 0240 Glycerol-3-phosphate dehydrogenase COG 0180 Tryptophanyl-tRNA synthetase COG 0091 Ribosomal protein L22 COG 0292 Ribosomal protein L20 COG 0190 5,10-methylene-tetrahydrofolate dehydrogenase/Methenyl tetrahydrofolate cyclohydrolase COG 0092 Ribosomal protein S3 COG 0359 Ribosomal protein L9 COG 0195 Transcription antitermination factor NusA, contains S1 and KH domains COG 0093 Ribosomal protein L14 COG 0504 CTP synthase (UTP-ammonia lyase) COG 0196 FAD synthase COG 0094 Ribosomal protein L5 COG 0536 GTPase involved in cell partioning and DNA repair COG 0215 Cysteinyl-tRNA synthetase COG 0096 Ribosomal protein S8 COG 0544 FKBP-type peptidyl-prolyl cis-trans isomerase (trigger COG 0217 Transcriptional and/or translational regulatory protein YebC/TACO1 factor) COG 0097 Ribosomal protein L6P/L9E COG 0592 DNA polymerase III sliding clamp (beta) subunit, PCNA COG 0249 DNA mismatch repair ATPase MutS homolog COG 0098 Ribosomal protein S5 COG 0750 Membrane-associated protease RseP, regulator of RpoE COG 0250 Transcription antitermination factor NusG activity COG 0103 Ribosomal protein 
S9 COG 1185 Polyribonucleotide nucleotidyltransferase (polynucleotide COG 0272 NAD-dependent DNA ligase phosphorylase) COG 0126 3-phosphoglycerate kinase COG 1198 Primosomal protein N' (replication factor Y) - superfamily II COG 0283 Cytidylate kinase helicase COG 0127 Inosine/xanthosine triphosphate pyrophosphatase, all-alpha COG 1530 Ribonuclease G or E COG 0289 Dihydrodipicolinate reductase NTP-PPase family COG 0197 Ribosomal protein L16/L10AE COG 0322 Excinuclease UvrABC, nuclease subunit COG 0200 Ribosomal protein L15 COG 0323 DNA mismatch repair ATPase MutL COG 0201 Preprotein translocase subunit SecY COG 0324 tRNA A37 N6-isopentenylltransferase MiaA COG 0203 Ribosomal protein L17 COG 0335 Ribosomal protein L19 COG 0216 Protein chain release factor A COG 0445 tRNA U34 5-carboxymethylaminomethyl modifying enzyme MnmG/GidA COG 0233 Ribosome recycling factor COG 0449 Glucosamine 6-phosphate synthetase, contains amidotransferase and phosphosugar isomerase domains COG 0244 Ribosomal protein L10 COG 0481 Translation elongation factor EF-4, membrane-bound GTPase COG 0256 Ribosomal protein L18 COG 0511 Biotin carboxyl carrier protein COG 0261 Ribosomal protein L21 COG 0522 Ribosomal protein S4 or related protein COG 0264 Translation elongation factor EF-Ts COG 0525 Valyl-tRNA synthetase COG 0290 Translation initiation factor IF-3 COG 0532 Translation initiation factor IF-2, a GTPase COG 0331 Malonyl CoA-acyl carrier protein transacylase COG 0541 Signal recognition particle GTPase COG 0353 Recombinational DNA repair protein RecR COG 0552 Signal recognition particle GTPase COG 0468 RecA/RadA recombinase COG 0575 CDP-diglyceride synthetase COG 0495 Leucyl-tRNA synthetase COG 0749 DNA polymerase I - 3'-5' exonuclease and polymerase domains COG 0576 Molecular chaperone GrpE (heat shock protein) COG 0774 UDP-3-O-acyl-N-acetylglucosamine deacetylase COG 0632 Holliday junction resolvasome RuvABC DNA-binding subunit COG 0825 Acetyl-CoA carboxylase alpha subunit

COG 0769 UDP-N-acetylmuramyl tripeptide synthase COG 0858 Ribosome-binding factor A COG 0815 Apolipoprotein N-acyltransferase COG 1044 UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase COG 0817 Holliday junction resolvasome RuvABC endonuclease subunit COG 1137 ABC-type lipopolysaccharide export system, ATPase component

COG 2255 Holliday junction resolvasome RuvABC, ATP-dependent DNA COG 1160 Predicted GTPases helicase subunit COG 1570 Exonuclease VII, large subunit COG 1663 Tetraacyldisaccharide-1-P 4'-kinase COG 1825 Ribosomal protein L25 (general stress protein Ctc) COG 2877 3-deoxy-D-manno-octulosonic acid (KDO) 8-phosphate synthase COG 2890 Methylase of polypeptide chain release factors


Supplementary Data Descriptions

Supplementary Data 1. Chlamydiae 16S rRNA amplicon OTUs in FASTA format

Supplementary Data 2. Relative abundance of Chlamydiae OTUs across Loki’s Castle marine sediments.

Supplementary Data 3. Pathway overviews, selected gene annotations and raw data.

Tab 1. Presence and absence of bacterial-level NOGs across Chlamydiae

Tab 2. EffectiveDB results

Tab 3. Overview of KEGG pathways and their presence across Chlamydiae including central carbon metabolism, carbon fixation, amino acid and nucleotide biosynthesis

Tab 4. Secretion systems and flagellar components identified by MacSyFinder

Tab 5. Selected gene annotations

Tab 6. IMNGS results

Supplementary Data 4. Unprocessed phylogenetic trees presented in this study.

Supplementary References

1 Pillonel, T., Bertelli, C., Salamin, N. & Greub, G. Taxogenomics of the order Chlamydiales. Int J Syst Evol Microbiol 65, 1381-1393, doi:10.1099/ijs.0.000090 (2015).

2 Taylor-Brown, A. et al. Culture-independent genomics of a novel chlamydial pathogen of fish provides new insight into host-specific adaptations utilized by these intracellular bacteria. Environ Microbiol 19, 1899-1913, doi:10.1111/1462-2920.13694 (2017).

3 Pillonel, T., Bertelli, C. & Greub, G. Environmental Metagenomic Assemblies Reveal Seven New Highly Divergent Chlamydial Lineages and Hallmarks of a Conserved Intracellular Lifestyle. Front Microbiol 9, 79, doi:10.3389/fmicb.2018.00079 (2018).

4 Taylor-Brown, A. et al. Metagenomic analysis of fish-associated Ca. Parilichlamydiaceae reveals striking metabolic similarities to the terrestrial Chlamydiaceae. Genome Biol Evol, doi:10.1093/gbe/evy195 (2018).

5 Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21, 1095-1109, doi:10.1093/molbev/msh112 (2004).

6 Stride, M. C. et al. Molecular Characterization of "Candidatus Parilichlamydia carangidicola," a Novel Chlamydia-Like Epitheliocystis Agent in Yellowtail Kingfish, Seriola lalandi (Valenciennes), and the Proposal of a New Family, "Candidatus Parilichlamydiaceae" fam. nov. (Order Chlamydiales). Appl Environ Microbiol 75, doi:10.1128/AEM.02899-12 (2012).

7 Vouga, M., Baud, D. & Greub, G. Simkania negevensis, an insight into the biology and clinical importance of a novel member of the Chlamydiales order. Crit Rev Microbiol 43, 62-80, doi:10.3109/1040841X.2016.1165650 (2017).

8 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, 13219, doi:10.1038/ncomms13219 (2016).

9 Baker, B. J., Lazar, C. S., Teske, A. P. & Dick, G. J. Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome 3, 14, doi:10.1186/s40168-015-0077-6 (2015).

10 Horn, M. Chlamydiae as symbionts in eukaryotes. Annu Rev Microbiol 62, 113-131, doi:10.1146/annurev.micro.62.081307.162818 (2008).

11 Taylor-Brown, A., Vaughan, L., Greub, G., Timms, P. & Polkinghorne, A. Twenty years of research into Chlamydia-like organisms: a revolution in our understanding of the biology and pathogenicity of members of the phylum Chlamydiae. Pathog Dis 73, 1-15 (2015).

12 Baud, D., Thomas, V., Arafa, A., Regan, L. & Greub, G. Waddlia chondrophila, a potential agent of human fetal death. Emerg Infect Dis 13, 1239-1243, doi:10.3201/eid1308.070315 (2007).

13 Moliner, C., Fournier, P. E. & Raoult, D. Genome analysis of microorganisms living in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev 34, 281-294, doi:10.1111/j.1574-6976.2010.00209.x (2010).

14 Nunes, A. & Gomes, J. P. Evolution, phylogeny, and molecular epidemiology of Chlamydia. Infect Genet Evol 23, 49-64 (2014).

15 Elwell, C., Mirrashidi, K. & Engel, J. Chlamydia cell biology and pathogenesis. Nat Rev Microbiol 14, 385-400, doi:10.1038/nrmicro.2016.30 (2016).

16 Horn, M. et al. Illuminating the evolutionary history of chlamydiae. Science 304, 728-730, doi:10.1126/science.1096330 (2004).

17 Collingro, A. et al. Unity in variety--the pan-genome of the Chlamydiae. Mol Biol Evol 28, 3253-3270 (2011).

18 Subtil, A., Collingro, A. & Horn, M. Tracing the primordial Chlamydiae: extinct parasites of plants? Trends Plant Sci 19, 36-43, doi:10.1016/j.tplants.2013.10.005 (2014).

19 Omsland, A., Sixt, B. S., Horn, M. & Hackstadt, T. Chlamydial metabolism revisited: interspecies metabolic variability and developmental stage-specific physiologic activities. FEMS Microbiol Rev 38, 779-801, doi:10.1111/1574-6976.12059 (2014).

20 Stone, C. B., Bulir, D. C., Gilchrist, J. D., Toor, R. K. & Mahony, J. B. Interactions between flagellar and type III secretion proteins in Chlamydia pneumoniae. BMC Microbiol 10, 18, doi:10.1186/1471-2180-10-18 (2010).

21 Birkelund, S. et al. Analysis of proteins in Chlamydia trachomatis L2 outer membrane complex, COMC. FEMS Immunol Med Microbiol 55, 187-195, doi:10.1111/j.1574-695X.2009.00522.x (2009).

22 Bliven, K. A., Fisher, D. J. & Maurelli, A. T. Characterization of the activity and expression of arginine decarboxylase in human and animal Chlamydia pathogens. FEMS Microbiol Lett 337, 140-146, doi:10.1111/1574-6968.12021 (2012).

23 Ponting, C. P. Chlamydial homologues of the MACPF (MAC/perforin) domain. Curr Biol 9, R911-913 (1999).

24 Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res 42, D222-230, doi:10.1093/nar/gkt1223 (2014).

25 Muschiol, S. et al. Identification of a family of effectors secreted by the type III secretion system that are conserved in pathogenic Chlamydiae. Infect Immun 79, 571-580, doi:10.1128/IAI.00825-10 (2011).

26 Hobolt-Pedersen, A. S., Christiansen, G., Timmerman, E., Gevaert, K. & Birkelund, S. Identification of Chlamydia trachomatis CT621, a protein delivered through the type III secretion system to the host cell cytoplasm and nucleus. FEMS Immunol Med Microbiol 57, 46-58, doi:10.1111/j.1574-695X.2009.00581.x (2009).

27 Chellas-Gery, B., Linton, C. N. & Fields, K. A. Human GCIP interacts with CT847, a novel Chlamydia trachomatis type III secretion substrate, and is degraded in a tissue-culture infection model. Cell Microbiol 9, 2417-2430, doi:10.1111/j.1462-5822.2007.00970.x (2007).

28 da Cunha, M. et al. Identification of type III secretion substrates of Chlamydia trachomatis using Yersinia enterocolitica as a heterologous system. BMC Microbiol 14, 40, doi:10.1186/1471-2180-14-40 (2014).

29 Abby, S. S., Neron, B., Menager, H., Touchon, M. & Rocha, E. P. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One 9, e110726, doi:10.1371/journal.pone.0110726 (2014).

30 Abby, S. S. et al. Identification of protein secretion systems in bacterial genomes. Sci Rep 6, 23080, doi:10.1038/srep23080 (2016).

31 Peabody, C. R. et al. Type II protein secretion and its relationship to bacterial type IV pili and archaeal flagella. Microbiology 149, 3051-3072, doi:10.1099/mic.0.26364-0 (2003).

32 Korotkov, K. V., Sandkvist, M. & Hol, W. G. The type II secretion system: biogenesis, molecular architecture and mechanism. Nat Rev Microbiol 10, 336-351, doi:10.1038/nrmicro2762 (2012).

33 Mueller, K. E., Plano, G. V. & Fields, K. A. New frontiers in type III secretion biology: the Chlamydia perspective. Infect Immun 82, 2-9, doi:10.1128/IAI.00917-13 (2014).

34 Dumoux, M., Nans, A., Saibil, H. R. & Hayward, R. D. Making connections: snapshots of chlamydial type III secretion systems in contact with host membranes. Curr Opin Microbiol 23, 1-7, doi:10.1016/j.mib.2014.09.019 (2015).

35 Abby, S. S. & Rocha, E. P. The non-flagellar type III secretion system evolved from the bacterial flagellum and diversified into host-cell adapted systems. PLoS Genet 8, e1002983, doi:10.1371/journal.pgen.1002983 (2012).

36 Collingro, A. et al. Unexpected genomic features in widespread intracellular bacteria: evidence for motility of marine chlamydiae. ISME J 11, 2334-2344, doi:10.1038/ismej.2017.95 (2017).

37 Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci Data 5, 170203, doi:10.1038/sdata.2017.203 (2018).

38 Eichinger, V. et al. EffectiveDB--updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. Nucleic Acids Res 44, D669-674, doi:10.1093/nar/gkv1269 (2016).

39 Sait, M. et al. Genomic and Experimental Evidence Suggests that Verrucomicrobium spinosum Interacts with Eukaryotes. Front Microbiol 2, 211, doi:10.3389/fmicb.2011.00211 (2011).

40 Martinez-Garcia, P. M., Ramos, C. & Rodriguez-Palenzuela, P. T346Hunter: a novel web-based tool for the prediction of type III, type IV and type VI secretion systems in bacterial genomes. PLoS One 10, e0119317, doi:10.1371/journal.pone.0119317 (2015).

41 Pilhofer, M. et al. Architecture and host interface of environmental chlamydiae revealed by electron cryotomography. Environ Microbiol 16, 417-429, doi:10.1111/1462-2920.12299 (2014).

42 Nans, A., Kudryashev, M., Saibil, H. R. & Hayward, R. D. Structure of a bacterial type III secretion system in contact with a host membrane in situ. Nat Commun 6, 10114, doi:10.1038/ncomms10114 (2015).

43 Konig, L. et al. Biphasic Metabolism and Host Interaction of a Chlamydial Symbiont. mSystems 2, doi:10.1128/mSystems.00202-16 (2017).

44 Beder, T. & Saluz, H. P. Virulence-related comparative transcriptomics of infectious and non-infectious chlamydial particles. BMC Genomics 19, 575, doi:10.1186/s12864-018-4961-x (2018).

45 Cosse, M. M., Hayward, R. D. & Subtil, A. One Face of Chlamydia trachomatis: The Infectious Elementary Body. Curr Top Microbiol Immunol 412, 35-58, doi:10.1007/82_2016_12 (2018).

46 Gallique, M., Bouteiller, M. & Merieau, A. The Type VI Secretion System: A Dynamic System for Bacterial Communication? Front Microbiol 8, 1454, doi:10.3389/fmicb.2017.01454 (2017).

47 Cao, Z., Casabona, M. G., Kneuper, H., Chalmers, J. D. & Palmer, T. The type VII secretion system of Staphylococcus aureus secretes a nuclease toxin that targets competitor bacteria. Nat Microbiol 2, 16183, doi:10.1038/nmicrobiol.2016.183 (2016).

48 Aoki, S. K. et al. A widespread family of polymorphic contact-dependent toxin delivery systems in bacteria. Nature 468, 439-442, doi:10.1038/nature09490 (2010).

49 Souza, D. P. et al. Bacterial killing via a type IV secretion system. Nat Commun 6, 6453, doi:10.1038/ncomms7453 (2015).

50 Willett, J. L., Ruhe, Z. C., Goulding, C. W., Low, D. A. & Hayes, C. S. Contact-Dependent Growth Inhibition (CDI) and CdiB/CdiA Two-Partner Secretion Proteins. J Mol Biol 427, 3754-3765, doi:10.1016/j.jmb.2015.09.010 (2015).

51 Garcia-Bayona, L., Guo, M. S. & Laub, M. T. Contact-dependent killing by Caulobacter crescentus via cell surface-associated, glycine zipper proteins. Elife 6, doi:10.7554/eLife.24869 (2017).

52 Lucas, C. E., Brown, E. & Fields, B. S. Type IV pili and type II secretion play a limited role in Legionella pneumophila biofilm colonization and retention. Microbiology 152, 3569-3573, doi:10.1099/mic.0.2006/000497-0 (2006).

53 Ishida, K. et al. Amoebal endosymbiont Neochlamydia genome sequence illuminates the bacterial role in the defense of the host amoebae against Legionella pneumophila. PLoS One 9, e95166, doi:10.1371/journal.pone.0095166 (2014).

54 Stewart, C. R., Rossier, O. & Cianciotto, N. P. Surface translocation by Legionella pneumophila: a form of sliding motility that is dependent upon type II protein secretion. J Bacteriol 191, 1537-1546, doi:10.1128/JB.01531-08 (2009).

55 Pallen, M. J., Beatson, S. A. & Bailey, C. M. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiol Rev 29, 201-229, doi:10.1016/j.femsre.2005.01.001 (2005).

56 Tjaden, J. et al. Two Nucleotide Transport Proteins in Chlamydia trachomatis, One for Net Nucleoside Triphosphate Uptake and the Other for Transport of Energy. J Bacteriol 181, 1196-1202 (1999).

57 Neuhaus, H. E., Thom, E., Möhlmann, T., Steup, M. & Kampfenkel, K. Characterization of a novel eukaryotic ATP/ADP translocator located in the plastid envelope of Arabidopsis thaliana L. The Plant Journal 11, 73-82 (1997).

58 Tjaden, J., Schwöppe, C., Möhlmann, T., Quick, P. W. & Neuhaus, H. E. Expression of a Plastidic ATP/ADP Transporter Gene in Escherichia coli Leads to a Functional Adenine Nucleotide Transport System in the Bacterial Cytoplasmic Membrane. The Journal of Biological Chemistry 273, 9630-9636 (1998).

59 Schmitz-Esser, S. et al. ATP/ADP Translocases: a Common Feature of Obligate Intracellular Amoebal Symbionts Related to Chlamydiae and Rickettsiae. J Bacteriol 186, 683-691, doi:10.1128/jb.186.3.683-691.2004 (2004).

60 Major, P., Embley, T. M. & Williams, T. A. Phylogenetic Diversity of NTT Nucleotide Transport Proteins in Free-Living and Parasitic Bacteria and Eukaryotes. Genome Biol Evol 9, 480-487, doi:10.1093/gbe/evx015 (2017).

61 Gould, S. B., Waller, R. F. & McFadden, G. I. Plastid evolution. Annu Rev Plant Biol 59, 491-517, doi:10.1146/annurev.arplant.59.032607.092915 (2008).

62 Amiri, H., Karlberg, O. & Andersson, S. G. Deep origin of plastid/parasite ATP/ADP translocases. J Mol Evol 56, 137-150, doi:10.1007/s00239-002-2387-0 (2003).

63 Greub, G. & Raoult, D. History of the ADP/ATP-Translocase-Encoding Gene, a Parasitism Gene Transferred from a Chlamydiales Ancestor to Plants 1 Billion Years Ago. Appl Environ Microbiol 69, 5530-5535, doi:10.1128/aem.69.9.5530-5535.2003 (2003).

64 Knab, S., Mushak, T. M., Schmitz-Esser, S., Horn, M. & Haferkamp, I. Nucleotide parasitism by Simkania negevensis (Chlamydiae). J Bacteriol 193, 225-235, doi:10.1128/JB.00919-10 (2011).

65 Fisher, D. J., Fernández, R. E. & Maurelli, A. T. Chlamydia trachomatis Transports NAD via the Npt1 ATP/ADP Translocase. J Bacteriol 195, 3381-3386 (2013).

66 Stephens, R. S. et al. Genome Sequence of an Obligate Intracellular Pathogen of Humans: Chlamydia trachomatis. Science 282, 754-759 (1998).

67 Haferkamp, I. et al. A candidate NAD transporter in an intracellular bacterial symbiont related to Chlamydiae. Nature 432 (2004).

68 Haferkamp, I. et al. Tapping the nucleotide pool of the host: novel nucleotide carrier proteins of Protochlamydia amoebophila. Mol Microbiol 60, 1534-1545, doi:10.1111/j.1365-2958.2006.05193.x (2006).

69 Yeoh, Y. K., Sekiguchi, Y., Parks, D. H. & Hugenholtz, P. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage. Mol Biol Evol 33, 915-927, doi:10.1093/molbev/msv281 (2016).

70 Perez, J., Moraleda-Munoz, A., Marcos-Torres, F. J. & Munoz-Dorado, J. Bacterial predation: 75 years and counting! Environ Microbiol 18, 766-779, doi:10.1111/1462-2920.13171 (2016).

71 Groves, M. R., Hanlon, N., Turowski, P., Hemmings, B. A. & Barford, D. The structure of the protein phosphatase 2A PR65/A subunit reveals the conformation of its 15 tandemly repeated HEAT motifs. Cell 96, 99-110 (1999).

72 Moran, N. A. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108, 583-586 (2002).

73 Bertelli, C. et al. The Waddlia genome: a window into chlamydial biology. PLoS One 5, e10890, doi:10.1371/journal.pone.0010890 (2010).

74 Bertelli, C. et al. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights. Front Microbiol 6, 101, doi:10.3389/fmicb.2015.00101 (2015).

75 Schander, C. et al. The fauna of hydrothermal vents on the Mohn Ridge (North Atlantic). Marine Biology Research 6, 155-171 (2010).

76 Pedersen, R. B. et al. Discovery of a black smoker vent field and vent fauna at the Arctic Mid-Ocean Ridge. Nat Commun 1, doi:10.1038/ncomms1124 (2010).

77 Edgcomb, V. P., Beaudoin, D., Gast, R., Biddle, J. F. & Teske, A. Marine subsurface eukaryotes: the fungal majority. Environ Microbiol 13, 172-183, doi:10.1111/j.1462-2920.2010.02318.x (2011).

78 Orsi, W., Biddle, J. F. & Edgcomb, V. Deep sequencing of subseafloor eukaryotic rRNA reveals active Fungi across marine subsurface provinces. PLoS One 8, e56335, doi:10.1371/journal.pone.0056335 (2013).

79 Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173-179, doi:10.1038/nature14447 (2015).

80 Kouduka, M. et al. A new DNA extraction method by controlled alkaline treatments from consolidated subsurface sediments. FEMS Microbiol Lett 326, 47-54, doi:10.1111/j.1574-6968.2011.02437.x (2012).

81 Orsi, W. D. Ecology and evolution of seafloor and subseafloor microbial communities. Nat Rev Microbiol, doi:10.1038/s41579-018-0046-8 (2018).

82 Israelsson, O. Chlamydial symbionts in the enigmatic Xenoturbella (Deuterostomia). J Invertebr Pathol 96, 213-220, doi:10.1016/j.jip.2007.05.002 (2007).

83 Kjeldsen, K. U., Obst, M., Nakano, H., Funch, P. & Schramm, A. Two types of endosymbiotic bacteria in the enigmatic marine worm Xenoturbella bocki. Appl Environ Microbiol 76, 2657-2662, doi:10.1128/AEM.01092-09 (2010).

84 Rurangirwa, F. R., Dilbeck, P. M., Crawford, T. B., McGuire, T. C. & McElwain, T. F. Analysis of the 16S rRNA gene of micro-organism WSU 86-1044 from an aborted bovine foetus reveals that it is a member of the order Chlamydiales: proposal of Waddliaceae fam. nov., Waddlia chondrophila gen. nov., sp. nov. Int J Syst Bacteriol 49 Pt 2, 577-581, doi:10.1099/00207713-49-2-577 (1999).

85 Lagkouvardos, I. et al. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J 8, 115-125, doi:10.1038/ismej.2013.142 (2014).

86 Lagkouvardos, I. et al. IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies. Sci Rep 6, 33721, doi:10.1038/srep33721 (2016).

87 Kantor, R. S. et al. Bioreactor microbial ecosystems for thiocyanate and cyanide degradation unravelled with genome-resolved metagenomics. Environ Microbiol 17, 4929-4941, doi:10.1111/1462-2920.12936 (2015).

88 Pinto, A. J. et al. Metagenomic Evidence for the Presence of Comammox Nitrospira-Like Bacteria in a Drinking Water System. mSphere 1, doi:10.1128/mSphere.00054-15 (2016).

89 Zhang, Y., Kitajima, M., Whittle, A. J. & Liu, W. T. Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems. Front Microbiol 8, 2036, doi:10.3389/fmicb.2017.02036 (2017).

90 Schulz, F. et al. Towards a balanced view of the bacterial tree of life. Microbiome 5, 140, doi:10.1186/s40168-017-0360-9 (2017).

91 Lagkouvardos, I. et al. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J 8, 115-125 (2014).

92 Thompson, L. R. et al. A communal catalogue reveals Earth's multiscale microbial diversity. Nature 551, 457-463, doi:10.1038/nature24621 (2017).

93 Jorgensen, S. L. et al. Correlating microbial community profiles with geochemical data in highly stratified sediments from the Arctic Mid-Ocean Ridge. PNAS 109, E2846-E2855, doi:10.1073/pnas.1207574109 (2012).

94 Taylor-Brown, A., Madden, D. & Polkinghorne, A. Culture-independent approaches to chlamydial genomics. Microb Genom, doi:10.1099/mgen.0.000145 (2018).

95 Probst, A. J. et al. Differential depth distribution of microbial function and putative symbionts through sediment-hosted aquifers in the deep terrestrial subsurface. Nat Microbiol 3, 328-336, doi:10.1038/s41564-017-0098-y (2018).

96 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, doi:10.1038/ncomms13219 (2016).

97 Taylor-Brown, A., Spang, L., Borel, N. & Polkinghorne, A. Culture-independent metagenomics supports discovery of uncultivable bacteria within the genus Chlamydia. Sci Rep 7, 10661, doi:10.1038/s41598-017-10757-5 (2017).

98 Klindworth, A. et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res 41, e1, doi:10.1093/nar/gks808 (2013).

99 Hugerth, L. W. et al. Systematic design of 18S rRNA gene primers for determining eukaryotic diversity in microbial consortia. PLoS One 9, e95567, doi:10.1371/journal.pone.0095567 (2014).

100 Wang, Y. & Qian, P. Y. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS One 4, e7401, doi:10.1371/journal.pone.0007401 (2009).
