Supplementary Information for:
Marine sediments illuminate Chlamydiae diversity and evolution
Jennah E. Dharamshi1, Daniel Tamarit1†, Laura Eme1†, Courtney Stairs1, Joran Martijn1, Felix Homa1, Steffen L. Jørgensen2, Anja Spang1,3, Thijs J. G. Ettema1,4*
1 Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123 Uppsala, Sweden
2 Department of Earth Science, Centre for Deep Sea Research, University of Bergen, N-5020 Bergen, Norway
3 Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, and Utrecht University, NL-1790 AB Den Burg, The Netherlands
4 Laboratory of Microbiology, Department of Agrotechnology and Food Sciences, Wageningen University, 6708 WE Wageningen, The Netherlands
† These authors contributed equally
* Correspondence to: Thijs J. G. Ettema, Email: [email protected]
Supplementary Information
Supplementary Discussions
1. Evolutionary relationships within the Chlamydiae phylum
2. Insights into the evolution of pathogenicity in Chlamydiaceae
3. Secretion systems and flagella in Chlamydiae
4. Phylogenetic diversity of chlamydial nucleotide transporters
5. Genomic potential for de novo biosynthesis of nucleotides and amino acids across Chlamydiae
6. Eukaryotes in Loki’s Castle marine sediments
7. Abundance and diversity of chlamydial lineages in Loki’s Castle marine sediments
8. Underestimation of environmental abundance and diversity of Chlamydiae
Supplementary Figures
Supplementary Tables
Supplementary Data Descriptions
Supplementary References
Supplementary Discussions

1. Evolutionary relationships within the Chlamydiae phylum
We performed several in-depth phylogenomic analyses to reconstruct interspecies relationships within the Chlamydiae phylum. To build upon previous work1-3, we have increased taxon sampling and put a particular emphasis on applying state-of-the-art approaches aiming to detect and alleviate potential phylogenetic artifacts that can be caused by long-branching taxa and sequence composition heterogeneity (see Methods).
Our phylogenomic analyses in maximum likelihood and Bayesian frameworks allowed us to resolve seven well-supported Chlamydiae Clades (CC) of putatively high taxonomic rank. These include five newly identified clades, CC-I through CC-IV and Anoxychlamydiales, which are primarily composed of uncultured chlamydial lineages represented by metagenome-assembled genomes (MAGs). The phylogenetic placement of most lineages and the deep-branching relationships between chlamydial clades were well-resolved and consistent across phylogenomic reconstructions (Fig. 2, Supplementary Figs. 3 and 16), with the exception of a few long-branching lineages (see below).
1.1 Resolving deep evolutionary relationships between chlamydial clades

Overall, within previously identified clades, our analyses recovered shallow evolutionary relationships that were consistent with recent work3. However, there are notable differences with regard to the inferred deeper evolutionary relationships. In particular, previous work has suggested that the Chlamydiaceae (denoted as the order Chlamydiales3) are deeply branching1-4 and comprise a sister group of all other chlamydial lineages (corresponding to CC-I, CC-II, CC-III, Anoxychlamydiales and environmental chlamydiae members)2,3, which was tentatively classified as the order Parachlamydiales3.
In contrast, all our phylogenomic reconstructions strongly support a sister relationship of the Chlamydiaceae with CC-IV, which together form a sister clade of the environmental chlamydiae. Altogether, this group forms a sister relationship with the second major radiation in the Chlamydiae, composed of CC-I, CC-II, CC-III and Anoxychlamydiales lineages (Fig. 2, Supplementary Figs. 3 and 16).
Our results differ from prior analyses due to the inclusion of CC-IV, which is composed of three newly identified metagenome-assembled genomes (MAGs) from Loki’s Castle marine sediments, and the use of phylogenetic inference methods aimed at minimizing artifacts such as long-branch attraction (LBA). For instance, the branch leading to the Chlamydiaceae family is relatively long, which may in part be due to the evolutionary transition to a parasitic lifestyle with a restricted animal host range6,7. The inclusion of CC-IV in our analyses shortens the long branch to the Chlamydiaceae and may thus alleviate phylogenetic reconstruction artifacts that were previously attracting the latter to the base of the phylum.
Investigations of the evolution of the Chlamydiae and inferences on the nature of the chlamydial ancestor have been based on the assumption that the Chlamydiaceae represent the earliest diverging lineage within this phylum1,2. Thus, conclusions from these analyses will need to be re-examined based on the updated phylogeny of the Chlamydiae presented here.
1.2 Phylogenetic placement of long-branching chlamydial lineages

In a recent study, the long-branching orphan lineage Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 was inferred as the second deepest-branching lineage within Chlamydiae (after the divergence of Ca. Similichlamydia epinephelii) and was proposed to form the new order Candidatus Novochlamydiales3.
In agreement with this, our initial maximum-likelihood (ML) phylogenies suggested the placement of Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11, followed by K940_chlam_8, at the base of Chlamydiae, although support for the early divergence of these representatives was weak (BV = 42 and BV = 61, respectively) (Fig. 2, Supplementary Data 4). When 25% of the most heterogeneous sites were removed, both lineages became nested inside a larger clade composed of CC-I, II, III and Anoxychlamydiales, although with poor support (Fig. 2, Supplementary Data 4). However, in our Bayesian phylogenetic inference based on the CAT-GTR model, a complex model of protein evolution that minimizes the effects of LBA5, the placement of the two lineages within the larger clade of CC-I, II, III and Anoxychlamydiales was highly supported (posterior probability (PP) = 0.97, Fig. 2). For instance, Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 was placed within a well-supported clade with CC-I (PP = 0.99, Fig. 2), suggesting that the early divergence of this representative may indeed have been the result of LBA. Thus, our analyses indicate that Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_112 does not represent a deep-branching Chlamydiae order but may instead be closely related to the Simkaniaceae family.
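The site-removal step mentioned above can be illustrated with a minimal sketch. The Methods are not reproduced in this section, so the scoring used here is an assumption: each alignment column is ranked by how much a chi-square measure of compositional heterogeneity drops when that column is removed, and the top-ranked fraction is discarded. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

def chi2_heterogeneity(alignment):
    """Sum over taxa of the chi-square distance between each taxon's residue
    composition and the alignment-wide composition (gap-like symbols ignored)."""
    total = Counter()
    per_taxon = []
    for seq in alignment.values():
        counts = Counter(c for c in seq if c not in "-X?")
        per_taxon.append(counts)
        total.update(counts)
    grand = sum(total.values())
    score = 0.0
    for counts in per_taxon:
        n = sum(counts.values())
        for residue, tot in total.items():
            expected = n * tot / grand
            if expected > 0:
                score += (counts.get(residue, 0) - expected) ** 2 / expected
    return score

def drop_site(alignment, i):
    """Return a copy of the alignment without column i."""
    return {name: seq[:i] + seq[i + 1:] for name, seq in alignment.items()}

def remove_heterogeneous_sites(alignment, fraction=0.25):
    """Remove the given fraction of columns that contribute most to heterogeneity."""
    length = len(next(iter(alignment.values())))
    full = chi2_heterogeneity(alignment)
    # Assumed scoring: contribution of a site = drop in the score when it is removed.
    contrib = [full - chi2_heterogeneity(drop_site(alignment, i)) for i in range(length)]
    ranked = sorted(range(length), key=lambda i: contrib[i], reverse=True)
    to_remove = set(ranked[:int(length * fraction)])
    keep = [i for i in range(length) if i not in to_remove]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

# Hypothetical toy alignment: taxonC deviates in composition at the last two columns.
toy = {"taxonA": "MKLVMKLV", "taxonB": "MKLVMKLV", "taxonC": "MKLVMKGG"}
filtered = remove_heterogeneous_sites(toy, fraction=0.25)
```

On real data the filtered alignment would then be used for a new tree search; the sketch only covers the column-filtering step and recomputes the score naively for every column.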
During the course of our analyses, several other chlamydial MAGs and Single-cell Assembled Genomes (SAGs) were publicly released (Supplementary Table 3). We reconstructed an ML phylogeny including these lineages, which was congruent with our prior analyses (Supplementary Fig. 3). One of these MAGs, representing Candidatus Similichlamydia epinephelii, was placed as a sister lineage to all other members of the Chlamydiae with high support (Supplementary Fig. 3). This position is consistent with other recent phylogenomic analyses of the Chlamydiae2-4. Ca. S. epinephelii is a member of the candidate chlamydial family Candidatus Parilichlamydiaceae, which is composed of chlamydial fish pathogens that cause epitheliocystis6. This taxon emerges on a long branch, which is not surprising given the accelerated rate of evolution observed in many pathogens. Future phylogenetic analyses with an improved taxonomic sampling might better resolve the phylogenetic placement of Ca. Parilichlamydiaceae by alleviating potential phylogenetic artifacts.
1.3 Genome characteristics and gene content variation across the Chlamydiae phylum

Genome characteristics (e.g., genome size and GC content) and gene content vary widely between different clades of the Chlamydiae (Fig. 2, Supplementary Fig. 3). Nearly all genomic information available for CC-I, CC-II, CC-III and Anoxychlamydiales is represented by MAGs from Loki’s Castle marine sediments (Supplementary Table 2) or other recent metagenomic surveys (Supplementary Table 3), which together represent over half of all chlamydial diversity. With the exception of Simkania negevensis, all characterized chlamydial lineages obtained through co-cultivation are part of the environmental chlamydiae and Chlamydiaceae (Supplementary Table 3).
CC-II has an unusually high GC content among chlamydiae, and branches as a sister group to the S. negevensis-containing7 CC-I clade (Fig. 2, Supplementary Fig. 3). The estimated genome sizes of CC-II MAGs derived from this metagenomic study and others (1.6-2.0 Mbp), and of CC-I lineages and Chlamydiae bacterium K1060_chlam_2 (1.7 Mbp), are all distinctly smaller than the S. negevensis genome (2.6 Mbp), indicating differences in their cell biology and lifestyle. The genomes of CC-I and CC-II lineages appear to have experienced reductive genome evolution, as they display smaller median intergenic space in comparison to all other chlamydiae, particularly in comparison with the genomes of many environmental chlamydiae (Supplementary Fig. 3).
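As an illustration of the intergenic-space statistic discussed above, the following minimal sketch computes a median intergenic distance from gene coordinates. The input layout (contig, start, end with 1-based inclusive coordinates) and the choice to ignore overlapping or abutting genes are assumptions made for this example, not a description of the procedure used for Supplementary Fig. 3.

```python
from statistics import median

def median_intergenic_space(genes):
    """genes: iterable of (contig, start, end) with 1-based inclusive coordinates.
    Returns the median length of the gaps between consecutive genes on a contig."""
    by_contig = {}
    for contig, start, end in genes:
        by_contig.setdefault(contig, []).append((start, end))
    gaps = []
    for coords in by_contig.values():
        coords.sort()
        prev_end = coords[0][1]
        for start, end in coords[1:]:
            gap = start - prev_end - 1
            if gap > 0:  # assumption: overlapping or abutting genes add no spacer
                gaps.append(gap)
            prev_end = max(prev_end, end)
    return median(gaps) if gaps else 0

# Hypothetical coordinates on two contigs; the gaps are 50, 10 and 80 bp.
toy_genes = [
    ("contig1", 1, 100), ("contig1", 151, 300), ("contig1", 311, 400),
    ("contig2", 1, 50), ("contig2", 131, 200),
]
toy_median = median_intergenic_space(toy_genes)  # 50
```

Gaps are measured within contigs only, since for fragmented MAGs the distance between genes on different contigs is unknown.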
CC-III is composed solely of MAGs derived from recent metagenomic studies (Supplementary Table 3).
Anoxychlamydiales is dominated by marine sediment chlamydiae (Fig. 2, Supplementary Fig. 3), including nearly half of the MAGs obtained from Loki’s Castle marine sediments, Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_27_8 (derived from Rifle, Colorado aquifer groundwater8) and Chlamydiae bacterium SM23_39 (derived from the sulfate-methane transition zone of White Oak River sediments9). The Anoxychlamydiales are characterized by an exceptionally low GC content (26-31%) and have the largest median intergenic spaces among members of the CC-I, CC-II, CC-III and Anoxychlamydiales clades. Gene content within this clade is highly conserved in comparison with other newly identified chlamydial lineages (Supplementary Fig. 3).
Only two marine sediment chlamydiae MAGs appeared to belong to the environmental chlamydiae clade of well-characterized chlamydial symbionts of protists10,11 (Fig. 2, Supplementary Fig. 3). Chlamydiae bacterium K940_chlam_3 represents the deepest branching lineage of the environmental chlamydiae, and Chlamydiae bacterium K940_chlam_7 represents a sister taxon to W. chondrophila12. These representatives have estimated genome sizes of 2.0 and 2.6 Mbp, respectively, which is consistent with the range of genome sizes represented by other members of the environmental chlamydiae (2.1-3.4 Mbp). The environmental chlamydiae clade displays the most varied patterns in gene content and the largest median intergenic space across their genomes (mean of 84 bp) (Supplementary Fig. 3). These factors point to gene acquisition events, which could be a result of the amoeba-associated lifestyles of members of this clade13.
CC-IV is composed solely of three marine sediment chlamydiae MAGs, which have a higher GC content (47%) than other chlamydiae (mean of 39%), and forms a well-supported sister clade of the Chlamydiaceae (Fig. 2, Supplementary Fig. 3). The gene content within CC-IV representatives is less conserved than within members of the Chlamydiaceae (Supplementary Fig. 3). Furthermore, CC-IV members have larger estimated genome sizes (1.3-2.1 Mbp) than the Chlamydiaceae (1-1.2 Mbp), with Chlamydiae bacterium K940_chlam_9 having a particularly large genome size in comparison (2.1 Mbp).
2. Insights into the evolution of pathogenicity in Chlamydiaceae

The Chlamydiaceae family, recently reviewed in14,15, includes important animal and human pathogens. The well-known human pathogen Chlamydia trachomatis is the causative agent of sexually transmitted genital tract infections and trachoma (i.e., preventable blindness), while Chlamydophila pneumoniae can cause acute respiratory infections. In addition, C. psittaci, C. abortus, and C. felis all have zoonotic potential and can also cause disease in humans14. Based on our revised Chlamydiae phylogeny and the expanded genomic sampling of members of this phylum, here we provide insights into the emergence and evolution of the Chlamydiaceae family.
2.1 Chlamydiaceae evolved later in Chlamydiae evolution, and through genome reduction

The Chlamydiaceae were thought to be an early-diverging group within the Chlamydiae phylum1-4. The expanded genomic sampling of chlamydial diversity and the use of sophisticated phylogenomic methods herein have allowed us to propose a new phylogeny for the Chlamydiae, including the Chlamydiaceae (Supplementary Discussion 1). Specifically, we consistently recover a strongly supported sister relationship between Chlamydiaceae and CC-IV, which together form a clade sister to the amoeba-associated environmental chlamydiae (Fig. 2, Supplementary Fig. 3, Supplementary Fig. 16).
The CC-IV clade is solely composed of uncultured lineages identified in Loki’s Castle marine sediments. When compared to Chlamydiaceae, CC-IV lineages have larger genomes (1.1-1.2 Mbp and 1.3-2.1 Mbp, respectively) and higher GC contents (37–41% and 47%, respectively). These two clades also differ significantly in their conservation of gene content (Supplementary Fig. 3): while the gene content of Chlamydiaceae is highly conserved, it is highly variable among the three CC-IV lineages obtained.
Our analyses of the presence and absence patterns of Non-supervised Orthologous Groups (NOGs) support previous reports which indicated that the evolution of the Chlamydiaceae family was characterized by massive gene loss14,16-18, consistent with observed genome size reduction in the branch leading to this family (Fig. 2, Fig. 3a-b, Supplementary Data 3). When comparing gene content between environmental chlamydiae, CC-IV, and Chlamydiaceae, we found 576, 248 and 36 NOGs, respectively, conserved uniquely within each clade (Fig. 3b). When considering NOGs found exclusively in the Chlamydiaceae and not in other chlamydiae, the set of protein families uniquely conserved in this group dropped further to 15 (Supplementary Fig. 6). We also identified 13 protein family (PF) domains conserved across Chlamydiaceae that are not present in the genomes of other chlamydial lineages. The acquisition of this small set of proteins, conserved in the Chlamydiaceae but not found in other chlamydiae, may have played a role in the evolution of the clade. In addition, the proportion of NOGs assigned to each Cluster of Orthologous Groups of proteins (COG) functional category was generally smaller in Chlamydiaceae relative to environmental chlamydiae and CC-IV lineages (Supplementary Fig. 5). This underrepresentation was most pronounced in functional categories related to metabolism (e.g., energy production and conversion, carbohydrate transport and metabolism, and inorganic ion transport and metabolism; Supplementary Fig. 5), indicating that a loss of functions related to metabolism in particular may have contributed to Chlamydiaceae evolution.
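The clade-level comparison of NOG presence and absence described above can be sketched as a simple set operation. This is a minimal illustration under stated assumptions: "conserved uniquely within a clade" is taken to mean present in every genome of that clade and absent from every genome of the other clades being compared, and all genome names and NOG identifiers below are hypothetical.

```python
def uniquely_conserved(presence, clades):
    """presence: {genome: set of NOG ids}; clades: {clade: [genome names]}.
    Returns {clade: NOGs found in every genome of the clade and in no genome
    of any other clade in the comparison}."""
    result = {}
    for clade, members in clades.items():
        core = set.intersection(*(presence[g] for g in members))
        others = set()
        for other, other_members in clades.items():
            if other != clade:
                for g in other_members:
                    others |= presence[g]
        result[clade] = core - others
    return result

# Hypothetical presence/absence data for three clades.
presence = {
    "env_1": {"NOG1", "NOG2", "NOG5"},
    "env_2": {"NOG1", "NOG2"},
    "cc4_1": {"NOG1", "NOG3"},
    "cc4_2": {"NOG1", "NOG3", "NOG5"},
    "chl_1": {"NOG1", "NOG4"},
    "chl_2": {"NOG1", "NOG4"},
}
clades = {
    "environmental": ["env_1", "env_2"],
    "CC-IV": ["cc4_1", "cc4_2"],
    "Chlamydiaceae": ["chl_1", "chl_2"],
}
unique = uniquely_conserved(presence, clades)
```

Requiring presence in every member is strict for incomplete MAGs; a tolerance threshold could be substituted for the intersection, which is a design choice not addressed here.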
2.2 Chlamydiaceae display reduced metabolic capacities relative to CC-IV

Several metabolic pathways appear to have been lost specifically in Chlamydiaceae relative to CC-IV and environmental chlamydiae (Supplementary Fig. 4, Supplementary Data 3). These pathways include proline biosynthesis, and the UMP biosynthesis pathway necessary for pyrimidine biosynthesis (KEGG module: M00051). Furthermore, Chlamydiaceae members lack genes for a hexokinase (or any glucokinase) and for the first three enzymes of the tricarboxylic acid (TCA) cycle (i.e., citrate synthase, aconitase and isocitrate dehydrogenase)19. Consequently, they depend on their host for metabolic exchange of TCA cycle intermediates and glucose-6-phosphate19. In contrast, a glucokinase and a complete TCA cycle are found in virtually all genomes of environmental19 and CC-IV chlamydiae. These patterns suggest that all pathways mentioned above were present in the common ancestor of environmental chlamydiae, CC-IV and Chlamydiaceae, and subsequently lost in Chlamydiaceae.
Many flagellar components are present in CC-IV and individual chlamydial lineages branching at the base of the environmental chlamydiae and CC-IV/Chlamydiaceae clades, while only a few components were identified in Chlamydiaceae (Supplementary Discussion 3). Phylogenetic analyses of these flagellar components indicate that they were already present in the last common ancestor of Chlamydiaceae, environmental and CC-IV chlamydiae (Supplementary Fig. 8, Supplementary Data 4, Supplementary Discussion 3). Yet, the few subunits present in the Chlamydiaceae seem to have been co-opted to act alongside their NF-T3SS20.
2.3 Gain of virulence and host interaction factors in Chlamydiaceae evolution

In comparison to other lineages of the Chlamydiae, Chlamydiaceae genomes are characterized by a set of unique and functionally annotated core genes (Fig. 3b, Supplementary Fig. 6) that encode proteins associated with host interaction and virulence.
In particular, and in agreement with previous studies17, we observed an expansion of Polymorphic Outer Membrane Protein (POMP) families uniquely in members of the Chlamydiaceae14. POMPs allow for niche-specific adhesion of chlamydial cells to their animal hosts, and also aid in immune system evasion through their antigenic diversity1. Several additional outer membrane proteins are also conserved across Chlamydiaceae (Supplementary Fig. 6).
Another striking example of a protein uniquely conserved among members of the Chlamydiaceae is the carbohydrate-selective porin (OprB) (Supplementary Fig. 6), which is a component of the outer membrane complex of membrane proteins that are surface-exposed in Chlamydiaceae elementary bodies (EBs)21. In addition, all Chlamydiaceae encode one arginine decarboxylase, which most likely functions in the reduction of arginine reserves during host cell infection, but could also protect from nitrosative stress22. Furthermore, most Chlamydiaceae uniquely encode a Membrane Attack Complex/Perforin (MACPF) protein14,23. While the exact function of the MACPF in Chlamydiaceae is unclear, it may assist in the acquisition and processing of lipids derived from the host, play a role in host immune system avoidance or facilitate host entry through pore formation14.
Finally, Chlamydiaceae are characterized by a large number of highly conserved NOGs and protein family (PF)24 domains with unknown function (Supplementary Fig. 6), some of which could play a role in the pathogenic lifestyle of members of this family.
2.4 Gene acquisition events unique to CC-IV and Chlamydiaceae

Members of CC-IV and Chlamydiaceae appear to encode seven gene families (by NOG or PF domain) that are absent in all other Chlamydiae lineages (Supplementary Fig. 7a) and were likely gained prior to the divergence of these two clades. Despite their presence in all representative Chlamydiaceae genomes investigated here, the function of most of these proteins is unknown (with the exception of COG0400 in the genome of Chlamydia sp. 2742-308). Their maintenance across Chlamydiaceae, despite massive gene loss during evolution of this family (Fig. 3a-b), suggests that these proteins play important roles in their pathogenic lifestyles. To further investigate their evolutionary history, we inferred single-gene phylogenies for two of these protein families (PF04518 and PF05302), which are thus far taxonomically restricted to CC-IV and Chlamydiaceae (see Methods).
While only one protein with the PF domain PF04518 is found in CC-IV member Chlamydiae bacterium K940_chlam_9, genomes of Chlamydiaceae encode four or five proteins with this domain (Supplementary Fig. 7a). A phylogenetic analysis of proteins assigned to PF04518 (Supplementary Fig. 7a) revealed that this gene family appears to have undergone several gene duplication events, after the divergence of CC-IV and Chlamydiaceae and prior to the diversification of the latter, resulting in four distinct gene copies (Supplementary Fig. 7b). Each of these copies belongs to one of four different highly supported clades (BV > 98) (Supplementary Fig. 7b). Genes encoding proteins from clades 1 and 2, and those encoding proteins from clades 3 and 4, are each co-located in a gene cluster in the genomes of Chlamydiaceae. A subset of proteins assigned to clade 4 experienced an additional gene duplication event in Chlamydia trachomatis, Chlamydia muridarum and Chlamydia suis, and form a distinct sub-clade (clade 5) within clade 4 (Supplementary Fig. 7b). Previous investigation of this protein family in Chlamydiaceae has shown that its members contain a Non-Flagellar Type III Secretion System (NF-T3SS) signal25 and appear to be secreted by the NF-T3SS as effectors26. They may act by targeting nuclear functions, since they are found in the nucleus of infected host cells25,26. Proteins with the PF04518 domain in Chlamydiaceae likely underwent neofunctionalization following the duplication events described above. Understanding the function of the single-copy protein in Chlamydiae bacterium K940_chlam_9 could help determine the ancestral function of the protein and how it impacted the evolution of Chlamydiaceae pathogenicity.
We also observed conserved gene duplications shared between CC-IV and Chlamydiaceae in the case of proteins with the domain PF05302 (Supplementary Fig. 7a). In this case, a phylogenetic analysis revealed three distinct clades, with one copy from Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae members found in each (Supplementary Fig. 7c). All three copies are organized together in the genomes of both Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae members (Supplementary Fig. 7c). Together, these results indicate that this gene family underwent several gene duplication events prior to the divergence of CC-IV and Chlamydiaceae. Chlamydia trachomatis PF05302 domain-containing homologs CT847 (clade 3) and CT849 (clade 1) (Supplementary Fig. 7c, Supplementary Data 4) have both been characterized as T3SS substrates, and likely effectors27,28. Chlamydia trachomatis homolog CT847 appears to interact with mammalian Grap2 Cyclin D-Interacting Protein (GCIP), a protein involved in the eukaryotic cell cycle27. Examining the role of the three proteins with the domain PF05302 in Chlamydiae bacterium K940_chlam_9 could help in elucidating their ancestral functions, thereby aiding in understanding the contributions of this protein family to Chlamydiaceae evolution.
Future, more fine-grained investigations of gene content evolution in members of the Chlamydiae, with the inclusion of CC-IV members, will be crucial to better understand the evolutionary trajectories that led to the ecological success of Chlamydiaceae as animal pathogens.
3. Secretion systems and flagella in Chlamydiae

3.1 Detection of secretion systems, flagella and effectors

The secretion of proteins and other molecules by secretion systems is important for host association, microbial interactions and interactions with the environment. We screened all available Chlamydiae genomes for type I to VI secretion systems (T1SS to T6SS), flagella and related genes with MacSyFinder29 (see Supplementary Data 3). Most chlamydiae genomes were found to contain genes for T1SS, T2SS, T3SS and T5SS, while only a few encoded T4SS and flagella genes. We discuss each of these systems below, following the gene nomenclature proposed by Abby et al.30.
T1SS. T1SSs are simple one-step protein secretion systems that are formed by three components: an inner membrane ABC transporter, an outer membrane component and a bridging membrane fusion protein. All three components were found in most of the surveyed chlamydiae. However, the membrane fusion protein was not detected in most CC-IV and CC-III lineages, and all three components were absent in most Chlamydiaceae (Supplementary Data 3).
T2SS. T2SSs are complex protein secretion systems formed by outer membrane, inner membrane and pseudopilus apparatuses, and a cytoplasmic ATPase31,32. The most commonly detected homologs for this system in Chlamydiae were GspD, GspF and GspG, which represent the central core proteins of the three above-mentioned T2SS structural apparatuses, respectively. We also detected the cytoplasmic ATPase GspE in a few Chlamydiae genomes, and PilB (a homolog of GspE in type 4 pili) was often detected in those genomes where GspE was missing (Supplementary Data 3). The core genes gspDEFG were generally co-located in tandem (Supplementary Fig. 10). Other T2SS components, such as the minor pseudopilins GspHIJK (labeled as 'mandatory' by MacSyFinder) and other non-essential proteins (labeled as 'accessory' by MacSyFinder), were often absent. However, situated immediately upstream of gspDEFG, we detected either the minor pseudopilin genes gspHIJK, or genes with a similar length pattern and a significant BLAST e-value (Supplementary Fig. 10). Taken together, these results suggest that chlamydiae harbour a variant of the classical T2SS. Furthermore, our observations agree in part with Peabody et al.31, who described the presence of genes gspCDEFG in Chlamydia and Chlamydophila genomes: we were unable to detect GspC, while Peabody et al. were unable to detect the minor pseudopilins GspHIJK.
Non-flagellar T3SS (NF-T3SS), flagellum and T3SS-secreted effectors. NF-T3SSs are complex protein secretion systems generally associated with eukaryotic host interactions, and have previously been shown to be essential for virulence in Chlamydiaceae (reviewed in e.g.,33,34). NF-T3SS components evolved through exaptation of proteins constituting the bacterial flagellum35, which complicates their unambiguous detection and annotation. The components screened in the present study include the outer membrane ring secretin (SctC), the inner membrane ring (SctJ), the secretion apparatus (SctRSTUV), the sorting platform (SctQ) and the cytoplasmic ATPase (SctN). SctC is unique to the NF-T3SS, while the other components share homology with flagellar proteins. To evaluate whether the chlamydial homologs were NF-T3SS or flagellar genes, we performed phylogenetic analyses of various individual genes, as well as of concatenated alignments, using as reference the SctN sequences published by Abby and Rocha35 and the other discussed NF-T3SS sequences used in Abby et al.30. Similar to previous studies35, our phylogenetic analyses (Supplementary Fig. 8, Supplementary Data 4) place a myxococcal NF-T3SS system as sister to all other bacterial NF-T3SS sequences. The latter then diverge into the chlamydial sequences on the one hand and those of the remaining bacteria on the other. Our analyses retrieve the monophyly of the main chlamydial clades, which is in overall agreement with the species tree. These results confirm that the NF-T3SS is found across Chlamydiae.
The NF-T3SS genes are distributed over three gene clusters, one containing sctN, sctQ and sctC, one containing sctJ, sctR, sctS and sctT, and one containing sctU and sctV (Supplementary Fig. 9). The gene order and neighborhood of these clusters are highly conserved in all Chlamydiae genomes, as has been shown previously for environmental chlamydiae and Chlamydiaceae8,9. The gene order conservation allowed us to detect sctC homologs in many genomes where this gene was not detected by MacSyFinder: we identified significant BLAST hits to known SctC sequences in their expected position near the sctN and sctQ genes. The three gene clusters were interspersed and flanked with other conserved genes on the same strand. While their function could not be determined in most cases, a gene situated between sctQ and sctC encoded a serine/threonine protein kinase that has been suggested to participate in NF-T3SS protein secretion10. Altogether, we hypothesize that the NF-T3SS genes were acquired by the common ancestor of Chlamydiae and have since been inherited vertically.
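The use of conserved gene order to recover a missed component, as described above for sctC, can be sketched as a neighborhood search. This is only an illustration under stated assumptions: the function, the window size and the gene identifiers are hypothetical, and the sketch merely lists unannotated genes lying close to the anchor genes; in the analysis described above, such candidates were confirmed through significant BLAST hits to known SctC sequences.

```python
def candidate_genes_near(gene_order, annotations, anchors, window=3):
    """gene_order: gene ids in their order along one contig.
    annotations: {gene_id: name} for genes that already have an annotation.
    anchors: names of annotated genes expected to neighbor the missing gene.
    Returns unannotated gene ids within `window` positions of every anchor."""
    positions = {annotations[g]: i for i, g in enumerate(gene_order) if g in annotations}
    if not all(a in positions for a in anchors):
        return []  # anchors missing from this contig, nothing to search around
    hits = []
    for i, gene in enumerate(gene_order):
        if gene in annotations:
            continue
        if all(abs(i - positions[a]) <= window for a in anchors):
            hits.append(gene)
    return hits

# Hypothetical contig: sctN, sctQ and a kinase are annotated, sctC was not detected.
gene_order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
annotations = {"g2": "sctN", "g3": "sctQ", "g4": "kinase"}
candidates = candidate_genes_near(gene_order, annotations, anchors=("sctN", "sctQ"), window=3)
```

Here the search returns the two unannotated genes flanking the annotated cluster, which would then be compared against reference SctC sequences in a separate step.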
In contrast to the ubiquity of NF-T3SS, we found flagellar genes only in a handful of genomes, including CC-IV and four marine chlamydiae related to the environmental chlamydiae and Chlamydiaceae clades36,37. Although many chlamydiae genomes were found to contain putative flagellar homologs of the sctN and sctQ genes, most turned out not to be associated with flagellar function. First, most proteins detected as flagellar SctQ homologs branched instead with NF-T3SS genes in phylogenetic analyses (ufBV=94%; SH-aLRT=92%) or within a clade extremely distantly related to flagellar homologs (Supplementary Data 4). Second, phylogenetic analyses of chlamydial SctN homologs similarly revealed that these often branched with non-flagellar ATPases (Supplementary Data 4). However, phylogenetic analyses were inconclusive regarding the putative function of a group of proteins annotated as flagellar SctN homologs by MacSyFinder in Chlamydiaceae: while they are more closely related to flagellar sequences than to other homologs, they are not nested within them (Supplementary Data 4). Putative flagellar homologs of SctV in Chlamydiaceae also form long-branching clades related to known flagellar sequences, indicating these proteins could represent divergent flagellar homologs (Supplementary Data 4). Remarkably, flagellar homologs of SctN (FliI) and SctV (FlhA) in Chlamydiaceae have been shown to interact with the NF-T3SS protein complex, suggesting they have been co-opted to a new function in protein secretion20.
In contrast, the above-mentioned marine chlamydiae and the CC-IV Chlamydiae K940_chlam_9 and KR12_chlam_2 were found to contain a large array of flagellar genes (Supplementary Figs. 4 and 8; Supplementary Data 3). Even though CC-IV Chlamydiae bacterium K1000_chlam_4 contained only a copy of the flagellar homolog of sctR (fliP), it is possible that this genome encodes a full flagellar gene set, since this MAG is relatively incomplete (Fig. 2, Supplementary Table 2) and the contig containing fliP ends right after this gene, and thus before the expected location of the flagellar homologs of the sctS (fliQ) and sctT (fliR) genes. The flagellar genes of CC-IV and the marine chlamydiae listed above form a well-supported clade in the phylogeny of concatenated NF-T3SS genes and in single-gene trees of SctJ, SctR (Supplementary Fig. 8), SctN and others (Supplementary Data 4). Therefore, given the species phylogeny obtained in the present study (Fig. 2, Supplementary Fig. 3), these results suggest that gene sets for the flagellum were present in the common ancestor of the environmental chlamydiae, Chlamydiaceae and CC-IV clades, and were subsequently lost in the former two groups but retained in their CC-IV and unclassified relatives. However, further phylogenetic analyses that adequately model the extreme divergence of the Chlamydiaceae genes of putative flagellar origin will be required to verify these inferences.
371 Since we were able to predict the existence of a NF-T3SS in several chlamydiae, we used
372 EffectiveDB38 to predict T3SS-secreted proteins, eukaryotic-like domains (ELD), and putative
373 subcellular targeting signals to eukaryotic cellular compartments. None of the non-chlamydial
374 PVC bacteria were predicted to have any T3SS-secreted proteins. In contrast, 9% to 28% of the proteins in
375 chlamydial proteomes were predicted to possess a T3SS-associated signal peptide. However,
376 we could not identify major differences between various subclades of chlamydiae regarding
377 most features predicted by EffectiveDB (Supplementary Data 3). A notable exception concerns
378 the prediction of CCBD (conserved chaperone-binding domain) motifs, which are usually
379 found in the N-terminal region of T3SS-secreted proteins and have been shown to serve as
380 binding sites for chaperones that facilitate the correct selection and unfolding of T3SS-dependent
381 effector proteins. We found that Anoxychlamydiales members were enriched in proteins
382 predicted to have a CCBD motif (average 4.58 ± 0.78%) compared to other chlamydiae (2.52
383 ± 1.20%). However, the significance of this result is difficult to assess given that these lineages
384 do not appear to be enriched in predicted T3SS-secreted proteins. Intriguingly, five chlamydiae
385 were not predicted to have any T3SS-secreted proteins, although all of these organisms are
386 predicted to have a NF-T3SS. In addition, Verrucomicrobium spinosum, a verrucomicrobium
387 recently described to have a NF-T3SS39, was also not predicted to have any T3SS-secreted
388 proteins or CCBD motifs. This suggests that predictive tools such as EffectiveDB are currently
389 unable to model the entire diversity of protein motifs that are recognized by the T3SS and their
390 chaperones, and that differences between chlamydial lineages in terms of their T3SS-secreted
391 proteins will have to be revisited when more sensitive predictive tools become available.
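The per-group percentages reported above (mean ± standard deviation of the share of proteins with a predicted CCBD motif) could be summarised along the following lines. This is a minimal sketch under stated assumptions: the genome names and counts are hypothetical placeholders, not values from this study, the function name `ccbd_summary` is introduced here for illustration, and EffectiveDB itself is not invoked.

```python
from statistics import mean, stdev

# Hypothetical per-genome counts: (group, total proteins, proteins with a predicted CCBD motif).
# Placeholder values for illustration only; these are not data from this study.
genomes = {
    "genome_A": ("Anoxychlamydiales", 1000, 45),
    "genome_B": ("Anoxychlamydiales", 1200, 60),
    "genome_C": ("other", 900, 18),
    "genome_D": ("other", 1100, 33),
}


def ccbd_summary(table):
    """Return {group: (mean %, sample SD %)} of proteins with a predicted CCBD motif."""
    by_group = {}
    for group, total, with_ccbd in table.values():
        by_group.setdefault(group, []).append(100.0 * with_ccbd / total)
    # The sample standard deviation needs at least two genomes per group.
    return {g: (mean(v), stdev(v) if len(v) > 1 else 0.0) for g, v in by_group.items()}


summary = ccbd_summary(genomes)
for group, (avg, sd) in summary.items():
    print(f"{group}: {avg:.2f} ± {sd:.2f}%")
```

Whether such a difference is meaningful would still depend on the number of genomes per group and on the sensitivity of the underlying predictions, as noted in the text.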
392 T4SS. T4SSs are versatile systems generally involved in contact-dependent translocation
393 of proteins and DNA. We detected most of the Type F T4SS genes in a subset of Chlamydiae
394 genomes: Waddliaceae bacterium SP13, Chlamydiae bacterium K1060_chlam_2 (CC-I),
395 Chlamydiae bacterium K1000_chlam_3 (CC-II), R. massiliensis, Parachlamydia spp. and
396 Protochlamydia spp. (Supplementary Fig. 11). This patchy distribution of T4SS genes in
397 chlamydiae is consistent with the idea that these genes were recently acquired via horizontal
398 gene transfer (HGT).
399 T5SS. T5SS are two-step protein secretion systems, generally substrate-specific and
400 containing one to three components. T5SS classical autotransporters (type 5a secretion systems,
401 T5aSS) and translocators (type 5b secretion system, T5bSS) were identified in most CC-I, II,
402 III and Anoxychlamydiales members, while most environmental chlamydiae and
403 Chlamydiaceae only contained T5aSS autotransporters.
404 Other systems. Finally, while a few homologs were detected for other systems such as
405 T6SS, Tad pili and type IV pili (T4P), these remained largely incomplete in all genomes,
406 indicating these inferences likely represent false positives.
407
408 3.2 Putative functions of secretion systems in marine sediment Chlamydiae
409
410 The observations made above largely corroborate findings made in previous studies, which
411 indicate that most Chlamydiae contain T1SS, T3SS and T5aSS, and provide new evidence for
412 the presence of T2SS and the sparse presence of T4SS and flagella3,17,30,40. The presence of
413 secretion systems is commonly interpreted in the light of host-symbiont dynamics. However,
414 the lack of identified eukaryotes in the presented samples (see Supplementary Discussion 6)
415 raises the possibility that these perform alternative functions in at least some of the newly
416 discovered chlamydial lineages.
417 In Chlamydiaceae, T2SS, T3SS and T5aSS are typically linked to host adhesion, invasion
418 and manipulation15,41,42. Similarly, environmental chlamydiae have also been shown to express
419 secretion systems during infection of microbial eukaryotes43,44. However, whether their
420 function is conserved throughout the Chlamydiae phylum remains unclear. For example, a
421 recent transcriptomic study found that the expression of T3SS is higher in reticulate bodies than
422 in elementary bodies in C. abortus, but found the opposite pattern in W. chondrophila44. The
423 exact nature of the cell cycle and the function of secretion systems in these lineages are yet to
424 be elucidated. Furthermore, the extracellular stage of the chlamydial cell cycle is considerably
425 understudied45, and little is known about potential chlamydial interactions with other microbes.
426 Despite the traditional link between secretion systems and host-association, some of these
427 systems have been described to target prokaryotes. For example, T6SS are typically used for
428 bacteria-bacteria interactions46, and Gram-positive T7SS have been described to target bacteria
429 under certain conditions47. T1SS, T4SS and T5bSS, all present in chlamydial genomes, have
430 also been shown to target bacterial cells48-51. In Legionella pneumophila, T2SS and T4P
431 facilitate biofilm formation and retention52,53, and the former is involved in sliding motility54
432 and extracellular survival in freshwater. Taken together, this suggests that the presence of
433 various secretion systems does not necessarily imply interactions with eukaryotic hosts.
434 In line with this, NF-T3SS have also been described in bacteria that are not known to
435 interact with eukaryotes55. For example, a NF-T3SS has been identified in Verrucomicrobium
436 spinosum, a generally free-living organism (though it has been shown to have detrimental
437 effects when experimentally inoculated in fruit flies and Caenorhabditis elegans39). NF-T3SS
438 has also been found alongside T6SS, chemotaxis and flagellar genes, in strains of Vibrio and
439 Aeromonas associated with microbial biofilms, but which are not known to associate with
440 eukaryotes39. The presence of NF-T3SS in Myxococcales is particularly interesting, given that
441 they are not associated with a host and contain a highly divergent version of the NF-T3SS,
442 which represents a sister to all other NF-T3SS and lacks various genes generally associated
443 with this system35. The chlamydial version of the NF-T3SS, which is placed as sister clade to
444 all non-myxococcal NF-T3SS (Supplementary Fig. 8) may hold functional similarities with the
445 more divergent Myxococcales NF-T3SS35.
446 In conclusion, these observations indicate that the presence of various secretion systems
447 does not necessarily imply a eukaryote-associated lifestyle. Alternatively, proteins secreted by
448 secretion systems could play a role in growth in biofilms, in the interaction with other microbial
449 groups or in the modification of the environment. Studies about the biology of chlamydial
450 elementary bodies, as well as visualisation of representatives of the newly discovered lineages
451 will be instrumental to answer this question.
452
453 4. Phylogenetic diversity of chlamydial nucleotide transporters.
454 Nucleotide transporters (NTTs) belong to the ‘ATPases Associated with diverse cellular
455 Activities’ (AAA) family of proteins56 and can transport a range of metabolites, including ATP,
456 the cofactor nicotinamide adenine dinucleotide (NAD+), ribonucleotides and
457 deoxyribonucleotides across a membrane. NTTs are found in diverse lineages in the tree of life.
458 In plastid-bearing eukaryotes, the ATP/ADP NTT proteins are essential for the import of ATP
459 into the organelle from the cytosol57,58. Some obligate intracellular pathogenic eukaryotes (e.g.,
460 Microsporidia16,56) and obligately symbiotic bacteria (e.g., members of the Chlamydiaceae and
461 Rickettsia59) use NTTs to import ATP and other nucleotides from their eukaryotic hosts. A
462 recent investigation of NTT phylogenetic diversity found that NTT homologs are also found in
463 a diverse set of free-living organisms60.
464 All chlamydial genomes investigated to date, including those from marine sediment
465 chlamydiae, encode multiple NTT homologs (Supplementary Fig. 4, Supplementary Data 3),
466 although the number of homologs varies across different representatives of this phylum.
467 Depending on the clade, we observed two (CC-IV and Chlamydiaceae), four (CC-I, CC-II and
468 environmental chlamydiae) or five (Anoxychlamydiales) distinct NTT paralogs
469 (Supplementary Data 3). To classify the NTTs of marine sediment chlamydiae relative to those
470 from characterized lineages, we reanalysed the phylogenetic diversity of NTT proteins across
471 the tree of life. For this, we followed a previous analysis that resolved the NTT superfamily60 into three groups:
472 “canonical NTTs”, “other NTTs” and, in addition, proteins with NTT-HEAT domains.
473
474 4.1 Phylogenetic diversity of “canonical NTTs”
475
476 “Canonical NTTs” have a single TLC (PF03219) protein domain architecture, and include most
477 functionally characterized NTTs, such as the ATP/ADP transporters from plastids,
478 Microsporidia, Rickettsia and Chlamydiae (Supplementary Fig. 12a). Phylogenetic analyses of
479 these “canonical NTTs” resolved nine distinct groups of chlamydial sequences (Supplementary
480 Fig. 12a). Most ATP/ADP transporters of primary and secondary plastid-bearing lineages (e.g.,
481 archaeplastids, diatoms, haptophytes, red algae and brown algae) formed a strongly supported
482 clade (ufBV = 95, Supplementary Data 4). Notably, in spite of the cyanobacterial origin of
483 plastids61, the closest prokaryotic homologs of plastid-derived ATP/ADP transporters are
484 represented by homologs of Chlamydiae (Supplementary Fig. 12a). As previously
485 hypothesized59,60,62,63, this suggests that plastid-bearing lineages acquired the NTT gene from a
486 chlamydial-like donor early in the evolution of this organelle. These chlamydial ATP/ADP
487 translocases formed a clade (ufBV = 91, Supplementary Data 4), referred to as cluster 9
488 (Supplementary Fig. 12a). Cluster 9 contains only one representative sequence from each major
489 chlamydial clade, except for Anoxychlamydiales members, each of which has two paralogs.
490 The topology within cluster 9 (Supplementary Data 4) is consistent with the organismal
491 phylogeny (Fig. 2), suggesting that this protein has evolved vertically within the phylum. The
492 substrate specificities of several proteins in cluster 9 have been experimentally characterized.
493 Chlamydia trachomatis (Ct)56, Candidatus Protochlamydia acanthamoebae (Pam)59 and
494 Simkania negevensis (Sn)59,64 encode ATP/ADP-specific NTT1 antiporters that participate in
495 energy parasitism during infection of their hosts. Interestingly, CtNTT165 can also act as an
496 NAD+/ADP antiporter, providing a mechanism through which members of the Chlamydiaceae
497 can acquire NAD+, which they are unable to synthesize66,67.
498 Clusters 6 and 7 were each composed of several representatives of the environmental
499 chlamydiae, and branched closely to alphaproteobacterial lineages, including ADP/ATP
500 transporters from Rickettsia59, suggesting they may have similar functional roles in chlamydiae.
501 Cluster 8 includes CC-I, one representative from Anoxychlamydiales, and several
502 representatives from the environmental chlamydiae (Supplementary Fig. 12a). SnNTT3, found
503 in cluster 8, has been characterized and is a proton-independent general NTP transporter, which
504 is also capable of transporting the deoxyribonucleotide triphosphate dCTP64.
505 Five different chlamydial clusters, clusters 1-5, together formed a well-supported group
506 (ufBV = 100). Cluster 5 includes CC-II and environmental chlamydiae (Supplementary Fig.
507 12a) and PamNTT2, a proton-independent transporter of all four canonical ribonucleoside
508 triphosphates68. Cluster 4 is composed solely of sequences from environmental chlamydiae
509 (Supplementary Fig. 12a) and includes PamNTT3, a proton-energized symporter that transports
510 UTP68. Cluster 3 is composed of members of CC-I, CC-II, CC-III, Anoxychlamydiales and
511 includes SnNTT2, a proton-dependent symporter of GTP and ATP64. Cluster 2 comprises
512 sequences from CC-IV and Chlamydiaceae, and includes CtNTT2, a proton-driven symporter
513 of all four NTPs56 (Supplementary Fig. 12a). Cluster 1 contains homologs from environmental
514 chlamydiae, Anoxychlamydiales and some members of CC-II including PamNTT5, a proton-
515 energized symporter that transports both GTP and ATP68 (Supplementary Fig. 12a).
516 Finally, we observe a clear functional conservation within the experimentally
517 characterized H+-driven symporters, all of which group together in clusters 1, 2, 3 and 4 (ufBV
518 = 88; Supplementary Fig. 12a, Supplementary Data 4). This suggests that phylogenetic
519 reconstructions of NTT homologues have predictive power in terms of mode of transport,
520 but not substrate specificity.
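The pattern summarised above can be checked by tabulating the experimentally characterized canonical NTTs named in this section. The sketch below simply restates, as data, the cluster assignments, transport modes and substrates given in the text; the mode labels ("antiporter", "proton-independent", "H+ symporter") are shorthand introduced for this illustration, and no new functional data are implied.

```python
from collections import defaultdict

# Characterized canonical NTTs as listed in the text above:
# (protein, cluster, transport mode, substrates as described).
# The mode labels are shorthand introduced for this sketch.
CHARACTERIZED = [
    ("CtNTT1", 9, "antiporter", "ATP/ADP, NAD+/ADP"),
    ("PamNTT1", 9, "antiporter", "ATP/ADP"),
    ("SnNTT1", 9, "antiporter", "ATP/ADP"),
    ("SnNTT3", 8, "proton-independent", "NTPs, dCTP"),
    ("PamNTT2", 5, "proton-independent", "all four NTPs"),
    ("PamNTT3", 4, "H+ symporter", "UTP"),
    ("SnNTT2", 3, "H+ symporter", "GTP, ATP"),
    ("CtNTT2", 2, "H+ symporter", "all four NTPs"),
    ("PamNTT5", 1, "H+ symporter", "GTP, ATP"),
]

clusters_by_mode = defaultdict(set)
substrates_by_mode = defaultdict(set)
for protein, cluster, mode, substrates in CHARACTERIZED:
    clusters_by_mode[mode].add(cluster)
    substrates_by_mode[mode].add(substrates)

# Each transport mode maps to its own set of clusters, whereas substrate
# ranges vary within a mode and recur across modes.
for mode in sorted(clusters_by_mode):
    print(mode, sorted(clusters_by_mode[mode]), sorted(substrates_by_mode[mode]))
```

In this tabulation the H+-driven symporters occupy clusters 1 to 4 exclusively while covering three different substrate ranges, and "all four NTPs" appears under both a symporter (CtNTT2) and a proton-independent transporter (PamNTT2).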
521
522 4.2 Phylogenetic diversity of “other NTTs” with a single-domain architecture
523
524 A bacterial-dominated group of “other NTTs” has been described to have a single TLC
525 (PF03219) protein domain architecture, and most proteins in this group have yet to be
526 functionally characterized (Supplementary Fig. 12b). In the phylogenetic analysis of those
527 “other NTTs”, we recovered two groups of chlamydial sequences (Supplementary Fig. 12b).
528 Cluster 11 includes representatives from environmental chlamydiae and Anoxychlamydiales
529 (Supplementary Fig. 12b) and branches as a sister clade to a clade comprising sequences from
530 the Candidatus Dependentiae (formerly TM6) phylum (Supplementary Fig. 12b). Ca.
531 Dependentiae have reduced genomes and are thought to lead host-associated lifestyles with
532 eukaryotic hosts69. Cluster 10 includes members of CC-I (including SnNTT4, whose substrate
533 specificity could not be determined in a prior study64), CC-II, and Anoxychlamydiales and
534 forms a maximally supported clade composed largely of Deltaproteobacteria, including several
535 Bdellovibrio-and-like-organisms (BALOs). BALOs have a predatory lifestyle whereby they
536 invade the periplasm of other Gram-negative bacteria to harvest nutrients70. However, this
537 invasive lifestyle is not obligate, as BALOs can grow axenically under nutrient rich conditions.
538
539 4.3 Phylogenetic diversity of “NTT-HEAT” proteins
540
541 “NTT-HEAT” family NTTs60 have an additional C-terminal HEAT domain (PF13646,
542 PF02985) and are found across a wide range of free-living bacteria, though none have been
543 functionally characterized thus far (Supplementary Fig. 12c). HEAT domains are involved in
544 protein-protein interactions71, and thus may alter the function of the NTT domain in NTT-
545 HEAT proteins60. These putative NTTs are hypothesized to facilitate inter-microbial nutrient
546 exchange during bacteria-bacteria interactions60. For example, these NTTs could be involved
547 in multicellular development in Cyanobacteria, which have proteins with this domain
548 architecture60. An ML phylogeny of proteins with this domain architecture (Supplementary Fig.
549 12c) recovered one chlamydial clade, cluster 12, which includes the Chlamydiaceae, CC-IV,
550 CC-III, some environmental chlamydiae and several members of Anoxychlamydiales.
551
552 4.4 NTTs in marine sediment chlamydiae
553
554 NTTs identified in marine sediment chlamydiae from this study clustered together with other
555 chlamydial homologs (Supplementary Fig. 12). Despite distinct phyletic distribution patterns
556 of the NTTs found in different chlamydiae clades, the ubiquity of NTT homologues in different
557 chlamydial lineages gives weight to the proposed ancient origin of NTTs within the Chlamydiae
558 phylum59,60. Marine sediment chlamydiae appear to have a similar set of NTTs to other
559 members of the phylum, including homologs closely related to functionally characterized
560 NTTs. However, due to the promiscuous functions of NTTs59,67, we cannot predict the substrate
561 specificity of homologs found in marine sediment chlamydiae. Although they have homologs
562 of the canonical ATP/ADP transporter (Supplementary Fig. 12a), functional characterization is
563 necessary to determine their substrate specificity.
564
565 5. Genomic potential for de novo biosynthesis of nucleotides and amino acids across
566 Chlamydiae
567 Many host-associated bacteria, and particularly obligate intracellular bacteria, are able to
568 acquire essential amino acids and nucleotides from their hosts. Obligate symbionts often
569 undergo genome reduction and lose the ability to produce these compounds de novo72. Here,
570 we discuss the potential encoded in the marine sediment chlamydiae genomes for de novo amino acid and
571 nucleotide biosynthesis (Supplementary Fig. 4, Supplementary Data 3).
572
573 5.1 De novo biosynthesis of amino acids
574
575 Similar to characterized chlamydiae19, the investigated MAGs seem to be generally auxotrophic
576 for many amino acids. In fact, no chlamydiae representative with the capacity to synthesize all
577 amino acids has been identified thus far. Below, we discuss some examples of more extensive
578 amino acid and nucleotide biosynthesis capabilities in specific Chlamydiae lineages.
579 Proline. Environmental chlamydiae do not have the coding potential to synthesize all
580 amino acids de novo19, but generally encode a larger set of amino acid biosynthetic capabilities
581 than Chlamydiaceae (e.g., proline biosynthesis). We identified all genes for proline de novo
582 biosynthesis in more than half of environmental chlamydiae genomes (9/15; including marine
583 sediment lineage Chlamydiae bacterium K940_chlam_7), and in one member of CC-IV
584 (Chlamydiae bacterium K940_chlam_9). This observation is in line with a scenario in which
585 proline biosynthesis was present in the common ancestor of the CC-IV, Chlamydiaceae and
586 environmental chlamydiae, and was subsequently lost in the Chlamydiaceae.
587 Aromatic amino acids. The ability to synthesize aromatic amino acids (i.e., tryptophan,
588 phenylalanine, and tyrosine) displays a patchy distribution among the analysed chlamydiae.
589 Seemingly, only S. negevensis is capable of synthesizing all three amino acids19. The genomes
590 of Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 and its close relatives
591 (Supplementary Fig. 13, Supplementary Data 3) encode the potential for phenylalanine and
592 tyrosine biosynthesis and a near-complete pathway for tryptophan biosynthesis. While
593 Waddliaceae bacterium SP13, Chlamydiales bacterium SCGC AG-110-P3 and Parachlamydia
594 sp. C2 each encode a complete pathway for the biosynthesis of tryptophan, some
595 Chlamydiaceae19 were found to encode a near-complete pathway (Supplementary Fig. 4).
596 Other amino acids. Chlamydiae generally do not encode pathways for the biosynthesis
597 of arginine, methionine, histidine, leucine, isoleucine and valine. We identified some notable
598 exceptions, including a complete leucine biosynthesis pathway in Parachlamydia sp. BC.030
599 and a near-complete histidine biosynthesis pathway (7/8 components; Supplementary Data 3)
600 in Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 and close relatives
601 (Supplementary Fig. 13, Supplementary Data 3).
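The labels "complete" and "near-complete" used in this section (for example, 7/8 components for histidine biosynthesis) amount to counting how many required pathway components are detected in a genome. The following is a minimal sketch of that bookkeeping. The step names are hypothetical placeholders rather than real gene or KEGG identifiers, and the rule that at most one missing component still counts as "near-complete" is an assumption made for illustration, not a threshold stated by the authors.

```python
def pathway_status(required_steps, detected, max_missing_for_near_complete=1):
    """Classify a pathway from the set of detected components.

    The cut-off for "near-complete" is an illustrative assumption.
    Returns (status, sorted list of missing components).
    """
    required = set(required_steps)
    missing = sorted(required - set(detected))
    if not missing:
        status = "complete"
    elif len(missing) == len(required):
        status = "absent"
    elif len(missing) <= max_missing_for_near_complete:
        status = "near-complete"
    else:
        status = "partial"
    return status, missing


# Hypothetical eight-component pathway with one component not detected.
steps = [f"step{i}" for i in range(1, 9)]
detected = set(steps) - {"step5"}

status, missing = pathway_status(steps, detected)
print(status, f"{len(steps) - len(missing)}/{len(steps)}", missing)
```

For MAGs and SAGs, a missing component may reflect genome incompleteness rather than true absence, so such labels are best read alongside the completeness estimates of each genome.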
602
603 5.2 De novo biosynthesis of nucleotides
604
605 Pyrimidine. Chlamydiaceae and most members of the environmental chlamydiae are
606 auxotrophic for both purine and pyrimidine nucleotides19. Previous
607 work has identified pyrimidine biosynthesis (i.e., uridine monophosphate (UMP) biosynthesis
608 from glutamine; KEGG module M00051)73,74 in W. chondrophila WSU 86-104425 and
609 Criblamydia sequanensis CRIB-1873,74. We identified a complete pyrimidine biosynthesis
610 pathway in members of CC-II (Chlamydiae bacterium Ga0074140), CC-III (Chlamydiae
611 bacterium CG10_big_fil_rev_8_21_14_0_10_42_34,
612 CG10_big_fil_rev_8_21_14_0_10_35_9), CC-IV (Chlamydiae bacterium K940_chlam_9) and
613 environmental chlamydiae (Chlamydiae bacterium K940_chlam_7, K940_chlam_3,
614 Chlamydiales bacterium SCGC AG-110-M15 and Waddliaceae bacterium SP13). Further, we
615 identified near-complete pathways in additional chlamydiae members of CC-II
616 (Rhabdochlamydia helvetica T3358 and Chlamydiae bacterium K940_chlam_2), other CC-IV
617 lineages, and environmental chlamydiae (Chlamydiae bacterium K940_chlam_3).
618 Purine. Compared to pyrimidine biosynthesis, purine biosynthesis is more sparsely distributed
619 among the Chlamydiae. The first evidence for a near-complete de novo purine biosynthesis
620 pathway (i.e., inosine monophosphate (IMP) biosynthesis from glutamine; KEGG module
621 M00048) in Chlamydiae was recently described in Chlamydiales bacterium SCGC AG-110-
622 M1536. We additionally identified the complete pathway for IMP biosynthesis in Chlamydiae
623 bacterium CG10_big_fil_rev_8_21_14_0_10_35_9 and Waddliaceae bacterium SP13, and
624 partial biosynthesis pathways (i.e., all but the PurE and PurK encoding genes; Supplementary
625 Data 3) in Chlamydiales bacterium SCGC AG-110-M15 and Chlamydiae bacterium
626 K940_chlam_3 (environmental chlamydiae). The former represents a relatively incomplete
627 SAG (Supplementary Table 3), such that the presence of these genes cannot be ruled out.
628 Interestingly, all genomes which encode a complete or near-complete purine de novo
629 biosynthesis pathway also encode a complete or near-complete pathway for de novo pyrimidine
630 biosynthesis. This observation suggests that some chlamydiae might not rely on a host for these
631 essential metabolites. Future analyses aimed at inferring the evolutionary histories of these
632 pathways will help to determine whether nucleotide biosynthesis was ancestrally present in
633 Chlamydiae or rather represents a derived trait acquired by horizontal gene transfer.
634
635 6. Eukaryotes in Loki’s Castle marine sediments
636 All chlamydiae characterized to date represent obligate symbionts with eukaryotic hosts and
637 have a characteristic biphasic lifecycle (intracellular host-associated phase and extracellular
638 elementary body phase). Our analyses revealed that the marine sediment chlamydiae encode
639 key host-association features (e.g., NF-T3SS; Supplementary Data 3, Supplementary
640 Discussion 3) and elementary body factors (e.g., early upstream reading frame transcription
641 factor and histone-like development protein; Supplementary Fig. 4, Supplementary Data 3) and
642 are predicted to be auxotrophic for some nucleotides and amino acids (Supplementary Fig. 4,
643 Supplementary Data 3, Supplementary Discussion 5). These observations would suggest that
644 marine sediment chlamydiae might be host-associated and prompted a thorough search for
645 eukaryotes in these sediments.
646 Indeed, active populations of fungi, protists and macrofauna have been observed in
647 marine sediments75-78. Using universal eukaryotic primer sets, we failed to amplify 18S rRNA
648 gene sequences from the marine sediment samples (Supplementary Table 5), in line with
649 previous analyses of Loki’s Castle marine sediments79. However, we were able to identify
650 several 18S rRNA gene sequences in the obtained metagenomic data (see below;
651 Supplementary Table 4), suggesting that eukaryotes might represent low-abundance community
652 members of these anaerobic marine sediments. Yet, it has been shown that eukaryotic DNA
653 from overlying water columns can be deposited and well-preserved in marine sediments under
654 anoxic conditions80,81. In the present study, we were unable to determine whether the observed
655 eukaryotic DNA sequences were derived from live cells capable of hosting chlamydiae. Below
656 we expand on the identified 18S rRNA gene sequences and discuss their potential sources.
657 Several 18S rRNA gene sequences from samples GS10_PC15_940 (contig-124_471961
658 and contig-124_482067) and GS10_PC15_1000 (contig-124_27583) were classified as
659 mammalian (Supplementary Table 4). These sequences most likely represent human
660 contamination introduced during sampling, DNA extraction or during sequencing.
661 In the GS10_PC15_1060 sample we identified an 18S rRNA gene sequence that likely
662 derives from a flatworm (order Rhabdocoela, contig-124_364989). Chlamydiae bacterium
663 K1060_chlam_2, which corresponds to the only Simkaniaceae-like MAG derived from these
664 marine sediments, was also obtained from this sample. Since several of the previously described
665 Simkaniaceae are known symbionts of marine worms82,83, it is possible that Chlamydiae
666 bacterium K1060_chlam_2 might be a symbiont of the Rhabdocoela-related flatworm observed
667 in this sediment layer.
668 In the GS10_PC15_940 sample we uncovered 18S rRNA gene sequences that likely
669 derive from an ichthyosporean (contig-124_299197 and contig-124_207553), a green alga
670 (Micromonas; contig-124_372972) and a chloroplast genome (contig-124_152295). This
671 sample was shown to also contain the actively replicating (Fig. 4a) and highly abundant
672 Chlamydiae bacterium K940_chlam_7 (Supplementary Discussion 7), which is most closely
673 related to the protist-associated environmental chlamydiae10,11. This raises the possibility that
674 one of the aforementioned eukaryotes could represent a host for Chlamydiae bacterium
675 K940_chlam_7. However, given that Micromonas is a phototroph it is unlikely that these cells
676 are active in dark marine sediments. Moreover, there have been no reported cases of chlamydiae
677 capable of infecting Archaeplastida (algae and land plants). Alternatively, it is possible that the
678 observed ichthyosporean might represent the host organism of Chlamydiae bacterium
679 K940_chlam_7. Yet, little is known about the ecology of ichthyosporeans in marine sediments,
680 and, to our knowledge, there have been no reported cases of chlamydiae capable of infecting
681 ichthyosporeans so far.
682 Eukaryotes present at low abundances in the samples could have been missed by
683 our sequencing efforts. However, in general, the eukaryotic sequences identified in the samples
684 appear insufficient to account for overall patterns in chlamydial diversity and abundance across
685 all samples. No eukaryotic sequences were identified in sample GS08_GC12_126, where
686 Anoxychlamydiales lineages were found to be exceptionally abundant (Supplementary
687 Discussion 7), suggesting that these particular chlamydial lineages may not depend on
688 a eukaryotic host.
689
690 7. Abundance and diversity of chlamydial lineages in Loki’s Castle marine sediments
691 In a previous study of Loki’s Castle marine sediments108, we detected the presence of
692 Chlamydiae. We further investigated the relative abundance and diversity of Chlamydiae in
693 these sediments using amplicon sequencing of samples taken from four different sediment cores
694 at various depths (Supplementary Table 1, Supplementary Data 2). All of the samples with high
695 chlamydial abundances were isolated from sediment depths found below (but within 1.2 m of)
696 the oxic/anoxic transition zone, which is found at various depths below the seafloor in sediment
697 cores GS08_GC12 (0.38 mbsf)74, GS10_PC15 (1.0 mbsf)75, and GS10_GC1475 (0.4 mbsf). The
698 highest diversity of chlamydial OTUs (over 0.1% relative abundance) was observed in anoxic
699 sediment layers (Fig. 1b). When considering individual OTUs found across the marine sediment
700 amplicons, 30 were found to be present in at least five samples (Supplementary Data 2),
701 indicating that a large fraction of the observed chlamydial lineages are commonly found in this
702 environment.
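The two quantities used above, per-sample relative abundance and the number of samples in which an OTU occurs, can be derived from an OTU count table as sketched below. This is an illustrative sketch only: the sample names, OTU names and read counts are hypothetical placeholders, and the function names are introduced here rather than taken from the analysis pipeline. The thresholds (0.1% relative abundance, presence in at least five samples) would be passed as parameters.

```python
# Hypothetical OTU count table: {sample: {OTU: read count}}.
# Placeholder values for illustration only; these are not data from this study.
counts = {
    "sample_1": {"otu_a": 50, "otu_b": 1, "otu_c": 949},
    "sample_2": {"otu_a": 5, "otu_c": 995},
    "sample_3": {"otu_b": 2, "otu_c": 998},
}


def relative_abundance(table):
    """Convert read counts to per-sample relative abundance in percent."""
    rel = {}
    for sample, otus in table.items():
        total = sum(otus.values())
        rel[sample] = {otu: 100.0 * n / total for otu, n in otus.items()}
    return rel


def prevalent_otus(table, min_samples):
    """Return the OTUs detected (count > 0) in at least `min_samples` samples."""
    seen = {}
    for otus in table.values():
        for otu, n in otus.items():
            if n > 0:
                seen[otu] = seen.get(otu, 0) + 1
    return sorted(otu for otu, k in seen.items() if k >= min_samples)


rel = relative_abundance(counts)
# OTUs above 0.1% relative abundance in the first hypothetical sample.
above_threshold = sorted(o for o, pct in rel["sample_1"].items() if pct > 0.1)
print(above_threshold, prevalent_otus(counts, min_samples=3))
```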
703
704 7.1 Anoxychlamydiales lineages are abundant microbial community members in Loki’s
705 Castle marine sediments
706
707 We were unable to link 16S rRNA gene fragments to most of the Anoxychlamydiales MAGs
708 reconstructed in this study (a problem often encountered in genome-resolved metagenomic
709 studies112,113). However, in the phylogenetic analysis of the obtained 16S rRNA amplicon
710 sequences (Supplementary Data 4), we identified 17 OTUs that formed a highly supported clade
711 (ufBV = 98) with Anoxychlamydiales member Chlamydiae bacterium SM23_39. The OTU
712 abundance of this group mirrors the presence of Anoxychlamydiales bins in the metagenomes
713 from the same samples, indicating that these OTUs represent Anoxychlamydiales 16S rRNA
714 gene sequences. Two of these OTUs (OTU_5_19291 and OTU_255_442) were highly
715 abundant and widespread across sediment samples. OTU_5_19291 is found in 18 samples, with
716 highest relative abundance in all four sediment cores past the oxic/anoxic transition zone. The
717 relative abundance of OTU_5_19291 is above 1% in 7 samples. In one exceptional case it was
718 the most abundant OTU in the GS08_GC12_126 sample, representing 40% of bacterial relative
719 abundance. OTU_255_442, like OTU_5_19291, was most abundant (1.3%) in sample
720 GS08_GC12_126.
721 7.2 Environmental chlamydiae lineages are abundant in sample GS10_PC15_940
722
723 In our amplicon survey of GS10_PC15_940, we identified an abundant OTU (OTU_64_1912;
724 4.8% abundance), which likely corresponds to the Chlamydiae bacterium K940_chlam_7
725 MAG, as they both affiliate with the Waddliaceae family in phylogenetic analyses (Fig. 1b,
726 Fig. 2, Supplementary Fig. 3, Supplementary Data 2). Waddlia chondrophila is a known animal
727 pathogen12,84, and Waddliaceae family members have been identified both in animal-associated
728 and environmental samples85. The wide distribution of these organisms in diverse environments
729 suggests these species could naturally infect protists like other environmental chlamydiae10,11.
730 If so, this raises the possibility that the Waddliaceae-related Chlamydiae bacterium
731 K940_chlam_7 (Supplementary Discussion 6) might be a symbiont of the eukaryotes detected
732 in sample GS10_PC15_940.
733
734 8. Underestimation of environmental abundance and diversity of Chlamydiae
735 8.1 The environmental distribution of Chlamydiae
736
737 To assess if the high environmental relative abundance and diversity of Chlamydiae identified
738 here (Supplementary Table 1, Supplementary Discussion 7) is unique to Loki’s Castle marine
739 sediments, we surveyed chlamydial abundance and diversity in other environments using the
740 Integrated Microbial NGS (IMNGS) platform86. IMNGS allows for large-scale taxonomic
741 analysis of 16S rRNA gene amplicon datasets deposited in the Sequence Read Archive (SRA).
742 Using this platform, we identified 13 environments that were enriched for chlamydial diversity
743 (>50 OTUs) and/or abundance (>0.1% relative abundance; Fig. 4b, Supplementary Data 3).
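The enrichment criteria stated above (more than 50 chlamydial OTUs and/or more than 0.1% chlamydial relative abundance) can be applied per sample and then summarised per environment, as in the following sketch. The per-sample values are hypothetical placeholders, the function name `enrichment_by_environment` is introduced for illustration, and the IMNGS platform itself is not queried here.

```python
from collections import defaultdict

# Hypothetical per-sample summaries:
# (environment, chlamydial relative abundance in %, number of chlamydial OTUs).
# Placeholder values for illustration only; these are not data from this study.
samples = [
    ("soil", 0.25, 12),
    ("soil", 0.02, 3),
    ("salt marsh", 0.08, 75),
    ("salt marsh", 0.40, 60),
]


def enrichment_by_environment(records, min_abundance=0.1, min_otus=50):
    """Percentage of samples per environment exceeding each threshold."""
    tallies = defaultdict(lambda: {"n": 0, "abundant": 0, "diverse": 0})
    for env, abundance, n_otus in records:
        tally = tallies[env]
        tally["n"] += 1
        tally["abundant"] += int(abundance > min_abundance)
        tally["diverse"] += int(n_otus > min_otus)
    return {
        env: {
            "abundant_pct": 100.0 * t["abundant"] / t["n"],
            "diverse_pct": 100.0 * t["diverse"] / t["n"],
        }
        for env, t in tallies.items()
    }


result = enrichment_by_environment(samples)
for env, stats in result.items():
    print(env, stats)
```

Reporting the two criteria separately keeps environments that are abundance-rich but OTU-poor distinguishable from those with many OTUs at low abundance.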
744 A large proportion of rhizosphere samples (831, corresponding to 62% of samples) and
745 soil samples (2295, corresponding to 14% of samples) harbour relative abundances of
746 Chlamydiae above 0.1%, though comparatively fewer had a high taxonomic richness as based
747 on OTU numbers. Several salt marsh samples were found to contain large numbers of
748 chlamydial OTUs, indicating that this environment represents an unexplored reservoir for
749 uncultured Chlamydiae diversity. Approximately 16% of groundwater samples also appear to
750 harbour a large relative chlamydial abundance, which is congruent with a recent study in which
751 17 MAGs were assembled from groundwater that resolved five distinct chlamydial lineages8.
752 Other environments, including wastewater, activated sludge and bioreactor samples, also
753 contain chlamydial relative abundances above 0.1% of the total microbial community.
754 Interestingly, previous studies have retrieved several chlamydial MAGs affiliated with both
755 CC-II and environmental chlamydiae from such environments (Supplementary Discussion 1,
756 Supplementary Table 3)87-89. In addition, some biofilm samples were found to harbour higher
757 (>0,1% relative abundance) chlamydial abundances (15% of samples) and could be an
758 additional environment of interest for studying uncultured chlamydial lineages.
Furthermore, we found that 24% of freshwater samples contained chlamydial relative
abundances above 0.1%, and that 14% included more than 50 OTUs. Samples from
freshwater sediments generally also contain high relative abundances of Chlamydiae, but do
not necessarily harbour high chlamydial diversity, which is similar to observations made for
seawater and marine sediment samples (Fig. 4b, Supplementary Data 3). These findings are in
line with a study that revealed a broad taxonomic and phylogenetic diversity of chlamydiae in
various environments, particularly plant, soil and freshwater environments90. Altogether,
our analyses underline that several environments harbour high diversity and relative abundances
of uncultured Chlamydiae, even though the primer sets used in environmental surveys were not
optimal for detecting chlamydiae (see 8.2).
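
As an illustration only, the two enrichment criteria used in this section (>50 OTUs and >0.1% relative abundance) could be applied to per-sample counts as in the following Python sketch. The record layout, the function name and the example values are assumptions made for this illustration; they are not taken from IMNGS or from Supplementary Data 3.

```python
from collections import defaultdict

OTU_THRESHOLD = 50          # ">50 OTUs" criterion from the text
ABUNDANCE_THRESHOLD = 0.1   # ">0.1% relative abundance" criterion, in percent


def summarize_by_environment(samples):
    """Return, per environment, the percentage of samples above each threshold.

    `samples` is an iterable of hypothetical records:
    (environment, chlamydial_otus, chlamydial_reads, total_reads).
    """
    counts = defaultdict(lambda: {"n": 0, "diverse": 0, "abundant": 0})
    for environment, otus, chlam_reads, total_reads in samples:
        if total_reads <= 0:
            continue  # skip empty samples rather than divide by zero
        rel_abundance = 100.0 * chlam_reads / total_reads
        entry = counts[environment]
        entry["n"] += 1
        entry["diverse"] += otus > OTU_THRESHOLD
        entry["abundant"] += rel_abundance > ABUNDANCE_THRESHOLD
    return {
        env: {
            "samples": c["n"],
            "pct_diverse": 100.0 * c["diverse"] / c["n"],
            "pct_abundant": 100.0 * c["abundant"] / c["n"],
        }
        for env, c in counts.items()
    }


# Made-up example records, not data from the study.
example = [
    ("soil", 12, 30, 10000),        # 0.3% relative abundance, 12 OTUs
    ("soil", 3, 5, 10000),          # 0.05% relative abundance, 3 OTUs
    ("salt marsh", 80, 8, 10000),   # 0.08% relative abundance, 80 OTUs
    ("salt marsh", 60, 50, 10000),  # 0.5% relative abundance, 60 OTUs
]
summary = summarize_by_environment(example)
```

Keeping the two criteria as separate counters mirrors the "and/or" in the text: an environment can be rich in chlamydial OTUs without being high in relative abundance, and vice versa.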

8.2 Underestimation of chlamydial diversity and abundance in environmental surveys

Schulz et al.90 recently reported that taxonomic diversity estimates differ significantly between
amplicon and metagenomic surveys. In particular, they observed that the taxonomic richness and
diversity of Chlamydiae were higher in metagenomic data than in
amplicon studies. This may result from the common use of primers that fail to amplify
a large fraction of representatives of the Chlamydiae phylum91. For example, the widely used
16S rRNA gene primers 515FB and 806RB from the Earth Microbiome Project92 capture only
0.7% of the characterized chlamydial diversity without any mismatches, although they
capture 95% when a single mismatch is allowed (Supplementary Table 5). Similarly, the
universal primer pair A519F/Uni1391R captures less than 1% of chlamydial diversity
(Supplementary Table 5). In the present study we therefore used a bacterial-specific primer pair
(S-D-0564-a-S-15/SD-Bact-1061-a-A-17) that is predicted to capture ~94% of the presently
known chlamydial diversity without mismatches (Supplementary Table 5). Indeed, when
comparing the relative abundances of OTUs generated by 16S rRNA gene amplicon sequencing
using the S-D-0564-a-S-15/SD-Bact-1061-a-A-17 and A519F/U1391R primer pairs on
sediment core GS08_GC12 (ref. 93), we found that chlamydial OTUs represented 43% relative
abundance with the former primer pair and less than 1% relative abundance with the
latter. Similarly, while no chlamydial sequences were detected previously in GS10_GC14_75
using the A519F/U1391R primer pair79, a similar analysis with the
S-D-0564-a-S-15/SD-Bact-1061-a-A-17 primer pair recovered 8.9% relative chlamydial abundance.
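
The kind of in silico primer coverage estimate discussed above (the percentage of reference sequences matched with zero or one mismatch) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to generate Supplementary Table 5: the function names, the toy primer and the toy sequences are invented, only one primer is checked on one strand, and sequences are assumed to be uppercase DNA.

```python
# IUPAC nucleotide codes, so that degenerate primer positions can be matched.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def primer_hits(primer, sequence, max_mismatches=0):
    """True if any window of the sequence matches within the mismatch limit."""
    k = len(primer)
    return any(
        count_mismatches(primer, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def coverage(primer, sequences, max_mismatches=0):
    """Percentage of sequences with at least one acceptable binding site."""
    if not sequences:
        return 0.0
    hits = sum(primer_hits(primer, seq, max_mismatches) for seq in sequences)
    return 100.0 * hits / len(sequences)


# Toy primer and sequences, invented for illustration only.
toy_primer = "ACGYTG"
toy_sequences = [
    "TTACGCTGAA",  # exact match (Y = C)
    "TTACGTTGAA",  # exact match (Y = T)
    "TTACGATGAA",  # one mismatch at the degenerate position
    "GGGGGGGGGG",  # no acceptable site
]
strict = coverage(toy_primer, toy_sequences, max_mismatches=0)
relaxed = coverage(toy_primer, toy_sequences, max_mismatches=1)
```

A paired estimate would additionally require the reverse primer to match the reverse complement downstream of the forward site; that step is omitted here for brevity.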

8.3 Using culture-independent methods to explore chlamydial genomic diversity

As evidenced by the present study, culture-independent methods have great potential for
expanding genomic representation within the Chlamydiae phylum94. Most chlamydiae
characterized so far have been isolated by means of co-cultivation (Supplementary Table
3), which selects for representatives that can replicate in the respective eukaryotic host.
However, most newly identified chlamydial lineages are represented only by genome data
derived from cultivation-independent studies (Supplementary Table 3).
Chlamydiae-targeted studies using cultivation-independent approaches have resulted in the first chlamydial
SAGs36 and the first chlamydial MAGs from metagenomic-based projects targeting animal
host-associated populations2,4. A number of chlamydial MAGs have also recently been
retrieved from whole microbial community metagenomic sequencing of diverse
environments, including drinking water treatment plants88,89, a bioreactor87, aquifer
groundwater8, a cold-water geyser95, oceanic waters37 and river estuary sediment9. Altogether,
the Chlamydiae phylum is severely understudied at the genomic level, and the future
exploration of the microbial communities in additional environments in which Chlamydiae are
represented (e.g., see 8.1) will likely yield genomic data from diverse and abundant chlamydial
lineages.

Supplementary Figures
[Figure: workflow diagram. Extract DNA, then either (amplicon route) amplify and sequence a region of the 16S rRNA gene from Bacteria and cluster 16S rRNA gene sequences into OTUs, or (metagenomic route) sequence metagenome, assemble metagenomic reads into contigs and group contigs into metagenomic bins.]
Supplementary Figure 1. Overview of sequencing methods. For amplicon sequencing, DNA was extracted from 69 marine sediment samples taken near Loki’s Castle hydrothermal vent field. These were used as a template for bacterial-specific amplification of an approximately 500 bp region of the 16S rRNA gene and sequenced on an Illumina MiSeq sequencer. These sequences were clustered at the 97% level to generate operational taxonomic units (OTUs). For metagenomic sequencing, DNA was extracted from 4 samples with a high abundance and diversity of chlamydiae. Sequence libraries were prepared and sequenced with Illumina HiSeq and resulting reads of each metagenome were assembled into contigs using IDBA-UD. A differential coverage genome binning approach (using CONCOCT), followed by manual curation, was used to obtain metagenome assembled genomes (MAGs).
[Figure: panel a, table of sequencing statistics (reconstructed below); panels b and c, phylogenetic trees whose tip labels are not recoverable from the extracted text. Legend: ufBV ≥ 95, ufBV ≥ 80, metagenome-assembled genome.]

a
Sample ID        Metagenome ID   Gbp Sequenced   Gbp Assembled (≥ 1 kb)
GS08_GC12_126    KR126           16.4            1.10
GS10_PC15_940    K940            63              1.30
GS10_PC15_1000   K1000           53.8            1.07
GS10_PC15_1060   K1060           116.4           2.38
Supplementary Figure 2. Marine sediment metagenome sequencing statistics and chlamydiae diversity. a, Sequencing statistics and sample identifiers for the four sediment samples used for metagenomic sequencing, including the number of Gbp assembled for each metagenome. b, Maximum likelihood (ML) tree estimated using an alignment of fifteen ribosomal proteins (at least five of which had to be present) from reference taxa (black, collapsed clades in grey) and marine sediment chlamydiae (orange), under the LG+C60+G model of evolution implemented with IQ-TREE (180 taxa, 2308 sites). Black and white circles represent bipartition values greater than 95 and 80 percent, respectively, from 1000 ultrafast bootstraps (ufBV). Dotted lines indicate the ufBV for all branches in the indicated clade. Sequences corresponding to metagenome-assembled genomes retrieved in this study are indicated with a blue star. c, ML phylogeny of chlamydial 16S rRNA gene fragments identified in Loki’s Castle metagenomes (orange) in the context of a reference chlamydial dataset (black, collapsed clades in grey), inferred using IQ-TREE with the GTR+R7 model of evolution (344 taxa, 1554 sites).
[Figure: panel a, species tree of Chlamydiae with other MAG and SAG species representatives marked (CC = Chlamydiae clade; BV ≥ 90, BV ≥ 70; scale bar 0.4 substitutions per site); panel b, presence and absence of bacterial NOGs; panel c, median intergenic space. Tip labels and plotted values are not recoverable from the extracted text.]
Supplementary Figure 3. Species phylogeny and gene content variation across the Chlamydiae phylum. a, Phylogenetic tree was estimated using a concatenated alignment of 38 single-copy marker proteins, using IQ-TREE under the PMSF approximation of LG+C60 (8072 sites). Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively. The phylogeny includes other metagenome assembled genome (MAG) and single-cell assembled genomes (SAG) chlamydiae species representatives (stars, see Methods, Supplementary Table 3). b, Presence (in dark grey) and absence (in light grey) of NOGs found across all chlamydial lineages, with delineated Chlamydiae clades indicated. c, Median intergenic space in bp across chlamydial genomes.
Supplementary Figure 4. Overview of selected protein content across Chlamydiae. Presence of selected proteins and pathways including traits associated with the chlamydiae biphasic lifecycle, components of central carbon metabolism and nucleotide and amino acid biosynthesis, across Chlamydiae species representatives color-coded according to Chlamydiae clades. Where relevant the corresponding KEGG pathway module is indicated in brackets.
[Figure: bar charts of the number of NOGs per COG category for environmental chlamydiae, the CC-IV members Chlamydiae bacterium K940_chlam_9, KR126_chlam_2 and K1000_chlam_4, and Chlamydiaceae. Bar values are not recoverable from the extracted text. Category order on the x-axis: J K L D M N O T U V C E F G H I P Q.]
Information storage and processing: J, Translation, ribosomal structure and biogenesis; K, Transcription; L, Replication, recombination and repair.
Cellular processes and signalling: D, Cell cycle control, cell division, chromosome partitioning; M, Cell wall/membrane/envelope biogenesis; N, Cell motility; O, Posttranslational modification, protein turnover, chaperones; T, Signal transduction mechanisms; U, Intracellular trafficking, secretion, and vesicular transport; V, Defense mechanisms.
Metabolism: C, Energy production and conversion; E, Amino acid transport and metabolism; F, Nucleotide transport and metabolism; G, Carbohydrate transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and metabolism; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport and catabolism.
Supplementary Figure 5. COG category distribution patterns. Distributions of the number of NOGs assigned across COG categories for environmental chlamydiae (mean and standard deviation), CC-IV, and Chlamydiaceae (mean and standard deviation).
[Figure: presence/absence matrix (NOG or PF present, NOG or PF absent) with gene counts across Chlamydia trachomatis, Chlamydia muridarum, Chlamydia suis, Chlamydophila pecorum, Chlamydia sp. 2742-308, Ca. Chlamydia corallus, Chlamydophila pneumoniae, Chlamydia ibidis, Chlamydia avium, Chlamydia gallinacea, Chlamydia felis, Chlamydophila caviae, Chlamydia abortus and Chlamydia psittaci. Per-genome counts are not recoverable from the extracted text; the rows of the matrix (NOG or PF domain, description) are listed below.]
Host Interaction and Adhesion
  0ZM85    Polymorphic membrane protein - family A
  0Y0KC    Polymorphic membrane protein - family B/C
  0XV92    Polymorphic outer membrane protein - family D/E/F/G/H
  0Y3IR    Polymorphic membrane protein - family G
  PF03503  Chlamydia cysteine-rich outer membrane protein 3
  PF05745  Chlamydia 15 kDa cysteine-rich outer membrane protein (CRPA)
  PF04156  IncA protein
  PF17628  Inclusion membrane protein D
Virulence Factors
  0Y2RT    Porin AaxA/Carbohydrate-selective porin OprB
  COG1945  Arginine decarboxylase
  0Y3HQ    Membrane attack complex (MAC) perforin
  0ZUEX    Adherence factor/cytotoxin
  PF05475  Pgp3 C-terminal domain
Vitamin Biosynthesis (Folate)
  COG1478  Alternate folylglutamate synthase FolC2
  0ZGA4    Dihydroneopterin aldolase FolB
Metabolism
  COG1218  3'(2'),5'bisphosphate nucleotidase
  0Z2QW    Adenosine AMP deaminase
  COG0352  Thiamine monophosphate synthase
Gene Expression
  PF07382  Histone H1-like nucleoprotein HC2
  PF17455  Late transcription unit B protein
  PF17446  Late transcription unit A protein
Unknown Function
  11VHY    Conserved hypothetical protein
  0Y23D    DUF5398 - Domain of unknown function
  PF06587  DUF1137 - Domain of unknown function
  PF16802  DUF5070 - Domain of unknown function
  PF07577  DUF1547 - Domain of unknown function
  PF07146  DUF1389 - Domain of unknown function
  PF07560; PF07579  DUF1539 and DUF1548 - Domains of unknown function
Supplementary Figure 6. Conserved gene content restricted to the Chlamydiaceae family. Presence (dark grey), absence (light grey), and number of genes assigned to NOGs or with PF domains found uniquely within Chlamydiaceae lineages among Chlamydiae, and which is conserved across the family (in a third of representative genomes).
[Figure: panel a, presence/absence matrix (NOG or PF present, NOG or PF absent) with gene counts across Chlamydiae bacterium K940_chlam_9, K1000_chlam_4 and KR126_chlam_2 and the Chlamydiaceae genomes; per-genome counts are not recoverable from the extracted text, and the rows are listed below. Panels b and c, phylogenetic trees and genomic organization of the PF04518 (copies 1 to 5) and PF05302 (copies 1 to 3) gene families in Chlamydiae bacterium K940_chlam_9 and Chlamydiaceae (BV ≥ 90, BV ≥ 70; scale bars 1 substitution per site).]
NOG      PF Domain  Description
--       PF04518    Effector from type III secretion system
--       PF05302    Domain of unknown function (DUF720)
10UQK    PF07079    Domain of unknown function (DUF1347)
COG0400  PF02230    Phospholipase/Carboxylesterase
--       PF17458    Domain of unknown function (DUF5421)
--       PF17459    Domain of unknown function (DUF5422)
--       PF17461    Domain of unknown function (DUF5423)
Supplementary Figure 7. Evolutionary insights into gene content shared between Chlamydiaceae and CC-IV. a, Presence (dark grey), absence (light grey), and number of genes assigned to NOGs or with PF domains found conserved uniquely in CC-IV and Chlamydiaceae lineages among Chlamydiae. Phylogenetic tree and typical genomic organization of gene families containing PF domains b, PF04518 and c, PF05302. Phylogenies were inferred with IQ-TREE under the PMSF approximation of LG+C20+G+F (PF04518: 395 sites, PF05302: 126 sites). Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively.
[Figure: panel a, synteny plot of flagellar genes in CC-IV and other chlamydial genomes (legend: sctJ, sctN, sctU, sctV, sctR, sctS, sctT, flgB, flgC, fliE, sctQ; scale bar 50 kb); panel b, tree of concatenated SctJNRSTUV; panel c, tree of SctJ; panel d, tree of SctR. NF-T3SS, flagellum and Chlamydiae clades are indicated on the trees (scale bars 0.5 substitutions per site). Tip labels are not recoverable from the extracted text.]
Supplementary Figure 8. Synteny of flagellar components found in chlamydial lineages and phylogenetic analyses of homologous NF-T3SS and flagellar components. a, Synteny of flagellar genes in Chlamydiae, following the gene nomenclature by Abby et al.713030. All genomes with at least one homolog of flagellar genes detected by MacSyFinder are included, except for those genomes with flagellar homologs of sctN and sctV, which have been reported to be co-opted by the NF-T3SS machinery (see Supplementary Discussion). Genes (arrows) are colored according to the legend next to the synteny plot. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001. Phylogenies of b, a concatenated dataset of the SctJNRSTUV proteins (PMSF approximation of LG+F+C50+R4, 626 sequences, 1635 sites), c, the SctJ protein (LG+F+C40+R4, 630 sequences, 126 sites) and d, the SctR protein (LG+F+C40+R4, 651 sequences, 171 sites). All phylogenies were reconstructed with IQ-TREE and were rooted with the respective paralogues. Bipartitions are labeled with black and white circles representing non-parametric bootstrap values (BV) greater or equal to 90 and 70, respectively.
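
The comparison lines in the synteny plots rest on best reciprocal BLASTP hits with an e-value of at most 0.001. A minimal Python sketch of that filtering step is given below for illustration. It assumes BLAST tabular rows in the standard 12-column layout (query, subject, ..., e-value, bit score) and assumes that the "best" hit is the one with the highest bit score, which the caption does not specify. The function names and the example rows are hypothetical.

```python
def best_hits(rows, max_evalue=1e-3):
    """Map each query to its best subject among hits passing the e-value cutoff.

    Each row is a list of 12 BLAST tabular fields; column 11 (index 10) is the
    e-value and column 12 (index 11) is the bit score.
    """
    best = {}
    for row in rows:
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # discard hits above the e-value threshold
        # Assumption: "best" means highest bit score; ties keep the first hit.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows, max_evalue=1e-3):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(forward_rows, max_evalue)
    reverse = best_hits(reverse_rows, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLASTP rows (genome A against genome B, and the reverse search).
forward_rows = [
    "geneA1\tgeneB1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200".split("\t"),
    "geneA1\tgeneB2\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-5\t50".split("\t"),
    "geneA2\tgeneB2\t30.0\t100\t70\t0\t1\t100\t1\t100\t0.5\t40".split("\t"),
]
reverse_rows = [
    "geneB1\tgeneA1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t198".split("\t"),
    "geneB2\tgeneA1\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-5\t50".split("\t"),
]
pairs = reciprocal_best_hits(forward_rows, reverse_rows)
```

In this toy input only geneA1 and geneB1 are retained: geneA2 has no hit below the e-value cutoff, and geneB2 points to geneA1, whose own best hit is geneB1.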
[Figure: synteny plot of NF-T3SS genes across genomes of environmental chlamydiae, Chlamydiaceae, CC-IV, Anoxychlamydiales, CC-III, CC-II and CC-I (legend: sctN, sctQ, sctC, sctU, sctV, sctJ, sctR, sctS, sctT; scale bar 20 kb). Genome labels are not reproduced here.]
Supplementary Figure 9. Conserved synteny of NF-T3SS components across Chlamydiae. Synteny plot including genomes with at least one homolog of NF-T3SS genes detected by MacSyFinder. Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.
[Figure: synteny plot of T2SS genes across genomes of environmental chlamydiae, Chlamydiaceae, CC-IV, Anoxychlamydiales, CC-III, CC-II and CC-I (legend: gspD, gspE, gspF, gspG, gspH, gspI, gspJ, gspL, pilAE, pilB, pilC, pilM, pilQ, tadZ; scale bar 10 kb). Genome labels are not reproduced here.]
Supplementary Figure 10. Conserved synteny of T2SS components across Chlamydiae. Synteny plot including Chlamydiae genomes with at least one T2SS detected by MacSyFinder (Supplementary Data 3). Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.
[Figure: synteny plot of T4SS genes in Parachlamydia sp. C2, Parachlamydia sp. BC.030, Candidatus Protochlamydia amoebophila UWE25, Candidatus Protochlamydia naegleriophila KNiC, Candidatus Protochlamydia sp. R18, Candidatus Rubidus massiliensis, Chlamydiae bacterium K1060chlam2, Simkania negevensis Z, Chlamydiae bacterium K1000chlam3 and Waddliaceae bacterium SP13, grouped as environmental chlamydiae, CC-I and CC-II (legend entries include trbC, MOBQ, t4cp1, t4cp2 and virb4; scale bar 10 kb).]
Supplementary Figure 11. Conserved synteny of T4SS components in Chlamydiae. Synteny plot including Chlamydiae genomes with at least one T4SS detected by MacSyFinder (Supplementary Data 3). Genes (arrows) are colored according to the legend. Genome regions are defined as 10 kb up- and downstream of the colored genes, and are truncated at contig boundaries (thicker, vertical lines). Comparison lines between genes represent best reciprocal BLASTP hits with an e-value less than or equal to 0.001.
[Figure: panels a, b and c, phylogenetic trees of nucleotide transporters with chlamydial NTT clusters numbered 1 to 12 and the clade affiliation of each cluster (CC-I, CC-II, CC-III, CC-IV, Anoxy, Chl, Env) shown as coloured circles; scale bars 0.5 substitutions per site. Functionally characterized NTTs labelled on the trees: 1, PamNTT5 (GTP, ATP/H+ symporter); 2, CtNTT2 (NTP/H+ symporter); 3, SnNTT2 (GTP, ATP/H+ symporter); 4, PamNTT3 (UTP/H+ symporter); 5, PamNTT2 (NTP counter exchange transporter); 8, SnNTT3 (NTP, dCTP counter exchange transporter); 9, CtNTT1/PamNTT1/SnNTT1 (ATP/ADP antiporter, NAD+/ADP antiporter); 10, SnNTT4 (unknown substrate). Colour legend: Chlamydiae, Diatoms/Haptophytes/Brown algae, Alphaproteobacteria, Deltaproteobacteria, Cyanobacteria, Archaeplastida/Green algae/Red algae, Microsporidia, Ca. Dependentiae, Gammaproteobacteria, Bacteroides.]
Supplementary Figure 12. Phylogenetic inference of nucleotide transporters. Phylogenetic trees of nucleotide transporter (NTT) proteins found in both prokaryotes and eukaryotes. Chlamydiae are shown in orange with clade affiliation of the chlamydiae within each chlamydial NTT cluster indicated by the coloured circles. Functionally characterized NTTs from each cluster are indicated. Species and clade name abbreviations: environmental chlamydiae (Env), Chlamydiaceae (Chl), Anoxychlamydiales (Anoxy), Chlamydia trachomatis (Ct), Ca. Protochlamydia acanthamoebae (Pam), Simkania negevensis (Sn). See legend for color scheme of additional lineages. a, ML phylogeny inferred using IQ-TREE with the LG+F+R8 model of “canonical NTTs” (400 taxa, 357 sites) b, ML phylogeny of “other NTTs” (that form a sister clade to the “canonical NTTs”), inferred using IQ-TREE with the LG+F+R8 model of evolution (302 taxa, 348 sites). c, ML phylogeny of “NTT-HEAT NTTs”, inferred using IQ-TREE with the LG+F+R6 model of evolution (157 taxa, 329 sites).
[Figure: ML tree of 438 taxa with bootstrap support values at branches; scale bar, 0.3.]
Supplementary Figure 13. Selection of representative PVC genomes. ML phylogeny, inferred from an alignment of 438 taxa and 2301 sites of concatenated orthologous ribosomal proteins from ribocontigs using RAxML under the PROTCATLG model of evolution. Branch support was estimated with 100 rapid bootstrap replicates. PVC phyla are coloured: Planctomycetes in pink, Candidatus Omnitrophica in orange, Verrucomicrobia in blue, Lentisphaerae in green and Chlamydiae in purple. Representative bacterial lineages are in black and the archaeal outgroup in grey. Branches leading to clades from which to select a representative and selected representatives (Supplementary Table 3 and 6) are in red.
[Figure: heatmap with PVC genomes as rows and marker gene NOGs as columns; colour key, copy number (0 to 10).]
Supplementary Figure 14. Heatmap of copy-number for potential marker gene NOGs. Presence, absence and copy number of single-copy marker gene NOGs from complete and near complete PVC genomes, used for reconstructing species phylogenies.
[Figure: scatter plot; y axis, discordance score; x axis, ranked NOGs.]
Supplementary Figure 15. Discordance filtering of single-copy marker genes. Discordance scores across single-copy marker protein NOGs. The 20% most discordant markers are left of the red dotted line.
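The 20% cutoff described in the caption can be sketched as a simple rank-and-split step. The example below is a minimal illustration in Python: the marker names, the scores, and the rounding rule (truncating the number of markers to drop) are assumptions, and how the discordance scores themselves are computed is outside the scope of this sketch.

```python
from typing import Dict, List, Tuple


def split_by_discordance(
    scores: Dict[str, float], fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Rank markers from most to least discordant and split off the top fraction.

    Returns (kept, dropped). Ties are broken by marker name so the result
    is deterministic.
    """
    ranked = sorted(scores, key=lambda marker: (-scores[marker], marker))
    n_drop = int(len(ranked) * fraction)  # assumption: truncate, do not round up
    return ranked[n_drop:], ranked[:n_drop]


# Hypothetical discordance scores, for illustration only
example_scores = {
    "NOG_a": 6.1,
    "NOG_b": 7.4,
    "NOG_c": 12.3,
    "NOG_d": 5.2,
    "NOG_e": 9.0,
}

kept, dropped = split_by_discordance(example_scores, fraction=0.2)
print("dropped:", dropped)
print("kept:", kept)
```

With five markers and a fraction of 0.2, exactly one marker (the one with the highest score) is set aside, mirroring the markers to the left of the dotted line in the plot.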
[Figure: three trees, panels a, b and c; scale bars, 0.1 substitutions per site; branch legend, ufBV ≥ 95.]
Supplementary Figure 16. ML concatenated species phylogenetic trees of Chlamydiae. ML phylogenetic trees inferred using IQ-TREE under the LG+C60+R4+F model of evolution with (a) 98 (28,286 sites), (b) 55 (14,212 sites), and (c) 38 (7,894 sites) concatenated single-copy marker genes. Datasets of 55 and 38 single-copy marker genes are subsets of the 98 based on best representation among genomes. Phylogenies include an extensive outgroup with representatives from across the PVC phyla: Kiritimatiellaeota, Lentisphaerae, Verrucomicrobia, Candidatus Omnitrophica and Planctomycetes. Ultrafast bootstrap (ufBV) support is indicated at branches following the legend.
[Plot: χ2 test statistic score (y-axis, 0 to 300) against the percentage of the alignment χ2-pruned (x-axis, 0 to 95%), shown for the PVC outgroup and Chlamydiae.]
Supplementary Figure 17. Step-wise removal of the most compositionally heterogeneous sites from a concatenated alignment of single-copy marker proteins, based on χ2 statistics. The plot shows the χ2 test statistic across Chlamydiae and outgroup PVC taxa after pruning 5% to 95% of the sites with the highest compositional heterogeneity.
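The pruning idea behind Supplementary Figure 17 can be illustrated with a minimal sketch. It assumes one common formulation: the χ2 statistic compares each taxon's amino-acid counts with the counts expected from the pooled composition, and each site is scored by how much the statistic drops when that site alone is removed. The function names and the toy alignment are hypothetical, and the exact procedure used for the figure is the one described in the Methods, not this sketch.

```python
from collections import Counter

GAP_CHARS = set("-?X")  # characters ignored when counting composition (assumption)


def chi2_stat(seqs):
    """Chi-square statistic of amino-acid composition across taxa."""
    per_taxon = [Counter(c for c in s if c not in GAP_CHARS) for s in seqs]
    pooled = Counter()
    for counts in per_taxon:
        pooled.update(counts)
    grand_total = sum(pooled.values())
    if grand_total == 0:
        return 0.0
    stat = 0.0
    for counts in per_taxon:
        n = sum(counts.values())
        for aa, pooled_count in pooled.items():
            expected = n * pooled_count / grand_total
            if expected > 0:
                stat += (counts.get(aa, 0) - expected) ** 2 / expected
    return stat


def drop_columns(seqs, columns):
    """Return the alignment without the given column indices."""
    return ["".join(c for i, c in enumerate(s) if i not in columns) for s in seqs]


def prune_heterogeneous_sites(seqs, fraction):
    """Remove the fraction of sites whose individual removal lowers chi2 the most."""
    length = len(seqs[0])
    full = chi2_stat(seqs)
    # Per-site score: reduction in chi2 when only that site is removed.
    scores = [full - chi2_stat(drop_columns(seqs, {i})) for i in range(length)]
    n_drop = int(round(fraction * length))
    ranked = sorted(range(length), key=lambda i: scores[i], reverse=True)
    return drop_columns(seqs, set(ranked[:n_drop]))


# Toy alignment: only the last column differs in composition between taxa.
toy = ["AAAK", "AAAK", "AAAR", "AAAR"]
pruned = prune_heterogeneous_sites(toy, 0.25)
```

Repeating the call with fractions from 0.05 to 0.95 and recording `chi2_stat` for each pruned alignment would give a curve of the kind plotted in the figure.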
Supplementary Tables
Supplementary Table 1. Loki's Castle sediment sample information and summary of amplicon sequencing results.
| Sample ID | Sediment Core | Depth (mbsf) | Chlamydiae Relative Abundance (%) | Number of Chlamydiae OTUs | Chlamydiae OTUs Over 0.1% Relative Abundance | Chlamydiae OTUs Over 1% Relative Abundance |
|---|---|---|---|---|---|---|
| GS10_PC15_10 | GS10_PC15 | 11.58 | –ᵃ | – | – | – |
| GS10_PC15_40 | GS10_PC15 | 11.23 | – | – | – | – |
| GS10_PC15_70 | GS10_PC15 | 10.93 | – | – | – | – |
| GS10_PC15_100 | GS10_PC15 | 10.63 | – | – | – | – |
| GS10_PC15_130 | GS10_PC15 | 10.33 | – | – | – | – |
| GS10_PC15_160 | GS10_PC15 | 10.03 | – | – | – | – |
| GS10_PC15_190 | GS10_PC15 | 9.73 | – | – | – | – |
| GS10_PC15_220 | GS10_PC15 | 9.43 | 0 | 0 | 0 | 0 |
| GS10_PC15_250 | GS10_PC15 | 9.13 | 0.068 | 1 | 0 | 0 |
| GS10_PC15_280 | GS10_PC15 | 8.83 | – | – | – | – |
| GS10_PC15_310 | GS10_PC15 | 8.53 | – | – | – | – |
| GS10_PC15_340 | GS10_PC15 | 8.23 | – | – | – | – |
| GS10_PC15_370 | GS10_PC15 | 7.93 | – | – | – | – |
| GS10_PC15_400 | GS10_PC15 | 7.63 | – | – | – | – |
| GS10_PC15_430 | GS10_PC15 | 7.33 | – | – | – | – |
| GS10_PC15_460 | GS10_PC15 | 7.03 | – | – | – | – |
| GS10_PC15_490 | GS10_PC15 | 6.73 | – | – | – | – |
| GS10_PC15_520 | GS10_PC15 | 6.43 | 0.177 | 1 | 1 | 0 |
| GS10_PC15_550 | GS10_PC15 | 6.13 | 0 | 0 | 0 | 0 |
| GS10_PC15_580 | GS10_PC15 | 5.83 | – | – | – | – |
| GS10_PC15_610 | GS10_PC15 | 5.53 | 0 | 0 | 0 | 0 |
| GS10_PC15_640 | GS10_PC15 | 5.23 | – | – | – | – |
| GS10_PC15_670 | GS10_PC15 | 4.93 | – | – | – | – |
| GS10_PC15_700 | GS10_PC15 | 4.36 | – | – | – | – |
| GS10_PC15_730 | GS10_PC15 | 4.33 | – | – | – | – |
| GS10_PC15_760 | GS10_PC15 | 4.03 | – | – | – | – |
| GS10_PC15_790 | GS10_PC15 | 3.73 | – | – | – | – |
| GS10_PC15_820 | GS10_PC15 | 3.43 | – | – | – | – |
| GS10_PC15_850 | GS10_PC15 | 3.13 | – | – | – | – |
| GS10_PC15_880 | GS10_PC15 | 2.83 | 3.808 | 14 | 6 | 1 |
| GS10_PC15_910 | GS10_PC15 | 2.53 | – | – | – | – |
| GS10_PC15_940 | GS10_PC15 | 2.23 | 11.148 | 26 | 10 | 2 |
| GS10_PC15_1000 | GS10_PC15 | 1.63 | 5.666 | 82 | 16 | 0 |
| GS10_PC15_1060 | GS10_PC15 | 1.03 | 12.43 | 163 | 29 | 1 |
| GS10_PC15_1090 | GS10_PC15 | 0.73 | – | – | – | – |
| GS10_PC15_1120 | GS10_PC15 | 0.43 | 1.239 | 25 | 2 | 0 |
| GS10_GC14_5 | GS10_GC14 | 0.05 | – | – | – | – |
| GS10_GC14_10 | GS10_GC14 | 0.10 | – | – | – | – |
| GS10_GC14_40 | GS10_GC14 | 0.40 | – | – | – | – |
| GS10_GC14_75 | GS10_GC14 | 0.75 | 8.929 | 10 | 4 | 1 |
| GS10_GC14_100 | GS10_GC14 | 1.00 | – | – | – | – |
| GS10_GC14_115 | GS10_GC14 | 1.15 | – | – | – | – |
| GS10_GC14_130 | GS10_GC14 | 1.30 | – | – | – | – |
| GS10_GC14_150 | GS10_GC14 | 1.50 | – | – | – | – |
| GS10_GC14_176 | GS10_GC14 | 1.76 | – | – | – | – |
| GS10_GC14_180 | GS10_GC14 | 1.80 | – | – | – | – |
| GS10_GC14_200 | GS10_GC14 | 2.00 | – | – | – | – |
| GS14_GC12_10 | GS14_GC12 | 0.10 | 1.024 | 25 | 2 | 0 |
| GS14_GC12_20 | GS14_GC12 | 0.20 | 1.028 | 19 | 1 | 0 |
| GS14_GC12_30 | GS14_GC12 | 0.30 | 1.079 | 94 | 2 | 0 |
| GS14_GC12_40 | GS14_GC12 | 0.40 | 0.467 | 25 | 1 | 0 |
| GS14_GC12_50 | GS14_GC12 | 0.50 | 1.107 | 17 | 3 | 0 |
| GS14_GC12_75 | GS14_GC12 | 0.75 | 2.101 | 45 | 3 | 0 |
| GS14_GC12_100 | GS14_GC12 | 1.00 | 1.648 | 25 | 6 | 0 |
| GS14_GC12_130 | GS14_GC12 | 1.30 | 1.393 | 6 | 3 | 1 |
| GS14_GC12_160 | GS14_GC12 | 1.60 | 3.829 | 4 | 1 | 1 |
| GS14_GC12_175 | GS14_GC12 | 1.75 | 0.848 | 5 | 2 | 0 |
| GS14_GC12_190 | GS14_GC12 | 1.90 | 0.098 | 2 | 0 | 0 |
| GS14_GC12_220 | GS14_GC12 | 2.20 | 0 | 0 | 0 | 0 |
| GS14_GC12_250 | GS14_GC12 | 2.50 | 0 | 0 | 0 | 0 |
| GS14_GC12_280 | GS14_GC12 | 2.80 | 0 | 0 | 0 | 0 |
| GS14_GC12_310 | GS14_GC12 | 3.10 | 0 | 0 | 0 | 0 |
| GS14_GC12_340 | GS14_GC12 | 3.40 | 0.218 | 6 | 0 | 0 |
| GS14_GC12_357 | GS14_GC12 | 3.57 | 0 | 0 | 0 | 0 |
| GS14_GC12_360 | GS14_GC12 | 3.60 | 0.027 | 5 | 0 | 0 |
| GS08_GC12_38 | GS08_GC12 | 0.38 | – | – | – | – |
| GS08_GC12_80 | GS08_GC12 | 0.80 | – | – | – | – |
| GS08_GC12_126 | GS08_GC12 | 1.26 | 43.063 | 37 | 8 | 2 |
| GS08_GC12_310 | GS08_GC12 | 3.10 | – | – | – | – |

ᵃ Not applicable (–): PCR screened, but amplicon sequencing not performed for sample.
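The per-sample summary columns of Supplementary Table 1 can be derived from an OTU count table along the following lines. This is a minimal sketch under stated assumptions: relative abundance is taken relative to all reads in the sample, the thresholds are treated as strict inequalities, and the function name, dictionary keys and toy counts are hypothetical rather than taken from the study's pipeline.

```python
def summarize_chlamydiae_otus(otu_read_counts, chlamydiae_otus):
    """Summarize Chlamydiae OTUs in one sample.

    otu_read_counts: dict mapping every OTU in the sample to its read count.
    chlamydiae_otus: iterable of OTU identifiers classified as Chlamydiae.
    """
    total_reads = sum(otu_read_counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # Relative abundance (%) of each Chlamydiae OTU that is present in the sample.
    rel = {
        otu: 100.0 * otu_read_counts[otu] / total_reads
        for otu in chlamydiae_otus
        if otu_read_counts.get(otu, 0) > 0
    }
    return {
        "relative_abundance_pct": sum(rel.values()),
        "n_otus": len(rel),
        "n_otus_over_0.1pct": sum(1 for v in rel.values() if v > 0.1),
        "n_otus_over_1pct": sum(1 for v in rel.values() if v > 1.0),
    }


# Toy sample with 10,000 reads, three of whose OTUs are Chlamydiae.
toy_counts = {"chl_1": 150, "chl_2": 20, "chl_3": 5, "other": 9825}
summary = summarize_chlamydiae_otus(toy_counts, ["chl_1", "chl_2", "chl_3", "chl_4"])
```

An OTU can contribute to the total Chlamydiae relative abundance without passing either threshold, which is why samples such as GS10_PC15_250 list one OTU but none above 0.1%.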
Supplementary Table 2. Characteristics of marine sediment chlamydiae MAGs.
| Genome | Chlamydiae Clade Affiliation | Metagenome Sample | Completeness (%)ᵃ | Redundancyᵃ | GC (%) | Number of Contigs | Bin Size (Mbp) | Estimated Genome Size (Mbp)ᵇ | Median Intergenic Space (bp) | iRepᶜ | N50 | RP15 Contig | 16S rRNA Gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chlamydiae bacterium K940_chlam_8 | Unresolved | GS10_PC15_940 | 98 | 1 | 37.48 | 89 | 1.4 | 1.43 | 38 | – | 26189 | contig-124_1042 | full (contig-124_2389) |
| Chlamydiae bacterium K1060_chlam_2 | CC-I | GS10_PC15_1060 | 97 | 1 | 46.89 | 143 | 1.63 | 1.68 | 28 | – | 16440 | contig-124_2150 | partial (contig-124_100491) |
| Chlamydiae bacterium K940_chlam_2 | CC-II | GS10_PC15_940 | 94 | 1.09 | 48.88 | 285 | 1.66 | 1.61 | 18 | – | 7020 | contig-124_2839 | none |
| Chlamydiae bacterium KR126_chlam_1 | CC-II | GS08_GC12_126 | 99 | 1.01 | 46.8 | 25 | 1.61 | 1.61 | 32 | 1.19 | 116054 | contig-100_216 | partial (contig-100_165) |
| Chlamydiae bacterium K1000_chlam_2 | CC-II | GS10_PC15_1000 | 88 | 1.01 | 44.48 | 252 | 1.74 | 1.96 | 20 | – | 8520 | contig-124_9157 | partial (contig-124_16903) |
| Chlamydiae bacterium KR126_chlam_3 | CC-II | GS08_GC12_126 | 94 | 1 | 42.98 | 185 | 1.68 | 1.79 | 40 | 1.4 | 11258 | contig-100_1304 | partial (contig-100_50271) |
| Chlamydiae bacterium K940_chlam_6 | CC-II | GS10_PC15_940 | 91 | 1 | 42.93 | 432 | 1.51 | 1.66 | 24 | – | 4668 | contig-124_54975/contig-124_65424 | none |
| Chlamydiae bacterium K1000_chlam_3 | CC-II | GS10_PC15_1000 | 86 | 1 | 41.77 | 270 | 1.66 | 1.93 | 31 | 1.9 | 7870 | contig-124_1375 | none |
| Chlamydiae bacterium K1060_chlam_5 | CC-IV | GS10_PC15_1060 | 98 | 1.01 | 26.37 | 57 | 1.39 | 1.40 | 39 | – | 39842 | contig-124_32386 | none |
| Chlamydiae bacterium K940_chlam_1 | CC-IV | GS10_PC15_940 | 100 | 1 | 30.93 | 156 | 1.33 | 1.33 | 50 | 1.54 | 14062 | contig-124_2150 | none |
| Chlamydiae bacterium K1000_chlam_1 | CC-IV | GS10_PC15_1000 | 96 | 1 | 30.85 | 252 | 1.6 | 1.67 | 57 | 1.51 | 9399 | contig-124_1902 | partial (contig-124_139984) |
| Chlamydiae bacterium KR126_chlam_6 | CC-IV | GS08_GC12_126 | 94 | 1.02 | 29.65 | 162 | 1.53 | 1.60 | 50 | 1.36 | 12844 | contig-100_3930 | none |
| Chlamydiae bacterium K1060_chlam_1 | CC-IV | GS10_PC15_1060 | 96 | 1 | 29.32 | 120 | 1.59 | 1.66 | 68 | 1.48 | 18228 | contig-124_9382 | none |
| Chlamydiae bacterium KR126_chlam_4 | CC-IV | GS08_GC12_126 | 100 | 1.01 | 29.43 | 107 | 1.58 | 1.56 | 63 | 1.12 | 21756 | contig-100_2916 | none |
| Chlamydiae bacterium K940_chlam_4 | CC-IV | GS10_PC15_940 | 94 | 1 | 29.51 | 235 | 1.42 | 1.51 | 64 | 1.58 | 8098 | contig-124_7225 | none |
| Chlamydiae bacterium K940_chlam_5 | CC-IV | GS10_PC15_940 | 99 | 1.01 | 30.16 | 191 | 1.76 | 1.76 | 65 | 1.58 | 12088 | contig-124_5410 | none |
| Chlamydiae bacterium K1060_chlam_4 | CC-IV | GS10_PC15_1060 | 67 | 1.04 | 30.3 | 514 | 1.38 | 1.98 | 75 | – | 3239 | contig-124_114082 | none |
| Chlamydiae bacterium KR126_chlam_5 | CC-IV | GS08_GC12_126 | 94 | 1.01 | 30.25 | 264 | 1.57 | 1.65 | 54 | 1.19 | 7741 | contig-100_14168 | none |
| Chlamydiae bacterium K1060_chlam_3 | CC-IV | GS10_PC15_1060 | 71 | 1.03 | 30.34 | 167 | 0.98 | 1.34 | 60.5 | – | 8308 | contig-124_29952 | none |
| Chlamydiae bacterium K940_chlam_3 | Environmental chlamydiae | GS10_PC15_940 | 96 | 1 | 41.7 | 221 | 1.91 | 1.99 | 50 | 1.73 | 10099 | contig-124_6068 | none |
| Chlamydiae bacterium K940_chlam_7 | Environmental chlamydiae | GS10_PC15_940 | 96 | 1 | 43.37 | 408 | 2.49 | 2.59 | 53 | 1.4 | 7595 | contig-124_4023 | none |
| Chlamydiae bacterium K940_chlam_9 | CC-V | GS10_PC15_940 | 99 | 1 | 47.21 | 240 | 2.07 | 2.09 | 35.5 | 1.79 | 16710 | contig-124_6236 | partial (contig-124_2865) |
| Chlamydiae bacterium KR126_chlam_2 | CC-V | GS08_GC12_126 | 93 | 1.04 | 47.37 | 141 | 1.37 | 1.41 | 24 | 1.4 | 12860 | contig-100_6141/contig-100_8385 | none |
| Chlamydiae bacterium K1000_chlam_4 | CC-V | GS10_PC15_1000 | 71 | 1.04 | 46.82 | 229 | 0.94 | 1.27 | 38 | – | 4244 | contig-124_70302 | none |

ᵃ Estimated using micomplete (see Methods) with the Chlamydiae-specific single-copy gene set (Supplementary Table 7).
ᵇ Based on estimated completeness, corrected by estimated proportion of genome redundancy.
ᶜ Not applicable (–): genome did not meet the coverage (5X) and completeness (70%) requirements for inferring replication rate (iRep) (Brown et al., 2016).
ᵈ Percentage of reads in respective metagenome mapped to contigs in each genome (out of all reads mapped to contigs in the complete metagenome assembly).
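Footnote b of Supplementary Table 2 describes the estimated genome size only in words. A minimal sketch of one formula consistent with that description is given below; it assumes the bin size is divided by the completeness fraction and by the redundancy estimate. This form is inferred from the footnote and reproduces several Table 2 rows to within rounding, but it is not confirmed as the authors' exact calculation, and the function name is hypothetical.

```python
def estimated_genome_size(bin_size_mbp, completeness_pct, redundancy):
    """Estimate genome size (Mbp) from bin size, completeness (%) and redundancy.

    Assumed formula: bin size / (completeness fraction * redundancy).
    """
    if completeness_pct <= 0 or redundancy <= 0:
        raise ValueError("completeness and redundancy must be positive")
    return bin_size_mbp / ((completeness_pct / 100.0) * redundancy)


# Values taken from Supplementary Table 2 (bin size, completeness, redundancy).
k1060_chlam_2 = round(estimated_genome_size(1.63, 97, 1.0), 2)   # table lists 1.68
k1000_chlam_2 = round(estimated_genome_size(1.74, 88, 1.01), 2)  # table lists 1.96
k940_chlam_8 = round(estimated_genome_size(1.4, 98, 1.0), 2)     # table lists 1.43
```

Because the tabulated inputs are themselves rounded, some rows differ from this sketch in the last digit (for example K940_chlam_2, listed as 1.61).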
Supplementary Table 3. Characteristics of Chlamydiae reference genomes.
| Organism Name | Chlamydiae Clade Affiliation | Genbank Accession | Assembly Level | Genome Source | Number of Contigs | Completeness (%)ᵇ | Redundancyᵇ | GC (%) | Genome/Bin Size (Mbp) | Estimated Genome Size (Mbp) | Number of ORFsᶜ | Median Intergenic Space (bp) | 16S rRNA Gene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chlamydiae Species Representatives (Available Prior to February 2017) | | | | | | | | | | | | | |
| Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_11 | Unresolved | GCA_001794905.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 134 | 91 | 1 | 48.45 | 1.26 | 1.38 | 1065 | 22.5 | no |
| Simkania negevensis Z | CC-I | GCA_000237205.1 | Complete Genome | co-culture | – | 100 | 1 | 41.60 | 2.63 | – | 2518 | 36 | yes |
| Chlamydiae bacterium RIFCSPLOWO2_02_FULL_49_12 | CC-II | GCA_001796275.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 156 | 95 | 1 | 48.99 | 1.41 | 1.49 | 1175 | 34 | yes |
| Chlamydiae bacterium Ga0074140 | CC-II | GCA_001464115.1 | Contig | water treatment plant metagenome (ref. 88) | 6 | 99 | 1 | 47.82 | 1.72 | 1.74 | 1648 | 35 | yes |
| Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_9 | CC-III | GCA_001794935.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 211 | 65 | 1 | 48.59 | 1.32 | 2.02 | 1166 | 27 | no |
| Chlamydiae bacterium RIFCSPLOWO2_02_FULL_45_22 | CC-III | GCA_001796255.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 31 | 99 | 1 | 44.70 | 1.58 | 1.59 | 1475 | 23.5 | yes |
| Chlamydiae bacterium SM23_39 | CC-IV | GCA_001303765.1 | Contig | aquifer groundwater metagenome (ref. 96) | 67 | 93 | 1 | 26.23 | 1.13 | 1.21 | 986 | 35 | yes |
| Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_27_8 | CC-IV | GCA_001796155.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 222 | 71 | 1 | 27.43 | 0.97 | 1.38 | 817 | 29 | no |
| Criblamydia sequanensis CRIB-18 | Environmental chlamydiae | GCA_000750955.1 | Contig | co-culture | 23 | 99 | 1 | 38.24 | 2.97 | 3.00 | 2418 | 64 | yes |
| Estrella lausannensis CRIB-30 | Environmental chlamydiae | GCA_900000175.1 | Scaffold | co-culture | 34 | 99 | 1 | 48.22 | 2.83 | 2.85 | 2217 | 92 | yes |
| Waddlia chondrophila WSU 86-1044 | Environmental chlamydiae | GCA_000092785.1 | Complete Genome | co-culture | – | 100 | 1 | 43.74 | 2.13 | – | 1956 | 22 | yes |
| Parachlamydia acanthamoebae UV-7 | Environmental chlamydiae | GCA_000253035.1 | Complete Genome | co-culture | – | 99 | 1 | 39.04 | 3.07 | – | 2788 | 56 | yes |
| Ca. Rubidus massiliensis (ex. Chlamydia sp. Rubis) | Environmental chlamydiae | GCA_000756735.1 | Contig | co-culture | 5 | 100 | 1 | 32.64 | 2.82 | 2.82 | 2446 | 53 | yes |
| Chlamydiales bacterium 38-26 | Environmental chlamydiae | GCA_001897225.1 | Scaffold | thiocyanate bioreactor metagenome (ref. 87) | 10 | 100 | 1 | 38.12 | 2.83 | 2.83 | 2327 | 88 | no |
| Neochlamydia sp. EPS4 | Environmental chlamydiae | GCA_000813665.1 | Contig | co-culture | 112 | 99 | 1 | 38.09 | 2.53 | 2.55 | 2173 | 142 | yes |
| Parachlamydiaceae bacterium HS-T3 | Environmental chlamydiae | GCA_000829755.1 | Contig | co-culture | 34 | 99 | 1 | 38.71 | 2.31 | 2.33 | 2025 | 44.5 | yes |
| Parachlamydia sp. C2 (ex. Protochlamydia greubae) | Environmental chlamydiae | GCA_001545115.1 | Scaffold | co-culture | 33 | 100 | 1 | 42.05 | 3.42 | 3.42 | 2766 | 117 | yes |
| Ca. Protochlamydia amoebophila UWE25 | Environmental chlamydiae | GCA_000011565.1 | Chromosome | co-culture | 2 | 100 | 1 | 34.72 | 2.41 | 2.41 | 2031 | 120 | yes |
| Ca. Protochlamydia naegleriophila KNic | Environmental chlamydiae | GCA_001499655.1 | Complete Genome | co-culture | –ᵃ | 100 | 1 | 42.44 | 3.03 | – | 2575 | 113 | yes |
| Chlamydia trachomatis D/UW-3/CX | Chlamydiaceae | GCA_000008725.1 | Complete Genome | co-culture | – | 100 | 1 | 41.31 | 1.04 | – | 894 | 53 | yes |
| Chlamydia muridarum str. Nigg | Chlamydiaceae | GCA_000006685.1 | Complete Genome | co-culture | – | 100 | 1 | 40.31 | 1.08 | – | 911 | 45.5 | yes |
| Chlamydia suis MD56 | Chlamydiaceae | GCA_000493885.1 | Scaffold | co-culture | 47 | 100 | 1 | 42.01 | 1.08 | 1.08 | 931 | 51 | yes |
| Chlamydophila pecorum E58 | Chlamydiaceae | GCA_000204135.1 | Complete Genome | co-culture | – | 100 | 1 | 41.08 | 1.11 | – | 988 | 28 | yes |
| Chlamydia sp. 2742-308 | Chlamydiaceae | GCA_001653975.1 | Chromosome | co-culture | 2 | 100 | 1 | 38.50 | 1.12 | 1.12 | 1004 | 44 | yes |
| Chlamydophila pneumoniae CWL029 | Chlamydiaceae | GCA_000008745.1 | Complete Genome | co-culture | – | 100 | 1 | 40.58 | 1.23 | – | 1052 | 55 | yes |
| Chlamydia ibidis 10-1398/6 | Chlamydiaceae | GCA_000454725.1 | Contig | co-culture | 4 | 100 | 1 | 38.32 | 1.15 | 1.15 | 1018 | 50 | yes |
| Chlamydia avium 10DC88 | Chlamydiaceae | GCA_000583875.1 | Complete Genome | co-culture | – | 100 | 1 | 36.88 | 1.05 | – | 947 | 31 | yes |
| Chlamydia gallinacea 08-1274/3 | Chlamydiaceae | GCA_000471025.2 | Complete Genome | co-culture | – | 99 | 1 | 37.90 | 1.07 | – | 900 | 40.5 | yes |
| Chlamydia felis Fe/C-56 | Chlamydiaceae | GCA_000009945.1 | Complete Genome | co-culture | – | 100 | 1 | 39.34 | 1.17 | – | 1013 | 39.5 | yes |
| Chlamydophila caviae GPIC | Chlamydiaceae | GCA_000007605.1 | Complete Genome | co-culture | – | 100 | 1 | 39.19 | 1.18 | – | 1005 | 48 | yes |
| Chlamydia abortus S26/3 | Chlamydiaceae | GCA_000026025.1 | Complete Genome | co-culture | – | 100 | 1 | 39.87 | 1.14 | – | 932 | 45 | yes |
| Chlamydia psittaci 6BC | Chlamydiaceae | GCA_000204255.1 | Complete Genome | co-culture | – | 100 | 1 | 39.02 | 1.18 | – | 1009 | 42 | yes |
| Other Chlamydiae Species Representatives (Released Between February 2017 and April 2018) | | | | | | | | | | | | | |
| Ca. Similichlamydia epinephelii GCCT14 | Unresolved | GCA_003056015.1 | Scaffold | infected gill tissue metagenome (ref. 2) | 170 | 80 | 1.4 | 39.54 | 0.98 | 0.71 | 940 | 36 | yes |
| Rhabdochlamydia helvetica T3358 | CC-II | Pillonel et al. 2018 | Contig | tick metagenome (ref. 3) | 38 | 99 | 1 | 36.18 | 1.83 | 1.85 | 1692 | 43 | yes |
| Chlamydiae bacterium CG10_big_fil_rev_8_21_14_0_10_42_34 | CC-III | GCA_002773795.1 | Scaffold | cold CO2 driven geyser metagenome (ref. 95) | 34 | 97 | 1 | 42.36 | 1.68 | 1.73 | 1581 | 21.5 | no |
| Chlamydiae bacterium CG10_big_fil_rev_8_21_14_0_10_35_9 | Unresolved | GCA_002773835.1 | Scaffold | cold CO2 driven geyser metagenome (ref. 95) | 108 | 85 | 1.04 | 35.13 | 1.74 | 1.96 | 1720 | 24 | no |
| Chlamydiales bacterium SCGC AB-751-O23 | Unresolved | GCA_900093645.1 | Scaffold | single-cell from marine water (ref. 36) | 89 | 42 | 1 | 35.45 | 0.99 | 2.37 | 876 | 30.5 | yes |
| Waddliaceae bacterium SP13 | Unresolved | GCA_002709385.1 | Contig | marine water metagenome (ref. 37) | 49 | 97 | 1 | 38.48 | 3.15 | 3.24 | 2460 | 65 | yes |
| Chlamydiales bacterium SCGC AG-110-P3 | Environmental chlamydiae | GCA_900093655.1 | Scaffold | single-cell from marine water (ref. 36) | 102 | 50 | 1 | 46.83 | 1.30 | 2.58 | 1235 | 84 | yes |
| Chlamydiales bacterium SCGC AG-110-M15 | Unresolved | GCA_900093625.1 | Scaffold | single-cell from marine water (ref. 36) | 59 | 50 | 1 | 41.80 | 0.93 | 1.84 | 851 | 53 | no |
| Parachlamydia sp. BC.030 | Environmental chlamydiae | GCA_002786175.1 | Contig | urban drinking water system metagenome (ref. 89) | 39 | 100 | 1 | 41.53 | 3.04 | 3.04 | 2540 | 68 | no |
| Ca. Chlamydia corallus G3/2742-324 | Chlamydiaceae | GCA_002817655.1 | Contig | snake choana metagenome (ref. 97) | 7 | 100 | 1.01 | 39.28 | 1.20 | 1.19 | 996 | 48.5 | yes |
| Non-representative Chlamydiae Included in Select Analyses | | | | | | | | | | | | | |
| Chlamydiae bacterium GWA2_50_15 | CC-II | GCA_001796065.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 56 | 94 | 1 | 49.34 | 1.18 | 1.26 | 993 | 29.5 | yes |
| Chlamydiae bacterium GWC2_50_10 | CC-II | GCA_001796095.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 135 | 83 | 1 | 48.94 | 1.17 | 1.41 | 966 | 31 | yes |
| Chlamydiae bacterium GWF2_49_8 | CC-II | GCA_001796105.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 280 | 71 | 1 | 49.23 | 1.02 | 1.44 | 760 | 26 | no |
| Chlamydiae bacterium RIFCSPHIGHO2_02_FULL_49_29 | CC-II | GCA_001796185.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 85 | 91 | 1 | 49.08 | 1.39 | 1.52 | 1187 | 31 | no |
| Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_49_32 | CC-II | GCA_001796175.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 94 | 89 | 1 | 48.91 | 1.40 | 1.56 | 1190 | 29 | yes |
| Chlamydiae bacterium RIFCSPLOWO2_12_FULL_49_12 | CC-II | GCA_001796315.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 70 | 90 | 1 | 49.16 | 1.42 | 1.57 | 1224 | 34.5 | yes |
| Chlamydiae bacterium RIFCSPHIGHO2_01_FULL_44_39 | CC-III | GCA_001794865.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 32 | 99 | 1 | 44.72 | 1.57 | 1.58 | 1466 | 23 | yes |
| Chlamydiae bacterium RIFCSPHIGHO2_02_FULL_45_9 | CC-III | GCA_001796125.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 222 | 80 | 1 | 44.66 | 1.34 | 1.69 | 1156 | 26 | no |
| Chlamydiae bacterium RIFCSPLOWO2_01_FULL_44_52 | CC-III | GCA_001796235.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 30 | 99 | 1 | 44.74 | 1.54 | 1.55 | 1438 | 24 | yes |
| Chlamydiae bacterium RIFCSPLOWO2_12_FULL_45_20 | CC-III | GCA_001796285.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 29 | 99 | 1 | 44.75 | 1.54 | 1.56 | 1443 | 23 | yes |
| Chlamydiae bacterium RIFCSPHIGHO2_12_FULL_44_59 | CC-III | GCA_001794895.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 30 | 99 | 1 | 44.72 | 1.57 | 1.58 | 1470 | 23.5 | yes |
| Chlamydiae bacterium RIFCSPLOWO2_01_FULL_28_7 | CC-IV | GCA_001796195.1 | Scaffold | aquifer groundwater metagenome (ref. 96) | 193 | 62 | 1 | 28.07 | 0.71 | 1.14 | 572 | 31 | no |
| Chlamydia sp. 32-24 | Environmental chlamydiae | GCA_001897185.1 | Scaffold | thiocyanate bioreactor metagenome (ref. 87) | 100 | 99 | 1 | 32.42 | 2.53 | 2.55 | 2075 | 56 | no |
| Neochlamydia sp. S13 | Environmental chlamydiae | GCA_000648235.1 | Contig | co-culture | 1342 | 99 | 1.07 | 38.03 | 3.19 | 2.99 | 2232 | 161 | yes |
| Neochlamydia sp. TUME1 | Environmental chlamydiae | GCA_000813645.1 | Contig | co-culture | 254 | 100 | 1 | 38.02 | 2.55 | 2.55 | 2344 | 121 | yes |
| Parachlamydia acanthamoebae OEW1 | Environmental chlamydiae | GCA_000812225.1 | Contig | co-culture | 162 | 97 | 1 | 39.04 | 3.01 | 3.09 | 2755 | 53 | yes |
| Parachlamydia acanthamoebae Bn9 | Environmental chlamydiae | GCA_000875975.1 | Contig | co-culture | 72 | 99 | 1 | 38.94 | 3.00 | 3.03 | 2409 | 60 | yes |
| Parachlamydia acanthamoebae Hall's coccus | Environmental chlamydiae | GCA_000176075.1 | Contig | co-culture | 95 | 98 | 1 | 38.97 | 2.97 | 3.02 | 2809 | 54 | no |
| Ca. Protochlamydia amoebophila EI2 | Environmental chlamydiae | GCA_000813625.1 | Contig | co-culture | 178 | 96 | 1 | 34.82 | 2.40 | 2.51 | 2149 | 99 | yes |
| Ca. Protochlamydia massiliensis (ex. Chlamydia sp. 'Diamant') | Environmental chlamydiae | GCA_000751535.1 | Contig | co-culture | 5 | 100 | 1 | 42.75 | 2.96 | 2.96 | 2451 | 110 | yes |
| Ca. Protochlamydia sp. R18 str. S13 | Environmental chlamydiae | GCA_000648255.1 | Contig | co-culture | 795 | 100 | 1.02 | 34.74 | 2.72 | 2.67 | 2017 | 110 | yes |
| Ca. Protochlamydia sp. W-9 | Environmental chlamydiae | GCA_001950615.1 | Contig | co-culture | 402 | 100 | 1 | 34.43 | 2.48 | 2.48 | 1817 | 109 | yes |

ᵃ Not applicable (–): complete genome.
ᵇ Estimated using micomplete (see Methods) with the Chlamydiae-specific single-copy gene set (Supplementary Table 7).
ᶜ Open Reading Frames (ORFs).
Supplementary Table 4. Number of 16S/18S rRNA gene fragments identified per phylum in marine sediment sample metagenomes.
| Phylum | Domain | GS10_PC15_1060 | GS10_PC15_1000 | GS10_PC15_940 | GS08_GC12_126 |
|---|---|---|---|---|---|
| Chloroplast | – | 0 | 0 | 1 | 0 |
| Platyhelminthes | Eukaryota | 1 | 0 | 0 | 0 |
| Chordata | Eukaryota | 0 | 1 | 2 | 0 |
| Abeoformidae | Eukaryota | 0 | 0 | 2 | 0 |
| Chlorophyta | Eukaryota | 0 | 0 | 1 | 0 |
| Bacteroidetes | Bacteria | 8 | 6 | 9 | 3 |
| Marinimicrobia (SAR406 clade) | Bacteria | 4 | 3 | 3 | 3 |
| Gemmatimonadetes | Bacteria | 3 | 2 | 3 | 2 |
| Fibrobacteres | Bacteria | 3 | 2 | 1 | 1 |
| Caldithrix phylum incertae sedis | Bacteria | 2 | 3 | 0 | 1 |
| Ca. Latescibacteria (WS3) | Bacteria | 4 | 4 | 2 | 2 |
| Cloacimonetes | Bacteria | 1 | 0 | 0 | 0 |
| Ca. Zixibacteria (RBG-1) | Bacteria | 5 | 3 | 0 | 5 |
| GN01 | Bacteria | 0 | 2 | 0 | 0 |
| Ignavibacteriae | Bacteria | 0 | 0 | 1 | 0 |
| Proteobacteria | Bacteria | 51 | 22 | 30 | 14 |
| Firmicutes | Bacteria | 0 | 1 | 1 | 0 |
| Actinobacteria | Bacteria | 17 | 18 | 21 | 8 |
| Chloroflexi | Bacteria | 63 | 50 | 22 | 92 |
| Deinococcus-Thermus | Bacteria | 1 | 2 | 0 | 0 |
| Armatimonadetes | Bacteria | 1 | 1 | 1 | 0 |
| WS1 | Bacteria | 3 | 2 | 1 | 5 |
| Spirochaetes | Bacteria | 9 | 8 | 7 | 3 |
| Chlamydiae | Bacteria | 16 | 17 | 12 | 6 |
| Planctomycetes | Bacteria | 69 | 45 | 55 | 31 |
| Lentisphaerae | Bacteria | 1 | 0 | 0 | 0 |
| Ca. Omnitrophica (OP3) | Bacteria | 7 | 1 | 0 | 11 |
| Epsilonbacteraeota | Bacteria | 3 | 0 | 3 | 0 |
| Acidobacteria | Bacteria | 6 | 5 | 5 | 0 |
| Ca. Acetothermia (OP1) | Bacteria | 0 | 4 | 0 | 1 |
| Nitrospinae | Bacteria | 0 | 0 | 0 | 1 |
| Elusimicrobia | Bacteria | 2 | 1 | 0 | 1 |
| Ca. Hydrogenedentes (NKB19) | Bacteria | 1 | 0 | 2 | 1 |
| Ca. Atribacteria | Bacteria | 0 | 0 | 2 | 0 |
| Ca. Saccharibacteria (TM7) | Bacteria | 1 | 1 | 0 | 0 |
| Ca. Parcubacteria | Bacteria | 49 | 22 | 13 | 11 |
| CPR2 | Bacteria | 2 | 0 | 0 | 0 |
| Ca. Microgenomates (OP11) | Bacteria | 9 | 6 | 4 | 9 |
| Ca. Berkelbacteria | Bacteria | 2 | 2 | 1 | 1 |
| Ca. Gracilibacteria | Bacteria | 2 | 0 | 0 | 0 |
| Ca. Peregrinibacteria | Bacteria | 4 | 0 | 0 | 1 |
| Ca. Katanobacteria (WWE3) | Bacteria | 1 | 0 | 1 | 0 |
| Ca. Absconditabacteria (SR1) | Bacteria | 0 | 1 | 0 | 0 |
| MD2896-B216 | Bacteria | 2 | 1 | 0 | 0 |
| Unknown CPR phylum 1 | Bacteria | 1 | 1 | 0 | 0 |
| Unknown CPR phylum 2 | Bacteria | 1 | 0 | 0 | 0 |
| Unknown CPR phylum 3 | Bacteria | 1 | 0 | 0 | 0 |
| Poribacteria | Bacteria | 4 | 1 | 0 | 3 |
| Halanaerobiales phylum incertae sedis | Bacteria | 4 | 5 | 0 | 2 |
| Ca. Aminicenantes (OP8) | Bacteria | 3 | 3 | 1 | 1 |
| Ca. Dependentiae (TM6) | Bacteria | 42 | 17 | 18 | 6 |
| Ca. Aerophobetes (CD12) | Bacteria | 6 | 6 | 1 | 3 |
| NC10 | Bacteria | 1 | 0 | 0 | 1 |
| Nitrospirae | Bacteria | 2 | 1 | 5 | 0 |
| BRC1 | Bacteria | 1 | 1 | 5 | 0 |
| Euryarchaeota | Archaea | 1 | 1 | 0 | 4 |
| Unknown Euryarchaeota (superphylum) phylum 1 | Archaea | 0 | 0 | 1 | 0 |
| Thaumarchaeota | Archaea | 2 | 8 | 4 | 1 |
| Ca. Bathyarchaeota | Archaea | 2 | 2 | 0 | 1 |
| Group C3 | Archaea | 2 | 4 | 0 | 0 |
| Ca. Woesearchaeota | Archaea | 20 | 11 | 30 | 8 |
| Ca. Diapherotrites | Archaea | 3 | 0 | 0 | 0 |
| Ca. Parvarchaeota | Archaea | 1 | 0 | 0 | 0 |
| Unknown DPANN phylum 1 | Archaea | 1 | 0 | 0 | 0 |
| Ca. Aenigmarchaeota | Archaea | 0 | 0 | 0 | 1 |
| Ca. Altiarchaeota | Archaea | 0 | 0 | 0 | 1 |
| Ca. Lokiarchaeota | Archaea | 5 | 10 | 1 | 3 |
| TOTAL | ALL | 456 | 307 | 271 | 248 |
Supplementary Table 5. PCR primer pairs, taxonomic coverage and reaction conditions.
Primer Pair Taxonomic Target Taxonomic Coverage (No Mismatches)a Taxonomic Coverage (One Mismatch)a PCR Amplication Reaction Conditionsb Chla-310-a-20 · 0% of Eukaryota · 0% of Eukaryota (CGCCAACAYTGGGACTGAGA) · 0% of Archaea · 0% of Archaea · 15 min of polymerase heat activation at 95 °C and · 0% of Bacteria · 0% of Bacteria · 35 cycles of 94 °C (60 s), 60 °C (60 s) and 72 °C (60 S-*-Univ-1100-a-A-15 Chlamydiae · 99 % of characterized Chlamydiae · 96% of characterized Chlamydiae (0.1% s) · final 98 (4.5 % of Marinimicrobia and small (GGGTYKCGCTCGTTR) of Armatimonadetes, no additional bacterial extension at 72 °C (10 min) percentages (0.01-0.81 %) of additional phyla) bacterial phyla) 574*f · 15 min of polymerase heat activation at 95 °C (CGGTAAYTCCAGCTCYV)99 and · 88 % of Eukaryota · 94% of Eukaryota · 35 cycles of 94 °C (60 s), a step-down to 70 °C (1 1132 Eukarya · 0% of Archaea · 10% of Archaea s), followed by a ramping rate of 0.4 °C/s to 50 °C (60 (CCGTCAATTHCTTYAART)99 · 0% of Bacteria · 0% of Bacteria s), and a ramping rate of 0.8 °C/s to 72 °C (60 s) · final extension at 72 °C (10 min) S-D-0564-a-S-15 · 15 min of polymerase heat activation at 95 °C · 0% of Eukaryota · 0% of Eukaryota 98 · 28 cycles of 94 °C (60 s), a step-down to 70 °C (1 (AYTGGGYDTAAAGNG) and S- · 0.2% of Archaea · 4.6% of Archaea Bacteria s), followed by a ramping rate of 0.4 °C/s to 50 °C (60 D-Bact-1061-a-A-17 · 92% of Bacteria · 97% of Bacteria 98 s), and a ramping rate of 0.8 °C/s to 72 °C (60 s) (CRRCACGAGCTGACGAC) · 94% of characterized Chlamydiae · 99% of characterized Chlamydiae · final extension at 72 °C (10 min) A519F · 86% of Eukaryota · 93% of Eukaryota (CAGCMGCCGCGGTAA)100 and Eukarya, Archaea · 70% of Archaea · 74% of Archaea not used in this study Uni1391R and Bacteria · 86% of Bacteria · 91% of Bacteria (ACGGGCGGTGWGTRC)93 · 0.7% of characterized Chlamydiae · 93% of characterized Chlamydiae Earth Microbiome Project92 · 0% of Eukaryota · 0% of Eukaryota primers: 515F 
Archaea and · 0% of Archaea · 0% of Archaea (GTGYCAGCMGCCGCGGTAA) not used in this study Bacteria · 92% of Bacteria · 92% of Bacteria and 806R · 0.7% of characterized Chlamydiae · 95% of characterized Chlamydiae (GGACTACNVGGGTWTCTAAT) aUsing SILVA TestPrime (Klindworth et al., 2013) with the SSU r132 database and RefNR sequence collection bUsing HotStarTaq DNA Polymerase (QIAGEN)
Supplementary Table 6. PVC outgroup genomes used in phylogenomic analyses.
| Phylum | Species Name | Genbank Accession |
|---|---|---|
| Kiritimatiellaeota | Kiritimatiella glycovorans L21-Fru-AB | GCA_001017655.1 |
| Lentisphaerae | Lentisphaerae bacterium GWF2_57_35 | GCA_001804865.1 |
| Lentisphaerae | Lentisphaerae bacterium RIFOXYC12_FULL_60_16 | GCA_001803315.1 |
| Lentisphaerae | Lentisphaera araneosa HTCC215 | GCA_000170755.1 |
| Lentisphaerae | Lentisphaerae bacterium GWF2_50_93 | GCA_001804815.1 |
| Verrucomicrobia | Coraliomargarita akajimensis DSM 45221 | GCA_000025905.1 |
| Verrucomicrobia | Opitutus terrae PB90-1 | GCA_000019965.1 |
| Verrucomicrobia | Pedosphaera parvula Ellin514 | GCA_000172555.1 |
| Verrucomicrobia | Methylacidiphilum infernorum V4 | GCA_000019665.1 |
| Verrucomicrobia | Terrimicrobium sacchariphilum NM-5 | GCA_001613545.1 |
| Verrucomicrobia | Verrucomicrobium sp. BvORR106 | GCA_000739655.1 |
| Verrucomicrobia | Akkermansia muciniphila ATCC BAA-835 | GCA_000020225.1 |
| Unclassified PVC | PVC group bacterium (ex. Bugula neritina AB1) AB1-3 | GCA_001730085.1 |
| Candidatus Omnitrophica | Ca. Omnitrophica bacterium CG1_02_41_171 | GCA_001871865.1 |
| Candidatus Omnitrophica | Ca. Omnitrophus fodinae SCGC AAA011-A17 | GCA_000405945.1 |
| Candidatus Omnitrophica | Ca. Omnitrophus magneticus SKK-01 | GCA_000954095.1 |
| Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFCSPHIGHO2_02_FULL_63_39 | GCA_001805685.1 |
| Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFCSPLOWO2_02_FULL_50_19 | GCA_001805805.1 |
| Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RIFOXYB2_FULL_38_16 | GCA_001805995.1 |
| Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium RBG_13_41_10 | GCA_001805465.1 |
| Candidatus Omnitrophica | Omnitrophica WOR_2 bacterium GWF2_43_52 | GCA_001805445.1 |
| Planctomycetes | Planctomycetes bacterium DG_23 | GCA_001302825.1 |
| Planctomycetes | Planctomycetes bacterium RIFCSPLOWO2_02_FULL_50_16 | GCA_001828565.1 |
| Planctomycetes | Ca. Scalindua brodae RU1 | GCA_000786775.1 |
| Planctomycetes | Ca. JPN1 | GCA_000949635.1 |
| Planctomycetes | Phycisphaera mikurensis NBRC 102666 | GCA_000284115.1 |
| Planctomycetes | Isosphaera pallida ATCC 43644 | GCA_000186345.1 |
| Planctomycetes | Pirellula staleyi DSM 6068 | GCA_000025185.1 |
Supplementary Table 7. Single-copy marker genes used to assess completeness and redundancy of chlamydial MAGs.
Chlamydiae Micomplete Gene/Domain Names

Aminoacyl tRNA synthetase class II, N-terminal domain
Arginyl tRNA synthetase N terminal domain
Bacterial RNA polymerase, alpha chain C terminal domain
Bacterial trigger factor protein (TF)
Bacterial trigger factor protein (TF) C-terminus
ClpX C4-type zinc finger
Conserved hypothetical protein 95
CTP synthase N-terminus
Cytidylate kinase
Dephospho-CoA kinase
DNA polymerase III beta subunit, C-terminal domain
DNA polymerase III beta subunit, central domain
DNA primase catalytic core, N-terminal domain
Double-stranded RNA binding motif
Elongation factor TS
Enolase, C-terminal TIM barrel domain
Enolase, N-terminal domain
FAD synthetase
Ferredoxin-fold anticodon binding domain
GAD domain
GrpE
GTP-binding protein LepA C-terminus
GTP1/OBG
Holliday junction DNA helicase ruvB C-terminus
IPP transferase
MraW methylase family
NusA N-terminal domain
Oligomerisation domain
Peptidyl-tRNA hydrolase
Phosphoglycerate kinase
Protein of unknown function (DUF933)
recA bacterial DNA recombination protein
Ribosomal L18p/L5e family
Ribosomal L27 protein
Ribosomal L28 family
Ribosomal L29 protein
ribosomal L5P family C-terminus
Ribosomal prokaryotic L21 protein
Ribosomal protein L10
Ribosomal protein L11, N-terminal domain
Ribosomal protein L11, RNA binding domain
Ribosomal protein L13
Ribosomal protein L14p/L23e
Ribosomal protein L16p/L10e
Ribosomal protein L17
Ribosomal protein L18e/L15
Ribosomal protein L19
Ribosomal protein L1p/L10e family
Ribosomal protein L20
Ribosomal protein L22p/L17e
Ribosomal protein L23
Ribosomal protein L3
Ribosomal protein L35
Ribosomal protein L5
Ribosomal protein L6
Ribosomal protein L9, C-terminal domain
Ribosomal protein L9, N-terminal domain
Ribosomal protein S10p/S20e
Ribosomal protein S11
Ribosomal protein S12/S23
Ribosomal protein S13/S18
Ribosomal protein S15
Ribosomal protein S16
Ribosomal protein S17
Ribosomal protein S18
Ribosomal protein S19
Ribosomal protein S2
Ribosomal protein S20
Ribosomal protein S3, C-terminal domain
Ribosomal protein S4/S9 N-terminal domain
Ribosomal protein S5, C-terminal domain
Ribosomal protein S5, N-terminal domain
Ribosomal protein S6
Ribosomal protein S7p/S5e
Ribosomal protein S8
Ribosomal protein S9/S16
Ribosomal Proteins L2, C-terminal domain
Ribosomal Proteins L2, RNA binding domain
Ribosome recycling factor
RNA polymerase beta subunit
RNA polymerase beta subunit external 1 domain
RNA polymerase Rpb1, domain 1
RNA polymerase Rpb1, domain 2
RNA polymerase Rpb1, domain 3
RNA polymerase Rpb1, domain 4
RNA polymerase Rpb1, domain 5
RNA polymerase Rpb2, domain 2
RNA polymerase Rpb2, domain 3
RNA polymerase Rpb2, domain 6
RNA polymerase Rpb2, domain 7
RNA polymerase Rpb3/Rpb11 dimerisation domain
RuvA N terminal domain
SecY translocase
Seryl-tRNA synthetase N-terminal domain
Signal peptidase (SPase) II
Signal peptide binding domain
SmpB protein
Tetrahydrofolate dehydrogenase/cyclohydrolase, NAD(P)-binding domain
Translation initiation factor 1A / IF-1
Translation initiation factor IF-3, C-terminal domain
Translation initiation factor IF-3, N-terminal domain
Translation-initiation factor 2
TRCF domain
tRNA (Guanine-1)-methyltransferase
tRNA synthetase B5 domain
tRNA synthetases class I (R)
tRNA synthetases class II core domain (F)
UDP-N-acetylenolpyruvoylglucosamine reductase, C-terminal domain
Ultra-violet resistance protein B
Uncharacterised P-loop hydrolase UPF0079
Uncharacterised protein family (UPF0081)
Uncharacterized protein family UPF0054
UvrC Helix-hairpin-helix N-terminal
Supplementary Table 8. Single-copy marker genes used for concatenation-based species tree inference.
| Marker protein sets | NOG | COG description |
|---|---|---|
| 38, 55 and 98 | COG0049 | Ribosomal protein S7 |
| 38, 55 and 98 | COG0013 | Alanyl-tRNA synthetase |
| 38, 55 and 98 | COG0016 | Phenylalanyl-tRNA synthetase alpha subunit |
| 38, 55 and 98 | COG0081 | Ribosomal protein L1 |
| 38, 55 and 98 | COG0087 | Ribosomal protein L3 |
| 38, 55 and 98 | COG0088 | Ribosomal protein L4 |
| 38, 55 and 98 | COG0090 | Ribosomal protein L2 |
| 38, 55 and 98 | COG0091 | Ribosomal protein L22 |
| 38, 55 and 98 | COG0092 | Ribosomal protein S3 |
| 38, 55 and 98 | COG0093 | Ribosomal protein L14 |
| 38, 55 and 98 | COG0094 | Ribosomal protein L5 |
| 38, 55 and 98 | COG0096 | Ribosomal protein S8 |
| 38, 55 and 98 | COG0097 | Ribosomal protein L6P/L9E |
| 38, 55 and 98 | COG0098 | Ribosomal protein S5 |
| 38, 55 and 98 | COG0103 | Ribosomal protein S9 |
| 38, 55 and 98 | COG0126 | 3-phosphoglycerate kinase |
| 38, 55 and 98 | COG0127 | Inosine/xanthosine triphosphate pyrophosphatase, all-alpha NTP-PPase family |
| 38, 55 and 98 | COG0197 | Ribosomal protein L16/L10AE |
| 38, 55 and 98 | COG0200 | Ribosomal protein L15 |
| 38, 55 and 98 | COG0201 | Preprotein translocase subunit SecY |
| 38, 55 and 98 | COG0203 | Ribosomal protein L17 |
| 38, 55 and 98 | COG0216 | Protein chain release factor A |
| 38, 55 and 98 | COG0233 | Ribosome recycling factor |
| 38, 55 and 98 | COG0244 | Ribosomal protein L10 |
| 38, 55 and 98 | COG0256 | Ribosomal protein L18 |
| 38, 55 and 98 | COG0261 | Ribosomal protein L21 |
| 38, 55 and 98 | COG0264 | Translation elongation factor EF-Ts |
| 38, 55 and 98 | COG0290 | Translation initiation factor IF-3 |
| 38, 55 and 98 | COG0331 | Malonyl CoA-acyl carrier protein transacylase |
| 38, 55 and 98 | COG0353 | Recombinational DNA repair protein RecR |
| 38, 55 and 98 | COG0468 | RecA/RadA recombinase |
| 38, 55 and 98 | COG0495 | Leucyl-tRNA synthetase |
| 38, 55 and 98 | COG0576 | Molecular chaperone GrpE (heat shock protein) |
| 38, 55 and 98 | COG0632 | Holliday junction resolvasome RuvABC DNA-binding subunit |
| 38, 55 and 98 | COG0769 | UDP-N-acetylmuramyl tripeptide synthase |
| 38, 55 and 98 | COG0815 | Apolipoprotein N-acyltransferase |
| 38, 55 and 98 | COG0817 | Holliday junction resolvasome RuvABC endonuclease subunit |
| 38, 55 and 98 | COG2255 | Holliday junction resolvasome RuvABC, ATP-dependent DNA helicase subunit |
| 55 and 98 | 0XPXW | COG0012 Ribosome-binding ATPase YchF, GTP1/OBG family |
| 55 and 98 | COG0064 | Asp-tRNAAsn/Glu-tRNAGln amidotransferase B subunit |
| 55 and 98 | COG0125 | Thymidylate kinase |
| 55 and 98 | COG0050 | Translation elongation factor EF-Tu, a GTPase |
| 55 and 98 | COG0052 | Ribosomal protein S2 |
| 55 and 98 | COG0172 | Seryl-tRNA synthetase |
| 55 and 98 | COG0240 | Glycerol-3-phosphate dehydrogenase |
| 55 and 98 | COG0292 | Ribosomal protein L20 |
| 55 and 98 | COG0359 | Ribosomal protein L9 |
| 55 and 98 | COG0504 | CTP synthase (UTP-ammonia lyase) |
| 55 and 98 | COG0536 | GTPase involved in cell partitioning and DNA repair |
| 55 and 98 | COG0544 | FKBP-type peptidyl-prolyl cis-trans isomerase (trigger factor) |
| 55 and 98 | COG0592 | DNA polymerase III sliding clamp (beta) subunit, PCNA homolog |
| 55 and 98 | COG0750 | Membrane-associated protease RseP, regulator of RpoE activity |
| 55 and 98 | COG1185 | Polyribonucleotide nucleotidyltransferase (polynucleotide phosphorylase) |
| 55 and 98 | COG1198 | Primosomal protein N' (replication factor Y) - superfamily II helicase |
| 55 and 98 | COG1530 | Ribonuclease G or E |
| 98 | COG0051 | Ribosomal protein S10 |
| 98 | COG0057 | Glyceraldehyde-3-phosphate dehydrogenase/erythrose-4-phosphate dehydrogenase |
| 98 | COG0148 | Enolase |
| 98 | COG0149 | Triosephosphate isomerase |
| 98 | COG0173 | Aspartyl-tRNA synthetase |
| 98 | COG0180 | Tryptophanyl-tRNA synthetase |
| 98 | COG0190 | 5,10-methylene-tetrahydrofolate dehydrogenase/Methenyl tetrahydrofolate cyclohydrolase |
| 98 | COG0195 | Transcription antitermination factor NusA, contains S1 and KH domains |
| 98 | COG0196 | FAD synthase |
| 98 | COG0215 | Cysteinyl-tRNA synthetase |
| 98 | COG0217 | Transcriptional and/or translational regulatory protein YebC/TACO1 |
| 98 | COG0249 | DNA mismatch repair ATPase MutS |
| 98 | COG0250 | Transcription antitermination factor NusG |
| 98 | COG0272 | NAD-dependent DNA ligase |
| 98 | COG0283 | Cytidylate kinase |
| 98 | COG0289 | Dihydrodipicolinate reductase |
| 98 | COG0322 | Excinuclease UvrABC, nuclease subunit |
| 98 | COG0323 | DNA mismatch repair ATPase MutL |
| 98 | COG0324 | tRNA A37 N6-isopentenyltransferase MiaA |
| 98 | COG0335 | Ribosomal protein L19 |
| 98 | COG0445 | tRNA U34 5-carboxymethylaminomethyl modifying enzyme MnmG/GidA |
| 98 | COG0449 | Glucosamine 6-phosphate synthetase, contains amidotransferase and phosphosugar isomerase domains |
| 98 | COG0481 | Translation elongation factor EF-4, membrane-bound GTPase |
| 98 | COG0511 | Biotin carboxyl carrier protein |
| 98 | COG0522 | Ribosomal protein S4 or related protein |
| 98 | COG0525 | Valyl-tRNA synthetase |
| 98 | COG0532 | Translation initiation factor IF-2, a GTPase |
| 98 | COG0541 | Signal recognition particle GTPase |
| 98 | COG0552 | Signal recognition particle GTPase |
| 98 | COG0575 | CDP-diglyceride synthetase |
| 98 | COG0749 | DNA polymerase I - 3'-5' exonuclease and polymerase domains |
| 98 | COG0774 | UDP-3-O-acyl-N-acetylglucosamine deacetylase |
| 98 | COG0825 | Acetyl-CoA carboxylase alpha subunit |
| 98 | COG0858 | Ribosome-binding factor A |
| 98 | COG1044 | UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase |
| 98 | COG1137 | ABC-type lipopolysaccharide export system, ATPase component |
| 98 | COG1160 | Predicted GTPases |
| 98 | COG1570 | Exonuclease VII, large subunit |
| 98 | COG1663 | Tetraacyldisaccharide-1-P 4'-kinase |
| 98 | COG1825 | Ribosomal protein L25 (general stress protein Ctc) |
| 98 | COG2877 | 3-deoxy-D-manno-octulosonic acid (KDO) 8-phosphate synthase |
| 98 | COG2890 | Methylase of polypeptide chain release factors |
Supplementary Data Descriptions
Supplementary Data 1. Chlamydiae 16S rRNA amplicon OTUs in FASTA format.
Supplementary Data 2. Relative abundance of Chlamydiae OTUs across Loki’s Castle marine sediments.
Supplementary Data 3. Pathway overviews, selected gene annotations and raw data.
Tab 1. Presence and absence of bacterial-level NOGs across Chlamydiae
Tab 2. EffectiveDB results
Tab 3. Overview of KEGG pathways and their presence across Chlamydiae, including central carbon metabolism, carbon fixation, and amino acid and nucleotide biosynthesis
Tab 4. Secretion systems and flagellar components identified by MacSyFinder
Tab 5. Selected gene annotations
Tab 6. IMNGS results
Supplementary Data 4. Unprocessed phylogenetic trees presented in this study.
Supplementary References
1 Pillonel, T., Bertelli, C., Salamin, N. & Greub, G. Taxogenomics of the order Chlamydiales. Int J Syst Evol Microbiol 65, 1381-1393, doi:10.1099/ijs.0.000090 (2015).
2 Taylor-Brown, A. et al. Culture-independent genomics of a novel chlamydial pathogen of fish provides new insight into host-specific adaptations utilized by these intracellular bacteria. Environ Microbiol 19, 1899-1913, doi:10.1111/1462-2920.13694 (2017).
3 Pillonel, T., Bertelli, C. & Greub, G. Environmental Metagenomic Assemblies Reveal Seven New Highly Divergent Chlamydial Lineages and Hallmarks of a Conserved Intracellular Lifestyle. Front Microbiol 9, 79, doi:10.3389/fmicb.2018.00079 (2018).
4 Taylor-Brown, A. et al. Metagenomic analysis of fish-associated Ca. Parilichlamydiaceae reveals striking metabolic similarities to the terrestrial Chlamydiaceae. Genome Biol Evol, doi:10.1093/gbe/evy195 (2018).
5 Lartillot, N. & Philippe, H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21, 1095-1109, doi:10.1093/molbev/msh112 (2004).
6 Stride, M. C. et al. Molecular Characterization of "Candidatus Parilichlamydia carangidicola," a Novel Chlamydia-Like Epitheliocystis Agent in Yellowtail Kingfish, Seriola lalandi (Valenciennes), and the Proposal of a New Family, "Candidatus Parilichlamydiaceae" fam. nov. (Order Chlamydiales). Applied and Environmental Microbiology 75, doi:10.1128/AEM.02899-12 (2012).
7 Vouga, M., Baud, D. & Greub, G. Simkania negevensis, an insight into the biology and clinical importance of a novel member of the Chlamydiales order. Crit Rev Microbiol 43, 62-80, doi:10.3109/1040841X.2016.1165650 (2017).
8 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, 13219, doi:10.1038/ncomms13219 (2016).
9 Baker, B. J., Lazar, C. S., Teske, A. P. & Dick, G. J. Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome 3, 14, doi:10.1186/s40168-015-0077-6 (2015).
10 Horn, M. Chlamydiae as symbionts in eukaryotes. Annu Rev Microbiol 62, 113-131, doi:10.1146/annurev.micro.62.081307.162818 (2008).
11 Taylor-Brown, A., Vaughan, L., Greub, G., Timms, P. & Polkinghorne, A. Twenty years of research into Chlamydia-like organisms: a revolution in our understanding of the biology and pathogenicity of members of the phylum Chlamydiae. Pathog Dis 73, 1-15 (2015).
12 Baud, D., Thomas, V., Arafa, A., Regan, L. & Greub, G. Waddlia chondrophila, a potential agent of human fetal death. Emerg Infect Dis 13, 1239-1243, doi:10.3201/eid1308.070315 (2007).
13 Moliner, C., Fournier, P. E. & Raoult, D. Genome analysis of microorganisms living in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev 34, 281-294, doi:10.1111/j.1574-6976.2010.00209.x (2010).
14 Nunes, A. & Gomes, J. P. Evolution, phylogeny, and molecular epidemiology of Chlamydia. Infect Genet Evol 23, 49-64 (2014).
15 Elwell, C., Mirrashidi, K. & Engel, J. Chlamydia cell biology and pathogenesis. Nat Rev Microbiol 14, 385-400, doi:10.1038/nrmicro.2016.30 (2016).
16 Horn, M. et al. Illuminating the evolutionary history of chlamydiae. Science 304, 728-730, doi:10.1126/science.1096330 (2004).
17 Collingro, A. et al. Unity in variety--the pan-genome of the Chlamydiae. Mol Biol Evol 28, 3253-3270 (2011).
18 Subtil, A., Collingro, A. & Horn, M. Tracing the primordial Chlamydiae: extinct parasites of plants? Trends Plant Sci 19, 36-43, doi:10.1016/j.tplants.2013.10.005 (2014).
19 Omsland, A., Sixt, B. S., Horn, M. & Hackstadt, T. Chlamydial metabolism revisited: interspecies metabolic variability and developmental stage-specific physiologic activities. FEMS Microbiol Rev 38, 779-801, doi:10.1111/1574-6976.12059 (2014).
20 Stone, C. B., Bulir, D. C., Gilchrist, J. D., Toor, R. K. & Mahony, J. B. Interactions between flagellar and type III secretion proteins in Chlamydia pneumoniae. BMC Microbiol 10, 18, doi:10.1186/1471-2180-10-18 (2010).
21 Birkelund, S. et al. Analysis of proteins in Chlamydia trachomatis L2 outer membrane complex, COMC. FEMS Immunol Med Microbiol 55, 187-195, doi:10.1111/j.1574-695X.2009.00522.x (2009).
22 Bliven, K. A., Fisher, D. J. & Maurelli, A. T. Characterization of the activity and expression of arginine decarboxylase in human and animal Chlamydia pathogens. FEMS Microbiol Lett 337, 140-146, doi:10.1111/1574-6968.12021 (2012).
23 Ponting, C. P. Chlamydial homologues of the MACPF (MAC/perforin) domain. Curr Biol 9, R911-913 (1999).
24 Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res 42, D222-230, doi:10.1093/nar/gkt1223 (2014).
25 Muschiol, S. et al. Identification of a family of effectors secreted by the type III secretion system that are conserved in pathogenic Chlamydiae. Infect Immun 79, 571-580, doi:10.1128/IAI.00825-10 (2011).
26 Hobolt-Pedersen, A. S., Christiansen, G., Timmerman, E., Gevaert, K. & Birkelund, S. Identification of Chlamydia trachomatis CT621, a protein delivered through the type III secretion system to the host cell cytoplasm and nucleus. FEMS Immunol Med Microbiol 57, 46-58, doi:10.1111/j.1574-695X.2009.00581.x (2009).
27 Chellas-Gery, B., Linton, C. N. & Fields, K. A. Human GCIP interacts with CT847, a novel Chlamydia trachomatis type III secretion substrate, and is degraded in a tissue-culture infection model. Cell Microbiol 9, 2417-2430, doi:10.1111/j.1462-5822.2007.00970.x (2007).
28 da Cunha, M. et al. Identification of type III secretion substrates of Chlamydia trachomatis using Yersinia enterocolitica as a heterologous system. BMC Microbiology 14, 40, doi:10.1186/1471-2180-14-40 (2014).
29 Abby, S. S., Neron, B., Menager, H., Touchon, M. & Rocha, E. P. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems. PLoS One 9, e110726, doi:10.1371/journal.pone.0110726 (2014).
30 Abby, S. S. et al. Identification of protein secretion systems in bacterial genomes. Sci Rep 6, 23080, doi:10.1038/srep23080 (2016).
31 Peabody, C. R. et al. Type II protein secretion and its relationship to bacterial type IV pili and archaeal flagella. Microbiology 149, 3051-3072, doi:10.1099/mic.0.26364-0 (2003).
32 Korotkov, K. V., Sandkvist, M. & Hol, W. G. The type II secretion system: biogenesis, molecular architecture and mechanism. Nat Rev Microbiol 10, 336-351, doi:10.1038/nrmicro2762 (2012).
33 Mueller, K. E., Plano, G. V. & Fields, K. A. New frontiers in type III secretion biology: the Chlamydia perspective. Infect Immun 82, 2-9, doi:10.1128/IAI.00917-13 (2014).
34 Dumoux, M., Nans, A., Saibil, H. R. & Hayward, R. D. Making connections: snapshots of chlamydial type III secretion systems in contact with host membranes. Curr Opin Microbiol 23, 1-7, doi:10.1016/j.mib.2014.09.019 (2015).
35 Abby, S. S. & Rocha, E. P. The non-flagellar type III secretion system evolved from the bacterial flagellum and diversified into host-cell adapted systems. PLoS Genet 8, e1002983, doi:10.1371/journal.pgen.1002983 (2012).
36 Collingro, A. et al. Unexpected genomic features in widespread intracellular bacteria: evidence for motility of marine chlamydiae. ISME J 11, 2334-2344, doi:10.1038/ismej.2017.95 (2017).
37 Tully, B. J., Graham, E. D. & Heidelberg, J. F. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci Data 5, 170203, doi:10.1038/sdata.2017.203 (2018).
38 Eichinger, V. et al. EffectiveDB--updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. Nucleic Acids Res 44, D669-674, doi:10.1093/nar/gkv1269 (2016).
39 Sait, M. et al. Genomic and Experimental Evidence Suggests that Verrucomicrobium spinosum Interacts with Eukaryotes. Front Microbiol 2, 211, doi:10.3389/fmicb.2011.00211 (2011).
40 Martinez-Garcia, P. M., Ramos, C. & Rodriguez-Palenzuela, P. T346Hunter: a novel web-based tool for the prediction of type III, type IV and type VI secretion systems in bacterial genomes. PLoS One 10, e0119317, doi:10.1371/journal.pone.0119317 (2015).
41 Pilhofer, M. et al. Architecture and host interface of environmental chlamydiae revealed by electron cryotomography. Environ Microbiol 16, 417-429, doi:10.1111/1462-2920.12299 (2014).
42 Nans, A., Kudryashev, M., Saibil, H. R. & Hayward, R. D. Structure of a bacterial type III secretion system in contact with a host membrane in situ. Nat Commun 6, 10114, doi:10.1038/ncomms10114 (2015).
43 Konig, L. et al. Biphasic Metabolism and Host Interaction of a Chlamydial Symbiont. mSystems 2, doi:10.1128/mSystems.00202-16 (2017).
44 Beder, T. & Saluz, H. P. Virulence-related comparative transcriptomics of infectious and non-infectious chlamydial particles. BMC Genomics 19, 575, doi:10.1186/s12864-018-4961-x (2018).
45 Cosse, M. M., Hayward, R. D. & Subtil, A. One Face of Chlamydia trachomatis: The Infectious Elementary Body. Curr Top Microbiol Immunol 412, 35-58, doi:10.1007/82_2016_12 (2018).
46 Gallique, M., Bouteiller, M. & Merieau, A. The Type VI Secretion System: A Dynamic System for Bacterial Communication? Front Microbiol 8, 1454, doi:10.3389/fmicb.2017.01454 (2017).
47 Cao, Z., Casabona, M. G., Kneuper, H., Chalmers, J. D. & Palmer, T. The type VII secretion system of Staphylococcus aureus secretes a nuclease toxin that targets competitor bacteria. Nat Microbiol 2, 16183, doi:10.1038/nmicrobiol.2016.183 (2016).
48 Aoki, S. K. et al. A widespread family of polymorphic contact-dependent toxin delivery systems in bacteria. Nature 468, 439-442, doi:10.1038/nature09490 (2010).
49 Souza, D. P. et al. Bacterial killing via a type IV secretion system. Nat Commun 6, 6453, doi:10.1038/ncomms7453 (2015).
50 Willett, J. L., Ruhe, Z. C., Goulding, C. W., Low, D. A. & Hayes, C. S. Contact-Dependent Growth Inhibition (CDI) and CdiB/CdiA Two-Partner Secretion Proteins. J Mol Biol 427, 3754-3765, doi:10.1016/j.jmb.2015.09.010 (2015).
51 Garcia-Bayona, L., Guo, M. S. & Laub, M. T. Contact-dependent killing by Caulobacter crescentus via cell surface-associated, glycine zipper proteins. Elife 6, doi:10.7554/eLife.24869 (2017).
52 Lucas, C. E., Brown, E. & Fields, B. S. Type IV pili and type II secretion play a limited role in Legionella pneumophila biofilm colonization and retention. Microbiology 152, 3569-3573, doi:10.1099/mic.0.2006/000497-0 (2006).
53 Ishida, K. et al. Amoebal endosymbiont Neochlamydia genome sequence illuminates the bacterial role in the defense of the host amoebae against Legionella pneumophila. PLoS One 9, e95166, doi:10.1371/journal.pone.0095166 (2014).
54 Stewart, C. R., Rossier, O. & Cianciotto, N. P. Surface translocation by Legionella pneumophila: a form of sliding motility that is dependent upon type II protein secretion. J Bacteriol 191, 1537-1546, doi:10.1128/JB.01531-08 (2009).
55 Pallen, M. J., Beatson, S. A. & Bailey, C. M. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiol Rev 29, 201-229, doi:10.1016/j.femsre.2005.01.001 (2005).
56 Tjaden, J. et al. Two Nucleotide Transport Proteins in Chlamydia trachomatis, One for Net Nucleoside Triphosphate Uptake and the Other for Transport of Energy. Journal of Bacteriology 181, 1196-1202 (1999).
57 Neuhaus, H. E., Thom, E., Möhlmann, T., Steup, M. & Kampfenkelz, K. Characterization of a novel eukaryotic ATP/ADP translocator located in the plastid envelope of Arabidopsis thaliana L. The Plant Journal 11, 73-82 (1997).
58 Tjaden, J., Schwöppe, C., Möhlmann, T., Quick, P. W. & Neuhaus, H. E. Expression of a Plastidic ATP/ADP Transporter Gene in Escherichia coli Leads to a Functional Adenine Nucleotide Transport System in the Bacterial Cytoplasmic Membrane. The Journal of Biological Chemistry 273, 9630-9636 (1998).
59 Schmitz-Esser, S. et al. ATP/ADP Translocases: a Common Feature of Obligate Intracellular Amoebal Symbionts Related to Chlamydiae and Rickettsiae. Journal of Bacteriology 186, 683-691, doi:10.1128/jb.186.3.683-691.2004 (2004).
60 Major, P., Embley, T. M. & Williams, T. A. Phylogenetic Diversity of NTT Nucleotide Transport Proteins in Free-Living and Parasitic Bacteria and Eukaryotes. Genome Biol Evol 9, 480-487, doi:10.1093/gbe/evx015 (2017).
61 Gould, S. B., Waller, R. F. & McFadden, G. I. Plastid evolution. Annu Rev Plant Biol 59, 491-517, doi:10.1146/annurev.arplant.59.032607.092915 (2008).
62 Amiri, H., Karlberg, O. & Andersson, S. G. Deep origin of plastid/parasite ATP/ADP translocases. J Mol Evol 56, 137-150, doi:10.1007/s00239-002-2387-0 (2003).
63 Greub, G. & Raoult, D. History of the ADP/ATP-Translocase-Encoding Gene, a Parasitism Gene Transferred from a Chlamydiales Ancestor to Plants 1 Billion Years Ago. Applied and Environmental Microbiology 69, 5530-5535, doi:10.1128/aem.69.9.5530-5535.2003 (2003).
64 Knab, S., Mushak, T. M., Schmitz-Esser, S., Horn, M. & Haferkamp, I. Nucleotide parasitism by Simkania negevensis (Chlamydiae). J Bacteriol 193, 225-235, doi:10.1128/JB.00919-10 (2011).
65 Fisher, D. J., Fernández, R. E. & Maurelli, A. T. Chlamydia trachomatis Transports NAD via the Npt1 ATP/ADP Translocase. Journal of Bacteriology 195, 3381-3386 (2013).
66 Stephens, R. S. et al. Genome Sequence of an Obligate Intracellular Pathogen of Humans: Chlamydia trachomatis. Science 282, 754-759 (1998).
67 Haferkamp, I. et al. A candidate NAD transporter in an intracellular bacterial symbiont related to Chlamydiae. Nature 432 (2004).
68 Haferkamp, I. et al. Tapping the nucleotide pool of the host: novel nucleotide carrier proteins of Protochlamydia amoebophila. Mol Microbiol 60, 1534-1545, doi:10.1111/j.1365-2958.2006.05193.x (2006).
69 Yeoh, Y. K., Sekiguchi, Y., Parks, D. H. & Hugenholtz, P. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage. Mol Biol Evol 33, 915-927, doi:10.1093/molbev/msv281 (2016).
70 Perez, J., Moraleda-Munoz, A., Marcos-Torres, F. J. & Munoz-Dorado, J. Bacterial predation: 75 years and counting! Environ Microbiol 18, 766-779, doi:10.1111/1462-2920.13171 (2016).
71 Groves, M. R., Hanlon, N., Turowski, P., Hemmings, B. A. & Barford, D. The structure of the protein phosphatase 2A PR65/A subunit reveals the conformation of its 15 tandemly repeated HEAT motifs. Cell 96, 99-110 (1999).
72 Moran, N. A. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108, 583-586 (2002).
73 Bertelli, C. et al. The Waddlia genome: a window into chlamydial biology. PLoS One 5, e10890, doi:10.1371/journal.pone.0010890 (2010).
74 Bertelli, C. et al. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights. Front Microbiol 6, 101, doi:10.3389/fmicb.2015.00101 (2015).
75 Schander, C. et al. The fauna of hydrothermal vents on the Mohn Ridge (North Atlantic). Marine Biology Research 6, 155-171 (2010).
76 Pedersen, R. B. et al. Discovery of a black smoker vent field and vent fauna at the Arctic Mid-Ocean Ridge. Nat Commun 1, doi:10.1038/ncomms1124 (2010).
77 Edgcomb, V. P., Beaudoin, D., Gast, R., Biddle, J. F. & Teske, A. Marine subsurface eukaryotes: the fungal majority. Environ Microbiol 13, 172-183, doi:10.1111/j.1462-2920.2010.02318.x (2011).
78 Orsi, W., Biddle, J. F. & Edgcomb, V. Deep sequencing of subseafloor eukaryotic rRNA reveals active Fungi across marine subsurface provinces. PLoS One 8, e56335, doi:10.1371/journal.pone.0056335 (2013).
79 Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173-179, doi:10.1038/nature14447 (2015).
80 Kouduka, M. et al. A new DNA extraction method by controlled alkaline treatments from consolidated subsurface sediments. FEMS Microbiol Lett 326, 47-54, doi:10.1111/j.1574-6968.2011.02437.x (2012).
81 Orsi, W. D. Ecology and evolution of seafloor and subseafloor microbial communities. Nat Rev Microbiol, doi:10.1038/s41579-018-0046-8 (2018).
82 Israelsson, O. Chlamydial symbionts in the enigmatic Xenoturbella (Deuterostomia). J Invertebr Pathol 96, 213-220, doi:10.1016/j.jip.2007.05.002 (2007).
83 Kjeldsen, K. U., Obst, M., Nakano, H., Funch, P. & Schramm, A. Two types of endosymbiotic bacteria in the enigmatic marine worm Xenoturbella bocki. Appl Environ Microbiol 76, 2657-2662, doi:10.1128/AEM.01092-09 (2010).
84 Rurangirwa, F. R., Dilbeck, P. M., Crawford, T. B., McGuire, T. C. & McElwain, T. F. Analysis of the 16S rRNA gene of micro-organism WSU 86-1044 from an aborted bovine foetus reveals that it is a member of the order Chlamydiales: proposal of Waddliaceae fam. nov., Waddlia chondrophila gen. nov., sp. nov. Int J Syst Bacteriol 49 Pt 2, 577-581, doi:10.1099/00207713-49-2-577 (1999).
85 Lagkouvardos, I. et al. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J 8, 115-125, doi:10.1038/ismej.2013.142 (2014).
86 Lagkouvardos, I. et al. IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies. Sci Rep 6, 33721, doi:10.1038/srep33721 (2016).
87 Kantor, R. S. et al. Bioreactor microbial ecosystems for thiocyanate and cyanide degradation unravelled with genome-resolved metagenomics. Environ Microbiol 17, 4929-4941, doi:10.1111/1462-2920.12936 (2015).
88 Pinto, A. J. et al. Metagenomic Evidence for the Presence of Comammox Nitrospira-Like Bacteria in a Drinking Water System. mSphere 1, doi:10.1128/mSphere.00054-15 (2016).
89 Zhang, Y., Kitajima, M., Whittle, A. J. & Liu, W. T. Benefits of Genomic Insights and CRISPR-Cas Signatures to Monitor Potential Pathogens across Drinking Water Production and Distribution Systems. Front Microbiol 8, 2036, doi:10.3389/fmicb.2017.02036 (2017).
90 Schulz, F. et al. Towards a balanced view of the bacterial tree of life. Microbiome 5, 140, doi:10.1186/s40168-017-0360-9 (2017).
91 Lagkouvardos, I. et al. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J 8, 115-125 (2014).
92 Thompson, L. R. et al. A communal catalogue reveals Earth's multiscale microbial diversity. Nature 551, 457-463, doi:10.1038/nature24621 (2017).
93 Jorgensen, S. L. et al. Correlating microbial community profiles with geochemical data in highly stratified sediments from the Arctic Mid-Ocean Ridge. PNAS 109, E2846-E2855, doi:10.1073/pnas.1207574109 (2012).
94 Taylor-Brown, A., Madden, D. & Polkinghorne, A. Culture-independent approaches to chlamydial genomics. Microb Genom, doi:10.1099/mgen.0.000145 (2018).
95 Probst, A. J. et al. Differential depth distribution of microbial function and putative symbionts through sediment-hosted aquifers in the deep terrestrial subsurface. Nat Microbiol 3, 328-336, doi:10.1038/s41564-017-0098-y (2018).
96 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7, doi:10.1038/ncomms13219 (2016).
97 Taylor-Brown, A., Spang, L., Borel, N. & Polkinghorne, A. Culture-independent metagenomics supports discovery of uncultivable bacteria within the genus Chlamydia. Sci Rep 7, 10661, doi:10.1038/s41598-017-10757-5 (2017).
98 Klindworth, A. et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res 41, e1, doi:10.1093/nar/gks808 (2013).
99 Hugerth, L. W. et al. Systematic design of 18S rRNA gene primers for determining eukaryotic diversity in microbial consortia. PLoS One 9, e95567, doi:10.1371/journal.pone.0095567 (2014).
100 Wang, Y. & Qian, P. Y. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS One 4, e7401, doi:10.1371/journal.pone.0007401 (2009).