Anoxygenic Phototrophic Chloroflexota Member Uses a Type I Reaction Center

bioRxiv preprint doi: https://doi.org/10.1101/2020.07.07.190934; this version posted July 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Tsuji JM1*, Shaw NA1, Nagashima S2, Venkiteswaran JJ1,3, Schiff SL1, Hanada S2, Tank M2,4*, Neufeld JD1*

1University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada, N2L 3G1

2Tokyo Metropolitan University, 1-1 Minami-osawa, Hachioji, Tokyo, Japan, 192-0397

3Wilfrid Laurier University, 75 University Avenue West, Waterloo, Ontario, Canada, N2L 3C5

4Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Inhoffenstrasse 7B, 38124 Braunschweig, Germany

*Corresponding authors: [email protected]; [email protected]; [email protected]

Keywords: Chloroflexota; filamentous anoxygenic phototrophs; boreal lakes; anoxygenic photoautotrophy; evolution of photosynthesis; enrichment cultivation

Summary

Chlorophyll-based phototrophy is performed using quinone and/or Fe-S type reaction centers1,2. Unlike oxygenic photosynthesis, where both reaction center classes are used in tandem as Photosystem II and Photosystem I, anoxygenic phototrophs use only one class of reaction center, termed Type II (RCII) or Type I (RCI) reaction centers, separately for phototrophy3. Here we report the cultivation and characterization of a filamentous anoxygenic phototroph within the Chloroflexota (formerly Chloroflexi4) phylum, provisionally named ‘Candidatus Chlorohelix allophototropha’, that performs phototrophy using a distinct fourth clade of RCI protein, despite placing sister taxonomically to RCII-utilizing Chloroflexota members. ‘Ca. Chx. allophototropha’ contains chlorosomes, uses bacteriochlorophyll c, and encodes the FMO protein like other RCI-utilizing phototrophs in the Chlorobiales and Chloracidobacterales orders5. ‘Ca. Chx. allophototropha’ also encodes the potential for carbon fixation using the Calvin-Benson-Bassham (CBB) cycle, unlike all known RCI-utilizing phototrophs6. The discovery of ‘Ca. Chx. allophototropha’, as the first representative of a novel Chloroflexota order (i.e., ‘Ca. Chloroheliales’), sheds light on longstanding questions about the evolution of photosynthesis, including the origin of chlorosomes among RCII-utilizing Chloroflexota members5,7. The Chloroflexota is now the only phylum outside the Cyanobacteria containing genomic potential for both quinone and Fe-S type reaction centers and can thus serve as an additional system for exploring fundamental questions about the evolution of photosynthesis.

Main text

The evolution of photosynthesis remains poorly understood8,9. Among chlorophyll-based phototrophic organisms, two distinct classes of photosynthetic reaction center, either quinone-type or Fe-S type, are used for light energy conversion by members of at least eight microbial phyla, yet the evolutionary history of these reaction centers is unclear1,3,10,11. Members of a single microbial phylum, the Cyanobacteria (along with chloroplasts in eukaryotes), use both quinone and Fe-S type reaction centers in tandem as Photosystem II (PSII) and Photosystem I (PSI), respectively, to generate molecular oxygen through the oxidation of water2. Diverse anoxygenic phototrophs making up the remaining known phyla perform phototrophy using a single reaction center class, either a Type II (RCII) or Type I (RCI) reaction center related to PSII or PSI, respectively3. Combined phylogenetic, biochemical, and physiological evidence suggests that many photosynthesis-associated genes, including reaction center genes, are ancient in origin and may have been transferred between distantly related microorganisms multiple times over evolutionary history5,12. Such lateral gene transfer is presumed to be a key factor allowing for the existence of both oxygenic and anoxygenic photosynthesis13. However, modern case studies of such distant lateral gene transfer events are limited, hindering understanding of the fundamental mechanisms allowing for chlorophototrophy to evolve14.

Among phototrophic microorganisms, anoxygenic phototrophs within the Chloroflexota phylum (formerly the Chloroflexi phylum4,15; also called filamentous anoxygenic phototrophs or “green non-sulfur bacteria”) have been particularly cryptic in their evolutionary origin. Although these bacteria use RCII for photosynthesis, several phototrophic Chloroflexota members also contain chlorosomes, a light-harvesting apparatus that is otherwise found only in RCI-utilizing bacteria16. How chlorosomes were adopted by both RCI-containing phototrophs and RCII-containing phototrophic Chloroflexota members remains highly speculative, with some suggesting interactions between RCI- and RCII-utilizing bacteria in the evolutionary past5,7. However, concrete evidence for such interactions has never been demonstrated in nature. Here we report the cultivation of a highly novel phototroph within the Chloroflexota phylum that uses RCI and chlorosomes for phototrophy. We describe the physiological and genomic properties of the novel bacterium and its implications for the evolution of photosynthesis.

Water was collected from the illuminated and anoxic water column of an iron-rich Boreal Shield lake to enrich for resident bacterial phototrophs. After approximately two months of incubation under light, enrichment cultures began showing ferrous iron oxidation activity (Supplementary Note 1), and 16S ribosomal RNA (16S rRNA) gene amplicon sequencing from some cultures indicated the presence of a novel bacterium distantly related to known Chloroflexota members (Supplementary Data 1). Further enrichment in deep agar dilution series eliminated a purple phototrophic bacterium related to the metabolically versatile Rhodopseudomonas palustris17 and a green phototrophic bacterium belonging to the Chlorobium genus. Metagenome sequencing and assembly recovered the nearly closed genome of the Chloroflexota member, which represented the only detectable phototroph and comprised 43.4% of raw metagenome sequences (Extended Data Fig. 1), along with nearly closed genomes of the two major heterotrophic populations remaining in the culture (Supplementary Note 2). Given the detection of genes for RCI in the Chloroflexota member’s genome, we provisionally classify this bacterium as ‘Candidatus Chlorohelix allophototropha’ strain L227-S17 (Chlo.ro.he'lix. Gr. adj. chloros, green; Gr. fem. n. helix, spiral; N.L. fem. n. Chlorohelix, a green spiral. al.lo.pho.to.tro'pha. Gr. adj. allos, other, different; Gr. n. phos, -otos, light; N.L. suff. -tropha, feeding; N.L. fem. adj. allophototropha, phototrophic in a different way).

Physiologically, ‘Ca. Chx. allophototropha’ resembled other cultured phototrophs belonging to the Chloroflexota phylum (Fig. 1). Cells grew well in soft agar (Fig. 1a) and developed as long, spiralling filaments (Fig. 1b), similar to those observed for Chloronema giganteum18, that were composed of cells with dimensions of ~2-3 μm × 0.6 μm (Fig. 1c-d). In addition to containing light-harvesting chlorosomes (Fig. 1e-f), ‘Ca. Chx. allophototropha’ cells were pigmented, with the dominant pigment being bacteriochlorophyll c based on the 433 and 663 nm peaks observed in absorption spectrum data from acetone:methanol pigment extracts (Fig. 1g; see Supplementary Note 3)19.
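As an illustration only (this is not the authors' analysis), absorption maxima such as the 433 and 663 nm peaks mentioned above can be located by scanning a spectrum for local maxima. The sketch below uses a synthetic spectrum built from two Gaussian bands; the function names, band widths, and heights are all hypothetical and chosen purely for demonstration.

```python
import math

def synthetic_spectrum(wavelengths):
    """Hypothetical spectrum: two Gaussian bands centred at 433 and 663 nm."""
    def band(x, centre, width, height):
        return height * math.exp(-((x - centre) ** 2) / (2 * width ** 2))
    return [band(w, 433, 15, 1.0) + band(w, 663, 12, 0.7) for w in wavelengths]

def find_peaks_nm(wavelengths, absorbance, min_height=0.1):
    """Return wavelengths where absorbance is a strict local maximum above min_height."""
    peaks = []
    for i in range(1, len(absorbance) - 1):
        is_local_max = absorbance[i - 1] < absorbance[i] > absorbance[i + 1]
        if is_local_max and absorbance[i] > min_height:
            peaks.append(wavelengths[i])
    return peaks

# 1 nm sampling across a typical visible to near-infrared scan range (assumed)
wavelengths = list(range(350, 801))
absorbance = synthetic_spectrum(wavelengths)
peaks = find_peaks_nm(wavelengths, absorbance)
print(peaks)
```

Real spectra are noisy, so a practical analysis would usually smooth the data or require a minimum peak prominence before reporting maxima.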

Despite physiological similarities to cultured Chloroflexota members, genome sequencing data revealed that ‘Ca. Chx. allophototropha’ encoded a remote homolog of a pscA-like RCI gene (Fig. 2) rather than the expected pufL/pufM RCII genes. A pscA-like gene was also detected in a second genome bin, named ‘Ca. Chloroheliales bin L227-5C’, which we recovered from a second enrichment culture that was subsequently lost (see Supplementary Notes 1-2). Our detection of RCI genes in ‘Ca. Chx. allophototropha’ makes the Chloroflexota phylum unique among all known phototroph-containing microbial phyla, which otherwise contain representatives that use strictly RCI or RCII for phototrophy (Fig. 2a). Based on a maximum-likelihood amino acid sequence phylogeny (Fig. 2b; Extended Data Fig. 2), the PscA-like predicted protein of ‘Ca. Chx. allophototropha’ represents a clearly distinct fourth class of RCI protein that places between clades for PscA-like proteins of Chloracidobacterales members and PshA proteins of Heliobacterales members. Homology modelling indicated that the ‘Ca. Chx. allophototropha’ PscA-like protein contains the expected six N-terminal transmembrane helices for coordinating antenna pigments and five C-terminal transmembrane helices involved in core electron transfer20 (Fig. 2c). The pscA-like gene occurred on a long scaffold (3.29 Mb in length) and was flanked by other genes whose predicted primary sequences had closest BLASTP hits to other Chloroflexota members (Supplementary Data 2). The genome did not contain the pscBCD photosystem accessory genes found among Chlorobia members5, nor a cyanobacterial psaB-like paralog, implying that ‘Ca. Chx. allophototropha’ uses a homodimeric RCI complex2 like other known anoxygenic phototrophs.

Based on the Genome Taxonomy Database (GTDB) classification system4,21 used throughout this work, ‘Ca. Chx. allophototropha’ represents the first cultivated member of a novel order within the Chloroflexota phylum (Fig. 3), which we provisionally name the ‘Ca. Chloroheliales’ (see Supplementary Note 4). This order placed immediately basal to the Chloroflexales order, which contains the canonical RCII-utilizing phototroph clade in the phylum that includes the Chloroflexaceae, Roseiflexaceae, and Oscillochloridaceae families15. The Chloroflexales order also contains a basal non-phototrophic family, the Herpetosiphonaceae, which was formerly placed in its own order prior to reassignment in the GTDB4. The close phylogenetic placement of RCI-utilizing ‘Ca. Chloroheliales’ members and RCII-utilizing Chloroflexales members implies the potential for past genetic interaction between these phototroph groups.

The genome of ‘Ca. Chx. allophototropha’ encoded numerous phototrophy-associated genes with remote homology to phototrophy genes of known organisms (Fig. 3; Supplementary Data 3). We detected a highly novel homolog of fmoA encoding the Fenna-Matthews-Olson (FMO) protein involved in energy transfer from chlorosomes to RCI22 (Extended Data Fig. 3). This finding supports the conclusion that ‘Ca. Chx. allophototropha’ relies on RCI and chlorosomes for phototrophic energy conversion and makes the Chloroflexota the third known phylum to use the FMO protein in phototrophy23. Using both bidirectional BLASTP and profile hidden Markov model-based searches, we also detected a potential homolog of the key chlorosome structural gene csmA5,16 with ~33% identity at the predicted amino acid level to the CsmA primary sequence of Chloroflexus aurantiacus J-10-fl (Fig. 3). This potential homolog had a similar predicted tertiary structure to the CsmA protein of Chlorobium tepidum24 based on homology modelling and shared several highly conserved residues with other known CsmA sequences, although the potential homolog lacked the H25 histidine hypothesized to be required for bacteriochlorophyll a binding25 (Extended Data Fig. 4). No other predicted open reading frames in the genome had close homology to known CsmA sequences, implying that this sequence represents a highly novel CsmA variant. We also detected potential homologs of csmM and csmY5 in the genome (Fig. 3).
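A bidirectional BLASTP search is commonly implemented as a reciprocal best hit test: a reference protein and a genome-encoded protein are paired only if each is the other's top-scoring hit. The sketch below shows that logic under the assumption that hits have already been parsed into (query, subject, bit score) tuples. It is not the authors' pipeline, and all sequence identifiers and scores are hypothetical.

```python
def best_hits(hits):
    """Map each query ID to its highest-scoring subject ID.

    hits: iterable of (query_id, subject_id, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}

def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

# Hypothetical forward search: reference protein against the genome's proteins
forward = [
    ("ref_CsmA", "genome_orf_0123", 45.1),
    ("ref_CsmA", "genome_orf_0456", 30.2),
]
# Hypothetical reverse search: genome proteins against the reference database
reverse = [
    ("genome_orf_0123", "ref_CsmA", 44.8),
    ("genome_orf_0456", "ref_other", 51.0),
]
rbh = reciprocal_best_hits(forward, reverse)
print(rbh)
```

Requiring agreement in both directions is stricter than a one-way search, which helps when candidate homologs are as divergent as the ~33% identity reported here.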

Similarly to Oscillochloris trichoides, the ‘Ca. Chx. allophototropha’ genome encoded genes for the Calvin-Benson-Bassham (CBB) cycle (Fig. 3), including a deep-branching Class IC/ID “red type” cbbL gene26 encoding the large subunit of RuBisCO (Extended Data Fig. 5). Carbon fixation via the CBB cycle has never before been reported for an RCI-utilizing phototroph, and this finding further highlights the novelty of ‘Ca. Chx. allophototropha’ compared to known phototrophs. The novel genome did not encode the potential for carbon fixation via the 3-hydroxypropionate (3HP) bicycle, which is thought to represent a more recent evolutionary innovation within the Chloroflexales order27. The ‘Ca. Chx. allophototropha’ genome also encoded the biosynthesis pathway for bacteriochlorophylls a and c5 and had the genomic potential for nitrogen fixation based on the presence of a nifHI1I2BDK gene cluster28 (Fig. 3). Although the closest BLASTP hits against RefSeq for predicted primary sequences of nifH, nifB, nifD, and nifK were to Chloroflexota-associated sequences, the nifI1 and nifI2 genes involved in regulation of nitrogen metabolism29 were not detectable among other Chloroflexota members. Other members instead encoded a nifHBDK gene cluster, implying that the ‘Ca. Chx. allophototropha’ nif cluster could represent a more ancestral nif cluster variant compared to those of its relatives. The genome bin of the second ‘Ca. Chloroheliales’ member enriched in this study (Fig. 3; bin L227-5C) encoded a set of phototrophy-related gene homologs similar to that of ‘Ca. Chx. allophototropha’, which further supports the robustness of our detection of these novel phototrophy-related genes.

Previously, the genetic content of RCII-utilizing phototrophic Chloroflexota members has been enigmatic because several Chloroflexota-associated phototrophy genes appear to be most closely related to genes of RCI-utilizing phototrophs. Our results lead us to hypothesize that ‘Ca. Chx. allophototropha’ represents a missing transition form in the evolution of phototrophy within the Chloroflexota phylum that resolves this enigma by providing a direct link between RCI- and RCII-utilizing phototrophic taxa. Along with encoding chlorosomes, RCII-utilizing Chloroflexota members have been observed to encode bacteriochlorophyll synthesis genes most closely related to those of RCI-utilizing Chlorobia members5. Using maximum likelihood phylogenies of the (bacterio)chlorophyll synthesis proteins BchIDH/ChlIDH (Extended Data Fig. 6) and BchLNB/ChlLNB/BchXYZ (Extended Data Fig. 7), we observed that ‘Ca. Chloroheliales’ sequences placed immediately adjacent to this sister grouping of RCII-utilizing Chloroflexota and RCI-utilizing Chlorobia members. A phylogeny of the chlorosome structural protein CsmA also indicated closer evolutionary relatedness among ‘Ca. Chloroheliales’, RCII-utilizing Chloroflexota, and RCI-utilizing Chlorobia members than to RCI-utilizing Acidobacteriota members such as Chloracidobacterium thermophilum23 (Extended Data Fig. 8). The phylogenetic clustering of RCI- and RCII-utilizing Chloroflexota members with RCI-utilizing Chlorobia members suggests a shared genetic history of phototrophy between these groups despite their use of different core reaction center components. Thus, it is possible that an ancestor of ‘Ca. Chx. allophototropha’ received genes for RCI, chlorosomes, and bacteriochlorophyll synthesis, along with genes for carbon or nitrogen fixation, via lateral gene transfer and then provided the photosynthesis accessory genes needed by RCII-containing Chloroflexales members via either direct vertical inheritance or a subsequent lateral gene transfer event.

The cultivation of ‘Ca. Chx. allophototropha’ also opens substantial possibilities for understanding the evolution of oxygenic photosynthesis. The Chloroflexota now represents the only phylum outside of the Cyanobacteria where both quinone and Fe-S type photosynthetic reaction centers can be found (Fig. 2a), although in different organisms. Genes for RCII and RCI might be readily incorporated into a single member of the Chloroflexota phylum, as suggested, for example, by the apparent lateral transfer of RCII genes to basal Chloroflexota clades (Fig. 3) described previously30. One model for the origin of oxygenic photosynthesis suggests that a single organism received quinone and Fe-S type reaction centers via lateral gene transfer and then developed the capacity to use these two reaction center classes in tandem as PSII and PSI1,13. The physiological effect of receiving both reaction center classes might now be testable in a laboratory setting. Ancestral state reconstructions could also yield insights into the genomic properties of the common ancestor of RCI-utilizing ‘Ca. Chloroheliales’ and RCII-utilizing Chloroflexales members and whether this common ancestor was itself phototrophic. More generally, the discovery of ‘Ca. Chx. allophototropha’ demonstrates the possibility of massive photosynthesis gene movements over evolutionary time, movements that are required by many evolutionary models but have rarely been demonstrated in nature12,14.

With the cultivation of ‘Ca. Chx. allophototropha’, the Chloroflexota now represents the first known phylum of phototrophs having members that use either RCI or RCII for phototrophic energy conversion. ‘Ca. Chx. allophototropha’ is unique among known phototrophs because it encodes a distinct fourth class of RCI protein, forms the third known phototroph clade to utilize the FMO protein for energy transfer, and represents the only known RCI-utilizing phototroph to encode the CBB cycle for carbon fixation. Future research examining the biochemical properties of the reaction center and chlorosomes of ‘Ca. Chx. allophototropha’, the potentially photoferrotrophic metabolism of this strain (Supplementary Note 1), and the diversity and biogeography of ‘Ca. Chloroheliales’ members will provide fundamental insights into how photosynthesis operates and evolves. Cultivation of ‘Ca. Chx. allophototropha’ allows the Chloroflexota to serve as a model system for exploring photosystem gene movements over evolutionary time.

References

1. Hohmann-Marriott, M. F. & Blankenship, R. E. Evolution of photosynthesis. Annu Rev Plant Biol 62, 515–548 (2011).

2. Fischer, W. W., Hemp, J. & Johnson, J. E. Evolution of oxygenic photosynthesis. Annu Rev Earth Pl Sc 44, 647–683 (2016).

3. Thiel, V., Tank, M. & Bryant, D. A. Diversity of chlorophototrophic bacteria revealed in the omics era. Annu Rev Plant Biol 69, 21–49 (2018).

4. Parks, D. H. et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol 36, 996–1004 (2018).

5. Bryant, D. A. et al. Comparative and functional genomics of anoxygenic green bacteria from the taxa Chlorobi, Chloroflexi, and Acidobacteria. in Functional Genomics and Evolution of Photosynthetic Systems (eds. Burnap, R. & Vermaas, W.) 47–102 (Springer Netherlands, 2012). doi:10.1007/978-94-007-1533-2_3.

6. Tang, K.-H., Tang, Y. J. & Blankenship, R. E. Carbon metabolic pathways in phototrophic bacteria and their broader evolutionary implications. Front Microbiol 2, (2011).

7. Hohmann-Marriott, M. F. & Blankenship, R. E. Hypothesis on chlorosome biogenesis in green photosynthetic bacteria. FEBS Lett 581, 800–803 (2007).

8. Olson, J. M. & Blankenship, R. E. Thinking about the evolution of photosynthesis. in Discoveries in Photosynthesis (eds. Govindjee, Beatty, J. T., Gest, H. & Allen, J. F.) 1073–1086 (Springer Netherlands, 2005). doi:10.1007/1-4020-3324-9_95.

9. Cardona, T. Thinking twice about the evolution of photosynthesis. Open Biol 9, 180246 (2019).

10. Ward, L. M., Cardona, T. & Holland-Moritz, H. Evolutionary implications of anoxygenic phototrophy in the bacterial phylum Candidatus Eremiobacterota (WPS-2). Front Microbiol 10, (2019).

11. Sadekar, S., Raymond, J. & Blankenship, R. E. Conservation of distantly related membrane proteins: photosynthetic reaction centers share a common structural core. Mol Biol Evol 23, 2001–2007 (2006).

12. Martin, W. F., Bryant, D. A. & Beatty, J. T. A physiological perspective on the origin and evolution of photosynthesis. FEMS Microbiol Rev 42, 205–231 (2018).

13. Soo, R. M., Hemp, J. & Hugenholtz, P. Evolution of photosynthesis and aerobic respiration in the cyanobacteria. Free Radical Bio Med 140, 200–205 (2019).

14. Zeng, Y., Feng, F., Medová, H., Dean, J. & Koblížek, M. Functional type 2 photosynthetic reaction centers found in the rare bacterial phylum Gemmatimonadetes. Proc Natl Acad Sci USA 111, 7795–7800 (2014).

15. Hanada, S. The phylum Chloroflexi, the family Chloroflexaceae, and the related phototrophic families Oscillochloridaceae and Roseiflexaceae. in The Prokaryotes: Other Major Lineages of Bacteria and The Archaea (eds. Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E. & Thompson, F.) 515–532 (Springer Berlin Heidelberg, 2014). doi:10.1007/978-3-642-38954-2_165.

16. Frigaard, N.-U. & Bryant, D. A. Seeing green bacteria in a new light: genomics-enabled studies of the photosynthetic apparatus in green sulfur bacteria and filamentous anoxygenic phototrophic bacteria. Arch Microbiol 182, 265–276 (2004).

17. Jiao, Y., Kappler, A., Croal, L. R. & Newman, D. K. Isolation and characterization of a genetically tractable photoautotrophic Fe(II)-oxidizing bacterium, Rhodopseudomonas palustris strain TIE-1. Appl Environ Microbiol 71, 4487–4496 (2005).

18. Gich, F., Garcia-Gil, J. & Overmann, J. Previously unknown and phylogenetically diverse members of the green nonsulfur bacteria are indigenous to freshwater lakes. Arch Microbiol 177, 1–10 (2001).

19. Overmann, J. & Garcia-Pichel, F. The phototrophic way of life. in The Prokaryotes: Prokaryotic Communities and Ecophysiology (eds. Rosenberg, E., DeLong, E. F., Lory, S., Stackebrandt, E. & Thompson, F.) 203–257 (Springer Berlin Heidelberg, 2013). doi:10.1007/978-3-642-30123-0_51.

20. Gisriel, C. et al. Structure of a symmetric photosynthetic reaction center–photosystem. Science 357, 1021–1025 (2017).

21. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2020).

22. Olson, J. M. The FMO protein. in Discoveries in Photosynthesis (eds. Govindjee, Beatty, J. T., Gest, H. & Allen, J. F.) 421–427 (Springer Netherlands, 2005). doi:10.1007/1-4020-3324-9_40.

23. Bryant, D. A. et al. Candidatus Chloracidobacterium thermophilum: an aerobic phototrophic Acidobacterium. Science 317, 523–526 (2007).

24. Pedersen, M. Ø., Underhaug, J., Dittmer, J., Miller, M. & Nielsen, N. C. The three-dimensional structure of CsmA: A small antenna protein from the green sulfur bacterium Chlorobium tepidum. FEBS Lett 582, 2869–2874 (2008).

25. Pedersen, M. Ø., Linnanto, J., Frigaard, N.-U., Nielsen, N. Chr. & Miller, M. A model of the protein–pigment baseplate complex in chlorosomes of photosynthetic green bacteria. Photosynth Res 104, 233–243 (2010).

26. Tourova, T. P. et al. Phylogeny of anoxygenic filamentous phototrophic bacteria of the family Oscillochloridaceae as inferred from comparative analyses of the rrs, cbbL, and nifH genes. Microbiology 75, 192–200 (2006).

27. Shih, P. M., Ward, L. M. & Fischer, W. W. Evolution of the 3-hydroxypropionate bicycle and recent transfer of anoxygenic photosynthesis into the Chloroflexi. Proc Natl Acad Sci USA 114, 10749–10754 (2017).

28. Raymond, J., Siefert, J. L., Staples, C. R. & Blankenship, R. E. The natural history of nitrogen fixation. Mol Biol Evol 21, 541–554 (2004).

29. Enkh-Amgalan, J., Kawasaki, H. & Seki, T. Molecular evolution of the nif gene cluster carrying nifI1 and nifI2 genes in the Gram-positive phototrophic bacterium Heliobacterium chlorum. Int J Syst Evol Microbiol 56, 65–74 (2006).

30. Ward, L. M., Hemp, J., Shih, P. M., McGlynn, S. E. & Fischer, W. W. Evolution of phototrophy in the Chloroflexi phylum driven by horizontal gene transfer. Front Microbiol 9, 260 (2018).

Methods

Enrichment cultivation of ‘Ca. Chx. allophototropha’

To culture novel anoxygenic phototrophs, we sampled Lake 227, a seasonally anoxic and ferruginous Boreal Shield lake at the International Institute for Sustainable Development Experimental Lakes Area (IISD-ELA; near Kenora, Canada). Lake 227 develops up to ~100 μM concentrations of dissolved ferrous iron in its anoxic water column31, and anoxia is more pronounced than expected naturally due to the long-term experimental eutrophication of the lake32. The lake’s physico-chemistry has been described in detail previously31. We collected water from the illuminated portion of the upper anoxic zone of Lake 227 in September 2017, at 3.875 and 5 m depth, and transported this water anoxically and chilled in 120 mL glass serum bottles, sealed with black rubber stoppers (Geo-Microbial Technology Company; Ochelata, Oklahoma, USA), to the laboratory.

Water was supplemented with 2% v/v of a freshwater medium containing 8 mM ferrous chloride33 and was distributed anoxically into 120 mL glass serum bottles, also sealed with black rubber stoppers (Geo-Microbial Technology Company), that had a headspace of dinitrogen gas at 1.5 atm final pressure. Bottles were spiked with a final concentration of 50 μM Diuron (3-(3,4-dichlorophenyl)-1,1-dimethylurea, or DCMU; Sigma-Aldrich; St. Louis, Missouri, USA) to block oxygenic phototrophic activity34 and to eliminate oxygenic phototrophs. Spiking was performed either at the start of the experiment or as needed based on observations of oxygenic phototroph growth. Bottles were incubated at 10°C under white light (20-30 μmol photons m⁻² s⁻¹ at 400-700 nm wavelengths; blend of incandescent and fluorescent sources). Cultures were monitored regularly for ferrous iron concentration using the ferrozine assay35 and were amended with additional freshwater medium or ferrous chloride (e.g., in 0.1-1 mM increments) when ferrous iron levels dropped, presumably due to iron oxidation.
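As a rough worked example of the supplementation arithmetic above, the sketch below estimates how much ferrous chloride a 2% v/v addition of 8 mM medium contributes to the lake water. The text does not say whether the 2% refers to the final volume or to the sample volume, so both readings are shown. The function names are hypothetical, and iron already present in the lake water is ignored.

```python
def conc_if_fraction_of_final(stock_mM, fraction):
    """Added concentration when the stock makes up `fraction` of the final volume."""
    return stock_mM * fraction

def conc_if_added_to_sample(stock_mM, fraction):
    """Added concentration when a stock volume equal to `fraction` of the
    sample volume is added on top of the sample."""
    return stock_mM * fraction / (1.0 + fraction)

STOCK_FE_MM = 8.0    # ferrous chloride in the freshwater medium (from the text)
SUPPLEMENT = 0.02    # 2% v/v supplementation (from the text)

fe_final_basis = conc_if_fraction_of_final(STOCK_FE_MM, SUPPLEMENT)
fe_sample_basis = conc_if_added_to_sample(STOCK_FE_MM, SUPPLEMENT)
print(f"{fe_final_basis:.3f} mM or {fe_sample_basis:.3f} mM added ferrous iron")
```

Under either reading the supplement adds roughly 0.16 mM ferrous iron, the same order of magnitude as the ~100 μM that Lake 227 is reported to develop naturally.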

The ‘Ca. Chx. allophototropha’ enrichment culture was gradually acclimatized to higher media concentrations through repeated feeding and subculture. Cultures were moved to growth at room temperature (20-22°C) and could be grown under white light (fluorescent, halogen, or incandescent; same light intensity as above) or under far red LED light (using a PARSource PowerPAR LED bulb; LED Grow Lights Depot; Portland, Oregon, USA). A deep agar dilution series36 was used to eliminate contaminants from the culture. The same freshwater medium was used for this dilution series, amended with an acetate feeding solution37. Ferrous chloride-containing agar plugs (10 mM concentration) were initially used at the bottom of tubes to form a ferrous iron concentration gradient to qualitatively determine the best conditions for growth. Over multiple rounds of optimization, the growth medium was adjusted to contain 2 mM ferrous chloride and 1.2 mM sodium acetate and to be a 1:1 dilution of the original freshwater medium, kept at a pH of ~7.5. In addition, cultures were generally incubated in soft agar (0.2-0.3%; w/v)38 to promote faster growth of ‘Ca. Chx. allophototropha’ cells compared to growth in standard concentration agar (~0.8%; w/v) or in liquid medium.

Agar shake tubes were incubated under 740 nm LED lights for a period of time to eliminate

purple phototrophic bacteria, because 740 nm light is not readily used for photosynthesis by these

bacteria¹⁹. Additionally, a higher medium pH of ~7.5-8.5 was used for a time to select against green

phototrophic bacteria belonging to the Chlorobia class, because these bacteria are known to grow

poorly in moderately basic conditions³⁹. Lastly, sodium molybdate was added to the medium at a final

concentration of 100 μM, in a 1:10 concentration ratio relative to sulfate, to inhibit the activity of

sulfate-reducing bacteria as optimized experimentally (Supplementary Methods).
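
The 1:10 molybdate-to-sulfate ratio stated above implies a sulfate concentration of roughly 1 mM in the medium. The arithmetic can be sketched as follows. This is an illustration only: the function names, the 100 mL bottle volume, and the 100 mM molybdate stock are hypothetical and are not taken from the protocol.

```python
def molybdate_target_um(sulfate_um: float, ratio: float = 0.1) -> float:
    """Molybdate concentration (uM) giving the requested molybdate:sulfate ratio."""
    if sulfate_um < 0 or ratio < 0:
        raise ValueError("concentration and ratio must be non-negative")
    return sulfate_um * ratio


def stock_volume_ul(target_um: float, final_volume_ml: float, stock_mm: float) -> float:
    """Stock volume (uL) needed to reach target_um in final_volume_ml.

    Uses C1*V1 = C2*V2 and ignores the small volume added by the stock itself.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    # uM * mL = nmol, and a stock at stock_mm mM holds stock_mm nmol per uL.
    return target_um * final_volume_ml / stock_mm


# Hypothetical example: 1 mM sulfate medium, 100 mL bottle, 100 mM molybdate stock.
target = molybdate_target_um(1000.0)
volume = stock_volume_ul(target, final_volume_ml=100.0, stock_mm=100.0)
print(f"{target:.0f} uM molybdate: add {volume:.0f} uL of stock")
```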

Physiological characterization

Filaments of ‘Ca. Chx. allophototropha’ cells were identified based on pigment autofluorescence

using epifluorescence microscopy. An Eclipse 600 light microscope equipped for fluorescence

microscopy (Nikon; Shinagawa, Tokyo, Japan) was used with a xenon light source. Autofluorescence

in the wavelength range expected for bacteriochlorophyll c was detected using an excitation filter of

445(±45) nm, dichroic mirror of 520 nm, and emission filter of 715 (LP) nm (all from Semrock;

Rochester, New York, USA), before light collection using an ORCA-Flash4.0 monochromatic camera

(Hamamatsu Photonics; Hamamatsu City, Shizuoka, Japan). Examination of cells under

epifluorescence microscopy enabled the growth of cells to be qualitatively assessed. In addition,

growth of ‘Ca. Chx. allophototropha’ was confirmed based on relative increases in cell pigmentation

using a custom-built pigment fluorescence detection system (Supplementary Methods).

To perform transmission electron microscopy (TEM), cell biomass was picked from agar shake

tubes, and residual agar surrounding cells was digested using agarase. One unit of β-agarase I (New

England Biolabs; Ipswich, Massachusetts, USA) and 10 μL of 10x reaction buffer were added to 100 μL

of cell suspension and incubated at 42°C for 1.5 h. Following cell pelleting and removal of supernatant,

cells were then fixed for 2 h at 4°C in 4%/4% glutaraldehyde/paraformaldehyde (dissolved in 1x

phosphate-buffered saline; 1x PBS) and stored at 4°C. Sample preparation and imaging were performed

at the Molecular and Cellular Imaging Facility of the Advanced Analysis Center (University of Guelph,

Ontario, Canada). Washed cell pellets were enrobed in 4% Noble agar (ThermoFisher Scientific;

Waltham, Massachusetts, USA), and enrobed samples were cut into 1 mm cubes. The cubes were fixed

in 1% (w/v) osmium tetroxide for 45 min and subjected to a 25-100% ethanol dehydration series.

Samples were then subjected to stepwise infiltration using 25-100% LR White Resin (Ted Pella;

Redding, California, USA), transferred to gelatin capsules in 100% LR White Resin, and allowed to

polymerize overnight at 60°C. An Ultracut UCT ultramicrotome (Leica Microsystems; Wetzlar,

Germany) equipped with a diamond knife (DiATOME; Hatfield, Pennsylvania, USA) was used to cut

50 nm thin sections, which were then floated on 100-mesh copper grids and stained with 2% uranyl

acetate and Reynolds’ lead citrate to enhance contrast. Sections were imaged under standard operating

conditions using a Tecnai G2 F20 TEM (ThermoFisher Scientific; Waltham, Massachusetts, USA) that

was running at 200 kV and was equipped with a Gatan 4k CCD camera (Gatan; Pleasanton, California,

USA). For scanning electron microscopy (SEM), fixed cells were washed in 1x PBS and incubated in

1% (w/v) osmium tetroxide at room temperature for 30 min. Following incubation, cells were washed,

deposited on an aluminum stub, dried, and then sputter coated with a gold:palladium mixture using a

Desk V TSC Sample Preparation System (Denton Vacuum; Moorestown, New Jersey, USA). Prepared

samples were imaged using a Quanta FEG 250 SEM (ThermoFisher Scientific) with a high-voltage

setting of 10 kV and working distance of 9.9 mm.

Pigments were extracted from the ‘Ca. Chx. allophototropha’ enrichment culture in

acetone:methanol (7:2 v/v). Cells were picked from agar shake tubes and washed once using 10 mM

Tris-HCl (pH=8). The resulting pelleted cell material was suspended in 400 μL of a solution of 7:2

acetone:methanol and was mixed vigorously by vortex and pestle. Supernatant was evaporated until

mostly dry and resuspended in 600 μL acetone. The absorbance of the extracted pigments was assessed

between 350 and 1000 nm using a UV-1800 UV/Visible Scanning Spectrophotometer (Shimadzu; Kyoto,

Japan) and a 1 mL quartz cuvette.

16S rRNA gene amplicon sequencing and analysis

The microbial community composition of early enrichment cultures was assessed using 16S

ribosomal RNA (16S rRNA) gene amplicon sequencing with subsequent analysis of the obtained

sequences. Genomic DNA was extracted from pelleted cell biomass using the DNeasy UltraClean

Microbial Kit (Qiagen; Venlo, The Netherlands) according to the manufacturer’s protocol, with the

addition of a 10 min treatment at 70°C after adding Solution SL for improved cell lysis. The V4-V5

hypervariable region of the 16S rRNA gene was then amplified from extracted DNA using the

universal prokaryotic PCR primers 515F-Y⁴⁰ and 926R⁴¹, using primers with attached index sequences

and sequencing adaptors as described previously⁴²⁻⁴⁴. Amplification was performed in singlicate.

Library pooling, cleanup, and sequencing on a MiSeq (Illumina; San Diego, California, USA) were

performed as described previously⁴². Sequence data analysis was performed using QIIME2⁴⁵ version

2019.10 via the AXIOME⁴⁶ pipeline, commit 1ec1ea6 (https://github.com/neufeld/axiome3), with

default analysis parameters. Briefly, paired-end reads were trimmed, merged, and denoised using

DADA2⁴⁷ to generate an amplicon sequence variant (ASV) table. Taxonomic classification of ASVs

was performed using QIIME2’s naive Bayes classifier⁴⁸ trained against the SILVA SSU database⁴⁹,⁵⁰,

release 132. The classifier training file was prepared using QIIME2 version 2019.7.

Metagenome sequencing

The functional gene content of early liquid enrichment cultures was assessed via shotgun

metagenome sequencing. Genomic DNA was extracted as above, and library preparation and

sequencing were performed at The Centre for Applied Genomics (TCAG; The Hospital for Sick

Children, Toronto, Canada). The Nextera DNA Flex Library Prep Kit (Illumina) was used for

metagenome library preparation, and libraries were sequenced using a fraction of a lane of a HiSeq

2500 (Illumina) with 2x125 base paired-end reads to obtain 5.0-7.3 million total reads per sample.

High molecular weight DNA was later extracted from the ‘Ca. Chx. allophototropha’ enrichment

culture grown in agar shake tubes. Colonies were picked from agar, and genomic DNA was extracted

using a custom phenol:chloroform DNA extraction protocol followed by size selection and cleanup

(Supplementary Methods). Read cloud DNA sequencing library preparation was performed using the

TELL-Seq WGS Library Prep Kit⁵¹ (Universal Sequencing Technology; Canton, Massachusetts, USA)

following ultralow input protocol recommendations for small genomes. Library amplification was

performed using 10 μL TELL-beads and 16 amplification cycles. The resulting library was sequenced

using a MiSeq Reagent Kit v2 (300-cycle; Illumina) with 2x150 bp read length on a MiSeq (Illumina)

to a depth of 19.6 million total reads.

Genome assembly and binning

Read cloud metagenome sequencing data for the ‘Ca. Chx. allophototropha’ enrichment culture

were demultiplexed and quality control was performed on index reads using the Tell-Read pipeline,

version 0.9.7 (Universal Sequencing Technology), with default settings. Demultiplexed reads were then

assembled using Tell-Link, version 1.0.0 (Universal Sequencing Technology), using global and local

kmer lengths of 65 and 35, respectively. Metagenome data for this sample, together with metagenome

data for an early enrichment of the same culture, were then partitioned into genome bins and analyzed

using the ATLAS pipeline, version 2.2.0⁵² (Supplementary Methods). The ‘Ca. Chx. allophototropha’

genome bin was identified based on its taxonomic placement within the Chloroflexota phylum

according to the Genome Taxonomy Database Toolkit (GTDB-Tk) version 0.3.3, which relied on

GTDB release 89⁴,²¹. This genome bin was manually curated to remove potentially incorrectly binned

scaffolds (Supplementary Methods). Following curation, the genome bin was annotated using Prokka

version 1.14.6⁵³, and selected portions of the genome were checked for functional genes using the

BlastKOALA webserver version 2.2⁵⁴.

Short read metagenome sequencing data generated from a second early enrichment culture were

also analyzed. This second enrichment was lost early in the cultivation process but contained another

member of the ‘Ca. Chloroheliales’ (Supplementary Note 1). The single enrichment culture

metagenome was processed using ATLAS version 2.2.0⁵²,⁵⁵ for read quality control, assembly and

genome binning. A single genome bin was classified to the Chloroflexota phylum using

GTDB-Tk²¹. This genome bin was manually curated (Supplementary Methods) and was then annotated

using Prokka⁵³ as above. All ATLAS settings for both runs are available in the code repository

associated with this work.

To assess the community composition of the enrichment cultures, the relative abundances of each

recovered genome bin from ATLAS were assessed by calculating the number of reads mapped to each

bin divided by the total reads mapped to assembled scaffolds. To cross-validate these results, short

protein fragments were predicted directly from unassembled read data using FragGeneScanPlusPlus⁵⁶

commit 9a203d8. These protein fragments were searched using a profile Hidden Markov Model

(HMM) for the single-copy housekeeping gene rpoB, and hits were assigned taxonomy using

MetAnnotate⁵⁷ version 0.92 with default settings. The HMM for rpoB (2009 release) was downloaded

from FunGene⁵⁸, and taxonomic classification was performed using the RefSeq database, release 80

(March 2017)⁵⁹. Output of MetAnnotate was analyzed using the metannoviz R library, version 1.0.2

(https://github.com/metannotate/metannoviz) with an e-value cutoff of 10⁻¹⁰.
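
The bin-level relative abundance calculation described above (reads mapped to a bin divided by total reads mapped to assembled scaffolds) can be sketched as below. This is a minimal illustration rather than the ATLAS implementation; the function name, bin names, and read counts are hypothetical.

```python
from typing import Dict


def bin_relative_abundances(reads_per_bin: Dict[str, int],
                            total_mapped_reads: int) -> Dict[str, float]:
    """Percent relative abundance of each genome bin.

    total_mapped_reads counts reads mapped to all assembled scaffolds,
    binned or not, so the percentages may sum to less than 100.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if sum(reads_per_bin.values()) > total_mapped_reads:
        raise ValueError("binned reads exceed total mapped reads")
    return {name: 100.0 * count / total_mapped_reads
            for name, count in reads_per_bin.items()}


# Hypothetical counts for illustration only.
counts = {"bin_chloroflexota": 600_000, "bin_other": 300_000}
for name, percent in bin_relative_abundances(counts, 1_000_000).items():
    print(f"{name}\t{percent:.1f}%")
```

Using total reads mapped to scaffolds as the denominator, rather than total reads mapped to bins, means that unbinned scaffolds reduce every bin's percentage equally.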

Identification of RCI-associated genes

To detect remote homologs of RCI-associated genes in the two ‘Ca. Chloroheliales’-associated

genome bins, hmmsearch⁶⁰ version 3.1b2 was used to search predicted proteins from the bins using

HMMs downloaded from Pfam⁶¹. In particular, genes encoding the core Type I reaction center

(pscA/pshA/psaAB; PF00223), a Type I reaction center-associated protein (pscD; PF10657),

chlorosome structural units (csmAC; PF02043, PF11098), and a chlorosome energy transfer protein

(fmoA; PF02327) were queried. Detected homologs in the genome bins were assessed for their

alignment to known RCI-associated genes encoded by phototrophic microorganisms belonging to other

phyla. The tertiary structures of the detected homologs were also predicted using I-TASSER⁶², and

phylogenetic placement was assessed (see below). Custom HMMs were built for selected homologs

using available input sequences. Primary sequences predicted from the genomes for each gene of

interest were aligned using Clustal Omega⁶³ version 1.2.3, and HMMs were generated using

hmmbuild⁶⁰ version 3.1b2. Custom HMMs and homology models generated by I-TASSER are

available in the code repository associated with this work.
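
A profile HMM search of this kind is typically run with a command such as `hmmsearch --tblout hits.tbl PF00223.hmm proteins.faa`, and the resulting table can then be filtered. The sketch below parses that tabular output. It is an illustration, not the authors' script: the file names, the `HmmHit` type, and the default e-value cutoff are hypothetical, since the text gives no threshold for this step.

```python
from typing import Iterable, List, NamedTuple


class HmmHit(NamedTuple):
    target: str    # protein that matched the profile
    query: str     # name of the profile HMM
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(lines: Iterable[str], max_evalue: float = 1e-5) -> List[HmmHit]:
    """Parse hmmsearch --tblout lines and keep hits at or below max_evalue.

    Columns used: 0 target name, 2 query name, 4 full-sequence E-value,
    5 full-sequence score. Comment lines start with '#'.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append(HmmHit(fields[0], fields[2], evalue, float(fields[5])))
    # Most significant hits first.
    return sorted(hits, key=lambda hit: hit.evalue)
```

For a file on disk, the lines can be supplied with `parse_tblout(open("hits.tbl"))`. Remote homologs may have weak E-values, so any cutoff used here should be chosen with care and checked against alignments, as described above.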

Assessment of genomic potential for photosynthesis within the Chloroflexota phylum

Representative genomes for the phylum Chloroflexota were downloaded from NCBI based on

information in the GTDB⁴, release 89. All genomes that represented a type species of Chloroflexota

according to NCBI and the GTDB were downloaded, as well as genomes representing members of

known phototrophic clades (i.e., the Chloroflexaceae family¹⁵, the ‘Ca. Thermofonsia’ order³⁰, and the

‘Ca. Roseiliniales’ order⁶⁴). Selected genomes of known phototrophs that were deposited in other

genome databases were also downloaded (see details in the code repository associated with this work).

In addition, any genome bins listed in the GTDB that belonged to the ‘54-19’ order (which represented

the name of the ‘Ca. Chloroheliales’ order in this database) were downloaded. Non-phototrophic

lineages were subsequently pruned to one representative per genus, with the exception of

non-phototrophic lineages closely related to phototrophic clades. In total, this left 58 genomes, including

genomes of 28 known phototrophs. A concatenated core protein phylogeny was constructed for this

genome collection using GToTree⁶⁵ version 1.4.11. The ‘Bacteria.hmm’ collection of 74 single-copy

marker genes was used, with a minimum of 30% of the genes required in each genome in order for the

genome to be kept in the final phylogeny. IQ-TREE⁶⁶ version 1.6.9 was used to build the phylogeny

from the masked and concatenated multiple sequence alignment. ModelFinder⁶⁷ was used to determine

the optimal evolutionary model for phylogeny building. Branch supports were calculated using 1000

ultrafast bootstraps⁶⁸.
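
The 30% marker gene threshold above can be illustrated with a short filter. This is a sketch of the stated rule only, not GToTree's internal logic; the function name and the genome hit counts are hypothetical.

```python
from typing import Dict, List


def genomes_passing_marker_filter(marker_hits: Dict[str, int],
                                  n_markers: int = 74,
                                  min_fraction: float = 0.30) -> List[str]:
    """Return genomes with hits to at least min_fraction of the marker set."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return sorted(genome for genome, hits in marker_hits.items()
                  if hits / n_markers >= min_fraction)


# Hypothetical hit counts: with 74 markers, 23 hits passes and 22 does not.
example = {"genome_a": 70, "genome_b": 23, "genome_c": 22}
print(genomes_passing_marker_filter(example))
```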

A collection of photosynthesis-associated genes, including genes for reaction centers, antenna

proteins, chlorosome structure and attachment, bacteriochlorophyll synthesis, and carbon fixation, was

selected based on the genome analyses of Tang and colleagues⁶⁹ and Bryant and colleagues⁵. Reference

sequences for these genes were selected from genomes of well-studied representatives of the phylum,

Chloroflexus aurantiacus⁷⁰, Oscillochloris trichoides⁷¹, and Roseiflexus castenholzii⁷². Bidirectional

BLASTP⁷³ was performed on these reference sequences against the entire Chloroflexota genome

collection (above) to detect potential orthologs. The BackBLAST pipeline⁷⁴, version 2.0.0-alpha3

(doi:10.5281/zenodo.3697265), was used for bidirectional BLASTP, and cutoffs for the e-value, percent

identity, and query coverage of hits were empirically optimized to 10⁻³, 20%, and 50%, respectively.

Genes involved in iron oxidation or reduction were also searched using FeGenie⁷⁵, commit 30174bb,

with default settings.
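
The bidirectional BLASTP logic with the cutoffs given above (e-value 10⁻³, 20% identity, 50% query coverage) can be sketched as a reciprocal best hit filter. This is a simplified illustration and is not BackBLAST's code; the `BlastHit` type, the use of bit score to rank hits, and the sequence identifiers are assumptions.

```python
from typing import Dict, Iterable, NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    evalue: float
    pident: float    # percent identity
    qcov: float      # percent query coverage
    bitscore: float


def best_hits(hits: Iterable[BlastHit], max_evalue: float = 1e-3,
              min_pident: float = 20.0, min_qcov: float = 50.0) -> Dict[str, BlastHit]:
    """Keep the highest-bitscore hit per query among hits passing all cutoffs."""
    best: Dict[str, BlastHit] = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.pident < min_pident or hit.qcov < min_qcov:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def reciprocal_best_hits(forward: Iterable[BlastHit],
                         reverse: Iterable[BlastHit]) -> Dict[str, str]:
    """Map each reference query to a subject only if each is the other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {query: hit.subject for query, hit in fwd.items()
            if hit.subject in rev and rev[hit.subject].subject == query}
```

Here `forward` holds hits of reference proteins against a genome's proteins and `reverse` holds hits of that genome's proteins back against the reference proteome. Requiring agreement in both directions reduces false positives from paralogs, at the cost of missing some divergent orthologs.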

Phylogenetic assessment of photosynthesis-associated genes

Selected photosynthesis-associated genes were tested against genes from other phyla to examine

the phylogenetic relationship of the novel ‘Ca. Chloroheliales’ sequences to those of other phototrophic

organisms. Genes for RCI/PSI (pscA/pshA/psaAB)⁷⁶, RCI-chlorosome attachment (fmoA)²², chlorosome

structure (csmA)²⁵, (bacterio)chlorophyll synthesis (bchIDH/chlIDH and bchLNB/chlLNB/bchXYZ)⁵,

and RuBisCO (cbbL)⁷⁷ were examined. Genomes potentially containing these genes of interest were

determined using a combination of automated detection via AnnoTree⁷⁸ and descriptions in the

literature. AnnoTree was also used to summarize the GTDB reference tree for downstream data

visualization. Representative genomes from this initial genome set were selected from the GTDB,

based on genome quality and taxonomic novelty, before being downloaded from NCBI. Potential

orthologs of the genes of interest were identified in the downloaded genomes using bidirectional

BLASTP, which was performed using the primary sequences of known reference genes as queries

(Supplementary Data 3) via BackBLAST⁷⁴ version 2.0.0-alpha3. In addition, for some sequence sets

(i.e., Type I reaction centers; Bch proteins; CbbL), additional reference sequences were manually added

based on literature references⁵,⁷⁶,⁷⁷. Sequence sets were then pruned to remove closely related

sequences. Identified primary sequences were aligned using Clustal Omega⁶³ version 1.2.3, and the

sequence sets were further verified by manually inspecting the sequence alignments for unusual

variants. Alignments were then masked to remove non-informative regions using Gblocks⁷⁹ version 0.91b

with relaxed settings (-t=p -b3=40 -b4=4 -b5=h) to preserve regions with remote homology (see results

in Extended Data Table 1). Maximum likelihood protein phylogenies were built from masked sequence

alignments using IQ-TREE⁶⁶ version 1.6.9. Evolutionary rate model selection was performed using

ModelFinder⁶⁷, and 1000 rapid bootstraps were used to calculate branch support values⁶⁸. Rate models

used are summarized in Extended Data Table 1.
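
The masking and tree-building steps above can be sketched as command lines assembled in Python. The Gblocks flags are those quoted in the text. The IQ-TREE flags (`-m MFP` for ModelFinder and `-bb` for ultrafast bootstraps) are an assumption about how the analysis was invoked, based on the cited UFBoot2 reference. The input file name is hypothetical, as is the assumption that Gblocks writes its output as the input name with a `-gb` suffix.

```python
from typing import List


def gblocks_command(alignment_fasta: str) -> List[str]:
    """Gblocks call with the relaxed protein settings quoted in the text."""
    return ["Gblocks", alignment_fasta, "-t=p", "-b3=40", "-b4=4", "-b5=h"]


def iqtree_command(masked_alignment: str, bootstraps: int = 1000) -> List[str]:
    """IQ-TREE 1.6.x call: ModelFinder model selection plus ultrafast bootstraps."""
    return ["iqtree", "-s", masked_alignment, "-m", "MFP", "-bb", str(bootstraps)]


# Hypothetical input file; the commands are printed here rather than executed.
alignment = "pscA_aligned.faa"
for command in (gblocks_command(alignment), iqtree_command(alignment + "-gb")):
    print(" ".join(command))
```

Building the argument lists separately from execution makes the exact settings easy to record alongside the rate models summarized in Extended Data Table 1.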

Data availability

The three enrichment culture metagenomes, along with recovered genome bins, are available

under the National Center for Biotechnology Information (NCBI) BioProject accession PRJNA640240.

Amplicon sequencing data are available under the same NCBI BioProject accession.

Code availability

Custom scripts and additional raw data files used to analyze the metagenome and genome data

are available at https://github.com/jmtsuji/Ca-Chlorohelix-allophototropha-RCI

(doi:10.5281/zenodo.3932366).

References

31. Schiff, S. L. et al. Millions of Boreal Shield lakes can be used to probe Archaean Ocean

biogeochemistry. Sci Rep 7, 46708 (2017).

32. Schindler, D. W. et al. Eutrophication of lakes cannot be controlled by reducing nitrogen input:

Results of a 37-year whole-ecosystem experiment. Proc Natl Acad Sci USA 105, 11254–11258

(2008).

33. Hegler, F., Posth, N. R., Jiang, J. & Kappler, A. Physiology of phototrophic iron(II)-oxidizing

bacteria: implications for modern and ancient environments. FEMS Microbiol Ecol 66, 250–260

(2008).

34. Vandermeulen, J. H., Davis, N. D. & Muscatine, L. The effect of inhibitors of photosynthesis on

zooxanthellae in corals and other marine invertebrates. Marine Biology 16, 185–191 (1972).

35. Stookey, L. L. Ferrozine - a new spectrophotometric reagent for iron. Anal Chem 42, 779–781

(1970).

36. Pfennig, N. Rhodocyclus purpureus gen. nov. and sp. nov., a ring-shaped, vitamin B12-requiring

member of the family Rhodospirillaceae. Int J Syst Evol Microbiol 28, 283–288 (1978).

37. Imhoff, J. F. The family Chromatiaceae. in The Prokaryotes (eds. Rosenberg, E., DeLong, E. F.,

Lory, S., Stackebrandt, E. & Thompson, F.) 151–178 (Springer Berlin Heidelberg, 2014).

doi:10.1007/978-3-642-38922-1_295.

38. Gaisin, V. A. et al. ‘Candidatus Viridilinea mediisalina’, a novel phototrophic Chloroflexi

bacterium from a Siberian soda lake. FEMS Microbiol Lett 366, (2019).

39. Imhoff, J. F. The family Chlorobiaceae. in The Prokaryotes (eds. Rosenberg, E., DeLong, E. F.,

Lory, S., Stackebrandt, E. & Thompson, F.) 501–514 (Springer Berlin Heidelberg, 2014).

doi:10.1007/978-3-642-38954-2_142.

40. Parada, A. E., Needham, D. M. & Fuhrman, J. A. Every base matters: assessing small subunit

rRNA primers for marine microbiomes with mock communities, time series and global field

samples. Environ Microbiol 1403–1414 (2016) doi:10.1111/1462-2920.13023.

41. Quince, C., Lanzen, A., Davenport, R. J. & Turnbaugh, P. J. Removing noise from pyrosequenced

amplicons. BMC Bioinform 12, 38 (2011).

42. Kennedy, K., Hall, M. W., Lynch, M. D. J., Moreno-Hagelsieb, G. & Neufeld, J. D. Evaluating

bias of Illumina-based bacterial 16S rRNA gene profiles. Appl Environ Microbiol 80, 5717–5722

(2014).

43. Bartram, A. K., Lynch, M. D., Stearns, J. C., Moreno-Hagelsieb, G. & Neufeld, J. D. Generation

of multimillion-sequence 16S rRNA gene libraries from complex microbial communities by

assembling paired-end Illumina reads. Appl Environ Microbiol 77, 3846–3852 (2011).

44. Cavaco, M. A. et al. Freshwater microbial community diversity in a rapidly changing High Arctic

watershed. FEMS Microbiol Ecol 95, fiz161 (2019).

45. Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using

QIIME 2. Nat Biotechnol 37, 852–857 (2019).

46. Lynch, M. D., Masella, A. P., Hall, M. W., Bartram, A. K. & Neufeld, J. D. AXIOME: automated

exploration of microbial diversity. GigaScience 2, 3–3 (2013).

47. Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data.

Nat Methods 13, 581–583 (2016).

48. Bokulich, N. A. et al. Optimizing taxonomic classification of marker-gene amplicon sequences

with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018).

49. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and

web-based tools. Nucleic Acids Res 41, D590–D596 (2013).

50. Glöckner, F. O. et al. 25 years of serving the community with ribosomal RNA gene reference

databases and tools. J Biotechnol 261, 169–176 (2017).

51. Chen, Z. et al. Ultra-low input single tube linked-read library method enables short-read

second-generation sequencing systems to generate highly accurate and economical long-range

sequencing information routinely. Genome Res gr.260380.119 (2020) doi:10.1101/gr.260380.119.

52. Kieser, S., Brown, J., Zdobnov, E. M., Trajkovski, M. & McCue, L. A. ATLAS: a Snakemake

workflow for assembly, annotation, and genomic binning of metagenome sequence data. BMC

Bioinform 21, 257 (2020).

53. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069

(2014).

54. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for

functional characterization of genome and metagenome sequences. J Mol Biol 428, 726–731

(2016).

55. White III, R. A. et al. ATLAS (Automatic Tool for Local Assembly Structures) - a comprehensive

infrastructure for assembly, annotation, and genomic binning of metagenomic and

metatranscriptomic data. PeerJ Preprints 5, e2843v1 (2017).

56. Singh, R. G. et al. Unipept 4.0: functional analysis of metaproteome data. J Proteome Res 18,

606–615 (2018).

57. Petrenko, P., Lobb, B., Kurtz, D. A., Neufeld, J. D. & Doxey, A. C. MetAnnotate:

function-specific taxonomic profiling and comparison of metagenomes. BMC Biol 13, 1–8 (2015).

58. Fish, J. A. et al. FunGene: the functional gene pipeline and repository. Front Microbiol 4, 291

(2013).

59. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic

expansion, and functional annotation. Nucleic Acids Res 44, D733–D745 (2016).

60. Eddy, S. R. Accelerated profile HMM searches. PLOS Comput Biol 7, e1002195 (2011).

61. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic

Acids Res 44, D279–D285 (2016).

62. Yang, J. et al. The I-TASSER Suite: protein structure and function prediction. Nat Methods 12,

7–8 (2015).

63. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments

using Clustal Omega. Mol Syst Biol 7, 539 (2011).

64. Tank, M., Thiel, V., Ward, D. M. & Bryant, D. A. A panoply of phototrophs: an overview of the

thermophilic chlorophototrophs of the microbial mats of alkaline siliceous hot springs in

Yellowstone National Park, WY, USA. in Modern Topics in the Phototrophic Prokaryotes:

Environmental and Applied Aspects (ed. Hallenbeck, P. C.) 87–137 (Springer International

Publishing Switzerland, 2017). doi:10.1007/978-3-319-46261-5_3.

65. Lee, M. D. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics 35, 4162–4164

(2019).

66. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective

stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268–274

(2015).

67. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S.

ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14, 587–589

(2017).

68. Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving

the ultrafast bootstrap approximation. Mol Biol Evol 35, 518–522 (2018).

69. Tang, K.-H. et al. Complete genome sequence of the filamentous anoxygenic phototrophic

bacterium Chloroflexus aurantiacus. BMC Genomics 12, 334 (2011).

70. Pierson, B. K. & Castenholz, R. W. A phototrophic gliding filamentous bacterium of hot springs,

Chloroflexus aurantiacus, gen. and sp. nov. Arch Microbiol 100, 5–24 (1974).

71. Keppen, O. I., Baulina, O. I. & Kondratieva, E. N. Oscillochloris trichoides neotype strain DG-6.

Photosynth Res 41, 29–33 (1994).

72. Hanada, S., Takaichi, S., Matsuura, K. & Nakamura, K. Roseiflexus castenholzii gen. nov., sp.

nov., a thermophilic, filamentous, photosynthetic bacterium that lacks chlorosomes. Int J Syst

Evol Microbiol 52, 187–193 (2002).

73. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search

tool. J Mol Biol 215, 403–410 (1990).

74. Bergstrand, L. H., Cardenas, E., Holert, J., Hamme, J. D. V. & Mohn, W. W. Delineation of

steroid-degrading microorganisms through comparative genomic analysis. mBio 7, e00166-16

(2016).

75. Garber, A. I. et al. FeGenie: A comprehensive tool for the identification of iron genes and iron

gene neighborhoods in genome and metagenome assemblies. Front Microbiol 11, 37 (2020).

76. Cardona, T. Early Archean origin of heterodimeric Photosystem I. Heliyon 4, e00548 (2018).

77. Tabita, F. R., Hanson, T. E., Satagopan, S., Witte, B. H. & Kreel, N. E. Phylogenetic and

evolutionary relationships of RubisCO and the RubisCO-like proteins and the functional lessons

provided by diverse molecular forms. Phil Trans R Soc B 363, 2629–2640 (2008).

78. Mendler, K. et al. AnnoTree: visualization and exploration of a functionally annotated microbial

tree of life. Nucleic Acids Res 47, 4442–4448 (2019).

79. Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and

ambiguously aligned blocks from protein sequence alignments. Syst Biol 56, 564–577 (2007).

Acknowledgments

We thank staff at the IISD-ELA and R. Elgood for providing baseline limnological data and

sampling advice; J. Mead, E. Barber, J. Wolfe, K. Thompson, and R. Henderson for assistance with

lake sampling; K. Engel for assistance with high-throughput DNA sequencing; Y. Shirotori and M.

Saini for assistance with enrichment cultivation; V. Gaisin for advice for cultivating filamentous

phototrophs; S. Hirose for assistance with Sanger sequencing and pigment analyses; R. Harris and E.

Roach for assistance with electron microscopy; T. Chen for advice during read cloud sequencing library

preparation; Y. Wang for assistance with read cloud sequence assembly; V. Thiel for advice about

reference genomes of Chloroflexota members; and N. Tran and K. Shimada for discussion and advice

about biochemistry. This work was supported by a Strategic Partnership Grant for Projects from the

Natural Sciences and Engineering Research Council of Canada (NSERC), Discovery Grants from

NSERC, and a research grant from the Institute for Fermentation, Osaka.

Author contributions

S.L.S., J.D.N., and J.J.V. conceived the study. J.M.T., M.T., and N.A.S. performed enrichment

cultivation. J.M.T. collected epifluorescence microscopy images and ran pigment analyses. J.M.T. and

N.A.S. performed electron microscopy, extracted genomic DNA, and performed metagenome library

preparation and sequencing. J.M.T. analyzed 16S rRNA gene and metagenome sequencing data,

conducted phylogenetic analyses, and visualized the data. J.M.T., S.N., M.T., S.H., and J.D.N.

interpreted the data and its impacts on understanding the evolution of photosynthesis. J.D.N., M.T., and

S.H. supervised research. J.M.T. wrote the paper with J.D.N., N.A.S., S.N., and comments from all

other authors. All authors have read, revised and approved the manuscript submission.

Ethics declarations

Competing interests

The authors declare no competing interests.


Supplementary Information | PDF file containing Supplementary Notes 1-4, Supplementary

Methods, and Supplementary References.

Supplementary Data 1 | Amplicon sequencing variant table of early phototroph enrichment

cultures from this study. The Excel file summarizes the percent relative abundances of amplicon

sequencing variants (ASVs) detected in 16S ribosomal RNA gene amplicon sequencing data

representing early enrichment cultures of ‘Ca. Chloroheliales’ members.

Supplementary Data 2 | Genomic context of novel Type I reaction center genes detected in this

study. The Excel file summarizes the top five BLASTP hits against the RefSeq database for predicted

open reading frames (ORFs) of each of 20 genes up/downstream of the detected pscA-like genes.

Supplementary Data 3 | Bidirectional BLASTP results for photosynthesis-related genes among

the Chloroflexota phylum. The Excel file summarizes the query sequences and results of bidirectional

BLASTP to search for photosynthesis-related genes in genomes of Chloroflexota members. These data

are visualized in Fig. 3.


Fig. 1 | Physiology of the ‘Ca. Chx. allophototropha’ culture. a, Example growth of the ‘Ca. Chx. allophototropha’ enrichment in soft agar. The left panel is a photograph of a test tube with a ruler shown for scale, and the right panel shows the same tube when viewed under a pigment fluorescence detection system optimized for bacteriochlorophyll c (Supplementary Methods; Extended Data Fig. 9). b-c, Epifluorescence microscopy images of ‘Ca. Chx. allophototropha’ cells. Cell autofluorescence is shown, with light excitation and detection wavelengths optimized for bacteriochlorophyll c. d, Scanning electron microscopy image of an aggregate of ‘Ca. Chx. allophototropha’ cells. e-f, Transmission electron microscopy images showing longitudinal and cross sections, respectively, of ‘Ca. Chx. allophototropha’ cells. Example chlorosomes are marked with arrows. g, Absorption spectrum (350-850 nm) of an acetone:methanol extract of pigments from the ‘Ca. Chx. allophototropha’ culture. Major absorption peaks are marked by dashed lines. Scale bars on the top row of images (b-d) represent 10 μm, and scale bars on the bottom row of images (e-f) represent 0.1 μm.

[Fig. 2 image: panels a-c. Recoverable labels: legend "Photosystem utilized" (RCI; RCII; PSII and PSI; RCII or RCI); phyla Acidobacteriota, Proteobacteria, Cyanobacteria, Eremiobacterota, Gemmatimonadota, Bacteroidota, and Chloroflexota; reaction center clades Chlorobia, Chloracidobacteriales, Ca. Chloroheliales, Heliobacteriales, and cyanobacterial PsaA and PsaB; groupings "Anoxygenic (RCI; homodimer)" and "Oxygenic (PSI; heterodimer)".]

Fig. 2 | Functional novelty of the ‘Ca. Chlorohelix allophototropha’ photosystem. a, Overview of the eight phyla containing known chlorophototrophs. A class-level summary of the Genome Taxonomy Database (GTDB) bacterial reference tree (release 89) is shown, with chlorophototrophic phyla highlighted based on the photosynthetic reaction center utilized. b, Maximum likelihood phylogeny of Fe-S type reaction center primary sequences. Ultrafast bootstrap values for major branch points are shown out of 100 based on 1000 replicates, and the scale bar shows the expected proportion of amino acid change across the 548-residue masked sequence alignment (Extended Data Table 1). Chlorophototrophic lineages are summarized by order name. The placement of the PscA-like sequence of ‘Ca. Chx. allophototropha’ is indicated by a black dot. All clades above the grey dot utilize chlorosomes. c, Predicted tertiary structure of the novel ‘Ca. Chlorohelix allophototropha’ PscA-like primary sequence based on homology modelling. The six N-terminal and five C-terminal transmembrane helices expected for RCI are coloured in red and tan, respectively.

[Fig. 3 image: phylogeny (left; scale bar 0.3) and gene presence/absence heatmap (right). Order labels: Chloroflexales, Ca. Chloroheliales, Ca. Thermofonsia 2, Ca. Thermofonsia 3. Heatmap gene groups: nitrogen fixation (nifD, nifH); CBB (Calvin) cycle (cbbL, cbbP); 3-hydroxypropionate bicycle (meh, mct, mch, mcl, pcs, mcr); (bacterio)chlorophyll synthesis (bchK, bchU, bchP, bchG, bchC, bchZ, bchY, bchX, bchF, bchL, bchB, bchN, acsF, bchE, bchJ, bchM, bchH, bchD, bchI); chlorosome structure and assembly (csmY, csmP, csmO, csmN, csmM, csmA); Type II reaction center-associated (pufC, pufB, pufA, pufLM, pufM, pufL); Type I reaction center-associated (pscA, fmoA). Colour scale: amino acid identity (%), 40-100.]

Fig. 3 | Genomic potential for phototrophy among members of the Chloroflexota phylum. A maximum likelihood phylogeny of key members of the phylum is shown on the left based on a set of 74 concatenated core bacterial proteins. Ultrafast bootstrap values are shown out of 100 based on 1000 replicates, and the scale bar shows the expected proportion of amino acid change across the 11,613-residue masked sequence alignment (Extended Data Table 1). The ‘Ca. Chloroheliales’ clade is highlighted in orange. On the right, a heatmap shows the presence/absence of genes involved in photosynthesis or related processes based on bidirectional BLASTP. The intensity of the fill colour of each tile corresponds to the percent identity of the BLASTP hit compared to the query sequence. Query sequences were derived from the genomes of ‘Ca. Chlorohelix allophototropha’, Chloroflexus aurantiacus, Roseiflexus castenholzii, or Oscillochloris trichoides, and the query organism used for each gene is marked by bolded outlines on the heatmap. Raw results of bidirectional BLASTP are summarized in Supplementary Data 3. Orders containing potential phototrophs are labelled on the right of the heatmap.
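The bidirectional BLASTP screen named in the Fig. 3 legend can be illustrated with a minimal sketch. The legend does not state the exact hit criteria, so the rule below (a gene is called present when the best forward hit and the best reverse hit point at each other, ranked by bit score) is an assumption, and the function names, dictionary layout, and example values are hypothetical rather than taken from the study.

```python
def best_hit(hits):
    """Return the hit with the highest bit score, or None if there are no hits."""
    return max(hits, key=lambda h: h["bitscore"]) if hits else None


def reciprocal_best_hits(forward, reverse):
    """Keep query genes whose best forward hit maps back to the same query.

    forward: {query_gene: [hit, ...]} from searching queries against a genome.
    reverse: {genome_orf: [hit, ...]} from searching those ORFs back against
             the query organism. Each hit is a dict with "subject",
             "bitscore", and "pident" (percent identity).
    Returns {query_gene: (genome_orf, percent_identity)}.
    """
    confirmed = {}
    for query, hits in forward.items():
        fwd = best_hit(hits)
        if fwd is None:
            continue  # no forward hit: gene treated as absent
        rev = best_hit(reverse.get(fwd["subject"], []))
        if rev is not None and rev["subject"] == query:
            # Percent identity of the forward hit would set the tile colour.
            confirmed[query] = (fwd["subject"], fwd["pident"])
    return confirmed


# Hypothetical example values, not data from the study.
forward_hits = {
    "pscA": [
        {"subject": "orf_12", "bitscore": 850.0, "pident": 62.5},
        {"subject": "orf_90", "bitscore": 120.0, "pident": 30.0},
    ],
    "fmoA": [{"subject": "orf_40", "bitscore": 95.0, "pident": 28.0}],
}
reverse_hits = {
    "orf_12": [{"subject": "pscA", "bitscore": 845.0, "pident": 62.5}],
    "orf_40": [{"subject": "csmA", "bitscore": 60.0, "pident": 25.0}],
}
present = reciprocal_best_hits(forward_hits, reverse_hits)
print(present)
```

In this toy input, pscA is confirmed because orf_12 maps back to it, whereas fmoA is rejected because its best hit maps back to a different gene.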


[Extended Data Fig. 1 image: bubble plots. Panel a: families (NCBI taxonomy) detected in unassembled metagenome data. Panel b: metagenome-assembled genome bins (GTDB taxonomy), each labelled with percent completeness/contamination. Columns: ‘Ca. Chx. allophototropha’ S1, ‘Ca. Chx. allophototropha’ S15, and ‘Ca. Chloroheliaceae bin L227-5C’ S0. Colour legend: phylum (GTDB / NCBI).]

Extended Data Fig. 1 | Composition of the ‘Ca. Chloroheliales’ enrichment cultures described in this study. a, Bubble plot showing the relative abundances of taxa within ‘Ca. Chloroheliales’ enrichment culture metagenomes based on classification of partial RpoB fragments recovered from unassembled reads. Bubbles are coloured based on phylum according to NCBI taxonomy, because reads were classified against the RefSeq database. Microbial families with greater than 2% relative abundance are shown. Two metagenomes were derived from subcultures #1 and 15 of the ‘Ca. Chx. allophototropha’ culture, and the third metagenome was derived from the initial enrichment of ‘Ca. Chloroheliales bin L227-5C’. b, Bubble plot showing the relative abundances of uncurated genome bins recovered from the same metagenomes. Relative abundances are expressed as the percentage of reads mapped to the genome bin compared to the total number of assembled reads. Bubbles are coloured by phylum according to GTDB taxonomy and are shown for all entries greater than 0.05% relative abundance. Listed beside each genome bin name are the GTDB genus that the bin was classified to and the percent completeness and contamination of the bin according to CheckM. Statistics shown for the ‘Ca. Chx. allophototropha’ bin differ slightly compared to Supplementary Note 2, which describes the statistics for the bin after manual bin curation.
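The relative abundance definition used for panel b (reads mapped to a genome bin as a percentage of the total number of assembled reads, with bins shown above 0.05%) can be written out as a short sketch. The function name, the bin names, and the read counts are hypothetical and are not data from the study; only the formula and the 0.05% display cutoff come from the legend.

```python
def relative_abundances(mapped_reads, total_assembled_reads, min_percent=0.05):
    """Return {bin_name: percent} for bins above the display cutoff.

    mapped_reads: {bin_name: number of reads mapped to that genome bin}
    total_assembled_reads: total number of assembled reads in the metagenome
    min_percent: display cutoff in percent (0.05 in the panel b legend)
    """
    if total_assembled_reads <= 0:
        raise ValueError("total_assembled_reads must be positive")
    shown = {}
    for name, count in mapped_reads.items():
        percent = 100.0 * count / total_assembled_reads
        if percent > min_percent:
            shown[name] = percent
    return shown


# Hypothetical read counts for three bins in one metagenome.
example = relative_abundances(
    {"bin_A": 391_000, "bin_B": 8_000, "bin_C": 300},
    total_assembled_reads=1_000_000,
)
print(example)
```

Here bin_C falls at 0.03% and is therefore omitted, mirroring how low-abundance bins are left out of the bubble plot.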

[Extended Data Fig. 2 image: phylogenetic tree with clades labelled PsaB, PsaA, PscA, PscA-like (Chloracidobacteriales and ‘Ca. Chloroheliales’), and PshA. Colour legend: phylum (order), covering Cyanobacteria / chloroplast, Bacteroidota (Chlorobia), Acidobacteriota (Chloracidobacteriales), Chloroflexota (Ca. Chloroheliales), and Firmicutes B (Heliobacteriales). Scale bar: 0.3.]

Extended Data Fig. 2 | Maximum likelihood phylogeny of oxygenic and anoxygenic Type I reaction center predicted protein sequences. A simplified depiction of the same phylogeny is shown in Fig. 2b. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 548-residue masked sequence alignment (Extended Data Table 1).

[Extended Data Fig. 3 image: FmoA phylogenetic tree. Colour legend: phylum (order), covering Bacteroidota (Chlorobia), Acidobacteriota (Chloracidobacteriales), and Chloroflexota (Ca. Chloroheliales). Scale bar: 0.1.]

Extended Data Fig. 3 | Maximum likelihood phylogeny of Fenna-Matthews-Olson (FMO) protein (FmoA) sequences. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 356-residue masked sequence alignment (Extended Data Table 1).

[Extended Data Fig. 4 image: panel a, multiple sequence alignment with x-axis "Alignment position (amino acid)", a "Similarity (BLOSUM62)" track scaled 0.0-1.0, and a phylum (order) colour bar covering Chloroflexota (Ca. Chloroheliales), Chloroflexota (Chloroflexales), Bacteroidota (Chlorobia), and Acidobacteriota (Chloracidobacteriales); panels b and c, protein structure models.]

Extended Data Fig. 4 | Structural properties of the predicted chlorosome protein CsmA in ‘Ca. Chx. allophototropha’. a, Multiple sequence alignment of the CsmA primary sequence including representatives from known chlorosome-containing phyla. The entire CsmA primary sequence is shown, and a colour bar on the left of the alignment indicates the phylum and order associated with each sequence. A corresponding maximum likelihood phylogeny is shown in Extended Data Fig. 6. The BLOSUM62 sequence similarity score is shown underneath the alignment. Residues that are conserved across all sequences are marked with a black asterisk, and the H25 histidine residue predicted to be involved in bacteriochlorophyll a binding is marked with a red asterisk. b, Homology model showing the possible tertiary structure of the ‘Ca. Chx. allophototropha’ CsmA protein as predicted by I-TASSER. The tryptophan residue corresponding to the typical H25 residue is indicated by an arrow. c, Tertiary structure of the CsmA protein from Chlorobium tepidum (PDB accession 2K37); the H25 site is marked with an arrow.

[Extended Data Fig. 5 image: CbbL phylogenetic tree with clades labelled IC/ID, IA/IB, III (A), III (B), III (C), II, and II/III. Colour legend: phylum (order), covering Chloroflexota (Chloroflexales), Chloroflexota (basal; non-phototrophic), and Chloroflexota (Ca. Chloroheliales). Scale bar: 0.2.]

Extended Data Fig. 5 | Maximum likelihood phylogeny of RuBisCO large subunit (CbbL) predicted protein sequences. Group I to III CbbL sequences are shown, and group IV sequences, which do not form proteins involved in carbon fixation, are omitted for conciseness. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 412-residue masked sequence alignment (Extended Data Table 1).
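As a rough illustration of how to read such a scale bar, the sketch below converts a branch length (expected changes per site) into an expected number of amino acid changes over the masked alignment. It assumes that the value 0.2 printed alongside the tree is the scale bar length; the function name `expected_changes` is hypothetical and is not part of the authors' workflow.

```python
def expected_changes(branch_length, n_sites):
    """Expected number of amino acid changes along a branch.

    branch_length: expected proportion of changed sites (changes per site).
    n_sites: number of residues in the masked alignment.
    """
    return branch_length * n_sites


# Assumed scale bar length (0.2) and the 412-residue masked CbbL alignment.
cbbl_scale_bar = expected_changes(0.2, 412)
print(f"A branch as long as the scale bar implies about {cbbl_scale_bar:.1f} changes")
```

Under these assumptions, a branch as long as the scale bar corresponds to roughly 82 expected changes across the 412 aligned residues.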

[Figure: phylogenetic tree of BchIDH/ChlIDH sequences. Legend, Phylum (order): Bacteroidota (Chlorobia); Chloroflexota (Chloroflexales); Chloroflexota (Ca. Chloroheliales); Cyanobacteria / chloroplast; Firmicutes B (Heliobacteriales); Proteobacteria (various); Gemmatimonadota (Gemmatimonadales); Acidobacteriota (Chloracidobacteriales). Scale bar: 0.1.]

Extended Data Fig. 6 | Maximum likelihood phylogeny of BchIDH/ChlIDH protein sequences. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 1702-residue masked sequence alignment (Extended Data Table 1). No sequences associated with the Eremiobacterota phylum are shown, because the complete BchIDH gene set could not be detected in any of the genome bins available from that phylum using bidirectional BLASTP.

[Figure: phylogenetic tree of BchLNB/ChlLNB and BchXYZ sequences. Legend, Phylum (order): Proteobacteria (various); Chloroflexota (Chloroflexales)*; Gemmatimonadota (Gemmatimonadales); Chloroflexota (Ca. Chloroheliales); Acidobacteriota (Chloracidobacteriales); Firmicutes B (Heliobacteriales); Cyanobacteria / chloroplast; Eremiobacterota (UBP12); Bacteroidota (Chlorobia). Clade labels: BchLNB/ChlLBC, BchXYZ. Scale bar: 0.2.]

Extended Data Fig. 7 | Maximum likelihood phylogeny of predicted protein sequences of the paralogs BchLNB/ChlLNB and BchXYZ. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 819-residue masked sequence alignment (Extended Data Table 1). One sequence among the Chloroflexales cluster for BchXYZ, associated with ‘Ca. Thermofonsia bin JP3-7’, actually belongs to a genome bin that places in the basal clades of the Chloroflexota phylum (see Fig. 3) and is thus marked with an asterisk.

[Figure: phylogenetic tree of CsmA sequences. Legend, Phylum (order): Bacteroidota (Chlorobia); Acidobacteriota (Chloracidobacteriales); Chloroflexota (Chloroflexales); Chloroflexota (Ca. Chloroheliales). Scale bar: 0.2.]

Extended Data Fig. 8 | Maximum likelihood phylogeny of CsmA predicted protein sequences. The phylogeny is midpoint rooted, and ultrafast bootstrap values of at least 80/100 are shown. The scale bar represents the expected proportion of amino acid change across the 74-residue masked sequence alignment (Extended Data Table 1).

[Figure: schematic of the fluorescence detection system. Labels: Camera (no IR filter) with CCD; NIR bandpass filter (720-820 nm); LED light source (380-500 nm; 450 nm peak); Bacterial culture. Inset labels: Cytoplasm; Periplasm; BChlC; FMO; RCI; Chlorosome (inside cell membrane).]

Extended Data Fig. 9 | Schematic of a pigment fluorescence detection system optimized for bacteriochlorophyll c. Light is depicted as straight lines with arrows, and pigment-containing microbial biomass inside the tube is depicted as green dots. The fluorescently active chlorosome complex of an example Type I reaction center-containing cell is shown as an inset. An example photograph using the system is shown in Fig. 1a. Abbreviations: LED, light-emitting diode; IR, infrared; NIR, near infrared; CCD, charge-coupled device; BChlC, bacteriochlorophyll c; FMO, Fenna-Matthews-Olson protein; RCI, Type I photosynthetic reaction center.

Extended Data Table 1 | Properties of sequence alignments and phylogenetic trees presented in this study.

Protein set                                     Alignment length   Alignment length   Evolutionary rate
                                                (unmasked)         (masked)           model used
Core proteins (Chloroflexota phylum)            NA                 11 613             LG+F+R6
Type I reaction center (PsaA/PsaB/PscA/PshA)    1292               548                LG+F+G4
FmoA                                            399                356                LG+G4
CbbL                                            644                412                LG+I+G4
CsmA                                            89                 74                 LG+F+G4
BchIDH/ChlIDH                                   3156               1702               LG+F+I+G4
BchLNB/ChlLNB/BchXYZ                            2010               819                LG+F+I+G4
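To make the effect of masking easier to compare across protein sets, the following minimal sketch computes the fraction of alignment columns retained after masking, using only the lengths listed in Extended Data Table 1. The names `ALIGNMENTS`, `retained_fraction`, and `RETAINED` are illustrative and do not come from the study; the core protein set is left out because its unmasked length is reported as NA.

```python
# Alignment lengths (unmasked, masked) copied from Extended Data Table 1.
# The core protein set is omitted because its unmasked length is listed as NA.
ALIGNMENTS = {
    "Type I reaction center (PsaA/PsaB/PscA/PshA)": (1292, 548),
    "FmoA": (399, 356),
    "CbbL": (644, 412),
    "CsmA": (89, 74),
    "BchIDH/ChlIDH": (3156, 1702),
    "BchLNB/ChlLNB/BchXYZ": (2010, 819),
}


def retained_fraction(unmasked, masked):
    """Fraction of alignment columns kept after masking."""
    return masked / unmasked


RETAINED = {
    name: retained_fraction(unmasked, masked)
    for name, (unmasked, masked) in ALIGNMENTS.items()
}

for name, fraction in RETAINED.items():
    print(f"{name}: {fraction:.1%} of columns retained")
```

By this measure, the FmoA alignment keeps the largest share of its columns and the BchLNB/ChlLNB/BchXYZ alignment the smallest.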