Heme A-containing oxidases evolved in the ancestors of iron oxidizing

Supplemental Material

Additional methodological approaches and findings
This Supplemental Material file includes additional methodological approaches and findings. They are described in detail to document our in-depth analysis of key accessory proteins of COX enzymes: CtaA, CtaG and SURF1. The Supplemental Material includes 12 Supplementary Figures and 6 Supplementary Tables, as well as various Supplementary References, which are listed on p. 9 of this document following the numeration in the main text. The Supplementary Tables are pasted at the end of this document, but can also be supplied as independent .xls files, as indicated in their legends.

CtaA
Although most seem to have heme A-containing COX enzymes [2, 7, 31, 32], no exhaustive study on the taxonomic distribution of these enzymes has been reported recently. We have considered CtaA, heme A synthase, as a potential proxy for the taxonomic distribution of heme A-containing COX enzymes, and undertook a systematic genomic search for heme A synthase among all prokaryotes currently represented in the comprehensive nr database and other genome repositories. Using multiple queries combined with iterative BLAST searches (see Material and Methods, cf. [23]), we could not find CtaA proteins in anaerobic phyla such as Dictyoglomi and . Apart from clear, isolated cases of LGT, we also failed to find CtaA proteins in the following taxonomic groups, besides the lineages of the Candidate Phyla Radiation [43]: , , facultatively anaerobic such as Persephonella, and sulfate-reducing such as Desulfovibrio. Remarkably, most of these groups have genes for CtaB, and some of them have HCO terminal oxidases classified in the A family [2, 32], as shown in some phylogenetic trees of COX subunits presented in this paper.
These taxa must therefore have either heme B or heme O in the oxygen-reacting center of their A family oxidases, similarly to the b(o)3 oxidases of Desulfovibrio [23].

During the exhaustive genomic survey of heme A synthase, we discovered several taxa that have a type 2 CtaA together with another type of the protein, most frequently type 1, in their genome (Supplementary Table S4, cf. Table 1). Previously, this dual presence of heme A synthases had been reported only for an ill-defined MAG of [22], which was not included in Supplementary Table S4 because of the limited completeness of its genome. Remarkably, in several related to Ca. Accumulibacter, the gene for what appears to be a non-functional variant of type 1 CtaA (Table 1) is followed by the gene for a type 2 CtaA; namely, the genes encoding the two types of heme A synthase are concatenated with each other and precede the gene cluster of a B family oxidase (Supplementary Table S4 and data not shown). This gene concatenation strongly suggests that the evolution of the various types of CtaA has followed gene duplication and subsequent diversification, as illustrated in the scheme of Fig. 3b. In other taxa, for example the Bacteroidetes bacterium Flavobacterium johnsoniae, the genes of the two different types of CtaA are dispersed along the genome (Supplementary Table S4 and data not shown).

Our genomic survey also identified a number of that have type 1 CtaA instead of the type 2 characteristic of the class [20]. Previously, type 1 CtaA had been reported only in Tistrella and Geminicoccus [22], marine taxa that, together with Arboricoccus, may form the family Geminicoccaceae among Rhodospirillales (see [51] and references therein). These proteins cluster together in extended phylogenetic trees, forming a sister group to the branch containing other type 1 proteins from unclassified Alphaproteobacteria, such as OUV28671 of Alphaproteobacteria bacterium TMED109 [66] (Supplementary Fig. S4a). These unclassified Alphaproteobacteria also live in marine environments, and their number is steadily increasing in genome repositories. The number of alphaproteobacterial taxa possessing type 1 CtaA has increased from two in 2016 [22] to 11 in 2018 (MDE, unpublished data) and to 54 as of February 2020 (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 19 Feb 2020). Interestingly, contrary to a previous report [22], the single case of type 1 CtaA found in mitochondria [22] clusters with the branch of unclassified marine Alphaproteobacteria rather than with that containing Tistrella and Geminicoccaceae (Supplementary Fig. S4a,b). We are still searching for alphaproteobacterial MAGs that may have both type 1 and type 2 CtaA genes, as in the case of other listed in Supplementary Table S4.

The previous genomic survey of heme A synthases [22] failed to detect type 2 CtaA proteins present in , and Ca. Calditrichaeota (Table 1 and Supplementary Table S4).
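The exhaustive surveys described in this section rely on multiple queries combined with iterative BLAST searches. The Python sketch below illustrates only the bookkeeping of such an iterative expansion, under stated assumptions: hits from each round become queries for the next round until no new sequences appear. The `search` callable and the toy hit table are hypothetical stand-ins for a real BLAST call and its results; this is not the pipeline used in this work.

```python
from typing import Callable, Dict, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     search: Callable[[str], Set[str]],
                     max_rounds: int = 10) -> Set[str]:
    """Expand a set of sequence IDs by re-querying with every newly found hit.

    `search` is a hypothetical stand-in for a BLAST-like call that returns
    the IDs of significant hits for one query ID.
    """
    found: Set[str] = set(seeds)
    frontier: Set[str] = set(found)
    for _ in range(max_rounds):
        new_hits: Set[str] = set()
        for query in frontier:
            new_hits |= search(query) - found
        if not new_hits:  # converged: this round retrieved nothing new
            break
        found |= new_hits
        frontier = new_hits  # only new sequences are used as queries
    return found


# Hypothetical hit table: each query ID maps to the IDs it retrieves.
TOY_HITS: Dict[str, Set[str]] = {
    "seed": {"hitA", "hitB"},
    "hitA": {"hitC"},
    "hitB": {"seed"},
    "hitC": set(),
}

print(sorted(iterative_search(["seed"], lambda q: TOY_HITS.get(q, set()))))
```

Limiting `max_rounds` mimics stopping an iterative search early; in practice, each round would also need an E-value threshold and manual curation of the retrieved sequences.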
We then found a group of about 500 CtaA proteins that lack the Cys pairs in diverse such as (Supplementary Fig. S1a), and CFB (Chlorobi, Flavobacteria and , Table 1). Sequence analysis indicated that these proteins have structural features that differ from those of type 2 CtaA proteins, in particular a shorter ECL1 (compare Fig. 2b with Supplementary Fig. S1a, cf. Table 1). Phylogenetic analysis then clarified that this new type of CtaA proteins clusters with a subtype variant of type 1 CtaA that is always different from the one forming the sister group of type 2 CtaA (Supplementary Table S2 and Fig. 3a). Consequently, these CtaA proteins likely derived from type 1 variants through a secondary loss of one or both Cys pairs, and were therefore called type 1.5 (Table 1).

The late-diverging position of the newly defined type 1.5 CtaA proteins was initially unclear in the unrooted phylogenetic trees that we routinely produced using various sets of proteins and different methods (not shown). To solve the problem of the root in the overall phylogenetic trees of CtaA proteins, which has persisted since the work of He et al. [22], we first considered the short CtaA protein of the archaeon Aeropyrum as a potential root. According to a previous hypothesis, the Aeropyrum variant of type 1 might constitute a reasonable ancestor for the superfamily of heme A synthases [20, 65]. However, we found that Aeropyrum CtaA does not form a basal branch in the phylogenetic trees of CtaA proteins, but rather clusters with type 1 proteins that vary from tree to tree, depending on the method and settings used to build the trees (Fig. 3a and Supplementary Figs. S2-S3). The same pattern was found for 4 TM CtaA proteins from other archaeal lineages, which clustered with type 1 variants different from those close to Aeropyrum CtaA (Fig. 3a and Supplementary Fig. S3). These findings suggest that the short CtaA proteins present in the genomes of diverse archaeal lineages likely derive from separate events of LGT from bacteria, as for various terminal oxidases and other bioenergetic enzymes [6, 32]. Consequently, the short archaeal CtaA was defined as type 1.4, because of its likely origin from the splitting of genes for type 1.1 CtaA (Table 1 and Fig. 3b).

We next looked into other 4 TM proteins that may serve as a potential root in the phylogenetic trees of CtaA proteins. Cyt b561 of E. coli and related [67] has been found to resemble the 3D structure of the C-terminal domain of B. subtilis CtaA [21]. However, sequence alignment of E. coli Cyt b561 to the C-terminal domain of CtaA proteins required an extensive gap in order to match the His ligands of the cyt b heme, thereby producing distorted ML trees with a poorly resolved root (not shown). Conversely, we found that sequence alignment of proteins containing the Domain of Unknown Function 420 (DUF420, http://pfam.xfam.org/family/DUF420, first accessed on 23 December 2018) with the N-terminal domain of CtaA proteins produced a good local sequence match, including the two conserved His residues that are believed to form the (transient) axial ligands of the heme O substrate in B. subtilis CtaA [21]. We then extended these preliminary alignments to encompass the most divergent DUF420 proteins, including CtaM, which has been shown to be involved in the maturation of cytochrome aa3 in S. aureus [61]. Before validating DUF420 proteins as rooting sequences, and consequently as potential ancestors of CtaA proteins, we undertook a thorough analysis and detailed taxonomic survey of these proteins, which is summarized below.

In the genomic surveys mentioned earlier, genes encoding what is usually defined as the DUF420 family domain (http://pfam.xfam.org/family/DUF420, last accessed on 2 February 2020) were frequently encountered near COX and related genes (Supplementary Table S5a). The DUF420 domain is present in BAF67254, the protein from S. aureus that has been named CtaM [61]. CtaM is similar to B. subtilis YozB (COG2322), which has a role in the biogenesis of the oxygen-reacting centre of aa3 oxidase, according to recent unpublished results by Author LH. Extensive genomic searches (Supplementary Table S5a) have shown that there are two different clades of DUF420-containing proteins in , the vast order of that includes both S. aureus and B. subtilis. The first, and apparently oldest, clade has homologues among and related taxa that form the deepest branching group of the phylum Firmicutes [43] (Supplementary Fig. S5). The ML tree of the most diverse DUF420 proteins shows a robust separation (over 90% bootstrap support) of two major clades, both of which contain proteins from Bacillales taxa, as shown in Supplementary Fig. S5. Clade 1 contains proteins from Firmicutes of the family Alicyclobacillaceae (one member of which, A. ferrooxidans, is an acidophilic Fe2+-oxidizer with a deep branching CtaA, cf. Fig. S3b and Ref. [41]), plus proteins that either form a three-gene unit with CtaA and CtaB, as in the case of S. aureus CtaM [61], or are inserted at the end of CtaA-G operons for COX.
In contrast, clade 2 proteins appear to be late diverging with respect to those of clade 1, lying at the tip of the branch containing the DUF420 proteins of Gram-negative bacteria (box labelled 2 in Supplementary Fig. S5). These findings, combined with complementary trees, suggest that B. subtilis YozB and related proteins such as B. firmus ORF1 might have been laterally transferred from members of the order Chitinophagales of CFB, since a taxon of this order, Niastella, appears to have the closest homologue to the group of Bacilli 2 (Fig. S5, top). In molecular terms, the two clades of DUF420 proteins differ in the number of predicted TM helices (four for clade 1/CtaM and apparently five for clade 2/YozB, with the additional helix at the N terminus) and in the presence of a distinctive C-terminal extension of 15 residues in clade 1/CtaM, which is shared with early diverging proteins from Nitrospirae. In all cases, DUF420 proteins may be involved in the assembly of the binuclear oxygen-reacting center of A family, sometimes B family, and probably also C family oxidases (data not shown), given the genomic location of Leptospirillum DUF420 in a gene cluster containing various assembly factors for the cbb3 oxidase that supports iron oxidation in these bacteria (cf. Ref. [28]). The DUF420 proteins of Leptospirillum taxa form the deepest branch in the phylogenetic trees of the whole family, together with other proteins from Nitrospirae, both those classified as N. gracilis and unclassified ones (altogether labelled Nitrospirae in Fig. S5). Intriguingly, a related protein, OLB2753 of Nitrospirae bacterium 13_2_20CM_2_63_8, shows two partial DUF420 domains fused together. This finding supports the possibility that ancestral CtaA arose in acidophilic iron-oxidizers from the duplication and partial diversification of DUF420 proteins, as shown in the scheme of Fig. 3b.

The insertion of DUF420 proteins in the alignments of CtaA sequences produced rooted trees that consistently exhibited the following elements of topology (Fig. 3a and Supplementary Figs. S2-S3):
1. Type 1.5 is sister to a type 1 branch, but never sister to type 2 or type 0;
2. Type 0 (e.g. Acidithiobacillus) is nearly always (97%) sister to all the other types, i.e. it is basal to the whole tree;
3. Type 2 is sister to a type 1 branch different from the sister of type 1.5;
4. DUF420 proteins form the root of the trees.
The statistical analysis of these elements of tree topology, which define the molecular evolution of CtaA proteins (Table 1), is presented in Supplementary Table S2. These elements were also used to formulate the priors for dedicated Bayesian trees (Supplementary Fig. S2b).
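The four topology elements listed above can be checked mechanically on each replicate tree and tallied as percentages, in the spirit of the 97% figure for type 0. The Python sketch below assumes that trees have already been parsed into nested pairs, with each leaf labelled "group:taxon"; the labels, the toy trees and all helper names are hypothetical, and this is not the procedure used to build Supplementary Table S2.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple, Union

# Rooted binary tree: a leaf is a "group:taxon" string, an internal node a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def group_of(leaf: str) -> str:
    return leaf.split(":", 1)[0]


def leaves(node: Tree) -> FrozenSet[str]:
    if isinstance(node, str):
        return frozenset([node])
    return leaves(node[0]) | leaves(node[1])


def group_leaves(tree: Tree, group: str) -> FrozenSet[str]:
    return frozenset(x for x in leaves(tree) if group_of(x) == group)


def sister_clade(tree: Tree, group: str) -> Optional[FrozenSet[str]]:
    """Leaves of the clade sister to `group`, or None if `group` is not a clade."""
    target = group_leaves(tree, group)
    stack: List[Tree] = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        left, right = node
        if leaves(left) == target:
            return leaves(right)
        if leaves(right) == target:
            return leaves(left)
        stack.extend(node)
    return None


def only_type1(clade: Optional[FrozenSet[str]]) -> bool:
    return clade is not None and {group_of(x) for x in clade} == {"type1"}


def check_elements(tree: Tree) -> Dict[str, bool]:
    s15 = sister_clade(tree, "type1.5")
    s2 = sister_clade(tree, "type2")
    s0 = sister_clade(tree, "type0")
    rest = leaves(tree) - group_leaves(tree, "DUF420") - group_leaves(tree, "type0")
    return {
        "1.5 sister to type 1": only_type1(s15),
        "type 0 basal": s0 == rest,  # sister to every other ingroup leaf
        "2 sister to another type 1 branch":
            only_type1(s2) and s15 is not None and s2.isdisjoint(s15),
        "DUF420 at root": isinstance(tree, tuple)
            and group_leaves(tree, "DUF420") in (leaves(tree[0]), leaves(tree[1])),
    }


def tally(trees: List[Tree]) -> Dict[str, float]:
    """Percentage of trees in which each topology element holds."""
    results = [check_elements(t) for t in trees]
    return {k: 100.0 * sum(r[k] for r in results) / len(trees) for k in results[0]}


# Hypothetical toy trees, not real CtaA trees.
DUF = ("DUF420:a", "DUF420:b")
T0 = ("type0:Acidithiobacillus", "type0:Acidiferrobacter")
C15 = (("type1.5:x", "type1.5:y"), "type1:p")
C2 = (("type2:r", "type2:s"), "type1:q")
TREE_A = (DUF, (T0, (C15, C2)))
TREE_B = (DUF, (C15, (T0, C2)))  # here type 0 is no longer basal
print(tally([TREE_A, TREE_B]))
```

Checking leaf-set identity, rather than node labels, keeps each criterion independent of how a tree program orders or names its internal nodes.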


CtaG
The insertion and assembly of a Cu atom in the oxygen-reducing center of COX enzymes is intimately connected with the insertion of heme A [24, 25]. Membrane proteins called CtaG are specifically involved in this process and are divided into two different superfamilies [23]. In this work, we have studied only the multi-TM superfamily of caa3_CtaG, which was initially characterized in B. subtilis [24]. The ctaG gene in the ctaBCDEFG operon of caa3 oxidase was found in the first report of the B. subtilis genome [68]. Most of the available information on this protein derives from subsequent studies in B. subtilis [24]. Recently, homologs of Bacillus caa3_CtaG have been reported in Alphaproteobacteria [23] and Actinobacteria [69]. In the latter phylum, which probably contains most of the caa3_CtaG proteins currently available, the CtaG domain is fused with the domain of another Cu-binding protein, CopD [69]. Previously, CopD-related proteins from the Deinococcus-Thermus phylum had been considered as possible ancestors of the caa3_CtaG proteins [23]. However, phylogenetically broad trees have since shown that these proteins form an internal, late-diverging branch within the caa3_CtaG superfamily (results not shown). These results emerged from a systematic genomic search for genes encoding recognized members of the caa3_CtaG superfamily, which we undertook with the same approaches used for CtaA proteins.

All variants of CtaG proteins currently present in genome repositories are recognized by the conserved domain [50] of the caa3_CtaG superfamily, cl09173, which includes pfam09678 (focused on Deinococcus proteins), COG3336 (focused on full proteins from Bacillus and truncated proteins from Alphaproteobacteria) and TIGR02737, focused on Bacillales proteins only (http://tigrfams.jcvi.org/cgi-bin/HmmReportPage.cgi?acc=TIGR02737, last accessed on 21 Feb 2020).
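Classifications such as the multi-TM caa3_CtaG superfamily, the 7 TM protein of B. subtilis or the four- versus five-helix DUF420 clades all rest on predicted transmembrane (TM) helices. The text does not state which predictor was used; purely as an illustration, the Python sketch below counts candidate TM segments with the classical Kyte-Doolittle hydropathy scale (19-residue window, mean threshold 1.6). This is a swapped-in, deliberately simple technique, and the toy sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def count_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> int:
    """Count runs of consecutive windows whose mean hydropathy exceeds `threshold`.

    Unknown residue letters score 0.0; sequences shorter than `window` give 0.
    """
    seq = seq.upper()
    segments = 0
    inside = False
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        mean = sum(KD.get(aa, 0.0) for aa in chunk) / window
        if mean > threshold and not inside:
            segments += 1  # entering a new hydrophobic stretch
            inside = True
        elif mean <= threshold:
            inside = False
    return segments


# Hypothetical toy sequence: two hydrophobic stretches separated by charged loops.
TOY_SEQ = "K" * 20 + "L" * 25 + "K" * 20 + "L" * 25 + "K" * 20
print(count_tm_segments(TOY_SEQ))  # 2
```

Dedicated predictors based on hidden Markov models are considerably more reliable than a fixed-threshold scan; the sketch only conveys why helix counts can differ between closely related proteins.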
Overall, the taxonomic distribution of these proteins is clearly narrower than that of either CtaA or DUF420 proteins (Supplementary Table S5a). According to the website for pfam09678 (http://pfam.xfam.org/family/Caa3_CtaG, accessed on 20 Feb 2020), caa3_CtaG proteins are predominantly present in Proteobacteria, Actinobacteria and Firmicutes (chiefly Bacillales). Although this distribution is clearly underestimated given the current taxonomic richness of the nr database (Gemmatimonadetes, for instance, have at least one order of magnitude more caa3_CtaG proteins than the few reported on the pfam09678 website), there is clear evidence for a limited distribution among the taxa that have COX enzymes and CtaA accessory proteins. Hence, other proteins may fulfill the same role of inserting Cu into the oxygen-reacting center of A family oxidases in various bacterial phyla, as discussed later.

Actinobacteria predominantly have the abovementioned fused protein [69], but also have homologs of the 7 TM protein of B. subtilis in either unclassified MAGs or the deep branching group of Acidimicrobiales, which do not cluster together with the fused proteins typical of and Mycobacteria (Fig. 4b, Supplementary Fig. S6 and data not shown). These results were obtained from phylogenetic trees rooted on the central and C-terminal domains of Corynebacterium MATE efflux transporters [70], which align well with the manually curated sequences of caa3_CtaG proteins (cf. Supplementary Fig. S6 and data not shown). These root proteins show the conserved domain cd13136, characteristic of the multidrug and toxic compound extrusion (MATE)-like proteins [70], which appear to be involved also in the transport of heavy metals such as Al. The great majority of rooted phylogenetic trees of caa3_CtaG proteins including all major taxonomic groups show the proteins of Acidithiobacillus spp. and Acidiferrobacter spp. in the earliest branching group (Fig. 4b and Supplementary Table S6).

Previous phylogenetic analysis indicated that caa3_CtaG proteins encoded by isolated genes, as in Rhodovibrio, a non-photosynthetic member of the order Rhodospirillales that is part of a marine clade [51] or new family [71], were deep branching with respect to similar proteins from other Rhodospirillales and most Alphaproteobacteria [23]. Rooted phylogenetic trees extended to all the taxonomic groups that have caa3_CtaG proteins have since shown that the Rhodovibrio protein forms a group that includes long proteins predicted to have 8 TM helices (as shown in Fig. 4) from Acidiphilium, the deepest branching genus of the family Acetobacteraceae [71], and also Metallibacterium (Fig. 4b and Supplementary Fig. S6). The latter taxon is an iron-metabolizing member of the which also possesses the ancestral type of CtaA (Supplementary Fig. S1b). The henceforth named Rhodovibrio group is characterized by sequence signatures such as a Cys residue lying just before the conserved D249, which is likely to be involved in Cu binding (Supplementary Fig. S7). In half of the rooted phylogenetic trees of caa3_CtaG proteins, the Rhodovibrio group is in sister position to the branch formed by the bifunctional proteins of Actinobacteria and the 7 TM proteins of Chloroflexi (Fig. 4b and Supplementary Fig. S6a; see also Supplementary Table S6). In other trees, the Rhodovibrio group and the bifunctional/Chloroflexi group instead branch from a common stem, as shown in the Bayesian tree in Supplementary Fig. S6b. A similar comb-like topology is seen in ML trees condensed with the routine cut-off of 50% bootstrap support [52], not just for caa3_CtaG proteins (Fig. 7d), but also for other COX assembly proteins and for COX1 (Fig. 7 and Supplementary Fig. S2). The simplest interpretation of the comb-like tree topology is that the internal nodes of phylogenetic trees do not have enough statistical support to produce separate branches for several groups of either taxonomically or structurally related proteins. So far, comb-like trees have been found and interpreted almost exclusively in phylogenetic trees of taxa [72,73]; therefore, we have not analyzed this pattern further beyond its implication of a crown evolution of COX (main text), which is mentioned later in relation to SURF1 evolution.
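The condensation step mentioned above, in which internal nodes with less than 50% bootstrap support are collapsed, is what turns a ladder of weakly supported nodes into a comb (a polytomy). A minimal Python sketch, assuming a simple nested-tuple tree with bootstrap values in percent; the node layout, the helper names and the toy tree are hypothetical and do not correspond to the trees in Fig. 7 or Supplementary Fig. S6.

```python
from typing import List, Optional, Tuple, Union

# A leaf is a name; an internal node is (children, bootstrap support in %).
# The root carries support None.
Node = Union[str, Tuple[List["Node"], Optional[float]]]


def collapse(node: Node, cutoff: float = 50.0) -> Node:
    """Merge into their parent all internal nodes with support below `cutoff`."""
    if isinstance(node, str):
        return node
    children, support = node
    merged: List[Node] = []
    for child in children:
        child = collapse(child, cutoff)
        if isinstance(child, tuple) and child[1] is not None and child[1] < cutoff:
            merged.extend(child[0])  # weakly supported node: lift its children up
        else:
            merged.append(child)
    return (merged, support)


def to_newick(node: Node) -> str:
    """Serialize without branch lengths, writing support as the node label."""
    if isinstance(node, str):
        return node
    children, support = node
    label = "" if support is None else str(int(support))
    return "(" + ",".join(to_newick(c) for c in children) + ")" + label


# Hypothetical ladder: only the (D,E) node is well supported.
TOY_TREE: Node = (["A", (["B", (["C", (["D", "E"], 95.0)], 40.0)], 35.0)], None)
print(to_newick(TOY_TREE) + ";")            # (A,(B,(C,(D,E)95)40)35);
print(to_newick(collapse(TOY_TREE)) + ";")  # (A,B,C,(D,E)95);
```

Collapsing does not add information; it only removes resolution that the bootstrap does not support, which is why several groups end up branching from a single stem.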


SURF1
Surfeit locus protein 1 (previously termed SURF-1 [74], but frequently called Surf1 in bacteria [30,75-77]) is a membrane protein that in humans is encoded by the SURF1 gene, which is defective in Leigh syndrome [74]. Studies in Paracoccus have shown that the isolated SURF1 protein binds heme A and may be involved in the insertion of this heme into the oxygen-reacting center of COX [30,76]. SURF1 (Surf1) proteins contain the conserved domain [50] of the SURF1 superfamily, cd06662. The term SURF1 is used herein to define the proteins having this domain, which corresponds to the Pfam family PF02104 (http://pfam.xfam.org/family/PF02104, last accessed on 22 Feb 2020). This website lists an overwhelming presence in Actinobacteria and in Proteobacteria of the alpha, beta and gamma classes, as well as in 24 Chloroflexi and 3 Gemmatimonadetes. However, the real distribution of SURF1 proteins in Chloroflexi is much wider, including at least 75 taxa according to our most recent BLAST searches. SURF1 homologs are also present in bacterial groups that were previously considered to lack these proteins, for instance Deltaproteobacteria and Acidimicrobiales (Supplementary Table S5b). Remarkably, this distribution is very different from, and nearly non-overlapping with, that of DUF420 proteins (Supplementary Table S5a): only the genome of Gemmatimonadetes bacterium isolate AG12 was found to contain genes for the complete versions of both SURF1 and DUF420. This genomic evidence of mutual exclusion suggests functional redundancy, supporting the possibility that both SURF1 and DUF420 may contribute to the insertion of heme A into COX1 [30,61]. On the other hand, a different kind of heme A insertase has been reported to be involved in the assembly of the ba3 oxidase (family B) of Thermus [78].
We found no homolog of this protein (accession: WP_011173205) outside the Deinococcus-Thermus phylum, confirming previous genomic searches [78]. Because the same phylum also contains DUF420 proteins (Supplementary Table S5a), it is possible that yet unrecognized membrane proteins are involved in the insertion of heme A in other bacterial lineages that have either DUF420 or SURF1 proteins. Intriguingly, Alphaproteobacteria of the Magnetospirillum clade [51] do not have SURF1 proteins, but do have DUF420 proteins associated with B family oxidases [46] (Supplementary Table S5).

We found proteins that have the same membrane topology as SURF1 but are about 70 residues shorter in their extracytoplasmic domain; these proteins are most commonly present in the genomes of Chloroflexi of the Ardenticatena and Caldilinea lineages, in which they are generally associated with COX gene clusters. We have provisionally called these proteins pre-SURF, since they align well with the transmembrane regions of SURF1 proteins. In phylogenetic trees, they usually formed a robust root when the SURF similar proteins of Acidithiobacillus spp. and Acidiferrobacter spp. were not included.

We report here the discovery of homologs of SURF proteins in the genomes of Acidithiobacillus spp. and Acidiferrobacter spp., which have characteristic residues in their extracellular domain that may function as potential ligands for Cu (Fig. 5a). Notably, the C-terminal region of the caa3_CtaG proteins of the same acidophilic iron-oxidizing taxa also shows potential Cu-binding residues that may compensate for the absence of conserved His residues in the central part of the protein, and may additionally contribute to Cu delivery to the oxygen-reacting center of COX1. The sequences of SURF-related proteins from Acidithiobacillus spp. and Acidiferrobacter spp. diverge substantially from those of previously known SURF1 proteins, but still share the same conserved domain of the SURF1 superfamily; hence we called these proteins 'SURF similar'. They invariably form the basal branch in phylogenetic trees of SURF1 proteins, while the homologous proteins encoded by the CyoE gene ending the operon of cytochrome bo3 ubiquinol oxidases [30] always occupy the latest diverging branch (Fig. 5b and Supplementary Fig. S8a). The addition of SURF proteins from Chloroflexi and other phyla that are not represented in the tree of Fig. 5b (cf. Supplementary Table S5b) does not modify the overall topology of the phylogenetic trees (Supplementary Fig. S8a). Intriguingly, the additional proteins branch off in a comb-like pattern from a stem that is shared with the Proteobacteria and Actinobacteria groups (Supplementary Fig. S8a). This pattern resembles that shown by other COX proteins (Fig. 7), suggesting a sudden, crown-like evolution also for SURF1 proteins after their separation from the SURF similar proteins of acidophilic iron-oxidizers. Given these features, a statistical analysis of tree topology for SURF1 proteins was deemed unnecessary.
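The mutual-exclusion argument in this section amounts to a co-occurrence count across genomes: how many genomes carry SURF1, DUF420, both, or neither. A minimal Python sketch, assuming that presence/absence calls have already been made for each genome; the genome names and their contents are hypothetical placeholders, not the data of Supplementary Tables S5a,b.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical presence/absence calls per genome (placeholders only).
GENOMES: Dict[str, Set[str]] = {
    "genome_1": {"SURF1"},
    "genome_2": {"DUF420"},
    "genome_3": {"SURF1", "DUF420"},
    "genome_4": {"DUF420"},
    "genome_5": set(),
}


def co_occurrence(genomes: Dict[str, Set[str]], fam_a: str, fam_b: str
                  ) -> Tuple[Dict[Tuple[bool, bool], int], List[str]]:
    """Return a 2x2 contingency table keyed by (has_a, has_b) and the genomes with both."""
    table = {(a, b): 0 for a in (True, False) for b in (True, False)}
    both: List[str] = []
    for name, families in genomes.items():
        has_a, has_b = fam_a in families, fam_b in families
        table[(has_a, has_b)] += 1
        if has_a and has_b:
            both.append(name)
    return table, sorted(both)


table, both = co_occurrence(GENOMES, "SURF1", "DUF420")
print(table[(True, True)], both)  # 1 ['genome_3']
```

The resulting 2x2 table could be passed to a formal test, for example Fisher's exact test, which the text above does not report.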


Supplementary References – following the numeration in the main text

[66] Tully, B.J., Graham, E.D., and Heidelberg, J.F. (2018) The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci Data 5, 170203.
[67] Lundgren, C.A.K., Sjöstrand, D., Biner, O., Bennett, M., Rudling, A., Johansson, A.L., Brzezinski, P., Carlsson, J., von Ballmoos, C., and Högbom, M. (2018) Scavenging of superoxide by a membrane-bound superoxide oxidase. Nat Chem Biol 14, 788-793.
[68] Kunst, F., Ogasawara, N., Moszer, I., et al. (1997) The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390, 249-256.
[69] Morosov, X., Davoudi, C.F., Baumgart, M., Brocker, M., and Bott, M. (2018) The copper-deprivation stimulon of Corynebacterium glutamicum comprises proteins for biogenesis of the actinobacterial cytochrome bc1-aa3 supercomplex. J Biol Chem 293, 15628-15640.
[70] Hvorup, R.N., Winnen, B., Chang, A.B., Jiang, Y., Zhou, X.F., and Saier, M.H. Jr. (2003) The multidrug/oligosaccharidyl-lipid/polysaccharide (MOP) exporter superfamily. Eur J Biochem 270, 799-813.
[71] Muñoz-Gómez, S.A., Hess, S., Burger, G., Lang, B.F., Susko, E., Slamovits, C.H., and Roger, A.J. (2019) An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins. eLife 8, e42535.
[72] Smith, W.A., Oakeson, K.F., Johnson, K.P., Reed, D.L., Carter, T., Smith, K.L., Koga, R., Fukatsu, T., Clayton, D.H., and Dale, C. (2013) Phylogenetic analysis of symbionts in feather-feeding lice of the genus Columbicola: evidence for repeated symbiont replacements. BMC Evol Biol 13, 109.
[73] Hernandez-Lopez, A. (2013) Of Trees and Bushes: Phylogenetic Networks as Tools to Detect, Visualize and Model Reticulate Evolution. In: Evolutionary Biology: Exobiology and Evolutionary Mechanisms, pp. 145-164, Springer, Berlin, Heidelberg.
[74] Zhu, Z., Yao, J., Johns, T., Fu, K., De Bie, I., Macmillan, C., Cuthbert, A.P., Newbold, R.F., Wang, J., Chevrette, M., Brown, G.K., Brown, R.M., and Shoubridge, E.A. (1998) SURF1, encoding a factor involved in the biogenesis of cytochrome c oxidase, is mutated in Leigh syndrome. Nat Genet 20, 337-343.
[75] Poyau, A., Buchet, K., and Godinot, C. (1999) Sequence conservation from human to prokaryotes of Surf1, a protein involved in cytochrome c oxidase assembly, deficient in Leigh syndrome. FEBS Lett 462, 416-420.
[76] Hannappel, A., Bundschuh, F.A., and Ludwig, B. (2011) Characterization of heme-binding properties of Paracoccus denitrificans Surf1 proteins. FEBS J 278, 1769-1778.
[77] Davoudi, C.F., Ramp, P., Baumgart, M., and Bott, M. (2019) Identification of Surf1 as an assembly factor of the cytochrome bc1-aa3 supercomplex of Actinobacteria. Biochim Biophys Acta 1860, 148033.
[78] Werner, C., Richter, O.M., and Ludwig, B. (2010) A novel heme a insertion factor gene cotranscribes with the Thermus thermophilus cytochrome ba3 oxidase locus. J Bacteriol 192, 4712-4719.
[79] Puustinen, A., and Wikström, M. (1991) The heme groups of cytochrome o from Escherichia coli. Proc Natl Acad Sci USA 88, 6122-6126.
[80] Clark, I.C., Melnyk, R.A., Engelbrektson, A., and Coates, J.D. (2013) Structure and evolution of chlorate reduction composite transposons. mBio 4, e00379-13.


Figure S1. a. Schematic model for the transmembrane structure of the new type of CtaA protein without Cys pairs that we found in several bacteria and called type 1.5, because it derives from a secondary loss of a Cys pair from type 1.1 proteins, as indicated by phylogenetic analysis (Fig. 3 and Table 1). The protein used as a reference in the model is WP_012463762 of Methylacidiphilum infernorum. b. Neighbor Joining (NJ) tree obtained with the BLAST routine from a BLASTP search with the CtaA sequence of Acidiferrobacter sp. SP_III as a query against the whole NCBI nr database, extended to 250 hits. Note that extension to larger numbers of hits did not retrieve any protein with the recognized conserved domain of the COX15-CtaA superfamily. The blue symbol indicates known Fe2+-oxidizing taxa. The outgroup protein is from a different type of CtaA. LGT? stands for likely episodes of lateral gene transfer of CtaA genes.

[Figure S1 image. a: Schematic model of a new type of CtaA without Cys pairs (type 1.5, Methylacidiphilum). b: NJ tree from BLAST of Acidiferrobacter CtaA against nr database (scale bar 0.2).]


Figure S2. a. ML tree of 66 CtaA proteins as in Fig. 3a, but with labelled taxa. The tree is representative of several other ML trees obtained with different conditions and sets of proteins, as presented in Supplementary Table S2.

[Figure S2a image: ML tree of 66 labelled CtaA proteins, with clades marked type 2; type 1.1 & 1.3 Aquificae; type 1.1 Firmicutes & Chloroflexi; type 1.1 alphaproteobacteria; type 1.1 CtaAB fused; type 1.5; type 1.0 CtaAB fused; type 0 ancestral; DUF420 proteins.]

11

Figure S2. b. Bayesian tree obtained from the same alignment of 66 CtaA proteins as in part a. The bootstrap percentage values are shown only for the major nodes.

[Tree image, Figure S2b: clades labelled type 1 Firmicutes & Chloroflexi, Aquificae, type 2, type 1.1 CtaAB fused, type 1.5, type 0 ancestral and DUF420; scale bar 0.7.]

Figure S3. a. Extended ML tree of 140 CtaA proteins obtained with the program FastTree and 1000 replicates. The alignment included the fast-evolving sequence of Schlesneria (Planctomycetes; thick blue arrow), which was then removed. All other sequences and taxa are as shown in part b. Note the different position of the two short archaeal type 1.4 proteins in the tree (thin arrows). See Table 1 for our expanded classification of CtaA types and subtype variants. Bootstrap values are shown only for the major nodes.

[Tree image, Figure S3a: clades labelled type 2, type 1.1 Proteobacteria, type 1.4 Archaea, type 1.1 fused CtaAB, type 1.5, type 1.0 Alicyclobacillus, type 0 and DUF420 proteins; scale bar 1.0.]

Figure S3. b. ML tree of the same 140 CtaA proteins obtained with the MEGA program. The proteins and taxa are the same as in part a, except that another type 1.4 CtaA sequence, from Aeropyrum, was added in place of the Schlesneria protein.

[Tree image, Figure S3b: leaf labels give accession number, CtaA type and taxon; clades labelled type 2, type 1.0 Thermus & other taxa, type 1.1 & 1.3 Aquificae, type 1 Gemmatimonadetes, type 1.3 Acidimicrobia, type 1.1 Proteobacteria, type 1.1 Cyanobacteria, type 1.1 deltaproteobacteria, type 1.1 alphaproteobacteria, type 1.1 Chloroflexi, type 1.1 Firmicutes, type 1.1 Geobacter, type 1.1 fused CtaAB, type 1.5 various taxa, type 1.0 Alicyclobacillus, type 0 ancestral and DUF420 proteins.]

Figure S4. Phylogenetic trees of type 1 CtaA in Alphaproteobacteria and Andalucia mitochondria. a. NJ tree derived from a BLAST search (focused on 100 hits) of type 1 CtaA OUV28671 of Alphaproteobacteria bacterium TMED109 [66] against the whole nr database (accessed on 19 Feb 2020). b. ML tree obtained with the JTT + F model and 1000 bootstrap replicates, using a manually curated alignment of 25 CtaA sequences from unclassified Alphaproteobacteria and the single type 1 CtaA that is found in eukaryotes [22].
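Panel b relies on nonparametric bootstrapping (1000 replicates). As a minimal Python sketch of what one replicate is, assuming a toy alignment stored as a dict of equal-length strings: each replicate resamples alignment columns with replacement, a tree is then inferred from every replicate by the tree-building program (not reproduced here), and the support of a clade is the percentage of replicate trees that contain it. The function names and the toy data are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set

Alignment = Dict[str, str]  # sequence name -> aligned sequence (with '-' gaps)


def bootstrap_alignment(alignment: Alignment, rng: random.Random) -> Alignment:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # n_cols column indices drawn with replacement: some columns repeat, others drop out.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Alignment, n_reps: int = 1000, seed: int = 0) -> List[Alignment]:
    """Generate n_reps pseudo-replicates with a reproducible random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]


def clade_support(clade: FrozenSet[str], replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    toy = {"seqA": "MKV-LL", "seqB": "MKVALL", "seqC": "MRV-IL"}
    reps = bootstrap_replicates(toy, n_reps=1000)
    print(len(reps), reps[0])
```

Resampling whole columns keeps the residues of each site together, which is why a replicate can only contain columns that already exist in the original alignment.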

[Tree images, Figure S4. Panel a (NJ tree from BLAST of type 1 CtaA of alphaproteobacteria): groups labelled Deltaproteobacteria MAG outgroup, Magnetococcales MAG, Chloroflexi, Andalucia mitochondria, marine alphaproteobacteria MAG, MarineAlpha_9 & Pelagibacteraceae MAG, Pelagibacterales MAG, Bacilli, Rhodospirillaceae and MarineAlpha_9 MAG, Tistrella & Geminicoccaceae; scale bar 0.2. Panel b (ML tree of type 1 CtaA of alphaproteobacteria and Andalucia): groups labelled Andalucia mitochondria, marine alpha MAGs, MarineAlpha9 MAGs, Geminicoccaceae & Tistrella, Magnetococcales, type 2 Reyranella and type 1.0 non functional.]

Figure S5. ML tree of 70 sequences with the DUF420 family domain. The phylogenetic tree was obtained with the ML approach (100 bootstraps) and the Dayhoff model; it was rooted on a group of distantly related proteins, provisionally labelled ‘Precursors’ in the tree, which aligned easily with the DUF420 proteins, partly because they share four transmembrane (TM) segments. Such proteins are present in the genomes of unclassified Gemmatimonadetes, for instance PYP58912 of Gemmatimonadetes bacterium isolate AG20, a soil metagenome-assembled genome [59]. The tree topology indicates two major clades of DUF420 proteins, marked by different symbols at their central nodes. Clade 1 includes S. aureus CtaM [61], while clade 2 includes B. subtilis YozB and therefore contains the boxed group labelled Bacilli 2. The basal branch of the DUF420 phylogeny is labelled ‘Nitrospirae’ because it includes comparatively longer proteins from Leptospirillum and various classified and unclassified taxa of the Nitrospirae phylum. Further details on the two clades of DUF420 proteins are given beside the tree.

[Tree image, Figure S5: Maximum Likelihood tree of 70 sequences of DUF420 and related proteins; groups labelled Bacilli 2, CFB, Rhodospirillaceae, Gemmatimonadetes, Entotheonella, Thermus-Deinococcus & Aquificae (Clade 2), Firmicutes & Bacilli 1 (Clade 1), Nitrospirae and Precursors; scale bar 0.5.]
Note beside Clade 2: This group of Bacilli proteins includes YozB of B. subtilis (which is not associated with COX genes, hence isolated) and ORF1 of B. firmus, which ends a CTA-F operon.
Note beside Clade 1: This group of Firmicutes proteins includes genomically isolated DUF420 proteins from Alicyclobacillaceae, DUF420 proteins that end a CTA-G operon as in Virgibacillus, and CtaM in a genomic unit with CtaAB that is typical of Staphylococcus and some Streptococci. These proteins are shorter than YozB.

Figure S6. a. Representative ML tree of caa3_CtaG rooted with MATE proteins (cf. Fig. 4b), with labelled proteins and taxa. This tree was based on a refined alignment of 42 sequences.

[Tree image, Figure S6a: leaf labels give accession number, protein and taxon; groups labelled Proteobacteria (most), Gemmatimonadetes & Alicyclobacillus, Actinobacteria not fused, Rhodovibrio group, Chloroflexi, bifunctional CtaG, Acidimicrobia & Bacilli, acidophilic Fe-oxidizers and MATE outgroup.]

Figure S6. b. Bayesian tree of the same alignment of caa3_CtaG rooted with MATE proteins as that used for Fig. 4b and in part a. The various proteins are identified by their accession numbers and are the same as in a. Note that major branches separate from a common stem in a comb-like fashion, as in condensed ML trees of caa3_CtaG (cf. Fig. 7). This stem is in sister position to the basal branch of the proteins from acidophilic Fe-oxidizers. Bootstrap values are shown only for major nodes, as in Fig. S2b.

[Tree image, Figure S6b: groups labelled Proteobacteria, Bacillus, Rhodovibrio, Actinobacteria, acidophilic Fe-oxidizers and outgroup (MATE); scale bar 0.6.]

Figure S7. Alignment block covering the C-terminal region of 24 sequences of caa3_CtaG proteins. Residues in black over white at the bottom of the alignment are conserved potential ligands for Cu, following the numbering of B. subtilis. Residues highlighted in light blue at the top of the alignment are additional potential Cu ligands in the distant caa3_CtaG proteins of acidophilic iron oxidizers from Proteobacteria (see Fig. 4). The Cys residues highlighted in yellow indicate other potential Cu ligands that are specific to the proteins of the Rhodovibrio group and may compensate for the absence of M257 in their sequences. A gap common to many sequences has been deleted at the position highlighted in dark blue.

WP_012535968 Acidithiobacillus ferrooxidans QKALYV-WIMEMAMMA------MGGIWFWSAM-SMNPA--QSSHMLWGLTPLLDEHLAGMMMTFLSLPTMCLVTWHF
CDQ11041 Acidithiobacillus ferrivorans QKAGYV-FVTEVIMMG------MGGMWFWSS-TSTNPM--GSSHILWGMTPLSDQRSAGIAMMALSLPTMCLVSWHF
WP_110137319 Acidiferrobacter sp. SPIII_3 RKGLYV-WAMEFAMMV------MGTMWFWSAMKSMDPS--TGTPLVWGVSRVTDVHIAGAVMTGLSLPTMCLVSWHF
WP_083995572 Acidiferrobacter thiooxydans RKGLYV-WAMEFAMMV------MGTMWFWSAMKSMNPS--TGTPLLWGMSRVTDVHIAGAVMTGLSLPTMCLVSWHF
KJE78182 Ferrimicrobium acidiphilum HHSFYPIFATNMHLIGVSEADQQLAGGISKLIDFGILWTVAVAIMVRAQKQEDLGADPEPITWLDVEREFKRTHKNTG
KJF18403 Acidithrix ferrooxidans RHSMYNPYSHEAKKIGISPADQQLAGGTVKIISIIVFWSIAAIILARAAKDEESGSSGPSFTWDDVEREFQRTAPIDQ
PYO42444 Gemmatimonadetes bacterium AG9 GTVLYPFYATAPRVWGLTPVDQQLGGLLMWVVGTMYLWVAGGVVWFRWSAREEAGDVEREVPLEAYGSAEK------
OLD86242 Gemmatimonadetes bacterium 13_1_20CM_ DSVLYPFYATAPRVGGLSPDDQQIGGLLMWVLGGLMLWIVMTVIWFRWSFWDARGDAERAVPLDAYGVGGSGLGTGRN
WP_054969061 Alicyclobacillus ferrooxydans NHPWYSFYVSAPRAAWLTPADMQLGAIIMMVFMAGAYLAYGIRAYAKQDESIWYQ------
OFW75128 Alicyclobacillus sp. RIFOXYA1_FULL_53 NRPWYAHYVDAPRFPNLSAGDMQLGAITMMIFMATRYAIVGIQQYMKQDESVWYE------
WP_013172458 Bacillus selenitireducens GDAIYATYTN-PAVWATAYHDQQFGGAMMKVIQELAYGVAMGYTFRIWMKRDREETPKLAIQEFDMYEAETPKA----
WP_035328688 Bacillus firmus DTPMS-TYSD-PNAWMSLIHDQQLGGVIMKIIQEIVYGFILAQIFYAWYKKEQEITPSEETILNPHLVK------
PWB48853 Dehalococcoidia bacterium FeB_14 SQVMFQQLQETPRFWPAPLLDQQIAGAIMFIVGEMLGLIATIAAAAAWARADEREAKRQDAKRARARAGT------
WP_098503125 Thermoflexus hugenholtzii SRVLYEELAAQPRLWPDPLLDQQIGGAIMFLAGEAIGLVAVIAAAAAWARADEREARRADARMAREKARASAP-----
WP_082828323 Tistrella mobilis LIA-FAGRPLYIHGLAAQLADQQAGGVIMLLGGGLAYLIGGVALMARLFRDSQGEGTMP------
WP_081728717 Rhodovibrio salinarum PMDLFPYYAFCGRLYSIGPTDQQYGGLIIWLPPAMMSVIALLVVLAN-LRRAEAAGR------
SMF15978 Tistlia consotensis DHDLYPFYAWCGRFFPSIATDQKFGGLIIWIPAAMMSVVALILVINALRRTEAGRGGEDRGEEVVSAAQWTG------
OJU11752 Alphaproteobacteria bacterium 64-11 SHDIYPVYNICGRVLMTALNDQHYGGLIIWLPGTLTSFAAMIVVLVTMRLNEEKAEREREAREGAV------
ONG53554 Roseomonas deserti ---LYARQTAQAALWLDPLQDQQLAGLVMWVPGGIAYAAAMLLCLAAWLRRAGRLADAARLG------
GAN75160 Acidiphilium multivorum AIU301 SVVLYDAYGRMAPFWISALGDERFGGLTMWIDGSTMIALGALIALYRIASHEDRTAGRRIALVGAKPVRGSEFHVGKR
WP_007424497 Acidiphilium sp. PM SVVLYDAYGRMAPFWISALNDERFGGLTMWIPGSMMIALGALIALYRIASHEDRVAGRRIALVGAPPVRNSQYHVGKR
WP_008973008 Bradyrhizobium sp. STM 3843 PRPLYPGHAAGVMKWLTLMEDQQLAGLIMWIPAGGAYVLAAALVFLSWLDEAEARALRTA-RRGAVVQARPQVQGTPG
WP_062364450 Variovorax paradoxus --PWYPAYGTATPALFDLLEDQRLGGLIMWVPAGLAYLVVALVAAARLLQGETVRAAPGAM-PPR------
WP_078758427 Lysobacter spongiicola RRPLYEVYAERAPALLDVLADQQLAGLVMWVPACLPYLVGGLWLMAAWMQRAQRRQDPAYAAPPPFEPRSPGPQA---
Bottom row, with residue numbers following B. subtilis: RRPLYEVYAERAPALLDVLAD249AGLVM257
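Positions such as M257 are given in B. subtilis numbering, whereas an alignment is indexed by column. A minimal Python sketch of the usual conversion, assuming the numbering sequence is one row of the alignment and that the residue number of its first non-gap character is known: residue numbers advance only on non-gap characters, so columns where the reference has a gap receive no number. The function names, the rows and the starting number are hypothetical toy values, not taken from the figure.

```python
from typing import Dict


def column_to_residue(aligned_ref: str, first_residue: int) -> Dict[int, int]:
    """Map 0-based alignment columns to residue numbers of the reference row.

    first_residue is the number of the first non-gap character in aligned_ref.
    Columns where the reference has a gap ('-') are left out of the mapping.
    """
    mapping: Dict[int, int] = {}
    number = first_residue - 1
    for col, aa in enumerate(aligned_ref):
        if aa != "-":
            number += 1
            mapping[col] = number
    return mapping


def residues_at(alignment: Dict[str, str], mapping: Dict[int, int], residue_number: int) -> Dict[str, str]:
    """Return the character of every aligned sequence at a reference residue position."""
    col_of = {number: col for col, number in mapping.items()}
    if residue_number not in col_of:
        raise KeyError(f"residue {residue_number} is not covered by the reference row")
    col = col_of[residue_number]
    return {name: seq[col] for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical toy rows: 'ref' stands in for the sequence that defines the numbering.
    toy = {"ref": "AD-QM", "seqX": "AEKCL"}
    mapping = column_to_residue(toy["ref"], first_residue=255)
    print(mapping)                          # {0: 255, 1: 256, 3: 257, 4: 258}
    print(residues_at(toy, mapping, 257))   # {'ref': 'Q', 'seqX': 'C'}
```

The toy shows how a Cys in another sequence can sit in the same column as a numbered reference residue, which is the kind of comparison the yellow and light-blue highlights summarize.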


Figure S8. a. ML tree of SURF1 and pre-SURF proteins from all the taxonomic groups that have the corresponding genes, for a total of 60 sequences (cf. Supplementary Table S5b). This tree is an expansion of that shown in Fig. 5b. b. Scheme indicating the known or suggested roles of COX assembly proteins, modified from Fig. 1c. Dashed blue arrows indicate transfer of Cu atoms (left part of the model), while dashed black arrows indicate transfer of hemes. Redox reactions are indicated by thin lines as in Fig. 1c. Note that heme B is taken from the cytoplasm to be modified into heme O, as originally reported in E. coli [79].

[Tree image, Figure S8a: leaf labels give accession number, protein and taxon; groups labelled CyoE (bo3 oxidases), Alphaproteobacteria, Actinobacteria, Chloroflexi & other phyla, pre-SURF Chloroflexi and SURF-similar proteins of iron oxidizers.]
[Scheme, Figure S8b, "Scheme with the accessory proteins for COX biogenesis": compartments labelled extracellular (pyrites, Fe2+), outer membrane (OM), periplasm, inner membrane (IM) and cytoplasm; components labelled Cyc2, Cyc1, rus, cup, SURF-1, TlpA, CtaA, CtaB, CtaG, COX1, COX2, CuA, CuB, hemes a, a3, o and b, and the K & D channels.]

Figure S9. a. NJ tree derived directly from the BLASTP search of COX1 of Acidithiobacillus ferrooxidans against the nr database, extended to 250 hits (cf. Fig. S1b). The deepest-branching groups and the tree topology of the iron oxidizers (boxed) did not change when the BLAST search was expanded to 1000 hits. b. ML tree rendered with the MEGA5 program (500 bootstraps) using a manually curated alignment of the significant hits retrieved from a BLASTP search of COX3 from Acidiferrobacter sp. SP_III against the nr database, as in a. c. ML tree rendered with the MEGA5 program (500 bootstraps, as in b) from a BLASTP search of COX4 of Acidiferrobacter sp. SP_III against the nr database; spurious hits and irrelevant proteins retrieved in the search were removed from the alignment used to produce the tree. In all panels, the clade of proteobacterial Fe2+-oxidizers is enclosed in a blue box.
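Panel a is a neighbor-joining (NJ) tree built from BLASTP hits. The NJ algorithm of Saitou and Nei can be sketched in Python as below, assuming a precomputed symmetric distance matrix; how the BLAST tree view derives its distances is not reproduced, and the five-taxon matrix is the standard textbook example rather than data from this study. At each step the pair minimizing Q(i, j) = (m - 2) d(i, j) - r_i - r_j (with r_i the row sum of node i) is joined, branch lengths to the new node are computed, and distances from the new node to the remaining nodes are updated. The function name is hypothetical.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted neighbor-joining tree in Newick format.

    names: taxon labels (at least 3); dist: symmetric matrix with zero diagonal.
    """
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]  # working copy, shrinks as nodes are joined
    while len(nodes) > 3:
        m = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Pick the pair (i, j) minimizing Q(i, j) = (m - 2) d(i, j) - r_i - r_j.
        best_q, bi, bj = None, -1, -1
        for i in range(m):
            for j in range(i + 1, m):
                q = (m - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the new internal node to bi and bj.
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (m - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.3f},{nodes[bj]}:{lj:.3f})"
        keep = [k for k in range(m) if k not in (bi, bj)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: attach them to a single central node.
    d01, d02, d12 = d[0][1], d[0][2], d[1][2]
    l0 = (d01 + d02 - d12) / 2
    l1 = (d01 + d12 - d02) / 2
    l2 = (d02 + d12 - d01) / 2
    return f"({nodes[0]}:{l0:.3f},{nodes[1]}:{l1:.3f},{nodes[2]}:{l2:.3f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    D = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, D))
    # (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

The result is unrooted; trees such as those in this supplement are then rooted on an outgroup (for example family B paralogues or MATE proteins) for display.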

[Tree images, Figure S9: a, NJ tree of COX1 from BLASTP against whole NR; b, ML tree of COX3 from BLASTP against whole NR; c, ML tree of COX4 from BLASTP against whole NR. Groups labelled Actinobacteria MAGs, Chloroflexi, Actinobacteria & Chloroflexi, A2 type Cyanobacteria, Thioclava, Acidiferrobacter, Acidithiobacillus and Acidihalobacter; scale bars 0.5 and 0.1.]

Figure S10. Amino acid residues known to form the D- and K-channels in P. denitrificans COX1 [2, 54], compared with those in the taxa indicated. Substitutions are highlighted in gray, or in red when they are non-conservative.

Taxon | COX1 type* | D-channel: E278 N113 D124 N131 S134 S193 N199 | K-channel: Y280 T351 K354 S357 S291
Paracoccus denitrificans | A1 sub. B | E N D N S S N | Y T K S S
Acidithiobacillus thiooxidans | A1 sub. bo3 | E N D N G T N | Y T K N S
Gemmatimonadetes sp. | A1 sub. a-III | E N D N G T N | Y S K N S
Acidithiobacillus ferrivorans | A2 | E Y N L S S N | Y T I S L
Acidithiobacillus ferrooxidans | A2 | E Y N L S S N | Y T I S L
Acidiferrobacter sp. SPIII_3 | A2 | E Y T L S T N | Y T V A L
Acidihalobacter prosperus | A2 | E N Q M S T N | Y T F A L
Gemmatimonadetes sp. AG38 | A1 | E N D N G T N | Y A I L S
Acidobacteria sp. Gp1 AA139 | A2 | E N D N G T N | Y S L I S
Acidimicrobium ferrooxidans | A2 | E N K E S M N | Y T L V M
Acidithrix ferrooxidans | A2 | E N K E S I N | Y T I V S
Alicyclobacillus ferrooxydans | A2 | E N S S S S N | Y T K T T
Thermoplasmata (Archaea)
Acidiplasma aeolicum | B? fused | E I D N G T E | Y T L G F
Ferroplasma sp. Type II | B? fused | E I D N G T E | Y T L G F
Thermoprotei (Archaea)
Sulfolobus metallicus FoxA | B | L F L A G E | Y F L L I
Acidianus ambivalens DoxB | B | I I Q K L T W | Y S T N Y
Sulfolobus acidocaldarius SoxB | B | V G no K A A A | Y S T N Y
Thermus thermophilus | B | I V no M S V L | Y S T T Y
* Checked in www.evocell.org/HCO
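The gray/red distinction in this table depends on a criterion for conservative substitutions that the caption does not spell out. A minimal Python sketch of one way to apply such a criterion, assuming a simple physicochemical grouping of amino acids (a hypothetical choice; the figure may use a different scheme): each residue is compared with P. denitrificans, identical residues stay unmarked, replacements within the same group count as conservative, and all others, including absent residues ("no"), as non-conservative. The residue strings are copied from the table; the function names and groups are hypothetical.

```python
from typing import Dict, List, Optional

# Column order of the table (P. denitrificans numbering).
POSITIONS = ["E278", "N113", "D124", "N131", "S134", "S193", "N199",
             "Y280", "T351", "K354", "S357", "S291"]

# Hypothetical physicochemical groups used to call a substitution conservative.
GROUPS = [set("AVLIM"), set("FWY"), set("STNQ"), set("DE"), set("KRH"),
          {"G"}, {"P"}, {"C"}]


def group_of(residue: str) -> Optional[int]:
    """Index of the group containing a one-letter residue, or None (e.g. 'no')."""
    for idx, group in enumerate(GROUPS):
        if residue in group:
            return idx
    return None


def classify(reference: List[str], query: List[str]) -> Dict[str, str]:
    """Label each position as identical, conservative or non-conservative."""
    if len(reference) != len(POSITIONS) or len(query) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} residues per taxon")
    labels: Dict[str, str] = {}
    for pos, ref_aa, q_aa in zip(POSITIONS, reference, query):
        if q_aa == ref_aa:
            labels[pos] = "identical"
        elif group_of(ref_aa) is not None and group_of(ref_aa) == group_of(q_aa):
            labels[pos] = "conservative"
        else:
            labels[pos] = "non-conservative"
    return labels


if __name__ == "__main__":
    p_denitrificans = "E N D N S S N Y T K S S".split()
    a_ferrooxidans = "E Y N L S S N Y T I S L".split()
    for pos, label in classify(p_denitrificans, a_ferrooxidans).items():
        if label != "identical":
            print(pos, label)
```

Under this grouping, S193T or T351S count as conservative while S134G does not; a different grouping would shift some cells between gray and red, so the sketch illustrates the procedure rather than reproducing the figure's colouring.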

Figure S11. a. Linearized ML tree of 115 COX1 sequences from bacteria, including many lacking the K-channel (cf. Fig. 6b and Supplementary Fig. S9). Proteobacterial Fe2+-oxidizers are in marine blue. The tree was graphically rendered with the linearized option of the MEGA5 program, without a strict cut-off, to present it compactly, and was rooted using family B paralogues.

[Figure panel a: Linearized ML tree of various bacterial COX1]

Figure S11. b. ML tree (100 bootstrap replicates) of 120 COX1 proteins, including those of acidophilic Fe2+-oxidizers from Archaea. The tree was graphically rendered with the linearized option, without a strict cut-off, for compact presentation.

[Figure panel b: ML linearized tree of COX1 proteins including those of Archaea]

Figure S12. Alignment of sequences of the C-terminal domain of COX2 that binds the CuA centre.
Cox2 alignment (CuA ligands) | taxon
DVMHDFWVPAWGEKKDVIPNEVRHLFITPTALGSTATNPMLRVQCAMICGNGHPLMRAPVKVVTAAKFKTW | COX2 Acidithiobacillus sp. Milos
DVMHDFWVPAWGEKKDVIPNEVRHLFITPTMLGTTATNPMLRVQCSLICGNGHPLMRAPVKVVTPADFKTW | COX2 Acidithiobacillus ferrooxidans 1
DVMHDFWVPAWGEKKDVIPNEVRHLFITPTMLGTTATNPMLRVQCSLICGNGHPLMRAPVKVVTPADFKTW | COX2 Acidithiobacillus ferrooxidans 2
DVMHDFWVPAWGEKKDVIPNEVRHLFITPTMLGTTATNPMLRVQCSLICGNGHPLMRAPVKVVTPADFKAW | COX2 Acidithiobacillus multispecies
DVVHDFWVPAWGEKKDVIPNEVRHLFITPTVLGTTATNPMIRVQCTMICGNGHPLMRAPVKVVTAADFKKW | COX2 Acidithiobacillus ferrooxidans 3
DVMHDFWVPAWGEKKDVIPNEVRHLFITPTMLGTTATNPMLRVQCSLICGNGHPLMRAPVKVVTPADFKAW | COX2 Acidithiobacillus ferridurans
DVMHDFWVPAWGVKKDVIPNEIRHLYVTPTVLGSTKTNPMIRVQCSLICGNGHPMMRAPVEVLTKAAFKTW | COX2 Acidithiobacillus thiooxidans
DVMHDFWVPAWGVKKDVIPNEIRHLYVTPTVLGSTKTNPMIRVQCSLICGNGHPMMRAPVEVLTKAAFKTW | COX2 Acidiferrobacter sp. SPIII_3
DVFHSFWVPAFGFKLTAIPGENRVMYATPIRLGTQTGDPLLRVQCSWDCGMGHPVMRFPANVVTWKNFKSW | COX2 Acidiferrobacter ferroxydans
DVMHSFWVPIWGIKKALVPGETRSIVITPTALVDTSQNPLARVQCSWDCGLGHAQMRAVVKVVTDKDFKAW | COX2 Acidihalobacter prosperus
DVMHSFWAPAWGIKKAVIPGETRDLVVTPTKIMDTLSDPTMRIQCAQICGAGHPVMRSELRVVSAADFDTW | COX2 Thioclava sp. DLFJ4-1
DVMHSFWVPAWGIKKAVIPGETRDLVVTPTKIMDTLSDPTMRIQCAQICGAGHPVMRSELRVVSAADFDTW | COX2 Thioclava sp. IC9
DVTHSFYVPAWGVKADIIPGITRDLYITPTQITSTAVNPMVRLQCAQLCGAGHAYMEANVEVVSPQAFAKW | COX2 Acidithiobacillus ferrooxidans 4
DVIHSFWVPSWGIKEDVIPGEARSIYITPTKITSFAQNPMSRVQCAEVCGPGHPWMEAPLNVVSSSQFAKW | COX2 Sulfobacillus thermotolerans
DVIHSFWIPAFGEKMDVIPGETRYMVATPTKIASTETNPEVRVQCAEVCGPGHPYMYATVHIVSDAAFKQW | COX2 Acidibacillus sulfuroxidans
DVVHSLWIPAFRMKIDAIPGRTTYMTVYPTELGAYNDDQAYRVQCAELCGLDHSKMWLPVRVVTESEFEAW | COX2 Thermoflexus hugenholtzii
DVVHSFWIYDYDIKEDAVPGVVNHAYFDARYTGSSTARGKNWVTCNELCGLWHGWMRSRLAVVSKPAFASW | COX2 Chloroflexi bacterium isolate RR
DVVHSFWIVQMGIKIDANPGEVTHIGVTPD------RRGTFAVRCAELCGIYHAYMQTQVRVVSDAAFRSW | COX2 Actinobacteria bacterium UBA8262
DVIHSFWVPQLSGKTDLIPNRTNHLWVDPF------EPGVYVGQCAEYCGTQHAQMLLRVVVHTPRGFESW | COX2-c Deltaproteobacteria bacterium NP36
DVIHSFWIPALSGKRDVMPNHTNFIWFTPDSAL---GAQVWNGHCAEYCGTSHANMKFRAYTVTAEQFDSW | COX2 Gemmatirosas kalamazoonesis
DILHSFYIPEFRVKQDMVPGSYTSVWFEAT------EARETVLLCTEYCGSGHSDMMATVKVLEPSDFEKW | COX2-c Sorangium cellulosum
DVLHDFYVPEFRAKMDMIPGVVTYYWFTPT------RAGTFEALCAELCSTGHSFMRGGVVVESESKRQAW | COX2 Bosea lathyri
DVLHDFYVPEFRAKMDMIPGMVTYFWFTPT------RTGTFEILCAELCGVGHPQMRGTVMIDEEVAYQAW | COX2 Rhizobium mongolense
DVLHSFAMPSFGVKMDAVPGRLNETWFRVE------KPGVYYGQCSEICGVRHGFMPIVIEARAEGDFENW | COX2 Tistrella mobilis
DVIHAFALPAFGVKIDAIPGRLNETWFKAT------KTGMFYGQCSELCGKDHAFMPIAIRVVEDQEFASW | COX2 Bradyrhizobium japonicum
DVIHSWTVPAFGVKQDAVPGRLAQLWFRAE------REGIFFGQCSELCGISHAYMPITVKVVSEEAYAAW | COX2 3D Rhodobacter sphaeroides
DVIHAWTIPAFAVKQDAVPGRIAQLWFSVD------QEGVYFGQCSELCGINHAYMPIVVKAVSQEKYEAW | COX2 3D Paracoccus denitrificans
DVIHGFHVEGTNINVEVLPGEVSTVRYTFR------PGEYRIICNQYCGLGHQNMFGKIVVKE------ | COX2 3D ba3 Thermus thermophilus
DVIHGFNIQGTNVNMMVIPGEVSKLTATFE------AGEYHFVCNEYCGVGHHQMFGTVIVEEE------ | COX2 ba3 Chloroflexi bacterium DOLJORAL_50_32
DVVHGVHIHGTNYNVMAIPGTVGYMRIKFK------PGVYHVVCHEFCGVGHHAMQGKIIVE------ | COX2 ba3 Aquifex aeolicus
SVMNSFFIPRLGSQIYAMAGMQTRLHLIAE------PGTYDGISASYSGPGFSGMKFKAIATPDRAFDQW | Sub. II bo3 quinol oxidase E. coli K-12

KGFSGMKFKAITPDMNTFNQWVAKAKQSPNAI-----NDMATYEKLAAP-SEYNKVEYFSAVKP-DLFKDV | Sub. II bo3 quinol oxidase Enterobacter sp. R1
AGTSWMLFKTKVVSMADFKQWAHGVENSPNTM-----DYAQFNRFAEPYINVKDKTPVYGHVEH-GLFNHL | Sub. II bo3 quinol oxidase Acidithiobacillus
AGTSWMTFKTRIVKSADFREWVRKVQQSPTSM-----TYASFNDYADPYINVHHKVAYFSSVPD-GLFDHV | Sub. II bo3 quinol oxidase A. ferrooxidans 1
AGTSWMTFKTRIVKSADFREWVRKVQQSPTSM-----TYASFNDYADPYINVHHKVAYFSSVPD-GLFDHV | Sub. II bo3 quinol oxidase A. ferrooxidans 2
AGTSWMTFKTKIVPQSEFTQWVQQVKASPKTM-----TEASFDRYAEPYINVHHRVVYFSKVPD-GLFDHV | Sub. II bo3 quinol oxidase A. caldus
AGTSWMTFKTKMVDKADFAQWVQKVQQAPGSM-----TYASFNRFAEPFINVNHHLVYFSSVEP-GLFDHV | Sub. II bo3 quinol oxidase Acidithiobacillus multi.
PGFSWMDFKVKAVPPAKFTAWIKKAGTSKNTL-----NYASFSRFAEPTVNIERKIHYFSDVKT-GLFAKV | Sub. II bo3 quinol oxidase Acidihalobacter prosperus
PGFSWMRFETHVVSDETFARWSEGMQTAEEHL-----DDASFLKFAAPTINTDNITRRFSGVQP-GMFDRV | Sub. II bo3 quinol oxidase Acetobacteraceae MAG
PGFSWMDYKVHVVKKAKFQGWVDKAQATGQAL-----SYKRFQQFAEPMVNTANKTYTFNHVDP-DLFETV | Sub. II bo3 quinol oxidase Salinisphaera halofila
AGFSWMGFKTHVVSSAHFTHWIQSIQKSPRHM-----DYAQFNKFARPTTNIDGKTPAFSHVHA-RLFDQV | Sub. II bo3 quinol oxidase Acidiferrobacter thio.
AGFSWMGFKTHVVSAARFTHWTQSIQKSPRHM-----DYAQFNKFARPTTNIDGKTYAFSHVHA-RLFDQV | Sub. II bo3 quinol oxidase Acidiferrobacter SPIII_3
EGFYTQNFKASAMNDEVFARWVKAAKARGLPL-----DNKARAALEEKSNKVELARALDQPTEQPILFSTA | Sub. II bo3 quinol oxidase Thioclava sp. IC9

Legend: Residues in white over a blue background are the conserved CuA ligands; the Glu/Gln residues highlighted in azure provide a ligand via their peptide carbonyl and are substituted only in the COX2 sequences of acidophilic iron oxidizers of Proteobacteria and in subunit II of the cytochrome bo3 quinol oxidases of Enterobacteriaceae. These and other substitutions of conserved ligands are highlighted in gray.
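The substitution at the carbonyl-ligand position can be located programmatically in the Figure S12 sequences. A minimal sketch, assuming the CuA site follows the spacing Cys-x-[Glu/Gln]-x-Cys-x-x-x-His-x-x-Met read off the P. denitrificans sequence (CSELCGINHAYM): the regular expression finds the first such motif and reports the residue at the Glu/Gln position. The motif definition is an assumption derived from this alignment, not a curated pattern, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumed CuA motif spacing: Cys - x - [Glu/Gln] - x - Cys - x - x - x - His - x - x - Met.
# Group 1 captures the residue at the Glu/Gln (backbone carbonyl ligand) position.
CUA_MOTIF = re.compile(r"C.(.).C...H..M")

# Sequences copied from the Figure S12 alignment (gaps shown as '-').
SEQUENCES = {
    "Paracoccus denitrificans": "DVIHAWTIPAFAVKQDAVPGRIAQLWFSVD------QEGVYFGQCSELCGINHAYMPIVVKAVSQEKYEAW",
    "Thermus thermophilus (ba3)": "DVIHGFHVEGTNINVEVLPGEVSTVRYTFR------PGEYRIICNQYCGLGHQNMFGKIVVKE------",
    "Acidithiobacillus sp. Milos": "DVMHDFWVPAWGEKKDVIPNEVRHLFITPTALGSTATNPMLRVQCAMICGNGHPLMRAPVKVVTAAKFKTW",
    "Acidiferrobacter ferroxydans": "DVFHSFWVPAFGFKLTAIPGENRVMYATPIRLGTQTGDPLLRVQCSWDCGMGHPVMRFPANVVTWKNFKSW",
}


def carbonyl_ligand_residue(aligned_seq: str) -> Optional[str]:
    """Residue at the Glu/Gln position of the first CuA motif, or None if absent."""
    # Strip alignment gaps so that the motif is matched on the raw sequence.
    match = CUA_MOTIF.search(aligned_seq.replace("-", ""))
    return match.group(1) if match else None


if __name__ == "__main__":
    for taxon, seq in SEQUENCES.items():
        residue = carbonyl_ligand_residue(seq)
        status = "Glu/Gln retained" if residue in ("E", "Q") else "substituted"
        print(f"{taxon}: {residue} ({status})")
```

Removing alignment gaps before matching keeps the pattern independent of the column layout; otherwise a motif interrupted by a gap in the alignment would be missed.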


Supplementary Tables

Supplementary Table S1. The table lists the accession numbers of COX subunits and their accessory proteins found in taxa that are predominantly present in soil metagenomes [59]. Symbols are the same as in Fig. 1c,d. White boxes indicate missing genes. Genes inserted within common clusters are indicated by small light-gray squares and contain the numeral 2 when present in a pair. ABC indicates genes for ATP-binding cassette transporters. Genes for proteins with known function that are not part of the rus operon are in light gray, while genes encoding partial proteins are in dark gray. CtaA genes, when present, encode type 1.1 proteins. Genes for di-haem c-type cytochromes are indicated with c4, as in Fig. 1d. Genes for chlorite dismutase are fused with a Rieske domain and colored in pale green. Similar genes are found at the end of chlorate-reduction operons that contain genes encoding DOMON domain proteins [80], resembling those present in some gene clusters shown here. The great majority of the genomes of the listed taxa were estimated to be over 90% complete. Other partial clusters resembling those listed in the table, and similar clusters found in genomes less than 90% complete, are not presented. The original table can be supplied as Supplemental Table S1.xls.

Top part. rus-like gene clusters

insert partial rus -like COX A1 type organisms with rus -like gene cluster cyt c -rich cluster prepended Cu protein COX operon a-III subtype upstream gene Cyc 2-like di-haem cyt c Cup-like COX2 COX1 COX3A COX3B COX4 other extra phylum Gemmatimonadetes PYO94987 DinB PYO94981 93 PYO9498 chloride PYO94986 456 aa PYO94985 PYO94984 236 aa PYO94983 PYO95014 PYO94982 PYO95013 Gemmatimonadetes bacterium isolate AG38 superfamily aa 3TM channel PYP18302 DinB, PYP18301 458 aa PYP18300 PYP18298 238 aa PYP18298 PYP18297 PYP18296 PYP18295 PYP18294 Gemmatimonadetes bacterium isolate AG30 partial Gemmatimonadetes bacterium OLD58396 OLD58389 469 aa OLD58390 OLD58391 241 aa OLD58397 OLD58392 OLD58393 OLD58394 OLD58395 13_1_20CM_69_28 peroxiredoxin HCU10810 BaeS Gemmatimonadetes bacterium isolate HCU10811 115 HCU10818 439 aa HCU10817 HCU10816 242 aa HCU10815 HCU10814 HCU10813 HCU10812 Signal aa 3TM UBA10902 transduction Gemmatimonadetes bacterium OLB49088 91 aa OLB49096 second OLB49089 469 aa OLB49090 OLB49091 226 aa OLB49131 OLB49092 OLB49093 OLB49094 OLB49095 13_2_20CM_2_65_7 unknown Cup PYP55606 DinB_2 PYP55598 162 aa PYP55605 456 aa PYP55604 PYP55603 238 aa PYP55602 PYP55601 PYP55600 PYP55611 PYP55599 Gemmatimonadetes bacterium isolate AG16 family 1 TM PYP52735 214 aa PYP52744 OsmC- PYP52736 454 aa PYP52737 PYP52738 240 aa PYP52739 PYP52740 PYP52741 PYP52742 PYP52743 Gemmatimonadetes bacterium isolate AG21 unknown like protein PYO40317 212 aa PYO40310, PYO40308 115 aa PYO40316 445 aa PYO4031 PYO40314 259 aa PYO40313 PYO40312 PYO40311 PYO40309 Gemmatimonadetes bacterium isolate AG9 unknown PYO44553 no TM split - PYP79782 & PYP79785 VanZ like PYP79775 PYP79784 437 a PYP79782 125 & PYP79781 240 aa PYP79780 PYP79779 PYP79778 PYP79777 PYP79776 Gemmatimonadetes bacterium isolate AG1 family 61 aa 2TM 134 aa cyt c split - PYP05636 & PYP05634 VanZ like PYP05643 104 PYP05644 61 PYP05635 440 aa PYP056367 cyt c PYP05638 240 aa PYP05639 PYP05640 PYP05641 PYP05642 Gemmatimonadetes bacterium 
isolate AG3 family aa 3TM aa 2TM 134 & 125 aa PYP03992 431 aa PYP03991 216 aa PYP04000 61 without Cyt c but PYP03993 PYP03994 240 aa PYP03995 PYP03996 PYP03997 PYP03998 PYP03999 Gemmatimonadetes bacterium isolate AG33 unknown aa 2 TM with VanZ domain PYO83617 82 aa PYO83609 211 aa PYO83616 PYO83615 247 aa PYO83614 PYO83613 PYO83612 PYO83611 PYO83610 Gemmatimonadetes bacterium isolate AG41 partial 1 TM Gemmatimonadetes bacterium OLD00390 DinB OLD00397 OLD00389 441 aa OLD00388 OLD00387 235 aa pseudo 13_1_40CM_3_65_8 superfamily Cox2-like phylum Acidobacteria WP_011521240 299 WP_011521241 WP_011521243 WP_01152124 WP_011521250 WP_083763647 WP_011521244 WP_011521245 WP_011521246 WP_011521247 Candidatus Koribacter versatilis Ellin345 aa unknown 459 aa 237 aa 8 102 aa unknown PYX22639 PYX22650 CtaB PYX22649 467 aa 2 PYX22646 PYX22645 234 aa PYX22644 PYX22643 PYX22642 PYX22641 PYX22640 Acidobacteriaisolate gp1 AA134 unknown PYT95791 274 aa PYT95782 151 aa PYT95790 491 aa PYT95789 PYT95788 243 aa PYT95787 PYT95786 PYT95785 PYT95784 PYT95783 YiaG transcription Acidobacteria isolate gp2 AA90 unknown 2TM AcrR transcriptional PYX08513 109 PYX08514 2TM PYX08508 453 aa PYX08509 PYX08510 267 aa PYX08511 PYX08588 PYX08512 PYX08589 CtaB Acidobacteria bacterium isolate gp1 AA140 regulator aa 88 aa PYX20155 158 aa PYX20147 104 PYX20154 453 aa PYX20153 PYX20152 248 aa PYX20151 PYX20150 PYX20149 PYX20148 PYX20146 101 aa CtaB Acidobacteria bacterium isolate gp1 AA139 unknown aa PYQ38989 409 aa PYQ38988 PYQ38987 247 aa PYQ38986 PYQ38985 PYQ38984 PYQ38983 PYQ38982 PYQ38981 CtaB Acidobacteria bacterium isolate AA151 no cyt c motif bacterial SH3 PYX29018 409 aa PYX29026 89 PYX29019 PYX29020 247 aa PYX29021 PYX29022 PYX29023 PYX29024 PYX29025 carboxypeptidase Acidobacteria bacterium isolate gp1 AA129 domain protein no cyt c motif aa 2 TM bacterial SH3 PYV84369 PYV84373 91 aa PYV84366 459 aa PYV84367 PYV84368 251 aa pseudo PYV84370 PYV84371 PYV84372 CtaB Acidobacteria bacterium isolate gp1 
AA145 domain protein COX2-like 2TM bacterial SH3 PYV73397 129 aa PYV73400 PYV73394 486 aa PYV73395 PYV73398 PYV73399 Acidobacteria isolate gp1 AA148 domain protein partial partial PYX48479 PYX48477 86 aa PYX48478 470 aa overlapping PYX48480 232 aa PYX48481 PYX48482 Acidobacteria isolate gp1 AA131 unknown haems PYV96256 68 aa unknown 139 aa PYV96263 PYV96262 248 aa PYV96261 PYV96260 PYV96259 PYV96258 PYV96257 CtaB Acidobacteria bacterium isolate gp1 AA141 2TM PYU72160 83 aa unknown 332 aa PYU72153 PYU72154 236 aa PYU72155 PYU72156 PYU72157 PYU72158 PYU72159 metallo-hydrolase Acidobacteria bacterium isolate gp2 AA101 2TM PYU03424 Acidobacteria isolate gp2 AA86 unknown 70 aa overlapping PYU03423 237 aa PYU03422 PYU03421 PYU03420 PYU03419 PYU03418 PYU03417 peptidase S41 haems PYX81345 245 aa methionine (S)-S- PYX81346 withType 2 PYX81344 PYX81349 PYX81343 PYX81342 PYX81341 PYX81340 Acidobacteria isolate gp1 AA122 oxide reductase periplasmic fold PYU41598 short-chain PYU41604 236 aa PYU41603 PYU41602 PYU41601 PYU41600 PYU41599 Acidobacteria bacterium isolate gp2 AA106 79 aa 2TM dehydrogenase Acidobacteria bacterium 13_1_20CM_3_58_11 OLE47617 236 aa OLE47616 OLE47615 OLE47614 OLE47619 OLE47613 OLE47612 79 aa OLE47611 Acidobacteria bacterium 13_2_20CM_58_27 OLB28508 241 aa OLB28509 OLB28510 OLB28523 OLB28524 OLB28511 OLB28512 OLB28513 94 aa Acidobacteria isolate gp1 AA126 PYX63460 partial PYX63459 237 aa PYX63458 PYX63457 PYP90952 92 PYP90951 62 aa glutaminase PYP90956 PYP90954 PYP90953 Acidobacteriia bacterium AA117 aa no TM no TM alkaline PYX98085 PYX98084 PYX98083 Acidobacteria isolate gp1 AA115 phosphatase PYX34039, partial PYX34041 PYX340410 Acidobacteriaisolate gp1 AA133 167 aa partial PYX92160 64 aa PYX92162 only peptidase PYX92161 Acidobacteria bacterium isolate gp1 AA123 partial domain PYT71522 83 aa PYT7152 PYT71524 only peptidase Acidobacteria bacterium isolate gp2 AA91 partial partial domain

Supplementary Table S1 (continued). Middle and bottom parts: other cyt c-rich COX gene clusters.

organisms with other COX cluster upstream gene bd- like cyt c -rich cluster prepended COX Cu protein COX operon A1 type subunit I di-haem-cyt c 4 Cyt c Cyt c DOMON Cup-like COX2 COX1 COX3 COX4 CtaB or other other phylum Gemmatimonadetes Gemmatimonadetes bacterium OLC44248 cytc x2 OLC4425362 aa OLB48906 chlorite OLC44245 OLC44246 OLC44247 OLC44249 OLC44250 OLC44251 OLC44252 OLC44253 CtaB 13_1_40CM_4_65_7 DOMON 2TM dismutase Rieske Gemmatimonadetes bacterium OLE61534 chlorite OLC07916 partial OLE61544 OLE61543 OLE61542 OLE61541 OLE61540 OLE61539 OLE61547 OLE61538 OLE61537 CtaB OLE61536 62 aa 13_1_20CM_2_70_10 dismutase Rieske radical SAM protein, PYO41518 PYO41515 chlorite PYO41526 PYO41525 PYO41524 PYO41523 PYO41522 PYO41521 PYO41520 PYO41519 PYO41517 62 aa Gemmatimonadetes bacterium isolate AG5 PQQ CtaB dismutase Rieske

radical SAM protein, PYP71506 chlorite PYP71125 PYP71126 PYP71128 PYP71127 PYP71512 partial PYP71511 PYP71510 PYP71513 PYP71509 PYP71508 CtaB PYP71507 62 aa Gemmatimonadetes bacterium isolate AG17 PQQ dismutase Rieske radical SAM protein, PYO31248 PYO31249 62 aa PYO3123 PYO31240 PYO31241 cyt c PYO31242 PYO31243 PYO31244. PYO31245 PYO31246 PYO31247 Gemmatimonadetes bacterium isolate AG8 PQQ CtaB 2TM PYO70787 PYO70782 62 aa PYO70773 PYO70774 PYO70775 PYO70776 PYO70777 PYO70778 PYO70779 PYO70780 PYO70781 Gemmatimonadetes bacterium isolate AG47 CtaB 2TM Gemmatimonadetes bacterium isolate AG31 PYP24405 PYP24404 PYP24403 PYP24402 PYP24401 PYP24400 PYP24399 PYP24398 PYP24397 PYP24411 CtaB 62 aa 2TM

Gemmatimonadetes bacterium isolate AG12 PYP70999 cyt c DOMON PYP70995 PYP70994 PYP70993 PYP70992 PYP70991 PYP70990 CtaB

Gemmatimonadetes bacterium isolate AG3 PYP06049 PYP06048 PYP06076 cyt c PYP06047 PYP06046 PYP06045 PYP06044 PYP06043 PYP06042 PYP06075 CtaB 62 aa 2TM PYO82271 PYO84861 PYO82269 PYO82268 cyt c PYO82267 PYO82266 PYO82265 PYO82264 PYO82272 PYO82263 62 aa 2TM Gemmatimonadetes bacterium isolate AG41 CtaB PYP58558 62 aa PYP58549 PYP58550 PYP58622 cyt c PYP58551 PYP58552 PYP58553 PYP58554 PYP58555 PYP58556 PYP58557 CtaB Gemmatimonadetes bacterium isolate AG20 2TM PYO95963 PYO95955 cyt c PYO95956 cyt c PYO95957 PYO95958 PYO95959 PYO95960 PYO95961 PYO95962 62 aa 2TM Gemmatimonadetes bacterium isolate AG36 CtaB PYO36502 PYO36500 PYO36491 PYO36492 PYO36493 cyt c PYO36494 PYO36495 PYO36496 PYO36497 PYO36498 PYO36499. Gemmatimonadetes bacterium isolate AG50 CtaB 62 aa 2TM heme d1 uroporphyrinogen biosynthesis radical PYP17460 PYP17461 PYP17462 PYP17463 PYP17464 PYP17465 PYP17466 PYP17467 PYP17468 PYP17469 CtaB ferrochetalase Gemmatimonadetes bacterium isolate AG30 decarboxylase SAM protein SAM_ahbD_hemeb PYP70989 PYP70988 PYP70999 PYP70998 PYP70997 PYP70996 PYP70995 PYP70994 PYP70993 PYP70992 PYP70991 PYP70990 CtaB Gemmatimonadetes bacterium isolate AG12 super family 64 aa 67 aa TDJ56404 1 64 aa radical SAM protein TDJ56394 TDJ56395 TDJ56396 TDJ56397 TDJ56398 TDJ56399 TDJ56400 TDJ56401 TDJ56402 TDJ56403 CtaB TDJ56403 62 aa Gemmatimonadetes bacterium isolate N074bin45 4TM (DUF420?) radical SAM protein, PYP62977 PYP62978 PYP62979 PYP62980 PYP62981 PYP62982 PYP62983 PYP62984 PYP62985 PYP62986 CtaB Gemmatimonadetes bacterium isolate AG13 coenzyme PQQ

organisms with concatenated COX clusters upstream gene bd- like cyt c -rich cluster prepended COX Cu protein COX operon A1 type concatenated COX operon a-III subtype subunit I di-haem-cyt c 4 Cyt c Cyt c DOMON Cup-like COX2 COX1 COX3 COX4 CtaB or other other di-haem cyt c Cup-like COX2 COX1 COX3A COX3B COX4 other

rSAM_ahbD_hemeb OLD58396 Gemmatimonadetes bacterium OLD58532 tri cyt c OLD58527 super family fused OLD58530 OLD58531 OLD58390 OLD58391 OLD58397 OLD58392 OLD58393 OLD58394 OLD58395 OsmC-like DOMON ferredoxin 13_1_20CM_69_28 with PCP_red protein 1TM subtilisin GBD31924 GBD31923 GBD31797 biosynthesis protein GBD31933 GBD31932 GBD31931 GBD31930 GBD31929 GBD31928 GBD31927 GBD31926 GBD31925 GBD31922 42 aa GBD31801 GBD31800 GBD31799 GBD31798 DUF488 Bacterium HR33 CtaB 62 aa 2TM 3TM AlbA radical SAM protein, PYP55598 162 PYP56911 PYP56912 PYP56913 PYP56914 PYP56915 PYP56916 PYP56917 PYP56918 PYP55604 PYP55603 PYP55602 PYP55601 PYP55600 PYP55611 PYP55599 Gemmatimonadetes bacterium isolate AG16 coenzyme PQQ aa 1 TM chloride radical SAM protein, PYO95201 PYO95192 PYO95193 PYO95194 PYO95195 PYO95196 PYO95197 PYO95198 PYO95199 PYO95200 PYO94985 PYO94984 PYO94983 PYO95014 PYO94982 PYO95013 PYO94981 3TM channel Gemmatimonadetes bacterium isolate AG38 coenzyme PQQ CtaB protein radical SAM protein, PYO15794 PYO15724 62 aa PYO83609 211 PYO15715 PYO15716 PYO15717 PYO15718 PYO15719 PYO15720 PYO15721 PYO15722 PYO15723 PYO83616 PYO83615 PYO83614 PYO83613 PYO83612 PYO83611 PYO83610 Gemmatimonadetes bacterium isolate AG7 coenzyme PQQ CtaB 2TM aa 1 TM PYP06671 PYP04000 61 PYP06678 PYP06695 PYP06677 PYP06676 PYP06675 PYP06674 PYP06673 PYP06672 PYP06671 CtaB PYP03993 PYP03994 PYP03995 PYP03996 PYP03997 PYP03998 PYP03999 Gemmatimonadetes bacterium isolate AG33 62 aa 2TM aa 2 TM 7-cyano-7- radical SAM protein, PYP50643 PYP50634 PYP50635 PYP50636 PYP50637 PYP50638 PYP50639 PYP50640 PYP50641 PYP50642 PYP50654 CtaB PYP52737 PYP52738 PYP52739 PYP52739 PYP52741 PYP52742 PYP52743 deazaguanine Gemmatimonadetes bacterium isolate AG21 coenzyme PQQ 62 aa 2TM reductase radical SAM protein, PYO42215 PYO42216 chlorite dismutase PYO40313, PYO40312, PYO40311, PYO40310, PYO40309, no PYO40308 115 PYO42207 bd PYO42208 PYO42294 PYO42209 PYO42210 PYO42211 PYO42212 PYO42213 PYO42214 PYO40315 PYO40314 
Gemmatimonadetes bacterium isolate AG9 coenzyme PQQ CtaB 62 aa 2TM Rieske PYO44552 PYO44551 PYO44550 PYO44553 DNA aa no TM ?Candidatus Rokubacteriabacterium rSAM_NirJ2 super OLD74678 OLD74661 OLD74663 chlorite family fused with OLD74653 OLD74654 OLD74655 OLD74656 OLD74657 OLD74658 OLD74659 OLD74677 OLD746604 13_1_20CM_4_70_14 - probably CtaB 62 aa 2TM dismutase Rieske Gemmatimonadetes MAG PCP_reductase

organisms with complete clusters upstream gene bd- like cyt c-rich cluster prepended COX Cu protein COX operon a-III subtype subunit I di-haem-cyt c 4 Cyt c Cyt c DOMON Cup-like COX2 COX1 CtaA CtaB COX3A 2TM COX3B COX4 or other other extra phylum Ca. Rokubacteria OLC33788 FtsH cell OLC33773 2 OLC33772 Universal OLC33787 OLC33785 OLC33784 OLC33783 partial OLC33782 OLC33781 OLC33780 OLC33779 OLC33778 OLC33777 OLC33776 OLC33775 OLC33774 COX4 Candidatus Rokubacteria 13_1_40CM_4_69_5 division TM DUF983 second Cup stress protein PYN57776 FtsH cell PYN57789 PYN57790 Universal PYN57777 PYN57778 PYN57779 PYN57780 PYN57781 PYN57782 PYN57783 PYN57784 PYN57781 PYN57785 PYN57786 PYN57787 PYN57788 COX4 Candidatus Rokubacteria isolate AR21 division DUF983 second Cup stress protein Candidatus Rokubacteria bacterium OLC13635 FtsH cell OLC13640, OLC13649 OLC13650 Universal OLC13636 OLC13637 OLC13638 OLC13639 OLC13641 OLC13642 OLC13643 OLC13644 OLC13645 OLC13646 OLC13647 OLC13648 COX4 13_1_40CM_69_27 division cupredoxin only DUF983 second Cup stress protein Candidatus Rokubacteria bacterium OGK96031 FtsH cell OGK96018 OGK96017 Universal OGK96030 OGK96029 OGK96028 OGK96027 OGK96026 OGK96025 OGK96024 OGK96023 OGK96034 OGK96022 OGK96021 OGK96020 OGK96019 COX4 RIFCSPHIGHO2_02_FULL_73_26 division DUF983 second Cup stress protein PYN37572 FtsH cell PYN37586 PYN37587 Universal PYN37573 PYN37574 PYN37575 PYN37576 PYN37577 PYN37578 PYN37579 PYN37580 PYN37581 PYN37582 PYN37583 PYN37584 PYN37585 COX4 Candidatus Rokubacteria isolate AR28 division DUF983 second Cup stress protein PYN60170 FtsH cell PYN60184 PYN60185 Universal PYN60171 PYN60172 PYN60173 PYN60174 PYN60175 PYN60176 PYN60177 PYN60178 PYN60179 PYN60180 PYN60181 PYN60182 PYN60183 COX4 Candidatus Rokubacteria isolate AR17 division DUF983 second Cup stress protein PYN04753 FtsH cell PYN04766 PYN04767 Universal PYN04754 PYN04755 PYN04756 PYN04757 PYN04758 PYN04759 PYN04760 PYN04761 PYN04816 PYN04762 PYN04763 PYN04764 PYN04765 COX4 
Candidatus Rokubacteria isolate AR34 division DUF983 second Cup stress protein Candidatus Rokubacteria bacterium OLC33788 FtsH cell OLC33773 OLC33772 Universal OLC33787 OLC33786 OLC33785 OLC33783 OLC33782 OLC33781 OLC33780 OLC33779 OLC337798 OLC337797 OLC33776 OLC33775 OLC33774 COX4 13_1_40CM_4_69_5 division DUF983 second Cup stress protein PYN72595 FtsH cell PYN72582 PYN72581 Universal PYN72595 PYN72594 PYN72593 PYN72592 PYN72591 PYN72590 PYN72589 PYN72588 PYN72587 PYN72586 PYN72585 PYN72584 PYN72583 COX4 Candidatus Rokubacteria isolate AR24 division DUF983 second Cup stress protein PYN66674 FtsH cell PYN66652 PYN66651 Universal PYN66664 PYN66663 PYN66662 PYN66661 PYN66660 PYN66659 PYN66658 PYN66657 PYN66656 PYN66655 PYN66654 PYN66653 COX4 Candidatus Rokubacteria isolate AR20 division DUF983 second Cup stress protein PYM62290 FtsH cell PYM62277 PYM62278 Universal PYM62265 PYM62266 PYM62267 PYM62268 PYM62269 PYM62270 PYM62271 PYM62272 PYM62273 PYM62274 PYM62275 PYM62276 COX4 Candidatus Rokubacteria isolate AR38 division DUF983 second Cup stress protein PYN39011 FtsH cell PYN38997 Universal PYN39010 PYN39009 PYN39008 PYN39007 PYN39006 PYN39005 PYN39004 PYN39003 PYN39002 PYN39001 PYN39000 PYN38999 COX4 PYN38998 Candidatus Rokubacteria isolate AR25 division second Cup stress protein PYN97750 FtsH cell PYN97764 PYN97765 Universal PYN97751 PYN97752 PYN97753 PYN97754 PYN97755 PYN97756 PYN97757 PYN97758 PYN97759 PYN97760 PYN97761 PYN97762 PYN97763 COX4 Candidatus Rokubacteria isolate AR16 division DUF983 second Cup stress protein PYM22223 FtsH cell PYM22229 69 PYM22222 PYM22221 partial PYM22220 PYM22219 PYM22218 PYM22217 PYM22216 PYM22215 PYM22214 PYM22213 PYM22212 PYM22211 PYM22210 COX4 Candidatus Rokubacteria isolate AR7 division aa partial PYN98135 FtsH cell PYN98121 PYN98134 PYN98133 PYN98132 PYN98131 PYN98130 PYN98129 PYN98128 PYN98127 PYN98126 PYN98125 PYN98124 PYN98123 PYN98122 COX4 Candidatus Rokubacteria isolate AR18 division DUF983 Universal Candidatus 
Rokubacteria bacterium OGK82003 OGK82016 OGK82015 OGK82014 OGK82013 OGK82012 OGK82011 OGK82010 OGK82009 OGK82008 OGK82007 OGK82006 OGK82005 OGK82004 COX4 stress protein second Cup GWA2_73_35 pseudo Universal PYO52320 PYO52321 PYO52307 PYO52308 PYO52309 PYO52310 PYO52311 PYO52312 PYO52313 PYO52314 PYO52315 PYO52316 PYO52317 PYO52318 PYO52319 COX4 stress protein Candidatus Rokubacteria isolate AR11 DUF983 second Cup partial OGK82003 OGK82016 OGK82015 OGK82014 OGK82013 OGK82012 OGK82011 OGK82010 OGK82009 OGK82008 OGK82007 OGK82006 OGK82005 OGK82004 COX4 Candidatus Rokubacteria GWA2_73_35 second Cup PYM43359 PYM43360 Universal partial PYM43347 PYM43348 PYM43349 PYM43350 PYM43351 PYM43352 PYM43353 PYM43354 PYM43355 PYM43356 PYM43357 PYM43358 COX4 Candidatus Rokubacteria isolate AR39 DUF983 second Cup stress protein PYM83597 PYM83596 Universal partial PYM83610 PYM83609 PYM83608 PYM83607 PYM83606 PYM83605 PYM83604 PYM83603 PYM83602 PYM83601 PYM83600 PYM83599 PYM83598 COX4 Candidatus Rokubacteria isolate AR40 DUF983 second Cup stress protein PYM20234 PYM20225 PYM20224 Universal partial PYM20238 PYM20237 PYM20236 PYM20235 PYM20233 PYM20232 PYM20231 PYM20230 PYM20229 PYM20228 PYM20227 PYM20226 COX4 Candidatus Rokubacteria isolate AR5 cupredoxin only DUF983 second Cup stress protein FtsH cell division, PYN15591, PYN15574 PYN15573 Universal partial PYN15587 PYN15585 PYN15584 PYN15583 PYN15582 PYN15580 PYN15579 PYN15578 PYN15577, long PYN15576, partial PYN15575 COX4 Candidatus Rokubacteria isolate AR32 pseudo partial DUF983 partial stress protein phylum Ca. 
Poribacteria Cup, then tri-heme RKU31220 RKU31207 RKU31208 RKU31209 RKU31210 RKU31211 2 ABC RKU31212 RKU31213 RKU31214 RKU31215 RKU31217 RKU31218 RKU31219 RKU31220 COX4 2TM 86 aa Candidatus Poribacteria isolate PCPOR1 cyt c & b 6f operon SCO Cup, then tri-heme RKU16365 Universal RKU16351 RKU16352 RKU16353 RKU16354 RKU16355 2 ABC RKU16356 RKU16357 RKU16358 RKU16359 RKU16361 RKU16362 RKU16363 RKU16364 COX4 Candidatus Poribacteria isolate PCPOR2 cyt c & b 6f operon SCO stress protein Cup, then tri-heme RKU088310 Universal RKU08846 RKU08844 RKU08843 RKU08842 RKU08841 2 ABC RKU08839 RKU08838 RKU08837 RKU08836 RKU08834 RKU08833 RKU08832 RKU08831 COX4 Candidatus Poribacteria isolate PCPOR4 cyt c & b 6f operon SCO stress protein Cup, then tri-heme RKU32080 Universal RKU32096 RKU32095 RKU32094 RKU32093 RKU32092 2 ABC RKU32090 RKU32089 RKU32088 RKU32087 RKU32085 RKU32084 RKU32082 RKU32081 COX4 Candidatus Poribacteria isolate PCPOR6 cyt c & b 6f operon SCO stress protein fumarate RKU10163 Cup only RKU10180 RKU101779 RKU10178 RKU10177 RKU10176 2 ABC RKU10173 2 RKU10170 RKU10169 RKU10168 RKU10167 RKU10166 RKU10165 RKU10164 COX4 reductase Candidatus Poribacteria isolate PCPOR2b SCO flavoprotein RKU27560 Universal RKU27504 RKU27505 RKU27506 RKU27507 RKU27508 2 ABC RKU27511 RKU27512 RKU27513 RKU27559 RKU27515 RKU27516 RKU27517 RKU27518 COX4 Candidatus Poribacteria isolate AGPOR5 SCO stress protein

PON19328 PON19317 PON19318 PON19319 PON19320 PON19321 PON19322 PON19323 PON19324 PON19325 PON19326 PON19335 PON19327 COX4 DUF983 Candidatus Entotheonella serta SCO upstream gene subunit I bd di-haem-cyt c 4 Cyt c Cyt c DOMON Cup-like COX2 COX1 CtaA CtaB COX3A 2TM COX3B other other extra

Supplementary Table S2. Statistical analysis of tree topology configurations for the two new types of CtaA proteins introduced in this paper. The analysis was carried out after careful inspection of several phylogenetic trees that included all the various types of CtaA proteins we have classified (Table 1), analysed according to four mutually exclusive topology configurations as described earlier [51]. Percent values in bold reflect statistically significant data with p values below 0.0001 using the χ2 test [51].

tree topology category for CtaA type | type 1.5 | % total | type 0 | % total
sister of type 2 | 0 | 0 | 0 | 0
sister of another type without Cys pairs | 0 | 0 | 0 | 0
sister of type 1 branch | 31 | 93.9 | 1 | 3.1
sister of all other types, basal | 2 | 6.1 | 31 | 96.9
total of trees examined | 33 | 100 | 32 | 100

The original table can be supplied as: SupplementalTableS2new.xls
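The χ2 computation behind the Table S2 significance calls can be illustrated with the counts in the table. This is a sketch under an explicit assumption: it tests how each CtaA type is distributed across the four topology categories against a uniform expectation (equal counts per category, hence 3 degrees of freedom). Reference [51] may define the expected frequencies differently, so this illustrates the calculation rather than reproducing the published analysis; the function names are hypothetical.

```python
import math

# Observed counts from Supplementary Table S2, in category order:
# sister of type 2, sister of another type without Cys pairs,
# sister of type 1 branch, basal (sister of all other types).
OBSERVED = {
    "type 1.5": [0, 0, 31, 2],
    "type 0": [0, 0, 1, 31],
}


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per category (assumed null)."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom."""
    # Closed form of the regularized upper incomplete gamma function Q(3/2, x/2).
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    for ctaa_type, counts in OBSERVED.items():
        stat = chi_square_uniform(counts)
        print(f"{ctaa_type}: chi2 = {stat:.2f}, p = {chi2_sf_3df(stat):.2e}")
```

For these counts the statistics are about 83.97 (type 1.5) and 88.25 (type 0), and both upper-tail probabilities fall far below 0.0001. The closed-form tail for 3 degrees of freedom keeps the sketch within the standard library; with SciPy available, `scipy.stats.chisquare` (which also defaults to uniform expected frequencies) would be the usual choice.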

Supplementary Table S3. The table lists the HCO oxidases and COX accessory proteins, with their accession numbers, for selected taxa of Acidithiobacillales, Acidiferrobacterales and Acidihalobacter (previously known as Thiobacillus). See Supplementary Fig. S8b for a model of these accessory proteins and their interaction with COX subunits. The color code matches that used in Fig. 1c,d. Classification of COX operons follows that proposed recently [23]. #, ancestral CtaA ends the COX cluster; ^, with ba3-a1 cluster.

COX ubiquinol other A family haem A synthase haem O synthase other protein CtaG Cu uptake and delivery bd bd cbb3 ba3-a1 rus-COX operon oxidase oxidase CtaA, type 0 CtaB for haem A caa3_CtaG SCO TlpA FixI PCuAB bd-I CIO C family B family organisms with available genome COX1 subunit I (CyoB) COX1 type 1 (SURF) CtaG_Cox11 Cu ATPase subunit I subunit I subunit I subunit I

Unclassified Acidithiobacillales EGQ62590 type 0, EGQ61976, EGQ63569 Acidithiobacillus sp. GGI-221 EGQ60755 627 aa isolated partial PHS04940 595 aa, Acidithiobacillus sp. NORP59 PHS09287 PHS05417 PHS07882 subtype a-III KPL28761, Acidithiobacillales bacterium SM1_46 KPL27153 type 1 KPL26957 KPL27904 528 aa KPL27305 472 aa KPK72867 KPK12228 SURF-1 in KPK11247, Acidithiobacillales bacterium SG8_45 KPK12225 515 aa operon KPK12353, others

KPK70211 520 aa, KPK70215 SURF-1 in ab KPK72878, Acidithiobacillales bacterium SM23_46 KPK70216 type 1 KPK70212 in ab operon KPK70217 KPK70392 KPK71037 472 aa KPK70691 562 aa subtype ab operon operon KPK72445

Acidithiobacillaceae WP_101537037 716 aa, WP_101537004# type WP_101537194, WP_101537453 544 Acidithiobacillus sp. SH WP_101536946 702 aa 0 WP_101538905 aa WP_004870834#, type WP_004868407 546 Acidithiobacillus caldus ATCC 51756 WP_004870827 717 aa WP_004872504 WP_004869429 0 aa WP_010643015 716 aa, WP_010636934#, type Acidithiobacillus thiooxidans ATCC 19377 WP_010636933 WP_010637904 WP_010637963 WP_010636937 702 aa 0

WP_126605688 MSF WP_113526681 544 WP_113527200 Acidithiobacillus ferridurans JCM 18981 WP_113526417 627 aa WP_126605010 716 aa WP_113526608 type 0 WP_113526421 ctaT, then ctaR, and BBF65070, partial WP_126604669 WP_126604143 aa 484 aa ctaS

AEM47985 MSF family AEM47987 317 aa WP_014029592, Acidithiobacillus ferrivorans SS3 AEM47992 627 aa AEM47186 677 aa AEM47986 CtaT, then ?CtaR and WP_014029225 WP_014027734 type 0 WP_014028243 CtaS

ACK78336 MFS_1 ACK80759, ACK78561 275 aa type family transporter Acidithiobacillus ferrooxidans ATCC 23270 ACK79083 627 aa ACK80009 705 aa ACK80568 ACK79358, isolated ACK80236 ACK79479, ACK77832 542 aa 0 ctaT, then CtaR, CtaU ACK78515 and CtaS

Acidiferrobacteraceae MBP81232, Acidiferrobacteraceae bacterium isolate NP79 MBP81957 MBP81231 MBP80832 , partial MBP80828

MAK33811 523 aa, MAK33807 SURF-1 in MAK33361, Acidiferrobacter sp. isolate IN47 MAK33301 779 aa MAK33805 MAK34213 MAK34474, partial MAK33360 subtype ab operon operon MAK33806

WP_065970948 694 aa, WP_065968679 Surf- WP_114282960, Acidiferrobacter thiooxydans m-1 WP_065972084 628 aa WP_083995840 692 aa, WP_065970946# type 0 WP_065972087 WP_083995572 WP_065968708 like WP_065971779 WP_065969220 702 aa

WP_110136577 692 aa, WP_110136580# , WP_110138620 Surf- WP_110137199, Acidiferrobacter sp. SPIII_3 WP_110137324 628 aa WP_110136363 692 aa, WP_065970946# type 0 WP_110137319 WP_110137620 WP_110137321 like WP_110137742 WP_110136487 702 a WP_096359505 WP_096359241 type 1 , WP_096359599, BAV34787 or BAV32572 or WP_096457636 with WP_096359507 in ab WP_096361494, Sulfuricaulis limicola 519 aa, subtype ab and WP_096359240 WP_096359521 WP_096361676, WP_096361493 WP_096359243 564 ba3-a1 cluster operon WP_096361969 operon type 2^ WP_096359107 473 aa aa WP_096458192 524 aa, WP_096457618 type 1, WP_096457624, WP_096462981, WP_096462446, WP_096459593 subtype ab operon, WP_096359519 with WP_096458199. SURF-1 WP_096462791 in ab WP_096462441 WP_096457633 565 Sulfurifustis variabilis and WP_096457615 WP_096462392, WP_096462957, WP_096462175, COX1-3 fused 828 aa WP_096461293 ba3-a1 cluster in ab operon operon 472 aa aa type 2^ WP_096462391 WP_096461393 WP_096462313 isolated other acidophilic gammaproteobacteria WP_083251101 624 aa WP_083251103, WP_070077171 Acidihalobacter prosperus F5 NOT rus, AOU99474 607 WP_070078302 701 aa WP_083251533# type 0 WP_070078299# AOU99109 WP_070077568 WP_070079464 499 aa aa

WP_082954421#, type WP_052064387, WP_082954400 704 aa, WP_038086639#, WP_082954523, WP_052064339, WP_038086975 Acidihalobacter prosperus DSM 5130 WP_082954525 631 aa 0 and WP_082954583 WP_082954579, WP_038091325 705 aa WP_082954524 WP_038090567 WP_052064443 480 aa type 1 WP_052064118

WP_083250996# type 0 WP_070073488 , WP_070072674 704 aa, WP_070071601 , Acidihalobacter prosperus V6 WP_083250549 632 aa and WP_083250550 WP_083250551 WP_083250502, WP_070071322 WP_070072149 705 aa WP_083250553 type 1 WP_070073792

WP_076836973 682 aa, WP_076835291, WP_076836342, Acidihalobacter ferrooxidans V8 WP_083699744 651 aa APZ42837# type 0 WP_076838087 APZ42840 707 aa APZ44588 partial# WP_083699831

terminal oxidase COX ubiquinol other A family haem A synt. haem O synthase other protein CtaG bd bd cbb3 ba3-a1 other definition rus-COX operon oxidase oxidase CtaA, type 0 CtaB for haem A caa3_CtaG SCO TlpA FixI PCuAB bd-I CIO C family B family subunit/domain COX1 subunit I (CyoB) COX1 type 1 (SURF) CtaG_Cox11 CuA insertion Cu ATPase subunit I subunit I subunit I subunit I

The original table can be supplied as: SupplementalTableS3.xls


Supplementary Table S4. List of representative taxa that have a type 2 CtaA and also another type of CtaA. All taxa have a complete or nearly complete (>95%) genome. The CtaA types follow the classification presented in Table 1. Only a few strains of the Ca. Accumulibacter group are presented in the table.
Phylum   class/order   representative taxon   accession type 2 CtaA   accession type 1 CtaA   notes

Proteobacteria betaproteobacteria Candidatus Accumulibacter sp. SK-11 EXI72158 EXI72159 type 1 non-functional, gene concatenated Candidatus Accumulibacter phosphatis WP_015766302 WP_081444139 type 1 non-functional, gene concatenated Candidatus Propionivibrio aalborgensis SBT10918 SBT10917 type 1 non-functional, gene concatenated Rhodocyclales bacterium isolate FeB_8 WP_116534845 WP_116534844 type 1 non-functional, gene concatenated gammaproteobacteria Sulfuritalea hydrogenivorans WP_041097518 WP_041097958 type 1 non-functional, gene concatenated Acidiferrobacterae Sulfuricaulis limicola WP_096359240 WP_096359241 type 1 non-functional, gene concatenated Sulfurifustis variabilis WP_09645761 WP_096457618 type 1 non-functional Acidiferrobacter sp. SPIII_3 WP_065970946 type 0 Oligoflexia Bacteriovoracales Halobacteriovorax marinus OUR99601 Bacteriovorax stolpii WP_102242031 type 1.1 fused CtaAB Acidithiobacillia Acidithiobacillales Acidithiobacillus thiooxidans ATCC 19377 WP_010636934 type 0 Acidithiobacillales bacterium SM1_46 KPL27153 type 1.1 Bacteroidetes Flavobacteria Flavobacterium johnsoniae WP_012026872 PZQ81172 type 1.1 Gemmatimonadetes Gemmatimonadetes bacterium isolate J002 RMH70379 Gemmatimonadetes bacterium isolate SB0668_bin_25 MXX12879 type 1.0 Gemmatimonadetes bacterium isolate SB0663_bin_4 MYA76507 type 1.1 Ca. Calditrichaeota Calditrichaeota bacterium isolate J004 RMH62643 Calditrichaeota bacterium isolate CLD4 284 KAA3631309 type 1.0 Chloroflexi Anaerolineae bacterium UTCFX1 OQY89546 Anaerolineae bacterium SG8_19 KPK07954 type 1.1

The original table can be supplied as: Table S4new.xls


Supplementary Table S5. a. List of DUF420 proteins from taxa that represent all bacterial groups that have the gene for this protein in their genome. None of the listed taxa has a SURF1 gene in its genome.
accession   taxon   close to, in genomic sequence
unclassified bacteria BAL54904 Candidatus Acetothermia CtaB, then CtaA type 1.0 GBD28975 bacterium HR32 not relevant RMH09559 Firmicutes 1 (Bacillales 1) not relevant BAF67254 Staphylococcus aureus subsp. aureus str. Newman - CtaM CtaB then type 1.0 CtaA CVY07878 Streptococcus pneumoniae strain 2842STDY5753564 CtaB then type 1.0 CtaA Firmicutes 2 (Bacillales 2) NP_389795 Bacillus subtilis subsp. subtilis strain 168 - YozB YocA Lysozyme_like AAA22368 Bacillus firmus COX4 WP_018921396 Salsuginibacillus kocurii COX4 WP_054967562 Alicyclobacillus ferrooxidans glutamate decarboxylase Chloroflexi WP_008476701 Nitrolancea hollandica not relevant WP_054492454 Ardenticatena maritima copper chaperone Deinococcus-Thermus WP_051963639 Deinococcus misasensis DSM 22328 CtaA type 1.0 WP_011173262 Thermus thermophilus septum site-determining protein MinC, not relevant Aquificae WP_121010335 Hydrogenivirga caldilitoris strain DSM 16510 COX1 ba3-like WP_010880044 Aquifex aeolicus VF5 COX1 ba3-like partial WP_041434059 Thermocrinis albus DSM 14484 not relevant OGW43395 Nitrospirae bacterium RBG_16_43_11, but Aquificae ba3-like isolated Gemmatimonadetes OGU04286.1 Gemmatimonadetes bacterium GWC2_71_10 not relevant KPK64416.1 Gemmatimonas sp. SG8_38_2 KPK64417 62 aa, then CtaB Actinobacteria WP_144849088 Hymenobacter sp. Fur1 COX4 CFB RTL57609 Sphingobacteriales bacterium SCO WP_026730705 Flavobacterium denitrificans SCO WP_049815690 Niastella koreensis not relevant KXK57099.1 Chlorobi bacterium OLB7 SCO Acidobacteria PYQ05042 Acidobacteria bacterium isolate AA34 SCO Planctomycetes WP_002644190.1 Gimesia maris fused with SCO, not relevant Verrucomicrobia PYJ56461 Verrucomicrobia bacterium isolate AV7 SCO RMH63774 Calditrichaeota bacterium isolate J004 k99_499414 SCO Ca. Entotheonella WP_089934891 Candidatus Entotheonella palauensis COX1 isolated not ba3-like ETW97334 Candidatus Entotheonella factor COX1 isolated Nitrospirae OGW43395 Nitrospirae bacterium RIFCSPHIGHO2_01_FULL_66_17 nearby ba3-like partial OGW62595 Nitrospirae bacterium isolate J031 k99_120184 TlpA Deltaproteobacteria OGP21477 Deltaproteobacteria bacterium GWA2_57_13 COX4 KPK14056 Myxococcales bacterium SG8_38 Carbon starvation protein CstA KYF95524 Sorangium cellulosum strain So0011-07 C3980 sulfoxide reductase heme-binding subunit YedZ Oligoflexia WP_021275701 Bacteriovorax sp. Seq25_V not relevant MAF78151 Halobacteriovoraceae bacterium isolate ARS14 COX3 Alphaproteobacteria WP_097280064 Caenispirillum bisanense COX1 ba3-like WP_028878424 Terasakiella pusilla COX1 ba3-like WP_046021972 Magnetospira sp. QH-2 CtaB WP_096704222 Magnetospirillum sp. 15-1 CtaB Zetaproteobacteria NCP22206 Zetaproteobacteria bacterium isolate CG_2015-14_35_33 SCO

Supplementary Table S5. b. List of representative taxa that have the SURF1 gene, with the corresponding SURF1 proteins. None of these taxa have genes for DUF420 proteins in their genome.
accession   taxon   in COX cluster   note
Chloroflexi WP_013558955 Anaerolinea thermophila UNI-1 DNA yes in Fig. S8 WP_014432336 Caldilinea aerophila yes in Fig. S8 WP_012256115 Chloroflexus aurantiacus yes in Fig. S8 WP_011958754 Roseiflexus sp. RS-1 yes in Fig. S8 PKN92430 Chloroflexi bacterium HGW-Chloroflexi-6 yes OQY89616 Anaerolineae bacterium UTCFX2 no Deinococcus-Thermus WP_013179237 SURF1 family protein [Truepera radiovictrix] yes LGT Gemmatimonadetes WP_104022937 Gemmatirosa kalamazoonesis no in Fig. S8 OLC75447 Gemmatimonadetes bacterium 13_1_40CM_4_69_8 no in Fig. S8 TDJ53498 Gemmatimonadetes bacterium isolate N074bin45 no in Fig. S8 Actinobacteria MTA51800 Actinobacteria bacterium isolate UFOp-RE-18aug17-39 RE-18aug17-39-c7 no in Fig. S8 TXA41414 Mycobacterium tuberculosis variant bovis no in Fig. 5b WP_011014979 Corynebacterium glutamicum no in Fig. 5b WP_083284771 Corynebacterium multispecies no in Fig. 5b MPZ52523 Acidimicrobiia bacterium isolate Dino_bin35 no in Fig. S8 PHX71198 Acidimicrobium sp. isolate Baikal-G2 no in Fig. S8 Acidithiobacillia WP_012537611 Acidithiobacillus ferrooxidans multispecies close to rus operon in Fig. 5b WP_126604692 Acidithiobacillus ferridurans close to rus operon in Fig. 5b Gammaproteobacteria WP_110138620 Acidiferrobacter sp. SPIII_3 close to rus operon in Fig. 5b WP_081125900 Metallibacterium scheffleri yes, WP_136256396 in Fig. 5b WP_037332767 Salinisphaera hydrothermalis no, isolated in Fig. 5b Alphaproteobacteria WP_041604982 Tistrella mobilis KA081020-065 yes (previously WP_082828198) in Fig. 5b PPR17064 Alphaproteobacteria bacterium MarineAlpha9_Bin3 yes in Fig. 5b OUV90346 Alphaproteobacteria bacterium TMED150 yes in Fig. 5b PZQ48255 Micavibrio aeruginosavorus isolate S2_005_002_R2_29 yes in Fig. 5b WP_011750542 Paracoccus denitrificans yes in Fig. 5b Betaproteobacteria WP_092135769 Cupriavidus sp. YR651 yes in Fig. S8 PKO61789 Betaproteobacteria bacterium HGW-Betaproteobacteria-18 yes in Fig. S8 Deltaproteobacteria WP_111730160 Lujinxingia litoralis yes HAF89533 Deltaproteobacteria bacterium isolate UBA8081 yes OGP84521 Deltaproteobacteria bacterium RBG_13_65_10 no, isolated LGT? Zetaproteobacteria RPI01614 Zetaproteobacteria bacterium isolate metabat2.423 close to COX1 only LGT unclassified bacteria GBD32769 bacterium HR33 no

The original table can be supplied as: Table S5new.xls

Supplementary Table S6. Statistical analysis of tree topology configuration for the major forms of caa3_CtaG proteins found in our systematic genomic searches. The analysis was carried out after careful inspection of several phylogenetic trees that included all the groups of caa3_CtaG proteins with different membrane topology (Fig. 4) and signature residues (Supplementary Fig. S7). Trees were classified according to four mutually exclusive topology configurations, as described earlier [51] and shown in Supplementary Table S2. Percent values in bold reflect statistically significant data with p values below 0.001 using the χ² test [51].

tree topology category             acidophilic Fe-   % total   Rhodovibrio   % total   fused*           % total   Bacillus   % total
for CtaG groups                    oxidizers group             group                   Actinobacteria             group
sister of Rhodovibrio group        1                 4.0       0             0         11               73.3      4          19.0
sister of fused Actinobacteria     0                 0         11            50.0      0                0         0          0
sister of another group            3                 12.0      10            45.5      4                26.7      17         81.0
sister of all other types, basal   21                84.0      1             4.5       0                0         0          0
total of trees examined            25                100       22            100       15               100       21         100

*Together with 2 Chloroflexi, with which they always cluster.
The original table can be supplied as: Table S6new.xls
