Evolutionary history of redox metal-binding domains across the tree of life

Arye Harel(a), Yana Bromberg(b), Paul G. Falkowski(a,c,1), and Debashish Bhattacharya(d)

(a) Environmental Biophysics and Molecular Ecology Program, Institute of Marine and Coastal Science, (b) Department of Biochemistry and Microbiology, and (d) Department of Ecology, Evolution, and Natural Resources, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901; and (c) Department of Earth and Planetary Sciences, Rutgers University, The State University of New Jersey, Piscataway, NJ 08854

Contributed by Paul G. Falkowski, March 4, 2014 (sent for review October 23, 2013; reviewed by Edward F. DeLong and Michael Lynch)

Oxidoreductases mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately facilitate the biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth. The core enzymes responsible for these reactions are ancient, often small in size, and highly diverse in amino acid sequence, and many require specific transition metals in their active sites. Here we reconstruct the evolution of metal-binding domains in extant oxidoreductases using a flexible network approach and permissive profile alignments based on available microbial genome data. Our results suggest there were at least 10 independent origins of redox domain families. However, we also identified multiple ancient connections between Fe2S2- (adrenodoxin-like) and heme- (cytochrome c) binding domains. Our results suggest that these two iron-containing redox families had a single common ancestor that underwent duplication and divergence. The iron-containing family constitutes ∼50% of all metal-containing oxidoreductases and potentially catalyzed redox reactions in the Archean oceans. Heme-binding domains seem to be derived via modular evolutionary processes that ultimately form the backbone of redox reactions in both anaerobic and aerobic respiration and photosynthesis. The empirically discovered network allows us to peer into the ancient history of microbial metabolism on our planet.

iron–sulfur | Great Oxidation/Oxygenation Event | biogeochemical cycles | core pathways

Significance

Oxidoreductases mediate the biological production of chemical energy and regulate the flow of essential elements in all organisms and ecosystems, yet their evolutionary history is poorly understood. Here we present a network analysis of all known metal-containing oxidoreductases across the tree of life. Members of this network seem to have driven microbial metabolism in the Archean oceans. Our analysis reveals that oxidoreductases are polyphyletic and derived from a minimum of 10 different ancient protein families with distantly related domains. However, we find substantial evidence that two apparently distinct and ubiquitous iron-containing families of oxidoreductases containing Fe2S2 and hemes arose from a single common ancestor.

Author contributions: A.H. and D.B. designed research; A.H. performed research; D.B. provided conceptual insights; A.H., Y.B., P.G.F., and D.B. analyzed data; and A.H. wrote the paper.
Reviewers: E.F.D., Massachusetts Institute of Technology; and M.L., Indiana University.
The authors declare no conflict of interest.
(1) To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1403676111/-/DCSupplemental.
PNAS | May 13, 2014 | vol. 111 | no. 19 | 7042–7047 | www.pnas.org/cgi/doi/10.1073/pnas.1403676111

Oxidoreductases are anciently derived enzymes that mediate electron transfer (i.e., redox) reactions across the tree of life and ultimately came to facilitate biologically driven fluxes of hydrogen, carbon, nitrogen, oxygen, and sulfur on Earth (1). It has been suggested that an ancestral pool of peptide modules may have given rise to the first protein folds that were dispersed into different superfamilies (2–4). Some of these peptide modules are part of the limited set of building blocks (i.e., the “redox construction kit”) that gave rise to many oxidoreductases (5). Given this Darwinian model of “descent with modification” for amino acid sequences in the active sites of enzymes, we analyzed a core set of oxidoreductase catalytic domains to elucidate origin(s) and evolutionary patterns of biological electron transfer reactions.

Previous analysis suggests that the catalytic domains evolved in microbes long before the Great Oxidation Event (GOE) ca. 2.4 billion y ago (6). However, owing to their ancient provenance, often small size, and high divergence, the evolutionary history of these domains is challenging to reconstruct. For example, a recent attempt to reconstruct the phylogeny of oxidoreductase domains based on structural data (7) was limited to pairwise distance analysis (without an underlying model of structure evolution) and implied a monophyletic origin of all metal-binding domains.

To address the evolutionary history of oxidoreductases, we used hidden Markov model (HMM) (8–10) profile-to-profile alignments and protein similarity networks (11, 12) to study the metal-binding domains. HMMs (10) are a class of probabilistic models generally applicable to linear sequences (13). Because profile HMMs capture family-specific information, including functionally and structurally important residues, they are more sensitive and accurate than sequence alignments alone when searching for deeply diverged family members (2, 14, 15). The evolutionary relationships of protein families identified using profile HMMs may be reconstructed with similarity networks. These networks are composed of vertices (nodes), which represent protein sequences, connected by edges, representing similarity above a specified cutoff. Protein similarity networks offer an appealing alternative to phylogenetic approaches that rely on simultaneous multiple sequence alignments to reconstruct strictly bifurcating trees. By incorporating different metrics (e.g., pairwise alignment of profiles or sequence-to-profile alignments), network analysis provides a flexible approach to access the composition of domains in ancient protein families. In this study we applied a flexible network approach (11, 12) and permissive profile alignments (2, 14, 15) on microbial genome data to reconstruct the evolutionary history of metal-binding domains. Our results suggest that whereas there were at least 10 independent origins of redox domain families, one core family of iron-containing oxidoreductases came to dominate the electron fluxes across the planet before the evolution of oxygen. This family continues to represent the core of biologically catalyzed electron transfer reactions on Earth.

Results and Discussion

Profile Alignments Reveal at Least 10 Origins of Transition-Metal Redox Domains. By aligning sequence profiles of metal-binding redox domains (16) (102 HMM domain profiles with five or more sequences; Methods) we constructed a network of vertices, with edges between them indicating domain similarity. This approach revealed 10 distinct (disconnected; Methods) subnetworks (grouping 71 domains) and 31 isolated domains (Fig. 1, Fig. S1, and Table S1). Domains binding different ligands (e.g., Fe4S4 with Fe2S2) were connected to each other only in the largest subnetwork (Fig. 1A); however, this subnetwork did not contain domains binding molybdenum, tungsten, manganese, or copper. These results strongly imply that oxidoreductases (assigned to EC class 1) are polyphyletic (Fig. 1).

One Diverged Subnetwork Implies a Common Ancestor for Iron-Binding Domains. The largest subnetwork contains edges that connect different ligand-binding protein domains. Both profile-vs.-profile and position-specific iterative (PSI)-BLAST alignments connect the following binding domains: Fe4S4 to Fe2S2, iron–sulfur (Fe4S4, Fe2S2) to four cysteine iron domains (FeS4), and iron–sulfur (Fe4S4, Fe2S2) to hemes (Fig. S2). In contrast, alignments of Fe4S4-binding domains with nitrogenase (FeFe, MoFe, VFe, or 8Fe-7S) are supported only by the more sensitive profile alignments (Fig. 1A), implying a more distant relationship. This analysis suggests a monophyletic origin of iron-binding domains that we postulate to have evolved from FexSx to hemes (discussed below). This core set of domains is distantly related to domains of nitrogenase and those containing other metals, including vanadium and nickel (i.e., NiFe in hydrogenase).

Ancient Domains Arose in Thermophilic Anaerobic Prokaryotes and Subsequently Diverged Following the Rise of Oxygen. To test the hypothesis that oxidoreductase transition metal-binding domains that share a common ancestor may have diversified via environmental selection, we studied their distribution with respect to tolerance of oxygen and elevated temperatures. Domain families dominated by sequences from anaerobes (>50%) and thermophiles (>70%) are found in the largest subnetwork (Fig. 1B and SI Methods) and are presumably relics of the oldest core motifs. In contrast, one-half of the domain families (14 of 28) in other subnetworks are dominated by sequences from aerobic prokaryotes. Ten of these domains are present in enzymes whose functions require molecular oxygen [e.g., oxygenases (monooxygenase and dioxygenase) and cytochrome c oxidase]. Our results suggest that some domains found in the largest subnetwork arose early in a thermophilic anaerobic prokaryotic ancestor (17, 18) and subsequently diverged as the oxidation state of the surface of the Earth increased and temperature presumably decreased. In contrast, the domains present in other isolated, mostly aerobic “islands”

[Figure 1 image not reproduced. Panel A: vertices labeled by ligand (Fe4S4; Fe2S2; heme; FeS4; Fe4S4, 4Fe-2S-2O; heme a3-CuB; 4Fe-4S, siroheme; NiFe; V; and the FeFe, MoFe, VFe, and 8Fe-7S ligands of nitrogenase). Panel B: the same network with vertices labeled by domain family or enzyme (e.g., CytC, CytC oxidase, CytP450 monooxygenase, multicopper oxidase, dioxygenase, peroxidase, iron hydrogenase, and nitrogenase). Abbreviations: CytC, all types of cytochrome c; Adx’, adrenodoxin-like; HSR, heterodisulphide reductase; Dfx, desulfoferrodoxin; OSPO, oxygen-sensitive pyruvate oxidoreductase; DMSOR, DMSO reductase.]

Fig. 1. Multiple subnetworks of transition-metal redox domains found using profile alignments. Vertices indicate domain families and edge lengths indicate profile alignment scores (number of aligned residues per shortest profile length). Solid edge connections are verified both by PSI-BLAST and profile alignments. Six vertices were collapsed for the purpose of presentation (Fig. S1). Isolated vertices are not shown (Table S1). Note that the distance between disconnected network components does not indicate the level of similarity. (A) Domain ligands (indicated by vertex color). Note that a ligand may be coordinated by different types of domains (e.g., heme in three different types of domains). (B) Vertex pie charts show the proportion of sequences with different oxygen requirements for each domain (aerobes are blue, facultative aerobes are yellow, and anaerobes are red). Domains that have <2% annotated sequences are shown in light blue with a darker circle. Yellow asterisks indicate a domain with a high proportion (≥70%) of sequences from thermophiles. Proportions were calculated with normalized organism counts, that is, by dividing the count of domain sequences of a given category (e.g., anaerobes) by the number of organisms in this category in the overall set.
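The normalization described at the end of the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name and the toy counts are hypothetical, and the final rescaling of the normalized counts so that they sum to 1 (to give pie-chart shares) is an assumption that the caption does not state.

```python
from collections import Counter


def normalized_proportions(domain_seq_categories, organism_categories):
    """Share of each category among a domain's sequences, corrected for
    how many organisms of that category exist in the overall set."""
    seq_counts = Counter(domain_seq_categories)
    org_counts = Counter(organism_categories)
    # Caption step: divide the count of domain sequences in a category by
    # the number of organisms of that category in the overall set.
    normalized = {
        category: seq_counts[category] / org_counts[category]
        for category in seq_counts
        if org_counts[category] > 0
    }
    # Assumed step (not in the caption): rescale so the shares sum to 1.
    total = sum(normalized.values())
    return {category: value / total for category, value in normalized.items()}


# Toy input, invented for illustration only.
toy_domain_sequences = ["anaerobe"] * 6 + ["aerobe"] * 3
toy_organisms = ["anaerobe"] * 20 + ["aerobe"] * 30
toy_shares = normalized_proportions(toy_domain_sequences, toy_organisms)
```

In this toy case anaerobes supply two-thirds of the raw sequences, but because anaerobic organisms are also rarer in the overall set, their normalized share is larger than the raw share.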

(subnetworks) presumably evolved independently after the rise of oxygen.

The assumption of independent innovation is supported by a study of the evolution of protein folds. This analysis suggests that enzymes recruited new (and preexisting) folds for oxygen-dependent metabolism in the several-hundred-million-year interval between the emergence of oxygenic photosynthesis and the GOE (19). The absence of connections between domain islands may reflect limitations of the sequence similarity-based bioinformatic approach used in this study. However, it is plausible that (i) selective forces imposed by an increase in oxygen following the GOE served as a driving force for the evolution of novel domains and (ii) specific, environmentally constrained niches preserved this novelty. One such example is superoxide dismutase (SOD; Table S1), which is found in high frequency in facultative anaerobes. The presumably toxic effect of even low concentrations of oxygen imposed strong selective forces that resulted in the innovation of several independent SOD analogs of oxygen scavengers, two of which (Cu/Zn and Fe/Mn binding) (20) appear as isolated domains in our network.

[Figure 2 image not reproduced. Panel A: network of Fe2S2 and heme domain sequences. Panel B: bar chart; y axis, number of connections with domain sequences of the same ligand/total number of connections (0 to 1.2); x axis, minimal number of connections with non-self domain sequences (≥0, ≥3, ≥5). Panel C: path network with vertex classes Fe2S2-start, Fe2S2, Unknown, Heme-end, Fe4S4-end, and Copper-end.]

Ancient Origin of Fe4S4-Binding Domains Is Implied by Its High Abundance in Anaerobic, Thermophilic Environments and High Network Centrality. To study the possible antiquity of Fe4S4-binding domains, we grouped domains that bind similar ligands (Table S1) and studied their network centrality (SI Methods). Transition metal-binding domains can be divided into three categories based on differences in their relative abundance between obligate anaerobes, facultative aerobes, and obligate aerobes (Fig. S3). The Fe4S4-binding domains reveal a pattern of descending abundance as oxygen requirement increases (Fig. S3A) and temperature requirement decreases (from thermophilic to meso/psychrophilic; Fig. S3D and SI Methods). Although temperature and oxygen requirements are treated separately, their influence on the distribution of microbes is often interconnected and trends in the occurrence of a specific domain may derive from one or a combination of both factors. The high centrality of Fe4S4-binding domains further supports a putative origin in ancient anaerobic environments (Fig. S4). This hypothesis is supported by a phylogenomic analysis that addressed the evolution of metal-binding domain structures and found an early origin of iron–sulfur-binding domains (6). In addition, it has been suggested that ferredoxin and related proteins evolved early in the history of biological catalysis of redox reactions (21, 22). This idea is partially based on the short, repeat sequence in modern ferredoxins (21), all of which contain a CXXCXXC....C motif and are chiral (22, 23).
We propose that the Fe4S4-binding domain served as a template for the emergence of other domains found across the oxidoreductase landscape. The adjustment to changes in metal availability (6, 24) and the need to generate domains with different redox potential and electron transfer chains seem to have been important driving forces in the evolution of new domains.
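As a rough illustration of the network concepts used in this section (edges defined by a similarity cutoff, disconnected subnetworks, and vertex centrality), the following Python sketch builds a small similarity network. It is not the authors' pipeline: the centrality measure is detailed only in SI Methods, so degree centrality is assumed here, and the vertex names, scores, and cutoff are invented toy values with no relation to the paper's data.

```python
def build_similarity_network(pairwise_scores, cutoff):
    """Undirected network: an edge joins two vertices when score >= cutoff."""
    adjacency = {}
    for (a, b), score in pairwise_scores.items():
        # Register both vertices so isolated ones are kept in the network.
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and score >= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Disconnected subnetworks, found with an iterative depth-first search."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            vertex = stack.pop()
            if vertex in component:
                continue
            component.add(vertex)
            stack.extend(adjacency[vertex] - component)
        seen |= component
        components.append(component)
    return components


def degree_centrality(adjacency):
    """Fraction of the other vertices that each vertex is connected to."""
    n = len(adjacency)
    if n < 2:
        return {vertex: 0.0 for vertex in adjacency}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adjacency.items()}


# Toy alignment scores (hypothetical; not taken from the paper).
toy_scores = {
    ("domA", "domB"): 0.8,
    ("domA", "domC"): 0.6,
    ("domA", "domD"): 0.7,
    ("domB", "domC"): 0.5,
    ("domC", "domE"): 0.1,
}
toy_network = build_similarity_network(toy_scores, cutoff=0.4)
toy_components = connected_components(toy_network)
toy_centrality = degree_centrality(toy_network)
```

In the toy network, domA is linked to three of the four other vertices and is therefore the most central, whereas domE falls below the cutoff for every comparison and forms its own isolated component.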

Fig. 2. Evolutionary relationship between heme-binding cytochrome c and the adrenodoxin-like (Fe2S2-binding) domain sequences. (A) The entire subnetwork 8 of the domain-sequence network constructed from domain sequences of the largest subnetwork (Methods) that contains heme- and Fe2S2-binding sequences. Vertices indicate adrenodoxin-family domain sequences (gray circles; InterPro ID IPR018298), and cytochrome c domain sequences (red circles; general and multiheme cytochrome with InterPro IDs IPR009056 and IPR011031, respectively). Edges indicate PSI-BLAST alignments. (B) Promiscuous sequences of heme (red bars) are more similar to Fe2S2 (gray bars) domain sequences than to other heme domain sequences. The level of promiscuity (x axis) is defined by the minimal number of connections to non-self domain sequences (i.e., heme connection with Fe2S2). (C) Network representing all paths of degenerative alignments chain (Methods) starting from the 34 promiscuous adrenodoxin-like (Fe2S2-binding) domain sequences and ending with cytochrome c (heme-binding) domain sequences. Vertices indicate sequences and edges BLAST alignments (Methods). Note that some edges are used in more than one path. In most steps, only one substitution occurred (observed by one nonaligned amino acid). The double dash marks an edge that was shifted left and extended solely for the purpose of compact presentation.

Analysis of Networks Based Solely on Domain Sequences Is Consistent with a Common Origin of Fe2S2- and Heme-Binding Domains. Alignment of domain profiles (Fig. 1) revealed a potential evolutionary connection between Fe2S2- and heme-binding domains. To investigate this result we constructed a more stringent network from domain sequences of the largest subnetwork by using a higher sequence identity cutoff and larger alignment coverage required to define an edge (Methods). Within this stringent

network (Fig. 2A), subnetwork 8 contains a central region of 83 Fe2S2-binding domain sequences (an adrenodoxin-family domain that also includes prokaryotic putidaredoxin and terpredoxin) surrounded by 282 heme-binding domain sequences (cytochrome c). Although “promiscuous” domains (i.e., aligning to three or more domain sequences binding a different ligand) exist both for Fe2S2- (n = 34) and heme- (n = 39) binding domains, the promiscuous heme domains have sixfold fewer connections with other heme domain sequences than do the promiscuous Fe2S2 domains (Fig. 2B). Nearly one-half (19) of the promiscuous heme domains do not align with any other heme domain sequence. That is, the central (promiscuous) sequences of the heme domains are more similar to Fe2S2 domains than to other heme domains. These results suggest that the Fe2S2 and heme, which constitute ∼50% of all metal-containing oxidoreductases (7), are derived from a single common ancestor. To investigate the robustness of this result, we randomized the connections between the Fe2S2- and heme-binding domain sequences using three different approaches (Methods). These randomized networks (Fig. S5 A and B) were markedly different from those shown in Fig. 2A.

Approximately 35% of Transition-Metal Redox Domain-Containing Proteins Are Not Classified as Enzymes. To understand the origin of transition-metal redox domains we examined their distribution in other enzyme classes in the UniProt database. Approximately 44% of the transition-metal redox domain-containing proteins are not classified as enzymes and a small proportion (1.8%) is assigned to nonoxidoreductase enzyme activities (Table S2). In a previous study aimed at detecting oxidoreductases based on the presence of catalytic sites it was suggested that ∼20% of nonenzyme proteins in the UniProt database are misclassified enzymes (25). However, even taking this level of misannotation into account, as much as 35% of transition-metal redox domain-containing proteins in our set have no assigned enzymatic activity. An explanation for this result may be that the transition-metal redox domains evolved from short sequences (i.e., modules) that were able to bind transition metals and were later recruited into oxidoreductases, where they assumed a new role in mediating electron transfer reactions. This hypothesis is supported by the observation that an ancestral pool of peptide modules gave rise to the ancient folds (2). To further test our inference, we asked whether these types of transitions are found in available metagenome data.

Evidence for Naturally Occurring Intermediates Between Fe2S2-Binding (Adrenodoxin-like) and Heme-Binding (Cytochrome c) Domain Sequences. We searched the UniProt domain sequences and the Global Ocean Sampling (GOS) data (26) for all possible sequence intermediates between the core regions of Fe2S2–heme-binding domains (Methods). Using a “degenerative alignment chain” approach starting from Fe2S2-binding (adrenodoxin-like) domain sequences, we retrieved 21 heme-binding (cytochrome c) domain sequences (Fig. 2C). The majority (90%) of the degenerative alignment paths ended with a heme-binding domain, whereas the remainder ended with FeS- (9%) and copper- (1%) binding domains. Paths that ended with heme-binding domain sequences contained an average of 13.2 steps and 1.5 indels per path (Tables S3 and S4). It seems that protein sequences found in nature are constrained to a very limited portion (∼10^12) of all available protein space (10^390) (2). For the modules identified here, only a small fraction (10^8) of the available protein space was accessed (SI Methods). Thus, it is reasonable to assume that it is virtually impossible for the protein space found in nature to contain all of the intermediates between any given pair of sequences.

To compensate for this lack of diversity, specifically in the GOS database, we performed a degenerative alignment chain search against a “joint” database that included a larger sequence space from noneukaryotic organisms in UniProt (SI Methods). For the sample of 11 (adrenodoxin-like) queries (SI Methods) this approach did not retrieve any unexpected domain sequences (i.e., those that bind ligands other than Fe4S4, heme, FeS, and copper). This result suggests that expansion of sequence space to apparently unrelated modules is not an artifact of the degenerative alignment analysis in the current protein space. Furthermore, we propose that naturally occurring protein intermediates, which may not be capable of binding Fe2S2 or heme, exist transiently between Fe2S2 and the core of heme-binding domains.

Alignment of Complete Protein Sequence Profiles of Fe2S2-Binding (Adrenodoxin Family) and Cytochrome c-Specific Families Demonstrates Conservation Beyond the Short Fe2S2 Domain Sequences. To search for conservation beyond the short Fe2S2 domain sequence and its corresponding core in the cytochrome c domain, we generated profiles from complete sequences of proteins containing the Fe2S2 (adrenodoxin family)-binding domains and the cytochrome c-specific domains (SI Methods). The adrenodoxin-like domain-containing protein aligns (E-value 0.0025) to cytochrome c4 (IPR024167) with six aligned positions outside the short domain core; two of these residues (Leu and Gly) are conserved in >40% of 544 and 557 of adrenodoxin-like and cytochrome c4 domain sequences, respectively (Fig. S5). Thus, if an evolutionary path exists from the adrenodoxin family to cytochrome c, it was extended beyond the core binding residues. Based on the redox potential of cytochrome c and the adrenodoxin family proteins, we assume that the newly derived porphyrin, using the cytochrome c-like domain, was probably more oxidizing than the adrenodoxin-like Fe2S2 domain (27–29).

[Figure 3 image not reproduced. Diagram labels: geochemical reactions (FeS + H2S → FeS2 + H2); hydrogenase (1; H2 → 2e− + 2H+); e− transfer chain (2); heterodisulfide reductase (3), regeneration of CoM and CoB; S0 reductase (4; 2H2O + S0 → H2S + 2OH−, with an input of 2e−); ATP synthase (ADP + Pi → ATP). Panel titles: A. CO2 reduction; B. Sulfur reduction.]

Fig. 3. Core energy conversion pathways in the ancient microbiome captured by the largest protein domain subnetwork (Fig. 1, largest subnetwork). Ancient consortia of microbes probably exploited redox-coupled reactions with available donor (H2) and acceptors (CO2 and elementary sulfur) to generate H+ disequilibrium (i.e., proton motive force), which was in turn harvested to form chemical bonds (ATP) [modified from Madigan (38)]. Electrons from H2 were harvested by a membrane-associated hydrogenase (label 1; InterPro domain IDs: IPR018194, IPR004108, and IPR014406) and were transferred across the membrane (A) via iron-based domains (label 2) to a heterodisulfide reductase (label 3; InterPro domain ID IPR017680) that regenerated coenzyme M and B, mediating the last step in methanogenesis (CO2 reduction) or (B) to sulfur reductase (label 4; InterPro domain IDs: IPR018194, IPR006066, and IPR006067; Results and Discussion) (38). Finally, harvesting energy from H+ disequilibrium into chemical bonds is supposedly an ancient mechanism that is abundant throughout the tree of life. The resulting H2S could have been recycled to H2 by abiotic reaction with iron sulfide minerals (ΔG0′ = −42 kJ) (38).

It has been suggested that the abiotic origins of life precursors used transition metal-based ligands (e.g., iron–sulfur clusters) to fix carbon (30), and the basic reaction was appropriated by the first free-living organisms (31). This hypothesis highlights the critical importance of transition metal-binding domains in the origin of metabolism. However, our analysis suggests there was no single

Table 1. Ancient pathways represented in the biggest profiles-based redox domains network (Fig. 1)

Pathway | Related domain names (IDs*) | Comments
Energy conservation, basic proton reduction | Hydrogenase (18194, 04108, 14406); Ferredoxin (06058, 09051, 17900) | Found in the deeply branching thermophilic archaeon Pyrococcus furiosus (37)
Acetyl-CoA (central metabolite) synthesis | Oxygen-sensitive pyruvate oxidoreductase (11898) | Could have gained its substrate (pyruvate) from abiotic synthesis in conditions that prevailed in hydrothermal vents (44)
Nitrogen fixation | Nitrogenase component 1 (00510) | Either a molybdenum–iron, vanadium–iron, or iron–iron protein
Regulation of photosynthesis and electron transfer | Ferredoxin thioredoxin reductase (04209); Rieske (05805); Cytochrome c (03088, 08168, 11031); Ferredoxin (06058, 09051, 17900) | Evolved in subsequent phases of evolution of photosynthesis and electron transfer chains

*Domain IDs show InterPro ID without the “IPR0” preceding characters (i.e., 18506 instead of IPR018506; Table S1).
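For readers who want to look up the Table 1 entries, the footnote's abbreviation rule can be reversed with a few lines of Python. This is only a convenience sketch; the function name is hypothetical and the five-digit check simply mirrors the format of the IDs shown in the table.

```python
def expand_interpro_id(short_id: str) -> str:
    """Restore the "IPR0" prefix that Table 1 omits from InterPro IDs."""
    short_id = short_id.strip()
    # Table 1 lists five-digit IDs; reject anything else rather than guess.
    if len(short_id) != 5 or not (short_id.isascii() and short_id.isdigit()):
        raise ValueError(f"expected a five-digit ID, got {short_id!r}")
    return "IPR0" + short_id


# Hydrogenase domain IDs as listed in Table 1.
hydrogenase_ids = [expand_interpro_id(s) for s in ("18194", "04108", "14406")]
```

The expanded hydrogenase IDs agree with the full identifiers given in the Fig. 3 legend (IPR018194, IPR004108, and IPR014406).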

transition metal domain that is the progenitor of all extant oxidoreductases. Rather, the network analyses we present strongly indicate a polyphyletic evolutionary history. The multiple functions encoded by highly diverged protein families hinder testing the hypothesis of polyphyly via the application of a standard phylogenetic analysis with an associated protein evolution model (32). Furthermore, our analysis supports the hypothesis of extraordinary genetic innovation during the Archaean eon (19, 33). One of these studies highlights genetic innovations during an “Archaean genetic expansion” that was associated with an expansion in microbial respiratory and electron transport capabilities (33). The other suggests that enzymes recruited new folds for oxygen-dependent metabolism (19). This “research and development” stage in the first ca. 2.5 billion y of Earth’s history may have facilitated the radiation of microbes under the strong selective conditions ultimately imposed by the presence of oxygen. Metabolic innovation ended with the evolution of eukaryotes (1).

We cannot completely rule out the possibility that sequence convergence explains the shared patterns we identified for short amino acid sequences in, for example, Fe2S2-binding domains. However, the evolutionary connection between heme- and iron–sulfur Fe2S2-binding domains is supported by phylogenetic analysis of the remotely related (34, 35) signal sensor PAS and GAF domain families, which shows the partition of a heme-binding PAS, Fe4S4-binding PAS, and Fe2S2-binding GAF into three different clades (36, 37). Networks do not address the direction of evolution; however, the postulated existence of iron–sulfur-based ancient life precursors (30, 31) suggests that the Fe2S2-binding domain appeared before the heme-binding structure. This assumption is also supported by evolutionary analysis of transition-metal redox domains that demonstrates that Fe2S2 domains are highly evolvable, loop-rich structures containing few hydrogen bonds (7). These hypotheses suggest a derived evolutionary position for heme-binding domains (specifically cytochrome c), which are found across the tree of life. These protein domains ultimately form modules of oxidoreductase evolution that support both anaerobic and aerobic respiration and photosynthesis (38).

Does the Largest Subnetwork Contain Components of Metabolic Pathways Present in the First Cells on Our Planet? Inspection of the central network that contains both iron–sulfur and hemes suggests the component oxidoreductases represent an ancient core of redox metal-dependent pathways in anaerobes and thermophiles (Fig. 3). In Archean oceans, H2 was undoubtedly a major source of reducing power (17, 18, 39, 40). Metabolic strategies among ancient microbial consortia (Fig. 3) probably included utilization of redox-coupled reactions with the available donor, H2, and acceptors, CO2 and elemental sulfur, to generate an H+-based thermodynamic disequilibrium that was subsequently used to form chemical bonds (29, 40). Electrons from H2 would have been extracted by a membrane-associated hydrogenase and transferred across the membrane to two major pathways: (i) CO2 reduction in methanogens and (ii) elementary sulfur reduction (38). The latter could have been mediated by a sulfur-reducing hydrogenase ancestor (41) or by an ancient sulfur-reducing domain (42). Interestingly, both of these functions are represented in the largest subnetwork (Fig. 1).

Our network also contains domains that may have participated in an ancient energy conservation proton reduction pathway, acetyl-CoA synthesis, nitrogen fixation, and regulation of anoxygenic photosynthesis and electron transfer (Table 1). Together these could support sustainable core pathways that formed the foundation for evolving primordial light using anoxygenic photosynthesis followed by oxygenic photosynthesis. The latter drove the GOE and the evolution of novel domains that rely on oxygen.

Methods

All protein sequences with their annotations were extracted from UniProt (December 2011) (43) and the UniProt Metagenomic and Environmental Sequences databases (downloaded November 2011). UniProt sequences with transition metal-using redox domains (25) were extracted from InterPro (44). Profiles were generated for each domain using scripts from the HHblits suite (16). All-vs.-all alignment of domain profiles was made with the HHblits HHalign program. Note that although it takes all publicly available genomic data into account, our approach and related results (i.e., isolated subnetworks) are inevitably limited by the availability of sequences and resolution power of alignments. We also generated all-vs.-all domain sequence alignments using PSI-BLAST. Metadata from organisms (i.e., oxygen and temperature requirements; Fig. 1B and Fig. S3) were extracted from the National Center for Biotechnology Information and from the Integrated Microbial Genomes project database of the Joint Genome Institute (June 2012). It should be noted that to minimize phylogenetic bias in our profile networks resulting from oversampling particular lineages, we analyzed most of the publicly available prokaryotic genomes (Fig. 1B and Fig. S3). Centrality (Fig. S4 B and C) and connection between Fe2S2- and heme-binding domains (Fig. 2A) were computed for the more stringent network (protein sequence identity >30%, hit length of >70% of the smallest homolog) constructed from domain sequences of the largest subnetwork. To search for possible sequence intermediates between Fe2S2- (adrenodoxin-family) and heme- (cytochrome c) binding domains, we constructed a set of BLAST alignments (Fig. 2) starting from Fe2S2 domain sequences in which the aligned region of the hit was used as a query sequence in the next step. When the chain of degenerative alignments retrieved a domain sequence that bound a ligand other than Fe2S2 it was terminated and considered as a “path” (Tables S3 and S4). For all methods, all parameter and procedural details are reported in SI Methods.

ACKNOWLEDGMENTS. We thank Vikas Nanda, Stefan Senn, John Kim, Shu Cheng, Chengesheng Zhu, Ben Jelen (all Rutgers University), and Eric Bapteste (Université Pierre et Marie Curie) for helpful discussions; Johannes Söding (Ludwig Maximilian University of Munich) for helpful advice and technical support in the profiles alignment analysis; J. Clark Lagarias (University of California, Davis) for discussions; and Huan Qiu, Udi Zelzion, and other D.B. laboratory members for helpful insights into the work. This research was funded by Gordon and Betty Moore Foundation Grant GBMF2807 (to P.G.F.).
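Illustrative note on the degenerative alignment chain described in Methods. The Python sketch below shows one way the chaining logic could be organized; it is an assumption-laden illustration, not the authors' implementation. The `search` callable stands in for a BLAST search and is hypothetical, as are the `Hit` record, the `max_steps` guard, the rule that a sequence is not revisited within a path, and the toy database. The actual parameters are reported in SI Methods.

```python
from typing import Callable, List, NamedTuple


class Hit(NamedTuple):
    seq_id: str          # identifier of the retrieved domain sequence
    aligned_region: str  # aligned part of the hit, reused as the next query
    ligand: str          # ligand annotated for the retrieved domain


def degenerative_chains(
    start_id: str,
    start_region: str,
    search: Callable[[str], List[Hit]],
    start_ligand: str = "Fe2S2",
    max_steps: int = 50,
) -> List[List[str]]:
    """Follow chains of alignments; a chain ends (and becomes a path) when
    it retrieves a sequence that binds a ligand other than the start ligand."""
    paths = []
    stack = [(start_region, [start_id])]
    while stack:
        query, path = stack.pop()
        if len(path) - 1 >= max_steps:  # assumed safety limit on chain length
            continue
        for hit in search(query):
            if hit.seq_id in path:  # assumed rule: no revisits within a path
                continue
            extended = path + [hit.seq_id]
            if hit.ligand != start_ligand:
                paths.append(extended)  # terminated chain = one path
            else:
                stack.append((hit.aligned_region, extended))
    return paths


# Toy stand-in for a sequence database search (invented data).
toy_db = {
    "AAAA": [Hit("fe2s2_b", "AAAT", "Fe2S2")],
    "AAAT": [Hit("fe2s2_c", "AATT", "Fe2S2")],
    "AATT": [Hit("heme_x", "ATTT", "heme")],
}
toy_paths = degenerative_chains("fe2s2_a", "AAAA", lambda q: toy_db.get(q, []))
```

With the toy database the search yields a single three-step path from the starting Fe2S2 sequence to a heme-binding sequence; in a real analysis `search` would wrap an external alignment tool and return its filtered hits.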

1. Falkowski PG, Fenchel T, Delong EF (2008) The microbial engines that drive Earth’s biogeochemical cycles. Science 320(5879):1034–1039.
2. Alva V, Remmert M, Biegert A, Lupas AN, Söding J (2010) A galaxy of folds. Protein Sci 19(1):124–130.
3. Bukhari SA, Caetano-Anollés G (2013) Origin and evolution of protein fold designs inferred from phylogenomic analysis of CATH domain structures in proteomes. PLOS Comput Biol 9(3):e1003009.
4. Furnham N, et al. (2012) Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLOS Comput Biol 8(3):e1002403.
5. Baymann F, et al. (2003) The redox protein construction kit: Pre-last universal common ancestor evolution of energy-conserving enzymes. Philos Trans R Soc Lond B Biol Sci 358(1429):267–274.
6. Dupont CL, Butcher A, Valas RE, Bourne PE, Caetano-Anollés G (2010) History of biological metal utilization inferred through phylogenomic analysis of protein structures. Proc Natl Acad Sci USA 107(23):10567–10572.
7. Kim JD, Senn S, Harel A, Jelen BI, Falkowski PG (2013) Discovering the electronic circuit diagram of life: Structural relationships among transition Metal binding sites in oxidoreductases. Philos Trans R Soc B 386:220120257–220120266.
8. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402.
9. Gribskov M, McLachlan AD, Eisenberg D (1987) Profile analysis: Detection of distantly related proteins. Proc Natl Acad Sci USA 84(13):4355–4358.
10. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol 235(5):1501–1531.
11. Bapteste E, Bouchard F, Burian RM (2012) Philosophy and evolution: Minding the gap between evolutionary patterns and tree-like patterns. Methods Mol Biol 856:81–110.
12. Bapteste E, et al. (2012) Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proc Natl Acad Sci USA 109(45):18266–18272.
13. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14(9):755–763.
14. Sadreyev RI, Baker D, Grishin NV (2003) Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci 12(10):2262–2272.
15. Söding J (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21(7):951–960.
16. Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9(2):173–175.
17. Baross JA, Hoffman SE (1985) Submarine hydrothermal vents and associated gradient environments as sites for the origin and evolution of life. Orig Life Evol Biosph 15(4):327–345.
18. Nisbet EG, Sleep NH (2001) The habitat and nature of early life. Nature 409(6823):1083–1091.
19. Wang M, et al. (2011) A universal molecular clock of protein folds and its power in tracing the early history of aerobic metabolism and planet oxygenation. Mol Biol Evol 28(1):567–582.
20. Miller AF (2004) Superoxide dismutases: Active sites that save, but a protein that kills. Curr Opin Chem Biol 8(2):162–168.
21. Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino Acid sequences. Science 152(3720):363–366.
22. Kim JD, Rodriguez-Granillo A, Case DA, Nanda V, Falkowski PG (2012) Energetic selection of topology in ferredoxins. PLOS Comput Biol 8(4):e1002463.
23. Mulholland SE, Gibney BR, Rabanal F, Dutton PL (1999) Determination of nonligand amino acids critical to [4Fe-4S]2+/+ assembly in ferredoxin maquettes. Biochemistry 38(32):10442–10448.
24. Williams RJP (1981) Natural selection of the chemical elements. Proc R Soc Lond B Biol Sci 213:361–397.
25. Harel A, Falkowski P, Bromberg Y (2012) TrAnsFuSE refines the search for protein function: Oxidoreductases. Integr Biol (Camb) 4(7):765–777.
26. Yooseph S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol 5(3):e16.
27. Huang YY, Kimura T (1983) Reduction potential and thermodynamic parameters of adrenodoxin by the use of an anaerobic thin-layer electrode. Anal Biochem 133(2):385–393.
28. Lehninger AL, Nelson DL, Cox MM (2008) Lehninger Principles of Biochemistry (Freeman, New York), 5th Ed, p 511.
29. Williams RJP, Frausto da Silva JRR (1996) The Natural Selection of the Chemical Elements: The Environment and Life’s Chemistry (Clarendon, New York), pp 186, 306, 588.
30. Wachtershauser G (1998) Pyrite formation, the first energy source for life: A hypothesis. Syst Appl Microbiol 10:207–210.
31. Martin W, Russell MJ (2003) On the origins of cells: A hypothesis for the evolutionary transitions from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos Trans R Soc Lond B Biol Sci 358(1429):59–83, discussion 83–85.
32. Theobald DL (2010) A formal test of the theory of universal common ancestry. Nature 465(7295):219–222.
33. David LA, Alm EJ (2011) Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469(7328):93–96.
34. Anantharaman V, Koonin EV, Aravind L (2001) Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J Mol Biol 307(5):1271–1292.
35. Ho YS, Burden LM, Hurley JH (2000) Structure of the GAF domain, a ubiquitous signaling motif and a new class of cyclic GMP receptor. EMBO J 19(20):5288–5299.
36. Unden G, Nilkens S, Singenstreu M (2013) Bacterial sensor kinases using Fe-S cluster binding PAS or GAF domains for O2 sensing. Dalton Trans 42(9):3082–3087.
37. Müllner M, et al. (2008) A PAS domain with an oxygen labile [4Fe-4S](2+) cluster in the oxygen sensor kinase NreB of Staphylococcus carnosus. Biochemistry 47(52):13921–13932.
38. Madigan MT (2012) Brock Biology of Microorganisms (Benjamin Cummings, San Francisco), 13th Ed, pp 354–359, 385–387, 393–394, 402, 451.
39. Kim JD, Yee N, Nanda V, Falkowski PG (2013) Anoxic photochemical oxidation of siderite generates molecular hydrogen and iron oxides. Proc Natl Acad Sci USA 110(25):10073–10077.
40. Reysenbach AL, Shock E (2002) Merging genomes with geochemistry in hydrothermal ecosystems. Science 296(5570):1077–1082.
41. Ma K, Schicho RN, Kelly RM, Adams MW (1993) Hydrogenase of the hyperthermophile Pyrococcus furiosus is an elemental sulfur reductase or sulfhydrogenase: Evidence for a sulfur-reducing hydrogenase ancestor. Proc Natl Acad Sci USA 90(11):5341–5344.
42. Pedroni P, et al. (1995) Characterization of the locus encoding the [Ni-Fe] sulfhydrogenase from the archaeon Pyrococcus furiosus: Evidence for a relationship to bacterial sulfite reductases. Microbiology 141(Pt 2):449–458.
43. UniProt Consortium (2012) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40(Database issue):D71–D75.
44. Hunter S, et al. (2009) InterPro: The integrative protein signature database. Nucleic Acids Res 37(Database issue):D211–D215.

Harel et al. PNAS | May 13, 2014 | vol. 111 | no. 19 | 7047