<<

History of biological utilization inferred through phylogenomic analysis of structures

Christopher L. Duponta,1, Andrew Butcherb, Ruben E. Valasc, Philip E. Bourned, and Gustavo Caetano-Anollése,1

aMicrobial and Environmental Genomics Group, J. Craig Venter Institute, La Jolla, CA 92121; bDepartment of Biology, University of York, York YO10 5YW, United Kingdom; cBioinformatics Program and dSkaggs School of Pharmacy and Pharmaceutical Sciences, University of California, La Jolla, CA 92064; and eEvolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana–Champaign, IL 61801

Edited by Paul G. Falkowski, Rutgers, The State University of New Jersey, New Brunswick, NJ, and approved May 3, 2010 (received for review October 28, 2009) The fundamental of trace elements dictates the molecular Cu, Zn, and Mo (8, 9). Following the invention of oxygenic speciation and reactivity both within cells and the environment at and the resulting increase in atmospheric O2 large. Using protein structure and comparative genomics, we levels around 2.4 billion years ago (GYA) (10), increased con- elucidate several major influences this chemistry has had upon tinental weathering and oceanic sulfate reduction promoted biology. All of life exhibits the same size-dependent a transition to a euxinic (anoxic and sulfidic) Proterozoic scaling for the number of metal-binding within a proteome. (11, 12). In this high-sulfide, reducing ocean, Fe, Mn, and Co This fundamental evolutionary constant shows that the selection of concentrations would have declined yet remained greatly ele- one element occurs at the exclusion of another, with the eschewal vated relative to the modern oxic ocean, whereas Cu and Zn of Fe for Zn and Ca being a defining feature of eukaryotic pro- would have decreased several orders of magnitude (9). This teomes. Early life lacked both the structures required to control transition was likely not ocean-wide, with sporadic returns to intracellular metal concentrations and the metal-binding proteins anoxia and high spatial variability (13, 14), although coastal that catalyze transport and transformations. The surface perhaps exhibited conditions similar to those of development of protein structures for metal homeostasis coincided the modern oxic ocean. The transition to a generally oxic and fi with the emergence of metal-speci c structures, which predomi- oxidizing ocean (∼0.8–0.5 GYA) prompted large increases in Cu, nantly bound abundant in the Archean ocean. Potentially,

Zn, and Mo, with concomitant decreases in Fe, Mn, and Co (9). EVOLUTION fi this promoted the diversi cation of emerging lineages of The enormous redox shifts temporally encompass the evolution and through the establishment of biogeochemical cycles. In of modern life, and it has been suggested that this influenced the contrast, structures binding Cu and Zn evolved much later, pro- fl selection of the elements for biological utilization (15), although viding further evidence that environmental availability in uenced there is sparse evidence (16). the selection of the elements. The late evolving Zn-binding proteins The advent of large-scale whole-genome sequencing can help are fundamental to eukaryotic cellular biology, and Zn bioavailabil- elucidate how trace-metal chemistry influenced the evolution of ity may have been a limiting factor in eukaryotic evolution. The protein macromolecules and how the different superkingdoms of results presented here provide an evolutionary timeline based on life incorporated this machinery. Both specific amino resi- genomic characteristics, and key hypotheses can be tested by dues and an appropriate 3D arrangement of those residues are alternative geochemical methods. required for proper metal binding by a protein, implying that an analytical approach using protein structure is more appropriate Archean-Proterozoic | | | than a purely sequence-based methodology. A recent survey of evolution | metal homeostasis Fe-, Zn-, Mn-, and Co-binding structures in over 300 genome- encoded revealed that the abundance of proteins etalloproteins contain one or more of an inorganic el- binding a specific metal within a proteome scales to proteome Mement in their 3D structure and are said to comprise 30% of size as a power law with distinct slopes (α) for each super- all proteins (1). Many biological pathways contain at least one kingdom and metal (17). Specifically, the proteomes of akaryotes metalloenzyme (2) and consequently require Mg, K, Ca, Fe, Mn, (organisms in superkingdoms Archaea and Bacteria lacking and Zn to sustain life (3). Other elements, like Cu, Mo, Ni, Se, and a nucleus) were described by power laws with values of α > 1 for — — Co, are required by many though not all organismal lineages, Fe-, Mn-, and Co-binding structures, while the eukaryotic pro- and the utilization of trace elements varies greatly between species teomes have values of α > 1 for Zn-binding structures. The fi (3). The different elements have a range of af nities for most different power law slopes for Fe, Mn, Zn, and Co parallels the +2 +2 < +2 < coordinating environments in the order Mg /Ca Mn hypothetical chemical environment in which each superkingdom +2 < +2 < +2 < +2 ∼ +2 Fe Co Ni Cu Zn , an equilibrium series known supposedly evolved, providing full-genome evidence for the in- as the Irving-Williams Series (4). This array of outer-sphere fluence of trace metal on biological evolution. chemistry provides significant catalytic diversity, yet has con- In that study, the lack of knowledge about the order of met- sequences for both biological and . alloenzyme evolution precluded any conclusions about how the Within the , metals compete for protein binding sites; hence, chemistry of the different trace metals have impacted the extensive protein networks involving transporters and metal- emergence of diversified life. Although the power-law scaling sensing regulatory proteins are required to maintain the proper revealed a consistent evolutionary trajectory, it lacked direction, subcellular concentrations of each element, and in some cases directly shuttle each metal to its requisite in the proper cellular compartment (1, 5, 6). It has been hypothesized Author contributions: C.L.D., P.E.B., and G.C.-A. designed research; C.L.D., R.E.V., and that the establishment of a metal homeostasis system is required G.C.-A. performed research; A.B. contributed new reagents/analytic tools; C.L.D., R.E.V., for distinct phylotypes of cells to develop (7). P.E.B., and G.C.-A. analyzed data; and C.L.D., P.E.B., and G.C.-A. wrote the paper. Environmentally, for similar fundamental chemical reasons, The authors declare no conflict of interest. the shifts in global redox state hypothesized to have occurred This article is a PNAS Direct Submission. over the past 4.5 billion years (3, 7, 8) greatly influenced trace- 1To whom correspondence may be addressed. E-mail: [email protected] or [email protected]. fi metal abundance. Speci cally, the anoxic and reducing Archean This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. ocean would have been enriched in Fe, Mn, and Co, yet low in 1073/pnas.0912491107/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.0912491107 PNAS | June 8, 2010 | vol. 107 | no. 23 | 10567–10572 Downloaded by guest on September 25, 2021 impeding interpretation over evolutionary history. Here, we ex- binding protein families have been reported in the larger eukaryotic pand upon this previous study by examining a larger set of ele- proteomes (23), and the α >1 power-law scaling extends those trends ments and introduce a temporal scale. The relative timing for the to both akaryotic superkingdoms. evolution of metal-binding protein structures, both those exclu- A comparison of the sum of all metal-binding domains and sive and promiscuous in metal choice, was determined using proteome size reveals a remarkable superkingdom-independent a phylogeny of protein architectures that has been developed and scaling (Fig. 1). This finding indicates that the overall gain and refined over the past 7 years (18, 19). The occurrences and loss of metal-binding domains is a fundamental and constant rate abundances of all of the diverse Fe-, Zn-, Mn-, Co-, Ni-, Cu-, for all of life. Furthermore , the slope of >1 indicates that the Mo-, and Ca-binding protein families in 313 genomes were also metal-binding proportion of a proteome increases with expan- determined. By examining the intersections of these datasets, we sions in proteome size; only ∼8.5% of the small proteomes of identify several genomic fossils that illuminate how trace-metal parasitic Bacteria bind metals, whereas >25% of the largest chemistry has constrained evolution. Environmental concen- mammalian proteomes is metal binding (Fig. 1). One caveat trations of trace metals influenced what type of metal-binding might be the lack of complete proteome annotation; 60 to 65% proteins evolve. The invention of the protein machinery needed of a given proteome can be structurally characterized with the to solve the equilibrium puzzle of the Irving-Williams Series SUPERFAMILY HMMs, yet the structural genomics initiative coincided with the birth of metal-binding electron-transfer pro- suggests that the nonannotated portions of proteomes have teins. Finally, based upon proteome content, protein-domain a similar proportion of metal-binding domains (24). evolution, and , it seems likely that low Zn, Mo, and Cu concentrations prevented the widespread emergence and Diversity of Metal-Binding Compromises. The different trends for diversification of the eukaryotic superkingdom until the advent each metal and superkingdom add up to a shared universal of a planet-wide shift in redox state. trend, indicating the need for diverse compromises. To visualize these compromises, we plotted the percent of a proteome that Results binds the three most abundantly used metals (Fe, Zn, and Ca) Overview. The evolutionary units in our study are protein struc- against the total number of protein domains in a proteome for fi tural domains: more speci cally, compact and independently each superkingdom (Fig. 2). In general, Bacteria eschew Ca for fi folding 3D architectures. The structural classi cation of proteins Fe and Zn, but the choice between the latter two metals de- (SCOP) uses structural similarity to group domain fold-struc- pends upon proteome size. Bacteria with small proteomes tures into fold superfamilies (FSF), which are one or more contain a higher proportion of Zn-binding domains, most of evolutionarily related protein fold-families (FF) with little se- them involved in tRNA synthesis, transcription, and translation quence similarity. FFs are sequence clusters nested within each fi (17). The essentiality of many of these protein domains means FSF (20). SCOP (20) provides a hierarchical classi cation of all that the retention in small genomes is not surprising, yet the protein domains published in the Protein Data Bank (PDB) (21) concomitant exclusion of Fe-binding domains is striking. In and version 1.69 of SCOP sorts 70,800 domains into 945 defined contrast, larger Bacterial proteomes tend to contain a higher folds that are assigned to 1,539 FSFs, which are further sub- proportion of Fe-binding domains. Similar trends are observed divided into 2,845 FFs. In the simplest scenario, a protein con- for archaeal proteomes (Fig. 2). Eukaryotic proteomes always tains one domain belonging to a FF and its parent FSF. More contain a greater proportion of Zn-binding structures relative to complex proteins contain multiple domains from one or more FF and FSF. Here, we examine the distribution of metal-binding those that bind Fe. Ca-binding proteins are often more abun- FFs in extant genomes, simultaneously examining the temporal dant than Fe-binding proteins, particularly in larger eukaryotic evolution of the parent category, the FSFs. proteomes (Fig. 2). Despite these prevailing trends, a great di- versity of compromises exists within each superkingdom. Es- Trends of Composition in the Different Superkingdoms. sentially, life has chosen diversely from the pool of available The , defined here as the proteomic complement of Ca-, metalloenzymes, but selecting one element requires the re- Zn-, Fe-, Mn-, Cu-, Co-, Mo-, and Ni-binding FFs, of 313 genome- linquishment of another. encoded proteomes representing the three superkingdoms, were examined relative to the total number of protein domains assigned to Emergence of Metal-Binding Protein Structures. A genome-based a genome, which is directly proportional to genome size. In terms of phylogeny of protein architectures at the FSF level that described sheer proteome proportion, Fe-, Zn-, and Ca-binding domains are the most abundant, followed by Cu and Mn, then finally Co, Ni, and Mo (Table S1). However, as reported previously, the abundances of 100000 Archaea Fe-, Zn-, Mn-, and Co-binding protein domains scale to proteome Bacteria size as a power law, with different slopes for each metal and super- Eukarya 10000 kingdom (Fig. S1 and Table S1). This scaling indicates that a raw Archaea model percentage is a misrepresentation of proteome abundance. The Bacteria model Eukarya model power-law scaling is weak (Fig. S1), yet Ni- and Mo-binding structures 1000 are clearly more abundant in akaryotic relative to eukaryotic pro-

teomes, although both only comprise a minuscule portion of most 100

proteomes (Table S1). Cu-binding domains are in similar abun- metal-binding Total dance in all three superkingdoms (Table S1), a result consistent with domains in a proteome a previous survey conducted using sequence-based methods (22). 10 Many akaryotic proteomes contain only one or no Cu-binding protein

domains, preventing a robust estimation of a power-law slope, where- 1 as eukaryotic proteomes exhibit a preferential accumulation of Cu- 100 1000 10000 100000 binding domains with increasing proteome size (Fig. S1 and Table Total domains in a proteome S1). Ca-binding domains within a proteome scale with power-law Fig. 1. Universal power-law scaling in metallomes. The total number of slopes of greater than one for all three superkingdoms (Fig. S1 and metal-binding domains relative to the total number of structural domains Table S1), although Archaea and Eukarya have significantly (P < assigned to a given proteome is shown in log-log form, as are the fitted 0.01) greater slopes than Bacteria. Large-scale expansions of Ca- power laws from Table S1.

10568 | www.pnas.org/cgi/doi/10.1073/pnas.0912491107 Dupont et al. Downloaded by guest on September 25, 2021 ABC 100 Bacteria Archaea Eukarya

10

1 Fe

% of all protein domains Zn Ca Fig. 2. Metallomic compromises. The percent of the All metals 0.1 proteome that binds Zn, Fe, Ca, and the sum of the 2.5 3 3.5 4 33.23.43.63 3.5 4 4.5 5 three metals is shown for (A) Bacteria, (B) Archaea, and Log (protein domains) ~ Log (genome size) (C) Eukarya.

the evolution of the protein world was previously reconstructed and describes the rise of lineages in an emerging and complex by Wang et al. (19), and the relative age of each metal-binding organismal world. architecture was determined from this phylogenomic tree. For The nd increments are by a linear scale, yet equal nd visualization, the age of each FSF (node distance, nd) is displayed steps do not necessarily correspond to equal time steps. To as the number of nodes from the hypothetical ancestor in the tree better interpret the structural evolution data, we aligned this on a relative 0 to 1 scale, with nd = 0 representing the birth of the structural phylogeny with a generalized geochemical and physical protein universe and nd = 1.0 representing the most recent fossil record at two points. First, rising in the structural innovation (Fig. 3). The exact order of closely posi- at 2.4 GYA (14) would result in a physiological necessity for the tioned FSFs is potentially debatable in trees of this size, but oxidative defense- dismutase and (Fig. 3), and the FSFs for these enzymes evolved at nd = 0.326 trends across the phylogeny are certainly robust and informative and 0.364, respectively. Second, the appearance of the first (25). For example, Wang et al. examined the evolution of protein fi nd fi EVOLUTION fi Eukarya-speci c structure ( = 0.614) was aligned with the rst architectures speci c to each superkingdom and delineated the appearance of unambiguous physical fossils of eukaryotic life entire phylogeny into three epochs (19). FSFs ubiquitous to life (1.6 GYA) (Fig. 3) (26). The anchor points allow a delineation of fi fi nd – arose during the rst epoch (architectural diversi cation; =0 the structural phylogeny into gross geochemical epochs, which 0.391), and these core architectures catalyze much of modern align remarkably well with the protein evolution epochs desig- , at least in their modern manifestation (19). The nated by Wang et al. (19). Metalloenzyme evolution was pre- second epoch (superkingdom specification; nd = 0.391–0.61) dominantly cladogenic, with numerous metal-binding structures describes the evolution of protein structures at a time when first appearing very close to each other within the phylogeny (Fig. 3). lineages emerged and superkingdoms diversified. The third and Over 80% of cambialistic (multiple metal binding; Methods) and final epoch (organismal diversification; nd = 0.61–1.0) contains ambiguous (may bind no or alternative metals; Methods) metal- a host of protein architectures specific and ubiquitous to Eukarya binding architectures evolved during the first epoch, predating

Atmospheric O2 100 (% present day) 1-10 -3 <<10

4-2.4 GYA ~2.4-0.5 GYA 0.5-0 GYA High Fe, Mn, Co, Ni Less Fe, Mn, Co, Ni Lowest Fe, Low Zn, Cu, Mo Lowest Zn, Cu Mn,Co,Ni Generalized Increased Mo Highest Cu, ocean Coastal pockets of oxic (high Zn, Cu) Zn, Mo

chemistry Sporadic fluctuations between anoxia and euxinia GOE~2.4 GYA GOE~2.4 First Euk.fossils ~1.6 GYA Catalase FSF Fe/Mn SOD FSF First Euk. FSF 1

0.8

0.6 Fe FSFs Mn FSFs Zn FSFs 0.4 Cu FSFs Ca FSFs 0.2 Mo FSFs Cambialistic FSFs

Ambiguous FSFs Proportion of FSFs evolved at nd at evolved FSFs of Proportion 0 0 0.2 0.4 0.6 0.8 1 nd from of structural phylogeny

Fig. 3. The appearance of metal-binding FSFs over the course of protein evolution and the correlation with oceanic geochemistry. (Top and Middle)A summary of the current consensus on the oxygen content of the atmosphere and the metal chemistry of the global over geological time adapted from Lyon and Reinhard (14) and Saito et al. (9). (Bottom) The temporal evolution of several classes of metal-binding FSFs. Each point is a single metal-binding FSF, with the x axis indicating the time of appearance on a protein architecture phylogeny and the y axis indicating the total proportion of each category that has appeared up to that time. The three panels were aligned at two anchor points by a pair of transitional arguments detailed in the main text.

Dupont et al. PNAS | June 8, 2010 | vol. 107 | no. 23 | 10569 Downloaded by guest on September 25, 2021 both the appearance of unambiguous metal-binding FSFs and presented here provides the first extensive temporal support for the first loss of a FSF by a superkingdom (Fig. 1). The cam- previous transitional argument hypotheses (15–17). bialistic FSFs contain an ensemble of metal-specific FFs. Es- sentially, these FSFs are 3D scaffolds that have achieved diver- Emergence of the Akaryotes, Metal Homeostasis, and Biogeochemical sified metal-binding through alternations to the predominantly Cycles. Cambialistic metal-binding protein architectures capable catalytic metal-binding . The cambialistic FSFs contain of binding multiple metals appeared earlier than metal-specific FFs that bind Fe, Zn, Mn, or Ni, with a few binding Mo, Ca, or counterparts, and the trend from promiscuity to specificity in Co (Table S2). The ambiguous FSFs contain FFs that either bind metal-binding parallels general patterns observed in evolution of or lack metals. modern metabolism (27). The hypothesis that a functional metal Around nd = 0.3, two cambialistic FSFs prominently involved homeostasis system would be necessary for the first steps in in metal homeostasis evolved: the helical backbone metal re- akaryotic evolution has been argued using transitional arguments ceptor FSF (nd = 0.34, SCOP ccs c.92.2, involved in Fe, Zn, and (3), and the concomitant appearances of structures involved in Mn uptake and sensing) and the heavy metal-associated domain metal homeostasis, metal-specific architectures, and the diver- FSF (nd = 0.321, SCOP ccs d.58.17, involved in Cu and Zn ho- sification of the akaryotic lineages provides the first genomic meostasis). These FSFs contain multiple metal-specific FFs, are evidence of this. Before this coordinated evolution, organic central to nearly all characterized metal-homeostasis systems, and cofactors able to transport protons had already evolved (28), yet are found in nearly all akaryotic proteomes in high copy number. when coupled with metal-binding electron transport proteins, The emergence of these metal homeostasis FSFs coincides with redox transformations become possible. The earliest evolving the appearance of many metal-binding FSFs, including those Fe-, Mo-, and Cu-binding protein architectures are intrinsically unambiguous and cambialistic in metal choice (Fig. 3). This ap- involved in and the transformations of C, N, S, pearance occurs just before the predominant period of diver- and O (Table S3). In the environment, organisms metabolize the sification of the akaryotic superkingdoms (19). The majority of products of other organisms through interconnected bio- the emerging architectures bind metals abundant in the Archean geochemical cycles that attain a steady state resistant to further ocean (9), and most Fe (>60%) and Mn-binding (100%) struc- change (29–31). The coordinated appearance of many of the tures appear between 0.3 and 0.6 nd (Fig. 3). Structures that bind metal-binding electron transfer proteins provided the potential Fe in mixed Fe-S clusters evolve before those that bind Fe through for the emergence of a nascent biogeochemical network, in- rings or direct bonds (Fig. S2A). In contrast, creasing the number of ecological niches for emergent lineages despite their early appearance, most (>60%) Zn-binding FSFs of Bacteria and Eukarya. If the evolution of metal-binding evolve after 0.6 nd, a pattern shared by Cu- and Ca-binding FSFs electron-transfer proteins and a biogeochemical network were (Fig.3). Most of the late evolving Zn-FSFs use Zn as a structural indeed closely linked, parallel measurements of the isotopic (Fig S2B). fractionation of trace metals (Fe) and biogeochemical elements (N, S) of fossils of ancient microbial communities should reveal Relative Age of the Metallome in the Three Superkingdoms. We punctuated and concomitant variations. determined the percent of proteomes in each superkingdom that contain each Fe-, Zn-, and Ca-binding FSF and compared it to the and the Evolution of Eukarya. The Zn power-law scaling, par- age of the structure. The earliest evolving metal-binding protein ticularly the intercept, shows that the last common ancestor of structures are found within most proteomes of Archaea, Bacteria, Eukarya certainly contained more Zn-binding proteins than an and Eukarya, regardless of the metal-bound (Fig. S3). Structures equivalent akaryote (Table S1). Even modern Eukarya that in- that evolved during the second epoch appear in a smaller per- habit anoxic environments, such as Giardia, have substantially centage of modern proteomes in each superkingdom, again in more Zn-binding proteins than akaryotes (7.3% of the proteome). metal-independent fashion (Fig. S3). To a large extent, extant The eukaryotic proteomes contain most of the early evolving Zn Bacteria eschew the architectures that evolved during the third proteins, likely acquired through endosymbiotic gene transfer (19, epoch. Archaea share with Eukarya a late-evolving set of six Zn- 32), yet are the sole carriers of late-evolving Zn-binding proteins binding architectures involved in transcription or translation (17), (Fig. S3), where the metal is predominantly structural in nature but otherwise contain few metal-binding structures that evolved (Fig. S2). An examination of the subcellular localization of metal- after 0.6 nd. In contrast, eukaryotes increasingly incorporate binding proteins within eukaryotic cells reveals an enrichment of metal-binding structures that evolved during the third epoch, Zn-binding domains targeted to the nucleus (Fig. S4), a result cor- particularly those that bind Ca and Zn (Fig. S3). Most of the Fe- roborated by the observation of elevated concentrations of Zn in binding structures that evolve after 0.6 nd are also found pre- the nuclei of marine flagellates (33). This eukaryote-specific uti- dominantly in Eukarya (Fig. S3). In summary, although the lization of Zn even came at the expense of Fe utilization (Fig. 2). modern akaryotes still predominantly use ancient metal-binding In the geologic record, the eukaryotic fossils from the Proterozoic protein structures, most eukaryotes use both early- and late- ocean were found in regions that were likely coastal (26), where evolving structures. a combination of photosynthesis in shallow water and O2-induced continental weathering likely resulted in increased Zn and Cu Discussion and Conclusions concentrations. The dependence of eukaryotes on Zn and the pre- Influence of Ocean Chemistry on the Selection of the Elements. Of sumably low bioavailability of Zn during the Proterozoic (9) suggests the various metals, Fe, Mn, and Mo were preferentially selected that low environmental Zn concentrations may have limited eu- by early biological life (Fig. 3), with many of the FSFs binding karyotic diversification. The other biogeochemical features of the these metals appearing before the Great Oxidation Event (GOE). Proterozoic ocean invoked as limiting factors to eukaryotic evolu- In contrast Cu and Zn utilization evolved after the GOE when tion are low Mo and high sulfide concentrations (12, 34). The former these metals were available in at least some environments. The would place a limit on global fixation; the latter is presumed implication is that the geochemistry of the Archean ocean influ- to have been toxic and either may have spatially confined the eu- enced both the evolution of metal-binding architectures and the karyotic evolution to what was coincidentally high Zn environment, selection of elements by the ancestors of Archaea and Bacteria. with a subsequent bias for Zn during structural innovations. The early evolution of the low redox-state Fe-S-containing Currently, the timing of eukaryotic evolution is contentious structures (Fig. S2) and delayed emergence of structures binding and the unique utilization of Zn by Eukarya might provide a rich Fe directly using amino acid side chains, bolsters this conclusion. avenue of future research. Concrete fossilized remains are only The analysis of metal-binding protein distribution and evolution found 1.6 GYA, and organic biomarkers placing an emergence at

10570 | www.pnas.org/cgi/doi/10.1073/pnas.0912491107 Dupont et al. Downloaded by guest on September 25, 2021 2.7 GYA (35) have recently been called into doubt (36). In the slight. Modern akaryotes, like marine cyanobacteria, can grow structural phylogeny, the first eukaryotic-specific FSF appears continuously at concentrations of Zn approaching those of the much later than superoxide dismutases and (Fig. 3), Archean ocean, whereas eukaryotes cannot (9, 42, 43). enzymes that would have been necessary for life to tolerate in- A more important question is how the progenitors of diversified creased oxygen. This finding supports a temporal lag between the life coped with a dramatic four to five order-of-magnitude decline GOE and eukaryotic emergence, although molecular clock in Zn concentrations accompanying the onset of oceanic euxinia studies will be required to verify this. Eukaryotes are known to at the Archean–Proterozoic transition. Chemical modeling stud- fi fractionate Zn during high af nity uptake (37), and an exami- ies suggest that, although uncomplexed Zn was likely very low, nation of the isotopic record of Zn may be a powerful ancillary aqueous Zn-S species were certainly present (9) and microbial approach to polarizing this fundamental argument. Zn acquisition might have proceeded through a production and scavenging strategy similar to that employed for Fe acquisi- Ca, Co, Mo, and the Evolution of Eukarya. Ca concentrations are tion in the modern world. Alternatively, early life might have relatively unaffected by redox chemistry, seeming to remove turned to cambialism, using a different metal in what are modern- a geochemical explanation for the observed late evolution of Ca- binding domains (Fig. 3). Like the Zn-binding proteins, many of day Zn metalloenzymes. In the exopeptidases, Mn or Co can act the late-evolving Ca-binding FSFs are eukaryotic-specific(Fig. as a substitute (44, 45). can substitute for Zn in RNA S3) and may have evolved to address a eukaryotic cellular need. polymerase without a dramatic loss in functionality, as shown in β fi Specifically, Ca, with its high coordination number and affinity ref. 46. Structural proteins like Zn- ribbons and Zn ngers also for hard , is ideally suited for signaling between organelles bind Co, although the DNA binding activity of a Co-bound Zn or cells (3, 38). Supporting this hypothesis, the most abundant finger is much less efficient relative to the Zn-bound form (47). In Ca-binding domains in Eukarya are predominantly involved in each of these scenarios, the Irving-Williams series dictates that signaling within or between cells (Table S4). The proteomes of with increased Zn bioavailability, the other metal would be dis- the akaryotes that do have multiple organelles, Planctomycetes, placed. The utility of Zn fingers likely increased following the are also enriched in Ca-binding domains (10% of the proteome), change in global Zn bioavailability, possibly prompting a burst in bolstering the hypothesis that this utilization is related to cellular the innovation of proteins like Zn fingers (48) and quickening the complexity and not environment. diversification and rise of Eukarya. Essentially, the eukaryotic Cu and Mo bioavailability in the oceans would have paralleled strategy of substituting Co for a majority of Zn requirements (42, EVOLUTION that of Zn (9) and also likely influenced eukaryotic evolution. The 49) and the scavenging of metals using ligand export and retrieval Cu-binding cupredoxin FSF (b.6.1, nd = 0.34) evolved early and, may be vestigial abilities from the high sulfide Proterozoic, as although necessary for a complete global nitrogen cycle, only select suggested by Saito et al. (9). akaryotic lineages contain this FSF. In contrast, this FSF is ubiq- uitous in eukaryotic proteomes, suggesting a preference for Cu in Methods electron transport. Therefore, to a lesser extent, environmental Cu Data Sources. Libraries of hidden Markov models built from structural align- concentrations may have impeded eukaryotic evolution. Mo con- mentsoftheFSFsandFFsareavailablefromtheSuperfamilydatabaseandallow centrations increased following the transition to an oxic ocean the conversion of a proteome sequence into a complement of structural pro- (39); however, in direct contrast to Zn, akaryotes dedicate a larger tein architectures (50, 51). Release 1.69 of Superfamily covers 313 complete proportion of their proteome to binding Mo relative to Eukarya genomes (23 Archaea, 233 Bacteria, and 57 Eukarya) and was the source for all (Table S1). All of the Mo-binding structures emerged before the domain assignments. The Superfamily database employs a hybrid approach; first eukaryotic FSF (Fig. 3) and catalyze biogeochemical trans- a hidden Markov model-searching protocol and subsequent pairwise com- × −2 formations of N, such as nitrogen fixation and reduction, parisons with a probability cutoff of E =2 10 for identifying likely members fi reactions that are the provenance of akaryotes (30). In contrast, of a group; it also provides a con dence level (in the form of an E value) for every candidate identified. As was done in Yang et al. (52), a more stringent E many eukaryotes acquire nitrogen though ingestion of other − value cutoff of 10 4 was employed here for domain assignments. organisms or by establishing symbiotic relationships. The influence of Mo on evolution is likely indirect; the overall passage of nitro- Annotation of Metal Binding. Each FSF in the SCOP database was manually gen through biogeochemical systems and thus pro- examined for structures containing a covalently bound metal . This process ductivity depends upon Mo bioavailability (12, 40). On a global requires examining FFs, domains, and specific PDB structures within each FSF. scale, increased Mo concentrations could have facilitated higher If all of the representative structures in a FSF or FF bind the same metal, then rates of N-fixation, releasing nutrient limitation of , the family is considered metal specificor“unambiguous.” The manual increasing global carbon flow, and thus allowing for the energeti- curation of the SCOP revealed that most (∼80–90%) of the metal-binding FFs cally expensive eukaryotic cell. and their cognate domains bind a specific metal unambiguously (17), but the same cannot be said for the FSFs and the introduction of some descriptive Influence of Changing Zn on Metal Utilization. Recently, Mulkidja- terminology is required before proceeding further. Approximately 65% of nian and Galperin suggested that the Zn-binding proteins and the FSFs contain FFs binding the same metal, but the other 35% exhibit one RNA structures found in all of modern life must have been of two deviations from this behavior (17). “Ambiguous” refers to FSFs where present in the last universal common ancestor (41). Invoking the not all of the FFs bind a metal, whereas cambialistic FSFs have FFs binding hypothesized Zn concentrations in the Archean ocean (9), they different metals. Some of the cambialistic and ambiguous FSFs are the concluded that the last universal common ancestor must have largest in the SCOP in terms of the number of FFs they contain (Table S2). evolved near metal-rich hydrothermal vents (41). Our results do Cambialistic and ambiguous FSFs and the metals they bind are listed in Table not support this hypothesis. Within our structural phylogeny, most S2; all unambiguous metal-binding FFs and FSFs are listed in Table S5. of the early-evolving Zn-binding protein domains are in cam- Analysis of Metal-Binding Domain Abundance. Abundance matrices for each bialistic and ambiguous FSFs (Fig. 3), and the progenitor struc- specific metal-binding FF in the individual proteomes were constructed, with tures may have bound a different metal or no metal at all. After rows representing a specific proteome and columns representing architectures. the discovery of metal homeostasis, life predominantly turned to To determine the power-law fitting, the data were log-transformed and the Fe and Mn (Fig. 3). Three early-evolving unambiguous Zn- linear fit and curve statistics were calculated using a geometric mean least- binding FSFs are found in almost all extant akaryotic organisms: squares fitting technique. The slopes for each power law were compared using the Zn-dependent exopeptidases (c.56.5, nd = 0.125), the Zn-β ANOCOVA by testing the hypothesis that the slopes were equal. The ubiquity of ribbon (g.41.3, nd = 0.29), and RNA polymerase (e.29.1, nd = a FSF or FF is defined here as the percentage of proteomes within a superking- 0.27), yet the physiological ramifications of this appear to be dom that contain a specificFForFSF.AllcomputationswereconductedinMatlab.

Dupont et al. PNAS | June 8, 2010 | vol. 107 | no. 23 | 10571 Downloaded by guest on September 25, 2021 Phylogenies of Protein Architectures. A phylogenomic tree of protein archi- tive to the rest of the protein universe was obtained directly from the tree tecture reconstructed using a global genomic census of protein-domain of all FSFs. We chose this tree and not updated versions to allow direct structure at the FSF level in 185 genomes spanning 19 Archaea, 129 Bacteria, comparison of evolutionary inferences of metal-binding and proteome di- and 37 Eukarya (19) was used to generate a timeline of architectural dis- versification and because FSF functions are well annotated (25). covery. The tree was generated from FSF architectural abundance and was rooted by polarizing characters with a model based on two fundamental ACKNOWLEDGMENTS. We thank Minglei Wang for help with tree recon- premises: (i) that protein structure is far more conserved than sequence and structions and Ken Nealson for a critical reading of the manuscript. This work carries a strong phylogenetic signal, and (ii) that architectures that are was supported by a director’s discretionary fund grant from the National popular in proteomes are more ancient. As with any phylogenetic method, Aeronautics and Space Administration National Astrobiology Institute (to C.L.D.) and National Science Foundation Grant MCB-0749836 and US Depart- the validity of character argumentation can be contentious but this has been ment of Agriculture-Cooperative State Research Education and Extension discussed in detail elsewhere (25). The age (nd) of metal-binding FSFs rela- Service Grant Hatch Illu-802-314 (to G.C.-A.).

1. Rosenzweig AC (2002) Metallochaperones: Bind and deliver. Chem Biol 9:673–677. 29. Kump LR (2004) Scientists Debate Gaia, eds Schneider SH, Miller JR, Crist E, Boston PJ 2. Andreini C, Bertini I, Cavallaro G, Holliday GL, Thornton JM (2008) Metal ions in (MIT press, Cambridge), pp 93–100. biological : From databases to general principles. J Biol Inorg Chem 30. Falkowski PG, Fenchel T, Delong EF (2008) The microbial engines that drive ’s 13:1205–1218. biogeochemical cycles. Science 320:1034–1039. 3. Williams RJP, Frausto da Silva JJR (2006) The Chemistry of Evolution: The 31. Volk T (2004) Scientists Debate Gaia, eds Schneider SH, Miller JR, Crist E, Boston PJ Development of Our Ecosystem (Elsevier, Amsterdam). (MIT press, Cambridge), pp 27–36. 4. Irving H, Williams RJP (1948) Order of stability of metal complexes. Nature 162: 32. Embley TM, Martin W (2006) Eukaryotic evolution, changes and challenges. Nature – 746 747. 440:623–630. 5. Tottey S, et al. (2008) Protein-folding location can regulate -binding versus 33. Twining BS, Baines SB, Vogt S, de Jonge MD (2008) Exploring ocean biogeochemistry – - or zinc-binding. Nature 455:1138 1142. by single-cell microprobe analysis of protist elemental composition. J Eukaryot 6. Waldron KJ, Robinson NJ (2009) How do bacterial cells ensure that Microbiol 55:151–162. – get the correct metal? Nat Rev Microbiol 7:25 35. 34. Johnston DT, Wolfe-Simon F, Pearson A, Knoll AH (2009) Anoxygenic photosynthesis 7. Williams RJP, Fraústo Da Silva JJ (2003) Evolution was chemically constrained. J Theor modulated Proterozoic oxygen and sustained Earth’s middle age. Proc Natl Acad Sci Biol 220:323–343. USA 106:16925–16929. 8. Anbar AD (2008) Oceans. Elements and evolution. Science 322:1481–1483. 35. Brocks JJ, Logan GA, Buick R, Summons RE (1999) Archean molecular fossils and the 9. Saito MA, Sigman DM, Morel FMM (2003) The bioinorganic chemistry of the ancient early rise of eukaryotes. Science 285:1033–1036. ocean: The co-evolution of cyanobacterial metal requirements and biogeochemical 36. Rasmussen B, Fletcher IR, Brocks JJ, Kilburn MR (2008) Reassessing the first cycles at the Archean-Proterozoic boundary? Inorganica Chemica Acta 356:308–318. appearance of eukaryotes and cyanobacteria. Nature 455:1101–1104. 10. Bekker A, et al. (2004) Dating the rise of atmospheric oxygen. Nature 427:117–120. 37. John SG, Geis RW, Saito MA, Boyle EA (2007) Zinc isotopic fractionation during high- 11. Arnold GL, Anbar AD, Barling J, Lyons TW (2004) isotope evidence for fi fi widespread anoxia in mid-Proterozoic oceans. Science 304:87–90. af nity and low-af nity zinc transport by the marine diatom Thalassiosira oceanica. – 12. Anbar AD, Knoll AH (2002) Proterozoic ocean chemistry and evolution: A bioinorganic Limnol Oceanogr 52:2710 2714. bridge? Science 297:1137–1142. 38. Vardi A, et al. (2006) A stress surveillance system based on and nitric oxide in 13. Reinhard CT, Raiswell R, Scott C, Anbar AD, Lyons TW (2009) A late Archean sulfidic marine diatoms. PLoS Biol 4:e60. stimulated by early oxidative weathering of the continents. Science 326:713–716. 39. Scott C, et al. (2008) Tracing the stepwise oxygenation of the Proterozoic ocean. 14. Lyons TW, Reinhard CT (2009) Early Earth: Oxygen for heavy-metal fans. Nature 461: Nature 452:456–459. 179–181. 40. Glass JB, Wolfe-Simon F, Anbar AD (2009) Coevolution of metal availability and 15. Williams RJP (1997) The natural selection of the chemical elements. Cell Mol Life Sci nitrogen assimilation in cyanobacteria and algae. Geobiology 7:100–123. 53:816–829. 41. Mulkidjanian A, Galperin M (2009) On the origin of life in the Zinc world. 2. 16. Ji H-F, Chen L, Jiang Y-Y, Zhang H-Y (2009) Evolutionary formation of new protein Validation of the hypothesis on the photosynthesizing zinc sulfide edifices as cradles folds is linked to metallic recruitment. Bioessays 31:975–980. of life on Earth. Biol Direct 4:27. 17. Dupont CL, Yang S, Palenik B, Bourne PE (2006) Modern proteomes contain putative 42. Sunda WG, Huntsman SA (1995) Co and Zn interreplacement in marine phy- imprints of ancient shifts in trace metal geochemistry. Proc Natl Acad Sci USA 103: toplankton: Evolutionary and ecological implications. Limnol Oceanogr 40:1404– 17822–17827. 1417. 18. Caetano-Anollés G, Caetano-Anollés D (2003) An evolutionarily structured universe of 43. Saito MA, Moffett JW, Chisholm SW, Waterbury JB (2002) Cobalt limitation and – protein architecture. Genome Res 13:1563 1571. uptake in Prochlorococcus. Limnol Oceanogr 47:1629–1636. 19. Wang M, Yafremava LS, Caetano-Anollés D, Mittenthal JE, Caetano-Anollés G (2007) 44. Heese D, Berger S, Röhm K-H (1990) Nuclear magnetic relaxation studies of the role of Reductive evolution of architectural repertoires in proteomes and the birth of the the metal ion in Mn2(+)-substituted I. Eur J Biochem 188:175–180. – tripartite world. Genome Res 17:1572 1585. 45. Maric S, et al. (2009) The M17 leucine of the malaria parasite 20. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural classification Plasmodium falciparum: Importance of active site metal ions in the binding of of proteins database for the investigation of sequences and structures. J Mol Biol 247: substrates and inhibitors. 48:5435–5439. 536–540. 46. Speckhard DC, Wu FY-H, Wu C-W (1977) Role of the intrinsic metal in RNA polymerase 21. Berman HM, et al. (2000) The Protein Data Bank. Nucleic Res 28:235–242. from Escherichia coli. In vivo substitution of tightly bound zinc with cobalt. 22. Andreini C, Banci L, Bertini I, Rosato A (2008) Occurrence of copper proteins through Biochemistry 16:5228–5234. the three domains of life: A bioinformatic approach. J Proteome Res 7:209–216. 47. Predki PF, Sarkar B (1994) Metal replacement in “zinc finger” and its effect on DNA 23. Morgan RO, Martin-Almedina S, Iglesias JM, Gonzalez-Florez MI, Fernandez MP binding. Environ Health Perspect 102(Suppl 3):195–198. (2004) Evolutionary perspective on annexin calcium-binding domains. Biochim 48. Aravind L, Iyer LM, Koonin EV (2006) Comparative genomics and structural biology of Biophys Acta 1742:133–140. – 24. Shi W, et al. (2005) Metalloproteomics: High-throughput structural and functional the molecular innovations of eukaryotes. Curr Opin Struct Biol 16:409 419. annotation of proteins in structural genomics. Structure 13:1473–1486. 49. Price NM, Morel FMM (1990) and cobalt substitution for zinc in a marine – 25. Caetano-Anollés G, Wang M, Caetano-Anollés D, Mittenthal JE (2009) The origin, diatom. Nature 344:658 660. evolution and structure of the protein world. Biochem J 417:621–637. 50. Gough J (2006) Genomic scale sub-family assignment of protein domains. Nucleic 26. Knoll AH, Javaux EJ, Hewitt D, Cohen P (2006) Euykaryotic organisms in Proterozoic Acids Res 34:3625–3633. oceans. Philos Trans R Soc Lond B Biol Sci 361:1023–1038. 51. Gough J, Karplus K, Hughey R, Chothia C (2001) Assignment of homology to genome 27. Caetano-Anollés G, et al. (2009) The origin and evolution of modern metabolism. Int J sequences using a library of hidden Markov models that represent all proteins of Biochem Cell Biol 41:285–297. known structure. J Mol Biol 313:903–919. 28. Ji H-F, Chen L, Zhang H-Y (2008) Organic cofactors participated more frequently than 52. Yang S, Doolittle RF, Bourne PE (2005) Phylogeny determined by transition metals in redox reactions of primitive proteins. Bioessays 30:766–771. content. Proc Natl Acad Sci USA 102:373–378.

10572 | www.pnas.org/cgi/doi/10.1073/pnas.0912491107 Dupont et al. Downloaded by guest on September 25, 2021