Metabolic evolution of a deep-branching hyperthermophilic chemoautotrophic bacterium

Rogier Braakman
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Eric Smith
Krasnow Institute for Advanced Study, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

(Dated: September 19, 2013)

Aquifex aeolicus is a deep-branching hyperthermophilic chemoautotrophic bacterium restricted to hydrothermal vents and hot springs. These characteristics make it an excellent model system for studying the early evolution of metabolism. Here we present the whole-genome metabolic network of this organism and examine in detail the driving forces that have shaped it. We make extensive use of phylometabolic analysis, a method we recently introduced that generates trees of metabolic phenotypes by integrating phylogenetic and metabolic constraints. We reconstruct the evolution of a range of metabolic sub-systems, including the reductive citric acid (rTCA) cycle, as well as the biosynthesis and functional roles of several amino acids and cofactors. We show that A. aeolicus uses the reconstructed ancestral pathways within many of these sub-systems, and highlight how the evolutionary interconnections between sub-systems facilitated several key innovations. Our analyses further highlight three general classes of driving forces in metabolic evolution. One is the duplication and divergence of genes for enzymes as these progress from lower to higher specificity, improving the kinetics of certain sub-systems. A second is the kinetic optimization of established pathways through fusion of enzymes, or their organization into larger complexes. The third is the minimization of the ATP unit cost to synthesize biomass, improving thermodynamic efficiency. Quantifying the distribution of these classes of innovations across metabolic sub-systems and across the tree of life will allow us to assess how a tradeoff between maximizing growth rate and growth efficiency has shaped the long-term metabolic evolution of the biosphere.

arXiv:1309.4467v1 [q-bio.MN] 17 Sep 2013

Introduction

Metabolism lies at the heart of cellular physiology, acting as a chemical transformer between environmental inputs and components of biomass. Identifying rules and principles that underlie metabolic architecture can thus provide important insights into how basic properties of chemistry and physics constrain living systems. Of particular relevance to understanding the chemical history of the biosphere is the foundational layer of autotrophic metabolism, which fixes CO2 and ultimately provides the ecological support to all forms of heterotrophy.

The merits of this view [1] were highlighted in a recent study on the early evolution of carbon-fixation pathways, which concluded that environmentally-driven innovations in this process underpin most of the deepest branches in the tree of life [2]. To extend our analysis of the early evolution of metabolism and of autotrophy, we present here a whole-genome reconstruction of the metabolic network of Aquifex aeolicus. A. aeolicus is a chemoautotroph, deriving both biomass and energy from inorganic chemical compounds, and is one of the deepest-branching and most thermophilic known bacteria [3]. Deep-branching clades restricted to hydrothermal vents are generally considered to contain some of the most conservative metabolic features as a result of the high degree of long-term stability provided by these environments [4].

While A. aeolicus has been the focus of substantial experimental efforts (see Ref. [5] for a review), it has not been characterized nearly as extensively as other model systems for which highly curated metabolic models exist. In addition, the inherent uncertainty of genome annotation from sequence alone [6, 7], while overall significantly improving for next-generation methods [8], is compounded by the deep-branching position and extremophile lifestyle of this organism. Metabolic reconstruction protocols generally rely on heuristic rules to deal with the inevitable network “gaps” that result from misannotation or the presence of genes of unknown function. Such protocols tend to perform well in predicting basic aspects of phenotype, such as growth rate, particularly for well-studied organisms [9, 10], but it is less clear what level of confidence to assign them when the focus is the evolution of specific metabolic sub-systems. Moreover, reconstructing an individual metabolic network requires substantial effort and provides only a single “snapshot” of an evolutionary process that has played out over several billion years.

For these reasons we utilize phylometabolic analysis (PMA) [2] to guide the reconstruction of the metabolic network of A. aeolicus from its genome [11]. PMA generates trees of functional metabolic networks (i.e. phenotypes) by integrating metabolic and phylogenetic reconstructions. The power of PMA derives from a simple yet versatile constraint: the continuity of life in evolution. Since metabolic pathways are the supply lines of monomers from which all life is constructed, the continuity of life requires that at the ecosystem level some pathway to a given universal metabolite must be complete in any evolutionary sequence across different parts of the tree of life. The distribution of metabolic genes in different pathways to given metabolites, within and across clades, thus informs the most likely completions in individuals, while distributions of pathways suggest the evolutionary sequences that connect them (see also Methods). We recently introduced PMA to reconstruct the evolutionary history of carbon-fixation, relating all extant pathways to a single ancestral form [2]. Here we show the versatility of this approach, using it to reconstruct the complete whole-genome metabolic network of an individual species, while further examining the evolutionary driving forces that have shaped the network.

As we will show, A. aeolicus synthesizes a significant fraction of its biomass through metabolic pathways that appear to represent conserved forms of the ancestral pathways to those metabolites. This is relevant in debates on the position of this organism within the tree of life. Initial phylogenetic studies based on 16S rRNA suggested that the Aquificales represent potentially the deepest branch within the bacterial domain [12, 13]. Later studies of conserved insertion-deletions (indels) in a range of proteins led to the conclusion that Aquificales are instead a later branch more closely related to ε-proteobacteria [14], but this was subsequently found to be likely the result of substantial horizontal gene transfer (HGT) between these two clades [15]. It has since become clear that Aquificales and ε-proteobacteria represent the dominant clades of primary producers near hydrothermal vents [16], and ecological association is now understood to be a major driver of HGT [17]. Together this appears to have restored some consensus on the very deep-branching position of A. aeolicus [15, 18], which is further supported by our analysis of its metabolism.

Innovations in metabolic evolution

Our analysis highlights three classes of innovations in metabolic evolution. One of these is gene duplication and divergence along a progression from low substrate-specificity to high substrate-specificity enzymes, driven by selection for improved kinetics. It has long been argued that early enzymes had broader substrate affinities than modern enzymes, with greater potential to promiscuously catalyze homologous reactions in earlier times [19–22]. Broad affinity of ancestral enzymes is thought to have facilitated evolutionary adaptation by providing a ‘background’ of biochemical pathways, initially proceeding at lower rates. If such background pathways produced advantageous products, they could then have been incorporated en bloc into metabolism [19], initially through increased enzyme expression levels and eventually through duplication and divergence leading to more specific enzymes [23].

While selection for improved kinetics has probably lowered the overall occurrence of promiscuous enzymes, metabolism maintains a substantial degree of promiscuity. For example, E. coli mutants in which an essential metabolic pathway was knocked out have been observed to recruit an alternate pathway from parts normally used for other functions to maintain growth [24]. In addition to promiscuity in the binding of alternate substrates while local functional group transformation is preserved (substrate promiscuity), promiscuity can also occur through the catalysis of alternate reaction mechanisms (catalytic promiscuity). The form of promiscuity most frequently encountered in extant cells appears to be substrate promiscuity [25]. In general one might expect that the inherent “messiness” of enzymatic chemistry leads to a cost-benefit tradeoff in the evolution of substrate specificity, where complete specificity is difficult to achieve and moreover disadvantageous because it would decrease the capacity for future adaptation [23].

In particular one would expect this tradeoff to be different for core processes, where a higher mass flux can significantly amplify the benefits of improved kinetics, versus more peripheral processes that have lower mass flux. In keeping with this expectation, it is found that substrate promiscuity tends to increase toward the metabolic periphery [26], while reaction rate constants of enzymes tend to increase toward the metabolic core [27]. The idea that selection for kinetics has determined the degree of specificity of a pathway’s enzymes raises the intriguing possibility that prior to selection for increased substrate specificity, homologous reaction sequences could have initially been catalyzed by the same set of promiscuous enzymes [19, 28, 29], allowing earlier metabolisms with greater abundances of such homologous sequences to be controlled with smaller genomes.

We will discuss the evolution of several sub-networks in the metabolism of A. aeolicus that provide illustrations of these general principles. We will show that compared with later branching autotrophs A. aeolicus uses a greater abundance of repeated parallel chemical sequences catalyzed by enzymes with high sequence similarity, which could have initially been catalyzed by a smaller number of more promiscuous enzymes. We will further highlight how cost-benefit tradeoffs of improving kinetics have played out differently in core versus peripheral pathways.

A second class of innovations concerns the fusion of enzyme subunits into larger single-subunit enzymes. Enzyme fusion increases the effective concentrations of reactants at the active sites for subsequent reactions within a sequence, thus increasing the throughput rate of the sequence [30]. In particular when intermediates within pathway sequences are not used elsewhere in metabolism, the fusion of associated enzymes would appear to have potentially little cost, but significant kinetic benefits for pathways with high mass-flux densities. Accordingly, studies have generally shown that gene fusion events significantly outnumber gene fission events in evolution [31, 32]. Similarly, the organization of enzymes into multi-subunit enzymes, or even super-complexes, can, in addition to facilitating novel reaction mechanisms, also be considered to improve the kinetics of metabolism by increasing effective concentrations of intermediates. While this is not a central focus of the current study, we will highlight several key reaction sequences that in A. aeolicus are catalyzed by multi-subunit enzymes that in later branching organisms are known to be fused, reflecting Aquifex’s more primitive metabolism.

A final important class of innovations occurs at the level of pathways. In some cases organisms may have access to multiple different pathways which produce certain metabolites, with different pathways having different unit costs of required ATP hydrolysis. Recent work has suggested that lowering the ATP cost, thereby increasing overall thermodynamic efficiency, of pathways involved in CO2-fixation was a major evolutionary driving force that resulted in several major early branches in the tree of life [2]. Here we show additional divergences that appear to be driven by increasing thermodynamic efficiency.

These three classes of innovations are all different expressions, at different levels, of a more general evolutionary tradeoff, in which either the efficiency or the rate of growth is maximized. For example, for heterotrophic organisms it is observed that in the presence of resource competition cells will use fermentative metabolic modes and maximize the rate of ATP production, while in the absence of competition cells will use respiration to maximize the yield of ATP production [33]. Because ATP hydrolysis drives biosynthesis, increasing its production rate will increase growth rate, while increasing ATP yield will increase growth efficiency.

A second aspect of metabolism where this growth rate/efficiency tradeoff is expressed is not in the production of ATP, but in the structure of the biosynthetic machinery driven by its subsequent hydrolysis. For example, improving kinetics of metabolic sequences through increased substrate specificity or gene fusion contributes to improving growth rate, while lowering ATP cost by choosing alternative pathways contributes to improving efficiency. The relative mass contributions of different pathways or reactions are likely a critical determinant in the balance of benefit vs. cost of such innovations in different parts of metabolism.

Cost-benefit tradeoffs in the structure of biosynthesis are likely to be of particular importance in autotrophic metabolism. Because (some) heterotrophs can switch between fermentative and respiratory strategies for given organic inputs, they have access to much larger variability in the rate of ATP production than do autotrophs. Autotrophs generally use purely respiratory metabolic modes to interconvert pairs of inorganic compounds and are thus more constrained on the ATP production side. Since autotrophs form the metabolic basis for ecosystems, the same will tend to hold for aggregate metabolic networks at that level. The biosphere is globally autotrophic, and especially on the longest time-scales of selection we may thus expect that just as new inorganic energy sources were being explored to increase total inputs to ecosystems, both the rate and efficiency of metabolic processes were simultaneously being optimized where possible. Even if individuals have access to, and can switch between, alternate strategies and lifestyles, both those individuals and the ecosystem to which they belong would benefit from improved kinetics and efficiency of their metabolisms. The present reconstruction of the metabolism of A. aeolicus provides an excellent testbed to start cataloging and exploring these ideas.

Materials and Methods

The basic principles of phylometabolic analysis (PMA) [2] are outlined in Fig. 1. The effectiveness and flexibility of PMA for studying metabolic evolution arise directly from the strongly synergistic potential of integrating metabolic and phylogenetic constraints into a single approach. For example, if an individual annotated genome leads to a putative metabolic network that is not viable due to gaps in all known pathways that connect given sub-networks, then without experimental evidence it may be very unclear how to proceed in the curation process. As shown in panel A, placing that same putative network in phylogenetic context and comparing metabolic gene profiles within and across clades often clearly suggests the proper gap-fills. We will show many such examples of the ways PMA guides the reconstruction process for the metabolic network of A. aeolicus.

The same genomic surveys that help interpret the character of individual phenotypes lead also to a distribution of metabolic pathways across the tree of life, as shown in panel B. PMA exploits the existence of universal topological bottlenecks in metabolism: metabolites through which all pathway alternatives in a given sub-system must pass to allow biosynthesis. Imposing a requirement on reconstructions that all extant and ancestral metabolic states supply those bottleneck metabolites then allows us to transform the distributions of pathways into specific evolutionary sequences. These sequences can be represented as phylometabolic trees, as shown on the right in panel B.

In comparison to phylogenetics, which reconstructs evolutionary trees through analyzing presence/absence of genes or the sequence similarity of genes, PMA thus generates trees by analyzing the presence/absence of pathways or the similarity of (sub)networks. Moreover, because each node in a phylometabolic tree represents a distinct functional phenotype with an explicit internal chemical structure, the comparison between nodes can often identify clear physical-chemical driving forces underlying divergences [2]. For A. aeolicus this helps us identify the key evolutionary driving forces that have shaped its metabolism.
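The bottleneck requirement can be illustrated with a short sketch. This is not the authors' implementation: the function name, the node and pathway labels ("P", "Q"), and in particular the edge rule (a parent and child node must share at least one pathway, so that a replacement passes through a state in which both pathways are present) are assumptions chosen here as one simple way to encode continuity of metabolite supply.

```python
from typing import Dict, FrozenSet, List, Tuple


def continuity_violations(
    pathways_at_node: Dict[str, FrozenSet[str]],
    edges: List[Tuple[str, str]],
) -> List[str]:
    """List violations of a simplified continuity constraint on a tree.

    pathways_at_node maps each extant or ancestral node to the set of
    complete pathways it has to one bottleneck metabolite.
    edges lists (parent, child) pairs of the tree.
    """
    violations = []
    # Every node must supply the bottleneck metabolite by some pathway.
    for node, pathways in pathways_at_node.items():
        if not pathways:
            violations.append(f"{node}: no complete pathway")
    # Assumed edge rule: parent and child share at least one pathway.
    for parent, child in edges:
        shared = pathways_at_node[parent] & pathways_at_node[child]
        if pathways_at_node[parent] and pathways_at_node[child] and not shared:
            violations.append(f"{parent}->{child}: replacement without overlap")
    return violations


# Hypothetical trees; all labels are placeholders.
consistent_tree = {
    "root": frozenset({"P"}),
    "intermediate": frozenset({"P", "Q"}),
    "leaf_1": frozenset({"Q"}),
    "leaf_2": frozenset({"P"}),
}
consistent_edges = [
    ("root", "intermediate"),
    ("intermediate", "leaf_1"),
    ("root", "leaf_2"),
]

broken_tree = {
    "root": frozenset({"P"}),
    "leaf": frozenset({"Q"}),
    "dead_end": frozenset(),
}
broken_edges = [("root", "leaf"), ("root", "dead_end")]

print(continuity_violations(consistent_tree, consistent_edges))
print(continuity_violations(broken_tree, broken_edges))
```

In the second tree the sketch flags both the node with no pathway and the direct switch from "P" to "Q". A real analysis would track several bottleneck metabolites at once and would treat nodes as ecosystem-level states rather than single organisms.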

[Figure 1. Panel A: a gap between metabolites X and Y with candidate completions. Panel B: ancestral pathway and phylometabolic tree.]

FIG. 1: Principles of Phylometabolic Analysis. Panel A shows how phylogenetic distributions of pathways help interpret the curation of individual metabolic networks. In this case comparison of metabolic gene profiles suggests the orange pathway represents the correct completion. Panel B shows how pathway distributions also suggest evolutionary sequences. Imposing continuity of metabolite production at the ecosystem level allows us to represent those sequences as phylometabolic trees, in which each node represents a functional phenotype with an explicit internal chemical structure.
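The gap-filling logic of panel A can be sketched as a ranking of candidate pathways by how well they are supported in related genomes. The code below is a minimal illustration under stated assumptions, not the procedure used in the study: the scoring (number of relatives with a complete pathway, then completeness in the target genome), the function names, and every gene and pathway label are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def completeness(genes: Set[str], enzymes: List[str]) -> float:
    """Fraction of a pathway's enzymes that are annotated in a genome."""
    if not enzymes:
        return 0.0
    return sum(1 for enzyme in enzymes if enzyme in genes) / len(enzymes)


def rank_gap_fills(
    target_genes: Set[str],
    relatives: Dict[str, Set[str]],
    candidates: Dict[str, List[str]],
) -> List[Tuple[str, float, int]]:
    """Rank candidate pathways as (name, completeness in target, clade support)."""
    ranked = []
    for name, enzymes in candidates.items():
        in_target = completeness(target_genes, enzymes)
        # Clade support: relatives in which every enzyme of the pathway is found.
        support = sum(
            1 for genes in relatives.values() if completeness(genes, enzymes) == 1.0
        )
        ranked.append((name, in_target, support))
    # Assumed ordering: clade support first, then completeness in the target.
    ranked.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return ranked


def missing_enzymes(target_genes: Set[str], enzymes: List[str]) -> List[str]:
    """Enzymes that would need to be gap-filled in the target genome."""
    return [enzyme for enzyme in enzymes if enzyme not in target_genes]


# Hypothetical data: two candidate completions of one gap.
candidates = {"orange": ["o1", "o2", "o3"], "grey": ["g1", "g2"]}
target = {"o1", "o3", "g1"}
relatives = {
    "relative_a": {"o1", "o2", "o3"},
    "relative_b": {"o1", "o2", "o3", "g1"},
    "relative_c": {"o1", "o3"},
}

ranked = rank_gap_fills(target, relatives, candidates)
best = ranked[0][0]
print(ranked)
print(best, "needs", missing_enzymes(target, candidates[best]))
```

Here the "orange" pathway is complete in two relatives and two-thirds complete in the target, so it ranks first and its one missing enzyme becomes the proposed gap-fill. Candidates flagged this way would still need the BLAST-based checks described in the text before being accepted.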

A critical aspect of PMA is that the evolutionary sequences represented by phylometabolic trees are best understood at the ecosystem level. Enforcing continuity in the overall connectedness of metabolism requires the existence of evolutionary intermediates in which two pathways are co-present before one may replace another (see Fig. 1). However, such intermediary states are not required to exist within a single organism, as an individual may give up the ability to use one pathway before evolving or incorporating (through gene transfer) a second to replace it, so long as in the interim it can take up the required bottleneck intermediate from an ecosystem that continues to produce it. Moreover, since in PMA we generally model metabolic sub-systems without explicitly considering aspects of regulation or cellular integration, the approach itself also does not distinguish between individuals and ecosystems. The nodes in a reconstructed tree thus capture the sequence of dependencies in the innovation of pathways, whether these occurred within single species or in syntrophic consortia. Nevertheless, the metabolism of individual species may still contain most or all of the network contained in individual nodes of a phylometabolic tree. This is particularly true for autotrophs (such as A. aeolicus), which generate all biomass components directly from inorganics and are in a sense “ecosystems within individuals”, and thus form important model systems for reconstructing the long-term evolution of metabolism.

Our reconstruction of a whole-metabolism model for A. aeolicus is based on an initial network obtained from the Model SEED server [34], an automated pipeline that generates genome-scale metabolic models directly from genome annotations. After first modifying the nutrient and biomass compositions of the model to accurately capture the boundary conditions that define the overall A. aeolicus phenotype (see supplemental info for additional details), the internal network was curated, using criteria of phylometabolic consistency to evaluate proposed completions of key network gaps.

In practice, PMA was implemented by surveying the annotated genomes of a large number of archaea and bacteria across the tree of life at the online Uniprot database [35]. Other repositories may be used, but we chose Uniprot because it derives from the manually curated Swissprot database, which is known to have among the lowest error rates in gene annotation [7]. We then performed an exhaustive and redundant search for all relevant genes, using naming variants and EC classification numbers, which code for the enzymes across a set of metabolic pathways that define a metabolic sub-system. When all enzymes of a metabolic pathway are identified within a single genome, then that pathway is counted as present in that species, and totals are tabulated across clades. The high rate at which new genomes are being sequenced is reflected in the fact that the number of members in clades differs in our tabulations for different sub-systems, even though analyses were in some cases only performed a few weeks apart.

To further correct for misannotated or unannotated genes, the above searches were complemented by additional BLAST searches using the built-in capabilities on the Uniprot website. As we will highlight with several examples, in many cases BLAST searches will identify groups of genes coding for the same enzyme at the clade level. Especially when organisms contain closely related enzymes that because of high sequence similarity are misannotated (often as the same gene), comparisons using both sequence similarity and enzyme length will often separate those enzymes into clear groups at the clade level. This in turn increases the chances that individual enzymes within those clade-level groups have been experimentally studied within an individual member of that clade, helping to anchor the functional annotation of each of the enzyme groups.

Results and Discussion

Aquifex aeolicus provides an interesting reference point for minimal deep-branching hyperthermophilic bacterial metabolism. Our reconstruction for this organism contains 756 reactions and 729 metabolites. In comparison, the recently reconstructed network of Thermotoga maritima (like A. aeolicus thought to be one of the deepest-branching and most thermophilic bacteria [18]) contains 562 reactions and 503 metabolites [36]. Most of the difference in size results from the ways fatty acid metabolism is represented. In the T. maritima network, lipid metabolism is represented using many composite reactions, resulting in only 27 reactions and 27 metabolites in this sub-network [36]. By contrast, we explicitly represent each reaction in lipid synthesis, and because the lipid composition of A. aeolicus is diverse, this results in nearly 200 reactions and 200 metabolites (see also the section on lipid biosynthesis below). However, fatty acid biosynthetic machinery is highly flexible and modular, with chains of diverse length and substitution patterns generated by a very small set of proteins [37]. It should further be noted that T. maritima in fact also has a diverse mixture of chain length and substitution patterns in its lipid content [38], but that the simpler composition of E. coli was chosen in defining its biomass for reconstruction [36]. Thus, taking this representation difference into account, the metabolic networks of A. aeolicus and T. maritima may be reasonably representative of minimal bacterial metabolism found near hydrothermal vents.

Our primary goal was to accurately reconstruct the pathway structure of the metabolism for the purpose of understanding its evolution. We therefore did not significantly modify the biomass vector of the model (with fatty acids as the major exception), or general assignments of genes to enzymes from the initial model obtained from the SEED server. This work should thus be considered a qualitative first step toward building a comprehensive computational organism model for A. aeolicus. As additional experimental data become available, including in particular detailed data on the biomass composition under different growth conditions, they will provide further benchmarks to make this model increasingly quantitative. Nevertheless, this reconstruction contains a wealth of data on basic principles of metabolic evolution, as we will discuss in detail, starting from core carbon-fixation.

Carbon-fixation and the initiation of anabolism

Aquifex aeolicus uses a unique and previously unrecognized form of carbon-fixation. It has been known for more than two decades that A. aeolicus uses the reductive citric acid (rTCA) cycle to fix most of its carbon [39, 40]. However, our study on the evolution of carbon-fixation led us to conclude that in parallel this organism uses an incomplete form of the reductive acetyl-CoA (Wood-Ljungdahl, WL) pathway to produce a small sub-set of biomass components [2]. This hybrid form of carbon-fixation was recognized through a broad survey of the biosynthetic pathways leading to glycine and serine. We identified substantial gaps in each of the conventionally recognized pathways to these compounds across many deep-branching clades. Instead these organisms often possess the complete gene complement for an alternate folate-based reductive C1 pathway that is also used as part of WL. Indeed, the suggested existence of widespread hybrid carbon-fixation strategies using a partial WL pathway was one of the key insights that allowed the reconstruction of a phylometabolic tree of carbon-fixation through fully autotrophic intermediates [1, 2].

The complete carbon-fixation strategy reconstructed for A. aeolicus is shown in Fig. 2. The major fraction of carbon is fixed through the rTCA cycle, producing its intermediates acetyl-CoA, pyruvate, oxaloacetate, and α-ketoglutarate (highlighted in blue), from which anabolic pathways to most components of biomass subsequently radiate [41]. In parallel, reductive folate chemistry transforms CO2 into C1 units of different oxidation states, from which a small number of additional anabolic pathways are initiated. Formyl groups are used in the production of purine, methylene groups are used in the production of glycine and serine, as well as thymidylate and CoA, while methyl groups are donated to S-adenosyl methionine (SAM), which mediates methyl-group chemistry [42]. This combined carbon-fixation strategy thus lacks only a single reaction relative to the reconstructed root of carbon-fixation, in which the rTCA cycle and the WL pathway are fully integrated [2]. The A. aeolicus metabolic phenotype is lacking only the final synthesis of acetyl-CoA within WL, which is catalyzed by one of the most oxygen-sensitive enzymes in the biosphere [43].

FIG. 2: Carbon-fixation in Aquifex aeolicus. The main fixation pathway is the reductive citric acid (rTCA) cycle, from which most anabolic pathways are initiated. Reductive folate chemistry is a secondary fixation pathway from which an additional small set of anabolic pathways is initiated. Relative to the reconstructed root of carbon-fixation, in which the rTCA cycle and Wood-Ljungdahl pathway are fully integrated [2], this hybrid strategy employed by A. aeolicus lacks only the grey-dashed reaction (acetyl-CoA synthesis). Molecules highlighted in blue represent the “pillars of anabolism”, TCA intermediates from which the vast majority of anabolic pathways have been initiated throughout evolution [41]. Highlighted in green is succinyl-CoA, which forms a precursor to pyrroles through a later derived pathway in some organisms (but not Aquifex). Highlighted in red are reaction sequences involving the same local functional group transformation that in A. aeolicus are catalyzed by closely related enzymes in both halves of the rTCA cycle. Green dashed arrows highlight alternate pathway sequences catalyzed by a single enzyme in other clades.

The specific form of formate uptake by folate remains to be elucidated. A. aeolicus lacks the gene for 10-formyl-THF (tetrahydrofolate) synthetase, which catalyzes the attachment of formate at the N10 position of THF in the acetogenic version of WL [42]. However, a broad genomic survey across deep branches of the tree of life nonetheless strongly supports the use of a partial WL pathway over other alternatives to produce glycine and serine in A. aeolicus [2].

Formate incorporation in this organism is most likely catalyzed by an unrecognized N10-uptake protein, or through an unrecognized sequence involving attachment of formate at the N5 position of THF [2]. This alternate route was suggested by the very wide distribution across the tree of life of a gene for an ATP-dependent 5-formyl-THF cycloligase, present in the genomes of many organisms that like A. aeolicus appear to use a partial WL pathway to produce glycine and serine, but which lack the (also ATP-dependent) 10-formyl-THF synthetase. 5-formyl-THF cycloligase catalyzes the cyclization of 5-formyl-THF to methenyl-THF and has so far only been suggested to be part of a futile cycle, but its precise physiological role has long remained unclear [44, 45]. The suggested use of an N5 uptake route for formate is appealing not only for these reasons, but also because it would provide an evolutionary intermediate between N5 formate uptake in the methanogenic version of WL and N10 formate uptake in the acetogenic version of WL.

While it has been recognized that most anabolic pathways originate from TCA intermediates, there exists some variation in this set of intermediates across clades. In particular, many organisms derive pyrroles (precursors to the family that includes heme and chlorophyll) from α-ketoglutarate through what is known as the C5 pathway, while α-proteobacteria and mitochondria derive pyrroles from succinyl-CoA [46]. This had previously already led to the suggestion that pyrrole synthesis from succinyl-CoA is a later evolutionary innovation [47], which is confirmed by genomic surveys. Table I shows that the pathway from succinyl-CoA is almost completely absent from deep-branching bacterial and archaeal clades, while the C5 pathway is nearly universally distributed. Thus, across most of the tree of life, and for most of metabolic evolution, carbon-fixation has fed into anabolism through only 4 TCA intermediates.
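Counts of the kind reported in Table I follow from the tabulation rule given in Methods: a pathway is counted as present in a species only when all of its enzymes are identified in that genome, and totals are then summed per clade. A minimal sketch of that rule is given below. The data layout, the function name, and all species, clade, and enzyme labels are hypothetical placeholders, and the numbers are not those of the study.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tabulate_pathways(
    genomes: Dict[str, Tuple[str, Set[str]]],
    pathways: Dict[str, List[str]],
) -> Dict[str, Dict[str, int]]:
    """Count, per clade, the genomes surveyed and the genomes with each pathway.

    genomes maps species -> (clade, set of annotated enzyme labels).
    pathways maps pathway name -> list of enzymes that define it.
    """
    clade_sizes: Dict[str, int] = defaultdict(int)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for species, (clade, genes) in genomes.items():
        clade_sizes[clade] += 1
        for name, enzymes in pathways.items():
            # Present only if every enzyme of the pathway is annotated.
            if all(enzyme in genes for enzyme in enzymes):
                counts[clade][name] += 1
    table = {}
    for clade, size in clade_sizes.items():
        row = {"genomes": size}
        for name in pathways:
            row[name] = counts[clade][name]
        table[clade] = row
    return table


# Hypothetical survey of two entry routes to one product.
pathways = {
    "route_1": ["enz_1a", "enz_1b"],
    "route_2": ["enz_2a"],
}
genomes = {
    "species_1": ("clade_X", {"enz_1a", "enz_1b"}),
    "species_2": ("clade_X", {"enz_1a"}),  # incomplete, not counted
    "species_3": ("clade_Y", {"enz_1a", "enz_1b"}),
    "species_4": ("clade_Y", {"enz_1a", "enz_1b", "enz_2a"}),
    "species_5": ("clade_Y", {"enz_2a"}),
}

table = tabulate_pathways(genomes, pathways)
for clade, row in table.items():
    print(clade, row)
```

Because an incomplete pathway is simply not counted, this rule is sensitive to misannotation, which is why the searches were complemented with BLAST comparisons as described in Methods.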

Domain             C5     Aminolevulinate
Archaea (133)      108    0
Bacteria∗ (345)    290    5

TABLE I: Distribution of entry sequences to pyrrole biosynthesis. Notes: ∗ Includes Aquificales, Thermotogales, Firmicutes (Bacillales + Clostridia), Chloroflexi, Chlorobiales, Cyanobacteria, Nitrospirae, Deinococcales, Verrucomicrobia, Planctomycetes.

Evolution of the rTCA cycle

Variants in the form of the rTCA cycle used within the Aquificales provide insights into the evolutionary driving forces that have shaped this pathway. Recent studies in Hydrogenobacter thermophilus, which together with A. aeolicus belongs to the Aquificaceae family within Aquificales, identified novel enzymes for several rTCA reactions. The cleavage of citrate to acetyl-CoA and oxaloacetate, and the reductive carboxylation of α-ketoglutarate to isocitrate, each conventionally recognized as single enzyme reactions, are in H. thermophilus both catalyzed in two steps by distinct enzymes also found in the A. aeolicus genome (see Fig. 2) [48–51]. This Aquificaceae variant of the rTCA cycle has an increased degree of symmetry, and reveals previously unrecognized homology relations in enzymes, which recapitulate the similarity in the local-group chemistry of their substrates.

The newly discovered citryl-CoA synthase and α-ketoglutarate biotin carboxylase enzymes catalyze local functional group chemistry that is homologous to their counterparts succinyl-CoA synthase and pyruvate biotin carboxylase (reactions 1, 1′ and 3, 3′ in Fig. 2), respectively. Moreover, both sets of homologous enzymes have high sequence similarity, and were originally annotated as the same enzymes [48, 50]. Similarly, pyruvate and α-ketoglutarate ferredoxin:oxidoreductases also catalyze homologous chemistry (reactions 2, 2′), and due to high sequence similarity were again annotated as the same enzyme in A. aeolicus [11].

These observations are striking in light of the discussion on metabolic innovations in the introduction. A hypothesis of a minimal ancestral metabolism with promiscuous enzymes catalyzing homologous reactions in parallel is consistent with a highly symmetric rTCA variant used by members of the Aquificaceae. Aquificales are the deepest-branching clade using this pathway to fix carbon [52], and as mentioned their exclusive association with hydrothermal vents may result in some of the most conservative metabolic features [4].

Indeed, the new enzymes identified in the Aquificaceae have been argued to represent the ancestral rTCA enzymes [40, 49, 53]. The single step ATP citrate lyase found in most rTCA bacteria is suggested to have arisen through a gene fusion of the second sub-unit of citryl-CoA synthase with citryl-CoA lyase [40, 48, 49], while the combination of α-ketoglutarate biotin carboxylase plus isocitrate dehydrogenase (ICDH) is suggested to have been replaced by a single ICDH with increased substrate specificity [53].

To further explore the evolutionary driving forces that underlie the evolution of the rTCA cycle, we used H. thermophilus enzymes as a benchmark to survey the genomes of all Aquificales, as well as other clades using the rTCA cycle. Table II shows the distributions of each of the newly identified high homology rTCA enzymes across all Aquificales genomes.

Family  Species                ICDH  BC-LS PYR  BC-LS AKG  BC-SS PYR  BC-SS AKG  CS-LS SUC-  CS-LS CIT-  CS-SS SUC-  CS-SS CIT-
I       H. thermophilus        421   617 (100)  652 (100)  475 (100)  472 (100)  383 (100)   429 (100)   293 (100)   344 (100)
I       A. aeolicus            426   614 (81)   655 (78)   477 (76)   472 (82)   385 (77)    436 (78)    305 (84)    368 (84)
I       T. albus               419   614 (81)   653 (90)   475 (83)   472 (91)   382 (81)    432 (87)    295 (87)    341 (89)
I       H. sp Y04AAS1          421   619 (69)   638 (74)   475 (69)   472 (79)   382 (74)    423 (78)    291 (80)    329 (77)
II      S. azorense            421   614 (65)   647 (74)   482 (67)   472 (73)   388 (63)    ∗∗ (31)     293 (72)    ∗∗ (39)
II      P. marinus             747   613 (67)   ∗∗ (50)    478 (64)   ∗∗ (56)    388 (65)    ∗∗ (31)     292 (69)    ∗∗ (38)
II      S. sp YO3AOP1          746   616 (66)   ∗∗ (47)    476 (66)   ∗∗ (56)    389 (64)    ∗∗ (30)     293 (72)    ∗∗ (39)
III     D. thermolithotrophum  735   616 (57)   ∗∗ (50)    472 (58)   ∗∗ (54)    389 (62)    ∗∗ (31)     300 (70)    ∗∗ (37)
III     T. ammonificans        735   618 (59)   ∗∗ (49)    472 (59)   ∗∗ (54)    389 (63)    ∗∗ (30)     300 (70)    ∗∗ (37)

TABLE II: Distribution of rTCA enzymes in Aquificales. Numbers in each column represent the length of the enzyme in amino acid residues, and the numbers in parentheses are the sequence similarities relative to the H. thermophilus version of those enzymes. Entries ∗∗ are for species that have only one copy of that enzyme class, which is aligned with the type to which it has the greatest sequence similarity (the sequence similarity for the alternate alignment is also shown for comparison). Families: I - Aquificaceae, II - Hydrogenothermaceae, III - Desulfurobacteriaceae. Abbreviations: ICDH, isocitrate dehydrogenase; BC, biotin carboxylase; CS, CoA synthase; LS, large subunit; SS, small subunit; PYR, pyruvate; AKG, α-ketoglutarate; SUC-, succinyl-; CIT-, citryl-.

Two clear groups of Aquificales are distinguished by the rTCA variants they use. All members of the Aquificaceae possess two biotin carboxylase (BC) and two CoA synthase (CS) enzymes. By contrast, all members of the Hydrogenothermaceae and Desulfurobacteriaceae possess only single copies of these enzyme types, except S. azorense, which possesses two BC enzymes. In addition, in terms of both sequence similarity and length, both sub-units of the single BC enzyme in Hydrogenothermaceae and Desulfurobacteriaceae best match pyruvate (rather than α-ketoglutarate) BC in Aquificaceae. Similarly, the single CS enzyme in these families best matches succinyl-CoA synthase in Aquificaceae. Finally, all Hydrogenothermaceae and Desulfurobacteriaceae are known to possess ATP citrate lyase, while all except S. azorense possess an ICDH enzyme that is significantly larger than the version found in Aquificaceae. This difference in length is consistent with the suggestion that an increased substrate specificity of the ICDH allowed it to supplant the combined function of the BC and more primitive ICDH enzymes [53]. This survey shows how sequence length of enzymes can be a useful secondary source of information, after sequence similarity, in functional annotation. Thus, we conclude that all Aquificaceae possess the highly symmetric rTCA variant shown in Fig. 2, while other Aquificale families mostly possess the more conventional rTCA variant.

Clear evolutionary driving forces can be identified that connect the different extant forms of the rTCA cycle. The gene fusion within ATP citrate lyase increases the effective concentration of citryl-CoA in the subsequent cleavage reaction, and can thus be understood to improve the kinetics of the rTCA cycle. This gene fusion would appear to have low cost, not requiring evolution of an additional compensatory pathway to release citryl-CoA as a free intermediate, since it is not used elsewhere in metabolism.

Detailed thermodynamic analyses have in turn identified two classes of rTCA reactions that in isolation would require ATP hydrolysis to proceed: carboxylation reactions and carboxyl reduction reactions [54]. These costs can be avoided in a number of ways. Coupling carboxyl reductions to subsequent carboxylations through thioester intermediates (e.g. succinyl- or acetyl-CoA) allows the combined sequence to be driven by a single ATP hydrolysis, while coupling endergonic to subsequent exergonic reactions can eliminate the ATP cost of the combined sequence [54]. The replacement of the combination of α-ketoglutarate BC and the lower-specificity ICDH with a single high-specificity ICDH falls into this latter category. The α-ketoglutarate BC reaction in Aquificaceae costs 1 ATP per carbon incorporated [50], while the subsequent reduction to isocitrate is highly exergonic, resulting in a nearly-reversible sequence without ATP cost when the reactions are coupled [41, 54, 55]. This adaptation can thus be seen as driven by improving the thermodynamic efficiency of the rTCA cycle.

An additional factor that could have further increased the ATP savings of using a single higher-specificity ICDH to generate isocitrate from α-ketoglutarate is that, like other β-ketoacids, oxalosuccinate is subject to spontaneous decarboxylation. The lifetime of oxaloacetate (also a β-ketoacid) in water may be estimated at about 3 min at 90 °C (figure based on ΔH‡ = 17.2 kcal/mol and k25°C ≈ 2.8 × 10⁻⁵ s⁻¹ from Ref. [56]), and oxalosuccinate is more unstable. Metal ions and amines are known, at least for oxaloacetate, to further enhance the rate of spontaneous decarboxylation [56]. The decarboxylation of oxalosuccinate back to its precursor α-ketoglutarate would introduce a futile cycle that may have raised the average cost of the forward carboxylation reaction above 1 ATP.

Surveys in the genomes of other clades using the rTCA cycle (data not shown) are consistent with these suggested adaptations. Chlorobiales, a later branching photoautotrophic clade in which the rTCA cycle was originally described [57], “Candidatus Nitrospira defluvii”, a member of the Nitrospirae in which the rTCA cycle was recently discovered through metagenome analysis [58], as well as Sulfurimonas, an ε-proteobacterial family that uses the rTCA cycle [59], all use the less-symmetric rTCA variant. Their genomic features associated with the rTCA cycle are similar to those of the Hydrogenothermaceae and Desulfurobacteriaceae. All of these species possess a gene for ATP citrate lyase, have single copies of BC and CS enzymes, and have ICDH enzymes with a length of around 730–740 amino acid residues.

Together these observations can be used to reconstruct a phylometabolic tree of rTCA variants, which in turn represents a branch of increased resolution on a more general phylometabolic tree of carbon-fixation [2]. This tree is shown in Fig. 3. The hypothesized root node, shown on the left, consists of the symmetric rTCA variant, but catalyzed by a single set of enzymes for both arcs. Duplication and divergence toward greater enzyme specificity through selection for improved kinetics then gives rise to the symmetric rTCA variant found in Aquificaceae, while gene fusion in ATP citrate lyase combined with an increased substrate specificity of ICDH gives rise to a second divergence that leads to the conventional rTCA cycle. A comparison of ATP citrate lyase genes suggested that Aquificale families using it obtained this enzyme through HGT from ε-proteobacteria, and that the Chlorobiale version represents the ancestral ATP citrate lyase [40]. Thus, the initial divergence between rTCA variants may have been between Aquificales and Chlorobiales, with two of the Aquificale families later joining the conventional rTCA group through HGT, as a result of ecological association with ε-proteobacteria.
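The estimate of roughly 3 min for the lifetime of oxaloacetate at 90 °C can be checked with a short calculation. The sketch below is only one plausible way to arrive at such a figure: it assumes an Eyring-type temperature dependence with a temperature-independent ΔH‡ and ΔS‡, and takes the lifetime to be 1/k. The function name `eyring_rate` and the choice of 298.15 K and 363.15 K for 25 °C and 90 °C are illustrative assumptions; only ΔH‡ and the rate constant at 25 °C come from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def eyring_rate(k_ref, t_ref, t, dh_act):
    """Extrapolate a first-order rate constant from t_ref to t (both in K).

    Assumes (hypothetically) that the activation enthalpy dh_act (kcal/mol)
    and the activation entropy do not change with temperature, so that
    k(T) / k(T_ref) = (T / T_ref) * exp(-(dH/R) * (1/T - 1/T_ref)).
    """
    return k_ref * (t / t_ref) * math.exp(-dh_act / R_KCAL * (1.0 / t - 1.0 / t_ref))


# Values quoted in the text (from Ref. [56]): rate at 25 C and activation enthalpy.
k_25 = 2.8e-5          # s^-1
dh_activation = 17.2   # kcal/mol

k_90 = eyring_rate(k_25, 298.15, 363.15, dh_activation)
lifetime_min = 1.0 / k_90 / 60.0  # mean lifetime 1/k, converted to minutes

print(f"k(90 C) = {k_90:.2e} s^-1, lifetime = {lifetime_min:.1f} min")
```

Under these assumptions the lifetime comes out at a few minutes, the same order as the value quoted above. Dropping the T/T_ref prefactor (a plain Arrhenius form) changes the result only slightly.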

[Figure 3 graphic: tree of rTCA variants. Branch labels: duplication + divergence (kinetics); fusion and functional takeover (kinetics + thermodynamics, alkalinity). Leaves: Aquificaceae; Hydrogenothermaceae + Desulfurobacteriaceae; Chlorobiales; Nitrospirae; ε-Proteobacteria.]

FIG. 3: Phylometabolic tree showing the evolution of the rTCA cycle. A combination of improving kinetics (which increases growth rate) and improving thermodynamics (which increases growth efficiency) explains both divergences. For the first divergence, duplication and divergence toward higher substrate specificity of enzymes improves kinetics. For the second divergence, replacing enzymes 3′ + 4′ by a higher specificity version of enzyme 4 removes a cost of 1 ATP hydrolysis in fixing CO2, while fusion of one sub-unit of enzyme 1 with the enzyme for the subsequent cleavage reaction improves kinetics. Green boxes represent homologous reactions catalyzed by enzymes with high sequence similarity, while purple boxes represent homologous reactions catalyzed by members of the same enzyme families. Reactions 5′ and 5∗ are catalyzed by the same enzyme. Differences in sequence divergence between green and purple enzymes may reflect differences in complexity of the reactions; see text for further discussion. The yellow node represents acetyl-CoA, the blue node represents oxaloacetate, and the red node represents succinyl-CoA. Dark blue arrows indicate the direction of mass through pathways.
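The genomic criteria used above to separate the two extant rTCA variants (copy numbers of BC and CS enzymes, presence of ATP citrate lyase, and ICDH length) can be written down as a simple decision rule. The following is a minimal sketch, not the survey procedure actually used: the class and function names are hypothetical, the 600-residue cutoff is an assumption chosen only because it falls between the two ICDH length clusters in Table II, and the absence of ATP citrate lyase in A. aeolicus is inferred from the description of the Aquificaceae variant rather than taken from the table.

```python
from dataclasses import dataclass


@dataclass
class RtcaGenomeFeatures:
    """Survey features for one genome (hypothetical container for this sketch)."""
    bc_copies: int    # number of biotin carboxylase (BC) enzymes found
    cs_copies: int    # number of CoA synthase (CS) enzymes found
    has_acl: bool     # ATP citrate lyase gene present
    icdh_length: int  # ICDH length in amino acid residues


def classify_rtca_variant(f: RtcaGenomeFeatures, long_icdh_cutoff: int = 600) -> str:
    """Return 'symmetric', 'conventional', or 'mixed' for a set of features.

    The cutoff separating short from long ICDH is an assumed value.
    """
    long_icdh = f.icdh_length >= long_icdh_cutoff
    if f.bc_copies >= 2 and f.cs_copies >= 2 and not f.has_acl and not long_icdh:
        return "symmetric"
    if f.bc_copies == 1 and f.cs_copies == 1 and f.has_acl and long_icdh:
        return "conventional"
    # Any other combination carries only some of the innovations.
    return "mixed"


# ICDH lengths and copy numbers follow Table II and the accompanying text.
a_aeolicus = RtcaGenomeFeatures(bc_copies=2, cs_copies=2, has_acl=False, icdh_length=426)
p_marinus = RtcaGenomeFeatures(bc_copies=1, cs_copies=1, has_acl=True, icdh_length=747)
s_azorense = RtcaGenomeFeatures(bc_copies=2, cs_copies=1, has_acl=True, icdh_length=421)

for name, feats in [("A. aeolicus", a_aeolicus), ("P. marinus", p_marinus), ("S. azorense", s_azorense)]:
    print(name, classify_rtca_variant(feats))
```

With this rule S. azorense falls into neither main group, which mirrors its description in the text as the one surveyed species that has ATP citrate lyase while retaining two BC enzymes and a short ICDH.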

While not all five homologous enzymes in the symmetric Aquificaceae rTCA variant show the same sequence similarity as the first three reactions (1–3), we argue this is consistent with the suggested evolutionary scenario. Reactions 1–3 represent thermodynamic bottleneck reactions that in isolation would require ATP hydrolysis [54], and are catalyzed by elaborate multi-subunit enzymes that contain complex metal-centers and/or complex organic cofactors, and belong to small and highly conserved enzyme families. By contrast, the hydrogenation and dehydration reactions (4, 5) that follow these complex reactions are catalyzed by simpler enzymes belonging to highly diversified enzyme families used throughout metabolism. The selection pressure to conserve enzyme sequence similarity is therefore likely to have been lower, and either (or both) of these enzymes could also have been more easily replaced by other members of this family. This breakdown between “easy” and “hard” chemistries has been recognized as a general constraint in the early evolution of carbon-fixation pathways, with innovations primarily occurring through the emergence of new sequences of easier chemistries that connect harder bottleneck reactions [1].

Having both arcs catalyzed by single sets of promiscuous enzymes in the root node would likely have resulted in slower growth for organisms using such a strategy, but it would also have considerably lowered the overhead for the early regulatory machinery. If both rTCA arcs could have been catalyzed by the same set of single enzymes, the whole cycle would have required only 7 total enzymes for catalysis: the 5 homologous reactions plus the reduction of malate to fumarate, and the cleavage of citryl-CoA, neither of which has an analog in the opposite arc. It has previously been argued that a fully connected rTCA+WL network was selected as the root of carbon-fixation, because its topology would have provided the most robust form of network autocatalysis for earlier eras of life [1, 2]. The present analysis suggests that an additional reason the rTCA cycle could have been privileged as a carbon fixation pathway capable of initiating the first cellular life, over alternate autocatalytic carbon-fixation pathways observed today [52, 60], is that its greater symmetry permitted it to emerge with simpler catalytic support and regulatory structure.

Each of the divergences in the tree of rTCA variants can be understood in terms of a cost-benefit tradeoff in metabolic innovations. Duplication and divergence toward greater substrate specificity of the enzymes catalyzing the two arcs would have improved the kinetics of the pathway, and thus the growth rate of organisms using it. While the genome expansion due to these duplications would have had an increased cost associated with it, this was apparently more than offset by the benefit of increasing the kinetics of a pathway through which most cellular carbon is incorporated. We will discuss a different case in the biosynthesis of branched chain amino acids in the next section, where a lower mass flux appears to have shifted this balance away from favoring duplication and divergence.

For the second divergence, the fusion of the citryl-CoA synthase and citryl-CoA lyase would have improved the kinetics of the rTCA cycle, while the elimination of the ATP dependent α-ketoglutarate BC would have improved the thermodynamic efficiency of the rTCA cycle. A secondary effect of this adaptation is that it replaces a carboxylation reaction based on HCO3− with one based on CO2, potentially allowing the new phenotype to thrive in less alkaline environments. It is interesting to note that a similar replacement of pyruvate BC and malate dehydrogenase, between which oxaloacetate is the intermediate, in the opposite rTCA arc is not observed in any organism using rTCA to fix CO2. This may simply be because oxaloacetate is the starting precursor to a wide range of anabolic pathways, and is most easily accessed if it is released into solution, and because it is moreover produced as a free intermediate during the cleavage of citrate.

The distribution of the different innovations governing the evolution of the two extant forms of the rTCA cycle may give some insights into the tradeoffs between optimizing growth rate and growth efficiency. The combined fitness advantage of improving kinetics by fusing the citrate cleavage sequence and improving thermodynamic efficiency through the emergence of the higher-specificity ICDH appears to be significant under most conditions. Only one family (Aquificaceae) within one clade (Aquificales) still uses the ancestral rTCA variant, while the conventional form is distributed across a wide range of bacterial clades. It is further interesting to note that in the vast majority of cases the two innovations occur together. In only one known case (S. azorense) has the fusion of the citrate cleavage reaction taken place without elimination of the α-ketoglutarate BC. Characterization of additional Aquificales could help us disentangle the ordering and relative advantage of the two innovations.

Amino acid biosynthesis

For most amino acid biosynthetic pathways in A. aeolicus the genome annotation leaves little doubt about the correct completion. Alanine, glutamate and aspartate are only one amination reaction removed from intermediates in the rTCA cycle, while an additional amination reaction leads to asparagine (from aspartate) and glutamine (from glutamate). The 3-step sequence from aspartate to homoserine provides the branching point from which the syntheses of threonine, methionine, and lysine (via the DAP pathway [61]) diverge. Arginine and proline are both derived from glutamate, with arginine synthesis proceeding via ornithine and a partial urea cycle. Histidine is synthesized from ATP and PRPP through the standard histidine synthesis pathway. The shikimate pathway [62] produces chorismate, from which the syntheses of the aromatic amino acids tryptophan, phenylalanine and tyrosine diverge.

Surprising pathway variants used by A. aeolicus for the synthesis of several amino acids were identified as a result of gaps in conventional pathways. Subsequent analysis further showed those pathways to represent the likely ancestral pathways to those compounds. In the previous section we mentioned the synthesis of glycine, serine, and cysteine, which in this organism are derived directly from CO2 and NH3 (and H2S for cysteine) through a partial WL pathway coupled to the reductive (= biosynthetic) operation of the “glycine cycle” [2]. Next we discuss the biosynthesis of the branched chain amino acids valine, leucine and isoleucine.

Branched chain amino acids

It has long been thought that most organisms synthesize α-ketobutyrate, a central precursor to isoleucine, by deaminating threonine [63]. However, an increasing number of species have been found to instead derive α-ketobutyrate from pyruvate and acetyl-CoA through what is known as the “citramalate” pathway (see Fig. 4). Originally discovered, and later described in detail, in the Spirochetes [64, 65], this pathway was subsequently discovered in a range of species across the bacterial and archaeal domains. It was found to be used in members of both the Euryarcheota [66–69] and Crenarcheota [70, 71], as well as Firmicutes (Clostridia) [72, 73], Chloroflexi [74], Chlorobia [75], Cyanobacteria [76], and several Proteobacteria [77–79].

Like the observation that the ancestral form of rTCA had an increased degree of symmetry that could have allowed a smaller regulatory structure, the sequence of local functional group transformations in the citramalate pathway is repeated in the synthesis of leucine from α-ketoisovalerate (see Fig. 4). In contrast to the symmetric rTCA variant where parallel reactions are catalyzed by homologous enzymes, however, in this case the parallel reactions are in fact catalyzed by the same enzymes in both pathways [65]. The only exception is the pair of acetyl-CoA addition reactions (5, 5′) that initiate the two sequences, which are catalyzed by separate, though still homologous, enzymes. Together with the broad observed distribution, this leads us to propose that the citramalate pathway represents the ancestral pathway to isoleucine, with threonine deaminase (often called threonine dehydratase, TDH) a more recent innovation.

The genome of A. aeolicus is consistent with this hypothesis, as it lacks the gene for threonine deaminase and instead contains two genes annotated as 2-isopropylmalate synthase (IPMS, reaction 5″) [11]. To confirm the presence of the citramalate pathway, and to place this use in evolutionary context, we performed broad genomic surveys for threonine deaminase (TDH) and citramalate synthase (CMS, reaction 5′). Presence of either gene satisfies the constraint of pathway completeness in PMA, as the other genes necessary are shared between both pathways. Distributions of the two genes/pathways are shown in Table III.
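The completeness constraint just described, in which either TDH or CMS suffices because all downstream genes are shared, amounts to a small classification rule, and tallying its outcomes per clade gives the kind of counts reported for this survey. The sketch below illustrates that bookkeeping only. The function names are hypothetical, the genome labels other than A. aeolicus are invented placeholders, and real annotation would additionally need the cross-validation of IPMS-like genes.

```python
from collections import Counter


def isoleucine_route(has_tdh: bool, has_cms: bool) -> str:
    """Classify the route to alpha-ketobutyrate from gene presence alone.

    Either gene completes the pathway, because the remaining genes are
    shared between the threonine and citramalate routes.
    """
    if has_tdh and has_cms:
        return "both"
    if has_tdh:
        return "threonine"
    if has_cms:
        return "citramalate"
    return "incomplete"


def tally_routes(genomes: dict) -> Counter:
    """Count route classes over a mapping name -> (has_tdh, has_cms)."""
    return Counter(isoleucine_route(tdh, cms) for tdh, cms in genomes.values())


# A. aeolicus lacks TDH and is inferred to carry CMS; the other entries are
# made-up placeholders used only to exercise the function.
example_genomes = {
    "A. aeolicus": (False, True),
    "placeholder genome 1": (True, False),
    "placeholder genome 2": (True, True),
    "placeholder genome 3": (False, False),
}

counts = tally_routes(example_genomes)
print(dict(counts))
```

A genome classified as "incomplete" would violate the completeness constraint and flag a likely annotation gap rather than a true absence of isoleucine synthesis.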

[Figure 4 graphic: panels “Ancestral Val/Leu/Ileu” and “Oxidative TCA”; products valine, isoleucine, leucine and threonine; reaction steps 1, 2a, 2b, 3, 4, 5′, 5″, 6a, 6b, 7 and a spontaneous step; cofactors ThPP, LipA-S and S-CoA.]

FIG. 4: Branched chain amino acid biosynthesis. The chemical sequences show the parallels in terms of local functional group chemistry within the reconstructed ancestral pathways to valine, leucine and isoleucine. The blue box highlights the citramalate pathway of α-ketobutyrate synthesis, reconstructed here to represent the ancestral sequence to this compound. The molecules highlighted in orange and green in turn show the compact interconnectedness of the ancestral pathways to the branched chain amino acids. Parallels to substrate sequences within the oxidative TCA are also highlighted, as well as the alternate route to α-ketobutyrate from threonine.

We note briefly that there is some uncertainty in the annotation of CMS, because of the sequence similarity not only to IPMS, but also to homocitrate synthase of the AAA lysine synthesis pathways [69]. However, the pooling of evidence in PMA lowers these uncertainties. For example, if a strain has two genes annotated as IPMS, and possesses neither a gene for TDH nor any of the other genes of the AAA lysine pathway to which homocitrate synthase might connect, one of the IPMS copies most likely instead codes for CMS. In addition, BLAST search comparisons among these genes were used for cross-validation. For most species with multiple genes annotated as IPMS, BLAST searches show that at the clade level these genes separate into clear groups based on sequence similarity. For most clades, genes within one of these groups were identified in the various experimental studies mentioned above as likely encoding CMS, thus anchoring the group as a whole. Combining all these different lines of evidence leads to the distributions shown in Table III.

Domain/Clade/Family              TDH   CMS   both
Archaea (122)                    58    98    47
  Crenarcheota (40)              34    30    27
  Euryarcheota (79)              21    65    17
  Korarcheota (1)                1     1     1
  Thaumarcheota (2)              2     2     2
Bacteria total (321)             213   138   55
Bacteria w/o Firmicutes (118)    57    78    29
  Aquificales (9)                0     9     0
  Thermotogales (13)             5     6     5
  Chlorobiales (11)              0     11    0
  Chloroflexi (16)               10    12    6
  Nitrospirae (3)                0     3     0
  Planctomycetes (6)             1     6     1
  Verrucomicrobia (4)            1     4     1
  Deinococcus-Thermus (15)       10    9     9
    Deinococcales (7)            6     1     0
    Thermales (8)                4     8     4
  Cyanobacteria (41)             30    21    12
    Prochlorococcus (12)         12    0     0
    Synechococcus (11)           8     5     2
    Others (18)                  10    16    10
Firmicutes (203)                 156   60    26
  Bacilli (104)                  101   9     9
  Clostridiales (99)             55    51    17
    Clostridiaceae (33)          26    8     3
    Thermoanaerobacter (24)      3     20    1

TABLE III: Distribution of isoleucine biosynthesis pathways.

Among the archaea in our study, a vast majority (98/122, or 80%) possess a gene for CMS. By contrast, less than half (58/122, or 47.5%) possess a gene for TDH. Moreover, when TDH is found it is mostly co-present with CMS, while in many cases CMS represents the exclusive pathway to isoleucine. This is relevant, because two versions of TDH are known to exist: an ‘anabolic’ version whose activity is obligatory for cell growth, and a ‘catabolic’ version that only becomes active during salvage of excess threonine [63]. It has previously been noted that the archaeal TDH best matches the catabolic version of this enzyme [80]. This is consistent with the experimental observation that for archaea which have both TDH and CMS, the citramalate pathway represents either a major or the exclusive pathway used in synthesis of isoleucine [67, 70]. Thus, we conclude that the citramalate pathway represents the ancestral pathway to isoleucine in archaea, with TDH representing a later innovation that initially emerged for salvage purposes.

Bacteria present a more complex picture than archaea regarding the synthesis of isoleucine. Both TDH and CMS are common in deep-branching bacterial clades, with TDH occurring with higher frequency. However, this balance is dominated by Firmicutes, and among other deep-branching bacteria, the citramalate pathway appears to be the most abundant route to isoleucine. Strikingly, a significant number of deep-branching clades whose members include (hyper)thermophiles, autotrophs, or both, show near exclusive use of the citramalate pathway: Aquificales, Chlorobiales, Nitrospirae, Planctomycetes, and Verrucomicrobia.

A closer look within clades where TDH represents the majority pathway provides further context for evolutionary reconstruction. For example, within the large and diverse Firmicute phylum, the aerobic Bacillales nearly all possess the threonine pathway, while the anaerobic Clostridiales exhibit the threonine and citramalate pathways in about equal frequencies. Within the Clostridiales in turn, the Clostridiaceae family, which contains many pathogenic strains [81], makes nearly exclusive use of the threonine pathway, while the Thermoanaerobacteriales family, which like the Aquificales are hyperthermophiles, makes nearly exclusive use of the citramalate pathway.

Within the Cyanobacteria in turn, the majority of strains using the threonine pathway belong to the genera Synechococcus and Prochlorococcus. Many of the former and all of the latter are divergent cyanobacterial lineages highly adapted to oligotrophic regions of the world’s oceans, often dominating those environments [82–84]. Among all other cyanobacteria, the citramalate pathway represents the major route to isoleucine. Finally, within Deinococcales-Thermales [85], which have approximately equal numbers of both pathways, the Deinococcales use mainly the threonine pathway, while the hyperthermophilic Thermales use mainly the citramalate pathway.

Lastly, it has previously been observed that many deep-branching bacteria that possess both TDH and CMS appear to possess the catabolic version of TDH [80]. In our sample of Thermotogales and Chloroflexi, both of which show frequent occurrence of TDH, BLAST searches show that many of these genes are better matched to the catabolic than the anabolic TDH of E. coli. Thus, all evidence combined leads us to conclude that the citramalate pathway is the ancestral pathway also in bacteria, and thus for all life, with the threonine pathway initially emerging as a salvage pathway.

This conclusion is important relative to the previous discussion of a minimal ancestral rTCA cycle. In addition to the enzymes shared between the citramalate pathway and the final sequence in leucine synthesis (reactions 5–7), the enzymes catalyzing the homologous sequences in valine and isoleucine synthesis (reactions 1–3) are similarly shared across pathways, while the final amination reaction (4) is in all three pathways performed by the same enzyme [86]. While some organisms have additional copies of the first thiamin catalyzed reaction (1) for purposes of regulation [87, 88], A. aeolicus possesses only a single, two-subunit enzyme for both reactions [11]. As mentioned, the lone exception to this pattern of promiscuous catalysis is the acetyl-CoA addition reaction (5, 5′) initiating both sequences in leucine/isoleucine synthesis. However, if these reactions had not been mediated by thioester intermediates, they would represent carboxyl reduction and carbon-carbon bond forming reactions that in carbon-fixation pathways are associated with ATP hydrolysis [54]. Similar to those more constrained carbon-fixation reactions, the high sequence similarity of the two enzymes caused them to be originally annotated as the same enzyme [11, 65]. The more complex, constrained nature of these reactions may thus have increased the kinetic advantage of duplication and divergence toward greater enzyme specificity. Thus, if as before we assume an ancestral enzyme with broader substrate affinity for reactions 5/5′, then all 21 total reactions in the synthesis of valine, leucine and isoleucine starting from pyruvate and acetyl-CoA would have required only 7 enzymes for catalysis (see also Fig. 4).

In contrast to the evolution of the rTCA cycle, most enzymes catalyzing homologous reaction sequences in this case have not duplicated and diverged, except for CMS and IPMS. Why this difference? The difference in mass flux between the two pathways would appear to offer a straightforward explanation. While most cellular carbon passes through the rTCA cycle in autotrophs using that pathway, only 3 out of 20 amino acids are generated through the pathways in Fig. 4. Thus, even if we simplistically assume roughly equal biomass partitioning between amino acids, nucleotides and lipids, the mass flux difference between the two sets of pathways is about an order of magnitude. The only enzyme for which selection pressure to improve kinetics appears to have led to duplication and divergence within the synthesis of branched chain amino acids represents a more complex, possible thermodynamic bottleneck reaction.

Emergence, and combined regulation with the rTCA cycle

Additional overlaps between the rTCA cycle and the branched chain amino acid biosynthetic pathways may provide insights into how the latter emerged. The dehydration/hydration isomerization and dehydrogenation sequences (reactions 6a, 6b, 7) in the reconstructed ancestral pathways to leucine and isoleucine are homologous to the sequence of local group transformations occurring in the opposite direction in the large-molecule half of rTCA (reactions 4′, 5′ and 5∗ in Fig. 2). The associated enzymes in the two sub-systems may also have a common origin. In A. aeolicus (which uses the ancestral rTCA variant), the dehydration/hydration-isomerization reaction is catalyzed by a single subunit aconitase enzyme (ACO) in rTCA, and a two subunit enzyme (LeuC/D) in the branched chain amino acid pathways. However, the combined length of LeuC and LeuD is similar to that of ACO, while LeuC and LeuD also have high sequence similarity to consecutive, adjacent portions within ACO (data not shown). This suggests the possibility that following duplication and divergence from a common ancestral enzyme the two subunits became fused within the rTCA cycle but not the branched chain amino acid pathways, due to the differences in mass flux density. Similarly, the homologous (de)hydrogenation reactions in the two sub-systems are in A. aeolicus also catalyzed by enzymes with high sequence similarity. The enzyme homologies do not extend to the (de)carboxylation reactions across the two sub-systems, which in the direction of decarboxylation are facile, and in the direction of carboxylation require complex cofactors and ATP hydrolysis (see previous discussions). Phylogenetic reconstructions of the lineages of these enzymes could shed additional light on these hypotheses.

These observations may thus suggest that the existence of a sequence of substrate chemistry operating in the opposite direction within the ancestral rTCA cycle could have facilitated the emergence of the ancestral pathways to leucine and isoleucine. The only truly new form of chemistry in the citramalate pathway and its homologous sequence in leucine synthesis is the initiating reaction involving ligation of acetyl-CoA (reactions 5, 5′ in Fig. 4). Moreover, the facile decarboxylation of β-carboxylic acids, which in the reductive direction of the rTCA cycle may have increased the cost of an already complex reaction, may have created an advantage when used in the opposite direction, possibly further facilitating the emergence of the ancestral citramalate/leucine sequence.

The pattern of pathway diversification by innovation of the initiating reaction, followed by re-use of similar or identical downstream enzymes to catalyze homologous reaction sequences, was also the primary mode of diversification proposed for carbon-fixation pathways in Ref. [1]. The difference from those previous proposals is that here the direction of the re-used sequence chemistry is also suggested to have changed. We should also note that in comparing these innovations some caution should be exercised because the establishment of the first pathway to a set of amino acids is an innovation occurring in the era prior to LUCA, while diversification of carbon-fixation pathways occurred after LUCA. Nonetheless the parallels in the suggested modes of innovation are striking and may suggest a general principle of early metabolic evolution.

If true, this hypothesis of pathway evolution may have allowed the LUCA to regulate the combined sub-systems with an even smaller genome than we have suggested for them separately. If the homologous enzymes in the two sub-systems arose through duplication and divergence from a common ancestor, then the two sequences could have potentially been catalyzed by the same enzymes in an earlier era. We suggested above that the complete ancestral forms of rTCA and branched chain amino acid biosynthesis could in an era of more promiscuous enzymes each have been catalyzed by only 7 enzymes total. The added observations here suggest that the entire connected network of rTCA plus the pathways to the branched chain amino acids could have been catalyzed by only 12 total enzymes.

Relation to the reversal of the TCA cycle

The key innovation of the acetyl-CoA ligation reaction that initiates the homologous sequences within leucine and isoleucine synthesis, and possibly governed their emergence, may have also facilitated the later emergence of other pathways within metabolism, in particular the reversal of direction of the TCA cycle. The entire homologous reaction sequences (5–7) in the citramalate and leucine pathways are also known as “keto acid elongation” sequences [19], and are further repeated within the oxidative TCA cycle (as well as the AAA lysine synthesis pathway). It is becoming clear that the TCA cycle originally operated in the reductive direction (see Ref. [1] and references therein for discussion), which means that keto acid elongation likely occurred within branched chain amino acid synthesis prior to its use in the oxidative TCA cycle. The thiamin-facilitated decarboxylation of pyruvate (see Fig. 4) thus similarly appears to have been used first in the synthesis of branched chain amino acids and only later within the oxidative TCA cycle. Finally, lipoic acid is used only in the oxidative (and not the reductive) direction of the TCA cycle, while its likely ancestral function was in the synthesis of glycine through the glycine cycle [2] (see also the section on lipoic acid below for additional supporting evidence).

Thus, we suggest that at the substrate level the prior existence of the biosynthetic pathways leading to valine/leucine/isoleucine facilitated the reversal of the TCA cycle from the reductive to the oxidative direction. The additional key innovation appears to have been the recruitment of lipoic acid from the glycine cycle to its interaction with thiamin in the production of acetyl-CoA from pyruvate. Broad affinity of earlier enzymes would have aided the emergence of this novel pathway, as promiscuous activity could have allowed this pathway to proceed at lower rates, with duplication and divergence later being favored as respiration came fully online and mass flux through this pathway increased.

Cofactors

Cofactors are a distinct class of molecules at the sub-

Lipoic acid

Lipoic acid is a cofactor with very limited, but key metabolic roles. Lipoic acid is central to the “Glycine Cleavage System” (GCS) [90], which connects glycine and serine metabolism to folate one-carbon chemistry, and is also used (as previously mentioned) in the ferredoxin:oxidoreductase decarboxylation reactions in the oxidative TCA cycle and the degradation of branched chain amino acids. The GCS is known to be reversible [90], has nearly neutral thermodynamics [54], and likely originally operated in the reductive (i.e. biosynthetic, not degradative) direction as part of the ancestral pathway leading to glycine and serine [2]. For these reasons the GCS together with serine methyl transferase (SMT) has also been called the “Glycine Cycle” [2]. The phylometabolic analysis that places reductive glycine synthesis at the base of the tree of life, as part of the phenotype of the last common ancestor, suggests that the function of lipoic acid in the glycine cycle preceded its use in either the oxidative TCA cycle or the degradation of branched chain amino acids. This provides important context for interpreting the distribution of lipoic acid biosynthesis genes that we discuss next.

Three pathways are known for the biosynthesis of lipoic acid (see Fig. 5). The conventional pathway, first described in E. coli [91], involves transfer of an octanoyl moiety from the Acyl Carrier Protein (ACP) to one of the Lipoyl Dependent (LD) enzymes, followed by sequential sulfuration of the octanoyl moiety to produce the final lipoylated enzyme [92]. The first step is catalyzed by octanoyl transferase (LipB), while the second is catalyzed by Lipoyl Synthase (LipA). A variant of this scheme was recently discovered in B. subtilis, which involves the same basic chemistry, but a distinct set of enzymes and an additional intermediate in the transfer of octanoate. In B. subtilis, the distinct octanoyl transferase LipM transfers octanoate from ACP to the H-protein of the GCS, followed by a second transfer (catalyzed by LipL) to the E2 subunit of pyruvate dehydrogenase [93–95].
Both LipM strate level of metabolism, forming a chimeric interme- and LipL had previously been obscured due to their se- diary layer between monomers and polymers in terms of quence similarities to LplA of E. coli [94]. A third structure [89]. Cofactors are also critical components of distinct pathway was recently discovered in an E. coli the control hierarchy of metabolism, facilitating many mutant in which LipB had been deactivated. In this mu- key reaction mechanisms, and thus the overall integra- tant, lipoate protein (LplA), normally used in the tion of metabolism. Each cofactor generally facilitates attachment of free lipoic acid to LD enzymes, is recruited a distinct and specialized catalytic function, and their in the transfer of free octanoate to an LD enzyme through emergence can thus be thought of as the outgrowth of an AMP-bound intermediate [92, 96–98]. In light of these kinetic feedback loops, each “opening up” new spaces variations, it is noteworthy that A. aeolicus lacks a gene in the universe of organic chemistry and bringing them for LipB and is annotated as having LipA and LplA [11]. under the control of biology [1]. Understanding the evo- This raises the question, which of the pathway alternates lution of cofactor biosynthesis is thus important both in does A. aeolicus use, and what does this teach us about providing context to discussions on the origin of life, as how the biosynthetic pathways and the functionality of well as understanding major physiological lineages in the lipoic acid evolved? tree of life. In this section we focus on the synthesis of To examine this question, we performed broad ge- several cofactors in A. aeolicus, using the reconstruction nomic surveys for each of the enzymes used in lipoic to provide additional insights into the evolution of their acid metabolism, shown in Table IV. In addition to LipA, functions. LipB, LplA, LipM/LipL, our survey also includes LplJ, 15

8:0 AMP LplA * LplA * nificant abundance of the LipA + LipB combination. In contrast, LplA (and its combination with LipA) is widely LipB and more evenly distributed across both archaea and bac- teria. However, BLAST searches indicate that in most

LipM LipL of these cases LplA is in fact a better match to LipM O O O of B. subtillis than to LplA of either E. coli or the eur- S LipM NH NH ACP GcvH E2 yarchaeota T. acidophilum [99]. While this suggests that the B. subtillis variant may be the ancestral pathway to LipA LipA lipoic acid, in most cases with a putative LipM gene, we SH LipA SH SH SH could not find the accompanying LipL gene, presenting a puzzle as this gene is absolutely required in lipoic acid O O synthesis in B. subtillis [94]. NH NH GcvH E2 This is where the functional roles of lipoic acid pro- LplJ vides critical evolutionary context. As explained, the LplA likely ancestral function of lipoic acid is its role in con- SH SH necting glycine/serine metabolism to folate-C1 chemistry SH SH LplJ through the glycine cycle, for which it remains (nearly

O LplA O universally) essential to this day. In contrast, the role OH AMP of lipoic acid in the oxidative TCA cycle or the degra- dation of branched chain amino acids likely arose later, and is not essential to many organisms. For example, FIG. 5: Lipoic acid biosynthesis and lipoyl-protein assem- many autotrophs, including A. aeolicus, do not use the bly. In E. coli (green sequence), octanoate is transfered from oxidative TCA cycle nor do they degrade branched chain ACP to the E2 subunit of pyruvate dehydrogenase (PDH) by amino acids. Such organisms should thus not need LipL LipB, followed by sulfuration to lipoic acid by LipA. In E. coli to transfer octanoate from the H-protein of GCS to other mutants lacking LipB, octanoate is transfered through an al- LD enzymes, because they do not possess them. Instead, ternate route with an AMP-bound intermediate by LplA, nor- direct sulfuration of octanoate bound to the H-protein of mally used for incorporation of free lipoic acid. In B. subtillis GCS, a reaction known to be catalyzed by LipA [100], is (blue sequence), octanoate is transfered from ACP first to the H-protein of GCS by LipM, followed by a second transfer to sufficient to allow the sole function of lipoic acid in these the E2 subunit of PDH by LipL. B. subtillis also uses LplJ organisms. This proposed sequence of enzyme functions instead of LplA for incorporation of free lipoic acid. In red is is supported by the observation that in deep-branching the suggested ancestral biosynthesis of lipoic acid (see main clades that do not possess the glycine cycle – includ- text). ing many Clostridia, all methanogenic families within the Euryarcheota or the Desulfurobacteriaceae family within Aquificales [2] – we do not find any genes associated with a distinct lipoate protein ligase found in B. subtillis [94]. lipoic acid biosynthesis or uptake. 
Thus, we suggest that It can immediately be seen that the conventional LipA + in many cases where a putative LipM is found but LipL LipB combination is not widely distributed across deep- is absent, a pathway is used in which LipM is directly fol- branching clades. Only 6 archaeal strains, nearly all lowed by LipA (see Fig. 5), and we also propose that this (5/6) in the Thermoproteale family within the Crenar- represents the ancestral pathway for de novo lipoic acid cheota, appear to possess this pathway. Among deep- biosynthesis. Note that this proposal further reinforces branching bacteria, only the Deinococcales, Cyanobacte- the conclusion that the reductive TCA cycle preceded the ria and Clostridiale family within Firmicutes show sig- oxidative version.

In this scenario, an additional octanoyl transferase a single all-purpose LipB transferase, as seen in E. coli. (LipL) emerged for the second subsequent transfer to the The high sequence similarity among LipM, LipL, LplA E2 domain of PDH, possibly through duplication and di- and LpIJ further suggests that the environmental uptake vergence of LipM as seen in B. subtillis. This would have genes likewise arose through duplication and divergence introduced a redundancy into the lipoic acid system by from octanoyl transferase genes, but that some of this an- producing two dedicated octanoyl transferase enzymes. cestral function was maintained in LplA, making possible This redundancy then appears to have been removed its recruitment in E. coli mutants lacking LipB. by replacing the two distinct transferase enzymes with 16
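The reasoning above, which maps the lipoate genes found in a genome onto one of the known or proposed routes to lipoylated enzymes, can be sketched as a small rule table. This is only an illustration under stated assumptions, not the procedure used for the survey: the function name `classify_lipoate_route`, its return labels, and the `lpla_resembles_lipm` flag (standing in for the outcome of a BLAST comparison against LipM) are all hypothetical.

```python
def classify_lipoate_route(genes, lpla_resembles_lipm=False):
    """Label the lipoic acid route implied by a set of annotated genes.

    genes: iterable of gene names drawn from
           {"LipA", "LipB", "LipM", "LipL", "LplA", "LplJ"}.
    lpla_resembles_lipm: hypothetical flag; True if the annotated LplA
           is a better sequence match to LipM than to LplA.
    """
    genes = set(genes)
    # Re-label an LplA annotation as LipM when the flag says it matches LipM better.
    if lpla_resembles_lipm and "LplA" in genes:
        genes = (genes - {"LplA"}) | {"LipM"}
    if "LipA" not in genes:
        # No lipoyl synthase: no sulfuration step, so no de novo synthesis.
        if genes & {"LplA", "LplJ"}:
            return "uptake only"
        return "none"
    if "LipB" in genes:
        return "conventional"          # LipB transfer, then LipA
    if "LipM" in genes and "LipL" in genes:
        return "B. subtilis variant"   # LipM to GcvH, LipL to E2, LipA
    if "LipM" in genes:
        return "proposed ancestral"    # LipM to GcvH, then LipA directly
    if "LplA" in genes:
        return "LplA bypass"           # AMP-bound octanoate intermediate
    return "incomplete"
```

With the annotation reported for A. aeolicus (LipA and LplA, no LipB), this sketch returns the LplA bypass unless the LplA hit is treated as a LipM homolog, in which case it returns the proposed ancestral LipM-then-LipA route.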

Domain/Clade/Family             LipA   LipB    A+B   LplA    A+A   LipM   LipL  A+L+M   LplJ
Archaea total (126)               37      6      6     60     31
  Crenarcheota (41)               18      5      5    27∗     13
  Euryarcheota (80)               19      1      1     32     18
  Korarcheota (1)                  0      0      0     1∗      0
  Nanoarchaea (1)                  0      0      0      0      0
  Thaumarcheota (3)                0      0      0      0      0
Bacteria total (329)             214    105    102    100     83     94    108     85    146
  Aquificales total (9)            7      3      3     4∗      4
    Aquificaceae (4)               4      0      0     4∗      4
    Hydrogenobacter (3)            3      3      3      0      0
    Desulfotobacter (2)            0      0      0      0      0
  Thermotogales (13)               0      0      0    13∗      0
  Deinococcus-Thermales (15)      15     15     15      2      2
  Chloroflexi (16)                 8      6      6    10∗      8
  Chlorobi (11)                   11      1      1    11∗     11
  Planctomycetes (6)               5      1      0     5∗      5
  Nitrospira (3)                   2      0      0     3∗      2
  Verrucomicrobia (4)              4      3      3     4∗      4
  Cyanobacteria (44)              43     43     43    43∗     43
  Firmicutes total (208)         119     33     31      5      4     94    108     85    146
    Bacilli (106)                 92      7      7      0      0     85    106     85     96
    Clostridia (102)              27     26     24      5      4      9      2      0     50

TABLE IV: Distribution of lipoic acid biosynthesis genes. Within the column counting LplA genes, entries denoted by (∗) are often better matched to LipM of B. subtilis than LplA of either E. coli or T. acidophilum. See main text for additional details.
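As a rough reading aid for Table IV, the counts can be converted into per-clade fractions. The sketch below copies a subset of rows from the table and makes two assumptions that the table itself does not spell out: that the number in parentheses is the number of genomes surveyed, and that the "A+B" column counts genomes carrying both LipA and LipB. The 0.5 cutoff for calling a gene combination widespread is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
# clade: (genomes surveyed, LipA+LipB count, LplA count), copied from Table IV
TABLE_IV = {
    "Archaea total": (126, 6, 60),
    "Aquificales total": (9, 3, 4),
    "Thermotogales": (13, 0, 13),
    "Deinococcus-Thermales": (15, 15, 2),
    "Chloroflexi": (16, 6, 10),
    "Chlorobi": (11, 1, 11),
    "Cyanobacteria": (44, 43, 43),
    "Clostridia": (102, 24, 5),
}

def fractions(table):
    """Return clade -> (fraction with LipA+LipB, fraction with LplA)."""
    return {clade: (ab / n, lpla / n) for clade, (n, ab, lpla) in table.items()}

def widespread(table, column, threshold=0.5):
    """List clades in which at least `threshold` of genomes carry `column`."""
    index = {"A+B": 0, "LplA": 1}[column]
    return sorted(c for c, f in fractions(table).items() if f[index] >= threshold)

if __name__ == "__main__":
    print(widespread(TABLE_IV, "A+B"))
    print(widespread(TABLE_IV, "LplA"))
```

With this cutoff, only Cyanobacteria and Deinococcus-Thermales pass for the LipA + LipB combination, while LplA passes in Thermotogales, Chloroflexi, Chlorobi and Cyanobacteria. Clostridia (24 of 102 genomes) fall below 0.5, so any clade-level statement of this kind depends on the cutoff chosen.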

Vitamin B6

In contrast to the narrow functionality of lipoic acid, vitamin B6, which refers to pyridoxal 5-phosphate and its substitutes, is one of the most functionally diverse cofactors, with its different forms facilitating a very wide range of reaction classes [101]. Its diverse functionality and relatively simple chemistry, plausibly accessible under abiotic conditions, have led to the suggestion that it may have been one of the earliest cofactors [102, 103].

There are two known biosynthetic pathways leading to vitamin B6 in modern metabolism. In the first recognized pathway, described in E. coli, pyridoxine phosphate is derived from 4-erythrophosphate [101]. This pathway is known as the 'DXP dependent' pathway, because 1-deoxy-D-xylulose-5-phosphate (DXP) is the secondary input to the final condensation reaction that produces the pyridine ring of pyridoxine. In the second ('DXP independent') pathway, first described in B. subtilis, pyridoxal phosphate is synthesized through the direct condensation of ribulose phosphate and glyceraldehyde phosphate [104, 105].

Previous analyses found genes for the DXP-independent pathway to be highly conserved and distributed across both archaeal and bacterial domains, while genes for the DXP-dependent pathway were found mainly in the γ-proteobacteria, suggesting this latter pathway emerged later in evolution [106, 107]. However, A. aeolicus possesses the DXP-dependent pathway, prompting us to further examine this hypothesis. Table V shows the distribution across bacteria and archaea of the key enzymes involved in the condensation steps in both sequences: PdxA/J in the DXP-dependent pathway and PdxS/T in the DXP-independent pathway.

Domain/Clade/Family       PdxS  PdxT   S+T  PdxA  PdxJ   A+J
Archaea (126)              114   114   113     6     0     0
Bacteria (339)
  Aquificales (9)            0     0     0     8     9     8
  Thermotogales (13)        12    12    12     0     0     0
  Chloroflexi (16)          16    15    15     0     0     0
  Planctomycetes (6)         0     0     0     6     6     6
  Nitrospirae (3)            0     0     0     3     3     3
  Chlorobia (11)             0     0     0    11    11    11
  Deinococci (15)           15    15    15     0     0     0
  Verrucomicrobia (4)        0     0     0     4     4     4
  Cyanobacteria (44)         0     0     0    44    44    44
  Firmicutes (208)         183   164   164    47     0     0
    Bacillales (106)       106   106   106    11     0     0
    Clostridia (102)        77    58    58    36     0     0

TABLE V: Distribution of pyridoxal phosphate synthesis genes.

These distributions show some striking patterns. It had previously been noted that the two pathways are mutually exclusive within organisms, which use only one or the other [106]. Our analysis shows that the pathways are in fact mutually exclusive at the clade level, much more so than we have seen for pathway variants in other sub-systems we have previously analyzed. PdxA is found in a few species within both archaea and Firmicutes, but that enzyme catalyzes a hydrogenation reaction, with the enzyme catalyzing the actual ring condensation reaction (PdxJ) completely absent in those cases. Our analysis further appears to confirm that the DXP-independent pathway represents the ancestral pathway. In addition to nearly all archaea, several deep-branching bacterial clades (Thermotogales, Chloroflexi, Deinococcales, Firmicutes) also use this pathway.

It was previously suggested that the DXP-dependent pathway arose within proteobacteria [107], but the fact that all members of several deep-branching clades use this pathway suggests it may actually have been an earlier innovation. The observed distribution of both pathways is difficult to explain, however. That the pathways are mutually exclusive at the clade level requires either early HGT to progenitors of clades, extensive gene transfer between select clades after they had diverged, or extensive transfer within clades that can take the appearance of genes sweeping through the population [17]. In any of these cases there is no obvious explanation for why transfer would have been restricted to occurring only between select clades, nor is a selective advantage apparent.

An alternative explanation would be (possibly repeated) convergent evolution early in the divergences of clades. This explanation has some appeal, as the key enzymes in both pathways, PdxA/J and PdxS/T, are in fact very similar in their 3-D structure and in the sequence of local functional group transformations that make up the respective condensation reactions [101]. However, even for convergent evolution we are lacking a good explanation for why only some clades would pervasively adopt this new strategy, while others did not.

If the evolutionary sequences and driving forces are not clear, we can at least identify features of the two pathways that would have facilitated the transition between them. The shared fold structure and similarity in reaction mechanisms, but low sequence similarity, between PdxA/J and PdxS/T have been interpreted to mean that they represent convergent discoveries [101]. However, it is also possible that PdxA/J emerged from PdxS/T and that both have been under strong selection pressure, causing their sequences to diverge strongly.

Another common feature between pathways is that they both start from intermediates within the pentose phosphate pathway. The key exception is DXP itself, which is the product of the first committed reaction in the DOXP pathway of terpenoid backbone synthesis. Archaea exclusively use the alternate mevalonate (MVA) pathway to synthesize terpenoids, while bacteria use both the MVA and DOXP pathways [108], possibly providing a partial explanation for why the DXP-dependent pathway to vitamin B6 emerged only within bacteria and not archaea.

Finally, the other reactions in the DXP-dependent pathway that lead up to the condensation sequence catalyzed by PdxA/J represent common and widely used metabolic reactions catalyzed by members of highly diversified enzyme families. The reaction sequence connecting 4-erythrophosphate to phospho-4-hydroxy-threonine, the input to the PdxA/J-catalyzed ring condensation sequence, consists of a hydration/reduction of an aldehyde to a carboxyl group, a subsequent dehydrogenation of an alcohol to a carbonyl group, and finally the reductive amination of that carbonyl group. Especially in an earlier era of more promiscuous catalysts, this pathway could thus well have been recruited en bloc into the emergent DXP-dependent pathway. The selective advantage of this adaptation, and the way it might have led to the peculiar distribution of both pathways, remain to be explained, however.

Quinones

The main component of the quinone pool in A. aeolicus was determined to be 2-demethylmenaquinone-7 (DMK-7) [109]. Other Aquificales, including its close relative H. thermophilus, had previously been found to use 2-methylthiomenaquinone-7 [110–112]. Menaquinone (MK) has significantly lower redox potential than ubiquinone (UQ), and, based on distributions of these two quinone types both across the tree of life [113, 114] and within clades known to bridge the anaerobic-aerobic domains, UQ was suggested to have emerged with the rise of atmospheric oxygen [115]. Membrane-dissolved quinones exchange electrons directly with the fumarate/succinate redox couple, which is respectively an electron acceptor or donor depending on the direction of this reaction [113, 116]. The possible emergence of the higher potential UQ with the rise of oxygen may thus have allowed the fumarate-succinate interconversion to reverse to the oxidative direction, further facilitating reversal of the TCA cycle as a whole. Generally, whereas reduced MK is easily oxidized in the presence of oxygen, disrupting electron flow into biosynthesis, reduced UQ is stable in the presence of oxygen [115]. The redox potential of the de-methylated version of menaquinone, DMK, lies between that of MK and UQ, possibly reflecting the microaerophilic character of A. aeolicus [109].

Nucleotides

Assignment of biosynthetic pathways to pyrimidines and purines was mostly unambiguous in A. aeolicus. At the substrate level there is little major variation in the biosynthesis of these compounds [117], and the genome of A. aeolicus shows complete gene sets for their synthesis [11]. There is some ambiguity in the interconversion between differently substituted purines and pyrimidines due to the well-known broad substrate affinity of many of the enzymes involved (e.g. [118, 119]). We therefore did not significantly modify the conservative broad assignments made by SEED in this sub-network. Experimental studies would probably be needed to elucidate the fine scale activity/regulation of these reactions if it were deemed important for a highly quantitative model. We again suggest that the broad affinity of enzymes interconverting differently substituted nucleobases indicates that the lower mass flux of these reactions (for example compared to the rTCA cycle) significantly reduces the benefit relative to the cost of using multiple more-specific enzymes.

There are a few noteworthy details in the biosynthetic pathways of nucleotides. The initial synthesis of the IMP backbone involves several steps in which formyl groups are incorporated, which can proceed either through an ATP mediated addition of free formate, or through donation of the formyl group by N10-formyl THF [117]. Archaea that possess tetrahydromethanopterin (H4MPT) rather than tetrahydrofolate (THF) as their C1 carrier use free formate in purine synthesis because H4MPT is not a good donor of formyl groups [42, 120]. Most other organisms use THF to transfer formyl groups in purine synthesis [117], while E. coli was found to possess both mechanisms [121]. A. aeolicus follows these trends and uses THF as the formyl donor during purine synthesis.

Another minor variation in the synthesis of purines involves the carboxylation of aminoimidazole ribonucleotide (AIR). Like other bacteria, A. aeolicus uses a 2-step incorporation of HCO3⁻ involving ATP hydrolysis for this reaction. By contrast, higher organisms use a 1-step incorporation of CO2 in an ATP free system, in which the enzyme is moreover often fused to the enzyme catalyzing the subsequent reaction [117]. Similar to enzyme replacements we have seen in other metabolic sub-systems, this may represent another adaptation selected because it improves both thermodynamic efficiency and pathway throughput, in this case also shifting dependence from HCO3⁻ to CO2 as a secondary effect.

Pyrimidine synthesis in A. aeolicus again reflects the primitive nature of its metabolism. In the first reaction in this pathway carbamoyl phosphate is synthesized from glutamine, HCO3⁻, and ATP. Experimental studies showed that in A. aeolicus this three-part reaction is catalyzed by a heterotrimer enzyme with relatively inefficient coupling between the subunits [122]. By contrast, in E. coli two of those subunits are fused together, resulting in a heterodimer enzyme, while in mammals all three subunits plus the enzymes for the subsequent reactions to carbamoyl-aspartate and dihydroorotate are fused together into one large single subunit enzyme [123]. Paralleling the suggested ancestry of ATP citrate lyase, the collection of observations about pyrimidine synthesis suggests that the heterotrimer carbamoyl-phosphate synthetase of A. aeolicus is more closely related to the ancestral enzyme for this reaction [122, 123]. However, while the cost of improving kinetics through gene fusion is lower than in other cases of duplication and divergence, the lower mass flux density of pyrimidine synthesis relative to core carbon fixation also reduces the benefit of fusion. This may help explain why these genes are also not fused in many other bacteria. For A. aeolicus another reason that fusion did not take place may be that the heterotrimer structure provides additional stability under hyperthermophilic conditions [123].

Cellular encapsulation

The cellular encapsulation of A. aeolicus consists of three main components: phospholipid membranes, a peptidoglycan cell wall, and lipopolysaccharide. A. aeolicus has the full gene complement for the standard diaminopimelate-based variant of peptidoglycan synthesis that is common to Gram-negative bacteria [124], but leaves substantial gaps within lipopolysaccharide synthesis pathways. These pathways remain an important area of experimental study, as they have required a large number of gap-fills for which we lacked overall context in the curation process. Of the three components of encapsulation, the composition of phospholipids contains the most information on the ecology of A. aeolicus.

Lipid biosynthesis

Lipid metabolism represents the single largest sub-system within the reconstructed network of A. aeolicus, containing nearly 200 out of ∼760 reactions. As mentioned, this is partly due to the fact that we explicitly represent each reaction within this sub-system, and partly due to the fact that A. aeolicus has a complex and diverse lipid composition (see Supplementary Table I) [125]. However, the elongation of all fatty acid chains is a polymerization sequence in which 2-carbon units (from acetyl-CoA) are added, and then reduced, through a repeated sequence of the same 7 reactions catalyzed by the same 8 enzymes, with only the length of the fatty acid tail away from the reaction site varying [37]. Much of the size of this sub-network thus reflects representation in the model rather than the associated genome content.

In addition, the fatty acid sub-network is further expanded due to the fact that A. aeolicus uses fatty acid chains that contain methyl groups, cyclopropane rings, and unsaturated bonds at different positions [125]. Each of these different substitutions is introduced at a different point during the elongation process, resulting in an expanding set of intermediates that is tracked in the network prior to the output of chains of different lengths and substitutions into the final lipid assembly process.

Finally, the lipid content of A. aeolicus is unusual among bacteria for containing both phospho-ester and phospho-ether lipids [125]. In general, most bacteria use fatty acid-based ester lipids, while archaea use isoprenoid-based ether lipids [126]. The additional use of fatty acid ether lipids by A. aeolicus thus represents a sort of intermediary strategy.

Altogether, lipid biosynthesis can be thought of as a compact and highly modular system that distributes 2-carbon units over a set of states of different lengths and substitution patterns that can be varied depending on environmental context. Methyl group side-chains, unsaturated bonds or cyclopropane rings can be used to modify the fluidity of the membrane, while cyclopropane rings may also be used to adapt to lower pH [127]. The linkage of fatty acids to the glycerol backbone can in turn be varied between ether or ester bonds to modify the permeability of the membrane [128]. Apart from basic inputs and final assembly (isoprenoids vs. fatty acids, ethers vs. esters), the regulation of lengths and substitution patterns appears to be the main factor permitting wide variability in lipid composition. The diverse and varied composition of A. aeolicus lipids, including both ester and ether linkages, appears to reflect the "stressed" hyperthermophilic conditions of the hydrothermal vents and springs where it lives.

Energy metabolism

Energy metabolism has the highest mass flux density of all cellular processes, because it generates the global energetic driving forces (both reductants and ATP) for all subsequent metabolic interconversion. The energy metabolism of A. aeolicus represents one of its most studied aspects [5], and as we will demonstrate, the effects of kinetic optimization can be seen throughout.

An autotroph such as Aquifex should be more sensitive to energy-metabolism optimization than more commonly-studied heterotrophic models, because unlike a heterotroph, which can obtain energy from organics and may use fermentation, autotrophs obtain all energy through purely respiratory interconversion of inorganics. The free energy density available from inorganic redox couples may also be as much as an order of magnitude lower than that provided by sunlight used by photoautotrophs. Together these effects should create a significant selective advantage for improving the kinetics of energy metabolism in chemoautotrophs, by improving growth rate.

The apparent effects of improving kinetics can be seen at all levels of the energy metabolism of A. aeolicus. Many respiratory proteins are organized in polycistronic operons in the genome, and are subsequently assembled into super-complexes once functionally expressed. Moreover, whereas interactions between components of other known respiratory super-complexes are generally rather weak, in A. aeolicus these super-complexes are found to be exceptionally stable [5, 129, 130]. While this is probably in part an adaptation to growth at high temperature, it also likely improves the overall kinetics of energy conversion sequences due to increased effective concentration of intermediates within each sequence.

A. aeolicus has a versatile and diverse energy metabolism. Molecular hydrogen is sufficient as a sole electron donor, although it can in some cases be supplemented by hydrogen sulfide (H2S). A variety of compounds can act as terminal electron acceptors. Molecular oxygen (O2) is the major electron acceptor, and under conditions so far tested it appears to be obligatory. The metabolic network also indicates that nitrate (NO3⁻) can be used as electron acceptor, with the complete sequence for conversion to ammonia (NH3) present [11]. While this is in accordance with observations for other Aquificales [131], nitrate has so far not been reported as an electron acceptor for A. aeolicus [3, 5].

An observation that H2S cannot replace H2 as sole electron donor may be explained by the existence of tightly-coupled respiratory super-complexes that prevent uptake of intermediates in the respiratory sequence [130, 132]. Electrons are transferred from H2 into metabolism at three main points, Hydrogenases I, II & III, from where they enter the membrane quinone pool (Hydrogenase I, II) or are directly transferred to ferredoxin in the cytoplasm (soluble Hydrogenase III) [133]. In the hydrogenase I respiratory chain, the quinones subsequently transfer the electrons to a cytochrome bc1 complex, which in turn reduces O2 to water [129, 134, 135]. In the hydrogenase II respiratory chain, the quinones instead transfer the electrons to the sulfur reductase complex, which in turn reduces elemental sulfur (and perhaps S4O6²⁻) and produces H2S [136, 137]. H2S can then subsequently be re-oxidized by a sulfide quinone reductase complex that transfers the electrons through quinones (and a cytochrome bc1 complex) into oxygen, which is reduced to water [130, 132, 136].

Sulfur has a dynamic and varied role in the energetics of A. aeolicus, likely in part because of its ability to exist in a wide range of oxidation states [138]. In addition to acting as electron donor (H2S, possibly S⁰), several sulfur compounds are capable of acting as electron acceptors. Elemental sulfur (S⁰) and tetrathionate (S4O6²⁻) act as electron acceptors at the hydrogenase I complex [137]. Thiosulfate (S2O3²⁻) is in turn putatively oxidized by the Sox multi-enzyme system [5, 139, 140], which has been described in another member of the Aquificaceae, H. thermophilus [141]. The Sox system has also been described in other thermophiles that share with A. aeolicus both the equivalent set of Sox genes in their genomes, and the characteristic of producing cytoplasmic sulfur globules under certain growth conditions [142]. Finally, A. aeolicus also possesses several rhodanese complexes possibly associated with cyanide detoxification [143, 144], as well as an ATP sulfurylase [145] possibly involved in sulfite oxidation [5]. However, due to the complexity of sulfur chemistry, its roles in A. aeolicus energy metabolism remain to be fully mapped out [5].

The energy metabolism of A. aeolicus connects to its biosynthetic pathways mainly through the membrane quinone pool (with DMK-7 as the main component; see the Quinones section above). Quinones reduce other biosynthetic reductant carriers such as nicotinamides (NAD, NADP), flavins (FAD), and ferredoxins, while in special cases also directly driving metabolic conversions (such as fumarate reduction to succinate). Biosynthesis thus produces a net flux of electrons into the cell, which gives A. aeolicus its reducing character. This electron influx is distributed across acceptors that range in character from purely anabolic (CO2 during fixation and subsequent biosynthesis) to purely energetic (O2), with some playing intermediary roles (some reduced nitrogen and sulfur are needed for biosynthesis). Shifts in character of these latter intermediary forms can be seen from the fact that A. aeolicus growing on elemental sulfur will emit H2S upon reaching the stationary phase of growth [137]. Finally, global charge balance in the cell is maintained by combining this electron influx with a proton influx, which is captured by the ATP synthase to generate ATP [146].

Outlook and future directions

We have used PMA to reconstruct the whole-genome metabolic network of A. aeolicus, and have shown that it uses the likely ancestral pathways within many sub-systems. By reconstructing the evolutionary sequences among pathways, we were also able to analyze the evolutionary driving forces that shaped various sub-systems. We have highlighted throughout the way selection for improved kinetics and/or improved thermodynamic efficiency has shaped the network. Comparing different sub-networks reveals a tradeoff in the costs versus the benefits of these innovations, which apparently depends strongly on the relative mass flux density of sub-systems. Extending these analyses to other metabolic sub-systems and to the evolutionary history of other organisms will improve our understanding of how tradeoffs between performance gains and their associated costs generally contributed to fitness in the earliest stages of cellular life.

Acknowledgments

Parts of this work were performed under support from NSF FIBR grant nr. 0526747, "The Emergence of Life: From Geochemistry to the Genetic Code". RB was further supported by an SFI Omidyar Fellowship at the Santa Fe Institute. ES thanks Insight Venture Partners for support.

[1] Rogier Braakman and Eric Smith. The compositional and evolutionary logic of metabolism. Physical Biology, 10:011001, 2013.
[2] Rogier Braakman and Eric Smith. The emergence and early evolution of biological carbon fixation. PLoS Comp. Biol., 8:e1002455, 2012.
[3] Robert Huber and Wolfgang Eder. Aquificales. In Martin Dworkin, Stanley Falkow, Eugene Rosenberg, Karl-Heinz Schleifer, and Erko Stackebrandt, editors, The Prokaryotes, pages 925–938. Springer New York, 2006.
[4] Anna-Louise Reysenbach and Everett Shock. Merging genomes with geochemistry in hydrothermal ecosystems. Science, 296:1077–1082, 2002.
[5] Marianne Guiral, Laurence Prunetti, Clément Aussignargues, Alexandre Ciaccafava, Pascale Infossi, Marianne Ilbert, Elisabeth Lojou, and Marie-Thérèse Giudici-Orticoni. The hyperthermophilic bacterium Aquifex aeolicus: From respiratory pathways to extremely resistant enzymes and biotechnological applications. Advances in microbial physiology, 61:125, 2012.
[6] and Arthur M. Lesk. The relation between the divergence of sequence and structure in proteins. EMBO J., 5:823–826, 1986.
[7] Alexandra M Schnoes, Shoshana D Brown, Igor Dodevski, and Patricia C Babbitt. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLoS computational biology, 5(12):e1000605, 2009.
[8] Predrag Radivojac, Wyatt T Clark, Tal Ronnen Oron, Alexandra M Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher Funk, Karin Verspoor, Asa Ben-Hur, et al. A large-scale evaluation of computational protein function prediction. Nature methods, 2013.
[9] Adam M. Feist, Markus J. Herrgard, Ines Thiele, Jennie L. Reed, and Bernhard O. Palsson. Reconstruction of biochemical networks in microorganisms. Nature Reviews Microbiol., 7:129–143, 2009.
[10] C. S. Henry, J. F. Zinner, M. P. Cohoon, and R. L. Stevens. iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biol., 10:R69, 2009.
[11] Gerard Deckert, Patrick V. Warren, , William G. Young, Anna L. Lenox, David E. Graham, Ross Overbeek, Marjory A. Snead, Martin Keller, Monette Aujay, Robert Huber, Robert A. Feldman, Jay M. Short, Gary J. Olsen, and Ronald V. Swanson. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature, 392:353–358, 1998.
[12] S. Burggraf, G. J. Olsen, K. O. Stetter, and C. R. Woese. A phylogenetic analysis of Aquifex pyrophilus. Syst. Appl. Microbiol, 15:352–356, 1992.
[13] Norman R. Pace. A molecular view of microbial diversity and the biosphere. Science, 276:734–740, 1997.
[14] Emma Griffiths and Radhey S Gupta. Signature sequences in diverse proteins provide evidence for the late divergence of the order Aquificales. International Microbiology, 7(1):41–52, 2004.
[15] Bastien Boussau, Laurent Guéguen, and Manolo Gouy. Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of Aquificales in the phylogeny of bacteria. BMC Evol. Biol., 8:272:1–18, 2008.
[16] SM Sievert and C Vetriani. Chemoautotrophy at deep-sea vents: Past, present, and future. Oceanography, 25(1):218–233, 2012.

[17] Martin F Polz, Eric J Alm, and William P Hanage. Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends in Genetics, 2013.
[18] Olga Zhaxybayeva, Kristen S. Swithers, Pascal Lapierre, Gregory P. Fournier, Derek M. Bickhart, Robert T. DeBoy, Karen E. Nelson, Camilla L. Nesbø, W. Ford Doolittle, J. Peter Gogarten, and Kenneth M. Noll. On the chimeric nature, thermophilic origin, and phylogenetic placement of the Thermotogales. PNAS, 106:5865–5870, 2009.
[19] Roy A. Jensen. Enzyme recruitment in evolution of new function. Annu. Rev. Microbiol., 30:409–425, 1976.
[20] Patrick J O'Brien and Daniel Herschlag. Catalytic promiscuity and the evolution of new enzymatic activities. Chemistry & biology, 6(4):R91–R105, 1999.
[21] Shelley D. Copley. Enzymes with extra talents: moonlighting functions and catalytic promiscuity. Curr. Opin. Chem. Biol., 7:265–272, 2003.
[22] Olga Khersonsky and Dan S. Tawfik. Enzyme promiscuity: A mechanistic and evolutionary perspective. Annu. Rev. Biochem., 79:471–505, 2010.
[23] Dan S Tawfik et al. Messy biology and the origins of evolutionary innovations. Nature chemical biology, 6(10):692, 2010.
[24] Juhan Kim, Jamie P. Kershner, Yehor Novikov, Richard K. Shoemaker, and Shelley D. Copley. Three serendipitous pathways in E. coli can bypass a block in pyridoxal-5′-phosphate synthesis. Mol. Sys. Biol., 6:436:1–13, 2010.
[25] Olga Khersonsky, Sergey Malitsky, Ilana Rogachev, and Dan S Tawfik. Role of chemistry versus substrate binding in recruiting promiscuous enzyme functions. Biochemistry, 50(13):2683–2690, 2011.
[26] Hojung Nam, Nathan E Lewis, Joshua A Lerman, Dae-Hee Lee, Roger L Chang, Donghyuk Kim, and Bernhard O Palsson. Network context and selection in the evolution to enzyme specificity. Science, 337(6098):1101–1104, 2012.
[27] Arren Bar-Even, Elad Noor, Yonatan Savir, Wolfram Liebermeister, Dan Davidi, Dan S. Tawfik, and Ron Milo. The moderately efficient enzyme: Evolutionary and physicochemical trends shaping enzyme parameters. Biochem., 50:4402–4410, 2011.
[28] Marco Fondi, Matteo Brilli, Giovanni Emiliani, Donatella Paffetti, and Renato Fani. The primordial metabolism: an ancestral interconnection between leucine, arginine, and lysine biosynthesis. BMC Evol. Biol., 7(Suppl 2):S3, 2007.
[29] Juli Peretó. Out of fuzzy chemistry: from prebiotic chemistry to metabolic networks. Chem. Soc. Rev., 10.1039/C2CS35054H:1–10, 2012. DOI: 10.1039/c2cs35054h.
[30] Edward M Marcotte, Matteo Pellegrini, Ho-Leung Ng, Danny W Rice, Todd O Yeates, and David Eisenberg. Detecting protein function and protein-protein interactions from genome sequences. Science, 285(5428):751–753, 1999.
[31] BB Snel, PP Bork, MM Huynen, et al. Genome evolution-gene fusion versus gene fission. Trends in genetics, 16:9–11, 2000.
[32] Sarah K Kummerfeld, Sarah A Teichmann, et al. Relative rates of gene fusion and fission in multi-domain proteins. Trends in genetics: TIG, 21(1):25, 2005.
[33] Thomas Pfeiffer, Stefan Schuster, and Sebastian Bonhoeffer. Cooperation and competition in the evolution of ATP-producing pathways. Science, 292(5516):504–507, 2001.
[34] Christopher S. Henry, Matthew DeJongh, Aaron A. Best, Paul M. Frybarger, Ben Linsay, and Rick L. Stevens. High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotech, 28:977–982, 2010.
[35] The UniProt Consortium. Ongoing and future developments at the universal protein resource. Nucleic Acids Research, 39(suppl 1):D214–D219, 2011.
[36] Ying Zhang, Ines Thiele, Dana Weekes, Zhanwen Li, Lukasz Jaroszewski, Krzysztof Ginalski, Ashley M. Deacon, John Wooley, Scott A. Lesley, Ian A. Wilson, Bernhard Palsson, Andrei Osterman, and Adam Godzik. Three-dimensional structural view of the central metabolic network of Thermotoga maritima. Science, 325:1544–1549, 2009.
[37] Stephen W White, Jie Zheng, Yong-Mei Zhang, and Charles O Rock. The of type II fatty acid biosynthesis. Annu. Rev. Biochem., 74:791–831, 2005.
[38] Nestor M Carballeira, Morayama Reyes, Anthony Sostre, Heshu Huang, MF Verhagen, and MW Adams. Unusual fatty acid compositions of the hyperthermophilic archaeon Pyrococcus furiosus and the bacterium Thermotoga maritima. Journal of bacteriology, 179(8):2766–2768, 1997.
[39] Monika Beh, Gerhard Strauss, Robert Huber, Karl-Otto Stetter, and Georg Fuchs. Enzymes of the reductive citric acid cycle in the autotrophic eubacterium Aquifex pyrophilus and in the archaebacterium Thermoproteus neutrophilus. Archives of Microbiology, 160:306–311, 1993.
[40] Michael Hügler, Harald Huber, Stephen J. Molyneaux, Costantino Vetriani, and Stefan M. Sievert. Autotrophic CO2 fixation via the reductive tricarboxylic acid cycle in different lineages within the phylum Aquificae: evidence for two ways of citrate cleavage. Env. Microbiology, 9:81–92, 2007.
[41] Eric Smith and Harold J. Morowitz. Universality in intermediary metabolism. Proc. Nat. Acad. Sci. USA, 101:13168–13173, 2004. SFI preprint # 04-07-024.
[42] B. Edward H. Maden. Tetrahydrofolate and tetrahydromethanopterin compared: functionally distinct carriers in C1 metabolism. Biochem. J., 350:609–629, 2000.
[43] S W Ragsdale, J E Clark, L G Ljungdahl, L L Lundie, and H L Drake. Properties of purified carbon monoxide dehydrogenase from Clostridium thermoaceticum, a nickel, iron-sulfur protein. Journal of Biological Chemistry, 258(4):2364–2369, 1983.
[44] Patrick Stover and Verne Schirch. The metabolic role of leucovorin. Trends in Biochemical Sciences, 18(3):102–106, 1993.
[45] Teng Huang and Verne Schirch. Mechanism for the coupling of ATP hydrolysis to the conversion of 5-formyltetrahydrofolate to 5,10-methenyltetrahydrofolate. Journal of Biological Chemistry, 270(38):22296–22300, 1995.
[46] D. von Wettstein, S. Gough, and C. G. Kannangara. Chlorophyll biosynthesis. The Plant Cell Online, 7(7):1039–1057, 1995.
[47] Steven A. Benner, Andrew D. Ellington, and Andreas Tauer. Modern metabolism as a palimpsest of the RNA world. Proc. Nat. Acad. Sci. USA, 18:7054–7058, 1989.
[48] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel enzyme, citryl-CoA synthetase, catalysing the first step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Molecular Microbiology, 52:751–761, 2004.
[49] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel enzyme, citryl-CoA lyase, catalysing the second step of the citrate cleavage reaction in Hydrogenobacter thermophilus TK-6. Molecular Microbiology, 52:763–770, 2004.
[50] Miho Aoshima, Masaharu Ishii, and Yasuo Igarashi. A novel biotin protein required for reductive carboxylation of 2-oxoglutarate by isocitrate dehydrogenase in Hydrogenobacter thermophilus TK-6. Mol. Microbiol., 51:791–798, 2004.
[51] Miho Aoshima and Yasuo Igarashi. A novel oxalosuccinate-forming enzyme involved in the reductive carboxylation of 2-oxoglutarate in Hydrogenobacter thermophilus TK-6. Mol. Microbiol., 62:748–759, 2006.
[52] Michael Hügler and S. M. Sievert. Beyond the Calvin cycle: Autotrophic carbon fixation in the ocean. Ann. Rev. Marine Sci., 3:261–289, 2011.
[53] Miho Aoshima and Yasuo Igarashi. Nondecarboxylating and decarboxylating isocitrate dehydrogenases: oxalosuccinate reductase as an ancestral form of isocitrate dehydrogenase. J. Bacteriol., 190:2050–2055, 2008.
[54] Arren Bar-Even, Avi Flamholz, Elad Noor, and Ron Milo. Thermodynamic constraints shape the structure of carbon fixation pathways. Biochimica et Biophysica Acta (BBA) - Bioenergetics, 1817(9):1646–1659, 2012.
[55] S. L. Miller and D. Smith-Magowan. The thermodynamics of the Krebs cycle and related compounds. J. Phys. Chem. Ref. Data., 19:1049–1073, 1990.
[56] Richard Wolfenden, Charles A. Lewis Jr., and Yang Yuan. Kinetic challenges facing oxalate, malonate, acetoacetate and oxaloacetate decarboxylases. J. Am. Chem. Soc., 133:5683–5685, 2011.
[57] B. B. Buchanan and D. I. Arnon. A reverse Krebs cycle in photosynthesis: Consensus at last. Photosynth. Res., 24:47–53, 1990.
[58] Sebastian Lücker, Michael Wagner, Frank Maixner, Eric Pelletier, Hanna Koch, Benoit Vacherie, Thomas Rattei, Jaap S. Sinninghe Damste, Eva Spieck, Denis Le Paslier, and Holger Daims. A Nitrospira metagenome illuminates the physiology and evolution of globally important nitrite-oxidizing bacteria. Proc. Nat. Acad. Sci. USA, 107:13479–13484, 2010.
[59] Michael Hügler, Carl O. Wirsen, Georg Fuchs, Craig D. Taylor, and Stefan M. Sievert. Evidence for autotrophic CO2 fixation via the reductive tricarboxylic acid cycle by members of the ε subdivision of proteobacteria. J. Bacteriology, 187:3020–3027, 2005.
[60] Georg Fuchs. Alternative pathways of carbon dioxide fixation: Insights into the early evolution of life? Ann. Rev. Microbiol., 65(1):631–658, 2011.
[61] G Scapin and JS Blanchard. Enzymology of bacterial lysine biosynthesis. Advances in enzymology and related areas of , 72:279, 1998.
[62] Ronald Bentley and E. Haslam. The shikimate pathway: a metabolic tree with many branches. Critical Reviews in Biochemistry and Molecular Biology, 25(5):307–384, 1990.
[63] H Edwin Umbarger and Barbara Brown. Threonine deamination in Escherichia coli II.: Evidence for two L-threonine deaminases. Journal of bacteriology, 73(1):105, 1957.
[64] Nyles W Charon, Russell C Johnson, and David Peterson. Amino acid biosynthesis in the spirochete Leptospira: evidence for a novel pathway of isoleucine biosynthesis. Journal of bacteriology, 117(1):203–211, 1974.
[65] Hai Xu, Yuzhen Zhang, Xiaokui Guo, Shuangxi Ren, Andreas A Staempfli, Juishen Chiao, Weihong Jiang, and Guoping Zhao. Isoleucine biosynthesis in Leptospira interrogans serotype lai strain 56601 proceeds via a threonine-independent pathway. Journal of bacteriology, 186(16):5400–5409, 2004.
[66] B Eikmanns, D Linder, and Rudolf K Thauer. Unusual pathway of isoleucine biosynthesis in Methanobacterium thermoautotrophicum. Archives of microbiology, 136(2):111–113, 1983.
[67] Michel Hochuli, Heiko Patzelt, Dieter Oesterhelt, Kurt Wüthrich, and Thomas Szyperski. Amino acid biosynthesis in the halophilic archaeon Haloarcula hispanica. Journal of bacteriology, 181(10):3226–3237, 1999.
[68] David M Howell, Huimin Xu, and Robert H White. (R)-citramalate synthase in methanogenic archaea. Journal of bacteriology, 181(1):331–333, 1999.
[69] Randy M Drevland, Abdul Waheed, and David E Graham. Enzymology and evolution of the pyruvate pathway to 2-oxobutyrate in Methanocaldococcus jannaschii. Journal of bacteriology, 189(12):4391–4400, 2007.
[70] S Schäfer, C Barkowski, and G Fuchs. Carbon assimilation by the autotrophic thermophilic archaebacterium Thermoproteus neutrophilus. Archives of microbiology, 146(3):301–308, 1986.
[71] Ulrike Jahn, Harald Huber, Wolfgang Eisenreich, Michael Hügler, and Georg Fuchs. Insights into the autotrophic CO2 fixation pathway of the archaeon Ignicoccus hospitalis: comprehensive analysis of the central carbon metabolism. Journal of bacteriology, 189(11):4108–4119, 2007.
[72] Xueyang Feng, Housna Mouttaki, Lu Lin, Rick Huang, Bing Wu, Christopher L Hemme, Zhili He, Baichen Zhang, Leslie M Hicks, Jian Xu, et al. Characterization of the central metabolic pathways in Thermoanaerobacter sp. strain X514 via isotopomer-assisted metabolite analysis. Applied and environmental microbiology, 75(15):5001–5008, 2009.
[73] Kuo-Hsiang Tang, Xueyang Feng, Wei-Qin Zhuang, Lisa Alvarez-Cohen, Robert E Blankenship, and Yinjie J Tang. Carbon flow of heliobacteria is related more to clostridia than to the green sulfur bacteria. Journal of Biological Chemistry, 285(45):35104–35112, 2010.
[74] Yinjie J Tang, Shan Yi, Wei-Qin Zhuang, Stephen H Zinder, Jay D Keasling, and Lisa Alvarez-Cohen. Investigation of carbon metabolism in Dehalococcoides ethenogenes strain 195 by use of isotopomer and transcriptomic analyses. Journal of bacteriology, 191(16):5224–5231, 2009.
[75] Xueyang Feng, Kuo-Hsiang Tang, Robert E Blankenship, and Yinjie J Tang. Metabolic flux analysis of the mixotrophic metabolisms in the green sulfur bacterium Chlorobaculum tepidum. Journal of Biological Chemistry, 285(50):39544–39550, 2010.

[76] Bing Wu, Baichen Zhang, Xueyang Feng, Jacob R Rubens, Rick Huang, Leslie M Hicks, Himadri B Pakrasi, and Yinjie J Tang. Alternative isoleucine synthesis pathway in cyanobacterial species. Microbiology, 156(2):596–602, 2010.
[77] Carla Risso, Stephen J Van Dien, Amber Orloff, Derek R Lovley, and Maddalena V Coppi. Elucidation of an alternate isoleucine biosynthesis pathway in Geobacter sulfurreducens. Journal of bacteriology, 190(7):2266–2274, 2008.
[78] Kuo-Hsiang Tang, Xueyang Feng, Yinjie J Tang, and Robert E Blankenship. Carbohydrate metabolism and carbon fixation in Roseobacter denitrificans OCh114. PLoS One, 4(10):e7233, 2009.
[79] James B McKinlay and Caroline S Harwood. Carbon dioxide fixation as a central redox cofactor recycling mechanism in bacteria. Proceedings of the National Academy of Sciences, 107(26):11669–11675, 2010.
[80] Gary Xie, Christian Forst, Carol Bonner, Roy A Jensen, et al. Significance of two distinct types of tryptophan synthase beta chain in bacteria, archaea and higher plants. Genome Biol, 3(1), 2002.
[81] Juergen Wiegel, Ralph Tanner, Fred A Rainey, et al. An introduction to the family Clostridiaceae. Prokaryotes, 4:654–678, 2006.
[82] Ena Urbach, David J Scanlan, Daniel L Distel, John B Waterbury, and Sallie W Chisholm. Rapid diversification of marine picophytoplankton with dissimilar light-harvesting structures inferred from sequences of Prochlorococcus and Synechococcus (cyanobacteria). Journal of molecular evolution, 46(2):188–201, 1998.
[83] Brian Palenik, B Brahamsha, FW Larimer, M Land, L Hauser, P Chain, J Lamerdin, W Regala, EE Allen, J McCarren, et al. The genome of a motile marine Synechococcus. Nature, 424(6952):1037–1042, 2003.
[84] Gabrielle Rocap, Frank W Larimer, Jane Lamerdin, Stephanie Malfatti, Patrick Chain, Nathan A Ahlgren, Andrae Arellano, Maureen Coleman, Loren Hauser, Wolfgang R Hess, et al. Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature, 424(6952):1042–1047, 2003.
[85] Marina Omelchenko, Yuri Wolf, Elena Gaidamakova, Vera Matrosova, Alexander Vasilenko, Min Zhai, Michael Daly, Eugene Koonin, and Kira Makarova. Comparative genomics of Thermus thermophilus and Deinococcus radiodurans: divergent routes of adaptation to thermophily and radiation resistance. BMC Evolutionary Biology, 5(1):57, 2005.
[86] HE Umbarger. Amino acid biosynthesis and its regulation. Annual review of biochemistry, 47(1):533–606, 1978.
[87] Herbert Grimminger and HE Umbarger. Acetohydroxy acid synthase I of Escherichia coli: purification and properties. Journal of bacteriology, 137(2):846–853, 1979.
[88] Z Barak, David M Chipman, and Natan Gollop. Physiological implications of the specificity of acetohydroxy acid synthase isozymes of enteric bacteria. Journal of bacteriology, 169(8):3750–3756, 1987.
[89] Vijayasarathy Srinivasan and Harold J. Morowitz. The canonical network of autotrophic intermediary metabolism: Minimal metabolome of a reductive chemoautotroph. Biol. Bulletin, 216:126–130, 2009.
[90] Goro Kikuchi. The glycine cleavage system: Composition, reaction mechanism, and physiological significance. Mol. Cell. Biochem., 1:169–187, 1973.
[91] John E Cronan, Xin Zhao, and Yanfang Jiang. Function, attachment and synthesis of lipoic acid in Escherichia coli. Advances in microbial physiology, 50:103–146, 2005.
[92] Xin Zhao, J Richard Miller, Yanfang Jiang, Michael A Marletta, and John E Cronan. Assembly of the covalent linkage between lipoic acid and its cognate enzymes. Chemistry & biology, 10(12):1293–1302, 2003.
[93] Quin H Christensen and John E Cronan. Lipoic acid synthesis: A new family of octanoyltransferases generally annotated as lipoate protein . Biochemistry, 49(46):10024–10036, 2010.
[94] Natalia Martin, Quin H Christensen, María C Mansilla, John E Cronan, and Diego de Mendoza. A novel two-gene requirement for the octanoyltransfer reaction of Bacillus subtilis lipoic acid biosynthesis. Molecular microbiology, 80(2):335–349, 2011.
[95] Quin H Christensen, Natalia Martin, Maria C Mansilla, Diego de Mendoza, and John E Cronan. A novel amidotransferase required for lipoic acid cofactor assembly in Bacillus subtilis. Molecular microbiology, 80(2):350–363, 2011.
[96] Squire J Booker. Unraveling the pathway of lipoic acid biosynthesis. Chemistry & biology, 11(1):10–12, 2004.
[97] Fatemah AM Hermes and John E Cronan. Scavenging of cytosolic octanoic acid by mutant LplA lipoate ligases allows growth of Escherichia coli strains lacking the LipB octanoyltransferase of lipoic acid synthesis. Journal of bacteriology, 191(22):6796–6803, 2009.
[98] Charles O Rock. Opening a new path to lipoic acid. Journal of bacteriology, 191(22):6782–6784, 2009.
[99] Quin H Christensen and John E Cronan. The Thermoplasma acidophilum LplA-LplB complex defines a new class of bipartite lipoate-protein ligases. Journal of Biological Chemistry, 284(32):21317–21326, 2009.
[100] Robert M Cicchillo, David F Iwig, A Daniel Jones, Natasha M Nesbitt, Camelia Baleanu-Gogonea, Matthew G Souder, Loretta Tu, and Squire J Booker. Lipoyl synthase requires two equivalents of S-adenosyl-L-methionine to synthesize one equivalent of lipoic acid. Biochemistry, 43(21):6378–6386, 2004.
[101] T Fitzpatrick, Nikolaus Amrhein, Barbara Kappes, Peter Macheroux, Ivo Tews, and Thomas Raschle. Two independent routes of de novo vitamin B6 biosynthesis: not that different after all. Biochem. J, 407:1–13, 2007.
[102] Sabrina M Austin and Thomas G Waddell. Prebiotic synthesis of vitamin B6-type compounds. Origins of Life and Evolution of the Biosphere, 29(3):287–296, 1999.
[103] Harold J Morowitz, Vijayasarathy Srinivasan, and Eric Smith. The Swiss army knife of biological catalysis: A compact toolkit of organic functional groups. Complexity, 11(3):9–10, 2006.
[104] Kristin E Burns, Yun Xiang, Cynthia L Kinsland, Fred W McLafferty, and Tadhg P Begley. Reconstitution and biochemical characterization of a new pyridoxal-5′-phosphate biosynthetic pathway. Journal of the American Chemical Society, 127(11):3682–3683, 2005.
[105] Thomas Raschle, Nikolaus Amrhein, and Teresa B Fitzpatrick. On the two components of pyridoxal 5′-phosphate synthase from Bacillus subtilis. Journal of Biological Chemistry, 280(37):32291–32300, 2005.

[106] Marilyn Ehrenshaft, Piotr Bilski, Ming Y Li, Colin F Chignell, and Margaret E Daub. A highly conserved sequence is a novel gene involved in de novo vitamin B6 biosynthesis. Proceedings of the National Academy of Sciences, 96(16):9374–9378, 1999.
[107] Gerhard Mittenhuber et al. Phylogenetic analyses and comparative genomics of vitamin B6 (pyridoxine) and pyridoxal phosphate biosynthesis pathways. Journal of molecular microbiology and biotechnology, 3(1):1–20, 2001.
[108] Yan Boucher and W Ford Doolittle. The role of lateral gene transfer in the evolution of isoprenoid biosynthesis pathways. Molecular microbiology, 37(4):703–716, 2000.
[109] Pascale Infossi, Elisabeth Lojou, Jean-Paul Chauvin, Gaetan Herbette, Myriam Brugna, and Marie-Thérèse Giudici-Orticoni. Aquifex aeolicus membrane hydrogenase for hydrogen biooxidation: Role of lipids and physiological partners in enzyme stability and activity. International Journal of Hydrogen Energy, 35:10778–10789, 2010. Indo-French Workshop on Biohydrogen: from Basic Concepts to Technology.
[110] Masaharu Ishii, Toshiyuki Kawasumi, Yasuo Igarashi, Tohru Kodama, and Yasuji Minoda. 2-methylthio-1,4-naphthoquinone, a unique sulfur-containing quinone from a thermophilic hydrogen-oxidizing bacterium, Hydrogenobacter thermophilus. Journal of bacteriology, 169(6):2380–2384, 1987.
[111] Seigo Shima and Ken-Ichiro Suzuki. Hydrogenobacter acidophilus sp. nov., a thermoacidophilic, aerobic, hydrogen-oxidizing bacterium requiring elemental sulfur for growth. International journal of systematic bacteriology, 43(4):703–708, 1993.
[112] R Stohr, Arne Waberski, Horst Völker, Brian J Tindall, and Michael Thomm. Hydrogenothermus marinus gen. nov., sp. nov., a novel thermophilic hydrogen-oxidizing bacterium, recognition of Calderobacterium hydrogenophilum as a member of the genus Hydrogenobacter and proposal of the reclassification of Hydrogenobacter acidophilus as Hydrogenobaculum acidophilum gen. nov., comb. nov., in the phylum 'Hydrogenobacter/Aquifex'. International journal of systematic and evolutionary microbiology, 51(5):1853–1862, 2001.
[113] W. Nitschke, D.M. Kramer, A. Riedel, and U. Liebl. From naphtho- to benzoquinones - (r)evolutionary reorganizations of electron transfer chains. In P. Mathis, editor, Photosynthesis: from Light to the Biosphere, vol. 1, pages 945–950. Kluwer Academic Press, Dordrecht, 1995.
[114] Michael Schütz, Myriam Brugna, Evelyne Lebrun, Frauke Baymann, Robert Huber, Karl-Otto Stetter, Günter Hauska, René Toci, Danielle Lemesle-Meunier, Pascale Tron, et al. Early evolution of cytochrome bc complexes. Journal of molecular biology, 300(4):663–675, 2000.
[115] Barbara Schoepp-Cothenet, Clément Lieutaud, Frauke Baymann, André Verméglio, Thorsten Friedrich, David M. Kramer, and Wolfgang Nitschke. Menaquinone as a pool quinone in a purple bacterium. Proc. Nat. Acad. Sci. USA, 106:8549–8554, 2005.
[116] Tina M Iverson, César Luna-Chavez, Gary Cecchini, and Douglas C Rees. Structure of the Escherichia coli fumarate reductase respiratory complex. Science, 284(5422):1961–1966, 1999.
[117] Yang Zhang, Mariya Morar, and Steven E Ealick. Structural biology of the purine biosynthetic pathway. Cellular and molecular life sciences, 65(23):3699–3724, 2008.
[118] Leon A. Heppel and R. J. Hilmoe. Purification and properties of 5-nucleotidase. Journal of Biological Chemistry, 188(2):665–676, 1951.
[119] Paul Berg and W. K. Joklik. Enzymatic phosphorylation of nucleoside diphosphates. Journal of Biological Chemistry, 210(2):657–672, 1954.
[120] Robert H White. Purine biosynthesis in the domain archaea without folates or modified folates. Journal of bacteriology, 179(10):3374–3377, 1997.
[121] Ariane Marolewski, John M Smith, and Stephen J Benkovic. Cloning and characterization of a new purine biosynthetic enzyme: a non-folate glycinamide ribonucleotide transformylase from E. coli. Biochemistry, 33(9):2531–2537, 1994.
[122] Anupama Ahuja, Cristina Purcarea, Hedeel I Guy, and David R Evans. A novel carbamoyl-phosphate synthetase from Aquifex aeolicus. Journal of Biological Chemistry, 276(49):45694–45703, 2001.
[123] Anupama Ahuja, Cristina Purcarea, Richard Ebert, Sharon Sadecki, Hedeel I Guy, and David R Evans. Aquifex aeolicus dihydroorotase association with aspartate transcarbamoylase switches on catalytic activity. Journal of Biological Chemistry, 279(51):53136–53144, 2004.
[124] Karl Heinz Schleifer and Otto Kandler. Peptidoglycan types of bacterial cell walls and their taxonomic implications. Bacteriological reviews, 36(4):407, 1972.
[125] Linda L Jahnke, Wolfgang Eder, Robert Huber, Janet M Hope, Kai-Uwe Hinrichs, John M Hayes, David J Des Marais, Sherry L Cady, and Roger E Summons. Signature lipids and stable carbon isotope analyses of Octopus Spring hyperthermophilic communities compared with those of Aquificales representatives. Applied and Environmental Microbiology, 67(11):5179–5189, 2001.
[126] Joseph W. Lengeler, Gerhart Drews, and Hans G. Schlegel. Biology of the Prokaryotes. Blackwell Science, New York, 1999.
[127] Yong-Mei Zhang and Charles O Rock. Membrane lipid homeostasis in bacteria. Nature Reviews Microbiology, 6(3):222–233, 2008.
[128] David L. Valentine. Adaptations to energy stress dictate the ecology and evolution of the archaea. Nat. Rev. Micro, 5:316–323, 2007.
[129] GH Peng, G Fritzsch, V Zickermann, H Schägger, R Mentele, F Lottspeich, M Bostina, M Radermacher, R Huber, KO Stetter, and H Michel. Isolation, characterization and electron microscopic single particle analysis of the NADH:ubiquinone oxidoreductase (complex I) from the hyperthermophilic eubacterium Aquifex aeolicus. Biochemistry, 42:3032–3039, 2003.
[130] Marianne Guiral, Laurence Prunetti, Sabrina Lignon, Régine Lebrun, Danielle Moinier, and Marie-Thérèse Giudici-Orticoni. New insights into the respiratory chains of the chemolithoautotrophic and hyperthermophilic bacterium Aquifex aeolicus. J. Proteome Res., 8:1717–1730, 2009.
[131] Costantino Vetriani, Mark D. Speck, Susan V. Ellor, Richard A. Lutz, and Valentin Starovoytov. Thermovibrio ammonificans sp. nov., a thermophilic, chemolithotrophic, nitrate-ammonifying bacterium from deep-sea hydrothermal vents. International Journal of Systematic and Evolutionary Microbiology, 54(1):175–181, 2004.
[132] Laurence Prunetti, Pascale Infossi, Myriam Brugna, Christine Ebel, Marie-Thérèse Giudici-Orticoni, and Marianne Guiral. New functional sulfide oxidase-oxygen reductase supercomplex in the membrane of the hyperthermophilic bacterium Aquifex aeolicus. Journal of Biological Chemistry, 285:41815–41826, 2010.
[133] M Guiral, T Aubert, and MT Giudici-Orticoni. Hydrogen metabolism in the hyperthermophilic bacterium Aquifex aeolicus. Biochemical society transactions, 33(Part 1):22–24, 2005. International Hydrogenases Conference, Reading, England, Aug 24-29, 2004.
[134] Marianne Brugna-Guiral, Pascale Tron, Wolfgang Nitschke, Karl-Otto Stetter, Benedicte Burlat, Bruno Guigliarelli, Mireille Bruschi, and Marie Thérèse Giudici-Orticoni. [NiFe] hydrogenases from the hyperthermophilic bacterium Aquifex aeolicus: properties, function, and phylogenetics. Extremophiles, 7:145–157, 2003.
[135] Michael Schutz, Barbara Schoepp-Cothenet, Elisabeth Lojou, Mireille Woodstra, Doris Lexa, Pascale Tron, Alain Dolla, Marie-Claire Durand, Karl Otto Stetter, and Frauke Baymann. The naphthoquinol oxidizing cytochrome bc1 complex of the hyperthermophilic knallgasbacterium Aquifex aeolicus: Properties and phylogenetic relationships. Biochemistry, 42:10800–10808, 2003.
[136] T Nubel, C Klughammer, R Huber, G Hauska, and M Schutz. Sulfide:quinone oxidoreductase in membranes of the hyperthermophilic bacterium Aquifex aeolicus (VF5). Archives of microbiology, 173:233–244, 2000.
[137] Marianne Guiral, Pascale Tron, Corinne Aubert, Alexandre Gloter, Chantal Iobbi-Nivol, and Marie-Thérèse Giudici-Orticoni. A membrane-bound multienzyme, hydrogen-oxidizing, and sulfur-reducing complex from the hyperthermophilic bacterium Aquifex aeolicus. J. Biol. Chem., 280:42004–42015, 2005.
[138] George Wald. Life in the second and third periods: or why phosphorus and sulfur for high-energy bonds. In M. Kasha and B. Pullman, editors, Horizons in biochemistry, pages 127–142, New York, 1962. Academic Press.
[139] F Verte, V Kostanjevecki, L De Smet, TE Meyer, MA Cusanovich, and JJ Van Beeumen. Identification of a thiosulfate utilization gene cluster from the green phototrophic bacterium Chlorobium limicola. Biochemistry, 41:2932–2945, 2002.
[140] Wriddhiman Ghosh, Somnath Mallick, and Sujoy Kumar DasGupta. Origin of the Sox multienzyme complex system in ancient thermophilic bacteria and coevolution of its constituent proteins. Research in microbiology, 160(6):409–420, 2009.
[141] Ryoko Sano, Masafumi Kameya, Satoshi Wakai, Hiroyuki Arai, Yasuo Igarashi, Masaharu Ishii, and Yoshihiro Sambongi. Thiosulfate oxidation by a thermo-neutrophilic hydrogen-oxidizing bacterium, Hydrogenobacter thermophilus. Bioscience, biotechnology, and biochemistry, 74(4):892–894, 2010.
[142] Daisuke Miyake, Shin-ichi Ichiki, Miyako Tanabe, Takahiro Oda, Hisao Kuroda, Hirofumi Nishihara, and Yoshihiro Sambongi. Thiosulfate oxidation by a moderately thermophilic hydrogen-oxidizing bacterium, Hydrogenophilus thermoluteolus. Archives of microbiology, 188(2):199–204, 2007.
[143] Marie-Cécile Giuliani, Pascale Tron, Gisèle Leroy, Corinne Aubert, Patrick Tauc, and Marie-Thérèse Giudici-Orticoni. A new sulfurtransferase from the hyperthermophilic bacterium Aquifex aeolicus. FEBS Journal, 274:4572–4587, 2007.
[144] Marie-Cécile Giuliani, Cécile Jourlin-Castelli, Gisèle Leroy, Aderrahman Hachani, and Marie Thérèse Giudici-Orticoni. Characterization of a new periplasmic single-domain rhodanese encoded by a sulfur-regulated gene in a hyperthermophilic bacterium Aquifex aeolicus. Biochimie, 92:388–397, 2010.
[145] Zhihao Yu, Eric B Lansdon, Irwin H Segel, and Andrew J Fisher. Crystal structure of the bifunctional ATP sulfurylase-APS kinase from the chemolithotrophic thermophile Aquifex aeolicus. Journal of molecular biology, 365(3):732–743, 2007.
[146] Guohong Peng, Mihnea Bostina, Michael Radermacher, Isam Rais, Michael Karas, and Hartmut Michel. Biochemical and electron microscopic characterization of the F1F0 ATP synthase from the hyperthermophilic eubacterium Aquifex aeolicus. FEBS letters, 580:5934–5940, 2006.