arXiv:1309.4467v1 [q-bio.MN] 17 Sep 2013

Metabolic evolution of a deep-branching hyperthermophilic chemoautotrophic bacterium

Rogier Braakman
Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

Eric Smith
Krasnow Institute for Advanced Study, George Mason University, 4400 University Drive, Fairfax, VA 22030, USA and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA

(Dated: September 19, 2013)

Aquifex aeolicus is a deep-branching hyperthermophilic chemoautotrophic bacterium restricted to hydrothermal vents and hot springs. These characteristics make it an excellent model system for studying the early evolution of metabolism. Here we present the whole-genome metabolic network of this organism and examine in detail the driving forces that have shaped it. We make extensive use of phylometabolic analysis, a method we recently introduced that generates trees of metabolic phenotypes by integrating phylogenetic and metabolic constraints. We reconstruct the evolution of a range of metabolic sub-systems, including the reductive citric acid (rTCA) cycle, as well as the biosynthesis and functional roles of several amino acids and cofactors. We show that A. aeolicus uses the reconstructed ancestral pathways within many of these sub-systems, and highlight how the evolutionary interconnections between sub-systems facilitated several key innovations. Our analyses further highlight three general classes of driving forces in metabolic evolution. One is the duplication and divergence of genes for enzymes as these progress from lower to higher substrate specificity, improving the kinetics of certain sub-systems. A second is the kinetic optimization of established pathways through fusion of enzymes, or their organization into larger complexes. The third is the minimization of the ATP unit cost to synthesize biomass, improving thermodynamic efficiency.
Quantifying the distribution of these classes of innovations across metabolic sub-systems and across the tree of life will allow us to assess how a tradeoff between maximizing growth rate and growth efficiency has shaped the long-term metabolic evolution of the biosphere.

Introduction

Metabolism lies at the heart of cellular physiology, acting as a chemical transformer between environmental inputs and components of biomass. Identifying rules and principles that underlie metabolic architecture can thus provide important insights into how basic properties of chemistry and physics constrain living systems. Of particular relevance to understanding the chemical history of the biosphere is the foundational layer of autotrophic metabolism, which fixes CO2 and ultimately provides the ecological support to all forms of heterotrophy.

The merits of this view [1] were highlighted in a recent study on the early evolution of carbon-fixation pathways, which concluded that environmentally driven innovations in this process underpin most of the deepest branches in the tree of life [2]. To extend our analysis of the early evolution of metabolism and of autotrophy, we present here a whole-genome reconstruction of the metabolic network of Aquifex aeolicus. A. aeolicus is a chemoautotroph, deriving both biomass and energy from inorganic chemical compounds, and is one of the deepest-branching and most thermophilic known bacteria [3]. Deep-branching clades restricted to hydrothermal vents are generally considered to contain some of the most conservative metabolic features as a result of the high degree of long-term stability provided by these environments [4].

While A. aeolicus has been the focus of substantial experimental efforts (see Ref. [5] for a review), it has not been characterized nearly as extensively as other model systems for which highly curated metabolic models exist. In addition, the inherent uncertainty of genome annotation from sequence alone [6, 7], although it is improving significantly overall with next-generation methods [8], is compounded by the deep-branching position and extremophile lifestyle of this organism. Metabolic reconstruction protocols generally rely on heuristic rules to deal with the inevitable network "gaps" that result from misannotation or from the presence of genes of unknown function. Such protocols tend to perform well in predicting basic aspects of phenotype, such as growth rate, particularly for well-studied organisms [9, 10], but it is less clear what level of confidence to assign them when the focus is the evolution of specific metabolic sub-systems. Moreover, reconstructing an individual metabolic network requires substantial effort and provides only a single "snapshot" of an evolutionary process that has played out over several billion years.

For these reasons we use phylometabolic analysis (PMA) [2] to guide the reconstruction of the metabolic network of A. aeolicus from its genome [11]. PMA generates trees of functional metabolic networks (i.e., phenotypes) by integrating metabolic and phylogenetic reconstructions. The power of PMA derives from a simple yet versatile constraint: the continuity of life in evolution.
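The continuity constraint lends itself to a simple computational reading. The sketch below is not the authors' PMA implementation; it assumes, hypothetically, that each phenotype along a single root-to-tip lineage is a plain list of substrate-to-product reactions, ignores stoichiometry, cofactors and thermodynamics, checks each phenotype individually rather than at the ecosystem level, and uses forward network expansion as the test for whether some complete pathway connects the environmental inputs to a target metabolite. The helper names `scope`, `continuity_failures` and `rxn`, and the toy compounds `X`, `Y` and `M`, are illustrative only.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: a reaction is (substrates, products).
# Stoichiometry, cofactors and reversibility are deliberately ignored.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def scope(network: List[Reaction], seeds: Set[str]) -> Set[str]:
    """Forward network expansion: fire every reaction whose substrates are all
    available, repeating until no new compound appears (a fixed point)."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in network:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def continuity_failures(
    lineage: List[List[Reaction]], seeds: Set[str], targets: Set[str]
) -> List[Tuple[int, Set[str]]]:
    """For each phenotype along a lineage (ancestor first), report the target
    metabolites that no complete pathway from the seeds can reach."""
    failures = []
    for index, network in enumerate(lineage):
        missing = targets - scope(network, seeds)
        if missing:
            failures.append((index, missing))
    return failures


def rxn(subs: Set[str], prods: Set[str]) -> Reaction:
    """Build a hypothetical reaction from plain sets of compound names."""
    return frozenset(subs), frozenset(prods)


# Toy example with made-up compounds: two alternative routes to M.
seeds = {"CO2", "H2"}
r1 = rxn({"CO2", "H2"}, {"X"})  # route 1, step 1
r2 = rxn({"X"}, {"M"})          # route 1, step 2
r3 = rxn({"CO2", "H2"}, {"Y"})  # route 2, step 1
r4 = rxn({"Y"}, {"M"})          # route 2, step 2

# Route 1 is replaced by route 2 via an intermediate carrying both: continuous.
valid = [[r1, r2], [r1, r2, r3, r4], [r3, r4]]
# The middle phenotype keeps only half of each route, so M is unreachable there.
broken = [[r1, r2], [r1, r4], [r3, r4]]

print(continuity_failures(valid, seeds, {"M"}))   # -> []
print(continuity_failures(broken, seeds, {"M"}))  # -> [(1, {'M'})]
```

Computing a fixed point rather than enumerating explicit pathways keeps the check simple: a target lies in the scope exactly when some sequence of fireable reactions produces it. One caveat of this simplification is that autocatalytic cycles, such as the rTCA cycle, are only reached by expansion if at least one cycle intermediate is included among the seeds.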
Since metabolic pathways are the supply lines of monomers from which all life is constructed, the continuity of life requires that, at the ecosystem level, some pathway to a given universal metabolite must be complete in any evolutionary sequence across different parts of the tree of life. The distribution of metabolic genes in different pathways to given metabolites, within and across clades, thus informs the most likely completions in individuals, while distributions of pathways suggest the evolutionary sequences that connect them (see also Methods). We recently introduced PMA to reconstruct the evolutionary history of carbon fixation, relating all extant pathways to a single ancestral form [2]. Here we show the versatility of this approach, using it to reconstruct the complete whole-genome metabolic network of an individual species, while further examining the evolutionary driving forces that have shaped the network.

As we will show, A. aeolicus synthesizes a significant fraction of its biomass through metabolic pathways that appear to represent conserved forms of the ancestral pathways to those metabolites. This is relevant to debates on the position of this organism within the tree of life. Initial phylogenetic studies based on 16S rRNA suggested that the Aquificales potentially represent the deepest branch within the bacterial domain [12, 13]. Later studies of conserved insertion-deletions (indels) in a range of proteins led to the conclusion that the Aquificales are instead a later branch more closely related to ε-proteobacteria [14], but this was subsequently found to be likely the result of substantial horizontal gene transfer (HGT) between these two clades [15]. It has since become clear that Aquificales and ε-proteobacteria represent the dominant clades of primary producers near hydrothermal vents [16], and ecological association is now understood to be a major driver of HGT [17]. Together, this appears to have restored some consensus on the very deep-branching position of A. aeolicus [15, 18], which is further supported by our analysis of its metabolism.

Innovations in metabolic evolution

Our analysis highlights three classes of innovations in

While selection for improved kinetics has probably lowered the overall occurrence of promiscuous enzymes, metabolism maintains a substantial degree of enzyme promiscuity. For example, E. coli mutants in which an essential metabolic pathway was knocked out have been observed to recruit an alternate pathway from parts normally used for other functions to maintain growth [24]. In addition to promiscuity in the binding of alternate substrates while the local functional group transformation is preserved (substrate promiscuity), promiscuity can also occur through the catalysis of alternate reaction mechanisms (catalytic promiscuity). The form of promiscuity most frequently encountered in extant cells appears to be substrate promiscuity [25]. In general, one might expect the inherent "messiness" of enzymatic chemistry to lead to a cost-benefit tradeoff in the evolution of substrate specificity, where complete specificity is difficult to achieve and moreover disadvantageous because it would decrease the capacity for future adaptation [23]. In particular, one would expect this tradeoff to differ between core processes, where a higher mass flux can significantly amplify the benefits of improved kinetics, and more peripheral processes that have lower mass flux. In keeping with this expectation, substrate promiscuity has been found to increase toward the metabolic periphery [26], while reaction rate constants of enzymes tend to increase toward the metabolic core [27]. The idea that selection for kinetics has determined the degree of specificity of a pathway's enzymes raises the intriguing possibility that, prior to selection for increased substrate specificity, homologous reaction sequences could initially have been catalyzed by the same set of promiscuous enzymes [19, 28, 29], allowing earlier metabolisms with greater abundances of such homologous sequences to be controlled with smaller genomes.

We will discuss the evolution of several sub-networks in the metabolism of A. aeolicus that illustrate these general principles. We will show that, compared with later-branching autotrophs, A. aeolicus uses a greater abundance of repeated parallel chemical sequences catalyzed by enzymes with high sequence similarity, which could