Downloaded from genome.cshlp.org on October 8, 2021 - Published by Cold Spring Harbor Laboratory Press

Letter

The Phylogenetic Extent of Metabolic Enzymes and Pathways

José Manuel Peregrin-Alvarez, Sophia Tsoka, Christos A. Ouzounis¹

Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK

The evolution of metabolic enzymes and pathways has been a subject of intense study for more than half a century. Yet, so far, previous studies have focused on a small number of enzyme families or biochemical pathways. Here, we examine the phylogenetic distribution of the full-known metabolic complement of Escherichia coli, using sequence comparison against taxa-specific databases. Half of the metabolic enzymes have homologs in all domains of life, representing families involved in some of the most fundamental cellular processes. We thus show for the first time and in a comprehensive way that metabolism is conserved at the enzyme level. In addition, our analysis suggests that despite the sequence conservation and the extensive phylogenetic distribution of metabolic enzymes, their groupings into biochemical pathways are much more variable than previously thought.

One of the fundamental tenets in molecular biology was expressed by Monod, in his famous phrase “What is true for Escherichia coli is true for the elephant” (Jacob 1988). For a long time, this statement has inspired generations of molecular biologists, who have used Bacteria as model organisms to understand the basic principles of life. The discovery of the three domains of life (Woese and Fox 1977) testified that there exist some pronounced differences between organisms, for example in transcription regulation (Struhl 1999). Paradoxically, metabolism has always been considered as one of the most conserved cellular processes (Lehninger 1979), that remains invariable from Bacteria to Eucarya, but no quantification of this view has been provided.

Instead, the phylogenetic extent of metabolism has been assessed by experimental case studies of individual biochemical pathways (Crawford 1989) and, more recently, by comparative genomics. Entire genome sequences from a wide variety of species offered the possibility of performing metabolic reconstruction, based on known metabolic pathways and genome sequence comparison (Karp et al. 1996). Case studies have suggested that even some of the most central pathways in biochemistry such as the citric acid cycle (Huynen et al. 1999), glycolysis (Dandekar et al. 1999), and amino acid biosynthetic pathways (Forst and Schulten 2001) may vary significantly over large phylogenetic distances.

A comprehensive analysis of metabolism has not been performed until now, possibly due to the scarcity of systematically collected information on genome sequences and metabolic pathways. Metabolic enzyme families are considered to be highly conserved and have been used to reconstruct the deep branching patterns of the tree of life (Doolittle et al. 1996). Yet, it remains unclear which enzymes are represented in all major taxa, what pathways they participate in, and which ones are most conserved at the sequence level.

We set out to address the phylogenetic extent and conservation of enzymes and pathways by using a highly curated, reliable source of metabolic information. The EcoCyc database holds information about the full genome and all known metabolic pathways of Escherichia coli (Karp et al. 2000). Recently, the database has been used to represent computational predictions of other organisms (Karp 2001).

¹ Corresponding author. E-MAIL [email protected]; FAX 44-1223-494471.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.246903.
Genome Research 13:422–427 ©2003 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/03 $5.00; www.genome.org

RESULTS

We have searched the nonredundant protein sequence database, previously partitioned in seven major taxonomic groups, with all 548 enzymes from the known metabolic complement of Escherichia coli. Whenever a homolog of each query enzyme is found in the corresponding taxonomic group, this is recorded into a binary vector (see Methods). Conceptually, this approach is similar to a low-resolution version of the phylogenetic profile method (Pellegrini et al. 1999). Instead of searching individual species, however, we focus on major taxonomic groups and we seek enzymes that “travel together” within or across these groups. No assumptions about functional roles or associations are made (Pellegrini et al. 1999): We only examine the phylogenetic extent of the query set, in this case the entire known metabolic complement of E. coli. The end result is a matrix of 548 genes across all the taxonomic combinations; in all, 37 (out of 128 possible) such combinations can be observed. Genes that have the same distribution pattern (i.e., identical binary vectors) in each taxonomic category are collected accordingly.

The majority of E. coli enzymes (274 of them, or 50%) have homologs in all domains of life, Bacteria, Archaea, and Eucarya, covering six taxonomic combinations (Fig. 1). Furthermore, there are an additional 13 enzymes (2%) which are universally present in all seven taxa, including the viruses (Table 1). This universal set represents enzyme families involved in various biochemical processes, including amino acid, cofactor, and nucleotide biosynthesis (Kyrpides et al. 1999). It is worth noting that the presence of all these enzymes in viruses is not fully understood yet (Kaiser et al. 1999). Enzymes present in Bacteria and (1) Eucarya are 57, covering four taxonomic combinations (10%; e.g., glucosamine-6-phosphate isomerase) or (2) Archaea are 52 (9%) (e.g., cytochrome D ubiquinol oxidase). It is interesting that 71 enzymes are present only in Bacteria (13%; e.g., L-fucose isomerase), possibly representing unique metabolic capabilities of this taxon. Notably, we have not observed any metabolic enzymes that are species-specific to E. coli. Finally, the remaining 81 enzymes (15%) have homologs in various taxonomic combinations (24 in total), with very low counts that are not statistically significant (see below). Overall, 52% of known metabolic enzymes from E. coli are found to be present in all three domains, a fact indicating that metabolism is highly conserved during evolution (Fig. 1).

Figure 1 Distribution of the 548 known E. coli metabolic enzymes into 37 taxonomic combinations (see Methods). The seven taxonomic groups correspond to domains of life in the case of Archaea and Bacteria or the four major eukaryotic groups (Fungi, Metazoa, Protista, Viridiplantae), while Viruses are considered as an additional group (see Methods). Universal enzymes are colored in gray (dark gray for those with viral homologs), those with homologs in Eucarya in blue, with homologs in Archaea in red, and in Bacteria only in green. Other combinations of the seven taxonomic groups are shown in white (see Methods).

To assess whether the observed patterns of phylogenetic distribution for metabolic enzymes are different from any other proteins, we have performed simulations of this analysis using protein sets of equal size, randomly selected from the E. coli genome (see Methods). Because some of the eukaryotic taxa (e.g., Protista or Fungi) may be significantly underrepresented in terms of the amount of available sequence data, we have assessed the gross pattern of taxonomic distributions by combining all four eukaryotic taxa (Protista, Fungi, Plants, Metazoa) into one group. It is striking that the 71 Bacteria-specific enzymes are significantly underrepresented compared to the control set (with an average of 195 proteins that are Bacteria-specific; Fig. 2A). The 52 enzymes with homologs in Archaea but not Eucarya are slightly underrepresented (with an average of 63 proteins in the control sets for this taxonomic grouping; Fig. 2B). Finally, if we take enzymes with homologs in Archaea, Bacteria, all four eukaryotic taxa, and viruses (170 in total, or 31%), it is evident that this set of proteins stands out as being overrepresented in this universal taxonomic pattern compared with random (Fig. 2C). It is thus evident that randomly selected proteins from a bacterial genome tend to be confined within the corresponding taxonomic group, while metabolic enzymes are expected to be present across a much wider phylogenetic spectrum (Fig. 2). Enzymes with homologs in other taxonomic group combinations exhibit similarly strong deviations from a random background being either over- or underrepresented in the corresponding combination (Fig. 3).

To examine which enzymes are actually most conserved at the sequence level, we recorded all pairwise sequence identity values between the E. coli enzymes and their homologs from Homo sapiens (Table 2), as an indicative measure of protein sequence conservation. There are 11 E. coli metabolic enzymes which have similar lengths (±10%) and have se-

Table 1. The 13 Metabolic Enzymes Which Have Homologs in All Major Taxonomic Partitions, Including Viruses

Accession  Description                            Pathway
P00379     Dihydrofolate reductase (EC 1.5.1.3)   FormylTHF biosynthesis, folic acid biosynthesis
P00470     Thymidylate synthase (EC 2.1.1.45)     FormylTHF biosynthesis, deoxypyrimidine nucleotide/side metabolism
P00479     Aspartate carbamoyltransferase
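
The binary-vector bookkeeping described in the Results (one presence/absence bit per taxonomic group, with enzymes sharing an identical vector collected together) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the ordering of the seven groups, the function names `profile` and `group_by_pattern`, and the toy input `toy_hits` are all assumptions made for illustration, and the homology search itself is taken as already done.

```python
# Seven taxonomic groups named in the text; this ordering is an assumption.
TAXA = ("Archaea", "Bacteria", "Fungi", "Metazoa",
        "Protista", "Viridiplantae", "Viruses")


def profile(groups_with_homolog):
    """Return a 7-element binary tuple: 1 where a homolog was found."""
    unknown = set(groups_with_homolog) - set(TAXA)
    if unknown:
        raise ValueError(f"unknown taxonomic group(s): {sorted(unknown)}")
    return tuple(int(taxon in groups_with_homolog) for taxon in TAXA)


def group_by_pattern(hits):
    """Collect enzymes that share an identical binary vector."""
    patterns = {}
    for enzyme, groups in hits.items():
        patterns.setdefault(profile(groups), []).append(enzyme)
    return patterns


# Hypothetical toy input: enzyme name -> groups in which a homolog was found.
toy_hits = {
    "enzA": {"Bacteria"},
    "enzB": {"Bacteria"},
    "enzC": set(TAXA),
    "enzD": {"Archaea", "Bacteria"},
}

patterns = group_by_pattern(toy_hits)

# With seven groups there are 2**7 possible presence/absence combinations,
# which matches the 128 quoted in the text.
n_possible = 2 ** len(TAXA)

for vector, enzymes in sorted(patterns.items(), reverse=True):
    print(vector, enzymes)
```

Using a tuple as the dictionary key keeps each distribution pattern hashable, so counting the number of observed combinations reduces to `len(patterns)`.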