<<

GBE

Different Ways of Doing the Same: Variations in the Two Last Steps of the Biosynthetic Pathway in

Dennifier Costa Brandao~ Cruz1, Lenon Lima Santana1, Alexandre Siqueira Guedes2, Jorge Teodoro de Souza3,*, and Phellippe Arthur Santos Marbach1,* 1CCAAB, Biological Sciences, Recoˆ ncavo da Bahia Federal University, Cruz das Almas, Bahia, Brazil 2Agronomy School, Federal University of Goias, Goiania,^ Goias, Brazil 3 Department of Phytopathology, Federal University of Lavras, Minas Gerais, Brazil Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021

*Corresponding authors: E-mails: [email protected]fla.br; [email protected]. Accepted: February 16, 2019

Abstract The last two steps of the purine biosynthetic pathway may be catalyzed by different in prokaryotes. The that encode these enzymes include homologs of purH, purP, purO and those encoding the AICARFT and IMPCH domains of PurH, here named purV and purJ, respectively. In , these reactions are mainly catalyzed by the domains AICARFT and IMPCH of PurH. In , these reactions may be carried out by PurH and also by PurP and PurO, both considered signatures of this and analogous to the AICARFT and IMPCH domains of PurH, respectively. These genes were searched for in 1,403 completely sequenced prokaryotic publicly available. Our analyses revealed taxonomic patterns for the distribution of these genes and anticorrelations in their occurrence. The analyses of bacterial genomes revealed the existence of genes coding for PurV, PurJ, and PurO, which may no longer be considered signatures of the domain Archaea. Although highly divergent, the PurOs of Archaea and Bacteria show a high level of conservation in the amino acids of the active sites of the , allowing us to infer that these enzymes are analogs. Based on the results, we propose that the purO was present in the common ancestor of all living beings, whereas the gene encoding PurP emerged after the divergence of Archaea and Bacteria and their isoforms originated in duplication events in the common ancestor of phyla and . The results reported here expand our understanding of the diversity and evolution of the last two steps of the purine biosynthetic pathway in prokaryotes. Key words: Archaea, Bacteria, bioinformatics, comparative genomics, evolution, phylogeny.

Introduction There are generally ten pur genes in the following order: and their derivatives are biomolecules essential to all purF, purD, purN, purL, purM, purE, purK, purC, purB,and living as they play important roles in signaling path- purH, each encoding an responsible for a step in the ways, carbohydrate metabolism, and as precursors of nucleic purine biosynthetic pathway (PBP). The majority of the pur acids (Smith and Atkins 2002). Additionally, the enzymes that genes were initially described in and later, catalyze their synthesis are important targets for antimicrobial orthologs were found in other prokaryotes and , and anticancer compounds (Kirsch and Whitney 1991). The suggesting that these are the canonical genes encoding enzymatic reactions involved in the de novo biosynthesis of enzymes of the PBP in different evolutionary lineages purines were elucidated in the 1950s (Buchanan and Hartman (Chopra et al. 1991; Ni et al. 1991; Gu et al. 1992; Rayl 1959) and all genes that encode these enzymes were identi- et al. 1996; Nilsson and Kilstrup 1998; Peltonen and fied and their products biochemically characterized by the Mants€ al€ a€ 1999; Sampei et al. 2010; Liu et al. 2014). This year 2000 (Zalkin 1983; Parker 1984; Schrimsher et al. scenario changed when some authors showed that the enzy- 1986; Aiba and Mizobuchi 1989; Watanabe et al. 1989; matic reactions of three of the 10–11 steps of the PBP may be Cheng et al. 1990; Inglese et al. 1990; Gu et al. 1992; He catalyzed by different nonhomologous enzymes in distinct et al. 1992; Marolewski et al. 1994; Graupner et al. 2002; . For example, PurT, a novel enzyme involved Hoskins et al. 2004; Ownby et al. 2005). in the PBP was described in 1994 as an analog of the already

ß The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

Genome Biol. Evol. 11(4):1235–1249. doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1235 Cruz et al. GBE

known PurN that catalyses the third step of the PBP Operational Taxonomical Units (OTUs) in the analyses. The (Marolewski et al. 1994). Two new enzymes, PurP and PurO program TBLASTN was used to perform searches for purH, were later described in the PBP of Archaea as analogs of PurH, purP and purO and for genes that code for the domains until then considered as the canonical enzyme catalyzing the AICARFT and IMPCH in the nucleotide collection (nr/nt), two last steps of the PBP (Graupner et al. 2002; Ownby et al. RefSeq Representative genomes (refseq_representative_ge- 2005). PurP and PurO are currently considered signatures of nomes) and ReFseq (refseq_genomes) databases. the Archaea domain (Graupner et al. 2002; Ownby et al. The genes coding for the domains AICARFT and IMPCH 2005; Zhang, Morar, et al. 2008; Zhang, White, et al. 2008; were named hereafter as purV and purJ,respectively Armenta-Medina et al. 2014). These two separate enzymes (fig. 1). Additionally, the program BLASTP was used to per-

are analogous to the domains AICARFT and IMPCH of PurH, form searches for PurH, PurP, PurO, PurV, and PurJ in the Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 which contains them fused (fig. 1). The domain AICARFT ca- nonredundant protein sequences (nr) database. All TBLASTN talyses the penultimate reaction of the PBP, converting AICAR and BLASTP searches (Altschul et al. 1990) were performed in FAICAR and the domain IMPCH catalyses the last reaction with the default parameters of the programs, except for the of the PBP, converting FAICAR into IMP, the final product of filters and masking options that were disabled. the pathway (Zhang, Morar, et al. 2008). BLAST searches were done individually in each completely A comparative genomic analysis of the Archaea domain sequenced genome and the presence of conserved domains showed that in some free-living of the typical of the searched was used as the homology Euryarchaeota the domains AICARFT and IMPCH of PurH criterion to recover the sequences. All sequences containing are encoded by distinct genes. These archaeal species do the searched domains were recovered and included in the not contain genes encoding PurH nor its analogs PurP and analyses. sequences of PurH (GI: 16131836) of PurO (Brown et al. 2011). In this study, the authors also Escherichia coli and PurP (GI: 15668306) and PurO (GI: showed that species of the phylum Crenarchaeota do not 34588137) of jannaschii were used as possess the genes coding for PurH nor homologs of its queries. The amino acid sequence of PurH was used as query domains or PurO, but PurP was found (Brown et al. 2011). to perform searches for PurH, PurV, and PurJ and for the This study indicates that the diversity of enzymes involved in genes that encode these proteins. the PBP of Archaea is higher than previously thought. In their study, Brown et al. (2011) did not include the domain Diversity of the Last Two Steps of the PBP Bacteria, where most of the prokaryotic diversity resides. Purine biosynthesis is among the most ancient metabolic During the BLAST searches, the presence or absence and the pathways and probably evolved in the LUCA (Caetano- number of copies of purH, purV, purJ, purP,andpurO in all Anolles et al. 2007). According to the hypothesis of the genomes analyzed as well as their genomic context in Horowitz (1945), enzymes in the last steps of biosynthetic relation to the other genes of the PBP were registered. pathways are the first to be recruited. Curiously, the last Taxonomic patterns of occurrence of these genes in prokary- two steps of the PBP show the highest variation. otic genomes were registered in all categories, from domain Nowadays, the availability of completely sequenced to species. genomes in public databases representing most of the diver- sity of prokaryotic higher taxa potentially provide a compre- Multiple Alignments and Phylogeny of PurP and PurO hensive picture of the diversity and evolution of biological The amino acid sequences of the genes purH, purV, purJ, processes. In this study, we report on a genomic analysis purP,andpurO recovered in the BLAST searches were code- concerning the diversity and evolution of the two last steps aligned in the Guidance server with the MAFFT algorithm of the PBP in prokaryotic lineages. The results are presented in (Penn et al. 2010). The program MEGA 6.0 (Tamura et al. a taxonomical framework that includes the currently accepted 2013) was used to edit the multiple alignments of the proteins phylogenetic classification of the prokaryotes. PurP and PurO, choose the substitution matrix, and to per- form the phylogenetic analyses with the maximum likelihood (ML) method. The phylogenetic trees were visualized and Materials and Methods edited in the program Archaeopterix (Han and Zmasek 2009). Searches for Purine Biosynthetic Genes (pur) in Prokaryotic Genomes Amino Acid Sequence Analyses A total of 1,403 nonredundant fully sequenced prokaryotic Sliding window plot analyses were done with the program genomes deposited in the NCBI (National Centre for SWAAP 1.0.2 (Pride 2000) using the multiple alignment of Biotechnology Information) database until July of 2014 were proteins. The average identity of the sequences was calcu- used in this study. Bacterial and archaeal strains identified at lated with the model K2P in a sampling window of ten amino the or species levels were treated as distinct acids with only one amino acid displaced along the multiple

1236 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

Step 9 Step 10 PurH

C AICARFT IMPCH N

Homologs Analogs Homologs Analogs Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021

N AICARFT C N PurP C N IMPCH C N PurO C

PurV PurJ

FIG.1.—Nomenclature and homology/analogy relationships among the proteins involved in the last two steps of the purine biosynthetic pathway. PurH is composed of two domains, IMPCH and AICARFT. The relationships of homology and analogy among the domains of PurH and other proteins are shown. PurH is shown in an inverted position to match the steps of the purine biosynthetic pathway. alignments. The ConSurf Server (http://consurf.tau.ac.il/2016; andinthefamiliesMethanoregulaceae, last accessed March 14, 2019; Ashkenazy et al. 2016)was , , used to estimate the evolutionary conservation of amino acid ,andMethanosarcinaceae and in the positions in two multiple alignments of PurO: the first align- class , families Ferroplasmaceae, ment contained only archaeal PurOs and the second con- Picrophilaceae,andThermoplasmataceae of the phylum tained the PurO sequences of Euryarchaeota (fig. 2 and supplementary table S1, thermoautotrophicus and the bacterial PurOs. The ConSurf Supplementary Material online). In contrast, PurH-coding score was calculated using the default parameters with the genes were found in 1,083 OTUs of the domain Bacteria dis- two multiple alignments described earlier for each amino acid tributed in most phyla of this domain (fig. 2 and supplemen- of the tertiary structure of the PurO from M. thermoautotro- tary table S1, Supplementary Material online). phicus (PDB ID code 2NTL), which was previously functionally Genes that code for proteins homologous to the domains characterized (Kang et al. 2007). The logos were produced in AICARFT and IMPCH of PurH were found in OTUs of the the program Web Logo (Crooks et al. 2004) only with the domains Archaea and Bacteria (fig. 2 and supplementary table positions of the active sites of PurOs. The average identity/ S1, Supplementary Material online). The name purJ was used divergence between archaeal and bacterial proteins was cal- to designate the domain IMPCH of the PurH of Salmonella culated with the software MEGA 6.0 in pairwise comparisons typhymurium, when it was incorrectly identified as a gene between proteins from these two domains. (Gots et al. 1969). Therefore, from this point on, we will use the name purJ for the gene encoding the domain Results IMPCH in accordance with its original nomenclature (Gots et al. 1969). For the domain AICARFT, we propose the Comparative Genomics name purV (fig. 1). The majority of the purV and purJ were The comparative genomic analysis was carried out with a total recovered from bacterial genomes: 109 purVs,28fromthe of 1,403 completely sequenced genomes available in the domain Archaea and 81 from domain Bacteria;and54purJs, NCBI database, representing 1,266 OTUs of the domain 4fromthedomainArchaea and 50 from domain Bacteria Bacteria and 137 OTUs of the domain Archaea.TheseOTUs (table 1 and supplementary table S1, Supplementary represent 27 out of the 34 described phyla of the domain Material online). The genes purV and/or purJ were found in Bacteria and all five phyla of the domain Archaea according 10 of the 27 analyzed, including Gram-positive to List of Prokaryotic names with Standing in Nomenclature— and Gram-negative OTUs, indicating that they are widely LPSN (Euzeby 1997). From the 1,403 analyzed genomes, 132 distributed. did not have genes coding for PurH, PurO, purP, PurV, and The gene purO was until now considered a signature of the PurJ (fig. 2 and supplementary tables S1 and S2, domain Archaea (Graupner et al. 2002; Ownby et al. 2005; Supplementary Material online). Zhang, Morar, et al. 2008; Zhang, White, et al. 2008; PurH-coding genes were found in genomes of 23 OTUs of Armenta-Medina et al. 2014). Surprisingly, 22 homologs of the domain Archaea, all of which are in the class purO were found in bacterial genomes, whereas 59 purOs

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1237 Cruz et al. GBE

Bacteria Domain Archaea Gram Positive Gram Negative Phylum Crenarchaeota Euryarchaeota Tenericutes Bacteriodetes Clorobi Chrysiogenetes Deferribacteres - Thermus Dictyoglomi Class Acidithiobacillia Class Class Class Class Class Class Oligoflexia Synergistia 1 Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 4 1 2 1 2 1 4 1 8 5 6 3 2 1 3 2 4 4 11 41 49 37 95 94 43 27 33 17 10 14 16 76 12 157 238 156 222 Total OTUs ------1 1 1 3 4 3 6 1 11 11 11 12 41 26 None 1 / 3 1 / 4 3 / 8 1 / 1 / 2 / 3 4 / 3 / 1 / 1 / 3 / 2 / 3 / 9 2 / 4 2 / 1 / 5 / 1 / 2 / 3 4 / 2 / 7 1 / 3 1 / 2 1 / 1 / 1 / 1 / 4 6 / 8 1 / 1 / 1 / 4 / 1 / 2 2 / 1 / 3 / 4 4 / 5 1 / 1 / 3 2 / 2 / 9 2 / 4 2 / 9 / 41 8 / 10 8 / 10 5 / 10 9 / 14 6 / 16 4 / 14 OTUs 10 / 11 12 / 25 10 / 13 19 / 25 13 / 31 49 / 63 26 / 49 46 / 90 10 / 20 16 / 21 10 / 23 Genus/ 64 / 148 72 / 201 66 / 130 94 / 216 purH purV purJ purP purO

FIG.2.—Comparative genomic analysis of genes coding for the ninth and tenth steps of the purine biosynthetic pathway in 1,403 complete prokaryotic genomes. Colored boxes represent presence of the gene and smaller black boxes inside the colored ones represent the cases in which purH is replaced by a combination of genes that are functionally equivalent to purH. The numbers below each taxonomical category indicate the number of genera and OTUs analyzed. None indicates the number of OTUs that do not contain any of the genes encoding the last two steps of the PBP.

Table 1 Occurrence and Co-Occurrence of the Genes Encoding the Last Two Steps of the Purine Biosynthetic Pathway in Bacteria and Archaea and the Genomic Context They are Found Occurrence Per OTU Co-Occurrence In Context with Not in Context with Per OTU Other Pur Genes Other Pur Genes

Genes Archaea Bacteria Total Genes Archaea Bacteria Genes Archaea Bacteria Archaea Bacteria purH 23 1,083 1,106 purH/purV —17purV 25 26 3 55 purV 28 81 109 purH/purJ —4purJ —164 34 purJ 45054purH/purO —3purO 615537 purP I 25 — — purV/purJ —44purP I —— 25— purP II 62 — — purV/purO 25 19 purP II 25 — 37 — purP III 61 — — purH/purV/purO —1purP III 32 — 29 — purP IV 6——purP II/purP III 37 — purP IV 6— — — purO 59 22 81 purP II/purP III/purJ 4— purP I/purO 24 — purP I/purP II/purP III/purO 1— purP II/purP III/purV/purO 3— purP II/purP III/purH 10 — purP II/purP III/purP IV/purO 6—

NOTE.—Genes in genomic context are together with other genes of the PBP.

were found in archaeal genomes (table 1 and fig. 2). Genes found in genomes of the phyla Crenarchaeota, Korarchaeota, that code for homologs of PurP were only found in genomes and Thaumarchaeota, but genes coding for homologs of PurP of OTUs of the domain Archaea,atotalof154,withthe were present. number of copies varying from one to four per genome The patterns of presence and absence of the genes purH, (fig. 2 and supplementary table S1, Supplementary Material purV, purJ,andpurO show that the putative new bacterial online). Homologs of PurH, PurO, PurV, and PurJ were not genes of the PBP, in general, anticorrelate with purH. In other

1238 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

words, the majority of the OTUs that do not have purH pos- prokaryotes. Some patterns are maintained at the genus, sess purV and purO or purV and purJ in the genome (fig. 2). family, class, and at the phylum level (table 3). Similarly, archaeal genomes contained the combinations The evolutionary history of purH, purV,andpurJ is inti- purV/purO, purJ/purP,andpurP/purO, but not purV and mately related and these results will be presented elsewhere, purJ, the most common in Bacteria,afterpurH. These results whereas the evolution of purO and PurP will be presented in suggest that the combinations of genes mentioned earlier for this publication. bacteria and archaea are replacing purH, maintaining the last two steps of the PBP, as already proposed for the domain Archaea (Brown et al. 2011). Evolution of PurO

Only two assembly/annotation errors in the genes coding The maximum likelihood (ML) tree of PurO shows two distinct Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 for proteins of the last two steps of the PBP were found in the groups, one containing PurOs from OTUs of the domain 1,403 analyzed genomes: two truncated PurH sequences Archaea and the other one with PurOs of the domain (supplementary table S1, Supplementary Material online). Bacteria (fig. 3a). This topology was obtained both with These two sequences were included in our analyses. This re- sequences of amino acids and nucleotides (supplementary markably low number of misannotations may be due to the fig. S1, Supplementary Material online) and is congruent fact that these genes are well represented in the biological with the current prokaryotic phylogeny (Woese and Fox databases. 1977) and therefore the root of this tree was placed in the In summary, genes coding for PurH were found in 85% of branch that connects these two groups. The topology of the the bacterial genomes and the combinations purV/purJ and rooted tree of archaeal PurOs agrees with the of purV/purO were in 63 genomes, representing 5% of the total this domain, at least at the family level (fig. 3b). number of bacterial genomes analyzed. The remaining 10% The average divergence of purO homologs in Archaea and did not harbor any of the genes for the last two steps of the Bacteria is 74%, indicating that they are highly divergent. PBP. In Archaea, genes coding for PurH were in 17% of the However, the results of the genomic analyses previously genomes and the combinations purV/purO, purP/purJ,and shown (fig. 2; tables 1 and 2) suggest that the bacterial homo- purP/purO were found in 44% of the genomes, 10% of logs of purOs are analogs of their archaeal counterparts. To the genomes did not have the genes for the last two steps of test whether the archaeal and bacterial PurOs are indeed the PBP, and the remaining 29% contained other combina- analogs, additional analyses were performed. The PurO tions that did not replace purH completely (fig. 2; table 1;and from Methanothermobacter thermoautotrophicus was func- supplementary table S1, Supplementary Material online). tionally characterized previously. This enzyme possesses a tet- rameric tertiary structure (PDB ID code 2NTL) and its active site comprises 14 amino acid residues, all of them in the same Genomic Context and Taxonomical Patterns monomer (Kang et al. 2007). The sliding window plot analysis The results of these studies demonstrated that purP is present showed that the conservation in the primary structure of the only in the domain Archaea,whilepurO, purV,andpurJ were archaeal PurOs and their bacterial counterparts is similar found both in the domains Bacteria and Archaea (fig. 2 and (fig. 4a). The amino acids of the active site of PurO are dis- supplementary table S1, Supplementary Material online). The tributed in the first 3/4 of the primary structure and concen- gene combinations purP/purO, purP/purJ, purV/purO, purV/ trated in the N-terminal (fig. 4a). purJ are potentially replacing purH in Bacteria and Archaea, The conservation score for each amino acid of PurO from performing the last two steps of the PBP. Approximately 37% M. thermoautotrophicus (mthPurO) was calculated with of the total number of genes that is potentially replacing purH ConSurf on the basis of their homologs in multiple alignments in Bacteria and Archaea is in genomic context with other of both archaeal and bacterial PurOs (fig. 4b). The results genes of the PBP and the remaining 63% is not in context showed that 53 amino acid residues of the mthPurO (26%) in Bacteria and Archaea (tables 1 and 2). In some bacterial are highly conserved in archaeal PurOs (scores 8 and 9) and 35 OTUs, the putative new genes, purO, purV,andpurJ are in of these amino acid residues also have ConSurf scores 8 and 9 genomic context with themselves or with other genes of the in bacterial PurOs (supplementary table S3, Supplementary PBP, from which the most frequent are purD and purN (ta- Material online). Most of these highly conserved amino acid ble 2). The arrangements of these genes in Bacteria (table 2) residues compose or surround the active site of the enzyme in indicate that they are putative operons that are coexpressed the tertiary structure (fig. 4b). All amino acids of the PurO and have a role in the PBP. active site possess conservation score 9, except for ser24, The comparative genomic analysis shows that there are asn54, and tyr56, that had scores 7 or 8 based on the bacterial taxonomical patterns at the genus, family and class levels multiple alignment. for the combinations purP, purO, purV,andpurJ,whichare The logos constructed with the 14 positions of the active able to replace purH in Archaea and Bacteria. Thus, the pres- site in multiple alignments of purOs from Archaea and ence of these genes is typical of specific higher taxa of Bacteria showed that ten of these amino acids are identical

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1239 Cruz et al. GBE

Table 2 Bacterial OTUs with purV, purJ,andpurO in Genomic Context with Other Genes of the Purine Biosynthetic Pathway

Phylum Class Order OTUs Genomic Context Paenibacillus mucilaginosus KNP414 purF purM purN purV purO purD Bacilli Bacillales Thermobacillus composti KWC4 purF purM purN purV purO purD Butyrivibrio proteoclasticus B316 Other purO purV Other Cellulosilyticum lentocellum DSM 5427 Other purO purV Other [] saccharolyticum WM1 Other purO purV Other Clostridium sp . SY8519 Other purO purV Other Clostridiales Firmicutes [Eubacterium] eligens ATCC 27750 Other purO purV Other [Eubacterium] rectale ATCC 33656 Other purO purV Other Roseburia hominis A2-183 Other purO purV Other Oscillibacter valericigenes Sjm18-20 purF purM purN purV purD purL fermentans DSM 20731 purF purM purN purJ purD Acidaminococcales Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 Acidaminococcus intestini RyC-MR95 purF purM purN purJ purD subsp. lactilytica TAM6421 purF purM purN purJ purD Olsenella uli DSM 7084 Other purO purV Other Actinobacteria equolifaciens DSM 19450 Other purO purV Other Eggerthellales heliotrinireducens DSM 20476 Other purO purV Other Fervidobacterium nodosum Rt17-B1 purF purN purV purD purM Fervidobacterium pennivorans DSM 9078 purF purN purV purD purM Thermotogae Thermotogae Thermotogales Thermosipho africanus TCF52B purF purN purV purD purM Thermosipho melanesiensis BI429 purF purN purV purD purM Sphaerochaeta globus str. Buddy purF purM purN purV purD purL Sphaerochaeta pleomorpha str. Grapes purF purM purN purV purD purL africana DSM 8902 Other purJ purV Other Spirochaetes Spirochaetia Spirochaetales Spirochaeta smaragdinae DSM 11293 Other purB purJ Other Spirochaeta sp . L21-RPul-D2 Other purJ purV Other Spirochaeta thermophila DSM 6192 Other purJ purV Other Desulfarculales Desulfarculus baarsii DSM 2075 Other purJ purD Other autotrophicum HRM2 Other purJ purV Other toluolica Tol2 Other purJ purV Other Desulfobulbus propionicus DSM 2032 Other purJ purN Other Deltaproteobacteria sulfexigens DSM 10523 Other purJ purN Other Proteobacteria oleovorans Hxd3 Other purJ purV Other Desulfotalea psychrophila LSv54 Other purJ purN Other Desulfurivibrio alkaliphilus AHT2 Other purJ purN Other Syntrophobacterales fumaroxidans MPOB Other purJ purD Other Epsilonproteobacteria felis ATCC 49179 purF purM purO purV purD PurL

NOTE.—Gray boxes indicate genes that do not participate in last two steps or are not part of the purine biosynthetic pathway.

(fig. 4c). The only exceptions are Ser24, Ser26, Asn54, and relationships among them. Brown et al. (2011) analyzed 76 Tyr56. The Ser24 is 100% conserved in archaeal PurOs and PurPs from archaeal genomes and found similar groups, but varies in bacterial PurOs, whereas Ser26 is 100% conserved proposed the clusters Ia/Ib and II (named by Zhang, White, in bacterial PurOs and varies in archaeal PurOs (fig. 4c). The et al. 2008 as groups I, II, and III, respectively). They hypoth- Asn54 is the only amino acid position that varies in both esized that clusters Ia/Ib and II originated in an ancient gene archaeal and bacterial PurOs (fig. 4c). Most of the archaeal duplication event that occurred before the divergence of the PurOs analyzed possess asparagine in this position and from Archaea taxa analyzed. Brown et al. (2011) also reported that the four that possess serine, three were recovered from the cluster Ib (PurP II by Zhang, White, et al. 2008) was mono- OTUs in the Family Methanocellaceae, suggesting that this phyletic in some analyses, but in others the cluster Ia fell inside variation is characteristic of PurOs from this family. The cluster Ib. Besides that, it was not clear if the highly divergent Tyr56 is 100% conserved in archaeal PurOs, but it is replaced PurP from Thermococcus species was a distinct isoform of mainly by serine in bacterial PurOs, however, glutamic acid, PurP or if it was acquired by a lateral gene transfer event histidine, and leucine were also present at this position from a Crenarchaeota. In this study, we analyzed 154 PurPs (fig. 4c). In general, most of the variations described earlier from archaeal genomes, including higher taxa not sampled in are chemically equivalent in PurOs of both Archaea and previous studies and revisited their phylogenetic relationships. Bacteria. These results are congruent with the hypothesis The number of copies of purP in OTUs of the domain that purOs in Archaea and Bacteria are analogous. Archaea that contain this gene varied from one to three. The ML phylogenetic tree of PurP showed four distinct groups (fig. 5a) that were enumerated according to Evolution of PurP Zhang, White, et al. (2008) as PurP I, II, and III. The group The diversity of PurPs was investigated in previous studies. PurP IV is being reported in this study. The groups I, II, and Zhang, White, et al. (2008) analyzed PurPs from 22 archaeal III were also recovered by Brown et al. (2011) but the in- species and proposed the existence of three groups: PurP I, ternal topologies were distinct from the groups presented PurP II, and PurP III, but they did not explore the evolutionary in our study. The indels in the PurP alignment (fig. 5b)

1240 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

Table 3 Taxonomic Patterns of Ocurrence of purPs, purO, purV,andpurJ Taxonomic Classification Taxonomic Patterns Archaea Crenarchaeota Family [12] purP II/purP III Euryarchaeota Class Halobacteria [25] purV-N/purO Family [03] purV/purO/purP II/purP III Genus Archaeoglobus [04] purJ/purP II/purP III Bacteria Bacteriodetes Family Prevotellaceae [06] purV/purJ Thermotogae Family Fervidobacteriaceae [04] purV/purJ Thermodesulfobacteria Phylum Thermodesulfobacteria [02] purV/purJ Proteobacteria Order Desulfovibrionales [09] purV/purJ Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 Order Desulfobacterales [08] purV/purJ

NOTE.—All OTUs of these taxa, including classes, families, and genera included harbor the gene shown. The numbers between brackets are the amount of OTUs ineach group. were used to root the phylogenetic tree (fig. 5a)because methanogenic archaea Class I analyzed fell into the group indels are rarely fixed and are highly conserved evolution- PurP I, which is not phylogenetically related to any PurP of ary events that do not revert easily as compared with nu- . Thus, the relationship of PurPs I with the cleotide substitutions (Rokas and Holland 2000). other PurPs is incongruent with the archaeal phylogeny Therefore, genes or proteins that share indels are consid- (Petitjean et al. 2015; Hug et al. 2016). The group PurP I ered phylogenetically more related (Chan et al. 2007). includes the PurP from Methanocaldococcus jannaschii, Indels are used as molecular markers in studies as diverse theonlyPurPthathaditsFAICARsynthaseactivityexper- as protein evolution and taxonomy of microorganisms imentally characterized (Zhang, White, et al. 2008). (Rokas and Holland 2000; Chan et al. 2007; The group II encompasses the PurPs II and PurPs III from Ajawatanawong and Baldauf 2013; Naushad et al. thesame61OTUsofthephylaCrenarchaeota, 2014). The PurPs I, II, and IV share two indels (fig. 5b), Euryarchaeota, Thaumarchaeota,andKorarchaeota (sup- indicating that they are phylogenetically more related plementary table S4, Supplementary Material online). The with each other than with PurP III. These indels were uti- composition of these groups is similar to what was lized as a nonarbitrary criterion to root the phylogenetic reported by Zhang, White, et al. (2008)andBrown et al. tree of PurP in the branch that connects PurP III with the (2011) but the topologies of these groups are not. The other groups (fig. 6). The topology of the rooted tree group PurP II is composed of 62 proteins, one in each indicates that PurPs I and II are themselves more related OTU, except for mazei Go1 that harbors than they are with PurP IV (fig. 6a). two PurPs II. The group PurP III is composed of 61 proteins, Group I includes PurPs I from the methanogenic ar- one copy in each OTU. In some OTUs, the genes that en- chaea Class I, Orders , code the PurP II and III are in genomic context with other ,andMethanopyrales, and PurPs from pur genes, what suggests that they are functionally re- the Order Methanocellales, which are methanogenic ar- lated with the PBP (supplementary table S5, chaea Class II (fig. 6a). Methanogenic Archaea Class I and Supplementary Material online). The group PurP II pos- II are not phylogenetically related (Petitjean et al. 2015). sesses two subgroups with several cases of horizontal The topology of this group is similar to the one obtained gene transfer. These inferences were made because the by Brown et al. (2011) and is congruent with the archaeal groups of PurPs formed are not congruent with the ar- phylogeny, except for the PurP I from Methanocellales, chaeal phylogeny (Koonin 2015; Petitjean et al. 2015; Hug grouped as a sister of the subgroup containing PurPs I et al. 2016). For example: 1) PurPs from Euryarchaeota are from Methanobacteriales and Methanococcales,both together with one PurP from an OTU belonging in the methanogenic archaea Class I. It indicates that the phylum Korarchaeota, indicating that it was acquired by Methanocellales, that are methanogenic archaea Class II, a horizontal gene transfer event from an Euryarchaeota acquired its PurP I in an event of horizontal gene transfer (fig. 6a); 2) the PurP II from a Methanocellales is more from a methanogenic archaea Class I. All OTUs from these relatedtoPurPsIIfromArchaeoglobales than to PurPs II classes contain only one copy of purP, the only exception from , while phylogenetically, is paludicola SANAE, with three copies of Archaeoglobales are more distantly related than the other the gene, but only one copy is classified in group I (sup- two orders; 3) PurPs II from are more plementary table S4, Supplementary Material online). relatedtoPurPsIIfromMethanosarcinales,while Methanogenic archaea Class I are phylogenetically related Archaeoglobales are known to be phyllogenetically closer to the Class Thermococci, however, the PurPs from all to Thermoplasmatales; 4) The second subgroup contained

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1241 Cruz et al. GBE

(b) 72 EU - WP_004267493.1 - magadii ATCC 43099 (a) 60 EU - WP_012944853.1 - DSM 5511 Archaea EU - WP_014862858.1 - sp. J7-2 EU - WP_006183061.1 - Natrinema pellirubrum DSM 15624 61 74 EU - WP_015323170.1 - occultus SP4 Root Natrialbaceae 93 EU - WP_013878988.1 - Halopiger xanaduensis SH-6 70 EU - WP_005578293.1 - gregoryi SP2 57 EU - AHG00648.1 - Halostagnicola larsenii XH-48 EU - WP_015302331.1 - Halovivax ruber XH-70 EU - WP_020447016.1 - Salinarchaeum sp. Harcht-Bsk1 EU - WP_008418982.1 - jeotgali B3 100 EU - WP_004044970.1 - volcanii DS2 EU - YP_006347742.1 - Haloferax mediterranei ATCC 33500 Bacteria EU - WP_015911385.1 - lacusprofundi ATCC 49239 0.2 100 EU - WP_015788797.1 - utahensis DSM 12940 EU - ERJ06829.1 - Halorhabdus tiamatea SARL4B 91 EU - WP_015761883.1 - mukohataei DSM 12286 89 EU - WP_004959818.1 - marismortui ATCC 43049

100 EU - WP_014041673.1 - Haloarcula hispanica ATCC 33960 Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 EU - WP_011322093.1 - pharaonis DSM 2160 64 90 EU - WP_015410184.1 - Natronomonas moolapensis 8.8.11 borinquense Haloferacaceae 99 EU - WP_006055612.1 - DSM 11551 EU - 15791163 - sp. NRC-1 Halobacteriaceae 100 EU - 169236919 - R1 56 EU - WP_011570303.1 - walsbyi DSM 16790 Haloferacaceae 100 EU - WP_012036585.1 - Methanocella arvoryzae MRE50 EU - WP_012901361.1 - SANAE Methanocellaceae EU - WP_014405444.1 - Methanocella conradii HZ254 67 EU - WP_013719173.1 - concilii GP6 Methanosaetaceae 99 EU - WP_011696789.1 - Methanosaeta thermophila PT 50 EU - WP_014587743.1 - Methanosaeta harundinacea 6Ac 98 EU - WP_013645998.1 - sp. AL-21 86 92 EU - WP_013824756.1 - Methanobacterium sp. SWAN-1 52 EU - WP_023991358.1 - Methanobacterium sp. MB1 EU - WP_010876651.1 - Methanothermobacter thermautotrophicus str. Delta H 99 EU - WP_013296201.1 - Methanothermobacter marburgensis str. Marburg 97 EU - WP_011406969.1 - stadtmanae DSM 3091 73 EU - WP_013413402.1 - fervidus DSM 2088 EU - WP_004032959.1 - smithii ATCC 35061 Methanobacteriaceae 73 EU - WP_012956637.1 - Methanobrevibacter ruminantium M1 53 91 EU - WP_016359057.1 - Methanobrevibacter sp. AbM4 EU - WP_013100004.1 - Methanocaldococcus infernus ME 90 98 EU - WP_011868054.1 - maripaludis C5 83 EU - WP_011972429.1 - Methanococcus vannielii SB 99 EU - WP_013179550.1 - Methanococcus voltae A3 EU - WP_011973488.1 - Methanococcus aeolicus Nankai-3 53 EU - WP_013867675.1 - okinawensis IH1 EU - WP_013799325.1 - igneus Kol 5 78 EU - WP_015733759.1 - Methanocaldococcus vulcanius M7 92 EU - WP_015791961.1 - Methanocaldococcus fervens AG86 Methanocaldococcaceae 99 91 EU - WP_010870131.1 - Methanocaldococcus jannaschii DSM 2661 86 EU - WP_012980627.1 - Methanocaldococcus sp. FS406-22 EU - WP_013905373.1 - Pyrococcus yayanosii CH1 EU - WP_004067598.1 - Thermococcus litoralis DSM 5473 100 EU - WP_011249385.1 - Thermococcus kodakarensis KOD1 Thermococcaceae 72 EU - WP_014787897.1 - Thermococcus sp. CL1 99 EU - WP_015858879.1 - Thermococcus gammatolerans EJ3 52 EU - WP_014011985.1 - Thermococcus sp. 4557 EU - WP_011018459.1 - kandleri AV19 AB - WP_013863757.1 - Microlunatus phosphovorus NM-1 Propionibacteriaceae EP - WP_013469099.1 - ATCC 49179 Helicobacteraceae SP - WP_013607820.1 - Sphaerochaeta globus str. Buddy FB - WP_015846456.1 - Paenibacillus sp. JDR-2 Paenibacillaceae 99 87 AB - WP_022738809.1 - Adlercreutzia equolifaciens DSM 19450 100 AB - WP_012799496.1 - Slackia heliotrinireducens DSM 20476 77 AB - WP_013251215.1 - Olsenella uli DSM 7084 SP - WP_014271261.1 - Sphaerochaeta pleomorpha str. Grapes Spirochaetaceae 75 FB - WP_013920969.1 - Paenibacillus mucilaginosus KNP414 Paenibacillaceae FB - WP_015255887.1 - Thermobacillus composti KWC4 FC - WP_015615840.1 - Clostridium pasteurianum BC1 Clostridiaceae FC - WP_014116928.1 - Oscillibacter valericigenes Sjm18-20 Oscillospiraceae FR - WP_015732441.1 - Fibrobacter succinogenes subsp. succinogenes S85 Fibrobacteraceae SP - WP_013701989.1 - Treponema succinifaciens DSM 2489 Spirochaetaceae FC - WP_012740696.1 - [Eubacterium] eligens ATCC 27750 Eubacteriaceae 90 FC - WP_013497026.1 - Ruminococcus albus 7 Ruminococcaceae 66 FC - WP_013272846.1 - [Clostridium] saccharolyticum WM1 Lachnospiraceae 76 FC - WP_013280130.1 - Butyrivibrio proteoclasticus B316 60 FC - WP_013977983.1 - Clostridium sp. SY8519 Clostridiaceae Lachnospiraceae 53 FC - WP_013658180.1 - Cellulosilyticum lentocellum DSM 5427 FC - WP_012742409.1 - [Eubacterium] rectale ATCC 33656 Eubacteriaceae 92 FC - WP_014080130.1 - Roseburia hominis A2-183 Lachnospiraceae

0.2

FIG.3.—Phylogenetic relationships among archaeal PurOs and their bacterial counterparts. (a) Maximum likelihood tree (ML) of PurOs showing the root placement at the branch that connects PurOs from Archaea and Bacteria.(b) Rooted ML tree showing PurOs from Archaea and their homologs in Bacteria in distinct groups. The multiple alignment utilized in these reconstructions contained 81 sequences of amino acids with 134 sites. The tree was constructed with the model LGþG þ I and bootstrap was performed with 1,000 resamplings. The OTUs are identified by abbreviations that represent their phylum and class, accession number of the protein and species name. The abbreviations are: EU, phylum Euryarchaeota; FC, phylum Firmicutes class Clostridia; FB, phylum Firmicutes class Bacillales; AB, phylum Actinobacteria; SP, phylum Spirochaetes; FR, phylum Fibrobacteres; EP, phylum Proteobacteria classe Epsilonproteobacteria. The scale indicates the number of substitutions per site. PurPs II from Thaumarchaeota and Crenarchaeota,where Sulfobolales than to ;5)thePurPsII Thermoproteales formed a sister subgroup with the from do not form a monophyletic Desulfurococcales, whereas it is known from archaeal group (fig 6a). Similar phylogenetic incongruences were phylogenies that Desulfurococcales is closer to found among the PurPs III (fig. 6a).

1242 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

(a) (b)

Ser24 Asn54 PurOs of Archaea Tyr20 Arg25 Glu104 Tyr56 Asp106 Tyr2 Ser26 Tyr59 Arg5 Asn73

Tyr148

Arg30 Percentage of Identidy of Percentage Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021

PurOs of Bacteria (c)

Tyr2 Arg5 Tyr20 Ser24 Arg25 Ser26 Arg30 Asn54 Tyr56 Tyr59 Asn73 Glu104 Asp106 Tyr148 Archaea YRYSRHSRNSYYNEDY Bacteria H 123456789 E YRYGARSRDALSYNEDY

FIG.4.—Conservation of PurOs from Archaea and Bacteria.(a) Sliding window plot analysis of the multiple alignments of PurOs in Archaea and Bacteria show that the conservation in the primary structure is similar in these domains. The arrows indicate the positions of the active sites of PurO from Methanothermobacter thermoautotrophicus.(b) Amino acids residues of the 3D structure of PurO from M. thermoautotrophicus complexed with AICAR, its substrate (PDB ID code 2NTL) as visualized in ConSurf. The tertiary structure is presented using a surface-filled model and the AICAR in a ball- and-stick model. The amino acids residues are colored according to their ConSurf conservation scores calculated on the basis of multiple alignments of archaeal and bacterial PurOs. The color-coding bar varies from 1 to 9, where 9 is the most conserved and 1 most variable. Amino acid positions with low confidence are marked in yellow. Tertiary structures on the left show amino acids with scores 1 to 9 and the ones on the right show amino acids with scores 1 to 7. The figure reveals that most highly conserved amino acids (scores 8 and 9) compose or surround the active site of the enzyme. (c) Sequence logos of the positions in the multiple alignment that correspond to the active sites of PurO from M. thermoautotrophicus showing that most amino acids are conserved in Archaea and Bacteria.

118 135 152 169 (a)(b) Group 2 Methanocaldococcus jannaschii DSM 2661 ILRWESERSLEG--KLLREA TVIVKFPGARGG--RGYFIA (PurP II) Methanocaldococcus sp. FS406-22 ILRWESERSLEG--KLLREA TVIVKFPGARGG--RGYFIA PurP I Methanocaldococcus vulcanius M7 ILRWESERNLEG--KLLREA TVIVKFPGARGG--RGYFIA Methanotorris igneus Kol 5 ILRWEAERDLEG--KLLRES PVMVKFPGARGG--KGYFVC Methanococcus aeolicus Nankai-3 ILRWEAERDLEG--KLLKES PVMVKFPGARGG--RGYFPC tenax Kra 1 LIEIEADQFKKM--GLLERA PVIVKLFGAKGG--RGYFFA Group 1 Thermoproteus uzoniensis 768-20 LIEVEADQYKKM--DLLRRA PVIVKLFGAKGG--RGYFLA (PurP I) PurP II Archaeoglobus fulgidus DSM 4304 LMIWETDRDRQR--EWLERS LAIVKYPGAKGG--KGYFIV Archaeoglobus veneficus SNP6 LMLWEVDREKQR--KWLLDA LVIVKYPGAKGG--MGYFLA Archaeoglobus sulfaticallidus PM70-1 LMYWETVREMQD--RWLNSA LCIVKFPGAKGG--KGYFLA Pyrococcus yayanosii CH1 FLKWETTFELQD--RALEKA LYFVRIEGPRGG--SGHFMA Root Thermococcus sp. 4557 FLKWETSFELQD--RALEKA LYFVRLEGPRGG--SGHFLA Group 4 FLKWETTFELQD--KALEGA LYFVRIEGPRGG--SGHFIV (PurP IV) PurP IV Thermococcus kodakarensis KOD1 Thermococcus sp. CL1 FLKWETTFELQD--RALDAA LYFVRIEGPRGG--SGHFLA Thermococcus gammatolerans EJ3 FLKWETTFELQD--KALDEA LYFVRIEGPRGG--SGHFIA solfataricus P2 MLRWEERIGEKNYYKILNEA PVIVKLPEAKRKVERGFFFA fumarii 1A LLRWEERKGEKNYYKLLDDA PVLVKLPEAARRVERAFFVA PurP III GP6 LLRSEERGDKRDYYWLLEKA LVMVKLPHAVKTLERGFFTA Group 3 LLRSEERGDKMDYYWLLEKA LVIVKLPHAVKTLERGFFTA 0.2 (PurP III) Methanosaeta thermophila PT Methanosaeta harundinacea 6Ac LLRSEERGEEQDYYWILEKA LVMVKLPHAVKTLERGFFTA

FIG.5.—Groups of PurP defined by phylogenetic analysis and detail of the multiple alignment. (a) Unrooted maximum likelihood tree of PurP constructed with 154 amino acid sequences containing 399 sites, the model LGþG þ I and bootstrap with 1,000 resamplings. Root placement is indicated. (b) Indels shared by PurPs I, II and IV that were used to root the phylogenetic tree of PurP shown in (a). Five PurPs of each group are represented in the alignment.

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1243 Cruz et al. GBE

(a)(b) 90 EU - WP_015858377.1 - Thermococcus gammatolerans EJ3 EU - WP_014121268.1 - Thermococcus sp. AM4 2/3 PurP II 98 EU - WP_014788621.1 - Thermococcus sp. CL1 76 EU - WP_014012731.1 - Thermococcus sp. 4557 EU - WP_011249151.1 - Thermococcus kodakarensis KOD1 2/3 95 EU - WP_013905812.1 - Pyrococcus yayanosii CH1 EU - WP_011012664.1 - DSM 3638 Thermococci 100 EU - WP_014734336.1 - Pyrococcus sp. ST04 PurP I 56 3/3 81 EU - WP_013749388.1 - Pyrococcus sp. NA2 79 EU - WP_010885437.1 - Pyrococcus horikoshii OT3 EU - WP_013468302.1 - Thermococcus barophilus MP PurP IV EU - WP_012766081.1 - Thermococcus sibiricus MM 739 99 EU - WP_004066071.1 - Thermococcus litoralis DSM 5473 EU - WP_012901228.1 - Methanocella paludicola SANAE Methanocellales Methanomicrobia EU - WP_012940614.1 - Archaeoglobus profundus DSM 5631 98 EU - WP_010877627.1 - Archaeoglobus fulgidus DSM 4304 PurP III 99 EU - WP_012966304.1 - Ferroglobus placidus DSM 10642 Archaeoglobales Archaeoglobi 56 EU - WP_013683285.1 - Archaeoglobus veneficus SNP6 59 EU - WP_015590040.1 - Archaeoglobus sulfaticallidus PM70-1 KO - WP_012309053.1 - Candidatus Korarchaeum cryptofilum OPF8 ? ? EU - WP_015491705.1 - Thermoplasmatales archaeon BRNA1 Thermoplasmatales Thermoplasmata 78 EU - WP_013720515.1 - Methanosaeta concilii GP6 99 EU - WP_011695382.1 - Methanosaeta thermophila PT 100 EU - WP_014586143.1 - Methanosaeta harundinacea 6Ac 98 EU - WP_011020264.1 - Methanosarcina acetivorans C2A 91 EU - WP_011033439.1 - Methanosarcina mazei Go1 88 97 EU - WP_011306342.1 - str. Fusaro EU - WP_011034815.1 - Methanosarcina mazei Go1 Methanosarcinales Methanomicrobia EU - WP_013897401.1 - zhilinae DSM 4017 100 Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 EU - WP_015053828.1 - psychrophilus R15 67 EU - WP _013193586.1 - evestigatum Z-7303 EU - WP_015323592.1 - hollandica DSM 15978 PurP II EU - WP_011499674.1 - burtonii DSM 6242 EU - WP_013036507.1 - mahii DSM 5219 100 TA - WP_012215255.1 - maritimus SCM1 100 TA - WP_014963256.1 - Candidatus Nitrosopumilus koreensis AR1 ? TA - WP_015017797.1 - Candidatus Ga9.2 Nitrososphaerales Nitrososphaeria 68 CR - WP_013302772.1 - aggregans DSM 17230 Desulfurococcales 76 CR - WP_013335797.1 - Vulcanisaeta distributa DSM 14429 Thermoproteales 95 CR - WP_012608199.1 - kamchatkensis 1221n Desulfurococcales 100 CR - WP_014767300.1 - Desulfurococcus fermentans DSM 16532 CR - WP_014557707.1 - fontis Kam940 Fervidococcales 97 CR - WP_011277588.1 - Sulfolobus acidocaldarius DSM 639 52 CR - WP_010978242.1 - str. 7 54 CR - WP_012711871.1 - Sulfolobus islandicus L.S.2.15 100 100 CR - WP_009990496.1 - P2 100 CR - WP_013775566.1 - hospitalis W1 CR - WP_013736630.1 - cuprina Ar-4 99 CR - WP_011921163.1 - DSM 5348 79 CR - WP_011998477.1 - hospitalis KIN4/I CR - WP_014025786.1 - 1A Desulfurococcales 67 CR - WP_012185096.1 - Caldivirga maquilingensis IC-167 51 CR - WP_013605307.1 - Vulcanisaeta moutnovskia 768-28 98 CR - WP_014126465.1 - Thermoproteus tenax Kra 1 100 CR - WP_013679021.1 - Thermoproteus uzoniensis 768-20 100 CR - WP_011761854.1 - islandicum DSM 4184 99 CR - WP_012351253.1 - Pyrobaculum neutrophilum V24Sta Thermoproteales CR - WP_011849140.1 - Pyrobaculum calidifontis JCM 11548 73 CR - WP_011007364.1 - Pyrobaculum aerophilum str. IM2 99 CR - WP_014289334.1 - Pyrobaculum sp. 1860 CR - WP_011901712.1 - Pyrobaculum arsenaticum DSM 13514 100 CR - WP_014348067.1 - Pyrobaculum oguniense TE7 EU - WP_011018958.1 - Methanopyrus kandleri AV19 100 EU - WP_012901208.1 - Methanocella paludicola SANAE 100 EU - WP_014406961.1 - Methanocella conradii HZ254 Methanocellales Methanomicrobia EU - WP_012035114.1 - Methanocella arvoryzae MRE50 69 EU - WP_013643927.1 - Methanobacterium sp. AL-21 100 EU - WP_013826603.1 - Methanobacterium sp. SWAN-1 EU - WP_023991239.1 - Methanobacterium sp. MB1 EU - WP_013296368.1 - Methanothermobacter marburgensis str. Marburg 100 EU - WP_010876825.1 - Methanothermobacter thermautotrophicus str. Delta H 96 EU - WP_011405855.1 - Methanosphaera stadtmanae DSM 3091 Methanobacteriales 98 EU - WP_016358960.1 - Methanobrevibacter sp. AbM4 92 EU - WP_012955397.1 - Methanobrevibacter ruminantium M1 83 EU - WP_004036716.1 - Methanobrevibacter smithii ATCC 35061 PurP I EU - WP_013413473.1 - DSM 2088 61 EU - WP_013100179.1 - Methanocaldococcus infernus ME EU - WP_010869629.1 - Methanocaldococcus jannaschii DSM 2661 99 EU - WP_012980593.1 - Methanocaldococcus sp. FS406-22 100 EU - WP_015790884.1 - Methanocaldococcus fervens AG86 EU - WP_012819581.1 - Methanocaldococcus vulcanius M7 EU - WP_013798714.1 - Methanotorris igneus Kol 5 Methanococcales 8682 EU - WP_011974035.1 - Methanococcus aeolicus Nankai-3 EU - WP_013866761.1 - Methanothermococcus okinawensis IH1 85 EU - WP_013179653.1 - Methanococcus voltae A3 EU - WP_011867979.1 - Methanococcus maripaludis C5 96 EU - WP_011972491.1 - Methanococcus vannielii SB 90 EU - WP_013905372.1 - Pyrococcus yayanosii CH1 67 EU - WP_014011986.1 - Thermococcus sp. 4557 EU - WP_004067596.1 - Thermococcus litoralis DSM 5473 EU - WP_011249386.1 - Thermococcus kodakarensis KOD1 Thermococcales Thermococci PurP IV 100 EU - WP_014787896.1 - Thermococcus sp. CL1 100 EU - WP_015858881.1 - Thermococcus gammatolerans EJ3 EU - WP_011499113.1 - Methanococcoides burtonii DSM 6242 EU - WP_015052899.1 - Methanolobus psychrophilus R15 51 EU - WP_015323984.1 - Methanomethylovorans hollandica DSM 15978 EU - WP_013037844.1 - Methanohalophilus mahii DSM 5219 EU - WP_013897790.1 - Methanosalsum zhilinae DSM 4017 Methanosarcinales Methanomicrobia 99 EU - WP_011305327.1 - Methanosarcina barkeri str. Fusaro 100 EU - WP_011023596.1 - Methanosarcina acetivorans C2A 96 98 EU - WP_011032544.1 - Methanosarcina mazei Go1 EU - WP_013194044.1 - Methanohalobium evestigatum Z-7303 EU - WP_015492794.1 - Thermoplasmatales archaeon BRNA1 Thermoplasmatales Thermoplasmata EU - WP_014585944.1 - Methanosaeta harundinacea 6Ac 100 EU - WP_013719018.1 - Methanosaeta concilii GP6 Methanosarcinales Methanomicrobia 100 78 EU - WP_011696275.1 - Methanosaeta thermophila PT 87 EU - WP_012766089.1 - Thermococcus sibiricus MM 739 59 EU - WP_004067623.1 - Thermococcus litoralis DSM 5473 EU - WP_013468310.1 - Thermococcus barophilus MP EU - WP_011011535.1 - Pyrococcus furiosus DSM 3638 EU - WP_014733634.1 - Pyrococcus sp. ST04 100 EU - WP_013906329.1 - Pyrococcus yayanosii CH1 EU - WP_013748411.1 - Pyrococcus sp. NA2 Thermococcales Thermococci 70 EU - WP_010884419.1 - Pyrococcus horikoshii OT3 EU - WP_011249158.1 - Thermococcus kodakarensis KOD1 52 EU - WP_015858371.1 - Thermococcus gammatolerans EJ3 50 EU - WP_014121262.1 - Thermococcus sp. AM4 100 EU - WP_014012736.1 - Thermococcus sp. 4557 99 50 EU - WP_014788625.1 - Thermococcus sp. CL1 88 54 EU - WP_012940619.1 - Archaeoglobus profundus DSM 5631 75 EU - WP_013683779.1 - Archaeoglobus veneficus SNP6 100 EU - WP_012965443.1 - Ferroglobus placidus DSM 10642 Archaeoglobales Archaeoglobi 97 EU - WP_010877767.1 - Archaeoglobus fulgidus DSM 4304 EU - WP_015589970.1 - Archaeoglobus sulfaticallidus PM70-1 PurP III 65 EU - WP_012901227.1 - Methanocella paludicola SANAE Methanocellales Methanomicrobia 94 TA - WP_015020201.1 - Candidatus Nitrososphaera gargensis Ga9.2 Nitrososphaerales Nitrososphaeria 100 TA - WP_012215153.1 - Nitrosopumilus maritimus SCM1 Nitrosopumilales ? 100 TA - WP_014963125.1 - Candidatus Nitrosopumilus koreensis AR1 KO - WP_012309054.1 - Candidatus Korarchaeum cryptofilum OPF8 ? ? CR - WP_014026457.1 - Pyrolobus fumarii 1A 75 100 CR - WP_012608197.1 - Desulfurococcus kamchatkensis 1221n Desulfurococcales CR - WP_014767298.1 - Desulfurococcus fermentans DSM 16532 99 CR - WP_014557705.1 - Fervidicoccus fontis Kam940 Fervidococcales 93 CR - WP_013302770.1 - Ignisphaera aggregans DSM 17230 Desulfurococcales 82 CR - WP_013335798.1 - Vulcanisaeta distributa DSM 14429 Thermoproteales 64 CR - WP_011277589.1 - Sulfolobus acidocaldarius DSM 639 CR - WP_010978243.1 - Sulfolobus tokodaii str. 7 89 100 CR - WP_012711870.1 - Sulfolobus islandicus L.S.2.15 100 CR - WP_009990498.1 - Sulfolobus solfataricus P2 Sulfolobales CR - WP_013775567.1 - Acidianus hospitalis W1 CR - WP_013736629.1 - Metallosphaera cuprina Ar-4 100 CR - WP_011921162.1 - Metallosphaera sedula DSM 5348 Thermoprotei CR - WP_012123087.1 - Ignicoccus hospitalis KIN4/I Desulfurococcales 54 CR - WP_012185095.1 - Caldivirga maquilingensis IC-167 CR - WP_013605308.1 - Vulcanisaeta moutnovskia 768-28 95 CR - WP_014126464.1 - Thermoproteus tenax Kra 1 100 CR - WP_013679020.1 - Thermoproteus uzoniensis 768-20 CR - WP_011007361.1 - Pyrobaculum aerophilum str. IM2 96100 CR - WP_011901709.1 - Pyrobaculum arsenaticum DSM 13514 Thermoproteales 97 CR - WP_014348070.1 - Pyrobaculum oguniense TE7 CR - WP_014289331.1 - Pyrobaculum sp. 1860 CR - WP_011850706.1 - Pyrobaculum calidifontis JCM 11548 61 CR - WP_011761855.1 - Pyrobaculum islandicum DSM 4184 0.2 95 CR - WP_012351252.1 - Pyrobaculum neutrophilum V24Sta

FIG.6.—Phylogenetic trees of the PurPs and the proposed events of duplication that gave rise to the different groups. (a) Phylogenetic tree rooted on the branch that connects PurP III with the other PurPs. This tree is shown unrooted in figure 5a.(b) Proposed evolutionary history of PurPs. Arrows indicate duplication events and the numbers on the arrows indicate the number of times the branch is recovered in phylogenetic analyses performed with different settings (supplementary fig. S2, Supplementary Material online). The abbreviations indicate the phylum of each sequence: EU, phylum Euryarchaeota;CR, phylum Crenarchaeota; TA, phylum Thaumarchaeota; KO, phylum Korarchaeota. The scale indicates the number of substitutions per site.

1244 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

The group PurP IV is composed of PurP homologs encoded Discussion in the genome of six out of 13 OTUs analyzed in the family The PBP is ancient and its derivatives are involved in the syn- Thermococcaceae (supplementary table S4, Supplementary thesis of nucleic acid precursors, carbohydrate metabolism, Material online). These six OTUs also harbor PurPs II and III and in several cellular signaling pathways. Information on (supplementary table S4, Supplementary Material online). The the diversity of the genes encoding the enzymes of the PBP PurPs IV are highly divergent from the others PurPs but highly is important to many biotechnological applications, such as similar among them, with identities varying from 70% to the development of anticancer and antimicrobial drugs. In this 87%. In our analyses, the PurP IV always formed a distinct study, we expand the scientific knowledge on the diversity in group, in contrast to what was reported by Brown et al. the last two steps of the PBP in prokaryotic lineages. To ac-

(2011), where the two representatives of these PurPs Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 complish this, an extensive genome analysis was performed emerged inside PurP II as a basal group in Crenarchaeota with 1,403 completely sequenced and annotated prokaryotic (cluster Ib by Brown et al. 2011). The genes coding PurPs IV genomes. Comparative genomics and genomic context anal- are in the same genomic context with other pur genes (sup- yses showed that the diversity of PBP in the domain Bacteria is plementary table S5, Supplementary Material online). The higher than previously reported. For example, the genes purV, PurPs IV are being described for the first time in this study purJ,andpurO, initially reported only from Archaea,were as a novel putative isoform of PurP (supplementary table S4, found in this study to be relatively common in Bacteria.The Supplementary Material online). gene purO was until now, considered a signature of the do- The internal topology of the groups I, II, and III of the PurP main Archaea. tree is generally not congruent with the phylogenetic relation- The occurrence of genes coding for PurH in Bacteria and ships among orders, classes, and phyla of the Archaea, with Archaea was previously reported (Brown et al. 2011; few exceptions. PurPs encoded in the same genomes fall in Armenta-Medina et al. 2014). However, our results showed different groups. For instance, the groups PurPs II and III com- that contrary to what occurs in the domain Archaea,theen- posed by PurPs from de same OTUs, representing four ar- zyme PurH seems to be the preferred evolutionary alternative chaeal phyla, and the three PurPs from M. paludicola selected by most bacterial phyla to catalyze the last two steps SANAE fall in groups I, II, and III. Finally, the overall topology in the purine biosynthesis pathway. Genes encoding PurH of the PurP tree is incongruent with the archaeal phylogeny, were found in 17% of the archaeal genomes analyzed and but congruent with the hypothesis that the main PurP groups in 85% of the bacterial genomes (fig. 2). This difference may are composed by paralogs that originated in gene duplication be due to the fact that the domain AICARFT from the PurHs events. Thus, based on the finding presented earlier, we pro- uses tetrahydrofolate (THF) as a donor of formyl and in bac- pose that groups I, II, III, and IV are paralogs that originated in teria, in contrast to archaea, THF is preferentially used as the three events of gene duplication that occurred in the ancestral C1 donor (Maden 2000). A similar observation was made by of the Archaea domain (fig. 6b). Brown et al. (2011), when they reported that the few archaeal The first duplication event, also proposed by Brown et al. OTUs that could synthesize folates also possessed PurH or (2011), originated the PurP III and the ancestral of PurPs IV, I, PurV encoded in their genomes. These included OTUs from and II. Here, we proposed two additional subsequent dupli- the Class Halobacteria and from the Order cations events that originated the PurPs IV, I, and II. These . On the other hand, Archaea that do isoforms of PurP were selectively maintained or lost during not possess PurH nor PurV do not synthesize THF (White the taxonomic diversification of the domain Archaea.Thein- 1988, 1997; Choquet et al. 1994). ternal nodes representing the duplication events that origi- The genomic analyses performed in this study, as well as nated the paralogs I, II, and IV do not have enough the taxonomical patterns of gene occurrence indicate that the bootstrap support in the protein ML tree (fig. 6a)similarto combinations purV/purO, purV/purJ areabletoreplacepurH what was found by Brown et al. (2011). Therefore, we con- in members of the domain Bacteria that lack this gene, similar structed other ML trees with the multiple alignment of to what was proposed for Archaea (Brown et al. 2011). Genes nucleotides containing only unambiguous sites as determined that replace purH occurred in 73% of the archaeal genomes in the Guidance server (Penn et al. 2010). The alternative ML and in 5% of the bacterial genomes included in our analyses trees were inferred with different nucleotide positions (first (fig. 2). Approximately 37% of the purO, purV,andpurJ were and second or all positions of the codon) and with the amino in genomic context with other genes of the PBP in both bac- acid positions of the protein multiple alignment. Only the ML terial and archaeal genomes and 63% were not in genomic tree obtained with the protein multiple alignment yielded an context (table 1). The fact that part of the genes coding for alternative evolutionary history (supplementary fig. S2, enzymes in the PBP are in genomic context is expected be- Supplementary Material online) that was not similar to the cause genes functionally related tend to be in the same op- tree proposed for the PurPs in this study (fig. 6b). eron in prokaryotic genomes (Korbel et al. 2004). The

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1245 Cruz et al. GBE

genomic context is commonly used in the prediction of the Bacteria does not implicate that these two genes cannot cat- functional interactions among genes (Huynen et al. 2000). alyze the last two steps of the PBP. The experimental charac- No homologs of genes that code for PurH, PurO, PurV, and terization of the peptides encoded by these genes will PurJ were found in genomes studied of OTUs in the phyla certainly bring more information on their catalytic mecha- Crenarchaeota, Korarchaeota,andThaumarchaeota,but nisms and evolution. genes that code for homologs of PurP were found and they The molecular phylogeny of the PurOs indicates that the may be involved in the PBP, as will be discussed below. A total ancestral form of this enzyme was present in the common of 132 genomes did not harbor any of the genes coding for ancestor of the domains Archaea and Bacteria.Analternative the last two steps of the PBP nor their homologs. These OTUs, hypothesis to the origin of bacterial PurOs would be the lateral

14 Archaea and 118 Bacteria,representing10% of each transfer of an archaeal PurO to an ancient bacterium. But, in Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 domain are not able to synthesize purines de novo. All these this case, the bacterial PurOs would have emerged within the 132 OTUs that lacked the genes coding for the last two steps group of archaeal PurOs in the phylogenetic tree, which was of the PBP also lacked the other genes of the PBP, with the not the case. Based on these considerations, we inferred that exception of 54 OTUs that possessed one or a few genes of the ancestral PurO was present in the common ancestor of the pathway, from which purB or purC were the most fre- Archaea and Bacteria. The absence of purO in the majority of quently encountered (supplementary table S2, the analyzed bacterial genomes would be the result of losses Supplementary Material online). Parasites, such as many of this gene during species diversification in this domain. members of the family , , Taking these results into consideration, purO can no longer , and symbionts such as Nanoarchaeum be considered a signature of the domain Archaea,asprevi- equitans and Serratia symbiotica acquire purines by recycling ously suggested (Graupner et al. 2002; Ownby et al. 2005; pre-existing purines found in the growth substrate through Zhang, Morar, et al. 2008; Zhang, White, et al. 2008; one the salvage purine pathways (Waters et al. 2003; Jewett Armenta-Medina et al. 2014). et al. 2009; Liechti and Goldberg 2012). However, approxi- The maintenance of basic biochemical functions in homol- mately one-third (43 OTUs) of these 132 OTUs lacked the four ogous protein domains that experienced drastic structural di- salvage pathways described in prokaryotes, indicating that vergence is a recently described phenomenon (Zhang et al. they acquire purines from their hosts. Most of these OTUs 2014). Therefore, the divergence at the primary structure level are obligatory endosymbionts or intracellular parasites (sup- observed among bacterial and archaeal PurOs does not nec- plementary table S2, Supplementary Material online). essarily imply in functional differences even when the diver- According to Xu et al. (2007) the fusion of the genes that gences result in changes in the tertiary structure. This code for the domains AICARFT and IMPCH originating the phenomenon was also observed with PurOs from Bacteria PurH was favored during evolution once FAICAR, product and Archaea, where the amino acids from the catalytic site of the AICARFT domain of PurH, is spontaneously converted as well as the ones surrounding it are highly conserved into its precursor, AICAR. Therefore, the origin of PurH con- (fig. 4b) as opposed to the majority of the other amino acids tributed to accelerate the conversion of FAICAR into IMP by that are highly variable (supplementary table S3, the domain IMPCH. The hypothesis proposed by Xu et al. Supplementary Material online). The variation in the conser- (2007) for the origin of PurH, implies that the domains vation of the primary structure of the PurO of Archaea and AICARFT and IMPCH coded by two distinct genes was the Bacteria is coincident (fig 4a), indicating that the selection ancestral condition of the PBP and therefore, organisms with pressure was similar on these proteins after the divergence the PurH in their genomes would have an adaptive advan- of the domains Archaea and Bacteria. In addition, the conser- tage. Zhang, Morar, et al. (2008) speculated that although vation of the amino acid residues of the active site is not there is no tunnel or channel between the domains AICARFT related to the conservation of the region of the primary struc- and IMPCH of PurH, there is a predominance of positive ture in which they are located, even at positions where amino charges between them, what favors the transfer of FAICAR acids residues vary (fig. 4a). It suggests that they were con- from the domain AICARFT to the domain IMPCH, corroborat- served in the PurOs of and their bacterial counter- ing the suggestions made by Xu et al. (2007) for the origin of parts despite the evolutionary divergence that resulted from PurH. However, our results show that purV and purJ are func- speciation events. Further evidences that support the hypoth- tionally involved in the PBP in bacteria, like was proposed by esis that bacterial and archaeal PurOs are analogous include Brown et al. (2011) for Archaea. The domains AICARFT and the anticorrelation between the occurrence of the genes IMPCH of the human PurH produce peptides able to catalyze purO/purV and purH (fig. 2)andthefactthatpartofthe their respective enzymatic reactions (Rayl et al. 1996). These bacterial homologs of purO are in genomic context with results indicate that AICARFT and IMPCH do not need to be purV and other genes of the PBP (table 1). fused to perform their catalytic reactions. Therefore, the fact The PurPs are only present in archaeal genomes and the that the domains AICARFT and IMPCH are coded by distinct phylogenetic tree of these enzymes shows a division in four genes in many living OTUs of the domains Archaea and groups. Due to the different evolutionary histories recovered

1246 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

in the ML tree constructed with a protein multiple alignment et al. (2008) speculated that the PurP II of P. furiosus could containing only amino acid positions with high levels of reli- utilize an alternative source of formyl to catalyze the conver- ability, the relationships among these PurP paralogs proposed sion of AICAR to FAICAR while the PurP III would have a here must be viewed with caution. Perhaps, phylogenetic distinct catalytic activity or no catalytic function once it is analyses including a greater number of PurP sequences or highly divergent from PurP I and II. other methodological approaches, such as the use of complex ThePurPIofM. jannaschii and PurP II of P. furiosus have an networks, can help to solve this problem (Andrade 2011; hexameric quaternary structure formed by the interaction be- Carvalho et al. 2015). tween two trimers. However, PurP I has a more compact The collective analyses of PurPs in archaeal genomes indi- structure, with 2.5 more buried surface area between

cate that PurP IV is a new isoform, distinct from the ones the two trimers than PurP II (Zhang, White, et al. 2008). Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 previously described in the PBP (Zhang, White, et al. 2008; This weaker interaction between the trimers of PurP II of P. Brown et al. 2011). The tertiary structure of the PurP IV from furiosus was attributed to a possible crystallization artifact Thermococcus kodakarensis was resolved, however, it was rather than biologically relevant (Zhang, White, et al. 2008). not enzymatically characterized (Zhang, White, et al. 2008; However, every Archaea analyzed in this study that contain a Brown et al. 2011). PurP II also has a PurP III, except for some OTUs of the class The incongruences between the phylogeny inferred with Thermococci, which harbor the PurPs II, III, and IV. Therefore, the PurPs in this study and the phylogeny of the domain it is possible that this weaker interaction between the trimers Archaea may be the result of both the absence of PurPs in of PurP II of P. furiosus is because in its biologically active form, the genome of some OTUs (e.g., in the Class Halobacteria; the PurP II and III form heterohexamers composed of trimers supplementary text) or events of horizontal gene transfer with the same isoform. Therefore, we speculate that in this (e.g., transference of PurP I among unrelated methanogenic arrangement the PurPs II and III would be able to catalyze the Archaea—fig. 6). ninth reaction of the PBP, the conversion of AICAR into There are evidences suggesting that the PurPs II, III, and IV FAICAR. are functionally linked to the PBP. For example, most Archaea The results of this study contribute to a better understand- included in this study that harbor PurPs II and III or II, III, and IV ing of the diversity of the PBP in prokaryotes. The taxonomic do not contain the PurP I or analogous enzymes such as PurH patterns of occurrence of the genes of the last two steps of or PurV encoded in their genomes (supplementary table S1, the PBP are molecular signatures of certain groups of prokar- Supplementary Material online). In general, these archaeal yotes that could be used as markers in genetic diversity or on OTUs are free-living (supplementary table S4, functional metagenomic studies. The genes purV, purJ,and Supplementary Material online) and the genes encoding purO, previously reported only in the domain Archaea were PurPs II, III, and IV of several distantly related Archaea are in also found in Bacteria. In light of these results, purO cannot be genomic context or in putative operons with other genes of considered a signature of the domain Archaea, as previously the PBP (supplementary table S5, Supplementary Material on- reported. Bacterial PurOs were inferred to have catalytic ac- line). These OTUs are widespread in the phyla Crenarchaeota, tivity due to the conservation of amino acids in its active site Euryarchaeota, Thaumarchaeota,andKorarchaeota and rep- and to participate in the ninth step of the PBP. The findings resent more than one-third of the Archaea analyzed in this reported here on the conservation of the active sites of PurOs study. In some cases, the conservation of the genomic context in Bacteria and Archaea could be used as a guide to select of the purPs extends to OTUs of different genera, such as the positions to engineer this protein either to gain more insight in purPsfromDesulfurococcus kamchatkensis 1221n and its mode of action or modify its activity by mutations as Ignisphaera aggregans DSM 17230 or to all OTUs of one or- reported for glycerol dehydratase (Maddock et al. 2017). der, such as in Sulfolobales (supplementary table S5, Some bacterial OTUs containing purV, purJ,andpurO are Supplementary Material online). These are ecological and ge- probiotics or biomarkers for human or pathologies, nomic evidences of the involvement of the PurPs II, III, and IV such as Roseburia spp., Helicobacter felis,andPrevotella in the PBP. spp. (Fritz et al. 2006; Larsen 2017; Tamanai-Shacoori et al. The PurP I of Methanocaldococcus jannaschii was shown to 2017). Thus, these genes may be used as markers to identify have FAICAR synthase activity, which is the ninth step of the the OTUs or, alternatively, may be used in experiments of PBP (Ownby et al. 2005). In contrast, neither Pyrococcus fur- molecular docking to develop drugs that specifically inhibit iosus PurP II and PurP III have showed any detectable FAICAR pathogenic species containing these proteins (Meng et al. synthase activity (Zhang, White, et al. 2008). However, the 2011). The experimental characterization of the activity of tertiary and quaternary structures of PurP I of M. jannaschii the different isoforms of PurP from the Archaea domain and PurP II of P. furiosus revealed that their active sites were ought to be pursued in future studies. highly conserved (Zhang, White, et al. 2008). Additionally, the According to the hypothesis of Horowitz (1945), enzymes PurP II of P. furiosus binds to both ATP and AICAR (Zhang, of the last stages of biosynthetic pathways were the first to be White, et al. 2008). Based on these findings, Zhang, White, recruited during their origin and Woese (1998) proposed that

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1247 Cruz et al. GBE

the Cenancestor was a diverse community of cells that sur- Chopra AK, Peterson JW, Prasad R. 1991. Nucleotide sequence analysis of vived and evolved as a biological unit rather than a discrete purH and purD genes from Salmonella typhimurium. Biochim Biophys Acta. 1090(3):351–354. entity. The last two steps of the PBP show the highest varia- Choquet CG, Richards JC, Patel GB, Sprott GD. 1994. Purine and pyrim- tion when compared with the other steps of this pathway and idine biosynthesis in methanogenic bacteria. Arch Microbiol. indeed, this variation is found in both Archaea and Bacteria. 161(6):471–480. This diversity observed in the first recruited enzymes probably Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a se- resulted from the functional redundancy in ancient times, in quence logo generator. Genome Res. 14(6):1188–1190. Euzeby JP. 1997. List of bacterial names with standing in nomencla- accordance with Woese’s proposition. Therefore, our results ture: a folder available on the Internet. Int J Syst Bacteriol. combined with the propositions presented earlier, are congru- 47(2):590–592.

ent with the hypothesis that the last steps of PBP evolved in Fritz EL, Slavik T, Delport W, Olivier B, van der Merwe SW. 2006. Incidence Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 the Cenancestor. of Helicobacter felis and the effect of coinfection with Helicobacter pylori on the gastric mucosa in the African population. J Clin Microbiol. 44(5):1692–1696. Supplementary Material Gots JS, Dalal FR, Shumas SR. 1969. Genetic separation of the inosinic acid cyclohydrolase-transformylase complex of Salmonella typhimurium.J Supplementary data areavailableatGenome Biology and Bacteriol. 99:441–449. Evolution online. Graupner M, Xu H, White RH. 2002. New class of IMP cyclohydrolase in Methanococcus jannaschii. J Bacteriol. 184(5):1471–1473. Gu ZM, Martindale DW, Lee BH. 1992. Isolation and complete sequence of the purL gene encoding FGAM synthase II in Lactobacillus casei. Acknowledgments Gene 119(1):123–126. Han MV, Zmasek CM. 2009. phyloXML: xML for evolutionary biology and The authors acknowledge the financial support provided by comparative genomics. BMC Bioinformatics 10:356. the Brazilian organizations Fundac¸ao~ de Amparo aPesquisa He B, Smith JM, Zalkin H. 1992. Escherichia coli purB gene: cloning, nu- do Estado da Bahia (FAPESB), Coordenac¸ao~ de cleotide sequence, and regulation by purR. J Bacteriol. Aperfeic¸oamento de Pessoal de Nıvel Superior (CAPES), and 174(1):130–136. Horowitz NH. 1945. On the evolution of biochemical syntheses. Proc Natl Conselho Nacional de Desenvolvimento Cientıfico e Acad Sci U S A. 31(6):153–157. Tecnologico (CNPq). Hoskins AA, Anand R, Ealick SE, Stubbe J. 2004. The formylglycinamide ribonucleotide amidotransferase complex from subtilis: metabolite-mediated complex formation. Biochemistry Literature Cited 43(32):10314–10327. Aiba A, Mizobuchi K. 1989. Nucleotide sequence analysis of genes purH Hug LA, et al. 2016. A new view of the tree of . Nat Microbiol. and purD involved in the de novo purine nucleotide biosynthesis of 1(5):16048. Escherichia coli. J Biol Chem. 264(35):21239–21246. Huynen M, Snel B, Lathe W, Bork P. 2000. Predicting protein function by Ajawatanawong P, Baldauf SL. 2013. Evolution of protein indels in , genomic context: quantitative evaluation and qualitative inferences. and fungi. BMC Evol Biol. 13:140. Genome Res. 10(8):1204–1210. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local Inglese J, Johnson DL, Shiau A, Smith JM, Benkovic SJ. 1990. Subcloning, alignment search tool. J Mol Biol. 215(3):403–410. characterization, and affinity labeling of Escherichia coli glycinamide Andrade RFS. 2011. Detecting network communities: an application to ribonucleotide transformylase. Biochemistry 29(6):1436–1443. phylogenetic analysis. PLoS Comput Biol. 7(5):e1001131. Jewett MW, et al. 2009. GuaA and GuaB are essential for Borrelia burg- Armenta-Medina D, Segovia L, Perez-Rueda E. 2014. Comparative geno- dorferi survival in the tick-mouse cycle. J Bacteriol. mics of nucleotide metabolism: a tour to the past of the three cellular 191(20):6231–6241. domains of life. BMC Genomics 15:800. Kang YN, Tran A, White RH, Ealick SE. 2007. A novel function for the NTN Ashkenazy H, et al. 2016. ConSurf 2016: an improved methodology to hydrolase fold demonstrated by the structure of an archaeal inosine estimate and visualize evolutionary conservation in macromolecules monophosphate cyclohydrolase. Biochemistry 46(17):5050–5062. Nucleic Acids Res. 44:344–350. Kirsch DR, Whitney RR. 1991. Pathogenicity of Candida albicans auxotro- Brown AM, Hoopes SL, White RH, Sarisky CA. 2011. Purine biosynthesis in phic mutants in experimental . Infect Immun. archaea: variations on a theme. Biol Direct. 6:63. 59(9):3297–3300. Buchanan JM, Hartman SC. 1959. Enzymatic reactions in the synthesis of Koonin EV. 2015. Archaeal ancestors of eukaryotes: not so elusive any purines. Adv Enzymol. 21:199–261. more. BMC Biol. 13:84. Caetano-Anolles G, Kim SK, Mittenthal JE. 2007. The origin of modern Korbel JO, Jensen LJ, Mering CV, Bork P. 2004. Analysis of genomic con- metabolic networks inferred from phylogenomic analysis of protein text: prediction of functional associations from conserved bidirection- architecture. Proc Natl Acad Sci U S A. 104:9358–9363. ally transcribed gene pairs. Nat Biotechnol. 22(7):911–917. Carvalho DS, et al. 2015. What are the evolutionary origins of mitochon- Larsen JM. 2017. The immune response to Prevotella bacteria in chronic dria? A complex network approach. PLoS One 10(9):e0134988. inflammatory disease. Immunology 151(4):363–374. Chan SK, Hsing M, Hormozdiari F, Cherkasov A. 2007. Relationship be- Liechti G, Goldberg JB. 2012. Helicobacter pylori relies primarily on the tween insertion/deletion (indel) frequency of proteins and essentiality. purine salvage pathway for purine nucleotide biosynthesis. J Bacteriol. BMC Bioinformatics 28:227. 194(4):839–854. Cheng YS, et al. 1990. Glycinamide ribonucleotide synthetase from Liu Y, et al. 2014. Modification in de novo purine pathway for adenosine Escherichia coli: cloning, overproduction, sequencing, isolation, and accumulation by Bacillus subtilis. Wei Sheng Wu Xue Bao characterization. Biochemistry 29(1):218–227. 54(6):641–647.

1248 Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 Different Ways of Doing the Same GBE

Maddock DJ, Gerth ML, Patrick WM. 2017. An engineered glycerol dehy- Rokas A, Holland PW. 2000. Rare genomic changes as a tool for phylo- dratase with improved activity for the conversion of meso-2, 3-buta- genetics. Trends Ecol Evol. 15(11):454–459. nediol to butanone. Biotechnol J. 12(12):1700480. Sampei G-I, et al. 2010. Crystal structures of glycinamide ribonucleotide Maden BE. 2000. Tetrahydrofolate and tetrahydromethanopterin com- synthetase, PurD, from thermophilic eubacteria. J Biochem. pared: functionally distinct carriers in C1 metabolism. Biochem J. 148(4):429–438. 15(350 Pt 3):609–629. Schrimsher JL, Schendel FJ, Stubbe J, Smith JM. 1986. Purification and Marolewski A, Smith JM, Benkovic SJ. 1994. Cloning and characterization characterization of aminoimidazole ribonucleotide synthetase from of a new purine biosynthetic enzyme: a non-folate glycinamide ribo- Escherichia coli. Biochemistry 25(15):4366–4371. nucleotide transformylase from E. coli. Biochemistry 33(9):2531–2537. Smith PMC, Atkins CA. 2002. Purine biosynthesis: big in division, even Meng XY, Zhang HX, Mezei M, Cui M. 2011. Molecular docking: a pow- bigger in nitrogen assimilation. Physiol. 128(3):793–802. erful approach for structure-based drug discovery. Curr Comput Aided Tamanai-Shacoori Z, et al. 2017. Roseburia spp.: a marker of health?

Drug Des. 2:147–157. Future Microbiol. 12:157–170. Downloaded from https://academic.oup.com/gbe/article/11/4/1235/5345563 by guest on 27 September 2021 Naushad HS, Lee B, Gupta RS. 2014. Conserved signature indels and sig- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: nature proteins as novel tools for understanding microbial phylogeny molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. and systematics: identification of molecular signatures that are specific 30(12):2725–2729. for the phytopathogenic genera Dickeya, Pectobacterium and Watanabe W, Sampei G, Aiba A, Mizobuchi K. 1989. Identification and Brenneria. Int J Syst Evol Microbiol. 64(Pt 2):366–383. sequence analysis of Escherichia coli purE and purK genes encoding 50- Ni L, Guan K, Zalkin H, Dixon JE. 1991. De novo purine nucleotide biosyn- phosphoribosyl-5-amino-4-imidazole carboxylase for de novo purine thesis: cloning, sequencing and expression of a chicken PurH cDNA biosynthesis. J Bacteriol. 171(1):198–204. encoding 5-aminoimidazole-4-carboxamide-ribonucleotide Waters E, et al. 2003. The genome of Nanoarchaeum equitans: insights transformylase-IMP cyclohydrolase. Gene 106(2):197–205. into early archaeal evolution and derived parasitism. Proc Natl Acad Sci Nilsson D, Kilstrup M. 1998. Cloning and expression of the Lactococcus U S A. 100(22):12984–11298. lactis purDEK genes, required for growth in milk. Appl Environ White RH. 1988. Analysis and characterization of the folates in the non- Microbiol. 64(11):4321–4327. methanogenic archaebacteria. J Bacteriol. 170(10):4608–4612. Ownby K, Xu H, White RH. 2005. A Methanocaldococcus jannaschii ar- White RH. 1997. Purine biosynthesis in the domain Archaea without chaeal signature gene encodes for a 5-Formaminoimidazole-4-carbox- folates or modified folates. J Bacteriol. 179(10):3374–3377. amide-1–D-ribofuranosyl 5-Monophosphate synthetase: a new Woese CR, Fox GE. 1977. Phylogenetic structure of the prokaryotic do- enzyme in purine biosynthesis. J Biol Chem. 280(12):10881–10887. main: the primary kingdoms. Proc Natl Acad Sci U S A. Parker J. 1984. Identification of the purC gene product of Escherichia coli.J 74(11):5088–5090. Bacteriol. 157(3):712–717. Woese CR. 1998. The universal ancestor. Proc Natl Acad Sci U S A. Peltonen T, Mants€ al€ a€ P. 1999. Isolation and characterization of a 95(12):6854–6859. purC(orf)QLF operon from Lactococcus [correction of Lactobacillus] Xu L, et al. 2007. Structure-based design, synthesis, evaluation, and crystal lactis MG1614. Mol Gen Genet. 261(1):31–41. structures of transition state analogue inhibitors of inosine monophos- Penn O, et al. 2010. GUIDANCE: a web server for assessing alignment phate cyclohydrolase. J Biol Chem. 282(17):13033–13046. confidence scores. Nucleic Acids Res. 38(Web Server):W23–W28. Zalkin H. 1983. Structure, function, and regulation of amidophosphoribo- Petitjean C, Deschamps P, Lopez-Garc ıa P, Moreira D, Brochier-Armanet syltransferase from prokaryotes. Adv Enzyme Regul. 21:225–237. C. 2015. Extending the conserved phylogenetic core of Archaea dis- Zhang D, Iyer LM, Burroughs AM, Aravind L. 2014. Resilience of biochem- entangles the evolution of the third domain of life. Mol Biol Evol. ical activity in protein domains in the face of structural divergence. 32(5):1242–1254. Curr Opin Struct Biol. 26:92–103. Pride DT. 2000. Swaap: a tool for analyzing substitutions and similarity in Zhang Y, Morar M, Ealick SE. 2008. Structural biology of the purine bio- multiple alignments. Available from: http://www.thepridelaboratory. synthetic pathway. Cell Mol Life Sci. 65(23):3699–3724. org/software.html, last accessed March 14, 2019; distributed by the Zhang Y, White RH, Ealick SE. 2008. Crystal structure and function of 5- author. formaminoimidazole-4-carboxamide-1-b-d-ribofuranosyl 50-mono- Rayl EA, Moroson BA, Beardsley GP. 1996. The Human purH gene prod- phosphate synthetase from Methanocaldococcus jannaschii. uct, 5-aminoimidazole-4-carboxamide ribonucleotide formyltransfer- Biochemistry 47(1):205–217. ase/IMP cyclohydrolase: cloning, sequencing, expression, purification, kinetic analysis, and domain mapping. J Biol Chem. 271(4):2225–2233. Associate editor: Laura A. Katz

Genome Biol. Evol. 11(4):1235–1249 doi:10.1093/gbe/evz035 Advance Access publication February 20, 2019 1249