Exploring the Chemistry and Evolution of the Isomerases

Sergio Martínez Cuestaᵃ, Syed Asad Rahmanᵃ, and Janet M. Thorntonᵃ,¹

ᵃEuropean Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

Edited by Gregory A. Petsko, Weill Cornell Medical College, New York, NY, and approved January 12, 2016 (received for review May 14, 2015)

Isomerization reactions are fundamental in biology, and isomers usually differ in their biological role and pharmacological effects. In this study, we have cataloged the isomerization reactions known to occur in biology using a combination of manual and computational approaches. This method provides a robust basis for comparison and clustering of the reactions into classes. Comparing our results with the Enzyme Commission (EC) classification, the standard approach to represent enzyme function on the basis of the overall chemistry of the catalyzed reaction, expands our understanding of the biochemistry of isomerization. The grouping of reactions involving stereoisomerism is straightforward, with two distinct types (racemases/epimerases and cis-trans isomerases), but reactions entailing structural isomerism are diverse and challenging to classify using a hierarchical approach. This study provides an overview of which isomerases occur in nature, how we should describe and classify them, and their diversity.

Keywords: isomerases | enzyme reaction | EC-BLAST | reaction similarity | EC classification

Significance

Biologists are now challenged with the functional interpretation of vast amounts of sequencing data derived from genomics initiatives. Among all known proteins, the function of enzymes is probably the most investigated and best described at the molecular level. Together with enzymes changing the redox state of substrates and transferring chemical groups between molecules, isomerases catalyze the interconversion of isomers, molecules sharing the same atomic composition but different arrangements of chemical groups. This study presents a way of describing isomerases that will give biochemists a method to search and utilize reaction data in a more knowledge-based manner. It captures our current knowledge, characterizing the chemistry of isomerization in biology, and will contribute to improving the annotation of sequences derived from genomes.

Author contributions: S.M.C., S.A.R., and J.M.T. designed research; S.M.C. performed research; S.A.R. contributed new reagents/analytic tools; S.M.C. and J.M.T. analyzed data; and S.M.C. and J.M.T. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

¹To whom correspondence should be addressed.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1509494113/-/DCSupplemental.

The 3D structure and function of biomolecules are intimately linked. One of the most outstanding attributes of enzymes is their ability to recognize similar molecules, such as isomers, selectively. For example, glutamate racemase catalyzes the interconversion between the isomers L-glutamate and D-glutamate: the first is one of the 20 amino acids used to build proteins, whereas the second is an essential component of bacterial cell walls (1). Isomers of the same drug are often distinguished; for example, the tragic story of thalidomide unveiled how subtle changes in the spatial arrangement of atoms can have drastic consequences for their biological effect (2).

The isomerases, which catalyze these interconversions, are involved in the central metabolism of most living organisms and have important applications in organic synthesis, biotechnology, and drug discovery (3–5). In comparison to other classes, isomerases are a small class involving unimolecular reactions, which are easy to analyze manually. The study of the biological mechanisms of isomerases provided fundamental insights into the electrostatic principles of catalysis (6) and helped to reveal the connection between host–parasite interactions and cancer (7). The challenges of automatically detecting stereoisomerization in reactions also make their chemistry technically interesting (8–11).

A standard description of the biological function of genes and proteins is essential to interpret and report the outcome of sequencing initiatives. Scientists have traditionally developed elaborate classification systems to group functions in a hierarchical manner. Among the existing approaches, enzyme function is probably the best described at the molecular level, owing to the long-standing effort of a team of enzymologists from the Enzyme Commission (EC) of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) to classify enzyme function systematically. The EC classification is the most widely used system and uses four-digit identifiers, known as EC numbers, that describe different levels of the overall chemistry catalyzed by an enzyme. For instance, alanine racemase is an isomerase (EC 5) catalyzing the racemization (EC 5.1) of an amino acid (EC 5.1.1), Ala (EC 5.1.1.1). This identifier serves as a bridge between biochemical data and genomic sequences, allowing the assignment of enzymatic activity to genes and proteins in the functional annotation of genomes.

Isomerases represent one of the six EC classes and are subdivided into six subclasses, 17 sub-subclasses, and 245 EC numbers corresponding to around 300 biochemical reactions (Fig. 1A). Although the catalytic mechanisms of isomerases have already been partially investigated (3, 12, 13), the flood of new data makes an integrated overview of the chemistry of isomerization in biology timely. This study combines manual examination of the chemistry and structures of isomerases with recent developments in the automatic search and comparison of reactions. Results obtained using our de novo reaction-based clustering approach were compared with the EC classification.

Results

Unlike other EC classes, the overall chemistry of isomerases is diverse, especially at the subclass level (Fig. 1A). Some isomerases change only stereochemistry [racemases and epimerases (EC 5.1) and cis-trans isomerases (EC 5.2)]; the rest catalyze major structural rearrangements and mirror the chemistry of other EC primary classes but act intramolecularly [intramolecular oxidoreductases (EC 5.3) correspond to oxidoreductases (EC 1), intramolecular transferases (EC 5.4) to transferases (EC 2), and intramolecular lyases (EC 5.5) to lyases (EC 4)]. Finally, other isomerases (EC 5.99) are isomerases that do not fit any of the above and exhibit even greater diversity. Only three subclasses, EC 5.1, EC 5.3, and EC 5.4, are further divided into sub-subclasses depending on different attributes of the reaction: type of substrate, bond change,

www.pnas.org/cgi/doi/10.1073/pnas.1509494113 | PNAS Early Edition

Fig. 1. Analysis of the EC classification of isomerases. (A) Distribution of isomerases in six subclasses, with the type of isomerism highlighted. Different attributes of the reaction are used to divide subclasses into sub-subclasses. (B) Distribution of isomerase reactions by bond changes. The symbol “↔” indicates change of bond order.

reaction center, and chemical group transferred (Fig. 1A). An approach using a combination of manual analysis informed by an automatic comparison and clustering of reactions was contrasted with the EC nomenclature to suggest key determinants involved in the classification of isomerization in biology (SI Appendix, Fig. S1A).

Isomerase Reaction Data. At the time of writing, the NC-IUBMB (the body that oversees enzyme nomenclature) listed 5,385 active four-digit EC numbers in the classification, 245 of which correspond to isomerase EC numbers. The EC assigns an EC number to an enzyme and, based on experimental evidence, identifies its "dominant" reaction, even though the enzyme might be promiscuous and able to catalyze many different reactions. Biological databases, such as the widely used Kyoto Encyclopedia of Genes and Genomes (KEGG), rely on this so-called "IUBMB reaction," which the KEGG chooses as the representative reaction for the group of reactions associated with the same EC number. Only the 219 isomerase EC numbers with chemical structures available for all reactants and balanced IUBMB reactions were used in this analysis (Materials and Methods). This dataset represents the most complete compilation of isomerase chemistry in nature known today (SI Appendix, Table S5).

Bond Changes and Reaction Centers in the Isomerases. Using our algorithm EC-BLAST (14), we characterized the 219 isomerase reactions in our dataset by calculating the bond changes they perform and their reaction centers directly from the molecular equations describing the reactions. This analysis provides an overview of the chemistry of isomerase reactions in nature (Fig. 1B). A total of 30 different types of bond changes and 595 reaction centers were found in these reactions. The most common bond changes are the R/S stereo-change [C(R/S)], cleavage and formation of the carbon–hydrogen (C–H) bond, and cleavage and formation of the oxygen–hydrogen (O–H) bond (Fig. 1B). The most common reaction centers are carbon and oxygen atoms and 2-hydroxypropyl (SI Appendix, Fig. S1B). The distribution of the different bond changes across the EC subclasses shows that EC 5.1 and EC 5.2 are less diverse in bond changes and reaction centers than the rest, which show many different types of bond changes. For instance, all reactions in EC 5.1 involve only C(R/S), except the conversion of L-phenylalanine into D-phenylalanine, where L-phenylalanine racemase (EC 5.1.1.11) catalyzes the cleavage and formation of two O–P bonds and two O–H bonds from ATP and water molecules. EC 5.2 is mainly C(E/Z), and C–H and C–C ↔ C=C are rare. The rest of the subclasses involve a more complex combination of bond changes and reaction centers. Despite being rare, 12 bond changes (40% of the total) and 468 reaction centers (79% of the total) are distinctive of one subclass (SI Appendix, Table S1). For example, the O–O bond in ring systems is only broken by EC 5.3 enzymes present in arachidonic acid metabolism: prostaglandin synthases D, E, and I (EC 5.3.99.2, EC 5.3.99.3, and EC 5.3.99.4) and thromboxane-A synthase (EC 5.3.99.5). These enzymes catalyze the opening of epidioxy bridges in prostaglandins. On the other hand, abundant bond changes, such as C(R/S), are often present in multiple subclasses.

Isomerase Reactants. All isomerase reactions, as defined in the KEGG (15), are reversible, with both substrates and products equally designated as reactants. Most reactions are unimolecular (a single substrate leads to a single product); the only exception is the interconversion catalyzed by L-phenylalanine racemase (discussed above). This enzyme is an ATP-hydrolyzing isomerase involving three substrates (L-phenylalanine, ATP, and water) and three products (D-phenylalanine, AMP, and diphosphate). A total of 370 reactants are present in our list of isomerase reactions, and 10% of them are present in more than one reaction. The three most common reactants, (S)-2,3-epoxysqualene, geranylgeranyl diphosphate, and prostaglandin H2, each participate in one subclass only (EC 5.4, EC 5.5, and EC 5.3, respectively) (SI Appendix, Fig. S1C). Remarkably, (S)-2,3-epoxysqualene, an intermediate in the biosynthesis of terpenoids in plants, animals, and fungi, is the substrate of 25 different oxidosqualene cyclases (EC 5.4.99), which catalyze diverse cyclization/rearrangement reactions to produce cyclic sterol and triterpene products (16). In particular, these intramolecular transferases differ minimally in the structure of their active sites, yet generate structurally diverse cyclization products. Geranylgeranyl diphosphate is also involved in cyclization reactions undertaken by five different intramolecular lyases (EC 5.5.1) present in the mevalonate pathway of higher eukaryotes and bacteria. Finally, prostaglandin H2 is a

Fig. 2. Cluster analysis of isomerase reactions based on bond changes. (A) Bond change composition of three clusters (A, B, and F), which displays reactions as rows and bond changes as columns. The blue scale represents the number of bond changes in reactions. As shown to the left of the graphs, reactions are annotated in colors according to their EC subclass. Bond changes were ordered left to right by increasing number. Outliers are annotated with an arrow (SI Appendix, Table S2). (B) Bond change similarity matrix used to find the six chemically optimal clusters. The blue-to-red scale represents increasing bond change similarity, with identical reactions having a similarity of 1 (red). More details of A and B are shown in SI Appendix, Fig. S2. (C) Comparative analysis of the reaction clustering by bond changes (Left) and clustering by substrates and products (Right) using tanglegrams (Materials and Methods and SI Appendix, Fig. S5).

lipid metabolite functioning as an important regulatory molecule in animals. It is the substrate of four different intramolecular oxidoreductases (EC 5.3.99), which share similar patterns of bond changes and reaction centers. Other reactants shared between more than one reaction participate in different subclasses, however. For instance, D-glucose 6-phosphate, the second metabolite of the glycolysis pathway, is the substrate of four isomerases from three different subclasses: phosphoglucose isomerase (EC 5.3.1.9), phosphoglucomutase (EC 5.4.2.2 and EC 5.4.2.5), and 1D-myo-inositol-3-phosphate synthase (EC 5.5.1.4).

Clustering and Classification. To group reactions based on the number of chemical attributes, all-against-all reaction similarity matrices were computed and clustered hierarchically (Materials and Methods). Based on bond changes, an optimal number of six chemically sensible clusters was generated, but only two clusters correspond to a pure subclass (EC 5.1 and EC 5.2, respectively) (Fig. 2 and SI Appendix, Fig. S2). The other clusters are mixed. Clustering by reaction centers describes reactions in a more detailed manner and supports the grouping obtained by bond changes (SI Appendix, Figs. S3 and S5 C and E). Finally, the third type of clustering, by the structures of substrates and products, is completely different (Fig. 2C): it does not separate EC 5.1, EC 5.2, and the rest of the subclasses, and instead connects reactions with structurally similar reactants but different overall chemistry (SI Appendix, Figs. S4 and S5 D and F).
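To make the all-against-all comparison and hierarchical clustering more concrete, here is a minimal Python sketch. It is not EC-BLAST and not the authors' R workflow. It assumes that each reaction is reduced to a fingerprint of bond-change counts, that pairs are scored with the count-based (continuous) Tanimoto coefficient, and that clusters are merged by average linkage; the linkage criterion is an assumption made for illustration. The five reactions are hypothetical toy data whose labels loosely follow Fig. 1B.

```python
# Minimal sketch (not EC-BLAST): count-based Tanimoto similarity between
# bond-change fingerprints, followed by naive average-linkage clustering.
# Assumptions: fingerprints are dicts {bond-change label: count}; the
# reactions below are hypothetical toy data, not entries from KEGG.
from itertools import combinations
from typing import Dict, List

Fingerprint = Dict[str, int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Continuous Tanimoto score a.b / (|a|^2 + |b|^2 - a.b), in [0, 1]."""
    dot = sum(count * b.get(label, 0) for label, count in a.items())
    denom = sum(c * c for c in a.values()) + sum(c * c for c in b.values()) - dot
    return dot / denom if denom else 1.0  # two empty fingerprints count as identical


def similarity_matrix(fps: List[Fingerprint]) -> List[List[float]]:
    """All-against-all similarity matrix (symmetric, 1.0 on the diagonal)."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


def average_linkage(sim: List[List[float]], k: int) -> List[List[int]]:
    """Merge the two clusters with the highest mean pairwise similarity until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best_score, best_x, best_y = -1.0, 0, 1
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [sim[i][j] for i in clusters[x] for j in clusters[y]]
            score = sum(pairs) / len(pairs)
            if score > best_score:
                best_score, best_x, best_y = score, x, y
        clusters[best_x].extend(clusters[best_y])  # best_x < best_y, so deleting is safe
        del clusters[best_y]
    return clusters


# Hypothetical toy reactions; labels loosely follow Fig. 1B, counts are invented.
REACTIONS = {
    "racemase-like": {"C(R/S)": 1},
    "epimerase-like": {"C(R/S)": 1},
    "cis-trans-like": {"C(E/Z)": 1},
    "tautomerase-like": {"C-H": 1, "O-H": 1, "C-O<->C=O": 1, "C-C<->C=C": 1},
    "aldose-ketose-like": {"C-H": 2, "O-H": 2, "C-O<->C=O": 2},
}

if __name__ == "__main__":
    names = list(REACTIONS)
    sim = similarity_matrix([REACTIONS[n] for n in names])
    for group in average_linkage(sim, k=3):
        print([names[i] for i in group])
```

Run as a script, it groups the two C(R/S) reactions together, leaves the C(E/Z) reaction on its own, and pairs the two tautomerization-like reactions (Tanimoto score 0.6). The toy data were chosen so that stereochemical changes separate cleanly, so this mimics the pattern in Fig. 2 by construction rather than demonstrating it.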

For instance, lysine racemase (EC 5.1.1.5), lysine 2,3-aminomutase (EC 5.4.3.2), and methylornithine synthase (EC 5.4.99.58) group together in the analysis of substrates and products. The first catalyzes the racemization of L-lysine to D-lysine. The second is a radical S-adenosyl-L-methionine (SAM) enzyme; it uses pyridoxal-5′-phosphate (PLP) as a cofactor and transfers an amino group from C2 to C3 in L-lysine to produce (3S)-3,6-diaminohexanoate (17). The third is also a radical SAM enzyme, but it is not PLP-dependent; it catalyzes a mutase reaction that uses L-lysine to generate 3-methylornithine, a key precursor in the biosynthesis of pyrrolysine. Pyrrolysine is the 22nd proteinogenic amino acid, encoded by the UAG codon in the genetic code of methanogenic archaea and bacteria (18). Although the three isomerases share L-lysine as a reactant, and the two radical SAM enzymes have similar overall chemistry as evidenced by bond changes and reaction centers, the chemistry of lysine racemase is different. The difference in the use of the PLP cofactor between the two radical SAM enzymes is only apparent if mechanistic information is considered.

The subdivision of the EC 5.1, EC 5.3, and EC 5.4 isomerase subclasses into sub-subclasses was also investigated (Fig. 1A). First, although racemases and epimerases acting on amino acids (EC 5.1.1) and carbohydrates (EC 5.1.3) have the same bond changes, reaction centers and reactants split them into different groups (SI Appendix, Fig. S6A). This observation is consistent with previous investigations exploring EC 5.1 reactions on the basis of codes and self-organizing maps (10). Second, bond changes and reaction centers reveal similarities between intramolecular oxidoreductases interconverting aldoses and ketoses (EC 5.3.1) and those interconverting keto- and enol-groups (EC 5.3.2), and differences from intramolecular oxidoreductases transposing C=C bonds (EC 5.3.3) (SI Appendix, Fig. S6B). Third, the division of EC 5.4 on the basis of the chemical group transferred seems sensible but proved difficult to extract using our chemical attributes. For instance, isochorismate synthase (EC 5.4.4.2) and chorismate mutase (EC 5.4.99.5) isomerize the substrate chorismate into isochorismate and prephenate, respectively. Although both reactions share the same substrate and similar bond changes and reaction centers, the former involves the transfer of a hydroxyl group, whereas the latter transfers a 2-hydroxyprop-2-enoic acid group (SI Appendix, Fig. S6C).

Structures of Isomerases. To provide an overview of the relationship between isomerase chemistry and protein structure, we performed a manual analysis of the domain composition of all available isomerase 3D structures. A total of 136 isomerase EC numbers have Protein Data Bank (PDB) structural data and Class Architecture Topology Homologous (CATH) domain definitions according to the Structure Integration with Function, Taxonomy, and Sequences (SIFTS) resource (19). The data are shown in Fig. 3 and SI Appendix, Fig. S8, in which the catalytic domains (the minimum set of domains necessary for the catalysis of each EC number) are displayed. Almost one-fifth (24 EC numbers) of the EC numbers are associated with multiple unrelated protein folds, showing that these functions have evolved multiple times from unrelated protein structures with different domain compositions, most often consisting of two or three distinct catalytic domains (SI Appendix, Fig. S8). Among the most diverse EC numbers, we find the peptidylprolyl isomerases (EC 5.2.1.8), which catalyze the cis-trans isomerization of proline peptide bonds and display 11 unrelated protein structures corresponding to 14 domains. Also, for chorismate mutase (EC 5.4.99.5) and the DNA topoisomerases (EC 5.99.1.2 and EC 5.99.1.3), we observe that these reactions are performed by six unrelated proteins.

In total, we identified 141 different domains involved in the catalysis of isomerase function (Fig. 3C), with about one-third (49 domains) connected to multiple isomerase EC numbers. Of these domains, 23 are involved in isomerase reactions that are often very diverse from both the chemistry and substrate perspectives. Among these, we found the CATH 3.20.20.70 alpha-beta domain, known as "aldolase class I," which facilitates different types of isomerizations in modified sugar molecules bearing a phosphate group (EC 5.1.3.1, EC 5.1.3.9, EC 5.3.1.1, EC 5.3.1.16, and EC 5.3.1.24). It also transposes the C=C bond in isopentenyl diphosphate (EC 5.3.3.2), catalyzes the aromatization of thiazole moieties (EC 5.3.99.10), and mutates L-lysine into 3-methylornithine (EC 5.4.99.58). Multidomain structures add a further level of complexity [e.g., the current data imply that both enolase (CATH 3.20.20.120) and enolase-like, N-terminal (CATH 3.30.390.10) domains are necessary to catalyze amino acid, dipeptide, and mandelate racemizations (EC 5.1.1.10, EC 5.1.1.20, and EC 5.1.2.2) and muconate cycloisomerizations (EC 5.5.1.1 and EC 5.5.1.7)].

There are, however, 19 domains where the overall chemistry of the reaction and substrate-binding abilities are conserved. Perhaps the most relevant example is the RNA pseudouridine synthases (EC 5.4.99.19–EC 5.4.99.29), in which the pseudouridine synthase domain (CATH 3.30.2350.10) changes uridine into pseudouridine. There are also five domains involved in reactions that conserve the substrate with modest changes in the chemistry. For example, epimerizations in the C1 (EC 5.3.1.8 and EC 5.3.1.31) and C2 (EC 5.1.3.8 and EC 5.1.3.11) positions of sugar structures are both catalyzed by CATH domain 1.50.10.10. Finally, two domains are involved in reactions that change the substrate while conserving the chemistry. For example, the macrophage migration inhibitory factor domain (CATH 3.30.429.10) catalyzes related keto-enol tautomerizations in phenylpyruvate

Fig. 3. Domain composition of isomerase 3D structures. (A) Distribution of isomerase domains aggregated by EC sub-subclass (rows) and second level of CATH (columns). A red square represents one or more domains involved in an EC number. More details are shown in SI Appendix, Fig. S8. (B) Distribution of CATH domains per isomerase EC number. (C) Distribution of isomerase EC numbers per CATH domain. The frequency axes represent the counts of isomerase EC numbers (B) and CATH domains (C).

(EC 5.3.2.1), oxaloacetate (EC 5.3.2.2), 2-hydroxymuconate derivatives (EC 5.3.2.6 and EC 5.3.3.10), and L-dopachrome (EC 5.3.3.12).

Isomerase Reactions in the Universe of Enzyme Catalysis. To compare isomerases with all known enzyme reactions, all 65 subclasses in the EC nomenclature were compared and clustered on the basis of their bond changes (SI Appendix, Fig. S9). The number of bond changes for all reactions was aggregated and normalized by the number of reactions in each subclass. From this heat map, it is immediately apparent that bond changes alone cannot be used to classify enzyme reactions into the six primary classes. The most outstanding feature is that stereoisomerase reactions [racemases, epimerases (EC 5.1), and cis-trans isomerases (EC 5.2)] are distinct and show little similarity to any other subclass. The remarkably simple overall chemistry of these isomerases contrasts with the complex chemistry of other EC subclasses (SI Appendix, Fig. S9A). Other than this difference, the isomerases group primarily with oxidoreductases (EC 1) and lyases (EC 4). Intramolecular oxidoreductases (EC 5.3) cluster most closely with oxidoreductases acting on the CH–CH group of donors (EC 1.3) and with those oxidoreductases reducing the C–O–C group as acceptors (EC 1.23). Interestingly, the intramolecular transferases (EC 5.4) and intramolecular lyases (EC 5.5) group together and are most similar to C–O lyases (EC 4.2). These enzymes all transform C–C bonds in ring structures (SI Appendix, Fig. S9B). In a separate study of the evolution of isomerases (20), we found that isomerases most frequently evolve to become lyases, presumably because of the similarity in chemistry that this clustering reveals. There is also some clustering with transferases that convert aldehyde or oxo groups (EC 2.2), C–C lyases (EC 4.1), and oxidoreductases acting on the CH–OH group of donors (EC 1.1). Most enzymes belonging to these subclasses catalyze changes in O–H and C–H bonds and chiral inversions.

The composite measure we use to combine reactions in a given subclass sometimes masks individual reactions that do not follow the overall trend. For example, the 2,3-diphosphoglycerate–dependent and –independent phosphoglycerate mutases (EC 5.4.2.11 and EC 5.4.2.12, respectively) catalyze the intramolecular transfer of a phosphate group in the conversion of 3-phosphoglycerate to 2-phosphoglycerate in the glycolysis pathway. These enzymes share the catalysis of O–P and O–H bonds with phosphotransferases and kinases (EC 2.7), phosphatases and phosphoric ester hydrolases (EC 3.1), and other hydrolases acting on phosphorus-containing anhydrides (EC 3.6). The EC 5.99 isomerases are completely different from the other isomerases and do not cluster with them, or indeed with any other reactions, in the heat map. Their closest neighbors are predominantly oxidoreductases, but similarities are low.

Discussion

In this paper, we have revisited the isomerase classification, using an automated approach to capture three critical characteristics of enzyme reactions: the changes they generate in covalent bond structure, their reaction centers, and the structure of the substrates and products involved. This study considers neither the mechanisms nor any cofactors used in catalysis (21), which has some advantages, because mechanistic information is difficult to validate experimentally and is therefore disputed in the literature. In addition, mechanistic components are usually not captured in reaction files, and the EC classification does not use mechanisms per se. At the time of writing, only one-fifth of the isomerase reactions have mechanistic data in the Mechanism, Annotation, and Classification in Enzymes (MACiE) resource (22), and we also note that mechanism does not correlate particularly well with the EC classification (23).

Overall, the automatic approach in EC-BLAST works rather well, being able to handle 99% of all isomerase reactions. However, some of these reactions, involving cyclizations of (S)-2,3-epoxysqualene and derivatives (EC 5.4.99), are fairly complex even at the overall level. As a result, these reactions are challenging for atom-atom mapping and difficult to handle in our automatic approach. This limitation led to an overestimation of their number of bond changes, and work to resolve these issues is ongoing.

The correlation between our automated clustering of the isomerases and the EC classification is mixed. Previous studies found agreement between the grouping of biochemical reactions based on chemical attributes and the EC classification, especially in oxidoreductases (EC 1), hydrolases (EC 3), and ligases (EC 6) (24–27). We deliberately chose the clustering algorithm and number of clusters to optimize comparison with the EC classification in terms of fitting subclasses. On the whole, bond changes work best for partially recreating the six subclasses (Fig. 2). Nevertheless, the bond change distribution is not pure, highlighting the nonhierarchical nature of chemical reactions, including isomerases. Reaction centers generate a more complex classification but support the results obtained by bond changes. Clustering based on the structures of substrates and products shows clear differences compared with clustering using bond changes and reaction centers (SI Appendix), but this approach is useful to find enzymes that work on the same reactants.

To summarize the evidence, the classification of enzymatic isomerization is not easy because the nature and number of chemical attributes are drastically different between the subclasses of isomerases. Racemases, epimerases, and cis-trans isomerases (EC 5.1 and EC 5.2) have relatively few bond changes and reaction centers, and are sensibly grouped using either metric. However, EC 5.3, EC 5.4, and EC 5.5 are present in mixed clusters and involve more complex combinations of bond changes and reaction centers between substrates and products. This variation leads to more chemical diversity, which poses challenges when trying to classify them using a hierarchical approach.

The intramolecular isomerases (EC 5.4) are dominated by the sub-subclass EC 5.4.99 (intramolecular transferases transferring "other" groups), which accounts for almost two-thirds of all EC 5.4 reactions and one-fifth of all EC 5 reactions and is the most populated sub-subclass in isomerases. However, this sub-subclass complexity can be deconvoluted if clusters of reactions sharing similar overall chemistry are redefined and considered separately. In particular, reactions catalyzed by oxidosqualene cyclases (25 reactions), RNA pseudouridine synthases (16 reactions), and carbon mutases (six reactions) are similar enough to be considered separately from the rest of EC 5.4.99 reactions [SI Appendix, Intramolecular Transferases (EC 5.4) and Fig. S6]. Perhaps these reactions deserve separate sub-subclasses.

The subclass "other isomerases" (EC 5.99) sits apart from the rest and exhibits even greater chemical diversity. The topoisomerases (EC 5.99.1.2 and EC 5.99.1.3) change the topology of DNA while maintaining atom connectivity, and can thus be considered to be stereoisomerases (28), whereas thiocyanate isomerase (EC 5.99.1.1) and 2-hydroxychromene-2-carboxylate isomerase (EC 5.99.1.4) qualify as structural isomerases.

Overall, this study of the chemical attributes of biological isomerization reflects differences depending on the type of isomerism between substrate and product, and highlights three groups of similar reactions: enantioisomerism (racemases and epimerases), cis-trans isomerism (cis-trans isomerases), and structural isomerism (intramolecular oxidoreductases, intramolecular transferases, and intramolecular lyases). The other isomerases include both stereoisomers and structural isomers.

Even the combination of all three ways of comparing enzyme reactions is not able to identify all of the isomerases exclusively or to reproduce ab initio the primary classification of enzymes as defined by the EC. However, this robust automatic approach provides a rigorous way to characterize all of the isomerase reactions and to discriminate between some of the subclasses. It is also powerful for studying their evolution (20) and for assisting in the development of enzymes to perform novel reactions, based on a better description of extant enzymes.

Materials and Methods

Reaction Curation and Similarity. Structural information about the substrates and products of the IUBMB reactions associated with 219 active four-digit

Martínez Cuesta et al. PNAS Early Edition | 5of6 Downloaded by guest on September 26, 2021 isomerase EC numbers was available from the 70.0+ release of the KEGG (15) EC subclass and the same cluster, FP (false positives) are the number of pairs and accessible using the KEGG Advanced Programming Interface (29). in different EC subclass but located in the same cluster, and FN (false neg- Structures were downloaded in MDL Molfile format, analyzed using RDKit atives) are the number of pairs in the same EC subclass but located in dif- (30), and visualized using MarvinSketch software (version 5.9.4) from ferent clusters. Second, as internal evaluation, hierarchical trees were ChemAxon (31). Explicit hydrogens were manipulated using the Molecule pruned at the height that simultaneously minimizes the number of clusters File Converter of Marvin, and reaction files were built, cleaned, balanced, and the spread within each cluster using the maptree package (33). Third, and stored in Rxnfile format. All-against-all similarities between reaction the best correspondence between clusters and EC classification was explored fingerprints were calculated on the basis of bond changes, reaction centers, using the mclust package (34), which helped to determine the extent to and structures of substrates and products using EC-BLAST (14). Bond changes which subclasses and sub-subclasses are prevalent in clusters. refer to the cleavage and formation of chemical bonds, changes in bond order, and stereo-changes. A reaction center is the collection of atoms and bonds that are changed during the reaction, also known as the local atomic Comparative Analysis. Similarity distributions and clustering solutions com- environment around the atoms involved in bond changes (SI Appendix). To puted for each chemical attribute were compared using several strategies. 
measure reaction similarity, a Tanimoto score was used ranging between First, differences and correlations between similarity distributions were 0 (no similarity) and 1 (identical reactions). This analysis assumed a one-to- evaluated using the Kolmogorov–Smirnov test and Pearson’s correlation one relationship whereby a single IUBMB reaction uniquely designates any coefficient (r), respectively (35, 36). Second, the cross-tabulation of reactions given isomerase EC number. However, this assumption is an approximation, resulting from different clustering methods was manually analyzed using and almost 30% of EC numbers are associated with more than one reaction. contingency tables. Third, topological distances between clustering trees, defined as twice the number of internal branches representing different Cluster Analysis. To find groups of similar reactions, hierarchical clustering bipartitions of the tips (37), were also used. Fourth, tanglegrams involved was performed using the R Environment for Statistical Computing (32). Three drawing reaction clustering trees opposite to each other for visualization approaches were used to select the best clustering algorithm and to choose (38). The number of chemical attributes for EC subclasses was aggregated on the optimal number of clusters. First, using external evaluation, clustering the basis of each individual reaction and normalized by the number of re- algorithms were compared on their ability to obtain the greatest purity in EC actions in each subclass. Cluster analysis was used to compare subclasses as subclasses or sub-subclasses using the F measure as implemented in in-house described above. scripts. This score is the harmonic mean of precision (p) and recall (r), de- fined as Fmeasure = 2pr . In the context of EC subclasses, precision is the p + r ACKNOWLEDGMENTS. We thank Drs. Gemma L. Holliday and John B. O. 
fraction of isomerase reactions classified in the correct subclass (p = TP ) TP + FP Mitchell for critically reading the manuscript and providing suggestions for and captures the subclass purity of clusters. Recall refers to the fraction of improvement, and Drs. Dobril K. Ivanov and Matthias Ziehm for technical isomerase reactions belonging to the same subclass grouped in the same discussions. We also thank the KEGG for making its reaction data available = TP cluster (r TP + FN) and represents how spread subclasses are across different for academic use through its Advanced Programming Interface services. This clusters. TP (true positives) are the number of pairs of reactions in the same study was funded by the European Molecular Biology Laboratory.
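The Tanimoto score described under Reaction Curation and Similarity can be illustrated with a minimal Python sketch. It assumes that each reaction fingerprint is a set of binary features (for example, bond changes); the feature strings and reaction names below are hypothetical, and EC-BLAST's actual fingerprint encoding and weighting may differ.

```python
from __future__ import annotations

from itertools import combinations


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Tanimoto (Jaccard) score of two binary fingerprints stored as feature sets.

    Returns 0.0 when no features are shared and 1.0 when the sets are identical.
    """
    union = a | b
    if not union:
        # Convention (assumption): two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


def all_against_all(fingerprints: dict[str, frozenset[str]]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of reactions once, keyed by sorted name pairs."""
    names = sorted(fingerprints)
    return {
        (x, y): tanimoto(fingerprints[x], fingerprints[y])
        for x, y in combinations(names, 2)
    }


# Hypothetical bond-change fingerprints for three made-up reactions.
fingerprints = {
    "R1": frozenset({"C-H:cleaved", "C-H:formed", "C:stereo_changed"}),
    "R2": frozenset({"C-H:cleaved", "C-H:formed", "C:stereo_changed"}),
    "R3": frozenset({"C-H:cleaved", "C-H:formed", "C=C:order_changed"}),
}

for pair, score in all_against_all(fingerprints).items():
    print(pair, round(score, 2))
```

Here R1 and R2 share every feature and score 1.0, while R1 and R3 share two of four distinct features and score 0.5.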

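The pair-counting F measure defined under Cluster Analysis can be sketched in Python as follows. It assumes that each reaction carries exactly one EC subclass label and one cluster label. The study's in-house scripts are not reproduced here, and the labels in the example are invented.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence
from itertools import combinations


def pairwise_f_measure(
    subclasses: Sequence[Hashable], clusters: Sequence[Hashable]
) -> tuple[float, float, float]:
    """Return (precision, recall, F) by counting pairs of reactions.

    subclasses[i] is the EC subclass of reaction i; clusters[i] is its cluster.
    """
    if len(subclasses) != len(clusters):
        raise ValueError("each reaction needs one subclass and one cluster label")
    tp = fp = fn = 0
    for i, j in combinations(range(len(subclasses)), 2):
        same_subclass = subclasses[i] == subclasses[j]
        same_cluster = clusters[i] == clusters[j]
        if same_subclass and same_cluster:
            tp += 1  # same EC subclass, same cluster
        elif same_cluster:
            fp += 1  # different EC subclasses, same cluster
        elif same_subclass:
            fn += 1  # same EC subclass, different clusters
        # Pairs that differ in both labels (true negatives) do not enter p or r.
    # Convention (assumption): an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: five reactions, three EC subclasses, two clusters.
p, r, f = pairwise_f_measure(["5.1", "5.1", "5.2", "5.2", "5.3"], [1, 1, 1, 2, 2])
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")  # precision=0.25 recall=0.50 F=0.33
```

In the example, one pair is a true positive, three pairs share a cluster across subclasses, and one same-subclass pair is split across clusters, giving $p = 1/4$ and $r = 1/2$.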
1. Lam H, et al. (2009) D-amino acids govern stationary phase cell wall remodeling in bacteria. Science 325(5947):1552–1555.
2. Jacques V, Czarnik AW, Judge TM, Van der Ploeg LHT, DeWitt SH (2015) Differentiation of antiinflammatory and antitumorigenic properties of stabilized enantiomers of thalidomide analogs. Proc Natl Acad Sci USA 112(12):E1471–E1479.
3. Asano Y, Hölsch K (2012) Isomerizations. Enzyme Catalysis in Organic Synthesis, eds Drauz K, Gröger H, May O (Wiley–VCH, Weinheim, Germany), 3rd Ed, pp 1607–1684.
4. Hilterhaus L, Liese A (2012) Industrial application and processes using isomerases. Enzyme Catalysis in Organic Synthesis, eds Drauz K, Gröger H, May O (Wiley–VCH, Weinheim, Germany), 3rd Ed, pp 1685–1691.
5. Lundqvist T, et al. (2007) Exploitation of structural and regulatory diversity in glutamate racemases. Nature 447(7146):817–822.
6. Fried SD, Bagchi S, Boxer SG (2014) Extreme electric fields power catalysis in the active site of ketosteroid isomerase. Science 346(6216):1510–1514.
7. Marsolier J, et al. (2015) Theileria parasites secrete a prolyl isomerase to maintain host leukocyte transformation. Nature 520(7547):378–382.
8. Ott MA, Vriend G (2006) Correcting ligands, metabolites, and pathways. BMC Bioinformatics 7:517.
9. Apostolakis J, Sacher O, Körner R, Gasteiger J (2008) Automatic determination of reaction mappings and reaction center information. 2. Validation on a biochemical reaction database. J Chem Inf Model 48(6):1190–1198.
10. Latino DARS, Zhang Q-Y, Aires-de-Sousa J (2008) Genome-scale classification of metabolic reactions and assignment of EC numbers with self-organizing maps. Bioinformatics 24(19):2236–2244.
11. Chen WL, Chen DZ, Taylor KT (2013) Automatic reaction mapping and reaction center detection. Wiley Interdiscip Rev Comput Mol Sci 3(6):560–593.
12. Tanner ME (2002) Understanding nature's strategies for enzyme-catalyzed racemization and epimerization. Acc Chem Res 35(4):237–246.
13. Silverman RB (2002) The Organic Chemistry of Enzyme-Catalyzed Reactions (Academic, London).
14. Rahman SA, Cuesta SM, Furnham N, Holliday GL, Thornton JM (2014) EC-BLAST: A tool to automatically search and compare enzyme reactions. Nat Methods 11(2):171–174.
15. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40(Database issue):D109–D114.
16. Abe I (2014) The oxidosqualene cyclases: One substrate, diverse products. Natural Products: Discourse, Diversity, and Design, eds Osbourn A, Goss R, Carter GT (Wiley, New York), 1st Ed, pp 297–317.
17. Frey PA, Hegeman AD, Ruzicka FJ (2008) The Radical SAM Superfamily. Crit Rev Biochem Mol Biol 43(1):63–88.
18. Gaston MA, Zhang L, Green-Church KB, Krzycki JA (2011) The complete biosynthesis of the genetically encoded amino acid pyrrolysine from lysine. Nature 471(7340):647–650.
19. Velankar S, et al. (2013) SIFTS: Structure Integration with Function, Taxonomy and Sequences resource. Nucleic Acids Res 41(Database issue):D483–D489.
20. Martínez Cuesta S, Furnham N, Rahman SA, Sillitoe I, Thornton JM (2014) The evolution of enzyme function in the isomerases. Curr Opin Struct Biol 26:121–130.
21. Holliday GL, Mitchell JBO, Thornton JM (2009) Understanding the functional roles of amino acid residues in enzyme catalysis. J Mol Biol 390(3):560–577.
22. Holliday GL, et al. (2012) MACiE: Exploring the diversity of biochemical reactions. Nucleic Acids Res 40(Database issue):D783–D789.
23. Nath N, Mitchell JBO (2012) Is EC class predictable from reaction mechanism? BMC Bioinformatics 13(1):60.
24. Mu F, Unkefer PJ, Unkefer CJ, Hlavacek WS (2006) Prediction of oxidoreductase-catalyzed reactions based on atomic properties of metabolites. Bioinformatics 22(24):3082–3088.
25. Egelhofer V, Schomburg I, Schomburg D (2010) Automatic assignment of EC numbers. PLOS Comput Biol 6(1):e1000661.
26. Hu X, Yan A, Tan T, Sacher O, Gasteiger J (2010) Similarity perception of reactions catalyzed by oxidoreductases and hydrolases using different classification methods. J Chem Inf Model 50(6):1089–1100.
27. Holliday GL, Rahman SA, Furnham N, Thornton JM (2014) Exploring the biological and chemical complexity of the ligases. J Mol Biol 426(10):2098–2111.
28. O'Brien PJ (2006) Catalytic promiscuity and the divergent evolution of DNA repair enzymes. Chem Rev 106(2):720–752.
29. Kawashima S, Katayama T, Sato Y, Kanehisa M (2003) KEGG API: A web service using SOAP/WSDL to access the KEGG system. Genome Informatics 14:673–674.
30. Landrum G (2013) RDKit: Open-Source Cheminformatics Software. Available at www.rdkit.org. Accessed July 31, 2014.
31. ChemAxon (2012) MarvinSketch, version 5.9.4. Available at https://www.chemaxon.com/products/marvin/marvinsketch/. Accessed May 11, 2012.
32. R Core Team (2012) R: A Language and Environment for Statistical Computing. Available at www.r-project.org/. Accessed July 31, 2014.
33. Kelley LA, Gardner SP, Sutcliffe MJ (1996) An automated approach for clustering an ensemble of NMR-derived protein structures into conformationally related subfamilies. Protein Eng 9(11):1063–1065.
34. Fraley C, Raftery AE, Murphy TB, Scrucca L (2012) mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. Technical Report No. 597 (University of Washington, Seattle, WA).
35. Crawley MJ (2007) The R Book (Wiley, Chichester, UK).
36. Oksanen J (2011) Multivariate analysis of ecological communities in R: Vegan tutorial. Available at cran.r-project.org/web/packages/vegan/index.html. Accessed July 31, 2014.
37. Paradis E, Claude J, Strimmer K (2004) APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20(2):289–290.
38. Scornavacca C, Zickmann F, Huson DH (2011) Tanglegrams for rooted phylogenetic trees and networks. Bioinformatics 27(13):i248–i256.

Martínez Cuesta et al. www.pnas.org/cgi/doi/10.1073/pnas.1509494113