SI Appendix Index 1
Index
Calculating chemical attributes using EC-BLAST
Chemical attributes in isomerase reactions
Bond changes
Reaction centres
Substrates and products
Comparative analysis
Racemases and epimerases (EC 5.1)
Intramolecular oxidoreductases (EC 5.3)
Intramolecular transferases (EC 5.4)
Supporting references
Fig. S1. Overview
Fig. S2. Analysis of bond changes
Fig. S3. Analysis of reaction centres
Fig. S4. Analysis of substrates and products
Fig. S5. Comparative analysis of chemical attributes
Fig. S6. Analysis of subclasses EC 5.1, 5.3 and 5.4
Fig. S7. Example of similarities between three isomerase reactions
Fig. S8. Domain composition of isomerase structures
Fig. S9. Isomerase reactions in the universe of enzyme function
Table S1. Overall statistics of chemical attributes
Table S2. Analysis of bond changes clusters
Table S3. Analysis of reaction centres clusters
Table S4. Analysis of substrates and products clusters
Table S5. List of isomerase reactions

Calculating chemical attributes using EC-BLAST

Our new method EC-BLAST performs robust comparisons and similarity searching of chemical reactions in a quantitative manner using the following chemical attributes: bond changes, reaction centres and structures of substrates and products (Reaction curation and similarity section in Materials and methods, main text). EC-BLAST uses reaction files (Rxnfiles) as the input for four different algorithms that perform atom-atom mapping using the maximum common substructure (MCS) method. Bond changes are obtained automatically using the Dugundji-Ugi (DU) model.
The best mapping solution is chosen on the basis of the number of bond changes, the number of fragments generated and the bond energies, according to the principle of minimum chemical distance. In the DU model, substrates and products are represented as bond-electron matrices in which diagonal elements correspond to the number of free valence electrons and off-diagonal entries give the bond orders between atoms. The reaction matrix is obtained by subtracting the substrates matrix from the products matrix. A positive element in this matrix indicates bond formation, whereas a negative element corresponds to bond cleavage. Changes in the stereochemistry of atoms and bonds, also known as stereochanges, are coded in parallel.

Cleaved and formed bonds are indicated as lines connecting atoms; for instance, C–C means a single carbon-carbon bond that is cleaved or formed in the reaction. Bond order changes are represented as double arrows connecting bonds; for example, C–C↔C=C means a single carbon-carbon bond turning into a double carbon-carbon bond or vice versa. Stereochanges are represented as atoms that change their absolute configuration; for instance, C(R/S) means a carbon atom that changes from R to S configuration.

In order to obtain reaction centres, EC-BLAST calculates circular fingerprints at different levels around each non-hydrogen atom present in the bond changes. Reaction centres at level 0 simply comprise the atoms involved in bond changes, level 1 extends radially by one covalent bond from level 0, level 2 extends by two covalent bonds from level 0, and so on. Finally, the full structure of the substrates and products of the reaction is also considered.

The type and number of chemical attributes for each reaction are encoded in reaction fingerprints, and a Tanimoto score is used to compare reaction fingerprints and calculate similarities on the basis of bond changes, reaction centres or reactant structure individually. The results are stored in similarity matrices.
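
The matrix subtraction and the fingerprint comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EC-BLAST implementation: the keto-enol fragment, the shared atom order (which stands in for an atom-atom mapping), all function names and the second fingerprint are hypothetical, and the min/max form of the Tanimoto score for count fingerprints is one common variant that may differ from the one EC-BLAST uses.

```python
# Toy illustration only: names, atom order and data are hypothetical
# and are not taken from EC-BLAST.

ATOMS = ["O", "C1", "C2", "H"]  # shared atom order = assumed atom-atom mapping


def be_matrix(n_atoms, bonds, free_electrons):
    """Symmetric bond-electron matrix: bond orders off the diagonal,
    free valence electrons on the diagonal."""
    m = [[0] * n_atoms for _ in range(n_atoms)]
    for (i, j), order in bonds.items():
        m[i][j] = m[j][i] = order
    for i, n in free_electrons.items():
        m[i][i] = n
    return m


def reaction_matrix(substrates, products):
    """Reaction matrix R = products - substrates, element by element."""
    return [[p - s for s, p in zip(s_row, p_row)]
            for s_row, p_row in zip(substrates, products)]


def bond_changes(r, atoms):
    """Non-zero off-diagonal entries of R (upper triangle only).
    Positive: bond formed or order increased; negative: the opposite."""
    return {(atoms[i], atoms[j]): r[i][j]
            for i in range(len(atoms))
            for j in range(i + 1, len(atoms))
            if r[i][j] != 0}


def tanimoto(fp_a, fp_b):
    """Min/max Tanimoto score for count fingerprints (assumed variant)."""
    keys = set(fp_a) | set(fp_b)
    shared = sum(min(fp_a.get(k, 0), fp_b.get(k, 0)) for k in keys)
    total = sum(max(fp_a.get(k, 0), fp_b.get(k, 0)) for k in keys)
    # Two empty fingerprints are treated as identical (a convention choice).
    return shared / total if total else 1.0


# Keto fragment: O=C1, C1-C2, C2-H. Enol fragment: O-C1, C1=C2, O-H.
# Oxygen keeps four free valence electrons (two lone pairs) in both forms.
keto = be_matrix(4, {(0, 1): 2, (1, 2): 1, (2, 3): 1}, {0: 4})
enol = be_matrix(4, {(0, 1): 1, (1, 2): 2, (0, 3): 1}, {0: 4})
changes = bond_changes(reaction_matrix(keto, enol), ATOMS)

# Bond-change fingerprints as {attribute: count}; fp_b is a made-up reaction.
fp_a = {"C-H": 1, "O-H": 1, "C-C<->C=C": 1, "C-O<->C=O": 1}
fp_b = {"C-H": 2, "O-H": 2}
score = tanimoto(fp_a, fp_b)
```

In this toy case the reaction matrix has entries of -1 for O/C1 and C2/H and +1 for C1/C2 and O/H, which corresponds to one cleaved C–H bond, one formed O–H bond and two bond order changes. The score between the two fingerprints lies between 0 and 1 because only some bond changes are shared.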
An example illustrating the calculation of chemical attributes and similarities between three isomerase reactions is shown in Fig. S7.

Chemical attributes in isomerase reactions

Summary statistics of the bond changes, reaction centres and substrates and products found in isomerase reactions are displayed in Table S1. Some chemical attributes (bond changes, reaction centres, or substrates and products) are distinctive of a particular EC subclass, and we can quantify this as proportions of each chemical attribute across all the reactions present in a given isomerase subclass. Overall, twelve out of thirty bond changes (40%) found in isomerase reactions are distinctive of a particular isomerase EC subclass, and this proportion rises to 79% for reaction centres and 96% for substrates and products. Because the NC-IUBMB manually updates the EC classification by adding, transferring and deleting reactions as new enzyme data become available, the proportion of chemical attributes for each subclass will change over time.

Bond changes

Reactions and bond changes are distributed differently depending on the isomerase subclass (Fig. S2A). EC 5.4 (intramolecular transferases) is the largest subclass with 71 reactions, closely followed by EC 5.1 (racemases and epimerases) and EC 5.3 (intramolecular oxidoreductases) with 58 and 57 reactions, respectively. EC 5.4 has the largest number of bond changes, followed by EC 5.3 and EC 5.5 (intramolecular lyases). The average number of bond changes per isomerase reaction is 7.3. There are several extreme cases with more than 20 bond changes, which correspond to complex reactions belonging to sub-subclass EC 5.4.99 (Fig. S2B). Isomerases from EC 5.1, 5.2 and 5.99 catalyse on average a small number of bond changes (1 to 8), whereas EC 5.3, 5.4 and 5.5 have more bond changes on average (Fig. S2C).
The optimal grouping of isomerase reactions was explored using hierarchical clustering (Reaction curation and similarity section in Materials and methods, main text). First, all-against-all pairwise similarities between isomerase reactions were computed using EC-BLAST on the basis of the number of bond changes (Fig. S2D). Reactions with the same type and number of bond changes are identical (similarity = 1), reactions with some bond changes in common have a similarity between 0 and 1, and reactions with no bond changes in common are different (similarity = 0). Similarities were stored in a matrix. Secondly, a clustering algorithm was applied to the similarity matrix in order to group similar reactions (Fig. S2E). After testing various clustering procedures and scanning which cuts along the trees gave the greatest purity in EC subclasses, the Ward algorithm was selected to produce an optimal number of six chemically sensible clusters (Table S2). Only one cluster is pure with respect to EC subclass: F in EC 5.1. Three clusters are more than 75% pure: A in EC 5.4, B in EC 5.2 and D in EC 5.3. Finally, two clusters are mixed: C (mainly EC 5.3 and 5.4) and E (EC 5.4 and 5.5).

From the subclass point of view, almost all racemase and epimerase reactions (EC 5.1) group in cluster F. The only exception is EC 5.1.1.11, which falls in cluster A and is considered to be misannotated (1) because it shares identical bond changes with five intramolecular transferase reactions (EC 5.4) (see below). Cis-trans isomerase reactions (EC 5.2) are mostly grouped in cluster B. Exceptions are EC 5.2.1.5 (misannotated), 5.2.1.8 and 5.2.1.13 in cluster C. The remaining subclasses are spread across several clusters: intramolecular oxidoreductases (EC 5.3) are present in four clusters: B, C, D and E; intramolecular transferases (EC 5.4) are in four clusters: A, C, D and E; intramolecular lyases