
THE UNIVERSITY OF CHICAGO

MACHINE LEARNING FOR THE GENOTYPE TO PHENOTYPE PROBLEM

A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTORATE OF PHILOSOPHY

DEPARTMENT OF COMPUTER SCIENCE

BY JOHN WILLIAM SANTERRE

CHICAGO, ILLINOIS

DECEMBER 2017

Copyright © 2017 by John William Santerre

All Rights Reserved

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGMENTS

ABSTRACT

INTRODUCTION

I GENOTYPE TO PHENOTYPE: ANTIMICROBIAL RESISTANCE

1 OVERVIEW

2 DETECTING ANTIMICROBIAL RESISTANCE IN THE LAB
  2.1 Biological concepts
  2.2 AMR Protocol

3 ANTIMICROBIAL RESISTANCE AS A SUPERVISED MACHINE LEARNING PROBLEM
  3.1 k-mer Notation
    3.1.1 Classification Using k-mer Matrices
  3.2 Model Selection and Tuning
  3.3 Model Considerations
    3.3.1 Hyperparameter Selection
  3.4 Metrics
    3.4.1 Classification performance
    3.4.2 Feature Importance Calculation
    3.4.3 Gene Region Identification
  3.5 Model Comparison
    3.5.1 Naive Bayes
    3.5.2 AdaBoost
    3.5.3 Logistic Regression
    3.5.4 Support Vector Machines
    3.5.5 Random Forest

4 ANALYSIS
  4.1 Data Sets
    4.1.1 Acinetobacter baumannii
    4.1.2 Staphylococcus aureus
    4.1.3 Mycobacterium tuberculosis
    4.1.4 Klebsiella pneumoniae
  4.2 Classifier Comparison
    4.2.1 Classifier Performance
    4.2.2 Speed
  4.3 Random Forest
    4.3.1 RF subtrees
    4.3.2 RF on subsets of the data
  4.4 Feature Importance Calculation
    4.4.1 Biological relevance
    4.4.2 Feature Importance Stability

5 COMPRESSED MATRIX FORMULATION
  5.1 Compressed Matrix Construction
  5.2 Experiments

A SUPPLEMENTARY MATERIAL
  A.1 AMR: Classification
  A.2 AMR: Gene Stability
  A.3 AMR: Code
  A.4 PGR

LIST OF FIGURES

3.1 Example dataset from which the k-mer matrix in (3.1) is constructed. SUS genomes are labelled 0. RES genomes are labelled 1 to denote the presence of a mutation conferring resistance to a specific antibiotic.
3.2 Schematic of the AMR phenotype classification workflow using k-mers.
3.3 An example RF tree.

4.1 Overview of the A. baumannii dataset. The k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.
4.2 Histogram of the entries in the k-mer matrix for the (a) A. baumannii and (b) S. aureus dataset. Note that increasing the size k can have a dramatic effect on the distribution of entries of the k-mer matrix.
4.3 Overview of the A. baumannii dataset. The k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.
4.4 ROC curves for classifiers trained on each of the A. baumannii and S. aureus datasets (k = 15).
4.5 ROC curves for classifiers trained on each of the M. tuberculosis datasets listed in Table 4.1.
4.6 ROC curves for different classifiers trained on the A. baumannii dataset. Each ROC curve corresponds to a different size k.
4.7 ROC curves for different classifiers trained on the S. aureus dataset. Each ROC curve corresponds to a different size k.
4.8 ROC curves for different classifiers trained on the binary A. baumannii dataset. Each ROC curve corresponds to a different size k.
4.9 ROC curves for different classifiers trained on the binary S. aureus dataset. Each ROC curve corresponds to a different size k.
4.10 Accuracy of different classifiers on the (binary) A. baumannii and S. aureus datasets. Additional performance metrics can be found in the supplementary Tables A.1, A.2, A.3 and A.4 in Appendix A.1.
4.11 Execution times for different classifiers run on the A. baumannii dataset. NB denotes Naive Bayes.
4.12 ROC curves for the RF classifier trained on the K. pneumoniae dataset.
4.13 ROC curves for the RF classifier trained with different numbers of trees on each of the A. baumannii and S. aureus datasets.
4.14 ROC curves for the RF classifier trained on training sets of increasing sizes. In particular, RF was trained on subsets of the M. tuberculosis rifampicin and streptomycin datasets from Table 4.1. The subset size n denotes the total number of isolates subsampled from each dataset in each experiment (with the number of RES and SUS isolates equal to n/2 in each case).
4.15 Plot of the ranking (y axis) of each PEG (x axis) by the aggregate score computed by A1, as described in the text. Additional information about the PEGs can be found in supplementary Tables A.12 and A.11 in Appendix A.2.
4.16 Plot of the ranking (y axis) of each PEG (x axis) by the aggregate score computed by A2, as described in the text. Additional information about the PEGs can be found in supplementary Tables A.13 and A.14 in Appendix A.2.
4.17 k-mer ranking according to the feature importance computed by RF on the full or the compressed k-mer matrix. Collapsing by column ID refers to calculating the feature importance separately on the full k-mer matrix and then summing the feature importance for all columns that have an identical column identity in the k-mer matrix.

5.1 Python pseudocode for constructing a k-mer matrix.
5.2 Pseudocode for constructing a compressed k-mer matrix.
5.3 Overview of the compressed A. baumannii and S. aureus datasets. Similarly to the full k-mer matrix case (see Figures 4.1 and 4.3), the k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.
5.4 Accuracy of different classifiers on the compressed (binary) A. baumannii and S. aureus datasets. Additional performance metrics can be found in the supplementary Tables A.5, A.6, A.7 and A.8 in Appendix A.1.
5.5 ROC curves for different classifiers trained on the compressed A. baumannii dataset. Each ROC curve corresponds to a different size k.
5.6 ROC curves for different classifiers trained on the compressed S. aureus dataset. Each ROC curve corresponds to a different size k.
5.7 ROC curves for different classifiers trained on the compressed binary A. baumannii dataset. Each ROC curve corresponds to a different size k.
5.8 ROC curves for different classifiers trained on the compressed binary S. aureus dataset. Each ROC curve corresponds to a different size k.

LIST OF TABLES

4.1 List of the antibiotics used for the M. tuberculosis isolates. For each antibiotic we list the total number of isolates used for classification with the specific number of resistant and susceptible isolates listed in the fourth and fifth column, respectively. Additional details about the dataset can be found in (Davis et al., 2016). The last column shows the number of features when k = 15.
4.2 List of antibiotics used for the K. pneumoniae isolates. For each antibiotic we list the total number of isolates which have a label (i.e., resistant or susceptible) for that antibiotic. The specific numbers of resistant and susceptible isolates are listed in the fourth and fifth column, respectively.
4.3 Classifier comparison on the A. baumannii dataset (classification of resistance to carbapenem) and the S. aureus dataset (classification of resistance to methicillin).
4.4 Classifier comparison on each of the seven M. tuberculosis datasets listed in Table 4.1.
4.5 Comparison of RF performance on the K. pneumoniae dataset.
4.6 Comparison of RF classifiers trained with different numbers of trees on the A. baumannii dataset (classification of resistance to carbapenem) and the S. aureus dataset (classification of resistance to methicillin).
4.7 Comparison of RF performance on training sets of increasing sizes. In particular, we trained RF on subsets of the M. tuberculosis rifampicin and streptomycin datasets from Table 4.1. The subset size n denotes the total number of isolates subsampled from each dataset in each experiment (with the number of RES and SUS isolates equal to n/2 in each case). All statistics are averaged over three independent runs, in each of which we performed 5-fold cross-validation.
4.8 The top 20 k-mer matrix features computed by training RF on the k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 10 trees. FI denotes feature importance. [Change col index?]
4.9 The top 20 k-mer matrix features computed by training RF on the k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 1000 trees. FI denotes feature importance. [Change col index?]
4.10 The top 20 k-mer matrix features computed by training RF on the compressed k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 10 trees. FI denotes feature importance.
4.11 The top 20 k-mer matrix features computed by training RF on the compressed k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 1000 trees. FI denotes feature importance.

A.1 Performance of different classifiers on the A. baumannii dataset for different k-mer sizes.
A.2 Performance of different classifiers on the S. aureus dataset for different k-mer sizes.
A.3 Performance of different classifiers on the binary A. baumannii dataset for different k-mer sizes.
A.4 Performance of different classifiers on the binary S. aureus dataset for different k-mer sizes.
A.5 Performance of different classifiers on the compressed A. baumannii dataset for different k-mer sizes.
A.6 Performance of different classifiers on the compressed S. aureus dataset for different k-mer sizes.
A.7 Performance of different classifiers on the compressed binary A. baumannii dataset for different k-mer sizes.
A.8 Performance of different classifiers on the compressed binary S. aureus dataset for different k-mer sizes.
A.9 k-mer statistics computed by RF on the k-mer matrix of the M. tuberculosis rifampicin dataset (k = 15). FI denotes the feature importance. The balance denotes the fraction of times each k-mer is associated with a RES label in the dataset. Identical denotes the number of k-mers that have the same column identity as the specified k-mer. Sum of FIs is the sum of the feature importances (computed by RF) of the k-mers whose column identities are identical to that of the specified k-mer.
A.10 k-mer statistics computed by RF on the k-mer matrix of the M. tuberculosis streptomycin dataset (k = 15). FI denotes the feature importance. The balance denotes the fraction of times each k-mer is associated with a RES label in the dataset. Identical denotes the number of k-mers that have the same column identity as the specified k-mer. Sum of FIs is the sum of the feature importances (computed by RF) of the k-mers whose column identities are identical to that of the specified k-mer.
A.11 The top 50 PEGs ranked by the aggregation score of the features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A1 in Section 4.4.1.
A.12 The top 50 PEGs ranked by the aggregation score of the features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A1 in Section 4.4.1.
A.13 The top 50 PEGs ranked by the aggregation score of the top 100 features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A2 in Section 4.4.1.
A.14 The top 50 PEGs ranked by the aggregation score of the top 100 features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis rifampicin dataset. The aggregation score was computed as described in A2 in Section 4.4.1.
A.15 Datasets used in the PGR experiments. Each dataset is named after the growth media used. For each dataset we show the percentage of positive examples (with the remainder being negative) [why are these always 20%] as well as the accuracy of the RF algorithm.
A.16 Table of the top 5 enzymes (with their EC numbers) associated with each of the top 10 k-mers. For each media we show the top 10 k-mers (ranked according to the RF FI score). The second number in each tuple specifies the number of times the specific EC number is associated with the given k-mer in the specified PGR media dataset. [TODO: REDO]
A.17 Table of the top 5 enzymes (with their EC numbers) associated with each of the top 300 k-mers obtained by training RF on each of the PGR media datasets. The second number in each tuple specifies the number of times the specific EC number is associated with any of the 300 k-mers across all isolates.
A.18 Table of the top 10 reactions per PGR media identified by using RF feature importances. [TODO: REDO]

ACKNOWLEDGMENTS

ABSTRACT

This thesis demonstrates the suitability of machine learning for classifying phenotypes from genotype data. First, we analyze the suitability of machine learning techniques for classifying antimicrobial resistance phenotypes. Additionally, we evaluate the stability of identifying DNA regions related to antimicrobial resistance. To speed up and simplify this process, we develop a unique matrix construction method specifically for use on antimicrobial resistance datasets. Furthermore, this thesis considers an alternative phenotypic classification problem, namely predicting the ability of an organism to grow on a particular medium (predicting growth rate), and the structure of the resulting feature space. Finally, we propose an extension of the Random Forest feature importance calculation and show how such an alteration improves the identification of gene regions.

INTRODUCTION

The use of machine learning to augment or replace human judgment has grown at an astounding pace. In the broadest of terms, these techniques look to discern structure from data and are productive when order can be discerned using statistical techniques. The very basis of the genotype to phenotype problem is one of constructing and conserving structure through inheritance and across generations. While it is intuitive to consider strings of DNA as similar to strings of language, the physical process of decoding DNA is in fact radically different. The biological processes that utilize DNA work in a similar fashion to large parallel systems, with individual decodings happening locally and without the longer-range dependencies that can be employed in language. While the actual biological process is local, a particular phenotype may arise from a set of distinct mutations. We consider two different genotype to phenotype problems to illustrate this difference. Antimicrobial resistance (AMR) is typically a function of a small number of individual mutations, while growth of an organism on a substrate is often the result of a collection of mutations that together determine overall fitness.

While making individual adaptations to algorithms can result in better performance, the genotype to phenotype problem is so broad and encompasses so many different prediction problems that fine-tuning algorithms at this stage would be premature optimization. Instead, we first show the performance of a wide collection of machine learning algorithms, demonstrate how they can be used to evaluate unknown data more efficiently, and finally propose alterations that coincide more accurately with our intuition of optimal performance. This thesis considers two types of datasets, one for the detection of AMR and another for predicting growth rate (PGR), and proposes an alteration of the Random Forest algorithm to improve feature selection on biological data.
First, we use a large collection of labeled AMR data to compare multiple machine learning algorithms, thereby demonstrating the suitability of machine learning to provide consistent and productive results. We show that machine learning can provide a substitute for Genome Wide Association Study (GWAS) analysis and that such analysis can be computed at a lower computational cost. The second dataset, which we refer to as the PGR dataset, considers the growth ability of a large number of bacteria on individual growth media. Finally, we propose an extension of the Random Forest feature importance calculation and show how such an alteration improves the identification of gene regions. Some of the main contributions of this thesis include:

• demonstrating the suitability of machine learning as a means for classifying AMR phenotypes,

• evaluating the stability in identifying regions of the DNA that relate to AMR phenotypes,

• developing a unique matrix construction method specifically for use on AMR datasets,

• evaluating the suitability of machine learning for predicting growth rate of an organism,

• evaluating the suitability of machine learning as an alternative to the annotation of gene regions associated with PGR,

• proposing an extension of the Random Forest (RF) feature importance calculation and showing how such an alteration improves the identification of gene regions.

[Include descriptor of rank aggregation section etc.]

Part I

Genotype to Phenotype: Antimicrobial Resistance

CHAPTER 1

OVERVIEW

Antimicrobial resistance (AMR) is a significant health threat to the global community and is widely recognized as such by governments and organizations such as the White House, the Centers for Disease Control and Prevention, the United Nations and the World Health Organization (Centers for Disease Control and Prevention (US); Organization et al., 2014; House, 2015). Despite this recognition, AMR is still a major public health concern. Resistant microbes are typically selected in response to the extensive (mis)use of antimicrobial agents (Laxminarayan et al., 2013). The efforts to develop new broad spectrum antibiotics to stem this epidemic have not been successful (Butler et al., 2013). One region of particular concern is Southeast Asia, which is known for the widespread distribution of counterfeit medications. Perhaps even more important than human consumption is the use of antibiotics worldwide for livestock. Antibiotics are thereby introduced into the food supply, and partially consumed food introduces antibiotics into the water supply through runoff. Thus, it is of utmost importance for researchers and clinicians to correctly identify resistant bacteria and react accordingly. Furthermore, it is equally vital for researchers to determine the mutations conferring resistance so that they can identify novel AMR mechanisms and track epidemics (Wright, 2011).

Despite the cost of monitoring bacteria such as C. difficile, which is among the fastest growing superbugs, there are reasons to believe that the situation will grow worse. Estimates project that AMR diseases will account for more deaths in 2050 than cancer [ref]. While many promising new approaches to the prevention of AMR in bacteria are being developed, it is unclear whether these will outpace the rapid development of AMR. One particular example is the fact that many bacteria are already resistant to the first line antibiotics used as standard by the US military.
The pace of resistance development is surpassing the ability of the medical community to develop new techniques for fighting superbug infections. Accordingly, the US military has done testing to determine the prevalence of AMR bacteria and attempted to alter its first line treatments. The prevalence of AMR strains often renders first line protocols ineffective, resulting in the misapplication of antibiotics, which may delay or prevent positive outcomes.

In a clinical setting, first line antibiotics are typically used for treatment, followed by additional tests for resistance. The standard protocol for testing for AMR involves manual collection of bacterial samples, which are then cultured and exposed to an antibiotic to see whether the sample is resistant. Because this process is extremely time consuming, it is also prohibitively expensive. Researchers have developed rapid assessment techniques such as microassays, which can quickly detect a particular known resistance. These methods typically require single-use devices and carry an associated cost. Physicians would like rapid identification of known AMR in order to provide better care. Beyond the short term clinical view, researchers continue to seek tools that provide insight into the mechanisms of AMR and its effect on the genome to assist in drug discovery. In machine learning terminology, these two goals correspond to classification and feature selection, respectively.

Clinical researchers often need to classify AMR in a known isolate (i.e., a single sample) and to identify AMR bacteria in an environmental sample. In addition, the single nucleotide polymorphisms (SNPs), as well as the associated gene or gene function, that confer resistance often need to be identified. The process needs to be robust to noise and as transparent as possible in order to provide interpretability for the clinician.
Only a tiny percentage of the world's bacteria have been discovered, and only a minuscule fraction of those have been sequenced. This observation has unique statistical inference implications for machine learning classification tasks: while the amount of analysis that is currently possible might not be large, machine learning also provides a new avenue for expanding our understanding of AMR development. More broadly, machine learning tools offer the opportunity to quantify a vast collection of biological data in a radically different way.

While standard applied machine learning techniques offer direct utility for clinicians or biologists, the nuances of working with biological data require additional sophistication not offered by standard tools and techniques. AMR classification can be challenging due to the variety of protocols that researchers use when collecting and analyzing samples. This creates an opportunity for unique analyses that are tailored to a particular protocol. In contrast to traditional machine learning datasets, which are typically "clean", AMR data is often collected across multiple studies and, as a result, is typically noisy.

We analyze a standard protocol for identifying how bacteria will respond to an antimicrobial agent. It is intended to simulate in the lab the speed and type of resistance that a type of bacteria will develop if the agent is used environmentally. The typical scenario we consider is intended to vary the samples' exposure to the antimicrobial agent, while allowing for equivalent genetic drift within each set of isolates and for any mutation that might result from the growth medium or other environmental factors. By comparing the genetic code of the resistant population (denoted RES) with that of the susceptible population (denoted SUS), the goal is to discern metagenomics or other stable genetic differences between isolates that confer the particular phenotype.
The naive solution for comparing multiple isolates is to simply align the genomes of different RES and SUS isolates and look for insertions, deletions or transpositions that can distinguish between the two groups. In short read sequencing one cannot simply "line up" the DNA strands starting with the first base pairs; rather, this requires the global alignment of contigs between isolates. It is the lack of a clear "zero index" for short read DNA that makes DNA alignment challenging. Often the strands are aligned against a reference genome, that is, a genome that has either been generated with whole genome sequencing (WGS) or has been so extensively studied using short read technology that its content is well established. While computationally expensive, this process, called a Genome Wide Association Study (GWAS), is the current state-of-the-art technique and yields compelling and easily interpretable results. This process relies upon maximal matching for a gappy overlap.
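As a minimal sketch of the naive comparison described above, assume the isolates have already been aligned to equal length (which, as noted, is the hard part in practice). The Python function below, whose name `discriminating_positions` is hypothetical, reports the positions at which no base is shared between the SUS and RES groups. The toy sequences are those of the example dataset in Figure 3.1.

```python
def discriminating_positions(sus, res):
    """Return 0-based positions where the SUS and RES groups share no base.

    Assumes every sequence is already aligned and of equal length; this is
    an illustrative simplification, not the alignment procedure itself.
    """
    length = len(sus[0])
    if any(len(genome) != length for genome in sus + res):
        raise ValueError("sequences must be pre-aligned to equal length")

    positions = []
    for i in range(length):
        sus_bases = {genome[i] for genome in sus}
        res_bases = {genome[i] for genome in res}
        # A position separates the groups only if no base occurs in both.
        if sus_bases.isdisjoint(res_bases):
            positions.append(i)
    return positions


# Toy data from the example dataset (labels 0 = SUS, 1 = RES).
SUS = ["ATGCATGC", "ATGCATGC", "ATGGATGC"]
RES = ["ATGCCTGC", "ATGCCTCC", "ATGCGTGC"]

print(discriminating_positions(SUS, RES))  # [4]
```

Only position 4 separates the two groups in this toy example; the other variable positions (3 and 6) carry a base that appears in both groups. Insertions and deletions would shift the coordinates and break this position-by-position comparison, which is why real data requires alignment first.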

CHAPTER 2

DETECTING ANTIMICROBIAL RESISTANCE IN THE LAB

2.1 Biological concepts

Bacterial DNA consists of two strands, each of which comprises approximately 4 million nitrogenous bases drawn from the following set: adenine (A), thymine (T), cytosine (C) and guanine (G). Each strand is the reverse complement of the other, with adenine bonding with thymine and cytosine bonding with guanine. For example, the reverse complement of the string AGTC is GACT. The genetic structure of an organism is referred to as its genotype and the observable characteristics of an organism as its phenotype. AMR is an example of a phenotype that is generated by changes to the bacterial genotype. In other words, antimicrobial resistance is conferred by changes in the underlying genetic code. Often the changes are related to the so-called housekeeping genes, i.e., genes which modulate basic cellular functions. This can involve shutting down promoter regions of genes or alterations to the coding region of the gene. In some cases, single nucleotide polymorphisms (SNPs), also known as point mutations, can confer phenotypical changes. In particular, this means that a single change in a DNA strand consisting of 4 million base pairs can result in a completely different phenotype. Below we define concepts which we will often use in the rest of this thesis.

1. Genome Wide Association Study (GWAS). Analysis of variation across the genome to determine whether the variations explain a trait. Typically, this is done by globally aligning the genomes against a reference genome.

2. k-mer. A substring of length k contained in a DNA read. The k-mers of a read are all of its substrings of length k.

3. Read. The nucleotide sequence (obtained by sequencing) of a fragment of DNA.

4. Contig. A continuous region of DNA that is generated through an assembly process from genome reads.

5. Single nucleotide polymorphism (SNP, pronounced "snip"). A change in a single nucleotide that occurs at a specific position in the DNA.

6. Isolate (also sometimes referred to as a sample). A single homogeneous colony of bacteria whose genetic material is aggregated into an average description of the colony.

7. Reverse Complement. The sequence obtained by reversing a DNA string and substituting the bonding partner of each nucleotide. Due to DNA's binding structure, the reverse complement of ATGC is GCAT.

8. Housekeeping Genes. Typically referring to continually expressed genes, housekeeping genes regulate basic cellular function. In particular, AMR is known to interfere with housekeeping genes related to cell wall transport.

9. Promoter Regions. Regions of the DNA that serve to initiate the transcription of an individual gene. Mutations in promoter regions can inhibit transcription, thereby conferring resistance.

10. Point Mutation. A single insertion, deletion or substitution of a base pair.

11. Protocol. The term is used here to denote a methodology for collection and early stage analysis. The specific protocol used in practice is important for understanding the underlying assumptions of the resulting dataset.

12. Metagenomics. Generated from environmental samples, metagenomic analysis considers diverse and mixed colonies of biological material.

13. Minimum Inhibitory Concentration (MIC). The minimal concentration of an antibiotic required to prevent growth of a bacterium.

14. Growth Medium. The material used for growing a bacterial colony. By carefully controlling the choice of medium, one can alter the expression of particular bacterial genes.

2.2 AMR Protocol

Below we describe the basic steps of the standard protocol for generating samples of AMR bacteria for analysis. There is a mix of different approaches, but the major steps outlined below are common to each of them.

1. Collection: The process of collection can encompass a variety of different protocols. In a clinical setting where a unique patient is involved, a direct sample is often collected. At other times, a regional collection of multiple patients' data is gathered. Recently, researchers have begun taking broad environmental samples. Such metagenomics sampling encompasses a variety of genetic material and poses vastly different inference problems.

2. Culturing: Culturing typically involves separating the sample into one or, often, multiple different collections and then growing each on a distinct growth plate (petri dish). Since the growth medium (food to nourish and substrate to hold the bacteria) can affect the genetic components of the material, the most general growth media are often chosen.

3. Exposure: With a sufficient number of separate collections of material, half of the isolates receive a low dose of the antimicrobial agent, while the other half are left to continue growing. The dosage of the antimicrobial agent is targeted to be below the Minimum Inhibitory Concentration (MIC), the lowest concentration at which the antibiotic prevents growth of the bacteria. Since some bacteria survive the application of the antibiotic, those left behind experience selective genetic pressure to adapt to the antibiotic and develop a resistance phenotype. Those bacteria that survive and continue to grow after being exposed to the antimicrobial agent are referred to as the resistant phenotype (RES), while the unexposed isolates are still susceptible to the antibiotic (SUS).

4. Sequencing: Sequencing is the physical process of converting the clustered isolates into manageable regions of DNA and then digitizing these regions. While WGS dominated early advancements, short read technology has dramatically increased the scale and scope of this research. In short read technology, each isolate (a colony of bacteria) is sequenced into short (25 to 150 base pair) strings, or reads. Conceptually akin to freezing the DNA structure and shattering it, the reads are the most basic form of digitized data. Short read technologies are inherently noisy and cannot map to the entire genome, as demonstrated in the next step.

5. Assembly: Assembly is the process of building the longest possible strings out of the aggregation of reads. The assembled regions, which are known as contigs, represent the greatest lengths of aggregated DNA given the genome and read technology. While WGS offers the generation of an entire DNA sequence, short read technology is inherently limited by repeated regions. Repeated regions longer than the length of the reads cannot be assembled.
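Two of the concepts defined in Section 2.1, the reverse complement and the k-mer, apply directly to the reads and contigs produced by the sequencing and assembly steps. The following minimal Python sketch illustrates both definitions; the function names are hypothetical and the code assumes sequences contain only the characters A, C, G and T.

```python
# Illustrative helpers only; names are assumptions, not part of any library.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Substitute the bonding partner of each base, then reverse the string."""
    return sequence.translate(COMPLEMENT)[::-1]


def kmers(read, k):
    """Return every substring of length k in the read, in order of position."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


print(reverse_complement("AGTC"))  # GACT, the example from Section 2.1
print(kmers("ATGCATGC", 3))
```

A read of length n yields n - k + 1 k-mers when n ≥ k, so the 8-base string above yields six 3-mers, two of which (ATG and TGC) occur twice.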

CHAPTER 3

ANTIMICROBIAL RESISTANCE AS A SUPERVISED MACHINE LEARNING PROBLEM

Understanding the differences among hundreds or thousands of genomes, all of varying lengths, creates a fundamental computational problem for biologists. The current standard practice for identifying the loci associated with a given phenotype is a Genome Wide Association Study (GWAS) (Chewapreecha et al., 2014). This approach detects genetic variation by identifying statistically significant differences between strains. GWAS excels at finding SNPs because it effectively returns a P value for the association of an individual locus with the change in phenotype. While GWAS has proven to be a powerful tool for analysis, it requires alignment against a well-curated reference genome. [The standard analysis technique in GWAS is to derive a P value associated with the likelihood that an individual SNP is related to the difference in phenotypes. In the simplest case, the phase transition in feature importance values provides clear evidence that there are distinct differences in the population of the top k-mers. This approach reflects the broader statistical trend to move away from rigid statistical tests and, rather, to favor more complex analysis that is less likely to produce false positives. While individual k-mers are powerful, we believe that clustering on gene locations offers the strongest evidence of the natural separation between low feature importance results, which are more likely a function of statistical noise, and the much more significant values. A more interesting approach would be to incorporate a noise model that would account for random mutations. This might be accomplished by theoretical projections learned from biology or, more interestingly, from inspection of the k-mer matrix and its alterations as k is altered.]

Machine learning algorithms are an appealing alternative to GWAS because they can be used both to classify phenotypes and to identify loci of interest without the expectation of fully-assembled genomes.
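To illustrate why no alignment is needed, the following minimal Python sketch counts k-mers directly from each sequence and arranges the counts in a matrix with one row per isolate and one column per distinct k-mer. The function names, the use of raw counts, and the choice k = 3 are assumptions made for illustration only; this is not the construction used in the thesis, which is presented in later chapters.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring of a sequence (no alignment required)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_matrix(genomes, k):
    """Return (columns, matrix): sorted distinct k-mers and per-genome counts."""
    counts = [kmer_counts(genome, k) for genome in genomes]
    columns = sorted(set().union(*counts))
    # Counter returns 0 for k-mers that a genome does not contain.
    matrix = [[count[kmer] for kmer in columns] for count in counts]
    return columns, matrix


columns, matrix = kmer_matrix(["ATGCATGC", "ATGCCTGC"], k=3)
print(columns)
for row in matrix:
    print(row)
```

Each row can then be paired with the isolate's RES or SUS label, which turns the comparison of genomes into a standard supervised learning problem over fixed-length feature vectors.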
Furthermore, machine learning has the potential to be less computationally expensive and does not require aligning reads against a reference genome. Previous work using machine learning algorithms for detecting AMR has shown promise. Researchers have successfully used algorithms as diverse as Set Covering Machines and AdaBoost, as well as computational tools such as de Bruijn graph representations (Davis et al., 2016). Additionally, previous research in computational biology has considered Random Forest for other problems, but to our knowledge no one has explored the use of k-mer space and Random Forest for the de novo identification of SNPs or gene regions associated with resistance (see Figure 3.2 for a chart of the phenotype classification workflow using k-mers). For instance, the Mykrobe predictor software package (Bradley et al., 2015) outlines a system robust to mixed collections of species in a sample, but utilizes prior knowledge of resistant and susceptible alleles. Alternatively, the work regarding Set Covering Machines imposes the constraint of minimalism: the optimal set of "rules" that identify resistance is the minimal set, an assertion that may not be biologically accurate. We mention these to highlight two points. The first is that the problem of AMR appears to lend itself to fairly straightforward analysis. This encourages the application of a variety of approaches, the majority of which are successful, a fact not widely recognized in the biological community. The second is that this type of work has the exciting potential not only to identify a particular region with some degree of prior knowledge, but also to provide true de novo analysis, thereby replacing GWAS at least in cases where a known reference strain exists.

The impact that machine learning has had on a variety of fields is hard to overstate.
From highly structured problems to general inferences, machine learning has provided unique tools that have changed the way individuals interact with devices and understand their surroundings. Advances in computational hardware have provided opportunities for various historically unexplored statistical principles to be fully leveraged at scale. The trio of skills (software engineering, rigorous statistics and in-depth domain knowledge) has yielded considerable results in a variety of fields.

Isolate   Genome     Label
i1        ATGCATGC   0
i2        ATGCATGC   0
i3        ATGGATGC   0
i4        ATGCCTGC   1
i5        ATGCCTCC   1
i6        ATGCGTGC   1

Figure 3.1: Example dataset from which the k-mer matrix in (3.1) is constructed. SUS genomes are labelled 0. RES genomes are labelled 1 to denote the presence of a mutation conferring resistance to a specific antibiotic.

In many ways, the problem of AMR would benefit from machine learning tools. Given that our data is clearly labeled and balanced between positive and negative examples, a whole host of supervised learning techniques would seem amenable. Until the recent explosion in deep learning, supervised learning techniques were at the heart of work by domain-agnostic machine learning experts, and extensive theoretical work had given good insight into the performance of the algorithms in a variety of different scenarios. Additionally, extensive use in real-world settings has established general first steps that yield strong results.

3.1 k-mer Notation

To leverage this extensive work, all that remains is to determine a natural and clear way of representing features to describe the two sets of isolates. For this purpose, our analysis uses k-mers as the principal feature for the isolates. k-mers are short overlapping substrings of DNA that typically vary in length between 10 and 100 base pairs, where k indicates the choice of substring length. Let $I = \{i_1, i_2, i_3, \ldots, i_m\}$ denote the set of isolates/samples (typically, m is in the range 50 to 5,000). For each isolate we find all the substrings of length k and combine them into a set of unique overlapping k-mers $K = \{k_1, k_2, k_3, \ldots, k_n\}$. We construct an $m \times n$ matrix, called a k-mer matrix, in which each row corresponds to an individual isolate, while each column corresponds to a unique k-mer. Each entry $M_{i,j}$ represents the number of times the j-th k-mer appears in the i-th isolate.

For example, consider the data set in Figure 3.1. The SUS population is the set $\{i_1, i_2, i_3\}$, while the RES population comprises the set $\{i_4, i_5, i_6\}$. For k = 3 the actual k-mer set is {ATG, CAT, TGC, GCA, GGA, TCC, GAT, GCC, CCT, CTG, CGT, CTC, GCG, GTG, TGG}, of which we store only the lexicographically lower of each string and its reverse complement. Thus, for example, CAT is stored as ATG (the nucleotide complement of CAT is GTA, which reversed gives ATG, and ATG is lexicographically lower than CAT), while TGC is stored as GCA. As a result the final k-mer set that we use for construction of the k-mer matrix is K = {ACG, AGG, ATC, ATG, CAG, CCA, CGC, CTC, GCA, GCC, GGA, CAC} with |K| = 12. The resulting k-mer matrix $M \in \mathbb{R}^{6 \times 12}$ has the form

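$$
M =
\begin{array}{c|cccccccccccc}
    & \mathrm{ACG} & \mathrm{AGG} & \mathrm{ATC} & \mathrm{ATG} & \mathrm{CAG} & \mathrm{CCA} & \mathrm{CGC} & \mathrm{CTC} & \mathrm{GCA} & \mathrm{GCC} & \mathrm{GGA} & \mathrm{CAC} \\
\hline
i_1 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\
i_2 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\
i_3 & 0 & 0 & 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 \\
i_4 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 2 & 1 & 0 & 0 \\
i_5 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\
i_6 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 2 & 0 & 0 & 1
\end{array}
\tag{3.1}
$$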

where the column groups (ATC, CCA) and (ACG, CGC, CAC) each consist of identical columns. An alternative way of defining a k-mer matrix is in terms of the presence/absence of a given k-mer in each isolate, rather than in terms of the k-mer counts in each isolate. We call this type of matrix a binary k-mer matrix; it can be easily obtained by replacing the positive integer-valued entries of M, as defined above, with ones. In other words, the (i, j)-th entry of the binary k-mer matrix $M^{bin}$ is set to 1 if $M_{i,j} > 0$; if $M_{i,j} = 0$, then $M^{bin}_{i,j} = 0$.
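The construction described above (overlapping substrings, reverse-complement canonicalization, a count matrix and its binary version) can be sketched in a few lines of Python. This is a minimal illustration on the Figure 3.1 data, not the pipeline used in the thesis; the function names `canonical`, `kmer_counts`, `kmer_matrix` and `binarize` are hypothetical, and the columns are sorted alphabetically, so CAC falls between ATG and CAG rather than last as in (3.1).

```python
from collections import Counter

# Translation table for the nucleotide complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically lower of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(genome, k):
    """Count canonical k-mers over all overlapping windows of one genome."""
    return Counter(canonical(genome[s:s + k]) for s in range(len(genome) - k + 1))


def kmer_matrix(genomes, k):
    """Build the count matrix: one row per isolate, one column per canonical k-mer."""
    counts = [kmer_counts(genome, k) for genome in genomes]
    columns = sorted(set().union(*counts))  # alphabetical column order (an assumption)
    matrix = [[count[kmer] for kmer in columns] for count in counts]
    return columns, matrix


def binarize(matrix):
    """Replace every positive count with 1 to obtain the presence/absence matrix."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


# Isolates i1..i6 from Figure 3.1.
genomes = ["ATGCATGC", "ATGCATGC", "ATGGATGC", "ATGCCTGC", "ATGCCTCC", "ATGCGTGC"]
columns, M = kmer_matrix(genomes, k=3)
M_bin = binarize(M)
```

Looking up a column by name, for instance `M[0][columns.index("ATG")]`, gives the count of ATG (including occurrences of its reverse complement CAT) in isolate i1.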

We can further compress a k-mer matrix M or a binary k-mer matrix $M^{bin}$ by collapsing the identical columns in it. For example, we can compress the sample matrix in (3.1) by remapping the column tuples (ATC, CCA) ↦ 1 and (ACG, CGC, CAC) ↦ 2 and forming the matrix

$$
M^{compress} =
\begin{array}{c|ccccccccc}
    & 2 & \mathrm{AGG} & 1 & \mathrm{ATG} & \mathrm{CAG} & \mathrm{CTC} & \mathrm{GCA} & \mathrm{GCC} & \mathrm{GGA} \\
\hline
i_1 & 0 & 0 & 0 & 3 & 0 & 0 & 3 & 0 & 0 \\
i_2 & 0 & 0 & 0 & 3 & 0 & 0 & 3 & 0 & 0 \\
i_3 & 0 & 0 & 1 & 2 & 0 & 0 & 1 & 0 & 1 \\
i_4 & 0 & 1 & 0 & 1 & 1 & 0 & 2 & 1 & 0 \\
i_5 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\
i_6 & 1 & 0 & 0 & 1 & 0 & 0 & 2 & 0 & 0
\end{array}
\tag{3.2}
$$

in which each remapped group of identical columns has been collapsed into a single column. (The columns AGG and GCC in (3.2) are also identical and could be merged in the same way.) Note that both (3.1) and (3.2) are so-called "short and fat" matrices, i.e., the number of columns is much larger than the number of rows.

Due to the repetitious nature of encoding data in terms of k-mers, k-mer matrices, such as the one in (3.1), may have many columns that are exact duplicates of each other. We define all such features as sharing a column identity: the column identity is the unique column vector (of a k-mer matrix) associated with a given k-mer. A column identity is typically used to identify k-mers that have exactly the same column in an uncompressed k-mer matrix. This definition will prove useful when we consider the importances of individual k-mers, all k-mers that share a column identity, and all k-mers that can be mapped back to a unique gene region on a reference genome. We will also often refer to certain columns of a k-mer matrix in terms of their indices: the column index of a given k-mer is the index of its corresponding column in a full (uncompressed) or compressed k-mer matrix. For example, in the k-mer matrix defined in (3.1), counting columns from zero, the k-mers ATC (whose column index is 2) and CCA (whose column index is 5) share the column identity $[0, 0, 1, 0, 0, 0]^\top$.

Compressing a k-mer matrix amounts to looking for a lower dimensional projection such that only a minimal amount of the information contained in the matrix is lost. This process can be motivated by the presence of features that lack predictive power (i.e., features that are evenly distributed across label classes).
Features may also be strongly correlated with other variables (i.e., oversampling or sampling a hidden trait through multiple known features), or the compression may even explicitly project onto a lower dimensional but more complicated space (i.e., the function that maps features to a new space is computed by combining values across features). All these approaches share the assumption that the original sample space is ill-formed or redundant.

In contrast to the alignment of individual contigs, which creates a consistent index across isolates, this approach allows us to work index free. While the total number of possible k-mers is staggeringly large in theory, in practice isolates exist on a much smaller subspace. This subspace is a function of both the choice of k-mer size and the number of isolates to be considered. It is clear why k-mer size affects the number of features, as there are $4^k$ unique k-mers to choose from. While there are $4^k$ unique strings, the actual space that we must consider is even smaller. This is due to the fact that the direction of the assembled contigs is not known. During the reading process, it is possible that the reads represent either the right-hand or the left-hand strand of the DNA double helix. A DNA strand and its reverse complement join to form a complete double helix. For instance, the reverse complement of ATGC is GCAT. Unfortunately, read technology cannot distinguish between the left-hand and right-hand strands of DNA. For this reason, only the lexicographically lower value is stored. Thus, any occurrences of ATGC and GCAT are both counted under the ATGC column.

Additionally, the number of k-mers increases as a function of the number of unique isolates considered. This is due to noise or small mutations that lead to k-mer matrices in which many columns contain only a single nonzero entry. This directly affects the sparsity of the underlying k-mer matrix.
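The collapsing of identical columns by column identity, described above, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the matrix is the toy example (3.1) typed in by hand with its original column order. Because the sketch groups every set of identical columns, it also merges AGG with GCC, and so yields eight distinct column identities.

```python
def group_by_column_identity(columns, matrix):
    """Map each distinct column vector (as a tuple) to the k-mers that share it."""
    groups = {}
    for j, kmer in enumerate(columns):
        identity = tuple(row[j] for row in matrix)
        groups.setdefault(identity, []).append(kmer)
    return groups


def compressed_matrix(groups, n_rows):
    """Rebuild a matrix with exactly one column per distinct column identity."""
    identities = list(groups)  # dicts preserve first-seen order
    return [[identity[i] for identity in identities] for i in range(n_rows)]


# The k-mer matrix (3.1), rows i1..i6, in the column order used in the text.
columns = ["ACG", "AGG", "ATC", "ATG", "CAG", "CCA",
           "CGC", "CTC", "GCA", "GCC", "GGA", "CAC"]
M = [
    [0, 0, 0, 3, 0, 0, 0, 0, 3, 0, 0, 0],
    [0, 0, 0, 3, 0, 0, 0, 0, 3, 0, 0, 0],
    [0, 0, 1, 2, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 2, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1],
]

groups = group_by_column_identity(columns, M)
M_compress = compressed_matrix(groups, n_rows=len(M))
```

Keeping the `groups` dictionary alongside the compressed matrix preserves the mapping from each column identity back to all k-mers that share it, which is what later allows importances to be reported per k-mer or per gene region.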
In a particularly extreme case, only the combined percentage of G's and C's is compared to that of A's and T's. Called the GC count, this is used in practice for rapid identification of broad types of data (e.g., plant vs. animal classification). In the opposite extreme, using long k-mers yields a nearly square diagonal matrix where each k-mer (i.e., one isolate's unique DNA sequence) identifies only its single corresponding isolate. In practice, we note that this often means that the size of our matrix expands with k in the range of 10 to 14. Since a typical bacterial genome length is $4 \times 10^6$, each 10-mer represents 1/400,000 of the total length. In practice, it is helpful to think of starting with approximately 4 orders of magnitude more features than rows.

[Figure 3.2 panels: K-mer Counts for RES Genomes; K-mer Counts for SUS Genomes; Matrix of Merged K-mer Counts; Machine Learning; Relevant K-mers, "The Classifier"; Unclassified Genome; Classified Genome]

Figure 3.2: Schematic of the AMR phenotype classification workflow using k-mers. Source: Davis et al. (2016).
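The counting argument in this section (there are $4^k$ possible k-mers, fewer once reverse complements are merged, set against a genome of roughly $4 \times 10^6$ bases) can be made concrete with a short calculation. The closed form for the number of canonical k-mers is a standard combinatorial fact added here for illustration and is not stated in the text: for odd k no k-mer equals its own reverse complement, while for even k exactly $4^{k/2}$ do. The function names are hypothetical.

```python
def possible_kmers(k):
    """Number of distinct strings of length k over the alphabet {A, C, G, T}."""
    return 4 ** k


def possible_canonical_kmers(k):
    """Number of distinct k-mers after merging each k-mer with its reverse complement."""
    if k % 2 == 1:
        # Odd k: no k-mer is its own reverse complement, so k-mers pair up exactly.
        return 4 ** k // 2
    # Even k: 4**(k/2) k-mers are their own reverse complement and stay unpaired.
    return (4 ** k + 4 ** (k // 2)) // 2


def windows_in_genome(genome_length, k):
    """Number of overlapping k-mer windows in a linear sequence of the given length."""
    return genome_length - k + 1


genome_length = 4_000_000  # typical bacterial genome length quoted in the text
for k in (10, 12, 14):
    print(k, possible_kmers(k), possible_canonical_kmers(k),
          windows_in_genome(genome_length, k))
```

For k = 10 the number of possible canonical k-mers is already of the same order as the number of windows in a single genome, which is consistent with the matrix growing quickly for k between 10 and 14.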

3.1.1 Classification Using k-mer Matrices

Conceptually, we would like each k-mer to represent a single location on the genome. This corresponds to a choice of k-mer length such that the vast majority of entries in the matrix are 0 or 1. This would allow us to evaluate individual changes in the genome. Additionally, it is useful to find k-mers that occur only in a single isolate. Such k-mers are not conserved across isolates and are more likely to be random noise created by a genetic or mechanical process. Having each k-mer represent a single location is not always possible, as it is typical to find entire regions of the genome repeated multiple times. Gene duplication is a common event in DNA and often produces regions that allow for greater mutation rates. This means we can either choose a k at which sparsity is just beginning to grow in the matrix, or a k at which the matrix size is beginning to taper off. Since these choices can yield matrices of drastically different sizes, the decision is partially computational and partially based on its relationship to the accuracy of the classifier.

Since we begin with no prior expectation of how resistance will be generated, it is possible that a single SNP would generate the difference between our two populations. If it were generated without noise, this SNP would be duplicated in k columns, because each position occurs in k different features. In the simplest case, k features will have all ones in one label and all zeros in the other. Such a matrix would be convertible to a block diagonal matrix. In practice, however, we find that it is not so clean. Noise is ever present in the AMR data sets, and mutations typically co-occur on the same gene. There could be several reasons for this, including the possibility that once the first mutation effectively disrupts the gene function, the gene is more amenable to further mutations.

The region that uniquely identifies the resistant phenotype is known as a genetic fingerprint. Genetic fingerprints are particularly useful in the analysis of large, diverse environmental samples. They serve as early warning signs that an environment may contain a particular phenotype, but cannot prove definitively that the phenotype exists. Even more commonly, a collection of such fingerprints is identified and attached to an assay, a device that can be covered with a metagenomic sample so that the fingerprint binds to the device. A DNA assay is akin to a biological alarm system. A biological sample is obtained and washed across the assay; should a sample corresponding to the fingerprint be present, it will bind to the assay and notify the user. Typically, such systems use fingerprints longer than the typical k-mer length (usually of length 100) to prevent false positives. While fingerprints offer compelling evidence of the importance of a set of k-mers, they are not without concerns. From a statistical viewpoint, using them without further analysis leaves one prone to overfitting; from a biological viewpoint, they could potentially reflect a co-occurring, unrelated genotypic modification, a very real possibility given some protocol designs. While fingerprints are powerful for first analysis, we cannot take for granted that we will always be analyzing a small number of mutations that are perfectly conserved. It is entirely possible that any number of single disruptions to a gene could be significant for a change in phenotype.

Classification accuracy is an important metric for understanding whether ML classifiers are correctly labeling unseen data sets. However, most important to the biologist is the identification of regions in the genome that are key for conferring various phenotypes. This analysis can take the form of identifying the exact location of point mutations, identifying genes, or identifying gene functions. The variety of options provides ample flexibility in analysis.
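The earlier observation in this section, that a noise-free SNP shows up in k different features because each position is covered by k overlapping windows, can be checked on the toy data of Figure 3.1. The sketch below is illustrative only; `windows_covering` is a hypothetical helper, and it works on raw (non-canonical) k-mers for readability.

```python
def windows_covering(sequence, position, k):
    """Return the k-mers whose window contains the given 0-based position."""
    first_start = max(0, position - k + 1)
    last_start = min(position, len(sequence) - k)
    return [sequence[s:s + k] for s in range(first_start, last_start + 1)]


susceptible = "ATGCATGC"  # isolate i1 in Figure 3.1
resistant = "ATGCCTGC"    # isolate i4: differs from i1 only at position 4 (A -> C)
k = 3
snp_position = 4

before = windows_covering(susceptible, snp_position, k)
after = windows_covering(resistant, snp_position, k)
```

Here `before` holds the three windows GCA, CAT and ATG, and `after` holds GCC, CCT and CTG: one substitution changes exactly k = 3 overlapping k-mers. Near the ends of a sequence fewer than k windows cover a position, which the `max` and `min` bounds account for.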

3.2 Model Selection and Tuning

Machine learning is still a relatively new field. Since its inception, there have been several productive waves of research, each striving to solve diverse problems that were previously unanswered. As such, multiple different paradigms have emerged as reasonable first choices for practitioners. Current state-of-the-art algorithms provide efficient ways of managing lack of independence between features and identifying when combinations of features are more effective than the sum of their parts. As recently as five years ago, these problems were often tackled by feature engineering (the practice of finding or constructing features that facilitate increased classification accuracy), where new features are explicitly constructed from previous ones. Additionally, it was not uncommon to generate a large volume of new features that only represented combinations of existing features, or to apply functions to the data to evaluate nonlinear relationships. Advances in machine learning have produced considerably more powerful algorithms. These algorithms account for the nonlinearity between features and the model rather than requiring explicit construction of nonlinear features. Deep learning techniques, which accommodate a tremendous number of nonlinear relationships between variables, are the current industry standard for capturing such nonlinearity. Here we will consider a variety of historically important and practical machine learning models and compare their results. The analysis is shaped by the following choices:

1. Protocol. The protocol is defined by biologists and is generally outside the scope of the machine learning practitioner, yet it has a dramatic effect on what type of analysis is performed.

2. Hyperparameters. While there are many different types of hyperparameters, in our case we consider how the choice of hyperparameters effectively embeds our data in different spaces.

3. Algorithm. Algorithm selection is the topic most familiar to machine learning practitioners, and while it has an effect on accuracy and performance, we will see that many algorithms give satisfactory results.

4. Algorithm parameters. The proper choice of algorithm parameters can have dramatic effects on the overall performance of the machine learning algorithm. The optimal choice is in fact often hard to determine; however, satisfactory results are possible with minimal tuning.

5. Metrics of evaluation. There are many different options for evaluating the effect of the choices above. In particular, we will consider how these decisions affect interpretability for the practitioner.

3.3 Model Considerations

Machine learning research has spent considerable effort finding tools and techniques that are both robust and generalizable, i.e., they perform well on new unseen data, and do so on a variety of different types of data problems. A great number of algorithmic approaches have been developed over the years, arising out of radically different communities and appearing to make very different assumptions about the optimal approach to analysis. A great deal of the research in statistical machine learning centers around the use of typically fixed data structures (a tall and skinny, short and fat, or square matrix), and the exploitation of some partially unique structure inherent in the data considered. However, at least equally important, and often more so, is the choice of how to formulate the mapping of data into the matrix. The choice of how to formulate the problem is becoming more and more important as expertise in machine learning grows to the point that a clear workflow is established to yield strong results. This has led to a great deal of speculation in the industrial deep learning community: many speculate that soon, what will distinguish machine learning results will simply be data set depth and quality, and much of what we currently consider machine learning research will be largely set in stone, with best practices generally agreed upon. In broad strokes, this opinion has merit, and in fact was the partial conclusion of applied machine learning practitioners prior to the rise of deep learning; with this in mind, general skepticism should remain despite the fantastic success that deep learning has presented. Below we discuss hyperparameter selection and the metrics of evaluation which we will use in the rest of the thesis.

3.3.1 Hyperparameter Selection

Hyperparameter selection is the process of identifying parameters that exist outside of the machine learning algorithm but may influence that algorithm's ability to accurately produce results. Hyperparameters can have a significant impact on the efficacy of the machine learning algorithm, making this a critical space to explore. Often, a grid search over hyperparameters and parameter settings is employed to find the optimal algorithm, which amounts to searching the combined space of hundreds or thousands of possible models. In addition to being time consuming and posing problems of overfitting, such a workflow also presents problems in explanation, as considering any one of the attributes requires fixing the others. This is particularly unintuitive when discussing hyperparameters because the optimal choice reflects decisions that are made further into the analysis, often after much more experimentation. That said, we will try to ground our consideration in analysis that does not rely upon later work; at times, however, this may be unavoidable. The primary hyperparameter we will consider is the k-mer length selected during the preprocessing of the raw data.

k-mer Size Selection

The choice of the k-mer size to work with is the hyperparameter with the largest and most direct effect on the algorithm. KMC 2 (Deorowicz et al., 2015), for instance, allows for a choice of k ≥ 10. We show how changing the k-mer size affects the size of the matrix; the resulting matrix size poses significant hurdles for space constraints. Below, however, we show that when the matrix is shrunk by collapsing the k-mers by column identity, the number of columns increases much more slowly. One advantage of using column identity on a very large k-mer matrix is that one may represent the matrix as a series of Boolean objects (i.e., identity matrix) with minimal loss of information. Additionally, in such a matrix the number of singletons (i.e., k-mers that occur only once across all isolates) is very high; in practice it is reasonable to throw such singletons away, as their perceived value is low (i.e., they do not represent conserved traits across all RES or SUS populations).
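Discarding singletons, as suggested above, amounts to a simple column filter. The sketch below assumes that a singleton is a k-mer present in exactly one isolate (one reading of "occur only once across all isolates"); the function name and the small column dictionary, taken from four columns of (3.1), are for illustration only.

```python
def drop_singletons(columns):
    """Keep only k-mers present in more than one isolate.

    `columns` maps each k-mer to its list of per-isolate counts.
    """
    kept = {}
    for kmer, counts in columns.items():
        isolates_with_kmer = sum(1 for value in counts if value > 0)
        if isolates_with_kmer > 1:
            kept[kmer] = counts
    return kept


# Four columns of the toy matrix (3.1), rows i1..i6.
toy_columns = {
    "ATG": [3, 3, 2, 1, 1, 1],
    "AGG": [0, 0, 0, 1, 1, 0],
    "CAG": [0, 0, 0, 1, 0, 0],  # present only in i4: a singleton
    "CTC": [0, 0, 0, 0, 1, 0],  # present only in i5: a singleton
}
filtered = drop_singletons(toy_columns)
```

On real data one would apply this before building a dense matrix, since singleton columns can dominate the column count at larger k.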

3.4 Metrics

We consider several different metrics for measuring the performance of ML classifiers, which we discuss in more detail below.

3.4.1 Classification performance

In this and subsequent parts of this thesis we will use the following standard metrics for measuring classifier performance:

1. Discrimination threshold is the cutoff applied to a classifier's output score to decide which of the two classes an example is assigned to.

2. Receiver operating characteristic (ROC) is a graph of the results of a binary classifier across various discrimination thresholds.

3. Area under the curve (AUC) is the area under the ROC curve, providing a useful single-number summary of the performance of each algorithm.

4. Type I error is the number of false positives in a binary classification problem.

5. Type II error is the number of false negatives in a binary classification problem.

6. Precision is calculated by dividing the number of correctly predicted true examples by the total number of examples predicted as true (i.e., correctly predicted true examples plus false examples predicted as true).

7. Precision at percentage. While it can be useful to make predictions across the entire data set, in practice it can be more important to make a smaller number of highly accurate predictions. Precision at percentage measures the fraction of correct predictions on a small subset of the data, typically the examples the classifier is most confident about. This can be highly useful in practice.

8. Recall is computed by dividing the number of correctly predicted true examples by the total number of true examples (i.e., correctly predicted true examples plus true examples predicted as false).

9. F1 score is a metric to establish a balance between both precision and recall scores and is given by $F_1 = 2 \times \mathrm{Precision} \times \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall})$.
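The metrics listed above can be computed directly from labels and classifier scores. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function names, the toy labels and scores, and the threshold of 0.5 are all hypothetical choices made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def precision_at_fraction(y_true, scores, fraction):
    """Precision among the top-scoring fraction of examples."""
    n_top = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / n_top


y_true = [1, 1, 1, 0, 0, 0]                  # 1 = RES, 0 = SUS
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]      # made-up classifier scores
threshold = 0.5                              # discrimination threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
top_half_precision = precision_at_fraction(y_true, scores, fraction=0.5)
```

With these toy values there are two true positives, one false positive and one false negative, so precision, recall and F1 all equal 2/3.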

3.4.2 Feature Importance Calculation

The metrics in the previous section focus on the ability of an algorithm to determine the class from a set of features; however, the ability to identify which features are predictive of the class can be equally important for applied machine learning practitioners. It can provide a crucial understanding of the underlying structure of the feature space. In our case, feature selection can be used to identify which mutations confer resistance to a specific antibiotic. Such information can also be used to compress the feature matrix while providing equal predictive power. While many techniques can perform such analysis, not all can do so easily or simply. The approaches that algorithms take vary widely; however, the resulting data structure is similar in the sense that the results provide information that can be used to compress an item-by-feature matrix. In essence, the goal is to determine a sufficient way of representing the original data matrix such that, despite a significant decrease in the number of features, it does not lose its predictive power.

Different algorithms approach the problem of feature importance calculation differently. Some approaches are a natural by-product of the functioning of the original algorithm, some are separate algorithms that are used to interpret the trained classifier, and some algorithms provide no ability to discern the individual importance of each feature. We consider several different ways of evaluating feature importance that reflect the atypical matrix structure and the nature of the mutations our individual datasets exhibit.

3.4.3 Gene Region Identification

While identification of features in k-mer space is useful, it is more productive to identify significant features by gene region. This is the first step toward establishing the biological plausibility of new mutations or, alternatively, confirming known hypotheses about biological function. Typically, the bacteria we consider have an associated reference genome we may refer to. This allows us to map individual k-mers which have been identified as important back to the individual protein encoding genes (PEGs). In the particular case of AMR modeled in k-mer space, each SNP is redundantly captured by a considerable number of k-mers. This means multiple features convey the same information. One natural way to account for this redundancy is to aggregate the results of the feature importance calculation back to the PEG IDs of a reference genome. For well understood bacteria, such as we have here, this allows us to validate the importance of individual PEGs and thereby the accuracy of our algorithm for feature importance calculation. PEGs also provide insight into the biological role that such a mutation interferes with. Some SNPs effectively turn off the function of genes, while others simply augment the function. Such outside information may then be used by biologists to determine the appropriate course of action in a clinical scenario, such as selecting a different antibiotic whose function is not interfered with by the mutation.

In order to have confidence that machine learning algorithms are correctly identifying the aspect of genetic change that correlates with the phenotype, they need to plausibly find the SNP, or the PEG associated with a SNP, that accounts for resistance. The minimum requirement for a viable ML algorithm is that it consistently identifies regions of the genetic code for further analysis; in other words, the algorithm is stable.
While we have gold standard data, we also perform an analysis from the perspective of stability to simulate a real-life situation in which we do not have access to all information (e.g., detecting AMR in the field). Compounding the issues of algorithmic stability is the fact that the k-mer matrix is highly redundant: multiple k-mers encode the same mutation. Therefore, an analysis that naively utilizes k-mer matrices is likely to yield different results each time, despite the fact that some k-mers reflect the same biological structures. To account for such realities we utilize the Burrows–Wheeler alignment algorithm (Li and Durbin, 2009, 2010) to map each k-mer to a reference genome. Such an alignment maps each k-mer to a protein encoding gene (PEG), which in turn denotes the unique gene location that is the highest percentage match for that k-mer.
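Aggregating k-mer importances back to gene regions, as described in this section, reduces to summing over a k-mer-to-PEG mapping once the alignment has been done. The sketch below assumes such a mapping is already available as a dictionary; the function name, the PEG identifiers `peg.1` and `peg.2`, and the importance values are all made up for illustration and do not come from the thesis data. Summation is one possible aggregation rule; a maximum or mean would be equally easy to substitute.

```python
from collections import defaultdict


def aggregate_by_peg(importances, kmer_to_peg):
    """Sum k-mer importances per PEG and rank PEGs from most to least important.

    k-mers without an entry in `kmer_to_peg` (no alignment hit) are skipped.
    """
    totals = defaultdict(float)
    for kmer, importance in importances.items():
        peg = kmer_to_peg.get(kmer)
        if peg is not None:
            totals[peg] += importance
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical feature importances and a hypothetical alignment result.
importances = {"GCC": 0.25, "AGG": 0.25, "CAG": 0.125, "CAC": 0.125, "AAA": 0.5}
kmer_to_peg = {"GCC": "peg.1", "AGG": "peg.1", "CAG": "peg.1", "CAC": "peg.2"}

ranked_pegs = aggregate_by_peg(importances, kmer_to_peg)
```

Because several k-mers covering the same SNP map to the same PEG, the ranking of PEGs is less sensitive to which of the redundant k-mers a classifier happened to favor.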

3.5 Model Comparison

Before we consider the individual algorithms and their relationship to each other, let us first formalize our terminology.

3.5.1 Naive Bayes

Naive Bayes classifiers are probabilistic classifiers that compute the conditional probability of a label using Bayes' theorem and the method of maximum likelihood. They rely on the strong simplifying assumption of conditional independence; despite this somewhat unrealistic assumption for general datasets, empirically these algorithms perform sufficiently well. Let X denote the feature vector of a single isolate, that is, a row of the m × n k-mer matrix M (recall that n is the number of k-mers and m is the number of isolates). Let Y denote the label associated with that vector. The assumption of independence is required in order to calculate P(Y|X) using Bayes' rule, which requires computing P(X|Y). Even in the case of binary classification (where each label Y denotes one of two possible classes), without assuming conditional independence one would need at least $2^n$ data points in order to compute the probability estimates, and thus, even for a moderately sized n, this rapidly becomes prohibitively expensive. By assuming conditional independence, the search space is reduced from exponential in n (i.e., $2^n$) to linear in n (i.e., $2n$).
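To make the counting argument concrete, the block below writes out Bayes' rule and the factorization implied by conditional independence. It is a sketch under the additional assumption that every feature is binary (as in a binary k-mer matrix); the component notation $X_j$ and the class index $y$ are introduced here for illustration only.

```latex
\begin{align*}
P(Y = y \mid X) &= \frac{P(Y = y)\, P(X \mid Y = y)}
                        {\sum_{y'} P(Y = y')\, P(X \mid Y = y')}, \\
P(X \mid Y = y) &= \prod_{j=1}^{n} P(X_j \mid Y = y)
                   \quad \text{(conditional independence)}.
\end{align*}
% Without the assumption, P(X | Y = y) has 2^n - 1 free parameters per class,
% i.e. 2(2^n - 1) for two classes. With it, there is one parameter per
% feature and class, i.e. 2n in total.
```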

3.5.2 AdaBoost

The AdaBoost classifier (Freund and Schapire, 1995) repeatedly fits a classifier to the data, each time weighting the incorrectly classified instances more highly. This approach is designed to minimize the loss relative to the optimal strategy. Specifically, the loss of the algorithm over the first T trials is defined as

$$L_A = \sum_{t=1}^{T} p^t \cdot \ell^t,$$

and

$$L_i = \sum_{t=1}^{T} \ell_i^t$$

is the cumulative loss of a particular classifier i. Here $p^t$ is the distribution over the classifiers chosen at trial t, $\ell^t$ is the vector of their losses at that trial, and $\ell_i^t$ is the loss of classifier i. The goal is to minimize $L_{opt} = L_A - \min_i L_i$. An early example of ensemble learning, or aggregation of weak classifiers to build a strong classifier, AdaBoost (adaptive boosting) earned considerable acclaim and produces competitive results in practice (Fernández-Delgado et al., 2014). One reason AdaBoost might not be the optimal algorithmic choice in AMR classification is its sensitivity to outliers. Each sample (isolate) might have mutations in different gene regions, and a gene region with an outlier mutation would receive disproportionate weight with respect to the likelihood that the mutation would occur in another experiment.
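The reweighting of misclassified instances can be illustrated with a single boosting round. This is a minimal sketch of the standard discrete AdaBoost update for labels in {-1, +1}; the function name and the toy labels are hypothetical, and the update is not claimed to match the implementation used for the experiments in this thesis.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """Perform one discrete AdaBoost round for labels in {-1, +1}.

    Returns (alpha, new_weights), where alpha is the vote of the weak
    classifier and new_weights is the normalized instance distribution.
    """
    total = sum(weights)
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if not 0.0 < error < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified instances (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


# Toy example: five isolates, the second one is misclassified.
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, -1, -1, -1, +1]
weights = [0.2] * 5
alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
```

After the update the single misclassified isolate carries half of the total weight, which shows how one unusual isolate can come to dominate subsequent rounds.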

3.5.3 Logistic Regression

[Finish.]

3.5.4 Support Vector Machines

The Support Vector Machine (SVM) algorithm (Cortes and Vapnik, 1995) constructs a hyperplane that separates the data by class. In practice, data is rarely linearly separable, and thus the original data is mapped into a higher dimensional space to make the separation possible. The optimal separating hyperplane is the one that maximizes its distance to the nearest data points. This, of course, makes the algorithm sensitive to outliers.

3.5.5 Random Forest

Random Forest (Breiman, 2001) is an ensemble decision tree algorithm that builds multiple separate decision trees on random subsets of the original data and then combines the trees to reach a consensus on new examples. Random Forest (RF) is a well recognized technique, often adopted as the first line of attack against new machine learning questions due to its high accuracy on a broad range of problems (Fernández-Delgado et al., 2014). Additionally, Random Forest provides an easy to understand evaluation of the importance of individual features in the classification model, which in our case allows us to use the most important k-mers to find the DNA regions associated with AMR. Finally, RF can be naively parallelized, which is crucial given the scale of the k-mer matrices used in this thesis [Should we include table with the data sizes?].

RF generates submatrices by sampling $\sqrt{n}$ features from the original matrix [ten] times with replacement. For each submatrix a decision tree is trained. The algorithm takes a greedy approach at each split in the tree, identifying the feature with the maximal ability to decrease the Gini impurity index (defined in the following section) on the data set. To classify new examples, each decision tree judges the new isolate and "votes" to classify it; in the AMR case the goal is to classify a new isolate as either RES or SUS to a given antibiotic. The majority RES/SUS judgment determines the ultimate classification. This is why RF is considered an ensemble machine learning technique: it utilizes a large number of weak classifiers and aggregates them for a better final classification result.

[Equation (3.3): matrix S with columns t1, ..., t6; row r1 = (0.3, 0.2, 0.1, 0.05), row r2 = (0.3, 0.25, 0.2), and the row (0.6, 0.2, 0.1, 0.05, 0.25, 0.2); annotated with $\sum_{i=1}^{6} a_i$ and "Ranking".]

[Figure: decision tree nodes t1 = 0.3, t2 = 0.2, t3 = 0.1, t4 = 0.05, t5 = 0.25, t6 = 0.2 (t1 = 0.3 appears twice); leaves labelled RES or SUS.]
Figure 3.3: An example RF tree.
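The two ingredients described above, random feature subsets and majority voting, can be sketched in a few lines. This is only an illustration: the trees are replaced by hypothetical one-feature threshold rules, features are drawn without replacement within a subset (one possible reading of the sampling scheme), and all names are invented for the example.

```python
import math
import random
from collections import Counter


def draw_feature_subsets(n_features, n_trees, seed=0):
    """Draw one subset of floor(sqrt(n_features)) feature indices per tree."""
    rng = random.Random(seed)
    subset_size = max(1, math.isqrt(n_features))
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_trees)]


def make_stump(feature_index, threshold):
    """Stand-in for a trained tree: vote RES if one k-mer count exceeds a threshold."""
    def stump(isolate):
        return "RES" if isolate[feature_index] > threshold else "SUS"
    return stump


def majority_vote(votes):
    """Return the most common vote (ties go to the vote seen first)."""
    return Counter(votes).most_common(1)[0][0]


subsets = draw_feature_subsets(n_features=16, n_trees=3)
forest = [make_stump(0, 0), make_stump(1, 2), make_stump(2, 0)]
isolate = [3, 1, 5]                          # toy k-mer counts for one isolate
votes = [tree(isolate) for tree in forest]   # ["RES", "SUS", "RES"]
prediction = majority_vote(votes)
```

Using an odd number of trees avoids ties in the binary RES/SUS setting.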

Feature Importance Calculation

For each node on a tree, the Gini impurity index provides a measure of the diversity of the elements. The k-mer that yields the greatest decrease in the Gini impurity score is chosen for each node on a tree. A normalized summation of the decrease in impurity is then used to signify the individual importance of the k-mers. Letting n be the number of classes and $p_i$ be the fraction of data points labelled as class i in the dataset, the Gini impurity index G is defined as

$$G = \sum_{i=1}^{n} p_i (1 - p_i).$$
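As a small worked example of the definition of G, the following sketch computes the Gini impurity of a list of class labels. The function name and the toy label lists are illustrative only.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G = sum_i p_i * (1 - p_i) over the classes present in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((count / n) * (1.0 - count / n)
               for count in Counter(labels).values())


mixed = gini_impurity(["RES", "RES", "SUS", "SUS"])   # evenly mixed node
pure = gini_impurity(["RES", "RES", "RES"])           # pure node
```

For two classes the impurity is largest (0.5) when the node is evenly mixed and zero when the node contains a single class, which is why a split that separates RES from SUS isolates yields a large decrease.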

If we denote by $I(k_i)$ the feature importance of $k_i \in K$, where K is the set of all observed k-mers, $I(k_i)$ can be computed in the form

$$I(k_i) = \sum \left( G_{current} - G_{left} - G_{right} \right),$$

where $G_{current}$ denotes the Gini impurity of the node at which $k_i$ is used to split, while $G_{left}$ and $G_{right}$ denote, respectively, the Gini impurities of its left and right child nodes. The importance of each k-mer is calculated by [summing] the Gini impurity index value for each k-mer across all trees in the forest. For the RF algorithm this amounts to the use of a function to aggregate the importance of the features that appear on the decision trees. The simplest solution is to sum the individual importance of each feature.
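The simplest aggregation, summing the impurity decrease attributed to each feature over every tree and then normalizing, can be sketched as follows. The per-tree values are toy numbers and the function name is hypothetical; how the decrease at each node is computed is left outside the sketch.

```python
from collections import defaultdict


def forest_feature_importance(per_tree_decreases):
    """Sum impurity decreases per feature across trees and normalize to sum to 1."""
    totals = defaultdict(float)
    for tree in per_tree_decreases:
        for feature, decrease in tree:
            totals[feature] += decrease
    norm = sum(totals.values())
    if norm == 0:
        return dict(totals)
    return {feature: value / norm for feature, value in totals.items()}


# Two toy trees; each entry is (feature, impurity decrease at the node using it).
tree_1 = [("t1", 0.3), ("t2", 0.2), ("t3", 0.1), ("t4", 0.05)]
tree_2 = [("t1", 0.3), ("t5", 0.25), ("t6", 0.2)]
importance = forest_feature_importance([tree_1, tree_2])
ranking = sorted(importance, key=importance.get, reverse=True)
```

A feature that appears in many trees accumulates importance even if its decrease in any single tree is modest.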

Let $T = \{t_1, \cdots, t_i\}$ be a forest of decision trees trained on vectors $X = \{x_1, \cdots, x_n\}$ with labels $Y = \{y_1, \cdots, y_n\}$. Let $F = f_1^i$ be the features that are present in T. Each feature on tree $t_j$ has an associated Gini impurity value, where the impurity I is defined as

$$\sum_{i=1}^{J} f_i (1 - f_i) = \sum_{i \neq k} f_i f_k, \tag{3.4}$$

as provided by Breiman (2001). [j and J notation]. Equation (3.4) is both easy to compute and intuitively accurate. Davis et al. (2016) show that a slight alteration to the protocol (namely summing feature importance across gene regions) allows us to rediscover regions of importance with minimal effort. However, further inspection of such results provides reason for pause. Following the same protocol as above yields problematic results. The "top feature" for individual runs is often placed second or third on individual trees, but is an outlier in the number of trees in which it is present. This need not be a problem for all distributions. In this particular distribution, where features have identical column values, the choice of [feature A or B] is either up to random chance or the implemented method of selection. By aggregating on gene regions, the authors above provide some control of such sampling, though a more rigorous algorithm would be appropriate. The same basic problem exists. We would prefer an approach that facilitates the feature importance calculation for matrices of this particular type. This would control for the random nature of feature selection in such a vastly oversampled environment, in effect utilizing the position of the feature on individual trees and controlling for the random selection of features. Here we propose [Project Under Palaverous Sampling], a two-step alteration of the RF algorithm specifically designed to allow for a more reasonable interpretation of such similarly structured matrices.

CHAPTER 4

ANALYSIS

4.1 Data Sets

In total there are 10 separate datasets, representing an analysis of four different bacteria. All the datasets were obtained from PATRIC, which is an online platform generated from extensive biological annotations (Wattam et al., 2013, 2016). The Acinetobacter baumannii isolates and Staphylococcus aureus isolates are each analyzed with a single antibiotic: carbapenem (see Section 4.1.1 for description) and methicillin (see Section 4.1.2 for description), respectively. M. tuberculosis isolates are analyzed with seven different antibiotics, and the resulting seven M. tuberculosis datasets are described in Section 4.1.3. The Klebsiella pneumoniae dataset is a multilabel dataset consisting of K. pneumoniae treated with sixteen different antibiotics (Section 4.1.4). We present an overview of each dataset after it has been transformed into a k-mer matrix. Each summary is a collection of plots that provides an overview of the dataset and of how the choice of k affects the resulting matrix. In general, the results show the same trends across datasets. The graphs are attempts to understand and interpret how k affects the size, sparsity and unique mapping quality of the resulting matrix. A larger k-mer size increases the space complexity and allows for k-mers that uniquely map to individual regions. The choice of k, however, is more complicated, as will soon be shown.
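The effect of k on the number of features and on sparsity can be reproduced on toy data. The sketch below assumes that each matrix entry counts the occurrences of a k-mer in one isolate's sequence, read on the forward strand only; the function names and the two toy sequences are hypothetical, and the construction in Section 3.1 may differ in detail.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every length-k substring of a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_matrix(sequences, k):
    """Build a (nonbinary) k-mer count matrix: one row per sequence, one column per k-mer."""
    counts = [kmer_counts(s, k) for s in sequences]
    features = sorted(set().union(*counts))
    matrix = [[c[f] for f in features] for c in counts]  # missing k-mers count as 0
    return features, matrix


def fraction_nonzero(matrix):
    """Fraction of matrix entries that are nonzero."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


sequences = ["ACGTACGT", "ACGTTTTT"]
features_2, matrix_2 = kmer_matrix(sequences, k=2)
features_4, matrix_4 = kmer_matrix(sequences, k=4)
```

Even on these two short sequences, moving from k = 2 to k = 4 increases the number of columns and lowers the fraction of nonzero entries, the same trend that the plots show at scale.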

4.1.1 Acinetobacter baumannii

There are 232 Acinetobacter baumannii isolates, of which 110 are susceptible to the antibiotic carbapenem, while 122 are resistant to it. In Figure 4.1 we provide an overview of the Acinetobacter baumannii dataset. Setting k = 10 generates 524,684 features, while setting k = 20 generates 52,966,821 unique k-mers. Increasing the k-mer size has a dramatic effect on both the sparsity and the range of values in the matrix. With 10-mers the matrix is dense (with 91.1% of the entries being nonzero), while when k = 20 only 7.3% of the matrix entries are nonzero. This dilation of the k-mer size also accounts for a dramatic increase in the number of nonzero entries in the matrix. In particular, the number of ones increases from 13,328,393 for k = 10 to 874,899,968 for k = 20. When an entry equals one, the feature uniquely identifies a single region in the underlying isolate, which provides a simpler and more accurate identification of genes from feature importance calculations. Finally, we consider another fundamental way of interpreting the k-mer matrix, namely in terms of the number of unique columns. While computing the rank of such a large matrix is computationally prohibitive, a simpler approach suffices to allow us to work on a dramatically smaller matrix: many of the features result in identical columns of values.

[Figure: four panels plotted against the k-mer size (k = 10 to 20): (i) # features, (ii) % sparsity, (iii) # ones, (iv) matrix size (GB).]
Figure 4.1: Overview of the A. baumannii dataset. The k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.

The four plots in Figure 4.1 show how k-mer size affects the matrix representation of the data. The first plot shows the number of features required to represent the data. When a small k is chosen, each feature occurs multiple times within the matrix, and this leads to a smaller number of total unique strings required to represent the data. The second plot illustrates that as the k-mer size increases so does the sparsity of the matrix. The third plot shows how the total number of ones in the matrix grows in accordance with the first two plots. Finally, the fourth plot shows how changing the value of k affects the space required for storing the k-mer matrix. Additionally, Figure 4.2(a) shows (on a logarithmic scale) the counts of the number of times an entry value occurs in the k-mer matrix, which provides another visual representation of the effect the k-mer size has on the structure of k-mer matrices. We can see that increasing the size of the k-mer dramatically affects the distribution. We can also see that these graphs hint that some sections of the structure are preserved, through the plateaus, drops and peaks associated with small clusters of particular count values. [Such resolution may provide future work in considering closer analysis of such structure and how it might impact our interpretation of k-mer space.]

[Figure: two panels, (a) A. baumannii and (b) S. aureus; x axis: entry in the k-mer matrix (0 to 600); y axis: log (count); one curve per k-mer size, k = 10 to 20.]
Figure 4.2: Histogram of the entries in the k-mer matrix for the (a) A. baumannii and (b) S. aureus dataset. Note that increasing the size k can have a dramatic effect on the distribution of entries of the k-mer matrix.
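The observation that many features produce identical columns suggests grouping such columns before further analysis. The following sketch groups feature names by their column of values; it is an illustration on a toy matrix with hypothetical names, not the compression scheme developed later in this thesis.

```python
def collapse_identical_columns(features, matrix):
    """Group feature names whose columns are identical across all isolates.

    Returns a dict mapping each distinct column (as a tuple) to the list of
    feature names that share it, in their original order.
    """
    groups = {}
    for j, name in enumerate(features):
        column = tuple(row[j] for row in matrix)
        groups.setdefault(column, []).append(name)
    return groups


# Toy matrix: 3 isolates (rows) by 4 k-mers (columns); k1 and k2 coincide.
features = ["k1", "k2", "k3", "k4"]
matrix = [
    [1, 1, 0, 2],
    [0, 0, 3, 1],
    [1, 1, 0, 0],
]
groups = collapse_identical_columns(features, matrix)
```

Keeping one representative column per group shrinks the matrix without discarding any information a classifier could use, since identical columns are indistinguishable to it.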

[Figure: four panels plotted against the k-mer size (k = 10 to 20): (i) # features, (ii) % sparsity, (iii) # ones, (iv) matrix size (GB).]
Figure 4.3: Overview of the S. aureus dataset. The k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.

4.1.2 Staphylococcus aureus

There are 606 Staphylococcus aureus isolates with 491 resistant to the antibiotic methicillin and 115 susceptible to it. Figure 4.3 provides an overview of the S. aureus dataset similar to our analysis of the A. baumannii dataset in the previous section. As the four plots in Figure 4.3 demonstrate, the k-mer matrix of this dataset exhibits similar structure and the quality of the RF classifier (measured in terms of classification accuracy) is also similar. Figure 4.2(b) shows the same trend as described for the A. baumannii case in Figure 4.2(a) above.

4.1.3 Mycobacterium tuberculosis

The worldwide health threat of multidrug resistant tuberculosis has resulted in several studies that have generated a large amount of AMR data for M. tuberculosis strains (Davis et al., 2016). In the PATRIC database, M. tuberculosis is currently the species with the largest amount of AMR metadata. As described by Davis et al. (2016), one of the challenges of classifying M. tuberculosis genomes as resistant or susceptible is that M. tuberculosis isolates are often resistant to multiple antibiotics, which makes an unambiguous classification with respect to individual antibiotics challenging. For example, it is difficult to build a classifier to detect isoniazid resistance that is not also biased by rifampicin related k-mers, and vice versa. Table 4.1 provides a summary of the M. tuberculosis dataset. In particular, the table lists the number of isolates of M. tuberculosis treated with each of the seven antibiotics and the resulting number of features in the k-mer matrix (particularly, when k = 15) for each antibiotic.

No.  Antibiotic     # Isolates   # RES   # SUS   # Features, k=15
1    Ethambutol     1,045        333     712     13,922,315
2    Ethionamide    422          172     250     13,031,317
3    Isoniazid      1,287        815     472     14,000,442
4    Kanamycin      671          187     484     13,099,034
5    Ofloxacin      696          238     458     13,247,450
6    Rifampicin     1,201        533     668     13,941,421
7    Streptomycin   1,170        492     678     13,961,344

Table 4.1: List of the antibiotics used for the M. tuberculosis isolates. For each antibiotic we list the total number of isolates used for classification, with the specific number of resistant and susceptible isolates listed in the fourth and fifth column, respectively. Additional details about the dataset can be found in (Davis et al., 2016). The last column shows the number of features when k = 15.

4.1.4 Klebsiella pneumoniae

This dataset consists of 1,777 Klebsiella pneumoniae genomes collected by the Houston Methodist Hospital System between 2011 and 2015 (Long et al., 2017). The dataset covers 16 antibiotics listed in Table 4.2. Unlike the other three datasets, the K. pneumoniae dataset is a multilabel dataset, in which each datapoint has a maximum of 16 binary labels (i.e., each isolate is either resistant or susceptible to each of the 16 antibiotics). Note, however, that each isolate does not necessarily have all 16 labels. This is due to the fact that the data was collected in a longitudinal study, and so the K. pneumoniae isolates were not necessarily tested for resistance to each of the 16 antibiotics.

No.  Antibiotic                      # Isolates   # SUS   # RES
1    Amikacin                        1,554        1,190   364
2    Aztreonam                       1,477        1,377   100
3    Cefepime                        1,247        936     311
4    Cefoxitin                       1,531        555     976
5    Ciprofloxacin                   1,554        119     1,435
6    Ertapenem                       443          265     178
7    Fosfomycin                      772          640     132
8    Gentamicin                      1,554        786     768
9    Imipenem                        1,553        1,100   453
10   Levofloxacin                    1,553        246     1,307
11   Meropenem                       1,553        1,123   430
12   Piperacillin/Tazobactam         1,552        322     1,230
13   Tetracycline                    1,554        658     896
14   Tigecycline                     627          396     231
15   Tobramycin                      1,554        501     1,053
16   Trimethoprim/Sulfamethoxazole   1,554        331     1,223

Table 4.2: List of antibiotics used for the K. pneumoniae isolates. For each antibiotic we list the total number of isolates that have a label (i.e., resistant or susceptible) for that antibiotic. The specific number of resistant and susceptible isolates are listed in the fourth and fifth column, respectively.
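Because not every isolate carries all 16 labels, one straightforward way to handle the missing labels is to build a separate binary dataset per antibiotic that contains only the isolates tested against it. The sketch below illustrates this under the assumption that labels are stored as one dictionary per isolate; the data layout, the function name and the toy isolate IDs are hypothetical.

```python
def per_antibiotic_datasets(labels):
    """Split partially labelled isolates into one labelled list per antibiotic.

    labels maps isolate ID -> {antibiotic: "RES" or "SUS"}; an antibiotic that
    was not tested for an isolate is simply absent from that isolate's dict.
    """
    datasets = {}
    for isolate, tested in labels.items():
        for antibiotic, phenotype in tested.items():
            datasets.setdefault(antibiotic, []).append((isolate, phenotype))
    return datasets


# Toy example: isolate "kp3" was never tested against ertapenem.
labels = {
    "kp1": {"Amikacin": "SUS", "Ertapenem": "RES"},
    "kp2": {"Amikacin": "RES", "Ertapenem": "SUS"},
    "kp3": {"Amikacin": "SUS"},
}
datasets = per_antibiotic_datasets(labels)
```

This matches the varying isolate counts per antibiotic reported for this dataset: each antibiotic ends up with its own number of labelled isolates.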

4.2 Classifier Comparison

All the performance statistics we report below are computed by 5-fold cross-validation. When we refer to a k-mer matrix we mean the nonbinary k-mer matrix (as described in Section 3.1), and we set k = 15, unless otherwise specified.
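For reference, a 5-fold split of the isolates can be sketched as follows. This is a minimal, unstratified illustration with a hypothetical function name; whether the folds used for the reported experiments were stratified or how they were shuffled is not specified here.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, starting at position i.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_indices in enumerate(folds):
        train_indices = [j for fold_id, fold in enumerate(folds)
                         if fold_id != i for j in fold]
        yield train_indices, test_indices


# Example with 232 samples, the size of the A. baumannii dataset.
splits = list(k_fold_indices(232, n_folds=5))
```

Each isolate is used for testing exactly once, and a metric is computed on each of the five test folds.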

4.2.1 Classifier Performance

We trained five classifiers (RF, Naive Bayes, Lasso, AdaBoost, SVM) on the A. baumannii and S. aureus datasets described in Section 4.1. The k-mer matrices were constructed by setting k = 15. Figure 4.4 shows the ROC curves for all five classifiers, while Table 4.3 lists the classifier accuracy, precision, recall, F1 and the area under the curve (AUC) corresponding to the ROC curves in Figure 4.4.

                 Accuracy      F1            Recall        Precision     AUC
A. baumannii
Random Forest    0.94 ± 0.03   0.94 ± 0.03   0.92 ± 0.05   0.96 ± 0.02   0.95 ± 0.04
Naive Bayes      0.87 ± 0.07   0.85 ± 0.1    0.81 ± 0.17   0.94 ± 0.05   0.87 ± 0.07
Lasso            0.91 ± 0.04   0.92 ± 0.04   0.93 ± 0.09   0.92 ± 0.08   0.96 ± 0.03
AdaBoost         0.91 ± 0.06   0.91 ± 0.07   0.93 ± 0.11   0.9 ± 0.04    0.96 ± 0.03
SVM              0.88 ± 0.08   0.89 ± 0.07   0.92 ± 0.08   0.87 ± 0.09   0.93 ± 0.05
S. aureus
Random Forest    0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.99 ± 0.01
Naive Bayes      0.93 ± 0.03   0.96 ± 0.02   0.97 ± 0.02   0.95 ± 0.01   0.88 ± 0.04
Lasso            0.99 ± 0.02   0.99 ± 0.01   0.99 ± 0.02   1.0 ± 0.01    0.99 ± 0.01
AdaBoost         0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.02
SVM              0.83 ± 0.02   0.9 ± 0.01    0.99 ± 0.02   0.84 ± 0.01   0.99 ± 0.02

Table 4.3: Classifier comparison on the A. baumannii dataset (classification of resistance to carbapenem) and the S. aureus dataset (classification of resistance to methicillin).

Additionally, we trained the RF, Naive Bayes and Lasso classifiers on the seven M. tuberculosis datasets from Section 4.1.3. Figure 4.5 shows the corresponding ROC curves and Table 4.4 lists the classifier accuracy, precision, recall, F1 and the corresponding AUC values. Next, we analyze the A. baumannii and S. aureus datasets across different k-mer size ranges. Figures 4.6 and 4.7 show the ROC curves for the five classifiers trained on the A. baumannii and S. aureus dataset, respectively. Similarly, Figures 4.8 and 4.9 show the ROC curves for the five classifiers trained on the binary k-mer matrices of the A. baumannii and S. aureus dataset, respectively. In Figure 4.10 we plot the average accuracy for classifiers trained with default settings on the nonbinary and binary k-mer matrices as a function of the k-mer size. While the structure of the k-mer matrix changes as k increases, the accuracy of the high performing classifiers stays largely unchanged, with the exception of Naive Bayes, which is considerably less accurate for small values of k.
This is a rather noteworthy result. Since the choice of k has [such dramatic effects on the computational performance and speed of the algorithms], the fact that the choice of k has only a minimal effect on the accuracy of the algorithm should give practitioners confidence that, even without access to large memory machines, they should be able to generate reasonable results. [For our experiments we chose k = 15 as it is near or just after the asymptote. This also provided a reasonably sized matrix to compare with a compression technique we explore later in this thesis.]

                 Accuracy      F1            Recall        Precision     AUC
Ethambutol
Random Forest    0.66 ± 0.08   0.35 ± 0.13   0.31 ± 0.15   0.52 ± 0.21   0.64 ± 0.14
Naive Bayes      0.69 ± 0.06   0.42 ± 0.12   0.36 ± 0.13   0.57 ± 0.18   0.6 ± 0.07
Lasso            0.74 ± 0.06   0.55 ± 0.09   0.54 ± 0.17   0.67 ± 0.18   0.77 ± 0.1
Ethionamide
Random Forest    0.68 ± 0.07   0.52 ± 0.15   0.47 ± 0.2    0.68 ± 0.13   0.71 ± 0.11
Naive Bayes      0.63 ± 0.08   0.38 ± 0.22   0.34 ± 0.27   0.65 ± 0.23   0.59 ± 0.1
Lasso            0.71 ± 0.06   0.61 ± 0.1    0.59 ± 0.17   0.66 ± 0.08   0.74 ± 0.08
Isoniazid
Random Forest    0.74 ± 0.07   0.79 ± 0.08   0.79 ± 0.15   0.81 ± 0.06   0.81 ± 0.07
Naive Bayes      0.63 ± 0.01   0.76 ± 0.01   0.91 ± 0.04   0.65 ± 0.0    0.53 ± 0.01
Lasso            0.86 ± 0.09   0.88 ± 0.09   0.86 ± 0.15   0.92 ± 0.05   0.93 ± 0.04
Kanamycin
Random Forest    0.79 ± 0.08   0.53 ± 0.17   0.44 ± 0.16   0.72 ± 0.21   0.73 ± 0.11
Naive Bayes      0.81 ± 0.05   0.54 ± 0.15   0.41 ± 0.16   0.82 ± 0.13   0.69 ± 0.08
Lasso            0.89 ± 0.03   0.79 ± 0.05   0.75 ± 0.05   0.84 ± 0.07   0.87 ± 0.04
Ofloxacin
Random Forest    0.71 ± 0.03   0.45 ± 0.17   0.4 ± 0.21    0.68 ± 0.16   0.74 ± 0.03
Naive Bayes      0.66 ± 0.04   0.55 ± 0.07   0.64 ± 0.17   0.51 ± 0.05   0.65 ± 0.05
Lasso            0.88 ± 0.04   0.82 ± 0.07   0.8 ± 0.11    0.87 ± 0.1    0.89 ± 0.04
Rifampicin
Random Forest    0.77 ± 0.06   0.78 ± 0.07   0.75 ± 0.12   0.83 ± 0.07   0.86 ± 0.07
Naive Bayes      0.7 ± 0.07    0.75 ± 0.05   0.78 ± 0.08   0.72 ± 0.09   0.69 ± 0.07
Lasso            0.9 ± 0.07    0.9 ± 0.08    0.88 ± 0.14   0.94 ± 0.05   0.96 ± 0.03
Streptomycin
Random Forest    0.75 ± 0.05   0.67 ± 0.09   0.61 ± 0.14   0.79 ± 0.13   0.81 ± 0.06
Naive Bayes      0.6 ± 0.04    0.53 ± 0.12   0.58 ± 0.22   0.51 ± 0.06   0.6 ± 0.06
Lasso            0.76 ± 0.03   0.69 ± 0.09   0.67 ± 0.17   0.77 ± 0.1    0.84 ± 0.03

Table 4.4: Classifier comparison on each of the seven M. tuberculosis datasets listed in Table 4.1.

[Figure: two ROC panels, A. baumannii and S. aureus; x axis: False Positive Rate; y axis: True Positive Rate; curves for Random Forest, Naive Bayes, Lasso, AdaBoost and SVM.]
Figure 4.4: ROC curves for classifiers trained on each of the A. baumannii and S. aureus datasets (k = 15).
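For readers who want to recompute the reported quantities, the sketch below derives accuracy, precision, recall and F1 from predicted labels, and AUC from classifier scores using the pairwise-ranking formulation. It assumes RES is treated as the positive class, which is an assumption made for this illustration; the function names and toy data are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="RES"):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive="RES"):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = ["RES", "RES", "RES", "SUS", "SUS", "SUS", "SUS", "RES"]
y_pred = ["RES", "RES", "SUS", "SUS", "SUS", "RES", "SUS", "RES"]
metrics = classification_metrics(y_true, y_pred)

auc = roc_auc(["RES", "RES", "RES", "SUS", "SUS", "SUS"],
              [0.9, 0.8, 0.4, 0.7, 0.3, 0.2])
```

On imbalanced datasets, accuracy and F1 can diverge considerably, which is worth keeping in mind when reading results for antibiotics with few resistant or few susceptible isolates.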

4.2.2 Speed

The execution time of the five classifiers is plotted in Figure 4.11. RF is faster than all the other classifiers on the k-mer data by a significant margin. Although RF and Lasso have similar predictive performance, RF is much faster, which is why we continued the analysis using RF.

4.3 Random Forest

As demonstrated above RF achieves the right balance between wall clock time time and desired accuracy. Additionally, in contrast to other classifiers, RF performance does not DRAFT39 Receiver Ethambutoloperating characteristic (ROC) ReceiverEthionamide operating characteristic (ROC) 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4

True Positive Rate Positive True 0.2 Random Forest Rate Positive True 0.2 Random Forest Naive Bayes Naive Bayes Lasso Lasso 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate Receiver operatingIsoniazid characteristic (ROC) Receiver Kanamycinoperating characteristic (ROC) 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4

True Positive Rate Positive True 0.2 Random Forest Rate Positive True 0.2 Random Forest Naive Bayes Naive Bayes Lasso Lasso 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate Receiver operatingOfloxacin characteristic (ROC) Receiver operatingRifampicin characteristic (ROC) 1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4

True Positive Rate Positive True 0.2 Random Forest Rate Positive True 0.2 Random Forest Naive Bayes Naive Bayes Lasso Lasso 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate ReceiverStreptomycin operating characteristic (ROC) 1.0

0.8

0.6

0.4

True Positive Rate Positive True 0.2 Random Forest Naive Bayes Lasso 0.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 4.5: ROC curves for classifiers trained on each of the M. tuberculosis datasets listed in Table 4.1. DRAFT40 RF Naive Bayes

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate

Lasso SVM

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate AdaBoost

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 4.6: ROC curves for different classifiers trained on the A. baumannii dataset. Each ROC curve corresponds to a different size k. DRAFT41 RF Naive Bayes

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate

Lasso SVM

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate AdaBoost

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 4.7: ROC curves for different classifiers trained on the S. aureus dataset. Each ROC curve corresponds to a different size k. DRAFT42 RF Naive Bayes

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate

Lasso SVM

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate AdaBoost

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 4.8: ROC curves for different classifiers trained on the binary A. baumannii dataset. Each ROC curve corresponds to a different size k. DRAFT43 RF Naive Bayes

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate

Lasso SVM

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate AdaBoost

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 4.9: ROC curves for different classifiers trained on the binary S. aureus dataset. Each ROC curve corresponds to a different size k. DRAFT44 A. baumannii S. aureus 1.0 1.05

1.00 0.9 0.95 0.8 0.90

0.7 0.85

Random Forest 0.80 Random Forest 0.6 avg. accuracy avg. Naive Bayes accuracy avg. Naive Bayes 0.75 Lasso Lasso 0.5 SVM 0.70 SVM AdaBoost AdaBoost 0.4 0.65 10 12 14 16 18 20 10 12 14 16 18 20 kmer size kmer size binary A. baumannii binary S. aureus 1.1 1.05

1.0 1.00

0.9 0.95

0.8 0.90

Random Forest Random Forest 0.7 0.85 avg. accuracy avg. Naive Bayes accuracy avg. Naive Bayes Lasso Lasso 0.6 SVM 0.80 SVM AdaBoost AdaBoost 0.5 0.75 10 12 14 16 18 20 10 12 14 16 18 20 kmer size kmer size

Figure 4.10: Accuracy of different classifiers on the (binary) A. baumannii and S. aureus datasets. Additional performance metrics can be found in the supplementary Tables A.1, A.2, A.3 and A.4 in Appendix A.1.

[Figure 4.11: bar chart of execution time in minutes (y axis, 0 to 250) for RF, NB, Lasso, SVM and AdaBoost (x axis).]

Figure 4.11: Execution times for different classifiers run on the A. baumannii dataset. NB denotes Naive Bayes.

change significantly with k-mer size. We analyzed the RF performance on the K. pneumoniae dataset; Figure 4.12 shows the ROC curves and Table 4.5 the performance metrics of RF.

[Figure 4.12: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis), one curve per antibiotic: Amikacin, Aztreonam, Cefepime, Cefoxitin, Ciprofloxacin, Ertapenem, Fosfomycin, Gentamicin, Imipenem, Levofloxacin, Meropenem, Piperacillin, Tetracycline, Tigecycline, Tobramycin and Trimethoprim.]

Figure 4.12: ROC curves for the RF classifier trained on the K. pneumoniae dataset.

4.3.1 RF subtrees

Antibiotic                      Accuracy      F1            Recall        Precision     AUC
Amikacin                        0.91 ± 0.01   0.81 ± 0.02   0.82 ± 0.06   0.81 ± 0.06   0.95
Aztreonam                       0.93 ± 0.01   0.96 ± 0.01   0.94 ± 0.01   0.98 ± 0.01   0.60
Cefepime                        0.75 ± 0.02   0.83 ± 0.01   0.82 ± 0.01   0.85 ± 0.03   0.69
Cefoxitin                       0.78 ± 0.05   0.82 ± 0.04   0.85 ± 0.04   0.8 ± 0.06    0.84
Ciprofloxacin                   0.93 ± 0.01   0.96 ± 0.01   0.96 ± 0.01   0.96 ± 0.01   0.81
Ertapenem                       0.89 ± 0.01   0.81 ± 0.02   0.81 ± 0.05   0.82 ± 0.06   0.94
Fosfomycin                      0.83 ± 0.02   0.37 ± 0.05   0.52 ± 0.12   0.3 ± 0.08    0.70
Gentamicin                      0.84 ± 0.05   0.83 ± 0.05   0.88 ± 0.07   0.79 ± 0.05   0.95
Imipenem                        0.93 ± 0.03   0.88 ± 0.04   0.85 ± 0.05   0.91 ± 0.04   0.97
Levofloxacin                    0.93 ± 0.02   0.96 ± 0.01   0.97 ± 0.01   0.95 ± 0.03   0.93
[Meropenem]                     -             -             -             -             0.94
Piperacillin/Tazobactam         0.79 ± 0.01   0.87 ± 0.01   0.86 ± 0.01   0.89 ± 0.03   0.79
Tetracycline                    0.77 ± 0.02   0.79 ± 0.03   0.85 ± 0.03   0.73 ± 0.04   0.89
Tigecycline                     0.63 ± 0.05   0.4 ± 0.07    0.52 ± 0.12   0.33 ± 0.06   0.50
Tobramycin                      0.89 ± 0.03   0.92 ± 0.02   0.92 ± 0.03   0.91 ± 0.02   0.97
Trimethoprim/Sulfamethoxazole   0.92 ± 0.02   0.95 ± 0.01   0.94 ± 0.02   0.95 ± 0.01   0.91

Table 4.5: Comparison of RF performance on the K. pneumoniae dataset.

One of the most important parameters of the RF classifier is the number of trees used. For the smaller datasets, A. baumannii and S. aureus, we plot the ROC curves corresponding to different numbers of trees in the RF classifier (see Figure 4.13). The corresponding accuracy, precision, recall, F1 and AUC values are reported in Table 4.6. As is evident from the figure and the table, the accuracy does not change significantly as the number of trees increases: for A. baumannii it stays at 0.99 ± 0.01 for every forest size from 10 to 300 trees, and for S. aureus it varies only between 0.92 and 0.94. [explain why.]
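The tree-count comparison can be reproduced in outline with scikit-learn. The sketch below is illustrative only: the synthetic matrix, the function name accuracy_by_tree_count and the use of 5-fold cross-validation for this particular experiment are assumptions, not the setup used to produce Table 4.6.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def accuracy_by_tree_count(X, y, tree_counts=(10, 50, 100), folds=5, seed=0):
    """Return {n_trees: (mean accuracy, std)} from k-fold cross-validation."""
    results = {}
    for n_trees in tree_counts:
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        scores = cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
        results[n_trees] = (float(scores.mean()), float(scores.std()))
    return results


# Synthetic stand-in for a binary k-mer matrix: 60 isolates x 40 k-mers.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 40))
y = np.array([0, 1] * 30)      # balanced SUS/RES labels (toy assumption)
X[:, 0] = y                    # one perfectly informative column (toy assumption)

results = accuracy_by_tree_count(X, y)
for n_trees, (mean_acc, std_acc) in results.items():
    print(f"{n_trees} trees: {mean_acc:.2f} +/- {std_acc:.2f}")
```

On real k-mer matrices the same loop would simply be run with the tree counts of Table 4.6 (10 to 300).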

4.3.2 RF on subsets of the data

One important aspect of stability is the ability of a machine learning algorithm to function when fewer data points (in our case, isolates) are available. This matters because it is not uncommon for clinicians to work with limited amounts of data. To study this, we select smaller subsamples of the original data and train RF on each subset, which lets us simulate how well our algorithm would perform with more limited access to isolates. For this experiment we chose the M. tuberculosis rifampicin and streptomycin datasets, as the numbers of RES and SUS isolates in those datasets are approximately equal (see Table 4.1). Table 4.7 shows that the accuracy generally increases with the subset size, from 0.77 ± 0.18 at n = 50 to 0.9 ± 0.03 at n = 1000 for rifampicin and from 0.64 ± 0.14 at n = 50 to 0.86 ± 0.04 at n = 800 for streptomycin.

[Figure 4.13, panels A. baumannii and S. aureus: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis) for RF with 10, 50, 100, 150, 200, 250 and 300 trees.]

Figure 4.13: ROC curves for the RF classifier trained with different numbers of trees on each of the A. baumannii and S. aureus datasets.

                Accuracy      F1            Recall        Precision     AUC
A. baumannii
  10 trees      0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.96 ± 0.03
  50 trees      0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.96 ± 0.03
  100 trees     0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.96 ± 0.03
  150 trees     0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.95 ± 0.03
  200 trees     0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.96 ± 0.04
  250 trees     0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.95 ± 0.03
  300 trees     0.99 ± 0.01   0.99 ± 0.01   0.99 ± 0.01   1.0 ± 0.0     0.96 ± 0.03
S. aureus
  10 trees      0.94 ± 0.04   0.94 ± 0.04   0.93 ± 0.06   0.95 ± 0.03   0.99 ± 0.01
  50 trees      0.92 ± 0.02   0.92 ± 0.02   0.91 ± 0.05   0.94 ± 0.04   0.99 ± 0.01
  100 trees     0.94 ± 0.04   0.95 ± 0.04   0.93 ± 0.06   0.96 ± 0.03   0.99 ± 0.01
  150 trees     0.94 ± 0.03   0.94 ± 0.03   0.93 ± 0.04   0.95 ± 0.03   0.99 ± 0.01
  200 trees     0.93 ± 0.03   0.93 ± 0.04   0.9 ± 0.07    0.96 ± 0.03   0.99 ± 0.01
  250 trees     0.92 ± 0.04   0.93 ± 0.04   0.93 ± 0.05   0.93 ± 0.05   0.99 ± 0.01
  300 trees     0.94 ± 0.03   0.95 ± 0.03   0.93 ± 0.03   0.96 ± 0.03   0.99 ± 0.01

Table 4.6: Comparison of RF classifiers trained with different numbers of trees on the A. baumannii dataset (classification of resistance to carbapenem) and the S. aureus dataset (classification of resistance to methicillin).

[Figure 4.14, panels rifampicin and streptomycin: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis), one curve per subset size: 50, 100, 150, 200, 300, 400, 500, 600, 700 and 800 for both datasets, plus 900 and 1000 for rifampicin.]

Figure 4.14: ROC curves for the RF classifier trained on training sets of increasing sizes. In particular, RF was trained on subsets of the M. tuberculosis rifampicin and streptomycin datasets from Table 4.1. The subset size denotes the total number of isolates subsampled from each dataset in each experiment (with the numbers of RES and SUS isolates each equal to n/2).

Figure 4.14 shows the ROC curves for each of the M. tuberculosis datasets. The ROC curves indicate that RF becomes more accurate as more data become available, but even with the fewest isolates the classifier performs better than chance.
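A balanced subsample of the kind used in this experiment could be drawn as follows. This is a minimal sketch: the function name balanced_subsample, the 0/1 label encoding (1 for RES, 0 for SUS) and the toy label list are assumptions made for illustration.

```python
import random


def balanced_subsample(labels, n, seed=0):
    """Pick n isolate indices: n/2 resistant (label 1) and n/2 susceptible (label 0)."""
    if n % 2:
        raise ValueError("n must be even so that both classes get n/2 isolates")
    half = n // 2
    res = [i for i, label in enumerate(labels) if label == 1]
    sus = [i for i, label in enumerate(labels) if label == 0]
    if half > len(res) or half > len(sus):
        raise ValueError("not enough isolates in one of the classes")
    rng = random.Random(seed)
    chosen = rng.sample(res, half) + rng.sample(sus, half)
    rng.shuffle(chosen)          # avoid all RES isolates preceding all SUS isolates
    return chosen


# Toy labels: 60 resistant and 70 susceptible isolates (hypothetical counts).
labels = [1] * 60 + [0] * 70
subset = balanced_subsample(labels, 50)
```

The returned indices would then select the rows of the k-mer matrix on which RF is trained and cross-validated; repeating the draw with different seeds gives the independent runs that are averaged.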

4.4 Feature Importance Calculation

4.4.1 Biological relevance

While identifying individual SNPs associated with AMR is important, the greatest utility for a biologist comes from knowing the PEG associated with a SNP. Such information is crucial for understanding how AMR functions and, consequently, for developing alternative strategies for targeting bacteria resistant to a particular antibiotic. For this reason, the feature importance of k-mers associated with resistance is not the only thing that matters: associating those k-mers with genes is equally important and often more clinically applicable. Since k-mer matrices contain a large number of redundant features, feature importance in its traditional sense is difficult to interpret because of noise. On the other hand, since PEGs represent larger gene regions, the small, local redundancies introduced by a particular k-mer are diminished. For the above reasons, the primary focus of the RF feature importance calculation is on mapping the features back to gene regions and evaluating the results of individual k-mers in aggregate. Aggregating over gene regions generates results that are more clinically useful and easier for biologists to evaluate, and it decreases the instabilities resulting from the k-mer space representation.

Rifampicin
n       Accuracy      F1            Recall        Precision     AUC
50      0.77 ± 0.18   0.79 ± 0.16   0.83 ± 0.18   0.77 ± 0.18   0.85 ± 0.17
100     0.72 ± 0.09   0.72 ± 0.09   0.73 ± 0.14   0.75 ± 0.14   0.81 ± 0.11
150     0.76 ± 0.04   0.75 ± 0.05   0.74 ± 0.1    0.79 ± 0.05   0.85 ± 0.05
200     0.78 ± 0.06   0.76 ± 0.06   0.73 ± 0.09   0.81 ± 0.09   0.86 ± 0.06
300     0.79 ± 0.05   0.79 ± 0.05   0.78 ± 0.1    0.82 ± 0.08   0.89 ± 0.03
400     0.81 ± 0.03   0.81 ± 0.03   0.8 ± 0.05    0.81 ± 0.06   0.9 ± 0.04
500     0.81 ± 0.04   0.79 ± 0.05   0.75 ± 0.06   0.84 ± 0.04   0.89 ± 0.03
600     0.83 ± 0.03   0.82 ± 0.03   0.79 ± 0.05   0.86 ± 0.04   0.92 ± 0.02
700     0.85 ± 0.03   0.85 ± 0.03   0.82 ± 0.04   0.88 ± 0.04   0.93 ± 0.03
800     0.86 ± 0.03   0.86 ± 0.04   0.82 ± 0.05   0.89 ± 0.03   0.94 ± 0.02
900     0.89 ± 0.02   0.88 ± 0.02   0.85 ± 0.04   0.93 ± 0.04   0.96 ± 0.02
1000    0.9 ± 0.03    0.89 ± 0.03   0.87 ± 0.05   0.92 ± 0.03   0.96 ± 0.01

Streptomycin
n       Accuracy      F1            Recall        Precision     AUC
50      0.64 ± 0.14   0.61 ± 0.14   0.59 ± 0.2    0.72 ± 0.2    0.66 ± 0.19
100     0.64 ± 0.1    0.63 ± 0.1    0.61 ± 0.15   0.66 ± 0.11   0.72 ± 0.12
150     0.73 ± 0.1    0.72 ± 0.11   0.69 ± 0.12   0.76 ± 0.11   0.82 ± 0.09
200     0.69 ± 0.09   0.67 ± 0.09   0.65 ± 0.12   0.72 ± 0.13   0.77 ± 0.08
300     0.78 ± 0.05   0.78 ± 0.06   0.76 ± 0.06   0.79 ± 0.07   0.85 ± 0.04
400     0.77 ± 0.04   0.77 ± 0.05   0.74 ± 0.07   0.79 ± 0.05   0.85 ± 0.04
500     0.78 ± 0.04   0.78 ± 0.05   0.78 ± 0.08   0.79 ± 0.04   0.86 ± 0.04
600     0.82 ± 0.05   0.82 ± 0.05   0.8 ± 0.06    0.84 ± 0.05   0.9 ± 0.03
700     0.86 ± 0.03   0.86 ± 0.03   0.86 ± 0.04   0.87 ± 0.04   0.92 ± 0.03
800     0.86 ± 0.04   0.86 ± 0.04   0.84 ± 0.05   0.87 ± 0.04   0.92 ± 0.03

Table 4.7: Comparison of RF performance on training sets of increasing sizes. In particular, we trained RF on subsets of the M. tuberculosis rifampicin and streptomycin datasets from Table 4.1. The subset size n denotes the total number of isolates subsampled from each dataset in each experiment (with the numbers of RES and SUS isolates each equal to n/2). All statistics are averaged over three independent runs, each of which uses 5-fold cross-validation.

[Figure 4.15, panels M. tuberculosis, rifampicin and M. tuberculosis, streptomycin: aggregate score (y axis, -0.05 to 0.30) for each PEG (x axis, 0 to 4000); selected points are labeled with their PEG identifiers.]

Figure 4.15: Plot of the ranking (y axis) of each PEG (x axis) by the aggregate score computed by A1, as described in the text. Additional information about the PEGs can be found in supplementary Tables A.11 and A.12 in Appendix A.2.

[Figure 4.16, panels M. tuberculosis, rifampicin (y axis -50 to 400) and M. tuberculosis, streptomycin (y axis -100 to 600): aggregate score for each PEG (x axis, 0 to 4000); selected points are labeled with their PEG identifiers, including 739, 757, 2123 and 3251 in both panels.]

Figure 4.16: Plot of the ranking (y axis) of each PEG (x axis) by the aggregate score computed by A2, as described in the text. Additional information about the PEGs can be found in supplementary Tables A.13 and A.14 in Appendix A.2.

Since evaluations indicated that accuracy did not suffer from the use of a compressed matrix (see Chapter 5), we use compressed [might need to change if uncompressed experiments are added] k-mer matrices for all of the following comparisons. We evaluate the RF feature importance calculation in several ways. To improve the accuracy of gene identification, we train 100 RF classifiers independently, each with 50 trees. We then aggregate the feature importance computed by each RF classifier in one of two ways:

A1. summation of the feature importances over 100 RF classifiers,

A2. the number of times a PEG occurs in the list of top 100 features of each individual RF classifier.
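The two aggregation schemes can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for Figures 4.15 and 4.16: the feature_to_peg mapping (feature index to PEG identifier, or None when a k-mer maps to no PEG) is hypothetical, and A2 is read here as counting every top-ranked feature that maps to a PEG, so a PEG can be counted several times per classifier.

```python
from collections import Counter, defaultdict


def aggregate_by_peg(importances_per_rf, feature_to_peg, top=100):
    """Return (A1, A2): summed importance per PEG and top-`top` counts per PEG."""
    a1 = defaultdict(float)
    a2 = Counter()
    for importances in importances_per_rf:
        # A1: add every feature's importance to the PEG it maps to.
        for feature, score in enumerate(importances):
            peg = feature_to_peg[feature]
            if peg is not None:
                a1[peg] += score
        # A2: count the PEGs hit by this classifier's top-ranked features.
        ranked = sorted(range(len(importances)),
                        key=lambda f: importances[f], reverse=True)
        for feature in ranked[:top]:
            peg = feature_to_peg[feature]
            if peg is not None:
                a2[peg] += 1
    return dict(a1), dict(a2)


# Toy example: 2 classifiers, 4 features, top 2 features per classifier.
feature_to_peg = [739, 739, 757, None]
importances_per_rf = [[0.5, 0.3, 0.2, 0.0],
                      [0.4, 0.1, 0.5, 0.0]]
a1, a2 = aggregate_by_peg(importances_per_rf, feature_to_peg, top=2)
```

With scikit-learn, each inner list would be the feature_importances_ array of one trained RF classifier.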

Subsequently, we rank the genes by the aggregate score, computed by A1 or A2 above, and in each case visualize the results in a histogram plot that is visually similar to GWAS clustering. Figures 4.15 and 4.16 show the results for the two M. tuberculosis datasets using ranking by A1 and A2, respectively. Tables A.11, A.12, A.13 and A.14 describe the products associated with each of the top PEGs. [Comment on the role those genes play.]

Finally, we consider how these metrics change when using subsets of the original data. In a clinical setting a large number of disparate isolates is unlikely to be available, and we therefore wish to simulate the case in which a practitioner has only 500 or even 100 samples to work from. [Include subset experiment and reference figure here.]

classifier #1 classifier #2 classifier #3

col index FI col index FI col index FI

CAAGCGGTGGTGGAC 0.02257 CATGACTGGTCGGCG 0.02087 AGGTCGAACGAGGGG 0.02244

CCTCTCTTGACGACG 0.02204 GCGCGCCGGGGCCAC 0.02046 GACTCGGGCCATGCC 0.02094

ATAGTCGGCCAACTC 0.01969 CGACCAGTGCACCCG 0.02041 AAGATGAGCATGTCG 0.02007

CTAGGTGTGGCCTAC 0.01961 GGCCCGACACCCCAC 0.01929 CGGTCGGTGCCCGAC 0.01926

GCCGCCGACGGCGCC 0.01793 TACCCGTCTGACCCA 0.01896 GATCGCCGCGTCGGA 0.01926

ACTTATCCAAACCCC 0.01764 AACGTCGCACAGAGC 0.01820 CAGGTGGGCAGCAAA 0.01925

ACCACGACCGCGCTA 0.01638 ACTGTTGGCGCTGGG 0.01799 CCACCGGCGGTCGAG 0.01875

GCCCGCCTCGGCAAC 0.01603 CAGGGTCGACGCCGG 0.01647 GCGACTGGGCGGGTA 0.01745

TCGTATGATAGACCA 0.01546 CGCCTCCGCGGCTAC 0.01606 AGACGAGCAGCAGGA 0.01607

ATACCATTGGCGCCC 0.01496 ACACCTCGAGCGCGG 0.01520 ACCGTCGGTGCAACA 0.01585

ACCAGCGAGGCCGCC 0.00669 GACTTGCTCGCCATC 0.01480 ATCGGTTCGCCTAGC 0.01538

CCGTTGGTCGCGGAG 0.00601 GCCGGTGGTGATCGC 0.01226 CATCCTGGGCGTGGC 0.01052

AGGTCTTTGATGCGC 0.00516 AGCGCCGACTGTTGG 0.00764 CCGCATCGTTGCAAA 0.00610

CTTCGACGACTTCGA 0.00503 CGGCCCCACCCGTCC 0.00743 GGCGGGCAAGGCAAC 0.00596

ACACGCTGTCGGCGT 0.00489 GTCGAGGATGTCGCA 0.00719 AACCCGTCGTGCAGC 0.00559

AGCCAATTCATGGAC 0.00477 GAGGCCGATGGGCCC 0.00685 GCTACCGATATCCAC 0.00556

CCGTTTTCCGGCGCC 0.00468 AAGTGTCCGAGATGG 0.00678 ATCCTGGGCGTGGCC 0.00507

CGGCCCCGTAGAGGA 0.00464 CCGCCGCATCGGCAC 0.00622 ATACCGGCGGTAACG 0.00505

CGCTGAACTGTCCGA 0.00459 ACGGTCGCGATGGCG 0.00611 GGGTGTCGACGAACC 0.00465

ACAGCCCACCGACCC 0.00452 TCTTGTGGCGGCGCA 0.00535 AGCATGCCGGCCACA 0.00454

Table 4.8: The top 20 k-mer matrix features computed by training RF on the k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 10 trees. FI denotes feature importance. [Change col index?]

4.4.2 Feature Importance Stability

[Describe why stability is important.] Tables 4.8 and 4.9 show the top 20 features computed by training three RF classifiers on the k-mer matrix of the M. tuberculosis rifampicin dataset using 10 and 1000 trees, respectively. The same experiment is performed on the compressed k-mer matrices in each case (see Tables 4.10 and 4.11, respectively). We notice [....finish]. Figure 4.17 is an attempt to show that [....talk about P value and noise]. Tables A.9 and A.10 [update if adding more datasets] contain information on the feature importances learnt by RF. Collapsing by ID refers to calculating the feature importance separately on the full k-mer matrix and then summing the feature importance for all columns that have an identical column identity in the k-mer matrix. Comparison between the feature importances computed using RF on the full matrix and the collapsing procedure shows the instability of working with redundant data.

classifier #1               classifier #2               classifier #3
col index         FI        col index         FI        col index         FI
CAGCGCCGACAGTCG   0.00149   AGCGCCGACTGTTGG   0.00176   CCAGCGCCGACAGTC   0.00165
CCGGCATCGAGGTCG   0.00139   CCAGCGCCGACAGTC   0.00141   GGACGCGATCACCAC   0.00151
ACCGGCATCGAGGTC   0.00129   CCCCAGCGCCGACAG   0.00114   GCCGGTGGTGATCGC   0.00132
GGACGCGATCACCAC   0.00123   ATGGCCCGAGTCGCC   0.00099   CCCCAGCGCCGACAG   0.00131
CCGGTGGTGATCGCG   0.00123   GACGCGATCACCACC   0.00095   CAGCGCCGACAGTCG   0.00130
ACCTCGATGCCGCTG   0.00119   CCAGCGCCAACAGTC   0.00095   ACCGGCATCGAGGTC   0.00127
ACTGTCGGCGCTGGG   0.00110   CGCCGACAGTCGGCG   0.00095   CGGGCCCCAGCGCCA   0.00124
ACCACCGGCATCGAG   0.00105   AGCGCCGACTGTCGG   0.00094   ATGCCGCTGGTGATC   0.00116
ATGCCGGTGGTGATC   0.00103   CGATGCCGGTGGTGA   0.00092   GGCCCCAGCGCCGAC   0.00111
CATCCTGGGCATGGC   0.00098   ACCTCGATGCCGCTG   0.00089   CACCACCGGCATCGA   0.00098
AGCGCCAACAGTCGG   0.00094   AGCGCCGACAGTCGG   0.00086   GGGCCCCAGCGCCAA   0.00097
ATGCCGCTGGTGATC   0.00090   GACGCGATCACCAGC   0.00079   GCCGACAGTCGGCGC   0.00097
GGCCCCAGCGCCGAC   0.00089   CCACCGGCATCGAGG   0.00077   AGCGCCGACTGTCGG   0.00096
ATCCTGGGCATGGCC   0.00087   CGCCAACAGTCGGCG   0.00072   CCGCTGGTGATCGCG   0.00085
GACGCGATCACCACC   0.00085   CCGCTGGTGATCGCG   0.00071   CTCGGGCCATGCCCA   0.00084
ATCACCAGCGGCATC   0.00084   CAAGCGCCGACTGTC   0.00067   AGCGCCAACAGTCGG   0.00082
GCCGGTGGTGATCGC   0.00084   ACTGTTGGCGCTGGG   0.00067   CATGGCCCGAGTCGC   0.00080
CAGCGCCAACAGTCG   0.00084   CCCCAGCGCCAACAG   0.00066   GCCGCTGGTGATCGC   0.00078
CGGGCCCCAGCGCCA   0.00080   GCCCCAGCGCCAACA   0.00065   CGCCAACAGTCGGCG   0.00078
CGGCGCTGGGGCCCG   0.00078   CCGGCATCGAGGTCG   0.00064   GACGCGATCACCACC   0.00075

Table 4.9: The top 20 k-mer matrix features computed by training RF on the k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 1000 trees. FI denotes feature importance. [Change col index?]

classifier #1         classifier #2         classifier #3
col index   FI        col index   FI        col index   FI
527213      0.04487   66706       0.02981   408378      0.02216
93368       0.01897   63341       0.02133   64176       0.02097
63356       0.01884   400593      0.01940   63446       0.02011
63493       0.01881   63402       0.01892   95569       0.01956
421632      0.01818   400983      0.01877   400681      0.01954
93378       0.01775   400676      0.01738   527214      0.01897
400568      0.01764   95565       0.01718   93165       0.01883
63242       0.01725   93159       0.01694   77501       0.01838
404562      0.01721   92583       0.01646   422710      0.01825
63961       0.01698   422335      0.01484   95596       0.01686
93486       0.01431   408378      0.01454   64190       0.01678
408380      0.01079   415269      0.01065   93090       0.01485
70052       0.00960   77501       0.00981   63462       0.01447
66694       0.00866   81419       0.00916   552267      0.01106
80395       0.00777   81634       0.00906   70052       0.01020
70054       0.00756   103150      0.00820   77503       0.00976
69171       0.00671   70050       0.00727   74533       0.00761
75121       0.00574   527215      0.00684   70051       0.00712
67911       0.00517   74536       0.00679   408552      0.00618
68047       0.00512   76079       0.00642   81318       0.00616

Table 4.10: The top 20 k-mer matrix features computed by training RF on the compressed k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 10 trees. FI denotes feature importance.

classifier #1         classifier #2         classifier #3
col index   FI        col index   FI        col index   FI
74659       0.00428   70054       0.00444   412262      0.00430
412262      0.00400   70051       0.00389   74662       0.00429
527215      0.00393   527212      0.00356   77501       0.00418
74662       0.00330   81634       0.00332   70051       0.00414
66705       0.00329   77493       0.00327   408379      0.00406
413723      0.00329   408376      0.00312   413723      0.00386
74660       0.00328   527214      0.00310   81635       0.00365
77502       0.00323   408379      0.00297   77503       0.00353
408379      0.00315   527213      0.00290   74663       0.00346
70051       0.00301   413723      0.00290   77504       0.00336
77501       0.00295   41226       0.00284   81634       0.00319
74661       0.00285   77501       0.00283   70053       0.00311
527214      0.00280   66706       0.00282   527216      0.00309
70053       0.00273   81636       0.00280   77502       0.00295
408378      0.00273   408377      0.00278   74660       0.00283
404562      0.00253   77502       0.00274   404562      0.00277
527212      0.00245   74661       0.00269   527212      0.00265
81634       0.00244   77503       0.00256   528635      0.00264
81635       0.00243   70053       0.00255   74659       0.00251
70050       0.00241   74663       0.00249   77493       0.00247

Table 4.11: The top 20 k-mer matrix features computed by training RF on the compressed k-mer matrix of the M. tuberculosis rifampicin dataset. Each classifier consists of 1000 trees. FI denotes feature importance.

[Figure 4.17, panels M. tuberculosis, rifampicin (y axis 0 to 0.035) and M. tuberculosis, streptomycin (y axis 0 to 0.012): feature importance (y axis) against feature rank (x axis, 0 to 160) for the full kmer matrix, collapsing by column ID, and the compressed kmer matrix.]

Figure 4.17: k-mer ranking according to the feature importance computed by RF on the full or the compressed k-mer matrix. Collapsing by column ID refers to calculating the feature importance separately on the full k-mer matrix and then summing the feature importance for all columns that have an identical column identity in the k-mer matrix.
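The collapsing-by-column-ID procedure described above can be illustrated on a toy matrix. This is a minimal sketch: the function name collapse_by_column_id is hypothetical, and a column's identity is assumed to be the tuple of its entries across all isolates.

```python
from collections import defaultdict


def collapse_by_column_id(matrix, importances):
    """Sum feature importances over columns with identical contents.

    matrix is a list of rows (one per isolate); importances holds one value
    per column, computed on the full (uncompressed) matrix.
    """
    collapsed = defaultdict(float)
    for j, column in enumerate(zip(*matrix)):   # zip(*matrix) iterates over columns
        collapsed[column] += importances[j]
    return dict(collapsed)


# Toy example: columns 0 and 1 are identical, so their importances are summed.
matrix = [[1, 1, 0],
          [0, 0, 1]]
importances = [0.2, 0.3, 0.5]
collapsed = collapse_by_column_id(matrix, importances)
```

In this example the importance that RF split arbitrarily between the two duplicate columns is recombined into a single score for their shared column.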

CHAPTER 5

COMPRESSED MATRIX FORMULATION

A compressed k-mer matrix, as defined in (3.2), is a matrix in which no two columns are identical. Practically speaking, since RF chooses arbitrarily among duplicate columns when selecting features for its trees, once the predictive power of a single column of the k-mer matrix has been used, any subsequent identical column is ignored. Compression solves two problems: since our data set is massively redundant, it reduces both the space required for storing the data and the time complexity of downstream machine learning algorithms. Empirically, many of the columns in k-mer matrices contain only a single nonzero entry at the same index, which means that k-mer matrices can be compressed into a very compact form.

While working with the original k-mer matrix is conceptually advantageous, it does introduce potential errors. Feature importances are calculated on individual k-mers, even though multiple k-mers have identical columns. In the particular case of RF classification, at each step the algorithm selects $\sqrt{n}$ features uniformly at random (where n is the number of columns in (3.1)). This process generates instability in the feature importance calculation, as we have shown in Tables 4.8 and 4.9. Additionally, in Tables 4.10 and 4.11 we have shown that using a compressed matrix mitigates some of the instability in k-mer identification. However, compressed k-mer matrix construction is costly in terms of space complexity. Below we outline a novel matrix construction technique that is capable of generating a compressed matrix without requiring a large-memory machine.
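To make the definition concrete, the sketch below compresses a small, fully materialized matrix by merging identical columns. It illustrates the straightforward approach whose memory cost motivates this chapter, not the construction technique developed in Section 5.1; the function name compress_columns and the toy data are assumptions.

```python
def compress_columns(matrix, kmers):
    """Merge identical columns of a matrix given as a list of rows.

    Returns (kmer_groups, compressed), where kmer_groups[j] lists the k-mers
    that share compressed column j.
    """
    groups = {}                                  # column contents -> k-mers sharing them
    for kmer, column in zip(kmers, zip(*matrix)):
        groups.setdefault(column, []).append(kmer)
    columns = list(groups)                       # distinct columns, first-seen order
    compressed = [list(row) for row in zip(*columns)]
    return list(groups.values()), compressed


# Toy example: ATG and TGC occur in exactly the same isolates.
kmers = ["ATG", "TGC", "AAA", "GTG"]
matrix = [[1, 1, 1, 0],
          [1, 1, 0, 1]]
kmer_groups, compressed = compress_columns(matrix, kmers)
```

Because the whole matrix must be held in memory before any column is dropped, this approach does not scale to large k-mer matrices.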

5.1 Compressed Matrix Construction

The typical process for generating a k-mer matrix is to process contigs through a k-mer counting tool, such as KMC 2 (Deorowicz et al., 2015), and output the data to a tab-separated file with the k-mer string in the first column and the number of times that k-mer occurs in the second column. Each file represents an isolate. One can then generate a sparse representation of the final k-mer matrix by constructing a dictionary of dictionaries, where the outer dictionary keys are the k-mers, the inner dictionary keys are the isolate IDs, and the inner dictionary values are the number of times the outer key occurs in the corresponding isolate. The Python pseudocode is provided in Figure 5.1.

    from collections import defaultdict
    import numpy as np

    outer_dict = defaultdict(dict)              # k-mer -> {isolate file -> count}
    for file in isolate_files:                  # isolate_files: one count file per isolate
        with open(file) as handle:
            for line in handle:                 # each line: "<kmer>\t<count>"
                kmer, count = line.split()
                outer_dict[kmer][file] = int(count)
    M = np.zeros((len(isolate_files), len(outer_dict)))
    for j, counts in enumerate(outer_dict.values()):
        for i, file in enumerate(isolate_files):
            M[i, j] = counts.get(file, 0)       # M[file, kmer] = count

Figure 5.1: Python pseudocode for constructing a k-mer matrix.

One can then construct a compressed matrix by searching for duplicate columns in the k-mer matrix. This is a reasonable process to follow, but it explicitly builds the entire matrix. For even moderately sized k-mer matrices, the difference between the compressed and uncompressed forms is approximately two orders of magnitude for isolates of the same species. This prevents the use of such techniques on all but the largest computer systems with considerable amounts of available RAM. Thus, widespread adoption of such tools and approaches requires a technique for building the compressed data matrices directly from the initial data.

We developed an efficient algorithm to construct such a compressed matrix from the binary form of the data, which we present below. While most matrix construction algorithms proceed in either row-wise or column-wise fashion with the column or row axis fixed, our algorithm expands the representation of columns only as necessary. In our compressed matrix formulation we construct a data object that is equivalent to the outer and inner dictionaries in Figure 5.1; however, it does not require constructing the entire matrix M and only then reducing its size by searching for duplicate columns. Rather, by virtue of its construction, it already represents the data in the final compressed format. In this case, our data object is a single dictionary, where each key is a set of k-mers and each value is an array of zeros and ones in which the i-th entry represents the presence or absence of those k-mers in the i-th isolate. The final matrix can then be generated simply by aligning the values from this dictionary.

The algorithm proceeds as follows: it iteratively builds the compressed matrix from the top left corner, with each new isolate expanding the matrix down and adding columns as needed. In the simplest case a single isolate exists and all k-mers in that isolate are present, so an example isolate with four k-mers (i.e., ATG, TGC, AAA, GGG) is represented in the form

\[
\bordermatrix{
    & \text{[ATG, TGC, AAA, GGG]} \cr
i_1 & 1 \cr
} \tag{5.1}
\]

Next, a second isolate is introduced and three possibilities exist for its k-mers: (a) the second isolate's k-mers are identical to the k-mers of the first isolate (i.e., the k-mers in the first column in (5.1)), (b) the k-mer set of the second isolate is disjoint from the k-mer set of the first isolate, (c) there is partial overlap of the k-mer sets of the two isolates. Each of these possibilities leads to a different update of the compressed matrix, as follows:

(a) When there is a perfect overlap between the k-mers of the first and second isolate (i.e., the second isolate consists of k-mers ATG, TGC, AAA, GGG), the matrix is updated in the form

\[
\bordermatrix{
    & \text{[ATG, TGC, AAA, GGG]} \cr
i_1 & 1 \cr
i_2 & 1 \cr
}
\]

(b) When the k-mer sets are disjoint (for example, if the second isolate consists of k-mers ATC, TGG, AAT and GGT), a new column is added to the matrix in the form

\[
\bordermatrix{
    & \text{[ATG, TGC, AAA, GGG]} & \text{[ATC, TGG, AAT, GGT]} \cr
i_1 & 1 & 0 \cr
i_2 & 0 & 1 \cr
}
\]

(c) Most often only a partial overlap exists between the k-mer sets. If, for example, the second isolate consists of k-mers ATG, TGC, GTG and TTA, the matrix is updated in the form

\[
\bordermatrix{
    & \text{[ATG, TGC]} & \text{[AAA, GGG]} & \text{[GTG, TTA]} \cr
i_1 & 1 & 1 & 0 \cr
i_2 & 1 & 0 & 1 \cr
}
\]

    columns: a dictionary where each key, column_kmers, is a set of k-mers and
             each value, column_vector, is a vector consisting of zeros and ones

    for each new_isolate:
        for column_kmers, column_vector in columns:
            if column_kmers is a subset of new_isolate_kmers:
                append 1 to column_vector
            else:
                split the column in two:
                    append 1 to column_vector for the intersection of column_kmers and new_isolate_kmers
                    append 0 to column_vector for the k-mers of column_kmers not in new_isolate_kmers
        make a column of 0's ending with a single 1 for any k-mers of new_isolate not yet in columns

Figure 5.2: Pseudocode for constructing a compressed k-mer matrix.

The pseudocode for the resulting algorithm is shown in Figure 5.2 and the full Python code can be found in Section A.3.
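The pseudocode of Figure 5.2 can be turned into a short runnable sketch. The version below is one possible reading of it, not the appendix code: the function names add_isolate and build_compressed are hypothetical, the data are treated as binary (presence or absence), and each isolate is assumed to be given as a collection of k-mer strings.

```python
def add_isolate(columns, n_seen, isolate_kmers):
    """Return the compressed columns after adding one isolate.

    columns maps a frozenset of k-mers to a list of 0/1 entries, one entry
    per isolate seen so far (n_seen of them).
    """
    remaining = set(isolate_kmers)            # k-mers not yet matched to a column
    updated = {}
    for kmers, vector in columns.items():
        shared = kmers & remaining            # k-mers present in the new isolate
        absent = kmers - shared               # k-mers missing from the new isolate
        if shared:
            updated[frozenset(shared)] = vector + [1]
        if absent:
            updated[frozenset(absent)] = vector + [0]
        remaining -= shared
    if remaining:                             # k-mers never seen before
        updated[frozenset(remaining)] = [0] * n_seen + [1]
    return updated


def build_compressed(isolates):
    """Build the compressed columns from a list of per-isolate k-mer collections."""
    columns = {}
    for n_seen, isolate_kmers in enumerate(isolates):
        columns = add_isolate(columns, n_seen, isolate_kmers)
    return columns


# Case (c) from the text: partial overlap between two isolates.
columns = build_compressed([
    ["ATG", "TGC", "AAA", "GGG"],
    ["ATG", "TGC", "GTG", "TTA"],
])
for kmers, vector in columns.items():
    print(sorted(kmers), vector)
```

A column is split only when a new isolate contains some but not all of its k-mers, so the number of columns never exceeds the number of distinct presence/absence patterns and the full matrix is never materialized.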

5.2 Experiments

Since many of the columns of the k-mer matrix are identical, we can compress the matrix by eliminating identical columns, which shrinks the feature space. Since identical columns provide identical information, compressing them only loses information about the distribution of such columns but, as we show below, this does not affect the predictive power of each individual column. The plots in Figure 5.3 show summary statistics of the compressed A. baumannii and S. aureus datasets: the various statistics increase up to a distinct peak at k = 13. The peak in these plots also represents the largest size the matrix reaches in its compressed state. By contrast, in the full (i.e., not compressed) matrices shown in Figures 4.1 and 4.3, k = 13 corresponds to an asymptote in each of the plots.

Next, we compare the performance of the five classifiers used above on the compressed nonbinary and binary A. baumannii and S. aureus datasets in terms of classification accuracy (see Figure 5.4). Note that [....finish]. Tables A.5, A.6, A.7 and A.8 contain additional performance statistics. We also plot the ROC curves for the RF classifiers trained on the compressed nonbinary and binary A. baumannii (Figures 5.5 and 5.7, respectively) and S. aureus (Figures 5.6 and 5.8, respectively) datasets. As illustrated by these figures, the ROC curves for the two types of compressed A. baumannii and S. aureus datasets do not significantly differ from the results for the uncompressed data in Figure 4.10.
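Summary statistics of the kind reported for the compressed matrices can be computed directly from a binary matrix. The sketch below is only illustrative: the function name matrix_summary is hypothetical, and percent sparsity is assumed here to mean the percentage of zero entries, which may differ from the definition used for the plots.

```python
def matrix_summary(matrix):
    """Summarise a binary matrix given as a list of equal-length rows."""
    n_rows = len(matrix)
    n_features = len(matrix[0]) if matrix else 0
    n_ones = sum(1 for row in matrix for value in row if value)
    total = n_rows * n_features
    # Assumed definition: share of entries that are zero, in percent.
    percent_sparsity = 100.0 * (total - n_ones) / total if total else 0.0
    return {
        "n_features": n_features,
        "n_ones": n_ones,
        "percent_sparsity": percent_sparsity,
    }


# Toy example: 2 isolates, 3 compressed columns.
summary = matrix_summary([[1, 0, 0],
                          [1, 1, 0]])
```

Evaluating such a function for each k from 10 to 20 would give one point per k-mer size for each statistic.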

[Figure 5.3, panels A. baumannii (compressed matrix) and S. aureus (compressed matrix), each with four subplots (i) to (iv) against k-mer size from 10 to 20 (x axis); the y-axis labels are # features, % sparsity, # ones and matrix size (GB).]

Figure 5.3: Overview of the compressed A. baumannii and S. aureus datasets. Similarly to the full k-mer matrix case (see Figures 4.1 and 4.3), the k-mer size (shown on the x axis) affects different metrics of the k-mer matrix identically.

[Figure 5.4, panels A. baumannii, compressed; S. aureus, compressed; binary A. baumannii, compressed; and binary S. aureus, compressed: average accuracy (y axis) against k-mer size from 10 to 20 (x axis) for Random Forest, Naive Bayes, Lasso, SVM and AdaBoost.]

Figure 5.4: Accuracy of different classifiers on the compressed (binary) A. baumannii and S. aureus datasets. Additional performance metrics can be found in the supplementary Tables A.5, A.6, A.7 and A.8 in Appendix A.1.

[Figure 5.5, panels RF, Naive Bayes, Lasso, SVM and AdaBoost: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis), one curve per k-mer size from 10 to 20.]

Figure 5.5: ROC curves for different classifiers trained on the compressed A. baumannii dataset. Each ROC curve corresponds to a different size k.

[Figure 5.6, panels RF, Naive Bayes, Lasso, SVM and AdaBoost: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis), one curve per k-mer size from 10 to 20.]

Figure 5.6: ROC curves for different classifiers trained on the compressed S. aureus dataset. Each ROC curve corresponds to a different size k.

[Figure panels RF, Naive Bayes, Lasso, SVM and AdaBoost: ROC curves (False Positive Rate on the x axis, True Positive Rate on the y axis), one curve per k-mer size from 10 to 20.]

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 5.7: ROC curves for different classifiers trained on the compressed binary A. bau- mannii dataset. Each ROC curve corresponds to a different size k. DRAFT68 RF Naive Bayes

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate

Lasso SVM

Receiver operating characteristic (ROC) Receiver operating characteristic (ROC) 1.0 1.0

0.8 0.8 kmer size =10 kmer size =10 kmer size =11 kmer size =11 0.6 kmer size =12 0.6 kmer size =12 kmer size =13 kmer size =13 kmer size =14 kmer size =14 0.4 kmer size =15 0.4 kmer size =15 kmer size =16 kmer size =16 kmer size =17 kmer size =17 True Positive Rate Positive True 0.2 Rate Positive True 0.2 kmer size =18 kmer size =18 kmer size =19 kmer size =19 0.0 kmer size =20 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate AdaBoost

Receiver operating characteristic (ROC) 1.0

0.8 kmer size =10 kmer size =11 0.6 kmer size =12 kmer size =13 kmer size =14 0.4 kmer size =15 kmer size =16 kmer size =17 True Positive Rate Positive True 0.2 kmer size =18 kmer size =19 0.0 kmer size =20 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate Figure 5.8: ROC curves for different classifiers trained on the compressed binary S. aureus dataset. Each ROC curve corresponds to a different size k. DRAFT69 APPENDIX A

SUPPLEMENTARY MATERIAL

A.1 AMR: Classification

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.93 ± 0.04  0.93 ± 0.04  0.95 ± 0.03  0.91 ± 0.06  0.95 ± 0.03
11  0.92 ± 0.04  0.92 ± 0.05  0.93 ± 0.02  0.91 ± 0.09  0.94 ± 0.03
12  0.92 ± 0.06  0.92 ± 0.06  0.94 ± 0.05  0.9 ± 0.08   0.94 ± 0.04
13  0.92 ± 0.02  0.92 ± 0.03  0.94 ± 0.02  0.9 ± 0.06   0.96 ± 0.03
14  0.94 ± 0.04  0.95 ± 0.04  0.96 ± 0.03  0.93 ± 0.06  0.95 ± 0.04
15  0.93 ± 0.02  0.93 ± 0.02  0.93 ± 0.02  0.93 ± 0.06  0.95 ± 0.03
16  0.93 ± 0.03  0.93 ± 0.03  0.95 ± 0.03  0.91 ± 0.07  0.95 ± 0.04
17  0.92 ± 0.03  0.92 ± 0.03  0.95 ± 0.03  0.9 ± 0.04   0.95 ± 0.04
18  0.92 ± 0.03  0.92 ± 0.03  0.94 ± 0.02  0.91 ± 0.07  0.95 ± 0.03
19  0.94 ± 0.04  0.95 ± 0.04  0.96 ± 0.03  0.93 ± 0.06  0.96 ± 0.03
20  0.93 ± 0.02  0.93 ± 0.02  0.96 ± 0.03  0.91 ± 0.03  0.96 ± 0.03
Naive Bayes
10  0.51 ± 0.04  0.18 ± 0.11  0.62 ± 0.37  0.11 ± 0.06  0.53 ± 0.04
11  0.52 ± 0.04  0.17 ± 0.1   0.9 ± 0.2    0.1 ± 0.06   0.54 ± 0.04
12  0.66 ± 0.06  0.54 ± 0.13  0.94 ± 0.08  0.39 ± 0.14  0.68 ± 0.06
13  0.77 ± 0.08  0.72 ± 0.15  0.95 ± 0.06  0.61 ± 0.19  0.78 ± 0.08
14  0.85 ± 0.08  0.83 ± 0.11  0.93 ± 0.06  0.78 ± 0.19  0.85 ± 0.07
15  0.87 ± 0.07  0.85 ± 0.1   0.94 ± 0.05  0.81 ± 0.17  0.87 ± 0.07
16  0.87 ± 0.07  0.85 ± 0.1   0.94 ± 0.05  0.81 ± 0.17  0.87 ± 0.07
17  0.87 ± 0.07  0.85 ± 0.1   0.94 ± 0.05  0.81 ± 0.17  0.87 ± 0.07
18  0.87 ± 0.07  0.85 ± 0.1   0.94 ± 0.05  0.81 ± 0.17  0.87 ± 0.07
19  0.86 ± 0.07  0.85 ± 0.1   0.93 ± 0.05  0.81 ± 0.17  0.86 ± 0.07
20  0.86 ± 0.07  0.85 ± 0.1   0.93 ± 0.05  0.81 ± 0.17  0.86 ± 0.07
Lasso
10  0.91 ± 0.04  0.92 ± 0.04  0.92 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
11  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
12  0.91 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.91 ± 0.08  0.95 ± 0.04
13  0.91 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.91 ± 0.08  0.95 ± 0.04
14  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
15  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
16  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
17  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
18  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
19  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
20  0.93 ± 0.04  0.93 ± 0.04  0.94 ± 0.04  0.92 ± 0.08  0.95 ± 0.04
AdaBoost
10  0.93 ± 0.02  0.93 ± 0.02  0.93 ± 0.03  0.93 ± 0.06  0.96 ± 0.02
11  0.91 ± 0.02  0.91 ± 0.03  0.92 ± 0.02  0.9 ± 0.06   0.96 ± 0.03
12  0.93 ± 0.03  0.93 ± 0.03  0.93 ± 0.02  0.94 ± 0.06  0.97 ± 0.02
13  0.93 ± 0.02  0.93 ± 0.02  0.92 ± 0.02  0.94 ± 0.06  0.97 ± 0.03
14  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.02  0.94 ± 0.08  0.96 ± 0.04
15  0.91 ± 0.06  0.91 ± 0.07  0.9 ± 0.04   0.93 ± 0.11  0.96 ± 0.04
16  0.94 ± 0.06  0.93 ± 0.06  0.95 ± 0.01  0.93 ± 0.11  0.96 ± 0.03
17  0.92 ± 0.04  0.92 ± 0.04  0.9 ± 0.03   0.95 ± 0.08  0.96 ± 0.04
18  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.05  0.95 ± 0.06  0.96 ± 0.04
19  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.03  0.93 ± 0.07  0.96 ± 0.03
20  0.93 ± 0.05  0.93 ± 0.05  0.94 ± 0.03  0.93 ± 0.09  0.96 ± 0.03
SVM
10  0.84 ± 0.03  0.83 ± 0.04  0.95 ± 0.03  0.74 ± 0.06  0.94 ± 0.05
11  0.91 ± 0.06  0.92 ± 0.05  0.92 ± 0.09  0.93 ± 0.07  0.95 ± 0.04
12  0.91 ± 0.06  0.92 ± 0.04  0.89 ± 0.08  0.96 ± 0.05  0.96 ± 0.04
13  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.96 ± 0.05  0.94 ± 0.05
14  0.89 ± 0.06  0.9 ± 0.05   0.87 ± 0.07  0.94 ± 0.08  0.93 ± 0.05
15  0.88 ± 0.08  0.89 ± 0.07  0.87 ± 0.09  0.92 ± 0.08  0.93 ± 0.05
16  0.87 ± 0.1   0.88 ± 0.09  0.86 ± 0.09  0.9 ± 0.09   0.93 ± 0.05
17  0.87 ± 0.1   0.88 ± 0.09  0.86 ± 0.09  0.9 ± 0.09   0.93 ± 0.05
18  0.87 ± 0.1   0.88 ± 0.09  0.86 ± 0.09  0.9 ± 0.09   0.93 ± 0.05
19  0.87 ± 0.1   0.88 ± 0.09  0.86 ± 0.09  0.9 ± 0.09   0.93 ± 0.05
20  0.87 ± 0.1   0.88 ± 0.09  0.86 ± 0.09  0.9 ± 0.09   0.93 ± 0.05

Table A.1: Performance of different classifiers on the A. baumannii dataset for different k-mer sizes.
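Each cell in Table A.1 has the form "mean ± spread". The following is a minimal sketch of how such a cell could be produced, assuming the spread is the sample standard deviation over cross-validation folds (this section does not state which estimator was used). The function names `fold_metrics` and `summarize` and the toy folds are hypothetical and are not taken from the dissertation's code.

```python
from statistics import mean, stdev

def fold_metrics(y_true, y_pred):
    """Accuracy, F1, recall and precision for one fold (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "f1": f1, "recall": recall, "precision": precision}

def summarize(folds):
    """Format every metric as 'mean ± std' across folds (assumed: sample std)."""
    per_fold = [fold_metrics(y_true, y_pred) for y_true, y_pred in folds]
    summary = {}
    for name in per_fold[0]:
        values = [m[name] for m in per_fold]
        summary[name] = f"{mean(values):.2f} ± {stdev(values):.2f}"
    return summary

# Hypothetical toy folds of (true labels, predicted labels); not data from this work.
folds = [
    ([1, 1, 0, 0], [1, 1, 0, 1]),
    ([1, 0, 0, 1], [1, 0, 0, 1]),
    ([0, 1, 1, 0], [0, 1, 0, 0]),
]
summary = summarize(folds)
for name, cell in summary.items():
    print(f"{name:10s} {cell}")
```

With only a handful of folds the standard deviation is itself a noisy estimate, so small differences between rows in these tables may not be meaningful.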

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.02  0.99 ± 0.01
11  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
12  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
15  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
16  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.01  0.99 ± 0.01
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.01  0.99 ± 0.01
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
Naive Bayes
10  0.84 ± 0.18  0.87 ± 0.17  0.96 ± 0.01  0.84 ± 0.24  0.85 ± 0.1
11  0.88 ± 0.14  0.91 ± 0.11  0.95 ± 0.02  0.89 ± 0.16  0.86 ± 0.1
12  0.89 ± 0.11  0.92 ± 0.08  0.95 ± 0.02  0.9 ± 0.13   0.86 ± 0.09
13  0.92 ± 0.05  0.95 ± 0.03  0.96 ± 0.01  0.95 ± 0.06  0.88 ± 0.05
14  0.93 ± 0.03  0.96 ± 0.02  0.95 ± 0.01  0.96 ± 0.03  0.88 ± 0.04
15  0.93 ± 0.03  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
16  0.93 ± 0.03  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
17  0.93 ± 0.02  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
18  0.93 ± 0.02  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
19  0.94 ± 0.02  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
20  0.93 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.88 ± 0.03
Lasso
10  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
11  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
12  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.0
14  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  0.99 ± 0.0
15  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  1.0 ± 0.0
16  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  1.0 ± 0.0
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.02  1.0 ± 0.0
18  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.02  1.0 ± 0.0
19  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  1.0 ± 0.0
20  0.98 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  1.0 ± 0.0
AdaBoost
10  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
11  0.98 ± 0.03  0.99 ± 0.02  1.0 ± 0.01   0.98 ± 0.04  0.99 ± 0.01
12  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.02  0.99 ± 0.01
13  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.98 ± 0.01
14  0.99 ± 0.02  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.02  0.99 ± 0.01
15  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02
16  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
SVM
10  0.98 ± 0.03  0.98 ± 0.02  0.99 ± 0.01  0.98 ± 0.04  0.99 ± 0.01
11  0.97 ± 0.02  0.98 ± 0.01  0.98 ± 0.02  0.98 ± 0.03  0.99 ± 0.01
12  0.92 ± 0.03  0.95 ± 0.02  0.92 ± 0.05  0.99 ± 0.02  0.99 ± 0.01
13  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
14  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
15  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
16  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
17  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
18  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
19  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
20  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02

Table A.2: Performance of different classifiers on the S. aureus dataset for different k-mer sizes.
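In Table A.2 the SVM rows for k = 13 to 20 report an accuracy of 0.83 and a recall of 0.84 alongside an AUC of 0.99. The two can coexist because AUC depends only on how the classifier ranks positives relative to negatives, whereas accuracy and recall depend on where the decision threshold falls. The sketch below illustrates this with hypothetical scores; the helper names `auc` and `accuracy_at`, the 0.5 threshold, and the toy data are assumptions for illustration and do not describe how the dissertation's SVM was configured.

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_at(y_true, scores, threshold):
    """Accuracy after turning scores into labels with a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)

# Hypothetical scores: every positive outranks the negative,
# but three positives fall below the 0.5 threshold.
y_true = [1, 1, 1, 1, 1, 0]
scores = [0.9, 0.8, 0.45, 0.4, 0.35, 0.1]

auc_value = auc(y_true, scores)                # perfect ranking
acc_value = accuracy_at(y_true, scores, 0.5)   # threshold misses three positives
print(auc_value, acc_value)
```

In this toy case the ranking is perfect while half of the thresholded predictions are wrong, which is the same qualitative pattern as a high AUC paired with lower accuracy and recall.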

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.93 ± 0.03  0.93 ± 0.03  0.95 ± 0.03  0.92 ± 0.06  0.95 ± 0.03
11  0.93 ± 0.03  0.93 ± 0.03  0.94 ± 0.03  0.92 ± 0.07  0.96 ± 0.03
12  0.94 ± 0.03  0.94 ± 0.03  0.96 ± 0.03  0.93 ± 0.06  0.95 ± 0.03
13  0.94 ± 0.03  0.94 ± 0.03  0.94 ± 0.04  0.93 ± 0.06  0.96 ± 0.03
14  0.93 ± 0.04  0.93 ± 0.04  0.93 ± 0.04  0.93 ± 0.03  0.95 ± 0.04
15  0.92 ± 0.05  0.93 ± 0.05  0.93 ± 0.05  0.93 ± 0.05  0.95 ± 0.03
16  0.91 ± 0.04  0.92 ± 0.04  0.91 ± 0.05  0.93 ± 0.05  0.95 ± 0.04
17  0.93 ± 0.04  0.94 ± 0.03  0.92 ± 0.05  0.95 ± 0.05  0.96 ± 0.03
18  0.94 ± 0.03  0.95 ± 0.03  0.96 ± 0.03  0.93 ± 0.04  0.96 ± 0.03
19  0.91 ± 0.04  0.92 ± 0.03  0.92 ± 0.08  0.92 ± 0.06  0.96 ± 0.02
20  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.05  0.96 ± 0.02
Naive Bayes
10  0.67 ± 0.08  0.54 ± 0.15  0.94 ± 0.06  0.39 ± 0.16  0.68 ± 0.08
11  0.56 ± 0.04  0.28 ± 0.13  0.94 ± 0.07  0.17 ± 0.1   0.58 ± 0.04
12  0.67 ± 0.06  0.55 ± 0.13  0.96 ± 0.05  0.4 ± 0.15   0.69 ± 0.06
13  0.74 ± 0.07  0.67 ± 0.13  0.96 ± 0.06  0.54 ± 0.15  0.75 ± 0.06
14  0.83 ± 0.09  0.81 ± 0.12  0.92 ± 0.05  0.76 ± 0.19  0.84 ± 0.08
15  0.83 ± 0.09  0.81 ± 0.12  0.91 ± 0.03  0.76 ± 0.19  0.84 ± 0.08
16  0.84 ± 0.08  0.83 ± 0.11  0.91 ± 0.03  0.79 ± 0.18  0.85 ± 0.08
17  0.84 ± 0.08  0.83 ± 0.11  0.91 ± 0.03  0.79 ± 0.18  0.85 ± 0.08
18  0.85 ± 0.08  0.84 ± 0.11  0.91 ± 0.03  0.8 ± 0.18   0.85 ± 0.08
19  0.86 ± 0.07  0.85 ± 0.09  0.91 ± 0.03  0.82 ± 0.16  0.86 ± 0.07
20  0.87 ± 0.06  0.86 ± 0.08  0.91 ± 0.03  0.83 ± 0.14  0.87 ± 0.06
Lasso
10  0.91 ± 0.04  0.92 ± 0.04  0.92 ± 0.05  0.92 ± 0.07  0.97 ± 0.03
11  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.05  0.92 ± 0.07  0.97 ± 0.02
12  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.98 ± 0.02
13  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.02
14  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.02
15  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.03
16  0.92 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.93 ± 0.06  0.97 ± 0.03
17  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.03
18  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.03
19  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.03
20  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.02
AdaBoost
10  0.94 ± 0.06  0.94 ± 0.05  0.93 ± 0.05  0.95 ± 0.06  0.97 ± 0.03
11  0.91 ± 0.02  0.91 ± 0.03  0.91 ± 0.04  0.91 ± 0.08  0.95 ± 0.03
12  0.93 ± 0.03  0.93 ± 0.03  0.94 ± 0.04  0.92 ± 0.07  0.96 ± 0.02
13  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
14  0.94 ± 0.03  0.95 ± 0.03  0.95 ± 0.03  0.94 ± 0.06  0.97 ± 0.02
15  0.94 ± 0.03  0.95 ± 0.03  0.96 ± 0.02  0.93 ± 0.06  0.97 ± 0.02
16  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.04  0.93 ± 0.08  0.96 ± 0.04
17  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.04  0.93 ± 0.06  0.97 ± 0.03
18  0.91 ± 0.03  0.92 ± 0.03  0.93 ± 0.04  0.9 ± 0.07   0.97 ± 0.03
19  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.02  0.93 ± 0.07  0.96 ± 0.04
20  0.94 ± 0.04  0.94 ± 0.04  0.95 ± 0.03  0.94 ± 0.08  0.97 ± 0.02
SVM
10  0.86 ± 0.03  0.87 ± 0.03  0.87 ± 0.07  0.88 ± 0.08  0.92 ± 0.05
11  0.86 ± 0.03  0.87 ± 0.03  0.87 ± 0.07  0.88 ± 0.08  0.93 ± 0.05
12  0.86 ± 0.03  0.87 ± 0.03  0.87 ± 0.07  0.88 ± 0.08  0.93 ± 0.05
13  0.88 ± 0.04  0.89 ± 0.03  0.87 ± 0.07  0.92 ± 0.06  0.93 ± 0.05
14  0.9 ± 0.06   0.92 ± 0.04  0.88 ± 0.08  0.97 ± 0.05  0.92 ± 0.05
15  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.92 ± 0.05
16  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.92 ± 0.05
17  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.92 ± 0.05
18  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.92 ± 0.05
19  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.93 ± 0.05
20  0.9 ± 0.06   0.91 ± 0.04  0.87 ± 0.08  0.97 ± 0.05  0.93 ± 0.05

Table A.3: Performance of different classifiers on the binary A. baumannii dataset for different k-mer sizes.

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.03  0.99 ± 0.01
11  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
12  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
15  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
16  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
Naive Bayes
10  0.89 ± 0.12  0.93 ± 0.09  0.95 ± 0.02  0.91 ± 0.13  0.86 ± 0.09
11  0.9 ± 0.09   0.94 ± 0.06  0.95 ± 0.02  0.93 ± 0.1   0.87 ± 0.07
12  0.93 ± 0.04  0.96 ± 0.03  0.96 ± 0.01  0.96 ± 0.04  0.88 ± 0.04
13  0.94 ± 0.02  0.96 ± 0.02  0.96 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
14  0.94 ± 0.02  0.96 ± 0.02  0.95 ± 0.01  0.97 ± 0.03  0.89 ± 0.03
15  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
16  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
17  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
18  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
19  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
20  0.94 ± 0.02  0.96 ± 0.01  0.95 ± 0.01  0.97 ± 0.02  0.89 ± 0.03
Lasso
10  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
11  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
12  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
13  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
14  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
15  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
16  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
17  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
18  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
19  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
20  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
AdaBoost
10  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
11  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
12  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.98 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.02
15  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.0   0.99 ± 0.01  0.99 ± 0.01
16  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.02
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.02
SVM
10  0.83 ± 0.02  0.9 ± 0.01   0.83 ± 0.01  0.99 ± 0.02  0.98 ± 0.02
11  0.83 ± 0.02  0.9 ± 0.01   0.83 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
12  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
13  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
14  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
15  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
16  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
17  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
18  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
19  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
20  0.83 ± 0.02  0.9 ± 0.01   0.84 ± 0.01  0.99 ± 0.02  0.99 ± 0.02

Table A.4: Performance of different classifiers on the binary S. aureus dataset for different k-mer sizes.

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.91 ± 0.03  0.91 ± 0.03  0.92 ± 0.06  0.91 ± 0.07  0.95 ± 0.03
11  0.94 ± 0.05  0.94 ± 0.06  0.96 ± 0.03  0.93 ± 0.09  0.95 ± 0.04
12  0.94 ± 0.05  0.94 ± 0.06  0.95 ± 0.03  0.93 ± 0.09  0.95 ± 0.04
13  0.93 ± 0.05  0.93 ± 0.05  0.94 ± 0.05  0.93 ± 0.07  0.95 ± 0.03
14  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.03  0.93 ± 0.06  0.96 ± 0.03
15  0.92 ± 0.04  0.92 ± 0.04  0.94 ± 0.03  0.9 ± 0.07   0.95 ± 0.03
16  0.92 ± 0.03  0.92 ± 0.03  0.95 ± 0.03  0.89 ± 0.07  0.96 ± 0.02
17  0.93 ± 0.03  0.93 ± 0.03  0.95 ± 0.03  0.92 ± 0.06  0.95 ± 0.03
18  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.03  0.93 ± 0.06  0.96 ± 0.03
19  0.94 ± 0.03  0.94 ± 0.04  0.95 ± 0.02  0.93 ± 0.06  0.95 ± 0.03
20  0.94 ± 0.03  0.94 ± 0.03  0.96 ± 0.03  0.93 ± 0.06  0.96 ± 0.04
Naive Bayes
10  0.51 ± 0.04  0.18 ± 0.11  0.62 ± 0.37  0.11 ± 0.06  0.53 ± 0.04
11  0.5 ± 0.04   0.11 ± 0.11  0.5 ± 0.45   0.07 ± 0.06  0.52 ± 0.03
12  0.52 ± 0.05  0.17 ± 0.14  0.75 ± 0.39  0.1 ± 0.09   0.54 ± 0.04
13  0.58 ± 0.09  0.31 ± 0.21  0.97 ± 0.07  0.21 ± 0.18  0.6 ± 0.09
14  0.63 ± 0.08  0.45 ± 0.16  0.96 ± 0.05  0.31 ± 0.16  0.65 ± 0.07
15  0.71 ± 0.12  0.6 ± 0.21   0.96 ± 0.04  0.48 ± 0.24  0.73 ± 0.11
16  0.76 ± 0.14  0.67 ± 0.21  0.94 ± 0.06  0.58 ± 0.28  0.77 ± 0.13
17  0.78 ± 0.13  0.71 ± 0.2   0.93 ± 0.07  0.63 ± 0.27  0.78 ± 0.12
18  0.79 ± 0.11  0.75 ± 0.16  0.93 ± 0.07  0.67 ± 0.23  0.8 ± 0.1
19  0.79 ± 0.11  0.75 ± 0.16  0.93 ± 0.07  0.67 ± 0.23  0.8 ± 0.1
20  0.8 ± 0.11   0.76 ± 0.16  0.93 ± 0.07  0.68 ± 0.23  0.8 ± 0.1
Lasso
10  0.91 ± 0.04  0.92 ± 0.04  0.92 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
11  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
12  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
13  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
14  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
15  0.92 ± 0.04  0.92 ± 0.04  0.93 ± 0.06  0.92 ± 0.08  0.95 ± 0.04
16  0.9 ± 0.03   0.91 ± 0.03  0.9 ± 0.07   0.92 ± 0.08  0.95 ± 0.04
17  0.9 ± 0.03   0.91 ± 0.03  0.9 ± 0.07   0.92 ± 0.08  0.95 ± 0.04
18  0.9 ± 0.03   0.91 ± 0.03  0.9 ± 0.07   0.92 ± 0.08  0.95 ± 0.04
19  0.9 ± 0.03   0.91 ± 0.03  0.9 ± 0.07   0.92 ± 0.08  0.95 ± 0.04
20  0.9 ± 0.03   0.91 ± 0.03  0.9 ± 0.07   0.92 ± 0.08  0.95 ± 0.04
AdaBoost
10  0.93 ± 0.02  0.93 ± 0.02  0.93 ± 0.03  0.93 ± 0.06  0.96 ± 0.02
11  0.91 ± 0.02  0.91 ± 0.03  0.92 ± 0.02  0.9 ± 0.06   0.96 ± 0.03
12  0.93 ± 0.03  0.93 ± 0.04  0.94 ± 0.01  0.93 ± 0.08  0.97 ± 0.02
13  0.93 ± 0.02  0.93 ± 0.02  0.92 ± 0.02  0.94 ± 0.06  0.97 ± 0.03
14  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.02  0.94 ± 0.08  0.96 ± 0.04
15  0.91 ± 0.06  0.91 ± 0.07  0.9 ± 0.04   0.93 ± 0.11  0.96 ± 0.04
16  0.93 ± 0.06  0.93 ± 0.06  0.95 ± 0.02  0.92 ± 0.11  0.96 ± 0.03
17  0.92 ± 0.04  0.92 ± 0.04  0.9 ± 0.03   0.95 ± 0.08  0.96 ± 0.04
18  0.94 ± 0.04  0.94 ± 0.04  0.93 ± 0.05  0.95 ± 0.06  0.96 ± 0.04
19  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.03  0.93 ± 0.07  0.96 ± 0.03
20  0.93 ± 0.05  0.93 ± 0.05  0.94 ± 0.03  0.93 ± 0.09  0.96 ± 0.03
SVM
10  0.84 ± 0.03  0.83 ± 0.04  0.95 ± 0.03  0.74 ± 0.06  0.94 ± 0.05
11  0.91 ± 0.06  0.92 ± 0.05  0.92 ± 0.09  0.93 ± 0.07  0.95 ± 0.04
12  0.92 ± 0.06  0.93 ± 0.05  0.9 ± 0.09   0.96 ± 0.05  0.95 ± 0.04
13  0.9 ± 0.06   0.92 ± 0.04  0.88 ± 0.08  0.96 ± 0.05  0.96 ± 0.04
14  0.9 ± 0.06   0.92 ± 0.04  0.88 ± 0.08  0.96 ± 0.05  0.95 ± 0.04
15  0.9 ± 0.06   0.92 ± 0.04  0.88 ± 0.08  0.96 ± 0.05  0.96 ± 0.04
16  0.9 ± 0.06   0.91 ± 0.05  0.88 ± 0.08  0.95 ± 0.07  0.96 ± 0.04
17  0.9 ± 0.06   0.91 ± 0.05  0.88 ± 0.08  0.95 ± 0.07  0.96 ± 0.04
18  0.9 ± 0.06   0.91 ± 0.05  0.88 ± 0.08  0.95 ± 0.07  0.96 ± 0.03
19  0.9 ± 0.06   0.91 ± 0.05  0.88 ± 0.08  0.95 ± 0.07  0.96 ± 0.03
20  0.9 ± 0.06   0.91 ± 0.05  0.88 ± 0.08  0.95 ± 0.07  0.96 ± 0.03

Table A.5: Performance of different classifiers on the compressed A. baumannii dataset for different k-mer sizes.

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.98 ± 0.03  0.99 ± 0.02  1.0 ± 0.01   0.98 ± 0.04  0.99 ± 0.01
11  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.01   0.98 ± 0.03  0.99 ± 0.01
12  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.03  0.99 ± 0.01
13  0.99 ± 0.02  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.02  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
15  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
16  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.98 ± 0.01  0.99 ± 0.01
19  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.02  0.99 ± 0.01
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
Naive Bayes
10  0.81 ± 0.23  0.83 ± 0.24  0.96 ± 0.02  0.81 ± 0.3   0.83 ± 0.13
11  0.84 ± 0.2   0.87 ± 0.19  0.96 ± 0.01  0.83 ± 0.25  0.85 ± 0.11
12  0.84 ± 0.19  0.87 ± 0.18  0.96 ± 0.02  0.84 ± 0.24  0.85 ± 0.1
13  0.87 ± 0.14  0.9 ± 0.12   0.96 ± 0.02  0.87 ± 0.19  0.86 ± 0.08
14  0.89 ± 0.12  0.92 ± 0.09  0.96 ± 0.01  0.9 ± 0.15   0.87 ± 0.07
15  0.89 ± 0.12  0.92 ± 0.09  0.95 ± 0.01  0.91 ± 0.14  0.86 ± 0.08
16  0.89 ± 0.1   0.93 ± 0.07  0.95 ± 0.02  0.91 ± 0.12  0.85 ± 0.07
17  0.89 ± 0.1   0.93 ± 0.08  0.95 ± 0.02  0.92 ± 0.12  0.85 ± 0.07
18  0.89 ± 0.1   0.93 ± 0.07  0.94 ± 0.01  0.92 ± 0.12  0.84 ± 0.07
19  0.89 ± 0.1   0.93 ± 0.07  0.94 ± 0.01  0.92 ± 0.12  0.84 ± 0.06
20  0.89 ± 0.1   0.93 ± 0.07  0.94 ± 0.01  0.93 ± 0.12  0.84 ± 0.07
Lasso
10  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
11  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
12  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
15  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.98 ± 0.02  0.99 ± 0.01
16  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.0   0.98 ± 0.02  0.99 ± 0.01
17  0.98 ± 0.02  0.98 ± 0.01  0.99 ± 0.0   0.98 ± 0.03  0.99 ± 0.01
18  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.03  0.99 ± 0.01
19  0.98 ± 0.02  0.99 ± 0.01  1.0 ± 0.0    0.98 ± 0.03  0.99 ± 0.01
20  0.98 ± 0.02  0.98 ± 0.01  1.0 ± 0.0    0.97 ± 0.03  0.99 ± 0.01
AdaBoost
10  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
11  0.98 ± 0.02  0.99 ± 0.02  0.99 ± 0.01  0.98 ± 0.03  0.99 ± 0.01
12  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.98 ± 0.02  0.99 ± 0.01
13  0.98 ± 0.02  0.98 ± 0.01  0.99 ± 0.01  0.98 ± 0.02  0.99 ± 0.01
14  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01
15  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
16  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.02
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
18  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
SVM
10  0.98 ± 0.03  0.98 ± 0.02  0.99 ± 0.01  0.98 ± 0.04  0.99 ± 0.01
11  0.97 ± 0.03  0.98 ± 0.02  0.99 ± 0.02  0.98 ± 0.04  0.99 ± 0.01
12  0.97 ± 0.02  0.98 ± 0.01  0.99 ± 0.01  0.98 ± 0.03  0.99 ± 0.01
13  0.97 ± 0.02  0.98 ± 0.01  0.99 ± 0.01  0.98 ± 0.03  0.99 ± 0.01
14  0.97 ± 0.02  0.98 ± 0.02  0.99 ± 0.01  0.98 ± 0.04  1.0 ± 0.0
15  0.97 ± 0.03  0.98 ± 0.02  0.99 ± 0.01  0.97 ± 0.04  1.0 ± 0.0
16  0.95 ± 0.07  0.96 ± 0.05  0.99 ± 0.01  0.95 ± 0.1   0.99 ± 0.0
17  0.94 ± 0.07  0.96 ± 0.05  0.99 ± 0.01  0.94 ± 0.1   0.99 ± 0.0
18  0.95 ± 0.08  0.96 ± 0.05  0.99 ± 0.01  0.94 ± 0.1   0.99 ± 0.01
19  0.94 ± 0.09  0.96 ± 0.06  0.99 ± 0.01  0.94 ± 0.11  0.99 ± 0.0
20  0.93 ± 0.1   0.95 ± 0.07  0.99 ± 0.01  0.93 ± 0.13  0.99 ± 0.01

Table A.6: Performance of different classifiers on the compressed S. aureus dataset for different k-mer sizes.

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.89 ± 0.05  0.9 ± 0.05   0.88 ± 0.08  0.92 ± 0.05  0.94 ± 0.04
11  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.03  0.92 ± 0.07  0.95 ± 0.04
12  0.94 ± 0.02  0.94 ± 0.02  0.96 ± 0.03  0.92 ± 0.04  0.95 ± 0.04
13  0.94 ± 0.02  0.94 ± 0.02  0.96 ± 0.03  0.92 ± 0.04  0.95 ± 0.03
14  0.94 ± 0.03  0.94 ± 0.03  0.96 ± 0.03  0.93 ± 0.06  0.96 ± 0.03
15  0.93 ± 0.03  0.93 ± 0.03  0.96 ± 0.03  0.91 ± 0.05  0.96 ± 0.03
16  0.92 ± 0.03  0.92 ± 0.03  0.95 ± 0.03  0.9 ± 0.06   0.95 ± 0.04
17  0.93 ± 0.04  0.93 ± 0.04  0.94 ± 0.04  0.93 ± 0.05  0.96 ± 0.03
18  0.92 ± 0.04  0.92 ± 0.04  0.95 ± 0.03  0.9 ± 0.08   0.95 ± 0.04
19  0.94 ± 0.03  0.95 ± 0.03  0.95 ± 0.03  0.94 ± 0.04  0.95 ± 0.03
20  0.95 ± 0.04  0.95 ± 0.04  0.96 ± 0.03  0.94 ± 0.06  0.96 ± 0.03
Naive Bayes
10  0.6 ± 0.05   0.4 ± 0.11   0.91 ± 0.09  0.26 ± 0.09  0.62 ± 0.04
11  0.51 ± 0.03  0.13 ± 0.13  0.56 ± 0.46  0.07 ± 0.07  0.53 ± 0.03
12  0.5 ± 0.02   0.11 ± 0.1   0.56 ± 0.46  0.06 ± 0.06  0.53 ± 0.02
13  0.56 ± 0.09  0.25 ± 0.25  0.58 ± 0.47  0.17 ± 0.18  0.58 ± 0.09
14  0.62 ± 0.09  0.4 ± 0.21   0.98 ± 0.04  0.28 ± 0.19  0.63 ± 0.09
15  0.67 ± 0.09  0.53 ± 0.17  0.98 ± 0.03  0.38 ± 0.17  0.69 ± 0.08
16  0.75 ± 0.12  0.68 ± 0.19  0.96 ± 0.04  0.57 ± 0.25  0.77 ± 0.11
17  0.8 ± 0.11   0.74 ± 0.19  0.95 ± 0.04  0.66 ± 0.25  0.81 ± 0.11
18  0.81 ± 0.1   0.77 ± 0.16  0.92 ± 0.05  0.7 ± 0.22   0.81 ± 0.1
19  0.81 ± 0.1   0.77 ± 0.16  0.92 ± 0.05  0.7 ± 0.22   0.81 ± 0.1
20  0.81 ± 0.1   0.77 ± 0.16  0.92 ± 0.05  0.7 ± 0.22   0.81 ± 0.1
Lasso
10  0.91 ± 0.04  0.92 ± 0.04  0.92 ± 0.05  0.92 ± 0.07  0.97 ± 0.03
11  0.93 ± 0.04  0.93 ± 0.03  0.93 ± 0.05  0.94 ± 0.06  0.97 ± 0.02
12  0.93 ± 0.03  0.93 ± 0.03  0.94 ± 0.04  0.94 ± 0.06  0.97 ± 0.02
13  0.94 ± 0.02  0.94 ± 0.02  0.94 ± 0.04  0.94 ± 0.06  0.97 ± 0.02
14  0.93 ± 0.03  0.93 ± 0.03  0.95 ± 0.03  0.92 ± 0.07  0.97 ± 0.03
15  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
16  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
17  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
18  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
19  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.97 ± 0.03
20  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.03  0.93 ± 0.06  0.96 ± 0.03
AdaBoost
10  0.94 ± 0.06  0.94 ± 0.05  0.93 ± 0.05  0.95 ± 0.06  0.97 ± 0.03
11  0.91 ± 0.02  0.91 ± 0.03  0.91 ± 0.04  0.91 ± 0.08  0.95 ± 0.03
12  0.93 ± 0.03  0.93 ± 0.03  0.94 ± 0.04  0.92 ± 0.07  0.96 ± 0.02
13  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.03  0.94 ± 0.06  0.97 ± 0.02
14  0.94 ± 0.03  0.94 ± 0.03  0.94 ± 0.03  0.94 ± 0.06  0.97 ± 0.02
15  0.94 ± 0.03  0.94 ± 0.03  0.95 ± 0.04  0.93 ± 0.06  0.97 ± 0.02
16  0.94 ± 0.04  0.94 ± 0.04  0.94 ± 0.04  0.93 ± 0.08  0.96 ± 0.04
17  0.94 ± 0.03  0.95 ± 0.03  0.96 ± 0.02  0.93 ± 0.06  0.97 ± 0.02
18  0.92 ± 0.03  0.92 ± 0.03  0.94 ± 0.03  0.91 ± 0.07  0.97 ± 0.02
19  0.94 ± 0.04  0.94 ± 0.04  0.96 ± 0.02  0.93 ± 0.07  0.97 ± 0.03
20  0.94 ± 0.04  0.94 ± 0.04  0.95 ± 0.03  0.94 ± 0.08  0.97 ± 0.02
SVM
10  0.87 ± 0.04  0.88 ± 0.03  0.88 ± 0.08  0.89 ± 0.06  0.93 ± 0.05
11  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.05
12  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.05
13  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
14  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
15  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
16  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
17  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
18  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
19  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04
20  0.9 ± 0.05   0.91 ± 0.04  0.87 ± 0.07  0.95 ± 0.05  0.93 ± 0.04

Table A.7: Performance of different classifiers on the compressed binary A. baumannii dataset for different k-mer sizes.

k   Accuracy     F1           Recall       Precision    AUC
RF
10  0.98 ± 0.03  0.98 ± 0.02  1.0 ± 0.01   0.97 ± 0.03  0.99 ± 0.01
11  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01
12  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
13  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
15  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.02  0.99 ± 0.01
16  0.98 ± 0.02  0.98 ± 0.01  0.99 ± 0.02  0.98 ± 0.02  0.99 ± 0.01
17  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.98 ± 0.02  0.99 ± 0.01
18  0.98 ± 0.02  0.98 ± 0.01  0.98 ± 0.03  0.99 ± 0.01  0.99 ± 0.01
19  0.97 ± 0.02  0.98 ± 0.01  0.98 ± 0.03  0.98 ± 0.02  0.98 ± 0.02
20  0.98 ± 0.02  0.99 ± 0.01  0.98 ± 0.03  0.99 ± 0.01  0.99 ± 0.01
Naive Bayes
10  0.87 ± 0.14  0.91 ± 0.12  0.96 ± 0.02  0.88 ± 0.18  0.86 ± 0.08
11  0.88 ± 0.14  0.91 ± 0.11  0.96 ± 0.02  0.88 ± 0.18  0.86 ± 0.07
12  0.88 ± 0.13  0.91 ± 0.11  0.96 ± 0.02  0.89 ± 0.17  0.86 ± 0.06
13  0.89 ± 0.11  0.92 ± 0.09  0.95 ± 0.02  0.9 ± 0.15   0.86 ± 0.06
14  0.89 ± 0.11  0.92 ± 0.08  0.95 ± 0.01  0.91 ± 0.14  0.85 ± 0.07
15  0.89 ± 0.11  0.92 ± 0.08  0.94 ± 0.02  0.92 ± 0.13  0.84 ± 0.07
16  0.89 ± 0.11  0.93 ± 0.08  0.94 ± 0.02  0.92 ± 0.13  0.84 ± 0.07
17  0.89 ± 0.1   0.93 ± 0.07  0.94 ± 0.01  0.92 ± 0.12  0.84 ± 0.07
18  0.89 ± 0.11  0.93 ± 0.08  0.94 ± 0.02  0.92 ± 0.13  0.84 ± 0.07
19  0.89 ± 0.11  0.93 ± 0.08  0.94 ± 0.01  0.92 ± 0.13  0.84 ± 0.07
20  0.89 ± 0.11  0.93 ± 0.08  0.94 ± 0.01  0.92 ± 0.13  0.84 ± 0.07
Lasso
10  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
11  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
12  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
13  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  1.0 ± 0.0
14  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.0
15  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.0
16  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01
17  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01
18  0.98 ± 0.01  0.98 ± 0.01  0.99 ± 0.02  0.98 ± 0.02  0.99 ± 0.01
19  0.98 ± 0.02  0.98 ± 0.01  0.99 ± 0.02  0.98 ± 0.02  0.99 ± 0.01
20  0.97 ± 0.02  0.98 ± 0.01  0.99 ± 0.02  0.98 ± 0.02  0.99 ± 0.01
AdaBoost
10  0.99 ± 0.02  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
11  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.0   0.99 ± 0.01  0.99 ± 0.01
12  0.98 ± 0.02  0.99 ± 0.01  0.99 ± 0.0   0.98 ± 0.02  0.98 ± 0.02
13  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.01
14  0.99 ± 0.01  0.99 ± 0.01  0.99 ± 0.0   0.99 ± 0.01  0.99 ± 0.02
15  0.98 ± 0.01  0.99 ± 0.01  0.99 ± 0.0   0.99 ± 0.01  0.99 ± 0.01
16  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.01   0.99 ± 0.01  0.99 ± 0.01
17  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
18  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.01
19  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.02
20  0.99 ± 0.01  0.99 ± 0.01  1.0 ± 0.0    0.99 ± 0.01  0.99 ± 0.02
SVM
10  0.94 ± 0.04  0.97 ± 0.02  0.95 ± 0.05  0.98 ± 0.03  0.99 ± 0.01
11  0.96 ± 0.02  0.97 ± 0.01  0.96 ± 0.03  0.98 ± 0.03  1.0 ± 0.0
12  0.96 ± 0.02  0.97 ± 0.01  0.97 ± 0.03  0.98 ± 0.03  0.99 ± 0.0
13  0.95 ± 0.02  0.97 ± 0.02  0.97 ± 0.03  0.98 ± 0.04  0.99 ± 0.0
14  0.95 ± 0.02  0.97 ± 0.02  0.96 ± 0.03  0.98 ± 0.04  0.99 ± 0.01
15  0.94 ± 0.03  0.97 ± 0.02  0.96 ± 0.03  0.98 ± 0.04  0.99 ± 0.01
16  0.94 ± 0.03  0.96 ± 0.02  0.95 ± 0.03  0.97 ± 0.05  0.99 ± 0.01
17  0.94 ± 0.03  0.96 ± 0.02  0.95 ± 0.03  0.97 ± 0.05  0.99 ± 0.01
18  0.93 ± 0.04  0.96 ± 0.02  0.95 ± 0.03  0.97 ± 0.06  0.99 ± 0.01
19  0.93 ± 0.04  0.95 ± 0.03  0.95 ± 0.04  0.97 ± 0.06  0.99 ± 0.01
20  0.92 ± 0.05  0.95 ± 0.03  0.95 ± 0.04  0.96 ± 0.08  0.99 ± 0.01

Table A.8: Performance of different classifiers on the compressed binary S. aureus dataset for different k-mer sizes.

A.2 AMR: Gene Stability

k-mer FI Balance Identical Sum of FIs

GGGCCATGCCCAGGA 0.0078 0.339 13 0.0146

GCCCCAGCGCCAACA 0.006 0.9874 13 0.0168

AGCGCCAACAGTCGG 0.0058 0.9874 13 0.0168

CATGCCCAGGATGTA 0.0047 0.339 13 0.0146

CACCTCCGCACCGGC 0.0043 0.0688 103 0.007

AAATCGCGTAGTCTG 0.0043 0.0684 1199 0.02

GACGGATCTCAGCGC 0.0042 0.0688 1364 0.03

CATGACGCGGCCAGC 0.0041 0.6478 401 0.008

ACGCGCGCCACTCTC 0.0041 0.6478 401 0.008

CGGCAGTTCGCCGAC 0.004 0.6489 1 0.004

CACCGCCGTGGGGCA 0.004 0.0737 2 0.004

AGATTCTGAATCTGG 0.0038 0.0684 1199 0.02

AGCTCGTCGTCTTAC 0.0038 0.6499 11 0.0038

GGATCGCCTCGGTGA 0.0038 0.4903 2 0.0038

TCCCGAGGTTGGACA 0.0038 0.0688 1364 0.03

CGATCTGCTCGGAGC 0.0037 0.645 9 0.0037

GCCCCGGGTAGTCGA 0.0037 0.6485 10 0.0037

GACGCCGCACAACGC 0.0036 0.0684 24 0.0036

TCGCGCTGGATGGCA 0.0036 0.0688 1364 0.03

CAGTGCACCCAGGCC 0.0036 0.6496 7 0.0036

ATGCCACACCTCGGC 0.0036 0.0733 238 0.013

CGCCGCGTAGGCCGC 0.0035 0.4896 1 0.0035

ACCGATCGCTTGCAC 0.0035 0.0688 1364 0.03

CCGCCGCCGAAGAAC 0.0035 0.6482 1 0.0035

CCGCTCCATCACCCA 0.0035 0.0688 1364 0.03

GCGATCTTGGACCCC 0.0035 0.6488 13 0.0035

AGTAGCGGCCGCGAG 0.0035 0.0684 1199 0.02

AGGGTCGACTCCGGC 0.0035 0.0591 14 0.0035

ACGACACCACCTCGA 0.0035 0.0733 238 0.013

CGGCCAAGCGCTGGA 0.0035 0.6485 10 0.0035

CAATATTGGCCGGGG 0.0035 0.0684 1199 0.02

GCCGGCGAGATAGAC 0.0035 0.0684 1199 0.02

ATCGGCTGCCGCCCG 0.0034 0.0723 19 0.0034

AGGGTCGTCGTGATC 0.0034 0.0733 238 0.013

AAGGAGAATGGTCCG 0.0034 0.0727 1932 0.00

ACCCAGGCTTTGAAA 0.0034 0.0684 1199 0.02

CACATCCACTCATAC 0.0034 0.0691 1388 0.00

CACTTTTTATCCAGG 0.0034 0.0688 1364 0.03

CCTGGTCGGCGAGCC 0.0033 0.6482 12 0.0033

ACCCGGTGGTGCCCT 0.0033 0.0688 103 0.007

CACACCGTCGGGCTG 0.0033 0.6473 1 0.0033

CACGCCGTTGGGCAC 0.0033 0.6482 4 0.0033

CCGTTCCGTCGTCAC 0.0033 0.0684 1199 0.02

CGGTGCTCGCGGGGC 0.0032 0.0737 84 0.0032

AGGGAAACTCCGGCG 0.0032 0.0688 1364 0.03

CGTCCGCCGAGCAGC 0.0032 0.6459 1 0.0032

CAATAAGAACCGTGA 0.0032 0.0684 1199 0.02

CAACCCTGCGGTGCC 0.0032 0.0703 11 0.0032

CCTAGCGCTGACAAC 0.0032 0.0688 1364 0.03

CGGGAGCGCCAAAGA 0.0031 0.0733 238 0.013

ATCGCTATCCCGAGG 0.0031 0.0688 1364 0.03

CGCCGACAGTCGGCG 0.003 0.3893 4 0.0044

ACGTCATCGAGTGAC 0.003 0.0691 1388 0.006

CCCGTTGAACCGTGA 0.0025 0.5086 607 0.002

AGCGCCGACAGTCGG 0.0022 0.3162 5 0.0022

GCCGACTGTTGGCGC 0.0021 0.9874 13 0.0168

GCAGATCGTCGGTCA 0.0018 0.5328 12 0.0018

AGCAGATCGTCGGTC 0.0017 0.5201 1 0.0017

GCCATTGGCGATAGC 0.0017 0.4694 2 0.0017

AGGTTAGGCGCCAAT 0.0016 0.5853 2 0.0016

CATCCTGGGCGTGGC 0.0016 0.9612 11 0.0036

AAGGAGGGTTCTGTC 0.0016 0.441 126 0.0016

CCCCAGCGCCAACAG 0.0015 0.6934 1 0.0015

AACGATTCCTCCACA 0.0015 0.5676 3 0.0015

GCTCCCGTCGTTCCA 0.0015 0.5134 7 0.0015

CGCGGCGAGACGATA 0.0014 0.5053 3 0.0017

CTTCGACGACTTCGA 0.0014 0.5853 1 0.0014

ATCGGACGCATCTCA 0.0014 0.4479 14 0.0014

CCCGCCGGGCCAGGC 0.0014 0.6284 3 0.0014

GACGCCGCCCTACGC 0.0013 0.5082 1 0.0013

GCCGACAGTCGGCGC 0.0013 0.3893 4 0.0044

AGATTTGGGAGCCGA 0.0013 0.5853 12 0.0013

CCACACCGAAAATCC 0.0013 0.6157 2 0.0013

CGTCAGCATCGCCCC 0.0013 0.9624 29 0.0013

ACTGTTGGCGCTGGG 0.0013 0.9874 13 0.0168

ACCTCCGAGCAACGA 0.0012 0.5828 15 0.0012

GCCTACCACTCTCCA 0.0012 0.6146 4 0.0012

TGCGATACACCGCCA 0.0012 0.3362 14 0.0012

GGGTGGGTCGGCAGA 0.0012 0.345 29 0.0012

CTGATTATGCCTGAC 0.0011 0.5686 10 0.0011

Table A.9: k-mer statistics computed by RF on the k-mer matrix of the M. tuberculosis rifampicin dataset (k = 15). FI denotes the feature importance. Balance denotes the fraction of times each k-mer is associated with a RES label in the dataset. Identical denotes the number of k-mers that have the same column identity as the specified k-mer. Sum of FIs is the sum of the feature importances (computed by RF) of the k-mers whose column identities are identical to that of the specified k-mer.
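
The column statistics defined in the caption can be illustrated with a minimal sketch. It is not the code used to produce the table. The function name `kmer_column_stats`, the 0/1 label coding, and the toy data are hypothetical, and "balance" is read here as the share of isolates carrying the k-mer that are labeled RES, which is an assumption. The Identical count includes the k-mer itself, consistent with the table rows where Identical is 1 and Sum of FIs equals FI.

```python
from collections import defaultdict


def kmer_column_stats(matrix, labels, kmers, importances):
    """Per-k-mer Balance, Identical and Sum of FIs for a binary k-mer matrix.

    matrix      -- list of rows, one per isolate, with 0/1 entries per k-mer
    labels      -- 1 for a resistant (RES) isolate, 0 otherwise (assumed coding)
    kmers       -- column names, in matrix column order
    importances -- feature importance of each column, in the same order
    """
    # Column identity: the presence/absence pattern of a k-mer over all isolates.
    patterns = [tuple(row[j] for row in matrix) for j in range(len(kmers))]

    # Group column indices that share the same pattern.
    groups = defaultdict(list)
    for j, pattern in enumerate(patterns):
        groups[pattern].append(j)

    stats = {}
    for j, kmer in enumerate(kmers):
        carriers = [labels[i] for i, present in enumerate(patterns[j]) if present]
        same = groups[patterns[j]]
        stats[kmer] = {
            "fi": importances[j],
            # Assumed reading of "balance": share of carrier isolates labeled RES.
            "balance": sum(carriers) / len(carriers) if carriers else float("nan"),
            # The count includes the k-mer itself.
            "identical": len(same),
            "sum_fi": sum(importances[g] for g in same),
        }
    return stats


# Toy data: three isolates, three k-mers; "aaa" and "aac" share one column pattern.
toy = kmer_column_stats(
    matrix=[[1, 1, 0], [1, 1, 1], [0, 0, 1]],
    labels=[1, 1, 0],
    kmers=["aaa", "aac", "ggg"],
    importances=[0.5, 0.25, 0.25],
)
```

Grouping by the full column pattern is what makes Sum of FIs informative: a random forest may split on any one of several indistinguishable k-mers, so their importances are only meaningful when pooled.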

k-mer FI Balance Identical Sum of FIs

AGTTCGGCTTCCTCG 0.0048 0.9137 13 0.0112

ACTCCGAGGAAGCCG 0.0045 0.9137 13 0.0112

AGTTCGGCTTCTTCG 0.0031 0.3755 2 0.0031

GTTCTGCGATACCAA 0.0026 0.2986 6 0.0026

CGTAGCGGCGATGCC 0.0022 0.2944 1 0.0022

GCCACGTGGTAGGCC 0.0021 0.0957 1388 0.0108

CATAACTGACCGCCG 0.0021 0.2798 2 0.0021

GGCCCCAGCGCCGAC 0.0021 0.2194 5 0.0021

GTCGACGGTCCACGA 0.0021 0.2718 411 0.0089

ACCAGTGCCCTGGTG 0.0021 0.2723 16 0.0021

CAATCTCGACCATGG 0.002 0.0957 1388 0.0108

CTAGTCGGTGAGAAG 0.002 0.271 433 0.0077

CTTCTGGGTCGAGCA 0.002 0.4811 5 0.002

ACTCCTTGACTTGTA 0.002 0.482 13 0.002

CAACCCCGACCCCGA 0.0019 0.3681 1 0.0019

CAAACATCGGGATAC 0.0019 0.2718 411 0.0089

TGGACCACCCGCAAA 0.0019 0.1894 2 0.0019

GATGGTGTGCGGCAA 0.0019 0.0947 1198 0.006

GACGTGGGCGGTCCC 0.0019 0.395 1 0.0019

GAACACGTCTTGGAC 0.0019 0.271 433 0.0077

CCAGTTGCTCTCTGC 0.0019 0.0952 1366 0.0064

CACTCCGAAGAAGCC 0.0019 0.3203 10 0.0023

CCGTAGCCGGTATGA 0.0018 0.271 433 0.0077

ATGGCTCGACGATTG 0.0018 0.0957 1388 0.0108

ATCGACAAGCAGGTG 0.0018 0.4821 3 0.0018

CTTTACCCGGCTACC 0.0018 0.3746 5 0.0034

CGGCGACAGCACTAC 0.0018 0.2718 411 0.0089

GGACGCGTCGCCGCC 0.0018 0.0957 1388 0.0108

CCACCGCCACGGCAA 0.0018 0.0899 4 0.0018

CGCGGTGTTGATGAG 0.0018 0.3951 1 0.0018

GGACGCGATCACCAC 0.0018 0.7165 1 0.0018

CACGTCGCGCGAACA 0.0018 0.2963 5 0.0018

AGTCCGGCTTTATGC 0.0018 0.2723 11 0.0018

ATCATGCTGATCACC 0.0017 0.559 11 0.0017

CACCCGCTCATCCGG 0.0017 0.5615 103 0.0017

CCCGCCACCGCCCCG 0.0017 0.1088 19 0.0017

CGCGGCGGGCACAGA 0.0017 0.0952 23 0.0017

AGAAGACTTCCTGCT 0.0017 0.1005 29 0.0017

ACACAACCCACCCGC 0.0017 0.0952 1366 0.0064

CAACGCCCCATCAGG 0.0017 0.5607 44 0.0017

CCCCCGTTAGCGTTA 0.0016 0.2767 3 0.0016

ACCGGCCCCTGCCGC 0.0016 0.0957 1388 0.0108

CAGGCGGGTGCGCAC 0.0016 0.1094 13 0.0016

GAGCCTGTTGGTGAC 0.0016 0.2705 13 0.0029

GGGTAGCCGGGTAAA 0.0016 0.3746 5 0.0034

CCCGGCGCCATACGC 0.0016 0.5604 10 0.0016

CAACCAGAGTTATCC 0.0015 0.0947 1198 0.006

AGACCAGTGGGGGCG 0.0015 0.0952 1366 0.0064

CCATCGCCGGCGGCG 0.0015 0.4069 1 0.0015

GATCGGTGGCCAAGC 0.0015 0.1005 79 0.0015

CACACTTGCTGACCG 0.0015 0.7132 1136 0.0016

ATACGCTCGGGGATC 0.0015 0.3022 80 0.0015

AACTGTCGGCGATTG 0.0015 0.3751 3 0.0015

ACCGAGCAGCCAGGC 0.0014 0.3825 26 0.0014

AGATGGAAGACAGCC 0.0014 0.0947 1198 0.006

CTACGTCGGGCCAGA 0.0014 0.0957 1388 0.0108

CTACGATATCCCGAG 0.0014 0.0952 1366 0.0064

GGGCCAGAACAACCC 0.0014 0.9412 13 0.0022

CAACAGCCTTCGATG 0.0014 0.8 148 0.0045

ATCGAGGCCTTTCGC 0.0013 0.2692 12 0.0013

GAAGTAGGTCCGCTC 0.0013 0.7094 43 0.0013

ACAGGCTCTCGATGC 0.0013 0.2705 13 0.0029

AGGGTGAGAACACCC 0.0013 0.9516 28 0.0016

ACGCGCACTGATTAC 0.0013 0.2718 411 0.0089

AATTGCGGTGGTGGG 0.0012 0.0947 1198 0.006

CAAGCGCCGACTGTC 0.0012 0.2931 1 0.0012

CATCGACGCAACCGG 0.0012 0.7903 2 0.0012

GACGACGGCTTGTCA 0.0012 0.0628 9 0.0012

CGTGCAAGGCGGTGC 0.0012 0.8033 111 0.0027

GCCCCAGCGCCGACA 0.0012 0.3434 1 0.0012

CGAATTCACGGGCAA 0.0011 0.2702 16 0.0011

CCTTGGTGGGGCCGC 0.0011 0.4069 12 0.0011

GAGCTCACCCGGCAC 0.0011 0.398 5 0.0027

CATCCTGGGCATGGC 0.0011 0.2712 13 0.004

ATCCTGGACCACTCC 0.0011 0.2995 296 0.0012

ACCTATGCAACGGAT 0.0011 0.4007 2 0.0011

CCGGCCAGGTCCCAC 0.0011 0.385 4 0.0011

GGGACGCGGCCAGGA 0.0011 0.8 148 0.0045

CAGGTGCCGGGTGAG 0.0011 0.398 5 0.0027

CAGGGTGCCGGGTAG 0.0011 0.736 168 0.0015

Table A.10: k-mer statistics computed by RF on the k-mer matrix of the M. tuberculosis streptomycin dataset (k = 15). FI denotes the feature importance. Balance denotes the fraction of times each k-mer is associated with a RES label in the dataset. Identical denotes the number of k-mers that have the same column identity as the specified k-mer. Sum of FIs is the sum of the feature importances (computed by RF) of the k-mers whose column identities are identical to that of the specified k-mer.

PEG ID  Product
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.2916  Phosphatidylinositol phosphate synthase @ Archaetidylinositol phosphate synthase (EC 2.7.8.39)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.4088  MBL-fold metallo-hydrolase superfamily
fig|83332.111.peg.1987  Membrane protein EccB5
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.1715  L-asparaginase (EC 3.5.1.1)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.3622  hypothetical protein
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.2927  Universal stress protein family
fig|83332.111.peg.1024  hypothetical protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.700  Enoyl-CoA hydratase (EC 4.2.1.17)
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.1125  LSU ribosomal protein L25p
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.2559  hypothetical protein
fig|83332.111.peg.3950  Putative cytochrome P450 125 (EC 1.14.-.-)
fig|83332.111.peg.4125  Aspartokinase (EC 2.7.2.4)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.1782  Histidinol-phosphatase [alternative form] (EC 3.1.3.15)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.4276  hypothetical protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.327  Serine mycosin MycP3
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.374  Iron-sulphur-binding reductase
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.1303  Respiratory nitrate reductase beta chain (EC 1.7.99.4)
fig|83332.111.peg.374  Iron-sulphur-binding reductase
fig|83332.111.peg.975  tRNA/rRNA methyltransferase
fig|83332.111.peg.2777  Glycerol-3-phosphate acyltransferase (EC 2.3.1.15)
fig|83332.111.peg.3499  Acyl-CoA dehydrogenase (EC 1.3.8.1)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3499  Acyl-CoA dehydrogenase (EC 1.3.8.1)
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.975  tRNA/rRNA methyltransferase
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.1801  Efflux ABC transporter for glutathione/L-cysteine
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.24  annexin VII
fig|83332.111.peg.4106  FIG022979: MoxR-like ATPases
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.4106  FIG022979: MoxR-like ATPases
fig|83332.111.peg.3234  Putative lipoprotein lppW precursor
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.1801  Efflux ABC transporter for glutathione/L-cysteine
fig|83332.111.peg.4176  Arogenate dehydrogenase (EC 1.3.1.43)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.19  Serine/threonine-protein kinase PknA (EC 2.7.11.1)
fig|83332.111.peg.4208  UDP-glucose 4-epimerase (EC 5.1.3.2)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.4208  UDP-glucose 4-epimerase (EC 5.1.3.2)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.4208  UDP-glucose 4-epimerase (EC 5.1.3.2)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.4208  UDP-glucose 4-epimerase (EC 5.1.3.2)

Table A.11: The top 50 PEGs ranked by the aggregation score of the features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A1 in Section 4.4.1.

PEG ID  Product
fig|83332.111.peg.2916  Phosphatidylinositol phosphate synthase @ Archaetidylinositol phosphate synthase (EC 2.7.8.39)
fig|83332.111.peg.3622  hypothetical protein
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.4276  hypothetical protein
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.4125  Aspartokinase (EC 2.7.2.4)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.1782  Histidinol-phosphatase [alternative form] (EC 3.1.3.15)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.4088  MBL-fold metallo-hydrolase superfamily
fig|83332.111.peg.327  mycosin MycP3, component of Type VII secretion system ESX-3
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.374  Iron-sulphur-binding reductase
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.1303  Respiratory nitrate reductase beta chain (EC 1.7.99.4)
fig|83332.111.peg.374  Iron-sulphur-binding reductase
fig|83332.111.peg.975  tRNA/rRNA methyltransferase
fig|83332.111.peg.2777  Glycerol-3-phosphate acyltransferase (EC 2.3.1.15)
fig|83332.111.peg.3499  Acyl-CoA dehydrogenase (EC 1.3.8.1), Mycobacterial subgroup FadE23
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3499  Acyl-CoA dehydrogenase (EC 1.3.8.1), Mycobacterial subgroup FadE23
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.975  tRNA/rRNA methyltransferase
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.1715  L-asparaginase (EC 3.5.1.1)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.1987  Membrane protein EccB5
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.24  annexin VII
fig|83332.111.peg.4106  FIG022979: MoxR-like ATPases
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.4106  FIG022979: MoxR-like ATPases
fig|83332.111.peg.3234  Putative lipoprotein lppW precursor
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.1801  Efflux ABC transporter for glutathione/L-cysteine
fig|83332.111.peg.4176  Arogenate dehydrogenase (EC 1.3.1.43)
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.1801  Efflux ABC transporter for glutathione/L-cysteine
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.3223  Site-specific tyrosine recombinase XerC
fig|83332.111.peg.3223  Site-specific tyrosine recombinase XerC
fig|83332.111.peg.6  DNA gyrase subunit A (EC 5.99.1.3)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.340  PPE family protein
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.6  DNA gyrase subunit A (EC 5.99.1.3)
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.6  DNA gyrase subunit A (EC 5.99.1.3)
fig|83332.111.peg.3211  Mobile element protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.2927  Universal stress protein family
fig|83332.111.peg.1024  hypothetical protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.3333  hypothetical protein
fig|83332.111.peg.700  Enoyl-CoA hydratase (EC 4.2.1.17)
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.1125  LSU ribosomal protein L25p

Table A.12: The top 50 PEGs ranked by the aggregation score of the features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A1 in Section 4.4.1.

PEG ID  Product
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.2283  Modular polyketide synthase
fig|83332.111.peg.1618  PE PGRS family protein
fig|83332.111.peg.2624  PPE family protein
fig|83332.111.peg.6  DNA gyrase subunit A (EC 5.99.1.3)
fig|83332.111.peg.3731  PE PGRS family protein
fig|83332.111.peg.3906  PE PGRS family protein
fig|83332.111.peg.1186  PE PGRS family protein
fig|83332.111.peg.3734  hypothetical protein
fig|83332.111.peg.4365  16S rRNA (guanine(527)-N(7))-methyltransferase (EC 2.1.1.170)
fig|83332.111.peg.19  Serine/threonine-protein kinase PknA (EC 2.7.11.1)
fig|83332.111.peg.3905  PE PGRS family protein
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.821  PE PGRS family protein
fig|83332.111.peg.1217  PE PGRS family protein
fig|83332.111.peg.819  hypothetical protein
fig|83332.111.peg.2278  Nicotinamidase (EC 3.5.1.19)
fig|83332.111.peg.2144  Putative uncharacterized protein BCG 1966
fig|83332.111.peg.314  PE PGRS family protein
fig|83332.111.peg.329  hypothetical protein Rv0293c
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.818  hypothetical protein
fig|83332.111.peg.393  PPE family protein
fig|83332.111.peg.1765  13E12 repeat family protein
fig|83332.111.peg.2809  Mobile element protein
fig|83332.111.peg.2786  PE PGRS family protein
fig|83332.111.peg.4231  5-Phosphoribosyl diphosphate (PRPP): decaprenyl-phosphate 5-phosphoribosyltransferase (EC 2.4.2.45)
fig|83332.111.peg.3739  PPE family protein
fig|83332.111.peg.817  FIG00822571: hypothetical protein
fig|83332.111.peg.832  hypothetical protein
fig|83332.111.peg.2822  [Acyl-carrier-protein] acetyl of FASI (EC 2.3.1.38)
fig|83332.111.peg.3713  hypothetical protein
fig|83332.111.peg.2927  Universal stress protein family
fig|83332.111.peg.1616  PE PGRS family protein
fig|83332.111.peg.820  Mobile element protein
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.3861  13E12 repeat family protein
fig|83332.111.peg.3910  hypothetical protein
fig|83332.111.peg.3950  Putative cytochrome P450 125 (EC 1.14.-.-)
fig|83332.111.peg.700  enoyl-CoA hydratase (EC 4.2.1.17)
fig|83332.111.peg.3776  PE PGRS family protein
fig|83332.111.peg.4088  MBL-fold metallo-hydrolase superfamily
fig|83332.111.peg.1024  hypothetical protein
fig|83332.111.peg.333  PE PGRS family protein
fig|83332.111.peg.1187  PE PGRS family protein
fig|83332.111.peg.59  SSU ribosomal protein S6p

Table A.13: The top 50 PEGs ranked by the aggregation score of the top 100 features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis streptomycin dataset. The aggregation score was computed as described in A2 in Section 4.4.1.

PEG ID  Product
fig|83332.111.peg.739  DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)
fig|83332.111.peg.2123  Catalase-peroxidase KatG (EC 1.11.1.21)
fig|83332.111.peg.3251  Chromosome partition protein smc
fig|83332.111.peg.920  PE PGRS family protein
fig|83332.111.peg.826  PE PGRS family protein
fig|83332.111.peg.4220  Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-)
fig|83332.111.peg.4365  16S rRNA (guanine(527)-N(7))-methyltransferase (EC 2.1.1.170)
fig|83332.111.peg.3906  PE PGRS family protein
fig|83332.111.peg.6  DNA gyrase subunit A (EC 5.99.1.3)
fig|83332.111.peg.1765  13E12 repeat family protein
fig|83332.111.peg.757  SSU ribosomal protein S12p (S23e)
fig|83332.111.peg.484  hypothetical protein
fig|83332.111.peg.3561  putative membrane protein
fig|83332.111.peg.3211  Mobile element protein
fig|83332.111.peg.2283  Modular polyketide synthase
fig|83332.111.peg.340  PPE family protein
fig|83332.111.peg.2786  PE PGRS family protein
fig|83332.111.peg.2624  PPE family protein
fig|83332.111.peg.2517
fig|83332.111.peg.1801  efflux ABC transporter for glutathione/L-cysteine
fig|83332.111.peg.1993  PPE family protein
fig|83332.111.peg.2088  Glutamine synthetase (EC 6.3.1.2)
fig|83332.111.peg.2280  Probable carboxylesterase LipT (EC 3.1.1.-)
fig|83332.111.peg.3713  hypothetical protein
fig|83332.111.peg.109  13E12 repeat family protein
fig|83332.111.peg.393  PPE family protein
fig|83332.111.peg.3739  PPE family protein
fig|83332.111.peg.3145  FIG00820705: hypothetical protein
fig|83332.111.peg.3861  13E12 repeat family protein
fig|83332.111.peg.4125  Aspartokinase (EC 2.7.2.4)
fig|83332.111.peg.374  Iron-sulphur-binding reductase
fig|83332.111.peg.3499  Acyl-CoA dehydrogenase (EC 1.3.8.1)
fig|83332.111.peg.2777  Glycerol-3-phosphate acyltransferase (EC 2.3.1.15)
fig|83332.111.peg.3734  hypothetical protein
fig|83332.111.peg.24  annexin VII
fig|83332.111.peg.3618  Protein translocase subunit SecA
fig|83332.111.peg.3729  PPE family protein
fig|83332.111.peg.4088  MBL-fold metallo-hydrolase superfamily
fig|83332.111.peg.19  Serine/threonine-protein kinase PknA (EC 2.7.11.1)
fig|83332.111.peg.2278  Nicotinamidase (EC 3.5.1.19) Pyrazinamidase
fig|83332.111.peg.1303  Respiratory nitrate reductase beta chain (EC 1.7.99.4)
fig|83332.111.peg.1616  PE PGRS family protein
fig|83332.111.peg.975  tRNA/rRNA methyltransferase
fig|83332.111.peg.1618  PE PGRS family protein
fig|83332.111.peg.1285  13E12 repeat family protein
fig|83332.111.peg.1957  PE PGRS family protein
fig|83332.111.peg.827  PE PGRS family protein
fig|83332.111.peg.1987  Membrane protein EccB5
fig|83332.111.peg.1542  Carbamoyl-phosphate synthase large chain (EC 6.3.5.5)
fig|83332.111.peg.1715  L-asparaginase (EC 3.5.1.1)

Table A.14: The top 50 PEGs ranked by the aggregation score of the top 100 features in 100 RF classifiers, each of which is trained independently on the M. tuberculosis rifampicin dataset. The aggregation score was computed as described in A2 in Section 4.4.1.

A.3 AMR: Code

# Build a compressed k-mer matrix: k-mers with identical presence/absence
# patterns across isolates share one column (key: tuple of k-mers, value: 0/1 vector).
i_1 = ['atg', 'tgc', 'aaa', 'ggg']
i_2 = ['atg', 'tgc', 'gtg', 'tta']
i_3 = ['atg', 'tga', 'tgc']
i_4 = ['atg', 'atg']
isolates = [i_1, i_2, i_3, i_4]

columns = {}
for isolate in isolates:
    kmers_in_isolate = set(isolate)
    used_kmers = set()
    # Iterate over a snapshot because columns may be split (deleted and re-added) below.
    for kmers, vector in list(columns.items()):
        kmers_set = set(kmers)  # cast once and reuse for the set operations
        if kmers_set.issubset(kmers_in_isolate):
            # Every k-mer of this column occurs in the isolate.
            vector.append(1)
        else:
            # Split case: separate the k-mers shared with the isolate from the rest.
            shared_kmers = kmers_set & kmers_in_isolate
            kmers_in_col_only = kmers_set - kmers_in_isolate
            del columns[kmers]  # remove the original entry
            if shared_kmers:
                columns[tuple(sorted(shared_kmers))] = vector + [1]
            if kmers_in_col_only:
                columns[tuple(sorted(kmers_in_col_only))] = vector + [0]
        used_kmers |= kmers_set
    # k-mers seen for the first time form a new column, absent from all earlier isolates.
    kmers_not_in_cols = kmers_in_isolate - used_kmers
    if kmers_not_in_cols:
        if columns:
            n_previous = len(next(iter(columns.values()))) - 1
            cnts = [0] * n_previous + [1]
        else:
            cnts = [1]
        columns[tuple(sorted(kmers_not_in_cols))] = cnts

print('columns :', columns)
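
The listing above builds the compressed columns incrementally, one isolate at a time. As a cross-check, the same grouping can be obtained directly by computing each k-mer's presence/absence pattern and collecting k-mers that share a pattern. The following is a minimal sketch of that alternative, not the author's implementation. The function name `direct_columns` is hypothetical, and it needs all isolates in memory at once, which the incremental version avoids.

```python
def direct_columns(isolates):
    """Group k-mers by their presence/absence pattern across isolates.

    Returns a dict mapping a sorted tuple of k-mers to their shared 0/1 vector,
    the same shape as the `columns` dictionary in the listing above.
    """
    isolate_sets = [set(isolate) for isolate in isolates]
    all_kmers = sorted(set().union(*isolate_sets)) if isolate_sets else []

    groups = {}
    for kmer in all_kmers:
        # One entry per isolate: 1 if the k-mer occurs in it, else 0.
        pattern = tuple(int(kmer in s) for s in isolate_sets)
        groups.setdefault(pattern, []).append(kmer)

    return {tuple(kmers): list(pattern) for pattern, kmers in groups.items()}


# The four toy isolates from the listing.
isolates = [
    ['atg', 'tgc', 'aaa', 'ggg'],
    ['atg', 'tgc', 'gtg', 'tta'],
    ['atg', 'tga', 'tgc'],
    ['atg', 'atg'],
]
compressed = direct_columns(isolates)
```

On the toy isolates this yields five columns for seven distinct k-mers: `aaa` and `ggg` share one column, as do `gtg` and `tta`.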

A.4 PGR

No.  Media Name  % Positive  Accuracy
1  Nitrogen-Glycine  0.23  0.78
2  Carbon-D-Galactonic-Acid-g-Lactone  0.25  0.80
3  Nitrogen-L-Serine  0.21  0.84
4  Nitrogen-L-Glutamic-Acid  0.43  0.74
5  Carbon-b-Methyl-D-Galactoside  0.25  0.81
6  Carbon-L-Aspartic-Acid  0.27  0.87
7  Carbon-Butylamine-sec  0.25  0.78
8  Carbon-a-Methyl-D-Glucoside  0.25  0.81
9  Nitrogen-D-Glucosamine  0.64  0.78
10  Nitrogen-Nitrite  0.31  0.81
11  Carbon-L-Glutamic-Acid  0.39  0.79
12  Carbon-Glycyl-L-Aspartic-Acid  0.61  0.82
13  Carbon-Caproic-Acid  0.25  0.80
14  Carbon-a-Keto-Valeric-Acid  0.25  0.79
15  Nitrogen-Gly-Asn  0.38  0.77
16  Carbon-b-Phenylethylamine  0.25  0.78
17  Carbon-D-L-Malic-Acid  0.20  0.88
18  Carbon-D-  0.55  0.77
19  Carbon-D-L-Citramalic-Acid  0.25  0.78
20  Carbon-2-3-Butanone  0.25  0.77
21  Nitrogen-Ala-Asp  0.63  0.79
22  Carbon-Maltose  0.36  0.79
23  Carbon-Capric-Acid  0.25  0.78
24  Carbon-Oxalomalic-Acid  0.25  0.81
25  Carbon-i-Erythritol  0.25  0.80
26  Carbon-D-Melezitose  0.25  0.79
27  Carbon-D-Alanine  0.22  0.82
28  Nitrogen-Nitrate  0.29  0.79
29  Nitrogen-Ala-Glu  0.55  0.81
30  Nitrogen-N-Acetyl-D-Glucosamine  0.30  0.81
31  Carbon-D-L-Octopamine  0.25  0.77
32  Nitrogen-Ala-Gly  0.61  0.82
33  Nitrogen-Ala-Gln  0.57  0.81
34  Carbon-D-L-a-Glycerol-Phosphate  0.41  0.78
35  Nitrogen-L-Proline  0.33  0.80
36  Carbon-L-Alanyl-Glycine  0.54  0.79
37  Nitrogen-L-Aspartic-Acid  0.29  0.90
38  Carbon-Glycyl-L-Proline  0.45  0.78
39  Nitrogen-Gly-Glu  0.59  0.83
40  Carbon-D-Trehalose  0.34  0.78
41  Carbon-2-Deoxy-D-Ribose  0.28  0.84
42  Nitrogen-Ala-Thr  0.49  0.81
43  Nitrogen-Gly-Gln  0.61  0.83
44  Carbon-2-Hydroxy-Benzoic-Acid  0.25  0.79
45  Carbon-L-Proline  0.30  0.81
46  Carbon-b-D-Allose  0.25  0.81
47  Phosphate-D-L-a-Glycerol-Phosphate  0.46  0.71
48  Carbon-Glycerol  0.33  0.75
49  Carbon-D-Glucosamine  0.61  0.75
50  Carbon-a-Methyl-D-Galactoside  0.25  0.79
51  Carbon-N-Acetyl-D-Glucosamine  0.31  0.82
52  Carbon-L-Lactic-Acid  0.33  0.79
53  Carbon-m-Tartaric-Acid  0.25  0.79
54  Carbon-D-Mannose  0.53  0.74
55  Carbon-Glycyl-L-Glutamic-Acid  0.52  0.80
56  Carbon-4-Hydroxy-L-Proline-trans  0.25  0.75
57  Carbon-Fumaric-Acid  0.34  0.81
58  Carbon-p-Hydroxy-Phenylacetic-Acid  0.25  0.79
59  Carbon-L-Malic-Acid  0.20  0.89
60  Carbon-a-Hydroxy-Butyric-Acid  0.25  0.75
61  Carbon-D-Xylose  0.25  0.80
62  Carbon-a-Keto-Butyric-Acid  0.25  0.79
63  Nitrogen-D-Alanine  0.24  0.81
64  Carbon-D-Lactitol  0.25  0.78
65  Carbon-D-Ribose  0.42  0.82
66  Nitrogen-Cytosine  0.25  0.84
67  Carbon-Succinic-Acid  0.33  0.83

Table A.15: Datasets used in the PGR experiments. Each dataset is named after the growth media used. For each dataset we show the percentage of positive examples (with the remainder being negative) as well as the accuracy of the RF algorithm.
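
One way to read the accuracy column against the class balance is to compare it with a majority-class baseline, i.e. the accuracy of always predicting the more frequent label. The sketch below is an illustration added here, not part of the reported experiments. It assumes the positive share is written as a fraction (for example 0.23), as in the table. The helper name `majority_baseline` is hypothetical, and the three rows are copied from the table.

```python
def majority_baseline(fraction_positive):
    """Accuracy obtained by always predicting the more frequent class."""
    return max(fraction_positive, 1.0 - fraction_positive)


# (media name, fraction positive, RF accuracy) copied from three table rows.
rows = [
    ("Nitrogen-Glycine", 0.23, 0.78),
    ("Nitrogen-D-Glucosamine", 0.64, 0.78),
    ("Nitrogen-L-Aspartic-Acid", 0.29, 0.90),
]

for name, fraction_positive, rf_accuracy in rows:
    baseline = majority_baseline(fraction_positive)
    gain = rf_accuracy - baseline
    print(f"{name}: baseline {baseline:.2f}, RF {rf_accuracy:.2f}, gain {gain:+.2f}")
```

The gain over this baseline, rather than the raw accuracy, indicates how much the classifier adds beyond the class proportions of each dataset.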

Nitrogen-Glycine
1  (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 11), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), (NADH-ubiquinone oxidoreductase chain F (EC 1.6.5.3), 3), (Thioredoxin reductase (EC 1.8.1.9), 3), (Carbon monoxide dehydrogenase large chain (EC 1.2.99.2), 3)
2  (Anhydro-N-acetylmuramic acid kinase (EC 2.7.1.170), 7), (3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 4), (Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 3), (DNA polymerase IV (EC 2.7.7.7), 3), (Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6), 2)
3  (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 18), (FKBP-type peptidyl-prolyl cis-trans isomerase FkpA precursor (EC 5.2.1.8), 4), (Valyl-tRNA synthetase (EC 6.1.1.9), 3), (Methionine aminopeptidase (EC 3.4.11.18), 3), (Alkaline phosphatase (EC 3.1.3.1), 2)
4  (Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 4), (Glycolate dehydrogenase (EC 1.1.99.14), FAD-binding subunit GlcE, 4), (Guanylate kinase (EC 2.7.4.8), 4), (Ribonuclease E (EC 3.1.26.12), 3), (CDP-diacylglycerol–serine O-phosphatidyltransferase (EC 2.7.8.8) @ Archaetidylserine synthase (EC 2.7.8.38), 3)
5  (Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 10), (Threonyl-tRNA synthetase (EC 6.1.1.3), 6), (D-arabinose 5-phosphate isomerase (EC 5.3.1.13), 5), (Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 5), (Cytochrome c-type biogenesis protein DsbD, protein-disulfide reductase (EC 1.8.1.8), 4)
6  (Phosphoserine aminotransferase (EC 2.6.1.52), 6), (tRNA-i(6)A37 methylthiotransferase (EC 2.8.4.3), 4), (Pyruvate dehydrogenase E1 component beta subunit (EC 1.2.4.1), 4), (Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8), 4), (Catalase KatE (EC 1.11.1.6), 3)
7  (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 17), (Exodeoxyribonuclease III (EC 3.1.11.2), 6), (Phenylalanyl-tRNA synthetase alpha chain (EC 6.1.1.20), 6), (Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 6), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 6)
8  (Enolase (EC 4.2.1.11), 25), (DNA polymerase I (EC 2.7.7.7), 6), (Thymidylate synthase (EC 2.1.1.45), 6), (Thiamine-monophosphate kinase (EC 2.7.4.16), 3), (Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 3)
9  (L-serine dehydratase, beta subunit (EC 4.3.1.17) / L-serine dehydratase, alpha subunit (EC 4.3.1.17), 6), (N-formylglutamate deformylase (EC 3.5.1.68), 5), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 4), (1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18), 4), (Uptake hydrogenase small subunit precursor (EC 1.12.99.6), 3)
10  (Glucose-6-phosphate isomerase (EC 5.3.1.9), 13), (Triosephosphate isomerase (EC 5.3.1.1), 12), (ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 6), (Adenylosuccinate synthetase (EC 6.3.4.4), 5), (UDP-glucose 6-dehydrogenase (EC 1.1.1.22), 5)

Carbon-D-Galactonic-Acid-g-Lactone 1 ATP synthase alpha chain (EC 3.6.3.14), 60), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 10), Ribonucleotide reductase of class Ib (aerobic); 2C alpha subunit (EC 1.17.4.1), 6), Ribonuclease HII (EC 3.1.26.4), 6), Choloylglycine hydrolase (EC 3.5.1.24), 6), 2 Cytochrome d ubiquinol oxidase subunit II (EC 1.10.3.-), 13), Cell division protein FtsH (EC 3.4.24.-), 5), Phosphoribosylglycinamide formyltransferase 2 (EC 2.1.2.-), 5), DNA polymerase III alpha subunit (EC 2.7.7.7), 4), NADH-ubiquinone oxidoreductase chain C (EC 1.6.5.3) / NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 4), 3 Cell division protein FtsH (EC 3.4.24.-), 8), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 6), UDP-N-acetylglucosamine 1- carboxyvinyltransferase (EC 2.5.1.7), 5), N-succinyl-L; 2CL-diaminopimelate desuccinylase (EC 3.5.1.18), 4), Carbamoyl-phosphate syn- thase large chain (EC 6.3.5.5), 3), 4 Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 10), DNA polymerase III alpha subunit (EC 2.7.7.7), 9), Phosphoribosylformylglyci- namidine cyclo- (EC 6.3.3.1), 8), Homoserine dehydrogenase (EC 1.1.1.3), 7), Anaerobic sulfite reductase subunit C (EC 1.8.1.-), 6), 5 Adenylosuccinate (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 8), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 5), 3-oxoadipate CoA-transferase subunit A (EC 2.8.3.6), 5), Acryloyl-CoA reductase AcuI/YhdH (EC 1.3.1.84), 5), Long-chain- fatty-acid–CoA ligase (EC 6.2.1.3), 4), 6 Branched-chain alpha-keto acid dehydrogenase; 2C E1 component; 2C beta subunit (EC 1.2.4.4), 8), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 7), Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 7), 4-hydroxy- tetrahydrodipicolinate synthase (EC 4.3.3.7), 4), Glycine dehydrogenase decarboxylating (glycine cleavage system P2 protein) (EC 1.4.4.2), 4), 7 Glycerol kinase (EC 
2.7.1.30), 8), Thioredoxin reductase (EC 1.8.1.9), 6), (Guanosine-3; 2C5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5); 2C (p)ppGpp synthetase II, 5), S-adenosylmethionine synthetase (EC 2.5.1.6), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), 8 Aldehyde dehydrogenase (EC 1.2.1.3), 6), DNA ligase (NAD(+)) (EC 6.5.1.2), 5), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), Phosphoserine phosphatase (EC 3.1.3.3), 5), Cell division protein FtsH (EC 3.4.24.-), 4), 9 2-C-methyl-D-erythritol 2; 2C4-cyclodiphosphate synthase (EC 4.6.1.12), 6), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 6), CDP-diacylglycerol–glycerol-3-phosphate 3-phosphatidyltransferase (EC 2.7.8.5), 5), ATP-dependent DNA ligase (EC 6.5.1.1) clustered with Ku protein; 2C LigD, 5), Valyl-tRNA synthetase (EC 6.1.1.9), 5), 10 2-isopropylmalate synthase (EC 2.3.3.13), 19), ATP synthase alpha chain (EC 3.6.3.14), 7), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 6), Aspartate aminotransferase (EC 2.6.1.1), 5), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 4))

Nitrogen-L-Serine 1 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 13), NADPH-dependent 7-cyano-7-deazaguanine reductase (EC 1.7.1.13), 7), Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), ATP synthase beta chain (EC 3.6.3.14), 5), 2 (Pyridoxal 5-phosphate synthase (glutamine hydrolyzing); 2C glutaminase subunit (EC 4.3.3.6), 3), Glutaryl-CoA dehydrogenase (EC 1.3.8.6), 3), Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 2), Uroporphyrinogen III decarboxylase (EC 4.1.1.37), 2), Galactoside O-acetyltransferase (EC 2.3.1.18), 2), 3 Quinolinate phosphoribosyltransferase decarboxylating (EC 2.4.2.19), 6), ATP synthase beta chain (EC 3.6.3.14), 6), Alcohol dehydrogenase (EC 1.1.1.1), 4), Holo-acyl-carrier-protein synthase (EC 2.7.8.7), 4), CDP-diacylglycerol–glycerol-3-phosphate 3- phosphatidyltransferase (EC 2.7.8.5), 3), 4 Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 14), Glutamate synthase NADPH large chain (EC 1.4.1.13), 7), Phosphoglycerate (EC 5.4.2.11), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), 5 Epoxide hydrolase (EC 3.3.2.9), 17), Valyl-tRNA synthetase (EC 6.1.1.9), 5), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), Ribulokinase (EC 2.7.1.16), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), 6 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 6), Ketopantoate reductase PanG (EC 1.1.1.169), 4), Seryl-tRNA synthetase (EC 6.1.1.11), 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 3), 7 DNA topoisomerase I (EC 5.99.1.2), 10), Alanyl-tRNA synthetase (EC 6.1.1.7), 6), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 5), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 4), D-alanyl-D-alanine (EC 3.4.16.4), 3), 8 Isoleucyl-tRNA synthetase (EC 6.1.1.5), 13), Long-chain-fatty-acid–CoA ligase (EC 
6.2.1.3), 9), Branched-chain alpha-keto acid dehydro- genase; 2C E1 component; 2C beta subunit (EC 1.2.4.4), 6), Acetolactate synthase large subunit (EC 2.2.1.6), 5), Trehalose synthase (EC 5.4.99.16), 5), 9 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 6), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), 6-phosphofructokinase (EC 2.7.1.11), 3), Sulfate adenylyltransferase subunit 2 (EC 2.7.7.4), 2), ATP synthase alpha chain (EC 3.6.3.14), 2), 10 Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 5), 4-hydroxy-tetrahydrodipicolinate reductase (EC 1.17.1.8), 4), DNA gyrase subunit A (EC 5.99.1.3), 4), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 3), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 3))

Nitrogen-L-Glutamic-Acid 1 ATP synthase beta chain (EC 3.6.3.14), 13), Riboflavin synthase eubacterial/eukaryotic (EC 2.5.1.9), 5), Ribonucleotide reductase of class Ia (aerobic); 2C beta subunit (EC 1.17.4.1), 5), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Ribose 1; 2C5-bisphosphate phosphokinase PhnN (EC 2.7.4.23), 4), 2 PTS system; 2C lactose-specific IIA component (EC 2.7.1.69), 3), Adenylosuccinate synthetase (EC 6.3.4.4), 2), Dihydroorotase (EC 3.5.2.3), 2), Pyrimidine-nucleoside phosphorylase (EC 2.4.2.2), 2), Ferredoxin–NADP(+) reductase (EC 1.18.1.2), 2), 3 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), DNA polymerase III alpha subunit (EC 2.7.7.7), 5), Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 4), Cell division protein FtsZ (EC 3.4.24.-), 4), Galactose-1-phosphate uridylyltransferase (EC 2.7.7.10), 4), 4 Replicative DNA helicase (DnaB) (EC 3.6.4.12), 6), Polyphosphate kinase (EC 2.7.4.1), 5), ATP-dependent protease La (EC 3.4.21.53) Type I, 4), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61) / 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2) @ 2-oxoglutarate decarboxylase (EC 4.1.1.71) @ 2-hydroxy-3-oxoadipate synthase (EC 2.2.1.5), 3), Ribonuclease HII (EC 3.1.26.4), 3), 5 Transketolase (EC 2.2.1.1), 7), 6-phosphofructokinase (EC 2.7.1.11), 6), ATP-dependent protease La (EC 3.4.21.53) Type I, 5), beta- glucosidase (EC 3.2.1.21), 5), Methionine aminopeptidase (EC 3.4.11.18), 5), 6 alpha-L-rhamnosidase (EC 3.2.1.40), 2), Lead; 2C cadmium; 2C and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 2), Two-component sensor histidine kinase; 2C malate (EC 2.7.3.-), 1), L-seryl- tRNA(Sec) selenium transferase (EC 2.9.1.1), 1), Uroporphyrinogen-III methyltransferase (EC 2.1.1.107) / Uroporphyrinogen-III synthase (EC 4.2.1.75), 1), 7 Soluble lytic murein transglycosylase precursor (EC 3.2.1.-), 2), 
Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6), 2), Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 1), Diaminohydroxyphosphoribosylaminopyrimidine deaminase (EC 3.5.4.26) / 5-amino-6-(5-phosphoribosylamino)uracil reductase (EC 1.1.1.193), 1), Valyl-tRNA synthetase (EC 6.1.1.9), 1), 8 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Streptococcal lipoprotein rotamase A; 3B Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8), 4), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 3), Phosphoribosylformylglycinamidine syn- thase; 2C synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase; 2C glutamine amidotransferase subunit (EC 6.3.5.3), 3), Urocanate hydratase (EC 4.2.1.49), 3), 9 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 6), Leucyl-tRNA synthetase (EC 6.1.1.4), 6), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 3), Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 3), Phosphoribosylformylglycinamidine synthase; 2C synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase; 2C glutamine amidotransferase subunit (EC 6.3.5.3), 3), 10 DNA gyrase subunit A (EC 5.99.1.3), 6), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 4), Hydroxymethylpyrimidine phosphate kinase ThiD (EC 2.7.4.7), 4), Fumarate hydratase class II (EC 4.2.1.2), 4), Citrate lyase alpha chain (EC 4.1.3.6), 3))

Carbon-b-Methyl-D-Galactoside 1 ATP-dependent protease La (EC 3.4.21.53) Type I, 6), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), 2-methylcitrate dehydratase (EC 4.2.1.79), 6), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 6), Exodeoxyribonuclease V alpha chain (EC 3.1.11.5), 5), 2 Ribonuclease E (EC 3.1.26.12), 7), Arginyl-tRNA synthetase (EC 6.1.1.19), 4), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), Arginine deiminase (EC 3.5.3.6), 3), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 3), 3 Phosphoglucosamine mutase (EC 5.4.2.10), 14), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 11), Hydrox- ymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 10), Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 8), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 8), 4 Fructose-1; 2C6-bisphosphatase; 2C type I (EC 3.1.3.11), 23), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 14), (DNA-directed RNA poly- merase beta subunit (EC 2.7.7.6), 7), ATP synthase beta chain (EC 3.6.3.14), 7), Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 6), 5 (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 11), Alanyl-tRNA synthetase (EC 6.1.1.7), 11), Gamma- aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 8), N-acetylglucosamine-6-phosphate deacetylase (EC 3.5.1.25), 8), Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 7), 6 Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 17), Glycerol kinase (EC 2.7.1.30), 12), Adenosylmethionine- 8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 9), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 7), Methylglyoxal synthase (EC 4.2.3.3), 7), 7 DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 23), Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 17), Peptide chain release factor N(5)-glutamine 
methyltransferase (EC 2.1.1.297), 6), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 5), DNA polymerase I (EC 2.7.7.7), 5), 8 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 7), Exodeoxyribonuclease V beta chain (EC 3.1.11.5), 6), Acetolactate synthase large subunit (EC 2.2.1.6), 6), Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 6), 9 NADH dehydrogenase (EC 1.6.99.3), 8), D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 5), A/G-specific adenine glycosylase (EC 3.2.2.-), 4), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 4), Glucans biosynthesis glucosyltransferase H (EC 2.4.1.-), 4), 10 Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 12), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 8), Homoserine O-acetyltransferase (EC 2.3.1.31), 7), Homoserine dehydrogenase (EC 1.1.1.3), 6), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 5))

Carbon-L-Aspartic-Acid 1 Membrane N (EC 3.4.11.2), 5), Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 4), Multi- modular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), Ribonuclease HI (EC 3.1.26.4), 3), Glucosamine-6-phosphate deaminase (EC 3.5.99.6), 3), 2 DNA polymerase III beta subunit (EC 2.7.7.7), 6), DNA gyrase subunit A (EC 5.99.1.3), 5), Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), 3 tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Pyruvate; 2Cphos- phate dikinase (EC 2.7.9.1), 3), Adenylosuccinate synthetase (EC 6.3.4.4), 2), (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 2), 4 Adenosylhomocysteinase (EC 3.3.1.1), 4), Ribosomal large subunit pseudouridine synthase F (EC 5.4.99.21), 3), ATP synthase alpha chain (EC 3.6.3.14), 3), Tyrosyl-tRNA synthetase (EC 6.1.1.1), 3), Glucose 1-dehydrogenase (EC 1.1.1.47), 3), 5 Apolipoprotein N-acyltransferase (EC 2.3.1.-), 4), 4-hydroxy-tetrahydrodipicolinate reductase (EC 1.17.1.8), 4), 1-hydroxy-2-methyl-2- (E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 3), Sensor protein comP (EC 2.7.1.-), 3), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 3), 6 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 12), Threonine synthase (EC 4.2.3.1), 4), Molybdopterin molybdenumtransferase (EC 2.10.1.1), 4), 4-aminobutyrate aminotransferase (EC 2.6.1.19), 4), Signal transduction histidine kinase CheA (EC 2.7.3.-), 4), 7 Kojibiose phosphorylase (EC 2.4.1.230), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), Cell division protein FtsH (EC 3.4.24.-), 3), Cysteine desulfurase (EC 2.8.1.7) ; 3D¿ SufS, 3), tRNA-guanine transglycosylase (EC 2.4.2.29), 3), 8 Adenylosuccinate lyase (EC 4.3.2.2) @ 
SAICAR lyase (EC 4.3.2.2), 4), Alcohol dehydrogenase (EC 1.1.1.1), 3), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 3), SSU rRNA (adenine(1518)-N(6)/adenine(1519)-N(6))-dimethyltransferase (EC 2.1.1.182), 3), 5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 3), 9 Enolase (EC 4.2.1.11), 15), Thymidylate synthase (EC 2.1.1.45), 6), (Adenosine (5)-pentaphospho-(5)-adenosine pyrophosphohydrolase (EC 3.6.1.-), 3), DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 2), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl- tRNA(Asn) synthetase (EC 6.1.1.23), 2), 10 Methylmalonyl-CoA mutase (EC 5.4.99.2), 9), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 5), Glycogen phosphorylase (EC 2.4.1.1), 5), Molybdopterin-synthase adenylyltransferase (EC 2.7.7.80), 3), Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 3))

Carbon-Butylamine-sec 1 Aspartokinase (EC 2.7.2.4), 5), Urocanate hydratase (EC 4.2.1.49), 5), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 5), DNA primase (EC 2.7.7.-), 5), Alanyl-tRNA synthetase (EC 6.1.1.7), 5), 2 Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 6), beta-N-acetylglucosaminidase (EC 3.2.1.52), 4), Carbon monoxide dehydrogenase small chain (EC 1.2.99.2), 3), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 3), Glycerol kinase (EC 2.7.1.30), 3), 3 Nitrite reductase NAD(P)H large subunit (EC 1.7.1.4), 9), Methionyl-tRNA synthetase (EC 6.1.1.10), 8), N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 7), dTDP-glucose 4; 2C6-dehydratase (EC 4.2.1.46), 6), Ribosomal large subunit pseudouridine synthase C (EC 5.4.99.24), 6), 4 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 7), Valyl-tRNA synthetase (EC 6.1.1.9), 5), 23S rRNA (cytosine(1962)-C(5))- methyltransferase (EC 2.1.1.191), 5), Ribonuclease III (EC 3.1.26.3), 4), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 4), 5 Leucyl-tRNA synthetase (EC 6.1.1.4), 9), Dihydroxy-acid dehydratase (EC 4.2.1.9), 7), Glucosamine–fructose-6-phosphate aminotrans- ferase isomerizing (EC 2.6.1.16), 5), Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 4), Histidinol dehydrogenase (EC 1.1.1.23), 4), 6 Ribonucleotide reductase of class Ia (aerobic); 2C alpha subunit (EC 1.17.4.1), 6), ATP synthase alpha chain (EC 3.6.3.14), 5), Gamma- glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 4), Cobalt-precorrin-7 (C5)-methyltransferase (EC 2.1.1.289) / Cobalt-precorrin-6B C15-methyltransferase decarboxylating (EC 2.1.1.196), 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), 7 3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 6), Cell division protein FtsH (EC 3.4.24.-), 5), Glycerol-3-phosphate de- hydrogenase NAD(P)+ (EC 1.1.1.94), 5), 23S rRNA 
(guanine(1835)-N(2))-methyltransferase (EC 2.1.1.174), 4), Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 4), 8 Aldehyde dehydrogenase (EC 1.2.1.3), 19), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 17), Crossover junction endodeoxyri- bonuclease RuvC (EC 3.1.22.4), 10), Methionine aminopeptidase (EC 3.4.11.18), 7), Methylcrotonyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.4), 7), 9 Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 10), Aspartokinase (EC 2.7.2.4), 6), Electron transfer flavoprotein-ubiquinone oxi- doreductase (EC 1.5.5.1), 6), ATP synthase alpha chain (EC 3.6.3.14), 5), 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157); 3B 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35), 5), 10 Adenylosuccinate synthetase (EC 6.3.4.4), 9), NAD kinase (EC 2.7.1.23), 9), Arginyl-tRNA synthetase (EC 6.1.1.19), 8), Triosephosphate isomerase (EC 5.3.1.1), 6), Valyl-tRNA synthetase (EC 6.1.1.9), 6))

Carbon-a-Methyl-D-Glucoside 1 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 11), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 10), Dihydrolipoamide acetyl- transferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 8), Thioredoxin reductase (EC 1.8.1.9), 7), 2; 2C4-dienoyl-CoA reductase NADPH (EC 1.3.1.34), 5), 2 Catalase KatE (EC 1.11.1.6), 9), Catalase KatE-intracellular protease (EC 1.11.1.6), 9), DEAD-box ATP-dependent RNA helicase DeaD (; 3D CshA) (EC 3.6.4.13), 8), 23S rRNA (adenine(2503)-C(2))-methyltransferase @ tRNA (adenine(37)-C(2))-methyltransferase (EC 2.1.1.192), 6), Cytosol aminopeptidase PepA (EC 3.4.11.1), 5), 3 Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 15), DNA gyrase subunit A (EC 5.99.1.3), 10), Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 8), Enoyl-CoA hydratase (EC 4.2.1.17) @ Enoyl-CoA hydratase EchA5 (EC 4.2.1.17), 7), 3-oxoacyl-acyl-carrier-protein synthase; 2C KASII (EC 2.3.1.179), 6), 4 NAD-dependent dihydropyrimidine dehydrogenase subunit PreA (EC 1.3.1.1), 7), Isocitrate dehydrogenase NADP (EC 1.1.1.42), 7), Phosphoribosylglycinamide formyltransferase 2 (EC 2.1.2.-), 6), Membrane-bound lytic murein transglycosylase F (EC 4.2.2.n1), 5), Branched-chain alpha-keto acid dehydrogenase; 2C E1 component; 2C beta subunit (EC 1.2.4.4), 5), 5 GMP synthase glutamine-hydrolyzing; 2C amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing; 2C ATP pyrophosphatase subunit (EC 6.3.5.2), 28), Dihydroxy-acid dehydratase (EC 4.2.1.9), 7), Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 6), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 6), Gamma- aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 6), 6 Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 5), Lysine 2; 2C3-aminomutase (EC 5.4.3.2), 4), Acetyl-coenzyme A 
carboxyl transferase alpha chain (EC 6.4.1.2), 4), Valyl-tRNA synthetase (EC 6.1.1.9), 3), Transketolase (EC 2.2.1.1), 3), 7 Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 8), S-adenosylmethionine synthetase (EC 2.5.1.6), 8), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 7), Alanyl-tRNA synthetase (EC 6.1.1.7), 6), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 6), 8 Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 6), Thioredoxin reductase (EC 1.8.1.9), 5), Phosphoenolpyruvate carboxylase (EC 4.1.1.31), 5), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5), Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 4), 9 NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 6), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 6), DNA topoisomerase I (EC 5.99.1.2), 6), ATP-dependent protease La (EC 3.4.21.53) Type I, 4), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 4), 10 Tyrosyl-tRNA synthetase (EC 6.1.1.1), 7), Uptake hydrogenase large subunit (EC 1.12.99.6), 7), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 6), 4-hydroxythreonine-4-phosphate dehydrogenase (EC 1.1.1.262), 5), (Deoxyuridine 5-triphosphate nucleotidohydrolase (EC 3.6.1.23), 4))

Nitrogen-D-Glucosamine 1 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Pyruvate formate-lyase (EC 2.3.1.54), 2), Neopullulanase (EC 3.2.1.135), 2), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 2), Threonine synthase (EC 4.2.3.1), 2), 2 Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 2), Tetrachloroethene reductive dehalogenase PceA (EC 1.97.1.8), 2), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 1), Uracil phosphoribosyltransferase (EC 2.4.2.9), 1), Protein phosphatase 2C (EC 3.1.3.16), 1), 3 (5-methylthioadenosine phosphorylase (EC 2.4.2.28), 14), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 8), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 8), N- acetylglucosamine-6-phosphate deacetylase (EC 3.5.1.25), 5), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 4), 4 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), Methionyl-tRNA synthetase (EC 6.1.1.10), 3), Dihydrolipoamide succinyl- transferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 3), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 2), Phenylalanyl-tRNA synthetase alpha chain (EC 6.1.1.20), 2), 5 Alkaline phosphatase (EC 3.1.3.1), 1), 2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase I alpha (EC 2.5.1.54), 1), Enolase (EC 4.2.1.11), 1), 6-phosphogluconolactonase (EC 3.1.1.31); 2C eukaryotic type, 1), Methylphosphotriester-DNA–protein-cysteine S- methyltransferase (EC 2.1.1.n11) / DNA-3-methyladenine glycosylase II (EC 3.2.2.21), 1), 6 Ubiquinol-cytochrome C reductase iron-sulfur subunit (EC 1.10.2.2), 8), Formiminoglutamic iminohydrolase (EC 3.5.3.13), 3), 2- methylcitrate dehydratase (EC 4.2.1.79), 3), Pyruvate dehydrogenase E1 component (EC 1.2.4.1), 3), tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 2), 7 (EC 5.4.2.2), 1), DNA polymerase III beta subunit (EC 2.7.7.7), 1), Para-aminobenzoate synthase; 2C ami- nase component (EC 2.6.1.85), 1), 
Type I restriction-modification system; 2C DNA-methyltransferase subunit M (EC 2.1.1.72), 1), Methylphosphotriester-DNA–protein-cysteine S-methyltransferase (EC 2.1.1.n11) / DNA-3-methyladenine glycosylase II (EC 3.2.2.21), 1), 8 Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 16), Glutamate-1-semialdehyde 2; 2C1-aminomutase (EC 5.4.3.8), 9), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 5), Valyl-tRNA synthetase (EC 6.1.1.9), 5), Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 5), 9 S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17), 6), DEAD-box ATP-dependent RNA helicase DeaD (; 3D CshA) (EC 3.6.4.13), 6), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 4), Dihydropteroate synthase (EC 2.5.1.15), 4), Glucose-6-phosphate isomerase (EC 5.3.1.9), 3), 10 Ribonuclease HII (EC 3.1.26.4), 4), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61) / 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2) @ 2-oxoglutarate decarboxylase (EC 4.1.1.71) @ 2-hydroxy- 3-oxoadipate synthase (EC 2.2.1.5), 3), Polyphosphate kinase (EC 2.7.4.1), 3), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 3), Alkaline phosphatase (EC 3.1.3.1), 2))

Nitrogen-Nitrite 1 Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 9), Histidinol-phosphatase alternative form (EC 3.1.3.15), 5), (Guanosine-3; 2C5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5); 2C (p)ppGpp synthetase II, 5), Precorrin-8X methylmutase (EC 5.4.99.61), 4), 3-dehydroquinate synthase (EC 4.2.3.4), 4), 2 ATP synthase beta chain (EC 3.6.3.14), 10), NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 9), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 7), NADP-specific glutamate dehydrogenase (EC 1.4.1.4), 5), Threonine synthase (EC 4.2.3.1), 3), 3 Adenosylhomocysteinase (EC 3.3.1.1), 12), Pyrimidine-nucleoside phosphorylase (EC 2.4.2.2), 9), DNA gyrase subunit B (EC 5.99.1.3), 8), Isocitrate lyase (EC 4.1.3.1), 7), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 6), 4 ATP synthase beta chain (EC 3.6.3.14), 23), Enolase (EC 4.2.1.11), 7), Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2) / Acetyl-coenzyme A carboxyl transferase beta chain (EC 6.4.1.2); 3B Propionyl-CoA carboxylase beta chain (EC 6.4.1.3); 3B Methylmalonyl-CoA decarboxylase; 2C alpha chain (EC 4.1.1.41), 6), NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 4), Proline iminopeptidase (EC 3.4.11.5), 4), 5 6-phosphofructokinase (EC 2.7.1.11), 28), Topoisomerase IV subunit A (EC 5.99.1.-), 4), Undecaprenyl-phosphate alpha-N- acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), Respiratory nitrate reductase alpha chain (EC 1.7.99.4), 4), DNA ligase (NAD(+)) (EC 6.5.1.2), 3), 6 Allophanate hydrolase 2 subunit 2 (EC 3.5.1.54), 10), Aconitate hydratase (EC 4.2.1.3) @ 2-methylisocitrate dehydratase (EC 4.2.1.99), 7), Valyl-tRNA synthetase (EC 6.1.1.9), 6), Dihydroorotase (EC 3.5.2.3), 6), Transketolase (EC 2.2.1.1), 6), 7 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), Dihydroxy-acid dehydratase (EC 4.2.1.9), 6), 
3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 5), GMP synthase glutamine-hydrolyzing; 2C amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing; 2C ATP pyrophosphatase subunit (EC 6.3.5.2), 5), Uracil-DNA glycosylase; 2C family 1 (EC 3.2.2.27), 5), 8 Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Potassium-transporting ATPase A chain (EC 3.6.3.12) (TC 3.A.3.7.1), 5), Selenide; 2Cwater dikinase (EC 2.7.9.3), 4), Prolyl (EC 3.4.21.26), 4), Ribonuclease III (EC 3.1.26.3), 3), 9 Arginyl-tRNA synthetase (EC 6.1.1.19), 6), Omega-–pyruvate aminotransferase (EC 2.6.1.18), 6), Aspartate aminotransferase (EC 2.6.1.1), 4), ATP synthase alpha chain (EC 3.6.3.14), 4), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 4), 10 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 11), 3-isopropylmalate dehydrogenase (EC 1.1.1.85), 11), Long-chain- fatty-acid–CoA ligase (EC 6.2.1.3), 11), Fumarate hydratase class I (EC 4.2.1.2), 8), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 6))

Carbon-L-Glutamic-Acid 1 Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 18), Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 6), Enolase (EC 4.2.1.11), 5), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Inositol-1-phosphate synthase (EC 5.5.1.4), 4), 2 Branched-chain amino acid aminotransferase (EC 2.6.1.42), 8), 6; 2C7-dimethyl-8-ribityllumazine synthase (EC 2.5.1.78), 5), Dipeptidyl carboxypeptidase Dcp (EC 3.4.15.5), 5), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), 3 Aconitate hydratase (EC 4.2.1.3), 6), Signal transduction histidine kinase CheA (EC 2.7.3.-), 5), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 4), Pyruvate dehydrogenase E1 component beta subunit (EC 1.2.4.1), 4), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 3), 4 Calcium-transporting ATPase (EC 3.6.3.8), 4), Uridine monophosphate kinase (EC 2.7.4.22), 3), Glucosamine–fructose-6-phosphate amino- transferase isomerizing (EC 2.6.1.16), 3), Citrate synthase (si) (EC 2.3.3.1), 3), (Pyridoxal 5-phosphate synthase (glutamine hydrolyzing); 2C synthase subunit (EC 4.3.3.6), 3), 5 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 27), Adenylate kinase (EC 2.7.4.3), 10), Carboxyl-terminal protease (EC 3.4.21.102), 7), Fumarate hydratase class II (EC 4.2.1.2), 4), (Deoxyuridine 5-triphosphate nucleotidohydrolase (EC 3.6.1.23), 3), 6 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 4), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 3), Pyruvate; 2Cphosphate dikinase (EC 2.7.9.1), 3), Alternative cytochrome c oxidase polypeptide CoxP (EC 1.9.3.1), 2), 
7 Arginyl-tRNA synthetase (EC 6.1.1.19), 4), Cysteine desulfurase (EC 2.8.1.7), 2), Porphobilinogen deaminase (EC 2.5.1.61), 2), Poly(glycerol-phosphate) alpha-glucosyltransferase (EC 2.4.1.52), 2), ATP-dependent protease La (EC 3.4.21.53) Type I, 1), 8 beta-glucosidase (EC 3.2.1.21), 7), Alanyl-tRNA synthetase (EC 6.1.1.7), 7), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 5), Cytidylate kinase (EC 2.7.4.25), 4), 9 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 3), Leucyl-tRNA synthetase (EC 6.1.1.4), 3), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 3), Alkaline phosphatase (EC 3.1.3.1), 2), Aspartate aminotransferase (EC 2.6.1.1), 2), 10 Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 2), Dihydroxy-acid dehydratase (EC 4.2.1.9), 2), Inorganic pyrophosphatase (EC 3.6.1.1), 2), Cysteine desulfurase (EC 2.8.1.7), 2), Ribonucleotide reductase of class Ia (aerobic); 2C beta subunit (EC 1.17.4.1), 2))

Carbon-Glycyl-L-Aspartic-Acid 1 5-dehydro-4-deoxyglucarate dehydratase (EC 4.2.1.41), 8), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61) / 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2) @ 2-oxoglutarate decarboxylase (EC 4.1.1.71) @ 2-hydroxy-3-oxoadipate synthase (EC 2.2.1.5), 7), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 6), DNA ligase (NAD(+)) (EC 6.5.1.2), 4), Dihydroorotase (EC 3.5.2.3), 4), 2 Assimilatory nitrate reductase large subunit (EC 1.7.99.4), 2), Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 1), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 1), Ribokinase (EC 2.7.1.15), 1), Threonyl-tRNA synthetase (EC 6.1.1.3), 1), 3 Trimethylamine-N-oxide reductase (EC 1.7.2.3), TorA, 8), Aminodeoxyfutalosine synthase (EC 2.5.1.120), 6), Cell division protein FtsH (EC 3.4.24.-), 5), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), Topoisomerase IV subunit B (EC 5.99.1.-), 4), 4 ATP-dependent protease La (EC 3.4.21.53) Type I, 5), Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), (EC 5.4.2.11), 4), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 4), Type I restriction-modification system, DNA-methyltransferase subunit M (EC 2.1.1.72), 4), 5 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 8), Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 8), Thioredoxin reductase (EC 1.8.1.9), 5), 16S rRNA (guanine(527)-N(7))-methyltransferase (EC 2.1.1.170), 4), Cell division protein FtsZ (EC 3.4.24.-), 4), 6 Amidophosphoribosyltransferase (EC 2.4.2.14), 9), Chemotaxis response regulator protein-glutamate methylesterase CheB (EC 3.1.1.61), 5), Assimilatory nitrate reductase large subunit (EC 1.7.99.4), 4), Ornithine aminotransferase (EC 2.6.1.13); Succinylornithine transaminase (EC 2.6.1.81); Acetylornithine aminotransferase (EC 2.6.1.11), 4), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 4), 7 6-phosphofructokinase (EC 2.7.1.11), 8), Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 6), Cell division protein FtsH (EC 3.4.24.-), 6), Homogentisate 1,2-dioxygenase (EC 1.13.11.5), 6), DNA primase (EC 2.7.7.-), 5), 8 Acetate kinase (EC 2.7.2.1), 5), tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase (EC 2.1.1.61) / FAD-dependent cmnm(5)s(2)U34 oxidoreductase, 5), Dihydroorotase (EC 3.5.2.3), 4), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 4), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), 9 ATP synthase gamma chain (EC 3.6.3.14), 8), Cystathionine beta-lyase (EC 4.4.1.8), 7), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), DNA polymerase III delta prime subunit (EC 2.7.7.7), 5), A/G-specific adenine glycosylase (EC 3.2.2.-), 4), 10 6-phosphofructokinase (EC 2.7.1.11), 5), Indolepyruvate oxidoreductase subunit IorB (EC 1.2.7.8), 4), Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), Homoserine kinase (EC 2.7.1.39), 3), Uridine monophosphate kinase (EC 2.7.4.22), 3))

Carbon-Caproic-Acid 1 Cyclohexadienyl dehydrogenase (EC 1.3.1.12)(EC 1.3.1.43), 6), Valyl-tRNA synthetase (EC 6.1.1.9), 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), DNA polymerase III polC-type (EC 2.7.7.7), 3), NAD kinase (EC 2.7.1.23), 3), 2 Phosphoglucosamine mutase (EC 5.4.2.10), 13), Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 12), Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 9), D-arabinose 5-phosphate isomerase (EC 5.3.1.13), 8), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 8), 3 (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 12), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 6), Argininosuccinate synthase (EC 6.3.4.5), 5), Methionyl-tRNA synthetase (EC 6.1.1.10), 4), ATP-dependent protease La (EC 3.4.21.53) Type I, 3), 4 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 9), 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (EC 2.7.7.60), 5), DNA gyrase subunit B (EC 5.99.1.3), 5), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 4), Guanylate kinase (EC 2.7.4.8), 4), 5 Proline iminopeptidase (EC 3.4.11.5), 12), Cell division protein FtsH (EC 3.4.24.-), 6), Epoxide hydrolase (EC 3.3.2.9), 5), ATP synthase alpha chain (EC 3.6.3.14), 4), Tagatose 1,6-bisphosphate aldolase (EC 4.1.2.40), 4), 6 Urocanate hydratase (EC 4.2.1.49), 25), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 11), Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 9), Glycerol kinase (EC 2.7.1.30), 8), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 8), 7 Argininosuccinate lyase (EC 4.3.2.1), 7), (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 5), Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 4), (5-methylthioadenosine nucleosidase (EC 3.2.2.16) @ S-adenosylhomocysteine nucleosidase (EC 3.2.2.9), 4), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4), 8 Isocitrate dehydrogenase NADP (EC 1.1.1.42); Monomeric isocitrate dehydrogenase NADP (EC 1.1.1.42), 8), Topoisomerase IV subunit B (EC 5.99.1.-), 7), 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 5), Glucose-1-phosphate thymidylyltransferase (EC 2.7.7.24), 5), NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 5), 9 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 15), NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 7), NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 7), Quinolinate phosphoribosyltransferase decarboxylating (EC 2.4.2.19), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), 10 S-adenosylmethionine synthetase (EC 2.5.1.6), 8), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 6), NAD synthetase (EC 6.3.1.5) / Glutamine amidotransferase chain of NAD synthetase, 4), Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 4), Homoserine kinase (EC 2.7.1.39), 4))

Carbon-a-Keto-Valeric-Acid 1 Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 29), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 8), Pyrimidine-nucleoside phosphorylase (EC 2.4.2.2), 7), Threonyl-tRNA synthetase (EC 6.1.1.3), 7), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 7), 2 Aspartate-semialdehyde dehydrogenase (EC 1.2.1.11), 8), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 8), DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 5), Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 5), ATP synthase alpha chain (EC 3.6.3.14), 4), 3 IMP cyclohydrolase (EC 3.5.4.10) / Phosphoribosylaminoimidazolecarboxamide formyltransferase (EC 2.1.2.3), 10), Non-heme chloroperoxidase (EC 1.11.1.10), 6), Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 5), Aspartate aminotransferase (AspB-4) (EC 2.6.1.1), 5), Replicative DNA helicase (DnaB) (EC 3.6.4.12), 5), 4 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 12), (EC 5.4.2.7), 10), Acetyl-CoA synthetase (EC 6.2.1.1), 9), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 8), Glucokinase (EC 2.7.1.2), 8), 5 Enoyl-CoA hydratase isoleucine degradation (EC 4.2.1.17) / 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) / 3-hydroxybutyryl-CoA epimerase (EC 5.1.2.3), 10), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 7), Adenylate cyclase (EC 4.6.1.1), 7), Fumarate hydratase class II (EC 4.2.1.2), 6), Phosphonoacetate hydrolase (EC 3.11.1.2), 6), 6 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 10), Mg(2+) transport ATPase, P-type (EC 3.6.3.2), 5), Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 5), 2-isopropylmalate synthase (EC 2.3.3.13), 4), N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 4), 7 O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 6), DNA gyrase subunit A (EC 5.99.1.3), 6), Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 4), Glycerol kinase (EC 2.7.1.30), 4), (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 4), 8 Quinolinate synthetase (EC 2.5.1.72), 9), 6,7-dimethyl-8-ribityllumazine synthase (EC 2.5.1.78), 6), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), L-aspartate oxidase (EC 1.4.3.16), 5), N5-carboxyaminoimidazole ribonucleotide synthase (EC 6.3.4.18), 5), 9 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 6), Acetyl-CoA synthetase (EC 6.2.1.1), 6), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 6), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 6), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 5), 10 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 14), DNA ligase (NAD(+)) (EC 6.5.1.2), 6), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5), Ribonuclease E (EC 3.1.26.12), 4), 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157); 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35), 4))

Nitrogen-Gly-Asn 1 DNA ligase (NAD(+)) (EC 6.5.1.2), 3), Meso-diaminopimelate D-dehydrogenase (EC 1.4.1.16), 2), Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 2), ATP synthase alpha chain (EC 3.6.3.14), 2), Ribulose bisphosphate carboxylase large chain (EC 4.1.1.39), 1), 2 D-arabinose 5-phosphate isomerase (EC 5.3.1.13), 6), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 6), Threonyl-tRNA synthetase (EC 6.1.1.3), 5), Probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133), 4), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 4), 3 tRNA(Ile)-lysidine synthetase (EC 6.3.4.19), 4), DNA ligase (NAD(+)) (EC 6.5.1.2), 3), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 3), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 2), Riboflavin synthase eubacterial/eukaryotic (EC 2.5.1.9), 2), 4 Glutamate synthase NADPH large chain (EC 1.4.1.13), 3), NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 2), Methylcrotonyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.4), 2), 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase, gamma subunit (EC 1.2.7.-) / 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase, alpha subunit (EC 1.2.7.-), 2), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 1), 5 Dihydroxy-acid dehydratase (EC 4.2.1.9), 6), Enoyl-CoA hydratase (EC 4.2.1.17) / 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) / 3-hydroxybutyryl-CoA epimerase (EC 5.1.2.3), 3), Cell division protein FtsH (EC 3.4.24.-), 2), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 2), Valyl-tRNA synthetase (EC 6.1.1.9), 2), 6 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), 1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18), 4), Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 3), Acyl-CoA dehydrogenase, long-chain specific (EC 1.3.8.8), 2), NADH dehydrogenase (EC 1.6.99.3), 2), 7 Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 15), Undecaprenyl-diphosphatase (EC 3.6.1.27), 9), Cell division protein FtsH (EC 3.4.24.-), 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Butyryl-CoA dehydrogenase (EC 1.3.8.1), 3), 8 NAD(P)H dehydrogenase (quinone) 2 (EC 1.6.5.2), 10), Maltose phosphorylase (EC 2.4.1.8), 10), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 8), alpha-xylosidase (EC 3.2.1.177), 7), Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 6), 9 Cardiolipin synthetase (EC 2.7.8.-), 5), Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 4), Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 3), Leucyl-tRNA synthetase (EC 6.1.1.4), 3), Ornithine carbamoyltransferase (EC 2.1.3.3), 3), 10 Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 15), DNA polymerase III alpha subunit (EC 2.7.7.7), 5), PTS system, N-acetylglucosamine-specific IIC component / PTS system, N-acetylglucosamine-specific IIB component (EC 2.7.1.69), 5), Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 4), Adenosylcobinamide kinase (EC 2.7.1.156) / Adenosylcobinamide-phosphate guanylyltransferase (EC 2.7.7.62), 4))

Carbon-b-Phenylethylamine 1 Signal peptidase I (EC 3.4.21.89), 14), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 8), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 7), Exodeoxyribonuclease III (EC 3.1.11.2), 6), DNA polymerase III alpha subunit (EC 2.7.7.7), 6), 2 Alcohol dehydrogenase (EC 1.1.1.1), 7), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 6), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 6), Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 6), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 5), 3 Acetolactate synthase large subunit (EC 2.2.1.6), 17), UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 16), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 7), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 6), 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase, beta subunit (EC 1.2.7.-), 6), 4 ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 11), 2-methylcitrate dehydratase (EC 4.2.1.79), 7), Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 6), Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 5), ATP synthase alpha chain (EC 3.6.3.14), 5), 5 Para-aminobenzoate synthase, aminase component (EC 2.6.1.85), 7), Threonyl-tRNA synthetase (EC 6.1.1.3), 6), Replicative DNA helicase (DnaB) (EC 3.6.4.12), 5), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), Porphobilinogen synthase (EC 4.2.1.24), 4), 6 IMP cyclohydrolase (EC 3.5.4.10) / Phosphoribosylaminoimidazolecarboxamide formyltransferase (EC 2.1.2.3), 19), Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 7), Exopolyphosphatase (EC 3.6.1.11), 6), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 5), Dihydropteroate synthase (EC 2.5.1.15), 5), 7 Dihydroxy-acid dehydratase (EC 4.2.1.9), 10), Cell division protein FtsH (EC 3.4.24.-), 8), Respiratory nitrate reductase beta chain (EC 1.7.99.4), 7), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 6), BioD-like N-terminal domain / Phosphate acetyltransferase (EC 2.3.1.8), 4), 8 UDP-3-O-3-hydroxymyristoyl glucosamine N-acyltransferase (EC 2.3.1.191), 12), NAD kinase (EC 2.7.1.23), 7), Aspartokinase (EC 2.7.2.4), 6), Glucose-1-phosphate thymidylyltransferase (EC 2.7.7.24), 5), Octaprenyl diphosphate synthase (EC 2.5.1.90), 4), 9 Naphthoate synthase (EC 4.1.3.36), 6), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 5), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 5), L-serine dehydratase, alpha subunit (EC 4.3.1.17), 4), Diaminohydroxyphosphoribosylaminopyrimidine deaminase (EC 3.5.4.26) / 5-amino-6-(5-phosphoribosylamino)uracil reductase (EC 1.1.1.193), 4), 10 Asparagine synthetase glutamine-hydrolyzing (EC 6.3.5.4), 8), Glutamate synthase NADPH large chain (EC 1.4.1.13), 7), Lipid A biosynthesis lauroyl acyltransferase (EC 2.3.1.241), 6), DNA-3-methyladenine glycosylase (EC 3.2.2.20), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5))

Carbon-D-L-Malic-Acid 1 Anthranilate synthase, aminase component (EC 4.1.3.27), 5), Ribokinase (EC 2.7.1.15), 5), (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 4), Cysteinyl-tRNA synthetase (EC 6.1.1.16), 3), Aconitate hydratase 2 (EC 4.2.1.3), 3), 2 DNA topoisomerase I (EC 5.99.1.2), 4), Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 3), Acid phosphatase (EC 3.1.3.2), 2), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 2), Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 2), 3 Aconitate hydratase (EC 4.2.1.3), 6), 6-phosphogluconate dehydrogenase, decarboxylating (EC 1.1.1.44), 5), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 4), (R)-citramalate synthase (EC 2.3.1.182), 4), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 3), 4 DNA gyrase subunit B (EC 5.99.1.3), 9), L-lactate dehydrogenase (EC 1.1.1.27), 4), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 3), Enolase (EC 4.2.1.11), 3), 5 Carbamoyl-phosphate synthase small chain (EC 6.3.5.5), 34), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 8), Acyl-CoA:1-acyl-sn-glycerol-3-phosphate acyltransferase (EC 2.3.1.51), 5), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), 6 Deoxyribose-phosphate aldolase (EC 4.1.2.4), 7), Hydroxymethylglutaryl-CoA synthase (EC 2.3.3.10), 5), DNA polymerase III epsilon subunit (EC 2.7.7.7), 5), Glycerol-3-phosphate dehydrogenase NAD(P)+ (EC 1.1.1.94), 4), Phosphopantothenoylcysteine decarboxylase (EC 4.1.1.36) / Phosphopantothenoylcysteine synthetase (EC 6.3.2.5), 3), 7 UDP-N-acetylmuramoylalanyl-D-glutamate–2,6-diaminopimelate ligase (EC 6.3.2.13), 4), Enoyl-acyl-carrier-protein reductase FMN (EC 1.3.1.9), 3), N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38), 3), DNA topoisomerase I (EC 5.99.1.2), 3), Glycine dehydrogenase decarboxylating (glycine cleavage system P1 protein) (EC 1.4.4.2), 3), 8 DNA polymerase I (EC 2.7.7.7), 9), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Adenosylhomocysteinase (EC 3.3.1.1), 3), D-alanine–D-alanine ligase (EC 6.3.2.4), 3), Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 3), 9 Glycerol-3-phosphate dehydrogenase NAD(P)+ (EC 1.1.1.94), 6), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 3), Alkaline phosphatase (EC 3.1.3.1), 3), Malonate-semialdehyde dehydrogenase inositol (EC 1.2.1.18), 3), ATP synthase gamma chain (EC 3.6.3.14), 3), 10 2-acylglycerophosphoethanolamine acyltransferase (EC 2.3.1.40) / Acyl-acyl-carrier-protein synthetase (EC 6.2.1.20), 3), Seryl-tRNA synthetase (EC 6.1.1.11), 1), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 1), Alkaline phosphatase (EC 3.1.3.1), 1), Transketolase (EC 2.2.1.1), 1))

Carbon-D-Fructose 1 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 2), Tyrosine-protein kinase (EC 2.7.10.2), 2), ATP-dependent protease La (EC 3.4.21.53) Type I, 1), Seryl-tRNA synthetase (EC 6.1.1.11), 1), 3-methyl-2-oxobutanoate hydroxymethyltransferase (EC 2.1.2.11), 1), 2 DNA polymerase III alpha subunit (EC 2.7.7.7), 4), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 3), (16S rRNA (cytidine(1402)-2-O)-methyltransferase (EC 2.1.1.198), 3), Adenosylhomocysteinase (EC 3.3.1.1), 2), ATP synthase alpha chain-like protein (EC 3.6.3.14), 2), 3 Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 2), N-acyl-L-amino acid amidohydrolase (EC 3.5.1.14), 1), Mitochondrial processing peptidase-like protein (EC 3.4.24.64), 1), (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 1), 4 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), Enolase (EC 4.2.1.11), 4), Fumarate hydratase class I, aerobic (EC 4.2.1.2), 4), Ribose-phosphate pyrophosphokinase (EC 2.7.6.1), 3), (tRNA (cytidine(32)/uridine(32)-2-O)-methyltransferase (EC 2.1.1.200), 3), 5 ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 7), tRNA-guanine transglycosylase (EC 2.4.2.29), 6), Pyruvate,phosphate dikinase (EC 2.7.9.1), 5), Ribose-phosphate pyrophosphokinase (EC 2.7.6.1), 4), 4-hydroxyphenylpyruvate dioxygenase (EC 1.13.11.27), 4), 6 Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 5), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 1), Quinolinate synthetase (EC 2.5.1.72), 1), Carbamate kinase (EC 2.7.2.2), 1), Citrate lyase beta chain (EC 4.1.3.6), 1), 7 , 8 DNA polymerase I (EC 2.7.7.7), 8), Serine hydroxymethyltransferase (EC 2.1.2.1), 7), Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 4), Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 3), D-alanine–D-alanine ligase (EC 6.3.2.4), 3), 9 Undecaprenyl diphosphate synthase (EC 2.5.1.31), 2), Cytosine deaminase (EC 3.5.4.1), 2), Aldehyde dehydrogenase (EC 1.2.1.3), 2), Ornithine carbamoyltransferase (EC 2.1.3.3), 2), Lycopene beta-cyclase (EC 5.5.1.19), 2), 10 Agmatine deiminase (EC 3.5.3.12), 2), Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 2), Formate dehydrogenase-O, major subunit (EC 1.2.1.2), 2), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 2), ATP phosphoribosyltransferase (EC 2.4.2.17) => HisGl, 1))

Carbon-D-L-Citramalic-Acid 1 tRNA-guanine transglycosylase (EC 2.4.2.29), 6), Porphobilinogen synthase (EC 4.2.1.24), 4), Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 3), Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 3), Glutamate synthase NADPH small chain (EC 1.4.1.13), 3), 2 DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 9), Cell division protein FtsH (EC 3.4.24.-), 8), Xylulose-5-phosphate phosphoketolase (EC 4.1.2.9) @ Fructose-6-phosphate phosphoketolase (EC 4.1.2.22), 8), Nucleoside triphosphate pyrophosphohydrolase MazG (EC 3.6.1.8), 7), Cell division trigger factor (EC 5.2.1.8), 7), 3 Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 9), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 5), Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 5), Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 5), 4 GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 12), 3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 8), Aldehyde dehydrogenase (EC 1.2.1.3), 6), UDP-3-O-3-hydroxymyristoyl N-acetylglucosamine deacetylase (EC 3.5.1.108), 6), Alanyl-tRNA synthetase (EC 6.1.1.7), 5), 5 4-hydroxy-tetrahydrodipicolinate synthase (EC 4.3.3.7), 6), Methanol dehydrogenase large subunit protein (EC 1.1.2.7), 6), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), CTP synthase (EC 6.3.4.2), 4), Urocanate hydratase (EC 4.2.1.49), 4), 6 DNA ligase (NAD(+)) (EC 6.5.1.2), 4), Hydroxyacylglutathione hydrolase (EC 3.1.2.6), 4), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 3), Ribonucleotide reductase of class Ia (aerobic), beta subunit (EC 1.17.4.1), 3), Peptide chain release factor N(5)-glutamine methyltransferase (EC 2.1.1.297), 3), 7 Agmatinase (EC 3.5.3.11), 8), O-succinylbenzoate synthase (EC 4.2.1.113), 5), ATP-dependent protease La (EC 3.4.21.53) Type I, 4), Acetylornithine deacetylase (EC 3.5.1.16), 4), N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) / Glucosamine-1-phosphate N-acetyltransferase (EC 2.3.1.157), 4), 8 Proline iminopeptidase (EC 3.4.11.5), 9), Isoaspartyl aminopeptidase (EC 3.4.19.5) @ Asp-X , 4), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4), ATP-dependent protease La (EC 3.4.21.53) Type I, 3), Cell division protein FtsH (EC 3.4.24.-), 3), 9 UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 7), 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (EC 2.7.7.60), 6), tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 5), Chemotaxis protein methyltransferase CheR (EC 2.1.1.80), 4), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 4), 10 S-adenosylmethionine synthetase (EC 2.5.1.6), 25), Histidinol dehydrogenase (EC 1.1.1.23), 14), 3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 8), Uracil phosphoribosyltransferase (EC 2.4.2.9), 8), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 8))

Carbon-2-3-Butanone 1 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 7), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), Murein-DD-endopeptidase (EC 3.4.99.-), 4), Acetolactate synthase small subunit (EC 2.2.1.6), 4), 2 2-amino-3-ketobutyrate coenzyme A ligase (EC 2.3.1.29), 18), Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 18), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 12), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 7), Acetyl-CoA synthetase (EC 6.2.1.1), 7), 3 Malate dehydrogenase (EC 1.1.1.37), 5), Pyruvate formate-lyase (EC 2.3.1.54), 4), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 4), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 4), Molybdopterin-synthase adenylyltransferase (EC 2.7.7.80), 4), 4 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 5), DNA primase (EC 2.7.7.-), 5), Topoisomerase IV subunit A (EC 5.99.1.-), 4), Proline iminopeptidase (EC 3.4.11.5), 4), Glutamate synthase NADPH large chain (EC 1.4.1.13), 4), 5 Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 5), DNA topoisomerase I (EC 5.99.1.2), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 4), (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10), 3), 6 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 12), Pyruvate,phosphate dikinase (EC 2.7.9.1), 8), Histidinol dehydrogenase (EC 1.1.1.23), 7), Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8), 7), Endonuclease V (EC 3.2.2.17), 6), 7 Nitrite reductase NAD(P)H large subunit (EC 1.7.1.4), 9), Ribose-phosphate pyrophosphokinase (EC 2.7.6.1), 8), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 6), Methionyl-tRNA synthetase (EC 6.1.1.10), 6), UDP-N-acetylglucosamine 2-epimerase (EC 5.1.3.14), 5), 8 Membrane alanine aminopeptidase N (EC 3.4.11.2), 7), Uroporphyrinogen III decarboxylase (EC 4.1.1.37), 6), Glutamate synthase NADPH large chain (EC 1.4.1.13), 6), Ribonuclease E (EC 3.1.26.12), 5), Peptidase B (EC 3.4.11.23), 5), 9 NADH-ubiquinone oxidoreductase chain B (EC 1.6.5.3), 8), UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 7), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 4), 2-isopropylmalate synthase (EC 2.3.3.13), 4), 10 Ribosomal protein S12p Asp88 (E. coli) methylthiotransferase (EC 2.8.4.4), 19), Geranyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.5), 10), UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 7), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 6), NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 6))

Nitrogen-Ala-Asp 1 Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 6), Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 5), Alanylphosphatidylglycerol synthase (EC 2.3.2.11), 5), Periplasmic FeFe hydrogenase (EC 1.12.7.2), 5), Hydroxymethylglutaryl-CoA lyase (EC 4.1.3.4), 5), 2 Glycerate kinase (EC 2.7.1.31), 5), Isocitrate lyase (EC 4.1.3.1), 4), Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 4), DNA gyrase subunit A (EC 5.99.1.3), 4), 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 3), 3 Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 28), Pyruvate carboxylase (EC 6.4.1.1), 8), Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 8), Phosphoglycerate kinase (EC 2.7.2.3), 6), Cell division protein FtsH (EC 3.4.24.-), 5), 4 Cell division protein FtsZ (EC 3.4.24.-), 19), Alanyl-tRNA synthetase (EC 6.1.1.7), 13), DNA polymerase I (EC 2.7.7.7), 7), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 7), Threonine dehydratase biosynthetic (EC 4.3.1.19), 7), 5 Catalase KatE (EC 1.11.1.6), 5), 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117), 4), Spore germination endopeptidase Gpr (EC 3.4.24.78), 3), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 3), GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 3), 6 Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 10), Ribose 5-phosphate isomerase A (EC 5.3.1.6), 8), Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), Alcohol dehydrogenase (EC 1.1.1.1), 3), 7 Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 8), DNA ligase (NAD(+)) (EC 6.5.1.2), 7), Topoisomerase IV subunit A (EC 5.99.1.-), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Cell division protein FtsH (EC 3.4.24.-), 4), 8 Inorganic pyrophosphatase (EC 3.6.1.1), 11), Cysteine synthase (EC 2.5.1.47), 6), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 5), DNA gyrase subunit A (EC 5.99.1.3), 5), Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 4), 9 Methylthioribose-1-phosphate isomerase (EC 5.3.1.23), 6), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 4), Adenylosuccinate synthetase (EC 6.3.4.4), 4), 3-oxoacyl-acyl-carrier-protein synthase (EC 2.3.1.41), 4), ATP synthase gamma chain (EC 3.6.3.14), 4), 10 N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 10), ATP synthase beta chain (EC 3.6.3.14), 8), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 7), S-adenosylmethionine synthetase (EC 2.5.1.6), 7))

Carbon-Maltose 1 NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 9), 5,10-methylenetetrahydrofolate reductase (EC 1.5.1.20), 7), Cytochrome d ubiquinol oxidase subunit II (EC 1.10.3.-), 6), DNA polymerase III alpha subunit (EC 2.7.7.7), 4), Type I restriction-modification system, DNA-methyltransferase subunit M (EC 2.1.1.72), 4), 2 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 10), Phosphoenolpyruvate carboxylase (EC 4.1.1.31), 9), Foldase protein PrsA precursor (EC 5.2.1.8), 5), ATP-dependent protease La (EC 3.4.21.53) Type I, 4), Cell division protein FtsZ (EC 3.4.24.-), 4), 3 Ribonuclease E (EC 3.1.26.12), 7), Sarcosine oxidase alpha subunit (EC 1.5.3.1), 6), Adenylosuccinate synthetase (EC 6.3.4.4), 5), Dihydroorotase (EC 3.5.2.3), 5), Periplasmic FeFe hydrogenase (EC 1.12.7.2), 5), 4 Aspartyl-tRNA synthetase (EC 6.1.1.12), 6), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), ATP synthase alpha chain (EC 3.6.3.14), 4), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 4), ATP synthase gamma chain (EC 3.6.3.14), 4), 5 NADH-ubiquinone oxidoreductase chain I (EC 1.6.5.3), 9), 3-isopropylmalate dehydrogenase (EC 1.1.1.85), 6), Cell division protein FtsH (EC 3.4.24.-), 5), Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 5), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 4), 6 Methylphosphotriester-DNA–protein-cysteine S-methyltransferase (EC 2.1.1.n11) / DNA-3-methyladenine glycosylase II (EC 3.2.2.21), 5), Methylphosphotriester-DNA–protein-cysteine S-methyltransferase (EC 2.1.1.n11) / ADA regulatory protein / Methylated-DNA–protein-cysteine methyltransferase (EC 2.1.1.63), 5), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Valyl-tRNA synthetase (EC 6.1.1.9), 3), 7 Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 6), I (EC 5.4.99.5) / Prephenate dehydratase (EC 4.2.1.51), 3), Ribosomal protein L3 N(5)-glutamine methyltransferase (EC 2.1.1.298), 3), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 3), Glycogen phosphorylase (EC 2.4.1.1), 3), 8 Topoisomerase IV subunit A (EC 5.99.1.-), 12), Transketolase (EC 2.2.1.1), 5), DNA gyrase subunit A (EC 5.99.1.3), 5), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 4), 9 D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 6), Serine acetyltransferase (EC 2.3.1.30), 3), ADP-ribose pyrophosphatase (EC 3.6.1.13), 3), Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 2), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 2), 10 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), Phosphoenolpyruvate synthase (EC 2.7.9.2), 6), Glycyl-tRNA synthetase (EC 6.1.1.14), 4), Glutamine synthetase type I (EC 6.3.1.2), 3), UDP-N-acetylmuramate–alanine ligase (EC 6.3.2.8), 3))

Carbon-Capric-Acid 1 Potassium-transporting ATPase C chain (EC 3.6.3.12) (TC 3.A.3.7.1), 11), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 9), Sensor histidine kinase RcsC (EC 2.7.13.3), 6), DNA polymerase III alpha subunit (EC 2.7.7.7), 5), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 5), 2 UDP-N-acetylmuramoylalanyl-D-glutamate–2,6-diaminopimelate ligase (EC 6.3.2.13), 15), Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 9), Coproporphyrinogen III oxidase, aerobic (EC 1.3.3.3), 7), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 5), Phosphoglycerate kinase (EC 2.7.2.3), 5), 3 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 56), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 17), NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 7), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 6), Isocitrate lyase (EC 4.1.3.1), 5), 4 ATP synthase beta chain (EC 3.6.3.14), 9), N-formylglutamate deformylase (EC 3.5.1.68), 8), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 8), Mg(2+) transport ATPase, P-type (EC 3.6.3.2), 4), Fructose-1,6-bisphosphatase, type I (EC 3.1.3.11), 4), 5 2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase II (EC 2.5.1.54), 49), DNA gyrase subunit A (EC 5.99.1.3), 8), DNA polymerase I (EC 2.7.7.7), 7), NAD kinase (EC 2.7.1.23), 5), DNA topoisomerase I (EC 5.99.1.2), 5), 6 ATP phosphoribosyltransferase (EC 2.4.2.17) => HisGs, 12), Nitric-oxide reductase subunit B (EC 1.7.99.7), 7), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), 7 Aldehyde dehydrogenase (EC 1.2.1.3), 19), Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 10), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 8), GMP
synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 5), Leucyl-tRNA synthetase (EC 6.1.1.4), 5), 8 ATP synthase alpha chain (EC 3.6.3.14), 7), Arginyl-tRNA–protein transferase (EC 2.3.2.8), 6), Alanyl-tRNA synthetase (EC 6.1.1.7), 6), DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 5), Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 5), 9 Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 20), Ornithine carbamoyltransferase (EC 2.1.3.3), 5), Ribosomal large subunit pseudouridine synthase B (EC 5.4.99.22), 4), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 4), Thioredoxin reductase (EC 1.8.1.9), 4), 10 ATP synthase alpha chain (EC 3.6.3.14), 31), Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 7), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), Ribonuclease E (EC 3.1.26.12), 5), Aconitate hydratase 2 (EC 4.2.1.3) @ 2-methylisocitrate dehydratase (EC 4.2.1.99), 4))

Carbon-Oxalomalic-Acid 1 Phenylalanyl-tRNA synthetase alpha chain (EC 6.1.1.20), 12), Ribosomal protein S12p Asp88 (E. coli) methylthiotransferase (EC 2.8.4.4), 5), CCA tRNA nucleotidyltransferase (EC 2.7.7.72), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 5), 2 S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17), 9), DNA gyrase subunit A (EC 5.99.1.3), 8), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 7), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 7), GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 6), 3 Nucleoside diphosphate kinase (EC 2.7.4.6), 6), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 5), Indole-3-glycerol phosphate synthase (EC 4.1.1.48), 4), Glucose-6-phosphate 1-dehydrogenase (EC 1.1.1.49), 4), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), 4 Arginine deiminase (EC 3.5.3.6), 5), Threonine dehydratase biosynthetic (EC 4.3.1.19), 5), Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 5), NAD(P) transhydrogenase subunit beta (EC 1.6.1.2), 4), ATP-dependent protease La (EC 3.4.21.53) Type I, 3), 5 Cell division protein FtsZ (EC 3.4.24.-), 7), Cobyric acid synthase (EC 6.3.5.10), 6), tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 5), DNA polymerase III alpha subunit (EC 2.7.7.7), 5), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 4), 6 Coproporphyrinogen III oxidase, oxygen-independent (EC 1.3.99.22), 6), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 6), Adenylosuccinate synthetase (EC 6.3.4.4), 5), DNA ligase (NAD(+)) (EC 6.5.1.2), 3), Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 3), 7 Nitrilotriacetate monooxygenase component A (EC
1.14.13.-), 8), Carbamoyl-phosphate synthase small chain (EC 6.3.5.5), 7), N5-carboxyaminoimidazole ribonucleotide synthase (EC 6.3.4.18), 5), Isocitrate lyase (EC 4.1.3.1), 4), 3-ketoacyl-CoA thiolase (EC 2.3.1.16), 4), 8 DNA polymerase I (EC 2.7.7.7), 7), V-type ATP synthase subunit C (EC 3.6.3.14), 6), Topoisomerase IV subunit A (EC 5.99.1.-), 6), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 4), Glycerol kinase (EC 2.7.1.30), 4), 9 Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 9), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), Branched-chain alpha-keto acid dehydrogenase, E1 component, beta subunit (EC 1.2.4.4), 6), Alcohol dehydrogenase (EC 1.1.1.1), 5), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 5), 10 CTP synthase (EC 6.3.4.2), 10), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14) / Biotin carboxyl carrier protein of acetyl-CoA carboxylase; Propionyl-CoA carboxylase alpha chain (EC 6.4.1.3), 8), Ribose-phosphate pyrophosphokinase (EC 2.7.6.1), 7), Alcohol dehydrogenase (EC 1.1.1.1), 5), 6-phosphofructokinase (EC 2.7.1.11), 5))

Carbon-i-Erythritol 1 NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 14), Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 6), Topoisomerase IV subunit B (EC 5.99.1.-), 6), (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 6), 3-ketoacyl-CoA thiolase fadN-fadA-fadE operon (EC 2.3.1.16), 5), 2 Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1), 7), Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 5), beta-galactosidase (EC 3.2.1.23), 5), DNA ligase (NAD(+)) (EC 6.5.1.2), 4), L-seryl-tRNA(Sec) selenium transferase (EC 2.9.1.1), 4), 3 Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 12), Phenylalanyl-tRNA synthetase alpha chain (EC 6.1.1.20), 9), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 6), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 6), Cyclic beta-1,2-glucan synthase (EC 2.4.1.-), 6), 4 Ribose 5-phosphate isomerase B (EC 5.3.1.6), 11), Phenylalanyl-tRNA synthetase alpha chain (EC 6.1.1.20), 6), Dihydrofolate synthase (EC 6.3.2.12) @ Folylpolyglutamate synthase (EC 6.3.2.17), 6), Propionyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.3), 5), Undecaprenyl diphosphate synthase (EC 2.5.1.31), 5), 5 Cell division protein FtsH (EC 3.4.24.-), 9), Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 9), Methylglyoxal synthase (EC 4.2.3.3), 5), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 4), NAD-dependent malic enzyme (EC 1.1.1.38), 4), 6 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 24), Phosphoenolpyruvate synthase (EC 2.7.9.2), 8), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 6), 5-oxoprolinase (EC 3.5.2.9), HyuA-like domain / 5-oxoprolinase (EC 3.5.2.9), HyuB-like domain, 6), Glutamine
synthetase type I (EC 6.3.1.2), 5), 7 Glutamate synthase NADPH small chain (EC 1.4.1.13), 19), Porphobilinogen synthase (EC 4.2.1.24), 14), Glutamate synthase NADPH large chain (EC 1.4.1.13), 9), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), FKBP-type peptidyl-prolyl cis-trans isomerase SlyD (EC 5.2.1.8), 5), 8 DNA gyrase subunit A (EC 5.99.1.3), 6), UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 6), Cytosol aminopeptidase PepA (EC 3.4.11.1), 5), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 4), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 3), 9 Replicative DNA helicase (DnaB) (EC 3.6.4.12), 12), Precorrin-6Y C(5,15)-methyltransferase decarboxylating (EC 2.1.1.132), 10), Ribonucleotide reductase of class Ia (aerobic), beta subunit (EC 1.17.4.1), 8), Leucyl-tRNA synthetase (EC 6.1.1.4), 7), Fumarate hydratase class II (EC 4.2.1.2), 6), 10 2-hydroxy-3-oxopropionate reductase (EC 1.1.1.60), 9), Indole-3-glycerol phosphate synthase (EC 4.1.1.48) / Phosphoribosylanthranilate isomerase (EC 5.3.1.24), 5), DNA topoisomerase III (EC 5.99.1.2), 5), Aspartate aminotransferase (EC 2.6.1.1), 4), Biosynthetic Aromatic amino acid aminotransferase beta (EC 2.6.1.57), 4))

Carbon-D-Melezitose 1 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 12), Glutamate synthase NADPH large chain (EC 1.4.1.13), 9), Alanyl-tRNA synthetase (EC 6.1.1.7), 7), Ketol-acid reductoisomerase (EC 1.1.1.86), 7), NAD-specific glutamate dehydrogenase (EC 1.4.1.2), large form, 6), 2 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 5), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 5), Isovaleryl-CoA dehydrogenase (EC 1.3.8.4), 4), Acetyl-CoA synthetase (EC 6.2.1.1), 4), 3 ATP synthase alpha chain (EC 3.6.3.14), 8), Transketolase (EC 2.2.1.1), 8), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 4), L-seryl-tRNA(Sec) selenium transferase (EC 2.9.1.1), 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), 4 Glutamate synthase NADPH large chain (EC 1.4.1.13), 16), Histidyl-tRNA synthetase (EC 6.1.1.21), 9), Quinolinate phosphoribosyltransferase decarboxylating (EC 2.4.2.19), 9), Ribokinase (EC 2.7.1.15), 8), Alanyl-tRNA synthetase (EC 6.1.1.7), 8), 5 Acyl-carrier-protein acetyl transferase of FASI (EC 2.3.1.38) / Enoyl-acyl-carrier-protein reductase of FASI (EC 1.3.1.9) / 3-hydroxypalmitoyl-acyl-carrier-protein dehydratase of FASI (EC 4.2.1.61) / Acyl-carrier-protein malonyl transferase of FASI (EC 2.3.1.39) / Acyl-carrier-protein palmitoyl transferase of FASI (EC 2.3.1.-) / Acyl carrier protein of FASI / 3-oxoacyl-acyl-carrier-protein reductase of FASI (EC 1.1.1.100) / 3-oxoacyl-acyl-carrier-protein synthase of FASI (EC 2.3.1.41), 5), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), Cell division protein FtsH (EC 3.4.24.-), 4), Diaminopimelate decarboxylase (EC 4.1.1.20), 4), Diaminopimelate epimerase (EC 5.1.1.7), 4), 6 Acetolactate synthase large subunit (EC 2.2.1.6), 4), Lysine N-acyltransferase MbtK (EC 2.3.1.-) @ Siderophore synthetase small component, acetyltransferase, 3), (DNA-directed RNA polymerase beta
subunit (EC 2.7.7.6), 3), Diaminopimelate epimerase (EC 5.1.1.7), 3), Phytochrome, two-component sensor histidine kinase (EC 2.7.3.-), 3), 7 Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 9), Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 6), UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 5), Dihydroorotase (EC 3.5.2.3), 4), CDP-diacylglycerol–serine O-phosphatidyltransferase (EC 2.7.8.8), 4), 8 Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 9), Cobyric acid synthase (EC 6.3.5.10), 7), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 5), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 5), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 5), 9 Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 18), Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 11), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 7), Probable VANILLIN dehydrogenase oxidoreductase protein (EC 1.-.-.-), 6), 10 Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 6), Diaminohydroxyphosphoribosylaminopyrimidine deaminase (EC 3.5.4.26) / 5-amino-6-(5-phosphoribosylamino)uracil reductase (EC 1.1.1.193), 6), Transcriptional repressor of PutA and PutP / Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 6), Cell division protein FtsZ (EC 3.4.24.-), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5))

Carbon-D-Alanine 1 Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 8), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), Formate–tetrahydrofolate ligase (EC 6.3.4.3), 3), Aspartate–ammonia ligase (EC 6.3.1.1), 3), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 3), 2 (Pyridoxal 5-phosphate synthase (glutamine hydrolyzing), synthase subunit (EC 4.3.3.6), 13), ATP synthase F0 sector subunit a (EC 3.6.3.14), 9), Aspartate aminotransferase (EC 2.6.1.1), 7), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 6), Hypoxanthine-guanine phosphoribosyltransferase (EC 2.4.2.8), 4), 3 Cysteinyl-tRNA synthetase (EC 6.1.1.16), 8), Threonyl-tRNA synthetase (EC 6.1.1.3), 5), Fructose-1,6-bisphosphatase, GlpX type (EC 3.1.3.11) / Sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37), 4), DNA ligase (NAD(+)) (EC 6.5.1.2), 4), Methionyl-tRNA formyltransferase (EC 2.1.2.9), 4), 4 ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 16), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 8), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 5), Thioredoxin reductase (EC 1.8.1.9), 5), UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 5), 5 Glutamyl-tRNA synthetase (EC 6.1.1.17), 8), Cell division trigger factor (EC 5.2.1.8), 7), Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 5), Methionine aminopeptidase (EC 3.4.11.18), 4), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 3), 6 Seryl-tRNA synthetase (EC 6.1.1.11), 10), N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38), 4), 3-methyl-2-oxobutanoate hydroxymethyltransferase (EC 2.1.2.11), 3), Methylenetetrahydrofolate–tRNA-(uracil-5-)-methyltransferase TrmFO (EC 2.1.1.74), 3), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 3), 7 Argininosuccinate lyase (EC 4.3.2.1), 4), Histidyl-tRNA synthetase (EC 6.1.1.21), 4), Phenylalanyl-tRNA synthetase beta
chain (EC 6.1.1.20), 4), 4-hydroxy-tetrahydrodipicolinate reductase (EC 1.17.1.8), 3), Replicative DNA helicase (DnaB) (EC 3.6.4.12), 3), 8 Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 12), UDP-N-acetylmuramoyl-tripeptide–D-alanyl-D-alanine ligase (EC 6.3.2.10), 6), Penicillin acylase (EC 3.5.1.11), 5), Histidinol-phosphate aminotransferase (EC 2.6.1.9), 4), Ribulose-phosphate 3-epimerase (EC 5.1.3.1), 4), 9 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 18), Cell division protein FtsH (EC 3.4.24.-), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Lipoyl synthase (EC 2.8.1.8), 4), Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 4), 10 6-phospho-beta-glucosidase (EC 3.2.1.86), 3), Oligoendopeptidase F (EC 3.4.24.-), 3), Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 2), Acetolactate synthase large subunit (EC 2.2.1.6), 2), UDP-glucose 6-dehydrogenase (EC 1.1.1.22), 2))

Nitrogen-Nitrate 1 Cell division protein FtsH (EC 3.4.24.-), 9), Arginyl-tRNA synthetase (EC 6.1.1.19), 7), Phosphoglucosamine mutase (EC 5.4.2.10), 6), Undecaprenyl-diphosphatase (EC 3.6.1.27), 6), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), 2 Thioredoxin reductase (EC 1.8.1.9), 15), DNA polymerase III alpha subunit (EC 2.7.7.7), 11), Transketolase (EC 2.2.1.1), 7), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 7), S-adenosylmethionine synthetase (EC 2.5.1.6), 6), 3 Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 7), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Cyclic beta-1,2-glucan synthase (EC 2.4.1.-), 4), Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 3), 4 Cytochrome O ubiquinol oxidase subunit I (EC 1.10.3.-), 6), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 6), NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 5), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5), Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 5), 5 DNA polymerase III alpha subunit (EC 2.7.7.7), 24), Threonyl-tRNA synthetase (EC 6.1.1.3), 10), UDP-N-acetylmuramoylalanyl-D-glutamate–2,6-diaminopimelate ligase (EC 6.3.2.13), 10), DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 6), Alanyl-tRNA synthetase (EC 6.1.1.7), 5), 6 ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 8), Mercuric ion reductase (EC 1.16.1.1), 7), Enolase (EC 4.2.1.11), 5), Phosphoglycerate mutase (EC 5.4.2.11), 5), Ethanolamine ammonia-lyase heavy chain (EC 4.3.1.7), 5), 7 tRNA dimethylallyltransferase (EC 2.5.1.75), 8), Pyruvate dehydrogenase E1 component (EC 1.2.4.1), 6), Dihydroorotase (EC 3.5.2.3), 6), Poly(A) polymerase (EC 2.7.7.19), 6), Glutamine synthetase type I (EC 6.3.1.2), 5), 8 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 11), Dihydrolipoamide acetyltransferase component
of pyruvate dehydrogenase complex (EC 2.3.1.12), 5), Glutathione-independent formaldehyde dehydrogenase (EC 1.2.1.46), 5), Ribonuclease E (EC 3.1.26.12), 4), S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17), 4), 9 L-aspartate oxidase (EC 1.4.3.16), 15), N5-carboxyaminoimidazole ribonucleotide synthase (EC 6.3.4.18), 8), ATP phosphoribosyltransferase (EC 2.4.2.17) => HisGs, 6), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), Ribosomal protein L11 methyltransferase (EC 2.1.1.-), 4), 10 Isoleucyl-tRNA synthetase (EC 6.1.1.5), 12), 5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 8), (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 7), Tripeptide aminopeptidase (EC 3.4.11.4), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5))

Nitrogen-Ala-Glu 1 Threonine dehydratase biosynthetic (EC 4.3.1.19), 7), NADP-dependent malic enzyme (EC 1.1.1.40), 5), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Acyl-CoA hydrolase (EC 3.1.2.20), 4), Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 4), 2 Cell division protein FtsZ (EC 3.4.24.-), 17), Alanyl-tRNA synthetase (EC 6.1.1.7), 17), Cell division protein FtsH (EC 3.4.24.-), 11), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 10), Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 10), 3 Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 6), Dihydroorotase (EC 3.5.2.3), 4), Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 4), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 3), Homoserine kinase (EC 2.7.1.39), 3), 4 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 22), Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 14), UDP-N-acetylmuramate–alanine ligase (EC 6.3.2.8), 13), L-aspartate oxidase (EC 1.4.3.16), 7), 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117), 6), 5 Transketolase (EC 2.2.1.1), 17), Glutamate synthase NADPH large chain (EC 1.4.1.13), 11), Nitrilotriacetate monooxygenase component A (EC 1.14.13.-), 5), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Chemotaxis protein methyltransferase CheR (EC 2.1.1.80), 4), 6 Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 7), UDP-glucose 4-epimerase (EC 5.1.3.2), 6), Dihydroxy-acid dehydratase (EC 4.2.1.9), 6), Cell division protein FtsH (EC 3.4.24.-), 5), Diaminopimelate epimerase (EC 5.1.1.7), 4), 7 Hydroxymethylpyrimidine phosphate kinase ThiD (EC 2.7.4.7), 10), tRNA
(guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 6), NADH dehydrogenase (EC 1.6.99.3), 6), NADH-ubiquinone oxidoreductase chain F (EC 1.6.5.3), 6), Alkanesulfonate monooxygenase (EC 1.14.14.5), 5), 8 Mercuric ion reductase (EC 1.16.1.1), 5), Ribonuclease E (EC 3.1.26.12), 4), Aspartokinase (EC 2.7.2.4), 3), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 3), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 3), 9 Phosphoglycerate mutase (EC 5.4.2.11), 5), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), NADH-ubiquinone oxidoreductase chain F (EC 1.6.5.3), 4), Chorismate synthase (EC 4.2.3.5), 4), Acyl-acyl-carrier-protein–UDP-N-acetylglucosamine O-acyltransferase (EC 2.3.1.129), 4), 10 Phosphoglycerate kinase (EC 2.7.2.3), 27), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 18), Acetyl-CoA acetyltransferase (EC 2.3.1.9), 12), Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 11), 3-ketoacyl-CoA thiolase fadN-fadA-fadE operon (EC 2.3.1.16), 10))

Nitrogen-N-Acetyl-D-Glucosamine 1 Cytochrome c oxidase polypeptide II (EC 1.9.3.1), 3), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 2), Nicotinate phosphoribosyltransferase (EC 6.3.4.21), 2), Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 2), Glutamyl-tRNA synthetase (EC 6.1.1.17) @ Glutamyl-tRNA(Gln) synthetase (EC 6.1.1.24), 1), 2 Replicative DNA helicase (DnaB) (EC 3.6.4.12), 7), Tyrosyl-tRNA synthetase (EC 6.1.1.1), 5), Phosphoglucomutase (EC 5.4.2.2), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Nucleoside diphosphate kinase (EC 2.7.4.6), 4), 3 tRNA-dihydrouridine(20/20a) synthase (EC 1.3.1.91), 7), DNA gyrase subunit A (EC 5.99.1.3), 6), Homoserine dehydrogenase (EC 1.1.1.3), 5), Cytochrome c oxidase polypeptide II (EC 1.9.3.1), 5), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), 4 Cytosol aminopeptidase PepA (EC 3.4.11.1), 5), Methylcrotonyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.4), 2), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 1), ATP phosphoribosyltransferase (EC 2.4.2.17) => HisGl, 1), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 1), 5 Nicotinate-nucleotide adenylyltransferase (EC 2.7.7.18), 2), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 1), Chemotaxis response regulator protein-glutamate methylesterase CheB (EC 3.1.1.61), 1), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 1), tRNA pseudouridine(55) synthase (EC 5.4.99.25), 1), 6 Glucose-1-phosphate thymidylyltransferase (EC 2.7.7.24), 2), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 2), Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 2), ATP-dependent protease La (EC 3.4.21.53) Type I, 1), Aspartokinase (EC 2.7.2.4), 1), 7 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 11), Uridine phosphorylase (EC 2.4.2.3), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Arginyl-tRNA synthetase (EC 6.1.1.19), 3), N-succinyl-L,L-diaminopimelate desuccinylase (EC 3.5.1.18), 3), 8
Formate dehydrogenase-O, major subunit (EC 1.2.1.2), 3), Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 2), 3-ketoacyl-CoA thiolase (EC 2.3.1.16), 2), DNA polymerase I (EC 2.7.7.7), 2), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 1), 9 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 2), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 2), 3-methylitaconate Delta isomerase (EC 5.3.3.6), 1), UDP-galactofuranosyl transferase GlfT1 (EC 2.4.1.287), catalyzes initiation of cell wall galactan polymerization, 1), Histidinol-phosphatase alternative form (EC 3.1.3.15), 1), 10 Glucose-6-phosphate isomerase (EC 5.3.1.9), 13), Adenylosuccinate synthetase (EC 6.3.4.4), 11), Triosephosphate isomerase (EC 5.3.1.1), 10), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 7), Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 6))

Carbon-D-L-Octopamine 1 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase (EC 2.1.1.201), 10), Phosphoenolpyruvate synthase (EC 2.7.9.2), 8), NAD(P) transhydrogenase subunit beta (EC 1.6.1.2), 7), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), 2-hydroxy-3-oxopropionate reductase (EC 1.1.1.60), 4), 2 ATP synthase alpha chain (EC 3.6.3.14), 27), NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 12), NAD-specific glutamate dehydrogenase (EC 1.4.1.2), large form, 8), Glutamate synthase NADPH large chain (EC 1.4.1.13), 8), Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 7), 3 Leucyl-tRNA synthetase (EC 6.1.1.4), 8), Threonylcarbamoyl-AMP synthase (EC 2.7.7.87) / SUA5 domain with internal deletion, 4), Malonate-semialdehyde dehydrogenase inositol (EC 1.2.1.18), 4), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), Cell division protein FtsZ (EC 3.4.24.-), 4), 4 Phosphoglucosamine mutase (EC 5.4.2.10), 9), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 8), 6-phosphofructokinase (EC 2.7.1.11), 6), Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 6), (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 5), 5 (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 16), Alanyl-tRNA synthetase (EC 6.1.1.7), 8), Anaerobic dimethyl sulfoxide reductase chain A (EC 1.8.5.3), molybdopterin-binding domain, 7), Uridine monophosphate kinase (EC 2.7.4.22), 5), UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 5), 6 Threonyl-tRNA synthetase (EC 6.1.1.3), 8), Carboxyl-terminal protease (EC 3.4.21.102), 6), NAD(P)H-hydrate epimerase (EC 5.1.99.6) / ADP-dependent (S)-NAD(P)H-hydrate dehydratase (EC 4.2.1.136), 5), Thymidylate kinase (EC 2.7.4.9), 5), Cyclohexanone monooxygenase (EC 1.14.13.22), 5), 7 Choline dehydrogenase (EC 1.1.99.1), 9), Malonyl CoA-acyl carrier protein transacylase (EC 2.3.1.39), 8), Arogenate dehydrogenase (EC 1.3.1.43), 6), Carbamoyl-phosphate
synthase large chain (EC 6.3.5.5), 5), Valyl-tRNA synthetase (EC 6.1.1.9), 4), 8 NADP-dependent malic enzyme (EC 1.1.1.40), 9), NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 8), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 6), 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 6), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), 9 Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), (2E,6Z)-farnesyl diphosphate synthase (EC 2.5.1.68), 3), 10 Lysophospholipase (EC 3.1.1.5); Monoglyceride lipase (EC 3.1.1.23), 7), Enoyl-CoA hydratase (EC 4.2.1.17), 5), 16S rRNA (guanine(966)-N(2))-methyltransferase (EC 2.1.1.171), 5), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), Phosphoenolpyruvate synthase (EC 2.7.9.2), 4))

Nitrogen-Ala-Gly 1 Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 10), Potassium-transporting ATPase A chain (EC 3.6.3.12) (TC 3.A.3.7.1), 8), Aldehyde dehydrogenase (EC 1.2.1.3), 7), Succinate-semialdehyde dehydrogenase NAD (EC 1.2.1.24); Succinate-semialdehyde dehydrogenase NADP+ (EC 1.2.1.79), 6), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), 2 Carbamoyl-phosphate synthase small chain (EC 6.3.5.5), 12), Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 10), Phosphoglucomutase (EC 5.4.2.2), 7), Enoyl-acyl-carrier-protein reductase FMN (EC 1.3.1.9), 5), Aldehyde dehydrogenase (EC 1.2.1.3), 5), 3 NADP-specific glutamate dehydrogenase (EC 1.4.1.4), 3), Chorismate synthase (EC 4.2.3.5), 2), Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 2), Cysteine desulfurase (EC 2.8.1.7), 2), Sulfate adenylyltransferase subunit 2 (EC 2.7.7.4), 2), 4 Glucose-6-phosphate isomerase (EC 5.3.1.9), 9), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 7), SSU rRNA (adenine(1518)-N(6)/adenine(1519)-N(6))-dimethyltransferase (EC 2.1.1.182), 6), 3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 5), Cytochrome d ubiquinol oxidase subunit II (EC 1.10.3.-), 5), 5 Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 12), Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 5), Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 4), Thioredoxin reductase (EC 1.8.1.9), 4), Porphobilinogen synthase (EC 4.2.1.24), 4), 6 Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 8), ATP synthase alpha chain (EC 3.6.3.14), 4), Dihydroorotase (EC 3.5.2.3), 4), Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 4), (Adenosine (5)-pentaphospho-(5)-adenosine pyrophosphohydrolase (EC 3.6.1.-), 4), 7 Urease alpha subunit (EC 3.5.1.5), 23),
Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 8), Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 8), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 8), NAD(P)H-hydrate epimerase (EC 5.1.99.6) / ADP-dependent (S)-NAD(P)H-hydrate dehydratase (EC 4.2.1.136), 6), 8 Cell division protein FtsZ (EC 3.4.24.-), 21), Alanyl-tRNA synthetase (EC 6.1.1.7), 21), Cell division protein FtsH (EC 3.4.24.-), 10), Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 6), Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 6), 9 Alanyl-tRNA synthetase (EC 6.1.1.7), 9), Cell division protein FtsZ (EC 3.4.24.-), 8), Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 8), DNA gyrase subunit B (EC 5.99.1.3), 8), Ribonuclease E (EC 3.1.26.12), 7), 10 Adenosylhomocysteinase (EC 3.3.1.1), 7), Cobalt-precorrin-5B (C1)-methyltransferase (EC 2.1.1.195), 7), Enoyl-CoA hydratase (EC 4.2.1.17), 6), Pyruvate dehydrogenase E1 component (EC 1.2.4.1), 4), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4))

Nitrogen-Ala-Gln
1 (Cell division protein FtsH (EC 3.4.24.-), 14), (Alcohol dehydrogenase (EC 1.1.1.1); Acetaldehyde dehydrogenase (EC 1.2.1.10), 7), (UDP-3-O-3-hydroxymyristoyl glucosamine N-acyltransferase (EC 2.3.1.191), 7), (Alcohol dehydrogenase (EC 1.1.1.1), 6), (Lactaldehyde dehydrogenase involved in fucose or rhamnose utilization (EC 1.2.1.22), 6)
2 (Lysyl-tRNA synthetase (class I) (EC 6.1.1.6), 8), (Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 8), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 6), (Undecaprenyl diphosphate synthase (EC 2.5.1.31), 5), (Amidophosphoribosyltransferase (EC 2.4.2.14), 5)
3 (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 12), ((2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10), 9), (N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 7), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 4), (ATP-dependent protease La (EC 3.4.21.53) Type I, 3)
4 (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 30), (Betaine aldehyde dehydrogenase (EC 1.2.1.8), 12), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), (NAD(P) transhydrogenase subunit beta (EC 1.6.1.2), 7), (UDP-glucose 4-epimerase (EC 5.1.3.2), 7)
5 (Protein-PII uridylyltransferase (EC 2.7.7.59) / Protein-PII-UMP uridylyl-removing enzyme, 11), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 10), (3-ketoacyl-CoA thiolase fadN-fadA-fadE operon (EC 2.3.1.16), 9), (Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 6), (Exopolyphosphatase (EC 3.6.1.11), 5)
6 (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), (GDP-mannose 4,6-dehydratase (EC 4.2.1.47), 3), (dTDP-4-amino-4,6-dideoxygalactose transaminase (EC 2.6.1.59), 3), (Inositol-1-phosphate synthase (EC 5.5.1.4), 3), (tRNA-guanine transglycosylase (EC 2.4.2.29), 3)
7 (Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 10), (Aldehyde dehydrogenase (EC 1.2.1.3), 6), (Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 6), (Histidinol dehydrogenase (EC 1.1.1.23), 5), (N-succinyl-L,L-diaminopimelate desuccinylase (EC 3.5.1.18), 4)
8 (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 22), (Acetyl-CoA acetyltransferase (EC 2.3.1.9), 11), (Cell division protein FtsH (EC 3.4.24.-), 10), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 10), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 8)
9 (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 6), (ATP synthase gamma chain (EC 3.6.3.14), 5), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 4), (DNA gyrase subunit A (EC 5.99.1.3), 4), (Cystathionine beta-lyase (EC 4.4.1.8), 4)
10 (Pyruvate dehydrogenase E1 component beta subunit (EC 1.2.4.1), 8), (S-adenosylmethionine synthetase (EC 2.5.1.6), 6), (Dihydropteroate synthase (EC 2.5.1.15), 5), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), (2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase (EC 2.3.1.117), 3)

Carbon-D-L-a-Glycerol-Phosphate
1 (1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 8), (DNA gyrase subunit B (EC 5.99.1.3), 8), (Cell division protein FtsZ (EC 3.4.24.-), 7), (Glycogen phosphorylase (EC 2.4.1.1), 7), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 7)
2 (Xylulose-5-phosphate phosphoketolase (EC 4.1.2.9) @ Fructose-6-phosphate phosphoketolase (EC 4.1.2.22), 11), (tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 7), (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 7), (N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 6), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6)
3 (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 4), (Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 4), (3-deoxy-manno-octulosonate cytidylyltransferase (EC 2.7.7.38), 4), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 4)
4 (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 5), (N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 5), (Dihydroorotase (EC 3.5.2.3), 4), (Argininosuccinate lyase (EC 4.3.2.1), 4), (DNA gyrase subunit A (EC 5.99.1.3), 4)
5 (5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 12), (Chorismate synthase (EC 4.2.3.5), 10), (Leucyl-tRNA synthetase (EC 6.1.1.4), 5), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 4), (DNA gyrase subunit A (EC 5.99.1.3), 4)
6 (Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 8), (DNA polymerase I (EC 2.7.7.7), 5), (Phosphoenolpyruvate carboxylase (EC 4.1.1.31), 4), (NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 4), (Alcohol dehydrogenase (EC 1.1.1.1), 3)
7 (NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 9), (N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 7), (Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 7), (2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (EC 2.7.7.60), 5), (Argininosuccinate lyase (EC 4.3.2.1), 4)
8 (Tyrosyl-tRNA synthetase (EC 6.1.1.1), 9), (Cystathionine beta-lyase (EC 4.4.1.8), 6), (Agmatine deiminase (EC 3.5.3.12), 5), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 4)
9 (Threonyl-tRNA synthetase (EC 6.1.1.3), 6), (6-phospho-beta-glucosidase (EC 3.2.1.86), 5), (Pyrophosphate-energized proton pump (EC 3.6.1.1), 5), (Alanyl-tRNA synthetase (EC 6.1.1.7), 5), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 4)
10 (Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 9), (Glucose-6-phosphate isomerase (EC 5.3.1.9), 8), (NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 8), (Ornithine cyclodeaminase (EC 4.3.1.12), 5), (Ribose 5-phosphate isomerase A (EC 5.3.1.6), 4)

Nitrogen-L-Proline
1 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 26), (UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 10), (Peptidyl-tRNA hydrolase (EC 3.1.1.29), 8), (Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 8), (UDP-N-acetylglucosamine 4,6-dehydratase (EC 4.2.1.135), 5)
2 (Branched-chain alpha-keto acid dehydrogenase, E1 component, alpha subunit (EC 1.2.4.4) / Branched-chain alpha-keto acid dehydrogenase, E1 component, beta subunit (EC 1.2.4.4), 6), (Thioredoxin reductase (EC 1.8.1.9), 4), (Cell division protein FtsZ (EC 3.4.24.-), 3), (Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 3), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 3)
3 (DNA gyrase subunit A (EC 5.99.1.3), 9), (Topoisomerase IV subunit B (EC 5.99.1.-), 7), (1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 6), (Diaminopimelate epimerase (EC 5.1.1.7), 5), (beta-glucosidase (EC 3.2.1.21), 5)
4 (Cell division protein FtsZ (EC 3.4.24.-), 6), (Valyl-tRNA synthetase (EC 6.1.1.9), 5), (beta-N-acetylglucosaminidase (EC 3.2.1.52), 4), (1-phosphofructokinase (EC 2.7.1.56), 4), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 4)
5 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), (Ribose 5-phosphate isomerase B (EC 5.3.1.6) / Galactose 6-phosphate isomerase, 7), (Enolase (EC 4.2.1.11), 6), (DNA polymerase III alpha subunit (EC 2.7.7.7), 5), (Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 4)
6 (Leucyl-tRNA synthetase (EC 6.1.1.4), 12), (3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), (DNA topoisomerase I (EC 5.99.1.2), 4), (Tetrachloroethene reductive dehalogenase PceA (EC 1.97.1.8), 4)
7 (Ribulose-phosphate 3-epimerase (EC 5.1.3.1), 10), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 9), (Thioredoxin reductase (EC 1.8.1.9), 7), (2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 6), (NADPH-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.13) / NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 6)
8 (Seryl-tRNA synthetase (EC 6.1.1.11), 5), (UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 4), (DNA polymerase III polC-type (EC 2.7.7.7), 4), (Ketopantoate reductase PanG (EC 1.1.1.169), 3), (Riboflavin synthase eubacterial/eukaryotic (EC 2.5.1.9), 2)
9 (Cell division protein FtsH (EC 3.4.24.-), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), (Acetylornithine aminotransferase (EC 2.6.1.11), 4), (DNA polymerase I (EC 2.7.7.7), 4), (Prolyl-tRNA synthetase (EC 6.1.1.15), archaeal/eukaryal type, 4)
10 (DNA polymerase I (EC 2.7.7.7), 8), (Serine hydroxymethyltransferase (EC 2.1.2.1), 7), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 4), (Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 3), (D-alanine–D-alanine ligase (EC 6.3.2.4), 3)

Carbon-L-Alanyl-Glycine
1 (Phosphoglycerate kinase (EC 2.7.2.3), 12), (S-adenosylmethionine synthetase (EC 2.5.1.6), 10), (Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 9), (NADH-ubiquinone oxidoreductase chain B (EC 1.6.5.3), 7), (Cell division protein FtsZ (EC 3.4.24.-), 6)
2 (Signal peptidase I (EC 3.4.21.89), 10), (Branched-chain phosphotransacylase (EC 2.3.1.-), 7), (1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 4), (6-phosphofructokinase (EC 2.7.1.11), 4), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4)
3 (Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 6), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 6), (DNA polymerase III alpha subunit (EC 2.7.7.7), 5), (RNA polymerase associated protein RapA (EC 3.6.1.-), 5), (2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (EC 4.6.1.12), 3)
4 (RNA polymerase associated protein RapA (EC 3.6.1.-), 9), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), (Pyruvate carboxylase (EC 6.4.1.1), 6), (Dihydrolipoamide dehydrogenase of pyruvate dehydrogenase complex (EC 1.8.1.4), 6), (Thiol peroxidase, Tpx-type (EC 1.11.1.15), 5)
5 (Cell division protein FtsH (EC 3.4.24.-), 13), (Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 8), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 6), (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 5), (Nitrite reductase NAD(P)H small subunit (EC 1.7.1.4), 4)
6 (Topoisomerase IV subunit A (EC 5.99.1.-), 8), (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 6), (Pyridine nucleotide-disulfide oxidoreductase; NADH dehydrogenase (EC 1.6.99.3), 5), (NAD(P)H-hydrate epimerase (EC 5.1.99.6) / ADP-dependent (S)-NAD(P)H-hydrate dehydratase (EC 4.2.1.136), 5), (6-phosphofructokinase (EC 2.7.1.11), 5)
7 (ATP synthase beta chain (EC 3.6.3.14), 13), (Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17), 5), (3-oxoacyl-acyl-carrier-protein synthase (EC 2.3.1.41), 5), (Alanyl-tRNA synthetase (EC 6.1.1.7), 4), (Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 4)
8 (Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 8), (Hydroxyacylglutathione hydrolase (EC 3.1.2.6), 6), (ADP-ribose pyrophosphatase (EC 3.6.1.13), 5), (Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 4), (Sugar phosphatase YfbT (EC 3.1.3.23), 3)
9 (Acetylglutamate kinase (EC 2.7.2.8), 19), (L-seryl-tRNA(Sec) selenium transferase (EC 2.9.1.1), 16), (Xanthine dehydrogenase, molybdenum binding subunit (EC 1.17.1.4), 13), (S-adenosylmethionine synthetase (EC 2.5.1.6), 13), (3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 5)
10 (Leucyl-tRNA synthetase (EC 6.1.1.4), 16), (Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 8), (Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 7), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 5)

Nitrogen-L-Aspartic-Acid
1 (Succinyl-CoA ligase ADP-forming beta chain (EC 6.2.1.5), 5), (Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 5), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5), (Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 4), (Dihydroorotate dehydrogenase (NAD(+)), catalytic subunit (EC 1.3.1.14), 4)
2 (Arginyl-tRNA synthetase (EC 6.1.1.19), 8), (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 6), (Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19), 6), (ATP phosphoribosyltransferase regulatory subunit (EC 2.4.2.17), 5), (Guanylate kinase (EC 2.7.4.8), 4)
3 (Enolase (EC 4.2.1.11), 3), (N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28), 3), (tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 2), (D-alanine aminotransferase (EC 2.6.1.21), 2), (Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 2)
4 (Ribonuclease PH (EC 2.7.7.56), 6), (Cardiolipin synthetase (EC 2.7.8.-), 4), (Topoisomerase IV subunit B (EC 5.99.1.-), 4), (Phosphoenolpyruvate-dihydroxyacetone phosphotransferase (EC 2.7.1.121), subunit DhaM; DHA-specific IIA component, 3), (Alkaline phosphatase (EC 3.1.3.1), 2)
5 (ATP synthase beta chain (EC 3.6.3.14), 11), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), (Ribonuclease E (EC 3.1.26.12), 4), (Ribonucleotide reductase of class Ia (aerobic), beta subunit (EC 1.17.4.1), 4), (Endonuclease III (EC 4.2.99.18), 3)
6 (D-alanine–D-alanine ligase (EC 6.3.2.4), 5), (DNA gyrase subunit B (EC 5.99.1.3), 5), (Tryptophan synthase beta chain (EC 4.2.1.20), 4), (Methionine aminopeptidase (EC 3.4.11.18), 4), (Biotin synthase (EC 2.8.1.6), 4)
7 (NADH-ubiquinone oxidoreductase chain H (EC 1.6.5.3), 7), (NADH-ubiquinone oxidoreductase chain I (EC 1.6.5.3), 5), (Methylmalonyl-CoA epimerase (EC 5.1.99.1) @ Ethylmalonyl-CoA epimerase, 5), (Arginyl-tRNA synthetase (EC 6.1.1.19), 3), (Fructose-1,6-bisphosphatase, Bacillus type (EC 3.1.3.11), 3)
8 (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), (Amidophosphoribosyltransferase (EC 2.4.2.14), 4), (Methylmalonyl-CoA mutase (EC 5.4.99.2), 4), (Uroporphyrinogen III decarboxylase (EC 4.1.1.37), 3), (Phosphopantetheine adenylyltransferase (EC 2.7.7.3), 3)
9 (S-adenosylmethionine synthetase (EC 2.5.1.6), 7), (ATP-dependent protease La (EC 3.4.21.53) Type I, 5), (Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 4), (6-phospho-beta-glucosidase (EC 3.2.1.86), 3), (Prolipoprotein diacylglyceryl transferase (EC 2.4.99.-), 3)
10 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 13), (Aconitate hydratase (EC 4.2.1.3), 5), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), (Transketolase (EC 2.2.1.1), 4), ((R)-citramalate synthase (EC 2.3.1.182), 4)

Carbon-Glycyl-L-Proline
1 (DNA polymerase I (EC 2.7.7.7), 5), (Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 4), (Alcohol dehydrogenase (EC 1.1.1.1), 3), (Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 3), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 3)
2 (Cell division protein FtsZ (EC 3.4.24.-), 23), (Threonine dehydratase biosynthetic (EC 4.3.1.19), 14), (Threonine dehydratase, catabolic (EC 4.3.1.19) @ L-serine dehydratase, (PLP)-dependent (EC 4.3.1.17), 11), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 10), (Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 10)
3 (Glutamate synthase NADPH large chain (EC 1.4.1.13), 12), (Adenylate kinase (EC 2.7.4.3), 8), (Acetyl-CoA synthetase (EC 6.2.1.1), 4), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), (Omega-amino acid–pyruvate aminotransferase (EC 2.6.1.18), 3)
4 (Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 7), (Cell division protein FtsZ (EC 3.4.24.-), 6), (Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 5), (GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 5), (Histidyl-tRNA synthetase (EC 6.1.1.21), 5)
5 (Tryptophan synthase beta chain (EC 4.2.1.20), 4), (Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 4), (ATP-dependent protease La (EC 3.4.21.53) Type I, 3), (tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 3), (Cobyric acid synthase (EC 6.3.5.10), 3)
6 (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 24), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 8), (Glycyl-tRNA synthetase beta chain (EC 6.1.1.14), 5), (tRNA-guanine transglycosylase (EC 2.4.2.29), 5), (Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 3)
7 (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 14), (3-oxoacyl-acyl-carrier-protein synthase, KASI (EC 2.3.1.41), 5), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 5), (Inactive (p)ppGpp 3-pyrophosphohydrolase domain / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase I, 4), (Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 4)
8 (UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 6), (4-hydroxy-tetrahydrodipicolinate synthase (EC 4.3.3.7), 3), (Dihydropteroate synthase (EC 2.5.1.15), 3), (Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 3), (Aldehyde dehydrogenase (EC 1.2.1.3), 3)
9 (DNA gyrase subunit A (EC 5.99.1.3), 5), (Glutathione-independent formaldehyde dehydrogenase (EC 1.2.1.46), 5), (Isocitrate lyase (EC 4.1.3.1), 4), (Alkaline phosphatase (EC 3.1.3.1), 4), (Cytidine deaminase (EC 3.5.4.5), 4)
10 (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 4), (Glutamate synthase NADPH small chain (EC 1.4.1.13), 4), (16S rRNA (guanine(1207)-N(2))-methyltransferase (EC 2.1.1.172), 3), (Adenylosuccinate synthetase (EC 6.3.4.4), 3), (PTS system, beta-glucoside-specific IIB component (EC 2.7.1.69) / PTS system, beta-glucoside-specific IIC component / PTS system, beta-glucoside-specific IIA component, 3)

Nitrogen-Gly-Glu
1 (Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 8), (3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 8), (Cell division protein FtsZ (EC 3.4.24.-), 7), (Alanyl-tRNA synthetase (EC 6.1.1.7), 7), (Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 6)
2 (Alanyl-tRNA synthetase (EC 6.1.1.7), 11), (3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 6), (Cyclic beta-1,2-glucan synthase (EC 2.4.1.-), 5), (DNA gyrase subunit A (EC 5.99.1.3), 5), (Glutathione reductase (EC 1.8.1.7), 4)
3 (Phosphoserine aminotransferase (EC 2.6.1.52), 9), (5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 9), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 7), (Ribonucleotide reductase of class III (anaerobic), large subunit (EC 1.17.4.2), 7), (Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 6)
4 (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5), (Uridine monophosphate kinase (EC 2.7.4.22), 4), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 4), (Fructokinase (EC 2.7.1.4), 4), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4)
5 (Dihydroxy-acid dehydratase (EC 4.2.1.9), 17), (Phosphogluconate dehydratase (EC 4.2.1.12), 8), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 7), (UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 7), (Catalase-peroxidase KatG (EC 1.11.1.21), 6)
6 (Thymidylate synthase (EC 2.1.1.45), 17), (Glutamyl-tRNA synthetase (EC 6.1.1.17) @ Glutamyl-tRNA(Gln) synthetase (EC 6.1.1.24), 5), (Enolase (EC 4.2.1.11), 5), (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 3), (Thiamine-monophosphate kinase (EC 2.7.4.16), 3)
7 (Endonuclease IV (EC 3.1.21.2), 6), (Inositol-1-monophosphatase (EC 3.1.3.25), 3), (Phosphoenolpyruvate synthase (EC 2.7.9.2), 3), (Phosphoglycerate kinase (EC 2.7.2.3), 3), (Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 3)
8 (3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 14), (Alanyl-tRNA synthetase (EC 6.1.1.7), 11), (Aspartate 1-decarboxylase (EC 4.1.1.11), 10), (Aldehyde dehydrogenase (EC 1.2.1.3), 7), (Adenosylhomocysteinase (EC 3.3.1.1), 6)
9 (Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 10), (Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 8), (Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 4), (Biosynthetic arginine decarboxylase (EC 4.1.1.19), 4), (Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 4)
10 (3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 8), (Homoserine dehydrogenase (EC 1.1.1.3), 6), (Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 5), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4)

Carbon-D-Trehalose
1 (2-isopropylmalate synthase (EC 2.3.3.13), 23), (Tyrosyl-tRNA synthetase (EC 6.1.1.1), 7), (DNA polymerase I (EC 2.7.7.7), 7), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 6), (NAD synthetase (EC 6.3.1.5) / Glutamine amidotransferase chain of NAD synthetase, 4)
2 (Serine hydroxymethyltransferase (EC 2.1.2.1), 8), (Myo-inositol 2-dehydrogenase 1 (EC 1.1.1.18), 5), (Alanyl-tRNA synthetase (EC 6.1.1.7), 4), (Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), (Anthranilate synthase, aminase component (EC 4.1.3.27), 4)
3 (Imidazolonepropionase (EC 3.5.2.7), 9), (Lipid-A-disaccharide synthase (EC 2.4.1.182), 7), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 5)
4 (ATP synthase beta chain (EC 3.6.3.14), 12), (Alanyl-tRNA synthetase (EC 6.1.1.7), 6), (UDP-N-acetylmuramate–alanine ligase (EC 6.3.2.8), 5), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 4), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 4)
5 (16S rRNA (cytosine(967)-C(5))-methyltransferase (EC 2.1.1.176), 15), (Formate dehydrogenase O alpha subunit (EC 1.2.1.2) @ selenocysteine-containing, 5), (tRNA(Ile)-lysidine synthetase (EC 6.3.4.19), 4), (Acetylornithine aminotransferase (EC 2.6.1.11), 4), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 4)
6 (Ketol-acid reductoisomerase (EC 1.1.1.86), 9), (Ribosomal protein S12p Asp88 (E. coli) methylthiotransferase (EC 2.8.4.4), 8), (5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 7), (D-aminoacyl-tRNA deacylase (EC 3.1.1.96), 7), (Propionyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.3), 5)
7 (3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 19), (Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 6), (Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 6), (DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 5), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 4)
8 (DNA ligase (NAD(+)) (EC 6.5.1.2), 3), (Signal transduction histidine-protein kinase BarA (EC 2.7.13.3), 3), (Glucose-1-phosphate thymidylyltransferase (EC 2.7.7.24), 2), (Para-aminobenzoate synthase, aminase component (EC 2.6.1.85), 2), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 2)
9 (IMP cyclohydrolase (EC 3.5.4.10) / Phosphoribosylaminoimidazolecarboxamide formyltransferase (EC 2.1.2.3), 6), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 5), (Cyanophycin synthase (EC 6.3.2.29)(EC 6.3.2.30), 3), (L-serine dehydratase, beta subunit (EC 4.3.1.17) / L-serine dehydratase, alpha subunit (EC 4.3.1.17), 3), (Isopentenyl-diphosphate Delta-isomerase (EC 5.3.3.2), 3)
10 (Peptide deformylase (EC 3.5.1.88), 4), (beta-galactosidase (EC 3.2.1.23), 3), (ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 3), (1,4-alpha-glucan (glycogen) branching enzyme, GH-13-type (EC 2.4.1.18), 3), (Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 2)

Carbon-2-Deoxy-D-Ribose
1 (Threonyl-tRNA synthetase (EC 6.1.1.3), 11), (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 9), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 9), (Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 7), (UDP-N-acetylglucosamine 2-epimerase (EC 5.1.3.14), 7)
2 (Aconitate hydratase (EC 4.2.1.3), 7), (2-methylcitrate dehydratase (2-methyl-trans-aconitate forming) (EC 4.2.1.117), 2), (Aconitate hydratase (EC 4.2.1.3) @ 2-methylisocitrate dehydratase (EC 4.2.1.99), 2), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 1), (Aspartate aminotransferase (EC 2.6.1.1), 1)
3 (GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 9), (FKBP-type peptidyl-prolyl cis-trans isomerase SlyD (EC 5.2.1.8), 8), (FIG001385: N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28), 6), (Dihydroorotase (EC 3.5.2.3), 5), (Cysteine desulfurase (EC 2.8.1.7), 4)
4 (NAD(P) transhydrogenase subunit beta (EC 1.6.1.2), 4), (Pyruvate dehydrogenase E1 component alpha subunit (EC 1.2.4.1), 2), (2-isopropylmalate synthase (EC 2.3.3.13), 1), (N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38), 1), (Pantoate–beta-alanine ligase (EC 6.3.2.1), 1)
5 (16S rRNA (cytosine(967)-C(5))-methyltransferase (EC 2.1.1.176), 15), (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 10), (Threonine synthase (EC 4.2.3.1), 6), (Aconitate hydratase (EC 4.2.1.3), 5), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5)
6 (Sensor histidine kinase PrrB (RegB) (EC 2.7.3.-), 7), (Phosphoglucosamine mutase (EC 5.4.2.10), 5), (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 5), (Type I restriction-modification system, restriction subunit R (EC 3.1.21.3), 4), (1,2-phenylacetyl-CoA epoxidase, subunit C (EC 1.14.13.149), 4)
7 (Isocitrate lyase (EC 4.1.3.1), 11), (NAD(P)H-hydrate epimerase (EC 5.1.99.6) / ADP-dependent (S)-NAD(P)H-hydrate dehydratase (EC 4.2.1.136), 2), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 1), (Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-), 1), (6-phospho-beta-glucosidase (EC 3.2.1.86), 1)
8 (DNA polymerase III alpha subunit (EC 2.7.7.7), 12), (Imidazole glycerol phosphate synthase cyclase subunit (EC 4.1.3.-), 6), (Histidinol dehydrogenase (EC 1.1.1.23), 6), (Signal peptidase I (EC 3.4.21.89), 6), (Phosphoglucomutase (EC 5.4.2.2), 6)
9 (Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1), 10), (Alanyl-tRNA synthetase (EC 6.1.1.7), 10), (Thioredoxin reductase (EC 1.8.1.9), 5), (Biotin synthase (EC 2.8.1.6), 5), (Cystathionine beta-synthase (EC 4.2.1.22), 4)
10 (2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 15), (Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 7), (ATP-dependent protease La (EC 3.4.21.53) Type I, 6), (Argininosuccinate synthase (EC 6.3.4.5), 6), (Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 6)

Nitrogen-Ala-Thr
1 (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), (Enolase (EC 4.2.1.11), 5), (ATP-dependent protease La (EC 3.4.21.53) Type I, 4), (NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 4), (Dihydroorotase (EC 3.5.2.3), 4)
2 (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 14), (Phosphoglycerate kinase (EC 2.7.2.3), 10), (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14) / Biotin carboxyl carrier protein of acetyl-CoA carboxylase; Propionyl-CoA carboxylase alpha chain (EC 6.4.1.3), 7), (Carbamate kinase (EC 2.7.2.2), 6), (Cell division protein FtsH (EC 3.4.24.-), 6)
3 (ATP-dependent protease La (EC 3.4.21.53) Type I, 5), (DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 5), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 5), (Cell division protein FtsH (EC 3.4.24.-), 4), (Dihydroorotase (EC 3.5.2.3), 4)
4 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 7), (DNA gyrase subunit A (EC 5.99.1.3), 7), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5), (Serine hydroxymethyltransferase (EC 2.1.2.1), 4), (Alanyl-tRNA synthetase (EC 6.1.1.7), 4)
5 (Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 16), (ATP synthase beta chain (EC 3.6.3.14), 14), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 8), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 6)
6 (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 22), (Uracil phosphoribosyltransferase (EC 2.4.2.9), 14), (tRNA 4-thiouridine synthase (EC 2.8.1.4) / Rhodanese-like domain required for thiamine synthesis, 11), (S-adenosylmethionine synthetase (EC 2.5.1.6), 10), (2,3-cyclic-nucleotide 2-phosphodiesterase (EC 3.1.4.16), 5)
7 (Thioredoxin reductase (EC 1.8.1.9), 19), (Phosphoglycerate kinase (EC 2.7.2.3), 18), (Uridine monophosphate kinase (EC 2.7.4.22), 12), (Glutathione reductase (EC 1.8.1.7), 10), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 10)
8 (Acyl-CoA dehydrogenase, long-chain specific (EC 1.3.8.8), 5), (Succinyl-CoA ligase ADP-forming beta chain (EC 6.2.1.5), 5), (Periplasmic FeFe hydrogenase large subunit (EC 1.12.7.2), 5), (Pyridoxamine 5-phosphate oxidase (EC 1.4.3.5), 4), (Pyruvate carboxylase subunit A (EC 6.4.1.1), 4)
9 (Urocanate hydratase (EC 4.2.1.49), 11), (Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 6), (Xylulose-5-phosphate phosphoketolase (EC 4.1.2.9) @ Fructose-6-phosphate phosphoketolase (EC 4.1.2.22), 5), (DNA polymerase III alpha subunit (EC 2.7.7.7), 5), (Acetyl-CoA synthetase (EC 6.2.1.1), 3)
10 (NADP-dependent malic enzyme (EC 1.1.1.40), 7), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), (3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 4), (Acyl-CoA hydrolase (EC 3.1.2.20), 4), (Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 4)

Nitrogen-Gly-Gln
1 (2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 7), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), (Adenosylcobinamide kinase (EC 2.7.1.156) / Adenosylcobinamide-phosphate guanylyltransferase (EC 2.7.7.62), 4), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), (Ribosomal large subunit pseudouridine synthase B (EC 5.4.99.22), 3)
2 (2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 5), (Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 5), (Fumarate reductase flavoprotein subunit (EC 1.3.5.4), 5), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 5), (Aspartate aminotransferase (EC 2.6.1.1), 4)
3 (Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 5), (2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (EC 2.7.7.60), 5), (5-Enolpyruvylshikimate-3-phosphate synthase (EC 2.5.1.19), 5), (UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 5), (DNA ligase (NAD(+)) (EC 6.5.1.2), 4)
4 (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 9), (DNA gyrase subunit A (EC 5.99.1.3), 5), (2,5-dioxovalerate dehydrogenase (EC 1.2.1.26), 5), (Adenosylhomocysteinase (EC 3.3.1.1), 4), (DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 4)
5 (Cell division protein FtsZ (EC 3.4.24.-), 19), (DNA polymerase I (EC 2.7.7.7), 15), (Alanyl-tRNA synthetase (EC 6.1.1.7), 14), (Cell division protein FtsH (EC 3.4.24.-), 6), (Phosphoglycerate kinase (EC 2.7.2.3), 6)
6 (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 26), (Alanyl-tRNA synthetase (EC 6.1.1.7), 12), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 11), (Chorismate synthase (EC 4.2.3.5), 10), (Cell division protein FtsH (EC 3.4.24.-), 8)
7 (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 13), (N5-carboxyaminoimidazole ribonucleotide mutase (EC 5.4.99.18), 8), (Thioredoxin reductase (EC 1.8.1.9), 7), (Porphobilinogen synthase (EC 4.2.1.24), 7), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5)
8 (Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 11), (Fumarate hydratase class I, aerobic (EC 4.2.1.2), 7), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 5), (N-acyl-L-amino acid amidohydrolase (EC 3.5.1.14), 5), (ATP synthase gamma chain (EC 3.6.3.14), 5)
9 (Enolase (EC 4.2.1.11), 19), (Tryptophan synthase beta chain (EC 4.2.1.20), 19), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 7), (Ribonuclease PH (EC 2.7.7.56), 7), ((R)-citramalate synthase (EC 2.3.1.182), 6)
10 (S-(hydroxymethyl)glutathione dehydrogenase (EC 1.1.1.284), 7), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 6), (Ribulose-phosphate 3-epimerase (EC 5.1.3.1), 6), (Glucose-6-phosphate isomerase (EC 5.3.1.9), 5), (SOS-response repressor and protease LexA (EC 3.4.21.88), 5)

Carbon-2-Hydroxy-Benzoic-Acid 1 8-amino-7-oxononanoate synthase (EC 2.3.1.47), 7), Glutamate synthase NADPH large chain (EC 1.4.1.13), 4), Topoisomerase IV subunit B (EC 5.99.1.-), 4), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 4), Cysteine desulfurase (EC 2.8.1.7) ; 3D¿ SufS, 3), 2 Respiratory nitrate reductase beta chain (EC 1.7.99.4), 36), Aspartate aminotransferase (EC 2.6.1.1), 16), Pyruvate formate-lyase (EC 2.3.1.54), 15), Exodeoxyribonuclease III (EC 3.1.11.2), 12), Malate dehydrogenase (EC 1.1.1.37), 5), 3 ATP synthase alpha chain (EC 3.6.3.14), 11), Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 6), Phosphatidylglycerophos- phatase A (EC 3.1.3.27), 5), Carbon monoxide dehydrogenase form I; 2C large chain (EC 1.2.99.2), 5), UDP-N-acetylglucosamine–N- acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase (EC 2.4.1.227), 5), 4 tRNA-guanine transglycosylase (EC 2.4.2.29), 23), Cytochrome O ubiquinol oxidase subunit I (EC 1.10.3.-), 20), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 9), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 9), Undecaprenyl diphosphate synthase (EC 2.5.1.31), 8), 5 Valyl-tRNA synthetase (EC 6.1.1.9), 17), DNA gyrase subunit A (EC 5.99.1.3), 9), Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 9), UDP-N-acetylmuramoylalanyl-D-glutamate–2; 2C6-diaminopimelate ligase (EC 6.3.2.13), 7), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 6), 6 CTP synthase (EC 6.3.4.2), 13), Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 13), Quinone oxidoreductase (EC 1.6.5.5), 7), tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 6), Ribonu- cleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 6), 7 Cysteine synthase (EC 2.5.1.47), 12), Isopentenyl-diphosphate 
delta-isomerase; 2C FMN-dependent (EC 5.3.3.2), 10), 1-hydroxy-2-methyl- 2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 7), ATP-dependent protease subunit HslV (EC 3.4.25.2), 7), Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 5), 8 ATP synthase beta chain (EC 3.6.3.14), 12), Thioredoxin reductase (EC 1.8.1.9), 11), tRNA t(6)A37-methylthiotransferase (EC 2.8.4.5), 9), Pyruvate dehydrogenase E1 component (EC 1.2.4.1), 8), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 5), 9 ATP-dependent protease La (EC 3.4.21.53) Type I, 7), Lipoprotein signal peptidase (EC 3.4.23.36), 6), Valyl-tRNA synthetase (EC 6.1.1.9), 5), HPr kinase/phosphorylase (EC 2.7.1.-) (EC 2.7.4.-), 4), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 4), 10 Chorismate synthase (EC 4.2.3.5), 14), Glucose-6-phosphate isomerase (EC 5.3.1.9), 9), Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 8), Acyl-CoA dehydrogenase; 2C short-chain specific (EC 1.3.8.1), 6), Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 6))

Carbon-L-Proline 1 DNA polymerase I (EC 2.7.7.7), 12), Cysteine desulfurase (EC 2.8.1.7), 5), D-alanine–D-alanine ligase (EC 6.3.2.4), 4), Replicative DNA helicase (DnaB) (EC 3.6.4.12), 4), Enoyl-CoA hydratase (EC 4.2.1.17), 3), 2 Glutamine synthetase type I (EC 6.3.1.2), 4), Topoisomerase IV subunit A (EC 5.99.1.-), 4), Pyruvate kinase (EC 2.7.1.40), 4), DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 3), Phosphoenolpyruvate carboxylase (EC 4.1.1.31), 3), 3 beta-galactosidase (EC 3.2.1.23), 5), Undecaprenyl-diphosphatase (EC 3.6.1.27), 5), Tyrosyl-tRNA synthetase (EC 6.1.1.1), 4), Glyceraldehyde-3-phosphate dehydrogenase (NADP(+)) (EC 1.2.1.9), 4), Thioredoxin reductase (EC 1.8.1.9), 3), 4 Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Glutamate N-acetyltransferase (EC 2.3.1.35) @ N-acetylglutamate synthase (EC 2.3.1.1), 4), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Thymidylate kinase (EC 2.7.4.9), 3), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 3), 5 Type I restriction-modification system; 2C DNA-methyltransferase subunit M (EC 2.1.1.72), 7), ATP-dependent protease La (EC 3.4.21.53) Type I, 5), 23S rRNA (uracil(1939)-C(5))-methyltransferase (EC 2.1.1.190), 4), Cell division protein FtsH (EC 3.4.24.-), 3), Glutamine synthetase type III; 2C GlnN (EC 6.3.1.2), 3), 6 Glyoxylate carboligase (EC 4.1.1.47), 4), Endonuclease III (EC 4.2.99.18), 3), Tryptophan synthase beta chain (EC 4.2.1.20), 3), Pyruvate carboxylase (EC 6.4.1.1), 3), Acyl-phosphate:glycerol-3-phosphate O-acyltransferase PlsY (EC 2.3.1.n3), 3), 7 DNA polymerase I (EC 2.7.7.7), 8), 2-Keto-3-deoxy-D-manno-octulosonate-8-phosphate synthase (EC 2.5.1.55), 6), Type III restriction- modification system methylation subunit (EC 2.1.1.72), 3), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 3), ATP-dependent 
protease La (EC 3.4.21.53) Type I, 2), 8 Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 4), Topoisomerase IV subunit B (EC 5.99.1.-), 3), DNA polymerase III polC-type (EC 2.7.7.7), 3), Succinate dehydrogenase iron-sulfur protein (EC 1.3.5.1), 3), Fumarate hydratase class I; 2C alpha region (EC 4.2.1.2); 3B L(+)-tartrate dehydratase alpha subunit (EC 4.2.1.32), 2), 9 ATP synthase beta chain (EC 3.6.3.14), 27), ATP synthase alpha chain (EC 3.6.3.14), 6), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), Aspartate aminotransferase (EC 2.6.1.1), 2), Formate–tetrahydrofolate ligase (EC 6.3.4.3), 2), 10 Probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133), 9), Biotin carboxylase of methylcrotonyl-CoA carboxylase (EC 6.3.4.14), 4), Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 3), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 3), Threonine synthase (EC 4.2.3.1), 3))

Carbon-b-D-Allose 1 Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 26), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 19), D-arabinose 5-phosphate isomerase (EC 5.3.1.13), 11), Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 8), CTP synthase (EC 6.3.4.2), 8), 2 Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 14), Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 8), DNA gyrase subunit A (EC 5.99.1.3), 7), Enoyl-CoA hydratase (EC 4.2.1.17) @ Enoyl-CoA hydratase EchA5 (EC 4.2.1.17), 7), 3-oxoacyl-acyl-carrier-protein synthase; 2C KASII (EC 2.3.1.179), 6), 3 Transketolase (EC 2.2.1.1), 11), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 9), NAD-dependent glyceraldehyde-3- phosphate dehydrogenase (EC 1.2.1.12), 7), Enolase (EC 4.2.1.11), 5), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 5), 4 Catalase-peroxidase KatG (EC 1.11.1.21), 13), NADP-dependent malic enzyme (EC 1.1.1.40), 11), ATP-dependent protease La (EC 3.4.21.53) Type I, 7), DNA polymerase I (EC 2.7.7.7), 7), Deoxycytidylate deaminase (EC 3.5.4.12), 6), 5 Methionine aminopeptidase (EC 3.4.11.18), 7), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 7), Prolyl- tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 6), (Inactive (p)ppGpp 3-pyrophosphohydrolase domain / GTP pyrophosphokinase (EC 2.7.6.5); 2C (p)ppGpp synthetase I, 6), N5-carboxyaminoimidazole ribonucleotide synthase (EC 6.3.4.18), 6), 6 Arsenate reductase (EC 1.20.4.1), 9), DNA gyrase subunit B (EC 5.99.1.3), 8), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 7), 2-keto- 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase I alpha (EC 2.5.1.54), 7), CTP synthase (EC 6.3.4.2), 6), 7 Aldehyde dehydrogenase (EC 1.2.1.3), 8), Phosphomannomutase (EC 5.4.2.8), 8), Probable VANILLIN dehydrogenase oxidoreductase protein (EC 1.-.-.-), 6), Topoisomerase IV subunit B (EC 
5.99.1.-), 5), Succinyl-CoA ligase ADP-forming beta chain (EC 6.2.1.5), 5), 8 ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 6), Homoserine dehydrogenase (EC 1.1.1.3), 5), Indole-3-glycerol phosphate synthase (EC 4.1.1.48), 5), PTS system; 2C glucitol/sorbitol-specific IIB component and second of two IIC components (EC 2.7.1.69), 5), Carbonic anhydrase; 2C gamma class (EC 4.2.1.1), 5), 9 Na(+)-translocating NADH-quinone reductase subunit F (EC 1.6.5.-), 6), Cell division protein FtsH (EC 3.4.24.-), 4), Cyclomaltodextrin glucanotransferase (EC 2.4.1.19); 3B Maltogenic alpha-amylase (EC 3.2.1.133), 4), Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), 10 Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 7), Limit dextrin alpha-1; 2C6-maltotetraose-hydrolase (EC 3.2.1.196), 6), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), 2; 2C4-dihydroxyhept-2-ene-1; 2C7-dioic acid aldolase (EC 4.1.2.52), 6), NAD-dependent glyceraldehyde-3- phosphate dehydrogenase (EC 1.2.1.12), 6))

Phosphate-D-L-a-Glycerol-Phosphate 1 DNA polymerase III delta prime subunit (EC 2.7.7.7), 7), ATP-dependent protease La (EC 3.4.21.53) Type I, 6), Imidazoleglycerol- phosphate dehydratase (EC 4.2.1.19), 6), 2; 2C3-diketo-5-methylthiopentyl-1-phosphate enolase (EC 5.3.2.5), 5), Pyruvate dehydrogenase (quinone) (EC 1.2.5.1), 4), 2 Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 9), Proline iminopeptidase (EC 3.4.11.5), 7), Cell division protein FtsH (EC 3.4.24.-), 4), Topoisomerase IV subunit A (EC 5.99.1.-), 4), Aspartokinase (EC 2.7.2.4), 3), 3 Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 9), 3; 2C4-dihydroxy-2-butanone 4-phosphate synthase (EC 4.1.99.12) / GTP cyclohydrolase II (EC 3.5.4.25), 8), Aldehyde dehydrogenase (EC 1.2.1.3), 6), DNA polymerase I (EC 2.7.7.7), 5), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 4), 4 Dihydropteroate synthase type-2 (EC 2.5.1.15) @ Sulfonamide resistance protein, 5), Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glu- tathione hydrolase (EC 3.4.19.13), 4), 5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 4), Tryptophan synthase alpha chain (EC 4.2.1.20), 4), ATP-dependent protease La (EC 3.4.21.53) Type I, 3), 5 Error-prone repair homolog of DNA polymerase III alpha subunit (EC 2.7.7.7), 9), Porphobilinogen synthase (EC 4.2.1.24), 6), UDP- glucose 4-epimerase (EC 5.1.3.2), 6), DNA primase (EC 2.7.7.-), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), 6 Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 13), Histidinol dehydrogenase (EC 1.1.1.23), 11), 1; 2C4-alpha-glucan (glycogen) branching enzyme; 2C GH-13-type (EC 2.4.1.18), 7), D-3- phosphoglycerate dehydrogenase (EC 1.1.1.95), 6), Methionyl-tRNA synthetase (EC 6.1.1.10), 6), 7 Dihydroorotase (EC 3.5.2.3), 6), S-formylglutathione hydrolase (EC 3.1.2.12), 6), 5-Enolpyruvylshikimate-3-phosphate 
synthase (EC 2.5.1.19), 6), Acetolactate synthase large subunit (EC 2.2.1.6), 3), Valyl-tRNA synthetase (EC 6.1.1.9), 3), 8 Transcriptional repressor of PutA and PutP / Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 4), Tryptophan synthase beta chain (EC 4.2.1.20), 3), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), Cytochrome c-type biogenesis protein DsbD; 2C protein-disulfide reductase (EC 1.8.1.8), 3), Urea carboxylase (EC 6.3.4.6), 3), 9 Acetyl-CoA acetyltransferase (EC 2.3.1.9) @ 3-oxoadipyl-CoA thiolase (EC 2.3.1.174), 7), Ribonuclease III (EC 3.1.26.3), 6), Urease alpha subunit (EC 3.5.1.5), 5), 3-oxoadipyl-CoA thiolase (EC 2.3.1.174) @ 3-oxo-5; 2C6-dehydrosuberyl-CoA thiolase (EC 2.3.1.223), 4), N-acyl-L-amino acid amidohydrolase (EC 3.5.1.14), 4), 10 Thioredoxin reductase (EC 1.8.1.9), 10), Tyrosyl-tRNA synthetase (EC 6.1.1.1), 6), Phosphogluconate dehydratase (EC 4.2.1.12), 4), Formate–tetrahydrofolate ligase (EC 6.3.4.3), 3), Acetolactate synthase large subunit (EC 2.2.1.6), 3))

Carbon-Glycerol 1 (Adenosine (5)-pentaphospho-(5)-adenosine pyrophosphohydrolase (EC 3.6.1.-), 22), Proline iminopeptidase (EC 3.4.11.5), 7), Lipid-A- disaccharide synthase (EC 2.4.1.182), 4), UDP-glucose 4-epimerase (EC 5.1.3.2), 4), Dihydroorotase (EC 3.5.2.3), 4), 2 Phosphomannomutase (EC 5.4.2.8), 7), Aspartyl aminopeptidase (EC 3.4.11.21), 5), DNA gyrase subunit A (EC 5.99.1.3), 4), N(6)-L- threonylcarbamoyladenine synthase (EC 2.3.1.234), 3), tRNA-guanine transglycosylase (EC 2.4.2.29), 3), 3 Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 5), Alanyl-tRNA synthetase (EC 6.1.1.7), 4), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 3), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3), Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 3), 4 Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 3), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 2), Phosphoenolpyruvate carboxykinase ATP (EC 4.1.1.49), 2), Signal transduction histidine kinase CheA (EC 2.7.3.-), 1), D-lactate dehydrogenase (EC 1.1.1.28), 1), 5 Adenylosuccinate synthetase (EC 6.3.4.4), 6), Aerobic glycerol-3-phosphate dehydrogenase (EC 1.1.5.3), 6), Membrane alanine aminopep- tidase N (EC 3.4.11.2), 4), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 3), 4-hydroxy-tetrahydrodipicolinate synthase (EC 4.3.3.7), 3), 6 ATP phosphoribosyltransferase (EC 2.4.2.17) ; 3D¿ HisGl, 2), Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 2), IMP cyclohydrolase (EC 3.5.4.10) / Phosphoribosylaminoimidazolecarboxamide formyltransferase (EC 2.1.2.3), 2), DNA mismatch repair protein precursor (EC 3.2.1.4), 1), Transketolase (EC 2.2.1.1), 1), 7 N-acetylglucosamine kinase of eukaryotic type (EC 2.7.1.59), 6), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 4), tRNA pseudouridine(55) synthase (EC 5.4.99.25), 4), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 3), Aspartate-semialdehyde dehydrogenase (EC 1.2.1.11), 3), 8 
Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 4), UDP-N- acetylmuramoyl-tripeptide–D-alanyl-D-alanine ligase (EC 6.3.2.10), 3), DNA gyrase subunit A (EC 5.99.1.3), 3), Cell division protein FtsZ (EC 3.4.24.-), 2), Amidophosphoribosyltransferase (EC 2.4.2.14), 2), 9 Sulfite reductase NADPH hemoprotein beta-component (EC 1.8.1.2), 10), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 4), Diaminohydroxyphosphoribosylaminopyrimidine deaminase (EC 3.5.4.26) / 5-amino-6-(5-phosphoribosylamino)uracil reductase (EC 1.1.1.193), 3), Proline iminopeptidase (EC 3.4.11.5), 2), Propionyl-CoA carboxylase carboxyl transferase subunit (EC 6.4.1.3), 2), 10 (Guanosine-3; 2C5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5); 2C (p)ppGpp syn- thetase II, 6), Valyl-tRNA synthetase (EC 6.1.1.9), 4), Uracil-DNA glycosylase; 2C family 4 (EC 3.2.2.27), 4), FKBP-type peptidyl-prolyl cis-trans isomerase FkpA precursor (EC 5.2.1.8), 4), Prolipoprotein diacylglyceryl transferase (EC 2.4.99.-), 3))

Carbon-D-Glucosamine 1 DNA primase (EC 2.7.7.-), 1), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 1), Serine–pyruvate amino- transferase (EC 2.6.1.51) / L-alanine:glyoxylate aminotransferase (EC 2.6.1.44), 1), Aldose 1-epimerase (EC 5.1.3.3), 1), Arginyl-tRNA synthetase (EC 6.1.1.19), 1), 2 Tyrosyl-tRNA synthetase (EC 6.1.1.1), 6), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 4), DEAD-box ATP-dependent RNA helicase DeaD (; 3D CshA) (EC 3.6.4.13), 3), NADP-specific glutamate dehydrogenase (EC 1.4.1.4), 3), DNA gyrase subunit A (EC 5.99.1.3), 3), 3 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 3), Hydroxyacylglutathione hydrolase (EC 3.1.2.6), 3), Ribonuclease E (EC 3.1.26.12), 3), Replicative DNA helicase (DnaB) (EC 3.6.4.12), 3), 4 Adenylate kinase (EC 2.7.4.3), 5), DNA gyrase subunit B (EC 5.99.1.3), 5), Acetyl-CoA synthetase (EC 6.2.1.1), 4), Chorismate mutase I (EC 5.4.99.5) / Prephenate dehydratase (EC 4.2.1.51), 3), Aldehyde dehydrogenase (EC 1.2.1.3), 3), 5 Phosphoglycerate mutase (EC 5.4.2.11), 6), Acetylornithine aminotransferase (EC 2.6.1.11), 4), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 3), Aconitate hydratase (EC 4.2.1.3), 3), DNA ligase (NAD(+)) (EC 6.5.1.2), 2), 6 2; 2C3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 4), Chemotaxis protein methyltransferase CheR (EC 2.1.1.80), 3), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 3), Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 3), DNA ligase (NAD(+)) (EC 6.5.1.2), 2), 7 Glutathione S-transferase; 2C unnamed subgroup (EC 2.5.1.18), 6), Aspartate aminotransferase (EC 2.6.1.1), 4), Long-chain-fatty-acid– CoA ligase (EC 6.2.1.3), 4), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 3), Pyruvate formate-lyase (EC 2.3.1.54), 3), 8 3-oxoacyl-acyl-carrier-protein synthase; 2C 
KASIII (EC 2.3.1.180), 6), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Glutamyl-tRNA synthetase (EC 6.1.1.17), 5), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 4), Adenosylhomocys- teinase (EC 3.3.1.1), 3), 9 Dihydroxy-acid dehydratase (EC 4.2.1.9), 7), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 4), Cell division protein FtsH (EC 3.4.24.-), 3), 4-hydroxy-tetrahydrodipicolinate synthase (EC 4.3.3.7), 3), 5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 2), 10 Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 4), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Aspartyl- tRNA synthetase (EC 6.1.1.12), 3), 7-carboxy-7-deazaguanine synthase (EC 4.3.99.3), 3), Fumarate hydratase class I; 2C alpha region (EC 4.2.1.2); 3B L(+)-tartrate dehydratase alpha subunit (EC 4.2.1.32), 2))

Carbon-a-Methyl-D-Galactoside 1 DNA polymerase I (EC 2.7.7.7), 10), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 6), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 6), Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), 2; 2C4-dienoyl-CoA reductase NADPH (EC 1.3.1.34), 4), 2 Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), Pyruvate kinase (EC 2.7.1.40), 5), Phosphoribosylformylglycinamidine synthase; 2C syn- thetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase; 2C glutamine amidotransferase subunit (EC 6.3.5.3), 4), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), Glucose dehydrogenase; 2C PQQ-dependent (EC 1.1.5.2), 3), 3 Galactose-1-phosphate uridylyltransferase (EC 2.7.7.10), 5), Aspartate aminotransferase (EC 2.6.1.1), 4), 3-ketoacyl-CoA thiolase (EC 2.3.1.16), 4), Vitamin B12 ABC transporter; 2C ATP-binding protein BtuD / Adenosylcobinamide amidohydrolase (EC 3.5.1.90), 4), Methionyl-tRNA synthetase (EC 6.1.1.10), 4), 4 Ribulose-phosphate 3-epimerase (EC 5.1.3.1), 8), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 8), 16S rRNA (uracil(1498)-N(3))-methyltransferase (EC 2.1.1.193), 7), Argininosuccinate synthase (EC 6.3.4.5), 4), Valyl-tRNA synthetase (EC 6.1.1.9), 4), 5 Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 12), Cell division protein FtsH (EC 3.4.24.-), 8), Aspartate aminotransferase (EC 2.6.1.1), 5), Proline iminopeptidase (EC 3.4.11.5), 4), Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 4), 6 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 13), Phosphoribosylformylglycinamidine synthase; 2C synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase; 2C glutamine amidotransferase subunit (EC 6.3.5.3), 10), GMP synthase glutamine-hydrolyzing; 2C amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing; 2C ATP pyrophosphatase subunit (EC 6.3.5.2), 
10), Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6), 9), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), 7 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 9), Uracil-DNA glycosylase; 2C family 4 (EC 3.2.2.27), 5), 23S rRNA (cytosine(1962)-C(5))-methyltransferase (EC 2.1.1.191), 5), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 4), Diaminopimelate epimerase (EC 5.1.1.7), 4), 8 Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 7), Dihydropteroate synthase type-2 (EC 2.5.1.15) @ Sulfonamide resistance protein, 6), Alkanesulfonate monooxygenase (EC 1.14.14.5), 4), Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 4), Phospho- enolpyruvate carboxykinase GTP (EC 4.1.1.32), 4), 9 Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 9), Aminodeoxyfutalosine synthase (EC 2.5.1.120), 7), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 5), Uridine monophosphate kinase (EC 2.7.4.22), 5), Undecaprenyl-phosphate alpha-N- acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 5), 10 Phosphoglucosamine mutase (EC 5.4.2.10), 12), Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 8), Acetolactate synthase large subunit (EC 2.2.1.6), 7), Arginyl-tRNA synthetase (EC 6.1.1.19), 7), Serine acetyltransferase (EC 2.3.1.30), 7))

Carbon-N-Acetyl-D-Glucosamine 1 Cell division protein FtsZ (EC 3.4.24.-), 3), Phytochrome; 2C two-component sensor histidine kinase (EC 2.7.3.-); 3B cyanobacterial phytochrome 1, 2), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 1), 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase; 2C gamma subunit (EC 1.2.7.-), 1), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 1), 2 6-phosphogluconate dehydrogenase; 2C decarboxylating (EC 1.1.1.44), 7), Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 5), Coproporphyrinogen III oxidase; 2C oxygen-independent (EC 1.3.99.22), 4), DNA polymerase I (EC 2.7.7.7), 4), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 4), 3 Methionyl-tRNA synthetase (EC 6.1.1.10), 7), Limit dextrin alpha-1; 2C6-maltotetraose-hydrolase (EC 3.2.1.196), 7), Epoxide hydrolase (EC 3.3.2.9), 3), Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 2), Nucleoside triphosphate pyrophosphohydrolase MazG (EC 3.6.1.8), 2), 4 (tRNA (cytidine(34)-2-O)-methyltransferase (EC 2.1.1.207), 16), Peptidyl-prolyl cis-trans isomerase PpiD (EC 5.2.1.8), 7), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 6), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 4), 5 Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 7), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 4), Transketolase (EC 2.2.1.1), 4), dTDP-4-dehydrorhamnose 3; 2C5-epimerase (EC 5.1.3.13), 4), Serine hydroxymethyltransferase (EC 2.1.2.1), 4), 6 DNA ligase (NAD(+)) (EC 6.5.1.2), 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), Pantoate–beta-alanine ligase (EC 6.3.2.1), 4), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 4), NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 
1.2.1.12), 4), 7 Acetylglutamate kinase (EC 2.7.2.8), 25), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 8), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 7), ATP-dependent protease La (EC 3.4.21.53) Type I, 6), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 6), 8 3-oxoacyl-acyl-carrier-protein synthase; 2C KASIII (EC 2.3.1.180), 12), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 12), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 5), Methionyl-tRNA synthetase (EC 6.1.1.10), 5), Thioredoxin reductase (EC 1.8.1.9), 4), 9 NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 8), 3-deoxy-manno-octulosonate cytidylyltransferase (EC 2.7.7.38), 6), DNA polymerase III polC-type (EC 2.7.7.7), 5), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 4), Quinolinate synthetase (EC 2.5.1.72), 4), 10 Aspartokinase (EC 2.7.2.4), 4), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 4), DNA gyrase subunit B (EC 5.99.1.3), 4), Cell division protein FtsH (EC 3.4.24.-), 3), NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 3))

Carbon-L-Lactic-Acid 1 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 5), Aspartate aminotransferase (EC 2.6.1.1), 3), DNA polymerase I (EC 2.7.7.7), 3), Valyl-tRNA synthetase (EC 6.1.1.9), 2), 2 Glucose-1-phosphate thymidylyltransferase (EC 2.7.7.24), 2), 3-oxoacyl-acyl-carrier-protein synthase; 2C KASII (EC 2.3.1.179), 2), Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 2), ATP-dependent protease La (EC 3.4.21.53) Type I, 1), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 1), 3 GMP synthase glutamine-hydrolyzing; 2C amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing; 2C ATP py- rophosphatase subunit (EC 6.3.5.2), 3), Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 2), Cell division protein FtsH (EC 3.4.24.-), 2), ATP synthase alpha chain (EC 3.6.3.14), 2), beta-galactosidase (EC 3.2.1.23), 2), 4 NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 11), Chitinase (EC 3.2.1.14), 4), Oxaloacetate decarboxylase; 2C divalent-cation-dependent (EC 4.1.1.3), 3), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), Seryl-tRNA synthetase (EC 6.1.1.11), 2), 5 2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase II (EC 2.5.1.54), 4), Sucrose-6-phosphate hydrolase (EC 3.2.1.26); 3B Lev- anase (EC 3.2.1.65), 4), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3), DNA polymerase III beta subunit (EC 2.7.7.7), 2), Prolyl-tRNA synthetase (EC 6.1.1.15); 2C bacterial type, 2), 6 UDP-N-acetylmuramate–alanine ligase (EC 6.3.2.8), 8), GDP-mannose 4; 2C6-dehydratase (EC 4.2.1.47), 6), 3-methylmercaptopropionyl- CoA ligase (EC 6.2.1.44) of DmdB2 type, 5), Phosphoribulokinase (EC 2.7.1.19), 4), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 4), 7 Anthranilate synthase; 2C aminase component (EC 4.1.3.27), 1), Transketolase (EC 2.2.1.1), 1), Glutamate-ammonia-ligase adenylyl- 
transferase (EC 2.7.7.42), 1), (Pyridoxamine 5-phosphate oxidase (EC 1.4.3.5), 1), Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 1), 8 3-oxoacyl-acyl-carrier-protein synthase; 2C KASII (EC 2.3.1.179), 5), Endonuclease III (EC 4.2.99.18), 4), 5-methyltetrahydrofolate– homocysteine methyltransferase (EC 2.1.1.13), 3), Ribonucleotide reductase of class Ib (aerobic); 2C beta subunit (EC 1.17.4.1), 3), 6-phosphofructokinase (EC 2.7.1.11), 3), 9 Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 3), Protoporphyrin IX Mg-chelatase subunit I (EC 6.6.1.1), 3), tRNA- guanine transglycosylase (EC 2.4.2.29), 3), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 2), Phytochrome; 2C two-component sensor histidine kinase (EC 2.7.3.-), 2), 10 ATP synthase alpha chain (EC 3.6.3.14), 27), Prolipoprotein diacylglyceryl transferase (EC 2.4.99.-), 6), Oligoendopeptidase F (EC 3.4.24.-), 6), Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 5), GDP-mannose 4; 2C6-dehydratase (EC 4.2.1.47), 4))

Carbon-m-Tartaric-Acid 1 NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 12), Type I restriction-modification system; 2C restriction subunit R (EC 3.1.21.3), 6), Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 6), Aspartate carbamoyltransferase (EC 2.1.3.2), 6), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 4), 2 Urocanate hydratase (EC 4.2.1.49), 18), Aconitate hydratase (EC 4.2.1.3), 9), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), Leucyl-tRNA synthetase (EC 6.1.1.4), 6), Anthranilate synthase; 2C aminase component (EC 4.1.3.27), 5), 3 NADP-dependent malic enzyme (EC 1.1.1.40), 6), DNA ligase (NAD(+)) (EC 6.5.1.2), 5), Membrane alanine aminopeptidase N (EC 3.4.11.2), 5), DNA polymerase III alpha subunit (EC 2.7.7.7), 4), NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 4), 4 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 7), Cell division protein FtsH (EC 3.4.24.-), 6), L-seryl-tRNA(Sec) selenium transferase (EC 2.9.1.1), 6), Enoyl-CoA hydratase isoleucine degradation (EC 4.2.1.17) / 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) / 3-hydroxybutyryl-CoA epimerase (EC 5.1.2.3), 6), DNA gyrase subunit B (EC 5.99.1.3), 6), 5 Lead; 2C cadmium; 2C zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); 3B Copper-translocating P-type ATPase (EC 3.6.3.4), 10), Aminopeptidase S (Leu; 2C Val; 2C Phe; 2C Tyr preference) (EC 3.4.11.24), 6), Topoisomerase IV subunit A (EC 5.99.1.-), 6), Aspartate aminotransferase (EC 2.6.1.1), 4), Ribonuclease E (EC 3.1.26.12), 4), 6 DNA gyrase subunit A (EC 5.99.1.3), 18), Isocitrate dehydrogenase NADP (EC 1.1.1.42); 3B Monomeric isocitrate dehydrogenase NADP (EC 1.1.1.42), 11), Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 7), Cystathionine beta-synthase (EC 4.2.1.22), 6), 3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 6), 7 Respiratory nitrate reductase delta chain (EC 1.7.99.4), 5), 
Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), Membrane-bound lytic murein transglycosylase F (EC 4.2.2.n1), 4), V-type ATP synthase subunit C (EC 3.6.3.14), 4), Transketolase (EC 2.2.1.1), 4), 8 2-methylcitrate dehydratase (2-methyl-trans-aconitate forming) (EC 4.2.1.117), 4), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 4), Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 4), 2-acylglycerophosphoethanolamine acyltransferase (EC 2.3.1.40) / Acyl-acyl-carrier-protein synthetase (EC 6.2.1.20), 4), Alcohol dehydrogenase (EC 1.1.1.1), 3), 9 ATP synthase beta chain (EC 3.6.3.14), 15), Alanyl-tRNA synthetase (EC 6.1.1.7), 6), NADH dehydrogenase (EC 1.6.99.3), 5), Hydrox- ymethylpyrimidine phosphate kinase ThiD (EC 2.7.4.7), 5), Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 5), 10 Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 7), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 5), DNA primase (EC 2.7.7.- ), 5), Xanthine dehydrogenase; 2C molybdenum binding subunit (EC 1.17.1.4), 4), Malonate-semialdehyde dehydrogenase inositol (EC 1.2.1.18), 4))

Carbon-D-Mannose
1 Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 3), ATP-dependent protease La (EC 3.4.21.53) Type I, 2), Seryl-tRNA synthetase (EC 6.1.1.11), 2), Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 2), Fumarylacetoacetase (EC 3.7.1.2), 2),
2 Assimilatory nitrate reductase large subunit (EC 1.7.99.4), 2), DNA primase (EC 2.7.7.-), 1), Formyltetrahydrofolate deformylase (EC 3.5.1.10), 1), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 1), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 1),
3 Fumarylacetoacetase (EC 3.7.1.2), 1), Kynurenine 3-monooxygenase (EC 1.14.13.9), 1), ATP synthase beta chain (EC 3.6.3.14), 1), Sarcosine oxidase alpha subunit (EC 1.5.3.1), 1), Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 1),
4 Tyrosyl-tRNA synthetase (EC 6.1.1.1), 6), Pyruvate carboxylase (EC 6.4.1.1), 4), Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), Indole-3-glycerol phosphate synthase (EC 4.1.1.48), 3),
5 DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 7), Aconitate hydratase (EC 4.2.1.3), 4), Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 4), 6-phosphofructokinase (EC 2.7.1.11), 3), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 3),
6 Phosphatidylserine decarboxylase (EC 4.1.1.65), 9), Adenosylcobinamide-phosphate synthase (EC 6.3.1.10), 3), CCA tRNA nucleotidyltransferase (EC 2.7.7.72), 2), beta-N-acetylglucosaminidase (EC 3.2.1.52), 2), L-serine dehydratase, beta subunit (EC 4.3.1.17) / L-serine dehydratase, alpha subunit (EC 4.3.1.17), 2),
7 Leucyl-tRNA synthetase (EC 6.1.1.4), 24), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 6), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), Membrane alanine aminopeptidase N (EC 3.4.11.2), 3), Guanyl-specific ribonuclease (EC 3.1.27.3), 3),
8 Signal peptidase I (EC 3.4.21.89), 4), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 3), Arginine N-succinyltransferase, beta subunit (EC 2.3.1.109), 1), Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 1), 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45), 1),
9 Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 8), 3-polyprenyl-4-hydroxybenzoate carboxy-lyase (EC 4.1.1.98), 5), Amidophosphoribosyltransferase (EC 2.4.2.14), 4), DNA topoisomerase I (EC 5.99.1.2), 4), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 3),
10 Phosphomannomutase (EC 5.4.2.8), 5), Arylsulfatase (EC 3.1.6.1), 3), Uracil phosphoribosyltransferase (EC 2.4.2.9), 2), Oligoendopeptidase F (EC 3.4.24.-), 2), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 2))

Carbon-Glycyl-L-Glutamic-Acid
1 Dihydroxy-acid dehydratase (EC 4.2.1.9), 11), Glutamate synthase NADPH large chain (EC 1.4.1.13), 8), Phosphogluconate dehydratase (EC 4.2.1.12), 6), UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 6), Betaine aldehyde dehydrogenase (EC 1.2.1.8), 6),
2 DNA ligase (NAD(+)) (EC 6.5.1.2), 28), L-aspartate oxidase (EC 1.4.3.16), 7), Glyoxylate reductase (EC 1.1.1.26) @ Hydroxypyruvate reductase (EC 1.1.1.81), 6), Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 6), Valyl-tRNA synthetase (EC 6.1.1.9), 5),
3 Histidinol dehydrogenase (EC 1.1.1.23), 19), Tryptophan synthase beta chain (EC 4.2.1.20), 6), Phosphoribosylanthranilate isomerase (EC 5.3.1.24), 6), Acetyl-CoA synthetase (EC 6.2.1.1), 5), DNA polymerase III alpha subunit (EC 2.7.7.7), 5),
4 Cell division protein FtsZ (EC 3.4.24.-), 11), Protease II (EC 3.4.21.83), 10), Alanyl-tRNA synthetase (EC 6.1.1.7), 8), Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 7), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 7),
5 Histidinol dehydrogenase (EC 1.1.1.23), 6), Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 4), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 4), CDP-diacylglycerol–glycerol-3-phosphate 3-phosphatidyltransferase (EC 2.7.8.5), 3), NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 3),
6 Aspartate aminotransferase (EC 2.6.1.1), 8), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 6), D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 5), Arylsulfatase (EC 3.1.6.1), 5), Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 5),
7 Fumarate hydratase class II (EC 4.2.1.2), 6), Respiratory nitrate reductase alpha chain (EC 1.7.99.4), 5), Aspartokinase (EC 2.7.2.4) / Homoserine dehydrogenase (EC 1.1.1.3), 4), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 4), Sulfate adenylyltransferase subunit 1 (EC 2.7.7.4), 4),
8 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 7), Threonine dehydratase biosynthetic (EC 4.3.1.19), 6), Porphobilinogen deaminase (EC 2.5.1.61), 5), N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) / Glucosamine-1-phosphate N-acetyltransferase (EC 2.3.1.157), 4), Histidinol dehydrogenase (EC 1.1.1.23), 4),
9 ATP-dependent DNA helicase RecG (EC 3.6.4.12), 22), Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61) / 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2) @ 2-oxoglutarate decarboxylase (EC 4.1.1.71) @ 2-hydroxy-3-oxoadipate synthase (EC 2.2.1.5), 4), Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4), 4), Molybdopterin molybdenumtransferase (EC 2.10.1.1), 4), beta-glucosidase (EC 3.2.1.21), 4),
10 Methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9) / Methylenetetrahydrofolate dehydrogenase (NADP+) (EC 1.5.1.5), 10), Cell division protein FtsH (EC 3.4.24.-), 9), Aconitate hydratase (EC 4.2.1.3), 8), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 6), Alcohol dehydrogenase (EC 1.1.1.1), 5))

Carbon-4-Hydroxy-L-Proline-trans
1 Uridine monophosphate kinase (EC 2.7.4.22), 14), Pyrrolidone-carboxylate peptidase (EC 3.4.19.3), 12), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3), 9), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 9), Leucyl-tRNA synthetase (EC 6.1.1.4), 8),
2 DNA topoisomerase I (EC 5.99.1.2), 12), NADP-dependent malic enzyme (EC 1.1.1.40), 11), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 7), Transketolase (EC 2.2.1.1), 4), Exodeoxyribonuclease V beta chain (EC 3.1.11.5), 3),
3 NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 9), Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 9), UDP-galactopyranose mutase (EC 5.4.99.9), 9), Alcohol dehydrogenase (EC 1.1.1.1), 7), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5),
4 Threonine dehydratase biosynthetic (EC 4.3.1.19), 16), Threonine dehydratase, catabolic (EC 4.3.1.19) @ L-serine dehydratase, (PLP)-dependent (EC 4.3.1.17), 9), 3-isopropylmalate dehydrogenase (EC 1.1.1.85), 7), Methionyl-tRNA synthetase (EC 6.1.1.10), 6), (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10), 5),
5 Acetolactate synthase large subunit (EC 2.2.1.6), 8), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 7), ATP-dependent DNA helicase RecG (EC 3.6.4.12), 7), Lipoyl synthase (EC 2.8.1.8), 6), Alpha-aminoadipate aminotransferase (EC 2.6.1.39) @ Leucine transaminase (EC 2.6.1.6) @ Valine transaminase, 6),
6 (16S rRNA (cytidine(1402)-2-O)-methyltransferase (EC 2.1.1.198), 8), Aspartyl-tRNA synthetase (EC 6.1.1.12) @ Aspartyl-tRNA(Asn) synthetase (EC 6.1.1.23), 7), N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28), 7), D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 6), Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 6),
7 N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 7), Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 7), Tryptophan synthase alpha chain (EC 4.2.1.20), 6), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), Pyruvate,phosphate dikinase (EC 2.7.9.1), 5),
8 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 32), Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 15), Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 8), Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 7), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 6),
9 Limit dextrin alpha-1,6-maltotetraose-hydrolase (EC 3.2.1.196), 11), GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 8), NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 7), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), Transcriptional repressor of PutA and PutP / Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 5),
10 Aspartate-semialdehyde dehydrogenase (EC 1.2.1.11), 6), Histidyl-tRNA synthetase (EC 6.1.1.21), 6), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 6), Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 5), Glutamate synthase NADPH large chain (EC 1.4.1.13), 5))

Carbon-Fumaric-Acid
1 Glycolate dehydrogenase (EC 1.1.99.14), subunit GlcD, 6), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8), 4), Adenylate cyclase (EC 4.6.1.1), 4), Cell division protein FtsH (EC 3.4.24.-), 3),
2 D-2-hydroxyglutarate dehydrogenase (EC 1.1.99.2), 6), Leucyl-tRNA synthetase (EC 6.1.1.4), 6), Aconitate hydratase (EC 4.2.1.3), 4), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 3), Lipoprotein signal peptidase (EC 3.4.23.36), 3),
3 DNA polymerase III alpha subunit (EC 2.7.7.7), 5), Adenosylhomocysteinase (EC 3.3.1.1), 4), Uroporphyrinogen-III methyltransferase (EC 2.1.1.107) / Uroporphyrinogen-III synthase (EC 4.2.1.75), 4), Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 4), Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 4),
4 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), Ribonucleotide reductase of class II (coenzyme B12-dependent) (EC 1.17.4.1), 4), Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 3), DNA polymerase III alpha subunit (EC 2.7.7.7), 3),
5 Isoleucyl-tRNA synthetase (EC 6.1.1.5), 7), Alanyl-tRNA synthetase (EC 6.1.1.7), 7), beta-glucosidase (EC 3.2.1.21), 6), Cytidylate kinase (EC 2.7.4.25), 5), Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 4),
6 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 8), DNA ligase (NAD(+)) (EC 6.5.1.2), 4), Undecaprenyl-phosphate alpha-N-acetylglucosaminyl 1-phosphate transferase (EC 2.7.8.33), 4), DNA polymerase III alpha subunit (EC 2.7.7.7), 4), Sensor histidine kinase ResE (EC 2.7.13.3), 4),
7 Hydroxylamine reductase (EC 1.7.99.1), 6), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 3), Signal transduction histidine kinase CheA (EC 2.7.3.-), 3), 4-hydroxy-tetrahydrodipicolinate synthase (EC 4.3.3.7), 2), ATP synthase epsilon chain (EC 3.6.3.14), 2),
8 Error-prone repair homolog of DNA polymerase III alpha subunit (EC 2.7.7.7), 4), Cystathionine beta-synthase (EC 4.2.1.22), 3), Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 3), DNA polymerase I (EC 2.7.7.7), 3), Respiratory nitrate reductase beta chain (EC 1.7.99.4), 2),
9 Catalase-peroxidase KatG (EC 1.11.1.21), 11), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 9), Phosphoglucosamine mutase (EC 5.4.2.10), 7), Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 6), Thioredoxin reductase (EC 1.8.1.9), 6),
10 Uridine monophosphate kinase (EC 2.7.4.22), 11), Thioredoxin reductase (EC 1.8.1.9), 11), Phosphoglycerate kinase (EC 2.7.2.3), 10), Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 10), Exopolyphosphatase (EC 3.6.1.11), 8))

Carbon-p-Hydroxy-Phenylacetic-Acid
1 NADPH-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.13), 9), (Pyridoxal 5-phosphate synthase (glutamine hydrolyzing), synthase subunit (EC 4.3.3.6), 5), Glutamate synthase NADPH large chain (EC 1.4.1.13), 5), Diaminopimelate decarboxylase (EC 4.1.1.20), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4),
2 Histidinol dehydrogenase (EC 1.1.1.23), 15), S-adenosylmethionine synthetase (EC 2.5.1.6), 7), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 7), Prolipoprotein diacylglyceryl transferase (EC 2.4.99.-), 6), DNA gyrase subunit A (EC 5.99.1.3), 5),
3 Cysteine synthase (EC 2.5.1.47), 9), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 6), ATP synthase F0 sector subunit c (EC 3.6.3.14), 6), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 5), UDP-glucose 4-epimerase (EC 5.1.3.2), 4),
4 (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 10), Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 7), L-asparaginase I, cytoplasmic (EC 3.5.1.1), 5), Alkaline phosphatase (EC 3.1.3.1), 4), Cytosol aminopeptidase PepA (EC 3.4.11.1), 4),
5 Aconitate hydratase 2 (EC 4.2.1.3), 6), Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 6), Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 5), Pyridine nucleotide-disulfide oxidoreductase; NADH dehydrogenase (EC 1.6.99.3), 4), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4),
6 Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 8), UDP-N-acetyl-L-fucosamine synthase (EC 5.1.3.28), 7), Histidine ammonia-lyase (EC 4.3.1.3), 6), 3-oxoadipate CoA-transferase subunit A (EC 2.8.3.6), 5), Lysyl-tRNA synthetase (class II) (EC 6.1.1.6), 5),
7 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 42), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), tRNA pseudouridine(38-40) synthase (EC 5.4.99.12), 4), Hypoxanthine-guanine phosphoribosyltransferase (EC 2.4.2.8), 3), Pyruvate carboxylase subunit B (EC 6.4.1.1), 3),
8 Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 16), Thioredoxin reductase (EC 1.8.1.9), 11), Phosphoglycerate kinase (EC 2.7.2.3), 6), Cell division protein FtsZ (EC 3.4.24.-), 5), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5),
9 ATP synthase beta chain (EC 3.6.3.14), 17), Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 16), 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (EC 5.4.2.12), 10), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 9), Type I restriction-modification system, DNA-methyltransferase subunit M (EC 2.1.1.72), 7),
10 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 5), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 5), Cysteinyl-tRNA synthetase (EC 6.1.1.16), 4), FKBP-type peptidyl-prolyl cis-trans isomerase FklB (EC 5.2.1.8), 4), Urocanate hydratase (EC 4.2.1.49), 3))

Carbon-L-Malic-Acid
1 Mg(2+) transport ATPase, P-type (EC 3.6.3.2), 6), 4-hydroxyphenylacetate 3-monooxygenase (EC 1.14.14.9), 4), Carboxyl-terminal protease (EC 3.4.21.102), 4), Aspartate aminotransferase (EC 2.6.1.1), 3), Dihydropteroate synthase (EC 2.5.1.15), 3),
2 ATP-dependent protease La (EC 3.4.21.53) Type I, 5), Phosphate:acyl-ACP acyltransferase PlsX (EC 2.3.1.n2), 5), 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45), 4), Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 4), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 4),
3 DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), Glucan 1,4-alpha-maltotetraohydrolase (EC 3.2.1.60), 3), DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 3), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 3),
4 Valyl-tRNA synthetase (EC 6.1.1.9), 8), Methylmalonate-semialdehyde dehydrogenase (EC 1.2.1.27), 6), Adenylosuccinate synthetase (EC 6.3.4.4), 5), Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 3), Carbonic anhydrase, beta class (EC 4.2.1.1), 3),
5 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), Ribonucleotide reductase of class Ib (aerobic), beta subunit (EC 1.17.4.1), 4), Aspartate-semialdehyde dehydrogenase (EC 1.2.1.11), 4), Lysyl-tRNA synthetase (class I) (EC 6.1.1.6), 4), 4-alpha-glucanotransferase (amylomaltase) (EC 2.4.1.25), 3),
6 Isopentenyl-diphosphate delta-isomerase, FMN-dependent (EC 5.3.3.2), 5), Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2) / Acetyl-coenzyme A carboxyl transferase beta chain (EC 6.4.1.2); Propionyl-CoA carboxylase beta chain (EC 6.4.1.3), 2), Nicotinate-nucleotide adenylyltransferase (EC 2.7.7.18), 2), Glutamate N-acetyltransferase (EC 2.3.1.35) @ N-acetylglutamate synthase (EC 2.3.1.1), 2), Phosphate acetyltransferase (EC 2.3.1.8), 2),
7 Inosose isomerase (EC 5.3.99.11), 6), Nucleoside triphosphate pyrophosphohydrolase MazG (EC 3.6.1.8), 5), Cysteinyl-tRNA synthetase (EC 6.1.1.16), 3), Ribosomal protein L11 methyltransferase (EC 2.1.1.-), 3), Histidyl-tRNA synthetase (EC 6.1.1.21), 3),
8 Adenylosuccinate synthetase (EC 6.3.4.4), 5), UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 4), DNA ligase (NAD(+)) (EC 6.5.1.2), 3), Glutamine synthetase type I (EC 6.3.1.2), 3), Alpha-D-GlcNAc alpha-1,2-L-rhamnosyltransferase (EC 2.4.1.-), 3),
9 Peptide deformylase (EC 3.5.1.88), 10), Glycine dehydrogenase decarboxylating (glycine cleavage system P2 protein) (EC 1.4.4.2), 5), CTP synthase (EC 6.3.4.2), 3), Enoyl-CoA hydratase (EC 4.2.1.17), 3), Dihydroorotate dehydrogenase (NAD(+)), electron transfer subunit (EC 1.3.1.14), 3),
10 tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 3), Enolase (EC 4.2.1.11), 2), Enoyl-acyl-carrier-protein reductase FMN (EC 1.3.1.9), 2), Type I restriction-modification system, DNA-methyltransferase subunit M (EC 2.1.1.72), 2), NADPH dehydrogenase (EC 1.6.99.1), 2))

Carbon-a-Hydroxy-Butyric-Acid
1 Dihydroxy-acid dehydratase (EC 4.2.1.9), 10), Amidophosphoribosyltransferase (EC 2.4.2.14), 9), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 8), Cardiolipin synthetase (EC 2.7.8.-), 7), Glutathione synthetase (EC 6.3.2.3), 7),
2 Sensor histidine kinase ResE (EC 2.7.13.3), 5), Acetolactate synthase large subunit (EC 2.2.1.6), 4), Hydroxymethylpyrimidine phosphate kinase ThiD (EC 2.7.4.7), 4), Quinone oxidoreductase (EC 1.6.5.5), 4), Alcohol dehydrogenase (EC 1.1.1.1), 3),
3 Glucose-6-phosphate 1-dehydrogenase (EC 1.1.1.49), 6), Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 6), UDP-N-acetylmuramoylalanyl-D-glutamate–L-lysine ligase (EC 6.3.2.7), 4), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 4), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4),
4 Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 8), Urocanate hydratase (EC 4.2.1.49), 7), Cysteine synthase (EC 2.5.1.47), 7), Leucyl-tRNA synthetase (EC 6.1.1.4), 6), Exodeoxyribonuclease V alpha chain (EC 3.1.11.5), 6),
5 Enolase (EC 4.2.1.11), 44), Arsenate reductase (EC 1.20.4.1), 14), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 12), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 11), Ketosteroid-9-alpha-hydroxylase, oxygenase (EC 1.14.13.-); Terminal oxygenase KshA, 7),
6 Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), Cell division protein FtsH (EC 3.4.24.-), 3), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 3), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 3),
7 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 13), Ribonucleotide reductase of class Ib (aerobic), alpha subunit (EC 1.17.4.1), 9), ATP synthase beta chain (EC 3.6.3.14), 6), 6-phosphogluconolactonase (EC 3.1.1.31), eukaryotic type, 6), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5),
8 Dihydroxy-acid dehydratase (EC 4.2.1.9), 54), Phosphogluconate dehydratase (EC 4.2.1.12), 21), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 21), L-arabonate dehydratase (EC 4.2.1.25), 7), GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 7),
9 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 15), Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 8), Ribonucleotide reductase of class Ia (aerobic), beta subunit (EC 1.17.4.1), 7), Malate:quinone oxidoreductase (EC 1.1.5.4), 6), Glycyl-tRNA synthetase (EC 6.1.1.14), 6),
10 Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 8), Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 7), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), DNA gyrase subunit A (EC 5.99.1.3), 4), 4-hydroxybenzoate polyprenyltransferase (EC 2.5.1.39), 3))

Carbon-D-Xylose
1 DNA polymerase III beta subunit (EC 2.7.7.7), 4), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4), 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 3), 6-phosphogluconate dehydrogenase, decarboxylating (EC 1.1.1.44), 3), Malto-oligosyltrehalose synthase (EC 5.4.99.15), 2),
2 Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 8), Thiol peroxidase, Bcp-type (EC 1.11.1.15), 7), Deoxycytidine triphosphate deaminase (EC 3.5.4.13), 6), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 6), Aspartokinase (EC 2.7.2.4), 5),
3 UDP-N-acetylenolpyruvoylglucosamine reductase (EC 1.3.1.98), 18), Isocitrate dehydrogenase NADP (EC 1.1.1.42), 12), Glutamate synthase NADPH large chain (EC 1.4.1.13), 8), Glutamine synthetase type I (EC 6.3.1.2), 7), ATP-dependent protease La (EC 3.4.21.53) Type I, 6),
4 SSU rRNA (adenine(1518)-N(6)/adenine(1519)-N(6))-dimethyltransferase (EC 2.1.1.182), 4), 5-formyltetrahydrofolate cyclo-ligase (EC 6.3.3.2), 3), Histidyl-tRNA synthetase (EC 6.1.1.21), 3), Dihydrofolate reductase (EC 1.5.1.3), 3), Alcohol dehydrogenase (EC 1.1.1.1), 2),
5 Phosphoenolpyruvate-protein phosphotransferase of PTS system (EC 2.7.3.9), 13), Phosphoserine phosphatase (EC 3.1.3.3), 7), Valyl-tRNA synthetase (EC 6.1.1.9), 6), Dihydroorotase (EC 3.5.2.3), 6), Peptidyl-dipeptidase A precursor (EC 3.4.15.1), 4),
6 Uracil-DNA glycosylase, family 5 (EC 3.2.2.27), 6), Alkaline phosphatase (EC 3.1.3.1), 3), NAD(P)H-hydrate epimerase (EC 5.1.99.6) / ADP-dependent (S)-NAD(P)H-hydrate dehydratase (EC 4.2.1.136), 3), Topoisomerase IV subunit B (EC 5.99.1.-), 3), DNA polymerase I (EC 2.7.7.7), 3),
7 Signal transduction histidine kinase CheA (EC 2.7.3.-), 8), Uracil phosphoribosyltransferase (EC 2.4.2.9), 8), Glutamate synthase NADPH large chain (EC 1.4.1.13), 8), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 6), 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 5),
8 ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 9), NADH dehydrogenase (EC 1.6.99.3), 5), Limit dextrin alpha-1,6-maltotetraose-hydrolase (EC 3.2.1.196), 5), beta-galactosidase (EC 3.2.1.23), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 4),
9 Cytochrome d ubiquinol oxidase subunit I (EC 1.10.3.-), 12), Anthranilate synthase, aminase component (EC 4.1.3.27), 11), Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 8), Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 6), Para-aminobenzoate synthase, aminase component (EC 2.6.1.85), 5),
10 Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 7), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), Aldehyde dehydrogenase (EC 1.2.1.3), 6), Methyl-directed repair DNA adenine methylase (EC 2.1.1.72), 5), Cell division protein FtsH (EC 3.4.24.-), 4))

Carbon-a-Keto-Butyric-Acid
1 ATP synthase beta chain (EC 3.6.3.14), 9), Cell division protein FtsZ (EC 3.4.24.-), 6), Ribokinase (EC 2.7.1.15), 6), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 5), Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 4),
2 Phosphoenolpyruvate synthase (EC 2.7.9.2), 9), Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10), 5), Branched-chain amino acid dehydrogenase deaminating (EC 1.4.1.9)(EC 1.4.1.23), 4), N(6)-L-threonylcarbamoyladenine synthase (EC 2.3.1.234), 4), Dihydrolipoamide dehydrogenase of pyruvate dehydrogenase complex (EC 1.8.1.4), 4),
3 Threonyl-tRNA synthetase (EC 6.1.1.3), 53), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 7), (23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185), 6), Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), Cardiolipin synthetase (EC 2.7.8.-), 5),
4 N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) / Glucosamine-1-phosphate N-acetyltransferase (EC 2.3.1.157), 7), Fumarate hydratase class I (EC 4.2.1.2), 5), Histidinol dehydrogenase (EC 1.1.1.23), 5), Hydroxymethylpyrimidine phosphate kinase ThiD (EC 2.7.4.7), 4), Methionyl-tRNA synthetase (EC 6.1.1.10), 4),
5 Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 9), UDP-glucose 6-dehydrogenase (EC 1.1.1.22), 8), Spermidine synthase (EC 2.5.1.16), 7), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 7), 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 5),
6 Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 9), Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1), 8), Ketol-acid reductoisomerase (EC 1.1.1.86), 5), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 5), DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 4),
7 Transcriptional repressor of PutA and PutP / Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 19), D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 15), Proline dehydrogenase (EC 1.5.5.2) / Delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.2.1.88), 14), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), Ribulose-phosphate 3-epimerase (EC 5.1.3.1), 5),
8 Tyrosyl-tRNA synthetase (EC 6.1.1.1), 6), Imidazolonepropionase (EC 3.5.2.7), 5), tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 4), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 4), Succinate dehydrogenase flavoprotein subunit (EC 1.3.5.1), 4),
9 Transketolase (EC 2.2.1.1), 12), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 7), ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 7), NAD-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12), 7), Enolase (EC 4.2.1.11), 5),
10 Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 5), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 5), D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 5), tRNA-guanine transglycosylase (EC 2.4.2.29), 4), 8-amino-7-oxononanoate synthase (EC 2.3.1.47), 4))

Nitrogen-D-Alanine
1 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 2), Cysteinyl-tRNA synthetase (EC 6.1.1.16), 2), Tryptophan synthase beta chain (EC 4.2.1.20), 2), Adenylate kinase (EC 2.7.4.3), 2),
2 Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 2), Exopolysaccharide biosynthesis glycosyltransferase EpsF (EC 2.4.1.-), 2), 5-hydroxyisourate hydrolase (EC 3.5.2.17), 2), Cell division protein FtsZ (EC 3.4.24.-), 2), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 2),
3 Geranyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.5), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), Ribosomal large subunit pseudouridine synthase D (EC 5.4.99.23), 5), Polyphosphate kinase (EC 2.7.4.1), 5), Glutathione peroxidase (EC 1.11.1.9) @ Thioredoxin peroxidase (EC 1.11.1.15), 4),
4 Probable dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133), 7), Biotin carboxylase of methylcrotonyl-CoA carboxylase (EC 6.3.4.14), 7), CCA tRNA nucleotidyltransferase (EC 2.7.7.72), 4), Threonine synthase (EC 4.2.3.1), 3), Dihydroorotase (EC 3.5.2.3), 3),
5 Acetolactate synthase large subunit (EC 2.2.1.6), 10), Aconitate hydratase (EC 4.2.1.3), 4), FMN adenylyltransferase (EC 2.7.7.2) / Riboflavin kinase (EC 2.7.1.26), 3), tRNA (guanine(37)-N(1))-methyltransferase (EC 2.1.1.228), 3), Precorrin-8X methylmutase (EC 5.4.99.61), 3),
6 Diaminopimelate epimerase (EC 5.1.1.7), 7), Cell division protein FtsZ (EC 3.4.24.-), 6), DNA gyrase subunit B (EC 5.99.1.3), 6), Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 4), Valyl-tRNA synthetase (EC 6.1.1.9), 3),
7 Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 5), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Dihydroxy-acid dehydratase (EC 4.2.1.9), 5), Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 4), ATP synthase alpha chain (EC 3.6.3.14), 3),
8 Topoisomerase IV subunit B (EC 5.99.1.-), 6), Diaminopimelate epimerase (EC 5.1.1.7), 5), beta-glucosidase (EC 3.2.1.21), 5), 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 4), DNA gyrase subunit A (EC 5.99.1.3), 4),
9 Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), Diaminopimelate decarboxylase (EC 4.1.1.20), 3), beta-glucosidase (EC 3.2.1.21), 3), DNA gyrase subunit A (EC 5.99.1.3), 3), Aspartokinase (EC 2.7.2.4), 2),
10 Nucleoside diphosphate kinase (EC 2.7.4.6), 10), Respiratory nitrate reductase beta chain (EC 1.7.99.4), 5), tRNA-i(6)A37 methylthiotransferase (EC 2.8.4.3), 5), Valyl-tRNA synthetase (EC 6.1.1.9), 4), Dihydrolipoamide dehydrogenase of pyruvate dehydrogenase complex (EC 1.8.1.4), 4))

Carbon-D-Lactitol
1 Pyrimidine-nucleoside phosphorylase (EC 2.4.2.2), 6), Acetyl-CoA synthetase (EC 6.2.1.1), 5), Dihydroxy-acid dehydratase (EC 4.2.1.9), 5), NAD synthetase (EC 6.3.1.5) / Glutamine amidotransferase chain of NAD synthetase, 4), Cell division protein FtsH (EC 3.4.24.-), 4),
2 Alanyl-tRNA synthetase (EC 6.1.1.7), 10), beta-galactosidase (EC 3.2.1.23), 8), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 7), Selenide,water dikinase (EC 2.7.9.3), 7), (Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II, 6),
3 1-deoxy-D-xylulose 5-phosphate reductoisomerase (EC 1.1.1.267), 9), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 8), Glutamate synthase NADPH large chain (EC 1.4.1.13), 7), Dihydroxy-acid dehydratase (EC 4.2.1.9), 7), O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 6),
4 Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 21), 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 7), Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), Enolase (EC 4.2.1.11), 6), 2-acylglycerophosphoethanolamine acyltransferase (EC 2.3.1.40) / Acyl-acyl-carrier-protein synthetase (EC 6.2.1.20), 6),
5 3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 8), 23S rRNA (pseudouridine(1915)-N(3))-methyltransferase (EC 2.1.1.177), 6), Ribonucleoside-diphosphate reductase (EC 1.17.4.1), 5), Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 3), DNA polymerase II (EC 2.7.7.7), 3),
6 UDP-3-O-3-hydroxymyristoyl glucosamine N-acyltransferase (EC 2.3.1.191), 8), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 6), Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 5), Transketolase (EC 2.2.1.1), 5), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4),
7 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), Ribonuclease III (EC 3.1.26.3), 3), Cell division protein FtsH (EC 3.4.24.-), 3), Acetolactate synthase large subunit (EC 2.2.1.6), 3), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 3),
8 Branched-chain amino acid aminotransferase (EC 2.6.1.42), 9), UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 7), Cysteine desulfurase (EC 2.8.1.7) => IscS, 7), Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 6), Cyclic pyranopterin phosphate synthase (MoaA) (EC 4.1.99.18), 6),
9 Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 9), (Inosine-5-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 9), DNA gyrase subunit B (EC 5.99.1.3), 6), Argininosuccinate synthase (EC 6.3.4.5), 5), Cell division protein FtsZ (EC 3.4.24.-), 5),
10 3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 10), Phosphogluconate dehydratase (EC 4.2.1.12), 6), ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 5), Acetyl-CoA synthetase (EC 6.2.1.1), 5), DNA polymerase III alpha subunit (EC 2.7.7.7), 5))

Carbon-D-Ribose
1 Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 9), Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), Pantoate–beta-alanine ligase (EC 6.3.2.1), 3), Chorismate synthase (EC 4.2.3.5), 3), Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 3),
2 DNA gyrase subunit A (EC 5.99.1.3), 6), 2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 4), UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 4), Alkaline phosphatase (EC 3.1.3.1), 3), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3),
3 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5), Acetylornithine aminotransferase (EC 2.6.1.11), 3), Ribonuclease E (EC 3.1.26.12), 3), Hydrolase (EC 3.8.1.2), 3), Pyrophosphate-energized proton pump (EC 3.6.1.1), 3),
4 Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 9), 8-amino-7-oxononanoate synthase (EC 2.3.1.47), 7), Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 6), Alcohol dehydrogenase (EC 1.1.1.1), 5), N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) / Glucosamine-1-phosphate N-acetyltransferase (EC 2.3.1.157), 5),
5 Cell division protein FtsZ (EC 3.4.24.-), 5), Aspartate aminotransferase (EC 2.6.1.1), 3), Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2), 3), Aspartate 1-decarboxylase (EC 4.1.1.11), 3), Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 3),
6 Hydroxymethylglutaryl-CoA reductase (EC 1.1.1.34), 2), Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6), 2), DNA gyrase subunit B (EC 5.99.1.3), 1), ATP synthase beta chain (EC 3.6.3.14), 1), UDP-2,3-diacylglucosamine diphosphatase (EC 3.6.1.54), 1),
7 Uridine monophosphate kinase (EC 2.7.4.22), 14), 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (EC 2.7.1.148), 7), DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 7), DNA gyrase subunit B (EC 5.99.1.3), 6), DNA polymerase I (EC 2.7.7.7), 5),
8 DNA gyrase subunit A (EC 5.99.1.3), 7), Isoleucyl-tRNA synthetase (EC 6.1.1.5), 6), Aspartate carbamoyltransferase (EC 2.1.3.2), 6), Endonuclease III (EC 4.2.99.18), 5), Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 5),
9 Alanyl-tRNA synthetase (EC 6.1.1.7), 9), DEAD-box ATP-dependent RNA helicase DeaD (= CshA) (EC 3.6.4.13), 5), Chorismate synthase (EC 4.2.3.5), 5), DNA gyrase subunit A (EC 5.99.1.3), 4), Cobyrinic acid a,c-diamide synthetase (EC 6.3.5.11), 3),
10 (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3), Alanyl-tRNA synthetase (EC 6.1.1.7), 3), Ferredoxin–NADP(+) reductase, actinobacterial (eukaryote-like) type (EC 1.18.1.2), 2), 3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 2), Cytochrome c oxidase (cbb3-type) subunit CcoN (EC 1.9.3.1), 2))

Nitrogen-Cytosine
1 (Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 5), (Tyrosyl-tRNA synthetase (EC 6.1.1.1), 4), (Prolipoprotein diacylglyceryl transferase (EC 2.4.99.-), 3), (Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 3), (S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17), 3)
2 (UDP-glucose 4-epimerase (EC 5.1.3.2), 11), (Transaldolase (EC 2.2.1.2), 7), (Methionine aminopeptidase (EC 3.4.11.18), 6), (Foldase protein PrsA precursor (EC 5.2.1.8), 6), (Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 5)
3 (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 7), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 6), (Ribonucleotide reductase of class Ia (aerobic), alpha subunit (EC 1.17.4.1), 6), (3-oxoacyl-acyl-carrier-protein synthase, KASIII (EC 2.3.1.180), 4), (Domain often clustered or fused with uracil-DNA glycosylase / Uracil-DNA glycosylase, putative family 6 (EC 3.2.2.27), 4)
4 (Alanyl-tRNA synthetase (EC 6.1.1.7), 7), (DNA gyrase subunit B (EC 5.99.1.3), 6), (Succinate-semialdehyde dehydrogenase NAD (EC 1.2.1.24); Succinate-semialdehyde dehydrogenase NADP+ (EC 1.2.1.79), 5), (Dihydrolipoamide dehydrogenase of pyruvate dehydrogenase complex (EC 1.8.1.4), 4), (Alcohol dehydrogenase (EC 1.1.1.1), 4)
5 (Phosphoglycerate kinase (EC 2.7.2.3), 5), (Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46), 5), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 4), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 4), (Peptide chain release factor N(5)-glutamine methyltransferase (EC 2.1.1.297), 4)
6 (Cell division protein FtsH (EC 3.4.24.-), 12), (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 12), (Adenylate cyclase (EC 4.6.1.1), 5), (S-adenosylmethionine synthetase (EC 2.5.1.6), 4), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3)
7 (Enolase (EC 4.2.1.11), 11), (Cell division protein FtsH (EC 3.4.24.-), 7), (Ubiquinol-cytochrome C reductase iron-sulfur subunit (EC 1.10.2.2), 7), (Dihydroorotase (EC 3.5.2.3), 5), (Porphobilinogen synthase (EC 4.2.1.24), 4)
8 (Fructose-1,6-bisphosphatase, type I (EC 3.1.3.11), 1), (Pyruvate formate-lyase (EC 2.3.1.54), 1), (Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 1), (Altronate dehydratase (EC 4.2.1.7), 1), (tRNA pseudouridine(55) synthase (EC 5.4.99.25), 1)
9 (Thioredoxin reductase (EC 1.8.1.9), 6), (beta-galactosidase (EC 3.2.1.23), 5), (Nucleoside triphosphate pyrophosphohydrolase MazG (EC 3.6.1.8), 5), (Alanyl-tRNA synthetase (EC 6.1.1.7), 4), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 3)
10 (Phosphoenolpyruvate carboxykinase GTP (EC 4.1.1.32), 16), (Electron transfer flavoprotein-ubiquinone oxidoreductase (EC 1.5.5.1), 7), (DNA polymerase I (EC 2.7.7.7), 6), (Soluble lytic murein transglycosylase precursor (EC 3.2.1.-), 5), (Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 5)

Carbon-Succinic-Acid
1 (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 4), (Fumarate hydratase class I (EC 4.2.1.2), 3), (beta-galactosidase (EC 3.2.1.23), 3), (Phosphoenolpyruvate-dihydroxyacetone phosphotransferase (EC 2.7.1.121), ADP-binding subunit DhaL, 3), (NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 3)
2 (Coproporphyrinogen III oxidase, aerobic (EC 1.3.3.3), 3), (Galactose-1-phosphate uridylyltransferase (EC 2.7.7.10), 2), (Pantothenate kinase (EC 2.7.1.33), 2), (Aspartate aminotransferase (EC 2.6.1.1), 2), (Para-aminobenzoate synthase, aminase component (EC 2.6.1.85), 2)
3 (ATP synthase alpha chain (EC 3.6.3.14), 4), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), (Uridine monophosphate kinase (EC 2.7.4.22), 3), (N-acetylglucosamine deacetylase (EC 3.5.1.-) / 3-hydroxyacyl-acyl-carrier-protein dehydratase, FabZ form (EC 4.2.1.59), 3), (Acetyl-CoA synthetase (EC 6.2.1.1), 3)
4 (DNA polymerase I (EC 2.7.7.7), 9), (D-alanine–D-alanine ligase (EC 6.3.2.4), 4), (Serine hydroxymethyltransferase (EC 2.1.2.1), 4), (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 3), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 3)
5 (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 8), (6-phospho-beta-glucosidase (EC 3.2.1.86), 4), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 3), (2-Keto-3-deoxy-D-manno-octulosonate-8-phosphate synthase (EC 2.5.1.55), 2), (Glutamine synthetase type I (EC 6.3.1.2), 2)
6 (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 8), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 8), (1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1), 5), (D-arabinose 5-phosphate isomerase (EC 5.3.1.13), 5), (Indole-3-glycerol phosphate synthase (EC 4.1.1.48), 5)
7 (Adenylate kinase (EC 2.7.4.3), 12), (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 10), (tRNA-specific 2-thiouridylase MnmA (EC 2.8.1.13), 4), (Alanyl-tRNA synthetase (EC 6.1.1.7), 4), (Glutathione reductase (EC 1.8.1.7), 3)
8 (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 5), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 5), (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 4), (D-alanine–D-alanine ligase (EC 6.3.2.4), 3), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 3)
9 (DNA gyrase subunit B (EC 5.99.1.3), 15), (Topoisomerase IV subunit B (EC 5.99.1.-), 8), (Ribonuclease III (EC 3.1.26.3), 7), (2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 5)
10 (Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 4), (Methylmalonyl-CoA mutase small subunit, MutA (EC 5.4.99.2), 4), (Pyruvate dehydrogenase E1 component beta subunit (EC 1.2.4.1), 3), (Molybdopterin molybdenumtransferase (EC 2.10.1.1), 3), (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 3)

Table A.16: The top 5 enzymes (with their EC numbers) associated with each of the top 10 k-mers. For each medium we show the top 10 k-mers, ranked according to the RF FI score. The second number in each tuple is the number of times the given enzyme is associated with the given k-mer in the specified PGR media dataset.
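The tally described in the caption of Table A.16 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the table: it assumes a list of k-mers already ranked by RF FI score and a mapping from each k-mer to the enzyme annotations of the genes in which it occurs. The function name `top_enzymes_per_kmer`, both input structures, and the toy k-mers are hypothetical.

```python
from collections import Counter


def top_enzymes_per_kmer(ranked_kmers, kmer_annotations, n_kmers=10, n_enzymes=5):
    """For each of the first n_kmers k-mers, count its enzyme annotations
    and keep the n_enzymes most frequent ones as (enzyme, count) tuples."""
    table = []
    for kmer in ranked_kmers[:n_kmers]:
        # A k-mer with no recorded annotation yields an empty list of tuples.
        counts = Counter(kmer_annotations.get(kmer, []))
        table.append((kmer, counts.most_common(n_enzymes)))
    return table


# Toy input: hypothetical k-mers, annotation strings borrowed from the table.
ranked_kmers = ["ACGTACGTAC", "TTGACCGATA"]
kmer_annotations = {
    "ACGTACGTAC": ["DNA gyrase subunit A (EC 5.99.1.3)"] * 3
    + ["Chorismate synthase (EC 4.2.3.5)"],
    "TTGACCGATA": ["Enolase (EC 4.2.1.11)"] * 2,
}

rows = top_enzymes_per_kmer(ranked_kmers, kmer_annotations)
for rank, (kmer, enzymes) in enumerate(rows, start=1):
    print(rank, kmer, enzymes)
```

Each printed row corresponds to one numbered row of the table: the k-mer rank followed by its (enzyme, count) tuples in decreasing order of count.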
[TODO: REDO]

Nitrogen-Glycine (Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 6), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), (Threonyl-tRNA synthetase (EC 6.1.1.3), 5), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 4), (Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 4), (Thiazole synthase (EC 2.8.1.10), 3), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 3), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 3), (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 3), (Glucose-6-phosphate isomerase (EC 5.3.1.9), 3), (FKBP-type peptidyl-prolyl cis-trans isomerase FkpA precursor (EC 5.2.1.8), 3), (Oligoendopeptidase F (EC 3.4.24.-), 3), (Peptidyl-tRNA hydrolase (EC 3.1.1.29), 2), (Putative ATP:guanido phosphotransferase YacI (EC 2.7.3.-), 2), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 2)

Carbon-D-Galactonic-Acid-g-Lactone (ATP synthase alpha chain (EC 3.6.3.14), 65), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 60), (Aldehyde dehydrogenase (EC 1.2.1.3), 52), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 49), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 48), (Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 45), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 42), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 42), (2-isopropylmalate synthase (EC 2.3.3.13), 40), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 39), (Homoserine dehydrogenase (EC 1.1.1.3), 36), (Enoyl-CoA hydratase (EC 4.2.1.17), 34), (Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 32), (Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 32), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 32)

Nitrogen-L-Serine (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 8), (Catalase-peroxidase KatG (EC 1.11.1.21), 7), (Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), (Formate dehydrogenase N beta subunit (EC 1.2.1.2), 6), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 6), (Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 6), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 6), (CoA-disulfide reductase (EC 1.8.1.14), 5), (DNA gyrase subunit B (EC 5.99.1.3), 5), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 5), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 4), (Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 4), (Phosphogluconate dehydratase (EC 4.2.1.12), 4), (NADPH-dependent 7-cyano-7-deazaguanine reductase (EC 1.7.1.13), 3), (Lipid A biosynthesis lauroyl acyltransferase (EC 2.3.1.241), 3)

Nitrogen-L-Glutamic-Acid (Pyruvate-flavodoxin oxidoreductase (EC 1.2.7.-), 14), (Ribonuclease HI (EC 3.1.26.4), 12), (Bifunctional protein: zinc-containing alcohol dehydrogenase; quinone oxidoreductase (NADPH:quinone reductase) (EC 1.1.1.-); Similar to arginate lyase, 9), (Leucyl-tRNA synthetase (EC 6.1.1.4), 9), (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 9), (Threonyl-tRNA synthetase (EC 6.1.1.3), 8), (Riboflavin synthase eubacterial/eukaryotic (EC 2.5.1.9), 8), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 7), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 7), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 7), ((R)-citramalate synthase (EC 2.3.1.182), 6), (Exodeoxyribonuclease V beta chain (EC 3.1.11.5), 6), (CoA-disulfide reductase (EC 1.8.1.14), 5), (NAD-specific glutamate dehydrogenase (EC 1.4.1.2), large form, 5), (Enoyl-CoA hydratase (EC 4.2.1.17) @ Enoyl-CoA hydratase EchA5 (EC 4.2.1.17), 5)

Carbon-b-Methyl-D-Galactoside (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 60), (Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19), 47), (Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 46), (Pyruvate kinase (EC 2.7.1.40), 39), (Aldehyde dehydrogenase (EC 1.2.1.3), 38), (Omega-amino acid–pyruvate aminotransferase (EC 2.6.1.18), 37), (Alanyl-tRNA synthetase (EC 6.1.1.7), 36), (Fructose-1,6-bisphosphatase, type I (EC 3.1.3.11), 34), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 32), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 32), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 30), (DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 29), (GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 29), (Inosine-5'-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 29), (Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 27)

Carbon-L-Aspartic-Acid (Glycerate kinase (EC 2.7.1.31), 16), (Carbamoyl-phosphate synthase small chain (EC 6.3.5.5), 7), (Maltose/maltodextrin transport ATP-binding protein MalK (EC 3.6.3.19), 7), (Aspartate-semialdehyde dehydrogenase DoeC in ectoine degradation (EC 1.2.1.-), 7), (Seryl-tRNA synthetase (EC 6.1.1.11), 6), (Acetolactate synthase large subunit (EC 2.2.1.6), 6), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 6), (Class A beta-lactamase (EC 3.5.2.6) => SHV family, 6), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), (Quinolinate synthetase (EC 2.5.1.72), 5), (Enoyl-CoA hydratase (EC 4.2.1.17), 5), (Threonyl-tRNA synthetase (EC 6.1.1.3), 5), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 4), (Alcohol dehydrogenase (EC 1.1.1.1), 4), (Choline-sulfatase (EC 3.1.6.6), 4)

Carbon-Butylamine-sec (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 90), (Aldehyde dehydrogenase (EC 1.2.1.3), 74), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 34), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 33), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 30), (NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 29), (Urocanate hydratase (EC 4.2.1.49), 29), (Pyruvate carboxylase (EC 6.4.1.1), 27), (GMP synthase glutamine-hydrolyzing, amidotransferase subunit (EC 6.3.5.2) / GMP synthase glutamine-hydrolyzing, ATP pyrophosphatase subunit (EC 6.3.5.2), 26), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 25), (Enoyl-CoA hydratase (EC 4.2.1.17), 22), (Cytochrome d ubiquinol oxidase subunit I (EC 1.10.3.-), 22), (Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 22), (NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 22), (UDP-N-acetylmuramoylalanyl-D-glutamate–2,6-diaminopimelate ligase (EC 6.3.2.13), 21)

Carbon-a-Methyl-D-Glucoside (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 94), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 86), (1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 48), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 45), (Aldehyde dehydrogenase (EC 1.2.1.3), 42), (DNA polymerase III alpha subunit (EC 2.7.7.7), 39), (DNA gyrase subunit A (EC 5.99.1.3), 38), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 38), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 37), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 35), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 35), (Enoyl-CoA hydratase (EC 4.2.1.17), 34), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 33), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 32), (Dihydroorotase (EC 3.5.2.3), 29)

Nitrogen-D-Glucosamine (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 14), (Phosphoglycerate mutase (EC 5.4.2.11), 11), (DNA polymerase III alpha subunit (EC 2.7.7.7), 10), (Carboxyl-terminal protease (EC 3.4.21.102), 9), (Phosphogluconate dehydratase (EC 4.2.1.12), 8), (Phosphoheptose isomerase (EC 5.3.1.-), 8), (L-aspartate oxidase (EC 1.4.3.16), 8), (Respiratory nitrate reductase beta chain (EC 1.7.99.4), 8), (Molybdopterin molybdenumtransferase (EC 2.10.1.1), 7), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 7), (Glutamyl-tRNA synthetase (EC 6.1.1.17), 7), (DNA gyrase subunit B (EC 5.99.1.3), 7), (DNA polymerase III beta subunit (EC 2.7.7.7), 7), (UDP-3-O-3-hydroxymyristoyl N-acetylglucosamine deacetylase (EC 3.5.1.108), 6), (Phosphoheptose isomerase 1 (EC 5.3.1.-), 6)

Nitrogen-Nitrite (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 89), (ATP synthase beta chain (EC 3.6.3.14), 72), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 71), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 43), (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 43), (6-phosphofructokinase (EC 2.7.1.11), 42), (NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 40), (Alcohol dehydrogenase (EC 1.1.1.1), 39), (DNA gyrase subunit B (EC 5.99.1.3), 31), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 30), (Dihydrolipoamide acyltransferase component of branched-chain alpha-keto acid dehydrogenase complex (EC 2.3.1.168), 30), (Phosphoglucosamine mutase (EC 5.4.2.10), 29), (Acetyl-CoA synthetase (EC 6.2.1.1), 29), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 28), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 28)

Carbon-L-Glutamic-Acid (Thiamin-phosphate pyrophosphorylase (EC 2.5.1.3), 32), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 24), (UDP-N-acetylmuramate–alanine ligase (EC 6.3.2.8), 24), (Prolyl-tRNA synthetase (EC 6.1.1.15), bacterial type, 23), (tRNA-guanine transglycosylase (EC 2.4.2.29), 18), (Enoyl-CoA hydratase (EC 4.2.1.17), 17), (Argininosuccinate synthase (EC 6.3.4.5), 16), (Succinyl-CoA:3-ketoacid-coenzyme A transferase subunit A (EC 2.8.3.5), 14), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 13), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 12), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 12), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 11), (2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase, beta subunit (EC 1.2.7.-), 11), (2-dehydropantoate 2-reductase (EC 1.1.1.169), 11), (Malate synthase (EC 2.3.3.9), 9)

Carbon-Glycyl-L-Aspartic-Acid (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 124), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 108), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 92), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 89), (Alcohol dehydrogenase (EC 1.1.1.1), 71), (Aldehyde dehydrogenase (EC 1.2.1.3), 67), (Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 63), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 58), (Enoyl-CoA hydratase (EC 4.2.1.17), 57), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 54), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 53), (Adenylate cyclase (EC 4.6.1.1), 49), (Cysteine desulfurase (EC 2.8.1.7), 49), (Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 47), (UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 44)

Carbon-Caproic-Acid (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 70), (Urocanate hydratase (EC 4.2.1.49), 62), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 56), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 51), (NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 51), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 44), (Aldehyde dehydrogenase (EC 1.2.1.3), 38), (Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 36), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 33), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 33), (DNA-directed RNA polymerase beta' subunit (EC 2.7.7.6), 31), (Inosine-5'-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 31), (Aspartate aminotransferase (EC 2.6.1.1), 29), (Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 28), (Adenylate cyclase (EC 4.6.1.1), 27)

Carbon-a-Keto-Valeric-Acid (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 67), (S-adenosylmethionine synthetase (EC 2.5.1.6), 49), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 49), (Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 47), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 42), (Enoyl-CoA hydratase (EC 4.2.1.17), 41), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 40), (Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 39), (Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 36), (Aspartyl-tRNA(Asn) amidotransferase subunit A (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit A (EC 6.3.5.7), 36), (Enolase (EC 4.2.1.11), 35), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 34), (Enoyl-CoA hydratase isoleucine degradation (EC 4.2.1.17) / 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) / 3-hydroxybutyryl-CoA epimerase (EC 5.1.2.3), 32), (Dethiobiotin synthetase (EC 6.3.3.3), 27), (Fumarate hydratase class I, aerobic (EC 4.2.1.2), 25)

Nitrogen-Gly-Asn (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 12), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 12), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 9), (Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 9), (DNA polymerase III beta subunit (EC 2.7.7.7), 7), (DNA polymerase III alpha subunit (EC 2.7.7.7), 7), (Exodeoxyribonuclease VII large subunit (EC 3.1.11.6), 7), (DNA ligase (NAD(+)) (EC 6.5.1.2), 6), (Ribonucleoside-diphosphate reductase (EC 1.17.4.1), 6), (NAD(P)H dehydrogenase (quinone) 2 (EC 1.6.5.2), 5), (alpha-xylosidase (EC 3.2.1.177), 5), (Thiamine-monophosphate kinase (EC 2.7.4.16), 5), (Sporulation kinase E (EC 2.7.13.3), 5), (DNA polymerase III delta subunit (EC 2.7.7.7), 5), (Deoxyadenosine kinase (EC 2.7.1.76) @ Deoxycytidine kinase (EC 2.7.1.74), 4)

Carbon-b-Phenylethylamine (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 60), (Aldehyde dehydrogenase (EC 1.2.1.3), 35), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 31), (2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase II (EC 2.5.1.54), 29), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 27), (Enoyl-CoA hydratase (EC 4.2.1.17), 24), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 24), (UDP-N-acetylmuramoylalanine–D-glutamate ligase (EC 6.3.2.9), 24), (UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 24), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 23), (Phytoene synthase (EC 2.5.1.32), 22), (Dihydrolipoamide acetyltransferase component of pyruvate dehydrogenase complex (EC 2.3.1.12), 22), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 22), (Signal peptidase I (EC 3.4.21.89), 22), (Topoisomerase IV subunit A (EC 5.99.1.-), 22)

Carbon-D-L-Malic-Acid (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 10), (Phosphogluconate dehydratase (EC 4.2.1.12), 10), (Glutamyl-tRNA synthetase (EC 6.1.1.17), 9), (Medium-chain-fatty-acid–CoA ligase (EC 6.2.1.-), 8), (DNA gyrase subunit B (EC 5.99.1.3), 8), (D-Lactate dehydrogenase, cytochrome c-dependent (EC 1.1.2.4), 8), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 7), (Bis(5'-nucleosyl)-tetraphosphatase, symmetrical (EC 3.6.1.41), 7), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 7), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 7), (2-acylglycerophosphoethanolamine acyltransferase (EC 2.3.1.40) / Acyl-acyl-carrier-protein synthetase (EC 6.2.1.20), 6), (Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 6), (PTS system, N-acetylglucosamine-specific IIC component / PTS system, N-acetylglucosamine-specific IIB component (EC 2.7.1.69) / PTS system, N-acetylglucosamine-specific IIA component, 6), (Thiazole synthase (EC 2.8.1.10), 5), (N-acetylmuramoyl-L-alanine amidase (EC 3.5.1.28), 5)

Carbon-D-Fructose (Valyl-tRNA synthetase (EC 6.1.1.9), 28), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 11), (DNA polymerase III beta subunit (EC 2.7.7.7), 6), (Glycyl-tRNA synthetase alpha chain (EC 6.1.1.14), 6), (Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), (Myo-inositol 2-dehydrogenase (EC 1.1.1.18), 6), (Ribonucleoside-diphosphate reductase (EC 1.17.4.1), 6), (DNA gyrase subunit B (EC 5.99.1.3), 6), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 5), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 5), (DNA polymerase III alpha subunit (EC 2.7.7.7), 5), (Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 5), (Sporulation kinase E (EC 2.7.13.3), 5), (Phosphogluconate dehydratase (EC 4.2.1.12), 5), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 5)

Carbon-D-L-Citramalic-Acid (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 90), (Aldehyde dehydrogenase (EC 1.2.1.3), 52), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 43), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 42), (Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 37), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 35), (DNA-directed RNA polymerase alpha subunit (EC 2.7.7.6), 35), (Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 30), (Glyoxylate carboligase (EC 4.1.1.47), 28), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 24), (Histidinol dehydrogenase (EC 1.1.1.23), 24), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 23), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 21), (Membrane alanine aminopeptidase N (EC 3.4.11.2), 21), (Alanyl-tRNA synthetase (EC 6.1.1.7), 21)

Carbon-2-3-Butanone (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 78), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 55), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 49), (UDP-N-acetylglucosamine 1-carboxyvinyltransferase (EC 2.5.1.7), 43), (Leucyl-tRNA synthetase (EC 6.1.1.4), 42), (DNA gyrase subunit B (EC 5.99.1.3), 40), (Enoyl-CoA hydratase (EC 4.2.1.17), 38), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 36), (NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 36), (Histidinol dehydrogenase (EC 1.1.1.23), 35), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 34), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 34), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 33), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 32), (Aldehyde dehydrogenase (EC 1.2.1.3), 31)

Nitrogen-Ala-Asp (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 139), (Thioredoxin reductase (EC 1.8.1.9), 119), (Aldehyde dehydrogenase (EC 1.2.1.3), 118), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 111), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 101), (Cell division protein FtsH (EC 3.4.24.-), 101), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 95), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 86), (Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 85), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 83), (Cell division protein FtsZ (EC 3.4.24.-), 78), (Phosphoribosylformylglycinamidine cyclo-ligase (EC 6.3.3.1), 74), (Enoyl-CoA hydratase (EC 4.2.1.17), 73), (Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 70), (NADH-ubiquinone oxidoreductase chain L (EC 1.6.5.3), 65)

Carbon-Maltose (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 31), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 26), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 23), (Omega-amino acid–pyruvate aminotransferase (EC 2.6.1.18), 23), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 22), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 20), (Molybdopterin molybdenumtransferase (EC 2.10.1.1), 19), (Aspartyl-tRNA(Asn) amidotransferase subunit B (EC 6.3.5.6) @ Glutamyl-tRNA(Gln) amidotransferase subunit B (EC 6.3.5.7), 16), (Acetolactate synthase large subunit (EC 2.2.1.6), 15), (Fumarate hydratase class II (EC 4.2.1.2), 14), (NADH-ubiquinone oxidoreductase chain D (EC 1.6.5.3), 14), (Glycogen phosphorylase (EC 2.4.1.1), 13), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 12), (3-dehydroquinate synthase (EC 4.2.3.4), 12), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 11)

Carbon-Capric-Acid (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 70), (Aldehyde dehydrogenase (EC 1.2.1.3), 61), (ATP synthase alpha chain (EC 3.6.3.14), 46), (2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase II (EC 2.5.1.54), 46), (Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 38), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 37), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 36), (Acetyl-CoA synthetase (EC 6.2.1.1), 35), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 28), (Inosine-5'-monophosphate dehydrogenase (EC 1.1.1.205) / CBS domain, 27), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 24), (Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 23), (Transketolase (EC 2.2.1.1), 23), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 21), (Enoyl-CoA hydratase (EC 4.2.1.17), 21)

Carbon-Oxalomalic-Acid (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 66), (Transketolase (EC 2.2.1.1), 41), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 39), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 32), (NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 32), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 30), (Enoyl-CoA hydratase (EC 4.2.1.17), 29), (Polyphosphate kinase (EC 2.7.4.1), 28), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 25), (Chemotaxis protein methyltransferase CheR (EC 2.1.1.80), 25), (DNA polymerase IV (EC 2.7.7.7), 24), (Acetolactate synthase large subunit (EC 2.2.1.6), 24), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 24), (Phospho-N-acetylmuramoyl-pentapeptide-transferase (EC 2.7.8.13), 23), (Xaa-Pro dipeptidase (EC 3.4.13.9), 22)

Carbon-i-Erythritol (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 68), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 44), (Dihydrolipoamide dehydrogenase of pyruvate dehydrogenase complex (EC 1.8.1.4), 39), (Ketol-acid reductoisomerase (EC 1.1.1.86), 37), (NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 36), (Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 33), (DNA gyrase subunit A (EC 5.99.1.3), 30), (Arginyl-tRNA synthetase (EC 6.1.1.19), 29), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 27), (Thiamin-phosphate pyrophosphorylase (EC 2.5.1.3), 26), (Aldehyde dehydrogenase (EC 1.2.1.3), 24), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 24), (Cell division protein FtsH (EC 3.4.24.-), 23), (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 23), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 23)

Carbon-D-Melezitose (Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 66), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 60), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 49), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 46), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 44), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 42), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 39), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 38), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 35), (Pyruvate kinase (EC 2.7.1.40), 33), (Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 28), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 28), (Enoyl-CoA hydratase (EC 4.2.1.17), 27), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 26), (DNA polymerase III alpha subunit (EC 2.7.7.7), 26)

Carbon-D-Alanine (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 8), (Ribonucleotide reductase of class III (anaerobic), large subunit (EC 1.17.4.2), 8), (beta-glucosidase (EC 3.2.1.21), 7), (Lipoyl synthase (EC 2.8.1.8), 6), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), (Valyl-tRNA synthetase (EC 6.1.1.9), 5), (Acetolactate synthase small subunit (EC 2.2.1.6), 5), (Deoxyribodipyrimidine photolyase (EC 4.1.99.3), 5), ("Pyridoxal 5-phosphate synthase (glutamine hydrolyzing), synthase subunit (EC 4.3.3.6)", 5), (5-methyltetrahydropteroyltriglutamate–homocysteine methyltransferase (EC 2.1.1.14), 4), (UDP-glucose 6-dehydrogenase (EC 1.1.1.22), 4), (Catalase-peroxidase KatG (EC 1.11.1.21), 4), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 4), (Phosphogluconate dehydratase (EC 4.2.1.12), 4), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 4)

Nitrogen-Nitrate (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 89), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 75), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 65), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 64), (Aldehyde dehydrogenase (EC 1.2.1.3), 63), (DNA polymerase III alpha subunit (EC 2.7.7.7), 59), (NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 57), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 51), (Enoyl-CoA hydratase (EC 4.2.1.17), 48), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 47), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 43), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 39), (NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 38), (Alcohol dehydrogenase (EC 1.1.1.1), 36), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 35)

Nitrogen-Ala-Glu (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 141), (Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 118), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 114), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 112), (Transketolase (EC 2.2.1.1), 108), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 106), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 106), (Enoyl-CoA hydratase (EC 4.2.1.17), 105), (Phosphoglycerate kinase (EC 2.7.2.3), 104), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 101), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 92), (Aldehyde dehydrogenase (EC 1.2.1.3), 88), (Methylcrotonyl-CoA carboxylase biotin-containing subunit (EC 6.4.1.4), 82), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 76), (Cell division protein FtsZ (EC 3.4.24.-), 69)

Nitrogen-N-Acetyl-D-Glucosamine (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 9), (tRNA-dihydrouridine(20/20a) synthase (EC 1.3.1.91), 6), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 6), (Phosphogluconate dehydratase (EC 4.2.1.12), 6), (DNA polymerase III alpha subunit (EC 2.7.7.7), 5), (Catalase-peroxidase KatG (EC 1.11.1.21), 5), (Sporulation kinase E (EC 2.7.13.3), 5), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 5), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 5), (DNA polymerase III beta subunit (EC 2.7.7.7), 4), (Leucyl-tRNA synthetase (EC 6.1.1.4), 4), (Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 4), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 4), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 3), (Nicotinate-nucleotide adenylyltransferase (EC 2.7.7.18), 3)

Carbon-D-L-Octopamine (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 53), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 45), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 41), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 41), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 41), (Catalase-peroxidase KatG (EC 1.11.1.21), 37), (Tryptophan synthase beta chain (EC 4.2.1.20), 35), (Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 34), (Adenylate cyclase (EC 4.6.1.1), 32), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 32), (Alcohol dehydrogenase (EC 1.1.1.1), 32), (Enoyl-CoA hydratase (EC 4.2.1.17), 30), (Aldehyde dehydrogenase (EC 1.2.1.3), 28), ("23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185)", 28), (Alanyl-tRNA synthetase (EC 6.1.1.7), 27)

Nitrogen-Ala-Gly (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 119), (Glucose-6-phosphate isomerase (EC 5.3.1.9), 108), (Enoyl-CoA hydratase (EC 4.2.1.17), 107), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 105), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 100), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 79), (Aldehyde dehydrogenase (EC 1.2.1.3), 66), (3-methyl-2-oxobutanoate hydroxymethyltransferase (EC 2.1.2.11), 66), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 63), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 55), (IMP cyclohydrolase (EC 3.5.4.10) / Phosphoribosylaminoimidazolecarboxamide formyltransferase (EC 2.1.2.3), 50), (Glutamate 5-kinase (EC 2.7.2.11) / RNA-binding C-terminal domain PUA, 49), (Alcohol dehydrogenase (EC 1.1.1.1), 49), (Glucose-6-phosphate 1-dehydrogenase (EC 1.1.1.49), 46), (Adenylate cyclase (EC 4.6.1.1), 45)

Nitrogen-Ala-Gln (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 184), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 156), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 125), (Enoyl-CoA hydratase (EC 4.2.1.17), 119), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 89), (3-isopropylmalate dehydratase large subunit (EC 4.2.1.33), 88), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 85), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 84), (Aldehyde dehydrogenase (EC 1.2.1.3), 84), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 83), (Cell division protein FtsZ (EC 3.4.24.-), 76), (Alcohol dehydrogenase (EC 1.1.1.1), 76), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 71), (Acetyl-CoA acetyltransferase (EC 2.3.1.9), 71), (Anthranilate phosphoribosyltransferase (EC 2.4.2.18), 71)

Carbon-D-L-a-Glycerol-Phosphate (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 88), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 67), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 52), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 52), (Aldehyde dehydrogenase (EC 1.2.1.3), 47), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 45), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 41), (Chemotaxis protein methyltransferase CheR (EC 2.1.1.80), 41), (NADH-ubiquinone oxidoreductase chain G (EC 1.6.5.3), 38), (Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 38), (1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7), 36), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 36), (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 36), (DNA polymerase I (EC 2.7.7.7), 35), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 35)

Nitrogen-L-Proline (Acyl-CoA:1-acyl-sn-glycerol-3-phosphate acyltransferase (EC 2.3.1.51), 16), (Thioredoxin reductase (EC 1.8.1.9), 15), (dTDP-4-dehydrorhamnose reductase (EC 1.1.1.133), 12), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 11), ("Guanosine-3,5-bis(diphosphate) 3-pyrophosphohydrolase (EC 3.1.7.2) / GTP pyrophosphokinase (EC 2.7.6.5), (p)ppGpp synthetase II", 10), (Citronellyl-CoA dehydrogenase @ Acyl-CoA dehydrogenase (EC 1.3.8.-), Mycobacterial subgroup FadE13, 10), (Cholesterol oxidase (EC 1.1.3.6) @ Steroid Delta(5)->Delta(4)-isomerase (EC 5.3.3.1), 10), (UDP-N-acetylmuramate:L-alanyl-gamma-D-glutamyl-meso-diaminopimelate ligase (EC 6.3.2.-), 8), (Betaine aldehyde dehydrogenase (EC 1.2.1.8), 8), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 8), (Phosphogluconate dehydratase (EC 4.2.1.12), 7), (Methylisocitrate lyase (EC 4.1.3.30), 7), (NADH dehydrogenase (EC 1.6.99.3), 7), (2-acylglycerophosphoethanolamine acyltransferase (EC 2.3.1.40) / Acyl-acyl-carrier-protein synthetase (EC 6.2.1.20), 6), (O-acetylhomoserine sulfhydrylase (EC 2.5.1.49) / O-succinylhomoserine sulfhydrylase (EC 2.5.1.48), 6)

Carbon-L-Alanyl-Glycine (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 117), (Enoyl-CoA hydratase (EC 4.2.1.17), 106), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 105), (Exopolyphosphatase (EC 3.6.1.11), 103), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 89), (Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 82), (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 82), (Cell division protein FtsH (EC 3.4.24.-), 75), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 75), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 74), (Maltose/maltodextrin transport ATP-binding protein MalK (EC 3.6.3.19), 74), (Acetylglutamate kinase (EC 2.7.2.8), 65), (Alanyl-tRNA synthetase (EC 6.1.1.7), 61), (Aldehyde dehydrogenase (EC 1.2.1.3), 58), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 57)

Nitrogen-L-Aspartic-Acid (Acetyl-CoA synthetase (EC 6.2.1.1), 10), (Class A beta-lactamase (EC 3.5.2.6) => SHV family, 9), (Argininosuccinate lyase (EC 4.3.2.1), 8), (Ribosomal large subunit pseudouridine synthase F (EC 5.4.99.21), 7), (Threonine dehydratase biosynthetic (EC 4.3.1.19), 7), (Aspartate-semialdehyde dehydrogenase DoeC in ectoine degradation (EC 1.2.1.-), 7), (Choline-sulfatase (EC 3.1.6.6), 7), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 6), (Malonyl-acyl-carrier protein O-methyltransferase (EC 2.1.1.197), 6), (Threonyl-tRNA synthetase (EC 6.1.1.3), 6), (NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 5), (Imidazole glycerol phosphate synthase cyclase subunit (EC 4.1.3.-), 5), (Serine acetyltransferase (EC 2.3.1.30), 5), (Lipid-A-disaccharide synthase (EC 2.4.1.182), 4), (L-asparaginase I, cytoplasmic (EC 3.5.1.1), 4)

Carbon-Glycyl-L-Proline (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 123), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 101), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 92), (Aldehyde dehydrogenase (EC 1.2.1.3), 90), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 82), (Cell division protein FtsZ (EC 3.4.24.-), 82), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 78), (Urocanate hydratase (EC 4.2.1.49), 68), (Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 66), (Glycine dehydrogenase decarboxylating (glycine cleavage system P protein) (EC 1.4.4.2), 64), (Thioredoxin reductase (EC 1.8.1.9), 64), (Threonine dehydratase biosynthetic (EC 4.3.1.19), 63), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 59), (Enoyl-CoA hydratase (EC 4.2.1.17), 57), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 53)

Nitrogen-Gly-Glu (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 139), (Phosphoglycerate kinase (EC 2.7.2.3), 127), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 105), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 92), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 89), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 85), (Enoyl-CoA hydratase (EC 4.2.1.17), 77), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 77), (Aldehyde dehydrogenase (EC 1.2.1.3), 71), (Cysteine desulfurase (EC 2.8.1.7), 69), (Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1), 68), (Shikimate kinase I (EC 2.7.1.71), 68), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 67), (Alcohol dehydrogenase (EC 1.1.1.1), 67), (Alanyl-tRNA synthetase (EC 6.1.1.7), 61)

Carbon-D-Trehalose (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 58), (Thioredoxin reductase (EC 1.8.1.9), 56), (Aldehyde dehydrogenase (EC 1.2.1.3), 44), (Nitrite reductase NAD(P)H large subunit (EC 1.7.1.4), 41), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 39), (Glutamate synthase NADPH small chain (EC 1.4.1.13), 38), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 38), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 36), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 33), (Quinone oxidoreductase (EC 1.6.5.5), 31), (Enoyl-CoA hydratase (EC 4.2.1.17), 30), (Phenylalanyl-tRNA synthetase beta chain (EC 6.1.1.20), 30), (Uridine monophosphate kinase (EC 2.7.4.22), 30), (Aerobic glycerol-3-phosphate dehydrogenase (EC 1.1.5.3), 29), (Cytosol aminopeptidase PepA (EC 3.4.11.1), 28)

Carbon-2-Deoxy-D-Ribose (Valyl-tRNA synthetase (EC 6.1.1.9), 30), (Membrane alanine aminopeptidase N (EC 3.4.11.2), 22), (Chorismate synthase (EC 4.2.3.5), 12), (Isocitrate lyase (EC 4.1.3.1), 11), (NAD-specific glutamate dehydrogenase (EC 1.4.1.2), large form, 10), (Aconitate hydratase (EC 4.2.1.3), 10), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 9), (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 9), (Cyclic dehypoxanthine futalosine synthase (EC 1.21.98.1), 7), (NAD kinase (EC 2.7.1.23), 7), (Beta-phosphoglucomutase (EC 5.4.2.6), 7), (2-isopropylmalate synthase (EC 2.3.3.13), 6), (Cytosol aminopeptidase PepA (EC 3.4.11.1), 6), (Histidinol dehydrogenase (EC 1.1.1.23), 6), (Aspartate aminotransferase (EC 2.6.1.1), 6)

Nitrogen-Ala-Thr (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 137), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 95), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 92), (Cell division protein FtsH (EC 3.4.24.-), 86), (Phosphoglycerate kinase (EC 2.7.2.3), 80), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 79), (Aldehyde dehydrogenase (EC 1.2.1.3), 63), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 62), (Thioredoxin reductase (EC 1.8.1.9), 60), (Uridine monophosphate kinase (EC 2.7.4.22), 59), (Enoyl-CoA hydratase (EC 4.2.1.17), 57), (Quinone oxidoreductase (EC 1.6.5.5), 52), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 50), (Alcohol dehydrogenase (EC 1.1.1.1), 49), (Glutamate-1-semialdehyde 2,1-aminomutase (EC 5.4.3.8), 48)

Nitrogen-Gly-Gln (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 146), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 104), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 96), (Alanyl-tRNA synthetase (EC 6.1.1.7), 88), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 81), (Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14), 75), (Enoyl-CoA hydratase (EC 4.2.1.17), 72), (Aldehyde dehydrogenase (EC 1.2.1.3), 71), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 66), (Adenylate cyclase (EC 4.6.1.1), 64), (DNA polymerase III subunits gamma and tau (EC 2.7.7.7), 63), (Ribonuclease E (EC 3.1.26.12), 62), (Cell division protein FtsZ (EC 3.4.24.-), 60), (DNA polymerase I (EC 2.7.7.7), 56), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 55)

Carbon-2-Hydroxy-Benzoic-Acid (Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 64), (Phosphoribosylamine–glycine ligase (EC 6.3.4.13), 58), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 49), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 45), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 33), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 32), (Respiratory nitrate reductase beta chain (EC 1.7.99.4), 31), (Aldehyde dehydrogenase (EC 1.2.1.3), 27), (ATP synthase beta chain (EC 3.6.3.14), 27), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 26), (Isoleucyl-tRNA synthetase (EC 6.1.1.5), 24), (Valyl-tRNA synthetase (EC 6.1.1.9), 23), (Succinyl-CoA ligase ADP-forming alpha chain (EC 6.2.1.5), 23), (Acetyl-CoA synthetase (EC 6.2.1.1), 23), (Enolase (EC 4.2.1.11), 23)

Carbon-L-Proline (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 8), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 8), (Phosphoenolpyruvate carboxylase (EC 4.1.1.31), 7), (Phosphogluconate dehydratase (EC 4.2.1.12), 7), (Ribosomal large subunit pseudouridine synthase F (EC 5.4.99.21), 6), (NADH-ubiquinone oxidoreductase chain M (EC 1.6.5.3), 5), (DNA gyrase subunit B (EC 5.99.1.3), 5), (Aspartate-semialdehyde dehydrogenase DoeC in ectoine degradation (EC 1.2.1.-), 5), (Class A beta-lactamase (EC 3.5.2.6) => SHV family, 5), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 5), (Dihydroorotase (EC 3.5.2.3), 4), (Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 4), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 4), (Glutaminyl-tRNA synthetase (EC 6.1.1.18), 4), (Ribonuclease E (EC 3.1.26.12), 4)

Carbon-b-D-Allose (Polyribonucleotide nucleotidyltransferase (EC 2.7.7.8), 91), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 82), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 74), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 48), (Cytochrome c oxidase polypeptide I (EC 1.9.3.1), 47), (DNA polymerase I (EC 2.7.7.7), 45), (Glycerol kinase (EC 2.7.1.30), 43), (NADH-ubiquinone oxidoreductase chain N (EC 1.6.5.3), 40), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 37), (Aldehyde dehydrogenase (EC 1.2.1.3), 37), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 35), (Hydroxymethylpyrimidine phosphate synthase ThiC (EC 4.1.99.17), 34), (Histidinol dehydrogenase (EC 1.1.1.23), 34), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 32), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 32)

Phosphate-D-L-a-Glycerol-Phosphate (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 74), (Enoyl-CoA hydratase (EC 4.2.1.17), 71), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 68), (Aldehyde dehydrogenase (EC 1.2.1.3), 61), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 59), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 50), (Fumarate hydratase class II (EC 4.2.1.2), 49), (Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 49), (Succinate-semialdehyde dehydrogenase NAD(P)+ (EC 1.2.1.16), 43), (Error-prone repair homolog of DNA polymerase III alpha subunit (EC 2.7.7.7), 40), (Maltose/maltodextrin transport ATP-binding protein MalK (EC 3.6.3.19), 40), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 38), (ATP-dependent Clp protease proteolytic subunit (EC 3.4.21.92), 37), (Adenylate cyclase (EC 4.6.1.1), 37), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 37)

Carbon-Glycerol (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 8), (Transketolase (EC 2.2.1.1), 7), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 7), ("Adenosine (5)-pentaphospho-(5)-adenosine pyrophosphohydrolase (EC 3.6.1.-)", 7), (Branched-chain amino acid aminotransferase (EC 2.6.1.42), 6), (Catalase-peroxidase KatG (EC 1.11.1.21), 6), (DNA polymerase III beta subunit (EC 2.7.7.7), 6), (NAD-specific glutamate dehydrogenase (EC 1.4.1.2); NADP-specific glutamate dehydrogenase (EC 1.4.1.4), 5), (Methionine aminopeptidase (EC 3.4.11.18), 5), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 4), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 4), (Aerobic glycerol-3-phosphate dehydrogenase (EC 1.1.5.3), 4), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 4), (beta-glucosidase (EC 3.2.1.21), 4), (UTP–glucose-1-phosphate uridylyltransferase (EC 2.7.7.9), 4)

Carbon-D-Glucosamine (ATP synthase alpha chain (EC 3.6.3.14), 59), (ATP-dependent DNA helicase UvrD/PcrA (EC 3.6.4.12), 27), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 24), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 21), (Urease alpha subunit (EC 3.5.1.5), 21), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 19), (Aldehyde dehydrogenase (EC 1.2.1.3), 17), (Cytochrome d ubiquinol oxidase subunit I (EC 1.10.3.-), 15), (Octanoate-acyl-carrier-protein-protein-N-octanoyltransferase (EC 2.3.1.181), 13), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 12), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 12), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 12), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 12), (A/G-specific adenine glycosylase (EC 3.2.2.-), 11), (Phosphoglycerate mutase (EC 5.4.2.11), 11)

Carbon-a-Methyl-D-Galactoside (Enoyl-CoA hydratase (EC 4.2.1.17), 65), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 64), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 51), (DNA polymerase I (EC 2.7.7.7), 48), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 42), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 39), (Cell division protein FtsZ (EC 3.4.24.-), 39), (Phosphoglucosamine mutase (EC 5.4.2.10), 37), (Porphobilinogen synthase (EC 4.2.1.24), 37), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 35), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 34), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 33), (Pyruvate kinase (EC 2.7.1.40), 28), (DNA gyrase subunit A (EC 5.99.1.3), 28), (Acetolactate synthase large subunit (EC 2.2.1.6), 27)

Carbon-N-Acetyl-D-Glucosamine (DNA polymerase III beta subunit (EC 2.7.7.7), 12), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 7), (Methionyl-tRNA synthetase (EC 6.1.1.10), 6), (DNA polymerase III alpha subunit (EC 2.7.7.7), 6), (Sporulation kinase E (EC 2.7.13.3), 6), (Myo-inositol 2-dehydrogenase (EC 1.1.1.18), 5), (Ribonucleoside-diphosphate reductase (EC 1.17.4.1), 5), (Thiamine-monophosphate kinase (EC 2.7.4.16), 5), (L-aspartate oxidase (EC 1.4.3.16), 5), (Glycolate dehydrogenase (EC 1.1.99.14), iron-sulfur subunit GlcF, 4), (Phospholipase A1 (EC 3.1.1.32) (EC 3.1.1.4) @ Outer membrane phospholipase A, 4), (Deoxyribose-phosphate aldolase (EC 4.1.2.4), 4), (2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 4), (Aspartate carbamoyltransferase (EC 2.1.3.2), 4), (N-acyl-D-amino-acid deacylase (EC 3.5.1.81), 4)

Carbon-L-Lactic-Acid (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 10), (Autoinducer 1 sensor kinase/phosphatase LuxN (EC 2.7.3.-) (EC 3.1.3.-), 10), (Glutamyl-tRNA synthetase (EC 6.1.1.17), 8), ("Bis(5-nucleosyl)-tetraphosphatase, symmetrical (EC 3.6.1.41)", 8), (Medium-chain-fatty-acid–CoA ligase (EC 6.2.1.-), 8), (D-Lactate dehydrogenase, cytochrome c-dependent (EC 1.1.2.4), 8), (DNA gyrase subunit B (EC 5.99.1.3), 8), (Phosphogluconate dehydratase (EC 4.2.1.12), 8), (CTP synthase (EC 6.3.4.2), 7), (4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14), 7), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 6), (Thiazole synthase (EC 2.8.1.10), 6), (PTS system, N-acetylglucosamine-specific IIC component / PTS system, N-acetylglucosamine-specific IIB component (EC 2.7.1.69) / PTS system, N-acetylglucosamine-specific IIA component, 6), (Sulfur carrier protein ThiS adenylyltransferase (EC 2.7.7.73), 5), (Decaprenyl diphosphate synthase (EC 2.5.1.91), 5)

Carbon-m-Tartaric-Acid (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 98), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 70), (ATP synthase beta chain (EC 3.6.3.14), 65), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 59), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 53), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 49), (Aldehyde dehydrogenase (EC 1.2.1.3), 49), (3-oxoacyl-acyl-carrier-protein synthase, KASII (EC 2.3.1.179), 43), (NADP-dependent malic enzyme (EC 1.1.1.40), 43), (NAD(P) transhydrogenase alpha subunit (EC 1.6.1.2), 41), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 40), (Adenylate cyclase (EC 4.6.1.1), 39), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 39), (Urocanate hydratase (EC 4.2.1.49), 39), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 38)

Carbon-D-Mannose (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 49), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 37), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 36), (DNA polymerase I (EC 2.7.7.7), 33), (Enoyl-CoA hydratase (EC 4.2.1.17), 31), (Aldehyde dehydrogenase (EC 1.2.1.3), 31), (Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 30), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 28), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 28), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 27), (DNA gyrase subunit A (EC 5.99.1.3), 24), (NADH dehydrogenase (EC 1.6.99.3), 22), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 21), ("DNA-directed RNA polymerase beta subunit (EC 2.7.7.6)", 21), (Pyruvate kinase (EC 2.7.1.40), 21)

Carbon-Glycyl-L-Glutamic-Acid (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 121), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 101), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 89), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 89), (Cell division protein FtsH (EC 3.4.24.-), 88), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 86), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 84), (Histidinol dehydrogenase (EC 1.1.1.23), 79), (Enoyl-CoA hydratase (EC 4.2.1.17), 73), (Aldehyde dehydrogenase (EC 1.2.1.3), 70), (DNA ligase (NAD(+)) (EC 6.5.1.2), 70), (Ribonuclease E (EC 3.1.26.12), 66), (Cysteine desulfurase (EC 2.8.1.7), 57), (Acetyl-CoA synthetase (EC 6.2.1.1), 53), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 52)

Carbon-4-Hydroxy-L-Proline-trans (Methionine aminopeptidase (EC 3.4.11.18), 65), (Carbamoyl-phosphate synthase large chain (EC 6.3.5.5), 55), (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 51), (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 47), (DNA polymerase III alpha subunit (EC 2.7.7.7), 45), (Aldehyde dehydrogenase (EC 1.2.1.3), 43), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 34), (Threonine dehydratase biosynthetic (EC 4.3.1.19), 30), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 28), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 27), (2-isopropylmalate synthase (EC 2.3.3.13), 27), (3-isopropylmalate dehydrogenase (EC 1.1.1.85), 27), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 26), (CTP synthase (EC 6.3.4.2), 25), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 24)

Carbon-Fumaric-Acid (Methionyl-tRNA formyltransferase (EC 2.1.2.9), 68), (Glucosamine–fructose-6-phosphate aminotransferase isomerizing (EC 2.6.1.16), 31), (Pyruvate dehydrogenase E1 component (EC 1.2.4.1), 25), (Catalase-peroxidase KatG (EC 1.11.1.21), 20), (Triosephosphate isomerase (EC 5.3.1.1), 18), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 18), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 16), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 15), (Diaminopimelate epimerase (EC 5.1.1.7), 14), (Glutathione reductase (EC 1.8.1.7), 14), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 13), (2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 12), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 12), (DNA polymerase III beta subunit (EC 2.7.7.7), 12), (Uridine monophosphate kinase (EC 2.7.4.22), 11)

Carbon-p-Hydroxy-Phenylacetic-Acid (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 56), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 51), (L-serine dehydratase, beta subunit (EC 4.3.1.17) / L-serine dehydratase, alpha subunit (EC 4.3.1.17), 50), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 49), (Phosphoenolpyruvate synthase (EC 2.7.9.2), 43), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 43), (Peptidyl-tRNA hydrolase (EC 3.1.1.29), 39), (Aldehyde dehydrogenase (EC 1.2.1.3), 32), (FMN-dependent NADH-azoreductase (EC 1.7.1.6), 32), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 30), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 29), (Enoyl-CoA hydratase (EC 4.2.1.17), 27), (Aerobic cobaltochelatase CobT subunit (EC 6.6.1.2), 27), (Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62), 25), ("23S rRNA (guanosine(2251)-2-O)-methyltransferase (EC 2.1.1.185)", 25)

Carbon-L-Malic-Acid (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 8), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 8), (Heme O synthase, protoheme IX farnesyltransferase (EC 2.5.1.-) COX10-CtaB, 8), (Phosphoglucomutase (EC 5.4.2.2), 6), (Class A beta-lactamase (EC 3.5.2.6) => SHV family, 5), (CoA-disulfide reductase (EC 1.8.1.14), 5), (Transcriptional regulator of pyridoxine metabolism / Pyridoxamine phosphate aminotransferase (EC 2.6.1.54), 5), (Inosose isomerase (EC 5.3.99.11), 5), (Choline-sulfatase (EC 3.1.6.6), 5), (Ribosomal large subunit pseudouridine synthase F (EC 5.4.99.21), 4), (Acetate kinase (EC 2.7.2.1), 4), (Pyruvate,phosphate dikinase (EC 2.7.9.1), 4), (Aspartate-semialdehyde dehydrogenase DoeC in ectoine degradation (EC 1.2.1.-), 4), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 3), (Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4), 3)

Carbon-a-Hydroxy-Butyric-Acid (Dihydroxy-acid dehydratase (EC 4.2.1.9), 80), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 62), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 58), (Aldehyde dehydrogenase (EC 1.2.1.3), 51), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 50), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 47), (Enolase (EC 4.2.1.11), 42), (NAD(P) transhydrogenase subunit beta (EC 1.6.1.2), 42), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 40), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 39), (Phosphogluconate dehydratase (EC 4.2.1.12), 37), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 36), (Adenylate cyclase (EC 4.6.1.1), 31), (5-Enolpyruvylshikimate-3-phosphate synthase (EC 2.5.1.19), 31), (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 28)

Carbon-D-Xylose (Signal transduction histidine kinase CheA (EC 2.7.3.-), 23), (Acetyl-CoA synthetase (EC 6.2.1.1), 21), (Cytosol aminopeptidase PepA (EC 3.4.11.1), 18), (Membrane alanine aminopeptidase N (EC 3.4.11.2), 15), (Anthranilate synthase, aminase component (EC 4.1.3.27), 15), (Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61), 14), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 14), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 13), (3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9), 13), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 12), (Glycyl-tRNA synthetase beta chain (EC 6.1.1.14), 12), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 12), (Undecaprenyl-phosphate galactosephosphotransferase (EC 2.7.8.6), 12), (Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 11), (ATP-dependent protease La (EC 3.4.21.53) Type I, 11)

Carbon-a-Keto-Butyric-Acid (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 68), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 62), (Acetyl-CoA synthetase (EC 6.2.1.1), 58), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 56), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 51), (Aldehyde dehydrogenase (EC 1.2.1.3), 46), (Threonyl-tRNA synthetase (EC 6.1.1.3), 40), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 36), (Phosphoribosylformylglycinamidine synthase, synthetase subunit (EC 6.3.5.3) / Phosphoribosylformylglycinamidine synthase, glutamine amidotransferase subunit (EC 6.3.5.3), 36), (Enoyl-CoA hydratase (EC 4.2.1.17), 30), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 30), (Cell division protein FtsH (EC 3.4.24.-), 29), (ATP-dependent DNA helicase RecG (EC 3.6.4.12), 28), (Phosphoenolpyruvate synthase (EC 2.7.9.2), 26), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 26)

Nitrogen-D-Alanine (5-methyltetrahydrofolate–homocysteine methyltransferase (EC 2.1.1.13), 6), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 6), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 5), (2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2), 5), (UDP-N-acetylmuramoylalanyl-D-glutamate–2,6-diaminopimelate ligase (EC 6.3.2.13), 5), (Pyruvate carboxylase (EC 6.4.1.1), 4), (Biosynthetic arginine decarboxylase (EC 4.1.1.19), 3), (Catalase-peroxidase KatG (EC 1.11.1.21), 3), (2-dehydro-3-deoxygluconokinase (EC 2.7.1.45), 3), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 3), (Polyketide synthase modules and related proteins @ Long-chain fatty-acid-CoA ligase (EC 6.2.1.3), Mycobacterial subgroup FadD9, 3), (Ribosomal large subunit pseudouridine synthase F (EC 5.4.99.21), 2), (Cell division protein FtsZ (EC 3.4.24.-), 2), (UDP-3-O-3-hydroxymyristoyl glucosamine N-acyltransferase (EC 2.3.1.191), 2), (Diaminopimelate decarboxylase (EC 4.1.1.20), 2)

Carbon-D-Lactitol (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 111), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 84), (Aldehyde dehydrogenase (EC 1.2.1.3), 52), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 42), (Gamma-glutamyl phosphate reductase (EC 1.2.1.41), 39), (Enoyl-CoA hydratase (EC 4.2.1.17), 39), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 38), (Cell division protein FtsI Peptidoglycan synthetase (EC 2.4.1.129), 38), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 34), (Alanyl-tRNA synthetase (EC 6.1.1.7), 31), (Acetyl-CoA synthetase (EC 6.2.1.1), 30), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 29), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 28), (Potassium-transporting ATPase B chain (EC 3.6.3.12) (TC 3.A.3.7.1), 28), (Alcohol dehydrogenase (EC 1.1.1.1), 28)

Carbon-D-Ribose (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 32), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 30), (Membrane alanine aminopeptidase N (EC 3.4.11.2), 24), (Maltose/maltodextrin transport ATP-binding protein MalK (EC 3.6.3.19), 24), (Adenylosuccinate lyase (EC 4.3.2.2) @ SAICAR lyase (EC 4.3.2.2), 22), (NADH-ubiquinone oxidoreductase chain H (EC 1.6.5.3), 18), (Aldehyde dehydrogenase (EC 1.2.1.3), 18), (Enoyl-CoA hydratase (EC 4.2.1.17), 18), (Phosphoserine phosphatase (EC 3.1.3.3), 18), (Fructose-bisphosphate aldolase class II (EC 4.1.2.13), 17), (Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4), 17), (DNA polymerase I (EC 2.7.7.7), 17), (Sulfate and thiosulfate import ATP-binding protein CysA (EC 3.6.3.25), 16), (Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2), 16), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 15)

Nitrogen-Cytosine (Phosphoenolpyruvate carboxykinase GTP (EC 4.1.1.32), 47), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 45), (Cell division protein FtsH (EC 3.4.24.-), 39), (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 37), (Glutamate synthase NADPH large chain (EC 1.4.1.13), 33), (DNA-directed RNA polymerase beta subunit (EC 2.7.7.6), 32), (Thioredoxin reductase (EC 1.8.1.9), 29), (Adenylate cyclase (EC 4.6.1.1), 27), (D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95), 26), (NAD-specific glutamate dehydrogenase (EC 1.4.1.2), large form, 26), (Signal transduction histidine kinase CheA (EC 2.7.3.-), 24), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 22), (Aldehyde dehydrogenase (EC 1.2.1.3), 22), (Gamma-glutamyltranspeptidase (EC 2.3.2.2) @ Glutathione hydrolase (EC 3.4.19.13), 21), (S-adenosylmethionine synthetase (EC 2.5.1.6), 21)

Carbon-Succinic-Acid (Lead, cadmium, zinc and mercury transporting ATPase (EC 3.6.3.3) (EC 3.6.3.5); Copper-translocating P-type ATPase (EC 3.6.3.4), 15), (Tryptophanyl-tRNA synthetase (EC 6.1.1.2), 11), (Long-chain-fatty-acid–CoA ligase (EC 6.2.1.3), 10), (Multimodular transpeptidase-transglycosylase (EC 2.4.1.129) (EC 3.4.-.-), 10), (D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4), 9), (Nucleoside triphosphate pyrophosphohydrolase MazG (EC 3.6.1.8), 9), (Single-stranded-DNA-specific exonuclease RecJ (EC 3.1.-.-), 8), (Dihydroxy-acid dehydratase (EC 4.2.1.9), 8), (NADP-specific glutamate dehydrogenase (EC 1.4.1.4), 7), (ATP phosphoribosyltransferase (EC 2.4.2.17) => HisGs, 7), (DNA polymerase III alpha subunit (EC 2.7.7.7), 7), (3-oxoacyl-acyl-carrier protein reductase (EC 1.1.1.100), 6), (L-2,4-diaminobutyric acid transaminase DoeD (EC 2.6.1.-), 6), (Superoxide dismutase Fe (EC 1.15.1.1), 6), (Glutamate-ammonia-ligase adenylyltransferase (EC 2.7.7.42), 6)

Table A.17: Top 5 enzymes (with their EC numbers) associated with each of the top 300 k-mers obtained by training RF on each of the PGR media datasets. The second element of each tuple is the number of times that EC number is associated with any of the 300 k-mers across all isolates.
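The counts reported in Table A.17 can be read as a simple tally. The following is a minimal sketch of how such a tally might be computed, not the code used to produce the table. It assumes a hypothetical lookup `kmer_hits` that maps each isolate to the enzyme annotations of the genes in which each k-mer occurs. The function name `count_enzyme_hits` and the toy k-mers and isolates are illustrative only.

```python
from collections import Counter


def count_enzyme_hits(top_kmers, kmer_hits, n_top=15):
    """Tally how often each enzyme annotation is associated with a top k-mer.

    top_kmers: iterable of k-mer strings (e.g. the 300 highest-importance k-mers).
    kmer_hits: hypothetical dict mapping isolate id -> dict mapping k-mer ->
               list of enzyme annotation strings for genes containing that k-mer.
    Returns the n_top most frequent (annotation, count) tuples.
    """
    top = set(top_kmers)
    counts = Counter()
    for per_isolate in kmer_hits.values():
        for kmer, enzymes in per_isolate.items():
            if kmer in top:
                # Each association in each isolate contributes one count.
                counts.update(enzymes)
    return counts.most_common(n_top)


# Toy data (illustrative only, not taken from the PGR datasets).
top_kmers = ["ACGT", "TTGA"]
kmer_hits = {
    "isolate_1": {
        "ACGT": ["Enolase (EC 4.2.1.11)"],
        "GGGG": ["Alcohol dehydrogenase (EC 1.1.1.1)"],
    },
    "isolate_2": {
        "ACGT": ["Enolase (EC 4.2.1.11)"],
        "TTGA": ["Acetate kinase (EC 2.7.2.1)"],
    },
}
result = count_enzyme_hits(top_kmers, kmer_hits)
print(result)
```

In this sketch, an enzyme is counted once per k-mer per isolate, which matches the caption's "across all isolates" wording under the stated assumption about `kmer_hits`.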

Nitrogen-Glycines: (rxn10042, 180), (rxn13784, 155), (rxn10199, 93), (rxn00085, 70), (rxn05405, 52), (rxn10043, 49), (rxn03084, 48), (rxn05736, 42), (rxn00974, 40), (rxn00414, 38)

Carbon-D-Galactonic-Acid-g-Lactone: (rxn13784, 161), (rxn10199, 147), (rxn10042, 137), (rxn00085, 134), (rxn05405, 103), (rxn03244, 90), (rxn08583, 77), (rxn02342, 75), (rxn10043, 71), (rxn03437, 68)

Nitrogen-L-Serine: (rxn13784, 116), (rxn10042, 115), (rxn10199, 58), (rxn00974, 37), (rxn00085, 33), (rxn00414, 31), (rxn03084, 29), (rxn00785, 28), (rxn00527, 27), (rxn00555, 26)

Nitrogen-L-Glutamic-Acid: (rxn13784, 272), (rxn10199, 239), (rxn10042, 160), (rxn00085, 157), (rxn05405, 152), (rxn00974, 113), (rxn03084, 112), (rxn00527, 109), (rxn00337, 82), (rxn03244, 81)

Carbon-b-Methyl-D-Galactoside: (rxn13784, 159), (rxn10199, 139), (rxn10042, 125), (rxn00085, 118), (rxn05405, 101), (rxn03244, 93), (rxn03084, 87), (rxn10043, 77), (rxn00527, 77), (rxn00799, 76)

Carbon-L-Aspartic-Acid: (rxn13784, 174), (rxn10042, 163), (rxn10199, 110), (rxn05405, 92), (rxn00085, 88), (rxn00527, 68), (rxn00285, 62), (rxn05736, 61), (rxn05289, 59), (rxn03244, 54)

Carbon-Butylamine-sec: (rxn10199, 164), (rxn13784, 161), (rxn10042, 143), (rxn00085, 128), (rxn05405, 107), (rxn03244, 87), (rxn10043, 84), (rxn05736, 78), (rxn00527, 73), (rxn02342, 71)

Carbon-a-Methyl-D-Glucoside: (rxn13784, 169), (rxn00085, 133), (rxn10042, 119), (rxn10199, 114), (rxn05405, 100), (rxn10043, 81), (rxn03084, 78), (rxn00785, 78), (rxn03244, 77), (rxn00527, 72)

Nitrogen-D-Glucosamine: (rxn13784, 388), (rxn10042, 331), (rxn10199, 294), (rxn00085, 216), (rxn05405, 154), (rxn00974, 136), (rxn03084, 125), (rxn02005, 115), (rxn00337, 105), (rxn03437, 105)

Nitrogen-Nitrite: (rxn10199, 185), (rxn00085, 184), (rxn10042, 183), (rxn05405, 156), (rxn13784, 143), (rxn10043, 127), (rxn03244, 117), (rxn00527, 106), (rxn00350, 93), (rxn00974, 90)

Carbon-L-Glutamic-Acid: (rxn13784, 225), (rxn10199, 214), (rxn00085, 173), (rxn05405, 157), (rxn10042, 155), (rxn00527, 119), (rxn03244, 112), (rxn03084, 96), (rxn00337, 92), (rxn00350, 90)

Carbon-Glycyl-L-Aspartic-Acid: (rxn10199, 337), (rxn13784, 294), (rxn00085, 292), (rxn10042, 281), (rxn05405, 247), (rxn03084, 190), (rxn03244, 171), (rxn00527, 167), (rxn00974, 147), (rxn00785, 145)

Carbon-Caproic-Acid: (rxn13784, 151), (rxn00085, 147), (rxn10199, 146), (rxn10042, 138), (rxn03437, 122), (rxn05405, 118), (rxn03244, 93), (rxn00337, 78), (rxn08583, 74), (rxn05736, 72)

Carbon-a-Keto-Valeric-Acid: (rxn10199, 168), (rxn10042, 158), (rxn13784, 140), (rxn03244, 122), (rxn00085, 120), (rxn02811, 96), (rxn05405, 91), (rxn03437, 86), (rxn03084, 85), (rxn10043, 82)

Nitrogen-Gly-Asn: (rxn13784, 222), (rxn10199, 199), (rxn10042, 158), (rxn00085, 114), (rxn05405, 105), (rxn00527, 88), (rxn00974, 84), (rxn03084, 72), (rxn00785, 69), (rxn00414, 63)

Carbon-b-Phenylethylamine: (rxn10199, 148), (rxn13784, 146), (rxn10042, 133), (rxn05405, 109), (rxn00085, 108), (rxn03244, 99), (rxn10043, 73), (rxn00785, 72), (rxn00527, 72), (rxn03243, 68)

Carbon-D-L-Malic-Acid: (rxn10042, 84), (rxn13784, 83), (rxn10199, 66), (rxn00085, 57), (rxn05405, 36), (rxn00527, 34), (rxn00799, 33), (rxn00974, 32), (rxn00414, 31), (rxn00785, 29)

Carbon-D-Fructose: (rxn13784, 225), (rxn10199, 215), (rxn10042, 197), (rxn00085, 133), (rxn05405, 115), (rxn03084, 103), (rxn00785, 95), (rxn05736, 88), (rxn02005, 82), (rxn00527, 79)

Carbon-D-L-Citramalic-Acid: (rxn13784, 185), (rxn10199, 157), (rxn10042, 147), (rxn00085, 109), (rxn05405, 96), (rxn08583, 87), (rxn03084, 74), (rxn00974, 68), (rxn03437, 64), (rxn10043, 58)

Carbon-2-3-Butanone: (rxn10199, 169), (rxn13784, 143), (rxn10042, 132), (rxn00085, 113), (rxn03243, 96), (rxn05405, 94), (rxn10043, 85), (rxn03437, 82), (rxn03244, 78), (rxn00785, 71)

Nitrogen-Ala-Asp: (rxn10042, 436), (rxn10199, 383), (rxn00085, 308), (rxn13784, 302), (rxn05405, 291), (rxn03084, 212), (rxn05289, 196), (rxn00974, 185), (rxn00527, 165), (rxn02811, 146)

Carbon-Maltose: (rxn13784, 257), (rxn10199, 213), (rxn10042, 173), (rxn00085, 141), (rxn05405, 118), (rxn00785, 117), (rxn03244, 90), (rxn05289, 81), (rxn00527, 80), (rxn03084, 78)

Carbon-Capric-Acid: (rxn13784, 173), (rxn10042, 165), (rxn00085, 160), (rxn10199, 153), (rxn05405, 107), (rxn03244, 85), (rxn10043, 82), (rxn00974, 78), (rxn00785, 76), (rxn02811, 68)

Carbon-Oxalomalic-Acid: (rxn13784, 139), (rxn00085, 136), (rxn10042, 134), (rxn10199, 131), (rxn05405, 99), (rxn10043, 83), (rxn03244, 81), (rxn00337, 73), (rxn03084, 70), (rxn00785, 70)

Carbon-i-Erythritol: (rxn10199, 158), (rxn13784, 147), (rxn00085, 143), (rxn10042, 137), (rxn05405, 101), (rxn03244, 95), (rxn10043, 71), (rxn00350, 67), (rxn00337, 65), (rxn03084, 64)

Carbon-D-Melezitose: (rxn10199, 149), (rxn10042, 145), (rxn00085, 131), (rxn13784, 111), (rxn05405, 89), (rxn10043, 88), (rxn03244, 82), (rxn03437, 76), (rxn00974, 73), (rxn00527, 71)

Carbon-D-Alanine: (rxn10042, 167), (rxn13784, 104), (rxn10199, 95), (rxn00085, 49), (rxn05405, 47), (rxn02342, 47), (rxn10043, 43), (rxn05289, 38), (rxn00414, 38), (rxn03084, 37)

Nitrogen-Nitrate: (rxn10199, 175), (rxn13784, 152), (rxn00085, 146), (rxn05405, 143), (rxn03244, 140), (rxn10042, 119), (rxn03084, 103), (rxn03243, 88), (rxn00527, 85), (rxn00785, 81)

Nitrogen-Ala-Glu: (rxn10199, 374), (rxn00085, 335), (rxn10042, 302), (rxn13784, 283), (rxn05405, 263), (rxn03084, 187), (rxn00350, 171), (rxn03437, 169), (rxn03244, 167), (rxn00785, 160)

Nitrogen-N-Acetyl-D-Glucosamine: (rxn13784, 123), (rxn10042, 113), (rxn10199, 106), (rxn00085, 63), (rxn02005, 51), (rxn05405, 47), (rxn00974, 41), (rxn02811, 40), (rxn00555, 36), (rxn05736, 34)

Carbon-D-L-Octopamine: (rxn10199, 169), (rxn10042, 135), (rxn00085, 135), (rxn13784, 119), (rxn05405, 96), (rxn03244, 94), (rxn00527, 84), (rxn03084, 76), (rxn00785, 73), (rxn03437, 73)

Nitrogen-Ala-Gly: (rxn10199, 446), (rxn10042, 346), (rxn00085, 345), (rxn13784, 307), (rxn05405, 265), (rxn03084, 189), (rxn05289, 185), (rxn00527, 173), (rxn00974, 163), (rxn03244, 150)

Nitrogen-Ala-Gln: (rxn10199, 364), (rxn00085, 284), (rxn13784, 282), (rxn10042, 278), (rxn05405, 272), (rxn00527, 185), (rxn05289, 172), (rxn03244, 168), (rxn00350, 161), (rxn03243, 159)

Carbon-D-L-a-Glycerol-Phosphate: (rxn10199, 249), (rxn13784, 228), (rxn10042, 197), (rxn05405, 192), (rxn00085, 172), (rxn00527, 143), (rxn03084, 125), (rxn03244, 111), (rxn03243, 97), (rxn01465, 89)

Nitrogen-L-Proline: (rxn10042, 170), (rxn13784, 122), (rxn10199, 120), (rxn05405, 95), (rxn00085, 93), (rxn03084, 75), (rxn00527, 70), (rxn03244, 66), (rxn10043, 58), (rxn05736, 57)

Carbon-L-Alanyl-Glycine: (rxn10199, 370), (rxn00085, 329), (rxn10042, 295), (rxn13784, 268), (rxn05405, 217), (rxn00527, 166), (rxn03084, 164), (rxn03243, 156), (rxn03244, 156), (rxn05289, 146)

Nitrogen-L-Aspartic-Acid: (rxn13784, 160), (rxn00085, 134), (rxn10199, 114), (rxn10042, 106), (rxn03084, 77), (rxn00785, 70), (rxn00527, 62), (rxn05405, 54), (rxn00974, 52), (rxn02342, 47)

Carbon-Glycyl-L-Proline: (rxn10199, 275), (rxn00085, 227), (rxn10042, 195), (rxn05405, 195), (rxn00527, 175), (rxn13784, 174), (rxn00350, 160), (rxn03244, 152), (rxn03084, 133), (rxn03243, 131)

Nitrogen-Gly-Glu: (rxn10199, 345), (rxn00085, 343), (rxn13784, 302), (rxn10042, 271), (rxn05405, 250), (rxn03137, 191), (rxn03084, 187), (rxn05289, 172), (rxn00527, 168), (rxn03243, 167)

Carbon-D-Trehalose: (rxn02005, 234), (rxn10042, 214), (rxn10199, 191), (rxn00085, 160), (rxn13784, 139), (rxn03244, 103), (rxn05405, 96), (rxn03243, 91), (rxn00527, 86), (rxn00337, 83)

Carbon-2-Deoxy-D-Ribose: (rxn13784, 127), (rxn10199, 124), (rxn10042, 119), (rxn05405, 94), (rxn00085, 84), (rxn03084, 57), (rxn10043, 53), (rxn05289, 52), (rxn00974, 51), (rxn00785, 44)

Nitrogen-Ala-Thr: (rxn10199, 288), (rxn05405, 252), (rxn00085, 246), (rxn13784, 233), (rxn10042, 224), (rxn00527, 170), (rxn03084, 153), (rxn03244, 147), (rxn00350, 135), (rxn03243, 127)

Nitrogen-Gly-Gln: (rxn10199, 391), (rxn00085, 327), (rxn13784, 318), (rxn05405, 292), (rxn10042, 283), (rxn00285, 219), (rxn03084, 195), (rxn00527, 166), (rxn03244, 166), (rxn00974, 161)

Carbon-2-Hydroxy-Benzoic-Acid: (rxn00085, 168), (rxn13784, 139), (rxn10199, 136), (rxn03437, 108), (rxn05405, 105), (rxn10042, 103), (rxn03244, 100), (rxn10043, 96), (rxn00527, 77), (rxn00555, 73)

Carbon-L-Proline: (rxn10199, 140), (rxn13784, 124), (rxn10042, 109), (rxn00085, 88), (rxn05405, 86), (rxn02342, 81), (rxn03244, 74), (rxn10043, 66), (rxn00527, 60), (rxn00350, 57)

Carbon-b-D-Allose: (rxn13784, 150), (rxn10199, 129), (rxn00085, 120), (rxn10042, 115), (rxn05405, 100), (rxn00785, 80), (rxn03244, 78), (rxn05289, 71), (rxn00337, 70), (rxn05736, 65)

Phosphate-D-L-a-Glycerol-Phosphate: (rxn13784, 351), (rxn10042, 297), (rxn10199, 290), (rxn00085, 192), (rxn05405, 175), (rxn00527, 129), (rxn05289, 119), (rxn03084, 116), (rxn03244, 116), (rxn00785, 113)

Carbon-Glycerol: (rxn13784, 252), (rxn10042, 168), (rxn10199, 151), (rxn00085, 95), (rxn05405, 72), (rxn00974, 57), (rxn00785, 54), (rxn03084, 51), (rxn08583, 49), (rxn00337, 43)

Carbon-D-Glucosamine: (rxn13784, 428), (rxn10042, 323), (rxn10199, 296), (rxn00085, 223), (rxn00974, 175), (rxn03084, 167), (rxn05405, 138), (rxn00337, 127), (rxn02342, 115), (rxn05289, 115)

Carbon-a-Methyl-D-Galactoside: (rxn10199, 160), (rxn00085, 145), (rxn13784, 144), (rxn10042, 138), (rxn02811, 101), (rxn05405, 98), (rxn00414, 75), (rxn03084, 73), (rxn00527, 73), (rxn03244, 70)

Carbon-N-Acetyl-D-Glucosamine: (rxn10199, 119), (rxn13784, 99), (rxn10042, 97), (rxn00085, 59), (rxn05405, 46), (rxn00414, 44), (rxn03437, 44), (rxn00527, 42), (rxn05736, 41), (rxn03084, 40)

Carbon-L-Lactic-Acid: (rxn13784, 143), (rxn10042, 133), (rxn10199, 129), (rxn05405, 72), (rxn00085, 69), (rxn00527, 57), (rxn10043, 55), (rxn03084, 54), (rxn00974, 51), (rxn00414, 42)

Carbon-m-Tartaric-Acid: (rxn13784, 166), (rxn10042, 140), (rxn10199, 139), (rxn00085, 112), (rxn03244, 103), (rxn05405, 102), (rxn03084, 87), (rxn00974, 83), (rxn10043, 81), (rxn00527, 79)

Carbon-D-Mannose: (rxn10042, 337), (rxn13784, 293), (rxn10199, 286), (rxn00085, 200), (rxn05405, 163), (rxn00785, 127), (rxn02005, 117), (rxn03084, 113), (rxn00974, 110), (rxn01871, 109)

Carbon-Glycyl-L-Glutamic-Acid: (rxn10199, 320), (rxn00085, 253), (rxn05405, 232), (rxn10042, 223), (rxn13784, 204), (rxn03084, 180), (rxn03244, 170), (rxn03243, 157), (rxn00527, 147), (rxn03437, 137)

Carbon-4-Hydroxy-L-Proline-trans: (rxn10199, 162), (rxn13784, 145), (rxn00085, 144), (rxn10042, 101), (rxn05405, 89), (rxn00337, 85), (rxn03244, 82), (rxn00527, 80), (rxn10043, 79), (rxn03084, 77)

Carbon-Fumaric-Acid: (rxn13784, 283), (rxn10042, 224), (rxn10199, 175), (rxn00085, 152), (rxn05405, 112), (rxn00974, 98), (rxn00414, 97), (rxn00527, 95), (rxn00785, 86), (rxn00791, 84)

Carbon-p-Hydroxy-Phenylacetic-Acid: (rxn10199, 174), (rxn13784, 149), (rxn10042, 130), (rxn00085, 120), (rxn05405, 97), (rxn03244, 85), (rxn10043, 78), (rxn03437, 74), (rxn00527, 70), (rxn03084, 69)

Carbon-L-Malic-Acid: (rxn10199, 81), (rxn13784, 78), (rxn10042, 68), (rxn00085, 51), (rxn00285, 50), (rxn05405, 47), (rxn02342, 43), (rxn00527, 41), (rxn00974, 38), (rxn00414, 36)

Carbon-a-Hydroxy-Butyric-Acid: (rxn10199, 182), (rxn13784, 167), (rxn10042, 149), (rxn05405, 122), (rxn00085, 118), (rxn00785, 101), (rxn10043, 97), (rxn00974, 78), (rxn00337, 77), (rxn00527, 76)

Carbon-D-Xylose: (rxn10042, 170), (rxn10199, 151), (rxn13784, 148), (rxn00085, 107), (rxn05405, 84), (rxn00337, 71), (rxn05736, 69), (rxn10043, 64), (rxn08583, 62), (rxn00527, 59)

Carbon-a-Keto-Butyric-Acid: (rxn13784, 164), (rxn10199, 149), (rxn10042, 139), (rxn00085, 132), (rxn05405, 89), (rxn03244, 87), (rxn10043, 82), (rxn00785, 80), (rxn03437, 74), (rxn13782, 69)

Nitrogen-D-Alanine: (rxn13784, 127), (rxn10199, 117), (rxn10042, 90), (rxn02342, 89), (rxn00085, 70), (rxn05405, 54), (rxn10043, 53), (rxn00974, 51), (rxn00527, 39), (rxn00337, 37)

Carbon-D-Lactitol: (rxn13784, 146), (rxn10199, 140), (rxn00085, 124), (rxn10042, 103), (rxn03244, 94), (rxn05405, 92), (rxn00785, 80), (rxn02342, 71), (rxn05736, 68), (rxn00974, 63)

Carbon-D-Ribose: (rxn10199, 246), (rxn13784, 205), (rxn10042, 179), (rxn00085, 157), (rxn05405, 120), (rxn00527, 94), (rxn03084, 92), (rxn03244, 91), (rxn10043, 81), (rxn03437, 81)

Nitrogen-Cytosine: (rxn10199, 156), (rxn13784, 143), (rxn00085, 110), (rxn05405, 109), (rxn10042, 103), (rxn03437, 71), (rxn03244, 62), (rxn00414, 61), (rxn00337, 56), (rxn03084, 55)

Carbon-Succinic-Acid: (rxn13784, 174), (rxn00085, 171), (rxn10042, 168), (rxn10199, 168), (rxn05405, 160), (rxn00527, 103), (rxn03244, 102), (rxn03084, 98), (rxn00974, 88), (rxn00350, 86)

Table A.18: Top 10 reactions per PGR medium, identified using RF feature importances. [TODO: REDO]

REFERENCES

Phelim Bradley, N Claire Gordon, Timothy M Walker, Laura Dunn, Simon Heys, Bill Huang, Sarah Earle, Louise J Pankhurst, Luke Anson, Mariateresa De Cesare, et al. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nature Communications, 6, 2015.

Leo Breiman. Random Forests. Machine Learning, 45(1):5–32, October 2001. ISSN 0885-6125. doi: 10.1023/A:1010933404324. URL http://dx.doi.org/10.1023/A%3A1010933404324.

Mark S Butler, Mark A Blaskovich, and Matthew A Cooper. Antibiotics in the clinical pipeline in 2013. The Journal of antibiotics, 66(10):571–591, 2013.

Claire Chewapreecha, Simon R Harris, Nicholas J Croucher, Claudia Turner, Pekka Marttinen, Lu Cheng, Alberto Pessia, David M Aanensen, Alison E Mather, Andrew J Page, Susannah J Salter, David Harris, Francois Nosten, David Goldblatt, Jukka Corander, Julian Parkhill, Paul Turner, and Stephen D Bentley. Dense genomic sampling identifies highways of pneumococcal recombination. Nat Genet, 46(3):305–309, March 2014. ISSN 1061-4036. URL http://dx.doi.org/10.1038/ng.2895.

Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

James J Davis, Sébastien Boisvert, Thomas Brettin, Ronald W Kenyon, Chunhong Mao, Robert Olson, Ross Overbeek, John Santerre, Maulik Shukla, Alice R Wattam, et al. Antimicrobial resistance prediction in PATRIC and RAST. Scientific Reports, 6, 2016.

Sebastian Deorowicz, Marek Kokot, Szymon Grabowski, and Agnieszka Debudaj-Grabysz. KMC 2: fast and resource-frugal k-mer counting. Bioinformatics, 31(10):1569–1576, 2015.

Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15:3133–3181, 2014. URL http://jmlr.org/papers/v15/delgado14a.html.

Centers for Disease Control and Prevention (US). Antibiotic resistance threats in the United States, 2013. Centers for Disease Control and Prevention, US Department of Health and Human Services, 2013.

Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory, pages 23–37. Springer, 1995.

White House. National action plan for combating antibiotic-resistant bacteria. Accessed August, 8, 2015.

Ramanan Laxminarayan, Adriano Duse, Chand Wattal, Anita KM Zaidi, Heiman FL Wertheim, Nithima Sumpradit, Erika Vlieghe, Gabriel Levy Hara, Ian M Gould, Herman Goossens, et al. Antibiotic resistance – the need for global solutions. The Lancet Infectious Diseases, 13(12):1057–1098, 2013.

Heng Li and Richard Durbin. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14):1754–1760, 2009.

Heng Li and Richard Durbin. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics, 26(5):589–595, 2010.

S Wesley Long, Randall J Olsen, Todd N Eagar, Stephen B Beres, Picheng Zhao, James J Davis, Thomas Brettin, Fangfang Xia, and James M Musser. Population genomic analysis of 1,777 extended-spectrum beta-lactamase-producing Klebsiella pneumoniae isolates, Houston, Texas: Unexpected abundance of clonal group 307. mBio, 8(3):e00489–17, 2017.

World Health Organization et al. Antimicrobial resistance: global report on surveillance. World Health Organization, 2014.

Alice R Wattam, David Abraham, Oral Dalay, Terry L Disz, Timothy Driscoll, Joseph L Gabbard, Joseph J Gillespie, Roger Gough, Deborah Hix, Ronald Kenyon, et al. PATRIC, the bacterial bioinformatics database and analysis resource. Nucleic Acids Research, 42(D1):D581–D591, 2013.

Alice R Wattam, James J Davis, Rida Assaf, Sébastien Boisvert, Thomas Brettin, Christopher Bun, Neal Conrad, Emily M Dietrich, Terry Disz, Joseph L Gabbard, et al. Improvements to PATRIC, the all-bacterial bioinformatics database and analysis resource center. Nucleic Acids Research, 45(D1):D535–D542, 2016.

Gerard D Wright. Molecular mechanisms of antibiotic resistance. Chemical Communications, 47(14):4055–4061, 2011.
