US 20020012939A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2002/0012939 A1 PalsSOn (43) Pub. Date: Jan. 31, 2002

(54) METHODS FOR IDENTIFYING DRUG Publication Classification TARGETS BASED ON GENOMIC SEQUENCE DATA (51) Int. Cl." ...... C12O 1/68; C12O 1/04; G06F 19/00; G01N 33/48; (76) Inventor: Bernhard Palsson, La Jolla, CA (US) GO1N 33/50 (52) U.S. Cl...... 435/6; 435/34; 702/20 Correspondence Address: KNOBBE MARTENS OLSON & BEAR LLP 620 NEWPORT CENTER DRIVE (57) ABSTRACT SIXTEENTH FLOOR NEWPORT BEACH, CA 92660 (US) This invention provides computational approach to iden tifying potential antibacterial drug targets based on a (21) Appl. No.: 09/923,870 genome Sequence and its annotation. Starting from a fully Sequenced genome, open reading frame assignments are (22) Filed: Aug. 6, 2001 made which determine the metabolic genotype for the organism. The metabolic genotype, and more Specifically its Related U.S. Application Data Stoichiometric matrix, are analyzed using flux balance analysis to assess the effects of genetic deletions on the (63) Continuation of application No. 09/243,022, filed on fitness of the organism and its ability to produce essential Feb. 2, 1999. biomolecules required for growth. Patent Application Publication Jan. 31, 2002 Sheet 1 of 3 US 2002/0012939 A1 C start D 12

Obtain DNA Sequence of the Genome

Determine Location of Open Reading Frames

Assign Function to Genes Based on Homology of Open Reading Frames to Known Genes

Determine Genes Involved in Cellular Metabolism Oend D 22 Figure 1 Patent Application Publication Jan. 31, 2002. Sheet 2 of 3 US 2002/0012939 A1

Ostar D 52

Gather biochemical information on all reactions performed by the genes/gene products of the metabolic genotype

Gather additional biochemical information on any reactions known to occur biochemically not present in the metabolic genotype

Create Genome Specific Stoichiometric Matrix

Determine the Biomass Composition of the organism

Determine the uptake rates and maintenance requirements of the organism

Formulate the general linear programming problem representing the in Silico strain of the organism Patent Application Publication Jan. 31, 2002 Sheet 3 of 3 US 2002/0012939 A1

Figure 3

III

A. Biomass Yield bo 20 .8 Repression O 5 Repression Repression E. N CO O.4. Repression d w 18.19 s t 0. 4 14.4222 2 N7 Acetate 0.3. 122 d 1.65 O 2 11.66 s 1.66 ve O 9.46 9 apa BCDEFGH 9.39 A. e--wor-m- | roBDEFGH KLMN 8.68 5 10 1s rpi 7.8 Oxygen uptake (mmol gDW) frpe 6.79

Figure 3a Figure 3b

Induction Induction Figure 3c -7 inductionInduction Induction 6.8 6.8 US 2002/0012939 A1 Jan. 31, 2002

METHODS FOR DENTIFYING DRUG TARGETS metabolism. The identification of “sensitive' metabolic BASED ON GENOMIC SEQUENCE DATA Steps in attaining the necessary metabolic flux distributions to Support growth and Survival that can be attacked to RELATED APPLICATIONS weaken or destroy a microbe, need not be localized to a Single biochemical reaction or cellular process. Rather, 0001. This application in a continuation of application different cellular targets that need not be intimately related Ser. No. 09/243,022, filed Feb. 2, 1999. in the metabolic topology could be chosen based on the concerted effect the loss of each of these functions would BACKGROUND OF THE INVENTION have on metabolism. 0002) 1. Field of the Invention 0009. A similar strategy with viral infections has recently proved successful. It has been shown that “cocktails” of 0003. This invention relates to methods for identifying different drugs that target different biochemical processes drug targets based on genomic Sequence data. More specifi provide enhanced Success in fighting against HIV infection. cally, this invention relates to Systems and methods for Such a paradigm shift is possible only if the necessary determining Suitable molecular targets for the directed biological information as well as appropriate methods of development of antimicrobial agents. rational analysis are available. Recent advances in the field 0004 2. Description of the Related Art of genomics and bioinformatics, in addition to mathematical modeling, offer the possibility to realize this approach. 0005 Infectious disease is on a rapid rise and threatens to regain its Status as a major health problem. Prior to the 0010. At present, the field of microbial genetics is enter discovery of antibiotics in the 1930s, infectious disease was ing a new era where the genomes of Several microorganisms a major cause of death. Further discoveries, development, are being completely Sequenced. It is expected that in a and mass production of antibiotics throughout the 1940s and decade, or So, the nucleotide Sequences of the genomes of all 1950s dramatically reduced deaths from microbial infec the major human pathogens will be completely determined. tions to a level where they effectively no longer represented The Sequencing of the genomes of pathogens Such as a major threat in developed countries. Haemophilus influenzae has allowed researchers to compare the homology of proteins encoded by the open reading 0006 Over the years antibiotics have been liberally pre frames (ORFs) with those of Escherichia coli, resulting in Scribed and the Strong Selection pressure that this represents valuable insight into the H. influenzae metabolic features. has led to the emergence of antibiotic resistant Strains of Similar analyses, Such as those performed with H. influen many Serious human pathogens. In Some cases Selected zae, will provide details of metabolism spanning the hier antibiotics, Such as Vancomycin, literally represent the last archy of metabolic regulation from bacterial genomes to line of defense against certain pathogenic bacteria Such as phenotypes. StaphylococcuS. The possibility for Staphylococci to acquire Vancomycin resistance through exchange of genetic material 0011. These developments provide exciting new oppor with enterococci, which are commonly resistant to Vanco tunities to carry out conceptual experiments in Silico to mycin, is a Serious issue of concern to health care Specialists. analyze different aspects of microbial metabolism and its The pharmaceutical industry continues its Search for new regulation. Further, the Synthesis of whole-cell models is antimicrobial compounds, which is a lengthy and tedious, made possible. Such models can account for each and every but very important process. The rate of development and Single metabolic reaction and thus enable the analysis of introduction of new antibiotics appears to no longer be able their role in overall cell function. To implement Such analy to keep up with the evolution of new antibiotic resistant sis, however, a mathematical modeling and Simulation organisms. The rapid emergence of antibiotic resistant framework is needed which can incorporate the extensive organisms threatens to lead to a Serious widespread health metabolic detail but Still retain computational tractability. CC COCC. Fortunately, rigorous and tractable mathematical methods have been developed for the required Systems analysis of 0007. The basis of antimicrobial chemotherapy is to metabolism. selectively kill the microbe with minimal, and ideally no, harm to normal human cells and tissues. Therefore, ideal 0012. A mathematical approach that is well Suited to targets for antibacterial action are biochemical processes account for genomic detail and avoid reliance on kinetic that are unique to bacteria, or those that are Sufficiently complexity has been developed based on well-known Sto different from the corresponding mammalian processes to ichiometry of metabolic reactions. This approach is based on allow acceptable discrimination between the two. For effec metabolic flux balancing in a metabolic Steady State. The tive antibiotic action it is clear that a vital target must exist history of flux balance models for metabolic analyses is in the bacterial cell and that the antibiotic be delivered to the relatively short. It has been applied to metabolic networks, target in an active form. Therefore resistance to an antibiotic and the Study of adipocyte metabolism. Acetate Secretion can arise from: (i) chemical destruction or inactivation of the from E. coli under ATP maximization conditions and ethanol antibiotic; (ii) alteration of the target site to reduce or Secretion by yeast have also been investigated using this eliminate effective antibiotic binding; (iii) blocking antibi approach. otic entry into the cell, or rapid removal from the cell after entry; and (iv) replacing the metabolic step inhibited by the 0013 The complete sequencing of a bacterial genome antibiotic. and ORF assignment provides the information needed to determine the relevant metabolic reactions that constitute 0008 Thus, it is time to fundamentally re-examine the metabolism in a particular organism. Thus a flux-balance philosophy of microbial killing Strategies and develop new model can be formulated and Several metabolic analyses can paradigms. One Such paradigm is a holistic view of cellular be performed to extract metabolic characteristics for a US 2002/0012939 A1 Jan. 31, 2002 particular organism. The flux balance approach can be easily utilizes the genome Sequence, the annotation data, and the applied to Systematically simulate the effect of Single, as biomass requirements of an organism to construct genomi well as multiple, gene deletions. This analysis will provide cally complete metabolic genotypes and genome-specific a list of Sensitive that could be potential antimi Stoichiometric matrices. These Stoichiometric matrices are crobial targets. analyzed using a flux-balance analysis. This invention 0.014. The need to consider a new paradigm for dealing describes how to assess the affects of genetic deletions on with the emerging problem of antibiotic resistant pathogens the fitness and productive capabilities of an organism under is a problem of vital importance. The route towards the given environmental and genetic conditions. design of new antimicrobial agents must proceed along 0018 Construction of a genome-specific stoichiometric directions that are different from those of the past. The rapid matrix from genomic annotation data is illustrated along growth in bioinformatics has provided a wealth of biochemi with applying flux-balance analysis to study the properties cal and genetic information that can be used to Synthesize of the Stoichiometric matrix, and hence the metabolic geno complete representations of cellular metabolism. These type of the organism. By limiting the constraints on various models can be analyzed with relative computational ease fluxes and altering the environmental inputs to the metabolic through flux-balance models and Visual computing tech network, genetic deletions may be analyzed for their affects niques. The ability to analyze the global metabolic network on growth. This invention is embodied in a Software appli and understand the robustness and Sensitivity of its regula cation that can be used to create the Stoichiometric matrix for tion under various growth conditions offers promise in a fully Sequenced and annotated genome. Additionally, the developing novel methods of antimicrobial chemotherapy. Software application can be used to further analyze and manipulate the network So as to predict the ability of an 0.015. In one example, Pramanik et al. described a sto organism to produce biomolecules necessary for growth, ichiometric model of E. coli metabolism using flux-balance thus, essentially simulating a genetic deletion. modeling techniques (Stoichiometric Model of Escherichia coli Metabolism: Incorporation of Growth-Rate Dependent BRIEF DESCRIPTION OF THE DRAWINGS Biomass Composition and Mechanistic Energy Require 0019 FIG. 1 is a flow diagram illustrating one procedure ments, Biotechnology and Bioengineering, Vol. 56, No. 4, for creating metabolic genotypes from genomic Sequence Nov. 20, 1997). However, the analytical methods described data for any organism. by Pramanik, et al. can only be used for situations in which biochemical knowledge exists for the reactions occurring 0020 FIG. 2 is a flow diagram illustrating one procedure within an organism. Pramanik, et al. produced a metabolic for producing in Silico microbial Strains from the metabolic model of metabolism for E. coli based on biochemical genotypes created by the method of FIG. 1, along with information rather than genomic data Since the metabolic additional biochemical and microbiological data. genes and related reactions for E. coli had already been well 0021 FIG. 3 is a graph illustrating a prediction of Studied and characterized. Thus, this method is inapplicable genome Scale shifts in transcription. The graph shows the to determining a metabolic model for organisms for which different phases of the metabolic response to varying oxygen little or no biochemical information on metabolic enzymes availability, Starting from completely aerobic to completely and genes is known. It can be envisioned that in the future anaerobic in E. coli. The predicted changes in expression the only information we may have regarding an emerging pattern between phases II and V are indicated. pathogen is its genomic Sequence. What is needed in the art is a System and method for determining and analyzing the DETAILED DESCRIPTION OF THE entire metabolic network of organisms whose metabolic INVENTION reactions have not yet been determined from biochemical 0022. This invention relates to systems and methods for assays. The present invention provides Such a System. utilizing genome annotation data to construct a Stoichiomet ric matrix representing most of all of the metabolic reactions SUMMARY OF THE INVENTION that occur within an organism. Using these Systems and methods, the properties of this matrix can be Studied under 0016. This invention relates to constructing metabolic conditions simulating genetic deletions in order to predict genotypes and genome Specific Stoichiometric matrices from the affect of a particular gene on the fitness of the organism. genome annotation data. The functions of the metabolic Moreover, genes that are vital to the growth of an organism genes in the target organism are determined by homology can be found by Selectively removing various genes from the Searches against databases of genes from Similar organisms. Stoichiometric matrix and thereafter analyzing whether an Once a potential function is assigned to each metabolic gene organism with this genetic makeup could Survive. Analysis of the target organism, the resulting data is analyzed. In one of these lethal genetic mutations is useful for identifying embodiment, each gene is Subjected to a flux-balance analy potential genetic targets for anti-microbial drugs. sis to assess the effects of genetic deletions on the ability of the target organism to produce essential biomolecules nec 0023. It should be noted that the systems and methods essary for its growth. Thus, the invention provides a high described herein can be implemented on any conventional throughput computational method to Screen for genetic host computer System, Such as those based on Intel(R) micro deletions which adversely affect the growth capabilities of processors and running MicroSoft Windows operating Sys fully Sequenced organisms. tems. Other systems, such as those using the UNIX or LINUX operating system and based on IBM(R), DEC(R) or 0017 Embodiments of this invention also provide a com Motorola(E) microprocessors are also contemplated. The putational, as opposed to an experimental, method for the Systems and methods described herein can also be imple rapid Screening of genes and their gene products as potential mented to run on client-Server Systems and wide-area net drug targets to inhibit an organism's growth. This invention WorkS, Such as the Internet. US 2002/0012939 A1 Jan. 31, 2002

0024 Software to implement the system can be written in 0029 All of the genes involved in metabolic reactions any well-known computer language, Such as Java, C, C++, and functions in a cell comprise only a Subset of the Visual Basic, FORTRAN or COBOL and compiled using genotype. This Subset of genes is referred to as the metabolic any well-known compatible compiler. genotype of a particular organism. Thus, the metabolic genotype of an organism includes most or all of the genes 0.025 The software of the invention normally runs from involved in the organism's metabolism. The gene products instructions Stored in a memory on the host computer produced from the Set of metabolic genes in the metabolic System. Such a memory can be a hard disk, Random AcceSS genotype carry out all or most of the enzymatic reactions and Memory, Read Only Memory and Flash Memory. Other transport reactions known to occur within the target organ types of memories are also contemplated to function within ism as determined from the genomic Sequence. the Scope of the invention. 0030 To begin the selection of this subset of genes, one 0026. A process 10 for producing metabolic genotypes can Simply Search through the list of functional gene assign from an organism is shown in FIG. 1. Beginning at a start ments from state 18 to find genes involved in cellular state 12, the process 10 then moves to a state 14 to obtain the metabolism. This would include genes involved in central genomic DNA sequence of an organism. The nucleotide metabolism, amino acid metabolism, nucleotide metabo Sequence of the genomic DNA can be rapidly determined for lism, fatty acid and lipid metabolism, carbohydrate assimi an organism with a genome size on the order of a few million lation, Vitamin and biosynthesis, energy and redox base pairs. One method for obtaining the nucleotide generation, etc. This Subset is generated at a State 20. The Sequences in a genome is through commercial gene data process 10 of determining metabolic genotype of the target bases. Many gene Sequences are available on-line through a organism from genomic data then terminates at an end Stage number of Sites (see, for example, www.tigr.org) and can 22. easily be downloaded from the Internet. Currently, there are 16 microbial genomes that have been fully Sequenced and 0031 Referring now to FIG. 2, the process 50 of pro are publicly available, with countless others held in propri ducing a computer model of an organism. This proceSS is etary databases. It is expected that a number of other also known as producing in Silico microbial Strains. The organisms, including pathogenic organisms will be found in process 50 begins at a start state 52 (same as end state 22 of nature for which little experimental information, except for process 10) and then moves to a state 54 wherein biochemi its genome Sequence, will be available. cal information is gathered for the reactions performed by each metabolic gene for each of the genes in the 0027. Once the nucleotide sequence of the entire genomic metabolic genotype determined from process 10. DNA in the target organism has been obtained at state 14, the coding regions, also known as open reading frames, are 0032 For each gene in the metabolic genotype, the determined at a State 16. Using existing computer algo Substrates and products, as well as the Stoichiometry of any rithms, the location of open reading frames that encode and all reactions performed by the gene product of each gene genes from within the genome can be determined. For can be determined by reference to the biochemical literature. example, to identify the proper location, Strand, and reading This includes information regarding the irreversible or frame of an open reading frame one can perform a gene reversible nature of the reactions. The stoichiometry of each Search by signal (promoters, ribosomal binding sites, etc.) or reaction provides the molecular ratioS in which reactants are by content (positional base frequencies, codon preference). converted into products. Computer programs for determining open reading frames 0033 Potentially, there may still remain a few reactions are available, for example, by the University of Wisconsin in cellular metabolism which are known to occur from in Genetics Computer Group and the National Center for Vitro assays and experimental data. These would include Biotechnology Information. well characterized reactions for which a gene or protein has 0028. After the location of the open reading frames have yet to be identified, or was unidentified from the genome been determined at state 16, the process 10 moves to state 18 Sequencing and functional assignment of State 14 and 18. to assign a function to the protein encoded by the open This would also include the transport of metabolites into or reading frame. The discovery that an open reading frame or out of the cell by uncharacterized genes related to transport. gene has sequence homology to a gene coding for a protein Thus one reason for the missing gene information may be of known function, or family of proteins of known function, due to a lack of characterization of the actual gene that can provide the first clues about the gene and it's related performs a known biochemical conversion. Therefore upon protein's function. After the locations of the open reading careful review of existing biochemical literature and avail frames have been determined in the genomic DNA from the able experimental data, additional metabolic reactions can target organism, well-established algorithms (i.e. the Basic be added to the list of metabolic reactions determined from Local Alignment Search Tool (BLAST) and the FAST the metabolic genotype from state 54 at a state 56. This family of programs can be used to determine the extent of would include information regarding the Substrates, prod Similarity between a given Sequence and gene/protein ucts, reversibility/irreversibility, and stoichiometry of the Sequences deposited in Worldwide genetic databases. If a reactions. coding region from a gene in the target organism is homolo 0034 All of the information obtained at states 54 and 56 gous to a gene within one of the Sequence databases, the regarding reactions and their Stoichiometry can be repre open reading frame is assigned a function Similar to the Sented in a matrix format typically referred to as a Stoichio homologously matched gene. Thus, the functions of nearly metric matrix. Each column in the matrix corresponds to a the entire gene complement or genotype of an organism can given reaction or flux, and each row corresponds to the be determined So long as homologous genes have already different metabolites involved in the given reaction/flux. been discovered. Reversible reactions may either be represented as one reac US 2002/0012939 A1 Jan. 31, 2002

tion that operates in both the forward and reverse direction sider the Steady State behavior. Eliminating the time deriva or be decomposed into one forward reaction and one back tives obtained from dynamic mass balances around every ward reaction in which case all fluxes can only take on metabolite in the metabolic System, yields the System of positive values. Thus, a given position in the matrix linear equations represented in matrix notation, describes the Stoichiometric participation of a metabolite (listed in the given row) in a particular flux of interest (listed S.--O Equation 1 in the given column). Together all of the columns of the 0039 where S refers to the stoichiometric matrix of the genome specific Stoichiometric matrix represent all of the System, and V is the flux vector. This equation simply States chemical conversions and cellular transport processes that that over long times, the formation fluxes of a metabolite are determined to be present in the organism. This includes must be balanced by the degradation fluxes. Otherwise, all internal fluxes and So called exchange fluxes operating Significant amounts of the metabolite will accumulate inside within the metabolic network. Thus, the process 50 moves to the metabolic network. Applying equation 1 to our System a state 58 in order to formulate all of the cellular reactions we let S now represent the genome specific Stoichiometric together in a genome specific Stoichiometric matrix. The matrix resulting genome specific Stoichiometric matrix is a funda 0040. To determine the metabolic capabilities of a mental representation of a genomically and biochemically defined metabolic genotype Equation 1 is Solved for the defined genotype. metabolic fluxes and the internal metabolic reactions, V, 0035. After the genome specific stoichiometric matrix is while imposing constraints on the activity of these fluxes. defined at state 58, the metabolic demands placed on the Typically the number of metabolic fluxes is greater than the organism are calculated. The metabolic demands can be number of mass balances (i.e., mdn) resulting in a plurality readily determined from the dry weight composition of the of feasible flux distributions that satisfy Equation 1 and any cell. In the case of well-Studied organisms Such as Escheri constraints placed on the fluxes of the System. This range of chia coli and Bacillus Subtilis, the dry weight composition is solutions is indicative of the flexibility in the flux distribu available in the published literature. However, in Some cases tions that can be achieved with a given set of metabolic it will be necessary to experimentally determine the dry reactions. The Solutions to Equation 1 lie in a restricted weight composition of the cell for the organism in question. region. This Subspace defines the capabilities of the meta This can be accomplished with varying degrees of accuracy. bolic genotype of a given organism, Since the allowable The first attempt would measure the RNA, DNA, protein, Solutions that Satisfy Equation 1 and any constraints placed and lipid fractions of the cell. A more detailed analysis on the fluxes of the system define all the metabolic flux would also provide the specific fraction of nucleotides, distributions that can be achieved with a particular set of amino acids, etc. The process 50 moves to state 60 for the metabolic genes. determination of the biomass composition of the target 0041. The particular utilization of the metabolic genotype organism. can be defined as the metabolic phenotype that is expressed 0036) The process 50 then moves to state 62 to perform under those particular conditions. Objectives for metabolic Several experiments that determine the uptake rates and function can be chosen to explore the best use of the maintenance requirements for the organism. Microbiologi metabolic network within a given metabolic genotype. The cal experiments can be carried out to determine the uptake Solution to equation 1 can be formulated as a linear pro rates for many of the metabolites that are transported into the gramming problem, in which the flux distribution that mini cell. The uptake rate is determined by measuring the deple mizes a particular objective if found. Mathematically, this tion of the from the growth media. The measure optimization can be Stated as; ment of the biomass at each point is also required, in order Minimize Z Equation 2 to determine the uptake rate per unit biomass. The mainte Equation 3 nance requirements can be determined from a chemoStat experiment. The glucose uptake rate is plotted versus the 0042 where Z is the objective which is represented as a growth rate, and the y-intercept is interpreted as the non linear combination of metabolic fluxes V. The optimization growth associated maintenance requirements. The growth can also be Stated as the equivalent maximization problem; asSociated maintenance requirements are determined by i.e. by changing the Sign on Z. fitting the model results to the experimentally determined 0043. This general representation of Z enables the for points in the growth rate versus glucose uptake rate plot. mulation of a number of diverse objectives. These objectives 0037 Next, the process 50 moves to a state 64 wherein can be design objectives for a Strain, exploitation of the information regarding the metabolic demands and uptake metabolic capabilities of a genotype, or physiologically rates obtained at State 62 are combined with the genome meaningful objective functions, Such as maximum cellular Specific Stoichiometric matrix of Step 8 together fully define growth. For this application, growth is to be defined in terms the metabolic System using flux balance analysis (FBA). of biosynthetic requirements based on literature values of This is an approach well Suited to account for genomic detail biomass composition or experimentally determined values as it has been developed based on the well-known stoichi Such as those obtained from state 60. Thus, we can define ometry of metabolic reactions. biomass generation as an additional reaction flux draining intermediate metabolites in the appropriate ratios and rep 0.038. The time constants characterizing metabolic tran resented as an objective function Z. In addition to draining Sients and/or metabolic reactions are typically very rapid, on intermediate metabolites this reaction flux can be formed to the order of milli-Seconds to Seconds, compared to the time utilize energy molecules such as ATP, NADH and NADPH constants of cell growth on the order of hours to days. Thus, So as to incorporate any maintenance requirement that must the transient mass balances can be simplified to only con be met. This new reaction flux then becomes another con US 2002/0012939 A1 Jan. 31, 2002

Straint/balance equation that the System must Satisfy as the phenotypes. If the removal/deletion does not allow the objective function. It is analagous to adding an addition metabolic network to produce necessary precursors for column to the Stoichiometric matrix of Equation 1 to rep growth, and the cell can not obtain these precursors from its resent Such a flux to describe the production demands placed environment, the deletion(s) has the potential as an antimi on the metabolic System. Setting this new flux as the crobial drug target. Thus by adjusting the constraints and objective function and asking the System to maximize the defining the objective function we can explore the capabili value of this flux for a given set of constraints on all the other ties of the metabolic genotype using linear programming to fluxes is then a method to simulate the growth of the optimize the flux distribution through the metabolic net organism. work. This creates what we will refer to as an in silico bacterial Strain capable of being Studied and manipulated to 0044) Using linear programming, additional constraints analyze, interpret, and predict the genotype-phenotype rela can be placed on the value of any of the fluxes in the tionship. It can be applied to assess the affects of incremental metabolic network. changes in the genotype or changing environmental condi ?.svis C; Equation 4 tions, and provide a tool for computer aided experimental 004.5 These constraints could be representative of a design. It should be realized that other types of organisms maximum allowable flux through a given reaction, possibly can similarly be represented in silico and still be within the resulting from a limited amount of an present in Scope of the invention. which case the value for C would take on a finite value. 0048. The construction of a genome specific stoichiomet These constraints could also be used to include the knowl ric matrix and in Silico microbial Strains can also be applied edge of the minimum flux through a certain metabolic to the area of Signal transduction. The components of reaction in which case the value for B would take on a finite Signaling networks can be identified within a genome and value. Additionally, if one chooses to leave certain reversible used to construct a content matrix that can be further reactions or transport fluxes to operate in a forward and analyzed using various techniques to be determined in the reverse manner the flux may remain unconstrained by Set future. ting f, to negative infinity and C to positive infinity. If reactions proceed only in the forward reaction f3, is set to EXAMPLE 1. Zero while C is set to positive infinity. As an example, to Simulate the event of a genetic deletion the flux through all A. of the corresponding metabolic reactions related to the gene in question are reduced to Zero by setting 3, and C to be zero in Equation 4. Based on the in vivo environment where the E. coli Metabolic Genotype and In Silico Model bacteria lives one can determine the metabolic resources 0049. Using the methods disclosed in FIGS. 1 and 2, an available to the cell for biosynthesis of essentially molecules in silico strain of Escherichia coli K-12 has been constructed for biomass. Allowing the corresponding transport fluxes to and represents the first Such Strain of a bacteria largely be active provides the in Silico bacteria with inputs and generated from annotated Sequence data and from biochemi outputs for Substrates and by-products produces by the cal information. The genetic Sequence and open reading metabolic network. Therefore as an example, if one wished frame identifications and assignments are readily available to Simulate the absence of a particular growth Substrate one from a number of on-line locations (ex: www.tigr.org). For Simply constrains the corresponding transport fluxes allow this example we obtained the annotated Sequence from the ing the metabolite to enter the cell to be Zero by allowing f; following website for the E. coli Genome Project at the and C to be zero in Equation 4. On the other hand if a University of Wisconsin (http://www.genetics.wisc.edu/). Substrate is only allowed to enter or exit the cell via transport Details regarding the actual Sequencing and annotation of mechanisms, the corresponding fluxes can be properly con the Sequence can be found at that Site. From the genome Strained to reflect this Scenario. annotation data the Subset of genes involved in cellular 0.046 Together the linear programming representation of metabolism was determined as described above in FIG. 1, the genome-specific Stoichiometric matrix as in Equation 1 State 20, comprising the metabolic genotype of the particular along with any general constraints placed on the fluxes in the strain of E. coli. System, and any of the possible objective functions com 0050. Through detailed analysis of the published bio pletes the formulation of the in silico bacterial strain. The in chemical literature on E. coli we determined (1) all of the Silico Strain can then be used to Study theoretical metabolic reactions associated with the genes in the metabolic geno capabilities by Simulating any number of conditions and type and (2) any additional reactions known to occur from generating flux distributions through the use of linear pro biochemical data which were not represented by the genes in gramming. The process 50 of formulating the in Silico Strain the metabolic genotype. This provided all of the necessary and Simulating its behavior using linear programming tech information to construct the genome Specific Stoichiometric niques terminates at an end State 66. matrix for E. coli K-12. 0047 Thus, by adding or removing constraints on various 0051 Briefly, the E. coli K-12 bacterial metabolic geno fluxes in the network it is possible to (1) Simulate a genetic type and more Specifically the genome Specific Stoichiomet deletion event and (2) simulate or accurately provide the ric matrix contains 731 metabolic processes that influence network with the metabolic resources present in its in vivo 436 metabolites (dimensions of the genome specific sto environment. Using flux balance analysis it is possible to ichiometric matrix are 436x731). There are 80 reactions determine the affects of the removal or addition of particular present in the genome specific Stoichiometric matrix that do genes and their associated reactions to the composition of not have a genetic assignment in the annotated genome, but the metabolic genotype on the range of possible metabolic are known to be present from biochemical data. The genes US 2002/0012939 A1 Jan. 31, 2002

contained within this metabolic genotype are shown in Table their loSS removes the capability of cellular growth on 1 along with the corresponding reactions they carry out. glucose. For growth on glucose, the essential gene products 0.052 Because E. coli is arguably the best studied organ are involved in the 3-carbon Stage of glycolysis, three ism, it was possible to determine the uptake rates and reactions of the TCA cycle, and several points within the maintenance requirements (state 62 of FIG. 2) by reference PPP Deletions in the 6-carbon stage of glycolysis result in to the published literature. This in silico strain accounts for a reduced ability to Support growth due to the diversion of the metabolic capabilities of E. coli. It includes membrane additional flux through the PPP. transport processes, the central catabolic pathways, utiliza 0056. The results from the gene deletion study can be tion of alternative carbon Sources and the biosynthetic directly compared with growth data from mutants. The pathways that generate all the components of the biomass. In growth characteristics of a Series of E. coli mutants on the case of E. coli K-12, we can call upon the wealth of data Several different carbon Sources were examined (80 cases on overall metabolic behavior and detailed biochemical were determined from the literature), and compared to the in information about the in Vivo genotype to which we can silico deletion results (Table 2). The majority (73 of 80 cases compare the behavior of the in silico strain. One utility of or 91%) of the mutant experimental observations are con FBA is the ability to learn about the physiology of the sistent with the predictions of the in silico study. The results particular organism and explore its metabolic capabilities from the in Silico gene deletion analysis are thus consistent without any specific biochemical data. This ability is impor with experimental observations. tant considering possible future Scenarios in which the only data that we may have for a newly discovered bacterium EXAMPLE 3 (perhaps pathogenic) could be its genome sequence. C. EXAMPLE 2 Prediction of Genome Scale Shifts in Gene B. Expression 0057 Flux based analysis can be used to predict meta In Silico Deletion Analysis for E. coli to Find bolic phenotypes under different growth conditions, Such as Antimicrobial Targets substrate and oxygen availability. The relation between the 0.053 Using the in silico strain constructed in Example 1, flux value and the gene expression levels is non-linear, the effect of individual deletions of all the enzymes in central resulting in bifurcations and multiple steady States. How metabolism can be examined in Silico. For the analysis to ever, FBA can give qualitative (on/off) information as well determine Sensitive linkages in the metabolic network of E. as the relative importance of gene products under a given coli, the objective function utilized is the maximization of condition. Based on the magnitude of the metabolic fluxes, the biomass yield. This is defined as a flux draining the qualitative assessment of gene expression can be inferred. necessary biosynthetic precursors in the appropriate ratioS. 0.058 FIG.3a shows the five phases of distinct metabolic This flux is defined as the biomass composition, which can behavior of E. coli in response to varying oxygen availabil be determined from the literature. See Neidhardt et. al., ity, going from completely anaerobic (phase I) to completely Escherichia coli and SalmOnella. Cellular and Molecular aerobic (phase V). FIGS. 3b and 3c display lists of the genes Biology, Second Edition, ASM Press, Washington D.C., that are predicted to be induced or repressed upon the shift 1996. Thus, the objective function is the maximization of a from aerobic growth (phase V) to nearly complete anaerobic Single flux, this biosynthetic flux. growth (phase II). The numerical values shown in FIGS. 3b 0.054 Constraints are placed on the network to account and 3c are the fold change in the magnitude of the fluxes for the availability of Substrates for the growth of E. coli. In calculated for each of the listed enzymes. the initial deletion analysis, growth was Simulated in an 0059 For this example, the objective of maximization of aerobic glucose minimal media culture. Therefore, the con biomass yield is utilized (as described above). The con Straints are set to allow for the components included in the Straints on the System are also set accordingly (as described media to be taken up. The Specific uptake rate can be above). However, in this example, a change in the avail included if the value is known, otherwise, an unlimited ability of a key Substrate is leading to changes in the Supply can be provided. The uptake rate of glucose and metabolic behavior. The change in the parameter is reflected oxygen have been determined for E. coli (Neidhardt et. al., as a change in the uptake flux. Therefore, the maximal Escherichia coli and SalmOnella. Cellular and Molecular allowable oxygen uptake rate is changed to generate this Biology, Second Edition, ASM Press, Washington D.C., data. The figure demonstrates how Several fluxes in the 1996. Therefore, these values are included in the analysis. metabolic network will change as the oxygen uptake flux is The uptake rate for phosphate, Sulfur, and nitrogen Source is continuously decreased. Therefore, the constraints on the not precisely known, So constraints on the fluxes for the fluxes is identical to what is described in the previous uptake of these important Substrates is not included, and the Section, however, the oxygen uptake rate is Set to coincide metabolic network is allowed to take up any required with the point in the diagram. amount of these Substrates. 0060 Corresponding experimental data sets are now 0.055 The results showed that a high degree of redun becoming available. Using high-density oligonucleotide dancy exists in central intermediary metabolism during arrays the expression levels of nearly every gene in Saccha growth in glucose minimal media, which is related to the romyces cerevisiae can now be analyzed under various interconnectivity of the metabolic reactions. Only a few growth conditions. From these Studies it was shown that metabolic functions were found to be essential Such that nearly 90% of all yeast mRNAS are present in growth on rich US 2002/0012939 A1 Jan. 31, 2002

and minimal media, while a large number of mRNAs were defined above, a flux-balance model for the first completely shown to be differentially expressed under these two con Sequenced free living organism, Haemophilus influenzae, ditions. Another recent article shows how the metabolic and has been generated. One application of this model is to genetic control of gene expression can be Studied on a predict a minimal defined media. It was found that H. genomic scale using DNA microarray technology (Explor influenzae can grow on the minimal defined medium as ing the Metabolic and Genetic Control of Gene Expression determined from the ORF assignments and predicted using on a Genomic Scale, Science, Vol. 278, Oct. 24, 1997. The FBA. Simulated bacterial growth was predicted using the temporal changes in genetic expression profiles that occur following defined media: fructose, arginine, cysteine, during the diauxic shift in S. cerevisiae were observed for glutamate, putrescine, Spermidine, thiamin, NAD, tetrapyr every known expressed sequence tag (EST) in this genome. role, pantothenate, ammonia, phosphate. This predicted As shown above, FBA can be used to qualitatively simulate minimal medium was compared to the previously published shifts in metabolic genotype expression patterns due to defined media and was found to differ in only one com alterations in growth environments. Thus, FBA can Serve to pound, inosine. It is known that inosine is not required for complement current Studies in metabolic gene expression, growth, however it does Serve to enhance growth. Again the by providing a fundamental approach to analyze, interpret, in silico results obtained were consistent with published in and predict the data from Such experiments. Vivo research. These results provide confidence in the use of this type of approach for the design of defined media for EXAMPLE 4 organisms in which there currently does not exist a defined D. media. 0062) While particular embodiments of the invention Design of Defined Media have been described in detail, it will be apparent to those 0061 An important economic consideration in large skilled in the art that these embodiments are exemplary Scale bioprocesses is optimal medium formulation. FBA can rather than limiting, and the true Scope of the invention is be used to design Such media. Following the approach defined by the claims that follow.

TABLE 1. The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Glucokinase glk GLC + ATP-> G6P - ADP Glucokinase glk bDGLC + ATP-> b)G6P + ADP Phosphoglucose pgi G6P -> F6P Phosphoglucose isomerase pgi bDG6P -> G6P Phosphoglucose isomerase pgi bDG6P - F6P Aldose 1-epimerase galM bDGLC <-> GLC Glucose-1-phophatase agp G1P -> GLC - PI Phosphofructokinase pfka. F6P - ATP - FDP - ADP Phosphofructokinase B pfkB F6P - ATP - FDP - ADP Fructose-1,6-bisphosphatase op FDP -> F6P - PI Fructose-1,6-bisphosphatate aldolase a. FDP -> T3P1 - T3P2 2 Triosphosphate Isomerase piA T3P1 -> T3P2 Methylglyoxal synthase mgSA T3P2 - MTHGXL - PI O Glyceraldehyde-3-phosphate dehydrogenase-A complex gap A T3P1 - P - NAD - NADH - 13PDG Glyceraldehyde-3-phosphate dehydrogenase-C complex gapC1C2 T3P1 - P - NAD - NADH - 13PDG 2 Phosphoglycerate kinase pgk 13PDG - ADP-> 3PG - ATP Phosphoglycerate mutase 1 gpmA 3PG -> 2PG Phosphoglycerate mutase 2 gpmB 3PG -> 2PG Enolase CO 2PG -> PEP Phosphoenolpyruvate synthase ppsA PYR - ATP-> PEP - AMP - PI Pyruvate Kinase II pykA PEP - ADP-> PYR - ATP Pyruvate Kinase I pykF PEP - ADP-> PYR - ATP Pyruvate dehydrogenase pdA, aceEF PYR + COA - NAD -> NADH + CO2 - ACCOA 3 Glucose-1-phosphate adenylyltransferase glgC ATP - G1P-> ADPGLC - PPI Glycogen synthase glgA ADPGLC -> ADP - GLYCOGEN Glycogen phosphorylase glgP GLYCOGEN - P -> G1P Maltodextrin phosphorylase malP GLYCOGEN - P -> G1P Glucose 6-phosphate-1-dehydrogenase Zwf G6P - NADP -> D6PGL - NADPH 6-Phosphogluconolactonase pg| D6PGL -> D6PGC O 6-Phosphogluconate dehydrogenase (decarboxylating) gnd D6PGC - NADP - NADPH-CO2 - RLSP Ribose-5-phosphate isomerase A rpiA RLSP -> RSP Ribose-5-phosphate isomerase B rpi B RLSP -> RSP Ribulose phosphate 3-epimerase rpe RLSP -> XSP Transketolase I tktA RSP - XSP -> T3P1 - S7P US 2002/0012939 A1 Jan. 31, 2002

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Transketolase II Transketolase I tktA Transketolase II Transaldolase B talB Phosphogluconate dehydratase edd 2-Keto-3-deoxy-6-phosphogluconate aldolase eda Citrate synthase gltA Aconitase A acna Aconitase B acn8 CIT -> ICT Isocitrate dehydrogenase iccA ICT - NADP &-> CO2 - NADPH - AKG 2-Ketoglutarate dehyrogenase sucAB, lpdA AKG - NAD+ COA-> CO2 - NADH -- SUCCOA Succinyl-CoA synthetase SucCD SUCCOA-ADP + P &-> ATP + COA - SUCC Succinate dehydrogenase sdhaECD SUCC - FAD -> FADH - FUM Fumurate reductase rdABCD FUM - FADH-> SUCC - FAD Fumarase A fumA FUM &-> MAL Fumarase B umEB FUM &-> MAL Fumarase C fumC FUM &-> MAL Malate dehydrogenase mdh MAL - NAD &-> NADH - OA D-Lactate dehydrogenase 1 PYR - NADH &-> NAD - LAC D-Lactate dehydrogenase 2 PYR - NADH &-> NAD - LAC Acetaldehyde dehydrogenase ACCOA - 2 NADH &-> ETH + 2 NAD+ COA Pyruvate formate 1 PYR + COA-> ACCOA - FOR Pyruvate formate lyase 2 PYR + COA-> ACCOA - FOR Formate hydrogen lyase dhF, hycBEFG FOR -> CO2 Phosphotransacetylase bta ACCOA-P &-> ACTP + COA Acetate kinase A ackA ACTP-ADP-> ATP - AC GAR transformylase T burT ACTP-ADP &-> ATP - AC Acetyl-CoA synthetase aCS ATP - AC - COA-> AMP - PPI - ACCOA Phosphoenolpyruvate carboxykinase pckA OA + ATP-> PEP + CO2 - ADP Phosphoenolpyruvate carboxylase ppc PEP - CO2 - OA - P1 Malic enzyme (NADP) maeB MAL - NADP-> CO2 - NADPH - PYR Malic enzyme (NAD) SfcA MAL - NAD -> CO2 - NADH - PYR Isocitrate lyase aceA ICT -> GLX -- SUCC Malate synthase A aceB ACCOA + GLX -> COA + MAL Malate synthase G gloB ACCOA + GLX -> COA + MAL Inorganic pyrophosphatase ppa PPI - 2 PI NAPIT dehydrogenase II indh NADH + O-> NAD+ OH2 NADH dehydrogenase I nuoABEFGHIJ NADH + O-> NAD+ OH2 + 3.5 HEXT Formate dehydrogenase-N finGHI FOR + O -> OH2 + CO2 + 2 HEXT 3 Formate dehydrogenase-O foIHG FOR + O -> OH2 + CO2 + 2 HEXT Formate dehydrogenase fchF FOR + O -> OH2 + CO2 + 2 HEXT Pyruvate oxidase poxB PYR + O -> AC + CO2 + OH2 Glycerol-3-phosphate dehydrogenase (aerobic) glpD GL3P+ O -> T3P2 + OH2 Glycerol-3-phosphate dehydrogenase (anaerobic) glpABC GL3P+ O -> T3P2 + OH2 Cytochrome oxidase bo3 cyoABCD, cyc OH2 + 5 O2 -> O + 2.5 HEXT Cytochrome oxidase bd cydABCD, app OH2 + 5 O2 -> O + 2 HEXT Succinate dehydrogenase complex sdhaECD FADH + O. <-> FAD + OH2 Thioredoxin reductase trixB OTHIO - NADPH-> NADP RTHIO Pyridine nucleotide transhydrogenase pntAB NADPH - NAD - NADP - NADH Pyridine nucleotide transhydrogenase pntAB NADP - NADH-2 HEXT-> NADPH - NAD Hydrogenase I hyaABC 2 O + 2 HEXT &-> 2 OH2 + H2 Hydrogenase 2 hybAC 2 O + 2 HEXT &-> 2 OH2 + H2 Hydrogenase 3 hycFGBE 2 O + 2 HEXT &-> 2 OH2 + H2 FOF1-ATPase atpABCDEFG ATP - ADP - P -- 4 HEXT Alpha-galactosidase (melibiase) melA MEL-> GLC - GLAC Galactokinase galK GLAC + ATP-> GAL1P - ADP Galactose-1-phosphate uridylyltransferase galT GAL1P - UDPG &-> G1 P -- UDPGAL UDP-glucose 4-epimerase galE UDPGAL -> UDPG UDP-glucose-1-phosphate uridylyltransferase gall J G1P - UTP -> UDPG - PPI Phosphoglucomutase G1P -> G6P Periplasmic beta-glucosidase precursor LCTS -> GLC - GLAC Beta-galactosidase (LACTase) lacz, LCTS -> GLC - GLAC trehalose-6-phosphate treC TRE6P -> b G6P - GLC Beta-fructofuranosidase SUC6P -> G6P - FRU 1-Phosphofructokinase (Fructose 1-phosphate kinase) fruk F1P - ATP-> FDP - ADP Xylose isomerase xylA FRU -> GLC Phosphomannomutase cpsG MAN6P -> MAN1P US 2002/0012939 A1 Jan. 31, 2002

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Mannose-6-phosphate isomerase man A MACN1P - F6P N-Acetylglucosamine-6-phosphate deacetylase nagA NAGP-> GA6P - AC Glucosamine-6-phosphate deaminase nag B GA6P - F6P - NH3 N-Acetylneuraminate lyase nan A SLA-> PYR - NAMAN L-Fucose isomerase fucI FUC -> FCL L-Fuculokinase fucK FCL - ATP-> FCL1P - ADP L-Fuculose phosphate aldolase fucA FCL1P &-> LACAL-T3P2 Lactaldehyde reductase fucC) LACAL - NADH &-> 12PPD - NAD Aldehyde dehydrogenase A aldA LACAL + NAD <-> LLAC + NADH Aldehyde dehydrogenase B aldB LACAL + NAD <-> LLAC + NADH Aldehyde dehydrogenase adhC LACAL + NAD <-> LLAC + NADH Aldehyde dehydrogenase adhC GLAL - NADH &-> GL - NADH Aldehyde dehydrogenase adhE LACAL + NAD -> LLAC + NADH Aldehyde dehydrogenase ald LACAL + NAD <-> LLAC + NADH Aldehyde dehydrogenase ald ACAL - NAD -> AC - NADH Gluconokinase I gntV GLCN - ATP -> D6PGC-ADP Gluconokinase II gntK GLCN - ATP -> D6PGC-ADP L-Rhamnose isomerase rhaA RMN &-> RML Rhamnulokinase rha RML - ATP-> RML1P - ADP Rhamnulose-1-phosphate aldolase rhaD RML1P &-> LACAL-T3P2 L-Arabinose isomerase ara A ARAB &-> RBL Arabinose-5-phospliate isomerase RLSP - ASP O L-Ribulokinase araB RBL + ATP-> RLSP + ADP L-Ribulose-phosphate 4-epimerase ara) RLSP -> XSP Xylose isomerase xylA XYL &-> XUL Xylulokinase xyl B XUL + ATP-> XSP + ADP Ribokinase rbsK RIB - ATP - RSP - ADP Mannitol-1-phosphate 5-dehydrogenase mtD MNT6P - NAD -> F6P - NADH Glucitol-6-phosphate dehydrogenase SrD GLT6P - NAD &-> F6P - NADH Galactitol-1-phosphate dehydrogenase gatD GLTL1P - NAD &-> TAG6P - NADH Phosphofructokinase B pfkB TAG6P + ATP-> TAG16P - ADP 1-Phosphofructokinase fruk TAG6P + ATP-> TAG16P - ADP Tagatose-6-phosphate kinase agaZ TAG6P + ATP-> TAG16P - ADP Tagatose-bisphosphate aldolase 2 gatY TAG16P -> T3P2 - T3P1 Tagatose-bisphosphate aldolase 1 agaY TAG16P -> T3P2 - T3P1 Glycerol kinase glpK GL - ATP-> GL3P - ADP Glycerol-3-phosphate-dehydrogenase-INAD(P)+ gpSA GL3P - NADP -> T3P2 - NADPH Phosphopentomutase deoB DR1P -> DRSP Phosphopentomutase deoB R1P - RSP Deoxyribose-phosphate aldolase deoC DRSP -> ACAL-T3P1 Asparate transaminase aspC OA + GLU &-> ASP + AKG Asparagine synthetase (Glutamate dependent) asnB ASP + ATP + GLN -> GLU - ASN - AMP - PPI Aspartate-ammonia asnA ASP + ATP - NH3 -> ASN - AMP - PPI Glutamate dehydrogenase gdha AKG - NH3 - NADPH &-> GLU - NADP Glutamate-ammonia ligase gln A GLU - NH3 + ATP-> GLN - ADP + PI Glutamate synthase gltBD AKG - GLN - NADPH-> NADP + 2 GLU 2 Alanine transaminase alaB PYR + GLU &-> AKG + ALA O Valine-pyruvate aminotransferase avtA OVAL - ALA-> PYR + VAL Alanine racemase, biosynthetic alr ALA <-> DALA Alanine racemase, catabolic dadX ALA -> DALA N-Acetylglutamate synthase argA GLU - ACCOA-> COA - NAGLU N-Acetylglutamate kinase argB NAGLU - ATP-> ADP - NAGLUYP N-Acetylglutamate phosphate reductase argC NAGLUYP - NADPH &-> NADP - P - NAGLUSAL Acetylornithine transaminase argD NAGLUSAL - GLU &-> AKG - NAARON Acetylornithine deacetylase argE NAARON -> AC - ORN Carbamoyl phosphate synthetase carAB GLN + 2 ATP + CO2 -> GLU - CAP - 2 ADP + PI 2 Ornfthine carbamoyl 1 argF ORN - CAP &-> CTR - PI 2 Ornithine carbamoyl transferase 2 arg ORN - CAP &-> CTR - PI Ornithine transaminase ygiGH ORN - AKG -> GLUGSAL - GLU 2 Argininosuccinate synthase argG CTR - ASP + ATP -> AMP - PPI ARGSUCC Argininosuccinate lyase argH ARGSUCC &-> FUM - ARG Arginine decarboxylase, biosynthetic speA ARG -> CO2 - AGM Arginine decarboxylase, degradative adi ARG -> CO2 - AGM speB AGM -> UREA - PTRC Ornithine decarboxylase, biosynthetic spec ORN - PTRC - CO2 Ornithine decarboxylase, degradative speF ORN - PTRC - CO2 Adenosylmethionine decarboxylase speD SAM &-> DSAM - CO2 US 2002/0012939 A1 Jan. 31, 2002 10

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Spermidine synthase speE PTRC - DSAM-> SPMD - SMTA Methylthioadenosine nucleosidase SMTA-> AD + 5MTR 5-Methylthioribose kinase SMTR - ATP-> SMTRP - ADP 5-Methylthioribose-1-phosphate isomerase SMTRP -> SMTR1P E-1 (Enolase-phosphatase) SMTR1P -> DKMPP E-3 (Unknown) DKMPP -> FOR - KMB Transamination (Unknown) KMB - GLN -> GLU - MET Y-Glutamyl kinase proB GLU + ATP-> ADP + GLUP Glutamate-5-semialdehyde dehydrogenase proA GLUP - NADPH-> NADP - P -- GLUGSAL N-Acetylornithine deacetylase argE NAGLUSAL -> GLUGSAL - AC Pyrroline-5-carboxylate reductase proC GLUGSAL - NADPH-> PRO - NADP Threonine dehydratase, biosynthetic ilvA THR-> NH3 - OBUT Threonine dehydratase, catabolic tdcB THR-> NH3 - OBUT Acetohydroxybutanoate synthase I w8N OBUT - PYR-> ABUT - CO2 Acetohydroxybutanoate synthase II ilvG(12)M OBUT - PYR-> ABUT - CO2 3 Acetohydroxybutanoate synthase III w OBUT - PYR-> ABUT - CO2 Acetohydroxy Acid isomeroreductase wC ABUT - NADPH-> NADP - DHMVA Dihydroxy acid dehydratase ilv) DHMVA-> OMVAL Branched chain amino acid aminotransferase w OMVAL - GLU &-> AKG - ILE Acetolactate synthase I w8N 2 PYR-> CO2 - ACLAC Acetolactate synthase II ilvG(12)M 2 PYR-> CO2 - ACLAC Acetolactate synthase III w 2 PYR-> CO2 - ACLAC Acetohydroxy acid isomeroreductase vC ACLAC - NADPH-> NADP+ DHVAL Dihydroxy acid dehydratase ilv) DHVAL-> OVAL Branched chain amino acid aminotransferase w OVAL - GLU -> AKG - VAL Valine-pyruvate aminotransferase avtA OIVAL + ALA-> PYR + VAL Isopropylmalate synthase leuA ACCOA- OLVAL -> COA - CBHCAP Isopropylmalate isomerase leuCD CBHCAP -> IPPMAL 3-Isopropylmalate dehydrogenase leu3 PPMAL - NAD -> NADH -- OCAP + CO2 Branched chain amino acid aminotransferase w OCAP - GLU -> AKG - LEU Aromatic amino acid transaminase tyrB OCAP - GLU -> AKG - LEU 2-Dehydro-3-deoxyphosphoheptonate aldolase F aroF E4P - PEP-> PI-3DDAH7P 2-Dehydro-3-deoxyphosphoheptonate aldolase G aroG E4P - PEP-> PI-3DDAH7P 2-Dehydro-3-deoxyphosphoheptonate aldolase H aroH E4P - PEP-> PI-3DDAH7P 3-Dehydroquinate synthase aroB 3DDAH7P -> DOT + PI 3-Dehydroquinate dehydratase aroD DOT &-> DHSK Shikimate dehydrogenase aroE DHSK- NADPH &-> SME - NADP Shikimate kinase I aroK SME - ATP-> ADP - SMESP Shikimate kinase II aroL SME - ATP-> ADP - SMESP 3-Phosphoshikimate-1-carboxyvinyltransferase aroA SMESP - PEP -> 3PSME - PI Chorismate synthase aroC 3PSME - P - CHOR Chorismate mutase 1 phea CHOR-> PHEN Prephenate dehydratase phea PHEN -> CO2 + PHPYR Aromatic amino acid transaminase yrB PHPYR GLU - AKG – PHE Chorismate mutase 2 yr A CHOR-> PHEN Prephanate dehydrogenase yr A PHEN - NAD - HPHPYR - CO2 - NADH Aromatic amino acid transaminase yrB HPHPYR GLU - AKG - TYR Asparate transaminase aspC HPHPYR GLU - AKG - TYR Anthranilate synthase CHOR - GLN -> GLU - PYR - AN Anthranilate synthase component II AN - PRPP - PPI - NPRAN Phosphoribosyl anthranilate isomerase NPRAN - CPADSP Indoleglycerol phosphate synthase CPAD5P -> CO2 - GP Tryptophan synthase GP - SER -> T3P1 - TRP Pliosphoribosyl pyrophosphate synthase RSP - ATP - PRPP - AMP ATP phosphoribosyltransferase PRPP - ATP-> PPI - PRBATP Phosphoribosyl-ATP pyrophosphatase PRBATP-> PPI - PRBAMP Phosphoribosyl-AMP cyclohydrolase PRBAMP-> PRFP Phosphoribosylformimino-5-amino-1-phos PRFP - PRLP phonbosyl-4-imidazole c Imidazoleglycerol phosphate synthase PRLP - GLN -> GLU - AICAR - DIMGP Imidazoleglycerol phosphate dehydratase DIMGP - IMACP L-Histidinol phosphate aminotransferase IMACP - GLU -> AKG - HISOLP Histidinol phosphatase HISOLP - P - HISOL Histidinol dehydrogenase HISOL - 3 NAD-> HIS - 3 NADH 3 -Phosphoglycerate dehydrogenase 3PG - NAD - NADH PHP Phosphoserine transaminase PHP - GLU - AKG - 3PSER Phosphoserine phosphatase 3PSER - P - SER US 2002/0012939 A1 Jan. 31, 2002 11

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Glycine hydroxymethyltransferase glyA THF - SER -> GLY - METTHF Threonine dehydrogenase todh THR- COA-> GLY - ACCOA Amino ketobutyrate CoA ligase kb THR- COA-> GLY - ACCOA Sulfate adenylyltransferase cysDN SLF - ATP - GTP-> PPI APS - GDP - PI Adenylylsulfate kinase cysC APS - ATP-> ADP - PAPS 3'-Phospho-adenylylsulfate reductase cysH PAPS - RTHO-> OTHIO - H2SO3 - PAP Suffite reductase cysIJ H2SO3 - 3NADPH-> H2S - 3 NADP Serine transacetylase cysE SER - ACCOA &-> COA - ASER O-Acetylserine (thiol)-lyase A cysK ASER + H2S -> AC - CYS O-Acetylserine (thiol)-lyase B cysM ASER + H2S -> AC - CYS 3'-5' Bisphosphate nucleotidase PAP -> AMP - PI Aspartate kinase I thrA ASP + ATP &-> ADP + BASP Aspartate kinase II metL ASP + ATP &-> ADP + BASP Aspartate kinase III lysO ASP + ATP &-> ADP + BASP Aspartate semialdehyde dehydrogenase asd BASP - NADPH &-> NADP - P - ASPSA Homoserine dehydrogenase I thrA ASPSA - NADPH &-> NADP+ HSER Homoserine dehydrogenase II metL ASPSA - NADPH &-> NADP+ HSER Homoserine kinase thr3 HSER - ATP -> ADP + PHSER Threonine synthase thrC PHSER - P - THR Dihydrodipicolinate synthase dap A ASPSA - PYR-> D23PIC Dihydrodipicolinate reductase dapB D23PIC - NADPH-> NADP - PIP26DX Tetrahydrodipicolinate succinylase dapD PIP26DX -- SUCCOA-> COA - NS2A6O Succinyl diaminopimelate aminotransferase dapC NS2A6O - GLU &-> AKG - NS26DP Succinyl diaminopimelate desuccinylase dapF NS26DP-> SUCC - D26PIM Diaminopimelate epimerase dapF D26PM -> MDAP Diaminopimelate decarboxylase lySA MDAP -> CO2 - LYS Lysine decarboxylase 1 cadA LYS -> CO2 - CADV Lysine decarboxylase 2 ldcC LYS -> CO2 - CADV Homoserine transSuccinylase metA HSER - SUCCOA-> COA - OSLHSER O-succinly homoserine lyase metB OSLEHSER - CYS -> SUCC - LLCT Cystathionine-f-lyase metC LLCT-> HCYS - PYR - NH3 Adenosyl homocysteinase (Unknown) Unknown HCYS - ADN &-> SAH Cobalamin-dependent methionine synthase metH HCYS - MTHF -> MET THF Cobalamin-independent methionine synthase met HCYS - MTHF -> MET THF 5-Adenosylmethionine synthetase metK MET - ATP - PPI - P - SAM D-Amino acid dehydrogenase dadA DALA - FAD -> FADH - PYR - NH3 Putrescine transaminase pat PTRC - AKG -> GABAL - GLU Amino oxidase tynA PTRC -> GABAL - NH3 Aminobutyraldehyde dehydrogenase prr GABAL - NAD -> GABA - NADH Aldehyde dehydrogenase ald GABAL - NAD -> GABA - NADH Aminobutyrate aminotransaminase 1 gabT GABA-AKG -> SUCCSAL - GLU Aminobutyrate aminotransaminase 2 goaG GABA-AKG -> SUCCSAL - GLU Succinate semialdehyde dehydrogenase-NAD sad SUCCSAL - NAD -> SUCC - NADH Succinate semialdehyde dehydrogenase-NADP gabl) SUCCSAL - NADP-> SUCC - NADPH Asparininase I anSA ASN -> ASP + NH3 Asparininase II ansB ASN -> ASP + NH3 Aspartate ainmonia-lyase aspA ASP-> FUM - NH3 Tryptophanase tnaA CYS -> PYR - NH3 - H2S L-Cysteine desulfhydrase CYS-> PYR - NH3 - H2S Glutamate decarboxylase A gadA GLU -> GABA + CO2 Glutamate decarboxylase B gadB GLU -> GABA + CO2 A GLN -> GLU - NH3 O Glutaminase B GLN -> GLU - NH3 Proline dehydrogenase putA PRO - FAD -> FADH -- GLUGSAL Pyrroline-5-carboxylate dehydrogenase putA GLUGSAL - NAD -> NADH -- GLU Serine deaminase 1 sdaA SER -> PYR - NH3 Serine deaminase 2 sdaB SER -> PYR - NH3 Trypothanase tnaA SER -> PYR - NH3 D-Serine deaminase dsdA DSER-> PYR - NH3 Threonine dehydrogenase todh THR - NAD -> 2A3O - NADH Amino ketobutyrate ligase kb 2A3O + COA-> ACCOA - GLY Threonine dehydratase catabolic tdcB THR-> OBUT - NH3 Threonine deaminase 1 sdaA THR-> OBUT - NH3 Threonine deaminase 2 sdaB THR-> OBUT - NH3 Tryptophanase tnaA TRP -> INDOLE - PYR - NH3 Amidophosphoribosyl transferase purF PRPP - GLN - PPI - GLU - PRAM Phosphoribosylamine-glycine ligase purD PRAM + ATP + GLY &-> ADP - P -- GAR US 2002/0012939 A1 Jan. 31, 2002 12

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Phosphoribosylglycinamide formyltransferase burn GAR - FTHF - THF - FGAR GAR transformylase T burT GAR - FOR - ATP-> ADP - P - FGAR Phosphoribosylformylglycinamide synthetase burL FGAR + ATP + GLN -> GLU - ADP - P - FGAM Phosphoribosylformylglycinamide cyclo-ligase bury FGAM+ ATP-> ADP - P - AIR Phosphoribosylaminoimidazole carboxylase 1 burk AIR + CO2 + ATP &-> NCAR - ADP + PI Phosphoribosylaminoimidazole carboxylase 2 burE NCAIR &-> CAIR Phosphoribosylaminoimidazole-succinocarboxamide burC CAR + ATP - ASP &-> ADP + P - SAICAR synthetase 5'-Phosphoribosyl-4-(N-succinocarboxamide)-5- bur SAICAR &-> FUM - AICAR aminoimidazole lya AICAR transformylase bur AICAR - FTHF -> THF - PRFICA IMP cyclohydrolase bur PRFICA - IMP Adenylosuccinate synthetase purA IMP - GTP - ASP-> GDP - P - ASUC Adenylosuccinate lyase bur ASUC &-> FUM-AMP IMP dehydrogenase guaB IMP - NAD - NADH - XMP GMP synthase guaA XMP - ATP - GLN - GLU - AMP - PPI - GMP GMP reductase guaG GMP - NADPH-> NADP - IMY - NH3 Aspartate-carbamoyltransferase pyrBI CAP - ASP-> CAASP + PI pyrC CAASP &-> DOROA Dihydroorotate dehydrogenase pyrD DOROA + O. <-> OH2 + OROA Orotate phosphoribosyl transferase pyrE OROA - PRPP - PPI - OMP OMP decarboxylase pyrF OMP -> CO2 - UMP CTP synthetase pyrC UTP - GLN - ATP-> GLU - CTP - ADP - PI Adenylate kinase ATP + AMP &-> 2 ADP Adenylate kinase GTP AMP -> ADP - GDP Adenylate kinase ITP AMP-> ADP IDP Adenylate kinase DAMP + ATP &-> ADP + DADP Guanylate kinase GMP - ATP - GDP - ADP Deoxyguanylate kinase DGMP - ATP -> DGDP - ADP Nucleoside-diphosphate kinase GDP - ATP -> GTP - ADP Nucleoside-diphosphate kinase UDP - ATP -> UTP - ADP Nucleoside-diphosphate kinase CDP - ATP -> CTP - ADP Nucleoside-diphosphate kinase DGDP - ATP -> DGTP - ADP Nucleoside-diphosphate kinase DUDP - ATP -> DUTP - ADP Nucleoside-diphosphate kinase DCDP - ATP -> DCTP - ADP Nucleoside-diphosphate kinase DADP + ATP &-> DATP + ADP Nucleoside-diphosphate kinase DTDP - ATP -> DTTP - ADP AMP Nucleosidse AMP-> AD + RSP ADN -> INS - NH3 Deoxyadenosine deaminase DA-> DIN - NH3 Adenine deaminase yalic AD -> NH3 - HYXN Inosine kinase INS - ATP - IMP - ADP Guanosine kinase S.S GSN - ATP -> GMP - ADP Adenosine kinase ADN + ATP-> AMP - ADP Adenine phosphotyltransferase AD - PRPP - PPI AMP Xanthine-guanine phosphoribosyltransferase XAN - PRPP-> XMP - PPI Xanthine-guanine phosphoribosyltransferase HYXN - PRPP-> PPI - IMP Hypoxanthine phosphoribosyltransferaseS HYXN - PRPP-> PPI - IMP Xanthine-guanine pliosphoribosyltransferase GN - PRPP-> PPI - GMP Hypoxanthine phosphoribosyltransferase GN - PRPP-> PPI - GMP Xanthosine phosphorylase DIN - P -> HYXN - DR1P Purine nucleotide phosphorylase DLN - P -> HYXN - DR1P Xanthosine phosphorylase DA - P &-> AD - DR1P Purine nucleotide phosphorylase DA - P &-> AD - DR1P Xanthosine phosphorylase DG - P &-> GN - DR1P Purine nucleotide phosphorylase DG - P &-> GN - DR1P Xanthosine phosphorylase HYXN - R1P -> INS - PI Purine nucleotide phosphorylase HYXN - R1P -> INS - PI Xanthosine phosphorylase AD - R1P &-> PI - ADN Purine nucleotide phosphorylase AD - R1P &-> PI - ADN Xanthosine phosphorylase GN - R1P &-> P -- GSN Purine nucleotide phosphorylase GN - R1P &-> P -- GSN Xanthosine phosphorylase XAN - R1P &-> PI - XTSN Purine nucleotide phosphorylase XAN - R1P &-> PI - XTSN Uridine phosphorylase URI - P &-> URA - R1P Thymidine (deoxyuridine) phosphorylase DU - P &-> URA - DR1P Purine nucleotide phosphorylase DU - P &-> URA - DR1P US 2002/0012939 A1 Jan. 31, 2002 13

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Thymidine (deoxyuridine) phosphorylase deoA DT - P -> THY - DR1P Cytidylate kinase cmkA DCMP - ATP - ADP-DCDP Cytidylate kinase cmkA CMP - ATP -> ADP - CDP Cytidylate kinase cmkB DCMP - ATP - ADP-DCDP Cytidylate kinase cmkB CMP - ATP -> ADP - CDP Cytidylate kinase cmkA UMP - ATP - ADP - UDP Cytidylate kinase cmkB UMP - ATP - ADP - UDP dTMP kinase timk DTMP - ATP - ADP-DTDP Uridylate kinase pyrH UMP - ATP -> UDP - ADP Uridylate kinase pyrH DUMP - ATP -> DUDP - ADP Thymidine (deoxyuridine) kinase tok DU - ATP -> DUMP - ADP Uracil phosphoribosyltransferase upp URA - PRPP -> UMP - PPI Cytosine deaminase codA CYTS -> URA - NH3 Uridine kinase udk URI - GTP-> GDP - UMP Cytodine kinase udk CYTD - GTP - GDP - CMP CMP glycosylase CMP -> CYTS - RSP cold CYTD -> URI - NH3 Thymidine (deoxyniridine) kinase tok DT - ATP-> ADP-DTMP dCTP deaminase dcd DCTP -> DUTP - NH3 Cytidine deaminase cold DC - NH3 - DU 5'-Nucleotidase ushA DUMP -> DU - PI 5'-Nucleotidase ushA DTMP -> DT - PI 5'-Nucleotidase ushA DAMP -> DA - PI 5'-Nucleotidase ushA DGMP -> DG - PI 5'-Nucleotidase ushA DCMP -> DC - PI 5'-Nucleotidase ushA CMP - CYTD PI 5'-Nucleotidase ushA AMP-> PI - ADN 5'-Nucleotidase ushA GMP - P - GSN 5'-Nucleotidase ushA IMP - P - INS 5'-Nucleotidase ushA XMP - PI - XTSN 5'-Nucleotidase UMP -> PT - URI Ribonucleoside-diphosphate reductase nrdAB ADP - RTHIO -> DADP - OTHIO Ribonucleoside-diphosphate reductase nrdAB GDP - RTHIO -> DGDP - OTHIO : Ribonucleoside-triphosphate reductase nrdD ATP - RTHIO -> DATP - OTHIO Ribonucleoside-triphosphate reductase nrdD GTP - RTHIO -> DGTP - OTHIO Ribonucleoside-diphosphate reductase nrdAB CDP - RTHO -> DCDP - OTHIO Ribonucleoside-diphosphate reductase II nirdEF CDP - RTHO -> DCDP - OTHIO Ribonucleoside-diphosphate reductase nrdAB UDP - RTHIO -> DUDP - OTHIO Ribonucleoside-triphosphate reductase nrdD CTP - RTHIO -> DCTP - OTHO Ribonucleoside-triphosphate reductase nrdD UTP - RTHIO -> OTHO - DUTP dUTP pyrophosphatase dut DUTP - PPI - DUMP Thymidilate synthetase thy A DUMP - METTHF -> DHF - DTMP Nucleoside triphosphatase mutT GTP-> GSN - 3 PI Nucleoside triphosphatase mutT DGTP -> DG - 3 PI Deoxyguanosinetriphosphate triphophohydrolase dgt DGTP -> DG - 3 PI Deoxyguanosinetriphosphate triphophohydrolase dgt GTP-> GSN - 3 PI Glycine cleavage system (Multi-component system) gcvHTP, IpdA GLY - THF - NAD - METTHF - NADH CO2 - NH3 4 Formyl tetrahydrofolate deformylase burU FTHF - FOR THF Methylene tetrahydrofolate reductase metF METTHF - NADH - NAD - MTHF Methylene THF dehydrogenase oD METTHF - NADP - METHF - NADPH Methenyl tetrahydrofolate cyclehydrolase oD METHE -> FTHF Acetyl-CoA carboxyltransferase accABD ACCOA-ATP + CO2 &-> MALCOA-ADP + PI Malonyl-CoA-ACP transacylase abD MALCOA - ACP &-> MALACP + COA Malonyl-ACP decarboxylase adB MALACP -> ACACP + CO2 Acetyl-CoA-ACP transacylase ab ACACP + COA &-> ACCOA - ACP Acyltransferase pis GL3P - O.O35 C140ACP - 0.102 C141ACP - 0.717 C16OAC O CDP-Diacylglycerol synthetase casA PA - CTP -> CDPDG - PPI CDP-Diacylglycerol pyrophosphatase cdh CDPDG - CMP - PA Phosphatidylserine synthase pSSA CDPDG - SER - CMP - PS Phosphatidylserine decarboxylase psd PS-> PE - CO2 Phosphatidylglycerol phosphate synthase pgSA CDPDG - GL3P -> CMP - PGP Phosphatidylglycerol phosphate phosphatase A pgpA PGP - P - PG Phosphatidylglycerol phosphate phosphatase B pgpB PGP - P - PG Cardiolipin synthase cls 2 PG &-> CL - GL Acetyl-CoA C-acetyltransferase atoB 2 ACCOA <-> COA + AACCOA Isoprenyl-pyrophosphate synthesis pathway T3P1 - PYR - 2 NADPH - ATP-> IPPP - ADP - 2 NADP- O Isoprenyl pyrophosphate isomerase IPPP - DMPP US 2002/0012939 A1 Jan. 31, 2002 14

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Farnesyl pyrophosphate synthetase ispA DMPP IPPP - GPP PPI Geranyltranstransferase ispA GPP IPPP - FPP PPI Octoprenyl pyrophosphate synthase (5 reactions) ispB 5 IPPP FPP - OPP-5 PPI Undecaprenyl pyrophosphate synthase (8 reactions) 8 IPPP FPP - UDPP-8 PPI Chorismate pyruvate-lyase ubiC CHOR-> 4HBZ - PYR Hydroxybenzoate octaprenyltransferase ubiA 4HBZ - OPP -> O4HBZ - PPI Octaprenyl-hydroxybeuzoate decarboxylase ubiD, ubiX O4HBZ-> CO2 + 2COPPP 2-Octaprenylphenol hydroxylase ubiB 2OPPP - O2 - 206H Methylation reaction 2O6H - SAM -> 2OPMP - SAH 2-Octaprenyl-6-methoxyphenol hydroxylase ubi 2OPMP - O2 -> 2OPME3 2-Octaprenyl-6-methoxy-1,4-benzoquinone methylase ubi 2OPMB - SAM-> 2OPMMB - SAH O 2-Octaprenyl-3-methyl-6-methoxy-1,4- ubiF 2OPMMB - O2 -> 2OMHMB benzoquinone hydroxylase 3-Dimethylubiquinone 3-methyltransferase ubiG 2OMHMB + SAM -> OH2 + SAH Isochorismate synthase 1 menF CHOR-> ICHOR C-Ketoglutarate decarboxylase menD AKG - TPP-> SSALTPP - CO2 SHCHC synthase menD CHOR - SSALTPP-> PYR - TPP - SHCHC O-Succinylbenzoate-CoA synthase menC SHCHC -> OSB O-Succinylbenzoic acid-CoA ligase menE OSB + ATP + COA-> OSBCOA-AMP - PPI Naphthoate synthase menB OSBCOA-> DHNA + COA 1,4-Dihydroxy-2-naphthoate octaprenyltransferase menA DHNA. OPP - DMK - PPI - CO2 S-Adenosylmethionine-2-DMK methyltransferase menG DMK - SAM-> MK-- SAH Isochorismate synthase 2 entC CHOR-> ICHOR Isochorismatase entB CHOR - 23DHDHB - PYR 2,3-Dihydo-2,3-dihydroxybenzoate dehydrogenase entA 23DHDHB - NAD &-> 23DHB - NADH ATP-dependent activation of 2,3-dihydroxybenzoate ent 23DHB - ATP - 23DHBA - PPI ATP-dependent serine activating enzyme ent SER - ATP &-> SERA - PPI Enterochelin synthetase ent) 3 SERA+ 3 23DHBA-> ENTER - 6 AMP GTP cyclohydrolase II ribA GTP-> D6RPSP - FOR - PPI Pryimidine deaminase ribD D6RPSP -> A6RPSP - NH3 Pyrimidine reductase ribD A6RPSP - NADPH-> A6RPSP2 - NADP Pyrimidine phosphatase A6RPSP2 -> A6RP - PI 3,4 Dihydroxy-2-butanone-4-phosphate synthase rib8 RLSP-> DB4P - FOR 6,7-Dimethyl-8-ribityllumazine synthase ribe DB4P - A6RP -> D8RL - PI Riboflavin synthase rib 2 D8RL-> RBFLV - A6RP Riboflavin kinase rib RIBFLV-ATP-> FMN - ADP FAD synthetase rib FMN - ATP - FAD - PPI GTP cyclohydrolase I oE GTP-> FOR - AHTD Dihydroneopterin triphosphate pyrophosphorylase intpa AHTD -> PPI - DHPP Nucleoside triphosphatase mutT AHTD -> DHP - 3 PI Dihydroneopterin monophosphate dephosphorylase DHPP -> DHP - PI Dihydroneopterin aldolase olB DHP-> AHHMP - GLAL 6-Hydroxymethyl-7.8 dihydropterin pyrophosphokinase olK AHHMP - ATP-> AMP-AHHMD Aminodeoxychorismate synthase pabAB CHOR - GLN -> ADCHOR - GLU Aminodeoxychorismate lyase babC ADCHOR-> PYR - PABA Dihydropteroate synthase oIP PABA-AHHMD -> PPI - DHPT Dihydrofolate synthetase olC DHPT - ATP - GLU - ADP - P - DHF Dihydrofolate reductase olA DHF - NADPH-> NADP - THF Ketopentoate hydroxymethyl transferase ban OVAL - METTHF-> AKP - THF Ketopantoate reductase ban AKP - NADPH-> NADP - PANT AcetohyOxyacid isomeroreductase wC AKP - NADPH-> NADP - PANT Aspartate decarboxylase banD ASP-> CO2 + b ALA Pantoate-f-alanine ligase banC PANT - baA - ATP-> AMP - PPI - PNTO Pantothenate kinase coaA PNTO -- ATP-> ADP - 4PPNTO Phosphopantothenate-cysteine ligase 4PPNTO - CTP - CYS -> CMP - PPI - 4PPNCYS Phosphopantothenate-cysteine decarboxylase 4PPNCYS -> CO2 - 4PPNTE Phospho-pantethiene adenylyltransferase 4PPNTE - ATP -> PPI - DPCOA DephosphoCoA kinase DPCOA + ATP-> ADP + COA ACP Synthase acpS COA-> PAP - ACP Aspartate oxidase madB ASP - FAD -> FADH - ISUCC Quinolate synthase madA ISUCC + T3P2 -> PI + OA Quinolate phosphoribosyl transferase madC OA + PRPP-> NAMN + CO2 + PPI NAMN adenylyl transferase madD NAMN + ATP-> PPI - NAAD NAMN adenylyl transferase madD NMN - ATP-> NAD - PPI Deanido-NAD ammonia ligase madE NAAD + ATP + NH3 -> NAD+. AMP - PPI NAD kinase madEG NAD+ ATP-> NADP+ ADP NADP phosphatase NADP-> NAD - PI US 2002/0012939 A1 Jan. 31, 2002 15

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome DNA ligase ig NAD -> NMN - AMP NMN bncC NMN - NAMN - NH3 NMN glycohydrolase (cytoplasmic) NMN -> RSP - NAn O NAm amidohydrolase pncA NAm -> NAC - NH3 NAPRTase bncB NAC - PRPP - ATP - NAMN - PPI - P - ADP NMD pyrophosphatase bnuE NADxt -> NMNxt - AMPxt NMN permease bnuC NMNxt - NMN NMN glycohydrolase (membrane bound) NMNxt -> RSP + NAm O Nicotinic acid uptake NACxt -> NAC GSA synthetase hemM GLU - ATP -> GTRNA-AMP - PPI Glutamyl-tRNA synthetase gltX GLU - ATP -> GTRNA-AMP - PPI Glutamyl-tRNA reductase hemA GTRNA + NM)PI1 -> GSA + NADP Glutamate-1-semialdehyde aminotransferase hemL GSA -> ALAV Porphobilinogen synthase hem3 8 ALAV -> 4PBG Hydroxymethylbilane synthase hemC 4PBG - HMB - 4 NH3 Uroporphyrinogen III synthase hemD HMB -> UPRG Uroporphyrin-III C-methyltransferase 1 hemX SAM - UPRG -> SAH - PC2 Uroporphyrin-Ill C-methyltransferase 2 cysG SAM - UPRG-> SAH - PC2 1,3-Dimethyluroporphyrinogen III dehydrogenase cysG PC2 - NAD -> NADH - SHCL Siroheme ferrochelatase cysG SHCL - SHEME Uroporphyrinogen decarboxylase hem UPRG - 4 CO2 - CPP Coproporphyrinogen Oxidase, aerobic hemF O2 - CPP -> 2 CO2 - PPHG Protoporphyrinogen oxidase hemCi O2 - PPHG -> PPIX : Ferrochelatase hem PPDX - PTH Heme O synthase cyoE PTH - FPP -> HO - PPI 8-Amino-7-oxononanoate synthase bioF ALA + CHCOA <-> CO2 + COA + AONA Adenosylmethionine-8-amino-7-Oxononanoate bioA SAM - AONA &-> SAMOB - DANNA aminotransferase Dethiobiotin synthase bioD CO2 - DANNA - ATP &-> DTB - P - ADP Biotin synthase bio DTB - CYS -> BT Glutamate-cysteine ligase gshA CYS - GLU - ATP-> GC - P - ADP Glutathione synthase gshB GLY - GC + ATP-> RGT - P - ADP Glutathione reductase gOr NADPH- OGT - NADP - RGT thiC protein hiC AIR-> AHM HMP kinase hN AHIM + ATP-> AHMP - ADP HMP-phosphate kinase hiD AHMP - ATP-> AHMPP - ADP Hypothetical T3P1 - PYR-> DTP thiG protein hiG DTP - TYR - CYS -> THZ - HBA- CO2 thiE protein hE DTP - TYR - CYS -> THZ - HBA- CO2 thiF protein hE DTP - TYR - CYS -> THZ - HBA- CO2 thiH protein h DTP - TYR - CYS -> THZ - HBA- CO2 THZ kinase hiM THZ + ATP-> THZP - ADP Thiamin phosphate synthase his THZP - AHMPP - THMP - PPI Thiamin kinase hik THMP - ADP -> THAMIN - ATP Thiamin phosphate kinase hiL THMP - ATP -> TPP - ADP Erythrose 4-phosphate dehydrogenase E4P - NAD &-> ER4P - NADH Erythronate-4-phosphate dehydrogenase ER4P - NAD &-> OHB - NADH Hypothetical transaminase/phosphoserine transaminase S OHB -- GLU &-> PHT-AKG Pyridoxal-phosphate biosynthetic proteins pdx.J-pdxA PHT - DXSP-> PSP + CO2 Pyridoxine 5'-phosphate oxidase PSP - O2 -> PLSP - H2O2 Threonine synthase PHT-> 4HLT - PI Hypothetical Enzyme 4HLT-> PYRDX Pyridoxine kinase PYRDX - ATP-> PSP - ADP Hypothetical Enzyme P5P - PYRDX. PI O Hypothetical Enzyme PLSP - PL - PI Pyridoxine kinase PL - ATP-> PLSP-ADP Pyridoxine 5'-phosphate oxidase PYRDX - O2 -> PL - H2O2 Pyridoxine 5'-phosphate oxidase PL - O2 - NH3 &-> PDLA + H2O2 Pyridoxine kinase PDLA - ATP - PDLASP - ADP Hypothetical Enzyme PDLASP-> PDLA - PI Pyridoxine 5'-phosphate oxidase bdxH PDLASP - O2 - PLSP - H2O2 - NH3 Serine hydroxymethyltransferase (serine methylase) glyA PLSP - GLU -> PDLASP - AKG Serine hydroxymethyltransferase (serile methylase) glyA PLSP - ALA-> PDLASP - PYR Glutamine fm.ctose-6-phosphate Transaminase glimS F6P -- GLN -> GLU - GA6P Phosphoglucosamine mutase glmM GA6P &-> GA1P N-Acetylglucosamine-1-phosphate-uridyltransferase glm U UTP+ GAIP-- ACCOA-> UDPNAG + PPI + COA UDP-N-acetylglucosamine acyltransferase lpXA C140ACP - UDPNAG -> ACP - UDPG2AA US 2002/0012939 A1 Jan. 31, 2002 16

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome UDP-3-O-acyl-N-acetylglucosamine deacetylase pXC UDPG2AA -> UDPG2A - AC UDP-3-O-(3-hydroxymyristoyl)glucosamine- pxD UDPG2A - C140ACP -> ACP - UDPG23A acyltransferase UDP-sugar hydrolase ushA UDPG23A -> UMP - LIPX Lipid A disaccharide synthase pxB LIPX UDPG23A -> UDP - DSAC1P Tetraacyldisaccharide 4" kinase DISAC1P - ATP-> ADP - LIPIV O 3-Deoxy-D-manno-octulosonic-acid transferase kdtA LIPIV - CMPKDO - KDOLIPIV - CMP (KDO transferase) 3-Deoxy-D-manno-octulosonic-acid transferase kdtA KDOLIPIV - CMPKDO - K2LPIV - CMP (KDO transferase) Endotoxin synthase htrB, msbB K2LPIV - C140ACP - C12OACP-> LIPA - 2 ACP 2 3-Deoxy-D-manno-octulosonic-acid 8-phosphate kdSA PEP - ASP - KDOP - PI synthase 3-Deoxy-D-manno-octulosonic-acid 8-phosphate KDOP - KDO - PI O phosphatase CMP-2-keto-3-deoxyoctonate synthesis kdsB KDO - CTP -> PPI - CMPKDO ADP-L-glycero-D-mannoheptose-6-epimerase pcA, rfaED S7P - ATP-> ADPHEP - PPI UDPglucose-1-phosphate uridylyltransferase gall J, galF G1P - UTP -> PPI - UDPG 2 Ethanolamine phosphotransferase PE - CMP-> CDPETN - DGR O Phosphatidate phosphatase PA-> PI - DGR O Diacylglycerol kinase dgkA DGR - ATP-> ADP + PA LPS Synthesis - truncated version of LPS (ref neid) rfaLIGFC LIPA - 3 ADPHEP - 2 UDPG - 2 CDPETN - 3 CMPKDO -> 6 UDP-N-acetylglucosamine-enolpyruvate transferase murA UDPNAG. PEP -> UDPNAGEP - PI UDP-N-acetylglucosamine-enolpyruvate dehydrogenase murB UDPNAGEP - NADPH-> UDPNAM - NADP 1 UDP-N-acetylmuramate-alanine ligase murO UDPNAM - ALA - ATP-> ADP - P -- UDPNAMA UDP-N-acetylmuramoylalanine-D-glutamate ligase mur) UDPNAMA - DGLU - ATP -> UDPNAMAG - ADP - PI UDP-N-acetylmuramoylalanyl-D-glutamate 2,6-diamino- murE UDPNAMAG + ATP + MDAP -> UNAGD - ADP + PI pimelate lig D-Alanine-D-alanine adding enzyme murF UNAGD + ATP + AA-> UNAGDA - ADP + PI Glutamate racemase mur GLU &-> DGLU D-ala:D-ala ddlAB 2 DALA <-> AA 2 Phospho-N-acetylmuramoylpentapeptide transferase mraY UNAGDA -> UMP - P - UNPTDO N-Acetylglucosaminyl transferase murg UNPTDO - UDPNAG -> UDP - PEPTIDO Arabinose (low affinity) araB. ARABxt - HEXT &-> ARAB Arabinose (high affinity) arafGH ARABxt + ATP-> ARAB - ADP + PI 3 Dihydroxyacetone DHAxt - PEP-> T3P2 - PYR O Fructose fruABF FRUxt - PEP-> F1P - PYR 2 Fucose fucP FUCxt - HEXT - FUC Galacitol gatABC GLTLxt - PEP-> GLTL1B - PYR 3 Galactose (low affinity) galP GLACxt -- HEXT-> GLAC Galactose (low affinity) galP GLCxt -- HEXT-> GLC Galactose (high affinity) mg|ABC GLACxt + ATP-> GLAC-ADP + PI 3 Glucitol SrA1A2B GLTxt - PEP-> GLT6P - PYR 3 Gluconate gntST GLCNxt + ATP-> GLCN - ADP + PT Glucose ptsG, crr GLCxt - PEP-> G6P - PYR 2 Glycerol glpF GLxt <-> GL Lactose lacY LCTSxt - NEXT &-> LCTS Maltose malX, crr, malE MLTxt + PEP-> MLT6P + PYR 7 Mannitol mtlA, cmtAB MNTxt - PEP-> MNT6P - PYR 3 Mannose manATZ, ptsPA MANxt + PEP-> MAN1P + PYR 6 Melibiose melB MELxt -- HEXT-> MELI N-Acetylglucosamine nagE, ptsN NAG - PEP - NAGP - PYR 2 Rhamnose rhaT RMNxt + ATP-> RMN - ADP + PI Ribose rbsABCD, xylH RIBxt + ATP-> RIB + ADP + PI 5 Sucrose SC SUCxt - PEP-> SUC6P - PYR O Trehalose treAB TRExt - PEP-> TRE6P - PYR 2 Xylose (low affinity) xylE XYLxt - NEXT-> XYL Xylose (high affinity) xylFG, rbsB XYLxt + ATP-> XYL - ADP + PI 3 Alanine cycA ALAxt + ATP-> ALA + ADP + PI Arginine artPMQJI, arg ARGxt + ATP-> ARG - ADP + PI 9 Asparagine (low Affinity) ASNxt - HEXT &-> ASN O Asparagine (high Affinity) ASNxt + ATP-> ASN - ADP + PI O Aspartate gltP ASPxt -- HEXT-> ASP Aspartate gltJKL ASPxt + ATP-> ASP-ADP + PI 3 Branched chain amino acid transport brnO BCAAxt -- HEXT &-> BCAA Cysteine not identified CYSxt - ATP-> CYS - ADP - PI O D-Alanine cycA DALAxt + ATP -> DALA-ADP + PI US 2002/0012939 A1 Jan. 31, 2002 17

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome D-Alanine glycine permease cycA DALAxt + HEXT <-> DALA D-Alanine glycine permease cycA DSERxt + HEXT &-> DSER D-Alanine glycine permease cycA GLYxt - HEXT &-> GLY Diaminopimelic acid MDAPxt + ATP-> MDAP - ADP + PI Y-Aminobutyrate transport gabP GABAxt + ATP-> GABA-ADP + PI Glutamate gltP GLUxt -- HEXT &-> GLU Glutamate gltS GLUxt -- HEXT &-> GLU Glutamate GLUxt + ATP-> GLU - ADP + PI Glutamine glnHPQ GLNxt + ATP-> GLN - ADP + PI Glycine cycA, proVWX GLYxt + ATP-> GLY - ADP + PI Histidine hisJMPO HISxt + ATP-> HIS - ADP + PI Isoleucine liv ILExt + ATP-> ILE - ADP + PI Leucine livKM/livPGJ LEUxt + ATP-> LEU - ADP + PI Lysine lysP LYSxt -- HEXT &-> LYS Lysine argT, hisMPQ LYSxt + ATP-> LYS - ADP + PI Lysine/Cadaverine cadB LYSxt + ATP-> LYS - ADP + PI Methionine metD METxt - ATP - MET - ADP - PI Ornithine argT, hisMPQ ORNxt + ATP-> ORN - ADP + PI Phenlyalanine aroP/mtr?pheP PHExt - HEXT -> PHE Proline putP proPWX PROxt - HEXT - PRO Proline cycA, proVW PROxt - ATP - PRO - ADP - PI Putrescine potEFHIG PTRCxt - ATP - PTRC - ADP - PI Serine sdaC SERxt - HEXT &-> SER Serine cycA SERxt + ATP-> SER - ADP + PI Spermidine & putrescine potABCD SPMDxt - ATP-> SPMD - ADP - PI Spermidine & putrescine potABCD PTRCxt - ATP - PTRC - ADP - PI Threonine liv THRxt - ATP - THR - ADP - PI Threonine todcC THRxt - HEXT - THR Tryptophan tnaB TRPxt - HEXT -> TRP Tyrosine tyrP TYRxt - HEXT -> TYR Valine liv VALxt + ATP -> VAL - ADP + PI Dipeptide dppABCDF DIPEPxt - ATP -> DIPEP - ADP - PI Oligopeptide oppABCDF OPEPxt - ATP-> OPEP - ADP - PI Peptide sapABD PEPTxt - ATP -> PEPT - ADP - PI Uracil uraA URAxt - HEXT -> URA Nicotinamide mononucleotide transporter pnuC NMNxt - HEXT-> -- NMN Cytosine codB CYTSxt - HEXT-> CYTS Adenine purB ADxt + HEXT -> AD Guanine gpt, hpt Hypoxanthine gpt, hpt HYXNxt -> HYXN : Xanthosine XTSNxt -> XTSN Xanthine Sp XANxt &-> XAN G-system nup G ADNxt - NEXT -> ADN G-system nup G GSNxt - NEXT - GSN G-system nup G URIxt - NEXT -> URI G-system nup G CYTDxt - HEXT-> CYTD G-system (transports all nucleosides) nup G INSxt - HEXT-> INS G-system nup G XTSNxt - HEXT-> XTSN G-system nup G DTxt - HEXT -> DT G-system nup G DINxt - HEXT -> DIN G-system nup G DGxt -- HEXT-> DG G-system nup G DAxt -- HEXT-> DA G-system nup G DCxt -- HEXT -> DC G-system nup G DUxt - HEXT-> DU C-system nupC ADNxt -- HEXT-> ADN C-system nupC URIxt - HEXT -> URI C-system nupC CYTDxt - HEXT-> CYTD C-system nupC DTxt - HEXT -> DT C-system nupC DAxt -- HEXT-> DA C-system nupC DCxt -- HEXT -> DC C-system nupC DUxt - HEXT-> DU Nucleosides and deoxynucleoside tSX ADNxt -- HEXT-> ADN Nucleosides and deoxynucleoside tSX GSNxt - HEXT - GSN Nucleosides and deoxynucleoside tSX URxt -- HEXT -> URI Nucleosides and deoxynucleoside tSX CYTDxt - HEXT-> CYTD Nucleosides and deoxynucleoside tSX INSxt - HEXT-> INS Nucleosides and deoxynucleoside tSX XTSNxt - HEXT-> XTSN US 2002/0012939 A1 Jan. 31, 2002 18

TABLE 1-continued The genes included in the E. coli metabolic genotype along with corresponding enzymes and reactions that comprise the genome specific stoichiometric matrix. The final column indicates the presencefabsence of the gene (as the number of copies) in the E. coli genome. Thus the presence of a gene in the E. coli genome indicates that the gene is part of the metabolic genotype. Reactions/Genes not present in the genome are those gathered at state 56 in Figure 2 and together with the reactions of the genes in the metabolic genotype form the columns of the genome specific stoichiometric matrix. E. coli Enzyme Gene Reaction Genome Nucleosides and deoxynucleoside tSX DTxt - HEXT -> DT Nucleosides and deoxynucleoside tSX DINxt - HEXT -> DIN Nucleosides and deoxynucleoside tSX DGxt -- HEXT-> DG Nucleosides and deoxynucleoside tSX DAxt -- HEXT-> DA Nucleosides and deoxynucleoside tSX DCxt -- HEXT -> DC Nucleosides and deoxynucleoside tSX DUxt - HEXT-> DU Acetate transport ACxt + HEXT &-> AC O Lactate transport LACxt -- HEXT &-> LAC L-Lactate dP LLACxt + HEXT &-> LLAC Formate transport focA FORxt &-> FOR Ethanol transport ETHxt - HEXT -> ETH Succinate transport dcuAB SUCCxt - HEXT -> SUCC Pyruvate transport PYRxt - HEXT -> PYR Ammonia transport amtE NH3xt - HEXT -> NH3 Potassium transport kdpABC Kxt + ATP-> K - ADP + PI Potassium transport trkAEHG Kxt -- HEXT K-> K Sulfate transport cysPTUWAZ, s SLFxt + ATP-> SLF - ADP + PI Phosphate transport pstABCS Pxt - ATP-> ADP - 2 PI Phosphate transport pitAB Pxt - HEXT -> PI Glycerol-3-phosphate glpT, ugpABCE GL3Pxt - PI-> GL3P Dicarboxylates dcuAB, dctA SUCCxt - HEXT -> SUCC Dicarboxylates dcuAB, dctA FUMxt - HEXT -> FUM Dicarboxylates dcuAB, dctA MALxt -- HEXT &-> MAL Dicarboxylates dcuAB, dctA ASPxt -- HEXT &-> ASP Fatty acid transport fadL C140xt -> C140 Fatty acid transport fadL C160xt-> C160 Fatty acid transport fadL C18Oxt -> C18O C-Ketoglutarate AKGxt -- HEXT &-> AKG Na/H antiporter inhaABC NAxt -- &-> NA- HEXT Na/H antiporter chaABC NAxt -- &-> NA- HEXT Pantothenate panF PNTOxt - HEXT -> PNTO Sialic acid permease nanT SLAxt + ATP-> SLA-ADP + PI Oxygen transport O2xt <-> O2 Carbon dioxide transport CO2xt &-> CO2 Urea transport UREAxt + 2 HEXT &-> UREA AU drain flux for constant maintanence requirements ATP -> ADP - PI Glyceraldehyde transport GLALxt <-> GLAL Acetaldehyde transport ACALxt <-> ACAL

0063)

TABLE 2 Comparison of the predicted mutant growth characteristics from the gene deletion study to published experimental results with single and double mutants. Glucose Glycerol Succinate Acetate Gene (in vivo?in silico) (in vivo?in silico) (in vivo?in silico) (in vivo/in silico) aceF -f+ aceA aceB ackA aCS aCl. -f- -f- cyd +f- cyo +f- CO -f+ -f+ fba -f+ fbp +f- -f- gap -f- -f- gltA -f- -f- US 2002/0012939 A1 Jan. 31, 2002 19

TABLE 2-continued Comparison of the predicted mutant growth characteristics from the gene deletion study to published experimental results with single and double mutants. Glucose Glycerol Succinate Acetate Gene (in vivo?in silico) (in vivo?in silico) (in vivo?in silico) (in vivo/in silico) idh -f- -f- -f- indh +f- +f- O +f- +f- pfk -f+ pgi +f- +f- pgk -f- -f- -f- pg| +f- pntAB +f- +f- +f- glk +f- ppc if- -f+ +f- pta pts +f- pyk +f- rpi -f- -f- -f- sdhaECD +f- tpi -f+ -f- -f- C +f- -f- Zwf +f- SucAD +f- Zwf, pnt +f- pck, mes -f- pck, pps -f- pgi, Zwf -f- pgi, gnd -f- pta, acs tktA, tktB -f- Results are scored as + or - meaning growth or no growth determined from in vivofin silico data. In 73 of 80 cases the in silico behavior is the same as the experimentally observed behavior.

What is claimed is: 4. The method of claim 1, comprising reducing the flux of 1. A method for determining a phenotype of an organism, the metabolic reaction of the candidate metabolic gene to comprising: determine whether the reduction would result in a lethal phenotype. providing a table of metabolic reactions known to take place in the organism, wherein the products of at least 5. The method of claim 1, wherein the phenotype is one metabolic reaction are linked to the reactants of Selected from the group consisting of growth, increased another metabolic reaction; metabolite Secretion and increased protein Secretion. 6. The method of claim 1, comprising determining the determining a candidate metabolic gene on the organism’s minimal media composition required to Sustain growth of genome, the organism. providing the nucleotide Sequence of the open reading 7. The method of claim 1, comprising determining the frame of the candidate metabolic gene; optimal requirements for maximizing a growth phenotype of the organism. assigning a function to the candidate metabolic gene 8. The method of claim 1, comprising determining the based on its nucleotide or amino acid homology to genes in the organism necessary to Sustain highest level of other, known metabolic genes, growth under a particular environmental condition. determining the metabolic reaction of the candidate meta 9. The method of claim 1, wherein the mathematical bolic gene based on the assigned function of the analysis is an optimization analysis. candidate metabolic gene, 10. The method of claim 9, wherein the optimization adding the metabolic reaction of the candidate metabolic analysis is a Flux Balance Analysis using linear program gene to the table of metabolic reactions, and ming methods. 11. The method of claim 1, comprising determining determining a phenotype of the organism by performing whether an input to the table of metabolic reactions results a mathematical analysis of the table of metabolic in a phenotype of increased output of a metabolic product. reactions. 12. A computer System comprising a memory having 2. The method of claim 1, comprising identifying a instructions that when executed perform the Steps of metabolic gene that when removed from the table of meta bolic reactions would result in a Suboptimal phenotype. providing a table of metabolic reactions known to take 3. The method of claim 1, comprising identifying a place in the organism, wherein the products of at least metabolic gene that when removed from the table of meta one metabolic reaction are linked to the reactants of bolic reactions would result in a lethal phenotype. another metabolic reaction; US 2002/0012939 A1 Jan. 31, 2002 20

determining a candidate metabolic gene on the organism’s flux of the metabolic reaction of the candidate metabolic genome, gene to determine whether the reduction would result in a providing the nucleotide Sequence of the open reading lethal phenotype. 17. The computer system of claim 12, wherein the phe frame of the candidate metabolic gene; notype is Selected from the group consisting of growth, assigning a function to the candidate metabolic gene increased metabolite Secretion and increased protein Secre based on its nucleotide or amino acid homology to tion. other, known metabolic genes, 18. The computer System of claim 12, comprising instruc determining the metabolic reaction of the candidate meta tions that when executed perform a method of determining bolic gene based on the assigned function of the the minimal media composition required to Sustain growth candidate metabolic gene, of the organism. 19. The computer System of claim 12, comprising instruc adding the metabolic reaction of the candidate metabolic tions that when executed perform a method of determining gene to the table of metabolic reactions, and the optimal requirements for maximizing a growth pheno determining a phenotype of the organism by performing type of the organism. a mathematical analysis of the table of metabolic 20. The computer System of claim 12, comprising instruc reactions. tions that when executed perform a method of determining 13. The computer system of claim 12, wherein said the genes in the organism necessary to Sustain highest level memory is Selected from the group consisting of a hard of growth under a particular environmental condition. disk, optical memory, Random Access Memory, Read Only Memory and Flash Memory. 21. The computer system of claim 12, wherein the math 14. The computer System of claim 12, comprising instruc ematical analysis is an optimization analysis. tions that when executed perform a method of identifying a 22. The computer System of claim 21, wherein the opti metabolic gene that when removed from the table of meta mization analysis is a Flux Balance Analysis using linear bolic reactions would result in a Suboptimal phenotype. programming methods. 15. The computer System of claim 12, comprising instruc 23. The computer System of claim 12, comprising instruc tions that when executed perform a method of identifying a tions that when executed perform a method or determining metabolic gene that when removed from the table of meta whether an input to the table of metabolic reactions results bolic reactions would result in a lethal phenotype. in a phenotype of increased output of a metabolic product. 16. The computer System of claim 15, comprising instruc tions that when executed perform a method of reducing the