
MODELING AND COMPUTATIONAL PREDICTION OF METABOLIC CHANNELLING

by

Christopher Morran Sanford

A thesis submitted in conformity with the requirements for the degree of Master of Science

Graduate Department of Molecular Genetics

University of Toronto

© Copyright by Christopher Morran Sanford 2009

Abstract

MODELING AND COMPUTATIONAL PREDICTION OF METABOLIC CHANNELLING

Master of Science

2009

Christopher Morran Sanford

Graduate Department of Molecular Genetics

University of Toronto

Metabolic channelling occurs when two enzymes that act on a common intermediate pass that intermediate directly from one to the next without allowing it to diffuse into the surrounding aqueous medium. In this study, properties of channelling are investigated through the use of computational models and cell simulation tools. The effects of kinetics and thermodynamics on channelling are explored with the emphasis on validating the hypothesized roles of metabolic channelling in living cells. These simulations identify situations in which channelling can induce acceleration of reaction velocities and reduction in the free concentration of intermediate metabolites. Databases of biological information, including metabolic, thermodynamic, toxicity, inhibitory, fusion and physical interaction data are used to predict examples of potentially channelled enzyme pairs. The predictions are used both to support the hypothesized evolutionary motivations for channelling, and to propose potential enzyme interactions that may be worthy of future investigation.


Acknowledgements

I wish to thank my supervisor Dr. John Parkinson for the guidance he has provided during my time spent in his lab, as well as for his extensive help in the writing of this thesis. I am grateful for the advice of my committee members, Prof. Alan Davidson and Dr. David Bazett-Jones, who have helped to keep my work focused.

I would also like to thank the members of the Parkinson lab, in particular the postdoctoral researchers Dr. James Wasmuth and Dr. Jose Peregrin-Alvarez, for their continuous help.

I thank as well Charlotte Morrison-Reed for her loving support over the course of my studies.

Finally, I would like to acknowledge the Natural Sciences and Engineering Research Council of Canada for financial support of this research.


Table of Contents

Acknowledgements
List of Tables
List of Figures
1 Introduction
1.1 Background
1.1.1 The Metabolism of Living Cells
1.1.2 Introduction to Metabolic Channelling
1.1.3 Previous Work on Channelling
1.1.4 The Role of Modeling
1.1.5 Previous Research on Metabolic Channelling Modeling
1.2 Project Overview
2 Simulation
2.1 Introduction
2.1.1 Details of the Glycolysis Pathway
2.2 Cell++ Simulations of Metabolic Channelling
2.2.1 Details of the Metabolic Channelling Model
2.2.2 Inter-Enzyme Channelling Distance Parameter Sweep
2.3 E-CELL Simulations of Metabolic Channelling
2.3.1 Simulation Details
2.4 Results of the Metabolic Channelling Simulations
2.4.1 Cell++ Simulation Results
2.4.2 E-CELL Simulation Results
2.4.3 Inter-Enzyme Distance Parameter Scan
2.5 Discussion
2.5.1 Future work: Exploring the glycolysis model in vivo
2.5.2 Future work: Improving and extending the model
3 Prediction of Metabolic Channelling
3.1 Introduction
3.2 Methods
3.2.1 Metabolic Network Data
3.2.2 Predictions Based on Free Energy Coupling
3.2.3 Predictions Based on Toxicity and Inhibitors
3.2.4 Analysis of Potential Regulation at Metabolic Branch Points
3.2.5 Validation of Predictions using Multifunctional Enzymes
3.2.6 Validation of Predictions using Protein-Protein Interaction Networks
3.3 Results
3.3.1 Free Energy Coupling
3.3.2 Toxicity and Inhibitors
3.3.3 Metabolic Branch Points
3.3.4 Multifunctional Enzymes
3.3.5 Protein-Protein Interaction Networks
3.4 Discussion
4 Conclusion
4.1 Future Work
5 References
Appendix A – Simulation Traces
Appendix B – Channelling Predictions
Appendix C – Statistical Methods


List of Tables

Table 1.1: Diffusion Rate Measurements
Table 2.1: Other Simulators
Table 2.2: Glycolysis Model Constants
Table 2.3: Glycolysis Trace Statistics
Table 2.4: Enzymes Acting on Glycolysis Metabolites in S. cerevisiae
Table 2.5: The Scale of Metabolism
Table 3.1: Highly-Connected Metabolites
Table 3.2: Toxicological Databases
Table 3.3: PPI Network Data Sources
Table 3.4: Toxic and Inhibitory Data Overview
Table 3.5: Consecutive Reaction Representation in Multifunction Enzymes
Table 3.6: Predicted Channelling Representation in Multifunctional Enzymes
Table 3.7: Minimum Path Lengths in the Combined Interactome Network
Table 3.8: Minimum Path Lengths in Each Individual PPI Network

List of Figures

Figure 1.1: Cell++ Simulation Environment
Figure 2.1: Glycolysis Pathway
Figure 2.2: Glycolysis Linear Segment
Figure 2.3: Glycolysis Model in E-CELL
Figure 2.4: Inter-Enzyme Channelling Distance
Figure 3.1: Group Contribution Method
Figure 3.2: Toxic Metabolite Node Degrees
Figure 3.3: Inhibitory Metabolite Node Degrees
Figure 3.4: KEGG Pathway Membership
Figure 3.5: Metabolic Branching Factor


1 Introduction

The interior of the cell is not a well-stirred, homogeneous reaction vessel. Instead it is a complex, crowded environment, filled with promiscuously binding proteins, membrane-bound compartments and a network of cytoskeletal structures. Because these intracellular features slow the motion of enzymes and the metabolites they process, the spatial organization of enzymes plays an important role in cellular processes. This thesis aims to explore the role of colocalized, biochemically sequential enzymes which are capable of channelling metabolic intermediates from one active site to the next, without releasing significant quantities into the cytosol.

Metabolic channelling has been hypothesized to fulfill a number of cellular needs by eliminating the diffusion of some substrates: aiding in regulation, circumventing thermodynamic barriers, protecting unstable intermediates and minimizing damage from toxic intermediates are a few of the benefits channelling can provide. A better understanding of the mechanism of action and biochemical effects of metabolic channelling will help us to understand metabolic diseases and improve the effectiveness of genetically engineered organisms. Metabolic engineering, which is used to create organisms that produce pharmaceutical or industrial chemicals (Atsumi and Liao 2008) and that degrade toxic waste materials (Villas-Bôas and Bruheim 2007), may benefit through greater yields and higher reaction rates with well-targeted implementations of metabolic channelling.

This study investigates the phenomenon of metabolic channelling in two ways. First, a simple biochemical system was computationally modeled, and the effect of channelling interactions on each pair of consecutive enzymes was tested. This provided information on the thermodynamic, enzyme kinetic and spatial localization conditions under which metabolic channelling is likely to be useful. As well, these simulations verified that channelling is indeed capable of accelerating reaction velocities and reducing cytosolic concentrations of certain intermediates, in the model system that was used.

Second, by applying the principles tested in simulation to a variety of biological databases, enzyme pairs were identified that have the potential to be involved in metabolic channelling. Gibbs free energy changes for enzyme-catalyzed reaction pairs were used to predict enzyme pairs that may have the capacity to circumvent thermodynamic barriers via substrate channelling. Predictions were also made for enzyme pairs that operate on toxic or inhibitory compounds, based on the hypothesis that the cell would benefit from keeping them out of the bulk cytosol. Because substrate channelling is hypothesized to be involved in the regulation of metabolism, these predictions were investigated for their content of key regulatory compounds at metabolic branch points. The predictions were then validated through a comparison of properties that are expected to relate to metabolic channelling: the enrichment in multifunctional enzymes and physical protein-protein interactions.

1.1 Background

1.1.1 The Metabolism of Living Cells

The complete set of all the compounds and chemical reactions that occur within an organism is known as its metabolism. The E. coli metabolome, which is its complement of metabolically important chemical species, is made up of at least 625 metabolites. More than 931 reactions act upon these compounds inside the bacterial cell, converting nutrients into energy, functional components of the organism and waste products (Reed, Vo et al. 2003). Most of these reactions are catalyzed by enzymes, proteins capable of accelerating biochemical reactions, of which over 900 are documented to be present in E. coli (Sundararaj, Guo et al. 2004). The two components of metabolism are catabolism, which breaks apart nutrients into energy and biochemical precursors, and anabolism, which constructs important compounds from simpler ones.

Each species has evolved its own metabolism in response to the challenges presented by its environment, which has led to a wide range of biologically important molecules. There are over 15 000 common metabolites and 4 500 classes of enzymatic activities described in the Kyoto Encyclopedia of Genes and Genomes (KEGG), a large biological component database (Kanehisa and Goto 2000). Together, they form an intricate metabolic network, with each metabolite connected to the others via enzyme-catalyzed reactions. Curation efforts have then attempted to partition the network into more easily understood pathways, named after the key biomolecular functions they perform (Kanehisa and Goto 2000; Karp, Keseler et al. 2007).

Genetic abnormalities in metabolism have been known to be responsible for human diseases for over a hundred years. The term “inborn error of metabolism” was coined by Archibald Garrod in 1908 to describe inherited diseases that prevent metabolism from functioning correctly (Garrod and Frowde 1923). Many genetic diseases are caused by the lack of a specific enzyme. Phenylketonuria is caused by the absence of a functioning phenylalanine hydroxylase enzyme, which is necessary to convert phenylalanine into tyrosine. This medical condition affects one in 16 000 children born in North America and is currently treatable only by a strict, lifelong diet. Untreated, it interferes with brain development and causes mental retardation (Harding 2008).

The differences in metabolism between organisms create opportunities for drug design. Because pathogenic and parasitic species acquire many types of nutrients from their hosts, they often lack entire metabolic pathways. As well, the needs of these organisms, such as host immune-system evasion, require the activity of enzymes that are not present in mammals. Both differences imply that there are some enzymes which are essential for the disease-causing organism's survival, but non-existent or redundant in the host. Such enzymes represent potential drug targets; compounds able to inactivate the essential enzymes can potentially kill the parasite or bacterial pathogen without harming the host. Understanding the details of metabolism allows researchers to determine which enzymes are essential in which organisms, and therefore helps them to propose likely drug targets.

Achieving control over aspects of metabolism through genetic modification is known as metabolic engineering and is used to improve yields of important pharmaceutical and industrial chemicals. This form of engineering is also used to improve the nutritional value of important foods: a strain of biofortified rice known as “Golden Rice” was engineered to express β-carotene in the edible endosperm. This metabolite, which is responsible for the orange colour of carrots, is broken down into vitamin A when digested. Because vitamin A deficiency is a major problem in developing countries, altering rice metabolism in this way is expected to improve nutrition greatly for malnourished members of rice-eating societies (Paine, Shipton et al. 2005).

Achieving a level of understanding sufficient to cure disease and engineer useful microorganisms is made difficult not only by the scale of metabolism, but also by the intracellular factors which act upon it. Because the chemical needs of an organism are constantly changing, its metabolism is regulated by an intricate system. Intrinsic regulation helps the cell to maintain constant levels of key metabolites by up- or down-regulating their production as they become depleted or overabundant. Isoleucine, for example, is produced from threonine via a series of enzyme-catalyzed reactions. When its concentration increases in the cell, isoleucine acts as an inhibitor of threonine deaminase, decreasing the maximum reaction velocity of the enzyme and slowing isoleucine production (Umbarger 1992). External stimuli can also affect metabolism through extrinsic regulation. Insulin induces cells to convert glucose into fatty acids and glycogen, while epinephrine is capable of conveying the opposite message and initiating the breakdown of those chemicals into glucose.


1.1.2 Introduction to Metabolic Channelling

The cytoplasm of a living cell, where much of metabolism occurs (Clegg 1984), is not a dilute aqueous solution, and this hinders diffusion in many ways. A typical cell is only 70% water by mass, and its cytosol is a concentrated mixture with thousands of types of proteins, lipids, carbohydrates and small metabolites. In eukaryotes, three types of cytoskeletal structures form a complex web throughout the cell that impedes the movement of some substances while providing paths that aid the transportation of others. Many intracellular particles are attracted to each other through ionic attraction, Van der Waals interactions and hydrophobic effects. Some of these interactions are specific and only occur when chance brings two partners together, while others act constantly to slow diffusion. Macromolecular crowding from all these components and steric hindrance force individual particles to follow longer paths than they would need to cover the same distance in a dilute solution.

Some pairs of enzymes circumvent this slow intracellular diffusion through the process of metabolic channelling, in which the product of one enzyme is passed directly to the next, without any significant release into the bulk aqueous phase. For channelling to occur, two enzymes that process consecutive reactions in a pathway must be spatially co-localized. In cells, this can be accomplished by targeting them to membrane-bound compartments, or by physically linking them together. Enzymes can bind directly to each other in pairs, as happens with the tryptophan synthase subunits, or they can be brought together in larger channelling complexes known as “metabolons” (Winkel 2004).

While colocalization of biochemically consecutive enzymes may induce some amount of channelling, there are enzyme pairs that have evolved more efficient mechanisms to minimize loss of the intermediate to diffusion. A sealed tunnel connects the active sites in tryptophan synthase, preventing indole diffusion (see section 1.1.3.1), and in carbamoyl-phosphate synthetase, where it directs the flow of ammonia (see section 1.1.3.2). In plants and in protozoa, dihydrofolate reductase and thymidylate synthase form a bifunctional enzyme which catalyzes the conversion of 5,10-methylenetetrahydrofolate into first dihydrofolate, then tetrahydrofolate. The dihydrofolate intermediate, which is negatively charged in physiological conditions, follows a pathway of positively charged amino acids on the surface of the enzyme. Electrostatic interactions lead each dihydrofolate directly from one active site to the next, without allowing it to diffuse into the cytosol (Knighton, Kan et al. 1994; Stroud 1994).


Metabolic channelling, which is also known as substrate channelling, is hypothesized to aid the cell in a number of different ways. Passing the intermediate directly from one enzyme to the next through channelling prevents release of that metabolite into the cytosol and can have obvious benefits if the compound in question is detrimental to the rest of the cell. The cytosol must be protected from toxic molecules, which can damage cellular machinery through spontaneous chemical reactions, and from inhibitors, which interfere with enzymatic activity. Some important metabolites are unstable in aqueous solution and would degrade if they were freed from the enzymes that act on them. To some degree, all freely diffusing small molecules present a challenge for the cytosol, which can only dissolve a limited quantity of metabolites. By keeping intermediates associated with enzymes, channelling can help to reduce the concentration of free small molecules, thus maintaining the solvation capacity of the cytosol (Srere 1987).

In some situations, channelling can increase the rate at which consecutive biochemical reactions proceed. Eliminating the need for metabolites to diffuse through the cytosol reduces the inter-enzyme transit time of those intermediates, which would relieve a kinetic constraint in the cases where that diffusion is the rate-limiting step. As well, unfavourable equilibria can be circumvented. If an unfavourable reaction is coupled to a favourable one by channelling of a common substrate, the combined reaction will bypass the thermodynamic barrier and proceed more quickly than the two would individually. Accelerating metabolic activity in either of these ways can be advantageous to the organism by allowing it to process nutrients more quickly than its competitors or, more subtly, to react to change more quickly by reducing the time needed to transition into new stable states (Srere 1987).
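The dependence of transit time on inter-enzyme distance can be illustrated with a rough estimate. The following is a minimal sketch, assuming unobstructed three-dimensional diffusion described by the Einstein relation ⟨x²⟩ = 6Dt; the diffusion coefficients and separations are hypothetical round numbers chosen for illustration and are not measurements from this study.

```python
def characteristic_diffusion_time(distance_m: float, diffusion_coeff_m2_s: float) -> float:
    """Time at which the root-mean-square displacement of a freely
    diffusing particle equals `distance_m`, from <x^2> = 6 D t (3D)."""
    return distance_m ** 2 / (6.0 * diffusion_coeff_m2_s)


# Hypothetical illustrative values (assumptions, not data from the thesis).
D_DILUTE = 5e-10    # m^2/s, small metabolite in dilute solution
D_CROWDED = 1e-10   # m^2/s, same metabolite slowed by crowding

SEPARATIONS_M = {
    "free enzymes, 1 um apart": 1e-6,
    "colocalized enzymes, 10 nm apart": 1e-8,
}

for label, distance in SEPARATIONS_M.items():
    t_dilute = characteristic_diffusion_time(distance, D_DILUTE)
    t_crowded = characteristic_diffusion_time(distance, D_CROWDED)
    print(f"{label}: dilute {t_dilute:.2e} s, crowded {t_crowded:.2e} s")
```

Because the time scales with the square of the distance, reducing the separation a hundredfold shortens the estimated transit time ten-thousandfold under these assumptions, whatever value of D is used.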

Channelling also has the potential to affect metabolic regulation through the formation and dissociation of transient enzyme complexes. A cell could dynamically alter specific reaction rates in response to its changing biochemical needs by inducing channelling between appropriate enzyme pairs. In addition to regulating the reaction rates within biochemical pathways, such transient enzyme interactions are capable of directing the flow of metabolic intermediates; by inducing substrate channelling at the entrance to one biochemical pathway, metabolites can be prevented from entering competing pathways.

Substrate channelling is capable of increasing the velocity of reaction pairs by reducing thermodynamic constraints, resulting in a form of metabolic acceleration. This occurs when the first reaction has a low equilibrium constant (Keq), and the second has a high equilibrium constant. For this study, the Gibbs free energy change (ΔG) for a reaction was used as a surrogate for its equilibrium constant. In aqueous solution, the equation

$$\Delta G = -RT \ln K_{eq} \qquad (1)$$

relates the standard Gibbs free energy change to the ideal gas constant (R), the absolute temperature (T) of the system and the equilibrium constant (Keq). Pairs of reactions were selected from a database of free energy changes based on their potential for acceleration by channelling.
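The relationship between free energy and equilibrium constant, and the effect of coupling two reactions, can be sketched numerically. The example below assumes the standard relation ΔG = −RT ln Keq at 298.15 K; the two equilibrium constants are hypothetical values chosen to represent an unfavourable first reaction followed by a favourable second one, and the function names are illustrative only.

```python
import math

R = 8.314  # J/(mol K), ideal gas constant


def delta_g_from_keq(k_eq: float, temperature_k: float = 298.15) -> float:
    """Standard Gibbs free energy change (J/mol) from an equilibrium constant."""
    return -R * temperature_k * math.log(k_eq)


def keq_from_delta_g(delta_g: float, temperature_k: float = 298.15) -> float:
    """Equilibrium constant from a standard Gibbs free energy change (J/mol)."""
    return math.exp(-delta_g / (R * temperature_k))


# Hypothetical reaction pair (assumed values, not taken from the thesis data).
KEQ_FIRST = 0.01     # unfavourable: equilibrium lies toward the substrate
KEQ_SECOND = 1000.0  # favourable: equilibrium lies toward the product

dg_first = delta_g_from_keq(KEQ_FIRST)
dg_second = delta_g_from_keq(KEQ_SECOND)

# Free energy changes of consecutive reactions add, so the equilibrium
# constants multiply for the combined (coupled) reaction.
dg_coupled = dg_first + dg_second
keq_coupled = keq_from_delta_g(dg_coupled)

print(f"first:   dG = {dg_first / 1000:.1f} kJ/mol")
print(f"second:  dG = {dg_second / 1000:.1f} kJ/mol")
print(f"coupled: dG = {dg_coupled / 1000:.1f} kJ/mol, Keq = {keq_coupled:.1f}")
```

With these assumed values the first reaction has a positive ΔG and the second a negative one, while the coupled pair has a negative overall ΔG. This is the pattern of low Keq followed by high Keq described above.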

A certain way to ensure that two enzymes maintain spatial co-localization is to join them together as a single protein. There are many known examples of multifunctional enzymes – proteins which have been observed catalyzing more than one reaction. In this study these were used as a form of evidence that channelling is likely to occur. The existence of a multifunctional enzyme catalyzing sequential reactions implies that there is an evolutionary motivation to co-localize those reactions in the cell. Since the reactions catalyzed by a multifunctional enzyme demonstrate that channelling may be important in that organism, it is likely that channelling of the same intermediate could benefit the metabolism of other species.

In addition to forming multifunctional enzymes, two catalytic functions can be colocalized through physical interactions between enzymes. Binding interactions, which can be direct enzyme-enzyme interactions or operate through linking proteins, act as adhesive forces within the cell, holding specific particles together. The strength of these interactions can vary, and determines whether a stable complex will be formed, or if the two enzymes will only bind transiently, allowing the cell to dynamically associate them in response to regulatory signals.

1.1.3 Previous Work on Channelling

As recently as 1991, the existence of metabolic channelling as a widespread biochemical phenomenon was considered controversial. The concept of metabolons was dismissed as unscientific and unobservable by some (Shulman 1991), in spite of the original, highly cited tryptophan synthase example that was discovered in 1958 (see section 1.1.3.1) (Yanofsky and Rachmeler 1958; Matchett 1974). Since then, experiments have shown channelling to be an important metabolic mechanism that is prevalent throughout biology.

Many examples have been documented illustrating the importance of channelling in the metabolism of toxic compounds. Ammonia, for example, is not freely available in the cytosol, so many enzymes have evolved to chemically release it from glutamine, and then pass it directly to the active site where it will be used. Glucosamine-6-phosphate synthase (Teplyakov, Obmolova et al. 2001) and imidazole glycerol phosphate synthase (Chaudhuri, Lange et al. 2001; Douangamath, Walker et al. 2002) both operate this way, insulating toxic ammonia from the cytosol by passing it through an intramolecular tunnel. Carbamoyl phosphate synthetase also handles ammonia in this way (Purcarea, Evans et al. 1999; Massant, Verstreken et al. 2002), as well as being involved in further channelling interactions (see section 1.1.3.2).

Channelling has been documented in pathways as well studied as the TCA cycle, where metabolon formation has been observed on the inner surface of mitochondrial cristae (Srere 1990). Many bifunctional enzymes have been found to rely on channelling, like methylenetetrahydrofolate dehydrogenase/cyclohydrolase, where 5,10-methenyl-THF is passed between active sites (see section 3.3.1.1) (Pelletier and Mackenzie 1995; Pawelek, Allaire et al. 2000). Enzyme pairs that use alternative methods of channelling, such as the electrostatic pathway observed in thymidylate synthase-dihydrofolate reductase (Knighton, Kan et al. 1994; Stroud 1994), have been investigated (see section 1.1.2). β-enolase and creatine kinase are hypothesized to use channelling to aid in the regulation of energy in muscles; transient interactions can help to meet the intermittent but high energy requirements of these cells (Foucault, Vacher et al. 2000).

Significant research has been performed on metabolic channelling in plants. Serine acetyltransferase and O-acetylserine (thiol)-lyase channel O-acetylserine between them as they perform the final steps of cysteine biosynthesis (Droux, Ruffet et al. 1998). Cyanogenic plants engage in a spectacular example of channelling in their production of dhurrin, a defence toxin. The enzymes used to produce this compound are stored in separate subcellular locations in living plant cells, and are brought together only when the tissue is crushed, typically through herbivory. Only then are the enzymes free to form a transient complex, and channelling allows the final, toxic metabolite to be created (Winkel 2004).

The following are two well documented examples of metabolic channelling. Both cases describe a pair of enzymes that have been spatially colocalized and specifically adapted to maximize the channelling of their intermediate substrates.

1.1.3.1 Metabolic Channelling of Indole

One of the earliest characterized examples is the substrate channelling of indole in the production of tryptophan. Tryptophan synthase is a tetrameric enzyme complex formed from two α- and two β-subunits which catalyze the final two steps in the production of tryptophan: the α-subunit cleaves indole-3-glycerol phosphate into indole and glyceraldehyde-3-phosphate, and then the β-subunit performs a condensation reaction, joining indole and serine to form tryptophan. In 1958, Charles Yanofsky discovered that the indole intermediate was not released into the cytosol, but instead channelled directly between the two subunits (Yanofsky and Rachmeler 1958).

This was later confirmed by radioisotope analysis in 1973. An in vitro system containing active tryptophan synthase from Neurospora crassa was constructed using 14C-labelled indole-3-glycerol phosphate and unlabelled indole. At regular time intervals, samples were taken and the radioactivity of indole and newly formed tryptophan was measured. Although some radioactivity was detected in the indole population, the amount present in the tryptophan was far higher, implying both that the tryptophan had been formed from indole originally in the indole-3-glycerol phosphate and that the indole formed in the α-subunit was not free to diffuse into the solution (Matchett 1974).

Substrate channelling between these two enzymes is highly evolved and very coordinated. There is a tunnel between the two subunits that connects the α active site directly to the β active site. Through this tunnel, four molecules of indole are simultaneously transported between subunits at a rate of over 1000 per second. The two reactions are highly synchronized, to prevent indole from accumulating inside the tunnel; formation of tryptophan in the β-subunit triggers the α-subunit to initiate its reaction and release the next molecule of indole.

Indole itself is a very hydrophobic molecule, which justifies the cell's requirement to employ channelling in this case; if indole were freed into the cytosol, it would simply diffuse across the plasma membrane and exit the cell entirely. Experiments have been performed using site-directed mutagenesis in which the tunnel between tryptophan synthase subunits was made to leak indole, and the results showed that indole was in fact lost from the cell (Schlichting, Yang et al. 1994).

1.1.3.2 Metabolic Channelling of Carbamoyl Phosphate

Carbamoyl phosphate is a thermolabile compound that, in aqueous solution, spontaneously degrades to cyanate, a carbamylating agent that is highly toxic when present in the cytoplasm. However, this unstable intermediate is essential, as its production is an important step in both pyrimidine and arginine metabolism. It is formed from glutamine and bicarbonate by carbamoyl-phosphate synthetase (CKase), and enters the pyrimidine pathway via a reaction catalyzed by aspartate transcarbamoylase (ATCase) and the arginine pathway via one catalyzed by ornithine carbamoyltransferase (OTCase):

CKase: 2 ATP + L-glutamine + HCO3- + H2O ↔ 2 ADP + phosphate + L-glutamate + carbamoyl phosphate

ATCase: carbamoyl phosphate + L-aspartate ↔ phosphate + N-carbamoyl-L-aspartate

OTCase: carbamoyl phosphate + L-ornithine ↔ phosphate + L-citrulline

Partial channelling of carbamoyl phosphate between CKase and both ATCase and OTCase has been observed in yeast, Neurospora and mammals, but it is in the hyperthermophiles that channelling of this metabolite is most efficient. This thermolabile compound becomes much more unstable as the temperature is increased, and at 96°C, the native temperature of Pyrococcus furiosus, it has a half-life of less than three seconds. Enzymes from this organism, as well as from Pyrococcus abyssi, have become important experimental models for the study of carbamoyl phosphate channelling.

Affinity electrophoresis experiments demonstrated that CKase and OTCase are part of a multienzyme complex in Pyrococcus furiosus, but further experiments were needed to prove that substrate channelling is taking place (Massant, Verstreken et al. 2002). Similar to the experiments performed on indole in the tryptophan synthase complex, radiolabeled substrates were used to test whether the intermediate is able to freely diffuse into the cytosol. 14C-containing sodium bicarbonate, MgATP, NH4Cl and aspartate, along with unlabeled carbamoyl phosphate were introduced into a reaction vessel containing active CKase and ATCase. The radioactive carbon atoms were seen to be transferred to the final carbamoyl aspartate product, and not into the surrounding carbamoyl phosphate population. An analogous experiment showed that channelling to OTCase also occurs in Pyrococcus (Purcarea, Evans et al. 1999).

An additional level of channelling occurs in CKase: ammonia is channelled through the enzyme before it is used to create carbamoyl phosphate. After an ammonia molecule is generated from the breakdown of glutamine, it is transported along a 100 angstrom long tunnel running through the center of the enzyme. This tunnel connects three active sites: the glutamine amidotransferase site, the bicarbonate phosphorylation site and the final location where carbamoyl phosphate is synthesized. As ammonia is known to be a toxic substance, its careful channelling to avoid release into the cytosol is understandable.


1.1.4 The Role of Modeling

To probe substrate channelling acting within the context of a metabolic network, this thesis relies on computer models of biochemical pathways. The simulated effects of channelling within a model of enzymes and metabolites provide insights that can be applied to many channelling scenarios.

Modeling is the heart of the scientific process. The models we create represent our current understanding of a subject. They can be as vague as the model implied by a collection of recent papers, or as specific as a set of mathematical equations, but they all try to explain observed phenomena and make useful predictions. These predictions become the hypotheses that scientists will test through experimentation. How these hypotheses compare with the results of experimentation shows the weaknesses in the model, allowing us to correct or refine it, and ultimately leading to a more thorough understanding.

Models of cell biological systems, however, require much more data than a single initial velocity or an amount of fuel. Entire catalogues are required to bootstrap these simulations, such as the full complement of metabolites present in a cell, or the entire set of enzymes present and the reactions they catalyze. Only recently have detailed enough datasets been generated to allow even the most basic cell biology models to be defined. These data, identified through a myriad of “-omics” terms such as metabolomics, fluxomics and reactomics, are the first results of high-throughput biological experiments. The new discipline of Systems Biology is emerging with the goal of modeling these huge sets of data and interpreting the biological meaning within them (Joyce and Palsson 2006). Prof. Minoru Kanehisa, whose lab is responsible for one such biological component catalogue, justifies his effort as the next important step in understanding biology:

“A grand challenge in the post-genomic era is a complete computer representation of the cell, the organism, and the biosphere, which will enable computational prediction of higher-level complexity of cellular processes and organism behaviours from genomic and molecular information. Towards this end we have been developing a bioinformatics resource named KEGG as part of the research projects of the Kanehisa Laboratories in the Bioinformatics Center of Kyoto University and the Human Genome Center of the University of Tokyo.”

Modeling of biological systems has been done at many scales, from atomic-scale quantum mechanical simulations of protein folding, to whole organ, organism or ecology simulations. This thesis is focused on the “meso” scale, where entire macromolecules are represented as single units. Details about how individual enzymes perform their functions are abstracted to mathematical functions so that the simulation can model the interactions between large groups of them. At this scale, using modern computers, it is just now becoming possible to model entire cells.

This is what Masaru Tomita, the director of a popular modeling tool known as E-Cell, believes is the major challenge of the 21st century. In his words, “the cell is never ‘conquered’ until its total behaviour is understood, and the total behaviour of the cell is never understood until it is modeled and simulated” (Tomita 2001).

1.1.4.1 Spatial Relationships of Cellular Components

In modeling the molecular biology of a cell, the spatial relationships between individual molecules need to be considered. Most cells are microscopic, and contain a volume of cytosol so small that the quantity of chemicals present cannot be accurately described as a concentration and must instead be represented by a finite number of molecules. The interior of a cell is far from homogeneous, and is instead composed of multiple membrane-bound compartments, cytoskeletal barriers and various locations to which proteins can be targeted and transported. As well, because molecules move through the cytosol at a limited speed, the time metabolites take to move between enzymes has thermodynamic and kinetic consequences.

In a typical bacterium with a volume of 2 × 10^-16 L, a 1 µM protein concentration implies the presence of only about 100 copies of that protein. With such a low copy number, even random fluctuations in the motion of these particles can result in significant concentration variations throughout the cell (Halling 1989). The concept of molecule concentration becomes even less meaningful when organelles are considered. For example, an average 250 nm diameter endosome in a eukaryotic cell usually maintains a pH of 6. This means that there is a single free proton present in the lumen of that organelle. When that endosome matures, an active proton pump acidifies the endosome to pH 5, which requires the addition of fewer than 50 more protons (Luby-Phelps 2000). For a model to be accurate at these scales, many chemicals must be represented as individual molecules.
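The copy-number arithmetic for the bacterium can be checked with a short calculation. The Python sketch below converts a molar concentration and a volume into an expected number of molecules. The function name is hypothetical and introduced only for illustration; the only constant added is Avogadro's number.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def copy_number(concentration_molar, volume_litres):
    """Expected number of molecules at a given molar concentration and volume."""
    return concentration_molar * volume_litres * AVOGADRO

# A 1 uM protein in a bacterium of volume 2e-16 L (values taken from the text).
copies = copy_number(1e-6, 2e-16)
print(f"{copies:.0f} copies")
```

The result is on the order of one hundred molecules, which is the scale at which a continuous concentration stops being a good description.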

The cytosol itself is inhomogeneous, and proteins are present in many unique environments – some are in solution, some are membrane-bound and others interact with cytoskeletal elements. Organelles and vesicles divide the aqueous environment into compartments as well as providing isolated membranes to which proteins can be targeted. In eukaryotes there are also non-membrane-bound subcompartments partitioning the cytosol, known as “peripheral cytoplasmic domains”, that can be observed by microinjecting fluorescent tracers into different parts of the cell. The boundaries of these, which are likely formed from actin and actin-binding proteins, interfere with the motion of large molecules and have an estimated pore size of 50 nm (Luby-Phelps and Taylor 1988; Janson, Ragsdale et al. 1996).

In eukaryotes, three types of cytoskeletal structures form a complex web throughout the cell that impedes the movement of some substances while providing paths that aid the transportation of others. Many intracellular particles are attracted to each other through electrostatic interactions, Van der Waals attractions and hydrophobic effects. Some of these interactions occur only between specific partners, while others are caused by promiscuously binding proteins. Macromolecular crowding from all these components and steric hindrance force individual particles to follow longer paths to travel the same distance than they otherwise would in a dilute solution, resulting in a decrease in the rate of diffusion.

Two techniques are commonly used to measure diffusion of molecular species in vivo: fluorescence recovery after photobleaching (FRAP) and fluorescence correlation spectroscopy (FCS). With the first method, a laser burst is used to photobleach an area of fluorescent molecules, which destroys the fluorophore and eliminates the fluorescence in that area. The time for fluorescence to return, which is interpreted as the time required for unbleached molecules to diffuse into the area, is measured, and from that the diffusion rate can be calculated. FCS is used when the concentration of fluorophores is low enough that fewer than 100 molecules are present in the detection area. By carefully monitoring how the fluorescence changes as individual particles enter and leave the focal volume, it is possible to calculate the diffusivity of the medium in which the particles are moving. The diffusivity in a particular medium, such as the cytosol, is defined by the diffusion coefficient D, which has units of length^2 / time. In the absence of any cytosolic streaming or other directed flow, the time required for a molecule to diffuse a given distance is proportional to the square of that distance and inversely proportional to D, meaning that the transit time for a given metabolite doubles when the diffusivity is halved.
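The relationship between diffusion coefficient and transit time can be made concrete with a small calculation. The sketch below assumes unobstructed three-dimensional diffusion, for which the mean squared displacement grows as 6Dt, so a characteristic time to travel a distance x is x^2 / (6D). The function name is hypothetical, and the example uses the in vivo value for GFP in E. coli from Table 1.1.

```python
def characteristic_time(distance_m, diffusion_m2_per_s, dimensions=3):
    """Characteristic diffusion time from <r^2> = 2 * dimensions * D * t."""
    return distance_m ** 2 / (2 * dimensions * diffusion_m2_per_s)

# GFP in E. coli cytoplasm (Table 1.1): D = 7.7e-12 m^2/s, crossing 1 micrometre.
t_cell = characteristic_time(1e-6, 7.7e-12)
# The same distance in water, using Do = 87 um^2/s = 8.7e-11 m^2/s.
t_water = characteristic_time(1e-6, 8.7e-11)
print(t_cell, t_water)
```

Because the time scales with the square of the distance, doubling the separation between two enzymes quadruples the expected transit time of an intermediate, which is why short inter-enzyme distances matter for channelling.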


Table 1.1: Diffusion Rate Measurements

Diffusion rates for particles of different sizes measured in vivo and in water. While all molecules move more slowly inside a cell, the difference is greater for larger, protein-sized particles than for compounds with a smaller molecular weight. The diffusion coefficient, D, is given in both the units quoted in the original publication, and in m^2/sec. D/Do shows the ratio of in vivo to in vitro diffusion rates (Neil, Duong et al. 1996; Arrio-Dupont, Foucault et al. 1997; Swaminathan, Hoang et al. 1997; Elowitz, Surette et al. 1999; García-Pérez, López-Beltrán et al. 1999; Lacount, Vignali et al. 2005; Stark, Breitkreutz et al. 2006).

Particle | Medium | D (quoted) | D (m^2/sec) | Do (in water) | D/Do | Reference
GFP | E. coli cytoplasm | 7.7 μm^2/s | 7.70E-12 | 87 μm^2/s | 0.088 | Elowitz
Enolase | Rabbit muscle cells | 13.5 μm^2/s | 1.35E-11 | 56 μm^2/s | 0.24 | Arrio-Dupont
GFP | CHO cell cytoplasm | 2.7e-7 cm^2/s | 2.70E-11 | 8.7e-7 cm^2/s | 0.31 | Swaminathan
Lactate | Rat erythrocyte | 0.21e-5 cm^2/s | 2.10E-10 | 0.66e-5 cm^2/s | 0.32 | Garcia-Perez
Water | Rat erythrocyte | 0.9e-5 cm^2/s | 9.00E-10 | 2.0e-5 cm^2/s | 0.45 | Garcia-Perez
Cesium | Rat | 0.91e-3 mm^2/s | 9.10E-10 | 1.9e-3 mm^2/s | 0.48 | Neil

Within living cells, it has been observed that the diffusivity of proteins, most metabolites and even water is lower than in aqueous solution. The diffusion coefficient decreases as the size of the moving particle increases, but only if the particle has no significant binding interactions. Protein radius does not usually correlate with D, because protein-protein interactions slow a protein's movement more than its bulk does. Small changes in the primary structure of GFP, for example, were observed to change its diffusivity within E. coli cells significantly, presumably due to the differences they introduced in its binding with other intracellular particles (Elowitz, Surette et al. 1999).

1.1.5 Previous Research on Metabolic Channelling Modeling

Simulations have been used to investigate specific instances of metabolic channelling in the last five years, prior to which it was only considered in general, hypothetical ways. In 1993, for example, a mathematical representation of an arbitrary pathway was constructed to investigate whether intermediate metabolite concentrations can be reduced through channelling (Cornish-Bowden and Cárdenas 1993). Tryptophan synthase, one of the earliest examples of known metabolic channelling, was modeled from the perspective of determining how many indole molecules occupy the tunnel between enzyme active sites. Only the two enzyme subunits, plus each of the metabolites involved in the reaction, were incorporated in this simulation (Degenring, Röhl et al. 2004).


Myocardial ischemia, a coronary disease caused by lack of sufficient blood flow to cardiac tissue, has been investigated from the perspective of metabolic channelling. One study of this disease was performed by constructing an ODE-based model of cardiac glycolysis in LSODE, the Livermore Solver for Ordinary Differential Equations (Radhakrishnan and Hindmarsh 1993). This is a generic ODE solver, similar to E-CELL (see section 1.1.5.2), that uses robust numerical methods to evaluate the evolution of systems of differential equations. The model itself consisted of 31 metabolic compounds involved in glycolysis, the tricarboxylic acid cycle and β-oxidation. Transport processes were implemented to move the compounds between each of three major compartments: extracellular blood, the cytosol and the mitochondria. Equations were constructed for the relevant enzyme-catalyzed metabolic reactions, connecting the metabolites with each other and forming a metabolic pathway. Although few of the numerous kinetic parameters in the model have been accurately measured, parameter sweeps were used to assign plausible values to each of the unknowns.

The model these researchers used was adapted from a previous one by the addition of a glycolytic sub-domain to enable metabolic channelling within glycolysis. Many of the glycolytic enzymes in cells are thought to be colocalized to structures near the sarcolemma and sarcoplasmic reticulum. It was found that adding this feature to explicitly model channelling allowed the simulations to more accurately replicate experimentally determined results. Without modeling channelling, simulations had inaccurately predicted a delay between when ischemia was induced and glycogen breakdown began. As well, the overall rate of lactate production in previous models was much slower than what occurred in vivo. The addition of glycolysis channelling to their model solved both of these issues. The new model's improved fit to reality was considered evidence that glycolytic enzymes are likely to be colocalized in cardiac muscle cells (Zhou, Salem et al. 2005).

A model of the urea cycle was used to examine the hypothesis that arginine is channelled between the enzyme that produces it and the enzyme that consumes it. This metabolic pathway, which is performed in hepatocytes, converts bicarbonate and ammonium into urea as a way for the body to dispose of waste. Evidence for metabolic channelling had come from two sources. First, urea synthesis enzymes undergo specific spatial organization in liver cells, potentially indicating an evolutionary significance to their proximity. Second, radiolabeling experiments had shown that intracellular arginine pools are not free to mix with the arginine being used in urea production but are instead formed from bicarbonate.


The researchers constructed an ODE model within the mathematical modelling tool Mathematica in order to determine whether channelling alone can account for observed differences in radioisotope distribution. They started with an existing model containing three compartments: cytoplasm, mitochondria and an extracellular medium. Differential equations controlling transportation flux between compartments and biochemical reaction velocities were defined, with parallel equations created to represent the change of chemically indistinguishable radiolabeled metabolites. Experimentally determined kinetic constants were used where available, and estimated by parameter fitting otherwise.

A term was added to the differential equations controlling arginine concentration to replicate the effects of metabolic channelling between the two enzymes. Termed a “free mixing factor” (fm), this parameter represented how much of the arginine was free to diffuse between the enzyme microenvironment and the cytosol. A value of zero represented complete channelling, with no intermediate arginine reaching the cytosol, and a value of one represented a well-mixed solution. Simulations showed that an fm of 0.1 was best able to replicate the in vivo results. Introducing metabolic channelling to the mathematical model was able to improve the simulation results dramatically, supporting the hypothesis that channelling is an important feature of the pathway in vivo (Maher, Kuchel et al. 2003).

The aim of my research is to explore the impact of spatial localization and substrate channelling on enzyme activity. To construct in silico models of metabolic systems, two simulation tools were used: Cell++ and E-CELL. Both are capable of accurately modelling the flow of metabolites through a system of enzyme-catalyzed reactions, and both allow the definition of spatial constraints. One metabolic system was modelled with both tools in order to gain insight through a comparison of the similarities and differences in the results. As both models are meant to represent the same underlying system, those aspects of the results that are the same across simulation methods are likely to represent features of the real biochemical system. Differences, however, most likely represent artefacts of the simulation or a failure of a simulator. In this case, comparing E-CELL against Cell++ serves to verify that each is functioning correctly. Both Cell++ and E-CELL experiments are potentially subject to errors, brought on either through malfunctions or by the flexibility of the program and the potential for it to be misconfigured.


1.1.5.1 An Overview of the Cell++ Simulation Environment

Cell++ is a tool that uses differential equations to model some processes, and particle-based agents for others. Cell++ has been used previously to model the effects of intracellular crowding on signalling pathways and geometry on wave propagation, as well as other metabolic systems (Sanford et al. 2006).

The simulation takes place in a cubic lattice which both defines the environment of the cell and stores concentration fields for each of the small molecules being modeled. Configuring each space within this lattice allows the cell to be represented as an arbitrary three-dimensional shape. Each cube within this lattice is initialized to one cellular environment type, such as cell membrane or cytosolic space, which will affect how readily different molecules will be able to diffuse within it. Compounds whose populations are too large to be modeled as individual molecules, such as calcium or metabolic intermediates, are represented by their concentrations within each lattice cell.

Starting from an initial configuration, time progresses via constant increments typically on the order of 1µs. At each step, changes are caused by three types of operations: the diffusion of small molecules, the motion of larger particles and enzymatic reactions.

Partial differential equations based on Fick's law of diffusion control the flux between neighbouring lattice cells. This law, shown for one dimension, relates the change in concentration over time to the local concentration gradient and a constant that defines the diffusivity.

$$\frac{\partial f}{\partial t} = \kappa \frac{\partial^2 f}{\partial x^2} \qquad (2)$$

This partial differential equation relates the concentration change over time to the instantaneous concentration gradient at that point and a diffusion coefficient kappa.

Cell++ uses the Euler forward method (Petzold 1998) for its numerical approximation of diffusion:

$$\frac{C_x^{n+1} - C_x^{n}}{\Delta t} = \kappa \, \frac{C_{x-1}^{n} - 2C_x^{n} + C_{x+1}^{n}}{\Delta x^2} \qquad (3)$$

With the Euler method, the change in concentration at point x as time advances from one time step n to the next, n+1, is related to the concentrations of its neighbours at time n.


This technique, which is first-order accurate in time and second-order accurate in space, is conditionally stable. So long as no more than 1/6th of the concentration in any lattice cell is moved to any one neighbour in a single time-step, mathematical instabilities such as negative concentrations are guaranteed not to occur.
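To illustrate the forward Euler update of equation (3), the following Python sketch advances a one-dimensional concentration profile by a single time step. It is a minimal illustration and not the Cell++ implementation: the function name, the closed (no-flux) boundaries and the one-dimensional lattice are all assumptions. In one dimension each cell has two neighbours, so the analogous stability bound is κΔt/Δx^2 ≤ 1/2; the 1/6 limit quoted above corresponds to the six neighbours of a cubic lattice cell.

```python
def diffuse_step(conc, kappa, dt, dx):
    """One forward-Euler diffusion step on a 1-D lattice (equation 3).

    Boundaries are closed: an edge cell mirrors its own value in place of
    the missing neighbour, so total concentration is conserved.
    """
    r = kappa * dt / dx ** 2  # fraction passed to each neighbour per step
    if r > 0.5:
        raise ValueError("time step too large for stable 1-D forward Euler")
    last = len(conc) - 1
    new = []
    for x, c in enumerate(conc):
        left = conc[x - 1] if x > 0 else c
        right = conc[x + 1] if x < last else c
        new.append(c + r * (left - 2.0 * c + right))
    return new

# A pulse of metabolite released in the middle of a five-cell lattice.
profile = [0.0, 0.0, 1.0, 0.0, 0.0]
profile = diffuse_step(profile, kappa=1.0, dt=0.25, dx=1.0)
print(profile)
```

Rejecting oversized time steps up front mirrors the conditional stability discussed above: within the bound, every updated value is a weighted average of non-negative neighbours and so cannot become negative.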

Figure 1.1: Cell++ Simulation Environment

This screen capture shows a simulation in progress using Cell++. The large, grey, wire-frame box represents the boundary of the simulation. Within it, the dots show the location of each individual enzyme particle and the purple area in the bottom left corner illustrates an expanding metabolite concentration field. In the centre of this simulation is a compartment to which two species of enzyme are confined.


Larger, less numerous molecules, such as enzymes, are modeled as individual particles free to roam about the cell. A random walk approximates the Brownian motion that would be caused by interactions with water molecules undergoing thermal fluctuations. At each time-step, Cell++ chooses a random direction for every particle, as well as a random velocity taken from a range that is calculated to approximate the specified diffusion coefficient. Although this capability is not used in the glycolysis simulation, the particles can be configured to interact with each other when they are close together, forming complexes through binding interactions, or changing molecular states, such as with phosphorylation events. Enzymes perform their catalytic activities at the location they occupy at the end of each time step, converting metabolites that are present within the lattice unit they occupy.
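A random walk that reproduces a chosen diffusion coefficient can be sketched as follows. This is not the Cell++ algorithm, which draws a random direction and a velocity from a calibrated range; as a stand-in, the sketch uses independent Gaussian displacements with variance 2DΔt along each axis, a standard way of generating Brownian steps. All function names and parameter values are hypothetical.

```python
import math
import random

def brownian_step(pos, diffusion, dt, rng):
    """Move a particle by one Gaussian step with per-axis variance 2*D*dt."""
    sigma = math.sqrt(2.0 * diffusion * dt)
    return tuple(c + rng.gauss(0.0, sigma) for c in pos)

def mean_squared_displacement(diffusion, dt, steps, particles, seed=0):
    """Average squared distance from the origin after `steps` random steps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(particles):
        pos = (0.0, 0.0, 0.0)
        for _ in range(steps):
            pos = brownian_step(pos, diffusion, dt, rng)
        total += sum(c * c for c in pos)
    return total / particles

# D = 7.7 um^2/s (GFP in E. coli, Table 1.1), 100 steps of 1 ms each.
msd = mean_squared_displacement(7.7, 1e-3, steps=100, particles=2000)
print(msd, "um^2; theory 6*D*t =", 6 * 7.7 * 0.1)
```

Comparing the simulated mean squared displacement with 6Dt is a simple way to confirm that a step-size rule actually delivers the diffusion coefficient it was calibrated for.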

1.1.5.2 An Overview of the E-CELL Simulation Environment

E-CELL is a widely used cell modeling tool based on the numerical approximation of multiple ordinary differential equations. While it does not explicitly model space, structure within the cell is described by defining cellular compartments and the compounds they will hold. These compartments are generally used to represent membrane-bound organelles, as well as intra- and extra-cellular metabolite spaces. Within these compartments, pools of compounds are specified which model the chemical concentrations at that location. Biochemical processes are then added, directing metabolite flow as the simulation proceeds. Each process, which is connected to one or more compound pools, is associated with a differential equation that relates the concentrations in those pools to an instantaneous rate of change.

A fundamental difference from the way Cell++ operates is how E-CELL treats the passing of time. The numerical approximation engine, as it is running, is capable of dynamically adjusting the length of each time-step based on the rates of concentration changes and the level of error that can be tolerated. This allows E-CELL to operate on multiple timescales in a single simulation, accounting for both fast and slow processes (Takahashi, Kaizu et al. 2004). Cell++, in contrast, represents time with a set of equally sized, discrete intervals.

E-Cell, which models enzymatic reactions as ordinary differential equations acting on compartments of metabolites, has been used to simulate several whole-cell systems. In 2001 it was used to explore a self-sufficient cell model based on Mycoplasma genitalium that consisted of 127 coding genes and 495 reaction rules. It took into account enzymatic reactions, complex formation and transporters, and succeeded in making the experimentally verified prediction that ATP usage should spike at the moment of cell starvation (Tomita 2001). More recently, E-Cell was employed to simulate an entire red blood cell's metabolism, in the hope of learning more about medically relevant conditions such as anaemia and glucose-6-phosphate deficiency. Since erythrocytes lack any , the model was simpler to develop, and focused on the glycolytic, pentose phosphate and metabolic pathways (Nakayama, Kinoshita et al. 2005).

1.2 Project Overview

The aim of this study is to better understand metabolic channelling – why it occurs and what its effects are – and to use this knowledge to predict enzyme pairs in which channelling is likely to occur. Simulations were used to investigate substrate channelling through the modeling of a simple pathway. In addition to providing intuitive insights by the visualization of a metabolic system in action, these simulations were able to provide quantitative information on the thermodynamic, kinetic and inter-enzyme distance conditions under which channelling can occur. The simulation results were also able to show some of the effects that metabolic channelling can potentially have; an acceleration of the reaction system was observed, as well as changes to the concentrations of intermediate reaction products.

The information gained from these simulations was then used to propose hypotheses regarding what situations are likely to result in the evolution of channelling interactions. This established the criteria under which a variety of databases were searched in order to predict likely channelling pairs. Thermodynamic measurements and estimates were used to infer channelling partners by the change in Gibbs free energy experienced in sequential biochemical reactions. Sources of toxicity and enzyme inhibition data suggested metabolic intermediates that might be undesirable in the cytosol at high concentrations and therefore may be channelled directly between enzymes. These predictions were verified against lists of enzymes which are known to catalyze multiple sequential reactions and enzymes which were observed to interact closely in protein-protein interaction networks.

To investigate the effects of metabolic channelling, a simple, linear stretch of the glycolysis pathway was computationally modeled. The model, which was configured to use realistic parameters for the enzyme kinetics and reaction thermodynamics, incorporated spatial localization information for all of the enzymes and metabolites. As well, chemical diffusion was simulated explicitly, producing metabolite transit times similar to those expected in living cells. Two simulation environments were used: Cell++ and E-CELL. The majority of the experiments were performed using Cell++, a tool developed in the Parkinson lab, whose primary strength is the explicit modeling of metabolite concentration fields and three-dimensional enzyme locations. The remainder of the simulations were verifications using E-CELL, which was chosen because of the maturity and reliability of the software.

Within the context of this system, channelling between each of the three pairs of biochemically consecutive enzymes was modeled. The resulting data demonstrated that channelling is capable of circumventing thermodynamic barriers, which accelerates reaction rates for some pairs of reactions. As well, channelling was shown to decrease the cytosolic concentration of intermediate metabolites, which is an important consequence when those compounds interfere with the biochemistry of the cell. The simulation results form the basis of metabolic channelling enzyme pair predictions that are made from Gibbs free energy values, toxicity information and enzyme inhibition data.

These predictions serve two purposes: They attempt to validate the expected evolutionary motivations for channelling, and they propose novel candidate pairs. The prediction lists are compared with other indicators for metabolic channelling: the existence of multifunctional enzymes and protein-protein interactions with the capability of physically linking enzyme pairs. Successful predictions – those which include known channelling partners – provide validating evidence for the hypotheses they are based on and suggest enzyme pairs that may be worth further investigation.

Many metabolites have the potential to interfere with biochemical activity in the cytosol, such as toxic intermediates, or chemicals that have an inhibitory effect on other enzymes. The cell is expected to benefit from channelling toxic or inhibitory compounds, to reduce their exposure to the cytosol. Using this hypothesis, a list of channelling candidates was generated from enzyme pairs whose reactions both involve an intermediate molecule that may be undesirable to the remainder of the cell because of its toxic or inhibitory properties.

Because physically joining two enzymes can spatially colocalize their functions within the cell, a list of enzyme pairs whose functions were found to both be part of a single multifunctional enzyme in any organism was used to generate a list of channelling predictions. As well, a verification of the channelling predictions was performed through a comparison to several recent surveys of physical protein-protein interactions.


The simulations performed in this study demonstrate some of the effects of metabolic channelling and the requirements for it to occur. This information is used to predict likely channelling partners, which are then validated through a comparison with expected channelling indicators.

2 Glycolysis Simulation

2.1 Introduction

By decreasing the transit time of intermediate compounds, metabolic channelling is hypothesized to circumvent thermodynamic barriers, resulting in increased reaction rates for some pairs of reactions. As well, channelling is expected to reduce the cytosolic concentration of the intermediates, which may be an important consequence if those compounds interfere with the biochemistry of the cell. To investigate these potential effects of substrate channelling, simulations were performed on a simple metabolic system. Experimentally determined reaction thermodynamics and enzyme kinetics were used to ensure accurate and meaningful results would be obtained. The spatial placement of enzyme particles was varied using multiple simulations, allowing for the exploration of co-localization induced metabolic channelling in different scenarios.

Modeling the activity of biochemically consecutive enzymes requires an accurate representation of the reaction dynamics. Enzyme activity needs to be modeled at the level of reaction velocity, taking into account parameters such as the forward and reverse reaction speeds, the equilibrium constants and enzyme concentrations. Precise details of the reaction mechanism, however, do not need to be simulated as the hypothesis deals only with the motion of intermediates once they are formed and not with the specifics of how they are altered.
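As an illustration of modeling at the level of reaction velocity, the sketch below evaluates one common reversible Michaelis-Menten rate law for a single-substrate, single-product reaction. The text does not state the exact rate equation used in the simulations, so this form is an assumption. The Michaelis constants follow the values given in Table 2.2 for the conversion of 2-phosphoglycerate to phosphoenolpyruvate, while the maximum velocity and equilibrium constant are placeholder values chosen only to make the example run.

```python
def reversible_mm_rate(s, p, vmax, ka, kp, keq):
    """Net forward velocity of S <-> P under a reversible Michaelis-Menten law.

    s, p : substrate and product concentrations (mM)
    vmax : maximum forward velocity
    ka   : forward Michaelis constant (mM)
    kp   : reverse Michaelis constant (mM)
    keq  : equilibrium constant, [P]/[S] at equilibrium
    """
    driving_force = (s - p / keq) / ka
    saturation = 1.0 + s / ka + p / kp
    return vmax * driving_force / saturation

KA, KP = 0.04, 0.5       # mM, from Table 2.2
VMAX, KEQ = 1.0, 5.0     # hypothetical placeholder values

print(reversible_mm_rate(0.04, 0.0, VMAX, KA, KP, KEQ))   # no product present
print(reversible_mm_rate(0.1, 0.5, VMAX, KA, KP, KEQ))    # at equilibrium
print(reversible_mm_rate(0.1, 1.0, VMAX, KA, KP, KEQ))    # product in excess
```

The sign of the result captures the thermodynamic side of the model: the velocity is positive while the product is below its equilibrium level, zero at equilibrium and negative once product accumulates beyond it, which is the kind of barrier that channelling is hypothesized to circumvent.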

For channelling to occur, it is necessary that pairs of enzymes be physically located near each other in the cell, and so the simulation must model the spatial distribution of individual enzyme particles. The distance between pairs of enzymes will affect how much time intermediate compounds will take to travel and how likely they are to diffuse into the remainder of the cytosol.

Equally important is the modeling of diffusion. Because channelling is the difference between reactions that are slowed by the motions of intermediate metabolites and those that are not, it is vital that small molecule diffusion is accurately represented in the simulation. Unlike the spatial requirements for individual enzyme particles, the location and trajectory of each metabolite molecule do not need to be modeled. The paths of these chemicals as they move between enzymes only need to be considered en masse, so it is acceptable to represent them as concentration fields while still maintaining an adequate level of accuracy.


The rapid growth of biological systems modeling has resulted in the production of a large number of biochemical simulators. In the mesoscale category alone, there are dozens of tools in development that have been used to test, evaluate and make predictions from many different models. While they often use different mathematical techniques – ordinary differential equations, partial differential equations and particle evolution to name a few – all seek to achieve the same goals: visualization and prediction of the outcome of a model biological system. Although it is far from exhaustive, table 2.1 lists a number of these simulators.

Table 2.1: Other Simulators

Many groups have created cell simulation packages to model different aspects of intracellular life. This table lists a small minority of the simulators available today.

Name | Technique | Reference | URL
Cellerator | ODE | (Shapiro, Levchenko et al. 2003; Butland, Peregrin-Alvarez et al. 2005) | http://www.cellerator.org/
E-Cell | ODE | (Rain, Selig et al. 2001; Takahashi, Kaizu et al. 2004) | http://www.e-cell.org/
BioNetGen | Rule-based | (Blinov, Faeder et al. 2004; Ewing, Chu et al. 2007) | http://bionetgen.org/
Stocks | Reaction-diffusion | (Kierzek 2002; Butland, Peregrin-Alvarez et al. 2005) | http://www.sysbio.pl/stocks/
Moleculizer | Discrete event | (Rain, Selig et al. 2001; Lok and Brent 2005) | http://www.molsci.org/~lok/moleculizer/
Virtual Cell | ODE | (Loew and Schaff 2001; Rain, Selig et al. 2001) | http://www.vcell.org/
Mcell | CA | (Stiles, Bartol et al. 2001) | http://www.mcell.cnl.salk.edu/
BioSPICE | Toolset | (Kumar and Feidler 2003) | http://biospice.sourceforge.net/
GEPASI | ODE | (Mendes 1997) | http://www.gepasi.org/
Smoldyn | Reaction-diffusion | (Andrews and Bray 2004) | http://www.smoldyn.org/
StochSim | Reaction-diffusion | (Le Novere and Shimizu 2001) | http://www.ebi.ac.uk/~lenov/stochsim.html
MesoRD | Reaction-diffusion | (Hattne, Fange et al. 2005) | http://mesord.sourceforge.net/
JigCell | ODE | (Vass, Allen et al. 2004) | http://jigcell.biol.vt.edu/
Hy3S | ODE | (Salis, Sotiropoulos et al. 2006) | http://hysss.sourceforge.net/
Cell++ | Reaction-diffusion | (Sanford, Chris et al. 2006) | http://www.compsysbio.org/CellSim/
CellWare | ODE | (Dhar, Meng et al. 2004) | http://www.cellware.org
Dynetica | ODE | (You, Hoonlor et al. 2003) | http://www.duke.edu/~you/Dynetica_page.htm
SmartCell | ODE | (Ander, Beltrao et al. 2004; Karp, Keseler et al. 2007) | http://smartcell.embl.de/
CancerSim | CA | (Abbott, Forrest et al. 2006; Karp, Keseler et al. 2007) | http://www.cs.unm.edu/forrest/software/cancersim/
Simcell | CA | (Kanehisa and Goto 2000; Wishart, Yang et al. 2004) | http://wishart.biology.ualberta.ca/SimCell/

While not ideal due to its lack of explicit spatial modeling capability, E-CELL is a well-established modeling platform that satisfies most of the requirements for this system (Ishii, Robert et al. 2004; Nakayama, Kinoshita et al. 2005; Ohno, Naito et al. 2008). Being mainly an ODE solver, E-CELL has the facility to model arbitrary differential functions, such as those that define reaction dynamics, and to solve them accurately using sophisticated numerical approximation techniques. There is no explicit way to represent individual enzyme particles within E-CELL; however, this can be approximated by defining very small compartments and assigning sufficiently low enzyme concentrations to them. Similarly, while diffusion is not a predefined function, it can be programmed into an E-CELL model by applying Fick's Law (equation (2)) between compartments.
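The idea of applying Fick's law between well-mixed compartments can be sketched in a few lines. The Python example below is not E-CELL model code: it is a hypothetical two-compartment exchange in which the amount transferred per unit time is proportional to the concentration difference, integrated with fixed forward Euler steps. The exchange coefficient k (units of volume per time) and all numerical values are assumptions made for illustration.

```python
def exchange(c1, c2, v1, v2, k, dt, steps):
    """Diffusive exchange between two well-mixed compartments.

    c1, c2 : concentrations in each compartment
    v1, v2 : compartment volumes
    k      : exchange coefficient (volume per time), hypothetical
    """
    for _ in range(steps):
        moved = k * (c1 - c2) * dt   # amount transferred from 1 to 2
        c1 -= moved / v1
        c2 += moved / v2
    return c1, c2

# A small compartment loaded with metabolite next to a larger, empty one.
c1, c2 = exchange(c1=1.0, c2=0.0, v1=1.0, v2=3.0, k=0.5, dt=0.01, steps=5000)
print(c1, c2)
```

Because each transfer removes from one pool exactly the amount it adds to the other, the total quantity of metabolite is conserved and both compartments relax toward the same concentration, which is the behaviour a compartment-based diffusion term is meant to capture.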

The simulation engine designed in the Parkinson Lab, Cell++, was created specifically to investigate spatio-temporal biochemical phenomena, and so it possesses all of the features required for these channelling experiments. Its primary advantage over E-CELL for these experiments is that spatial relationships are modeled explicitly. Enzymes are treated as particles whose locations and motions are modeled precisely as time progresses, but smaller compounds such as metabolic intermediates are treated as three-dimensional concentration fields. Those fields are constantly acted on by the two other factors needed to model channelling: accurate reaction dynamics that occur at the locations of enzyme catalysts, and realistic diffusion throughout the cytosol.

2.1.1 Details of the Glycolysis Pathway

A segment of the glycolysis pathway was modeled for three reasons: It is an important, central biochemical pathway found throughout life. As well, glycolysis is well characterized, with significant data available on the enzymatic mechanisms involved. Glycolysis is the entry to carbohydrate metabolism; it converts glucose, a basic sugar used by almost all organisms, into pyruvate, the metabolite which fuels both aerobic and anaerobic respiration, and can be converted into other carbohydrates or fatty acids. In cells which lack mitochondria, it is the primary source of energy. This pathway is present in virtually all known organisms.


Figure 2.1: Glycolysis Pathway

The subset of glycolysis being modeled is highlighted within the entire glycolysis/gluconeogenesis pathway, with enzymes present in S. cerevisiae shaded in green. While the whole pathway is highly connected to the rest of the cell’s biochemistry, this subset is relatively isolated.


Glycolysis is a fully reversible pathway and is known as gluconeogenesis when it flows in the direction of glucose production. Misregulation of glycolysis, through a failure to produce sufficient insulin, results in diabetes mellitus, and several treatments for this illness work by stimulating glycolysis or inhibiting gluconeogenesis. The simulations in this chapter follow glycolysis through the linear metabolic pathway from 1,3-bisphosphoglycerate through four enzyme-catalyzed reactions to pyruvate. Although glycolysis is connected to many other pathways, these five metabolites are involved in relatively few side reactions, which reduces the complexity required to construct an accurate model. See figures 2.1 and 2.2.

Figure 2.2: Glycolysis Linear Segment

Four enzyme-catalyzed reactions and five metabolites constitute the modeled section of glycolysis. Simulations began with an initial release of 1,3-bisphosphoglycerate and proceeded to form the remaining compounds over time.

Because glycolysis is a well-studied biochemical pathway, the kinetic and thermodynamic values for each of the reactions have been experimentally determined (Teusink, Passarge et al. 2000). For each of the four enzymes being modeled, experimentally determined data were available detailing all of the relevant reaction constants: the forward and reverse Michaelis constants, the equilibrium constants and the maximum achievable reaction velocity (Table 2.2).

Table 2.2: Glycolysis Model Constants

Constants used in both the Cell++ and E-CELL glycolysis models. These values were experimentally determined and reflect the activity of S. cerevisiae enzymes acting in vitro.

Ka: forward Michaelis constant (mM). Kp: reverse Michaelis constant (mM). Keq: equilibrium constant. Vmax: maximum reaction velocity (µmol · min^-1 · mg protein^-1).

Enzyme                    Reactant                  Product               Ka          Kp     Keq         Vmax
phosphoglycerate kinase   1,3-bisphosphoglycerate   3-phosphoglycerate    3.00E-003   0.53   3.20E+003   4.8
phosphoglycerate mutase   3-phosphoglycerate        2-phosphoglycerate    1.2         0.1    0.19        9.4
enolase                   2-phosphoglycerate        phosphoenolpyruvate   0.04        0.5    6.7         1.35
pyruvate kinase           phosphoenolpyruvate       pyruvate              0.14        21     6.50E+003   4.05

2.2 Cell++ Simulations of Metabolic Channelling

2.2.1 Details of the Metabolic Channelling Model

Simulations were performed within a cube-shaped environment, defined by a cubic lattice ten units in height, width and depth. The spatial resolution of this lattice was 50 nm, meaning that the entire volume of space simulated was a box 500 nm on each side with a volume of 1.25 × 10^-16 L (125 aL). This virtual cell represents a fraction of a yeast cell that is approximately one seventh the length, and one three hundredth the volume of a living cell (Johnston, Ehrhardt et al. 1979).

The Cell++ simulations modelled 50ms of reaction time using intervals of 5µs. Because Cell++ uses the Euler method of numerical approximation, the simulation becomes unstable if the time represented per iteration is too large. The value at which this instability occurs is dependent on the size of the lattice unit and the magnitude of the diffusion coefficient. Smaller time-steps, however, require greater amounts of computational resources for the simulation to complete. This timescale was chosen as it is close to the maximum allowable without losing any accuracy due to numerical instabilities. In all of the simulations, more than 50% of the initial 1,3-bisphosphoglycerate was converted into pyruvate within 50ms.

In experiments where yeast cells were incubated with a 100 mM glucose solution, intracellular pyruvate levels were measured at a concentration of 1.85 mM (Teusink, Passarge et al. 2000). To create a similar metabolic state in the virtual cell, at time zero an initial 1 M concentration of 1,3-bisphosphoglycerate was placed in the corner lattice unit of the cell, from where it diffused throughout the cell. Averaged over the 1000 lattice units in the simulation, this initial dose represents a total concentration of 1 mM.

Enzymatic reactions are calculated, using the forward Euler method, at each time step. All enzymes in these simulations are modeled by the one-substrate, one-product, reversible Michaelis-Menten reaction velocity equation (Teusink, Passarge et al. 2000).

$$v = V_{max} \, \frac{\dfrac{a}{K_a}\left(1 - \dfrac{p/a}{K_{eq}}\right)}{1 + \dfrac{a}{K_a} + \dfrac{p}{K_p}} \qquad (4)$$

The forward and reverse Michaelis constants (Ka and Kp), the reactant and product concentrations (a and p respectively) as well as the equilibrium constant (Keq) and maximum velocity (Vmax) determine the instantaneous rate of each reaction. Only the metabolite concentration within the lattice cell of each enzyme is considered during this step, ensuring that the reactions are spatially confined to the location of their catalysts.
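As an illustration of how equation (4) can be evaluated at each time step, the following is a minimal sketch rather than the Cell++ implementation. The function names are hypothetical, and Vmax is assumed to have already been converted into concentration units per unit time, a conversion that is not described here. The numerator is written as (a − p/Keq)/Ka, which is algebraically identical to the form in equation (4) but avoids dividing by a when the substrate concentration is zero.

```python
def reversible_mm_rate(a, p, vmax, ka, kp, keq):
    """One-substrate, one-product reversible Michaelis-Menten rate (equation (4)).

    a, p   : local reactant and product concentrations (mM)
    ka, kp : forward and reverse Michaelis constants (mM)
    keq    : equilibrium constant (dimensionless)
    vmax   : maximum velocity, ASSUMED here to be in mM per unit time
    """
    numerator = vmax * (a - p / keq) / ka   # same as vmax * (a/ka) * (1 - (p/a)/keq)
    denominator = 1.0 + a / ka + p / kp
    return numerator / denominator


def euler_reaction_step(a, p, dt, **constants):
    """Advance one enzyme's local concentrations by one forward Euler step."""
    v = reversible_mm_rate(a, p, **constants)
    return a - v * dt, p + v * dt


# Phosphoglycerate mutase constants from Table 2.2 (vmax units are an assumption).
PGM = dict(vmax=9.4, ka=1.2, kp=0.1, keq=0.19)

if __name__ == "__main__":
    print(reversible_mm_rate(1.0, 0.0, **PGM))   # forward rate with no product present
    print(reversible_mm_rate(1.0, 0.19, **PGM))  # rate when p/a equals Keq
```

With these constants the computed rate falls to zero when the product-to-reactant ratio equals Keq, and becomes negative (net reverse flux) beyond that ratio.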

Ten particles of each enzyme were initialized at random locations throughout the cell volume. Yeast cells are expected to have a much higher concentration of glycolytic enzymes than this; however, the limited lattice resolution would not allow for accurate channelling simulations involving more particles. As there are only 1000 lattice cells in the Cell++ model, using a number of enzyme molecules that approaches that number would saturate the system and allow channelling to occur at every point. Instead, each of the enzymes was given a reaction speed factor of 1000 to allow them to represent multiple macromolecules and increase their effective concentration to realistic levels.

As shown in table 1.1 from the introduction, small molecules such as lactate have been measured to diffuse at near 10^-10 m^2/s in the cytosol, which is one or two orders of magnitude faster than larger macromolecules with measured D values of 10^-11 and 10^-12 m^2/s. Each of the metabolites in this glycolysis simulation was assigned a diffusion coefficient of 10^-11 m^2/s, which is a low, yet realistic value that increases the chance of diffusion-limited effects being observed.

In order to model metabolic channelling, pairs of enzymes were constrained to always stay together in space, representing a metabolon formed through protein interactions. For these experiments, the motion of individual enzyme particles was shown to have no significant effect. To ensure that the virtual metabolons maintained their proximity throughout the simulation time course, the Cell++ default Brownian motion of enzymes was disabled.

For example, in one experiment every particle was paired with one enolase enzyme, and constrained to stay together throughout the simulation. Four combinations of channelling were tested: one experiment was performed with no channelling, as well as one for each of the three pairs of sequential enzymes. The average concentrations of each metabolite were recorded over the course of the simulation.

2.2.2 Inter-Enzyme Channelling Distance Parameter Sweep

In an effort to quantify the distance required to enable metabolic channelling under the modeled conditions, an inter-enzyme distance parameter scan was performed. The glycolysis reaction time was measured for 1000 replicates of the Cell++ glycolysis metabolon model. In each replicate, the four enzyme particles were initialized to a new, random location within a 20x20x20 lattice-unit cell. Distances between these enzymes, which did not change over the course of the simulation, were recorded and correlated with the reaction time. To measure distances as precisely as possible, the Cell++ environment was configured to use the smallest spatial scale that could be modeled in a reasonable amount of time, which for this experiment was 2.5 nm per lattice unit. At that spatial resolution, the time step needed to be reduced to 5.0 × 10^-8 seconds per iteration. An enzyme reaction velocity scaling factor of 10^9 was used to model the effect of multiple enzyme particles simultaneously operating at the given distances.
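The link between lattice spacing, diffusion coefficient and the usable time step can be illustrated with the textbook stability bound for an explicit (forward Euler) finite-difference diffusion scheme on a cubic lattice, dt ≤ dx^2 / (6D). This is a sketch only: it assumes a nearest-neighbour stencil, which may not match the Cell++ internals, it ignores the reaction terms and their speed factors (which may restrict the step further), and the function name is hypothetical.

```python
def max_stable_dt(dx_m, diff_coeff_m2_s, dims=3):
    """Upper bound on the time step (s) for explicit Euler diffusion.

    Uses the standard criterion dt <= dx^2 / (2 * dims * D) for a
    nearest-neighbour stencil; ASSUMED, not taken from Cell++ itself.
    """
    return dx_m ** 2 / (2 * dims * diff_coeff_m2_s)


D = 1e-11  # metabolite diffusion coefficient used in the model (m^2/s)

# (lattice spacing in m, time step used in the text in s)
configurations = {
    "channelling model (50 nm lattice)": (50e-9, 5e-6),
    "distance sweep (2.5 nm lattice)": (2.5e-9, 5.0e-8),
}

if __name__ == "__main__":
    for name, (dx, dt_used) in configurations.items():
        limit = max_stable_dt(dx, D)
        print(f"{name}: dt used = {dt_used:.2e} s, diffusion bound = {limit:.2e} s")
```

Because the bound scales with the square of the lattice spacing, reducing the spacing from 50 nm to 2.5 nm lowers the permissible time step by a factor of 400 under this criterion.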

2.3 E-CELL Simulations of Metabolic Channelling

2.3.1 Simulation Details

To model the yeast glycolysis channelling system in E-CELL, a five-compartment system was created using the E-CELL Model Builder tool, as shown in figure 2.3. This system consists of a cytosolic compartment, representing the bulk of the cytosol, and one enzymatic locale compartment for each of the four enzymes, initialized to be small enough to allow channelling. Where possible, each parameter in this simulation was assigned the same value as its analogue in the Cell++ model. The cytosolic compartment, which contains metabolite pools for each of the five compounds, was defined to have a volume of 1.25 × 10^-16 L, and each enzyme-locale compartment was set to the size of a single Cell++ lattice unit, 1.25 × 10^-19 L.
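The compartment volumes quoted above follow directly from the Cell++ lattice geometry (50 nm lattice units, ten units per side). The short sketch below works through that arithmetic, together with the dilution of the initial 1 M dose from one lattice unit into the whole virtual cell; the helper name is illustrative only.

```python
import math

NM3_TO_LITRES = 1e-24  # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def cube_volume_litres(side_nm):
    """Volume in litres of a cube with the given side length in nanometres."""
    return side_nm ** 3 * NM3_TO_LITRES


lattice_unit_l = cube_volume_litres(50)    # one Cell++ lattice unit
virtual_cell_l = cube_volume_litres(500)   # ten lattice units per side

# A 1 M dose confined to one lattice unit, averaged over the whole virtual cell.
initial_dose_molar = 1.0
average_molar = initial_dose_molar * lattice_unit_l / virtual_cell_l

if __name__ == "__main__":
    print(f"lattice unit : {lattice_unit_l:.3e} L")
    print(f"virtual cell : {virtual_cell_l:.3e} L")
    print(f"average dose : {average_molar * 1e3:.3f} mM")
```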


Within each of the smaller compartments there are two types of processes: enzymatic reactions and diffusion between compartments. The reactions converted each metabolite into the next compound in the pathway according to the kinetics of the enzymes that process them. As in the Cell++ simulation, reactions were modeled as one-substrate, one-product reversible Michaelis-Menten reactions, using equation (4). Reaction processes were defined separately for each compartment, to allow for a diffusion barrier between compartments, which is needed to enable channelling.


Figure 2.3: Glycolysis Model in E-CELL

This figure shows the glycolysis segment model as seen from within the E-CELL model building tool. Ovals represent metabolite or enzyme concentrations as well as the size for each compartment, small rectangles are processes governed by mathematical equations and large boxes delineate compartmental boundaries. Metabolic flow through this system begins as cytosolic bisphosphoglycerate (BPG), at the top of the diagram and follows the arrows through to pyruvate (PYR) at the bottom.

In this model, there are four compartments in addition to the cytoplasmic space. Each of those compartments represents the space immediately surrounding an enzyme particle and contains the processes and concentrations relevant to that enzyme. Inside the compartment labeled PGK_LOCALE are three concentrations: the concentration of the enzyme itself (PGK), of its bisphosphoglycerate substrate (BPG) and of its product, 3-phosphoglycerate (_3PGA). As well, there are three processes: BPG_DIFFUSION_PROCESS brings cytosolic BPG into the compartment, _3PGA_DIFFUSION_PROCESS returns 3PGA to the cytosol, and PGK_PROCESS performs the function of phosphoglycerate kinase to convert BPG into 3PGA.

The identifiers S0, P0 and C0 are present in this model to indicate the substrates, products and catalysts involved in each process and are used as variables to specify the differential equations.


E-CELL does not explicitly model diffusion, so two diffusion processes were defined for each locale compartment: one for the flow of the substrate into the compartment from the cytosol and another moving the product out to the bulk of the cell. Like the enzymatic processes, these equations are reversible; they are capable of both sending metabolites forward along the pathway and moving them in the reverse direction. Fick's Law (equation (2)) provided the differential equation here, as it did for Cell++, by specifying the rate at which diffusion occurs based on the surface area between compartments and a coefficient of diffusivity. The diffusion rate between compartments was set to 10^9/sec, which was chosen by parameter fitting so that the channelling-free E-CELL simulation generated approximately the same reaction time as did the equivalent Cell++ experiment.

Each of the intermediate metabolites was modeled with three linked pools: one for the concentration of that compound in the immediate vicinity of the enzyme that produces it, one for the majority of the cytosol, and one for the volume near the enzyme that acts on it next in the pathway. At the start of the simulation, the bisphosphoglycerate concentration was initialized to 1 M in the large compartment, while each of the other metabolites was initialized with a concentration of zero. The concentrations in the cytosolic pools were recorded throughout the simulation time course to provide a trace of the overall reaction.

Figure 2.3 illustrates the E-CELL model with no channelling present. Bisphosphoglycerate (BPG), the only metabolite present at the start of the simulation, has an initial concentration in the large, cytosolic compartment. A diffusion process brings the compound into the volume surrounding a phosphoglycerate kinase particle. Once inside, the BPG acts as a substrate for the enzymatic reaction process and is converted into 3-phosphoglycerate. Another diffusion process transports that intermediate out of the enzyme locale and into the cytosol. This is repeated four times, once for each enzyme in the system, resulting in the eventual production of cytosolic pyruvate (PYR).

To model channelling between enzymes, two enzyme compartments were linked by adding a process to equalize their intermediate metabolite concentrations at a near-instantaneous speed. For example, to enable channelling between phosphoglycerate mutase and enolase, a process was created that equalizes the concentrations of 2-phosphoglycerate in the PGM compartment with that of the 2-phosphoglycerate pool in the ENO compartment. Like metabolic channelling in cells, this provided the effect of having intermediates created by one enzyme immediately accessible to the active site of the next enzyme in the pathway.


2.4 Results of the Metabolic Channelling Simulations

The results from these simulations are most easily viewed as a metabolite concentration trace along the time course of the experiment. On each of these charts (figures A1 through A10 – see appendix A), time of simulation is shown on the x-axis, and the average concentration of each metabolite throughout the cell is shown on the y-axis. In addition to the concentration time course, the first derivative of that chart with respect to time is also provided, to quantitatively illustrate the rate at which each metabolite accumulates or diminishes. A summary of these simulation traces is shown in table 2.3.

The glycolytic reactions simulated in this section convert an initial dose of 1,3-bisphosphoglycerate into pyruvate. The time taken to perform this combined reaction, described as the pathway's rate or efficiency, varies dramatically as parameters within the model are changed. In none of the simulations, however, is the entire amount of initial compound fully converted into the final metabolite. Because of the thermodynamics involved, the enzymes' reaction rates slow as the reactant concentrations decrease, and so complete conversion to pyruvate can never occur in this system. This is in contrast to the situation in nature, where pyruvate is constantly being used by cellular processes. Because of this, a measurable metric was chosen to represent the overall reaction rate of the complete system: t50. This value, which is defined as the time required to convert half of the initial compound into pyruvate, scales well with the reaction velocities and can be easily determined from any of the glycolysis simulation results.
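One way that t50 could be read off a recorded concentration trace is sketched below. It assumes that the trace is available as two equal-length sequences (time points and average pyruvate concentration) and uses linear interpolation between the two samples that bracket the half-conversion level. The function name and the interpolation choice are illustrative assumptions, not a description of how the values in this chapter were extracted.

```python
def time_to_half_conversion(times, pyruvate, initial_dose):
    """Return the first time at which pyruvate reaches half of the initial dose.

    Linear interpolation is used between the bracketing samples.
    Returns None if the trace never reaches the half-conversion level.
    """
    target = 0.5 * initial_dose
    if pyruvate and pyruvate[0] >= target:
        return times[0]
    for i in range(1, len(times)):
        below, above = pyruvate[i - 1], pyruvate[i]
        if below < target <= above:
            fraction = (target - below) / (above - below)
            return times[i - 1] + fraction * (times[i] - times[i - 1])
    return None


if __name__ == "__main__":
    # Toy trace for illustration only; these are not simulation results.
    toy_times = [0.0, 1.0, 2.0, 3.0]
    toy_pyruvate = [0.0, 0.25, 0.75, 1.0]
    print(time_to_half_conversion(toy_times, toy_pyruvate, initial_dose=1.0))
```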

In addition to the total metabolic rate, we are also concerned with the evolution of the metabolite concentrations. None of the glycolytic intermediates are toxic, unstable or are expected to act as inhibitors in other pathways, but since this pathway is being studied as an example of metabolic channelling, it is useful to consider what would happen if they were. One hypothesized effect of metabolic channelling is to reduce the effects of potentially harmful compounds by keeping their cytosolic concentrations low. In these simulations, the peak concentration of each metabolite was recorded as it is informative to view how channelling affects them.


Table 2.3: Glycolysis Trace Statistics

Two statistics are used here to describe important aspects of the simulation results. Time-to-half-completion (t50) measures the length of time required to convert half of the original metabolite into pyruvate. Maximum concentration peaks for each intermediate metabolite show how their concentrations change from one simulation to another.

Modeler   Parameter                     t50 (sec)   3PGA Peak (mM)   2PGA Peak (mM)   PEP Peak (mM)
Cell++    No Diffusion (2.4.1.1)        0.42        0.65             0.065            0.049
Cell++    No Channelling (2.4.1.2)      5.63        0.82             0.074            0.057
Cell++    3PGA Channelling (2.4.1.3)    5.58        0.7              0.076            0.056
Cell++    2PGA Channelling (2.4.1.4)    1.76        0.43             0.034            0.200
Cell++    PEP Channelling (2.4.1.5)     4.92        0.82             0.072            0.000
E-CELL    No Diffusion (2.4.2.1)        0.4         0.58             0.056            0.002
E-CELL    No Channelling (2.4.2.2)      6.02        0.72             0.066            0.048
E-CELL    3PGA Channelling (2.4.2.3)    5.71        0.7              0.069            0.050
E-CELL    2PGA Channelling (2.4.2.4)    2.23        0.52             0.059            0.130
E-CELL    PEP Channelling (2.4.2.5)     4.84        0.72             0.057            0.000

2.4.1 Cell++ Simulation Results

2.4.1.1 Cell++ – No Diffusion

In order to understand the effects of channelling on this system, it is useful to consider how the reaction would proceed without any diffusion limitations. Figure A1 shows the results of the Cell++ simulation where all diffusion is configured to occur instantaneously, effectively modeling the entire system as a well-mixed reaction vessel. Without incorporating the speed-limiting effects of diffusion, this glycolytic pathway reaction reaches half-completion (t50) in 0.42 seconds. Phosphoglycerate kinase, because of its high equilibrium constant, converts BPG into 3-phosphoglycerate (3PGA) at a steady rate until the BPG is exhausted at approximately 0.2 seconds into the simulation. At this point a sharp phase transition is observed, where 3PGA ceases to be created and starts being rapidly depleted. During the first phase of the reaction, the concentrations of each of the intermediate metabolites, as well as that of pyruvate, are seen to climb at constant rates.

Once 3PGA levels start dropping at 0.2 seconds, the concentration of 2-phosphoglycerate (2PGA) immediately begins to drop, causing these two compounds to maintain a constant concentration ratio throughout the entire time course. This is due to the effects of chemical equilibrium: the reaction that forms 2PGA from 3PGA has an equilibrium constant of only 0.19, meaning that the enzyme reaction velocity will decrease to zero as the 2PGA concentration approaches one fifth that of 3PGA. Maintaining this ratio does not, however, noticeably slow the production of the end product pyruvate. Phosphoenolpyruvate production, which is the rate-limiting step in this reaction due to the low maximum reaction velocity of enolase, is able to drain excess 2PGA away from the phosphoglycerate mutase that created it. In this well-stirred model, pyruvate production is able to proceed at a fast, constant rate until nearly all of the initial compound has been converted after one second of simulation.

2.4.1.2 Cell++ – No Channelling

Many of the gross features of this simulation are unchanged from the previous one. BPG is depleted very quickly at the beginning of the simulation, causing 3PGA to peak at a high level of 0.82 mM. The next intermediates, 2PGA and phosphoenolpyruvate (PEP), maintain fairly constant, low concentrations with their respective maxima at 0.074 mM and 0.057 mM. Pyruvate production is reasonably fast throughout the simulation, with its rate slowing gradually over the later time points.

A diffusion coefficient of 10^-11 m^2/s was added to the model for each of the five metabolites. This had the dramatic effect of slowing the overall reaction by an order of magnitude, increasing the t50 from 0.42 to 5.63 seconds. Unlike the no-diffusion simulation, enzyme particles are limited by the fact that they can only operate on metabolites in their immediate vicinity. Diffusion of substrate molecules to and from the active site becomes the rate-limiting step in the reaction.

In spite of its relatively slow forward reaction rate, the BPG initially present is converted to 3-phosphoglycerate (3PGA) almost as quickly as it can diffuse to the phosphoglycerate kinase enzyme particle. With its high equilibrium constant of 3.2 × 10^3, the forward direction of the reaction is so highly favoured that the accumulation of 3PGA does not significantly affect the speed of the enzyme.

Although the second metabolite, 3PGA, quickly becomes present at a high concentration, it is processed very slowly throughout the entire simulation. The reaction that phosphoglycerate mutase catalyzes to convert 3PGA into 2PGA has an equilibrium constant of only 0.19, meaning that the forward reaction can only continue when the substrate is present at five times the concentration of the product. Once the concentrations approach this ratio in the vicinity of the enzyme, which happens in the first 100ms of the simulation, diffusion becomes rate-limiting and slows the reaction.

Both enolase and pyruvate kinase, the last two enzymes in this model pathway, catalyze reactions in which the forward reaction is favoured over the reverse. Those reactions, which produce phosphoenolpyruvate (PEP) and pyruvate (PYR) respectively, are able to process their substrates as quickly as they are produced in the previous step and delivered via diffusion. Because of this, 2PGA and PEP are maintained at a constant level throughout the simulation, both of them being produced and consumed at approximately equal rates.

In addition to a slowing of the metabolic throughput of the reaction system, a slow diffusion rate has created a period of lag before any noticeable pyruvate production. Without modeling diffusion, pyruvate production in section 2.4.1.1 was at its maximum rate of 1.2 mol/s at time zero, whereas with diffusion modeled explicitly, the initial production of pyruvate was negligible. Its maximum production rate of 0.12 mol/s was reached only after 1.68 seconds. The slow diffusion rate can be easily understood to cause this delay; as opposed to the case with instantaneous diffusion, the first substrate molecule needs to move a significant distance before it can become converted into the final product. This substrate, BPG, which begins the simulation in the corner of the cell, must diffuse to the location of the phosphoglycerate kinase enzyme to be converted to 3PGA. It then needs to flow to each of the other enzymes in turn on its course through the pathway. The time required for diffusion to bring the initial metabolite into contact with phosphoglycerate kinase is responsible for the initial lag in pyruvate production.

The third major difference in the concentration trace caused by adding diffusion to the model is the softening of phase changes. Unlike the well-mixed system, 3PGA production does not proceed at a constant rate until BPG is exhausted. The flow of BPG into the vicinity of PGK slows greatly as the concentration within the cytosol decreases, leading to a gradual reduction of enzymatic activity as the substrate concentration approaches zero. On the simulation trace, this can be seen in the exponentially decreasing BPG concentration and the gradual increase of 3PGA as it reaches its peak concentration.

2.4.1.3 Cell++ – 3PGA Channelling

In the 3PGA channelling simulation, diffusion rates were kept at the same 10^-11 m^2/s value, but the phosphoglycerate kinase and phosphoglycerate mutase particles were tethered together at the same point in space. Channelling of this intermediate did not result in any major changes to the overall reaction as compared with the channelling-free simulation results in section 2.4.1.2: the t50 value decreased very slightly, from 5.63 seconds without channelling to 5.58 seconds with it, and the 3PGA concentration peak dropped only a small amount, from 0.82 mM to 0.70 mM.


One observable difference with the channelling-free simulation is that the lag time before 2PGA production begins is reduced. This is not surprising as the distance that intermediate metabolites travel before reaching the phosphoglycerate mutase enzyme has been significantly reduced. Metabolic channelling of 3PGA in this system results in only very slight effects.

2.4.1.4 Cell++ – 2PGA Channelling

Metabolic channelling of 2PGA between phosphoglycerate mutase (PGM) and enolase (ENO) greatly increases the total metabolic throughput of this pathway. Pyruvate concentration reaches t50 at 1.76 seconds, which is three times faster than the control simulation without channelling described in section 2.4.1.2. The initial metabolite, BPG, is converted into 3PGA at approximately the same rate in this experiment as it is in the control; however, 3PGA is seen to accumulate much less quickly. It peaks at a concentration of 0.43 mM, half of what was observed in the absence of channelling, which makes it clear that the 3PGA in the system is being used at a faster rate by PGM.

The next intermediate, 2PGA, maintains a low concentration, in spite of being produced at a faster rate. This implies that the enolase reaction velocity must also have increased to match the higher rate of 2PGA formation. Enolase produces PEP faster, which is seen to amass in the cytosol at higher concentrations (0.20 mM instead of 0.057 mM, a 3.5-fold increase) as it diffuses to the pyruvate kinase enzyme.

PGM activity is increased by the presence of 2PGA channelling, and the following enzymes speed up because of the greater availability of their substrates. The cause of this acceleration is the low equilibrium constant (Keq = 0.19) in the 3PGA ↔ 2PGA reaction: PGM operates significantly faster when the local concentration of 2PGA is lower, and channelling reduces that concentration by locating ENO activity at the site of 2PGA production.

2.4.1.5 Cell++ – PEP Channelling

The simulation results where PEP is channelled between enolase and pyruvate kinase are virtually identical to those from the channelling-free control experiment in section 2.4.1.2: t50 occurs at 4.92 seconds, and the 3PGA and 2PGA concentrations peak at 0.82 mM and 0.072 mM respectively. The most significant difference is that the concentration of PEP is undetectably low through the entire simulation. While the two other channelling pairs tested each resulted in a lower level of the channelled metabolite, PEP channelling reduced the intermediate concentration much more significantly. Since the PEP ↔ PYR reaction has both a high maximum velocity and a high equilibrium constant, PEP is converted quickly.

Keeping cellular levels of specific compounds low, especially for toxic, unstable or inhibitory compounds, is one of the hypothesized effects of metabolic channelling. Phosphoenolpyruvate has none of these properties, but this simulation demonstrates how effectively some enzyme pairings can lower intermediate metabolite concentrations.

2.4.2 E-CELL Simulation Results

2.4.2.1 E-CELL – No Diffusion

Comparing the results of two different simulators that are modeling the same system can help to show which features are fundamental to the system and which are only artefacts of the simulator. The behaviour of the PEP concentration is one major difference between the E-CELL and Cell++ simulations modeling a well-mixed system. When run under Cell++, PEP concentration increases for 0.65 seconds, then diminishes for the remainder of the simulation. E-CELL, however, reports its concentration level at zero for the entire duration. Both results are plausible, and so it is difficult to say which would occur in a hypothetically well-mixed cell based on these simulations.

All of the remaining features of the concentration trace agree: the t50 values are similar, the concentrations change at approximately the same rate and the same sharp phase transition is observed at around 0.2 seconds.

2.4.2.2 E-CELL – No Channelling

Without channelling, E-CELL delivers a very similar result to the Cell++ simulation of the system. The concentration trace curves are qualitatively similar, with only minor differences in the timing and values of the concentration peaks. Because the surface area of the E-CELL compartments is not defined, the diffusion constant for this simulation was chosen via a parameter sweep as the value that results in the best match to the Cell++ simulation in section 2.4.1.2. The t50 value for this experiment was therefore constrained to be similar.

One minor artefact that was present in the 2PGA concentration change rate from the Cell++ simulation was not observed in these results. There was a small decrease in 2PGA accumulation seen at approximately 0.1 seconds for which there is no plausible explanation. The absence of this feature in the E-CELL results confirms that this is an artefact of the simulation, and was most likely caused by the low spatial resolution of the concentration field lattice.

2.4.2.3 E-CELL – 3PGA Channelling

As with the Cell++ results, the results from 3PGA channelling in the E-CELL simulation show that metabolic channelling of 3PGA between its two enzymes has very little effect on the metabolic behaviour of the system. There is one difference between the simulators, however. The Cell++ concentration change graph has 2PGA being produced at its fastest rate immediately following the start of the simulation, but the E-CELL results display a lag of 0.2 seconds before that metabolite begins accumulating quickly. This is likely due to the coincidental placement of two non-channelled enzyme particles near each other in the Cell++ virtual cell. While averaging over multiple Cell++ experiments would be expected to resolve this difference, E-CELL is immune to this effect as it does not explicitly model the precise locations of each enzyme.

2.4.2.4 E-CELL – 2PGA Channelling

Metabolic channelling of 2PGA produces the same increase in overall reaction rate in the E-CELL simulation as it did with Cell++; the t50 value is reduced approximately three-fold from the corresponding no-channelling control experiment. The specifics of how intermediate metabolite levels are affected by the enzyme tethering, however, differ in unexpected ways.

For example, the accumulation of the channelled compound, 2PGA, is not significantly lowered as would be predicted. In the E-CELL experiment, acceleration of phosphoglycerate mutase activity still occurs from the thermodynamics responding to 2PGA depletion, but the lower concentration levels in this case are more strongly localized to the site of the enzyme. As with other differences between simulators, this results from the choice of numerical methods. In Cell++, enzyme-catalyzed reactions and diffusion processes are implemented as separate stages; first the formulae are evaluated, and then the chemicals present are moved to neighbouring lattice cells. In contrast, E-CELL considers all of the reaction and diffusion events as one system of equations, and solves them simultaneously. This means that in Cell++, enolase has a chance to act on 2PGA immediately after it is created, and before it has the opportunity to diffuse away. Effectively, it is as if the two enzymes' active sites are farther apart in E-CELL than in Cell++, which approximates them at exactly the same location.


2.4.2.5 E-CELL – PEP Channelling

The E-CELL and Cell++ simulation results for PEP channelling are virtually identical. Both experiments show that channelling between enolase and pyruvate kinase can be extremely effective at reducing the accumulation of PEP in the cytosol.

2.4.3 Inter-Enzyme Distance Parameter Scan

Constraining phosphoglycerate mutase and enolase to the same location in this glycolysis model has been shown to result in an acceleration of the overall series of reactions. To investigate this further, more Cell++ simulations were employed in an attempt to determine the maximum distance over which substrate channelling is capable of enabling this effect. Using the same model configuration presented in section 2.4.1.4 to simulate 2PGA channelling, the PGM/ENO colocalization constraint was removed and the distances between each of the enzymes were varied. The simulation was repeated 1000 times, with each replicate configured to use a different set of enzyme locations.
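The sampling step of such a sweep can be sketched as follows. This is an illustrative outline, not the Cell++ configuration that was actually used: it assumes uniform random placement on the 20x20x20 lattice with 2.5 nm spacing described in section 2.2.2, measures distance as the Euclidean distance between lattice coordinates, and uses the label PYK for pyruvate kinase purely as a placeholder. Running the simulation itself for each placement is outside the scope of the sketch.

```python
import math
import random

LATTICE_UNITS = 20      # lattice units per side (section 2.2.2)
SPACING_NM = 2.5        # nanometres per lattice unit (section 2.2.2)
ENZYMES = ("PGK", "PGM", "ENO", "PYK")  # "PYK" is a placeholder label


def random_placement(rng):
    """Assign each enzyme a random lattice coordinate (ASSUMED uniform)."""
    return {
        name: tuple(rng.randrange(LATTICE_UNITS) for _ in range(3))
        for name in ENZYMES
    }


def distance_nm(pos_a, pos_b):
    """Euclidean distance between two lattice coordinates, in nanometres."""
    return SPACING_NM * math.dist(pos_a, pos_b)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [random_placement(rng) for _ in range(1000)]
pgm_eno_distances = [distance_nm(r["PGM"], r["ENO"]) for r in replicates]

if __name__ == "__main__":
    print(f"shortest PGM-ENO distance: {min(pgm_eno_distances):.2f} nm")
    print(f"longest PGM-ENO distance : {max(pgm_eno_distances):.2f} nm")
```

Each recorded distance would then be paired with the t50 value measured for that replicate.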

Each of the 1000 replicates from this parameter sweep is plotted as a single point in figure 2.4. Distance between the phosphoglycerate mutase and enolase molecules, the two enzymes whose channelling most affects reaction rate, is shown on the X-axis. When the two enzymes are in the same lattice cell, representing direct contact, the channelled distance is zero. Reaction time, measured by the t50 value for conversion of BPG to PYR, is given by each point's position on the Y-axis.

Because each of the four enzyme particles modeled in this experiment was assigned a different location within the virtual cell in every replicate, a spread of reaction times is visible for any given channelling distance. The points at zero distance, with a reaction time around 2 ms, are equivalent to the metabolon experiments described earlier. In these simulations, PGM and ENO are located in the same Cell++ lattice unit, resulting in a significantly increased metabolic velocity through this pathway. Both the upper and lower bounds on reaction time given by these results are seen to increase as the inter-enzyme distance rises, up to approximately 15 nm. This trend shows that the effect of channelling is measurable at this scale.

Figure 2.4: Inter-Enzyme Channelling Distance

The pathway's overall reaction time decreases as the inter-enzyme channelling distance approaches zero. Cell++ is unable to accurately measure reaction times at very small distances because of limitations in its lattice resolution.

Unfortunately, the precise relationship between inter-enzyme channelling distance and reaction velocity cannot be measured from these data, due to the limits on the spatial resolution. The shortest distance that can be modeled within Cell++ is a single lattice unit, which in this case is 2.5 nm. At that range, the reaction acceleration is only beginning to occur in these simulations. Additionally, the results are expected to lose accuracy when inter-enzyme distances approach the spatial resolution of the simulation, due to artefacts caused by the coarseness of the lattice.

2.5 Discussion

The simulations described in this chapter have demonstrated how two of the proposed biological motivations for metabolic channelling can be enacted by enzyme colocalization: the circumvention of thermodynamic barriers and the protection of the cytosol from toxic intermediates.

Relief of thermodynamic constraints can be accomplished when two biochemically sequential enzymes are spatially associated. If the first enzyme catalyzes an unfavourable reaction and the second catalyzes a reaction with a high equilibrium constant, then channelling of the biochemical intermediate results in a catalytic rate increase for the pair of enzymes and accelerates the two reactions. Using experimentally determined metabolite diffusion coefficients, it was shown that the two enzymes must be less than 2.5 nm apart in order for this effect to occur.

The distance of 2.5 nm is a maximum for cytosolic enzyme pairs, and the results from section 2.4.3 suggest that bringing the enzymes closer together would allow for more efficient channelling. Living cells are thought to spatially associate biochemically sequential enzymes through the formation of tight enzyme complexes known as metabolons (Srere 1987). Channelling between these enzymes can further be enhanced through specific adaptations such as intramolecular tunnels (Pan, Woehl et al. 1997) and electrostatic pathways (Stroud 1994). These features constrain the motion of intermediate molecules, effectively reducing the distance the channelled substrate travels between active sites.

As well, metabolic channelling was responsible for decreasing the cytosolic phosphoenolpyruvate concentration when enolase and pyruvate kinase were colocalized. When the rate-limiting step in a pair of sequential reactions is catalyzed by the first enzyme, these simulations show that channelling is very effective at minimizing the concentration of the intermediate compound, while maintaining the overall reaction velocity. This can be very important when that intermediate is toxic or unstable, or is otherwise undesirable in the cytosol.

Evidence for the existence of metabolic channelling in the glycolysis pathway has been previously documented (al-Habori 1995). Trypanosomes, which include disease-causing parasites, offer the best example: in these organisms the seven glycolytic enzymes required to convert glucose into glycerol 3-phosphate and 1,3-bisphosphoglycerate are all contained within a membrane-bound compartment known as the glycosome. This organelle, which is evolutionarily related to the peroxisome, effectively prevents several metabolic intermediates from entering the cytosol (Clayton and Michels 1996). Other research suggested that there may be direct metabolite transfer involving glycerol-3-phosphate dehydrogenase (Chock and Gutfreund 1988), but that has been contested by later work (Wu, Gutfreund et al. 1991).

Historically, it was believed that compartmentalization of glycolysis in this way increased the speed at which glucose was metabolized in these organisms (Clayton and Michels 1996), but more recent research has shown that the reaction thermodynamics and enzyme kinetics do not support reaction acceleration via channelling in this situation. Instead, the result of glycolysis channelling within the glycosome is hypothesized to be a reduction in the cytosolic concentrations of glucose-6-phosphate and fructose-1,6-bisphosphate. Without the enzyme compartmentalization induced by the glycosome, concentrations of these metabolites were predicted to rise to toxic levels, killing the cells through either osmotic swelling or phosphate depletion (Bakker, Mensonides et al. 2000).

It has been shown that the intracellular localization of glycolysis enzymes is regulated, and is important for the proper functioning of the pathway (al-Habori 1995). In vitro studies demonstrate that aldolase and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) contain actin-binding domains (Méjean, Pons et al. 1989), allowing them to interact directly with actin microfilaments. This association has been hypothesized to be involved in regulation, as glycolytic activity is correlated with changes in cytoskeletal structure in some cells (Bereiter-Hahn, Stübig et al. 1995). Specific localization of glycolysis enzymes is critical to their functioning in Drosophila wing muscles; mislocalization of GAPDH and aldolase in those cells leads to a flightless phenotype (Wojtas, Slepecky et al. 1997).

The unique hybrid of ordinary differential equations for chemical reactions coupled with lattice-based concentration field diffusion provided by Cell++ was required for this channelling investigation. Most metabolic simulation tools operate with only sets of ODEs and are incapable of modeling the effects of enzyme location on the overall system. Those modeling packages that do represent space explicitly generally use particles exclusively, which, while appropriate for describing signalling cascades and rough particle gradients, are too inefficient to handle metabolite concentrations.

Metabolic channelling in the Cell++ and E-CELL models was implemented by tethering pairs of enzymes to each other to reduce the distance intermediate metabolites would have to diffuse. In a living cell, this can be accomplished by creating gene fusions; by genetically combining the coding regions of two enzymes into a single gene it is possible to construct a multifunctional enzyme. This new gene, when expressed in yeast cells, would create channelling partners very similar to the virtual enzyme pairs in the in silico models. The activity of these synthetic fusions would need to be tested first in vivo, to verify that they remain enzymatically active and maintain similar kinetic parameters. Additionally, it should be possible to introduce polypeptide sequences between the two enzyme units, to create tethers of varying lengths. These constructs could be used to investigate the relationship between inter-enzyme distance and the effects of metabolic channelling.

2.5.1 Future work: Exploring the glycolysis model in vivo

The purpose of these simulations is to improve our understanding of metabolic channelling through iterations of biological experimentation and comparison with computational models. Yeast metabolism, however, is such a complex and highly connected network that a whole cell simulation is currently unfeasible. A more tractable approach is to consider subunits of the network that form self-contained modules, such as the linear section of glycolysis presented in this chapter. To accurately match the simulation conditions, it would be necessary to isolate the pathway being studied in vivo.

Genetically, this would be accomplished by eliminating the activity of enzymes that create unmodeled branch points, using temperature-sensitive alleles for essential enzymes and null alleles for the remainder. In the case of the glycolysis subpathway, five enzymes would need to be modified: one that acts on 3-phosphoglycerate (1.1.1.95) and four that catalyze reactions involving phosphoenolpyruvate (2.5.1.19, 2.5.1.54, 4.1.1.49 and 4.2.1.11) (Table 2.4).

Table 2.4: Enzymes Acting on Glycolysis Metabolites in S. cerevisiae

The EC numbers for yeast enzymes that act on the metabolites present in the Cell++ and E-CELL glycolysis simulations. The four enzymes represented in the model are shown in colour.

Compound                   Enzymes (EC numbers)
1,3-bisphosphoglycerate    2.7.2.3, 1.2.1.12, 5.4.2.1
3-phosphoglycerate         2.7.2.3, 5.4.2.1, 1.1.1.95
2-phosphoglycerate         4.2.1.11, 5.4.2.1
Phosphoenolpyruvate        2.7.1.11, 2.7.1.40, 2.5.1.19, 2.5.1.54, 4.1.1.49, 4.2.1.11
Pyruvate                   1.1.1.38, 2.7.1.40, 1.1.2.3, 1.1.2.4, 1.2.4.1, 1.8.1.4, 2.2.1.6, 2.3.1.12, 2.6.1.2, 4.1.1.1, 4.1.1.48, 4.1.1.50, 4.1.1.65, 4.1.3.27, 4.3.1.19, 4.4.1.1, 4.4.1.8, 4.6.1.1, 6.4.1.1

Experimental methods have been developed in the last decade which can precisely measure the changing metabolite levels in yeast at a time resolution of seconds. Yeast cultures, grown in an incubator, are periodically sampled and sprayed into cold or near-boiling ethanol (Gonzalez, François et al. 1997), which instantly stops the enzymes from functioning. The ethanol solutions, which are kept at a neutral pH to prevent metabolite degradation, are then concentrated by evaporation. A variety of methods are used to quantify each compound of interest, ranging from enzymatic assays to chromatography techniques (Oldiges, Lütz et al. 2007). Researchers have developed automated devices which are capable of sampling yeast cultures repeatedly, for the purpose of generating high resolution metabolome data. These protocols generate metabolite concentration traces which can be directly compared with those produced by simulations (Theobald, Mailinger et al. 1997).

Comparisons such as those have the potential to provide information on in vivo enzyme mechanics. The enzyme models used in these simulations have represented Michaelis-Menten kinetics, a simplified, generic enzyme mechanism. There are more precise mechanics known, which take into account the specific order of binding events that take place during the reaction. These mechanisms, however, were determined by examining the reaction velocities in carefully controlled in vitro studies. Enzymes often operate quite differently in the intracellular environment than they do in dilute solutions, and cell simulations may be the best way to investigate them. Comparing experimentally determined metabolite measurements to the simulated values would help to elucidate the in vivo mechanics by testing them against the predicted reaction rates. Knowledge about these mechanisms can then provide insights into how channelling is accomplished, for example whether a given pair of enzymes coordinate through allosteric interactions.

Other difficult-to-measure properties that affect channelling can be estimated by comparing the in vivo experimental and computer simulation results. Testing multiple hypothetical values for parameters such as the cytosolic diffusion rates for each metabolite, the effective Vmax of enzymes within the cell and the volume of the cell that is accessible to glycolysis metabolites will yield a range of simulated metabolite concentration traces. Whichever values give the closest match to the experimental traces are the ones most likely to represent the true cellular values. Reaction thermodynamics, as well, is an important channelling factor that could be better understood with this method. Equilibrium constants can be different in vivo because of entropic effects caused by intracellular crowding and localized pH or ionic strength levels (Ellis 2001).
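The comparison described above can be sketched as a simple parameter scan. The Python example below is a minimal illustration under stated assumptions: `simulate_trace` is a hypothetical stand-in for a full Cell++ or E-CELL run (here a toy first-order model), the "observed" trace is synthetic, and agreement is scored by a sum of squared differences, which is one reasonable choice among several.

```python
import math


def simulate_trace(rate_constant, times):
    """Toy stand-in for a cell simulation (assumption): first-order product formation."""
    return [1.0 - math.exp(-rate_constant * t) for t in times]


def sum_squared_error(simulated, observed):
    """Sum of squared differences between two traces sampled at the same times."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def best_parameter(candidates, times, observed):
    """Return the candidate value whose simulated trace best matches the observed one."""
    return min(
        candidates,
        key=lambda k: sum_squared_error(simulate_trace(k, times), observed),
    )


# Synthetic "measurement" (hypothetical), generated with a rate constant of 0.8.
times = [0.5 * i for i in range(11)]
observed = simulate_trace(0.8, times)
candidates = [0.2, 0.4, 0.6, 0.8, 1.0]
best = best_parameter(candidates, times, observed)
```

In practice several parameters would be scanned together and the experimental noise taken into account, so a single best value should be read as an estimate rather than a measurement.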

2.5.2 Future work: Improving and extending the model

The model used to simulate the glycolysis pathway in this chapter is sufficient to demonstrate some aspects of metabolic channelling, such as its dependence on equilibrium constants, reaction velocities and enzyme proximities. Obtaining more detailed and accurate quantitative results, however, requires the use of a more refined model. As well, in order to investigate other aspects of channelling, such as its capability to aid in dynamic metabolic regulation, the system must be extended through the incorporation of additional substrate and enzyme species.

The model can be improved by using a more detailed representation of each of its processes. As was mentioned earlier, the simple one-substrate, one-product Michaelis-Menten reaction equation controlled the rate of enzyme-catalyzed reactions. This was appropriate only because cometabolites such as ADP were abstracted away. They could be modeled explicitly by adding appropriate concentration fields and upgrading the enzyme mechanisms to take into account their binding.
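For reference, a one-substrate, one-product reversible Michaelis-Menten rate law can be sketched as follows. This is the standard textbook form with the reverse Vmax eliminated through the Haldane relationship; it is offered as an assumption about the general shape of such an equation, not as the exact expression implemented in Cell++ or E-CELL, and all parameter names are illustrative.

```python
def reversible_mm_rate(s, p, vmax_f, km_s, km_p, keq):
    """Net forward rate of S <-> P under reversible Michaelis-Menten kinetics.

    s, p    : substrate and product concentrations
    vmax_f  : forward maximum velocity
    km_s    : Michaelis constant for the substrate (forward)
    km_p    : Michaelis constant for the product (reverse)
    keq     : equilibrium constant [P]/[S] at equilibrium
    """
    numerator = (vmax_f / km_s) * (s - p / keq)
    denominator = 1.0 + s / km_s + p / km_p
    return numerator / denominator


# With no product present the expression reduces to the irreversible form,
# so the rate at s == km_s is half of vmax_f.
half_max = reversible_mm_rate(s=2.0, p=0.0, vmax_f=10.0, km_s=2.0, km_p=1.0, keq=0.5)

# At equilibrium (p == keq * s) the net rate vanishes.
at_equilibrium = reversible_mm_rate(s=4.0, p=2.0, vmax_f=10.0, km_s=2.0, km_p=1.0, keq=0.5)
```

The sign of the rate follows the displacement from equilibrium, which is why depleting the product of an unfavourable reaction pulls it forward.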

The biochemical concentrations in this version of the glycolysis model are initialized at implausibly low levels; a living cell is never completely depleted of its metabolites. A better model might begin with each of the compounds present in steady state concentrations. This could be realized by slowly pumping bisphosphoglycerate into the system, or by artificially holding the BPG constant until an equilibrium is reached. At that point a large BPG pulse could be introduced, starting the clock on the simulation trace. Similarly, the final metabolite pyruvate should not be allowed to simply build up. In S. cerevisiae there are 18 enzymes other than pyruvate kinase that are capable of using it as a substrate. A new process that consumes PYR as it is being created would improve the realism of the model.

Due to a limitation in computer power, the Cell++ model used a relatively coarse lattice resolution. Higher resolutions, though computationally expensive, would allow for the use of more enzyme particles without flooding the simulation, and would minimize artefacts caused by nearby enzyme pairs. Fine-tuning of the simulation parameters would also improve accuracy. The diffusion coefficients were all set to the same value of 1.0 × 10⁻¹¹ m²/s, an estimate that is within two orders of magnitude of the expected value. Each of the enzyme kinetics constants, including the Vmax and the forward and reverse Michaelis-Menten constants, was set based on a single in vitro experiment. The reaction equilibrium constants, as well, were assigned based on thermodynamics measured outside of a real cell. All of these inputs are critical, and updating them as higher quality measurements become available would allow for more accurate simulation results.

The glycolysis segment presented in this chapter is a very simple metabolic model, chosen for its use in demonstrating the basic aspects of metabolic channelling. In it there are only nine network components represented: five metabolites connected by four reactions. A single input to this system – the initial burst of 1,3-bisphosphoglycerate – constitutes the only connection this pathway makes to the external environment. While it was appropriate for this channelling investigation, there are many more advances that can be made through the modeling of more complex systems.

Table 2.5: The Scale of Yeast Metabolism

A full metabolic model of yeast requires hundreds of different metabolites and over a thousand reactions. Data for the glycolysis / gluconeogenesis pathway is from KEGG (Kanehisa and Goto 2000; Lacount, Vignali et al. 2005). Data for the complete network is from Förster et al. (Förster, Famili et al. 2003).

               Current Model    Glycolysis / Gluconeogenesis    Yeast Metabolic Network
Metabolites    5                21                              584
Reactions      4                28                              1175
Connections    1                10                              287

A more complete model of S. cerevisiae glycolysis would have the capability of answering more difficult questions. According to KEGG, there are 21 major metabolites and 28 different enzymes involved in this pathway, half of which link glycolysis to other pathways (Table 2.5). Metabolic channelling could be further investigated for its power to control the metabolic flux; dynamic channelling, induced by the regulated association and dissociation of enzymes in a transient metabolon, may have the potential to direct metabolites into the pathways that need them most. Expanding the glycolysis pathway with the remainder of its components could allow this potentially important function of channelling to be studied. A full model could be used to investigate how the cell regulates metabolic flow and is able to allocate resources efficiently. Simulations could elucidate how the entire system switches direction from breaking down sugars (glycolysis) to building them back up again (gluconeogenesis), how the citrate cycle becomes activated, or how energy production changes as different amino acids become depleted, to give just a few examples.

3 Prediction of Metabolic Channelling

3.1 Introduction

The simulations in the previous chapter demonstrate the effects that metabolic channelling can potentially bring about in a greatly simplified model system. Results from the simulations showed that colocalization of two biochemically sequential enzymes can diminish the effect of a thermodynamic barrier and greatly increase the combined velocity of the two reactions. Locating a pair of enzymes that act on a common substrate in close proximity to each other was also shown to decrease the cytosolic availability of their intermediate compound.

In an effort to explore how frequently the hypothesized uses for metabolic channelling are exploited in nature, this chapter details several methods used to predict channelling candidate pairs on the basis of findings from Chapter 2. To validate these predictors, literature reports of channelling were exploited, together with multifunctional enzyme evidence and protein-protein interaction datasets. By comparing these lists of candidates against previously described cases of metabolic channelling, it is possible to infer the biological relevance of the predictors. The predictions made in this chapter represent a systematic and comprehensive survey of potential channelling pairs from data that are currently available, and identify new channelling examples that may be targeted for future experimental studies.

As noted in the previous chapter, channelling between enzymes can help to overcome thermodynamic barriers, as is the case when the first enzyme catalyzes a reaction with a low equilibrium constant and the second catalyzes a more favourable reaction. As was shown in the Chapter 2 simulations, linking two such reactions via metabolic channelling can dramatically increase the overall reaction velocity by reducing the concentration of the intermediate metabolite. The first set of predictions described in this chapter is a search for enzyme pairs that catalyze sequential biochemical reactions which have the potential to be accelerated through a coupling of their free energies.

As was also demonstrated in the Chapter 2 simulations, substrate channelling passes an intermediate metabolite directly from one enzyme to the next and is thereby able to reduce the release of that metabolite into the cytosol. In the cases of toxic and inhibitory compounds, this can create selection pressure for channelling to evolve: toxic but necessary intermediates such as ammonia can react inappropriately if free to diffuse through the cytosol, and inhibitory metabolites can interfere with the activity of unrelated pathways. Channelling candidates were predicted based on whether their intermediate compounds are expected to act as toxins or inhibitors.

Along with circumventing thermodynamic barriers and preventing toxic intermediates from entering the cytosol, metabolic channelling is thought to be involved in regulating the flow of metabolism. To investigate this, the frequency at which each class of channelling candidates occurs at metabolic branch points is examined in this chapter. At these key locations in the metabolic network, intermediate compounds produced in one pathway have the opportunity to enter two or more subsequent pathways. Frequently, cells employ methods such as feedback loops and enzyme inhibition to dynamically regulate the direction of metabolic flow at these branches. In this way, the cell can appropriately up- or down-regulate metabolite production, depending on the current demand for each of the biochemical end products. Along with other processes that selectively alter reaction rates, metabolic channelling is capable of directing metabolites into specific pathways. For this reason, it is likely that channelling partners may be over-represented at metabolic pathway branch points.

An important precondition for metabolic channelling to occur is that the two enzymes involved must be in close physical proximity within the cell. There are several ways this can be accomplished: physical protein interactions can bind enzymes together, the enzymes can be targeted to a membrane-bound organelle, or both enzymatic activities can be performed on a single, covalently linked molecule.

Validation of these predictions was performed by comparing the sets of candidate enzyme pairs with three indicators that are associated with channelling in living cells. First, the prediction sets were compared against known examples of multifunctional enzymes, proteins with the capability to perform more than one enzymatic function. By creating an enzyme with two or more active sites that are each capable of catalyzing a different reaction, evolution has ensured that those two reactions will always be performed at the same location. Multifunctional and potentially multifunctional enzymes that perform biochemically sequential reactions were catalogued, with the hypothesis that this list will be enriched for substrate channelling partners. The second indicator of enzyme proximity that was used is the minimum number of physical interactions needed to connect the two proteins. Unless they are localized to a membrane-bound organelle, physical interactions are required to hold pairs of channelling enzymes together within the cell, and so a number of protein-protein interaction networks (PPIs) were searched for correlations with the predicted channelling candidates. Some enzyme pairs are known to bind directly to each other, but many others require linking proteins to help keep them colocalized. Finally, literature reports of metabolic channelling were examined as a further form of validation of the prediction sets.
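The second indicator, the minimum number of physical interactions connecting two proteins, corresponds to a shortest-path length in an undirected interaction network. A minimal sketch using breadth-first search is given below in Python; the function name, the edge-list input format and the protein labels are assumptions made for illustration and do not describe the software actually used in this study.

```python
from collections import deque


def interaction_distance(edges, source, target):
    """Minimum number of interactions linking `source` to `target`.

    `edges` is an iterable of (protein_a, protein_b) pairs, treated as
    undirected. Returns None if the proteins are not connected.
    """
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if source not in graph or target not in graph:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Hypothetical network: E1 and E2 are held together by a linker protein L1.
ppi = [("E1", "L1"), ("L1", "E2"), ("E3", "E4")]
direct = interaction_distance(ppi, "E1", "L1")
linked = interaction_distance(ppi, "E1", "E2")
unconnected = interaction_distance(ppi, "E1", "E3")
```

A distance of one corresponds to a direct physical interaction, while larger values indicate that one or more linking proteins are required.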

3.2 Methods

3.2.1 Metabolic Network Data

Data on the connectivity of metabolic pathways was obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa and Goto 2000). The KEGG “COMPOUND” database was used, which enumerates metabolites and the enzymes that have been reported to catalyze their reactions. This network is not organism-specific, and includes both common, core metabolism reactions as well as rare interactions that are found in only a few species. In release 42.0 of this database, 13 779 compounds are connected by 4224 enzymes.

Table 3.1: Highly-Connected Metabolites

Pairs of reactions with only highly-connected or currency metabolites in common were not considered consecutive. Although these compounds are involved in many reactions, for the most part they participate only to aid in the synthesis and degradation of other molecules, or to provide a source of energy for the reactions they take part in. Additionally, the majority of these compounds are present in abundance within cells, and so affecting their production would not effectively regulate metabolic pathways.

Compound             KEGG Compound ID    Number of Reactions
H2O                  C00001              2548
ATP                  C00002              489
NAD+                 C00003              735
NADH                 C00004              732
NADPH                C00005              813
NADP+                C00006              815
O2                   C00007              1007
ADP                  C00008              351
Phosphate            C00009              416
Coenzyme A           C00010              447
CO2                  C00011              458
Pyrophosphate        C00013              319
NH3                  C00014              308
UDP                  C00015              256
AMP                  C00020              171
Pyruvate             C00022              157
Acetyl coenzyme A    C00024              153
l-Glutamate          C00025              132
2-Oxoglutarate       C00026              166
H2O2                 C00027              177
Acceptor             C00028              200
UDP-glucose          C00029              125
Reduced acceptor     C00030              198
Acetate              C00033              102
GDP                  C00035              41
Oxaloacetate         C00036              42
Succinate            C00042              95
GTP                  C00044              31
CMP                  C00055              64
UTP                  C00075              25
H+                   C00080              1654
UMP                  C00105              28
CDP                  C00112              12
Reduced ferredoxin   C00138              32
H2                   C00282              19
FADH2                C01352              45

From this network we constructed a list of biochemically consecutive enzymes, that is, pairs of enzymes that both act on a single common substrate. In order to create a meaningful list of chemical intermediates, some compounds were filtered out: compounds which are involved in an extremely large number of reactions, for example water and oxygen, as well as energy-carrying metabolites such as ATP and NADH, were not considered to be channelling candidates in this analysis. See table 3.1 for the complete list of filtered compounds. A total of 36 ligands were removed for this reason, consistent with previous studies examining topological properties of metabolic networks (Croes, Couche et al. 2006). The final list of consecutive enzymes contains 79 432 pairs joined by 2246 different metabolites.
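The construction of consecutive enzyme pairs can be sketched as follows in Python. The input format (a mapping from each compound to the set of enzymes that act on it), the function name and the toy network are assumptions made for illustration; in particular, pairs are treated here as unordered, which may differ from how the pairs in this study were counted.

```python
from itertools import combinations


def consecutive_pairs(compound_to_enzymes, excluded):
    """Map each enzyme pair to the set of non-excluded compounds they share."""
    pairs = {}
    for compound, enzymes in compound_to_enzymes.items():
        if compound in excluded:
            continue  # skip highly-connected or currency metabolites
        for first, second in combinations(sorted(enzymes), 2):
            pairs.setdefault((first, second), set()).add(compound)
    return pairs


# Toy excerpt (hypothetical labels): water is filtered out, so it does not
# link enolase to the unrelated hydratase.
network = {
    "H2O": {"ENO", "other_hydratase"},
    "2PGA": {"PGM", "ENO"},
    "PEP": {"ENO", "PK"},
}
pairs = consecutive_pairs(network, excluded={"H2O"})
```

Sorting the enzyme labels before pairing ensures that each pair is stored under a single key regardless of the order in which the enzymes are listed.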

3.2.2 Predictions Based on Free Energy Coupling

3.2.2.1 Data Sources

Both experimentally determined and computationally estimated data on reaction Gibbs free energy changes (ΔG) were used to predict pairs of enzymes whose reactions might be accelerated through metabolic channelling. Measured energies were obtained from the Thermodynamics of Enzyme-catalyzed Reactions Database (TECRDB) hosted by the National Institute of Standards and Technology (NIST). Thermodynamic information has been collected from over a thousand published papers to form this resource, which contains ΔG values for about 400 reactions (Goldberg, Tewari et al. 2004).

In addition to these individually-measured thermodynamic data, we used free energy values estimated via the Mavrovouniotis group-contribution method. This is a technique in which the absolute free energy of formation (ΔF) for each compound is approximated as the sum of the energies associated with each chemical group present in the molecule. For example, a -CH3 group adds 8.5 kcal/mol to a compound and a -CO- group subtracts 27.4 kcal/mol (Mavrovouniotis 1990). See figure 3.1. Group-contribution ΔG values that had been calculated for 874 reactions in the iJR904 E. coli metabolism model (Henry, Jankowski et al. 2006) were used to supplement the TECRDB data. Although these ΔG estimates are less accurate than experimentally determined values, a comparison to the NIST values (data not shown) demonstrated sufficient correlation for the estimates to be considered useful.

A third, larger source of estimated free energy data was obtained from a Japanese study which applied the Mavrovouniotis group contribution method to every compound present in KEGG (Tanaka, Okuno et al. 2003). These thermodynamic predictions, however, disagreed significantly with those performed by the Henry group, in spite of having used the same technique. While some were exact matches, many differed by large amounts. Others were given with the same magnitude, but the opposite sign, resulting in contradictions over whether a given reaction would be thermodynamically favourable. An analysis performed by Dr. Yu in the Wodak lab at the University of Toronto was able to attribute some of the differences to the use of physiologically inappropriate protonation states in the Tanaka data and implied that the remainder of the errors were due to misapplications of the Mavrovouniotis method. Correcting these errors was not feasible within the context of this study and so these free energy data proved unusable.

Figure 3.1: Group Contribution Method

The group contribution method described by Mavrovouniotis can be used to estimate the free energy of formation (ΔF) of a compound. Energy values for each chemical group are summed, along with an origin value of -24.7 kcal/mol. In this case, the ΔF for glutamate, which has been experimentally measured to be -167.1 kcal/mol, is estimated at -164.7 kcal/mol.
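The arithmetic of the group contribution method can be sketched in Python as shown below. Only the origin value and the two group energies quoted in the text are used; the example molecule, made up of two -CH3 groups and one -CO- group, is a hypothetical illustration of the summation and not a validated estimate, since any further corrections in the full method are ignored here.

```python
# Values quoted in the text (kcal/mol); all other groups are omitted.
ORIGIN = -24.7
GROUP_ENERGY = {"-CH3": 8.5, "-CO-": -27.4}


def formation_energy(group_counts):
    """Estimate the free energy of formation as origin + sum of group energies."""
    return ORIGIN + sum(
        GROUP_ENERGY[group] * count for group, count in group_counts.items()
    )


# Hypothetical molecule with two -CH3 groups and one -CO- group.
estimate = formation_energy({"-CH3": 2, "-CO-": 1})
```

The result is -24.7 + 2(8.5) - 27.4 = -35.1 kcal/mol, up to floating-point rounding.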

3.2.2.2 Parsing the Data

The NIST TECRDB was not available in a standardized format, and was obtained by first downloading multiple web pages, then parsing them for the relevant information. A table was extracted from those pages containing a reaction reference ID, a URL to the data on that reaction, and a text representation of the stoichiometry of that reaction in this format:

(aq) + orthophosphate(aq) = (aq) + -D- 1- phosphate(aq)

Each of the URLs acquired in this way was then followed to obtain the detailed experimental information. Usable data was parsed from these pages, such as the name of the enzyme that catalyzes the reaction and its EC number, the pH, temperature and salt concentration conditions of the experiments performed, and the equilibrium constant that was reported in the literature. Given the equilibrium constant and the experimental temperature, the change in Gibbs free energy could be calculated.
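The conversion from an equilibrium constant to a free energy change uses the standard relationship ΔG = -RT ln K. A minimal Python sketch is given below; the function name and the choice of kJ/mol as the output unit are assumptions made for illustration.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def delta_g_from_keq(keq, temperature_k=298.15):
    """Standard Gibbs free energy change (kJ/mol) from an equilibrium constant."""
    if keq <= 0:
        raise ValueError("equilibrium constant must be positive")
    return -GAS_CONSTANT_KJ * temperature_k * math.log(keq)


# An equilibrium constant above one gives a negative (favourable) value,
# and one below one gives a positive (unfavourable) value.
favourable = delta_g_from_keq(10.0)
unfavourable = delta_g_from_keq(0.1)
```

At 298.15 K, each tenfold change in the equilibrium constant corresponds to roughly 5.7 kJ/mol.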

In contrast, the Henry data was provided as a single Excel spreadsheet that was trivial to convert into a format suitable for this analysis. In the spreadsheet, one page contained a list of enzyme names, EC numbers and the free energy change associated with each reaction. The stoichiometry of each reaction was presented in this format:

btcoa + fad + h2o + nad >-> aacoa + fadh2 + h + nadh

A second page listed each of the compounds by name and abbreviation, allowing the components of each reaction to be interpreted easily.
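A reaction string in the format shown above could be split into its components as sketched below in Python. The sketch assumes that terms are separated by " + ", that the two sides are separated by the ">->" arrow, and that each term is a bare compound abbreviation; stoichiometric coefficients, if present in the real data, are not handled. The function name is hypothetical.

```python
def parse_reaction(text, arrow=">->"):
    """Split a reaction string into lists of substrate and product abbreviations."""
    left, separator, right = text.partition(arrow)
    if not separator:
        raise ValueError("no reaction arrow found in: " + text)

    def split_side(side):
        return [term.strip() for term in side.split(" + ") if term.strip()]

    return split_side(left), split_side(right)


substrates, products = parse_reaction(
    "btcoa + fad + h2o + nad >-> aacoa + fadh2 + h + nadh"
)
```

The abbreviations returned in this way can then be looked up in the compound list to recover the full names.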

Both the enzyme and compound names from the two data sources were then converted into KEGG standardized names in order to compare them with each other as well as with the metabolic network. KEGG provides a list of common synonyms in its compound and enzyme databases, which included the majority of the NIST and Henry compounds; however, a significant number required manual curation in order to account for special characters, typographical errors and unlisted name variants.

All enzymes are capable of operating in both the forward and reverse directions, depending on the concentrations of the metabolites. Therefore, for each reaction with thermodynamic data available, the reverse reaction was also included, with a free energy value of the same magnitude as the original reaction but the opposite sign. This doubled the size of the combined free energy database and allowed both directions of every reaction to be considered in the subsequent analysis.

3.2.2.3 Scoring Pairs

Metabolic acceleration due to channelling is hypothesized to occur when the first enzyme catalyzes a thermodynamically unfavourable reaction and the second catalyzes one that is favourable. This means that the first reaction in such a pair must have an equilibrium constant less than one, and the second must have an equilibrium constant greater than one. As can be seen from equation (1) in the introduction, this implies that the Gibbs free energy change must be positive for the first reaction and negative for the second.

An additional constraint was applied in order to find reaction pairs in which the combined reaction was likely to proceed in the forward direction: The sum of the two free energy values was required to be negative. For the thermodynamic benefit from metabolic channelling to create a favourable combined reaction from a coupled pair, the second reaction must release more energy than the first one consumes. It is important to note that this requirement does not limit reaction pairs to those that are listed as operating in the same direction in textbook biochemical pathways.

In order to prioritize the candidates, a score was calculated by multiplying the two ΔG values together. This score ranked pairs highest where a large thermodynamic barrier could be overcome by channelling.
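The three criteria and the score can be summarized in a short Python sketch. It rests on two assumptions: the input format (a name plus the two ΔG values) is hypothetical, and the score is taken as the absolute value of the product so that larger barriers rank higher, since the sign convention of the score is not stated in the text.

```python
from typing import Iterable, List, Tuple


def score_pairs(
    pairs: Iterable[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Filter and rank candidate reaction pairs.

    Each input item is (pair_name, dg_first, dg_second), with both free
    energies in the same unit. A pair is kept only if the first reaction is
    unfavourable, the second is favourable, and the combined reaction is
    favourable.
    """
    scored = []
    for name, dg_first, dg_second in pairs:
        unfavourable_first = dg_first > 0
        favourable_second = dg_second < 0
        favourable_overall = dg_first + dg_second < 0
        if unfavourable_first and favourable_second and favourable_overall:
            # Assumption: magnitude of the product, so bigger is better.
            scored.append((name, abs(dg_first * dg_second)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative values only; the last two pairs fail one criterion each.
example = [
    ("pair A", 3.867, -9.7),   # kept
    ("pair B", 12.0, -20.0),   # kept, larger barrier
    ("pair C", 10.0, -4.0),    # rejected: sum is positive
    ("pair D", -2.0, -8.0),    # rejected: first reaction already favourable
]
ranked = score_pairs(example)
```

Note that the product rewards pairs in which both values are large, so a large barrier followed by a barely sufficient second reaction can score lower than a moderate barrier followed by a strongly favourable one.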

3.2.3 Predictions Based on Toxicity and Inhibitors

To search for metabolic channelling candidates based on the hypothesis that toxic and inhibitory compounds are more likely to be channelled than others, lists of toxic and inhibitory chemicals were acquired and the enzymes that act on them investigated. Toxicity data were taken from ChemIDplus, a database maintained by the National Library of Medicine's Specialized Information Services. There are over 380 000 different compounds in this collection, which is indexed by name, structural similarity and, most importantly, toxicity. For each chemical, there are references to multiple experiments that have tested its toxic effects in different situations; compounds in this database have been administered via a wide range of routes (inhalation, oral, intraperitoneal injection, etc.) to a wide array of organisms. In each case, the qualitative effects on behaviour and physiology, as well as the dose required for lethality, have been measured and reported (Tomasulo 2002).

The downloadable version of this database lacks the quantitative toxicity data that is presented in the web interface. In order to construct lists of the most toxic compounds, the advanced filter was used to query ChemIDplus using minimum LD50/LC50 cut-offs as the search criteria. Parsing and reformatting the HTML output allowed the extraction of the sets of chemicals reported to be lethal to some organism at each of a range of minimum doses.

For enzyme inhibition activity data, the Braunschweig Enzyme Database (BRENDA) was consulted. This resource, hosted by the University of Cologne in Germany, collates a wide variety of information on enzymes and the metabolites on which they act. In particular, for every enzyme in the database, compounds that are known to inhibit its activity are recorded. Version 0702 of the database was downloaded and all compounds reported to inhibit any enzyme were collected into the list used for this analysis. Those inhibitors which are substrates for two or more enzymes represent metabolic channelling candidates. A total of 343 inhibitors, processed by 3553 pairs of enzymes, fall into this category (Barthelmes, Ebeling et al. 2007).

The unprocessed toxin and inhibitor lists contain all compounds that were identified from these databases, including chemicals that do not play a role in the metabolism of most organisms. For example, while palytoxin is an extremely toxic natural molecule, it has only been found in the zoanthids, an order of marine animals. To limit the compounds to naturally produced metabolites, these lists were filtered to include only entries listed in the Kyoto Encyclopedia of Genes and Genomes (KEGG) “compound” database (Kanehisa and Goto 2000). Subsequently, this same database was used to further restrict the search to molecules that could be considered intermediate metabolites: those that are acted on by at least two different enzymes. Because the goal is to find toxic or inhibitory metabolic channelling candidates, each of them must be produced by one enzyme and then used by a second potential channelling partner.
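The two filtering steps can be illustrated with a minimal Python sketch. The reaction tuples, enzyme names and compound names below are hypothetical stand-ins for the KEGG data; a flagged compound that never appears in the reaction list is dropped, which mirrors the restriction to compounds present in KEGG.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (enzyme, substrates, products); a hypothetical stand-in for KEGG reactions.
ReactionRecord = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def channelling_candidates(
    flagged: Iterable[str], reactions: Iterable[ReactionRecord]
) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each flagged intermediate to its (producer, consumer) enzyme pairs.

    A compound is kept only if one enzyme produces it and a different
    enzyme consumes it.
    """
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for enzyme, substrates, products in reactions:
        for compound in substrates:
            consumers[compound].add(enzyme)
        for compound in products:
            producers[compound].add(enzyme)

    candidates = {}
    for compound in flagged:
        pairs = {
            (p, c)
            for p in producers.get(compound, ())
            for c in consumers.get(compound, ())
            if p != c
        }
        if pairs:
            candidates[compound] = pairs
    return candidates


toy_reactions = [
    ("enzyme 1", ("glutamine",), ("ammonia", "glutamate")),
    ("enzyme 2", ("ammonia", "X"), ("Y",)),
    ("enzyme 3", ("Z",), ("dead-end toxin",)),
]
toy_flagged = ["ammonia", "dead-end toxin", "palytoxin"]
result = channelling_candidates(toy_flagged, toy_reactions)
```

In this toy network only ammonia survives: the dead-end toxin is produced but never consumed, and palytoxin does not occur in any reaction.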

Although intracellular toxicity information would have been ideal, for this study we used the minimum reported lethal concentration in any organism via any method of compound administration. A cut-off of 100 ppm required to kill 50% of a population (LC50) resulted in 254 compounds that were common to 4799 pairs of consecutive reactions. ChemIDplus, while not a source of intracellular toxicity data, is the most appropriate database available. Seventeen toxicity information sources were investigated, but none listed solely chemicals with evidence for intracellular toxicity. Four major databases described substances that are considered hazardous in the workplace and two contained only compounds that are known to be carcinogenic or cause genetic defects. These data sources, which are primarily intended as guidelines for protecting the safety of humans, were not applicable to the study because the toxicity of each compound was not quantified by an LC50. Several were unusable because they are not provided in a machine-readable format or simply referenced literature articles that mention toxicity. Table 3.2 lists the toxicity databases that were explored for applicability in this study.

Table 3.2: Toxicological Databases

Seventeen databases were investigated as sources of toxicity data. ChemIDplus was the only machine-readable source of quantitative toxicity data that was applicable for this study.

Name | Abbreviation | Issue
ChemIDplus | - | -
Hazardous Substances Data Bank | HSDB | Workplace Hazardous
Integrated Risk Information System | IRIS | Workplace Hazardous
International Toxicity Estimates of Risk | ITER | Workplace Hazardous
International Chemical Safety Cards | ICSC | Workplace Hazardous
Chemical Carcinogenesis Research Information System | CCRIS | Carcinogenic
Genetic Toxicology | GENE-TOX | Carcinogenic
Registry of Toxic Effects of Chemical Substances | RTECS | Proprietary/Unavailable
MDL Toxicity Database | MDL | Proprietary/Unavailable
Developmental and Reproductive Toxicology Database | DART | Not Machine Readable
Comparative Toxicogenomics Database | CTD | Not Machine Readable
EPA Chemical Fact Sheets | - | Not Machine Readable
DrugBank | - | Not Machine Readable
Toxicology Literature Online | TOXLINE | Literature Index
National Toxicological Program, Chemical Database | - | Literature Index
MSDS Archive | - | Obsolete
Toxicological Data at the National Institute of Hygienic Sciences, Japan | BL-DB | Obsolete

3.2.4 Analysis of Potential Regulation at Metabolic Branch Points

For this analysis, two definitions of branch points were considered: the number of KEGG pathways of which the common metabolite is a member, and the total number of different enzymatic reactions that act on it. The first method relies on the named biochemical pathways – such as gluconeogenesis, ubiquinone biosynthesis and glyoxylate metabolism – assigned to each compound by the Kyoto Encyclopedia of Genes and Genomes (KEGG). These pathways, which consist of manually partitioned reactions and compounds, are constructed to reflect the steps involved in producing the biologically important end products for which they are named.

To measure branch point membership, each compound was classified into one of three categories, determined by whether KEGG annotates it as belonging to one, two, or more canonical biochemical pathways. Reactions acting on metabolites within a pathway (one KEGG pathway annotation) or between two pathways (two annotations) may be able to affect the metabolic rate, but have little flexibility to direct metabolic flow, since metabolites are often committed to a linear pathway by irreversible reactions. By changing the relative reaction rates of enzymes that process compounds belonging to three or more pathways, the cell can most easily enact decisions on which pathways to allocate those metabolites to.

Like KEGG pathway membership, the number of enzymatic reactions that act on a given metabolite is correlated with the power of those enzymes to regulate metabolic flow. It does not rely on the arbitrary partitioning of the metabolic network into pathways, but instead directly measures the number of potential routes available. It is, however, more susceptible to false positives, as many reactions are only present in a small number of organisms. A compound that is metabolized through different reactions in different organisms will be connected to three or more reactions in the KEGG metabolic network, even though it may not be a branch point in any one species. To overcome this, it is necessary to differentiate compounds that are involved in few reactions from those participating in many. While it is possible for alternative metabolic pathways in some organisms to add a small number of reactions, very highly connected metabolites are likely to be found at branch points.
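Both branch point measures reduce to simple counts per compound, as the following Python sketch shows. The dictionaries stand in for the KEGG annotations, and the category labels are hypothetical names chosen for this example.

```python
from typing import Dict, Set


def pathway_category(n_pathways: int) -> str:
    """Classify a compound by its number of KEGG pathway annotations."""
    if n_pathways < 1:
        raise ValueError("compound has no pathway annotation")
    if n_pathways == 1:
        return "within one pathway"
    if n_pathways == 2:
        return "between two pathways"
    return "three or more pathways"


def branch_point_metrics(
    pathways: Dict[str, Set[str]], reactions: Dict[str, Set[str]]
) -> Dict[str, tuple]:
    """Return (pathway category, reaction count) for each annotated compound."""
    return {
        compound: (
            pathway_category(len(names)),
            len(reactions.get(compound, ())),
        )
        for compound, names in pathways.items()
    }


# Toy annotations; the pathway and reaction identifiers are placeholders.
toy_pathways = {
    "compound A": {"pathway 1"},
    "compound B": {"pathway 1", "pathway 2", "pathway 3"},
}
toy_reaction_sets = {
    "compound A": {"R1", "R2"},
    "compound B": {"R2", "R3", "R4", "R5"},
}
metrics = branch_point_metrics(toy_pathways, toy_reaction_sets)
```

Keeping the two measures side by side makes it easy to check whether a compound flagged by one definition is also flagged by the other.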

3.2.5 Validation of Predictions using Multifunctional Enzymes

3.2.5.1 Annotated Multifunctional Enzymes

The Swiss-Prot protein knowledgebase is currently one of the most complete databases on proteins, and contains a large number of high-quality, manually curated annotations (Boeckmann, Blatter et al. 2005). From the extensive annotations within Swiss-Prot, only the functional data were used for this analysis; the Enzyme Commission (EC number) fields in the database indicate known catalytic activities. Of the 118 000 enzymes listed in this database, 370 were functionally annotated with multiple EC numbers, implying that they catalyze multiple reactions. Almost half of those multifunctional enzymes were determined to catalyze sequential biochemical reactions.

A similar search in the smaller Protein Data Bank (PDB) yielded far fewer results. In this database, which contains information on structurally characterized proteins, only one multifunctional enzyme was found that catalyzes sequential reactions: methylene-THF dehydrogenase/cyclohydrolase. This protein, which is described in more detail in the free energy coupling section (3.3.1.1), was also identified in the Swiss-Prot search.

3.2.5.2 Gene Fusions

To extend the search further, a class of potentially multifunctional enzymes derived from gene fusions was also included. In a gene fusion event, the coding sequences of two different proteins are brought together via translocation, interstitial deletion or chromosomal inversion. When this happens to two enzyme-coding genes, it occasionally creates an enzyme that is capable of performing both original functions. Where natural selection has resulted in the preservation of these fused genes, it is likely that keeping the two functions together in space is biologically advantageous for the organism. The gene fusion data differ from Swiss-Prot in that the fusion database was constructed in an automated manner, with functional information inferred purely through sequence similarity, whereas Swiss-Prot entries are manually curated. Because sequence similarity is a different requirement from demonstrated enzymatic functionality, the gene fusion proteins may include members that are not found in Swiss-Prot. For example, gene fusion data will include proteins which have been predicted from genome sequencing projects but have not yet been annotated.

The putative gene fusions for this study were taken from the Rosetta Stone analysis in the Prolinks database. Proteins in this database are fusion predictions made by applying three homology criteria: First, the sequence must be the best BLAST hit for two query sequences from a different genome. Second, those two query sequences must themselves be the best reverse BLAST hit from the fusion-candidate sequence. Finally, the two query sequences must share no detectable sequence similarity with each other. This ensures that the fused gene contains sequences of significant similarity to at least two different proteins from another species (Bowers, Pellegrini et al. 2004). Only the high-confidence fusions were used – those marked in the database with a P-value of 0.05 or less – to restrict the list to the enzymes most likely to be multifunctional. Functional annotations were then added to the gene fusions by cross-referencing the enzyme names with the KEGG enzyme database.
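The three homology criteria can be expressed as a filter over precomputed BLAST results. The Python sketch below is only a schematic of that logic and is not the Prolinks implementation: the three lookup tables (`best_hit`, `reverse_best_hits`, `similar_pairs`) and all gene names are hypothetical, and the P-value cut-off is not modelled.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def rosetta_stone_fusions(
    best_hit: Dict[str, str],
    reverse_best_hits: Dict[str, Set[str]],
    similar_pairs: Set[FrozenSet[str]],
) -> List[Tuple[str, str, str]]:
    """Return (fusion candidate, query 1, query 2) triples.

    best_hit:          query gene -> its best hit in the other genome
    reverse_best_hits: candidate  -> queries that are its best reverse hits
    similar_pairs:     pairs of query genes with detectable similarity
    """
    # Group queries by the candidate they hit best (criterion 1).
    queries_by_candidate: Dict[str, List[str]] = {}
    for query, candidate in best_hit.items():
        queries_by_candidate.setdefault(candidate, []).append(query)

    fusions = []
    for candidate, queries in queries_by_candidate.items():
        reciprocal = reverse_best_hits.get(candidate, set())
        for q1, q2 in combinations(sorted(queries), 2):
            both_reciprocal = q1 in reciprocal and q2 in reciprocal  # criterion 2
            unrelated = frozenset((q1, q2)) not in similar_pairs      # criterion 3
            if both_reciprocal and unrelated:
                fusions.append((candidate, q1, q2))
    return fusions


toy_best_hit = {"geneA": "fused1", "geneB": "fused1",
                "geneC": "fused2", "geneD": "fused2"}
toy_reverse = {"fused1": {"geneA", "geneB"}, "fused2": {"geneC"}}
toy_similar: Set[FrozenSet[str]] = set()
toy_fusions = rosetta_stone_fusions(toy_best_hit, toy_reverse, toy_similar)
```

Here `fused2` is rejected because only one of its two queries is a reciprocal best hit.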

3.2.6 Validation of Predictions using Protein-Protein Interaction Networks

3.2.6.1 Data Sources

Protein interaction data for this analysis were taken from two classes of published PPI networks: those derived from yeast two-hybrid experiments and those generated from affinity capture followed by mass spectrometry. Both methods have been implemented as high-throughput, robotic systems and are capable of detecting physical interactions between proteins. Differences between them, however, cause different classes of interactions to be observed (Bader and Hogue 2002; Han, Dupuy et al. 2005).

Yeast two-hybrid screening relies on expressing pairs of modified proteins in genetically altered yeast colonies. A transcription factor is split into two fragments – a DNA binding domain and a transcription activation domain, which are capable of functioning as long as they are brought together through protein binding. Each of those fragments is then fused to one of the two proteins that will be tested for interaction, and the two proteins are expressed in yeast via the introduction of plasmids. To detect when binding occurs, a yeast strain is used with an appropriate reporter gene, such as LacZ. This gene is given an upstream activation sequence that will only allow transcription when the transcription factor that was used is active. By growing this yeast in the appropriate media, colonies will only form when the reporter is expressed, which should only occur if the two proteins of interest interact and bring both transcription factor domains together (Rain, Selig et al. 2001; Lacount, Vignali et al. 2005; Rual, Venkatesan et al. 2005).

In contrast to the yeast two-hybrid assay, which identifies binary interactions, affinity purification detects entire protein complexes. While this method also relies on hybrid proteins, only one bait protein is modified in each experiment. That protein is fused to a tag that has a known binding partner or partners. In the case of tandem affinity purification (TAP), the tag contains protein A, which binds to immunoglobulin G (IgG), as well as calmodulin binding peptide (CBP). The cytosol from cells expressing the fusion protein is extracted and run through a column containing IgG-coated beads. The column is washed, to eliminate unbound protein, and then subjected to a protease that releases the proteins from the beads and exposes the CBP domain of the tag. A second purification step is then performed, where the proteins are exposed to calmodulin-coated beads and washed again. Finally, mass spectrometry is used to determine what proteins remain attached to the tag (Butland, Peregrin-Alvarez et al. 2005; Ewing, Chu et al. 2007).

Table 3.3: PPI Network Data Sources

Physical protein interaction networks used in this analysis: The first five datasets are results taken from individual published papers. BioGRID data represents the union of interaction maps published by multiple groups using the same technique and model organism (Ewing, Chu et al. 2007) (Rual, Venkatesan et al. 2005) (Butland, Peregrin-Alvarez et al. 2005) (Rain, Selig et al. 2001) (Lacount, Vignali et al. 2005) (Stark, Breitkreutz et al. 2006).

Source | Organism | Method | Interactions: Total | Interactions: Enzyme-Enzyme | Interactions: Enzyme-Protein | Interactions: Protein-Protein | Proteins: Total | Proteins: Enzyme | Proteins: Protein
Ewing | H. sapiens | Affinity Capture-MS | 2324 | 61 | 650 | 1613 | 1283 | 218 | 1065
Rual | H. sapiens | Two-Hybrid | 6726 | 356 | 1214 | 5156 | 3133 | 519 | 2614
Butland | E. coli | Affinity Capture-MS | 3788 | 371 | 1426 | 1991 | 916 | 267 | 649
Rain | H. pylori | Two-Hybrid | 1464 | 76 | 488 | 900 | 732 | 194 | 538
LaCount | P. falciparum | Two-Hybrid | 2846 | 25 | 411 | 2410 | 1308 | 131 | 1177
BioGRID | D. melanogaster | Two-Hybrid | 22282 | 99 | 2578 | 19605 | 7025 | 577 | 6448
BioGRID | S. cerevisiae | Affinity Capture-MS | 38662 | 2204 | 6376 | 30082 | 3719 | 707 | 3012
BioGRID | S. cerevisiae | Two-Hybrid | 12342 | 482 | 2384 | 9476 | 4208 | 692 | 3516

Two human protein interaction networks were used, one derived from each detection method. The Ewing dataset was generated using flag-tagged affinity capture / mass spectrometry, which is similar to TAP-tagging but adds only a short, octapeptide fusion sequence to each bait protein. A single purification step using antibodies to the flag tag recovers the complexes in this protocol. In this study, 407 bait proteins were expressed in a human embryonic kidney 293 cell culture to identify 2324 high-confidence interactions between 1283 proteins (Ewing, Chu et al. 2007). For the second human interactome, yeast two-hybrid assays were used to search for interactions between 7200 bait and 7200 prey proteins, representing 10% of the possible pairs in the human genome (Rual, Venkatesan et al. 2005).

Individual studies provided interaction data for three microorganisms. The E. coli interactome was generated from a tandem affinity purification study that tagged 857 proteins and purified 648 complexes (Butland, Peregrin-Alvarez et al. 2005). A yeast two-hybrid screen using 261 bait proteins and a library of over two million prey ORFs supplied H. pylori information (Rain, Selig et al. 2001) and another was used for P. falciparum data (Lacount, Vignali et al. 2005).

BioGRID, the Biological General Repository for Interaction Datasets, was the source for the three largest interaction networks used here. This database, which is operated out of Mount Sinai Hospital, contains interaction data extracted from a large number of primary literature sources. Both high- and low-throughput experiments are included in the BioGRID dataset, giving it over 200 000 total interactions from among six species (Stark, Breitkreutz et al. 2006). The D. melanogaster data used in this analysis contained 22 282 yeast two-hybrid interactions, the vast majority of which (20 406 interactions) came from a single experiment (Giot, Bader et al. 2003), with the remainder coming from 35 other studies. BioGRID provided two S. cerevisiae networks: one constructed from 252 affinity capture studies and the other taken from 978 yeast two-hybrid sources. Table 3.3 shows a summary of the PPI data sources used.

Each of the datasets listed interactions as pairs of standardized gene identifiers. The Ewing and Rual datasets, for example, used the Human Genome Organization Committee official symbol and BioGRID provided Drosophila data using approved FlyBase symbols. This made it possible to cross-reference the genes with the KEGG LIGAND database and annotate them with functional information. The enzyme table from this database contains entries on 4668 enzymes, and is annotated with the Enzyme Commission functional codes as well as the symbols for genes which have been observed to encode each of them (Kanehisa and Goto 2000). Once each interaction network was annotated with functional information, minimum path lengths were calculated for each enzyme pair.
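For an unweighted interaction network, the minimum path length between two proteins can be found with a breadth-first search. The sketch below illustrates this in Python; it assumes interactions are supplied as undirected pairs of identifiers, and it is not necessarily the procedure used to produce the results in this chapter.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_adjacency(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from pairs of interacting proteins."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in interactions:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def min_path_length(
    adjacency: Dict[str, Set[str]], source: str, target: str
) -> Optional[int]:
    """Return the number of interactions on the shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # the two proteins are in different connected components


# Toy network with placeholder protein names.
toy_network = build_adjacency([("A", "B"), ("B", "C"), ("D", "E")])
```

A direct physical interaction corresponds to a path length of one, and pairs in separate components have no defined path length.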

3.3 Results

3.3.1 Free Energy Coupling

In total, there were 3149 enzyme pairs that matched the criteria required for metabolic acceleration through channelling: the first enzyme catalyzes a thermodynamically unfavourable reaction, the second catalyzes a favourable reaction, and the combined free energy change of the two reactions is negative. These pairs included 196 intermediate compounds which were identified as being likely channelling candidates. The top scoring compounds are shown in table B1 in appendix B.

One immediately apparent aspect of these results is that many of the compounds identified are involved in multiple reactions. For example, there were 144 reaction pairs that were predicted to benefit from channelling glyoxylate, a compound involved in , and metabolism, as well as the glyoxylate cycle in plants and microorganisms. Many of these pairs represent parallel, or very similar reactions – can form glyoxylate from glycolate by reducing NAD+ in one case, or NADP+ in another. Most, however, are due to the fact that glyoxylate is found at a biochemical branch point. In particular, glyoxylate is a compound with a higher energy of formation (ΔF) than many of its metabolically adjacent neighbours, meaning that most reactions for which it is a product have a positive ΔG, and most reactions using it as a substrate have a negative ΔG. As a consequence, many reaction pairs with glyoxylate as the intermediate meet the free energy prediction criteria of having an unfavourable reaction followed by a favourable reaction. This finding that the channelling predictions frequently involve highly-connected metabolites is discussed later in this chapter.

Two highly connected metabolites from the free energy coupling prediction list that are worth discussing further are 5,10-methenyltetrahydrofolate and chorismate. The former, which is processed by multifunctional enzymes in many species, is documented to be involved in channelling, and the latter is distinguished by its particularly high prediction score.

3.3.1.1 Predicted Channelling Candidate: 5,10-methenyltetrahydrofolate (5,10-methenyl-THF)

Methylenetetrahydrofolate dehydrogenase (methylene-THF dehydrogenase) and methenyl-THF cyclohydrolase, which were predicted to form a potential channelling pair (prediction score of 4.7 × 10⁷), are found in a single, multifunctional enzyme in many organisms. Methylene-THF dehydrogenase forms the 5,10-methenyl-THF intermediate, which is subsequently processed by methenyl-THF cyclohydrolase to form 10-formyl-THF, by the following reactions.

Methylene-THF dehydrogenase:

5,10-methylene-THF + NADP+ ↔ 5,10-methenyl-THF + NADPH + H+

Methenyl-THF cyclohydrolase:

5,10-methenyl-THF + H2O ↔ 10-formyl-THF

In most eukaryotes these two reactions are catalyzed by a single, bifunctional enzyme, whereas in prokaryotes they are catalyzed by two separate enzymes. The mammalian enzyme equivalent includes a 10-formyl-THF synthase domain, meaning that it is trifunctional and catalyzes three sequential chemical transformations.

Together these two reactions feed the one-carbon pool by drawing from the thymidylate, and serine pathways. Folate, and in particular 10-formyl-THF, is an important metabolite used in the transfer of single-carbon subgroups. It is an essential compound for purine biosynthesis, making the enzymes responsible for its production common targets of anti- therapies.

The metabolic flow does not always proceed from 5,10-methylene-THF to 10-formyl-THF and has been observed operating in the opposite direction in yeast (Pasternack, Laude et al. 1992). This allows the one-carbon metabolite pool to be supplied both by serine metabolic precursors and by formate, depending on the cell's biochemical needs. Channelling between these enzymes has been suggested to specifically improve metabolism in the reverse direction (Pelletier and Mackenzie 1995). While channelling does increase the reaction velocities in the forward direction, the efficiency is only 50%, allowing half of the intermediate to diffuse into the cytosol (Cohen and Mackenzie 1978). Because the cyclohydrolase operates more slowly than the dehydrogenase, 5,10-methenyl-THF is produced more quickly than it can be utilized when the enzyme is operating in the forward direction, so half of the intermediate molecules dissociate from the complex and diffuse into the cytosol. When 5,10-methylene-THF is being formed from 10-formyl-THF, however, the channelling efficiency increases to almost 100% (Pawelek, Allaire et al. 2000).

Researchers investigating the mechanism of these two reactions have measured the equilibrium constant of the combined reaction to be 16 ± 2.8, providing a useful comparison with the free energy values used in this channelling prediction. This agrees moderately well with the free energy values of +3.867 kJ/mol and -9.7 kJ/mol that the NIST database gives for methylene-THF dehydrogenase and methenyl-THF cyclohydrolase respectively. At 30°C, the temperature at which the combined equilibrium constant was measured, these free energy values predict an equilibrium constant of 6.8. Both 6.8 and 16 reflect a reversible reaction in which the equilibrium significantly favours formation of the reaction products (Pelletier and Mackenzie 1995).

3.3.1.2 Predicted Channelling Candidate: Chorismate

The compound which scored highest on the free energy prediction list was chorismate, an intermediate usually produced from shikimate. This metabolite is found at the critical branch point between indole and tryptophan metabolism and the ubiquinone, folate and alkaloid biosynthesis pathways, and its use is tightly regulated. Although there are currently no known examples of chorismate channelling, its placement in the metabolic network, coupled with the potential for directed reaction acceleration suggested by its prediction score, indicates that it is an excellent potential metabolic channelling candidate. In particular, the chorismate mutase : anthranilate synthase enzyme pair showed an extremely high prediction score.

Chorismate mutase:

chorismate ↔ prephenate

Anthranilate synthase:

chorismate + L-glutamine ↔ anthranilate + pyruvate + L-glutamate

Both enzymes act on chorismate to form compounds at the entry to major biochemical pathways: Chorismate mutase creates prephenate, the main precursor for tyrosine formation, and anthranilate synthase leads into the tryptophan pathway by forming anthranilate.

Metabolic regulation at the chorismate branch point is essential to the survival of many organisms. Experiments in yeast have shown that misregulation by overexpression of chorismate mutase causes tryptophan starvation, as metabolites that would have been used to form tryptophan end up being redirected into the tyrosine biosynthesis pathway instead. Simultaneous overexpression of anthranilate synthase was shown to rescue this phenotype (Krappmann, Lipscomb et al. 2000). Plants have been observed to dynamically regulate the activity of these enzymes to increase tryptophan production as part of their defence response (Niyogi and Fink 1992). As well, plants are known to engage in channelling of chorismate precursors by localizing exclusively to the chloroplasts (Schmid, Schaller et al. 1992). It is certainly plausible that the channelling of chorismate between these two enzymes may occur in nature, and that it would have a significant effect on the production ratio between tyrosine and tryptophan.

3.3.2 Toxicity and Inhibitors

Two databases were queried for toxic and inhibitory metabolites with the potential to be channelled: ChemIDplus (Tomasulo 2002) provided information on toxic compounds, and BRENDA (Barthelmes, Ebeling et al. 2007) was used as the source of inhibitors. Each chemical species found was then cross-referenced with the KEGG (Kanehisa and Goto 2000) metabolic network, to identify molecules which are present in natural metabolic systems. The metabolites that are involved in two or more enzyme-catalyzed reactions were considered to be metabolic intermediates that may be involved in channelling due to their toxic or inhibitory properties.

For the toxic intermediates, a toxicity threshold of 100ppm was chosen, to exclude chemicals such as (toxic at 600ppm) which are toxic only at very high concentrations. A total of 37 072 compounds from the ChemIDplus database exceeded this level of toxicity. Of these, 1634 were identified as metabolites in KEGG, leading to 197 channelling candidates that were processed by multiple enzymatic reactions. There were 11 004 chemicals annotated in the BRENDA database to have an inhibitory effect on enzymes. From this set of inhibitors, 1624 were considered metabolites and 199 were metabolic intermediates. See table 3.4 for more detail.

Table 3.4: Toxic and Inhibitory Data Overview

This table shows the number of chemical species present in each toxicity and inhibitory compound set. As the toxicity threshold increases, more molecules are added. Only a small fraction are metabolic intermediates, with the potential to be involved in channelling.

Set | Compounds | Metabolites | Intermediates
Toxic at 0.0001 ppm | 10 | 4 | 0
Toxic at 0.001 ppm | 51 | 9 | 0
Toxic at 0.01 ppm | 181 | 30 | 1
Toxic at 0.1 ppm | 771 | 123 | 14
Toxic at 1.0 ppm | 2915 | 367 | 46
Toxic at 10 ppm | 8885 | 816 | 97
Toxic at 100 ppm | 37072 | 1634 | 197
Inhibitory | 11004 | 1624 | 199

The number of enzymes each compound interacts with directly affects how many channelling pairings are possible; the number of potential pairs grows roughly with the square of the enzyme count. Figures 3.2 and 3.3 therefore show the most highly connected toxic and inhibitory intermediates in the metabolic network, with node degree representing the number of interacting enzymes. Reflecting the scale-free property of biological networks, these charts show a small number of highly-connected intermediates and a large number that have only a few enzymes acting on them.
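As a rough guide to how the number of pairings scales with node degree, the count of distinct enzyme pairs sharing one compound can be computed directly. The Python sketch below assumes, purely for illustration, that any two enzymes acting on the compound could be partnered and that pairs are unordered; the function name is hypothetical.

```python
from math import comb


def possible_pairings(enzyme_count: int) -> int:
    """Number of unordered enzyme pairs among enzymes sharing one compound."""
    return comb(enzyme_count, 2)  # equals n * (n - 1) / 2


# A compound acted on by 10 enzymes allows 45 pairs; one with 100 allows 4950.
pair_counts = {n: possible_pairings(n) for n in (2, 10, 100)}
```

Under this assumption a tenfold increase in node degree gives roughly a hundredfold increase in candidate pairs, which is why highly connected intermediates dominate the candidate lists.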

Many reports of ammonia channelling have been published, showing that this toxic yet essential intermediate needs to be kept out of the cytosol. There are almost 200 enzyme-catalyzed reactions that require ammonia, as well as more that incorporate it as a transient partial reaction substrate. In many cases, these reactions have become coupled with glutamine amidotransferases that produce ammonia from glutamine and send it directly to the enzyme's active site (Mouilleron and Golinelli-Pimpaneau 2007). For example, the hisH-hisF complex catalyzes the cyclase reaction step in the histidine biosynthesis pathway. HisH hydrolyzes glutamine to produce ammonia, which is channelled directly into the active site of hisF. Since the histidine biosynthesis pathway is not present in mammals, the hisH-hisF channelling interaction is a potential herbicide target (Chaudhuri, Lange et al. 2001; Amaro, Tajkhorshid et al. 2003).

An interesting inhibitor which may be worth future investigation is 2-oxoglutarate, an important product of the citric acid cycle. This metabolite, which inhibits the activity of phosphoenolpyruvate carboxylase, is known to be involved in the regulation of glycolysis activity, especially during periods of low glucose supply, such as fasting (Titheradge, Picking et al. 1992). The citric acid cycle, however, is only one of several pathways that require 2-oxoglutarate as a metabolite; the glutamate and butanoate metabolic pathways, for example, are connected to each other by this chemical. It is possible that some organisms have evolved 2-oxoglutarate channelling between the glutamate and butanoate pathways in order to prevent inappropriate inhibition of glycolysis.

Figure 3.2: Toxic Metabolite Node Degrees

Toxic metabolites and their connectivity within the metabolic network. Node degree, which represents the number of enzymes that act on the metabolite, is plotted on a logarithmic scale.

[Bar chart not reproduced. Axes: Compound against Node Degree (logarithmic scale, 1 to 1000). The most highly connected toxic metabolites are listed individually, beginning with ammonia; the remainder are grouped as 8 compounds of degree 4, 13 of degree 3, 32 of degree 2 and 79 of degree 1.]

Figure 3.3: Inhibitory Metabolite Node Degrees

Inhibitory metabolites and their connectivity within the metabolic network. Node degree, which represents the number of enzymes that act on the metabolite, is plotted on a logarithmic scale.

[Bar chart not reproduced. Axes: Compound against Node Degree (logarithmic scale, 1 to 1000). The most highly connected inhibitory metabolites are listed individually, beginning with 5'-AMP and 2-oxoglutarate; the remainder are grouped as 13 compounds of degree 6, 17 of degree 5, 23 of degree 4, 40 of degree 3, 58 of degree 2 and 72 of degree 1.]

70

3.3.3 Metabolic Branch Points

Metabolic channelling is suspected to play a role in the regulation of metabolism through the formation and dissociation of transient enzyme complexes. Channelling that is dynamically induced by these interactions would have the ability to direct metabolic flow into specific pathways if the enzymes involved catalyze reactions at metabolic branch points. To test whether this is occurring in the metabolites predicted to be involved in channelling, each compound was categorized by two metrics: the number of KEGG pathway annotations and the number of biochemical reactions that act upon the metabolite.

The results using both KEGG pathways and metabolic branching factors show a consistent over-representation of compounds from each of the prediction lists – free energy predictions, toxic compounds, inhibitory compounds – as well as from the gene fusions and multifunctional enzymes. Figures 3.4 and 3.5 show the fraction of compounds present in each category, normalized against those compounds which act as intermediates between any pair of consecutive reactions. Enzyme pairs predicted by their free energy changes show the largest enrichment, with 4-fold higher than expected membership in the multi-pathway class. The toxic and inhibitory intermediate predictors also show a greater frequency of multi-pathway KEGG membership than seen in the control, but with only a 2-fold increase.
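The normalization behind Figure 3.4 can be illustrated with a minimal sketch. It assumes that each compound is summarized by its number of KEGG pathway annotations and that every compound has at least one annotation. The function names and the example counts are hypothetical and are not data from this study.

```python
from collections import Counter

CLASSES = ("1", "2", "3+")


def pathway_class(n_pathways):
    """Bin a compound by its number of KEGG pathway annotations."""
    return "3+" if n_pathways >= 3 else str(n_pathways)


def normalized_fractions(test_counts, control_counts):
    """Fraction of the test set in each class divided by the control fraction."""
    test = Counter(pathway_class(n) for n in test_counts)
    control = Counter(pathway_class(n) for n in control_counts)
    result = {}
    for cls in CLASSES:
        test_frac = test[cls] / len(test_counts)
        control_frac = control[cls] / len(control_counts)
        # A class that is empty in the control set has no defined ratio.
        result[cls] = test_frac / control_frac if control_frac > 0 else float("nan")
    return result


# Hypothetical pathway counts per compound (illustration only).
all_intermediates = [1, 1, 1, 1, 2, 2, 3, 1, 1, 2]
predicted = [1, 2, 3, 4, 3]

for cls, ratio in normalized_fractions(predicted, all_intermediates).items():
    print(f"{cls:>2} pathways: {ratio:.2f}x of the control fraction")
```

A ratio above 1 in the "3+" class corresponds to the kind of branch point enrichment described above; a ratio below 1 in the "1" class corresponds to under-representation of single-pathway compounds.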

As with the KEGG pathway memberships, metabolites predicted by the free energy method (section 3.3.1) demonstrated the greatest correlation to branching factor, showing a 7-fold increase in the most highly connected category (21+ reactions). The remaining predictors were also seen to select for unusually highly connected compounds. In the 6-10, 11-20 and 21+ categories, every enzyme pair list had proportionately higher representation than did the complete list of biochemically consecutive enzymes. Each of the techniques used for predicting metabolic channelling partners resulted in candidate pairs that were enriched for metabolic branch points, as defined by both KEGG pathway membership and metabolic branching factors.

Both branch point analyses show similar enrichments for the metabolic channelling prediction sets and the Rosetta Enzyme and Swiss-Prot multifunctional enzyme sets. This result is not surprising, as the multifunctional enzyme lists were found to overlap significantly with free energy and inhibitory metabolite predictions.

Figure 3.4: KEGG Pathway Membership

The enrichment of compounds annotated with multiple KEGG pathways in each of the channelling prediction sets. The fraction of compounds in each prediction list belonging to 1, 2 or more KEGG pathways was normalized by dividing by the fraction of biochemically consecutive enzyme pairs annotated with the same number of pathways. Fewer compounds than expected among those predicted to be involved in channelling are annotated with only one metabolic pathway. Branch point compounds – those belonging to three or more KEGG pathways – are enriched in each of the prediction lists. In this figure, “Consecutive” refers to the complete set of biochemically consecutive enzyme pairs, whereas the remaining data sets contain enzyme pair predictions: “Free Energy” pairs were predicted by their Gibbs free energy values, “RosettaEnzyme” pairs are putative gene fusions, “SwissProt” pairs are multifunctional enzymes, “Toxic” pairs are enzymes acting on a common toxic metabolite and “Inhibitor” pairs are enzymes which both act on a known enzymatic inhibitor. Statistical significance is indicated with dots: p-value < 10⁻² (•), < 10⁻⁵ (••), < 10⁻¹⁰ (•••). See appendix C.

Figure 3.5: Metabolic Branching Factor

Compounds are divided into six categories based on the number of biochemical reactions with which each is annotated in the KEGG compound database. The highest categories – with six or more reaction annotations – contain the most metabolically well-connected compounds and represent branch-point metabolites. The fraction of each prediction set belonging to each branching factor class was normalized against the fraction present in all intermediate metabolites. In this figure, “Consecutive” refers to the complete set of metabolites. The remaining data sets contain metabolites predicted to be involved in channelling by the five predictors described in figure 3.4. Statistical significance is indicated with dots: p-value < 10⁻² (•), < 10⁻⁵ (••), < 10⁻¹⁰ (•••). See appendix C.

3.3.4 Multifunctional Enzymes

From the 4 600 000 protein pairs from 170 genomes in the Rosetta Stone database, 4144 represented pairs of enzymes, 571 of which catalyze consecutive reactions. As expected, these gene fusions show a high degree of overlap with the multifunctional Swiss-Prot enzyme list: of the 176 Swiss-Prot pairs, 43 (24%) are present in the Rosetta Stone list. Each of the gene fusion pairs represents one enzyme whose sequence shares significant similarity to two enzymes that catalyze different reactions, implying that the two original genes fused at some point in evolutionary history. Although genetic fusion of two enzymes does not always result in a protein capable of catalyzing multiple reactions, it is one method by which multifunctional enzymes evolve.

Bringing two enzymes together in one protein is an excellent way for evolution to create metabolic channelling. As in a metabolon formed through inter-protein binding, catalyzing two reactions with a single molecule guarantees that they will take place at the same physical location in the cell. This minimizes diffusion of the intermediate into the bulk cytosol. Although there are other reasons to fuse enzymatic genes – transcriptional regulation, for example – the ability of an enzyme to catalyze multiple sequential biochemical reactions is a good indicator of metabolic channelling.

Because multifunctional enzymes are a good indicator of channelling, the fraction of enzyme pairs that are present in the Rosetta and Swiss-Prot lists is one way to measure the accuracy of a metabolic channelling predictor. The list of consecutive enzyme pairs represents all possible metabolic channelling opportunities, and so it is used as the control set against which the channelling predictions are compared. Only 0.72% of all consecutive enzyme pairs are present in the Rosetta Stone data set and 0.22% in the multifunctional Swiss-Prot annotation list, defining the fraction of multifunctional enzyme annotations expected for enzyme pairs with no evidence of metabolic channelling (Table 3.5).

Table 3.5: Consecutive Reaction Representation in Multifunction Enzymes

This table shows the overlap between the complete set of biochemically consecutive enzymes and the multifunctional enzymes present in the Rosetta Stone and Swiss-Prot data sets. The percentage of consecutive pairs found in each multifunctional enzyme set represents the expected fraction with no metabolic channelling predicted. The consecutive, Rosetta Stone and Swiss-Prot totals represent the number of enzyme pairs with unique functional annotations.

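| | Consecutive: Count | Consecutive: Percent | Rosetta Stone: Count | Rosetta Stone: Percent | Swiss-Prot: Count | Swiss-Prot: Percent |
|---|---|---|---|---|---|---|
| Rosetta | 571 | 0.72% | | | 43 | 24.43% |
| Swiss-Prot | 176 | 0.22% | 43 | 7.53% | | |
| Total | 79432 | | 571 | | 176 | |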

Enzyme pairs predicted to be involved in channelling by their Gibbs free energy changes showed a much higher level of overlap with both multifunctional sets than did the complete list of consecutive pairs. Of the 546 predicted pairs, 51 (9.3%) showed evidence of gene fusion in at least one organism and 8 (1.5%) were annotated with multiple enzymatic functions in Swiss-Prot, representing 13- and 7-fold enrichment respectively (Table 3.6).

Table 3.6: Predicted Channelling Representation in Multifunctional Enzymes

This table shows the overlap between multifunctional enzymes and enzyme pairs that were predicted to be involved in channelling due to free energy coupling or because of a common toxic or inhibitory intermediate. Free energy and inhibitory coupling predictions show a large enrichment of multifunctional enzymes as compared with the control set of all biochemically consecutive enzyme pairs. Predictions based on toxic intermediates do not demonstrate any enrichment. Statistical significance is indicated with dots: p-value < 10⁻² (•), < 10⁻⁵ (••), < 10⁻¹⁰ (•••). See appendix C.

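| | Free Energy: Count | Free Energy: Percent | Free Energy: Enrichment | Toxic: Count | Toxic: Percent | Toxic: Enrichment | Inhibitory: Count | Inhibitory: Percent | Inhibitory: Enrichment |
|---|---|---|---|---|---|---|---|---|---|
| Rosetta | 51 | 9.34% | 13.0x ••• | 33 | 0.69% | 1.0x | 81 | 2.28% | 3.2x ••• |
| Swiss-Prot | 8 | 1.47% | 6.6x • | 14 | 0.29% | 1.3x | 24 | 0.68% | 3.0x •• |
| Total | 546 | | | 4799 | | | 3553 | | |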

Toxicity of the common intermediate did not lead to any noticeable increase in multifunctional enzyme membership. Channelling of inhibitory compounds, however, was supported by these data. An approximately 3-fold enrichment for both gene fusion and multiple enzyme activities was observed for reaction pairs involving inhibitory metabolites.
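The enrichment values in Table 3.6 can be reproduced from the counts in Tables 3.5 and 3.6 with a short calculation. The sketch below assumes that enrichment is the fraction of a prediction set found in a multifunctional enzyme list divided by the corresponding fraction of all consecutive pairs; the function and variable names are illustrative only.

```python
def fold_enrichment(hits, set_size, control_hits, control_size):
    """Ratio of the hit fraction in a prediction set to that in the control set."""
    return (hits / set_size) / (control_hits / control_size)


# Counts taken from Tables 3.5 and 3.6.
CONSECUTIVE_TOTAL = 79432
CONTROL_HITS = {"Rosetta": 571, "Swiss-Prot": 176}
PREDICTIONS = {
    "Free Energy": {"total": 546, "Rosetta": 51, "Swiss-Prot": 8},
    "Toxic": {"total": 4799, "Rosetta": 33, "Swiss-Prot": 14},
    "Inhibitory": {"total": 3553, "Rosetta": 81, "Swiss-Prot": 24},
}

for name, row in PREDICTIONS.items():
    for source, control_hits in CONTROL_HITS.items():
        fold = fold_enrichment(row[source], row["total"], control_hits, CONSECUTIVE_TOTAL)
        print(f"{name:12s} {source:10s} {fold:.1f}x")
```

Rounded to one decimal place, these ratios match the enrichment column of Table 3.6. The significance levels are not reproduced here, since they depend on the test described in appendix C.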

3.3.5 Protein-Protein Interaction Networks

Enzyme colocalization, which can be accomplished through direct enzyme-enzyme binding interactions, can also be mediated through the action of additional proteins. Binding interactions with non-enzymatic proteins can link enzymes together into complexes, or can recruit them to specific intracellular locations. The minimum path length of physical protein-protein interactions required to connect each pair of enzymes was investigated as an indicator for the likelihood of the two enzymes being colocalized, and therefore potentially involved in metabolic channelling.

To search for indirectly interacting enzyme pairs, the shortest path length was calculated for every pair of enzymes within each PPI network. This measures the minimum number of physical interactions needed to bridge the enzymes and can be used as a metric for how likely they are to come together, at least transiently, in the cell. For these data, the hypothesis is that enzyme pairs involved in channelling will tend to have shorter shortest path lengths than those that are not.
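A minimal sketch of such a shortest path calculation follows. It assumes an unweighted, undirected interaction graph searched by breadth-first search; the text does not state which algorithm or software was used, and the protein names are hypothetical.

```python
from collections import deque


def build_graph(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_length(graph, source, target):
    """Minimum number of interactions linking two proteins, or None if unconnected."""
    if source not in graph or target not in graph:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical network: two enzymes bridged by a non-enzymatic scaffold protein.
ppi = build_graph([
    ("EnzA", "ScaffoldX"),
    ("ScaffoldX", "EnzB"),
    ("EnzB", "EnzC"),
    ("EnzD", "ProtY"),
])
print(shortest_path_length(ppi, "EnzA", "EnzB"))  # 2
print(shortest_path_length(ppi, "EnzA", "EnzD"))  # None (different components)
```

Breadth-first search visits proteins in order of increasing distance, so the first time the target is reached the distance is guaranteed to be minimal.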

To test whether potential channelling partners are closer together in the PPI networks, each pair was classified by its minimum path length. With the one exception of E. coli, the median path length between enzymes for all of the networks used in this study is 4. Enzyme pairs were therefore classified as close (shortest path of 1 – 3 edges), medium (4 edges) or distant (shortest path of 5 or more edges) based on that value. E. coli has a smaller and tighter interaction network, with an average path length of 3. In this network, enzyme pairs were labelled as close or distant if their minimum path lengths were respectively less or greater than this average.
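The classification rule can be written as a small function. This is a sketch under the assumption that the network's typical path length is passed in as a parameter (4 for most networks, 3 for E. coli); the function name and the "unconnected" label are hypothetical additions.

```python
def classify_path_length(length, typical=4):
    """Label an enzyme pair by its shortest path length relative to the network's typical value."""
    if length is None:
        return "unconnected"  # no path between the two enzymes
    if length < typical:
        return "close"
    if length == typical:
        return "medium"
    return "distant"


# Most networks in this study: typical path length of 4.
print([classify_path_length(n) for n in (1, 3, 4, 5)])
# E. coli: typical path length of 3.
print([classify_path_length(n, typical=3) for n in (2, 3, 4)])
```

Passing the threshold as a parameter keeps a single rule for all networks, rather than special-casing E. coli.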

The list of consecutive enzyme pairs is the negative control in this analysis, as it represents every possible enzyme pair that could potentially be involved in metabolic channelling. Each of the other enzyme pair sets was therefore normalized against the consecutive enzymes to test for relative enrichment: the fraction of enzyme pairs in the test set in each path length category was divided by the fraction of all consecutive enzyme pairs in the same path length category. The normalized ratio of short to long paths for each prediction set shows whether there is a greater than expected number of short paths.
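The short-to-long enrichment ratio can be checked against the pooled counts in Table 3.7 with the sketch below. The normalized columns of that table appear to be count ratios rather than fraction ratios; the final enrichment is the same either way, because the set totals cancel when the short and long values are divided. The function name is hypothetical.

```python
def short_to_long_enrichment(test, control):
    """Enrichment of short over long paths, relative to the control set.

    Both arguments are (short, medium, long) pair counts. Returns None when
    a ratio is undefined because a required count is zero.
    """
    test_short, _, test_long = test
    control_short, _, control_long = control
    if control_short == 0 or control_long == 0 or test_long == 0:
        return None
    normalized_short = test_short / control_short
    normalized_long = test_long / control_long
    return normalized_short / normalized_long


# Pooled interactome counts from Table 3.7: (1-3 edges, 4 edges, 5+ edges).
CONSECUTIVE = (1637, 1657, 1074)
POOLED = {
    "All": (75607, 88501, 72882),
    "Inhibitor": (111, 140, 83),
    "Free Energy": (93, 114, 58),
    "Rosetta": (113, 122, 51),
    "Swiss Prot": (9, 19, 11),
    "Toxic": (45, 58, 59),
}

for name, counts in POOLED.items():
    print(f"{name:12s} {short_to_long_enrichment(counts, CONSECUTIVE):.2f}")
```

Values above 1 indicate proportionally more short paths than in the consecutive pairs; values below 1 indicate fewer.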

As was previously reported (Huthmacher, Gille et al. 2008), consecutive enzyme pairs demonstrated an enrichment for short path lengths as compared with unrelated enzymes. In the combined interactome set, as shown in Table 3.7, the complete list of enzyme pairs shows a much lower short path length membership, with an enrichment ratio of 0.68 relative to the consecutive pairs. This pattern is observable in each of the individual interaction sets in Table 3.8 and is backed by 2700 data points, confirming the earlier findings.

Table 3.7: Minimum Path Lengths in the Combined Interactome Network

Short minimum path length enrichment in the combined interaction networks from all eight interactomes. Path lengths are counted as being short (1-3 edges), average (4 edges) or long (5+ edges) for each network except for E. coli.

*Because of the shorter median path length, short paths are defined as those with 1-2 edges, average paths have 3 edges and long paths have 4 or more edges in the E. coli network. Statistical significance is indicated with dots: p-value < 10⁻² (•), < 10⁻⁵ (••), < 10⁻¹⁰ (•••). See appendix C.

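Pooled Interactomes

| Set | Sig. | Path lengths: 1-3 | Path lengths: 4 | Path lengths: 5+ | Normalized to consecutive pairs: 1-3 | Normalized: 4 | Normalized: 5+ | Enrichment |
|---|---|---|---|---|---|---|---|---|
| All | ••• | 75607 | 88501 | 72882 | 46.19 | 53.41 | 67.86 | 0.68 |
| Consecutive | | 1637 | 1657 | 1074 | 1.00 | 1.00 | 1.00 | 1.00 |
| Inhibitor | | 111 | 140 | 83 | 0.07 | 0.08 | 0.08 | 0.88 |
| Free Energy | | 93 | 114 | 58 | 0.06 | 0.07 | 0.05 | 1.05 |
| Rosetta | | 113 | 122 | 51 | 0.07 | 0.07 | 0.05 | 1.45 |
| Swiss Prot | | 9 | 19 | 11 | 0.01 | 0.01 | 0.01 | 0.54 |
| Toxic | • | 45 | 58 | 59 | 0.03 | 0.04 | 0.05 | 0.50 |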

The Rosetta Stone gene fusions also demonstrate an enrichment of short path length pairs in many of the interaction networks. Four of the eight individual networks show a higher than expected short : long path ratio, and in the pooled interactome set there is a 1.45-fold level of enrichment. Within the H. sapiens networks in particular, the distribution of minimum path lengths is strongly skewed toward short paths, with enrichment levels of 4- to 5-fold. However, there is an order of magnitude less data supporting this; fewer than 170 enzyme pairs from the Rosetta Stone study are present in all of the interaction networks combined.

Table 3.8: Minimum Path Lengths in Each Individual PPI Network

Enrichment for short path lengths, divided by interactomes and prediction sets. Statistical significance is indicated with dots: p-value < 10⁻² (•), < 10⁻⁵ (••), < 10⁻¹⁰ (•••). See appendix C.

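| Network | Set | Sig. | Path lengths: 1-3* | Path lengths: 4 | Path lengths: 5+ | Normalized to consecutive pairs: 1-3 | Normalized: 4 | Normalized: 5+ | Enrichment |
|---|---|---|---|---|---|---|---|---|---|
| D. melanogaster Two-Hybrid | All | •• | 7419 | 16452 | 14356 | 35.50 | 52.73 | 69.35 | 0.51 |
| D. melanogaster Two-Hybrid | Consecutive | | 209 | 312 | 207 | 1.00 | 1.00 | 1.00 | 1.00 |
| D. melanogaster Two-Hybrid | Inhibitor | • | 9 | 24 | 22 | 0.04 | 0.08 | 0.11 | 0.41 |
| D. melanogaster Two-Hybrid | Free Energy | | 2 | 22 | 4 | 0.01 | 0.07 | 0.02 | 0.50 |
| D. melanogaster Two-Hybrid | Rosetta | | 5 | 23 | 12 | 0.02 | 0.07 | 0.06 | 0.41 |
| D. melanogaster Two-Hybrid | Swiss Prot | | 1 | 5 | 2 | 0.00 | 0.02 | 0.01 | 0.50 |
| D. melanogaster Two-Hybrid | Toxic | | 4 | 9 | 4 | 0.02 | 0.03 | 0.02 | 0.99 |
| E. coli Affinity Capture-MS | All | • | 3509 | 7854 | 8142 | 42.79 | 55.31 | 66.74 | 0.64 |
| E. coli Affinity Capture-MS | Consecutive | | 82 | 142 | 122 | 1.00 | 1.00 | 1.00 | 1.00 |
| E. coli Affinity Capture-MS | Inhibitor | | 4 | 7 | 5 | 0.05 | 0.05 | 0.04 | 1.19 |
| E. coli Affinity Capture-MS | Free Energy | | 7 | 9 | 6 | 0.09 | 0.06 | 0.05 | 1.74 |
| E. coli Affinity Capture-MS | Rosetta | | 9 | 5 | 12 | 0.11 | 0.04 | 0.10 | 1.12 |
| E. coli Affinity Capture-MS | Swiss Prot | | 0 | 1 | 2 | 0.00 | 0.01 | 0.02 | N/A |
| E. coli Affinity Capture-MS | Toxic | | 1 | 1 | 1 | 0.01 | 0.01 | 0.01 | 1.49 |
| H. pylori Two-Hybrid | All | | 3376 | 4971 | 2979 | 42.73 | 50.72 | 51.36 | 0.83 |
| H. pylori Two-Hybrid | Consecutive | | 79 | 98 | 58 | 1.00 | 1.00 | 1.00 | 1.00 |
| H. pylori Two-Hybrid | Inhibitor | | 1 | 5 | 1 | 0.01 | 0.05 | 0.02 | 0.73 |
| H. pylori Two-Hybrid | Free Energy | | 5 | 9 | 4 | 0.06 | 0.09 | 0.07 | 0.92 |
| H. pylori Two-Hybrid | Rosetta | | 9 | 11 | 7 | 0.11 | 0.11 | 0.12 | 0.94 |
| H. pylori Two-Hybrid | Swiss Prot | | 1 | 2 | 0 | 0.01 | 0.02 | 0.00 | N/A |
| H. pylori Two-Hybrid | Toxic | | 6 | 3 | 3 | 0.08 | 0.03 | 0.05 | 1.47 |
| H. sapiens Affinity Capture-MS | All | | 1406 | 2427 | 3070 | 54.08 | 57.79 | 66.74 | 0.81 |
| H. sapiens Affinity Capture-MS | Consecutive | | 26 | 42 | 46 | 1.00 | 1.00 | 1.00 | 1.00 |
| H. sapiens Affinity Capture-MS | Inhibitor | | 4 | 3 | 4 | 0.15 | 0.07 | 0.09 | 1.77 |
| H. sapiens Affinity Capture-MS | Free Energy | | 1 | 5 | 2 | 0.04 | 0.12 | 0.04 | 0.88 |
| H. sapiens Affinity Capture-MS | Rosetta | | 3 | 3 | 1 | 0.12 | 0.07 | 0.02 | 5.31 |
| H. sapiens Affinity Capture-MS | Swiss Prot | | 0 | 1 | 0 | 0.00 | 0.02 | 0.00 | N/A |
| H. sapiens Affinity Capture-MS | Toxic | | 0 | 3 | 4 | 0.00 | 0.07 | 0.09 | 0.00 |
| H. sapiens Two-Hybrid | All | ••• | 3143 | 5063 | 12102 | 27.33 | 71.31 | 106.16 | 0.26 |
| H. sapiens Two-Hybrid | Consecutive | • | 115 | 71 | 114 | 1.00 | 1.00 | 1.00 | 1.00 |
| H. sapiens Two-Hybrid | Inhibitor | | 4 | 12 | 11 | 0.03 | 0.17 | 0.10 | 0.36 |
| H. sapiens Two-Hybrid | Free Energy | | 0 | 2 | 5 | 0.00 | 0.03 | 0.04 | 0.00 |
| H. sapiens Two-Hybrid | Rosetta | | 5 | 1 | 1 | 0.04 | 0.01 | 0.01 | 4.96 |
| H. sapiens Two-Hybrid | Swiss Prot | | 0 | 0 | 1 | 0.00 | 0.00 | 0.01 | 0.00 |
| H. sapiens Two-Hybrid | Toxic | | 0 | 2 | 6 | 0.00 | 0.03 | 0.05 | 0.00 |
| P. falciparum Two-Hybrid | All | | 784 | 1142 | 1078 | 39.20 | 28.55 | 39.93 | 0.98 |
| P. falciparum Two-Hybrid | Consecutive | | 20 | 40 | 27 | 1.00 | 1.00 | 1.00 | 1.00 |
| P. falciparum Two-Hybrid | Inhibitor | | 3 | 3 | 1 | 0.15 | 0.08 | 0.04 | 4.05 |
| P. falciparum Two-Hybrid | Free Energy | | 1 | 3 | 3 | 0.05 | 0.08 | 0.11 | 0.45 |
| P. falciparum Two-Hybrid | Rosetta | | 3 | 5 | 0 | 0.15 | 0.13 | 0.00 | N/A |
| P. falciparum Two-Hybrid | Swiss Prot | | 0 | 0 | 0 | 0.00 | 0.00 | 0.00 | N/A |
| P. falciparum Two-Hybrid | Toxic | | 0 | 0 | 2 | 0.00 | 0.00 | 0.07 | 0.00 |
| S. cerevisiae Affinity Capture-MS | All | • | 33384 | 26148 | 12478 | 48.59 | 61.24 | 59.14 | 0.82 |
| S. cerevisiae Affinity Capture-MS | Consecutive | | 687 | 427 | 211 | 1.00 | 1.00 | 1.00 | 1.00 |
| S. cerevisiae Affinity Capture-MS | Inhibitor | | 54 | 31 | 18 | 0.08 | 0.07 | 0.09 | 0.92 |
| S. cerevisiae Affinity Capture-MS | Free Energy | | 61 | 27 | 10 | 0.09 | 0.06 | 0.05 | 1.87 |
| S. cerevisiae Affinity Capture-MS | Rosetta | | 53 | 28 | 11 | 0.08 | 0.07 | 0.05 | 1.48 |
| S. cerevisiae Affinity Capture-MS | Swiss Prot | | 4 | 6 | 2 | 0.01 | 0.01 | 0.01 | 0.61 |
| S. cerevisiae Affinity Capture-MS | Toxic | • | 20 | 19 | 19 | 0.03 | 0.04 | 0.09 | 0.32 |
| S. cerevisiae Two-Hybrid | All | | 14732 | 26669 | 24307 | 53.18 | 46.79 | 62.97 | 0.84 |
| S. cerevisiae Two-Hybrid | Consecutive | | 277 | 570 | 386 | 1.00 | 1.00 | 1.00 | 1.00 |
| S. cerevisiae Two-Hybrid | Inhibitor | | 25 | 59 | 24 | 0.09 | 0.10 | 0.06 | 1.45 |
| S. cerevisiae Two-Hybrid | Free Energy | • | 7 | 42 | 28 | 0.03 | 0.07 | 0.07 | 0.35 |
| S. cerevisiae Two-Hybrid | Rosetta | | 21 | 41 | 17 | 0.08 | 0.07 | 0.04 | 1.72 |
| S. cerevisiae Two-Hybrid | Swiss Prot | | 2 | 3 | 6 | 0.01 | 0.01 | 0.02 | 0.46 |
| S. cerevisiae Two-Hybrid | Toxic | | 13 | 22 | 20 | 0.05 | 0.04 | 0.05 | 0.91 |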

None of the remaining four channelling prediction sets – those constructed based on free energy coupling, toxicity, inhibition or Swiss-Prot multifunctional annotations – shows any detectable overabundance of closely connected enzymes. While some interaction networks exhibit a positive enrichment, such as the 1.74-fold enrichment of free energy predictions in E. coli, those examples are infrequent and supported by little evidence. None of these prediction sets demonstrated any elevation in short path length numbers in the overall pooled interactome results.

3.4 Discussion

In this chapter, enzyme pairs were predicted to be involved in metabolic channelling based on three hypothesized evolutionary motivations: the circumvention of thermodynamic barriers, protection of the cytosol from toxic intermediates and prevention of inhibitors from interfering with alternate pathways. These predictions were then used to investigate a fourth potential use for channelling – regulation of metabolic flow through transient interactions – by examining the enrichment of potentially channelled compounds at metabolic branch points. Finally, the three sets of channelling predictions were compared with two expected indicators of metabolic channelling as a form of verification: known multifunctional enzymes and physical protein-protein interaction network minimum path lengths.

Many of the enzyme pairs identified by the free energy coupling prediction are known channelling partners that have been previously described in the literature. Although not shown on the list of highest-scoring free energy predictions (Table B1), both the indole and carbamoyl phosphate channelling examples described in the introduction are identified. The two reactions catalyzed by tryptophan synthase, when paired, have free energy change values that result in a high prediction score of 3.8 × 10⁸, and channelling between CKase and either ATCase or OTCase results in a score of 2.7 × 10⁸. It was promising to see that this free energy-based prediction method was able to identify these two well-studied and prominent examples of channelling, especially considering that the documented evolutionary motivation for those examples was primarily the instability of their intermediates and not the thermodynamics of the reactions. This finding suggests that more candidates on the predicted list are likely to be involved in as yet undiscovered channelling interactions.

The existence of multifunctional enzymes catalyzing the same pairs of reactions as predicted by the free energy and inhibitory metabolite channelling predictions also supports the hypothesis that metabolic channelling has evolved to accomplish these goals. Those predictions with multifunctional enzyme evidence are the most strongly supported in this analysis and are therefore more likely to be involved in metabolic channelling in living cells. Tables B2 through B4 (see appendix B) list the channelling candidates from each category that are supported by evidence of multifunctional enzymes.

A significant overlap between each of the channelling prediction sets and metabolic network branch points supports the hypothesis that channelling is used to aid in the regulation of metabolism. The free energy prediction candidates are those enzyme pairs that are most capable of experiencing a reaction acceleration when brought together through substrate channelling. Because of their potential to introduce a change in the reaction rates, they represent the enzymes with the highest possibility for metabolic regulation. It is therefore not surprising that the intermediate compounds associated with these pairs are preferentially found at pathway branch points. These locations in the metabolic network are optimal points for regulatory activity, as the large number of associated biochemical reactions can be used to control the rate of metabolic flow into many different pathways.

Multifunctional enzymes from the Rosetta Stone gene fusion analysis and Swiss Prot multiple EC number annotation set show an enrichment at metabolic branch points as well, though it is not as dramatic as the overabundance seen for free energy candidate pairs. This is most likely due to the overlap between the two sets; many multifunctional enzymes have been selected during evolution because of their ability to improve the combined reaction rates. While there are examples of bifunctional enzymes in some organisms, many other species perform the two catalytic activities in separate enzymes. This creates the ability to dynamically regulate associations between the two enzymes as the cell's biochemical needs change, which explains their enrichment at pathway branch points.

However, reaction acceleration is not the only evolutionary motivation for creating multifunctional enzymes; toxicity and instability of intermediates, for example, can also be responsible. Toxic compounds have little specific relevance to regulation, but they do appear to be concentrated at branch points. There is a two-fold enrichment at KEGG pathway branch points, and a three-fold enrichment of compounds with greater than ten associated reactions. One possible explanation is that evolution has provided cells with redundant mechanisms for ensuring that toxic metabolites are processed quickly. By expressing many different enzymes that are each capable of converting an undesirable compound into something less harmful, cells can minimize the possibility of harmful chemicals reacting inappropriately in the cytosol.

That inhibitory metabolites are associated with branch points is not surprising. Their ability to enact metabolic control through feedback inhibition is well documented (Geigenberger, Stitt et al. 2004; Goslings, Meskauskiene et al. 2004; Grubb and Abel 2006). Often, the product of a metabolic pathway has the capability to downregulate its own production by inducing allosteric effects in the enzymes near the start of that pathway. By slowing the production of its precursor compounds, the inhibitor is able to increase the flow of metabolites into alternate pathways, into which they become committed via irreversible reactions. This is most effective when the inhibitor acts at the earliest steps of a pathway, which correspond to enzymes at or near metabolic branch points.

While a correlation between metabolic branch points and each of the channelling predictors was found, many of the results in this chapter would be dramatically improved with better sources of data, particularly the predictions based on metabolite toxicity. Substrate channelling can prevent toxic metabolic intermediates from being released into the cytosol while they are being acted upon by enzymes. This implies that compounds with intracellular toxicity are potential channelling candidates. However, the toxicity database from which the list of toxic intermediates was compiled, ChemIDplus, is a collection of results from a wide variety of experiments. Many different methods of toxin administration, such as oral, intraperitoneal and inhalation, were performed on a variety of animals to create these data. Unfortunately, while this information is abundant and easily accessible, it does not reflect precisely the type of chemicals that are likely to be involved in substrate channelling. Passing intermediates directly from one enzyme to the next can decrease the concentration of chemicals that are harmful when present inside the cell, but will not reduce the lethality of toxins that act in other ways.

ChemIDplus does contain intracellular toxins, such as , formaldehyde and ammonia, but it also lists many substances that are lethal for other reasons. For example, 1,1-difluoroethane is listed as toxic because it is an inert gas that is heavier than air, and can exclude oxygen from an animal's lungs if inhaled; there is no implication that this compound would cause any harm if it were dissolved within a cell's cytosol. Because there is no evidence for the intracellular toxicity of this compound, it is not likely to be an important candidate for metabolic channelling.

The lack of significant physical interaction enrichment amongst channelling prediction candidates also suggests an area where improvement could be possible, and has three possible explanations. First, this result may indicate that channelling is not occurring between the predicted pairs. This, however, is unlikely, as several other lines of evidence have supported the free energy and inhibitory metabolite predictions. Strong overlap of those two sets with the known multifunctional enzymes demonstrates that there is a biological motivation to perform those enzymatic functions at the same location in the cell. Also, many of the reactions identified through those predictions, such as those involving indole (Yanofsky and Rachmeler 1958) and carbamoyl phosphate (Massant, Verstreken et al. 2002), have been previously documented as examples of metabolic channelling.

A second possibility is that channelling is being enacted, but through mechanisms other than physical protein-protein interactions. Perhaps through independent targeting of enzyme pairs to organelles or other specific locations within the cell, channelling partners are directed to each other. If this were the case, pairs of enzymes must be brought close enough to each other to enable channelling, but without binding to each other or being connected by a bridge of interacting proteins.

The final explanation is that channelling does occur between many of the predicted pairs, and physical protein-protein interactions are responsible for bringing them together, but that the data used in this analysis are insufficient to detect it. There are many weaknesses in the data that were used, which can be seen in the relatively small number of enzyme pairs visible in the results. Mapping of physical protein-protein interaction networks is a relatively new field, and the overall coverage is still very weak. For example, only 10% of the potential H. sapiens interactions were measured in the human interactome yeast two-hybrid assay that was performed. As well, both the affinity purification and yeast two-hybrid techniques are poor at detecting interactions amongst membrane-bound proteins, which are potential colocalization targets (Giot, Bader et al. 2003).

Because any interaction between the hybrid proteins will enable reporter expression, yeast two-hybrid selection is a very sensitive method. Transient interactions are sufficient to allow yeast colony formation to indicate an interaction, as are weak binding events. While the detection of transient interactions is considered a strength of this system, its tendency to find weakly binding pairs leads to a large number of false positives. Promiscuous proteins are often found that seem to bind to hundreds of others, and computational approaches have to be applied to attempt to filter out irrelevant interactions. As well, since the method relies on gene fusion, it is direct binding between the fused constructs that is being measured. Although this binding profile usually matches that of the natural proteins, it is possible for the attachment of transcription factor domains to interfere with the binding surfaces and result in false negatives (Formstecher, Aresta et al. 2005). Affinity purification requires that all proteins in the complex remain bound to the bait protein during the entire process of extraction from the cell and both purification steps. This eliminates the possibility of identifying weakly bound proteins and results in far fewer false positives than the yeast two-hybrid method. Transient binding events cannot be detected, however, as their interactions are not maintained for a long enough period of time.

Functional annotation of the proteins in the interaction networks is imperfect as well, and so many of the interactions which were detected may not have been properly assigned to the enzymes they represent. In the future, this analysis may yield more positive results when it is performed on more complete and better annotated networks.

In spite of these difficulties, the work presented in this chapter resulted in some interesting findings. In addition to glyoxylate, the free energy channelling predictor found many other highly metabolically connected compounds with a high energy of formation which may be involved in channelling. This has resulted in an enrichment for compounds that are involved in multiple pathways. It is an intriguing possibility that metabolism may have evolved with high ΔF compounds at key locations between pathways either because they are easily converted into other metabolites, or to increase the efficiency of regulation. These metabolites, whose thermodynamic properties make them energetically expensive to produce, release that energy when they are consumed. It is possible that this release of energy encourages the evolution of enzymes that act on high ΔF compounds by reducing the thermodynamic barriers needed to catalyze reactions.

By acting at these positions in the metabolic network, biochemical regulation within the cell can most easily direct intermediates into the right pathways at the time when they are needed, for two reasons. Firstly, having a metabolite that costs energy to produce but releases energy when used creates the opportunity for metabolic channelling to accelerate the reaction. Regulation of transient interactions between two enzymes that act on the intermediate could provide the cell with dynamic control over the reaction rates. The second regulatory benefit is unrelated to metabolic channelling: the high ΔF forces those reactions to have a favourable equilibrium constant, meaning that once their enzymes are active, the metabolites will be irreversibly converted. This creates a thermodynamic barrier which prevents thermal noise and stochastic effects from reversing regulation. Once metabolites enter a pathway via a high ΔF compound, the high free energy change of the reverse reaction will ensure that they are committed to that pathway. Unfortunately, a complete database of energies of formation was unavailable at the time of this study, and so this hypothesis could not be investigated further.
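The link between a large free energy change and an effectively irreversible reaction can be illustrated with the standard relation K = exp(−ΔG°/RT). The sketch below is illustrative only: it assumes the relevant quantity is the standard Gibbs free energy change of the reaction, a temperature of 298.15 K, and hypothetical ΔG° values that are not taken from this study.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def equilibrium_constant(delta_g_kj_per_mol, temperature_k=298.15):
    """Equilibrium constant from a standard free energy change, K = exp(-dG/RT)."""
    delta_g_joules = delta_g_kj_per_mol * 1000.0
    return math.exp(-delta_g_joules / (GAS_CONSTANT * temperature_k))


# Hypothetical standard free energy changes in kJ/mol (illustration only).
for delta_g in (-5.0, -20.0, -40.0):
    k = equilibrium_constant(delta_g)
    print(f"dG = {delta_g:6.1f} kJ/mol  ->  K = {k:.3g}")
```

Because K depends exponentially on ΔG°, doubling the magnitude of a negative free energy change squares the equilibrium constant, which is why a sufficiently large release of energy makes the reverse reaction negligible in practice.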

In this chapter, metabolic channelling partners were predicted based on three hypotheses. A total of 3149 enzyme pairs acting on 196 different intermediate compounds were predicted due to their potential to circumvent thermodynamic barriers; 197 toxic intermediates were identified as possible channelling candidates; and 199 inhibitory metabolites were found which may be involved in channelling. These predictions were validated through a comparison with multifunctional enzymes that found a 6- to 13-fold enrichment for free energy predictions and a 3-fold enrichment for inhibitory metabolite predictions. To investigate the use of these channelling candidates in metabolic regulation, their abundance at metabolic branch points was measured. This revealed that four times more free energy predictions and two times more toxic and inhibitory predictions than expected are found at these key regulatory locations in the metabolic network.

The results described here demonstrate that the free energy coupling and inhibitory metabolite channelling predictions agree well with known examples of multifunctional enzymes, and so it is probable that these two motivations influence the evolution of metabolic channelling. The results also agree with the hypothesis that substrate channelling, which is possibly induced by transient protein interactions, is involved in regulation of cellular metabolism. More work still needs to be done, though, to investigate specific channelling examples and to explore the regulatory role of channelling in more detail.

4 Conclusion

The behaviour of complex biological systems is often non-intuitive, and metabolic channelling is no exception, as demonstrated by the controversy that this subject has experienced (Gutfreund and Chock 1991; Kell 1991). Exploration of such systems can be accomplished with modeling, and to that end all of the experiments described in chapter 2 were performed in silico. There are three main reasons why modeling was chosen rather than in vivo or in vitro experiments:

To discover how channelling happens using a minimal set of rules

Modeling is one way to investigate a system in complete isolation from other factors. This can be helpful in demonstrating the causes for any particular phenomenon. If metabolic acceleration emerges as a property of a simulation, then the rules that define that model must be sufficient to cause it. Limiting the model to the simplest possible definitions that result in observable metabolic channelling can provide convincing evidence of what may be required to enable it in nature.

To explore parameters and set bounds on when channelling can happen

Beyond demonstrating which mechanisms make channelling possible, simulations can quantify the range of parameters that allow it. Within a simulation, the effects of altering natural constants can be explored. Parameters such as the diffusion coefficient of metabolites within the cytosol and reaction kinetics values can be changed in ways that defy their chemical natures. By systematically scanning through entire ranges, simulations can reveal the constraints that limit the situations in which channelling is possible.

To gain non-intuitive insights

Within a simulation, it is possible to visualize any process occurring in precise detail. Every aspect of the system that is being modeled can be viewed as time progresses. In spatial simulations, the changing locations of enzyme molecules, the rate at which each reaction is proceeding and the instantaneous concentration fields of every metabolite can be queried and recorded, at whatever level of accuracy the simulation is run. Many of these data would be difficult or impossible to record in a laboratory experiment. Looking at this information in different forms – such as tables, graphs, images and movies – can lead to unexpected insights that would otherwise be hard to come by.
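
The systematic parameter scanning described in the second reason above can be sketched as a loop over every combination of parameter values. In this minimal sketch a simple closed-form estimate, the characteristic three-dimensional diffusion time t = d^2/(6D), stands in for a full simulation run. The function names, units and ranges are assumptions for illustration and do not correspond to the interface of any simulation tool used in this work.

```python
import itertools

def diffusion_time(distance_um, diff_coeff_um2_per_s):
    """Characteristic 3D diffusion time t = d^2 / (6 D), from <r^2> = 6 D t."""
    return distance_um ** 2 / (6.0 * diff_coeff_um2_per_s)

def scan(model, **param_ranges):
    """Evaluate `model` at every combination of the supplied parameter values."""
    names = list(param_ranges)
    results = []
    for values in itertools.product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, model(**params)))
    return results

# Hypothetical ranges, chosen only to illustrate the scan.
distances_um = [0.01, 0.1, 1.0]    # inter-enzyme distance, micrometres
diff_coeffs = [1.0, 10.0, 100.0]   # diffusion coefficient, um^2/s

results = scan(diffusion_time, distance_um=distances_um, diff_coeff_um2_per_s=diff_coeffs)
for params, t in results:
    print(f"d = {params['distance_um']:5.2f} um, "
          f"D = {params['diff_coeff_um2_per_s']:6.1f} um^2/s -> t = {t:.2e} s")
```

Replacing `diffusion_time` with a function that runs a simulation and returns a measured quantity would give the same grid of results, from which bounds on the parameters can be read off.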

The work described in chapter two has provided these benefits to the understanding of metabolic channelling. Two of the expected effects of channelling were observed in the models that were used, showing that enzyme colocalization can be sufficient to enable substrate channelling in an environment with slow metabolite diffusion rates. A range of inter-enzyme distance values was explored, which helped to set some bounds on how closely enzymes must interact in order to be involved in channelling. Insights were gained regarding the acceleration of reaction pairs when thermodynamic constraints were circumvented and the reduction in cytosolic concentration of a channelled intermediate.

Chapter three provided evidence showing that evolution has taken advantage of channelling for several of the hypothesized motivations. Many enzymes predicted to be involved in metabolic channelling due to free energy coupling or an inhibitory reaction intermediate were supported by documented research as well as enrichment in multifunctional enzyme databases. A strong overlap between prediction sets and metabolic branch point compounds suggested the importance of substrate channelling in the regulation of metabolism. Finally, this work has resulted in the prediction of many novel enzyme pairs which are likely candidates to be involved in metabolic channelling.

4.1 Future Work

The simulations performed in chapter 2 represent what can be accomplished within the limitations of current tools and resources. As discussed in section 2.5.2, new insights could be obtained by extending the models to include metabolic branch points. This would allow simulations to test the potential effectiveness of transient protein associations in regulating metabolic flow through dynamic control of channelling interactions. The accuracy of data generated from these simulations could be improved through the use of higher-resolution, but more computationally intensive, settings (see section 2.4.3) or by refining the model through comparison with the in vivo time course experiments described in section 2.5.1.

Many of the predictions made in chapter three were hindered by low-quality data sources. A majority of the Gibbs free energy data were drawn from a limited set of group-contribution estimates (see section 3.2.2.1) and would be greatly improved if more experimental data were available. A source of intracellular toxicity data, which was unavailable at the time of this study (see section 2.3.2), would be of great help for predicting channelling candidates in the future. When more complete
protein-protein interaction networks containing higher-confidence interactions are available, it will be worth revisiting the minimum path length comparison, which was expected to verify the channelling predictions. Future metabolic channelling predictions can be performed once additional data sources are published. For example, high-throughput colocalization studies, performed with automated in vivo protein-protein colocalization methods (Costes, Daelemans et al. 2004), could narrow the predicted sets to those enzymes which are found in the same cytosolic locations.
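
A minimum path length between two enzymes in a protein-protein interaction network can be computed with a breadth-first search. The sketch below is illustrative only: the network, the protein names and the function name are hypothetical placeholders, and it is not the implementation used for the comparison in this study.

```python
from collections import deque

def shortest_path_length(network, source, target):
    """Breadth-first search on an undirected network.

    Returns the number of edges on the shortest path, or None if unreachable.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

# Hypothetical interaction list; names are placeholders, not real proteins.
edges = [("E1", "P1"), ("P1", "E2"), ("E2", "E3"), ("E4", "P2")]
network = {}
for a, b in edges:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

print(shortest_path_length(network, "E1", "E2"))
print(shortest_path_length(network, "E1", "E4"))
```

Predicted enzyme pairs could then be compared against randomly chosen pairs by their path lengths, with unreachable pairs (returned as `None`) handled separately.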

Finally, verification of some of the channelling predictions should be performed in living cells. By genetically fusing two enzymes that are predicted to accelerate a reaction when channelled, metabolic channelling can be artificially introduced into an organism. In vivo experimental techniques, such as those described in section 2.5.1, can then be used to investigate whether the induced channelling results in increased reaction velocities. Similarly, channelling that is predicted to lower the cytosolic concentration of an undesirable intermediate (see section 3.3.2), or to increase metabolic flow into specific pathways (see section 3.2.5), can be genetically induced and experimentally tested.

5 References

Abbott, R. G., S. Forrest, et al. (2006). "Simulating the hallmarks of cancer." Artificial life 12(4): 617-634.

al-Habori, M. (1995). "Microcompartmentation, metabolic channelling and carbohydrate metabolism." The international journal of biochemistry & cell biology 27(2): 123-132.

Amaro, R., E. Tajkhorshid, et al. (2003). "Developing an energy landscape for the novel function of a (beta/alpha)8 barrel: ammonia conduction through HisF." Proceedings of the National Academy of Sciences of the United States of America 100(13): 7599-7604.

Ander, M., P. Beltrao, et al. (2004). "SmartCell, a framework to simulate cellular processes that combines stochastic approximation with diffusion and localisation: analysis of simple networks." Syst Biol (Stevenage) 1(1): 129-138.

Andrews, S. and D. Bray (2004). "Stochastic simulation of chemical reactions with spatial resolution and single molecule detail." Physical Biology 1(3): 137-151.

Arrio-Dupont, M., G. Foucault, et al. (1997). "Mobility of creatine phosphokinase and beta-enolase in cultured muscle cells." Biophysical journal 73(5): 2667-2673.

Atsumi, S. and J. C. Liao (2008). "Metabolic engineering for advanced biofuels production from ." Current opinion in biotechnology.

Bader, G. and C. Hogue (2002). "Analyzing yeast protein-protein interaction data obtained from different sources." Nat Biotech 20(10): 991-997.

Bakker, B. M., F. I. Mensonides, et al. (2000). "Compartmentation protects trypanosomes from the dangerous design of glycolysis." Proceedings of the National Academy of Sciences of the United States of America 97(5): 2087-2092.

Barthelmes, J., C. Ebeling, et al. (2007). "BRENDA, AMENDA and FRENDA: the enzyme information system in 2007." Nucleic Acids Res 35(Database issue).

Bereiter-Hahn, J., C. Stübig, et al. (1995). "Cell cycle-related changes in F-actin distribution are correlated with glycolytic activity." Experimental cell research 218(2): 551-560.

Blinov, M. L., J. R. Faeder, et al. (2004). "BioNetGen: software for rule-based modeling of signal transduction based on the interactions of molecular domains." Bioinformatics (Oxford, England) 20(17): 3289-3291.

Boeckmann, B., M. C. Blatter, et al. (2005). "Protein variety and functional diversity: Swiss-Prot annotation in its biological context." Comptes rendus biologies 328(10-11): 882-899.

Bowers, P. M., M. Pellegrini, et al. (2004). "Prolinks: a database of protein functional linkages derived from coevolution." Genome Biol 5(5).

Butland, G., J. Peregrin-Alvarez, et al. (2005). "Interaction network containing conserved and essential protein complexes in Escherichia coli." Nature 433(7025): 531-537.

Chaudhuri, B. N., S. C. Lange, et al. (2001). "Crystal structure of imidazole glycerol phosphate synthase: a tunnel through a (beta/alpha)8 barrel joins two active sites." Structure (London, England : 1993) 9(10): 987-997.

Chock, P. B. and H. Gutfreund (1988). "Reexamination of the kinetics of the transfer of NADH between its complexes with glycerol-3-phosphate dehydrogenase and with lactate dehydrogenase." Proceedings of the National Academy of Sciences of the United States of America 85(23): 8870-8874.

Clayton, C. and P. Michels (1996). "Metabolic compartmentation in African trypanosomes." Parasitology Today 12(12): 465-471.

Clegg, J. S. (1984). "Properties and metabolism of the aqueous cytoplasm and its boundaries." The American journal of physiology 246(2 Pt 2).

Cohen, L. and R. E. Mackenzie (1978). "Methylenetetrahydrofolate dehydrogenase-methenyltetrahydrofolate cyclohydrolase-formyltetrahydrofolate synthetase from porcine liver. Interaction between the dehydrogenase and cyclohydrolase activities of the multifunctional enzyme." Biochimica et biophysica acta 522(2): 311-317.

Cornish-Bowden, A. and M. L. Cárdenas (1993). "Channelling can affect concentrations of metabolic intermediates at constant net flux: artefact or reality?" European journal of biochemistry / FEBS 213(1): 87-92.

Costes, S. V., D. Daelemans, et al. (2004). "Automatic and quantitative measurement of protein-protein colocalization in live cells." Biophysical journal 86(6): 3993-4003.

Croes, D., F. Couche, et al. (2006). "Inferring meaningful pathways in weighted metabolic networks." J Mol Biol 356(1): 222-236.

Degenring, D., M. Röhl, et al. (2004). "Discrete event, multi-level simulation of metabolite channeling." Biosystems 75(1-3): 29-41.

Dhar, P., T. Meng, et al. (2004). "Cellware--a multi-algorithmic software for computational systems biology." Bioinformatics 20(8): 1319-1321.

Douangamath, A., M. Walker, et al. (2002). "Structural evidence for ammonia tunneling across the (beta alpha)(8) barrel of the imidazole glycerol phosphate synthase bienzyme complex." Structure (London, England : 1993) 10(2): 185-193.

Droux, M., M. L. Ruffet, et al. (1998). "Interactions between serine acetyltransferase and O-acetylserine (thiol) lyase in higher plants--structural and kinetic properties of the free and bound enzymes." European journal of biochemistry / FEBS 255(1): 235-245.

Ellis, J. (2001). "Macromolecular crowding: an important but neglected aspect of the intracellular environment." Current Opinion in Structural Biology 11(1): 114-119.

Elowitz, M. B., M. G. Surette, et al. (1999). "Protein mobility in the cytoplasm of Escherichia coli." J Bacteriol 181(1): 197-203.

Ewing, R. M., P. Chu, et al. (2007). "Large-scale mapping of human protein-protein interactions by mass spectrometry." Mol Syst Biol 3.

Formstecher, E., S. Aresta, et al. (2005). "Protein interaction mapping: A Drosophila case study." Genome Res. 15(3): 376-384.

Förster, J., I. Famili, et al. (2003). "Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network." Genome Res 13(2): 244-253.

Foucault, G., M. Vacher, et al. (2000). "Interactions between beta-enolase and in the cytosol of cells." The Biochemical journal 346 Pt 1: 127-131.

García-Pérez, A. I., E. A. López-Beltrán, et al. (1999). "Molecular crowding and viscosity as determinants of translational diffusion of metabolites in subcellular organelles." Archives of biochemistry and biophysics 362(2): 329-338.

Garrod, A. E. and H. Frowde (1923). Inborn errors of metabolism. London.

Geigenberger, P., M. Stitt, et al. (2004). "Metabolic control analysis and regulation of the conversion of to starch in growing potato tubers." Plant, Cell & Environment 27(6): 655-673.

Giot, L., J. S. Bader, et al. (2003). "A protein interaction map of Drosophila melanogaster." Science 302(5651): 1727-1736.

Goldberg, R., Y. Tewari, et al. (2004). "Thermodynamics of enzyme-catalyzed reactions--a database for quantitative biochemistry." Bioinformatics 20(16): 2874-2877.

Gonzalez, B., J. François, et al. (1997). "A rapid and reliable method for metabolite extraction in yeast using boiling buffered ethanol." Yeast 13(14): 1347-1355.

Goslings, D., R. Meskauskiene, et al. (2004). "Concurrent interactions of heme and FLU with Glu tRNA reductase (HEMA1), the target of metabolic feedback inhibition of tetrapyrrole biosynthesis, in dark- and light-grown Arabidopsis plants." The Plant journal : for cell and molecular biology 40(6): 957-967.

Grubb, C. D. and S. Abel (2006). "Glucosinolate metabolism and its control." Trends in plant science 11(2): 89-100.

Gutfreund, H. and P. B. Chock (1991). " among glycolytic enzymes: fact or fiction." Journal of theoretical biology 152(1): 117-121.

Halling, P. J. (1989). "Do the laws of chemistry apply to living cells?" Trends in biochemical sciences 14(8): 317-318.

Han, J.-D., D. Dupuy, et al. (2005). "Effect of sampling on topology predictions of protein-protein interaction networks." Nature Biotechnology 23(7): 839-844.

Harding (2008). "Progress toward cell-directed therapy for phenylketonuria." Clinical Genetics 74(2): 97-104.

Hattne, J., D. Fange, et al. (2005). "Stochastic reaction-diffusion simulation with MesoRD." Bioinformatics 21(12): 2923-2924.

Henry, C., M. Jankowski, et al. (2006). "Genome-Scale Thermodynamic Analysis of Escherichia coli Metabolism." Biophys. J. 90(4): 1453-1461.

Huthmacher, C., C. Gille, et al. (2008). "A computational analysis of protein interactions in metabolic networks reveals novel enzyme pairs potentially involved in metabolic channeling." Journal of theoretical biology 252(3): 456-464.

Ishii, N., M. Robert, et al. (2004). "Toward large-scale modeling of the microbial cell for computer simulation." J Biotechnol 113(1-3): 281-294.

Janson, L. W., K. Ragsdale, et al. (1996). "Mechanism and size cutoff for steric exclusion from actin-rich cytoplasmic domains." Biophysical journal 71(3): 1228-1234.

Johnston, G. C., C. W. Ehrhardt, et al. (1979). "Regulation of cell size in the yeast Saccharomyces cerevisiae." Journal of bacteriology 137(1): 1-5.

Joyce, A. and B. Palsson (2006). "The model organism as a system: integrating 'omics' data sets." Nature Reviews Molecular Cell Biology 7(3): 198-210.

Kanehisa, M. and S. Goto (2000). "KEGG: kyoto encyclopedia of genes and genomes." Nucleic Acids Res 28(1): 27-30.

Karp, P., I. Keseler, et al. (2007). "Multidimensional annotation of the Escherichia coli K-12 genome." Nucl. Acids Res. 35(22): 7577-7590.

Kell, D. B. (1991). "On the physiological significance of metabolite channelling: if, how, and where, but not why." J Theor Biol 152(1): 49-51.

Kierzek, A. (2002). "STOCKS: STOChastic Kinetic Simulations of biochemical systems with Gillespie algorithm." Bioinformatics 18(3): 470-481.

Knighton, D., C.-C. Kan, et al. (1994). "Structure of and kinetic channelling in bifunctional dihydrofolate reductase-thymidylate synthase." Nat Struct Mol Biol 1(3): 186-194.

Krappmann, S., W. N. Lipscomb, et al. (2000). "Coevolution of transcriptional and at the chorismate metabolic branch point of Saccharomyces cerevisiae." Proceedings of the National Academy of Sciences of the United States of America 97(25): 13585-13590.

Kumar, S. P. and J. C. Feidler (2003). "BioSPICE: a computational infrastructure for integrative biology." Omics : a journal of integrative biology 7(3).

Lacount, D., M. Vignali, et al. (2005). "A protein interaction network of the malaria parasite Plasmodium falciparum." Nature 438(7064): 103-107.

Le Novere, N. and T. Shimizu (2001). "STOCHSIM: modelling of stochastic biomolecular processes." Bioinformatics 17(6): 575-576.

Loew, L. and J. Schaff (2001). "The Virtual Cell: a software environment for computational cell biology." Trends in Biotechnology 19(10): 401-406.

Lok, L. and R. Brent (2005). "Automatic generation of cellular reaction networks with Moleculizer 1.0." Nature Biotechnology 23(1): 131-136.

Luby-Phelps, K. (2000). "Cytoarchitecture and physical properties of cytoplasm: volume, viscosity, diffusion, intracellular surface area." Int Rev Cytol 192: 189-221.

Luby-Phelps, K. and D. L. Taylor (1988). "Subcellular compartmentalization by local differentiation of cytoplasmic structure." Cell motility and the cytoskeleton 10(1-2): 28-37.

Maher, A. D., P. W. Kuchel, et al. (2003). "Mathematical modelling of the urea cycle. A numerical investigation into substrate channelling." European journal of biochemistry / FEBS 270(19): 3953-3961.

Massant, J., P. Verstreken, et al. (2002). "Metabolic Channeling of Carbamoyl Phosphate, a Thermolabile Intermediate. Evidence for Physical Interaction Between Kinase-like Carbamoyl-phosphate Synthetase and Ornithine Carbamoyl from the Hyperthermophile Pyrococcus furiosus." J. Biol. Chem. 277(21): 18517-18522.

Matchett, W. H. (1974). "Indole channeling by tryptophan synthase of neurospora." The Journal of biological chemistry 249(13): 4041-4049.

Mavrovouniotis, M. (1990). "Group contributions for estimating standard gibbs energies of formation of biochemical compounds in aqueous solution." Biotechnology and Bioengineering 36(10): 1070-1082.

Méjean, C., F. Pons, et al. (1989). "Antigenic probes locate binding sites for the glycolytic enzymes glyceraldehyde-3-phosphate dehydrogenase, aldolase and phosphofructokinase on the actin monomer in microfilaments." The Biochemical journal 264(3): 671-677.

Mendes, P. (1997). "Biochemistry by numbers: simulation of biochemical pathways with Gepasi 3." Trends Biochem Sci 22(9): 361-363.

Mouilleron, S. and B. Golinelli-Pimpaneau (2007). "Conformational changes in ammonia-channeling glutamine amidotransferases." Current opinion in structural biology 17(6): 653-664.

Nakayama, Y., A. Kinoshita, et al. (2005). "Dynamic simulation of red blood cell metabolism and its application to the analysis of a pathological condition." Theoretical Biology and Medical Modelling 2(1).

Neil, J. J., T. Q. Duong, et al. (1996). "Evaluation of intracellular diffusion in normal and globally-ischemic rat brain via 133Cs NMR." Magnetic resonance in medicine : official journal of the Society of Magnetic Resonance in Medicine / Society of Magnetic Resonance in Medicine 35(3): 329-335.

Niyogi, K. K. and G. R. Fink (1992). "Two Anthranilate Synthase Genes in Arabidopsis: Defense-Related Regulation of the Tryptophan Pathway." Plant Cell 4(6): 721-733.

Ohno, H., Y. Naito, et al. (2008). "Construction of a biological tissue model based on a single-cell model: a computer simulation of metabolic heterogeneity in the liver lobule." Artificial life 14(1): 3-28.

Oldiges, M., S. Lütz, et al. (2007). "Metabolomics: current state and evolving methodologies and tools." Appl Microbiol Biotechnol 76(3): 495-511.

Paine, J., C. Shipton, et al. (2005). "Improving the nutritional value of Golden Rice through increased pro-vitamin A content." Nature Biotechnology 23(4): 482-487.

Pan, P., E. Woehl, et al. (1997). "Protein architecture, dynamics and allostery in tryptophan synthase channeling." Trends in biochemical sciences 22(1): 22-27.

Pasternack, L. B., D. A. Laude, et al. (1992). "13C NMR detection of folate-mediated serine and synthesis in vivo in Saccharomyces cerevisiae." Biochemistry 31(37): 8713-8719.

Pawelek, P. D., M. Allaire, et al. (2000). "Channeling efficiency in the bifunctional methylenetetrahydrofolate dehydrogenase/cyclohydrolase domain: the effects of site-directed mutagenesis of NADP binding residues." Biochimica et biophysica acta 1479(1-2): 59-68.

Pelletier, J. and R. Mackenzie (1995). "Binding and interconversion of tetrahydrofolates at a single site in the bifunctional methylenetetrahydrofolate dehydrogenase/cyclohydrolase." Biochemistry 34(39): 12673-12680.

Petzold, L. R. (1998). Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM: Society for Industrial and Applied Mathematics.

Purcarea, C., D. R. Evans, et al. (1999). "Channeling of carbamoyl phosphate to the pyrimidine and arginine biosynthetic pathways in the deep sea hyperthermophilic archaeon Pyrococcus abyssi." The Journal of biological chemistry 274(10): 6122-6129.

Radhakrishnan, K. and A. C. Hindmarsh (1993). Description and use of LSODE, the Livermore Solver for Ordinary Differential Equations. Lawrence Livermore National Laboratory Report.

Rain, J. C., L. Selig, et al. (2001). "The protein-protein interaction map of Helicobacter pylori." Nature 409(6817): 211-215.

Reed, J. L., T. D. Vo, et al. (2003). "An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR)." Genome Biol 4(9).

Rual, J.-F., K. Venkatesan, et al. (2005). "Towards a proteome-scale map of the human protein–protein interaction network." Nature.

Salis, H., V. Sotiropoulos, et al. (2006). "Multiscale Hy3S: Hybrid stochastic simulation for supercomputers." BMC Bioinformatics 7(1).

Sanford, C., et al. (2006). "Cell++--simulating biochemical pathways." Bioinformatics 22(23): 2918-2925.

Schlichting, I., X. J. Yang, et al. (1994). "Structural and kinetic analysis of a channel-impaired mutant of tryptophan synthase." The Journal of biological chemistry 269(43): 26591-26593.

Schmid, J., A. Schaller, et al. (1992). "The in-vitro synthesized tomato shikimate kinase precursor is enzymatically active and is imported and processed to the mature enzyme by chloroplasts." The Plant journal : for cell and molecular biology 2(3): 375-383.

Shapiro, B. E., A. Levchenko, et al. (2003). "Cellerator: extending a computer algebra system to include biochemical arrows for signal transduction simulations." Bioinformatics 19(5): 677-678.

Shulman, R. G. (1991). "Is channeling a disproveable hypothesis?" Journal of theoretical biology 152(1): 133-134.

Srere, P. (1990). "Citric acid cycle redux." Trends in Biochemical Sciences 15(11): 411-412.

Srere, P. A. (1987). "Complexes of Sequential Metabolic Enzymes." Annual Review of Biochemistry 56(1): 89-124.

Stark, C., B. J. Breitkreutz, et al. (2006). "BioGRID: a general repository for interaction datasets." Nucleic Acids Res 34(Database issue).

Stiles, J., T. Bartol, et al. (2001). Monte Carlo Methods for Simulating Realistic Synaptic Microphysiology Using MCell. Computational neuroscience: realistic modeling for experimentalists, CRC Press.

Stroud, R. M. (1994). "An electrostatic highway." Nature structural biology 1(3): 131-134.

Sundararaj, S., A. Guo, et al. (2004). "The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli." Nucleic acids research 32(Database issue).

Swaminathan, R., C. P. Hoang, et al. (1997). "Photobleaching recovery and anisotropy decay of green fluorescent protein GFP-S65T in solution and cells: cytoplasmic viscosity probed by green fluorescent protein translational and rotational diffusion." Biophysical journal 72(4): 1900-1907.

Takahashi, K., K. Kaizu, et al. (2004). "A multi-algorithm, multi-timescale method for cell simulation." Bioinformatics 20(4): 538-546.

Tanaka, M., Y. Okuno, et al. (2003). "Extraction of a Thermodynamic Property for Biochemical Reactions in the Metabolic Pathway." Genome Informatics 14: 370-371.

Teplyakov, A., G. Obmolova, et al. (2001). "Channeling of ammonia in glucosamine-6-phosphate synthase." Journal of Molecular Biology 313(5): 1093-1102.

Teusink, B., J. Passarge, et al. (2000). "Can yeast glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? Testing biochemistry." European Journal of Biochemistry 267(17): 5313-5329.

Theobald, U., W. Mailinger, et al. (1997). "In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: I. Experimental observations." Biotechnology and Bioengineering 55(2): 305-316.

Titheradge, M. A., R. A. Picking, et al. (1992). "Physiological concentrations of 2-oxoglutarate regulate the activity of phosphoenolpyruvate carboxykinase in liver." The Biochemical journal 285 (Pt 3): 767-771.

Tomasulo, P. (2002). "ChemIDplus-super source for chemical and drug information." Medical reference services quarterly 21(1): 53-59.

Tomita, M. (2001). "Whole-cell simulation: a grand challenge of the 21st century." Trends in Biotechnology 19(6): 205-210.

Umbarger, H. E. (1992). "The origin of a useful concept--feedback inhibition." Protein science : a publication of the Protein Society 1(10): 1392-1395.

Vass, M., N. Allen, et al. (2004). "The JigCell model builder and run manager." Bioinformatics (Oxford, England) 20(18): 3680-3681.

Villas-Bôas, S. G. and P. Bruheim (2007). "The potential of metabolomics tools in bioremediation studies." Omics : a journal of integrative biology 11(3): 305-313.

Winkel, B. S. (2004). "Metabolic channeling in plants." Annu Rev Plant Biol 55: 85-107.

Wishart, D., R. Yang, et al. (2004). "Dynamic cellular automata: an alternative approach to cellular simulation." In Silico Biology 4: 14.

Wojtas, K., N. Slepecky, et al. (1997). "Flight muscle function in Drosophila requires colocalization of glycolytic enzymes." Mol Biol Cell 8(9): 1665-1675.

Wu, X., H. Gutfreund, et al. (1991). "Substrate Channelling in Glycolysis: A Phantom Phenomenon." Proceedings of the National Academy of Sciences 88(2): 497-501.

Yanofsky, C. and M. Rachmeler (1958). "The exclusion of free indole as an intermediate in the biosynthesis of tryptophan in Neurospora crassa." Biochimica et biophysica acta 28(3): 640-641.

You, L., A. Hoonlor, et al. (2003). "Modeling biological systems using Dynetica--a simulator of dynamic networks." Bioinformatics 19(3): 435-436.

Zhou, L., J. E. Salem, et al. (2005). "Mechanistic model of cardiac energy metabolism predicts localization of glycolysis to cytosolic subdomain during ischemia." Am J Physiol Heart Circ Physiol 288(5).

Appendix A – Simulation Traces

Figure A1: Cell++ — No Diffusion

Figure A2: Cell++ — No Channelling

Figure A3: Cell++ — 3PGA Channelling

Figure A4: Cell++ — 2PGA Channelling

Figure A5: Cell++ — PEP Channelling

Figure A6: E-CELL — No Diffusion

Figure A7: E-CELL — No Channelling

Figure A8: E-CELL — 3PGA Channelling

Figure A9: E-CELL — 2PGA Channelling

Figure A10: E-CELL — PEP Channelling

Appendix B – Channelling Predictions

Table B1: Top Free Energy Predictions

The 25 highest-scoring channelling predictions made through free energy comparisons. The channelled metabolite is listed, along with the number of potential enzyme pairs that were identified as candidates that may channel that compound. For metabolites that are intermediates in more than one pair of potentially channelled reactions, the highest-scoring enzyme pair is shown. The channelling prediction score is also shown, along with the data sources from which each reaction's Gibbs free energy value was obtained.

| Potential Intermediate Metabolite | Pairs | Highest Scoring Enzyme Pair | Score | Sources |
| Chorismate | 15 | Chorismate pyruvate lyase anthranilate synthase | 3.60E+10 | Henry, Henry |
| Ubiquinone-8 | 49 | pyruvate Glucose dehydrogenase (ubiquinone-8 as acceptor) | 3.12E+10 | Henry, Henry |
| 2-C-Methyl-D-erythritol 2,4-cyclodiphosphate | 1 | 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase 2C-methyl-D-erythritol 2,4 cyclodiphosphate | 1.53E+10 | Henry, Henry |
|  | 1 | taurine transport via ABC system Taurine | 1.48E+10 | Henry, Henry |
| L-Glutamine | 52 | amidophosphoribosyltransferase anthranilate synthase | 1.35E+10 | Henry, Henry |
| D-Glucose | 170 | D-glucose transport via PEP:Pyr PTS Glucose dehydrogenase (ubiquinone-8 as acceptor) | 9.58E+09 | Henry, Henry |
| all-trans-Octaprenyl diphosphate | 1 | Hydroxybenzoate octaprenyltransferase 1,4-dihydroxy-2-naphthoate octaprenyltransferase | 6.61E+09 | Henry, Henry |
| (S)-Dihydroorotate | 7 | dihydroorotic acid (menaquinone-8) dihydoorotic acid dehydrogenase (quinone8) | 5.08E+09 | Henry, Henry |
| Phosphoenolpyruvate | 250 | 3-deoxy-D-manno-octulosonic-acid 8-phosphate 3-deoxy-7-phosphoheptulonate synthase | 4.42E+09 | Henry, Henry |
| (S)-Lactate | 5 | L-Lactate dehydrogenase (menaquinone) L-Lactate dehydrogenase (ubiquinone) | 4.10E+09 | Henry, Henry |
| sn-Glycerol 3-phosphate | 20 | glycerol-3-phosphate dehydrogenase (menaquinone-8) glycerol-3-phosphate dehydrogenase (ubiquinone-8) | 4.10E+09 | Henry, Henry |
| (S)-Malate | 3 | (menaquinone 8 as acceptor) Malate dehydrogenase (ubiquinone 8 as acceptor) | 4.10E+09 | Henry, Henry |
| 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate | 1 | 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (ipdp) 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase (dmpp) | 4.03E+09 | Henry, Henry |
| 5-Phospho-alpha-D-ribose 1-diphosphate | 41 | amidophosphoribosyltransferase nicotinate phosphoribosyltransferase | 3.99E+09 | Henry, Henry |
| L-Aspartate | 88 | synthase (glutamine-hydrolysing) L-aspartate oxidase | 3.94E+09 | Henry, Henry |
| Menaquinone 8 | 25 | L-aspartate oxidase NADH dehydrogenase (menaquinone-8 & 0 protons) | 3.56E+09 | Henry, Henry |
| Phosphoribosyl-ATP | 1 | ATP phosphoribosyltransferase phosphoribosyl-ATP diphosphatase | 3.43E+09 | Henry, Henry |
| Glyoxylate | 144 | (NADP+) | 2.42E+09 | Henry, NIST |
| Glycerone phosphate | 152 | fructose-biphosphate aldolase synthase | 1.80E+09 | NIST, Henry |
| Maltose | 9 | maltose transport via ABC system maltose transport via PEP:Pyr PTS | 1.78E+09 | Henry, Henry |
| L-Cysteine | 6 | L-cysteine transport via ABC system O-succinylhomoserine lyase (L-cysteine) | 1.63E+09 | Henry, Henry |
|  | 42 | adenine phosphoribosyltransferase | 1.51E+09 | Henry, Henry |
|  | 12 | phosphoribosyltransferase | 1.51E+09 | Henry, Henry |
| 5'-phosphate | 4 | orotate phosphoribosyltransferase orotidine-5'-phosphate decarboxylase | 1.44E+09 | Henry, Henry |
| Adenosine | 25 |  | 1.43E+09 | NIST, Henry |

Table B2: Free Energy Coupling Predictions with Multifunctional Enzyme Evidence

| Enzyme A | EC# A | Enzyme B | EC# B | Intermediate | KEGG ID | Evidence |
|---|---|---|---|---|---|---|
|  | 1.3.99.1 | L-aspartate oxidase | 1.4.3.16 | FAD | C00016 | Rosetta |
| acetylornithine | 2.6.1.11 | ornithine-oxo-acid transaminase | 2.6.1.13 | Pyridoxal phosphate | C00018 | Rosetta |
| ornithine-oxo-acid transaminase | 2.6.1.13 | 4-aminobutyrate transaminase | 2.6.1.19 | Pyridoxal phosphate | C00018 | Rosetta |
| acetylornithine transaminase | 2.6.1.11 | histidinol-phosphate transaminase | 2.6.1.9 | Pyridoxal phosphate | C00018 | Rosetta |
| malate synthase | 2.3.3.9 |  | 4.1.3.1 | Glyoxylate | C00048 | Rosetta/SwissProt |
| aspartate carbamoyltransferase | 2.1.3.2 | synthase | 6.3.4.4 | L-Aspartate | C00049 | Rosetta |
| amidophosphoribosyltransferase | 2.4.2.14 | phosphoribosylformylglycinamidine synthase | 6.3.5.3 | L-Glutamine | C00064 | Rosetta |
| 3-deoxy-7-phosphoheptulonate synthase | 2.5.1.54 | phosphopyruvate hydratase | 4.2.1.11 | Phosphoenolpyruvate | C00074 | Rosetta |
| ornithine carbamoyltransferase | 2.1.3.3 | ornithine-oxo-acid transaminase | 2.6.1.13 | L-Ornithine | C00077 | Rosetta |
|  | 1.1.1.1 | acetaldehyde dehydrogenase (acetylating) | 1.2.1.10 | Acetaldehyde | C00084 | SwissProt |
|  | 2.2.1.2 | glucose-6-phosphate | 5.3.1.9 | D-Fructose 6-phosphate | C00085 | Rosetta |
| mannose-1-phosphate (GDP) | 2.7.7.22 | GDP-mannose 4,6-dehydratase | 4.2.1.47 | GDP-mannose | C00096 | Rosetta |
| glucose-1-phosphate adenylyltransferase | 2.7.7.27 |  | 5.4.2.2 | D-Glucose 1-phosphate | C00103 | Rosetta |
|  | 2.2.1.1 | 1-deoxy-D-xylulose-5-phosphate synthase | 2.2.1.7 | (2R)-2-Hydroxy-3-(phosphonooxy)-propanal | C00118 | Rosetta |
| fumarate hydratase | 4.2.1.2 | aspartate ammonia-lyase | 4.3.1.1 | Fumarate | C00122 | Rosetta |
| aspartate-ammonia | 6.3.1.1 | asparagine synthase (glutamine-hydrolysing) | 6.3.5.4 | L-Asparagine | C00152 | Rosetta |
| aspartate carbamoyltransferase | 2.1.3.2 | ornithine carbamoyltransferase | 2.1.3.3 | Carbamoyl phosphate | C00169 | Rosetta |
| ornithine carbamoyltransferase | 2.1.3.3 |  | 2.7.2.2 | Carbamoyl phosphate | C00169 | Rosetta |
| transketolase | 2.2.1.1 | ribulose-phosphate 3-epimerase | 5.1.3.1 | D-Xylulose 5-phosphate | C00231 | Rosetta |
|  | 2.5.1.10 | isopentenyl-diphosphate Delta-isomerase | 5.3.3.2 | Dimethylallyl diphosphate | C00235 | Rosetta |
| phosphoglycerate kinase | 2.7.2.3 | phosphoglycerate mutase | 5.4.2.1 | 3-Phospho-D-glyceroyl phosphate | C00236 | Rosetta |
| anthranilate synthase | 4.1.3.27 |  | 5.4.4.2 | Chorismate | C00251 | Rosetta |
| (NADP+) | 1.3.1.13 |  | 4.2.1.51 | Prephenate | C00254 | Rosetta |
| transketolase | 2.2.1.1 | 3-deoxy-7-phosphoheptulonate synthase | 2.5.1.54 | D-Erythrose 4-phosphate | C00279 | Rosetta |
| transketolase | 2.2.1.1 | fructose-bisphosphate aldolase | 4.1.2.13 | D-Erythrose 4-phosphate | C00279 | Rosetta |
| dihydroorotate dehydrogenase | 1.3.99.11 | orotate phosphoribosyltransferase | 2.4.2.10 | Orotate | C00295 | Rosetta |
|  | 2.7.1.17 |  | 5.3.1.5 | D-Xylulose | C00310 | Rosetta |
| 4-aminobutyrate transaminase | 2.6.1.19 |  | 4.1.1.15 | 4-Aminobutanoate | C00334 | Rosetta |
|  | 1.1.1.3 | aspartate-semialdehyde dehydrogenase | 1.2.1.11 | L-Aspartate 4-semialdehyde | C00441 | Rosetta |
| methylenetetrahydrofolate dehydrogenase (NAD+) | 1.5.1.15 | methenyltetrahydrofolate cyclohydrolase | 3.5.4.9 | 5,10-Methenyltetrahydrofolate | C00445 | SwissProt |
| methylenetetrahydrofolate dehydrogenase (NADP+) | 1.5.1.5 | methenyltetrahydrofolate cyclohydrolase | 3.5.4.9 | 5,10-Methenyltetrahydrofolate | C00445 | SwissProt |
| purine- | 2.4.2.1 | pyrimidine-nucleoside phosphorylase | 2.4.2.2 | alpha-D-Ribose 1-phosphate | C00620 | Rosetta |
| purine-nucleoside phosphorylase | 2.4.2.1 |  | 2.4.2.3 | alpha-D-Ribose 1-phosphate | C00620 | Rosetta |
| phosphopyruvate hydratase | 4.2.1.11 | phosphoglycerate mutase | 5.4.2.1 | 2-Phospho-D-glycerate | C00631 | Rosetta |
| mannose-1-phosphate guanylyltransferase (GDP) | 2.7.7.22 |  | 5.4.2.8 | D-Mannose 1-phosphate | C00636 | Rosetta |
| glucose-6-phosphate isomerase | 5.3.1.9 | phosphoglucomutase | 5.4.2.2 | alpha-D-Glucose 6-phosphate | C00668 | Rosetta |
| alpha,alpha-trehalose-phosphate synthase (UDP-forming) | 2.4.1.15 | trehalose- | 3.1.3.12 | alpha,alpha'-Trehalose 6-phosphate | C00689 | Rosetta |
| urocanate hydratase | 4.2.1.49 | histidine ammonia-lyase | 4.3.1.3 | Urocanate | C00785 | Rosetta |
| glucose-1-phosphate thymidylyltransferase | 2.7.7.24 | dTDP-glucose 4,6-dehydratase | 4.2.1.46 | dTDP-glucose | C00842 | Rosetta |
|  | 2.7.1.5 | L-rhamnose isomerase | 5.3.1.14 | L-Rhamnulose | C00861 | Rosetta/SwissProt |
|  | 3.3.2.1 | isochorismate synthase | 5.4.4.2 | Isochorismate | C00885 | Rosetta |
| fructuronate reductase | 1.1.1.57 |  | 5.3.1.12 | D-Fructuronate | C00905 | Rosetta |
| orotate phosphoribosyltransferase | 2.4.2.10 | orotidine-5'-phosphate decarboxylase | 4.1.1.23 | Orotidine 5'-phosphate | C01103 | Rosetta/SwissProt |
| 2-hydroxy-3-oxopropionate reductase | 1.1.1.60 | hydroxypyruvate isomerase | 5.3.1.22 | 2-Hydroxy-3-oxopropanoate | C01146 | Rosetta |
| 3-phosphoshikimate 1-carboxyvinyltransferase | 2.5.1.19 |  | 4.2.3.5 | 5-O-(1-Carboxyvinyl)-3-phosphoshikimate | C01269 | Rosetta |
| ATP phosphoribosyltransferase | 2.4.2.17 | phosphoribosyl-ATP diphosphatase | 3.6.1.31 | Phosphoribosyl-ATP | C02739 | Rosetta |
| phosphate acetyltransferase | 2.3.1.8 |  | 2.7.2.1 | Propanoyl phosphate | C02876 | Rosetta |
| phosphate acetyltransferase | 2.3.1.8 |  | 2.7.2.15 | Propanoyl phosphate | C02876 | Rosetta |
| phosphoglycerate dehydrogenase | 1.1.1.95 | phosphoserine transaminase | 2.6.1.52 | 3-Phosphonooxypyruvate | C03232 | Rosetta |
| glutamate-5-semialdehyde dehydrogenase | 1.2.1.41 | glutamate 5-kinase | 2.7.2.11 | L-Glutamyl 5-phosphate | C03287 | Rosetta/SwissProt |
| hydroxyacylglutathione | 3.1.2.6 |  | 4.4.1.5 | (R)-S-Lactoylglutathione | C03451 | Rosetta |
| UDP-N-acetylmuramate dehydrogenase | 1.1.1.158 | UDP-N-acetylglucosamine 1-carboxyvinyltransferase | 2.5.1.7 | UDP-N-acetyl-3-(1-carboxyvinyl)-D-glucosamine | C04631 | Rosetta |
| transketolase | 2.2.1.1 | transaldolase | 2.2.1.2 | D-Sedoheptulose 7-phosphate | C05382 | Rosetta |
| purine-nucleoside phosphorylase | 2.4.2.1 | phosphorylase | 2.4.2.4 | Deoxyinosine | C05512 | Rosetta |


Table B3: Toxic Metabolic Predictions with Multifunctional Enzyme Evidence

| Enzyme A | EC# A | Enzyme B | EC# B | Intermediate | KEGG ID | Evidence |
|---|---|---|---|---|---|---|
|  | 4.1.99.1 | tyrosine phenol-lyase | 4.1.99.2 | Pyridoxal phosphate | C00018 | Rosetta |
| methionine gamma-lyase | 4.4.1.11 | cystathionine beta-lyase | 4.4.1.8 | Pyridoxal phosphate | C00018 | Rosetta |
|  | 1.4.1.10 |  | 2.1.2.10 | Glycine | C00037 | Rosetta |
|  | 1.4.3.19 | aminomethyltransferase | 2.1.2.10 | Glycine | C00037 | Rosetta |
| steryl- | 3.1.6.2 | choline-sulfatase | 3.1.6.6 | Sulfate | C00059 | Rosetta |
|  | 3.5.1.14 |  | 3.5.1.4 | Carboxylate | C00060 | Rosetta |
| cystathionine gamma-synthase | 2.5.1.48 | methionine gamma-lyase | 4.4.1.11 | 2-Oxobutanoate | C00109 | Rosetta |
| cystathionine gamma-lyase | 4.4.1.1 | methionine gamma-lyase | 4.4.1.11 | 2-Oxobutanoate | C00109 | Rosetta |
| citrate CoA-transferase | 2.8.3.10 | citrate (pro-3S)-lyase | 4.1.3.6 | Citrate | C00158 | SwissProt |
| dehydrogenase (NAD+) | 1.2.1.3 | [NAD(P)+] | 1.2.1.5 | Acid | C00174 | Rosetta |
|  | 3.5.1.32 | amidase | 3.5.1.4 | Benzoate | C00180 | Rosetta |
| purine-nucleoside phosphorylase | 2.4.2.1 | adenosylhomocysteinase | 3.3.1.1 | Adenosine | C00212 | Rosetta |
| dehydrogenase | 1.5.99.1 | dehydrogenase | 1.5.99.2 | Sarcosine | C00213 | Rosetta |
| sulfate adenylyltransferase (ADP) | 2.7.7.5 | ATP adenylyltransferase | 2.7.7.53 | Adenylyl sulfate | C00224 | SwissProt |
| carbon-monoxide dehydrogenase (ferredoxin) | 1.2.7.4 | carbon-monoxide dehydrogenase (acceptor) | 1.2.99.2 | CO | C00237 | SwissProt |
|  | 3.5.1.1 |  | 3.5.1.19 |  | C00241 | Rosetta |
|  | 2.5.1.47 | cystathionine beta-lyase | 4.4.1.8 | Hydrogen sulfide | C00283 | Rosetta |
|  | 4.1.1.22 | aromatic-L-amino-acid decarboxylase | 4.1.1.28 | 1H-Imidazole-4-ethanamine | C00388 | Rosetta |
| O-acetylhomoserine aminocarboxypropyltransferase | 2.5.1.49 | methionine gamma-lyase | 4.4.1.11 | Methanethiol | C00409 | Rosetta |
| N-carbamoylputrescine amidase | 3.5.1.53 |  | 3.5.3.12 | N-Carbamoylputrescine | C00436 | Rosetta |
| methenyltetrahydrofolate cyclohydrolase | 3.5.4.9 | formate-tetrahydrofolate ligase | 6.3.4.3 | 5,10-Methenyltetrahydrofolate | C00445 | Rosetta/SwissProt |
| alcohol dehydrogenase | 1.1.1.1 | alcohol dehydrogenase (NADP+) | 1.1.1.2 | D-Glyceraldehyde | C00577 | Rosetta |
| carbonyl reductase (NADPH) | 1.1.1.184 | prostaglandin-E2 9-reductase | 1.1.1.189 | Prostaglandin F2alpha | C00639 | SwissProt |
| 5'- | 3.1.3.5 | 3'-nucleotidase | 3.1.3.6 |  | C00911 | SwissProt |
| cysteine synthase | 2.5.1.47 | O-acetylhomoserine aminocarboxypropyltransferase | 2.5.1.49 | O-Acetyl-L-serine | C00979 | Rosetta/SwissProt |
| cysteine synthase | 2.5.1.47 | beta-pyrazolylalanine synthase | 2.5.1.51 | O-Acetyl-L-serine | C00979 | SwissProt |
| cysteine synthase | 2.5.1.47 | L-mimosine synthase | 2.5.1.52 | O-Acetyl-L-serine | C00979 | SwissProt |
| beta-pyrazolylalanine synthase | 2.5.1.51 | L-mimosine synthase | 2.5.1.52 | O-Acetyl-L-serine | C00979 | SwissProt |
| cysteine synthase | 2.5.1.47 | O-phosphoserine sulfhydrylase | 2.5.1.65 | O-Acetyl-L-serine | C00979 | SwissProt |
| thiosulfate | 2.8.1.1 | 3-mercaptopyruvate sulfurtransferase | 2.8.1.2 | Thiocyanate | C01755 | Rosetta |
| cystathionine gamma-synthase | 2.5.1.48 | O-acetylhomoserine aminocarboxypropyltransferase | 2.5.1.49 | L-Cystathionine | C02291 | Rosetta |
| O-acetylhomoserine aminocarboxypropyltransferase | 2.5.1.49 | cystathionine beta-lyase | 4.4.1.8 | L-Cystathionine | C02291 | Rosetta |
| A2 | 3.1.1.4 |  | 3.1.1.5 | 1-Acyl-sn-glycero-3-phosphoethanolamine | C04438 | SwissProt |
| phenylalanine dehydrogenase | 1.4.1.20 | dehydrogenase | 1.4.1.9 | alpha-Amino acid | C05167 | Rosetta |
| L-serine ammonia-lyase | 4.3.1.17 | threonine ammonia-lyase | 4.3.1.19 | alpha-Amino acid | C05167 | Rosetta/SwissProt |
| aromatic-L-amino-acid decarboxylase | 4.1.1.28 | phenylalanine decarboxylase | 4.1.1.53 | Phenethylamine | C05332 | Rosetta |
| cysteine synthase | 2.5.1.47 | cystathionine gamma-lyase | 4.4.1.1 | Selenocysteine | C05688 | Rosetta |
| cystathionine gamma-synthase | 2.5.1.48 | cystathionine gamma-lyase | 4.4.1.1 | Selenocystathionine | C05699 | Rosetta |
| cystathionine gamma-synthase | 2.5.1.48 | cystathionine beta-lyase | 4.4.1.8 | Selenocystathionine | C05699 | Rosetta |
| cystathionine gamma-lyase | 4.4.1.1 | cystathionine beta-lyase | 4.4.1.8 | Selenocystathionine | C05699 | Rosetta |
| cysteine synthase | 2.5.1.47 | cystathionine gamma-synthase | 2.5.1.48 | S-Sulfo-L-cysteine | C05824 | Rosetta |
| acetate-CoA ligase | 6.2.1.1 | propionate-CoA ligase | 6.2.1.17 | Propinol adenylate | C05983 | Rosetta |
| amidase | 3.5.1.4 | hydratase | 4.2.1.84 | Benzamide | C09815 | Rosetta |
| 3beta-hydroxy-delta5- dehydrogenase | 1.1.1.145 | steroid Delta-isomerase | 5.3.3.1 | Campest-4-en-3beta-ol | C15784 | SwissProt |


Table B4: Inhibitory Metabolite Predictions with Multifunctional Enzyme Evidence

| Enzyme A | EC# A | Enzyme B | EC# B | Intermediate | KEGG ID | Evidence |
|---|---|---|---|---|---|---|
| acetylornithine transaminase | 2.6.1.11 | ornithine-oxo-acid transaminase | 2.6.1.13 | Pyridoxal phosphate | C00018 | Rosetta |
| acetylornithine transaminase | 2.6.1.11 | succinyldiaminopimelate transaminase | 2.6.1.17 | Pyridoxal phosphate | C00018 | SwissProt |
| acetylornithine transaminase | 2.6.1.11 | 4-aminobutyrate transaminase | 2.6.1.19 | Pyridoxal phosphate | C00018 | Rosetta |
| ornithine-oxo-acid transaminase | 2.6.1.13 | 4-aminobutyrate transaminase | 2.6.1.19 | Pyridoxal phosphate | C00018 | Rosetta |
| D-amino-acid transaminase | 2.6.1.21 | branched-chain-amino-acid transaminase | 2.6.1.42 | Pyridoxal phosphate | C00018 | Rosetta |
| acetylornithine transaminase | 2.6.1.11 | histidinol-phosphate transaminase | 2.6.1.9 | Pyridoxal phosphate | C00018 | Rosetta |
| butyrate-CoA ligase | 6.2.1.2 | long-chain-fatty-acid-CoA ligase | 6.2.1.3 | Acyl-CoA | C00040 | Rosetta |
| phosphoadenylyl-sulfate reductase (thioredoxin) | 1.8.4.8 | adenylyl-sulfate kinase | 2.7.1.25 | 3'-Phosphoadenylyl sulfate | C00053 | Rosetta |
| glutamine-tRNA ligase | 6.1.1.18 | asparagine synthase (glutamine-hydrolysing) | 6.3.5.4 | L-Glutamine | C00064 | Rosetta |
| pyruvate, phosphate | 2.7.9.1 | pyruvate, water dikinase | 2.7.9.2 | Phosphoenolpyruvate | C00074 | Rosetta |
|  | 2.2.1.6 | threonine ammonia-lyase | 4.3.1.19 | 2-Oxobutanoate | C00109 | Rosetta |
| cystathionine gamma-synthase | 2.5.1.48 | methionine gamma-lyase | 4.4.1.11 | 2-Oxobutanoate | C00109 | Rosetta |
| cystathionine gamma-lyase | 4.4.1.1 | methionine gamma-lyase | 4.4.1.11 | 2-Oxobutanoate | C00109 | Rosetta |
| ATP phosphoribosyltransferase | 2.4.2.17 | anthranilate phosphoribosyltransferase | 2.4.2.18 | 5-Phospho-alpha-D-ribose 1-diphosphate | C00119 | Rosetta |
| hypoxanthine phosphoribosyltransferase | 2.4.2.8 | ribose-phosphate diphosphokinase | 2.7.6.1 | 5-Phospho-alpha-D-ribose 1-diphosphate | C00119 | Rosetta |
| ATP diphosphatase | 3.6.1.8 | adenylosuccinate synthase | 6.3.4.4 | IMP | C00130 | Rosetta |
| 2-isopropylmalate synthase | 2.3.3.13 | dihydroxy-acid dehydratase | 4.2.1.9 | 3-Methyl-2-oxobutanoic acid | C00141 | Rosetta |
| methylenetetrahydrofolate dehydrogenase (NADP+) | 1.5.1.5 | thymidylate synthase | 2.1.1.45 | 5,10-Methylenetetrahydrofolate | C00143 | Rosetta |
| hypoxanthine phosphoribosyltransferase | 2.4.2.8 | ATP diphosphatase | 3.6.1.8 | GMP | C00144 | Rosetta |
| aspartate-ammonia ligase | 6.3.1.1 | asparagine synthase (glutamine-hydrolysing) | 6.3.5.4 | L-Asparagine | C00152 | Rosetta |
| glyceraldehyde-3-phosphate dehydrogenase (NADP+) | 1.2.1.9 | phosphoglycerate kinase | 2.7.2.3 | 3-Phospho-D-glycerate | C00197 | Rosetta |
| sulfate adenylyltransferase (ADP) | 2.7.7.5 | ATP adenylyltransferase | 2.7.7.53 | Adenylyl sulfate | C00224 | SwissProt |
| phosphoribosylglycinamide formyltransferase | 2.1.2.2 | phosphoribosylaminoimidazolecarboxamide formyltransferase | 2.1.2.3 | 10-Formyltetrahydrofolate | C00234 | Rosetta |
| phosphoribosylglycinamide formyltransferase | 2.1.2.2 | formyltetrahydrofolate deformylase | 3.5.1.10 | 10-Formyltetrahydrofolate | C00234 | Rosetta |
| phosphoribosylaminoimidazolecarboxamide formyltransferase | 2.1.2.3 | formyltetrahydrofolate deformylase | 3.5.1.10 | 10-Formyltetrahydrofolate | C00234 | Rosetta |
| glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) | 1.2.1.12 | phosphoglycerate kinase | 2.7.2.3 | 3-Phospho-D-glyceroyl phosphate | C00236 | Rosetta |
| glyceraldehyde-3-phosphate dehydrogenase (NAD(P)+) | 1.2.1.59 | phosphoglycerate kinase | 2.7.2.3 | 3-Phospho-D-glyceroyl phosphate | C00236 | Rosetta |
| phosphoglycerate kinase | 2.7.2.3 | phosphoglycerate mutase | 5.4.2.1 | 3-Phospho-D-glyceroyl phosphate | C00236 | Rosetta |
| phosphoglycerate kinase | 2.7.2.3 | bisphosphoglycerate mutase | 5.4.2.4 | 3-Phospho-D-glyceroyl phosphate | C00236 | Rosetta |
| aldehyde dehydrogenase (NAD+) | 1.2.1.3 | 4-aminobutyrate transaminase | 2.6.1.19 | 4-Aminobutanoate | C00334 | Rosetta |
| 4-aminobutyrate transaminase | 2.6.1.19 | glutamate decarboxylase | 4.1.1.15 | 4-Aminobutanoate | C00334 | Rosetta |
|  | 3.5.3.7 |  | 4.1.1.19 | 4-Aminobutanoate | C00334 | Rosetta |
| phosphogluconate 2-dehydrogenase | 1.1.1.43 | 6-phosphogluconolactonase | 3.1.1.31 | 6-Phospho-D-gluconate | C00345 | Rosetta |
| phosphogluconate dehydrogenase (decarboxylating) | 1.1.1.44 | 6-phosphogluconolactonase | 3.1.1.31 | 6-Phospho-D-gluconate | C00345 | Rosetta |
| 6-phosphogluconolactonase | 3.1.1.31 | phosphogluconate dehydratase | 4.2.1.12 | 6-Phospho-D-gluconate | C00345 | Rosetta |
| hydroxymethylglutaryl-CoA reductase (NADPH) | 1.1.1.34 | hydroxymethylglutaryl-CoA synthase | 2.3.3.10 | (S)-3-Hydroxy-3-methylglutaryl-CoA | C00356 | Rosetta |
| hydroxymethylglutaryl-CoA reductase | 1.1.1.88 | hydroxymethylglutaryl-CoA synthase | 2.3.3.10 | (S)-3-Hydroxy-3-methylglutaryl-CoA | C00356 | Rosetta |
| dihydrofolate reductase | 1.5.1.3 | thymidylate synthase | 2.1.1.45 | Dihydrofolate | C00415 | Rosetta/SwissProt |
| porphobilinogen synthase | 4.2.1.24 | glutamate-1-semialdehyde 2,1-aminomutase | 5.4.3.8 | 5-Aminolevulinate | C00430 | Rosetta |
| methenyltetrahydrofolate cyclohydrolase | 3.5.4.9 | formate-tetrahydrofolate ligase | 6.3.4.3 | 5,10-Methenyltetrahydrofolate | C00445 | Rosetta/SwissProt |
| purine-nucleoside phosphorylase | 2.4.2.1 |  | 3.5.4.5 | Deoxyuridine | C00526 | Rosetta |
| testosterone 17beta-dehydrogenase | 1.1.1.63 | testosterone 17beta-dehydrogenase (NADP+) | 1.1.1.64 | Testosterone | C00535 | SwissProt |
|  | 2.5.1.15 | aminodeoxychorismate lyase | 4.1.3.38 | 4-Aminobenzoate | C00568 | Rosetta |
| diacylglycerol O- | 2.3.1.20 | 2-acylglycerol O-acyltransferase | 2.3.1.22 | 1,2-Diacyl-sn-glycerol | C00641 | SwissProt |
| N-acylmannosamine kinase | 2.7.1.60 | N-acetylneuraminate lyase | 4.1.3.3 | N-Acetyl-D-mannosamine | C00645 | Rosetta |
| N-acylmannosamine kinase | 2.7.1.60 | UDP-N-acetylglucosamine 2-epimerase | 5.1.3.14 | N-Acetyl-D-mannosamine | C00645 | SwissProt |
| branched-chain-amino-acid transaminase | 2.6.1.42 | dihydroxy-acid dehydratase | 4.2.1.9 | (S)-3-Methyl-2-oxopentanoic acid | C00671 | Rosetta |
| -phosphate aldolase | 4.1.2.4 |  | 5.4.2.7 | 2-Deoxy-D-ribose 5-phosphate | C00673 | Rosetta |
| acyl-CoA dehydrogenase | 1.3.99.3 | glutaryl-CoA dehydrogenase | 1.3.99.7 | Crotonoyl-CoA | C00877 | Rosetta |
| glutaryl-CoA dehydrogenase | 1.3.99.7 | enoyl-CoA hydratase | 4.2.1.17 | Crotonoyl-CoA | C00877 | Rosetta |


Table B4 (continued)

| Enzyme A | EC# A | Enzyme B | EC# B | Intermediate | KEGG ID | Evidence |
|---|---|---|---|---|---|---|
| butyryl-CoA dehydrogenase | 1.3.99.2 | 3-hydroxybutyryl-CoA dehydratase | 4.2.1.55 | Crotonoyl-CoA | C00877 | Rosetta |
| acyl-CoA dehydrogenase | 1.3.99.3 | 3-hydroxybutyryl-CoA dehydratase | 4.2.1.55 | Crotonoyl-CoA | C00877 | Rosetta |
| glutaryl-CoA dehydrogenase | 1.3.99.7 | 3-hydroxybutyryl-CoA dehydratase | 4.2.1.55 | Crotonoyl-CoA | C00877 | Rosetta |
| enoyl-CoA hydratase | 4.2.1.17 | 3-hydroxybutyryl-CoA dehydratase | 4.2.1.55 | Crotonoyl-CoA | C00877 | Rosetta |
| dehydrogenase (NAD+) | 1.3.1.1 |  | 3.5.2.2 | 5,6- | C00906 | Rosetta |
| 3-dehydroquinate dehydratase | 4.2.1.10 | 3-dehydroquinate synthase | 4.2.3.4 | 3-Dehydroquinate | C00944 | Rosetta/SwissProt |
| L-lactate dehydrogenase | 1.1.1.27 |  | 2.6.1.1 | Mercaptopyruvate | C00957 | Rosetta |
| 8-amino-7-oxononanoate synthase | 2.3.1.47 | adenosylmethionine-8-amino-7-oxononanoate transaminase | 2.6.1.62 | 8-Amino-7-oxononanoate | C01092 | Rosetta |
| bisphosphoglycerate phosphatase | 3.1.3.13 | phosphoglycerate mutase | 5.4.2.1 | 2,3-Bisphospho-D-glycerate | C01159 | SwissProt |
| bisphosphoglycerate phosphatase | 3.1.3.13 | bisphosphoglycerate mutase | 5.4.2.4 | 2,3-Bisphospho-D-glycerate | C01159 | SwissProt |
| phosphoglycerate mutase | 5.4.2.1 | bisphosphoglycerate mutase | 5.4.2.4 | 2,3-Bisphospho-D-glycerate | C01159 | Rosetta/SwissProt |
| prephenate dehydrogenase | 1.3.1.12 |  | 1.3.1.43 | 3-(4-Hydroxyphenyl)pyruvate | C01179 | SwissProt |
| prephenate dehydrogenase (NADP+) | 1.3.1.13 | aspartate transaminase | 2.6.1.1 | 3-(4-Hydroxyphenyl)pyruvate | C01179 | Rosetta |
| prephenate dehydrogenase (NADP+) | 1.3.1.13 | histidinol-phosphate transaminase | 2.6.1.9 | 3-(4-Hydroxyphenyl)pyruvate | C01179 | Rosetta |
| dihydropteroate synthase | 2.5.1.15 | dihydroneopterin aldolase | 4.1.2.25 | 2-Amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine | C01300 | Rosetta/SwissProt |
| 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase | 2.7.6.3 | dihydroneopterin aldolase | 4.1.2.25 | 2-Amino-4-hydroxy-6-hydroxymethyl-7,8-dihydropteridine | C01300 | Rosetta/SwissProt |
| isovaleryl-CoA dehydrogenase | 1.3.99.10 | acyl-CoA dehydrogenase | 1.3.99.3 | 3-Methylcrotonyl-CoA | C03069 | Rosetta |
| isovaleryl-CoA dehydrogenase | 1.3.99.10 | enoyl-CoA hydratase | 4.2.1.17 | 3-Methylcrotonyl-CoA | C03069 | Rosetta |
| isovaleryl-CoA dehydrogenase | 1.3.99.10 | methylcrotonoyl-CoA carboxylase | 6.4.1.4 | 3-Methylcrotonyl-CoA | C03069 | Rosetta |
| enoyl-CoA hydratase | 4.2.1.17 | methylcrotonoyl-CoA carboxylase | 6.4.1.4 | 3-Methylcrotonyl-CoA | C03069 | Rosetta |
| ethanolaminephosphotransferase | 2.7.8.1 | diacylglycerol cholinephosphotransferase | 2.7.8.2 | 1-Alkyl-2-acylglycerol | C03201 | SwissProt |
| phosphoglycerate dehydrogenase | 1.1.1.95 | phosphoserine transaminase | 2.6.1.52 | 3-Phosphonooxypyruvate | C03232 | Rosetta |
| phosphoribosylaminoimidazole carboxylase | 4.1.1.21 | phosphoribosylformylglycinamidine cyclo-ligase | 6.3.3.1 | Aminoimidazole ribotide | C03373 | Rosetta |
| phosphoribosylformylglycinamidine cyclo-ligase | 6.3.3.1 | 5-(carboxyamino)imidazole synthase | 6.3.4.18 | Aminoimidazole ribotide | C03373 | Rosetta |
| muconate cycloisomerase | 5.5.1.1 | chloromuconate cycloisomerase | 5.5.1.7 | 3-Chloro-cis,cis-muconate | C03585 | Rosetta |
|  | 3.1.1.4 | lysophospholipase | 3.1.1.5 | 1-Acyl-sn-glycero-3-phosphoethanolamine | C04438 | SwissProt |
| dihydropteroate synthase | 2.5.1.15 | 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase | 2.7.6.3 | 2-Amino-7,8-dihydro-4-hydroxy-6-(diphosphooxymethyl)pteridine | C04807 | Rosetta/SwissProt |
| long-chain-3-hydroxyacyl-CoA dehydrogenase | 1.1.1.211 | acetyl-CoA C-acetyltransferase | 2.3.1.9 | 3-Oxohexanoyl-CoA | C05269 | Rosetta |
| acetyl-CoA C-acyltransferase | 2.3.1.16 | acetyl-CoA C-acetyltransferase | 2.3.1.9 | 3-Oxohexanoyl-CoA | C05269 | Rosetta |
| butyryl-CoA dehydrogenase | 1.3.99.2 | enoyl-CoA hydratase | 4.2.1.17 | trans-Hex-2-enoyl-CoA | C05271 | Rosetta |
| acyl-CoA dehydrogenase | 1.3.99.3 | enoyl-CoA hydratase | 4.2.1.17 | trans-Oct-2-enoyl-CoA | C05276 | Rosetta |
| aromatic-L-amino-acid decarboxylase | 4.1.1.28 | phenylalanine decarboxylase | 4.1.1.53 | Phenethylamine | C05332 | Rosetta |
| purine-nucleoside phosphorylase | 2.4.2.1 |  | 2.4.2.4 | Deoxyinosine | C05512 | Rosetta |
| thymidine phosphorylase | 2.4.2.4 | adenosine deaminase | 3.5.4.4 | Deoxyinosine | C05512 | Rosetta |
| adenylyl-sulfate reductase | 1.8.99.2 | adenylyl-sulfate kinase | 2.7.1.25 | Adenylylselenate | C05686 | Rosetta |
| adenylyl-sulfate kinase | 2.7.1.25 | sulfate adenylyltransferase | 2.7.7.4 | Adenylylselenate | C05686 | Rosetta/SwissProt |
| cystathionine gamma-synthase | 2.5.1.48 | cystathionine gamma-lyase | 4.4.1.1 | Selenocystathionine | C05699 | Rosetta |
|  | 6.3.2.12 | tetrahydrofolate synthase | 6.3.2.17 | 10-Formyltetrahydrofolyl L-glutamate | C05928 | SwissProt |
| acetate-CoA ligase | 6.2.1.1 | propionate-CoA ligase | 6.2.1.17 | Propinol adenylate | C05983 | Rosetta |
| biotin-[propionyl-CoA-carboxylase (ATP-hydrolysing)] ligase | 6.3.4.10 | biotin-[methylcrotonoyl-CoA-carboxylase] ligase | 6.3.4.11 | Holo-[carboxylase] | C06250 | SwissProt |
| biotin-[propionyl-CoA-carboxylase (ATP-hydrolysing)] ligase | 6.3.4.10 | biotin-[acetyl-CoA-carboxylase] ligase | 6.3.4.15 | Holo-[carboxylase] | C06250 | SwissProt |
| biotin-[methylcrotonoyl-CoA-carboxylase] ligase | 6.3.4.11 | biotin-[acetyl-CoA-carboxylase] ligase | 6.3.4.15 | Holo-[carboxylase] | C06250 | SwissProt |
| deacetoxycephalosporin-C hydroxylase | 1.14.11.26 | deacetoxycephalosporin-C synthase | 1.14.20.1 | Deacetoxycephalosporin C | C06565 | SwissProt |
| thymidine phosphorylase | 2.4.2.4 | cytidine deaminase | 3.5.4.5 | 5'-Deoxy-5-fluorouridine | C12739 | Rosetta |
| 3beta-hydroxy-delta5-steroid dehydrogenase | 1.1.1.145 | steroid Delta-isomerase | 5.3.3.1 | Campest-4-en-3beta-ol | C15784 | SwissProt |
| butyryl-CoA dehydrogenase | 1.3.99.2 | acyl-CoA dehydrogenase | 1.3.99.3 | (S)-2-Methylbutanoyl-CoA | C15980 | Rosetta |
| long-chain-3-hydroxyacyl-CoA dehydrogenase | 1.1.1.211 | acetyl-CoA C-acyltransferase | 2.3.1.16 | (6Z,9Z,12Z,15Z,18Z)-3-Oxotetracosapenta-6,9,12,15,18-enoyl-CoA | C16389 | Rosetta |

Appendix C – Statistical Methods

The statistical significance of metabolic branch point (figures 3.4 and 3.5), multifunctional enzyme (table 3.6) and short protein-protein interaction network path length (tables 3.7 and 3.8) enrichments was determined using the hypergeometric distribution. This distribution describes the probability of a specific number of successes (k) when n members of a finite population are drawn randomly without replacement. With a total population size of N containing m successful members, the following equation gives the probability:

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

The hypergeometric distribution is used to represent the null hypothesis that selection from the total population is performed randomly, without any enrichment for successful members. If an under-representation of successes in the experimental case is expected, then the p-value is the sum of the probabilities of observing k or fewer successes, where k is the number observed:

$$P(X \le k) = \sum_{i=0}^{k} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$

Similarly, if an over-representation of successes is expected, the p-value is the probability that at least as many successes as were observed would occur by chance:

$$P(X \ge k) = \sum_{i=k}^{n} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$
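As an illustration only, the hypergeometric point probability and the two tail p-values described in this appendix can be sketched in Python. The function names and the toy numbers in the usage lines are hypothetical and are not taken from the thesis; the arguments follow the symbols used here (N: population size, m: successes in the population, n: number drawn, k: successes drawn).

```python
from math import comb


def hypergeom_pmf(k, N, m, n):
    """P(X = k) for n draws without replacement from N items, m of them successes."""
    # Outside the feasible range the probability is zero; this guard also
    # keeps comb() from being called with a negative argument.
    if k < max(0, n - (N - m)) or k > min(n, m):
        return 0.0
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def p_under(k, N, m, n):
    """P(X <= k): p-value when under-representation is expected."""
    return sum(hypergeom_pmf(i, N, m, n) for i in range(0, k + 1))


def p_over(k, N, m, n):
    """P(X >= k): p-value when over-representation is expected."""
    return sum(hypergeom_pmf(i, N, m, n) for i in range(k, n + 1))


# Hypothetical toy numbers, chosen only to exercise the functions:
# a population of 10 with 4 successes, 3 drawn, 2 successes observed.
print(hypergeom_pmf(2, N=10, m=4, n=3))
print(p_over(2, N=10, m=4, n=3))
```

Exact integer binomials are used so that the sketch stays readable; for very small p-values a log-space or library implementation would likely be more robust.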

In the case of metabolic branch point enrichment (figures 3.4 and 3.5), the total population size (N) is equal to the number of metabolites used in the control set. From this population, the number drawn (n) is equal to the number of metabolites present in each prediction set. Successful members (m) are those which belong to the KEGG Pathway or Compound Connectivity bin in question and the successful members drawn (k) are the metabolites predicted by each method that fall in the given bin.

For multifunctional enzyme enrichment (table 3.6), the total population size (N) is 79 432, the total number of consecutive enzyme pairs. The number drawn (n) is the size of the prediction set in question: the pairs with toxic or inhibitory intermediates, or the pairs predicted to be involved in free energy coupling. Enzyme pairs present in the Rosetta and Swiss-Prot multifunctional sets all contribute to the total successes (m) and contribute to the drawn successes (k) only when present in a prediction set.

For the minimum protein-protein interaction path lengths (tables 3.7 and 3.8), the total number of biochemically consecutive enzyme pairs was the total population size (N) and the number of predicted enzyme pairs was the number drawn (n). Successful population members (m: total, k: drawn) were those which have a shorter than average minimum path length.

The p-values have been represented graphically in all tables and figures with dots:

| p-value | Dots |
|---|---|
| > 10^-2 | (none) |
| < 10^-2 | • |
| < 10^-5 | •• |
| < 10^-10 | ••• |
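The dot notation can be expressed as a small lookup, sketched below in Python. The function name is hypothetical, and the handling of a p-value exactly equal to a threshold is an assumption, since the table only specifies strict inequalities.

```python
def p_value_dots(p):
    """Map a p-value to the dot notation used in the tables and figures."""
    # Check the strictest threshold first so that each p-value receives
    # the largest number of dots it qualifies for.
    if p < 1e-10:
        return "•••"
    if p < 1e-5:
        return "••"
    if p < 1e-2:
        return "•"
    # Assumption: values at or above 10^-2 are shown without dots.
    return ""


print(p_value_dots(3e-7))
```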