Advanced Review The best models of metabolism Eberhard O. Voit*
Biochemical systems are among of the oldest application areas of mathematical modeling. Spanning a time period of over one hundred years, the repertoire of options for structuring a model and for formulating reactions has been constantly growing, and yet, it is still unclear whether or to what degree some models are better than others and how the modeler is to choose among them. In fact, the variety of options has become overwhelming and difficult to maneuver for novices and experts alike. This review outlines the metabolic model design proc- ess and discusses the numerous choices for modeling frameworks and mathe- matical representations. It tries to be inclusive, even though it cannot be complete, and introduces the various modeling options in a manner that is as unbiased as that is feasible. However, the review does end with personal recom- mendations for the choices of default models. © 2017 Wiley Periodicals, Inc.
How to cite this article: WIREs Syst Biol Med 2017, e1391. doi: 10.1002/wsbm.1391
INTRODUCTION function was frequently chosen not only for meta- bolic modeling in vitro, but also in living cells, where ver the past decades, mathematical and compu- its underlying assumptions are not satisfied3,4; Otational modeling has become a widely accepted indeed, it has been used even in biological fields that tool in biology, and its aspirations of making reliable have very little to do with enzyme kinetics.5 The rea- predictions or offering explanations for complex, and sons of this default utilization are threefold. First, sometimes counterintuitive phenomena are generally MMRL has proven to be an excellent representation appreciated as potentially very useful. At the same of enzymatic processes in vitro, and thousands of fi time, newcomers to the eld do not always seem to articles have measured or used its characteristic para- recognize that modeling can entail very different meters KM and Vmax. The assumption therefore has approaches and structures. In particular, it appears been that, barring obvious alternatives, one might be that many nonexperts seldom wonder where exactly justified to extrapolate its usage to conditions models come from, and if they actively want to in vivo. Second, MMRL captures a nonlinear satura- engage in basic modeling, they resort to a few defaults tion process, while allowing simple transformations that are not necessarily optimal or even valid. to linearity, for instance, by expressing 1/V as a func- A prime example is the common use of the tion of 1/S, where V is the rate of product formation 1,2 – Michaelis-Menten rate law (MMRL ), which was and S the substrate concentration.6 8 This combina- developed for analyses of enzyme catalyzed reactions tion makes MMRL very appealing, especially if one fi in vitro. Speci cally, it was conceptually based on the considers that there are infinitely many nonlinear reversible formation of an intermediate complex functions and that there is no guidance regarding between substrate and enzyme and the conversion of optimal choices among them. Also, linear regression this complex into product and enzyme, and formu- becomes applicable for parameter estimation, lated mathematically in the language of elementary although with some distortion of the error structure, chemical reaction kinetics. Over the decades, this and is incomparably easier than nonlinear regression. Finally, a default choice is attractive as it seems impossible to infer valid nonlinear representations *Correspondence to: [email protected] directly from experimental data, and much has been Department of Biomedical Engineering, Georgia Institute of Tech- ‘ ’ nology, Atlanta, GA, USA made of the fact that there are no true models. Physicists have it a little easier in this respect, Conflict of interest: The author has declared no conflicts of interest ‘ ’ for this article. because many laws, such as the law of gravity, have
© 2017 Wiley Periodicals, Inc. 1of26 Advanced Review wires.wiley.com/sysbio been gleaned directly from experiments or derived redundancy which is widespread in biology. Instead, from first principles and are valid under common modelers should require that a good model be conditions, even though they may fail in the realms driven by crisp biological questions, which in turn of astronomy and quantum physics. Biology, of determine the structure of the model.10 As an exam- course, is embedded in the physical world and must ple, consider the growth of a bacterial culture. If we obey its laws, but biological processes are often con- plan to investigate how many bacteria to expect at volutions of so many fundamental processes that a a given point in time, a simple exponential function physics-based description is no longer feasible.9 To with an appropriate growth rate might be an appro- see this discrepancy, one might just imagine all the priate solution for relatively small and well-fed physical processes associated with cell division or sig- populations. However, the function breaks down nal transduction to realize that biology must develop for large numbers, does not tell us anything about its own higher-level functional representations. Biol- the variability among several populations that ogy is of course not alone in this respect, and fields should all grow with the same characteristics, does like medicine, psychology or sociology face the same not take into account spatial considerations, and challenges. certainly does not reveal molecular mechanisms. Even statistics, which is a shining example of Thus, if it is important to determine which genes mathematical rigor, is not immune against wrong affect the speed of growth, we obviously require a model choices. In some cases, such as flipping a fair much more complex model. In the end, the clearest coin over and over again, the phenomenon itself may and most concise formulations of questions have the dictate which statistical distribution or process best chance of being answered by mathematical and description best captures its random features. How- computational models. ever, if biological or medical data have an observed The typical construction of a metabolic systems distribution with an extended right tail, many proba- model from scratch consists of four steps (Figure 1): bility distributions—with very different mathematical formats—may approximately fit the data, and the 1. Identification of the constituents of the system. fi identi cation of a particular distribution becomes a 2. Identification of the topology and regulation of somewhat troubling matter of unguided choice. the system. Sometimes one may be able to use arguments like the applicability of the central limit theorem in a multi- 3. Choice of mathematical representations for all plicative space, which can help provide support for a processes. log-normal distribution, but there are no true guaran- 4. Estimation of parameter values for the process tees when it comes to real data. We will return to this representations. issue later in the review, but the situation demon- strates how much more complicated the choice of These construction steps are followed by model representations may become in the analysis of diagnostics and analyses of consistency and model biological systems. appropriateness, which assess technical details, To address the issue of model choice, let’s start such as model stability and sensitivity, and more at the beginning. If one is not convinced already, globally try to ascertain that the model ‘makes some pondering will lead to the conclusion that sense.’ The latter is often tested with a series of there are no true models in biology and related simulations whose results are compared to data or fields. The reasons are manifold but collectively sim- expectations. The final modeling steps pertain to ple: Every biological phenomenon, no matter how various model uses, which may include explana- small, contains so many components and occurs in tions of observed phenomena, predictions of such a complicated environment that we cannot untested scenarios, various manipulations, pertur- even list—let alone represent—all factors that bations, and interventions, or optimizations directly or indirectly affect it. At least, that is the toward desirable goals. current state of the art. Also, the purpose of a The emphasis of this review is on Step 3. Steps model is often to distill, understand, or explain the 1 and 2 are briefly discussed, but they are primarily a essence of a phenomenon, which suggests omitting, matter of the biological subject area. Step 4 has been simplifying, or abstracting nonessential details. But reviewed numerous times in recent years and will if a detail is omitted, the model is bound to fail if only be discussed very coarsely. The steps following this particular detail becomes important. This the model construction fall outside the scope of this conundrum has no real solution, and Ockham’s review, although the diagnostic steps often lead to a razor is of little help, because it advises against revisiting of Steps 1 through 4.
2of26 © 2017 Wiley Periodicals, Inc. WIREs System Biology and Medicine The best models of metabolism
Metabolites Reactions
K
L M N Ideas
Regulation
X F t X X v Model structure: i = i( , 1,..., n, ij) Continuous? Deterministic? Discrete? Stochastic? K
VKM Application Reactions: vij = vij(t, X , ..., X ) 1 n M V N L V MN Mass action? Michaelis-Menten? Hill? CME? LM Inhibition? Type? Power-Law? S-system? GMA? Saturable? Lin-log? Convenience? Tendency? Universal? Petri Net? Automaton?
FIGURE 1 | Model design process. Top left: Modeling ideas lead to the selection of (blue) metabolites, (red) reactions, and (gold) regulatory signals, as far as they are known. Top right: These components are arranged in the format of a dynamic system, consisting of a static metabolic network of reactions and their regulation. Bottom right: Each reaction in this dynamic system needs to be mathematically formulated, with account of regulatory signals. Bottom left: The identification of the most appropriate model structure and representations faces many difficult choices. The full characterization of all reactions, including the determination of parameter values, completes the model design. Green arrows indicate that several iterations of the model design process may be necessary.
STEPS 1 AND 2: IDENTIFICATION OF and its environment, which is seldom the case. For CONSTITUENTS, TOPOLOGY AND this reason, Steps 1 and 2 are arguably the most fi REGULATION OF THE SYSTEM important and most dif cult steps of the modeling process. They may look deceivingly simple at first For most of the century of metabolic modeling, the glance, because ‘one intuitively knows’ what the design of a model has focused on a specific metabo- model is supposed to be about and who the main lite or pathway of interest, or on a pathway system players are. However, the selection of components that contains several linear, branched, or cyclic path- naturally incurs bias, because the modeler does not ways. This focus may have derived from an interest- have full information about the pathway system but ing research question or from a hypothesis generated must make decisions to include or exclude certain by earlier investigations. The modeling process begins aspects. The hidden challenge is the following: If the with two steps, namely, identifying what is to be lists are missing important components, many model included in the model, and how the components results will be compromised or flat-out wrong. But if interact with each other through mass flow or regula- they contain too many components, the model tion. The two steps go hand in hand and are guided becomes unwieldy, over-parameterized, and unrelia- by the biological hypotheses or questions to be ble in predictions and extrapolations. answered by the model. They should involve the Generically, the system definition in Step 1 fol- most parsimonious lists of components, processes, lows the etymology of the word: to define means to and regulators that still retain the integrity of the sys- set boundaries. The challenge is that this boundary tem or phenomenon. Such a compromise is easily setting is often complicated and seldom unique, stated but often very difficult to implement, because because every system in biology is embedded in a lar- assessing the importance of all possible factors would ger system, which could affect the system of interest. require almost complete knowledge of the system As a rule of thumb, Savageau proposed selecting
© 2017 Wiley Periodicals, Inc. 3of26 Advanced Review wires.wiley.com/sysbio components and processes such that there is high and subsequently determining functional descrip- connectivity within the system while the number of tions, the data are collectively subjected to a statisti- processes crossing the boundaries of the system is cal machine learning analysis that optimizes the minimal (p. 80 of Ref 11). Of great help are data- system connectivity so that it matches the observed bases like KEGG12 and BioCyc,13 which contain data as closely as possible. Specifically, one defines a comprehensive maps of a large number of pathways, space of feasible models, which are formulated as along with ample other information. However, they graphs where conditional probabilities are associated do not solve the problem. with the connections between variables. A learning Step 2 has been approached in two distinct procedure then selects the model that best fits the ways. When the model is constructed from the bot- actual observations.28 Early methods used Bayesian tom up, which used to be the case for almost all met- inference29 or mutual information,30 but the topic is abolic models until quite recently, one collects generically so complicated, and there are so many information regarding the production and degrada- solution proposals—some effective, others less so— tion of every component and thereby establishes the that recurring crowdsourcing competitions are held system topology and possibly the regulatory struc- to determine the best inference algorithms; they are ture. Clearly, this procedure requires substantial a called DREAM competitions (Dialogue on Reverse- priori knowledge of the details of the system, but rich Engineering Assessment and Methods31,32). Many of information is available as the result of over the proposed DREAM algorithms have focused on 100 years of biochemical research and has been co-occurrence patterns of changes in the expression documented in an enormous body of literature. The of particular genes, transcriptomic responses, pro- exact determination of the regulatory structure is teins, or metabolites. The vast majority of these have often murkier, and if it is not known from biochemi- targeted gene regulatory or protein–protein interac- cal experimentation, it is also difficult to infer with tion networks (e.g., Refs 33–36), but the results of computational methods (see below), although this is these genome- and proteome-based inferences can be sometimes possible (e.g., Refs 14–16). In any event, translated, under certain assumptions, into inferences – rich information on metabolic reactions has been regarding specific metabolic networks.37 46 It is also amassed over time, and the database BRENDA17 is a sometimes feasible, and of course desirable, to com- treasure trove for kinetic parameter values, including bine the metabolomics top-down approaches with information regarding regulation. more traditional bottom-up approaches.47,48 While An entirely different approach toward con- most reverse-engineering methods and applications structing models is based on metabolic time series or have targeted gene networks, some have been applied data from several input–output steady-state experi- to metabolic networks as well. Depending on the ments. If such data are available, it is in principle data, these inferences have addressed static – – possible to infer the most likely connectivity of a sys- networks49 52 or dynamic systems.53 62 tem.18,19 The necessary data have become available The machine learning inferences may target not in recent years as the result of targeted or untargeted only the connectivity of a network but also the distri- – metabolomics.20 24 In this approach, very many bution of metabolic flux magnitudes at a steady state. metabolite concentrations are measured, typically For the latter, two methodological frameworks have with mass spectrometry or nuclear magnetic reso- been developed. The first is metabolic flux analysis, nance, for the system under investigation and a con- which is based on supplying the metabolic system trol system. Often the goal is to identify and with a labeled substrate, waiting until the substrate characterize significant differences in the peak pattern has been converted into a variety of other metabo- of the spectrograms. The differing peaks point to lites, and then using sophisticated computational metabolites with different concentrations in the two methods for determining the fluxes that drive the – situations. If these metabolites can be identified, system.63 69 The second is flux balance analysis which however is not always the case, they become (FBA), which is a very popular extension of stoichio- the focus of further analysis, for instance with metric analysis70,71 and estimates flux distributions, dynamic models. To some degree it is also possible to sometimes genome-wide, under the assumption of evaluate genome data with respect to expression dif- some optimality criterion, such as maximal growth – – ferentials in genes coding for enzymes.4,25 27 or ATP production.72 75 The methods of analysis for these data are very Quite a different, graph-based inference method different from the bottom-up methods. Rather than is the metabolic pathway reconstruction from molec- scanning the literature and databases for possible ular structures of compounds and knowledge about connections between pairs of components A and B, enzymes that possibly convert these compounds into
4of26 © 2017 Wiley Periodicals, Inc. WIREs System Biology and Medicine The best models of metabolism others. In these approaches, reactions are considered dramatically, the induced disruptions led to sur- transfers of atoms between metabolic compounds, prisingly small changesinmRNAsandproteins. and computational graph methods determine feasible Moreover, the metabolite levels remained unex- – or most likely paths.76 78 pectedly stable, and the authors showed that this In all these approaches, no matter how different stability was achieved through wide-spread rerout- they are, the ultimate outcome is a directed network ing of fluxes throughout the metabolic system. of metabolic fluxes, ideally with an indication of its If complex responses such as compensation or regulatory control structure. An exception is the defi- adaptation are to be understood, the modeling effort nition of reactions as nodes and metabolites as edges needs to be stepped up toward fully dynamic models that represent ‘shared resources among modules.’79,80 that permit true extrapolations and predictions of This option is rather counterintuitive but has the untested scenarios. For these models, the connectivity advantage of natural clustering with a reduced num- of components is not sufficient, but every process ber of nodes. In a similar vein, one can mathemati- needs to be formulated as a function that appropri- cally represent a traditional metabolic network as a ately captures the roles of all system components that model where the reactions are the dependent affect this particular process. Needless to say, this variables.81 step is challenging. Furthermore, the need to identify suitable functional representations arises whether a model is constructed from the bottom up or from the STEP 3: CHOICE OF MATHEMATICAL top down. REPRESENTATIONS FOR ALL PROCESSES The Challenge of Choosing Suitable The great advantage of graph and machine learn- Functions ing methods is that only minimal aprioriknowl- Even for single processes, the choice of an explicit edge about a system is needed, as long as enough function can be difficult. As a comparatively simple suitable data are available. Alas, this advantage is example, consider the selection of a statistical distri- directly connected to the greatest limitation of bution function for representing a sample or process. these methods, namely, that the final result con- As discussed before, the stochastic features underly- sists only of the connectivity of the system and, in ing the distribution may suggest a formulation, as for some cases, the amount of material flowing flipping a coin, but such guidance is rare. To demon- through each connection. For some purposes, this strate the problem, Sorribas et al.83 generated 25 sam- information is sufficient, and corresponding mod- ples with 160 drawings each from normal, gamma els have been used to predict the consequences of and Weibull distributions and then used an optimiza- gene knock-outs or changes in substrate availabil- tion algorithm to fit one of the standard distribution ity. However, these predictions implicitly assume functions (normal, log-normal, gamma, logistic, Wei- that the organism does not call up its multi-level bull, or Gumbel) to these artificial data. Surprisingly, regulatory machinery to mount compensatory in about 75% of all cases, the best fit was obtained mechanisms. Such responses are nonlinear and with a different distribution than the one used to cre- therefore not always well modeled by linear meth- ate the sample. This unexpected finding causes con- ods such as graph analyses or FBA. For instance, cern for selecting functions for biological processes, if a gene is knocked out that codes for some especially if data are noisy. enzyme, the corresponding enzymatic step is also The wrong identification of functions raises the eliminated and no product is generated. However, question whether it is really problematic if the if the organism needs this product, it will express selected function is wrong, as long as the data are other genes with the goal of redirecting alternative represented well. For the statistical example it may pathways toward the desired metabolite. This actually not even be a major problem for data ana- problem is significant for minimalistic FBA models, lyses because distributions are used for concise but does not entirely disappear for genome-wide descriptions, sampling, decision making, and maybe models either, because the analysis makes infer- for computing some test statistics. However, wrong ences in the absence of regulation. In a beautiful functions can be detrimental in dynamical systems, demonstration, Ishii et al.82 studied the responses which are commonly used for extrapolations toward of Escherichia coli to numerous environmental and new scenarios. As a simple example, consider two genetic perturbations. While the expression of very similar models of the small pathway in Figure 2. genes close to the perturbations often changed Model A is formulated as
© 2017 Wiley Periodicals, Inc. 5of26 Advanced Review wires.wiley.com/sysbio
1.5 1.5 X X 1 2 Y Y 1 2 1 1
X X Concentration 2 3 0.5 0.5 0 5 Time 10 0 5 Time 10 X 1 1.5 1.5 X X 3 4 Y Y X 3 4 4 1 1 Concentration 0.5 0.5 0 5 Time 10 0 5 Time 10
1.2 1.2 8 X 3 X 1 2 Y Y 1 2
0.6 0.9 4 1.5 X X 1 2 Y Y Concentration 1 2 Concentration 0 0.6 0 0 0 5 Time 10 0 5 Time 10 0 10 Time 20 0 10 Time 20 1.2 1.2 3 5 X X 3 4 Y Y 3 4 1 1 1.5 2.5 X X 3 4
Concentration Y
Y Concentration 3 4 0.8 0.8 0 0 0 5 Time 10 0 5 Time 10 0 10 Time 20 0 10 Time 20
FIGURE 2 | The diagram in the top-left panel was modeled with two slightly differing models (see Text). The responses to moderate perturbations (top-right: X2(0) = Y2(0) = 1.5 and bottom-left (X1(0) = Y1(0) = 0.2) are quite similar. However, a bolus of 2 units added to X1 and Y1 during the time period t 2 [3, 8] triggers very different response trends (bottom-right).
_ : −1 − ðÞ− : available experimental data to determine a well-fitting X1 =12 X3 FX1 0 2 X1 model. Without a priori knowledge, the modeler may _ try the mass-action formulation or a Hill function X2 = FXðÞ1 −X2 (or a variety of other functions), which in both cases leads to excellent results (Figure 2). The modeler is X_ = X −X 3 2 3 presumably satisfied, and using the argument of parsi- _ : − : ð Þ mony, s/he may choose the simple mass-action for- X4 =02 X1 0 2 X4 1 mat. This choice does not encounter problems until a new scenario is modeled, as we will discuss next. It where, the material flux between X and X is repre- 1 2 appears that this scenario is generically much more 1:2 X4 ðÞ 1 sented with the Hill function FX1 = : 4.In frequent than one would like to admit. 0 2+X1 Both models have the same steady state of Model B, all variables are called Yi, and the process (1, 1, 1, 1), and the differences in responses to rea- corresponding to F(X1) is instead formulated as the mass action function G(Y1)=cY1 with c = 1. Other- sonable perturbations are small, especially if one wise the two models are identical. One could argue imagines that either model in reality should match that a Hill model and a mass-action model are quite data with a bit of noise. Figure 2 shows responses to different, which is certainly true, and that’s the point changing the initial values from the steady state to of the example: In a real-world situation, it may be (0.2, 1, 1, 1) or (1, 1.5, 1, 1). The resulting fits are entirely unclear how to formulate F(X1) mathemati- certainly satisfactory, and one may conclude that cally, and the modeler will have to rely entirely on either model choice is perfectly suitable.
6of26 © 2017 Wiley Periodicals, Inc. WIREs System Biology and Medicine The best models of metabolism
However, substantial differences arise if the terms, provides a perfect representation to any data- models are used for other types of simulations. For set. In all these cases, the compromise obviously lies instance, the panel on the bottom right of Figure 2 in the complexity of the generalization, which exhibits the results of a simulation where an external strongly affects the number of parameters to be bolus of 2 units is added, quasi as a second input, to estimated. the first differential equation during the time period A third option is in some sense a compromise t 2 [3, 8], whereas it is 0 at other times. For larger between the first two approaches. It consists of the dynamical systems, where error compensation easily generic representation of a distribution family in an happens within or among terms of the same equation approximate manner that strikes a reasonable bal- or of different equations, it is possible that wrong ance between accuracy and the number of parameter model choices lead to utterly faulty results. Note that values. Examples in statistics are the four-parameter the example was not constructed around a bifurca- S-distribution and the five-parameter GS- – tion point, where slight changes in parameter values distribution.90 92 These distributions contain only a may lead to different qualitative behavior, but that it few statistical distributions as exact special cases, but addresses typical alternative functions within their approximate very many of them, including compli- normal operation ranges. cated noncentral and discrete distributions, in a rela- tively simple, streamlined and quite accurate manner. Owing to their simplicity, these distributions permit Generic Strategies for Identifying interesting analyses, especially with respect to fitting Appropriate Functions data of ill-characterized origins, where they can even Clearly, the consequence of choosing a wrong model suggest which traditional distribution might be well should be a cause for concern that requires atten- suited to represent a dataset. Similar strategies have – tion. For inspiration, let’s return to statistics to been proposed for growth processes.93 96 explore documented strategies for selecting distribu- tion functions. A first, intuitive strategy may be fit- ting a more or less comprehensive set of candidate Selecting Metabolic Rate Functions from a functions to the data under investigation, one at a Smorgasbord of Options time, and judging the quality of fit by residual error. The issues associated with model choices are exacer- Sorribas et al.83 pursued this strategy to analyze bated in metabolic modeling, because experimental growth distributions of girls between ages 5 and 17. data are typically the output of complex systems, Needless to say, this strategy is cumbersome and rather than explicit functions. As a consequence, the time-consuming, and any choice based on residual selection problem is confounded by compensation of errors becomes questionable if the various candidate errors within and among the equations of the ODE distributions contain different numbers of para- system that is supposed to represent the data, as we meters: On the one hand, a function with more saw before (Figure 2). The good news is that meta- parameters should be expected to yield better accu- bolic models tend to be quite robust, as long as all racy, but if one is interested in a good fit, should a variables remain in relatively small ranges. This is highly parameterized (or over-parameterized) func- especially so in large models with ample regulation, tion be penalized? which tends to buffer the variables in the vicinity of a A second strategy is the formulation of a gener- homeostatic state. As a consequence, different models alized distribution function that contains a few or may be appropriate for modeling systems close to – many candidate functions as special cases.84 88 their normal operating state. Direct comparisons Driven to the extreme, Savageau’s suprasystem of between alternative modeling frameworks are relative probability distributions, which consists of a large rare but do exist and demonstrate similar outputs for system of nonlinear ordinary differential equations different models in response to small – (ODEs), is so comprehensive that one can prove perturbations.97 111 Nonetheless, models are often mathematically that all continuous distributions are designed for explorations of new realms, where dif- exact special cases.89 Problem solved? No, because, ferences among alternative representations can first, it is a priori unclear how many ODEs should be become significant. For instance, the goal of a model involved, and second, increasing numbers of ODEs analysis may be to understand or intervene in a dis- demand so many parameters that it becomes impossi- ease, or to manipulate a microbe toward the high- ble to determine them from data with any degree of yielding production of some metabolite, such as bio- – reliability. In a similar vein, one could propose a diesel, insulin, or citric acid.112 115 These types of generic polynomial which, with sufficiently many manipulations often require larger changes, where
© 2017 Wiley Periodicals, Inc. 7of26 Advanced Review wires.wiley.com/sysbio differences between alternative modeling formats settings, and indeed, noninteger coefficients have become significant. Thus, the search for the best been used in metabolic modeling (e.g., Refs 126). models must not be abandoned. In their pure, unmodulated form, MMRL and – The field of metabolic modeling is dominated the Hill function are easy to use and parameterize,6 8 by a few functional formats that have been used and while we already discussed limitations with time and again. In some way, many of them may respect to valid applicability, they have been and will be traced back to mass action kinetics, which was remain to be a mainstay in metabolic modeling for a proposed over 150 years ago for explaining the long time. For larger pathway systems, the seeming – kinetics of elementary chemical reactions.116 119 simplicity evaporates, and standard analyses, such as Indeed, mass action functions are the most preva- the computation of a steady state or of parameter lent defaults. For a simple irreversible reaction that sensitivities, become very cumbersome.104 converts A into B, the mass-action formulation is The most important generalization of these _ A = −k A, B_ = k A. Thus, degradation and produc- functions is the account of regulation, which can tion are linear functions of A with a rate k.IfA and make these functions complicated, if not unwieldy. In B are converted into C, the model is represented as the simple case of competitive inhibition by inhibitor _ _ A = B_ = −κ A B, C = κ A B, where κ is again a rate I, the result is still rather simple: constant. It is important to note that many models in V S biology are direct derivatives of these formula- v = max : ð4Þ 119 ðÞ= tions ; they include Michaelis-Menten and Hill KM 1+I KI + S functions,1,2,120 SIR models121 for the spread of infectious diseases, and Lotka-Volterra However, different inhibition mechanisms mandate – models.122 125 For single-variable processes, the dif- different formulae. As a result, the mathematical for- ferential equation for the loss of A is linear, which is mats for rate laws with competitive, noncompetitive, also the case for many elementary processes in phys- uncompetitive, mixed, or allosteric inhibition are all ics, such as exponential decay, heating and cooling, different, which implies that the type of inhibition and simple transport processes. must be known before the appropriate model can be Outside the mass-action formulation, the formulated. Numerous books have published details 11,127–130 MMRL1,2 has been the undisputed workhorse of of the various types of inhibition. metabolic modeling.4 Initially formulated as a set For reactions that involve several inhibitors and of differential equations, describing the binding of modulators, the results can become very complicated, substrate to enzyme and the generation of prod- leading to a feeling of frustration and concern. An uct, as discussed before, the power of the rate law interesting example is the phosphofructokinase (PFK) came from assumptions that are mostly true for reaction of glycolysis, which phosphorylates fructose experiments in vivo. These permitted the reformu- 6-phosphate through a transfer of ATP. Because PFK lation of the overall rate of conversion of sub- is a critical control point in glycolysis, the reaction is strate into product as an explicit function of affected by several metabolites, as well as the form pH. Extensive investigations of these modulators have led to a large variety of rate functions that were formulated to capture the reaction kinetics. The fol- V S v = max ; ð2Þ lowing collection of examples is certainly not com- KM + S plete, but the reason for showing at least a subset is to indicate the growing confusion the newcomer where V is the maximal rate, and the Michaelis max experiences when studying rate functions. The exam- constant K reflects the affinity between substrate M ples are presented essentially in the notation in which and enzyme. The simplest generalization of MMRL they were originally published. is the Hill function,120 which associates the Hill coef- One of the simpler rate laws was presented by ficient n with the substrate and with K : M Eicher and colleagues,126 who formulated it as a bi- substrate, irreversible Hill process of the form V Sn v = max : ð3Þ Kn + Sn M h h F6PnoATP v = Vf ; The Hill coefficient is typically 2 or 4, as it was origi- h h 1 + PEPh +1+α2hPEPh F6P + ATP 1+ α4h h nally meant to represent molecular subunits. None- 1+ PEP theless, there are no mathematical reasons for these ð5Þ
8of26 © 2017 Wiley Periodicals, Inc. WIREs System Biology and Medicine The best models of metabolism where the variables F6P, ATP and PEP represent The co-existence of such drastically different metabolite concentrations that are scaled by their models for the same reaction is disconcerting: How half-saturation constants. Thus, in addition to Vf, α should one decide on the most appropriate model and h, the model contains three further parameters. formulation? Is it even possible to obtain information Blangy et al.131 proposed a model based on the so- regarding all kinetic parameters, and if so, is it valid called concerted transition theory proposed by to mix parameter values measured under different Monod et al.132 This model has over twenty para- conditions and maybe even in different organisms? meters. A few years later, Otto et al.133 proposed a Are questionable parameter values and a confusing, model in the format overwhelming diversity of formats a fact of life, or are there alternatives? Savageau (p. 75 of Ref 11) Vmax_PFK × F6P × MgATP responded to these questions over forty years ago: ‘It v = K_F6P + F6P K_MgATP + MgATP ; ð6aÞ 1+L must be concluded that the complete kinetic charac- terization of more complex regulatory enzymes is where impossible for practical reasons. Even if such a char- acterization were available, it would hardly be useful. 4 4 In the simplest case, for which the highest power of 1:0+ MgATP × 1:0+ Mg KMgATP K_Mg any concentration variable is one, a reaction with × : L = L0_PFK 4 4 fi F6P AMP eight reactants and modi ers will have on the order 1:0+ × 1:0+ 130 K_F6P K_AMP of 500 terms in its rate law.’ Schulz came to a sim- ðð6bÞÞ ilar conclusion after having discussed the overwhelm- ing complexity of some rate functions that attempted Mulquiney et al.134 suggested the alternative to capture the correct mechanism.