Supporting Information

Liu et al. 10.1073/pnas.1215508110 8 SI Text > _ = − <> x1 k1x1x2 _ = − − 2 x2 k1x1x2 k2x2x3 [S6] I. Algebraic Observability > _ = − 2 :> x3 k1x1x2 2k2x2x3 The mathematical description of a control system, which re- _ = 2 : x4 k2x2x3 sponds to external inputs uðtÞ ∈ ℝK and provides specific outputs yðtÞ ∈ ℝM , is best described in the state-space form If we consider an open system, we should introduce in-flux for ( pure reactants (that never act as products) and out-flux for pure x_ = f ; x ; u ðtÞ ðt ðtÞ ðtÞÞ [S1] products (that never act as reactants) as follows: y = h ; x ; u ; ðtÞ ðt ðtÞ ðtÞÞ 8 > _ = − + <> x1 k1x1x2 C1 ∈ ℝN f · h · _ = − − 2 + where xðtÞ is the state vector of the system; ð Þ and ð Þ are x2 k1x1x2 k2x2x3 C2 [S7] > _ = − − 2 in general nonlinear functions. :> x3 k1x1x2 2k2x2x3 x _ = 2 − ; Assume that we have no knowledge of the initial state ð0Þ of x4 k2x2x3 C4x4 the system, but we can monitor yðtÞ perfectly in some interval so t = that all their time derivatives at time 0 can be calculated. The where for pure reactants A and B, we introduce constant in-flux C1 observability problem concerns the existence of relationships be- and C ; and for pure product D, we introduce x-dependent out- y 2 tween the outputs ðtÞ and their time derivatives, the state vector flux C4x4. With the extra terms due to in- and out-flux, the in- xðtÞ,andtheinputsuðtÞ such that the system’s initial state xð0Þ can ference diagram changes slightly—there will be self-edges for be deduced (1–5). From the differential algebraic point of view pure products. However, this will not change the prediction made (6–8), the observability of a rational system is determined by the by GA, because a pure product with self-edge is still a root dimension of the space spanned by gradients of the Lie derivatives strongly connected component (SCC) of size 1 in the inference diagram; hence, it has to be measured to yield observability. For XN X XK ∂ ∂ + ∂ simplicity we only consider closed systems here. We also assume L := + f + uð j 1Þ [S2] ∂ i∂ l ð jÞ that there are no external inputs uðtÞ and we consider the sim- t = xi ∈ℕ = ∂u i 1 j l 1 l plest measurement scheme, i.e., we directly measure a subset of state variables (e.g., the concentrations of some chemical species) of its output functions hðt; xðtÞ; uðtÞÞ. The observability problem S1 can be further reduced to the so-called rank test: the system is yðtÞ = ð⋯; x ðtÞ; ⋯ÞT: [S8] algebraically observable if and only if the NM × N Jacobian matrix i 2 3 S5 ∂L0h ∂L0h ∂L0h Now we show that the reaction system is algebraically ob- f 1 f 1 ⋯ f 1 servable if we measure the concentration of the pure product D, 6 ∂ ∂ ∂ 7 6 x1 x2 xN 7 = 6 7 i.e., y x4. 6 ⋯⋯⋯⋯7 Proof: 6 7 We calculate the Lie derivatives of the output function: 6 0 0 0 7 6 ∂L hM ∂L hM ∂L hM 7 6 f f ⋯ f 7 ð0Þ = 0 = ; [S9] 6 ∂ ∂ ∂ 7 Y Lf y x4 6 x1 x2 xN 7 6 7 J = 6 ⋮⋮⋮⋮7 [S3] 6 7 ð1Þ = 1 = 2; [] 6 7 Y L y k2x2x 6 ∂ N−1 ∂ N−1 ∂ N−1 7 f 3 6 Lf h1 Lf h1 Lf h1 7 6 ⋯ 7 6 ∂x ∂x ∂x 7 6 1 2 N 7 Y ð2Þ = L2y 6 ⋯⋯⋯⋯7 f 6 7 = _ 2 + _ [S11] 4 N−1 N−1 N−1 5 k2x2x3 k2x22x3x3 ∂L hM ∂L hM ∂L hM f f ⋯ f = − − 2 2 + − 2 ; k2 k1x1x2 k2x2x3 x3 k2x22x3 k1x1x2 2k2x2x3 ∂x1 ∂x2 ∂xN has full rank (6, 7), ð3Þ = 3 Y Lf y = : [S4] = € 2 + _ _ + _ _ + _2 + € rank J N k2x2x3 k2x22x3x3 k2x22x3x3 k2x22x3 k2x22x3x3 2 = 2k x k x x − 2k x x2 For example, we can perform the rank test for chemical re- 2 2 1 1 2 2 2 3 + 4k x k x x − 2k x x2 −k x x − k x x2 action systems with mass-action kinetics. Consider a reaction 2 3 1 1 2 2 2 3 1 1 2 2 2 3 [S12] system with four chemical species {A,B,C,D} involved in two 2 2 2 + 2k2x2x3 −k x1x − 4k2x2x3 k1x1x2 − 2k2x2x reactions (Fig. S1D): 1 2 3 + k x −k x x − k x x2 − 2k x2 −k x x − k x x2 ( 1 1 1 1 2 2 2 3 2 3 1 1 2 2 2 3 k1 2 2 2 2 R1 : A + B ! C + k x k x x − 2k x x k x x − 2k x x [S5] 2 3 1 1 2 2 2 3 1 1 2 2 2 3 : + k2 : − − − 2 − 2 − − 2 : R2 B 2C ! D k1x1 k1x1x2 k2x2x3 k2x3 k1x1x2 k2x2x3

Using mass-action kinetics, the balance equations for the closed Then the Jacobian matrix can be calculated: system can be written as

Liu et al. www.pnas.org/cgi/content/short/1215508110 1of10 2 3 the GA-selected sensor nodes. Consider the simplest reaction ∂L0y ∂L0y ∂L0y ∂L0y 6 f f f f 7 system: A → B (as shown in Fig. S1A). If we just measure A’s 6 ∂ ∂ ∂ ∂ 7 6 x1 x2 x3 x4 7 concentration as a function of time x1ðtÞ, we will never infer any 6 7 information about the initial state of B, i.e., x ð0Þ. Similarly, if we 6 ∂L1y ∂L1y ∂L1y ∂L1y 7 2 3 2 6 f f f f 7 0001 do not measure x in Fig. S1E we can never infer it, because x 6 7 6 7 4 4 6 ∂x1 ∂x2 ∂x3 ∂x4 7 does not appear in any other node’s balance equation. = 6 7 = 6 0 J22 J23 0 7; J 6 7 4 5 6 ∂L2y ∂L2y ∂L2y ∂L2y 7 J31 J32 J33 0 6 f f f f 7 B. Sufficiency. If all of the GA-selected sensor nodes are measured, 6 ∂ ∂ ∂ ∂ 7 J41 J42 J43 0 6 x1 x2 x3 x4 7 then the Jacobian matrix does not contain any zero columns (e.g., 6 7 Fig. S1 C, F,andI, Left). Because all nonzero elements in the 4 ∂L3y ∂L3y ∂L3y ∂L3y 5 f f f f Jacobian matrix are complicated polynomials of the state varia- ∂x1 ∂x2 ∂x3 ∂x4 bles, the probability of having dependent columns in the Jacobian matrix will be very low, if not zero. In the following we show with that measuring the GA-selected sensor nodes will very likely be sufficient to yield observability of biochemical reaction systems. = 2 We also show the exceptional cases and argue that they are rare. J22 k2x3 We first give intuitive explanations about the sufficiency of monitoring the GA-selected sensor nodes for the observability of = J23 2k2x2x3 biochemical reaction systems. Our argument is based on struc- tural control theory (9, 10). We linearize the right-hand side of x J31 = k1k2x2ð2x2 − x3Þ x3 the balance equations at an arbitrary state , obtaining the lin- earized system x_ðtÞ = Ax, which can be associated with a weighted directed network GðAÞ with its weighted adjacency matrix = − − + + 2 + ∂ J32 k2x3 k1x1ð 4x2 x3Þ k2x ð8x2 x3Þ fi 3 a = jx. In other words, if a ≠ 0 then it gives the strength of ij ∂xj ij weight that node j can affect node i. Positive (or negative) values = − − + + 2 + → J33 2k2x2 k1x1ð x2 x3Þ 2k2x3ð3x2 x3Þ of aij suggest that the edge ( j i) is excitatory (or inhibitory). The weighted directed network is also called the system digraph in structural control theory (11–13). J = k k x 2k x2 −8x2 + 2x x + x2 41 1 2 2 2 3 2 2 3 3 For biochemical reaction systems, a simple rule to get the + − + + 2 − + 2 directed network GðAÞ directly from the reactions is the fol- k1 x2x3ð 2x2 x3Þ 2x1 2x2 6x2x3 x3 lowing: For each reaction (i) draw a directed edge from each reactant to each product; (ii) draw a self-edge for each reactant; J = k 2k k x x2 −24x2 + 4x x + x2 (iii) if there are more than two reactants, draw a bidirectional 42 2 1 2 1 3 2 2 3 3 edge between every two reactants. For example, consider a re- + k2x4 72x2 + 32x x + x2 + k2x 2x x ð−3x + x Þ 23 2 2 3 3 1 1 2 3 2 3 action system with four chemical species {A,B,C,D} involved in + 2 − + 2 x1 6x2 12x2x3 x3 two reactions (as shown in Fig. S1G): 8 > k < R : A 1! B + C = − 2 + + 2 1 J43 2k2x2 2k1k2x1x3 8x2 3x2x3 2x3 [S13] > k2 : R : B ↽ * D : + k2x3 48x2 + 40x x + 3x2 + k2x ðx ð−3x + x Þ 2 2 3 2 2 3 3 1 1 1 2 3 k3 + x2ð−x2 + x3ÞÞ : Using mass-action kinetics, the balance equations for the closed system can be written as It can be shown via symbolic calculation that J has full rank. 8 Thus, the system is algebraically observable. > _ = − <> x1 k1x1 Note that in general we are not going to explicitly calculate _ =+ − + the initial state from Eqs. S9–S12, which is usually very difficult, x2 k1x1 k2x2 k3x4 [S14] > _ =+ if not impossible. Observability only concerns whether such a :> x3 k1x1 _ =+ − : solution exists. x4 k2x2 k3x4

II. Graphical Approach The directed network GðAÞ associated with the balance A. Necessity. We can prove that measuring the sensor nodes equations is shown in Fig. S2A. predicted by graphical approach (GA) is necessary for the ob- As observability and controllability represent mathematical servability of an arbitrary dynamical system. The GA-predicted duals (1, 2), we can map the observability problem of the net- sensors are chosen from the root SCCs of the inference diagram work GðAÞ into the controllability problem of the transposed derived from the system’s dynamics. According to the definition network GðATÞ, which is obtained by flipping the direction of of root SCCs, they have no incoming edges, implying that their each edge in the original network GðAÞ (Fig. S2B). One can information cannot be inferred from any other nodes in the in- check that the transposed network GðATÞ is nothing but the in- ference diagram. Indeed, if we fail to measure all of the GA- ference diagram of the system (Fig. S1H). Assuming that the ’ predicted sensor nodes, then one or several columns in the Ja- matrix elements aij s are independent, we can perform the cobian matrix will be zero (e.g., the gray columns of Fig. S1 C, F, structural controllability analysis over the transposed network AT and I, Right). For example, assume we do not measure a sensor Gð Þ (9, 10). According to structural control theory, a system is structurally controllable if and only if its corresponding directed node xi; as the state of the sensor nodes can never be inferred from the dynamics of other nodes, the ith column of the Jacobian network contains neither inaccessible nodes nor dilations (9). ∂Lt y Here, inaccessible nodes represent those state variables that matrix will be zero: f = 0 for all 0 ≤ t ≤ N − 1. Hence, rank J < ∂xi cannot be reached from the inputs. The system contains a di- N and the system is not observable, indicating the necessity of lation if and only if there is a node subset S of the state variables

Liu et al. www.pnas.org/cgi/content/short/1215508110 2of10 such that jTðSÞj < jSj where the neighborhood set TðSÞ of a set S In principle there could be more complicated exceptional cases is defined to be the set of all nodes j that there exists a directed for which the GA-predicted sensors are not sufficient for observ- edge from j to a node in S, i.e., TðSÞ = f jjð j → iÞ ∈ EðGÞ; i ∈ Sg. k1 k1 ability, e.g., A + B + C ↽ * D + E, A + B + C ↽ * 2D + The input variables (also called origins, e.g., u1 and u2 in Fig. k2 k2

S2D) are not allowed to belong to S but may belong to TðSÞ. j · j k1 k1 + + * + + + + ↽ * + + denotes the cardinality of a set. We can show that by controlling E, A B C ↽ D E F, 2A B C D E F, k2 k2 the GA-predicted nodes, the directed network contains neither k + + 1 * + + inaccessible nodes nor dilations. All of the nodes are accessible 2A B C ↽ 2D E F. Systematic search for those ex- k2 from inputs because we inject inputs to the nodes in the top layer ceptional cases, or equivalently the existence of nontrivial Lie of the underlying hierarchical structure. There are no dilations subalgebra of models’ symmetries letting the inputs and the because each node (except pure products) has a self-edge (because outputs be invariant (14), is beyond the scope of the current their concentrations will affect the dynamics of themselves), which work. Here, we emphasize that those exceptional cases contain- ensures that jTðSÞj ≥ jSj for all of the subsets S. Hence, we only need to control those GA-predicted nodes to fully control the ing isolated reactions are extremely rare. Consider N species transposed network GðATÞ. By invoking the duality of controlla- involved in M reactions and assume each reaction contain exactly bility and observability, we just need to measure those root nodes four species. The probability that four species A, B, C, and D to fully observe the original network GðAÞ. involve in an isolated reaction, i.e., none of them involves in any 2 3 − N − 4 M 1 The linearized systems are usually not structured systems, be- 6 7 M 6 4 7 other reactions, is given by 4 5 , which decays to zero cause A is not a structured matrix—its elements (aij) are typically N N not independent from each other. For example, we have 4 4 2 3 rapidly as N and M grow. For N = M = 10, such a probability is × −12 −k1x2 −k1x1 00 2.305 10 . Moreover, the symmetries of state variables in those exceptional 6 −k x −k x − k x2 ð−2k x x Þ 0 7 A =6 1 2 1 1 2 3 2 2 3 7 [S15] cases are easily broken due to either different stoichiometry coef- 4 − 2 − 5 k1x2 k1x1 2k2x3 ð 4k2x2x3Þ 0 ficients or additional reactions. For example, the reversible reaction 2 0 k2x3 2k2x2x3 0 k1 A + B ↽ * 2C + D [S18] for the system of Fig. S1D, with many elements depending on the k2 same variable, hence being correlated with each other. There- fore, the structural observability analysis, and thence GA could contains no symmetry any longer and any single species can be in principle underestimate the number of sensor nodes that we chosen as the sensor to observe the whole system. Similarly, need to measure. We therefore need to test GA’s validity for the reaction system fully nonlinear systems. 8 > k1 For this we randomly generated chemical reaction networks < A + B ↽ * C + D (Materials and Methods). We start from a few randomly generated k2 [S19] > k3 compounds, and create mass-balanced reactions with randomly : C ↽ * E k assigned rate constants ki. Due to the arithmetic complexity in- 4 volved in the generic rank calculation (14), the largest reaction system we were able to test contains 221 species involved in 121 contains no symmetry and any single species can be chosen as mass-balanced reactions (15). We generate 1,000 such reaction the sensor to observe the whole system. Hence those excep- systems and use GA to identify the sensor nodes for those systems. tional cases, e.g., S16, are exceptionally rare in large chem- By using Sedoglavic’s algorithm (14), we perform the rank test of ical reaction systems, possibly never occurring in real systems. the Jacobian matrix and confirm that monitoring the GA-predicted Indeed, we did not find any occurrence of such configurations sensor nodes indeed yields full observability for almost all of the in the metabolic networks of three well-studied model organisms, connected reaction systems we generated. Escherichia coli, Saccharomyces cerevisiae,andHomo sapiens, We find that exceptions occur when there are reversible starting from their complete metabolic reconstruction (16). reactions, e.g., III. Real Biochemical Reaction Systems

k1 We also tested GA on several real biochemical reaction systems, A + B ↽ * C + D; [S16] fi k2 e.g., the simpli ed glycolytic reaction map, the model for ligand binding, and the model for cell cycle control, confirming that which are isolated from the rest of the reaction system, forming these systems are all observable via monitoring the minimum set isolated root SCCs in the inference diagram. For example, using of sensor nodes predicted by GA. We further apply GA to the mass-action kinetics the balance equations for this closed subsys- genome-scale metabolic networks of three well-studied model tem S16 can be written as organisms, E. coli, S. cerevisiae,andH. sapiens.Wefind that the 8 fraction of sensor nodes determined by GA is not very sensitive > _ = − + <> x1 k1x1x2 k2x3x4 to the assigned reversibility of the reactions in the genome-scale _ = − + x2 k1x1x2 k2x3x4 [S17] metabolic reconstructions. > _ =+ − :> x3 k1x1x2 k2x3x4 _ x4 =+k1x1x2 − k2x3x4: A. Simplified Glycolytic Reaction Map. We first consider the sim- plified glycolytic reaction map. The system consists of N = 10 chemical species [glucose (Gluc); ADP; glucose 6-phosphate One can show that measuring any single species, e.g., y = x1, will not yield observability of the whole system. This is due to the existence of (G6P); ATP; glucose 1-phosphate (G1P); AMP; fructose 6-phos- symmetries in the state variables leaving the output and its time phate (F6P); fructose 2,6-biphosphate (F2-6BP); triose phosphate derivatives invariant (14). For such a reaction, one has to measure at (TP); pyruvate (Pyr)] involved in R = 9 reactions (main text, Fig. T least two species, e.g., y = (x1, x2) , to achieve observability. 3A). The balance equations are given by

Liu et al. www.pnas.org/cgi/content/short/1215508110 3of10 8 > x_ = − k x x + C > 1 1 1 4 1 > x_ = k x x + 2k x x − 2k x2 + k x x + k x x + k x − 2k x2x d½G1K > 2 1 1 4 4 4 6 5 2 7 4 7 9 4 7 10 4 11 2 9 = k + ðk + k Þ½G1R > _ = − + − dt 5 4 8r > x3 k1x1x4 k2x3 k3x5 k6x3 > _ = − − + 2 − − − + 2 − − − + ; > x4 k1x1x4 k4x4x6 k5x2 k7x4x7 k9x4x7 k10x4 2k11x2x9 k8½G1K R ½G1K V6pð1 ½UbE2Þ V6½UbE2 < _ x5 = k2x3 − k3x5 > x_ = − k x x + k x2 > 6 4 4 6 5 2 d½G1R > _ = − + − = − − − + ; > x7 k6x3 k7x4x7 k8x8 k9x4x7 k4½G1R k6p½G1R k8r½G1R k8½G1K R > _ = − dt > x8 k7x4x7 k8x8 > _ = − 2 :> x9 2k9x4x7 k11x2x9 _ = 2 − : d½G2K x10 k11x2x9 C2x10 = k + ðk + k Þ½G2R + V ð1 − ½Cdc25Þ dt 1 4 7r 25p [S20] + V25½Cdc25 ½PG2 − kk½G2K½R It is easy to check that pure product pyruvate (Pyr, or x10) is the − − + only root node of the inference diagram (main text, Fig. 3B). To ½G2K V2pð1 ½UbEÞ V2½UbE check the observability of the system with Pyr (y = x ) as the 10 − ½G2K V ð1 − ½Wee1Þ + V ½Wee1 ; only output, we use Sedoglavic’s algorithm and find that the wp w system is indeed observable. d½G2R = − k ½G2R − k ½G2R + k ½G2K½R B. Model for Ligand Binding. Six species are incorporated in the dt 4 7r 7 ligand binding model (17): erythropoietin (Epo); Epo receptor − + − + ; (EpoR); Epo_EpoR complex; internalized complex Epo_EpoR_i; ½G2R k2p V2pð1 ½UbEÞ V2½UbE degraded internalized ligand dEpo_i, and degraded extracellular + β ligand dEpo_e (main text, Fig. 3C). Using mass-action kinetics, d½IE = − kir½IE + ki½IECð½G2K ½PG2Þ; fl the reaction uxes are given by dt Kmir + ½IE Kmi + ½IEC 8 > = · · > v1 kon ½Epo ½EpoR > = · · d½mass = μ ; > v2 kon kD ½Epo EpoR ½mass > = · dt <> v3 kt Bmax = · v4 kt ½EpoR [S21] > = · d½PG2 > v5 ke ½Epo EpoR = − V 1 − Cdc25 + V Cdc25 PG2 > = · 25pð ½ Þ 25½ ½ > v6 kex ½Epo EpoR i dt > = · :> v7 kdi ½Epo EpoR i + k4½PG2R + k7r½PG2R − k7½PG2½R v8 = kde·½Epo EpoR i −½PG2 V2pð1 − ½UbEÞ + V2½UbE and the balance equations are given by + ½G2K Vwpð1 − ½Wee1Þ + Vw½Wee1 ; 8 > d½Epo > = − v + v + v > 1 2 6 d½PG2R = − − + > dt k4½PG2R k7r½PG2R k7½PG2½R > d½EpoR dt > = − v1 + v2 + v3 − v4 + v6 > dt − + − + ; > ½PG2R k2p V2pð1 ½UbEÞ V2½UbE <> d½Epo EpoR = v1 − v2 − v5 dt [S22] > d½R > d½Epo EpoR i = − − − = + + + + > v5 v6 v7 v8 k3 k6p½G1R k8r½G1R k7r½G2R k7r½PG2R > dt dt > d½dEpo i > = − k ½R − k ½G1K½R − k ½G2K½R − k ½PG2½R > v7 4 8 7 7 > dt :> d½dEpo e kp½massð½Cig1 + α½G1K + ½G2K + β½PG2Þ½R = v8 − dt Kmp + ½R + ½G2R k2p + V2pð1 − ½UbEÞ + V2½UbE One can easily check that the sensor node set consists of two pure products (dEpo_i and dEpo_e) (main text, Fig. 3D). To + ½PG2R k2p + V2pð1 − ½UbEÞ + V2½UbE ; check the observability of the system with dEpo_i and dEpo_e as the outputs (y = [dEpo_i], y = [dEpo_e]), we use Sedoglavic’s 1 2 d½UbE k ½UbE k ½IEð1 − ½UbEÞ algorithm and find that the system is indeed observable. = − ur + u ; dt Kmur + ½UbE Kmu + 1 − ½UbE C. Model for Cell Cycle Control in Fission Yeast. Novak and Tyson proposed a mathematical model for cell cycle control in fission yeast + β − (18). Using standard principles of biochemical kinetics, they ob- d½UbE2 = − kur2½UbE2 + ku2ð½G2K ½PG2Þð1 ½UbE2Þ; tained a set of differential equations describing how the concen- dt Kmur2 + ½UbE2 Kmu2 + 1 − ½UbE2 trations of the major state variables in their model change with time: + β + β − d½Cdc25 = − kcr½Cdc25 + kc½Cdc25Cð½G2K ½PG2Þ; d½Wee1 = − kwð½G2K ½PG2Þ½Wee1 + kwrð1 ½Wee1Þ ; dt Kmcr + ½Cdc25 Kmc + 1 − ½Cdc25 dt Kmw + ½Wee1 Kmwr + 1 − ½Wee1

Liu et al. www.pnas.org/cgi/content/short/1215508110 4of10 × L fi with rate constants k1 = 0.015, k3 = 0.09375, k2′ = 0.05, k4 = where the N K matrix is to be speci ed later. Note that if z = x 0.1875, k5 = 0.00175, k7 = 100, k7r = 0.1, k6′ = 0, k8 = 10, k8r = the observer is initiated with ð0Þ ð0Þ, then it follows that z = x > x 0.1, kp = 3.25, ki = 0.4, kir = 0.1, kur = 0.1, ku = 0.2, kur2 = 0.3, ðtÞ ðtÞ exactly for all t 0. Because ð0Þ is usually unavail- ku2 = 1, kwr = 0.25, kw = 1, kcr = 0.25, kc = 1, V2 = 0.25, V2′ = able (which is the very reason we need an observer), we have z ≠ x z x 0.0075, V25 = 0.5, V25′ = 0.025, V6 = 7.5, V6′ = 0.0375, Vw = 0.35, ð0Þ ð0Þ; we hope ðtÞ will asymptotically converge to ðtÞ, i.e., Vw′ = 0.035, Michaelis constants Kmc = Kmcr = 0.1, Kmi = Kmir = the state of the observer tracks the state of the original system. L 0.01, Kmp = 0.001, Kmu = Kmur = 0.01, Kmu2 = Kmur2 = 0.05, This can be guaranteed by a suitable choice of the matrix. e = z − x Kmw = Kmwr = 0.1, and miscellaneous constants α = 0.25, β = 0.05, Consider the time evolution of the error vector ðtÞ ðtÞ ðtÞ. μ = 0.00495, [Cig1] = 0. From Eqs. S23 and S24, one has The associated inference diagram contains two SCCs: {[mass]} and {[Cdc25], [G1K], [G1R], [G2K], [G2R], [IE], [PG2], [PG2], e_ðtÞ = z_ðtÞ − x_ðtÞ [PG2R], [R], [UbE], [UbE2], [Wee1]} (Fig. S3). The latter is = ½A − LC½zðtÞ − xðtÞ [S25] a root SCC. We verified, via Sedoglavic’s algorithm, that by = ½A − LCeðtÞ: monitoring any node in the root SCC, the system is observable.

D. Genome-Scale Metabolic Networks. We applied GA to the If the matrix [A − LC] is asymptotically stable, the error vector genome-scale metabolic networks of three well-studied model will converge to zero with rate determined by the largest ei- organisms, E. coli, S. cerevisiae,andH. sapiens (16). During the genvalue of [A − LC]. genome-scale metabolic network reconstruction, the reversibility For the linear reaction system shown in Fig. S1G, we have and directionality of reactions have to be carefully assigned. To 8 > _ = − achieve this, the thermodynamic consistency analysis has been <> x1 k1x1 _ =+ − + introduced in the metabolic reconstruction process (19, 20). The x2 k1x1 k2x2 k3x4 [] fi > _ =+ detailed veri cation and error diagnostics of the assigned re- :> x3 k1x1 versibility and directionality of the reactions in the genome-scale x_4 =+k2x2 − k3x4: metabolic reconstructions are beyond the scope of our current research. However, to determine the sensitivity of our result on the The output vector is given by y = (x , x )T and for simplicity we assignment of the reaction reversibility, we performed the follow- 3 4 have assumed there are no external inputs uðtÞ. Then, the ing test. For the reaction list of each model organism, we randomly Luenberger observer is given by selected a p fraction of irreversible reactions, changed them to be 8 reversible, and recalculated the fraction of sensors (ns)fromthe > _ = − fi <> z1 k1z1 modi ed inference diagram. This random selection is repeated 10 _ =+ − + z2 k1z1 k2z2 k3z4 [S27] times to get statistical error. We then plot ns as a function of p (Fig. > _ =+ + − fi :> z3 k1z1 Kðx3 z3Þ S4). We nd that ns decreases slowly as p increases. For example, _ =+ − + − for the genome-scale metabolic network of E. coli (iAF1260), z4 k2z2 k3z4 Kðx4 z4Þ nsðp = 0Þ ≈ 0:058 and nsðp = 0:9Þ ≈ 0:028, which means that if 90% of the original irreversible reactions were actually reversible, the denoted as observer 1, where we choose fraction of sensors will decrease by 52%. This calculation suggests 0 1 00 that our result is not very sensitive to the assigned reversibility of B C the reactions in the genome-scale metabolic reconstructions. L = B 00C [S28] @ K 0 A IV. Linear Observer 0 K Observability only concerns our ability to reconstruct the internal state of a system from its outputs. GA described in this work helps and K is a constant (called observer gain) (2). We show that the us identify the nodes through which we can observe a complex observer can be used to monitor the dynamics of the system and system— it does not tell us how to do it. To achieve actual ob- the state of the observer indeed tracks the state of the original servability, we need to explicitly build an observer, i.e., a dynamic system (Fig. S5). device that models a real system through which we uncover, from However, if we do not measure x4 and just measure x3, from the the available outputs, the rest of the (unmeasured) variables. For the inference diagram we can easily tell that x2 and x4 can never be sake of completion, we illustrate this procedure for a small linear inferred. This can also be seen from the simulation of the fol- reaction system—for nonlinear systems the observer construction is lowing observer: rather involved and still an open and active area of research (21). 8 Consider the chemical reaction system shown in Fig. S1G, > _ = − <> z1 k1z1 which has linear balance equations. GA predicts a minimum _ =+ − + z2 k1z1 k2z2 k3z4 [S29] sensor set contains two nodes (x3 and x4), and monitoring them > _ =+ + − :> z3 k1z1 Kðx3 z3Þ should in principle yield full observability. Because this is a linear _ z4 =+k2z2 − k3z4 dynamic system, one can easily design a Luenberger observer. A general linear time-invariant system is described by denoted as observer 2. One sees that indeed this observer will not ( track x and x at all (Fig. S6). x_ = A x + B u 2 4 ðtÞ ðtÞ ðtÞ [S23] yðtÞ = C xðtÞ: V. Other Dynamic Systems A. Ecological Systems. We consider an ecological community The Luenberger observer has the following form: consisting of N species, in which population dynamics is driven by interspecific interactions. We assume a Holling type I functional z_ðtÞ = A zðtÞ + L½yðtÞ − C zðtÞ + B uðtÞ; [S24] response, and the population dynamics of species i is given by

Liu et al. www.pnas.org/cgi/content/short/1215508110 5of10 ! ; XN This model contains three state variables: xðtÞ yðtÞ, and zðtÞ, _ = − + γ ; [S30] representing the membrane potential, the transport rate of so- xi xi ri sixi ij xj j = 1; j ≠ i dium and potassium ions through fast ion channels (spiking variable), and the transport rate of other ions thorough slow channels (bursting variable), respectively. Here, we consider where ri is the intrinsic population and mortality rate of species i, si is density-dependent self-regulation, and γij represents the in- a diffusion-coupled directed network with N identical Hind- teraction coefficient between two species i and j. Two species i marsh–Rose neurons. The dynamic equation of each neuron is and j interact with probability C. We consider three different given by γ γ types of interactions: (i) random, where ij and ji are uncorre- 8 P – < _ = + ϕ − + + − lated and they do not have to appear in pair; (ii) predator prey, xi yi ðxiÞ zi Ia j∈∂+i vj vi where γ and γ always appear in pair and have opposite signs; _ = ψ − [S31] ij ji : yi ðxiÞ yi γ γ _ (iii)mixture of competition and mutualism, where ij and ji al- zi = r½sðxi − xRÞ − zi; ways appear in pair and have the same sign, either “+” or “−” representing mutualistic or competitive interaction, respectively. + with ϕðxÞ = ax2 − x3, ψðxÞ = 1 − bx2, and ∂ i representing all nodes We generate 100 ecological systems with N = 50 species, in- = pointed by node i. teraction probability C 0.05, and randomly assigned parameter We generate 100 neuron systems with up to N = 50 neurons values. We then identify the sensor species using GA. By using randomly connected with each other with probability C = 0.1. Sedoglavic’s algorithm, we find those systems are indeed ob- We fix the system parameters to be s = 4, xR = −8/5, a = 3, b = 5, servable by monitoring the GA-selected sensor species for all of = the three interaction types. and r 0.001. We consider the current Ia that enters the neuron as an input and assume it is the same for all of the neurons (24). B. Neuron Systems. As a simplification of the classical Hodgkin– We then identify the sensor neurons using GA. By using Sedo- Huxley neuron model (22), the Hindmarsh–Rose model (23) glavic’s algorithm, we verify that those systems are indeed ob- aims to model the spiking–bursting behavior of a single neuron. servable by monitoring the GA-predicted sensor neurons.

1. Kalman RE (1963) Mathematical description of linear dynamical systems. J Soc Indus 14. Sedoglavic A (2002) A probabilistic algorithm to test local algebraic observability in Appl Math Ser A 1(2):152–192. polynomial time. J Symb Comput 33(5):735–755. 2. Luenberger DG (1979) Introduction to Dynamic Systems: Theory, Models, & 15. Basler G, Nikoloski Z (2011) JMassBalance: Mass-balanced randomization and analysis Applications (Wiley, New York). of metabolic networks. Bioinformatics 27(19):2761–2762. 3. Chui CK, Chen G (1989) Linear Systems and Optimal Control (Springer, New York). 16. Schellenberger J, Park JO, Conrad TM, Palsson BØ (2010) BiGG: A Biochemical Genetic 4. Slotine JJ, Li W (1991) Applied Nonlinear Control (Prentice-Hall, Englewood Cliffs, NJ). and Genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinf 5. Isidori A (1995) Nonlinear Control Systems (Springer, Berlin). 11:213. 6. Diop S, Fliess M (1991) On nonlinear observability. Proceedings of ECC’91 (Hermès, 17. Raue A, Becker V, Klingmüller U, Timmer J (2010) Identifiability and observability Paris), Vol 1, pp 152–157. analysis for experimental design in nonlinear dynamical models. Chaos 20(4):045105. 7. Diop S, Fliess M (1991) Nonlinear observability, identifiability, and persistent 18. Novak B, Tyson JJ (1997) Modeling the control of DNA replication in fission yeast. Proc trajectories. Proceedings of the 30th IEEE Conference on Decision and Control (IEEE Natl Acad Sci USA 94(17):9147–9152. Press, New York), Vol 1, pp 714–719. 19. Kümmel A, S, Heinemann M (2006) Systematic assignment of thermodynamic 8. Anguelova M April (2004) PhD thesis (Chalmers University of Technology and Göte- constraints in metabolic network models. BMC Bioinformatics 7:512. borg University, Göteborg, Sweden). 20. Feist AM, et al. (2007) A genome-scale metabolic reconstruction for Escherichia coli 9. Lin CT (1974) Structural controllability. IEEE Trans Automat Control 19(3):201–208. K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst 10. Liu YY, Slotine JJ, Barabási AL (2011) Controllability of complex networks. Nature Biol 3:121. 473(7346):167–173. 21. Chaves M, Sontag ED (2002) State-estimators for chemical reaction networks of 11. Siljak DD (1978) Large-scale Dynamic Systems: Stability and Structure (North-Holland, Feinberg-Horn-Jackson zero deficiency type. Eur J Control 8(4):343–359. New York). 22. Hodgkin AL, Huxley AF (1952) A model of the nerve impulse using two first-order 12. Khan UA, Moura JMF (2008) Distributing the Kalman filter for large-scale systems. differential equations. J Physiol 117:500–554. IEEE Trans Signal Process 56(10):4919–4935. 23. Hindmarsh JL, Rose RM (1982) A model of the nerve impulse using two first-order 13. Doostmohammadian M, Khan UA (2011) Communication strategies to ensure generic differential equations. Nature 296(5853):162–164. networked observability in multi-agent systems. Conference Record of the Forty Fifth 24. Tabareau N, Slotine JJ, Pham QC (2010) How synchronization protects from noise. Asilomar Conference on Signals, Systems and Computers (ASILOMAR) (IEEE Press, PLOS Comput Biol 6(1):e1000637. New York), pp 1865–1868.

Liu et al. www.pnas.org/cgi/content/short/1215508110 6of10 A A B D A + B C G A B + C B + 2C D B D Balance equations

B E H

A x A x A x 1 1 1

B x C x B C 2 3 x x x B 2 2 3 D x D x 4 4 C F I

x x 1 x x x x 1 1 1 1 1

x x x x x x x x 2 3 2 3 2 3 2 3 x x 2 2 x x 4 4 x x 4 4 observable unobservable observable unobservable observable unobservable Jacobian Inference Diagram Reactions

Fig. S1. Observability of simple chemical reaction systems. For each reaction system, we show its reaction(s), balance equations, inference diagram, and the Jacobian matrices corresponding to an observable case by measuring the GA-selected sensor node(s) (shown in red); and an unobservable case (with sensors shown in green). (A) A simple chemical reaction with two species A and B. The balance equations associated with this reaction are linear. (B) Inference diagram derived from the balance equations shown in A.(C) Jacobian matrices corresponding to an observable case (Left) and a nonobservable case (Right). (D)A chemical reaction system with four species (A, B, C, and D) involved in two reactions. The associated balance equations are nonlinear. (E) Inference diagram derived from the balance equations shown in D.(F) Jacobian matrices corresponding to an observable case (Left) and a nonobservable case (Right). (G)A chemical reaction system with four species (A, B, C, and D) involved in two reactions (one is reversible). The associated balance equations are linear.(H)In- ference diagram derived from the balance equations shown in G.(I) Jacobian matrices corresponding to an observable case (Left) and a nonobservable case (Right).

Liu et al. www.pnas.org/cgi/content/short/1215508110 7of10 A G( A ) B G(AT)

x A x A 1 1

B x C x B x C x 2 3 2 3

D x D x 4 4

C D

x A x A 1 1

B x C x x C x 2 3 B 2 3

D x D x 4 y 4 u 2 2

y u 1 1

Fig. S2. Duality between observability and controllability. (A) Balance equations of a chemical reaction system can be associated with a directed weighted network GðAÞ, called the system digraph. (B)Byflipping the direction of each edge in GðAÞ, we get a transposed network GðATÞ, which is just the inference diagram introduced in the main text. (C) To ensure the observability, we need to measure nodes x3 and x4 as outputs y1 and y2.(D) To ensure the controllability, we need to drive nodes x3 and x4 with inputs u1 and u2.

Fig. S3. Inference diagram of the cell cycle control model in fission yeast (18). Strongly connected components (SCCs) are marked with dashed circles. The root SCC, which has no incoming links, is shaded in gray.

Liu et al. www.pnas.org/cgi/content/short/1215508110 8of10 0.12 E. coli (iAF1260) S. cerevisiae (iND750) 0.1 H. sapiens (Recon1)

0.08

0.06

0.04

0.02

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Fig. S4. Fraction of sensor nodes as a function of p, with p the fraction of irreversible reactions in the original genome-scale metabolic networks of three well- studied model organisms, Escherichia coli (red “+”), Saccharomyces cerevisiae (green “x”), and Homo sapiens (blue “*”). This p fraction of irreversible reactions is forced to be reversible, and then we recalculate the fraction of sensor nodes of the perturbed metabolic networks. The results are averaged over 10 random selections of p fraction of irreversible reactions, with error bars defined as SEM.

Fig. S5. Observer 1 works: the estimates zi (t) (shown in dotted lines) will converge to the original state variables xi(t) (shown in solid line) for large t. Here, x1, x2, x3, and x4 are shown in red, green, blue, and magenta, respectively.

Liu et al. www.pnas.org/cgi/content/short/1215508110 9of10 Fig. S6. Observer 2 does not work: only the estimates z1 (t) and z3 (t) will converge to the original state variables x1 (t) and x3 (t), respectively, at large t.The estimates z2 (t) and z4 (t) will not converge to x2 (t) and x4 (t) Here, x1, x2, x3, and x4 are shown in red, green, blue, and magenta, respectively.

Liu et al. www.pnas.org/cgi/content/short/1215508110 10 of 10