<<

Chemical Reaction Networks Part 1: Chemical Reaction Networks — what, where, why?

• What are Chemical Reaction Networks?

• Where can you use them?

• Why they are hard to analyze?

• Why they are interesting, mathematically and practically?

• Levels at which one can use them as a modeling framework What are Chemical Reaction Networks (loose)

• Models of chemical systems + much more

• Discrete-state Markovian stochastic processes

• Topologically interesting graph systems

• Laboratories in which to study:

• the consequences of concurrency in transformations

• how topology controls dynamics or possibilities for order What are Chemical Reaction Networks (tighter)

• Sets of reactions that act on sets of species and:

• reactions can fire independently and randomly

• convert sets of reactants to sets of products all at once

• Hopping on simple graphs is a degenerate case

• More generally one requires multi-hypergraphs (we will represent these with doubly-bipartite graphs) Stochastic processes with or without concurrency

• Stochastic process on a simple graph: A B

• nodes are “all alike”; particles can occupy anywhere

• particles hop over links — simple diffusion on the graph

ACE PYR • CRN: reactions are conditional b on all inputs’ being available H2

and all outputs’ being formed CO2 H2O

• Reactions are independent like those on the simple graph But moves in the state space are not independent Pause to emphasize what is the MAIN POINT

• It is the species that carry state information from one moment to the next in a Markov CRN

• For simple graphs, events that change state are independent species-by-species

• For non-simple hypergraphs, independence is lifted to the reaction level, but may not hold species-by-species

• This difference is the source of complexity Current Biology Vol 25 No 1 6

Figure 4. Sound Transition Networks Showing Phoneme diffusion Regular and Sporadic Changes Transitions among consonants (circles) and during language among vowelsWhat (squares) are frequent might and regular one (many connections) but are rare between them, evolution save for those mediated by the semivowel w. Transitions are more frequent among sounds with similarmodel places of articulation: consonants with CRNs? are coded as bilabials-labiodentals (red), nasal (light green), uvular-velar-glottal (purple), postal- veolar-palatals (blue), and dental-alveolars (green); vowels divide into high (gray) and higher-mid to low (white) subsets. Blue lines denote sporadic transitions, with thicker lines de- 3 noting faster underlying rates. Red lines denote2 regular changes; arrows indicate the direction of change. 4

18 Figure 5A),19 then the number of such events per branch of the tree is ex- pected to follow a Poisson distribution with mean rate given by 0.0026 3 t, 15 where t is the length of the branch in years. 10Following expectations, the cumula- tive density of the observed number of2 12events per branch (including branches 8 13 with no regular sound changes) shows a close fit to the Poisson expectation 2 28(Figure 5B). The 21 branches in which 5 no regular sound change occurred, 9 along with those in which multiple 1 events are inferred, can all be con- 6 sidered as samples from the same un- 1 derlying stochastic process. A further 2 characteristic of the Poisson process 1 is that waiting times between succes- 14 sive events follow an exponential dis- tribution. The distribution of waiting times between successive events of regular sound change on the phylogeny shows a striking fit to this expectation 17 (Figure 5C). 1 The observed range of 14 in the number of regular sound changes per 7 Stoichiometric Hruschka et al.language Detecting is, however, regular wide, being model with unlimited sound changesexpected in linguistics to occur in as approximately 0.68% of outcomes (Figure 5D). The out- solutions to a fixed events of concertedgroup, Chuvash, evolution with 15 regular sound changes, might be unusual in having Curr. Biol. 25:1--9four phonemes(2015) that are unique among conversion problem this group of languages. These four pho- nemes account for five of the regular sound changes in the branch leading The number of regular sound changes in a language’s to Chuvash. Removing these five, Chuvash with ten events history ranges from a low of 1 in Karaim and Balkar to a yields a range (10–1) that now falls well within the Poisson high of 15 in Chuvash (Figure 2B; the low count for Karaim expectation. might reflect phonetic transcription practices). The tempta- tion is to interpret these as indicating different intrinsic rates, Discussion or perhaps different external pressures, for sound change, Our analysis has shown how a model of concerted evolution but large differences in the numbers of regular changes can can discover the timings and phylogenetic placements of arise among languages simply as a result of random fluctua- multiple events of regular sound change, and without prior tions and shared phylogenetic histories. Thus, if events of knowledge of the forms those regular changes might take. regular change occur randomly at a constant rate (as in The events we find conform closely to linguistic expectations, Other applications and other names for CRNs and closely-related systems

• Population processes such as birth-death processes, including evolutionary games and lots of genetics cases

www.researchgate.net/publication/38083310

• In computer science the same concept of concurrency is captured by Petri Nets for process calculus Levels at which CRN models can be defined and studied dn • Chemical rate equation = Y A Y dt d ⇢ = T ⇢ • Stochastic process for counts dt n nn0 n0 Xn0 • “Petri-net proof” counting individuals

• Category-theoretic CRN-composition (J. Baez & students)

[A] a b c [B] [X] [Y]

[A] ab c [B] [Y] { or } [A] a bc [B] [X]

[A] abc [B] Graph-grammar methods to generate CRNs, and associated analysis tools for them

• Why do this?

• Actual networks can be too large to write down (Murchison, Titan atmosphere), or too large to even know (OOL as an example)

Huygens image: credit NASA

Fig. 2. A possible mode of formation of meteoritic compounds from known interstellar molecules and subsequent aqueous chemistry. (A) Addition of ketene (neutral or ionic) to interstellar nitriles and hydrogenationCooper producing et al. keto PNAS compounds. 108:14051 If addition is to (2011) terminal carbons, straight-chained keto acids are eventually produced. If addition is to internal carbons, branched keto acids (e.g., 3-methyl-4-oxopentanoic acid, Fig. 1) are the result. At any stage com- pounds may be saturated or unsaturated. (B) Observed laboratory pyruvate reaction products: a possible synthetic route to some known meteoritic compounds • Graph-grammars allow youand to prebiotic express citric acid cycle precursors. an All productsindefinitely-large were identified by GC-MS as their tBDMS network and/or TMS derivatives. Depending on the particular experi- ment, all listed reaction products were observed labeled with 13C from either reactant pyruvate (x 0;1; 2; or 3) or K13CN. Where pyruvate is the sole reactant, ¼ reaction products are fully labeled with 13Cwhenfullylabeled(U-13C) pyruvate is used, e.g., oxaloacetic acid, acetoacetic acid, 2-methyl succinic acid, etc. 13 13 13 13 recursively from a finite set2-Methyl of succinicgenerators acid contains one C (i.e., its mass increases by one amu) when C1-pyruvate is the reactant; it contains two Cwhen C3-pyruvate is the 13 13 reactant; and four Cwhenstartingwith C2;3-pyruvate. This result is consistent with a substitution of the oxygen on carbon-2 of one molecule of pyruvate by a second molecule of pyruvate (at carbon-3) followed by decarboxylation and hydrolysis of the second pyruvate. 2-Ketoglutaric acid (13C-labeled) is observed in pyruvate- K13CN reaction mixtures. To date, 2-ketoglutaric acid has not been seen in pyruvate-only solutions but is seen in pyruvate-oxaloacetate solutions implying that it can also be formed by pyruvate alone (i.e., after pyruvate produces oxaloacetate). 3-Ketoglutaric acid is seen in both pyruvate-only solution and pyruvate KCN solution. Due to the GC coelution of cis and trans aconitate and the similarity of their mass spectra, the identity of the observed isomer(s) is þ not yet determined. Also, the presence or absence of malic acid (another citric acid cycle compound) is not yet confirmed. [In addition to the keto acid for- mation scheme in (A), meteoritic pyruvate formation might have also taken place by the oxidation of -water reaction products as shown].

reaction of CN radical with ketene (17). Similar reactions with also found in meteorites (19, 20). The list includes some of the higher nitriles and ketene (or other acetylating agents) may give compounds we report here such as tricarballyic acid and citric the higher homologs (Fig. 2A). Addition of ketene in some form acid (20). Tricarballyic acid (and other tricarboxylic acids) is also (ionic, radical, or neutral) to internal unsaturated carbons of the produced by aqueous reactions of pyruvate (outlined below). nitriles could give branched keto acids. After formation, the keto A portion of the meteoritic tricarboxylic acids may have formed acids or their nitrile and/or amide forms (Fig. 2A) should then from successive additions of HCN to unsaturated mononitriles. An yield a variety of other meteoritic compounds including “non- analogy is the postulated formation of meteoritic succinic acid α” acids (SI Text). Ketene addition to the nucleophilic nitrogen of with HCN and acrylonitrile (21). For example, sequential beta amino acids may be responsible for the presence of acetylated additions of HCN to interstellar (22) cyanoallene, CH2CCHCN, amino acids in meteorites (8). In terms of nitrile reactions with followed by aqueous hydrolysis could have produced tricarballylic molecules other than ketene, theoretical calculations show that acid (Fig. 1). it is energetically favorable for molecules such as NH3 and H2O (among others) to also add to the terminal carbon of CCN (18): The Formation of Prebiotic Compounds by Pyruvate Chemistry: after further processing this could result in an amino and hydroxy Laboratory Reactions. To determine if subsequent reactions of keto GEOPHYSICS acid, respectively. This scenario may also apply to higher nitrile acids could be partly responsible for the presence of hydroxy tri- homologs, again resulting in non-α acids. carboxylic acids (e.g., citric acid) and other known meteoritic A ubiquitous and important source of energy, interstellar compounds we began a series of reactions with keto acids under radiation, should be considered in tricarboxylic acid formation. conditions related to those of carbonaceous meteorite parent The exposure of interstellar grains and comets to various forms bodies (SI Text). In the experiments a reactant, most often sodium of radiation inspired research into how such phenomena would pyruvate, was placed in a solution of carbonate buffer or sodium affect the production of organic compounds (19). A series of bicarbonate with or without KCN and/or peroxide and allowed to experiments employing the irradiation of various starting com- react at 2 °C or 22 °C. In some cases peroxide was added at later pounds, including HCN, were shown to produce some compounds stages of the reactions. Many of the same products are seen at

Cooper et al. PNAS ∣ August 23, 2011 ∣ vol. 108 ∣ no. 34 ∣ 14017 Example of a graph, graph fragment, common pattern, and rewriteVersion rule February 21, 2013 submitted to Entropy 5 of 17 Figure 1. Illustration of a chemical reaction using the Double Pushout approach. The chemical transformation of complete molecules (i.e. the application of the graph grammar rule as defined in the first row) is represented as the graph derivation G = H in the second • Molecules themselves are written as graphs ) row. This shows an intermediate step in the synthesis of adenine as an HCN pentamer. l r Above, a possible rule p =(L K R) underlying the concrete reaction is shown (i.e. ! the graph grammar rule defining the chemical reaction). Bonds in L that are removed and • Maps among sets of moleculesbonds in R that become are inserted are all graph coloured red. morphisms The green vertices form in the context K of category theory the rule. L K R H H H N N N • Reaction mechanisms l r N N N are graph-fragment C C C

morphisms H H H N N N

N N N C C C

H C H C H C • Reaction application is H H H C C C N N N

a graph-embedding C C C H H H map N N N G D H

• Key point: finite generator123 approximation, set more can refined energyproduce computations unlimited are too expensive in termsCRNs of computational resources 124 to be used in large-scale network exploration.

125 The chemical space of HCN polymerization is modelled by a set of 9 reaction rules that have been

126 extracted from literature [7,12]. An additional 14 rules are added to implement hydrolysis, mostly

127 additions to the C N and C N bonds. A full description on the rules is given in the Web Supplement

128 [39].

129 3.3. Finding Chemical Motifs

130 A chemical motif is a collection of reactions that e.g. connects given input molecules in order to

131 create a specific product, constitute a catalytic cycle, or are collectively autocatalytic. Given a reaction

132 network generated in the space exploration step, chemical motifs are identified using an an integer linear

133 programming (ILP) approach. The framework of ILP allows us to model both general pathways such as

134 “the conversion of HCN to adenine” as well as pathways with specific structural properties such as “the

135 presence of autocatalytic cycles” without specifying particular molecules.

136 The ILP formulations generically model each reaction by a non-negative integer variable that encodes

137 the multiplicity the reaction. Constraints are introduced to ensure balance of mass. Specific motifs are Andersen et al. Journal of Systems Chemistry 2013, 4:4 Page 2 of 14 http://www.jsystchem.com/content/4/1/4

[8] or a collection of reactions that maximize the pro- The basic computational task we envision starts from duction of a desired product in metabolic engineering. an unordered set R of reactions such as those forming a The net reaction of a given pathway is simply the linear particular metabolic reaction pathway. To derive the effec- combination of the participating hyperedges. tive transformation rule describing the pathway we need In the setting of generative models of chemistry, each to find the correct ordering π in which the transforma- concrete reaction is not only associated with its stoi- tion rules pi,underlyingtheindividualchemicalreactions chiometry but also with the transformation rule operating ρi,havetobecomposed.Weillustratethisapproachin on the molecules that are involved in a particular reac- some detail using the Formose reaction as an example in tion. Importantly, these rules are formulated in terms of the Results section. reaction mechanisms that readily generalize to large sets of structurally related molecules. It is thus of interest to Graph grammars and rule composition derive not only the stoichiometric net reaction of a path- Graph grammars, or graph rewriting systems, are proper way but also the corresponding “effective transformation generalizations of term rewriting systems. A wide vari- rule”. Instead of attempting to address this issue apos- ety of formal frameworks have been explored, including teriori,wefocushereonthepossibilityofcomposing several different algebraic ones rooted in category the- the elementary rules of chemical transformations to new ory. We base our conceptual developments on the double effective rules that encapsulate entire pathways. pushout (DPO) formulation of graph transformations. For The motivation comes from the observation that string the comprehensive treatise of this framework we refer to grammars are meaningfully characterized and understood [10]. In the following sections we first outline the basic by investigating the transformation rules. Consider, as setup and then introduce full and partial rule compo- atrivialexample,thecontext-freegrammarG with the sition. Alternative approaches to graph rewriting in the starting symbol S and the rules S aS′a, S′ aS′a B context of (artificial) chemistry have been based on the ϵ ϵ → → | and B → |bB, where denotes the empty string. Inspect- single pushout (SPO) model of graph transformations, ing this grammar we see that we can summarize the effect see e.g. [11,12]. We briefly discuss the rather techni- k n n of the productions as B → b , k ≥ 0, and S → a Ba , cal difference between the DPO and SPO framework in G n k n n ≥ 1. The language generated from is thus {a b a |n ≥ Appendix A, where we also briefly outline our reasons for 1, k ≥ 0}.Hereweexplorewhetherasimilarreason- choosing DPO. ing, namely the systematic combination of transformation rules, can help to characterize the language of molecules Double pushout and concurrency that is generated by a particular graph rewriting chem- The DPO formulation of graph transformations considers istry. Similar to the example from term rewriting above, The double-pushout( formalisml r ) for graph re-write transformation rules of form p = L ← K → R where L, we should at the very least be able to recognize the reg- R,andrulesK are called the left graph, right graph, and context ularities in polymerization reactions. We shall see below, graph, respectively. The maps l and r are graph mor- however, that the rule based approach holds much higher p,m phisms. The rule p transforms G to H,insymbolsG H promises. • ==⇒ if thereThe is a pushoutimportant graph thingD and to a say “matching here properly morphism” is that there is a Chemical reactions can be readily composed to “over- m : L commutativityG such that following requirement diagram which is valid: defines these morphisms all reactions” such as the net transformation of metabolic → L K R l — left pushout H H H pathways. This observation is used implicitly in flux bal- N N N (adds bonds to Andersenform L) et al. Journal of Systemsl Chemistry 2013,r4:4 Page 3 of 14 N N N ance analysis at the level of the stoichiometric matrix. http://www.jsystchem.com/content/4/1/4C C C r — right pushout Recently, [9] considered the composition of concrete H H H (adds bonds to form R) N N N N N N chemical reactions, i.e., transformations of complete C C C H C H C H C m — matching morphism H H H molecules, as a means of reconstructing metabolic path- C C C (1)N N N a dependency graph E exists so that in the following L2 e2(L2) is a subgraph of R1.Omittingtheexplicit C C C =∼ ways. In this contribution we take a different point of view: D — gluing condition H H H diagram: N N N references to the subgraph matching morphism e2 we instead of asking for concrete overall reactions, we are G D H The existence of D is equivalent to the so-called gluing can simply view L2 as subgraph of R1 as illustrated in concerned with the composition of the underlying reac- condition,whichdetermineswhethertherulep is appli- Figure 1b. tion mechanisms themselves. As we will see, these can • Morphisms admit partial or The rule composition thus amounts to a rewriting cable to a match in G.Inthefollowingwewillalsowrite p2,e2 be applied to arbitrary molecule contexts. We therefore p full rule composition — think R1==⇒R,whiletheleftsideL1 is preserved. We will use G H and G H for derivations, if the specific match p2◦p1 address two issues: First we establish the formal condi- ⇒ ⇒ the notation p2 p1 and G H for this restricted type of or transformationof β-reduction rule is in unimportant λ-calculus, or clear from the ◦ ===⇒ tions under which chemical transformation rules can be rule composition, and call it full composition as the com- context.or Curried functions in Haskell meaningfully composed. To this end, we discuss rule com- plete left side of p2 is a subgraph of R1. Note that L2 may Concurrency theory provides a canonical framework fit into R1 in more than one way so that there may be more position within the framework of concurrency theory in for the composition of two graph transformations.Dependency Given graph for rule composition than one composite rule. Formally, the alternative compo- the following section. We then investigate the specific (2) ( li ri ) sitions are distinguished by different matching morphisms restrictions that apply to chemical systems, leading to a two rules pi Li Ki Ri , i 1, 2, a composi- e2 in the diagram in Figure 1a; we will return to this point q q = ← → = the cycles (1) and (2) are pushouts, and (3) is a pullback, ( l r ) below. constructive approach for inferring composite rules. tion L K R p1 E p2 can be definedsee e.g., whenever [13]. We then have ql = s1 ◦ w1 and ← → = ∗ An example of a full rule composition is shown in qr = t2 ◦ w2.Theconcurrencytheorem[14]ensures that for any sequence of consecutive direct transforma- Figure 1c. The two rules in the example, which in this p1,m1 p2,m2 case are also chemical reactions, are part of the Formose tions G H G′ agraphE,acorrespondingE- ===⇒ ===⇒ grammar. The Formose grammar consists of two pairs of concurrent rule p1 ∗E p2,andamorphismm can be found p1 Ep2,m rules. The first pair of rules, (from now on denoted as p0 such that G ∗ G′. =====⇒ and p1), implements both directions of the keto- tau- In order to use graph transformation as a model tomerism. One direction, p1,isvisualizedinFigure1c. for chemical reactions additional conditions must be The second pair (from now on denoted as p2, p3)isthe enforced. Most importantly, atoms are neither created, aldol-addition and its reverse respectively. The reverse nor destroyed, nor transformed to other types. Thus only (p3)isalsovisualizedinFigure1c.Weseethattheleftside graph morphisms whose restriction to the vertex sets of p1 is isomorphic to a subgraph of one of the compo- are bijective are valid in our context. In particular, the nents of the right side of p3. Composing the two rules by matching morphism m always corresponds to a subgraph subgraph matching yields a third rule, p1 p3. isomorphism in our context. The context graph K thus is ◦ (isomorphic to) a subgraph of both L and R,describing the part of L that remains unchanged in R.Conserva- Partial rule composition tion of atoms means that the vertex sets of L, K,and An important issue for the application to chemical reac- R are linked by bijections known as the atom-mapping. tions is that the graphs involved in the rules are in gen- When the atom mapping is clear, thus, we do not need to eral not connected. Typical chemical reactions combine represent the context explicitly. molecules, split molecules or transfer groups of atoms It is important to note that the existence of the match- from one molecule to another. The transformation rules for all these reactions therefore require multiple con- ing morphism m : L → G alone is not sufficient to guarantee the applicability of the transformation. In our nected components. For the purpose of dealing with these context, we require in addition that the transformation rules, we introduce the following notation for graphs and rule does not attempt to introduce an edge in R that has derivations. been present already before the transformation is applied. Let Q be a graph with #Q connected components Qi, i = ( ( ) ( )) / 1, ...,#Q.ItwillbeconvenienttotreatQ as the multiset Formally, the gluing condition requires that l x , l y ∈ L ( ( ) ( )) ( ( ( )) ( ( ))) / of its components. A typical chemical graph derivation, and r x , r y ∈ R implies m l x , m r y ∈ G. corresponding to a bi-molecular reaction can be written Full rule composition 1 2 p,m 1 2 3 in the form {G , G }==⇒{H , H , H },wherewetakethe In the following we will be concerned only with special, notation to imply that all graphs Gi and Hj are connected. chemically motivated, types of rule compositions. In the The conditions for the ◦ composition of rules are a bit simplest case the dependency graph E is isomorphic to too strict for our applications. We thus relax them respect R1, later we will also consider a more general setting in the component structure of left and right graphs. More which E is isomorphic to the disjoint union of R1 and precisely, we consider a partition of the components of L2 some connected components of L2.Fortheeaseofnota- into two parts L2 and L2′ (cmp. Figure 2a), and we require tion from now on we only refer to a rule composition, that E is isomorphic to a disjoint union of a copy of R1 and not to a composition of morphisms as in the Graph and L2′ ,whileL2 must be isomorphic to a subgraph of R1. i Grammar section, i.e., p1 ∗E p2 will be denoted as p2 ◦ p1 As a consequence, every connected component L2 of L2 ( i ) ( ) ( i ) (note the order of the arguments changes). If E =∼ R1,then satisfies either e2 L2 ⊆ e1 R1 or e2 L2 is a connected Some examples of the kinds of applied problems

graph-grammars have been12. The used Formose Reaction to solve

C4k C4e

OH OH OH • Steven Benner’s pruning p1 p2 HO OH O OH HO OH p2 p1 OH OH OH p1 +C1 +C1 of the formose network p3 p p 3 2 +C1 C1 O p ≠p C1 C1 3 4 ≠p4 ≠p4 HO OH OH OH OH OH with Borate to yield OH OH OH OH C5b O OH HO OH OH O 12. The Formose Reaction OH O

C5k3 p1 p2 p p HO R 1 2 OH B OH OH OH OH C1 +C1 ≠ OH p4 p3

OH OH HO OH (a) Borate OH p2 p1 C2a ≠p p2 K 4 p1 R OH +C2a L p H OH O 3 OH OH OH OH OH H2O p OH 2

OH O OH HO OH O C OH p1 OH C O C +C2e OH OH p 3 C2a B C ≠p4 B C R B C C2e p2 p1 ≠p4 R O R OH +C2a OH OH O p3 OH H O p O OH 2 OH 2 OH OH C2e H p1 ≠p4 O p1 OH O HO OH p2 +C2e (b) Borate + 1,2-diol reaction pattern, “addBorate” OH OH , C3k p3 OH L K R C3e C3a +C1 C1 C5a O C O C O C p3 ≠p4 C1 +C1 ≠p4 p3 p3

B H B H, D B D OH h i O OH p HO OH 1 HO O

(c) Relabelling of hydrogen to make it non-reactive, “HtoD” p2 OH C2e , C2a

Figure 12.6: Molecules and reaction patterns for borate inhibition of the formose Figure 12.7: The reaction network of the formose chemistry as calculated with reaction. The borate molecule (a) is modelled with only two hydroxyl groups to the strategy QBFS. The blue subnetwork correspond to the borate inhibited net- simplify the model. (b) the reaction pattern for forming borate complexes with work calculated with Qborate. The green and blue networks together with the red + 1,2-diols. This rule additionally has a matching constraint: none of the carbon reaction (C5b to C3e) correspond the network calculated with Qborate,i.e.,with atoms may not have incident double bonds. To approximate the subsequent non- dihydroxyacetone as an input compound. Each reaction is annotated with the reactivity of the hydrogens on the carbon atoms we relabel them to D using the reaction pattern, pi, used to realise the concrete reaction. For the aldol reactions, reaction “HtoD” (c). p3 and p4, the secondary educt (+) or product ( ) is additionally shown. The + ≠ addition of borate in Qborate is done with the strategy revive[addBorate], mean- ing that at most 1 borate is added in each iteration. The red reaction is no longer As a reference, we generate the non-inhibited reaction network with the strategy available if the addition is done with the strategy repeat[revive[addBorate]], QBFS: meaning “add as many as possible”. Q = addUniverse[ formaldehyde ] BFS { } addSubset[ glycolaldehyde ] 136 æ { } repeat[ æ rightPredicate[P , parallel[ p ,p ,p ,p ]] #C { 1 2 3 4} ] Not all molecules can actually bind with borate and must therefore be pre- served while the other molecules form complexes. This is modeled with a revive strategy around the actual complex forming reaction pattern, “addB- orate”. After the potential forming of a borate complex, the relevant hydrogen atoms must be made inactive using the rule “HtoD”. The number of relevant hydrogens may not be the same for all molecule and therefore the relabelling

134 Version February 21, 2013 submitted to Entropy 5 of 17

Figure 1. Illustration of a chemical reaction using the Double Pushout approach. The chemical transformation of complete molecules (i.e. the application of the graph grammar Some examples of the kinds ruleof as defined applied in the first row) is represented asproblems the graph derivation G = H in the second ) row. This shows an intermediate step in the synthesis of adenine as an HCN pentamer. l r Above, a possible rule p =(L K R) underlying the concrete reaction is shown (i.e. ! graph-grammars have been theused graph grammar rule to defining thesolve chemical reaction). Bonds in L that are removed and bonds in R that are inserted are all coloured red. The green vertices form the context K of the rule.

L K R H H H N N N • A network for HCN l r N N N polymerization calibrated C C C H H H N N N

Version February 21, 2013 submitted to Entropy 9 ofVersion17 February 21, 2013N submitted to EntropyN N 11 of 17 against Mass C C C H C H C H C H H H Figure 4. Hypergraph representation of the mechanistic route to adenine as proposed by C C C molecule concentrations Figure 6. A chemicalN pathway to produceN adenine by maximizingN the sum of MS data Oro[´ 7]. Under NH3 catalysing conditions 5 HCN molecules polymerize into 1 molecule of C C 7C intensities of theH individual compounds; theH sum of intensities is 1.05247H 10 for the given adenine (Note that the consumption of 4 NH3 molecules is counter balanced by 4 reactions · solution. N N N producing NH3). The pathway uses nine different chemical reactions. G D H

123 approximation, more refined energy computations are too expensiveN in terms of computational resources NH NH2 N N 2 N 124 N N to be used in large-scale network exploration. N NH 5 N 1 1 HN 125 The chemical space of HCN polymerization is modelled by a setNH of 9 reaction rules that have been NH N CH 2 N NH2 1 1 1 N NH2 N 126 extracted from literature [7,12]. An additional 14 rules are addedN to implement hydrolysis, mostly NH2 127 additions to the C N and C N bonds. A full description on the rules is given in the Web Supplement 1 1 HN 128 [39].

NH3 NH2 NH 1 1 1 N CH NH3 129 3.3. Finding Chemical Motifs 5 HN NH2 NH2 NH2 NH2 1 130 A chemical motifOH is a collection of reactions that e.g. connects given input molecules in order to N N 131 create a specific product, constitute a catalytic cycle, or are collectivelyNH autocatalytic. Given a reaction NH2 NH N N HN N 132 network generated in theNH space2 exploration step, chemical motifs are identified using an an integer linear HN NH2 OH N NH2 2 1 NH H NH2 133 programming2 (ILP) approach. The frameworkHN N of ILP allows us to model both general pathways such as NH 2 NH HN N 134 “the conversion of HCN to adenine” as well as pathways with specific structuralN properties such as “the

135 presence of autocatalytic cycles” without specifying particular molecules. N 1 1 1 N 136 The ILP formulations generically model each reaction by a non-negative integer variable that encodes 1 137 the multiplicity the reaction. Constraints are introduced to ensure balance of mass.1 Specific motifs are H N NH H N N 2 2 N N

NH N N H N H2N H2N N N N N N 1 N 1 N 1 N N NH2 OH2 1 NH NH N 2 H NH N N

209 support from the MS data by using the sum of intensities as objective function, is depicted in Fig. 6. The 7 210 overall sum of intensities is 1.05247 10 . · Figure 7. Putative autocatalytic loop in formamide, identified in the chemical space of

211 4.4. Autocatalytic Loops HCN-polymer hydrolysis. The autocatalytic loop is feed by cyanide molecules stemming from triazine decomposition.

212 Triazine is a small molecule HCN trimer that is quickly and completely removed from the solution in

213 a simple hydrolysis experiment. After 24 hours the absorbance from the ring was no longer detectable

214 by UV-VIS spectroscopy. This suggests that an autocatalytic network emerges that is feed by triazine or 1 215 some of its decomposition products under the hydrolysis conditions. Therefore we queried the chemical 1

216 space of HCN-hydrolysis for small autocatalytic sub networks with our computational methods (for a

217 formal definition of autocatalysis, see [41]). We identified small sub-networks that are autocatalytic in 3 1 218 formamide (shown in Fig. 7). Interestingly, formamide is the most stable compound with molecular

219 formula CHON. However, its role in prebiotic processes and the origin of life is under heavy debated 2 220 (see review [42] and accompanying comments). 2

2

3 3.4 The Autocatalytic Production of Oxaloglycolate 3 RESULTS

N

DAMN N NH2

NH2

N

NH2 N O

N H2N NH2 NH Another example application NH2

N NH2 OH NH2 H2N NH2 O HN O OH NH2 H2N NH2

NH2 NH N H2N O 2 O N NH N O NH2 N NH2 • Albert Eschenmoser’s glyoxylate scenario relating HCN

NH2 NH2 NH2 NH2 NH2 NH2 O NH2 OH O N HN OH HN O H N O O 2

N NH2 O N O N NH NH HO NH2 2 2 NH2 HO NH2 NH H2N O O OH 3.2 Pathways from the HCN-tetramerpolymerization to Oxaloglycolate and hydrolysis 3 RESULTS Nto TCA-cycleN metabolites

NH2 NH2 NH2 NH2 NH2 O NH2 NH2 NH2 NH2 OH NH2 O N N N HN O H2N OH O OH O O O NH Pyruvate Oxaloacetate Glycolaldehyde NH2 O 2 N N N NH O HO NH2 2 OH O NH2 O HO NH2 O NH2 HO NH2 OH NH O O OH O OH CH3 OH Original sketch OH HO O O

NH O NH2 NH2 NH2 O NH2 NH2 OH O NH2 NH2 NH2 OH H2N OH O OH N N O OH HN OH O OH O H2N O by Eschenmoser N NH2 O O O HO N N O HO NH2 OH NH2 OH O O O NH2 O O NH2 NH2 O OH N HC N Oxaloglycolate O HO OH

OH OH HO OH O NH NH NH NH OH NH O O NH Glyoxylate 2 2 2 2 2 2 O NH2 HN O N O NH 2 OH O O O OH N OH O OH NH2 O O NH2 HO HO N NH2 N N NH OH O OH NH2 OH OH O O NH2 HO NH2 O NH N N O Glyoxylate O

OH

NH2 OH OH O NH2 OH O OH OH OH OH O NH2 O O NH2 O N HN OH O HO O O H2N O O NH2 HO O N NH2 O NH2 NH2 HO N N O O HO NH2 HO NH2 NH NH O NH OH OH OH NH 2 2 O OH N N O N N O OH

OH O OH O OH OH OH H2N OH OH OH O O NH2 N N HN O O H2N OH O OH NH2 NH2 O NH2 HO N N NH2 O O NH2 HO NH2 O OH HO NH2 O OH NH OH O N N O OH NH2 O H N NH N 2 2 DAMN

NH2 H2N NH2 O OH OH OH OH OH OH OH O OH H2N OH O N

H2N OH O H2N O O NH2 HO O O NH2 NH2 O NH2 N HO NH2 O O O O O NH2 O OH OH NH Figure 2: Overview of Eschenmoser’s hypothetical chemical space.

O OH OH O OH H2N OH

O O O

HO NH2 O

O NH2 OH O OH OH one could immediately infer atom traces for specific isotope labelling experiments to distinguish alternative pathways [5]. O Full cascade of HO OH In Fig. 2, any edge drawn as a solid line indicates that a single production rule, i.e., a single reaction Oxaloglycolate O has been used. DashedThis edges orwork hyperedges suggests denote larger chemical a subspaces, role i.e., for sub-hypergraphs. “path The entropies” O OH optimal solutions two dashed orange (hyper-)edges refer to the hypothetical autocatalytic subnetwork of oxaloglycolate, using glyoxylate as food molecule. Such a hypothetical autocatalytic cycle has been presented and discussedFigure 3: in A superposition of optimal pathways from the HCN-tetramer to oxaloglycolate, illustrating the combinatorial complexity for even small subspaces. detail by Eschenmoser [17]. Below, we will discuss alternative routes found by computational inference. 8 The dashed green hyperedge depicts the hypothetical autocatalytic subnetwork of Glyoxylate, using HCN as food molecule. Such a subsystem has not been proposed in earlier work. In section 3.3 we will discuss several alternative co-optimal solutions for this chemical transformation motif. Oxaloglycolate and its oligomeres are precursors of carbohydrates and ↵-keto acids, which themselves are potential starting compounds for a primordial metabolism. In Fig. 2 oxaloglycolate is depicted as a precursor to glycolaldehyde, the dashed line corresponds to four subsequent production rules that correspond to a decarboxylation. Furthermore, via two production rule steps oxaloglycolate is reduced to oxaloacetate, which in two subsequent steps leads to pyruvate.

3.2 Pathways from the HCN-tetramer to Oxaloglycolate At first glance the conversion of the HCN-tetramer into oxaloglycolate seems to be a straightforward hy- drolysation process. Eschenmoser’s paper [17] and Fig. 2 illustrates this process in strongly abstracted form with just 3 steps. A mechanistic model, however, shows that 11 steps are required. Fig. 3 summarises a superposition of all pathways of minimum length starting at DAMN and ending in oxaloglycolate. The figure emphasises the combinatorial nature of the HCN hydrolysation chemistry. Although all connections in the graph in Fig. 2 seem to be 1-to-1, this is in fact not the case. The simple molecules H2O, NH3, and HCN were suppressed in the drawing to reduce cluttering. The intermediate compounds selected as representatives in the overview figure (see Fig. 2) are highlighted in Fig. 3. A closer inspection of the structure of the network in Fig. 3 shows that it closely resembles a Cartesian graph product[25]. This feature is based on the fact that the temporal order of several of the intermediate steps can be permuted. A consequence of the product-like structure is a high level of confluence, i.e., the fact that a large number of partially overlapping alternative

6 The MedØlDatschgerl (MØD) package including a

6/18/2017 MedØlDatschgerl — Cheminformatics 1 documentation Live Playground IMADA University of Southern Denmark MedØlDatschgerl

Overview http://cheminf.imada.sdu.dk/mod/ MedØlDatschgerl (MØD) is a software package developed for graph-based cheminformatics. It includes a general graph transformation system for automatically generating reaction networks from graph grammar formulations of chemistries. The software is primarily implemented in C++, but the package includes comprehensive Python 3 binding that provides easy access most functionality. The package also includes a large visualisation module that makes it possible to automatically visualise molecules, reactions, and reaction networks. Examples of how to use the Python interface and the visualisation capabilities can be seen in the examples section, and they can all be ac‐ • Software package for graph- cessed interactively in the Live Playground. based cheminformatics Source Code and Documentation Each release is available at GitHub. Please also use GitHub for reporting bugs, suggesting features, and con‐ tributing code. The documentation can be found at the GitHub Page.

• Even more important: a well- References If you use MØD in your research, you may want to cite some of the following papers. You may also be interested in the Graph Grammar Library, which has been used in early versions of MØD. defined representation system (As BibTeX) A Software Package for Chemically Inspired Graph Transformation Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Graph Transformation - 9th In‐ for what constitutes a ternational Conference, ICGT 2016, 73-88, 2016 [ DOI | TR ] In silico Support for Eschenmoser’s Glyoxylate Scenario Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Israel Journal of Chemistry, molecule / reaction / pathway 55(8):919-933, 2015. [ DOI | TR ] 50 Shades of Rule Composition — From Chemical Reactions to Higher Levels of Abstraction Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Formal Methods in Macro-Biolo‐ gy, 8738:117-135, 2014. [ DOI ] Conference version: Towards an Optimal DNA-Templated Molecular Assembler Jakob L. Andersen, Christoph Flamm, Martin M. Hanczyc, and Daniel Merkle. ALIFE 14: The Fourteenth Conference on the Synthesis and Simulation of Living Systems, 14:557-564, 2014. [ DOI | http ] Journal version: Towards Optimal DNA-Templated Computing • Tools for hard network search / Jakob L. Andersen, Christoph Flamm, Martin M. Hanczyc, and Daniel Merkle. International Journal of Un‐ conventional Computing, 11(3-4):185-203, 2015. [ http ] Generic Strategies for Chemical Space Exploration Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. International Journal of Compu‐ optimization problems tational Biology and Drug Design, 7(2/3):225-258, 2014. [ DOI | TR ] Navigating the Chemical Space of HCN Polymerization and Hydrolysis: Guiding Graph Grammars by Mass Spectrometry Data. Jakob L. Andersen, Tommy Andersen, Christoph Flamm, Martin M. Hanczyc, Daniel Merkle, and Peter F. Stadler. Entropy, 15(10):4066-4083, 2013. [ DOI | http ] Inferring chemical reaction patterns using rule composition in graph grammars. Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Journal of Systems Chemistry, 4(1):4, 2013. [ DOI | http ] Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete. Jakob L. Andersen, Christoph Flamm, Daniel Merkle, and Peter F. Stadler. Journal of Systems Chemistry, 3(1):1, 2012. [ DOI | http ]

http://cheminf.imada.sdu.dk/mod/ 1/1 Pentose phosphate network; flow 1 1 4 The concept of Balanced 2 3 19 18 Integer Hyperflows 0.5

15 10 13 12 2 8 0 • What combined flux rates 28 1 5 9 1 2 6 balance concentrations at 1 1 14 all species? Integer Linear -0.5 1 Programming problem 17 7

-1 Pentose phosphate network; flow 13 -1 -0.5Pentose phosphate 0 network; flow 10 0.5 1 1 1 4 4 2 3 2 3

19 19 1 18 18 0.5 0.5

1 15 15 10 10 8 13 13 12 1 12 2 8 0 28 0 1 1 28 2 1 5 5 9 9 1 6 2 6 1 1 14 14 -0.5 -0.5 1

1 1 17 17

7 7

-1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 15.2. Realisable Pathways and Atom Tracing

C5P C5P C5P C5P

TKL TKL

C3P C7P C7P C3P

One finer level than hyperflows:TAL Interpretations TAL (also known as “Petri net proofs”)C6P C4P C5P C5P C4P C6P TKL TKL

C6P C3P C3P C6P

AL Net15.2. conversion Realisable Pathways in and the Atom Tracinginteger hyperflow: 6C5 -> 5C6 C6P

(a)

C5P C5P C5P C5P C5P C6P C5P

TKL borrowed molecule TKL TKL TKL C7P C4P C5P C3P C3P C7P C7P C3P TKL TAL TAL C3P C6P C5P

C6P C4P C5P C5P C4P C6P TAL TAL

TKL TKL C6P C4P C7P C4P C5P TKL C6P C3P C3P C6P C5P C7P C6P AL AL TKL TAL

C6P C6P C3P C4P C6P C6P

(a)

C5P C6P C5P (b) borrowed molecule TKL TKL Figure 15.3: Two pathways implementing the non-oxidative pentose phosphate pathway, with the overall reaction 6 C5P 5 C6P. Both pathways are shown C7P C4P C5P C3P ≠æ with duplication ofAn molecules un-realizable and reactions, such pathway that each of them represent a A TKLrealizable pathway single occurrence in the pathway. (a) The “classical” pathway [Meléndez-Hevia & Isidoro 1985unless], which is realisable.a molecule(b) An non-realisableis borrowed alternative pathway, C3P C6P C5P which is realisable if a copy of C6P is borrowed. TAL TAL

C6P C4P C7P C4P C5P 163 TKL

C5P C7P C6P AL

TKL TAL

C6P C3P C4P C6P C6P

(b)

Figure 15.3: Two pathways implementing the non-oxidative pentose phosphate pathway, with the overall reaction 6 C5P 5 C6P. Both pathways are shown ≠æ with duplication of molecules and reactions, such that each of them represent a single occurrence in the pathway. (a) The “classical” pathway [Meléndez-Hevia & Isidoro 1985], which is realisable. (b) An non-realisable alternative pathway, which is realisable if a copy of C6P is borrowed.

163 The hard math that a good computational platform helps you do

• Equivalence of molecules under different pathways: must solve graph isomorphism problem very many times

• Many search and optimization problems that are low- order polynomial on simple graphs are NP complete on hypergraphs Andersen et al. J. Sys. Chem. 3:1 (2012)

• For brute-force networks, need Integer-Linear- Programming help

• For networks beyond brute force, good heuristics become essential Part 2: The discrete math (topology, graph theory) of CRNs

• Main concepts, terminology, and graphical notations for CRNs

• The largest component of a CRN on which dynamics behaves the same way as a random walk on a simple graph CRN basics: terms and concepts (graphics)

complex graph reactions (or network) complexes α α β β stoichiometry ε ε

species Baez and Biamonte (2017) Gunawardena (2003) Feinberg (1979, 1983) Horn and Jackson (1972) connected components of the complex graph = “linkage classes” A k k2 1 _ k2 ε _ k1 k1 ε k _ 1 k k1 1 k2 ε B CRN basics: terms and concepts (quantities)

reactions • Species index p 1,...,P 2 complexes α • Complex index i, j 1,..., β 2 C rate species • (i, j) k Reactions (ordered) const. ji stoichiometry

• Concentrations (column vector) n [n ] ⌘ p • Stoichiometry (complexes to species) Y yj ⌘ p i i i yp • Activities (column vector on complexes) ⇥ ⇤ np Y Y Y ⌘ ⌘ p ⇥ ⇤ YT • Adjacency/Rate matrix (btw. complexes) A = (wj wi) kjiwi dn (Xi,j) • = Y A Y The rate law dt α β ε ε Example A

dn rate law = Y A Y dt

Y = 123 ↵✏ n 2 A = ↵ 2✏ = n stoichiometric matrix 2 3 Y ⇥ ⇤ ✏ 2 n3 3 adjacency matrix 4 5 4activities5

T A = (wj wi) kjiw i (Xi,j) 1 100 1 010 means A = 1 ↵ + 1 ✏ + 2 0 3 ⇥ ⇤ 2 0 3 ⇥ ⇤ ··· 4 5 4 5 The Complex Network: the largest sub-system that is a simple graph

• If this had been a simple graph, the rate law woud have been α β 1 2 3 n n ε ε d 1 1 n2 = A n2 ↵✏ dt 2 3 2 3 n3 n3 A = ↵ 2✏ 2 ✏ 3 4 5 4 5 4 5 • The Key Difference between simple graphs and CRNs:

• The species always carry the memory of state

• In graphs, A acts directly on species

• In CRNs, A acts on complexes, which have no state Part 3: What does topology ensure about dynamics without fine constraints on rates?

• Two classic results (one now, one in the next section)

• Feinberg’s “deficiency-0” theorem for existence, uniqueness, stability, and properties of classical steady states

• The Anderson-Craciun-Kurtz theorem for steady- state distributions on deficiency-0 networks What are the issues?

• In general, the rate equation is ugly: dn = Y A Y dt To what extent does topology allow us to draw conclusions about possible dynamics or steady states?

• Reaction stoichiometry can be “clumpy” — we might expect that, independent of the behavior of the classical rate law, that clumpiness might lead to complicated

distributions α β ε ε

A The three types of flows that determine the character of solutions

• Balanced flows within the complex network: ker A

• Flows that change concentration: Im(Y A)

• Balanced flows outside ker A : ker Y ImA \ For non-obvious reasons, introduce the concept “deficiency” for a CRN

• Stoichiometric subspace: S =Im(Y A) s dim (S) ⌘ • Deficiency: dim (ker (Y ) Im(A)) ⌘ \

• Two identities: dim (Im(A)) = dim (Im(Y A)) + dim (ker Y Im(A)) \ = s + C =dim(Im(A)) + dim (ker (A)) = s + +dim(ker(A))

• Lead to a counting rule for deficiency: = C s l (complexes) (linkage classes) Remarkable result of topology for existence, uniqueness, stability, interiority of steady states

• Theorem (Martin Feinberg)

“If a CRN with δ = 0 is weakly reversible then, for mass action kinetics, the rate equations will have precisely one steady state, within each positive stoichiometric compatibility class. This steady state is asymptotically stable.” [and has strictly positive concentrations]

Feinberg and Horn 1974; Horn 1973; Feinberg1977,1980 Intuition for the proof: if deficiency = 0, there can only be mean-regressing flows

“s-flow” “δ-flow” changes some concentration balanced (integer) hyperflow shifts activities to resist change preserves concentrations & activities

• If δ = 0, there can be no flows that don’t change concentration of species

• But if species pile up or are depleted, the greater activity drives flows in the opposite direction⇥ (interiority) Part 4: Stochastic CRNs, with a tiny mention of generating-function methods

• Anderson-Craciun-Kurtz Theorem:

“For a CRN with δ = 0, weakly reversible, with mass action kinetics — no matter how clumpy the reactions — the steady-state distributions will be products of Poissons, or slices through them if there are conserved quantities.”

D. Anderson, G. Craciun, T. Kurtz 2010 Discrete-state, continuous-time stochastic processes: Master Equation and a taste of ACK

• Probability density function: ⇢n

d • Master equation: ⇢ = T ⇢ dt n nn0 n0 Xn0 • Expected numbers: n n ⌘h i i Y (n) = Y (n) • Sampling without replacement n ! leads to mass-action rates i ⇥ ⇤p Y ⌘ i p np yp ! Y Factorial moments • d A new rate equation: n = Y A Y (n) dt h i h i Generating-function methods for discrete-state stochastic processes

n • Moment-generating function: (z)= z ⇢n n X d • If ⇢ evolves w/ a master equation: ⇢n = Tnn ⇢n dt 0 0 Xn0

• Then evolves under a PDE @ @ (z)= z, (z) called a Liouville equation: @t L @z ✓ ◆ Factorial moments and Poisson distributions: least- information null models for steady states

n ⇠ ⇠ • Poisson distribution: ⇢n = e n!

• All factorial moments are simple functions of the mean: n! n! = ⇢ = ⇠k (n k)! n (n k)! n ⌧ X • The generating functions are exponentials: n n ⇠ z ⇠ (z 1)⇠ (z) e = e ⇠ ⌘ n! n • They are eigenstates of the Xderivative operator: @ (z)=⇠ (z) @z ⇠ ⇠ Example at δ = 0, and elegant proof that a Poisson distribution is a steady-state solution

α d⇢n @/@n = e 1 ↵n • Master equation: dt β n⇣ ⌘ + e@/@n 1 n(n 1) ⇢ n ⇣ ⌘ o • zz2 1 ↵ @/@z Liouville function: = L 1 (rate matrix) @2/@z2 ⇥ ⇤  ⇥ ⇤  @ @ = z (1 z) ↵ @z @z ✓ ◆

@ • But on a Poisson distribution: ↵ (z)=(↵ ⇠) (z) @z ⇠ ⇠ ✓ ◆

• So steady-state condition ( z )=0 whenever ⇠ = ↵/ L ⇠ Getting general ACK from Feinberg’s deficiency-0

• T @ More general form for : = (z) A Y L L Y @z ✓ ◆ @ General behavior on a gen. fn. (z) = (n) Y @z ⇠ h Y i ✓ ◆ z=1 For a Poisson, remember factorial moments: = Y ( n ) h i The derivative just pulls out the coherent-state parm. = Y (⇠)

T ⇠(z)= (z) A Y (⇠)⇠(z) L Y • dn This is just the condition in the = Y A Y (n) rate equation that Feinberg shows dt can always be solved at δ = 0. END PROOF More that can be done

• Beyond deficiency-0? When (else) are CRNs simple?

• Duality and fluctuation theorems (Jarzynski, Crooks)

• Develop stat. mech. of concurrency further:

• Systems-biology language Kappa (Fontana, Danos)

• Large-Deviations for stochastic Rubik’s Cube?

• Category theory: rule & network aggregation / reduction Concluding remarks

• Diffusion on ordinary graphs is a restrictive framework, but has still be extremely useful

• Adding concurrency and clumpiness with CRNs is a natural next step, but adds considerable mathematical difficulty

• This ties in well with macro-worlds, and the area of overlap contains a lot of interesting work to do