1 Autocatalysis in chemical networks: unifications and extensions 1, 2 1 2 2 Alex Blokhuis, David Lacoste, and Philippe Nghe 1) 3 Gulliver Laboratory, UMR CNRS 7083, PSL University, 10 rue Vauquelin, Paris F-75231, 4 France 2) 5 Laboratoire de Biochimie, Chimie Biologie et Innovation, ESPCI Paris, PSL University,10 rue Vauquelin, 6 Paris F-75231, France

7 (Dated: July 1, 2020)

8 Autocatalysis is essential for the origin of life and chemical evolution. However, the lack of unified framework 9 so far prevents a systematic study of autocatalysis. Here, we derive general stoichiometric conditions 10 for and autocatalysis in networks from basic principles. This allows for a 11 classification of minimal autocatalytic motifs. While all known autocatalytic systems indeed contain 12 minimal motifs, the classification also reveals hitherto unidentified motifs. We further examine conditions 13 for kinetic viability of such networks, which depends on the autocatalytic motifs they contain and is notably 14 increased by internal catalytic cycles. Finally, we show how this framework extends the range of conceivable 15 autocatalytic systems, by applying our stoichiometric and kinetic analysis to autocatalysis emerging from 16 coupled compartments. The unified approach to autocatalysis presented in this work lays a foundation 17 towards the building of a systems-level theory of chemical evolution.

18 PACS numbers: 05.40.-a 82.65.+r 82.20.-w

19 INTRODUCTION 53 catalysis: A substance that increases the rate of a re- 54 action without modifying the overall standard Gibbs ◦ 55 energy change (∆G ) in the reaction; the process is 20 The capacity of living systems to replicate them- 56 called catalysis. The catalyst is both a reactant and 21 selves is rooted in a chemistry that makes more of itself, 57 product of the reaction. Catalysis brought about by one 22 i.e. an autocatalytic system. Autocatalysis appears 58 of the products of a (net) reaction is called autocatalysis. 23 to be ubiquitous in living systems from molecules to 1 24 ecosystems . It is also likely to have been continually 25 present since the beginning of life and is invoked as 2–5 26 a key element in prebiotic scenarios . Surprisingly, 6 27 autocatalysis is considered to be a rarity in chemistry . 28 Developments in systems chemistry are changing this 29 view, with an increasing number of autocatalytic sys- 7–9 30 tems synthesized de novo . Chemical replicators 31 have been endowed with biomimetic properties such as 10 11 32 protein-like folding and parasitism . Autocatalysis

33 has also found technological applications, e.g. enan- 59 From this definition, we derive conditions to de- 12–14 34 tiomer enrichment and acid amplification . 60 termine whether a subnetwork embedded in a larger 61 chemical network, can be catalytic or autocatalytic. 35 Understanding autocatalysis represents a primary 62 These conditions provide a mathematical basis to iden- 36 challenge for theory. Models based on autocatalysis 63 tify minimal motifs, called autocatalytic cores. We 37 were first built to explain a diversity of dynamical 64 found that cores have five fundamental categories of 38 behaviors in so called dissipative structures, such as 15 65 motifs. They allow classification of all previously de- 39 bistable reactions , oscillating reactions, and chemical 16 66 scribed forms of autocatalysis, and also reveal hitherto 40 waves . Autocatalysis then became a central topic in 67 unidentified autocatalytic schemes. We then study the 41 the study of self-replication dynamics in biological and 3,17–19 20–22 68 kinetic conditions, which we call viability conditions, 42 prebiotic systems (see for recent reviews). 69 under which autocatalytic networks can appear and 43 Despite this history, a unified theory of autocatal- 70 be maintained on long times. We find that networks 44 ysis is still lacking. Such a theory is needed to under- 71 have different viabilities depending on their core struc- 45 stand the origins, diversity and plausibility of autocatal- 72 ture, and notably that viability is increased by internal 46 ysis. It would also provide design principles for artificial 73 catalytic cycles. Finally, we expand the repertoire 47 autocatalytic systems. Here, we present a framework 74 of autocatalytic systems, by demonstrating a general 48 that unifies the different descriptions of autocatalysis 23–27 75 mechanism for its in multicompartment sys- 49 and is based on reaction network . 76 tems (e.g. porous media, vesicles, multiphasic systems). 50 Let us start from basic definitions in chemistry 77 This mechanism strongly relaxes chemical requirements 28 51 as established by IUPAC (see SI Sec. I for full def- 78 for autocatalysis, making the phenomenon much more 52 initions), where autocatalysis is a particular form of 79 diverse than previously thought. 2

1’ 2’ 80 EXAMPLES, DEFINITIONS AND CONVENTIONS a) b) c) 1’ A -1 0 EA A + E EA B 0 1 81 Catalysis and autocatalysis 2’ 1’ 2’ EA E +B E -1 1 EA 1 -1 E 82 The following reactions have the same net mass 83 balance but a different status regarding catalysis: 1’’ 2’’ d) e) f) (I) (II) (III) A -1 0 AB A )−−−−* B, A+E )−−−−* B+E, A+B )−−−−−−* 2B. 1’’ + B -1 2 1’’ 2’’ (1) A B AB 2’’ AB 1 -1 84 Since no species is both a reactant and product AB 2B B 85 in reaction (I), it should be regarded as uncatalyzed. 86 Reactions (II) and (III) instead contain species which Figure 1: Different representations for allocatalysis (a,b,c) 87 are both a reactant and a product, species E in reaction and autocatalysis (d,e,f). a) Combining reactions 88 (I) and species B in reaction (III) and following the (1’)+(2’) affords an allocatalytic cycle that converts 89 definition above, these species can be considered as A to B. b) stoichiometric matrix of a), the dashed 90 catalysts. In reaction (II), the amount of species E square encloses the allocatalytic submatrix ν¯0 for 91 remains unchanged, in contrast to the case of reaction network b). c) Graph representation of the allocat- 92 (III), where the species B experiences a net production. alytic subnetwork. d) Combining (1”)+(2”) affords 93 For this reason, reaction (III) represents genuine auto- an autocatalytic cycle converting A to B. e) stoichio- 94 catalysis. Although reaction (II) is usually referred to metric matrix of d), the dashed square encloses the autocatalytic submatrix ν¯00 for network e). f) a graph 95 as simply catalyzed in the chemistry literature, we pro- representation of the autocatalytic subnetwork. 96 pose to call it an example of allocatalysis to contrast it 97 with the case of autocatalysis, catalysis being common 98 to both. 128 cel on each side leading to the same column vector as 99 We emphasize that stoichiometric considerations 129 for (I). This is avoided by describing catalysis through 100 are necessary but not sufficient to characterize catalysis, 130 a sequence of reactions steps from which it emerges, 101 which according to the definition should also acceler- 131 so that a participating species is either a reactant or a 102 ate the rate of the net reaction. In the following, we 132 product: 103 will first generalize the stoichiometric conditions, then 104 examine kinetic ones. IIa IIb A + E )−−−−* EA )−−−−* E + B, (2) IIIa IIIb A + B )−−−−* AB )−−−−* 2B . (3) 105 Stoichiometric matrix and reaction vectors 133 We call this convention non-ambiguity and assume 134 henceforth that it is respected. 106 Reaction networks are represented as a stoichio- 23,26 107 metric matrix ν , in which columns correspond to 108 reactions and rows to species. The entries in a column 135 CATALYSIS AND AUTOCATALYSIS IN 109 are the stoichiometric coefficients of the species partic- 136 STOICHIOMETRIC MATRICES 110 ipating in that reaction, the coefficient is negative for 111 every species consumed and positive for every species T 112 produced. A reaction vector g = [g1, .., gr] results 137 In this section, we will consider any possible sub- 113 in a change of species numbers ∆n = ν · g. The sup- 138 matrix ν¯ of ν, the stoichiometric matrix of a reaction 114 port of g, denoted supp(g), is the set of its non-zero 139 network, and ask whether the stoichiometry of the cor- 115 coordinates. A reaction cycle is a non-zero reaction 140 responding subnetwork, called a motif, is compatible 116 vector c such that no net species number change occurs 141 with the definitions of allocatalysis or autocatalysis. 117 : ν · c = 0, or equivalently, c belongs to the right null 142 Note that such identification neither makes a priori T 118 space of ν. Vectors b belonging to the left null space 143 assumptions on the values and signs of reaction vector 119 of ν induce conservation laws, because in that case 144 coefficients, nor on kinetics, or on which species are 120 b · n represents a conserved quantity. The case of all 145 catalytic or not. A matrix ν¯ is a restriction of ν to 121 coefficients bk nonnegative is referred to as a mass-like 146 certain rows and columns, which respectively corre- 122 conservation law. For example in Fig.1a, conserved 147 spond to the species and reactions of the motif under 123 quantities are nE + nEA (catalysts) and nA + nEA + nB 148 consideration. 124 (total compounds). 149 The restriction of the rows means that the species 125 Lastly, catalyzed reactions may not always be dis- 150 of ν are separated into internal species of the motif 126 tinguished from uncatalyzed one in the stoichiometric 151 (rows of ν¯) and external species (remaining rows of 127 matrix. For instance, in reactions (II-III), catalysts can- 152 ν). These external species could be, in some cases, 3

26 153 chemostatted , and represent feedstock compounds, 206 Criterion for autocatalysis 29 154 also called the food set , and waste from the point of 155 view of internal species of the motif. In Fig.1, external 207 By definition, autocatalysis is the process by which 156 species have been colored in blue, while stoichiometric 208 a combination of reactions involves a set of species 157 submatrices have been boxed in yellow. Fig 1a and 1d 209 which all increase in number conditional on species in 158 represent examples of allocatalysis and autocatalysis, 0 210 the set itself (the autocatalysts), while other species 159 respectively, with their respective submatrices ν¯ and 00 211 undergo a turnover. This leads to the following condi- 160 ν¯ , and hypergraph representations Fig 1c and Fig 1f. 212 tions: 161 Restriction of columns separates reactions which 213 There exists a set of species S, a submatrix ν¯ of 162 are part of the motif and those which occur outside 214 ν restricted to S, and a reaction vector g such that: 163 of it. A motif such that each of its reactions has at 215 i) ν¯ is autonomous, ii) all coordinates of ∆n = ν¯ · g 164 least one reactant and at least one product is called 216 are strictly positive, or equivalently, ν¯ has no mass- 165 autonomous. This means that every column of ν¯ con- 217 like conservation laws. The members of S consumed 166 tains a positive and a negative coefficient. Below, we 218 (and produced) by g are called autocatalysts, g an 167 pose autonomy as a condition for catalysis. Indeed, 219 autocatalytic mode and ν¯ an autocatalytic matrix. 168 it ensures that the production of any species of the 220 Condition (i) ensures the conditionality of the 169 motif is conditional on the presence of other chemi- 221 reactions on autocatalysts, as it forbids cases where 170 cal species of the motif. Otherwise, rate acceleration 222 species of S are produced from external reactants only, 171 would be allowed unconditional on an already present 223 thus playing the role of conditions (i) and (ii) in the 172 substance, in opposition to the definition of catalysis. 224 definition of allocatalysis. Condition (ii) expresses 173 Autonomy is less restrictive than former conditions for 24 225 the increase in autocatalyst number. The equivalence 174 autocatalysis , and is similar to the siphon concept 30 226 between the two formulations of condition (ii) is an 175 in Petri Nets , but without assumption on reaction 31 227 immediate consequence of Gordan’s theorem . Impor- 176 signs (see SI Sec. I). Note that it does not forbid that 228 tantly, the second formulation of (ii) does not involve 177 reactions outside of the motif produce species of the 229 an autocatalytic mode g, so that (i) and (ii) can be 178 motif. 230 expressed as properties of a matrix itself, in contrast 231 with allocatalysis. This allows us to look for minimal 232 autocatalytic motifs, which we do next. Note that 233 external species must feed the autocatalytic system in 179 Criterion for allocatalysis 234 order to guarantee the net mass increase imposed by 235 condition (ii).

180 By definition, allocatalysis is an ensemble of re- 181 actions by which a set of species remain conserved in

182 number (the catalysts) while other external species 236 Autocatalytic cores 183 undergo a turnover which changes their numbers. This 184 leads to the following conditions: 237 An autocatalytic core is an autocatalytic motif 185 There exists a set of species S, a submatrix ν¯ of 238 which is minimal because it does not contain any 186 ν restricted to S, and a non-zero reaction vector c 239 smaller autocatalytic motif. Consequently, an auto- 187 such that: i) ν¯ is autonomous; ii) supp(c) is included 240 catalytic system is either a core, or it contains one 188 in the columns of ν¯; iii) c is a reaction cycle of ν¯ 241 or several cores. The stoichiometric conditions show 189 (ν¯ · c = 0), and; iv) ν · c =6 0. The members of S which 242 that characterizing cores is equivalent to finding all au- 190 participate in c (i.e. that are consumed and produced) 243 tonomous matrices whose image contains vectors with 191 are called allocatalysts, c an allocatalytic cycle and ν¯ 244 only strictly positive components. This well-posed for- 192 an allocatalytic matrix. 245 mulation allowed us to show that the stoichiometric

193 Condition (i) has been discussed above. Condition 246 matrix ν¯ of an autocatalytic core must verify a number 194 (ii) expresses the involvement of the catalysts in the 247 of non-obvious properties reported below and demon- 195 reactions c, where all columns of ν¯ are non-zero due to 248 strated in the SI Sections II andIII. 196 (i), so that all reactions of c involve catalysts. Condition 249 First, ν¯ must be square (the number of species 197 (iii) expresses the conservation of catalysts and (iv) the 250 equals the number of reactions) and invertible. The 198 net reaction. Since the reaction cycle c is a cycle of the 251 inverse has a chemical interpretation. By definition of −1 199 reduced matrix but not of the original matrix, some 252 the inverse, the k-th column of ν¯ is a reaction vector 200 authors have qualified it as emergent and shown that 253 such that species k increases by one unit, making it an 201 it can establish a non-equilibrium steady state driven 254 elementary mode of production. Likewise, the reaction 26 −1 202 by the turnover of the external species . Note that 255 vector obtained by summing the columns of ν¯ leads to 203 being allocatalytic is not a property of the sub-matrix 256 a net increase by one unit of every autocatalyst, which 204 ν¯ alone but involves the larger matrix ν as imposed by 257 thus represents an elementary mode of autocatalysis. 205 condition (iv). 258 This shows how stoichiometry informs on fundamental 4

27 259 modes of autocatalysis . a) Type I Type II Type III Type IV Type V 260 Second, every forward reaction of a core involves 261 only one core species as a reactant. While this ex- 262 cludes reactions between two different core species, a 263 single core species may react with itself. As ν¯ is square, 264 this also implies that every species of a core is con- 265 sumed (none is only produced), thus is an autocatalyst. b) C3 C2 c) 266 Furthermore, every species is the reactant of a single C2 C3’ C4 C3 267 reaction. Overall, every species is uniquely associated C4’ 268 with a reaction as being its reactant, so that ν¯ admits C5 C4’ C3’ Type I Type II 269 a representation with a negative diagonal and zero or C5’

OH 270 positive coefficients elsewhere, at least one coefficient of O O HO O 271 each column being strictly positive to ensure autonomy. HO H H H H O O O HO H H 272 These properties are constraining enough to allow OH HO OH O O OH 273 an exhaustive enumeration of reaction graphs that HO HO O

OH O 274 are cores. Autocatalytic cores are found to belong to OH O OH HO O OH OH 275 O OH five categories, denoted as Type I to Type V. Fig. 2a HO OH OH OH 276 represents typical members of each category as reaction HO O OH O O 277 hypergraphs (see SI Fig. S1 for general cases). As can H H H H HO HO 278 be seen in these graphs, all minimal motifs contain OH OH 279 a fork, which ends either in the same compound (or 280 node) for Type I or in different compounds for Types Figure 2: a) Five minimal motifs. Orange squares indi- cate where further nodes and reactions may be added, 281 II to V. The presence of this fork is consistent with provided this preserves the motif type (I,II,III,IV,V) 282 the intuition that autocatalysis requires reaction steps and minimality. b+c) Examples of chemical networks, 283 that amplify the amount of autocatalysts. The orange along with their autocatalytic cores. blue: external 284 square on the links between the nodes indicate that species, yellow: autocatalysts. b) Type I: Breslow’s 285 these links could contain further nodes and reactions 1959 mechanism for the formose reaction33 c) Type 286 in series, provided certain rules on cycles below. II: Another autocatalytic cycle in the formose reac-

287 The five types differ in their number of graph tion. Species denoted as Cx inside the nodes refer to 32 288 cycles and the way these cycles overlap. Type I con- molecules containing x carbon atoms, which are shown below in standard chemical representation. 289 sists of a single graph cycle that is weight-asymmetric, 290 defined as the product of the stoichiometric coefficients 291 of its reaction products being different than that its 292 reactants. Types II-V can be described as two overlap-

293 ping graph cycles, where any such graph cycle involving 313 Domain) model for self-enhancing growth of amphiphile 4,5 294 a strict subset of the core species must be an allocat- 314 assemblies , all underlying autocatalysis is described 295 alytic cycle, i.e. weight-symmetric (it would otherwise 315 (SI Fig. S6) by Type I cycles with one fork and Type 296 be of Type I, contradicting minimality). 316 II cycles built up from sequential nonoverlapping al- 317 locatalytic cycles (cross-incorporation, such as N3 in 318 Fig. 3). More generally, when such catalytic cycles are

297 Unification of autocatalytic schemes 319 compactly written as single reactions as in (1), they 320 can be treated in the RAF (Reflexively Autocatalytic 29 321 and Food-generated) framework , where they form 298 The stoichiometric characterization of autocatal- 36 322 irreducible RAF-sets . This formally establishes the 299 ysis provides a unified approach to autocatalytic net- 5,37 323 recently suggested link between these models. 300 works reported in the literature. The examples below 301 are further detailed in SI Sec. III. The formose reaction 324 Another reported form of autocatalysis is ‘chemical 302 is a classic example of autocatalysis known to contain 38 34 325 amplification’ due to cavitands . The mechanism in- 303 many autocatalytic cycles . Fig.2b and c show Type I 326 volves a reactive compound in a molecular cage, whose 304 and III cores both found in the formose reaction. Simi- 327 free counterpart can react to form two species that 305 larly, autocatalytic cores of Type I and III can be found 328 exchange with the caged species, thus amplifying its 306 in the Calvin cycle and reverse Krebs cycle (SI Fig. S4). 329 release. We find that this process can be described 307 Some reaction steps Fig.2b may be catalyzed externally 330 within our framework and corresponds to a Type III 308 (e.g. by enzymes, base, ions), but external catalysis 331 core (SI Fig. S5). 309 in general does not alter the core. By the same token, 21,35 310 proposed examples of auto-induction introduced in 332 Overall, previously described autocatalytic 311 contain Type I and III cores (SI Fig. S3). 333 schemes comprise Types I, II and III. We have not yet 312 In the GARD (Graded Autocatalysis Replication 334 found examples of Types IV and V. 5

335 VIABILITY OF AUTOCATALYTIC NETWORKS 383 Reversible Type I cycles

384 Consider a Type I cycle consisting of n reaction 336 Stoichiometric conditions do not guarantee that 385 steps, such as N1 in Fig. 3b, and let us start at the 337 autocatalysts within motifs amplify. Whether an ini- 386 first step with species X1 (marked node). Ultimately, 338 tial autocatalyst amplifies or degrades depends on ki- 387 X1 will either be successfully converted and yield 2X1 339 netic considerations. To address this so-called fixation 17,22 388 or be degraded prematurely, which simplifies (4) to 340 problem , we examined the probability Pex of ex- 341 tinction (or 1 − Pex of fixation) of species within auto- p0 p2 ∅ ←−− X1 −−→ 2X1, (6) 342 catalytic motifs, as a function of transition probabilities 343 of reaction steps. 389 with p0 + p2 = 1. The overall outcome described by (6) 390 corresponds to the simplest type of branching process: 344 Considering a homogeneous system with a steady 391 a birth-death process. (5) then becomes a quadratic 345 supply of reactants, several authors have noted that 392 equation that yields 346 in the highly dilute autocatalyst regime, appreciable 17,22,39 347 rates require first-order autocatalysis , i.e. each  1 1 − 1, p2 ≥ , p2 2 348 forward reaction step only involves one autocatalyst. Pex = 1 (7) 1, p2 < . 349 Among first-order order networks, fixation models have 2 350 so far focused on Type I networks (e.g. Fig.2b), which 393 This generalizes Bagley et al’s observation for Type 351 have a single graph cycle containing n species. In a 394 I networks to n > 1 and reversible reactions. For re- 352 transition step, a given species may either proceed 395 versible reactions, p2 is found by considering all possible 353 irreversibly to the next species or disappear as a result 396 sequences of forward and backward reactions along the 354 of degradation. King found that if every reaction step − 397 cycle. From Xk, let Πk be the transition probability to 355 k among n steps of the cycle has a success probability + + + 398 revert to Xk−1, and Πk to convert to Xk+1. We have 356 Πk (1−Πk being the degradation probability), fixation Qn + 357 is possible for a doubling probability p2 = k=1 Πk ≥ n 39 Y + 358 1/2 . This minimum value of p2 above which fixation p2 = Πk Γk, (8) 19,40 359 is possible is called the decay threshold . Bagley k=1 17 360 et al. used birth-death processes to derive Pex for ∞ X − + s 1 361 an autocatalytic loop containing one species (n = 1). Γk+1 = (Πk+1ΓkΠk ) = − + , (9) 1 − Π ΓkΠ 362 Schuster reported detailed time-dependent statistics s=0 k+1 k 22 363 for such networks in various contexts . 399 where Γk recursively (Γ1 = 1) counts the statistical 364 Here, we extend the treatment of the fixation 400 weight of all back-and-forth trajectories from Xk to − + 365 problem so as to include reversible reactions and net- 401 itself, in terms of Πk and Πk . In the irreversible − 366 works beyond Type I using the theory of branching 402 reaction limit Πk → 0, Γk → 1 King’s expression for 41 39 367 processes . In these stochastic processes, an autocat- 403 p2 is recovered. 368 alytic species Xs is, after a sequence of reaction steps in 369 the network, replaced by k copies. Reaction sequences

370 yielding k copies happen with a probability pk, such 404 Viability of autocatalytic cores 371 that

p p p X −−→∅0 , X −−→1 X , ... X −−→k kX , ... . (4) 405 To investigate how autocatalytic motif structure af- s s s s s 42 406 fects survival, we calculated Pex for five different cores 372 The probability Pex that Xs goes extinct is then the 407 (N1 to N5, Fig. 3): they are of equal size (6 reaction 373 probability that its k descendants independently go 408 steps, 6 species), all reactions proceed irreversibly with 374 extinct: 409 the same success probability ζ, which plays a similar + 410 role as the transition probability Πk in the example ∞ 19,39,40 2 X k 411 above and is sometimes called specificity . Pex = p0 + p1Pex + p2Pex + ... = pkPex. (5) 412 Fig. 3 highlights how P depends on ζ for each k=0 ex 413 core structure. The highest ζ for extinction (Pex = 1) 375 The main difficulty here is to derive pk from transition 414 is observed for the Type I cycle N1, and progressively 376 probabilities Πk. A procedure for this is given in SI Sec. 415 lower values are found for N2 to N4, which are all of 377 IV, where branching processes are constructed from 416 Type II. Type V network N5 tolerates the lowest speci- 378 reaction networks. Below, we exemplify this method by 417 ficity ζ before extinction, sustaining almost three times 379 generalizing known results for Type I networks, solu- 418 higher failure rates 1−ζ than N1. These differences can 380 tions for other networks being detailed in the SI Sec. IV. 419 be qualitatively understood by counting the minimum 381 We then apply it to compare the Pex of autocatalytic 420 number of steps needed to produce more autocatalysts. 382 motifs which differ in their core structures. 421 In respective order, networks N1 to N5 in Fig. 3b do so 6

a) a) b) A B A B Type II P α 2 2 β ex AB AB U V A A A U U AB P V V A B ex c) 2

α r3 β

r4 + r2

r +

r5 1

+ + r6

b) r7

d)

N N N N N 1 2 3 4 5 ex P k k ex Figure 3: a) Pex as function of ζ (legend: Pex(ζ) for Pex < 1) for b) 5 autocatalytic networks of similar size, starting at the dashed node. N1: Type I cycle. N2: Type II with one fork. N3: Type II, two nonover- lapping allocatalytic cycles, a common motif in GARD with a 1st order RAF representation. N4: Type II: allocatalytic cycles connected by intermediate steps. N5: Type V. Symbols: Pex after 1000 simulated trials, detailed in SI Sec. VII, lines: exact solution, derived in SI Sec. V Figure 4: Multicompartment autocatalysis. a) Reaction network with two reactions and five species in a sin- gle compartment. The network does not contain any autocatalytic core, thus cannot perform autocatalysis. 422 in six, four, three, three and two steps. In particular, b) Same reaction network as in (a), but duplicated in 423 given their symmetries, the Pex of N3 and N5 have the two compartments α and β, coupled by the selective 424 same dependence on ζ as a 3 and 2-membered Type I exchange of species A and A2B. A Type II core, high- 425 cycle, respectively. lighted in orange, emerges. c) Open reactor with two compartments, a semi-permeable membrane, degra- 426 It has been suggested that large networks are dis- 39 dation and exchange reactions. Chemostatted species 427 favored in general . The examples of Fig. 3 indicate have a lighter background: U and V in α and AB in 428 that this can be counterbalanced by the presence of β. d) Extinction probability Pex for multicompart- 429 more allocatalytic cycles in the network. This is in ment autocatalysis in (c), starting from a single Aα, 430 particular the case for autocatalytic sets, as every net as a function of exchange rate kex and degradation 431 reaction must participate in an allocatalytic cycle. rate kd, relative to other relevant reaction rates fixed at k. Slanting asymptote: exchange-limited survival d kex = 2k . Vertical√ asymptote: reaction-limited sur- vival kd/k = 13−3 . Dashed white line: transition 432 EXTENSIONS: MULTICOMPARTMENT 2 between extinction and potential fixation (P < 1). 433 AUTOCATALYSIS ex Expressions for Pex and asymptotes are derived in SI Sec. VI. 434 We finally show how stoichiometric criteria allow 435 the identification of autocatalysis that emerges from 436 compartments coupled via selective exchange, as found 446 437 in systems comprising vesicles, pores, emulsions and and AB is chemostatted in β (Fig. 4c). The reaction 43 447 438 complex coacervates . The reaction network in Fig. involving U and V may in principle also take place in 448 439 4a is incapable of autocatalysis as it does not contain β, but it is not required for autocatalysis as it is not 449 440 any autocatalytic core. However, when we place this part of the type II core. In the present example, U and 450 441 network in two compartments α and β coupled by a V are absent in β.

442 membrane permeable to A and A2B, a Type II core 451 We now apply our viability analysis to this autocat- 443 emerges (Fig. 4b). 452 alytic network in the presence of degradation reactions 444 The core identification indicates a possible setting 453 (r6 and r7 in Fig. 4c). Let us introduce a characteristic 445 for autocatalysis: U and V are chemostatted in α, 454 rate k for reactions r2, r4 and r5, a degradation rate 7

d ex 455 k for reactions r6 and r7 and an exchange rate k 509 paves to the way of a systematic exploration of the 48 456 for reactions r1 and r3. Figure 4d shows the extinction 510 possible modes chemical evolution . d 457 probability as a function of the degradation rate k 511 Autocatalytic motifs provide different degrees of ex 458 and exchange rate k , both normalized by k. To over- 512 robustness, which we evaluated using the notion of via- 459 come the degradation threshold (Pex < 1), the ratio 513 bility. Viability can be computed as a survival probabil- d 460 k/k must lie above a certain threshold (vertical black 514 ity in an appropriately defined branching process. This ex 461 dotted line in Fig. 4d), and the rate of exchange k 515 approach is generally applicable to autocatalytic mod- 462 should outpace the rate of degradation (black slanting 516 els upon identification of their cores, highlighting the 463 dotted line in Fig. 4d). Stochastic simulations (SI Sec. 517 interest of a unified framework. Viability results from 464 VII and Fig. S9) confirm autocatalytic growth in these 518 a competition between reactions that produce autocat- 465 conditions. 519 alysts and side-reactions such as degradation. This is 19,40 466 In this example of coupled compartments, com- 520 intimately related to the ‘paradox of specificity’ : 467 pounds are no longer restricted to one role: AB is an 521 autocatalytic motifs are more likely to be found in large 468 autocatalyst in α and a feedstock in β. Chemical re- 522 networks with many different chemical components en- 469 actions are no longer restricted to one direction: The 523 gaging in many different reactions, but putting many 470 reaction used for reproduction in α is reused in β to 524 components together favors side-reactions, leading to 471 provide the missing step to close the cycle. Such mul- 525 extinction.

472 ticompartment autocatalysis is however more general. 526 Multicompartment autocatalysis introduced here 473 −−* For instance, a single reaction A )−− B + C can give 527 offers a way around this problem: coupled compart- 474 rise to Type III motifs, given three compartments cou- 528 ments effectively enlarge the number of species without 475 pled by selective exchange as detailed in SI Sec. VIII 529 requiring new reactions. In multicompartment auto- 476 and Fig. S9. 530 catalysis, cycles rely on the environmental coupling 531 of reaction networks, which allows access to condi- 532 tions unattainable in a single compartment. In this 533 way, autocatalysis can emerge from reaction schemes 477 DISCUSSION 534 as simple as a bimolecular reaction, provided certain 535 semi-permeability conditions are met for the exchange 478 We presented a theoretical framework for auto- 536 of compounds between compartments. In the example 479 catalysis based on stoichiometry, which allows a precise 537 shown here (Fig. 4), this allowed us to reuse the com- 480 identification of the different forms of autocatalysis. 538 pounds and reactions to complete autocatalytic cycles. 481 Starting with a large stoichiometric matrix, we provide 539 The principle is more general, however: autocatalysis 482 criteria for reaction network motifs that allow allocatal- 540 may also emerge from coupling phases with physical- 483 ysis and autocatalysis. A detailed analysis of the graph 541 chemical conditions conducive to different reactions, as 49 50 484 structure contained in these reduced stoichiometric 542 observed in liquid-solid , solid-gas interfaces. Liquid- 485 matrices reveals that they contain only five possible re- 543 liquid interfaces in cellular organization and multiphase 43 486 current motifs, which are minimal in the sense that they 544 coacervates are promising places to further explore 487 do not contain smaller motifs. Fundamental modes of 545 such principles. 488 production of minimal autocatalytic cores are encoded 546 Overall, our framework shows that autocatalysis 489 in the column vectors of the inverse of the autocatalytic 547 comes in a diversity of forms and can emerge in unex- 490 core submatrix. Autocatalytic cores are found to have 548 pected ways, indicating that autocatalysis in chemistry 491 a single reactant species for each reaction. This means 549 must be more widespread than previously thought. 492 that autocatalytic networks require the availability of 550 This invites to search for further extensions of auto- 493 certain chemical species in their cores to operate prop- 551 catalysis, which provides new vistas for understanding 494 erly, but also implies that the proper functioning of an 51 552 how chemistry may complexify towards life . 495 autocatalytic network will guarantee the stable supply 496 of certain products, a definitive advantage when these 497 products are key enzymes or metabolites. 553 MATERIALS AND METHODS 498 We identified these minimal motifs in known ex- 499 amples of autocatalysis such as the formose reaction, 500 central metabolic cycles, the GARD model and RAF 554 Theoretical methods and derivation of results are 501 sets. Autocatalytic cores also provide a basis for algo- 555 detailed in the Supplementary Appendix comprising 502 rithms to identify these recurring autocatalytic motifs 556 the following sections: 1) Terminology and definitions, 44,45 503 in large chemical networks , as has been done for 557 2) derivation of autocatalytic cores from Graph the- 46 504 gene regulatory networks . In this way, we may be 558 ory, 3) their chemical interpretation and 4) application 505 able to break the complexity of large chemical networks 559 to formose, autoinduction, metabolic cycles, chemical 47 506 into smaller, more manageable structures . Addition- 560 amplification, RAF sets, GARD 5) branching process 507 ally, autocatalytic cores are the building block of evolu- 561 derivation and determination of Pex. 6) determination 36 508 tion in prebiotic chemistries , thus their identification 562 of Pex for Fig. 3. 7) determination of Pex for Fig. 4d. 8

27 563 8) stochastic simulations. 9) autocatalysis from one 622 S. Schuster, C. Hilgetag, J. H. Woods, and D. A. Fell, “Ele- 564 bimolecular reaction and 3 compartments. 623 mentary modes of functioning in biochemical replicators,” in 624 Computation in Cellular and Molecular Biological Systems 625 (Singapore: World Scientific, 1996) pp. 151–165. 28 626 A. D. McNaught and A. Wilkinson, IUPAC Compendium of 627 Chemical Terminology, 2nd ed. ("the Gold Book") (Blackwell 565 ACKNOWLEDGEMENTS 628 Scientific Publications, Oxford, 1997). 29 629 W. Hordijk, J. Theor. Biol. 435, 22 (2017). 30 630 M. Gopalkrishnan, Bull. Math. Biol. 73, 2962 (2011). 566 AB acknowledges stimulating discussions with D. 31 631 Borwein, Jonathan and Lewis, Adrian S., Convex Analysis and 567 van de Weem and Z. Zeravcic. AB and DL acknowl- 632 Nonlinear Optimization (Springer-Verlag New York, 2006). 32 568 edge support from Agence Nationale de la Recherche 633 Graph cycles are closed paths in the reaction hypergraph and 569 (ANR-10-IDEX-0001-02) and PSL IRIS OCAV). PN 634 should not be confounded with reaction cycles which are right 635 null vectors of the stoichiometric matrix. 570 acknowledges s. Krishna and C. Flamm for discus- 33 636 R. Breslow, Tetrahedron Lett. 21, 22 (1959). 571 sions. AB, DL and PN acknowledge Andrew Griffiths 34 637 A. Ricardo, F. Frye, M. A. Carrigan, J. D. Tipton, D. H. 572 for careful reading of the manuscript. 638 Powell, and S. A. Benner, J. Org. Chem. 71, 9503 (2006). 35 639 A. J. P. Teunissen, T. F. E. Paffen, I. A. W. Filot, M. D. 640 Lanting, R. J. C. van der Haas, T. F. A. de Greef, and E. W. 641 Meijer, Chem. Sci. 10, 9115 (2019). 36 573 REFERENCES 642 V. Vasas, C. Fernando, M. Santos, S. Kauffman, and E. Sza- 643 thmáry, Biol. Direct 7, 1 (2012). 37 644 S. A. Kauffman, A world beyond physics: the emergence & 1 574 W. Hordijk, BioScience 63, 877 (2013). 645 evolution of life (Oxford University Press, 2019). 2 575 A. I. Oparin, Origin of Life (Dover, 1952). 38 646 J. Chen, S. Körner, S. L. Craig, S. Lin, D. M. Rudkevich, and 3 576 S. A. Kauffman, J. Theor. Biol. 119, 1 (1986). 647 J. Rebek, Proc. Natl. Acad. Sci. U. S. A. 99, 2593 (2002). 4 577 D. Segré, D. Ben-eli, D. W. Deamer, and D. Lancet, Orig. 39 648 G. King, BioSystems 15, 89 (1982). 578 Life Evol. Biosph. 31, 119 (2001). 40 649 E. Szathmáry, Philos. Trans. R. Soc. B Biol. Sci. 355, 1669 5 579 D. Lancet, R. Zidovetzki, and O. Markovitch, J. R. Soc. 650 (2000). 580 Interface 15, 20180159 (2018). 41 651 S. Karlin and H. M. Taylor, A first course in stochastic pro- 6 581 L. E. Orgel, Plos Biol. 6, 5 (2008). 652 cesses (Academic Press, New York, 1975). 7 582 A. Bissette, B. Odell, and S. Fletcher, Nat Commun 5, 4607 42 653 Note that in general, the calculation of Pex may involve reac- 583 (2014). 654 tions that are not in the core. Here, we consider cases where 8 584 S. N. Semenov, L. J. Kraft, A. Ainla, M. Zhao, M. Bagh- 655 this is not necessary. 585 banzadeh, V. E. Campbell, K. Kang, J. M. Fox, and G. M. 43 656 T. Lu and E. Spruijt, Journal of the American Chemical Society 586 Whitesides, Nature 537, 656 (2016). 657 142, 2905 (2020). 9 587 H. N. Miras, C. Mathis, W. Xuan, D.-L. Long, R. Pow, 44 658 á. Kun, B. Papp, and E. Szathmáry, Genome Biol 9, R51 588 and L. Cronin, Proc. Natl. Acad. Sci. U. S. A. (2020), 659 (2008). 589 10.1073/pnas.1921536117. 45 660 Andersen, J. L. and Flamm, C. and Merkle, D. and Stadler, 10 590 B. Liu, C. G. Pappas, E. Zangrando, N. Demitri, P. J. 661 P. F., IEEE/ACM Trans. Comput. Biol. Bioinform 16, 510 591 Chmielewski, and S. Otto, J. Am. Chem. Soc. 141, 1685 662 (2017). 592 (2019). 46 663 R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, 11 593 M. Altay, Y. Altay, and S. Otto, Angew. Chem. Int. Ed. 57, 664 and U. Alon, Science 298, 824 (2002). 594 10564 (2018). 47 665 P. Dittrich and P. Speroni di Fenizio, Bull. Math. Biol. 69, 12 595 C. Viedma, Phys. Rev. Lett. 94, 065504 (2005). 666 1199 (2007). 13 596 I. Baglai, M. Leeman, M. Kellogg, and W. Noorduin, Org. 48 667 Z. Peng, A. Plum, P. Gagrani, and D. A. Baum, arXiv (2020), 597 Biomol. Chem. 17, 35 (2019). 668 arXiv:2001.02533, arXiv:arXiv:2001.02533. 14 598 K. Ichimura, K. Arimitsu, and K. Kudo, Chemistry Lett. 24, 49 669 Halpern, J, J. Electrochem. Soc. 100, 421 (1953). 599 551 (1995). 50 670 P. Grosfils, P. Gaspard, and T. V. de Bocarmé, The Journal 15 600 F. Schlögl, Z. Phys. 253, 147 (1972). 671 of Chemical Physics 143, 064705 (2015). 16 601 G. Nicolis and I. Prigogine, Self-Organization in Nonequilib- 51 672 R. Krishnamurthy, Chem. Eur. J. 24, 16708 (2018). 602 rium Systems: From Dissipative Structures to Order through 52 673 K. Andersson, G. Ketteler, H. Bluhm, S. Yamamoto, H. Oga- 603 Fluctuations (Wiley, New York, 1977). 674 sawara, L. G. M. Pettersson, M. Salmeron, and A. Nilsson, J. 17 604 R. J. Bagley, D. J. Farmer, and W. Fontana, in Artif. Life II 675 Am. Chem. Soc. 130, 2793 (2008). 605 (1991) pp. 141–158. 53 676 D. Angeli, P. D. Leenheer, and E. D. Sontag, Math. Biosci. 18 606 D. Segré, D. Lancet, O. Kedem, and Y. Pilpel, Orig. Life Evol. 677 210, 598 (2007). 607 Biosph. 28, 501 (1998). 54 678 J. Chen, S. Körner, S. L. Craig, D. M. Rudkevich, and J. Re- 19 608 E. Szathmáry, Philos. Trans. R. Soc. Lond. B. Biol. Sci. 361, 679 bek, Nature 415, 385 (2002). 609 1761 (2006). 55 680 G. King, J. Theor. Biol. 123, 493 (1986). 20 610 W. Hordijk and M. Steel, Journal of Theoretical Biology 227, 611 451 (2004). 21 612 D. G. Blackmond, Angew. Chem. Int. Ed. 48, 386 (2009). 22 613 P. Schuster, Monatsh. Chem. 150, 763 (2019). 23 614 M. Feinberg, Foundations of Chemical Reaction Network The- 615 ory (Springer, New York, 2019). 24 616 U. Barenholz, D. Davidi, E. Reznik, Y. Bar-on, N. Antonovsky, 617 E. Noor, and R. Milo, eLife 6, e20667 (2017). 25 618 A. Deshpande, M. Gopalkrishnan, and C. Science, Bull. Math. 619 Biol. 76, 2570 (2014). 26 620 M. Polettini and M. Esposito, J. Chem. Phys. 141, 024117 621 (2014). 9

681 SUPPLEMENTARY INFORMATION 731 that express an opposition (same vs different). This 732 opposition between same and different is e.g. found 733 in the IUPAC terminology for a homogeneous cataly- 682 I. TERMINOLOGY 734 sis (occurring in the same phase) and heterogeneous 735 catalysis. For the IUPAC recommended terminology 683 A. IUPAC Definitions 736 ’autocatalysis’, a consistent choice that expresses this 737 opposition is ’allocatalysis’ (self vs other).

684 To promote the consistent use of terminology, 685 IUPAC committees establish recommendations which

686 serve as a basis for the Compendium of Chemical Ter- 738 C. Remarks 687 minology (The Gold Book), of which some relevant 688 entries are reproduced here. Comments in italic have 739 1. Directionality of autocatalysis 689 been added for clarity.

690 Chemical Reaction: A process that results in the 691 interconversion of chemical species. Chemical reactions 740 In an allocatalytic reaction, catalysts are produced 692 may be elementary reactions or stepwise reactions. (It 741 and consumed in equal amounts, and the term does 693 should be noted that this definition includes experi- 742 not distinguish between the direction in which the re- 694 mentally observable interconversions of conformers.) 743 action proceeds (catalysis equally increases the rates 695 Detectable chemical reactions normally involve sets 744 of ’forward’ and ’backward’ reactions by the thermo- 696 of molecular entities as indicated by this definition, 745 dynamic criterion). In contrast, the term autocatalysis 697 but it is often conceptually convenient to use the term 746 only applies in one direction, due to the requirement 698 also for changes involving single molecular entities (i.e. 747 of having catalysts be the product of the net reaction. 699 ’microscopic chemical events’). 748 Consequently, in writing the simplest reaction balance

700 Catalyst: A substance that increases the rate of a A + B )−−−−* AB )−−−−* 2B, (10) 701 reaction without modifying the overall standard Gibbs 702 energy change in the (net) reaction; the process is 749 the definition applies when the net reaction A )−−−−* B 703 called catalysis. The catalyst is both a reactant and 750 exhibits acceleration, owing to B being a catalyst and 704 product of the (catalyzed) reaction. The words cata- 751 product of the net reaction. In the opposite sense, 705 lyst and catalysis should not be used when the added 752 B )−−−−* A, B is no longer a product, but it can still 706 substance reduces the rate of reaction (see inhibitor). 753 accelerate the reaction. This case is typically referred 707 Catalysis can be classified as homogeneous catalysis, 754 to as ’reverse autocatalysis’. 708 in which only one phase is involved, and heterogeneous 755 A simple experimental example is the catalytic dis- 709 catalysis, in which the reaction occurs at or near an 756 proportionation of water adsorbed on a copper surface 710 interface between phases. Catalysis brought about by 52 757 H O(ads) . 711 one of the products of a (net) reaction is called auto- 2

712 catalysis. Catalysis brought about by a group on a 1 713 reactant molecule itself is called intramolecular catal- 2H2O(ads) )−−−−* H2O · OH(ads) + H(ads), (11) 714 ysis. The term catalysis is also often used when the 2 H2O · OH(ads) )−−−−* H2O(ads) + OH(ads). (12) 715 substance is consumed in the (net) reaction (for exam- 716 ple: base-catalysed hydrolysis of esters). Strictly, such 758 Performing r1 and r2 from left to right, the catalyst 717 a substance should be called an activator. 759 H2O is consumed, yielding reverse autocatalysis for the 718 Autocatalytic Reaction: A(net) chemical reac- 760 net reaction: 719 tion in which a product (or a reaction intermediate)

720 also functions as a catalyst. In such a reaction the H2O(ads) )−−−−* H(ads) + OH(ads). (13) 721 observed rate of reaction is often found to increase 722 with time from its initial value. 761 In the opposite sense, the stoichiometric conditions for 762 autocatalysis are obtained:

H(ads)+OH(ads)+H2O(ads) )−−−−* 2H2O(ads). (14) 723 B. Allocatalysis

724 We refer to allocatalysis as the form of catalysis 763 2. Inhibition 725 in which, at the end of a catalytic cycle, the catalyst(s) 726 have not changed in number. By their equal participa- 727 tion in either direction, allocatalysts will thus drop out 764 A compound is a catalyst in the context of a par- 728 of the net reaction. Some authors refer to autocatalysis 765 ticular experimental condition where it accelerates the 729 as homocatalysis and allocatalysis as heterocatalysis, 766 rate of a reaction. A change of those conditions may 730 which is a lexicologically consistent choice of terms 767 change this label, e.g. the compound may become an 10

768 inhibitor instead. Consider 810 {r1, ..., rm} is reaction set, each rj being an ordered 811 pair (Xj, Y j) of non-empty and non-intersecting sub- 1 A + B )−−−−* AB, (15) 812 sets of S respectively called reactants and products, and 2 813 M is the stoichiometric matrix with coefficients mij. A + E )−−−−* AE, (16) 814 mij = 0 when the species si does not participate to the 3 AE + B )−−−−* AEB, (17) 815 reaction rj, mij < 0 when species si is a reactant of 4 816 rj, and mij > 0 when species si is a product of rj. AEB )−−−−* E + AB. (18) 817 The stoichiometric matrix M contains all the in- 769 For a rate acceleration to occur for the net reaction 818 formation about the hypergraph, the column of M 770 A + B )−−−−* AB, steps 2,3,4 must together proceed 819 corresponding to reactions, and the rows to species. 771 faster than reaction 1. Template-assisted ligation of 820 Definition 2. A subgraph of H = (S, R, M) is a 772 DNA or RNA provides an example where this may not 0 0 0 0 0 0 821 triplet H = (S ,R ,M ) where S is a subset of S, R 773 be the case: below a critical annealing temperature 822 is a set of reactions which reactants and products are 774 the detachment of the product (here, AB in step 4) 0 823 in S and intersect the reactant set and product set 775 becomes increasingly slower for longer strands, and the 824 of a reaction in R, with corresponding stoichiometric 776 template becomes an inhibitor. 0 825 coefficients M .

826 Definition 3. A reaction graph is square if it has the 777 3. Autonomy and siphons 827 same number of reactions and species.

828 Definition 4. A directed path is a sequence of alter- 778 The concept of autonomy is closely related to the 829 nating species and reactions, all reactions and species 779 concept a siphon in Chemical Reaction Networks (CRN) 53 830 being distinct, where species which precede and succeed 780 theory : A Siphon Σ is a subset Σ ⊂ S of all species 831 a reaction are respectively a reactant and a product of it. 781 S, which for each reaction that has a species in Σ as a 832 A path is a minimal path if it is not possible to form 782 product, has at least one of its reactants in Σ. 833 a path starting and ending at the same species using a 783 In this definition, a reaction can be irreversible 834 strict subset of its reactions. A path is semi-open if 784 in a mathematical sense: the reverse reaction does 835 it either starts or ends with an edge. 785 not exist. For a reversible CRN, a reverse reaction 786 does exist, and the siphon definition must apply to 836 Definition 5. In a directed path, an edge has a back- 787 the forward and backward direction, thus becomes 837 branch if one of its products is a species located up- 788 equivalent to autonomy. A reaction must then imply 838 stream in the path, and this product is called a back- 789 both one or more products and one or more reactants 839 product. An edge has a forward-branch if one of its 790 from Σ (siphon reactions), or none at all (external 840 reactants is a species located downstream in the path, 791 reactions). Note that autonomy is less restrictive than 841 and this product is called a forward-reactant. 24 792 the conditions posed in Barenholz et al. , where in 842 Definition 6. A cycle has an identical definition as a 793 addition species must be both reactant and product 843 path, except that the first and last species are the same 794 of a reaction. This last conditions is a proved as a 844 species. A cycle is minimal if it is not possible to form 795 consequence for minimal structures in our choice of 845 a cycle with a subset of its species and reactions. 796 formalism (see below).

846 Definition 7. A species S is the solitary reactant 847 (product) of a reaction if S is the only reactant (resp. 797 II. MATHEMATICAL DERIVATION OF 848 product) of this reaction, otherwise it is a co-reactant 798 AUTOCATALYTIC CORES 849 (resp. co-product).

850 Definition 8. A reaction is simple if has a single 799 A. Reaction graph definitions 851 reactant and a single product.

800 Reaction graphs described below correspond to 852 Definition 9. Consider a simple reaction R with re- 801 weighted directed hypergraphs without self-loops in the 853 actant x and product y, with respective 802 language of graph theory. In this section, the word cycle 854 −a and b. The contraction of R consists of: (i) re- 803 is used in the sense of graph theory (see below), not 855 moving R; (ii) merging x and y into a single species z, 804 in the sense of reaction cycle used to denote right null- 856 and; (iii) multiplying by a (resp. b) the stoichiometric 805 vectors of the stoichiometric matrix. In the following, 857 coefficients associated with z for all reactions formerly 806 letters used for scalars indicate positive numbers, and 858 associated with x (resp. y). 807 cycles and paths are understood as directed. 859 Definition 10. In a square graph, a perfect match- 808 Definition 1. A reaction graph H is a triplet 860 ing is a bijection between species and reactions. It 809 (S, R, M) where {s1, ..., sn} is the species set, R = 861 corresponds to all pairs (i, σ(i))i=1...N , where i is the 11

862 index of a species, σ is a permutation of [1...N], and 907 • A minimal cycle C has exactly two perfect match- 863 σ(i) is the index of a reaction. A perfect matching of 908 ings, which correspond to the matching of its 864 a graph, or a subgraph, G is called a G-matching. 909 reactions with their solitary reactants and soli- 910 tary products respectively. det(C) = 0 if and only 865 Remarks: 911 if C is weight-symmetric.

0 866 • Consider reactions E = ({a}, {b, c}) and F = 912 • Be H the hypergraph obtained from H by con- 867 ({b}, {c}); a − E − b − F − c is a path, but it is 913 tracting a simple reaction R. The cofactor expan- 0 868 not minimal because it contains a−E −c. Simple 914 sion implies that |det(H )| = |det(H)|. Addition- 0 0 869 paths are necessarily minimal (Fig. S1a). 915 ally, any cycle C of H becomes a cycle C in H ob- 0 916 tained by contracting R, and |det(C )| = |det(C)|. 870 • Consider E = ({a}, {b}) and F = ({b}, {a, c}); 871 a − E − b − F − c is a minimal path and F has a 872 back-branch (Fig. S1a). In particular, it contains 917 C. Autocatalytic cores 873 a cycle a − E − F − a.

874 • Simple paths are exactly cycle-free minimal 918 Definition 14. A core is a minimal productive reac- 875 paths. 919 tion graph, i.e. a reaction graph that does not contain 920 any productive subgraph. 876 • More generally, a minimal path can have back-

877 products and forward-reactants (Fig. S1a). 921 Finding cores is equivalent to finding minimal ma- 922 trices which are productive and autonomous. In this 878 • A hypergraph cycle can have reactions connecting n,m 923 section, we denote M ∈ IR the stoichiometric ma- 879 several of its species, but a minimal cycle cannot. 924 trix of the graph H, where n is the number of rows 880 Thus, a minimal cycle only contains simple paths 925 (species) and m the number of columns (reactions). 881 (it is identical to a cycle in a regular graph). 926 If M is productive, we denote γ a vector such that 927 M.γ 0. Productivity of M is indifferent to the sign 882 • Cycles are square. 928 of its columns as the sign of the coefficients of γ is not 929 constrained. Therefore, we choose the convention that 930 any productive vector γ is positive, up to taking the 883 B. Relationship with linear algebra 931 opposite for some columns of M.

932 Remarks: 884 x 0 denotes a real vector with only strictly

885 positive coordinates. 933 • M invertible implies M productive, as the image n,m 934 of M is then the full space IR , which contains 886 Definition 11. A matrix M is productive if there 935 the strictly positive orthant. 887 exists a real vector γ such that M. γ 0. Equivalently, n 888 M intersects the strictly positive orthant IR>0. 936 • Below, species or reactions ’can be removed’ is 937 understood as ’can be removed while preserving 889 Definition 12. A matrix is autonomous if its 938 productivity and autonomy’. Being able to re- 890 columns all contain a strictly negative and a strictly 939 move a row (a species) or a column (a reaction) 891 positive coefficient. 940 contradicts the minimality of a core.

892 Definition 13. A minimal cycle is weight- 941 • Removing columns (reactions) preserves auton- 893 symmetric if the product of the absolute values 942 omy, but not necessarily productivity. 894 of the stoichiometric coefficients associated with its

895 reactants equals the product of the stoichiometric 943 • Removing rows (species) preserves productivity, 896 coefficients associated with its products. Otherwise, the 944 but not necessarily autonomy. 897 cycle is weight-asymmetric. 945 • A row corresponding to a species that is always 898 Remarks: 946 a co-reactant or a co-product can be removed 947 without affecting autonomy (every column still 899 • Autonomous matrices are stoichiometric matrices 948 contains positive and negative coefficients) and 900 of reactions systems such that every reaction has 949 productivity. 901 at least one reactant and at least one product. 950 Proposition 1. In a core, every species is both a 902 • The Leibniz formula for the determinant shows 951 reactant and a product. 903 that the non-zero terms of det(M) exactly cor- 904 respond to the products of stoichiometric coeffi- 952 Proof. Every species must be produced, otherwise it 905 cients mi,σ(i) determined by each possible perfect 953 would not be possible to find a positive γ verifying 906 matching of the graph. 954 M.γ 0. Now, suppose a species S is never a reactant 12

955 and only a product. All reactions such that S is their 1005 Proposition 5. In a core, every species is involved in 956 only product can be removed without affecting the 1006 a cycle. 957 productivity of other species. In the resulting graph, 958 either S is not the product of any reaction anymore, 1007 Proof. Obvious from Property 4, considering the back 959 or S is only a co-product of the remaining reactions. 1008 and forth paths joining any two species. 960 In both cases, S can be removed without affecting 1009 Proposition 6. Consider a partition of a core into 961 autonomy. 1010 two species sets V and W . V cannot be upstream of 1011 W , i.e. reactions with products in W cannot have all 962 Proposition 2. A core is square, invertible, every 1012 their reactants in V . 963 species is the solitary reactant of a reaction, and is 964 reactant for this reaction only. 1013 Proof. Suppose on the contrary that M can be written   n,m AB 965 Proof. Consider M ∈ IR a productive stoichiomet- 1014 M = where A and B span V , C spans W , 0 C 966 ric matrix with n species and m reactions, with rank 1015 and B ≤ 0. Given the latter inequality, Property 1 967 k. Obviously k ≤ m, n. We must also have m = k, 1016 implies that A is non-empty and autonomous. Consider 968 otherwise a column could be removed while preserving 1017 γ > 0 such that M.γ 0. Be α and β the respective 969 the image of M, thus productivity. Additionally, every 1018 restrictions of γ to the reaction spaces of A and B. 970 species must be the solitary reactant of at least one 1019 Then A.α + B.β 0. As B.β ≤ 0, we have A.α 0, 971 reaction, otherwise it could be removed. This implies 1020 contradicting the minimality of M. 972 m ≥ n. Overall, k = m ≥ n ≥ k, so that k = m = n, 973 meaning that M is square and invertible. As every 1021 The definitions below are generalizations of the 974 species S is the solitary reactant of at least one reac- 1022 notion of ear decomposition in regular graphs (Fig. 975 tion R and m = n, the species is a reactant for only 1023 S1b). 976 R.

1024 Definition 15. A hyper-ear is a hypergraph compris-

977 Remark: Property 2 implies that M can be re- 1025 ing a minimal cycle C, called the base cycle, such that 978 arranged in such a way that it has a strictly negative di- 1026 C has a reaction with a product x outside C, and a min- 979 agonal, and only non-negative off-diagonal coefficients. 1027 imal path P starting at x, such that its last reaction, 1028 R, has a product in C, R being the only P-reaction 980 Proposition 3. In a core, a square autonomous sub- 1029 to have a product in C.A proto-ear has a similar 981 matrix must have a product outside its set of species. 1030 definition, but where C is a simple path (called the base 1031 path) instead of a minimal cycle. 982 Proof. Be A a square autonomous submatrix of M and   AC 1032 Description of proto-ears and hyperears - In 983 write M = . We need to show that B > 0. BD 1033 a hyper-ear or a proto-ear, any C-reaction can have x 984 As A is autonomous, every A-reaction has a reactant 1034 as a product, and any C-species can be the product of 985 in the set of A-species. By Property 2, reactants are 1035 R, the last reaction of P (Fig. S1b). We denote by u 986 always solitary, thus B ≥ 0. Now suppose B = 0. Then 1036 a species which is the product of R, v any C-species 987 det(M) = det(A).det(D). Either det(A) = 0, implying 1037 which is the reactant of a C-reaction which produces 988 det(M) = 0, contradicting M invertible, or det(A) 6= 0, 1038 x,’−’ a simple path (including the empty path), o a 989 then A is productive, contradicting the minimality of 1039 simple path comprising at least one species, o being a 990 M. 1040 subcase of the ’−’ category. By convention, a uv motif 1041 can correspond to a single same species which is both a

991 Proposition 4. A core is strongly connected. 1042 R-product and a reactant for x production. Any proto- 1043 ear falls into a class described by a chain made of the 1044 symbols u, v, o, and ’−’, as soon as it contains at least 992 Proof. Consider a species x0 of M. Below, we recur- 1045 a u and a v. Reciprocally, any such chain represents 993 sively construct sets Dk = {x1, ..., xk} of increasing 1046 one or more proto-ears. Any hyper-ear is obtained by 994 cardinal, such that every xi is downstream x0, until 1047 cyclic closure of the base path of a proto-ear. Such 995 k = n − 1, implying that for any species y 6= x, there 0 0 1048 closure is denoted by the ∗ symbol at the beginning 996 exists a path from x to y. We denote R(S) the only 1049 and the end of the chain. 997 reaction with reactant S, which is well defined by Prop- 998 erty 2. Step 1: We take x1 as a product of R(x0). 1050 Theorem 1. Cores are of one of the following types: 999 Step k: Suppose Dk exists, k < n − 1. Re-arrange 1000 M such the top left block A of size k corresponds to 1051 • TYPE I: a weight-asymmetric minimal cycle; 1001 species set Dk and reaction set R(Dk). As A is au- 1002 tonomous, by Property 3, there exists a species xk+1 1052 • TYPE II: a cycle comprising all species as soli- 1003 outside of Dk which is downstream Dk, hence Dk+1 1053 tary reactants, and one or more weight-symmetric 1004 exists. 1054 sub-cycles without intersection between them; 13

1055 • TYPES III-IV-V: one of the three types of hyper- 1106 strictly positive. In all these cases, we have det(M) > 1056 ears, any subcycle of which is weight-symmetric: 1107 −1 + ac + be = 1.

1057 (III) ∗u−vo∗; (IV) ∗u−u−v∗; (V) ∗u−v−u−v∗. 1108 Step 2: All types are minimal

1109 TYPE I: Removing any subset of species or reac- 1058 Proof. Core type are represented in Fig. S1c. H 1110 tions would result in an acyclic graph, contradicting 1059 denotes the reaction graph and M its stoichiometric 1111 Property 5.

1060 matrix. 1112 TYPE II: Removing any subset of species or reac- 1061 1113 tions either leads to a hypergraph where a subset of

1062 SUFFICIENCY 1114 species is upstream the rest, contradicting Property 6,

1063 1115 or to a non-invertible minimal cycle.

1064 Step 1: All types have a non-zero determinant. 1116 TYPE III-V: Removal of any set of reactions (and 1117 a fortiori of species, given that they all are solitary 1065 TYPE I: Invertibility is a direct consequence of 1118 reactants) leaves at most one cycle in the graph, the 1066 the remark on the determinant of minimal cycles. 1119 latter being non-invertible. 1067 TYPE II: Consider a minimal weight-symmetric 1120 1068 subcycle C of H. Any H-matching consistent with a 1121 NECESSITY 1069 C-matching can be associated to another H-matching 1122 1070 obtained by reversing the C-matching. Thus, det(H) 1071 has the form det(C).α + β, where the first term com- 1123 By Property 5, there exists a minimal cycle C in 1072 prises all C-consistent H-matchings. As det(C) = 0, it 1124 H. Either C is weight-asymmetric, then H = C is of 1073 suffices to show that β is a single non-zero term. By 1125 TYPE I. Or C is weight-symmetric. Then, by Property 1074 Property 3, C must have a reaction R1 with a prod- 1126 3, there is a C-reaction with a product x outside C. By 1075 uct x1 outside of C. As by definition of TYPE II, 1127 Property 4, there exists a path P from x to any species 1076 subcycles are non-intersecting, R1 must be the only 1128 in C (Fig. S1d). We can take P minimal and such that 1077 C-reaction with a product outside C, and x1 must be 1129 only its last reaction has a product in C, thus forming 1078 the only product of R1 outside C. All non-zero terms 1130 a hyper-ear ∗mu−vm∗ where m symbols stand for any 1079 in β require matching R1 to x1, otherwise R1 would 1131 other hyper-ear motif. Below, we show that hyper-ears 1080 be matched with a C-species, which would impose a C- 1132 are either one of the types II-V, or contain a TYPE 1081 matching (C being a minimal cycle), contradicting the 1133 II, III or IV core as a subgraph, overall demonstrating 1082 definition of β. We now order the indexes k following 1134 that H is necessarily a hyper-ear of TYPE II-V. 1083 the downstream order of the xk species along H, and 1084 show recursively that β corresponds to the H-matching 1135 Before continuing the proof of the theorem, we 1085 where every Rk matches xk: (step 1) By construction, 1136 show an additional property based on the sufficiency 1086 R1 matches x1. (step k) Suppose Rk−1 matches xk−1. 1137 of TYPE I and TYPE III. 1087 Either Rk is a simple reaction, and, as its reactant xk−1 1088 is already matched, Rk necessarily matches its only 1138 Proposition 7. In a core, a minimal path can have 1089 product xk. Or Rk has a back-branch forming a mini- 1139 back-branches but no forward-branch, and the cycles 1090 mal cycle. However, cycles are non-intersecting, so that 1140 formed by back-branches are non-intersecting. 1091 the back-product xi of Rk is necessarily downstream 1092 xk and upstream xk−1, thus xi is already matched by 1141 Proof. Consider a minimal path P in a core H. 1093 the recursion hypothesis. Thus, Rk can only match xk. 1142 Forward-branches require reactions to have multiple

1094 TYPES III-IV-V: Consider the stoichiometric ma- 1143 reactants, contradicting Property 1. Therefore, P is 1095 trix M with non-negative off-diagonal coefficients: 1144 either a simple path or it has back-branches. Suppose 1145 there exists two back-products y and z of the respective   −1 c e 1146 reactions ry and rz, respectively forming intersecting M =  a −1 f  (19) 1147 cycles Cy and Cz (Fig. S1e). Without loss of gener- b d −1 1148 ality, y and z can be taken closest, y upstream of z, 1149 and such that there is no other cycle nested within 1096 We have det(M) = −1+df +ac+ade+bcf +be. Strict 1150 Cy and Cz than possibly themselves. Cz is necessarily 1097 subcycles must be weight-symmetric, as otherwise, the 1151 weight-symmetric, as it would otherwise contradict the 1098 minimality of M would be contradicted. Consequently, 1152 minimality of H. There are two cases. Case 1: rz is 0 1099 when their factors are both non-zero, the products df, 1153 upstream ry (Cz is nested within Cy). Call P the path 1100 ac, and be must be equal to 1. By contracting all 1154 from the product of rz to ry then y then z. Then Cz 0 1101 simple reactions and multiplying the columns of M 1155 and P form a TYPE III core. Case 2: ry is upstream 0 1102 if necessary so that solitary reactant stoichiometries 1156 rz (Cz and Cy are entangled). Call P the path joining 0 1103 are all normalized to −1, TYPE III corresponds to 1157 y to z. Then Cz and P form a TYPE III core. Overall, 1104 d = f = 0 and a, b, c, e > 0, TYPE IV to f = 0 1158 we have shown that the existence intersecting cycles 1105 and a, b, c, d, e > 0, and TYPE IV to all coefficients 1159 along a minimal contradicts the minimality of H. 14

1160 We now go back to the proof of the theorem. 1216 this class which does not contradict the minimality of 1217 H.

1161 Proof. Consider the minimal path P of the hyper-ear, 1218 1162 where P starts at x. By Property 7, P has no forward- 1163 branch. 1219 III. CHEMICAL INTERPRETATION OF 1164 Case A (TYPE II): Suppose P has one or more 0 1220 AUTOCATALYTIC CORES 1165 back-branch. Consider a reaction R of P which has a 0 1166 back-branch, with back-product y and forming a cycle 0 0 1167 C . C is necessarily weight-symmetric by minimality of 1221 We remind that the notion of ’graph cycle’ (closed 0 0 0 1168 H. Call x the product of R outside C . Consider the 1222 successions of nodes and edges) differs from the notion 0 0 1169 path P starting at x , then going downstream of P 1223 ’reaction cycle’ (right null vectors of the stoichiomet- 1170 until it reaches a shortest subpath u − v of C, then to x, 1224 ric matrix). The latter name was historically chosen 0 1171 and finally from x to y in P. By Property 7, all cycles 1225 because the two notions overlap in the particular case 1172 along P are non-intersecting, and by construction, C 1226 of the simplest catalytic cycles, but there are counter- 1173 is non-intersecting with the cycles of P. Consequently, 1227 examples for both (we give one later). We employ 0 0 1174 P only has non-intersecting cycles, so that C and 1228 ’reaction cycle’ and ’graph cycle’ to distinguish these 0 1175 P form a TYPE II motif. Thus, the base cycle of 1229 notions.

1176 the hyper-ear cannot have species outside u − v, as 1230 By the minimality of autocatalytic cores, an au- 0 0 1177 otherwise the core made of C and P would be a strict 1231 tocatalytic motif is either a core, or it contains one or 1178 subset of H, contradicting its minimality. This shows 1232 several cores. If P is a property of cores, then for any 1179 that H is a ∗u − v∗ hyper-ear with a minimal path 1233 autocatalytic motif, either it verifies P , or it contains 1180 comprising non-intersecting cycles, the latter being 1234 an autocatalytic motif verifying P . In particular, 1181 necessarily weight-symmetric. Thus H is a TYPE II

1182 core. 1235 • by Property 1, every autocatalytic motif contains

1183 Case B (TYPES III-V): Suppose P has no 1236 an invertible autocatalytic motif; 1184 back-branch, in other words P is a simple path. 1237 • by Property 2, every autocatalytic motif contains 1185 Subcase B.1 (TYPE III): Suppose H a hyper-ear 1238 an autocatalytic motif such that every product 1186 which has only one u and one v symbol, thus a ∗u−v−∗ 1239 is also a catalyst of the reaction, or equivalently, 1187 hyper-ear. The case ∗u − v∗ with a simple path is 0 0 1240 such that every species appears on both side of 1188 covered by the definition of TYPE II. If the − symbol 1241 the total chemical equation. 1189 does not represent an empty path, then it comprises 1190 at least one species, so that H is a ∗u − vo∗ hyper-ear 1242 The five categories of cores are schematically rep- 1191 with simple path P, corresponding to TYPE III. 1243 resented in Fig. S1c. The convention of representation 1192 Subcase B.2 (TYPE IV): Suppose the hyper-ear 1244 are as follows. The edges of the graph correspond to 1193 comprises one u or v symbol in addition to the u − v 1245 reactions and the yellow nodes to species. Reaction 1194 motif. We first note that u − u − v and u − v − v proto- 1246 have two sides, the reactant side and product side, and 1195 ears are isomorphic to ∗u − v − ∗, which falls into the 1247 each side of the edge representing the reaction connects 1196 categories of TYPE II or TYPE III cores. Therefore, 1248 to one or several nodes representing the species, with 1197 in a core, a hyper-ear containing two successive u or v 1249 a stoichiometry for each connection. In the Type I 1198 symbols is necessarily of the form ∗u − u − v∗ or ∗v − 1250 core, the edge from the top node to the bottom node 1199 v − u∗, or their cyclic permutations, as any additional 1251 ends with a fork, the two ends of which connect to a 1200 symbol would allow to find a strict subgraph u − u − v 1252 single bottom node (S1c). This means that the bottom 1201 or u−v−v forming a core, contradicting the minimality 1253 species is produced with a stoichiometry of 2 by this 1202 of H. Furthermore, ∗u − u − v∗ and ∗v − v − u∗ hyper- 1254 reaction. For simplicity, we have represented the con- 1203 ears are isomorphic. Indeed, the matrices of their 1255 nection between all other edges and nodes without fork. 1204 reduced forms correspond to the matrix shown in the 1256 However, in all generality, any connection between a 1205 SUFFICIENCY section of the theorem, where exactly 1257 edge and a node could be a fork (e.g. could be of 1206 one coefficient is set to zero. Thus, these motifs as well 1258 stoichiometry >1), as soon as the rules on graph cycle 1207 as all their cyclic permutations, fall into the category 1259 symmetry are respected, as explained below. Forks also 1208 of TYPE IV. 1260 appear in Types II-V, but where they connect to two

1209 Subcase B.3 (TYPE V): Subcase B.2 imposes that 1261 distinct nodes, which are two distinct product species. 1210 any motif containing four or more u or v symbols must 1262 The orange squares indicate that the reaction repre- 1211 alternate u and v symbols. Any motifs with five or 1263 sented can be replaced by a chain of reactions with a 1212 more u or v symbols contains, up to permutation, a 1264 single reactant species and a single product species. 1213 u−v−u−v proto-ear as a strict subgraph. However, the 1265 The mathematical derivations are done without 1214 latter is isomorphic to TYPE IV cores, which implies 1266 constraint on the number of reactants or products a 1215 that ∗u − v − u − v∗ (TYPE V) is the only motif in 1267 reaction step can have. However, in a chemical system, 15

1268 elementary reactions are in principle either unimolecu- 1326 Notably, every core follows the basic structure 1269 lar or bimolecular. This restricts the range of possible 1327 represented on Fig S1d, comprising a base cycle C and 1270 core graphs. This restriction operates only at the level 1328 a minimal path (in the sense of reaction hypergraphs, 1271 of the stoichiometry of each single reaction, not at the 1329 see former section) starting from a reaction fork and 1272 level of the overall structure of cores. Indeed, even 1330 joining back to a node of C. This structure should 1273 without invoking the restriction on bimolecularity, the 1331 enable a systematic algorithmic search for autocatalytic 1274 mathematical derivation results in cores such that ev- 1332 motifs in large stoichiometries. 1275 ery reaction step involves at most a single species as 1333 We illustrate the concepts of autocatalytic sub- 1276 reactant and at most two species as products, so that 1334 motif and the properties demonstrated above on a toy 1277 bimolecularity can always be respected, provided rules 1335 model for the Formose reaction, to which we have added 1278 on the stoichiometry of each connection. 1336 one auxiliary reaction C3 )−−−−* D3, so that we obtain 1279 In a chemical autocatalytic core, an edge with a 1337 a stoichiometric matrix 1280 fork connecting to two distinct nodes (e.g. a reaction 1281 with two distinct product species) must have a stoi- 1 2 3 4 1282 chiometry 1 for each connection in order to respect   1 C1 −1 −1 0 0 C + C −−* C 1283 bimolecularity. If a reaction has only one product 1 2 )−− 3 C2 −1 0 2 0 2 1284 species, then its stoichiometry can be 1 or 2, as soon   C1 + C3 )−−−−* C4 ν = C3  1 −1 0 −1 (20) 1285   3 as it respects rules on cycle stoichiometry, which we C −−* 2 C D3 0 0 0 1  4 )−− 2 1286 detail now. 4 C 0 1 −1 0 4 C3 )−−−−* D3 1287 Consider a simple graph cycle C, simple meaning 1288 that every reaction as a single reactant species and 1338 this full matrix has a left nullvector l = (1, 2, 3, 3, 4), 1289 a single product species. Note ai (resp. bi) the stoi- 1339 i.e. we have a mass-like conservation law 1290 chiometry of the reactant (resp. product) of reaction i. Q Q 1291 If i ai =6 i bi, we say that C is weight-asymmetric, L = n + 2n + 3n + 3n + 4n . (21) C1 C2 C3 D3 C4 1292 otherwise it is weight-symmetric. Weight-asymmetric

1293 simple graph cycles are an example of graph cycle 1340 Thus, this matrix does not correspond to an autocat- 1294 which has no reaction cycle. Weight-symmetric graph 1341 alytic motif. Upon removing C1 (then considered as a 1295 cycle taken in isolation have a reaction cycle (their 1342 feedstock molecule), we obtain an autocatalytic matrix 1296 determinant is zero as shown in the derivation of cores 1343 ν ∗ 1297 above), thus they correspond to allocatalytic cycles 1298 in the context of a larger reaction graph where the 1 2 3 4 1299 reactions of the graph cycle consume and/or produce 1 C −−* C 1300 species outside of its own species set. The most classic C −1 0 2 0  2 )−− 3 2 2 1301 example of an allocatalytic cycle is represented in Fig C3 1 −1 0 −1 C3 )−−−−* C4 ν ∗ =   (22) 1302 3 S1f, where reactant S is provided from the environment, D3 0 0 0 1  C4 )−−−−* 2 C2 1303 binds to catalyst E to form complex ES converted into C 0 1 −1 0 4 −−4* 1304 EP, finally dissociated into E which is recycled, and C3 )−− D3 1305 product P.

1344 1306 Type I cores are weight-asymmetric simple graph This matrix is autonomous and invertible with 1345 1307 cycles, as C in S1c . Consequently, in types II-V, any inverse: 1308 simple graph cycle must be weight-symmetric (these 1 1 2 2 2 1309 graph cycles corresponding reaction cycles), otherwise 2 1 1 1 2 −1 C2 C3 D3 C4 1310 ν =   = (g ,gg ,gg ,gg ). (23) the core would contain a Type I core, contradicting its ∗ 3 1 1 1 1 1311 minimality. For example in S1c: in Type II, C must 0 4 0 0 1 0 1312 be symmetric, but this does not apply to C because 0 1313 it is not even a simple cycle; in types III-V, C, C and 00 1346 The columns of the inverse are reaction vectors 1314 C must be symmetric. Type II cores consist of a 0 1347 associated to a given species, of which they produce 1315 large graph cycle (C in Fig S1c) comprising smaller 1348 one extra unit. We will refer to them as elementary 1316 graph cycles embedded within it (for example C in Fig 1349 production modes. 1317 S1c). Given the above, each of these smaller cycles D3 T 1318 must be weight-symmetric, and obeys the definition of 1350 Among the production modes, g = (2, 1, 1, 1) 1319 an allocatalytic cycle. The Type II category includes 1351 is the sole vector which uses the auxiliary reaction, and 1320 circularly closed successions of such allocatalytic cy- 1352 D3 only ever occurs as a product 1321 cles, where the product of one allocatalytic cycles is D g 3 1322 the catalyst of the next allocatalytic cycle, which is 2C + 2C + C )−−−−−−−* 2C + 2C + C + D . (24) 2 3 4 D 2 3 4 3 1323 a typical example of autocatalytic set. In addition −g 3 1324 however, Type II allows intermediate non catalyzed C C D C 1325 reaction steps. 1353 If we now consider Γ = g 2 + g 3 + g 3 + g 4 , we 16

1354 obtain a reaction balance such that every species has a 1377 A. Autoinduction 1355 net increase:

1378 The concept of autoinduction was put forward by Γ : 7C2 +6C3 +4C4 −−→ 8C2 +7C3 +5C4 +D3. (25) 21 1379 D. Blackmond , to distinguish between autocataly- 1380 sis that is mediated by external catalysts (i.e. not 1356 ν ∗ is not minimal as D3 is only produced, and does 1381 part of the autocatalysts that are reproduced) called 1357 not participate as a catalyst. Indeed, Property 2 im- 1382 ’autoinduction’, and autocatalysis that functions with- 1358 plies the existence of an autocatalytic submotif without 1383 out the aid of external catalysts (i.e. all catalysts are 1359 species which does not participate as a catalyst. An 1384 autocatalysts). We thus obtain a hybrid of pure al- 1360 autocatalytic submotif is obtained by removing the 1385 locatalysis and autocatalysis, for which the simplest 1361 reaction 1386 example would be

C3 )−−−−* D3, (26) A + B + E )−−−−* 2A + E. (31)

1362 and the species D3, to obtain the autonomous subma- 1387 The IUPAC definitions impose that autoinduction quali- 1363 trix ν¯ 1388 fies as autocatalysis. It follows then from our framework 1 2 3 1389 that we can find autocatalytic cores in autoinduction 1 1390 networks, which is indeed confirmed in Fig. S3 for the   21 C −−* C 1391 C2 −1 0 2 2 )−− 3 two types of autoinduction that have been proposed . 2 ν¯ = C 1 −1 0 (27) 3  C3 )−−−−* C4 1392 The concept of ’autoinduction’ addresses a no- C4 0 1 −1 3 1393 tion of self-sufficiency (also encountered in RAF sets, C4 )−−−−* 2 C2 1394 Sec.III D): external allocatalysts become essential to

1395 1364 The matrix ν¯ is also invertible succesful autocatalysis, yet they are not reproduced. 1396 Depending on the context, they could be seen as part   1 1 2 2 1397 of the environment, in the same sense as essential feed- −1 C C C ν¯ = 2 1 1 2 = (g 2 ,gg 3 ,gg 4 ). (28) 1398 stock species. Examples of autoinduction occur in 3 1 1 1 1399 autocatalytic metabolic networks (with locally allocat- 1400 alytic enzymes) or e.g. the formose reaction which is −1 1365 The columns of ν¯ correspond to reaction vectors, 1401 often catalyzed by base and divalent metal ions.

1366 whose application yields one net copy of the corre- 1402 The presence of external allocatalytic cycles does (k) 1367 sponding molecule, e.g. ∆nk = (ν¯ · g )k = 1. This is 1403 not add new cycles to the autocatalytic core. A practi- 1368 illustrated for C2 in Fig. S2. 1404 cal consequence is that one can write catalyzed reac- 1405 tions very compactly for the core, while still maintain- 1406 ing nonambiguity, which we make use of in our analysis 1407 of metabolic cycles.

1408 B. Metabolic cycles 1369 The individual replication cycles are

C g 2 C + C + C )−−−−−−−* 2C + C + C (29) 1409 At least two metabolic cycles are known to be 2 3 4 C 2 3 4 −g 2 1410 autocatalytic. In our analysis of autoinduction, we C 1411 pointed out that the core does not contain external g 3 2C + C + C )−−−−−−−* 2C + 2C + C 1412 allocatalysts (here: enzymes). Written purely in terms 2 3 4 C 2 3 4 −g 3 1413 of autocatalysts, we find a Type II autocatalytic core C g 4 1414 for the reverse Krebbs cycle (Fig. S4a). For the Calvin 2C + 2C + C )−−−−−−−* 2C + 2C + 2C 2 3 4 C 2 3 4 −g 4 1415 cycle depicted in Fig. S4b, we identify three Type I 1416 cores (two structurally equivalent, differing only in the C C C 1370 We can construct Γ = g 2 + g 3 + g 4 , which leads to 1417 reaction chosen to link the same core species) and 4 of 1371 the overall reaction 1418 Type II.

Γ 5C2 + 4C3 + 3C4 )−−−−* 6C2 + 5C3 + 4C4. (30) −Γ 1419 C. Chemical amplification

1372 We see here that every species has a net production 38,54 1373 and is on both sides of the balance, thus participates 1420 Chen et al. advanced a general strategy to 1374 as a autocatalyst. Furthermore, ν¯ is a Type I core and 1421 achieve self-amplifying behavior, demonstrated by the 1375 consequently does not contain any smaller autocatalytic 1422 encapsulation of the reactive compound DCC in a cav- 1376 submotifs. 1423 itand (indicated by an oval around the molecule in 17

1424 EDCC, Fig. S5), which they referred to as chemical 1449 molecule involved in a reaction in R, and every reactant 1425 amplification. This phenomenon is consistent with the 1450 in R can be constructed from the food set f by successive 1426 IUPAC definition of autocatalysis, which we will now 1451 applications of reactions from R. The philosophy 1427 illustrate by finding its autocatalytic core. 1452 behind the RAF set is that ’every reaction’ in a RAF- 1453 set can be accelerated by the catalysts themselves. 1428 Let us first assess a reaction network formed by 1454 RAF sets do not perform autoinduction, except when 1429 two reactions proposed by Chen et al, in absence of 1455 members of the food set f within a RAF-set also serve 1430 the cavitand complexes. In that case, applying our for- 1456 as allocatalysts. 1431 malism readily reveals that the network cannot exhibit 1432 autocatalysis: there are no autocatalytic submatrices. 1457 In the RAF-set formalism, reaction and catalysis 1458 are distinct mathematical objects. Graphically, RAF- 1 2 1459 sets are typically represented as bipartite graphs (S6a), DCC−1 0  1460 with nodes (white squares) corresponding to reactions, X −1 0 1461 which connect to nodes (colored circles) which serve   1 I1  1 −1 DCC + X )−−−−* I1, 1462 as reactants (entering bold arrow) and products (leav- ν =   (32) Y  0 −1 2 1463 ing bold arrow) via directed edges. A dashed arrow   I1 + Y )−−−−* DCU + Z. DCU 0 1  1464 connecting to a reaction indicates that a species is a Z 0 1 1465 catalyst for a reaction. Within the RAF framework, 1466 the following terms are employed as distinct: i) auto- 1433 Now, let us add the species EDCC and two exchange 1467 catalytic reaction ("is a single chemical reaction for 1434 reactions with DCU and Z that liberate DCC, i.e. 1468 which one of the products also catalyzes the reaction"), 1469 ii) autocatalytic cycle ("is a sequence of reactions that, 1 DCC + X )−−−−* I1, (33) 1470 once completed, results in two (or more) copies of the 2 1471 molecule that was started with"such as the toy formose I1 + Y )−−−−* DCU + Z, (34) 1472 reaction), iii) autocatalytic set (or RAF set). 3 EDCC + DCU )−−−−* DCC + EDCU, (35) 1473 It is important to note that the RAF-set formalism 1474 and the distinctions above depend on the chosen level of −−4* EDCC + Z )−− DCC + EZ, (36) 1475 coarse-graining of the description. In that description 1476 (see Fig. S6a) allocatalysts in the same allocatalytic 1435 for which we have the matrix 1477 cycle are represented by a single species. The combi- 1478 nation of reactions that form the allocatalytic cycle 1 2 3 4 1479 is represented by a single dashed line, connecting to   DCC −1 0 1 1 1480 an uncatalyzed reaction. Here, there exists a level of X −1 0 0 0  1481 description where a RAF set appears as autocatalytic I1    1 −1 0 0  1482 reaction involving catalytic cycles. By comparing Figs. Y  0 −1 0 0    1483 S6a, we see that the level of description is critical in ν = DCU  0 1 −1 0  (37)   1484 assessing whether a network is a RAF-set or not. From Z  0 1 0 −1   1485 the definition of a RAF-set, the detailed version of the EDCC 0 0 −1 −1   1486 network in Fig.S6a would not be a RAF, but the less EDCU 0 0 1 0  1487 detailed description next to it would be. EZ 0 0 0 1 1488 Another example is the toy formose reaction, 1489 where two different choices of coarse-graining yield 1436 This network admits an autocatalytic submatrix, and 1490 two different description in the RAF framework. Even 1437 we obtain an autocatalytic core of Type III consisting 1491 when neglecting the role of catalytic base and metal 1438 of the species DCC, I1, DCU and Z, as can be seen in 1492 ions, the toy formose reaction without coarse-graining 1439 Fig. S5b. The new reagent EDCC serves as a feedstock 29 1493 is not a RAF-set , since its individual steps are not 1440 compound, that allows to dispense new DCC, that can 1494 catalyzed: 1441 now serve as an autocatalyst. The generality of the 1442 mechanism follows from an exchange that could have C1 + C2 )−−−−* C3, C1 + C3 )−−−−* C4, C4 )−−−−* 2C2. 1443 been performed with different reactants than DCU and 38,54 (38) 1444 Z to yield an equivalent network, as shown in Refs . 1495 Framed solely in terms of C1 and one autocatalyst (e.g. 1496 C2, C4) one could propose a description in which we are 1497 oblivious of these steps, and write it as an autocatalytic

1445 D. Autocatalysis in RAF sets 1498 reaction in the sense of RAF sets, such as

2C1 + C2 )−−−−* 2C2, 4C1 + C4 )−−−−* 2C4. (39) 1446 A definition of Reflexively Autocatalytic Food- 20 1447 generated sets can be found in Ref. : a set of reactions 1499 This would work as long as we consider them satisfac- 1448 R is RAF if every reaction is catalyzed by at least one 1500 tory levels of description, depending on the context 18

1501 (e.g. further reactivity of intermediates, separation of 1546 {1, 2, .., s}, the entry βij encodes the contribution for 1502 timescales): either of these hypothetical cases satisfy 1547 the allocatalytic cycle 1503 the RAF criteria. ∗ II I c I I 1504 An interesting contrast occurs between RAF sets A i + A j )−−−−−* A i + A j (43) −c∗ 1505 and our formalism: RAF-sets require a level of coarse- 1506 graining without intermediates to define catalysis in ∗ 1548 where c denotes a reaction vector for the catalytic 1507 terms of single catalysts, which ensures a compact and 1549 cycle, for any description that verifies nonambiguity. 1508 unique description. Our formalism requires each cat- 1550 When i = j, this is a Type I autocatalytic cycle (Fig. 1509 alytic cycle to have at least one intermediate to satisfy 1551 S6c). When i 6= j (cross-incorporation), Type II auto- 1510 nonambiguity. Although less compact, the description 1552 catalytic cycles are obtained, which are built up from n 1511 is flexible: adding further steps is always possible and 1553 sequential allocatalytic incorporation steps and which 1512 does not alter the conclusions, which must remain in 1554 end in the incorporation of the original amphiphile. 1513 accord with chemical definitions. 1555 Fig. S6 shows examples for n = 2 and n = 3. The 1514 In describing catalysis in terms of uncatalyzed 1556 importance of the allocatalysis step (43) is graded by 1515 reactions, it becomes possible to formalize the under- 1557 the entry βij. For a given n-step autocatalytic motif 1516 lying structure of catalysis in different models. We 1558 to exist, we require 1517 will now illustrate how this formally establishes that n 1518 all autocatalysis in the GARD model qualifies as a Y β > 0, s 6= s 6= .. 6= s . (44) 1519 RAF-set. sk+1sk 1 2 n k=1

1559 In practice, all βij > 0, so all motifs exist in principle. 1520 E. GARD model 1560 The starting point of GARD is (40), i.e. a coarse- 1561 grained description in which allocatalysis can be de-

1521 GARD stands for graded autocatalysis replica- 1562 scribed as a single step and a single allocatalyst. From 18 1522 tion domain , and is a model for the autocatalytic 1563 Fig. S6, we see that we can obtain networks in GARD 1523 assembly of amphiphile assemblies (e.g. micelles). It 1564 that would be RAF-sets in the RAF-framework by 1524 describes micelles with a composition n = {n1, ..., ns}, 1565 coarse-graining an incorporation cycle to convert it to 1525 that follow an evolution equation 1566 catalysis in the RAF sense. The illustrated procedure 1567 extends to all autocatalysis in GARD, which is fully   s 1568 characterized by the continuation of the structures in dni + −  1 X = k ρ N − k n 1 + β n . (40) 1569 Fig. S6 to their n-step Type II analogues. (44) guar- dt i i i i  N ij j j=1 1570 antees that each reaction in GARD is catalyzed. It 1571 follows that all autocatalysis in the GARD model can 1526 The surfaces are in contact with a reservoir, that con- 1572 formally be treated as a RAF-set. 1527 tains species Z at concentration ρi and that can enter i 1573 Interestingly, the RAF-set formalism treats catal- 1528 the surface, which has an area proportional to N. The + 1574 ysis as pertaining to chemistry in single phases, with 1529 incorporation happens with a base rate of k , but can i 1575 the environment supplying food locally through rapid 1530 be facilitated by other amphiphiles, for which the cat- 1576 exchange. In GARD, we instead have phase-transfer 1531 alytic rate enhancement is characterized by βij.A 1577 catalysis between a bulk medium (II) and an interface 1532 special ingredient in GARD is the division process, 1578 (I). A species in the bulk then serves as the food. Once 1533 which splits a mature aggregate in two new ones, after 1579 the exact same species enters the interface, however, it 1534 achieving a maximal size. 1580 may cease to be abundant or have a fixed concentration 1535 We examine here stoichiometric mechanisms lead- 1581 due to rapid exchange. It may thus no longer have 1536 ing to (40). Let us consider amphiphiles A and B, 1582 the properties ascribed to the food set in the bulk and 1537 which can be in the micelle or reservoir, labelled I 1583 should ipso facto be treated as a different species as 1538 and II (see Fig. S6). To catalyze incorporation of the 1584 described in the final section of the main text. 1539 other, a complex is formed with a reservoir species, and 1540 subsequent dissociation takes place in the micelle

I II I I I A + B )−−−−* [AB] )−−−−* A + B , (41) 1585 IV. THE EXTINCTION PROBLEM AND FIXATION BI + AII )−−−−* [BA]I )−−−−* AI + BI. (42) 1586 In this section, we show that the extinction prob- 1541 This network corresponds exactly to the network dis- 1587 lem (finding the extinction probability at long times, 1542 cussed in Fig. S6a, where the products of two allocat- 1588 Pex) can be solved by a mapping to a branching pro- 1543 alytic cycles serve as a catalyst for each other, a Type 1589 cess. We will first derive how in a system initially at 1544 II core in our stoichiometric formalism. 1590 steady-state perturbed with dilute autocatalysts, key 1545 More generally, labelling amphiphiles as Ak (k ∈ 1591 statistical properties emerge: autocatalytic species can 19

1592 be treated as independent, and their environment as 1633 where Yj is a feedstock compound that was already 1593 fixed. Extinction becomes exponentially less likely as 1634 (abundantly) present in the system at a fixed molar 1594 the population size continues to grow, which means 1635 fraction x . The probability for one X molecule to Yj k 1595 Pex can be determined in the dilute limit. 1636 encounter a Y does not change with N, as x remains j Yj 1596 We then proceed by applying the framework to a 1637 fixed (and macroscopic). 1597 variety of networks. 1638 Note furthermore that, when autocatalysts are 1639 rare, reactions producing more autocatalysts are ’irre- 1640 versible’ 1598 A. Context Xk + Yj −−→ Xl + Xm (49)

1599 We consider a reaction network, in a macroscopic 1641 in the sense that the reverse reaction is exceedingly 1600 system (letting N denote the amount of chemical 1642 more rare than the forward reaction. 23 1601 species, let us say N > 10 , or similarly, let the system 1602 volume V be large) in a steady-state. For simplicity, let 1603 us first consider a CSTR (single phase, ideally mixed), 1604 with a residence time τ, corresponding to a uniform 1643 C. Constant composition, constant transition rates 1605 degradation rate kd.

1606 Now, we perturb the steady-state with a handful 1644 The effect of rare autocatalysts on the steady-state 1607 (O(1)) of new (not yet present in the system) autocata- 1645 composition (maintained by influx and degradation in 1608 lysts {Xk} = {X1, X2, ..., Xs} that are part of the same 1646 a CSTR) is initially small: every reaction introduces 1609 autocatalytic core. 1647 changes in molar fraction of the order 1/N (or in con-

1610 We consider that the population can grow due to 1648 centration terms, 1/V ∝ 1/N). For large N, the al- 1611 catalysis by autocatalysts, and the population decays 1649 terations of the composition will be vanishingly small 1612 by degradation reactions and effective degradation. For 1650 while the autocatalysts are rare.

1613 example, outflow out of a CSTR is considered as a first- 1651 Consequently, we can approximate the reactor 1614 order degradation process: 1652 composition in which an autocatalyst is placed, as 1653 the steady-state composition before perturbation. We Xk → ∅ (45) 1654 then assume that the molar fractions of species Yk 1655 consumed by autocatalysts are sufficiently abundant 1615 The problem we wish to solve is the extinc- 1656 i.e. x  1/N, which was also required for (48). Yk 1616 tion problem: For an initial population of autocat- 1657 For sufficient N, deviations from this approximation 1617 alysts {N } = {NX , .., NX } what is the probability Xk 1 s 1658 become vanishingly small. 1618 Pex({NX }), that, after a long time, the autocatalyst k 1659 A given Xk will therefore have a fixed transition 1619 population goes extinct (∀k N = 0)? + Xk 1660 rate w = kx to perform (48). Similarly, for (47) k Yj 1661 and CSTR degradation, a fixed transition rate w = k 1662 is found, depending only on a rate constant. 1620 B. Large system limit

1621 Let us first note that we (deliberately) consider 1663 D. Independence 1622 the initial stochastic kinetics in a large system, with 1623 a small number of autocatalysts, such that reactions 1664 In the rare-autocatalyst regime, all reaction steps 1624 among autocatalysts of the kind 1665 we consider for autocatalysts are first-order, and they 1666 occur at fixed rates. It follows that autocatalysts do Xk + Xj −−→ ”P roduct(s)” (46) 1667 not influence each other, and they can each be treated 1625 are exceedingly rare and slow (the probability that a 1668 independently. Thus, we can treat each autocatalyst 1626 given Xk molecule encounters another Xj in a given 1669 type separately: 1627 time-frame scales with N /N), where N is initially Xj Xj Pex({N }) = Pex(NX )Pex(NX )..Pex(NX ). (50) 1628 of the order 1. It follows survival of autocatalytic Xk 1 2 s 1629 cycles requiring such reactions in the forward sens is 1670 Also each individual autocatalyst can be treated as 1630 hampered in large systems. 1671 such: 1631 Reactions that are first order in terms of autocat- NX 1632 alysts are not hampered P (N ) = P (X ) k (51) ex Xk ex k X )−−−−* ”P roduct(s)” (47) k 1672 Where Pex(Xk) denotes the extinction probability of a Xk + Yj )−−−−* ”P roduct(s)” (48) 1673 population initially composed of a single Xk species. 20

∅ ∅ 1674 We thus need a method for finding Pex(Xk). This 1707 Where Π1, Π2 are success probabilities for the degra- 1675 will be obtained by mapping the problem to a branching 1708 dation reaction. Due to total probability conservation, 1676 process. 1709 we have

∅ + Π1 + Π1 = 1, (58) 1677 E. A branching process ∅ − + Π2 + Π2 + Π2 = 1 (59)

1710 Ultimately, a B species will either be replaced by 2 1678 An attractive method we propose for finding 1 1711 new ones (2B ), or none (∅): 1679 Pex(Xk) (from here on simplified to Pex), is by con- 1 1680 sidering the distribution of ’descendants’ Xk a species p p ∅ ←−−0 B −−→2 2B , (60) 1681 Xk will generate. From this, we construct a Branch- 1 1 1682 ing Process, represented chemically as a single parent 1712 which is a chemical representation of the simplest type 1683 molecule Xs yielding k descendants: 1713 of branching process: a birth-death process. In (53),

Pk 1714 the only nonzero terms will come from 0 descendants Xs −−→ kXs, (52) 1715 (p0) and 2 descendants (p2):

1684 with pk a distribution of the number of descendants 2 2 Pex = p0 + p2Pex = 1 − p2 + p2Pex. (61) 1685 which depends on the network topology.

1686 Knowing pk suffices to find Pex, since the probabil- 1716 Solving the quadratic equation yields two solutions 1687 ity to go extinct is the probability that all descendants 1688 independently ( (51)) go extinct: ± 1 ± (1 − 2p2) Pex = . (62) 2p2 ∞ 2 X k Pex = p0 + p1Pex + p2P + ... = pkP , (53) + ex ex 1717 For our problem, the ’physical’ solution is Pex while k=0 + 1718 p2 ≥ 1/2. Beyond that point, Pex > 1, while we require − 1719 0 ≤ Pex ≤ 1, so P = 1 becomes the only physical 1689 We will now highlight some possible choices for Branch- ex 1720 solution. 1690 ing Processes and their associated pk.  1 1 − 1, p2 ≥ , p2 2 Pex = 1 (63) 1, p2 < 2 , 1691 F. A Birth-Death Process for the Type I cycle

1721 The average number of descendants is 2p2, which means 1692 Consider a simple Type-I cycle such as in Fig 1722 that below p2 = 1/2 (the decay threshold) B1 is on aver- 1693 S7a. Here, simple refers to there being a direct path 1723 age replaced by less then one B1 species and extinction 1694 of (effective) unimolecular steps between the starting 1724 is guaranteed. 1695 compound (B1) and final compound (B2), followed by 1696 a single fragmentation step producing two B1 from one 1697 B2

B2 −−→ 2B1. (54) 1725 G. A Branching Process for the same type I cycle

1698 Starting from the marked node (B1), let p2 be the 1726 To illustrate that there is a variety of choices for 1699 probability of successfully forming 2B1, i.e. 1727 the stochastic process under study, we will here consider p 2 1728 an alternative choice for the simple Type-I cycle, which B1 −−→ 2B1, (55) 1729 is more generally applicable. Noting that a single 1700 where p2 contains the contribution of all possible tra- 1730 cycle is successful with probability p2, we now consider 1701 jectories (here: going back and forth between B1 and 1731 the number of successful cycles a single B1 provides, 1702 B2) that precede the irreversible fragmentation step, 1732 knowing that at some point degradation intervenes with 1703 i.e. 1733 probability 1 − p2. A succession of k cycles preceding 1734 a failure will thus lead to k successors + − + + Π1 Π2 Π1 Π2 B1 −−→ B2 −−→ B1... −−→ B2 −−→ 2B1 (56) pk B1 −−→ kB1. (64) + − + 1704 Where Π1 , Π2 and Π2 are success probabilities for 1735 The probability to spawn k descendants, pk, is the 1705 the single reaction steps. These transitions compete 1736 probability of k successful Bernoulli trials followed by 1706 with irreversible degradation processes 1737 failure, and follows a geometric distribution

∅ ∅ Π1 Π2 k B1 −−→∅, B2 −−→∅. (57) pk = p2 (1 − p2). (65) 21

+ 1738 Injecting this distribution in (53), we find a geometric 1767 cides with the transition probability Πk 1739 series + ∞ + wk lim Π = lim = ζk (72) X 1 − p2 − k − − + ∅ k w →0 w →0 Pex = (1 − p2)(p2Pex) = . (66) k−1 k−1 wk−1 + wk + w1 1 − p2Pex k=0

1768 For simple Type-I networks with n reaction steps 1740 Multiplying both sides by 1 − p2Pex then yields the 1769 (n distinct edges), the irreversible limit ((70)) general- 1741 quadratic equation (61) previously found for the Birth- 39,55 1770 izes to 1742 Death process. This is necessary, since we are calcu- n n + 1743 lating the same quantity Pex for the same network. It Y Y w p = Π+ = k , (73) 1744 highlights that we may construct a variety of branching 2 k + ∅ wk + wk 1745 processes to find Pex. k=1 k=1

1771 which we will show more formally in the next section.

1746 H. Microscopic expressions for p2

1772 J. The simple Type I cycle with n steps 1747 We can construct p2 from microscopic details. In 1748 terms of transition probabilities, we find p2 by summing 1749 over all trajectories in (56): 1773 To expand our discussion on simple Type I cycles, 1774 we will now derive a general expression for p2, when ∞ + + X + − + k + Π1 Π2 1775 there are n steps, of which the first n − 1 are treated p2 = Π (Π Π ) Π = (67) 1 2 1 2 1 − Π−Π+ 1776 as reversible. The problem is illustrated in Fig. S7c. k=0 2 1 1777 Let us denote P the probability to reach Xk→Xj 1750 A more detailed description is possible when our de- 1778 X , starting from X . For P , there is only one j k X1→X2 1751 scription is Markovian (i.e. reactions are sufficiently 1779 trajectory: + 1752 elementary). Let wk denote a forward transition rate, − Π+ 1753 to go from X to X . Let w denote a backward tran- 1 k k+1 k X1 −−→ X2, (74) ∅ 1754 sition rate from Xk to Xk−1 and wk the degradation + 1755 rate for X . We may then write 1780 and hence P = Π . For P , we need to k X1→X2 1 X2→X3 1781 consider that we can go back and forth between X2 + − w w 1782 and X : Π+ = 1 , Π− = 1 , (68) 1 1 w+ + w∅ 2 w− + w∅ + w+ 1 1 1 2 2 + Π Π+ + 1 2 + w2 X1 )−−−−* X2 −−→ X3. (75) Π = . (69) Π− 2 − ∅ + 2 w1 + w2 + w2

1783 We can absorb the contribution of hopping back-and- 1784 forth in the factor Γ2: 1756 I. The irreversible reaction limit P = Γ Π+ (76) X2→X3 2 2 ∞ 1757 Simple Type-I cycles have been studied in the 1 17,19,40 X − + k 1758 limit where all reactions proceed irreversibly Γ2 = (Π Π ) = (77) 2 1 1 − Π−Π+ − k=0 2 1 1759 (∀k wk → 0). In this limit backward reactions are − 1760 ignored (Π = 0), which for our example leads to 2 1785 Now, let us consider this argument when we want to + + 1786 find PX →X for w w 3 4 p = Π+Π+ = 1 2 . (70) 2 1 2 + ∅ + ∅ w + w w + w Π+ Π+ + 1 1 2 2 1 2 Π3 X1 )−−−−* X2 )−−−−* X3 −−→ X4. (78) Π− Π− 1761 In this limit, there is only one trajectory that con- 2 3 1762 tributes to pc, and each step involves a competition 1787 Now, we need to consider that we can go back and 1763 between the forward reaction and degradation only. 1788 forth between X3 and X2, and that at X2 we can go 1764 In studies using simple Type-I cycles, the fraction 1789 back and forth between X2 and X1 (captured by Γ2) + 1790 before moving to X3. We absorb all this in the factor wk ζ = , (71) 1791 Γ : k + ∅ 3 wk + wk P = Γ Π+ (79) X3→X4 3 3 1765 has been referred to as the specificity of reaction ∞ − + 19,39,40,55 Γ = P (Π Γ Π )k = 1 (80) 1766 step k, which for irreversible reactions coin- 3 k=0 3 2 2 − + 1−Π3 Γ2Π2 22

1792 We can repeat this argument, e.g. for the kth step 1819 becomes

+ + Π+ + C k Π1 Π2 k−1 Π Pk = (1 − pD)pD. (89) X )−−−−* X )−−−−* ... )−−−−−−−*X −−→k X , (81) 1 − 2 − − k k+1 Π2 Π3 Πk 1820 Combined, a single C1 has been then replaced according 1821 to 1793 which then yields the following recursive expressions 1794 for k ≥ 1 C1 −−→ sD1 −−→ (n1 + n2 + ... + ns)C1. (90) + P = Γ Π (82) s Xk→Xk+1 k k P 1822 Let us denote k = l=1 nl as the number of descen- P∞ − + s 1 Γk+1 = s=0(Πk+1ΓkΠk ) = − + . (83) 1823 dants. The distribution of the number of descendants 1−Πk+1ΓkΠk 1824 pk then becomes

1795 where Γ2 imposes that Γ1 = 1. ∞ ∞ s X P X Y DCU k pk = P P δ , (91) 1796 The total probability p2 to reach Xn from X1 s nr n1+...+ns s=0 n ,..,n r=0 1797 and then perform the final irreversible fragmentation 1 s 1798 reaction rn, is then 1825 which simplifies to n−1 ! n Y + Y + α k p2 = PX →X Π Γn = Π Γk, (84) p0 = 1 − β , pk = βα , k ≥ 1 (92) k k+1 n k 1 − α k=1 k=1

1826 where 1799 When backward transitions become negligible − 1800 (∀k Π → 0), we have ∀k ≥ 1 Γ = 1, and p then p p (1 − p )(1 − p ) k k 2 α = D , β = C D C . 1801 acquires its well-known limit expression described by 1 − pC(1 − pD) 1 − pC(1 − pD) 1802 (73). (93) 1827 From (53), we then find Pex. By rewriting the geomet- 1828 ric series due to (93), we have

α αP P = 1 − β − β ex , (94) 1803 K. A Branching Process for a Type II cycle ex 1 − α 1 − αPex

1829 which admits the solutions Pex = 1 and 1804 As an example of a simple Type II cycle, we con- 1805 sider the network in Fig. S7e, starting with the species β 1 1 − pC Pex = + = . (95) 1806 C1. Our approach will be reminiscent of our branching α − 1 α pD 1807 process in Sec. IV G, but with a repeated branching 1808 step.

1809 With a probability pC, C1 will successfully perform 1810 the allocatalytic cycle r1 + r2 + r3 (with some possible 1811 back-and-forths), yielding overall

pC C1 −−→ C1 + D1. (85) 1830 L. Microscopic expressions for pC and pD

D 1812 The probability Pk that k successful cycles occur before 1813 the first failure (i.e. degradation C → ∅) is k 1831 Let us first consider how pC can be constructed D k 1832 from smaller reaction steps. To do so, we observe that P = (1 − pC)p . (86) k C 1833 the first step must be

1814 Corresponding effectively to the overal reaction + Π1 C1 −−→ C2, (96) D Pk C1 −−→ kD1. (87) + 1834 with Π1 the probability of success, competing with 1835 degradation 1815 Now, let pD be the probability that D1 succesfully 1816 completes a cycle r4 + r2 + r3 (including back-and- ∅ Π1 1817 forths): C1 −−→∅, (97)

pD ∅ + D1 −−→ C1 + D1. (88) 1836 for which Π1 = 1 − Π1 .

1818 The probability of k successful cycles before failure 1837 Arrived at C2, going back-and forth reversibly 23

1838 becomes possible for neighboring nodes C1, C3 and D1: 1863 replacing pC with ps and pD with p1, thus finding

+ + 1 − p Π1 Π2 c,s Pex = . (107) C1 )−−−−* C2 )−−−−* C3, (98) − − pc,1 Π2 Π3 + Θ1 1864 We now turn to the problem of finding expressions for D1 )−−−−* C2, (99) − 1865 pc,s and pc,1. Θ1 1866 By structural analogy to simple Type I cycles + − − + 1839 Where Πk , Πk , Θk , Θk all denote success probabilities. 1867 (apart from the fragmentation step), we may again 1840 Starting at C1, a successful trajectory necessarily starts 1868 write + + + 1841 with reaction r1 (Π ) and ends with r2 +r3 (Π Π ), all 1 2 3 + 1842 P = Π Γ (108) in the forward sense. In between, we can go back-and- Xk→Xk+1 k k − + − + + − 1843 forth to C1,C3 and D1 (Π2 Π1 + Θ2 Θ1 + Π2 Π3 ) any P∞ − + s 1 Γk+1 = s=0(Πk ΓkΠk ) = − + . (109) 1844 number of times. Summing over all possible trajectories, 1−Πk+1ΓkΠk 1845 pC then becomes 1869 with Γ1 = 1. ∞ Qn−1  + X + − + − + + − k + + 1870 Since pc,s = k=s PX →X Πn Γn we find pC = Π1 (Π2 Π1 +Θ2 Θ1 +Π2 Π3 ) Π2 Π3 (100) k k+1 k=0 n ! Y + 1846 which sums to pc,s = Πk Γk (110) k=s + + + Π1 Π2 Π3 pC = − + − + + − (101) 1 − (Π2 Π1 + Θ2 Θ1 + Π2 Π3 )

1871 N. Symmetric motifs 1847 Starting at D1, a successful trajectory necessarily starts + 1848 with r4 (Θ1 ) in the forward sense, to form C2. From 1849 there on, a successful trajectory follows the previous 1872 When the network motif is symmetric in struc- + + 1850 calculation, i.e. p = Θ /Π p : D 1 1 C 1873 ture, and if the transitions preserve this symmetry, 1874 this can be exploited to simplify calculations and gain Θ+Π+Π+ 1 2 3 1875 insight in topological aspects of autocatalysis and ro- pD = − + − + + − (102) 1 − (Π2 Π1 + Θ2 Θ1 + Π2 Π3 ) 1876 bustness. Experimentally, this symmetry rarely applies 1877 for the transitions, but it can be made applicable for 1851 In the limit where all reactions proceed irreversibly 1878 the purpose of our analysis, e.g. by setting transi- 1852 forward (Sec. IV I), pC and pD only have a contributing 1879 tion probabilities to values that reflect the structural 1853 from a single straight trajectory 1880 symmetry.

+ + + + + + 1881 Consider a series of m linked allocatalytic cycles pC = Π1 Π2 Π3 , pD = Θ1 Π2 Π3 . (103) 1882 (Fig. S8b), all consisting of n nodes and n edges which 1883 are structurally equivalent:

+ Π+ Πkn+1 (k+1)n 1854 M. Type III cycles with one fragmentation step Xkn+1 −−−−→ .. −−−−−→ Xkn+1 + X(k+1)n+1, (111)

1884 which loops back at the mth cycle:

1855 Let us now consider a Type III network composed + Π Π+ 1856 of n species {X1, .., Xn} and reaction steps {r1, .., rn}, m−n+1 mn Xm−n+1 −−−−−→ .. −−−→ X1 + Xm−n+1, (112) 1857 where the final fragmentation step rn produces + 1885 As before, Πk denotes a forward transition probability, Xn −−→ X1 + Xs, 1 < s < n, (104) 1886 and we also introduce reverse reactions and degradation 1887 in the usual sense: 1858 as shown in Fig. S7e

− 1859 We may then introduce the success rates for the Πk Xk −−→ Xk−1, (113) 1860 allocatalytic cycles for X1 and Xs ∅ Πk pc,1 Xk −−→∅. (114) X1 −−→ X1 + Xs, (105) p c,s 1888 Now, let us choose transition probabilities such that Xs −−→ X1 + Xs. (106) 1889 they are equivalent among the equistructural allocat- + + 1861 Which was exactly our starting point in Sec. IV K. 1890 alytic cycles, i.e. periodic in n: Πk = Πk+n and idem − − 1862 Starting at Xs, we may thus directly use Pex, upon 1891 for reverse steps (Πk = Πk+n) and by extension, degra- 24

∅ ∅ + − ∅ 1892 dation (Πk = Πk+n), since Πk + Πk + Πk = 1. 1930 Pex = (1 − p2)/p2 (for p2 ≥ 1/2) ((63)), we then find 1893 Then, finding P (X ) is no different [footnote: ex k 1 − ζ6 1894 provided we choose 1 ≤ k ≤ m − n] from finding P = (119) ex ζ6 1895 Pex(Xk+n), which is seen readily in Fig. S8a by rotating 1896 the networks and interchanging the labels. Applying 1897 this symmetry to Fig.S8a for the six-membered cycle 1898 (n = 2, m = 3), we can thus write

1931 B. N : a 6-membered asymmetric Type III cycle Pex(X1) = Pex(X3) = Pex(X5). (115) 2

Pex(X2) = Pex(X4) = Pex(X6).

1932 In Sec. IV M, the general solution for n-membered 1899 We characterize the success of each equivalent allocat- 1933 Type III cycles with one fragmentation reaction was de- 1900 alytic cycle (n steps) by the same probability p2. We 1934 rived. For a 6-membered cycle with the fragmentation 1901 can then express the extinction probability in terms of 1935 reaction 1902 the reaction products of one allocatalytic cycle:

X6 −−→ X1 + X4, (120) Pex(X1) = 1 − p2 + p2Pex(X1)Pex(X3) (116) 2 = 1 − p2 + p2Pex(X1) , (117) 1936 which corresponds to network N2 in the main text.

1903 where we have used the symmetry in (116), which 1937 Having X4 as the starting species, we can express 1904 yields the exact second order equation we obtained for 1938 Pex(X4) in terms of (110) 1905 a simple Type I cycle of n steps. We can thus resolve the 1 − pc,4 1906 general problem using our previously derived solutions. Pex = . (121) pc,1

1939 In the irreversible reaction limit with fixed specificity + 1907 V. EXPRESSIONS FOR FIG 3, A SURVEY OF P ex 1940 (∀k ≥ 1 Πk = ζ, Γk = 1), Pex becomes 1908 FOR VARIOUS STRUCTURES 1 − ζ3 Pex = 6 (122) 1909 In Fig. 3 in the main text, the behavior of Pex ζ 1910 is compared for a number of networks (N1 to N5), in 1911 the limit where all reaction steps proceed irreversibly 1912 (Sec. IV I), and where all reaction steps do so with a 1913 common success probability ζ (also known in the liter- 1941 C. N3: 6-membered Type II network with RAF 1914 ature as specificity). Of course, this is an abstraction 1942 representation 1915 that is hard to realize experimentally, and its purpose 1916 is the following: by controlling for kinetics, we can 1917 systematically investigate and compare how survival is 1943 The network N3 consists of two nonoverlapping 1918 affected by network topology. In this section, we will 1944 allocatalytic cycles, which produce each other’s allocat- 1919 derive the functional dependence of Pex on ζ for the 1945 alyst: 1920 structures discussed in the main text. Π+ Π+ + 1 2 Π3 A1 )−−−−* A2 )−−−−* A3 −−→ A1 + B1, (123) + + Θ Θ Θ+ 1921 A. N1: a 6-membered Type I cycle 1 2 3 B1 )−−−−* B2 )−−−−* B3 −−→ A1 + B1. (124)

+ + 1946 1922 In Sec. IV J, we derived the general solution for Choosing our transition probabilities Πk = Θk , we can 1947 1923 n-membered Type I cycles in terms of the probability exploit the network symmetry as outlined in Sec. IV N 1948 and Pex(A ) = Pex(B ). Let pc be the probability that 1924 p2 to reach Xn and perform rn. Starting from X1, we 1 1 1949 1925 can now recover the solution for n = 6 either A1 or B1 successfully finishes an allocatalytic 1950 cycle. We may then write 1926 For n = 6, p2 becomes

5 ! 6 Pex(A1) = 1 − pc + pcPex(A1)Pex(B1) (125) Y + Y + 2 p = P Π Γ = Π Γ (118) = 1 − pc + pcPex(A ) . (126) 2 Xk→Xk+1 6 6 k k 1 k=1 k=1 1951 The success rate pc can be expressed in the familiar 1927 Moving to the irreversible reaction limit, and control- 1952 way + 1928 ling the reaction specifity (∀k ≥ 1 Πk = ζ, Γk = 1), 6 + + + 1929 we obtain p2 = ζ . Upon injection in the solution pc = Π1 Π2 Π3 Γ1Γ2Γ3. (127) 25

1953 In the irreversible limit (∀k ≥ 1, Γk = 1) with fixed 1975 An alternative way of seeing this is that, by symmetry, 3 1954 specificity ζ, pc = ζ , and thus we find for pc ≥ 1/2 1976 Pex is the same as that for a 3-membered Type III 1977 cycle with one fragmentation step (forming X2), for 3 1 − ζ 1978 which we can directly use the solution derived in Sec. Pex = 3 . (128) ζ 1979 IV M.

1955 D. N4: 6-membered symmetric Type III network 1980 E. A trio of symmetric analogues

1956 The network N4 consists of two nonoverlapping 1981 As derived in Sec. IV N, the interlinked allocat- 1957 allocatalytic cycles, which produce a precursor (A0, B0) 1958 for each other’s allocatalyst: 1982 alytic cycles of size n behave, due to symmetry, as 1983 an n-membered simple Type-I cycle. Reproducing the + + + Π Π 1984 0 1 Π2 solution for the 2-membered cycles (see Sec. IV F) in A0 )−−−−* A1 )−−−−* A2 −−→ A1 + B0, (129) + − 1985 the irreversible limit with ∀k ≥ 1 Πk = ζ, Πk = 0, we Θ+ Θ+ + 0 1 Θ2 1986 thus find B0 )−−−−* B1 )−−−−* B2 −−→ A0 + B1. (130) 1 − ζ2 1959 Choosing our transition probabilities to follow the sym- Pex = 2 (139) + + ζ 1960 metry of the network, we can write Πk = Θk , and 1961 Pex(Ak) = Pex(Bk). Finally, we note that B0 can ei- ∅ + 1962 ther i) degrade with probability Θ0 = 1 − Θ0 , or ii) + + 1963 form B with probability Θ = Π , such that 1 0 0 1987 F. N5: a type V core + + + + Pex(B0) = 1−Π0 +Π0 Pex(B1) = 1−Π0 +Π0 Pex(A1). (131) 1988 In N5 we have the reactions 1964 Denoting pc the probability that A1 performs a suc- A −−* A −−→ B + C , (140) 1965 cessful allocatalytic cycle (yielding A1 + B0), we can 1 )−− 2 1 1 1966 write the extinction probability as B1 )−−−−* B2 −−→ A1 + C1, (141)

C1 )−−−−* C2 −−→ A1 + B1. (142) Pex(A1) = 1 − pc + pcPex(A1)Pex(B0), (132)

1989 If our transitions follow the symmetry of the network, 1967 which upon injecting (131) yields the following expres- 1990 we have Pex(Ak) = Pex(Bk) = Pex(Ck). Denoting pc 1968 sion for Pex(A1) 1991 the success probability of the allocatalytic cycle, we + + 2 1992 can write Pex in terms of (140) Pex = 1 − pc + pc(1 − Π0 )Pex + pcΠ0 Pex. (133)

Pex(A ) = 1 − pc + pcPex(B )Pex(C ) (143) 1969 Solving the quadratic equation (133), we find 1 1 1 2 = 1 − pc + pcPex(A ) , (144) q 1 + + 2 1 − pc(1 − Π0 ) ± (pc(1 + Π0 ) − 1) 1993 which is the solution found for the 2-membered cycle. Pex(A1) = + 2pcΠ0 1994 Noting that (134) + + 1970 which yields Pex = 1 and pc = Π1 Π2 Γ1Γ2 (145)

1 − pc 1995 the irreversible limit with fixed specificity (∀k ≥ Pex = + . (135) + 1996 1 Π = ζ, Γ = 1) yields, as in Sec. IV F, pcΠ0 k k

2 1971 We can write pc in terms of back-and-forths starting 1 − ζ Pex = (146) 1972 at A1, terminating with an irreversible fragmentation ζ2

P∞ − + + − k + + pc = k=0(Π1 Π0 + Π1 Π2 ) Π1 Π2 (136) Π+Π+ = 1 2 , (137) 1−Π−Π++Π+Π− 1 0 1 2 1997 VI. EXPRESSIONS FOR FIG 4D, A PHASE 1998 DIAGRAM FOR MULTICOMPARTMENT 1973 so that in the irreversible limit with fixed specificity 1999 AUTOCATALYSIS 1974 (∀k ≥ 1 Πk+ = ζ, Πk− = 0), we obtain

2 1 − ζ 2000 Fig. 4d represents a case of a Type III core with Pex = . (138) ζ3 2001 one fragmentation reaction (Sec.IV M), consisting of 5 26

+ + 2002 members: 2034 • w5 = k5 release of A2Bα through locally irre- 2035 versible reaction r . r1 r2 r3 r4 5 ABα )−−−−* Aα )−−−−* Aβ )−−−−* A2Bβ )−−−−* A2Bα,(147) r5 A2Bα −−→ ABα + Aα, (148) 2036 We then obtain

2003 There are two degradation reactions (r6 and r7), both + d + w3 d w3 2004 taking place in the compartment β: Π1 = + d , Π1 = + d , (155) w3 +w3 w3 +w3 + − r6 r7 + w4 − w4 Aβ −−→∅, A2Bβ −−→∅. (149) Π2 = + d − , Π2 = + d − , (156) w4 +w4 +w4 w4 +w4 +w4 d d w4 2005 Here rk refers to the reaction label as found in Fig.4d, Π2 = + d − , (157) w4 +w4 +w4 2006 α, β to the compartments and A, AB, A2B to species + − + w5 − w5 2007 as defined in Fig.4d. Π3 = + − , Π3 = + − . (158) w5 +w5 w5 +w5 2008 We are guaranteed that, starting from Aα or ABα, 2009 eventually Aβ will be formed with probability 1. We 2037 Noting that Γ1 = 1, the product Γ2Γ3 simplifies to 2010 can thus simplify the problem by only considering re- 1 2011 actions r3 to r5 and species Aβ, A2Bβ and A2Bα, i.e. Γ2Γ3 = − + − + . (159) 2012 we can then write for the transition probabilities 1 − Π2 Π1 − Π3 Π2

Π+ Π+ + 2038 This allows to fully express P in terms of 8 micro- 1 2 Π3 ex Aβ )−−−−* A Bβ )−−−−* A Bα −−→ 2Aβ, (150) − 2 − 2 2039 scopic coefficients. Reactions r1 and r2 would give 4 Π2 Π3 2040 more rate constants and two more molar fractions (for Πd Πd 1 2 2041 chemostatted species). However, their values do not Aβ −−→∅, A2B −−→∅. (151) 2042 alter Pex.

2013 We readily find that this effective description obeys the 2043 For the purpose of illustration, we will consider 2014 solution for a simple Type-I network with 3 members, 2044 the competition between exchange, degradation, and 2015 with p2 a success probability for a cycle 2045 other transitions. To do so, we choose one rate for + − ex + + + 2046 exchange k4 = k4 = k and one rate for degradation p2 = Π1 Π2 Π3 Γ1Γ2Γ3 (152) d d d 2047 k6 = k7 = k . Furthermore, we let the sequestration- 1 − p2 P = (153) 2048 release steps be equally probable in compartment β, ex p + − + 2 2049 and match release in α: w3 = w4 = w5 = k. The 2050 transition success probabilities can then be expressed 2016 Where we have used Γk for calculating back-and-forth 2051 in terms of two ratios 2017 trajectories, as derived in sec. IV J: d ex ∞ ∆ = k /k, Ξ = k /k (160) X − + s 1 Γk+1 = (Πk+1ΓkΠk ) = − + . (154) 1 − Π Γ Π 2052 which upon substitution yields s=0 k+1 k k + 1 d ∆ + Ξ 2018 with Γ1 = 1. Let us now give a microscopic kinetic Π1 = 1+∆ , Π1 = 1+∆ , Π2 = 1+∆+Ξ , (161) 2019 interpretation to the competing processes, by consider- d ∆ − 1 Π2 = 1+∆+Ξ , Π2 = 1+∆+Ξ , (162) 2020 ing sufficiently elementary transitions on the level of a + 1 − Ξ 2021 single species for which we can introduce rate constants Π3 = 1+Ξ , Π3 = 1+Ξ . (163)

+ + 2022 2053 This permits to construct the phase diagram for Pex • w3 = k3 xABβ , sequestration of ABβ (present 2023 2054 in Fig. 4D in terms of the variables ∆ and Ξ. with a fixed molar fraction xABβ ), which plays 2024 the role of a feedstock species in compartment β 2025 in r3 proceeding forward.

d d 2026 • w3 = k6 degradation, reaction r6.

− − 2055 A. Phase boundaries and limits 2027 • w4 = k3 release of ABβ, r3 proceeding back- 2028 ward.

+ + 2056 Let us first find an expression for the boundary be- 2029 • w4 = k4 exchange of A2B from compartment β 2030 to α, when r4 proceeds forward. 2057 tween autocatalysis and deterministic extinction, which 2058 occurs when Pex = 1 and (153) coincide, i.e. when d d 2031 • w4 = k7 degradation, reaction r7. 2059 pc = 1/2, which upon substitution of (159) becomes: − − 2032 • w5 = k4 exchange, from compartment α to β, 1 Π+Π+Π+ = (1 − Π−Π+ − Π−Π+) (164) 2033 when r4 proceeds backward. 1 2 3 2 2 1 3 2 27

2060 In terms of Ξ and ∆, we have 2090 as survival (The error in this approximation scales as Ntresh 2091 Pex ). Otherwise, the composition after nr steps is 1 Ξ 1 2 1+∆ 1+∆+Ξ 1+Ξ (165) 2092 used as an initial composition to repeat this protocol, 1 1 Ξ Ξ 2093 again for nr steps (and so forth if again the threshold = 1 − 1+∆+Ξ 1+∆ − 1+Ξ 1+∆+Ξ 2094 is not reached). Once either the threshold is reached

2095 2061 Which rearranges to a linear dependence in Ξ or extinction occurs, it is sampled accordingly and the 2096 run is terminated. 2 2 ∆ Ξ + ∆ + 3∆Ξ + 2∆ − Ξ = 0, (166) 2097 For networks N1 to N5, the symmetry of the prob- 2098 lem is used to sample reactions: a random number χ 2062 from which we obtain for the phase boundary 2099 between 0 and 1 is drawn, and if χ < ζ, a forward 2100 reaction step is performed for an autocatalyst picked ∆(∆ + 2) Ξ = . (167) 2101 at random. If χ > ζ, a degradation step is performed 1 − 3∆ − ∆2 2102 for an autocatalyst picked at random.

2063 In the regime where reactions outpace degradation 2103 In Fig. S9a a short time interval is shown for 10 2064 (∆  1), extinction will be due to rate-limiting ex- 2104 simulation runs performed for the multicompartment 2065 change (Ξ  1). Taking (167), dividing by ∆ and 2105 network using, for completeness, all 5 autocatalysts 2066 letting ∆ → 0, the ratio Ξ/∆ tends to 2106 ABα, Aα, Aβ, A2Bβ, A2Bα (Since ABα, Aα return to 2107 Aβ with probability 1, the exact solution for Pex can Ξ 2108 also be found with the three-species network as shown = 2, (168) 2109 ∆ in Sec. VI). We start with NAβ = 1.0 and all other 2110 autocatalysts at 0. Rate parameters for r3 to r7 follow 2067 as also seen in the phase diagram. When exchange is ex 2111 the conventions outlined above, and are set at k = 2068 very rapid (Ξ → ∞), it ceases to be rate-limiting, and d + 2112 1.0,k = 0.3, k3 = 100, k = 10.0. r1 is treated as 2069 degradation will only compete with other reactions. + 2113 irreversible with an effective rate constant k1 = 10.0 2070 This occurs when we let the denominator of (167) 2114 and exchange of A is performed with a rate kex. From 2071 become 0, i.e. 2115 100000 simulation runs (Ntresh = 40) we obtain the 2116 numerical estimate of P = 0.698 ± 0.004. This is 1 − 3∆ − ∆2 = 0, (169) ex 2117 consistent with our exact solution (153), which yields 2118 P = 0.6999. 2072 which has solutions ex √ 2119 Among the 10 short runs for the multicompart- −3 ± 13 ∆± = , (170) 2120 ment network in Fig. S9a, 4 runs reach extinction 2 2121 within the simulation time, whereas the autocatalysis √ 2122 in one run (green) has reached an exponential growth −3+ 13 2073 of which ∆ = 2 ≈ 0.30 is the only physical 2123 regime that is close to deterministic with a total popu- 2074 solution, as can also be seen in the phase diagram. 2124 lation of 40 autocatalysts.

2075 VII. STOCHASTIC SIMULATIONS 2125 VIII. MULTICOMPARTMENT AUTOCATALYSIS 2126 WITH THREE COMPARTMENTS

2076 We can numerically sample the extinction proba- 2077 bility Pex({NX }0), with {NX }0 the initial autocatalyst 2127 Let us consider a bimolecular reaction 2078 population, by performing stochastic simulations of the 2079 kinetics using Gillespie’s algorithm which start from A + B )−−−−* C, (171) 2080 the initial composition {NX }0, using transition rates

2081 as discussed previously. The compositions obtained 2128 which can occur in three different compartments labeled 2082 from the algorithm are used to sample extinction or 2129 α, β, γ, as shown in Fig. S9b. Let us couple these 2083 survival, as detailed below. 2130 compartments through the following exchanges 2084 When the total autocatalyst population in a run 2085 reaches zero, this is sampled as an extinction event. Aα )−−−−* Aβ, Bβ )−−−−* Bγ , (172)

2086 When the population is not extinct at the end of a Cα )−−−−* Cβ )−−−−* Cγ . (173) 2087 prescribed number of reactions nr, the total number 2088 Nac of autocatalysts is compared with a threshold 2131 Removing Aγ ,Cα then immediately yields the Type 2089 population Ntresh. If Nac > Ntresh, the run is sampled 2132 III autocatalytic core shown in Fig. S9c. 28

a pathback-branch forward-branch C’ is a subcycle of C C is not minimal C’ is minimal

b hyper-ear proto-ear c Type I Type II

*u-v-*

Type III Type IV Type V

*u-u-v-v* u-u-v-v

d e z y z nested back-branches rz y ry y y z z entangled back-branches ry rz f E allocatalytic cycle ES EP S P

Figure S1: a) Hypergraph paths and cycle examples. b) Examples of hyper-ears (left) and associated proto-ears (right) obtained by removing green species. The syntax below graphs (see section II C) describes the relationships between non-P species and path P. Nodes ’u’ are products of r, nodes ’v’ are reactants of reactions producing x, ’-’ is any series of reactions with a single reactant and product, ’*’ denotes the closure of C by the green path. c) Autocatalytic cores. Edges are oriented consistently along cycles, so that reaction have a single reactant. Orange squares are chains of arbitrary length made of reactions with a single reactant species and product species. Edge-to-node connections are weighted by a stoichiometric coefficient, represented explicitly only for Type I by a fork (stoichiometry of 2). C, C0 and C0 are cycles. In Type II, the dotted path may comprise multiple cycles similar to the green box. In main text Fig. 2, only the case of a single green box is represented for simplicity. In Type I, detC 6= 0; in Type II, det(C) = 0 and det(C0) can be any value; in Types III-V, det(C) = det(C0) = det(C00) = 0. d) Generic hyper-ear structure of cores. e) Nested and entangled back-branches. f) Example of allocatalytic cycle. 29

IV a) I b) I D g 3 C III II III II g 2

Figure S2: a) A decorated Toy Formose reaction given by the submatrix ν ∗, obtained by removing C1. The replication cy- D cle g 3 is illustrated in blue. In this network, only species C2, C3 and C4 are autocatalysts. b) The minimal formose C reaction in its autocatalytic subnetwork, an example of an SFA. Arrows illustrate the replication cycle g 2 .

a) b) +

C int1

cat1

Figure S3: a) top: A product-enhanced cycle, Blackmond’s first type of autoinduction. bottom: Hypergraph, containing a Type I autocatalytic core (yellow). External food and allocatalysts are marked in blue. b) top: A ligand-accelerated cycle, Blackmond’s second type of autoinduction. bottom: Hypergraph, containing a Type II autocatalytic core (yellow) and supporting external food (blue). Note that, in both cases, external allocatalytic cycles are not part of the core. Allocatalysts are treated on equal footing with feedstock and waste in isolating a core. 30

a) 1: Acetate 7: Fumarate 2 3 4 1 5 2: Acetyl-CoA 8: Succinate 3: Pyruvate 9: Succinyl-CoA 13 6 4: Phosphoenyl- 10: 2-ketoglutarate pyruvate 11: isocitrate 12 7 5: oxaloacetate 12: cis-aconitate 11 8 10 9 6: Malate 13: citrate 6 7 6 4 6 6 5 7 10 b) 6 6 1: RuP 6 7 5 11 2: RuBP 3 13 12 10 6 8 3: 3-PGA 6 5 2 4 6 4: 1,3-PGA 13 11 1 9 6 12 5: G3P 6 6: F1,6P 9 1 6 6 5 3 7: F6P 4 6 13 6 9 8: E4P 2 6 5 10 9: X5P 3 5 1 9 10: DHAP 6 11 11: S1,7P 2 12 1 12: S7P 13 13: R5P 2x 9 9 1

Figure S4: a) Autocatalysts in the Reverse Krebbs cycle, which yield a Type II core. b) Autocatalysts in the Calvin Cycle. We find 3 cores of Type I (2 equivalent, up to the choice of reaction to link 5 and 9), and 4 cores of Type II. 31 a) DCU b)

NH O Type III NH N N O C C N NH N DCU Z Z

EDCC EDCC I1 NH 2 O I1 O NH C N DCC Y

NH O X O NH NH O OH

N C N

EDCU DCC EZ

Figure S5: a) Chemical network for the cavitand-amplification network. blue: feedstock compounds and waste. beige: autocatalysts. b) Type III autocatalytic core for the cavitand-amplification network. 32

1 Y Y 1 X 1 XY 1 1 XY Y X 2 1 2 X 1 2 1 2 4 1+2 3+4 4 X 2 XY 3 1

II

I

Type I Type II Type II

β β β β β βii ij ji ij jk ki

Figure S6: a) A Type II autocatalytic network with its feedstock compounds (or Food set), as encountered in GARD. Col- ored nodes highlight two distinct allocatalytic cycles that yield an autocatalytic cycle when combined. b) The same network, in a bipartite graph representation used in the RAF sets formalism. Specifying the mechanism in terms of uncatalyzed reaction steps removes the RAF property. c) A coarse-grained representation, where allocatalysts in the same allocatalytic cycle are represented by a single species, and each allocatalytic cycle has been replaced with a dashed line, to indicate the net reaction being catalyzed. b. A catalytic incorporation mechanism: the red (telephone shape) amphiphile forms a complex with the purple (square shape) amphiphile, which mediates its incorporation in a micelle or vesicle. c. Autocatalytic networks of co-assembling amphiphiles. Amphiphiles in square nodes are reservoir species, those in circular nodes represent amphiphiles in a micelle or membrane. Diagonal terms of the catalytic ma- trix encode Type I autocatalysis, i.e. direct self-incorporation. All other autocatalysis (cross-incorporation) is of Type II: sequences of nonoverlapping allocatalytic cycles. 33

a) b) c) B Bn 1 B B 1 2 B1 B2 2B1 B B - n-1 ... 2 d) Π Π Π 2 n-1 n B B ... B B 2B 1 - 2 - - n-1 - n 1

e) 4 f) D Π 2 3 1 2 C2 C3 C C C C C + D 1 1 1 - 2 - 3 1 1 Θ Θ- g) 1 1 Xs Xn D1 X1

X2 h) Π Π Π 2 Π s Π n s-1 s X X ... X ... X X + X 1 - 2 - - s - - n 1 s

Figure S7: a) A Type I subnetwork. B1 and B2 reversibly interconvert, B2 can also irreversibly form two B1, marked by + the forked edge. b) transition network, Πk denotes the probability that, starting at Bk, the next transition will be a − ∅ step forward in the cycle, Πk a step backward, and Πk a degradation. Fragmentation (yielding 2B1) and degradation (∅) are absorbing states, attained with probabilities pc and 1 − pc respectively. c) Autocatalytic core for a Type I cycle with n nodes. d) Transition network for c e) schematic for an autocatalytic core for a Type III cycle. f) Transition network for e) g) General schematic for an autocatalytic core for a Type II cycle with a single fragmentation step. h) Transition network for g. 34

a) b) ...

......

c) d) e) f) Π Π 2 3 Π 3 A A A A + B 1 - 2 - 3 1 1

g) h) i) Π 0 Π Π 2 0 A A A A + B 0 - 1 - 2 1 0

j) k) Π 2 A A B + C 1 - 2 1 1

Figure S8: a) three motifs with the same Pex(Ak), when the transition network follows the same symmetry as the net- work structure. b) A more general case: a multiple of allocatalytic cycles of size n. c) 6-membered Type I cycle. d) a 6-membered Type II cycle. e) a 6-membered Type III cycle. f) Transition network for one of the two allocatalytic cycles. g) a 6-membered Type III cycle. The transition of a precursor to allocatalyst is not mediated by an allocat- alytic cycle, and hence this is not a RAF. h) Transition network for g). i) a trio of symmetric analogues. j) a Type V autocatalytic core. k) Transition network for j. 35 a) # Autocatalysts

b) A B C

+ + +

c) α

Figure S9: a) Simulation using Gillespie’s Algorithm. Each of the 10 colored lines represents the sum of autocatalysts P P ( X NX) over time in a single run. Extinction events ( X NX = 0) are marked with a black pin at their correspond- ing timepoint. b) Three-compartment network with a single bimolecular chemical reaction A + B )−−−−* C. The pink wall separating α from β is permeable to B and C. The orange wall separating β from γ is permeable to A and C. c) Autocatalytic core. The use of three compartment allows to construct the reactions to complete the cycle (the ears) directly from the reproduction step.