<<

1 Autocatalysis in chemical networks: unifications and extensions 1, 2 1 2 2 Alex Blokhuis, David Lacoste, and Philippe Nghe 1) 3 Gulliver Laboratory, UMR CNRS 7083, PSL University, 10 rue Vauquelin, Paris F-75231, 4 France 2) 5 Laboratoire de Biochimie, Chimie Biologie et Innovation, ESPCI Paris, PSL University,10 rue Vauquelin, 6 Paris F-75231, France

7 (Dated: May 16, 2020)

8 Autocatalysis is an essential property for theories of and chemical evolution. However, the 9 different formalisms proposed so far seemingly address different forms of autocatalysis and it remains unclear 10 whether all of them have been captured. Furthermore, the lack of unified framework thus far prevents 11 a systematic study of autocatalysis. Here, we derive general stoichiometric conditions for and 12 autocatalysis in networks from basic principles in chemistry. This allows for a classification 13 of minimal autocatalytic motifs, which includes all known autocatalytic systems and motifs that had not 14 been reported previously. We further examine conditions for kinetic viability of such networks, which 15 depends on the autocatalytic motifs they contain. Finally, we show how this framework extends the range 16 of conceivable autocatalytic systems, by applying our stoichiometric and kinetic analysis to autocatalysis 17 emerging from coupled compartments. The unified approach to autocatalysis presented in this work lays a 18 foundation towards the building of a systems-level theory of chemical evolution.

19 PACS numbers: 05.40.-a 82.65.+r 82.20.-w

20 INTRODUCTION 53 different descriptions of autocatalysis and is based on 27–30 54 reaction network .

21 Life’s capacity to make more of itself is rooted in a 22 chemistry that makes more of itself. Autocatalysis ap- 23 pears to be ubiquitous in living systems from molecules 55 Let us start from basic definitions in chemistry as 1 31 24 to ecosystems . It is also likely to have been continu- 56 established by IUPAC (see Supplementary Informa- 25 ally present since the beginning of life and is invoked 57 tion Section I for full definitions), where autocatalysis 2–5 26 as a key element in prebiotic scenarios . However, 58 is a particular form of catalysis: A substance that in- 6 27 autocatalysis is considered to be a rarity in chemistry . 59 creases the rate of a reaction without modifying the ◦ 28 This would suggest that prebiotic molecules may not 60 overall standard Gibbs energy change (∆G ) in the re- 4,5 29 be that different from biomolecules (lipids , nucleic 61 action; the process is called catalysis. The catalyst is 7–10 2,3 30 acids , and peptides ). 62 both a reactant and product of the reaction. Catalysis 63 brought about by one of the products of a (net) reaction 31 Developments in systems chemistry are changing 64 is called autocatalysis. 32 this view, with an increasing number of autocatalytic 11–13 33 systems synthesized de novo . Chemical replicators 34 have been endowed with biomimetic properties such as 14 15 35 protein-like folding and parasitism . Autocatalysis 65 From this definition, we derive conditions to de- 36 also finds technological applications, e.g. eniantomere 66 termine whether a subnetwork embedded in a larger 16–18 37 and acid amplification . 67 chemical network, can be catalytic or autocatalytic. 68 These conditions provide a mathematical basis to iden- 38 The elucidation of autocatalysis represents a pri- 69 tify minimal motifs, called autocatalytic cores. Cores 39 mary challenge for theory. Models based on autocatal- 70 come in five structural motifs. They allow classification 40 ysis were first built to explain a diversity of dynami- 71 of all previously described forms of autocatalysis, and 41 cal behaviors in so called dissipative structures, such 19 72 also reveal not yet identified autocatalytic schemes. We 42 as bistable reactions , oscillating reactions, chemical 20 73 then study the kinetic conditions, which we call viabil- 43 waves and chemical chaos. Autocatalysis then be- 74 ity conditions, under which autocatalytic networks can 44 came a central topic for studying the self-replication 3,21–23 75 appear and be maintained on long times. We find that 45 dynamics of biological and prebiotic systems 24–26 76 networks have a different viability depending on their 46 (see for recent reviews). 77 core structure, notably that internal catalytic cycles 47 Despite this long history, a unified theory of au- 78 increase robustness. Finally, we expand the repertoire 48 tocatalysis is still lacking. Such a theory is needed to 79 of autocatalytic systems, by demonstrating a general 49 understand the origins, diversity and plausibility of 80 mechanism for its on a multicompartment 50 autocatalysis. It would also provide design principles 81 level. This mechanism strongly relaxes chemical re- 51 for artificial autocatalytic systems. 82 quirements for autocatalysis, making the phenomenon 52 Here, we present a framework that unifies the 83 more abundant and diverse than previously thought. 2

1 2 84 EXAMPLES, DEFINITIONS AND CONVENTIONS a) b) c) 1 A -1 0 EA A + E EA B 0 1 85 Catalysis and autocatalysis 2 1 2 EA E +B E -1 1 EA 1 -1 E 86 The following reactions have the same net mass 87 balance but a different status regarding catalysis: d) e) 1 2 f) (I) (II) (III) A -1 0 AB A −−* B, A+E −−* B+E, A+B −−−* 2B. 1 )−− )−− )−−− + B -1 2 (1) A B AB 1 2 2 AB 1 -1 88 Since no species is both a reactant and product AB 2B B 89 in reaction (I), it should be regarded as uncatalyzed. 90 Reactions (II) and (III) instead contain species which Figure 1: Different representation for allocatalysis (a,b,c) 91 are both a reactant and a product, species E in reaction and autocatalysis (d,e,f). a) Combining reactions 92 (I) and species B in reaction (III) and following the (1’)+(2’) affords an allocatalytic cycle that converts 93 definition above, these species can be considered as A to B. b) stoichiometric matrix of a), the dashed 94 catalysts. In reaction (II), the amount of species E square encloses the allocatalytic submatrix ν¯0 for 95 remains unchanged, in contrast to the case of reaction network b). c) Graph representation of the allocat- 96 (III), where the species B experiences a net production. alytic subnetwork. d) Combining (1”)+(2”) affords 97 For this reason, reaction (III) represents genuine au- an autocatalytic cycle converting A to B. e) stoichio- 98 tocatalysis. Although reaction (II) is usually referred metric matrix of d), the dashed square encloses the 00 99 to as simply catalyzed in the chemistry literature, we autocatalytic submatrix ν¯ for network e). f) a graph representation of the autocatalytic subnetwork. 100 propose to call it an exemple of allocatalysis to contrast 101 it with the case of autocatalysis. Then, we can reserve 102 the word catalysis for any instance and combination of 103 allocatalysis, autocatalysis and reverse autocatalysis. 132 matrix. For instance, in reactions (II-III), catalysts 133 cancel on each side leading to the same column vector 104 We emphasize that such stoichiometric consider- 134 as for (I). One way to avoid this is to describe enough 105 ations are necessary but not sufficient to characterize 135 intermediate steps so that a participating species is 106 catalysis, which according to the definition should also 136 either a reactant or a product: 107 accelerate the rate of the net reaction. In the following, 108 we will first generalize the stoichiometric conditions, −−IIa* −−IIb* 109 then examine kinetic ones. A + E )−− EA )−− E + B, (2) IIIa IIIb A + B )−−−−* AB )−−−−* 2B . (3)

110 Stoichiometric matrix and reaction vectors 137 We call this convention non-ambiguity and assume 138 henceforth that it is respected.

111 Reaction networks are represented as a stoichio- 27,30 112 metric matrix ν , in which columns correspond to 139 CATALYSIS AND AUTOCATALYSIS IN 113 reactions and rows to species. The entries in a column 140 STOICHIOMETRIC MATRICES 114 are the stoichiometric coefficients of the species partic- 115 ipating in that reaction, the coefficient is negative for 116 every species consumed and positive for every species 141 We seek to identify candidate motifs (subnetworks) T 117 produced. A reaction vector g = [g1, .., gr] results 142 for allocatalysis and autocatalysis within reaction net- 118 in a change of species numbers ∆n = ν · g. The sup- 143 works based the only knowledge of the stoichiometric 119 port of g (denoted supp(g)) is the set of its non-zero 144 matrix ν. Such identification does not make a priori 120 coordinates. A reaction cycle is a non-zero reaction 145 assumptions on the values or signs of reaction vector 121 vector c such that no net species number change occurs 146 coefficients, or on kinetics, which we will examine in the 122 : ν · c = 0, or equivalently, c belongs to the right null 147 next section. A motif has a stoichiometric submatrix ν¯ T 123 space of ν. Vectors b belonging to the left null space 148 obtained by restricting ν to certain rows and columns.

124 of ν induce conservation laws, because in that case 149 Restriction to rows delimitates species within the 125 30 b · n represents a conserved quantity. The case of all 150 system from external species , such as feedstock com- 126 32 coefficients bk nonnegative is refered to as a mass-like 151 pounds (also so-called food set ) and waste. In Fig. 1, 127 conservation law. For example in Fig.1a, conserved 152 external species have been colored in blue, while stoi- 128 quantities are nE + nEA (catalysts) and nA + nEA + nB 153 chiometric submatrices have been boxed in yellow. Fig 129 (total compounds). 154 1a (resp. Fig 1d) represents an example of allocatalysis 0 00 130 Lastly, catalyzed reactions may not always be dis- 155 (resp. autocatalysis) with its submatrix ν¯ (resp. ν¯ ) 131 tinguished from uncatalyzed one in the stoichiometric 156 and its hypergraph representation Fig 1c (resp. Fig 3

157 1f). 208 There exists a set of species S, a submatrix ν¯ of

158 We restrict further the stoichiometric matrix to 209 ν restricted to S, and a reaction vector g such that: 159 some reactions (columns). We do that in order to guar- 210 i) ν¯ is autonomous, ii) all coordinates of ∆n = ν¯ · g 160 antee that each reaction of the subsystem has at least 211 are strictly positive, or equivalently, ν¯ has no mass- 161 one reactant and at least one product. The subsystem 212 like conservation laws. The members of S consumed 162 then has a property called autonomy, which means that 213 (and produced) by g are called autocatalysts, g an 163 every column of the reduced matrix contains a posi- 214 autocatalytic mode and ν¯ an autocatalytic matrix. 28 164 tive and a negative coefficient . Autonomy ensures 215 Condition (i) ensures the conditionality of the 165 that the production any catalyst is conditional on the 216 reactions on autocatalysts, as it forbids cases where 166 presence of other chemical species of the subsystem 217 species of S are produced from external reactants only, 33 167 (similarly to the siphon concept in Petri Nets but 218 thus playing the role of conditions (i) and (ii) in the 168 without assumption on reaction signs, see Supplemen- 219 definition of allocatalysis.

169 tary Information Sections II and III). Otherwise, rate 220 Condition (ii) expresses the increase in autocat- 170 acceleration would be allowed unconditional on an al- 221 alysts number. The equivalence between its two for- 171 ready present substance, in opposition to the definition 222 mulations is an immediate consequence of Gordan’s 172 34 of catalysis. 223 theorem . Importantly, the second formulation of (ii) 224 does not involve an autocatalytic mode g, so that (i) 225 and (ii) can be formulated as properties of a matrix 173 Criterion for allocatalysis 226 itself, in contrast with allocatalysis. This allows us 227 to look for minimal autocatalytic motifs, which we do 228 next. Note that there is no need to specify an explicit 174 By definition, allocatalysis is an ensemble of re- 229 condition on the turnover of external species (as con- 175 actions by which a set of species remain conserved in 230 dition (iv) does for allocatalysis), their consumption 176 number (the catalysts) while other external species 231 being implied by the net mass increase imposed by 177 undergo a turnover which changes their numbers. This 232 condition (ii). 178 leads to the following conditions:

179 There exists a set of species S, a submatrix ν¯ of 180 ν restricted to S, and a non-zero reaction vector c 181 such that: i) ν¯ is autonomous; ii) supp(c) is included 233 Autocatalytic cores 182 in the columns of ν¯; iii) c is a reaction cycle of ν¯ 183 (ν¯ · c = 0), and; iv) ν · c =6 0. The members of S which 234 An autocatalytic core is an autocatalytic motif 184 participate to c (i.e. that are consumed and produced) 235 which is minimal in the sense that it does not contain 185 are called allocatalysts, c an allocatalytic cycle and ν¯ 236 any smaller autocatalytic motif. Consequently, an au- 186 an allocatalytic matrix. 237 tocatalytic system is either a core, or it contains one 187 Condition (i) has been discussed above. Condition 238 or several cores. The stoichiometric conditions show 188 (ii) expresses the involvement of the catalysts in the 239 that characterizing cores is equivalent to finding all 189 reactions c, where all columns of ν¯ are non-zero due to 240 autonomous matrices whose image contains vectors 190 (i), so that all reactions of c involve catalysts. Condition 241 with strictly positive coordinates only. This well-posed 191 (iii) expresses the conservation of catalysts and (iv) the 242 formulation allowed us to show that the stoichiometric 192 net reaction. Since the reaction cycle c is a cycle of the 243 matrix ν¯ of an autocatalytic core must verify the spe- 193 reduced matrix but not of the original matrix, some 244 cific properties below (see Supplementary Information 30 194 authors have called it emergent . This emergent cycle 245 Sections II - IV for demonstrations and examples). 195 can establish a non-equilibrium steady state driven 246 First, ν¯ must be square (the number of species 196 by the turnover of the external species. Note that 247 equals the number of reactions) and invertible. The 197 being allocatalytic is not a property of the sub-matrix 248 inverse has a chemical interpretation. By definition of 198 ν¯ alone but requires to explicit its relationship with −1 249 the inverse, the k-th column of ν¯ is a reaction vector 199 the larger matrix ν as imposed by condition (iv). A 250 such that species k increases by one unit, making it an 200 counterexample would be an isomerization cycle, which 251 elementary mode of production. Likewise, the reaction 201 verifies conditions (i-iii) but is not allocatalytic. −1 252 vector obtained by summing the columns of ν¯ leads 253 to increase by one unit of every autocatalyst, and thus 254 is an elementary mode of autocatalysis. This shows 202 Criterion for autocatalysis 255 how stoichiometry informs on fundamental modes of 35 256 autocatalysis .

203 By definition, autocatalysis is the process by which 257 Second, every forward reaction of a core involves 204 a combination of reactions involves a set of species 258 only one core species as a reactant. While this ex- 205 which all increase in number conditionally on species 259 cludes reactions between two different core species, a 206 in the set itself (the autocatalysts). This leads to the 260 single core species may react with itself. As ν¯ is square, 207 following conditions: 261 this also implies that every species of a core is con- 4

262 sumed (none is only produced), thus is an autocatalyst. a) Type I Type II Type III Type IV Type V 263 Furthermore, every species is the reactant of a single 264 reaction. Overall, every species is uniquely associated 265 with a reaction as being its reactant, so that ν¯ admits 266 a representation with a negative diagonal and zero or 267 positive coefficients elsewhere, at least one coefficient of 268 each column being strictly positive to ensure autonomy. b) C3 C2 c) 269 These properties are constraining enough to allow C2 C3’ C4 C3 270 an exhaustive enumeration of reaction graphs that C4’ 271 are cores. Autocatalytic cores are found to belong to C5 C4’ C3’ Type I Type II 272 five categories, denoted as type I to type V. Fig. 2a C5’

OH 273 represents typical members of each category as reaction O O HO O 274 hypergraphs (see Supp. Fig. S1 for general cases). As HO H H H H O O O 275 HO H H can be seen in these graphs, all minimal motifs contain OH HO OH OH O 276 O a fork, which ends either in the same compound (or HO HO O

OH O 277 node) for type I or in different compounds for types OH O OH HOH OH HO 278 O II to V. This fork is consistent with the intuition that O OH HO OH OH OH 279 autocatalysis requires reaction steps that amplify the HO O OH O O 280 amount of autocatalysts. The orange square on the H H H H HO HO 281 links between the nodes indicate that these links could OH OH 282 contain further nodes and reactions in series. Figure 2: a. Five minimal motifs. Orange squares indi- 283 The five types differ in their number of graph 36 cate where further nodes and reactions may be added, 284 cycles and the way these cycles overlap. Type I con- provided this preserves the motif type (I,II,III,IV,V) 285 sists of a single graph cycle that is asymmetric in a and minimality. b+c: Examples of chemical networks, 286 stoichiometric sense: the product of the stoichiomet- along with their autocatalytic cores. blue: external 287 ric coefficients of its reaction products must be larger species, yellow: autocatalysts. b. Type I: Breslow’s 288 than that its reactants. Types II-V can be described 1959 mechanism for the formose reaction37 c. Type 289 as two overlaping graph cycles, where any graph cy- II: Another autocatalytic cycle in the formose reac- 290 cle that does not contain all core species must be an tion. Species denoted as Cx inside the nodes refer to 291 allocatalytic cycle (it would otherwise be of Type I, molecules containing x carbon atoms, which are shown 292 contradicting minimality). Equivalently, the product of below in standard chemical representation. 293 the stoichiometric coefficients of the reaction products 294 of these cycles must be equal to that of their reactants. 316 sequential nonoverlapping allocatalytic cycles (cross- 317 incorporation, such as N3 in Fig. (3)). More generally, 318 when such catalytic cycles are compactly written as 295 Unification: First Examples 319 single reactions (Eq.(1)), they can be treated in the 32 320 RAF framework , where they form irreducible RAF- 41 296 The stoichiometric characterization of autocatal- 321 sets . This formally establishes the recently suggested 5,42 297 ysis provides a unified approach to autocatalytic net- 322 link between these models. 298 works reported in the literature. The examples below 323 Another proposed extension of autocatalysis is 299 and their proofs are detailed in Supplementary Infor- 43 324 ’chemical amplification’ due to cavitands . The mech- 300 mation Section IV. The formose reaction is a classical 325 anism involves a reactive compound in a molecular cage, 301 example of autocatalysis known to contain many au- 38 326 whose free counterpart can react to form two species 302 tocatalytic cycles . Fig. 2b and c show Type I and 327 that exchange with the caged species, thus amplifying 303 II cores both found in the formose reaction. Similarly, 328 its release. We find that this process can be described 304 autocatalytic cores of Type I and II can be found in 329 within our framework and corresponds to a Type III 305 the Calvin cycle and reverse Krebs cycle (Sup. Fig. 330 core (Supp. Fig. S5). 306 S4a). Some reaction steps Fig. 2b may be catalyzed 331 Overall, examples of Types I and II seem most 307 externally (e.g. by enzymes, base, ions), but external 332 common, whereas have not yet found examples of Types 308 catalysis in general does not alter the core. By the 25,39 333 IV and V. 309 same token, proposed examples of auto-induction 310 are found to harbor Type I and II cores (Sup. Fig. 311 S4b). 334 VIABILITY OF AUTOCATALYTIC NETWORKS 312 In the GARD model for self-enhancing growth 4,5 313 of amphiphile assemblies , all underlying autocatal- 40 314 ysis is described (Supp. Fig. S6) by Type I cycles 335 Stoichiometric conditions do not guarantee that 315 (self-incorporation) and Type II cycles built up from 336 autocatalytic motifs amplify. Whether an initial auto- 5

337 catalyst amplifies or degrades depends on kinetic con- 384 2 descendants), and P0 = 1 − pc the probability of 21,26 338 siderations. To address this ’fixation problem’ , we 385 premature degradation: 339 examined the probability Pex of extinction (or 1 − Pex 1−pc pc 340 of fixation) of species within autocatalytic motifs, as a ∅ ←−−− B1 −−→ 2B1. (6) 341 function of transition probabilities of reaction steps.

342 Considering an homogeneous system with a steady 386 For reversible reactions, pc results from the integra- 343 supply of reactants, several authors have noted that 387 tion of all possible sequences of forward and backward − 344 in the highly dilute autocatalyst regime, appreciable 388 reactions along the cycle. From Bk, let Πk be the 21,26,44 + 345 rates require first-order autocatalysis , i.e. each 389 transition probability to revert to Bk−1, and Πk to 346 forward reaction step only involves one autocatalyst. 390 convert to Bk+1. We have 347 Among first-order order networks, fixation models have n 348 so far focused on Type I networks (e.g. Fig.2b), which Y + pc = Π Γk, (7) 349 comprise a single graph cycle containing n species. In k k=1 350 a transition step, a given species may either proceed ∞ 351 irreversibly to the next species or disappear as a result X − + s 1 Γk+1 = (Πk+1ΓkΠk ) = − + , (8) 352 of degradation. King found that if every reaction step 1 − Π Γ Π s=0 k+1 k k 353 k among n steps of the cycle has a success probability + + 354 Πk (1−Πk being the degradation probability), fixation 391 where Γk recursively defines the statistical weight of Qn + 44 355 is possible if pc = k=1 Πk ≥ 1/2 , a condition called 392 all back-and-forth trajectories from Bk to itself from 23,45 21 − + 356 decay threshold . Bagley et al. used birth-death 393 the transition probabilities Πk and Πk , with Γ1 = 1. 357 processes to derive Pex for an autocatalytic loop com- 394 The overall outcome described by (6) corresponds to 358 prising one species (n = 1). Schuster reported detailed 395 the simplest type of branching process: a birth-death 359 time-dependent statistics for such networks in various 396 process. Using (5) finally yields 26 360 contexts .  1 1 − 1, pc ≥ , 361 Here we extend the problem considered by Bagley pc 2 Pex = 1 (9) 362 et al. so as to include reversible reactions and net- 1, pc < 2 . 363 works beyond type I using the theory of branching 46 397 364 processes . In these stochastic processes, a parent This generalizes Bagley et al’s observation for type 398 365 (here an autocatalytic species X ) is replaced by k off- I networks to n > 1 and reversible reactions. The s − 399 366 spring, with k drawn from a distribution P , chemically irreversible limit Πk → 0, Γk → 1 recovers King’s k 44 400 367 corresponding to result .

Pk Xs −−→ kXs. (4) 401 Viability of autocatalytic motifs 368 A procedure to construct the branching process from 369 reaction networks and derive Pk from transition proba- 402 To investigate how autocatalytic motif structure 370 bilities Πk is illustrated in Supplementary Information 403 affects survival, we calculated Pex for five different 371 Section V. Pex ultimately determines Pk, since the 47 404 cores (N1 to N5, Fig. 3): they are of equal size (6 372 probability to go extinct is the probability that all 405 reaction steps, 6 species), all reactions proceed irre- 373 descendants go extinct: 406 versibly with the same success rate ζ, which plays a + ∞ 407 similar role as the transition probability Πk in the ex- 2 X k 23,44,45 Pex = P0 + P1Pex + P2Pex + ... = PkPex. (5) 408 ample above and is sometimes called specificity . k=0 409 Fig. 3c highlights how Pex depends on ζ for each 410 core structure. The highest ζ for extinction (Pex = 1) 374 We first exemplify the method by generalizing known 411 is observed for the Type I cycle N1, and progressively 375 results for Type I networks, solutions for other networks 412 lower values are found for N2 to N4, which are all of 376 are reported in the Supplementary Information Section 413 type II. Type V network N5 tolerates the lowest speci- 377 V. We then compare the computed Pex of autocatalytic 414 ficity ζ before extinction, sustaining almost three times 378 motifs which differ in their core structures. 415 higher failure rates 1−ζ than N1. These differences can 416 be qualitatively understood by counting the minimum 417 number of steps needed to produce more autocatalysts. 379 Reversible Type I cycles 418 In respective order, networks N1 to N5 in Fig. 3b do so 419 in six, four, three, three and two steps. In particular, 420 given their symmetries, the P of N and N have the 380 Consider a Type I cycle, such as N1 in Fig. 3b, ex 3 5 421 same dependence on ζ as a 3 and 2-membered Type I 381 consisting of n nodes. Starting at the first node, B1, let 422 cycle, respectively. 382 P2 = pc be the total probability that, after realizing 383 every step of the cycle, it is converted to 2B1 (k = 423 It has been suggested that large networks are dis- 6

a) a) P ex α 3 β

4 + 2

1 +

P 5

+ ex +

6 7 U V b) α 3 β A 4 2 b) AB A B 2 5 1 c)

N1 N2 N3 N4 N5 ex P Figure 3: a) Pex as function of ζ (legend: Pex(ζ) for k k ex Pex < 1) for b) 5 autocatalytic networks of similar size, starting at the dashed node. N1: Type-I cycle. N2: Type-II with one fork. N3: Type-II, two nonover- lapping allocatalytic cycles, a common motif in GARD with a 1st order RAF representation. N4: Type-II: al- locatalytic cycles connected by intermediate steps. N5: Type-V d) Symbols: Pex after 1000 simulated trials, lines: exact solution. Full expressions are derived in Supplementary Information Section VI. Figure 4: Multicompartment autocatalysis a) between compartments α and β, coupled by exhange reactions of species A and A2B. AB functions as an autocata- 424 favored in general. The examples of Fig. 3 indicate lyst in α, and as a feedstock species in β. b) Type III 425 that this can be counterbalanced by the presence of autocatalytic core obtained for the multicompartment 426 more allocatalytic cycles in the network. In particular, autocatalytic network c) Extinction probability Pex 427 autocatalytic sets in the sense of RAF-sets allow to for multicompartment autocatalysis in a), starting ex 428 build large and robust networks, since their underlying from a single Aα, as a function of exchange rate k and degradation rate kd, relative to other relevant re- 429 structure is typically built up from allocatalytic cycles. action rates fixed at k (Pex is independent of rates for r1 and r5). Sloped asymptote: exchange-limited sur- d vival kex = 2k .√ Vertical asymptote: reaction-limited d 13−3 430 EXTENSION TO MULTICOMPARTMENT survival k /k = 2 . The transition between extinc- 431 AUTOCATALYSIS tion and potential survival (Pex < 1) is marked by a dashed white line (solutions for Pex and asymptotes are derived in Supplementary Information Section 432 As a final illustration, we now extend our stoichio- VIII). 433 metric method to an emergent type of autocatalysis. 434 Consider two compartments α and β, the internal chem- 435 istry of which admits no autocatalysis (Fig. 4). The 448 to be reused in β, thereby providing the missing loop 436 two compartments allow the same chemistry, but they 449 to close the cycle. With three compartments, a type 437 do not have the same access to feedstock species (U, V 450 III motif can be constructed from a single bimolecular 438 in α and A in β). Upon coupling these compartments 451 reaction (Supp. Fig. S9e). 439 via a membrane that exchanges A and A2B (Fig. 4a), 440 a Type II autocatalytic core emerges (Fig. 4b). 452 In Fig. 4c, the results of the analysis of the viabil- 453 ity conditions for this network are shown as a plot of 441 Here, species AB is limiting in α, whereas in β it is d 454 the extinction probability vs. degradation rate k , and 442 an abundant feedstock, which is catalytically exchanged ex 455 exchange rate k . Both of these rates are normalized 443 through the reversible formation of A2B. By coupling 456 by characteristic rates k present in the other reactions 444 compartments, compounds are no longer restricted to 457 which are assumed to be equal to each other. 445 one role (e.g. feedstock, autocatalyst) and chemical 446 reactions are no longer restricted to one direction. The 458 The figure shows that to overcome the degrada- 447 reaction used for reproduction in α is then allowed 459 tion threshold (Pex < 1), the typical reaction rate k 7

d 460 over a certain fraction of the degradation rate k (ver- 514 different reactions. However, putting all those compo- 461 tical black dotted line in Fig. 4c), as we have seen 515 nents together disproportionately favors side-reactions, 23,45 462 in the case of single compartment autocatalysis, but 516 leading to extinction . ex 463 additionally, the rate of exchange k must outpace 517 Multicompartment autocatalysis introduced here 464 the rate of degradation (black oblique dotted line in 518 offers a way around this problem: coupled compart- 465 Fig. 4c). Stochastic simulations (Supp. Fig. S9c) show 519 ments can produce autocatalysis from few reactions and 466 autocatalytic growth in these conditions for surviv- 520 generate extensive networks without mixing all compo- 467 ing populations, thus confirming that these emergent 521 nents, thereby reducing the accumulation of degrada- 468 motifs can indeed exhibit to autocatalysis. 522 tive side reactions. Overcoming degradation is a general 523 issue, known as the error threshold when replicators 524 degrade by mutation. Compartments also relax this 51 52 525 469 DISCUSSION threshold , particularly when they are transient .

526 The emergence of multicompartment autocatalytic 527 cycles relies on the environmental coupling of reaction 470 We presented a theoretical framework for auto- 528 networks, which allows to access conditions unattain- 471 catalysis based on stoichiometry, which allows a precise 529 able in a single compartment. In our example, this 472 identification of the different forms of autocatalysis. 530 allowed us to reuse the compounds and reactions to 473 Starting with a large stoichiometric matrix, we show 531 complete autocatalytic cycles. The principle is more 474 how to reduce it, and how to distinguish allocatalysis 532 general, however: autocatalysis may come from pairing 475 from autocatalysis at the level of this reduced stoi- 533 physico-chemical conditions which cannot coexist in 476 chiometric matrix. A detailed analysis of the graph 534 one homogeneous phase, as found in the autocatalytic 477 structure contained in these reduced stoichiometric 53 535 dissolution of copper and pitting corrosion. Such 478 matrices reveals that they contain only five possible re- 536 principles may be exploited to more readily design 479 current motifs, which are minimal in the sense that they 537 autocatalytic systems with biomimetic functionalities. 480 do not contain smaller motifs. Fundamental modes of 481 production of minimal autocatalytic cores are encoded 538 Overall, our framework shows that autocatalysis 482 in the column vectors of the inverse of the autocat- 539 comes in a diversity of forms and can readily emerge in 483 alytic core submatrix. Autocatalytic cores are found 540 unexpected ways, which makes autocatalysis in chem- 484 to have a single reactant for each reaction. This means 541 istry more widespread than previously thought. This 485 that autocatalytic networks require the availability of 542 invites to search for further extensions of autocataly- 54 486 certain chemical species in their cores to operate prop- 543 sis, which provides new vistas for understanding how 487 erly. Conversely, the same property also implies that 544 chemistry may complexify towards life. 488 the proper functioning of an autocatalytic network 489 will guarantee the stable supply of certain products, 490 a definitive advantage when these products are key 545 MATERIALS AND METHODS 491 enzymes or metabolites.

492 We identified these minimal motifs in known ex- 493 amples of autocatalysis such as the formose reaction, 546 Theoretical methods and derivation of results are 494 metabolic networks, the GARD model and RAF sets. 547 detailed in the Supplementary Information, which is di- 495 Autocatalytic cores also provide a basis for algorithms 548 vided up in the following sections: I) Terminology and 496 to identify these recurring autocatalytic motifs in large 549 definitions, II) derivation of autocatalytic cores from 48,49 497 chemical networks , as has been done for gene reg- 550 Graph theory, III) their chemical interpretation and 50 498 ulatory networks . In this way, we may be able to 551 IV) application to formose, autoinduction, metabolic 499 break the complexity of large chemical networks into 552 cycles, chemical amplification, RAF sets, GARD V) 500 smaller, more manageable structures. 553 branching process derivation and determination of Pex. 554 VI) determination of P for Fig. 3. VII) determina- 501 Autocatalytic motifs are found to confer differ- ex 555 tion of P for Fig. 4d. VIII) autocatalysis from one 502 ent degrees of robustness, which we evaluated using ex 556 bimolecular reaction and 3 compartments. 503 the notion of viability. This viability can be precisely 504 evaluated as a survival probability in an appropriately 505 defined branching process. This approach is applicable 506 to many models for autocatalytic systems upon iden- 557 ACKNOWLEDGEMENTS 507 tification of their cores, highlighting the interest of a 508 unified framework. Viability results from a competi- 509 tion between reactions that produce autocatalysts and 558 AB acknowledges stimulating discussions with D. 510 side-reactions such as degradation. This is intimately 559 van de Weem. AB and DL acknowledge support from 511 related to the ’paradox of specifity’: autocatalytic mo- 560 Agence Nationale de la Recherche (ANR-10-IDEX- 512 tifs are more likely to be found in large networks with 561 0001-02, IRIS OCAV). PN acknowledges C. Flamm 513 many different chemical components engaging in many 562 for discussions. 8

38 563 REFERENCES 631 A. Ricardo, F. Frye, M. A. Carrigan, J. D. Tipton, D. H. 632 Powell, and S. A. Benner, J. Org. Chem. 71, 9503 (2006). 39 1 633 A. J. P. Teunissen, T. F. E. Paffen, I. A. W. Filot, M. D. 564 W. Hordijk, BioScience 63, 877 (2013). 2 634 Lanting, R. J. C. van der Haas, T. F. A. de Greef, and E. W. 565 A. I. Oparin, Origin of Life (Dover, 1952). 3 635 Meijer, Chem. Sci. 10, 9115 (2019). 566 S. A. Kauffman, J. Theor. Biol. 119, 1 (1986). 40 4 636 Containing only one fork. 567 D. Segré, D. Ben-eli, D. W. Deamer, and D. Lancet, Orig. 41 637 V. Vasas, C. Fernando, M. Santos, S. Kauffman, and E. Sza- 568 Life Evol. Biosph. 31, 119 (2001). 5 638 thmáry, Biol. Direct 7, 1 (2012). 569 D. Lancet, R. Zidovetzki, and O. Markovitch, J. R. Soc. 42 639 S. A. Kauffman, A world beyond physics: the emergence & 570 Interface 15, 20180159 (2018). 6 640 evolution of life (Oxford University Press, 2019). 571 L. E. Orgel, Plos Biol. 6, 5 (2008). 43 7 641 J. Chen, S. Körner, S. L. Craig, S. Lin, D. M. Rudkevich, and 572 C. Woese, The Genetic Code, the Molecular Basis for Genetic 642 J. Rebek, Proc. Natl. Acad. Sci. U. S. A. 99, 2593 (2002). 573 Expression (Harper and Row, New York, 1967). 44 8 643 G. King, BioSystems 15, 89 (1982). 574 L. E. Orgel, J. Mol. Biol. 38, 381 (1968). 45 9 644 E. Szathmáry, Philos. Trans. R. Soc. B Biol. Sci. 355, 1669 575 F. Crick, J. Mol. Biol. 38, 367 (1968). 10 645 (2000). 576 W. Gilbert, Nature 319, 618 (1986). 46 11 646 S. Karlin and H. M. Taylor, A first course in stochastic pro- 577 A. Bissette, B. Odell, and S. Fletcher, Nat Commun 5, 4607 647 cesses (Academic Press, New York, 1975). 578 (2014). 47 12 648 Note that in general, the calculation of Pex may involve re- 579 S. N. Semenov, L. J. Kraft, A. Ainla, M. Zhao, M. Bagh- 649 actions that are not in the core. Here we consider sufficiently 580 banzadeh, V. E. Campbell, K. Kang, J. M. Fox, and G. M. 650 simple cases where this is not necessary. 581 Whitesides, Nature 537, 656 (2016). 48 13 651 á. Kun, B. Papp, and E. Szathmáry, Genome Biol 9, R51 582 H. N. Miras, C. Mathis, W. Xuan, D.-L. Long, R. Pow, 652 (2008). 583 and L. Cronin, Proc. Natl. Acad. Sci. U. S. A. (2020), 49 653 Andersen, J. L. and Flamm, C. and Merkle, D. and Stadler, 584 10.1073/pnas.1921536117. 14 654 P. F., IEEE/ACM Trans. Comput. Biol. Bioinform 16, 510 585 B. Liu, C. G. Pappas, E. Zangrando, N. Demitri, P. J. 655 (2017). 586 Chmielewski, and S. Otto, J. Am. Chem. Soc. 141, 1685 50 656 R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, 587 (2019). 15 657 and U. Alon, Science 298, 824 (2002). 588 M. Altay, Y. Altay, and S. Otto, Angew. Chem. Int. Ed. 57, 51 658 D. Grey, V. Hutson, and E. Szathmáry, Proc. R. Soc. Lond. 589 10564 (2018). 16 659 B 262, 29 (1995). 590 C. Viedma, Phys. Rev. Lett. 94, 065504 (2005). 52 17 660 A. Blokhuis, P. Nghe, L. Peliti, and D. Lacoste, J. Theor. Biol. 591 I. Baglai, M. Leeman, M. Kellogg, and W. Noorduin, Org. 661 487, 110110 (2020). 592 Biomol. Chem. 17, 35 (2019). 53 18 662 Halpern, J, J. Electrochem. Soc. 100, 421 (1953). 593 K. Ichimura, K. Arimitsu, and K. Kudo, Chemistry Lett. 24, 54 663 R. Krishnamurthy, Chem. Eur. J. 24, 16708 (2018). 594 551 (1995). 55 19 664 K. Andersson, G. Ketteler, H. Bluhm, S. Yamamoto, H. Oga- 595 F. Schlögl, Z. Phys. 253, 147 (1972). 20 665 sawara, L. G. M. Pettersson, M. Salmeron, and A. Nilsson, J. 596 G. Nicolis and I. Prigogine, Self-Organization in Nonequilib- 666 Am. Chem. Soc. 130, 2793 (2008). 597 rium Systems: From Dissipative Structures to Order through 56 667 D. Angeli, P. D. Leenheer, and E. D. Sontag, Math. Biosci. 598 Fluctuations (Wiley, New York, 1977). 21 668 210, 598 (2007). 599 R. J. Bagley, D. J. Farmer, and W. Fontana, in Artif. Life II 57 669 J. Chen, S. Körner, S. L. Craig, D. M. Rudkevich, and J. Re- 600 (1991) pp. 141–158. 22 670 bek, Nature 415, 385 (2002). 601 D. Segré, D. Lancet, O. Kedem, and Y. Pilpel, Orig. Life Evol. 58 671 G. King, J. Theor. Biol. 123, 493 (1986). 602 Biosph. 28, 501 (1998). 59 23 672 Provided we choose 1 ≤ k ≤ m − n. 603 E. Szathmáry, Philos. Trans. R. Soc. Lond. B. Biol. Sci. 361, 604 1761 (2006). 24 605 W. Hordijk and M. Steel, Journal of Theoretical Biology 227, 606 451 (2004). 25 607 D. G. Blackmond, Angew. Chem. Int. Ed. 48, 386 (2009). 26 608 P. Schuster, Monatsh. Chem. 150, 763 (2019). 27 609 M. Feinberg, Foundations of Chemical Reaction Network The- 610 ory (Springer, New York, 2019). 28 611 U. Barenholz, D. Davidi, E. Reznik, Y. Bar-on, N. Antonovsky, 612 E. Noor, and R. Milo, eLife 6, e20667 (2017). 29 613 A. Deshpande, M. Gopalkrishnan, and C. Science, Bull. Math. 614 Biol. 76, 2570 (2014). 30 615 M. Polettini and M. Esposito, J. Chem. Phys. 141, 024117 616 (2014). 31 617 A. D. McNaught and A. Wilkinson, IUPAC Compendium of 618 Chemical Terminology, 2nd ed. ("the Gold Book") (Blackwell 619 Scientific Publications, Oxford, 1997). 32 620 W. Hordijk, J. Theor. Biol. 435, 22 (2017). 33 621 M. Gopalkrishnan, Bull. Math. Biol. 73, 2962 (2011). 34 622 Borwein, Jonathan and Lewis, Adrian S., Convex Analysis and 623 Nonlinear Optimization (Springer-Verlag New York, 2006). 35 624 S. Schuster, C. Hilgetag, J. H. Woods, and D. A. Fell, “Ele- 625 mentary modes of functioning in biochemical replicators,” in 626 Computation in Cellular and Molecular Biological Systems 627 (Singapore: World Scientific, 1996) pp. 151–165. 36 628 Graph cycle are understood as closed paths in the graph, and 629 should not be confounded with reaction cycles. 37 630 R. Breslow, Tetrahedron Lett. 21, 22 (1959). 9

673 SUPPLEMENTARY INFORMATION 723 that express an opposition (same vs different). This 724 opposition between same and different is e.g. found 725 in the IUPAC terminology for a homogeneous catal- 674 I. TERMINOLOGY 726 ysis (occuring in the same phase) and heterogeneous 727 catalysis. For the IUPAC recommended terminology 675 A. IUPAC Definitions 728 ’autocatalysis’, a consistent choice that expresses this 729 opposition is ’allocatalysis’ (self vs other).

676 To promote the consistent use of terminology, 677 IUPAC committees establish recommendations which 678 serve as a basis for the Compendium of Chemical Ter- 679 minology (The Gold Book), of which some relevant 680 entries are reproduced here. Comments in italic have 730 C. Remarks 681 been added for clarity.

682 Chemical Reaction: A process that results in the

683 interconversion of chemical species. Chemical reactions 731 1. Directionality of autocatalysis 684 may be elementary reactions or stepwise reactions. (It 685 should be noted that this definition includes experi-

686 mentally observable interconversions of conformers.) 732 In an allocatalytic reaction, catalysts are produced 687 Detectable chemical reactions normally involve sets 733 and consumed in equal amount, and both the ’for- 688 of molecular entities as indicated by this definition, 734 ward’ or ’backward’ are acceletated, independently of 689 but it is often conceptually convenient to use the term 735 whether the reaction goes in one or the other direction. 690 also for changes involving single molecular entities (i.e. 736 In contrast, the term autocatalysis only applies in one 691 ’microscopic chemical events’). 737 direction, due to the requirement of having catalysts

692 Catalyst: A substance that increases the rate of a 738 be the product of the net reaction. However, the reac- 693 reaction without modifying the overall standard Gibbs 739 tion in the other direction is still a form of catalysis, 694 energy change in the (net) reaction; the process is 740 sometimes refered to as ’reverse autocatalysis’. For 695 called catalysis. The catalyst is both a reactant and 741 example, water adsorbed on a copper surface H2O(ads) 55 696 product of the (catalyzed) reaction. The words cata- 742 can strongly accelerate its own disproportionation . 697 lyst and catalysis should not be used when the added 2H O(ads) −−* H O · OH(ads) + H(ads), (10) 698 substance reduces the rate of reaction (see inhibitor). 2 )−− 2 699 Catalysis can be classified as homogeneous catalysis, H2O · OH(ads) )−−−−* H2O(ads) + OH(ads). (11) 700 in which only one phase is involved, and heterogeneous 701 catalysis, in which the reaction occurs at or near an 743 In this direction, the net reaction produces no catalysts 702 interface between phases. Catalysis brought about by 744 (instead, it degrades them): 703 one of the products of a (net) reaction is called auto- H O(ads) −−* H(ads) + OH(ads), (12) 704 catalysis. Catalysis brought about by a group on a 2 )−− 705 reactant molecule itself is called intramolecular catal- 745 hence it does not satisfy the stoichiometric require- 706 ysis. The term catalysis is also often used when the 746 ments for autocatalysis. However, in the opposite sense 707 substance is consumed in the (net) reaction (for exam- 747 an appropriate stoichiometry for autocatalysis can be 708 ple: base-catalysed hydrolysis of esters). Strictly, such 748 realized: 709 a substance should be called an activator.

710 Autocatalytic Reaction: A(net) chemical reac- H(ads)+OH(ads)+H2O(ads) )−−−−* 2H2O(ads). (13) 711 tion in which a product (or a reaction intermediate)

712 also functions as a catalyst. In such a reaction the 749 Note that reaction (11) is still a form of catalysis: 713 observed rate of reaction is often found to increase 750 H2O is a reactant and product in reaction (11), which 714 with time from its initial value. 751 represents an accelerated pathway for reaction (12).

715 B. Allocatalysis

716 We refer to allocatalysis as the form of catalysis 752 2. Inhibition 717 in which, at the end of a catalytic cycle, the catalyst(s) 718 have not changed in number. By their equal participa- 719 tion in either direction, allocatalysts will thus drop out 753 A compound is a catalyst in the context of a par- 720 of the net reaction. Some authors refer to autocatalysis 754 ticular experimental condition where it accelerates the 721 as homocatalysis and allocatalysis as heterocatalysis, 755 rate of a reaction. A change of those conditions may 722 which is a lexicologically consistent choice of terms 756 change this label, the compound may become an in- 10

757 hibitor instead. E.g. consider 798 {r1, ..., rm} is reaction set, each rj being an ordered 799 pair (Xj, Y j) of non-empty and non-intersecting sub- 1 A + B )−−−−* AB, (14) 800 sets of S respectively called reactants and products, and 801 M is the stoichiometric matrix with coefficients mij. −−2* A + E )−− AE, (15) 802 mij = 0 when the species si does not participate to the 3 803 reaction r , m < 0 when species s is a reactant of AE + B )−−−−* AEB, (16) j ij i 804 rj, and mij > 0 when species si is a product of rj. 4 AEB )−−−−* E + AB. (17) 805 The stoichiometric matrix M contains all the in-

758 For a rate acceleration to occur, steps 2,3,4 must 806 formation about the hypergraph, the column of M 759 together proceed faster than reaction 1. Template- 807 corresponding to reactions, and the rows to species. 760 assisted ligation of DNA or RNA provides an example 808 Definition 2. A subgraph of H = (S, R, M) is a 761 where this may not be the case: below a critical anneal- 0 0 0 0 0 0 809 triplet H = (S ,R ,M ) where S is a subset of S, R 762 ing temperature the detachment of the product (here, 810 is a set of reactions which reactants and products are 763 AB in step 4) becomes increasingly slower for longer 0 811 in S and intersect the reactant set and product set 764 strands, and the template becomes an inhibitor. 812 of a reaction in R, with corresponding stoichiometric 0 813 coefficients M .

765 3. Autonomy and siphons 814 Definition 3. A reaction graph is square if it has the 815 same number of reactions and species.

766 The concept of autonomy is closely related to the 816 Definition 4. A directed path is a sequence of alter- 767 concept a siphon in Chemical Reaction Networks (CRN) 817 nating species and reactions, all reactions and species 56 768 theory : A Siphon Σ is a subset Σ ⊂ S of all species 818 being distinct, where species which precede and succeed 769 S, which for each reaction that has a species in Σ as a 819 a reaction are respectively a reactant and a product of it. 770 product, has at least one of its reactants in Σ. 820 A path is a minimal path if it is not possible to form

771 In this definition, a reaction can be irreversible 821 a path starting and ending at the same species using a 772 in a mathematical sense: the reverse reaction does 822 strict subset of its reactions. A path is semi-open if 773 not exist. For a reversible CRN, a reverse reaction 823 it either starts or ends with an edge. 774 does exist, and the siphon definition must apply to 824 Definition 5. In a directed path, an edge has a back- 775 the forward and backward direction, thus becomes 825 branch if one of its products is a species located up- 776 equivalent to autonomy. A reaction must then imply 826 stream in the path, and this product is called a back- 777 both one or more products and one or more reactants 827 product. An edge has a forward-branch if one of its 778 from Σ (siphon reactions), or none at all (external 828 reactants is a species located downstream in the path, 779 reactions). Note that autonomy is less restrictive than 28 829 and this product is called a forward-reactant. 780 the conditions posed in Barenholz et al. , where in

781 addition species must be both reactant and product 830 Definition 6. A cycle has an identical definition as a 782 of a reaction. This last conditions is a proved as a 831 path, except that the first and last species are the same 783 consequence for minimal structures in our choice of 832 species. A cycle is minimal if it is not possible to form 784 formalism (see below). 833 a cycle with a subset of its species and reactions.

834 Definition 7. A species S is the solitary reactant 835 (product) of a reaction if S is the only reactant (resp. 785 II. MATHEMATICAL DERIVATION OF 836 product) of this reaction, otherwise it is a co-reactant 786 AUTOCATALYTIC CORES 837 (resp. co-product).

787 A. Reaction graph definitions 838 Definition 8. A reaction is simple if has a single 839 reactant and a single product.

788 Reaction graphs described below correspond to 840 Definition 9. Consider a simple reaction R with re- 789 weighted directed hypergraphs without self-loops in the 841 actant x and product y, with respective 790 language of graph theory. In this section, the word cycle 842 −a and b. The contraction of R consists of: (i) re- 791 is used in the sense of graph theory (see below), not 843 moving R; (ii) merging x and y into a single species z, 792 in the sense of reaction cycle used to denote right null- 844 and; (iii) multiplying by a (resp. b) the stoichiometric 793 vectors of the stoichiometric matrix. In the following, 845 coefficients associated with z for all reactions formerly 794 letters used for scalars indicate positive numbers, and 846 associated with x (resp. y). 795 cycles and paths are understood as directed. 847 Definition 10. In a square graph, a perfect match- 796 Definition 1. A reaction graph H is a triplet 848 ing is a bijection between species and reactions. It 797 (S, R, M) where {s1, ..., sn} is the species set, R = 849 corresponds to all pairs (i, σ(i))i=1...N , where i is the 11

850 index of a species, σ is a permutation of [1...N], and 895 • A minimal cycle C has exactly two perfect match- 851 σ(i) is the index of a reaction. A perfect matching of 896 ings, which correspond to the matching of its 852 a graph, or a subgraph, G is called a G-matching. 897 reactions with their solitary reactants and soli- 898 tary products respectively. det(C) = 0 if and only 853 Remarks: 899 if C is weight-symmetric.

0 854 • Consider reactions E = ({a}, {b, c}) and F = 900 • Be H the hypergraph obtained from H by con- 855 ({b}, {c}); a − E − b − F − c is a path, but it is 901 tracting a simple reaction R. The cofactor expan- 0 856 not minimal because it contains a−E −c. Simple 902 sion implies that |det(H )| = |det(H)|. Addition- 0 0 857 paths are necessarily minimal (Fig. S1a). 903 ally, any cycle C of H becomes a cycle C in H ob- 0 904 tained by contracting R, and |det(C )| = |det(C)|. 858 • Consider E = ({a}, {b}) and F = ({b}, {a, c}); 859 a − E − b − F − c is a minimal path and F has a 860 back-branch (Fig. S1a). In particular, it contains 905 C. Autocatalytic cores 861 a cycle a − E − F − a.

862 • Simple paths are exactly cycle-free minimal 906 Definition 14. A core is a minimal productive reac- 863 paths. 907 tion graph, i.e. a reaction graph that does not contain 908 any productive subgraph. 864 • More generally, a minimal path can have back-

865 products and forward-reactants (Fig. S1a). 909 Finding cores is equivalent to finding minimal ma- 910 trices which are productive and autonomous. In this 866 • A hypergraph cycle can have reactions connecting n,m 911 section, we denote M ∈ IR the stoichiometric ma- 867 several of its species, but a minimal cycle cannot. 912 trix of the graph H, where n is the number of rows 868 Thus, a minimal cycle only contains simple paths 913 (species) and m the number of columns (reactions). 869 (it is identical to a cycle in a regular graph). 914 If M is productive, we denote γ a vector such that 915 M.γ 0. Productivity of M is indifferent to the sign 870 • Cycles are square. 916 of its columns as the sign of the coefficients of γ is not 917 constrained. Therefore, we choose the convention that 918 any productive vector γ is positive, up to taking the 871 B. Relationship with linear algebra 919 opposite for some columns of M.

920 Remarks: 872 x 0 denotes a real vector with only strictly

873 positive coordinates. 921 • M invertible implies M productive, as the image n,m 922 of M is then the full space IR , which contains 874 Definition 11. A matrix M is productive if there 923 the strictly positive orthant. 875 exists a real vector γ such that M. γ 0. Equivalently, n 876 M intersects the strictly positive orthant IR>0. 924 • Below, species or reactions ’can be removed’ is 925 understood as ’can be removed while preserving 877 Definition 12. A matrix is autonomous if its 926 productivity and autonomy’. Being able to re- 878 columns all contain a strictly negative and a strictly 927 move a row (a species) or a column (a reaction) 879 positive coefficient. 928 contradicts the minimality of a core.

880 Definition 13. A minimal cycle is weight- 929 • Removing columns (reactions) preserves auton- 881 symmetric if the product of the absolute values 930 omy, but not necessarily productivity. 882 of the stoichiometric coefficients associated with its

883 reactants equals the product of the stoichiometric 931 • Removing rows (species) preserves productivity, 884 coefficients associated with its products. Otherwise, the 932 but not necessarily autonomy. 885 cycle is weight-asymmetric. 933 • A row corresponding to a species that is always 886 Remarks: 934 a co-reactant or a co-product can be removed 935 without affecting autonomy (every column still 887 • Autonomous matrices are stoichiometric matrices 936 contains positive and negative coefficients) and 888 of reactions systems such that every reaction has 937 productivity. 889 at least one reactant and at least one product. 938 Proposition 1. In a core, every species is both a 890 • The Leibniz formula for the determinant shows 939 reactant and a product. 891 that the non-zero terms of det(M) exactly cor- 892 respond to the products of stoichiometric coeffi- 940 Proof. Every species must be produced, otherwise it 893 cients mi,σ(i) determined by each possible perfect 941 would not be possible to find a positive γ verifying 894 matching of the graph. 942 M.γ 0. Now, suppose a species S is never a reactant 12

943 and only a product. All reactions such that S is their 993 Proposition 5. In a core, every species is involved in 944 only product can be removed without affecting the 994 a cycle. 945 productivity of other species. In the resulting graph, 946 either S is not the product of any reaction anymore, 995 Proof. Obvious from Property 4, considering the back 947 or S is only a co-product of the remaining reactions. 996 and forth paths joining any two species. 948 In both cases, S can be removed without affecting 997 Proposition 6. Consider a partition of a core into 949 autonomy. 998 two species sets V and W . V cannot be upstream of 999 W , i.e. reactions with products in W cannot have all 950 Proposition 2. A core is square, invertible, every 1000 their reactants in V . 951 species is the solitary reactant of a reaction, and is 952 reactant for this reaction only. 1001 Proof. Suppose on the contrary that M can be written   n,m AB 953 Proof. Consider M ∈ IR a productive stoichiomet- 1002 M = where A and B span V , C spans W , 0 C 954 ric matrix with n species and m reactions, with rank 1003 and B ≤ 0. Given the latter inequality, Property 1 955 k. Obviously k ≤ m, n. We must also have m = k, 1004 implies that A is non-empty and autonomous. Consider 956 otherwise a column could be removed while preserving 1005 γ > 0 such that M.γ 0. Be α and β the respective 957 the image of M, thus productivity. Additionally, every 1006 restrictions of γ to the reaction spaces of A and B. 958 species must be the solitary reactant of at least one 1007 Then A.α + B.β 0. As B.β ≤ 0, we have A.α 0, 959 reaction, otherwise it could be removed. This implies 1008 contradicting the minimality of M. 960 m ≥ n. Overall, k = m ≥ n ≥ k, so that k = m = n, 961 meaning that M is square and invertible. As every 1009 The definitions below are generalizations of the 962 species S is the solitary reactant of at least one reac- 1010 notion of ear decomposition in regular graphs (Fig. 963 tion R and m = n, the species is a reactant for only 1011 S1b). 964 R.

1012 Definition 15. A hyper-ear is a hypergraph compris-

965 Remark: Property 2 implies that M can be re- 1013 ing a minimal cycle C, called the base cycle, such that 966 arranged in such a way that it has a strictly negative di- 1014 C has a reaction with a product x outside C, and a min- 967 agonal, and only non-negative off-diagonal coefficients. 1015 imal path P starting at x, such that its last reaction, 1016 R, has a product in C, R being the only P-reaction 968 Proposition 3. In a core, a square autonomous sub- 1017 to have a product in C.A proto-ear has a similar 969 matrix must have a product outside its set of species. 1018 definition, but where C is a simple path (called the base 1019 path) instead of a minimal cycle. 970 Proof. Be A a square autonomous submatrix of M and   AC 1020 Description of proto-ears and hyperears - In 971 write M = . We need to show that B > 0. BD 1021 a hyper-ear or a proto-ear, any C-reaction can have x 972 As A is autonomous, every A-reaction has a reactant 1022 as a product, and any C-species can be the product of 973 in the set of A-species. By Property 2, reactants are 1023 R, the last reaction of P (Fig. S1b). We denote by u 974 always solitary, thus B ≥ 0. Now suppose B = 0. Then 1024 a species which is the product of R, v any C-species 975 det(M) = det(A).det(D). Either det(A) = 0, implying 1025 which is the reactant of a C-reaction which produces 976 det(M) = 0, contradicting M invertible, or det(A) 6= 0, 1026 x,’−’ a simple path (including the empty path), o a 977 then A is productive, contradicting the minimality of 1027 simple path comprising at least one species, o being a 978 M. 1028 subcase of the ’−’ category. By convention, a uv motif 1029 can correspond to a single same species which is both a

979 Proposition 4. A core is strongly connected. 1030 R-product and a reactant for x production. Any proto- 1031 ear falls into a class described by a chain made of the 1032 symbols u, v, o, and ’−’, as soon as it contains at least 980 Proof. Consider a species x0 of M. Below, we recur- 1033 a u and a v. Reciprocally, any such chain represents 981 sively construct sets Dk = {x1, ..., xk} of increasing 1034 one or more proto-ears. Any hyper-ear is obtained by 982 cardinal, such that every xi is downstream x0, until 1035 cyclic closure of the base path of a proto-ear. Such 983 k = n − 1, implying that for any species y 6= x, there 0 0 1036 closure is denoted by the ∗ symbol at the beginning 984 exists a path from x to y. We denote R(S) the only 1037 and the end of the chain. 985 reaction with reactant S, which is well defined by Prop- 986 erty 2. Step 1: We take x1 as a product of R(x0). 1038 Theorem 1. Cores are of one of the following types: 987 Step k: Suppose Dk exists, k < n − 1. Re-arrange 988 M such the top left block A of size k corresponds to 1039 • TYPE I: a weight-asymmetric minimal cycle; 989 species set Dk and reaction set R(Dk). As A is au- 990 tonomous, by Property 3, there exists a species xk+1 1040 • TYPE II: a cycle comprising all species as soli- 991 outside of Dk which is downstream Dk, hence Dk+1 1041 tary reactants, and one or more weight-symmetric 992 exists. 1042 sub-cycles without intersection between them; 13

1043 • TYPES III-IV-V: one of the three types of hyper- 1094 strictly positive. In all these cases, we have det(M) > 1044 ears, any subcycle of which is weight-symmetric: 1095 −1 + ac + be = 1.

1045 (III) ∗u−vo∗; (IV) ∗u−u−v∗; (V) ∗u−v−u−v∗. 1096 Step 2: All types are minimal

1097 TYPE I: Removing any subset of species or reac- 1046 Proof. Core type are represented in Fig. S1c. H 1098 tions would result in an acyclic graph, contradicting 1047 denotes the reaction graph and M its stoichiometric 1099 Property 5.

1048 matrix. 1100 TYPE II: Removing any subset of species or reac- 1049 1101 tions either leads to a hypergraph where a subset of

1050 SUFFICIENCY 1102 species is upstream the rest, contradicting Property 6,

1051 1103 or to a non-invertible minimal cycle.

1052 Step 1: All types have a non-zero determinant. 1104 TYPE III-V: Removal of any set of reactions (and 1105 a fortiori of species, given that they all are solitary 1053 TYPE I: Invertibility is a direct consequence of 1106 reactants) leaves at most one cycle in the graph, the 1054 the remark on the determinant of minimal cycles. 1107 latter being non-invertible. 1055 TYPE II: Consider a minimal weight-symmetric 1108 NECESSITY 1056 subcycle C of H. Any H-matching consistent with a 1057 C-matching can be associated to another H-matching 1109 By Property 5, there exists a minimal cycle C in 1058 obtained by reversing the C-matching. Thus, det(H) 1110 H. Either C is weight-asymmetric, then H = C is of 1059 has the form det(C).α + β, where the first term com- 1111 TYPE I. Or C is weight-symmetric. Then, by Property 1060 prises all C-consistent H-matchings. As det(C) = 0, it 1112 3, there is a C-reaction with a product x outside C. By 1061 suffices to show that β is a single non-zero term. By 1113 Property 4, there exists a path P from x to any species 1062 Property 3, C must have a reaction R1 with a prod- 1114 in C (Fig. S1d). We can take P minimal and such that 1063 uct x1 outside of C. As by definition of TYPE II, 1115 only its last reaction has a product in C, thus forming 1064 subcycles are non-intersecting, R1 must be the only 1116 a hyper-ear ∗mu−vm∗ where m symbols stand for any 1065 C-reaction with a product outside C, and x1 must be 1117 other hyper-ear motif. Below, we show that hyper-ears 1066 the only product of R1 outside C. All non-zero terms 1118 are either one of the types II-V, or contain a TYPE 1067 in β require matching R1 to x1, otherwise R1 would 1119 II, III or IV core as a subgraph, overall demonstrating 1068 be matched with a C-species, which would impose a C- 1120 that H is necessarily a hyper-ear of TYPE II-V. 1069 matching (C being a minimal cycle), contradicting the 1070 definition of β. We now order the indexes k following 1121 Before continuing the proof of the theorem, we 1071 the downstream order of the xk species along H, and 1122 show an additional property based on the sufficiency 1072 show recursively that β corresponds to the H-matching 1123 of TYPE I and TYPE III. 1073 where every Rk matches xk: (step 1) By construction, 1124 Proposition 7. In a core, a minimal path can have 1074 R1 matches x1. (step k) Suppose Rk−1 matches xk−1. 1125 back-branches but no forward-branch, and the cycles 1075 Either Rk is a simple reaction, and, as its reactant xk−1 1126 formed by back-branches are non-intersecting. 1076 is already matched, Rk necessarily matches its only 1077 product xk. Or Rk has a back-branch forming a mini- 1078 mal cycle. However, cycles are non-intersecting, so that 1127 Proof. Consider a minimal path P in a core H. 1079 the back-product xi of Rk is necessarily downstream 1128 Forward-branches require reactions to have multiple 1080 xk and upstream xk−1, thus xi is already matched by 1129 reactants, contradicting Property 1. Therefore, P is 1081 the recursion hypothesis. Thus, Rk can only match xk. 1130 either a simple path or it has back-branches. Suppose 1131 there exists two back-products y and z of the respective 1082 TYPES III-IV-V: Consider the stoichiometric ma- 1132 reactions r and r , respectively forming intersecting 1083 trix M with non-negative off-diagonal coefficients: y z 1133 cycles Cy and Cz (Fig. S1e). Without loss of gener-   −1 c e 1134 ality, y and z can be taken closest, y upstream of z, M =  a −1 f  (18) 1135 and such that there is no other cycle nested within b d −1 1136 Cy and Cz than possibly themselves. Cz is necessarily 1137 weight-symmetric, as it would otherwise contradict the

1084 We have det(M) = −1+df +ac+ade+bcf +be. Strict 1138 minimality of H. There are two cases. Case 1: rz is 0 1085 subcycles must be weight-symmetric, as otherwise, the 1139 upstream ry (Cz is nested within Cy). Call P the path 1086 minimality of M would be contradicted. Consequently, 1140 from the product of rz to ry then y then z. Then Cz 0 1087 when their factors are both non-zero, the products df, 1141 and P form a TYPE III core. Case 2: ry is upstream 0 1088 ac, and be must be equal to 1. By contracting all 1142 rz (Cz and Cy are entangled). Call P the path joining 0 1089 simple reactions and multiplying the columns of M 1143 y to z. Then Cz and P form a TYPE III core. Overall, 1090 if necessary so that solitary reactant stoichiometries 1144 we have shown that the existence intersecting cycles 1091 are all normalized to −1, TYPE III corresponds to 1145 along a minimal contradicts the minimality of H. 1092 d = f = 0 and a, b, c, e > 0, TYPE IV to f = 0 1093 and a, b, c, d, e > 0, and TYPE IV to all coefficients 1146 We now go back to the proof of the theorem. 14

1147 Proof. Consider the minimal path P of the hyper-ear, 1204 1148 where P starts at x. By Property 7, P has no forward- 1149 branch.

1150 Case A (TYPE II): Suppose P has one or more 1205 III. CHEMICAL INTERPRETATION OF 0 1206 1151 back-branch. Consider a reaction R of P which has a AUTOCATALYTIC CORES 0 1152 back-branch, with back-product y and forming a cycle 0 0 1153 C . C is necessarily weight-symmetric by minimality of 1207 We remind that the notion of ’graph cycle’ (closed 0 0 0 1154 H. Call x the product of R outside C . Consider the 1208 successions of nodes and edges) differs from the notion 0 0 1155 path P starting at x , then going downstream of P 1209 ’reaction cycle’ (right null vectors of the stoichiomet- 1156 until it reaches a shortest subpath u − v of C, then to x, 1210 ric matrix). The latter name was historically chosen 0 1157 and finally from x to y in P. By Property 7, all cycles 1211 because the two notions overlap in the particular case 1158 along P are non-intersecting, and by construction, C 1212 of the simplest catalytic cycles, but there are counter- 1159 is non-intersecting with the cycles of P. Consequently, 1213 examples for both (we give one later). We employ 0 0 1160 P only has non-intersecting cycles, so that C and 1214 ’reaction cycle’ and ’graph cycle’ to distinguish these 0 1161 P form a TYPE II motif. Thus, the base cycle of 1215 notions. 1162 the hyper-ear cannot have species outside u − v, as 0 0 1216 By the minimality of autocatalytic cores, an au- 1163 otherwise the core made of C and P would be a strict 1217 tocatalytic motif is either a core, or it contains one or 1164 subset of H, contradicting its minimality. This shows 1218 several cores. If P is a property of cores, then for any 1165 that H is a ∗u − v∗ hyper-ear with a minimal path 1219 autocatalytic motif, either it verifies P , or it contains 1166 comprising non-intersecting cycles, the latter being 1220 an autocatalytic motif verifying P . In particular, 1167 necessarily weight-symmetric. Thus H is a TYPE II 1168 core. 1221 • by Property 1, every autocatalytic motif contains 1169 Case B (TYPES III-V): Suppose P has no 1222 an invertible autocatalytic motif; 1170 back-branch, in other words P is a simple path. 1223 • by Property 2, every autocatalytic motif contains 1171 Subcase B.1 (TYPE III): Suppose H a hyper-ear 1224 an autocatalytic motif such that every product 1172 which has only one u and one v symbol, thus a ∗u−v−∗ 1225 is also a catalyst of the reaction, or equivalently, 1173 hyper-ear. The case ∗u − v∗ with a simple path is 0 0 1226 such that every species appears on both side of 1174 covered by the definition of TYPE II. If the − symbol 1227 the total chemical equation. 1175 does not represent an empty path, then it comprises 1176 at least one species, so that H is a ∗u − vo∗ hyper-ear 1228 The five categories of cores are schematically rep- 1177 with simple path P, corresponding to TYPE III. 1229 resented in Fig. S1c. The convention of representation 1178 Subcase B.2 (TYPE IV): Suppose the hyper-ear 1230 are as follows. The edges of the graph correspond to 1179 comprises one u or v symbol in addition to the u − v 1231 reactions and the yellow nodes to species. Reaction 1180 motif. We first note that u − u − v and u − v − v proto- 1232 have two sides, the reactant side and product side, and 1181 ears are isomorphic to ∗u − v − ∗, which falls into the 1233 each side of the edge representing the reaction connects 1182 categories of TYPE II or TYPE III cores. Therefore, 1234 to one or several nodes representing the species, with 1183 in a core, a hyper-ear containing two successive u or v 1235 a stoichiometry for each connection. In the Type I 1184 symbols is necessarily of the form ∗u − u − v∗ or ∗v − 1236 core, the edge from the top node to the bottom node 1185 v − u∗, or their cyclic permutations, as any additional 1237 ends with a fork, the two ends of which connect to a 1186 symbol would allow to find a strict subgraph u − u − v 1238 single bottom node (S1c). This means that the bottom 1187 or u−v−v forming a core, contradicting the minimality 1239 species is produced with a stoichiometry of 2 by this 1188 of H. Furthermore, ∗u − u − v∗ and ∗v − v − u∗ hyper- 1240 reaction. For simplicity, we have represented the con- 1189 ears are isomorphic. Indeed, the matrices of their 1241 nection between all other edges and nodes without fork. 1190 reduced forms correspond to the matrix shown in the 1242 However, in all generality, any connection between a 1191 SUFFICIENCY section of the theorem, where exactly 1243 edge and a node could be a fork (e.g. could be of 1192 one coefficient is set to zero. Thus, these motifs as well 1244 stoichiometry >1), as soon as the rules on graph cycle 1193 as all their cyclic permutations, fall into the category 1245 symmetry are respected, as explained below. Forks also 1194 of TYPE IV. 1246 appear in Types II-V, but where they connect to two

1195 Subcase B.3 (TYPE V): Subcase B.2 imposes that 1247 distinct nodes, which are two distinct product species. 1196 any motif containing four or more u or v symbols must 1248 The orange squares indicate that the reaction repre- 1197 alternate u and v symbols. Any motifs with five or 1249 sented can be replaced by a chain of reactions with a 1198 more u or v symbols contains, up to permutation, a 1250 single reactant species and a single product species. 1199 u−v−u−v proto-ear as a strict subgraph. However, the 1251 The mathematical derivations are done without 1200 latter is isomorphic to TYPE IV cores, which implies 1252 constraint on the number of reactants or products a 1201 that ∗u − v − u − v∗ (TYPE V) is the only motif in 1253 reaction step can have. However, in a chemical system, 1202 this class which does not contradict the minimality of 1254 elementary reactions are in principle either unimolecu- 1203 H. 1255 lar or bimolecular. This restricts the range of possible 15

1256 core graphs. This restriction operates only at the level 1313 represented on Fig S1d, comprising a base cycle C and 1257 of the stoichiometry of each single reaction, not at the 1314 a minimal path (in the sense of reaction hypergraphs, 1258 level of the overall structure of cores. Indeed, even 1315 see former section) starting from a reaction fork and 1259 without invoking the restriction on bimolecularity, the 1316 joining back to a node of C. This structure should 1260 mathematical derivation results in cores such that ev- 1317 enable a systematic algorithmic search for autocatalytic 1261 ery reaction step involves at most a single species as 1318 motifs in large stoichiometries. 1262 reactant and at most two species as products, so that 1263 bimolecularity can always be respected, provided rules

1264 on the stoichiometry of each connection. 1319 IV. EXAMPLES AND APPLICATION OF THE 1320 STOICHIOMETRIC CRITERIA 1265 In a chemical autocatalytic core, an edge with a 1266 fork connecting to two distinct nodes (e.g. a reaction 1267 with two distinct product species) must have a stoi- 1321 A. Example of autocatalytic submotif 1268 chiometry 1 for each connection in order to respect 1269 bimolecularity. If a reaction has only one product 1322 We illustrate the concepts of autocatalytic sub- 1270 species, then its stoichiometry can be 1 or 2, as soon 1323 motif and the properties demonstrated above on a toy 1271 as it respects rules on cycle stoichiometry, which we 1324 model for the Formose reaction, to which we have added 1272 detail now. 1325 one auxiliary reaction C3 )−−−−* D3, so that we obtain 1273 Consider a simple graph cycle C, simple meaning 1326 a stoichiometric matrix 1274 that every reaction as a single reactant species and 1275 a single product species. Note ai (resp. bi) the stoi- 1 2 3 4 1276 chiometry of the reactant (resp. product) of reaction i. 1 C −1 −1 0 0  Q Q 1 C1 + C2 )−−−−* C3 1277 If i ai =6 i bi, we say that C is weight-asymmetric, C2 −1 0 2 0 2 1278 otherwise it is weigth-symmetric. Weigth-asymmetric   C + C )−−−−* C ν = C  1 −1 0 −1 1 3 4 (19) 3   3 1279 simple graph cycles are an example of graph cycle C −−* 2 C D3 0 0 0 1  4 )−− 2 1280 which has no reaction cycle. Weight-symmetric graph 4 C 4 0 1 −1 0 C )−−−−* D 1281 cycle taken in isolation have a reaction cycle (their 3 3 1282 determinant is zero as shown in the derivation of cores 1327 this full matrix has a left nullvector l = (1, 2, 3, 3, 4), 1283 above), thus they correspond to allocatalytic cycles 1328 i.e. we have a mass-like conservation law 1284 in the context of a larger reaction graph where the 1285 reactions of the graph cycle consume and/or produce L = n + 2n + 3n + 3n + 4n . (20) C1 C2 C3 D3 C4 1286 species outside of its own species set. The most classical

1287 example of an allocatalytic cycle is represented in Fig 1329 Thus, this matrix does not correspond to an autocat- 1288 S1f, where reactant S is provided from the environment, 1330 alytic motif. Upon removing C1 (then considered as a 1289 binds to cataylst E to form complex ES converted into 1331 feedstock molecule), we obtain an autocatalytic matrix 1290 EP, finally dissociated into E which is recycled, and 1332 ν ∗ 1291 product P. 1 2 3 4 1292 Type I cores are weight-asymmetric simple graph 1 1293 cycles, as C in S1c . Consequently, in Types II-V, any C −−* C C −1 0 2 0  2 )−− 3 1294 simple graph cycle must be weight-symmetric (these 2 2 C 1 −1 0 −1 C −−* C 1295 graph cycles correspondg reaction cycles), otherwise 3   3 )−− 4 ν ∗ = 3 (21) D3 0 0 0 1  1296 the core would contain a Type I core, contradicting its C4 )−−−−* 2 C2 C 1297 minimality. For example in S1c: in Type II, C must 4 0 1 −1 0 4 0 C3 )−−−−* D3 1298 be symmetric, but this does not apply to C because 0 1299 it is not even a simple cycle; in Types III-V, C, C 00 1333 This matrix is autonomous and invertible with 1300 and C must be symmetric. Type II cores consist of 0 1334 inverse: 1301 a large graph cycle (C in Fig S1c) comprising smaller 1302 graph cycles embedded within it (for example C in Fig 1 1 2 2 2 1303 S1c). Given the above, each of these smaller cycles −1 2 1 1 1 2 C C D C ν =   = (g 2 ,gg 3 ,gg 3 ,gg 4 ). (22) 1304 must be weight-symmetric, and obeys the definition of ∗ 3 1 1 1 1 1305 an allocatalytic cycle. The Type II category includes 4 0 0 1 0 1306 circularly closed successions of such allocatalytic cy-

1307 cles, where the product of one allocatalytic cycles is 1335 The columns of the inverse are reaction vectors 1308 the catalyst of the next allocatalytic cycle, which is 1336 associated to a given species, of which they produce 1309 a typical example of autocatalytic set. In addition 1337 one extra unit. We will refer to them as elementary 1310 however, Type II allows intermediate non catalyzed 1338 production modes. 1311 reaction steps. D T 1339 Among the production modes, g 3 = (2, 1, 1, 1) 1312 Notably, every core follows the basic structure 1340 is the sole vector which uses the auxiliary reaction, and 16

1341 D3 only ever occurs as a product 1365 submotifs.

D g 3 2C + 2C + C )−−−−−−−* 2C + 2C + C + D . (23) 2 3 4 D 2 3 4 3 −g 3

C C D C 1342 If we now consider Γ = g 2 + g 3 + g 3 + g 4 , we 1366 B. Autoinduction 1343 obtain a reaction balance such that every species has a 1344 net increase: 1367 The concept of autoinduction was put forward by 25 Γ : 7C2 +6C3 +4C4 −−→ 8C2 +7C3 +5C4 +D3. (24) 1368 D. Blackmond , to distinguish between autocataly- 1369 sis that is mediated by external catalysts (i.e. not 1345 ν ∗ is not minimal as D3 is only produced, and does 1370 part of the autocatalysts that are reproduced) called 1346 not participate as a catalyst. Indeed, Property 2 im- 1371 ’autoinduction’, and autocatalysis that functions with- 1347 plies the existence of an autocatalytic submotif without 1372 out the aid of external catalysts (i.e. all catalysts are 1348 species which does not participate as a catalyst. An 1373 autocatalysts). We thus obtain a hybrid of pure al- 1349 autocatalytic submotif is obtained by removing the 1374 locatalysis and autocatalysis, for which the simplest 1350 reaction 1375 example would be

C3 )−−−−* D3, (25) A + B + E )−−−−* 2A + E. (30)

1351 and the species D3, to obtain the autonomous subma- 1376 The IUPAC definitions impose that autoinduction quali- 1352 trix ν¯ 1377 fies as autocatalysis. It follows then from our framework 1378 that we can find autocatalytic cores in autoinduction 1 2 3 1379 networks, which is indeed confirmed in Fig. S3 for the 25 1 1380 two types of autoinduction that have been proposed .   C −−* C C2 −1 0 2 2 )−− 3 2 ν¯ = C 1 −1 0 (26) 1381 The concept of ’autoinduction’ addresses a no- 3  C3 )−−−−* C4 C 0 1 −1 3 1382 tion of self-sufficiency (also encountered in RAF sets, 4 C −−* 2 C 4 )−− 2 1383 Sec.IV E): external allocatalysts become essential to 1384 succesful autocatalysis, yet they are not reproduced. 1353 The matrix ν¯ is also invertible 1385 Depending on the context, they could be seen as part 1 1 2 2 1386 of the environment, in the same sense as essential feed- −1 C C C 1387 stock species. Examples of autoinduction occur in ν¯ = 2 1 1 2 = (g 2 ,gg 3 ,gg 4 ). (27) 3 1 1 1 1388 autocatalytic metabolic networks (with locally allocat- 1389 alytic enzymes) or e.g. the formose reaction which is −1 1390 often catalyzed by base and divalent metal ions. 1354 The columns of ν¯ correspond to reaction vectors,

1355 whose application yields one net copy of the corre- 1391 The presence of external allocatalytic cycles does (k) 1356 ν¯ g sponding molecule, e.g. ∆nk = (ν¯ · g )k = 1. This is 1392 not add new cycles to the autocatalytic core. A practi- 1357 illustrated for C2 in Fig. S2. 1393 cal consequence is that one can write catalyzed reac-

1394 1358 The individual replication cycles are tions very compactly for the core, while still maintain- 1395 ing nonambiguity, which we make use of in our analysis C g 2 1396 of metabolic cycles. C + C + C )−−−−−−−* 2C + C + C (28) 2 3 4 C 2 3 4 −g 2

C g 3 2C + C + C )−−−−−−−* 2C + 2C + C 2 3 4 C 2 3 4 −g 3

C g 4 1397 C. Metabolic cycles 2C + 2C + C )−−−−−−−* 2C + 2C + 2C 2 3 4 C 2 3 4 −g 4

C C C 1398 At least two metabolic cycles are known to be 1359 We can construct Γ = g 2 + g 3 + g 4 , which leads to 1399 autocatalytic. In our analysis of autoinduction, we 1360 the overall reaction 1400 pointed out that the core does not contain external Γ 1401 allocatalysts (here: enzymes). Written purely in terms 5C2 + 4C3 + 3C4 )−−−−* 6C2 + 5C3 + 4C4. (29) −Γ 1402 of autocatalysts, we find a type II autocatalytic core 1403 for the reverse Krebbs cycle (Fig. S4a). For the Calvin 1361 We see here that every species has a net production 1404 cycle depicted in Fig. S4b, we identify three Type I 1362 and is on both sides of the balance, thus participates 1405 cores (two structurally equivalent, differing only in the 1363 as a autocatalyst. Furthermore, ν¯ is a Type I core and 1406 reaction chosen to link the same core species) and 4 of 1364 consequently does not contain any smaller autocatalytic 1407 type II. 17

1408 D. Chemical amplification 1434 E. Autocatalysis in RAF sets

1435 A definition of Reflexively Autocatalytic Food- 43,57 24 1409 Chen et al. advanced a general strategy to 1436 generated sets can be found in Ref. : a set of reactions 1410 achieve self-amplifying behavior, demonstrated by the 1437 R is RAF if every reaction is catalyzed by at least one 1411 encapsulation of the reactive compound DCC in a cav- 1438 molecule involved in a reaction in R, and every reactant 1412 itand (indicated by an oval around the molecule in 1439 in R can be constructed from the food set f by successive 1413 EDCC, Fig. S5), which they referred to as chemical 1440 applications of reactions from R. The philosophy 1414 amplification. This phenomenon is consistent with the 1441 behind the RAF set is that ’every reaction’ in a RAF- 1415 IUPAC definition of autocatalysis, which we will now 1442 set can be accelerated by the catalysts themselves. 1416 illustrate by finding its autocatalytic core. 1443 RAF sets do not perform autoinduction, except when 1444 members of the food set f within a RAF-set also serve 1417 Let us first assess a reaction network formed by 1445 as allocatalysts. 1418 two reactions proposed by Chen et al, in absence of 1446 In the RAF-set formalism, reaction and catalysis 1419 the cavitand complexes. In that case, applying our for- 1447 are distinct mathematical objects. Graphically, RAF- 1420 malism readily reveals that the network cannot exhibit 1448 sets are typically represented as bipartite graphs (S6a), 1421 autocatalysis: there are no autocatalytic submatrices. 1449 with nodes (white squares) corresponding to reactions, 1 2 1450 which connect to nodes (colored circles) which serve 1451 as reactants (entering bold arrow) and products (leav- DCC  −1 0 1452 ing bold arrow) via directed edges. A dashed arrow X −1 0   1 1453 connecting to a reaction indicates that a species is a I1  1 −1 DCC + X )−−−−* I1, ν =   (31) 1454 catalyst for a reaction. Within the RAF framework, Y  0 −1 2   I1 + Y )−−−−* DCU + Z. 1455 the following terms are employed as distinct: i) auto- DCU  0 1  1456 catalytic reaction ("is a single chemical reaction for Z 0 1 1457 which one of the products also catalyzes the reaction"), 1458 ii) autocatalytic cycle ("is a sequence of reactions that, 1422 Now, let us add the species EDCC and two exchange 1459 once completed, results in two (or more) copies of the 1423 reactions with DCU and Z that liberate DCC, i.e. 1460 molecule that was started with"such as the toy formose 1 1461 reaction), iii) autocatalytic set (or RAF set). DCC + X )−−−−* I1, (32) 1462 It is important to note that the RAF-set formalism 2 I1 + Y )−−−−* DCU + Z, (33) 1463 and the distinctions above depend on the chosen level of 3 1464 coarse-graining of the description. In that description EDCC + DCU −−* DCC + EDCU, (34) )−− 1465 (see Fig. S6a) allocatalysts in the same allocatalytic 4 EDCC + Z )−−−−* DCC + EZ, (35) 1466 cycle are represented by a single species. The combi- 1467 nation of reactions that form the allocatalytic cycle

1424 for which we have the matrix 1468 is represented by a single dashed line, connecting to 1469 an uncatalyzed reaction. Here, there exists a level of 1 2 3 4 1470 description where a RAF set appears as autocatalytic 1471 reaction involving catalytic cycles. By comparing Figs. DCC −1 0 1 1  1472 S6a, we see that the level of description is critical in X −1 0 0 0    1473 assessing whether a network is a RAF-set or not. From I1  1 −1 0 0  1474 the definition of a RAF-set, the detailed version of the Y  0 −1 0 0    1475 network in Fig.S6a would not be a RAF, but the less ν = DCU  0 1 −1 0  (36)   1476 detailed description next to it would be. Z  0 1 0 −1   EDCC 0 0 −1 −1 1477 Another example is the toy formose reaction,   EDCU 0 0 1 0  1478 where two different choices of coarse-graining yield EZ 0 0 0 1 1479 two different description in the RAF framework. Even 1480 when neglecting the role of catalytic base and metal 1425 This network admits an autocatalytic submatrix, and 1481 ions, the toy formose reaction without coarse-graining 32 1426 we obtain an autocatalytic core of type III consisting 1482 is not a RAF-set , since its individual steps are not 1427 of the species DCC, I1, DCU and Z, as can be seen in 1483 catalyzed: 1428 Fig. S5b. The new reagent EDCC serves as a feedstock 1429 compound, that allows to dispense new DCC, that can C1 + C2 )−−−−* C3, C1 + C3 )−−−−* C4, C4 )−−−−* 2C2. 1430 now serve as an autocatalyst. The generality of the (37) 1431 mechanism follows from an exchange that could have 1484 Framed solely in terms of C1 and one autocatalyst (e.g. 1432 been performed with different reactants than DCU and 1485 C2, C4) one could propose a description in which we are 43,57 1433 Z to yield an equivalent network, as shown in Refs . 1486 oblivious of these steps, and write it as an autocatalytic 18

1487 reaction in the sense of RAF sets, such as 1530 This network corresponds exactly to the network dis- 1531 cussed in Fig. S6a, where the products of two allocat- 2C1 + C2 )−−−−* 2C2, 4C1 + C4 )−−−−* 2C4. (38) 1532 alytic cycles serve as a catalyst for each other, a Type 1533 II core in our stoichiometric formalism. 1488 This would work as long as we consider them satisfac- 1489 tory levels of description, depending on the context 1490 (e.g. further reactivity of intermediates, separation of 1534 More generally, labelling amphiphiles as Ak (k ∈ 1491 timescales): either of these hypothetical cases satisfy 1535 {1, 2, .., s}, the entry βij encodes the contribution for 1492 the RAF criteria. 1536 the allocatalytic cycle 1493 An interesting contrast occurs between RAF sets ∗ 1494 and our formalism: RAF-sets require a level of coarse- II I c I I A i + A j )−−−−−* A i + A j (42) 1495 graining without intermediates to define catalysis in −c∗ 1496 terms of single catalysts, which ensures a compact and ∗ 1497 unique description. Our formalism requires each cat- 1537 where c denotes a reaction vector for the catalytic 1498 alytic cycle to have at least one intermediate to satisfy 1538 cycle, for any description that verifies nonambiguity. 1499 nonambiguity. Although less compact, the description 1539 When i = j, this is a type I autocatalytic cycle (Fig. 1500 is flexible: adding further steps is always possible and 1540 S6c). When i 6= j (cross-incorporation), type II auto- 1501 does not alter the conclusions, which must remain in 1541 catalytic cycles are obtained, which are built up from n 1502 accord with chemical definitions. 1542 sequential allocatalytic incorporation steps and which 1543 end in the incorporation of the original amphiphile. 1503 In describing catalysis in terms of uncatalyzed 1544 Fig. S6 shows examples for n = 2 and n = 3. The 1504 reactions, it becomes possible to formalize the under- 1545 importance of the allocatalysis step (42) is graded by 1505 lying structure of catalysis in different models. We 1546 the entry βij. For a given n-step autocatalytic motif 1506 will now illustrate how this formally establishes that 1547 to exist, we require 1507 all autocatalysis in the GARD model qualifies as a 1508 RAF-set. n Y βsk+1sk > 0, s1 6= s2 6= .. 6= sn. (43) k=1

1509 F. GARD model 1548 In practice, all βij > 0, so all motifs exist in principle.

1510 GARD stands for graded autocatalysis replica- 22 1511 tion domain , and is a model for the autocatalytic 1549 The starting point of GARD is Eq. (39), i.e. a 1512 assembly of amphiphile assemblies (e.g. micelles). It 1550 coarse-grained description in which allocatalysis can 1513 describes micelles with a composition n = {n1, ..., ns}, 1551 be described as a single step and a single allocatalyst. 1514 that follow an evolution equation 1552 From Fig. S6, we see that we can obtain networks in 1553 GARD that would be RAF-sets in the RAF-framework  s  1554 by coarse-graining an incorporation cycle to convert it dni + −  1 X = k ρiN − k ni 1 + βijnj . (39) 1555 to catalysis in the RAF sense. The illustrated proce- dt i i  N  j=1 1556 dure extends to all autocatalysis in GARD, which is 1557 fully characterized by the continuation of the structures 1515 The surfaces are in contact with a reservoir, that con- 1558 in Fig. S6 to their n-step type II analogues. Eq. (43) 1516 tains species Zi at concentration ρi and that can enter 1559 guarantees that each reaction in GARD is catalyzed. 1517 the surface, which has an area proportional to N. The 1560 It follows that all autocatalysis in the GARD model + 1518 incorporation happens with a base rate of ki , but can 1561 can formally be treated as a RAF-set. 1519 be facilitated by other amphiphiles, for which the cat- 1520 alytic rate enhancement is characterized by βij.A 1521 special ingredient in GARD is the division process, 1562 Interestingly, the RAF-set formalism treats catal- 1522 which splits a mature surface in two new ones, after 1563 ysis as pertaining to chemistry in single phases, with 1523 achieving a maximal size. 1564 the environment supplying food locally through rapid 1524 We examine here stoichiometric mechanisms lead- 1565 exchange. In GARD, we instead have phase-transfer 1525 ing to Eq. (39). Let us consider amphiphiles A and 1566 catalysis between a bulk medium (II) and an interface 1526 B, which can be in the micelle or reservoir, labelled 1567 (I). A species in the bulk then serves as the food. Once 1527 I and II (see Fig. S6). To catalyze incorporation of 1568 the exact same species enters the interface, however, it 1528 the other, a complex is formed with a reservoir species, 1569 may cease to be abundant or have a fixed concentration 1529 and subsequent dissociation takes place in the micelle 1570 due to rapid exchange. It may thus no longer have 1571 the properties ascribed to the food set in the bulk and AI + BII )−−−−* [AB]I )−−−−* AI + BI, (40) 1572 should ipso facto be treated as a different species as I II I I I B + A )−−−−* [BA] )−−−−* A + B . (41) 1573 described in the final section of the main text. 19

1574 V. THE EXTINCTION PROBLEM AND FIXATION 1617 of the order 1. It follows survival of autocatalytic 1618 cycles requiring such reactions in the forward sens is 1619 hampered in large systems. 1575 In this section, we show that the extinction prob- 1620 Reactions that are first order in terms of autocat- 1576 lem (finding the extinction probability at long times, 1621 alysts are not hampered 1577 Pex) can be solved by a mapping to a branching process. 1578 We will first derive how ( context) in a system initially X )−−−−* ”P roduct(s)” (46) 1579 at steady-state perturbed with dilute autocatalysts, k 1580 key statistical properties emerge: autocatalytic species Xk + Yj )−−−−* ”P roduct(s)” (47) 1581 can be treated as independent, and their environment 1622 where Y is a feedstock compound that was already 1582 as fixed. Extinction becomes exponentially less likely j 1623 (abundantly) present in the system at a fixed molar 1583 as the population size continues to grow, which means 1624 fraction xY . The probability for one X molecule to 1584 Pex can be determined in the dilute limit. j k 1625 encounter a Y does not change with N, as xY remains 1585 We then proceed by applying the framework to a j j 1626 1586 variety of networks. fixed (and macroscopic). 1627 Note furthermore that, when autocatalysts are 1628 rare, reactions producing more autocatalysts are ’irre- 1629 versible’ 1587 A. Context

Xk + Yj −−→ Xl + Xm (48) 1588 We consider a reaction network, in a macroscopic 1589 system (Letting N denote the amount of chemical 1630 in the sense that the reverse reaction is exceedingly 23 1590 species, let us say N > 10 , or similarly, let the system 1631 more rare than the forward reaction. 1591 volume V be large) in a steady-state. For simplicity, let 1592 us first consider a CSTR (single phase, ideally mixed), 1593 with a residence time τ, corresonding to a uniform 1632 C. Constant composition, constant transition rates 1594 degradation rate kd.

1595 Now, we perturb the steady-state with a handful 1633 The effect of rare autocatalysts on the steady-state 1596 (O(1)) of new (not yet present in the system) autocata- 1634 composition (maintained by influx and degradation in 1597 lysts {Xk} = {X1, X2, ..., Xs} that are part of the same 1635 a CSTR) is initially small: every reaction introduces 1598 autocatalytic core. 1636 changes in molar fraction of the order 1/N (or in con-

1599 We consider that the population can grow due to 1637 centration terms, 1/V ∝ 1/N). For large N, the al- 1600 catalysis by autocatalysts, and the population decays 1638 terations of the composition will be vanishingly small 1601 by degradation reactions and effective degradation. For 1639 while the autocatalysts are rare.

1602 example, outflow out of a CSTR is considered as a first- 1640 Consequently, we can approximate the reactor 1603 order degradation process: 1641 composition in which an autocatalyst is placed, as 1642 the steady-state composition before perturbation. We X → ∅ (44) k 1643 then assume that the molar fractions of species Yk 1644 consumed by autocatalysts are sufficiently abundant 1604 The problem we wish to solve is the extinc- 1645 i.e. x  1/N, which was also required for (47). Yk 1605 tion problem: For an initial population of autocat- 1646 For sufficient N, deviations from this approximation 1606 alysts {N } = {NX , .., NX } what is the probability Xk 1 s 1647 become vanishingly small. 1607 Pex({N }), that, after a long time, the autocatalyst Xk 1648 A given Xk will therefore have a fixed transition + 1608 population goes extinct (∀k NX = 0)? 1649 rate w = kx to perform (47). Similarly, for (46) k k Yj 1650 and CSTR degradation, a fixed transition rate w = k 1651 is found, depending only on a rate constant.

1609 B. Large system limit

1652 D. Independence 1610 Let us first note that we (deliberately) consider 1611 the initial stochastic kinetics in a large system, with 1612 a small number of autocatalysts, such that reactions 1653 In the rare-autocatalyst regime, all reaction steps 1613 among autocatalysts of the kind 1654 we consider for autocatalysts are first-order, and they 1655 occur at fixed rates. It follows that autocatalysts do Xk + Xj −−→ ”P roduct(s)” (45) 1656 not influence each other, and they can each be treated 1657 independently. Thus, we can treat each autocatalyst 1614 are exceedingly rare and slow (the probability that a 1658 type separately: 1615 given Xk molecule encounters another Xj in a given 1616 time-frame scales with NX /N), where NX is initially Pex({N }) = Pex(NX )Pex(NX )..Pex(NX ). (49) j j Xk 1 2 s 20

1659 Also each individiual autocatalyst can be treated as 1692 i.e. 1660 such: + − + + Π1 Π2 Π1 Π2 NX B1 −−→ B2 −−→ B1... −−→ B2 −−→ 2B1 (55) P (N ) = P (X ) k (50) ex Xk ex k + − + 1693 Where Π1 , Π2 and Π2 are success probabilities for 1661 Where Pex(X ) denotes the extinction probability of a k 1694 the single reaction steps. These transitions compete 1662 population initially composed of a single X species. k 1695 with irreversible degradation processes 1663 We thus need a method for finding Pex(Xk). This ∅ ∅ 1664 will be obtained by mapping the problem to a branching Π1 Π2 B1 −−→∅, B2 −−→∅. (56) 1665 process. ∅ ∅ 1696 Where Π1, Π2 are success probabilities for the degra- 1697 dation reaction. Due to total probability conservation, 1698 we have 1666 E. A branching process ∅ + Π1 + Π1 = 1, (57) ∅ − + 1667 An attractive method we propose for finding Π2 + Π2 + Π2 = 1 (58) 1668 Pex(Xk) (from here on simplified to Pex), is by con- 1669 sidering the distribution of ’descendants’ Xk a species 1699 Ultimately, a B1 species will either be replaced by 2 1670 Xk will generate. From this, we construct a Branch- 1700 new ones (2B1), or none (∅): 1671 ing Process, represented chemically as a single parent 1−pc pc 1672 molecule X yielding k descendants: s ∅ ←−−− B1 −−→ 2B1, (59)

Pk 1701 which is a chemical representation of the simplest type Xs −−→ kXs, (51) 1702 of branching process: a birth-death process. In Eq. 1673 with Pk a distribution of the number of descendants 1703 (52), the only nonzero terms will come from 0 descen- 1674 which depends on the network topology. 1704 dants (P0) and 2 descendants (P2):

1675 Knowing Pk suffices to find Pex, since the probabil- 2 2 Pex = P0 + P2Pex = 1 − pc + pcPex. (60) 1676 ity to go extinct is the probability that all descendants 1677 independently (Eq. (50)) go extinct: 1705 Solving the quadratic equation yields two solutions ∞ 2 X k ± 1 ± (1 − 2pc) Pex = P0 + P1Pex + P2P + ... = PkP , (52) P = . (61) ex ex ex 2p k=0 c + 1678 We will now highlight some possible choices for Branch- 1706 For our problem, the ’physical’ solution is Pex while + 1679 ing Processes and their associated Pk. 1707 pc ≥ 1/2. Beyond that point, Pex > 1, while we require − 1708 0 ≤ Pex ≤ 1, so Pex = 1 becomes the only physical 1709 solution.  1 1 − 1, pc ≥ , 1680 F. A Birth-Death Process for the Type I cycle pc 2 Pex = 1 (62) 1, pc < 2 ,

1681 Consider a simple type-I cycle such as in Fig S7a. 1710 The average number of descendants is 2pc, which means 1682 Here, simple refers to there being a direct path of 1711 that below pc = 1/2 (the decay threshold) B1 is on aver- 1683 (effective) unimolecular steps between the starting com- 1712 age replaced by less then one B1 species and extinction 1684 pound (B1) and final compound (B2), followed by a 1713 is guaranteed. 1685 single fragmentation step producing two B1 from one 1686 B2

1714 G. A Branching Process for the same type I cycle B2 −−→ 2B1. (53)

1687 Starting from the marked node (B1), let P2 = pc be 1715 To illustrate that there is a variety of choices for 1688 the probability of successfully forming 2B1, i.e. 1716 the stochastic process under study, we will here consider 1717 an alternative choice for the simple type-I cycle, which pc B1 −−→ 2B1, (54) 1718 is more generally applicable. Noting that a single 1719 cycle is successful with probability pc, we now consider 1689 where pc contains the contribution of all possible tra- 1720 the number of successful cycles a single B1 provides, 1690 jectories (here: going back and forth between B1 and 1721 knowing that at some point degradation intervenes with 1691 B2) that precede the irreversible fragmentation step, 1722 probability 1 − pc. A succession of k cycles preceding 21

1723 a failure will thus lead to k successors 1751 In this limit, there is only one trajectory that con- 1752 tributes to pc, and each step involves a competition Pk B1 −−→ kB1. (63) 1753 between the forward reaction and degradation only.

1754 In studies using simple type-I cycles, the fraction 1724 The probability to spawn k descendants, Pk, is the 1725 probability of k successful Bernoulli trials followed by w+ ζ = k , (70) 1726 failure, and follows a geometric distribution k + ∅ wk + wk k Pk = pc (1 − pc). (64) 1755 has been referred to as the specificity of reaction 23,44,45,58 1756 step k, which for irreversible reactions coin- 1727 Injecting this distribution in Eq. (52), we find a geo- + 1757 cides with the transition probability Π 1728 metric series k ∞ w+ X k 1 − pc + k lim Π = lim = ζk (71) Pex = (1 − pc)(pcPex) = . (65) − k − − + ∅ 1 − pcPex w →0 w →0 w + w + w k=0 k−1 k−1 k−1 k 1

1729 Multiplying both sides by 1 − pcPex then yields the 1758 For simple type-I networks with n reaction steps 1730 quadratic equation (60) previously found for the Birth- 1759 (n distinct edges), the irreversible limit (Eq. (69)) 44,58 1731 Death process. This is necessary, since we are calcu- 1760 generalizes to 1732 lating the same quantity Pex for the same network. It n n + 1733 highlights that we may construct a variety of branching Y + Y wk pc = Π = , (72) 1734 processes to find Pex. k + ∅ k=1 k=1 wk + wk

1761 which we will show more formally in the next section. 1735 H. A microscopic view on pc

1736 We can construct pc from microscopic details. In 1762 J. The simple type I cycle with n steps 1737 terms of transition probabilities, we find pc by summing 1738 over all trajectories in Eq.(55): 1763 To expand our discussion on simple type I cycles, ∞ + + 1764 we will now derive a general expression for pc, when X + − + k + Π1 Π2 pc = Π1 (Π2 Π1 ) Π2 = − + (66) 1765 there are n steps, of which the first n − 1 are treated 1 − Π2 Π1 k=0 1766 as reversible. The problem is illustrated in Fig. S7c.

1767 Let us denote P the probability to reach 1739 A more detailed description is possible when our de- Xk→Xj 1768 X , starting from X . For P , there is only one 1740 scription is Markovian (i.e. reactions are sufficiently j k X1→X2 + 1769 trajectory: 1741 elementary). Let wk denote a forward transition rate, − 1742 to go from X to X . Let w denote a backward tran- k k+1 k Π+ ∅ 1 1743 sition rate from Xk to Xk−1 and wk the degradation X1 −−→ X2, (73) 1744 rate for X . We may then write k + 1770 and hence P = Π . For P , we need to X1→X2 1 X2→X3 + − w w 1771 consider that we can go back and forth between X Π+ = 1 , Π− = 1 , (67) 2 1 + ∅ 2 − ∅ + 1772 and X1: w1 + w1 w1 + w2 + w2 + w Π+ Π− Π+ = 2 . (68) X −−→1 X −−→2 X , (74) 2 w− + w∅ + w+ 1 2 1 1 2 2 + Π2 X2 −−→ X3. (75)

1745 I. The irreversible reaction limit 1773 We can absorb the contribution of hopping back-and- 1774 forth in the factor Γ2:

1746 Simple type-I cycles have been studied on sev- P = Π+Γ (76) X2→X3 2 2 1747 eral occasions, in the limit where all reactions proceed ∞ 21,23,45 − 1 1748 irreversibly (∀k wk → 0). In this limit back- X − + k − Γ2 = (Π2 Π1 ) = − + (77) 1749 ward reactions are ignored (Π = 0), which for our 1 − Π Π 2 k=0 2 1 1750 example leads to 1775 Now, for PX →X , we need to consider that we can go w+ w+ 3 4 p = Π+Π+ = 1 2 . (69) 1776 back and forth between X3 and X2, and that at X2 we c 1 2 + ∅ + ∅ w1 + w1 w2 + w2 1777 can go back and forth between X2 and X1. We absorb 22

1778 this in the factor Γ3: 1805 becomes

P = Π+Γ (78) PC = (1 − p )pk . (87) X3→X4 3 3 k D D P∞ − + k 1 Γ3 = k=0(Π3 Γ2Π2 ) = − + (79) 1−Π3 Γ2Π2 1806 Combined, a single C1 has been then replaced according 1807 to 1779 We can repeat this argument any number of times, to 1780 find that, for k ≥ 1 C1 −−→ sD1 −−→ (n1 + n2 + ... + ns)C1. (88) + Ps PX →X = Π Γk (80) 1808 k k+1 k Let us denote k = l=1 nl as the number of descen- Γ = P∞ (Π− Γ Π+)s = 1 . (81) 1809 dants. The distribution of the number of descendants k+1 s=0 k+1 k k 1−Π− Γ Π+ k+1 k k 1810 Pk then becomes

1781 where Γ imposes that Γ = 1. ∞ ∞ s 2 1 X X Y P = PP PDCUδk , (89) k s nr n1+...+ns 1782 The total probability pc to reach Xn from X1 and s=0 n1,..,ns r=0 1783 then perform the final irreversible fragmentation reac-

1784 tion rn, is then 1811 which simplifies to n−1 ! n α Y Y k p = P Π+Γ = Π+Γ , (82) P0 = 1 − β , Pk = βα , k ≥ 1 (90) c Xk→Xk+1 n n k k 1 − α k=1 k=1 1812 where 1785 When backward transitions become negligible − pD pC(1 − pD)(1 − pC) 1786 (∀k Πk → 0), we have ∀k ≥ 1 Γk = 1, and pc then α = , β = . 1787 acquires its well-known limit expression described by 1 − pC(1 − pD) 1 − pC(1 − pD) 1788 Eq. (72). (91) 1813 From Eq. (52), we then find Pex. By rewriting the 1814 geometric series due to (91), we have

α αPex Pex = 1 − β − β , (92) 1789 K. A Branching Process for a Type II cycle 1 − α 1 − αPex

1815 which admits the solutions Pex = 1 and 1790 As an example of a simple type-II cycle, we con- 1791 sider the network in Fig. S7c, starting with the species β 1 1 − pC Pex = + = . (93) 1792 C1. Our approach will be reminiscent of our branching α − 1 α pD 1793 process in Sec. V G, but with a repeated branching 1794 step.

1795 With a probability pC, C1 will successfully perform 1796 the allocatalytic cycle r1 + r2 + r3 (with some possible 1797 back-and-forths), yielding overall

pC C1 −−→ C1 + D1. (83) 1816 L. Microscopic expressions for pC and pD

D 1798 The probability Pk that k successful cycles occur before 1799 the first failure (i.e. degradation C → ∅) is k 1817 Let us first consider how pC can be constructed D k 1818 from smaller reaction steps. To do so, we observe that P = (1 − pC)p . (84) k C 1819 the first step must be

1800 Corresponding effectively to the overal reaction + Π1 C1 −−→ C2, (94) D Pk C1 −−→ kD1. (85) + 1820 with Π1 the probability of success, competing with 1821 degradation 1801 Now, let pD be the probability that D1 succesfully 1802 completes a cycle r4 + r2 + r3 (including back-and- ∅ Π1 1803 forths): C1 −−→∅, (95)

pD ∅ + D1 −−→ C1 + D1. (86) 1822 for which Π1 = 1 − Π1 .

1804 The probability of k successful cycles before failure 1823 Arrived at C2, going back-and forth reversibly 23

1824 becomes possible for neighboring nodes C1, C3 and D1: 1849 replacing pC with ps and pD with p1, thus finding

Π+ Π− 1 − ps 1 2 P = . (106) C1 −−→ C2, C2 −−→ C1 (96) ex p1 Π+ Π− C −−→2 C , C −−→3 C (97) 2 3 3 2 1850 We now turn to the problem of finding expressions for − + Θ1 Θ1 1851 ps and p1. C2 −−→ D1, D1 −−→ C1 (98) 1852 By structural analogy to simple type I cycles (apart + − − + 1853 from the fragmentation step), we may again write 1825 Where Πk , Πk , Θk , Θk all denote success probabilities. 1826 Starting at C1, a succesful trajectory necessarily starts + + + + PX →X = Π Γk (107) 1827 with reaction r1 (Π1 ) and ends with r2 + r3 (Π2 Π3 ), k k+1 k ∞ 1828 all in the forward sense. In between, we can go back- P − + s 1 Γk+1 = s=0(Πk ΓkΠk ) = − + . (108) − + − + + − 1−Πk+1ΓkΠk 1829 and-forth (Π2 Π1 + Θ2 Θ1 + Π2 Π3 ) any number of 1830 times. Summing over all possible trajectories, pC then 1854 with Γ1 = 1. 1831 becomes Qn−1  + 1855 Since p = P Π Γ we find ∞ s k=s Xk→Xk+1 n n X + − + − + + − k + + pC = Π1 (Π2 Π1 + Θ2 Θ1 + Π2 Π3 ) Π2 Π3 (99) n ! k=0 Y + ps = Πk Γk (109) 1832 which sums to k=s

+ + + Π1 Π2 Π3 pC = (100) 1 − (Π−Π+ + Θ−Θ+ + Π+Π−) 2 1 2 1 2 3 1856 N. Symmetric motifs

1833 Starting at D1, a succesful trajectory necessarily starts + 1834 with r4 (Θ1 ) in the forward sense, to form C2. From 1857 When the network motif is symmetric in struc- 1835 there on, a successful trajectory follows the previous 1858 ture, and if the transitions preserve this symmetry, + + 1836 calculation, i.e. pD = Θ1 /Π1 pC: 1859 this can be exploited to simplify calculations and gain 1860 insight in topological aspects of autocatalysis and ro- + + + Θ1 Π2 Π3 1861 bustness. Experimentally, this symmetry rarely applies pD = − + − + + − (101) 1 − (Π2 Π1 + Θ2 Θ1 + Π2 Π3 ) 1862 for the transitions, but it can be made applicable for 1863 the purpose of our analysis, e.g. by setting transi- 1837 In the limit where all reactions proceed irreversibly 1864 tion probabilities to values that reflect the structural 1838 forward (Sec. V I), pC and pD only have a contributing 1865 symmetry. 1839 from a single straight trajectory 1866 Consider a series of m linked allocatalytic cycles + + + + + + 1867 (Fig. S8b), all consisting of n nodes and n edges which pC = Π1 Π2 Π3 , pD = Θ1 Π2 Π3 . (102) 1868 are structurally equivalent:

+ Π+ Πkn+1 (k+1)n Xkn+1 −−−−→ .. −−−−−→ Xkn+1 + X(k+1)n+1, (110)

1840 M. Type III cycles with one fragmentation step 1869 which loops back at the mth cycle:

Π+ + m−n+1 Πmn 1841 Let us now consider a type III network composed Xm−n+1 −−−−−→ .. −−−→ X1 + Xm−n+1, (111) 1842 of n species {X1, .., Xn} and reaction steps {r1, .., rn}, + 1843 where the final fragmentation step r produces n 1870 As before, Πk denotes a forward transition probability, 1871 and we also introduce reverse reactions and degradation Xn −−→ X1 + Xs, 1 < s < n, (103) 1872 in the usual sense:

− 1844 as shown in Fig. S7e Πk Xk −−→ Xk−1, (112) 1845 We may then introduce the success rates for the ∅ Πk 1846 allocatalytic cycles for X1 and Xs Xk −−→∅. (113)

p1 X1 −−→ X1 + Xs, (104) 1873 Now, let us choose transition probabilities such that ps 1874 they are equivalent among the equistructural allocat- Xs −−→ X1 + Xs. (105) + + 1875 alytic cycles, i.e. periodic in n: Πk = Πk+n and idem − − 1847 Which was exactly our starting point in Sec. V K. 1876 for reverse steps (Πk = Πk+n) and by extension, degra- ∅ ∅ + − ∅ 1848 Starting at Xs, we may thus directly use Pex, upon 1877 dation (Πk = Πk+n), since Πk + Πk + Πk = 1. 24

59 1878 Then, finding Pex(Xk) is no different from find- 1915 find 1879 ing Pex(X ), which is seen readily in Fig. S8a by k+n 1 − ζ6 1880 rotating the networks and interchanging the labels. Ap- Pex = 6 (118) 1881 plying this symmetry to Fig.S8a for the six-membered ζ 1882 cycle (n = 2, m = 3), we can thus write

Pex(X1) = Pex(X3) = Pex(X5). (114)

Pex(X2) = Pex(X4) = Pex(X6). 1916 2. N2: a 6-membered asymmetric type III cycle

1883 We characterize the success of each equivalent allocat- 1884 alytic cycle (n steps) by the same probability pc. We 1917 In Sec. V M, the general solution for n-membered 1885 can then express the extinction probability in terms of 1918 type III cycles with one fragmentation reaction was de- 1886 the reaction products of one allocatalytic cycle: 1919 rived. For a 6-membered cycle with the fragmentation 1920 reaction Pex(X1) = 1 − pc + pcPex(X1)Pex(X3) (115) 2 = 1 − pc + pcPex(X1) , (116) X6 −−→ X1 + X4, (119)

1887 where we have used the symmetry (115), which yields 1921 which corresponds to network N2 in the main text. 1888 the exact second order equation we obtained for a 1889 simple type-I cycle of n steps. We can thus resolve the 1922 Having X4 as the starting species, we can express 1890 general problem using our previously derived solutions. 1923 Pex(X4) in terms of Eq. (109)

1 − p4 Pex = . (120) p1 1891 VI. EXPRESSIONS FOR FIG 3, A SURVEY OF Pex 1892 FOR VARIOUS STRUCTURES 1924 In the irreversible reaction limit with fixed specificity + 1925 (∀k ≥ 1 Πk = ζ, Γk = 1), Pex becomes

3 1893 In Fig. 3 in the main text, the behavior of Pex is 1 − ζ P = (121) 1894 compared for a number of networks (N1 to N5), in the ex ζ6 1895 limit where all reaction steps proceed irreversibly (Sec. 1896 V I), and where all reaction steps do so with a common 1897 success probability ζ (also known in the literature as 1898 specificity). Of course, this is an abstraction that is 1899 hard to realize experimentally, and its purpose is the 1926 A. N3: 6-membered type II network with RAF 1900 following: by controlling for kinetics, we can systemat- 1927 representation 1901 ically investigate and compare how survival is affected 1902 by network topology. In this section, we will derive the 1903 functional dependence of Pex on ζ for the structures 1928 The network N3 consists of two nonoverlapping 1904 discussed in the main text. 1929 allocatalytic cycles, which produce each other’s allocat- 1930 alyst:

+ + + Π1 Π2 Π A )−−−−* A )−−−−* A −−→3 A + B , (122) 1905 1. N1: a 6-membered type I cycle 1 2 3 1 1 Θ+ Θ+ + 1 2 Θ3 B1 )−−−−* B2 )−−−−* B3 −−→ A1 + B1. (123) 1906 In Sec. V J, we derived the general solution for + + 1907 n-membered type I cycles in terms of the probability 1931 Choosing our transition probabilities Πk = Θk , we 1932 can exploit the network symmetry as outlined in Sec. 1908 pc to reach Xn and perform rn. Starting from X1, we 1909 can now recover the solution for n = 6 1933 V N and Pex(A1) = Pex(B1). Let pc be the probability 1934 that either A1 or B1 succesfully finishes an allocatalytic 1910 For n = 6, pc becomes 1935 cycle. We may then write 5 ! 6 Y Y + + Pex(A1) = 1 − pc + pcPex(A1)Pex(B1) (124) pc = PX →X Π6 Γ6 = Πk Γk (117) k k+1 2 k=1 k=1 = 1 − pc + pcPex(A1) . (125)

1911 Moving to the irreversible reaction limit, and control- 1936 The success rate pc can be expressed in the familiar + 1912 ling the reaction specifity (∀k ≥ 1 Πk = ζ, Γk = 1), 1937 way 6 1913 we obtain pc = ζ . Upon injection in the solution + + + 1914 Pex = (1 − pc)/pc (for pc ≥ 1/2) (Eq. (9)), we then pc = Π1 Π2 Π3 Γ1Γ2Γ3. (126) 25

1938 In the irreversible limit (∀k ≥ 1, Γk = 1) with fixed 1960 An alternative way of seeing this is that, by symmetry, 3 1939 specificity ζ, pc = ζ , and thus we find for pc ≥ 1/2 1961 Pex is the same as that for a 3-membered type III cycle 1962 with one fragmentation step (forming X2), for which 3 1 − ζ 1963 we can directly use the solution derived in Sec. V M. P = . (127) ex ζ3

1940 B. N4: 6-membered symmetric type III network 1964 C. A trio of symmetric analogues

1941 The network N4 consists of two nonoverlapping 1965 As derived in Sec. V N, the interlinked allocat- 1942 allocatalytic cycles, which produce a precursor (A0, B0) 1966 alytic cycles of size n behave, due to symmetry, as 1943 for each other’s allocatalyst: 1967 an n-membered simple type-I cycle. Reproducing the 1968 solution for the 2-membered cycles (see Sec. V F) in + + Π Π Π+ + − 0 1 2 1969 the irreversible limit with ∀k ≥ 1 Π = ζ, Π = 0, we A0 )−−−−* A1 )−−−−* A2 −−→ A1 + B0, (128) k k 1970 + + thus find Θ Θ Θ+ −−0* −−1* 2 B0 )−− B1 )−− B2 −−→ A0 + B1. (129) 1 − ζ2 P = (138) ex ζ2 1944 Choosing our transition probabilities to follow the sym- + + 1945 metry of the network, we can write Πk = Θk , and 1946 Pex(Ak) = Pex(Bk). Finally, we note that B0 can ei- ∅ + 1947 ther i) degrade with probability Θ0 = 1 − Θ0 , or ii) + + 1971 D. N : a type V core 1948 form B1 with probability Θ0 = Π0 , such that 5

+ + + + Pex(B0) = 1−Π0 +Π0 Pex(B1) = 1−Π0 +Π0 Pex(A1). (130) 1972 In N5 we have the reactions 1949 Denoting pc the probability that A1 performs a succes- A1 )−−−−* A2 −−→ B1 + C1, (139) 1950 ful allocatalytic cycle (yielding A + B ), we can write 1 0 −−* 1951 the extinction probability as B1 )−− B2 −−→ A1 + C1, (140) C1 )−−−−* C2 −−→ A1 + B1. (141) Pex(A1) = 1 − pc + pcPex(A1)Pex(B0), (131) 1973 If our transitions follow the symmetry of the network, 1952 which upon injecting (130) yields the following expres- 1974 we have Pex(Ak) = Pex(Bk) = Pex(Ck). Denoting pc 1953 sion for Pex(A1) 1975 the success probability of the allocatalytic cycle, we

+ + 2 1976 can write Pex in terms of Eq. (139) Pex = 1 − pc + pc(1 − Π0 )Pex + pcΠ0 Pex. (132) Pex(A1) = 1 − pc + pcPex(B1)Pex(C1) (142) 1954 Solving the quadratic equation (132), we find 2 = 1 − pc + pcPex(A1) , (143) q + + 2 1 − pc(1 − Π0 ) ± (pc(1 + Π0 ) − 1) 1977 which is the solution found for the 2-membered cycle. P (A ) = ex 1 + 1978 Noting that 2pcΠ0 (133) + + pc = Π1 Π2 Γ1Γ2 (144) 1955 which yields Pex = 1 and

1 − p 1979 the irreversible limit with fixed specificity (∀k ≥ c + Pex = . (134) + 1980 1 Πk = ζ, Γk = 1) yields, as in Sec. V F, pcΠ0 1 − ζ2 1956 We can write pc in terms of back-and-forths starting P = (145) ex ζ2 1957 at A1, terminating with an irreversible fragmentation

P∞ − + + − k + + pc = k=0(Π1 Π0 + Π1 Π2 ) Π1 Π2 (135) + + Π1 Π2 = − + + − , (136) 1−Π1 Π0 +Π1 Π2 1981 VII. EXPRESSIONS FOR FIG 4D, A PHASE 1982 DIAGRAM FOR MULTICOMPARTMENT 1958 so that in the irreversible limit with fixed specificity 1983 AUTOCATALYSIS 1959 (∀k ≥ 1 Πk+ = ζ, Πk− = 0), we obtain

2 1 − ζ 1984 Fig. S9 represents a case of a type III core with Pex = . (137) ζ3 1985 one fragmentation reaction (Sec.V M), consisting of 5 26

1986 members: 2012 plays the role of a feedstock species in compart- 2013 ment I when r3 proceeds forward. proceeds back- Π+ Π+ Π+ Π+ 1 2 3 4 2014 ward. X1 )−−−−* X2 )−−−−* X3 )−−−−* X4 )−−−−* X5, (146) Π+ ∅ ∅ 5 2015 • w3 = k3 degradation, r6 X5 −−→ X1 + X2, (147) − − Π− Π− Π− Π− 2016 • w4 = k4 release of F, when r3 X −−5* X −−4* X −−3* X −−2* X , (148) 5 )−− 4 )−− 3 )−− 2 )−− 1 + ex 2017 • w4 = k4 exchange, from compartment II to I, 1987 which can also be seen in Fig. S9c. From Fig. S9a, we 2018 when r4 proceeds forward. 1988 furthermore infer that there are only two degradation ∅ ∅ 2019 • w4 = k4 degradation, r7 1989 reactions (r6 and r7): − ex 2020 • w5 = k5 exchange, from compartment I to II, ∅ ∅ Π3 Π4 2021 when r4 proceeds backward. X3 )−−−−* ∅, X4 )−−−−* ∅. (149) + + 2022 • w5 = k5 release of F, which is an autocatalyst 1990 Denoting p1 the probability for X1 to succesfully 2023 in II, through locally irreversible reaction r5. 1991 finish (with back-and-forths) the cycle r1 + r2 + r3 + 2024 We then obtain 1992 r4 + r5, and p2 the probability that X2 does so for r2 + + 1993 ∅ r3 + r4 + r5, we can directly substitute the expressions + w3 ∅ w3 Θ1 = + ∅ , Θ1 = + ∅ , (158) 1994 derived in sec. V J w3 +w3 w3 +w3 + + w w∅ 5 5 4 ∅ 4 Θ2 = + ∅ − , Θ2 = + ∅ − , (159) Y + Y + w4 +w4 +w4 w4 +w4 +w4 p1 = Πk Γk, p2 = Πk Γk. (150) − + − w4 + w5 k=1 k=2 Θ2 = + ∅ − , Θ3 = + − , (160) w4 +w4 +w4 w5 +w5 w− 1995 Since degradation reactions do not occur in compart- Θ− = 5 . (161) 3 w++w− 1996 ment I, we are guaranteed that, starting from X1 or 5 5 1997 X2, eventually X3 will be formed with probability 1. 2025 Noting that Φ1 = 1, the product Φ2Φ3 simplifies to 1998 At that point, we can solve the problem for an effective 1999 network without X and X starting at X (by the 1 1 2 3 Φ Φ = . (162) 2000 same token, returns are inconsequential). In Fig. S9d 2 3 − + − + 1 − Θ2 Θ1 − Θ3 Θ2 2001 this effective network is given, for which we write

2026 This allows to fully express P in terms of 8 micro- + + ex Θ Θ Θ+ 1 2 3 2027 scopic coefficients. Reactions r1 and r2 would give A1 )−−−−* A2 )−−−−* A3 −−→ 2A1, (151) 2028 − − 4 more rate constants and two more molar fractions Θ3 Θ2 2029 (for chemostatted species). However, their values do A3 )−−−−* A2 )−−−−* A1, (152) 2030 ∅ ∅ not alter Pex as they wil always lead to return to A1 Θ1 Θ2 + + A1 −−→∅, A2 −−→∅. (153) 2031 (provided w1 , w2 > 0, which we naturally assume to 2032 be true). 2002 We readily find that this corresponds to a simple type-I 2033 For the purpose of illustration, we will consider 2003 network with 3 members, with a success probability 2034 the competition between exchange, degradation, and 2004 for a cycle pc, such that 2035 other transitions. To do so, we choose one rate for ex ex ex + + + 2036 exchange k4 = k5 = k and one rate for degradation pc = Θ1 Θ2 Θ3 Φ1Φ2Φ3 (154) d d d 2037 k3 = k4 = k . Furthermore, we let the sequestration- 1 − pc 2038 release steps be equally probable in compartment II, Pex = (155) + − + pc 2039 and match release in I w3 = w4 = w5 = k. The 2040 transition success probabilities can then be expressed 2005 Where we have used Φ for calculating back-and-forth k 2041 in terms of two ratios 2006 trajectories, as derived in sec. V J: ∆ = kd/k, Ξ = kex/k (163) P = Θ+Φ (156) Ak→Ak+1 k k P∞ − + s 1 2042 which upon substitution yield Φk+1 = s=0(Θk ΦkΘk ) = − + . (157) 1−Θk+1ΦkΘk + 1 d ∆ + Ξ Θ1 = 1+∆ , Θ1 = 1+∆ , Θ2 = 1+∆+Ξ , (164) 2007 with Φ1 = 1. Let us now give a microscopic inter- d ∆ − 1 2008 pretation to the competing processes, by considering Θ2 = 1+∆+Ξ , Θ2 = 1+∆+Ξ , (165) 2009 sufficiently elementary transitions on the level of a + 1 − Ξ Θ3 = 1+Ξ , Θ3 = 1+Ξ . (166) 2010 single species

2043 This permits to construct the phase diagram for Pex + + 2011 • w3 = k3 xF, sequestration of F (see S9a), which 2044 in Fig. 4D in terms of the variables ∆ and Ξ. 27

2045 A. Phase boundaries and limits 2058 as also seen in the phase diagram. When exchange is 2059 very rapid (Ξ → ∞), it ceases to be rate-limiting, and 2060 degradation will only compete with other reactions. 2061 This occurs when we let the denominator of Eq. (170) 2062 become 0, i.e. 2046 Let us first find an expression for the boundary be- 2047 tween autocatalysis and deterministic extinction, which 1 − 3∆ − ∆2 = 0, (172) 2048 occurs when Pex = 1 and Eq. (155) coincide, i.e. when 2049 p = 1/2, which upon substitution of Eq. (162) be- c 2063 which has solutions 2050 comes: √ −3 ± 21 1 ∆± = , (173) Θ+Θ+Θ+ = (1 − Θ−Θ+ − Θ−Θ+) (167) 2 1 2 3 2 2 1 3 2 √ −3+ 21 2064 2051 In terms of Ξ and ∆, we have of which ∆ = 2 ≈ 0.79 is the physical solution, 2065 as can also be seen in the phase diagram. 1 Ξ 1 2 1+∆ 1+∆+Ξ 1+Ξ (168) = 1 − 1 1 − Ξ Ξ 1+∆+Ξ 1+∆ 1+Ξ 1+∆+Ξ 2066 VIII. MULTICOMPARTMENT AUTOCATALYSIS 2067 WITH THREE COMPARTMENTS 2052 Which rearranges to a linear dependence in Ξ

2 2 ∆ Ξ + ∆ + 3∆Ξ + 2∆ − Ξ = 0, (169) 2068 Let us consider a bimolecular reaction

2053 from which we obtain for the phase boundary A + B )−−−−* C, (174)

∆(∆ + 2) 2069 which can occur in three different compartments labeled Ξ = 2 . (170) 1 − 3∆ − ∆ 2070 α, β, γ, as shown in Fig. S9d. Let us couple these 2071 compartments through the following exchanges 2054 In the regime where reactions outpace degradation 2055 (∆  1), extinction will be due to rate-limiting ex- Aα )−−−−* Aβ, Bβ )−−−−* Bγ , (175) 2056 change (Ξ  1). Taking Eq. (170), dividing by ∆ and Cα )−−−−* Cβ )−−−−* Cγ . (176) 2057 letting ∆ → 0, the ratio Ξ/∆ tends to

Ξ 2072 Removing Aγ ,Cα then immediately yields the type III = 2, (171) 2073 autocatalytic core shown in Fig. S9e. ∆ 28

a pathback-branch forward-branch C’ is a subcycle of C C is not minimal C’ is minimal

b hyper-ear proto-ear c Type I Type II

*u-v-*

Type III Type IV Type V

*u-u-v-v* u-u-v-v

d e z y z nested back-branches rz y ry y y z z entangled back-branches ry rz f E allocatalytic cycle ES EP S P

Figure S1: a) Hypergraph paths and cycle examples. b) Examples of hyper-ears (left) and associated proto-ears (right) obtained by removing green species. The syntax below graphs (see section II C) describes the relationships between non-P species and path P. Nodes ’u’ are products of r, nodes ’v’ are reactants of reactions producing x, ’-’ is any series of reactions with a single reactant and product, ’*’ denotes the closure of C by the green path. c) Autocatalytic cores. Edges are oriented consistently along cycles, so that reaction have a single reactant. Orange squares are chains of arbitrary length made of reactions with a single reactant species and product species. Edge-to-node connections are weighted by a stoichiometric coefficient, represented explicitly only for Type I by a fork (stoichiometry of 2). C, C0 and C0 are cycles. In Type II, the dotted path may comprise multiple cycles similar to the green box. In main text Fig. 2, only the case of a single green box is represented for simplicity. In Type I, detC 6= 0; in Type II, det(C) = 0 and det(C0) can be any value; in Types III-V, det(C) = det(C0) = det(C00) = 0. d) Generic hyper-ear structure of cores. e) Nested and entangled back-branches. f) Example of allocatalytic cycle. 29

IV a) I b) I D g 3 C III II III II g 2

Figure S2: a) A decorated Toy Formose reaction given by the submatrix ν ∗, obtained by removing C1. The replication cy- D cle g 3 is illustrated in blue. In this network, only species C2, C3 and C4 are autocatalysts. b) The minimal formose C reaction in its autocatalytic subnetwork, an example of an SFA. Arrows illustrate the replication cycle g 2 .

a) b) +

C int1

cat1

Figure S3: a) top: A product-enhanced cycle, Blackmond’s first type of autoinduction. bottom: Hypergraph, contain- ing the type I autocatalytic core (yellow). External food and allocatalysts are marked in blue. b) top: A ligand- accelerated cycle, Blackmond’s second type of autoinduction. bottom: Hypergraph, containing the type II autocat- alytic core (yellow) and supporting external food (blue). Note that, in both cases, external allocatalytic cycles are not part of the core. Allocatalysts are treated on equal footing with feedstock and waste in isolating a core. 30

a) 1: Acetate 7: Fumarate 2 3 4 1 5 2: Acetyl-CoA 8: Succinate 3: Pyruvate 9: Succinyl-CoA 13 6 4: Phosphoenyl- 10: 2-ketoglutarate pyruvate 11: isocitrate 12 7 5: oxaloacetate 12: cis-aconitate 11 8 10 9 6: Malate 13: citrate 6 7 6 4 6 6 5 7 10 b) 6 6 1: RuP 6 7 5 11 2: RuBP 3 13 12 10 6 8 3: 3-PGA 6 5 2 4 6 4: 1,3-PGA 13 11 1 9 6 12 5: G3P 6 6: F1,6P 9 1 6 6 5 3 7: F6P 4 6 13 6 9 8: E4P 2 6 5 10 9: X5P 3 5 1 9 10: DHAP 6 11 11: S1,7P 2 12 1 12: S7P 13 13: R5P 2x 9 9 1

Figure S4: a) Autocatalysts in the Reverse Krebbs cycle, which yield a Type II core. b) Autocatalysts in the Calvin Cycle. We find 3 cores of type I (2 equivalent, up to the choice of reaction to link 5 and 9), and 4 cores of type II.

a) DCU b)

NH O Type III NH N N O C C N NH N DCU Z Z

EDCC EDCC I1 NH 2 O I1 O NH C N DCC Y

NH O X O NH NH O OH

N C N

EDCU DCC EZ

Figure S5: a. Chemical network for the cavitand-amplification network. blue: feedstock compounds and waste. beige: autocatalysts. b. type III autocatalytic core for the cavitand-amplification network. 31

1 Y Y 1 X 1 XY 1 1 XY Y X 2 1 2 X 1 2 1 2 4 1+2 3+4 4 X 2 XY 3 1

II

I

Type I Type II Type II

β β β β β βii ij ji ij jk ki

Figure S6: a. A type-II autocatalytic network with its feedstock compounds (or Food set), as encountered in GARD. Col- ored nodes highlight two distinct allocatalytic cycles that yield an autocatalytic cycle when combined. b. The same network, in a bipartite graph representation used in the RAF sets formalism. Specifying the mechanism in terms of uncatalyzed reaction steps removes the RAF property. c. A coarse-grained representation, where allocatalysts in the same allocatalytic cycle are represented by a single species, and each allocatalytic cycle has been replaced with a dashed line, to indicate the net reaction being catalyzed. b. A catalytic incorporation mechanism: the red (telephone shape) amphiphile forms a complex with the purple (square shape) amphiphile, which mediates its incorporation in a micelle or vesicle. c. Autocatalytic networks of co-assembling amphiphiles. Amphiphiles in square nodes are reservoir species, those in circular nodes represent amphiphiles in a micelle or membrane. Diagonal terms of the catalytic ma- trix encode type I autocatalysis, i.e. direct self-incorporation. All other autocatalysis (cross-incorporation) is of type II: sequences of nonoverlapping allocatalytic cycles. 32

a) b) c) B Bn 1 B B 1 2 B1 B2 2B1 B B - n-1 ... 2 d) Π Π Π 2 n-1 n B B ... B B 2B 1 - 2 - - n-1 - n 1

e) 4 f) D Π 2 3 1 2 C2 C3 C C C C C + D 1 1 1 - 2 - 3 1 1 Θ Θ- g) 1 1 Xs Xn D1 X1

X2 h) Π Π Π 2 Π s Π n s-1 s X X ... X ... X X + X 1 - 2 - - s - - n 1 s

Figure S7: a. A type-I subnetwork. B1 and B2 reversibly interconvert, B2 can also irreversibly form two B1, marked by + the forked edge. b. State graph, Πk denotes the probability that, starting at Bk, the next transition will be a step − ∅ forward in the cycle, Πk a step backward, and Πk a degradation. Fragmentation (yielding 2B1) and degradation (∅) are absorbing states, attained with probabilities pc and 1 − pc respectively. c. Autocatalytic core for a type-I cycle with n nodes. d. State graph for c e. schematic for an autocatalytic core for a type III cycle. f. State graph for e. g. general schematic for an autocatalytic core for a type II cycle with a single fragmentation step. h. State graph for g. 33

a) b) ...

......

c) d) e) f) Π Π 2 3 Π 3 A A A A + B 1 - 2 - 3 1 1

g) h) i) Π 0 Π Π 2 0 A A A A + B 0 - 1 - 2 1 0

j) k) Π 2 A A B + C 1 - 2 1 1

Figure S8: a) three motifs with the same Pex(Ak), when the transition network follows the same symmetry as the network structure. b) A more general case: a multiple of allocatalytic cycles of size n. c) 6-membered type I cycle. d) a 6- membered type II cycle. e) a 6-membered type III cycle. f) transition network for one of the two allocatalytic cycles. g) a 6-membered type III cycle. The transition of a precursor to allocatalyst is not mediated by an allocatalytic cycle, and hence this is not a RAF. h) Transition network. i) a trio of symmetric analogues. j) a type V autocatalytic core. k) transition network for j. 34

a) 4 c) 5 3

1 2

b) # Autocatalysts

d) A B C

+ + +

e) α

Figure S9: a) description in terms of Xk. Only the species in the purple circle undergo irreversible transitions towards absorbing states: degradation of X3 (r6) and X4 (r7), fragmentation of X5 (r5). b) An effective network with the same statistics for Pex as a), obtained by removing X1 and X2. c) Simulation using Gillespie’s Algorithm. Extinction events are marked with black pin. d) three-compartment network with a single bimolecular chemical reaction. e) autocatalytic core. The use of three compartment allows to construct the reactions to complete the cycle (the ears) directly from the reproduction step.