Autocatalytic sets and models of early life

Mike Steel

Joint work with Wim Hordijk, Filipa Sousa and Elchanan Mossel

Oxford, 2016

It is often said that all the conditions for the first production of a living organism are now present, which could ever have been present. But if (& oh what a big if) we could conceive in some warm little pond with all sorts of ammonia & phosphoric salts, light, heat, electricity &c present, that a protein compound was chemically formed, ready to undergo still more complex changes, at the present day such matter would be instantly devoured, or absorbed, which would not have been the case before living creatures were formed.
Letter to J. D. Hooker, 1 Feb [1871]

● Many ideas/theories re. origin of life (‘RNA world’, genetics first vs metabolism first, hydrothermal vents etc).
● Current DNA/RNA/protein molecular machinery too complex to have arisen spontaneously all at once.
● Key early steps require the emergence (and evolution) of self-sustaining and autocatalytic networks of reactions.

Some ‘formal’ models

● (1940s-1980s)
  ○ Self-reproducing automata (von Neumann)
  ○ Chemoton model (Gánti)
  ○ ‘Hypercycles’ (Eigen and Schuster)
  ○ Collectively autocatalytic systems (Kauffman, Farmer, Bagley)
  ○ First cycles in directed graphs (Bollobás and Rasmussen)
  ○ (M,R)-systems (Rosen)
● More recently (1990s-)
  ○ Petri Nets (Sharov)
  ○ Chemical organisation theory (COT) (Contreras et al. 2011; Kreyssig et al. 2012)
  ○ RAF theory

Two features of catalysis

● Accelerates the production of molecules in the network so they accumulate spatially in concentrations sufficient to sustain further reactions and fight diffusion.
● Not only much faster rates, but also tightly ‘coordinated’.

Vaidya et al., Nature, 2012; Wolfenden, Snider, Acc. Chem. Res., 2001

Catalytic Reaction System (CRS)

$Q = (X, R_0, C, F)$

● $X$: molecule types
● $R_0$: reactions, each $r = (\{\text{reactants}\}, \{\text{products}\}) = (\rho(r), \pi(r))$
● $C \subseteq X \times R_0$: catalysis, with $(x, r) \in C$ when $x$ catalyses $r$
● $F \subseteq X$: “food” set

Another way to view a CRS

A directed (and bipartite) graph with two types of vertices (molecule types, reactions) and two types of arrows (reactants + products, catalysis).

[Figure: an example CRS drawn as a bipartite graph.]

Simple example: binary polymer model

● A set of molecules represented by strings over an alphabet (e.g. 0, 1) up to length n, with food molecules up to length t (with t << n).
● A set of reactions of two types:
  ○ ligation: 000 + 111 → 000111
  ○ cleavage: 0101010 → 0101 + 010
● Randomly assigned catalysis: Pr[x catalyzes r] = p(x,r)
● Uniform model: p(x,r) = p

Early claim:

“The formation of autocatalytic sets of polypeptide catalysts is an expected emergent collective property of sufficiently complex sets of polypeptides, amino acids, and other small molecules.” (Kauffman, 1986)

Basic idea: Given a fixed probability of catalysis p and increasing n, at some point there is a phase transition where the entire reaction graph becomes an autocatalytic set, similar to giant connected components appearing in random graphs.
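A minimal sketch of how one instance of the binary polymer model could be generated. The function names are hypothetical, and storing each ligation/cleavage pair as a single reaction is an assumption; under that assumption the reaction counts come out as the values |R0| = 196 (n = 5) and |R0| = 16,388 (n = 10) quoted later in the slides.

```python
import itertools
import random


def molecules(n):
    """All binary strings of length 1..n (the molecule set X)."""
    return ["".join(bits)
            for length in range(1, n + 1)
            for bits in itertools.product("01", repeat=length)]


def reactions(X):
    """One reaction per molecule and cut point, stored as (reactants, product).

    Read left to right it is a ligation (a + b -> ab); read right to left it
    is the matching cleavage. Counting each pair once is an assumption.
    """
    return [((x[:i], x[i:]), x) for x in X for i in range(1, len(x))]


def uniform_catalysis(X, R0, p, seed=0):
    """Uniform model: each pair (x, r) is a catalysis independently w.p. p."""
    rng = random.Random(seed)
    return {(x, k) for x in X for k in range(len(R0)) if rng.random() < p}


n, t = 5, 2
X = molecules(n)
F = [x for x in X if len(x) <= t]        # food set: strings of length <= t
R0 = reactions(X)
f = 1.0                                  # illustrative catalysis level f = p * |R0|
C = uniform_catalysis(X, R0, p=f / len(R0))
print(len(X), len(F), len(R0))           # 62 6 196
```

The catalysis set `C` stores pairs (molecule, index of reaction in `R0`); its size varies with the seed and is close to f·|X| on average.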

Main Criticisms

● Argument requires an exponential growth rate in the level of catalysis (Lifson, 1996).
● Autocatalytic sets lack evolvability (Vasas, Szathmáry & Santos, 2010).
● Binary polymer model is not realistic enough (Wills & Henderson, 1997).

We will consider all these issues….

Our approach

Use mathematics (and simulations) to study the polymer model and its extensions.

First we need to formalize some notions….

Definitions: Closure

● Given any subset $R$ of $R_0$, the closure of $F$ (relative to $R$), written $\mathrm{cl}_R(F)$, is the set of molecule types that can be constructed from $F$ by applying just reactions from $R$ (whether they are catalysed or not).
● Formally, $\mathrm{cl}_R(F)$ is the unique (minimal) subset $W$ of $X$ that contains $F$ and satisfies, for each $r \in R$: $\rho(r) \subseteq W \Rightarrow \pi(r) \subseteq W$.
● $\mathrm{cl}_R(F)$ is computable in polynomial time in $|Q|$.

Definition: F-generated

$R$ is F-generated if $\mathrm{cl}_R(F)$ contains every reactant of every reaction in $R$.

⇒ each reactant of any reaction in $R$ is either in $F$ or is a product of some other reaction in $R$.

⇐ ?
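The closure can be computed by repeatedly firing any reaction whose reactants are already available until nothing new appears. The sketch below is one straightforward way to do this (function names and the tuple layout are illustrative choices, not taken from the slides). Its last example suggests why the converse direction (⇐) fails in general: in a two-reaction cycle every reactant is a product of the other reaction, yet nothing can be built from the food set.

```python
def closure(food, reactions):
    """Return cl_R(F): all molecule types constructible from `food`.

    Each reaction is a pair (reactants, products) of tuples of molecule
    names. Catalysis plays no role here, as in the definition.
    """
    W = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if set(reactants) <= W and not set(products) <= W:
                W |= set(products)
                changed = True
    return W


def is_f_generated(food, reactions):
    """True if every reactant of every reaction lies in the closure."""
    W = closure(food, reactions)
    return all(set(reactants) <= W for reactants, _ in reactions)


food = {"0", "1"}
chain = [(("0", "1"), ("01",)), (("01", "1"), ("011",))]
print(sorted(closure(food, chain)))        # ['0', '01', '011', '1']
print(is_f_generated(food, chain))         # True

# A cycle a -> b -> a: each reactant is a product of the other reaction,
# but neither a nor b can be constructed from the food set.
cycle = [(("a",), ("b",)), (("b",), ("a",))]
print(is_f_generated(food, cycle))         # False
```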

Definition: RAF (Reflexively Autocatalytic network over F)

A subset $R$ of $R_0$ is an RAF if $R \neq \emptyset$, and it satisfies the two properties:

● (RA): each reaction $r$ in $R$ is catalysed by a product of some other reaction (or by an element of $F$),
● (F): $R$ is F-generated.

Earlier example

[Figure: the example CRS on food molecules f1 to f5 and products p1 to p4.]

Equivalent definition

A subset $R$ of $R_0$ is an RAF if $R \neq \emptyset$, and for each reaction $r \in R$ all of the reactants and at least one catalyst of $r$ are present in $\mathrm{cl}_R(F)$.

Two nice properties of RAFs

● The union of any collection of RAFs is itself an RAF.
  ○ So if $Q$ has an RAF then it contains a unique maximal RAF.
  ○ Denote this by maxRAF($Q$).
● There is a simple algorithm to determine whether or not $Q$ has an RAF, and if so to compute maxRAF($Q$) (polynomial time in $|Q|$).
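The equivalent definition translates almost directly into a test. The following is a small sketch under an assumed representation (each reaction is a triple of reactants, products and catalysts; the names `closure` and `is_raf` are illustrative).

```python
def closure(food, reactions):
    """cl_R(F) for reactions given as (reactants, products, catalysts)."""
    W = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions:
            if set(reactants) <= W and not set(products) <= W:
                W |= set(products)
                changed = True
    return W


def is_raf(food, reactions):
    """Non-empty, and every reaction has all of its reactants and at least
    one catalyst in the closure of the food set."""
    if not reactions:
        return False
    W = closure(food, reactions)
    return all(set(reactants) <= W and any(c in W for c in catalysts)
               for reactants, _products, catalysts in reactions)


food = {"f1", "f2", "f3"}
r1 = (("f1", "f2"), ("p1",), ("p2",))   # catalysed by p2
r2 = (("f2", "f3"), ("p2",), ("p1",))   # catalysed by p1
print(is_raf(food, [r1, r2]))           # True: each catalyses the other
print(is_raf(food, [r1]))               # False: catalyst p2 is never produced
```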

Related notions:

[Figure: three example networks on f1 to f5 and p1 to p4, labelled RAF, CAF (Constructively Autocatalytic network over F) and pseudo-RAF.]

RAF Algorithm

Construct a nested decreasing sequence $R_0 \supseteq R_1 \supseteq \cdots$ with limit $R_\infty$, where

$R_{i+1}$ = reactions in $R_i$ that have all their reactants and at least one catalyst in $\mathrm{cl}_{R_i}(F)$.

If $R_\infty = \emptyset$, then $Q$ has no RAF, else $R_\infty = \mathrm{maxRAF}(Q)$.
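The nested sequence can be sketched as a loop that keeps pruning reactions until nothing changes. The representation (triples of reactants, products, catalysts) and the function names are assumptions made for illustration. The example is chosen so that one reaction survives the first pruning round and is only removed in the second.

```python
def closure(food, reactions):
    """cl_R(F) for reactions given as (reactants, products, catalysts)."""
    W = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions:
            if set(reactants) <= W and not set(products) <= W:
                W |= set(products)
                changed = True
    return W


def max_raf(food, reactions):
    """Iterate R_{i+1} = reactions of R_i with all reactants and at least
    one catalyst in cl_{R_i}(F); return the limit (empty list if no RAF)."""
    current = list(reactions)
    while True:
        W = closure(food, current)
        kept = [r for r in current
                if set(r[0]) <= W and any(c in W for c in r[2])]
        if len(kept) == len(current):
            return kept
        current = kept


food = {"f1", "f2", "f3"}
r1 = (("f1", "f2"), ("p1",), ("p2",))
r2 = (("f2", "f3"), ("p2",), ("p1",))
r3 = (("p1",), ("p3",), ("zz",))        # its only catalyst is never available
r4 = (("p3",), ("p4",), ("p1",))        # depends on the product of r3

print(max_raf(food, [r1, r2, r3, r4]) == [r1, r2])   # True
print(max_raf(food, [r3, r4]))                       # []
```

In the first call, r3 is dropped in round one (no catalyst), which removes p3 from the closure, so r4 is dropped in round two.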

Quantities of Interest

$Q = (X, R_0, C, F)$ full binary polymer model (on all sequences of length up to $n$). $|X| \sim 2^{n+1}$; $|R_0| \sim n2^{n+1}$.

● Average number of reactions catalyzed by any molecule type: $f = p \cdot |R_0|$
● Probability $P_n = \Pr(R_0 \text{ contains an RAF})$ that an instance of the binary polymer model contains an RAF set.

Early results

● $p$ constant (i.e. independent of $n$): $P_n \to 1$ as $n \to \infty$ [Kauffman; 1986, 1993]
● But this requires $f$ to grow exponentially with $n$, which is biochemically unrealistic (Lifson ‘96)
● What if $f$ grows more slowly?
  ○ [S: 2000]
    If $f < \frac{1}{3}e^{-1}$ then $P_n \to 0$ as $n \to \infty$
    If $f > cn^2$ then $P_n \to 1$ as $n \to \infty$
    [Conjecture: sub-quadratic]

Probability of RAF Sets

[Figure: probability of RAF sets, from Hordijk+S, 2004.]

Main theoretical results I (Mossel+S, 2005)

● Theorem 1: Linear transition for RAFs
  If $f \le n^{1-\epsilon}$ then $P_n \to 0$ as $n \to \infty$
  If $f \ge n^{1+\epsilon}$ then $P_n \to 1$ as $n \to \infty$
● Theorem 2: Exponential transition for CAFs
  If $f \le (2-\epsilon)^n$ then $P_n \to 0$ as $n \to \infty$
  If $f \ge (2+\epsilon)^n$ then $P_n \to 1$ as $n \to \infty$

The actual bounds (Mossel+S, 2005)

● If $f \ge \lambda n$ then $P_n \ge 1 - \dfrac{\kappa(\kappa e^{-\lambda})^t}{1 - \kappa e^{-\lambda}}$
  $\kappa$ = #states, $t$ = length of sequences in $F$; here $t = 2$, $\kappa = 2$.
  Generates all of $X$ but not all of $R_0$ (this requires $f$ to grow quadratically with $n$).
  In fact, $P_{10} = 0.5$ when $f = 1.3$ (not $f = 17$).
● If $f \le \lambda n$ then $P_n \le M\lambda$

Small RAFs

● Finding a smallest RAF is NP-hard.
● Definition: An RAF of $Q$ is irreducible if it contains no proper sub-RAF.
● Finding an irrRAF is easy…
● But there can be exponentially many of them.
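To get a feel for how conservative the lower bound is, it can be evaluated numerically. The sketch below assumes the bound has the form $1 - \kappa(\kappa e^{-\lambda})^t / (1 - \kappa e^{-\lambda})$ with $\kappa = 2$, $t = 2$; the function names and the bisection helper are illustrative. Under that assumption the bound only reaches 0.5 at $\lambda \approx 1.63$, i.e. $f = \lambda n \approx 16$ for $n = 10$, an order of magnitude above the simulated value $f = 1.3$.

```python
import math


def raf_lower_bound(lam, kappa=2, t=2):
    """Assumed form of the bound: 1 - kappa*(kappa*e^-lam)^t / (1 - kappa*e^-lam)."""
    y = kappa * math.exp(-lam)
    if y >= 1.0:
        return 0.0                      # bound is vacuous in this range
    return max(0.0, 1.0 - kappa * y ** t / (1.0 - y))


def smallest_lambda(target, lo=0.0, hi=20.0):
    """Bisection for the smallest lambda whose bound reaches `target`
    (the bound is non-decreasing in lambda)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if raf_lower_bound(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


lam_half = smallest_lambda(0.5)
print(round(lam_half, 3))               # 1.634
print(round(lam_half * 10, 1))          # 16.3  (f = lambda * n for n = 10)
```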

Small RAFs

● irrRAFs can be of different sizes:

[Figure: # reactions (0 to 2000) in the maxRAF and in an irrRAF, plotted against $f = p|R_0|$ (1.15 to 1.45).]

Main theoretical result (S+Hordijk+Smith 2013)

Theorem 3: There are no small RAFs when they first appear

If $f = \lambda n$ then as $n \to \infty$:

$\mathbb{P}(R_0 \text{ contains an RAF of size} < 2^{cn}) \to 0$

● When first cycles first appear in a random directed graph, they are of all sizes (Bollobás and Rasmussen, 1989)
● There are tiny RA sets, and tiny F-generated sets, but none that are both.

Two instances of binary polymer model with n=10 (|R0|= 16,388) at f=1.2 where Pn~0.5.
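One simple way to obtain an irreducible RAF, sketched here as an assumption rather than as the method used for the slides, is to try deleting each reaction of the maxRAF in turn and keep the deletion whenever an RAF survives. The function names are hypothetical. The example shows that the order in which reactions are tried can lead to irreducible RAFs of different sizes.

```python
def closure(food, reactions):
    """cl_R(F) for reactions given as (reactants, products, catalysts)."""
    W = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in reactions:
            if set(reactants) <= W and not set(products) <= W:
                W |= set(products)
                changed = True
    return W


def max_raf(food, reactions):
    """Prune reactions until all have reactants and a catalyst in the closure."""
    current = list(reactions)
    while True:
        W = closure(food, current)
        kept = [r for r in current
                if set(r[0]) <= W and any(c in W for c in r[2])]
        if len(kept) == len(current):
            return kept
        current = kept


def irreducible_raf(food, reactions):
    """Greedy reduction: drop a reaction whenever the rest still holds an RAF."""
    current = max_raf(food, reactions)
    for r in list(current):
        if r not in current:
            continue                    # already removed by an earlier step
        smaller = max_raf(food, [s for s in current if s != r])
        if smaller:
            current = smaller
    return current


food = {"f1", "f2", "f3"}
r1 = (("f1", "f2"), ("p1",), ("p2",))
r2 = (("f2", "f3"), ("p2",), ("p1",))
r3 = (("f1",), ("q",), ("f2",))         # catalysed directly by a food molecule

print(len(irreducible_raf(food, [r1, r2, r3])))   # 1  (just r3)
print(len(irreducible_raf(food, [r3, r1, r2])))   # 2  (r1 and r2)
```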

Structure of RAFs

● In general maxRAF($Q$) may contain (many) other sub-RAFs.

Poset of RAFs

[Figure: an instance of binary polymer model with n=5 (|R0|= 196), f=0.73, where Pn~0.5.]

Dynamics (Gillespie algorithm)

Movie by Wim Hordijk (KLI institute) 2016

Computing: It’s easy to….

● Find all the maximal proper subRAFs of the maxRAF (polynomial time in $|Q|$).
● Construct the poset $P$ of subRAFs of $Q$ (polynomial time in $|P| \times |Q|$).
● + Dynamics of subRAFs via Gillespie algorithm

Application to a real experimental system

RNA ribozyme replicator system: 16 reactions, 18 molecules, |F|=2, 64 catalysation pairs (x,r).

Forms an RAF (but not a CAF!). Contains many subRAFs.

Application to a living organism

E. coli, Methanopyrus kandleri

  ○ CRS has 1826 reactions, 1199 molecule types (42 catalysts), |F|=438.
  ○ RAF set of 1787 reactions out of 1826 = 98%. Not a CAF.
  ○ Complex subRAF structure
  ○ min F for an RAF = 123 molecules

Extensions beyond the uniform model

● Allowing p(x,r) to vary
● Template-based catalysis
● p(x,r) depends on length of x
● Partitioned system
● Extension of transition theory to non-polymer systems [n replaced by ratio of # reactions to # molecules]

JGAA, 0(0) 0–0 (0) 11

… $M(1 - q^*)$, and so, from Eqn. (13), $P_n$ is less than or equal to $M(\lambda + o(1))$, which converges to zero as $\lambda \to 0$.

Fig. 3 shows the behavior of $P_n$ as a function of $f$ for the binary polymer model, across our four exemplar distributions (for n = 10 and n = 16). The plots were obtained by simulations and the use of the RAF algorithm (except for the all-or-nothing model, where the exact formula was used). Two features are apparent. Firstly, the uniform model and the power law model show a much sharper transition from $P_n = 0$ to $P_n = 1$ than the other two models. Secondly, increasing n (from 10 to 16) has a larger effect on the curves for the sparse and all-or-nothing models than it does for the uniform and power law models.

[Figure 3 plot: $P_n$ (0.0 to 1.0) against $f$ (0 to 6); curves Unif, Plaw, Sparse and All/none, each for n = 10 and n = 16.]

Figure 3: Comparison across the four models of $P_n$ (probability of a RAF) as a function of $f$ (average catalysis rate per molecule) on the binary polymer model for n = 10 and n = 16.

Theorem 1 also leads to the following interesting consequence for the structure of small RAFs in the sparse model, which is different to that for the uniform model.

Corollary 1 Consider the sparse polymer model with $f = \lambda n$, where $\lambda$ is sufficiently large such that (from Theorem 1) $\tilde{P}_n \ge 1 - \epsilon$. Then with probability at least $1 - \epsilon - o(1)$, any instance of the model will contain a RAF with less than $2\lambda n^2$ reactions (where $o(1)$ refers to a term that converges to zero exponentially with increasing n).

Recent work

● Hordijk, W. and Steel, M. (2016). Autocatalytic sets in polymer networks with variable catalysis distributions. ArXiv 1605.03919v1 (submitted to J. Math. Chem.). [motivated by Oxford summer school project]
● W. Hordijk, L. Hasenclever, J. Gao, D. Mincheva, and J. Hein. An investigation into irreducible autocatalytic sets and power law distributed catalysis. Natural Computing, 13(3):287–296, 2014.

Four models

● Uniform model
● Power law catalysis
● Sparse model (each molecule catalyses n reactions w.p. p or no reactions w.p. 1-p)
● All or nothing (each molecule catalyses all reactions w.p. p or no reactions w.p. 1-p): $P_n = 1 - e^{\ldots}$

Case         Slope
standard     0.02
all-X        0.70
F & all-X    1.51
theory       1.63

Table 1: The slopes of the linear relationship for the growth rate in required level of catalysis, with increasing maximum molecule length n, as derived from the four different cases described in the text.

5.2 Power law catalysis

In contrast to the uniform catalysis version of the binary polymer model, the power law catalysis version seemed to require no increase in the level of catalysis, with increasing n, to get a probability $P_n \approx 0.50$ of finding RAF sets. In Fig. 9 in [3], $P_n$ vs. $f$ is plotted for various values of n (up to n = 12) for the power law case. However, rather than slowly moving to the right (i.e. towards higher f values, as is the case for the uniform model [7]), the curves for increasing n for the power law case cross each other around f = 1.5 [3]. This result seems somewhat counter-intuitive, especially as Theorem 1(ii) above implies that the curves must eventually start moving to the right as n grows. It turns …

[Figure 6 plot: $P_n$ (0.0 to 1.0) against $f$ (1.0 to 2.0), one curve for each of n = 8, 10, 12, 14, 16, 18, 20.]

Figure 6: The probability $P_n$ of finding RAF sets against the average level of catalysis $f$ for various values of the maximum molecule length n in the binary polymer model with power law distributed catalysis.

Result 1

● If $f \ge \lambda n$ then $P_n \ge 1 - \ldots$
● If $f \le \lambda n$ then $P_n \le M\lambda$

Recall: when RAFs first arise, the uniform model has no small (subexponential) RAFs. But the Sparse model has $O(n^2)$ size RAFs.

Impact of inhibition

Definition (u-RAF): Given $Q = (X, R_0, C, F)$ and $I \subseteq X \times R_0$, a u-RAF is a RAF $R$ that satisfies:

$(x, r) \in I \Rightarrow r \notin R$ or $x \notin \mathrm{cl}_R(F)$.

Determining if there exists a u-RAF is NP-hard (Mossel+S, 2008)

Simple algorithm: $Q \to \mathrm{maxRAF}(Q) \to R' \to \mathrm{maxRAF}(Q') \to \cdots$

Result 2

h = inhibition rate (expected # reactions inhibited by each reaction)

$\mu(f,h)$ = expected number of u-RAFs

Inhibition theorem

If $f = \lambda n$ and $h \le \dfrac{\ln(1+e^{-\lambda})}{\lambda}\, n$ then $\mu(2f,h) \ge \mu(f,0)$

Example: n=10, |R0|=16,388; f = 2, h = 30: $\mu(4, 30) \ge \mu(2, 0)$

Possible future applications

● Economics/social science
● Cognitive psychology

Further details

• Steel, M. (2015). Self-sustaining autocatalytic networks within open-ended reaction systems. Journal of Mathematical Chemistry, 53(8): 1687–1701.
• Sousa, F. L., Hordijk, W., Steel, M., and Martin, W. F. (2015). Autocatalytic sets in E. coli metabolism. Journal of Systems Chemistry, 6:4.
• Hordijk, W. and Steel, M. (2013). A formal model of autocatalytic sets emerging in an RNA replicator system. Journal of Systems Chemistry 4:3.
• Steel, M., Hordijk, W., and Smith, J. (2013). Minimal autocatalytic networks. Journal of Theoretical Biology 332: 96–107.
• Mossel, E. and Steel, M. (2005). Random biochemical networks and the probability of self-sustaining autocatalysis. Journal of Theoretical Biology 233(3), 327–336.

… 6. Theorem 2 coupled with Fig. 3 suggested that u-RAFs would likely exist, and applying the iteration algorithm over 10 random instances succeeded in detecting u-RAFs in all cases, as shown in Table 2. The fixed-parameter tractable u-RAF algorithm would be quite infeasible here, as $m = |X_{10}| = 2046$.

(1)     (2)
6393    2547
6415    2586
6447    2729
6410    2716
6351    2581
6359    2667
6395    2718
6397    2512
6418    2631
6438    2625

Table 2: The sizes of (1) the maximal RAF not taking inhibitors into account, and (2) the u-RAF from the iterative algorithm, across 10 random instances of the uniform binary polymer model containing a RAF (with n = 10, catalysis rate f = 4 and inhibition rate h = 6).
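The “simple algorithm” for inhibition alternates between computing a maxRAF and deleting inhibited reactions. The sketch below is one plausible reading of that iteration, not necessarily the exact procedure behind Table 2: the function names, the dictionary layout and the rule "delete every reaction with an inhibitor in the current closure" are all assumptions. Since deciding whether a u-RAF exists is NP-hard, a polynomial-time iteration like this can fail to find one that exists, as the second example shows.

```python
def closure(food, rxns):
    """cl_R(F); `rxns` maps a name to (reactants, products, catalysts)."""
    W = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _catalysts in rxns.values():
            if set(reactants) <= W and not set(products) <= W:
                W |= set(products)
                changed = True
    return W


def max_raf(food, rxns):
    """Prune reactions until all have reactants and a catalyst in the closure."""
    current = dict(rxns)
    while True:
        W = closure(food, current)
        kept = {k: r for k, r in current.items()
                if set(r[0]) <= W and any(c in W for c in r[2])}
        if len(kept) == len(current):
            return kept
        current = kept


def u_raf_by_iteration(food, rxns, inhibition):
    """Assumed iteration: maxRAF, drop inhibited reactions, repeat.

    `inhibition` is a set of pairs (molecule, reaction name).
    Returns a u-RAF, or {} if the iteration ends with nothing left.
    """
    current = max_raf(food, rxns)
    while current:
        W = closure(food, current)
        blocked = {r for (x, r) in inhibition if r in current and x in W}
        if not blocked:
            return current
        current = max_raf(food, {k: v for k, v in current.items()
                                 if k not in blocked})
    return {}


food = {"f1", "f2", "f3"}
rxns = {
    "r1": (("f1", "f2"), ("p1",), ("p2",)),
    "r2": (("f2", "f3"), ("p2",), ("p1",)),
    "r3": (("f1",), ("q",), ("f2",)),
}
print(sorted(u_raf_by_iteration(food, rxns, {("q", "r1")})))   # ['r3']

# Here {r1, r2} is a u-RAF (q is never produced without r3), but the
# iteration removes r1 and r3 in the first round and ends up empty.
print(u_raf_by_iteration(food, rxns, {("q", "r1"), ("f1", "r3")}))  # {}
```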

6.1 Efficiency of iteration algorithm

Since the “standard” RAF algorithm has a running time that is polynomial in the size $|R|$ of the given CRS, the iteration algorithm above for finding u-RAFs is clearly also polynomial-time. We now compare the results of the iteration algorithm with that of the earlier exact algorithm on model instances in which the number m of inhibitor molecule types is bounded (this exact algorithm, by contrast, has a complexity that grows exponentially with m). We then apply the iteration algorithm to model instances that are beyond the feasible reach of the exact method.

Note that in this section we continue to work in the uniform model, but now there are m molecule types that are inhibitors, and for each of these the probability the molecule type inhibits any particular reaction is q. This is therefore quite different to the set-up in Section 3.1 (and discussed above) where each molecule type has a constant probability of inhibiting each reaction.

Using the same parameter values as in the previous study (i.e., n = 10, t = 2, p = 0.0000792 [giving a probability $P_n$ of 0.50 to get “regular” RAFs]), m = 10, and inhibition probabilities q = 10 × p and q = 100 × p [6], the results are shown in Table 3 (including the additional parameter value q = 50 × p).

For each of the three cases (different values for the inhibition probability q), 10 instances of the model that contain a “regular” RAF were taken and both the exact and the iteration algorithm for finding u-RAFs were applied. In each table, the column labeled (1) shows the size of the (regular) maximal RAFs for