arXiv:1612.00825v2 [q-bio.PE] 8 Dec 2017

Primordial Sex Facilitates the Emergence of Evolution

Sam Sinai (1, 2, ∗), Jason Olejarz (1, ∗), Iulia A. Neagu (1, 3), and Martin A. Nowak (1, 2, 4, †)

1 Program for Evolutionary Dynamics, Harvard University, Cambridge, Massachusetts 02138, USA
2 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
3 Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
4 Department of Mathematics, Harvard University, Cambridge, Massachusetts 02138, USA

(Dated: December 11, 2017)

Compartments are ubiquitous throughout biology, yet their importance stretches back to the origin of cells. In the context of the origin of life, we assume that a protocell, a compartment enclosing functional components, requires N components to be evolvable. We calculate the timescale in which a minimal evolvable protocell is produced. We show that when protocells fuse and share information, the time to produce an evolvable protocell scales algebraically in N, in contrast to an exponential scaling in the absence of fusion. We discuss the implications of this result for the origins of life, as well as for other biological processes.

A defining characteristic of living organisms is their ability to replicate and evolve [1]. A major objective of research on the origin of life is therefore to find plausible chemical systems that are capable of self-replication. The “RNA world” hypothesis is a leading framework encompassing theories about the role of RNA in the origin of life. It postulates that RNA, or a similar bio-polymer, must have played a central role in the origin of life as both an information-carrying molecule and an enzymatic one, initiating self-replication [2–4]. But formidable difficulties remain in developing a complete and rigorous narrative for this theory. Both theoretical and experimental investigations of well-mixed populations of RNA or similar bio-polymers show that they often suffer from calamitous pitfalls. These include error catastrophe for replicases [5] and parasitism for cooperative enzymes [6–9]. Moreover, the complexity of long RNA sequences that could serve as efficient catalysts makes their spontaneous prebiotic synthesis difficult to explain [10]. Indeed, there have been decades of effort and some exciting progress in prebiotic chemistry (e.g. [11, 12]). Despite this, building efficient, stable, and prebiotically plausible replicases (sometimes called the holy grail of the RNA world) has remained a challenge [13, 14]. Hence, compartmentalization must have occurred at some point in the development of life. This could have happened before, during, or after the emergence of self-replicating genetic elements akin to the information-carrying and enzymatic molecules sought by RNA world researchers.

In modern cells, lipid membranes provide this compartmentalization. There is evidence that lipid membranes were available prebiotically. It has been shown that simple amphiphilic molecules, like fatty acids, which are the building blocks of lipid membranes, can be produced in a prebiotically plausible manner [15]. Alternatively, lipids could have been imported to the prebiotic Earth by chondrite meteorites [16–18]. Hence, such molecules were likely abundantly present on the prebiotic Earth [19–23]. These molecules are able to spontaneously assemble into lipid vesicles in aqueous conditions [23, 24]. The resulting compartments are known in this context as protocells.

Protocells alleviate some of the pitfalls that impede the transition from prelife to life. The contents of a protocell are held near each other and share the same fate. This results in increased interactions within the protocell and decreased interactions with the outside environment. It also means that the protocell can house a segmented genome, i.e., the information need not be stored in one contiguous polymer. Enclosure can also dampen the effects of side reactions and help maintain the auto-catalytic cycles that may be required for any new metabolism to start [25]. Protocells can also divide into new protocells that inherit parts of their contents [26, 27]. These properties of protocells help in selection for cooperative polymers, in particular replicases [6, 7, 28–32]. In addition to enclosing information and dividing, protocells are able to merge, thereby sharing their contents [25, 33–35]. In biology, sharing information content between two individuals is considered a defining property of sex. This information-sharing ability among protocells is a form of “primordial sex”, and its implications have not received much attention.

For the reasons outlined in the rest of this study, we suggest that the ability of compartments to merge categorically changes the time required to produce an evolvable protocell. Hence, we propose that even before the advent of replication, the early presence of membranes could have vastly improved the chances of producing complicated cells by luck. In such cases, it would not be unreasonable to assume that the starting set of molecules from which an evolvable cell emerges could be large. Almost no origin-of-life models operate on this assumption, because they consider it a probabilistic miracle.

To test this hypothesis, we investigate a simple first-passage process [36, 37]. We assume that, in order to be evolvable, a protocell needs to contain a certain number, N, of component types (i.e., distinct molecules of various complexity) [8, 38–42]. In early life, these could be molecules as simple as ions, activated monomers, or molecules that stabilize the membrane. They could also be more complicated polymers, like oligo-peptides, and even elementary ribozymes and simple unlinked genes [12, 25, 29, 30, 43–48]. More precisely, the target set should result in an auto-catalytic network that yields an evolvable cell with non-negligible probability. Such a scheme dates back to Oparin and has been defended more recently [48]. We call a protocell built from the smallest set of necessary and sufficient components for evolvability a minimal evolvable protocell.

We can accordingly represent the functional (or genetic) content of each protocell as a binary string of length N. For simplicity, we ignore the redundancy (or dose) of each component in the protocell and are only concerned with each component's presence. If a protocell contains a particular component i, then the string has a value of 1 at the ith position, and 0 otherwise. Whenever a protocell randomly assembles, we assume that it contains each of the N component types independently, with probability p (components do not compete for positions). That is, protocell assembly samples each type (present in sufficient abundance) from the environment with probability p. Whenever two protocells merge, the value of the resulting string at every position i is determined by a bitwise OR operation on the ith bits of the two parent protocells. In other words, if either of the original cells contains a component, the resulting cell will also contain it. This is shown schematically in Figure 1.

FIG. 1. Merging occurs between randomly assembled protocells. (A) Each color (and a “1” bit at the corresponding position on a protocell's representative binary string) indicates the presence of one of the four components needed for the protocell to be evolvable (here, N = 4). Randomly assembled lipid membranes form around the components. (B) Whenever two protocells merge, they share their contents. Sharing of contents is computed as a bitwise OR operation between the two parent strings of length N.

The dynamical process is as follows. On the first step, the accumulator (the object of our attention) consists of a randomly assembled protocell. If fewer than N components are enclosed, then one of two things can happen:

- With probability δ, the accumulator loses its contents. On the second step, it consists of a new randomly assembled protocell, and the accumulation process starts over. The accumulator can lose its contents if, for example, its membrane's integrity is lost, it is infected by a parasite, or it divides; the parameter δ accounts for all such possibilities.
- With probability 1 − δ, on the second step, the accumulator merges with a randomly assembled protocell from the environment, possibly gaining additional components.

In the second case, if the accumulator still has fewer than N components after merging, the same two outcomes are possible on the third step. With probability δ it loses its contents and is replaced by a new randomly assembled protocell, and accumulation starts over. With probability 1 − δ it merges with another randomly assembled protocell from the environment. This process continues until the accumulator gains all N components necessary for evolvability. The total number of steps (or time units), Z, needed to gain all N components equals the total number of random assembly and merging events in the accumulation process.

The time, Z, needed to form a minimal evolvable protocell is thus a random variable that depends on the particular accumulator being tracked. If we track many such accumulators, what is the mean first-passage time, E[Z], for an accumulator to achieve all N components necessary for evolvability?

Begin by considering the simple case δ = 1 (no merging occurs). If the accumulator consists of a randomly assembled protocell that has all N components, then the minimal evolvable protocell has been achieved. But if there are fewer than N components, then the accumulator is reset without merging. A randomly assembled protocell contains all N components with probability $p^{N}$, so the number of assembly events needed is geometrically distributed. Its expected value, $E_{\delta=1}[Z]$, grows exponentially with N:

$$E_{\delta=1}[Z] = \left(\frac{1}{p}\right)^{N}$$

For large values of N, the spontaneous generation of a minimal evolvable protocell would be a probabilistic miracle. We now focus on how E[Z] grows with N when 0 < δ < 1.
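The accumulation process can also be simulated directly. Below is a minimal Monte Carlo sketch. It stores each protocell's binary string as a Python integer, so that merging is the bitwise OR operator `|`. The function names, the fixed seed and the default trial count are illustrative assumptions, not taken from the paper.

```python
import random


def assemble_protocell(n: int, p: float, rng: random.Random) -> int:
    """Randomly assemble a protocell as an n-bit integer.

    Bit i is 1 if component type i is enclosed; each of the n types is
    included independently with probability p.
    """
    bits = 0
    for i in range(n):
        if rng.random() < p:
            bits |= 1 << i
    return bits


def first_passage_time(n: int, p: float, delta: float, rng: random.Random) -> int:
    """Simulate one accumulator and return Z, the number of steps until it holds all n components."""
    target = (1 << n) - 1                 # string of n ones: a minimal evolvable protocell
    acc = assemble_protocell(n, p, rng)   # step 1: a randomly assembled protocell
    steps = 1
    while acc != target:
        steps += 1
        new_cell = assemble_protocell(n, p, rng)
        if rng.random() < delta:
            acc = new_cell                # reset: contents lost, start over with a new protocell
        else:
            acc |= new_cell               # merge: bitwise OR of the two parent strings
    return steps


def estimate_mean_first_passage(n: int, p: float, delta: float,
                                trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[Z], averaged over independent accumulators."""
    rng = random.Random(seed)
    return sum(first_passage_time(n, p, delta, rng) for _ in range(trials)) / trials
```

For example, `estimate_mean_first_passage(3, 0.5, 1.0)` should come out close to $(1/p)^N = 8$. Because the δ = 1 cost grows exponentially, direct simulation of that case quickly becomes impractical as N grows.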
In what follows, it is convenient to use the parameter q ≡ 1 − p. Denote by S(q, δ, N) the probability that, starting from a randomly assembled protocell, the accumulator achieves all N components before being reset. We determine S(q, δ, N) as follows. First, assume that there is no death of the accumulator. Then $1-q^{z}$ is the probability that, after z steps, the accumulator has acquired a given component. Since the N component types are acquired independently, $1-(1-q^{z})^{N}$ is the probability that the accumulator has not achieved all N components after z steps. It follows that $(1-q^{z})^{N}-(1-q^{z-1})^{N}$ is the probability that the accumulator achieves all N components in exactly z steps. Now consider death of the accumulator. The probability that the accumulator survives for z steps without being reset is $(1-\delta)^{z-1}$, so we have

$$S(q,\delta,N) = \sum_{z=1}^{\infty} (1-\delta)^{z-1}\left[(1-q^{z})^{N}-(1-q^{z-1})^{N}\right]$$

Shifting the index in the second term and using $(1-q^{0})^{N}=0$, this simplifies to

$$S(q,\delta,N) = \frac{\delta}{1-\delta}\sum_{z=1}^{\infty} (1-\delta)^{z}(1-q^{z})^{N} \tag{1}$$

Denote by T(z; q, δ, N) the probability mass function for the number of steps, z, needed for the accumulator to gain all N components (i.e., reach its target), starting from a randomly assembled protocell. It is conditioned on all N components being accumulated before a reset. We have

$$T(z;q,\delta,N) = \frac{(1-\delta)^{z-1}\left[(1-q^{z})^{N}-(1-q^{z-1})^{N}\right]}{S(q,\delta,N)} \tag{2}$$

Denote by R(z; q, δ, N) the probability mass function for the number of steps, z, taken before the accumulator is reset, starting from a randomly assembled protocell. It is conditioned on the accumulator being reset before gaining all N components. We have

$$R(z;q,\delta,N) = \frac{\delta(1-\delta)^{z-1}\left[1-(1-q^{z})^{N}\right]}{1-S(q,\delta,N)} \tag{3}$$

In what follows, we omit the explicit dependence on q, δ, and N for notational convenience. For all 0 < δ < 1, the mean first-passage time, E[Z], needed to form a minimal evolvable protocell is calculated directly from

$$E[Z] = \frac{\sum_{z=1}^{\infty} z\left[S\,T(z)+(1-S)\,R(z)\right]}{S} \tag{4}$$

Substituting Eqs. (1), (2), and (3) into Eq. (4) and simplifying, we obtain

$$E[Z] = \frac{1}{S\delta}-\frac{1-\delta}{\delta} \tag{5}$$

To extract the large-N behavior of E[Z] from Eq. (5), we simplify the summation in Eq. (1) for large N using the following procedure. For a smooth function f(x), we use the notation $f^{(i)}(x)=d^{i}f(x)/dx^{i}$. We can express an integration of $f^{(i)}(x)$ with respect to x from 0 to ∞ as

$$\int_{0}^{\infty} dx\, f^{(i)}(x) = \frac{1}{N}\sum_{z=0}^{\infty}\int_{0}^{1} dy\, f^{(i)}\!\left(\frac{z+y}{N}\right)$$

Next, we write a Taylor expansion of $f^{(i)}((z+y)/N)$ in powers of y/N and perform the integration over y. We have

$$\int_{0}^{\infty} dx\, f^{(i)}(x) = \sum_{m=0}^{\infty}\frac{1}{(m+1)!\,N^{m}}\left[\frac{1}{N}\sum_{z=0}^{\infty} f^{(i+m)}\!\left(\frac{z}{N}\right)\right] \tag{6}$$

We then substitute Eq. (6) into Eq. (1) to express the summation as an integration. We also use the integral form of the Beta function, $B(x,y)=\int_{0}^{1} dt\, t^{x-1}(1-t)^{y-1}$, and write ∼ for asymptotic equivalence as N → ∞. This gives

$$S \sim -\frac{\delta}{(1-\delta)\log(q)}\, B\!\left(\frac{\log(1-\delta)}{\log(q)},\, N+1\right) \tag{7}$$

Next, we substitute Eq. (7) into Eq. (5) and express the Beta function using Gamma functions, $B(x,y)=\Gamma(x)\Gamma(y)/\Gamma(x+y)$. Applying Stirling's formula for the Gamma function, $\Gamma(x)\sim x^{x}e^{-x}\sqrt{2\pi/x}$, and simplifying for large N, we find that E[Z] grows asymptotically as

$$E[Z] \sim \alpha N^{k} \tag{8}$$

where

$$\alpha = -\frac{(1-\delta)\log(1-p)}{\delta^{2}\,\Gamma(k)}, \qquad k = \frac{\log(1-\delta)}{\log(1-p)}$$

The time complexity of the concurrence of components for the problem of abiogenesis is thus fundamentally altered. For any slight amount of merging, i.e., for any value 0 < δ < 1, E[Z] grows algebraically with N. Intriguingly, for many values of p and δ, E[Z] grows only as a small power of N. Whenever δ < p (so that k < 1), E[Z] grows only sublinearly with N (Figure 2).

For the particular case in which δ ≪ 1, p ≪ 1, and δ is not too large relative to p, Eq. (8) admits a simple approximation:

$$E[Z] \approx \frac{1}{\delta}\,N^{\delta/p} \tag{9}$$
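The exact result, Eq. (5), and its large-N forms, Eqs. (8) and (9), can be compared numerically. Below is a minimal sketch of such a comparison. It evaluates the infinite sum in Eq. (1) by truncation, stopping once the geometric tail bound is below a relative tolerance. This stopping rule, the parameter `rtol` and all function names are our own hypothetical choices, not part of the paper.

```python
import math


def prob_target_before_reset(n: int, p: float, delta: float, rtol: float = 1e-12) -> float:
    """S(q, delta, N) via Eq. (1): delta/(1-delta) * sum_{z>=1} (1-delta)^z (1-q^z)^N.

    Since (1-q^z)^N <= 1, the terms after z sum to at most (1-delta)^(z+1)/delta;
    the sum is truncated once that bound drops below rtol times the partial sum.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("Eq. (1) requires 0 < delta < 1")
    q = 1.0 - p
    total = 0.0
    z = 1
    weight = 1.0 - delta                  # (1 - delta)^z
    while True:
        total += weight * (1.0 - q**z) ** n
        z += 1
        weight *= 1.0 - delta
        if weight / delta <= rtol * total:  # remaining tail is negligible
            break
    return delta / (1.0 - delta) * total


def mean_time_exact(n: int, p: float, delta: float) -> float:
    """E[Z] via Eq. (5): 1/(S*delta) - (1-delta)/delta."""
    s = prob_target_before_reset(n, p, delta)
    return 1.0 / (s * delta) - (1.0 - delta) / delta


def mean_time_asymptotic(n: int, p: float, delta: float) -> float:
    """Large-N form, Eq. (8): E[Z] ~ alpha * N**k."""
    k = math.log(1.0 - delta) / math.log(1.0 - p)
    alpha = -(1.0 - delta) * math.log(1.0 - p) / (delta**2 * math.gamma(k))
    return alpha * n**k


def mean_time_small_rates(n: int, p: float, delta: float) -> float:
    """Eq. (9): E[Z] is approximately N**(delta/p) / delta when delta << 1 and p << 1."""
    return n ** (delta / p) / delta
```

A convenient sanity check is N = 1. In that case every step gains the single component with probability p, whatever the value of δ, so `mean_time_exact(1, p, delta)` should return 1/p. For large N, for example N = 100000, p = 0.1 and δ = 0.05, the exact and asymptotic values should agree to within about a percent. Note that S can underflow for large δ and N, so this sketch is best used in the merging-dominated regime.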
as a randomly assembled protocell, and it is never reset. For this case, the mean first-passage time, $E_{\delta=0}[Z]$, grows logarithmically with N, i.e. [49],
$$E_{\delta=0}[Z] = \sum_{i=1}^{N}\binom{N}{i}\frac{(-1)^{i+1}}{1-(1-p)^{i}} \sim -\frac{\log(N)}{\log(1-p)}$$

For the biologically realistic case 0 < δ < 1, the probability mass function P(Z = z) is also of interest. It gives the distribution of the number of steps needed to achieve a minimal evolvable protocell. P(Z = z) is given by
$$P(Z=z) = \sum_{i=1}^{z} S(1-S)^{i-1}\sum_{z_1+\cdots+z_i=z} T(z_1)\prod_{j=2}^{i} R(z_j) \tag{10}$$

Here i counts the attempts. Of these, i − 1 end in a reset, with durations $z_2,\dots,z_i$ drawn from R, and one ends in success, with duration $z_1$ drawn from T.

If N is small, there is typically a small number of resets before the accumulator gains all components. This corresponds to each $z_j$ being comparable in magnitude to z in the summations in Eq. (10). But if N is large, there is typically a large number of resets before the accumulator gains all components. This corresponds to having $z_j \ll z$ for all j in the summations in Eq. (10). In this case, the total number of steps, Z, is the sum of many independent and identically distributed random variables.

FIG. 2. Minimal evolvable protocells are achieved in polynomial time for the vast majority of parameter space. For N = 10, N = 25, and N = 100, we perform Monte Carlo simulations of the accumulation of components and plot $\log_{N}(E[Z])$ as a function of p and δ. For N → ∞, we plot k as a function of p and δ.

[Figure: Comparison of the model, approximation, and simulation data for … = 0.01 and varying …; legend: Model]

To provide a sense of how well E[Z] estimates the variable Z, we look at its concentration, $\tilde{Z} = Z/E[Z]$. Denote µ as the average number of steps before an accumulator resets, given that the accumulator resets before gaining all N components. Denote σ² as