<<

View metadata, citation and similar papers at core.ac.uk brought to you by CORE Reciprocal Nucleopeptides as the Ancestral Darwinianprovided by Jagiellonian Univeristy Repository Self-Replicator Eleanor F. Banwell,†,1 Bernard M.A.G. Piette,†,2 Anne Taormina,2 and Jonathan G. Heddle*,1,3 1Heddle Initiative Research Unit, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan 2Department for Mathematical Sciences, Durham University, Durham, United Kingdom 3Bionanoscience and Laboratory, Malopolska Centre of , Jagiellonian University, Krakow, Poland †These authors contributed equally to this work. *Corresponding author: E-mail: [email protected]. Associate editor: Nicole Perna

Abstract Even the simplest organisms are too complex to have spontaneously arisen fully formed, yet precursors to first life must have emerged ab initio from their environment. A watershed event was the appearance of the first entity capable of : the Initial Darwinian Ancestor. Here, we suggest that nucleopeptide reciprocal replicators could have carried out this important role and contend that this is the simplest way to explain extant replication systems in a mathemat- ically consistent way. We propose short nucleic acid templates on which amino-acylated adapters assembled. Spatial localization drives peptide ligation from activated precursors to generate phosphodiester-bond-catalytic peptides. Comprising autocatalytic protein and nucleic acid sequences, this dynamical system links and unifies several previous hypotheses and provides a plausible model for the emergence of DNA and the operational code. Key words: Initial Darwinian Ancestor, , RNA world, protein world, nucleopeptide replicator, reciprocal replicator, polymerase, ribosome, evolution, early earth, hypercycle.

Introduction and Joyce 2012). Furthermore, it invokes three exchanges of between RNA and other molecules to explain the In contrast to our good understanding of more recent evo- coupling of polynucleotide and protein biosynthesis, namely lution, we still lack a coherent and robust theory that ade- transfer of information storage capability to DNA and poly- quately explains the initial appearance of life on Earth merase activity to protein as well as gain of peptide synthesis (abiogenesis). In order to be complete, an abiogenic theory ability. This presents a situation in which no extant molecule must describe a path from simple molecules to the Last continues in the role it initially held. Others have posited Universal Common Ancestor (LUCA), requiring only a grad- peptide and nucleopeptide worlds as solutions. Article ual increase in complexity. The peptide world theory proposes a spontaneously oc- The watershed event in abiogenesis was the emergence of curring self-replicating peptide with RNA synthesis, DNA and the Initial Darwinian Ancestor (IDA): the first self-replicator the operational code appearing later, and possible self- (ignoring dead ends) and ancestral to all life on Earth (Yarus replicating mechanisms of peptides have been explored 2011). Following the insights of von Neumann, who proposed (Fox and Harada 1958; Lee et al. 1996). Nucleopeptide theo- the kinematic model of self-replication (Kemeny 1955), nec- ries require that the replicator consist of both peptides and essary features of such a replicator are: Storage of the infor- nucleic acids and may involve their covalent linkage or (as in mation for how to build a replicator; a processor to interpret our proposal) noncovalent conjugation. Covalently linked information and select parts; an instance of the replicator. nucleopeptides include nucleobase-containing peptides In order to be viable, any proposal for the IDA’s structure such as PNA which has been mooted as a possible precursor must fit with spontaneous emergence from prebiotic geo- to the RNA world (Miller 1997) and possible RNA-interacting chemistry and principles of self-replication. Currently, the nucleo-2-peptides have been synthesized (Roviello et al. most dominant abiogenesis theory is the “RNA world,” which 2009; Nelson et al. 2000). Both the peptide world and nucle- posits that the IDA was a self-replicating ribozyme, that is, an opeptide theories consist of single molecular classes and RNA-dependent RNA polymerase (Cech 2012). Although therefore suffer the same exchange of function problems as popular, this theory has problems (Kurland 2010). For exam- the RNA-world theory. To the best of our knowledge, no ple, while it is plausible that molecules with the necessary single theory has emerged that parsimoniously answers the replication characteristics can exist, length requirements biggest questions. seem to make their spontaneous emergence from the pri- Here, we build on several foregoing concepts to propose an mordial milieu unlikely, nor does the RNA world explain the alternative theory based around a nucleopeptide reciprocal appearance of the operational code (Noller 2012; Robertson replicator that uses its polynucleotide and peptide

ß The Author(s) 2017. Published by Oxford University Press on behalf of the Society for and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons. org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Open Access

Downloaded from404 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794Mol. Biol. Evol. 35(2):404–416 doi:10.1093/molbev/msx292 Advance Access publication November 8, 2017 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

components according to their strengths, thus avoiding the Assumptions of the Model need to explain later exchange of function and coupling. We We postulate that, in a nucleopeptide reciprocal replicator, advocate a view of the IDA resulting from a biochemical the use of each component according to its strengths could system which we describe as a dynamical system, that is, a deliver a viable IDA more compatible with evolution to LUCA system of equations describing the changes that occur over replication machinery. Although seemingly more complex time in the self-replicator presented here, and we demon- than an individual replicating molecule, the resulting unified strate that such an entity is both mathematically consistent abiogenesis theory answers many hard questions and is ulti- and complies with all the logical requirements for life. While mately more parsimonious. The model does not consider in necessarily wide in view we hope that this work will provide a detail the chemistry of how the building blocks that consti- useful framework for further investigation of this fundamen- tute the IDA (short peptides and nucleic acids) came about as tal question. these details are covered in the cited literature (see for exam- ple, Saladino et al. 2012; Patel et al. 2015; Da Silva et al. 2015; Leman et al. 2004; Liu et al. 2014; Martin et al. 2008). Rather, Model and Results we concentrate on the important question of the mathemat- Solving the Chicken and Egg Problem ical validity of the IDA in terms of its ability to sustainably self- Given that any IDA must have been able to replicate in order replicate, without which it would not be a valid system. In to evolve, extant cellular replication machinery is an obvious constructing our model, we make the following assumptions: source of clues to its identity. Common ancestry means that (i) The existence of random sequences of short strands of features shared by all life were part of LUCA. By examining the mixed nucleic acids (XNA) likely consisting of ribonucleotides, common replication components present in LUCA, and then deoxyribonucleotides and possibly other building blocks able extrapolating further back to their simplest form, it is possible to polymerize with nucleotide chains, as well as the existence to reach a pre-LUCA, irreducibly complex, core replicator of random amino acids and short peptides produced (fig. 1). abiotically. We see that in all cells, the required functions of a rep- licator are not carried out by a single molecule or even a For this first assumption we have supposed a pool of inter- single class of molecules, rather they are performed vari- acting amino acids, nucleotides and related small molecules as ously by nucleic acids (DNA, RNA) and proteins. When well as a supply of metal ions, other inorganic catalysts and viewed by molecular class, the replicator has two compo- energy. The precise understanding of the “metabolic” reac- nents and is reciprocal in nature: polynucleotides rely on tions in which these precursor building blocks were formed is proteins for their polymerization and vice versa. The ques- in itself an extremely important question but is not consid- tion of which arose first is a chicken and egg conundrum ered here as a number of potential early earth conditions and that has dogged the field since the replication mechanisms reaction pathways resulting in these outcomes have already were first elucidated (Giri and Jain 2012). In this work, we been proposed, including the formamide reaction (Saladino suggest that, consistent with common ancestry and in con- et al. 2012) and cyanosulfidic chemistries (Patel et al. 2015). trast with the RNA world theory, the earliest replicator was Recent experimental models of alkaline hydrothermal vents a two—rather than a one—component system, composed have even succeeded in producing various organic molecules of peptide and nucleic acids. including ribose and deoxyribose (Herschy et al. 2014). Pools

FIG.1.Replication schemes. (a) This simplified cellular replication schematic is common to all life today and likely reflects the ancestral form present in LUCA. Shading by molecule type (purple for nucleic acid and orange for protein), reveals a reciprocal nucleopeptide replicator. Although the ribosome is a large nucleoprotein complex, the catalytic centre has been shown to be a ribozyme (Moore and Steitz 2003) and so it is shaded purple in this scheme. (b) Comparison of the method of action of the extant ribosome with the proposed primordial analogue (components are shaded like for like). Today, tRNA molecules (mid purple) loaded with amino acids (orange) bind the mRNA (dark purple) in the ribosome (light purple), which co-ordinates and catalyses the peptidyl-transferase reaction. Although the present day modus operandi is regulated via far more complex interactions than the primordial version, the two schemes are fundamentally similar. Mixed nucleic acid structures, one performing a dual function as primordial mRNA and primordial ribosome (p-Rib) and a second functioning as a primordial tRNA (p-tRNA), provide a system wherein the former structure templates amino acid-loaded molecules of the latter.

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 405 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

of pure molecules are unlikely; instead, mixtures would likely have comprised standard and nonstandard amino acids as well as XNAs with mixed backbone architectures, being, in their simplest forms, mixtures of deoxy- and ribonucleotides (Trevino et al. 2011; Pinheiro et al. 2012)withotherbuilding blocks being possible. For simplicity we sometimes refer to XNAs as “polynucleotides.” Such conditions would be condu- cive to the occasional spontaneous covalent attachment of FIG.2.Models of primitive polymerization reactions. An XNA strand can nucleotides to each other to form longer polymer chains function like a primordial ribosome (p-Rib) whereby one strand (Da Silva et al. 2015). (þ strand) can template the production of a primordial polymerase (p-Pol) as indicated by the solid arrow. The action of this p-Pol is repre- (ii) The existence of abiotically aminoacylated short XNA sented by the double-headed dotted arrow whereby it acts on the p-Rib strands (primordial tRNAs (p-tRNAs)) (þ strand) to catalyze synthesis of the complementary sequence ( strand) and also on the strand to produce more of the þ strand. The second assumption is potentially troubling as amino acid activation is slow and thermodynamically unfavorable. Biro et al. 2003; Rodin et al. 2011). Furthermore only a reduced However, amino acylation has been investigated in some detail set of amino acids (Angyan et al. 2014)—possibly as few as four and has been shown to be possible abiotically including, in (Ikehara 2002)—need to have been provided in this way. The some cases, the abiotic production of activated amino acids “statistical protein” hypothesis proposes that such a weak sep- (Illangasekare et al. 1995; Leman et al. 2004; Giel-Pietraszuk and aration may have been sufficient to produce populations of Barciszewski 2006;Lehmann et al. 2007;Turk et al. 2010;Liu et al. active peptides (Ikehara 2005; Vetsigian et al. 2006). Such 2014). A pool of activated amino acids allows us to presume a “primordial polymerases” (p-Pol) need only have been small fast rate of charging of p-tRNAs meaning that we can assume (see below) and spontaneous emergence of a template coding that the rate of charged p-tRNA formation is proportional to loosely for such a sequence seems plausible. The failure rate of the concentration of free amino acids. Taken together these such syntheses would be high but a p-Rib using the outlined data suggest that multiple small amino-acylated tRNA-like pri- primordial operational code to produce statistical p-Pol pep- mordial XNAs could have arisen. Though likely being XNA in tides could have been accurate enough to ensure its own nature, we refer to them as p-tRNA, reflecting their function. A survival. similar nomenclature applies to p-Rib and p-mRNA. (iv) The viability of a very short peptide sequence to function (iii) Conditions that allow a codon/anticodon interaction be- as an RNA-dependent RNA polymerase tween two or more charged p-tRNA for sufficient time and Templated ligation is often proposed as a primordial self- appropriate geometry to allow peptide bond formation, that replication mechanism, particularly for primitive replication is, the functionality of a primordial ribosome (p-Rib) of nucleic acid in RNA world type scenarios. However, these Our proposed p-Rib is an extreme simplification of the are associated with a number of problems as mentioned functionality of both the present day ribosome and mRNA earlier. In addition, extant RNA/DNA synthesis proceeds via (fig. 1). Initially, the p-Rib need only have been a (close to) linear terminal elongation (Paul and Joyce 2004; Vidonne and Philp assembly template for the p-tRNAs to facilitate the peptidyl 2009). To be consistent with the mechanism present in LUCA transferase reaction through an increase in local concentra- and pre-LUCA, the p-Pol should, preferably, have used a sim- tion. This mechanism is simple enough to emerge spontane- ilar process. ously and matches exactly the fundamental action of the During templated ligation, a parent molecule binds and extant ribosome (fig. 2). The idea that a p-Rib may have an ligates short substrates that must then dissociate to allow internal template rather than separate mRNA molecules and further access, but the product has greater binding affinity that an RNA strand could act as a way to bring charged tRNAs than the substrates and dissociation is slow. This product together has previously been suggested (Schimmel and inhibition results in parabolic growth and limits the usefulness Henderson 1994; and Koonin 2007; Morgens 2013)and of templated ligation for replication (Issac and Chmielewski isknownasan“entropytrap”(Sievers et al. 2004; Ruiz-Mirazo 2002). Conversely, in 1D sliding (or more accurately jumping), et al. 2014). The concept has been demonstrated to be exper- the catalyst may dock anywhere along a linear substrate and imentally viable (Tamura and Schimmel 2003) although in the then diffuse by “hops” randomly in either direction until it latter case it is the primordial ribosomal rRNA strand itself that reaches the reaction site; a successful ligation reaction has provides one of the two reacting amino acids. little impact on binding affinity and leaves the catalyst prox- A functional operational system requires preferential charg- imal to the next site. For simplicity our model assumes a ing of particular p-tRNAs to specific amino acids. Although single binding event between p-Pol and p-Rib followed by there is evidence for such relationships in the stereochemical multiple polymerization events. A p-Pol proceeding via 1D theory (Woese 1965;Yarus et al. 2009),so farunequivocal proof sliding could catalyze phosphodiester bond formation be- has been elusive (Yarus et al. 2005; Koonin and Novozhilov tween nucleotides bound by Watson and Crick base-pairing 2009). However, there is sufficient evidence to suggest at least to a complementary XNA strand. Because p-Pol activity a separation along grounds of hydrophobicity and charge would be independent of substrate length, a relatively small using just a two-base codon (Knight and Landweber 2000; catalyst could have acted on XNAs of considerable size. From

Downloaded from406 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

inspection of present day polymerases such a peptide may 1; ...; n. In our models, the polymer chains are polypeptides have included sequences such as DxDGD and/or GDD known and polynucleotides, and the building blocks are amino acids to be conserved in their active sites and consisting of the and codons respectively. With a slight abuse of language, we amino acids thought to be amongst the very earliest in life call the number of constituents (building blocks) of a polymer (Iyer et al. 2003; Koonin 1991). chain its length. So hereafter, “lengths” are dimensionless. The In our simple system any such p-Pol must be very short to order in which these constituents appear in any chain is have any realistic chance of being produced by the primitive biologically significant, and we encode this information in components described. We must therefore ask if there is ev- finite ordered sequences of arbitrary length L denoted idence that small (e.g. <11 amino acid) peptides can have SfLg¼ðs1; s2; ...; sLÞ, whose elements sj; j ¼ 1; ...L label such a catalytic activity. Catalytic activity in general has been the building blocks forming the chains, in the order indicated demonstrated for molecules as small as dipeptides (Kochavi in the sequences. Each element sj in the sequence SfLg is an et al. 1997). For polymerase activity in particular, it is known integer in the set f1; ...; ng which refers to the type of that randomly produced tripeptides can bind tightly and building block occupying position j in the chain. There are specifically to nucleotides (Schneider et al. 2000; McCleskey therefore nL sequences of length L if the model allows n types et al. 2003). We suggest that a small peptide could arise with of building blocks. For instance, the sequence Sf5g¼ the ability to bind divalent metal ions, p-Rib and incoming ð1; 4; 3; 1; 3Þ in a model with, say, n ¼ 4 types of building nucleotides. It is interesting to note that small peptides can blocks (amino acids or codons), corresponds to a polymer assemble into large and complex structures (Bromley et al. chain of length 5 whose first component is a type 1 building 2008; Fletcher et al. 2013) with potentially sophisticated func- block, the second component isatype4andsoon.Givena tionality: di- and tripeptides can self-assemble into larger sequence SfLg, we introduce subsequences SfL; jg¼ nanotubes and intriguingly it has even been suggested that d ðs1; s2; ...sjÞ (resp. SfL; jg¼ðsLjþ1; sLjþ2; ...sLÞ), these structures could have acted as primitive RNA polymer- j ¼ 1; ...L, whose elements are the j leftmost (resp. right- ases (Carny and Gazit 2005). d In summary, the essence of the model is that on geological most) elements of SfLg.Inparticular,SfL; LgSfL; Lg d timescales, short linear polynucleotides may have been suffi- SfLg; SfL; 1g¼s1 and SfL; 1g¼sL.Wewrite cient to template similar base-pairing interactions to those d seen in the modern ribosome with small amino-acylated SfLg¼ðSfL; L ‘g; SfL;‘gÞ; 0 <‘‘, the family of polymer chains obtained large subset of the available nucleic acid substrates, in turn from a chain of maximal length Lmax and sequence SfLmaxg producing more polynucleotide templates and resulting in an is given by PS . f ‘g‘¼1;2;...Lmax autocatalytic system. InthespecificcaseofXNA/polynucleotidechainsentering Mathematical Model our model, we use P ¼ R and the sequences are generically labeled as af‘g. Their elements correspond to types of The IDA described above is attractive both for its simplicity codons, and the complementary codon sequences in the and continuity with the existing mixed (protein/nucleic acid) sense of nucleic acids complementarity are af‘g. Therefore, replicator system in extant cells. However, the question a large class of XNA strands of length ‘ and sequence af‘g remains as to whether such a system is mathematically con- a a1 are denoted by R‘ , and in particular, R1 is a codon of type a1. sistent, could avoid collapse and instead become self- Besides the generic sequences af‘g introduced above, a se- sustaining. The number of parameters and variables needed quence denoted pfLmaxg, together with its subsequences to analyze the system in its full complexity is such that one is d led to consider simplified models which nevertheless capture pfLmax;‘g and pfLmax;‘g for ‘ ¼ 1; ...Lmax play a specific essential features of interest. Here we consider a simple model role: they correspond to polynucleotide chains that template of RNA–protein self-replication. the polymerization of a family of primordial peptide polymer- ases (p-Pol) through a process described in the next subsec- Constituents tion, see also figure 3.UsingP ¼ P to denote polypeptide The main constituents of the simplest model of XNA-protein chains, this family of polymerases derived from PLmax of max- imal length L ,isfPpg . These polymerases are such self-replication considered here (see also figs. 1b and 2)area max ‘ ‘¼2;...;Lmax p p p‘ p‘ pool of free nucleotides and amino acids, polypeptide that P‘ ¼ P‘1 þ P1 ,withP1 an amino acid p‘.Weusethe chains—including a family of polymerases—and polynucleo- notation Pp for a generic polymerase in the family. Alongside tide chains as well as p-tRNAs loaded with single amino acids. these polymerases, generic polypeptide chains of length ‘ and a a1 We introduce some notations. Generically, we consider sequence af‘g are labeled as P‘ .Proteinsoflength1,P1 ,are polymer chains P made of ntypesof building blocks labeled single amino acids of type a1.

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 407 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

AB AB

CD CD

EF

FIG.3.Mechanism (B): Polypeptide polymerization in our model. The EF square boxes represent the codons of a polynucleotide chain (here, of length L ¼ 4) and the circles represent amino acids. The p-tRNA FIG.4.First phase of Mechanism (C): Polymerization of the complemen- S p molecules are labeled T1; ...; T4. tary polynucleotide chain RL catalyzed by a primordial polymerase P . • Pp are involved, through Mechanism (C), in the duplica- RNA–Protein Replication Scenario tion of polynucleotides present in the environment, in- The scenario relies on three types of mechanisms: cluding the strands of XNA that participate in the very (A) The spontaneous polymerization of polynucleotide and production of Pp polypeptide chains, assumed to occur at a very slow rate, and their depolymerization through being cleaved in two Reactions Driving the Replication and Physical anywhere along the chainsata rateindependentofwhere Parameters the cut occurs. For simplicity, we consider the polymerization of polypeptide (B) The nonspontaneous polypeptide polymerization chains and the duplication of polynucleotide chains as single S reactions where the reaction rates take into account all sub- occurring through a polynucleotide chain RL on which several p-tRNA molecules loaded with an processes as well as failure rates. amino acid dock and progressively build the poly- This leads to the following schematic reactions: peptide chain. More precisely, each codon of type s Mechanism ðAÞ of the polynucleotide chain binds with a p-tRNA,

itself linked to an amino acid of type . Note that sL 1 s RS þ R þ ! RS ; (1) we assume the same number n of types of codons L 1 Lþ1 and amino acids. This leads to a chain of amino acids ^ RS ! RS þ RS ‘ ¼ 1; ...; L 1; (2) matching the codon sequence SfLg of the polynu- L L‘ ‘ cleotide chain. The process is illustrated in figure 3 S sLþ1 S for a polypeptide chain of length L¼ 4andaminoacid PL þ P1 ! PLþ1; (3) sequence Sf4g¼ðs1; s2; s3; s4Þ. ^ (C) The duplication of a polynucleotide chain RS,oflength PS ! PS þ PS ‘ ¼ 1; ...; L 1: (4) L L L‘ ‘ L ‘pmin, as a two-step process. In the first step, a polypeptide polymerase Pp, obtained by polymerization p Mechanism ðBÞ via mechanism (B) using a polynucleotide RL ,scansthe S polynucleotide chain RL to generate its complementary S S S S RL þ L TRP ! RL þ PL: (5) polynucleotide chain RL.Thisisshowninfigure 4.The S resulting polynucleotide chain RL is then used to generate S Mechanism ðCÞ a copy of the original polynucleotide chain R via the same p L S P S S mechanism (C). RL þ L R1 ! RL þ RL; (6) The replicator crudely operates as follows: where TRP denotes p-tRNA loaded with a single amino acid. The parameters for these reactions are (see the • Mechanism (A) provides a small pool of polymer chains; Supplementary Material online for more details on the esti- among them, one finds short strands of XNA with dual mation of the parameter values): function (p-mRNA and p-Rib) • þ Mechanism (B) provides polypeptide chains, including the • KR : polymerization rate of polynucleotide chains (eq. 1); polymerases (p-Pol, called Pp here), by using the XNA we have estimated the catalyzed XNA polymerization produced through Mechanism (A) and Mechanism (C) rate to be 4:2 107 mol1 m3 s1.

Downloaded from408 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

• KR : depolymerization rate of polynucleotide chains (hy- up to Lmax. The first step in polymerization requires a polymer- drolysis) (eq. 2); taken to be 8 109s1. ase to attach itself to the template polynucleotide. This is only þ • KP : polymerization rate of polypeptide chains (eq. 3); we possible if the template polynucleotide has a minimum length, 21 1 3 1 have estimated it to be 2:8 10 mol m s . which we assume to be ‘pmin.Inthefollowing,weassumethat • KP;S;L: depolymerization rate of polypeptide chains of polymerases can polymerize polynucleotide chains of any length L and sequence S (eq. 4); we have estimated it length greater or equal to ‘pmin. The corresponding reaction 11 1 6 1 p to be in the range 4 10 s 5:1 10 s . rate is given by ZP‘ for a polymerase of length ‘ ‘pmin. þ • kP;L: polymerization rate of a polypeptide of length L from The free nucleotides must then attach themselves to the the corresponding polynucleotide chain (eq. 5). It is rea- polynucleotide–polymerase complex and the polymerase þ þ sonable to assume that kP;L ¼ kP;1=L and we have esti- must move one step along the polynucleotide. The rate for þ 1 3 1 mated kP;1 to be 0:1mol m s . each of these L steps is • Z: the rate at which a polymerase attaches to a polynu- kstep hRR1 cleotide chain (eq. 6) which we have estimated to be k ¼ ; (9) Rþ k þ h R 106 mol1 m3 s1. step R 1 • hR: the rate of attachment of a free polynucleotide to a and hence, the rate of polymerization for a polynucleotide of ~ p polynucleotide chain attached to a p-Pol (eq. 6). We have length L and polymerase of length ‘ is KðZP ; kRþ; LÞ. 6 1 3 1 ‘ estimated it to be 10 mol m s . However, it is assumed that polymerases of several lengths • kstep: the rate at which a polymerase moves by one step are available and therefore, the total rate is given by on the polynucleotide (eq. 6). We have estimated it to be 8 in the range 2 102 s1–4 105 s1. > XLmax < ~ p KðZP‘ ; kRþ; LÞ W‘; L ‘pmin We now argue that the three parameters Z; h and k KðLÞ¼ ‘¼‘ (10) R step > pmin enter the dynamical system for the polymer concentrations : 0 L <‘ ; in our model as two physical combinations denoted KðLÞ pmin and Pb that we describe below. where it is understood that ‘pmin is the lower bound length First recall that we assume the existence of a pool of for polymerase activity and W‘ is a quality factor given by nucleotides, amino acids and p-tRNA. The amount of free 8 < ‘ ‘pmin þ 1 nucleotides and amino acids is taken to be the difference ‘ ‘ ‘ ‘ ‘ þ 1 pmin pmax between the total amount of these molecules and the total W‘ ¼ : pmax pmin (11) amount of the corresponding polymerized material, ensuring 1 ‘pmax <‘ Lmax: total conservation. We denote the concentration of polypeptide and polynu- Indeed, we expect long polymerases to be more efficient, a p p a p p so W is taken to increase with ‘ in the range cleotide chains respectively by PL ; PL ; PL and RL ; RL ; RL ,all ‘ 3 3 expressed in mol m mol m .Inparticular,P1 and R1 are ‘pmin ‘ ‘pmax, while polymerases of length ‘>‘pmax the concentrations of each type of free amino acids and have the same level of activity as those with length ‘ ¼ ‘pmax,

nucleotides respectively, and we assume, for simplicity, that that is, W‘>‘max ¼ 1. all types of amino acids/codons are equally available. To avoid proliferation of parameters in our simulations, we We also assume that the amount of loaded p-tRNA, have taken ‘pmax ¼ Lmax,whereLmax is the maximal polynu- CptRNA, remains proportional to the amount of free amino cleotide chain’s length. acids and that the concentration of p-tRNA is larger than P1 so that most amino acids are loaded on a p-tRNA. With these Binding Probability Pb of a Polynucleotide and a Polymerase conventions, one has of Length L FirstnotethatittakesL times longer to synthesize a polypep- CptRNA ¼ ktP1 with kt 1: (7) tide chain of length L from its corresponding polynucleotide chain than it takes for one amino acid to bind itself to the Total Reaction Rate KðLÞ of Polynucleotide Polymerization polynucleotide. The rate is thus given by kþ P ¼ðkþ =LÞ P . If a complex reaction is the result of one event at rate K,and P;L 1 P;1 1 We now offer some considerations on depolymerization. m other, identical, events at rate k, the average time to com- We assume that if a polymer PS depolymerizes, it does so by plete the reaction is the sum of the average times for each L (potentially consecutive) cleavings. In the first step, PS can event. Hence the reaction rate is given by L cleave in L – 1 different positions, resulting in two smaller 1 1 m Kk chains L1, L2 with L ¼ L1 þ L2 and 1 L1;2 L 1. This is Kð~ K; k; mÞ¼ þ ¼ : (8) K k mK þ k the origin of the factor ðL 1Þ in the terms describing the depolymerization of polymer chains in the dynamical systems One such complex reaction in our model is the equations presented in the next subsection. polymerization of a polynucleotide chain of length L,say, The concentration variations resulting from such from its complementary chain (second phase of depolymerizations must be carefully evaluated. A polymer S Mechanism (C)). Polymerases are characterized by the PL of length L and sequence S,whereS stands for any of ~S polymerizing efficiency which, we assume, increases with ‘, a, p or p, can be obtained by cleaving a polymer P‘ of length

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 409 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

‘>L and sequence ~S ¼ðS; TÞ where T isasequenceof that a polymerase of length L binds to a polynucleotide of ~ length ‘ L. Similarly it can be obtained by cleaving PS0 of length M is therefore given by 0 ‘ sequence ~S ¼ðT0; SÞ where T0 is also of length ‘ L.Ifthe ~ kb;M rate of cleaving, K, is assumed to be independent of the Pb;M ¼ P : (15) P Lmax k polymer length, and since there are n‘L different sequences m¼2 b;m 0 T and T ,wheren is the number of amino acid or codon types, The total time the polymerase remains bound to a poly- the rate of concentration variation of polymers of length L nucleotide of length M is estimated to be M=kRþ. Therefore resulting from the depolymerization of longer polymers is the probability Pb for a polymerase to be bound is given by L L the average binding time divided by the sum of the average Xmax Xmax 0 ‘L ~S ‘L ~S binding time and the average time needed to bind: n KP P‘ þ n KP P‘ : (12) ‘¼Lþ1 ‘¼Lþ1 P Lmax ~ M¼2ðM=kRþÞPb;M Pb ¼ P P : (16) Recall that we use the same notation for the concentration Lmax ~ Lmax ððM=kRþÞPb;MÞþ1= kb;m of a polymer of sequence S and length L and the polymer M¼2 m¼2 S itself, namely PL,andP is supposed to be set to P ¼ P or As a result the polymerase depolymerization rate will be P ¼ R in our model. As already stressed, we assume poly- KP;a;L ¼ KP ; mers have at most length Lmax. Finally, when the concentra- ~ ~0 tions PS and PS are equal, (eq. 12) can be rewritten as L L KP;p;L ¼ KP ; (17)

XLmax ‘L ~S KP;p;L ¼ KP ð1 PbFpðLÞÞ: 2 n KP P‘: (13) ‘¼Lþ1 p Equations The depolymerization of polymerase PL requires special p For any chain of length ‘, our model considers the concen- treatment. When PL depolymerizes, it generates a polymerase p p trations of polynucleotides and polypeptides corresponding to P‘ with ‘ ‘ ‘pmin þ 1 equal to the number of amino acid types. p 1 k ‘ ‘ dRL þ p þ p Fpð‘Þ¼ e pmin ; (14) ¼ K R R nK R R :> dt R 1 L1 R 1 L ‘<‘ 0 pmin XLmax p ‘L a with k > 0 a parameter controlling how much of the poly- þ ½KR R‘ þð2n 1ÞKR R‘ ‘¼Lþ1 merase is stabilized. The term ð‘ ‘pmin þ 1Þ=k can be p p interpreted as a Boltzmann factor with a free energy ðL 1ÞKR RL þKðLÞRL ; expressed in units of kBT. The hydrogen bond binding energy between RNA and a polypeptide is 16 kJ/mol [Dixitetal. dRp L ¼ KþR Rp nKþR Rp 2000], so assuming that the number of such hydrogen bonds dt R 1 L1 R 1 L between the polymerase and the polynucleotide is XLmax p ‘L a ‘ ‘pmin þ 1, one has k 0:15. þ ½K R þð2n 1ÞK R a R ‘ R ‘ The binding rate of a polymerase to a polynucleotide RM of ‘¼Lþ1 a M M length M and sequence a is kb;M ¼ ZRM n where n is the ðL 1ÞKRp þKðLÞRp; total number of polynucleotides of length M. The probability R L L

Downloaded from410 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

a XLmax polymerase of length ‘ compared with an arbitrary protein of dRL þ a þ a ‘L a ¼ K R1R nK R1R þ 2 n K R ‘ dt R L1 R L R ‘ length . Unit ratios indicate that the polymerase has not ‘¼Lþ1 been selected at all, whereas large values of Q or Q on a a 1 2;‘ ðL 1ÞKR RL þKðLÞRL ; the other hand indicate a good selection of the polymerase. dPp XLmax The complexity of the system (eq. 18) also lies in the num- L ¼ KþP Pp nKþP Pp þ ½Kð1 P F ðLÞÞPp ber of free parameters it involves. A systematic analysis of the dt P 1 L1 P 1 L P b p ‘ ‘¼Lþ1 high-dimensional parameter space is beyond the scope of this ‘L a p article, and we therefore concentrate on the analysis and þð2n 1ÞK P ðL 1ÞK ð1 Pb FpðLÞP P ‘ P L description of results for a selection of parameter values kþ P Rp; þ P;L 1 L that highlight potentially interesting behaviors of our model. p Recall that our model assumes that the number n of dif- dPL þ p þ p ¼ KP P1PL1 nKP P1PL ferent amino acids is equal to the number of codon types, and dt throughout our numerical work we have set n ¼ 4. Note that XLmax p ‘L a the word “codon” here is used by extension. Indeed, there are þ ½KP P‘ þð2n 1ÞKP P‘ four different nucleic acids in our model and the “biological” ‘¼Lþ1 codons are made of two nucleic acids, bringing their number p þ p ðL 1ÞKP PL þ kP;LP1RL ; to sixteen. However, they split into four groups of four, each of which encoding one of the four amino acids. From a mathe- dPa L ¼KþP Pa nKþP Pa matical modeling point of view, this is completely equivalent. dt P 1 L1 P 1 L It is well accepted that early proteins were produced using a XLmax reduced set of amino acids (Angyan et al. 2014). The exact ‘L a a þ a þ2 n KP P‘ ðL1ÞKP PL þkP;LP1RL : (18) identity and number is unclear though experimental work has ‘¼Lþ1 shown that protein domains can be made using predomi- nantly five amino acids (Riddle et al. 1997) whereas the helices Alongside the seven physical parameters of a four-alpha helix bundle were made using only four amino 6; 6 ; þ ; ; appearing in the differential fKR KP hP;L KðLÞ Pbg acids (Regan and DeGrado 1988). We have used mostly equations above, we need to consider two parameters yield- ‘pmin ¼ 7and‘pmax ¼ Lmax ¼ 10, but have investigated ing the “initial” concentrations of amino acid and nucleotide other values as well (see the Supplementary Material online). inside the system, namely q P1ðt ¼ 0Þ and q p r While these figures are somewhat arbitrary, an ‘pmin of 7 was R1ðt ¼ 0Þ.Intheabsenceofactualdataforthesequantities, chosen on the assumption that the functional p-Pol would have we explore a range of realistic values in the analysis of our some forms of stable structural motif and this number corre- model. The concentration of free amino acids and nucleoti- sponds to the typical minimumnumber of aminoacids required desatanyonetimeisthengivenbyP toproduceastable,foldedalphahelixstructure(Manning et al. q Lmax L 2 a p p and P1ðtÞ¼ p PL¼2½ðn ÞPL ðtÞþPL ðtÞþPL ðtÞ 1988).ThechoiceofLmax ¼ 10 isbasedonthefactthatwhilethe Lmax L a p p polymer peptide chains could be significantly longer, they would R1ðtÞ¼qr L¼2½ðn 2ÞRL ðtÞþRL ðtÞþRL ðtÞ re- spectively, with PSð0Þ¼RSð0Þ¼0 for any value of L in the need correspondingly long polynucleotide sequences to encode L L them, which becomes increasingly unlikely as lengths increase. range 2 L Lmax and sequence S ¼ a; p; p. Furthermore, we expected polymers of length 10 to have very Results low concentrations, a hypothesis confirmed by our simulations. We have nevertheless investigated larger values of Lmax as well, The system of equations (eq. 18)isnonlinearandtoocom- and found little difference, as outlined below. plex to solve analytically. We therefore analyze it numerically, In a first step, guided by data on parameter values gleaned starting from a system made entirely of free nucleotides, from the literature and gathered in the Supplementary amino acids, as well as charged p-tRNA, and letting the sys- Material online, we set tem evolve until it settles into a steady configuration. þ 7 1 3 1 Themainquantitiesofinterestaretherelativeconcentrations KR ¼ 4:2 10 mol m s ; ofthepolymerase(qp)andoftheapeptidechains(qa).Wehave 9 1 KR ¼ 8 10 s ; XLmax XLmax þ 21 1 3 1 p a KP ¼ 2:8 10 mol m s ; qp ¼ P‘ and qa ¼ P‘ ; (19) ‘¼‘pmin ‘¼‘pmin 11 1 KP ¼ 4 10 s and evaluate the ratios þ 1 3 1 kP;1 ¼ 0:1mol m s ; (21) p qp P‘ 6 1 3 1 hR ¼ 10 mol m s ; Q1 ¼ and Q2;‘ ¼ a ; (20) qa P‘ Z ¼ 106 mol1 m3 s1; while monitoring the evolution of each quantity over time. Q1 corresponds to the relative amount of polymerase of any k ¼ 0:15; length compared with other proteins (for a specific arbitrary 5 1 kstep ¼ 4 10 s : sequence a), while Q2;‘ corresponds to the relative amount of

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 411 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

We let the system evolve under a variety of initial concen- Many of the parameters we have used were estimated or trations of free amino acids and nucleotides, qp and qr,inthe measured in conditions which, in all likelihood, were not range 105 0:1molm3, and with all polymer concentra- identical to the ones existing when the polymerization we tions set to 0. We monitored the concentration of all poly- aremodelingoccurred.Inasecond step, we departed from mers, in particular the concentration of polymerase qp and its the set of values (eq. 21) and found that in all cases investi- ratio to the concentration of a polypeptide chains, Q1.In gated, varying these parameters modified the critical concen- most cases we found that the nucleotides polymerized spon- trations of qr;c and qp;c, but did not affect significantly the taneously (Mechanism (A)) in small amount and this led, value of Q1 while Q2;10 remained extremely large. 6 1 indirectly, to the polymerization of the polypeptides, includ- More specifically, taking KP ¼ 5:1 10 s marginally ing the polymerases (Mechanism (C)). The polymerases then increased the critical concentration to qr;c ¼ qp;c ¼ 3 1 induced further polymerization of the polynucleotides 0:0011 mol m . Similarly, taking kstep ¼ 0:02s increased (Mechanism (B)) and the system slowly equilibrated. slightly the critical concentrations: qr;c ¼ qp;c ¼ The end result was an excess of polymerase of all lengths 0:0017 mol m3. On the other hand, taking 8 1 3 1 compared with a polypeptide chains with Q1 ¼ 786 for all Z ¼ 10 mol m s lead to a decrease of the critical con- 3 3 initial concentrations q ¼ q 0:001 mol m (fig. 5). centrations: qr;c ¼ qp;c ¼ 0:0005 mol m .VaryinghR to p r 1 Moreover the total amount of polymerase reached, for initial values as small as 1 mol m3 s1 did not change the critical concentration of free amino acids qp, was a concentration of concentrations. 4 In our model, we have considered the concentrations of 4 10 qp (as illustrated by the bottom two rows in table 1). The concentration of polymerase of length 10, on the free amino acids (qp P1) and charged p-tRNA to be iden- p 14 3 tical: kt 1(seeeq. 7). To consider other values of k we only other hand, was very small P10 ¼ 6:3 10 mol m for t 18 need to multiply the polymerization rate of a peptide ( þ ) but Q2;10 ¼ 5:9 10 was very large, effectively showing kP;1 that the only polypeptide chain of length L ¼ 10 was by kt as it is p-tRNAs that bind to XNA chains, not free amino max þ the polymerase. acids. We have considered a large range of values for kP;1 and þ 5 1 3 1 We found hardly any polymerization of the polymerase when found that for kP;1 ¼ 10 mol m s , the critical concen- q ¼ q ¼ 0:0009 mol m3,withq 1:41014 mol m3 trations had not changed significantly while for p r p 8 1 3 1 and Q ¼ 12:4, whereas with q ¼ q ¼ 0:001molm3,we 10 mol m s ,theyincreasedtoqr;c ¼ qp;c ¼ 1 p r 3 7 3 0:002 mol m . This shows that taking much smaller values obtained qp 3:910 mol m and Q1 ¼ 786 (fig. 5a). This highlights a very sharp transition at a critical concentration of kt has a very small impact on our results and that having a concentration of charged p-tRNA much smaller than that of qp;c above which polymerases are generated. We summarize the data in table 1. free amino acids would only increase marginally the critical We then fixed the initial concentration q to four different concentrations we have obtained using our original p assumption. values and varied qr to identify the critical initial concentra- tion of nucleotides necessary for the production of polymer- The parameters on which the model is the most sensitive are þ þ 8 1 3 1 ases. The results in table 2 show that the critical KR and KR .WefoundthatforKR ¼ 4 10 mol m s , q ¼ q ¼ 0:007mol m3 and for Kþ ¼ 4109 mol1 concentration qr;c is nearly constant and of the order of r;c p;c R 3 3 3 1 3 10 mol m for a very wide range of amino acid initial m s , qr;c ¼ qp;c ¼ 0:05mol m . Similarly, for 7 1 3 concentrations. KR ¼ 10 s we found that qr;c ¼ qp;c ¼ 0:01mol m and

A B

3 FIG.5.(a) Time evolution of the polymerase for initial concentration qr ¼ qp ¼ 0:001, 0.01, and 0:1 mol m .(b) Q1 for initial concentration 3 11 1 6 1 3 1 qr ¼ qr ¼ 0:01 molm . Parameter values: KP ¼ 4 10 s ; Z ¼ 10 mol m s ; k ¼ 0:15.

Downloaded from412 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

Table 1. Effect of Initial Concentrations on Polymerase Production.

3 3 3 qp ðmol m Þ 1. qr ðmol m Þ 2. qp ðmol m Þ 3. Q1 4. Polymerase Production 2 10–4 2 10–4 2.8 10–19 1.0008 Insignificant 9 10–4 9 10–4 1.410–14 12.4 Insignificant 10–3 10–3 3.9 10–7 786 Yes 10–1 10–1 3.9 10–5 786 Yes

Table 2. Effect of Initial Peptide Concentration on Critical Table 4. Effect of Various Parameters on Initial Critical Concentration. Concentrations.

3 3 3 qp ðmol m Þ qr;c ðmol m Þ Modified Parameter qr;c ¼ qp;c ðmol m Þ –4 –3 6 1 –3 10 2 10 KP ¼ 5:1 10 s 1.1 10 –3 –3 2 1 –3 10 10 kstep ¼ 2 10 s 1.7 10 10–2 8 10–4 Z ¼ 108 mol1 m3s1 5 10–4 –1 –4 1 3 1 –3 10 7 10 hR ¼ 1 mol m s 10 þ 5 1 3 1 –3 kP;1 ¼ 10 mol m s 10 þ 8 1 3 1 –3 kP;1 ¼ 10 mol m s 2 10 –3 Lmax ¼ 15 1.1 10 –4 Table 3. Effect of Initial Concentration of Free Nucleotides on Time Lpmin ¼ 64 10 –4 for Production of Polymerase. Lpmin ¼ 52 10 –5 Lpmin ¼ 42 10 3 þ 8 1 3 1 –3 qr ¼ qp ðmol m Þ spol ðyearsÞ KR ¼ 4 10 mol m s 7 10 Kþ 4 109 mol1 m3 s1 5 10–2 0.001 18,000 R ¼ K 107 s1 10–2 0.002 254 R ¼ K 106 s1 0.19 0.005 12.7 R ¼ 0.01 2.2 We have also considered values of Lmax > 10 and found that the main difference is a slight increase of the critical 6 1 3 for KR ¼ 10 s that qr;c ¼ qp;c 0:05mol m .This concentrations. For example, for Lmax ¼ 11; 12 and 15, shows that the spontaneous polymerization of polynu- qr;c ¼ qp;c are respectively equal to 0:001; 0:0011, and 3 cleotide is essential to reach a minimum concentration of 0:0011 mol m . At given concentrations Q1 and qp remain unchanged but Pp deceases approximately by a factor of 40 polynucleotides to kick start the whole catalysis process Lmax and that the stability of the polynucleotides plays an each time Lmax is increased by 1 unit. important role. We have also taken Lpmin ¼ 4; 5,and6andfoundthatthe To investigated this, we have run simulations with critical concentrations were respectively 2 105; 2 104, þ 8 1 3 1 and 4 104 mol m3, whereas q took the values of KR ¼ 4 10 mol m s for a fixed duration, spol,after p þ 0.012, 2:6 103,and3 104 mol m3. Q on the other which KR was set to 0. We found that if spol was long enough, 1 the polymerization of polypeptide and polynucleotide chains hand remained constant. þ A summary of the parameter values investigated outside was identical to the one obtained whereas KR was not mod- ified. When spol was too short, on the other hand, one was the set (eq. 21) and the corresponding critical concentrations only left with short polypeptide and polynucleotide chains in are given in table 4. Only one parameter was changed at a an equilibrium controlled by the spontaneous polymerization time (see the Supplementary Material online). and depolymerization parameters. The minimum value for Discussion spol depends on the concentrations qr and qp and the results are given in table 3. We describe a theoretical nucleopeptidic reciprocal replicator þ This shows that while KR is an important parameter in the comprisingapolynucleotidethattemplatestheassemblyof process, what matters are to have a spontaneous generation small p-tRNA adapter molecules, most likely having mixed of polynucleotides at the onset (Mechanism (A)). This then backbone architectures. These spontaneously arising p-tRNAs leads to the production of polypeptides, including polymer- would have been bound to various classes of amino acids ase (Mechanism (C)) and, once the concentration of poly- (possibly via weak stereochemical specificity), and a simple merase is large enough, the catalyzed production of increase in local concentration mediated by binding to the p- polynucleotides (Mechanism (B)) dominates the spontane- Rib (in its most primitive version nothing much more than a ous polymerization. mixed backbone architecture p-mRNA) could have driven We have also varied KR once the system had settled and polypeptide polymerization. Once a template arose that 3 we found that for qr ¼ qp ¼ 0:01 mol m ; KR could be coded for a peptide able to catalyze phosphodiester bond increased up to 6 107s1 while still keeping a large formation, this p-Rib could have templated assembly of its amount of polymerase. Above that value, the polynucleotides own complementary strand (and vice versa) and the self- are too unstable and one ends up again with mostly short replication cycle would have been complete (see fig. 6 for a polymer chains and Q1 1. summary).

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 413 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

nucleotides. Although the concentrations of various compo- nents are not known with certainty this does not seem an unreasonable proposition particularly given that functional peptides are known to occur in random sequences with sur- prising frequency (Keefe and Szostak 2001). Our mathematical model showed that the most impor- tant parameters, apart from the concentration of loaded p- tRNA and polynucleotides, are the spontaneous polymerization and depolymerization of polynucleotides. It also shows that polynucleotides are first polymerized spon- taneously and that these initial polynucleotides catalyze the production of the first polypeptides, including the polymer- ase. These polymerases can then generate further polynucleo- tides through catalysis. The stability gained by polymerases

FIG.6.The nucleopeptide Initial Darwinian Ancestor. In this cartoon while being bound to polynucleotides ultimately leads to an model, a short strand of XNA has the functionality of both a primor- increase of their relative concentration compared with the dial p-mRNA and a p-Rib. Primitive XNA molecules loaded with other polypeptides. amino acids (p-tRNA) bind to the p-Rib via codon–anticodon pairing. Overall, the hypothesis explains the coupling of polynucle- This allows adjacent amino acids to undergo peptide bond formation otide and polypeptide polymerization, the operational code and a short peptide chain is produced. A certain peptide sequence is and in the p-Pol sequence that could eventually able to act as a primordial XNA-dependent XNA polymerase (p-Pol) result in increased specificities leading to primitive DNA pol- able to copy both þ and – p-Rib strands to eventually produce a copy ymerases and RNA polymerases. No extraordinary exchanges of the p-Rib(þ) strand. of function are required and each molecule is functionally similar to its present-day analogue. Like all new abiogenesis Starting from a single peptide and single polynucleotide, theories, this IDA requires in vitro confirmation; in particular, the IDA would quickly have become a distribution of related the steps required for the primordial operational code to arise sequences of peptides and XNAs. We can imagine that over ab initio warrant close attention. time, different p-Ribs encoding different peptides with addi- The idea that the ancestral replicator may have consisted tional functionalities could have appeared as the system of both nucleic acid and peptide components (the evolved and that these p-Ribs may have subsequently fused “nucleopeptide world”) is in itself not new, but compared together into larger molecules. with the RNA world, has been somewhat neglected. We argue By imagining the IDA swiftly becoming a pool of molecules that molecular co-evolution of polynucleotides and peptides where variety within the “species” is maintained by the poor seems likely and cross-catalysis is known to be possible, for copying fidelity of a statistical operational code, should any example in vitro selection experiments delivered RNA with that stops replication arise, the other molecules in peptidyl transferase activity after just nine rounds of a single the pool would still function, ensuring continuity of the selection experiments (Zhang and Cech 1997; Fishkis 2011). whole. Indeed this could have provided a selective pressure Inversely, Levy and Ellington produced a 17-residue peptide for superior replicators. While our model does not directly that ligates a 35 base RNA (Levy and Ellington 2003). consider less than perfect copying fidelity, it is not expected to Nucleopeptide world research is relatively sparse, the data have a major effect on our conclusions as copies with de- collected so far hint that cross-catalysis may be more efficient creased performance would not be maintained as a significant than autocatalysis by either peptides or nucleic acids. A self- proportion of the population and copies with increased per- replicating primordial system wherein RNA encoding for pro- formance would simply take over the role of main replicator. tein was replicated by a primordial RNA-dependent RNA The primordial operational code may only have required polymerase which carried out the role of a replicative agent two bases per p-tRNA to deliver statistical proteins, while the rather than as a transcriber of genes has previously been catalytic requirements of the p-Pol are loose enough that a suggested (Leipe et al. 1999), although in this case no further seven-residue peptide is a plausible lower length limit. This development of the concept to produce a self-contained rep- reduces the minimum length of the posited spontaneously licating system was pursued. The merits of a “two polymer- arising p-Rib to just 14 nucleotides (assuming no spaces be- ase” system where RNA catalysespeptidepolymerizationand tween codons). This is an optimistic length estimate, but viceversaweresuccinctlyexplainedbyKunin (2000),al- given the available time and with molecular co-evolution, though possible mechanisms and validity were not consid- inorganic catalysts and geological PCR, considerably longer ered in detail. The possibility of a two polymerase system is molecules may have been possible (Baaske et al. 2007; also mentioned by van der Gulik and Speijer as part of a wider Fishkis 2011). These p-Pols would act on p-Ribs and the cru- review of the co-evolution of peptides and RNA (van der cial abiogenesis step would be the emergence of a 14-mer Gulik and Speijer 2015) but without a mathematical model. XNA that, in the context of the primordial operational code, Other origins of life hypotheses propose that the initial self- happened to code for a peptide able to bind XNA and - replicator did not consist of polynucleotides and/or peptides alyze phosphodiester bond formation of base-paired but was originally composed of different materials, most

Downloaded from414 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Ancestral Darwinian Self-Replicator . doi:10.1093/molbev/msx292 MBE

famously clay crystals (Cairns-Smith 1982). Such hypotheses BiroJC,BenyoB,SansomC,SzlaveczA,FordosG,MicsikT,BenyoZ. are of interest but were not considered in this work as the IDA 2003. A common periodic table of codons and amino acids. Biochem presented here does not require genetic takeover of one rep- Biophys Res Commun. 306(2):408–415. Bromley EHC, Channon K, Moutevelis E, Woolfson DN. 2008. Peptide lication system by another and can be achieved using building and protein building blocks for : from program- blocks likely to have been present on the early earth and so ming biomolecules to self-organized biomolecular systems. ACS appears more parsimonious. Our IDA hypothesis has tried to Chem Biol. 3(1):38–50. set out more rigorously the possible steps and processes Cairns-Smith AG. 1982. Genetic takeover and the mineral origins of life. whereby a nucleopeptide IDA could have arisen and could Cambridge University Press. Carny O, Gazit E. 2005. A model for the role of short self-assembled be tested experimentally. peptides in the very early stages of the origin of life. FASEB J. Future experimental work that would support the nucleo- 19(9):1051–1055. peptide theory would be to provide evidence that the stereo- Cech TR. 2012. The RNA worlds in context. Cold Spring Harb Perspect chemical hypothesis applies to the earliest occurring amino Biol. 4(7):a006742. acids including those likely to have composed the active site of Da Silva L, Maurel MC, Deamer D. 2015. Salt-promoted synthesis of RNA-like molecules in simulated hydrothermal conditions. JMol the p-Pol. Currently codon/anticodon binding to a number of Evol. 80(2):86–97. amino acids has been shown (Yarus et al. 2005)butisabsent Dixit SB, Arora N, Jayaram B. 2000. How do hydrogen bonds contribute for the four earliest amino acids (Wolf and Koonin 2007). This to protein–DNA recognition? JBiomolStructDyn. 17(Suppl may be due to their small sizes though even here possible 1):109–112. solutions have been proposed (Tamura 2015). Fishkis M. 2011. Emergence of self-reproduction in cooperative chemical evolution of prebiological molecules. Origins Life Evol B. It is important to note that we do not propose that the 41(3):261–275. RNA world did not or could not exist, nor does this work Fletcher JM, Harniman RL, Barnes FR, Boyle AL, Collins A, Mantell J, necessarily suggest that a self-replicating RNA polymerase did Sharp TH, Antognozzi M, Booth PJ, Linden N. 2013. not exist (although our results suggest it to be unlikely), but Self-assembling cages from coiled-coil peptide modules. rather that such a molecule did not directly lead to current Science 340(6132):595–599. Fox SW, Harada K. 1958. Thermal copolymerization of amino acids to a living systems. Indeed the crucial role of RNA (more correctly, product resembling protein. Science 128(3333):1214. þ XNA) in our model is highlighted by the importance of KR , Giel-Pietraszuk M, Barciszewski J. 2006. Charging of tRNA with the rate of polymerization of polynucleotide chains. We also non-natural amino acids at high pressure. FEBS J. 273(13): do not dismiss any roles for ribozymes—for example it could 3014–3023. well be that ribozymes were responsible for aminoacylation Giri V, Jain S. 2012. The origin of large molecules in primordial autocat- alytic reaction networks. PLoS ONE. 7(1):e29546. reactions (although this would inevitably raise the question of Herschy B, Whicher A, Camprubi E, Watson C, Dartnell L, Ward J, Evans how such ribozymes were themselves replicated). Similarly JRG, Lane N. 2014. An origin-of-life reactor to simulate alkaline hy- (and with similar provisos), peptides alone could also have drothermal vents. J Mol Evol. 79(5–6):213–227. carried out supporting roles such as stabilizing long XNA Ikehara K. 2002. Origins of gene, genetic code, protein and life: compre- sequences or catalyzing aminoacylation reactions. At its hensive view of life systems from a GNC-SNS primitive genetic code hypothesis. JBiosci. 27(2):165–186. core however, we suggest that the ancestral replicator was Ikehara K. 2005. Possible steps to the emergence of life: the [GADV]- nucleopeptidic with information storage function carried out protein world hypothesis. Chem Rec. 5(2):107–118. by the XNA and polymerase function carried out by the Illangasekare M, Sanchez G, Nickles T, Yarus M. 1995. Aminoacyl-RNA peptide. synthesis catalyzed by an RNA. Science 267(5198):643–647. Issac R, Chmielewski J. 2002. Approaching exponential growth with a self-replicating peptide. JAmChemSoc. 124(24):6808–6809. Supplementary Material Iyer L, Koonin E, Aravind L. 2003. Evolutionary connection between the Supplementary data areavailableat catalytic subunits of DNA-dependent RNA polymerases and eukary- Molecular Biology and otic RNA-dependent RNA polymerases and the origin of RNA poly- Evolution online. merases. BMC Struct Biol.3:1. Keefe AD, Szostak JW. 2001. Functional proteins from a random- Acknowledgments sequence library. Nature 410(6829):715–718. Kemeny J. 1955. Man viewed as a machine. Sci Am. 192(4):58–68. We thank Jeremy Tame, Andy Bates, and Arnout Voet for Knight RD, Landweber LF. 2000. Guilt by association: the arginine case critical reading of the manuscript and Arnout Voet and Jan revisited. RNA 6(4):499–510. Zaucha for many constructive and critical discussions. This Kochavi E, Bar-Nun A, Fleminger G. 1997. Substrate-directed formation work was supported by RIKEN Initiative Research Funding to of small biocatalysts under prebiotic conditions. JMolEvol. 45(4):342–351. J.G.H. And funding from the Malopolska Centre of Koonin EV, Novozhilov AS. 2009. Origin and evolution of the genetic Biotechnology, awarded to J.G.H. code: the universal enigma. IUBMB Life. 61(2):99–111. Koonin EV. 1991. The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses. JGenVirol.72(9): References 2197–2206. Angyan AF, Ortutay C, Gaspari Z. 2014. Are proposed early genetic codes Kunin V. 2000. A system of two polymerases – a model for the origin of capable of encoding viable proteins? J Mol Evol. 78(5):263–274. life. Origins Life Evol B. 30(5):459–466. Baaske P, Weinert FM, Duhr S, Lemke KH, Russell MJ, Braun D. 2007. Kurland CG. 2010. The RNA dreamtime. Bioessays 32(10):866–871. Extreme accumulation of nucleotides in simulated hydrothermal Lee DH, Granja JR, Martinez JA, Severin K, Ghadiri MR. 1996. A self- pore systems. Proc Natl Acad Sci USA. 104(22):9346–9351. replicating peptide. Nature 382(6591):525–528.

Downloaded from https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 415 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018 Banwell et al. . doi:10.1093/molbev/msx292 MBE

Lehmann J, Reichel A, Buguin A, Libchaber A. 2007. Efficiency of a self- Roviello G, Musumeci D, Castiglione M, Bucci EM, Pedone C, Benedetti E. aminoacylating ribozyme: effect of the length and base-composition 2009. Solid phase synthesis and RNA-binding studies of a serum- of its 30 extension. RNA 13(8):1191–1197. resistant nucleo-epsilon-peptide. JPeptSci.15(3):155–160. Leipe DD, Aravind L, Koonin EV. 1999. Did DNA replication evolve twice Ruiz-Mirazo K, Briones C, de la Escosura A. 2014. Prebiotic systems independently? Nucleic Acids Res. 27(17):3389–3401. chemistry: new perspectives for the origins of life. Chem Rev. Leman L, Orgel L, Ghadiri MR. 2004. Carbonyl sulfide-mediated prebiotic 114(1):285–366. formation of peptides. Science 306(5694):283–286. SaladinoR,BottaG,PinoS,CostanzoG,DiMauroE.2012.Geneticsfirst Levy M, Ellington AD. 2003. Peptide-templated nucleic acid ligation. or metabolism first? The formamide clue. Chem Soc Rev. J Mol Evol. 56(5):607–615. 41(16):5526–5565. Liu Z, Beaufils D, Rossi JC, Pascal R. 2014. Evolutionary importance of the Schimmel P, Henderson B. 1994. Possible role of aminoacyl-RNA com- intramolecular pathways of hydrolysis of phosphate ester mixed plexes in noncoded peptide synthesis and origin of coded synthesis. anhydrides with amino acids and peptides. Sci Rep. 4:7440. Proc Natl Acad Sci USA. 91(24):11283–11286. Manning MC, Illangasekare M, Woody RW. 1988. Circular dichroism Schneider SE, O’Nei SN, Anslyn EV. 2000. Coupling rational design with studies of distorted alpha-helices, twisted beta-sheets, and beta libraries leads to the production of an ATP selective chemosensor. J turns. Biophys Chem. 31(1–2):77–86. Am Chem Soc. 122(3):542–543. Martin W, Baross J, Kelley D, Russell MJ. 2008. Hydrothermal vents and Sievers A, Beringer M, Rodnina MV, Wolfenden R. 2004. The ribosome as the origin of life. Nat Rev Microbiol. 6(11):805–814. an entropy trap. Proc Natl Acad Sci USA. 101(21):7897–7901. McCleskey SC, Griffin MJ, Schneider SE, McDevitt JT, Anslyn EV. 2003. Tamura K. 2015. Beyond the frozen accident: glycine assignment in the Differential receptors create patterns diagnostic for ATP and GTP. J genetic code. JMolEvol. 81(3–4):69–71. Am Chem Soc. 125(5):1114–1115. Tamura K, Schimmel P. 2003. Peptide synthesis with a template-like Miller SL. 1997. Peptide nucleic acids and prebiotic chemistry. Nat Struct RNA guide and aminoacyl phosphate adaptors. Proc Natl Acad Sci Biol. 4(3):167–169. USA. 100(15):8666–8669. Moore PB, Steitz TA. 2003. After the ribosome structures: how does Trevino SG, Zhang N, Elenko MP, Luptak A, Szostak JW. 2011. Evolution peptidyl transferase work? RNA 9(2):155–159. of functional nucleic acids in the presence of nonheritable backbone Morgens DW. 2013. The protein invasion: a broad review on the origin of heterogeneity. Proc Natl Acad Sci USA. 108(33):13492–13497. the translational system. JMolEvol. 77(4):185–196. Turk RM, Chumachenko NV, Yarus M. 2010. Multiple translational Nelson KE, Levy M, Miller SL. 2000. Peptide nucleic acids rather than products from a five-nucleotide ribozyme. Proc Natl Acad Sci USA. RNA may have been the first genetic molecule. Proc Natl Acad Sci 107(10):4585–4589. USA. 97(8):3868–3871. van der Gulik P, Speijer D. 2015. How amino acids and peptides shaped Noller HF. 2012. Evolution of protein synthesis from an RNA world. Cold the RNA world. Life 5(1):230. Spring Harb Perspect Biol. 4(4):a003681. Vetsigian K, Woese C, Goldenfeld N. 2006. Collective evolution Patel BH, Percivalle C, Ritson DJ, Duffy CD, Sutherland JD. 2015. and the genetic code. Proc Natl Acad Sci USA. Common origins of RNA, protein and lipid precursors in a cyano- 103(28):10696–10701. sulfidic protometabolism. Nat Chem. 7(4):301–307. Vidonne A, Philp D. 2009. Making molecules make themselves – the Paul N, Joyce GF. 2004. Minimal self-replicating systems. Curr Opin Chem chemistry of artificial replicators. Eur J Org Chem. 2009(5):593–610. Biol. 8(6):634–639. WoeseCR.1965.Ontheevolutionofthegeneticcode.Proc Natl Acad Pinheiro VB, Taylor AI, Cozens C, Abramov M, Renders M, Zhang S, Sci USA. 54(6):1546–1552. Chaput JC, Wengel J, Peak-Chew SY, McLaughlin SH, et al. 2012. Wolf YI, Koonin EV. 2007. On the origin of the translation system and Synthetic genetic polymers capable of heredity and evolution. the genetic code in the RNA world by means of , Science 336(6079):341–344. exaptation, and subfunctionalization. Biol Direct. 2:14. Regan L, DeGrado WF. 1988. Characterization of a helical protein Yarus M, Caporaso JG, Knight R. 2005. Origins of the genetic code: the designed from first principles. Science 241(4868):976–978. escaped triplet theory. Annu Rev Biochem. 74:179–198. Riddle DS, Santiago JV, Bray-Hall ST, Doshi N, Grantcharova VP, Yi Q, Yarus M, Widmann JJ, Knight R. 2009. RNAamino acid binding: a Baker D. 1997. Functional rapidly folding proteins from simplified stereochemical era for the genetic code. JMolEvol. amino acid sequences. Nat Struct Biol. 4(10):805–809. 69(5):406–429. Robertson MP, Joyce GF. 2012. The origins of the RNA world. Cold Spring Yarus M. 2011. Getting past the RNA world: the initial darwinian ances- Harb Perspect Biol. 4(5):a003608. tor. Cold Spring Harbor Perspect Biol. 3(4):a003590. Rodin AS, Szathmary E, Rodin SN. 2011. On origin of genetic code and Zhang BL, Cech TR. 1997. Peptide bond formation by in vitro selected tRNA before translation. Biol Direct. 6:14. ribozymes. Nature 390(6655):96–100.

Downloaded from416 https://academic.oup.com/mbe/article-abstract/35/2/404/4604794 by Uniwersytet Jagiellonski Biblioteka Medyczna user on 16 August 2018