Downloaded by guest on September 24, 2021 a Guseva Elizaveta polymers prebiotic sequence of and differentiation growth the for hypothesis Foldamer aayi lnaino te Ns eae oe asserts model related www.pnas.org/cgi/doi/10.1073/pnas.1620179114 A RNAs. other auto- can to of chains leading chain-length elongation RNA ribozymes, RNA the catalytic polymerase of of as model some serve polymers, that spontaneously a on envision developed that Focusing They (12) envision autocatalysis. biopolymers. Higgs first which and the trans- models, Wu information precede first precede model and processes metabolism fer GARD chemical of adaptability. display molecule subset small systems and Such a inheritance others. is of over ampli- degree chemicals corresponding some a some with of self-replication, fication net- to kinetic lead chemical can autocatalytic (GARD) works artificial Domain that Replication predicts Autocatalysis (9–11) Graded called els which another. in entities catalyze of can collection entity led a one set, has any autocatalytic work an That of idea (3–8). the molecules) to other to which self- relative and in amplified sustaining become bootstrapping, molecules or some of feedback concentrations positive the of form autocatalysis some requires chem- (i.e., behavior simple biological from self-supporting transition to the that istry recognized was it on, Early Process Autocatalytic an Requires CTB monomer about specific chemistries. are polymer about or here not mechanisms, questions spontaneous become Our physical have self-serving? sequences and chain random informational might long polymerization how produced have What Also, they chains? 2). could - How (1, autocatalytic? of molecules are chains processes acid-like long nucleic produced or have like polymer- might prebiotic how processes explore to ization model was a In develop What transition? we cells. paper, (CTB) living this -to-biology to the molecules of prebiotic mechanism of the soup a from ago A life of origin (CTB) chemistry-to-biology the solu- selec- to con- single and transition. under contribute a sequences evolve specific could to further how formations overconverge describes could mechanism not that This does ensembles tion. of gives it feature it that attractive tion; is An longer an sequences. into model form grow particular this to chains have shown short that are which chains catalysts through ribosomes foldamer set, as Such polymers autocatalytic do. HP now such pro- today’s other would of elongating versions primitive catalysts, as tein act they way, this hydrophobic In exposing surfaces. structures, compact (HP relatively polar into short sequences even that random find of We model. binding (H and concentra- hydrophobic folding own of the their study increasing We (i.e., tions)? autocatalytic grow informational become spontaneously more grow and chains molecules random would prebiotic short-chain how that Also, How- do longer? thought polymers. how random is Harvey) then, Steve short It and synthesize ever, Chan originated. to Sun life Hue able by how were reviewed processes known 2016; 8, not December review is for It (sent 2017 10, July Dill, A. Ken by Contributed 94720 CA Berkeley, Laboratory, National Berkeley 11794; NY Brook, Stony afrCne o hscladQatttv ilg,SoyBokUiest,SoyBok Y11794; NY Brook, Stony University, Brook Stony Biology, Quantitative and Physical for Center Laufer efis eiwsm ftekyrsls ls fmod- of class A results. key the of some review first We pnaeu rniinocre oeta ilo years billion 3 than more the occurred how is transition chemistry in spontaneous processes mysterious most the mong | Pmodel HP n oa (P polar and ) a,b,c c | eateto hsc n srnm,SoyBokUiest,SoyBok Y174 and 11794; NY Brook, Stony University, Brook Stony Astronomy, and Physics of Department oadN Zuckermann N. Ronald , biopolymers ooesi computational a in monomers ) | uoaayi sets autocatalytic d n e .Dill A. Ken and , hiscan chains ) a,b,c,1 1073/pnas.1620179114/-/DCSupplemental. at online information supporting contains article This option. access open 1 PNAS the through online available Freely interest. of conflict no declare Pennsylvania. authors of The University S.H., and Toronto; paper. of the University wrote H.S.C., K.A.D. Reviewers: and E.G. and data; analyzed R.N.Z. and catalytic form can that sponta- ribozymes self-assemble self-replicating to into shown are neously example, fragments For RNA members. of autocatalytic of mixtures the to the pairs of leading that in autocatalysis), growth (i.e., so exponential sets other designed each autocatalytic catalyze are can artificial systems molecules of Such (21–23). creation laboratory the to example, infor- chains. the sequence-dependent of or aspects mational heteropolymeric do the they Also, address autocatalytic. not be molecular might particular abstract mechanisms what and specify are not structures These do that (19). principle diversity of These models sequence repli- space. enhance template-directed will sequence that cation of predict models exploration postinformational effective nonfunctional through from arise ones can chains sequences (20), functional model that indicates such It replication. another template-directed with- In sequence-independent although undergo how. “self,” exactly recognize specifying to molecules out of sup- ability It an het- (16–19). certain poses precursors of own ability polymers their the that concentrate of to assumed of because eropolymers is templates tendency it own model, their some as such already serve one In is evolve. there to which chains in world, eropolymer (14, sequence–structure before shown biological world been encode “preinformational” relationships. also the to of begin has models heteropolymers are templating of These Autocatalytic 15). self-replication (13). the breakage of random and template-assisted from ligation arises elongation chain autocatalytic that uhrcnrbtos ...adKAD eindrsac;EG efre eerh E.G. research; performed E.G. research; designed K.A.D. and R.N.Z. contributions: Author owo orsodnesol eadesd mi:[email protected]. Email: addressed. be should correspondence whom To admceia oyeiain ol aegvnrs to rise given polymers. how catalytic have these protein-like explains could sequence-dependent that longer mechanism polymerizations and This chemical catalysts, chains. random primitive other as elongate act hydrophobic can could with can chains water, that short in sequence surfaces structures random compact some into collapse even that . today’s find as polymers We such monomers, study polar We and hydrophobic concentrations. of own the that their sequences to increase not particular led could having is have chains It longer could level—to process sequences. polymer- next physical random have what short could however, into simple clear, earth that units polymers, early thought monomer the informational is ized on It on processes acids. nucleic chemical based and proteins are namely lifeforms Today’s Significance hr a lobe uheprmna oklaig for leading, work experimental much been also has There het- “postinformational” a describes models of class Another b eateto hmsr,SoyBokUniversity, Brook Stony Chemistry, of Department d oeua onr,Lawrence Foundry, Molecular www.pnas.org/lookup/suppl/doi:10. NSEryEdition Early PNAS | f9 of 1

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY networks that can compete with others (24). One limitation, A however, is that these are fragments taken from existing ribozymes, and therefore, they do not explain the origins from more primitive random beginnings. Here, we describe a dynamical mechanism that seeks to bridge from the pre- to postinformational world across the CTB tran- sition. We describe a physical basis for how short chains could have spontaneously led to longer chains, how random chains led to specific sequences, and a structural basis and plausible kinetics for a prebiotic autocatalytic transition. “Flory Length Problem”: Polymerization Processes Produce Mostly Short Chains Prebiotic polymerization experiments rarely produce long chains. It is generally assumed that the prebiotic proteins or nucleic acids that initiated the transition to biology must have been at least 30- to 60-monomers long (25). Both amino acids and can polymerize under prebiotic conditions with- B out , but they produce mostly short chains (26–30). Leman et al. (29) showed that carbonyl sulfide, a simple volcanic gas, brings about the formation of oligopeptides from amino acids under mild conditions in aqueous solution in minutes to hours. However, the products are mainly dimers and trimers. Longer chains can sometimes result through adsorption to clays (31, 32) or minerals (33, 34), from evaporation from tidal pools (35), from concentration in ice through eutectic melts (36), or from freezing (37) or temperature cycling. Even so, the chain- length extensions are modest (38). For example, mixtures of Gly and Gly2 grow to about 6-mers after 14 d (39, 40) on mineral catalysts, such as calcium mont- morillonite, hectorite, silica, or alumina. Or, in the experiments of Kanavarioti et al. (36), polymers of oligouridylates are found up to lengths of 11 bases long, with an average length of 4 after samples of phosphoimidazolide-activated uridine were frozen in the presence of metal ions in dilute solutions. Similar results are found in other polymers: a prebiotically plausible mechanism produces having a combination of ester and amide Fig. 1. Polymerization processes lead to mostly short chains. (A) Spon- bonds up to length 14 (38). taneous polymerization processes typically lead to a Flory distribution of It is puzzling how prebiotic processes might have overcome chain lengths. Green line gives hli = 6, and blue line corresponds to hli = 2 what we call the Flory Length Problem (i.e., the tendency of any (B) Fitted distributions from experiments on prebiotic polymerization: red, polymerization process to produce a distribution in which there Kanavarioti et al. (36); cyan, Ding et al. (46); magenta, Ferris (47). are more short chains and fewer long chains). Standard polymer- ization mechanisms lead to the Flory or Flory–Schulz distribu- tion of populations f (l), whereby short chains are exponentially vation code of particular sequence patterns of the hydrophobic more populated than longer chains (41): (H ) and polar (P) monomers (51). We call these hydrophobic polar (HP) copolymers. f (l) = a2l(1 − a)l−1, [1] Since today’s biocatalysts are proteins, it is not hard to imagine that early proteins could have been primitive cata- where l is the chain length, and a is the probability that any lysts. Precision and complexity are not required for peptides monomer addition is a chain termination. The average chain to perform biological functions. Proteins generated from ran- length is given by hli = a(2 − a) (Fig. 1A). dom libraries can sustain the growth of living cells (52), and Prebiotic monomer concentrations are thought to have been in binding and catalysis from random peptides are not unusual the range of micromolar to millimolar (36, 42–45). Given micro- (53) (refs. 11–14 and 54 and references therein and refs. 55– molar concentrations of monomers and given hli = 2, the con- 59). Even so, the sorts of actions suggested here are cur- centration of 40-mers would be ≈ 10−19 mol/L. Fig. 1B shows rently more in the realm of speculation than proven fact. that, where the chain-length distributions are known for prebi- Below, we describe results of computer simulations that lead otic syntheses, they are well fit by the Flory distribution [or expo- to the conclusion that short random HP chains carry within nential law f (l) constantl ] (16, 19). them the capacity to autocatalytically become longer and more ∝ protein-like. Foldamer Autocat Mechanism: Short Hydrophobic Polar Here are the premises of the model. Chains Fold and Catalyze the Elongation of Other i) Some random HP sequences can fold into compact struc- Hydrophobic Polar Chains tures. We propose the hypothesis that the CTB transition occurred ii) Some of those foldamers will have exposed hydrophobic through foldable polymers (“foldamers”). Today’s biological “landing pad” surfaces. foldamers are predominantly proteins [although RNA molecules iii) Foldamers with landing pads can catalyze the elongation of and synthetic polymers can also fold (48–50)]. Many foldamers other HP chains. adopt specific native conformations, mainly through a binary sol- iv) These foldamer catalysts form an autocatalytic set.

2 of 9 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 td h rpriso Psqec pcsi sn h DHP al. 2D et Guseva the using is spaces sequence HP of to properties way the standard affect study A sequences produces. polymerization HP that different populations the the of properties collapse the how Model. the in Folding Chain rate. growth chain this setting by bond rate rate unit any a the water; at broken with be interaction can of because dilu- hydrolysis or taneous degradation rate by certain a system at the tion from which removed in are process, molecules steady-state a rate consider we at simple, monomer bookkeeping a adding by elongates chain activated model, this our equilibrium, In P of restrictive. very out not a be is assumption present—must contains or that systems—past activated solution ing of surrounding a supply within sufficient place takes Model. the tion of Selection and Dynamics Growth The Chain HP of Process the Modeling iii) iv) ii) i) ooesaesple ya xenlsuc trate at source external an by supplied are monomers eei vdnefrteepremises. these for evidence is Here ipeognss uha atraadfni ssnonri- mRNAs uses involve fungi, not 87). do and (86, which in bacteria synthetases, synthesis as protein bosomal of such pro- route organisms, as major simple such a if 83, 85), Second, water (84, (ref. surfaces. in environments occur tein molecule nonpolar can water in reactions out a Dehydration carried of 82). conden- p. removal 3, a the chap. entails and by elongation step catalyzable chain sation RNA be peptide by pep- alternatively First, that might proteins. out believe elongation to carried chain using reasons tide is are proteins there catalysis synthesize However, molecules. the cells wherein today’s ribosomes, elon- course, chain Of peptide catalyzed is gation. have Oxidase might proteins Cytochrome-c the Primitive on where increase cluster to first sites, known patch hydrophobic the initiation with A as interacts (81). surfactant serve a of lipases tail on hydrophobic of clusters of hydrophobic surface sites example, the often For are 80–82). proteins (78, catalysis on patches hydrophobic Surface sites (79). those hydrophobes and in interactions, enriched are to geometrical amenable are have that surfaces properties partners. protein dozen of a three-quarters nearly inter- Almost typically of proteins, sites other many with have action proteins Modern proteins. other or n around from ing ranging proteins patches monomeric found soluble and on 112 (78) of Sikosek common study are A by patches proteins. today’s and work clusters the hydrophobic Exposed in is (70). Chan review (69). switches comprehensive evolutionary and A (68), features compara- chain modeling homology-like of tive (67), the recombination sequences (66), (65), encoding 25 uniquely length sequences and all encoding of insights the determination uniquely (64), notable of superfunnels the several space nonrandomess sequence Subsequently, contributed including (74–77). folding has advances, and unique experiments model of inter- polar by HP encoding specific of confirmed binary by patterning is This tuning 60–73). be contacts. binary finer (51, can residue with the structures proteins residues, in folded hydrophobic of unique simply evolution that encoded show and studies folding Those the for P H random Nondesigned oyeshv ensuidetnieya model a as extensively studied been have polymers 400 A ˚ d 2 easm htcan a neg spon- undergo can chains that assume We . k hyaeotnbnigstsfrligands for sites binding often are they ; β cat h 1 = ihu oso eeaiy edefine we generality, of loss Without . (82). nadto,ormdlas losfor allows also model our addition, In P H l te ae r ae eaieto relative taken are rates other All . easm htcanpolymeriza- chain that assume We H eune r nw ofold. to known are sequences and 200 P ooes ic liv- Since monomers. to β 1,200 utt epthe keep to Just . A ˚ a 2 averag- , given A . H and hrfr,sm ml rcino h ie vnfle chains additional folded allows proteins. model even small our proportion, time, natural that the in for We and of unfolded, is solvent. are fraction it the small as some to reversible, Therefore, be exposure to core folding more oil unique take since have that chains, compact chains suppose other We drop-like than protected chains. adsorption better folded or are are folders solvent than the much surfaces from are onto chains degradation open to that accessible protected reflects more is simply folded also, a This and that hydrolysis. growth suppose from additional We from proteins. prevented modern is in chain do sur- they to as same 100- states, are the which the have proteins, 2D that 3D in convenient typical studied as is (89). 200-mers ratios be it volume can driving Also, to the that residues). face (because 30-mers H ratios to of volume 10- burial to determinative is surface the the force because adequately in is 2-dimensionality is proteins the physics 3D that of reason properties The struc- reflects proteins 2). compact, native into (Fig. fraction resembling collapse structures (60) nonnegligible folded can con- partially a sequences and main that HP tured, folding A possible is sequences, (88). all studies of proteins protein previous of of from reproduces kinetics clusion observations model folding lattice key and HP equilibria, the param- the or of that of bias shows com- many without properties work exhaustive studied Prior collapse be by eters. and can studied folding spaces be whole-sequence Thus, can enumeration. of conformation sequence puter possible possible every long, every 25-monomers about than sequence. particular that of where native having native, call we which energy structure, conformation energy given free lowest a in contacts) Chains chain noncovalent P. HH e given lat- of or a square number H of (the 2D equals energy either a free is on The bead represented tice. Each conformations bead. chain different the a have of as monomer each represented model, this is In 60). (51, model lattice nteH atc oe.Rd(rpn fi h einn ftesequence) the of beginning the in if pink (or Red to model. corresponds lattice HP the in 2. Fig. H easm htfle ttsbhv ifrnl rmunfolded from differently behave states folded that assume We shorter chains for that, is model lattice HP the of virtue A foeH neato) oeH eune aeasingle a have sequences HP Some interaction). HH one of n E xmlso Psqecsta odt nqentv structures native unique to fold that sequences HP of Examples h nat φ stenme fH otcsi h aiestructure native the in contacts HH of number the is : H ooes n lecrepnsto corresponds blue and monomers, E nat = n h φ e H , NSEryEdition Early PNAS P . × teenergy (the | f9 of 3 [2]

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY fied. The challenge in experimental tests here is that the initial signals being sought are likely to be small, be in the noise, and involve much sequence heterogeneity. Also, we do not mean to imply a belief that the true prebiotic mechanism was necessarily hydrophobic contacts and localization. Rather, it is intended to show the plausibility of getting few kT barrier reductions from simple undesigned surfaces. To simulate this dynamics, we run stochastic simulations. For this purpose, we used the Expandable Partial Propensity Method (92). It is an exact stochastic simulation algorithm that improves on computational performance of the Gillespie method (93) for systems in which the number of potential species is much larger than the number of species actually present in the system. Also, we have shown that it gives probability distributions of molecule counts that are identical to those of the Gillespie algorithm (92). In Gillespie-like simulations, the time between reactions is stochastic and can be extracted from the simulations. In our simulations, one step is the time between two reactions, equal to 1/(rate of spontaneous monomer addition) s. In a single time step, one catalyst molecule can support only one growing chain plus one monomer unit. The growth rate of every single unfolded polymer is one. Only polymers which have come into contact with Fig. 3. Some HP foldamers have hydrophobic patches, which serve as land- a catalyst have higher growth rate for this particular time step. ing pads that can catalyze the elongation of other HP chains. Chain A folds and exposes a hydrophobic sticky spot, or landing pad, where another HP They also can grow spontaneously with a rate of one. molecule B as well as a hydrophobic monomer C can bind. This localization The description and corresponding C++ library can be found reduces the barrier for adding monomer C to the hydrophobic end of the at https://github.com/abernatskiy/epdm. growing chain B. Results and Discussion Folding Alone Does Not Solve the Flory Length Problem, but Folding Plus Catalysis Does. We compare three cases. Case 1 is a reference growth or degradation. For this purpose, we estimate the folding test, in which sequences grow and undergo hydrolysis, but no and unfolding rate coefficients for any HP sequence as (90) other factors contribute. Case 2 allows for chain folding but not   for catalysis, and case 3 allows for both chain folding and catal- kf ∆G Enat ln = − = − N ln z, [3] ysis. Case 1 simply recovers the Flory distribution, as expected, k kT kT u with exponentially decaying populations with chain length (Fig. where z is the number of rotational df per .* 4, gray lines). In case 2, when chains can fold, they can bury some monomers in their folded cores. Thus, chains that are com- The Catalysis Step in the Model. Some HP sequences will fold to pact or folded degrade more slowly than chains that do not fold. have exposed hydrophobic surfaces. These surfaces could act as primitive catalysts, as modern proteins do more optimally today. Fig. 3 illustrates a standard elementary mechanism of catalysts, namely translational localization of the reacting components. A protein A (the catalyst molecule) has a hydrophobic landing pad to which a growing reactant chain B and a reactant hydrophobic monomer C will bind, localizing them long enough to form a bond that grows the chain. It is important to notice here that the hydrophobic landing pad can facilitate only the addition of the hydrophobic monomer to a hydrophobic end of a chain, since polar residues will not interact with hydrophobic landing pad. How much rate acceleration could such a localization give? Here is a rough estimate. For chain elongation, the catalytic rate will increase if the polymerization energy barrier is reduced by hydrophobic localization by a factor βcat/βnon cat ∝ exp(EH · nc /kT ), where nc is the number of H monomers in the land- ing pad (see Fig. 6). The free energy of a typical hydrophobic interaction is 1–2kT . We take the minimum size of a landing pad to be three. For a landing pad size of three to four hydrophobic monomers, this binding and localization would reduce the kinetic barrier by 3–8kT, thus increasing the polymerization rate by one or more orders of magnitude. Of course, this rate enhancement is much smaller than the 107-fold of modern ribosomes (91), but Fig. 4. Chains become elongated by foldamer catalyst HP sequences. Case even small rate accelerations might ultimately become ampli- 1 (gray): a soup of chains has a Flory-like length distribution in the absence of folding and catalysis. Case 2 (blue): a soup of chains still has a Flory- like length distribution in the absence of catalysis (but allowing now for folding). Case 3 (red): a soup of chains contains considerable populations of longer chains when the soup contains HP chains that can fold and cat- ∗In principle, we could also specifically exclude sequences that are too hydrophobic on the grounds that they would aggregate and exit the system. However, in practice, those alyze. We run 30 simulations for every case. To produce each line, we took a 6 highly hydrophobic sequences do not contribute anyway, because they do not fold time average over 10 time points in the steady-state interval, then counted uniquely. Therefore, we do not treat that process explicitly to keep the model simple. molecules for each length, and divided it by the total molecular count.

4 of 9 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 ue r ni n hc nsaent i.6sosafwo the of few a 6A shows Fig. 6 structures. Fig. native not. single to are fold ones that which sequences and HP it struc- in native are and consti- sequences tures molecules HP set—which what autocatalytic Set. about the Autocatalytic tute predictions an specific Form makes Sequences model Catalyst Foldamer 3). good case The 5, a (Fig. is chains median the of nor behavior mean the the of representation neither chain therefore, of distribution and polynomial lengths, a and variances 3 well-defined bigger Case much have length. has chain they with exponentially variances, diminish substantial that values have mean 2 and 1 cases reduction barrier of degree small. the relatively is although occur, kinetic populations chains enhanced longer the These of reduce chains. can other and chains elongating surface, for HP barrier hydrophobic some that some is expose hydrolysis result the fold, of The magnitude parameters. of dilution others, order an and of over robust elongation is distribution. Flory effect the both the This can catalyze “bend” chains will When also processes 4). polymerization and Fig. such in themselves lines by (red give fold 2 or 1 cases than advantages. pre- small this amplify of could iterations mechanism few A biotic selection. ran- preferential searched no is with space mechanism domly sequence this where emergent, whereas and goal, selec- prebiotic optimization a is by an driven toward mechanism force postbiotic disagreement. tive a in posits and necessarily work not hydrolysis prior is of That work forces Our the alone. just (94) aggregation dis- under can shown selected overall structures previously compact prebiotically the been into be of collapsing has of shape It capable the sequences lines). that affect blue to 4, but small (Fig. others, the tribution too increase to are does relative effects Folding sequences 2). the foldamer case some 5, of and populations lines blue 4, (Figs. lem under selection hydrolysis. by which and favored aggregation (94), of are coworkers conditions structures and compact Shakhnovich agree- that the of showed in work is the the conclusion with in This result ment structures. could evolutionary 2) foldable popula- enough case of in higher strong selection (as have a advantages small polymers Given even folded ones. pressure, that unfolded shows than median. from 2 tions or see mean can case the we 5, in as shown chains; Fig. be the cannot of information behavior high this the very distributed and of have distributions, are representation different ones) good lengths with a autocatalytic given 10 groups is (mostly the over two median average sequences of into nor time sequences several separate mean a the chains, basically is neither of longer point chains 3, Every populations the the case of the For for representation all the different: biomass. population faithful of and figure, the drastically a low populations the catalyze, of as is very 2, length cannot most 3 and have given constitute but 1 Case the sequences and cases fold for length. allowed. the ones For population that are that catalyzed. of median sequences catalysis of been or most and represent has mean sequence folding While dots reaction take average both polynomially. can elongation blue 3, of we one case fold, Thus, behavior least in exponentially. cannot at the and distributed that which catalysis, are of for sequences not length and but the given catalysts folding the of as allow of populations act we sequences which 2, represent case sequences dots In represent gray catalysis. dots cases, or red folding the allow of not all do we For 1, case In cases. three 5. Fig. ueae al. et Guseva ae3i ulttvl ifrn rmcss1ad2 Although 2. and 1 cases from different qualitatively is 3 Case chains longer of populations larger considerably gives 3 Case Prob- Length Flory the solve not does alone folding However, h itiuin vridvda eune r ihyhtrgnos eso h ouain mlcl onso niiulsqecs o the for sequences) individual of counts (molecule populations the show We heterogeneous. highly are sequences individual over distributions The 6 iepit ntesed-tt nevl oe ii f10 of limit Lower interval. steady-state the in points time shows This hrfr,toefwpriua pca eune ol wash would sequences special length. particular chain few with untenable, those two exponentially is Therefore, or grow situation one spaces This required sequence autocatalysts. because transition as CTB first proteins the Imagine “special” which space. sequence in the situation of size the the with grows set Sequence alytic the of Size the with Grows Space. Set Autocatalytic the of Size The molecule one B polymer 7, two a cata- Fig. shows cross-catalysis: elongations. foldamer 7 shows chain Fig. HP paired other. autocatalytic two each of for examples any autocatalytic set: are sequences autocatalytic lyst the of members 6B catalysts. Fig. not are whereas that catalysts, foldamers are that foldamers those o aayt,bttesz fteatctltcsti ongiil Fg 8). (Fig. are nonnegligible chains is Most set catalytic. autocatalytic not the are of but size elongation the fold the but that catalyze catalysts, chains not can HP that (B) other. pads each landing of have and structures unique 6. Fig. AB nsot l Psqecsta r odmrctlssare catalysts foldamer are that sequences HP all short, In sas bet elongate to able also is nipratqeto shwtesz fa autocat- an of size the how is question important An Pltiecan htfl n r uoaayi.Te odinto fold They autocatalytic. are and fold that chains lattice HP (A) −6 C sbcueo opttoa precision. computational of because is lnae another elongates i.7, Fig. A. A lnae polymer a elongates C Lower oeuei solution. in molecule NSEryEdition Early PNAS hw autocatalysis: shows hw those shows B | while , Upper f9 of 5

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY the respective autocatalytic sets for the green vs. red distribu- tions. Each of the two attractors has its own signature ensemble of HP sequences that is an emergent property of the dynamics. We expect that there will be many such dynamical attractors in more refined models (20 monomer types rather than 2, allow- ing for longer chains, etc.). Moreover, because this model gen- erates many different folds, it should generate many other types of catalytic functions other than chain elongation. These too will lead to additional dynamic attractors. Our goal here has not been to consider the evolution of functionalities beyond “ribosome- like chain elongation,” but ensemble models, such as this one, would lead to many other potential functionalities. Also, simu- lating chains longer than N = 25, it is likely to give richer and more complex behavior, as is true of real sequences. At this point, we note what our model is and what it is not. First, it aims to capture a few principles in a coarse-grained way. This model only looks at the prebiotic question of how poly- merizing random peptide-like molecules could collapse, partially fold, and catalyze the elongation and sequence differentiation of other sequences. It does not address how the genetic code evolved or how other protein functionalities evolved. Second, its Fig. 7. (Upper) Cross-catalysis of two different sequences. (Lower) Auto- catalytic mechanism is simply a translational localization in this catalysis of two copies of an identical sequence. Dashed arrows repre- case of the two reactants, polymer B and monomer C , in a chain sent multiple reactions of chain growth. Among them, there are both extension reaction. It indicates how foldamer surfaces could give ··· HH + H → · · · HHH catalyzed reactions and spontaneous chain elon- a nonnegligible (but probably small) reduction in the barrier gations. Catalysis is represented by red solid arrows. Solid black lines are to polymerization. Other binding and catalysis mechanisms are folding reactions. Chains, which we call “autocatalytic,” experience catalysis during one (or more often, several) of the steps of elongation. Then, when possible in foldamers; here, we simply show that this physics is plausible. Third, the mechanism is based on the same princi- they reach the length at which they can fold (Au, Bu, Cu), they fold and serve as catalysts themselves (Af , Bf , Cf ). Mutual catalysis can happen between dif- ple that today’s catalysts exploit: folded polymers can ferent sequences (here, A and B) and between different instances of the catalyze reactions, because the folded state is a miniature solid, same sequence (here, C). having interatomic positions relatively fixed over timescales that are long enough for reactants to see a fixed potential sur- face. Unfolded polymers fluctuate too rapidly and are not good out as biology moves into an increasingly larger sequence space catalysts. sea. In contrast, Fig. 8 shows that this mechanism resolves this Fourth, the sequence evolution in this mechanism is not problem. On the one hand, the fraction of HP sequences that toward trivial states, such as the all H sequence, because those are foldamers is always fairly small (about 2.3% of the model states do not have unique and stable folded states. The all H sequence space), and the fraction of HP sequences that are also sequence is like an oil droplet, with a fluctuating ensemble of catalysts is even smaller (about 0.6% of sequence space). On the ground states that spends little time in any one and therefore, other hand, Fig. 8 shows that the populations of both foldamers has more exposure to hydrolysis than unique folders have. How and foldamer cats grow in proportion to the size of sequence much more? The main point here is a qualitative one. Unfolded space. The implication is that the space of autocats in the CTB chains are most susceptible to degradation (they have no core), might have been large. Fig. 9 makes a closely related point. It shows that, for longer chains, the fraction of biomass that is pro- duced by autocatalysts completely takes over and dominates the polymerization process relative to just the basic polymerization dynamics itself, although the catalytic enhancements are quite modest. This is caused by two factors: (i) the number of autocata- lysts grows longer sequences (Fig. 8), and (ii) folding alone is not sufficient to populate longer chains. We find that the hydropho- bicities of the dominant sequences ranges from 50 to 80%, with an average of 68%.

Evolvability of HP Ensembles. A key challenge in models of the CTB transition is “winner take all.” Suppose one type of molecule is better than others by some criterion. It will tend to win out. This is an undesirable feature of a CTB model, because it means that no additional evolvability or selection is possible (for example, discussions are in refs. 19 and 95). Early origins processes are more likely to have led to ensembles that are fur- ther evolvable. For a complex system that has many attractors, a perturbation can move the system over a threshold to the basin of another attractor. This allows for exploration of the sequence space and additional evolvability. Fig. 10 shows that this model is not winner take all. Differ- ent initial seed conditions for the simulations lead to different Fig. 8. Different sequence spaces grow exponentially with chain length: dynamical attractors, with no stable switching between them. Fig. gray, the space of all HP sequences; blue, the space of foldamers; red, the 10B shows some of the sequences and folded states in each of space of foldamer catalysts.

6 of 9 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 orsodt ifrn iuainrn.Tebakln hw h median the shows line black The simulations. lines 30 runs. red over Different to simulation length. chains different chain autocatalytic with to grows of correspond length contribution given the the of how biomass shows the line red Each alysts. 9. Fig. can polymer folded a that namely model toy idea, a simple use special that we a Here, capture catalysis. had to primitive have for some conceivably allowing in in could foldability cat- role polymer Therefore, a prebiotic acids. use of amino proteases type three serine of example, triad to For alytic react and surfaces. bind, stable recognize, those can that states capability transition the and with the substrates solid, of microscale time a folding resembles the It over molecule. ways stable relatively inter- atomic in fixing relationships precisely limited to lead for is foldamers polymeric have that nonfoldable it structures—is molecules other to that foldable contrast that mean reactions—in power catalyzing not unique The does proteins. that only to of is foldamers, model here to analysis or concrete our applicable bonding a although hydrogen Therefore, as include interactions. broadly other intended more simply could that is solvation it indicating hydropho- water, interactions, on in focuses bic fold model our to Although able solvation. also differential are molecules acid can Nucleic that hypotheses generates it that tested. is be modeling such of principal principles The value important studies relationships. many sequence-to-structure previous and captures folding Also, model of one. this this that plausibilities only shown as explore have the such to hypotheses, is way physical approach practical of This and stud- parameters. complete, arbi- is energy or unbiased, it space of sequence because choices by of that, unbiased nature trary the is is about it model preconceptions enumeration, any this exhaustive canoni- computational on rea- most by The reliance ied the foldamers. the of among spaces for is sequence son model studying lattice for HP models cal 2D The this one? of 2D focus the not is it own CTB, its of mechanism in work.) possible interesting a is as effects aggregation right confounding (Although avoid to to aggregation. way and simple tend of drops a also gives oil also would between folds sequences unique distinction discriminating H some therefore, is All and there states. aggregate, that three still not but those is susceptible drawn important among is less most boundary is even the What stable). are where relatively folds are cores unique (their fluc- and but cores ones), have (they tuating susceptible less are structures compact ueae al. et Guseva ent htti oe sntncsaiyecuiet proteins. to exclusive necessarily not is model this that note We a particularly model, lattice simple a believe we should Why h ogrtecan,tebge h otiuino h autocat- the of contribution the bigger the chains, the longer The n ntefloigpoes(i.7.H oyesaesynthe- into are folds polymers polymers HP HP those of 7). fraction (Fig. small A process inher- randomly. is following sized autocatalysis the that find in of we ent Here, form it? some mech- explain structural require might molecular anism what origins However, life’s 8). 6, that (5, autocatalysis recognized been has It new guide Conclusions in can yet modeling all this not that is are here premises experiments. the hope basic in Our the is as sight. even such modeling plain where science, first life, of of Predictions problems origins input. murky for are an crucial premises than especially experimental disprove outcome the an or in modeling, more prove first models predictions can and other In precedes that of it. theory experiments spirit approach, subsequent the of first motivates chain in predictions this the more In of goal, model physics. of rest different type the this a However, supply serves conclusions. the can to providing premises that from first, model logic a come for experiments premises modeling, biological a of catalyze can that way a in residues of reaction. number small a position their of green. in polymers shown the one the of to population corresponds total the into length. one. contributors red main a by are 5A represented Fig. one in to algorithm as or tering lines line same green the distribu- a are either of by These bifurcates kinds represented system two state The are simulations. a there the to that during clear realized steady get is the which It in distributions tion length run. length of simulation are distribution one lines represents for line The state each attractors. Again, two 3. case least from at has system alytic 10. Fig. B A ial,w omn ntesii fti oe.I much In model. this of spirit the on comment we Finally, Upper eune a vlet ifrn uoaayi es (A) sets. autocatalytic different to evolve can Sequences orsod otemcott hw nrdin red in shown macrostate the to corresponds k tutr ftesqecswihms often most which sequences the of Structure (B) -means. u eaae ntost yteclus- the by sets two in separated but NSEryEdition Early PNAS and A, | HP Lower f9 of 7 cat-

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY relatively stable compact states. A fraction of those folded struc- This and successful designs of foldable, biologically active pro- tures provides relatively stable landing pad hydrophobic surfaces. teins based on the HP folding rule (96) suggest that folding in Those surfaces can help to catalyze the elongation of other HP HP polymers is not rare. This model can suggest testable pre- molecules having foldable sequences. dictions for what early polymer sequences could be autocatalytic The HP model allows for unbiased counting of sequences that and provides a structural and kinetic mechanism for their action. do fold, do not fold, or fold and have a potentially catalytic hydrophobic landing pad. A nonnegligible fraction of all possible ACKNOWLEDGMENTS. We thank Nick Hud, Li Guo, Adam de Graff, Sam HP sequences folds to unique structures (2.3% for lengths up to Gellman, Andrew Pohorille, and Miha Luksicˇ for their insights and Anton 25-mers). The fraction of all possible HP sequences that have cat- Bernatskiy for fruitful discussions and help with software. We received 12.7% support from the Laufer Center and National Science Foundation Grant alytic surfaces (as defined above) is of foldable sequences MCB1344230. Work at the Molecular Foundry was supported by the Office or 0.3% of the whole-sequence space. These ratios remain rela- of Science, Office of Basic Energy Sciences, of the US Department of Energy tively constant with chain length, at least up to 25-mers (Fig. 8). under Contract DE-AC02-05CH11231.

1. Joyce G (1987) Nonenzymatic template-directed synthesis of informational macro- 36. Kanavarioti A, Monnard PA, Deamer DW (2001) Eutectic phases in ice facilitate molecules. Cold Spring Harb Symp Quant Biol 52:41–51. nonenzymatic synthesis. Astrobiology 1:271–281. 2. Abel DL, Trevors JT (2005) Three subsets of sequence complexity and their relevance 37. Bada JL (2004) How life began on earth: A status report. Earth Planet Sci Lett 226: to biopolymeric information. Theor Biol Med Model 2:29. 1–15. 3. Eigen M (1971) Selforganization of matter and the evolution of biological macro- 38. Forsythe JG, et al. (2015) Ester-mediated amide bond formation driven by wet-dry molecules. Naturwissenschaften 58:465–523. cycles: A possible path to polypeptides on the prebiotic earth. Angew Chem Int Ed 4. Eigen M, Schuster P (1977) A principle of natural self-organization. Naturwis- Engl 54:9871–9875. senschaften 64:541–565. 39. Rode BM, Son HL, Suwannachot Y, Bujdak J (1999) The combination of salt induced 5. Eigen M, Schuster P (1978) The hypercycle. Naturwissenschaften 65:7–41. peptide formation reaction and clay catalysis: A way to higher peptides under primi- 6. Dyson F (1985) Origins of Life (Cambridge Univ Press, Cambridge, UK). tive earth conditions. Orig Life Evol Biosph 29:273–286. 7. Prigogine I, Nicolis G (1989) Exploring Complexity (St. Martin’s Press, New York). 40. Rode BM (1999) Peptides and the origin of life. Peptides 20:773–786. 8. Kauffman SA (1986) Autocatalytic sets of proteins. J Theor Biol 119:1–24. 41. Flory PJ (1953) Principles of Polymer Chemistry (Cornell Univ Press, Ithaca, NY), p 9. Segre´ D, Lancet D, Kedem O, Pilpel Y (1998) Graded autocatalysis replication domain 688. (GARD): Kinetic analysis of self-replication in mutually catalytic sets. Orig Life Evol 42. Stribling R, Miller SL (1987) Energy yields for hydrogen cyanide and formaldehyde Biosph 28:501–514. syntheses: The HCN and amino acid concentrations in the primitive ocean. Orig Life 10. Segre´ D, Ben-Eli D, Lancet D (2000) Compositional genomes: Prebiotic information Evol Biosph 17:261–273. transfer in mutually catalytic noncovalent assemblies. Proc Natl Acad Sci USA 97:4112– 43. Huber C, Wachtersh¨ auser¨ G (1998) Peptides by activation of amino acids with 4117. CO on (Ni,Fe)S surfaces: Implications for the origin of life. Science 281:670– 11. Markovitch O, Lancet D (2012) Excess mutual catalysis is required for effective evolv- 672. ability. Artif Life 18:243–66. 44. Aubrey AD, Cleaves HJ, Bada JL (2009) The role of submarine hydrothermal systems 12. Wu M, Higgs PG (2009) Origin of self-replicating biopolymers: Autocatalytic feedback in the synthesis of amino acids. Orig Life Evol Biosph 39:91–108. can jump-start the RNA world. J Mol Evol 69:541–54. 45. Lazcano A, Miller SL (1996) The origin and early evolution of life: Prebiotic chemistry, 13. Tkachenko AV, Maslov S (2014) Onset of autocatalysis of information-coding poly- the pre-RNA world, and time. Cell 85:793–798. mers. arXiv:1405.2888v4. 46. Ding PZ, Kawamura K, Ferris JP (1996) Oligomerization of uridine phosphorimida- 14. Lee DH, Granja JR, Martinez JA, Severin K, Ghadri MR (1996) A self-replicating pep- zolides on montmorillonite: A model for the prebiotic synthesis of RNA on minerals. tide. Nature 382:525–528. Orig Life Evol Biosph 26:151–171. 15. Rubinov B, Wagner N, Rapaport H, Ashkenasy G (2009) Self-replicating amphiphilic 47. Ferris JP (1999) Prebiotic synthesis on minerals: Bridging the prebiotic and RNA β-sheet peptides. Angew Chem Int Ed Engl 48:6683–6686. worlds. Biol Bull 196:311–314. 16. Nowak MA, Ohtsuki H (2008) Prevolutionary dynamics and the origin of evolution. 48. Gellman SH (1998) Foldamers: A manifesto. Acc Chem Res 31:173–180. Proc Natl Acad Sci USA 105:14924–14927. 49. Lee BC, Zuckermann RN, Dill KA (2005) Folding a nonbiological polymer into a com- 17. Ohtsuki H, Nowak MA (2009) Prelife catalysts and replicators. Proc Biol Sci 276:3783– pact multihelical structure. J Am Chem Soc 127:10999–11009. 3790. 50. Capriotti E, Marti-Renom MA (2008) RNA structure alignment by a unit-vector 18. Chen IA, Nowak MA (2012) From prelife to life: How chemical kinetics become evolu- approach. Bioinformatics 24:i112–i118. tionary dynamics. Acc Chem Res 45:2088–2096. 51. Chan HS, Dill KA (1991) “Sequence space soup” of proteins and copolymers. J Chem 19. Derr J, et al. (2012) Prebiotically plausible mechanisms increase compositional diver- Phys 95:3775–3787. sity of nucleic acid sequences. Nucleic Acids Res 40:4711–4722. 52. Fisher Ma, McKinley KL, Bradley LH, Viola SR, Hecht MH (2011) De novo designed 20. Walker SI, Grover MA, Hud NV (2012) Universal sequence replication, reversible poly- proteins from a library of artificial sequences function in Escherichia Coli and enable merization and early functional biopolymers: A model for the initiation of prebiotic cell growth. PLoS One 6:e15364. sequence evolution. PLoS One 7:e34166. 53. Cherny I, Korolev M, Koehler AN, Hecht MH (2012) Proteins from an unevolved library 21. von Kiedrowski G (1986) A self-replicating hexadeoxynucleotide. Angew Chem Int Ed of de novo designed sequences bind a range of small molecules. ACS Synth Biol 1: Engl 25:932–935. 130–138. 22. Lincoln TA, Joyce GF (2009) Self-sustained replication of an RNA enzyme. Science 54. Hilvert D (2013) Design of protein catalysts. Annu Rev Biochem 82:447–470. 323:1229–1232. 55. Giuliano MW, Miller SJ (2016) Site-selective reactions with peptide-based catalysts. 23. Vaidya N, et al. (2012) Spontaneous network formation among cooperative RNA repli- Top Curr Chem 372:157–201. cators. Nature 491:72–77. 56. Duschmale´ J, Kohrt S, Wennemers H (2014) Peptide catalysis in aqueous emulsions. 24. Robertson MP, Joyce GF (2014) Highly efficient self-replicating RNA enzymes. Chem Chem Commun (Camb) 50:8109–8112. Biol 21:238–245. 57. Rozinov MN, Nolan GP (1998) Evolution of peptides that modulate the spectral qual- 25. Szostak JW, Ellington AD (1993) In Vitro Selection of Functional Nucleic Acids in the ities of bound, small-molecule fluorophores. Chem Biol 5:713–28. RNA World (Cold Spring Harbor Lab Press, Plainview, NY), pp 511–533. 58. Rodi DJ, et al. (1999) Screening of a library of phage-displayed peptides identifies 26. Shock EL (1992) Stability of peptides in high-temperature aqueous solutions. Geochim human bcl-2 as a taxol-binding protein. J Mol Biol 285:197–203. Cosmochim Acta 56:3481–3491. 59. Adamala K, Szostak JW (2013) Competition between model protocells driven by an 27. Martin RB (1998) Free energies and equilibria of peptide bond hydrolysis. Biopolymers encapsulated catalyst. Nat Chem 5:495–501. 45:351–353. 60. Lau KF, Dill KA (1989) A lattice statistical mechanics model of the conformational and 28. Paecht-Horowitz M, Berger J, Katchalsky A (1970) Prebiotic synthesis of polypeptides sequence spaces of proteins. Macromolecules 22:3986–3997. by heterogeneous polycondensation of amino-acid adenylates. Nature 228:636–639. 61. Miller DW, Dill KA (1995) A statistical mechanical model for hydrogen exchange in 29. Leman L, Orgel LE, Ghadiri MR (2004) Carbonyl sulfide-mediated prebiotic formation globular proteins. Protein Sci 4:1860–1873. of peptides. Science 306:283–286. 62. Yue K, Dill KA (1995) Forces of tertiary structural organization in globular proteins. 30. Orgel LE (2004) Prebiotic chemistry and the origin of the RNA world. Crit Rev Biochem Proc Natl Acad Sci USA 92:146–150. Mol Biol 39:99–123. 63. Agarwala R, et al. (1997) Local rules for protein folding on a Triangular lattice and 31. Rao M, Odom DG, Oro´ J (1980) Clays in prebiological chemistry. J Mol Evol 15:317–331. Generalized hydrophobicity in the HP model. J Comput Biol 4:275–296. 32. Lambert JF (2008) Adsorption and polymerization of amino acids on mineral surfaces: 64. Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: Mutational A review. Orig Life Evol Biosph 38:211–242. stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA 33. Bernal JD (1949) The physical basis of life. Proc Phys Soc B 62:597–618. 96:10689–10694. 34. Ferris JP, Hill AR, Liu R, Orgel LE (1996) Synthesis of long prebiotic oligomers on min- 65. Irback¨ A, Sandelin E (2000) On hydrophobicity correlations in protein chains. Biophys eral surfaces. Nature 381:59–61. J 79:2252–2258. 35. Nelson KE, Robertson MP, Levy M, Miller SL (2001) Concentration by evaporation and 66. Irback¨ A, Troein C (2002) Enumerating designing sequences in the HP model. J Biol the prebiotic synthesis of cytosine. Orig Life Evol Biosph 31:221–229. Phys 28:1–15.

8 of 9 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 24, 2021 Downloaded by guest on September 24, 2021 0 ioe ,Ca S(04 ipyiso rti vlto n vltoayprotein evolutionary and evolution protein of Biophysics (2014) HS Chan T, follows Sikosek conflict 70. adaptive from Escape (2012) E Bornberg-Bauer HS, Chan T, Sikosek 69. 7 rsnieJ,KdrR 21)Fs,cepadoto oto nihsit ther- into insights – control of out and cheap Fast, (2016) library. RL combinatorial Koder designed JM, a from Brisendine proteins 77. novo de folded Stably (2003) binary Y by design Wei Protein (1993) 76. MH Hecht JM, Babik H, Xiong the JM, determining Schiffer in S, Kamtekar interactions packing internal 75. of protein role The of (1991) evolution RT Sauer selective WA, and Lim neutral 74. Modelling (1991) WJ non- Wilbur and DJ, polar Lipman of Periodicity 73. (1995) MH Hecht HM, Shieh BL, Buckwalter sequences. H, polymer Xiong Designing problem: 72. folding protein Inverse (1992) Ka Dill K, Yue 71. Moreno-Hern of 68. exploration Recombinatoric (2002) HS Chan E, Bornberg-Bauer WH, Wong Y, Cui 67. 9 odatNve ,Sonc 21)Aepoenpoenitrae pca ein on regions special of interfaces surfaces protein-protein Are the (2015) on J Skolnick patches S, Tonddast-Navaei Hydrophobic (1996) 79. P Argos HJC, Berendsen P, Lijnzaad 78. 3 esnD,CxM (2008) MM Cox DL, cytochrome-c Nelson to site 83. entry electron the is II subunit of 121 Tryptophan (1998) H Witt 82. (1997) M Onno JH, 1·6 Ee at van plastocyanin poplar 81. oxidized of Structure (1983) HC Freeman J, Guss Mitchell 80. ueae al. et Guseva biophysics. robustness. mutational 109:14888–14893. and trade-offs functional weak from lattice. two-dimensional a on models 1683–1693. hydrophobic-polar of features oyai n nomtccntanso aua rti eune rmd novo de from sequences protein natural design. on protein constraints informatic and modynamic Sci Protein acids. amino nonpolar and polar of patterning protein. a of stability and structure folding. self-assembling in structure secondary of determinant peptides. major oligomeric the is acids amino polar USA Sci Acad Natl Proc land- evolutionary protein of model scapes. heteropolymer-based A structures: folded novel rti’ surface? protein’s a structures. protein t Ed. 5th dock- the in patch hydrophobic a reaction. of ing Involvement denitrificans. paracoccus in oxidase 408. p FL), resolution. rcNt cdSiUSA Sci Acad Natl Proc rcRScB Soc R Proc 12:92–102. o Interface Soc R J o Biol Mol J ilChem Biol J ne ,Lvt 21)Cmaaiemdln n protein-like and modeling Comparative (2012) M Levitt S, andez ´ ici ipy Acta Biophys Biochim Proteins hmPhys Chem J rcNt cdSiUSA Sci Acad Natl Proc 245:7–11. 169:521–563. 89:4163–4167. 273:5132–5136. enne rnilso Biochemistry of Principles Lehninger nye nDetergency in Enzymes 25:389–397. 11:20140419. 99:809–814. 143:243149. o Biol Mol J 1857:485–492. 92:6349–6353. 219:359–376. Science dvnE H(R,Bc Raton, Boca (CRC, JH Ee van ed , 262:1680–1685. rcNt cdSiUSA Sci Acad Natl Proc Femn e York), New (Freeman, Proteins 80: A ˚ 0.GohK ilK 21)Clua rtoe aeboddsrbtoso rti stabil- protein of distributions broad have proteomes Cellular (2010) KA Dill K, Ghosh 100. 0.Dl A hs ,Shi D(01 hscllmt fclsadproteomes. and cells of limits Physical (2011) JD Schmit K, Ghosh KA, Dill 101. 9 mt M asnD 19)Tep-aepol o h yrlsso etd bond. peptide a of hydrolysis the for profile pH-rate The (1998) DE Hansen hydrol- RM, of Smith rate exact uncatalyzed simple 99. the of from measurement perspective Direct a (1996) DE - Hansen folding RAR, Bryant protein of 98. Principles (2008) al. et KA, Dill 97. challenges. and Impasses evolvability: Primordial (2015) al. have could et biopolymers V, first Vasas the How 95. (1996) EI Shakhnovich AM, time Gutin stochastic VI, the Abkevich simulating numerically 94. for method general A (1976) DT Gillespie system 93. the for simulations stochastic rule-based entropy Exact (2016) an EA as Guseva AV, ribosome Bernatskiy The (2004) 92. R Wolfenden MV, Rodnina lengths. M, Beringer chain A, their Sievers from stabilities 91. protein Computing (2009) KA Dill K, Ghosh 90. folding. aggregation, Compactness, (2000) protein A Maritan and JR, Banavar C, principles Micheletti G, Giugliarelli Polymer 89. (1999) KA Dill 88. 6 upyG,Gesa B eh H(05 env rtiswt iessann func- life-sustaining with proteins novo De (2015) MH Hecht JB, Greisman GS, Murphy 96. non- into Insights rules: protein-synthesis the outside Working (2009) MA Marahiel 87. 6 tcehu ,MozH,Bredh ,Mrhe A(98 etd bond Peptide (1998) MA Marahiel V, Bergendahl HD, Mootz water. in T, reactions Stachelhaus Dehydration (2002) 86. S Kobayashi XM, Sun S, Iimura K, Manabe Surfactant- water. 85. in reactions Dehydration (2001) S Kobayashi XM, Sun K, Manabe 84. ity. Soc Chem Am J bond. peptide a of ysis models. evolved. reactions. chemical coupled of evolution arXiv:1609.06403v2. species. molecular of number unlimited with trap. USA Sci Acad Natl study. model lattice A protein: of 5077. behavior prionlike and 1180. cdSiUSA Sci Acad dynamic. structurally are tions 381:29–38. synthesis. peptide ribosomal 22781. omto nnniooa etd Biosynthesis. peptide nonribosomal in formation dithioac- and water. thioether, in ether, formation ester, etal for catalyst acid-surfactant-combined Brønsted in alcohols with acids system. carboxylic emulsion of an esterification direct acid-catalyzed brønsted type ipy J Biophys rcNt cdSiUSA Sci Acad Natl Proc rti Sci Protein rcNt cdSiUSA Sci Acad Natl Proc 99:3996–4002. 108:17876–17882. 120:8910–8913. 106:10649–10654. 4:561–602. mCe Soc Chem Am J mCe Soc Chem Am J mCe Soc Chem Am J 101:7897–7901. etSci Pept J o Biol Mol J 93:839–844. 123:10101–10102. 15:799–807. 118:5498–5499. 124:11971–11978. optPhys Comput J 428:399–411. NSEryEdition Early PNAS 22:403–434. ilChem Biol J hmPhys Chem J rti Sci Protein ho Biol Theor J 273:22773– 113:5072– rcNatl Proc | 8:1166– f9 of 9 Proc

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY