Foldamer hypothesis for the growth and sequence differentiation of prebiotic

Elizaveta Gusevaa,b,c, Ronald N. Zuckermannd, and Ken A. Dilla,b,c,1

aLaufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794; bDepartment of Chemistry, Stony Brook University, Stony Brook, NY 11794; cDepartment of Physics and Astronomy, Stony Brook University, Stony Brook, NY 11794; and dMolecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Contributed by Ken A. Dill, July 10, 2017 (sent for review December 8, 2016; reviewed by Hue Sun Chan and Steve Harvey) It is not known how life originated. It is thought that prebiotic that autocatalytic chain elongation arises from template-assisted processes were able to synthesize short random polymers. How- ligation and random breakage (13). Autocatalytic templating ever, then, how do short-chain spontaneously grow of the self-replication of peptides has also been shown (14, longer? Also, how would random chains grow more informational 15). These are models of the “preinformational” world before and become autocatalytic (i.e., increasing their own concentra- heteropolymers begin to encode biological sequence–structure tions)? We study the folding and binding of random sequences relationships. of hydrophobic (H) and polar (P) monomers in a computational Another class of models describes a “postinformational” het- model. We find that even short hydrophobic polar (HP) chains can eropolymer world, in which there is already some tendency of collapse into relatively compact structures, exposing hydrophobic chains to evolve. In one such model, it is assumed that polymers surfaces. In this way, they act as primitive versions of today’s pro- serve as their own templates because of the ability of certain het- tein catalysts, elongating other such HP polymers as ribosomes eropolymers to concentrate their own precursors (16–19). It sup- would now do. Such foldamer catalysts are shown to form an poses an ability of molecules to recognize “self,” although with- autocatalytic set, through which short chains grow into longer out specifying exactly how. In another such model (20), chains chains that have particular sequences. An attractive feature of undergo sequence-independent template-directed replication. It this model is that it does not overconverge to a single solu- indicates that functional sequences can arise from nonfunctional tion; it gives ensembles that could further evolve under selec- ones through effective exploration of sequence space. These tion. This mechanism describes how specific sequences and con- postinformational models predict that template-directed repli- formations could contribute to the chemistry-to-biology (CTB) cation will enhance sequence diversity (19). These are abstract transition. models of principle that do not specify what particular molecular structures and mechanisms might be autocatalytic. Also, they do origin of life | HP model | biopolymers | autocatalytic sets not address the heteropolymeric or sequence-dependent infor- mational aspects of the chains. mong the most mysterious processes in chemistry is how the There has also been much experimental work leading, for Aspontaneous transition occurred more than 3 billion years example, to the creation of artificial autocatalytic sets in the ago from a soup of prebiotic molecules to living cells. What was laboratory (21–23). Such systems are designed so that pairs of the mechanism of the chemistry-to-biology (CTB) transition? In molecules can catalyze each other (i.e., autocatalysis), leading to this paper, we develop a model to explore how prebiotic - exponential growth of the autocatalytic members. For example, ization processes might have produced long chains of - mixtures of RNA fragments are shown to self-assemble sponta- like or nucleic acid-like molecules (1, 2). What polymerization neously into self-replicating ribozymes that can form catalytic processes are autocatalytic? How could they have produced long chains? Also, how might random chain sequences have become informational and self-serving? Our questions here are about Significance physical spontaneous mechanisms, not about specific monomer or polymer chemistries. Today’s lifeforms are based on informational polymers, namely and nucleic acids. It is thought that simple CTB Requires an Autocatalytic Process chemical processes on the early earth could have polymer- Early on, it was recognized that the transition from simple chem- ized monomer units into short random sequences. It is not istry to self-supporting biological behavior requires autocatalysis clear, however, what physical process could have led to the (i.e., some form of positive feedback or bootstrapping, in which next level—to longer chains having particular sequences that the concentrations of some molecules become amplified and self- could increase their own concentrations. We study polymers sustaining relative to other molecules) (3–8). That work has led of hydrophobic and polar monomers, such as today’s proteins. to the idea of an autocatalytic set, a collection of entities in which We find that even some random sequence short chains can any one entity can catalyze another. collapse into compact structures in water, with hydrophobic We first review some of the key results. A class of mod- surfaces that can act as primitive catalysts, and that these els called Graded Autocatalysis Replication Domain (GARD) could elongate other chains. This mechanism explains how (9–11) predicts that artificial autocatalytic chemical kinetic net- random chemical polymerizations could have given rise to works can lead to self-replication, with a corresponding ampli- longer sequence-dependent protein-like catalytic polymers. fication of some chemicals over others. Such systems display some degree of inheritance and adaptability. GARD model Author contributions: R.N.Z. and K.A.D. designed research; E.G. performed research; E.G. is a subset of first models, which envision that and R.N.Z. analyzed data; and E.G. and K.A.D. wrote the paper. small chemical processes precede information trans- Reviewers: H.S.C., University of Toronto; and S.H., University of Pennsylvania. fer and precede the first biopolymers. Focusing on polymers, The authors declare no conflict of interest. Wu and Higgs (12) developed a model of RNA chain-length Freely available online through the PNAS open access option. autocatalysis. They envision that some of the RNA chains can 1To whom correspondence should be addressed. Email: [email protected]. spontaneously serve as polymerase ribozymes, leading to auto- This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. catalytic elongation of other . A related model asserts 1073/pnas.1620179114/-/DCSupplemental.

E7460–E7468 | PNAS | Published online August 22, 2017 www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 ueae al. et Guseva rmfezn 3)o eprtr yln.Ee o h chain- the so, or Even (38). modest (36), cycling. are melts temperature extensions eutectic length or pools through (37) tidal ice freezing from in from evaporation concentration from 34), from trimers. (33, clays (35), minerals to and or adsorption dimers 32) through to result (31, mainly sometimes minutes are can in amino chains products solution Longer from the aqueous oligopeptides However, in of conditions hours. (26–30). mild formation under chains the volcanic acids simple about short a sulfide, brings mostly carbonyl that gas, produce showed (29) al. they acids et Leman amino but Both with- enzymes, conditions (25). prebiotic have out long under or must polymerize 60-monomers can biology proteins to nucleotides to and 30- prebiotic transition least long the the at that initiated been produce that assumed acids generally rarely nucleic is experiments It chains. polymerization Prebiotic Chains Short Produce Mostly Processes Polymerization Problem”: Length “Flory transition. autocatalytic kinetics plausible prebiotic and a could basis led for structural chains chains a random and short how sequences, how specific chains, to longer for tran- to basis led CTB physical spontaneously the have a across describe world We postinformational existing sition. to pre- from the from from taken origins the fragments explain beginnings. not are limitation, random do primitive One more they these therefore, (24). and that others ribozymes, with is compete however, can that networks dp pcfi aiecnomtos anytruhabnr sol- binary foldamers a Many through mainly (48–50)]. conformations, fold native specific also adopt can biological polymers synthetic occurred molecules Today’s and RNA transition [although proteins (“foldamers”). CTB predominantly are polymers the foldamers that foldable hypothesis through the propose We Other Chains of Polar Elongation Hydrophobic the Catalyze and Fold Polar Chains Hydrophobic Short Mechanism: Autocat Foldamer expo- [or distribution Flory prebi- the law by for nential fit known well are are they distributions syntheses, otic chain-length the where be that, would given 40-mers and of micro- monomers centration Given of 42–45). (36, concentrations millimolar molar to micromolar of chain range the average The termination. by chain given is a length is addition monomer where (41): chains longer distribu- than Flory–Schulz populated more or Flory populations the of tion to polymer- lead Standard chains). mechanisms long there ization fewer which and in chains any short distribution of more tendency a are the produce (i.e., to Problem process Length polymerization Flory the call we amide what and ester (38). of 14 mechanism length combination to plausible up a prebiotically bonds having a oligomers are polymers: results produces Similar other solutions. after in dilute 4 in found ions of metal length of in average presence frozen an the were with uridine long, phosphoimidazolide-activated found bases of are 11 samples oligouridylates of of lengths mont- polymers to calcium experiments (36), up the as al. in such et Or, Kanavarioti catalysts, alumina. of mineral or on silica, 40) hectorite, morillonite, (39, d 14 after o xml,mxue fGyadGly and Gly of mixtures example, For bridge to seeks that mechanism dynamical a describe we Here, rboi ooe ocnrtosaetogtt aebe in been have to thought are concentrations monomer Prebiotic overcome have might processes prebiotic how puzzling is It l stecanlnt,and length, chain the is f (l ) ∝ hl constant i f = (l f (l hrb hr hisaeexponentially are chains short whereby ), a = ) (2 l − 1,19). (16, ] a a 2 l ) (1 Fg 1A). (Fig. ≈ − a 10 a stepoaiiyta any that probability the is −19 ) l −1 2 , rwt bu 6-mers about to grow o/.Fg 1B Fig. mol/L. hl i 2 = h con- the , shows [1] iii) otecnlso htsotrno Pcan ar within carry more chains and fact. longer HP become proven cur- random lead autocatalytically protein-like. to than that are short capacity simulations that speculation the here computer them conclusion of suggested of the results realm actions to 55– describe of the refs. we sorts in and Below, therein the more unusual references and so, not rently and (52), are Even 54 cells peptides and 59). living 11–14 random of (refs. from growth (53) ran- the and peptides from sustain for binding generated cata- can required Proteins primitive libraries functions. not dom been biological are have perform complexity to could and proteins Precision lysts. early that imagine hydrophobic (HP the polar of patterns sequence (H particular of code vation (47). Ferris magenta, (46); al. et red, Ding polymerization: cyan, (36); prebiotic al. on et experiments of Kanavarioti from distribution distributions Flory Fitted a (B) gives line to Green lead lengths. typically chain processes polymerization taneous 1. Fig. iv) B A ii) i) eeaetepeie ftemodel. the of premises the are Here to hard not is it proteins, are biocatalysts today’s Since n oa (P polar and ) hs odmrctlssfr natctltcset. autocatalytic an form catalysts of foldamer These elongation the chains. catalyze HP hydrophobic can other pads exposed landing have with Foldamers will surfaces. foldamers pad” “landing struc- those compact of into Some fold can sequences tures. HP random Some Spon- (A) chains. short mostly to lead processes Polymerization copolymers. ) ooes(1.W alteehydrophobic these call We (51). monomers ) PNAS hl | i = ulse nieAgs 2 2017 22, August online Published ,adbu iecrepnsto corresponds line blue and 6, | hl E7461 i = 2

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY Here is evidence for these premises. lattice model (51, 60). In this model, each monomer of the chain is represented as a bead. Each bead is either H or P. Chains i) Nondesigned random HP sequences are known to fold. have different conformations represented on a 2D square lat- HP polymers have been studied extensively as a model tice. The free energy of a given chain in a given conformation for the folding and evolution of proteins (51, 60–73). equals (the number of HH noncovalent contacts) × (the energy Those studies show that unique folded structures can be eH of one HH interaction). Some HP sequences have a single encoded simply in the binary patterning of polar and lowest free energy structure, which we call native, having native hydrophobic residues, with finer tuning by specific inter- energy Enat : residue contacts. This binary encoding of unique folding E = n e , [2] is confirmed by experiments (74–77). Subsequently, the nat hφ H HP model has contributed several notable insights and where nhφ is the number of HH contacts in the native structure advances, including sequence space superfunnels (64), the of that particular sequence. nonrandomess of uniquely encoding sequences (65), the A virtue of the HP lattice model is that, for chains shorter determination of all uniquely encoding sequences of chain than about 25-monomers long, every possible conformation of length 25 (66), recombination (67), homology-like compara- every possible sequence can be studied by exhaustive com- tive modeling features (68), and evolutionary switches (69). puter enumeration. Thus, folding and collapse properties of A comprehensive review is in the work by Sikosek and whole-sequence spaces can be studied without bias or param- Chan (70). eters. Prior work shows that the HP lattice model reproduces ii) Exposed hydrophobic clusters and patches are common on many of the key observations of protein sequences, folding today’s proteins. A study of 112 soluble monomeric proteins equilibria, and folding kinetics of proteins (88). A main con- (78) found patches ranging from 200 to 1,200 A˚ 2, averag- clusion from previous studies is that a nonnegligible fraction ing around 400 A˚ 2; they are often binding sites for ligands of all possible HP sequences can collapse into compact, struc- or other proteins. Modern proteins have many sites of inter- tured, and partially folded structures resembling native proteins action with other proteins, typically nearly a dozen partners. (60) (Fig. 2). The reason that the 2-dimensionality adequately Almost three-quarters of protein surfaces have geometrical reflects properties of 3D proteins is because the determinative properties that are amenable to interactions, and those sites physics is in the surface to volume ratios (because the driving are enriched in hydrophobes (79). force is burial of H residues). Also, it is convenient that the iii) Surface hydrophobic patches on proteins are often sites of 10- to 30-mers that can be studied in 2D have the same sur- catalysis (78, 80–82). For example, hydrophobic clusters on face to volume ratios as typical 3D proteins, which are 100- to the surface of lipases serve as initiation sites, where the 200-mers (89). hydrophobic tail of a surfactant interacts with the patch first We assume that folded states behave differently from unfolded (81). A hydrophobic cluster on Cytochrome-c Oxidase is states, as they do in modern proteins. We suppose that a folded chain is prevented from additional growth and also, is protected known to increase kcat (82). iv) Primitive proteins might have catalyzed peptide chain elon- from hydrolysis. This simply reflects that open chains are much gation. Of course, today’s cells synthesize proteins using more accessible to degradation from the solvent or adsorption ribosomes, wherein the catalysis is carried out by RNA onto surfaces than are folded chains. We suppose that unique molecules. However, there are reasons to believe that pep- folders are better protected than other compact chains, since oil tide chain elongation might alternatively be catalyzable by drop-like chains have more core exposure to the solvent. We proteins. First, peptide chain elongation entails a conden- take folding to be reversible, as it is for natural small proteins. sation step and the removal of a water molecule (ref. 83, Therefore, some small fraction of the time, even folded chains chap. 3, p. 82). Dehydration reactions can occur in water if are unfolded, and in that proportion, our model allows additional carried out in nonpolar environments (84, 85), such as pro- tein surfaces. Second, a major route of protein synthesis in simple organisms, such as bacteria and fungi, uses nonri- bosomal peptide synthetases, which do not involve mRNAs (86, 87).

Modeling the Process of HP Chain Growth and Selection The Dynamics of the Model. We assume that chain polymeriza- tion takes place within a surrounding solution that contains a sufficient supply of activated H and P monomers. Since liv- ing systems—past or present—must be out of equilibrium, this assumption is not very restrictive. In our model, activated H and P monomers are supplied by an external source at rate a. A given chain elongates by adding a monomer at rate β. Just to keep the bookkeeping simple, we consider a steady-state process, in which molecules are removed from the system by degradation or dilu- tion at a certain rate d. We assume that chains can undergo spon- taneous hydrolysis because of interaction with water; any bond can be broken at a rate h. Without loss of generality, we define the unit rate by setting β = 1. All other rates are taken relative to this chain growth rate.

Chain Folding in the Model. In addition, our model also allows for how the collapse properties of the different HP sequences affect Fig. 2. Examples of HP sequences that fold to unique native structures the populations that polymerization produces. A standard way to in the HP lattice model. Red (or pink if in the beginning of the sequence) study the properties of HP sequence spaces is using the 2D HP corresponds to H monomers, and blue corresponds to P.

E7462 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 ueae al. et Guseva ∗ ampli- become ultimately might accelerations rate small enhancement rate even the this than course, smaller Of much one magnitude. is by of rate orders polymerization more the hydrophobic or increasing four thus to 3–8 kT, kinetic three by the reduce of barrier would size hydrophobic localization pad and typical binding landing this a a monomers, For of three. energy be free to The 1–2kT 6). is Fig. interaction (see pad ing where another where pad, landing or Chain spot, chains. sticky HP molecule hydrophobic other of a elongation exposes the and catalyze can that pads ing 3. Fig. ilices fteplmrzto nrybriri eue by reduced factor is n a barrier by energy localization polymerization rate hydrophobic catalytic the the if elongation, increase chain For will estimate. rough a is since Here pad. chain, landing a hydrophobic with of interact the end not will of hydrophobic residues addition a polar the to the only that monomer here facilitate notice hydrophobic can to pad important is landing It hydrophobic chain. the grows that bond chain monomer reactant growing a which to A components. reacting catalysts, the of protein of mechanism localization elementary translational standard namely a today. optimally illustrates more 3 do as Fig. proteins act modern could as surfaces catalysts, These primitive surfaces. hydrophobic Model. exposed the have in Step Catalysis The (90) as sequence HP any folding for the coefficients estimate rate we unfolding purpose, and this For degradation. or growth monomer chain adding growing for barrier the reduces nqey hrfr,w onttetta rcs xlctyt eptemdlsimple. fold model not the keep do to explicitly they process because that treat anyway, not contribute do we not those Therefore, uniquely. do practice, in However, sequences system. the hydrophobic exit and highly aggregate would they that grounds the npicpe ecudas pcfial xld eune htaetohdohbcon hydrophobic too are that sequences exclude specifically also could we principle, In c o uhrt ceeaincudsc oaiaingive? localization a such could acceleration rate much How /kT z where ), oeH odmr aehdohbcpths hc ev sland- as serve which patches, hydrophobic have foldamers HP Some A B stenme frttoa fprppiebond.* peptide per df rotational of number the is swl sahdohbcmonomer hydrophobic a as well as tectls oeue a yrpoi adn pad landing hydrophobic a has molecule) catalyst (the C B. ln ilbn,lclzn hmln nuht oma form to enough long them localizing bind, will  n c k k u f stenme fHmnmr nteland- the in monomers H of number the is etk h iiu ieo adn pad landing a of size minimum the take We .  = 10 − 7 ∆G kT fl fmdr iooe 9) but (91), ribosomes modern of -fold oeH eune ilfl to fold will sequences HP Some = B C E kT n ecathydrophobic reactant a and nat otehdohbcedo the of end hydrophobic the to β cat C − /β a id hslocalization This bind. can N o cat non ln z , ∝ exp(E A folds [3] H HP · oeue o ahlnt,addvddi ytettlmlclrcount. molecular total the by it divided and length, each for molecules bury fold. not can do that they chains fold, than slowly can com- more are degrade chains that folded chains when or Thus, pact cores. 2, folded their case in monomers In (Fig. some lines). length expected, chain gray as with distribution, 4, catal- populations Flory and decaying the folding exponentially recovers chain with both simply for 1 not allows no but Case 3 folding ysis. but case chain hydrolysis, for and allows catalysis, undergo 2 for and Case contribute. grow factors sequences other which in Folding but test, Problem, Length Flory Does. the Catalysis Plus Solve Not Does Alone Folding Discussion and Results step. one. at of time rate particular a this with spontaneously for grow rate can growth also They higher with have contact into catalyst come have a unfolded which polymers single chain Only every one. of growing is rate polymer one growth The only time unit. support monomer single one can a equal plus molecule In reactions, catalyst s. two addition) one our monomer between step, In spontaneous time simulations. of the 1/(rate the is to algorithm from step extracted Gillespie reactions one be between the simulations, can time of and the stochastic those simulations, is to Gillespie-like In identical molecule (92). of are distributions Also, that probability system. gives the counts it in that larger present shown much actually have is species we species of potential number of the number for than the (93) which method in Gillespie improves the systems that of algorithm performance simulation computational stochastic on exact an Method is Propensity It Partial (92). Expandable the used we purpose, this few surfaces. to undesigned getting intended simple of is plausibility it the necessarily Rather, show was localization. mechanism and and prebiotic to noise, contacts true mean hydrophobic the the not that in do belief be we a Also, small, imply initial heterogeneity. be the sequence to that much likely is involve are here sought tests being experimental signals in challenge The fied. lz.W u 0smltosfreeycs.T rdc ahln,w oka cat- took and we fold line, each can produce that To 10 over case. chains average every populations HP time for Flory- simulations considerable contains 30 for a contains soup run now We has chains the alyze. allowing when of still (but chains soup chains longer catalysis a of of of (red): soup absence 3 a Case the (blue): absence folding). in the 2 distribution in Case length distribution length catalysis. like Flory-like and a has folding chains of of soup a (gray): 1 4. Fig. h ecito n orsodn + irr a efound be can library C++ corresponding and description The For simulations. stochastic run we dynamics, this simulate To https://github.com/abernatskiy/epdm hisbcm lnae yflae aaytH eune.Case sequences. HP catalyst foldamer by elongated become Chains 6 ecmaetrecss ae1i reference a is 1 Case cases. three compare We iepit ntesed-tt nevl hncounted then interval, steady-state the in points time PNAS | ulse nieAgs 2 2017 22, August online Published kT . are eutosfrom reductions barrier | E7463

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY Fig. 5. The distributions over individual sequences are highly heterogeneous. We show the populations (molecule counts of individual sequences) for the three cases. In case 1, we do not allow folding or catalysis. In case 2, we allow folding but not catalysis, and in case 3, both folding and catalysis are allowed. For all of the cases, gray dots represent populations of the sequences that cannot fold, blue dots represent sequences that fold but cannot catalyze, and red dots represent sequences which act as catalysts and for which at least one elongation reaction has been catalyzed. For cases 1 and 2, populations of the sequences of the given length are distributed exponentially. Thus, we can take mean or median population for the given length as a faithful representation of the behavior of average sequence of that length. Case 3 is drastically different: the populations of the sequences of the given lengths are distributed polynomially. While most of the sequences have very low population for the longer chains, several sequences (mostly autocatalytic ones) have very high ones and constitute most of the biomass. For case 3, neither mean nor median is a good representation of the behavior of the chains; as we can see from the figure, all of the chains basically separate into two groups with different distributions, and this information cannot be shown in the mean or median. Every point is a time average over 106 time points in the steady-state interval. Lower limit of 10−6 is because of computational precision.

Fig. 5, case 2 shows that folded polymers have higher popula- those foldamers that are catalysts, whereas Fig. 6B shows those tions than unfolded ones. Given a strong enough evolutionary foldamers that are not catalysts. pressure, even small advantages (as in case 2) could result in the In short, all HP sequences that are foldamer catalysts are selection of foldable structures. This conclusion is in the agree- members of the autocatalytic set: any two HP foldamer cata- ment with the work of Shakhnovich and coworkers (94), which lyst sequences are autocatalytic for each other. Fig. 7 shows two showed that compact structures are favored by selection under examples of autocatalytic paired chain elongations. Fig. 7, Upper conditions of aggregation and hydrolysis. shows cross-catalysis: a polymer A elongates a polymer B, while However, folding alone does not solve the Flory Length Prob- B is also able to elongate A. Fig. 7, Lower shows autocatalysis: lem (Figs. 4, blue lines and 5, case 2). Folding does increase the one molecule C elongates another C molecule in solution. populations of some foldamer sequences relative to others, but the effects are too small to affect the shape of the overall dis- The Size of the Autocatalytic Set Grows with the Size of the Sequence tribution (Fig. 4, blue lines). It has been previously shown (94) Space. An important question is how the size of an autocat- that sequences capable of collapsing into compact structures can alytic set grows with the size of the sequence space. Imagine first be prebiotically selected under just the forces of hydrolysis and the situation in which the CTB transition required one or two aggregation alone. Our work is not necessarily in disagreement. “special” proteins as autocatalysts. This situation is untenable, That prior work posits a postbiotic mechanism driven by a selec- because sequence spaces grow exponentially with chain length. tive force toward an optimization goal, whereas this mechanism Therefore, those few particular special sequences would wash is prebiotic and emergent, where sequence space is searched ran- domly with no preferential selection. A few iterations of this pre- biotic mechanism could amplify small advantages. Case 3 gives considerably larger populations of longer chains AB than cases 1 or 2 give (red lines in Fig. 4). When chains can both fold by themselves and also catalyze the elongation of others, such polymerization processes will “bend” the Flory distribution. This effect is robust over an order of magnitude of the hydrolysis and dilution parameters. The result is that some HP chains can fold, expose some hydrophobic surface, and reduce the kinetic barrier for elongating other chains. These enhanced populations of longer chains occur, although the degree of barrier reduction is relatively small. Case 3 is qualitatively different from cases 1 and 2. Although cases 1 and 2 have substantial variances, they have well-defined mean values that diminish exponentially with chain length. Case 3 has much bigger variances and a polynomial distribution of chain lengths, and therefore, neither the mean nor median is a good representation of the behavior of the chains (Fig. 5, case 3).

The Foldamer Catalyst Sequences Form an Autocatalytic Set. This model makes specific predictions about what molecules consti- Fig. 6. (A) HP lattice chains that fold and are autocatalytic. They fold into tute the autocatalytic set—which HP sequences and native struc- unique structures and have landing pads that can catalyze the elongation tures are in it and which ones are not. Fig. 6 shows a few of the of each other. (B) HP chains that fold but are not catalytic. Most chains are HP sequences that fold to single native structures. Fig. 6A shows not catalysts, but the size of the autocatalytic set is nonnegligible (Fig. 8).

E7464 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 i.7. Fig. ueae al. et Guseva different to Fig. them. lead between 10B simulations switching stable the no with for attractors, dynamical conditions seed initial ent sequence the basin evolvability. of the additional exploration and to for space allows threshold This a attractor. over a another attractors, system of fur- many the are has move that that origins can system ensembles Early perturbation complex to a 95). led For and have evolvable. to 19 ther possible likely refs. is more in selection are are processes or discussions evolvability because example, model, additional to CTB (for no tend of a that of will type feature means It undesirable one criterion. it an some Suppose is This by all.” out. others win take than better “winner is is molecule transition Ensembles. CTB HP the of Evolvability with 80%, to 50 from of ranges hydropho- average sequences an the dominant that the find not of We is bicities chains. alone longer folding populate (ii) to and 8), autocata- sufficient of (Fig. number sequences the quite longer (i) grows factors: are lysts two enhancements by caused catalytic is This polymerization the modest. basic although the itself, just dynamics the to dominates It relative pro- and point. process is over that polymerization related takes biomass completely closely of autocatalysts fraction a by the CTB duced makes chains, the longer 9 sequence in for Fig. autocats of that, shows large. of size space been the the have to that might is proportion implication foldamers in The both grow of space. populations cats the foldamer that shows and 8 Fig. also hand, are other that (about sequences smaller HP even that of is (about sequences catalysts fraction small the HP and fairly of space), this always fraction sequence resolves is the mechanism hand, foldamers this one are that the shows space On 8 sequence problem. Fig. larger contrast, increasingly an In into sea. moves biology as out (here, sequence same (here, sequences ferent when (A Then, (A fold themselves can elongation. catalysts they of as which steps at the length are of the lines reach several) they often, black more Solid catalysis (or experience arrows. both one “autocatalytic,” during call solid are we red which there Chains, by repre- reactions. folding them, represented arrows is Among Dashed Catalysis growth. gations. sequence. chain identical · of · an · reactions of multiple copies sent two of catalysis i.1 hw htti oe sntwne aeal Differ- all. take winner not is model this that shows 10 Fig. HH hw oeo h eune n oddsae nec of each in states folded and sequences the of some shows + (Upper H · · · → rs-aayi ftodfeetsqecs (Lower sequences. different two of Cross-catalysis ) 68%. HHH C ). A f aaye ecin n pnaeu hi elon- chain spontaneous and reactions catalyzed , B and f , C f n ewe ifrn ntne fthe of instances different between and B) .Mta aayi a apnbtendif- between happen can catalysis Mutual ). e hleg nmdl of models in challenge key A 0.6% fsqec pc) nthe On space). sequence of u , B u , C 2.3% u ,te odadserve and fold they ), ftemodel the of Auto- ) ry h pc falH eune;bu,tesaeo odmr;rd the red, foldamers; of space the blue, catalysts. sequences; foldamer HP of all space of space the gray, 8. Fig. hisaems ucpil odgaain(hyhv ocore), no have (they Unfolded degradation one. to qualitative susceptible a most is How therefore, are here have. chains and point folders main one unique of The than any more? ensemble hydrolysis H much in fluctuating to time all exposure a little The more with has spends states. droplet, that folded oil states stable an those ground and because like unique sequence, is H have sequence all not the do as states sur- such states, potential trivial toward fixed a good not see are and to rapidly too catalysts. reactants fluctuate polymers for Unfolded face. enough that long timescales solid, over miniature are fixed can a relatively is polymers positions princi- state interatomic folded same folded having exploit: the the because catalysts on reactions, enzyme is catalyze based physics today’s is this that mechanism that ple show the simply Third, barrier we are plausible. here, the mechanisms foldamers; catalysis in in and reduction possible binding small) Other polymerization. probably to (but give could nonnegligible surfaces foldamer a how indicates It this reaction. polymer extension in reactants, localization code two the translational genetic of a case the simply its Second, is how evolved. mechanism address functionalities catalytic protein not other differentiation how does or sequence It evolved poly- and sequences. how elongation other of the of catalyze question and prebiotic partially collapse, fold, the could molecules at way. peptide-like coarse-grained random looks a merizing in only principles model few a This capture to aims sequences. it acid First, amino real of true is as simu- behavior, complex Also, more than one, functionalities. longer this potential chains as other lating such many models, to “ribosome- ensemble lead beyond would but functionalities elongation,” of chain evolution like been the not has consider here goal will to Our too attractors. These dynamic additional elongation. types to chain lead other than gen- many other generate functions model catalytic should this allow- of it because folds, 2, Moreover, different than many etc.). rather erates chains, in types longer attractors monomer for dynamical (20 such ing dynamics. many models the be refined of will more property there emergent that ensemble an expect signature distribu- is own We that red its sequences has vs. HP attractors green two of the the of for Each sets tions. autocatalytic respective the orh h euneeouini hsmcaimi not is mechanism this in evolution sequence the Fourth, not. is it what and is model our what note we point, this At ifrn eunesae rwepnnilywt hi length: chain with exponentially grow spaces sequence Different PNAS N | ti ieyt ierce and richer give to likely is it 25, = ulse nieAgs 2 2017 22, August online Published B n monomer and C nachain a in , | E7465

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY compact structures are less susceptible (they have cores but fluc- A tuating ones), and unique folds are even less susceptible still (their cores are relatively stable). What is most important is not where the boundary is drawn but that there is some distinction among those three states. All H sequences would also tend to aggregate, and therefore, discriminating between oil drops and unique folds also gives a simple way to avoid confounding effects of aggregation. (Although aggregation is interesting in its own right as a possible mechanism of CTB, it is not the focus of this work.) Why should we believe a simple lattice model, particularly a 2D one? The 2D HP lattice model is among the most canoni- cal models for studying sequence spaces of foldamers. The rea- son for the reliance on this model is that, because it is stud- ied by computational exhaustive enumeration, it is unbiased by any preconceptions about the nature of sequence space or arbi- trary choices of energy parameters. This approach is the only unbiased, complete, and practical way to explore plausibilities of physical hypotheses, such as this one. Also, previous studies have shown that this model captures many important principles of folding and sequence-to-structure relationships. The principal value of such modeling is that it generates hypotheses that can B be tested. We note that this model is not necessarily exclusive to proteins. Nucleic acid molecules are also able to fold in water, indicating differential solvation. Although our model focuses on hydropho- bic interactions, it is simply intended as a concrete model of solvation that could more broadly include hydrogen bonding or other interactions. Therefore, although our analysis here is only applicable to foldamers, that does not mean that it is limited to proteins. The unique power that foldable molecules have for catalyzing reactions—in contrast to other nonfoldable polymeric structures—is that foldamers lead to precisely fixing atomic inter- relationships in relatively stable ways over the folding time of the molecule. It resembles a microscale solid, with the capability that Fig. 10. Sequences can evolve to different autocatalytic sets. (A) HP cat- substrates and transition states can recognize, bind, and react to alytic system has at least two attractors. The lines are length distributions those stable surfaces. For example, serine proteases use a cat- from case 3. Again, each line represents distribution of length in the steady alytic triad of three amino acids. Therefore, foldability in some state for one simulation run. It is clear that there are two kinds of distribu- tion which get realized during the simulations. The system bifurcates either type of prebiotic polymer could conceivably have had a special to a state represented by a green line or to one represented by a red one. role in allowing for primitive catalysis. Here, we use a toy model These are the same lines as in Fig. 5A but separated in two sets by the clus- to capture that simple idea, namely that a folded polymer can tering algorithm k-means. (B) Structure of the sequences which most often are main contributors into the total population of the polymers of their length. Upper corresponds to the macrostate shown in red in A, and Lower corresponds to the one shown in green.

position a small number of residues in a way that can catalyze a reaction. Finally, we comment on the spirit of this model. In much of biological modeling, experiments come first, providing the premises for a model that can supply the rest of the chain of logic from premises to conclusions. However, this type of model serves a different goal, more in the spirit of other models in physics. In this predictions first approach, theory precedes and motivates subsequent experiments that can prove or disprove it. In predictions first modeling, the experimental premises are more an outcome than an input. Predictions first modeling is especially crucial for murky problems of science, such as in the origins of life, where even the basic premises are not all yet in plain sight. Our hope here is that this modeling can guide new experiments. Conclusions

Fig. 9. The longer the chains, the bigger the contribution of the autocat- It has been recognized that life’s origins require some form of alysts. Each red line shows how the contribution of autocatalytic chains to autocatalysis (5, 6, 8). However, what molecular structural mech- the biomass of the given length grows with chain length. Different red lines anism might explain it? Here, we find that autocatalysis is inher- correspond to different simulation runs. The black line shows the median ent in the following process (Fig. 7). HP polymers are synthe- over 30 simulations. sized randomly. A small fraction of those HP polymers folds into

E7466 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 26, 2021 Downloaded by guest on September 26, 2021 9 erJ ta.(02 rboial luil ehnssices opstoa diver- compositional increase mechanisms plausible Prebiotically (2012) al. evolu- et become J, kinetics Derr chemical How life: 19. to prelife From (2012) MA Nowak IA, Chen 18. amphiphilic Self-replicating (2009) G Ashkenasy H, pep- Rapaport self-replicating N, A Wagner (1996) B, MR Ghadri Rubinov K, 15. Severin JA, Martinez poly- JR, information-coding Granja of DH, autocatalysis Lee of 14. Onset (2014) S Maslov AV, Tkachenko feedback Autocatalytic 13. biopolymers: self-replicating of Origin (2009) evolv- PG effective Higgs for M, required Wu is catalysis 12. mutual Excess (2012) D Lancet O, Markovitch 11. Segr 10. 6 hc L(92 tblt fppie nhg-eprtr qeu solutions. aqueous high-temperature in peptides of Stability (1992) EL Shock 26. (1993) AD Ellington JW, Szostak 25. poly- reversible replication, sequence Universal (2012) NV Hud MA, Grover SI, Walker 20. replicators. and catalysts Prelife evolution. (2009) of MA origin Nowak H, the Ohtsuki and dynamics 17. Prevolutionary (2008) H Ohtsuki MA, Nowak 16. 9 ea ,OglL,GaiiM 20)Croy ufiemdae rboi formation prebiotic sulfide-mediated Carbonyl (2004) MR Ghadiri LE, Orgel L, polypeptides Leman of synthesis 29. Prebiotic (1970) A Katchalsky J, Berger M, Paecht-Horowitz hydrolysis. 28. bond peptide of equilibria and energies Free (1998) RB Martin 27. enzymes. RNA self-replicating efficient Highly (2014) GF Joyce MP, Robertson repli- 24. RNA cooperative enzyme. among formation RNA network Spontaneous an (2012) al. of et N, replication Vaidya Self-sustained 23. (2009) GF Joyce TA, Lincoln 22. hexadeoxynucleotide. self-replicating A (1986) G Kiedrowski von 21. 4 ersJ,Hl R i ,OglL 19)Snhsso ogpeitcoioeso min- on oligomers prebiotic long of Synthesis (1996) LE Orgel R, Liu AR, life. Hill of JP, basis Ferris physical The 34. (1949) JD surfaces: Bernal mineral on 33. acids amino of polymerization and Adsorption (2008) JF Lambert Or 32. DG, Odom M, Rao world. RNA 31. the of origin the and chemistry Prebiotic (2004) LE Orgel 30. 5 esnK,RbrsnM,Lv ,Mle L(01 ocnrto yeaoainand evaporation by Concentration (2001) SL Miller M, Levy MP, Robertson KE, Nelson 35. ieycntn ihcanlnt,a es pt 5mr Fg 8). (Fig. 25-mers to up least at length, chain with constant tively ltcsrae a endaoe is or above) cat- defined catalytic have (as that surfaces sequences potentially HP alytic possible a all of (2.3% fraction have structures The 25-mers). unique and to possible fold folds all sequences of or HP fraction nonnegligible fold, A pad. not landing hydrophobic do fold, do HP other of elongation the sequences. foldable catalyze having to molecules help can surfaces surfaces. hydrophobic struc- Those pad folded landing those stable of relatively provides fraction tures A states. compact stable relatively ueae al. et Guseva .Kufa A(96 uoaayi eso proteins. of Segr sets Autocatalytic 9. (1986) SA Kauffman (1989) G 8. Nicolis self-organization. I, Prigogine natural 7. (1985) of . F The Dyson principle (1978) P 6. Schuster A M, Eigen (1977) 5. P macro- biological Schuster of M, evolution the Eigen and 4. matter of Selforganization relevance (1971) their M and complexity Eigen sequence of 3. subsets Three (2005) JT macro- Trevors informational DL, Abel of synthesis 2. template-directed Nonenzymatic (1987) G Joyce 1. h Pmdlalw o nisdcutn fsqecsthat sequences of counting unbiased for allows model HP The GR) iei nlsso efrpiaini uulyctltcsets. catalytic mutually in self-replication Biosph of analysis Kinetic (GARD): senschaften molecules. information. biopolymeric to molecules. iyo uli cdsequences. acid nucleic of sity dynamics. tionary 3790. β tide. arXiv:1405.2888v4. mers. world. RNA the jump-start can ability. 4117. assemblies. noncovalent catalytic mutually in transfer omci Acta Cosmochim World RNA Biol Engl prebiotic of initiation the for model evolution. A sequence biopolymers: functional early and merization USA Sci Acad Natl Proc o Biol Mol peptides. of adenylates. amino-acid of polycondensation heterogeneous by 45:351–353. cators. 323:1229–1232. rlsurfaces. eral review. A h rboi ytei fcytosine. of synthesis prebiotic the 0.3% setpeptides. -sheet ,Lne ,KdmO iplY(98 rddatctlssrpiaindomain replication autocatalysis Graded (1998) Y Pilpel O, Kedem D, Lancet D, e ,BnEiD actD(00 opstoa eoe:Peitcinformation Prebiotic genomes: Compositional (2000) D Lancet D, Ben-Eli D, e 21:238–245. ´ ´ 25:932–935. Nature Nature ri Life Artif 28:501–514. ftewoesqec pc.Teerto eanrela- remain ratios These space. whole-sequence the of 39:99–123. rgLf vlBiosph Evol Life Orig Naturwissenschaften Biol Quant Symp Harb Spring Cold Cl pigHro a rs,Paniw Y,p 511–533. pp NY), Plainview, Press, Lab Harbor Spring (Cold 64:541–565. Science 382:525–528. Nature 491:72–77. rgn fLife of Origins 18:243–66. 56:3481–3491. ne hmItE Engl Ed Int Chem Angew c hmRes Chem Acc 306:283–286. LSOne PLoS 18)Casi rbooia chemistry. prebiological in Clays (1980) J o ´ 381:59–61. 105:14924–14927. xlrn Complexity Exploring ho ilMdModel Med Biol Theor CmrdeUi rs,Cmrde UK). Cambridge, Press, Univ (Cambridge uli cd Res Acids Nucleic 7:e34166. o Evol Mol J 38:211–242. 45:2088–2096. 58:465–523. nVtoSlcino ucinlNcecAisi the in Acids Nucleic Functional of Selection Vitro In rgLf vlBiosph Evol Life Orig 69:541–54. rcPy o B Soc Phys Proc Naturwissenschaften 48:6683–6686. 12.7% 52:41–51. 40:4711–4722. S.Mri’ rs,NwYork). New Press, Martin’s (St. ho Biol Theor J 2:29. rcNt cdSiUSA Sci Acad Natl Proc fflal sequences foldable of 31:221–229. 62:597–618. o egh pto up lengths for rcBo Sci Biol Proc o Evol Mol J 119:1–24. Nature 65:7–41. ne hmItEd Int Chem Angew rtRvBiochem Rev Crit rgLf Evol Life Orig 228:636–639. Biopolymers 15:317–331. 276:3783– Naturwis- 97:4112– Geochim Science Chem 6 Irb 66. Irb Mutational 65. landscapes: evolutionary Modeling and lattice (1999) Triangular HS a Chan on folding E, protein Bornberg-Bauer for rules 64. Local (1997) proteins. al. globular et in organization R, structural Agarwala tertiary 63. of in Forces exchange (1995) hydrogen KA for Dill K, model Yue mechanical statistical 62. A (1995) KA Dill and DW, conformational the Miller of model 61. mechanics statistical lattice an A (1989) by KA driven Dill KF, protocells Lau model between 60. Competition (2013) JW Szostak K, Adamala 59. catalysts. peptide-based with Duschmal reactions 56. Site-selective (2016) SJ Miller MW, catalysts. protein Giuliano of Design 55. (2013) D Hilvert 54. library unevolved an from Proteins (2012) MH Hecht AN, Koehler M, Korolev I, Cherny 53. 8 oiD,e l 19)Sreigo irr fpaedslydppie identifies peptides phage-displayed of library a of Screening (1999) al. et qual- DJ, spectral the Rodi modulate that 58. peptides of Evolution (1998) GP Nolan MN, Rozinov 57. designed novo De (2011) MH Hecht SR, Viola LH, Bradley KL, copolymers. McKinley unit-vector and Ma, proteins a Fisher of soup” by 52. space “Sequence alignment (1991) KA structure Dill HS, RNA Chan (2008) 51. MA com- Marti-Renom a into E, polymer nonbiological Capriotti a Folding 50. (2005) KA Dill RN, Zuckermann BC, manifesto. Lee A Foldamers: 49. (1998) SH Gellman 48. 7 ersJ 19)Peitcsnhsso ieas rdigtepeitcadRNA and prebiotic the Bridging minerals: on synthesis Prebiotic (1999) JP phosphorimida- Ferris uridine 47. of Oligomerization (1996) JP Ferris K, Kawamura PZ, Ding 46. 5 acn ,Mle L(96 h rgnaderyeouino ie rboi chemistry, Prebiotic life: of evolution early and origin systems The (1996) hydrothermal SL submarine Miller of A, role Lazcano The 45. (2009) JL Bada HJ, Cleaves AD, Aubrey 44. W C, formaldehyde Huber and cyanide 43. hydrogen for yields Energy (1987) SL Miller R, Stribling 42. induced salt of combination The (1999) J Bujdak Y, Suwannachot HL, Son wet-dry BM, Rode by driven 39. formation bond amide Ester-mediated (2015) al. et JG, Forsythe 38. 1 lr J(1953) life. of PJ origin the Flory and Peptides 41. (1999) BM Rode 40. report. status facilitate A ice earth: on in began phases life How Eutectic (2004) (2001) JL Bada DW 37. Deamer PA, Monnard A, Kanavarioti 36. fSine fc fBscEeg cecs fteU eateto Energy of Department US the Grant Office of the Sciences, DE-AC02-05CH11231. Foundation Contract by Energy received under Basic Science supported We of was National Office Foundry software. Science, and Molecular of the with Center at help Work Laufer MCB1344230. and the from discussions support fruitful Luk for Miha and Bernatskiy Pohorille, Andrew Gellman, ACKNOWLEDGMENTS. action. their pre- for mechanism testable kinetic autocatalytic and suggest structural be in a can could provides folding sequences model and polymer that This early suggest what rare. (96) for not dictions rule is folding pro- polymers HP active HP the biologically on foldable, based of teins designs successful and This etd omto ecinadca aayi:Awyt ihrppie ne primi- under peptides higher to way conditions. A earth catalysis: tive clay and reaction formation peptide earth. prebiotic the on polypeptides to Engl path possible A cycles: 1–15. oezmtcncecai synthesis. acid nucleic nonenzymatic J space. sequence in superfunnels 96:10689–10694. and topology, stability, model. HP the in hydrophobicity Generalized USA Sci Acad Natl Proc proteins. globular proteins. of spaces sequence catalyst. encapsulated Chem Curr Top molecules. small of range a bind sequences 130–138. designed novo de of Phys protein. taxol-binding a as bcl-2 human fluorophores. small-molecule bound, of ities (Camb) Commun Chem enable and Coli Escherichia in function sequences growth. artificial cell of library a from proteins Phys approach. structure. multihelical pact worlds. minerals. on RNA of synthesis Biosph prebiotic Evol the Life for Orig model A montmorillonite: on zolides h r-N ol,adtime. and world, pre-RNA the life. of acids. amino origin of synthesis the the in for Implications surfaces: (Ni,Fe)S 672. on CO ocean. primitive the in concentrations acid Biosph amino Evol and HCN The syntheses: 688. 79:2252–2258. c ,Toi 20)Eueaigdsgigsqecsi h Pmodel. HP the in sequences designing Enumerating (2002) C Troein A, chains. ack protein in correlations hydrophobicity On (2000) E Sandelin A, ack ¨ ¨ 28:1–15. 95:3775–3787. 54:9871–9875. ilBull Biol ,KhtS enmr 21)Ppiectlssi qeu emulsions. aqueous in catalysis Peptide (2014) H Wennemers S, Kohrt J, e ´ Bioinformatics 17:261–273. LSOne PLoS achtersh ¨ 372:157–201. rnilso oye Chemistry Polymer of Principles 196:311–314. rti Sci Protein ue 19)Ppie yatvto faioaiswith acids amino of activation by Peptides (1998) G auser 26:151–171. ¨ rgLf vlBiosph Evol Life Orig 6:e15364. a Chem Nat 92:146–150. 50:8109–8112. etakNc u,L u,Aa eGaf Sam Graff, de Adam Guo, Li Hud, Nick thank We 24:i112–i118. PNAS mCe Soc Chem Am J Macromolecules 4:1860–1873. Cell rgLf vlBiosph Evol Life Orig 5:495–501. | 85:793–798. Astrobiology ulse nieAgs 2 2017 22, August online Published o Biol Mol J 127:10999–11009. 29:273–286. c hmRes Chem Acc nuRvBiochem Rev Annu hmBiol Chem optBiol Comput J 22:3986–3997. i o hi nihsadAnton and insights their for sic ˇ Peptides CrelUi rs,Ihc,N) p NY), Ithaca, Press, Univ (Cornell 1:271–281. 285:197–203. 39:91–108. 5:713–28. 20:773–786. 31:173–180. at lntSiLett Sci Planet Earth rcNt cdSiUSA Sci Acad Natl Proc 4:275–296. 82:447–470. ne hmItEd Int Chem Angew Science C yt Biol Synth ACS | 281:670– rgLife Orig Biophys Chem J E7467 Biol J 226: 1:

BIOPHYSICS AND PNAS PLUS COMPUTATIONAL BIOLOGY 67. Cui Y, Wong WH, Bornberg-Bauer E, Chan HS (2002) Recombinatoric exploration of 84. Manabe K, Sun XM, Kobayashi S (2001) Dehydration reactions in water. Surfactant- novel folded structures: A heteropolymer-based model of protein evolutionary land- type brønsted acid-catalyzed direct esterification of carboxylic acids with alcohols in scapes. Proc Natl Acad Sci USA 99:809–814. an emulsion system. J Am Chem Soc 123:10101–10102. 68. Moreno-Hernandez´ S, Levitt M (2012) Comparative modeling and protein-like 85. Manabe K, Iimura S, Sun XM, Kobayashi S (2002) Dehydration reactions in water. features of hydrophobic-polar models on a two-dimensional lattice. Proteins 80: Brønsted acid-surfactant-combined catalyst for ester, ether, thioether, and dithioac- 1683–1693. etal formation in water. J Am Chem Soc 124:11971–11978. 69. Sikosek T, Chan HS, Bornberg-Bauer E (2012) Escape from adaptive conflict follows 86. Stachelhaus T, Mootz HD, Bergendahl V, Marahiel MA (1998) Peptide bond from weak functional trade-offs and mutational robustness. Proc Natl Acad Sci USA formation in nonribosomal peptide Biosynthesis. J Biol Chem 273:22773– 109:14888–14893. 22781. 70. Sikosek T, Chan HS (2014) Biophysics of protein evolution and evolutionary protein 87. Marahiel MA (2009) Working outside the protein-synthesis rules: Insights into non- biophysics. J R Soc Interface 11:20140419. ribosomal peptide synthesis. J Pept Sci 15:799–807. 71. Yue K, Dill Ka (1992) Inverse protein folding problem: Designing polymer sequences. 88. Dill KA (1999) Polymer principles and protein folding. Protein Sci 8:1166– Proc Natl Acad Sci USA 89:4163–4167. 1180. 72. Xiong H, Buckwalter BL, Shieh HM, Hecht MH (1995) Periodicity of polar and non- 89. Giugliarelli G, Micheletti C, Banavar JR, Maritan A (2000) Compactness, aggregation, polar amino acids is the major determinant of secondary structure in self-assembling and prionlike behavior of protein: A lattice model study. J Chem Phys 113:5072– oligomeric peptides. Proc Natl Acad Sci USA 92:6349–6353. 5077. 73. Lipman DJ, Wilbur WJ (1991) Modelling neutral and selective evolution of protein 90. Ghosh K, Dill KA (2009) Computing protein stabilities from their chain lengths. Proc folding. Proc R Soc B 245:7–11. Natl Acad Sci USA 106:10649–10654. 74. Lim WA, Sauer RT (1991) The role of internal packing interactions in determining the 91. Sievers A, Beringer M, Rodnina MV, Wolfenden R (2004) The ribosome as an entropy structure and stability of a protein. J Mol Biol 219:359–376. trap. Proc Natl Acad Sci USA 101:7897–7901. 75. Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH (1993) Protein design by binary 92. Bernatskiy AV, Guseva EA (2016) Exact rule-based stochastic simulations for the system patterning of polar and nonpolar amino acids. Science 262:1680–1685. with unlimited number of molecular species. arXiv:1609.06403v2. 76. Wei Y (2003) Stably folded de novo proteins from a designed combinatorial library. 93. Gillespie DT (1976) A general method for numerically simulating the stochastic time Protein Sci 12:92–102. evolution of coupled chemical reactions. J Comput Phys 22:403–434. 77. Brisendine JM, Koder RL (2016) Fast, cheap and out of control – insights into ther- 94. Abkevich VI, Gutin AM, Shakhnovich EI (1996) How the first biopolymers could have modynamic and informatic constraints on natural protein sequences from de novo evolved. Proc Natl Acad Sci USA 93:839–844. protein design. Biochim Biophys Acta 1857:485–492. 95. Vasas V, et al. (2015) Primordial evolvability: Impasses and challenges. J Theor Biol 78. Lijnzaad P, Berendsen HJC, Argos P (1996) Hydrophobic patches on the surfaces of 381:29–38. protein structures. Proteins 25:389–397. 96. Murphy GS, Greisman JB, Hecht MH (2015) De novo proteins with life-sustaining func- 79. Tonddast-Navaei S, Skolnick J (2015) Are protein-protein interfaces special regions on tions are structurally dynamic. J Mol Biol 428:399–411. a protein’s surface? J Chem Phys 143:243149. 97. Dill KA, et al. (2008) Principles of protein folding - a perspective from simple exact 80. Mitchell Guss J, Freeman HC (1983) Structure of oxidized poplar plastocyanin at 1·6 A˚ models. Protein Sci 4:561–602. resolution. J Mol Biol 169:521–563. 98. Bryant RAR, Hansen DE (1996) Direct measurement of the uncatalyzed rate of hydrol- 81. van Ee JH, Onno M (1997) Enzymes in Detergency, ed van Ee JH (CRC, Boca Raton, ysis of a peptide bond. J Am Chem Soc 118:5498–5499. FL), p 408. 99. Smith RM, Hansen DE (1998) The pH-rate profile for the hydrolysis of a peptide bond. 82. Witt H (1998) Tryptophan 121 of subunit II is the electron entry site to cytochrome-c J Am Chem Soc 120:8910–8913. oxidase in paracoccus denitrificans. Involvement of a hydrophobic patch in the dock- 100. Ghosh K, Dill KA (2010) Cellular proteomes have broad distributions of protein stabil- ing reaction. J Biol Chem 273:5132–5136. ity. Biophys J 99:3996–4002. 83. Nelson DL, Cox MM (2008) Lehninger Principles of Biochemistry (Freeman, New York), 101. Dill KA, Ghosh K, Schmit JD (2011) Physical limits of cells and proteomes. Proc Natl 5th Ed. Acad Sci USA 108:17876–17882.

E7468 | www.pnas.org/cgi/doi/10.1073/pnas.1620179114 Guseva et al. Downloaded by guest on September 26, 2021