Surveying the sequence diversity of model prebiotic by mass spectrometry

Jay G. Forsythea,1, Anton S. Petrova, W. Calvin Millarb,2, Sheng-Sheng Yuc, Ramanarayanan Krishnamurthyd, Martha A. Groverc, Nicholas V. Huda, and Facundo M. Fernándeza,3

aSchool of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400; bSchool of Physics, Georgia Institute of Technology, Atlanta, GA 30332-0430; cSchool of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0100; and dDepartment of Chemistry, The Scripps Research Institute, La Jolla, CA 92037

Edited by Ken A. Dill, Stony Brook University, Stony Brook, NY, and approved August 7, 2017 (received for review June 29, 2017) The rise of peptides with secondary structures and functions would (Fig. 1A). Subjecting these mixtures to sequential hot-dry/cool-wet have been a key step in the chemical evolution which led to . As evaporation and rehydration cycles led to depsipeptides— with modern biology, sequence would have been a -like oligomers containing both amide and ester backbone primary determinant of peptide structure and activity in an origins- linkages (Fig. 1B). Ester linkages are kinetically and thermody- of-life scenario. It is a commonly held hypothesis that unique namically favored but susceptible to hydrolysis, whereas amide functional sequences would have emerged from a diverse soup of linkages are more stable; therefore, depsipeptide sequences became proto-peptides, yet there is a lack of experimental data in support of progressively enriched with amide bonds over the course of various this. Whereas the majority of studies in the field focus on peptides dry–wet cycling programs. In the absence of hydroxy acids, peptide containing only one or two types of amino acids, here we used bond formation did not spontaneously occur, confirming the plau- modern mass spectrometry (MS)-based techniques to separate and sible role of hydroxy acids as cobuilding blocks of proto-peptides. At sequence de novo proto-peptides containing broader combinations drying temperatures of 55–65 °C, depsipeptide chains contained of prebiotically plausible monomers. Using a dry–wet environmental predominantly ester bonds, whereas amide bonds became more cycling protocol, hundreds of proto-peptide sequences were formed prevalent at drying temperatures above 65 °C. Most reactions tested over a mere 4 d of reaction. Sequence homology diagrams were had an initial pH of ∼3 due to the hydroxy acid monomers present, constructed to compare experimental and theoretical sequence but depsipeptide formation also occurred at pH values of 5, 7, and 9. spaces of tetrameric proto-peptides. MS-based analyses such as this Successful depsipeptide formation with various amino acid and hy- will be increasingly necessary as origins-of-life researchers move to- droxy acid monomers was achieved, including glycine, glycolic ward systems-level investigations of prebiotic chemistry. acid, L-alanine, D-alanine, L-lactic acid, L-leucine, and L-serine. Motivated by our discovery that ester–amide exchange reactions prebiotic chemistry | chemical evolution | peptides | mass spectrometry | can produce mixed depsipeptides capable of chemical evolution, depsipeptides we sought to explore the sequence diversity of these species when more plausible mixtures of amino acid and hydroxy acid monomers n 1953, Miller demonstrated the abiotic synthesis of amino were subjected to repeated cycles. As with traditional proteomics Iacids in the now-famous spark-discharge experiment (1). Soon after, Fox and Harada began to explore the formation of peptides Significance from amino acids via condensation at high temperatures (2). These reactions were facilitated by proportionally higher concentrations Peptides and are essential for life as we know it, and of amino acids with acidic side chains (e.g., aspartic acid, D) and likely played a critical role in the origins of life as well. In recent resulted in the production of “proteinoids,” condensation products years, much progress has been made in understanding plausi- containing covalent cross-links not found in coded proteins. In ble routes from amino acids to peptides. However, little is their 1960 manuscript, Fox and Harada speculated about the di- known about the diversity of sequences that could have been versity of proteinoid sequences, yet conceded that “a complete produced by abiotic condensation reactions on the prebiotic answer to the question of whether the amino acid residues are earth. In this study, multidimensional separations were cou- distributed in a random or other arrangement may require a pled with mass spectrometry to detect and sequence mixtures complete assignment of residues in one molecular species,” atask of model proto-peptides. It was observed that, starting with a beyond the analytical capabilities of the time (3). few monomers, proto-peptide diversity increased rapidly fol- In subsequent years, the proteinoid concept was displaced by lowing cycling. Experimental proto-peptide sequences were the hypothesis that either RNA or proto-RNA gave rise to life compared with theoretically random sequences, revealing a (4–6). Nevertheless, to this day, condensation reactions of amino high sequence diversity of plausible monomer combinations. acids are thought to have played a key role in a potentially

symbiotic proto- and proto-peptide world (7, 8). Author contributions: J.G.F., A.S.P., R.K., M.A.G., N.V.H., and F.M.F. designed research; Peptide condensation studies at temperatures lower than those J.G.F., A.S.P., W.C.M., and S.-S.Y. performed research; J.G.F., R.K., M.A.G., N.V.H., and of Fox and Harada, or with the aid of chemical agents, confirmed F.M.F. analyzed data; and J.G.F. and F.M.F. wrote the paper. that abiotic production of peptides was indeed possible, but The authors declare no conflict of interest. chain lengths were generally limited to dimers and trimers with This article is a PNAS Direct Submission. low yields (9). In recent studies, this length barrier has been Data deposition: All depsipeptide sequences reported in this paper can be accessed in surpassed, and certain proto-peptides have been shown to form Dataset S1. aggregate structures (10, 11). Nevertheless, the majority of 1Present address: Department of Chemistry and Biochemistry, College of Charleston, proto-peptide studies have been limited in scope, typically con- Charleston, SC 29424. 2Present address: Master’s Industrial Internship Program, University of Oregon, Eugene, taining only one or two types of amino acid monomer. OR 97403. We recently introduced a model prebiotic pathway for peptide 3 – To whom correspondence should be addressed. Email: facundo.fernandez@chemistry. formation based on ester amide exchange reactions between gatech.edu. α α -amino acids and -hydroxy acids (12), amino acid structural This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. analogs found in meteorites and model prebiotic reactions (13, 14) 1073/pnas.1711631114/-/DCSupplemental.

E7652–E7659 | PNAS | Published online August 28, 2017 www.pnas.org/cgi/doi/10.1073/pnas.1711631114 Downloaded by guest on September 28, 2021 A C PNAS PLUS

DE

B R R

R

generic depsipeptide CHEMISTRY

Fig. 1. Experimental overview and example dataset. (A) Hydroxy and amino acid monomers used in this study. Hydroxy acids are abbreviated with lowercase letters corresponding to their analogous amino acids (e.g., glycolic acid = g; glycine = G). All chiral species were in their l form only. (B) Depsipeptides formed by ester–amide exchange contained a mixture of ester and amide backbone linkages and were initiated by a hydroxy acid at the “N” terminus. Depsipeptides were formed from hydroxy acid and amino acid mixtures by alternating polymerizing (85 °C dry-down, 18 h, open vial) and hydrolyzing (65 °C aqueous, 6 h, closed vial) conditions. Each cycle consisted of one polymerization and one hydrolysis step over a period of 24 h. (C) General workflow of depsipeptide separation and sequencing. Depsipeptides were separated by UPLC and traveling-wave IM (TWIM) before data-dependent MS/MS acquisition. To evaluate MS/MS data, an algorithm was developed which created all possible sequences (based on monomer inputs and polymer length). Theoretical MS/MS spectra were generated and were matched to experimental data using commercial MS software. (D) Example UPLC chromatogram of a mixture of a-, G-, A-, and BIOCHEMISTRY L-containing depsipeptides formed after four dry-wet cycles. (Inset) The extracted TWIM signal of m/z 231.11, which eluted at 9.4 min, is shown. (E) Negative- mode MS/MS spectrum of m/z 231.11 from D. The resulting sequence is aAA.

research, mass spectrometry (MS) was the logical choice for the mental data were matched with theoretical sequences containing investigation of depsipeptide primary structure (i.e., monomer se- hydroxy acids and amino acids via accurate mass and tandem mass quence) in the complex reaction mixtures. Significant develop- spectral similarity searches. Using this workflow, hundreds of dep- ments in MS and ancillary technologies (15, 16) have dramatically sipeptide sequences were surveyed after only 4 d of cycling. Experi- advanced the ability to deeply characterize biological peptidomes mentally observed depsipeptide sequences were compared with the and proteomes, with MS-based methods enabling the detection of theoretically possible permutations using homology diagrams in thousands of unique peptides and their modifications in a single order to investigate the sequence space of plausible, de-novo– chromatographic separation (17–19). synthesized proto-peptides in an origins-of-life context. Advances in proteomics, however, have not yet translated to the study of complex proto-peptide mixtures in an origins-of-life “Proto-Peptidomics” Workflow for the Analysis of scenario. One of the most daunting challenges associated with Depsipeptide Mixtures this question is the inherent vastness of proto-peptide sequence The primary justification for using a proteomics-inspired methodol- space, which may be many orders of magnitude greater than that ogy in this work stems from the chemical similarities between pep- of peptides encoded by in living cells (20). If a model tides and depsipeptides, as demonstrated by several studies (33–36). prebiotic pool of only 10 amino acids and 10 hydroxy acids is In designating monomers and/or residues, a similar nomenclature considered, and equal monomer reactivity is assumed, the po- system to that of Deechongkit et al. (34) was used, where an amino tential number of depsipeptide sequences with length 10 would acid residue is capitalized (e.g., glycine = G) and its corresponding be (10 + 10)10,or∼1013. As a result of this staggering number, hydroxy acid residue is lowercase (e.g., glycolic acid = g). progress on the characterization of plausibly prebiotic amino A general overview of the experimental workflow is provided acid pools (1, 13, 21–25), abiotic mechanisms for peptide self- in Fig. 1; detailed methods and raw data are provided in SI assembly (10, 26–30), and the evolution of early ribosomes (31, Appendix. Three prebiotic amino acids, glycine (G), L-alanine 32) has not been paralleled by a systems-level understanding of (A), and L-leucine (L), were mixed with one hydroxy acid: gly- proto-peptide diversity. colic acid (g), L-lactic acid (a), or L-malic acid (d). Each mixture Here, a peptidomic workflow was developed for mapping the se- contained one hydroxy acid and three amino acids (g+G+A+L, quence diversity of depsipeptide libraries generated from combina- a+G+A+L, and d+G+A+L) and was subjected to one, two, or tions of plausibly prebiotic monomers. Because depsipeptides are not four 85 °C-dry/65 °C-wet cycles chosen to model a daily evapo- found in current databases, an in-house algorithm was de- rating pool environment on the early Earth (12). The resulting veloped to generate custom in silico sequences. Processed experi- nine depsipeptide libraries were examined by ultraperformance

Forsythe et al. PNAS | Published online August 28, 2017 | E7653 Downloaded by guest on September 28, 2021 liquid chromatography –ion mobility–tandem mass spectrometry tures were observed in the lactic acid (a+G+A+L) dataset, and (UPLC-IM-MS/MS) in data-dependent acquisition (DDA) mode. 426 features were observed in the malic acid (d+G+A+L) dataset Similar to traditional proteomics experiments, experimental and (Table 1). theoretical sequences were matched following a ranking score (Fig. The total number of depsipeptide compositions (e.g., 2a+1G+1A) 1C). Sequence matching was achieved by the development of a script confirmed by accurate mass was 176 for the g+G+A+L dataset, that generated accurate masses and fragment ion databases for all 239 for the a+G+A+L dataset, and 45 for the d+G+A+L dataset. possible compositions and oligomer lengths within the experimental Three-dimensional maps (m/z,Rt,dt)ofconfirmeddepsipeptide mass-to-charge ratio (m/z) range observed. Databases were then compositions are shown in Fig. 2 A–C. All depsipeptides con- uploaded into a software package for comparison with experimental tained at least one hydroxy acid residue (at the “N terminus”)and data. All sequence identifications were manually validated. at least one amino acid. Features that tentatively matched dep- Example UPLC-IM and MS/MS data are shown in Fig. 1 D sipeptide masses but were not mass-selected for MS/MS during and E, respectively. As expected, the datasets were highly com- DDA, or were not abundant enough to generate quality MS/MS plex, even after one dry–wet cycle. When depsipeptides eluted spectra, were excluded from these plots. Overall, the results pre- from the UPLC column and the precursor ions were selected for sented in Fig. 2 revealed highly complex mixtures, with hundreds of MS/MS activation, their fragmentation patterns were generally detected species even after a few environmental cycles. consistent with those for biological peptides (37, 38). Therefore, An interesting observation from the data in Fig. 2 A–C is that standard peptide fragmentation nomenclature (39) was used (SI malic-acid-containing depsipeptides were much fewer in number Appendix, Figs. S1 and S2). than glycolic acid or lactic-acid-containing depsipeptides. Al- though the overall number of features observed in the d+G+A+L Mapping Depsipeptide Chemical Space dataset was similar to the g+G+A+L and a+G+A+L datasets, Across all cycles, a total of 467 unique features, or unique m/z, the number of confirmed linear depsipeptide sequences was UPLC retention time (Rt), and IM drift time (dt) combinations significantly lower. We initially hypothesized that malic-acid- were observed in the glycolic acid (g+G+A+L) dataset, 624 fea- based depsipeptides would be numerous, as malic acid (d) is

Table 1. Experimentally observed and theoretically possible depsipeptides, with respect to oligomer length

Effective length, No. of Accurate mass Unique sequences Theor. sequence † ‡ length, n n − 1* features composition matches by MS/MS§ combinations

(g+G+A+L) 21 4 1 4 3 2 20 10 16 4 3 35 37 64 5 4 41 73 256 6 5 44 91 1,024 7 6 22 72 4,096 8 7 10 33 16,384 Total 467 176 317 21,844

(a+G+A+L) 21 13 2 4 3 2 42 12 16 4 3 52 37 64 5 4 59 83 256 6 5 55 110 1,024 7 6 18 67 4,096 8 7 0 0 16,384 Total 624 239 311 21,844

(d+G+A+L) 21 10 2 4 3 2 20 12 16 4 3 15 14 64 5 4 0 0 256 6 5 0 0 1,024 7 6 0 0 4,096 8 7 0 0 16,384 Total 426 45 28 21,844

*Effective depsipeptide length, not accounting for the first hydroxy acid residue. This corresponds to the number of variable monomers per oligomer. †Number of unique m/z, retention time, and IM arrival-time combinations detected by commercial MS soft- ware after chromatographic alignment. ‡Mass tolerance ±15 ppm. §Unique sequences determined from MS/MS spectra (manually confirmed). For short oligomers, the same sequence was often observed at multiple retention times due to in-source fragmentation of longer, ester- rich depsipeptides.

E7654 | www.pnas.org/cgi/doi/10.1073/pnas.1711631114 Forsythe et al. Downloaded by guest on September 28, 2021 ABC PNAS PLUS

DEF CHEMISTRY BIOCHEMISTRY

Fig. 2. Mapping of the depsipeptide chemical space, combined from 1, 2, and 4 cycle samples. (A–C) Three-dimensional UPLC-IM-MS maps of (A)g+G+A+L depsipeptide compositions detected, (B)a+G+A+L depsipeptide compositions detected, and (C)d+G+A+L depsipeptide compositions detected. Depsipep- tides containing glycolic-acid or lactic-acid residues were more numerous than those containing malic acid. Bubble size corresponds to abundance (log), and bubble color corresponds to amino acid composition, such that darker colors represent depsipeptides composed primarily of amino acids and lighter colors represent depsipeptides composed primarily of hydroxy acids. (D–F) Chemical space changed as a function of environmental cycling for (D)g,G,A,andL depsipeptides, (E) a, G, A, and L depsipeptides, and (F) d, G, A, and L depsipeptides. As shown in the yellow region, many new glycolic acid depsipeptides were formed over cycling and were enriched in amino acids. Bubble size corresponds to abundance (log) and bubble color to amino acid character, as in A–C.All depsipeptide identities were confirmed by both accurate mass and MS/MS.

similar to aspartic acid (D) used to form Fox’s proteinoids. The to form new depsipeptides in the d+G+A+L system, potentially carboxylic acid moiety of the malic acid side chain could be ca- similar to diketopiperazine formation in amino-acid-only experi- pable of polymer branching (Fig. 1A), and its polarity was ments. Because the diversity of depsipeptide mixtures is dependent expected to improve oligomer solubility characteristics. There- upon not only the formation of new oligomers but also the hy- fore, it was somewhat surprising to observe that malic acid drolysis of previously formed oligomers during the “wet phase,” generated the fewest detectable depsipeptides of the three hy- less-reversible cyclization of malic-acid-containing depsipeptides droxy acids investigated. may have hindered further sequence diversification and chain Cyclic depsipeptides lose an additional water molecule when elongation. Nevertheless, the chemical structure(s) of cyclo-dL and formed, and thus have a mass 18.0106 Da lower than their cor- related products will be the subject of future study. responding linear depsipeptides. In our initial data analysis, Depsipeptide chemical space maps enabled comparisons of rel- water-loss species were not included in the m/z databases, as they ative abundance, composition, and hydrophobicity of the three can also occur through unavoidable in-source collision-induced types of linear depsipeptide systems examined. Relative abundances dissociation processes of linear species. Manual reinvestigation were found to vary across several orders of magnitude (Fig. 2 A–C; of the d+G+A+L dataset, however, indicated that malic acid and log relative abundance corresponds to sphere size). For all three leucine cyclic species (cyclo-dL)––isobaric with water-loss frag- systems, species with primarily ester linkages, mixtures of ester and ments, but with unique Rt and dt values––were indeed present in amide linkages, and exclusively amide linkages were observed (Fig. moderate abundance (SI Appendix, Figs. S3–S5). We hypothesize 2 A–C; amide bond content corresponds to sphere color). It should steric hindrance from bulky leucine side chains in cyclic dL prod- be noted that exclusively amide-linked oligomers are technically not ucts restricted water access to ester bonds and, in turn, led to depsipeptides; for clarity purposes, however, these have been in- reduced hydrolysis rates of those species back to the linear form. cluded under the depsipeptide umbrella. Depsipeptide hydropho- This could result in a reduced amount of reactive groups available bicity, evaluated via UPLC Rt, varied as well. Leucine-containing

Forsythe et al. PNAS | Published online August 28, 2017 | E7655 Downloaded by guest on September 28, 2021 species typically eluted later in the chromatographic separations A due to the more favorable association of its nonpolar side chain with the C18 stationary phase. IM measurements revealed an in- teresting linear correlation between neutral depsipeptide arrival times and oligomer length (SI Appendix,Fig.S6), indicating that the oligomers detected were likely unstructured. Deviation from such ion mobility trends could be useful in future searches of proto- peptide chemical space to find species with significantly different structures from the population mean, with such characteristics po- tentially indicating the emergence of secondary structure (40, 41). Two-dimensional chemical space maps (Rt and dt) were also generated for one, two, and four environmental cycles to gain in- sight into depsipeptide evolution (Fig. 2 D–F). It was observed that, B with increased number of cycles, a number of new glycolic-acid- containing depsipeptides were formed (yellow areas). For lactic- acid-containing species, new depsipeptides were formed as well, although within similar regions of Rt and dt space. As mentioned earlier, the length of malic-acid-containing species did not evolve appreciably, which is attributed to the formation of cyclic species. Overall, linear depsipeptides typically ranged from 2 to 8 residues in length, which is comparable to previous observations (12). Oligo- mers up to 14 residues in length have been previously observed (12), but such length typically requires a larger number of envi- ronmental cycles and/or a replenishing of monomers. C Depsipeptide Sequence Isomers The number of theoretically possible sequences increases greatly with oligomer length (Table 1). Therefore, we chose a multidi- mensional separation method with a high peak capacity, on the order of 105–106 (42–44). This method led to the detection of hundreds of sequences of glycolic-and lactic-acid-containing depsipeptides (Table 1), even separating sequence isomers on certain occasions (SI Appendix, Figs. S7 and S8). By nature, se- quence isomers have identical precursor ion m/z values which are indistinguishable by the quadrupole mass filter used to isolate species before fragmentation. Despite our method’s peak ca- pacity, a significant number of sequence isomers still overlapped in both UPLC and IM dimensions, resulting in coelution and MS/MS coselection. In the majority of these coselection cases, Fig. 3. Selected examples of changes in depsipeptide relative abundance between two and five individual sequence isomers were found by after environmental cycling. (A) Coeluting sequence isomers dLA and dAL manual interpretation of MS/MS fragmentation data. For longer and coeluting sequence isomers gLGGg, gLGgG, and gGGLg increased in – relative abundance with cycling. Enriched sequences contain primarily depsipeptides (5 8 residues in length), up to 12 potential se- amino-acid residues with amide linkages. (B) Coeluting sequence isomers aaL quence isomers were found in selected cases. We also observed and aLa and dA decreased in relative abundance with cycling. Sequences examples of longer depsipeptides fragmenting at internal ester such as these are either short in length––likely involved in forming other linkages in the ion source region of the mass spectrometer, even species––or contain ester linkages. (C) Coeluting sequence isomers aaGGa under relatively soft electrospray ionization conditions (SI Ap- and aGaGa and coeluting sequence isomers gAGg, gGAg, and gGgA did not pendix change significantly between 1, 2, and 4 cycles. Sequences such as these may , Fig. S9). This is why a number of short sequences were be intermediates in the formation of amide-rich sequences, but also may be detected multiple times in the same dataset (Table 1). hydrolysis products of longer, more ester-rich sequences. The relative abundances of certain species were found to change in a systematic fashion during the cycling process. The most commonly observed trend was the accumulation of new Comparing Theoretical and Experimental Depsipeptide sequences with cycling. Sequences composed of >50% amino Sequences acid typically increased in relative abundance over time (selected Homology diagrams (45) were generated for all theoretically examples in Fig. 3A), suggesting the rate of formation by ester possible combinations and experimental sequences of tetrameric condensation and subsequent ester–amide exchange outweighed species to examine sequence space as a function of the number the rates of hydrolysis and elongation into different species. The of environmental cycles (Fig. 4). In these diagrams, the height of opposite trend was observed for ester-rich sequences of short-to- a letter corresponds to residue conservation. Taller letters in- moderate length, as well as very short sequences (selected ex- dicate low sequence diversity (that is, low entropy) at a given amples in Fig. 3B). Interestingly, we also observed species whose position, whereas shorter letters indicate high sequence diversity relative abundances were rather consistent over cycling (selected (high entropy) at a position. As a visual comparison tool, the examples in Fig. 3C). We hypothesize such sequences, over this theoretical homology diagram shown in Fig. 4A is what would be short time period, formed and hydrolyzed at relatively similar obtained if all possible tetramers were produced in equal yields rates. Such species may also be hydrolysis products of longer, and differences in monomer reactivity were ignored. In this di- ester-rich sequences. agram, the likelihood of forming any backbone linkage, either

E7656 | www.pnas.org/cgi/doi/10.1073/pnas.1711631114 Forsythe et al. Downloaded by guest on September 28, 2021 AC PNAS PLUS

2.0 2.0

1.0 1.0 bits bits

0.0 0.0 123 4 Residue 2.0

1.0 B bits

0.0 2.0 2.0 CHEMISTRY 1.0 1.0 bits bits

0.0 0.0 123 4 123 412233441 Residue Residue Residue Residue BIOCHEMISTRY Fig. 4. Comparison of theoretical and experimentally observed tetramer sequences using sequence homology diagrams. Larger letters indicate a residue is conserved and does not change appreciably with cycling. As letters become smaller, residues in the specific position become more random in nature and sequences more diverse. (A) Theoretical sequence distribution of a purely random mixture of depsipeptides containing a generic hydroxy acid (x) and glycine (G), alanine (A), and leucine (L) residues. Residues 2–4 would contain equal frequencies of x, G, A, and L monomers, assuming equal monomer reactivity. (B) Theoretical sequence distribution of a random mixture of x, G, A, and L depsipeptides in which all residue linkages are amide bonds, and the N-terminal residue is a hydroxy acid, regardless of sequence composition. Residues 2–4 would contain equal frequencies of G, A, and L monomers. (C) Sequence ho- mology diagrams for the experimentally determined sequence distributions of g+G+A+L, a+G+A+L, and d+G+A+L tetramers after 1, 2, and 4 cycles. Both g+G+A+L and a+G+A+L tetramer sequences increase in their diversity with cycling. In contrast, d+G+A+L tetramers experience minimal change in sequence diversity with cycling. Malic acid-depsipeptides are enriched in amino acids, particularly leucine, at position 2.

ester or amide, would be equivalent, and positions 2–4 would be 2–4. This suggests the reaction rate between leucine and glycolic or random in nature. The homology diagram shown in Fig. 4B is lactic acid may have been slower than that with glycine and alanine. what would result if all ester linkages had been hydrolyzed and Internal sequence stretches for g+G+A+L and a+G+A+Ltetra- only amide linkages remained. In this scenario, all depsipeptides mers demonstrated high sequence diversity after only 4 d of re- would still be initiated by a hydroxy acid, but would contain only action. The observation of more diverse tetramers as the system amino acids of a random nature at positions 2–4. Although evolved is consistent with the hypothesis that a vast array of se- theoretical diagrams in Fig. 4 A and B assume equal reaction quences may have been present on the prebiotic earth before the rates between various amino acids for purposes of simplicity, initiation of template-directed peptide synthesis (47, 48). When recent kinetic studies by our team do suggest experimental dif- considering a prebiotic timescale, it seems reasonable to conclude ferences in monomer reactivity (46). that proto-peptides of short to moderate length would have had Experimental sequence data for g+G+A+L, a+G+A+L, and sufficient time to diversify into an innumerable set of sequences. d+G+A+L tetramers across one, two, and four environmental cycles In contrast with g+G+A+L and a+G+A+L tetramers, the were compiled into homology diagrams (Fig. 4C), and revealed the sequence diversity of d+G+A+L tetramers did not change sig- “N-terminal” residue to be a hydroxy acid for essentially all species, nificantly with cycling. Malic-acid depsipeptides contained more as expected from the proposed ester–amide exchange mechanism. leucine than glycine or alanine after one cycle, and very few Subsequent positions along the chain contained both amino acid malic-acid depsipeptides were found to contain solely glycine and hydroxy acid residues, with varying degrees of randomness and/or alanine without leucine (Fig. 4C). Position 2 of these depending on cycle and hydroxy acid. depsipeptides appears similar to Fig. 4B in which few, if any, After one dry–wet cycle, glycolic-acid- and lactic-acid-containing hydroxy acids would be present. This difference is attributed to tetramers contained little leucine. However, with additional cycling the presence of nonlinear species in d+G+A+L mixtures which their sequence diversity increased, leading to approximately even limit hydrolysis and elongation. In particular, the presence of mixtures of hydroxy acid, glycine, alanine, and leucine at positions only amino acids at position 2 is consistent with the formation of

Forsythe et al. PNAS | Published online August 28, 2017 | E7657 Downloaded by guest on September 28, 2021 cyclo-dL species containing one ester and one amide bond. The An additional challenge is related to the chemistry used to make chemical structure(s) of malic-acid-containing depsipeptides of a de novo proto-peptides: model prebiotic reactions which occur cyclic and/or branched nature will be the subject of future study. over short timescales may bias our understanding of plausible se- quences toward species which are most kinetically favored. While it Implications for Origins-of-Life Research is obviously impossible to perform experiments on a prebiotic In this work, mixtures of depsipeptides were sequenced from timescale, it is important to consider reaction time, or in this in- three model prebiotic reaction mixtures. Hundreds of dep- stance the number of evaporative cycles, as a key variable in gen- sipeptides were formed from a total of six monomers over 4 d of erating proto-peptide libraries. Our data demonstrate that glycine dry–wet cycling, including a large number of sequence isomers. and alanine residues incorporated into glycolic-acid- and lactic- Tandem MS allowed for sequencing of species that coeluted in acid-based depsipeptides earlier than leucine (Fig. 4C,lefttwo UPLC and IM separation space. Sequence homology diagrams columns). These differences in reactivity should be further studied demonstrated a high sequence diversity of depsipeptides con- in detail to gain a clearer picture of the proto-peptide inventory. taining glycolic acid or lactic acid, two of the most abundant Also, both the consumption of free monomer(s) and the evapo- hydroxy acids in model prebiotic reactions and meteorites (13, ration of hydroxy acids upon heating have been shown to affect the 14, 49). The diversity of these sequences as well as their peptide observed product distributions (46). Better feed approaches to character increased with the number of cycles. replenish monomers in ester–amide exchange reaction cycles, and Prior studies on proto-peptide sequence diversity have taken ei- more diverse monomer sets, would thus be expected to produce ther a purely computational approach (50) or a top-down approach even higher sequence diversity. in which libraries were made through existing biochemical means It has been recently argued that origins-of-life studies con- (51, 52). Such studies are informative, but could still benefit from cerning proto-peptides are often too narrowly focused on bi- complementary bottom-up studies to investigate the plausibility of ologically coded amino acids, and should consider the vast the libraries themselves. This study provides direct experimental diversity of nonproteinogenic monomers that would have also evidence in support of de novo proto-peptide libraries, and suggests been present in the prebiotic milieu (57, 58). Our use of hydroxy the following concerning their composition: (i) overall sequence acids to form depsipeptides demonstrates the utility of this ap- diversity would have been high; (ii) such diversity would have in- proach. MS-based approaches may be used in the future explo- creased with time; and (iii) proto-peptide sequences capable of ration of the vast chemical space of plausible proto-peptides branching and/or cyclizing would have been less diverse than linear containing not only hydroxy acids and proteinogenic amino acids sequences due to competing reaction pathways. but also nonproteinogenic amino acids, or mixtures of the three. For all systems studied here, it is important to note the number As long as a nonproteinogenic monomer has a unique m/z of confirmed sequences was lower than the number of theoretically among the other monomers in the reaction, it should be readily possible sequences (Table 1). We attribute this to specific chal- identifiable within peptide sequences. Exceptions would be lenges associated with studying vast proto-peptide libraries using a structural isomers of nonproteinogenic amino acids (e.g., β-ala- bottom-up, de novo approach. An ever-present challenge of using nine and norvaline) which are isobaric with their proteinogenic MS-based approaches is that peptides must be detected with suf- forms (α-alanine and valine, respectively); however, it is possible ficient signal intensity to be sequenced. Differences in peptide IM could resolve these species (59). As broader sequence space ionization efficiency play a role in overrepresenting species with is explored, it is possible that catalytic peptides––and perhaps higher yields, an “electrospray bias” phenomenon that is well even autocatalytic species––may arise which directly increase documented in proteomics studies (53). Although negative-ion proto-peptide length and function (60). mode was exclusively used in this work, the use of dual-polarity In addition to using a broader set of monomer residues, other MS in future experiments may be beneficial (54). Moreover, the molecules and ligands such as proto-nucleic acids, proto-lipids, specific methodology for selecting peptide precursor ions before proto-glycans, and inorganic species could be introduced as well. MS/MS can influence the number and confidence of sequence Such molecules would have certainly influenced the solubility, for- identifications. DDA, which was used in this study, has a high mation, and hydrolysis of proto-peptides, and may have coevolved specificity but is biased toward high-abundance peptides (55). In- together with proto-peptides in primordial environments (61). Ex- deed, hundreds of features in our mixtures––many of which are perimental observation of proto-peptide function such as catalysis likely lower-abundance depsipeptides––were not selected for frag- or self-replication within a more realistic model of the prebiotic mentation and thus were not sequenced (Table 1). Data-independent soup will provide a more robust picture of chemical evolution in an acquisition (DIA) methods which fragment all coeluting peptides origins-of-life context, highlighting systems-level interactions within without mass selection could increase the number of sequence highly complex chemical networks (62). identifications. However, the confidence of individual sequence assignments could decrease––particularly for proto-peptide mix- ACKNOWLEDGMENTS. We thank Prof. Matthew Torres and Dr. Nagender tures like the ones examined here which contain a number of se- Panyala (Georgia Institute of Technology) for advice on sequence logos and proteomics-style data analysis, as well as Mark Bennett (Nonlinear Dynamics) quence isomers. It is possible that using DIA methods, or methods for assistance with commercial MS software. This work was supported by the which combine aspects of both DDA and DIA (56), could en- NSF and the NASA Astrobiology Program, under the NSF Center for Chemical hance sequencing capabilities. Evolution (Award CHE-1504217).

1. Miller SL (1953) A production of amino acids under possible primitive earth condi- 8. Bowman JC, Hud NV, Williams LD (2015) The ribosome challenge to the RNA world. tions. Science 117:528–529. J Mol Evol 80:143–161. 2. Fox SW, Harada K (1958) Thermal copolymerization of amino acids to a product re- 9. Rode BM (1999) Peptides and the origin of life. Peptides 20:773–786. sembling protein. Science 128:1214. 10. Rodriguez-Garcia M, et al. (2015) Formation of oligopeptides in high yield under 3. Fox SW, Harada K (1960) The thermal copolymerization of amino acids common to simple programmable conditions. Nat Commun 6:8385. protein. J Am Chem Soc 82:3745–3751. 11. Greenwald J, Friedmann MP, Riek R (2016) Amyloid aggregates arise from amino acid 4. Robertson MP, Joyce GF (2012) The origins of the RNA world. Cold Spring Harb condensations under prebiotic conditions. Angew Chem Int Ed Engl 55:11609–11613. Perspect Biol 4:a003608. 12. Forsythe JG, et al. (2015) Ester-mediated amide bond formation driven by dry-wet 5. Orgel LE (2003) Some consequences of the RNA world hypothesis. Orig Life Evol cycles: A possible path to polypeptides on the prebiotic Earth. Angew Chem Int Ed Biosph 33:211–218. Engl 54:9871–9875. 6. Hud NV, Cafferty BJ, Krishnamurthy R, Williams LD (2013) The origin of RNA and “my 13. Miller SL, Urey HC (1959) Organic compound synthesis on the primitive earth. Science grandfather’s axe”. Chem Biol 20:466–474. 130:245–251. 7. Tagami S, Attwater J, Holliger P (2017) Simple peptides derived from the ribosomal 14. Peltzer ET, Bada JL (1978) Alpha-hydroxycarboxylic acids in Murchison meteorite. core potentiate RNA polymerase function. Nat Chem 9:325–332. Nature 272:443–444.

E7658 | www.pnas.org/cgi/doi/10.1073/pnas.1711631114 Forsythe et al. Downloaded by guest on September 28, 2021 15. Aebersold R, Mann M (2003) Mass spectrometry-based proteomics. Nature 422: 40. Hudgins RR, Jarrold MF (1999) Helix formation in unsolvated alanine-based peptides: PNAS PLUS 198–207. Helical monomers and helical dimers. J Am Chem Soc 121:3494–3501. 16. Bensimon A, Heck AJR, Aebersold R (2012) Mass spectrometry-based proteomics and 41. Johnson AR, Dilger JM, Glover MS, Clemmer DE, Carlson EE (2014) Negatively-charged network biology. Annu Rev Biochem 81:379–405. helices in the gas phase. Chem Commun (Camb) 50:8849–8851. 17. Scheltema RA, et al. (2014) The Q exactive HF, a benchtop mass spectrometer with a 42. Valentine SJ, Kulchania M, Barnes CAS, Clemmer DE (2001) Multidimensional separations pre-filter, high-performance quadrupole and an ultra-high-field orbitrap analyzer. of complex peptide mixtures: A combined high-performance liquid chromatography/ Mol Proteomics 13:3698–3708. ion mobility/time-of-flight mass spectrometry approach. IntJMassSpectrom212: 18. Kelstrup CD, et al. (2014) Rapid and deep proteomes by faster sequencing on a 97–109. benchtop quadrupole ultra-high-field orbitrap mass spectrometer. J Proteome Res 13: 43. McLean JA, Ruotolo BT, Gillig KJ, Russell DH (2005) Ion mobility–mass spectrometry: A 6187–6195. new paradigm for proteomics. Int J Mass Spectrom 240:301–315. 19. Xu Z, et al. (2015) Comprehensive quantitative analysis of ovarian and breast cancer 44. Baker ES, et al. (2010) An LC-IMS-MS platform providing increased dynamic range for tumor peptidomes. J Proteome Res 14:422–433. high-throughput proteomic studies. J Proteome Res 9:997–1006. 20. Ezkurdia I, et al. (2014) Multiple evidence strands suggest that there may be as few as 45. Crooks GE, Hon G, Chandonia JM, Brenner SE (2004) WebLogo: A sequence logo 19,000 human protein-coding genes. Hum Mol Genet 23:5866–5878. generator. Res 14:1188–1190. 21. Kvenvolden K, et al. (1970) Evidence for extraterrestrial amino-acids and hydrocar- 46. Yu SS, et al. (2016) Kinetics of prebiotic depsipeptide formation from the ester-amide bons in the Murchison meteorite. Nature 228:923–926. exchange reaction. Phys Chem Chem Phys 18:28441–28450. 22. Cronin JR, Moore CB (1971) Amino acid analyses of the murchison, murray, and al- 47. New MH, Pohorille A (2000) An inherited efficiencies model of non-genomic evolu- lende carbonaceous chondrites. Science 172:1327–1329. tion. Simul Pract Theory 8:99–108. 23. Cleaves HJ, 2nd (2010) The origin of the biologically coded amino acids. J Theor Biol 48. Adamala K, et al. (2014) Open questions in origin of life: Experimental studies on the 263:490–498. origin of nucleic acids and proteins with specific and functional sequences by a 24. Burton AS, Stern JC, Elsila JE, Glavin DP, Dworkin JP (2012) Understanding prebiotic chemical synthetic biology approach. Comput Struct Biotechnol J 9:e201402004. chemistry through the analysis of extraterrestrial amino acids and nucleobases in 49. Cronin JR, Pizzarello S, Epstein S, Krishnamurthy RV (1993) Molecular and isotopic meteorites. Chem Soc Rev 41:5459–5472. analyses of the hydroxy acids, dicarboxylic acids, and hydroxicarboxylic acids of the 25. Bada JL (2013) New insights into prebiotic chemistry from Stanley Miller’s spark dis- Murchison meteorite. Geochim Cosmochim Acta 57:4745–4752. charge experiments. Chem Soc Rev 42:2186–2196. 50. Oda A, Fukuyoshi S (2015) Predicting three-dimensional conformations of peptides 26. Lahav N, White D, Chang S (1978) Peptide formation in the prebiotic era: Thermal constructed of only glycine, alanine, aspartic acid, and valine. Orig Life Evol Biosph 45: condensation of glycine in fluctuating clay environments. Science 201:67–69. 183–193. 27. Schwendinger MG, Rode BM (1991) Salt-induced formation of mixed peptides under 51. Tanaka J, Doi N, Takashima H, Yanagawa H (2010) Comparative characterization of possible prebiotic conditions. Inorg Chim Acta 186:247–251. random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids. Protein Sci 28. Keller M, Blöchl E, Wächtershäuser G, Stetter KO (1994) Formation of amide bonds 19:786–795. without a condensation agent and implications for origin of life. Nature 368:836–838. 52. Lu MF, Xie Y, Zhang YJ, Xing XY (2015) Effects of cofactors on conformation transi- 29. Danger G, Plasson R, Pascal R (2012) Pathways for the formation and evolution of tion of random peptides consisting of a reduced amino acid alphabet. Protein Pept peptides in prebiotic environments. Chem Soc Rev 41:5416–5429. Lett 22:579–585. 30. Griffith EC, Vaida V (2012) In situ observation of formation at the 53. Jarnuczak AF, et al. (2016) Analysis of intrinsic peptide detectability via integrated CHEMISTRY water-air interface. Proc Natl Acad Sci USA 109:15697–15701. label-free and SRM-based absolute quantitative proteomics. J Proteome Res 15: 31. Hsiao C, et al. (2013) Molecular paleontology: A biochemical model of the ancestral 2945–2959. ribosome. Nucleic Acids Res 41:3373–3385. 54. Chen HK, Chang CK, Wu CC, Huang MC, Wang YS (2009) Synchronized dual-polarity 32. Petrov AS, et al. (2014) Evolution of the ribosome at atomic resolution. Proc Natl Acad electrospray ionization mass spectrometry. J Am Soc Mass Spectrom 20:2254–2257. Sci USA 111:10251–10256. 55. Geromanos SJ, et al. (2009) The detection, correlation, and comparison of peptide 33. Powers ET, Deechongkit S, Kelly JW (2005) Backbone-backbone H-bonds make precursor and product ions from data independent LC-MS with data dependent LC- context-dependent contributions to protein folding kinetics and thermodynamics: MS/MS. Proteomics 9:1683–1695. Lessons from amide-to-ester mutations. Adv Protein Chem 72:39–78. 56. Gillet LC, et al. (2012) Targeted data extraction of the MS/MS spectra generated by 34. Deechongkit S, Dawson PE, Kelly JW (2004) Toward assessing the position-dependent data-independent acquisition: A new concept for consistent and accurate proteome contributions of backbone hydrogen bonding to beta-sheet folding thermodynamics analysis. Mol Cell Proteomics 11:O111.016717. employing amide-to-ester perturbations. J Am Chem Soc 126:16762–16771. 57. Meringer M, Cleaves HJ, 2nd, Freeland SJ (2013) Beyond terrestrial biology: Charting BIOCHEMISTRY 35. Deechongkit S, et al. (2004) Context-dependent contributions of backbone hydrogen the chemical universe of α-amino acid structures. J Chem Inf Model 53:2851–2862. bonding to beta-sheet folding energetics. Nature 430:101–105. 58. Raggi L, Bada JL, Lazcano A (2016) On the lack of evolutionary continuity between 36. Forsythe JG, et al. (2015) Collision cross section calibrants for negative ion mode prebiotic peptides and extant enzymes. Phys Chem Chem Phys 18:20028–20032. traveling wave ion mobility-mass spectrometry. Analyst 140:6853–6861. 59. Dodds JN, May JC, McLean JA (2017) Investigation of the complete suite of leucine 37. Bowie JH, Brinkworth CS, Dua S (2002) Collision-induced fragmentations of the and isoleucine isomers: Toward prediction of ion mobility separation capabilities. − (M-H) parent anions of underivatized peptides: An aid to structure determination Anal Chem 89:952–959. and some unusual negative ion cleavages. Mass Spectrom Rev 21:87–107. 60. Guseva EA, Zuckermann RN, Dill KA (2016) How did prebiotic polymers become in- 38. Kjeldsen F, Zubarev RA (2011) Effects of peptide backbone amide-to-ester bond formational foldamers? arXiv:1604.08890. substitution on the cleavage frequency in electron capture dissociation and collision- 61. Krishnamurthy R (2017) Giving rise to life: Transition from prebiotic chemistry to activated dissociation. J Am Soc Mass Spectrom 22:1441–1452. protobiology. Acc Chem Res 50:455–459. 39. Steen H, Mann M (2004) The ABC’s (and XYZ’s) of peptide sequencing. Nat Rev Mol 62. Ruiz-Mirazo K, Briones C, de la Escosura A (2014) Prebiotic systems chemistry: New Cell Biol 5:699–711. perspectives for the origins of life. Chem Rev 114:285–366.

Forsythe et al. PNAS | Published online August 28, 2017 | E7659 Downloaded by guest on September 28, 2021