
Journal of Structural Biology xxx (2017) xxx–xxx

Contents lists available at ScienceDirect

Journal of Structural Biology

journal homepage: www.elsevier.com/locate/yjsbi

Review

Ribosomal proteins as documents of the transition from unstructured (poly)peptides to folded proteins

Andrei N. Lupas ⇑, Vikram Alva

Department of Protein Evolution, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany

Article info

Article history:
Received 2 February 2017
Received in revised form 23 April 2017
Accepted 24 April 2017
Available online xxxx

Keywords:
Ancient peptides
Protein evolution
Ribosome
Ribosomal protein
RNA
Secondary structure

Abstract

For the most part, contemporary proteins can be traced back to a basic set of a few thousand domain prototypes, many of which were already established in the Last Universal Common Ancestor of life on Earth, around 3.5 billion years ago. The origin of these domain prototypes, however, remains poorly understood. One hypothesis posits that they arose from an ancestral set of peptides, which acted as cofactors of RNA-mediated catalysis and replication. Initially, these peptides were entirely dependent on the RNA scaffold for their structure, but as their complexity increased, they became able to form structures by excluding water through hydrophobic contacts, making them independent of the RNA scaffold. Their ability to fold was thus an emergent property of peptide-RNA coevolution. The ribosome is the main survivor of this primordial RNA world and offers an excellent model system for retracing the steps that led to the folded proteins of today, due to its very slow rate of change. Close to the peptidyl transferase center, which is the oldest part of the ribosome, proteins are extended and largely devoid of secondary structure; further from the center, their secondary structure content increases and supersecondary topologies become common, although the proteins still largely lack a hydrophobic core; at the ribosomal periphery, supersecondary structures coalesce around hydrophobic cores, forming folds that resemble those seen in proteins of the cytosol. Collectively, ribosomal proteins thus offer a window onto the time when proteins were acquiring the ability to fold.

© 2017 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Life today results from the information storage provided by nucleic acids (mainly DNA) and the catalytic activity of polypeptides. Since the time of the Last Common Ancestor of all living beings on Earth (LUCA), these macromolecules have been following a tripartite division of labor, in which the information stored in DNA is converted to proteins in a process substantially dependent on RNA; this unidirectional flow of information from nucleic acids to proteins is considered the central dogma of molecular biology (Crick, 1970).

It seems impossible that this elaborate interplay of complex macromolecules could have emerged de novo from abiotic processes and it is generally accepted that life must have started in a simpler form, which has been the subject of much theorizing and some experimentation. Among the possibilities considered, by far the most popular and best supported has been that of RNA forming the first systems capable of autocatalytic replication, acting as both the information bearer and the agent of catalysis (e.g. Gesteland et al., 2006; Higgs and Lehman, 2015; Jeffares et al., 1998; Joyce, 2002; Lazcano et al., 1988; but for an alternative view see for example Kurland, 2010). This hypothesis, first formulated by Alexander Rich (Rich, 1962) and given the name of 'RNA world' by Walter Gilbert (Gilbert, 1986), rests substantially on the observation that even today, RNA still acts both as information carrier and catalyst in the biosynthesis of proteins, accepting information from DNA in a transcription step and transferring it to a ribozyme (the ribosome) for translation to a polypeptide sequence. While many obstacles remain to be overcome on the path from inorganic compounds to the first RNA polymers (Bernhardt, 2012; Shapiro, 2007) and a number of simpler, pre-RNA molecules have been discussed as the first information-bearing, autocatalytic entities (e.g. Engelhart and Hud, 2010; Lazcano and Miller, 1996; Orgel, 2000; Trevino et al., 2011), the RNA world is now well established and widely considered to have been the direct precursor to the DNA-protein world of today.

Abbreviations: Last Universal Common Ancestor, LUCA; peptidyl transferase center, PTC.
⇑ Corresponding author at: Department of Protein Evolution, Max Planck Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany.
E-mail address: [email protected] (A.N. Lupas).
http://dx.doi.org/10.1016/j.jsb.2017.04.007
1047-8477/© 2017 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Please cite this article in press as: Lupas, A.N., Alva, V. Ribosomal proteins as documents of the transition from unstructured (poly)peptides to folded proteins. J. Struct. Biol. (2017), http://dx.doi.org/10.1016/j.jsb.2017.04.007

The involvement of polypeptides in the RNA world, initially in the form of short peptides, most likely occurred very early. Indeed, recent evidence suggests that RNA and peptides co-evolved from the beginning, their building blocks originating from the same chemical reaction networks (Patel et al., 2015; Ritson and Sutherland, 2013). RNA faces a number of limitations in stability and catalytic repertoire, particularly in its inability to mediate redox reactions with free radicals (Bernhardt, 2012; Joyce, 2002); peptides would have offered it many benefits, such as coordinating metals and small molecules, mounting iron-sulfur clusters for redox catalysis, promoting the stability and structural specificity of RNA folding by binding into its grooves, mediating complex formation, and functionalizing the first membranes. While the first peptides to join RNA-based autocatalytic replicators were probably of abiotic origin, natural selection would soon have favored forms encoded and synthesized by nucleic acids. For one, abiotic peptide formation is highly inefficient (e.g. Cleaves et al., 2009; Schreiner et al., 2011), making availability a limiting factor of autocatalytic growth from the start. Also, most peptides, even those composed of the 20 proteinogenic amino acids, are of no structural and functional use to RNA, placing a premium on synthesizing only useful forms and passing the information on to the next generation. Given the broad spectrum of steps needed to fulfill even the basic requirements of an information-bearing chemical system capable of autocatalytic replication, it seems clear that the RNA-peptide world must have achieved considerable complexity well before its transition to the DNA-protein world we observe today.

In making this transition, the RNA-peptide world faced a considerable challenge: whereas the chemistry of the RNA-to-DNA transition seems unproblematic (Ritson and Sutherland, 2014), there is a major obstacle on the path from peptides to proteins, known as the protein folding problem.

2. The protein folding problem from an evolutionary perspective

Both nucleic acids and proteins must assume defined three-dimensional structures for their biological activity, but their ability to do so is starkly different. Nucleic acids fold spontaneously and robustly, based primarily on a number of simple base-pairing rules, and can in general be denatured and renatured reversibly by chemical agents or temperature without substantial loss of material (witness for example the polymerase chain reaction). Protein structure, in contrast, is an altogether more complex property and the process by which proteins reach their structure (folding) is easily disrupted and readily undone by even minor changes in temperature or the chemical environment. Once denatured, proteins tend to aggregate and can either not be renatured, or only with large loss of material, making denaturation a substantially irreversible process. The easy loss of structure in most proteins is due to the low free energy of folding (often equivalent to just a few hydrogen bonds), which places them energetically close to the unfolded state. Their tendency to aggregate upon denaturation is due to the dominant role of the hydrophobic effect in folding, which leads folded proteins to mainly segregate hydrophobic residues to the protein core and hydrophilic residues to the surface. When the hydrophobic residues of the core become exposed in the denatured state, they tend to coalesce into heterogeneous tangles, which are generally impossible to resolve and must be degraded.

The closeness of the structured and unstructured states in most proteins and the many problems this poses for living beings are documented in the elaborate protein quality control and degradation systems that are universal to life (e.g. Bukau et al., 2006; Gottesman et al., 1997; McClellan and Frydman, 2001). Even in healthy organisms not exposed to stressful conditions, protein misfolding represents an important challenge, as seen for example for the cystic fibrosis transmembrane conductance regulator, of which, in healthy humans, only about a third of the synthesized copies reach the membrane in a folded state (Ward et al., 1995). In old age and disease these problems become potentiated, leading for example in humans to a host of degenerative diseases (Gregersen et al., 2006; Voisine et al., 2010), such as cystic fibrosis, Alzheimer's, Parkinson's, and Huntington's diseases. Given these considerations, it may come as a surprise that natural proteins nevertheless represent a best-case set, because in their overwhelming majority polypeptides do not appear to have a folded structure at all. It is very difficult to estimate the actual proportion of folding polypeptides with any degree of accuracy, since the protein folding problem is still substantially unsolved and the number of sequence possibilities for a polypeptide chain exceeds the number of particles in the known universe already at a chain length of around 60 residues. Nevertheless, a rough estimate is given by screens of polypeptide libraries, which have produced a success rate of less than one in a billion, even when these libraries were biased for specific patterns of hydrophobicity or derived from a random fragmentation of genomic DNA (Keefe and Szostak, 2001; Matsuura et al., 2002; Riechmann and Winter, 2000; Wei et al., 2003).

Given the difficulty polypeptides encounter in reaching and maintaining a folded state, and the exceedingly low likelihood that newly emerged polypeptides even have such a state, it is entirely non-trivial to explain how life came to rely so extensively on folded proteins. Looking at proteins today it is clear that nature is bypassing the protein folding problem by generating new proteins through the amplification, differentiation, and recombination of a basic set of autonomously folding prototypes (domains). Through their similarity in sequence and structure, these domains can be classified into a hierarchy of families, superfamilies, and folds (Andreeva et al., 2015; Dawson et al., 2017; Schaeffer et al., 2017), showing that, though seemingly boundless, the diversity of natural proteins is actually rather narrowly circumscribed (see e.g. Koonin et al., 2002). In total, these classifications, as well as large-scale surveys, suggest that there are no more than some 10^4 domain families, prototypes for many of which were already present at the time of LUCA, around 3.5 billion years ago (Koonin, 2003; Kyrpides et al., 1999; Ranea et al., 2006). Domain classifications have been a very powerful tool in retracing the evolution of the protein world that underpins life today, but the origins of domain prototypes themselves have long remained unclear and only started to emerge in recent years.

3. Proteins from peptides

As outlined above, the staggering size of protein sequence space and the low incidence of folded exemplars within it essentially preclude an origin of folded domains by random concatenation of amino acids. An alternative scenario proposes that the first folded domains did not arise from random processes, but from the increased complexity of the peptides that had evolved in the RNA world (Lupas et al., 2001; Soding and Lupas, 2003). In this scenario, the evolutionary pressures operating on peptides within their replicative systems led to the selection of biophysical properties that eventually yielded protein folding as an emergent property.

This scenario proceeds from the assumption that one of the properties under selection from the start must have been the ability of peptides and RNA to interact specifically, an evolutionary pressure resulting as much from a competition of primordial RNAs for a limited pool of peptides as from the greater functional effectiveness of specific interactors. Specificity is promoted by larger interaction surfaces, a geometric fit of complementary groups, and the exclusion of water from binding sites. On the RNA side, this rewarded the emergence of ligases capable of enlarging the available peptide pool, of producing peptides of greater length than obtainable through abiotic processes, and of using an emergent code to increase the yield of peptides with useful sequences. On the peptide side, it led to the selection of amino acids favoring nucleic-acid interaction and of sequences able to assume a defined structure on an RNA scaffold. The exclusion of water resulting from structurally specific interaction further promoted the formation of secondary structure in the peptides by increasing the energetic reward of intramolecular hydrogen bonds. Indeed, random peptide libraries composed of the 20 proteinogenic amino acids show a natural affinity for RNA and the induction of secondary structure upon binding, underscoring this selection step (Das and Frankel, 2003; Patel, 1999). Over geological time-scales, the increasing organizational and functional complexity of the RNA-peptide networks led to increasingly complex peptides, whose structure progressed from the local formation of secondary structure on the RNA scaffold, to the arrangement of these secondary structures into supersecondary structure elements, such as α- and β-hairpins, β-meanders, and βαβ-elements, and eventually to the association of these supersecondary structures into compact tertiary structures, the first folds. In this scenario, the optimization of the peptides for exclusion of water with a scaffold and their progressive emancipation from this scaffold led to the emergence of autonomous folding, once the (poly)peptides had achieved sufficient complexity to exclude water through hydrophobic contacts between their structural elements, rather than with the scaffold.

The transition from the RNA-peptide world to the DNA-protein world is most easily rationalized by the limitations of RNA both as information repository and catalyst. With increasing network complexity, these limitations, as well as the inherent evolutionary challenge of optimizing the same class of molecules for two fundamentally different functions, led to a specialization of molecular tasks, resulting in the establishment of DNA as a robust and structurally monotonous information repository and of proteins as the primary agents of catalysis. The catalytic proteins resulting from this step retain to this day the evidence of their origin in the RNA world through their wide-spread use of nucleotides and nucleotide-derived co-factors, which are the essential components of their activity in many reactions. As has become clear recently, they also retain another source of evidence, namely the presence of fragments whose similarity suggests a common origin, yet which appear to have predated the emergence of folds; these fragments have been interpreted as the last observable remnants of the peptides that evolved in the RNA-peptide world (Alva et al., 2015; Lupas et al., 2001). To identify such similar fragments in globally dissimilar folds, a recent study used stringent criteria, widely accepted as evidence for homology, to establish local similarity in sequence and structure (Alva et al., 2015). The search produced 40 fragments of between 9 and 38 residues, each of which occurred in at least 2 different folds; these fragments were strongly enriched for functional properties that would have been highly relevant in the RNA world, such as the coordination of iron-sulfur clusters, nucleic acids, nucleotides, and nucleotide-derived cofactors. A special case of nucleic acid interaction, and the ancestral one with respect to the origin of the DNA-protein world, would have been the interaction with RNA and indeed, of the 40 fragments, 9 are found also in ribosomal proteins. As the only code-directed peptide ligase known to us, the ribosome is probably the most ancient particle still operating today and the main survivor of the RNA-peptide world.

As an example, one of these 9 ribosomal fragments [fragment 10 in (Alva et al., 2015)] and the folds descended from it are shown in Fig. 1. The fragment has been called the alpha-L RNA-binding motif for its occurrence in several different domain families that coordinate RNA, most of which are involved in the biogenesis or function of the ribosome (Aravind and Koonin, 1999; Staker et al., 2000). The motif was also at the root of at least four other domain families involved in regulatory and biosynthetic activities of the cytosol, pointing to a functional radiation of proteins at the transition from the RNA-peptide world.

Three mechanisms have been proposed for the increase in complexity required by (poly)peptides to achieve autonomous folding and thus functional independence: repetition (Alva et al., 2007, 2008; Balaji, 2015; Blundell et al., 1979; Broom et al., 2012; Eck and Dayhoff, 1966; Kopec and Lupas, 2013; Lee and Blaber, 2011; McLachlan, 1972a; McLachlan, 1972b, 1980, 1987; McLachlan et al., 1980; Remmert et al., 2010; Smock et al., 2016; Soding et al., 2006; Yadid and Tawfik, 2007), recombination (Bharat et al., 2008; Riechmann and Winter, 2000), and decoration (Alva et al., 2007, 2008, 2015; Schaeffer et al., 2016). These mechanisms are illustrated in Fig. 2, using mainly the elaboration of fragments found in ribosomal proteins as examples. Of the three mechanisms, the recombination of different fragments within the same fold appears to be a rare event; by and large, the 40 ancient fragments appear to have formed their folds either by decoration with heterologous secondary structure elements or by repetition (oligomerization or amplification within the same polypeptide chain). Indeed, repetition was the first to be proposed (Eck and Dayhoff, 1966) and is the one best substantiated by computational analyses and experimentation (Alva et al., 2007; Kopec and Lupas, 2013; McLachlan, 1972b, 1980; McLachlan et al., 1980; Remmert et al., 2010; Smock et al., 2016; Yadid and Tawfik, 2007). Recently, for the first time, a fragment from a ribosomal protein (RPS20), a protein that is unstructured in the absence of RNA, could be amplified to produce a prominent fold of cytosolic proteins, the αα-solenoid (Zhu et al., 2016). This experiment proceeded from the assumption that the ribosome is the best source of peptides that have retained ancestral phenotypes, due to its very slow rate of evolution.

4. The ribosome as a document of RNA-peptide coevolution and the transition to the DNA-protein world

Even though the transition from the RNA-peptide to the DNA-protein world was complete by the time of LUCA, one aspect of the RNA world has survived to this day in the conversion of sequence information to actual proteins, with the ribosome at its core. As such, the ribosome is a "living fossil", a particle so central to all cellular processes that it has essentially become frozen in time, preserving many ancestral features in its molecular structure. Despite this preservation, the ribosome does evolve and a comparative analysis of its structure in the different branches of life provides a "fossil record" of changes, a chronology of ancient events in molecular evolution (Gulen et al., 2016; Hartman and Smith, 2014; Hsiao et al., 2009; Petrov et al., 2015, 2014). This chronology identifies the peptidyl transferase center (PTC) of the large subunit and the decoding center of the small subunit as the most ancient parts of the ribosome (marked by purple spheres in Fig. 3), with the ligase function of the PTC probably having been established prior to the addition of a coding mechanism (Gulen et al., 2016; Petrov et al., 2015; Smith et al., 2008). From this ancestral state, both subunits grew further by accretion and the time-points of the various elaborations can be estimated by "peeling back" successive layers towards these centers. Looking across the layers also makes trends apparent in the evolutionary history of the ribosome, such as a gradual increase in the regularity of RNA secondary structure and a decrease of magnesium ion-binding as one progresses from the center outwards (Hsiao et al., 2009).


[Fig. 1 image: the primordial alpha-L RNA-binding motif and representative domains containing it. RNA-binding: ribosomal protein S4 (2J00, D; d.66.1.2); tyrosyl-tRNA synthetase C-domain (1H3E, A; d.66.1.4); heat shock protein 15 (1DM9, A; d.66.1.3); pseudouridine synthase N-domain (1KSK, A; d.66.1.5). Regulatory and metabolic processes: threonyl-tRNA synthetase N-domain (1TKE, A; d.15.10.1); molybdopterin synthase subunit MoaD (1FM0, D; d.15.3.1); FHA domain (2AFF, A; b.26.1.2); EssC protein (1WV3, A; b.26.1.4).]

Fig. 1. The primordial alpha-L RNA-binding motif and its diverse embodiments. Domain families containing this fragment, which corresponds to fragment 10 in (Alva et al., 2015), are either involved in coordinating RNA, in the context of the ribosome, or are involved in regulatory and metabolic processes of the cytosol. Representative domains from eight different families belonging to three different SCOPe folds are shown. In all structures, the motif is colored red, the remainder of the structure gray, and RNA blue. PDB identifiers, chain names, and SCOPe identifiers of the shown structures are given in parentheses.

The incidence and nature of ribosomal proteins also show a strong correlation with the growth of the particle (Hartman and Smith, 2014; Hsiao et al., 2009; Soding and Lupas, 2003). One of the striking insights that emerged from the first crystal structures of the ribosome was that no peptide approached the PTC closer than about 20 Å (Nissen et al., 2000), not only underscoring the nature of the ribosome as a ribozyme, but also the very early emergence of a universal peptide ligase in the nascent RNA-peptide world, prior to the widespread establishment of peptides as ribozyme cofactors. Indeed, the role of the first peptides to associate with the PTC may well have been to provide stabilizing polymeric counter ions, gradually replacing the magnesium ions, which are pervasive in the innermost ribosome core (Klein et al., 2004).

Further out from the core, peptidic material starts to appear and gradually becomes structured into conformations of increasing complexity. As an illustration of this, Fig. 3 shows the peptidic material within three concentric spheres of 50 Å, 70 Å and 90 Å radius from the geometric center (red sphere) of a bacterial ribosome, as well as the full ribosomal protein complement (the geometric center of the ribosome is taken as an approximation that allows the evolutionary trends in both subunits to be visualized jointly). The innermost sphere (Fig. 3A) contains few peptide chains, which are almost entirely devoid of secondary structure. The next sphere (Fig. 3B), with an outer bound at 70 Å from the center, shows many more peptide chains and an increased secondary structure content; the β-strands among these further assume a higher structural order by associating into β-hairpins and β-meanders. Further out, at 90 Å from the center (Fig. 3C), peptidic material is now largely organized into regular secondary structures, which are themselves often part of recurrent supersecondary structure motifs, such as α- and β-hairpins, β-meanders, and β-α-β elements. Especially towards the outer edge of this sphere, supersecondary structures coalesce into compact topologies, many of which resemble the folds of cytosolic proteins. In the last step, the proteins of the ribosomal periphery are mainly of a globular nature (Fig. 3D), encompass most of the aforementioned ancient fragments found in the ribosome, frequently have recognizable homologs among cytosolic proteins, and indeed show traces of the same evolutionary mechanisms that have been proposed for the evolution of folds (Fig. 2). However, despite similarity to their cytosolic counterparts, these proteins do not have fully formed hydrophobic cores and are still dependent on the RNA for folding. Nevertheless, this similarity suggests that cytosolic proteins could be viewed as yet another shell from the ribosomal center (Fig. 3E), documenting the last step in the transition of polypeptides from complete dependence on RNA to full independence.

It is thus tempting to view the protein world as having emerged from a ribosomal ancestry. However, it seems more likely that the RNA world contained many RNA-peptide complexes with an extended complement of functional peptides, which in the course of evolution gave rise to the protein families we observe today. As a member of this world, the ribosome must have participated in an ongoing exchange of peptidic material with other RNA particles and the emerging cytosol, a process that did not stop with the establishment of folded proteins and is still seen in the lineage-specific ribosomal proteins acquired after the branching of cellular life. While today it is essentially impossible to infer the nature of these ancient exchanges, the ribosome, as the last survivor of this world, offers us the only window left onto this time more than 3.5 billion years ago when proteins became established.

[Fig. 2 image: primordial fragments (center) connected by arrows labeled repetition (duplication, oligomerization), decoration, and recombination to the following structures: TPR (1ELW, A; a.118.8.1); RPS20 (2J00, T; a.7.6.1); DNA polymerase β N-domain (4KLI, A; a.60.6.1); EF-Ts N-domain (1XB2, B; a.5.2.2); RuvA middle-domain (1IXR, A; a.60.2.1); RPL7/12 (1CTF, A; d.45.1.1); RPS13 (2UUB, M; a.156.1.1); Clp N-domain (1K6K, A; a.174.1.1); histone (1B67, A & B; a.22.1.2); RPS3 (2J00, C; d.52.3.1); ECR1 (2BA0, A; d.51.1.1).]

Fig. 2. Repetition, decoration, and recombination as the primary mechanisms for the emergence of folded domains from peptides. These three mechanisms are shown using the elaboration of four primordial fragments found in ribosomal proteins and one fragment from a non-ribosomal protein (highlighted by a circle). The fragments are: the TPR element [red; fragment 28 in (Alva et al., 2015)], the helix-hairpin-helix motif (yellow; fragment 2), the KH motif (green; fragment 6), the helix-strand-helix motif (orange; fragment 3), and the α-hairpin seen in RPL7/12 (blue; fragment 24). The arrows indicate our inference of possible evolutionary relationships. Folds that share a homologous primordial fragment, as substantiated by sequence and structural similarity, are shown as emerging from it using solid arrows; hypothetical connections, solely based on structural similarity, are indicated by dotted arrows. In structures with multiple copies of the same fragment, the copies are distinguished by light and dark colors. In all structures, heterologous decorations are shown in gray. PDB identifiers, chain names and SCOPe identifiers are given in parentheses.


[Fig. 3 image. Panels: A. Sphere of 50 Å radius from the geometric center; B. Sphere of 70 Å radius from the geometric center; C. Sphere of 90 Å radius from the geometric center; D. Full ribosomal protein complement; E. Cytosolic proteins as an outer shell.]

Fig. 3. The ribosome offers a window to study the chronology of ancient events that led to the emergence of folded proteins. The increasing structural order of proteinogenic material in the Thermus thermophilus ribosome (PDB entries 2J00 and 2J01) is shown within three concentric spheres of 50 Å, 70 Å, and 90 Å radius from the geometric center (red sphere) in panels A, B, and C, respectively. Panel D shows the full ribosomal protein complement and panel E depicts cytosolic proteins as an outer shell around the ribosome, highlighting their shared evolutionary relationships with ribosomal proteins. In fact, this outer shell can be viewed as representing the final step for peptides towards gaining complete independence from RNA. In all structures, α-helices are colored in yellow, β-strands in green, and loops in gray. RNA is not shown in the first four panels for clarity; it is colored gray in panel E. Ancient fragments found in ribosomal proteins are colored red; the PTC and the decoding center are indicated by purple spheres.
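The concentric-sphere view in Fig. 3 can be approximated computationally. The following is a minimal sketch, not the procedure used to prepare the figure: it assumes that each protein residue is represented by one coordinate (for example its Cα atom) together with a one-letter secondary structure code in the DSSP style ("H" for helix, "E" for strand, anything else for loop). The function names, the toy coordinates, and the choice to place the center at the origin are illustrative assumptions; for a real structure the center could be computed with `geometric_center` over all atoms of the particle.

```python
import math


def geometric_center(points):
    """Return the unweighted mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def structured_fraction_by_sphere(residues, center, radii=(50.0, 70.0, 90.0)):
    """For each radius, count residues inside the sphere around `center`
    and report the fraction of them in helix ("H") or strand ("E").

    `residues` is a list of ((x, y, z), ss_code) pairs; this input layout
    is an assumption made for the sketch.
    """
    summary = {}
    for radius in radii:
        inside = [ss for xyz, ss in residues if math.dist(xyz, center) <= radius]
        structured = sum(1 for ss in inside if ss in ("H", "E"))
        fraction = structured / len(inside) if inside else 0.0
        summary[radius] = (len(inside), fraction)
    return summary


# Hypothetical toy data (distances in Å), not taken from PDB entries 2J00/2J01.
toy_residues = [
    ((10.0, 0.0, 0.0), "-"),
    ((0.0, 40.0, 0.0), "-"),
    ((60.0, 0.0, 0.0), "E"),
    ((0.0, 0.0, 65.0), "-"),
    ((85.0, 0.0, 0.0), "H"),
    ((0.0, -80.0, 0.0), "H"),
    ((0.0, 0.0, 120.0), "H"),  # lies outside the largest sphere
]
center = (0.0, 0.0, 0.0)
result = structured_fraction_by_sphere(toy_residues, center)
for radius, (count, fraction) in result.items():
    print(f"{radius:.0f} Å: {count} residues, {fraction:.2f} in helix or strand")
```

The spheres are treated as cumulative, as in the figure panels, so each larger sphere includes the residues of the smaller ones. Counting residues in disjoint shells instead would only require a lower distance bound in the filter.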


Acknowledgement

This work was supported by institutional funds of the Max Planck Society.

References

Alva, V., Ammelburg, M., Soding, J., Lupas, A.N., 2007. On the origin of the histone fold. BMC Struct. Biol. 7, 17.
Alva, V., Koretke, K.K., Coles, M., Lupas, A.N., 2008. Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr. Opin. Struct. Biol. 18, 358–365.
Alva, V., Soding, J., Lupas, A.N., 2015. A vocabulary of ancient peptides at the origin of folded proteins. Elife 4, e09410.
Andreeva, A., Howorth, D., Chothia, C., Kulesha, E., Murzin, A.G., 2015. Investigating protein structure and evolution with SCOP2. Curr. Protoc. Bioinform. 49 (1), 26.1–26.21.
Aravind, L., Koonin, E.V., 1999. Novel predicted RNA-binding domains associated with the translation machinery. J. Mol. Evol. 48, 291–302.
Balaji, S., 2015. Internal symmetry in protein structures: prevalence, functional relevance and evolution. Curr. Opin. Struct. Biol. 32, 156–166.
Bernhardt, H.S., 2012. The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others). Biol. Direct 7, 23.
Bharat, T.A., Eisenbeis, S., Zeth, K., Hocker, B., 2008. A beta alpha-barrel built by the combination of fragments from different folds. Proc. Natl. Acad. Sci. U.S.A. 105, 9942–9947.
Blundell, T.L., Sewell, B.T., McLachlan, A.D., 1979. Four-fold structural repeat in the acid proteases. Biochim. Biophys. Acta 580, 24–31.
Broom, A., Doxey, A.C., Lobsanov, Y.D., Berthin, L.G., Rose, D.R., Howell, P.L., McConkey, B.J., Meiering, E.M., 2012. Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein. Structure 20, 161–171.
Bukau, B., Weissman, J., Horwich, A., 2006. Molecular chaperones and protein quality control. Cell 125, 443–451.
Cleaves, H.J., Aubrey, A.D., Bada, J.L., 2009. An evaluation of the critical parameters for abiotic peptide synthesis in submarine hydrothermal systems. Orig. Life Evol. Biosph. 39, 109–126.
Crick, F., 1970. Central dogma of molecular biology. Nature 227, 561–563.
Das, C., Frankel, A.D., 2003. Sequence and structure space of RNA-binding peptides. Biopolymers 70, 80–85.
Dawson, N.L., Lewis, T.E., Das, S., Lees, J.G., Lee, D., Ashford, P., Orengo, C.A., Sillitoe, I., 2017. CATH: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res. 45, D289–D295.
Eck, R.V., Dayhoff, M.O., 1966. Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152, 363–366.
Engelhart, A.E., Hud, N.V., 2010. Primitive genetic polymers. Cold Spring Harbor Perspect. Biol. 2, a002196.
Gesteland, R.F., Cech, T., Atkins, J.F., 2006. The RNA World: The Nature of Modern RNA Suggests a Prebiotic RNA World. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Gilbert, W., 1986. Origin of life: The RNA world. Nature 319, 618.
Gottesman, S., Wickner, S., Maurizi, M.R., 1997. Protein quality control: triage by chaperones and proteases. Genes Dev. 11, 815–823.
Gregersen, N., Bross, P., Vang, S., Christensen, J.H., 2006. Protein misfolding and human disease. Annu. Rev. Genomics Hum. Genet. 7, 103–124.
Gulen, B., Petrov, A.S., Okafor, C.D., Vander Wood, D., O’Neill, E.B., Hud, N.V., Williams, L.D., 2016. Ribosomal small subunit domains radiate from a central core. Sci. Rep. 6, 20885.
Hartman, H., Smith, T.F., 2014. The evolution of the ribosome and the genetic code. Life (Basel) 4, 227–249.
Higgs, P.G., Lehman, N., 2015. The RNA World: molecular cooperation at the origins of life. Nat. Rev. Genet. 16, 7–17.
Hsiao, C., Mohan, S., Kalahar, B.K., Williams, L.D., 2009. Peeling the onion: ribosomes are ancient molecular fossils. Mol. Biol. Evol. 26, 2415–2425.
Jeffares, D.C., Poole, A.M., Penny, D., 1998. Relics from the RNA world. J. Mol. Evol. 46, 18–36.
Joyce, G.F., 2002. The antiquity of RNA-based evolution. Nature 418, 214–221.
Keefe, A.D., Szostak, J.W., 2001. Functional proteins from a random-sequence library. Nature 410, 715–718.
Klein, D.J., Moore, P.B., Steitz, T.A., 2004. The contribution of metal ions to the structural stability of the large ribosomal subunit. RNA 10, 1366–1379.
Koonin, E.V., 2003. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136.
Koonin, E.V., Wolf, Y.I., Karev, G.P., 2002. The structure of the protein universe and genome evolution. Nature 420, 218–223.
Kopec, K.O., Lupas, A.N., 2013. Beta-propeller blades as ancestral peptides in protein evolution. PLoS ONE 8, e77074.
Kurland, C.G., 2010. The RNA dreamtime: modern cells feature proteins that might have supported a prebiotic polypeptide world but nothing indicates that RNA world ever was. BioEssays 32, 866–871.
Kyrpides, N., Overbeek, R., Ouzounis, C., 1999. Universal protein families and the functional content of the last universal common ancestor. J. Mol. Evol. 49, 413–423.
Lazcano, A., Miller, S.L., 1996. The origin and early evolution of life: prebiotic chemistry, the pre-RNA world, and time. Cell 85, 793–798.
Lazcano, A., Guerrero, R., Margulis, L., Oro, J., 1988. The evolutionary transition from RNA to DNA in early cells. J. Mol. Evol. 27, 283–290.
Lee, J., Blaber, M., 2011. Experimental support for the evolution of symmetric protein architecture from a simple peptide motif. Proc. Natl. Acad. Sci. U.S.A. 108, 126–130.
Lupas, A.N., Ponting, C.P., Russell, R.B., 2001. On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J. Struct. Biol. 134, 191–203.
Matsuura, T., Ernst, A., Pluckthun, A., 2002. Construction and characterization of protein libraries composed of secondary structure modules. Protein Sci. 11, 2631–2643.
McClellan, A.J., Frydman, J., 2001. Molecular chaperones and the art of recognizing a lost cause. Nat. Cell Biol. 3, E51–E53.
McLachlan, A.D., 1972a. Repeating sequences and gene duplication in proteins. J. Mol. Biol. 64, 417–437.
McLachlan, A.D., 1972b. Gene duplication in carp muscle calcium binding protein. Nat. New Biol. 240, 83–85.
McLachlan, A.D., 1980. Repeated folding pattern in copper-zinc superoxide dismutase. Nature 285, 267–268.
McLachlan, A.D., 1987. Gene duplication and the origin of repetitive protein structures. Cold Spring Harb. Symp. Quant. Biol. 52, 411–420.
McLachlan, A.D., Bloomer, A.C., Butler, P.J., 1980. Structural repeats and evolution of tobacco mosaic virus coat protein and RNA. J. Mol. Biol. 136, 203–224.
Nissen, P., Hansen, J., Ban, N., Moore, P.B., Steitz, T.A., 2000. The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920–930.
Orgel, L., 2000. Origin of life. A simpler nucleic acid. Science 290, 1306–1307.
Patel, D.J., 1999. Adaptive recognition in RNA complexes with peptides and protein modules. Curr. Opin. Struct. Biol. 9, 74–87.
Patel, B.H., Percivalle, C., Ritson, D.J., Duffy, C.D., Sutherland, J.D., 2015. Common origins of RNA, protein and lipid precursors in a cyanosulfidic protometabolism. Nat. Chem. 7, 301–307.
Petrov, A.S., Bernier, C.R., Hsiao, C., Norris, A.M., Kovacs, N.A., Waterbury, C.C., Stepanov, V.G., Harvey, S.C., Fox, G.E., Wartell, R.M., Hud, N.V., Williams, L.D., 2014. Evolution of the ribosome at atomic resolution. Proc. Natl. Acad. Sci. U.S.A. 111, 10251–10256.
Petrov, A.S., Gulen, B., Norris, A.M., Kovacs, N.A., Bernier, C.R., Lanier, K.A., Fox, G.E., Harvey, S.C., Wartell, R.M., Hud, N.V., Williams, L.D., 2015. History of the ribosome and the origin of translation. Proc. Natl. Acad. Sci. U.S.A. 112, 15396–15401.
Ranea, J.A., Sillero, A., Thornton, J.M., Orengo, C.A., 2006. Protein superfamily evolution and the last universal common ancestor (LUCA). J. Mol. Evol. 63, 513–525.
Remmert, M., Biegert, A., Linke, D., Lupas, A.N., Soding, J., 2010. Evolution of outer membrane beta-barrels from an ancestral beta beta hairpin. Mol. Biol. Evol. 27, 1348–1358.
Rich, A., 1962. On the problems of evolution and biochemical information transfer. Horizons Biochem., 103–126.
Riechmann, L., Winter, G., 2000. Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc. Natl. Acad. Sci. U.S.A. 97, 10068–10073.
Ritson, D.J., Sutherland, J.D., 2013. Synthesis of aldehydic ribonucleotide and amino acid precursors by photoredox chemistry. Angew. Chem. Int. Ed. Engl. 52, 5845–5847.
Ritson, D.J., Sutherland, J.D., 2014. Conversion of biosynthetic precursors of RNA to those of DNA by photoredox chemistry. J. Mol. Evol. 78, 245–250.
Schaeffer, R.D., Kinch, L.N., Liao, Y., Grishin, N.V., 2016. Classification of proteins with shared motifs and internal repeats in the ECOD database. Protein Sci. 25, 1188–1203.
Schaeffer, R.D., Liao, Y., Cheng, H., Grishin, N.V., 2017. ECOD: new developments in the evolutionary classification of domains. Nucleic Acids Res. 45, D296–D302.
Schreiner, E., Nair, N.N., Wittekindt, C., Marx, D., 2011. Peptide synthesis in aqueous environments: the role of extreme conditions and pyrite mineral surfaces on formation and hydrolysis of peptides. J. Am. Chem. Soc. 133, 8216–8226.
Shapiro, R., 2007. Origin of Life: The Crucial Issues. Planets and Life. Cambridge University Press, Cambridge, pp. 132–153.
Smith, T.F., Lee, J.C., Gutell, R.R., Hartman, H., 2008. The origin and evolution of the ribosome. Biol. Direct 3, 16.
Smock, R.G., Yadid, I., Dym, O., Clarke, J., Tawfik, D.S., 2016. De novo evolutionary emergence of a symmetrical protein is shaped by folding constraints. Cell 164, 476–486.
Soding, J., Lupas, A.N., 2003. More than the sum of their parts: on the evolution of proteins from peptides. BioEssays 25, 837–846.
Soding, J., Remmert, M., Biegert, A., 2006. HHrep: de novo protein repeat detection and the origin of TIM barrels. Nucleic Acids Res. 34, W137–W142.
Staker, B.L., Korber, P., Bardwell, J.C., Saper, M.A., 2000. Structure of Hsp15 reveals a novel RNA-binding motif. EMBO J. 19, 749–757.
Trevino, S.G., Zhang, N., Elenko, M.P., Luptak, A., Szostak, J.W., 2011. Evolution of functional nucleic acids in the presence of nonheritable backbone heterogeneity. Proc. Natl. Acad. Sci. U.S.A. 108, 13492–13497.
Voisine, C., Pedersen, J.S., Morimoto, R.I., 2010. Chaperone networks: tipping the balance in protein folding diseases. Neurobiol. Dis. 40, 12–20.


Ward, C.L., Omura, S., Kopito, R.R., 1995. Degradation of CFTR by the ubiquitin-proteasome pathway. Cell 83, 121–127.
Wei, Y., Kim, S., Fela, D., Baum, J., Hecht, M.H., 2003. Solution structure of a de novo protein from a designed combinatorial library. Proc. Natl. Acad. Sci. U.S.A. 100, 13270–13273.
Yadid, I., Tawfik, D.S., 2007. Reconstruction of functional beta-propeller lectins via homo-oligomeric assembly of shorter fragments. J. Mol. Biol. 365, 10–17.
Zhu, H., Sepulveda, E., Hartmann, M.D., Kogenaru, M., Ursinus, A., Sulz, E., Albrecht, R., Coles, M., Martin, J., Lupas, A.N., 2016. Origin of a folded repeat protein from an intrinsically disordered ancestor. Elife 5.
