Genetic Code Evolution Investigated through the Synthesis and Characterisation of Proteins from Reduced-Alphabet Libraries

DOI: 10.1002/cbic.201800668
Full Papers

Matilda S. Newton+,[a, b] Dana J. Morrone+,[a, b] Kun-Hwa Lee,[a, b] and Burckhard Seelig*[a, b]

[a] Dr. M. S. Newton,+ Prof. D. J. Morrone,+ K.-H. Lee, Prof. B. Seelig
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455 (USA)
[b] Dr. M. S. Newton,+ Prof. D. J. Morrone,+ K.-H. Lee, Prof. B. Seelig
BioTechnology Institute, University of Minnesota, 1479 Gortner Avenue, 140 Gortner Laboratory, St. Paul, MN, 55108-6106 (USA)
E-mail: [email protected]
[+] These authors contributed equally to this work.
Supporting information and the ORCID identification numbers for the authors of this article can be found under https://doi.org/10.1002/cbic.201800668.
ChemBioChem 2019, 20, 846–856. © 2019 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

The universal genetic code of 20 amino acids is the product of evolution. It is believed that earlier versions of the code had fewer residues. Many theories for the order in which amino acids were integrated into the code have been proposed, considering factors ranging from prebiotic chemistry to codon capture. Several meta-analyses combined these theories to yield a feasible consensus chronology of the genetic code's evolution, but there is a dearth of experimental data to test the hypothesised order. We used combinatorial chemistry to synthesise libraries of random polypeptides that were based on different subsets of the 20 standard amino acids, thus representing different stages of a plausible history of the alphabet. Four libraries were comprised of the five, nine, and 16 most ancient amino acids, and all 20 extant residues for a direct side-by-side comparison. We characterised numerous variants from each library for their solubility and propensity to form secondary, tertiary or quaternary structures. Proteins from the two most ancient libraries were more likely to be soluble than those from the extant library. Several individual protein variants exhibited inducible protein folding and other traits typical of intrinsically disordered proteins. From these libraries, we can infer how primordial protein structure and function might have evolved with the genetic code.

Introduction

The genetic code of 20 amino acids we know today is not static. Although supposedly "universal",[1] the code has been found to contain a 21st and a 22nd amino acid (selenocysteine and pyrrolysine) in many extant organisms.[2,3] The addition of these residues indicates that the code is not "frozen",[4] but can expand. The 64 codons could, in principle, evolve to encode as many as 63 chemically distinct amino acids. As the genetic code shows signs of expansion, it is reasonable to assume that, at earlier points in time, it coded for fewer amino acids. Likewise, it is inconceivable that the code would have arisen fully formed. Many lines of evidence and reasoning indicate that earliest life, before the last universal common ancestor (LUCA), used a genetic code consisting of fewer amino acids.[5–8] Further amino acids were gradually added to result in the chemical complexity we find in the code today.

Numerous hypotheses have been proposed that attempt to define a plausible chronological order for the emergence of amino acids in the genetic code. Arguments include the famous Miller–Urey spark experiment; the hypothesis on the importance of complementarity; the thermostability of the triplet code; and the codon-capture theory.[9–12] Other analyses consider the physicochemical properties of amino acids like size, charge and hydrophobicity; the code's co-evolution; biosynthetic theories; or the detection of amino acids on meteorites, to name a few.[13–16] A common trend among the hypotheses is that the earliest amino acids were comparably small; this is supported by the observation that prebiotic synthesis seems to favour less complex amino acids.[17,18] It is widely agreed that the most recent additions to the genetic code include the aromatic amino acids phenylalanine, tyrosine and tryptophan.[17,18] Unbiased, comprehensive meta-analyses of the proffered hypotheses and experimental evidence suggest a consensus chronological order for the incorporation of amino acids into the genetic code[19,20] (Figure 1A).

The evolution of the genetic code resulted in a shifting proteome composition over time, thereby altering the chemical properties of primordial proteins. As a result, these proteins probably also changed in their biophysical and functional properties, which is the particular focus of this work. Although numerous theories have been proposed describing the nature of these early amino acid alphabets and the gradual addition of "later" amino acids to give today's genetic code, there are still only a few experimental data to link these hypotheses to the biochemical characteristics of the corresponding proteins.

There is a precedent for studying proteins with a reduced amino acid composition. Several groups have presented examples of extant proteins that still fold and function correctly when their amino acid complexity is artificially reduced;[2–23] still other projects used rational design techniques to generate de novo, folded proteins with as few as ten different amino acids.[24] Although this work confirmed that a limited contingent of amino acids can support functional proteins, the amino acid alphabets used in those example studies[21,24–28] were chosen to reduce the chemical redundancy of the extant code for engineering purposes, rather than to represent a likely early genetic code.

Figure 1. Plausible history of amino acid additions to the genetic code and reduced amino acid alphabets from different stages of code evolution. A) The consensus chronological order of incorporation of amino acids into the genetic code over time.[19] Amino acids of approximately the same proposed age are depicted in parentheses. B) The four different amino acid alphabets that were used to synthesise the four polypeptide libraries.

We experimentally probed the properties of reduced-alphabet proteins that represent snapshots along the most likely chronology of the genetic code's evolution. Using the consensus temporal order of amino acid additions to the genetic code,[19] we defined three primordial protein alphabets that comprise the likely earliest five, nine and 16 amino acids (Figure 1B). From each alphabet, we synthesised protein libraries of random sequence, each 83 amino acids in length. Proteins of comparable size have previously been found to have biological activity[29,30] and it is reasonable to presume that a pre-LUCA organism would have only been able to support a limited genome, thus favouring short genes.[31,32] For comparison, we also synthesised a fourth library comprised of the canonical 20 amino acids.

The four libraries of five, nine, 16 and 20 were generated through trimer chemical synthesis. Although it has been standard practice to generate such protein libraries through the use of degenerate codons,[33,34] no degenerate codons exist that would code for the reduced alphabets described. Therefore, we synthesised our libraries through the sequential coupling of mixtures of trinucleotide phosphoramidites,[35] with each trimer representing a desired codon. This method uniquely allowed us to generate libraries comprised of only the desired amino acids in the precise ratios we desired, something that usually cannot be achieved with degenerate codon synthesis.

This study aimed to characterise the impact of amino acid composition on the biophysical properties of the reduced-alphabet protein variants as a first step to investigating the effect that an evolving, increasingly complex genetic code would have upon protein structure and function. Previous studies have attempted to estimate the function of ancient hypothetical genetic codes,[36–40] but a major strength of our approach is in the quality of our libraries. Specifically, we designed the libraries to strictly adhere to the reduced alphabets within the randomised regions, with no compromise for nonlibrary amino acids. Furthermore, the fully random nature of the sequences, rather than construction from a pre-existing protein scaffold, allowed an unbiased exploration of the folding behaviours of the alphabets. Finally, the approach of generating multiple libraries for different points in the code's evolution allowed nuanced comparison and insight into the changing chemistries of primordial proteins. To our knowledge, this work therefore contains the most accurate assembly of reduced alphabets to date and importantly links experimental data to hypotheses on early alphabets.

To determine the propensity of the four alphabets to yield soluble, stable structures, we expressed about two dozen protein variants from each library in Escherichia coli and analysed their secondary-structure content, hydrophobic packing and oligomerisation state. The reduced-library proteins demonstrated a surprising propensity for solubility when expressed, with the five- and nine-amino-acid alphabets showing the highest proportion of soluble protein, probably due in part to their overabundance of negatively charged residues. These analyses also showed that the proteins lack significant secondary structure and could be described as intrinsically disordered pro-

(20AA) composed of the 20 canonical extant amino acids for comparison. Proteins from each
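
As an illustrative aside, the library design described in the Introduction (fully random sequences, 83 residues long, drawn only from a restricted alphabet in chosen ratios) can be mimicked in silico. The sketch below is not the authors' synthesis method, which was chemical (trinucleotide phosphoramidite coupling); it only samples sequences computationally. The function name `random_library`, the equal default ratios, the library size of 10, and the five-letter alphabet `ADEGV` are all assumptions made for illustration; the actual alphabets are those defined in Figure 1B, which is not reproduced in this text. The sketch also treats the whole 83-residue sequence as randomised.

```python
import random

# The 20 canonical amino acids as standard one-letter codes.
CANONICAL_20 = "ACDEFGHIKLMNPQRSTVWY"


def random_library(alphabet, length=83, size=10, weights=None, seed=0):
    """Sample `size` random sequences of `length` residues from `alphabet`.

    `weights` gives relative residue ratios (None means equal ratios).
    This is a hypothetical helper, not part of any published pipeline.
    """
    if not set(alphabet) <= set(CANONICAL_20):
        raise ValueError("alphabet contains a non-canonical residue code")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    letters = list(alphabet)
    return [
        "".join(rng.choices(letters, weights=weights, k=length))
        for _ in range(size)
    ]


# PLACEHOLDER alphabet for illustration only; see Figure 1B of the
# paper for the alphabets that were actually used.
example_alphabet = "ADEGV"
library = random_library(example_alphabet)

# Every sequence strictly adheres to the reduced alphabet.
for seq in library[:3]:
    print(seq)
```

Passing `CANONICAL_20` as the alphabet gives the analogue of the 20-amino-acid comparison library, and a `weights` list can stand in for unequal residue ratios.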
